[
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852855#action_12852855
]
Renaud Delbru commented on LUCENE-1410:
---------------------------------------
Here are some results on the performance of compression/decompression for various
algorithms over the Wikipedia dataset (English articles).
First, the performance of compressing/decompressing the full .doc posting
list, using a block size of 1024.
Compression and decompression rates are in KInt/ms, and sizes are in bytes.
||Codec||Compression||Decompression||Size||
|FOR (Orig)|20|106|448856|
|PFOR (Orig)|8|107|383596|
|VINT|37|74|421764|
|FOR (Opt)|39|102|447369|
|PFOR (Opt)|10|108|382379|
|RICE|8|31|350687|
|S9|16|65|408218|
* VInt is a block-based version of variable-length integer coding (unlike the
simple int block, I create blocks using VInt, then read the full block into
memory and decompress it using VInt one integer at a time).
* FOR (Orig) and PFOR (Orig) are your implementations of FOR and PFOR.
* FOR (Opt) and PFOR (Opt) are my implementations of FOR and PFOR (using an
array of functors for each compression/decompression case).
* Rice is an implementation of the Rice algorithm, but block-based.
* S9 is your implementation of Simple9 with some bug fixes, but block-based.
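To make the block-based VInt variant concrete, here is a minimal, self-contained sketch (illustrative code, not the code from the patch; class and method names are my own): a block of integers is VInt-encoded into one byte buffer, and decoding reads the in-memory block one integer at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class BlockVInt {
    // Low 7 bits first, high bit set on all but the last byte.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) { out.write((v & 0x7F) | 0x80); v >>>= 7; }
        out.write(v);
    }

    static int readVInt(ByteArrayInputStream in) {
        int b = in.read(), v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            v |= (b & 0x7F) << shift;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] block = {1, 300, 70000};
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int v : block) writeVInt(buf, v);            // encode the whole block
        ByteArrayInputStream in = new ByteArrayInputStream(buf.toByteArray());
        for (int expected : block)
            System.out.println(readVInt(in) == expected); // true, true, true
    }
}
```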
I have created an *IndexInput and *IndexOutput for each algorithm, and used them
to compress and decompress the posting file in this experiment.
We can see that using an array of functors for compression in FOR provides a
good speed increase in compression. However, for PFOR, the compression speed
increase is limited; from my tests, this is due to the sort step which is
necessary for compressing each block.
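The "array of functors" idea can be sketched as follows (illustrative code, not the code from the patch): one specialised unpacker per frame bit width, selected by a table lookup, so the inner decode uses a fixed shift and mask instead of a generic, branchy loop.

```java
public class ForDecoders {
    interface IntDecoder { int decode(int packed, int slot); }

    // decoders[b] unpacks the slot-th b-bit value from a packed word.
    static final IntDecoder[] decoders = new IntDecoder[33];
    static {
        for (int b = 1; b <= 32; b++) {
            final int bits = b;
            final int mask = bits == 32 ? -1 : (1 << bits) - 1;
            decoders[b] = (packed, slot) -> (packed >>> (slot * bits)) & mask;
        }
    }

    public static void main(String[] args) {
        // Pack four 8-bit values into one int.
        int packed = 5 | (17 << 8) | (255 << 16) | (1 << 24);
        IntDecoder d = decoders[8];              // one table lookup per block
        System.out.println(d.decode(packed, 2)); // prints 255
    }
}
```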
PFOR provides a good balance between decompression speed and size, but it is
very slow in compression (as slow as Rice). I don't think it is possible to
improve the compression speed because of the sort step, which seems to impose a
hard upper limit in terms of performance (I have tried various heuristics to
avoid sorting, but without much success so far).
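To illustrate why the sort step bounds PFOR compression speed, here is a hedged sketch (the names and the 90% coverage threshold are illustrative assumptions, not the patch's actual code): choosing the frame bit width requires finding the largest value among the non-exception entries of the block, which is typically done by sorting (or a selection over) the block, and that cost is paid on every block.

```java
import java.util.Arrays;

public class PforBitWidth {
    // Returns the bit width covering 'coverage' of the block's values;
    // the remaining values become patched exceptions.
    static int chooseBitWidth(int[] block, double coverage) {
        int[] copy = Arrays.copyOf(block, block.length);
        Arrays.sort(copy);                        // the costly per-block step
        int idx = (int) Math.ceil(coverage * copy.length) - 1;
        int v = copy[Math.max(idx, 0)];
        return 32 - Integer.numberOfLeadingZeros(Math.max(v, 1));
    }

    public static void main(String[] args) {
        int[] block = {1, 3, 2, 5, 4, 7, 6, 2, 3, 1000}; // one outlier -> exception
        System.out.println(chooseBitWidth(block, 0.9));  // prints 3
    }
}
```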
Following this first benchmark, I have created various Lucene codecs that use
the *IndexInput and *IndexOutput of the different compression algorithms, and I
have indexed the full Wikipedia dataset. Each block-based codec is configured
to use blocks of 1024.
I have recorded the average commit time and the optimise time. Then I executed
random keyword queries and recorded the average query time and the number of
bytes read to answer each query.
h5. Compression
Times are in ms, sizes in bytes.
||Codec||Avg Commit Time||Total Commit Time||Optimise Time||Index Size||
|STD|1276|359978|113386|1423251|
|SEP|1437|405286|117163|1666870|
|VINT|1572|426551|122557|1742083|
|RICE|1791|505312|230636|1339703|
|S9|1575|444157|146216|1530666|
|FOR|1520|428847|134502|1754578|
|PFOR|1852|522401|189262|1511154|
* STD is the standard Lucene codec.
* SEP is the Sep codec.
* All the other algorithms are based on the Sep codec, but block-based.
* FOR and PFOR are my implementations.
We can see that the Sep codec and the block-based codecs have a certain
overhead (in indexing time and index size) compared to the standard Lucene
codec, but I imagine this overhead can be reduced with some code optimisation.
For the index size, the overhead of the block-based codecs is due to the skip
list, which is much bigger (127MB for SEP against 189MB for block-based) since
we need to encode both the block offset and the document offset within the
block.
In terms of compression speed and index size, these results follow the
previous results. We can also observe that PFOR is the slowest.
h5. Decompression
For each codec, I have performed five runs, each run consisting of 200 random
keyword queries, and averaged the time over the 5 runs and 200 queries (i.e.,
200*5 = 1000 query executions).
For each run, I open a new IndexReader and IndexSearcher and reuse them across
the 200 random queries.
To generate the queries, I first grouped the terms into three groups, HIGH,
MEDIUM and LOW, based on their frequency, in order to be able to generate
random keyword queries from these three frequency groups. Then I ran
benchmarks with random queries of one, two or more keywords, using all
possible combinations (e.g., HIGH & HIGH, HIGH & MEDIUM, etc.). I executed the
queries using a single thread, and also using multiple threads.
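The term-grouping step can be sketched as follows (the class name and document-frequency thresholds are illustrative assumptions, not the ones used in the benchmark):

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class TermBuckets {
    enum Freq { HIGH, MEDIUM, LOW }

    // Assign each term to a frequency group by its document frequency.
    static Map<Freq, List<String>> bucket(Map<String, Integer> docFreq,
                                          int highMin, int medMin) {
        Map<Freq, List<String>> buckets = new EnumMap<>(Freq.class);
        for (Freq f : Freq.values()) buckets.put(f, new ArrayList<>());
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            int df = e.getValue();
            Freq f = df >= highMin ? Freq.HIGH
                   : df >= medMin  ? Freq.MEDIUM
                   : Freq.LOW;
            buckets.get(f).add(e.getKey());
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = Map.of("the", 100000, "lucene", 500, "pfor", 3);
        Map<Freq, List<String>> b = bucket(df, 10000, 100);
        System.out.println(b.get(Freq.HIGH)); // prints [the]
    }
}
```

Random queries are then built by drawing one term from each group named in the combination (e.g., HIGH & MEDIUM draws one HIGH term and one MEDIUM term).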
I am reporting a few of the query results below; for most of the other
queries, the results followed the same pattern. I don't report the results of
the Standard and Sep codecs, but they always outperform the block-based
codecs, whatever compression is used.
However, the query-time results completely contradict the results of the first
benchmark. The slowest algorithms (Rice, VInt) generally provide the fastest
query execution times, and the fastest algorithms (FOR, PFOR) provide the
slowest. Simple9 seems to be as fast as VInt.
I currently have no explanation for this. Maybe the dataset (Wikipedia
articles) is too small to see the benefits, but I would rather say there is a
problem somewhere. How could VInt and even Rice provide better query execution
times than FOR and PFOR, when the first benchmark clearly shows that FOR and
PFOR provide nearly a 40% decompression performance increase?
Do you have any advice on how to improve the benchmark, or any ideas on the
cause of this drop in performance? I'll be able to re-run the benchmarks if
you propose something.
h6. HIGH:MUST - 1 thread - 200 random queries
||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.3 ms|
|FOR|2.5 ms|
|S9|1 ms|
h6. HIGH:MUST - 2 threads - 200 random queries for each thread
||Codec||Avg Query Time||
|Rice|2 ms|
|VInt|1.9 ms|
|PFOR|3 ms|
|FOR|3.5 ms|
|S9|1.6 ms|
h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 1 thread - 200 random queries
||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.7 ms|
|FOR|2.5 ms|
|S9|1.3 ms|
h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 2 threads - 200 random queries for each thread
||Codec||Avg Query Time||
|Rice|2.5 ms|
|VInt|2 ms|
|PFOR|3 ms|
|FOR|3.4 ms|
|S9|2 ms|
> PFOR implementation
> -------------------
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Other
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: autogen.tgz, for-summary.txt,
> LUCENE-1410-codecs.tar.bz2, LUCENE-1410b.patch, LUCENE-1410c.patch,
> LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java,
> TestPFor2.java, TestPFor2.java
>
> Original Estimate: 21840h
> Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.