[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852855#action_12852855 ]

Renaud Delbru commented on LUCENE-1410:
---------------------------------------

Here are some results on the performance of compression and decompression for 
various algorithms over the Wikipedia dataset (English articles).

First, the performance of compressing/decompressing the full .doc posting 
list using a block size of 1024. 
Compression and decompression rates are in KInt/ms, and sizes are in bytes.

||Codec||Compression||Decompression||Size||
|FOR (Orig)|20|106|448856|
|PFOR (Orig)|8|107| 383596|
|VINT|37|74|421764|
|FOR (Opt)|39|102|447369|
|PFOR (Opt)|10|108|382379|
|RICE|8|31|350687|
|S9|16|65|408218|

* VInt is a block-based version of variable-length integer coding (unlike the 
simple int block, I create blocks of VInts, then read the full block into 
memory and decode it one integer at a time).
* FOR (Orig) and PFOR (Orig) are your implementations of FOR and PFOR.
* FOR (Opt) and PFOR (Opt) are my implementations of FOR and PFOR (using an 
array of functors for each compression/decompression case).
* Rice is an implementation of the Rice algorithm, but block-based.
* S9 is your implementation of Simple9 with some bug fixes, but block-based.
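The block-based VInt scheme can be sketched as follows (a minimal, hypothetical illustration, not the actual patch code): a whole block is encoded as a run of variable-length integers, then read into memory at once and decoded one integer at a time.

```java
import java.nio.ByteBuffer;

public class BlockVInt {
    // Encode a block of ints as VInt bytes: 7 payload bits per byte,
    // high bit set on all but the last byte of each value.
    static byte[] encode(int[] block) {
        ByteBuffer buf = ByteBuffer.allocate(block.length * 5);
        for (int v : block) {
            while ((v & ~0x7F) != 0) {
                buf.put((byte) ((v & 0x7F) | 0x80));
                v >>>= 7;
            }
            buf.put((byte) v);
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    // Read the full block from memory and decode one integer at a time.
    static int[] decode(byte[] bytes, int count) {
        int[] out = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            out[i] = v;
        }
        return out;
    }
}
```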

I have created an *IndexInput and *IndexOutput for each algorithm, and I use 
them to compress and decompress the posting file in this experiment.
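The array-of-functors optimisation mentioned above amounts to having one specialised decode routine per bit width, selected by array index so the inner loop contains no per-value branching. A hypothetical sketch (only width 1 is filled in; the class and method names are illustrative):

```java
public class ForDecoders {
    interface Decoder { void decode(long[] packed, int[] out); }

    // One specialised decoder per bit width, dispatched by array index
    // instead of a switch or conditional inside the inner loop.
    static final Decoder[] DECODERS = new Decoder[33];
    static {
        DECODERS[1] = (packed, out) -> {
            for (int i = 0; i < out.length; i++)
                out[i] = (int) ((packed[i >> 6] >>> (i & 63)) & 1L);
        };
        // widths 2..32 would be generated similarly
    }

    static void decodeBlock(int bitWidth, long[] packed, int[] out) {
        DECODERS[bitWidth].decode(packed, out);  // single indirect call per block
    }
}
```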

We can see that using an array of functors in FOR provides a good speed 
increase in compression. On PFOR, however, the compression speed increase is 
limited; from my tests, this is due to the sort step that is necessary for 
compressing each block. 
PFOR provides a good balance between decompression speed and size, but it is 
very slow in compression (as slow as Rice). I don't think it is possible to 
improve the compression speed much, because the sort step seems to impose a 
hard upper limit on performance (I have tried various heuristics to avoid 
sorting, but without much success so far).
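To illustrate where the sort cost comes from: a common way to pick the frame bit width in PFOR is to sort a copy of each block and choose the width that keeps the number of exceptions under a threshold. A rough, hypothetical sketch (not the patch's actual selection logic):

```java
import java.util.Arrays;

public class PForWidth {
    // Pick the smallest bit width such that at most maxExceptions values
    // in the block exceed it. The O(n log n) sort dominates compression.
    static int chooseWidth(int[] block, int maxExceptions) {
        int[] sorted = block.clone();
        Arrays.sort(sorted);  // the expensive step, run once per block
        int cutoff = sorted[Math.max(0, sorted.length - 1 - maxExceptions)];
        return 32 - Integer.numberOfLeadingZeros(Math.max(cutoff, 1));
    }
}
```

Since the bit packing itself is linear in the block size while the sort is O(n log n), the sort sets the floor on per-block compression cost, which matches the numbers above.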

Following this first benchmark, I have created various Lucene codecs that use 
the *IndexInput and *IndexOutput of the different compression algorithms, and 
I have indexed the full Wikipedia dataset. Each block-based codec is 
configured to use blocks of 1024. 
I have recorded the average commit time and the optimise time. Then, I have 
executed random keyword queries and recorded the average query time and the 
number of bytes read to answer the query.

h5. Compression

Times are in ms, sizes in bytes.

||Codec||Avg Commit Time||Total Commit Time||Optimise Time||Index Size||
|STD|1276|359978|113386|1423251|
|SEP|1437|405286|117163|1666870|
|VINT|1572|426551|122557|1742083|
|RICE|1791|505312|230636|1339703|
|S9|1575|444157|146216|1530666|
|FOR|1520|428847|134502|1754578|
|PFOR|1852|522401|189262|1511154|

* STD is the standard Lucene codec.
* SEP is the Sep codec.
* All the other algorithms are based on the Sep codec, but block-based.
* FOR and PFOR are my implementations.

We can see that the Sep codec and the block-based codecs have a certain 
overhead (in indexing time and index size) compared to the standard Lucene 
codec, but I imagine this overhead can be reduced with some code optimisation. 
For the index size, the overhead of the block-based codecs is due to the skip 
list, which is much bigger (127MB for SEP against 189MB for block-based) since 
we need to encode both the block offset and the document offset within the 
block.

In terms of compression speed and index size, these results follow the 
previous ones. We can also observe that PFOR is the slowest.

h5. Decompression

For each codec, I have performed five runs, each run consisting of 200 random 
keyword queries, and averaged the time over the 5 runs and 200 queries (i.e., 
200*5 = 1000 query executions).
For each run, I open a new IndexReader and IndexSearcher and reuse them across 
the 200 random queries.

To generate the queries, I first grouped the terms into three groups, HIGH, 
MEDIUM and LOW, based on their frequency, in order to generate random keyword 
queries from these three frequency groups. Then, I performed benchmarks with 
random queries of one, two or more keywords, using all possible combinations 
(e.g., HIGH & HIGH, HIGH & MEDIUM, etc.). I executed the queries using a 
single thread, and also using multiple threads.

I am reporting a few of the query results below; for most of the other 
queries, the results followed the same pattern. I don't report the results of 
the Standard and Sep codecs, but they always outperform the block-based 
codecs, whatever compression is used.

However, the query time results completely contradict the results of the 
first benchmark. The slowest algorithms (Rice, VInt) generally provide the 
fastest query execution times, and the fastest algorithms (FOR, PFOR) provide 
the slowest. Simple9 seems to be as fast as VInt. I currently have no 
explanation for this. Maybe the dataset (Wikipedia articles) is too small to 
see the benefits, but I would say instead that there is a problem somewhere. 
How could VInt, and even Rice, provide better query execution times than FOR 
and PFOR, when the first benchmark clearly shows that FOR and PFOR provide 
nearly a 40% increase in decompression performance?
Do you have any advice on how to improve the benchmark, or any ideas on the 
cause of this drop in performance? I'll be able to re-run the benchmarks if 
you propose something.

h6. HIGH:MUST - 1 thread - 200 random queries

||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.3 ms|
|FOR|2.5 ms|
|S9|1 ms|

h6. HIGH:MUST - 2 threads - 200 random queries for each thread

||Codec||Avg Query Time||
|Rice|2 ms|
|VInt|1.9 ms|
|PFOR|3 ms|
|FOR|3.5 ms|
|S9|1.6 ms|

h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 1 thread - 200 random queries

||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.7 ms|
|FOR|2.5 ms|
|S9|1.3 ms|

h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 2 threads - 200 random queries for each thread

||Codec||Avg Query Time||
|Rice|2.5 ms|
|VInt|2 ms|
|PFOR|3 ms|
|FOR|3.4 ms|
|S9|2 ms|



> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, for-summary.txt, 
> LUCENE-1410-codecs.tar.bz2, LUCENE-1410b.patch, LUCENE-1410c.patch, 
> LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, 
> TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

