[
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992237#comment-12992237
]
hao yan commented on LUCENE-2903:
---------------------------------
I tried moving the memory allocation out of readBlock() and into BlockReader's
constructor, which improves performance a little. I also tried using
ByteBuffer/IntBuffer in place of my manual conversion between byte[] and int[],
which makes things worse.
The following are my results for 0.1M data:
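To make the two variants concrete, here is a minimal Java sketch. It is not the code from the patch: the class name BlockReaderSketch, the method names, and the big-endian byte layout are all assumptions chosen for illustration. It shows buffers allocated once in the constructor and reused per block, with a manual byte-to-int conversion next to the equivalent ByteBuffer/IntBuffer view.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

/** Hypothetical sketch; NOT the BlockReader from the LUCENE-2903 patch. */
final class BlockReaderSketch {
    // Allocated once in the constructor and reused for every block,
    // instead of being allocated on each read call.
    private final byte[] byteBuf;
    private final int[] intBuf;

    BlockReaderSketch(int blockSize) {
        this.byteBuf = new byte[blockSize * 4];
        this.intBuf = new int[blockSize];
    }

    /** Manual conversion, assuming big-endian layout (an assumption). */
    int[] decodeManual(byte[] src, int numInts) {
        System.arraycopy(src, 0, byteBuf, 0, numInts * 4);
        for (int i = 0, j = 0; i < numInts; i++, j += 4) {
            intBuf[i] = ((byteBuf[j] & 0xFF) << 24)
                      | ((byteBuf[j + 1] & 0xFF) << 16)
                      | ((byteBuf[j + 2] & 0xFF) << 8)
                      | (byteBuf[j + 3] & 0xFF);
        }
        return intBuf;
    }

    /** The same conversion through a ByteBuffer/IntBuffer view. */
    int[] decodeWithBuffers(byte[] src, int numInts) {
        IntBuffer view = ByteBuffer.wrap(src, 0, numInts * 4)
                                   .order(ByteOrder.BIG_ENDIAN)
                                   .asIntBuffer();
        view.get(intBuf, 0, numInts);
        return intBuf;
    }
}
```

Both methods return the same shared int[] and produce identical values; which one is faster depends on the JVM and access pattern, so the sketch makes no performance claim of its own.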
(1) BulkVInt vs PatchedFrameOfRef3
Query                              QPS bulkVInt  QPS patchedFrameOfRef3  Pct diff
"united states"                          393.55                  362.84     -7.8%
"united states"~3                        243.84                  236.80     -2.9%
+nebraska +states                       1140.25                  998.00    -12.5%
+united +states                          687.76                  633.31     -7.9%
doctimesecnum:[10000 TO 60000]           413.56                  427.53      3.4%
doctitle:.*[Uu]nited.*                   510.46                  534.47      4.7%
spanFirst(unit, 5)                      1240.69                 1108.65    -10.6%
spanNear([unit, state], 10, true)        511.77                  463.18     -9.5%
states                                  1626.02                 1483.68     -8.8%
u*d                                      164.23                  162.79     -0.9%
un*d                                     257.53                  252.97     -1.8%
uni*                                     607.53                  591.02     -2.7%
unit*                                   1024.59                 1043.84      1.9%
united states                            627.35                  578.70     -7.8%
united~0.6                                11.51                   11.36     -1.3%
united~0.75                               52.58                   53.57      1.9%
unit~0.5                                  12.08                   11.93     -1.2%
unit~0.7                                  50.98                   51.30      0.6%
(2) FrameOfRef vs PatchedFrameOfRef3
Query                              QPS patchedFrameOfRef  QPS patchedFrameOfRef3  Pct diff
"united states"                                   314.76                  362.71     15.2%
"united states"~3                                 227.53                  237.08      4.2%
+nebraska +states                                1075.27                 1025.64     -4.6%
+united +states                                   646.41                  626.57     -3.1%
doctimesecnum:[10000 TO 60000]                    412.88                  429.37      4.0%
doctitle:.*[Uu]nited.*                            481.70                  528.82      9.8%
spanFirst(unit, 5)                               1060.45                 1118.57      5.5%
spanNear([unit, state], 10, true)                 409.33                  467.73     14.3%
states                                           1353.18                 1479.29      9.3%
u*d                                               158.91                  165.98      4.4%
un*d                                              237.36                  256.41      8.0%
uni*                                              560.22                  593.12      5.9%
unit*                                             946.97                 1043.84     10.2%
united states                                     431.22                  583.09     35.2%
united~0.6                                         10.91                   11.37      4.2%
united~0.75                                        50.30                   53.30      5.9%
unit~0.5                                           11.54                   11.94      3.5%
unit~0.7                                           47.38                   50.38      6.3%
(3) PatchedFrameOfRef vs PatchedFrameOfRef3
Query                              QPS FrameOfRef  QPS patchedFrameOfRef3  Pct diff
"united states"                            326.26                  360.49     10.5%
"united states"~3                          226.50                  234.69      3.6%
+nebraska +states                         1077.59                 1021.45     -5.2%
+united +states                            648.51                  630.52     -2.8%
doctimesecnum:[10000 TO 60000]             324.46                  428.45     32.0%
doctitle:.*[Uu]nited.*                     485.44                  527.70      8.7%
spanFirst(unit, 5)                        1007.05                 1111.11     10.3%
spanNear([unit, state], 10, true)          446.03                  465.55      4.4%
states                                    1449.28                 1459.85      0.7%
u*d                                        158.43                  161.79      2.1%
un*d                                       246.37                  256.28      4.0%
uni*                                       548.85                  594.88      8.4%
unit*                                      920.81                 1042.75     13.2%
united states                              450.65                  576.37     27.9%
united~0.6                                  11.07                   11.26      1.7%
united~0.75                                 50.70                   52.60      3.8%
unit~0.5                                    11.64                   11.76      1.0%
unit~0.7                                    49.04                   50.70      3.4%
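For reading the Pct diff column: the values are consistent with the relative change of the second QPS column over the first, so a negative number means the PatchedFrameOfRef3 run was slower. The sketch below is an assumption about how the column is derived, not the benchmark's actual code, and the name pctDiff is hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper; assumes Pct diff = (candidate - baseline) / baseline. */
final class PctDiff {
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // First row of table (1): 393.55 QPS baseline, 362.84 QPS candidate.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(393.55, 362.84)));
    }
}
```

With the first row of table (1) this gives roughly -7.8, matching the reported value.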
> Improvement of PForDelta Codec
> ------------------------------
>
> Key: LUCENE-2903
> URL: https://issues.apache.org/jira/browse/LUCENE-2903
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: hao yan
> Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch:
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding
> (may result in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of
> PForDelta in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this
> time (the codec name is changed to NewPForDelta).
> In particular, the changes are:
> 1. I fixed the bug of my previous version (in Lucene-1410.patch), where the
> old PForDelta does not support very large exceptions (since
> the Simple16 does not support very large numbers). Now this has been fixed in
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other
> two PForDelta implementations in the bulk branch (FrameOfRef and
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef
> for almost all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec results in the smallest index size among all 4
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results were obtained by running with "-server"
> instead of "-client".