[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992237#comment-12992237 ]
hao yan commented on LUCENE-2903:
---------------------------------

I tried moving memory allocation out of readBlock() and into BlockReader's constructor. It improves performance a little. I also tried using ByteBuffer/IntBuffer in place of my manual conversion between byte[] and int[]. That made things worse.

The following are my results for 0.1M data:

(1) BulkVInt vs PatchedFrameOfRef3

Query                                         QPS BulkVInt  QPS PatchedFrameOfRef3  Pct diff
"united states"                                     393.55                  362.84     -7.8%
"united states"~3                                   243.84                  236.80     -2.9%
+nebraska +states                                  1140.25                  998.00    -12.5%
+united +states                                     687.76                  633.31     -7.9%
doctimesecnum:[10000 TO 60000]                      413.56                  427.53      3.4%
doctitle:.*[Uu]nited.*                              510.46                  534.47      4.7%
spanFirst(unit, 5)                                 1240.69                 1108.65    -10.6%
spanNear([unit, state], 10, true)                   511.77                  463.18     -9.5%
states                                             1626.02                 1483.68     -8.8%
u*d                                                 164.23                  162.79     -0.9%
un*d                                                257.53                  252.97     -1.8%
uni*                                                607.53                  591.02     -2.7%
unit*                                              1024.59                 1043.84      1.9%
united states                                       627.35                  578.70     -7.8%
united~0.6                                           11.51                   11.36     -1.3%
united~0.75                                          52.58                   53.57      1.9%
unit~0.5                                             12.08                   11.93     -1.2%
unit~0.7                                             50.98                   51.30      0.6%

(2) FrameOfRef vs PatchedFrameOfRef3

Query                                QPS PatchedFrameOfRef  QPS PatchedFrameOfRef3  Pct diff
"united states"                                     314.76                  362.71     15.2%
"united states"~3                                   227.53                  237.08      4.2%
+nebraska +states                                  1075.27                 1025.64     -4.6%
+united +states                                     646.41                  626.57     -3.1%
doctimesecnum:[10000 TO 60000]                      412.88                  429.37      4.0%
doctitle:.*[Uu]nited.*                              481.70                  528.82      9.8%
spanFirst(unit, 5)                                 1060.45                 1118.57      5.5%
spanNear([unit, state], 10, true)                   409.33                  467.73     14.3%
states                                             1353.18                 1479.29      9.3%
u*d                                                 158.91                  165.98      4.4%
un*d                                                237.36                  256.41      8.0%
uni*                                                560.22                  593.12      5.9%
unit*                                               946.97                 1043.84     10.2%
united states                                       431.22                  583.09     35.2%
united~0.6                                           10.91                   11.37      4.2%
united~0.75                                          50.30                   53.30      5.9%
unit~0.5                                             11.54                   11.94      3.5%
unit~0.7                                             47.38                   50.38      6.3%

(3) PatchedFrameOfRef vs PatchedFrameOfRef3

Query                                       QPS FrameOfRef  QPS PatchedFrameOfRef3  Pct diff
"united states"                                     326.26                  360.49     10.5%
"united states"~3                                   226.50                  234.69      3.6%
+nebraska +states                                  1077.59                 1021.45     -5.2%
+united +states                                     648.51                  630.52     -2.8%
doctimesecnum:[10000 TO 60000]                      324.46                  428.45     32.0%
doctitle:.*[Uu]nited.*                              485.44                  527.70      8.7%
spanFirst(unit, 5)                                 1007.05                 1111.11     10.3%
spanNear([unit, state], 10, true)                   446.03                  465.55      4.4%
states                                             1449.28                 1459.85      0.7%
u*d                                                 158.43                  161.79      2.1%
un*d                                                246.37                  256.28      4.0%
uni*                                                548.85                  594.88      8.4%
unit*                                               920.81                 1042.75     13.2%
united states                                       450.65                  576.37     27.9%
united~0.6                                           11.07                   11.26      1.7%
united~0.75                                          50.70                   52.60      3.8%
unit~0.5                                             11.64                   11.76      1.0%
unit~0.7                                             49.04                   50.70      3.4%

> Improvement of PForDelta Codec
> ------------------------------
>
>                 Key: LUCENE-2903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2903
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: hao yan
>         Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch:
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> FrameOfRef is a very basic one which is essentially a binary encoding
> (may result in huge index size).
> PatchedFrameOfRef is the implementation based on the original version of
> PForDelta in the literature.
> PatchedFrameOfRef2 is my previous implementation, which is improved this
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the
> old PForDelta did not support very large exceptions (since
> Simple16 does not support very large numbers). This has now been fixed in
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other
> two PForDelta implementations in the bulk branch (FrameOfRef and
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef
> for almost all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec produces the smallest index size of all 4
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results were obtained by running with "-server"
> instead of "-client".
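For readers who want to see the two changes from the comment side by side, here is a minimal sketch. It is not code from the patch: the class name BlockReaderSketch, its fields, and the in-memory byte[] input are all assumptions made for illustration, and only the names BlockReader and readBlock() come from the comment. It shows buffers allocated once in the constructor rather than on every readBlock() call, plus the two byte[]-to-int[] conversion routes (manual shifting versus a ByteBuffer/IntBuffer view). It assumes big-endian ints and makes no claim about which route is faster.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical stand-in for the BlockReader mentioned in the comment.
final class BlockReaderSketch {
    private final byte[] input;    // encoded data, held in memory for this sketch
    private final int blockSize;   // number of ints per block
    private final byte[] byteBuf;  // allocated once, reused by every readBlock() call
    private final int[] intBuf;    // allocated once, reused by every read call
    private int pos;               // current byte offset into input

    BlockReaderSketch(byte[] input, int blockSize) {
        this.input = input;
        this.blockSize = blockSize;
        // Allocation happens here, not inside readBlock().
        this.byteBuf = new byte[blockSize * 4];
        this.intBuf = new int[blockSize];
    }

    // Route 1: manual big-endian conversion from byte[] to int[].
    int[] readBlock() {
        System.arraycopy(input, pos, byteBuf, 0, byteBuf.length);
        pos += byteBuf.length;
        for (int i = 0, j = 0; i < blockSize; i++, j += 4) {
            intBuf[i] = ((byteBuf[j] & 0xFF) << 24)
                      | ((byteBuf[j + 1] & 0xFF) << 16)
                      | ((byteBuf[j + 2] & 0xFF) << 8)
                      | (byteBuf[j + 3] & 0xFF);
        }
        return intBuf; // same array on every call; callers must not hold on to old contents
    }

    // Route 2: the same conversion through a ByteBuffer/IntBuffer view.
    int[] readBlockViaIntBuffer() {
        IntBuffer view = ByteBuffer.wrap(input, pos, blockSize * 4)
                                   .order(ByteOrder.BIG_ENDIAN)
                                   .asIntBuffer();
        view.get(intBuf, 0, blockSize);
        pos += blockSize * 4;
        return intBuf;
    }
}
```

Both methods return the same reused int[] on each call, which is the point of hoisting the allocation. The trade-off is that a caller has to consume a block before reading the next one.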