[ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992237#comment-12992237
 ] 

hao yan commented on LUCENE-2903:
---------------------------------

I tried moving the memory allocation out of readBlock() and into BlockReader's 
constructor, which improves performance a little. I also tried replacing my 
manual conversion between byte[] and int[] with ByteBuffer/IntBuffer, but that 
made things worse.
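
As a rough illustration of the two experiments (not the code in the attached patch), the sketch below allocates the working buffers once in the constructor and reuses them on every readBlock() call, and shows the manual byte-to-int conversion next to a ByteBuffer/IntBuffer variant. The class name BlockReaderSketch, the method readBlockViaBuffer, and the big-endian layout are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical stand-in for the patch's BlockReader; not Lucene's actual class. */
final class BlockReaderSketch {
    private final byte[] byteBuf; // allocated once, reused by every readBlock() call
    private final int[] intBuf;   // decoded values, also reused across calls

    BlockReaderSketch(int blockSize) {
        this.byteBuf = new byte[blockSize * 4];
        this.intBuf = new int[blockSize];
    }

    /** Manual conversion: copies the raw bytes, then assembles big-endian ints with shifts. */
    int[] readBlock(byte[] src, int numInts) {
        System.arraycopy(src, 0, byteBuf, 0, numInts * 4);
        for (int i = 0, j = 0; i < numInts; i++, j += 4) {
            intBuf[i] = ((byteBuf[j] & 0xFF) << 24)
                      | ((byteBuf[j + 1] & 0xFF) << 16)
                      | ((byteBuf[j + 2] & 0xFF) << 8)
                      | (byteBuf[j + 3] & 0xFF);
        }
        return intBuf;
    }

    /** Same decoding through an IntBuffer view over the source bytes. */
    int[] readBlockViaBuffer(byte[] src, int numInts) {
        ByteBuffer.wrap(src, 0, numInts * 4)
                  .order(ByteOrder.BIG_ENDIAN)
                  .asIntBuffer()
                  .get(intBuf, 0, numInts);
        return intBuf;
    }
}
```

Both methods return the same reused int[] instance, so a caller that needs to keep a block across calls has to copy it first.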

Here are my results on the 0.1M data:
(1) BulkVInt vs PatchedFrameOfRef3
Query                                       BulkVInt QPS  PatchedFrameOfRef3 QPS  Pct diff
"united states"                                   393.55                  362.84     -7.8%
"united states"~3                                 243.84                  236.80     -2.9%
+nebraska +states                                1140.25                  998.00    -12.5%
+united +states                                   687.76                  633.31     -7.9%
doctimesecnum:[10000 TO 60000]                    413.56                  427.53      3.4%
doctitle:.*[Uu]nited.*                            510.46                  534.47      4.7%
spanFirst(unit, 5)                               1240.69                 1108.65    -10.6%
spanNear([unit, state], 10, true)                 511.77                  463.18     -9.5%
states                                           1626.02                 1483.68     -8.8%
u*d                                               164.23                  162.79     -0.9%
un*d                                              257.53                  252.97     -1.8%
uni*                                              607.53                  591.02     -2.7%
unit*                                            1024.59                 1043.84      1.9%
united states                                     627.35                  578.70     -7.8%
united~0.6                                         11.51                   11.36     -1.3%
united~0.75                                        52.58                   53.57      1.9%
unit~0.5                                           12.08                   11.93     -1.2%
unit~0.7                                           50.98                   51.30      0.6%

(2) FrameOfRef vs PatchedFrameOfRef3
Query                              PatchedFrameOfRef QPS  PatchedFrameOfRef3 QPS  Pct diff
"united states"                                   314.76                  362.71     15.2%
"united states"~3                                 227.53                  237.08      4.2%
+nebraska +states                                1075.27                 1025.64     -4.6%
+united +states                                   646.41                  626.57     -3.1%
doctimesecnum:[10000 TO 60000]                    412.88                  429.37      4.0%
doctitle:.*[Uu]nited.*                            481.70                  528.82      9.8%
spanFirst(unit, 5)                               1060.45                 1118.57      5.5%
spanNear([unit, state], 10, true)                 409.33                  467.73     14.3%
states                                           1353.18                 1479.29      9.3%
u*d                                               158.91                  165.98      4.4%
un*d                                              237.36                  256.41      8.0%
uni*                                              560.22                  593.12      5.9%
unit*                                             946.97                 1043.84     10.2%
united states                                     431.22                  583.09     35.2%
united~0.6                                         10.91                   11.37      4.2%
united~0.75                                        50.30                   53.30      5.9%
unit~0.5                                           11.54                   11.94      3.5%
unit~0.7                                           47.38                   50.38      6.3%


(3) PatchedFrameOfRef vs PatchedFrameOfRef3
Query                                     FrameOfRef QPS  PatchedFrameOfRef3 QPS  Pct diff
"united states"                                   326.26                  360.49     10.5%
"united states"~3                                 226.50                  234.69      3.6%
+nebraska +states                                1077.59                 1021.45     -5.2%
+united +states                                   648.51                  630.52     -2.8%
doctimesecnum:[10000 TO 60000]                    324.46                  428.45     32.0%
doctitle:.*[Uu]nited.*                            485.44                  527.70      8.7%
spanFirst(unit, 5)                               1007.05                 1111.11     10.3%
spanNear([unit, state], 10, true)                 446.03                  465.55      4.4%
states                                           1449.28                 1459.85      0.7%
u*d                                               158.43                  161.79      2.1%
un*d                                              246.37                  256.28      4.0%
uni*                                              548.85                  594.88      8.4%
unit*                                             920.81                 1042.75     13.2%
united states                                     450.65                  576.37     27.9%
united~0.6                                         11.07                   11.26      1.7%
united~0.75                                        50.70                   52.60      3.8%
unit~0.5                                           11.64                   11.76      1.0%
unit~0.7                                           49.04                   50.70      3.4%
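
For reading the tables: the Pct diff values are consistent with the relative change of the second QPS column against the first, so a positive number means PatchedFrameOfRef3 answered more queries per second than the baseline in that row. A minimal sketch of that arithmetic, with PctDiff and pctDiff as hypothetical names chosen for the example:

```java
/** Hypothetical helper; reproduces the Pct diff column from the two QPS columns. */
final class PctDiff {
    private PctDiff() {}

    /** Relative change of the candidate QPS against the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }
}
```

For example, pctDiff(393.55, 362.84) is about -7.8 (the first row of table 1) and pctDiff(431.22, 583.09) is about 35.2 (the "united states" row of table 2).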




> Improvement of PForDelta Codec
> ------------------------------
>
>                 Key: LUCENE-2903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2903
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: hao yan
>         Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 PForDelta implementations in the Bulk Branch: 
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> FrameOfRef is a very basic one, essentially a plain binary encoding 
> (it may result in a huge index).
> PatchedFrameOfRef is the implementation based on the original version of 
> PForDelta in the literature.
> PatchedFrameOfRef2 is my previous implementation, which is improved this 
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the 
> old PForDelta did not support very large exceptions (since 
> Simple16 does not support very large numbers). This has been fixed in 
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other 
> two PForDelta implementations in the bulk branch (FrameOfRef and 
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in 
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef 
> for almost all kinds of queries, and slightly slower than BulkVInt.
> 2) My "NewPForDelta" codec produces the smallest index among all 4 
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results were obtained by running with "-server" 
> instead of "-client".
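
To make the terms in the description concrete, here is a minimal sketch of the idea behind a patched frame of reference: every value keeps its low b bits in a regular slot, and values that do not fit in b bits are recorded as exceptions and patched back in on decode. This is not the code from LUCENE_2903.patch. The class and field names are hypothetical, the slots are left as plain ints rather than bit-packed, and the exceptions are stored uncompressed, whereas the description suggests the real codec compresses them with Simple16.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical names, not the LUCENE-2903 patch code. */
final class PatchedForSketch {
    final int bits;           // width b of a regular slot, 1..31
    final int[] lowBits;      // low b bits of every value (a real codec would bit-pack these)
    final int[] excPositions; // indexes of values that did not fit in b bits
    final int[] excHighBits;  // the remaining high bits of those values

    /** Splits non-negative values into b-bit slots plus a list of exceptions. */
    PatchedForSketch(int[] values, int bits) {
        if (bits < 1 || bits > 31) throw new IllegalArgumentException("bits must be 1..31");
        this.bits = bits;
        int mask = (1 << bits) - 1;
        lowBits = new int[values.length];
        List<Integer> pos = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            lowBits[i] = values[i] & mask;
            int h = values[i] >>> bits;
            if (h != 0) { // value needs more than b bits: record an exception
                pos.add(i);
                high.add(h);
            }
        }
        excPositions = pos.stream().mapToInt(Integer::intValue).toArray();
        excHighBits = high.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Rebuilds the original values by patching the high bits back into the slots. */
    int[] decode() {
        int[] out = lowBits.clone();
        for (int k = 0; k < excPositions.length; k++) {
            out[excPositions[k]] |= excHighBits[k] << bits;
        }
        return out;
    }
}
```

With a small b, the high bits of an exception can themselves be large, which is the case the first change in the description is about: whatever encodes the exception list has to cope with big numbers.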
