[jira] [Commented] (LUCENE-10113) Improve ByteArrayDataInput to read primitive short/int/long natively using VarHandles

Uwe Schindler (Jira) Sun, 19 Sep 2021 04:58:05 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417323#comment-17417323
 ]


Uwe Schindler commented on LUCENE-10113:
----------------------------------------

Performance comparison:

{noformat}
                     Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
       HighTermMonthSort       80.36     (11.4%)       78.81      (7.9%)   -1.9% ( -19% -   19%) 0.533
    HighTermTitleBDVSort       12.47     (20.5%)       12.23     (18.6%)   -1.9% ( -33% -   46%) 0.756
        HighSloppyPhrase       14.80      (3.9%)       14.61      (3.4%)   -1.2% (  -8% -    6%) 0.287
               OrHighLow      191.33      (3.1%)      189.67      (3.8%)   -0.9% (  -7% -    6%) 0.436
            HighSpanNear        2.76      (2.6%)        2.74      (4.4%)   -0.6% (  -7% -    6%) 0.575
                  Fuzzy2       57.57      (1.6%)       57.21      (2.2%)   -0.6% (  -4% -    3%) 0.304
              HighPhrase       39.92      (3.5%)       39.68      (2.8%)   -0.6% (  -6% -    5%) 0.546
             LowSpanNear        8.42      (2.3%)        8.39      (2.8%)   -0.4% (  -5% -    4%) 0.663
                Wildcard       31.79      (3.8%)       31.74      (3.9%)   -0.2% (  -7% -    7%) 0.891
                  IntNRQ       94.20      (3.5%)       94.05      (4.1%)   -0.2% (  -7% -    7%) 0.898
             MedSpanNear       29.62      (2.3%)       29.59      (2.7%)   -0.1% (  -4% -    5%) 0.894
   HighTermDayOfYearSort       56.16      (7.7%)       56.11      (7.5%)   -0.1% ( -14% -   16%) 0.969
         LowSloppyPhrase       10.42      (2.7%)       10.42      (2.4%)   -0.0% (  -4% -    5%) 0.985
         MedSloppyPhrase        7.32      (2.5%)        7.32      (2.3%)    0.0% (  -4% -    4%) 0.977
                 Prefix3       12.62      (9.1%)       12.63     (10.5%)    0.1% ( -17% -   21%) 0.987
                  Fuzzy1       89.28      (1.1%)       89.34      (2.0%)    0.1% (  -3% -    3%) 0.903
   BrowseMonthSSDVFacets        5.04      (5.5%)        5.04      (5.2%)    0.1% ( -10% -   11%) 0.966
            OrNotHighLow      571.04      (1.5%)      571.46      (2.7%)    0.1% (  -4% -    4%) 0.917
     MedIntervalsOrdered       36.70      (5.6%)       36.78      (5.7%)    0.2% ( -10% -   12%) 0.905
                PKLookup      203.36      (3.8%)      203.81      (3.2%)    0.2% (  -6% -    7%) 0.845
    HighIntervalsOrdered        3.55      (5.3%)        3.56      (5.0%)    0.2% (  -9% -   11%) 0.891
                 Respell       59.27      (1.3%)       59.44      (1.7%)    0.3% (  -2% -    3%) 0.548
               MedPhrase      367.55      (2.0%)      368.61      (1.7%)    0.3% (  -3% -    4%) 0.623
              AndHighLow      560.26      (3.3%)      561.90      (3.8%)    0.3% (  -6% -    7%) 0.795
            OrNotHighMed      971.44      (2.4%)      974.30      (3.2%)    0.3% (  -5% -    6%) 0.742
               LowPhrase       41.63      (2.5%)       41.76      (2.3%)    0.3% (  -4% -    5%) 0.672
     LowIntervalsOrdered       94.44      (3.1%)       94.75      (3.2%)    0.3% (  -5% -    6%) 0.744
                 MedTerm     1590.31      (4.9%)     1596.02      (5.0%)    0.4% (  -9% -   10%) 0.819
           OrHighNotHigh      958.25      (3.5%)      964.26      (3.7%)    0.6% (  -6% -    8%) 0.581
                 LowTerm     1527.92      (2.5%)     1538.97      (3.0%)    0.7% (  -4% -    6%) 0.412
              OrHighHigh       26.32      (3.0%)       26.55      (3.4%)    0.9% (  -5% -    7%) 0.373
            OrHighNotMed     1177.62      (4.3%)     1188.50      (4.8%)    0.9% (  -7% -   10%) 0.522
            OrHighNotLow     1215.18      (4.5%)     1227.52      (4.6%)    1.0% (  -7% -   10%) 0.481
               OrHighMed       65.77      (4.0%)       66.50      (3.7%)    1.1% (  -6% -    9%) 0.365
              AndHighMed       44.34      (4.4%)       44.84      (5.0%)    1.1% (  -7% -   11%) 0.449
           OrNotHighHigh      783.60      (3.9%)      792.60      (4.6%)    1.1% (  -7% -    9%) 0.392
             AndHighHigh       38.95      (4.7%)       39.44      (4.6%)    1.3% (  -7% -   11%) 0.392
BrowseDayOfYearSSDVFacets        4.68     (10.0%)        4.77      (9.7%)    1.9% ( -16% -   23%) 0.551
   BrowseMonthTaxoFacets        1.20      (9.0%)        1.23      (9.7%)    2.3% ( -15% -   23%) 0.437
BrowseDayOfYearTaxoFacets        1.15      (9.7%)        1.18     (11.0%)    2.4% ( -16% -   25%) 0.461
                HighTerm     2329.95      (4.5%)     2391.41      (5.3%)    2.6% (  -6% -   13%) 0.092
    BrowseDateTaxoFacets        1.16      (9.7%)        1.19     (11.1%)    2.7% ( -16% -   25%) 0.421
              TermDTSort       65.25      (7.7%)       68.06     (10.3%)    4.3% ( -12% -   24%) 0.132


CPU merged search profile for my_modified_version:
PROFILE SUMMARY from 1454870 events (total: 1M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
9.22%         134161        org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.22%         119591        org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
5.35%         77845         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.52%         51219         java.util.Collections$UnmodifiableCollection$1#<init>()
3.49%         50705         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.44%         35538         org.apache.lucene.store.ByteBufferGuard#getShort()
2.37%         34551         java.nio.Buffer#scope()
2.03%         29597         java.nio.ByteBuffer#getArray()
2.02%         29320         jdk.internal.misc.Unsafe#convEndian()
1.91%         27740         java.nio.DirectByteBuffer#getShort()
1.90%         27626         org.apache.lucene.store.ByteBufferIndexInput#readBytes()
1.81%         26300         java.util.Objects#checkIndex()
1.76%         25665         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.60%         23302         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.51%         21980         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.51%         21925         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.22%         17802         jdk.internal.misc.Unsafe#copyMemory()
1.16%         16832         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.14%         16544         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.11%         16206         jdk.internal.util.Preconditions#checkFromIndexSize()
1.10%         16072         org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.07%         15617         java.nio.Buffer#position()
1.05%         15245         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.97%         14144         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.95%         13785         org.apache.lucene.search.ConjunctionDISI#doNext()
0.94%         13737         org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.94%         13647         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.86%         12467         java.util.Collections$UnmodifiableCollection$1#hasNext()
0.82%         11951         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.82%         11948         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()


CPU merged search profile for baseline:
PROFILE SUMMARY from 1455846 events (total: 1M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
9.92%         144437        org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.91%         129653        org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
6.42%         93437         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.60%         52446         java.util.Collections$UnmodifiableCollection$1#<init>()
3.30%         48044         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.33%         33969         org.apache.lucene.store.ByteBufferIndexInput#readBytes()
2.26%         32835         java.nio.ByteBuffer#getArray()
1.99%         28936         java.util.Objects#checkIndex()
1.96%         28474         org.apache.lucene.store.ByteBufferGuard#getShort()
1.92%         27908         java.nio.Buffer#scope()
1.87%         27199         jdk.internal.util.Preconditions#checkFromIndexSize()
1.77%         25702         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.71%         24943         jdk.internal.misc.Unsafe#convEndian()
1.67%         24365         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.67%         24310         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.57%         22795         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.51%         21924         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.23%         17931         java.nio.Buffer#position()
1.18%         17145         jdk.internal.misc.Unsafe#copyMemory()
1.16%         16845         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.12%         16270         org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.10%         15971         java.nio.DirectByteBuffer#getShort()
0.99%         14447         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.98%         14252         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.95%         13817         org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.93%         13573         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.92%         13323         org.apache.lucene.search.ConjunctionDISI#doNext()
0.80%         11677         java.util.Collections$UnmodifiableCollection$1#hasNext()
0.78%         11422         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.76%         11123         org.apache.lucene.queries.spans.NearSpansOrdered#nextStartPosition()


HEAP merged search profile for my_modified_version:
PROFILE SUMMARY from 78058 events (total: 27928M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
17.16%        4792M         org.apache.lucene.util.FixedBitSet#<init>()
8.40%         2344M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.41%         2070M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.80%         1898M         java.util.AbstractList#iterator()
5.46%         1524M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.18%         888M          org.apache.lucene.util.ArrayUtil#growExact()
2.97%         830M          org.apache.lucene.util.BytesRef#<init>()
2.69%         750M          java.util.ArrayList#grow()
2.60%         726M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.56%         715M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.43%         677M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.79%         499M          java.nio.DirectByteBufferR#duplicate()
1.67%         466M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.66%         463M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.58%         440M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.55%         433M          java.util.ArrayList#iterator()
1.34%         375M          org.apache.lucene.util.PriorityQueue#<init>()
1.30%         363M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.20%         333M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.14%         318M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.11%         309M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.09%         305M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.87%         244M          java.nio.DirectByteBufferR#slice()
0.83%         230M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.82%         229M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.80%         223M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.73%         204M          java.nio.DirectByteBufferR#asLongBuffer()
0.73%         203M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.72%         201M          java.util.AbstractList#listIterator()
0.68%         191M          java.util.Arrays#copyOf()


HEAP merged search profile for baseline:
PROFILE SUMMARY from 78116 events (total: 27923M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
17.15%        4789M         org.apache.lucene.util.FixedBitSet#<init>()
8.30%         2317M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.31%         2040M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.89%         1925M         java.util.AbstractList#iterator()
5.39%         1506M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.24%         904M          org.apache.lucene.util.ArrayUtil#growExact()
3.01%         840M          org.apache.lucene.util.BytesRef#<init>()
2.72%         759M          java.util.ArrayList#grow()
2.59%         724M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.50%         697M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.41%         673M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.85%         515M          java.nio.DirectByteBufferR#duplicate()
1.69%         472M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.60%         446M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.56%         434M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.50%         419M          java.util.ArrayList#iterator()
1.36%         378M          org.apache.lucene.util.PriorityQueue#<init>()
1.33%         371M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.15%         321M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.12%         313M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.07%         298M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.02%         284M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.93%         260M          java.nio.DirectByteBufferR#slice()
0.82%         228M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.81%         227M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.81%         225M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77%         214M          java.nio.DirectByteBufferR#asLongBuffer()
0.69%         193M          org.apache.lucene.queryparser.classic.Token#newToken()
0.67%         187M          java.util.AbstractList#listIterator()
0.66%         185M          java.util.Arrays#copyOf()
{noformat}

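It looks like there is a slight improvement in some queries and sorts (for
example HighTerm at +2.6% and TermDTSort at +4.3%), although with p-values of
0.092 and 0.132 these gains are within the noise. The new code is much
cleaner, so I see no reason not to commit this. I am still open to
suggestions about the FST readers.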

> Improve ByteArrayDataInput to read primitive short/int/long natively using 
> VarHandles
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10113
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10113
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-10112 reminded me of something I wanted to do long ago: Basically,
> for all IndexInputs/DataInputs we are able to natively read short, int, and
> long in little endian with single CPU instructions (because they use
> ByteBuffer's methods that support primitive reads). Only ByteArrayDataInput
> still uses manual code based on the inherited byte-by-byte approach: it reads
> single bytes and combines them in little-endian order.
> The approach here is to use Java 9+ VarHandles to read int/long/short with
> single CPU instructions instead of manually recombining the bytes. The trick
> is to create a "view" VarHandle, which allows accessing the byte array using
> the same mechanisms as ByteBuffers or JDK 17 MemorySegments (under the hood
> it uses Unsafe to issue the CPU instructions and optionally swaps bytes if
> the platform endianness is BE).
> In LUCENE-10112 something similar was done for LZ4, and a microbenchmark
> was written that showed a significant speed improvement when accessing the
> types with a VarHandle.
> P.S.: The same applies to FST.BytesReader and/or ByteSliceReader, but I am
> not sure if those use the int/short/long methods at all. At least this one
> does not override the methods to read ints, longs and shorts, so there is no
> optimization at all. FST seems to read bytes and byte[] only and
> ByteSliceReader mostly VInts.
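
For readers unfamiliar with the technique described in the issue, here is a minimal, self-contained sketch contrasting the manual byte-by-byte little-endian read with a byte-array "view" VarHandle. It is not the actual Lucene patch: the class name LittleEndianReads and all method and field names are hypothetical and chosen only for illustration. The only JDK API relied on is MethodHandles.byteArrayViewVarHandle (Java 9+).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of Lucene.
final class LittleEndianReads {

  // "View" VarHandles that interpret a byte[] as little-endian short/int/long.
  static final VarHandle VH_LE_SHORT =
      MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
  static final VarHandle VH_LE_INT =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
  static final VarHandle VH_LE_LONG =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  // Manual approach: read four single bytes and combine them, lowest byte first.
  static int readIntManual(byte[] b, int pos) {
    return (b[pos] & 0xFF)
        | ((b[pos + 1] & 0xFF) << 8)
        | ((b[pos + 2] & 0xFF) << 16)
        | ((b[pos + 3] & 0xFF) << 24);
  }

  // Manual approach for long: combine two little-endian ints.
  static long readLongManual(byte[] b, int pos) {
    return (readIntManual(b, pos) & 0xFFFFFFFFL)
        | ((readIntManual(b, pos + 4) & 0xFFFFFFFFL) << 32);
  }

  // VarHandle approach: one call; pos is a byte offset into the array.
  static short readShortVarHandle(byte[] b, int pos) {
    return (short) VH_LE_SHORT.get(b, pos);
  }

  static int readIntVarHandle(byte[] b, int pos) {
    return (int) VH_LE_INT.get(b, pos);
  }

  static long readLongVarHandle(byte[] b, int pos) {
    return (long) VH_LE_LONG.get(b, pos);
  }

  public static void main(String[] args) {
    byte[] data = {0x78, 0x56, 0x34, 0x12, (byte) 0xEF, (byte) 0xCD, (byte) 0xAB, (byte) 0x90, 0x01};
    // Both approaches should agree, including at an unaligned offset.
    System.out.println(Integer.toHexString(readIntManual(data, 0)));
    System.out.println(Integer.toHexString(readIntVarHandle(data, 0)));
    System.out.println(readLongManual(data, 1) == readLongVarHandle(data, 1));
  }
}
```

Plain get access on a byte-array view VarHandle works at any byte offset, which is why the sketch can read at offset 1. Whether the JIT actually compiles the call down to a single load depends on the JVM and platform, so any speedup should be measured rather than assumed.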




