uschindler edited a comment on pull request #177:
URL: https://github.com/apache/lucene/pull/177#issuecomment-861046986


   I did more testing and was able to make MMapDirectory run at the same speed. The patch that fixes the performance issue also shows where the problem lies:
https://github.com/uschindler/lucene/commit/b057213cc6548ffa29b8e2810f39eb84c50a3bc0
   
   The main issue is the copying that Lucene still does between the mmapped segment and the heap:
   ```java
     private static void copySegmentToHeap(MemorySegment src, long srcOffset,
         byte[] target, int targetOffset, int len) {
       Objects.checkFromIndexSize(srcOffset, len, src.byteSize());
       // Raw copy from the native address of the segment into the heap array:
       theUnsafe.copyMemory(null, src.address().toRawLongValue() + srcOffset,
           target, Unsafe.ARRAY_BYTE_BASE_OFFSET + targetOffset, len);
       // Official MemorySegment API variant (commented out):
       // MemorySegment.ofArray(target).asSlice(targetOffset, len).copyFrom(src.asSlice(srcOffset, len));
     }
   ```
   
   If you do the memory copy using Unsafe, as this static method shows, the performance is identical to the baseline:
   
   ```
                        Task  QPS baseline    StdDev  QPS my_modified_version    StdDev                Pct diff p-value
                    PKLookup        186.78    (2.2%)                   180.11    (1.4%)   -3.6% (  -7% -    0%)   0.000
       BrowseMonthSSDVFacets          4.14    (6.0%)                     4.05    (6.3%)   -2.3% ( -13% -   10%)   0.238
   BrowseDayOfYearSSDVFacets          4.03    (4.4%)                     3.94    (4.7%)   -2.2% ( -10% -    7%)   0.123
                     Prefix3          5.43   (10.4%)                     5.36   (11.7%)   -1.3% ( -21% -   23%)   0.708
                      IntNRQ         41.13    (2.0%)                    40.61    (1.8%)   -1.2% (  -4% -    2%)   0.040
             LowSloppyPhrase          5.85    (4.2%)                     5.79    (3.8%)   -1.0% (  -8% -    7%)   0.411
            HighSloppyPhrase          9.74    (4.1%)                     9.64    (3.9%)   -1.0% (  -8% -    7%)   0.416
                  HighPhrase        226.76    (2.7%)                   225.13    (2.9%)   -0.7% (  -6% -    5%)   0.417
                   MedPhrase        204.52    (2.7%)                   203.77    (2.5%)   -0.4% (  -5% -    4%)   0.656
                      Fuzzy1         84.24    (4.5%)                    84.20    (5.9%)   -0.0% (  -9% -   10%)   0.979
                   LowPhrase         23.33    (2.6%)                    23.33    (2.7%)   -0.0% (  -5% -    5%)   0.986
                     Respell         34.95    (1.5%)                    34.97    (1.6%)    0.1% (  -3% -    3%)   0.898
                    HighTerm       1128.45    (4.8%)                  1129.85    (4.7%)    0.1% (  -8% -   10%)   0.934
           HighTermMonthSort         86.97   (17.0%)                    87.24   (14.5%)    0.3% ( -26% -   38%)   0.951
                OrHighNotLow        913.63    (4.0%)                   919.28    (3.5%)    0.6% (  -6% -    8%)   0.603
                OrNotHighLow        630.14    (2.8%)                   635.01    (3.0%)    0.8% (  -4% -    6%)   0.401
                 AndHighHigh         42.87    (4.2%)                    43.22    (4.4%)    0.8% (  -7% -    9%)   0.550
                     MedTerm       1504.76    (4.3%)                  1519.72    (3.8%)    1.0% (  -6% -    9%)   0.440
             MedSloppyPhrase         38.70    (3.2%)                    39.11    (2.6%)    1.1% (  -4% -    7%)   0.251
                  AndHighLow        464.18    (4.4%)                   469.92    (4.6%)    1.2% (  -7% -   10%)   0.388
                OrNotHighMed        762.97    (2.8%)                   773.13    (3.2%)    1.3% (  -4% -    7%)   0.162
                   OrHighLow        290.51    (4.7%)                   294.49    (4.3%)    1.4% (  -7% -   10%)   0.339
                      Fuzzy2         51.19    (8.4%)                    51.90    (7.2%)    1.4% ( -13% -   18%)   0.576
                     LowTerm       1892.07    (2.9%)                  1919.02    (4.0%)    1.4% (  -5% -    8%)   0.199
                  AndHighMed         43.09    (5.0%)                    43.76    (4.7%)    1.6% (  -7% -   11%)   0.311
                OrHighNotMed        708.72    (3.0%)                   719.93    (2.7%)    1.6% (  -4% -    7%)   0.084
               OrHighNotHigh        791.29    (4.2%)                   803.91    (3.7%)    1.6% (  -6% -    9%)   0.204
                   OrHighMed         37.87    (4.8%)                    38.52    (3.2%)    1.7% (  -5% -   10%)   0.179
               OrNotHighHigh        755.35    (3.4%)                   769.72    (3.6%)    1.9% (  -4% -    9%)   0.084
       HighTermDayOfYearSort         66.49   (17.2%)                    67.81   (15.3%)    2.0% ( -25% -   41%)   0.700
        HighIntervalsOrdered          5.10    (4.0%)                     5.20    (3.9%)    2.0% (  -5% -   10%)   0.108
        HighTermTitleBDVSort         71.78   (17.6%)                    73.23   (16.8%)    2.0% ( -27% -   44%)   0.712
                  OrHighHigh          9.62    (6.7%)                     9.83    (3.2%)    2.2% (  -7% -   13%)   0.181
                    Wildcard         13.65    (9.0%)                    14.11   (10.0%)    3.4% ( -14% -   24%)   0.264
                  TermDTSort         61.68   (15.7%)                    64.16    (9.1%)    4.0% ( -18% -   34%)   0.324
                HighSpanNear          8.32    (2.6%)                     8.72    (3.1%)    4.7% (   0% -   10%)   0.000
                 LowSpanNear         10.98    (2.1%)                    11.54    (2.2%)    5.1% (   0% -    9%)   0.000
                 MedSpanNear          8.34    (2.0%)                     8.77    (2.2%)    5.2% (   0% -    9%)   0.000
       BrowseMonthTaxoFacets          1.02    (3.5%)                     1.13   (13.6%)   10.1% (  -6% -   28%)   0.001
   BrowseDayOfYearTaxoFacets          1.00    (4.6%)                     1.11   (14.4%)   10.5% (  -8% -   30%)   0.002
        BrowseDateTaxoFacets          1.00    (4.6%)                     1.11   (14.4%)   10.7% (  -8% -   31%)   0.002
    
   CPU merged search profile for my_modified_version:
   JFR aggregation command: 
/home/jenkins/tools/java/64bit/latest-jdk17/bin/java --add-modules 
jdk.incubator.foreign -server -Xms2g -Xmx2g -XX:-TieredCompilation 
-XX:+HeapDumpOnOutOfMemoryError -Xbatch -cp 
/home/thetaphi/benchmark/lucene_candidate/buildSrc/build/classes/java/main 
-Dtests.profile.mode=cpu -Dtests.profile.stacksize=1 -Dtests.profile.count=30 
org.apache.lucene.gradle.ProfileResults 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-12.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-8.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-11.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-18.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-17.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-0.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-14.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-10.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-4.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-5.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-16.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-2.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-6.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-15.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-1.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-19.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-13.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-7.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-9.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-3.jfr
   Took 4.04 seconds
   WARNING: Using incubator modules: jdk.incubator.foreign
   PROFILE SUMMARY from 1544656 events (total: 1M)
     tests.profile.mode=cpu
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   15.33%        236729        org.apache.lucene.util.packed.DirectMonotonicReader#get()
   9.95%         153617        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   7.58%         117056        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$15#binaryValue()
   5.42%         83667         org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
   4.47%         68989         jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal()
   4.43%         68353         jdk.internal.foreign.Utils#filterSegment()
   2.49%         38393         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
   1.98%         30594         jdk.internal.foreign.AbstractMemorySegmentImpl#checkBoundsSmall()
   1.91%         29460         org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc()
   1.82%         28104         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
   1.73%         26738         jdk.internal.foreign.AbstractMemorySegmentImpl#isSet()
   1.72%         26516         sun.misc.Unsafe#copyMemory()
   1.63%         25197         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
   1.55%         23880         jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
   1.42%         21908         jdk.internal.util.Preconditions#checkFromIndexSize()
   1.31%         20268         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
   1.10%         16954         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
   1.00%         15398         org.apache.lucene.search.ConjunctionDISI#doNext()
   0.94%         14473         java.lang.invoke.VarHandleGuards#guard_LJ_I()
   0.90%         13916         jdk.internal.foreign.SharedScope#checkValidState()
   0.83%         12820         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
   0.78%         12064         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
   0.73%         11224         org.apache.lucene.index.SingletonSortedSetDocValues#nextOrd()
   0.72%         11055         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
   0.71%         10950         org.apache.lucene.store.DataInput#readVInt()
   0.69%         10643         org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl#seek()
   0.69%         10612         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
   0.61%         9380          jdk.internal.foreign.AbstractMemorySegmentImpl#scope()
   0.57%         8825          org.apache.lucene.util.BitSet#or()
   0.57%         8747          java.util.Objects#checkIndex()
    
    
   CPU merged search profile for baseline:
   JFR aggregation command: 
/home/jenkins/tools/java/64bit/latest-jdk17/bin/java --add-modules 
jdk.incubator.foreign -server -Xms2g -Xmx2g -XX:-TieredCompilation 
-XX:+HeapDumpOnOutOfMemoryError -Xbatch -cp 
/home/thetaphi/benchmark/lucene_baseline/buildSrc/build/classes/java/main 
-Dtests.profile.mode=cpu -Dtests.profile.stacksize=1 -Dtests.profile.count=30 
org.apache.lucene.gradle.ProfileResults 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-3.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-17.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-16.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-13.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-12.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-5.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-18.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-9.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-1.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-19.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-4.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-2.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-8.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-11.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-10.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-14.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-15.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-0.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-7.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-6.jfr
   Took 4.24 seconds
   WARNING: Using incubator modules: jdk.incubator.foreign
   PROFILE SUMMARY from 1712367 events (total: 1M)
     tests.profile.mode=cpu
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       CPU SAMPLES   STACK
   12.99%        222468        org.apache.lucene.util.packed.DirectMonotonicReader#get()
   9.56%         163732        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
   5.78%         99057         java.nio.ByteBuffer#getArray()
   5.05%         86545         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$15#binaryValue()
   4.12%         70481         org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
   3.96%         67813         java.nio.Buffer#scope()
   3.87%         66312         jdk.internal.misc.Unsafe#convEndian()
   3.81%         65189         org.apache.lucene.store.ByteBufferGuard#ensureValid()
   3.48%         59539         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
   2.30%         39343         org.apache.lucene.store.ByteBufferGuard#getBytes()
   2.14%         36573         java.nio.ByteBuffer#get()
   1.88%         32278         org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc()
   1.67%         28624         java.nio.Buffer#checkIndex()
   1.53%         26222         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
   1.22%         20948         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
   1.19%         20403         org.apache.lucene.store.ByteBufferGuard#getShort()
   1.03%         17553         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
   1.01%         17218         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
   0.94%         16167         java.nio.Buffer#position()
   0.89%         15218         org.apache.lucene.store.ByteBufferIndexInput#readBytes()
   0.88%         15123         org.apache.lucene.search.ConjunctionDISI#doNext()
   0.76%         12938         org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek()
   0.75%         12878         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$19#ordValue()
   0.75%         12813         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
   0.73%         12537         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
   0.69%         11839         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
   0.69%         11738         org.apache.lucene.store.DataInput#readVInt()
   0.65%         11070         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
   0.62%         10575         jdk.internal.util.Preconditions#checkFromIndexSize()
   0.59%         10022         jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
    
    
   HEAP merged search profile for my_modified_version:
   JFR aggregation command: 
/home/jenkins/tools/java/64bit/latest-jdk17/bin/java --add-modules 
jdk.incubator.foreign -server -Xms2g -Xmx2g -XX:-TieredCompilation 
-XX:+HeapDumpOnOutOfMemoryError -Xbatch -cp 
/home/thetaphi/benchmark/lucene_candidate/buildSrc/build/classes/java/main 
-Dtests.profile.mode=heap -Dtests.profile.stacksize=1 -Dtests.profile.count=30 
org.apache.lucene.gradle.ProfileResults 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-12.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-8.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-11.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-18.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-17.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-0.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-14.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-10.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-4.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-5.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-16.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-2.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-6.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-15.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-1.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-19.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-13.jfr
 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-7.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-9.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-my_modified_version-3.jfr
   Took 2.67 seconds
   WARNING: Using incubator modules: jdk.incubator.foreign
   PROFILE SUMMARY from 68380 events (total: 25828M)
     tests.profile.mode=heap
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       HEAP SAMPLES  STACK
   18.44%        4761M         org.apache.lucene.util.FixedBitSet#<init>()
   7.55%         1948M         java.util.AbstractList#iterator()
   5.46%         1409M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
   5.21%         1345M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
   4.55%         1174M         org.apache.lucene.util.BytesRef#<init>()
   4.14%         1068M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
   3.95%         1020M         org.apache.lucene.util.fst.ByteSequenceOutputs#read()
   3.75%         967M          org.apache.lucene.util.ArrayUtil#growExact()
   3.13%         808M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
   2.90%         749M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
   2.07%         534M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
   1.90%         492M          java.util.ArrayList#grow()
   1.78%         460M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
   1.43%         369M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
   1.32%         340M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
   1.31%         337M          java.util.AbstractList#listIterator()
   1.27%         328M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
   1.23%         317M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
   1.13%         290M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
   1.11%         287M          java.util.ArrayList#iterator()
   1.09%         282M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
   1.07%         276M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
   1.06%         274M          org.apache.lucene.store.MemorySegmentIndexInput#buildSlice()
   1.01%         259M          java.util.Arrays#asList()
   0.97%         250M          java.util.Arrays#copyOf()
   0.96%         248M          org.apache.lucene.util.PriorityQueue#<init>()
   0.91%         233M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
   0.87%         223M          org.apache.lucene.queryparser.classic.Token#newToken()
   0.86%         222M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
   0.84%         216M          jdk.internal.foreign.MappedMemorySegmentImpl#dup()
    
    
   HEAP merged search profile for baseline:
   JFR aggregation command: 
/home/jenkins/tools/java/64bit/latest-jdk17/bin/java --add-modules 
jdk.incubator.foreign -server -Xms2g -Xmx2g -XX:-TieredCompilation 
-XX:+HeapDumpOnOutOfMemoryError -Xbatch -cp 
/home/thetaphi/benchmark/lucene_baseline/buildSrc/build/classes/java/main 
-Dtests.profile.mode=heap -Dtests.profile.stacksize=1 -Dtests.profile.count=30 
org.apache.lucene.gradle.ProfileResults 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-3.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-17.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-16.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-13.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-12.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-5.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-18.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-9.jfr
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-1.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-19.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-4.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-2.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-8.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-11.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-10.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-14.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-15.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-0.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-7.jfr 
/home/thetaphi/benchmark/util/bench-search-baseline_vs_patch-baseline-6.jfr
   Took 2.58 seconds
   WARNING: Using incubator modules: jdk.incubator.foreign
   PROFILE SUMMARY from 69795 events (total: 26355M)
     tests.profile.mode=heap
     tests.profile.count=30
     tests.profile.stacksize=1
     tests.profile.linenumbers=false
   PERCENT       HEAP SAMPLES  STACK
   17.98%        4739M         org.apache.lucene.util.FixedBitSet#<init>()
   7.47%         1968M         java.util.AbstractList#iterator()
   5.16%         1359M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
   4.97%         1309M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
   4.44%         1169M         org.apache.lucene.util.BytesRef#<init>()
   4.08%         1074M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
   3.91%         1030M         org.apache.lucene.util.fst.ByteSequenceOutputs#read()
   3.38%         892M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
   3.36%         885M          org.apache.lucene.util.ArrayUtil#growExact()
   2.81%         741M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
   2.08%         548M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
   2.00%         528M          java.util.ArrayList#grow()
   1.82%         479M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
   1.58%         415M          java.nio.DirectByteBufferR#duplicate()
   1.43%         376M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
   1.41%         370M          java.util.AbstractList#listIterator()
   1.33%         350M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
   1.26%         330M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
   1.17%         307M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
   1.16%         306M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
   1.14%         299M          java.util.ArrayList#iterator()
   1.12%         295M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
   1.00%         263M          java.nio.DirectByteBufferR#slice()
   0.98%         257M          java.util.Arrays#asList()
   0.97%         254M          org.apache.lucene.util.PriorityQueue#<init>()
   0.91%         239M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
   0.88%         231M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
   0.73%         192M          java.nio.DirectByteBufferR#asLongBuffer()
   0.70%         184M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
   0.68%         179M          org.apache.lucene.store.ByteBufferIndexInput#newCloneInstance()
   ```
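
As a reading aid for the results table at the top of the block above: a minimal sketch, assuming the `Pct diff` column is the relative QPS change of `my_modified_version` against `baseline`. The class and method names are made up for illustration and are not part of Lucene or its benchmark tooling. Because the table prints rounded QPS values, recomputing from them only approximately reproduces the printed column.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
final class PctDiffSketch {

  // Relative change of the candidate against the baseline, in percent.
  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the PKLookup and MedSpanNear rows of the table.
    System.out.println(String.format(Locale.ROOT, "PKLookup    %.1f%%", pctDiff(186.78, 180.11)));
    System.out.println(String.format(Locale.ROOT, "MedSpanNear %.1f%%", pctDiff(8.34, 8.77)));
  }
}
```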
   
   If you comment out the Unsafe code and use the "official MemorySegment API", the whole thing goes crazy:
   - The runtime of a benchmark run on my machine goes up from 57 seconds to 75 seconds, while with Unsafe it stays identical to the baseline.
   - In addition, it produces a lot of garbage: the heap dump contains many `HeapMemorySegmentImpl$OfByte` instances (these are the wrappers around `byte[]` when viewed as a MemorySegment). Every wrapping produces a new instance that is not eliminated by escape analysis. This slows things down! According to JFR, 50% of the heap is filled with those objects.
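
The per-call wrapper pattern can be illustrated with the long-established NIO API. This is only an analogy and not the Panama code from the patch: `ByteBuffer.wrap` stands in for `MemorySegment.ofArray`, a direct `ByteBuffer` stands in for the mmapped segment, and the class and method names are invented for the sketch. Both variants copy the same bytes, but the second one creates a short-lived wrapper around the `byte[]` on every call, which is the kind of allocation described above when escape analysis does not remove it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration using java.nio; not the code from the patch.
final class CopyToHeapSketch {

  // Variant A: bulk read straight into the caller's array.
  // No wrapper object around `target` is created. Moves the position of `src`.
  static void copyDirect(ByteBuffer src, int srcOffset, byte[] target, int targetOffset, int len) {
    Objects.checkFromIndexSize(srcOffset, len, src.capacity());
    src.clear(); // position = 0, limit = capacity; the contents are untouched
    src.position(srcOffset);
    src.get(target, targetOffset, len);
  }

  // Variant B: wrap the array on every call, then copy buffer to buffer.
  static void copyViaWrapper(ByteBuffer src, int srcOffset, byte[] target, int targetOffset, int len) {
    Objects.checkFromIndexSize(srcOffset, len, src.capacity());
    src.clear();
    src.limit(srcOffset + len);
    src.position(srcOffset);
    ByteBuffer dst = ByteBuffer.wrap(target, targetOffset, len); // new wrapper object per call
    dst.put(src);
  }

  public static void main(String[] args) {
    ByteBuffer src = ByteBuffer.allocateDirect(16); // off-heap stand-in for the mmapped segment
    for (int i = 0; i < 16; i++) {
      src.put(i, (byte) i);
    }
    byte[] a = new byte[8];
    byte[] b = new byte[8];
    copyDirect(src, 4, a, 2, 5);
    copyViaWrapper(src, 4, b, 2, 5);
    System.out.println(Arrays.equals(a, b)); // both variants copy the same bytes
  }
}
```

Whether the JIT can eliminate the wrapper in a given call site depends on inlining and escape analysis, so the sketch only shows where the extra object comes from, not how expensive it is.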
   
   I will report this problem to Project Panama! @mcimadamore

