[ https://issues.apache.org/jira/browse/LUCENE-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417323#comment-17417323 ]
Uwe Schindler commented on LUCENE-10113:
----------------------------------------
Performance comparison:
{noformat}
Task                       QPS baseline StdDev  QPS my_modified_version StdDev  Pct diff (range)      p-value
HighTermMonthSort                 80.36 (11.4%)                   78.81 (7.9%)     -1.9% (-19% - 19%)   0.533
HighTermTitleBDVSort              12.47 (20.5%)                   12.23 (18.6%)    -1.9% (-33% - 46%)   0.756
HighSloppyPhrase                  14.80 (3.9%)                    14.61 (3.4%)     -1.2% ( -8% -  6%)   0.287
OrHighLow                        191.33 (3.1%)                   189.67 (3.8%)     -0.9% ( -7% -  6%)   0.436
HighSpanNear                       2.76 (2.6%)                     2.74 (4.4%)     -0.6% ( -7% -  6%)   0.575
Fuzzy2                            57.57 (1.6%)                    57.21 (2.2%)     -0.6% ( -4% -  3%)   0.304
HighPhrase                        39.92 (3.5%)                    39.68 (2.8%)     -0.6% ( -6% -  5%)   0.546
LowSpanNear                        8.42 (2.3%)                     8.39 (2.8%)     -0.4% ( -5% -  4%)   0.663
Wildcard                          31.79 (3.8%)                    31.74 (3.9%)     -0.2% ( -7% -  7%)   0.891
IntNRQ                            94.20 (3.5%)                    94.05 (4.1%)     -0.2% ( -7% -  7%)   0.898
MedSpanNear                       29.62 (2.3%)                    29.59 (2.7%)     -0.1% ( -4% -  5%)   0.894
HighTermDayOfYearSort             56.16 (7.7%)                    56.11 (7.5%)     -0.1% (-14% - 16%)   0.969
LowSloppyPhrase                   10.42 (2.7%)                    10.42 (2.4%)     -0.0% ( -4% -  5%)   0.985
MedSloppyPhrase                    7.32 (2.5%)                     7.32 (2.3%)      0.0% ( -4% -  4%)   0.977
Prefix3                           12.62 (9.1%)                    12.63 (10.5%)     0.1% (-17% - 21%)   0.987
Fuzzy1                            89.28 (1.1%)                    89.34 (2.0%)      0.1% ( -3% -  3%)   0.903
BrowseMonthSSDVFacets              5.04 (5.5%)                     5.04 (5.2%)      0.1% (-10% - 11%)   0.966
OrNotHighLow                     571.04 (1.5%)                   571.46 (2.7%)      0.1% ( -4% -  4%)   0.917
MedIntervalsOrdered               36.70 (5.6%)                    36.78 (5.7%)      0.2% (-10% - 12%)   0.905
PKLookup                         203.36 (3.8%)                   203.81 (3.2%)      0.2% ( -6% -  7%)   0.845
HighIntervalsOrdered               3.55 (5.3%)                     3.56 (5.0%)      0.2% ( -9% - 11%)   0.891
Respell                           59.27 (1.3%)                    59.44 (1.7%)      0.3% ( -2% -  3%)   0.548
MedPhrase                        367.55 (2.0%)                   368.61 (1.7%)      0.3% ( -3% -  4%)   0.623
AndHighLow                       560.26 (3.3%)                   561.90 (3.8%)      0.3% ( -6% -  7%)   0.795
OrNotHighMed                     971.44 (2.4%)                   974.30 (3.2%)      0.3% ( -5% -  6%)   0.742
LowPhrase                         41.63 (2.5%)                    41.76 (2.3%)      0.3% ( -4% -  5%)   0.672
LowIntervalsOrdered               94.44 (3.1%)                    94.75 (3.2%)      0.3% ( -5% -  6%)   0.744
MedTerm                         1590.31 (4.9%)                  1596.02 (5.0%)      0.4% ( -9% - 10%)   0.819
OrHighNotHigh                    958.25 (3.5%)                   964.26 (3.7%)      0.6% ( -6% -  8%)   0.581
LowTerm                         1527.92 (2.5%)                  1538.97 (3.0%)      0.7% ( -4% -  6%)   0.412
OrHighHigh                        26.32 (3.0%)                    26.55 (3.4%)      0.9% ( -5% -  7%)   0.373
OrHighNotMed                    1177.62 (4.3%)                  1188.50 (4.8%)      0.9% ( -7% - 10%)   0.522
OrHighNotLow                    1215.18 (4.5%)                  1227.52 (4.6%)      1.0% ( -7% - 10%)   0.481
OrHighMed                         65.77 (4.0%)                    66.50 (3.7%)      1.1% ( -6% -  9%)   0.365
AndHighMed                        44.34 (4.4%)                    44.84 (5.0%)      1.1% ( -7% - 11%)   0.449
OrNotHighHigh                    783.60 (3.9%)                   792.60 (4.6%)      1.1% ( -7% -  9%)   0.392
AndHighHigh                       38.95 (4.7%)                    39.44 (4.6%)      1.3% ( -7% - 11%)   0.392
BrowseDayOfYearSSDVFacets          4.68 (10.0%)                    4.77 (9.7%)      1.9% (-16% - 23%)   0.551
BrowseMonthTaxoFacets              1.20 (9.0%)                     1.23 (9.7%)      2.3% (-15% - 23%)   0.437
BrowseDayOfYearTaxoFacets          1.15 (9.7%)                     1.18 (11.0%)     2.4% (-16% - 25%)   0.461
HighTerm                        2329.95 (4.5%)                  2391.41 (5.3%)      2.6% ( -6% - 13%)   0.092
BrowseDateTaxoFacets               1.16 (9.7%)                     1.19 (11.1%)     2.7% (-16% - 25%)   0.421
TermDTSort                        65.25 (7.7%)                    68.06 (10.3%)     4.3% (-12% - 24%)   0.132

CPU merged search profile for my_modified_version:
PROFILE SUMMARY from 1454870 events (total: 1M)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT  CPU SAMPLES  STACK
9.22%    134161       org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.22%    119591       org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
5.35%    77845        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.52%    51219        java.util.Collections$UnmodifiableCollection$1#<init>()
3.49%    50705        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.44%    35538        org.apache.lucene.store.ByteBufferGuard#getShort()
2.37%    34551        java.nio.Buffer#scope()
2.03%    29597        java.nio.ByteBuffer#getArray()
2.02%    29320        jdk.internal.misc.Unsafe#convEndian()
1.91%    27740        java.nio.DirectByteBuffer#getShort()
1.90%    27626        org.apache.lucene.store.ByteBufferIndexInput#readBytes()
1.81%    26300        java.util.Objects#checkIndex()
1.76%    25665        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.60%    23302        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.51%    21980        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.51%    21925        org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.22%    17802        jdk.internal.misc.Unsafe#copyMemory()
1.16%    16832        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.14%    16544        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.11%    16206        jdk.internal.util.Preconditions#checkFromIndexSize()
1.10%    16072        org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.07%    15617        java.nio.Buffer#position()
1.05%    15245        org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.97%    14144        org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.95%    13785        org.apache.lucene.search.ConjunctionDISI#doNext()
0.94%    13737        org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.94%    13647        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.86%    12467        java.util.Collections$UnmodifiableCollection$1#hasNext()
0.82%    11951        org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.82%    11948        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()

CPU merged search profile for baseline:
PROFILE SUMMARY from 1455846 events (total: 1M)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT  CPU SAMPLES  STACK
9.92%    144437       org.apache.lucene.util.packed.DirectMonotonicReader#get()
8.91%    129653       org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
6.42%    93437        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$17#binaryValue()
3.60%    52446        java.util.Collections$UnmodifiableCollection$1#<init>()
3.30%    48044        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.33%    33969        org.apache.lucene.store.ByteBufferIndexInput#readBytes()
2.26%    32835        java.nio.ByteBuffer#getArray()
1.99%    28936        java.util.Objects#checkIndex()
1.96%    28474        org.apache.lucene.store.ByteBufferGuard#getShort()
1.92%    27908        java.nio.Buffer#scope()
1.87%    27199        jdk.internal.util.Preconditions#checkFromIndexSize()
1.77%    25702        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.71%    24943        jdk.internal.misc.Unsafe#convEndian()
1.67%    24365        org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$VaryingBPVReader#getLongValue()
1.67%    24310        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
1.57%    22795        org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.51%    21924        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
1.23%    17931        java.nio.Buffer#position()
1.18%    17145        jdk.internal.misc.Unsafe#copyMemory()
1.16%    16845        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
1.12%    16270        org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.10%    15971        java.nio.DirectByteBuffer#getShort()
0.99%    14447        org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
0.98%    14252        org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
0.95%    13817        org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.93%    13573        org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
0.92%    13323        org.apache.lucene.search.ConjunctionDISI#doNext()
0.80%    11677        java.util.Collections$UnmodifiableCollection$1#hasNext()
0.78%    11422        org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.76%    11123        org.apache.lucene.queries.spans.NearSpansOrdered#nextStartPosition()

HEAP merged search profile for my_modified_version:
PROFILE SUMMARY from 78058 events (total: 27928M)
tests.profile.mode=heap
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT  HEAP SAMPLES  STACK
17.16%   4792M         org.apache.lucene.util.FixedBitSet#<init>()
8.40%    2344M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.41%    2070M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.80%    1898M         java.util.AbstractList#iterator()
5.46%    1524M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.18%    888M          org.apache.lucene.util.ArrayUtil#growExact()
2.97%    830M          org.apache.lucene.util.BytesRef#<init>()
2.69%    750M          java.util.ArrayList#grow()
2.60%    726M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.56%    715M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.43%    677M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.79%    499M          java.nio.DirectByteBufferR#duplicate()
1.67%    466M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.66%    463M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.58%    440M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.55%    433M          java.util.ArrayList#iterator()
1.34%    375M          org.apache.lucene.util.PriorityQueue#<init>()
1.30%    363M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.20%    333M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.14%    318M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.11%    309M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.09%    305M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.87%    244M          java.nio.DirectByteBufferR#slice()
0.83%    230M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.82%    229M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.80%    223M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.73%    204M          java.nio.DirectByteBufferR#asLongBuffer()
0.73%    203M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.72%    201M          java.util.AbstractList#listIterator()
0.68%    191M          java.util.Arrays#copyOf()

HEAP merged search profile for baseline:
PROFILE SUMMARY from 78116 events (total: 27923M)
tests.profile.mode=heap
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT  HEAP SAMPLES  STACK
17.15%   4789M         org.apache.lucene.util.FixedBitSet#<init>()
8.30%    2317M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
7.31%    2040M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
6.89%    1925M         java.util.AbstractList#iterator()
5.39%    1506M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.24%    904M          org.apache.lucene.util.ArrayUtil#growExact()
3.01%    840M          org.apache.lucene.util.BytesRef#<init>()
2.72%    759M          java.util.ArrayList#grow()
2.59%    724M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
2.50%    697M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.41%    673M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.85%    515M          java.nio.DirectByteBufferR#duplicate()
1.69%    472M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.60%    446M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.56%    434M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.50%    419M          java.util.ArrayList#iterator()
1.36%    378M          org.apache.lucene.util.PriorityQueue#<init>()
1.33%    371M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.15%    321M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.12%    313M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.07%    298M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.02%    284M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.93%    260M          java.nio.DirectByteBufferR#slice()
0.82%    228M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#<init>()
0.81%    227M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.81%    225M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77%    214M          java.nio.DirectByteBufferR#asLongBuffer()
0.69%    193M          org.apache.lucene.queryparser.classic.Token#newToken()
0.67%    187M          java.util.AbstractList#listIterator()
0.66%    185M          java.util.Arrays#copyOf()
{noformat}
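It looks like there's a slight improvement in some queries and sorts (e.g.
TermDTSort +4.3%, HighTerm +2.6%), although none of the differences is
statistically significant (all p-values are above 0.05). The new code is much
cleaner, so I see no reason not to commit this. I am still open to suggestions
about the FST readers.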
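As a quick arithmetic check of the size of these changes, here is a minimal Java sketch, assuming that the "Pct diff" column is the relative change in mean QPS of my_modified_version over baseline. The class name PctDiff and its method are made up for illustration; the QPS values are taken from the table above.

```java
import java.util.Locale;

// Hypothetical helper, assuming "Pct diff" = (candidate - baseline) / baseline * 100.
public class PctDiff {

    /** Relative change from baseline QPS to candidate QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        System.out.printf(Locale.ROOT, "HighTerm:          %+.1f%%%n", pctDiff(2329.95, 2391.41)); // +2.6%
        System.out.printf(Locale.ROOT, "TermDTSort:        %+.1f%%%n", pctDiff(65.25, 68.06));     // +4.3%
        System.out.printf(Locale.ROOT, "HighTermMonthSort: %+.1f%%%n", pctDiff(80.36, 78.81));     // -1.9%
    }
}
```

This reproduces the table's +2.6%, +4.3% and -1.9% entries.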
> Improve ByteArrayDataInput to read primitive short/int/long natively using VarHandles
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-10113
> URL: https://issues.apache.org/jira/browse/LUCENE-10113
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/store
> Affects Versions: main (9.0)
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Major
> Fix For: main (9.0)
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LUCENE-10112 reminded me of something I wanted to do long ago: for all
> IndexInputs/DataInputs we are able to natively read short, int, and long in
> little endian with single CPU instructions (because they use ByteBuffer's
> methods that support primitive reads). Only ByteArrayDataInput still uses
> manual code based on the inherited byte-by-byte approach, reading single
> bytes and combining them in little-endian order.
> The approach here is to use Java 9+ VarHandles to read int/long/short with
> single CPU instructions instead of manually recombining the bytes. The trick
> is to create a "view" VarHandle that accesses the byte array using the same
> mechanisms as ByteBuffers or JDK 17 MemorySegments (under the hood, it uses
> Unsafe to emit the CPU instructions and, if the platform is big-endian,
> swaps the bytes).
> In LUCENE-10112, similar work was done for LZ4, and a microbenchmark showed
> a significant speed improvement when accessing these types through a
> VarHandle.
> P.S.: The same applies to FST.BytesReader and/or ByteSliceReader, but I am
> not sure whether those read ints, shorts, or longs at all. At least this one
> does not override the methods to read ints, longs, and shorts, so there is
> no optimization at all. FST seems to read only bytes and byte[], and
> ByteSliceReader mostly VInts.
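The VarHandle approach described in the issue can be sketched in plain Java 9+. This is a hypothetical, simplified reader (the class VarHandleLittleEndianReader and its methods are made up for illustration and are not Lucene's ByteArrayDataInput); it contrasts byte-by-byte little-endian decoding with a single access through a view VarHandle obtained from MethodHandles.byteArrayViewVarHandle. Whether the JIT really emits one load instruction is up to the JVM; the issue describes it as Unsafe-based under the hood.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch, not Lucene's ByteArrayDataInput.
public class VarHandleLittleEndianReader {

    // View handles that read a byte[] as little-endian short/int/long at any byte offset.
    private static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private final byte[] bytes;
    private int pos;

    VarHandleLittleEndianReader(byte[] bytes, int offset) {
        this.bytes = bytes;
        this.pos = offset;
    }

    void setPosition(int pos) {
        this.pos = pos;
    }

    // Byte-by-byte approach: read four single bytes and combine them little-endian.
    int readIntManually() {
        final byte b1 = bytes[pos++];
        final byte b2 = bytes[pos++];
        final byte b3 = bytes[pos++];
        final byte b4 = bytes[pos++];
        return ((b4 & 0xFF) << 24) | ((b3 & 0xFF) << 16) | ((b2 & 0xFF) << 8) | (b1 & 0xFF);
    }

    // VarHandle approach: one access per value; the cast fixes the polymorphic return type.
    short readShort() {
        final short v = (short) SHORT_LE.get(bytes, pos);
        pos += Short.BYTES;
        return v;
    }

    int readInt() {
        final int v = (int) INT_LE.get(bytes, pos);
        pos += Integer.BYTES;
        return v;
    }

    long readLong() {
        final long v = (long) LONG_LE.get(bytes, pos);
        pos += Long.BYTES;
        return v;
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, (byte) 0xFF, 0x7F};
        VarHandleLittleEndianReader in = new VarHandleLittleEndianReader(data, 0);
        System.out.println(Integer.toHexString(in.readIntManually())); // 4030201
        in.setPosition(0);
        System.out.println(Integer.toHexString(in.readInt()));         // 4030201
        in.setPosition(0);
        System.out.println(Long.toHexString(in.readLong()));           // 807060504030201
        System.out.println(in.readShort());                            // 32767
    }
}
```

The handles are static final so the JIT can treat them as constants. Plain get() is allowed at misaligned offsets (only atomic access modes require alignment), which matters because DataInput reads are not aligned.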
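The P.S. notes that ByteSliceReader mostly reads VInts. A minimal sketch of Lucene-style vInt decoding suggests why a fixed-width VarHandle read would not help there: each byte carries 7 data bits (low-order group first), and its high bit says whether another byte follows, so the length of a value is only known after inspecting each byte. The class VIntDecodeSketch is hypothetical and only follows the documented vInt format; it is not Lucene's code.

```java
// Hypothetical sketch of vInt decoding: 7 data bits per byte, high bit = "more bytes follow".
public class VIntDecodeSketch {

    private final byte[] bytes;
    private int pos;

    VIntDecodeSketch(byte[] bytes) {
        this.bytes = bytes;
    }

    int readVInt() {
        int value = 0;
        // At most 5 bytes (shifts 0, 7, 14, 21, 28) for a 32-bit int.
        for (int shift = 0; shift < 32; shift += 7) {
            final byte b = bytes[pos++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // high bit clear: this was the last byte
            }
        }
        throw new IllegalStateException("Invalid vInt detected (too many bits)");
    }

    public static void main(String[] args) {
        // 300 encodes as 0xAC 0x02: low 7 bits 0x2C plus continuation bit, then 300 >>> 7 = 2.
        VIntDecodeSketch in = new VIntDecodeSketch(new byte[] {(byte) 0xAC, 0x02, 0x05});
        System.out.println(in.readVInt()); // 300
        System.out.println(in.readVInt()); // 5
    }
}
```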