[ https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530312#comment-17530312 ]
ASF subversion and git services commented on LUCENE-10542: ---------------------------------------------------------- Commit fba1a68b4518b799d6e4c8637448e032825549cb in lucene's branch refs/heads/branch_9x from Kevin Risden [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fba1a68b451 ] LUCENE-10542: FieldSource exists implementations can avoid value retrieval (#847) > FieldSource exists implementations can avoid value retrieval > ------------------------------------------------------------ > > Key: LUCENE-10542 > URL: https://issues.apache.org/jira/browse/LUCENE-10542 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Kevin Risden > Assignee: Kevin Risden > Priority: Minor > Attachments: flamegraph_getValueForDoc.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > While looking at LUCENE-10534, found that *FieldSource exists implementation > after LUCENE-7407 can avoid value lookup when just checking for exists. > Flamegraphs - x axis = time spent as a percentage of time being profiled, y > axis = stack trace bottom being first call top being last call > Looking only at the left most getValueForDoc highlight only (and it helps to > make it bigger or download the original) > !flamegraph_getValueForDoc.png|height=410,width=1000! > LongFieldSource#exists spends MOST of its time doing a > LongFieldSource#getValueForDoc. LongFieldSource#getValueForDoc spends its > time doing two things primarily: > * FilterNumericDocValues#longValue() > * advance() > This makes sense based on looking at the code (copied below to make it easier > to see at once) > https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72 > {code:java} > private long getValueForDoc(int doc) throws IOException { > if (doc < lastDocID) { > throw new IllegalArgumentException( > "docs were sent out-of-order: lastDocID=" + lastDocID + " vs > docID=" + doc); > } > lastDocID = doc; > int curDocID = arr.docID(); > if (doc > curDocID) { > curDocID = arr.advance(doc); > } > if (doc == curDocID) { > return arr.longValue(); > } else { > return 0; > } > } > {code} > LongFieldSource#exists - doesn't care about the actual longValue. Just that > there was a value found when iterating through the doc values. > https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95 > {code:java} > @Override > public boolean exists(int doc) throws IOException { > getValueForDoc(doc); > return arr.docID() == doc; > } > {code} > So putting this all together for exists calling getValueForDoc, we spent ~50% > of the time trying to get the long value when we don't need it in exists. We > can save that 50% of time making exists not care about the actual value and > just return if doc == curDocID basically. > This 50% extra is exaggerated in MaxFloatFunction (and other places) since > exists() is being called a bunch. Eventually the value will be needed from > longVal(), but if we call exists() say 3 times for every longVal(), we are > spending a lot of time computing the value when we only need to check for > existence. > I found the same pattern in DoubleFieldSource, EnumFieldSource, > FloatFieldSource, IntFieldSource, LongFieldSource. I put together a change > showing what this would look like: > ---- > Simple JMH performance tests comparing the original FloatFieldSource to the > new ones from PR #847. > > | Benchmark | Mode | > Cnt | Score and Error | Units | > |-----------------------------------------------------------------|-------|-----|------------------|-------| > | MyBenchmark.testMaxFloatFunction | thrpt | > 25 | 64.159 ± 2.031 | ops/s | > | MyBenchmark.testNewMaxFloatFunction | thrpt | > 25 | 94.997 ± 2.365 | ops/s | > | MyBenchmark.testMaxFloatFunctionNewFloatFieldSource | thrpt | > 25 | 123.191 ± 9.291 | ops/s | > | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSource | thrpt | > 25 | 123.817 ± 6.191 | ops/s | > | MyBenchmark.testMaxFloatFunctionRareField | thrpt | > 25 | 244.921 ± 6.439 | ops/s | > | MyBenchmark.testNewMaxFloatFunctionRareField | thrpt | > 25 | 239.288 ± 5.136 | ops/s | > | MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | > 25 | 271.521 ± 3.870 | ops/s | > | MyBenchmark.testNewMaxFloatFunctionNewFloatFieldSourceRareField | thrpt | > 25 | 279.334 ± 10.511 | ops/s | > Source: https://github.com/risdenk/lucene-jmh -- This message was sent by Atlassian Jira (v8.20.7#820007) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org