[ https://issues.apache.org/jira/browse/LUCENE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529009#comment-17529009 ]
Kevin Risden commented on LUCENE-10542:
---------------------------------------

[~hossman] I think I addressed your comments from LUCENE-10534. The PR I pushed updates BytesRefFieldSource, and I did another check for the method name getValueForDoc.

> FieldSource exists implementation can avoid value retrieval
> -----------------------------------------------------------
>
>                 Key: LUCENE-10542
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10542
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Minor
>         Attachments: flamegraph_getValueForDoc.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> While looking at LUCENE-10534, I found that the *FieldSource exists
> implementations after LUCENE-7407 can avoid the value lookup when just
> checking for existence.
> Flamegraphs: x axis = time spent as a percentage of the time being profiled,
> y axis = stack trace, with the bottom being the first call and the top being
> the last call.
> Look only at the leftmost getValueForDoc highlight (it helps to make the
> image bigger or download the original).
> !flamegraph_getValueForDoc.png|height=410,width=1000!
> LongFieldSource#exists spends MOST of its time doing a
> LongFieldSource#getValueForDoc.
> LongFieldSource#getValueForDoc spends its time doing two things primarily:
> * FilterNumericDocValues#longValue()
> * advance()
> This makes sense based on the code (copied below to make it easier to see
> at once).
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L72
> {code:java}
>   private long getValueForDoc(int doc) throws IOException {
>     if (doc < lastDocID) {
>       throw new IllegalArgumentException(
>           "docs were sent out-of-order: lastDocID=" + lastDocID + " vs docID=" + doc);
>     }
>     lastDocID = doc;
>     int curDocID = arr.docID();
>     if (doc > curDocID) {
>       curDocID = arr.advance(doc);
>     }
>     if (doc == curDocID) {
>       return arr.longValue();
>     } else {
>       return 0;
>     }
>   }
> {code}
> LongFieldSource#exists doesn't care about the actual longValue, only that a
> value was found when iterating through the doc values.
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/LongFieldSource.java#L95
> {code:java}
>   @Override
>   public boolean exists(int doc) throws IOException {
>     getValueForDoc(doc);
>     return arr.docID() == doc;
>   }
> {code}
> Putting this together: when exists calls getValueForDoc, we spend ~50% of
> the time getting the long value, which exists doesn't need. We can save that
> ~50% by making exists ignore the actual value and just return whether
> doc == curDocID.
> This extra ~50% is amplified in MaxFloatFunction (and other places) because
> exists() is called many times. Eventually the value will be needed from
> longVal(), but if we call exists(), say, 3 times for every longVal(), we
> spend a lot of time computing the value when we only need to check for
> existence.
> I found the same pattern in DoubleFieldSource, EnumFieldSource,
> FloatFieldSource, IntFieldSource, and LongFieldSource.
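
To make the exists-versus-value point concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the PR and it does not use Lucene: FakeLongDocValues, LongSourceSketch, advanceTo, existsViaValue and ExistsSketch are hypothetical names made up for illustration, and the fake iterator only imitates the docID()/advance()/longValue() shape shown in the snippets above. It counts how often the value is read when exists() is called 3 times per longVal().

```java
// Hypothetical stand-in for a doc-values iterator. NOT the Lucene API;
// it only mimics docID()/advance()/longValue() as used in the issue text.
final class FakeLongDocValues {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;    // sorted doc IDs that have a value
  private final long[] values; // values[i] belongs to docs[i]
  private int idx = -1;        // position in docs; -1 means "not started"
  int valueReads = 0;          // how many times longValue() was called

  FakeLongDocValues(int[] docs, long[] values) {
    this.docs = docs;
    this.values = values;
  }

  int docID() {
    if (idx < 0) return -1;
    return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
  }

  // Move forward to the first doc ID >= target (target must be > docID()).
  int advance(int target) {
    do {
      idx++;
    } while (idx < docs.length && docs[idx] < target);
    return docID();
  }

  long longValue() {
    valueReads++;
    return values[idx];
  }
}

// Hypothetical value source showing both shapes of exists().
final class LongSourceSketch {
  private final FakeLongDocValues arr;
  private int lastDocID = -1;

  LongSourceSketch(FakeLongDocValues arr) {
    this.arr = arr;
  }

  // Positions the iterator on doc (or past it) without reading a value.
  private int advanceTo(int doc) {
    if (doc < lastDocID) {
      throw new IllegalArgumentException(
          "docs were sent out-of-order: lastDocID=" + lastDocID + " vs docID=" + doc);
    }
    lastDocID = doc;
    int curDocID = arr.docID();
    if (doc > curDocID) {
      curDocID = arr.advance(doc);
    }
    return curDocID;
  }

  long longVal(int doc) {
    return advanceTo(doc) == doc ? arr.longValue() : 0;
  }

  // Shape described in the issue: reads the value, then throws it away.
  boolean existsViaValue(int doc) {
    longVal(doc);
    return arr.docID() == doc;
  }

  // Sketch of the proposed shape: only checks the iterator position.
  boolean exists(int doc) {
    return advanceTo(doc) == doc;
  }
}

final class ExistsSketch {
  // Calls exists 3 times per longVal over docs 0..8 and returns the value reads.
  static int valueReads(boolean viaValue) {
    FakeLongDocValues dv =
        new FakeLongDocValues(new int[] {1, 4, 7}, new long[] {10L, 40L, 70L});
    LongSourceSketch src = new LongSourceSketch(dv);
    for (int doc = 0; doc < 9; doc++) {
      for (int i = 0; i < 3; i++) {
        if (viaValue) {
          src.existsViaValue(doc);
        } else {
          src.exists(doc);
        }
      }
      src.longVal(doc);
    }
    return dv.valueReads;
  }

  public static void main(String[] args) {
    System.out.println("value reads, exists via getValueForDoc: " + valueReads(true));
    System.out.println("value reads, exists via position only:  " + valueReads(false));
  }
}
```

In this toy setup the position-only exists() reads a value only when longVal() actually needs it. How much that saves in Lucene depends on the real cost of longValue() relative to advance(), which is what the flamegraph and the JMH numbers in this issue measure.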
> I put together a change showing what this would look like:
> ----
> Simple JMH performance tests comparing the original FloatFieldSource to the
> new ones from PR #847.
>
> ||Benchmark||Mode||Cnt||Score and Error||Units||
> |MyBenchmark.testMaxFloatFunction|thrpt|25|65.668 ± 2.724|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSource|thrpt|25|113.779 ± 8.229|ops/s|
> |MyBenchmark.testMaxFloatFunctionRareField|thrpt|25|237.400 ± 7.981|ops/s|
> |MyBenchmark.testMaxFloatFunctionNewFloatFieldSourceRareField|thrpt|25|281.997 ± 27.575|ops/s|
> Source: https://github.com/risdenk/lucene-jmh

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org