[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507499#comment-13507499 ]
Robert Muir commented on LUCENE-4574:
-------------------------------------

{quote}
So why do you hate this very simple cache so much?
{quote}

I want things fixed correctly. The way I see it, there is a lot of bogusness:

* When Solr is only sorting by score, it should call IS.search without a Sort to get the faster behavior. The relevance comparator documents that it is the slow way.
* It's especially stupid that someone can ask for fillFields=true and trackDocScores=true if you have a relevance comparator.
* I'm not sure trackMaxScore=true is really useful at all except when relevance is the only sort, in which case you should be using IS.search without a sort anyway. If someone really needs this combination, I think it's OK to make them implement their own collector.
* I don't like wrapping the scorer with this cache in this relevance comparator. I feel like the comparator can probably do this in a cleaner way.
* I don't like all this caching just added on a whim everywhere. I see it here, I see BooleanScorer2 has a cache, I see block-join query has a cache, and I see PositivesScoreOnlyCollector has a cache. There are already caching value sources at the ValueSource level too: look at CachingDoubleValueSource in spatial. Some of these are senseless. If there is a real reason, it's not documented. We should fix the APIs instead of just adding all this caching everywhere.
* I think calling score() twice is bogus, but we should be fixing this correctly instead of hacking something in to speed up a slow function query.

So yeah, clearly adding caches everywhere isn't the right solution to this stuff. I feel like I'm drowning in caches, and bug reports like this one still exist. We shouldn't rush anything in because of a particularly slow function query.

Trust me, I think it's bogus that we call score() twice: but if something is put in rather quickly on this issue (e.g. more caching) then I prefer that it's more contained so it can easily be ripped out later, when the problem is ultimately solved correctly.

> FunctionQuery ValueSource value computed twice per document
> -----------------------------------------------------------
>
>                 Key: LUCENE-4574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4574
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0, 4.1
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice in a row for each document. This computation isn't exactly cheap, so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with its corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
>     at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>     at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>     at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
>     at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
>     at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>     at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>     at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
>     at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
>     at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
>     at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve.
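
For readers who want to see the shape of the workaround mentioned in the issue description (caching the last value together with its docid), here is a minimal sketch. It is not the reporter's code and does not use Lucene: it models the per-document lookup with the JDK's java.util.function.IntToDoubleFunction instead of FunctionValues, and the names CountingSource, LastValueCache, LastValueCacheDemo and scoreEachDocTwice are hypothetical, chosen only for illustration. It also assumes doc ids are non-negative and that the duplicate requests for a document arrive back to back, as in the stack traces above.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical stand-in for an expensive per-document computation; not a Lucene class. */
final class CountingSource implements IntToDoubleFunction {
    int calls = 0; // how many times the "expensive" computation actually ran

    @Override
    public double applyAsDouble(int doc) {
        calls++;
        return Math.sqrt(doc) * 2.0; // placeholder for the real, costly work
    }
}

/** Remembers the most recent (doc, value) pair so an immediate repeat for the same doc is free. */
final class LastValueCache implements IntToDoubleFunction {
    private final IntToDoubleFunction delegate;
    private int lastDoc = -1; // -1 means nothing cached yet (assumes doc ids are >= 0)
    private double lastValue;

    LastValueCache(IntToDoubleFunction delegate) {
        this.delegate = delegate;
    }

    @Override
    public double applyAsDouble(int doc) {
        if (doc != lastDoc) {
            // Compute first, then record the doc, so a failed computation is never cached.
            lastValue = delegate.applyAsDouble(doc);
            lastDoc = doc;
        }
        return lastValue;
    }
}

final class LastValueCacheDemo {
    /** Mimics the reported access pattern: each document's value is requested twice in a row. */
    static double scoreEachDocTwice(IntToDoubleFunction values, int maxDoc) {
        double sum = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            sum += values.applyAsDouble(doc); // first request for this doc
            sum += values.applyAsDouble(doc); // second request for the same doc
        }
        return sum;
    }

    public static void main(String[] args) {
        CountingSource raw = new CountingSource();
        scoreEachDocTwice(raw, 100);
        System.out.println("uncached computations: " + raw.calls);

        CountingSource wrapped = new CountingSource();
        scoreEachDocTwice(new LastValueCache(wrapped), 100);
        System.out.println("cached computations:   " + wrapped.calls);
    }
}
```

With 100 documents the uncached run performs 200 computations and the wrapped run performs 100. A single-entry cache like this only helps when the repeated request immediately follows the first one, and it is exactly the kind of contained, easily removed stopgap the comment above asks for rather than a fix for the underlying double call to score().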