[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

Robert Muir (JIRA) Wed, 28 Nov 2012 08:27:06 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505619#comment-13505619
 ]


Robert Muir commented on LUCENE-4574:
-------------------------------------

I think it's generally cheap. Today it's already cached in BooleanScorer2 
(which Solr always gets for a BooleanQuery), and for a term query it's 
typically just a multiply and so on. So I think caching in general is 
useless and would only hurt here. In these silly cases (sorting by 
relevance but also asking to track scores, etc.), it may be cheaper to just 
call it twice rather than try to do something funkier in the collector; we 
would have to benchmark this.

{quote}
So for me this means adding the cache at FunctionQuery$AllScorer. 
{quote}

I think I like this idea better than adding caching in general to these 
collectors. Is the score() method typically expensive
for function queries?
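
For illustration only, caching at the scorer level could look roughly like 
the sketch below. SimpleScorer, LastDocCachingScorer and CountingScorer are 
hypothetical stand-ins made up for this example; they are not Lucene classes 
and this is not the attached patch. The idea is just to remember the last 
(docid, score) pair so a second score() call on the same document is free.

```java
// Hypothetical stand-in for a scorer; NOT Lucene's Scorer class.
interface SimpleScorer {
    int docID();
    float score();
}

// Remembers the score of the most recently scored document.
// Assumes score() is only called while positioned on a real doc (docID >= 0).
final class LastDocCachingScorer implements SimpleScorer {
    private final SimpleScorer in;
    private int cachedDoc = -1;
    private float cachedScore;

    LastDocCachingScorer(SimpleScorer in) {
        this.in = in;
    }

    @Override
    public int docID() {
        return in.docID();
    }

    @Override
    public float score() {
        int doc = in.docID();
        if (doc != cachedDoc) {
            // First request for this document: compute and remember.
            cachedScore = in.score();
            cachedDoc = doc;
        }
        return cachedScore;
    }
}

// Illustrative scorer that counts how often the (possibly expensive)
// value is actually computed.
final class CountingScorer implements SimpleScorer {
    int doc;
    int calls;

    @Override
    public int docID() {
        return doc;
    }

    @Override
    public float score() {
        calls++;
        return doc * 2.0f;
    }
}
```

The cost is one int compare per score() call, which is why whether this is a 
net win depends on how expensive the wrapped score() really is.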

Yet another possibility is, instead of asking to track scores when sorting by 
relevance, to ask to fill sort fields (the default anyway, right?).
It's sort of redundant to ask for both. If you do this, I don't think it 
calls score() twice.

Finally, we could also consider something like your patch, except more honed 
in on these particular silly situations. That would be something like 
setting a boolean up front in these collectors' constructors if one of the 
comparators is relevance and it is also asked to track scores/max scores. 
Then in setScorer, we could do what your patch does only if this boolean is 
set. I feel like we wouldn't have to add 87 more specialized collectors to do 
this. I just haven't looked at the code to try to figure out what all the 
situations can be (all those booleans etc. passed to IndexSearcher) where 
score() can currently be called twice.
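
A rough sketch of that last idea, under stated assumptions: DocScorer, 
CachingDocScorer, CountingDocScorer and SketchCollector are hypothetical 
names invented here, not the real TopFieldCollector code or the attached 
patch. The flag is computed once in the constructor, and the scorer is only 
wrapped in setScorer when both conditions hold.

```java
// Hypothetical stand-in for a scorer; NOT Lucene's Scorer class.
interface DocScorer {
    int docID();
    float score();
}

// Caches the score of the current document so repeated reads are free.
final class CachingDocScorer implements DocScorer {
    private final DocScorer in;
    private int cachedDoc = -1;
    private float cachedScore;

    CachingDocScorer(DocScorer in) {
        this.in = in;
    }

    @Override
    public int docID() {
        return in.docID();
    }

    @Override
    public float score() {
        int doc = in.docID();
        if (doc != cachedDoc) {
            cachedScore = in.score();
            cachedDoc = doc;
        }
        return cachedScore;
    }
}

// Illustrative scorer that counts how often the score is computed.
final class CountingDocScorer implements DocScorer {
    int doc;
    int calls;

    @Override
    public int docID() {
        return doc;
    }

    @Override
    public float score() {
        calls++;
        return 1.0f + doc;
    }
}

// Heavily simplified collector: one boolean decided up front,
// consulted in setScorer.
final class SketchCollector {
    private final boolean sortsByRelevance;
    private final boolean trackScores;
    private final boolean needsScoreCache;
    private DocScorer scorer;
    float maxScore = Float.NEGATIVE_INFINITY;
    float lastSortValue = Float.NaN;

    SketchCollector(boolean sortsByRelevance, boolean trackScores) {
        this.sortsByRelevance = sortsByRelevance;
        this.trackScores = trackScores;
        // Only this combination reads the score twice per document.
        this.needsScoreCache = sortsByRelevance && trackScores;
    }

    void setScorer(DocScorer s) {
        // Wrap only in the "silly" case; otherwise use the scorer as-is.
        this.scorer = needsScoreCache ? new CachingDocScorer(s) : s;
    }

    void collect() {
        if (trackScores) {
            maxScore = Math.max(maxScore, scorer.score()); // first read
        }
        if (sortsByRelevance) {
            lastSortValue = scorer.score(); // second read, cached if wrapped
        }
    }
}
```

With both flags set, the underlying score is computed once per collected 
document; with only one of them set, no wrapper is created and nothing 
changes for the common paths.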

                
> FunctionQuery ValueSource value computed twice per document
> -----------------------------------------------------------
>
>                 Key: LUCENE-4574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4574
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0, 4.1
>            Reporter: David Smiley
>         Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
>         at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>         at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>         at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
>         at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>         at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>         at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
>         at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
>         at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.
