[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

Robert Muir (JIRA) Fri, 30 Nov 2012 09:53:58 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507499#comment-13507499 ]


Robert Muir commented on LUCENE-4574:
-------------------------------------

{quote}
So why do you hate this very simple cache so much? 
{quote}

I want things fixed correctly. The way I see it, there is a lot of bogusness:
* When Solr is only sorting by score, it should call IS.search without a Sort
to get faster behavior. The relevance comparator documents that it's the slow
way.
* It's especially stupid that someone can ask for fillFields=true and
trackDocScores=true if you have a relevance comparator.
* I'm not sure trackMaxScore=true is really useful at all except when relevance
is the only sort, in which case you should be using IS.search without a Sort
anyway. If someone really needs this combination, I think it's OK to make them
implement their own collector.
* I don't like wrapping the scorer with this cache in this relevance
comparator. I feel like the comparator can probably do this in a cleaner way.
* I don't like all this caching just added on a whim everywhere. I see it here,
I see BooleanScorer2 has a cache, I see block-join query has a cache, and I see
PositivesScoreOnlyCollector has a cache. There are already caching ValueSources
at the ValueSource level too: look at CachingDoubleValueSource in spatial. Some
of these are senseless. If there is a real reason, it's not documented. We
should fix the APIs rather than just adding all this caching everywhere.
* I think calling score() twice is bogus, but we should be fixing this
correctly instead of hacking something in to speed up a slow FunctionQuery.

So yeah, clearly adding caches everywhere isn't the right solution to this
stuff. I feel like I'm drowning in caches, and bug reports like this one still
exist.

We shouldn't rush anything in because of a particularly slow function query.
Trust me, I think it's bogus that we call score() twice, but if something is
put in rather quickly on this issue (e.g. more caching), then I'd prefer that
it be more contained so it can easily be ripped out later, when the problem is
ultimately solved correctly.

                
> FunctionQuery ValueSource value computed twice per document
> -----------------------------------------------------------
>
>                 Key: LUCENE-4574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4574
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0, 4.1
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, 
> LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and
> debugging to see if it was being used optimally.  To my surprise, the value
> was being fetched twice in a row for each document.  This computation isn't
> exactly cheap, so this is a big problem.  I was able to work around the
> problem trivially on my end by caching the last value with its
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
>         at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>         at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>         at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
>         at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>         at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>         at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
>         at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
>         at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.
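
For readers unfamiliar with the workaround mentioned in the description (caching the last value together with its docid), here is a minimal sketch of the idea. It is not the reporter's code and not a Lucene class: `CachingDocValue`, its `computations` counter, and the use of `IntToDoubleFunction` as a stand-in for an expensive per-document computation are all hypothetical names chosen for illustration. It also assumes document IDs are non-negative, so -1 can mark "nothing cached yet".

```java
import java.util.function.IntToDoubleFunction;

final class LastDocCacheSketch {

    /** Hypothetical stand-in for a per-document value; NOT a Lucene API. */
    static final class CachingDocValue {
        private final IntToDoubleFunction delegate; // the expensive computation
        private int cachedDoc = -1;                 // assumption: real doc IDs are >= 0
        private double cachedValue;
        int computations = 0;                       // counts real computations, for illustration

        CachingDocValue(IntToDoubleFunction delegate) {
            this.delegate = delegate;
        }

        /** Returns the value for doc, recomputing only when the doc changes. */
        double value(int doc) {
            if (doc != cachedDoc) {
                cachedValue = delegate.applyAsDouble(doc);
                cachedDoc = doc;
                computations++;
            }
            return cachedValue;
        }
    }

    public static void main(String[] args) {
        CachingDocValue v = new CachingDocValue(doc -> doc * 2.0);
        v.value(7); // first request for doc 7: computed
        v.value(7); // second request for the same doc: served from the cache
        System.out.println("computations = " + v.computations);
    }
}
```

A cache like this only helps when both requests for a document go through the same caching object. In the two stack traces above, only the second call passes through ScoreCachingWrappingScorer, which is why nothing is cached when it runs.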
