Re: TSDC, TopFieldCollector & co

Shai Erera Wed, 30 Sep 2009 12:03:54 -0700

I was halfway through answering the second part when I noticed your second
update :).


I'm not sure about adding reset() to Collector. It makes sense "for
completeness" in case other Collectors can be reset as well, but reset()
is a delicate method that needs to be used cautiously. E.g., if you ask for
100 results and then want just 10/20/40, you might be tempted to think you
can call reset() and it will work. But reset() is only valid if you want to
ask for 100 results again, at least if we don't want to complicate the code.
That's why I think it had better live on TSDC (or TopDocsCollector), with the
proper documentation; if anything, reset() is a TSDC-related operation. If I
have my own Collector, its reset() method might need a different signature,
not to mention different documentation.
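
To make the caveat concrete, here is a minimal stand-alone sketch in plain
Java. It is not Lucene code: ReusableTopN, Hit, collect(), reset() and
topDocs() are hypothetical names, and the class only models a fixed-size,
sentinel-filled min-heap along the lines discussed in this thread. Note that
reset() merely re-arms the slots that already exist, so it can only serve the
same number of results again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a reusable top-N collector; NOT the Lucene API. */
final class ReusableTopN {
    /** Mutable hit, updated in place while collecting. */
    static final class Hit {
        int doc;
        float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final Hit[] heap;   // 1-based binary min-heap; heap[1] is the weakest hit
    private final int maxSize;  // fixed at construction, reset() cannot change it

    ReusableTopN(int maxSize) {
        this.maxSize = maxSize;
        this.heap = new Hit[maxSize + 1];
        // One distinct sentinel object per slot.
        for (int i = 1; i <= maxSize; i++) {
            heap[i] = new Hit(-1, Float.NEGATIVE_INFINITY);
        }
    }

    /** Overwrites the weakest hit in place if the new score beats it. */
    void collect(int doc, float score) {
        Hit top = heap[1];
        if (score <= top.score) return;
        top.doc = doc;
        top.score = score;
        downHeap();
    }

    /** Turns every slot back into a sentinel without allocating anything. */
    void reset() {
        for (int i = 1; i <= maxSize; i++) {
            heap[i].doc = -1;
            heap[i].score = Float.NEGATIVE_INFINITY;
        }
    }

    /** Returns collected doc ids, best score first; sentinels are skipped. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>();
        for (int i = 1; i <= maxSize; i++) {
            if (heap[i].doc >= 0) hits.add(heap[i]);
        }
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }

    /** Sifts heap[1] down until the min-heap property holds again. */
    private void downHeap() {
        int i = 1;
        Hit node = heap[i];
        while (true) {
            int child = 2 * i;
            if (child > maxSize) break;
            if (child + 1 <= maxSize && heap[child + 1].score < heap[child].score) child++;
            if (heap[child].score >= node.score) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    public static void main(String[] args) {
        ReusableTopN collector = new ReusableTopN(2);
        collector.collect(1, 0.5f);
        collector.collect(2, 0.9f);
        collector.collect(3, 0.7f);
        System.out.println(collector.topDocs()); // [2, 3]

        collector.reset();                       // same capacity, no new arrays
        collector.collect(7, 0.1f);
        System.out.println(collector.topDocs()); // [7]
    }
}
```

Asking this object for a different number of results would mean resizing the
heap array, which is exactly the complication the paragraph above wants to
keep out of reset().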

About clear(Object sentinel) - is it still a question (now that you
understand getSentinelObject())? Either way, I think clear() should not be
final. That restricts PQ extensions unnecessarily ...
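
On the getSentinelObject() question quoted below (why not simply
heap[i] = sentinel): a collector that overwrites the top element in place
needs every slot to hold its own mutable object. The sketch below is plain
Java and not Lucene code; Slot, fillShared, fillDistinct and countFilled are
hypothetical names used only to show the aliasing.

```java
import java.util.Arrays;

/** Illustration only: shared vs. distinct mutable sentinels. NOT Lucene code. */
final class SentinelAliasing {
    /** Mutable slot; doc == -1 marks an unused sentinel. */
    static final class Slot {
        int doc = -1;
        float score = Float.NEGATIVE_INFINITY;
    }

    /** Every array position refers to the SAME Slot instance. */
    static Slot[] fillShared(int n) {
        Slot[] slots = new Slot[n];
        Arrays.fill(slots, new Slot());
        return slots;
    }

    /** Every array position gets its own Slot instance. */
    static Slot[] fillDistinct(int n) {
        Slot[] slots = new Slot[n];
        for (int i = 0; i < n; i++) slots[i] = new Slot();
        return slots;
    }

    /** Counts positions that look like real hits (doc >= 0). */
    static int countFilled(Slot[] slots) {
        int filled = 0;
        for (Slot s : slots) {
            if (s.doc >= 0) filled++;
        }
        return filled;
    }

    public static void main(String[] args) {
        Slot[] shared = fillShared(3);
        shared[0].doc = 42;                        // one in-place update...
        shared[0].score = 1.0f;
        System.out.println(countFilled(shared));   // 3: ...shows up in every position

        Slot[] distinct = fillDistinct(3);
        distinct[0].doc = 42;
        distinct[0].score = 1.0f;
        System.out.println(countFilled(distinct)); // 1: only the updated slot changed
    }
}
```

With a shared sentinel, the first collected hit would appear to fill the whole
queue, so allocating maxSize separate objects is the price of being able to
update them in place.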

Shai

On Wed, Sep 30, 2009 at 8:41 PM, eks dev <[email protected]> wrote:

> > BTW eks, you asked about reusing TSDC.
>
> yeah, it is normally not a big deal to allocate everything again, but these
> arrays are not necessarily small, so I guess it would make sense to open up
> this possibility.
>
> Where do you think it would be better to add reset(): to TSDC or to Collector?
>
> I would even suggest changing PQ's clear() method to clear(Object sentinel)
> instead... or adding such a method (for backwards compatibility) ...
>
> Another question:
> Looking at the code in PQ, it was not really clear to me why sentinels have
> to be allocated maxSize times in the initialize method. I am talking about:
>
>    // If sentinel objects are supported, populate the queue with them
>    Object sentinel = getSentinelObject();
>    if (sentinel != null) {
>      heap[1] = sentinel;
>      for (int i = 2; i < heap.length; i++) {
>        heap[i] = getSentinelObject(); //Why not simply heap[i] = sentinel;
>      }
>      size = maxSize;
>    }
>
> getSentinelObject() creates a new object every time (HitQueue).
>
> Are these objects mutable?
>
>
>
>
>
>
>
> ----- Original Message ----
> > From: Shai Erera <[email protected]>
> > To: [email protected]
> > Sent: Wednesday, 30 September, 2009 18:11:03
> > Subject: Re: TSDC, TopFieldCollector & co
> >
> > BTW eks, you asked about reusing TSDC. PQ has a clear() method, so it can be
> > reused. Only currently it's final and nullifies the array. We'll need to
> > un-final it, and then override in HitQueue to just reset the ScoreDoc
> > instances to be sentinels again. And of course add a reset() method to TSDC.
> >
> > On Wed, Sep 30, 2009 at 5:26 PM, eks dev wrote:
> >
> > > Thanks Mark, Shai,
> > > I was getting confused by so many ways to do "almost the same thing" ;)
> > >
> > > But I have figured it out by peeking into the BooleanQuery code that decides
> > > if "out of order" should be used... BQ will pick the right TSDC. I like it;
> > > option 1 it is, minimum user code.
> > >
> > > Cheers, eks
> > >
> > >
> > >
> > > ----- Original Message ----
> > > > From: Shai Erera
> > > > To: [email protected]
> > > > Sent: Wednesday, 30 September, 2009 17:12:38
> > > > Subject: Re: TSDC, TopFieldCollector & co
> > > >
> > > > I agree. If you need sort-by-score, it's better to use the "fast" search
> > > > methods. IndexSearcher will create the appropriate TSDC instance for you,
> > > > based on the Query that was passed.
> > > >
> > > > If you need to create multiple Collectors and pass a kind of Multi-Collector
> > > > to IndexSearcher, then you should create TSDC according to Mark's example
> > > > above.
> > > >
> > > > Shai
> > > >
> > > > On Wed, Sep 30, 2009 at 4:57 PM, Mark Miller wrote:
> > > >
> > > > > If you want relevance sorting (Sort.Score not Sort.Relevance right?),
> > > > > I'd think you want to use TopScoreDocCollector, not TopFieldCollector.
> > > > > The only reason to use relevance with TopFieldCollector is if you
> > > > > are doing an nth sort with a field sort as well.
> > > > >
> > > > > You don't really need to worry about things like turning off the max
> > > > > score tracking here - it's just going to be the first doc on the queue.
> > > > >
> > > > > You also do want to specify whether or not to collect docs in order if
> > > > > you care about performance:
> > > > >
> > > > >  public static TopScoreDocCollector create(int numHits, boolean docsScoredInOrder)
> > > > >
> > > > > ie:
> > > > >
> > > > > TopScoreDocCollector.create(nDocs, !weight.scoresDocsOutOfOrder());
> > > > >
> > > > > Which means you just want option 1.
> > > > >
> > > > > --
> > > > > - Mark
> > > > >
> > > > > http://www.lucidimagination.com
> > > > >
> > > > >
> > > > >
> > > > > eks dev wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > What is the best way to achieve the following, and what are the
> > > > > > differences, if I say "I do not normalize scores, so I do not need max
> > > > > > score tracking, I do not care if hits are returned in doc id order, or any
> > > > > > other order. I need only to get maxDocs *best scoring* documents":
> > > > > >
> > > > > > OPTION 1:
> > > > > >     TopDocs top = ixSearcher.search(q, filter, maxDocs);
> > > > > >
> > > > > > OPTION 2:
> > > > > >     final TopScoreDocCollector tfc = TopScoreDocCollector.create(maxDocs, false);
> > > > > >     ixSearcher.search(q, filter, tfc);
> > > > > >     TopDocs top = tfc.topDocs();
> > > > > >
> > > > > >
> > > > > > OPTION 3:
> > > > > >     final TopFieldCollector tfc = TopFieldCollector.create(Sort.RELEVANCE, maxDocs,
> > > > > >         false  /* fillFields */,
> > > > > >         true   /* trackDocScores */,
> > > > > >         false  /* trackMaxScore */,
> > > > > >         false  /* docsInOrder */);
> > > > > >
> > > > > >     ixSearcher.search(q.weight(ixSearcher), filter, tfc);
> > > > > >     TopDocs top = tfc.topDocs();
> > > > > >
> > > > > >
> > > > > > What are the pros and cons?
> > > > > > If I read the javadoc correctly,
> > > > > > - OPTION 1 tracks max score and delivers doc ids in order (suboptimal
> > > > > >   performance for my case)
> > > > > > - OPTION 2: I do not know about max score tracking, but doc ids are not
> > > > > >   required to be in order
> > > > > > - OPTION 3 looks like exactly what I want, but one performance comment in
> > > > > >   the javadoc about Sort.RELEVANCE made me wonder whether that is the
> > > > > >   fastest way
> > > > > >
> > > > > > What would be recommended here? Are there any other options to achieve the
> > > > > > fastest search under the conditions defined above (no max score tracking and
> > > > > > doc id order irrelevant)? OPTION 2 looks nice, but as said, I am not sure
> > > > > > about max score tracking.
> > > > > >
> > > > > > Thanks,
> > > > > > eks
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > For additional commands, e-mail: [email protected]
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
>
>
>
>
>
>
>
