Re: how to get all documents in the results ?

Anshum Wed, 23 Mar 2011 01:16:00 -0700

So functionally I am assuming you've achieved what you'd been aiming for.
About the scores, the matchalldocs does score docs based on norm factors
etc.
therefore the score wouldn't be 0.
--
Anshum Gupta
http://ai-cafe.blogspot.com



On Wed, Mar 23, 2011 at 1:38 PM, Patrick Diviacco <
patrick.divia...@gmail.com> wrote:

> yeah it is clear. However I don't just want all documents, I still want to
> perform my specific query and on the bottom of the relevant docs, to list
> all not relevant docs as well. (I need this for successive steps).
>
> However now it seems to work. I've added MatchAllDocsQuery to my
> BooleanQuery, and I get the original results plus all remaining docs on
> bottom.
>
>  The only thing I don't understand is the fixed score of not relevant docs
> (the ones appearing only now with MatchAllDocsQuery).
>
> This is the list (please ignore Time and Harvesine scores).
>
> 298736393 298736393 All-text score:9.775101 Time:1.0 Harvesine:1.0 true
> 298736393 298736445 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
> 298736393 298736527 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
> 298736393 298736583 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> true
> 298736393 298736638 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> true
> 298736393 298736709 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> true
> 298736393 298736781 All-text score:3.1505027 Time:0.9999981
> Harvesine:-5593.9307 true
> 298736393 298736848 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0
> true
> 298736393 406412394 All-text score:0.62491775 Time:0.9983447
> Harvesine:-5593.9307 false
> 298736393 361767007 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 361767030 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 361767061 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 361767089 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 361767103 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 361767123 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 361767146 All-text score:0.44931924 Time:0.9994102
> Harvesine:-5593.9307 false
> 298736393 ?361492738 All-text score:0.13399276 Time:0.99982494
> Harvesine:-5593.9307 false
> 298736393 361492799 All-text score:0.13399276 Time:0.99982494
> Harvesine:-5593.9307 false
> 298736393 85899696 All-text score:0.061811533 Time:0.9680632
> Harvesine:-2355.1875 false
> 298736393 103368776 All-text score:0.032769617 Time:0.855137
> Harvesine:-5593.9307 false
> 298736393 101106927 All-text score:0.029139375 Time:0.8688775
> Harvesine:-5593.9307 false
> 298736393 80793153 All-text score:0.0018638512 Time:0.99774164
> Harvesine:-2201.7686 false
> 298736393 82661153 All-text score:0.0018638512 Time:0.98817354
> Harvesine:-5593.9307 false
> 298736393 84079583 All-text score:0.0018638512 Time:0.9796918
> Harvesine:-5593.9307 false
> 298736393 84079707 All-text score:0.0018638512 Time:0.9796918
> Harvesine:-5593.9307 false
> 298736393 84079911 All-text score:0.0018638512 Time:0.9796918
> Harvesine:-5593.9307 false
>
>
> On 23 March 2011 09:01, Anshum <ansh...@gmail.com> wrote:
>
> > Hi Patrick,
> > You *don't* need to add a MatchAllDocs query to anything. If you just
> want
> > all docs, just pass it to the searcher.search function and you'd get all
> > results.
> > MatchAllDocs query is the same as BooleanQuery , just that MADQ matches
> all
> > docs in the index. You wouldn't need to specify anything there.
> > The below would work and get you all the docs in the index as the result
> > (provided you specify a limit high enough for the numDocs to match param)
> >
> > *Query query = new MatchAllDocsQuery();*
> > *searcher.search(query.....);*
> >
> > Hope this clarifies your doubt.
> >
> > --
> > Anshum Gupta
> > http://ai-cafe.blogspot.com
> >
> >
> > On Wed, Mar 23, 2011 at 1:14 PM, Patrick Diviacco <
> > patrick.divia...@gmail.com> wrote:
> >
> > > The issue with
> > >
> > >
> > > My confusion about MatchAllDocsQuery is that I cannot specify which
> terms
> > > in
> > > which fields to search with it. I'm probably wrong.
> > >
> > > I currently have a BooleanQuery, that I use to build the query with
> > several
> > > fields and several terms.
> > >
> > > Can I just pass MatchAllDocsQuery to BooleanQuery.add method in order
> to
> > > add
> > > all remaining docs ?
> > >
> > > thanks
> > >
> > >
> > >
> > >
> > > On 22 March 2011 12:42, Anshum <ansh...@gmail.com> wrote:
> > >
> > > > MatchAllDocs does not consider only a single field but all fields
> i.e.
> > it
> > > > takes a *:* query.
> > > >
> > > > *1. *
> > > > *****Snip****
> > > > Query query = new MatchAllDocsQuery();
> > > > TopDocs  td = is.search(query, ir.numDocs());
> > > > ScoreDoc[ ] scoreDocs = td.scoreDocs;
> > > > for(ScoreDoc scoreDoc:scoreDocs){
> > > >
> > > > ... Your code...
> > > >
> > > > }
> > > > ****/Snip***
> > > >
> > > > *2. Collector approach:*
> > > >
> > > > Query query = new MatchAllDocsQuery();
> > > > int hitsPerPage = ir.numDocs();
> > > > TopScoreDocCollector collector =
> > TopScoreDocCollector.create(hitsPerPage,
> > > > true);
> > > > is.search(query, collector);
> > > > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> > > > for(ScoreDoc scoreDoc:scoreDocs){
> > > >
> > > > .. Your code..
> > > >
> > > > }
> > > >
> > > > The better part about using a collector based approach is that you
> > could
> > > do
> > > > a lot in the collect step i.e. everything that you'd do in the
> > iteration
> > > > post search could also be done while collecting here.
> > > >
> > > > Also, could you also tell me the exact case as to what is it that you
> > are
> > > > trying to achieve. You may have a completely different option that
> you
> > > > haven't read which someone could advice if they know the exact
> intent.
> > > >
> > > > Hope this helps.
> > > >
> > > > --
> > > > Anshum Gupta
> > > > http://ai-cafe.blogspot.com
> > > >
> > > >
> > > > On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> > > > patrick.divia...@gmail.com> wrote:
> > > >
> > > > > 1. "all" docs
> > > > >
> > > > > 2. because matchalldocs only consider one field at once. I'm
> > searching
> > > > over
> > > > > multiple fields instead.
> > > > >
> > > > > 3. could you tell me more about this ? It might be a solution!
> > > > >
> > > > >
> > > > >
> > > > > On 22 March 2011 12:18, Anshum <ansh...@gmail.com> wrote:
> > > > >
> > > > > > so a few things
> > > > > > 1. are you looking to get 'all' documents or only docs matching
> > your
> > > > > query?
> > > > > > 2. if its about fetching all docs, why not use the matchalldocs
> > > query?
> > > > > > 3. did you try using a collector instead of topdocs?
> > > > > >
> > > > > > --
> > > > > > Anshum Gupta
> > > > > > http://ai-cafe.blogspot.com
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > > > > > patrick.divia...@gmail.com> wrote:
> > > > > >
> > > > > > > I don't think the link you suggested can help, but maybe I'm
> > wrong.
> > > > > > >
> > > > > > > Also, the parameter MAX_HITS is not useful, it just limit the
> > > > results,
> > > > > it
> > > > > > > doesn't add the not relevant docs.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 22 March 2011 12:10, Anshum <ansh...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Patrick,
> > > > > > > > You may have a look at this, perhaps this will help you with
> > it.
> > > > Let
> > > > > me
> > > > > > > > know
> > > > > > > > if you're still stuck up.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Anshum Gupta
> > > > > > > > http://ai-cafe.blogspot.com
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 22, 2011 at 4:10 PM, <karl.wri...@nokia.com>
> > wrote:
> > > > > > > >
> > > > > > > > > Not sure what your use case actually is, but it sounds like
> > you
> > > > may
> > > > > > be
> > > > > > > > > unclear how Lucene works.
> > > > > > > > >
> > > > > > > > > Each query clause you have will produce an iterator that
> > walks
> > > > over
> > > > > > the
> > > > > > > > > documents that match that clause.  All the documents from
> the
> > > > > entire,
> > > > > > > > root
> > > > > > > > > query get scored.  The scoring evaluation per document is
> > also
> > > > > > related
> > > > > > > to
> > > > > > > > > the form of your query expression hierarchy.
> > > > > > > > >
> > > > > > > > > So, MatchAllDocsQuery is exactly what you want if you want
> a
> > > > > document
> > > > > > > > > iterator that includes all documents in the index.  You can
> > > > change
> > > > > > how
> > > > > > > > this
> > > > > > > > > is scored by extending MatchAllDocsQuery and writing a
> custom
> > > > > scorer.
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: ext Patrick Diviacco [mailto:
> > patrick.divia...@gmail.com]
> > > > > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > > Subject: how to get all documents in the results ?
> > > > > > > > >
> > > > > > > > > I'm using the following code because I want to see the
> entire
> > > > > > > collection
> > > > > > > > in
> > > > > > > > > my query results:
> > > > > > > > >
> > > > > > > > > //adding wildcards-term to see all results
> > > > > > > > > rest = new TermQuery(new Term("*","*"));
> > > > > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > > > > > >
> > > > > > > > > But it doesn't work, I only see the relevant docs and not
> all
> > > the
> > > > > > other
> > > > > > > > > ones.
> > > > > > > > > How can I get all documents ordered by relevance instead ?
> > > > > > > > >
> > > > > > > > > ps. MatchAllDocsQuery is not a solution because I need to
> > > specify
> > > > > my
> > > > > > > own
> > > > > > > > > custom query.
> > > > > > > > >
> > > > > > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > > > > > To unsubscribe, e-mail:
> > > java-user-unsubscr...@lucene.apache.org
> > > > > > > > > For additional commands, e-mail:
> > > > java-user-h...@lucene.apache.org
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: how to get all documents in the results ?

Reply via email to