Re: Time of processing hits.doc()

German Kondolf Mon, 19 Nov 2007 08:20:13 -0800

I have already defined a Lucene Filter for every "id" of "ubicacion" (location).
I just create the bitset for each value and count how many of its bits
intersect the result set.


One possible optimization is to read the terms of the field you're
trying to "group" on; that's the optimization we'll be working on soon
in our app.

I never read all the results.
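
To illustrate the counting step, here is a minimal sketch in plain Java.
It is not the code from our app: the class name LocationFacetCounter, the
method countByLocation, and the sample location ids are made up for the
example, and it uses only java.util.BitSet so it runs without Lucene. In
a real setup the "matches" bitset would be filled by a HitCollector (as
in the javadoc example quoted below) and each per-location bitset would
come from the corresponding Filter.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class LocationFacetCounter {

    /**
     * For each location id, counts how many matching documents carry it.
     * Locations with no matching documents are left out of the result.
     */
    public static Map<String, Integer> countByLocation(
            BitSet matches, Map<String, BitSet> locationBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : locationBits.entrySet()) {
            // Clone first so the (presumably cached) per-location bitset
            // is not modified by the intersection.
            BitSet overlap = (BitSet) entry.getValue().clone();
            overlap.and(matches);
            int n = overlap.cardinality();
            if (n > 0) {
                counts.put(entry.getKey(), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical index of 6 documents; docs 0, 2, 3 and 5 match the query.
        BitSet matches = new BitSet(6);
        matches.set(0);
        matches.set(2);
        matches.set(3);
        matches.set(5);

        // Hypothetical per-location bitsets, one per "ubicacion" id.
        Map<String, BitSet> locationBits = new LinkedHashMap<>();
        BitSet capital = new BitSet(6);
        capital.set(0);
        capital.set(1);
        capital.set(2);
        BitSet norte = new BitSet(6);
        norte.set(3);
        norte.set(4);
        BitSet sur = new BitSet(6);
        sur.set(1);
        sur.set(4);
        locationBits.put("capital", capital);
        locationBits.put("norte", norte);
        locationBits.put("sur", sur);

        System.out.println(countByLocation(matches, locationBits));
    }
}
```

No stored document is loaded at any point; the cost depends on the number
of distinct locations and the size of the bitsets, not on how many hits
the query returns.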

On Nov 19, 2007 1:05 PM, Haroldo Nascimento <[EMAIL PROTECTED]> wrote:
> German,
>
>   What I need is similar to your site
> http://listados.deremate.com.ar/panaderia .
>   I have many search results, but I show only some of them (for
> example, the first 10 on the first page). However, to create the
> location filter options I need to read all of the search results. The
> performance problem appears when I have 30,000 results.
>
>   How do you build the "ubicacion" filter? Do you need to read all the
> results?
>
>   thanks
>
> On Nov 19, 2007 12:05 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> > I think, based on your previous question, that you just need to use
> > the search() method that returns TopDocs, not the lower-level
> > HitCollector method.  From the TopDocs, you can then access the
> > ScoreDoc, which will give you info about the doc and the score.  See 
> > http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/TopDocsTest.java
> >  from my Lucene Boot Camp training class for a really simple example.
> >
> > -Grant
>
> >
> >
> > On Nov 19, 2007, at 9:12 AM, Haroldo Nascimento wrote:
> >
> > > Mark,
> > >
> > >  How can I get the Document information? I think it belongs in the
> > > implementation of the abstract collect method. How can I get it?
> > >
> > >  Below is the example from the Lucene javadoc.
> > >
> > > Searcher searcher = new IndexSearcher(indexReader);
> > >   final BitSet bits = new BitSet(indexReader.maxDoc());
> > >   searcher.search(query, new HitCollector() {
> > >       public void collect(int doc, float score) {
> > >         bits.set(doc);
> > >       }
> > >     });
> > >
> > > Thanks
> > >
> > >
> > > On Nov 18, 2007 8:09 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> > >> Hey Haroldo.
> > >>
> > >> First thing you need to do is *stop* using Hits in your searches.
> > >> Hits is optimized for some pretty specific use cases and you will
> > >> get along much better by using a HitCollector.
> > >>
> > >> Hits has three main functions:
> > >>
> > >> It caches documents, normalizes scores, and stores ids associated
> > >> with scores (a HitDoc). If you attempt to retrieve a HitDoc past the
> > >> first 100 from Hits, a new search will be issued to grab double the
> > >> required HitDocs needed to satisfy your HitDoc retrieval attempt.
> > >> This will be repeated every time you ask for a HitDoc beyond the
> > >> current cache (which began at 100). This means that if you need to
> > >> get a HitDoc beyond 100, Hits is not a great choice for you. You will
> > >> want to use the HitCollector instead...but remember that you are
> > >> losing the normalized scores (simple to copy code if you still want
> > >> it) and the document caching (I rarely want that anyway).
> > >>
> > >> An issue to watch out for: with Hits, you do not have to ask for how
> > >> many docs to get back, but with a HitCollector solution you will need
> > >> to. This is a minor dilemma if you want to go over all of the hits no
> > >> matter what. You can pass a huge number to ensure you get everything,
> > >> but you will be creating large data structures if you do this, as
> > >> structure sizes may be initialized by the number you pass. Also,
> > >> passing the maximum integer will cause an error (negative init size)
> > >> as Lucene initializes a data structure to hold the hits as n+1.
> > >>
> > >> - Mark
> > >>
> > >>
> > >> Haroldo Nascimento wrote:
> > >>> I have a performance problem when I need to group the search
> > >>> results.
> > >>>
> > >>> I have the code below:
> > >>>
> > >>>   for (int i = 0; i < hits.length(); i++) {
> > >>>       doc = hits.doc(i);
> > >>>
> > >>>       obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
> > >>>       obj2 = doc.get(xxx);
> > >>>       ...
> > >>>   }
> > >>>
> > >>>  I work with a very large volume of data. The search is processed in
> > >>> 0.300 seconds, but when the hits object has many results, the time to
> > >>> get all the objects is very long. The command hits.doc(i) takes 2
> > >>> seconds.
> > >>>
> > >>>  For example, for hits.length() equal to 25,000 results, the
> > >>> "post-search" time is 7 seconds.
> > >>>
> > >>>  I get all the results because I need to group them (remove the
> > >>> duplicate results).
> > >>>
> > >>>  Is there any way in Lucene to group the results? I need something
> > >>> like the "group by" command of SQL.
> > >>>
> > >>>  Thanks.
> > >>>
> > >>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >>> For additional commands, e-mail: [EMAIL PROTECTED]
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >
> > >
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
>
> >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>
>
>
