Re: looks like no allowing of paging without counting entire result set?

Erick Erickson Mon, 20 Jun 2011 07:13:23 -0700

<<< that if the first page took 3 seconds to come up, the second page
took 3 seconds + x seconds>>>


This is really suspicious. What exactly are you trying to do in your
process? I'm starting to suspect that Solr isn't the performance problem
here, assuming reasonably sized pages (e.g. fewer than a few thousand
results per page).

If all you're doing is matching terms (no scoring, no wildcards, and so
on), you might get some joy from TermDocs or similar.
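The TermDocs idea amounts to walking a term's postings list (the sorted doc ids that contain the term) directly, with no scoring and no top-N bookkeeping. The sketch below models that with plain Java collections rather than Lucene. The hypothetical `Postings` cursor imitates the next()/doc()/skipTo() shape of Lucene 3.x's `TermDocs`, but it is an in-memory stand-in, not the library class. Because it is only a cursor, the caller can stop at the first match and later resume from the same position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.IntPredicate;

/** In-memory model of an inverted index (term -> sorted doc ids). Not Lucene. */
class TinyIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    /** Hypothetical cursor over one term's postings, in ascending doc id order. */
    Postings termDocs(String term) {
        TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
        return new Postings(new ArrayList<>(ids));
    }

    static final class Postings {
        private final List<Integer> ids; // ascending doc ids
        private int pos = -1;            // positioned before the first posting

        Postings(List<Integer> ids) { this.ids = ids; }

        /** Move to the next posting; false once the list is exhausted. */
        boolean next() { return ++pos < ids.size(); }

        /** Doc id at the current position (only valid after next() returned true). */
        int doc() { return ids.get(pos); }

        /** Move forward to the first doc >= target beyond the current one. */
        boolean skipTo(int target) {
            while (next()) {
                if (doc() >= target) return true;
            }
            return false;
        }
    }

    /** Stream postings and stop at the first doc the predicate accepts; -1 if none. */
    static int firstMatch(Postings p, IntPredicate accept) {
        while (p.next()) {
            if (accept.test(p.doc())) return p.doc();
        }
        return -1;
    }
}
```

Nothing is scored or sorted here, so finding the record to pair up costs only as many steps as it takes to reach it, and the same cursor can be continued afterwards.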

Best
Erick

On Mon, Jun 20, 2011 at 9:44 AM, Hiller, Dean  x66079
<dean.hil...@broadridge.com> wrote:
> One more note: we hit a big performance problem in that if the first page
> took 3 seconds to come up, the second page took 3 seconds + x seconds to come
> up. This was the major problem we hit. Our client is not a web app but
> automated software, so the timing for the second page really needs to be in
> the 0 seconds + x seconds range.
>
> So deep paging may happen when there is no match in our system, because the
> automated software has to go through all the results until it pairs up the
> record that just came in.
>
> The main issue is that we have nothing to do with search; we are trying to
> use Lucene as a plain indexing library for the typical RDBMS indexing use
> cases.
>
> Dean
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, June 20, 2011 6:15 AM
> To: java-user@lucene.apache.org
> Subject: Re: looks like no allowing of paging without counting entire result set?
>
> Re: 20020101 to the end of time: use a clause like [2002-01-01 TO *].
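An open-ended range works because only the lower bound has to be located: fixed-width, zero-padded date strings sort lexicographically in chronological order, so "everything from 2002-01-01 on" is simply the tail of the sorted term dictionary. The sketch below illustrates that idea with a `TreeMap` standing in for the term dictionary. It is a conceptual model, not Lucene's range-query implementation, and the `yyyyMMdd` encoding is an assumption based on the `20020101` value in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Conceptual model of an open-ended term range [lower TO *]; not Lucene code. */
class OpenRange {
    /**
     * Collect the doc ids of every term >= lower. With no upper bound we just
     * walk from the lower bound to the end of the sorted dictionary.
     * Assumes terms are fixed-width, zero-padded dates such as "20020101".
     */
    static List<Integer> docsFrom(NavigableMap<String, List<Integer>> terms, String lower) {
        List<Integer> docs = new ArrayList<>();
        for (List<Integer> ids : terms.tailMap(lower, true).values()) {
            docs.addAll(ids);
        }
        return docs;
    }
}
```

If the dates were not zero-padded (e.g. "2002-1-5"), lexicographic order would no longer match chronological order and the tail walk would return the wrong terms.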
>
> About paging: yes, you have to start all over again for each search. The
> basic problem is that you have to score every document on each search,
> because the last document scored might be the highest-scoring one.
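The "score every document" point is easiest to see in how a top-N collector works: it keeps a bounded min-heap of the best N hits seen so far, and since any document it has not yet seen could outrank them, it must offer every match to the heap before it can answer. The sketch below is a plain-Java model of that idea, not Lucene's `TopScoreDocCollector`; scores are supplied as an array indexed by a hypothetical doc id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java model of top-N collection with a bounded min-heap. Not Lucene. */
class TopN {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Return the n best hits, best first. Every document has to be offered. */
    static List<Hit> topN(float[] scores, int n) {
        // Min-heap on score: the root is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(new Hit(doc, scores[doc]));
            if (heap.size() > n) heap.poll(); // evict the weakest once we hold more than n
        }
        // We cannot stop early: the last document offered might be the best one.
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

The heap only ever holds n entries, but the loop always runs over every matching document, which is why a deeper page (a larger n) never gets cheaper than a full pass.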
>
> But let's back up a step: can you tell us what higher-level problem you're
> trying to solve? *Why* do you want to do "deep paging"? Do you care about
> scoring the documents, or do you just want to look at all of the ones that
> match?
>
> One solution would be to use a Collector that collects as many documents as
> you ever want to return; you can then use that list to "page". But that
> requires a stateful connection, which may be appropriate for your problem...
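The Collector approach can be sketched as: run the query once, append every matching doc id to a list, then serve pages from that list for as long as the caller keeps it around, which is the stateful part. The model below is plain Java with hypothetical names (`CachedResults`, `page`) and a predicate standing in for the query. In Lucene 3.x the collection step would live in a `Collector` subclass whose `collect(int doc)` records `docBase + doc`, but that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical holder: matches are gathered once, then paged without re-running the query. */
class CachedResults {
    private final List<Integer> docs = new ArrayList<>();

    /** One pass over all docs; in Lucene this corresponds to the Collector's collect() calls. */
    CachedResults(int maxDoc, IntPredicate matches) {
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches.test(doc)) docs.add(doc);
        }
    }

    int totalHits() { return docs.size(); }

    /** Page numbers start at 0; a page past the end comes back empty. */
    List<Integer> page(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) throw new IllegalArgumentException("bad page request");
        int from = (int) Math.min((long) pageNumber * pageSize, docs.size());
        int to = Math.min(from + pageSize, docs.size());
        return Collections.unmodifiableList(docs.subList(from, to));
    }
}
```

The full pass is paid once up front; after that each page is a constant-time slice, at the price of holding the whole match list in memory between requests.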
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 2:39 PM, Hiller, Dean  x66079
> <dean.hil...@broadridge.com> wrote:
>> "It supports it like 2.9, but not using the Hits API. As described above, to
>> show results 991 to 1000 request the top-1000 results and display the last
>> 10 :-)"
>>
>> Bear with me, as I am a little confused, so let me throw some stuff down
>> here and think out loud...
>> So I basically have to request the top 100, then do another request for the
>> next 100, and so on, which seems like it would start all over from scratch
>> and be a bit of a performance hit, correct? I would think the optimal way
>> would be for search to return an object that maintains a cursor into the
>> index tree until I close it, so I can keep asking for the next 100. It
>> sounds like this new API doesn't do that? Maybe the old one didn't either,
>> but from the client's perspective I thought the Hits object might actually
>> just maintain that pointer.
>>
>> NOTE: I am not doing anything close to search, just basic column indexing
>> like an RDBMS would do for us, except we don't have an RDBMS. Our old RDBMS
>> system has scaled up to the point of being too costly (3 terabytes). We are
>> now scaling out with NoSQL and trying to replace the RDBMS before the costs
>> exceed what the customers pay us.
>>
>> BIG NOTE: I think back to Hibernate here. If you use select * from xx where
>> yyy with setMaxResults and setFirstResult(index), it gets slower and slower
>> as you page further in, BUT if you instead use ScrollableResults, it
>> maintains a cursor and the speed NEVER gets slower as you page into the
>> results.
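The Hibernate comparison is the classic difference between offset paging and cursor (keyset) paging. With an offset, each page walks past every row on the pages before it, so cost grows the deeper you go; with a cursor, each request resumes right after the last key it returned, so every page costs about the same. The plain-Java sketch below counts the rows each strategy touches over a sorted key set, with `TreeSet.tailSet` standing in for an index seek. The names are hypothetical and nothing here is Hibernate or Lucene API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;

/** Offset paging vs keyset paging over a sorted key set (illustrative model only). */
class Paging {
    static final class Page {
        final List<Long> rows = new ArrayList<>();
        int touched; // rows the strategy iterated over, skipped or returned
    }

    /** OFFSET style: step over `offset` rows from the start, then take `limit`. */
    static Page offsetPage(NavigableSet<Long> keys, int offset, int limit) {
        Page p = new Page();
        Iterator<Long> it = keys.iterator();
        for (int i = 0; i < offset && it.hasNext(); i++) { it.next(); p.touched++; }
        while (p.rows.size() < limit && it.hasNext()) { p.rows.add(it.next()); p.touched++; }
        return p;
    }

    /** Keyset style: seek strictly past the last key seen (null means first page). */
    static Page keysetPage(NavigableSet<Long> keys, Long afterKey, int limit) {
        Page p = new Page();
        // The seek itself is a log-time tree lookup and is not counted in `touched`.
        NavigableSet<Long> rest = (afterKey == null) ? keys : keys.tailSet(afterKey, false);
        Iterator<Long> it = rest.iterator();
        while (p.rows.size() < limit && it.hasNext()) { p.rows.add(it.next()); p.touched++; }
        return p;
    }
}
```

For page 10 of 100 rows, the offset version touches 1000 rows while the keyset version touches 100, and both return the same rows; the keyset version only needs the caller to remember the last key it saw.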
>>
>> Maybe I am using the wrong library, but from what I understand a lot of
>> NoSQL users of HBase are starting to use Solr. Should I perhaps be using a
>> different indexing library?
>>
>> Thanks,
>> Dean
>>
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>> Sent: Sunday, June 19, 2011 12:16 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: looks like no allowing of paging without counting entire result set?
>>
>>> I am wondering how the old Hits object, which was deprecated and removed,
>>> worked... It looks like I could stop asking it for more results, and it
>>> would work better: instead of counting all activities that matched in my
>>> 10 mil or 100 mil result set, it would just return the first 100, the
>>> second 100, and then I can cut off, which would be way more performant.
>>
>> Hits did exactly what you described before: it got as many results as
>> needed to show the nth page. So when showing the page for results 20 to 30,
>> it fetches at least 30 results.
>>
>> In general, full-text search engines only score the top results. This is,
>> e.g., one reason why Google limits the maximum page you can go to.
>>
>>> Should I just use 2.9 instead?  But then 3.x doesn't seem to support this?
>>
>> It supports it like 2.9, but not using the Hits API. As described above, to
>> show results 991 to 1000 request the top-1000 results and display the last
>> 10 :-)
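The recipe (to show results 991 to 1000, request the top 1000 and display the last 10) is just index arithmetic on top of a top-N search. A minimal sketch, assuming 1-based result positions as in the example; `topN` is a hypothetical callback standing in for whatever returns the best n hits in rank order, not a Lucene method.

```java
import java.util.List;
import java.util.function.IntFunction;

/** Page window on top of a top-N search (illustrative; `topN` is hypothetical). */
class PageWindow {
    /**
     * Return results first..last (1-based, inclusive) by requesting the top
     * `last` hits and keeping only the tail of that list.
     */
    static <T> List<T> window(IntFunction<List<T>> topN, int first, int last) {
        if (first < 1 || last < first) throw new IllegalArgumentException("need 1 <= first <= last");
        List<T> top = topN.apply(last);             // e.g. top 1000 for results 991..1000
        int end = Math.min(last, top.size());       // fewer hits may exist than requested
        int from = Math.min(first - 1, end);        // convert 1-based start to 0-based
        return top.subList(from, end);
    }
}
```

Each call requests `last` hits, so the work per page grows with how deep the page is, even though only the final `last - first + 1` entries are displayed.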
>>
>> Uwe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> This message and any attachments are intended only for the use of the 
>> addressee and
>> may contain information that is privileged and confidential. If the reader 
>> of the
>> message is not the intended recipient or an authorized representative of the
>> intended recipient, you are hereby notified that any dissemination of this
>> communication is strictly prohibited. If you have received this 
>> communication in
>> error, please notify us immediately by e-mail and delete the message and any
>> attachments from your system.
>>
>
