<<< that if the first page took 3 seconds to come up, the second page took 3 seconds + x seconds>>>
This is really suspicious, what all are you trying to do in your
process? Because I'm starting to guess that Solr isn't the performance
problem here, assuming reasonably-sized pages (e.g. < thousands). If all
you're doing is matching terms, not scoring, using wildcards, and all
that, you might get some joy from TermDocs or similar.

Best
Erick

On Mon, Jun 20, 2011 at 9:44 AM, Hiller, Dean x66079
<dean.hil...@broadridge.com> wrote:
> One more note: We hit a big performance problem in that if the first page
> took 3 seconds to come up, the second page took 3 seconds + x seconds to come
> up....this was the major problem we hit. Our client is not a web app but
> automated software so the timings on the second page really need to be in the
> 0 seconds + x seconds range.
>
> So, deep paging may happen if there are no matches in our system as the
> automated software has to go through all results until it pairs up the record
> that just came in.
>
> Main issue is we have nothing to do with search and are trying to use lucene
> as a plain indexing library for those typical rdbms indexing use-cases that
> you have.
>
> Dean
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, June 20, 2011 6:15 AM
> To: java-user@lucene.apache.org
> Subject: Re: looks like no allowing of paging without counting entire result
> set?
>
> re: 20020101 to the end of time.. Use a clause like [2002-01-01 TO *]
>
> About paging... Yes, you have to start all over again for each search. The
> basic problem is that you have to score every document each search, the last
> document scored might be the highest-scoring document.
>
> But let's back up a step, can you tell us what the higher-level problem
> you're trying to solve is? *Why* do you want to do "deep paging"? Do you care
> about scoring the documents or do you just want to look at all of them that
> match?
>
> One solution would be to use a Collector that collected as many documents as
> you ever want to return and then you can use that list to "page". But
> that requires a stateful connection, which may be appropriate to your
> problem...
>
> Best
> Erick
>
> On Sun, Jun 19, 2011 at 2:39 PM, Hiller, Dean x66079
> <dean.hil...@broadridge.com> wrote:
>> "It supports it like 2.9, but not using the Hits API. As described above, to
>> show results 991 to 1000 request the top-1000 results and display the last
>> 10 :-)"
>>
>> Bear with me as I am a little confused so let me throw some stuff down here
>> and think out loud...
>> So, I basically have to request the top 100, then do another request for the
>> next 100, etc. etc. which seems like that would start all over from scratch
>> and be a bit of a performance hit, correct? I would think the optimal way
>> would be search returns an object which maintains a cursor into the index
>> tree until I close it so I can keep asking for the next 100. It sounds like
>> this new api doesn't do that? And maybe the old one didn't either but from
>> client perspective, I thought the Hits object might actually just maintain
>> that pointer.
>>
>> NOTE: I am not doing anything close to search. Just basic column indexing
>> like an RDBMS would do for us except we don't have an RDBMS. Our old RDBMS
>> system has scaled up to being too costly (3 terabytes). We are now scaling
>> out with noSQL and trying to replace the RDBMS before the costs start to be
>> more than the customers pay us.
>>
>> BIG NOTE: I think back to hibernate here where if you use select * from xx
>> where yyy and setMaxResults and setFirstResult(index), it gets slower and
>> slower as you page further in, BUT if you instead use the ScrollableResults,
>> it maintains a cursor and the speed NEVER gets slower as you page into the
>> results.
>>
>> Maybe I am using the wrong library but there are a lot of noSQL users of
>> Hbase starting to use SOLR from what I understand.
>> Should I be using a different indexing library perhaps?
>>
>> Thanks,
>> Dean
>>
>>
>> -----Original Message-----
>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>> Sent: Sunday, June 19, 2011 12:16 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: looks like no allowing of paging without counting entire result
>> set?
>>
>>> I am wondering how the old Hits object worked that was deprecated and
>>> removed....that looks like I could stop asking it for more results and it
>>> would work better not counting all activities that matched in my 10 mil or
>>> 100 mil result set and just returning the first 100, second 100 and then I
>>> can cut off which would be way more performant.
>>
>> Hits did exactly what you described before. It got as many results as needed
>> to show the nth page. So when showing the page for results 20 to 30, it
>> fetches at least 30 results.
>>
>> In general Full Text Search engines are only scoring the top results. This
>> is e.g. one reason why Google limits the maximum page you can go to.
>>
>>> Should I just use 2.9 instead? But then 3.x doesn't seem to support this?
>>
>> It supports it like 2.9, but not using the Hits API. As described above, to
>> show results 991 to 1000 request the top-1000 results and display the last
>> 10 :-)
>>
>> Uwe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> This message and any attachments are intended only for the use of the
>> addressee and may contain information that is privileged and confidential.
>> If the reader of the message is not the intended recipient or an authorized
>> representative of the intended recipient, you are hereby notified that any
>> dissemination of this communication is strictly prohibited.
>> If you have received this communication in error, please notify us
>> immediately by e-mail and delete the message and any attachments from your
>> system.
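
To make the trade-off discussed in this thread concrete, here is a minimal
sketch in plain Java. It is not Lucene code: it assumes a sorted in-memory set
of keys stands in for a column index, and the names PagingSketch, pageByOffset
and pageAfter are hypothetical, invented only for this illustration. The first
method mirrors the "request the top-1000 results and display the last 10"
approach; the second resumes after the last key already seen, roughly what a
cursor would give an RDBMS-style index scan where no scoring is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical illustration only; none of these names come from Lucene. */
final class PagingSketch {

    /** Offset paging: collect the top (offset + size) entries, keep the last `size`. */
    static List<Long> pageByOffset(NavigableSet<Long> index, int offset, int size) {
        List<Long> top = new ArrayList<>();
        for (Long key : index) {                 // restarts from the first entry on every call
            if (top.size() == offset + size) break;
            top.add(key);
        }
        int from = Math.min(offset, top.size()); // an offset past the end yields an empty page
        return new ArrayList<>(top.subList(from, top.size()));
    }

    /** Cursor paging: resume strictly after the last key the caller has already seen. */
    static List<Long> pageAfter(NavigableSet<Long> index, Long lastSeen, int size) {
        // A null cursor means "start at the beginning" (an assumption of this sketch).
        NavigableSet<Long> rest = (lastSeen == null) ? index : index.tailSet(lastSeen, false);
        List<Long> page = new ArrayList<>();
        for (Long key : rest) {
            if (page.size() == size) break;
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Long> index = new TreeSet<>();
        for (long i = 1; i <= 1000; i++) index.add(i);

        // Entries 991..1000 via offset paging: walks 1000 entries to return 10.
        System.out.println(pageByOffset(index, 990, 10));
        // The same page via a cursor: seeks to 990 and reads the next 10.
        System.out.println(pageAfter(index, 990L, 10));
    }
}
```

With offset paging the work per page grows with the page number, which matches
the "3 seconds + x seconds" behaviour described above. The cursor variant only
helps when results come back in a fixed key order, so it does not apply as-is
to relevance-scored search.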