In Solr 4.7 an exciting new feature (cursorMark) was added that allows one to page through a complete result set without having to worry about missing or duplicated results at page boundaries, while keeping resource utilization low.
I have a common use case with similar performance and consistency problems that could be solved by extending the way cursorMarks work:

A. The user executes a search and obtains thousands of results, of which he sees the first 'page'. Apart from scrolling through the list, he also has a scrollbar (or paging controls) to jump to anywhere in the list.
B. The user uses the scrollbar to jump to an arbitrary place in the list.
C. The user scrolls down a bit (but past the current 'page') to find what he's looking for.
D. The user realizes he's too far down and scrolls up a bit again (but before the current 'page' again...)

(Yes, I know that users should be educated to refine their search, but unfortunately, if the client for which the application is developed specifies that it should be possible to use it this way...)

For the moment this is implemented by using the start/rows parameters to get the appropriate 'page', and this has the disadvantages that cursorMark solves:
- Solr (actually I use Lucene directly, but that doesn't matter here) needs to store *all* documents up to document (start+rows) to be able to return just the rows requested. Except for step A (where start==0), this may be a huge performance hit.
- If the index is modified concurrently (especially when using NRT), jumping to the next/previous page can cause documents to be repeated or skipped at page boundaries (as explained in https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)

Here's the way an extension to the cursorMark system could solve the problem:

A. Solr/Lucene executes the search and returns the total number of hits and the requested number of top documents: start=0, rows=n, cursorMark=*
B. start=x, rows=n, cursorMark=*: Here Solr should allow combining both start!=0 and cursorMark=*.
   It should execute a normal request using start=x and rows=n and add two cursorMarks: one corresponding to the sort values of the first document and one corresponding to the sort values of the last document.
C. Use cursorMark to get the 'next' pages: This is the same way cursorMark works at the moment: the user passes the cursorMark corresponding to the sort values of the last document.
D. Use the cursorMark corresponding to the sort values of the first document to get the 'previous' pages.

In terms of implementing these changes, I've been looking at the source code and already did the easy ones :)
- If a cursorMark is passed (either cursorMark=* or a 'real' value), Solr should return two cursorMarks in the result: nextCursorMark as before and prevCursorMark corresponding to the sort values of the first document. Done.
- start!=0 and cursorMark=* should no longer be mutually exclusive (but start!=0 and cursorMark!=* should). Done.
- When returning a result using a cursorMark, the start value returned should correspond to the actual position of the first document in the full result set. For the next page, this equals the number of documents skipped during processing, but unfortunately I didn't see a way (yet) to pass that information along everywhere. This start value, together with the (possibly changed) numFound value, can be used in the GUI to adjust the position of the scrollbar or the paging controls accordingly without having to estimate it.
- Implementing reverse paging could actually be easier than it sounds: internally reverse the sort order (really reversing, not just reversing ASC/DESC!), use the cursor as in the normal case, and afterwards reverse the obtained list of documents. I've updated PagingFieldCollector in TopFieldCollector.java by negating the values in reverseMul and overriding topDocs(start, howMany), but I still have to check everywhere partial results are merged as well...
- Implement as many test cases for the paging-up case as exist for the paging-down case (help! :)

While working on the code, I thought of another use case as well: refreshing the current page. Instead of passing the same start value again, the prevCursorMark could be passed, but with a hint that the documents on or after this cursorMark should be returned.

Which brings me to the question of how to specify the new behavior to Solr without affecting the current behavior. I propose that prevCursorMark and nextCursorMark simply encode the sort values for the first and last document (as nextCursorMark does now) and that a simple prefix is used when cursorMark should be used differently:

">":  documents after the cursor position: use with nextCursorMark to get the next page of results
">=": documents after or on the cursor position: use with prevCursorMark to refresh the same page, keeping the same sort position for the first document
"<":  documents before the cursor position: use with prevCursorMark to get the previous page of results
"<=": documents before or on the cursor position: use with nextCursorMark to get the same page, keeping the same sort position for the last document (for completeness, useful?)

So if prevCursorMark was "ABC" and nextCursorMark was "DEF",
- "<ABC" would return the previous page
- ">DEF" or "DEF" would return the next page
- ">=ABC" would return the same page (but with 'fresh' values/documents), keeping the 'visual' position the same

I'd appreciate any comments on this, or to hear whether anyone else has already started work on similar changes. In the meantime I'll continue working on what I have and check how I can make my changes available (through a patch attached to a new issue in Jira?)

Luc Vanlerberghe
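
P.S. To make the proposed prefix semantics a bit more concrete, here is a minimal sketch in plain Java. It is a toy model only, not Solr or Lucene code: the class CursorPrefixSketch and the method page are hypothetical names, short strings stand in for the encoded sort values, and a simple scan over an in-memory list stands in for the collector. The "<" and "<=" cases use the trick described above: reverse the sort order, collect as in the normal case, then reverse the collected list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Toy model of the proposed cursorMark prefixes (hypothetical, not Solr/Lucene code). */
class CursorPrefixSketch {

    /**
     * Returns up to `rows` keys relative to a prefixed cursor.
     * `sortedKeys` stands in for the index in sort order; keys are assumed
     * unique, as the uniqueKey tie-breaker guarantees for real cursors.
     */
    static List<String> page(List<String> sortedKeys, String cursorMark, int rows) {
        // Split the proposed prefix from the encoded sort values.
        int prefixLen = 0;
        if (cursorMark.startsWith(">=") || cursorMark.startsWith("<=")) {
            prefixLen = 2;
        } else if (cursorMark.startsWith(">") || cursorMark.startsWith("<")) {
            prefixLen = 1;
        }
        // No prefix means ">", i.e. the current cursorMark behaviour.
        String op = prefixLen == 0 ? ">" : cursorMark.substring(0, prefixLen);
        String mark = cursorMark.substring(prefixLen);

        boolean backwards = op.startsWith("<");
        boolean inclusive = op.endsWith("=");

        // Paging up: reverse the sort order itself so the same
        // "collect the first rows keys after the cursor" loop can be reused.
        Comparator<String> order = backwards
                ? Comparator.<String>reverseOrder()
                : Comparator.<String>naturalOrder();
        List<String> view = new ArrayList<>(sortedKeys);
        view.sort(order);

        List<String> result = new ArrayList<>();
        for (String key : view) {
            if (result.size() >= rows) {
                break;
            }
            int cmp = order.compare(key, mark);
            if (cmp > 0 || (inclusive && cmp == 0)) {
                result.add(key);
            }
        }
        // A backwards page was collected in reversed order; restore the
        // caller's sort order before returning it.
        if (backwards) {
            Collections.reverse(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("A", "B", "C", "D", "E", "F", "G", "H");
        // Current page is [C, D, E]: prevCursorMark = "C", nextCursorMark = "E".
        System.out.println(page(index, ">E", 3));  // next page: [F, G, H]
        System.out.println(page(index, "<C", 3));  // previous page: [A, B]
        System.out.println(page(index, ">=C", 3)); // refresh: [C, D, E]
        System.out.println(page(index, "<=E", 3)); // same page, anchored at the end: [C, D, E]
    }
}
```

Note that the previous page comes back short ([A, B]) because only two documents precede the cursor. In a real GUI this is one of the cases where the returned start value would be needed to reposition the scrollbar.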