: I'm with you now. So you do seeks in your comparator. For a large index you : might as well use java.io.RandomAccessFile for the "array", because there : would be little value in buffering when the comparator is liable to jump all
yep .. that's what i was getting at ... but i'm not so sure that buffering won't be usefull. I've i'm not mistaken, all Scorers are by contract expected to score docs in docId order so when your hits are being collected for sorting, you should allways be moving forward in the file -- but you may skip ahead alot when the result set isn't a high percentage of the total number of docs. (i may be wrong about all Scorers going in docId order ... if you explicilty use the 1.4 BooleanScorer you may not get that behavior, but i think everything else works that way ... perhaps someone else can verify that) : around the file. This sounds very expensive, though. If you don't open a : Searcher to frequently, it makes sense (in my muddled mind) to pre-sort to : reduce the number of seeks. That was the half-baked idea of the third file, : which essentially orders document IDs. presort on what exactly, the field you want to sort on? -- That's esentially what the TermEnum is. I'm not sure how having that helps you ... let's assume you've got some data structure (let's not worry about the file/ram or TermEnum distinction just yet) containing every document in your index of 100,000,000 products sorted on the price field, and you've done a search for "apple" and there are 1,000,000 docIds for matching products ready to be collected by your new custom Scoring code ... how does the full list of all docIds sorted by price help you as you are given docIds and have to decide if that doc is better or worse then the docs you've already collected? : > Bear in mind, there have been some improvements recently to the ability to : grab individual stored fields per document.... : : I can't see anything like that in 2.0. Is that something in the Lucene HEAD : build? I guess so ... search the java-dev archives for "lazy field loading" or "Fieldable" .. that should find some of the discussion about it and the jira issue with the changes. -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]