Sorry, yes, this was my fault with the indexing speedups in 2.3 (LUCENE-843): as of 2.3, if any fields have term vectors enabled, the fields are sorted lexicographically. As of 2.4 (LUCENE-1301, refactoring the indexing core), that sort happens even without term vectors.
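To illustrate why a name sort hurts a selector that breaks early, here is a minimal toy model in plain Java. It is not Lucene code: the class, the method names and the field names are all hypothetical. It only assumes that stored fields are visited in the order they were written and that the selector stops once the last needed field has been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (not Lucene code): counts fields visited before an early break. */
public class FieldOrderSketch {

    /**
     * Visits fields in the given order and stops once every wanted field has
     * been seen, mimicking a selector that breaks after its last needed field.
     * Returns the number of fields visited.
     */
    static int fieldsVisited(List<String> storedOrder, Set<String> wanted) {
        Set<String> remaining = new HashSet<>(wanted);
        int visited = 0;
        for (String name : storedOrder) {
            visited++;
            remaining.remove(name);
            if (remaining.isEmpty()) {
                break; // the LOAD_AND_BREAK point in this model
            }
        }
        return visited;
    }

    /** Returns a copy sorted by name, modelling the sort described in this thread. */
    static List<String> sortedCopy(List<String> additionOrder) {
        List<String> copy = new ArrayList<>(additionOrder);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical field names; the four needed fields are added first.
        List<String> added = Arrays.asList(
                "title", "url", "summary", "price", "body", "author", "category");
        Set<String> wanted = new HashSet<>(
                Arrays.asList("title", "url", "summary", "price"));

        System.out.println("addition order: " + fieldsVisited(added, wanted));
        System.out.println("sorted order:   " + fieldsVisited(sortedCopy(added), wanted));
    }
}
```

With these made-up names the model visits 4 fields in addition order but all 7 once the names are sorted, because "url" sorts last. Renaming the needed fields so they sort first (the 'aaa..' workaround mentioned in the quoted mail) would restore the early break in this model.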
Hoss, I see you've opened an issue for this (LUCENE-1727); I'll take that & fix for 2.9.

Sorry,

Mike

On Tue, Jun 30, 2009 at 9:20 PM, Mark Miller<markrmil...@gmail.com> wrote:
> Yeah, I've heard rumblings about this issue before. I can't remember what
> patch changed it though - one of Mike M's I think?
>
> On Tue, Jun 30, 2009 at 8:40 PM, Chris Hostetter
> <hossman_luc...@fucit.org>wrote:
>
>>
>> Hmmm... i'm not an expert on the internals of indexing, and i don't use
>> FieldSelectors much, but this seems like a pretty big bug to me ... or at
>> the very least: a change in behavior that completely eliminates the value
>> of LOAD_AND_BREAK.
>>
>> https://issues.apache.org/jira/browse/LUCENE-1727
>>
>>
>> : The Lucene FAQ says...
>> :
>> : What is the order of fields returned by Document.fields()?
>> : * Fields are returned in the same order they were added to the document.
>> : (now getFields() as fields is deprecated)
>> :
>> : However I think this may no longer be the case in 2.4
>> :
>> : We are indexing documents in a specific order so that we can
>> : LOAD_AND_BREAK out of our FieldSelector as early as possible.
>> : i.e. we have typically 50 indexed fields for a document, but when we are
>> : loading results with .doc(), we know we only need 4 of them.
>> :
>> : So, our code ensures that these are added to the index first - and once
>> : the 4th field is loaded we break out of the selector.
>> :
>> : This speeds us up by an order of magnitude.
>> :
>> : However, we are finding that our field selector is processing fields in
>> : alphabetical order, not order of addition. This means that we'd have to
>> : rename our fields to 'aaa..' in order to guarantee they'd be processed
>> : first.
>> :
>> : I think, but am not sure, that this bit of code causes the problem (as
>> : spotted in
>> : http://www.mail-archive.com/java-user@lucene.apache.org/msg24105.html).
>> : It seems to have been introduced in version 2.4 (fields are in addition
>> : order in 2.3.2)
>> :
>> : DocFieldProcessorPerThread.java:
>> :
>> :     // If we are writing vectors then we must visit
>> :     // fields in sorted order so they are written in
>> :     // sorted order. TODO: we actually only need to
>> :     // sort the subset of fields that have vectors
>> :     // enabled; we could save [small amount of] CPU
>> :     // here.
>> :     quickSort(fields, 0, fieldCount-1);
>> :
>> : This appears to sort fields into alphabetical order.
>> :
>> : Assuming that implementing the TODO would keep them in order of addition
>> : (and just keep vectors fields themselves sorted) - is it worth raising a
>> : JIRA to fix this ?
>> :
>> : regards,
>> :
>> : matt
>>
>> -Hoss
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org