Uwe Schindler wrote:

I have no problem with it! Thanks!

What I would like to see fixed before moving it to core is the fact that an additional helper field is needed for the trie values. If everything could be in one field and the field were still sortable, it would be fine. For that, the order of terms in the FieldCache should be fixed. As currently the trie fields
of highest precision order before all other lower-precision fields, the
simplest fix would be to put only the first term from TermEnum into
the FieldCache at the document's index.

Another way would be to just invert the order and let the higher-precision fields appear last in the TermEnum. Both would be possible, but there should be a clear statement of which term of a multi-term field is put into
FieldCache (maybe configurable). See LUCENE-1372 for that.

Though, won't this make loading the field cache more costly since
you'll iterate through many more terms?
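
To make the "first term wins" idea and the cost question concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: TermPostings and fillCache are hypothetical stand-ins for a TermEnum/TermDocs walk, and the term strings are made up for illustration. It assumes the full-precision terms are enumerated before the lower-precision ones, as described above.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstTermFieldCacheSketch {
    /** Hypothetical stand-in for one enumerated term plus the documents that contain it. */
    static final class TermPostings {
        final String term;
        final int[] docs;

        TermPostings(String term, int[] docs) {
            this.term = term;
            this.docs = docs;
        }
    }

    /**
     * Fills a per-document cache, keeping only the first term seen for each
     * document. Later (lower-precision) terms for the same document are ignored.
     */
    static String[] fillCache(List<TermPostings> termsInEnumOrder, int maxDoc) {
        String[] cache = new String[maxDoc];
        // Note: the loop still visits every term and every posting,
        // including the lower-precision ones that never get stored.
        for (TermPostings tp : termsInEnumOrder) {
            for (int doc : tp.docs) {
                if (cache[doc] == null) {
                    cache[doc] = tp.term;
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        List<TermPostings> terms = new ArrayList<>();
        // Full-precision terms come first in this assumed ordering.
        terms.add(new TermPostings("full:0005", new int[] {0}));
        terms.add(new TermPostings("full:0009", new int[] {2}));
        terms.add(new TermPostings("full:0012", new int[] {1}));
        // Lower-precision terms follow and are skipped by fillCache.
        terms.add(new TermPostings("low:000", new int[] {0, 2}));
        terms.add(new TermPostings("low:001", new int[] {1}));

        String[] cache = fillCache(terms, 3);
        for (int doc = 0; doc < cache.length; doc++) {
            System.out.println("doc " + doc + " -> " + cache[doc]);
        }
    }
}
```

In this sketch the cache ends up holding only full-precision terms, but the loop still walks the lower-precision terms and their postings, which is the extra loading cost raised above. Stopping early would need a reliable way to detect the first lower-precision term, which this sketch does not attempt.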

If all terms could be in one field, the API to TrieRange could be simpler and easier on the GC. The trieCodeLong/Int() method would just
return a TokenStream that can be indexed using "new
Field(Name,TokenStream)", making better use of the Token's char buffer during trie encoding (it could be reused). This is how it is done by Solr at the moment (but with the additional allocation of the array) - I do not like
the array allocations for each term and the whole trie-encoding at the
moment (1x char[], 1x String[], additional copying,...).

I agree it'd be awesome to have a less GC costly translation
during indexing.
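
As a rough illustration of the buffer-reuse idea, here is a sketch in plain Java that writes every precision level of one long value into a single reused char[]. This is not the actual TrieUtils encoding and not a TokenStream: the term layout (one shift-marker char followed by fixed-width hex digits), the TermConsumer callback and the class name are all assumptions made for the example. Sign handling for negative values is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class TrieBufferReuseSketch {
    /** Hypothetical callback; the buffer contents are only valid during the call. */
    interface TermConsumer {
        void accept(char[] buffer, int length);
    }

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // One shift-marker char plus up to 16 hex digits for a 64-bit value.
    // Allocated once and reused for every term and every value.
    private final char[] buffer = new char[17];

    /** Emits one prefix term per precision level, from full precision downwards. */
    void encode(long value, int precisionStep, TermConsumer consumer) {
        if (precisionStep < 1 || precisionStep > 64) {
            throw new IllegalArgumentException("precisionStep must be in 1..64");
        }
        for (int shift = 0; shift < 64; shift += precisionStep) {
            long prefix = value >>> shift;
            // Number of hex digits needed for the remaining (64 - shift) bits.
            int digits = (64 - shift + 3) / 4;
            // Assumed marker: smaller shifts (higher precision) sort first.
            buffer[0] = (char) ('0' + shift);
            for (int i = digits; i >= 1; i--) {
                buffer[i] = HEX[(int) (prefix & 0xF)];
                prefix >>>= 4;
            }
            consumer.accept(buffer, 1 + digits);
        }
    }

    public static void main(String[] args) {
        TrieBufferReuseSketch encoder = new TrieBufferReuseSketch();
        List<String> terms = new ArrayList<>();
        // Copying into a String here is only for printing; an indexer could
        // consume the chars directly and avoid the per-term allocation.
        encoder.encode(0x0123456789ABCDEFL, 16, (buf, len) -> terms.add(new String(buf, 0, len)));
        for (String term : terms) {
            System.out.println(term);
        }
    }
}
```

The point of the design is that the encoder itself allocates nothing per term or per value; whether a term gets copied is left to the consumer.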

I would be happy to have it in core; I could prepare the patch once the
above is fixed!

OK.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
