Uwe Schindler wrote:
I have no problem with it! Thanks!
What I would like to see fixed before moving it to core is the fact that
an additional helper field is needed for the trie values. If everything
could be in one field and the field were still sortable, it would be fine.
For that, the order of terms in the FieldCache should be fixed. As the
highest-precision trie fields currently order before all lower-precision
fields, the simplest fix would be to store only the first term from the
TermEnum at the document's index in the FieldCache.
Another way would be to just invert the order and let the higher-precision
fields appear last in the TermEnum. Both would be possible, but there
should be a clear statement of which term of a multi-term field is put
into the FieldCache (maybe configurable). See LUCENE-1372 for that.
Though, won't this make loading the field cache more costly since
you'll iterate through many more terms?
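The "first term only" idea, and the iteration cost raised above, can be
sketched in plain Java. This is only an illustration under stated
assumptions: a TreeMap stands in for walking TermEnum/TermDocs, the term
texts ("0:" for full precision, "1:" for a lower precision) are made up,
and FirstTermCacheSketch and fillCache are hypothetical names, not Lucene
API. Note that the loop still visits every term, including the
lower-precision ones that are never stored.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for filling a FieldCache-style array.
// Nothing here is Lucene API; all names are hypothetical.
final class FirstTermCacheSketch {

    // terms: term text in sorted order -> ids of the documents containing it
    // (standing in for a walk over TermEnum and TermDocs).
    // Returns one term per document: the first one met in term order.
    static String[] fillCache(TreeMap<String, int[]> terms, int maxDoc) {
        String[] cache = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : terms.entrySet()) {
            for (int doc : entry.getValue()) {
                if (cache[doc] == null) {   // later terms for this doc are skipped
                    cache[doc] = entry.getKey();
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> terms = new TreeMap<>();
        // Hypothetical term texts: full precision sorts before lower precision.
        terms.put("0:0a", new int[] {1});
        terms.put("0:12", new int[] {0});
        terms.put("1:0", new int[] {1});
        terms.put("1:1", new int[] {0});
        String[] cache = fillCache(terms, 2);
        System.out.println(cache[0] + " " + cache[1]); // prints "0:12 0:0a"
    }
}
```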
If all terms could be in one field, the API to TrieRange could be simpler
and easier on the GC. The trieCodeLong/Int() method would just return a
TokenStream that can be indexed using "new Field(Name,TokenStream)",
making better use of the Token's char buffer during trie encoding (it
could be reused). This is how it is done by Solr at the moment (but with
the additional allocation of the array). I do not like the array
allocations for each term and the whole trie encoding as it is at the
moment (1x char[], 1x String[], additional copying, ...).
I agree it'd be awesome to have a less GC costly translation
during indexing.
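A minimal sketch of the buffer-reuse idea, in plain Java with no Lucene
dependency. The encoding here is invented for illustration (one marker
char for the shift, then 7 bits per char) and is not the format used by
TrieRange or Solr; TrieBufferSketch, TermSink, encode and encodeAll are
hypothetical names. The point is only that one char[] is allocated up
front and every precision is written into it in turn, the way a
TokenStream could refill a Token's buffer.

```java
// Plain-Java sketch of reusing one char buffer across trie terms.
// The encoding is hypothetical and is NOT Lucene's or Solr's format.
final class TrieBufferSketch {

    // 1 marker char + up to 10 payload chars (64 bits, 7 bits per char).
    static final int MAX_LEN = 11;

    // Receives each encoded term; the buffer content is only valid during the call.
    interface TermSink {
        void accept(char[] buffer, int length);
    }

    // Writes value at the given shift (0..63) into buffer, returns the length used.
    static int encode(long value, int shift, char[] buffer) {
        // Flip the sign bit so char-wise order matches signed long order.
        long bits = (value ^ Long.MIN_VALUE) >>> shift;
        int payloadChars = (63 - shift) / 7 + 1;
        buffer[0] = (char) (' ' + shift);          // marker: smaller shift sorts first
        for (int i = payloadChars; i >= 1; i--) {  // fill from the last char backwards
            buffer[i] = (char) (bits & 0x7f);
            bits >>>= 7;
        }
        return payloadChars + 1;
    }

    // Emits one term per precision level; precisionStep must be positive.
    static void encodeAll(long value, int precisionStep, char[] buffer, TermSink sink) {
        for (int shift = 0; shift < 64; shift += precisionStep) {
            sink.accept(buffer, encode(value, shift, buffer));
        }
    }

    public static void main(String[] args) {
        char[] buffer = new char[MAX_LEN];   // allocated once, reused for every term
        int[] count = {0};
        encodeAll(5L, 8, buffer, (buf, len) -> count[0]++);
        System.out.println(count[0] + " terms, one buffer"); // prints "8 terms, one buffer"
    }
}
```

A consumer that needs to keep a term has to copy it out of the buffer
before the next call, which is the usual trade-off of this kind of reuse.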
I would be happy to have it in core; I could prepare the patch once the
above is fixed!
OK.
Mike