On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote:
> Hi -- has there been any effort to create a numerical representation of
> Lucene indices? That is, to use the Lucene Directory backend as a large
> term-document matrix at index level. As this would require a bijective
> mapping between terms (per-field, as customary in Lucene) and a numerical
> index (integer, monotonic from 0 to numTerms()-1), I guess this requires
> some special modifications to the Lucene core.

Maybe you're thinking of something like TermsEnum?
https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/index/TermsEnum.html

It provides ordinal access to terms, with the ordinals represented as longs.
To get access at index level rather than segment level, you will have to
merge the ordinals from the different segments. Unfortunately, support for
ordinal-based term access is optional for a codec, and the default codec
does not provide it, so you will have to explicitly select a codec that
does when you build your index.
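
To illustrate what merging the per-segment ordinals involves, here is a rough sketch in plain Java. It deliberately does not call the Lucene API: the class name SegmentOrdinalMerger, the OrdinalMap type, and the sorted string lists standing in for per-segment term dictionaries are all made up for illustration. It assumes each segment's terms are already sorted and unique, and it compares terms with String.compareTo, which is not necessarily the order a real codec uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: k-way merge of per-segment term lists into index-level ordinals. */
public class SegmentOrdinalMerger {

    /** Global sorted term list plus, per segment, a table from segment ordinal to global ordinal. */
    public static final class OrdinalMap {
        public final List<String> globalTerms = new ArrayList<>();
        public final long[][] segmentToGlobal;

        OrdinalMap(int numSegments) {
            segmentToGlobal = new long[numSegments][];
        }
    }

    /** Position inside one segment's sorted term list. */
    private static final class Cursor {
        final int segment;
        int ord = 0;

        Cursor(int segment) {
            this.segment = segment;
        }
    }

    /** Assumes every inner list is sorted ascending and contains no duplicates. */
    public static OrdinalMap merge(List<List<String>> segments) {
        OrdinalMap map = new OrdinalMap(segments.size());

        // The queue always yields the cursor pointing at the smallest unread term.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                Math.max(1, segments.size()),
                (a, b) -> segments.get(a.segment).get(a.ord)
                        .compareTo(segments.get(b.segment).get(b.ord)));

        for (int s = 0; s < segments.size(); s++) {
            map.segmentToGlobal[s] = new long[segments.get(s).size()];
            if (!segments.get(s).isEmpty()) {
                queue.add(new Cursor(s));
            }
        }

        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            String term = segments.get(c.segment).get(c.ord);

            // A term shared by several segments gets a single global ordinal.
            int last = map.globalTerms.size() - 1;
            if (last < 0 || !map.globalTerms.get(last).equals(term)) {
                map.globalTerms.add(term);
            }
            map.segmentToGlobal[c.segment][c.ord] = map.globalTerms.size() - 1;

            // Advance this segment's cursor and re-queue it if terms remain.
            c.ord++;
            if (c.ord < segments.get(c.segment).size()) {
                queue.add(c);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        OrdinalMap map = merge(Arrays.asList(
                Arrays.asList("apple", "cherry"),
                Arrays.asList("banana", "cherry", "date")));
        System.out.println(map.globalTerms);
        System.out.println(Arrays.toString(map.segmentToGlobal[0]));
        System.out.println(Arrays.toString(map.segmentToGlobal[1]));
    }
}
```

The resulting global ordinals run from 0 to the number of distinct terms minus one, which is the bijective term-to-integer mapping asked about. With real segments you would read the terms from each segment's term dictionary instead of from in-memory lists.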
