Is your entire corpus a single document? Because I'm having trouble imagining a single document where this would be a problem, unless your increment gap is huge. The term positions are relative to a single document...
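
To put rough numbers on that, here is a back-of-the-envelope sketch in plain Java (no Lucene calls involved). The class and method names are made up for illustration. The inputs are assumptions based on the figures in this thread: 3 levels, at most 1,000 entries per level, a position increment gap of 100, and one gap charged per entry as a worst case.

```java
// Hypothetical helper, not part of Lucene: estimates how many token
// positions the increment gaps alone consume within ONE document.
final class PositionBudget {

    // Positions consumed by gaps, assuming one gap per entry (worst case).
    // Uses long arithmetic so the estimate itself cannot overflow an int.
    static long gapPositions(int levels, int entriesPerLevel, int gap) {
        return (long) levels * entriesPerLevel * gap;
    }

    // Positions still available for real tokens before a document's
    // position counter would reach Integer.MAX_VALUE.
    static long remaining(int levels, int entriesPerLevel, int gap) {
        return Integer.MAX_VALUE - gapPositions(levels, entriesPerLevel, gap);
    }

    public static void main(String[] args) {
        // Assumed figures: 3 levels, 1,000 entries each, gap of 100.
        System.out.println("gap positions: " + gapPositions(3, 1000, 100));
        System.out.println("remaining:     " + remaining(3, 1000, 100));
    }
}
```

With those inputs the gaps account for 300,000 positions, which leaves a bit over 2.1 billion positions per document for actual tokens.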

You say that your levels have fewer than 1,000 elements each. With an increment gap of 100, the increment gap "holes" add up to only about 300,000 positions in total, so you've got room for, uhhhhmm, a lot more tokens per document. If you're running over that limit, the increment gap is the least of your problems <G>...

Of course I may be missing the point completely...

Erick

On Mon, Mar 15, 2010 at 5:03 AM, Rene Hackl-Sommer <[email protected]> wrote:

> Hello,
>
> I am working at a use case that is very demanding regarding the number of
> token positions. For one special field in the index, I need to represent
> different hierarchy levels, like this:
>
> <MyField>
> <Level_1>
> <Level_2>
> <Level_3>
>
> Please note that I need to do this with Lucene, not a XML search engine.
>
> Now, on Level_3 there are hundreds of tokens, Level_2 also has hundreds of
> entries and Level_1 is in there with a low 3-digit figure. For those who
> wish to know: this is an intricate system of chemical entities and some of
> their properties.
>
> I need this information to be searchable in all conceivable ways. What I am
> doing right now is use position increment gaps to separate the Levels and
> search with SpanQueries. It works like a charm for a setup with limited
> entries. But Integer.MAX_VALUE poses a cap on the approach, of course. Would
> it be thinkable to replace the current integer counting system with a long
> based system? What issues should I consider?
>
> Thanks,
> Rene
