Is your entire corpus a single document? Because I'm having trouble
imagining a single document where this would be a problem, unless
your increment gap is huge. Term positions are counted per document,
not across the whole index...
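For anyone following along, here is a rough illustration of why only the per-document gap size matters. It is plain Java rather than Lucene code, so it runs without dependencies. It assumes Lucene's behavior of adding the analyzer's position increment gap (`Analyzer.getPositionIncrementGap(String)`) between successive values of the same field within one document. The names `LevelPositions`, `assign` and `withinSlop` are made up for this sketch, and `withinSlop` only approximates an unordered span "near" match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Lucene code: assigns positions to the tokens of
// a multi-valued field the way an indexer would if it added `gap` extra
// positions between consecutive values (as Analyzer.getPositionIncrementGap does).
final class LevelPositions {

    /** A token and the position it would be indexed at. */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Positions are plain ints counted from 0 within ONE document.
     * Math.addExact throws ArithmeticException instead of silently
     * wrapping around past Integer.MAX_VALUE.
     */
    static List<Token> assign(List<List<String>> values, int gap) {
        List<Token> out = new ArrayList<>();
        int pos = -1; // so the first token lands on position 0
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos = Math.addExact(pos, gap); // the "hole" between two values
            }
            for (String term : values.get(v)) {
                pos = Math.addExact(pos, 1); // each token advances by one position
                out.add(new Token(term, pos));
            }
        }
        return out;
    }

    /**
     * Rough stand-in for an unordered span "near" match: true if an
     * occurrence of a and a different occurrence of b have at most
     * `slop` positions between them.
     */
    static boolean withinSlop(List<Token> tokens, String a, String b, int slop) {
        for (Token x : tokens) {
            for (Token y : tokens) {
                if (x != y && x.term.equals(a) && y.term.equals(b)
                        && Math.abs(x.position - y.position) - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up terms: two values of one field, separated by a gap of 100.
        List<Token> tokens = assign(
                List.of(List.of("benzene", "ring"), List.of("toxic")), 100);
        for (Token t : tokens) {
            System.out.println(t.term + " @ " + t.position); // 0, 1, 102
        }
        System.out.println(withinSlop(tokens, "benzene", "ring", 0)); // true
        System.out.println(withinSlop(tokens, "ring", "toxic", 10));  // false
    }
}
```

With a gap of 100, tokens from different values end up 100 intervening positions apart, so a slop below the gap cannot bridge two levels. The counter only grows with the tokens and gaps of the one document being indexed, which is why only a very large gap (or an enormous document) gets anywhere near the int limit.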

You say that your levels have fewer than 1,000 elements each. With
an increment gap of 100 across three levels, you're only talking about a
total of 300,000 positions for your increment gap "holes", against an int
ceiling of 2,147,483,647 (Integer.MAX_VALUE), so you've got room for,
uhhhhmm, a lot more tokens per document. If you're running over that
limit, the increment gap is the least of your problems <G>...
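Spelled out, the estimate is levels × elements × gap, checked against the int ceiling. This is a throwaway sketch with a hypothetical class name (`PositionBudget`). Like the estimate itself, it assumes at most one gap per element, which slightly overcounts, since gaps only fall between values.

```java
// Hypothetical back-of-the-envelope check, not Lucene code: positions are
// ints counted per document, so compare the gap cost with Integer.MAX_VALUE.
final class PositionBudget {

    /** Upper bound on positions consumed by gaps: at most one gap per element. */
    static long gapPositions(int levels, int elementsPerLevel, int gap) {
        // long arithmetic so the estimate itself cannot overflow
        return (long) levels * elementsPerLevel * gap;
    }

    /** Positions left for real tokens once the gaps are paid for. */
    static long headroom(int levels, int elementsPerLevel, int gap) {
        return Integer.MAX_VALUE - gapPositions(levels, elementsPerLevel, gap);
    }

    public static void main(String[] args) {
        // Figures from the thread: 3 levels, fewer than 1,000 elements each, gap of 100.
        System.out.println(gapPositions(3, 1_000, 100)); // 300000
        System.out.println(headroom(3, 1_000, 100));     // 2147183647
    }
}
```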

Of course I may be missing the point completely...

Erick

On Mon, Mar 15, 2010 at 5:03 AM, Rene Hackl-Sommer <rene.a.ha...@gmx.de> wrote:

> Hello,
>
> I am working on a use case that is very demanding regarding the number of
> token positions. For one special field in the index, I need to represent
> different hierarchy levels, like this:
>
> <MyField>
> <Level_1>
> <Level_2>
> <Level_3>
>
> Please note that I need to do this with Lucene, not an XML search engine.
>
> Now, on Level_3 there are hundreds of tokens, Level_2 also has hundreds of
> entries and Level_1 is in there with a low 3-digit figure. For those who
> wish to know: this is an intricate system of chemical entities and some of
> their properties.
>
> I need this information to be searchable in all conceivable ways. What I am
> doing right now is using position increment gaps to separate the levels and
> searching with SpanQueries. It works like a charm for a setup with limited
> entries. But Integer.MAX_VALUE caps this approach, of course. Would it be
> feasible to replace the current int-based position counting with a
> long-based one? What issues should I consider?
>
> Thanks,
> Rene
>
