Re: Increase number of available positions?

Rene Hackl-Sommer Mon, 15 Mar 2010 07:00:27 -0700

Is your entire corpus a single document? Because I'm having trouble
imagining a single document where this would be a problem, unless
your increment gap is huge. The term positions are relative to
a single document...

It is getting pretty huge, yes (see below). The term positions are alsorelative to a single field, aren't they?

<MyField>
<Level_1>
<Level_2>
<Level_3>

Let me plug in some figures to help clarify. On Level 3 there arehundreds of tokens. So to be able to search two or more terms in MyFieldin the same Level_3, I put a position gap of 1000 between all Level_3's.Per Level_2 there might be hundreds of Level_3 entries. As I want torestrict the search to all Level_3 entries of a Level_2, I set theposition increment gap for Level_2 at 1000x1000 = 1,000,000 (1000 forthe Tokens in Level_3 and 1000 for the Level_3 entries in Level_2).

This done, Level_1 still needs to be accomodated. If you're looking at500 Level_2 entries, a gap of 1,000,000x500 is needed per Level_1 entry,to be able to search only within each of the Level_1 elements.That wayonly four Level_1 entries can be included before the maximum value isreached.


Queries I am looking to support might look like this in an easy case:

Search in MyField: Terms T1 and T2 on Level_2 and T3, T4, and T5 onLevel_3, which should both be in the same Level_1.

Sorry if this is confusing, what with all these levels going on. I thinkwhat it comes down to is whether the integer based position countingmight be replaced by long. Can this be done at all? Are performance orother implications conceivable? Or is the current implementation toodeeply wired to Lucene core workings to make this a reasonable endeavour?


Cheers
Rene

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Increase number of available positions?

Reply via email to