Indexing the multiple words at the same position

Jeroen Lauwers Fri, 06 Aug 2010 03:25:15 -0700

Has anyone encountered the following problem (and found a solution)?

I need to index a classical text that can have multiple words at the same
position. Example: if a publisher isn't sure whether Shakespeare wrote "To be or
not to be happy" or "To be or not to be daddy", he will put the 'best' word (e.g.
'happy') in the full text and the second option (e.g. 'daddy') in the "notes" at
the bottom of the page.
Now, our customer wants to search for "to be daddy" and find "to be happy". So,
if I could index "daddy" at the same position as "happy", I would be very
happy too.
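To make the goal concrete, here is a toy positional index in plain Java (not Lucene code; the class and method names are hypothetical). Each position holds a list of alternative words, every alternative is posted at that same position, and a phrase search checks for consecutive positions. Posting "daddy" next to "happy" is all it takes for "to be daddy" to match.

```java
import java.util.*;

/** Toy single-document positional index where one position may hold several words. */
public class StackedPositions {

    // term -> sorted positions at which the term occurs
    static Map<String, TreeSet<Integer>> buildIndex(List<List<String>> slots) {
        Map<String, TreeSet<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < slots.size(); pos++) {
            for (String word : slots.get(pos)) {
                // every alternative in a slot is posted at the same position
                postings.computeIfAbsent(word.toLowerCase(), k -> new TreeSet<>()).add(pos);
            }
        }
        return postings;
    }

    // true if the phrase words occur at consecutive positions somewhere in the text
    static boolean phraseMatch(Map<String, TreeSet<Integer>> postings, String... phrase) {
        TreeSet<Integer> first = postings.get(phrase[0].toLowerCase());
        if (first == null) return false;
        for (int start : first) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                TreeSet<Integer> p = postings.get(phrase[i].toLowerCase());
                ok = p != null && p.contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // "To be or not to be happy", with "daddy" from the notes stacked on "happy"
        List<List<String>> text = List.of(
            List.of("to"), List.of("be"), List.of("or"), List.of("not"),
            List.of("to"), List.of("be"), List.of("happy", "daddy"));
        Map<String, TreeSet<Integer>> index = buildIndex(text);
        System.out.println(phraseMatch(index, "to", "be", "daddy")); // true
        System.out.println(phraseMatch(index, "to", "be", "happy")); // true
        System.out.println(phraseMatch(index, "be", "daddy", "to")); // false
    }
}
```

Both readings match as phrases, while the text is indexed only once.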


Of course, one could index the full text once for each version, but this does
not scale as the number of positions with more than one candidate word
increases.
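A rough count shows why, assuming the $k$ uncertain positions are independent and position $i$ has $m_i$ candidate words: the number of full-text versions multiplies, whereas stacking the alternatives at the same position only adds one extra posting per alternative.

```latex
V = \prod_{i=1}^{k} m_i, \qquad
k = 10,\ m_i = 2 \;\Rightarrow\; V = 2^{10} = 1024 \text{ versions}

\text{extra postings when stacking} = \sum_{i=1}^{k} (m_i - 1) = 10
```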

I have been looking at the 'next()' method of the 'Tokenizer' class, but I 
haven't found the solution (yet).
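In Lucene, every token carries a position increment, and an increment of 0 places a token at the same position as the previous one (`Token.setPositionIncrement(int)` with the older `next()` API, `PositionIncrementAttribute` with the attribute-based API). The sketch below only mimics that convention in plain Java; it is not a Lucene `TokenFilter`, and the `Tok` record and the map of alternatives keyed by word index are hypothetical stand-ins for however the notes are actually stored.

```java
import java.util.*;

/** Sketch: inject editorial alternatives into a token list with a position increment of 0. */
public class AlternativesFilter {

    /** A token as (term, positionIncrement), mirroring Lucene's convention. */
    record Tok(String term, int posInc) {}

    // alternatives: hypothetical map from the index of a word in the main text
    // to the alternative readings found in the page notes
    static List<Tok> filter(List<String> words, Map<Integer, List<String>> alternatives) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(new Tok(words.get(i), 1));            // main-text word: advance one position
            for (String alt : alternatives.getOrDefault(i, List.of())) {
                out.add(new Tok(alt, 0));                 // alternative: stay on the same position
            }
        }
        return out;
    }

    // Turn increments into absolute positions, the way an indexer accumulates them
    static List<String> withPositions(List<Tok> tokens) {
        List<String> result = new ArrayList<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc();
            result.add(t.term() + "@" + pos);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("to", "be", "or", "not", "to", "be", "happy");
        Map<Integer, List<String>> notes = Map.of(6, List.of("daddy"));
        System.out.println(withPositions(filter(words, notes)));
        // prints [to@0, be@1, or@2, not@3, to@4, be@5, happy@6, daddy@6]
    }
}
```

Keying the alternatives by word index rather than by the word itself means only the annotated occurrence of "happy" receives "daddy", not every "happy" in the text.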

Thanks in advance to all who reply.
Jeroen
