Hi,

I am new to Lucene and have been reading the documentation. I would like to use 
Lucene to query a song database by lyrics. The queries could contain typos, 
wrong words, word contractions (can't versus cannot), etc.

I would like to create an inverted list by word pairs and possibly phrases and 
not just by isolated words. For example:
<w1,w2>   <d1, d10, d27>
<w2,w3>   <d2, d13>
...

or even
<phrase 1> <d1, d3,...>
<phrase 2> <...>
...
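
To make the structure concrete, here is a minimal sketch in plain Python of the 
word-pair inverted list I have in mind. It does not use any Lucene API; the 
function names (word_pairs, build_pair_index) and the two sample documents are 
made up for illustration, and the whitespace tokenization is only an assumption.

```python
from collections import defaultdict


def word_pairs(text):
    """Return the adjacent word pairs of a text (lower-cased, split on whitespace)."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def build_pair_index(docs):
    """Map each word pair to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pair in word_pairs(text):
            index[pair].add(doc_id)
    return {pair: sorted(ids) for pair, ids in index.items()}


# Hypothetical two-document collection, for illustration only.
docs = {
    "d1": "the river runs cold tonight",
    "d2": "tonight the river sleeps",
}
index = build_pair_index(docs)
```

With this sketch, looking up index[("the", "river")] gives the documents that 
contain "the" immediately followed by "river". What I would like to know is how 
to get Lucene to build and query the equivalent of this.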

It seems to me that, by default, the Lucene index stores statistics for 
isolated words. The Lucene documentation uses the word "Term" all the time 
and seems to imply that a "Term" can be a word or a phrase, but I can't see how 
IndexWriter can read a document and index it by word pairs.

Thank you in advance for any answers, and my apologies if I did not get the 
terminology quite right.

-Ghinwa
