Hi, I am new to Lucene and have been reading the documentation. I would like to use Lucene to search a song database by lyrics. A query could contain typos, wrong words, word contractions (can't versus cannot), etc.
I would like to create an inverted list keyed by word pairs, and possibly by phrases, rather than only by isolated words. For example:

<w1,w2> <d1, d10, d27>
<w2,w3> <d2, d13>
...

or even:

<phrase 1> <d1, d3, ...>
<phrase 2> <...>
...

It seems to me that, by default, the index in Lucene stores statistics for isolated words. The Lucene documentation uses the word "Term" all the time and seems to imply that a "Term" can be a word or a phrase, but I can't see how IndexWriter can read a document and index it by word pairs.

Thank you in advance for your answers, and my apologies if I did not get the terminology quite right.

-Ghinwa
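
P.S. In case it helps to make the structure concrete, here is a rough sketch of the word-pair inverted list I have in mind. It is plain Java and does not use any Lucene API. The class name WordPairIndex, the method build, the naive tokenization, and the sample lines are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: this is NOT how Lucene builds its index.
class WordPairIndex {

    // Maps each adjacent word pair "w1 w2" to the sorted ids of the
    // documents (positions in the input list) that contain it.
    static Map<String, SortedSet<Integer>> build(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Naive tokenization (an assumption): lowercase, then split on
            // anything that is not an ASCII letter or an apostrophe.
            List<String> words = new ArrayList<>();
            String text = docs.get(docId).toLowerCase(Locale.ROOT);
            for (String w : text.split("[^a-z']+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            // Record every pair of consecutive words for this document.
            for (int i = 0; i + 1 < words.size(); i++) {
                String pair = words.get(i) + " " + words.get(i + 1);
                index.computeIfAbsent(pair, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample "lyrics"; document ids are 0 and 1.
        List<String> docs = List.of("shine on me", "shine on you");
        build(docs).forEach((pair, ids) ->
                System.out.println("<" + pair + "> " + ids));
    }
}
```

What I am hoping for is the same kind of mapping, but built and stored by Lucene itself, so that I can query it with word pairs taken from the (possibly misspelled) lyrics.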