Rob, look at the third hit:
  http://www.lucenebook.com/search?query=bi-grams

Otis

----- Original Message ----
From: Rob Young <[EMAIL PROTECTED]>

> That sounds like just what I'm looking for. Do you know if this is 
> covered in Lucene in Action, or where I can find more information about it?

Eric Isakson wrote:

>You might consider using overlapping bi-gram tokenization, with whitespace 
>stripped out, and a PhraseQuery.
>
>So your tokenized content, "spongebob squarepants", would look like:
>
>sp po on ng ge eb bo ob bs sq qu ua ar re ep pa an nt ts
>
>and your tokens for your query, "sponge bob", would look like:
>
>sp po on ng ge eb bo ob
>
>Add each token to the PhraseQuery and you should match.
>
>This is very similar to the techniques used for searching in Asian languages 
>that do not separate words with spaces. There are probably some side effects 
>for compound words that you didn't mean to apply this to, but without knowing 
>the exact domain of compound words that you wish to support, this is probably 
>the best you will be able to do.
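
A rough way to see why Eric's suggestion matches, sketched in plain Python
rather than with Lucene itself. The helper names bigrams and phrase_match are
made up for this illustration, lowercasing the input is an assumption, and the
contiguous-run check only approximates what an exact phrase match over
adjacent token positions would do:

```python
def bigrams(text):
    """Overlapping character bi-grams of text with all whitespace removed."""
    s = "".join(text.lower().split())  # lowercasing is an assumption
    return [s[i:i + 2] for i in range(len(s) - 1)]


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens appear as one contiguous run inside doc_tokens.

    This stands in for an exact phrase match over adjacent positions.
    """
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


doc = bigrams("spongebob squarepants")
query = bigrams("sponge bob")

print(" ".join(doc))
print(" ".join(query))
print(phrase_match(doc, query))            # the query run is found in the doc
print(phrase_match(doc, bigrams("bobs")))  # side effect: crosses the word gap
```

The last line shows the kind of side effect mentioned above: because the
whitespace is gone, "bobs" also matches, since "bo ob bs" spans the boundary
between the two original words.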




