Thanks for the detailed response, Sujit. UIMA especially looks like an interesting option.
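
For the archive, here is a rough sketch of the "expand the query outside Lucene" idea from the quoted message below. It is only an illustration under stated assumptions: the class name NameQueryExpander, the field name "name", and the tiny hand-written variant table are all hypothetical, and a real setup would presumably load the name lists from GATE (ANNIE) or a UIMA analysis engine instead. It uses plain Java (9+) and only builds a query string in the classic Lucene query syntax; it does not call any Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class NameQueryExpander {

    // Hypothetical, hand-written table for illustration only.
    // A real system would load these pairs from a gazetteer file.
    private static final Map<String, List<String>> VARIANTS = Map.of(
            "dan", List.of("daniel"),
            "will", List.of("william"),
            "bill", List.of("william"),
            "bob", List.of("robert"),
            "tom", List.of("thomas"));

    /** Returns the lowercased term followed by any known full-name variants. */
    public static List<String> expand(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> out = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        out.add(key);
        out.addAll(VARIANTS.getOrDefault(key, List.of()));
        return new ArrayList<>(out);
    }

    /** Builds a fielded OR group such as name:(dan OR daniel). */
    public static String toQueryString(String field, String term) {
        return field + ":(" + String.join(" OR ", expand(term)) + ")";
    }

    public static void main(String[] args) {
        System.out.println(toQueryString("name", "Dan"));
        System.out.println(toQueryString("name", "Will"));
    }
}
```

The resulting string could then be handed to whatever query parser the application already uses. Expanding at query time keeps the index untouched, at the cost of maintaining the variant table by hand or via the NER tooling.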

On 3/24/11 3:57 PM, "Sujit Pal" <sujit....@comcast.net> wrote:

>I don't know if there is already an analyzer available for this, but you
>could use GATE or UIMA for Named Entity Extraction against names and
>expand the query to include the extra names that are used synonymously.
>You could do this outside Lucene or inline using a custom Lucene
>tokenizer that embeds either a GATE or UIMA NER.
>
>If you go the custom route (and you are not familiar with GATE or UIMA),
>you may want to take a look at Dr Manu Konchady's book on Lingpipe,
>Lucene and GATE - there is code in there to embed a GATE NER into a
>Lucene tokenizer (although its not a streaming tokenizer due to the
>nature of the NER process). The process would be similar for embedding a
>UIMA NER.
>
>GATE (ANNIE) contains data files that list the common synonyms (eg. Bill
>== William, Bob == Robert, Tom == Thomas, etc) which you can leverage
>with GATE's Jape rule language. Alternatively, you could use the same
>data from UIMA using a custom analysis engine (I prefer this route
>because this is all Java, easier learning curve and maintainability).
>
>-sujit
>
>On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
>> Hi,
>>
>> I would like to build a search system where a search for "Dan" would
>>also search for "Daniel" and a search for "Will", "William" . Any ideas
>>on how to go about implementing that? I can think of writing a custom
>>Analyzer that would map these partial tokens to their full firstname or
>>lastnames. But is there an Analyzer in lucene contrib modules or
>>elsewhere that does a similar job for me?
>>
>> Thanks,
>> Deepak Konidena.
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>For additional commands, e-mail: java-user-h...@lucene.apache.org