On 3/25/11 5:57 AM, "Ian Lea" <ian....@gmail.com> wrote: >Have you tried stemming? Simpler and available in core lucene. Look >at PorterStemFilter or use your favourite search engine to find more >info and options.
Ian, I did try PorterStemFilter and couldn't get the result I wanted. (Dan == Daniel, Will == William, etc..) > >If instead you go the synonym route, there is sample code in Lucene in >Action and a wordnet contrib module you might find useful. Thanks, Will take a look at that. > > >-- >Ian. > >On Thu, Mar 24, 2011 at 7:57 PM, Sujit Pal <sujit....@comcast.net> wrote: >> I don't know if there is already an analyzer available for this, but you >> could use GATE or UIMA for Named Entity Extraction against names and >> expand the query to include the extra names that are used synonymously. >> You could do this outside Lucene or inline using a custom Lucene >> tokenizer that embeds either a GATE or UIMA NER. >> >> If you go the custom route (and you are not familiar with GATE or UIMA), >> you may want to take a look at Dr Manu Konchady's book on Lingpipe, >> Lucene and GATE - there is code in there to embed a GATE NER into a >> Lucene tokenizer (although its not a streaming tokenizer due to the >> nature of the NER process). The process would be similar for embedding a >> UIMA NER. >> >> GATE (ANNIE) contains data files that list the common synonyms (eg. Bill >> == William, Bob == Robert, Tom == Thomas, etc) which you can leverage >> with GATE's Jape rule language. Alternatively, you could use the same >> data from UIMA using a custom analysis engine (I prefer this route >> because this is all Java, easier learning curve and maintainability). >> >> -sujit >> >> On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote: >>> Hi, >>> >>> I would like to build a search system where a search for "Dan" would >>>also search for "Daniel" and a search for "Will", "William" . Any ideas >>>on how to go about implementing that? I can think of writing a custom >>>Analyzer that would map these partial tokens to their full firstname or >>>lastnames. But is there an Analyzer in lucene contrib modules or >>>elsewhere that does a similar job for me? >>> >>> Thanks, >>> Deepak Konidena. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > >--------------------------------------------------------------------- >To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >For additional commands, e-mail: java-user-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org