Thanks for the detailed response, Sujit. UIMA especially looks like an
interesting option.
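
For anyone following the thread, the "expand the query outside Lucene"
idea can be tried before bringing in GATE or UIMA. The following is only
a minimal sketch under stated assumptions: the class name
NicknameQueryExpander, its methods, and the hand-written name groups are
hypothetical and come from neither library. A real setup would load the
groups from a name list such as the one shipped with ANNIE. It rewrites
each known name into an OR group in standard Lucene query syntax.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene, GATE, or UIMA.
public class NicknameQueryExpander {
    // Maps each lower-cased name to every name it is treated as equivalent to.
    private final Map<String, Set<String>> groups = new LinkedHashMap<>();

    // Registers a set of names that should match each other.
    public void addGroup(String... names) {
        Set<String> group = new LinkedHashSet<>();
        for (String name : names) {
            group.add(name.toLowerCase(Locale.ROOT));
        }
        for (String name : group) {
            groups.computeIfAbsent(name, k -> new LinkedHashSet<>()).addAll(group);
        }
    }

    // Rewrites each whitespace-separated term; unknown terms pass through lower-cased.
    public String expand(String query) {
        StringBuilder out = new StringBuilder();
        for (String raw : query.trim().split("\\s+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (out.length() > 0) {
                out.append(' ');
            }
            Set<String> group = groups.get(term);
            if (group == null) {
                out.append(term);
            } else {
                // Keep the term the user typed first, then add its variants.
                Set<String> variants = new LinkedHashSet<>();
                variants.add(term);
                variants.addAll(group);
                out.append('(').append(String.join(" OR ", variants)).append(')');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        NicknameQueryExpander expander = new NicknameQueryExpander();
        expander.addGroup("Dan", "Daniel");
        expander.addGroup("Will", "William", "Bill");
        System.out.println(expander.expand("Dan Smith")); // (dan OR daniel) smith
    }
}
```

The expanded string could then be handed to the query parser. Expanding
at query time keeps the index unchanged, at the cost of longer queries;
doing the same mapping inside an analyzer would move that cost to index
time.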


On 3/24/11 3:57 PM, "Sujit Pal" <sujit....@comcast.net> wrote:

>I don't know if there is already an analyzer available for this, but you
>could use GATE or UIMA for Named Entity Extraction against names and
>expand the query to include the extra names that are used synonymously.
>You could do this outside Lucene or inline using a custom Lucene
>tokenizer that embeds either a GATE or UIMA NER.
>
>If you go the custom route (and you are not familiar with GATE or UIMA),
>you may want to take a look at Dr Manu Konchady's book on LingPipe,
>Lucene and GATE - there is code in there to embed a GATE NER into a
>Lucene tokenizer (although it's not a streaming tokenizer due to the
>nature of the NER process). The process would be similar for embedding a
>UIMA NER.
>
>GATE (ANNIE) contains data files that list the common synonyms (e.g. Bill
>== William, Bob == Robert, Tom == Thomas, etc.) which you can leverage
>with GATE's JAPE rule language. Alternatively, you could use the same
>data from UIMA using a custom analysis engine (I prefer this route
>because it is all Java, which is easier to learn and to maintain).
>
>-sujit
>
>On Thu, 2011-03-24 at 14:31 -0400, Deepak Konidena wrote:
>> Hi,
>> 
>> I would like to build a search system where a search for "Dan" would
>> also search for "Daniel", and a search for "Will" would also search
>> for "William". Any ideas on how to go about implementing that? I can
>> think of writing a custom Analyzer that would map these short forms
>> to their full first or last names. But is there an Analyzer in the
>> Lucene contrib modules or elsewhere that does a similar job for me?
>> 
>> Thanks,
>> Deepak Konidena.
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>For additional commands, e-mail: java-user-h...@lucene.apache.org
>

