Yes, you are right. You can replace DictionaryNameFinder with a Lucene index. When you mentioned DictionaryNameFinder, I was thinking of the named entity recognition module (where tagging is done using an NER model).

Sorry for the misunderstanding.

BR,
Catalin

On 09/14/2015 03:31 PM, Damiano Porta wrote:
Hi Catalin,
thank you so much for your help.

Yes, I found Lucene's FuzzyQuery, but I did not understand one point. If
I check the term (with typos) against a Lucene index to find the correct
form, why do I have to use DictionaryNameFinder? I mean:

1. I create an index with all the correct names.
2. I check each token against that index to find an exact match or a word
within a specific "distance".
3. If I find something, I "tag" that word as a city without using
DictionaryNameFinder.

In other words, my "dictionary" will be this Lucene index.
Correct?

Thank you!
Damiano
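
The three steps above can be sketched as follows. This is only an illustration in plain Python, not Lucene or OpenNLP code: a dict stands in for the index, the fuzzy check is a linear scan (which a real Lucene index would avoid), and the names `within_distance`, `tag_tokens`, and the sample city list are all hypothetical.

```python
def within_distance(a: str, b: str, max_dist: int) -> bool:
    """True if the Levenshtein distance between a and b is <= max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return False  # the length gap alone already exceeds the budget
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (ca != cb)))  # substitution/match
        if min(curr) > max_dist:
            return False  # later rows can never drop below this row's minimum
        prev = curr
    return prev[-1] <= max_dist


def tag_tokens(tokens, names, max_dist=1):
    """Return (token, matched_name_or_None) pairs."""
    # Step 1: the "dictionary" of correct names (stand-in for the index).
    index = {name.lower(): name for name in names}
    tagged = []
    for tok in tokens:
        key = tok.lower()
        match = index.get(key)  # exact hit first
        if match is None:
            # Step 2: fuzzy check against every entry within max_dist.
            match = next((name for k, name in index.items()
                          if within_distance(key, k, max_dist)), None)
        # Step 3: a non-None match means the token gets tagged.
        tagged.append((tok, match))
    return tagged


cities = ["Rome", "Milan", "Turin"]
print(tag_tokens(["I", "live", "in", "Milann"], cities))
# [('I', None), ('live', None), ('in', None), ('Milann', 'Milan')]
```

With very short tokens, even a distance of 1 can produce false matches, so the threshold probably needs tuning against real data.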



2015-09-14 13:10 GMT+02:00 Cătălin M. <[email protected]>:

A solution might be to check typos (Gogle, Gooogle) against a Lucene index
that contains your dictionary of companies. Using FuzzyQuery, you would
find the correct form ("Google") and then use this correct form
in your DictionaryNameFinder.

Please let me know if it seems feasible.

BR,
Catalin



On 09/13/2015 10:35 PM, Damiano Porta wrote:

Hi Catalin,
Can I use it with DictionaryNameFinder?
Thanks
Damiano

On Sun, 13 Sep 2015 at 21:08, Catalin Mititelu <[email protected]> wrote:

Hi Damiano,
You may try Lucene's fuzzy query, which is based on Levenshtein distance.

BR,
Catalin
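
For reference, a minimal sketch of the Levenshtein (edit) distance itself, in plain Python. It is not Lucene's implementation, and the function name `levenshtein` is only illustrative; it just shows what "distance" means for the typos discussed in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("Gogle", "Google"))    # 1 (one missing "o")
print(levenshtein("Gooogle", "Google"))  # 1 (one extra "o")
```

This computes one pair in time proportional to the product of the two lengths, so comparing every token against every dictionary entry this way gets expensive for large dictionaries.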

On 09/13/2015 09:59 PM, Damiano Porta wrote:

Hello,

I have created a very large dictionary of companies; it is around 3M.
At the moment I am using the DictionaryNameFinder class, but I need to
implement something to find typos like Gogle/Gooogle Inc, etc.
I read something about Levenshtein distance. Is this implemented in
OpenNLP?
It seems good, but I read that it takes a lot of time if there are many
words (my case).

What should I do?
Thanks!
Damiano


