Hello!
I would like to understand the best approach to the following problem.

I have documents very similar to a resume/CV, and I have to extract entities
(name, surname, birthday, cities, zipcode, etc.).

To extract those entities I am combining different finders:

Birthday and zipcodes = RegexNameFinder
Name, Surname and Cities = DictionaryNameFinder.

There are no problems with those finders, but I am looking for a
method, algorithm, or something similar to *confirm* the entities.

By "confirm" I mean that I have to find specific terms (or other entities)
close to the entities I have found.

Example:

My name is <name>
Name: <name>
Name and Surname: <name>

I can confirm the entity <name> because it is close to a specific term that
lets me understand the "context". If the words "name" or "surname" appear near
the entity <name>, I can say that I have found the <name> with good
probability.
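
To make the idea concrete, here is a minimal sketch of such a rule in plain
Python. It is not OpenNLP API; the trigger words, the window size, the scoring
formula, and the names tokenize and confirm_by_context are all assumptions made
for illustration. The entity span is given as token indexes, as a name finder
would report it.

```python
import re

# Hypothetical trigger words; a real list would be tuned on the documents.
NAME_TRIGGERS = {"name", "surname"}

def tokenize(text):
    """Split into word tokens, dropping punctuation such as ':' and '/'."""
    return re.findall(r"\w+", text)

def confirm_by_context(tokens, start, end, triggers=NAME_TRIGGERS, window=4):
    """Score the entity tokens[start:end] by its nearest trigger word.

    Returns 0.0 if no trigger lies within `window` tokens of the entity,
    otherwise a value up to 1.0 that grows as the trigger gets closer.
    """
    best = None
    lo = max(0, start - window)
    hi = min(len(tokens), end + window)
    for i in range(lo, hi):
        if start <= i < end:
            continue  # skip the entity's own tokens
        if tokens[i].lower() in triggers:
            # distance in tokens: 1 means directly adjacent to the entity
            dist = start - i if i < start else i - end + 1
            if best is None or dist < best:
                best = dist
    if best is None:
        return 0.0
    return 1.0 - (best - 1) / window

tokens = tokenize("My name is John")
score = confirm_by_context(tokens, 3, 4)  # "John" is the token at index 3
```

A threshold on the score (for example, accept only entities with a score above
zero) would then decide whether the entity counts as confirmed.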

So the goal is to write this kind of rule to confirm entities. Another
example would be:

My address is ......, <zipcode>00143</zipcode> <city>Rome</city>

Italian zipcodes are 5 digits long (numeric only). It is easy to find a
5-digit number inside my document (I use a regex, as I wrote above), and I also
check it by querying a database to see whether the number exists. The
problem here is that I need one more check to confirm it definitively.

I must see whether that number is near the entity <city>; if it is, I have a
good probability.
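
As a rough sketch of this second check (plain Python, not OpenNLP API): find
the 5-digit candidates with a regex, find the city mentions, and confirm a
candidate only if some city lies within a few characters of it. The city list
stands in for the dictionary finder, and max_gap=15 and the function names are
assumptions chosen for illustration.

```python
import re

ZIP_RE = re.compile(r"(?<!\d)\d{5}(?!\d)")  # exactly 5 digits, not part of a longer number

def near(span_a, span_b, max_gap=15):
    """True if two (start, end) character spans are at most max_gap apart."""
    gap = max(span_a[0], span_b[0]) - min(span_a[1], span_b[1])
    return gap <= max_gap

def confirm_zipcodes(text, cities, max_gap=15):
    """Return (zipcode, confirmed) pairs; confirmed means a city is nearby."""
    city_spans = [
        m.span()
        for city in cities
        for m in re.finditer(r"\b" + re.escape(city) + r"\b", text)
    ]
    results = []
    for m in ZIP_RE.finditer(text):
        confirmed = any(near(m.span(), cs, max_gap) for cs in city_spans)
        results.append((m.group(), confirmed))
    return results

text = ("My address is Via Roma 1, 00143 Rome. "
        "The invoice with reference number 55555 was paid.")
result = confirm_zipcodes(text, ["Rome"])
```

Here 00143 is confirmed because "Rome" follows it directly, while 55555 is
not, since the only city mention is far away from it. The database lookup
could be combined with this as an additional condition.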

I also tried to train a model, but I do not really have a "context"
(sentences).
Training the model with:

My name is: <name>John</name>
Name: <name>John</name>
Name/Surname: <name>John</name>
<name>John</name> is my name

does not sound good to me because:
1. I have read that we need many sentences to train a good model.
2. Those are not "sentences"; I do not have a "context" (remember, as I said,
the document is similar to a resume/CV).
3. Maybe those phrases are too short.

I do not know how many different ways I could find to say the same thing,
but surely I cannot find 15000 ways :)

What method should I use to try to confirm my entities?

Thank you so much!
