Hello! I would like to understand the best approach to the following problem.
I have documents that are very similar to resumes/CVs, and I have to extract entities (name, surname, birthday, city, zip code, etc.). To extract those entities I am combining different finders:

Birthday and zip codes = RegexNameFinder
Name, surname and cities = DictionaryNameFinder

There are no problems with those finders, but I am looking for a method or algorithm to *confirm* the entities. By "confirm" I mean checking whether specific terms (or other entities) appear close to the entity I have found. Example:

My name is <name>
Name: <name>
Name and Surname: <name>

I can confirm the entity <name> because it is close to a specific term that tells me the "context". If the words "name" or "surname" appear near the entity <name>, I can say with good probability that I have really found a name. So the goal is to write this kind of rule to confirm entities.

Another example:

My address is ......, <zipcode>00143</zipcode> <city>Rome</city>

Italian zip codes are 5 digits long (numeric only). It is easy to find a 5-digit number inside my document (I use a regex, as I wrote above), and I also check it by querying a database to see whether the number exists. The problem is that I need one more check to confirm it definitively: I must see whether that number is near a <city> entity. If it is, I have a good probability.

I also tried to train a model, but I do not really have a "context" (sentences). Training the model with:

My name is: <name>John</name>
Name: <name>John</name>
Name/Surname: <name>John</name>
<name>John</name> is my name

does not sound good to me because:

1. I have read that many sentences are needed to train a good model.
2. Those are not really "sentences", so I do not have a "context" (as I said, the documents are similar to resumes/CVs).
3. Maybe those phrases are too short.

I do not know how many different ways there are to say the same thing, but surely I cannot find 15000 of them :)

What method should I use to confirm my entities?

Thank you so much!
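
P.S. To make the idea more concrete, here is a rough sketch of the kind of proximity rule I have in mind. It is plain Python and only an illustration: the function names, the trigger words, and the window size are my own assumptions and are not part of any finder library.

```python
import re

# Assumption: how many tokens on each side still count as "near".
WINDOW = 5


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, underscore)."""
    return [m.group(0).lower() for m in re.finditer(r"\w+", text)]


def is_confirmed(tokens, index, triggers, window=WINDOW):
    """Return True if any trigger word lies within `window` tokens
    of the candidate entity at tokens[index]."""
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    # Look at the neighbours on both sides, excluding the candidate itself.
    neighbours = tokens[start:index] + tokens[index + 1:end]
    return any(tok in triggers for tok in neighbours)


# Hypothetical trigger sets, chosen only for this example.
NAME_TRIGGERS = {"name", "surname"}
CITY_TRIGGERS = {"rome"}  # in practice this would come from the city dictionary

# A name candidate next to the words "name" / "surname" is confirmed.
name_tokens = tokenize("Name and Surname: John")
name_ok = is_confirmed(name_tokens, name_tokens.index("john"), NAME_TRIGGERS)

# A 5-digit number next to a known city is confirmed as a zip code...
addr_tokens = tokenize("My address is Via Roma 1, 00143 Rome")
zip_ok = is_confirmed(addr_tokens, addr_tokens.index("00143"), CITY_TRIGGERS)

# ...but the same number with no city nearby is not.
other_tokens = tokenize("Order number 00143 was shipped yesterday")
zip_rejected = not is_confirmed(other_tokens, other_tokens.index("00143"), CITY_TRIGGERS)
```

Here the "context" is just a fixed window of neighbouring tokens, so no training sentences are needed. I am not sure whether a fixed window like this is good enough, or whether there is a more standard technique for it.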