Hi Jörn,

Thanks for your quick response.

Primarily the language is English, probably more American than European.

Domain-wise, the NER task is 'date' related; otherwise, the input data is domain independent. The current implementation/model for NER date detection is very good; it is the odd edge case, such as lower-case day names, that causes problems.

I could go to the lengths of writing a regex for it, but it would be better to have an NLP solution, as the NLP components are already scanning the input texts.
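
For reference, the regex fallback I have in mind would look roughly like the sketch below. It only capitalizes full English day names before the text is handed to the name finder; the class and method names (LowerCaseDayNormalizer, normalize) are made up for illustration and are not part of OpenNLP, and abbreviations such as "mon" or "tues" are deliberately left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class: a pre-processing step that
// capitalizes day names so a case-sensitive date model can see them.
final class LowerCaseDayNormalizer {

    // Full English day names as whole words, matched regardless of case.
    private static final Pattern DAY = Pattern.compile(
        "\\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\\b",
        Pattern.CASE_INSENSITIVE);

    private LowerCaseDayNormalizer() {
    }

    // Returns the text with every day name rewritten as "Monday", "Tuesday", ...
    // Note that an all-caps "MONDAY" is also rewritten to "Monday".
    static String normalize(String text) {
        Matcher m = DAY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String day = m.group(1);
            String fixed = Character.toUpperCase(day.charAt(0))
                + day.substring(1).toLowerCase();
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling LowerCaseDayNormalizer.normalize("can we meet on monday or tuesday?") gives "can we meet on Monday or Tuesday?". Because only the case of letters changes, character offsets stay the same, so any spans found afterwards still line up with the original input.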

Your UIMA-based annotation tooling sounds interesting and worth a look.

Thanks

Mark

On 18/01/2012 21:05, Jörn Kottmann wrote:
On 1/18/12 8:35 PM, mark meiklejohn wrote:
James,

I agree the correct way is to ensure upper-case. But when you have no
control over input it makes things a little more difficult.

So, I may look at a training set. What is the recommended size of a
training set?


In an annotation project I was doing lately our models started to work
after a couple of hundred news articles. It of course depends on your
language, domain and the entities you want to detect.

To make training easier I started to work on UIMA based annotation
tooling, let me know if you would like to try that, any feedback is
very welcome.

Jörn





