Enhancement engine to recognize street names

Andreas Kuckartz Mon, 21 Apr 2014 04:23:54 -0700

I am about to create an enhancement engine to recognize street names,
more precisely the names of streets in Germany that appear in
German-language texts.


These are possible approaches:

1. Using simple heuristics such as these:

Everything that begins with a capital letter and ends with "str.",
"strasse", "straße", " Str." etc. is a street name. The same applies to
"Ring", "Allee" etc. If a blank followed by an integer or something like
"5B" comes next, that can be treated as the corresponding street number.

2. Create an OpenNLP NameFinder model for the "OpenNLP Custom NER Model
Engine".

Creating such a model seems to require a lot of data:

"The training data should contain at least 15000 sentences to create a
model which performs well."
See:
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind
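
For comparison, approach 1 could be sketched roughly as below. This is only
a minimal illustration in Python, not part of any existing engine: the names
STREET_RE and find_streets are made up for this example, the suffix list
contains only the endings mentioned above, and the handling of hyphenated
names such as "Karl-Marx-Straße" is an extra assumption.

```python
import re

# Heuristic from approach 1: a capitalised word that ends in a street suffix,
# or is followed by one as a separate word, optionally followed by a number.
# Only the suffixes mentioned in the text are listed; the hyphen handling is
# an assumption added for this sketch.
STREET_RE = re.compile(
    r"""
    \b
    (?P<street>
        [A-ZÄÖÜ][a-zäöüß]+                       # "Haupt..." or "Berliner"
        (?:-[A-ZÄÖÜ][a-zäöüß]+)*                 # optional parts like "-Marx"
        (?:
            (?:straße|strasse|str\.|ring|allee)  # closed form: "Hauptstraße"
          |
            [ -](?:Straße|Strasse|Str\.|Ring|Allee)  # "Berliner Str."
        )
        (?!\w)                                   # suffix must end the word
    )
    (?:[ ](?P<number>\d+[A-Za-z]?\b))?           # optional "12" or "5B"
    """,
    re.VERBOSE,
)


def find_streets(text):
    """Return (street, number) pairs; number is None when none follows."""
    return [(m.group("street"), m.group("number"))
            for m in STREET_RE.finditer(text)]


if __name__ == "__main__":
    print(find_streets("Ich wohne in der Hauptstraße 5B."))
    print(find_streets("Das Büro liegt an der Berliner Str. 12 in Köln."))
```

A rule like this is cheap to write but will produce false positives: an
ordinary capitalised noun such as "Hering" ends in "ring" and would be
reported as a street.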

Can such models be created without training data?
Are there other suggestions?

Cheers,
Andreas
