Re: NE

Jörn Kottmann Mon, 22 Aug 2011 14:42:10 -0700

On 8/22/11 11:38 AM, Eugen Ignat wrote:

Hello,
I want to use the "Name Finder" from OpenNLP, but for Romanian.
I downloaded all the models for the Name Finder: date, location, money,
organization, percentage, person and time name for English.
I presume for location, organization and person, in the model there should
be some sort of list/lists.


No, these models are statistical. That means they learn from training
data what is an entity and what is not.

The features the model uses are generated by all kinds of rules, e.g. the
token itself or the capitalization of the token. These features cannot be
adjusted by hand to work with Romanian.
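
To make the idea of rule-generated features concrete, here is a minimal sketch in Java. It is not OpenNLP's actual feature generation code; the class name, the method name and the feature strings are all assumptions made up for illustration. It only shows how simple rules such as "the token itself" and "is it capitalized" turn a token into features that a statistical model can assign weights to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: hypothetical names, not part of the OpenNLP API.
class TokenFeatureSketch {

    // Turns one token into a list of string features using simple rules.
    static List<String> featuresFor(String token) {
        List<String> features = new ArrayList<>();

        // Rule 1: the (lowercased) token itself.
        features.add("w=" + token.toLowerCase(Locale.ROOT));

        // Rule 2: the token starts with an uppercase letter.
        if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
            features.add("initcap");
        }

        // Rule 3: the token has more than one character, contains letters,
        // and is written entirely in uppercase.
        boolean hasLetters = !token.equals(token.toLowerCase(Locale.ROOT));
        if (token.length() > 1 && hasLetters
                && token.equals(token.toUpperCase(Locale.ROOT))) {
            features.add("allcaps");
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(featuresFor("Cluj")); // prints [w=cluj, initcap]
        System.out.println(featuresFor("din"));  // prints [w=din]
    }
}
```

The weights attached to such features are learned during training, which is why they cannot sensibly be edited by hand for another language.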

Indeed, you need to create new training data that contains Romanian text.
You will get the best performance if you choose training data from within
your domain, e.g. to process medical texts, you shouldn't use news wire
for training.

Have a look at our documentation:
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.namefind.training
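
As a rough illustration of what such training data looks like: to my knowledge the name finder trainer expects one tokenized sentence per line, with each entity wrapped in <START:type> and <END> markers. Please check the manual linked above for the authoritative description. The sketch below uses a hypothetical helper (the class and method names are made up, and it handles only one entity per sentence) to produce one such line from a Romanian example sentence.

```java
import java.util.StringJoiner;

// Illustrative only: hypothetical helper, not part of the OpenNLP API.
class TrainingLineSketch {

    // Wraps the tokens in [start, end) with <START:type> ... <END> markers
    // and joins all tokens with single spaces.
    static String annotate(String[] tokens, int start, int end, String type) {
        StringJoiner line = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (i == start) {
                line.add("<START:" + type + ">");
            }
            line.add(tokens[i]);
            if (i == end - 1) {
                line.add("<END>");
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Ion", "Popescu", "este", "din", "Cluj", "."};
        // prints: <START:person> Ion Popescu <END> este din Cluj .
        System.out.println(annotate(tokens, 0, 2, "person"));
    }
}
```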

And now to my problem: can I open the .model files in a way that does not
contravene the license (moral or written), so that I can find these
lists? Of course, after I "make" the models for Romanian, I will send them
back to you if you want them.


Well, we have a model package, which is simply a zip file. You can unzip it,
and inside you will find a model file. The model file is the binary
representation of our statistical maxent (or perceptron) model.
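
Since the package is an ordinary zip, you can look inside it with the standard java.util.zip classes, without any OpenNLP code. The following is a minimal sketch; the class name is made up, and the path on the command line is whatever model package you downloaded. It only lists the entry names and does not try to decode the binary model file.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative only: hypothetical class, uses just the JDK zip API.
class ModelPackageSketch {

    // Returns the names of all entries in the zip file at the given path.
    static List<String> listEntries(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ModelPackageSketch <model-package>");
            return;
        }
        // Print one entry name per line.
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```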

There is no license issue, or other reason to keep you from looking at it.

At OpenNLP we currently simply lack the tools to inspect an existing model;
it would be interesting to see the features and their associated weights.

Jörn
