Yes, we had multiword entities. Actually, the dataset was quite "dirty" and "funny" - there were names like "al`XXX" and "al XXX", and some others where the separator was some funny Unicode character. But I don't remember any problems similar to those you have (I followed the thread). That was OpenNLP 1.4.0 or 1.4.3, somewhere in that range. I don't have exact figures now, but I've fished out a precision (for one class) from an old email: 80.98%
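
For reference, this is roughly how multi-word entities are marked in a name finder training corpus - one sentence per line, span tags around each entity (the typed `<START:person>` form is from the 1.5+ docs; the 1.4.x releases discussed here used untyped `<START>`/`<END>` markers, one entity type per model):

```
<START:person> Pierre Vinken <END> , 61 years old , will join the board .
<START:organization> Rare Hides Ltd. <END> named <START:person> John Smith <END> president .
```

A multi-word name is just a longer span between the markers, so the maxent model can learn and emit them the same way as single tokens.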
Aliaksandr

On Wed, Feb 8, 2012 at 11:45 AM, Jim - FooBar(); <[email protected]> wrote:

> Hi there Autayeu,
>
> Did you have any multi-word entities in your annotated corpus?
> If yes, how did the maxent NER model perform? Could it find them or was it
> just finding single-word entities?
> If you don't understand why I'm asking have a look at the previous
> messages....
>
> I really appreciate the help...
>
> Regards,
> Jim
>
> On 08/02/12 10:39, Aliaksandr Autayeu wrote:
>
>>> p.s: have you ever done any serious NER (not for demonstration purposes)
>>> using openNLP?
>>
>> I did experiments (more than a year ago, with 1.4.3) for standard three
>> classes, got the state of the art for our private corpus, but then we
>> changed approach.
>>
>> Aliaksandr
