OK, got it! I see what you mean. In a nutshell:
You want the output of the AggregateNameFinder to be usable as
training data for a separate model, for example. That would
indeed require no nested SGML tags in the output. I was only thinking
about the evaluation aspect.
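
To make the training-data case concrete, here is a minimal Java sketch, not the actual AggregateNameFinder code. It assumes token-based spans with an exclusive end offset; `FlatNameSample`, `TokenSpan`, `dropOverlapping` and `render` are hypothetical names used only for illustration, and the rule that the earlier (and, on a tie, longer) span wins is an assumption rather than a documented policy. It drops nested or intersecting spans and then writes the sentence with whitespace-separated `<START:type>` and `<END>` tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlatNameSample {

    // Hypothetical stand-in for a name span: token offsets, end exclusive.
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        boolean intersects(TokenSpan other) {
            return start < other.end && other.start < end;
        }
    }

    // Sort by start (longer span first on ties) and keep a span only if it
    // does not intersect the last span kept. Assumed policy, for illustration.
    static List<TokenSpan> dropOverlapping(List<TokenSpan> spans) {
        List<TokenSpan> sorted = new ArrayList<>(spans);
        sorted.sort((a, b) -> a.start != b.start
                ? Integer.compare(a.start, b.start)
                : Integer.compare(b.end, a.end));
        List<TokenSpan> kept = new ArrayList<>();
        for (TokenSpan s : sorted) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).intersects(s)) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Render tokens with flat tags; flatSpans must be sorted and non-overlapping.
    static String render(String[] tokens, List<TokenSpan> flatSpans) {
        StringBuilder sb = new StringBuilder();
        int next = 0; // index of the next span to open or close
        for (int i = 0; i < tokens.length; i++) {
            if (next < flatSpans.size() && flatSpans.get(next).start == i) {
                sb.append("<START:").append(flatSpans.get(next).type).append("> ");
            }
            sb.append(tokens[i]);
            if (next < flatSpans.size() && flatSpans.get(next).end == i + 1) {
                sb.append(" <END>");
                next++;
            }
            if (i < tokens.length - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Jim", "Smith", "works", "at", "Foo", "Bar", "Inc", "."};
        List<TokenSpan> found = Arrays.asList(
                new TokenSpan(0, 2, "person"),
                new TokenSpan(1, 2, "person"),        // nested in the first span
                new TokenSpan(4, 7, "organization"),
                new TokenSpan(5, 6, "location"));     // nested in the organization
        // prints: <START:person> Jim Smith <END> works at <START:organization> Foo Bar Inc <END> .
        System.out.println(render(tokens, dropOverlapping(found)));
    }
}
```

The two nested spans are discarded before rendering, so the output line contains only flat tags and could be fed back to a trainer.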
Jim
On 17/04/12 16:00, Jim - FooBar(); wrote:
Could you name a few of the "applications" that cannot deal with
overlapping/intersecting spans?
I am still struggling to see why two individual but overlapping Span
objects can cause problems. They are completely independent and they will
be processed serially: each one has its own start offset and its own
end offset, and each prediction will be either correct or wrong...
unless, of course, you're referring to the actual annotation that comes
back on screen when you deploy any name finder. Hmm, this is
probably what you meant.
Well, in that case the problem is not messing up our evaluation
statistics but rather keeping the output format compatible
with the format the trainer expects, meaning: no nested SGML tags and no
characters immediately before or after an SGML tag.
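
As a rough illustration of where the conflict arises, the Java sketch below checks which spans in a list intersect. It assumes half-open offsets (end exclusive); `SpanOverlapCheck`, `SimpleSpan` and `overlappingPairs` are hypothetical names, not OpenNLP classes. Each span can still be scored on its own, but any pair reported here could not both be written out as flat, non-nested tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpanOverlapCheck {

    // Hypothetical stand-in for a name finder span: offsets with an exclusive end.
    static final class SimpleSpan {
        final int start;
        final int end;

        SimpleSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // True when the two half-open ranges share at least one position.
    static boolean intersects(SimpleSpan a, SimpleSpan b) {
        return a.start < b.end && b.start < a.end;
    }

    // Returns the index pairs (i, j), i < j, of all spans that intersect.
    static List<int[]> overlappingPairs(List<SimpleSpan> spans) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                if (intersects(spans.get(i), spans.get(j))) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<SimpleSpan> spans = Arrays.asList(
                new SimpleSpan(0, 2),   // overlaps the next span at position 1
                new SimpleSpan(1, 3),
                new SimpleSpan(5, 6));  // stands alone
        for (int[] p : overlappingPairs(spans)) {
            System.out.println("spans " + p[0] + " and " + p[1] + " overlap");
        }
    }
}
```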
Is this what you're trying to get across, Jörn?
Jim
On 17/04/12 15:42, Jörn Kottmann wrote:
On 04/17/2012 04:40 PM, Jim - FooBar(); wrote:
By "applications", do you mean programs external or unrelated to OpenNLP?
An application which is using the output of the name finder.
Jörn