On 17/04/12 13:52, Jörn Kottmann wrote:
If you don't want to handle these cases, you can simply copy all names together
into a list, and then do evaluation on this list.
This approach works with our evaluation, but will usually be an issue for applications which expect output
where the ambiguities mentioned earlier are resolved.

That is exactly what my current AggregateNameFinder does. It just gets rid of duplicates.
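
To make the merge-and-deduplicate idea concrete, here is a minimal sketch. It is not the actual AggregateNameFinder code: NameSpan, MergeSketch and mergeDistinct are hypothetical names made up for illustration, and a span is assumed to be token offsets plus a type.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class MergeSketch {

    // Hypothetical stand-in for a name span: token offsets [start, end) plus a type.
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        // Two spans are duplicates only if offsets AND type match.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof NameSpan)) {
                return false;
            }
            NameSpan other = (NameSpan) o;
            return start == other.start && end == other.end
                    && Objects.equals(type, other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, type);
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // Pool the output of every finder and drop exact duplicates, keeping
    // first-seen order. Overlapping spans and same-span/different-type
    // conflicts are deliberately left in the result.
    @SafeVarargs
    static List<NameSpan> mergeDistinct(List<NameSpan>... outputs) {
        Set<NameSpan> seen = new LinkedHashSet<>();
        for (List<NameSpan> output : outputs) {
            seen.addAll(output);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<NameSpan> dictionary = List.of(
                new NameSpan(0, 2, "person"), new NameSpan(5, 6, "location"));
        List<NameSpan> maxent = List.of(
                new NameSpan(0, 2, "person"), new NameSpan(0, 2, "organization"));
        System.out.println(mergeDistinct(dictionary, maxent));
    }
}
```

Note that the same offsets with a different type survive the merge, which is the one case where a decision is still needed.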

I propose that we make a simple baseline implementation
which takes all output spans, orders them and then resolves
the ambiguities based on the order. This will prefer longer
names over shorter names, but ignores the type.
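
One possible reading of this baseline, as a rough sketch only: the ordering used here (longest span first, ties broken by start offset) is an assumption, as are the hypothetical names NameSpan, ResolveSketch and resolve. Types are ignored when resolving.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ResolveSketch {

    // Hypothetical stand-in for a name span: token offsets [start, end) plus a type.
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        int length() {
            return end - start;
        }

        // True if the two half-open ranges share at least one token.
        boolean overlaps(NameSpan other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    // Order spans longest first, then greedily keep each span that does not
    // overlap one already kept. The type plays no role; for identical offsets
    // the span that came first in the input wins (the sort is stable).
    static List<NameSpan> resolve(List<NameSpan> spans) {
        Comparator<NameSpan> byLengthDesc =
                (a, b) -> Integer.compare(b.length(), a.length());
        List<NameSpan> ordered = new ArrayList<>(spans);
        ordered.sort(byLengthDesc.thenComparingInt(s -> s.start));

        List<NameSpan> kept = new ArrayList<>();
        for (NameSpan candidate : ordered) {
            boolean clash = false;
            for (NameSpan k : kept) {
                if (candidate.overlaps(k)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                kept.add(candidate);
            }
        }
        // Return the survivors in document order.
        kept.sort((a, b) -> Integer.compare(a.start, b.start));
        return kept;
    }

    public static void main(String[] args) {
        List<NameSpan> all = List.of(
                new NameSpan(0, 1, "person"),
                new NameSpan(0, 3, "organization"),
                new NameSpan(2, 4, "location"),
                new NameSpan(6, 7, "person"));
        System.out.println(resolve(all));
    }
}
```

The pairwise overlap check is quadratic, which should be fine for the handful of spans in a sentence.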

There are more sophisticated ways of handling this,
e.g. taking probabilities from the statistical name finders into
account, but these might be a bit more restrictive as well.

I agree on the baseline implementation, but I don't see why the spans need to be ordered or why ambiguities need resolving. The only true ambiguity that can occur is the exact same span with a different type, in which case we need to make a decision. Taking the probabilities from maxent is also a bit naive, because you will not know which model to trust (maybe the weakest model gives you the highest probabilities).

Jim

P.S. I should clarify that, using a real dictionary and a real-world corpus, the AggregateNameFinder, which simply merges the distinct predictions, achieved a 29% improvement in recall with no change in precision, which was already very high.




