Look at http://en.wikipedia.org/wiki/Louis_Vuitton and
http://en.wikipedia.org/wiki/Louis_Vuitton_(designer) .
In this case it's unambiguous, but in the Italian Wikipedia both the
designer and the company are described on the same page (which
consequently is "tagged" as <person> and <company>). While parsing a
Wikipedia dump, it is quite hard for me to decide whether to assign one
tag or the other.
The simplest strategy may be to establish an ordering, or priority,
between tags (e.g. between <company> and <person>, always assign
<company>). The other option is to discard the tagging so as not to
introduce errors, but in this case I lose a precious tagged sentence.
If I could assign both tags to the same entity, I see an advantage: it
may be better to know that a certain entity is a <person> OR a
<company> (or in some cases a <person> AND a <company>) than to have
only one tag or none.
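
The three options above (priority, discard, keep both) can be sketched
as follows. This is only a minimal illustration in plain Python, not
OpenNLP code: the names TAG_PRIORITY and resolve_tags, and the "|"
separator used for the combined label, are assumptions made up for this
example.

```python
# Hypothetical priority order for this example: earlier tags win.
TAG_PRIORITY = ["company", "person"]


def resolve_tags(tags, strategy="priority"):
    """Reduce the candidate tags of one entity to the label(s) to emit.

    strategy="priority": keep only the highest-priority tag.
    strategy="discard":  emit nothing when the entity is ambiguous.
    strategy="combine":  keep all tags as one combined label.
    """
    unique = sorted(set(tags))
    if len(unique) <= 1:
        # Zero or one tag: nothing to resolve.
        return unique
    if strategy == "priority":
        # Tags missing from TAG_PRIORITY rank last, in alphabetical order.
        def rank(tag):
            if tag in TAG_PRIORITY:
                return TAG_PRIORITY.index(tag)
            return len(TAG_PRIORITY)
        return sorted(unique, key=rank)[:1]
    if strategy == "discard":
        return []
    if strategy == "combine":
        return ["|".join(unique)]
    raise ValueError("unknown strategy: " + strategy)


print(resolve_tags({"person", "company"}, "priority"))
print(resolve_tags({"person", "company"}, "discard"))
print(resolve_tags({"person", "company"}, "combine"))
```

With "combine", the ambiguous page yields a single label that a
single-tag trainer could treat as its own entity type, at the cost of
having fewer examples per type.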
However, I wasn't interested specifically in Louis Vuitton, but in
discussing multi-tagging of a given entity. Imagine also examples like
a dolphin, which is a <fish> and a <mammal> (set intersection), or a
cat, which is an <animal> and a <felinae> (a subset). These cases,
though, could be resolved with some ontological inference.
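
As a rough sketch of what such inference could look like for the subset
case: given a tag hierarchy, any tag implied by a more specific one can
be dropped. The PARENT table and the function names below are
hypothetical, invented only for this illustration.

```python
# Hypothetical tag hierarchy: child tag -> parent tag.
PARENT = {"felinae": "animal", "mammal": "animal", "fish": "animal"}


def ancestors(tag):
    """Return every tag reachable from `tag` by following PARENT."""
    found = []
    while tag in PARENT:
        tag = PARENT[tag]
        if tag in found:
            break  # guard against a cycle in the table
        found.append(tag)
    return found


def most_specific(tags):
    """Drop every tag that is an ancestor of another tag in the set."""
    tags = set(tags)
    implied = {a for t in tags for a in ancestors(t)}
    return sorted(tags - implied)


print(most_specific({"animal", "felinae"}))  # subset case
print(most_specific({"fish", "mammal"}))     # intersection case
```

The subset case collapses to the most specific tag, while in the
intersection case neither tag implies the other, so both remain and the
multi-tag question is still open.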
Riccardo
On 12/01/2012 04:15, James Kosin wrote:
On 1/9/2012 8:43 AM, Jörn Kottmann wrote:
On 1/9/12 2:37 PM, Riccardo Tasso wrote:
Hi all,
does it make sense to use the Name Finder module with multiple
tags/entities?
My use case is the following: I have an ambiguous training set (in my
case extracted from Wikipedia). For example, for "Louis Vuitton" I
can't easily decide if it occurs as Company or as Person. However, I
think it is better to recognize that the entity found is a Person OR a
Company than not to recognize it at all.
Is it currently possible with OpenNLP?
That sounds like a very rare case to me. Maybe just label it as one of
the two (in this case maybe as company);
then it will at least be detected?
Jörn
It may be better to label as a Company. Louis Vuitton may not really
exist as a person. I can't seem to find anything that suggests he is a
real person.
James