Span.getCoveredText() returning string based on character positions

Jeff Zemerick Thu, 13 Oct 2022 06:51:16 -0700

Hi All,

The NameFinder implementations create spans based on the entity's
token-based start/end indexes.

However, Span.getCoveredText() extracts the covered text using the
character start/end offsets rather than the token start/end indexes.

An example:

sentence = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman"
span = [0..2) person   (token indexes, i.e. tokens 0 and 1)

And getting the covered text of the span:

span.getCoveredText(sentence) returns "Ne" and not "Neil Abercrombie"



Span.getCoveredText() is used widely, so changing its behavior does not
seem like the right fix. Is the real problem that the NameFinder
implementations use token positions instead of character positions when
creating Spans? In other words, should NameFinderME/DL be updated to use
character start/end positions instead of token start/end positions?
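
To make the mismatch concrete, here is a rough sketch in plain Java of
mapping a token-based span to character offsets. It deliberately does not
use the OpenNLP classes: TokenSpanDemo, tokenOffsets, toCharSpan and
coveredText are hypothetical names for illustration only, and whitespace
tokenization is an assumption. It is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSpanDemo {

    // Character offsets [start, end) of each whitespace-separated token.
    // Assumption: tokens are split on whitespace only.
    static List<int[]> tokenOffsets(String sentence) {
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        int n = sentence.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(sentence.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(sentence.charAt(i))) {
                i++;
            }
            offsets.add(new int[] {start, i});
        }
        return offsets;
    }

    // Map a token span [tokStart, tokEnd) to a character span [start, end).
    static int[] toCharSpan(List<int[]> offsets, int tokStart, int tokEnd) {
        return new int[] {offsets.get(tokStart)[0], offsets.get(tokEnd - 1)[1]};
    }

    // Stand-in for character-based covered text: a plain substring.
    static String coveredText(String sentence, int start, int end) {
        return sentence.substring(start, end);
    }

    public static void main(String[] args) {
        String sentence = "Neil Abercrombie Anibal Acevedo-Vila Gary Ackerman";

        // Token indexes [0..2) read as character offsets give the wrong text.
        System.out.println(coveredText(sentence, 0, 2)); // prints "Ne"

        // Converting to character offsets first gives the expected text.
        int[] chars = toCharSpan(tokenOffsets(sentence), 0, 2);
        System.out.println(coveredText(sentence, chars[0], chars[1])); // prints "Neil Abercrombie"
    }
}
```

For the example sentence, the token span [0..2) corresponds to the
character span [0..16).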

Thanks,
Jeff
