OK, sorry, I rushed earlier. Now I remember what happened 8-9 months
ago. The problem is not in Span.spansToStrings() but in
RegexNameFinder! Calling the .find method of
RegexNameFinder returns spans of the form I mentioned earlier (#<Span
[3..3)>). I do remember fixing this, but I'm not sure I submitted a
patch. Can anyone shed some light, or should I go back and diff my sources?
Jim
On 20/02/13 12:16, Jim foo.bar wrote:
I forgot to mention that I'm referring to the 1.5.2-incubating
version available on Maven. Presumably this has been fixed in trunk?
Jim
On 20/02/13 11:53, Jim foo.bar wrote:
Hi everyone,
I'm pretty sure we had this discussion last year and that it was
fixed! Basically, whenever any NameFinder recognises a single word
token the resulting span is something like this:
(#<Span [3..3)> #<Span [6..6)>)
while I think it should have been (#<Span [3..4)> #<Span [6..7)>).
As a result, the following exception is thrown:
StringIndexOutOfBoundsException String index out of range: -1
java.lang.AbstractStringBuilder.substring (AbstractStringBuilder.java:872)
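
For anyone trying to see why an empty span would end in that exception, here is a minimal sketch. It assumes the span-to-string conversion appends each covered token plus a space and then trims the trailing space with substring; that is my guess at the mechanism, not a copy of the OpenNLP source. TokenSpan and SpanJoinSketch are hypothetical names made up for this example.

```java
// Hypothetical stand-in for a half-open span over token indices.
// This is NOT the OpenNLP Span class.
final class TokenSpan {
    final int start; // inclusive token index
    final int end;   // exclusive token index

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

class SpanJoinSketch {
    // Assumed strategy: append every covered token followed by a space,
    // then cut off the trailing space.
    static String join(TokenSpan span, String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = span.start; i < span.end; i++) {
            sb.append(tokens[i]).append(' ');
        }
        // For an empty span nothing was appended, so sb.length() is 0
        // and this call becomes substring(0, -1), which throws.
        return sb.substring(0, sb.length() - 1);
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};
        // A well-formed single-token span covers exactly one token.
        System.out.println(join(new TokenSpan(3, 4), tokens));
        try {
            // A degenerate span with start == end covers no tokens at all.
            join(new TokenSpan(3, 3), tokens);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty span failed: " + e.getMessage());
        }
    }
}
```

Under that assumption, [3..4) yields "10" while [3..3) fails in substring, which would match the trace above.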
I am 99% positive that we've fixed this in the past; at least my
private OpenNLP build behaves as expected. Just in case I'm doing
something wrong, here are my steps:
- create a RegexNameFinder, passing the following regexes in an array:
"\d+", "\w+ive?"
- call find on it, passing the following text in an array: ["azestapine"
"treatment" "is" "10" "times" "more" "effective" "."]
- I get back the aforementioned spans (#<Span [3..3)> #<Span [6..6)>)
- trying to convert them to a string array (via Span/spansToStrings)
doesn't work!
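
To make the expected result of those steps concrete, here is a small self-contained sketch that matches each token against the same two regexes and emits half-open spans with end = start + 1. It uses only java.util.regex. TokenRegexSketch and its find method are hypothetical, and matching token by token is a simplification I chose for illustration, not how RegexNameFinder is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenRegexSketch {
    // Returns {start, end} pairs over token indices, with end exclusive.
    // Simplification: a token is reported when any pattern matches it in full.
    static List<int[]> find(String[] tokens, Pattern[] patterns) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            for (Pattern p : patterns) {
                if (p.matcher(tokens[i]).matches()) {
                    // A single-token match must end one past its start.
                    spans.add(new int[] {i, i + 1});
                    break; // one span per token is enough here
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};
        Pattern[] patterns = {Pattern.compile("\\d+"),
                              Pattern.compile("\\w+ive?")};
        for (int[] s : find(tokens, patterns)) {
            System.out.println("[" + s[0] + ".." + s[1] + ")");
        }
    }
}
```

With the tokens above this prints [3..4) and [6..7), which is what I would expect the finder to return for "10" and "effective".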
Any ideas? This is quite important, isn't it?
Jim