Hi Peter,
I believe I've encountered this too; I never got around to tracking it down
to the root cause, and didn't have the civic-mindedness to report it as you
have. Thanks!
To shut it up I implemented a brutal brute-force workaround, enclosed for
your possible amusement.
> But it occurred to me that in every other case, where the annotation
> doesn't begin on the first character and it doesn't throw an exception, it
> might cause downstream methods like doesSubsume to give the wrong result
> because the begin/end offsets are wrong.
One would think so, but interestingly enough, this does *not* seem to be
the case. Everywhere I've checked (quite a few places, over the past few
years), non-initial ContextAnnotation offsets look correct.
Workaround: a class that extends NegexAnnotator and adjusts the offsets at
the end of the process() method.
public class NegexAnnotator extends
    org.apache.ctakes.ytex.uima.annotators.NegexAnnotator {
  ...
  // Clamp ContextAnnotation offsets that fall outside the document text.
  private void adjustContextOffsets(JCas jCas) {
    String text = jCas.getDocumentText();
    if (text == null) return;
    Collection<ContextAnnotation> contexts = JCasUtil.select(jCas,
        ContextAnnotation.class);
    if (contexts == null || contexts.isEmpty()) return;
    // A negative begin is the case that throws in doesSubsume.
    contexts.stream()
        .filter(c -> c.getBegin() < 0)
        .peek(c -> logger.debug("adjusting begin=" + c.getBegin()))
        .forEach(c -> c.setBegin(0));
    // don't know if this happens
    // UIMA end offsets are exclusive, so end == docTextLen is still valid.
    int docTextLen = text.length();
    contexts.stream()
        .filter(c -> c.getEnd() > docTextLen)
        .peek(c -> logger.debug("adjusting end=" + c.getEnd()))
        .forEach(c -> c.setEnd(docTextLen));
  }
}
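
For what it's worth, the clamping idea can be tried out without a pipeline.
The sketch below is plain Java with no UIMA or cTAKES dependency.
OffsetClampSketch and clamp are hypothetical names used only for
illustration, and it assumes the UIMA convention that end offsets are
exclusive (so an end equal to the text length is legal).

```java
import java.util.Arrays;

/** Hypothetical helper, not part of cTAKES: clamps a [begin, end) span to a document. */
final class OffsetClampSketch {

    /** Returns {begin, end} forced into 0 <= begin <= end <= textLength. */
    static int[] clamp(int begin, int end, int textLength) {
        int safeBegin = Math.max(0, Math.min(begin, textLength));
        int safeEnd = Math.max(safeBegin, Math.min(end, textLength));
        return new int[] {safeBegin, safeEnd};
    }

    public static void main(String[] args) {
        String text = "negative for fever";
        // A begin of -1 mimics the off-by-one seen when the note starts with the trigger.
        int[] span = clamp(-1, 8, text.length());
        System.out.println(Arrays.toString(span));
        // substring(-1, 8) would throw StringIndexOutOfBoundsException; the clamped span does not.
        System.out.println(text.substring(span[0], span[1]));
    }
}
```

This only hides the symptom, same as the workaround above; it does not
explain where the bad begin offset comes from.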
On Sun, Aug 30, 2020 at 5:35 PM Peter Abramowitsch wrote:
> Hi,
> I was getting a StringIndexOutOfBoundsException in
> DependencyUtil.doesSubsume(annot1, annot2) with exactly this situation:
>
> *negex annotator*
> *the text begins "negative for <anything>"*
>
> If the chunk *negative for xyz* is preceded by anything else, even a
> space, the problem goes away. It also goes away when you choose another
> style of negation, "no headache" for instance.
>
> I've traced the problem back to some illegal entries in the JCas. You can
> see from the image below that the ContextAnnotation's begin offset is
> illegal.
>
> Clearly there's an off-by-one error and this triggered the exception
> because in my example, the Annotation is created right from the 0th char of
> my note text. But it occurred to me that in every other case, where the
> annotation doesn't begin on the first character and it doesn't throw an
> exception, it might cause downstream methods like doesSubsume to give the
> wrong result because the begin/end offsets are wrong.
>
> I'm not sure how to follow this up. But if anyone wants to tackle it....?
>
> This is from HistoryAttributeClassifier beginning at line 274
>
> [image: image.png]