Miguel Alvarez created UIMA-5757:
------------------------------------
Summary: Unable to extract features when annotation ends with HTML tag
Key: UIMA-5757
URL: https://issues.apache.org/jira/browse/UIMA-5757
Project: UIMA
Issue Type: Bug
Components: Ruta
Affects Versions: 2.6.1ruta
Environment: RUTA 2.6.1, Windows 10, Eclipse Mars, JDK 1.8.0_144
Reporter: Miguel Alvarez
If an annotation covers the whole sofa string and the sofa string ends with an
HTML tag, RUTA does not seem to be able to extract the features of that
annotation. For instance, consider this document (represented as XMI):
{code:xml}
// XMI document
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
    xmlns:tcas="http:///uima/tcas.ecore"
    xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12" language="es"/>
  <types:MyDocument xmi:id="14" sofa="1" begin="0" end="12"
      documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="ABCDEFGHIJ&lt;p&gt;"/>
  <cas:View sofa="1" members="8 14"/>
</xmi:XMI>
{code}
And the following RUTA script:
{code:java}
// RUTA script
STRING documentId = "Unknown";
com.acme.uima.types.MyDocument{-> GETFEATURE("documentId", documentId)};
LOG("Starting to process document: " + documentId);
{code}
The LOG action outputs "Unknown", the initial value of the variable, instead of
the documentId of the annotation. As soon as the sofa string no longer ends
with an HTML tag, it works fine.
Any ideas what could be going on?
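For anyone trying to reproduce this, the offsets in the XMI can be sanity-checked with the small Python sketch below. It is not part of the failing pipeline and uses only the standard library. It assumes that the trailing tag in sofaString is XML-escaped (`&lt;p&gt;`) so that the document parses, and the helper name `check_offsets` is made up for this illustration. It compares the end offset of each annotation with the length of the sofa string, since the annotations are meant to cover the whole sofa string. Whether any mismatch it reports is related to the GETFEATURE behaviour, or is only an artifact of how the example was transcribed, is an open question.

```python
import xml.etree.ElementTree as ET

# XMI from the report. Assumption: the trailing tag in sofaString is
# XML-escaped, otherwise the document would not be well-formed.
XMI = b"""<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
    xmlns:tcas="http:///uima/tcas.ecore"
    xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12" language="es"/>
  <types:MyDocument xmi:id="14" sofa="1" begin="0" end="12"
      documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="ABCDEFGHIJ&lt;p&gt;"/>
  <cas:View sofa="1" members="8 14"/>
</xmi:XMI>
"""

CAS_NS = "{http:///uima/cas.ecore}"


def check_offsets(xmi):
    """Return the sofa string and (type name, end, sofa length) per annotation."""
    root = ET.fromstring(xmi)
    text = root.find(CAS_NS + "Sofa").get("sofaString")
    results = []
    for elem in root:
        # Only elements carrying begin/end offsets are annotations.
        if elem.get("begin") is None or elem.get("end") is None:
            continue
        name = elem.tag.split("}")[1]  # strip the "{namespace}" prefix
        results.append((name, int(elem.get("end")), len(text)))
    return text, results


text, results = check_offsets(XMI)
print("sofa string:", repr(text), "length:", len(text))
for name, end, length in results:
    # text[end:] is the part of the sofa string after the annotation's end.
    print(name, "end =", end, "sofa length =", length,
          "uncovered tail =", repr(text[end:]))
```

With the values as posted, the sofa string has 13 characters while both annotations end at offset 12, so the final `>` lies outside them.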