[jira] [Created] (UIMA-5757) Unable to extract features when annotation ends with HTML tag

Miguel Alvarez (JIRA) Thu, 05 Apr 2018 16:02:35 -0700

Miguel Alvarez created UIMA-5757:
------------------------------------

             Summary: Unable to extract features when annotation ends with HTML tag
                 Key: UIMA-5757
                 URL: https://issues.apache.org/jira/browse/UIMA-5757
             Project: UIMA
          Issue Type: Bug
          Components: Ruta
    Affects Versions: 2.6.1ruta
         Environment: RUTA 2.6.1, Windows 10, Eclipse Mars, JDK 1.8.0_144
            Reporter: Miguel Alvarez



If an annotation covers the whole sofa string and the sofa string ends with 
an HTML tag, RUTA does not seem to be able to extract the features of that 
annotation. For instance, let's suppose we have this document (represented 
as XMI):

{code:java}
// XMI document
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" 
xmlns:tcas="http:///uima/tcas.ecore" 
xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
<cas:NULL xmi:id="0"/>
<tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12" language="es"/>
<types:MyDocument xmi:id="14" sofa="1" begin="0" end="12" 
documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" 
sofaString="ABCDEFGHIJ&lt;p&gt;"/>
<cas:View sofa="1" members="8 14"/>
</xmi:XMI>
{code}
And the following RUTA script:

{code:java}
// RUTA script
STRING documentId = "Unknown";
com.acme.uima.types.MyDocument{-> GETFEATURE("documentId", documentId)};
LOG("Starting to process document: " + documentId);
{code}
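The LOG action outputs "Unknown", i.e. the variable keeps its initial value, 
so GETFEATURE apparently never assigns the feature value. As soon as the sofa 
string doesn't end with an HTML tag, it works fine.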
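
Since the behavior depends on where the annotation ends relative to the 
trailing tag, it may help to confirm the offsets in the XMI first. The 
following is only a minimal sketch in Python (standard library only, not part 
of UIMA or RUTA; the function name check_offsets is hypothetical) that reads 
the XMI from this report and lists the text each annotation covers:

```python
import xml.etree.ElementTree as ET

# The XMI from this report, embedded so the sketch is self-contained.
XMI = '''<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
 xmlns:tcas="http:///uima/tcas.ecore"
 xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
<cas:NULL xmi:id="0"/>
<tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12" language="es"/>
<types:MyDocument xmi:id="14" sofa="1" begin="0" end="12"
 documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
 sofaString="ABCDEFGHIJ&lt;p&gt;"/>
<cas:View sofa="1" members="8 14"/>
</xmi:XMI>'''

CAS_NS = "{http:///uima/cas.ecore}"


def check_offsets(xmi_bytes):
    """Return (sofa length, [(type, begin, end, covered text, ends at sofa end)])."""
    root = ET.fromstring(xmi_bytes)
    # The parser unescapes &lt; and &gt;, so this is the real sofa string.
    text = root.find(CAS_NS + "Sofa").get("sofaString")
    report = []
    for elem in root:
        # Only elements with begin/end attributes are annotations.
        if elem.get("begin") is None or elem.get("end") is None:
            continue
        begin, end = int(elem.get("begin")), int(elem.get("end"))
        name = elem.tag.split("}")[-1]  # drop the namespace part of the tag
        report.append((name, begin, end, text[begin:end], end == len(text)))
    return len(text), report


length, report = check_offsets(XMI.encode("utf-8"))
print("sofa length:", length)
for name, begin, end, covered, at_end in report:
    print(name, begin, end, repr(covered), "ends at sofa end:", at_end)
```

For the XMI above, this reports a sofa string of 13 characters, while both 
annotations end at offset 12 and cover "ABCDEFGHIJ<p", so they stop just 
before the closing ">". Whether that offset difference is related to the 
behavior is not established here.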

 

Any ideas what could be going on?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
