CasToInlineXml adds whitespace
------------------------------

                 Key: UIMA-2101
                 URL: https://issues.apache.org/jira/browse/UIMA-2101
             Project: UIMA
          Issue Type: Bug
    Affects Versions: 2.3.1SDK
            Reporter: Steven Bethard


CasToInlineXml adds indentation between adjacent XML elements. For example, for
a single-character document with a single annotation covering that one
character, it writes:

<?xml version="1.0" encoding="UTF-8"?>
<Document>
    <uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" 
language="x-unspecified">
        <uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> 
</uima.tcas.Annotation>
    </uima.tcas.DocumentAnnotation>
</Document>

I think it should instead write everything on a single line, that is:

<?xml version="1.0" encoding="UTF-8"?>
<Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" 
language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> 
</uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>

I believe this could be fixed by replacing the line:

XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);

with the line:

XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);

I think it's a bug that CasToInlineXml changes the character offsets by
inserting this whitespace, but I would also be happy if there were an alternate
constructor or a method on CasToInlineXml that allowed disabling the formatting.
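
As a rough illustration of why the formatting matters, here is a minimal sketch
that uses only the JDK's javax.xml.transform classes, not UIMA's XMLSerializer
itself. It assumes the formatting flag behaves like the standard
OutputKeys.INDENT output property, which is an assumption on my part. The class
name InlineXmlIndentDemo and the element names Outer and Inner are made up for
the example. It serializes the same SAX events with indentation on and off and
reports how many extra characters the indented form contains:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; it stands in for the serializer, it is not UIMA code.
public class InlineXmlIndentDemo {

    // Serializes <Document><Outer><Inner>x</Inner></Outer></Document>
    // from SAX events, with or without indentation.
    static String serialize(boolean indent) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            Transformer t = handler.getTransformer();
            t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            // Leave out the XML declaration so only the elements are compared.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl noAttrs = new AttributesImpl();
            char[] text = {'x'}; // the single-character document text
            handler.startDocument();
            handler.startElement("", "Document", "Document", noAttrs);
            handler.startElement("", "Outer", "Outer", noAttrs);
            handler.startElement("", "Inner", "Inner", noAttrs);
            handler.characters(text, 0, text.length);
            handler.endElement("", "Inner", "Inner");
            handler.endElement("", "Outer", "Outer");
            handler.endElement("", "Document", "Document");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String compact = serialize(false);
        String indented = serialize(true);
        System.out.println("Unformatted:\n" + compact);
        System.out.println("Formatted:\n" + indented);
        // Every extra character is whitespace that was not in the document.
        System.out.println("Extra characters: "
            + (indented.length() - compact.length()));
    }
}
```

Any whitespace that appears only in the formatted output ends up between the
tags, so a consumer that strips the tags to recover the text no longer gets
back the original single character at offset 0.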

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
