[ https://issues.apache.org/jira/browse/UIMA-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Eckart de Castilho updated UIMA-2101:
---------------------------------------------
Attachment: UIMA-2101-eckart-20110329.patch
In addition to being able to disable formatting, as motivated by Steven, I
would like to be able to access the SAX events generated from the CAS so that I
can use a custom transformer in the DKPro Core component XmlWriterInline.
I have attached a patch that addresses the issue. The patch is against SVN trunk
revision 1085925 of the uimaj-core module.
- Added a new method, CasToInlineXml.generateXML(CAS, FSMatchConstraint,
ContentHandler), which lets the user plug in a custom transformer or any other
SAX event handler.
- Added a new property, outputFormatted, that controls whether the generated XML
strings are formatted. This property does not affect the new generateXML(...)
method above. By default the property is set to true, which preserves the
behavior from before the patch.
- Added a rudimentary test case that checks whether formatting can be switched
on and off. The test code borrows from XmiCasDeserializerTest.
- Auto-formatted the code with the UIMA Eclipse code profile; this added a few braces.
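To illustrate the SAX-based route in plain JDK terms, here is a rough, self-contained sketch. It does not call the patched CasToInlineXml: the hypothetical emitEvents method stands in for generateXML(...) by sending a hand-written event sequence (modeled loosely on the single-annotation example below, with an assumed document text "x") to a JAXP TransformerHandler. The boolean passed to the hypothetical serialize method only mimics the idea of the outputFormatted property; the class and method names are illustrative, not part of UIMA.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class InlineXmlSaxSketch {

    // Builds begin/end attributes for an annotation element.
    static AttributesImpl span(int begin, int end) {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "begin", "begin", "CDATA", Integer.toString(begin));
        attrs.addAttribute("", "end", "end", "CDATA", Integer.toString(end));
        return attrs;
    }

    // Hypothetical stand-in for the events generateXML(...) would pass to a
    // ContentHandler: a document annotation wrapping one annotation over "x".
    static void emitEvents(ContentHandler h) throws SAXException {
        String doc = "uima.tcas.DocumentAnnotation";
        String ann = "uima.tcas.Annotation";
        char[] text = {'x'};
        h.startDocument();
        h.startElement("", "Document", "Document", new AttributesImpl());
        h.startElement("", doc, doc, span(0, 1));
        h.startElement("", ann, ann, span(0, 1));
        h.characters(text, 0, text.length);
        h.endElement("", ann, ann);
        h.endElement("", doc, doc);
        h.endElement("", "Document", "Document");
        h.endDocument();
    }

    // Routes the events through a JAXP identity transformer; the flag only
    // mimics an on/off switch for formatting (assumption, not UIMA's API).
    static String serialize(boolean formatted) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, formatted ? "yes" : "no");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        emitEvents(handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(false));  // everything on one line
        System.out.println(serialize(true));   // line breaks between elements
    }
}
```

Because the caller owns the ContentHandler, a component such as XmlWriterInline could swap the identity transformer for its own XSLT or handler without touching how the events are produced.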
> CasToInlineXml adds whitespace
> ------------------------------
>
> Key: UIMA-2101
> URL: https://issues.apache.org/jira/browse/UIMA-2101
> Project: UIMA
> Issue Type: Bug
> Affects Versions: 2.3.1SDK
> Reporter: Steven Bethard
> Attachments: UIMA-2101-eckart-20110329.patch
>
>
> CasToInlineXml adds indentation between adjacent XML elements. For example,
> for a single-character document with a single annotation covering that one
> character, it writes:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <Document>
> <uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1"
> language="x-unspecified">
> <uima.tcas.Annotation sofa="Sofa" begin="0" end="1">
> </uima.tcas.Annotation>
> </uima.tcas.DocumentAnnotation>
> </Document>
> {noformat}
> I think it should instead write everything on a single line, that is:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1"
> language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1">
> </uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
> {noformat}
> I believe this could be fixed by replacing the line:
> {noformat}
> XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
> {noformat}
> with the line:
> {noformat}
> XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
> {noformat}
> I think it's a bug that CasToInlineXml changes the character offsets (the added
> whitespace becomes part of the text content), but I would also be happy if there
> were an alternate constructor or a method on CasToInlineXml that allowed
> disabling the formatting.
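Why the indentation shifts offsets: in inline XML the character data is the document text, so any whitespace a serializer inserts between elements ends up in front of, or inside, the annotated text. The following JDK-only sketch makes that concrete. It assumes a hypothetical one-character document "x" and hand-written XML strings rather than real CasToInlineXml output, and the class and method names are illustrative. It reads the XML back with a SAX parser and compares the recovered text with the original.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InlineTextCheck {

    // Concatenates all character data in an inline XML string, i.e. the text a
    // consumer sees when it reads the serialized document back in.
    static String extractText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String docText = "x";  // hypothetical one-character document
        String flat = "<Document><uima.tcas.Annotation begin=\"0\" end=\"1\">x</uima.tcas.Annotation></Document>";
        String indented = "<Document>\n  <uima.tcas.Annotation begin=\"0\" end=\"1\">x</uima.tcas.Annotation>\n</Document>";

        System.out.println(extractText(flat).equals(docText));      // true
        System.out.println(extractText(indented).equals(docText));  // false
        // The annotation claims begin=0, but its text now starts at index 3.
        System.out.println(extractText(indented).indexOf('x'));     // 3
    }
}
```

Without a DTD, a non-validating SAX parser reports the inter-element whitespace through characters(), so it cannot be told apart from real document text.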
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira