Jörn,

Thanks for the link to that section of the documentation. The mention of the XMLUtils class was just what I needed. I wrote an XmlFilter class that uses XMLUtils to detect invalid XML characters and replace them with spaces, so that our annotation offsets still match the original text. I had been thinking about the issue all wrong: I was assuming that all ASCII-8 characters are also valid XML 1.0 characters, but control characters such as 0x1a (the one in the exception) are not.
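
For anyone hitting the same exception, here is a minimal sketch of this kind of filter. It is not the XmlFilter class mentioned above and it does not call XMLUtils; to stay self-contained it checks the XML 1.0 character ranges directly. The class layout and the method names (isXml10Char, indexOfFirstInvalid, replaceInvalidXmlChars) are assumptions made for illustration.

```java
// Hypothetical sketch: names and structure are assumptions, not the original XmlFilter.
final class XmlFilter {
    private XmlFilter() {}

    /** True if the code point is a legal character under the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the UTF-16 index of the first illegal character, or -1 if there is none. */
    static int indexOfFirstInvalid(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /**
     * Replaces every illegal character with a single space. The result has the
     * same length as the input, so begin/end offsets computed on the original
     * text remain valid.
     */
    static String replaceInvalidXmlChars(String text) {
        StringBuilder out = null; // allocated only if something must change
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                // Illegal code points (control characters, lone surrogates,
                // 0xFFFE, 0xFFFF) always occupy exactly one UTF-16 code unit.
                if (out == null) {
                    out = new StringBuilder(text);
                }
                out.setCharAt(i, ' ');
            }
            i += Character.charCount(cp);
        }
        return out == null ? text : out.toString();
    }
}
```

Calling replaceInvalidXmlChars on the document text before it is set on the CAS keeps the string length unchanged, and indexOfFirstInvalid can be logged beforehand to see which input documents contain the offending character.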

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu

On Jun 14, 2012, at 3:52 PM, Jörn Kottmann wrote:

> You write a string to the CAS which contains a non-xml character.
> This character cannot be serialized into XMI, and that's what this exception
> is about.
>
> Have a look at our documentation explaining the issue:
> http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
>
> Hope that helps,
> Jörn
>
> On 06/14/2012 11:39 PM, Thomas Ginter wrote:
>> We are getting an odd error while trying to process large datasets using
>> UIMA-AS 2.3.1. There is an exception thrown by the XmiCasSerializer in the
>> Client when it is in the process of serializing a CAS to be sent to a remote
>> service. The exception is as follows:
>>
>> org.apache.uima.resource.ResourceProcessException
>>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:854)
>>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:885)
>>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.process(BaseUIMAAsynchronousEngineCommon_impl.java:734)
>>     at gov.va.vinci.flap.Client.run(Client.java:181)
>>     at gov.va.vinci.density.DensityClient.main(DensityClient.java:137)
>> Caused by: org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: _, 0x1a
>>     at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.checkForInvalidXmlChars(XMLSerializer.java:254)
>>     at org.apache.uima.util.XMLSerializer$CharacterValidatingContentHandler.startElement(XMLSerializer.java:174)
>>     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.startElement(XmiCasSerializer.java:1003)
>>     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeFS(XmiCasSerializer.java:755)
>>     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.encodeIndexed(XmiCasSerializer.java:700)
>>     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.serialize(XmiCasSerializer.java:268)
>>     at org.apache.uima.cas.impl.XmiCasSerializer$XmiCasDocSerializer.access$700(XmiCasSerializer.java:108)
>>     at org.apache.uima.cas.impl.XmiCasSerializer.serialize(XmiCasSerializer.java:1539)
>>     at org.apache.uima.aae.UimaSerializer.serializeCasToXmi(UimaSerializer.java:136)
>>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.serializeCAS(BaseUIMAAsynchronousEngineCommon_impl.java:260)
>>     at org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngineCommon_impl.sendCAS(BaseUIMAAsynchronousEngineCommon_impl.java:779)
>>     ... 4 more
>>
>> It happens at apparently random points when processing the corpus and is
>> never actually "thrown" but is simply written to StdErr. Also the
>> serializer never seems to return, which means the
>> UimaAsynchronousEngine.process() method never returns and the client simply
>> "hangs" until it is manually terminated. To resolve this issue I have
>> implemented text filters for the incoming CAS data to prevent anything out
>> of the ASCII-8 range. I have also tried switching the server and client to
>> binary serialization strategies, but that causes the XmiCasSerializer in my
>> UimaAsBaseListener object to return errors when attempting to serialize CAS
>> objects received in the entityProcessingComplete event.
>>
>> Any suggestions from the UIMA masters? How can I debug further so that I
>> can find out A: Where is this illegal character coming from and B: How can I
>> prevent it from happening?
>>
>> Thanks,
>>
>> Thomas Ginter
>> 801-448-7676
>> thomas.gin...@utah.edu