[ https://issues.apache.org/jira/browse/UIMA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gregoire Jadi updated UIMA-3818:
--------------------------------

    Description: 
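The Unicode character '𝒪' (U+1D4AA, MATHEMATICAL SCRIPT CAPITAL O) cannot be deserialized by {{XmiCasDeserializer.deserialize}} after it has been serialized with {{XmiCasSerializer.serialize}}.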
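For context: '𝒪' lies outside the Basic Multilingual Plane, so a Java {{String}} stores it as a surrogate pair of two {{char}} values. The small JDK-only sketch below (the class name is only illustrative) prints both views of the character. Note that the low surrogate is 56490, the same number that appears in the parser error in the stack trace.

```java
public class SupplementaryCharDemo {
    // '𝒪' is U+1D4AA; the \\u escapes avoid depending on the source-file encoding.
    static final String SCRIPT_O = "\uD835\uDCAA";

    public static void main(String[] args) {
        // A Java String holds this single character as two UTF-16 code units.
        System.out.println(SCRIPT_O.length());                              // 2
        System.out.println((int) SCRIPT_O.charAt(0));                       // 55349 (0xD835, high surrogate)
        System.out.println((int) SCRIPT_O.charAt(1));                       // 56490 (0xDCAA, low surrogate)
        // Iterating by code point yields the single Unicode scalar value.
        System.out.println(SCRIPT_O.codePointCount(0, SCRIPT_O.length()));  // 1
        System.out.println(SCRIPT_O.codePointAt(0));                        // 119978 (0x1D4AA)
        System.out.println(Character.isSupplementaryCodePoint(SCRIPT_O.codePointAt(0))); // true
    }
}
```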

Here is a way to reproduce this:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.cas.impl.XmiCasSerializer;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class Test {
    public static void main(String[] args) throws Exception {
        JCas jCas = JCasFactory.createJCas();
        // U+1D4AA MATHEMATICAL SCRIPT CAPITAL O, a character outside the BMP
        jCas.setDocumentText("𝒪");
        File file = new File("/tmp/test.xmi");

        // Serialize the CAS to XMI; closing the stream ensures the file is complete before it is read back
        try (OutputStream outputStream = new FileOutputStream(file)) {
            XmiCasSerializer.serialize(jCas.getCas(), outputStream);
        }

        // Deserialize into an emptied CAS; this is where the SAXParseException is thrown
        jCas.reset();
        try (InputStream inputStream = new FileInputStream(file)) {
            XmiCasDeserializer.deserialize(inputStream, jCas.getCas());
        }
    }
}
{code}
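Because the repro writes '𝒪' with {{XmiCasSerializer}} and reads it back with a standard XML parser, the XMI text must encode the character in a form XML accepts: either as raw UTF-8 bytes or as a single numeric character reference for the whole code point. A minimal sketch of the second option follows, assuming plain text content; {{escapeXmlText}} is a hypothetical helper, not part of UIMA's API, and is not claimed to be how UIMA's serializer works.

```java
public final class CodePointXmlEscaper {
    // Hypothetical helper (not part of UIMA): escapes text for an XML text node,
    // emitting one numeric character reference per Unicode code point, not per UTF-16 char.
    // Assumption: the input has only well-formed surrogate pairs and no characters that
    // XML forbids outright (e.g. most C0 controls); those cases are not handled here.
    static String escapeXmlText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);   // combines a surrogate pair into one code point
            i += Character.charCount(cp);   // advances by 2 for supplementary characters
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (cp < 0x80) {
                        out.appendCodePoint(cp);                  // plain ASCII is written as-is
                    } else {
                        out.append("&#").append(cp).append(';');  // e.g. &#119978; for U+1D4AA
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXmlText("a<\uD835\uDCAA")); // a&lt;&#119978;
    }
}
```

Writing the character as raw UTF-8 would be equally valid XML; references are used here only to make the per-code-point behavior visible.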

And here is the stacktrace:
{code}
[Fatal Error] :1:350: Character reference "&#56490" is an invalid XML character.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 350; Character reference "&#56490" is an invalid XML character.
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1955)
        at org.apache.uima.cas.impl.XmiCasDeserializer.deserialize(XmiCasDeserializer.java:1872)
        at Test.main(Test.java:24)
     [java] Java Result: 1
{code}
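The number 56490 is 0xDCAA, the low surrogate of '𝒪' in UTF-16. XML 1.0's {{Char}} production excludes the surrogate range #xD800-#xDFFF, so a character reference to a surrogate value is rejected by a conforming parser, while a reference to the code point itself (&#119978;) is legal. The JDK-only sketch below (class and method names are illustrative) reproduces this with the parser bundled in the JDK rather than with UIMA; it suggests, but does not prove, that the XMI written by the serializer contains a surrogate reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SurrogateRefParseDemo {
    // Parses a small XML document with the JDK's built-in parser and returns the root's text.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            parseText(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One reference for the whole code point: legal (XML 1.0 allows #x10000-#x10FFFF).
        System.out.println(parseText("<t>&#119978;</t>").equals("\uD835\uDCAA")); // true
        // One reference per UTF-16 surrogate: rejected (#xD800-#xDFFF is excluded).
        System.out.println(isWellFormed("<t>&#55349;&#56490;</t>"));             // false
    }
}
```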

Please tell me if you need more information.

> Unsupported XML entity by XmiCas(De)serializer
> ----------------------------------------------
>
>                 Key: UIMA-3818
>                 URL: https://issues.apache.org/jira/browse/UIMA-3818
>             Project: UIMA
>          Issue Type: Bug
>          Components: Collection Processing
>    Affects Versions: 2.4.2SDK
>            Reporter: Gregoire Jadi
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
