[ 
https://issues.apache.org/jira/browse/UIMA-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000929#comment-17000929
 ] 

Richard Eckart de Castilho commented on UIMA-6162:
--------------------------------------------------

I have set up a unit test on a PR branching off master before your fix: 
https://github.com/apache/uima-uimaj/pull/16

The test in this PR currently fails because the branch is based on a version 
that doesn't include your fix yet. Once master is merged into it, the test 
should pass.

> Concurrent binary serialization produces corrupt output
> -------------------------------------------------------
>
>                 Key: UIMA-6162
>                 URL: https://issues.apache.org/jira/browse/UIMA-6162
>             Project: UIMA
>          Issue Type: Bug
>          Components: UIMA
>    Affects Versions: 3.1.1SDK
>            Reporter: Richard Eckart de Castilho
>            Priority: Major
>         Attachments: admin.ser
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I suspect there could be an issue in `BinaryCasSerDes`.
> When deserializing the attached file `admin.ser`, I get this stack trace:
> {code:java}
> Caused by: java.lang.ClassCastException: class org.apache.uima.jcas.tcas.Annotation cannot be cast to class org.apache.uima.jcas.cas.Sofa (org.apache.uima.jcas.tcas.Annotation and org.apache.uima.jcas.cas.Sofa are in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @4593ff34)
>     at org.apache.uima.cas.impl.BinaryCasSerDes.makeSofaFromHeap(BinaryCasSerDes.java:1823) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.getSofaFromAnnotBase(BinaryCasSerDes.java:1817) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.createFSsFromHeaps(BinaryCasSerDes.java:1701) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:259) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:328) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.Serialization.deserializeCASComplete(Serialization.java:129) ~[uimaj-core-3.1.1.jar:3.1.1]
> {code}
> The code used to read the file before deserializing is as follows:
> {code:java}
>     public static void readSerializedCas(CAS aCas, File aFile)
>         throws IOException
>     {
>         try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(aFile))) {
>             CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
>             deserializeCASComplete(serializer, (CASImpl) aCas);
>         }
>         catch (ClassNotFoundException e) {
>             throw new IOException(e);
>         }
>     }
> {code}
> I set a breakpoint at BinaryCasSerDes:1608, which is a for loop iterating over 
> the heap. Apparently, the first feature structure encountered is an 
> annotation, NOT the SOFA. Then, in line 1700, the deserializer tries to 
> resolve the SOFA for this annotation but fails because the SOFA has not yet 
> been deserialized. Eventually makeSofaFromHeap is called and checks whether a 
> SOFA needs to be created. It tries to look up the SOFA's ID (1) via 
> csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821) and generates a new SOFA. 
> However, when the SECOND annotation is read, csds.addr2fs.get(sofaAddr) 
> (BinaryCasSerDes:1821) is called again to resolve the SOFA at addr 1, and it 
> returns the previously deserialized annotation instead of the SOFA that had 
> been created.
> The implicitly created SOFA is added to the csds.addr2fs map at key 1. 
> However, later, in BinaryCasSerDes:1723, key 1 is overwritten by the 
> deserialized annotation:
> {code}
>         if (!isSofa) { // if it was a sofa, other code added or pended it
>           // this overwrites the SOFA that was created at key 1,
>           // because heapIndex is also 1
>           csds.addFS(fs, heapIndex);
>         }
> {code}
> The heap looks something like this:
> {code}
> [0, 187, 1, 33, 46, 199, 200, 201, 44, 202, 187, 1, 33, 46, 203, 204, 205, 
> 45, 206, 187, 1, 33, 46, 207, 208, 209, 46, 210, 187, 1, 33, 46, 211, 212, 
> 213, 47, 214, 187, 1, 33, 46, 215, 216, 217, 48, 1, 187, 1,...
> {code}
> I guess that 187 is the type code of the first annotation, and we can see 
> that it repeats several times. The 1 seems to be the SOFA ID, the first 
> feature of each feature structure. However, instead of referring to the 
> address of the SOFA, the 1 points at the first annotation, which is NOT a 
> SOFA.
> Could this be a bug in the serialization code, which assumes that the SOFA is 
> always in the first position?
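
The overwrite described in the quoted report can be modelled in isolation. The following is only a minimal sketch, not UIMA code: the class Addr2FsOverwriteSketch, the nested Sofa and Annotation stand-ins, and the plain HashMap playing the role of csds.addr2fs are all hypothetical names introduced for illustration. It assumes, as the report suggests, that the first annotation sits at heap index 1 and that its SOFA reference is also 1.

```java
import java.util.HashMap;
import java.util.Map;

public class Addr2FsOverwriteSketch {

    // Hypothetical stand-ins for the real UIMA classes; illustration only.
    public static class Sofa {}
    public static class Annotation {}

    // Assumption taken from the report: the SOFA reference stored in each annotation is 1.
    static final int SOFA_ADDR = 1;

    /**
     * Replays the sequence from the report against a plain map and returns
     * whatever address 1 resolves to when a second annotation looks up its SOFA.
     */
    public static Object simulate() {
        Map<Integer, Object> addr2fs = new HashMap<>();
        int heapIndex = 1; // assumption: the first annotation starts at heap index 1

        // First annotation: nothing is registered at address 1 yet,
        // so a SOFA is created implicitly and stored there.
        if (addr2fs.get(SOFA_ADDR) == null) {
            addr2fs.put(SOFA_ADDR, new Sofa());
        }

        // The annotation is not a SOFA, so it is registered at its own heap index.
        // That index is also 1, which replaces the SOFA stored a moment ago.
        addr2fs.put(heapIndex, new Annotation());

        // Second annotation: the lookup at address 1 now yields the annotation.
        return addr2fs.get(SOFA_ADDR);
    }

    public static void main(String[] args) {
        Object resolved = simulate();
        System.out.println("address 1 resolves to: " + resolved.getClass().getSimpleName());
        System.out.println("is a Sofa: " + (resolved instanceof Sofa));
    }
}
```

Casting the value returned by simulate() to Sofa would fail in the same way as the ClassCastException in the stack trace, since the map entry at key 1 holds the annotation at that point.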



