[ https://issues.apache.org/jira/browse/UIMA-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998495#comment-16998495 ]

Richard Eckart de Castilho commented on UIMA-6162:
--------------------------------------------------

I can't easily roll the application back to UIMA v2, and the problem is really
difficult to reproduce.

This is the code used to serialize the CAS. I added the middle part
(SAFEGUARD/preserveForDebugging) to help me with debugging; it is not part of
the regular code:

{code}
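import static org.apache.uima.cas.impl.Serialization.deserializeCASComplete;
import static org.apache.uima.cas.impl.Serialization.serializeCASComplete;
import static org.apache.uima.util.CasCreationUtils.createCas;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.commons.io.FileUtils;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.CASCompleteSerializer;
import org.apache.uima.cas.impl.CASImpl;
import org.apache.uima.resource.metadata.TypeSystemDescription;

    public static void writeSerializedCas(CAS aCas, File aFile)
        throws IOException
    {
        FileUtils.forceMkdir(aFile.getParentFile());
        
        CASCompleteSerializer serializer = null;
        
        try {
            serializer = serializeCASComplete((CASImpl) aCas);

            // BEGIN SAFEGUARD --------------
            // Safeguard that we do NOT write a CAS which cannot be read back afterwards
            // and would thus render the document broken within the project.
            // Reason we do this: https://issues.apache.org/jira/browse/UIMA-6162
            CAS dummy = createCas((TypeSystemDescription) null, null, null);
            deserializeCASComplete(serializer, (CASImpl) dummy);
            // END SAFEGUARD --------------
        }
        catch (Exception e) {
            // preserveForDebugging is a project-specific debugging helper (not shown)
            preserveForDebugging(aFile, aCas, serializer);
            throw new IOException(e);
        }

        try (ObjectOutputStream os = new ObjectOutputStream(new FileOutputStream(aFile))) {
            os.writeObject(serializer);
        }
    }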
{code}

Meanwhile, I suspected that there might have been concurrent writes to a CAS,
but so far I have found no evidence of that. I am using a cache, but it is bound
to a particular web request, so it is not shared across threads. I am not aware
of CAS objects being shared across threads.

Right now I am thinking of wrapping the CAS objects in a dynamic proxy that
checks on every method call that the CAS is only used on the thread on which it
was created, just to make sure that concurrent access is ruled out.
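A minimal sketch of such a thread-confinement check, assuming a plain JDK dynamic proxy (`java.lang.reflect.Proxy`). The names `ThreadConfinedProxy`, `Counter` and `ThreadConfinedDemo` are hypothetical; `Counter` only stands in for `CAS`, which is also a Java interface. Note that this check rejects any use from a foreign thread, even a strictly sequential hand-off. One caveat for the real application: a JDK proxy implements only the interfaces it is given, so a proxied CAS could no longer be cast to `CASImpl` as the serialization code above does; such call sites would need the unwrapped object.

{code:java}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper (not part of UIMA): wraps an interface-typed object so that
// every call fails fast when it is made from a thread other than the creating thread.
final class ThreadConfinedProxy {
    private ThreadConfinedProxy() {
    }

    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        // Remember the thread on which the wrapper was created
        final Thread owner = Thread.currentThread();
        InvocationHandler handler = (proxy, method, args) -> {
            Thread current = Thread.currentThread();
            if (current != owner) {
                throw new IllegalStateException("Created on thread [" + owner.getName()
                        + "] but " + method.getName() + "() was called on thread ["
                        + current.getName() + "]");
            }
            try {
                // Same thread: delegate to the real object
                return method.invoke(target, args);
            }
            catch (InvocationTargetException e) {
                // Re-throw the original exception instead of the reflection wrapper
                throw e.getCause();
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, handler);
    }
}

// Hypothetical stand-in for CAS in this demo
interface Counter {
    int increment();
}

final class ThreadConfinedDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = ThreadConfinedProxy.wrap(Counter.class, new Counter() {
            private int value;

            @Override
            public int increment() {
                return ++value;
            }
        });

        System.out.println(counter.increment()); // same thread: prints 1

        // A call from another thread is rejected before it reaches the target
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread other = new Thread(() -> {
            try {
                counter.increment();
            }
            catch (Throwable t) {
                failure.set(t);
            }
        });
        other.start();
        other.join();
        System.out.println(failure.get() instanceof IllegalStateException); // prints true
    }
}
{code}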

> Sofa not found when deserializing CAS
> -------------------------------------
>
>                 Key: UIMA-6162
>                 URL: https://issues.apache.org/jira/browse/UIMA-6162
>             Project: UIMA
>          Issue Type: Bug
>          Components: UIMA
>    Affects Versions: 3.1.1SDK
>            Reporter: Richard Eckart de Castilho
>            Priority: Major
>         Attachments: admin.ser
>
>
> I suspect there could be an issue in `BinaryCasSerDes`.
> When deserializing the attached file `admin.ser`, I get this stack trace:
> {code:java}
> Caused by: java.lang.ClassCastException: class org.apache.uima.jcas.tcas.Annotation cannot be cast to class org.apache.uima.jcas.cas.Sofa (org.apache.uima.jcas.tcas.Annotation and org.apache.uima.jcas.cas.Sofa are in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @4593ff34)
>     at org.apache.uima.cas.impl.BinaryCasSerDes.makeSofaFromHeap(BinaryCasSerDes.java:1823) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.getSofaFromAnnotBase(BinaryCasSerDes.java:1817) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.createFSsFromHeaps(BinaryCasSerDes.java:1701) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:259) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.BinaryCasSerDes.reinit(BinaryCasSerDes.java:328) ~[uimaj-core-3.1.1.jar:3.1.1]
>     at org.apache.uima.cas.impl.Serialization.deserializeCASComplete(Serialization.java:129) ~[uimaj-core-3.1.1.jar:3.1.1]
> {code}
> The code used to read the file before deserializing is as follows:
> {code:java}
>     public static void readSerializedCas(CAS aCas, File aFile)
>         throws IOException
>     {
>         try (ObjectInputStream is = new ObjectInputStream(new FileInputStream(aFile))) {
>             CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
>             deserializeCASComplete(serializer, (CASImpl) aCas);
>         }
>         catch (ClassNotFoundException e) {
>             throw new IOException(e);
>         }
>     }
> {code}
> I set a breakpoint at BinaryCasSerDes:1608, which is a for loop iterating over
> the heap. Apparently, the first feature structure that is encountered is an
> annotation, which is NOT the SOFA. Then, in line 1700, the deserializer
> tries to resolve the SOFA for this annotation but fails because the SOFA has not
> yet been deserialized. Eventually makeSofaFromHeap is called and checks whether a
> SOFA needs to be created. It tries to look up the SOFA's ID (1) via
> csds.addr2fs.get(sofaAddr) (BinaryCasSerDes:1821) and generates a new SOFA.
> However, when the SECOND annotation is read and csds.addr2fs.get(sofaAddr)
> (BinaryCasSerDes:1821) is called again to resolve the SOFA at
> address 1, it gets the previously deserialized annotation instead of the SOFA
> that had been created.
> The implicitly created SOFA is added to the csds.addr2fs map at
> key 1; however, later, at BinaryCasSerDes:1723, key 1 is overwritten by
> the deserialized annotation:
> {code}
>         if (!isSofa) { // if it was a sofa, other code added or pended it
>           csds.addFS(fs, heapIndex); // this overwrites the SOFA that was created at key 1 because heapIndex is also 1
>         }
> {code}
> The heap looks something like this:
> {code}
> [0, 187, 1, 33, 46, 199, 200, 201, 44, 202, 187, 1, 33, 46, 203, 204, 205, 
> 45, 206, 187, 1, 33, 46, 207, 208, 209, 46, 210, 187, 1, 33, 46, 211, 212, 
> 213, 47, 214, 187, 1, 33, 46, 215, 216, 217, 48, 1, 187, 1,...
> {code}
> I guess that 187 is the type code of the first annotation, and we can see it
> repeat a couple of times. The 1 seems to be the SOFA ID, i.e. the first feature
> of the feature structures. However, instead of referring to the address of
> the SOFA, this 1 points at the first annotation, which is NOT a SOFA.
> Is this a bug in the serialization code, which assumes that the SOFA is always
> in the first position?
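The overwrite described above can be reproduced with a toy model of the addr2fs map. This is NOT the real BinaryCasSerDes code: `Addr2FsCollisionModel`, `FakeSofa`, `FakeAnnotation` and `sofaSeenBySecondAnnotation` are hypothetical, and the model only mimics the two map writes the issue points at (the on-demand SOFA registered under the sofa reference, then `csds.addFS(fs, heapIndex)`). With the heap from the issue, where the annotation at index 1 apparently carries sofa reference 1, the second lookup returns the annotation, which is consistent with the ClassCastException in the stack trace.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy model of the addr2fs overwrite described in UIMA-6162 (hypothetical, not UIMA code)
final class Addr2FsCollisionModel {
    private Addr2FsCollisionModel() {
    }

    static final class FakeSofa {
    }

    static final class FakeAnnotation {
    }

    /**
     * Replays the first annotation on the heap and returns what a second annotation
     * with the same sofa reference would get back from the map.
     *
     * @param annotHeapIndex heap index of the first annotation (1 in the issue)
     * @param sofaRef        value of its sofa feature (also 1 in the issue)
     */
    static Object sofaSeenBySecondAnnotation(int annotHeapIndex, int sofaRef) {
        Map<Integer, Object> addr2fs = new HashMap<>();

        // SOFA not deserialized yet: one is created on demand and registered under sofaRef
        addr2fs.computeIfAbsent(sofaRef, addr -> new FakeSofa());

        // The annotation is then registered under its own heap index
        addr2fs.put(annotHeapIndex, new FakeAnnotation());

        // The second annotation resolves its SOFA through the same key
        return addr2fs.get(sofaRef);
    }

    public static void main(String[] args) {
        // Heap as in the issue: sofa reference equals the annotation's own index
        System.out.println(sofaSeenBySecondAnnotation(1, 1).getClass().getSimpleName()); // prints FakeAnnotation
        // SOFA stored at a distinct address: no collision
        System.out.println(sofaSeenBySecondAnnotation(1, 2).getClass().getSimpleName()); // prints FakeSofa
    }
}
{code}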



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
