Re: CAS Serialization with Java Serialization

Richard Eckart de Castilho Mon, 19 May 2014 07:33:31 -0700

Hi Ofer,

I can tell you that in WebAnno (non-ASF) we use the 
CASCompleteSerializer to persist the CAS and we use the addresses of 
annotations across de/serialization cycles to refer to annotations.
We found this to work reliably (with stable addresses) whereas other
forms of serialization, e.g. the compressed binary formats, do not
maintain stable CAS addresses.


It sounds a bit as if the JCas structures may not have been 
properly set up yet. Maybe try calling something like cas.getJCas() or
even jcas.getCas().getJCas() before trying to resolve the
address against the CAS.
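
For reference, the overall pattern discussed in this thread (a transient CAS
field handled in custom writeObject/readObject, plus integer addresses stored
in place of annotation references) can be sketched with JDK classes only.
This is a rough illustration under stated assumptions, not UIMA code:
FakeCas and FakeCasSnapshot are hypothetical stand-ins for the CAS and the
CASCompleteSerializer, and an "address" here is just a list index. In a real
application the stand-ins would be replaced by the
Serialization.serializeCASComplete()/deserializeCASComplete() and
ll_getFSForRef() calls quoted below.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch. FakeCas and FakeCasSnapshot are hypothetical stand-ins
// for a UIMA CAS and CASCompleteSerializer; no UIMA classes are used.
class CasFieldSerializationSketch {

    /** Stand-in for the CAS: deliberately NOT Serializable. */
    static class FakeCas {
        final List<String> heap = new ArrayList<>();
        int add(String fs) { heap.add(fs); return heap.size() - 1; } // returns the "address"
        String resolve(int address) { return heap.get(address); }
    }

    /** Stand-in for the serializable helper that captures the full CAS state. */
    static class FakeCasSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayList<String> heap;
        FakeCasSnapshot(FakeCas cas) { heap = new ArrayList<>(cas.heap); }
        FakeCas restore() { FakeCas cas = new FakeCas(); cas.heap.addAll(heap); return cas; }
    }

    /** Holds only the address, never the annotation object itself. */
    static class Sentence implements Serializable {
        private static final long serialVersionUID = 1L;
        final int address;
        Sentence(int address) { this.address = address; }
        String annotation(FakeCas cas) { return cas.resolve(address); }
    }

    static class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<Sentence> sentences = new ArrayList<>();
        transient FakeCas cas = new FakeCas(); // skipped by default serialization

        Document(String name) { this.name = name; }

        void addSentence(String text) { sentences.add(new Sentence(cas.add(text))); }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();                  // writes name and sentences
            out.writeObject(new FakeCasSnapshot(cas)); // special treatment for the CAS field
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();                    // restores name and sentences
            cas = ((FakeCasSnapshot) in.readObject()).restore();
        }
    }

    /** Serializes a Document to memory and reads it back. */
    static Document roundTrip(Document doc) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(doc);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Document) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("doc1");
        doc.addSentence("First sentence.");
        doc.addSentence("Second sentence.");
        Document copy = roundTrip(doc);
        // Resolve the stored address against the restored stand-in CAS.
        System.out.println(copy.sentences.get(1).annotation(copy.cas));
    }
}
```

The sketch only works because the stand-in snapshot preserves list indices,
which mirrors the requirement that the real serialization format keep CAS
addresses stable.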

Cheers,

-- Richard

On 19.05.2014, at 16:14, Ofer Bronstein <[email protected]> wrote:

> Hi Richard and all,
> 
> Thank you for the suggestion. I tried it with ll_getFSForRef(), but
> I get a NullPointerException:
> In CASImpl.ll_getFSForRef(int fsRef), the last line of the method (line
> 3117) evaluates this.svd.localFsGenerators[getHeap().heap[fsRef]], which
> returns null. Because the full expression is
> this.svd.localFsGenerators[getHeap().heap[fsRef]].createFS(fsRef, this),
> calling createFS(fsRef, this) on null throws the NullPointerException.
> 
> The address I am using is definitely on a Sentence Annotation that exists
> in the CAS, in the _InitialView, and I got the address by calling
> getAddress() on it and saving the Integer.
> Can you think of any reason why this happens? Or, should I do something
> special to make the address valid, or have the FeatureStructure retrievable
> from it?
> 
> Thank you,
> Ofer
> 
> 
> On Mon, May 19, 2014 at 1:24 PM, Richard Eckart de Castilho
> <[email protected]> wrote:
> 
>> Hi Ofer,
>> 
>> I'm not an expert on Java Serialization, but here goes nothing ;)
>> 
>> 1) I suppose you could override the default Java Serialization process for
>> your Document class and handle the de/serialization of the CAS via
>> the CASCompleteSerializer - that would basically be the special treatment.
>> 
>> 2) I do not think that you can make JCas objects (like SentenceAnnotation)
>> "survive" the serialization process because they are not serializable.
>> If you manage to de/serialize the CAS using CASCompleteSerializer, then
>> you can make use of the CAS addresses in each annotation. Your Sentence
>> object can maintain a reference to the address of each SentenceAnnotation.
>> When you want to access the SentenceAnnotation through your Sentence,
>> you do so by resolving the address against the loaded JCas:
>> 
>> (Store this address in your Sentence)
>>  int address = sentenceAnnotation.getAddress();
>> 
>> (Use it later after deserialization to fetch the SentenceAnnotation from
>> the JCas)
>>  SentenceAnnotation sa =
>>      (SentenceAnnotation) aJCas.getLowLevelCas().ll_getFSForRef(address);
>> 
>> Btw. this is as fast as it gets - JCas wrappers use such code internally.
>> 
>> I'd say what you plan to do should work, but it borders on
>> black magic! But then again, I've done similar stuff ;)
>> 
>> Cheers,
>> 
>> -- Richard
>> 
>> In your Document object, make the CAS a
>> 
>> On 19.05.2014, at 12:04, Ofer Bronstein <[email protected]> wrote:
>> 
>>> Hi Richard and all,
>>> 
>>> Thank you for your answer. This is still only a partial solution, as:
>>> 
>>> 1. The JCas is referenced from inside a Document object, and by your
>>> suggestion, I must serialize both of them separately. For instance, write
>>> it alternating: <Document, JCas, Document, JCas, ...>, or implement
>>> Serializable.writeObject() and call
>>> ObjectOutputStream.defaultWriteObject() for the other fields. However, I am
>>> looking for a way to have the serializer of the document just go through
>>> its default writeObject() implementation, and only when it encounters the
>>> JCas field - then some special treatment would be triggered.
>>> 
>>> 2. More importantly - my Sentence object (referenced by a Document object)
>>> has a reference to a Sentence Annotation. This Annotation cannot be
>>> serialized by the method you suggest, as it only takes a full CAS. Of
>>> course I could implement here something that when deserializing, I would
>>> iterate through the CAS and find each sentence's annotation and manually
>>> put its reference in the Sentence object. But this is pretty complicated,
>>> and would be a very lengthy process during deserialization. So I am looking
>>> for a way for the SentenceAnnotation references to "survive" the
>>> serialization/deserialization.
>>> 
>>> Do you have any ideas?
>>> 
>>> Thank you,
>>> Ofer
>>> 
>>> 
>>> On Mon, May 19, 2014 at 12:19 PM, Richard Eckart de Castilho
>>> <[email protected]> wrote:
>>> 
>>>> Hello Ofer,
>>>> 
>>>> the CAS cannot be serialized directly, but there is a helper class
>>>> which is serializable.
>>>> 
>>>> To write:
>>>> 
>>>> ObjectOutputStream docOS = ...
>>>> CASCompleteSerializer serializer =
>>>>     Serialization.serializeCASComplete(aJCas.getCasImpl());
>>>> docOS.writeObject(serializer);
>>>> 
>>>> To read:
>>>> 
>>>> ObjectInputStream is = ...
>>>> CASCompleteSerializer serializer =
>>>>     (CASCompleteSerializer) is.readObject();
>>>> Serialization.deserializeCASComplete(serializer, (CASImpl) aCAS);
>>>> 
>>>> However, there are newer and more efficient binary formats that you might
>>>> want to use [1].
>>>> 
>>>> If you want to dig into the topic or if you want to use a ready-made pair
>>>> of readers/writers for the binary formats, you could consider taking a
>>>> look at the BinaryCasReader/Writer in the DKPro Core [2,3] (non-ASF).
>>>> 
>>>> Cheers,
>>>> 
>>>> -- Richard
>>>> 
>>>> [1] http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.type_filtering.compressed_file
>>>> [2] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasReader.java
>>>> [3] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasWriter.java
>>>> 
>>>> On 19.05.2014, at 11:03, Ofer Bronstein <[email protected]> wrote:
>>>> 
>>>>> Hi Guys,
>>>>> 
>>>>> I am an Israeli Master's Student, and have been happily working with UIMA
>>>>> for the past two years.
>>>>> I hope this is the right place for my question -
>>>>> 
>>>>> I have a Document object I created, which has a JCas member with
>>>>> annotations over a document.
>>>>> I also have a Sentence object, with a member referencing its Sentence
>>>>> Annotation in the corresponding JCas. Each Document object references all
>>>>> of its Sentence objects.
>>>>> I would like to dump each Document object as a file on disk, using the
>>>>> default Java serialization. Later they would also be deserialized back into
>>>>> the Java objects. I understand I would need some special treatment for the
>>>>> JCases and the Sentence Annotations as they are not serializable (now I get
>>>>> NotSerializableException). Hopefully the treatment could be as minimal as
>>>>> possible.
>>>>> 
>>>>> How would you suggest doing this, regarding serialization of JCas and
>>>>> combining it with Java serialization?
>>>>> 
>>>>> I am working on Windows, with Java 1.6 and UIMA 2.4.0. I am using the same
>>>>> type system and the same 3 views for all JCases and annotations.
>>>>> 
>>>>> Thank you,
>>>>> Ofer Bronstein
