Hi Ofer,

I'm not an expert on Java Serialization, but here goes nothing ;)

1) I suppose you could override the default Java Serialization process for your Document class and handle the de/serialization of the CAS via the CASCompleteSerializer - that would basically be the special treatment.

2) I do not think that you can make JCas objects (like SentenceAnnotation) "survive" the serialization process because they are not serializable. If you manage to de/serialize the CAS using CASCompleteSerializer, then you can make use of the CAS addresses in each annotation. Your Sentence object can maintain a reference to the address of each SentenceAnnotation. When you want to access the SentenceAnnotation through your Sentence, you do so by resolving the address against the loaded JCas:

   (Store this address in your Sentence)
   int address = sentenceAnnotation.getAddress()

   (Use it later after deserialization to fetch the SentenceAnnotation from the JCas)
   (SentenceAnnotation) aJCas.getLowLevelCas().ll_getFSForRef(address)

Btw. this is as fast as it gets - JCas wrappers use such code internally.

I'd say what you plan to do should work but it verges on the border of black magic! But then again, I've done similar stuff ;)

Cheers,

-- Richard

In your Document object, make the CAS a

On 19.05.2014, at 12:04, Ofer Bronstein <[email protected]> wrote:

> Hi Richard and all,
>
> Thank you for your answer. This is still only a partial solution, as:
>
> 1. The JCas is referenced from inside a Document object, and by your
> suggestion, I must serialize both of them separately. For instance, write
> it alternating: <Document, JCas, Document, JCas, ...>, or implement
> Serializable.writeObject() and call
> ObjectOutputStream.defaultWriteObject() for the other fields. However, I am
> looking for a way to have the serializer of the document just go through
> its default writeObject() implementation, and only when it encounters the
> JCas field - then some special treatment would be triggered.
>
> 2. More importantly - my Sentence object (referenced by a Document object)
> has a reference to a Sentence Annotation. This Annotation cannot be
> serialized by the method you suggest, as it only takes a full CAS. Of
> course I could implement here something that when deserializing, I would
> iterate through the CAS and find each sentence's annotation and manually
> put its reference in the Sentence object. But this is pretty complicated,
> and would be a very lengthy process during deserialization. So I am looking
> for a way for the SentenceAnnotation references to "survive" the
> serialization\deserialization.
>
> Do you have any ideas?
>
> Thank you,
> Ofer
>
>
> On Mon, May 19, 2014 at 12:19 PM, Richard Eckart de Castilho <[email protected]> wrote:
>
>> Hello Ofer,
>>
>> the CAS cannot be serialized immediately, but there is a helper class
>> which is serializable.
>>
>> To write:
>>
>> ObjectOutputStream docOS = ...
>> CASCompleteSerializer serializer =
>> Serialization.serializeCASComplete(aJCas.getCasImpl());
>> docOS.writeObject(serializer);
>>
>> To read:
>>
>> ObjectInputStream is = ...
>> CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
>> Serialization.deserializeCASComplete(serializer, (CASImpl) aCAS);
>>
>> However, there are newer and more efficient binary formats that you might
>> want to use [1].
>>
>> If you want to dig into the topic or if you want to use a ready-made pair of
>> readers/writers for the binary formats, you could consider taking a look at
>> the BinaryCasReader/Writer in the DKPro Core [2,3] (non-ASF).
>>
>> Cheers,
>>
>> -- Richard
>>
>> [1] http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.type_filtering.compressed_file
>> [2] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasReader.java
>> [3] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasWriter.java
>>
>> On 19.05.2014, at 11:03, Ofer Bronstein <[email protected]> wrote:
>>
>>> Hi Guys,
>>>
>>> I am an Israeli Master's Student, and have been happily working with UIMA
>>> for the past two years.
>>> I hope this is the right place for my question -
>>>
>>> I have a Document object I created, which has a JCas member with
>>> annotations over a document.
>>> I also have a Sentence object, with a member referencing its Sentence
>>> Annotation in the corresponding JCas. Each Document object references all
>>> of its Sentence objects.
>>> I would like to dump each Document object as a file on disk, using the
>>> default Java serialization. Later they would also be deserialized back into
>>> the Java objects. I understand I would need some special treatment for the
>>> JCases and the Sentence Annotations as they are not serializable (now I get
>>> NotSerializableException). Hopefully the treatment could be as minimal as
>>> possible.
>>>
>>> How do you suggest to do this, regarding serialization of JCas and
>>> combining it with Java serialization?
>>>
>>> I am working on Windows, with Java 1.6 and UIMA 2.4.0. I am using the same
>>> type system and the same 3 views for all JCases and annotations.
>>>
>>> Thank you,
>>> Ofer Bronstein
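
For reference, the two suggestions in the reply at the top of this thread (a custom writeObject()/readObject() that gives the CAS field special treatment, and a Sentence that keeps an int address instead of the annotation object) can be sketched roughly as follows. This is only a sketch in plain Java so that it runs without UIMA on the classpath: FakeCas and FakeCasSnapshot are hypothetical stand-ins invented for illustration, not UIMA classes. In real code the snapshot role would presumably be played by CASCompleteSerializer and the lookup by ll_getFSForRef(address), as described in the reply, and the approach assumes the restored CAS hands out the same addresses as the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class CasSerializationSketch {

    /** Hypothetical stand-in for a CAS: deliberately NOT Serializable. */
    static final class FakeCas {
        private final List<String> heap = new ArrayList<String>();

        /** Stores a feature structure and returns its "address" (here: the list index). */
        int add(String fs) { heap.add(fs); return heap.size() - 1; }

        /** Stand-in for resolving an address against the loaded CAS. */
        String resolve(int address) { return heap.get(address); }
    }

    /** Hypothetical stand-in for a serializable helper that captures the whole CAS. */
    static final class FakeCasSnapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayList<String> heap;

        FakeCasSnapshot(FakeCas cas) { heap = new ArrayList<String>(cas.heap); }

        FakeCas restore() {
            FakeCas cas = new FakeCas();
            cas.heap.addAll(heap);
            return cas;
        }
    }

    /** Keeps only the address of its annotation, never the annotation object itself. */
    static final class Sentence implements Serializable {
        private static final long serialVersionUID = 1L;
        final int address;

        Sentence(int address) { this.address = address; }

        String annotation(FakeCas cas) { return cas.resolve(address); }
    }

    static final class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        transient FakeCas cas; // transient: skipped by default serialization
        final List<Sentence> sentences = new ArrayList<Sentence>();

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();                  // ordinary fields, incl. sentences
            out.writeObject(new FakeCasSnapshot(cas)); // special treatment for the CAS
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            cas = ((FakeCasSnapshot) in.readObject()).restore();
        }
    }

    /** Serializes the document to a byte array and reads it back. */
    static Document roundTrip(Document doc) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(doc);
        out.close();
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return (Document) in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document();
        doc.cas = new FakeCas();
        doc.sentences.add(new Sentence(doc.cas.add("first sentence")));
        doc.sentences.add(new Sentence(doc.cas.add("second sentence")));

        Document copy = roundTrip(doc);
        // Resolve the stored address against the restored CAS.
        System.out.println(copy.sentences.get(1).annotation(copy.cas));
    }
}
```

With this layout the caller just writes the Document with a plain ObjectOutputStream.writeObject(); the CAS handling is hidden inside Document, and each Sentence is serialized by the default mechanism because it only holds an int.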
