Marshall Schor created UIMA-5164:
------------------------------------

             Summary: uv3 support for arbitrary Java objects in the CAS
                 Key: UIMA-5164
                 URL: https://issues.apache.org/jira/browse/UIMA-5164
             Project: UIMA
          Issue Type: New Feature
          Components: e
            Reporter: Marshall Schor
            Assignee: Marshall Schor
            Priority: Minor
             Fix For: 3.0.0SDKexp


Using JCas, it has been possible to include arbitrary Java objects in a JCas 
class instance.  The problem with doing this has been that there was no 
architected way for these objects to participate in the broader UIMA 
interoperability concepts such as serialization, remote annotators, etc.  
Furthermore, JCas objects were optional, and might not be used.

UIMA V3 implements Feature Structures in the CAS as JCas objects directly, so 
these are now always present and reliable.  This means that when an 
implementation adds arbitrary Java objects (e.g., a special HashSet containing 
Feature Structures) to a JCas class definition, they are reliably present.

Here's how we could make this all work in v3.

A user would first pick some Java class to emulate in the CAS.  A requirement 
would be that the data in the emulated class would need to support having a 
serialized form representing a "snapshot" of the data at a particular moment, 
that could be put into the CAS using a fixed number of UIMA features of normal 
UIMA data types, including Feature Structures.  For example, an 
ArrayList<FeatureStructure> could be put into the CAS as an FSArray instance 
of the current size; a Map<Integer, FeatureStructure> could be put into the CAS 
as an IntegerArray and an FSArray, etc.  The snapshot would be produced 
whenever needed, for example, during serialization.  A corresponding 
transformation (used, for instance, during deserialization) would convert the 
snapshot data back into the emulated Java class instance.
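
The snapshot round trip described above might look roughly like the following 
sketch for the Map<Integer, FeatureStructure> case.  This is an illustration 
only and does not use the UIMA API: String stands in for a Feature Structure, 
plain Java arrays stand in for the IntegerArray and FSArray features, and the 
names MapSnapshot, save and restore are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not UIMA code: String stands in for a
// Feature Structure, and plain arrays stand in for CAS array features.
final class MapSnapshot {
    final int[] keys;       // would be an IntegerArray feature
    final String[] values;  // would be an FSArray feature

    MapSnapshot(int[] keys, String[] values) {
        this.keys = keys;
        this.values = values;
    }

    // Capture the map's contents at this moment as two parallel arrays.
    static MapSnapshot save(Map<Integer, String> map) {
        int[] keys = new int[map.size()];
        String[] values = new String[map.size()];
        int i = 0;
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            keys[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new MapSnapshot(keys, values);
    }

    // Rebuild an equivalent map from the parallel arrays.
    Map<Integer, String> restore() {
        Map<Integer, String> map = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
        return map;
    }
}
```

Because the snapshot copies the entries, later changes to the live map do not 
affect a snapshot that was already taken; a fresh one would be produced each 
time it is needed.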

This new kind of hybrid object would be implemented with a custom JCas cover 
class which wrapped the emulated Java class instance.  It would also have as 
features those needed for the "snapshot" representation.

The user would need to:
* define a UIMA type; this type would include the feature definitions needed 
for the snapshot
* create the corresponding JCas cover class for that type
* add 3 extra methods in the cover class, all methods defined by a new UIMA 
interface "UimaSerializable"
** _init_from_cas_data()
** _save_to_cas_data() 
** clone

The _init_from_cas_data method would use the CAS data in this Feature Structure 
to initialize the emulated Java class instance.  

The framework would call this method whenever it makes a new instance 
with non-empty Feature Structure data, so that the emulated Java class instance 
may be initialized.  Typical callers would be routines like the CAS copier and 
deserialization.

Similarly, the _save_to_cas_data method would be called by the framework as 
part of serialization, and would extract data from the emulated Java class 
instance and save it as CAS features.
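
As a rough sketch of how the two data-transfer methods could fit together in a 
cover class, consider the following.  It is an assumption-laden illustration, 
not the framework's code: UimaSerializableSketch is a stand-in for the proposed 
interface (the clone method is left out), FsListCoverSketch is a hypothetical 
cover class, a String[] field stands in for an FSArray feature, and String 
stands in for a Feature Structure.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the proposed interface; only the two data-transfer
// methods are sketched (the proposed clone method is omitted).
interface UimaSerializableSketch {
    void _init_from_cas_data();
    void _save_to_cas_data();
}

// Hypothetical cover class wrapping an emulated ArrayList.
class FsListCoverSketch implements UimaSerializableSketch {
    // Stand-in for the FSArray feature holding the snapshot.
    private String[] snapshotFeature = new String[0];
    // The emulated Java object that user code works with.
    private final List<String> list = new ArrayList<>();

    // Stand-ins for the feature setter/getter a (de)serializer would use.
    void setSnapshotFeature(String[] data) {
        snapshotFeature = data.clone();
    }

    String[] getSnapshotFeature() {
        return snapshotFeature.clone();
    }

    List<String> asList() {
        return list;
    }

    // Rebuild the emulated Java object from the snapshot feature.
    @Override
    public void _init_from_cas_data() {
        list.clear();
        for (String s : snapshotFeature) {
            list.add(s);
        }
    }

    // Copy the emulated Java object's current contents into the feature.
    @Override
    public void _save_to_cas_data() {
        snapshotFeature = list.toArray(new String[0]);
    }
}
```

In this sketch a serializer would call _save_to_cas_data() before reading the 
feature, and a deserializer or copier would set the feature on the new instance 
and then call _init_from_cas_data().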

This Jira adds support for this approach; other Jiras will add some likely 
popular new types (for example, FSArrayList, which is like ArrayList<TOP>).  Users can 
(easily?) add types of their own, for instance, if they need a peculiar kind 
of Set of Feature Structures, perhaps built on top of ConcurrentSkipListSet 
using a special definition of set-member-equals.
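
To make the last example a little more concrete, here is a minimal sketch of a 
ConcurrentSkipListSet whose notion of member equality comes from a Comparator.  
SpanSketch and SpanSets are hypothetical names; SpanSketch is only a stand-in 
for an annotation-like Feature Structure, and the choice of "same begin and 
end" as the equality rule is an assumption made for illustration.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for an annotation-like Feature Structure.
final class SpanSketch {
    final int begin;
    final int end;
    final String label;

    SpanSketch(int begin, int end, String label) {
        this.begin = begin;
        this.end = end;
        this.label = label;
    }
}

final class SpanSets {
    // A skip-list set treats two elements as the same member when the
    // comparator returns 0; here that means equal begin and end offsets,
    // regardless of label.
    static ConcurrentSkipListSet<SpanSketch> newSpanSet() {
        return new ConcurrentSkipListSet<>(
            Comparator.comparingInt((SpanSketch s) -> s.begin)
                      .thenComparingInt(s -> s.end));
    }
}
```

A cover class for such a set could snapshot its members into an FSArray-like 
feature in the same way as the list and map cases.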



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
