On 12/1/2017 4:16 PM, Richard Eckart de Castilho wrote:
> On 01.12.2017, at 16:12, Marshall Schor <[email protected]> wrote:
>> I'm in the middle of things with changes for 3.0.0sdk; could you check out the
>> 3.0.0-beta tag and change,
>> in the uimaj-core project,
>>   Class: org.apache.uima.cas.impl.FSClassRegistry
>>   to comment out the throw clause: lines 456-460
>>
>> and then rebuild uimaj-core, try it, and let me know if it works?
>>
>> If so, let's put in a Jira to switch the "throw" to a "report" (not completely
>> straightforward, but not too hard - I can do the change...)
> Commenting out these lines, I can load the CAS and I can also see that WebAnno
> can access and render the annotations. 
>
> However, it doesn't seem to be possible to retrieve the annotations by their addresses:
> org.apache.uima.cas.impl.LowLevelException: Error in low-level CAS APIs: accessing FS with id 15, but no such FS exists in this CAS.
>       at org.apache.uima.cas.impl.CASImpl.getFsFromId_checked(CASImpl.java:2444) ~[classes/:?]
>       at org.apache.uima.cas.impl.CASImpl.ll_getFSForRef(CASImpl.java:2641) ~[classes/:?]
>
> WebAnno uses the CasCompleteSerializer since FS addresses in v2 remained stable
> with this particular serialization format.
>
> I checked the CASImpl.svd.id2fs (JCasHashMap): all four of its
> JCasHashMapSubMaps report a size of 0.
> It seems like during the deserialization, the id2fs map is not updated with
> the addresses obtained from the serialized file.

The JCasHashMap is normally not used and not maintained - hence the size of 0
that you see.  It is there mainly to support Pear trampolines.

>
> Digging further, I found that BinaryCasSerDes.createFSsFromHeaps actually sets
> up a map of the v2 IDs obtained from the serialized CAS to the v3 FSes, but it
> does not actually set the IDs of the v3 FSes to the values obtained from the
> CAS. Unfortunately, this is an essential assumption made in the WebAnno code.
>
> It looks to me that v3 is rather flexible with respect to assigning IDs to
> FSes (unlike v2 where this was bound to the heap organization). It would be
> great if this flexibility could be used in order to assign the IDs in the way
> that they are read from the serialized CAS (cf. CommonSerDesSequential.addr2fs).
This could be done, but it would not be enough.  On top of this, you would need
to have a map from these numbers to the feature structures.

The map you saw throwing the exception is not normally populated, because
populating it prevents "garbage collection" of unreferenced Feature Structures.
The map is used when low-level APIs are used to create Feature Structures.  This
is required because of a race condition: a GC can happen after the Feature
Structure is created (returning an "int") and before that Feature Structure
instance can be "held onto" by something that prevents GC.
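
To make the trade-off concrete, here is a minimal sketch of the idea. It is
only an illustration under my own assumptions, not the actual CASImpl code:
ToyFs, ToyIdTable, createFs, llCreateFs and llGetFsForRef are hypothetical
names, not UIMA classes or methods.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a v3 Feature Structure; not a UIMA class. */
final class ToyFs {
    final int id;
    ToyFs(int id) { this.id = id; }
}

/** Hypothetical id-to-FS table that is filled only on low-level creation. */
final class ToyIdTable {
    // Strong references: anything stored here can never be garbage collected.
    private final Map<Integer, ToyFs> id2fs = new HashMap<>();
    private int nextId = 1;

    /** High-level creation: the caller holds the object, so the table stays empty. */
    ToyFs createFs() {
        return new ToyFs(nextId++);
    }

    /** Low-level creation: the caller only gets an int, so the table must hold the object. */
    int llCreateFs() {
        ToyFs fs = new ToyFs(nextId++);
        id2fs.put(fs.id, fs); // without this, fs could be collected right after return
        return fs.id;
    }

    /** Lookup by id; fails for objects that were never registered in the table. */
    ToyFs llGetFsForRef(int id) {
        ToyFs fs = id2fs.get(id);
        if (fs == null) {
            throw new IllegalStateException(
                "accessing FS with id " + id + ", but no such FS exists");
        }
        return fs;
    }
}
```

In this sketch, an FS made with createFs() has an id but cannot be found by
that id, which mirrors the situation after deserialization described above.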

I'm wondering if some other approach could be taken that would treat these kinds
of use cases specially, without giving up the v3 benefits like GC in the
general case.

A more general question: The v3 approach to serialization / deserialization is
to serialize just those FSs which are indexed, or reachable from other
serializable things. Does this work for WebAnno?  A consequence would be that
a CAS produced by v2, which had a bunch of FSs which were not indexed and not
referenced by anything, would lose those FSs to GC when deserialized, whereas
in v2 they would be in the "cas" and gettable via their "address" (integer).
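
A rough sketch of that reachability rule, again purely illustrative: ToyNode
and ToyReachability are hypothetical names, and the real v3 serializers are
organized differently. Starting from the indexed FSs, it follows references
and keeps everything it reaches; anything else is simply not part of the
result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical FS with outgoing references; not a UIMA class. */
final class ToyNode {
    final String name;
    final List<ToyNode> refs = new ArrayList<>();
    ToyNode(String name) { this.name = name; }
}

final class ToyReachability {
    /** Returns the indexed nodes plus everything transitively referenced from them. */
    static Set<ToyNode> serializable(List<ToyNode> indexed) {
        Set<ToyNode> seen = new LinkedHashSet<>();
        Deque<ToyNode> todo = new ArrayDeque<>(indexed);
        while (!todo.isEmpty()) {
            ToyNode n = todo.pop();
            if (seen.add(n)) {       // first visit: follow its references
                todo.addAll(n.refs);
            }
        }
        return seen;
    }
}
```

An FS that is neither indexed nor referenced never enters the returned set, so
it would not survive a serialize / deserialize round trip under this rule.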

Is this (maybe made-up) use case something that goes on in WebAnno?

If so, we may need some creative thinking - how to support something like
this, while keeping the v3 benefits.

Thanks for all your testing.
I'll put in a Jira to change the throw to a message, for the JCas super class test.

-Marshall

> Cheers,
>
> -- Richard
