[ 
https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295141#comment-16295141
 ] 

Marshall Schor commented on UIMA-5662:
--------------------------------------

Thanks, Richard, for the nice use case.  Re: 
* all FSs always have an address: in v3, all FSs have an "id", which is an int, 
like an address
* FS ids are written out for some serializations (XCAS, XMI, JSON), and are 
"imputed" for others (Binary, CasComplete, Compressed). By imputed, I mean the 
ids are not written, but the FSs are output in a specific order, and that order 
can be used to determine the ids.
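
To make the written vs. imputed distinction concrete, here is a toy sketch in 
plain Java. It is not UIMA code: the class and method names are made up for 
illustration, and it assumes ids are consecutive ints, which is a 
simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model only; none of these names are UIMA APIs.
public class ImputedIdSketch {

    // Stand-in for a feature structure: an int id plus some payload.
    public static final class Fs {
        final int id;
        final String payload;
        public Fs(int id, String payload) { this.id = id; this.payload = payload; }
    }

    // "Written" style: each serialized record carries its id explicitly.
    public static TreeMap<Integer, String> serializeWithIds(List<Fs> fss) {
        TreeMap<Integer, String> out = new TreeMap<>();
        for (Fs fs : fss) out.put(fs.id, fs.payload);
        return out;
    }

    // "Imputed" style: only the payloads are written, sorted by id.
    public static List<String> serializeWithoutIds(List<Fs> fss) {
        List<Fs> sorted = new ArrayList<>(fss);
        sorted.sort(Comparator.comparingInt((Fs fs) -> fs.id));
        List<String> out = new ArrayList<>();
        for (Fs fs : sorted) out.add(fs.payload);
        return out;
    }

    // Reader side: derive each id from its position in the output.
    // Assumption: ids are consecutive, starting at firstId.
    public static TreeMap<Integer, String> imputeIds(List<String> payloads, int firstId) {
        TreeMap<Integer, String> out = new TreeMap<>();
        for (int i = 0; i < payloads.size(); i++) out.put(firstId + i, payloads.get(i));
        return out;
    }
}
```

In this model, when every FS is written, both styles give the reader the same 
id-to-FS mapping.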

As a side effect, if a CAS is deserialized (and has all of its FSs 
"reachable"), then as long as the reachability of those FSs doesn't change, the 
ids (written or imputed) will be the same when written out (and subsequently 
deserialized) because
* for written-out ids, the ids haven't changed, and 
* for imputed ids, the order is kept by sorting all the FSs by id order.    

In v3 (currently), the map in the CAS that enables the low-level 
ll_getFSForRef(int) to work is not consulted when serializers determine which 
FSs are "reachable", so in that sense the serializers all perform a kind of GC 
when serializing.  However, serializers that write the fsId into the serialized 
form write the actual id, so the ids stay "stable" even if some of the FSs are 
no longer reachable.

This might mean, though, that one of your use cases won't work unless you 
change the form you use to save the stable ids to XMI.  (The v3 CAS complete 
form will not include FSs that are no longer reachable, and it doesn't 
explicitly encode the ids; it uses the imputed approach.)
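
The difference shows up once an FS stops being reachable. The toy sketch below 
(plain Java, not UIMA code; names are hypothetical and ids are assumed to be 
consecutive ints) drops unreachable FSs in both styles and compares the ids 
the reader ends up with.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only; these names are hypothetical and are not UIMA APIs.
public class ReachabilitySketch {

    // Explicit-id form: unreachable FSs are skipped, survivors keep their ids.
    public static TreeMap<Integer, String> writeWithIds(
            Map<Integer, String> fsById, Set<Integer> reachable) {
        TreeMap<Integer, String> out = new TreeMap<>();
        for (Map.Entry<Integer, String> e : new TreeMap<>(fsById).entrySet()) {
            if (reachable.contains(e.getKey())) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    // Imputed form: unreachable FSs are skipped and ids are re-derived from
    // position, so every survivor after a gap ends up with a different id.
    public static TreeMap<Integer, String> writeThenImpute(
            Map<Integer, String> fsById, Set<Integer> reachable, int firstId) {
        TreeMap<Integer, String> out = new TreeMap<>();
        int next = firstId;
        for (Map.Entry<Integer, String> e : new TreeMap<>(fsById).entrySet()) {
            if (reachable.contains(e.getKey())) out.put(next++, e.getValue());
        }
        return out;
    }
}
```

With FSs 1, 2, 3 and FS 2 no longer reachable, the explicit-id form yields ids 
1 and 3, while the imputed form yields ids 1 and 2, so the FS that used to be 
3 is now 2.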

> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2 ids for FSs being preserved across 
> deserialization and serialization, and 2) low-level CAS API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak 
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_preserve_ids, and also 
> controllable by new config option per deserialize call, alter the 
> deserialization for those deserializers which know about v2 ids, to put these 
> into the map used for low-level CAS access, using the actual v2 ids, and 
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the 
> use case of some annotators using low-level APIs, when part of a pipeline is 
> "remoted". 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
