[ 
https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280247#comment-16280247
 ] 

Marshall Schor commented on UIMA-5662:
--------------------------------------

Thinking more (out loud) about this topic, mainly from the perspective of 
backwards compatibility: there seem to be two distinct issues: deserializing FSs 
so that they keep the fsIds recorded in the serialized form, and populating the 
map that enables ll_getFSForRef(int).

Normally, the map is populated only when an FS is created through an ll_ 
interface. A -D switch can force it to be populated for all FS creations. 
Normally, it isn't populated for
* regular fs creation
* creation via cas copier
* creation via deserializations
* creation via side effects (e.g., calling cas.getDocumentAnnotation() when 
one doesn't exist yet).

A general observation: if an application uses ll_getFSForRef(int) to access 
some FSs, the map must be populated for those FSs for this to work.
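A minimal sketch of the population rule described above, assuming a toy CAS-like store rather than UIMA's real classes. The class IdMapSketch, its methods, and the registerAll flag (standing in for the -D switch) are all hypothetical. Only the low-level creation path registers the FS by default, so an id lookup fails for FSs created any other way unless the flag is on. The sketch returns null for unregistered ids; the real ll_getFSForRef(int) may report this differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not UIMA's implementation: a CAS-like store that records
// id -> FS mappings only for FSs created via the low-level (ll_) path, unless a
// flag standing in for the global -D switch forces registration for all creations.
class IdMapSketch {
    static final class FS {
        final int id;
        FS(int id) { this.id = id; }
    }

    private final boolean registerAll; // stands in for the global -D switch
    private final Map<Integer, FS> idToFs = new HashMap<>();
    private int nextId = 1;

    IdMapSketch(boolean registerAll) { this.registerAll = registerAll; }

    // Regular creation; also stands in for CasCopier, deserialization and
    // side-effect creation, none of which register by default.
    FS createRegular() {
        FS fs = new FS(nextId++);
        if (registerAll) idToFs.put(fs.id, fs);
        return fs;
    }

    // Low-level creation: always registered, so the id can be looked up later.
    int llCreate() {
        FS fs = new FS(nextId++);
        idToFs.put(fs.id, fs);
        return fs.id;
    }

    // Analogue of ll_getFSForRef(int); this sketch returns null for unregistered ids.
    FS getFSForRef(int id) { return idToFs.get(id); }
}
```

With registerAll set to false, an FS from createRegular() cannot be found by id, while one from llCreate() can.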

When deserializing: some serialization forms store an FsId explicitly, while 
others "impute" an FsId from the layout. The latter forms don't keep FsIds 
constant over a sequence of operations such as:
# deserialize -> CAS
# update by removing some of the deserialized FSs
# reserialize out 
In v3, reserialization only serializes "reachable" FSs, so the "layout" will 
be different.
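To make the layout point concrete, here is a hypothetical sketch that assumes the simplest possible imputation: an FS's id is its 1-based position in the serialized sequence. UIMA's real forms impute ids differently (v2 ids were heap-based), but the effect is the same. After one FS is removed and the remaining reachable FSs are reserialized, every later FS gets a different imputed id. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an "imputed" id is derived from an FS's position in the
// serialized layout (here simply its 1-based index); it is not stored in the data.
class ImputedIdSketch {
    // Imputes ids for a serialized sequence of FSs (Strings stand in for FSs).
    static Map<String, Integer> imputeIds(List<String> serializedOrder) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (int i = 0; i < serializedOrder.size(); i++) {
            ids.put(serializedOrder.get(i), i + 1);
        }
        return ids;
    }

    // Simulates: deserialize -> remove one FS -> reserialize only the reachable FSs.
    static List<String> removeAndReserialize(List<String> fss, String removed) {
        List<String> reachable = new ArrayList<>(fss);
        reachable.remove(removed); // removes the object, not an index
        return reachable;
    }
}
```

For the sequence a, b, c, FS "c" is imputed id 3; after "b" is removed and the rest reserialized, "c" becomes id 2.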

When deserializing, we can create FSs with the same explicit FsIds. It makes 
sense to do this only for the forms where the FsId is stored explicitly, so 
the ids can remain "constant". For other forms, as far as I can see, there's 
no point in doing this, because the ids are not constant anyway.

It would be possible to design deserialization so that it always keeps the 
same fsIds, for those forms having an explicit fsId, as long as the special 
"merge" form of deserialization isn't being used. (The merge form is used when 
sending a CAS to multiple remote services in parallel and "merging" back the 
results from all of them when they return.) This would provide backwards 
compatibility for applications using deserialization + getFSForRef() calls, 
without needing a global -D flag. I don't think it would have any negative 
impacts.
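A minimal sketch of the id-preserving deserialization idea, assuming deserialization into an empty CAS and using a Map from explicit id to payload as a stand-in for a serialized form that stores ids. PreserveIdsSketch and its members are hypothetical. Outside merge mode, each FS keeps its serialized id and is registered for id lookup, and the next id for new FSs ends up one beyond the largest deserialized id. In merge mode the sketch simply hands out fresh ids.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: deserializing FSs whose ids are stored explicitly,
// assuming the target CAS starts empty. Strings stand in for FSs.
class PreserveIdsSketch {
    final Map<Integer, String> idToFs = new HashMap<>();
    int nextId = 1;

    // serialized maps explicit id -> FS payload.
    void deserialize(Map<Integer, String> serialized, boolean mergeMode) {
        for (Map.Entry<Integer, String> e : serialized.entrySet()) {
            // Merge mode: ids are not preserved, so assign a fresh one.
            int id = mergeMode ? nextId++ : e.getKey();
            idToFs.put(id, e.getValue());
            // Keep the next id one beyond the largest preserved id.
            if (!mergeMode) nextId = Math.max(nextId, id + 1);
        }
    }

    // New FSs created after deserialization get ids that cannot clash.
    int createNew(String payload) {
        int id = nextId++;
        idToFs.put(id, payload);
        return id;
    }
}
```

Deserializing ids 5 and 12 outside merge mode keeps both ids and makes the next new FS id 13, regardless of the map's iteration order.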

For CAS copying, when copying just a single FS or a single view, the target 
CAS may already have FSs with the same fsIds. Rather than handle the special 
cases where this might be made to work, for now we can just say that CAS 
copying won't preserve fsIds.
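The collision problem can be shown with a small hypothetical sketch: two toy CASes each assign ids starting at 1, so the first FS in the source and the pre-existing FS in the target share id 1. The names CopyIdSketch, wouldCollide and copyFrom are made up. copyFrom follows the policy stated above and always assigns a fresh target id instead of trying to reuse the source id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of why copying into an existing CAS can't simply reuse
// source ids: the target may already hold a different FS under the same id.
class CopyIdSketch {
    final Map<Integer, String> idToFs = new HashMap<>();
    int nextId = 1;

    int add(String payload) {
        int id = nextId++;
        idToFs.put(id, payload);
        return id;
    }

    // True if reusing this source id in this (target) CAS would clash.
    boolean wouldCollide(int sourceId) { return idToFs.containsKey(sourceId); }

    // Policy from the comment: copies never preserve ids; sourceId is ignored.
    int copyFrom(int sourceId, String payload) { return add(payload); }
}
```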

> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2-ids for FSs being preserved across 
> deserialization and serialization, and 2) low-level CAS API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak 
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids and also 
> controllable by a new config option per deserialize call, alter the 
> deserialization, for those deserializers that know about v2 ids, to put the 
> FSs into the map used for low-level CAS access using their actual v2 ids, and 
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the 
> use case of some annotators using low-level APIs, when part of a pipeline is 
> "remoted". 
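On the point that id-to-FS tables block GC unless weak refs are used: here is a minimal sketch of such a table whose values are held through java.lang.ref.WeakReference, so an FS reachable only from the table can still be collected. WeakIdMapSketch is a hypothetical name, not a UIMA class. A production version would likely also use a ReferenceQueue to purge stale entries; this sketch only drops them lazily on lookup.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an id -> FS table whose values are weakly referenced,
// so the table alone does not keep an otherwise-unreachable FS alive.
class WeakIdMapSketch<T> {
    private final Map<Integer, WeakReference<T>> map = new HashMap<>();

    void put(int id, T fs) { map.put(id, new WeakReference<>(fs)); }

    // Returns the FS, or null if it was never registered or has been collected.
    T get(int id) {
        WeakReference<T> ref = map.get(id);
        T fs = (ref == null) ? null : ref.get();
        if (ref != null && fs == null) map.remove(id); // drop the stale entry
        return fs;
    }
}
```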

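One way the two switches in the description could combine is that a per-call option, when given, overrides the JVM-wide property. The sketch below assumes exactly that. The property name is copied from the issue text, which spells it two different ways, so the real name may differ. Treating a bare -Dname with no value as "on" is also an assumption of this sketch. PreserveIdsMode is hypothetical.

```java
// Hypothetical sketch: resolve the effective "preserve ids" mode from an
// optional per-call option and a global JVM system property.
class PreserveIdsMode {
    // Copied from the issue text (also written there as "deserialize-preserve_ids").
    static final String PROPERTY = "uima.deserialize_perserve_ids";

    // perCallOption == null means "not specified for this call".
    static boolean effective(Boolean perCallOption) {
        if (perCallOption != null) return perCallOption;  // per-call option wins
        String v = System.getProperty(PROPERTY);          // global -D setting
        return v != null && !v.equalsIgnoreCase("false"); // bare -Dname counts as on
    }
}
```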


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
