[ https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280247#comment-16280247 ]
Marshall Schor commented on UIMA-5662:
--------------------------------------

Thinking more (out loud) about this topic, mainly from the perspective of backwards compatibility: there seem to be 2 distinct issues:
* deserializing FSs with the fsIds in the serialized form
* populating the map that enables ll_getFSForRef( int )

The map is already populated, normally, only when a FS is created using a ll_ interface. It can be forced to populate for all FS creations using the -D switch. Normally, it isn't populated for
* regular fs creation
* creation via cas copier
* creation via deserializations
* creation via side effects (e.g., if you do a cas.getDocumentAnnotation() and one doesn't exist).

A general observation: if an application uses ll_getFSForRef(int) for some FS access, this map must be populated for those FSs in order for the call to work.

When deserializing, some serialization forms store an FsId explicitly, and some "impute" an FsId from the layout. The latter forms don't preserve constant FsIds over a sequence of operations such as:
# deserialize -> CAS
# update by removing some of the deserialized FSs
# reserialize out

The reserialize (in v3) only serializes "reachable" FSs, so the "layout" will be different.

When deserializing, we can create FSs with the same explicit FsIds. It makes sense to do this for just those forms where the FsId is stored explicitly, so they can remain "constant". For other forms, there's no point in doing this as far as I can see, because the ids are not constant.

It would be possible to design things so that deserialization always keeps the same fsIds, for those forms having an explicit fsId, and as long as the special "merge" form of deserialization isn't being used. (The merge form is used when sending a cas to multiple remote services in parallel, and "merging" back the results from all of those when they return.)

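To make the idea concrete, here is a minimal toy model of it in plain Java. It is not the UIMA implementation or API: the class IdPreservingCasSketch and all of its methods are hypothetical names invented for illustration. It assumes a serialized form that carries explicit ids, registers each deserialized FS in the id-to-FS map under that id, and moves the next available id to 1 beyond the largest id seen, as the issue description quoted in this message suggests.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; every name here is hypothetical, not part of UIMA. */
final class IdPreservingCasSketch {

    /** Stand-in for a feature structure; it carries nothing but its id. */
    static final class Fs {
        final int id;
        Fs(int id) { this.id = id; }
    }

    /** Map backing the low-level lookup (the role ll_getFSForRef plays). */
    private final Map<Integer, Fs> idToFs = new HashMap<>();
    private int nextId = 1;

    /** Regular creation: takes the next id but does not populate the map. */
    Fs createFs() {
        return new Fs(nextId++);
    }

    /** Low-level style creation: takes the next id and populates the map. */
    Fs llCreateFs() {
        Fs fs = new Fs(nextId++);
        idToFs.put(fs.id, fs);
        return fs;
    }

    /**
     * Deserialization from a form that stores ids explicitly: each FS keeps
     * its serialized id, is registered in the map, and the next available id
     * is moved to 1 beyond the largest id seen.
     */
    void deserializeWithExplicitIds(int[] explicitIds) {
        for (int id : explicitIds) {
            idToFs.put(id, new Fs(id));
            nextId = Math.max(nextId, id + 1);
        }
    }

    /** Returns the registered FS for this id, or null if none is registered. */
    Fs getFsForRef(int id) {
        return idToFs.get(id);
    }

    public static void main(String[] args) {
        IdPreservingCasSketch cas = new IdPreservingCasSketch();
        cas.deserializeWithExplicitIds(new int[] {3, 7, 5});
        System.out.println("lookup 7 found: " + (cas.getFsForRef(7) != null));
        System.out.println("next regular id: " + cas.createFs().id);
    }
}
```

In this sketch an FS made with createFs() after deserialization gets a fresh id and stays out of the map, which mirrors the "normally not populated" cases listed above.
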
This would allow backwards compatibility with applications using deserialization + getFSForRef() calls, without a global -D flag. I don't think it would have any negative impacts.

For casCopying, if copying just a single fs, or a single view, the target cas may already have FSs with the same fsId. Rather than handle special cases where this might be made to work, for now we can just say that cas copying won't preserve fsIds.

> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2-ids for FSs preserved in deserialization and
> serialization, and 2) low level cas API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids, and also
> controllable by new config option per deserialize call, alter the
> deserialization for those deserializers which know about v2 ids, to put these
> into the map used for low-level CAS access, using the actual v2 ids, and
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the
> use case of some annotators using low-level APIs, when part of a pipeline is
> "remoted".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)