[ 
https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284040#comment-16284040
 ] 

Marshall Schor commented on UIMA-5662:
--------------------------------------

Some brief responses; will be thinking more about this in general.  
First, thanks for your discussion, always useful!

Re: "do not really plan on adding a new built-in type...",  the thought was to 
add a new "semi-built-in" type.  These are just like built-in types, but you 
have to explicitly import them (for backwards compatibility when working with 
type systems from v2 with binary serializations).

Although the "type" would be built-in, instances of it would not be - it would 
be up to the user to create one or more instances.  Perhaps this is what you 
were trying to say.

Re: "approach taken in the XMI deserializer-serializer where ID info is kept":  
yes, that is similar.

Re: "I'd have to manually figure out the next ID" - yes, in v3, that's a simple 
API call on the FS:  fs._id().  

Re: circumstances:
* lookups FS -> ID are fast - yes they are
* lookups ID -> FS are fast - that would depend on what kind of map was used, 
but in general it should be like a hash map.
* maps stored in the CAS - I think not, in the proposal, if you mean in the 
sense that the client code could use some special CAS APIs to access via ints 
(such as the low-level CAS apis).  The idea I am exploring is generalizing 
this, which would involve the client knowing about the map.
* The client code can set up an ID assignment - that could be "optional" - in 
that the client could choose to use the fs._id() int instead.
* "one such strategy ..." - I think that was part of this proposal, if you 
restrict the "reader components" to deserializers. 
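
To make the lookup points above concrete, here is a minimal sketch of a 
client-held, two-way map.  It is not UIMA code: the class FsIdMap and its 
methods are hypothetical names, plain Java objects stand in for FSs, and the 
map types are only one possible choice.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical client-side map, NOT part of the UIMA API.
// F stands in for a feature structure type.
final class FsIdMap<F> {
    // FS -> ID, keyed by object identity rather than equals()
    private final Map<F, Integer> idByFs = new IdentityHashMap<>();
    // ID -> FS, an ordinary hash map
    private final Map<Integer, F> fsById = new HashMap<>();

    /** Binds fs to id; rejects an id already bound to a different object. */
    void put(int id, F fs) {
        F existing = fsById.get(id);
        if (existing != null && existing != fs) {
            throw new IllegalArgumentException("id " + id + " already in use");
        }
        Integer oldId = idByFs.put(fs, id);
        if (oldId != null && oldId != id) {
            fsById.remove(oldId);   // drop the stale reverse entry on rebind
        }
        fsById.put(id, fs);
    }

    /** Returns the id bound to fs, or null if fs is not registered. */
    Integer idOf(F fs) {
        return idByFs.get(fs);
    }

    /** Returns the object bound to id, or null if the id is unknown. */
    F fsOf(int id) {
        return fsById.get(id);
    }
}
```

A client that does not want its own ID assignment could presumably register 
each FS under fs._id(), e.g. map.put(fs._id(), fs).  Note that strong 
references like these keep every registered FS reachable.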

Re: some questions:
* removing the need for XmiSerializationSharedData - that is used for multiple 
purposes, not just id mapping. 
* "out-of-typesystem info" - there's no generalization proposed for that - it 
is supported for some (not all) kinds of (de)serializations, more as an 
internal implementation detail.

Re: risks:
* if the maps are stored like FSes - the proposal would be to store these using 
the v3 support for arbitrary Java objects in the CAS.  This support already 
accommodates serialization / deserialization to v2 systems, by arranging the 
transportable form to be common UIMA objects. (see 
https://uima.apache.org/d/uimaj-3.0.0-beta/version_3_users_guide.html#uv3.custom_java_objects
 )


> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend 1) constant v2-ids for FSs preserved in deserialization and 
> serialization, and 2) low level cas API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak 
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids, and also 
> controllable by new config option per deserialize call, alter the 
> deserialization for those deserializers which know about v2 ids, to put these 
> into the map used for low-level CAS access, using the actual v2 ids, and 
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the 
> use case of some annotators using low-level APIs, when part of a pipeline is 
> "remoted". 

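For illustration only, the id handling described in the issue text above might 
be sketched as follows.  This is not the CAS implementation: PreservedIdTable 
and its methods are hypothetical names, and "1 beyond the end" is simplified 
here to "largest id seen plus 1".  Weak references are used so that the table 
alone does not prevent GC of otherwise unreachable FSs.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical id table, NOT the actual CAS implementation.
// F stands in for a feature structure type.
final class PreservedIdTable<F> {
    private final Map<Integer, WeakReference<F>> byId = new HashMap<>();
    private int nextId = 1;   // assumed starting id for this sketch

    /** Records an FS under the id it carried in the serialized form. */
    void addDeserialized(int v2Id, F fs) {
        byId.put(v2Id, new WeakReference<>(fs));
        if (v2Id >= nextId) {
            nextId = v2Id + 1;   // keep new ids beyond every preserved id
        }
    }

    /** Registers a newly created FS and returns the id assigned to it. */
    int addNew(F fs) {
        int id = nextId++;
        byId.put(id, new WeakReference<>(fs));
        return id;
    }

    /** Returns the FS for id, or null if unknown or already collected. */
    F get(int id) {
        WeakReference<F> ref = byId.get(id);
        return ref == null ? null : ref.get();
    }
}
```

A real table would also need to prune entries whose referents have been 
collected; that is left out to keep the sketch short.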


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
