[
https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294292#comment-16294292
]
Richard Eckart de Castilho commented on UIMA-5662:
--------------------------------------------------
Because of the way I am currently used to dealing with the state of affairs in
UIMA v2, I may be biased towards a specific mode, namely:
* that all FSes always have an address
* that for specific serialization formats these addresses are stable and there
is no garbage collection during save/load
* that for other serialization formats, the addresses are not stable, but there
is garbage collection during save/load
So currently, instead of using an API to control when GC should happen, I use
one or the other serialization method. Mind that this happens in a web
application (multiple users, concurrent access). Using the current approach, I
can choose between stable addresses and GC on a per-CAS-instance basis even when
working with multiple CASes simultaneously in a single thread (e.g. when doing
a diff across CASes). For example, when a user opens a document, the
server-side processing of the web request first loads the CAS from disk
(CasCompleteSerializer, stable IDs), then stores it again into a byte array
(Binary format 6, GC), then loads it again from the byte array into a new CAS
with a potentially updated type system (Binary format 6, lenient loading), then
saves it again to disk (CasCompleteSerializer, stable IDs). While the user
continues to work on the document, load/save always happens using
CasCompleteSerializer and avoiding GC.
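As a rough illustration of the two behaviours described above (stable IDs
without GC versus GC with renumbering), here is a toy model in plain Java. It
is not UIMA code: ToyFs, ToyLoader, loadPreservingIds and loadCompacting are
hypothetical names, and "reachable" is assumed to mean "indexed, or referenced
from something reachable".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an FS: an ID, an "indexed" flag, and references to other IDs. */
final class ToyFs {
    final int id;
    final boolean indexed;
    final List<Integer> refs;

    ToyFs(int id, boolean indexed, List<Integer> refs) {
        this.id = id;
        this.indexed = indexed;
        this.refs = refs;
    }
}

/** Hypothetical loader illustrating the two modes; not a UIMA API. */
final class ToyLoader {
    /** Stable mode: every stored FS is kept under its original ID. */
    static Map<Integer, ToyFs> loadPreservingIds(List<ToyFs> stored) {
        Map<Integer, ToyFs> byId = new LinkedHashMap<>();
        for (ToyFs fs : stored) {
            byId.put(fs.id, fs);
        }
        return byId;
    }

    /** GC mode: keep only FSes reachable from indexed ones and renumber them from 1. */
    static Map<Integer, ToyFs> loadCompacting(List<ToyFs> stored) {
        Map<Integer, ToyFs> all = loadPreservingIds(stored);
        // Walk outward from the indexed FSes, assigning new IDs in visit order.
        Map<Integer, Integer> newIdOf = new LinkedHashMap<>();
        Deque<Integer> todo = new ArrayDeque<>();
        for (ToyFs fs : stored) {
            if (fs.indexed) {
                todo.add(fs.id);
            }
        }
        while (!todo.isEmpty()) {
            int oldId = todo.poll();
            if (newIdOf.containsKey(oldId) || !all.containsKey(oldId)) {
                continue; // already visited, or a dangling reference
            }
            newIdOf.put(oldId, newIdOf.size() + 1);
            todo.addAll(all.get(oldId).refs);
        }
        // Rebuild the survivors under their new IDs, rewriting references.
        Map<Integer, ToyFs> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : newIdOf.entrySet()) {
            ToyFs old = all.get(e.getKey());
            List<Integer> refs = new ArrayList<>();
            for (int r : old.refs) {
                if (newIdOf.containsKey(r)) {
                    refs.add(newIdOf.get(r));
                }
            }
            result.put(e.getValue(), new ToyFs(e.getValue(), old.indexed, refs));
        }
        return result;
    }
}
```

In this model, anything that held on to an old ID across a compacting load
would point at the wrong FS afterwards, which is why the editing phase sticks
to the stable mode.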
If the UIMA API provides more control over the FS<->ID mapping and if more file
formats support stable IDs, it would probably no longer be necessary to use
different formats to achieve this effect. Instead, I would probably try to do
the following:
* store the data only in a single format (preferably form 6 compressed with
lenient loading assuming that it eventually supports stable IDs)
* when a user opens a document, load it without FS<->ID mapping to allow for
garbage collection; save it again
* when a user continues to work on a document, load/save it with FS<->ID
mapping enabled
If the FS<->ID mapping could make use of weak references, I would probably make
use of that: once an FS is no longer reachable, the editor has no use for it
anymore.
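A minimal sketch of what such a weak-reference mapping could look like, in
plain Java. WeakIdTable, register and lookup are hypothetical names chosen for
this illustration and do not correspond to anything in the UIMA API; the sketch
only assumes that IDs are handed out sequentially starting at 1.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ID table that does not keep its entries alive. */
final class WeakIdTable<T> {
    private final Map<Integer, WeakReference<T>> byId = new HashMap<>();
    private int nextId = 1;

    /** Assigns the next free ID to the object and records it weakly. */
    synchronized int register(T fs) {
        int id = nextId++;
        byId.put(id, new WeakReference<>(fs));
        return id;
    }

    /** Returns the object for the ID, or null if unknown or already collected. */
    synchronized T lookup(int id) {
        WeakReference<T> ref = byId.get(id);
        if (ref == null) {
            return null;
        }
        T fs = ref.get();
        if (fs == null) {
            byId.remove(id); // the object was collected; drop the stale entry
        }
        return fs;
    }
}
```

As long as the editor holds a strong reference to an FS, lookup by ID keeps
working; once nothing else references it, the entry no longer prevents garbage
collection.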
> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
> Key: UIMA-5662
> URL: https://issues.apache.org/jira/browse/UIMA-5662
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 3.0.0SDK-beta
> Reporter: Marshall Schor
> Assignee: Marshall Schor
> Priority: Minor
> Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2-ids for FSs preserved in deserialization and
> serialization, and 2) low level cas API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids, and also
> controllable by new config option per deserialize call, alter the
> deserialization for those deserializers which know about v2 ids, to put these
> into the map used for low-level CAS access, using the actual v2 ids, and
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the
> use case of some annotators using low-level APIs, when part of a pipeline is
> "remoted".