[ https://issues.apache.org/jira/browse/UIMA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294292#comment-16294292 ]
Richard Eckart de Castilho commented on UIMA-5662:
--------------------------------------------------

Because of the way I am currently used to dealing with the state of affairs in UIMA v2, I may be biased towards a specific mode, namely:

* that all FSes always have an address
* that for specific serialization formats these addresses are stable and there is no garbage collection during save/load
* that for other serialization formats, the addresses are not stable, but there is garbage collection during save/load

So currently, instead of using an API to control when GC should happen, I use one or the other serialization method. Mind that this happens in a web application (multiple users, concurrent access). Using the current approach, I can choose between stable addresses and GC on a per-CAS-instance basis even when working with multiple CASes simultaneously in a single thread (e.g. when doing a diff across CASes).

For example, when a user opens a document, the server-side processing of the web request first loads the CAS from disk (CasCompleteSerializer, stable IDs), then stores it again into a byte array (Binary format 6, GC), then loads it again from the byte array into a new CAS with a potentially updated type system (Binary format 6, lenient loading), then saves it again to disk (CasCompleteSerializer, stable IDs). While the user continues to work on the document, load/save always happens using CasCompleteSerializer, which avoids GC.

If the UIMA API provides more control over the FS<->ID mapping and if more file formats support stable IDs, it would probably no longer be necessary to use different formats to achieve this effect.
Instead, I would probably try to do the following:

* store the data only in a single format (preferably form 6 compressed with lenient loading, assuming that it eventually supports stable IDs)
* when a user opens a document, load it without FS<->ID mapping to allow for garbage collection; save it again
* when a user continues to work on a document, load/save it with FS<->ID mapping enabled

If the FS<->ID mapping could make use of weak references, I would probably make use of that: once an FS is no longer reachable, the editor has no use for it anymore.

> uv3 support CAS deserialization subsequent low level access
> -----------------------------------------------------------
>
>                 Key: UIMA-5662
>                 URL: https://issues.apache.org/jira/browse/UIMA-5662
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDK
>
>
> Some users depend on 1) constant v2-ids for FSs preserved in deserialization and
> serialization, and 2) low level cas API access to these.
> V3 normally doesn't maintain tables linking ids to FSs, as these (unless weak
> refs are used) prevent GC of unreachable FSs.
> Based on a mode, set by -Duima.deserialize_perserve_ids, and also
> controllable by new config option per deserialize call, alter the
> deserialization for those deserializers which know about v2 ids, to put these
> into the map used for low-level CAS access, using the actual v2 ids, and
> change the v3 next available id for future new FSs to be 1 beyond the end.
> The -Duima.deserialize-preserve_ids global setting is needed to handle the
> use case of some annotators using low-level APIs, when part of a pipeline is
> "remoted".

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)