Hi,
IDs are just too convenient when you build an editor, or in other use
cases where you want to modify an annotation but cannot keep the CAS in
memory. I assume that changing the IDs when storing the CAS could be OK.
We should at least try to support them and see how bad the performance
drop is.
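
To make the cost a bit more concrete, here is a minimal sketch of an
id-to-object lookup with weak references in plain Java. It uses no UIMA
classes; `IdRegistry`, `register` and `lookup` are made-up names for
illustration and not part of any UIMA API.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of the UIMA API.
final class IdRegistry<T> {
    // One map entry plus one WeakReference per registered object.
    private final Map<Integer, WeakReference<T>> byId = new HashMap<>();
    private int nextId = 1;

    /** Assigns the next free id and remembers the object weakly. */
    int register(T obj) {
        int id = nextId++;
        byId.put(id, new WeakReference<>(obj));
        return id;
    }

    /** Returns the object for the id, or null if unknown or already collected. */
    T lookup(int id) {
        WeakReference<T> ref = byId.get(id);
        if (ref == null) {
            return null;
        }
        T obj = ref.get();
        if (obj == null) {
            byId.remove(id); // drop the stale entry lazily
        }
        return obj;
    }

    /** Number of entries currently held, including not yet cleaned stale ones. */
    int size() {
        return byId.size();
    }
}
```

The map does not keep the objects alive, so unreferenced ones can still be
collected. The price is the extra entry and reference object per
registration, which is the overhead we would have to measure.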
Btw, I already have another use case where I use them: applying Ruta
rules directly to annotation objects in Java code. Here, the address/id
is injected into the rule string and then resolved again later within the
Ruta implementation.
Ruta.matches(jcas, Ruta.inject("${PARTOF(Person)} NUM;", annotation))
This returns true if the given annotation is covered by a Person
annotation and is followed by a NUM annotation.
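
Roughly, such an inject/resolve round trip could look like the sketch
below. This only illustrates the principle and is not how Ruta implements
it: the `$<id>{` marker syntax, the class `IdInjection` and both method
names are invented for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the Ruta implementation.
final class IdInjection {
    // Assumed marker syntax: "${" in the rule becomes "$<id>{".
    private static final Pattern MARKER = Pattern.compile("\\$(\\d+)\\{");

    /** Writes the id into every "${" placeholder of the rule string. */
    static String inject(String rule, int id) {
        return rule.replace("${", "$" + id + "{");
    }

    /** Reads the first injected id back and looks up the object, or returns null. */
    static Object resolveFirst(String rule, Map<Integer, Object> byId) {
        Matcher m = MARKER.matcher(rule);
        return m.find() ? byId.get(Integer.parseInt(m.group(1))) : null;
    }
}
```

The second step is the part that needs some way to get from an id back to
the annotation object.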
I would like to extend this functionality in Ruta in the future, and I do
not see how I can keep it without ids.
Best,
Peter
On 08.09.2016 at 15:27, Marshall Schor wrote:
> It seems that some (but not all) users really like and make use of
>
> * int "id"s that are stable and don't change due to loading/saving
> * to get "direct access" to FSs using these "id"s
> * want UIMA framework support for this
>
> I state this based on a history over time of multiple discussions on various
> lists, about this topic.
>
> Up to now, these users have been using internal data in V2 (the "address" in
> the low level representation), which is stable for some load/save operations
> but not others.
>
> Supporting this costs two things:
> * space - in each FS, for the int "id" and
> * space/time to hold and update a map from "id" to the FS for direct access.
> This map would likely have "weak references" (an additional Java Object
> overhead per FS) to permit GC to work. (The use of weak refs could be an
> option, as well).
>
> We could support such a thing in V3 based on some pipeline setting (e.g. using
> additionalParameters options); this would free the internal ids etc. to be
> used purely internally.
>
> Is this a reasonable description of this "use case"? Does it seem reasonable
> for V3 to support such a thing?
>
> -Marshall
>
>
> On 9/2/2016 1:56 PM, Richard Eckart de Castilho wrote:
>> See comment at end of mail.
>>
>> On 02.09.2016, at 15:18, Marshall Schor wrote:
>>> To go from an ID to an FS is not generally possible, because normally, the
>>> framework doesn't keep this association. There are exceptions though, the
>>> main ones being:
>>>
>>> a) If you use low level CAS APIs to create FSs, the API returns the ID,
>>> which means that a GC that happens right after the API returns would
>>> garbage collect the FS because at that point, nothing is "holding on" to
>>> any reference (it's not in any index). To prevent this, the low level
>>> create FS methods add the FS to a map which goes from ID -> FS, and thus
>>> "holds onto" the FS, preventing garbage collection.
>>>
>>> b) Another case where this happens is when PEARs are used; in this case the
>>> FSs involved with PEAR "trampoline" FSs end up being in similar maps.
>>>
>>> Both of these approaches of course disable a feature of V3 - namely, that
>>> unreferenced FSs can be garbage collected.
>>>
>>> ...
>>>
>>> There is an API in the V3 CASImpl, getFsFromId(int) and also
>>> getFsFromId_checked(int), which retrieves the associated FS, given the ID,
>>> or returns null (or throws an exception) if it isn't in the table. Most FSs
>>> created normally, won't be in the table.
>> Can we do this? -> As soon as an FS has been added to an index or is being
>> referenced from another FS, its ID should be resolvable to the respective FS.
>>
>> When an FS is in an index or being referred to by another FS, it cannot be
>> garbage collected anyway. The CAS could maintain a lookup using weak
>> references to provide a central place to look up such FSes via their IDs
>> without preventing garbage collection.
>>
>> WebAnno remembers the ID of every FS rendered on screen. When the user
>> performs an action, we load the CAS from disk and then look up the ID to
>> retrieve the FS. We do not keep the CAS in memory all the time. If we had to
>> scan the whole CAS for the FS with a given ID, it would probably have a
>> serious performance impact.
>>
>> Cheers,
>>
>> -- Richard