Hi,

IDs are just too convenient when you build an editor, or in other use
cases where you want to modify an annotation but cannot keep the CAS in
memory. I assume that changing the IDs when storing the CAS could be OK.


We should at least try to support them and see how bad the performance
drop is.


Btw, I already have another use case where I use them: applying Ruta
rules directly to annotation objects in Java code. Here, the address/ID
is injected into the rule string and later resolved again within the
Ruta implementation.

Ruta.matches(jcas, Ruta.inject("${PARTOF(Person)} NUM;", annotation))

This returns true if the given annotation is covered by a Person
annotation and is followed by a NUM annotation.
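To illustrate why a resolvable ID matters for this, here is a minimal, hypothetical sketch of such an inject/resolve round trip in plain Java. It is not Ruta's implementation: the `@<id>` token format, the `TABLE` map and the `injectId`/`resolveId` helpers are invented for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Ruta's implementation: it only shows how an int id
// lets an object travel through a plain rule string and be found again later.
public class InjectSketch {
    // Hypothetical id -> object table, standing in for a lookup in the CAS.
    static final Map<Integer, Object> TABLE = new HashMap<>();
    static int nextId = 1;

    // Registers the object and replaces every '$' marker with "@<id>".
    static String injectId(String rule, Object annotation) {
        int id = nextId++;
        TABLE.put(id, annotation);
        return rule.replace("$", "@" + id);
    }

    // Finds the first "@<id>" token in the rule and resolves it back to the object.
    static Object resolveId(String rule) {
        Matcher m = Pattern.compile("@(\\d+)").matcher(rule);
        return m.find() ? TABLE.get(Integer.parseInt(m.group(1))) : null;
    }

    public static void main(String[] args) {
        String annotation = "Annotation[10..14]";
        String rule = injectId("${PARTOF(Person)} NUM;", annotation);
        System.out.println(rule);                            // @1{PARTOF(Person)} NUM;
        System.out.println(resolveId(rule) == annotation);   // true
    }
}
```

The round trip only works if the ID can still be resolved in the CAS the rule is applied to, which is exactly why stable, resolvable IDs matter for this use case.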

I would like to extend this functionality in Ruta in the future, and I do
not see how I could keep it without IDs.


Best,


Peter



On 08.09.2016 at 15:27, Marshall Schor wrote:
> It seems that some (but not all) users
>
> * really like and make use of int "id"s that are stable and don't change
>   due to loading/saving,
> * use these "id"s to get "direct access" to FSs, and
> * want UIMA framework support for this.
>
> I say this based on multiple discussions of this topic on various lists
> over time.
>
> Up to now, these users have been using internal data in V2 (the "address" in
> the low-level representation), which is stable for some load/save operations
> but not others.
>
> Supporting this costs two things:
> * space in each FS for the int "id", and
> * space/time to hold and update a map from "id" to the FS for direct access.
>   This map would likely use "weak references" (an additional Java Object
>   overhead per FS) to permit GC to work. (The use of weak refs could be an
>   option as well.)
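The weak-reference map described here can be sketched in plain Java. This is an illustrative sketch, not UIMA code: `WeakIdTable` and its methods are hypothetical names. Each entry holds one extra `WeakReference` object (the per-FS overhead mentioned above), and a `ReferenceQueue` is used to drop entries whose FS has been collected.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an id -> FS table that does not keep FSs alive.
public class WeakIdTable<T> {
    // A WeakReference that remembers its id, so a stale entry can be removed.
    private static final class IdRef<V> extends WeakReference<V> {
        final int id;
        IdRef(int id, V referent, ReferenceQueue<? super V> queue) {
            super(referent, queue);
            this.id = id;
        }
    }

    private final Map<Integer, IdRef<T>> map = new HashMap<>();
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();

    public void put(int id, T fs) {
        purge();
        map.put(id, new IdRef<>(id, fs, queue));
    }

    // Returns the FS, or null if it was never registered or has been collected.
    public T get(int id) {
        purge();
        IdRef<T> ref = map.get(id);
        return ref == null ? null : ref.get();
    }

    public int size() {
        purge();
        return map.size();
    }

    // Removes entries whose referents the GC has cleared and enqueued.
    // remove(key, value) avoids dropping a newer entry registered under the same id.
    private void purge() {
        Reference<? extends T> r;
        while ((r = queue.poll()) != null) {
            IdRef<?> ref = (IdRef<?>) r;
            map.remove(ref.id, ref);
        }
    }

    public static void main(String[] args) {
        WeakIdTable<Object> table = new WeakIdTable<>();
        Object fs = new Object();
        table.put(42, fs);
        System.out.println(table.get(42) == fs); // true: fs is still strongly reachable
        System.out.println(table.get(7));        // null: never registered
    }
}
```

Note the "space/time to update" cost: every lookup and insert also pays for draining the reference queue.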
>
> We could support such a thing in V3 based on some pipeline setting (e.g. using
> additionalParameters options); this would free the internal ids etc. to be
> used purely internally.
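One way to read the opt-in idea: the id table is only allocated when the pipeline asks for it, so users who don't need stable ids pay nothing. A minimal sketch, assuming a parameter map like additionalParameters; the key `hypothetical.stableIds` and the `OptInIds` class are invented for illustration and are not existing UIMA settings. A plain `HashMap` is used for brevity; weak references could be layered in as discussed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the option key is invented and is NOT a UIMA parameter.
public class OptInIds {
    static final String STABLE_IDS_KEY = "hypothetical.stableIds";

    // Only allocated when the pipeline opts in; null otherwise (zero cost).
    private final Map<Integer, Object> idTable;

    OptInIds(Map<String, Object> additionalParams) {
        boolean enabled = Boolean.TRUE.equals(additionalParams.get(STABLE_IDS_KEY));
        idTable = enabled ? new HashMap<>() : null;
    }

    boolean idsEnabled() {
        return idTable != null;
    }

    // Registration is a no-op when the feature is off.
    void register(int id, Object fs) {
        if (idTable != null) {
            idTable.put(id, fs);
        }
    }

    Object lookup(int id) {
        return idTable == null ? null : idTable.get(id);
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put(STABLE_IDS_KEY, Boolean.TRUE);
        System.out.println(new OptInIds(params).idsEnabled());          // true
        System.out.println(new OptInIds(new HashMap<>()).idsEnabled()); // false
    }
}
```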
>
> Is this a reasonable description of this "use case"? Does it seem reasonable
> for V3 to support such a thing?
>
> -Marshall
>
>
> On 9/2/2016 1:56 PM, Richard Eckart de Castilho wrote:
>> See comment at end of mail.
>>
>> On 02.09.2016, at 15:18, Marshall Schor <[email protected]> wrote:
>>> To go from an ID to an FS is not generally possible, because normally the
>>> framework doesn't keep this association. There are exceptions, though; the
>>> main ones are:
>>>
>>> a) If you use the low-level CAS APIs to create FSs, the API returns only the
>>> ID. This means that a GC happening right after the API returns would
>>> garbage-collect the FS, because at that point nothing is "holding on" to a
>>> reference to it (it's not in any index). To prevent this, the low-level
>>> create-FS methods add the FS to a map from ID -> FS, which "holds onto" the
>>> FS and thus prevents garbage collection.
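The pinning effect can be illustrated with a small stand-alone sketch. It is hypothetical: `Fs`, `createAndPin` and `resolve` are invented names, not the UIMA low-level API. It only shows why an int-returning create call needs a strong map, and why everything in that map can never be garbage collected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not UIMA code: Fs is a stand-in for a feature structure.
public class PinningSketch {
    static final class Fs {
        final int id;
        Fs(int id) { this.id = id; }
    }

    // Strong references: anything in this map stays reachable, so the GC can
    // never reclaim it, even if no index or other FS refers to it.
    private final Map<Integer, Fs> pinned = new HashMap<>();
    private int nextId = 1;

    // Low-level style create: the caller only receives an int. Without the
    // put() below, the new Fs would be unreachable as soon as we return.
    int createAndPin() {
        Fs fs = new Fs(nextId++);
        pinned.put(fs.id, fs);
        return fs.id;
    }

    Fs resolve(int id) {
        return pinned.get(id);
    }

    int pinnedCount() {
        return pinned.size();
    }

    public static void main(String[] args) {
        PinningSketch cas = new PinningSketch();
        int id = cas.createAndPin();
        System.out.println(cas.resolve(id).id); // 1
        System.out.println(cas.pinnedCount());  // 1
    }
}
```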
>>>
>>> b) Another case where this happens is when PEARs are used; in this case, the
>>> FSs involved with PEAR "trampoline" FSs end up in similar maps.
>>>
>>> Both of these approaches of course disable a feature of V3, namely that
>>> unreferenced FSs can be garbage collected.
>>>
>>> ...
>>>
>>> There is an API in the V3 CASImpl, getFsFromId(int) and also
>>> getFsFromId_checked(int), which retrieves the associated FS given the ID,
>>> or returns null (or throws an exception) if it isn't in the table. Most FSs
>>> created normally won't be in the table.
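The difference between the two lookup styles can be sketched as follows. This only imitates the described behavior (null vs. an exception) on a plain map; it is not CASImpl's code. The class and method names are hypothetical, and the exception type `IllegalArgumentException` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch imitating the two lookup styles; not CASImpl's real code.
public class LookupSketch {
    private final Map<Integer, Object> table = new HashMap<>();

    void register(int id, Object fs) {
        table.put(id, fs);
    }

    // Like getFsFromId: returns null when the id is not in the table.
    Object lookup(int id) {
        return table.get(id);
    }

    // Like getFsFromId_checked: fails loudly instead of returning null.
    // The exception type is an assumption made for this sketch.
    Object lookupChecked(int id) {
        Object fs = table.get(id);
        if (fs == null) {
            throw new IllegalArgumentException("No FS registered for id " + id);
        }
        return fs;
    }

    public static void main(String[] args) {
        LookupSketch t = new LookupSketch();
        t.register(5, "someFs");
        System.out.println(t.lookup(5)); // someFs
        System.out.println(t.lookup(6)); // null
    }
}
```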
>> Can we do this? -> As soon as an FS has been added to an index or is
>> referenced from another FS, its ID should be resolvable to the respective FS.
>>
>> When an FS is in an index or referenced by another FS, it cannot be garbage
>> collected anyway. The CAS could maintain a lookup using weak references to
>> provide a central place to look up such FSes via their IDs without
>> preventing garbage collection.
>>
>> WebAnno remembers the ID of every FS rendered on screen. When the user
>> performs an action, we load the CAS from disk and then look up the ID to
>> retrieve the FS. We do not keep the CAS in memory all the time. If we had to
>> scan the whole CAS for the FS with a given ID, it would probably have a
>> serious performance impact.
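A sketch of the proposal combined with this lookup pattern, under stated assumptions: `Fs`, `addToIndexes`, `findById` and `findByScan` are stand-ins, not UIMA or WebAnno APIs, and the case of an FS referenced only from another FS is not modeled. Adding an FS to the (simulated) index also registers its ID weakly; since the index itself holds a strong reference, the weak entry stays valid for as long as the FS is indexed. The map lookup is O(1) on average, while the scan is O(n) in the number of indexed FSs.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: indexed FSs become resolvable by id through weak references.
public class IndexedIdLookup {
    static final class Fs {
        final int id;
        final String text;
        Fs(int id, String text) { this.id = id; this.text = text; }
    }

    private final List<Fs> index = new ArrayList<>();  // stand-in for the CAS indexes
    private final Map<Integer, WeakReference<Fs>> byId = new HashMap<>();

    // Adding to an index also registers the id; the table itself holds the FS weakly.
    void addToIndexes(Fs fs) {
        index.add(fs);
        byId.put(fs.id, new WeakReference<>(fs));
    }

    // Direct lookup: O(1) on average.
    Fs findById(int id) {
        WeakReference<Fs> ref = byId.get(id);
        return ref == null ? null : ref.get();
    }

    // Fallback without a table: scan every indexed FS, O(n).
    Fs findByScan(int id) {
        for (Fs fs : index) {
            if (fs.id == id) {
                return fs;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        IndexedIdLookup cas = new IndexedIdLookup();
        for (int i = 1; i <= 1000; i++) {
            cas.addToIndexes(new Fs(i, "token" + i));
        }
        int rememberedId = 737; // e.g. an id remembered from a rendered page
        System.out.println(cas.findById(rememberedId).text);   // token737
        System.out.println(cas.findByScan(rememberedId).text); // token737
    }
}
```

Both lookups only help if the remembered ID still denotes the same FS after the CAS has been saved and reloaded, which is the stability question this thread is about.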
>>
>> Cheers,
>>
>> -- Richard
