Hi Jörn,

Thanks for your input. Could you possibly expand with a few specifics on what changes you think would make it easier to use with Hadoop etc.?
-Marshall

On 9/7/2016 7:46 AM, Joern Kottmann wrote:
> Hello all,
>
> at my work place we use UIMA mostly with custom code to load data into a
> pipeline and store its results, therefore we don't depend at all on the
> UIMA serialization formats. And changing them, or adding new ones which
> are incompatible, wouldn't be an issue at all. Also the existing code can
> be ported to work with UIMA 3.
>
> I really hope we can get UIMA 3 into a shape where it is easier to use
> with today's requirements (e.g. with Hadoop) and possibilities.
>
> I personally think that the effort to create the next overhauled version
> shouldn't be limited in any way by backward compatibility.
> For me it is a good solution if there is some help with migrating things
> to UIMA 3 (e.g. a guide which explains what to do) and maybe maintaining
> UIMA 2 for a while in parallel (e.g. fixes of very urgent/critical bugs).
>
> Jörn
>
> On Fri, Sep 2, 2016 at 7:56 PM, Richard Eckart de Castilho <r...@apache.org>
> wrote:
>
>> See comment at end of mail.
>>
>> On 02.09.2016, at 15:18, Marshall Schor <m...@schor.com> wrote:
>>> To go from an ID to an FS is not generally possible, because normally
>>> the framework doesn't keep this association. There are exceptions
>>> though, the main ones being:
>>>
>>> a) If you use low level CAS APIs to create FSs, the API returns the ID,
>>> which means that a GC that happens right after the API returns would
>>> garbage collect the FS because at that point, nothing is "holding on"
>>> to any reference (it's not in any index). To prevent this, the low
>>> level create FS methods add the FS to a map which goes from ID -> FS,
>>> and thus "holds onto" the FS, preventing garbage collection.
>>>
>>> b) Another case where this happens is when PEARs are used; in this case
>>> the FSs involved with PEAR "trampoline" FSs end up being in similar
>>> maps.
>>>
>>> Both of these approaches of course disable a feature of V3 - namely, that
>>> unreferenced FSs can be garbage collected.
>>>
>>> ...
>>>
>>> There is an API in the V3 CASImpl, getFsFromId(int) and also
>>> getFsFromId_checked(int), which retrieves the associated FS, given the
>>> ID, or returns null (or throws an exception) if it isn't in the table.
>>> Most FSs created normally won't be in the table.
>>
>> Can we do this? -> As soon as an FS has been added to an index or is being
>> referenced from another FS, its ID should be resolvable to the respective
>> FS.
>>
>> When an FS is in an index or being referred to by another FS, it cannot be
>> garbage collected anyway. The CAS could maintain a lookup using weak
>> references to provide a central place to look up such FSes via their IDs
>> without preventing garbage collection.
>>
>> WebAnno remembers the ID of every FS rendered on screen. When the user
>> makes an action, we load the CAS from disk and then look up the ID to
>> retrieve the FS. We do not keep the CAS in memory all the time. If we had
>> to scan the whole CAS for the FS with a given ID, it would probably have
>> a serious performance impact.
>>
>> Cheers,
>>
>> -- Richard
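
For reference, the weak-reference lookup suggested in the quoted mail could look roughly like the sketch below. This is plain Java and not UIMA code: `WeakIdLookup`, `register` and `get` are hypothetical names made up for illustration, and a plain `Object` stands in for an FS. The assumption is that the framework would call `register` whenever an FS is added to an index or becomes referenced from another FS.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ID -> object table that does not keep its values alive. */
final class WeakIdLookup<T> {

    /** Weak reference that remembers its ID so a cleared entry can be dropped. */
    private static final class Entry<V> extends WeakReference<V> {
        final int id;

        Entry(int id, V referent, ReferenceQueue<V> queue) {
            super(referent, queue);
            this.id = id;
        }
    }

    private final Map<Integer, Entry<T>> table = new HashMap<>();
    private final ReferenceQueue<T> cleared = new ReferenceQueue<>();

    /** Assumed to be called when an object becomes indexed or referenced. */
    synchronized void register(int id, T value) {
        purge();
        table.put(id, new Entry<>(id, value, cleared));
    }

    /** Returns the object for the ID, or null if unknown or already collected. */
    synchronized T get(int id) {
        purge();
        Entry<T> entry = table.get(id);
        return entry == null ? null : entry.get();
    }

    /** Removes table entries whose referents the garbage collector has cleared. */
    private void purge() {
        Reference<? extends T> ref;
        while ((ref = cleared.poll()) != null) {
            Entry<?> entry = (Entry<?>) ref;
            // Two-argument remove: only drop the slot if it still holds this
            // exact entry, so a later re-registration of the ID is kept.
            table.remove(entry.id, entry);
        }
    }
}

class WeakIdLookupDemo {
    public static void main(String[] args) {
        WeakIdLookup<Object> lookup = new WeakIdLookup<>();
        Object fs = new Object(); // stands in for an FS held by an index
        lookup.register(42, fs);
        System.out.println(lookup.get(42) == fs); // same instance while fs is reachable
        System.out.println(lookup.get(7));        // ID that was never registered
    }
}
```

The table only holds weak references, so an entry never prevents garbage collection on its own; once the last strong reference (index or other FS) goes away, the entry is cleared and removed on the next call. A lookup by ID is then a single map access instead of a scan over the whole CAS.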