Re: Ideas for UIMA v3

Richard Eckart de Castilho Tue, 23 Jun 2015 06:15:30 -0700

On 23.06.2015, at 15:03, Thilo Goetz <[email protected]> wrote:

> On 06/22/2015 06:23 PM, Marshall Schor wrote:
>> In reading this paper, it seems one of the key ideas is "dynamic typing".
>> There seem to be multiple aspects to this, including type "adapters" of
>> various kinds, to make it easier to fit together independently developed
>> components.  I also get the sense that making things "easy" for developers
>> is a value that dynamic typing provides.  Are you thinking here of the
>> JavaScript style of typing values as "var" instead of specific static types?
>>
>> If dynamic typing means something beyond getting independently developed
>> components' type systems to work together "easily", can you give a couple
>> of use cases of what the dream is here?
> 
> That would be a good start. Beyond that, think about what we call generic 
> annotators, i.e., annotators that take a spec as input (e.g., a bunch of 
> regex rules) and produce annotations or other data as output. The data types 
> that the generic annotator produces vary with the spec, and so it can't 
> have a static, external type system. It might produce tokens with one spec, 
> sentences with another, and person names with a third.
> 
> Also, and I can't stress this enough, I want to be able to communicate with 
> annotators just at the level of the data. I want to be able to read data from 
> files, or from network streams. I want to read from Kafka or sequence files 
> in HDFS. And I want to be able to do that without having to know the precise 
> type system that the data was written with. And I want to be able to do this 
> in Python or Go if I feel like it, so there must be no framework dependency. 
> Think JSON.
> 
> Of course I need to know a thing or two about the data format, otherwise the 
> data is not very useful. However, if I just need the tokens, I don't want to 
> have to know all the rest, and I'd like this to be a lot easier than it is 
> now in UIMA.
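To make the generic-annotator and "think JSON" points concrete, here is a minimal Python sketch under assumed conventions. The spec format (a list of `type`/`pattern` rules), the JSON layout (`text` plus a flat `annotations` list with `type`, `begin`, `end`), and the function names `generic_annotate` and `read_tokens` are all hypothetical, not an existing UIMA format or API. The output types depend only on the spec, and the consumer picks out tokens by type name without knowing anything else about the type system.

```python
import json
import re


def generic_annotate(text, spec):
    """Hypothetical spec-driven annotator: each rule maps a regex to the
    annotation type it emits, so the output types come from the spec rather
    than from a fixed, external type system."""
    annotations = []
    for rule in spec:
        for m in re.finditer(rule["pattern"], text):
            annotations.append(
                {"type": rule["type"], "begin": m.start(), "end": m.end()}
            )
    # Order by start offset, longer spans first when they start together.
    annotations.sort(key=lambda a: (a["begin"], -a["end"]))
    return json.dumps({"text": text, "annotations": annotations})


def read_tokens(serialized):
    """Consumer that only needs tokens: it filters by type name and ignores
    every other entry, so it needs no knowledge of the other types."""
    doc = json.loads(serialized)
    text = doc["text"]
    return [
        text[a["begin"]:a["end"]]
        for a in doc["annotations"]
        if a["type"] == "Token"
    ]


if __name__ == "__main__":
    spec = [
        {"type": "Token", "pattern": r"\w+"},
        {"type": "PersonName", "pattern": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"},
    ]
    data = generic_annotate("Thilo Goetz wrote", spec)
    print(read_tokens(data))  # ['Thilo', 'Goetz', 'wrote']
```

Filtering by type name and skipping unknown entries is what lets the consumer stay ignorant of the rest of the data; a real format would also have to deal with features, references between annotations, and type inheritance.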


We had a nice discussion about this at the COLING workshop ;)

E.g., right now it is quite annoying to change the type system in the CAS for 
such a generic annotator. It *is* in fact possible, e.g. by setting up a 
component that does the following:

1) serialize the current CAS to a byte buffer A
2) create a new temporary CAS with the desired (extended) type system
3) serialize the temporary CAS to another byte buffer B
4) deserialize B into the current CAS (basically redefining the current type
   system)
5) deserialize A back into the current CAS using lenient deserialization
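The five steps above can be illustrated with a toy model in Python. It is only a sketch under stated assumptions: `ToyCas`, the JSON byte buffers, and the functions `serialize`, `deserialize_complete`, `deserialize_lenient` and `extend_type_system` are hypothetical stand-ins, not the UIMA API, and a type system is reduced to a mapping from type names to feature names. What it shows is why step 5 has to be lenient: A was written against the old type system, so types or features that the redefined type system does not know are skipped instead of causing an error.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyCas:
    """Toy stand-in for a CAS (NOT the UIMA API): a type system mapping type
    names to feature names, plus a list of feature structures (dicts)."""
    type_system: dict
    fs: list = field(default_factory=list)


def serialize(cas):
    # The byte buffer carries both the type system and the contents.
    return json.dumps({"ts": cas.type_system, "fs": cas.fs}).encode("utf-8")


def deserialize_complete(buf, cas):
    # Replaces the target's type system and contents wholesale (step 4).
    data = json.loads(buf)
    cas.type_system = data["ts"]
    cas.fs = data["fs"]


def deserialize_lenient(buf, cas):
    # Loads contents into the target's existing type system, skipping types
    # and features the target does not define instead of failing (step 5).
    data = json.loads(buf)
    for fs in data["fs"]:
        features = cas.type_system.get(fs["type"])
        if features is None:
            continue  # unknown type: drop the feature structure
        cas.fs.append({k: v for k, v in fs.items() if k == "type" or k in features})


def extend_type_system(cas, new_types):
    buf_a = serialize(cas)                                  # 1) current CAS -> A
    temp = ToyCas(type_system={**cas.type_system, **new_types})  # 2) temp CAS
    buf_b = serialize(temp)                                 # 3) temp CAS -> B
    deserialize_complete(buf_b, cas)                        # 4) B -> current CAS
    deserialize_lenient(buf_a, cas)                         # 5) A -> current CAS


if __name__ == "__main__":
    cas = ToyCas(type_system={"Token": ["begin", "end"]},
                 fs=[{"type": "Token", "begin": 0, "end": 5}])
    extend_type_system(cas, {"PersonName": ["begin", "end"]})
    cas.fs.append({"type": "PersonName", "begin": 0, "end": 11})
    print(sorted(cas.type_system))  # ['PersonName', 'Token']
```

In the toy the same Python object survives all five steps, which mirrors the point of the trick: the component keeps working on "the current CAS" while its type system is swapped underneath.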

This works if the serialization mechanisms are chosen smartly, but it is not 
really convenient at all.

I think there was also some discussion, both at the workshop and here on the 
list, about removing the need to "lock" the type system after CAS 
initialization, wasn't there?

Cheers,

-- Richard
