On 23.06.2015, at 15:03, Thilo Goetz <[email protected]> wrote:

> On 06/22/2015 06:23 PM, Marshall Schor wrote:
>> In reading this paper, it seems one of the key ideas is "dynamic typing".
>> There seems to be multiple aspects to this, including type "adapters" of
>> various kinds, to enable more-easily fitting together independently
>> developed components. I also get the sense that making things "easy" for
>> developers is a value that dynamic typing provides. Are you thinking here
>> of the Javascript style of typing values as "var" instead of specific
>> static types?
>>
>> If dynamic typing means something beyond getting independently-developed
>> components' type systems to work together "easily", can you give a couple
>> of use cases of what the dream is here?
>
> That would be a good start. Beyond that, think about what we call generic
> annotators, i.e., annotators that take a spec as input (e.g., a bunch of
> regex rules) and produce annotations or other data as output. The data types
> that the generic annotator produces varies with the spec, and so it can't
> have a static, external type system. It might produce tokens with one spec,
> sentences with another, and person names with a third.
>
> Also, and I can't stress this enough, I want to be able to communicate with
> annotators just at the level of the data. I want to be able to read data from
> files, or from network streams. I want to read from Kafka or sequence files
> in HDFS. And I want to be able to do that without having to know the precise
> type system that the data was written with. And I want to be able to do this
> in Python or Go if I feel like it, so there must be no framework dependency.
> Think JSON.
>
> Of course I need to know a thing or two about the data format, otherwise the
> data is not very useful. However, if I just need the tokens, I don't want to
> have to know all the rest, and I'd like this to be a lot easier than it is
> now in UIMA.
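
To make the "I just need the tokens" case concrete, here is a minimal Python sketch that uses only the standard library. The JSON layout (one document per line, with a "text" field and an "annotations" list whose entries carry "type", "begin" and "end") and the type name "Token" are assumptions made up for illustration, not a format that UIMA defines.

```python
import io
import json


def iter_annotations(stream, wanted_type):
    """Yield (covered_text, begin, end) for annotations of one type.

    Assumed, hypothetical layout: one JSON document per line with a
    "text" string and an "annotations" list of dicts that have "type",
    "begin" and "end" keys. All other keys and types are ignored.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between documents
        doc = json.loads(line)
        text = doc.get("text", "")
        for ann in doc.get("annotations", []):
            if ann.get("type") == wanted_type:
                begin, end = ann["begin"], ann["end"]
                yield text[begin:end], begin, end


# A made-up sample document; the "Sentence" type and the extra "pos"
# feature are never inspected by the reader.
SAMPLE = io.StringIO(
    '{"text": "Hello world", "annotations": ['
    '{"type": "Token", "begin": 0, "end": 5}, '
    '{"type": "Sentence", "begin": 0, "end": 11}, '
    '{"type": "Token", "begin": 6, "end": 11, "pos": "NN"}]}\n'
)

tokens = list(iter_annotations(SAMPLE, "Token"))
```

The reader only has to know the three keys it touches; any other types or features in the data pass by without being declared anywhere.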
We had a nice discussion about this at the COLING workshop ;)

E.g. right now it is quite annoying to change the type system in the CAS
for such a generic annotator. It *is* in fact possible, e.g. by setting up a
component that does the following:

1) serialize the current CAS to a byte buffer A
2) create a new temporary CAS with the desired (extended) type system
3) serialize it to another byte buffer B
4) deserialize B into the current CAS (basically redefining the current
   type system)
5) deserialize A back into the current CAS using a lenient deserialization

This works if the serialization mechanisms are smartly chosen, but it is not
really convenient at all.

I think there was some discussion about removing the need to "lock" the
type system after CAS initialization as well - in the workshop as well as
here on the list, wasn't there?

Cheers,

-- Richard
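
P.S. A toy sketch of the five steps, in plain Python so it runs without any framework. `ToyCas`, its methods and `extend_type_system` are all hypothetical stand-ins invented for this illustration; none of them are UIMA APIs, and the JSON byte buffers merely play the role of the real serialization formats.

```python
import json


class ToyCas:
    """Hypothetical stand-in for a CAS: a type system that is fixed at
    creation time plus a list of feature structures (plain dicts)."""

    def __init__(self, type_names):
        self._types = frozenset(type_names)  # "locked" after creation
        self._items = []

    @property
    def types(self):
        return self._types

    def add(self, type_name, **features):
        if type_name not in self._types:
            raise KeyError(f"type {type_name!r} is not in the type system")
        self._items.append({"type": type_name, **features})

    def select(self, type_name):
        return [fs for fs in self._items if fs["type"] == type_name]

    def serialize(self):
        """Serialize type system and data together into a byte buffer."""
        payload = {"types": sorted(self._types), "items": self._items}
        return json.dumps(payload).encode("utf-8")

    def deserialize_complete(self, buf):
        """Replace both the type system and the data with those in buf."""
        payload = json.loads(buf.decode("utf-8"))
        self._types = frozenset(payload["types"])
        self._items = list(payload["items"])

    def deserialize_lenient(self, buf):
        """Load data only, keep the current type system, and drop any
        feature structure whose type is unknown here."""
        payload = json.loads(buf.decode("utf-8"))
        self._items = [fs for fs in payload["items"]
                       if fs["type"] in self._types]


def extend_type_system(cas, extra_types):
    buffer_a = cas.serialize()                   # 1) current CAS -> A
    temp = ToyCas(cas.types | set(extra_types))  # 2) temp CAS, extended types
    buffer_b = temp.serialize()                  # 3) temp CAS -> B
    cas.deserialize_complete(buffer_b)           # 4) B redefines the type system
    cas.deserialize_lenient(buffer_a)            # 5) A restores the old data


cas = ToyCas({"Token"})
cas.add("Token", begin=0, end=5)
extend_type_system(cas, {"PersonName"})
cas.add("PersonName", begin=0, end=5)  # would have raised KeyError before
```

Even in the toy version the awkward part is visible: the data has to leave the CAS and come back just so that one type can be added.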
