I added a wiki page to develop the ideas here. 

This is what I got from reading this:
 
One idea is for an annotator to have no fixed type system specification, but
rather to create types / features dynamically according to some configuration
information or some dynamically obtained information (perhaps the results of a
previous analysis).
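A minimal Python sketch of this first idea (Thilo's "generic annotator"), assuming a made-up spec format that maps a type name to a regex; the names `Annotation` and `make_regex_annotator` are hypothetical, not UIMA API. The set of types the annotator emits comes entirely from the spec at run time:

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # type_name is chosen at run time from the spec, not from a static type system
    type_name: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


def make_regex_annotator(spec):
    """Build an annotator from a hypothetical spec: {type name: regex pattern}."""
    compiled = {name: re.compile(pattern) for name, pattern in spec.items()}

    def annotate(text):
        out = []
        for name, rx in compiled.items():
            for m in rx.finditer(text):
                # each match becomes an annotation of the type named in the spec
                out.append(Annotation(name, m.start(), m.end(),
                                      {"coveredText": m.group()}))
        # order by position, then type name, for a stable result
        out.sort(key=lambda a: (a.begin, a.type_name))
        return out

    return annotate
```

With `{"Token": r"\w+"}` the same code produces only Token annotations; with a person-name pattern it produces only PersonName ones, so nothing has to be declared up front.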

Another idea is for an annotator to be able to read Feature Structure data from
a wide variety of sources, with the data including the type/feature metadata
(either externally, as we do now in UIMA with an external XML type system
specification, or embedded, as JSON would naturally do).  Such an annotator
would have some notion of the type / feature information it was interested in
processing, but could ignore the rest.
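A minimal Python sketch of such a consumer, assuming a hypothetical JSON layout (a `types` map with a `supertype` key, plus a `featureStructures` list); this is not UIMA's actual JSON serialization format. The embedded metadata lets it pick up a subtype it has never heard of (`my.SubToken`) while ignoring everything else:

```python
import json


def is_subtype(types, name, wanted):
    """Walk the embedded (hypothetical) 'supertype' chain from name toward the root."""
    seen = set()
    while name is not None and name not in seen:
        if name == wanted:
            return True
        seen.add(name)
        name = types.get(name, {}).get("supertype")
    return False


def select(doc_text, wanted_type, wanted_features):
    """Return only FSs of wanted_type (or its subtypes), keeping only wanted_features."""
    doc = json.loads(doc_text)
    types = doc.get("types", {})
    result = []
    for fs in doc.get("featureStructures", []):
        if is_subtype(types, fs.get("type"), wanted_type):
            # keep the features this consumer cares about; ignore the rest
            result.append({f: fs[f] for f in wanted_features if f in fs})
    return result


SAMPLE = """
{
  "types": {
    "Token":       {"supertype": "Annotation", "features": ["begin", "end", "pos"]},
    "my.SubToken": {"supertype": "Token", "features": ["begin", "end", "pos", "lemma"]},
    "Sentence":    {"supertype": "Annotation", "features": ["begin", "end"]}
  },
  "featureStructures": [
    {"type": "Sentence",    "begin": 0, "end": 9},
    {"type": "Token",       "begin": 0, "end": 5, "pos": "UH"},
    {"type": "my.SubToken", "begin": 6, "end": 9, "pos": "PRP", "lemma": "you"}
  ]
}
"""

print(select(SAMPLE, "Token", ["begin", "end"]))
# [{'begin': 0, 'end': 5}, {'begin': 6, 'end': 9}]
```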

Finally, a third idea is to componentize things so that no "UIMA
Framework" is needed, or, if one is present, it's hidden.  I'm thinking that this
means that, for simpler analytics, the notions of a pipeline and of combining
things would not be present; it would be more like just a single annotator.  For
more complex things, the pipeline would be encapsulated (like UIMA's
Aggregates), and the whole thing would look like something that could be
embedded in any of the other "big data" frameworks as an analysis piece.  The
implication is that this would let us use other frameworks' scale-out
mechanisms.
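A minimal Python sketch of this third idea, under made-up assumptions: an "annotator" is just a plain function from a JSON-like document dict to a list of new feature structures, and `aggregate`, `whitespace_tokens`, and `token_count` are hypothetical names. The encapsulated pipeline is a single callable, so any framework that can map a function over records could host it; the builtin `map` stands in for one here:

```python
import re
from typing import Callable, List

# Hypothetical convention: an annotator takes a document dict and
# returns the new feature structures (dicts) it wants to add.
Annotator = Callable[[dict], List[dict]]


def aggregate(*annotators: Annotator) -> Callable[[dict], dict]:
    """Encapsulate a pipeline (like a UIMA Aggregate) as one plain function."""
    def run(doc: dict) -> dict:
        # copy so the caller's document is not mutated
        doc = {**doc, "featureStructures": list(doc.get("featureStructures", []))}
        for annotate in annotators:
            # later annotators see the output of earlier ones
            doc["featureStructures"].extend(annotate(doc))
        return doc
    return run


def whitespace_tokens(doc: dict) -> List[dict]:
    return [{"type": "Token", "begin": m.start(), "end": m.end()}
            for m in re.finditer(r"\S+", doc["text"])]


def token_count(doc: dict) -> List[dict]:
    n = sum(1 for fs in doc["featureStructures"] if fs["type"] == "Token")
    return [{"type": "TokenCount", "value": n}]


pipeline = aggregate(whitespace_tokens, token_count)

# No framework involved: builtin map stands in for a big-data framework's map step.
docs = [{"text": "hello world"}, {"text": "one two three"}]
results = list(map(pipeline, docs))
```

Because `pipeline` is an ordinary function with no framework dependency, something like PySpark's `RDD.map` could in principle take it in place of the builtin `map` (not shown), and that framework's scale-out would apply.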

Does this capture the ideas?  Please fill in what I may have missed :-)

-Marshall

On 6/23/2015 9:03 AM, Thilo Goetz wrote:
> On 06/22/2015 06:23 PM, Marshall Schor wrote:
>> In reading this paper, it seems one of the key ideas is "dynamic typing".  There
>> seem to be multiple aspects to this, including type "adapters" of various
>> kinds, to make it easier to fit together independently developed
>> components.  I also get the sense that making things "easy" for developers is a
>> value that dynamic typing provides.  Are you thinking here of the JavaScript
>> style of typing values as "var" instead of with specific static types?
>>
>> If dynamic typing means something beyond getting independently developed
>> components' type systems to work together "easily", can you give a couple of use
>> cases of what the dream is here?
>
> That would be a good start. Beyond that, think about what we call generic
> annotators, i.e., annotators that take a spec as input (e.g., a bunch of regex
> rules) and produce annotations or other data as output. The data types that
> the generic annotator produces vary with the spec, so it can't have a
> static, external type system. It might produce tokens with one spec, sentences
> with another, and person names with a third.
>
> Also, and I can't stress this enough, I want to be able to communicate with
> annotators just at the level of the data. I want to be able to read data from
> files, or from network streams. I want to read from Kafka or sequence files in
> HDFS. And I want to be able to do that without having to know the precise type
> system that the data was written with. And I want to be able to do this in
> Python or Go if I feel like it, so there must be no framework dependency.
> Think JSON.
>
> Of course I need to know a thing or two about the data format, otherwise the
> data is not very useful. However, if I just need the tokens, I don't want to
> have to know all the rest, and I'd like this to be a lot easier than it is now
> in UIMA.
>
>>
>> I put some concepts into the wiki-page; feel free to correct/augment etc.
>>
>> Thanks! -Marshall
>>
>> On 6/22/2015 5:30 AM, Thilo Goetz wrote:
>>> Let me throw last year's dream into the ring then:
>>> http://aclweb.org/anthology/W14-5209
>>>
>>> --Thilo
>>>
>>> On 06/18/2015 04:41 PM, Marshall Schor wrote:
>>>> I've put up another wiki page as a place to collect ideas for UIMA version 3.
>>>>
>>>> It's a place to dream a bit.  It goes with the earlier page:
>>>> https://cwiki.apache.org/confluence/display/UIMA/Modernizing+the+internals+of+UIMA
>>>>
>>>> Feel free to contribute, of course!
>>>>
>>>> The page is linked from the above page, but here's the direct link:
>>>> https://cwiki.apache.org/confluence/display/UIMA/Ideas+for+UIMAJ+v3
>>>>
>>>> -Marshall
>>>>
