On 06/24/2015 12:30 PM, Richard Eckart de Castilho wrote:
On 24.06.2015, at 11:56, Thilo Goetz <[email protected]> wrote:
Or would the issue be solvable by integrating uimaFIT into the core (e.g. to
avoid re-approval of libraries by company legal departments)?
I am just speaking for myself here, but I don't want more framework, I want
less. I don't want to have to use uimaFIT to help me deal with UIMA if I could
have something that doesn't require that help. Give me a simple JSON based data
format and I'll be happy.
I know this is selfish, but unless the discussion is seriously moving in that
direction, I'm simply not interested :)
Fair enough.
Marshall already did some nice work on JSON serialization, so I think there is
movement in that direction.
Just to be very clear: that is not good enough. I want a JSON format
that I can read and write without the help of the framework. From my
data structures, into my data structures. In some programming language
that hasn't been invented yet. Simple enough that I don't need to absorb
and reimplement the whole UIMA philosophy.
But what I don't understand is how a data format resolves to "less framework".
The data format is basically addressing ingestion and export, but not processing or
pipelines. Even if you have a simple data format like JSON, there's still the need to run
analysis, right? Is the analysis in your scenario just a black box? And in order to apply
the analysis, you'll need some kind of API - how do you imagine it?
The analysis is a black box, yes. What else could it be? I don't care
how the POS tagger does what it does. All I'm interested in is what it
needs as input, and how it gives me the output. I can parse JSON into
Java pojos with jackson for example, that's super simple. Writing them
out is even easier. What APIs do I need other than being able to tell
some piece of analysis to do its stuff on a bunch of data?
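
A minimal sketch of the black-box view described above, written in Python with only the standard json module rather than Java with jackson, so that it has no dependencies. Everything in it is an assumption made for illustration: the JSON layout (a "text" field plus a "tokens" list with "begin"/"end" offsets), the toy_pos_tagger function, and the tagging rule are invented here and are not a UIMA or uimaFIT format.

```python
import json


def toy_pos_tagger(doc_json: str) -> str:
    """Hypothetical black-box analysis: takes a JSON string, returns a JSON string."""
    doc = json.loads(doc_json)
    text = doc["text"]
    for tok in doc["tokens"]:
        word = text[tok["begin"]:tok["end"]]
        # Placeholder rule, not a real tagger: capitalized words get "NNP".
        tok["pos"] = "NNP" if word[:1].isupper() else "X"
    return json.dumps(doc)


# The caller builds the input from its own data structures ...
request = json.dumps({
    "text": "Thilo likes JSON",
    "tokens": [
        {"begin": 0, "end": 5},
        {"begin": 6, "end": 11},
        {"begin": 12, "end": 16},
    ],
})

# ... and reads the output back into its own data structures.
result = json.loads(toy_pos_tagger(request))
tags = [tok["pos"] for tok in result["tokens"]]
```

The only contract between caller and analysis in this sketch is the shape of the JSON document; neither side needs to share types or a runtime with the other.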
I am painting things black and white here. I certainly see that some
convenience APIs can be useful. The important thing for me though is that
they are just that: a convenience. If the basic definition is the data
format, then other things like APIs can be stacked on top of that. But
if I want to work with the data directly, then I can do that without
having to jump through hoops.
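
To illustrate the layering argued for here, the following sketch reads the same data twice: once directly as plain dicts and lists, and once through a thin convenience wrapper stacked on top. The document layout, the Token class, and the tokens helper are hypothetical names made up for this example and do not correspond to any existing API.

```python
import json
from dataclasses import dataclass

# Assumed example document; the field names are illustrative only.
RAW = (
    '{"text": "Give me JSON", '
    '"tokens": [{"begin": 0, "end": 4}, {"begin": 5, "end": 7}, {"begin": 8, "end": 12}]}'
)

# Direct access: the data format alone is enough to work with the data.
doc = json.loads(RAW)
direct = [doc["text"][t["begin"]:t["end"]] for t in doc["tokens"]]


# Optional convenience layer built on top of the same data.
@dataclass(frozen=True)
class Token:
    text: str
    begin: int
    end: int

    def covered_text(self) -> str:
        return self.text[self.begin:self.end]


def tokens(document: dict) -> list:
    """Wrap the raw token entries of a parsed document in Token objects."""
    return [Token(document["text"], t["begin"], t["end"]) for t in document["tokens"]]


via_api = [tok.covered_text() for tok in tokens(doc)]
```

Both paths yield the same strings; the wrapper adds nothing that cannot be done on the raw data, which is the sense in which it is only a convenience.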
--Thilo
Cheers,
-- Richard