Hi,
On Mon, Jan 14, 2013 at 10:58 AM, David Riccitelli <[email protected]> wrote:
> ...You can find more information and the source code here:
> https://github.com/insideout10/stanbol-facade ...
Interesting - I think that matches my recent thoughts about Stanbol as
a (mostly) stateless content enhancement service - let me try to
describe my use cases for that, to see how much our ideas overlap.
I don't want to derail your efforts: what I describe here might have a
larger scope, and I don't have code to back it so far, so feel free to
go ahead with your proposal... but maybe this helps refine the idea or
design it in an extensible way.
1. Simple enhancement of textual content
Client either POSTs a text/plain document (that's the default MIME
type), or does a GET with the content in a request parameter.
Stanbol uses a default enhancement pipeline (more on that below) and
returns enhancements in a simple default format (ideally a human
readable format that doesn't scare "semantic newbies"). Client can
request other output formats with an Accept header or by adding an
extension to the request URL.
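To make use case 1 concrete, here is a rough client-side sketch of the
two request shapes. The endpoint URL, the "content" parameter name and
the ".json" extension are assumptions for illustration only, not an
agreed API. The requests are only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: a local Stanbol instance; this URL is illustrative only.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_post(text, accept="text/plain"):
    """POST the content as text/plain; the Accept header picks the output format."""
    return urllib.request.Request(
        ENHANCER_URL,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8", "Accept": accept},
        method="POST",
    )


def build_get(text, extension=""):
    """GET with the content in a request parameter (hypothetical name: 'content').

    An optional URL extension such as '.json' selects the output format.
    """
    query = urllib.parse.urlencode({"content": text})
    return urllib.request.Request(f"{ENHANCER_URL}{extension}?{query}", method="GET")


req = build_post("Paris is the capital of France.", accept="application/rdf+xml")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing either request to urllib.request.urlopen() would send it.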
2. Enhancement of binary content
Client POSTs a PDF, image or other document.
Stanbol uses a default content extractor (DCE) to get text from that
binary content, and then runs as above.
3. Enhancement of remote content
Same as 2, but the posted (JSON?) document contains URLs of content
that Stanbol first retrieves. Textual content is then extracted and
aggregated from the responses using the DCE, and processing then
proceeds as in 2.
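A minimal sketch of the retrieve/extract/aggregate step in use case 3,
assuming the posted JSON document has a "urls" field (a hypothetical
name). The fetch and extract callables are stand-ins for the HTTP
retrieval and the DCE, so the flow can be shown without a network or a
real extractor.

```python
import json


def collect_text(request_body, fetch, extract):
    """Fetch each URL listed in the posted JSON, extract text, join the results."""
    urls = json.loads(request_body)["urls"]  # hypothetical field name
    parts = []
    for url in urls:
        content_type, payload = fetch(url)  # stand-in for HTTP retrieval
        parts.append(extract(content_type, payload))  # stand-in for the DCE
    # Skip responses that yielded no text, then aggregate into one document.
    return "\n\n".join(p for p in parts if p)


def fake_fetch(url):
    """Pretend every URL returns a small plain-text body."""
    return "text/plain", f"content of {url}".encode("utf-8")


def fake_extract(content_type, payload):
    """Toy extractor: only handles text/* content."""
    return payload.decode("utf-8") if content_type.startswith("text/") else ""


body = json.dumps({"urls": ["http://example.org/a", "http://example.org/b"]})
print(collect_text(body, fake_fetch, fake_extract))
```

The aggregated text would then go through the same enhancement path
as a plain-text POST.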
4. Requests including enhancement pipeline definitions ("stateless Stanbol")
Using a multipart POST in the previous use cases, one part can be a
pipeline definition that describes the enhancement pipeline to use.
The only configuration required on the Stanbol side is making the
engines available with unique names; their assembly is dynamic and
happens while processing the request.
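As a sketch of what such a multipart request could look like on the
wire: one part carries the content, another carries the pipeline
definition. The part names ("content", "pipeline"), the JSON shape of
the definition and the engine names are all hypothetical, nothing
like this is specified yet.

```python
import json
import uuid


def build_multipart(content, pipeline, boundary=None):
    """Return (Content-Type header value, body bytes) for a two-part POST."""
    boundary = boundary or uuid.uuid4().hex
    parts = [
        # (part name, part content type, payload); names are hypothetical.
        ("content", "text/plain; charset=utf-8", content.encode("utf-8")),
        ("pipeline", "application/json", json.dumps(pipeline).encode("utf-8")),
    ]
    body = b""
    for name, ctype, payload in parts:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        body += header.encode("ascii") + payload + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", body


# Hypothetical pipeline definition: engines referenced by their unique names.
pipeline = {"engines": ["langdetect", "ner", "entity-linking"]}
content_type, body = build_multipart("Paris is the capital of France.", pipeline)
print(content_type)
print(body.decode("utf-8"))
```

The server would only need to resolve the engine names it is given;
the ordering and wiring come from the request itself.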
I'm saying pipeline and not enhancement chain as this goes a bit
further: the pipeline can include selection/configuration of the DCE,
selection/configuration of the renderer used for the enhancement
graph, etc., probably using a mini flow language to allow parts of the
pipeline to depend on previous results (similar to the
https://gist.github.com/2931050 idea).
The pipeline granularity can also be smaller than enhancement engines,
for example to select specific NLP components as introduced by
STANBOL-733. One example is dynamic selection of a different
part-of-speech tagger depending on the detected language.
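The language-dependent tagger selection could look roughly like the
toy sketch below: a later pipeline step is chosen from the result of
an earlier one. The language detector and both taggers are
deliberately trivial stand-ins invented for this example, not Stanbol
or STANBOL-733 components.

```python
def english_tagger(tokens):
    """Toy tagger: marks English determiners, everything else is 'X'."""
    return [(t, "DET" if t.lower() in {"the", "a", "an"} else "X") for t in tokens]


def french_tagger(tokens):
    """Toy tagger: marks French determiners, everything else is 'X'."""
    return [(t, "DET" if t.lower() in {"le", "la", "les"} else "X") for t in tokens]


# Components registered under unique names (here: language codes).
TAGGERS = {"en": english_tagger, "fr": french_tagger}


def detect_language(text):
    """Toy detector: French if a French determiner appears, otherwise English."""
    words = text.lower().split()
    return "fr" if any(w in words for w in ("le", "la", "les")) else "en"


def run(text):
    """Detect the language first, then pick the tagger based on that result."""
    lang = detect_language(text)
    tagger = TAGGERS.get(lang, english_tagger)  # fall back to English
    return lang, tagger(text.split())


print(run("La tour Eiffel"))
print(run("the Eiffel tower"))
```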
-Bertrand