Thanks Fabian. Here are the answers:
> 1) You introduce an new endpoint http://localhost:8080/api/tasks

Correct. Ideally the API front-end focuses more on developer adoption, so as
to provide APIs that ease integration.

> 2) The endpoint consumes JSON that either has the HTML content or a URL
> pointing to HTML content

When the consumer posts:
- a 'content', it is sent straight to the enhancement chain.
- a URL, it is parsed with Readability and the output content is then sent
  to the enhancement chain.

Note that if Readability detects that the URL points to an article split
across multiple pages (and therefore multiple URLs), it will load the
content from all the related URLs.

> 3) The accepted media-type is also defined in the JSON file for the request

The HTTP *Accept* header is currently ignored. Indeed, it would probably be
more correct to eliminate the *mimeType* property and rely solely on the
*Accept* header.

> 4) Using readability the HTML is cleaned and then some enhancement chain is
> triggered. Which chain is used here?

The default chain is used unless the consumer specifies which chain to use
by setting the chainName property in the JSON payload [1].

> 5) The usual enhancement RDF is returned to the user

Correct.

BR,
David

[1] ln 95:
https://github.com/insideout10/stanbol-facade/blob/master/stanbol-facade-api/src/main/java/io/insideout/stanbol/facade/services/TaskService.java

On Mon, Jan 14, 2013 at 12:21 PM, Fabian Christ <[email protected]> wrote:

> Hi David,
>
> nice idea. First let me summarize what this contribution is about to see if
> I understood it correctly.
>
> 1) You introduce an new endpoint http://localhost:8080/api/tasks
> 2) The endpoint consumes JSON that either has the HTML content or a URL
> pointing to HTML content
> 3) The accepted media-type is also defined in the JSON file for the request
> 4) Using readability the HTML is cleaned and then some enhancement chain is
> triggered. Which chain is used here?
> 5) The usual enhancement RDF is returned to the user
>
> Is this what it does?
>
> Thanks,
> - Fabian
>
>
> 2013/1/14 David Riccitelli <[email protected]>
>
> > Hello,
> >
> > I would like to introduce one more contribution for Apache Stanbol.
> >
> > It is not an engine, but an HTTP API for Stanbol which pre-processes and
> > submits analysis tasks, and returns the result synchronously to the
> > consumer. It aims to simplify development integrations and to provide a
> > powerful pre-processing API for analysis of URLs.
> >
> > It implements the *Readability* library, in order to support URL
> > submissions:
> > - loading contents from remote URLs and
> > - cleaning them up of all the surrounding noise.
> >
> > Readability is the same library behind the *Reader* function of Safari
> > that many users know already.
> >
> > To summarize:
> >
> > - extremely simple APIs to ease prototyping, integration and usage
> > - support for textual contents
> > - support for URLs
> > - *for URLs, preprocessing of HTML pages to capture the actual URL
> > content while skipping noise such as ads, menus and so forth*
> > - synchronous access (for asynchronous access see idntik.it)
> >
> > You can find more information and the source code here:
> > https://github.com/insideout10/stanbol-facade
> >
> > Shall I open a JIRA to discuss a possible integration in the trunk?
> >
> > BR,
> > David Riccitelli
> >
> > -- check the Swagger for WordLift <http://bit.ly/VtoM5H>
> >
> > ********************************************************************************
> > InsideOut10 s.r.l.
> > P.IVA: IT-11381771002
> > Fax: +39 0110708239
> > ---
> > LinkedIn: http://it.linkedin.com/in/riccitelli
> > Twitter: ziodave
> > ---
> > Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
> > ********************************************************************************
>
> --
> Fabian
> http://twitter.com/fctwitt

--
David Riccitelli

-- check the Swagger for WordLift <http://bit.ly/VtoM5H>
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
********************************************************************************
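
P.S. To make the answers to points 2 to 4 concrete, here is a rough client-side sketch of how a task could be posted to the endpoint. It is only an illustration under assumptions, not the facade's actual contract: 'content', mimeType and chainName are the properties mentioned above, while the "url" property name, the "application/rdf+xml" default and the rule that exactly one of content/URL is sent are my guesses for the example. The helper names (build_task, build_request, submit) are hypothetical. Please check TaskService.java [1] for the real field names.

```python
import json
import urllib.request

# Endpoint as given in this thread.
TASKS_ENDPOINT = "http://localhost:8080/api/tasks"


def build_task(content=None, url=None,
               mime_type="application/rdf+xml", chain_name=None):
    """Build the JSON payload for one analysis task.

    Assumption: a task carries either inline content or a URL, not both.
    """
    if (content is None) == (url is None):
        raise ValueError("provide exactly one of content or url")

    # mimeType selects the response format (the Accept header is ignored).
    task = {"mimeType": mime_type}
    if content is not None:
        # Inline content goes straight to the enhancement chain.
        task["content"] = content
    else:
        # Assumed property name; the URL is cleaned by Readability first.
        task["url"] = url
    if chain_name is not None:
        # Omit chainName to fall back to the default chain.
        task["chainName"] = chain_name
    return task


def build_request(task, endpoint=TASKS_ENDPOINT):
    """Wrap the task in an HTTP POST request with a JSON body."""
    body = json.dumps(task).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(task, endpoint=TASKS_ENDPOINT):
    """Send the task and return the response body (synchronous call)."""
    with urllib.request.urlopen(build_request(task, endpoint)) as response:
        return response.read().decode("utf-8")
```

For instance, submit(build_task(url="http://example.org/article")) would post a URL task against a locally running instance and return the enhancement results as text, using the default chain.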
