[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Henry Story (Commented) (JIRA) Wed, 01 Feb 2012 08:35:21 -0800

    [ https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197939#comment-13197939 ]


Henry Story commented on JENA-203:
----------------------------------

With a bit of help from Damian I got the RDF/XML parser to work asynchronously,
using the com.fasterxml.aalto asynchronous parser [1].

I had to adapt Damian's jena.rdf.arp.StAX2SAX; I called the adapted version
AsyncJenaParser [2]. It is then used by the URLFetcher class [3], which
extends the async_http_client by ning [4] to fetch RDF.

Currently it can only fetch RDF/XML; with a bit more work it could handle any
XML format.

What is missing are the Turtle and JSON parsers.

The URLFetcher could be a bit more general and just pass on the data it 
receives to some actors. That would remove the parser processing from the IO 
thread, and allow the fetcher to be more general. 
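
As a rough sketch of that hand-off, using only the JDK rather than real actors:
the IO callback copies each chunk and queues it, and a single worker thread
runs the parser step. ChunkHandoff, onBodyPart and parserStep are hypothetical
names, not part of async_http_client or Jena, and the single-threaded executor
is only a stand-in for an actor mailbox.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical hand-off: the IO thread only enqueues, a worker thread parses. */
class ChunkHandoff {
    // One worker thread with a FIFO queue, so chunks are processed in arrival order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final Consumer<byte[]> parserStep;

    ChunkHandoff(Consumer<byte[]> parserStep) {
        this.parserStep = parserStep;
    }

    /** Called on the IO thread: copy the chunk, queue it, return immediately. */
    void onBodyPart(byte[] chunk) {
        // Assumption: the HTTP client may reuse its buffer, so take a copy.
        byte[] copy = chunk.clone();
        mailbox.execute(() -> parserStep.accept(copy));
    }

    /** Called once the response is complete: drain the queue, then stop the worker. */
    boolean close(long timeoutMillis) {
        mailbox.shutdown();
        try {
            return mailbox.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ChunkHandoff handoff = new ChunkHandoff(chunk ->
            System.out.println(Thread.currentThread().getName() + " got "
                + new String(chunk, StandardCharsets.UTF_8)));
        handoff.onBodyPart("<rdf:RDF>".getBytes(StandardCharsets.UTF_8));
        handoff.onBodyPart("</rdf:RDF>".getBytes(StandardCharsets.UTF_8));
        handoff.close(1000);
    }
}
```

With this shape the fetcher no longer needs to know which parser sits behind
parserStep, which is what would let it serve formats other than RDF/XML.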

There is perhaps something here that can be integrated into Jena. The
AsyncJenaParser, perhaps?

Henry

[1] http://www.cowtowncoder.com/blog/archives/2011/03/entry_451.html
[2] https://dvcs.w3.org/hg/read-write-web/file/aa9074df0635/src/main/java/patch/AsyncJenaParser.java
[3] https://dvcs.w3.org/hg/read-write-web/file/d9c1f87eee55/src/main/scala/cache/WebFetcher.scala
[4] all classes can be found in the build file:
    https://dvcs.w3.org/hg/read-write-web/file/aa9074df0635/project/build.scala



                
> support for Non Blocking Parsers
> --------------------------------
>
>                 Key: JENA-203
>                 URL: https://issues.apache.org/jira/browse/JENA-203
>             Project: Jena
>          Issue Type: Improvement
>            Reporter: Henry Story
>
> In a Linked Data environment servers have to fetch data off the web. The
> speed at which such data is served can be very slow, so one wants to avoid
> using up one thread for each connection (1 thread = 0.5 to 1MB
> approximately). This is why Java NIO was developed, why servers such as
> Netty are so popular, why HTTP client libraries such as
> https://github.com/sonatype/async-http-client are more and more numerous,
> and why frameworks such as http://akka.io/, which support relatively
> lightweight actors (500 bytes per actor), are growing more visible.
>
> Unless I am mistaken, the only way to parse some content is to use methods
> that take an InputStream, such as this:
>
>     val m = ModelFactory.createDefaultModel()
>     m.getReader(lang.jenaLang).read(m, in, base.toString)
>
> That read call blocks. Would it be possible to have an API which allows
> one to parse a document in chunks as they arrive from the input?
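
To make the request above concrete, here is a minimal sketch of what a
chunk-fed API could look like. It is not Jena API: ChunkedLineParser, feed and
endOfInput are hypothetical names, and the sketch only splits line-based input
(N-Triples style) into statements; it does not parse the triples themselves.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical push-style parser: the caller feeds bytes, no call blocks on a read. */
class ChunkedLineParser {
    // Bytes of the statement that is still incomplete after the last chunk.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final Consumer<String> onStatement;

    ChunkedLineParser(Consumer<String> onStatement) {
        this.onStatement = onStatement;
    }

    /** Consume one chunk, emit every statement it completes, keep the remainder. */
    void feed(byte[] chunk, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            byte b = chunk[i];
            if (b == '\n') {
                // The newline byte never occurs inside a multi-byte UTF-8
                // sequence, so splitting on it at byte level is safe.
                emitPending();
            } else {
                pending.write(b);
            }
        }
    }

    /** Signal end of input and flush a last statement without a trailing newline. */
    void endOfInput() {
        emitPending();
    }

    private void emitPending() {
        String line = new String(pending.toByteArray(), StandardCharsets.UTF_8).trim();
        pending.reset();
        if (!line.isEmpty()) {
            onStatement.accept(line);
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        ChunkedLineParser parser = new ChunkedLineParser(seen::add);
        // The second statement is deliberately split across two chunks.
        byte[] first = "<a> <b> <c> .\n<d> <e".getBytes(StandardCharsets.UTF_8);
        byte[] second = "> <f> .\n".getBytes(StandardCharsets.UTF_8);
        parser.feed(first, 0, first.length);
        parser.feed(second, 0, second.length);
        parser.endOfInput();
        seen.forEach(System.out::println);
    }
}
```

The point of the shape is that feed returns as soon as the chunk is consumed,
so it can be called from an NIO callback without tying up a thread per
connection.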

