[jira] [Commented] (JENA-203) support for Non Blocking Parsers

Henry Story (Commented) (JIRA) Thu, 01 Mar 2012 01:44:24 -0800

    [ 
https://issues.apache.org/jira/browse/JENA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219944#comment-13219944
 ]


Henry Story commented on JENA-203:
----------------------------------

I am not sure what the best way to change the Jena API for non-blocking 
parsers is, nor whether anything needs to be done (yet). Essentially, these 
parsers should let one parse chunks of data, get some partial results (a 
small set of triples) and feed those to a Jena graph or store. Feeding triples 
to a Jena Graph, or adding statements to a store one at a time, is not a 
problem. The XML parser I did above shows that it can be done with the Jena 
RDF/XML parsers, and the Turtle parser shows how one can do it with other 
frameworks that use Jena: after all, the Turtle parser tests can add triples 
to Jena or Sesame graphs.
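
To make the chunk-at-a-time idea concrete, here is a minimal sketch in Scala. 
It is not the code from the parsers linked in this thread and it does not use 
any Jena API: SimpleTriple, ChunkedLineParser and ChunkedDemo are hypothetical 
names, the input is assumed to be a simplified line-based format with one 
whitespace-separated triple per line, and the sink callback merely stands in 
for adding a triple to a graph or store.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical triple type, used only for this illustration.
final case class SimpleTriple(subject: String, predicate: String, obj: String)

// Hypothetical push-style parser: the caller feeds byte chunks as they
// arrive, so no thread ever blocks waiting on an InputStream.
final class ChunkedLineParser(sink: SimpleTriple => Unit) {
  // Bytes of the current, still incomplete line. Buffering bytes rather
  // than characters keeps multi-byte UTF-8 sequences intact across chunks.
  private val pending = new ByteArrayOutputStream()

  // Consume one chunk; every line completed by it is parsed and emitted.
  def feed(chunk: Array[Byte]): Unit =
    chunk.foreach { b =>
      if (b == '\n'.toByte) flushLine() else pending.write(b.toInt)
    }

  // Signal end of input; parses a final line that lacks a newline.
  def finish(): Unit = if (pending.size() > 0) flushLine()

  private def flushLine(): Unit = {
    val line = new String(pending.toByteArray, UTF_8).trim
    pending.reset()
    if (line.nonEmpty && !line.startsWith("#")) parseLine(line).foreach(sink)
  }

  // Simplified assumption: subject and predicate contain no whitespace,
  // and the rest of the line (minus the trailing dot) is the object.
  private def parseLine(line: String): Option[SimpleTriple] =
    line.stripSuffix(".").trim.split("\\s+", 3) match {
      case Array(s, p, o) => Some(SimpleTriple(s, p, o))
      case _              => None // malformed lines are skipped in this sketch
    }
}

object ChunkedDemo {
  // Feeds the chunks in order and collects whatever the parser emits.
  def parseChunks(chunks: Seq[String]): Vector[SimpleTriple] = {
    val received = Vector.newBuilder[SimpleTriple]
    val parser = new ChunkedLineParser(t => { received += t; () })
    chunks.foreach(c => parser.feed(c.getBytes(UTF_8)))
    parser.finish()
    received.result()
  }
}
```

A chunk boundary may fall anywhere, including in the middle of a term: 
feeding "<a> <b> <c> .\n<d> <e" emits the first triple immediately, and the 
second is emitted only once the next chunk completes its line. In a real 
setup the sink would hand each triple to the graph or store.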

But I think awareness of this problem should help guide your thinking when 
developing new parsers, or when deciding what is needed to work with linked 
data in an efficient way.

An API will probably emerge from doing this a few times.

Currently I have a simple blocking interface for the non-blocking parser:
   
https://github.com/betehess/pimp-my-rdf/blob/248c8a13567e589308d1b7999570a14d6b530b20/n3/src/main/scala/TurtleReader.scala

We all know this kind of API. I need to find out how people in the actors 
community do this, and see what kind of pattern they agree is good. If I find 
that out, I'll post it here. Perhaps that will lead to some ideas of what such 
a pattern looks like.

(The NTriples file moved. Here is the current snapshot link, which should be a 
permalink but won't necessarily be the most up-to-date version:
   
https://github.com/betehess/pimp-my-rdf/blob/248c8a13567e589308d1b7999570a14d6b530b20/n3/src/main/scala/NTriples.scala
 )

I'll keep you posted on further developments. I should try using these parsers 
in a real scenario soon, so I'll know before long how well this holds up.

                
> support for Non Blocking Parsers
> --------------------------------
>
>                 Key: JENA-203
>                 URL: https://issues.apache.org/jira/browse/JENA-203
>             Project: Apache Jena
>          Issue Type: Improvement
>            Reporter: Henry Story
>
> In a Linked Data environment servers have to fetch data off the web. The 
> speed at which such data is served can be very slow. So one wants to avoid 
> using up one thread for each connection (1 thread = 0.5 to 1MB 
> approximately). This is why Java NIO was developed and why servers such as 
> Netty are so popular, why http client libraries such as 
> https://github.com/sonatype/async-http-client are more and more numerous, 
> and why frameworks such as http://akka.io/ which support relatively 
> lightweight actors (500 bytes per actor) are growing more visible.
> Unless I am mistaken, the only way to parse some content is using methods 
> that take an InputStream, such as this:
>     val m = ModelFactory.createDefaultModel()
>     m.getReader(lang.jenaLang).read(m, in, base.toString)
> That read call blocks. Would it be possible to have an API which allows
> one to parse a document in chunks as they arrive from the input?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
