So I wrote a gist that shows how one should be able to use Jena parsers.
It is here:

   https://gist.github.com/1704255

But I get this exception:

ERROR (WebFetcher.scala:59) : org.xml.sax.SAXParseException; systemId: http://bblfish.net/people/henry/card.rdf; lineNumber: 134; columnNumber: 44; XML document structures must start and end within the same entity.
com.hp.hpl.jena.shared.JenaException: org.xml.sax.SAXParseException; systemId: http://bblfish.net/people/henry/card.rdf; lineNumber: 134; columnNumber: 44; XML document structures must start and end within the same entity.
        at com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler.fatalError(RDFDefaultErrorHandler.java:60)
        at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:51)
        at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:211)
        at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:241)
        at o

This is as expected, since one cannot pass partial documents to the reader.
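A minimal sketch of why this happens. It uses only the JDK's built-in SAX parser rather than Jena's ARP (an assumption made to keep it self-contained), and the object and method names are made up for illustration. Each call to parse treats its input stream as one complete XML document, so a document split into two chunks is rejected once per chunk, even though the whole document parses fine.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.SAXParseException
import org.xml.sax.helpers.DefaultHandler

object TruncatedXmlDemo {
  // Parses `xml` as one complete document with the JDK's default SAX parser.
  // Returns None on success, or Some(error message) if the parser rejects it.
  def tryParse(xml: String): Option[String] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    val in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))
    try {
      parser.parse(in, new DefaultHandler())
      None
    } catch {
      case e: SAXParseException => Some(e.getMessage)
    }
  }

  def main(args: Array[String]): Unit = {
    val doc = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"></rdf:RDF>"
    val (chunk1, chunk2) = doc.splitAt(40)
    println(tryParse(doc))    // whole document: accepted
    println(tryParse(chunk1)) // first chunk on its own: fatal error
    println(tryParse(chunk2)) // second chunk on its own: also rejected
  }
}
```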

Henry


On 29 Jan 2012, at 23:52, Henry Story wrote:

> 
> On 29 Jan 2012, at 23:28, Henry Story wrote:
> 
>> 
>> On 29 Jan 2012, at 23:04, Andy Seaborne wrote:
>> 
>>> Hi Henry,
>>> 
>>> On 29/01/12 21:40, Henry Story wrote:
>>>> [ I just opened a bug report for this, but it was suggested that a wider
>>>> discussion on how to do it would be useful on this list. ]
>>> 
>>> The thread of interest is:
>>> 
>>> http://www.mail-archive.com/[email protected]/msg02451.html
>>> 
>>>> Unless I am mistaken, the only way to parse some content is using methods
>>>> that take an InputStream, such as this:
>>>> 
>>>>   val m = ModelFactory.createDefaultModel()
>>>>   m.getReader(lang.jenaLang).read(m, in, base.toString)
>>> 
>>> As already commented on the thread, passing the reader to an actor allows
>>> async reading. Readers are configurable: you can have anything you like.
>>> There is no reason why the RDFReader can't use async NIO.
>> 
>> Mhh, can I call at time t1
>> 
>>  reader.read( model, inputStream, base);
>> 
>> with an inputStream that only contains a chunk of the data? And then call it
>> again later with a newly filled input stream that contains the next segment
>> of the data?
>> 
>>  reader.read( model, inputStream2, base);
>> 
>> It says nothing about that in the documentation, so I just assumed it does 
>> not work...
> 
> Well, I did look at the code (though perhaps not deeply enough, and only the
> released version of Jena). From that I got the feeling that one has to send
> one whole RDF document down an input stream at a time.
> 
> If one cannot send chunks to the reader, then the thread that calls the
> read(...) method above will block until the whole document has been read in.
> Even if an actor calls that method, the actor will block the thread it is
> executing in until it is finished. So actors don't help (unless there is some
> magic I don't know about). If the server serving the document is serving it
> very slowly, say at 56 baud, then one thread could be tied up for a long time
> while producing very little work.
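A minimal sketch of the "hand the reader to an actor" approach and of the cost described above. It assumes plain Scala Futures on a dedicated pool instead of Akka, and uses a SAX element counter as a hypothetical stand-in for reader.read(model, in, base). The caller gets a Future back at once, but a pool thread stays blocked inside the parse until the stream reaches end-of-file, however slowly the bytes arrive.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.util.concurrent.{Executors, ThreadFactory}
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object OffloadedBlockingParse {
  // Dedicated daemon pool for blocking parses, kept apart from the threads
  // that an actor system or NIO client needs for other work.
  private val pool = Executors.newFixedThreadPool(2, new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "blocking-parse")
      t.setDaemon(true)
      t
    }
  })
  implicit val blockingEc: ExecutionContext = ExecutionContext.fromExecutor(pool)

  // Hypothetical stand-in for reader.read(model, in, base): counts elements.
  // Like any pull parser it blocks whenever `in` has no bytes available yet.
  def blockingParse(in: InputStream): Int = {
    var elements = 0
    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String,
                                qName: String, atts: Attributes): Unit =
        elements += 1
    }
    SAXParserFactory.newInstance().newSAXParser().parse(in, handler)
    elements
  }

  // Returns immediately to the caller, but one pool thread is occupied
  // for the whole duration of the parse.
  def parseAsync(in: InputStream): Future[Int] = Future(blockingParse(in))

  def main(args: Array[String]): Unit = {
    val in = new ByteArrayInputStream("<a><b/><c/></a>".getBytes("UTF-8"))
    println(Await.result(parseAsync(in), 5.seconds))
  }
}
```

Moving the call onto another pool keeps the caller responsive, but the number of documents that can be in flight is still bounded by the pool size.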
> 
> If, on the other hand, I could send partial pieces of an XML document down
> different input streams at different times, then the NIO thread could call
> the reader every time it received some data. For example, in the code I was
> writing here using the http-async-client: https://gist.github.com/1701141
> 
> The method I currently have, on lines 39-42:
> 
>  def onBodyPartReceived(bodyPart: HttpResponseBodyPart) = {
>    bodyPart.writeTo(out)
>    STATE.CONTINUE
>  }
> 
> 
>  could be changed to 
> 
>  def onBodyPartReceived(bodyPart: HttpResponseBodyPart) = {
>    // hand each chunk to the parser as it arrives, instead of buffering it
>    reader.read(model, new ByteArrayInputStream(bodyPart.getBodyPartBytes()), base)
>    STATE.CONTINUE
>  }
> 
>  and so the body would be consumed by the reader chunk by chunk.
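One way to get chunk-by-chunk feeding without a new Jena API is to put a pipe between the NIO callback and a parser that reads everything as one document on its own thread. The sketch assumes the JDK's SAX parser as a stand-in for Jena's reader and a hypothetical ChunkedParseBridge class: feed would be called from onBodyPartReceived, and finish from the completion callback. It avoids parsing each chunk as a separate document, but it still ties up one blocked thread per document, so it does not remove the thread cost discussed above.

```scala
import java.io.{PipedInputStream, PipedOutputStream}
import java.util.concurrent.atomic.AtomicInteger
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Hypothetical bridge: NIO callbacks push body parts in, and one background
// thread parses everything that arrives as a single XML document.
class ChunkedParseBridge(bufferSize: Int = 64 * 1024) {
  private val out = new PipedOutputStream()
  private val in = new PipedInputStream(out, bufferSize)
  private val elements = new AtomicInteger(0)
  @volatile private var failure: Option[Throwable] = None

  // SAX element counter standing in for reader.read(model, in, base).
  // This thread blocks on `in` between chunks: the thread cost is moved off
  // the NIO thread, not removed.
  private val parserThread = new Thread(new Runnable {
    def run(): Unit =
      try {
        val handler = new DefaultHandler {
          override def startElement(uri: String, localName: String,
                                    qName: String, atts: Attributes): Unit =
            elements.incrementAndGet()
        }
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler)
      } catch { case t: Throwable => failure = Some(t) }
  })
  parserThread.setDaemon(true)
  parserThread.start()

  // Call from onBodyPartReceived; blocks if the pipe buffer is full.
  def feed(chunk: Array[Byte]): Unit = out.write(chunk)

  // Call once the body is complete: signals end-of-document, waits for the
  // parser and rethrows any parse error.
  def finish(): Int = {
    out.close()
    parserThread.join()
    failure.foreach(t => throw t)
    elements.get()
  }
}

object ChunkedParseBridgeDemo {
  def main(args: Array[String]): Unit = {
    val doc = "<a><b/><c><d/></c></a>".getBytes("UTF-8")
    val bridge = new ChunkedParseBridge()
    bridge.feed(doc.take(5)) // first body part: "<a><b"
    bridge.feed(doc.drop(5)) // second body part: the rest
    println(bridge.finish())
  }
}
```

A truly non-blocking solution would need a push-style parser that accepts bytes and returns whatever events it can complete, which is the kind of capability the thread is asking Jena to expose.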
> 
>> 
>>> 
>>> There is also RIOT. Have you looked at passing the read request to a parser
>>> in an actor, then catching the Sink<Triple> interface for the return? That
>>> works in an actor style.
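A minimal sketch of that callback style: the parser pushes each triple into a sink as soon as it finds it, and the sink forwards it to a mailbox drained by a consumer, much as an actor's receive loop would. The Sink trait here only mimics the general shape of RIOT's Sink<T> (send, flush, close), and Triple, MailboxSink and fakeParse are hypothetical stand-ins; check the real interfaces in your Jena version before relying on them.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import scala.collection.mutable.ListBuffer

// Hypothetical triple type for the sketch; Jena has its own Triple class.
final case class Triple(s: String, p: String, o: String)

// Mimics the general shape of RIOT's Sink<T>; not Jena's actual interface.
trait Sink[T] {
  def send(item: T): Unit
  def flush(): Unit
  def close(): Unit
}

// Messages for the mailbox that stands in for an actor.
sealed trait Msg
final case class Item(t: Triple) extends Msg
case object Done extends Msg

// Forwards every triple to the mailbox, as `actor ! msg` would.
final class MailboxSink(mailbox: LinkedBlockingQueue[Msg]) extends Sink[Triple] {
  def send(item: Triple): Unit = mailbox.put(Item(item))
  def flush(): Unit = ()
  def close(): Unit = mailbox.put(Done)
}

object SinkDemo {
  // Stand-in for a parser that pushes triples as soon as it recognises them.
  def fakeParse(sink: Sink[Triple]): Unit = {
    sink.send(Triple("<#me>", "foaf:name", "\"Henry\""))
    sink.send(Triple("<#me>", "foaf:knows", "<#andy>"))
    sink.close()
  }

  // Consumer side: handles triples one at a time until Done arrives.
  def collect(mailbox: LinkedBlockingQueue[Msg]): List[Triple] = {
    val received = ListBuffer.empty[Triple]
    var done = false
    while (!done) {
      mailbox.poll(5, TimeUnit.SECONDS) match {
        case Item(t) => received += t
        case Done    => done = true
        case null    => throw new IllegalStateException("timed out waiting for the parser")
      }
    }
    received.toList
  }

  def run(): List[Triple] = {
    val mailbox = new LinkedBlockingQueue[Msg]()
    val parser = new Thread(new Runnable {
      def run(): Unit = fakeParse(new MailboxSink(mailbox))
    })
    parser.setDaemon(true)
    parser.start()
    collect(mailbox)
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

The consumer can act on each triple as it arrives, but the parsing thread itself still blocks on its input, so this addresses how results are delivered rather than how bytes are fed in.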
>>> 
>>> The key question is what Jena can enable, so that possibilities can be built
>>> on top. I don't think Jena is the right level to pick one approach over
>>> another, as it risks clashing with other choices made in the application.
>>> Your use of Akka is a good example of one possible choice.
>>> 
>>>> I did open issue-203 so that when we agree on a solution we could send in
>>>> some patches.
>>> 
>>> Look forward to seeing this,
>>> 
>>>     Andy
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
> 
> 

