So I wrote a gist showing how one should be able to use the Jena parsers. It is here:
https://gist.github.com/1704255

But I get this exception:

ERROR (WebFetcher.scala:59) : org.xml.sax.SAXParseException; systemId: http://bblfish.net/people/henry/card.rdf; lineNumber: 134; columnNumber: 44; XML document structures must start and end within the same entity.
com.hp.hpl.jena.shared.JenaException: org.xml.sax.SAXParseException; systemId: http://bblfish.net/people/henry/card.rdf; lineNumber: 134; columnNumber: 44; XML document structures must start and end within the same entity.
    at com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler.fatalError(RDFDefaultErrorHandler.java:60)
    at com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:51)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:211)
    at com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:241)
    at o

This is as expected, since one cannot pass partial documents to the reader.

Henry

On 29 Jan 2012, at 23:52, Henry Story wrote:

>
> On 29 Jan 2012, at 23:28, Henry Story wrote:
>
>>
>> On 29 Jan 2012, at 23:04, Andy Seaborne wrote:
>>
>>> Hi Henry,
>>>
>>> On 29/01/12 21:40, Henry Story wrote:
>>>> [ I just opened a bug report for this, but it was suggested that a wider
>>>> discussion on how to do it would be useful on this list. ]
>>>
>>> The thread of interest is:
>>>
>>> http://www.mail-archive.com/[email protected]/msg02451.html
>>>
>>>> Unless I am mistaken, the only way to parse some content is using methods
>>>> that take an InputStream, such as this:
>>>>
>>>>   val m = ModelFactory.createDefaultModel()
>>>>   m.getReader(lang.jenaLang).read(m, in, base.toString)
>>>
>>> As already commented on the thread, passing the reader to an actor allows
>>> async reading. Readers are configurable - you can have anything you like.
>>> No reason why the RDFReader can't be using async NIO.
>>
>> Mhh, can I call at time t1
>>
>>   reader.read(model, inputStream, base);
>>
>> with an inputStream that only contains a chunk of the data?
>> And then call it again later with another chunk of the data, using a newly
>> filled input stream that contains the next segment of the data?
>>
>>   reader.read(model, inputStream2, base);
>>
>> The documentation says nothing about that, so I just assumed it does not
>> work...
>
> Well, I did look at the code (but perhaps not deeply enough, and only the
> released version of Jena). From that I got the feeling that one has to send
> one whole RDF document down an input stream at a time.
>
> If one cannot send chunks to the reader, then essentially the thread that
> calls the read(...) method above will block until the whole document is read
> in. Even if an actor calls that method, the actor will block the thread it is
> executing in until it is finished. So actors don't help (unless there is some
> magic I don't know about). Now if the server is serving the document really
> slowly, say at 56 baud, then one thread could be tied up even though it is
> doing very, very little work.
>
> If, on the other hand, I could send partial pieces of an XML document down
> different input streams at different times, then the NIO thread could call
> the reader every time it received some data. For example, in the code I was
> writing here using the async-http-client: https://gist.github.com/1701141
>
> The method I have now on lines 39-42
>
>   def onBodyPartReceived(bodyPart: HttpResponseBodyPart) = {
>     bodyPart.writeTo(out)
>     STATE.CONTINUE
>   }
>
> could be changed to
>
>   def onBodyPartReceived(bodyPart: HttpResponseBodyPart) = {
>     reader.read(model, new ByteArrayInputStream(bodyPart.getBodyPartBytes()), base)
>     STATE.CONTINUE
>   }
>
> and so the body parts would be consumed by the reader in chunks.
>
>>
>>>
>>> There is also RIOT - have you looked at passing the read request to a parser
>>> in an actor, then catching the Sink<Triple> interface for the return? That
>>> works in an actor style.
>>>
>>> The key question is what Jena can enable, so that possibilities can be
>>> built on top. I don't think Jena is a good level at which to pick one
>>> approach over another, as it is in danger of clashing with other choices in
>>> the application. Your use of Akka is a good example of one possible choice.
>>>
>>>> I did open issue-203 so that, when we agree on a solution, we can send in
>>>> some patches.
>>>
>>> Look forward to seeing this,
>>>
>>>     Andy
>>
>> Social Web Architect
>> http://bblfish.net/
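
Henry's conclusion, that each reader.read(...) call treats its input stream as one complete, independent document, can be checked without Jena at all. This is a minimal sketch assuming only the JDK's built-in SAX parser; Jena's RDF/XML reader (ARP) also sits on a SAX parser, as the ARPSaxErrorHandler frames in the trace suggest, but Jena itself is not used here. The names PartialXmlDemo and tryParse are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.SAXParseException
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper: each call parses one input stream as a complete,
// independent XML document, much as each reader.read(...) call would.
object PartialXmlDemo {
  /** Returns None if `xml` parses cleanly, or Some(message) if the JDK SAX
    * parser reports a fatal error (DefaultHandler rethrows fatal errors). */
  def tryParse(xml: String): Option[String] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    val in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))
    try {
      parser.parse(in, new DefaultHandler())
      None
    } catch {
      case e: SAXParseException => Some(e.getMessage)
    }
  }
}
```

Splitting a small RDF/XML document in half and passing each half to tryParse in a separate call should yield Some(...) for both halves, while the whole document yields None: the parser keeps no state between calls, so it cannot resume where the previous stream stopped. The truncated first half fails with a SAXParseException much like the one in the trace above.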

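The blocking cost described in the 23:52 message can also be made concrete. One common workaround, which is not a Jena feature, is to have the NIO callback write each body part into a java.io.PipedOutputStream while a single dedicated thread runs a blocking parser over the connected PipedInputStream. This sketch assumes only the JDK; BlockingParseBridge, onChunk, onComplete and elementCount are hypothetical names, and counting start tags stands in for a call such as reader.read(model, in, base).

```scala
import java.io.{PipedInputStream, PipedOutputStream}
import java.util.concurrent.{Callable, ExecutorService, Executors, Future}
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.Attributes
import org.xml.sax.helpers.DefaultHandler

// Hypothetical bridge: an NIO callback pushes body chunks into a pipe, while
// one dedicated thread runs a blocking parser over the whole document.
final class BlockingParseBridge(pipeSize: Int = 64 * 1024) {
  private val sink = new PipedOutputStream()
  private val source = new PipedInputStream(sink, pipeSize)
  private val worker: ExecutorService = Executors.newSingleThreadExecutor()

  // Counting start tags stands in for building a Jena Model. The worker
  // thread sits blocked inside parse() whenever the pipe is empty, which is
  // the per-document cost described in the 23:52 message.
  val elementCount: Future[Int] = worker.submit(new Callable[Int] {
    def call(): Int = {
      var count = 0
      val handler = new DefaultHandler {
        override def startElement(uri: String, localName: String,
                                  qName: String, atts: Attributes): Unit =
          count += 1
      }
      try SAXParserFactory.newInstance().newSAXParser().parse(source, handler)
      finally source.close()
      count
    }
  })

  /** Call from the I/O callback, e.g. once per HTTP body part. Blocks only
    * if the pipe buffer is full. */
  def onChunk(bytes: Array[Byte]): Unit = sink.write(bytes)

  /** Call when the response is complete so the parser sees end of stream. */
  def onComplete(): Unit = {
    sink.close()
    worker.shutdown()
  }
}
```

A caller would invoke onChunk from onBodyPartReceived and onComplete when the response finishes, then read elementCount.get(). This keeps the I/O thread free, but the worker thread is still parked in parse() for as long as the slow server takes to send the document, which is exactly the cost Henry describes; avoiding it would need a push-style parser that accepts bytes incrementally.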