Hi,

[ I just opened a bug report for this, but it was suggested that a wider discussion on how to do it would be useful on this list. ]
In a Linked Data environment servers have to fetch data off the web. The speed at which such data is served can be very slow, so one wants to avoid using up one thread for each connection (1 thread = 0.5 to 1 MB approximately). This is why Java NIO was developed and why servers such as Netty are so popular, why HTTP client libraries such as https://github.com/sonatype/async-http-client are more and more numerous, and why actor frameworks such as http://akka.io/, which support relatively lightweight actors (500 bytes per actor), are growing more visible.

Unless I am mistaken, the only way to parse some content is to use methods that take an InputStream, such as this:

    val m = ModelFactory.createDefaultModel()
    m.getReader(lang.jenaLang).read(m, in, base.toString)

That read call *blocks*: i.e. the thread that calls it will then spend all its time reading in the information, HOWEVER SLOWLY it is sent.

Would it be possible to have an API which allows one to parse a document in chunks as they arrive from the input?

Without that, each request for a remote resource ties up a minimum of 0.5-1 MB, plus the cost of swapping between threads (which is known to be very high). So if you fetch 500 remote resources, then before you even get started you use up to 500 MB, whilst you slow down your machine dramatically due to swapping.

Instead, with akka actors you would use 500 bytes * 500 = 250,000 bytes = 250 kB, i.e. about 1/4 MB, plus perhaps a few threads. With simple NIO you use the same or even less: one NIO thread can read as much input as it can handle, and you probably just need a few worker threads if the parsing is more work than the reading. So just like that we can save a lot of memory.

Having said that, what is the best way to do this?

An (ugly?) solution that would work is just to have a method

    reader.write(byteArray)

So instead of having the thread do the reading, this makes it possible for the IO layer to pass blocks of characters straight to the model as those blocks come along.
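To make the idea a bit more concrete, here is a minimal sketch of what such a push-style interface could look like. It is not Jena's API: `ChunkedLineReader`, `write`, `close` and the `onLine` callback are hypothetical names invented for illustration. It also does not parse RDF; it only turns byte chunks into complete lines as they arrive, which is roughly what a line-oriented syntax such as N-Triples would need before handing each line to a parser. The one subtle point it shows is that a chunk boundary can fall inside a multi-byte UTF-8 character, so undecoded bytes have to be carried over to the next call.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.{CharsetDecoder, CodingErrorAction, StandardCharsets}

// Hypothetical push-style reader, NOT part of Jena.
// The IO layer calls write() with each chunk as it arrives; no call blocks on IO.
final class ChunkedLineReader(onLine: String => Unit) {
  private val decoder: CharsetDecoder = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPLACE)
    .onUnmappableCharacter(CodingErrorAction.REPLACE)

  // Bytes of a UTF-8 sequence that was cut off at the end of the last chunk.
  private var pending: ByteBuffer = ByteBuffer.allocate(0)
  // Characters of the current, not yet terminated, line.
  private val line = new StringBuilder

  /** Feed one chunk; onLine fires once for every line completed by it. */
  def write(chunk: Array[Byte]): Unit = decode(chunk, endOfInput = false)

  /** Signal end of input; emits a final unterminated line, if any. */
  def close(): Unit = {
    decode(Array.emptyByteArray, endOfInput = true)
    if (line.nonEmpty) { onLine(line.toString); line.clear() }
  }

  private def decode(chunk: Array[Byte], endOfInput: Boolean): Unit = {
    // Prepend the leftover bytes from the previous call to the new chunk.
    val in = ByteBuffer.allocate(pending.remaining + chunk.length)
    in.put(pending).put(chunk).flip()

    // UTF-8 never decodes to more chars than it has bytes.
    val out = CharBuffer.allocate(in.remaining + 1)
    decoder.decode(in, out, endOfInput)
    if (endOfInput) decoder.flush(out)

    // Whatever the decoder could not consume yet is kept for the next chunk.
    pending = ByteBuffer.allocate(in.remaining).put(in)
    pending.flip()

    out.flip()
    while (out.hasRemaining) {
      val c = out.get()
      if (c == '\n') { onLine(line.toString); line.clear() }
      else line.append(c)
    }
  }
}
```

An NIO selector loop or an actor would own one such object per connection and call `write` whenever bytes become readable, then `close` at end of stream. The per-connection state is just the pending bytes and the partial line, not a thread stack. A real parser would of course need to keep more state than this for syntaxes that are not line-based.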
It would of course be better if the structure passed to such a method were immutable, and better still if it could use NIO byte buffers, as that reduces the need even to copy data, but I guess that the Jena parsers were not written with that in mind.

I opened JENA-203 so that once we agree on a solution we can send in some patches:

https://issues.apache.org/jira/browse/JENA-203

Henry

Social Web Architect
http://bblfish.net/
