Barry,

As Andy has stated in his replies, no, we didn't have this functionality already, and he has now added it to trunk.
As far as your described use case goes, I would point out that this mode of operation will not be scalable unless you have appropriately partitioned the data. Parsing is inherently a blocking process, which is why the iterator model provided by RIOT already relies on having a producer and a consumer thread with a bounded, thread-safe queue between them: this stops the producer from filling memory with as much data as it can read before the consumer ever gets to start processing it. In the model you describe, you will need to parse the entirety of the data into memory before you can start consuming it, which risks OOM errors with larger datasets.

If your real target is Hadoop input formats, then you may want to instead take a look at Paolo Castagna's jena-grande repository on GitHub - https://github.com/castagna/jena-grande - which is a little out of date with respect to the latest Hadoop versions but demonstrates how to create input formats for RDF - https://github.com/castagna/jena-grande/tree/master/src/main/java/org/apache/jena/grande/mapreduce/io

Hope this helps,

Rob

From: "Coughlan, Barry" <[email protected]>
Reply-To: <[email protected]>
Date: Friday, 1 November 2013 09:15
To: "[email protected]" <[email protected]>
Subject: Single-threaded RIOT parsing of InputStream

> Hi all,
>
> According to the RIOT docs, iterating over triples/quads with piped streams
> requires separate threads for producer/consumer.
>
> For some applications this isn't practical. In my case I am running a Hadoop
> job on NTriple datasets, so I am parsing one triple at a time. The overhead
> and extra code complexity of kicking off a thread to parse each triple is too
> high, and this may be true for other use cases involving small datasets.
>
> I wrote some StreamRDF implementations which store the results in Java
> Collections, so that parsing can be run on a single thread. Attached is a
> patch with the implementations, tests and an example (I borrowed the term
> 'Collector' from Apache Lucene).
> But I now suspect that I've overlooked some simple existing API call to do
> this.
>
> Any feedback appreciated.
>
> Regards,
> Barry
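For anyone following along, the bounded producer/consumer model Rob describes can be sketched in plain `java.util.concurrent` terms. This is a simplified illustration of the idea, not the actual RIOT piped-iterator code: the class and method names below are hypothetical, and the "parser" just feeds strings rather than triples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration of the producer/consumer model: the parser (producer)
// pushes items into a bounded queue and blocks when the queue is full,
// so it can never race ahead of the consumer and fill the heap.
// Hypothetical names, not Jena APIs.
public class BoundedPipeDemo {
    private static final String POISON = "##EOF##"; // end-of-stream marker

    public static List<String> parseWithPipe(List<String> lines, int capacity)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        // Producer thread: "parses" the input one item at a time.
        Thread producer = new Thread(() -> {
            try {
                for (String line : lines) {
                    queue.put(line); // blocks while the queue is full
                }
                queue.put(POISON); // signal end of input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Consumer: the calling thread drains the queue as items arrive.
        List<String> consumed = new ArrayList<>();
        String item;
        while (!(item = queue.take()).equals(POISON)) {
            consumed.add(item);
        }
        producer.join();
        return consumed;
    }
}
```

The small, fixed `capacity` is the point: memory use is bounded regardless of input size, at the cost of needing the second thread.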

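By contrast, the single-threaded collector approach Barry describes looks broadly like the sketch below. The `Triple` and `StreamRDF` types here are simplified stand-ins for the real Jena classes, and the collector shown is an assumption about the shape of the patch, not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Single-threaded "collector" sketch: a StreamRDF-style sink that stores
// parsed triples in a Java collection instead of handing them across a
// queue to another thread. Simplified stand-in types, not Jena's own.
public class CollectorDemo {
    record Triple(String s, String p, String o) {}

    interface StreamRDF {
        void start();
        void triple(Triple t);
        void finish();
    }

    static class CollectorStreamTriples implements StreamRDF {
        private final List<Triple> collected = new ArrayList<>();
        public void start() { collected.clear(); }
        public void triple(Triple t) { collected.add(t); }
        public void finish() {}
        public List<Triple> getCollected() { return collected; }
    }

    // Drive the sink as a parser would, entirely on the calling thread.
    public static int collectCount(List<Triple> input) {
        CollectorStreamTriples sink = new CollectorStreamTriples();
        sink.start();
        for (Triple t : input) {
            sink.triple(t);
        }
        sink.finish();
        return sink.getCollected().size();
    }
}
```

The trade-off Rob points out falls out directly: everything the parser emits sits in `collected` at once, so this is only safe for inputs known to fit comfortably in memory.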