Cool - my first attempt at write speed testing suggested it was about the same as N-triples.

Write performance testing is harder (!!!) because you need a big enough source of data to run against without the source itself affecting the numbers.
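
For anyone who wants to reproduce the numbers, a rough harness along these lines keeps the source out of the measurement: parse the data into memory once, then time repeated writes to a sink that throws the bytes away.  Sketch only - the package names and Lang constants here are the current ones and may not match the code under discussion exactly.

import java.io.OutputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

public class WriteBench {
    // Discard all output so the sink doesn't affect the timing.
    static final OutputStream SINK = new OutputStream() {
        @Override public void write(int b) {}
        @Override public void write(byte[] b, int off, int len) {}
    };

    static long timeWriteMillis(Model model, Lang lang, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            RDFDataMgr.write(SINK, model, lang);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Parse once, up front, so parsing never appears in the write numbers.
        // (No warm-up here - good enough for orders of magnitude, not for a paper.)
        Model model = RDFDataMgr.loadModel(args[0]);
        System.out.println("N-Triples : " + timeWriteMillis(model, Lang.NTRIPLES, 5) + " ms");
        System.out.println("RDF Thrift: " + timeWriteMillis(model, Lang.RDFTHRIFT, 5) + " ms");
    }
}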

N-Triples writing has always been faster than reading - it's much closer to "push strings straight into the output", with no per-character mangling most of the time.
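
To make that concrete, here's a sketch (not the actual RIOT writer code) of what the fast path looks like for a literal's lexical form: scan for anything that needs escaping and, in the common case where there is nothing, write the whole string in one go.

import java.io.IOException;
import java.io.Writer;

public class LexicalOut {
    public static void writeLexical(Writer out, String lex) throws IOException {
        out.write('"');
        if (needsEscaping(lex)) {
            for (int i = 0; i < lex.length(); i++)
                writeEscaped(out, lex.charAt(i));   // slow path: per-character work
        } else {
            out.write(lex);                         // fast path: one bulk write
        }
        out.write('"');
    }

    private static boolean needsEscaping(String lex) {
        for (int i = 0; i < lex.length(); i++) {
            char c = lex.charAt(i);
            if (c == '"' || c == '\\' || c == '\n' || c == '\r' || c == '\t')
                return true;
        }
        return false;
    }

    private static void writeEscaped(Writer out, char c) throws IOException {
        switch (c) {
            case '"':  out.write("\\\""); break;
            case '\\': out.write("\\\\"); break;
            case '\n': out.write("\\n");  break;
            case '\r': out.write("\\r");  break;
            case '\t': out.write("\\t");  break;
            default:   out.write(c);
        }
    }
}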

From looking at the Thrift implementation, it has to do many small char->byte conversions.

It may be faster not to use Java's native converter (which involves a copy) but to write chars directly to the output stream using BlockUTF8.

When I last tested, BlockUTF8 was faster for strings of fewer than ~100 characters, but the JDK converter was faster for longer strings.
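
If anyone wants to re-check that crossover, something like this compares the two routes in isolation: String.getBytes, which allocates a fresh byte[] on every call, against a reused CharsetEncoder writing into a preallocated buffer (roughly the copy that a direct chars->bytes path such as BlockUTF8 avoids).  JDK classes only - this isn't the BlockUTF8 code, and a proper run would use JMH with warm-up.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Bench {
    static volatile long blackhole;   // stop the JIT discarding the work

    public static void main(String[] args) {
        for (int len : new int[] { 10, 50, 100, 500, 1000 }) {
            String s = "abcdefghij".repeat(len / 10);   // String.repeat is Java 11+
            System.out.printf("len=%5d  getBytes=%5d ms  encoder=%5d ms%n",
                              s.length(), viaGetBytes(s), viaEncoder(s));
        }
    }

    // JDK route: String.getBytes allocates and copies into a fresh byte[] per call.
    static long viaGetBytes(String s) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++)
            blackhole += s.getBytes(StandardCharsets.UTF_8).length;
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Direct route: reuse one encoder and one output buffer, no intermediate byte[].
    static long viaEncoder(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(s.length() * 4);
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            out.clear();
            enc.reset();
            enc.encode(CharBuffer.wrap(s), out, true);
            blackhole += out.position();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}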

        Andy

On 04/09/14 10:05, Rob Vesse wrote:
Thanks Andy,

I have started experimenting, more on that to follow

Rob

On 31/08/2014 15:36, "Andy Seaborne" <a...@apache.org> wrote:

On 26/08/14 21:20, Andy Seaborne wrote:
I've been working on a binary format for RDF and SPARQL result sets:

http://afs.github.io/rdf-thrift/

This is now ready to go if everyone is OK with that.

I'm flagging this up for passive consensus because it adds a new
dependency (for Apache Thrift).

And of course any questions or comments.

Summary, as an RDF syntax:

+ 3x faster to parse than N-Triples
+ same size as N-Triples, and the same compression effect with gzip
(a compression ratio of roughly 8-10).
+ Not much additional work to add because Thrift does most of the work.

      Andy

Migration done (JENA-774).  Some cleaning up to do (putting classes in
more logical places mostly) but tests in and passing.

        Andy





