Re: Binary RDF

Paul Houle Thu, 19 Jun 2014 10:34:29 -0700

Cool!

On Thu, Jun 19, 2014 at 12:06 PM, Andy Seaborne <[email protected]> wrote:
> Lizard needs to do network transfer of RDF data.  Rather than just doing
> something specific to Lizard, I've started on a general binary RDF module
> using Apache Thrift.
>
> == RDF-Thrift
> Work in Progress :: https://github.com/afs/rdf-thrift/
>
> Discussion welcome.
>
>
> The current plan is to have three supported abstractions:
>
> 1. StreamRDF
> 2. SPARQL Result Sets
> 3. RDF patch (which is very like StreamRDF but with A and D markers).
>
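
To make the "A and D markers" concrete, here is a minimal sketch of applying add/delete rows to a set of triples. It is a simplification and an assumption on my part, not the RDF patch format or the RDF-Thrift code: each row is a plain string, "A" or "D", a space, then the triple text, and the class name PatchSketch is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: rows are simplified to "<marker> <triple text>".
final class PatchSketch {

    // Apply rows in order: 'A' adds the triple, 'D' deletes it.
    // The input graph is left untouched; a modified copy is returned.
    static Set<String> apply(Set<String> graph, List<String> rows) {
        Set<String> result = new LinkedHashSet<>(graph);
        for (String row : rows) {
            if (row.length() < 3 || row.charAt(1) != ' ') {
                throw new IllegalArgumentException("Malformed row: " + row);
            }
            char marker = row.charAt(0);
            String triple = row.substring(2);
            switch (marker) {
                case 'A':
                    result.add(triple);
                    break;
                case 'D':
                    result.remove(triple);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown marker: " + row);
            }
        }
        return result;
    }
}
```

Without the markers this is just a stream of triples, which is why the patch case sits so close to StreamRDF.
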
> A first pass for StreamRDF is done including some attempts to reduce object
> churn when crossing the abstraction boundaries. Abstraction is all very well but
> repeated conversion of data structures can slow things down.
>
> Using StreamRDF means that prefix compression can be done.
>
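
A rough sketch of why a stream makes prefix compression possible: prefix declarations arrive on the stream before the triples that use them, so the writer can abbreviate IRIs against the prefixes seen so far and the reader can expand them again. This is only an illustration of the idea, not the encoding in RDF.thrift. The class and method names are made up, and the abbreviated form is a plain string here rather than a binary structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the RDF-Thrift wire encoding.
final class PrefixCompressionSketch {

    // Prefix name -> namespace IRI, filled in as declarations arrive on the stream.
    private final Map<String, String> prefixes = new LinkedHashMap<>();

    // Record a prefix declaration seen on the stream.
    void prefix(String name, String namespace) {
        prefixes.put(name, namespace);
    }

    // Abbreviate to "name:local" using the longest matching namespace,
    // or fall back to the full IRI in angle brackets.
    String compress(String iri) {
        String bestName = null;
        String bestNs = "";
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String ns = e.getValue();
            if (iri.startsWith(ns) && ns.length() > bestNs.length()) {
                bestName = e.getKey();
                bestNs = ns;
            }
        }
        if (bestName == null) {
            return "<" + iri + ">";
        }
        return bestName + ":" + iri.substring(bestNs.length());
    }

    // Reverse of compress: rebuild the full IRI from either form.
    String expand(String term) {
        if (term.startsWith("<") && term.endsWith(">")) {
            return term.substring(1, term.length() - 1);
        }
        int colon = term.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not an abbreviated IRI: " + term);
        }
        String ns = prefixes.get(term.substring(0, colon));
        if (ns == null) {
            throw new IllegalArgumentException("Unknown prefix in: " + term);
        }
        return ns + term.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixCompressionSketch pc = new PrefixCompressionSketch();
        pc.prefix("foaf", "http://xmlns.com/foaf/0.1/");
        String shortForm = pc.compress("http://xmlns.com/foaf/0.1/name");
        System.out.println(shortForm + " -> " + pc.expand(shortForm));
    }
}
```

The saving comes from sending the namespace once and a short prefix name each time after that. Both ends need to see the same prefix declarations in the same order.
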
> See
>   https://github.com/afs/rdf-thrift/blob/master/RDF.thrift
> for the encoding at the moment for just RDF.
>
> == In Jena
>
> There are a number of places this might be useful:
>
> 1/ Fuseki and "application/sparql-results+thrift", "application/x-thrift"
>
> (oh dear, "application/x-thrift", "x-" is not encouraged any more due to the
> transition problem c.f. "application/x-www-form-urlencoded")
>
> 2/ Hadoop-RDF
>
> This is currently using N-Triple/N-Quads.  Rob - presumably this would be
> useful eventually.  AbstractNodeTupleWritable / AbstractNLineFileInputFormat
> look about right to me but that's from code-reading not code-doing.
>
> (I know you/Cray have some internal binary RDF)
>
> 3/ Data bags and spill to disk
>
> 4/ RDF patch
>
> 5/ TDB (v2 - it would be a disk change) could usefully use the RDF term
> encoding for the node table.
>
> 6/ Files.  Add to RIOT as a new syntax (a fairly direct access to
> StreamRDF+Thrift) which then helps TDB loading.
>
> 7/ Caching result sets of queries in Fuseki.
>
> In an ideal world, the Thrift format could be shared across toolkits. There
> is nothing Jena specific about the wire encoding.
>
> == Thrift vs Protocol Buffer(+netty)
>
> The Lizard prototype currently uses Protocol Buffer + netty.  Doing RDF
> Thrift has been a way to learn about Thrift.
>
> All the reviews and comparisons on the interweb seem to be borne out.
> There isn't a huge difference between the two.
>
> Thrift's initial entry costs are higher (documentation is still weak, the maven
> artifact does not have a maven compatible source artifact (!!!) so you have
> to mangle one yourself which isn't hard; there is the source but in a
> non-standard form).
>
> Thrift has its own networking; I'm unlikely to use the service (RPC) layer
> from Thrift in Lizard itself as it is not fully streaming but driving the
> next layer down directly is quite easy (as it is in PB+N).
>
> Protocol Buffers does not have a network layer, it's just the byte encoding,
> but Netty comes with built in protocol buffer handling (PB+N).  That works
> fine as well and I have gone back and found the equivalent functionality I
> have used in RDF Thrift.
>
> For binary RDF and its general use, Thrift's wider language coverage is a plus
> point.
>
>         Andy




-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   [email protected]
