Cool!

On Thu, Jun 19, 2014 at 12:06 PM, Andy Seaborne <[email protected]> wrote:
> Lizard needs to do network transfer of RDF data. Rather than just doing
> something specific to Lizard, I've started on a general binary RDF module
> using Apache Thrift.
>
> == RDF-Thrift
> Work in Progress :: https://github.com/afs/rdf-thrift/
>
> Discussion welcome.
>
>
> The current plan is to have three supported abstractions:
>
> 1. StreamRDF
> 2. SPARQL Result Sets
> 3. RDF patch (which is very like StreamRDF but with A and D markers).
>
> A first pass for StreamRDF is done, including some attempts to reduce object
> churn when crossing the abstraction boundaries. Abstraction is all very well,
> but repeated conversion of data structures can slow things down.
>
> Using StreamRDF means that prefix compression can be done.
>
> See
> https://github.com/afs/rdf-thrift/blob/master/RDF.thrift
> for the encoding at the moment for just RDF.
>
> == In Jena
>
> There are a number of places this might be useful:
>
> 1/ Fuseki and "application/sparql-results+thrift", "application/x-thrift"
>
> (oh dear, "application/x-thrift", "x-" is not encouraged any more due to the
> transition problem c.f. "application/x-www-form-urlencoded")
>
> 2/ Hadoop-RDF
>
> This is currently using N-Triples/N-Quads. Rob - presumably this would be
> useful eventually. AbstractNodeTupleWritable / AbstractNLineFileInputFormat
> look about right, but that's from code-reading, not code-doing.
>
> (I know you/Cray have some internal binary RDF)
>
> 3/ Data bags and spill to disk
>
> 4/ RDF patch
>
> 5/ TDB (v2 - it would be a disk change) could usefully use the RDF term
> encoding for the node table.
>
> 6/ Files. Add to RIOT as a new syntax (a fairly direct access to
> StreamRDF+Thrift) which then helps TDB loading.
>
> 7/ Caching query result sets in Fuseki.
>
> In an ideal world, the Thrift format could be shared across toolkits. There
> is nothing Jena specific about the wire encoding.
>
> == Thrift vs Protocol Buffer(+netty)
>
> The Lizard prototype currently uses Protocol Buffers + netty. Doing RDF
> Thrift was a way to learn about Thrift.
>
> All the reviews and comparisons on the interweb seem to be borne out.
> There isn't a huge difference between the two.
>
> Thrift's initial entry costs are higher: the documentation is still weak,
> and the maven artifact does not have a maven-compatible source artifact
> (!!!), so you have to mangle one yourself, which isn't hard; there is the
> source but in a non-standard form.
>
> Thrift has its own networking; I'm unlikely to use the service (RPC) layer
> from Thrift in Lizard itself as it is not fully streaming, but driving the
> next layer down directly is quite easy (as it is in PB+N).
>
> Protocol Buffers does not have a network layer, it's just the byte encoding,
> but Netty comes with built-in protocol buffer handling (PB+N). That works
> fine as well, and I have gone back and found the equivalent functionality I
> have used in RDF Thrift.
>
> For binary RDF and its general use, Thrift's wider language coverage is a
> plus point.
>
> Andy
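
On the prefix compression point: a stream can carry prefix declarations ahead of the triples that use them, so writer and reader can hold the same prefix map and send a short (prefix, local name) pair instead of a full IRI. Below is a minimal sketch of that idea in Python. It is not the encoding defined in RDF.thrift; the function names, the tuple representation and the example prefixes are all assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: an encoded IRI is either the full string
# or a (prefix, local name) pair. This is NOT the RDF-Thrift wire format.
EncodedIRI = Union[str, Tuple[str, str]]


def compress_iri(iri: str, prefixes: Dict[str, str]) -> EncodedIRI:
    """Abbreviate an IRI using the longest matching namespace, if any."""
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            # Prefer the longest namespace so the local name is shortest.
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return iri  # no prefix applies: send the IRI in full
    return (best, iri[len(prefixes[best]):])


def expand_iri(term: EncodedIRI, prefixes: Dict[str, str]) -> str:
    """Inverse of compress_iri, given the same prefix map."""
    if isinstance(term, str):
        return term
    prefix, local = term
    return prefixes[prefix] + local


# Example prefix map (illustrative values only).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
    "exv": "http://example.org/vocab/",
}
```

Calling compress_iri("http://example.org/vocab/name", PREFIXES) picks the longer "exv" namespace over "ex", and expand_iri reverses it as long as both ends saw the same declarations. An IRI with no matching namespace goes through unchanged.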
--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    [email protected]
