Re: RDF Thrift for Jena

Andy Seaborne Mon, 01 Sep 2014 14:41:18 -0700

On 01/09/14 19:57, Stian Soiland-Reyes wrote:

Sounds proper enough :) with a binary format obviously one has to be very
careful about any changes, but I was more thinking of versioning of the API
of Apache Thrift that your module would use through dependencies.

Same applies to text forms. Their strength is that they are W3C
standards. If that is of paramount importance, then possibly "RDF 1.1
N-Quads" is the best choice because it is fixed.

If I was to use Jena 1.14.0 depending on Apache Thrift say 0.6.0, but
also depended on (something that depends on) a newer Apache Thrift
0.9.0, has that project committed itself to semantic versioning so
that this would still in theory work? E.g. not deleting or breaking
existing API signatures (adding is ok)

I'm not seeing that RDF Thrift is different from anything else in terms
of versioning. You always have the issue that dependency versions
might conflict. At some level, you have to judge the community - at
least with open source you have that option, as well as taking the
source and keeping what you need. Archive a record of the dependencies
you use! (if you trust Maven Central - Apache projects do not make
releases that depend on anything not in Maven Central - no dependencies
on obscure transient jar repos!)

Incremental versioning had better work as Apache Thrift depends on an
earlier version of org.apache.httpcomponents:httpcore (v 4.2.4) and Jena
currently uses 4.2.6.


We have:

<dependency>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift</artifactId>
  <version>0.9.1</version>
  <exclusions>
    <!-- Use whatever version Jena is using -->
    <exclusion>
       <groupId>org.apache.httpcomponents</groupId>
       <artifactId>httpcore</artifactId>
    </exclusion>
  </exclusions>
</dependency>

In theory it should not make anything fall over unless you tried to use the
Jena Thrift serialization.. but that depends on how it is wired in. In RIOT
the standard language serializers are hardcoded somewhere, right?

Wired in but not hard coded. They have never been hardcoded but it was
quite hard to rewire. Think of them as a standard library of things to
use if you want to.

Now RIOT has registries (for parsers, for writers, for stream writers,
then registries for SPARQL Result Sets readers and writers) which have a
set of languages included and set up, but you can remove one, replace one
or add one (RDF Thrift and JSON-LD were developed outside Jena and wired
in at run time until they moved into RIOT when stable). Or call any
code you like and put the outcome into a graph/dataset.
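
To illustrate the registry idea in general terms, here is a minimal
sketch in plain Java. It is not the RIOT API: the class
WriterRegistrySketch, the Writer interface and the language names used as
keys are all hypothetical, chosen only to show a set of pre-registered
defaults that can be removed, replaced or extended at run time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a language -> writer-factory registry (NOT the RIOT API).
public class WriterRegistrySketch {

    /** Hypothetical writer abstraction: turns some data into a serialized string. */
    public interface Writer {
        String write(String data);
    }

    // Language name -> factory that creates a writer for that language.
    private final Map<String, Supplier<Writer>> factories = new LinkedHashMap<>();

    /** Add a language, or replace the factory if the language is already present. */
    public void register(String lang, Supplier<Writer> factory) {
        factories.put(lang, factory);
    }

    /** Remove a language from the registry; no effect if it is absent. */
    public void unregister(String lang) {
        factories.remove(lang);
    }

    public boolean isRegistered(String lang) {
        return factories.containsKey(lang);
    }

    /** Create a writer for the language, failing clearly if none is registered. */
    public Writer create(String lang) {
        Supplier<Writer> factory = factories.get(lang);
        if (factory == null) {
            throw new IllegalArgumentException("No writer registered for " + lang);
        }
        return factory.get();
    }

    /** A registry with a "standard library" of defaults already wired in. */
    public static WriterRegistrySketch withDefaults() {
        WriterRegistrySketch registry = new WriterRegistrySketch();
        registry.register("N-Triples", () -> data -> "nt:" + data);
        registry.register("Turtle", () -> data -> "ttl:" + data);
        return registry;
    }

    public static void main(String[] args) {
        WriterRegistrySketch registry = withDefaults();
        // Wire in an externally developed format at run time.
        registry.register("RDF-Thrift", () -> data -> "thrift:" + data);
        System.out.println(registry.create("RDF-Thrift").write("example"));
    }
}
```

The point of the design is that the defaults are ordinary registrations,
so application code can swap them with the same calls the library uses.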


        Andy

On 1 Sep 2014 09:35, "Andy Seaborne" <a...@apache.org> wrote:

On 31/08/14 19:03, Stian Soiland-Reyes wrote:

How have you tested this for IRIs and international characters in
literals?
sorry, I am out travelling and have not checked the code yet.. :)


Yes.

Thrift encodes strings as UTF-8.

The wire form of an IRI is a tagged string:
http://afs.github.io/rdf-thrift/rdf-binary-thrift.html

struct RDF_IRI {
    1: required string iri
}
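
The UTF-8 point can be checked without Thrift at all. The following is a
small sketch using only the Java standard library: it encodes an IRI
containing non-ASCII characters to UTF-8 bytes and decodes it again. The
class name Utf8RoundTrip and the example IRI are made up for
illustration; this is not the RDF Thrift codec itself.

```java
import java.nio.charset.StandardCharsets;

// Sketch: round-trip an IRI with international characters through UTF-8,
// the encoding Thrift uses for its string type. Not the RDF Thrift codec.
public class Utf8RoundTrip {

    /** Encode a string to UTF-8 bytes. */
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Decode UTF-8 bytes back to a string. */
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical IRI: two accented Latin letters and three CJK characters.
        String iri = "http://example.org/r\u00E9sum\u00E9/\u65E5\u672C\u8A9E";
        byte[] wire = encode(iri);
        // Non-ASCII characters take more than one byte each in UTF-8.
        System.out.println(iri.length() + " chars, " + wire.length + " bytes");
        System.out.println("round trip ok: " + decode(wire).equals(iri));
    }
}
```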

The new dependency on Apache Thrift would be my main concern if this is
not in a separate module. How stable are Thrift APIs? E.g. do they follow
semantic versioning so that a Jena build will work with a newer Thrift
version (with same major)?


Stronger than that - Thrift cares a lot about wire/storage format
compatibility because of the large scale of deployments in which it's used.

A system wide, cross-language change of format simply isn't practical. It
would have to be a parallel evolution.

See their discussion of adding the union type - on the wire it's a struct
of one element (i.e. each element is 'optional') and union-ness is provided
by the encode/decode.  Old implementations that are not aware of union
still work.
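
A rough sketch of that idea in plain Java, under heavy simplification:
the "wire" here is just a map from field id to value, whereas real Thrift
protocols write typed fields. The class UnionAsStructSketch and its
methods are hypothetical and are not Thrift's generated code. It shows
one encoded value being read both by a struct-style reader, which treats
every field as optional, and by a union-aware reader, which additionally
checks that exactly one field is set.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "a union is a struct with exactly one field set".
// The wire form is simplified to a map of field id -> value.
public class UnionAsStructSketch {

    /** Encode a union value: only the chosen field is written. */
    public static Map<Integer, String> encodeUnion(int fieldId, String value) {
        Map<Integer, String> wire = new TreeMap<>();
        wire.put(fieldId, value);
        return wire;
    }

    /** Struct-style reader with no notion of union: absent fields stay null. */
    public static String[] decodeAsStruct(Map<Integer, String> wire, int fieldCount) {
        // Index 0 is unused; field ids in this sketch start at 1.
        String[] fields = new String[fieldCount + 1];
        for (Map.Entry<Integer, String> e : wire.entrySet()) {
            if (e.getKey() >= 1 && e.getKey() <= fieldCount) {
                fields[e.getKey()] = e.getValue();
            }
        }
        return fields;
    }

    /** Union-aware reader: same wire data, plus the "exactly one field" check. */
    public static Map.Entry<Integer, String> decodeAsUnion(Map<Integer, String> wire) {
        if (wire.size() != 1) {
            throw new IllegalStateException("union must have exactly one field set");
        }
        return wire.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        // Field ids are illustrative: 1 = IRI, 2 = blank node, 3 = literal.
        Map<Integer, String> wire = encodeUnion(2, "_:b0");
        String[] asStruct = decodeAsStruct(wire, 3);
        System.out.println("struct reader, field 2: " + asStruct[2]);
        System.out.println("union reader, field id: " + decodeAsUnion(wire).getKey());
    }
}
```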

What is open (but closing) is whether the RDF encoding is the right one.
Evidence from real use is always going to be valuable.

         Andy

  On 31 Aug 2014 15:37, "Andy Seaborne" <a...@apache.org> wrote:


  On 26/08/14 21:20, Andy Seaborne wrote:


  I've been working on a binary format for RDF and SPARQL result sets:


http://afs.github.io/rdf-thrift/

This is now ready to go if everyone is OK with that.

I'm flagging this up for passive consensus because it adds a new
dependency (for Apache Thrift).

And of course any questions or comments.

Summary, as an RDF syntax:

+ 3x faster to parse than N-Triples
+ same size as N-Triples, and same compression effects with gzip (8-10x
compression).
+ Not much additional work to add because Thrift does most of the work.

       Andy

Migration done (JENA-774).  Some cleaning up to do (putting classes in
more logical places mostly) but tests in and passing.

          Andy
