Thrift has 3 layers: a service model, an encoding layer and a transport layer that handles the bytes in/out. RDF Thrift does not use the service layer; I'm using that elsewhere (Lizard) and it is just fine - it's simpler than netty for a tightly bound system.

Java Thrift has only two dependencies:
  org.apache.httpcomponents:httpcore
  org.apache.commons:commons-lang3

and the httpcore part is for Thrift over HTTP (TServlet - thrift-encoded RPC over HTTP).

Avro is the system to look at if you want encoding schema evolution.

        Andy

On 01/09/14 23:25, Stian Soiland-Reyes wrote:
Thanks for your clarifications, don't worry I am now officially relieved! :)

I am sorry for being that "versioning guy" - I guess I've had too many
bad experiences trying to manage dependencies of dependencies of
dependencies over the years.. (even down to having our own class
loader mechanism...!)

I checked just now and see that Apache Thrift is in fact a long-running
project that, although still evolving, seems to do things the
right way.

If I understand it right (just clicking through the Thrift
documentation) it seems it would mainly be the code-generation step
from the Thrift IDLs that would be susceptible to change - which is
not very different from the situation with XSDs and JAXB-API,
and thus less of a concern for users of Jena which might themselves
also (indirectly) use a newer Thrift version.



On 1 September 2014 22:40, Andy Seaborne <a...@apache.org> wrote:
On 01/09/14 19:57, Stian Soiland-Reyes wrote:

Sounds proper enough :) with a binary format obviously one has to be very
careful about any changes, but I was more thinking of versioning of the API
of Apache Thrift that your module would use through dependencies.


Same applies to text forms. Their strength is that they are W3C standards.
If that is of paramount importance, then possibly "RDF 1.1 N-Quads" is the
best choice because it is fixed.


If I was to use Jena 1.14.0 depending on Apache Thrift say 0.6.0, but
also depended on (something that depends on) a newer Apache Thrift
0.9.0, has that project committed itself to semantic versioning so
that this would still in theory work? E.g. not deleting or breaking
existing API signatures (adding is ok)


I'm not seeing that RDF Thrift is different from anything else in terms of
versioning.  You always have the issue that dependency versions might
conflict.  At some level, you have to judge the community - at least with
open source you have that option, as well as taking the source and keeping
what you need.  Archive a record of the dependencies you use!  (if you trust
maven central - Apache projects do not make releases that depend on anything
not maven central - no dependencies on obscure transient jar repos!)

Incremental versioning had better work as Apache Thrift depends on an
earlier version of org.apache.httpcomponents:httpcore (v 4.2.4) and Jena
currently uses 4.2.6.

We have:

<dependency>
   <groupId>org.apache.thrift</groupId>
   <artifactId>libthrift</artifactId>
   <version>0.9.1</version>
   <exclusions>
     <!-- Use whatever version Jena is using -->
     <exclusion>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpcore</artifactId>
     </exclusion>
   </exclusions>
</dependency>


In theory it should not make anything fall over unless you tried to use the
Jena Thrift serialization.. but that depends on how it is wired in. In RIOT
the standard language serializers are hardcoded somewhere, right?


Wired in but not hard coded.  They have never been hardcoded but it was
quite hard to rewire.  Think of them as a standard library of things to use
if you want to.

Now RIOT has registries (for parsers, for writers, for stream writers, then
registries for SPARQL Result Sets readers and writers) which have a set of
languages included and set up but you can remove one, replace one or add one
(RDF Thrift and JSON-LD were developed outside Jena and wired in at run time
until they moved into RIOT when stable).  Or call any code you like and put
the outcome into a graph/dataset.
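
To make the "remove one, replace one or add one" idea concrete, here is a minimal sketch of a registry in plain Java. It is not the RIOT API: the class name `ParserRegistrySketch`, its methods, and the toy "parser" that just counts non-blank lines are all hypothetical, chosen only to show the pattern of wiring a language in at run time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical illustration of a registry; none of these names come from Jena.
public class ParserRegistrySketch {
    // Language name -> "parser". A parser here is just text -> number of statements.
    private final Map<String, Function<String, Integer>> parsers = new LinkedHashMap<>();

    // Add a language, or replace the parser already registered under that name.
    public void register(String lang, Function<String, Integer> parser) {
        parsers.put(lang, parser);
    }

    // Remove a language from the registry.
    public void remove(String lang) {
        parsers.remove(lang);
    }

    // Look up the parser for a language, if one is registered.
    public Optional<Function<String, Integer>> lookup(String lang) {
        return Optional.ofNullable(parsers.get(lang));
    }

    // Toy stand-in for a line-oriented parser: counts non-blank lines.
    public static int countLines(String text) {
        int n = 0;
        for (String line : text.split("\n")) {
            if (!line.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        registry.register("line-format", ParserRegistrySketch::countLines);  // set up by default
        registry.register("external-format", text -> text.length());         // wired in at run time
        registry.remove("line-format");                                      // or taken out again
        System.out.println(registry.lookup("line-format").isPresent());
        System.out.println(registry.lookup("external-format").isPresent());
    }
}
```

The point of the design is that the caller only ever goes through the lookup, so a format developed elsewhere can be registered without touching the code that uses it.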

         Andy



On 1 Sep 2014 09:35, "Andy Seaborne" <a...@apache.org> wrote:

On 31/08/14 19:03, Stian Soiland-Reyes wrote:

How have you tested this for IRIs and international characters in
literals?
sorry, I am out travelling and have not checked the code yet.. :)


Yes.

Thrift encodes strings as UTF-8.

The wire form of an IRI is a tagged string:
http://afs.github.io/rdf-thrift/rdf-binary-thrift.html

struct RDF_IRI {
1: required string iri
}
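
As a small illustration of why UTF-8 string encoding is enough for IRIs and literals with international characters, the sketch below round-trips such a string through UTF-8 bytes using only the Java standard library. It does not call Thrift; the class name `Utf8IriRoundTrip` and the example IRI are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch: UTF-8 encode/decode only, no Thrift involved.
public class Utf8IriRoundTrip {
    // Encode a string to UTF-8 bytes, as a string payload would be carried.
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back to a string.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical IRI with accented Latin and CJK characters (written as escapes).
        String iri = "http://example.org/r\u00e9sum\u00e9/\u65e5\u672c\u8a9e";
        byte[] wire = encode(iri);
        String back = decode(wire);
        System.out.println("round trip equal: " + iri.equals(back));
        System.out.println(iri.length() + " chars -> " + wire.length + " bytes");
    }
}
```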

The new dependency on Apache Thrift would be my main concern if this is not
in a separate module. How stable are Thrift APIs? E.g. do they follow
semantic versioning so that a Jena build will work with a newer Thrift
version (with same major)?


Stronger than that - Thrift cares a lot about wire/storage format
compatibility because of the large scale of deployments in which it's
used.

A system wide, cross-language change of format simply isn't practical. It
would have to be a parallel evolution.

See their discussion of adding the union type - on the wire it's a struct
of one element (i.e. each element is 'optional') and union-ness is provided
by the encode/decode.  Old implementations that are not aware of union
still work.
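
A rough sketch of that idea in plain Java: fields are written as tagged entries, a union is simply a struct with exactly one field present, and the "exactly one" rule lives in the decoder rather than in the bytes. The framing used here (2-byte field id, 4-byte length, UTF-8 payload, id 0 as a stop marker) is a simplification invented for this example and is not Thrift's actual binary protocol; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical framing; NOT the real Thrift wire format.
public class UnionOnWireSketch {
    static final int STOP = 0;  // field id 0 marks the end of the struct

    // Encode a union value: a struct that happens to carry exactly one field.
    public static byte[] encodeUnion(int fieldId, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + payload.length + 2);
        buf.putShort((short) fieldId).putInt(payload.length).put(payload).putShort((short) STOP);
        return buf.array();
    }

    // "Old" reader: knows nothing about unions, just reads a struct of optional fields.
    public static Map<Integer, String> decodeStruct(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        Map<Integer, String> fields = new LinkedHashMap<>();
        for (int id = buf.getShort(); id != STOP; id = buf.getShort()) {
            byte[] payload = new byte[buf.getInt()];
            buf.get(payload);
            fields.put(id, new String(payload, StandardCharsets.UTF_8));
        }
        return fields;
    }

    // "New" reader: same bytes, but enforces union-ness (exactly one field set).
    public static Map.Entry<Integer, String> decodeUnion(byte[] wire) {
        Map<Integer, String> fields = decodeStruct(wire);
        if (fields.size() != 1) {
            throw new IllegalStateException("union must have exactly one field, got " + fields.size());
        }
        return fields.entrySet().iterator().next();
    }

    public static void main(String[] args) {
        byte[] wire = encodeUnion(2, "http://example.org/a");
        System.out.println("as struct: " + decodeStruct(wire));
        System.out.println("as union:  " + decodeUnion(wire));
    }
}
```

Both readers accept the same bytes, which is the compatibility property being described: nothing on the wire had to change to introduce the new type.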

What is open (but closing) is whether the RDF encoding is the right one.
Evidence from real use is always going to be valuable.

          Andy

   On 31 Aug 2014 15:37, "Andy Seaborne" <a...@apache.org> wrote:


   On 26/08/14 21:20, Andy Seaborne wrote:


   I've been working on a binary format for RDF and SPARQL result sets:


http://afs.github.io/rdf-thrift/

This is now ready to go if everyone is OK with that.

I'm flagging this up for passive consensus because it adds a new
dependency (for Apache Thrift).

And of course any questions or comments.

Summary, as an RDF syntax:

+ x3 faster to parse than N-Triples
+ same size as N-Triples, and same compression effects with gzip (x8-10
compression).
+ Not much additional work to add because Thrift does most of the
work.

        Andy


Migration done (JENA-774).  Some cleaning up to do (putting classes in
more logical places mostly) but tests in and passing.

           Andy










