Re: Pluggable Serializers

David Nadlinger Sun, 22 May 2011 10:52:18 -0700

Hey Bryan,

First, I'd like to thank you very much for your offer; I really appreciate any help from more experienced Thrift users or developers.

I thought a bit more about this issue, and while I agree that the current scheme makes it really hard to implement alternative protocols that depart from the flat, »context-free« nature of the default binary protocol, I'm not sure how pluggable serializers would be implemented under your proposal.

More specifically, I can't quite see how structs would actually be serialized after the change. Would you propose replacing the protocol interface with a project-specific, generated serializer interface that has a write method for every defined struct type, as in the following example?


---
struct Foo {
   int a;
   // No read/write method here.
}

struct Bar { … }

interface TSerializer {
   void writeFoo(Foo f);
   void writeBar(Bar b);
}

class TBinarySerializer implements TSerializer {
   this (TTransport t) { … }
   void writeFoo(Foo f) { … }
   void writeBar(Bar b) { … }
}

class TJsonSerializer implements TSerializer { … }
---

Having such a single global interface doesn't seem quite right to me (extensibility, etc.), even if it were generated, and indeed you wrote about serializer classes being generated for each struct. But how would you then connect serializers to protocols, and what would the protocol interface (i.e. TProtocol and friends) look like in the first place to allow for writing protocol-agnostic code? It appears to me that all possible »protocol styles« (i.e. serializer types) would have to be enumerated somewhere, because otherwise the write() methods would have no way to select the correct serializer, which doesn't seem like a great solution either.

To clarify what I mean, here is another example of how I think this approach could be implemented:


---
interface TProtocol {
   void writeStruct(TStructSerializerFactory s);
   …
}

class TBinaryProtocol implements TProtocol {
   void writeStruct(TStructSerializerFactory s) {
      s.getBinarySerializer().writeTo(this);
   }
   …
}

interface TBinarySerializer { void writeTo(TBinaryProtocol t); }
interface TJsonSerializer { void writeTo(TJsonProtocol t); }

interface TStructSerializerFactory {
   // Have to enumerate all possible protocol »styles« here.
   TBinarySerializer getBinarySerializer();
   TJsonSerializer getJsonSerializer();
   …
}

struct Foo {
   int a;
   void write(TProtocol t) {
      t.writeStruct(new FooSerializerFactory(this));
   }
}

class FooSerializerFactory implements TStructSerializerFactory {
   Foo f_;
   this(Foo f) {
      f_ = f;
   }
   TBinarySerializer getBinarySerializer() {
      return new FooBinarySerializer(f_);
   }
   // other factory methods
   …
}

class FooBinarySerializer implements TBinarySerializer {
   Foo f_;
   this(Foo f) {
      f_ = f;
   }
   void writeTo(TBinaryProtocol t) {
      // The code currently generated into Foo.write().
      …
   }
}
---

There are of course a few other possible ways to implement this, but I couldn't really come up with a design for connecting serializers and protocols that doesn't seem hackish or overly complex.

But isn't the real problem just that the current TProtocol interface makes it hard to implement protocols that have some kind of »scope« or »nesting«, as JSON does, because everything is »flattened« to a single layer, only for the structure to be painstakingly reconstructed from the write*Begin() and write*End() calls later?
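To make that bookkeeping concrete, here is a minimal Java sketch. The interface BeginEndProtocol and the class BeginEndJsonProtocol are hypothetical and far simpler than Thrift's real TProtocol (whose writeFieldBegin() takes a TField, not a name); the point is only that a JSON writer receiving separate Begin/End calls has to keep its own stack to know where commas belong in nested objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, heavily simplified protocol interface in the current Begin/End style.
interface BeginEndProtocol {
    void writeStructBegin(String name);
    void writeStructEnd();
    void writeFieldBegin(String name);
    void writeFieldEnd();
    void writeI32(int value);
}

// Toy JSON writer: every call arrives in isolation, so nesting state
// has to live in an explicit stack owned by the protocol object.
class BeginEndJsonProtocol implements BeginEndProtocol {
    private final StringBuilder out = new StringBuilder();
    // One entry per open object: true until its first field has been written.
    private final Deque<Boolean> firstField = new ArrayDeque<>();

    public void writeStructBegin(String name) {
        out.append('{');
        firstField.push(true);
    }

    public void writeStructEnd() {
        firstField.pop();
        out.append('}');
    }

    public void writeFieldBegin(String name) {
        if (!firstField.pop()) {
            out.append(','); // not the first field of the innermost object
        }
        firstField.push(false);
        out.append('"').append(name).append("\":");
    }

    public void writeFieldEnd() {
        // Nothing to close for this toy format.
    }

    public void writeI32(int value) {
        out.append(value);
    }

    String result() {
        return out.toString();
    }
}

class BeginEndDemo {
    // Writes Bar { foo: Foo { a: 1 }, b: 2 } as flat Begin/End calls,
    // the way generated write() methods drive a protocol today.
    static String writeExample() {
        BeginEndJsonProtocol p = new BeginEndJsonProtocol();
        p.writeStructBegin("Bar");
        p.writeFieldBegin("foo");
        p.writeStructBegin("Foo");
        p.writeFieldBegin("a");
        p.writeI32(1);
        p.writeFieldEnd();
        p.writeStructEnd();
        p.writeFieldEnd();
        p.writeFieldBegin("b");
        p.writeI32(2);
        p.writeFieldEnd();
        p.writeStructEnd();
        return p.result(); // {"foo":{"a":1},"b":2}
    }
}
```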

I think it would help quite a bit to simply replace each pair of *Begin() and *End() calls with a single function, e.g. writeStruct(), which takes a delegate/lambda (or whatever it is called in the respective language) for writing the children. A little piece of D-style pseudocode to illustrate what I mean:


---
interface TProtocol {
   void writeStruct(string name, void delegate() writeMembers);
   …
}

class TJsonProtocol implements TProtocol {
   void writeStruct(string name, void delegate() writeMembers) {
      // Do some setup work, open a new JSON object.
      …
      // Call the passed in delegate, which calls other write* functions
      // on this protocol instance to write out all the members.
      writeMembers();

      // Do some cleanup work, close the JSON object definition, being
      // able to access any data stored in local variables above.
   }

   …
}

struct Foo {
   int a;
   void write(TProtocol t) {
      t.writeStruct("Foo", {
         // Write all the members of Foo to t, just like we do now:
         t.writeField(1, …);
      } );
   }
}
---

This way, you don't need an excessive amount of bookkeeping to persist information about the structure across separate calls, because the structure simply maps onto recursive function calls, yet there is still a simple common interface for all protocols. I'll give it a try when implementing the protocols in D; let's see how it works out…
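A minimal Java sketch of this callback style, with Runnable standing in for a D delegate. ScopedProtocol, ScopedJsonProtocol and the writeField(name, callback) signature are hypothetical illustrations, not Thrift API. The explicit stack of the Begin/End style disappears: writeStruct() saves the enclosing object's state in a local variable, and the call stack keeps it alive while the nested members are written.

```java
// Hypothetical protocol interface: each nesting level is one call taking a callback.
interface ScopedProtocol {
    void writeStruct(String name, Runnable writeMembers);
    void writeField(String name, Runnable writeValue);
    void writeI32(int value);
}

class ScopedJsonProtocol implements ScopedProtocol {
    private final StringBuilder out = new StringBuilder();
    // Whether the innermost open object already contains a field.
    private boolean needComma = false;

    public void writeStruct(String name, Runnable writeMembers) {
        boolean saved = needComma; // enclosing object's state, kept in a local
        out.append('{');
        needComma = false;
        writeMembers.run();        // nested write* calls happen here
        out.append('}');
        needComma = saved;         // restore after the nested object is closed
    }

    public void writeField(String name, Runnable writeValue) {
        if (needComma) {
            out.append(',');
        }
        out.append('"').append(name).append("\":");
        writeValue.run();
        needComma = true;
    }

    public void writeI32(int value) {
        out.append(value);
    }

    String result() {
        return out.toString();
    }
}

class Foo {
    int a;
    Foo(int a) { this.a = a; }

    void write(ScopedProtocol p) {
        p.writeStruct("Foo", () -> p.writeField("a", () -> p.writeI32(a)));
    }
}

class Bar {
    Foo foo;
    int b;
    Bar(Foo foo, int b) { this.foo = foo; this.b = b; }

    void write(ScopedProtocol p) {
        p.writeStruct("Bar", () -> {
            p.writeField("foo", () -> foo.write(p)); // recursion mirrors nesting
            p.writeField("b", () -> p.writeI32(b));
        });
    }
}
```

Usage: `ScopedJsonProtocol p = new ScopedJsonProtocol(); new Bar(new Foo(1), 2).write(p);` leaves `{"foo":{"a":1},"b":2}` in `p.result()`. A binary protocol could implement the same interface by writing headers before `run()` and a stop marker after it.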


Thanks for reading through all this,
David


On 4/30/11 11:26 PM, Bryan Duxbury wrote:

Hey David -

I don't think it's been explored in great detail anywhere yet, but my idea
was that we'd introduce a layer of abstraction between struct and protocol
called serializer. This new object would basically take the guts of the
write() and read() methods and move them into a separate class, which the
compiler would generate for each struct.

The first draft of this would just be an exercise in refactoring, but once
the code was generated in a different class, we could extend the model to
generate different kinds of serializers that work better with different
protocols. For instance, I could imagine a "CompactSerializer" that meant we
didn't have to keep a stateful Protocol, or a JsonSerializer that just made
JSON without all the existing machinations.
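One possible reading of this idea, as a minimal Java sketch: the struct becomes plain data, and the compiler would emit one serializer class per struct and output format. StructSerializer, FooJsonSerializer and FooBinarySerializer are hypothetical names, and the binary layout (a big-endian int followed by DataOutputStream.writeUTF()) is purely illustrative, not Thrift's binary protocol.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical common interface that every generated serializer would implement.
interface StructSerializer<T> {
    byte[] serialize(T value);
}

// Plain data struct: no read()/write() methods any more.
class Foo {
    final int a;
    final String name;
    Foo(int a, String name) { this.a = a; this.name = name; }
}

// Generated JSON serializer: emits JSON directly, with no general-purpose
// protocol state machine in between.
class FooJsonSerializer implements StructSerializer<Foo> {
    public byte[] serialize(Foo f) {
        // Toy escaping only: assumes name contains no quotes or backslashes.
        String json = "{\"a\":" + f.a + ",\"name\":\"" + f.name + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }
}

// Generated binary serializer for the same struct, fields in declaration order.
class FooBinarySerializer implements StructSerializer<Foo> {
    public byte[] serialize(Foo f) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(f.a);    // 4 bytes, big-endian
            out.writeUTF(f.name); // 2-byte length prefix + modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Callers would choose a format by choosing the serializer instance, e.g. `StructSerializer<Foo> s = new FooJsonSerializer();`. Nothing in this sketch connects a serializer to a transport or to TProtocol; that wiring is the part of the design left open.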

I wish I had more to offer here, but I just haven't had the time to
experiment. If you're starting from scratch on a new language
implementation, I'd recommend just porting the Java library as directly as
you can manage. It's extremely mature and robust - and it has pretty decent
tests.

Let me know if you run into specific roadblocks. I'm always happy to help
new languages come on board!

-Bryan

On Fri, Apr 29, 2011 at 4:36 PM, David Nadlinger <[email protected]> wrote:

Hello list,

as this is my first post here, let me quickly introduce myself first: My
name is David Nadlinger, I'm a student from Austria, and I am going to work
on a Thrift-related project during this year's Google Summer of Code under
the umbrella of Digital Mars: a Thrift implementation for/in the D
programming language. [1]

While preparing my project proposal, I came across a JIRA entry which
discusses the idea of pluggable serializers [2], and as I will implement a
new language library during the course of the project, this obviously caught
my attention. As I am somewhat familiar with the way serialization is
currently implemented, I can see the limitations of the existing approach,
but are there any details on what exactly the design of the proposed new
solution would look like? Maybe there is some previous discussion on the
topic I missed while looking through the mailing list archives? Otherwise,
Bryan, would you mind quickly sketching how you envision the design?

As I am currently thinking about the library design for D, I would be
grateful for any feedback, also regarding any other lessons learned about
the current C++/Java library design.

Thanks a lot,
David


[1] http://klickverbot.at/code/gsoc/thrift/ (nothing of interest there
yet)

[2] https://issues.apache.org/jira/browse/THRIFT-769
