Hey Bryan,
First, I'd like to thank you very much for your offer; I really
appreciate any help from more experienced Thrift users or developers.
I have thought a bit more about this issue, and while I agree that the
current scheme makes it really hard to implement alternative protocols
that differ from the flat, »context-free« nature of the default binary
protocol, I'm not sure how pluggable serializers would be implemented
under your proposal.
More specifically, I can't quite see how structs would actually be
serialized after the change. Would you propose replacing the protocol
interface with a project-specific, generated serializer interface that
has a write method for every defined struct type, as in the following
example?
---
struct Foo {
    int a;
    // No read/write method here.
}
struct Bar { … }
interface TSerializer {
    void writeFoo(Foo f);
    void writeBar(Bar b);
}
class TBinarySerializer implements TSerializer {
    this(TTransport t) { … }
    void writeFoo(Foo f) { … }
    void writeBar(Bar b) { … }
}
class TJsonSerializer implements TSerializer { … }
---
Having such a single global interface doesn't seem quite right to me
(extensibility, etc.), even if it were generated, and indeed you wrote
about serializer classes being generated for each struct. But how would
you connect serializers to protocols then, or what would the protocol
interface (i.e. TProtocol and friends) look like in the first place to
allow for writing protocol-agnostic code? It appears to me that all
possible »protocol styles« (i.e. serializer types) would have to be
enumerated somewhere, because otherwise the write() methods would have
no way to select the correct serializer, which doesn't seem like a great
solution either.
To clarify what I mean, here is another example of how I think this
approach could be implemented:
---
interface TProtocol {
    void writeStruct(TStructSerializerFactory s);
    …
}
class TBinaryProtocol implements TProtocol {
    void writeStruct(TStructSerializerFactory s) {
        s.getBinarySerializer().writeTo(this);
    }
    …
}
interface TBinarySerializer { void writeTo(TBinaryProtocol t); }
interface TJsonSerializer { void writeTo(TJsonProtocol t); }
interface TStructSerializerFactory {
    // Have to enumerate all possible protocol »styles« here.
    TBinarySerializer getBinarySerializer();
    TJsonSerializer getJsonSerializer();
    …
}
struct Foo {
    int a;
    void write(TProtocol t) {
        t.writeStruct(new FooSerializerFactory(this));
    }
}
class FooSerializerFactory implements TStructSerializerFactory {
    Foo f_;
    this(Foo f) {
        f_ = f;
    }
    TBinarySerializer getBinarySerializer() {
        return new FooBinarySerializer(f_);
    }
    // other factory methods
    …
}
class FooBinarySerializer implements TBinarySerializer {
    Foo f_;
    this(Foo f) {
        f_ = f;
    }
    void writeTo(TBinaryProtocol t) {
        // The code currently generated into Foo.write().
        …
    }
}
---
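For a concrete feel of the factory design above, here is a compilable Java reduction of it. It is a sketch under stated assumptions: all type names are hypothetical, the per-protocol serializers are lambdas rather than separate classes like FooBinarySerializer, and each serializer writes only the bare field value instead of the full output currently generated into Foo.write().

```java
import java.io.ByteArrayOutputStream;

// Hypothetical names throughout; not the real Thrift API.
interface Protocol {
    void writeStruct(SerializerFactory s);
}

interface BinarySerializer { void writeTo(BinaryProtocol p); }
interface JsonSerializer { void writeTo(JsonProtocol p); }

// Every protocol "style" has to be enumerated here: adding a protocol
// means changing this interface and every generated factory.
interface SerializerFactory {
    BinarySerializer getBinarySerializer();
    JsonSerializer getJsonSerializer();
}

class BinaryProtocol implements Protocol {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public void writeStruct(SerializerFactory s) {
        // The protocol picks "its" serializer out of the factory.
        s.getBinarySerializer().writeTo(this);
    }

    void writeI32(int v) {
        // Big-endian (network byte order); write(int) keeps the low 8 bits.
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }
}

class JsonProtocol implements Protocol {
    final StringBuilder out = new StringBuilder();

    public void writeStruct(SerializerFactory s) {
        s.getJsonSerializer().writeTo(this);
    }
}

class Foo {
    int a;

    void write(Protocol p) {
        p.writeStruct(new FooSerializerFactory(this));
    }
}

// Hypothetical generated factory for Foo.
class FooSerializerFactory implements SerializerFactory {
    private final Foo f;

    FooSerializerFactory(Foo f) { this.f = f; }

    public BinarySerializer getBinarySerializer() {
        return p -> p.writeI32(f.a);
    }

    public JsonSerializer getJsonSerializer() {
        return p -> p.out.append("{\"a\":").append(f.a).append('}');
    }
}
```

Supporting one more protocol here would require a new method on `SerializerFactory` and therefore a change to every generated factory, which is exactly the enumeration problem.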
There are of course a few other possible ways to implement this, but I
couldn't really come up with a design to connect serializers and
protocols that doesn't seem hackish or overly complex.
But isn't the real problem just that the current TProtocol interface
makes it hard to implement protocols that have some kind of »scope« or
»nesting«, like JSON does, because everything is »flattened« into a
single layer, only for the structure to be painstakingly reconstructed
from the write*Begin() and write*End() calls later?
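To illustrate the bookkeeping a flat Begin/End interface forces on a nesting-aware writer, here is a minimal sketch. It is written in Java rather than D only so that it compiles as-is; `FlatJsonWriter` and its methods are hypothetical names loosely modeled on TProtocol, not the actual Thrift API, field IDs are used as JSON keys, and only i32 values are supported.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical flat, Begin/End-style JSON writer (not Thrift's TProtocol).
class FlatJsonWriter {
    private final StringBuilder out = new StringBuilder();
    // One entry per currently open object: true until its first field
    // has been written. This explicit stack is the "bookkeeping".
    private final Deque<Boolean> firstField = new ArrayDeque<>();

    void writeStructBegin(String name) {
        out.append('{');
        firstField.push(true);
    }

    void writeStructEnd() {
        firstField.pop();
        out.append('}');
    }

    void writeFieldBegin(short id) {
        // Only the stack tells us whether a separator is needed.
        if (!firstField.peek()) {
            out.append(',');
        }
        firstField.pop();
        firstField.push(false);
        out.append('"').append(id).append("\":");
    }

    void writeFieldEnd() {
        // Nothing to emit for JSON, but callers must still match the call.
    }

    void writeI32(int value) {
        out.append(value);
    }

    String result() {
        return out.toString();
    }
}
```

Nothing in such an interface prevents a caller from mismatching the Begin/End calls; the writer can only detect that at runtime, if at all.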
I think it would help quite a bit to just replace all the pairs of
*Begin() and *End() calls with a single function, e.g. writeStruct(),
which takes a delegate/lambda (or whatever it is called in the
respective language) for writing the children. A little piece of D-style
pseudocode to illustrate what I mean:
---
interface TProtocol {
    void writeStruct(string name, void delegate() writeMembers);
    …
}
class TJsonProtocol implements TProtocol {
    void writeStruct(string name, void delegate() writeMembers) {
        // Do some setup work, open a new JSON object.
        …
        // Call the passed in delegate, which calls other write* functions
        // on this protocol instance to write out all the members.
        writeMembers();
        // Do some cleanup work, close the JSON object definition, being
        // able to access any data stored in local variables above.
    }
    …
}
struct Foo {
    int a;
    void write(TProtocol t) {
        t.writeStruct("Foo", {
            // Write all the members of Foo to t, just like we do now:
            t.writeField(1, …);
        });
    }
}
---
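The delegate-based interface can be sketched in Java as well, with `Runnable` standing in for the D delegate. `ScopedProtocol`, `ScopedJsonProtocol`, and the `writeField(id, writeValue)` signature are hypothetical; a real protocol would also need field types, containers, and the other base types. The enclosing struct's state is saved in a local variable of `writeStruct()`, so the call stack does the bookkeeping that a flat Begin/End writer has to do by hand.

```java
// Hypothetical scoped protocol: nesting is expressed through callbacks
// instead of matching Begin/End pairs.
interface ScopedProtocol {
    void writeStruct(String name, Runnable writeMembers);
    void writeField(short id, Runnable writeValue);
    void writeI32(int value);
}

class ScopedJsonProtocol implements ScopedProtocol {
    private final StringBuilder out = new StringBuilder();
    // True until the innermost open struct has written its first field.
    private boolean firstField = true;

    @Override
    public void writeStruct(String name, Runnable writeMembers) {
        // Setup: remember the enclosing struct's state in a local variable.
        boolean saved = firstField;
        firstField = true;
        out.append('{');
        writeMembers.run();
        // Cleanup: close the object and restore the enclosing state.
        out.append('}');
        firstField = saved;
    }

    @Override
    public void writeField(short id, Runnable writeValue) {
        if (!firstField) {
            out.append(',');
        }
        firstField = false;
        out.append('"').append(id).append("\":");
        writeValue.run();
    }

    @Override
    public void writeI32(int value) {
        out.append(value);
    }

    String result() {
        return out.toString();
    }
}

// Hypothetical generated structs: write() mirrors the struct's shape.
class Foo {
    int a;
    Bar bar;

    void write(ScopedProtocol p) {
        p.writeStruct("Foo", () -> {
            p.writeField((short) 1, () -> p.writeI32(a));
            if (bar != null) {
                p.writeField((short) 2, () -> bar.write(p));
            }
        });
    }
}

class Bar {
    int b;

    void write(ScopedProtocol p) {
        p.writeStruct("Bar", () -> p.writeField((short) 1, () -> p.writeI32(b)));
    }
}
```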
This way, by simply mapping the structure onto recursive function calls,
you avoid an excessive amount of bookkeeping for persisting the
structural information across the different calls, while still having a
simple common interface for all protocols. I'll give it a try when
implementing the protocols in D; let's see how this works out…
Thanks for reading through all this,
David
On 4/30/11 11:26 PM, Bryan Duxbury wrote:
Hey David -
I don't think it's been explored in great detail anywhere yet, but my idea
was that we'd introduce a layer of abstraction between struct and protocol
called serializer. This new object would basically take the guts of the
write() and read() methods and move them into a separate class, which the
compiler would generate for each struct.
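A minimal Java sketch of that first refactoring step, assuming a simplified, hypothetical `Protocol` interface (field-end and field-stop calls are omitted): the struct becomes plain data, and a generated `FooSerializer` class holds the code that used to live in `Foo.write()`. `RecordingProtocol` is a test helper that just logs the calls it receives, to show that the same sequence of protocol calls comes out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for TProtocol.
interface Protocol {
    void writeStructBegin(String name);
    void writeFieldBegin(String name, short id);
    void writeI32(int value);
    void writeString(String value);
    void writeStructEnd();
}

// After the refactoring, the struct is plain data without write()/read().
class Foo {
    int a;
    String s;
}

// Hypothetical generated serializer: the former body of Foo.write().
class FooSerializer {
    void write(Foo f, Protocol p) {
        p.writeStructBegin("Foo");
        p.writeFieldBegin("a", (short) 1);
        p.writeI32(f.a);
        p.writeFieldBegin("s", (short) 2);
        p.writeString(f.s);
        p.writeStructEnd();
    }
}

// Test helper that records each protocol call as a string.
class RecordingProtocol implements Protocol {
    final List<String> calls = new ArrayList<>();

    public void writeStructBegin(String name) { calls.add("structBegin " + name); }
    public void writeFieldBegin(String name, short id) { calls.add("field " + id); }
    public void writeI32(int value) { calls.add("i32 " + value); }
    public void writeString(String value) { calls.add("string " + value); }
    public void writeStructEnd() { calls.add("structEnd"); }
}
```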
The first draft of this would just be an exercise in refactoring, but once
the code was generated in a different class, we could extend the model to
generate different kinds of serializers that work better with different
protocols. For instance, I could imagine a "CompactSerializer" that meant we
didn't have to keep a stateful Protocol, or a JsonSerializer that just made
JSON without all the existing machinations.
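As a rough illustration of such a format-specific serializer, here is a hypothetical generated `FooJsonSerializer` in Java that emits JSON directly, with no protocol state machine in between. Using field names as JSON keys is an assumption made for readability; this sketch does not reproduce any existing Thrift JSON encoding.

```java
// Plain data struct (hypothetical).
class Foo {
    int a;
    String s;
}

// Hypothetical generated JSON serializer for Foo. Because Foo's layout is
// known at generation time, the object is written in one pass.
class FooJsonSerializer {
    String write(Foo f) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"a\":").append(f.a);
        sb.append(",\"s\":").append(quote(f.s));
        return sb.append('}').toString();
    }

    // Minimal JSON string encoding: escapes quotes, backslashes and
    // control characters; a null string becomes JSON null.
    private static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```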
I wish I had more to offer here, but I just haven't had the time to
experiment. If you're starting from scratch on a new language
implementation, I'd recommend just porting the Java library as directly as
you can manage. It's extremely mature and robust - and it has pretty decent
tests.
Let me know if you run into specific roadblocks. I'm always happy to help
new languages come on board!
-Bryan
On Fri, Apr 29, 2011 at 4:36 PM, David Nadlinger wrote:
Hello list,
as this is my first post here, let me quickly introduce myself: My
name is David Nadlinger, I'm a student from Austria, and I am going to work
on a Thrift-related project during this year's Google Summer of Code under
the umbrella of Digital Mars: a Thrift implementation for/in the D
programming language. [1]
While preparing my project proposal, I came across a JIRA entry which
discusses the idea of pluggable serializers [2], and as I will implement a
new language library during the course of the project, this obviously caught
my attention. As I am somewhat familiar with the way serialization is
currently implemented, I can see the limitations of the existing approach,
but are there any details on what exactly the design of the proposed new
solution would look like? Maybe there is some previous discussion on the
topic I missed while looking through the mailing list archives? Otherwise,
Bryan, would you mind quickly sketching how you envision the design?
As I am currently thinking about the library design for D, I would be
grateful for any feedback, including any other lessons learned from the
current C++/Java library design.
Thanks a lot,
David
[1] http://klickverbot.at/code/gsoc/thrift/ (nothing of interest there yet)
[2] https://issues.apache.org/jira/browse/THRIFT-769