[ https://issues.apache.org/jira/browse/THRIFT-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649013#comment-13649013 ]
Chet Murthy commented on THRIFT-1948:
-------------------------------------

[~c...@yeksigian.com] Your note brings up a ton of things, so I'll list a bunch of responses, trying to tie them back to your note below with little snippets of reference text.

(1) [Can you detail the poor experience that you have been having?]

[Example #1] Thrift's strength is that it provides a clear and simple wireline type system, along with efficient language-level bindings. So let's take an example: in C, "uuid_t" is an opaque type. We know that it's an "unsigned char[16]", but that's only b/c we can look inside the abstraction.

[problem a] When I want to transfer one via Thrift, I must -convert- it to-and-from a std::string.

[problem b] When I want to convert a std::string to a uuid_t, I might want standard checks to be applied. And I'd certainly like the marshaller to run the conversion during demarshalling -- imagine a list of structs with members that themselves are lists of structs with uuid_t members. In my application, I basically need to copy the entire tree of data to properly convert the strings into uuid_t objects.

[Example #2] I'm working with Leveldb. Leveldb is already written, and has (for instance) an "enum Code" of various values for a field in a Status return-object from many methods. I want to carry this across Thrift to clients. There's no way for me to get Thrift to generate code that will consume/produce values of this "enum Code" -- Thrift will generate its own types. This wouldn't be so bad if Thrift at least did as I suggest above -- allow custom type-converters, so that the "extra step of conversion" can be done -during- demarshalling (and hopefully, via the magic of inlining, for free).
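
To make the custom type-converter idea from Examples #1 and #2 concrete, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: `WireConverter`, `read_field`, the local `Uuid` alias (a stand-in for `uuid_t`) and the `thirdparty::Code` enum (a stand-in for a library enum such as Leveldb's) are hypothetical names, and nothing like this hook exists in Thrift's generated code. It only shows the shape such a hook might take if the generated reader called it per field.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Stand-in for libuuid's uuid_t (16 raw bytes), declared locally so the
// sketch compiles without the library.
using Uuid = std::array<unsigned char, 16>;

// Stand-in for a third-party enum such as Leveldb's status code.
// The enumerators here are illustrative, not copied from Leveldb.
namespace thirdparty {
enum Code { kOk, kNotFound, kCorruption };
}

// Hypothetical converter trait; NOT part of Thrift. A code generator could
// call from_wire()/to_wire() while reading or writing each field.
template <typename App>
struct WireConverter;

template <>
struct WireConverter<Uuid> {
    using Wire = std::string;  // assumed wire form: a 16-byte binary string

    static Uuid from_wire(const Wire& w) {
        // The "standard check" runs during demarshalling, not afterwards.
        if (w.size() != 16) throw std::invalid_argument("uuid must be 16 bytes");
        Uuid u;
        std::memcpy(u.data(), w.data(), u.size());
        return u;
    }
    static Wire to_wire(const Uuid& u) {
        return Wire(reinterpret_cast<const char*>(u.data()), u.size());
    }
};

template <>
struct WireConverter<thirdparty::Code> {
    using Wire = std::int32_t;  // assumed wire form: an i32

    static thirdparty::Code from_wire(Wire w) {
        switch (w) {
            case 0: return thirdparty::kOk;
            case 1: return thirdparty::kNotFound;
            case 2: return thirdparty::kCorruption;
        }
        throw std::out_of_range("unknown status code on the wire");
    }
    static Wire to_wire(thirdparty::Code c) {
        // Explicit switch with no default: a new enumerator upstream can
        // trigger a compiler warning here instead of a silent mismatch.
        switch (c) {
            case thirdparty::kOk: return 0;
            case thirdparty::kNotFound: return 1;
            case thirdparty::kCorruption: return 2;
        }
        throw std::out_of_range("unknown status code");
    }
};

// What a generated reader might do per field: convert in place, so the
// application never needs a second pass over the decoded tree.
template <typename App>
App read_field(const typename WireConverter<App>::Wire& w) {
    return WireConverter<App>::from_wire(w);
}

// Small self-check of the uuid round trip.
inline void check_round_trip() {
    const std::string raw(16, '\x01');
    assert(WireConverter<Uuid>::to_wire(read_field<Uuid>(raw)) == raw);
}
```

Because the converters are small static functions, a compiler is free to inline them into the reader, which is the "for free" hope expressed above; whether that happens in practice would need measuring.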
Of course, I could just arrange for the enum generated by Thrift to "line up" with the enum in Leveldb's code, but that's trusting that things don't change, and it kind of negates the point of using type-safe languages -- i.e., that if Leveldb changes in incompatible ways, the compiler will tell me so (or at least, an assert will fail).

[Example #3] I'm working with C++ and Ocaml source. I find that the Ocaml language-binding is heavy on O-O features, and hence is -very- "anti-intuitive". Not "unintuitive". "Anti-intuitive". On my list of things to do is to write a new ocaml codegen that will produce more-intuitive ocaml bindings.

[Summary of (1)] If effort is to be spent, I think it should be on improving the language-bindings, and specifically on reducing the number of cases where -further- de/marshalling has to happen in the application, by finding ways to insert the required code in the marshallers themselves. I have forgotten the technical terms, but C# has support for things like this. Again, this isn't about improving the set of wire-types, nor the set of allowed operations. Just about making the -mapping- from wire-types to language-types more -precise-.

> I think that if we are restricting Thrift to contain the types that have
> already been defined, we are severely limiting the future potential that
> Thrift has.

(2) [Thrift futures: "I don't think that the answer should be yet another protocol; it would be better to be a part of the language that is defined."]

I'm actually strongly against this. To my mind, DCE, CORBA, SOAP (need one go on?) -all- failed because of two things:
(a) inefficiency
(b) cumbersome language-binding

They didn't fail because they lacked fancy and interesting messaging models (e.g., CORBA had interesting streaming stuff, also lots of other complex stuff). Precisely put, I think that if some feature can be implemented as a library/framework -around- Thrift, without incurring nontrivial performance costs, then it -should- be.
Notice that my examples above were all of the flavor of "when the demarshaller's finished, then the app has to copy the entire data-structure to apply its further demarshalling steps". And the argument is -solely- a performance argument. In the problem-domains I'm attacking, I care about microseconds. So these issues matter a -lot-. And I think that if Thrift strays from "microseconds matter", -that- will be the reason it gets replaced. Look at Jeff Dean's LADIS 2009 keynote for arguments for this ....

(3) [the specific stream example in the proposal]

Looking at the specific stream example in the proposal, I am unconvinced. For instance, in the case of Chubby, one might think that Chubby (a lock-server) needs exactly what is proposed. There are several problems:

(a) A Chubby server process may crash (taking with it the TCP conns) and restart (or be replaced by another existing process). Semantically, the Chubby client does not see a failure -- it repeatedly reconnects and eventually succeeds. And the "stream" of updates from the server to the client proceeds uninterrupted (using persistent storage to keep track of what's been sent and acked -- and btw, that's the only way to do it, given that TCP is an *unreliable* messaging protocol). [And this isn't just in "my" Chubby implementation -- it's intrinsic to the original design in the Chubby paper. Also, I -stress- the unreliability of TCP for any -real- definition of reliability.] Of course, there are lots of cases where a TCP connection failure should mean that the stream transfer is aborted. But the specific case of a lock-server (at least, for the most widely-known and widely-cited example) is not such a case.

(b) Baking in one mechanism for managing "state" is bound to be problematic. Again from the lock-server example: Chubby servers are meant to have many thousands of clients; they store -all- their state in a persistent store. So the mechanism for looking up a client's state when (e.g.)
a resource is unlocked (to decide what update to send the client) is necessarily going to involve a database operation. At least, in the case of Chubby. Again, I can imagine simple cases where some simple state mechanism would suffice. But again, the specific example cited fails to convince, because (again) it doesn't cover Chubby.

(4) [can't it be done as a library on top of Thrift?]

Again, I question whether this can't be done on top of Thrift. Maybe with a few extra threads, sure. But keeping Thrift lean and simple is -very- valuable because, as I said before, microseconds matter. So does portability. I think it would be very useful if you could explain why some mechanism -above/around- Thrift isn't sufficient (since I can imagine at least one method that should be "no biggie"). I'm not proposing that every developer should reimplement that mechanism. It seems to me that one could easily imagine some C++ template classes, plus some standard idioms for writing the IDL, that would suffice for most cases.

(5) [Cassandra]

I assume you're talking about CQL? Several things to note:

(a) It isn't a general RPC protocol. So the CAS folks can take liberties.

(b) I think they're going to find that the task of providing high-quality language-bindings is a little tougher than they envision. I wish them luck.

(c) Their mention of 'stream ids' as a way of multiplexing multiple streams or requests over a single TCP conn .... reminds me of something. Have a look at BEEP/BXXP. They did a similar thing. It -looks- beautiful, until you realize that they're reproducing the logic and structure of TCP conns .... on top of a TCP conn. The Web has taught us that there's a solution for that: "open more TCP conns".

--> I cannot overemphasize how baroque it all looks, if you step back and realize that they could have just created more TCP conns and been -done- with it. Like browsers do! I mean, notice that HTTP/1.1 mandates that responses to pipelined requests return in -strict- request-order.
It's all designed to -preserve- the RPC nature of the connection, as much as possible.

--> And of course, you can get server->client RPCs in the manner that Chubby uses (by having a client->server RPC that waits for the server to want to send a message to the client).

--> Of course^2, there's no need for this to require more -threads-; all it takes is a more-efficient stub/skeleton, which again is independent of the RPC language and marshaller-compiler.

--> E.g., I'm currently hacking the ocaml codegen to expose the "send_procname/recv_procname" methods, so that, in combination with a slightly-modified Transport (and ServerSocket), I can get asynchronous clients. One could do similar things on the server.

(d) And finally, I don't buy the argument that pushing a message from server to client requires either more threads or some special wireline feature. Instead, the client could open two conns to the same server, and the -Processor- abstraction could arrange that one of them is set up with the client as the "Thrift client", and the other with the server as the "Thrift client". This could all be done outside the Thrift protocol and IDL.
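
As an illustration of the "client->server RPC that waits" pattern mentioned under (c), here is a minimal C++ sketch of the server-side piece. It is only a sketch under stated assumptions: `UpdateMailbox`, `publish` and `next_update` are hypothetical names, the mailbox is kept in memory for brevity (a Chubby-style server would keep this state in its persistent store), and no Thrift API is used. The idea is that the body of an ordinary RPC handler blocks until there is something to send, and returns empty-handed after a timeout so the client can simply call again.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical per-client mailbox held by the service handler.
// NOT a Thrift class; it sits above the generated code.
class UpdateMailbox {
public:
    // Called by server-side code when something the client watches changes.
    void publish(std::string update) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::move(update));
        }
        cv_.notify_one();
    }

    // Body of a hypothetical plain RPC, e.g. `string next_update()` in the IDL.
    // Blocks until an update is queued or the timeout expires.
    // Returns true and fills `out` on success, false on timeout.
    bool next_update(std::chrono::milliseconds timeout, std::string& out) {
        std::unique_lock<std::mutex> lock(mu_);
        if (!cv_.wait_for(lock, timeout, [this] { return !pending_.empty(); })) {
            return false;  // nothing to report; the client just polls again
        }
        out = std::move(pending_.front());
        pending_.pop_front();
        return true;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::string> pending_;
};

// Small self-check: updates come back in the order they were published.
inline void check_mailbox() {
    UpdateMailbox box;
    std::string out;
    box.publish("first");
    assert(box.next_update(std::chrono::milliseconds(1), out) && out == "first");
}
```

In this simple form each waiting call occupies one server thread for the duration of the wait. Avoiding that cost is a matter for the stub/skeleton and transport layer, which is the part argued above to be independent of the IDL and the wire protocol.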

> Add a stream type
> -----------------
>
> Key: THRIFT-1948
> URL: https://issues.apache.org/jira/browse/THRIFT-1948
> Project: Thrift
> Issue Type: New Feature
> Components: AS3 - Compiler, AS3 - Library, C glib - Compiler, C glib
> - Library, C# - Compiler, C# - Library, C++ - Compiler, C++ - Library, Cocoa
> - Compiler, Cocoa - Library, Compiler (General), D - Compiler, D - Library,
> Delphi - Compiler, Delphi - Library, Erlang - Compiler, Erlang - Library, Go
> - Compiler, Go - Library, Haskell - Compiler, Haskell - Library, Java -
> Compiler, Java - Library, JavaME - Compiler, JavaME - Library, JavaScript -
> Compiler, JavaScript - Library, Node.js - Compiler, Node.js - Library, OCaml
> - Compiler, OCaml - Library, Perl - Compiler, Perl - Library, PHP - Compiler,
> PHP - Library, Python - Compiler, Python - Library, Ruby - Compiler, Ruby -
> Library, Smalltalk - Compiler, Smalltalk - Library
> Reporter: Carl Yeksigian
> Assignee: Carl Yeksigian
>
> This is a proposal for an addition to the Thrift IDL, which allows for
> sending chunks of data between the server and the client without having the
> whole message in memory at the start of the communication.
> Here are two use cases where I have been thinking about the possibility of
> using streams.
> LockServer.thrift:
> {code}
> struct Update {
>   1: required string lock_handle,
>   2: required i64 owner
> }
> service LockService {
>   stream<Update> updates_for(1: string prefix)
> }
> {code}
> This would allow the LockServer to push out updates that happen based on the
> prefix the client has specified, rather than the constant polling that would
> currently be required to imitate this interface.
> ManyResults.thrift:
> {code}
> service QueryProvider {
>   stream<Result> run_query()
> }
> {code}
> This allows the query provider to run the query and send back the results as
> they come in, rather than having to bunch them up, or provide a way to page
> through the results to the client.
> The new keyword, "stream<T>", would indicate that there is a series of values
> typed T which would be communicated between client and server. Stream would
> have three primitives:
> {code}
> next(T)
> error(TException)
> end()
> {code}
> Protocols would be enhanced with the following methods:
> {code}
> writeStreamBegin(etype, streamid)
> writeStreamNext(streamid, streamMessageType)
> writeStreamNextEnd()
> writeStreamErrorEnd()
> etype, streamid = readStreamBegin()
> streamid, streamMessageType = readStreamNext()
> readStreamNextEnd()
> readStreamErrorEnd()
> {code}
> streamMessageType is one of the following:
> # next
> This means that the message will be of the element type.
> # error
> An exception was thrown during materialization of the stream.
> The stream is now closed.
> # end
> This means that the stream is finished.
> The stream is now closed.
> Once all streams are closed, readMessageEnd should be called. Before the
> first writeStreamNext() could be called, the message should otherwise be
> complete. Otherwise, an exception should be raised.
> It is possible that an exception will be thrown while the stream is being
> materialized; however, this can only occur inside of a service. In this case,
> error() will be called; the exception should be one of the exceptions that
> the service call would have thrown. The values that were generated before the
> exception will generally be valid, but may only have meaning if the stream is
> ended. All streams which are currently open may get the same exception.
> If the following service was defined:
> {code}
> stream<i64> random_numbers(stream<i64> max)
> {code}
> A sample session from client to server would be:
> {code}
> writeMessageBegin()
> writeStreamBegin(I64, 0)
> writeStreamNext(0, next)
> writeI64(10)
> writeStreamNextEnd()
> writeStreamNext(0, end)
> writeMessageEnd()
> {code}
> A sample session from server to client would be:
> {code}
> writeMessageBegin()
> writeStreamBegin(i64, 0)
> writeStreamNext(0, next)
> writeI64(3)
> writeStreamNextEnd()
> writeStreamNext(0, end)
> writeMessageEnd()
> {code}
> This change would not be compatible with previous versions of Thrift. Also,
> for languages which do not support this type of streaming, it could be
> translated into a list.