[jira] [Commented] (THRIFT-1948) Add a stream type

Chet Murthy (JIRA) Fri, 03 May 2013 22:08:20 -0700

    [ 
https://issues.apache.org/jira/browse/THRIFT-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649013#comment-13649013
 ]


Chet Murthy commented on THRIFT-1948:
-------------------------------------


[~c...@yeksigian.com]

Your note brings up a ton of things, so I'll list a bunch of
responses, trying to tie them back to your note below with little
snippets of reference text.

(1) [Can you detail the poor experience that you have been having?]

[Example #1]

Thrift's strength is that it provides a clear and simple wireline type
system, along with efficient language-level bindings.  So let's take
an example:

In C, "uuid_t" is an opaque type.  We know that it's an "unsigned
char[16]", but that's only b/c we can look inside the abstraction.

[problem a] When I want to transfer one via Thrift, I must -convert-
it to-and-from a std::string.

[problem b] When I want to convert a std::string to a uuid_t, I might
want standard checks to be applied.  And I'd certainly like for
the marshaller to run the conversion during demarshalling -- imagine a
list of structs with members that themselves are lists of structs with
uuid_t members.  In my application, I need to basically copy the
entire tree of data to properly convert the strings into uuid_t
objects.
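
To make [problem b] concrete, here is a minimal C++ sketch of the kind
of converter one might want the demarshaller to call on each field as
it is read, instead of the application re-walking the tree afterwards.
Everything in it is an assumption for illustration: the hook is
hypothetical and not an existing Thrift API, "Uuid" is a std::array
stand-in for libuuid's uuid_t, and the names uuid_from_wire and
uuid_to_wire are made up.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Stand-in for libuuid's uuid_t (unsigned char[16]); std::array keeps
// the sketch free of any dependency on the real header.
using Uuid = std::array<unsigned char, 16>;

// Hypothetical hook that a generated reader could call on the raw wire
// bytes of a field.  Applies the length check and fills `out`;
// returns false (leaving `out` untouched) on malformed input.
inline bool uuid_from_wire(const std::string& bytes, Uuid& out) {
    if (bytes.size() != out.size()) {
        return false;
    }
    std::copy(bytes.begin(), bytes.end(), out.begin());
    return true;
}

// Inverse hook for the writer side: 16 raw bytes, no text encoding.
inline std::string uuid_to_wire(const Uuid& id) {
    return std::string(id.begin(), id.end());
}
```

If the generated reader invoked something like uuid_from_wire per
field, the nested list-of-structs case would need no second pass in
the application.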

[Example #2]

I'm working with Leveldb.  Leveldb is already-written, and has (for
instance) an "enum Code" of various values for a field in a Status
return-object from many methods.  I want to carry this across Thrift
to clients.  There's no way for me to get Thrift to generate code that
will consume/produce values of this "enum Code" -- thrift will
generate its own types.  This wouldn't be so bad, if thrift at least
did as I suggest above -- allow custom type-converters, so that at
least the "extra step of conversion" can be done -during-
demarshalling (and hopefully, via the magic of inlining, for free).

Of course, I could just arrange for the enum generated by Thrift to
"line up" with the enum in Leveldb's code, but that's trusting that
things don't change, and kind of negates the point of using type-safe
languages -- i.e., so that if Leveldb changes in incompatible ways,
the compiler will tell me so (or at least, an assert will fail).
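
A sketch of the explicit, checked mapping this example asks for, as
opposed to relying on the two enums lining up numerically.  The two
enums below are stand-ins written for this sketch (the real LevelDB
headers and real Thrift-generated code are not used), and to_wire and
from_wire are hypothetical converter names.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for the library's enum.  The real LevelDB headers are not
// used here; the enumerators are illustrative only.
namespace lib {
enum Code : int { kOk = 0, kNotFound = 1, kCorruption = 2 };
}

// Stand-in for the enum a Thrift compiler might generate from the IDL.
namespace wire {
enum Code : int { OK = 0, NOT_FOUND = 1, CORRUPTION = 2 };
}

// Explicit case-by-case mapping, so nothing depends on the numeric
// values lining up.  Compilers that warn on unhandled enumerators
// (e.g. -Wswitch) will flag a newly added lib::Code value here.
inline wire::Code to_wire(lib::Code c) {
    switch (c) {
        case lib::kOk:         return wire::OK;
        case lib::kNotFound:   return wire::NOT_FOUND;
        case lib::kCorruption: return wire::CORRUPTION;
    }
    throw std::invalid_argument("unmapped lib::Code");
}

// Reverse direction, for the demarshalling side.
inline lib::Code from_wire(wire::Code c) {
    switch (c) {
        case wire::OK:         return lib::kOk;
        case wire::NOT_FOUND:  return lib::kNotFound;
        case wire::CORRUPTION: return lib::kCorruption;
    }
    throw std::invalid_argument("unmapped wire::Code");
}
```

The cost is one switch per value, which an optimizer can usually
inline; the benefit is that an incompatible change on either side
shows up as a warning or an exception rather than a silent
misinterpretation.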

[Example #3]

I'm working with C++ and Ocaml source.  I find that the Ocaml
language-binding is heavy on O-O features, and hence is -very-
"anti-intuitive".  Not "unintuitive".  "anti-intuitive".  On my list
of things to do is to write a new ocaml codegen that will produce
more-intuitive ocaml bindings.

[Summary of (1)] If effort is to be spent, I think it should be on
improving the language-bindings, and specifically reducing the number
of cases where -further- de/marshalling has to happen in the
application, by finding ways to insert the required code in the
marshallers themselves.

I have forgotten the technical terms, but C# has support for things
like this.  Again, this isn't about improving the set of wire-types,
nor the set of allowed operations.  Just about making the -mapping-
from wire-types to language-types more -precise-.

> I think that if we are restricting Thrift to contain the types that have
> already been defined, we are severely limiting the future potential that
> Thrift has.

(2) [Thrift futures: "I don't think that the answer should be yet
another protocol; it would be better to be a part of the language that
is defined."]

I'm actually strongly against this.  To my mind, DCE, CORBA, SOAP
(need one go on?) -all- failed because of two things:

  (a) inefficiency

  (b) cumbersome language-binding

They didn't fail because they lacked fancy and interesting messaging
models (e.g., CORBA had interesting streaming stuff, also lots of
other complex stuff).

Precisely put, I think that if some feature can be implemented as a
library/framework -around- Thrift, without incurring nontrivial
performance costs, then it -should- be.  Notice that my examples above
were all of the flavor of "when the demarshaller's finished, then the
app has to copy the entire data-structure to apply its further
demarshalling steps".  And the argument is -solely- a performance
argument.

In the problem-domains I'm attacking, I care about microseconds.  So
these issues matter a -lot-.  And I think that if Thrift strays from
"microseconds matter", -that- will be the reason it gets replaced.
Look at Jeff Dean's LADIS 2009 keynote for arguments for this ....

(3) [the specific stream example in the proposal]

Looking at the specific stream example in the proposal, I am
unconvinced.  For instance, in the case of Chubby, one might think
that Chubby (a lock-server) needs exactly what is proposed.  There are
several problems:

  (a) A Chubby server process may crash (taking with it the TCP conns)
  and restart (or be replaced by another existing process).
  Semantically, the Chubby client does not see a failure -- it
  repeatedly reconnects and eventually succeeds.  And the "stream" of
  updates from the server to the client proceeds uninterrupted (using
  persistent storage to keep track of what's been sent and acked --
  and btw, that's the only way to do it, given that TCP is an
  *unreliable* messaging protocol).

  [And this isn't just in "my" Chubby implementation -- it's intrinsic
  to the original design in the Chubby paper.  Also, I -stress- the
  unreliability of TCP for any -real- definition of reliability.]

  Of course, there are lots of cases where a TCP connection failure
  should mean that the stream transfer is aborted.  But the specific
  case of a lock-server (at least, for the most widely-known and
  widely-cited example) is not such a case.

  (b) Baking in one mechanism for managing "state" is bound to be
  problematic.  Again from the lock-server example: Chubby servers are
  meant to have many thousands of clients; they store -all- their
  state in a persistent store.  So the mechanism for looking up a
  client's state when (e.g.) a resource is unlocked (to decide what
  update to send the client) is necessarily going to involve a
  database operation.  At least, in the case of Chubby.

  Again, I can imagine simple cases where some simple state mechanism
  would suffice.  But again, the specific example cited fails to
  convince, because (again) it doesn't cover Chubby.
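
The bookkeeping described under (a), tracking what has been sent and
acked so that a reconnecting client simply resumes, might look roughly
like the C++ sketch below.  It is only an illustration: UpdateLog is a
hypothetical name, it covers a single client, and the in-memory vector
stands in for the persistent store a real lock-server would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One client's update log.  In-memory stand-in for persistent storage;
// a real server would keep both the log and the ack position on disk.
class UpdateLog {
public:
    // Appends an update and returns its sequence number (starting at 1).
    std::uint64_t append(const std::string& update) {
        entries_.push_back(update);
        return entries_.size();
    }

    // Records that the client has processed everything up to `seq`.
    // Stale or out-of-range acks are ignored.
    void ack(std::uint64_t seq) {
        if (seq > acked_ && seq <= entries_.size()) {
            acked_ = seq;
        }
    }

    // Everything not yet acknowledged, oldest first.  The server would
    // send this after every (re)connect, whatever happened to the
    // previous TCP connection.
    std::vector<std::string> unacked() const {
        return std::vector<std::string>(
            entries_.begin() + static_cast<std::ptrdiff_t>(acked_),
            entries_.end());
    }

private:
    std::vector<std::string> entries_;
    std::uint64_t acked_ = 0;
};
```

Nothing here depends on the lifetime of a connection, which is the
point: the "stream" survives a server restart because its position
lives in the store, not in the transport.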

(4) [can't it be done as a library on top of Thrift?]

Again, I question whether this can't be done on top of Thrift.  Maybe
with a few extra threads, sure.  But keeping Thrift lean and simple is
-very- valuable because, as I said before, microseconds matter.  So
does portability.

I think it would be very useful if you could explain why some
mechanism -above/around- Thrift isn't sufficient (since I can imagine
at least one method that should be "no biggie").  I'm not proposing
that every developer should reimplement that mechanism.  One could
easily imagine some C++ template classes, plus some standard idioms
for writing the IDL, that sufficed for most cases, it seems to me.
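
As one possible shape for such a template class, the C++ sketch below
turns an ordinary request/response call into a pull-style stream.  It
is an illustration under assumptions, not an existing Thrift facility:
BatchedStream is a made-up name, and the fetch callback stands in for
a generated client method from a hypothetical IDL idiom along the
lines of "list<Result> next_batch(1: i64 cursor, 2: i32 max)", where
an empty batch marks the end of the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Pull-style stream layered on a plain batch-fetching call.
template <typename T>
class BatchedStream {
public:
    // fetch(cursor, max_items) returns up to max_items elements
    // starting at position cursor; an empty result means "no more".
    using Fetch = std::function<std::vector<T>(std::size_t, std::size_t)>;

    BatchedStream(Fetch fetch, std::size_t batch_size)
        : fetch_(std::move(fetch)), batch_size_(batch_size) {}

    // Stores the next element in `out` and returns true, or returns
    // false once the remote side has reported the end of the data.
    bool next(T& out) {
        if (buffer_.empty() && !done_) {
            std::vector<T> batch = fetch_(cursor_, batch_size_);
            if (batch.empty()) {
                done_ = true;
            }
            cursor_ += batch.size();
            buffer_.insert(buffer_.end(), batch.begin(), batch.end());
        }
        if (buffer_.empty()) {
            return false;
        }
        out = buffer_.front();
        buffer_.pop_front();
        return true;
    }

private:
    Fetch fetch_;
    std::size_t batch_size_;
    std::size_t cursor_ = 0;
    bool done_ = false;
    std::deque<T> buffer_;
};
```

The caller sees one element at a time and never holds the whole result
set in memory; the batch size is the only knob trading round trips
against buffering.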

(5) [Cassandra]

I assume you're talking about CQL?  Several things to note:

  (a) It isn't a general RPC protocol, so the Cassandra folks can take
  liberties.

  (b) I think they're going to find that the task of providing
  high-quality language-bindings is a little tougher than they
  envision.  I wish them luck.

  (c) Their mention of 'stream ids' as a way of multiplexing multiple
  streams or requests over a single TCP conn .... reminds me of
  something.  Have a look at BEEP/BXXP.  They did a similar thing.  It
  -looks- beautiful, until you realize that they're reproducing the
  logic and structure of TCP conns .... on top of a TCP conn.  The Web
  has taught us that there's a solution for that: "open more TCP
  conns".

    --> I cannot overemphasize how baroque it all looks, if you step
    back and realize that they could have just created more TCP conns
    and been -done- with it.  Like browsers do!  I mean, notice that
    HTTP/1.1 mandates that responses to pipelined requests return in
    -strict- request-order.  It's all designed to -preserve- the RPC
    nature of the connection, as much as possible.

    --> and of course, you can get server->client RPCs in the manner
    that Chubby uses (by having a client->server RPC that waits for
    the server to want to send a message to the client).

    --> of course^2, there's no need for this to require more
    -threads-: it needs just a more-efficient stub/skeleton, which
    again is independent of the RPC language and marshaller-compiler.

    --> e.g., I'm currently hacking the ocaml codegen to expose the
    "send_procname/recv_procname" methods, so that, in combination
    with a slightly-modified Transport (and ServerSocket), I can get
    asynchronous clients.  One could do similar things on the server.

  (d) And finally, I don't buy the argument that pushing a message
  from server to client requires either more threads or some
  special wireline feature.

  Instead, the client could open two conns to the same server, and the
  -Processor- abstraction could arrange that one of them is set up
  with the client as the "Thrift client", and the other with the
  server as the "Thrift client".  This could all be done outside the
  Thrift protocol and IDL.
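
The waiting-RPC approach mentioned under (c) can be sketched without
any extra threads, assuming a server skeleton that lets a handler
defer its reply.  That deferral, the class name LongPollHub, and the
Reply callback type are all assumptions made for this illustration;
none of them is an existing Thrift API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Server-side state for one client using a "wait for update" RPC.
// Single-threaded: requests and pushes are assumed to arrive on the
// same event loop.
class LongPollHub {
public:
    // Completes a deferred RPC by sending `msg` back to the client.
    using Reply = std::function<void(const std::string&)>;

    // Called when the client's waiting RPC arrives.  Answers at once if
    // a message is already queued, otherwise parks the reply.
    void on_wait_request(Reply reply) {
        if (!queued_.empty()) {
            std::string msg = queued_.front();
            queued_.pop_front();
            reply(msg);
        } else {
            parked_.push_back(std::move(reply));
        }
    }

    // Called when the server wants to tell the client something.
    // Completes a parked request if there is one, otherwise queues the
    // message until the client asks again.
    void push(const std::string& msg) {
        if (!parked_.empty()) {
            Reply reply = std::move(parked_.front());
            parked_.pop_front();
            reply(msg);
        } else {
            queued_.push_back(msg);
        }
    }

private:
    std::deque<std::string> queued_;
    std::deque<Reply> parked_;
};
```

The client simply re-issues the waiting call after each reply, so the
wire still carries nothing but ordinary request/response pairs.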

                
> Add a stream type
> -----------------
>
>                 Key: THRIFT-1948
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1948
>             Project: Thrift
>          Issue Type: New Feature
>          Components: AS3 - Compiler, AS3 - Library, C glib - Compiler, C glib 
> - Library, C# - Compiler, C# - Library, C++ - Compiler, C++ - Library, Cocoa 
> - Compiler, Cocoa - Library, Compiler (General), D - Compiler, D - Library, 
> Delphi - Compiler, Delphi - Library, Erlang - Compiler, Erlang - Library, Go 
> - Compiler, Go - Library, Haskell - Compiler, Haskell - Library, Java - 
> Compiler, Java - Library, JavaME - Compiler, JavaME - Library, JavaScript - 
> Compiler, JavaScript - Library, Node.js - Compiler, Node.js - Library, OCaml 
> - Compiler, OCaml - Library, Perl - Compiler, Perl - Library, PHP - Compiler, 
> PHP - Library, Python - Compiler, Python - Library, Ruby - Compiler, Ruby - 
> Library, Smalltalk - Compiler, Smalltalk - Library
>            Reporter: Carl Yeksigian
>            Assignee: Carl Yeksigian
>
> This is a proposal for an addition to the Thrift IDL, which allows for 
> sending chunks of data between the server and the client without having the 
> whole message in memory at the start of the communication.
> Here are two use cases where I have been thinking about the possibility of 
> using streams.
> LockServer.thrift:
> {code}
> struct Update {
>       1: required string lock_handle,
>       2: required i64 owner
> }
> service LockService {
>       stream<Update> updates_for(1: string prefix)
> }
> {code}
> This would allow the LockServer to push out updates that happen based on the 
> prefix the client has specified, rather than the constant polling that would 
> currently be required to imitate this interface.
> ManyResults.thrift:
> {code}
> service QueryProvider {
>   stream<Result> run_query()
> }
> {code}
> This allows the query provider to run the query and send back the results as 
> they come in, rather than having to bunch them up, or provide a way to page 
> through the results to the client.
> The new keyword, "stream<T>", would indicate that there is a series of values 
> typed T which would be communicated between client and server. Stream would 
> have three primitives:
> {code}
> next(T)
> error(TException)
> end()
> {code}
> Protocols would be enhanced with the following methods:
> {code}
> writeStreamBegin(etype, streamid)
> writeStreamNext(streamid, streamMessageType)
> writeStreamNextEnd()
> writeStreamErrorEnd()
> etype, streamid = readStreamBegin()
> streamid, streamMessageType = readStreamNext()
> readStreamNextEnd()
> readStreamErrorEnd()
> {code}
> streamMessageType is one of the following:
> # next
>   This means that the message will be of the element type.
> # error
>   An exception was thrown during materialization of the stream.
>   The stream is now closed.
> # end
>   This means that the stream is finished.
>   The stream is now closed.
> Once all streams are closed, readMessageEnd should be called. Before the 
> first writeStreamNext() could be called, the message should otherwise be 
> complete. Otherwise, an exception should be raised.
> It is possible that an exception will be thrown while the stream is being 
> materialized; however, this can only occur inside of a service. In this case, 
> error() will be called; the exception should be one of the exceptions that 
> the service call would have thrown. The values that were generated before the 
> exception will generally be valid, but may only have meaning if the stream is 
> ended. All streams which are currently open may get the same exception.
> If the following service was defined:
> {code}
> stream<i64> random_numbers(stream<i64> max)
> {code}
> A sample session from client to server would be:
> {code}
> writeMessageBegin()
> writeStreamBegin(I64, 0)
> writeStreamNext(0, next)
> writeI64(10)
> writeStreamNextEnd()
> writeStreamNext(0, end)
> writeMessageEnd()
> {code}
> A sample session from server to client would be:
> {code}
> writeMessageBegin()
> writeStreamBegin(i64, 0)
> writeStreamNext(0, next)
> writeI64(3)
> writeStreamNextEnd()
> writeStreamNext(0, end)
> writeMessageEnd()
> {code}
> This change would not be compatible with previous versions of Thrift. Also, 
> for languages which do not support this type of streaming, it could be 
> translated into a list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
