Re: non blocking Cassandra with Tornado

2010-08-10 Thread Ryan Daum


 Barring this we (place where I work, Chango) will probably eventually fork
 Cassandra to have a RESTful interface and use the Jetty async HTTP client to
 connect to it. It's just ridiculous for us to have threads and associated
 resources tied up on I/O-blocked operations.


 We've done exactly this but with Netty rather than Jetty. Helps too because
 we can easily have testers look at what we put into our CFs. Took some
 cheating to marshall the raw binaries into JSON but I'm pretty happy with
 what it's bought us so far.


Are you capable/willing/interested in sharing the work you did?

Ryan


Re: non blocking Cassandra with Tornado

2010-08-06 Thread Erik Onnen
On Thu, Jul 29, 2010 at 9:57 PM, Ryan Daum r...@thimbleware.com wrote:


 Barring this we (place where I work, Chango) will probably eventually fork
 Cassandra to have a RESTful interface and use the Jetty async HTTP client to
 connect to it. It's just ridiculous for us to have threads and associated
 resources tied up on I/O-blocked operations.


We've done exactly this but with Netty rather than Jetty. Helps too because
we can easily have testers look at what we put into our CFs. Took some
cheating to marshall the raw binaries into JSON but I'm pretty happy with
what it's bought us so far.


Re: non blocking Cassandra with Tornado

2010-08-06 Thread Jonathan Ellis
See comments to https://issues.apache.org/jira/browse/CASSANDRA-1256

On Fri, Jul 30, 2010 at 12:57 AM, Ryan Daum r...@thimbleware.com wrote:
 An asynchronous thrift client in Java would be something that we could
 really use; I'm trying to get a sense of whether this async client is usable
 with Cassandra at this point -- given that Cassandra typically bundles a
 specific older Thrift version, would the technique described here work at
 all with a 0.6.x or 0.7 distribution? Has anybody tried this?
 Barring this we (place where I work, Chango) will probably eventually fork
 Cassandra to have a RESTful interface and use the Jetty async HTTP client to
 connect to it. It's just ridiculous for us to have threads and associated
 resources tied up on I/O-blocked operations.
 R

 On Tue, Jul 27, 2010 at 11:51 AM, Dave Viner davevi...@pobox.com wrote:

 FWIW - I think this is actually more of a question about Thrift than about
 Cassandra.  If I understand you correctly, you're looking for an async
 client.  Cassandra lives on the other side of the thrift service.  So, you
 need a client that can speak Thrift asynchronously.
 You might check out the new async Thrift client in Java for inspiration:
 http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/
 Or, even better, port the Thrift async client to work for python and other
 languages.
 Dave Viner

 On Tue, Jul 27, 2010 at 8:44 AM, Peter Schuller
 peter.schul...@infidyne.com wrote:

  The idea is rather than calling a cassandra client function like
  get_slice(), call the send_get_slice() then have a non blocking wait on
  the
  socket thrift is using, then call recv_get_slice().

 (disclaimer: I've never used tornado)

 Without looking at the generated thrift code, this sounds dangerous.
 What happens if send_get_slice() blocks? What happens if
 recv_get_slice() has to block because you didn't happen to receive the
 response in one packet?

 Normally you're either doing blocking code or callback oriented
 reactive code. It sounds like you're trying to use blocking calls in a
 non-blocking context under the assumption that readable data on the
 socket means the entire response is readable, and that the socket
 being writable means that the entire request can be written without
 blocking. This might seem to work and you may not block, or block
 only briefly. Until, for example, a TCP connection stalls and your
 entire event loop hangs due to a blocking read.

 Apologies if I'm misunderstanding what you're trying to do.

 --
 / Peter Schuller






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: non blocking Cassandra with Tornado

2010-08-05 Thread aaron morton
Have you had a chance to try this technique out in Java?

I've not been able to get back to my original experiments for the last week. 

If it works, you should be able to put together a non blocking client that still 
uses Thrift. 

Aaron
On 30 Jul 2010, at 16:57, Ryan Daum wrote:

 An asynchronous thrift client in Java would be something that we could really 
 use; I'm trying to get a sense of whether this async client is usable with 
 Cassandra at this point -- given that Cassandra typically bundles a specific 
 older Thrift version, would the technique described here work at all with a 
 0.6.x or 0.7 distribution? Has anybody tried this?
 
 Barring this we (place where I work, Chango) will probably eventually fork 
 Cassandra to have a RESTful interface and use the Jetty async HTTP client to 
 connect to it. It's just ridiculous for us to have threads and associated 
 resources tied up on I/O-blocked operations.
 
 R
 
 On Tue, Jul 27, 2010 at 11:51 AM, Dave Viner davevi...@pobox.com wrote:
 FWIW - I think this is actually more of a question about Thrift than about 
 Cassandra.  If I understand you correctly, you're looking for an async client. 
  Cassandra lives on the other side of the thrift service.  So, you need a 
 client that can speak Thrift asynchronously.
 
 You might check out the new async Thrift client in Java for inspiration:
 
 http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/
 
 Or, even better, port the Thrift async client to work for python and other 
 languages.  
 
 Dave Viner
 
 
 On Tue, Jul 27, 2010 at 8:44 AM, Peter Schuller peter.schul...@infidyne.com 
 wrote:
  The idea is rather than calling a cassandra client function like
  get_slice(), call the send_get_slice() then have a non blocking wait on the
  socket thrift is using, then call recv_get_slice().
 
 (disclaimer: I've never used tornado)
 
 Without looking at the generated thrift code, this sounds dangerous.
 What happens if send_get_slice() blocks? What happens if
 recv_get_slice() has to block because you didn't happen to receive the
 response in one packet?
 
 Normally you're either doing blocking code or callback oriented
 reactive code. It sounds like you're trying to use blocking calls in a
 non-blocking context under the assumption that readable data on the
 socket means the entire response is readable, and that the socket
 being writable means that the entire request can be written without
 blocking. This might seem to work and you may not block, or block
 only briefly. Until, for example, a TCP connection stalls and your
 entire event loop hangs due to a blocking read.
 
 Apologies if I'm misunderstanding what you're trying to do.
 
 --
 / Peter Schuller
 
 



non blocking Cassandra with Tornado

2010-07-27 Thread aaron morton
Today I worked out how to make non blocking calls to Cassandra inside of the 
non blocking Tornado web server (http://www.tornadoweb.org/) using Python. I 
thought I'd share it here and see if anyone thinks I'm abusing Thrift too much 
and inviting trouble.

It's a bit mucky and I have not tested it for things like timeouts and errors. 
But here goes...

The idea is rather than calling a cassandra client function like get_slice(), 
call the send_get_slice() then have a non blocking wait on the socket thrift is 
using, then call recv_get_slice().

So the steps in Tornado are:

1.  Web handler creates an object from the model and calls a function on it 
like start_read() to populate it.

2.  model.start_read() needs to call get_slice() on the Thrift-generated 
Cassandra client. Instead it calls send_get_slice() and returns to the calling 
web handler. 

3.  Web handler then asks Tornado to epoll wait for any activity on the 
Thrift socket. It gets access to the socket file descriptor by following this 
chain from the Thrift-generated Cassandra client: 
_iprot.trans.__TTransportBase__trans.handle.fileno()

4.  Web handler function called in 1 above returns; Tornado keeps the HTTP 
connection and the web handler instance alive. Later, when the socket has 
activity, Tornado will call back into the web handler. 

5.  To get the result of the call to Cassandra, the web handler calls a 
function on the model such as finish_read(). finish_read() wants to get the 
results of the get_slice() and do something, so it calls recv_get_slice() on the 
Thrift Cassandra client, processes the result and returns to the web handler. 
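
The five steps above can be sketched with nothing but the Python standard library. This is only an illustration under assumptions: FakeClient is a hypothetical stand-in for the Thrift-generated Cassandra client, a socketpair plays the role of the connection to Cassandra, and a selectors selector plays the role of Tornado's event loop (in Tornado the registration would presumably go through IOLoop.add_handler instead).

```python
import selectors
import socket


class FakeClient:
    """Hypothetical stand-in for the Thrift-generated Cassandra client."""

    def __init__(self, sock):
        self.sock = sock

    def send_get_slice(self, key):
        # Write the request and return without waiting for the reply.
        self.sock.sendall(key.encode() + b"\n")

    def recv_get_slice(self):
        # Blocking read of one newline-terminated reply.
        data = b""
        while not data.endswith(b"\n"):
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            data += chunk
        return data.rstrip(b"\n").decode()


def run_demo():
    client_sock, server_sock = socket.socketpair()
    client = FakeClient(client_sock)
    sel = selectors.DefaultSelector()
    results = []

    def finish_read(sock):
        # Step 5: the socket has activity, so collect the result.
        sel.unregister(sock)
        results.append(client.recv_get_slice())

    # Step 2: send the request only.
    client.send_get_slice("user42")
    # Step 3: ask the event loop to watch the socket for readability.
    sel.register(client_sock, selectors.EVENT_READ, finish_read)

    # Step 4: the handler has returned; nothing is readable yet, so the
    # loop would be free to serve other requests.
    idle_before_reply = sel.select(timeout=0) == []

    # The fake server answers the request.
    request = server_sock.recv(4096)
    server_sock.sendall(b"slice-for-" + request)

    # The loop sees the activity and calls back into the handler.
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)

    sel.close()
    client_sock.close()
    server_sock.close()
    return idle_before_reply, results


if __name__ == "__main__":
    print(run_demo())
```

Note that recv_get_slice() in this sketch is still a blocking call; the event loop only decides when it gets called.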


This looks like the same process the TTwisted.py transport in the Thrift 
package is using, except that TTwisted.py does not use the nasty reference to 
get to the raw socket. 

I'm not sure about any adverse effects on the Cassandra server from the client 
not servicing the socket immediately when it starts sending data back. I'm 
guessing there are some buffers there, but not sure. Could I be accidentally 
blocking / hurting the Cassandra server?

Thanks
Aaron

Re: non blocking Cassandra with Tornado

2010-07-27 Thread Sandeep Kalidindi at PaGaLGuY.com
@aaron - thanks a lot. I will test it. This is very much needed.

Cheers,
Deepu.



On Tue, Jul 27, 2010 at 6:03 PM, aaron morton aa...@thelastpickle.com wrote:

 Today I worked out how to make non blocking calls to Cassandra inside of
 the non blocking Tornado web server (http://www.tornadoweb.org/) using
 Python. I thought I'd share it here and see if anyone thinks I'm abusing
 Thrift too much and inviting trouble.

 It's a bit mucky and I have not tested it for things like timeouts and
 errors. But here goes...

 The idea is rather than calling a cassandra client function like
 get_slice(), call the send_get_slice() then have a non blocking wait on the
 socket thrift is using, then call recv_get_slice().

 So the steps in Tornado are:

 1.  Web handler creates an object from the model, calls a function on it
 like start_read() to populate it.

 2. model.start_read() needs to call get_slice() on the thrift generated
 Cassandra client. Instead it calls send_get_slice() and returns to the
 calling web handler.

 3  Web Handler then asks Tornado to epoll wait for any activity on the
 thrift socket. It gets access to the socket file descriptor by following
 this chain from the thrift generated Cassandra client
 _iprot.trans.__TTransportBase__trans.handle.fileno()

 4,  Web handler function called in 1 above returns, Tornado keeps the http
 connection alive and the web handler instance alive. Later when the socket
 has activity Tornado will call back into the web handler.

 5. To get the result of the call to cassandra the Web Handler calls a
 function on the model such as finish_read(). finish_read() wants to get the
 results of the get_slice() and do something, so it calls recv_get_slice on
 the thrift Cassandra client. Processes the result and returns to the web
 handler.


 This looks like the same process the TTwisted.py transport in the thrift
 package is using. Except it's not using the nasty reference to get to the
 raw socket.

 I'm not sure about any adverse effects on the Cassandra server from the
 client not servicing the socket immediately when it starts sending data
 back. I'm guessing there are some buffers there, but not sure. Could I be
 accidentally blocking / hurting the cassandra server ?

 Thanks
 Aaron



Re: non blocking Cassandra with Tornado

2010-07-27 Thread Peter Schuller
 The idea is rather than calling a cassandra client function like
 get_slice(), call the send_get_slice() then have a non blocking wait on the
 socket thrift is using, then call recv_get_slice().

(disclaimer: I've never used tornado)

Without looking at the generated thrift code, this sounds dangerous.
What happens if send_get_slice() blocks? What happens if
recv_get_slice() has to block because you didn't happen to receive the
response in one packet?

Normally you're either doing blocking code or callback oriented
reactive code. It sounds like you're trying to use blocking calls in a
non-blocking context under the assumption that readable data on the
socket means the entire response is readable, and that the socket
being writable means that the entire request can be written without
blocking. This might seem to work and you may not block, or block
only briefly. Until, for example, a TCP connection stalls and your
entire event loop hangs due to a blocking read.
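
The failure mode described above can be reproduced in a few lines. This is a sketch under assumptions, not the Thrift code itself: it uses a made-up 4-byte length-prefixed frame, a socketpair in place of the Cassandra connection, and hypothetical helper names (recv_exactly, read_frame_blocking). The peer sends only part of a response; the selector reports the socket as readable, yet a read of the whole frame cannot finish and only the socket timeout ends it.

```python
import selectors
import socket
import struct


def recv_exactly(sock, n):
    """Read exactly n bytes, blocking (up to the socket timeout) as needed."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def read_frame_blocking(sock):
    """Read one frame: a 4-byte big-endian length followed by the payload."""
    (length,) = struct.unpack("!i", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


def demo_partial_response():
    client, server = socket.socketpair()
    client.settimeout(0.2)  # without a timeout the read would hang forever

    payload = b"0123456789"
    frame = struct.pack("!i", len(payload)) + payload
    server.sendall(frame[:6])  # header plus two payload bytes, then a stall

    sel = selectors.DefaultSelector()
    sel.register(client, selectors.EVENT_READ)
    readable = bool(sel.select(timeout=1))  # some data is there

    try:
        read_frame_blocking(client)
        stalled = False
    except socket.timeout:
        stalled = True  # the event loop would have been stuck for this long

    sel.close()
    client.close()
    server.close()
    return readable, stalled


if __name__ == "__main__":
    print(demo_partial_response())
```

While the read waits for the rest of the frame, nothing else on the same event loop thread can run, which is the hang the paragraph above warns about.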

Apologies if I'm misunderstanding what you're trying to do.

-- 
/ Peter Schuller


Re: non blocking Cassandra with Tornado

2010-07-27 Thread Dave Viner
FWIW - I think this is actually more of a question about Thrift than about
Cassandra.  If I understand you correctly, you're looking for an async
client.  Cassandra lives on the other side of the thrift service.  So, you
need a client that can speak Thrift asynchronously.

You might check out the new async Thrift client in Java for inspiration:

http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/

Or, even better, port the Thrift async client to work for python and other
languages.

Dave Viner


On Tue, Jul 27, 2010 at 8:44 AM, Peter Schuller peter.schul...@infidyne.com
 wrote:

  The idea is rather than calling a cassandra client function like
  get_slice(), call the send_get_slice() then have a non blocking wait on
 the
  socket thrift is using, then call recv_get_slice().

 (disclaimer: I've never used tornado)

 Without looking at the generated thrift code, this sounds dangerous.
 What happens if send_get_slice() blocks? What happens if
 recv_get_slice() has to block because you didn't happen to receive the
 response in one packet?

 Normally you're either doing blocking code or callback oriented
 reactive code. It sounds like you're trying to use blocking calls in a
 non-blocking context under the assumption that readable data on the
 socket means the entire response is readable, and that the socket
 being writable means that the entire request can be written without
 blocking. This might seem to work and you may not block, or block
 only briefly. Until, for example, a TCP connection stalls and your
 entire event loop hangs due to a blocking read.

 Apologies if I'm misunderstanding what you're trying to do.

 --
 / Peter Schuller



Re: non blocking Cassandra with Tornado

2010-07-27 Thread Aaron Morton


 Without looking at the generated thrift code, this sounds dangerous.
 What happens if send_get_slice() blocks? What happens if
 recv_get_slice() has to block because you didn't happen to receive the
 response in one packet?

get_slice() has two lines in it: a call to send_get_slice() and one to
recv_get_slice(). send_get_slice() sends the request down the socket to the
server and returns. recv_get_slice() takes a blocking read (with timeout)
against the socket, pulls the entire message, decodes it and returns it.

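
Since recv_get_slice() as described here blocks until the entire message has arrived, a fully callback-oriented client would instead have to buffer whatever bytes are available on each readable event and decode only once a whole message is present. A minimal sketch of that buffering, assuming a made-up 4-byte length-prefixed frame and a hypothetical FrameAssembler class (this is not Thrift's API):

```python
import struct


class FrameAssembler:
    """Collects byte chunks and yields complete length-prefixed frames."""

    HEADER_SIZE = 4  # big-endian signed 32-bit length, an assumption here

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Add newly received bytes; return every frame that is now complete."""
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= self.HEADER_SIZE:
            (length,) = struct.unpack("!i", self._buf[:self.HEADER_SIZE])
            end = self.HEADER_SIZE + length
            if len(self._buf) < end:
                break  # payload incomplete, wait for the next readable event
            frames.append(bytes(self._buf[self.HEADER_SIZE:end]))
            del self._buf[:end]
        return frames


if __name__ == "__main__":
    assembler = FrameAssembler()
    frame = struct.pack("!i", 5) + b"hello"
    print(assembler.feed(frame[:3]))
    print(assembler.feed(frame[3:]))
```

Each readable callback would pass the result of a non-blocking recv() to feed() and hand any returned frames to the decoder, so no call ever waits on the socket.
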
 Normally you're either doing blocking code or callback oriented
 reactive code. It sounds like you're trying to use blocking calls in a
 non-blocking context under the assumption that readable data on the
 socket means the entire response is readable, and that the socket
 being writable means that the entire request can be written without
 blocking. This might seem to work and you may not block, or block
 only briefly. Until, for example, a TCP connection stalls and your
 entire event loop hangs due to a blocking read.

I'm not interrupting any of the work Thrift is doing when reading or writing
to the socket. Those functions still get to complete as normal. The goal is to
let the Tornado server work on another request while the first one is waiting
for Cassandra to do its work. Otherwise that is wasted time on the web heads
that could be employed servicing other requests.

Once Tornado detects that the socket state has changed, it will add the
callback into the event loop, and I then ask the Cassandra client to read all
the data from the socket. It's still a blocking call, just that we don't bother
to call it unless we know there is data sitting there for it. The recv could
still block, hang, etc., but it will do that as in the normal blocking model.
I'll need to test the timeouts and error propagation in these cases.

Thanks for the feedback
Aaron

Re: non blocking Cassandra with Tornado

2010-07-27 Thread Aaron Morton
Thanks for the link. It is more of a Thrift thing. Perhaps I need to do some
tests where the web handler sends the get_slice to Cassandra but never calls
recv, to see what could happen.

I'll take a look at the Java binding and see what it would take to offer a
patch to Thrift. Most people coding Python (including the guy sitting next to
me) would probably say to use the Thrift Twisted binding.

May also take a look at the Avro bindings.

Aaron

On 28 Jul, 2010, at 03:51 AM, Dave Viner davevi...@pobox.com wrote:

 FWIW - I think this is actually more of a question about Thrift than about
 Cassandra. If I understand you correctly, you're looking for an async
 client. Cassandra "lives" on the other side of the thrift service. So, you
 need a client that can speak Thrift asynchronously.

 You might check out the new async Thrift client in Java for inspiration:
 http://blog.rapleaf.com/dev/2010/06/23/fully-async-thrift-client-in-java/

 Or, even better, port the Thrift async client to work for python and other
 languages.

 Dave Viner

 On Tue, Jul 27, 2010 at 8:44 AM, Peter Schuller peter.schul...@infidyne.com wrote:

 The idea is rather than calling a cassandra client function like
 get_slice(), call the send_get_slice() then have a non blocking wait on the
 socket thrift is using, then call recv_get_slice().

(disclaimer: I've never used tornado)

Without looking at the generated thrift code, this sounds dangerous.
What happens if send_get_slice() blocks? What happens if
recv_get_slice() has to block because you didn't happen to receive the
response in one packet?

Normally you're either doing blocking code or callback oriented
reactive code. It sounds like you're trying to use blocking calls in a
non-blocking context under the assumption that readable data on the
socket means the entire response is readable, and that the socket
being writable means that the entire request can be written without
blocking. This might seem to work and you may not block, or block
only briefly. Until, for example, a TCP connection stalls and your
entire event loop hangs due to a blocking read.

Apologies if I'm misunderstanding what you're trying to do.

--
/ Peter Schuller