Re: ec2 tests
@Chris: Thanks. I will keep you updated if I find something.

@Joe: I am not saying this is a bad number. I am just saying it is still not enough for us (in order to limit the number of nodes) ;o) If I look at the last bench, version 0.6.2 is around 13000w/s. I should/would be able to reach 1w/sec (in fact this is almost the case in a non-virtualized env.). I am just trying to understand where the bottleneck is.

What do you mean by N+1 scaling? I am not sure I understand the expression.

Thanks.

On Saturday, June 19, 2010, Chris Dean wrote:

@Chris, Did you get any bench you could share with us?

We're still working on it. It's a lower priority task so it will take a while to finish.

So far we've run on all the AWS data centers in the US and used several different setups. We also did a test on Rackspace with one setup and some whitebox servers we had in the office. (The whitebox servers are still running I believe.)

I don't have the numbers here, but the fastest by far is the non-virtualized whitebox servers. No real surprise. Rackspace was faster than AWS US-West; US-West was faster than US-East.

We always use 3 Cassandra servers and one or two machines to run stress.py. I don't think we're seeing the 7500 writes/sec so maybe our config is wrong. You'll have to be patient until my colleague writes this all up.

Cheers,
Chris Dean

--
Olivier Mallassi
OCTO Technology
50, Avenue des Champs-Elysées 75008 Paris
Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01
http://www.octo.com
Octo Talks! http://blog.octo.com
Re: Learning-by-doing (also announcing a new Ruby Client Codename: Greek Architect)
Hi Thomas,

did you look at the cassandra gem from twitter (fauna/cassandra) on github? They also use the thrift_client and already have the basic cassandra API accessible. I'm also using ruby with cassandra and still need to find a slick way to do the inserts and to decide when to update the indexes.

If I understood correctly you want to look up a user by name. And you want to have the name as row keys where you would find the UID with which you could look up your user? From my understanding I would put the index into one row, name it e.g. user_by_name, and insert columns where the column name is the user name and the value of the column would be the UUID of the user. Cassandra creates the index on the column names. You can efficiently insert new values into this row.

What do you want to do with the users partitioned by date? Sorted by creation date? Login date?

For the cascading deletes: a database would have to remove a record from its index, too. In Cassandra you would have to do it yourself.

Christian

On Jun 19, 2010, at 3:27 AM, Thomas Heller wrote:

Howdy!

So, last week I finally got around to playing with Cassandra. After a while I understood the basics. To test this assumption I started working on my own Client implementation since Learning-by-doing is what I do and existing Ruby Clients (which are awesome) already abstracted too much for me to really grasp what was going on. Java is not really my thing (anymore) so I began with the Thrift API and Ruby.

Anyways, back to topic. This library is now available at: http://github.com/thheller/greek_architect

Since I have virtually no experience with Cassandra (but plenty with SQL) I started with the first use-case which I have programmed a bunch of times before. User Management. I build websites which are used by other people, so I need to store them somewhere.

Step #1: Creating Users and persisting them in Cassandra

Example here: http://github.com/thheller/greek_architect/blob/master/spec/examples/user_create_spec.rb

I hope my rspec-style documentation doesn't confuse too many people since I already have a gazillion questions for this simple, but also VERY common use-case. Since a question is best asked with a concrete example to refer to, here goes my first one:

Would any of you veterans build what I built the way I did? (referring to the cassandra design, not the ruby client)

I insert Users with UUID keys into one ColumnFamily. I then index them by creating a row in another ColumnFamily using the Name as Key and then adding one column holding a reference to the User UUID. I also insert a reference into another ColumnFamily holding a List of Users partitioned by Date.

I'm really unsure about the index design, since the indexes don't get updated when a User row is removed. I could hook into the remove call (like I did into mutations) and cascade the deletes where needed, but 10+ years of SQL always want to tell me I'm crazy for doing this stuff!

I'd really appreciate some feedback.

Cheers,
Thomas
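
To make the index discussion above concrete, here is a minimal sketch of the single-row index Christian describes (one row named user_by_name, column name = user name, column value = user UUID) together with the manual cascade on delete. It is a pure in-memory model built on Python dicts, not a Cassandra client: the class name UserStore and its methods are hypothetical and exist only for illustration.

```python
import uuid


class UserStore:
    """In-memory stand-in for two ColumnFamilies (plain dicts, no real client)."""

    def __init__(self):
        # "Users" ColumnFamily: row key is the user UUID, value is the columns.
        self.users = {}
        # "Indexes" ColumnFamily: a single row whose column names are user
        # names and whose column values are user UUIDs.
        self.index = {"user_by_name": {}}

    def create(self, name, **columns):
        """Insert the user row, then write the index column by hand."""
        user_id = uuid.uuid4()
        self.users[user_id] = dict(columns, name=name)
        self.index["user_by_name"][name] = user_id
        return user_id

    def find_by_name(self, name):
        """Two lookups: index row first, then the user row it points to."""
        user_id = self.index["user_by_name"].get(name)
        if user_id is None:
            return None
        return self.users.get(user_id)

    def remove(self, user_id):
        """Cascade the delete manually: drop the row and its index column."""
        row = self.users.pop(user_id, None)
        if row is not None:
            self.index["user_by_name"].pop(row["name"], None)
```

The point of the sketch is only that the application, not the store, keeps the index consistent: every create writes two places and every remove has to clean up both.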
Re: Occasional 10s Timeouts on Read
I shall do just that. I did a bunch of tests this morning and the situation appears to be this:

I have three nodes A, B and C, with RF=2. I understand now why this issue wasn't apparent with RF=3.

If there are regular internode column requests going on (e.g. I set up a pinger to get remote columns), the cluster functions normally. However, if no internode column requests happen for a few hours (3 hours is the minimum I've seen, but sometimes it takes longer), things go wrong. Using node A as the point of contact from the client, all columns that live on A are returned in a timely fashion, but for columns that only live on B and C, the retrieval times out, with this in the log:

INFO 13:13:28,345 error writing to /10.33.3.20

No requests for replicas or consistency checks are seen in the logs of B and C at this time.

Using 'nodetool ring' from each of the three nodes shows all nodes as Up. Telnet from A to B on port 7000 connects. At first glance the tcpdump logs look as if gossip communication, perhaps heartbeats, is proceeding normally, but I haven't really analyzed them.

Fifteen minutes later, the cluster decided to behave normally again. Everyone talks to each other like buddies and delivers columns fast and regularly.

This is really looking like a Cassandra bug. I'll report back with my TRACE log later and I expect I'll be opening a ticket. The confidence level of my employer in my Cassandra solution to their petabyte data storage project is... uh... well... it could be better.

AJ

On Fri, Jun 18, 2010 at 8:16 PM, Jonathan Ellis jbel...@gmail.com wrote:

Set log level to TRACE and see if the OutboundTcpConnection is going bad. That would explain the message never arriving.

On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater a...@zuno.com wrote:

To summarize: If a request for a column comes in *after a period of several hours with no requests*, then the node servicing the request hangs while looking for its peer rather than servicing the request like it should.
It then throws either a TimedOutException or a (wrong) NotFoundException. And it doesn't appear to actually send the message it says it does to its peer. Or at least its peer doesn't report the request being received.

And then the situation magically clears up after approximately 2 minutes.

However, if the idle period never occurs, then the problem does not manifest. If I run a cron job with wget against my server every minute, I do not see the problem.

I'll be looking at some tcpdump logs to see if I can suss out what's really happening, and perhaps file this as a bug. The several hours between reproducible events make this whole thing aggravating for detection, debugging and, I'll assume, fixing, if it is indeed a cassandra problem.

It was suggested on IRC that it may be my network. But gossip is continually sending heartbeats, and nodetool and the logs show the nodes as up and available. If my network was flaking out I'd think it would be dropping heartbeats and I'd see that.

AJ

On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater a...@zuno.com wrote:

These are physical machines.
storage-conf.xml.fs03 is here: http://pastebin.com/weL41NB1

Diffs from that for the other two storage-confs are inline here:

a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
185c185
> <InitialToken>71603818521973537678586548668074777838</InitialToken>
229c229
< <ListenAddress>10.33.2.70</ListenAddress>
---
> <ListenAddress>10.33.3.10</ListenAddress>
241c241
< <ThriftAddress>10.33.2.70</ThriftAddress>
---
> <ThriftAddress>10.33.3.10</ThriftAddress>
341c341
< <ConcurrentReads>16</ConcurrentReads>
---
> <ConcurrentReads>4</ConcurrentReads>

a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
185c185
< <InitialToken>0</InitialToken>
---
> <InitialToken>120215585224964746744782921158327379306</InitialToken>
206d205
< <Seed>10.33.3.20</Seed>
229c228
< <ListenAddress>10.33.2.70</ListenAddress>
---
> <ListenAddress>10.33.3.20</ListenAddress>
241c240
< <ThriftAddress>10.33.2.70</ThriftAddress>
---
> <ThriftAddress>10.33.3.20</ThriftAddress>
341c340
< <ConcurrentReads>16</ConcurrentReads>
---
> <ConcurrentReads>4</ConcurrentReads>

Thank you for your attention,
AJ

On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black b...@b3k.us wrote:

Are these physical machines or virtuals? Did you post your cassandra.in.sh and storage-conf.xml someplace?

On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater a...@zuno.com wrote:

Total data size in the entire cluster is about twenty 12k images. With no other load on the system. I just ask for one column and I get these timeouts. Performing multiple gets on the columns leads to multiple timeouts for a period of a few seconds or minutes and then the situation magically resolves itself and response times are down to single digit milliseconds for a column get.

On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater a...@zuno.com wrote:

Cassandra 0.6.2 from the apache
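
The remark in this thread that the problem shows up with RF=2 but not RF=3 can be illustrated with a small ring model. The sketch below uses the three InitialToken values and addresses from the storage-conf diffs and assumes simple rack-unaware placement (the first node whose token is at or after the key's token, then the next nodes clockwise). That placement rule and the example key token are assumptions for illustration, and the function name replicas_for is hypothetical.

```python
from bisect import bisect_left

# (token, address) pairs taken from the storage-conf diffs in this thread.
RING = sorted([
    (0, "10.33.2.70"),
    (71603818521973537678586548668074777838, "10.33.3.10"),
    (120215585224964746744782921158327379306, "10.33.3.20"),
])


def replicas_for(key_token, rf, ring=RING):
    """Return the addresses holding a key, assuming rack-unaware placement.

    The primary replica is the first node whose token is >= key_token
    (wrapping around the ring); the remaining replicas are the next
    nodes clockwise.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def needs_remote_read(coordinator, key_token, rf, ring=RING):
    """True when the node the client talks to holds no replica of the key."""
    return coordinator not in replicas_for(key_token, rf, ring)
```

Under these assumptions, any key whose token falls between 0 and the second token is stored only on 10.33.3.10 and 10.33.3.20 when RF=2, so a client talking to 10.33.2.70 depends on internode messaging for that read. With RF=3 every node holds every key, which would be consistent with the problem staying hidden at RF=3.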
Re: Learning-by-doing (also announcing a new Ruby Client Codename: Greek Architect)
On Sat, Jun 19, 2010 at 9:30 AM, Christian van der Leeden christian.vanderlee...@googlemail.com wrote:

Hi Thomas, did you look at cassandra gem from twitter (fauna/cassandra) on github? They also use the thrift_client and already have the basic cassandra API accessible. I'm also using ruby with cassandra and still need to find a slick way to do the inserts and when to update the indexes.

Have you looked at CassandraObject for that? http://github.com/nzkoz/cassandra_object

-ryan

If I understood correctly you want to look up a user by name. And you want to have the name as row keys where you would find the UID with which you could look up your user? From my understanding I would put the index into one row, name it e.g. user_by_name, and insert columns where the column name is the user name and the value of the column would be the UUID of the user. Cassandra creates the index on the column names. You can efficiently insert new values into this row.

What do you want to do with the users partitioned by date? Sorted by creation date? Login date?

For the cascading deletes: a database would have to remove a record from its index, too. In Cassandra you would have to do it yourself.

Christian
Re: Occasional 10s Timeouts on Read
TRACE 14:42:06,248 unable to connect to /10.33.3.20
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)

So that's interesting since it is a clear failure that comes from the operating system and indicates something which can be observed outside of cassandra using system tools.

Presumably either cassandra is somehow connecting to the wrong port, or this is a firewalling/os/network issue, or the 'other' cassandra is not listening on the port. Using tcpdump/netstat -nlp should narrow that down. Is it possible connections only succeed in one direction for example?

--
/ Peter Schuller
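
One way to test the "only one direction" question from outside Cassandra is a plain TCP connect check run from each node toward the others. The following is a minimal sketch using only the Python standard library. The function name can_connect is hypothetical, and port 7000 is simply the port mentioned elsewhere in this thread, so adjust it to whatever the nodes actually listen on.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    A refused connection, an unreachable host, or a timeout all
    return False; nothing is sent over the connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and timeouts, which are
        # subclasses of OSError.
        return False
```

For example, calling can_connect("10.33.3.20", 7000) on 10.33.2.70 and can_connect("10.33.2.70", 7000) on 10.33.3.20 would show whether both directions work at the moment the timeouts occur. This only checks that something accepts the connection. It says nothing about whether Cassandra's own pooled connection is healthy.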
Re: Occasional 10s Timeouts on Read
The only indication I have that cassandra realized something was wrong during this period was this INFO message:

10.33.2.70:/var/log/cassandra/output.log
DEBUG 20:00:35,841 get_slice
DEBUG 20:00:35,841 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='4c43228354b38f14a1a015dd722cdf4b', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
DEBUG 20:00:35,841 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='4c43228354b38f14a1a015dd722cdf4b', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 60999@/10.33.3.10
INFO 20:00:35,842 error writing to /10.33.3.10
TRACE 20:00:36,267 Received a GossipDigestSynMessage from /10.33.3.10
TRACE 20:00:36,267 reporting /10.33.3.10
TRACE 20:00:36,267 reporting /10.33.3.10

Where it notes there's an error. The next read I did at 20:02, while writing my last mail to this list, succeeded.

So, it's timing out, but all the while sending heartbeats and GossipDigestSyns and Acks back and forth, and maybe not really querying its peers when it should, or timing out trying to do so. When it finally realizes there's an error, it resets something? And then we're back in business?

I'm going to be offline for 48 hours.

AJ

On Sat, Jun 19, 2010 at 8:09 PM, AJ Slater a...@zuno.com wrote:

Agreed. But those connection errors were happening at a sort of random time, not the time when I was seeing the problem. Now I am seeing the problem and here are some logs without ConnectionExceptions.

Here we're asking 10.33.2.70 for key: 52e86817a577f75e545cdecd174d8b17 which resides only on 10.33.3.10 and 10.33.3.20.

Note all the apparently normal communication.
Except that no mention of a request for key 52e86817a577f75e545cdecd174d8b17 ever comes up in 10.33.3.10's log, despite 10.33.2.70 saying it was reading from 10.33.3.10.

The problem resolved itself again at 20:02, maybe 20 minutes later, when all of a sudden I get my columns returned in milliseconds and I see good stuff like:

DEBUG 20:06:35,238 Reading consistency digest for 52e86817a577f75e545cdecd174d8b17 from 59...@[/10.33.3.10, /10.33.3.20]

Here are some logs from the problem period:

10.33.2.70:/var/log/cassandra/output.log
DEBUG 19:42:03,230 get_slice
DEBUG 19:42:03,231 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='52e86817a577f75e545cdecd174d8b17', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
DEBUG 19:42:03,231 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='52e86817a577f75e545cdecd174d8b17', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 57663@/10.33.3.10
TRACE 19:42:03,619 Gossip Digests are : /10.33.2.70:1276981671:20386 /10.33.3.10:1276983719:18303 /10.33.3.20:1276983726:18295 /10.33.2.70:1276981671:20386
TRACE 19:42:03,619 Sending a GossipDigestSynMessage to /10.33.3.20 ...
TRACE 19:42:03,619 Performing status check ...
TRACE 19:42:03,619 PHI for /10.33.3.10 : 0.95343619570936
TRACE 19:42:03,619 PHI for /10.33.3.20 : 0.8635116192106644
TRACE 19:42:03,621 Received a GossipDigestAckMessage from /10.33.3.20
TRACE 19:42:03,621 reporting /10.33.3.10
TRACE 19:42:03,621 reporting /10.33.3.20
TRACE 19:42:03,621 marking as alive /10.33.3.10
TRACE 19:42:03,621 Updating heartbeat state version to 18304 from 18303 for /10.33.3.10 ...
TRACE 19:42:03,621 marking as alive /10.33.3.20
TRACE 19:42:03,621 Updating heartbeat state version to 18296 from 18295 for /10.33.3.20 ...
TRACE 19:42:03,622 Scanning for state greater than 20385 for /10.33.2.70
TRACE 19:42:03,622 Scanning for state greater than 20385 for /10.33.2.70
TRACE 19:42:03,622 Sending a GossipDigestAck2Message to /10.33.3.20
TRACE 19:42:04,172 Received a GossipDigestSynMessage from /10.33.3.10
TRACE 19:42:04,172 reporting /10.33.3.10
TRACE 19:42:04,172 reporting /10.33.3.10
TRACE 19:42:04,172 Scanning for state greater than 20385 for /10.33.2.70
TRACE 19:42:04,172 Size of GossipDigestAckMessage is 52
TRACE 19:42:04,172 Sending a GossipDigestAckMessage to /10.33.3.10
TRACE 19:42:04,174 Received a GossipDigestAck2Message from /10.33.3.10
TRACE 19:42:04,174 reporting /10.33.3.10
TRACE 19:42:04,174 marking as alive /10.33.3.10
TRACE 19:42:04,174 Updating heartbeat state version to 18305 from 18304 for /10.33.3.10 ...

10.33.3.10:/var/log/cassandra/output.log
TRACE 19:42:03,174 Sending a GossipDigestSynMessage to /10.33.3.20 ...
TRACE 19:42:03,174 Performing status check ...
TRACE 19:42:03,174 PHI for /10.33.2.70 : 1.3363463863632534
TRACE 19:42:03,174 PHI for /10.33.3.20 : 0.9297110501502452
TRACE 19:42:03,175