Re: ec2 tests

2010-06-19 Thread Olivier Mallassi
@chris. Thanks. I will keep you updated if I find something.

@Joe. I am not saying this is a bad number. I am just saying it is
still not enough for us (in order to limit the number of nodes)  ;o)
If I look at the last bench, version 0.6.2 is around 13000w/s.
I should/would be able to reach 1w/sec (in fact this is almost
the case in a non-virtualized env.).
I am just trying to understand where the bottleneck is.

What do you mean by N+1 scaling? I'm not sure I understand the expression.

Thanks.

On Saturday, June 19, 2010, Chris Dean  wrote:
 @Chris, Did you get any bench you could share with us?

 We're still working on it.  It's a lower priority task so it will take a
 while to finish.  So far we've run on all the AWS data centers in the US
 and used several different setups.  We also did a test on Rackspace with
 one setup and some whitebox servers we had in the office.  (The whitebox
 servers are still running I believe.)

 I don't have the numbers here, but the fastest by far is the
 non-virtualized whitebox servers.  No real surprise.  Rackspace was
 faster than AWS US-West; US-West was faster than US-East.

 We always use 3 Cassandra servers and one or two machines to run
 stress.py.  I don't think we're seeing the 7500 writes/sec so maybe our
 config is wrong.  You'll have to be patient until my colleague writes
 this all up.

 Cheers,
 Chris Dean


-- 

Olivier Mallassi
OCTO Technology

50, Avenue des Champs-Elysées
75008 Paris

Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01

http://www.octo.com
Octo Talks! http://blog.octo.com


Re: Learning-by-doing (also announcing a new Ruby Client Codename: Greek Architect)

2010-06-19 Thread Christian van der Leeden
Hi Thomas,

did you look at cassandra gem from twitter (fauna/cassandra) on github?
They also use the thrift_client and already have the basic cassandra API 
accessible.

I'm also using ruby with cassandra and still need to find a slick way to do the 
inserts
and when to update the indexes. 

If I understood correctly you want to look up a user by name. And you want to 
have
the name as row keys where you would find the UID with which you could look up 
your user?

From my understanding I would put the index into one row, name it e.g. 
user_by_name,
and insert columns where the column name is the user name and the value of the 
column
would be the UUID of the user. Cassandra creates the index on the column names. 
You can
efficiently insert new values into this row.
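
A rough sketch of this single-row index, using plain Python dictionaries in place of a real Cassandra client. The names `index_cf`, `add_user_to_index`, `lookup_uid`, and `names_in_range` are made up for illustration, and the explicit `sorted` call only mimics the fact that Cassandra keeps the columns of a row ordered by name.

```python
import uuid

# Stand-in for an index ColumnFamily: row key -> {column name: column value}.
# This is an in-memory model, not a Cassandra client API.
index_cf = {"user_by_name": {}}

def add_user_to_index(name, uid):
    """Insert one column into the index row: column name = user name, value = UUID."""
    index_cf["user_by_name"][name] = uid

def lookup_uid(name):
    """Return the UUID stored under the given user name, or None if absent."""
    return index_cf["user_by_name"].get(name)

def names_in_range(start, finish):
    """Mimic a column slice: names between start and finish, in column order."""
    return [n for n in sorted(index_cf["user_by_name"]) if start <= n <= finish]

alice_id = uuid.uuid4()
add_user_to_index("alice", alice_id)
add_user_to_index("bob", uuid.uuid4())
add_user_to_index("carol", uuid.uuid4())
```

With this layout, a lookup by name is one column read from one row, and a range of names is one slice over the same row.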

What do you want to do with the users partitioned by date? Sorted by creation 
date? Login date?

For the cascading deletes: a database would have to remove a record from its 
index, too.
In Cassandra you would have to do it yourself..
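
As a minimal sketch of doing it yourself, assuming the same in-memory stand-ins rather than a real client (`users`, `user_by_name`, `create_user`, and `remove_user` are hypothetical names): the delete reads the user row first so it knows which index column to remove.

```python
# Stand-ins for two ColumnFamilies (hypothetical layout, not a real client API).
users = {}          # user UUID -> {"name": ...}
user_by_name = {}   # the single index row: user name -> user UUID

def create_user(uid, name):
    """Write the user row and its index column together."""
    users[uid] = {"name": name}
    user_by_name[name] = uid

def remove_user(uid):
    """Delete the user row and, by hand, the index column that points at it."""
    row = users.pop(uid, None)
    if row is not None:
        user_by_name.pop(row["name"], None)

create_user("uid-1", "thomas")
create_user("uid-2", "christian")
remove_user("uid-1")
```

In a real cluster the two deletes are separate mutations, so the application also has to tolerate an index column that briefly points at a missing user.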

Christian

On Jun 19, 2010, at 3:27 AM, Thomas Heller wrote:

 Howdy!
 
 So, last week I finally got around to playing with Cassandra. After a
 while I understood the basics. To test this assumption I started
 working on my own Client implementation since Learning-by-doing is
 what I do and existing Ruby Clients (which are awesome) already
 abstracted too much for me to really grasp what was going on. Java is
 not really my thing (anymore) so I began with the Thrift API and Ruby.
 
 Anyways back to Topic.
 
 This library is now available at:
 http://github.com/thheller/greek_architect
 
 Since I have virtually no experience with Cassandra (but plenty with
 SQL) I started with the first use-case which I have programmed a bunch
 of times before. User Management. I build websites which are used by
 other people, so I need to store them somewhere.
 
 Step #1: Creating Users and persisting them in Cassandra
 
 Example here:
 http://github.com/thheller/greek_architect/blob/master/spec/examples/user_create_spec.rb
 
 I hope my rspec-style documentation doesn't confuse too many people
 since I already have a gazillion questions for this simple, but also
 VERY common use-case. Since a question is best asked with a concrete
 example to refer to, here goes my first one:
 
 Would any of you veterans build what I built the way I did? (referring
 to the cassandra design, not the ruby client)
 
 I insert Users with UUID keys into one ColumnFamily. I then index them
 by creating a row in another ColumnFamily using the Name as Key and
 then adding one column holding a reference to the User UUID. I also
 insert a reference into another ColumnFamily holding a List of Users
 partitioned by Date.
 
 I'm really unsure about the index design, since they don't get updated
 when a User row is removed. I could hook into the remove call (like I
 did into mutations) and cascade the deletes where needed, but 10+
 years of SQL always want to tell me I'm crazy for doing this stuff!
 
 I'd really appreciate some feedback.
 
 Cheers,
 Thomas



Re: Occasional 10s Timeouts on Read

2010-06-19 Thread AJ Slater
I shall do just that. I did a bunch of tests this morning and the
situation appears to be this:

I have three nodes A, B and C, with RF=2. I understand now why this
issue wasn't apparent with RF=3.
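
A simplified model of why RF=3 would hide this, presumably: on three nodes with RF=3 every node holds every key, so the contacted node can always read locally, while with RF=2 each key is missing from one node and that node has to ask a peer. The ring tokens, node names, and the `key_token` and `replicas` helpers below are assumptions for illustration; this is not Cassandra's actual placement code.

```python
import bisect
import hashlib

# Hypothetical ring: node name -> token. Real tokens come from InitialToken.
RING = {"A": 0, "B": 2**127 // 3, "C": 2 * (2**127 // 3)}

def key_token(key):
    """Simplified RandomPartitioner-style token: MD5 of the key, folded into the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**127

def replicas(key, rf):
    """Walk clockwise from the first node whose token is >= the key's token."""
    nodes = sorted(RING, key=RING.get)
    tokens = [RING[n] for n in nodes]
    start = bisect.bisect_left(tokens, key_token(key)) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

keys = ["4c43228354b38f14a1a015dd722cdf4b", "52e86817a577f75e545cdecd174d8b17"]
for k in keys:
    print(k, "RF=3:", replicas(k, 3), "RF=2:", replicas(k, 2))
```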

If there are regular internode column requests going on (e.g. I set up
a pinger to get remote columns), the cluster functions normally.
However, if no internode column requests happen for a few hours (3
hours is the minimum I've seen, but sometimes it takes longer), things
go wrong. Using node A as the point of contact from the client, all
columns that live on A are returned in a timely fashion, but for
columns that only live on B & C, the retrieval times out, with this in
the log:

INFO 13:13:28,345 error writing to /10.33.3.20

No requests for replicas or consistency checks are seen in the logs of
B & C at this time. Using 'nodetool ring' from each of the three nodes
shows all nodes as Up. Telnet from A to B on port 7000 connects.
Tcpdump logs suggest, at first glance, that gossip communication,
perhaps heartbeats, is proceeding normally, but I haven't really
analyzed them.

Fifteen minutes later, the cluster decided to behave normally again.
Everyone talks to each other like buddies and delivers columns fast and
regularly.
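
A pinger of the kind mentioned above could be as small as the following sketch. `run_pinger` and its parameters are hypothetical, and `fetch_remote_column` stands for whatever client call reads a column that lives on another node; it is not a real Cassandra API.

```python
import time

def run_pinger(fetch_remote_column, interval_s=60.0, rounds=3, sleep=time.sleep):
    """Call fetch_remote_column every interval_s seconds and record the outcomes."""
    results = []
    for i in range(rounds):
        try:
            fetch_remote_column()
            results.append("ok")
        except Exception as exc:  # a read timeout would surface here
            results.append("failed: %s" % exc)
        if i < rounds - 1:
            sleep(interval_s)
    return results

# Stub read, so the sketch runs without a cluster.
print(run_pinger(lambda: None, interval_s=0, rounds=2))
```

Keeping the results makes it possible to see when the first failure after an idle period happens, instead of only masking the problem.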

This is really looking like a Cassandra bug. I'll report back with my
TRACE log later and I expect I'll be opening a ticket. The confidence
level of my employer in my Cassandra solution to their petabyte data
storage project is... uh... well... it could be better.

AJ


On Fri, Jun 18, 2010 at 8:16 PM, Jonathan Ellis jbel...@gmail.com wrote:
 set log level to TRACE and see if the OutboundTcpConnection is going
 bad.  that would explain the message never arriving.

 On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater a...@zuno.com wrote:
 To summarize:

 If a request for a column comes in *after a period of several hours
 with no requests*, then the node servicing the request hangs while
 looking for its peer rather than servicing the request like it should.
 It then throws either a TimedOutException or a (wrong)
 NotFoundException.

 And it doesn't appear to actually send the message it says it does to
 its peer. Or at least its peer doesn't report the request being
 received.

 And then the situation magically clears up after approximately 2 minutes.

 However, if the idle period never occurs, then the problem does not
 manifest. If I run a cron job with wget against my server every
 minute, I do not see the problem.

 I'll be looking at some tcpdump logs to see if i can suss out what's
 really happening, and perhaps file this as a bug. The several hours
 between reproducible events makes this whole thing aggravating for
 detection, debugging and I'll assume, fixing, if it is indeed a
 cassandra problem.

 It was suggested on IRC that it may be my network. But gossip is
 continually sending heartbeats and nodetool and the logs show the
 nodes as up and available. If my network was flaking out I'd think it
 would be dropping heartbeats and I'd see that.

 AJ

 On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater a...@zuno.com wrote:
 These are physical machines.

 storage-conf.xml.fs03 is here:

 http://pastebin.com/weL41NB1

 Diffs from that for the other two storage-confs are inline here:

 a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
 185c185

 >   <InitialToken>71603818521973537678586548668074777838</InitialToken>
 229c229
 <   <ListenAddress>10.33.2.70</ListenAddress>
 ---
 >   <ListenAddress>10.33.3.10</ListenAddress>
 241c241
 <   <ThriftAddress>10.33.2.70</ThriftAddress>
 ---
 >   <ThriftAddress>10.33.3.10</ThriftAddress>
 341c341
 <   <ConcurrentReads>16</ConcurrentReads>
 ---
 >   <ConcurrentReads>4</ConcurrentReads>


 a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
 185c185
 <   <InitialToken>0</InitialToken>
 ---
 >   <InitialToken>120215585224964746744782921158327379306</InitialToken>
 206d205
 <       <Seed>10.33.3.20</Seed>
 229c228
 <   <ListenAddress>10.33.2.70</ListenAddress>
 ---
 >   <ListenAddress>10.33.3.20</ListenAddress>
 241c240
 <   <ThriftAddress>10.33.2.70</ThriftAddress>
 ---
 >   <ThriftAddress>10.33.3.20</ThriftAddress>
 341c340
 <   <ConcurrentReads>16</ConcurrentReads>
 ---
 >   <ConcurrentReads>4</ConcurrentReads>


 Thank you for your attention,

 AJ


 On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black b...@b3k.us wrote:
 Are these physical machines or virtuals?  Did you post your
 cassandra.in.sh and storage-conf.xml someplace?

 On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater a...@zuno.com wrote:
 Total data size in the entire cluster is about twenty 12k images. With
 no other load on the system. I just ask for one column and I get these
 timeouts. Performing multiple gets on the columns leads to multiple
 timeouts for a period of a few seconds or minutes and then the
 situation magically resolves itself and response times are down to
 single digit milliseconds for a column get.

 On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater a...@zuno.com wrote:
 Cassandra 0.6.2 from the apache 

Re: Learning-by-doing (also announcing a new Ruby Client Codename: Greek Architect)

2010-06-19 Thread Ryan King
On Sat, Jun 19, 2010 at 9:30 AM, Christian van der Leeden
christian.vanderlee...@googlemail.com wrote:
 Hi Thomas,

        did you look at cassandra gem from twitter (fauna/cassandra) on github?
 They also use the thrift_client and already have the basic cassandra API 
 accessible.

 I'm also using ruby with cassandra and still need to find a slick way to do 
 the inserts
 and when to update the indexes.

Have you looked at CassandraObject for that?
http://github.com/nzkoz/cassandra_object

-ryan


 If I understood correctly you want to look up a user by name. And you want to 
 have
 the name as row keys where you would find the UID with which you could look 
 up your user?

 From my understanding I would put the index into one row, name it e.g. 
 user_by_name,
 and insert columns where the column name is the user name and the value of 
 the column
 would be the UUID of the user. Cassandra creates the index on the column 
 names. You can
 efficiently insert new values into this row.

 What do you want to do with the users partitioned by date? Sorted by creation 
 date? Login date?

 For the cascading deletes: a database would have to remove a record from its 
 index, too.
 In Cassandra you would have to do it yourself..

 Christian

 On Jun 19, 2010, at 3:27 AM, Thomas Heller wrote:

 Howdy!

 So, last week I finally got around to playing with Cassandra. After a
 while I understood the basics. To test this assumption I started
 working on my own Client implementation since Learning-by-doing is
 what I do and existing Ruby Clients (which are awesome) already
 abstracted too much for me to really grasp what was going on. Java is
 not really my thing (anymore) so I began with the Thrift API and Ruby.

 Anyways back to Topic.

 This library is now available at:
 http://github.com/thheller/greek_architect

 Since I have virtually no experience with Cassandra (but plenty with
 SQL) I started with the first use-case which I have programmed a bunch
 of times before. User Management. I build websites which are used by
 other people, so I need to store them somewhere.

 Step #1: Creating Users and persisting them in Cassandra

 Example here:
 http://github.com/thheller/greek_architect/blob/master/spec/examples/user_create_spec.rb

 I hope my rspec-style documentation doesn't confuse too many people
 since I already have a gazillion questions for this simple, but also
 VERY common use-case. Since a question is best asked with a concrete
 example to refer to, here goes my first one:

 Would any of you veterans build what I built the way I did? (referring
 to the cassandra design, not the ruby client)

 I insert Users with UUID keys into one ColumnFamily. I then index them
 by creating a row in another ColumnFamily using the Name as Key and
 then adding one column holding a reference to the User UUID. I also
 insert a reference into another ColumnFamily holding a List of Users
 partitioned by Date.

 I'm really unsure about the index design, since they don't get updated
 when a User row is removed. I could hook into the remove call (like I
 did into mutations) and cascade the deletes where needed, but 10+
 years of SQL always want to tell me I'm crazy for doing this stuff!

 I'd really appreciate some feedback.

 Cheers,
 Thomas




Re: Occasional 10s Timeouts on Read

2010-06-19 Thread Peter Schuller
 TRACE 14:42:06,248 unable to connect to /10.33.3.20
 java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)

So that's interesting since it is a clear failure that comes from the
operating system and indicates something which can be observed outside
of cassandra using system tools. Presumably either cassandra is
somehow connecting to the wrong port, or this is a
firewalling/os/network issue, or the 'other' cassandra is not
listening on the port. Using tcpdump/netstat -nlp should narrow that
down.

Is it possible connections only succeed in one direction for example?
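
One way to check each direction separately, as a sketch: run a small TCP probe on every node against the storage port of the others (port 7000 is the one mentioned earlier in this thread). `can_connect` is a hypothetical helper; it only tests whether a TCP connection can be opened and says nothing about what Cassandra does with it.

```python
import socket

def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers connection refused, unreachable, and timeouts
        return False
```

For example, running `can_connect("10.33.3.20", 7000)` on 10.33.2.70 and `can_connect("10.33.2.70", 7000)` on 10.33.3.20 would show whether only one direction fails.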

-- 
/ Peter Schuller


Re: Occasional 10s Timeouts on Read

2010-06-19 Thread AJ Slater
The only indication I have that cassandra realized something was wrong
during this period was this INFO message:

10.33.2.70:/var/log/cassandra/output.log

DEBUG 20:00:35,841 get_slice
DEBUG 20:00:35,841 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='4c43228354b38f14a1a015dd722cdf4b', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
DEBUG 20:00:35,841 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='4c43228354b38f14a1a015dd722cdf4b', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 60999@/10.33.3.10
INFO 20:00:35,842 error writing to /10.33.3.10
TRACE 20:00:36,267 Received a GossipDigestSynMessage from /10.33.3.10
TRACE 20:00:36,267 reporting /10.33.3.10
TRACE 20:00:36,267 reporting /10.33.3.10


Where it notes there's an error. The next read I did at 20:02, while
writing my last mail to this list, succeeded.
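
To pull entries like that INFO line out of a long TRACE log, a small filter along these lines may help. It is a sketch that assumes the `LEVEL HH:MM:SS,mmm message` layout seen in the excerpt above; `LINE_RE`, `parse_line`, and `write_errors` are names invented here.

```python
import re

# Assumed layout, based only on the lines quoted in this mail.
LINE_RE = re.compile(
    r"^\s*(TRACE|DEBUG|INFO|WARN|ERROR)\s+(\d{2}:\d{2}:\d{2},\d{3})\s+(.*)$"
)
PREFIX = "error writing to "

def parse_line(line):
    """Split a log line into (level, time, message), or return None."""
    m = LINE_RE.match(line)
    return m.groups() if m else None

def write_errors(lines):
    """Return (time, peer address) for each 'error writing to' entry."""
    out = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2].startswith(PREFIX):
            out.append((parsed[1], parsed[2][len(PREFIX):].lstrip("/")))
    return out

sample = [
    "DEBUG 20:00:35,841 get_slice",
    "INFO 20:00:35,842 error writing to /10.33.3.10",
    "TRACE 20:00:36,267 Received a GossipDigestSynMessage from /10.33.3.10",
]
print(write_errors(sample))
```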

So, it's timing out, but all the while sending heartbeats and
GossipDigestSyns and Acks back and forth and maybe not really querying
its peers when it should, or timing out trying to do so. When it
finally realizes there's an error, it resets something? And then we're
back in business?

I'm going to be offline for 48 hours.

AJ

On Sat, Jun 19, 2010 at 8:09 PM, AJ Slater a...@zuno.com wrote:
 Agreed. But those connection errors were happening at a sort of random
 time. Not the time when I was seeing the problem. Now I am seeing the
 problem and here are some logs without ConnectionExceptions.

 Here we're asking 10.33.2.70 for key: 52e86817a577f75e545cdecd174d8b17
 which resides only on 10.33.3.10 and 10.33.3.20. Note all the
 apparently normal communication. Except that no mention of a request
 for key 52e86817a577f75e545cdecd174d8b17 ever comes up in 10.33.3.10's
 log, despite 10.33.2.70 saying it was reading from 10.33.3.10.

 The problem resolved itself again at 20:02, maybe 20 minutes later,
 when all of a sudden I got my columns returned in milliseconds and I
 saw good stuff like:

 DEBUG 20:06:35,238 Reading consistency digest for
 52e86817a577f75e545cdecd174d8b17 from 59...@[/10.33.3.10, /10.33.3.20]

 Here's some logs from the problem period

 10.33.2.70:/var/log/cassandra/output.log

 DEBUG 19:42:03,230 get_slice
 DEBUG 19:42:03,231 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='52e86817a577f75e545cdecd174d8b17', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
 DEBUG 19:42:03,231 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='52e86817a577f75e545cdecd174d8b17', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) from 57663@/10.33.3.10
 TRACE 19:42:03,619 Gossip Digests are : /10.33.2.70:1276981671:20386 /10.33.3.10:1276983719:18303 /10.33.3.20:1276983726:18295 /10.33.2.70:1276981671:20386
 TRACE 19:42:03,619 Sending a GossipDigestSynMessage to /10.33.3.20 ...
 TRACE 19:42:03,619 Performing status check ...
 TRACE 19:42:03,619 PHI for /10.33.3.10 : 0.95343619570936
 TRACE 19:42:03,619 PHI for /10.33.3.20 : 0.8635116192106644
 TRACE 19:42:03,621 Received a GossipDigestAckMessage from /10.33.3.20
 TRACE 19:42:03,621 reporting /10.33.3.10
 TRACE 19:42:03,621 reporting /10.33.3.20
 TRACE 19:42:03,621 marking as alive /10.33.3.10
 TRACE 19:42:03,621 Updating heartbeat state version to 18304 from 18303 for /10.33.3.10 ...
 TRACE 19:42:03,621 marking as alive /10.33.3.20
 TRACE 19:42:03,621 Updating heartbeat state version to 18296 from 18295 for /10.33.3.20 ...
 TRACE 19:42:03,622 Scanning for state greater than 20385 for /10.33.2.70
 TRACE 19:42:03,622 Scanning for state greater than 20385 for /10.33.2.70
 TRACE 19:42:03,622 Sending a GossipDigestAck2Message to /10.33.3.20
 TRACE 19:42:04,172 Received a GossipDigestSynMessage from /10.33.3.10
 TRACE 19:42:04,172 reporting /10.33.3.10
 TRACE 19:42:04,172 reporting /10.33.3.10
 TRACE 19:42:04,172 Scanning for state greater than 20385 for /10.33.2.70
 TRACE 19:42:04,172  Size of GossipDigestAckMessage is 52
 TRACE 19:42:04,172 Sending a GossipDigestAckMessage to /10.33.3.10
 TRACE 19:42:04,174 Received a GossipDigestAck2Message from /10.33.3.10
 TRACE 19:42:04,174 reporting /10.33.3.10
 TRACE 19:42:04,174 marking as alive /10.33.3.10
 TRACE 19:42:04,174 Updating heartbeat state version to 18305 from 18304 for /10.33.3.10 ...


 10.33.3.10:/var/log/cassandra/output.log

 TRACE 19:42:03,174 Sending a GossipDigestSynMessage to /10.33.3.20 ...
 TRACE 19:42:03,174 Performing status check ...
 TRACE 19:42:03,174 PHI for /10.33.2.70 : 1.3363463863632534
 TRACE 19:42:03,174 PHI for /10.33.3.20 : 0.9297110501502452
 TRACE 19:42:03,175