read-update all columns access pattern

2012-05-14 Thread Marcel Steinbach
We have a read- and update-heavy access pattern. Each request to
Cassandra goes like this:

1. read all columns of row
2. do something with row
3. write all columns of row

The columns we use are always the same, e.g. always (c1, c2, c3); c2 and
c3 have a TTL.

Since we always read c1, c2, c3 and then overwrite c1, c2, c3, I found
out via https://issues.apache.org/jira/browse/CASSANDRA-2498 that
specifying which columns I want to read prevents Cassandra from
looking into all historic SSTables.
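
For illustration, a minimal sketch of that read-then-overwrite cycle with an explicit column list, using the pycassa Python client; the keyspace, CF, key names and the TTL value are placeholders, not taken from our setup:

    # Sketch only: 'MyKS'/'MyCF' and the TTL are illustrative.
    import pycassa

    pool = pycassa.ConnectionPool('MyKS', server_list=['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    def process(row_key):
        # 1. read only the columns we actually use, so Cassandra does not have
        #    to look into all historic SSTables (see CASSANDRA-2498 above)
        row = cf.get(row_key, columns=['c1', 'c2', 'c3'])

        # 2. do something with the row
        new_c1 = row['c1'] + '!'

        # 3. overwrite all columns; only c2 and c3 carry a TTL, so they go in a
        #    separate insert (pycassa applies the ttl to the whole call)
        cf.insert(row_key, {'c1': new_c1})
        cf.insert(row_key, {'c2': row['c2'], 'c3': row['c3']}, ttl=86400)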

However, there is also the option of switching to leveled
compaction for read/update-intensive workloads, right? How would you
compare the two approaches?

Should we settle for the access-pattern change, switch to leveled
compaction, or do both?

Thanks!
-- Marcel


Re: upgrade from 1.0.7 to 1.0.8

2012-03-11 Thread Marcel Steinbach
Check this out:

http://www.datastax.com/docs/1.0/install/upgrading#upgrading-between-minor-releases-of-cassandra-1-0-x

Cheers

On 11.03.2012, at 07:42, Tamar Fraenkel ta...@tok-media.com wrote:

Hi!

I want to experiment with upgrading. Does anyone have a good link on how to
upgrade Cassandra?

Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956


Re: data model question

2012-03-11 Thread Marcel Steinbach
Either you do that, or you could think about using a secondary index on the
Facebook id column in your primary CF.

See http://www.datastax.com/docs/1.0/ddl/indexes
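
Roughly, the secondary-index route could look like this with the pycassa Python client (keyspace, CF and column names are assumptions for the sketch, not taken from the thread):

    # Sketch only: assumes a 'users' CF with a secondary index on its 'facebook_id' column.
    import pycassa
    from pycassa.index import create_index_expression, create_index_clause

    pool = pycassa.ConnectionPool('MyKS', server_list=['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'users')

    def user_key_for_fbid(fb_id):
        expr = create_index_expression('facebook_id', fb_id)  # equality match
        clause = create_index_clause([expr], count=1)
        for key, columns in users.get_indexed_slices(clause):
            return key  # the internal UUID row key
        return None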

Cheers

On 11.03.2012, at 09:51, Tamar Fraenkel ta...@tok-media.com wrote:

Hi!
I need some advice:
I have a user CF, which has a UUID key that is my internal user id.
One of the columns is the facebook_id of the user (if it exists).

I need the reverse mapping from facebook_id to my UUID.
My intention is to add a CF for the mapping from Facebook id to my id:

user_by_fbid = {
  // key is fb Id, column name is our User Id, value is empty
  13101876963: {
f94f6b20-161a-4f7e-995f-0466c62a1b6b : 
  }
}

Does this make sense?
This CF will be used whenever a user logs in through Facebook to retrieve
the internal id.
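
For comparison, a minimal sketch of how such a lookup CF might be written and read with pycassa (keyspace name and error handling are placeholders; the CF and values follow the example above):

    # Sketch only: 'user_by_fbid' maps a Facebook id (row key) to the internal
    # user id (single column name, empty value), as described above.
    import pycassa

    pool = pycassa.ConnectionPool('MyKS', server_list=['localhost:9160'])
    fbid_cf = pycassa.ColumnFamily(pool, 'user_by_fbid')

    # maintain the mapping whenever a facebook_id is attached to a user
    fbid_cf.insert('13101876963', {'f94f6b20-161a-4f7e-995f-0466c62a1b6b': ''})

    # on Facebook login: the first (and only) column name is the internal id
    internal_id = next(iter(fbid_cf.get('13101876963', column_count=1)))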
Thanks

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956


Get all keys from the cluster

2012-01-21 Thread Marcel Steinbach
We're running an 8-node cluster with different CFs for different applications.
One of the applications uses 1.5TB out of 1.8TB in total, but only because we
started out without a deletion mechanism and implemented one later on. So there is
probably a high amount of old data in there that we don't even use anymore.

Now we want to delete that data. To know which rows we may delete, we have to
look them up in a SQL database. If a key is not in there anymore, we may delete that
row in Cassandra, too.

This basically means we have to iterate over all the rows in that CF. This
kind of begs for Hadoop, but that doesn't seem to be an option currently. I tried.

So we figured we could run over the SSTable files (maybe only the index),
check the keys against the MySQL database, and later run the deletes on the cluster. This
way, we could iterate on each node in parallel.
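
A sketch of what that per-node pass might look like, assuming a plain-text dump of row keys (one per line, e.g. decoded output of the sstablekeys tool) and a hypothetical MySQL table live_keys(id) holding the keys we want to keep:

    # Sketch only: every key not found in the SQL table gets deleted from the CF.
    import MySQLdb
    import pycassa

    db = MySQLdb.connect(host='sqlhost', user='user', passwd='secret', db='mydb')
    cur = db.cursor()

    pool = pycassa.ConnectionPool('MyKS', server_list=['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    with open('keys.txt') as keys_file:
        for line in keys_file:
            key = line.strip()
            cur.execute("SELECT 1 FROM live_keys WHERE id = %s", (key,))
            if cur.fetchone() is None:
                cf.remove(key)  # no longer referenced in SQL -> delete the row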

Does that sound reasonable? Any pros/cons, maybe a killer argument to use 
hadoop for that?

Cheers
Marcel

Re: Get all keys from the cluster

2012-01-21 Thread Marcel Steinbach
Thanks for your suggestions, Eric!

 One of the application uses 1.5TB out of 1.8TB

I'm sorry, maybe that statement was slightly ambiguous. I meant to say that one
application uses 1.5TB, while the others use 300GB, totalling 1.8TB of data.
Our total disk capacity, however, is about 7TB, so we're still far from
running out of disk space.

 Is there any way that you could do that lookup in reverse where you pull the 
 records from your SQL database, figure out which keys aren't necessary, and 
 then delete any unnecessary keys that may or may not exist in cassandra? 
Unfortunately, that won't work, since the SQL DB only contains the keys
that we want to _keep_ in Cassandra.

 If that's not a possibility, then what about creating the same Cassandra 
 schema in a different keyspace and copying all the relevant records from the 
 current keyspace to the new keyspace using the SQL database records as a 
 basis for what is actually relevant within the new keyspace.  
I like that idea. So instead of iterating over all Cassandra rows, I would
iterate over the SQL DB, which would indeed save me a lot of IO. However, rows
inserted into my CF while iterating over the SQL DB might not be copied into
the new keyspace. But maybe we could arrange to do that
during low-demand hours to minimize the amount of new inserts, and additionally
run the copy a second time with a select on newly inserted SQL rows. So we'll
probably go with that.
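
A sketch of that SQL-driven copy pass (keyspace, CF and table names are placeholders; this naive version does not preserve TTLs or original timestamps):

    # Sketch only: copy every row whose key is still present in SQL from the
    # old keyspace into a new keyspace with the same schema.
    import MySQLdb
    import pycassa

    old_cf = pycassa.ColumnFamily(pycassa.ConnectionPool('OldKS'), 'MyCF')
    new_cf = pycassa.ColumnFamily(pycassa.ConnectionPool('NewKS'), 'MyCF')

    db = MySQLdb.connect(host='sqlhost', user='user', passwd='secret', db='mydb')
    cur = db.cursor()
    cur.execute("SELECT id FROM live_keys")

    for (key,) in cur:
        try:
            new_cf.insert(key, old_cf.get(key))  # columns up to pycassa's default count
        except pycassa.NotFoundException:
            pass  # key exists in SQL but (no longer) in Cassandra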

Thanks again for your help!

Cheers
Marcel

On 21.01.2012, at 11:52, Eric Czech wrote:

 Is there any way that you could do that lookup in reverse where you pull the 
 records from your SQL database, figure out which keys aren't necessary, and 
 then delete any unnecessary keys that may or may not exist in cassandra?  
 
 If that's not a possibility, then what about creating the same Cassandra 
 schema in a different keyspace and copying all the relevant records from the 
 current keyspace to the new keyspace using the SQL database records as a 
 basis for what is actually relevant within the new keyspace.  If you could 
 perform that transfer, then you could just delete the old 1.5TB keyspace 
 altogether, leaving only the data you need.  If that sort of duplication 
 would put you over the 1.8TB limit during the transfer, then maybe you could 
 consider CF compression upfront.
 
 Short of that, I can tell from experience that doing these sorts of left-join
 deletes from Cassandra to SQL really sucks.  We have had to resort to
 using Hadoop to do this, but since our Hadoop/Cassandra clusters are much
 larger than our single SQL instances, keeping all the Hadoop processes from
 basically DDoSing our SQL servers, while still making the process faster
 than Thrift iterations over all the rows in Cassandra (via custom programs),
 hasn't been a convincing solution.
 
 I'd say that the first solution I proposed is definitely the best, but also
 the most unrealistic.  If that's really not a possibility for you, then I'd
 seriously look at trying to make my second suggestion work, even if it means
 bringing up new hardware or increasing the capacity of existing resources.
 That second suggestion also has the added benefit of likely minimizing I/O,
 since it's the only solution that doesn't require reading or deleting any of
 the unnecessary data (beyond wholesale keyspace or CF deletions), assuming
 that the actually relevant portion of your data is significantly less than
 1.5TB.
 
 I hope that helps!
 
 And in the future, you should really try to avoid letting your data size get 
 beyond 40 - 50 % of your actual on-disk capacity.  Let me know if anyone in 
 the community disagrees, but I'd say you're about 600 GB past the point at 
 which you have a lot of easy outs -- but I hope you find one anyways!
 
 
 On Sat, Jan 21, 2012 at 2:45 AM, Marcel Steinbach marcel.steinb...@chors.de 
 wrote:
 We're running an 8-node cluster with different CFs for different applications.
 One of the applications uses 1.5TB out of 1.8TB in total, but only because we
 started out without a deletion mechanism and implemented one later on. So there
 is probably a high amount of old data in there that we don't even use
 anymore.
 
 Now we want to delete that data. To know which rows we may delete, we have
 to look them up in a SQL database. If a key is not in there anymore, we may delete
 that row in Cassandra, too.
 
 This basically means we have to iterate over all the rows in that CF. This
 kind of begs for Hadoop, but that doesn't seem to be an option currently. I
 tried.
 
 So we figured we could run over the SSTable files (maybe only the index),
 check the keys against the MySQL database, and later run the deletes on the cluster. This
 way, we could iterate on each node in parallel.
 
 Does that sound reasonable? Any pros/cons, maybe a killer argument to use 
 hadoop for that?
 
 Cheers
 Marcel

Re: Unbalanced cluster with RandomPartitioner

2012-01-21 Thread Marcel Steinbach
I thought about our issue again, and I think maybe describeOwnership
should take into account whether a token is outside the partitioner's maximum token
range.

To recap our problem: we had tokens that were 12.5% of the token
range 2**127 apart; however, we had an offset on each token, which moved the
cluster's token range above 2**127. That resulted in two nodes getting few
or no primary replicas.

Afaik, the partitioner itself describes the key ownership in the ring, but it
didn't take into account that we had left its maximum key range.

Of course, it is silly and not very likely that users make that mistake;
however, we did, and it took me quite some time to figure that out (maybe
also because it wasn't me who set up the cluster).

To carry it to the extreme, you could construct a cluster of n nodes with all
tokens greater than 2**127; the ownership description would show an ownership of
1/n each, but all data would go to the node with the lowest token (given RP and
RF=1).
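
To make that concrete, a small arithmetic sketch (no Cassandra involved) contrasting the reported 1/n ownership with where keys would actually land:

    # Sketch only: n node tokens, evenly spaced but all at or above 2**127,
    # i.e. outside the RandomPartitioner's 0..2**127 range.
    RANGE = 2 ** 127
    n = 4
    tokens = [RANGE + i * RANGE // n for i in range(n)]

    # describeOwnership-style math: (token - previous token) / range, with wrap-around
    prev = tokens[-1]
    for t in tokens:
        share = ((t - prev) % RANGE) / float(RANGE)
        print('token %d reported ownership: %.1f%%' % (t, share * 100))  # 25% each
        prev = t

    # actual routing: a key's token is always < 2**127, so the first node token
    # >= it is always the lowest node token -> one node gets all the data
    h = 2 ** 126  # any value in [0, 2**127)
    owner = min(t for t in tokens if t >= h)
    print('key token %d -> node token %d' % (h, owner))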

I think it is wrong to calculate the ownership by subtracting the previous
token from the current token and dividing it by the maximum token without
acknowledging that we might already be out of bounds.

Cheers 
Marcel

On 20.01.2012, at 16:28, Marcel Steinbach wrote:

 Thanks for all the responses!
 
 I found our problem:
 Using the Random Partitioner, the key range is from 0..2**127. When we added
 nodes, we generated the tokens and, out of convenience, added an offset to
 the tokens because the move was easier like that.
 
 However, we did not apply the modulo 2**127 to the last two tokens, so
 they were outside the RP's key range.
 Moving the last two tokens to their value mod 2**127 will resolve the problem.
 
 Cheers,
 Marcel
 
 On 20.01.2012, at 10:32, Marcel Steinbach wrote:
 
 On 19.01.2012, at 20:15, Narendra Sharma wrote:
 I believe you need to move the nodes on the ring. What was the load on the 
 nodes before you added 5 new nodes? Its just that you are getting data in 
 certain token range more than others.
 With three nodes, it was also imbalanced. 
 
 What I don't understand is, why the md5 sums would generate such massive hot 
 spots. 
 
 Most of our keys look like that: 
 00013270494972450001234567
 with the first 16 digits being a timestamp of one of our application 
 server's startup times, and the last 10 digits being sequentially generated 
 per user. 
 
 There may be a lot of keys that start with e.g. 0001327049497245  (or some 
 other time stamp). But I was under the impression that md5 doesn't bother 
 and generates uniform distribution?
 But then again, I know next to nothing about md5. Maybe someone else has a 
 better insight to the algorithm?
 
 However, we also use cfs with a date (mmdd) as key, as well as cfs 
 with uuids as keys. And those cfs in itself are not balanced either. E.g. 
 node 5 has 12 GB live space used in the cf the uuid as key, and node 8 only 
 428MB. 
 
 Cheers,
 Marcel
 
 
 On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach 
 marcel.steinb...@chors.de wrote:
 On 18.01.2012, at 02:19, Maki Watanabe wrote:
 Are there any significant difference of number of sstables on each nodes?
 No, no significant difference there. Actually, node 8 is among those with 
 more sstables but with the least load (20GB)
 
 On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
 Are you deleting data or using TTL's?  Expired/deleted data won't go away 
 until the sstable holding it is compacted.  So if compaction has happened 
 on some nodes, but not on others, you will see this.  The disparity is 
 pretty big 400Gb to 20GB, so this probably isn't the issue, but with our 
 data using TTL's if I run major compactions a couple times on that column 
 family it can shrink ~30%-40%.
 Yes, we do delete data. But I agree, the disparity is too big to blame only 
 the deletions. 
 
 Also, initially, we started out with 3 nodes and upgraded to 8 a few weeks 
 ago. After adding the node, we did
 compactions and cleanups and didn't have a balanced cluster. So that should 
 have removed outdated data, right?
 
 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:
 We are running regular repairs, so I don't think that's the problem.
 And the data dir sizes match approx. the load from the nodetool.
 Thanks for the advise, though.
 
 Our keys are digits only, and all contain a few zeros at the same
 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it
 would generate 'hotspots' for those kind of keys, right?
 
 On 17.01.2012, at 17:34, Mohit Anchlia wrote:
 
 Have you tried running repair first on each node? Also, verify using
 df -h on the data dirs
 
 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
 marcel.steinb...@chors.de wrote:
 
 Hi,
 
 
 we're using RP and have each node assigned the same amount of the token
 space. The cluster looks like that:
 
 
 Address Status State   Load    Owns    Token
 
 
 205648943402372032879374446248852460236
 
 1

Re: Unbalanced cluster with RandomPartitioner

2012-01-20 Thread Marcel Steinbach
On 19.01.2012, at 20:15, Narendra Sharma wrote:
 I believe you need to move the nodes on the ring. What was the load on the 
 nodes before you added 5 new nodes? Its just that you are getting data in 
 certain token range more than others.
With three nodes, it was also imbalanced. 

What I don't understand is why the md5 sums would generate such massive hot 
spots.

Most of our keys look like this: 
00013270494972450001234567
with the first 16 digits being a timestamp of one of our application server's 
startup times, and the last 10 digits being sequentially generated per user.

There may be a lot of keys that start with e.g. 0001327049497245 (or some 
other timestamp). But I was under the impression that md5 doesn't care and 
generates a uniform distribution?
But then again, I know next to nothing about md5. Maybe someone else has a 
better insight into the algorithm?
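
For what it's worth, a quick way to check that intuition is to hash a batch of keys sharing the timestamp prefix and bucket the resulting tokens (sketch; md5 mod 2**127 is only an approximation of RP's token, but the distribution question is the same):

    # Sketch only: keys share the 16-digit timestamp prefix and differ in the
    # last 10 (sequential) digits, like 00013270494972450001234567.
    import hashlib

    RANGE = 2 ** 127
    buckets = [0] * 8  # pretend the ring is split into 8 equal token ranges

    for seq in range(100000):
        key = '0001327049497245' + '%010d' % seq
        token = int(hashlib.md5(key.encode()).hexdigest(), 16) % RANGE
        buckets[token * 8 // RANGE] += 1

    print(buckets)  # comes out roughly equal, i.e. no prefix-induced hot spot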

However, we also use CFs with a date (mmdd) as key, as well as CFs with 
UUIDs as keys. And those CFs are not balanced in themselves either. E.g. node 5 has 
12 GB live space used in the CF with the UUID as key, and node 8 only 428MB.

Cheers,
Marcel

 
 On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach marcel.steinb...@chors.de 
 wrote:
 On 18.01.2012, at 02:19, Maki Watanabe wrote:
 Are there any significant difference of number of sstables on each nodes?
 No, no significant difference there. Actually, node 8 is among those with 
 more sstables but with the least load (20GB)
 
 On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
 Are you deleting data or using TTL's?  Expired/deleted data won't go away 
 until the sstable holding it is compacted.  So if compaction has happened on 
 some nodes, but not on others, you will see this.  The disparity is pretty 
 big 400Gb to 20GB, so this probably isn't the issue, but with our data using 
 TTL's if I run major compactions a couple times on that column family it can 
 shrink ~30%-40%.
 Yes, we do delete data. But I agree, the disparity is too big to blame only 
 the deletions. 
 
 Also, initially, we started out with 3 nodes and upgraded to 8 a few weeks 
 ago. After adding the node, we did
 compactions and cleanups and didn't have a balanced cluster. So that should 
 have removed outdated data, right?
 
 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:
 We are running regular repairs, so I don't think that's the problem.
 And the data dir sizes match approx. the load from the nodetool.
 Thanks for the advise, though.
 
 Our keys are digits only, and all contain a few zeros at the same
 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it
 would generate 'hotspots' for those kind of keys, right?
 
 On 17.01.2012, at 17:34, Mohit Anchlia wrote:
 
 Have you tried running repair first on each node? Also, verify using
 df -h on the data dirs
 
 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
 marcel.steinb...@chors.de wrote:
 
 Hi,
 
 
 we're using RP and have each node assigned the same amount of the token
 space. The cluster looks like that:
 
 
 Address Status State   Load    Owns    Token
 
 
 205648943402372032879374446248852460236
 
 1   Up Normal  310.83 GB   12.50%
  56775407874461455114148055497453867724
 
 2   Up Normal  470.24 GB   12.50%
  78043055807020109080608968461939380940
 
 3   Up Normal  271.57 GB   12.50%
  99310703739578763047069881426424894156
 
 4   Up Normal  282.61 GB   12.50%
  120578351672137417013530794390910407372
 
 5   Up Normal  248.76 GB   12.50%
  141845999604696070979991707355395920588
 
 6   Up Normal  164.12 GB   12.50%
  163113647537254724946452620319881433804
 
 7   Up Normal  76.23 GB12.50%
  184381295469813378912913533284366947020
 
 8   Up Normal  19.79 GB12.50%
  205648943402372032879374446248852460236
 
 
 I was under the impression, the RP would distribute the load more evenly.
 
 Our row sizes are 0,5-1 KB, hence, we don't store huge rows on a single
 node. Should we just move the nodes so that the load is more even
 distributed, or is there something off that needs to be fixed first?
 
 
 Thanks
 
 Marcel
 

Re: Unbalanced cluster with RandomPartitioner

2012-01-20 Thread Marcel Steinbach
Thanks for all the responses!

I found our problem:
Using the Random Partitioner, the key range is from 0..2**127. When we added 
nodes, we generated the tokens and, out of convenience, added an offset to the 
tokens because the move was easier like that.

However, we did not apply the modulo 2**127 to the last two tokens, so they 
were outside the RP's key range. 
Moving the last two tokens to their value mod 2**127 will resolve the problem.
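
For anyone hitting the same thing, a quick sketch that uses the tokens from the ring listing in this thread to spot the out-of-range ones and compute where to move them:

    # Sketch only: any token at or above 2**127 is outside the RandomPartitioner's
    # range and gets reduced mod 2**127 (here that affects the last two tokens).
    RANGE = 2 ** 127
    tokens = [
        56775407874461455114148055497453867724,
        78043055807020109080608968461939380940,
        99310703739578763047069881426424894156,
        120578351672137417013530794390910407372,
        141845999604696070979991707355395920588,
        163113647537254724946452620319881433804,
        184381295469813378912913533284366947020,
        205648943402372032879374446248852460236,
    ]

    for i, t in enumerate(tokens, start=1):
        if t >= RANGE:
            print('node %d: token %d is out of range, move to %d' % (i, t, t % RANGE))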

Cheers,
Marcel

On 20.01.2012, at 10:32, Marcel Steinbach wrote:

 On 19.01.2012, at 20:15, Narendra Sharma wrote:
 I believe you need to move the nodes on the ring. What was the load on the 
 nodes before you added 5 new nodes? Its just that you are getting data in 
 certain token range more than others.
 With three nodes, it was also imbalanced. 
 
 What I don't understand is, why the md5 sums would generate such massive hot 
 spots. 
 
 Most of our keys look like that: 
 00013270494972450001234567
 with the first 16 digits being a timestamp of one of our application server's 
 startup times, and the last 10 digits being sequentially generated per user. 
 
 There may be a lot of keys that start with e.g. 0001327049497245  (or some 
 other time stamp). But I was under the impression that md5 doesn't bother and 
 generates uniform distribution?
 But then again, I know next to nothing about md5. Maybe someone else has a 
 better insight to the algorithm?
 
 However, we also use cfs with a date (mmdd) as key, as well as cfs with 
 uuids as keys. And those cfs in itself are not balanced either. E.g. node 5 
 has 12 GB live space used in the cf the uuid as key, and node 8 only 428MB. 
 
 Cheers,
 Marcel
 
 
 On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach 
 marcel.steinb...@chors.de wrote:
 On 18.01.2012, at 02:19, Maki Watanabe wrote:
 Are there any significant difference of number of sstables on each nodes?
 No, no significant difference there. Actually, node 8 is among those with 
 more sstables but with the least load (20GB)
 
 On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
 Are you deleting data or using TTL's?  Expired/deleted data won't go away 
 until the sstable holding it is compacted.  So if compaction has happened 
 on some nodes, but not on others, you will see this.  The disparity is 
 pretty big 400Gb to 20GB, so this probably isn't the issue, but with our 
 data using TTL's if I run major compactions a couple times on that column 
 family it can shrink ~30%-40%.
 Yes, we do delete data. But I agree, the disparity is too big to blame only 
 the deletions. 
 
 Also, initially, we started out with 3 nodes and upgraded to 8 a few weeks 
 ago. After adding the node, we did
 compactions and cleanups and didn't have a balanced cluster. So that should 
 have removed outdated data, right?
 
 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:
 We are running regular repairs, so I don't think that's the problem.
 And the data dir sizes match approx. the load from the nodetool.
 Thanks for the advise, though.
 
 Our keys are digits only, and all contain a few zeros at the same
 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it
 would generate 'hotspots' for those kind of keys, right?
 
 On 17.01.2012, at 17:34, Mohit Anchlia wrote:
 
 Have you tried running repair first on each node? Also, verify using
 df -h on the data dirs
 
 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
 marcel.steinb...@chors.de wrote:
 
 Hi,
 
 
 we're using RP and have each node assigned the same amount of the token
 space. The cluster looks like that:
 
 
 Address Status State   Load    Owns    Token
 
 
 205648943402372032879374446248852460236
 
 1   Up Normal  310.83 GB   12.50%
 56775407874461455114148055497453867724
 
 2   Up Normal  470.24 GB   12.50%
 78043055807020109080608968461939380940
 
 3   Up Normal  271.57 GB   12.50%
 99310703739578763047069881426424894156
 
 4   Up Normal  282.61 GB   12.50%
 120578351672137417013530794390910407372
 
 5   Up Normal  248.76 GB   12.50%
 141845999604696070979991707355395920588
 
 6   Up Normal  164.12 GB   12.50%
 163113647537254724946452620319881433804
 
 7   Up Normal  76.23 GB12.50%
 184381295469813378912913533284366947020
 
 8   Up Normal  19.79 GB12.50%
 205648943402372032879374446248852460236
 
 
 I was under the impression, the RP would distribute the load more evenly.
 
 Our row sizes are 0,5-1 KB, hence, we don't store huge rows on a single
 node. Should we just move the nodes so that the load is more even
 distributed, or is there something off that needs to be fixed first?
 
 
 Thanks
 
 Marcel
 

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
On 18.01.2012, at 02:19, Maki Watanabe wrote:
 Are there any significant difference of number of sstables on each nodes?
No, no significant difference there. Actually, node 8 is among those with more 
sstables but with the least load (20GB)

On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
 Are you deleting data or using TTL's?  Expired/deleted data won't go away 
 until the sstable holding it is compacted.  So if compaction has happened on 
 some nodes, but not on others, you will see this.  The disparity is pretty 
 big 400Gb to 20GB, so this probably isn't the issue, but with our data using 
 TTL's if I run major compactions a couple times on that column family it can 
 shrink ~30%-40%.
Yes, we do delete data. But I agree, the disparity is too big to blame only the 
deletions.

Also, initially we started out with 3 nodes and upgraded to 8 a few weeks ago. 
After adding the nodes, we did
compactions and cleanups and still didn't have a balanced cluster. So that should 
have removed outdated data, right?

 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:
 We are running regular repairs, so I don't think that's the problem.
 And the data dir sizes match approx. the load from the nodetool.
 Thanks for the advise, though.
 
 Our keys are digits only, and all contain a few zeros at the same
 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it
 would generate 'hotspots' for those kind of keys, right?
 
 On 17.01.2012, at 17:34, Mohit Anchlia wrote:
 
 Have you tried running repair first on each node? Also, verify using
 df -h on the data dirs
 
 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
 marcel.steinb...@chors.de wrote:
 
 Hi,
 
 
 we're using RP and have each node assigned the same amount of the token
 space. The cluster looks like that:
 
 
 Address Status State   Load    Owns    Token
 
 
 205648943402372032879374446248852460236
 
 1   Up Normal  310.83 GB   12.50%
  56775407874461455114148055497453867724
 
 2   Up Normal  470.24 GB   12.50%
  78043055807020109080608968461939380940
 
 3   Up Normal  271.57 GB   12.50%
  99310703739578763047069881426424894156
 
 4   Up Normal  282.61 GB   12.50%
  120578351672137417013530794390910407372
 
 5   Up Normal  248.76 GB   12.50%
  141845999604696070979991707355395920588
 
 6   Up Normal  164.12 GB   12.50%
  163113647537254724946452620319881433804
 
 7   Up Normal  76.23 GB12.50%
  184381295469813378912913533284366947020
 
 8   Up Normal  19.79 GB12.50%
  205648943402372032879374446248852460236
 
 
 I was under the impression, the RP would distribute the load more evenly.
 
 Our row sizes are 0,5-1 KB, hence, we don't store huge rows on a single
 node. Should we just move the nodes so that the load is more even
 distributed, or is there something off that needs to be fixed first?
 
 
 Thanks
 
 Marcel
 
 
 
 
 
 
 -- 
 w3m



Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
2012/1/19 aaron morton aa...@thelastpickle.com:
 If you have performed any token moves the data will not be deleted until you
 run nodetool cleanup.
We did that after adding nodes to the cluster. And then, the cluster
wasn't balanced either.
Also, does the Load really account for dead data, or is it just live data?

 To get a baseline I would run nodetool compact to do major compaction and
 purge any tomb stones as others have said.
We will do that, but I doubt we have 450GB of tombstones on node 2...

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 18/01/2012, at 2:19 PM, Maki Watanabe wrote:

 Are there any significant difference of number of sstables on each nodes?

 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de:

 We are running regular repairs, so I don't think that's the problem.

 And the data dir sizes match approx. the load from the nodetool.

 Thanks for the advise, though.


 Our keys are digits only, and all contain a few zeros at the same

 offsets. I'm not that familiar with the md5 algorithm, but I doubt that it

 would generate 'hotspots' for those kind of keys, right?


 On 17.01.2012, at 17:34, Mohit Anchlia wrote:


 Have you tried running repair first on each node? Also, verify using

 df -h on the data dirs


 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach

 marcel.steinb...@chors.de wrote:


 Hi,



 we're using RP and have each node assigned the same amount of the token

 space. The cluster looks like that:



 Address         Status State   Load            Owns    Token



 205648943402372032879374446248852460236


 1       Up     Normal  310.83 GB       12.50%

  56775407874461455114148055497453867724


 2       Up     Normal  470.24 GB       12.50%

  78043055807020109080608968461939380940


 3       Up     Normal  271.57 GB       12.50%

  99310703739578763047069881426424894156


 4       Up     Normal  282.61 GB       12.50%

  120578351672137417013530794390910407372


 5       Up     Normal  248.76 GB       12.50%

  141845999604696070979991707355395920588


 6       Up     Normal  164.12 GB       12.50%

  163113647537254724946452620319881433804


 7       Up     Normal  76.23 GB        12.50%

  184381295469813378912913533284366947020


 8       Up     Normal  19.79 GB        12.50%

  205648943402372032879374446248852460236



 I was under the impression, the RP would distribute the load more evenly.


 Our row sizes are 0,5-1 KB, hence, we don't store huge rows on a single

 node. Should we just move the nodes so that the load is more even

 distributed, or is there something off that needs to be fixed first?



 Thanks


 Marcel








 --
 w3m




RecentReadLatencyHistogramMicros vs. latencies in client

2012-01-17 Thread Marcel Steinbach
Hi, 
we're running a 8 node cassandra-0.7.6 cluster, with avg. throughput of 5k 
reads/s and almost as much writes/s. The client API is pelops 1.1-0.7.x.

Latencies in the CFs (RecentReadLatencyHistogramMicros) look fine with 99th 
percentile at 61ms. However, on the client side, p99 latency is at 1.1s 
(seconds!) and we only have 91% below 60ms! So there is a big difference 
between the numbers shown in the CF latencies and what the client actually 
experiences. 
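
For reference, this is roughly how the client-side percentiles could be measured (a sketch with pycassa rather than Pelops; keyspace, CF and keys are placeholders):

    # Sketch only: time single-row reads on the client and print rough percentiles
    # to compare against RecentReadLatencyHistogramMicros.
    import time
    import pycassa

    pool = pycassa.ConnectionPool('MyKS', server_list=['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    sample_keys = ['%026d' % i for i in range(10000)]  # placeholder row keys

    samples = []
    for key in sample_keys:
        start = time.time()
        try:
            cf.get(key)
        except pycassa.NotFoundException:
            pass  # a miss still measures a full round trip
        samples.append((time.time() - start) * 1000.0)  # client-observed ms

    samples.sort()
    for p in (50, 91, 99):
        print('p%d: %.1f ms' % (p, samples[p * (len(samples) - 1) // 100]))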

The cluster is not very balanced currently, but nothing indicates a latency of 
1.1 seconds. I also see high write latency in the clients, with about 4% taking 
50+ ms, whereas in the RecentWriteLatencyHistogramMicros, 99.9% of the 
latencies are below 1ms.

I'm not sure where the additional latency comes from. Is it possible that the 
request spends some time in a queue before being processed? If so, is there a 
way to optimize that? I already increased the core pool size for the ReadStage, 
which didn't improve things.
Any other ideas? 

Thanks!
Marcel

Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Marcel Steinbach
Hi,

we're using RP and have each node assigned the same amount of the token space. 
The cluster looks like that:

Address  Status  State   Load       Owns    Token
                                            205648943402372032879374446248852460236
1        Up      Normal  310.83 GB  12.50%  56775407874461455114148055497453867724
2        Up      Normal  470.24 GB  12.50%  78043055807020109080608968461939380940
3        Up      Normal  271.57 GB  12.50%  99310703739578763047069881426424894156
4        Up      Normal  282.61 GB  12.50%  120578351672137417013530794390910407372
5        Up      Normal  248.76 GB  12.50%  141845999604696070979991707355395920588
6        Up      Normal  164.12 GB  12.50%  163113647537254724946452620319881433804
7        Up      Normal  76.23 GB   12.50%  184381295469813378912913533284366947020
8        Up      Normal  19.79 GB   12.50%  205648943402372032879374446248852460236

I was under the impression the RP would distribute the load more evenly.
Our row sizes are 0.5-1 KB, hence we don't store huge rows on a single node. 
Should we just move the nodes so that the load is more evenly distributed, or is 
there something off that needs to be fixed first?
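
For context, a small sketch of how a key maps onto this ring under the RandomPartitioner (md5 mod 2**127 is only an approximation of RP's abs(md5) token, but close enough to see which node's range a key falls into; the sample key is a placeholder):

    # Sketch only: the primary replica for a key is the node with the first
    # token >= the key's token, wrapping around to the lowest token.
    import hashlib
    from bisect import bisect_left

    RANGE = 2 ** 127
    ring = [
        (56775407874461455114148055497453867724, 'node 1'),
        (78043055807020109080608968461939380940, 'node 2'),
        (99310703739578763047069881426424894156, 'node 3'),
        (120578351672137417013530794390910407372, 'node 4'),
        (141845999604696070979991707355395920588, 'node 5'),
        (163113647537254724946452620319881433804, 'node 6'),
        (184381295469813378912913533284366947020, 'node 7'),
        (205648943402372032879374446248852460236, 'node 8'),
    ]
    tokens = [t for t, _ in ring]

    def primary_replica(key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16) % RANGE
        i = bisect_left(tokens, h)
        return ring[i % len(ring)][1]

    print(primary_replica('example-row-key'))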

Thanks
Marcel

Re: Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Marcel Steinbach
We are running regular repairs, so I don't think that's the problem. 
And the data dir sizes match approx. the load reported by nodetool.
Thanks for the advice, though.

Our keys are digits only, and all contain a few zeros at the same offsets. I'm 
not that familiar with the md5 algorithm, but I doubt that it would generate 
'hotspots' for those kinds of keys, right?

On 17.01.2012, at 17:34, Mohit Anchlia wrote:

 Have you tried running repair first on each node? Also, verify using
 df -h on the data dirs
 
 On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
 marcel.steinb...@chors.de wrote:
 Hi,
 
 we're using RP and have each node assigned the same amount of the token 
 space. The cluster looks like that:
 
 Address Status State   Load    Owns    Token
   
 205648943402372032879374446248852460236
 1   Up Normal  310.83 GB   12.50%  
 56775407874461455114148055497453867724
 2   Up Normal  470.24 GB   12.50%  
 78043055807020109080608968461939380940
 3   Up Normal  271.57 GB   12.50%  
 99310703739578763047069881426424894156
 4   Up Normal  282.61 GB   12.50%  
 120578351672137417013530794390910407372
 5   Up Normal  248.76 GB   12.50%  
 141845999604696070979991707355395920588
 6   Up Normal  164.12 GB   12.50%  
 163113647537254724946452620319881433804
 7   Up Normal  76.23 GB12.50%  
 184381295469813378912913533284366947020
 8   Up Normal  19.79 GB12.50%  
 205648943402372032879374446248852460236
 
 I was under the impression, the RP would distribute the load more evenly.
 Our row sizes are 0,5-1 KB, hence, we don't store huge rows on a single 
 node. Should we just move the nodes so that the load is more even 
 distributed, or is there something off that needs to be fixed first?
 
 Thanks
 Marcel