Read-update all columns access pattern
We have a read- and update-heavy access pattern. Each request to Cassandra goes like this: 1. read all columns of a row, 2. do something with the row, 3. write all columns of the row. The columns we use are always the same, e.g. always (c1, c2, c3); c2 and c3 have a TTL. Since we always read c1, c2, c3 and afterwards overwrite c1, c2, c3, I found out via https://issues.apache.org/jira/browse/CASSANDRA-2498 that specifying which columns I want to read prevents Cassandra from looking into all historic SSTables. However, there is also the option of switching to leveled compaction for read/update-intensive workloads, right? How would you compare the two solutions? Should we settle for the access pattern change, switch to leveled compaction, or do both? Thanks! -- Marcel
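For the record, this is roughly what the column-named read looks like over the raw Thrift API, as opposed to slicing the whole row. A minimal Java sketch; keyspace, CF, and key names are made up:

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class NamedColumnRead {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("MyKeyspace");

            // Naming the columns explicitly (instead of a SliceRange over the
            // whole row) is what lets the read path stop early per CASSANDRA-2498.
            SlicePredicate predicate = new SlicePredicate();
            predicate.setColumn_names(Arrays.asList(
                    ByteBuffer.wrap("c1".getBytes("UTF-8")),
                    ByteBuffer.wrap("c2".getBytes("UTF-8")),
                    ByteBuffer.wrap("c3".getBytes("UTF-8"))));
            client.get_slice(ByteBuffer.wrap("some-row-key".getBytes("UTF-8")),
                    new ColumnParent("MyCF"), predicate, ConsistencyLevel.QUORUM);

            transport.close();
        }
    }

Leveled compaction is orthogonal to this: it bounds the number of SSTables a row can live in, so it attacks the same read amplification from the storage side rather than from the query side.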
Re: upgrade from 1.0.7 to 1.0.8
Check this out: http://www.datastax.com/docs/1.0/install/upgrading#upgrading-between-minor-releases-of-cassandra-1-0-x

Cheers

On 11.03.2012 at 07:42, Tamar Fraenkel ta...@tok-media.com wrote:

Hi! I want to experiment with upgrading. Does anyone have a good link on how to upgrade Cassandra? Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736, Mob: +972 54 8356490, Fax: +972 2 5612956
Re: data model question
Either you do that, or you could think about using a secondary index on the fb user name in your primary CF. See http://www.datastax.com/docs/1.0/ddl/indexes

Cheers

On 11.03.2012 at 09:51, Tamar Fraenkel ta...@tok-media.com wrote:

Hi! I need some advice: I have a User CF, which has a UUID key that is my internal user id. One of the columns is the facebook_id of the user (if it exists). I need the reverse mapping from facebook_id to my UUID. My intention is to add a CF for the mapping from Facebook id to my id:

    user_by_fbid = { // key is fb id, column name is our user id, value is empty
        13101876963: { f94f6b20-161a-4f7e-995f-0466c62a1b6b : "" }
    }

Does this make sense? This CF will be used whenever a user logs in through Facebook, to retrieve the internal id. Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736, Mob: +972 54 8356490, Fax: +972 2 5612956
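To illustrate the secondary-index route over Thrift, a Java sketch: it assumes a KEYS index on the facebook_id column of the User CF, a connected Cassandra.Client named client, and the usual java.nio/java.util and org.apache.cassandra.thrift imports; all names are hypothetical.

    // Fetch the row whose indexed facebook_id column equals the given value.
    IndexExpression expr = new IndexExpression(
            ByteBuffer.wrap("facebook_id".getBytes("UTF-8")),
            IndexOperator.EQ,
            ByteBuffer.wrap("13101876963".getBytes("UTF-8")));
    IndexClause clause = new IndexClause(Arrays.asList(expr),
            ByteBuffer.wrap(new byte[0]), 1); // start_key = empty, count = 1
    SlicePredicate nameOnly = new SlicePredicate();
    nameOnly.setColumn_names(Arrays.asList(
            ByteBuffer.wrap("facebook_id".getBytes("UTF-8"))));
    List<KeySlice> rows = client.get_indexed_slices(
            new ColumnParent("User"), clause, nameOnly, ConsistencyLevel.QUORUM);
    // rows.get(0).getKey() is the internal UUID key, if a match exists.

The trade-off versus the extra CF: the index is maintained automatically on every update, while the dedicated mapping CF gives you a plain key lookup at the cost of keeping two CFs in sync yourself.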
Get all keys from the cluster
We're running an 8-node cluster with different CFs for different applications. One of the applications uses 1.5TB out of 1.8TB in total, but only because we started out without a deletion mechanism and implemented one later on. So there is probably a large amount of old data in there that we don't even use anymore.

Now we want to delete that data. To know which rows we may delete, we have to look them up in a SQL database: if a key is no longer in there, we may delete that row in cassandra, too. This basically means we have to iterate over all the rows in that CF. That kind of begs for hadoop, but that doesn't seem to be an option currently -- I tried. So we figured we could run over the sstable files (maybe only the index), check the keys against the MySQL database, and later run the deletes on the cluster. This way, we could iterate on each node in parallel.

Does that sound reasonable? Any pros/cons, maybe a killer argument for using hadoop for that?

Cheers
Marcel
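For comparison, iterating all rows over Thrift (rather than over the sstables) would look roughly like this. A Java sketch; the CF name and page size are made up, and client is assumed to be a connected Cassandra.Client:

    // Page through all row keys of the CF with get_range_slices.
    KeyRange range = new KeyRange(1000);
    range.setStart_key(new byte[0]);
    range.setEnd_key(new byte[0]);
    SlicePredicate keysOnly = new SlicePredicate();
    keysOnly.setColumn_names(new ArrayList<ByteBuffer>()); // no column data, keys only
    while (true) {
        List<KeySlice> page = client.get_range_slices(
                new ColumnParent("MyCF"), keysOnly, range, ConsistencyLevel.ONE);
        for (KeySlice ks : page) {
            // look ks.getKey() up in MySQL here; schedule a delete if it is gone
        }
        if (page.size() < 1000) break;
        // the next page starts at (and repeats) the last key seen, so skip it there
        range.setStart_key(page.get(page.size() - 1).getKey());
    }

Since get_range_slices walks the ring in token order, you could also parcel out token ranges and run one such scan per node in parallel, much like the per-node sstable idea.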
Re: Get all keys from the cluster
Thanks for your suggestions, Eric!

> One of the application uses 1.5TB out of 1.8TB

I'm sorry, maybe that statement was slightly ambiguous. I meant to say that one application uses 1.5TB while the others use 300GB, totalling 1.8TB of data. Our total disk capacity, however, is about 7TB, so we're still far from running out of disk space.

> Is there any way that you could do that lookup in reverse where you pull the records from your SQL database, figure out which keys aren't necessary, and then delete any unnecessary keys that may or may not exist in cassandra?

Unfortunately, that won't work, since the SQL db only contains the keys that we want to _keep_ in cassandra.

> If that's not a possibility, then what about creating the same Cassandra schema in a different keyspace and copying all the relevant records from the current keyspace to the new keyspace using the SQL database records as a basis for what is actually relevant within the new keyspace.

I like that idea. So instead of iterating over all cassandra rows, I would iterate over the SQL DB, which would indeed save me a lot of IO. However, rows inserted into my CF while we iterate over the SQL DB might not be copied into the new keyspace. But maybe we can arrange to do the copy during low-demand hours to minimize the number of new inserts, and additionally run it a second time with a select on the newly inserted SQL rows. So we'll probably go with that (a rough sketch follows below).

Thanks again for your help!

Cheers
Marcel

On 21.01.2012, at 11:52, Eric Czech wrote:

Is there any way that you could do that lookup in reverse where you pull the records from your SQL database, figure out which keys aren't necessary, and then delete any unnecessary keys that may or may not exist in cassandra?

If that's not a possibility, then what about creating the same Cassandra schema in a different keyspace and copying all the relevant records from the current keyspace to the new keyspace, using the SQL database records as a basis for what is actually relevant within the new keyspace. If you could perform that transfer, then you could just delete the old 1.5TB keyspace altogether, leaving only the data you need. If that sort of duplication would put you over the 1.8TB limit during the transfer, then maybe you could consider CF compression upfront.

Short of that, I can tell you from experience that these sorts of left-join deletes from cassandra to SQL really suck. We have had to resort to hadoop for this, but since our hadoop/cassandra clusters are much larger than our single SQL instances, keeping all the hadoop processes from basically DDoSing our SQL servers, while still making the process faster than thrift iterations over all the rows in cassandra (via custom programs), hasn't been a convincing solution.

I'd say that the first solution I proposed is definitely the best, but also the most unrealistic. If that's really not a possibility for you, then I'd seriously look at trying to make my second suggestion work, even if it means bringing up new hardware or increasing the capacity of existing resources. That second suggestion also has the added benefit of likely minimizing I/O, since it's the only solution that doesn't require reading or deleting any of the unnecessary data (beyond wholesale keyspace or CF deletions), assuming that the actually relevant portion of your data is significantly less than 1.5TB.

I hope that helps! And in the future, you should really try to avoid letting your data size get beyond 40-50% of your actual on-disk capacity.
Let me know if anyone in the community disagrees, but I'd say you're about 600 GB past the point at which you have a lot of easy outs -- but I hope you find one anyways!
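A rough Java sketch of that second suggestion, copying only the rows whose keys still exist in MySQL into a fresh keyspace. The table, keyspace, and CF names are all hypothetical, and client is a connected Cassandra.Client:

    // Copy rows listed in MySQL from the old keyspace to the new one.
    Connection sql = DriverManager.getConnection("jdbc:mysql://dbhost/app", "user", "pw");
    ResultSet rs = sql.createStatement().executeQuery("SELECT row_key FROM live_keys");
    SlicePredicate all = new SlicePredicate();
    all.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
            ByteBuffer.wrap(new byte[0]), false, Integer.MAX_VALUE));
    while (rs.next()) {
        ByteBuffer key = ByteBuffer.wrap(rs.getString(1).getBytes("UTF-8"));
        client.set_keyspace("OldKeyspace");
        List<ColumnOrSuperColumn> row = client.get_slice(
                key, new ColumnParent("MyCF"), all, ConsistencyLevel.QUORUM);
        client.set_keyspace("NewKeyspace");
        for (ColumnOrSuperColumn cosc : row) {
            // Re-inserting the Column object keeps its original timestamp.
            client.insert(key, new ColumnParent("MyCF"), cosc.getColumn(),
                    ConsistencyLevel.QUORUM);
        }
    }

In practice you would batch the writes and use two connections instead of flipping set_keyspace per row.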
Re: Unbalanced cluster with RandomPartitioner
I thought about our issue again, and I wonder: shouldn't describeOwnership take into account whether a token lies outside the partitioner's maximum token range? To recap our problem: we had tokens that were 12.5% of the token range 2**127 apart; however, each token had an offset added, which moved the cluster's token range above 2**127. That resulted in two nodes getting almost no or no primary replicas. Afaik, the partitioner itself describes the key ownership in the ring, but it didn't take into account that we had left its maximum key range.

Of course, it is silly and not very likely that users make that mistake; however, we did, and it took me quite some time to figure that out (maybe also because it wasn't me who set up the cluster). To take it to the extreme: you could construct a cluster of n nodes with all tokens greater than 2**127. The ownership description would show an ownership of 1/n each, but all data would go to the node with the lowest token (given RP and RF=1). I think it is wrong to calculate the ownership by subtracting the previous token from the current one and dividing by the maximum token, without acknowledging that we might already be out of bounds.

Cheers
Marcel
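To make the describeOwnership point concrete, here is the naive ownership arithmetic in isolation. A Java sketch with a hypothetical 4-node ring in which every token has been pushed past 2**127:

    import java.math.BigInteger;

    public class OwnershipCheck {
        static final BigInteger MAX = BigInteger.ONE.shiftLeft(127); // RP max token

        public static void main(String[] args) {
            BigInteger offset = MAX; // an offset that pushes every token out of range
            BigInteger[] tokens = new BigInteger[4];
            for (int i = 0; i < 4; i++) {
                tokens[i] = MAX.divide(BigInteger.valueOf(4))
                               .multiply(BigInteger.valueOf(i)).add(offset);
            }
            // Naive ownership: (token - previousToken) / MAX, ignoring bounds.
            for (int i = 0; i < 4; i++) {
                BigInteger prev = tokens[(i + 3) % 4];
                BigInteger width = tokens[i].subtract(prev).mod(MAX);
                System.out.printf("node %d: ownership = %.2f%%%n",
                        i, width.doubleValue() * 100 / MAX.doubleValue());
            }
            // Prints 25.00% for every node, yet all tokens are >= 2**127, so
            // every md5-derived key sorts below them and the whole ring's data
            // lands on the node with the lowest token (with RP and RF=1).
        }
    }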
Re: Unbalanced cluster with RandomPartitioner
On 19.01.2012, at 20:15, Narendra Sharma wrote:

> I believe you need to move the nodes on the ring. What was the load on the nodes before you added the 5 new nodes? It's just that you are getting more data in certain token ranges than in others.

With three nodes, it was also imbalanced. What I don't understand is why the md5 sums would generate such massive hot spots. Most of our keys look like this: 00013270494972450001234567, with the first 16 digits being a timestamp of one of our application servers' startup times, and the last 10 digits being sequentially generated per user. There may be a lot of keys that start with e.g. 0001327049497245 (or some other timestamp). But I was under the impression that md5 doesn't care and generates a uniform distribution? Then again, I know next to nothing about md5. Maybe someone else has better insight into the algorithm?

However, we also use CFs with a date (mmdd) as key, as well as CFs with UUIDs as keys. And those CFs are not balanced in themselves either. E.g. node 5 has 12 GB of live space used in the CF with the UUID key, and node 8 only 428MB.

Cheers,
Marcel
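On the md5 question: you can convince yourself quickly that keys sharing a long common prefix still spread out. A Java sketch; the token math is only an approximation of what RandomPartitioner does internally:

    import java.math.BigInteger;
    import java.security.MessageDigest;

    public class Md5Spread {
        public static void main(String[] args) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            BigInteger max = BigInteger.ONE.shiftLeft(127);
            for (int user = 0; user < 5; user++) {
                // same 16-digit timestamp prefix, sequential 10-digit suffix
                String key = "0001327049497245" + String.format("%010d", user);
                BigInteger token = new BigInteger(1, md5.digest(key.getBytes("UTF-8"))).mod(max);
                System.out.printf("%s -> %5.1f%% of the ring%n",
                        key, token.doubleValue() * 100 / max.doubleValue());
            }
        }
    }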
Re: Unbalanced cluster with RandomPartitioner
Thanks for all the responses! I found our problem: with the RandomPartitioner, the key range goes from 0 to 2**127. When we added nodes, we generated the tokens, and out of convenience we added an offset to them, because the move was easier that way. However, we did not apply the modulo 2**127 to the last two tokens, so they ended up outside the RP's key range. Moving the last two tokens to their values mod 2**127 will resolve the problem.

Cheers,
Marcel
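The fix, as arithmetic: wrap any token that fell outside the RP range back into [0, 2**127). A Java sketch, using the highest token from our ring as the example:

    import java.math.BigInteger;

    public class TokenMod {
        public static void main(String[] args) {
            BigInteger max = BigInteger.ONE.shiftLeft(127); // 2**127
            BigInteger token = new BigInteger("205648943402372032879374446248852460236");
            System.out.println(token.mod(max)); // the corrected token to move to
        }
    }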
Re: Unbalanced cluster with RandomPartitioner
On 18.01.2012, at 02:19, Maki Watanabe wrote:

> Are there any significant differences in the number of sstables on each node?

No, no significant difference there. Actually, node 8 is among those with more sstables, but with the least load (20GB).

On 17.01.2012, at 20:14, Jeremiah Jordan wrote:

> Are you deleting data or using TTLs? Expired/deleted data won't go away until the sstable holding it is compacted. So if compaction has happened on some nodes but not on others, you will see this. The disparity is pretty big, 400GB to 20GB, so this probably isn't the issue, but with our data using TTLs, running major compaction a couple of times on that column family can shrink it ~30-40%.

Yes, we do delete data. But I agree, the disparity is too big to blame only on the deletions. Also, we initially started out with 3 nodes and upgraded to 8 a few weeks ago. After adding the nodes, we ran compactions and cleanups and still didn't have a balanced cluster. So that should have removed outdated data, right?
Re: Unbalanced cluster with RandomPartitioner
On 19.01.2012, aaron morton aa...@thelastpickle.com wrote:

> If you have performed any token moves, the data will not be deleted until you run nodetool cleanup.

We did that after adding nodes to the cluster. And then, the cluster wasn't balanced either. Also, does the Load really account for dead data, or is it just live data?

> To get a baseline I would run nodetool compact to do a major compaction and purge any tombstones, as others have said.

We will do that, but I doubt we have 450GB of tombstones on node 2...

> Cheers
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
RecentReadLatencyHistogramMicros vs. latencies in client
Hi, we're running an 8-node cassandra-0.7.6 cluster with an average throughput of 5k reads/s and almost as many writes/s. The client API is pelops 1.1-0.7.x. Latencies in the CFs (RecentReadLatencyHistogramMicros) look fine, with the 99th percentile at 61ms. However, on the client side, the p99 latency is at 1.1s (seconds!), and we only have 91% below 60ms! So there is a big difference between the numbers shown in the CF latencies and what the client actually experiences. The cluster is not very balanced currently, but nothing indicates a latency of 1.1 seconds.

I also see high write latency in the clients, with about 4% taking 50+ ms, whereas in RecentWriteLatencyHistogramMicros, 99.9% of the latencies are below 1ms.

I'm not sure where the additional latency is incurred. Is it possible the request spends some time in a queue before being processed? If so, is there a way to optimize that? I already increased the core pool size for the ReadStage, which didn't improve things. Any other ideas?

Thanks!
Marcel
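One way to watch the server-side numbers while testing is to pull the histogram straight over JMX. A Java sketch; keyspace and CF names are made up, and it assumes the 0.7-era default JMX port 8080 and the 0.7-era MBean name, so treat both as assumptions to verify against your node:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ReadHistogram {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            MBeanServerConnection mbs =
                    JMXConnectorFactory.connect(url).getMBeanServerConnection();
            ObjectName cf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCF");
            long[] buckets = (long[]) mbs.getAttribute(cf, "RecentReadLatencyHistogramMicros");
            // Each bucket counts reads in an exponentially growing microsecond
            // range; note this covers only the local read, not queuing in the
            // stages, the network, or client-side (de)serialization.
            for (long b : buckets) System.out.print(b + " ");
        }
    }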
Unbalanced cluster with RandomPartitioner
Hi, we're using RP and have each node assigned the same amount of the token space. The cluster looks like that:

    Address  Status  State   Load       Owns    Token
                                                205648943402372032879374446248852460236
    1        Up      Normal  310.83 GB  12.50%  56775407874461455114148055497453867724
    2        Up      Normal  470.24 GB  12.50%  78043055807020109080608968461939380940
    3        Up      Normal  271.57 GB  12.50%  99310703739578763047069881426424894156
    4        Up      Normal  282.61 GB  12.50%  120578351672137417013530794390910407372
    5        Up      Normal  248.76 GB  12.50%  141845999604696070979991707355395920588
    6        Up      Normal  164.12 GB  12.50%  163113647537254724946452620319881433804
    7        Up      Normal  76.23 GB   12.50%  184381295469813378912913533284366947020
    8        Up      Normal  19.79 GB   12.50%  205648943402372032879374446248852460236

I was under the impression the RP would distribute the load more evenly. Our row sizes are 0.5-1 KB, so we don't store huge rows on a single node. Should we just move the nodes so that the load is more evenly distributed, or is there something off that needs to be fixed first?

Thanks
Marcel
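For reference, evenly spaced RP tokens are normally computed as i * 2**127 / N for node i of N. A quick Java sketch to generate or cross-check them; note that the neighbour-to-neighbour spacing in the ring above is exactly 2**127/8, so the spacing itself is not what is off:

    import java.math.BigInteger;

    public class TokenGen {
        public static void main(String[] args) {
            int n = 8;
            BigInteger max = BigInteger.ONE.shiftLeft(127); // 2**127
            for (int i = 0; i < n; i++) {
                System.out.println(max.multiply(BigInteger.valueOf(i))
                        .divide(BigInteger.valueOf(n)));
            }
        }
    }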
Re: Unbalanced cluster with RandomPartitioner
We are running regular repairs, so I don't think that's the problem. And the data dir sizes match approximately the load reported by nodetool. Thanks for the advice, though. Our keys are digits only, and all contain a few zeros at the same offsets. I'm not that familiar with the md5 algorithm, but I doubt it would generate 'hotspots' for those kinds of keys, right?

On 17.01.2012, at 17:34, Mohit Anchlia wrote:

> Have you tried running repair first on each node? Also, verify using df -h on the data dirs.