Re: Data tombstoned during bulk loading 1.2.10 - 2.0.3
I don't know what the real cause of my problem is; we are still guessing. All operations I have done on the cluster are described on this timeline: 1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3 -> normal operations -> now, where "normal operations" means reads/writes/repairs. Could you please describe briefly how to recover the data? I have the problem with the scenario described at http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html, but I can't apply that solution to my case. regards Olek

2014-02-03 Robert Coli rc...@eventbrite.com: On Mon, Feb 3, 2014 at 2:17 PM, olek.stas...@gmail.com olek.stas...@gmail.com wrote: No, I've done repair after upgrading sstables. In fact it was about 4 weeks after, because of a bug: If you only did a repair after you upgraded SSTables, when did you have an opportunity to hit https://issues.apache.org/jira/browse/CASSANDRA-6527, which relies on you having multiple versions of SSTables while streaming? Did you do any operation which involves streaming (add/remove/replace a node)? =Rob
Maximum size and number of datafiles
Hello, could someone tell me whether it is possible to set a maximum size for a data file, to prevent filesystem saturation? And when does Cassandra decide to add a new data file? Thanks for all your answers. Regards, Bonnet Jonathan.
Keyspace directory not getting created in 1 machine
Dear Team, I have a 3-node Cassandra 1.1.12 (open source) cluster installed in our lab. The db files for column families are getting created on 2 machines, while on one machine the data directory is empty. I have tried the following: nodetool -h [IP address of the not-working machine] rebuild --- to auto-bootstrap the system keyspace and other column family information. Still the error persists. Best Regards, Hari Krishnan Rajendhran, Hadoop Admin, DESS-ABIM, Chennai BIGDATA Galaxy, Tata Consultancy Services
Re: Keyspace directory not getting created in 1 machine
Hi Hari,

On 04/02/14 10:38, Hari Rajendhran wrote: [...]

Do all of the nodes know about all of the other nodes? Try using nodetool status on each node to check. Another possibility is that you configured that one node to store its data in a different directory rather than the standard one, but you are mistakenly looking for the data in the standard location. Ciao, Duncan.
what tool will create noncql columnfamilies in cassandra 3a
The Cassandra 2.0.4 cli is informing me that it will no longer exist in the next major release. How will users adjust the metadata of non-CQL column families and other CFs that do not fit into the CQL model? -- Sorry, this was sent from mobile. Will do less grammar and spell check than usual.
Re: Ultra wide row anti pattern
I have actually been building something similar in my spare time. You can hang around and wait for it or build your own. Here are the basics; not perfect, but it will work.

Create a column family "queue" with gc_grace_seconds = [1 day], then: set queue[timeuuid()][z+timeuuid()] = [work to do]. The producer can decide how it wants to roll over the row key and the column key; it does not matter.

Suppose there are N consumers. We need a way for the consumers not to do the same work. We can use something like the bakery algorithm; remember that at QUORUM a reader sees writes. A consumer needs an identifier (it could be another uuid or an IP address). A consumer calls get_range_slice on the queue; the slice is from new byte[] to new byte[], limit 100. The consumer sees data like this: [1234][z-$timeuuid] = data.

Now we register that this consumer wants to consume this queue: set [1234][a-$ip] at QUORUM. Then we do a slice, get_slice [1234] from new byte[] to 'b'. There are a few possible returns: 1) one bidder, [1234][a-$myip] -- you won, start consuming; 2) two bidders, [1234][a-$myip] and [1234][a-$otherip] -- compare $myip vs $otherip, higher wins. Whoever wins can then start consuming the columns in the queue and delete them when done.

On Friday, January 31, 2014, DuyHai Doan doanduy...@gmail.com wrote: Thanks Nat for your ideas. "This could be as simple as adding year and month to the primary key (in the form 'mm'). Alternatively, you could add this in the partition definition. Either way, it then becomes pretty easy to re-generate these based on the query parameters." The thing is, it's not that simple. My customer has a very BAD idea: using Cassandra as a queue (the perfect anti-pattern). Before trying to tell them to redesign their entire architecture and put in some queueing system like ActiveMQ or similar, I would like to see how I can use wide rows to meet the requirements.

The functional need is quite simple: 1) A process A loads users into Cassandra and sets the status of each user to 'TODO'. Using the bucketing technique, we can limit a row's width to, let's say, 100 000 columns, so at the end of the current row process A knows it should move to the next bucket. The bucket is coded in a composite partition key, in our example 'TODO:1', 'TODO:2', etc. 2) A process B reads the wide row for the 'TODO' status. It starts at bucket 1, so it reads the row with partition key 'TODO:1'. The users are processed and inserted into a new row, 'PROCESSED:1' for example, to keep track of status. After retrieving 100 000 columns, it switches automatically to the next bucket. Simple; fair enough. 3) Now, what sucks is that sometimes process B does not have enough data to perform its functional logic on a user fetched from the wide row, so it has to put some users back into 'TODO' status rather than transitioning them to 'PROCESSED'. That's exactly queue behavior. A simplistic idea would be to insert those m users again under 'TODO:n', with n higher than the current bucket number, so they can be processed later. But then it screws up the whole counting system: process A, which inserts data, will not know that there are already m users in row n, so it will happily add 100 000 columns, making the row grow to 100 000 + m. When process B reads this row back, it will stop at the first 100 000 columns and skip the trailing m elements. That's the main reason I dropped the idea of bucketing (which is quite smart in the normal case) and traded it for an ultra wide row.
Anyway, I'll follow your advice and play around with the SizeTiered parameters. Regards, Duy Hai DOAN

On Fri, Jan 31, 2014 at 9:23 PM, Nate McCall n...@thelastpickle.com wrote: "The only drawback for an ultra wide row I can see is point 1). But if I use leveled compaction with a sufficiently large value for sstable_size_in_mb (let's say 200 MB), will my read performance be impacted as the row grows?" For this use case, you would want to use SizeTieredCompaction and play around with the configuration a bit to keep a small number of large SSTables. Specifically: keep min|max_threshold really low, set bucket_low and bucket_high closer together (maybe even both to 1.0), and maybe use a larger min_sstable_size. YMMV though -- per Rob's suggestion, take the time to run some tests tweaking these options. "Of course, splitting a wide row into several rows using the bucketing technique is one solution, but it forces us to keep track of the bucket number and it's not convenient. We have one process (JVM) that inserts data and another process (JVM) that reads data; using bucketing, we need to synchronize the bucket number between the 2 processes." This could be as simple as adding year and month to the primary key (in the form 'mm'). Alternatively, you could add this in the partition definition. Either way, it then becomes pretty easy to re-generate these based on the query parameters.
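To make the bidding step concrete, here is a rough Python sketch against the pycassa Thrift client. It is untested; the keyspace/column family names, the 'a-'/'z-' column prefixes, and the IP-as-identifier choice are assumptions carried over from the description above, not a finished implementation:

import socket
from pycassa import ConnectionPool, ColumnFamily, ConsistencyLevel

pool = ConnectionPool('MyKeyspace', ['127.0.0.1:9160'])
queue = ColumnFamily(pool, 'queue')
# This consumer's bid column: the 'a-' prefix sorts before the 'z-' work columns.
my_bid = 'a-' + socket.gethostbyname(socket.gethostname())

def try_claim(row_key):
    # Register the bid at QUORUM so every other consumer is guaranteed to see it.
    queue.insert(row_key, {my_bid: ''},
                 write_consistency_level=ConsistencyLevel.QUORUM)
    # Slice only the bid columns: everything from 'a' up to 'b'.
    bids = queue.get(row_key, column_start='a', column_finish='b',
                     read_consistency_level=ConsistencyLevel.QUORUM)
    # All bidders compute the same winner; per the scheme above, higher wins.
    return max(bids) == my_bid

A consumer that gets True back from try_claim owns the row and can slice out the 'z-' columns, process them, and delete them when done.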
Re: Ultra wide row anti pattern
Sorry, I am not understanding the problem; I am new to Cassandra and want to understand this issue. Why do we need to use a wide row for this situation? Why not a simple table in Cassandra, e.g. todolist (user, state) and processedlist (user, state)? Is there any other information in this table which is needed for processing the todos?

On Tue, Feb 4, 2014 at 7:50 AM, Edward Capriolo edlinuxg...@gmail.com wrote: [...]
Re: Ultra wide row anti pattern
Generally you need to make a wide row because row keys in Cassandra are ordered by their md5/Murmur hash, so you have no way of locating new rows. But if the row name is predictable, the columns inside the row are ordered.

On Tue, Feb 4, 2014 at 12:02 PM, Yogi Nerella ynerella...@gmail.com wrote: [...]
Re: Data tombstoned during bulk loading 1.2.10 - 2.0.3
On Tue, Feb 4, 2014 at 12:21 AM, olek.stas...@gmail.com olek.stas...@gmail.com wrote: [...]

I think your only option is the following:
1) determine which SSTables contain rows that have doomstones (tombstones from the far future)
2) determine whether these tombstones mask a live or dead version of the row, by looking at other row fragments
3) dump/filter/re-write all your data via some method, probably sstable2json/json2sstable
4) load the corrected sstables by starting a node with the sstables in the data directory

I understand you have a lot of data, but I am pretty sure there is no way for you to fix it within Cassandra. Perhaps ask for advice on the JIRA ticket mentioned upthread if this answer is not sufficient? =Rob
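For step 3, a minimal Python sketch of the dump-and-filter pass, assuming the layout that recent sstable2json versions emit (a JSON list of rows, each column as [name, value, timestamp, optional flags], timestamps in microseconds). It mechanically drops far-future timestamps and does not attempt step 2's live-vs-dead judgment:

import json
import sys
import time

# Anything stamped more than a day into the future is treated as a doomstone.
CUTOFF = int((time.time() + 86400) * 1e6)

with open(sys.argv[1]) as f:        # output of sstable2json
    rows = json.load(f)
for row in rows:
    row['columns'] = [c for c in row['columns'] if c[2] <= CUTOFF]
with open(sys.argv[2], 'w') as f:   # re-import this with json2sstable
    json.dump(rows, f)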
Re: Ultra wide row anti pattern
Great idea for implementing the queue pattern, thank you Edward. However, with your design there are still corner cases where 2 consumers read from the same queue; reading and writing at QUORUM does not prevent race conditions. I believe the new CAS feature of C* 2.0 might be useful here, at the expense of reduced throughput (because of the Paxos round).

On Tue, Feb 4, 2014 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: [...]
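As a sketch of what the CAS variant might look like with the DataStax Python driver against C* 2.0 -- the queue_owner table and all names here are hypothetical, and only one consumer's INSERT ... IF NOT EXISTS can win the Paxos round:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('my_keyspace')
result = session.execute(
    "INSERT INTO queue_owner (bucket, owner) VALUES (%s, %s) IF NOT EXISTS",
    (1234, 'consumer-a'))
# The first column of a conditional write's result is the [applied] flag.
if result[0][0]:
    pass  # we won the round and own bucket 1234; start consuming it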
Re: Ultra wide row anti pattern
You could use another column with CAS as a management layer. You only have to consult it when picking up new rows.

On Tue, Feb 4, 2014 at 3:45 PM, DuyHai Doan doanduy...@gmail.com wrote: [...]
Re: Cassandra 2.0 with Hadoop 2.x?
Hi, look for posts from Thunder Stumpges in this mailing list. I know he has succeeded in making Hadoop 2.x work with Cassandra 2.x. For those who are interested in using it with Cassandra 1.2.13, you can use the patch https://github.com/cscetbon/cassandra/commit/88d694362d8d6bc09b3eeceb6baad7b3cc068ad3.patch -- it uses the Cloudera CDH4 repository for the Hadoop classes, but you can use others. Regards -- Cyril SCETBON

On 03 Feb 2014, at 19:10, Clint Kelly clint.ke...@gmail.com wrote: Folks, has anyone out there used Cassandra 2.0 with Hadoop 2.x? I saw this discussion on the Cassandra JIRA: https://issues.apache.org/jira/browse/CASSANDRA-5201, but the fix referenced (https://github.com/michaelsembwever/cassandra-hadoop) is for Cassandra 1.2. I put together a similar patch for Cassandra 2.0 for anyone who is interested: https://github.com/wibiclint/cassandra2-hadoop2, but I'm wondering if there is a more official solution to this problem. Any help would be appreciated. Thanks! Best regards, Clint
Re: Data tombstoned during bulk loading 1.2.10 - 2.0.3
Seems good. I'll discuss it with the data owners and we'll choose the best method. Best regards, Aleksander

On 4 Feb 2014, at 19:40, Robert Coli rc...@eventbrite.com wrote: [...]
Question 1: JMX binding, Question 2: Logging
Hi all, I'm fairly new to Cassandra. I'm deploying it to a PaaS, and one thing this entails is that it must be able to run more than one instance on a single node. I'm running into the problem that JMX binds to 0.0.0.0:7199. My question is this: is there a way to configure this? I found a post that said to change the following: JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.1.246.3", where 127.1.246.3 is the IP I want to bind to. This did not change the JMX binding for me at all. I also saw a post about a JMX listen address in cassandra.yaml, and that did not work either. Any clarity on whether this is bindable at all, or if there are plans for it?

Also, I have logging turned on, but for some reason my Cassandra is not actually logging as intended. My log folder is empty after each (failed) run (failed due to the port being taken by my other Cassandra process). Here is a copy of my log4j-server.properties file: http://fpaste.org/74470/15510941/ Any idea why this might not be logging? Thank you and best regards, Kyle
Re: Question 1: JMX binding, Question 2: Logging
Hello Kyle, for your first question: you need to create aliases to localhost, e.g. 127.0.0.2, 127.0.0.3, etc.; that should get you going. About the logging issue, I think your instance is failing before it gets to log anything; as a check, start a single instance and make sure it logs correctly. Hope that helps. Sandeep

On Tue, Feb 4, 2014 at 4:25 PM, Kyle Crumpton (kcrumpto) kcrum...@cisco.com wrote: [...]
Re: Question 1: JMX binding, Question 2: Logging
The JMX settings are in conf/cassandra-env.sh.

On Tue, Feb 4, 2014 at 2:25 PM, Kyle Crumpton (kcrumpto) kcrum...@cisco.com wrote: [...]
Re: Cassandra 2.0 with Hadoop 2.x?
Hello Clint, yes, I was able to get it working after a bit of work. I have pushed the branch with the fix (which is currently quite a ways behind latest); you can compare it to yours, I suppose. Let me know if you have any questions. https://github.com/VerticalSearchWorks/cassandra/tree/Cassandra2-CDH4 Regards, Thunder

On Tue, Feb 4, 2014 at 1:40 PM, Cyril Scetbon cyril.scet...@free.fr wrote: [...]
Re: Lots of deletions results in death by GC
I ran my test again, and Flush Writer's "All time blocked" increased to 2, and shortly thereafter GC went into its death spiral. I doubled memtable_flush_writers (to 2) and memtable_flush_queue_size (to 8) and tried again. This time, the table that had always sat with Memtable data size = 0 now showed increases in Memtable data size. That was encouraging. It never flushed, which isn't too surprising, because that table has relatively few rows and they are pretty wide. However, on the fourth table to clean, Flush Writer's "All time blocked" went to 1, then there were no more completed events, and about 10 minutes later GC went into its death spiral. I assume that each time Flush Writer completes an event, that means a table was flushed. Is that right? Also, I got two dropped-mutation messages at the same time that Flush Writer's "All time blocked" incremented.

I then increased the writers and queue size to 3 and 12, respectively, and ran my test again. This time "All time blocked" remained at 0, but I still suffered death by GC. I would almost think this is caused by high load on the server, but I've never seen CPU utilization go above about two of my eight available cores. If high load triggers this problem, that is very disconcerting: it means a CPU spike could permanently cripple a node. Okay, not permanently, but until a manual flush occurs. If anyone has any further thoughts, I'd love to hear them; I'm quite at the end of my rope. Thanks in advance, Robert

From: Nate McCall n...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Saturday, February 1, 2014 at 9:25 AM To: Cassandra Users user@cassandra.apache.org Subject: Re: Lots of deletions results in death by GC

What's the output of 'nodetool tpstats' while this is happening? Specifically, is Flush Writer "All time blocked" increasing? If so, play around with turning up memtable_flush_writers and memtable_flush_queue_size and see if that helps.

On Sat, Feb 1, 2014 at 9:03 AM, Robert Wille rwi...@fold3.com wrote: A few days ago I posted about an issue I'm having where GC takes a long time (20-30 seconds), and it happens repeatedly and basically no work gets done. I've done further investigation, and I now believe that I know the cause. If I do a lot of deletes, it creates memory pressure until the memtables are flushed, but Cassandra doesn't flush them. If I manually flush, then life is good again (although that takes a very long time because of the GC issue). If I just leave the flushing to Cassandra, then I end up with death by GC. I believe that when the memtables are full of tombstones, Cassandra doesn't realize how much memory the memtables are actually taking up, and so it doesn't proactively flush them to free up heap. As I was deleting records out of one of my tables, I was watching it via nodetool cfstats, and I found a very curious thing:

Memtable cell count: 1285
Memtable data size, bytes: 0
Memtable switch count: 56

As the deletion process was chugging away, the memtable cell count increased, as expected, but the data size stayed at 0. No flushing occurred. Here's the schema for this table:

CREATE TABLE bdn_index_pub (
    tshard VARCHAR,
    pord INT,
    ord INT,
    hpath VARCHAR,
    page BIGINT,
    PRIMARY KEY (tshard, pord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

I have a few tables that I run this cleaning process on, and not all of them exhibit this behavior. One of them reported an increasing number of bytes, as expected, and it also flushed as expected. Here's the schema for that table:

CREATE TABLE bdn_index_child (
    ptshard VARCHAR,
    ord INT,
    hpath VARCHAR,
    PRIMARY KEY (ptshard, ord)
) WITH gc_grace_seconds = 0 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };

In both cases, I'm deleting the entire record (i.e. specifying just the first component of the primary key in the delete statement). Most records in bdn_index_pub have 10,000 rows per record. bdn_index_child usually has just a handful of rows, but a few records can have up to 10,000. Still a further mystery: 1285 tombstones in the bdn_index_pub memtable doesn't seem like nearly enough to create a memory problem. Perhaps there are other flaws in the memory metering, or perhaps there is some other issue that causes Cassandra to mismanage the heap when there are a lot of deletes. One other thought I had: I page through these tables and clean them out as I go. Perhaps there is some interaction between the paging and the deleting that causes the GC problems, and I should create a list of keys to delete and then delete them after I've finished reading the entire table. I reduced memtable_total_space_in_mb from the default (probably 2.7 GB) to 1 GB, in hopes that it would force Cassandra to flush
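As a sketch of that last idea (collect the keys during the scan, delete only afterwards), assuming the DataStax Python driver and C* 2.0's SELECT DISTINCT over partition keys; keyspace and key names follow the schema above and are otherwise placeholders:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('my_keyspace')

# Pass 1: page through the table and remember which partitions to drop.
doomed = [r.tshard for r in
          session.execute('SELECT DISTINCT tshard FROM bdn_index_pub')]

# Pass 2: delete whole partitions only after the scan is finished, so the
# paged reads never have to step over the tombstones they just created.
delete = session.prepare('DELETE FROM bdn_index_pub WHERE tshard = ?')
for key in doomed:
    session.execute(delete, (key,))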
Re: Lots of deletions results in death by GC
Is it possible you are generating *exclusively* deletes for this table?

On 5 February 2014 00:10, Robert Wille rwi...@fold3.com wrote: [...]
Re: what tool will create noncql columnfamilies in cassandra 3a
I am also curious as to how users will manage Thrift-based tables without the cli. PyCassaShell comes to mind, as does using Thrift-based clients.

On Tue, Feb 4, 2014 at 9:53 AM, Edward Capriolo edlinuxg...@gmail.com wrote: [...]

-- Patricia Gorla @patriciagorla Consultant, Apache Cassandra Consulting http://www.thelastpickle.com
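For example, pycassa's SystemManager exposes schema operations programmatically; a rough sketch (server address, keyspace and column family names are placeholders, untested):

from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('127.0.0.1:9160')
# Adjust metadata on a Thrift-defined (non-CQL) column family.
sys_mgr.alter_column_family('MyKeyspace', 'MyColumnFamily',
                            comment='managed without cassandra-cli')
sys_mgr.close()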
Re: Lots of deletions results in death by GC
Sorry to hear that, Robert. I ran into a similar issue a while ago: I had an extremely heavy write and update load, and as a result Cassandra (1.2.9) was constantly flushing to disk and GCing. I tried exactly the same steps you tried (tuning memtable_flush_writers to 2 and memtable_flush_queue_size to 8), with no luck. Almost all of the issues went away when I migrated to 1.2.13; that release also had some fixes I badly needed. What version are you running? (I tried to look in the thread but couldn't find one; sorry if this is a repeat question.) Dropped messages are a sign that Cassandra is under heavy load; that's the load-shedding mechanism. I would love to see some sort of back-pressure implemented. -sandeep

On Tue, Feb 4, 2014 at 6:10 PM, Robert Wille rwi...@fold3.com wrote: [...]
Looking for clarification on the gossip protocol... 3 random nodes every second?
Hi, I'm looking to get some clarification on how the gossip protocol works in Cassandra 2.0. Does a node contact 3 purely random nodes every second for gossip, or is there more intelligence involved in how it selects the 3 nodes?

The Apache wiki on Cassandra states this: The gossip timer task runs every second. During each of these runs the node initiates a gossip exchange according to the following rules:
1) Gossip to random live endpoint (if any)
2) Gossip to random unreachable endpoint with certain probability depending on number of unreachable and live nodes
3) If the node gossiped to at (1) was not a seed, or the number of live nodes is less than the number of seeds, gossip to random seed with certain probability depending on number of unreachable, seed and live nodes.
These rules were developed to ensure that if the network is up, all nodes will eventually know about all other nodes. Link: http://wiki.apache.org/cassandra/ArchitectureGossip

Is the above still true for C* 2.0? Let's say all 20 nodes in a C* cluster are up. In this case will each node simply contact 3 random nodes every second? - SF
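For what it's worth, here is a toy Python model of the wiki's three rules, just to make the selection logic concrete; the probability formulas are stand-in guesses, not the real Gossiper arithmetic:

import random

def gossip_round(live, unreachable, seeds):
    # Rule 1: gossip to one random live endpoint, if any.
    targets = [random.choice(live)] if live else []
    # Rule 2: maybe gossip to a random unreachable endpoint.
    if unreachable and random.random() < float(len(unreachable)) / (len(live) + 1):
        targets.append(random.choice(unreachable))
    # Rule 3: maybe gossip to a random seed, unless rule 1 already hit one.
    if seeds and (not targets or targets[0] not in seeds or len(live) < len(seeds)):
        if random.random() < float(len(seeds)) / (len(live) + len(unreachable) + 1):
            targets.append(random.choice(seeds))
    return targets

So under these rules a node gossips with up to three peers per one-second round, weighted as above rather than three uniformly random picks.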