Repair hangs after Upgrade to VNodes 1.2.2
Has anyone else experienced this? After upgrading to vnodes, I am having repair issues.

If I run `nodetool -h localhost repair`, it will repair only the first keyspace and then hang... I let it go for a week and nothing. If I run `nodetool -h localhost repair -pr`, it appears to repair only the first vnode range, but it does cover all keyspaces. I can't find anything in my Cassandra logs pointing to a problem in either scenario.

My workaround is to run a repair command independently for each of my keyspaces:

    nodetool -h localhost repair Keyspace1
    nodetool -h localhost repair Keyspace2
    ...

But that is silly! I am using Cassandra 1.2.2.

Thanks!
Ryan
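The per-keyspace workaround above is easy to script. A minimal sketch (the keyspace names are illustrative, not from the original post; `NODETOOL` defaults to a dry-run `echo` so nothing is executed until you point it at the real binary):

```shell
#!/bin/sh
# Workaround sketch: repair each keyspace individually instead of one
# cluster-wide `nodetool repair`. The keyspace names are illustrative.
# NODETOOL defaults to `echo nodetool` for a safe dry run; set it to the
# real nodetool binary on an actual node.
NODETOOL="${NODETOOL:-echo nodetool}"

for ks in Keyspace1 Keyspace2 Keyspace3; do
    $NODETOOL -h localhost repair "$ks"
done
```

In a real script you would also want to check the exit status of each repair before moving on to the next keyspace.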
Re: Repair hangs after Upgrade to VNodes 1.2.2
Marco,

No, there are no errors... the last line I see in my logs related to repair is:

    [repair #...] Sending completed merkle tree to /[node] for (keyspace1,columnfamily1)

Ryan

On Wed, Mar 27, 2013 at 8:49 AM, Marco Matarazzo marco.matara...@hexkeep.com wrote:

> > If I run `nodetool -h localhost repair`, then it will repair only the
> > first Keyspace and then hang... I let it go for a week and nothing.
>
> Do the node logs show any errors?
>
> > If I run `nodetool -h localhost repair -pr`, then it appears to only
> > repair the first VNode range, but does do all keyspaces…
>
> As far as I know, this is fixed in Cassandra 1.2.3.
>
> --
> Marco Matarazzo
Re: Repair hangs after Upgrade to VNodes 1.2.2
Upgrading to 1.2.3 fixed the `-pr` repair... I'll just use that from now on (which is what I prefer!).

Thanks,
Ryan

On Wed, Mar 27, 2013 at 9:11 AM, Ryan Lowe ryanjl...@gmail.com wrote:

> Marco,
>
> No, there are no errors... the last line I see in my logs related to repair is:
>
>     [repair #...] Sending completed merkle tree to /[node] for (keyspace1,columnfamily1)
>
> [earlier thread quoted in full; ...]
VNodes, Replication and Minimum cluster size
I have heard before that the recommended minimum cluster size is 4 (with a replication factor of 3). I am curious to know whether vnodes change that, or whether that statement was valid to begin with!

The use case I am working on is one where we see a tremendous amount of load for just 2 days out of the week, and the rest of the time the cluster is pretty much idle. It appears that vnodes will let me auto-scale the cluster's size a little more easily, but I am wondering how small I can make the cluster in physical server count and still have a good replication count. I'll panic about having 1 of 2 or 1 of 3 servers going down in an outage as a separate topic, alone at night while not sleeping.

Thanks!
Ryan
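The arithmetic behind the "4 nodes, RF=3" recommendation is worth making explicit. A back-of-the-envelope sketch (my own illustration, not from the thread):

```python
# Sketch: how many replica failures a QUORUM read/write can survive,
# for a few (cluster_size, replication_factor) combinations.
# Assumes a single datacenter and RF <= cluster size.

def quorum(rf):
    """Replicas that must respond for a QUORUM operation."""
    return rf // 2 + 1

def tolerable_failures(rf):
    """Replica failures a QUORUM operation can survive."""
    return rf - quorum(rf)

for nodes, rf in [(2, 2), (3, 3), (4, 3)]:
    print(f"{nodes} nodes, RF={rf}: quorum={quorum(rf)}, "
          f"survives {tolerable_failures(rf)} replica failure(s)")
```

With RF=2, quorum is also 2, so QUORUM operations survive zero failures; RF=3 tolerates one. The extra node in "4 with RF=3" gives you headroom to lose a machine and still have somewhere to rebuild.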
Re: Creating column families per client
What we have done to avoid creating multiple column families is to namespace the row key. So if we have a column family of Users and two accounts, AccountA and AccountB, we do the following:

    Column Family User:
        AccountA/ryan : { first: Ryan, last: Lowe }
        AccountB/ryan : { first: Ryan, last: Smith }
        ...

For our needs, this did the same thing as having two User column families for AccountA and AccountB.

Ryan

On Wed, Dec 21, 2011 at 10:34 AM, Flavio Baronti f.baro...@list-group.com wrote:

> Hi,
>
> Based on my experience with Cassandra 0.7.4, I strongly discourage you from doing that: we tried dynamic creation of column families, and it was a nightmare.
>
> First of all, the operation cannot be done concurrently, so you must find a way to avoid parallel creation (across the whole cluster, not just on a single node). The main problem, however, is with timestamps. The structure of your keyspace is versioned with a time-dependent id, which is assigned by the host where you perform the schema update, based on the local machine time. If you do two updates in close succession on two different nodes, and their clocks are not perfectly synchronized (and they never will be), Cassandra might be confused by their relative ordering and stop working altogether.
>
> Bottom line: don't.
>
> Flavio
>
> On 12/21/2011 14:45, Rafael Almeida wrote:
>
>> Hello,
>>
>> I am evaluating the usage of Cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node and immediately add new data to that column family and be confident that the data added will eventually become visible to a read?
>>
>> []'s
>> Rafael
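The row-key namespacing described above can be sketched in a few lines. This is an illustration only: the separator and helper names are mine, not from any client API, and a plain dict stands in for the column family.

```python
# Sketch of row-key namespacing: one shared "User" column family,
# with the account id folded into the row key. Separator and helper
# names are illustrative.

SEP = "/"

def make_row_key(account_id, user_id):
    return f"{account_id}{SEP}{user_id}"

def split_row_key(row_key):
    account_id, user_id = row_key.split(SEP, 1)
    return account_id, user_id

# A plain dict stands in for the column family here.
user_cf = {
    make_row_key("AccountA", "ryan"): {"first": "Ryan", "last": "Lowe"},
    make_row_key("AccountB", "ryan"): {"first": "Ryan", "last": "Smith"},
}

# Same user id under two tenants, no clash:
print(user_cf["AccountA/ryan"]["last"])  # Lowe
print(user_cf["AccountB/ryan"]["last"])  # Smith
```

One caveat: the separator character must never appear in the account id itself, or keys from different tenants can collide.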
Scaling Out / Replication Factor too?
We are working on a system that has super heavy traffic during specific times... think of sporting events. At other times we get almost 0 traffic. In order to handle the traffic during the events, we are planning on scaling Cassandra out into a very large cluster.

The size of our data is still quite small. A single event's data might be 100MB max, but we will be inserting that data very rapidly and needing to read it at the same time. Since we have very slow times, we use a replication factor of 2 and a cluster size of 2, and that handles it perfectly.

Since dataset size is not really an issue, what is the best way for us to scale out? We are using an order-preserving partitioner to do range scanning, so the last time I tried to scale out our cluster we ended up with very uneven load: the few nodes that contained the hot data were swamped, while the rest were barely touched. Another note: since we have very little data and lots of memory, we turned the key and row caches up almost as high as they could go.

So my question is this... if I bring in 20+ nodes, should I increase the replication factor as well? It would seem to make sense that a higher replication factor would help distribute read load. Or does it just mean that writes take even longer? What are some other suggestions on how to scale up (and then back down) for a system that gets very high traffic in known, small time windows?

Let me know if you need more info. Thanks!
Ryan
Re: Scaling Out / Replication Factor too?
Edward,

This information (and the presentation) was very helpful... but I still have a few more questions. During a test run, I brought up 16 servers with an RF of 2 and a read repair chance of 1.0. However, like I mentioned, the load was only on a few servers. I attempted to increase the key and row caching to completely cover our entire dataset, but CPU load on those few servers was still extremely high. Does the cache exist on every server in the cluster? Would turning off read repair (or turning it dramatically down) help reduce the load on the heavily loaded servers?

Thanks!
Ryan

On Sun, Aug 28, 2011 at 3:49 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

> > So my question is this... if I bring in 20+ nodes, should I increase
> > the replication factor as well?
>
> Each write is done to all natural endpoints of the data. If you set the replication factor equal to the number of nodes, write operations do not scale, because each write has to happen on every node. The same is true for read operations: even if you read at CL.ONE, read repair will perform a read on all replicas.*
>
> * However, there is one caveat to this advice, which I covered in this presentation: http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
>
> The read_repair_chance setting controls how often a read repair is triggered. You can increase the replication factor and lower read_repair_chance. This gives you many servers capable of serving the same read without the burden of repair reads across the other 19 nodes. However, this is NOT the standard method of scaling out.
>
> The standard, and probably better, way in all but a few instances is to leave the replication factor alone and add more nodes. Normally, people set the replication factor to 3. This gives 3 nodes to serve reads, and as long as the dataset is small (which is true in your case), those reads are heavily cached. You would need a very high number of reads/writes to bottleneck any node.
> Raising and lowering the replication factor is not the way to go; changing the replication factor involves more steps than just changing the variable, as does growing and shrinking the cluster.
>
> What to do about idling servers is another question. We have thought about having our idling web servers join our Hadoop cluster at night and then leave again in the morning :) Maybe you can have some fun with your Cassandra gear in its idle time.
>
> On Sun, Aug 28, 2011 at 2:47 PM, Ryan Lowe ryanjl...@gmail.com wrote:
>
>> We are working on a system that has super heavy traffic during specific times... [original question quoted in full; ...]
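Edward's point about read repair amplification can be put in numbers. A simplified model of my own (not from the thread): a CL.ONE read normally touches one replica, but with probability `read_repair_chance` it touches all RF replicas.

```python
# Sketch: expected replica reads triggered per client read at CL.ONE,
# under a simplified model where a read touches 1 replica normally and
# all RF replicas when a read repair fires.

def replica_reads_per_client_read(rf, read_repair_chance):
    return (1 - read_repair_chance) * 1 + read_repair_chance * rf

# With read_repair_chance = 1.0, every read hits all replicas, so
# raising RF alone does not spread read load:
print(replica_reads_per_client_read(rf=2, read_repair_chance=1.0))   # 2.0
print(replica_reads_per_client_read(rf=20, read_repair_chance=1.0))  # 20.0

# Lowering read_repair_chance lets a high RF actually fan reads out:
print(replica_reads_per_client_read(rf=20, read_repair_chance=0.1))  # 2.9
```

This is why the 16-server test with read repair chance 1.0 kept hammering the same few replicas: every read still involved every replica of the hot rows.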
Re: Scaling Out / Replication Factor too?
I meant to also add that we do not necessarily care if reads are somewhat stale... if two people reading from the cluster at the same time get different results (say within a 5-minute window), that is acceptable. Performance is the key thing.

Ryan

On Sun, Aug 28, 2011 at 7:24 PM, Ryan Lowe ryanjl...@gmail.com wrote:

> Edward,
>
> This information (and the presentation) was very helpful... [earlier thread quoted in full; ...]
Re: For multi-tenant, is it good to have a key space for each tenant?
I've been doing multi-tenancy with Cassandra for a while, and from what I have found, it is better to keep the number of keyspaces down. That said, I have been using composite keys for my multi-tenancy and it works great:

    Column Family: User
    Key: [AccountId]/[UserId]

This makes it super handy, especially if you use the order-preserving partitioner with range queries... If, for example, I want all of the users in account 14, I can do this range query:

    get User[14/:14/~];

But I am no great expert... just someone who is trying and loving Cassandra!

Ryan

On Thu, Aug 25, 2011 at 1:20 AM, Himanshi Sharma himanshi.sha...@tcs.com wrote:

> I am working on similar sort of stuff. As per my knowledge, creating a keyspace for each tenant would impose a lot of memory constraints. A shared keyspace with shared column families would be a better approach, with each row in the CF referred to by tenant_id as the row key. And again, it depends on the type of application. Hey, this is just a suggestion; I'm not completely sure. :)
>
> Himanshi Sharma
>
> From: Guofeng Zhang guofen...@gmail.com
> To: user@cassandra.apache.org
> Date: 08/25/2011 10:38 AM
> Subject: For multi-tenant, is it good to have a key space for each tenant?
>
>> I wonder if it is a good practice to create a key space for each tenant. Any advice is appreciated.
>>
>> Thanks
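The `14/:14/~` range works because, with an order-preserving partitioner, row keys sort lexicographically and `~` (0x7E) is the highest printable ASCII character. A small sketch of my own showing why the bounds select exactly one tenant (key values are made up):

```python
# Why `get User[14/:14/~]` selects only account 14's rows:
# keys sort lexicographically, and every "14/<user>" key falls between
# "14/" and "14/~" because "~" (0x7E) sorts above the other printables.

def tenant_range(account_id, sep="/"):
    return f"{account_id}{sep}", f"{account_id}{sep}~"

keys = sorted([
    "14/alice", "14/bob", "14/zara",   # tenant 14
    "140/carol", "15/dave",            # other tenants
])

lo, hi = tenant_range("14")
tenant_14 = [k for k in keys if lo <= k <= hi]
print(tenant_14)  # ['14/alice', '14/bob', '14/zara']
```

Note that "140/carol" is correctly excluded: "/" (0x2F) sorts below the digits, so "14/" and "140/" do not interleave. That property depends on the separator being lower in the collation than any key character.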
Counter Column Family Inconsistent Node
[default@Race] list CounterCF;
Using default limit of 100
-------------------
RowKey: Stats
=> (counter=APP, value=7503)
=> (counter=FILEUPLOAD, value=155)
=> (counter=MQUPLOAD, value=4726775)
=> (counter=PAGES, value=131948)
=> (counter=REST, value=3)
=> (counter=SOAP, value=44)
=> (counter=WS, value=1943)

1 Row Returned.

[default@Race] list CounterCF;
Using default limit of 100
-------------------
RowKey: Stats
=> (counter=APP, value=93683)
=> (counter=FILEUPLOAD, value=347)
=> (counter=MQUPLOAD, value=14961065367)
=> (counter=PAGES, value=183089568)
=> (counter=REST, value=3)
=> (counter=SOAP, value=44)
=> (counter=WS, value=23972)

1 Row Returned.

[default@Race] list CounterCF;
Using default limit of 100
-------------------
RowKey: Stats
=> (counter=APP, value=7503)
=> (counter=FILEUPLOAD, value=155)
=> (counter=MQUPLOAD, value=4726775)
=> (counter=PAGES, value=131948)
=> (counter=REST, value=3)
=> (counter=SOAP, value=44)
=> (counter=WS, value=1943)

1 Row Returned.

[default@Race] list CounterCF;
Using default limit of 100
-------------------
RowKey: Stats
=> (counter=APP, value=7503)
=> (counter=FILEUPLOAD, value=155)
=> (counter=MQUPLOAD, value=4726775)
=> (counter=PAGES, value=131948)
=> (counter=REST, value=3)
=> (counter=SOAP, value=44)
=> (counter=WS, value=1943)
Re: Counter Column Family Inconsistent Node
Yeah, sorry about that... pushed send before I added my comments.

I have a cluster of 5 nodes using 0.8.4 where I am using counters. On one of my nodes, every time I do a list command I get different results: the counters jump all over the place. Any ideas? I have run nodetool repair on all nodes.

Thanks!
Ryan

On Tue, Aug 16, 2011 at 1:18 PM, Ryan Lowe ryanjl...@gmail.com wrote:

> [cli output quoted in full; ...]
Re: Counter Column Family Inconsistent Node
Actually, I think it was more related to our servers getting their time out of sync... after finding this article:

http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/

I checked our servers, and sure enough, 2 of them were out of sync with each other by more than 2 minutes! I reconfigured ntp and I think I am back in business.

Thanks though!
Ryan

On Tue, Aug 16, 2011 at 2:53 PM, Jonathan Ellis jbel...@gmail.com wrote:

> May be the same as https://issues.apache.org/jira/browse/CASSANDRA-3006 ?
>
> On Tue, Aug 16, 2011 at 12:20 PM, Ryan Lowe ryanjl...@gmail.com wrote:
>
>> Yeah, sorry about that... pushed send before I added my comments. [earlier thread quoted in full; ...]
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
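The clock-skew failure mode behind this thread can be illustrated with a toy last-write-wins resolver. This is my own sketch, not Cassandra code; it only models the general principle that the highest client-supplied timestamp wins, so a node with a fast clock can make a stale value "win" over a newer one.

```python
# Toy illustration of why unsynchronized clocks break timestamp-based
# last-write-wins resolution. Node B's clock runs 2 minutes fast, so an
# *earlier* real-world write stamped by B beats a *later* write stamped
# by A. Versions are (timestamp_micros, value) tuples.

def resolve(*versions):
    """Last-write-wins: the version with the highest timestamp wins."""
    return max(versions, key=lambda v: v[0])

REAL_T0 = 1_000_000_000_000_000  # some real instant, in microseconds
SKEW = 120 * 1_000_000           # node B's clock is 120 s ahead

write_from_b = (REAL_T0 + SKEW, "old value")       # happens first
write_from_a = (REAL_T0 + 1_000_000, "new value")  # happens 1 s later

winner = resolve(write_from_a, write_from_b)
print(winner[1])  # "old value" -- the stale write wins
```

Until the skewed node's clock catches up (two minutes here), every legitimate update loses to the mis-stamped one, which looks exactly like values "jumping all over the place". Hence the advice in the linked article to keep ntp healthy on every node.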