Re: Attempting to load already loaded column family during startup
Well, that didn't go away after I removed all the caches. What should I do now?

On Wed, Oct 10, 2012 at 2:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:

> exception encountered during startup: Attempting to load already loaded column family system_traces.sessions
> java.lang.RuntimeException: Attempting to load already loaded column family system_traces.sessions
>     at org.apache.cassandra.config.Schema.load(Schema.java:398)
>     at org.apache.cassandra.config.Schema.load(Schema.java:111)
>     at org.apache.cassandra.config.Schema.load(Schema.java:96)
>     at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:560)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:214)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:386)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:429)
>
> This is thrown while reading saved row caches. What could have caused the problem?
Re: unbalanced ring
Hi! I am re-posting this, now that I have more data and still an *unbalanced ring*:

3 nodes, RF=3, RCL=WCL=QUORUM

Address        DC       Rack  Status  State   Load      Owns    Token
                                                                113427455640312821154458202477256070485
x.x.x.x        us-east  1c    Up      Normal  24.02 GB  33.33%  0
y.y.y.y        us-east  1c    Up      Normal  33.45 GB  33.33%  56713727820156410577229101238628035242
z.z.z.z        us-east  1c    Up      Normal  29.85 GB  33.33%  113427455640312821154458202477256070485

Repair runs weekly. I don't run nodetool compact, as I read that this may cause the regular minor compactions not to run, after which I would have to run compact manually. Is that right? Any idea whether this means something is wrong, and if so, how to solve it?

Thanks,

*Tamar Fraenkel*
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736  Mob: +972 54 8356490  Fax: +972 2 5612956

On Tue, Mar 27, 2012 at 9:12 AM, Tamar Fraenkel ta...@tok-media.com wrote:

> Thanks, I will wait and see as data accumulates.

On Tue, Mar 27, 2012 at 9:00 AM, R. Verlangen ro...@us2.nl wrote:

> Cassandra is built to store tons and tons of data. In my opinion roughly ~6 MB per node is not enough data to allow it to become a fully balanced cluster.

2012/3/27 Tamar Fraenkel ta...@tok-media.com:

> This morning I have nodetool ring -h localhost
>
> Address        DC       Rack  Status  State   Load     Owns    Token
>                                                                113427455640312821154458202477256070485
> 10.34.158.33   us-east  1c    Up      Normal  5.78 MB  33.33%  0
> 10.38.175.131  us-east  1c    Up      Normal  7.23 MB  33.33%  56713727820156410577229101238628035242
> 10.116.83.10   us-east  1c    Up      Normal  5.02 MB  33.33%  113427455640312821154458202477256070485
>
> Version is 1.0.8.

On Tue, Mar 27, 2012 at 4:05 AM, Maki Watanabe watanabe.m...@gmail.com wrote:

> What version are you using? Anyway, try nodetool repair / compact.
> maki

2012/3/26 Tamar Fraenkel ta...@tok-media.com:

> Hi! I created an Amazon ring using the DataStax image and started filling the db. The cluster seems unbalanced. nodetool ring returns:
>
> Address        DC       Rack  Status  State   Load       Owns    Token
>                                                                  113427455640312821154458202477256070485
> 10.34.158.33   us-east  1c    Up      Normal  514.29 KB  33.33%  0
> 10.38.175.131  us-east  1c    Up      Normal  1.5 MB     33.33%  56713727820156410577229101238628035242
> 10.116.83.10   us-east  1c    Up      Normal  1.5 MB     33.33%  113427455640312821154458202477256070485
>
> [default@tok] describe;
> Keyspace: tok:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>   Options: [replication_factor:2]
> [default@tok] describe cluster;
> Cluster Information:
>   Snitch: org.apache.cassandra.locator.Ec2Snitch
>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>   Schema versions:
>     4687d620-7664-11e1--1bcb936807ff: [10.38.175.131, 10.34.158.33, 10.116.83.10]
>
> Any idea what is the cause? I am running similar code on a local ring and it is balanced. How can I fix this?

--
With kind regards,
Robin Verlangen
www.robinverlangen.nl
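As a sanity check on the token layout above: with RandomPartitioner, evenly balanced initial tokens for an N-node ring are i * 2^127 / N. A minimal sketch (class and method names are mine, not from the thread):

```java
import java.math.BigInteger;

public class BalancedTokens {
    // Evenly spaced initial tokens for RandomPartitioner,
    // whose token space is [0, 2^127).
    static BigInteger[] tokensFor(int nodeCount) {
        BigInteger space = BigInteger.ONE.shiftLeft(127); // 2^127
        BigInteger[] tokens = new BigInteger[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            tokens[i] = space.multiply(BigInteger.valueOf(i))
                             .divide(BigInteger.valueOf(nodeCount));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (BigInteger token : tokensFor(3))
            System.out.println(token);
    }
}
```

For 3 nodes this yields 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485 — exactly the tokens in the ring above, so the 33.33% ownership is correct and the uneven Load must come from the data on disk (e.g. sstables awaiting compaction or repair overstreaming), not from token placement.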
cassandra 1.2 beta in production
Hi Guys,

Are there any known critical bugs that would prevent using 1.2 beta 1 in production? We don't use CQL or secondary indexes.

--
Best regards,
Zotov Alexey
Grid Dynamics
Skype: azotcsit
Re: unbalanced ring
Hi,

Same thing here: 2 nodes, RF = 2, RCL = 1, WCL = 1. Like Tamar, I have never run a major compaction, and repair runs once a week on each node.

10.59.21.241  eu-west  1b  Up  Normal  133.02 GB  50.00%  0
10.58.83.109  eu-west  1b  Up  Normal  98.12 GB   50.00%  85070591730234615865843651857942052864

What phenomena could explain the result above?

By the way, I have copied the data and imported it into a one-node dev cluster. There I ran a major compaction, and the size of my data was significantly reduced (to about 32 GB instead of 133 GB). How is that possible? Do you think that if I run a major compaction on both nodes it will balance the load evenly? Should I run a major compaction in production?

2012/10/10 Tamar Fraenkel ta...@tok-media.com:

> Hi! I am re-posting this, now that I have more data and still an *unbalanced ring*: 3 nodes, RF=3, RCL=WCL=QUORUM [...] repair runs weekly. I don't run nodetool compact as I read that this may cause the minor regular compactions not to run and then I will have to run compact manually. Is that right? Any idea if this means something wrong, and if so, how to solve?
Upgrading hardware on a node in a cluster
Hi List,

I'd like to migrate the nodes in my cluster to new hardware, moving one node at a time. I'm running the cluster in Amazon, so I don't get to pick the IP address of each host myself. I'd like to decommission, say, the node with token 0, and bring that node up on the new hardware (which will have a new IP address). Can anyone provide me with a recipe for doing this? I've looked around and read about nodetool move, which didn't make me much wiser.

Thanks for your help,
/Martin Koch - Issuu - Senior Systems Architect
Re: 1000's of CF's.
The main problem is that this sweet spot is very narrow. We can't have lots of CFs, we can't have long rows, and we end up with an enormous number of huge composite row keys and stored metadata about those keys (keep in mind the overhead of such a scheme, though it looks like nobody really cares about that anymore). This approach is also bad for running Hadoop jobs on it (for now I'm pointing at this as the main problem for me) and for creating secondary indexes (lots of rows means high cardinality, right?); some per-CF option could also become a limiting factor. And the bad thing about it: this just doesn't look extendable; you must end up with 'not-so-many' big CFs. That's a dead end. Maybe it wouldn't look that bad if you try not to associate a CF with any real entity and call them a 'random stuff store'.

I just hope that I'm wrong and there's some good compromise between the three ways of storing data: long rows, many 'very-composite' rows, and partitioning by CF. Which way is preferable for running complicated analytics queries on top of it in a fair amount of time? How do people handle this?

--
W/ best regards,
Sergey.

On 10.10.2012 2:15, Ben Hood wrote:

> I'm not a Cassandra dev, so take what I say with a lot of salt, but AFAICT there is a certain amount of overhead in maintaining a CF, so when you have large numbers of CFs, this adds up. From a layperson's perspective, this observation sounds reasonable, since zero-cost CFs would be tantamount to being able to implement secondary indexes by just adding CFs. So instead of paying for the overhead (or the ineffectiveness of high-cardinality secondary indexes, whichever way you want to look at it), you are expecting a free lunch by just scaling out in terms of new CFs. I would imagine that under the covers the layout of Cassandra has a sweet spot of a smallish number of CFs (i.e. 10s), but these can practically have as many rows as you like.

On Mon, Oct 8, 2012 at 11:02 AM, Vanger disc...@gmail.com wrote:

> So what should the solution be for a Cassandra architecture when we need to run Hadoop M/R jobs and not be restricted by the number of CFs? What we have now is a fair amount of CFs (2K), and this number is slowly growing, so we are already planning to merge partitioned CFs. But our next goal is to run Hadoop tasks on those CFs. All we have is plain Hector and a custom ORM on top of it. As far as I understand, VirtualKeyspace doesn't help in our case. Also, I don't understand why not implement support for many CFs (or built-in partitioning) on the Cassandra side. Can anybody explain why this can or cannot be done in Cassandra? Just in case: we're using Cassandra 1.0.11 on 30 nodes (planning an upgrade to 1.1.* soon).
>
> --
> W/ best regards,
> Sergey.

On 04.10.2012 0:10, Hiller, Dean wrote:

> Okay, so it only took me two solid days, not a week. PlayOrm in the master branch now supports virtual CFs, or virtual tables, in ONE CF, so you can have 1000's or millions of virtual CFs in one CF now. It works with all the Scalable-SQL, works with the joins, and works with the PlayOrm command line tool. Two ways to do it: if you are using the ORM half, you just annotate
>
> @NoSqlEntity("MyVirtualCfName")
> @NoSqlVirtualCf(storedInCf="sharedCf")
>
> so it's stored in sharedCf with the table name MyVirtualCfName (in the command line tool, use MyVirtualCfName to query the table). Then, if you don't know your metadata ahead of time, you need to create DboTableMeta and DboColumnMeta objects and save them for every table you create, and you can use TypedRow to read and persist (which is what we have a project doing). If you try it out, let me know. We usually get bug fixes in pretty fast if you run into anything (more and more questions are forming on Stack Overflow as well ;) ).
>
> Later,
> Dean
Re: Option for ordering columns by timestamp in CF
I think Cassandra should provide a configurable option, on a per-column-family basis, to sort columns by timestamp rather than by column name. This would be really helpful for maintaining time-sorted columns without using up the column name for the timestamp, which might otherwise be used to store a more relevant column name useful for retrieval. Very frequently we need to store data sorted in time order, so I think this may be a very general requirement, not specific to just my use case. Does it make sense to create an issue for this?

On Fri, Mar 25, 2011 at 2:38 AM, aaron morton aa...@thelastpickle.com wrote:

> If you mean ordering by the column timestamp (as passed by the client), that is not possible. Can you use your own timestamps as the column names and store them as long values?
> Aaron

On 25 Mar 2011, at 09:30, Narendra Sharma wrote:

> Cassandra 0.7.4. Column names in my CF are of type byte[], but I want to order columns by timestamp. What is the best way to achieve this? Does it make sense for Cassandra to support ordering columns by timestamp as an option for a column family, irrespective of the column name type?
>
> Thanks,
> Naren
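Aaron's workaround (use the timestamp itself as the column name, stored as a long) works because a column family compared with LongType orders columns numerically. A minimal sketch of the encoding a client would use; class and method names are mine, not an actual client-library API:

```java
import java.nio.ByteBuffer;

public class TimestampColumnNames {
    // Encode a millisecond timestamp as the 8-byte big-endian value
    // that a LongType-compared column family expects as a column name.
    static byte[] columnName(long timestampMillis) {
        return ByteBuffer.allocate(8).putLong(timestampMillis).array();
    }

    // Decode a column name back to the original timestamp.
    static long timestamp(byte[] columnName) {
        return ByteBuffer.wrap(columnName).getLong();
    }

    public static void main(String[] args) {
        long earlier = 1349856000000L; // hypothetical event times
        long later   = 1349859600000L;
        System.out.println(timestamp(columnName(earlier)) < timestamp(columnName(later)));
    }
}
```

A column slice over such a CF then returns events in chronological order for free; the trade-off, as the original post notes, is that the column name can no longer carry a descriptive identifier, so the event payload has to live in the column value.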
Re: can I have a mix of 32 and 64 bit machines in a cluster?
On Tuesday 09 of October 2012, Brian Tarbox wrote:

> I can't imagine why this would be a problem, but I wonder if anyone has experience with running a mix of 32- and 64-bit nodes in a cluster.

We have been running a mixed-userspace 64/32-bit (all kernels 64-bit) Linux 1.0.10 cluster for our daily operations for months now without issue.

Regards,
--
Mateusz Korniak
Re: Option for ordering columns by timestamp in CF
I think that would be cool.

/Martin Koch - Issuu - Senior Software Architect

On Wed, Oct 10, 2012 at 11:44 AM, Ertio Lew ertio...@gmail.com wrote:

> I think Cassandra should provide a configurable option on a per column family basis to do columns sorting by time-stamp rather than column names. [...] Does it make sense to create an issue for this?
Re: Upgrading hardware on a node in a cluster
Well, you could use Amazon VPC, in which case you DO pick the IP yourself ;) ... it makes life a bit easier.

Dean

On Wednesday, October 10, 2012 at 3:29 AM, Martin Koch m...@issuu.com wrote:

> Hi List, I'd like to migrate my nodes in a cluster to new hardware, moving one node at a time. I'm running the cluster in Amazon, so I don't get to pick the ip number of each host myself. I'd like to decommission, say, the node with token 0, and bring that node up on the new hardware (which will have a new IP number). Can anyone provide me with a recipe for doing this?
Re: 1000's of CF's.
I do believe they could solve this if they wanted to. We are now streaming 5000 virtual CFs into one CF with PlayOrm. Our plan now is to use Storm to do the processing in place of map/reduce. Each virtual CF can also be partitioned (you choose the column that is the partition key).

So I would love to see Cassandra have a way to create a virtual CF with a row key prefix identifying that virtual CF. Right now, however, until Cassandra has something, we are moving forward with our solution, as it seems to work great so far, and we don't have time to wait either. In PlayOrm, each index of each partition has the full list of keys, so I will probably just have Storm work off the indices of every partition in the virtual CF, so I can map/reduce a virtual CF just fine.

Later,
Dean

On Wednesday, October 10, 2012 at 3:37 AM, Vanger disc...@gmail.com wrote:

> The main problem is that this sweet spot is very narrow. We can't have lots of CFs, we can't have long rows, and we end up with an enormous number of huge composite row keys and stored metadata about those keys. [...] Which way is preferable to run complicated analytics queries on top of it in fair amount of time? How people handle this?
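The row-key-prefix idea Dean describes can be sketched as follows: every row of a virtual CF is stored in one shared physical CF under a key that prepends the virtual CF's name. The delimiter and helper names below are my own illustration, not PlayOrm's actual key scheme:

```java
public class VirtualCfKeys {
    // NUL delimiter; assumes application row keys never contain '\u0000'.
    private static final char DELIMITER = '\u0000';

    // Compose the physical row key for a row belonging to a virtual CF.
    static String physicalKey(String virtualCf, String rowKey) {
        return virtualCf + DELIMITER + rowKey;
    }

    // Recover { virtualCf, rowKey } from a physical row key.
    static String[] parse(String physicalKey) {
        int i = physicalKey.indexOf(DELIMITER);
        return new String[] { physicalKey.substring(0, i), physicalKey.substring(i + 1) };
    }

    public static void main(String[] args) {
        String key = physicalKey("users_2012_10", "user42"); // hypothetical names
        String[] parts = parse(key);
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

Note that with RandomPartitioner you cannot range-scan by key prefix, so enumerating all keys of one virtual CF relies on a separate per-virtual-CF index of keys — which matches what Dean describes PlayOrm doing, with Storm (or map/reduce) walking those indices.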
Re: Attempting to load already loaded column family during startup
I know what happened here. The node encountering the exception during startup is 1.2, while there is another node running 1.2-beta2. https://issues.apache.org/jira/browse/CASSANDRA-4416 includes metadata for the system keyspace itself in the schema_* tables. Hence, when both nodes were up, the 1.2-beta2 node streamed that metadata to the 1.2 node. Now, when I restarted the 1.2 node, the following code loads the system keyspace again, when only non-system keyspaces should be loaded:

    public static Collection<KSMetaData> loadFromTable()
    {
        List<Row> serializedSchema = SystemTable.serializedSchema(SystemTable.SCHEMA_KEYSPACES_CF);
        List<KSMetaData> keyspaces = new ArrayList<KSMetaData>(serializedSchema.size());
        for (Row row : serializedSchema)
        {
            if (invalidSchemaRow(row))
                continue;
            keyspaces.add(KSMetaData.fromSchema(row, serializedColumnFamilies(row.key)));
        }
        return keyspaces;
    }

In 1.2-beta2, the system keyspace is filtered out. I think I'm gonna update my 1.2 node.

On Wed, Oct 10, 2012 at 2:18 PM, Manu Zhang owenzhang1...@gmail.com wrote:

> Well, that didn't go away after I removed all the caches. What should I do now?

On Wed, Oct 10, 2012 at 2:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:

> exception encountered during startup: Attempting to load already loaded column family system_traces.sessions
> java.lang.RuntimeException: Attempting to load already loaded column family system_traces.sessions
>     at org.apache.cassandra.config.Schema.load(Schema.java:398)
>     [...]
>
> This is thrown while reading saved row caches. What could have caused the problem?
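The 1.2-beta2 behaviour the post refers to — skipping serialized schema rows that describe internal keyspaces — amounts to a filter like the following. The names and the keyspace list are illustrative only, not the actual Cassandra code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaRowFilter {
    // Keyspaces managed internally by the node; their definitions must
    // never be (re)loaded from the serialized schema tables.
    // Illustrative subset, not the real list.
    private static final List<String> INTERNAL = Arrays.asList("system", "system_traces");

    // Mirrors the role of invalidSchemaRow(row) in the quoted snippet.
    static boolean invalidSchemaRow(String keyspaceName) {
        return INTERNAL.contains(keyspaceName);
    }

    // Keep only user keyspaces from a list of schema-row keyspace names.
    static List<String> loadableKeyspaces(List<String> names) {
        List<String> result = new ArrayList<String>();
        for (String name : names)
            if (!invalidSchemaRow(name))
                result.add(name);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(loadableKeyspaces(Arrays.asList("system", "system_traces", "tok")));
    }
}
```

With such a filter in place, a system-keyspace row streamed in from another node is simply ignored at startup instead of triggering the "already loaded column family" exception.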
Re: cassandra 1.2 beta in production
https://issues.apache.org/jira/browse/CASSANDRA/fixforversion/12323284

On Wed, Oct 10, 2012 at 1:41 AM, Alexey Zotov azo...@griddynamics.com wrote:

> Hi Guys, what known critical bugs are there that would prevent using 1.2 beta 1 in production? We don't use CQL and secondary indexes.
Re: unbalanced ring
Major compaction in production is fine; however, it is a heavy operation on the node and will take I/O and some CPU.

The only time I have seen this happen is when I have changed the tokens in the ring, e.g. with nodetool movetoken. Cassandra does not auto-delete data that it doesn't use anymore, just in case you want to move the tokens again or otherwise undo the change. Try nodetool cleanup.

On Wed, Oct 10, 2012 at 2:01 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

> Hi, Same thing here: 2 nodes, RF = 2, RCL = 1, WCL = 1. Like Tamar, I have never run a major compaction, and repair runs once a week on each node.
>
> 10.59.21.241  eu-west  1b  Up  Normal  133.02 GB  50.00%  0
> 10.58.83.109  eu-west  1b  Up  Normal  98.12 GB   50.00%  85070591730234615865843651857942052864
>
> What phenomena could explain the result above? By the way, I have copied the data and imported it into a one-node dev cluster. There I ran a major compaction, and the size of my data was significantly reduced (to about 32 GB instead of 133 GB). How is that possible? Do you think that if I run a major compaction on both nodes it will balance the load evenly? Should I run a major compaction in production?
Re: Upgrading hardware on a node in a cluster
If you have N nodes in your cluster, add N new nodes using the new hardware, then decommission the old N nodes (and migrate to VPC like Dean said).

On Wed, Oct 10, 2012 at 5:23 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

> Well, you could use Amazon VPC, in which case you DO pick the IP yourself ;) ... it makes life a bit easier.
> Dean
Re: unbalanced ring
Hi! Apart from being heavy load (the compact), will it have other effects? Also, will cleanup help if I have replication factor = number of nodes? Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Wed, Oct 10, 2012 at 6:12 PM, B. Todd Burruss bto...@gmail.com wrote: major compaction in production is fine, however it is a heavy operation on the node and will take I/O and some CPU. the only time i have seen this happen is when i have changed the tokens in the ring, like nodetool movetoken. cassandra does not auto-delete data that it doesn't use anymore just in case you want to move the tokens again or otherwise undo. try nodetool cleanup On Wed, Oct 10, 2012 at 2:01 AM, Alain RODRIGUEZ arodr...@gmail.comwrote: Hi, Same thing here: 2 nodes, RF = 2. RCL = 1, WCL = 1. Like Tamar I never ran a major compaction and repair once a week each node. 10.59.21.241eu-west 1b Up Normal 133.02 GB 50.00% 0 10.58.83.109eu-west 1b Up Normal 98.12 GB 50.00% 85070591730234615865843651857942052864 What phenomena could explain the result above ? By the way, I have copy the data and import it in a one node dev cluster. There I have run a major compaction and the size of my data has been significantly reduced (to about 32 GB instead of 133 GB). How is that possible ? Do you think that if I run major compaction in both nodes it will balance the load evenly ? Should I run major compaction in production ? 2012/10/10 Tamar Fraenkel ta...@tok-media.com Hi! I am re-posting this, now that I have more data and still *unbalanced ring*: 3 nodes, RF=3, RCL=WCL=QUORUM Address DC RackStatus State Load OwnsToken 113427455640312821154458202477256070485 x.x.x.xus-east 1c Up Normal 24.02 GB 33.33% 0 y.y.y.y us-east 1c Up Normal 33.45 GB 33.33% 56713727820156410577229101238628035242 z.z.z.zus-east 1c Up Normal 29.85 GB 33.33% 113427455640312821154458202477256070485 repair runs weekly. 
I don't run nodetool compact as I read that this may cause the minor regular compactions not to run and then I will have to run compact manually. Is that right? Any idea if this means something wrong, and if so, how to solve? Thanks, * Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Tue, Mar 27, 2012 at 9:12 AM, Tamar Fraenkel ta...@tok-media.comwrote: Thanks, I will wait and see as data accumulates. Thanks, *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Tue, Mar 27, 2012 at 9:00 AM, R. Verlangen ro...@us2.nl wrote: Cassandra is built to store tons and tons of data. In my opinion roughly ~ 6MB per node is not enough data to allow it to become a fully balanced cluster. 2012/3/27 Tamar Fraenkel ta...@tok-media.com This morning I have nodetool ring -h localhost Address DC RackStatus State Load OwnsToken 113427455640312821154458202477256070485 10.34.158.33us-east 1c Up Normal 5.78 MB 33.33% 0 10.38.175.131 us-east 1c Up Normal 7.23 MB 33.33% 56713727820156410577229101238628035242 10.116.83.10us-east 1c Up Normal 5.02 MB 33.33% 113427455640312821154458202477256070485 Version is 1.0.8. *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Tue, Mar 27, 2012 at 4:05 AM, Maki Watanabe watanabe.m...@gmail.com wrote: What version are you using? Anyway try nodetool repair compact. maki 2012/3/26 Tamar Fraenkel ta...@tok-media.com Hi! I created Amazon ring using datastax image and started filling the db. The cluster seems un-balanced. 
nodetool ring returns:

Address          DC         Rack    Status    State     Load         Owns      Token
                                                                               113427455640312821154458202477256070485
10.34.158.33     us-east    1c      Up        Normal    514.29 KB    33.33%    0
10.38.175.131    us-east    1c      Up        Normal    1.5 MB       33.33%    56713727820156410577229101238628035242
10.116.83.10     us-east    1c      Up        Normal    1.5 MB       33.33%    113427455640312821154458202477256070485

[default@tok] describe;
Keyspace: tok:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]
[default@tok] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2Snitch
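For the RandomPartitioner, a balanced N-node ring places initial tokens at i * 2**127 / N over the 0 .. 2**127 - 1 token space. A quick sketch (plain Python, not a Cassandra tool) shows the tokens in the ring above are exactly the evenly spaced values, so token placement is not the cause of any load difference:

```python
# Compute evenly spaced initial tokens for Cassandra's RandomPartitioner
# (token space 0 .. 2**127 - 1). A balanced ring puts node i at
# i * 2**127 // N.
def balanced_tokens(num_nodes: int) -> list[int]:
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]

tokens = balanced_tokens(3)
print(tokens)
# -> [0, 56713727820156410577229101238628035242,
#     113427455640312821154458202477256070485]
# These match the ring output above: ownership really is 33.33% per node,
# so differing Load values come from data/compaction, not token choice.
```

Since these are the tokens the DataStax image assigned, the "Owns" column is correct and the imbalance in Load is expected to even out as data accumulates (as R. Verlangen notes for a few MB per node).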
Re: How to replace a dead *seed* node while keeping quorum
I witnessed the same behavior as reported by Edward and James. Removing the host from its own seed list does not solve the problem. Removing it from the config of all nodes and restarting each, then restarting the failed node, worked.

Ron

On Sep 12, 2012, at 4:42 PM, Edward Sargisson wrote:

I'm reposting my colleague's reply to Rob to the list (with James' permission) in case others are interested. I'll add to James' post below to say I don't believe we saw the message that that slice of code would have printed.

Hey Rob,

Ed's AWOL right now and I'm not on u@c.a.o, but I can tell you that when I removed the downed seed node from its own list of seed nodes in cassandra.yaml, it didn't join the existing ring, nor did it get any schemas or data from the existing ring; it felt like timeouts were happening. (IANA Cassandra wizard, so excuse my terminology impedance.)

After changing the machine's hostname and giving it a new IP, it behaved as expected: joining the ring and syncing both schema and associated data. The downed node is 1.1.4; the rest of the ring is 1.1.2. I'm in a situation where I can revert the IP/hostname change and retry the scenario as needed if you've got any ideas.

HTH,
JAmes

Cheers,
Edward

On 12-09-12 03:53 PM, Rob Coli wrote:

On Tue, Sep 11, 2012 at 4:21 PM, Edward Sargisson edward.sargis...@globalrelay.net wrote:

If the downed node is a seed node then neither of the "replace a dead node" procedures works (-Dcassandra.replace_token and taking initial_token-1). The ring remains split. [...] In other words, if the host name is on the seeds list then it appears that the rest of the ring refuses to bootstrap it.

Close, but not exactly...
./src/java/org/apache/cassandra/service/StorageService.java, line 559 of 3090:

    if (DatabaseDescriptor.isAutoBootstrap()
        && DatabaseDescriptor.getSeeds().contains(FBUtilities.getBroadcastAddress())
        && !SystemTable.isBootstrapped())
        logger_.info("This node will not auto bootstrap because it is configured to be a seed node.");

getSeeds asks your seed provider for a list of seeds. If you are using the SimpleSeedProvider, this basically turns the list of seeds in cassandra.yaml on the local node into a list of hosts. So it isn't that the other nodes have this node in their seed list... it's that the node you are replacing has itself in its own seed list, and shouldn't. I understand that it can be tricky in conf management tools to make seed nodes' seed lists not contain themselves, but I believe it is currently necessary in this case.

FWIW, it's unclear to me (and Aaron Morton, whose curiosity was apparently equally piqued and who is looking into it further) why exactly seed nodes shouldn't bootstrap. It's possible that they only shouldn't bootstrap without being in hibernate mode, and that the code just hasn't been re-written post replace_token/hibernate to say that it's OK for seed nodes to bootstrap as long as they hibernate...

=Rob

--
Edward Sargisson
senior java developer
Global Relay
edward.sargis...@globalrelay.net
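Rob's config-management point — that each seed node's own seed list must not contain the node itself, or the bootstrap check above fires — can be sketched as a small template helper. This is a hypothetical helper for a provisioning script, not Cassandra code; `seeds_for` and the IPs are made up for illustration:

```python
# Sketch: generate the per-node "seeds" value for cassandra.yaml so that a
# seed node never lists itself, avoiding the "will not auto bootstrap
# because it is configured to be a seed node" path when replacing a dead
# seed. Hypothetical config-management helper, not part of Cassandra.
def seeds_for(node_ip: str, all_seed_ips: list[str]) -> str:
    """Comma-separated seed list for one node, excluding that node's own IP."""
    others = [ip for ip in all_seed_ips if ip != node_ip]
    return ",".join(others)

# Each seed node gets the other seeds only; non-seed nodes get the full list.
print(seeds_for("10.0.0.1", ["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
# -> 10.0.0.2,10.0.0.3
```

Whether seed nodes should ever bootstrap at all is, as Rob notes, still an open question; this only captures the workaround the thread converged on.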
Re: unbalanced ring
It should not have any other impact except increased usage of system resources. And I suppose cleanup would not have an effect (over normal compaction) if all nodes contain the same data.

On Wed, Oct 10, 2012 at 12:12 PM, Tamar Fraenkel ta...@tok-media.com wrote:

Hi! Apart from being a heavy load (the compact), will it have other effects? Also, will cleanup help if I have replication factor = number of nodes?

[ snip ]
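The cleanup question has a simple back-of-the-envelope answer, assuming SimpleStrategy and even token distribution: each node stores min(RF, N)/N of the data, so with RF = N every node owns everything and cleanup has nothing to delete. A small sketch (illustrative only, not a Cassandra API):

```python
# Why cleanup is a no-op when RF equals the number of nodes: with
# SimpleStrategy and even tokens, each node stores min(RF, N) / N of the
# data, i.e. 100% when RF == N, so there are no unowned ranges for
# `nodetool cleanup` to remove.
def data_fraction_per_node(rf: int, n: int) -> float:
    return min(rf, n) / n

print(data_fraction_per_node(3, 3))  # 1.0 -> cleanup removes nothing
print(data_fraction_per_node(2, 3))  # each node holds ~2/3 of the data
```

This matches B. Todd's observation: cleanup only helps after token moves, when a node still holds data for ranges it no longer owns.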
Re: READ messages dropped
Hi!

Thanks for the answer. I don't see much change in the load this Cassandra cluster is under, so why the sudden surge of such messages?

What I did notice while looking at the logs (the nodes are also running OpsCenter) is that there is some correlation between the dropped reads and flushes of OpsCenter column families to disk and/or compactions. What are the rollups CFs? Why is there so much traffic in them?

Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media
ta...@tok-media.com

On Wed, Oct 10, 2012 at 1:00 AM, aaron morton aa...@thelastpickle.com wrote:

"or how to solve it?"
Simple solution is move to m1.xlarge :)

"In the last 3 days I see many messages of READ messages dropped in last 5000ms on one of my 3 nodes cluster."
The node is not able to keep up with the load. Possible causes include excessive GC, aggressive compaction, or simply too many requests. It is also a good idea to take a look at iostat to see if the disk is keeping up.

Hope that helps

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/10/2012, at 9:08 AM, Tamar Fraenkel ta...@tok-media.com wrote:

Hi!

In the last 3 days I see many messages of "READ messages dropped in last 5000ms" on one of my 3 nodes cluster. I see no errors in the log. There are also messages of "Finished hinted handoff of 0 rows to endpoint", but I have had those for a while now, so I don't know if they are related.

I am running Cassandra 1.0.8 on a 3 node cluster of EC2 m1.large instances, rep factor 3 (quorum read and write).

Does anyone have a clue what I should be looking for, or how to solve it?

Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media
ta...@tok-media.com
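Eyeballing the correlation between dropped reads and OpsCenter flushes can be done by tallying the dropped-message lines out of system.log. A sketch; the exact log-line layout here is an assumption based only on the "READ messages dropped in last 5000ms" text quoted in this thread, so adjust the regex to your log:

```python
import re

# Sum the dropped-READ counts reported in a Cassandra system.log, to
# correlate them against flush/compaction entries by timestamp.
# Line shape is assumed from the message quoted in the thread.
DROP_RE = re.compile(r"(\d+) READ messages dropped in last 5000ms")

def count_dropped_reads(lines):
    return sum(int(m.group(1)) for line in lines if (m := DROP_RE.search(line)))

sample = [
    "INFO 11:59:57 Completed flushing rollups60-12-Data.db",
    "WARN 12:00:01 177 READ messages dropped in last 5000ms",
]
print(count_dropped_reads(sample))  # 177
```

Bucketing the counts by timestamp next to flush lines would show whether the drops cluster around OpsCenter rollup flushes, as suspected above.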
Re: 1.1.1 is repair still needed ?
On Tue, Oct 9, 2012 at 12:56 PM, Oleg Dulin oleg.du...@gmail.com wrote:

My understanding is that the repair has to happen within the gc_grace period. [ snip ] So the question is, is this still needed? Do we even need to run nodetool repair?

If Hinted Handoff works in your version of Cassandra, and that version is 1.0, you should not need to repair if no node has crashed or been down for longer than max_hint_window_in_ms. This is because after 1.0, any failed write to a remote replica results in a hint, so any DELETE should eventually be fully replicated.

However, hinted handoff is meaningfully broken between 1.1.0 and 1.1.6 (unreleased), so you cannot rely on the above heuristic for consistency. In these versions, you have to repair (or read-repair 100% of keys) once every GCGraceSeconds to prevent the possibility of zombie data.

If it were possible to repair on a per-columnfamily basis, you could get a significant win by only repairing columnfamilies which take DELETE traffic.

https://issues.apache.org/jira/browse/CASSANDRA-4772

=Rob

--
=Robert Coli
AIM/GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
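The "repair within gc_grace" constraint is just arithmetic: every replica must be repaired at least once per gc_grace_seconds, or tombstones can be purged before reaching all replicas and deleted data can resurrect. A sketch of the check, using Cassandra's default of 864000 seconds (10 days) for gc_grace_seconds:

```python
# Sketch: is a given repair cadence safe relative to gc_grace_seconds?
# Tombstones become eligible for purge after gc_grace_seconds, so every
# replica must be repaired within that window to avoid zombie data.
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra default: 10 days

def repair_cadence_is_safe(repair_interval_days: float,
                           gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    return repair_interval_days * 86_400 <= gc_grace_seconds

print(repair_cadence_is_safe(7))   # weekly repair vs 10-day default -> True
print(repair_cadence_is_safe(14))  # fortnightly repair -> False
```

So the weekly repairs mentioned earlier in the thread fit under the default window, but only as long as hinted handoff is not the sole thing being relied on in the broken 1.1.0-1.1.5 range Rob describes.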