Re: Cluster not accepting insert while one node is down
Hi Traian,

There is your problem. You are using RF=1, meaning that each node is responsible for its own range and nothing more. So when a node goes down, do the math: you just can't reach 1/5 of your data. This is very good for performance, since each node owns its own part of the data and any read or write needs to reach only one node, but it makes every node a single point of failure, defeating one of the main points of using C*. So you have poor availability and poor consistency.

A usual configuration with 5 nodes would be RF=3 with CL=QUORUM for both reads and writes. This replicates your data to 2 extra nodes on top of the natural endpoint (3 of the 5 nodes own any given row), and any read or write must reach at least 2 replicas before being considered successful, ensuring strong consistency. This configuration allows you to shut down a node (crash, configuration update, rolling restart) without degrading the service (at least allowing you to reach any data), at the cost of more data on each node.

Alain

2013/2/14 Traian Fratean traian.frat...@gmail.com

I am using defaults for both RF and CL. As the keyspace was created using cassandra-cli, the default RF should be 1, as I get from below:

[default@TestSpace] describe;
Keyspace: TestSpace:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:1]

As for the CL, it is the Astyanax default, which is ONE for both reads and writes.

Traian.

2013/2/13 Alain RODRIGUEZ arodr...@gmail.com

We probably need more info, like the RF of your cluster and the CL of your reads and writes. Maybe you could also tell us whether you use vnodes or not. I heard that Astyanax was not running very smoothly on 1.2.0, but a bit better on 1.2.1. Then again, Netflix hasn't released a version of Astyanax for C* 1.2.

Alain

2013/2/13 Traian Fratean traian.frat...@gmail.com

Hi, I have a cluster of 5 nodes running Cassandra 1.2.0, and a Java client with Astyanax 1.56.21.
When a node (10.60.15.67, *different* from the one in the stack trace below) went down, I got TokenRangeOfflineException and no other data got inserted into *any other* node of the cluster. Am I having a configuration issue, or is this supposed to happen?

com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81) -
com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException()
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255)

Thank you, Traian.
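The RF/CL arithmetic behind the advice in this thread is easy to check. A minimal Python sketch (function names are invented for illustration; this is not Astyanax or Cassandra API code):

```python
def quorum(rf):
    """Replicas that must acknowledge for CL=QUORUM."""
    return rf // 2 + 1

def write_survives(rf, cl_acks, nodes_down):
    """A write at the given CL succeeds iff enough replicas are still up.
    Worst case assumed: every down node is a replica of the key."""
    return rf - nodes_down >= cl_acks

# RF=1, CL=ONE: losing the single replica makes its range unavailable.
assert not write_survives(rf=1, cl_acks=1, nodes_down=1)
# RF=3, CL=QUORUM: one node down still leaves 2 >= quorum(3) replicas up.
assert write_survives(rf=3, cl_acks=quorum(3), nodes_down=1)
```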
[VOTE] Release Mojo's Cassandra Maven Plugin 1.2.1-1
Hi,

I'd like to release version 1.2.1-1 of Mojo's Cassandra Maven Plugin to sync up with the 1.2.1 release of Apache Cassandra.

We solved 1 issue: http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=19089

Staging Repository: https://nexus.codehaus.org/content/repositories/orgcodehausmojo-015/
Site: http://mojo.codehaus.org/cassandra-maven-plugin/index.html
SCM Tag: https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.1-1@17931

[ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse says it looks fine too.
[ ] 0 Mehhh! like I care, I don't have any opinions either, I'd follow somebody else if only I could decide who
[ ] -1 No! wait up there I have issues (in general like, ya know, and being a trouble-maker is only one of them)

The vote is open for 72h and will succeed by lazy consensus.

Guide to testing staged releases: http://maven.apache.org/guides/development/guide-testing-releases.html

Cheers
-Stephen

P.S. In the interest of ensuring (more is) better testing, and as is now tradition for Mojo's Cassandra Maven Plugin, this vote is also open to any subscribers of the dev and user@cassandra.apache.org mailing lists that want to test or use this plugin.
Re: Cluster not accepting insert while one node is down
I will let committers, or anyone with knowledge of Cassandra internals, answer this. From what I understand, you should be able to insert data on any up node with your configuration...

Alain

2013/2/14 Traian Fratean traian.frat...@gmail.com

You're right regarding data availability on that node. And my config, being the default one, is not suited for a cluster. What I don't get is that my .67 node was down and I was trying to insert into the .66 node, as can be seen from the stack trace. Long story short: when node .67 was down I could not insert into any machine in the cluster. Not what I was expecting.

Thank you for the reply!
Traian.

2013/2/14 Alain RODRIGUEZ arodr...@gmail.com

Hi Traian, There is your problem. You are using RF=1, meaning that each node is responsible for its own range and nothing more. [...]
Re: cassandra error: Line 1 = Keyspace names must be case-insensitively unique (usertable conflicts with usertable)
On Thu, Feb 14, 2013 at 1:36 PM, Muntasir Raihan Rahman muntasir.rai...@gmail.com wrote:

Hi, I am trying to run Cassandra on a 10-node cluster. But I keep getting this error: Line 1 = Keyspace names must be case-insensitively unique (usertable conflicts with usertable). I checked the database, and I only have one keyspace named usertable. Can anyone please suggest what's going on?

Thanks Muntasir.

-- Best Regards, Muntasir Raihan Rahman. Email: muntasir.rai...@gmail.com, Phone: 1-217-979-9307. Department of Computer Science, University of Illinois Urbana Champaign, 3111 Siebel Center, 201 N. Goodwin Avenue, Urbana, IL 61801

*When are you getting this error? I mean, during which sort of operation?*

-- Abhijit Chanda +91-974395
Cassandra as a service on Windows
Hi all,

According to http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-windows-service-new-cql-clients-and-more, running cassandra.bat install should make Cassandra run as a service on a Windows box. However I'm getting the following when I try:

C:\apache-cassandra-1.2.1\bin>cassandra.bat install
trying to delete service if it has been created already
The system cannot find the path specified.
Installing cassandra. If you get registry warnings, re-run as an Administrator
The system cannot find the path specified.
Setting the parameters for cassandra
The system cannot find the path specified.
Installation of cassandra is complete

No service is installed. Digging into cassandra.bat I notice:

:doInstallOperation
set SERVICE_JVM=cassandra
rem location of Prunsrv
set PATH_PRUNSRV=%CASSANDRA_HOME%\bin\daemon\
set PR_LOGPATH=%PATH_PRUNSRV%

\bin\daemon does not exist. Is it a bug that it's missing?

Andy

The University of Dundee is a registered Scottish Charity, No: SC015096
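The :doInstallOperation snippet quoted above points PATH_PRUNSRV at bin\daemon, which is where Apache Commons Daemon's prunsrv.exe service wrapper would live. If the tarball ships without that folder, an untested workaround sketch (commons-daemon download and version are left to the reader) would be:

```bat
REM Workaround sketch, not verified: create the folder cassandra.bat expects
REM and place the Commons Daemon service wrapper (prunsrv.exe) in it.
mkdir "%CASSANDRA_HOME%\bin\daemon"
REM ...extract prunsrv.exe from the Apache Commons Daemon binary release, then:
copy prunsrv.exe "%CASSANDRA_HOME%\bin\daemon\"
cassandra.bat install
```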
Re: Write performance expectations...
Using multithreading, inserting 2000 rows per thread, resulted in no throughput increase. Each thread is taking about 4 seconds per batch, indicating a bottleneck elsewhere.

Ken

----- Original Message -----
From: Tyler Hobbs ty...@datastax.com
To: user@cassandra.apache.org
Sent: Wednesday, February 13, 2013 11:06:30 AM
Subject: Re: Write performance expectations...

2500 inserts per second is about what a single Python thread using pycassa can do against a local node. Are you using multiple threads for the inserts? Multiple processes?

On Wed, Feb 13, 2013 at 8:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

Is there a particular reason for you to use EBS? Instance store is recommended because it improves performance by reducing I/O throttling.

Another thing you should be aware of: replicating the data to all nodes reduces your performance; it is more or less as if you had only one node (performance-wise, I mean). Also, writing to different datacenters probably induces some network latency.

You should give the EC2 instance type (m1.xlarge / m1.large / ...) if you want some feedback about the 2500 w/s, and also give the mean size of your rows.

Alain

2013/2/13 ka...@comcast.net

Hello, New member here, and I have (yet another) question on write performance. I'm using Apache Cassandra 1.1, Python 2.7 and pycassa 1.7. I have a cluster of 2 datacenters, each with 3 nodes, on AWS EC2, using EBS and the RandomPartitioner. I'm writing to a column family in a keyspace that's replicated to all nodes in both datacenters, with a consistency level of LOCAL_QUORUM. I'm seeing write performance of around 2500 rows per second. Is this in the ballpark for this kind of configuration? Thanks in advance.

Ken

-- Tyler Hobbs DataStax
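The plateau Ken reports can be sanity-checked with back-of-envelope arithmetic. A sketch in Python (function name invented; the numbers are taken from this thread):

```python
def rows_per_second(threads, batch_size, seconds_per_batch):
    """Aggregate insert rate, assuming threads scale linearly
    (i.e. no bottleneck shared between them)."""
    return threads * batch_size / seconds_per_batch

# One thread pushing 2000 rows in ~4 s is ~500 rows/s.
assert rows_per_second(1, 2000, 4.0) == 500.0

# Linear scaling would predict 8 threads -> 4000 rows/s:
assert rows_per_second(8, 2000, 4.0) == 4000.0
# A flat ~2500 rows/s regardless of thread count instead points at a shared
# bottleneck (client instance size, EBS I/O, network) rather than per-row cost.
```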
Re: Write performance expectations...
It could be the instances are I/O limited. I've been running benchmarks with Cassandra 1.1.9 for the last 2 weeks on an AMD FX 8-core with 32 GB of RAM. With 24 threads I get roughly 20K inserts per second; each insert is only about 100-150 bytes.

On Thu, Feb 14, 2013 at 8:07 AM, ka...@comcast.net wrote:

Using multithreading, inserting 2000 rows per thread, resulted in no throughput increase. Each thread is taking about 4 seconds per batch, indicating a bottleneck elsewhere. Ken [...]
Unbalanced ring after upgrade!
Hello,

We just upgraded from 1.1.2 to 1.1.9. We use the ByteOrderedPartitioner (we generate our own hashes) and have not yet upgraded sstables. Before the upgrade we had a balanced ring. After the upgrade, we see:

10.0.4.22   us-east  1a  Up  Normal  77.66 GB   0.04%  Token(bytes[0001])
10.0.10.23  us-east  1d  Up  Normal  82.74 GB   0.04%  Token(bytes[1555])
10.0.8.20   us-east  1c  Up  Normal  81.79 GB   0.04%  Token(bytes[2aaa])
10.0.4.23   us-east  1a  Up  Normal  82.66 GB  33.84%  Token(bytes[4000])
10.0.10.20  us-east  1d  Up  Normal  80.21 GB  67.51%  Token(bytes[5554])
10.0.8.23   us-east  1c  Up  Normal  77.12 GB  99.89%  Token(bytes[6aac])
10.0.4.21   us-east  1a  Up  Normal  81.38 GB  66.09%  Token(bytes[8000])
10.0.10.24  us-east  1d  Up  Normal  83.43 GB  32.41%  Token(bytes[9558])
10.0.8.21   us-east  1c  Up  Normal  84.42 GB   0.04%  Token(bytes[aaa8])
10.0.4.25   us-east  1a  Up  Normal  80.06 GB   0.04%  Token(bytes[c000])
10.0.10.21  us-east  1d  Up  Normal  83.57 GB   0.04%  Token(bytes[d558])
10.0.8.24   us-east  1c  Up  Normal  90.74 GB   0.04%  Token(bytes[eaa8])

Restarting a node essentially changes who owns 99% of the ring. Given that we use an RF of 3 and LOCAL_QUORUM consistency for everything, and we are not seeing errors, something seems to be working correctly. Any idea what is going on above? Should I be alarmed?

-Mike
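For what it's worth, the tokens in that output look evenly spaced. Treating each bytes[...] value as a 16-bit number on a ring, balanced ownership is easy to check with a sketch (this is illustrative arithmetic only; nodetool's effective-ownership computation may differ, and the even load per node also suggests the odd percentages are a reporting artifact rather than real skew):

```python
def ownership(tokens, ring_size=2**16):
    """Fraction of the ring each token owns: the arc from its predecessor
    (clockwise) up to itself, as in a standard token ring."""
    toks = sorted(tokens)
    return {t: ((t - toks[i - 1]) % ring_size) / ring_size
            for i, t in enumerate(toks)}

tokens = [0x0001, 0x1555, 0x2AAA, 0x4000, 0x5554, 0x6AAC,
          0x8000, 0x9558, 0xAAA8, 0xC000, 0xD558, 0xEAA8]
own = ownership(tokens)

# Each of the 12 nodes covers roughly 1/12 of the ring (~8.3%), so the
# 0.04% / 99.89% figures in the nodetool output do not reflect token spacing.
assert all(abs(v - 1 / 12) < 0.01 for v in own.values())
```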
Mutation dropped
Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a lot of "mutation dropped" messages. I understand this is due to the replica not being written to the other node? RF=2, CL=ONE. From the wiki: "For MUTATION messages this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by Read Repair or Anti Entropy Repair."

Thanks, Kanwar
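The wiki text quoted above implies this coordinator-side arithmetic: with RF=2 and CL=ONE, the client sees success once one replica acks, even if the second replica drops its mutation under load. A toy sketch (names invented):

```python
def write_outcome(rf, cl_acks, replicas_that_applied):
    """What the client sees vs. what repair will later have to fix."""
    client_success = replicas_that_applied >= cl_acks
    needing_repair = rf - replicas_that_applied
    return client_success, needing_repair

# RF=2, CL=ONE, one replica dropped its mutation: the write still succeeds,
# and one replica is left for read repair / anti-entropy repair to fix.
ok, pending = write_outcome(rf=2, cl_acks=1, replicas_that_applied=1)
assert ok and pending == 1
```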
Re: Cassandra 1.2.1 key cache error
Those are good suggestions, guys. I'm using Java 7 and this is my first install of C*, so it looks like it might be genuine. From what I understand this is a minor issue that doesn't affect functionality, correct? If not, I should probably download a previous version of C* or build my own... Have filed a JIRA here: https://issues.apache.org/jira/browse/CASSANDRA-5253

Thanks Ahmed

On 12 February 2013 23:49, Edward Capriolo edlinuxg...@gmail.com wrote:

It can also happen if you have an older / non-Sun JVM.

On Tuesday, February 12, 2013, aaron morton aa...@thelastpickle.com wrote:

This looks like a bug in 1.2 beta: https://issues.apache.org/jira/browse/CASSANDRA-4553 Can you confirm you are running 1.2.1? If you can re-create this with a clean install, please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA

Thanks - Aaron Morton, Freelance Cassandra Developer, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 13/02/2013, at 1:22 AM, Ahmed Guecioueur ahme...@gmail.com wrote:

Hi, I am currently evaluating Cassandra on a single node. Running the node seems fine: it responds to Thrift (via Hector) and CQL3 requests to create/delete keyspaces. I have not yet tested any data operations. However, I get the following each time the node is started.
This is using the latest production jars (v1.2.1) downloaded from the Apache website:

INFO [main] 2013-02-07 19:48:55,610 AutoSavingCache.java (line 139) reading saved cache C:\Cassandra\saved_caches\system-local-KeyCache-b.db
WARN [main] 2013-02-07 19:48:55,614 AutoSavingCache.java (line 160) error reading saved cache C:\Cassandra\saved_caches\system-local-KeyCache-b.db
java.io.EOFException
    at java.io.DataInputStream.readInt(Unknown Source)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:349)
    at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:378)
    at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:144)
    at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:277)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:364)
    at org.apache.cassandra.db.Table.initCf(Table.java:337)
    at org.apache.cassandra.db.Table.init(Table.java:280)
    at org.apache.cassandra.db.Table.open(Table.java:110)
    at org.apache.cassandra.db.Table.open(Table.java:88)
    at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:421)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:177)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:370)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:413)
INFO [SSTableBatchOpen:1] 2013-02-07 19:48:56,212 SSTableReader.java (line 164) Opening C:\Cassandra\data\system_auth\users\system_auth-users-ib-1 (72 bytes)
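The EOFException comes from readWithLength hitting the end of a truncated saved-cache file, after which the cache is simply discarded and rebuilt. A Python analogue of that length-prefixed read (hypothetical helper, modeled on but not the actual Cassandra code):

```python
import io
import struct

def read_with_length(stream):
    """Read a 4-byte big-endian length prefix, then that many payload bytes.
    Raises EOFError if the stream ends early, like DataInputStream.readInt."""
    prefix = stream.read(4)
    if len(prefix) < 4:
        raise EOFError("truncated length prefix")
    (n,) = struct.unpack(">i", prefix)
    data = stream.read(n)
    if len(data) < n:
        raise EOFError("truncated payload")
    return data

# A complete record reads fine:
good = io.BytesIO(struct.pack(">i", 3) + b"key")
assert read_with_length(good) == b"key"

# A file cut short mid-record raises EOFError -- the benign warning above:
truncated = io.BytesIO(struct.pack(">i", 3) + b"k")
try:
    read_with_length(truncated)
    assert False, "expected EOFError"
except EOFError:
    pass
```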
Re: Write performance expectations...
Alain, I found out that the client node is an m1.small, and the Cassandra nodes are m1.large. This is what is contained in each row: {dev1-dc1r-redir-0.unica.net/B9tk: {batchID: 2486272}}. Not a whole lot of data.

If you don't use EBS, how is data persistence then maintained in the event that an instance goes down for whatever reason?

Ken

----- Original Message -----
From: Alain RODRIGUEZ arodr...@gmail.com
To: user@cassandra.apache.org
Sent: Thursday, February 14, 2013 8:34:06 AM
Subject: Re: Write performance expectations...

Hi Ken, You really should take a look at my first answer... and give us more information on the size of your inserts and, at least, the type of EC2 instance you are using. You should also consider using instance store and not EBS. Well, look at all the things I already told you.

Alain

2013/2/14 Peter Lin wool...@gmail.com

It could be the instances are I/O limited. I've been running benchmarks with Cassandra 1.1.9 for the last 2 weeks on an AMD FX 8-core with 32 GB of RAM. With 24 threads I get roughly 20K inserts per second; each insert is only about 100-150 bytes. [...]
Re: Write performance expectations...
An m1.small will probably be unable to maximize throughput against your m1.large cluster.

"If you don't use EBS, how is data persistence then maintained in the event that an instance goes down for whatever reason?"

You answered this yourself earlier in this thread: "I'm writing to a column family in a keyspace that's replicated to all nodes in both datacenters". So if one of your nodes goes down for any reason, you'll have to bootstrap a new one to replace the dead node, which will take its data from the remaining replicas. By using EBS you're in the first anti-pattern listed here: http://www.datastax.com/docs/1.1/cluster_architecture/anti_patterns

Alain

2013/2/14 ka...@comcast.net

Alain, I found out that the client node is an m1.small, and the Cassandra nodes are m1.large. This is what is contained in each row: {dev1-dc1r-redir-0.unica.net/B9tk: {batchID: 2486272}}. Not a whole lot of data. If you don't use EBS, how is data persistence then maintained in the event that an instance goes down for whatever reason? Ken [...]
Re: Size Tiered - Leveled Compaction
I second these questions: we've been looking into changing some of our CFs to use leveled compaction as well. If anybody here has the wisdom to answer them it would be of wonderful help.

Thanks Charles

On Wed, Feb 13, 2013 at 7:50 AM, Mike mthero...@yahoo.com wrote:

Hello, I'm investigating the transition of some of our column families from size-tiered to leveled compaction. I believe we have some high-read-load column families that would benefit tremendously.

I've stood up a test DB node to investigate the transition. I successfully alter the column family, and I immediately notice a large number (1000+) of pending compaction tasks appear, but no compaction gets executed. I tried running nodetool sstableupgrade on the column family, and the compaction tasks don't move. I also notice no changes to the size and distribution of the existing SSTables. I then run a major compaction on the column family. All pending compaction tasks get run, and the SSTables have a distribution that I would expect from leveled compaction (lots and lots of 10MB files).

A couple of questions:
1) Is a major compaction required to transition from size-tiered to leveled compaction?
2) Are major compactions as much of a concern for leveled compaction as they are for size-tiered?

All the documentation I found about transitioning from size-tiered to leveled compaction discusses the ALTER TABLE CQL command, but I haven't found much on what else needs to be done after the schema change. I ran these tests with Cassandra 1.1.9.

Thanks, -Mike
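For reference, the ALTER TABLE command Mike mentions looked roughly like the following on the 1.1 line. This is a sketch: 'MyCF' and the size are placeholders, the exact option spelling varied between releases (the map-style compaction = {...} syntax arrived later), so verify against the documentation for your version:

```sql
-- 1.1-era CQL sketch; placeholder names, verify option spelling per release:
ALTER TABLE MyCF
  WITH compaction_strategy_class = 'LeveledCompactionStrategy'
  AND compaction_strategy_options:sstable_size_in_mb = 10;
```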
Re: Cluster not accepting insert while one node is down
Generally data isn't written to whatever node the client connects to. In your case, a row is written to one of the nodes based on the hash of the row key. If that one replica node is down, it won't matter which coordinator node you attempt a write through with CL.ONE: the write will fail. If you want the write to succeed, you could do any one of: write with CL.ANY, increase RF to 2+, or write using a row key that hashes to an UP node.

-Bryan

On Thu, Feb 14, 2013 at 2:06 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

I will let committers, or anyone with knowledge of Cassandra internals, answer this. From what I understand, you should be able to insert data on any up node with your configuration...

Alain

2013/2/14 Traian Fratean traian.frat...@gmail.com

You're right regarding data availability on that node. And my config, being the default one, is not suited for a cluster. What I don't get is that my .67 node was down and I was trying to insert into the .66 node, as can be seen from the stack trace. Long story short: when node .67 was down I could not insert into any machine in the cluster. Not what I was expecting.

Thank you for the reply!
Traian. [...]
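Bryan's point, that replica placement depends only on the row key's hash and not on which node the client connected to, can be illustrated with a toy ring. This is a sketch only: the byte-sum "hash" and the 0..99 ring stand in for Cassandra's real partitioner, and all names are invented:

```python
def replicas_for(key, ring, rf=1):
    """ring: sorted (token, node) pairs on a toy 0..99 ring. Walk clockwise
    from the key's token, taking rf successive nodes, as a token ring does.
    A deterministic byte-sum stands in for the real partitioner's hash."""
    token = sum(key.encode()) % 100
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

ring = [(20, "A"), (40, "B"), (60, "C"), (80, "D"), (100, "E")]

# "row1" hashes to token 93, owned by E; with RF=3 the next nodes follow.
assert replicas_for("row1", ring, rf=1) == ["E"]
assert replicas_for("row1", ring, rf=3) == ["E", "A", "B"]

# RF=1: if E is down, the write fails no matter which coordinator is used,
# because the replica set depends only on the key.
down = {"E"}
assert all(n in down for n in replicas_for("row1", ring, rf=1))
# RF=3, CL=ONE: two replicas remain up, so the same write would succeed.
assert any(n not in down for n in replicas_for("row1", ring, rf=3))
```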
Re: subscribe request
This is new. On Thu, Feb 14, 2013 at 9:24 AM, Muntasir Raihan Rahman muntasir.rai...@gmail.com wrote: -- Best Regards Muntasir Raihan Rahman Email: muntasir.rai...@gmail.com Phone: 1-217-979-9307 Department of Computer Science, University of Illinois Urbana Champaign, 3111 Siebel Center, 201 N. Goodwin Avenue, Urbana, IL 61801 -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: subscribe request
i was hoping for a rick roll. On 14 February 2013 16:55, Eric Evans eev...@acunu.com wrote: This is new. On Thu, Feb 14, 2013 at 9:24 AM, Muntasir Raihan Rahman muntasir.rai...@gmail.com wrote: -- Best Regards Muntasir Raihan Rahman Email: muntasir.rai...@gmail.com Phone: 1-217-979-9307 Department of Computer Science, University of Illinois Urbana Champaign, 3111 Siebel Center, 201 N. Goodwin Avenue, Urbana, IL 61801 -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: subscribe request
I apologize for this silly mistake! Thanks Muntasir. On Thu, Feb 14, 2013 at 11:01 AM, Andy Twigg andy.tw...@gmail.com wrote: i was hoping for a rick roll. On 14 February 2013 16:55, Eric Evans eev...@acunu.com wrote: This is new. On Thu, Feb 14, 2013 at 9:24 AM, Muntasir Raihan Rahman muntasir.rai...@gmail.com wrote: -- Best Regards Muntasir Raihan Rahman Email: muntasir.rai...@gmail.com Phone: 1-217-979-9307 Department of Computer Science, University of Illinois Urbana Champaign, 3111 Siebel Center, 201 N. Goodwin Avenue, Urbana, IL 61801 -- Eric Evans Acunu | http://www.acunu.com | @acunu -- Best Regards Muntasir Raihan Rahman Email: muntasir.rai...@gmail.com Phone: 1-217-979-9307 Department of Computer Science, University of Illinois Urbana Champaign, 3111 Siebel Center, 201 N. Goodwin Avenue, Urbana, IL 61801
Re: Upgrade to Cassandra 1.2
Thanks Aaron and Manu. Since we are using 1.1, there is no num_tokens parameter. When I upgrade to 1.2, should I set num_tokens=1 to start up, or can I set it to other numbers?

Daning

On Tue, Feb 12, 2013 at 3:45 PM, Manu Zhang owenzhang1...@gmail.com wrote:

"num_tokens is only used at bootstrap"

I think it's also used in this case (already bootstrapped with num_tokens = 1 and now num_tokens > 1). Cassandra will split a node's current range into *num_tokens* parts, and there should be no change to the amount of the ring a node holds before shuffling.

On Wed, Feb 13, 2013 at 3:12 AM, aaron morton aa...@thelastpickle.com wrote:

Restore the settings for num_tokens and initial_token to what they were before you upgraded. They should not be changed just because you are upgrading to 1.2; they are used to enable virtual nodes, which are not necessary to run 1.2.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 8:02 AM, Daning Wang dan...@netseer.com wrote:

No, I did not run shuffle, since the upgrade was not successful. What do you mean by reverting the changes to num_tokens and initial_token? Set num_tokens=1? initial_token should be ignored since it is not bootstrapping, right?

Thanks, Daning

On Tue, Feb 12, 2013 at 10:52 AM, aaron morton aa...@thelastpickle.com wrote:

Were you upgrading to 1.2 AND running the shuffle, or just upgrading to 1.2? If you have not run shuffle I would suggest reverting the changes to num_tokens and initial_token. This is a guess, because num_tokens is only used at bootstrap. Just get upgraded to 1.2 first, then do the shuffle when things are stable.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 12/02/2013, at 2:55 PM, Daning Wang dan...@netseer.com wrote:

Thanks Aaron. I tried to migrate an existing cluster (ver 1.1.0) to 1.2.1 but failed.
- I followed http://www.datastax.com/docs/1.2/install/upgrading and merged cassandra.yaml with the following parameters:

num_tokens: 256
#initial_token: 0

The initial_token is commented out; the current token should be obtained from the system schema.

- I did a rolling upgrade. During the upgrade I got Broken Pipe errors from the nodes with the old version; is that normal?

- After I upgraded 3 nodes (still 5 to go), I found it is totally wrong: the first node upgraded owns 99.2% of the ring.

[cassy@d5:/usr/local/cassy conf]$ ~/bin/nodetool -h localhost status
Datacenter: datacenter1
===
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
DN  10.210.101.117  45.01 GB  254     99.2%  f4b6afe3-7e2e-4c61-96e8-12a529a31373  rack1
UN  10.210.101.120  45.43 GB  256     0.4%   0fd912fb-3187-462b-8c8a-7d223751b649  rack1
UN  10.210.101.111  27.08 GB  256     0.4%   bd4c37bc-07dd-488b-bfab-e74e32c26f6e  rack1

What was wrong? Please help. I can provide more information if you need.

Thanks, Daning

On Mon, Feb 4, 2013 at 9:16 AM, aaron morton aa...@thelastpickle.com wrote:

There is a command line utility in 1.2 to shuffle the tokens…
http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

$ ./cassandra-shuffle --help
Missing sub-command argument.
Usage: shuffle [options] <sub-command>

Sub-commands:
 create           Initialize a new shuffle operation
 ls               List pending relocations
 clear            Clear pending relocations
 en[able]         Enable shuffling
 dis[able]        Disable shuffling

Options:
 -dc, --only-dc        Apply only to named DC (create only)
 -tp, --thrift-port    Thrift port number (Default: 9160)
 -p, --port            JMX port number (Default: 7199)
 -tf, --thrift-framed  Enable framed transport for Thrift (Default: false)
 -en, --and-enable     Immediately enable shuffling (create only)
 -H, --help            Print help information
 -h, --host            JMX hostname or IP address (Default: localhost)
 -th, --thrift-host    Thrift hostname or IP address (Default: JMX host)

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 3/02/2013, at 11:32 PM, Manu Zhang owenzhang1...@gmail.com wrote:

On Sun 03 Feb 2013 05:45:56 AM CST, Daning Wang wrote:

I'd like to upgrade from 1.1.6 to 1.2.1. One big feature in 1.2 is that it can have multiple tokens on one node, but there is only one token in 1.1.6. How can I upgrade to 1.2.1 and then break up the token to take advantage of this feature? I went through this doc but it does not say how to change num_tokens: http://www.datastax.com/docs/1.2/install/upgrading Is there another doc about this upgrade path?

Thanks, Daning

I think for each node you need to change num_tokens
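Manu's description of raising num_tokens on an already-bootstrapped node (the node's single range is split into num_tokens contiguous parts, with total ownership unchanged until a shuffle) can be sketched like this (illustrative Python, not Cassandra's actual migration code; the 0..2**127 range matches RandomPartitioner):

```python
# Illustrative sketch: splitting one node's token range into num_tokens
# contiguous parts. The node owns the same total range before and after,
# which is why a shuffle is still needed to spread vnodes across the cluster.

def split_range(start, end, num_tokens):
    """Split the token range (start, end] into num_tokens contiguous parts."""
    width = (end - start) // num_tokens
    parts = []
    for i in range(num_tokens):
        hi = end if i == num_tokens - 1 else start + (i + 1) * width
        parts.append((start + i * width, hi))
    return parts

parts = split_range(0, 2**127, 256)
assert len(parts) == 256
# Contiguous parts, and total ownership is unchanged.
assert all(parts[i][1] == parts[i + 1][0] for i in range(255))
assert parts[0][0] == 0 and parts[-1][1] == 2**127
```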
Re: Upgrade to Cassandra 1.2
From http://www.datastax.com/docs/1.2/configuration/node_configuration#num-tokens, about num_tokens: "If left unspecified, Cassandra uses the default value of 1 token (for legacy compatibility) and uses the initial_token. If you already have a cluster with one token per node, and wish to migrate to multiple tokens per node." So I would leave num_tokens commented out in cassandra.yaml and would set initial_token to the same value as in the pre-C* 1.2.x upgrade configuration.

Alain

2013/2/14 Daning Wang dan...@netseer.com

Thanks Aaron and Manu. Since we are using 1.1, there is no num_tokens parameter. When I upgrade to 1.2, should I set num_tokens=1 to start up, or can I set it to other numbers?

Daning

On Tue, Feb 12, 2013 at 3:45 PM, Manu Zhang owenzhang1...@gmail.com wrote:

"num_tokens is only used at bootstrap"

I think it's also used in this case (already bootstrapped with num_tokens = 1 and now num_tokens > 1). Cassandra will split a node's current range into *num_tokens* parts, and there should be no change to the amount of the ring a node holds before shuffling.

On Wed, Feb 13, 2013 at 3:12 AM, aaron morton aa...@thelastpickle.com wrote:

Restore the settings for num_tokens and initial_token to what they were before you upgraded. They should not be changed just because you are upgrading to 1.2; they are used to enable virtual nodes, which are not necessary to run 1.2.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 8:02 AM, Daning Wang dan...@netseer.com wrote:

No, I did not run shuffle, since the upgrade was not successful. What do you mean by reverting the changes to num_tokens and initial_token? Set num_tokens=1? initial_token should be ignored since it is not bootstrapping, right?

Thanks, Daning

On Tue, Feb 12, 2013 at 10:52 AM, aaron morton aa...@thelastpickle.com wrote:

Were you upgrading to 1.2 AND running the shuffle, or just upgrading to 1.2?
If you have not run shuffle I would suggest reverting the changes to num_tokens and initial_token. This is a guess, because num_tokens is only used at bootstrap. Just get upgraded to 1.2 first, then do the shuffle when things are stable.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 12/02/2013, at 2:55 PM, Daning Wang dan...@netseer.com wrote:

Thanks Aaron. I tried to migrate an existing cluster (ver 1.1.0) to 1.2.1 but failed.

- I followed http://www.datastax.com/docs/1.2/install/upgrading and merged cassandra.yaml with the following parameters:

num_tokens: 256
#initial_token: 0

The initial_token is commented out; the current token should be obtained from the system schema.

- I did a rolling upgrade. During the upgrade I got Broken Pipe errors from the nodes with the old version; is that normal?

- After I upgraded 3 nodes (still 5 to go), I found it is totally wrong: the first node upgraded owns 99.2% of the ring.

[cassy@d5:/usr/local/cassy conf]$ ~/bin/nodetool -h localhost status
Datacenter: datacenter1
===
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
DN  10.210.101.117  45.01 GB  254     99.2%  f4b6afe3-7e2e-4c61-96e8-12a529a31373  rack1
UN  10.210.101.120  45.43 GB  256     0.4%   0fd912fb-3187-462b-8c8a-7d223751b649  rack1
UN  10.210.101.111  27.08 GB  256     0.4%   bd4c37bc-07dd-488b-bfab-e74e32c26f6e  rack1

What was wrong? Please help. I can provide more information if you need.

Thanks, Daning

On Mon, Feb 4, 2013 at 9:16 AM, aaron morton aa...@thelastpickle.com wrote:

There is a command line utility in 1.2 to shuffle the tokens…
http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

$ ./cassandra-shuffle --help
Missing sub-command argument.
Usage: shuffle [options] <sub-command>

Sub-commands:
 create           Initialize a new shuffle operation
 ls               List pending relocations
 clear            Clear pending relocations
 en[able]         Enable shuffling
 dis[able]        Disable shuffling

Options:
 -dc, --only-dc        Apply only to named DC (create only)
 -tp, --thrift-port    Thrift port number (Default: 9160)
 -p, --port            JMX port number (Default: 7199)
 -tf, --thrift-framed  Enable framed transport for Thrift (Default: false)
 -en, --and-enable     Immediately enable shuffling (create only)
 -H, --help            Print help information
 -h, --host            JMX hostname or IP address (Default: localhost)
 -th, --thrift-host    Thrift hostname or IP address (Default: JMX host)

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 3/02/2013, at 11:32 PM, Manu Zhang owenzhang1...@gmail.com wrote:

On Sun 03
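Alain's suggestion in this thread (leave num_tokens commented out, keep initial_token) would look roughly like this in cassandra.yaml for an upgraded node. The token value below is just an example; each node keeps whatever token it held before 1.2:

```yaml
# cassandra.yaml on a node upgraded from 1.1 (example values):
# leave vnodes off during the upgrade; enable them later, once stable.
# num_tokens: 256
initial_token: 0   # the same token this node used before the upgrade
```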
Re: Cluster not accepting insert while one node is down
From the exception, it looks like Astyanax didn't even try to call Cassandra. My guess would be that Astyanax is token aware: it detects that the node is down and doesn't even try. If you used Hector, it might attempt the write since it's not token aware, but as Bryan said, it would eventually fail. I guess hinted handoff won't help, since the write doesn't satisfy CL.ONE.

From: Bryan Talbot btal...@aeriagames.com
To: user@cassandra.apache.org
Sent: Thursday, February 14, 2013 8:30 AM
Subject: Re: Cluster not accepting insert while one node is down

Generally data isn't written to whatever node the client connects to. In your case, a row is written to one of the nodes based on the hash of the row key. If that one replica node is down, it won't matter which coordinator node you attempt a write with CL.ONE: the write will fail. If you want the write to succeed, you could do any one of: write with CL.ANY, increase RF to 2+, or write using a row key that hashes to an UP node.

-Bryan

On Thu, Feb 14, 2013 at 2:06 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

I will let committers or anyone who has knowledge of Cassandra internals answer this. From what I understand, you should be able to insert data on any up node with your configuration...

Alain

2013/2/14 Traian Fratean traian.frat...@gmail.com

You're right regarding data availability on that node. And my config, being the default one, is not suited for a cluster. What I don't get is that my 67 node was down and I was trying to insert into the 66 node, as can be seen from the stacktrace. Long story short: when node 67 was down I could not insert into any machine in the cluster. Not what I was expecting. Thank you for the reply!

Traian.

2013/2/14 Alain RODRIGUEZ arodr...@gmail.com

Hi Traian, There is your problem. You are using RF=1, meaning that each node is responsible for its range, and nothing more. So when a node goes down, do the math: you just can't read 1/5 of your data.
This is very cool for performance, since each node owns its own part of the data and any write or read needs to reach only one node, but it turns every node into a SPOF, and removing SPOFs is a main point of using C*. So you have poor availability and poor consistency. A usual configuration with 5 nodes would be RF=3 and both CL (R/W) = QUORUM. This replicates your data to the natural endpoint plus 2 more nodes (a total of 3 of the 5 nodes own any given row), and any read or write needs to reach at least 2 nodes before being considered successful, ensuring strong consistency. This configuration allows you to shut down a node (crash or configuration update/rolling restart) without degrading the service (at least allowing you to reach any data), but at the cost of more data on each node.

Alain

2013/2/14 Traian Fratean traian.frat...@gmail.com

I am using defaults for both RF and CL. As the keyspace was created using cassandra-cli, the default RF should be 1, as I get from below:

[default@TestSpace] describe;
Keyspace: TestSpace:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [datacenter1:1]

As for the CL, it is the Astyanax default, which is 1 for both reads and writes.

Traian.

2013/2/13 Alain RODRIGUEZ arodr...@gmail.com

We probably need more info, like the RF of your cluster and the CL of your reads and writes. Maybe you could also tell us whether you use vnodes or not. I heard that Astyanax was not running very smoothly on 1.2.0, but a bit better on 1.2.1. Yet, Netflix didn't release a version of Astyanax for C* 1.2.

Alain

2013/2/13 Traian Fratean traian.frat...@gmail.com

Hi, I have a cluster of 5 nodes running Cassandra 1.2.0. I have a Java client with Astyanax 1.56.21. When a node (10.60.15.67, different from the one in the stacktrace below) went down, I got a TokenRangeOfflineException and no other data got inserted into any other node of the cluster. Am I having a configuration issue, or is this supposed to happen?
com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81) - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1] UnavailableException()
com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1] UnavailableException()
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140)
    at
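Wei's point about token-aware clients and Bryan's point that the coordinator doesn't matter come down to the same arithmetic: with RF=1, a row key hashes to exactly one replica. A small illustrative Python model (the hashing and ring layout are simplified stand-ins for Cassandra's partitioner; RandomPartitioner really hashes row keys with MD5; the addresses are from the thread):

```python
# Why, with RF=1, a write fails no matter which coordinator the client
# contacts: the row key hashes to exactly one replica. Simplified model,
# not Cassandra code.
import hashlib

ring = ["10.60.15.65", "10.60.15.66", "10.60.15.67", "10.60.15.68", "10.60.15.69"]
down = {"10.60.15.67"}

def replicas(row_key, rf=1):
    """rf consecutive ring nodes, starting at the key's token position."""
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    first = token % len(ring)
    return [ring[(first + i) % len(ring)] for i in range(rf)]

def write(row_key, rf=1, cl=1):
    """True iff at least cl of the key's replicas are up (else Unavailable)."""
    return sum(n not in down for n in replicas(row_key, rf)) >= cl

# Keys whose single replica is the down node fail on EVERY coordinator.
bad = [k for k in (f"key{i}" for i in range(100)) if replicas(k)[0] in down]
assert bad and not write(bad[0])
# The same keys would survive the outage with RF=3, even at CL.ONE.
assert all(write(k, rf=3, cl=1) for k in bad)
```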
RE: Mutation dropped
Hi - Is there a parameter which can be tuned to prevent the mutations from being dropped? Is this logic correct? Nodes A and B with RF=2, CL=1, load balanced between the two.

--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.x.x.x  746.78 GB  256     100.0%            dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
UN  10.x.x.x  880.77 GB  256     100.0%            95d59054-be99-455f-90d1-f43981d3d778  rack1

Once we hit a very high TPS (around 50k/sec of inserts), the nodes start falling behind and we see the mutation dropped messages, but there are no failures on the client. Does that mean the other node is not able to persist the replicated data? Is there some timeout associated with replicated-data persistence?

Thanks, Kanwar

From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 14 February 2013 09:08
To: user@cassandra.apache.org
Subject: Mutation dropped

Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a lot of mutation dropped messages. I understand that this is due to the replica not being written to the other node? RF = 2, CL = 1. From the wiki: For MUTATION messages this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by Read Repair or Anti Entropy Repair.

Thanks, Kanwar
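The behavior described here (dropped mutations on the server with no client-side failures) follows from the coordinator only waiting for CL acknowledgements. A minimal sketch of that rule (illustrative Python; `coordinator_write` is a made-up name, not a real API):

```python
# Why clients see no failures while "mutation dropped" messages appear:
# the coordinator reports success as soon as CL replicas ack. Replica
# mutations that time out under load are dropped and left to read repair
# or anti-entropy repair.

def coordinator_write(replica_acks, rf=2, cl=1):
    """Client-visible result: (success, mutations left to repair later)."""
    success = replica_acks >= cl
    dropped = rf - replica_acks
    return success, dropped

# One overloaded replica drops its mutation: the client still sees success.
assert coordinator_write(replica_acks=1) == (True, 1)
# Only when fewer than CL replicas ack does the client see an error.
assert coordinator_write(replica_acks=0) == (False, 2)
```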
Re: Size Tiered - Leveled Compaction
I haven't tried to switch compaction strategy; we started with LCS. For us, after massive data imports (5000 w/second for 6 days), the first repair is painful since there is quite some data inconsistency. For 150G nodes, repair brought in about 30G and created thousands of pending compactions. It took almost a day to clear those. Just be prepared: LCS is really slow in 1.1.X. System performance degrades during that time since reads can go to more SSTables; we saw 20 SSTable lookups for one read. (We tried everything we could and couldn't speed it up. I think it's single threaded, and it's not recommended to turn on multithreaded compaction; we even tried that, it didn't help.) There is parallel LCS in 1.2 which is supposed to alleviate the pain. Haven't upgraded yet, hope it works :) http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

Since our cluster is not write intensive (only 100 w/second), I don't see any pending compactions during regular operation. One thing worth mentioning is the size of the SSTable: the default is 5M, which is kind of small for a 200G (all in one CF) data set, and we are on SSD. That is more than 150K files in one directory (200G / 5M = 40K SSTables, and each SSTable creates 4 files on disk). You might want to watch that and decide on the SSTable size.

By the way, there is no concept of major compaction for LCS. Just for fun, you can look at a file called $CFName.json in your data directory; it tells you the SSTable distribution among the different levels.

-Wei

From: Charles Brophy cbro...@zulily.com
To: user@cassandra.apache.org
Sent: Thursday, February 14, 2013 8:29 AM
Subject: Re: Size Tiered - Leveled Compaction

I second these questions: we've been looking into changing some of our CFs to use leveled compaction as well. If anybody here has the wisdom to answer them it would be of wonderful help.
Thanks Charles

On Wed, Feb 13, 2013 at 7:50 AM, Mike mthero...@yahoo.com wrote:

Hello, I'm investigating the transition of some of our column families from Size Tiered to Leveled Compaction. I believe we have some high-read-load column families that would benefit tremendously. I've stood up a test DB node to investigate the transition. I successfully alter the column family, and I immediately notice a large number (1000+) of pending compaction tasks become available, but no compactions get executed. I tried running nodetool sstableupgrade on the column family, and the compaction tasks don't move. I also notice no changes to the size and distribution of the existing SSTables. I then run a major compaction on the column family. All pending compaction tasks get run, and the SSTables have a distribution that I would expect from LeveledCompaction (lots and lots of 10MB files).

Couple of questions:
1) Is a major compaction required to transition from size-tiered to leveled compaction?
2) Are major compactions as much of a concern for LeveledCompaction as they are for Size Tiered?

All the documentation I found concerning transitioning from Size Tiered to Leveled compaction discusses the ALTER TABLE CQL command, but I haven't found much on what else needs to be done after the schema change. I did these tests with Cassandra 1.1.9.

Thanks, -Mike
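Wei's file-count arithmetic for LCS can be sanity-checked in a couple of lines (plain Python; the 4 on-disk components per SSTable is his figure for that Cassandra version, not a universal constant):

```python
# Sanity check of the LCS file-count arithmetic above: a 200 GB column
# family at the default 5 MB sstable_size_in_mb yields ~40K SSTables,
# each of which is several files on disk.

def lcs_file_count(data_gb, sstable_mb=5, files_per_sstable=4):
    sstables = (data_gb * 1024) // sstable_mb
    return sstables, sstables * files_per_sstable

sstables, files = lcs_file_count(200)
assert sstables == 40960    # roughly the "40K SSTables" in the thread
assert files == 163840      # indeed "more than 150K files in one directory"
```

Raising sstable_size_in_mb is the lever the thread suggests: at 50 MB the same data set would be ~4K SSTables instead.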
Re: Size Tiered - Leveled Compaction
BTW, when I say major compaction, I mean running the nodetool compact command (which does a major compaction for Size Tiered Compaction). I didn't see the distribution of SSTables I expected until I ran that command, in the steps I described below.

-Mike

On Feb 14, 2013, at 3:51 PM, Wei Zhu wrote:

I haven't tried to switch compaction strategy; we started with LCS. For us, after massive data imports (5000 w/second for 6 days), the first repair is painful since there is quite some data inconsistency. For 150G nodes, repair brought in about 30G and created thousands of pending compactions. It took almost a day to clear those. Just be prepared: LCS is really slow in 1.1.X. System performance degrades during that time since reads can go to more SSTables; we saw 20 SSTable lookups for one read. (We tried everything we could and couldn't speed it up. I think it's single threaded, and it's not recommended to turn on multithreaded compaction; we even tried that, it didn't help.) There is parallel LCS in 1.2 which is supposed to alleviate the pain. Haven't upgraded yet, hope it works :) http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

Since our cluster is not write intensive (only 100 w/second), I don't see any pending compactions during regular operation. One thing worth mentioning is the size of the SSTable: the default is 5M, which is kind of small for a 200G (all in one CF) data set, and we are on SSD. That is more than 150K files in one directory (200G / 5M = 40K SSTables, and each SSTable creates 4 files on disk). You might want to watch that and decide on the SSTable size. By the way, there is no concept of major compaction for LCS. Just for fun, you can look at a file called $CFName.json in your data directory; it tells you the SSTable distribution among the different levels.
-Wei

From: Charles Brophy cbro...@zulily.com
To: user@cassandra.apache.org
Sent: Thursday, February 14, 2013 8:29 AM
Subject: Re: Size Tiered - Leveled Compaction

I second these questions: we've been looking into changing some of our CFs to use leveled compaction as well. If anybody here has the wisdom to answer them it would be of wonderful help.

Thanks Charles

On Wed, Feb 13, 2013 at 7:50 AM, Mike mthero...@yahoo.com wrote:

Hello, I'm investigating the transition of some of our column families from Size Tiered to Leveled Compaction. I believe we have some high-read-load column families that would benefit tremendously. I've stood up a test DB node to investigate the transition. I successfully alter the column family, and I immediately notice a large number (1000+) of pending compaction tasks become available, but no compactions get executed. I tried running nodetool sstableupgrade on the column family, and the compaction tasks don't move. I also notice no changes to the size and distribution of the existing SSTables. I then run a major compaction on the column family. All pending compaction tasks get run, and the SSTables have a distribution that I would expect from LeveledCompaction (lots and lots of 10MB files).

Couple of questions:
1) Is a major compaction required to transition from size-tiered to leveled compaction?
2) Are major compactions as much of a concern for LeveledCompaction as they are for Size Tiered?

All the documentation I found concerning transitioning from Size Tiered to Leveled compaction discusses the ALTER TABLE CQL command, but I haven't found much on what else needs to be done after the schema change. I did these tests with Cassandra 1.1.9.

Thanks, -Mike
Re: Upgrade to Cassandra 1.2
Thanks! Suppose I can upgrade to 1.2.x with 1 token by commenting out num_tokens; how can I then change to multiple tokens? I could not find a doc clearly stating this.

On Thu, Feb 14, 2013 at 10:54 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

From http://www.datastax.com/docs/1.2/configuration/node_configuration#num-tokens, about num_tokens: "If left unspecified, Cassandra uses the default value of 1 token (for legacy compatibility) and uses the initial_token. If you already have a cluster with one token per node, and wish to migrate to multiple tokens per node." So I would leave num_tokens commented out in cassandra.yaml and would set initial_token to the same value as in the pre-C* 1.2.x upgrade configuration.

Alain

2013/2/14 Daning Wang dan...@netseer.com

Thanks Aaron and Manu. Since we are using 1.1, there is no num_tokens parameter. When I upgrade to 1.2, should I set num_tokens=1 to start up, or can I set it to other numbers?

Daning

On Tue, Feb 12, 2013 at 3:45 PM, Manu Zhang owenzhang1...@gmail.com wrote:

"num_tokens is only used at bootstrap"

I think it's also used in this case (already bootstrapped with num_tokens = 1 and now num_tokens > 1). Cassandra will split a node's current range into *num_tokens* parts, and there should be no change to the amount of the ring a node holds before shuffling.

On Wed, Feb 13, 2013 at 3:12 AM, aaron morton aa...@thelastpickle.com wrote:

Restore the settings for num_tokens and initial_token to what they were before you upgraded. They should not be changed just because you are upgrading to 1.2; they are used to enable virtual nodes, which are not necessary to run 1.2.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 8:02 AM, Daning Wang dan...@netseer.com wrote:

No, I did not run shuffle, since the upgrade was not successful. What do you mean by reverting the changes to num_tokens and initial_token? Set num_tokens=1?
initial_token should be ignored since it is not bootstrapping, right?

Thanks, Daning

On Tue, Feb 12, 2013 at 10:52 AM, aaron morton aa...@thelastpickle.com wrote:

Were you upgrading to 1.2 AND running the shuffle, or just upgrading to 1.2? If you have not run shuffle I would suggest reverting the changes to num_tokens and initial_token. This is a guess, because num_tokens is only used at bootstrap. Just get upgraded to 1.2 first, then do the shuffle when things are stable.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 12/02/2013, at 2:55 PM, Daning Wang dan...@netseer.com wrote:

Thanks Aaron. I tried to migrate an existing cluster (ver 1.1.0) to 1.2.1 but failed.

- I followed http://www.datastax.com/docs/1.2/install/upgrading and merged cassandra.yaml with the following parameters:

num_tokens: 256
#initial_token: 0

The initial_token is commented out; the current token should be obtained from the system schema.

- I did a rolling upgrade. During the upgrade I got Broken Pipe errors from the nodes with the old version; is that normal?

- After I upgraded 3 nodes (still 5 to go), I found it is totally wrong: the first node upgraded owns 99.2% of the ring.

[cassy@d5:/usr/local/cassy conf]$ ~/bin/nodetool -h localhost status
Datacenter: datacenter1
===
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
DN  10.210.101.117  45.01 GB  254     99.2%  f4b6afe3-7e2e-4c61-96e8-12a529a31373  rack1
UN  10.210.101.120  45.43 GB  256     0.4%   0fd912fb-3187-462b-8c8a-7d223751b649  rack1
UN  10.210.101.111  27.08 GB  256     0.4%   bd4c37bc-07dd-488b-bfab-e74e32c26f6e  rack1

What was wrong? Please help. I can provide more information if you need.

Thanks, Daning

On Mon, Feb 4, 2013 at 9:16 AM, aaron morton aa...@thelastpickle.com wrote:

There is a command line utility in 1.2 to shuffle the tokens…
http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

$ ./cassandra-shuffle --help
Missing sub-command argument.
Usage: shuffle [options] <sub-command>

Sub-commands:
 create           Initialize a new shuffle operation
 ls               List pending relocations
 clear            Clear pending relocations
 en[able]         Enable shuffling
 dis[able]        Disable shuffling

Options:
 -dc, --only-dc        Apply only to named DC (create only)
 -tp, --thrift-port    Thrift port number (Default: 9160)
 -p, --port            JMX port number (Default: 7199)
 -tf, --thrift-framed  Enable framed transport for Thrift (Default: false)
 -en, --and-enable     Immediately enable shuffling (create only)
 -H, --help            Print help information
 -h, --host            JMX hostname or IP address (Default: localhost)
 -th, --thrift-host    Thrift
multiget_slice using CQL3
Hi Guys, What's the syntax for multiget_slice in CQL3? How about multiget_count? -- Drew
Re: multiget_slice using CQL3
I'm confused what you are looking to do. CQL3 syntax (SELECT * FROM keyspace.cf WHERE user = 'cooldude') has nothing to do with thrift client calls (such as multiget_slice) What is your goal here? Best, michael On 2/14/13 5:57 PM, Drew Kutcharian d...@venarc.com wrote: Hi Guys, What's the syntax for multiget_slice in CQL3? How about multiget_count? -- Drew
Re: multiget_slice using CQL3
The equivalent of multiget_slice is:

select * from table where primary_key in ('that', 'this', 'the other thing')

Not sure if you can count these in a way that makes sense, since you cannot group.

On Thu, Feb 14, 2013 at 9:17 PM, Michael Kjellman mkjell...@barracuda.com wrote:

I'm confused about what you are looking to do. CQL3 syntax (SELECT * FROM keyspace.cf WHERE user = 'cooldude') has nothing to do with thrift client calls (such as multiget_slice). What is your goal here?

Best, michael

On 2/14/13 5:57 PM, Drew Kutcharian d...@venarc.com wrote:

Hi Guys, What's the syntax for multiget_slice in CQL3? How about multiget_count?

-- Drew
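For reference, Edward's equivalents written out as full CQL3 statements (the table and column names here are made up for illustration; a real schema would differ):

```sql
-- thrift multiget_slice over several row keys:
SELECT * FROM users WHERE user_id IN ('that', 'this', 'the other thing');

-- a single overall count; CQL3 cannot give per-key counts here,
-- since there is no grouping:
SELECT COUNT(*) FROM users WHERE user_id IN ('that', 'this', 'the other thing');
```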
Question on Cassandra Snapshot
I have been looking at incremental backups and snapshots. I have done some experimentation but could not come to a conclusion. Can somebody please help me understand it correctly? /data is my data partition.

With incremental_backups turned OFF in cassandra.yaml: are all SSTables under /data/TestKeySpace/ColumnFamily at all times?

With incremental_backups turned ON in cassandra.yaml: are current SSTables under /data/TestKeySpace/ColumnFamily/ with a hardlink in /data/TestKeySpace/ColumnFamily/backups?

Let's say I have taken a snapshot and moved /data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape. At what point should I be backing up the *.db files from the /data/TestKeySpace/ColumnFamily/backups directory? Also, should I be deleting the *.db files whose inode matches the files in the snapshot? Is that a correct approach?

I also noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ directories; what are these timestamp directories?

Thanks in advance. SC
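The inode-matching idea in the question can be checked directly, since both snapshots and incremental backups are made of hardlinks. The sketch below demonstrates the mechanism on a scratch directory with `ls -i` (the file names are stand-ins, not a real Cassandra data directory):

```shell
#!/bin/sh
# Demo of the hardlink relationship the backups/ question hinges on: an
# incremental-backup entry and the live SSTable are two names for the same
# inode, so comparing inodes tells you which backup files a snapshot or
# tape copy already covers. (Scratch dir only; not a real data dir.)
set -e
dir=$(mktemp -d)
echo "sstable bytes" > "$dir/cf-1-Data.db"          # stand-in for a live SSTable
mkdir "$dir/backups"
ln "$dir/cf-1-Data.db" "$dir/backups/cf-1-Data.db"  # what incremental backup does

# Same inode => the backups/ entry is a hardlink, costing no extra space.
live=$(ls -i "$dir/cf-1-Data.db" | awk '{print $1}')
backup=$(ls -i "$dir/backups/cf-1-Data.db" | awk '{print $1}')
[ "$live" = "$backup" ] && echo "same inode: this backup file is already covered"
rm -r "$dir"
```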
Re: [VOTE] Release Mojo's Cassandra Maven Plugin 1.2.1-1
+1 =)

2013/2/14 Stephen Connolly stephen.alan.conno...@gmail.com

Hi, I'd like to release version 1.2.1-1 of Mojo's Cassandra Maven Plugin to sync up with the 1.2.1 release of Apache Cassandra. We solved 1 issue:
http://jira.codehaus.org/secure/ReleaseNote.jspa?projectId=12121&version=19089

Staging Repository: https://nexus.codehaus.org/content/repositories/orgcodehausmojo-015/
Site: http://mojo.codehaus.org/cassandra-maven-plugin/index.html
SCM Tag: https://svn.codehaus.org/mojo/tags/cassandra-maven-plugin-1.2.1-1@17931

[ ] +1 Yeah! fire ahead oh and the blind man on the galloping horse says it looks fine too.
[ ] 0 Mehhh! like I care, I don't have any opinions either, I'd follow somebody else if only I could decide who
[ ] -1 No! wait up there I have issues (in general like, ya know, and being a trouble-maker is only one of them)

The vote is open for 72h and will succeed by lazy consensus. Guide to testing staged releases: http://maven.apache.org/guides/development/guide-testing-releases.html

Cheers -Stephen

P.S. In the interest of ensuring (more is) better testing, and as is now tradition for Mojo's Cassandra Maven Plugin, this vote is also open to any subscribers of the dev and user@cassandra.apache.org mailing lists that want to test or use this plugin.
Re: multiget_slice using CQL3
Thanks Edward. I assume I can still do a column slice using WHERE in the case of wide rows. I wonder if multiget_count is the only thing that you can do using thrift but not CQL3.

On Feb 14, 2013, at 6:35 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

The equivalent of multiget_slice is: select * from table where primary_key in ('that', 'this', 'the other thing'). Not sure if you can count these in a way that makes sense, since you cannot group.

On Thu, Feb 14, 2013 at 9:17 PM, Michael Kjellman mkjell...@barracuda.com wrote:

I'm confused about what you are looking to do. CQL3 syntax (SELECT * FROM keyspace.cf WHERE user = 'cooldude') has nothing to do with thrift client calls (such as multiget_slice). What is your goal here?

Best, michael

On 2/14/13 5:57 PM, Drew Kutcharian d...@venarc.com wrote:

Hi Guys, What's the syntax for multiget_slice in CQL3? How about multiget_count?

-- Drew