Re: Exception in Hadoop Word Count sample
Yes. That's the problem. Thanks Jonathan. I'm actually using trunk against a 0.7. How can I generate the distro in trunk? Forgive my ignorance, I'm more used to maven.

On Thu, Sep 15, 2011 at 1:08 AM, Jonathan Ellis jbel...@gmail.com wrote:

You're using a 0.8 wordcount against a 0.7 Cassandra?

On Wed, Sep 14, 2011 at 2:19 PM, Tharindu Mathew mcclou...@gmail.com wrote:

I see $subject. Can anyone help me to rectify this? Stacktrace:

Exception in thread main org.apache.thrift.TApplicationException: Required field 'replication_factor' was not found in serialized data! Struct: KsDef(name:wordcount, strategy_class:org.apache.cassandra.locator.SimpleStrategy, strategy_options:{replication_factor=1}, replication_factor:0, cf_defs:[CfDef(keyspace:wordcount, name:input_words, column_type:Standard, comparator_type:AsciiType, default_validation_class:AsciiType), CfDef(keyspace:wordcount, name:output_words, column_type:Standard, comparator_type:AsciiType, default_validation_class:AsciiType), CfDef(keyspace:wordcount, name:input_words_count, column_type:Standard, comparator_type:UTF8Type, default_validation_class:CounterColumnType)])
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_system_add_keyspace(Cassandra.java:1531)
    at org.apache.cassandra.thrift.Cassandra$Client.system_add_keyspace(Cassandra.java:1514)
    at WordCountSetup.setupKeyspace(Unknown Source)
    at WordCountSetup.main(Unknown Source)

--
Regards,
Tharindu
blog: http://mackiemathew.com/

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

--
Regards,
Tharindu
blog: http://mackiemathew.com/
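For context on the error itself: in the 0.7 Thrift interface, KsDef declares replication_factor as a required field, while in 0.8/trunk the factor moved into strategy_options and the field may not be serialized at all, which is why the 0.7 server fails to deserialize the struct above. Against a 0.7 cluster, the keyspace setup would look roughly like this (a sketch against the 0.7-generated Thrift classes; treat the exact constructor shapes as assumptions if your build differs):

    import java.util.Arrays;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.CfDef;
    import org.apache.cassandra.thrift.KsDef;

    public class CreateKeyspace07 {
        static void create(Cassandra.Client client) throws Exception {
            CfDef input = new CfDef("wordcount", "input_words");
            // In 0.7, replication_factor is required and therefore part of
            // the generated KsDef constructor.
            KsDef ks = new KsDef("wordcount",
                    "org.apache.cassandra.locator.SimpleStrategy",
                    1, // replication_factor
                    Arrays.asList(input));
            client.system_add_keyspace(ks);
        }
    }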
Re: Exception in Hadoop Word Count sample
Found it. 'ant artifacts'

On Thu, Sep 15, 2011 at 12:02 PM, Tharindu Mathew mcclou...@gmail.com wrote:

Yes. That's the problem. Thanks Jonathan. I'm actually using trunk against a 0.7. How can I generate the distro in trunk? Forgive my ignorance, I'm more used to maven.

--
Regards,
Tharindu
blog: http://mackiemathew.com/
RE: Get CL ONE / NTS
I do not agree here. I trade consistency (it's more a data miss than an inconsistency here) for performance in my case. I'm okay with handling the Spanish Inquisition popping up in the current DC by triggering a new read with a stronger CL somewhere else (for example in other DCs). If the data is nowhere to be found or nothing is reachable, well, it's sad but true, but it will be the end of the game. Fine.

What I'm missing is a clear description of the behavior of CL.ONE. I'm unsure about which nodes are used by ONE and how missing data/errors are filtered out. I've landed in ReadCallback.java, but the error handling is out of my reach for the moment.

Thanks,
- Pierre

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, September 15, 2011 12:27 AM
To: user@cassandra.apache.org
Subject: Re: Get CL ONE / NTS

Are you advising that CL.ONE is not worth it when considering read performance?

Consistency is not performance; it's a whole new thing to tune in your application. If you have performance issues, deal with those as performance issues: better code / data model / hardware.

By the way, I do not have a consistency problem at all - data is only written once

Nobody expects a consistency problem. Its chief weapon is surprise. Surprise and fear. Its two weapons are fear and surprise. And so forth: http://www.youtube.com/watch?v=Ixgc_FGam3s

If you write at LOCAL_QUORUM in DC 1 and DC 2 is down at the start of the request, a hint will be stored in DC 1. Some time later, when DC 2 comes back, that hint will be sent to DC 2. If in the meantime you read from DC 2 at CL ONE, you will not get that change. With Read Repair enabled it will repair in the background, and you may get a different response on the next read. (Am guessing here; cannot remember exactly how RR works cross-DC.)

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 10:07 AM, Pierre Chalamet wrote:

Thanks Aaron, I hadn't seen your answer before mine. I do agree on 2/: I might get read errors. Good suggestion to use EACH_QUORUM - it could be a good trade-off to read at this level if ONE fails. Maybe using LOCAL_QUORUM might be a good answer and will avoid headaches after all.

Are you advising that CL.ONE is not worth it when considering read performance? By the way, I do not have a consistency problem at all - data is only written once (and if more, it is always the same data) and read several times across DCs. I only have replication problems. That's why I'm more inclined to use CL.ONE for reads if possible.

Thanks,
- Pierre

-----Original Message-----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, September 14, 2011 11:48 PM
To: user@cassandra.apache.org; pie...@chalamet.net
Subject: Re: Get CL ONE / NTS

Your current approach to consistency opens the door to some inconsistent behavior.

1/ Will I have an error because DC2 does not have any copy of the data?
If you read from DC2 at CL ONE and the data is not replicated, it will not be returned.

2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2?
Not at CL ONE. If you use CL EACH_QUORUM, the read will go to all the DCs. If DC2 is behind DC1, you will get the data from DC1.

3/ In case of partial replication to DC2, will I sometimes see errors about servers not holding the data in DC2?
Depending on the API call and the client, working at CL ONE, you will see either errors or missing data.
4/ Does a get at CL ONE fail as soon as the fastest server to answer says it does not have the data, or does it wait until all servers say they do not have the data?
Yes.

Consider using LOCAL_QUORUM for both writes and reads; that will make things a bit more consistent without adding inter-DC overhead to the request latency. It is still possible to not get data in DC2 if it is totally disconnected from DC1. Or write at LOCAL_QUORUM and read at EACH_QUORUM, so you can always read, but requests in DC2 will fail if DC1 is not reachable.

Hope that helps.

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 1:33 AM, Pierre Chalamet wrote:

Hello,

I have 2 datacenters. Cassandra is configured as follows:
- RackInferringSnitch
- NetworkTopologyStrategy for the CF
- strategy_options: DC1:3 DC2:3

Data is written using CL LOCAL_QUORUM, so data written from one datacenter will eventually be replicated to the other datacenter. Data is always written exactly once.

On the other side, I'd like to improve the read path. I'm actually using CL ONE since data is only written once (ie: the timestamp is more or less meaningless in my case). This is where I have some doubts: if data is written on DC1 and tentatively read from DC2 while the data is still not replicated or partially replicated (for whatever good reason since replication is async),
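The fallback Pierre describes - read at CL.ONE and retry at a stronger level on a miss - is application code rather than anything Cassandra does for you. A minimal sketch against the 0.7/0.8-era Thrift API, using EACH_QUORUM for the retry per Aaron's suggestion (the key and column path are placeholders; UnavailableException and TimedOutException could be handled the same way):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.NotFoundException;

    public class FallbackRead {
        static ColumnOrSuperColumn read(Cassandra.Client client, ByteBuffer key,
                                        ColumnPath path) throws Exception {
            try {
                // Cheap read: CL.ONE waits for a single (closest) replica.
                return client.get(key, path, ConsistencyLevel.ONE);
            } catch (NotFoundException miss) {
                // The row may simply not have replicated to this DC yet;
                // retry at a level that consults every DC.
                return client.get(key, path, ConsistencyLevel.EACH_QUORUM);
            }
        }
    }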
how did hprof files get generated?
On one of my nodes, I found many hprof files in the Cassandra installation directory; they are using as much as 200GB of disk space. Other nodes don't have those files. It turns out those files are used for memory analysis, but I'm not sure how they were generated. They look like these:

java_pid10626.hprof
java_pid13898.hprof
java_pid17061.hprof
java_pid21002.hprof
java_pid23194.hprof
java_pid29241.hprof
java_pid5013.hprof
Re: how did hprof files get generated?
On one of my nodes, I found many hprof files in the Cassandra installation directory; they are using as much as 200GB of disk space. Other nodes don't have those files. It turns out those files are used for memory analysis, but I'm not sure how they were generated.

You're probably getting OutOfMemory exceptions. Cassandra by default runs with -XX:+HeapDumpOnOutOfMemoryError, which writes a java_pid<pid>.hprof heap dump whenever the JVM runs out of memory. If this is the case, you probably need to increase your heap size or adjust Cassandra settings.

--
/ Peter Schuller (@scode on twitter)
Re: Exception in Hadoop Word Count sample
Now I get this. Any help would be greatly appreciated.

./bin/word_count
11/09/15 12:28:28 INFO WordCount: output reducer type: cassandra
11/09/15 12:28:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/09/15 12:28:30 INFO mapred.JobClient: Running job: job_local_0001
11/09/15 12:28:30 INFO mapred.MapTask: io.sort.mb = 100
11/09/15 12:28:30 INFO mapred.MapTask: data buffer = 79691776/99614720
11/09/15 12:28:30 INFO mapred.MapTask: record buffer = 262144/327680
11/09/15 12:28:30 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: java.lang.UnsupportedOperationException: no local connection available
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:132)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.UnsupportedOperationException: no local connection available
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getLocation(ColumnFamilyRecordReader.java:176)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:113)
    ... 4 more
11/09/15 12:28:31 INFO mapred.JobClient: map 0% reduce 0%
11/09/15 12:28:31 INFO mapred.JobClient: Job complete: job_local_0001
11/09/15 12:28:31 INFO mapred.JobClient: Counters: 0
11/09/15 12:28:31 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/09/15 12:28:32 INFO mapred.JobClient: Running job: job_local_0002
11/09/15 12:28:32 INFO mapred.MapTask: io.sort.mb = 100
11/09/15 12:28:32 INFO mapred.MapTask: data buffer = 79691776/99614720
11/09/15 12:28:32 INFO mapred.MapTask: record buffer = 262144/327680
11/09/15 12:28:32 WARN mapred.LocalJobRunner: job_local_0002
java.lang.RuntimeException: java.lang.UnsupportedOperationException: no local connection available
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:132)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.UnsupportedOperationException: no local connection available
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getLocation(ColumnFamilyRecordReader.java:176)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:113)
    ... 4 more
11/09/15 12:28:33 INFO mapred.JobClient: map 0% reduce 0%
11/09/15 12:28:33 INFO mapred.JobClient: Job complete: job_local_0002
11/09/15 12:28:33 INFO mapred.JobClient: Counters: 0
11/09/15 12:28:33 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/09/15 12:28:34 INFO mapred.JobClient: Running job: job_local_0003
11/09/15 12:28:34 INFO mapred.MapTask: io.sort.mb = 100
11/09/15 12:28:34 INFO mapred.MapTask: data buffer = 79691776/99614720
11/09/15 12:28:34 INFO mapred.MapTask: record buffer = 262144/327680
11/09/15 12:28:34 WARN mapred.LocalJobRunner: job_local_0003
java.lang.RuntimeException: java.lang.UnsupportedOperationException: no local connection available
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:132)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.UnsupportedOperationException: no local connection available
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getLocation(ColumnFamilyRecordReader.java:176)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:113)
    ... 4 more
11/09/15 12:28:35 INFO mapred.JobClient: map 0% reduce 0%
11/09/15 12:28:35 INFO mapred.JobClient: Job complete: job_local_0003
11/09/15 12:28:35 INFO mapred.JobClient: Counters: 0
11/09/15 12:28:35 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/09/15 12:28:36 INFO mapred.JobClient: Running job: job_local_0004
11/09/15 12:28:36 INFO mapred.MapTask: io.sort.mb = 100
11/09/15 12:28:37 INFO mapred.MapTask: data buffer = 79691776/99614720
11/09/15 12:28:37 INFO mapred.MapTask: record buffer = 262144/327680
11/09/15 12:28:37 WARN mapred.LocalJobRunner: job_local_0004
java.lang.RuntimeException: java.lang.UnsupportedOperationException: no local connection available
    at
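For what it's worth, the "no local connection available" error comes from ColumnFamilyRecordReader.getLocation, which (as far as I can tell from the 0.8 source) looks for a replica of each input split on the machine the task runs on, so running the sample via the local job runner on a box that isn't part of the ring can trigger it. The 0.8-era sample wires its connection details through ConfigHelper, roughly like the sketch below (assuming the ConfigHelper in your build exposes these setters; the address, port, partitioner, and column family names are placeholders for your own ring):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.cassandra.hadoop.ConfigHelper;

    public class WordCountJobConfig {
        static void configure(Configuration conf) {
            // Where the input format's describe_splits call goes; the
            // record reader then connects to replicas of each split.
            ConfigHelper.setInitialAddress(conf, "127.0.0.1");
            ConfigHelper.setRpcPort(conf, "9160");
            // Must match the cluster's configured partitioner.
            ConfigHelper.setPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
            // Keyspace and column family the mappers read from.
            ConfigHelper.setInputColumnFamily(conf, "wordcount", "input_words");
        }
    }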
Re: how did hprof files get generated?
Got it! Thanks!

On Thu, Sep 15, 2011 at 4:10 PM, Peter Schuller peter.schul...@infidyne.com wrote:

You're probably getting OutOfMemory exceptions. Cassandra by default runs with -XX:+HeapDumpOnOutOfMemoryError. If this is the case, you probably need to increase your heap size or adjust Cassandra settings.

--
/ Peter Schuller (@scode on twitter)
New node unable to stream (0.8.5)
Hi. We've been running a 7-node cluster with RF 3 and QUORUM reads/writes in our production environment for a few months. It's been consistently stable during this period, particularly once we got our maintenance strategy fully worked out (per node, one repair a week and one major compaction a week, the latter due to the nature of our data model and usage). While this cluster started, back in June or so, on the 0.7 series, it's been running 0.8.3 for a while now with no issues. We upgraded to 0.8.5 two days ago, having previously tested the upgrade in our staging cluster (with an otherwise identical configuration) and verified that our application's various use cases appeared successful.

One of our nodes suffered a disk failure yesterday. We attempted to replace the dead node by placing a new node at OldNode.initial_token - 1 with auto_bootstrap on. A few things went awry from there:

1. We never saw the new node in bootstrap mode; it became available pretty much immediately upon joining the ring, and never reported a joining state. I did verify that auto_bootstrap was on.
2. I mistakenly ran repair on the new node rather than removetoken on the old node, due to a delightful mental error. The repair got nowhere fast, as it attempts to repair against the down node, which throws an exception. So I interrupted the repair, restarted the node to clear any pending validation compactions, and...
3. Ran removetoken for the old node.
4. We let this run for some time and eventually saw that all the nodes appeared to be done with various compactions and were stuck on streaming. Many streams were listed as open, none making any progress.
5. I observed an RPC-related exception on the new node (where the removetoken was launched) and concluded that the streams were broken, so the process wouldn't ever finish.
6. Ran a removetoken force to get the dead node out of the mix. No problems.
7. Ran a repair on the new node.
8. Validations ran, streams opened up, and again things got stuck in streaming, hanging for over an hour with no progress.
9. Musing that lingering tasks from the removetoken could be a factor, I performed a rolling restart and attempted a repair again.
10. Same problem. Did another rolling restart and attempted a fresh repair on the most important column family alone.
11. Same problem. The streams included CFs not specified, so I guess they must be for hinted handoff.

In concluding that streaming is stuck, I've observed:
- streams will be open to the new node from other nodes, but the new node doesn't list them
- streams will be open to the other nodes from the new node, but the other nodes don't list them
- the streams reported may make some initial progress, but then they hang at a particular point and do not move on for an hour or more
- the logs report repair-related activity, until NPEs on incoming TCP connections show up, which appear likely to be the culprit

I can provide more exact details when I'm done commuting. With streaming broken on this node, I'm unable to run repairs, which is obviously problematic. The application didn't suffer any operational issues as a consequence of this, but I need to review the overnight results to verify we're not suffering data loss (I doubt we are).

At this point, I'm considering a couple of options:
1. Remove the new node and let the adjacent node take over its range.
2. Bring the new node down, add a new one in front of it, and properly removetoken the problematic one.
3. Bring the new node down, remove all its data except for the system keyspace, then bring it back up and repair it.
4. Revert to 0.8.3 and see if that helps.

Recommendations?

Thanks.
- Ethan
Re: New node unable to stream (0.8.5)
On Thu, Sep 15, 2011 at 1:16 PM, Ethan Rowe et...@the-rowes.com wrote:

The logs report repair-related activity, until NPEs on incoming TCP connections show up, which appear likely to be the culprit.

Can you send the stack trace from those NPEs?
Re: New node unable to stream (0.8.5)
Here's a typical log slice (not terribly informative, I fear):

INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,106 AntiEntropyService.java (line 884) Performing streaming repair of 1003 ranges with /10.34.90.8 for (29990798416657667504332586989223299634,54296681768153272037430773234349600451]
INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,427 StreamOut.java (line 181) Stream context metadata [/mnt/cassandra/data/events_production/FitsByShip-g-10-Data.db sections=88 progress=0/11707163 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-11-Data.db sections=169 progress=0/6133240 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-6-Data.db sections=1 progress=0/6918814 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-12-Data.db sections=260 progress=0/9091780 - 0%], 4 sstables.
INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,428 StreamOutSession.java (line 174) Streaming to /10.34.90.8
ERROR [Thread-56] 2011-09-15 05:41:38,515 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-56,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:174)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)

I'm not sure if the exception is related to the outbound streaming above; other nodes are actively trying to stream to this node, so perhaps it comes from those, and the temporal adjacency to the outbound stream is just coincidental. I have other snippets that look basically identical to the above, except that if I look at the logs of the node to which this node is trying to stream, I see that it has concurrently opened a stream in the other direction, which could be the one the exception pertains to.

On Thu, Sep 15, 2011 at 7:41 AM, Sylvain Lebresne sylv...@datastax.com wrote:

Can you send the stack trace from those NPEs?
Re: New node unable to stream (0.8.5)
I just noticed the following from one of Jonathan Ellis' messages yesterday:

Added to NEWS:
- After upgrading, run nodetool scrub against each node before running repair, moving nodes, or adding new ones.

We did not do this, as it was not indicated as necessary in NEWS when we were dealing with the upgrade. So perhaps I need to scrub everything before going any further, though the question is what to do with the problematic node. Additionally, it would be helpful to know whether scrub will affect the hinted handoffs that have accumulated, as these seem likely to be part of the set of failing streams.
Re: New node unable to stream (0.8.5)
After further review, I'm definitely going to scrub all the original nodes in the cluster. We've lost some data as a result of this situation. It can be restored, but the question is what to do with the problematic new node first. I don't particularly care about the data that's on it, since I'm going to re-import the critical data from files anyway, and then I can recreate derivative data afterwards. So it's purely a matter of getting the cluster healthy again as quickly as possible so I can begin that import process.

Any issue with running scrubs on multiple nodes at a time, provided they aren't replication neighbors?
Re: New node unable to stream (0.8.5)
That means we missed a place we needed to special-case for backwards compatibility -- the workaround is to add an empty encryption_options section to cassandra.yaml:

encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra

Created https://issues.apache.org/jira/browse/CASSANDRA-3212 to fix this.
Re: New node unable to stream (0.8.5)
Where did the data loss come in?

Scrub is safe to run in parallel.
Re: New node unable to stream (0.8.5)
Thanks, Jonathan. I'll try the workaround and see if that gets the streams flowing properly.

As I mentioned before, we did not run scrub yet. What is the consequence of letting the streams from the hinted handoffs complete if scrub hasn't been run on these nodes?

I'm currently running scrub on one node to get a sense of the time frame.

Thanks again.
- Ethan
Re: New node unable to stream (0.8.5)
On Thu, Sep 15, 2011 at 9:21 AM, Jonathan Ellis jbel...@gmail.com wrote:

Where did the data loss come in?

The outcome of the analytical jobs run overnight, while some of these repairs were (not) running, is consistent with what I would expect if perhaps 20-30% of the source data were missing. Given the strong consistency model we're using, this is surprising to me, since the jobs did not report any read or write failures. I wonder if this is a consequence of the dead node being missing and the new node being operational but having received essentially none of its hinted handoff streams. Perhaps with streaming fixed the data will reappear, which would be a happy outcome; if not, I can re-import the critical stuff from files.

Scrub is safe to run in parallel.

Is it somewhat analogous to a major compaction in terms of I/O impact, with perhaps less greedy use of disk space?
Re: New node unable to stream (0.8.5)
Hinted handoff doesn't use streaming mode, so it doesn't care. (Streaming to Cassandra means sending raw sstable file ranges to another node. HH just uses the normal column-based write path.) On Thu, Sep 15, 2011 at 8:24 AM, Ethan Rowe et...@the-rowes.com wrote: Thanks, Jonathan. I'll try the workaround and see if that gets the streams flowing properly. As I mentioned before, we did not run scrub yet. What is the consequence of letting the streams from the hinted handoffs complete if scrub hasn't been run on these nodes? I'm currently running scrub on one node to get a sense of the time frame. Thanks again. - Ethan On Thu, Sep 15, 2011 at 9:09 AM, Jonathan Ellis jbel...@gmail.com wrote: That means we missed a place we needed to special-case for backwards compatibility -- the workaround is, add an empty encryption_options section to cassandra.yaml: encryption_options: internode_encryption: none keystore: conf/.keystore keystore_password: cassandra truststore: conf/.truststore truststore_password: cassandra Created https://issues.apache.org/jira/browse/CASSANDRA-3212 to fix this. On Thu, Sep 15, 2011 at 7:13 AM, Ethan Rowe et...@the-rowes.com wrote: Here's a typical log slice (not terribly informative, I fear): INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,106 AntiEntropyService.java (l ine 884) Performing streaming repair of 1003 ranges with /10.34.90.8 for (299 90798416657667504332586989223299634,54296681768153272037430773234349600451] INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,427 StreamOut.java (line 181) Stream context metadata [/mnt/cassandra/data/events_production/FitsByShip-g-1 0-Data.db sections=88 progress=0/11707163 - 0%, /mnt/cassandra/data/events_pr oduction/FitsByShip-g-11-Data.db sections=169 progress=0/6133240 - 0%, /mnt/c assandra/data/events_production/FitsByShip-g-6-Data.db sections=1 progress=0/ 6918814 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-12-Data.db s ections=260 progress=0/9091780 - 0%], 4 sstables. INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,428 StreamOutSession.java (lin e 174) Streaming to /10.34.90.8 ERROR [Thread-56] 2011-09-15 05:41:38,515 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-56,5,main] java.lang.NullPointerException at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpC onnection.java:174) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConn ection.java:114) Not sure if the exception is related to the outbound streaming above; other nodes are actively trying to stream to this node, so perhaps it comes from those and temporal adjacency to the outbound stream is just coincidental. I have other snippets that look basically identical to the above, except if I look at the logs to which this node is trying to stream, I see that it has concurrently opened a stream in the other direction, which could be the one that the exception pertains to. On Thu, Sep 15, 2011 at 7:41 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, Sep 15, 2011 at 1:16 PM, Ethan Rowe et...@the-rowes.com wrote: Hi. We've been running a 7-node cluster with RF 3, QUORUM reads/writes in our production environment for a few months. It's been consistently stable during this period, particularly once we got out maintenance strategy fully worked out (per node, one repair a week, one major compaction a week, the latter due to the nature of our data model and usage). While this cluster started, back in June or so, on the 0.7 series, it's been running 0.8.3 for a while now with no issues. 
We upgraded to 0.8.5 two days ago, having previously tested the upgrade in our staging cluster (with an otherwise identical configuration) and verified that our application's various use cases appeared successful. One of our nodes suffered a disk failure yesterday. We attempted to replace the dead node by placing a new node at OldNode.initial_token - 1 with auto_bootstrap on. A few things went awry from there:
1. We never saw the new node in bootstrap mode; it became available pretty much immediately upon joining the ring, and never reported a joining state. I did verify that auto_bootstrap was on.
2. I mistakenly ran repair on the new node rather than removetoken on the old node, due to a delightful mental error. The repair got nowhere fast, as it attempts to repair against the down node, which throws an exception. So I interrupted the repair, restarted the node to clear any pending validation compactions, and...
3. Ran removetoken for the old node.
4. We let this run for some time and saw eventually that all the nodes appeared to be done with various compactions and were stuck at streaming. Many
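To apply Jonathan's encryption_options workaround above across a cluster without hand-editing each file, the patch can be scripted. This is a minimal sketch, not a supported tool: it assumes PyYAML is installed and that cassandra.yaml lives at the path shown (adjust for your layout), and note that round-tripping the file through a YAML parser discards comments.

    import yaml  # PyYAML, assumed available

    CONF = '/etc/cassandra/cassandra.yaml'  # assumed location; often conf/cassandra.yaml

    with open(CONF) as f:
        conf = yaml.safe_load(f) or {}

    # Add the encryption_options section only if it is missing.
    conf.setdefault('encryption_options', {
        'internode_encryption': 'none',
        'keystore': 'conf/.keystore',
        'keystore_password': 'cassandra',
        'truststore': 'conf/.truststore',
        'truststore_password': 'cassandra',
    })

    # Caution: safe_dump rewrites the whole file and drops comments.
    with open(CONF, 'w') as f:
        yaml.safe_dump(conf, f, default_flow_style=False)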
Re: New node unable to stream (0.8.5)
If you added the new node as a seed, it would ignore bootstrap mode. And bootstrap / repair *do* use streaming so you'll want to re-run repair post-scrub. (No need to re-bootstrap since you're repairing.) Scrub is a little less heavyweight than major compaction but same ballpark. It runs sstable-at-a-time so (as long as you haven't been in the habit of forcing majors) space should not be a concern. On Thu, Sep 15, 2011 at 8:40 AM, Ethan Rowe et...@the-rowes.com wrote: On Thu, Sep 15, 2011 at 9:21 AM, Jonathan Ellis jbel...@gmail.com wrote: Where did the data loss come in? The outcome of the analytical jobs run overnight while some of these repairs were (not) running is consistent with what I would expect if perhaps 20-30% of the source data was missing. Given the strong consistency model we're using, this is surprising to me, since the jobs did not report any read or write failures. I wonder if this is a consequence of the dead node missing and the new node being operational but having received basically none of its hinted handoff streams. Perhaps with streaming fixed the data will reappear, which would be a happy outcome, but if not, I can reimport the critical stuff from files. Scrub is safe to run in parallel. Is it somewhat analogous to a major compaction in terms of I/O impact, with perhaps less greedy use of disk space? On Thu, Sep 15, 2011 at 8:08 AM, Ethan Rowe et...@the-rowes.com wrote: After further review, I'm definitely going to scrub all the original nodes in the cluster. We've lost some data as a result of this situation. It can be restored, but the question is what to do with the problematic new node first. I don't particularly care about the data that's on it, since I'm going to re-import the critical data from files anyway, and then I can recreate derivative data afterwards. So it's purely a matter of getting the cluster healthy again as quickly as possible so I can begin that import process. Any issue with running scrubs on multiple nodes at a time, provided they aren't replication neighbors? On Thu, Sep 15, 2011 at 8:18 AM, Ethan Rowe et...@the-rowes.com wrote: I just noticed the following from one of Jonathan Ellis' messages yesterday: Added to NEWS: - After upgrading, run nodetool scrub against each node before running repair, moving nodes, or adding new ones. We did not do this, as it was not indicated as necessary in the news when we were dealing with the upgrade. So perhaps I need to scrub everything before going any further, though the question is what to do with the problematic node. Additionally, it would be helpful to know if scrub will affect the hinted handoffs that have accumulated, as these seem likely to be part of the set of failing streams. 
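A cheap guard against the self-seeding mistake Jonathan describes above is to compare a node's own addresses with its configured seed list before starting it. A minimal sketch, assuming PyYAML and the 0.8-style seed_provider layout in cassandra.yaml; the config path and layout are assumptions, not guarantees:

    import socket
    import yaml  # PyYAML, assumed available

    CONF = '/etc/cassandra/cassandra.yaml'  # assumed location

    with open(CONF) as f:
        conf = yaml.safe_load(f)

    # 0.8-style layout: seed_provider is a list of providers, each with a
    # 'parameters' list holding a comma-separated 'seeds' string.
    seeds = set()
    for provider in conf.get('seed_provider', []):
        for params in provider.get('parameters', []):
            seeds.update(s.strip() for s in str(params.get('seeds', '')).split(','))
    seeds.discard('')

    own = {socket.gethostname(), socket.gethostbyname(socket.gethostname())}
    if conf.get('listen_address'):
        own.add(str(conf['listen_address']))

    if own & seeds:
        print('WARNING: this node is in its own seed list; it will not auto-bootstrap')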
Re: New node unable to stream (0.8.5)
On Thu, Sep 15, 2011 at 10:03 AM, Jonathan Ellis jbel...@gmail.com wrote: If you added the new node as a seed, it would ignore bootstrap mode. And bootstrap / repair *do* use streaming so you'll want to re-run repair post-scrub. (No need to re-bootstrap since you're repairing.) Ah, of course. That's what happened; the chef recipe added the node to its own seed list, which is a problem I thought we'd fixed but apparently not. That definitely explains the bootstrap issue. But no matter, so long as the repairs can eventually run. Scrub is a little less heavyweight than major compaction but same ballpark. It runs sstable-at-a-time so (as long as you haven't been in the habit of forcing majors) space should not be a concern. Cool. We've deactivated all tasks against these nodes and will scrub them all in parallel, apply the encryption options you specified, and see where that gets us. Thanks for the assistance. - Ethan
Re: Configuring multi DC cluster
Wait, his nodes are going SC, SC, AT, AT. Shouldn't they go SC, AT, SC, AT? By which I mean that if he adds another node to the ring (or lowers the replication factor), he will have a node that is under-utilized. The rings in his data centers have the tokens:

SC: 0, 1
AT: 85070591730234615865843651857942052864, 85070591730234615865843651857942052865

They should be:

SC: 0, 85070591730234615865843651857942052864
AT: 1, 85070591730234615865843651857942052865

Or did I forget/misread something? - Original Message - From: aaron morton aa...@thelastpickle.com To: user@cassandra.apache.org Sent: Tuesday, September 13, 2011 6:19:16 PM Subject: Re: Configuring multi DC cluster Looks good to me. Last time I checked the Partitioner did not take the DC into consideration https://issues.apache.org/jira/browse/CASSANDRA-3047 Good luck. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 14/09/2011, at 8:41 AM, Anand Somani wrote: Hi, Just trying to set up a cluster of 4 nodes for a multi-DC scenario, with 2 nodes in each DC. This is all on the same box, just for testing the configuration aspect. I have configured things as follows:

- PropertyFile:
  127.0.0.4=SC:rack1
  127.0.0.5=SC:rack2
  127.0.0.6=AT:rack1
  127.0.0.7=AT:rack2
  # default for unknown nodes
  default=SC:rack1
- Set up initial tokens as advised
- Configured the keyspace with SC:2, AT:2
- The ring looks like:

Address    Status State  Load       Owns    Token
                                            85070591730234615865843651857942052865
127.0.0.4  Up     Normal 464.98 KB  50.00%  0
127.0.0.5  Up     Normal 464.98 KB  0.00%   1
127.0.0.6  Up     Normal 464.99 KB  50.00%  85070591730234615865843651857942052864
127.0.0.7  Up     Normal 464.99 KB  0.00%   85070591730234615865843651857942052865

Is that what I should expect the ring to look like? Is there anything else I should be testing/validating to make sure that things are configured correctly for NTS? Thanks Anand
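For readers wondering where these numbers come from: with RandomPartitioner the token space runs from 0 to 2**127 - 1, and the usual multi-DC recipe is to compute evenly spaced tokens independently per data center, then offset each successive DC by a small constant so no two nodes share a token. A minimal sketch of that arithmetic (the function name is mine, not from any Cassandra tool):

    # Evenly spaced RandomPartitioner tokens, one ring per DC, offset per DC.
    RING = 2 ** 127

    def dc_tokens(nodes_per_dc, dc_index):
        return [i * RING // nodes_per_dc + dc_index for i in range(nodes_per_dc)]

    print('SC:', dc_tokens(2, 0))  # [0, 85070591730234615865843651857942052864]
    print('AT:', dc_tokens(2, 1))  # [1, 85070591730234615865843651857942052865]

This reproduces exactly the corrected assignment above: each DC gets a balanced ring of its own, interleaved rather than grouped.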
[BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.0. Let me first stress that this is beta software and as such is *not* ready for production use. The goal of this release is to give a preview of what will be Cassandra 1.0 and, more importantly, to get wider testing before the final release. So please help us make Cassandra 1.0 the best it can possibly be by testing this beta release and reporting any problems you may encounter[3,4]. You can have a look at the change log[1] and the release notes[2] to see where Cassandra 1.0 differs from the 0.8 series. Apache Cassandra 1.0.0-beta1[5] is available as usual from the cassandra website: http://cassandra.apache.org/download/ Thank you for your help in testing and have fun with it. [1]: http://goo.gl/evCW0 (CHANGES.txt) [2]: http://goo.gl/HbNsV (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: https://svn.apache.org/repos/asf/cassandra/tags/cassandra-1.0.0-beta1
Re: Configuring multi DC cluster
You are right, good catch, thanks! On Thu, Sep 15, 2011 at 8:28 AM, Konstantin Naryshkin konstant...@a-bb.net wrote: Wait, his nodes are going SC, SC, AT, AT. Shouldn't they go SC, AT, SC, AT? By which I mean that if he adds another node to the ring (or lowers the replication factor), he will have a node that is under-utilized.
Re: New node unable to stream (0.8.5)
Cool. We've deactivated all tasks against these nodes and will scrub them all in parallel, apply the encryption options you specified, and see where that gets us. Thanks for the assistance. To follow up:
* We scrubbed all the nodes
* We applied the encryption options specified
* A repair is continuing (for about an hour so far, perhaps more) on the new, problematic node; it's successfully streaming data from its neighbors and has built up a roughly equivalent data volume on disk
We'll see if the data is fully restored once this process completes. Even if it isn't, it seems likely that the cluster will be in a healthy state soon, so we can reimport as necessary and we'll be out of the woods. Now that I've said all that, something will inevitably go wrong, but until that happens, thanks again for the feedback. - Ethan
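Since the resolution here was essentially "scrub everywhere, then re-run repair," and scrub is safe to run in parallel as discussed above, the fan-out is easy to script. A sketch, assuming SSH access to each node with nodetool on the remote PATH; the host list is hypothetical:

    import subprocess

    # Hypothetical node addresses - substitute your own.
    NODES = ['10.34.90.4', '10.34.90.5', '10.34.90.6']

    # Kick off nodetool scrub on every node concurrently.
    procs = [
        subprocess.Popen(['ssh', node, 'nodetool', '-h', 'localhost', 'scrub'])
        for node in NODES
    ]
    for node, proc in zip(NODES, procs):
        proc.wait()
        print(node, 'scrub exited with', proc.returncode)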
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
This is great news! Is it possible to do a write-up of the main changes, like the LevelDB-style compaction, and explain them a little bit? I get lost reading JIRA and sometimes it is difficult to follow the thread. It looks like there are some major changes in this release.
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
Isn't this LevelDB implementation based on Google's LevelDB? http://code.google.com/p/leveldb/ From what I know, it's quite fast. On Thu, Sep 15, 2011 at 4:04 PM, mcasandra mohitanch...@gmail.com wrote: This is great news! Is it possible to do a write-up of the main changes, like the LevelDB-style compaction, and explain them a little bit? I get lost reading JIRA and sometimes it is difficult to follow the thread. It looks like there are some major changes in this release.
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
Is it possible to update an existing column family with {sstable_compression: SnappyCompressor, compaction_strategy: LeveledCompactionStrategy}? Or will I have to make a new column family and migrate my data to it? -Jeremiah On 09/15/2011 01:01 PM, Sylvain Lebresne wrote: The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.0.
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
So I should be able to do a rolling upgrade from 0.7 to 1.0? (It's not there in the release notes, but I assume that is work in progress.) Thanks On Thu, Sep 15, 2011 at 1:36 PM, amulya rattan talk2amu...@gmail.com wrote: Isn't this LevelDB implementation based on Google's LevelDB? http://code.google.com/p/leveldb/ From what I know, it's quite fast. On Thu, Sep 15, 2011 at 4:04 PM, mcasandra mohitanch...@gmail.com wrote: This is great news! Is it possible to do a write-up of the main changes, like the LevelDB-style compaction, and explain them a little bit? I get lost reading JIRA and sometimes it is difficult to follow the thread. It looks like there are some major changes in this release.
Calling All Boston Cassandra Users
Hi All, Just a quick note to encourage those of you who are in the Greater Boston area to join: 1) The Boston subgroup of the Cassandra LinkedIn Group: http://www.linkedin.com/groups?home=gid=3973913 2) The Boston Cassandra Meetup Group: http://www.meetup.com/Boston-Cassandra-Users The first meetup was held this week and it would be great to see more faces at the next one! Cheers, Chris
Re: Get CL ONE / NTS
What I’m missing is a clear behavior for CL.ONE. I’m unsure about which nodes are used by ONE and how the filtering of missing data/errors is done. I’ve landed in ReadCallback.java but error handling is out of my reach for the moment. Start with StorageProxy.fetch() to see which nodes are considered to be part of the request. ReadCallback.ctor() will decide which are actually involved, based on the CL and whether RR is enabled. At CL ONE there is no checking of the replica responses for consistency, as there is only one response. If RR is enabled it will start from ReadCallback.maybeResolveForRepair(). Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 15/09/2011, at 7:21 PM, Pierre Chalamet wrote: I do not agree here. I trade “consistency” (it’s more a data miss than a consistency issue here) for performance in my case. I’m okay with handling the popping up of the Spanish Inquisition in the current DC by triggering a new read with a stronger CL somewhere else (for example in other DCs). If the data is nowhere to be found or nothing is reachable, well, it’s sad but true, but it will be the end of the game. Fine. What I’m missing is a clear behavior for CL.ONE. I’m unsure about which nodes are used by ONE and how the filtering of missing data/errors is done. I’ve landed in ReadCallback.java but error handling is out of my reach for the moment. Thanks, - Pierre From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Thursday, September 15, 2011 12:27 AM To: user@cassandra.apache.org Subject: Re: Get CL ONE / NTS Are you advising that CL.ONE is not worth the game when considering read performance? Consistency is not performance; it's a whole new thing to tune in your application. If you have performance issues, deal with those as performance issues: better code / data model / hardware. By the way, I do not have a consistency problem at all - data is only written once. Nobody expects a consistency problem. Its chief weapon is surprise. Surprise and fear. Its two weapons are fear and surprise. And so forth http://www.youtube.com/watch?v=Ixgc_FGam3s If you write at LOCAL_QUORUM in DC 1 and DC 2 is down at the start of the request, a hint will be stored in DC 1. Some time later, when DC 2 comes back, that hint will be sent to DC 2. If in the meantime you read from DC 2 at CL ONE, you will not get that change. With Read Repair enabled it will repair in the background and you may get a different response on the next read (am guessing here, cannot remember exactly how RR works cross-DC). Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 15/09/2011, at 10:07 AM, Pierre Chalamet wrote: Thanks Aaron, didn't see your answer before mine. I do agree for 2/ - I might have read errors. Good suggestion to use EACH_QUORUM - it could be a good trade-off to read at this level if ONE fails. Maybe using LOCAL_QUORUM might be a good answer and will avoid headaches after all. Are you advising that CL.ONE is not worth the game when considering read performance? By the way, I do not have a consistency problem at all - data is only written once (and if more, it is always the same data) and read several times across DCs. I only have replication problems. That's why I'm more inclined to use CL.ONE for reads if possible.
Thanks, - Pierre -Original Message- From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, September 14, 2011 11:48 PM To: user@cassandra.apache.org; pie...@chalamet.net Subject: Re: Get CL ONE / NTS Your current approach to consistency opens the door to some inconsistent behavior. 1/ Will I have an error because DC2 does not have any copy of the data? If you read from DC2 at CL ONE and the data is not replicated, it will not be returned. 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2? Not at CL ONE. If you used CL EACH_QUORUM then the read would go to all the DCs. If DC2 is behind DC1 then you will get the data from DC1. 3/ In case of partial replication to DC2, will I sometimes see errors about servers not holding the data in DC2? Depending on the API call and the client, working at CL ONE, you will see either errors or missing data. 4/ Does a get at CL ONE fail as soon as the fastest server to answer says it does not have the data, or does it wait until all servers say they do not have the data? Yes. Consider using LOCAL_QUORUM for writes and reads; it will make things a bit more consistent but not add inter-DC overhead to the request latency. It is still possible to not get data in DC2 if it is totally disconnected from DC1. Or write at LOCAL_QUORUM and read at EACH_QUORUM, so you can always read; requests in DC2 will fail if DC1 is not reachable. Hope that helps. -
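Pierre's plan - read at ONE locally and escalate on a miss - is easy to express client-side. A minimal sketch, assuming the pycassa client; the keyspace, column family, and host names are made up for illustration:

    from pycassa import ConnectionPool, ColumnFamily
    from pycassa.cassandra.ttypes import ConsistencyLevel, NotFoundException

    # Hypothetical keyspace/CF/hosts - adjust for your cluster.
    pool = ConnectionPool('MyKeyspace', server_list=['dc2-node1:9160'])
    cf = ColumnFamily(pool, 'MyCF')

    def read_with_fallback(key):
        try:
            # Fast local read; may miss if replication to this DC has lagged.
            return cf.get(key, read_consistency_level=ConsistencyLevel.ONE)
        except NotFoundException:
            # Escalate to a stronger CL before concluding the data is gone.
            return cf.get(key, read_consistency_level=ConsistencyLevel.QUORUM)

QUORUM here counts replicas across the whole cluster; per Aaron's note above, EACH_QUORUM would instead require a quorum in every DC.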
Re: Configuring multi DC cluster
Yes my bad. http://wiki.apache.org/cassandra/Operations#Token_selection Thanks - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 16/09/2011, at 6:50 AM, Anand Somani wrote: You are right, good catch, thanks!
Re: LevelDB type compaction
On Thu, Sep 15, 2011 at 3:05 PM, mcasandra mohitanch...@gmail.com wrote: With LevelDB-style compaction, is it going to make reads slower? No. Qualified: compared to major compaction under the tiered strategy, leveled reads will usually be a little slower for update-heavy loads. (For insert-mostly workloads compaction doesn't really matter.) But major compaction is not practical in production; you want something that gives consistently good performance, rather than good performance once a day or once a week that then degrades quickly. My understanding is it will create more, smaller files? Yes. And updates could be scattered all over before compaction? No, updates to a given row will still be in a single sstable. Also, when memtables are flushed, does it create small files too, since memtable size would generally be bigger than 2-4MB? Level 0 (newly flushed) sstables are not size-limited. This is one of a handful of differences we have from leveldb itself, which remains a good overview (http://leveldb.googlecode.com/svn/trunk/doc/impl.html). -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
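To see why leveled reads stay bounded, some rough capacity math helps. This sketch assumes the leveldb-style layout described in the linked doc - fixed-size sstables, with each level holding roughly ten times the previous one; the 5 MB sstable size is an assumption for illustration, not a quoted Cassandra default:

    SSTABLE_MB = 5   # assumed per-sstable target size
    FANOUT = 10      # each level ~10x the previous, per the leveldb doc

    def level_capacity_mb(level):
        # L0 is just flushed memtables and is not size-limited.
        return SSTABLE_MB * FANOUT ** level

    def levels_needed(data_mb):
        level, total = 1, 0
        while total + level_capacity_mb(level) < data_mb:
            total += level_capacity_mb(level)
            level += 1
        return level

    # e.g. ~50 GB fits within 4 levels (50 + 500 + 5000 + 50000 MB), so a
    # read touches at worst one sstable per level plus any L0 files,
    # because sstables within a level do not overlap.
    print(levels_needed(50 * 1024))  # -> 4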
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
You should be able to update it, which will leave existing sstables untouched but new ones will be generated compressed. (You could issue scrub to rewrite the existing ones compressed too, if you wanted to force that.) On Thu, Sep 15, 2011 at 3:44 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: Is it possible to update an existing column family with {sstable_compression: SnappyCompressor, compaction_strategy: LeveledCompactionStrategy}? Or will I have to make a new column family and migrate my data to it? -Jeremiah On 09/15/2011 01:01 PM, Sylvain Lebresne wrote: The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.0. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
It should, although we've only tested 0.8-to-1.0 directly. That would be a useful report to contribute! On Thu, Sep 15, 2011 at 3:45 PM, Anand Somani meatfor...@gmail.com wrote: So I should be able to do a rolling upgrade from 0.7 to 1.0? (It's not there in the release notes, but I assume that is work in progress.) Thanks On Thu, Sep 15, 2011 at 1:36 PM, amulya rattan talk2amu...@gmail.com wrote: Isn't this LevelDB implementation based on Google's LevelDB? http://code.google.com/p/leveldb/ From what I know, it's quite fast. On Thu, Sep 15, 2011 at 4:04 PM, mcasandra mohitanch...@gmail.com wrote: This is great news! Is it possible to do a write-up of the main changes, like the LevelDB-style compaction, and explain them a little bit? I get lost reading JIRA and sometimes it is difficult to follow the thread. It looks like there are some major changes in this release. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cassandra -f problem
Thanks a lot. The problem was that every terminal I open on Debian 6 lacks JAVA_HOME and PATH; I have to export them every time I start the virtual machine. By the way, I have Debian and Cassandra running inside VMware Workstation. Thanks again. I'm following the readme file. On Mon, Sep 12, 2011 at 11:37 PM, Roshan Dawrani roshandawr...@gmail.com wrote: Hi, Do you have JAVA_HOME exported? If not, can you export it and retry? Cheers. On Tue, Sep 13, 2011 at 8:59 AM, Hernán Quevedo alexandros.c@gmail.com wrote: Hi, Roshan. This is great support, amazing support; I'm not used to it :) Thanks for the reply. Well, I think Java is installed correctly; I mean, the java -version command works in a terminal, so the PATH env variable is correctly set, right? I downloaded JDK 7 and put it in /opt/java/ and then set the path. But the Eclipse icon says it can't find any JRE or JDK, which is weird because of what I said above... but... but... what else could it be? Thanks! On Sun, Sep 11, 2011 at 10:05 PM, Roshan Dawrani roshandawr...@gmail.com wrote: Hi, Cassandra starts the JVM as $JAVA -ea -cp $CLASSPATH. Looks like $JAVA is coming up empty in your case, hence the error exec -ea not found. Do you not have Java installed? Please install it, set JAVA_HOME appropriately, and retry. Cheers. On Mon, Sep 12, 2011 at 8:23 AM, Hernán Quevedo alexandros.c@gmail.com wrote: Hi, all. I'm new at this and haven't been able to install Cassandra on Debian 6. After uncompressing the tar and creating the var/log and var/lib directories, the command bin/cassandra -f results in the message exec: 357 -ea not found, preventing Cassandra from running the process the README file says it is supposed to start. Any help would be very appreciated. Thanks! -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani -- Είναι η θέληση των Θεών. -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani -- Είναι η θέληση των Θεών.
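The root cause in this thread - JAVA_HOME and PATH set in one shell but not exported for new ones - is easy to detect before launching. A small sketch one might run ahead of bin/cassandra -f; the suggestion to persist exports in ~/.bashrc reflects this thread's Debian setup, and /opt/java is just the layout used above, not a required location:

    import os
    import shutil

    # Check the environment a freshly opened terminal actually gets.
    java_home = os.environ.get('JAVA_HOME')
    if not java_home:
        print('JAVA_HOME is not set; export it in ~/.bashrc (or similar) so '
              'every new terminal inherits it before running bin/cassandra -f')
    elif not os.path.isfile(os.path.join(java_home, 'bin', 'java')):
        print('JAVA_HOME is set but %s/bin/java does not exist' % java_home)

    # The startup script also needs `java` resolvable via PATH.
    if shutil.which('java') is None:
        print('java is not on PATH; add $JAVA_HOME/bin to PATH')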