Re: Revisit Cassandra EOL Policy
Anuj, do you have a link to the versioning policy? The tick-tock versioning blog post [1] says that EOL happens after two major versions come out, but I can't find this stated more formally anywhere. I'm interested in how long a given version will receive patches for security issues or critical data loss bugs (i.e., the policy of the Apache project itself, distinct from any support that may be available through DataStax). The Postgres project has a great write-up of their policy [2].

And for what it's worth, we are starting to use Cassandra and do have automation around it. I don't have strong feelings about what the versioning policy should look like, but having clear expectations about what happens if there's a critical bug (i.e., can we expect a patch or do we need to upgrade major versions?) is very useful.

[1]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
[2]: http://www.postgresql.org/support/versioning/
Re: [RELEASE] Apache Cassandra 3.1 released
Thanks, Josh and Paulo--that's much clearer.
Re: [RELEASE] Apache Cassandra 3.1 released
I'm still confused, even after reading the blog post twice (and reading the linked Intel post). I understand what you are doing conceptually, but I'm having a hard time mapping that to actual planned release numbers.

> The 3.0.2 will only contain bugfixes, while 3.2 will introduce new features.

Will 3.2 contain the bugfixes that are in 3.0.2 as well? Is 3.x.y just 3.0.x plus new stuff, where most of the time y is 0 unless there's a really serious issue that needs fixing?
Re: UnknownColumnFamily exception / schema inconsistencies
Just wanted to follow up and say thanks: I went through this process (as per Robert's suggestion, with the node stopped and no refresh) on an affected cluster and was able to resolve the issue.
Re: UnknownColumnFamily exception / schema inconsistencies
Any advice on how to proceed here? Sebastian seems to have guessed correctly at the underlying issue, but I'm still not sure how to resolve this given what I see in the data directory and the catalogs.

On Wed, Nov 11, 2015 at 12:15 PM, Maciek Sakrejda <mac...@heroku.com> wrote:

> On Wed, Nov 11, 2015 at 9:55 AM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:
>
>>> Stupid question, but how do I find the problem table? The error message complains about a keyspace (by uuid); I haven't seen errors relating to a specific table. I've poked around in the data directory, but I'm not sure what I'm looking for.
>>
>> Is the message complaining about a *keyspace* or about *a table (cfid)*? Your original was complaining about a table:
>>
>>> at=IncomingTcpConnection.run UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find *cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5*
>
> Sorry, you're absolutely right--it's the table from this error message. I confused myself. But now I was able to find it:
>
> cursors-3ecce75084d311e5bdd9dd7717dcdbd5
> cursors-3ed23e8084d311e583b30fc0205655f5
>
> The second uuid is the one that shows up via the schema_columnfamilies query, but on two of the nodes, the directory with the *other* uuid exists. Can I just rename the directory on these two nodes? Or how should I proceed?
Re: UnknownColumnFamily exception / schema inconsistencies
On Fri, Nov 13, 2015 at 9:56 AM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

> I think you're just missing the steps in *Bold*:

Thanks, but I wasn't clear on what to do if the "new" directory does not exist at all on some of the nodes (only the old). Can I just rename the "old" to the "new" or is there more to it?
Re: UnknownColumnFamily exception / schema inconsistencies
On Tue, Nov 10, 2015 at 3:20 PM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

> #1 The cause of this problem is a CREATE TABLE statement collision. Do not generate tables dynamically from multiple clients, even with IF NOT EXISTS. First thing you need to do is fix your code so that this does not happen. Just create your tables manually from cqlsh allowing time for the schema to settle.
>
> #2 Here's the fix:
>
> 1) Change your code to not automatically re-create tables (even with IF NOT EXISTS).
>
> 2) Run a rolling restart to ensure schema matches across nodes. Run nodetool describecluster around your cluster. Check that there is only one schema version.

Thanks, that seems to have resolved the schema version inconsistency (though I'm still getting the original error).

> ON EACH NODE:
>
> 3) Check your filesystem and see if you have two directories for the table in question in the data directory.

Stupid question, but how do I find the problem table? The error message complains about a keyspace (by uuid); I haven't seen errors relating to a specific table. I've poked around in the data directory, but I'm not sure what I'm looking for.
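The "only one schema version" check in step 2 can be scripted. What follows is a rough sketch, not an official tool: it assumes the `nodetool describecluster` output lists each schema version as a `<uuid>: [host, host, ...]` line, and the sample text (including its UUIDs and host addresses) is made up for illustration.

```python
import re

# Assumed line format under "Schema versions:" -> "<uuid>: [host, host, ...]"
SCHEMA_LINE = re.compile(
    r"^\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}):\s*\[(.*)\]\s*$"
)


def schema_versions(describecluster_output):
    """Map each schema version UUID to the list of hosts reporting it."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def schema_settled(describecluster_output):
    """True when exactly one schema version is reported."""
    return len(schema_versions(describecluster_output)) == 1


# Hypothetical output from a 3-node cluster that disagrees on the schema.
SAMPLE = """\
Cluster Information:
\tName: Test Cluster
\tSchema versions:
\t\t86afa796-d883-3932-aa73-6b017cef0d19: [192.168.168.201, 192.168.168.202]
\t\t59adb24e-f3cd-3e02-97f0-5b395827453f: [192.168.168.203]
"""

if __name__ == "__main__":
    for version, hosts in sorted(schema_versions(SAMPLE).items()):
        print(version, hosts)
    print("settled:", schema_settled(SAMPLE))
```

In practice the text would come from running `nodetool describecluster` on one node (for example via `subprocess.check_output`); only lines that begin with a UUID are counted, so any other entries in the output are ignored.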
Re: UnknownColumnFamily exception / schema inconsistencies
On Wed, Nov 11, 2015 at 9:55 AM, Sebastian Estevez <sebastian.este...@datastax.com> wrote:

>> Stupid question, but how do I find the problem table? The error message complains about a keyspace (by uuid); I haven't seen errors relating to a specific table. I've poked around in the data directory, but I'm not sure what I'm looking for.
>
> Is the message complaining about a *keyspace* or about *a table (cfid)*? Your original was complaining about a table:
>
>> at=IncomingTcpConnection.run UnknownColumnFamilyException reading from socket; closing org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find *cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5*

Sorry, you're absolutely right--it's the table from this error message. I confused myself. But now I was able to find it:

cursors-3ecce75084d311e5bdd9dd7717dcdbd5
cursors-3ed23e8084d311e583b30fc0205655f5

The second uuid is the one that shows up via the schema_columnfamilies query, but on two of the nodes, the directory with the *other* uuid exists. Can I just rename the directory on these two nodes? Or how should I proceed?
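The directory names above are the table name followed by the cfId with its hyphens removed, so the per-node check can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the keyspace directory path is whatever your installation uses, and the code only lists directories (it does not rename or delete anything).

```python
import os


def table_dir_name(table, cf_id):
    """Directory name for a table: <table>-<cfId without hyphens>."""
    return "{}-{}".format(table, cf_id.replace("-", ""))


def stale_table_dirs(keyspace_dir, table, schema_cf_id):
    """List directories for `table` whose ID suffix differs from the schema's cfId."""
    expected = table_dir_name(table, schema_cf_id)
    prefix = table + "-"
    return sorted(
        name
        for name in os.listdir(keyspace_dir)
        if name.startswith(prefix)
        and name != expected
        and os.path.isdir(os.path.join(keyspace_dir, name))
    )


if __name__ == "__main__":
    # The cfId reported by the schema_columnfamilies query in this thread.
    print(table_dir_name("cursors", "3ed23e80-84d3-11e5-83b3-0fc0205655f5"))
```

Running `stale_table_dirs(<keyspace data directory>, "cursors", "3ed23e80-84d3-11e5-83b3-0fc0205655f5")` on each node would report `cursors-3ecce75084d311e5bdd9dd7717dcdbd5` wherever the directory with the other uuid exists.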
UnknownColumnFamily exception / schema inconsistencies
Hello,

I've been having some strange issues with one of our test clusters (4-day-old, 3-node, 2.1.10 cluster on AWS). I saw a number of messages like the following:

[] 10 Nov 20:21:00.406 * pri=WARN t=MessagingService-Incoming-/192.168.168.202 at=IncomingTcpConnection.run UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5

A colleague suggested I run repair, but that failed with:

[2015-11-10 20:06:54,329] Nothing to repair for keyspace 'eventPipesState'
[2015-11-10 20:06:54,348] Starting repair command #1, repairing 768 ranges for keyspace dbs8okvd7jcurj (parallelism=SEQUENTIAL, full=true)
[2015-11-10 20:06:55,599] Repair command #1 finished
[2015-11-10 20:06:55,610] Starting repair command #2, repairing 487 ranges for keyspace context (parallelism=SEQUENTIAL, full=true)
[2015-11-10 20:11:21,213] Lost notification. You should check server log for repair status of keyspace context
[2015-11-10 20:11:21,288] Lost notification. You should check server log for repair status of keyspace context
Exception occurred during clean-up. java.lang.reflect.UndeclaredThrowableException
error: JMX connection closed. You should check server log for repair status of keyspace context(Subsequent keyspaces are not going to be repaired).
-- StackTrace --
java.io.IOException: JMX connection closed. You should check server log for repair status of keyspace context(Subsequent keyspaces are not going to be repaired).

I searched for other cases of similar issues, and found some posts (e.g., http://stackoverflow.com/questions/22783577/org-apache-cassandra-db-unknowncolumnfamilyexception-couldnt-find-cfid ), but nothing that seemed directly relevant. Still, I tried `nodetool describecluster` and all the nodes showed up as being on the same schema version. The server log did not include any more info.
I asked about this on IRC and got the suggestion to run `nodetool resetlocalschema`. I tried running that, and it completed (and `nodetool describecluster` now shows this node as having a different schema version from the other two nodes), but I still get the original error in the server logs, and now also:

[] 10 Nov 22:51:10.466 * pri=ERROR t=Thrift:12 at=CustomTThreadPoolServer.run Error occurred during processing of message.
java.lang.IllegalArgumentException: Unknown keyspace/cf pair (system_auth.credentials)

Further `nodetool repair`s on the same node do complete, but only seem to process the `system` keyspace (and don't do anything with it):

[2015-11-10 22:38:07,415] Nothing to repair for keyspace 'system'

I also tried running `nodetool repair` from another node in the cluster, but that just seems to hang:

[2015-11-10 22:53:11,830] Starting repair command #7, repairing 768 ranges for keyspace dbs8okvd7jcurj (parallelism=SEQUENTIAL, full=true)
[2015-11-10 22:53:12,943] Repair command #7 finished
[2015-11-10 22:53:12,958] Starting repair command #8, repairing 534 ranges for keyspace context (parallelism=SEQUENTIAL, full=true)

How can I restore this cluster? And ideally, how can I figure out what went wrong here in the first place?
Re: UnknownColumnFamily exception / schema inconsistencies
Oh, and for what it's worth, I've also looked through the logs for this node, and the oldest error in the logs seems to be:

[] 06 Nov 22:10:53.260 * pri=ERROR t=Thrift:16 at=CustomTThreadPoolServer.run Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 3ed23e80-84d3-11e5-83b3-0fc0205655f5; expected 3ecce750-84d3-11e5-bdd9-dd7717dcdbd5)

Then the logs show a compaction, and then the UnknownColumnFamilyException starts occurring.
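Since both error messages in this thread carry table IDs, a small script can pull them out of a log to see which cfIds are involved. This is a sketch that matches only the two message texts quoted in this thread; the function name is hypothetical, and other Cassandra versions may word these messages differently.

```python
import re

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

# "Couldn't find cfId=<uuid>" from UnknownColumnFamilyException
UNKNOWN_CF = re.compile(r"Couldn't find cfId=(" + UUID + r")")

# "Column family ID mismatch (found <uuid>; expected <uuid>)"
ID_MISMATCH = re.compile(
    r"Column family ID mismatch \(found (" + UUID + r"); expected (" + UUID + r")\)"
)


def summarize_cf_errors(log_lines):
    """Return (unknown cfIds, (found, expected) mismatch pairs) seen in the log."""
    unknown = set()
    mismatches = set()
    for line in log_lines:
        unknown.update(UNKNOWN_CF.findall(line))
        mismatches.update(ID_MISMATCH.findall(line))
    return unknown, mismatches


if __name__ == "__main__":
    lines = [
        "org.apache.cassandra.db.UnknownColumnFamilyException: "
        "Couldn't find cfId=3ecce750-84d3-11e5-bdd9-dd7717dcdbd5",
        "org.apache.cassandra.exceptions.ConfigurationException: Column family ID "
        "mismatch (found 3ed23e80-84d3-11e5-83b3-0fc0205655f5; "
        "expected 3ecce750-84d3-11e5-bdd9-dd7717dcdbd5)",
    ]
    unknown, mismatches = summarize_cf_errors(lines)
    print(sorted(unknown))
    print(sorted(mismatches))
```

For the two log excerpts in this thread, the unknown cfId is the same as the "expected" ID in the mismatch, which is consistent with two competing IDs for one table.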
Re: Incremental repair from the get go
Following up on this older question: as per the docs, one *should* still do full repair periodically (the docs say weekly), right? And run incremental more often to fill in?
Re: What is your backup strategy for Cassandra?
On Thu, Sep 17, 2015 at 7:46 PM, Marc Tamsky wrote:

> This seems like an apt time to quote [1]:
>
> > Remember that you get 1 point for making a backup and 10,000 points for restoring one.
>
> Restoring from backups is my goal.
>
> The commonly recommended tools (tablesnap, cassandra_snapshotter) all seem to leave the restore operation as a pretty complicated exercise for the operator.
>
> Do any include a working way to restore, on a different host, all of node X's data from backups to the correct directories, such that the restored files are in the proper places and the node restart method [2] "just works"?

As someone getting started with Cassandra, I'm very much interested in this as well. It seems that, for the most part, folks rely on replication and node replacement to recover from failures, and perhaps this is a testament to how well this works, but as long as we're hauling out aphorisms, "RAID is not a backup" seems to (partially) apply here too.

I'd love to hear more about how the community does restores, too. This isn't complaining about shoddy tooling: this is trying to understand--and hopefully, in time, improve--the status quo re: disaster recovery. E.g., given that tableslurp operates on a single table at a time, do people normally just restore single tables? Is that used when there's filesystem or disk corruption? Bugs? Other issues?

Looking forward to learning more.

Thanks,
Maciek
Re: Replacing dead node and cassandra.replace_address
On Tue, Sep 8, 2015 at 11:14 AM, sai krishnam raju potturi <pskraj...@gmail.com> wrote:

> Once the new node is bootstrapped, you could remove replacement_address from the env.sh file

Thanks, but how do I know when bootstrapping is completed?
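One possible way to check, offered as a sketch rather than an authoritative answer: it assumes that `nodetool netstats` on the replacement node prints a `Mode:` line that reads `JOINING` while the node is still streaming and `NORMAL` once it has finished joining. The function names and sample strings are hypothetical.

```python
import re


def node_mode(netstats_output):
    """Return the value of the 'Mode:' line from `nodetool netstats`, or None."""
    match = re.search(r"^Mode:\s*(\S+)", netstats_output, re.MULTILINE)
    return match.group(1) if match else None


def bootstrap_finished(netstats_output):
    """True once the node reports NORMAL mode (assumed to mean it has joined)."""
    return node_mode(netstats_output) == "NORMAL"


if __name__ == "__main__":
    # Made-up excerpts standing in for real `nodetool netstats` output.
    print(bootstrap_finished("Mode: JOINING\nBootstrap ...\n"))
    print(bootstrap_finished("Mode: NORMAL\nNot sending any streams.\n"))
```

The output text would come from running `nodetool netstats` on the new node; polling it until `bootstrap_finished` returns true is one way to decide when the replace option can be dropped from the startup configuration.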
Replacing dead node and cassandra.replace_address
According to the docs [1], when replacing a Cassandra node, I should start the replacement with cassandra.replace_address specified. Does that just become part of the replacement node's startup configuration? Can I (or do I have to) stop specifying it at some point? Does this affect subsequent node restarts (whether intentional or due to a crash)?

I'm running Cassandra 2.1.

Thanks,
Maciek

[1]: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
tablesnap / tableslurp usage pointers?
Hi,

I'm trying to use tablesnap [1] for disaster recovery backups, and while my uploads seem to be working fine, I can't figure out how to run the associated tableslurp tool for restores. If I pass the full S3 path to the individual table to tableslurp, it will restore that table, but if I try to pass the path to, e.g., the full keyspace, I get:

LookupError: Cannot find anything to restore from my-bucket:my-prefix:/my-path

Based on the source [2], it seems to be only looking for `-listdir.json` files in the same directory, but my directories in S3 only have `-listdir.json` files for other *files*, not directories. I am running tablesnap with the `--recursive` option.

Any ideas?

Thanks,
Maciek

[1]: https://github.com/JeremyGrosser/tablesnap
[2]: https://github.com/JeremyGrosser/tablesnap/blob/master/tableslurp#L122-L124
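Given that restoring works when pointed at an individual table, one workaround is to work out which directories under a prefix actually contain `-listdir.json` files and restore each of them separately. The sketch below operates on a plain list of key names (however you obtain them from S3); the key layout in the example and the function name are hypothetical, and this is not part of tablesnap itself.

```python
import posixpath

LISTDIR_SUFFIX = "-listdir.json"


def restorable_dirs(keys, root):
    """Directories under `root` that directly contain at least one -listdir.json key."""
    root = root.rstrip("/") + "/"
    return sorted({
        posixpath.dirname(key)
        for key in keys
        if key.startswith(root) and key.endswith(LISTDIR_SUFFIX)
    })


if __name__ == "__main__":
    # Made-up key names for a keyspace with two tables.
    keys = [
        "my-prefix:/my-path/ks/t1/ks-t1-ka-1-Data.db",
        "my-prefix:/my-path/ks/t1/ks-t1-ka-1-Data.db-listdir.json",
        "my-prefix:/my-path/ks/t2/ks-t2-ka-3-Data.db",
        "my-prefix:/my-path/ks/t2/ks-t2-ka-3-Data.db-listdir.json",
    ]
    for directory in restorable_dirs(keys, "my-prefix:/my-path/ks"):
        print(directory)
```

Each directory returned could then be passed to tableslurp on its own, which matches the per-table invocation that already works.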