Re: Taking a Cluster Wide Snapshot
On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
> What's the best way (or the only way) to take a cluster-wide backup of Cassandra? I can't find much documentation on it. I am using a multi-DC setup with Cassandra 0.8.6. Regards, Shubham

Here's how I'm doing it in AWS land using the DataStax AMI via a nightly cron job. You'll need pssh and s3cmd:

#!/bin/bash
cd /home/ec2-user/ops
echo making snapshots
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'
echo making tarballs
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'
echo copying tarballs
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .
echo tarring tarballs
tar -cvf cassandra-snapshots-all-nodes.tar 10*
echo pushing to S3
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups
echo DONE!
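For anyone adapting the script: the `hostname` prefix on the tarball name is what keeps the per-node archives from colliding once pslurp pulls them all to one box. Here is a tiny standalone sketch of just that naming step (the directory and sstable file below are throwaway demo data, not the real /raid0 layout):

```shell
#!/bin/sh
set -e
# throwaway stand-in for /raid0/cassandra/data/<keyspace>/snapshots
mkdir -p snapdemo/snapshots
echo demo > snapdemo/snapshots/users-hc-1-Data.db
# name the archive after this node, as the cron script does with `hostname`,
# so tarballs collected centrally by pslurp don't overwrite each other
tar -zcf "$(hostname)-cassandra-snapshot.tar.gz" -C snapdemo snapshots
ls ./*-cassandra-snapshot.tar.gz
```

Each node produces a differently named archive, so a central collection directory never sees two files with the same name.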
RE: Taking a Cluster Wide Snapshot
Thanks a lot, Deno. I'm a bit surprised that an equivalent command isn't there in nodetool; not sure if it is in the latest release. BTW, this makes it a prerequisite that all of Cassandra's data files, be they indexes, filters, etc., have unique names across the cluster. Is that a reasonable assumption to make? Regards, Shubham

From: Deno Vichas [d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:09 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot
[quoted backup script snipped]
Re: Taking a Cluster Wide Snapshot
There's no prerequisite for unique names. Each node's snapshot gets tarred up and then copied over to a directory named after the hostname of the node. Then those dirs are tarred and copied to S3. What I haven't tried yet is untarring everything from all nodes into a single-node cluster. I'm assuming I can get tar to replace or skip existing files so I end up with a set of unique files. Can somebody confirm this?

On 4/25/2012 11:45 PM, Shubham Srivastava wrote:
[quoted text snipped]
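On the "replace or skip existing files" question: with GNU tar, the --skip-old-files extract option does exactly that, leaving any file already on disk untouched (bsdtar's -k flag behaves similarly). A throwaway sketch with demo files, not the real snapshot data:

```shell
#!/bin/sh
set -e
# two archives containing a file with the same name but different contents
mkdir -p src1 src2 out
echo first > src1/file.db
echo second > src2/file.db
tar -cf a1.tar -C src1 file.db
tar -cf a2.tar -C src2 file.db
tar -xf a1.tar -C out
# --skip-old-files (GNU tar) leaves the existing copy untouched
tar -xf a2.tar --skip-old-files -C out
cat out/file.db
```

After both extracts, out/file.db still holds the contents from the first archive. Note that "skip" keeps whichever copy landed first, which is not the same as ending up with a meaningfully merged set of sstables.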
Re: Taking a Cluster Wide Snapshot
Your second part is what I was also referring to: putting all the files from the nodes onto a single node to create a similar backup, which requires unique file names across the cluster.

From: Deno Vichas [mailto:d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot
[quoted text snipped]
Question regarding major compaction.
In the tuning documentation for Cassandra, it is recommended not to run major compactions. I understand what a major compaction is all about, but I'd like an in-depth explanation of why reads will continually degrade until the next major compaction is manually invoked. From the doc: "So while read performance will be good immediately following a major compaction, it will continually degrade until the next major compaction is manually invoked. For this reason, major compaction is NOT recommended by DataStax." Regards /Fredrik
Re: Question regarding major compaction.
I'm also quite interested in this question. Here's my understanding of the problem.

1. If your workload is append-only, doing a major compaction shouldn't affect read performance too much, because each row appears in one sstable anyway.

2. If your workload is mostly updating existing rows, then more and more columns will be obsoleted in the big sstable created by the major compaction. And that super-big sstable won't be compacted until you either have another 3 similar-sized sstables or start another major compaction. But I am not very sure whether this is a major problem, because you only end up reading one more sstable. Using size-tiered compaction against a mostly-update workload may itself result in reading multiple sstables for a single row key.

Please correct me if I am wrong. Cheng

On Thu, Apr 26, 2012 at 3:50 PM, Fredrik <fredrik.l.stigb...@sitevision.se> wrote:
[quoted text snipped]
Re: Question regarding major compaction.
Exactly, but why would reads be significantly slower over time when including just one more, although sometimes large, SSTable in the read?

Ji Cheng skrev 2012-04-26 11:11:
[quoted text snipped]
Maintain sort order on updatable property and pagination
Hi All, I am using the property of columns that they are kept in sorted order to store sort orders (I believe everyone else is doing the same). But if I want to maintain sort order on a property whose value changes, I have to perform a read and a delete operation. Is there a better way to solve this in real time?

Also, for pagination we have to set a range of column names. If we know the last page's last column name, we can get the next page. But what if we want to go from page 2 to page 6? This seems impossible as of now. Any suggestions? Thank you.
Re: nodetool repair hanging
My cluster is very small (300 MB) and compact was taking more than 2 hours. I ended up bouncing all the nodes. After that, I was able to run repair on all nodes, and each one takes less than a minute. If this happens again I will be sure to run compactionstats and netstats. Thanks for that tip. Bill

On Wed, Apr 25, 2012 at 11:49 AM, Gregg Ulrich <gulr...@netflix.com> wrote:
How much data do you have and how long is a while? In my experience repairs can take a very long time. Check to see if validation compactions are running (nodetool compactionstats) or if files are streaming (nodetool netstats). If either of those is in progress, then your repair should be running. I've seen 12-node, 50 GB clusters take days to repair to a new data center. Not sure if 1.0 is different, but in 0.X I don't believe killing the nodetool process stops the repair. When we need to stop a repair, we have bounced all of the participating nodes. I've been told that there is no harm in stopping repairs.

On Apr 24, 2012, at 2:55 PM, Bill Au wrote:
I am running 1.0.8. I am adding a new data center to an existing cluster. Following steps outlined in another thread on the mailing list, things went fine except for the last step, which is to run repair on all the nodes in the new data center. Repair seems to hang indefinitely; there is no activity in system.log. I did notice that the node being repaired is requesting ranges from nodes in both the existing and the new data center. Since there was no data in the new data center initially, I thought that might be why repair was hanging, so I broke out of the repair with a Ctrl-C after waiting for a while. I do see data being added to the new nodes. When I ran repair a second time, it still hung. Why is repair hanging? Is it safe to use Ctrl-C to break out of it? How do I recover from this? Bill
Data model question, storing Queue Message
Hi everyone! I'm fairly new to Cassandra and not yet familiar with the column-oriented NoSQL model. I have worked on this for a while, but I can't seem to find the best model for what I'm looking for. I have an Erlang application that lets users connect and communicate with each other. When a user (A) sends a message to a disconnected user (B), it stores the message in the database and waits for user (B) to connect, retrieve the message queue, and delete it. Here are some key points:

- Users are identified by integer IDs
- Each message is unique by the combination of: sender ID, receiver ID, message ID, time

I have a message queue, and here are the operations I need to be as fast as possible:

- Store from 1 to X messages per registered user
- Get the number of stored messages per user (can be an incremental counter updated at each store; this is retrieved often)
- Retrieve all messages for a user at once
- Delete all messages for a user at once
- Delete all messages that are older than Y months (for all users)

I really don't think storage will be an issue: I have 2 TB per node and messages are limited to 1 KB. I'm really looking for speed rather than storage optimization. My setup is 2 dedicated servers, each with:

- 4 x Intel i7 2.66 GHz, 64-bit
- 24 GB RAM
- 2 TB disk

Thank you all.
RE: Taking a Cluster Wide Snapshot
I was trying to get hold of all the data, kind of a global snapshot. I did the following: I copied all the snapshots from the individual nodes, where the snapshot data size was around 12 GB on each node, into one common folder. Strangely, I found duplicate file names across the snapshots, and more strangely, the sizes of the duplicate files differed. This led to a total data size of close to 13 GB (the rest must have been overwritten), whereas the expectation was 12*6 = 72 GB. Does that mean that if I need to create a new ring with the same data as the existing one, I can't just do that? Or should I start with the 13 GB copy and check whether all the data is present, which sounds pretty illogical. Please suggest??

From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: 'user@cassandra.apache.org'
Subject: Re: Taking a Cluster Wide Snapshot
[quoted text snipped]
Re: Taking a Cluster Wide Snapshot
> I copied all the snapshots from the individual nodes, where the snapshot data size was around 12 GB on each node, into one common folder. Strangely, I found duplicate file names across the snapshots, and more strangely, the sizes of the duplicate files differed, which led to a total data size of close to 13 GB (the rest must have been overwritten), whereas the expectation was 12*6 = 72 GB.

You have detected via experimentation that the namespacing of sstable filenames per CF per node is not unique. In order to do the operation you are doing, you have to rename them to be globally unique. Just inflating the integer part is the easiest way.

https://issues.apache.org/jira/browse/CASSANDRA-1983

=Rob
--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
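A minimal sketch of one way to do that rename when merging per-node snapshot directories (the directory names and sstable filenames here are hypothetical demo data): prefixing each file with the node directory it came from makes the merged set globally unique without hand-editing the integer part:

```shell
#!/bin/sh
set -e
# two per-node snapshot dirs whose sstables collide by name (demo data)
mkdir -p node1 node2 merged
echo a > node1/users-hc-40-Data.db
echo b > node2/users-hc-40-Data.db
for node in node1 node2; do
  for f in "$node"/*; do
    # the node-name prefix keeps same-named sstables from overwriting each other
    cp "$f" "merged/${node}-$(basename "$f")"
  done
done
ls merged
```

With this, merging six 12 GB snapshots yields the full 72 GB instead of a silently overwritten 13 GB.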
Re: user Digest of: get.23021
I am having the same issue in 1.0.7 with leveled compaction. It seems that the repair is flaky: it either completes relatively fast in a TEST environment (7 minutes) or gets stuck trying to receive a merkle tree from a peer that is already sending it the merkle tree. The only solution is to restart Cassandra, but that's not good.

On Thu, Apr 26, 2012 at 2:12 PM, user-h...@cassandra.apache.org wrote:
user Digest of: get.23021
Topics (messages 23021 through 23021): repair waiting for something, by Igor

Hi, 10 nodes, cassandra 1.0.3, several DCs. The weekly nodetool repair has been stuck for an unusually long time on node 10.254.237.2. Output log on this node:

INFO 11:19:42,045 Starting repair command #1, repairing 5 ranges.
INFO 11:19:42,053 [repair #040aae00-28a1-11e1--e378018944ff] new session: will sync localhost/10.254.237.2, /10.254.221.2, /10.253.2.2, /10.254.217.2, /10.254.94.2 on range (85070591730234615865843651857942052864,85070591730234615865843651857942052865] for meter.[eventschema, schema, ids, transaction]
INFO 11:19:42,055 [repair #040aae00-28a1-11e1--e378018944ff] requests for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2, localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
INFO 11:19:42,063 Enqueuing flush of Memtable-eventschema@1509399856(18748/23435 serialized/live bytes, 4 ops)
INFO 11:19:42,063 Writing Memtable-eventschema@1509399856(18748/23435 serialized/live bytes, 4 ops)
INFO 11:19:42,072 Completed flushing /spool1/cassandra/data/meter/eventschema-hb-40-Data.db (4745 bytes)
INFO 11:19:42,073 Discarding obsolete commit log: CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1324019623060.log)
INFO 11:19:42,076 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from localhost/10.254.237.2
INFO 11:19:42,102 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from /10.254.221.2
INFO 11:19:42,128 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from /10.254.217.2
INFO 11:19:42,228 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from /10.253.2.2

And nothing after that for a long time. So the node sent requests for trees to the other nodes and received all of them except the one from 10.254.94.2. On that 10.254.94.2 node:

INFO 11:19:42,083 [repair #040aae00-28a1-11e1--e378018944ff] Sending completed merkle tree to /10.254.237.2 for (meter,eventschema)

So the merkle tree was lost somewhere. Will this wait break out somehow, or do I need to restart the node?
Re: repair waiting for something
I am having the same issue in 1.0.7 with leveled compaction. It seems that the repair is flaky: it either completes relatively fast in a TEST environment (7 minutes) or gets stuck trying to receive a merkle tree from a peer that is already sending it the merkle tree. The only solution is to restart Cassandra, but that's not good.
Is this possible.
Hello, I am new to Cassandra and was hoping someone could tell me if the following is possible. I have a column family with a list of users in each row. Each user has the properties: name, highscore, x, y, z. I want to use name as the column key, but I want the columns to be sorted by highscore (always). The only reads would be to get the top N users by highscore in a given row. I thought about adding the score to the name as the key (e.g., 299.76-johnsmith), but then I would not be able to update a given user. This was not possible in the past, but I am not familiar with the newer Cassandra versions.
Re: Is this possible.
Data model (CQL 3.0):

$ cqlsh --cql3

DROP COLUMNFAMILY user_score_v3;
CREATE COLUMNFAMILY user_score_v3 (
    name varchar,
    highscore float,
    x int,
    y varchar,
    z varchar,
    PRIMARY KEY (name, highscore)
);

DML is as usual, the same as common RDBMS SQL. Query for the top 3:

SELECT name, highscore, x, y, z
FROM user_score_v3
WHERE name='abc'
ORDER BY highscore DESC
LIMIT 3;

You may also try reversed comparators; see http://thelastpickle.com/2011/10/03/Reverse-Comparators/

Hope this is helpful.

Thanks, Charlie | DBA

On Thu, Apr 26, 2012 at 12:34 PM, Ed Jone <edjo...@gmail.com> wrote:
[quoted text snipped]

--
Thanks, Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com
Re: Is this possible.
DML example:

INSERT INTO user_score_v3 (name, highscore, x, y, z)
VALUES ('abc', 299.76, 1001, '*', '*');

2012/4/26 Data Craftsman <database.crafts...@gmail.com>:
[quoted text snipped]

--
Thanks, Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com
Node join streaming stuck at 100%
This is the second node I've joined to my cluster in the last few days, and so far both have become stuck at 100% on a large file according to netstats. This is on 1.0.9; is there anything I can do to make it move on besides restarting Cassandra? I don't see any errors or warnings in the logs on either server, and there is plenty of disk space. On the sender side I see this:

Streaming to: /10.20.1.152
/opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db sections=1 progress=82393861085/82393861085 - 100%

On the joining node I don't see this file in netstats, and all pending streams are sitting at 0%.
Map reduce without hdfs
Hello to all! Is it possible to launch only the Hadoop MapReduce TaskTracker and JobTracker against a Cassandra cluster, without launching HDFS (using something else for shared storage)? Thanks
Re: Map reduce without hdfs
That is one of the perks of Brisk and, later, DataStax Enterprise. As it stands, the DataNode component is only used as a distributed cache for jars. So if your job uses Cassandra for the input format and output format, you only need the other components for temporary storage.

On Thu, Apr 26, 2012 at 7:12 PM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
[quoted text snipped]
RE: Taking a Cluster Wide Snapshot
Thanks a lot, Rob. On another thought, I could also try copying the data of my keyspace alone from each node to the corresponding node in the new cluster (I have both the old and new clusters with the same node counts, DC1:6 and DC2:6, and the same tokens). Would there be any risk of the new cluster joining the old cluster, for example if the data inside the keyspace is aware of the original IPs? Is this recommended?

Regards, Shubham

From: Rob Coli [rc...@palominodb.com]
Sent: Thursday, April 26, 2012 11:42 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot
[quoted text snipped]
Re: Taking a Cluster Wide Snapshot
On Thu, Apr 26, 2012 at 10:38 PM, Shubham Srivastava shubham.srivast...@makemytrip.com wrote: On another thought I could also try copying the data of my keyspace alone from one node to another node in the new cluster (I have both the old and new clusters having same nodes DC1:6,DC2:6 with same tokens) with the same tokens. Would there be any risk of the new cluster getting joined to the old cluster probably if the data inside keyspace is aware of the original IP's etc. As a result of this very concern while @ Digg... https://issues.apache.org/jira/browse/CASSANDRA-769 tl;dr : as long as your cluster names are unique in your cluster config (**and you do not copy the System keyspace, letting the new cluster initialize with the new cluster name**), nodes are at no risk of joining the wrong cluster. =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
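A minimal sketch of the "do not copy the System keyspace" half of that advice (paths and keyspace names below are made up for the demo): copy every keyspace directory except system, so the new cluster initializes fresh under its own cluster name:

```shell
#!/bin/sh
set -e
# fake data dirs: one application keyspace plus the system keyspace
mkdir -p olddata/system olddata/stocktouch newdata
echo x > olddata/system/LocationInfo-hc-1-Data.db
echo y > olddata/stocktouch/users-hc-1-Data.db
for ks in olddata/*/; do
  name=$(basename "$ks")
  # the system keyspace must never follow the data to the new cluster
  [ "$name" = "system" ] && continue
  cp -r "$ks" "newdata/$name"
done
ls newdata
```

Only the application keyspace lands in the new data directory; the system keyspace, which carries the old cluster's identity, stays behind.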