Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Deno Vichas

On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
What's the best way (or the only way) to take a cluster-wide backup of
Cassandra? I can't find much documentation on it.


I am using a multi-DC setup with Cassandra 0.8.6.


Regards,
Shubham
 here's how I'm doing it in AWS land using the DataStax AMI, via a nightly
cron job. You'll need pssh and s3cmd:



#!/bin/bash
cd /home/ec2-user/ops

echo making snapshots
# clear yesterday's snapshot on every node, then take a fresh one, in parallel
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo making tar balls
# remove last night's tarball, then tar up each node's snapshot directory
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo copying tar balls
# pslurp pulls each node's tarball into a local directory named after that node
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .

echo "tar'ing tar balls"
# the per-node directories are named by IP address, hence the 10* glob
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo pushing to S3
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups

echo DONE!
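
For anyone adapting this: pssh reads a plain hosts file, one node per line, and
the whole thing can be driven from cron. A minimal sketch; the addresses, file
names, and schedule below are made up:

# prod-cassandra-nodes.txt, one host per line, as pssh/pslurp expect
# (hypothetical addresses):
#   10.0.1.11
#   10.0.1.12
#   10.0.1.13

# hypothetical crontab entry: run the backup nightly at 2am and keep a log
0 2 * * * /home/ec2-user/ops/cassandra-backup.sh >> /home/ec2-user/ops/backup.log 2>&1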



RE: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
Thanks a lot, Deno. A bit surprised that there isn't an equivalent command in
nodetool. Not sure if one is in the latest release.

BTW, this makes it a prerequisite that all of Cassandra's data files, be they
index or filter files etc., have unique names across the cluster. Is this a
reasonable assumption to have?

Regards,
Shubham

From: Deno Vichas [d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:09 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

[quoted message and backup script snipped; see above]



Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Deno Vichas
There's no prerequisite for unique names. Each node's snapshot gets tar'ed up
and then copied over to a directory named after the hostname of the node. Then
those dirs are tar'ed and copied to S3.

What I haven't tried yet is to untar everything from all nodes into a
single-node cluster. I'm assuming I can get tar to replace or skip existing
files so I end up with a set of unique files. Can somebody confirm this?
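
One way to try it, assuming GNU tar: the -k (--keep-old-files) flag refuses to
overwrite files that already exist, so extracting every node's tarball into one
tree keeps the first copy of each name. An untested sketch; note the caveat
later in this thread that identically named sstables can hold different data:

# merge all per-node snapshot tarballs into one directory tree
mkdir -p merged && cd merged
for f in ../*-cassandra-snapshot.tar.gz; do
    # -k keeps existing files; tar warns (and may exit non-zero) on the skips
    tar -zxkf "$f"
done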





On 4/25/2012 11:45 PM, Shubham Srivastava wrote:
[quoted messages and backup script snipped]





Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
Your second part is what I was also referring to: putting all the files from
the nodes onto a single node to create a similar backup, which requires unique
file names across the cluster.


From: Deno Vichas [mailto:d...@syncopated.net]
Sent: Thursday, April 26, 2012 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot


What I haven't tried yet is to untar everything from all nodes into a
single-node cluster. I'm assuming I can get tar to replace or skip existing
files so I end up with a set of unique files. Can somebody confirm this?




On 4/25/2012 11:45 PM, Shubham Srivastava wrote:
[quoted messages and backup script snipped]




Question regarding major compaction.

2012-04-26 Thread Fredrik
In the tuning documentation regarding Cassandra, it's recommended not to
run major compactions.
I understand what a major compaction is all about, but I'd like an in-depth
explanation as to why reads will continually degrade until the
next major compaction is manually invoked.


From the doc:
So while read performance will be good immediately following a major 
compaction, it will continually degrade until the next major compaction 
is manually invoked. For this reason, major compaction is NOT 
recommended by DataStax.


Regards
/Fredrik


Re: Question regarding major compaction.

2012-04-26 Thread Ji Cheng
I'm also quite interested in this question. Here's my understanding of this
problem.

1. If your workload is append-only, doing a major compaction shouldn't
affect the read performance too much, because each row appears in one
sstable anyway.

2. If your workload is mostly updating existing rows, then more and more
columns will be obsoleted in that big sstable created by major compaction.
And that super big sstable won't be compacted until you either have another
3 similar-sized sstables or start another major compaction. But I am not
very sure whether this will be a major problem, because you only end up
with reading one more sstable. Using size-tiered compaction against
mostly-update workload itself may result in reading multiple sstables for a
single row key.

Please correct me if I am wrong.

Cheng
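
One way to check whether this is actually hurting on a running node is to
watch how many sstables each read touches. A sketch, assuming 1.0-era nodetool;
the keyspace and column family names are placeholders:

# the SSTables column shows, for recent reads, how many sstables were
# consulted; numbers creeping up mean reads are fanning out across more files
nodetool -h localhost -p 7199 cfhistograms MyKeyspace MyColumnFamily

# live sstable counts per column family
nodetool -h localhost -p 7199 cfstats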


On Thu, Apr 26, 2012 at 3:50 PM, Fredrik
fredrik.l.stigb...@sitevision.se wrote:

 [quoted message snipped]



Re: Question regarding major compaction.

2012-04-26 Thread Fredrik
Exactly, but why would reads be significantly slower over time when
including just one more (although sometimes large) SSTable in the read?


Ji Cheng wrote 2012-04-26 11:11:
[quoted messages snipped]






Maintain sort order on updatable property and pagination

2012-04-26 Thread Rajat Mathur
Hi All,

I am using the property that columns are kept in sorted order to store sort
orders (I believe everyone else does the same). But if I want to maintain a
sort order on a property whose value changes, I would have to perform a read
and a delete operation. Is there a better way to solve this in real time?

Also, for pagination we have to set a range of column names. If we know the
last page's last column name, we can get the next page. But what if we want to
jump from page 2 to page 6? That seems impossible as of now. Any suggestions?

Thank you.

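For the first question, the usual pattern does seem to be read-then-move: read
the old sort value, then delete the old column and write the new one. A rough
cassandra-cli sketch; the keyspace, column family, and values are hypothetical,
and zero-padding the value keeps lexical column order equal to numeric order.
In application code you would issue both mutations in a single batch_mutate so
the row never holds two entries for the same item:

# move item42 from sort value 10 to 17 in the index row 'board1'
cassandra-cli -h localhost <<'EOF'
use MyKeyspace;
del scores_by_value['board1']['0000000010:item42'];
set scores_by_value['board1']['0000000017:item42'] = '1';
EOF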


Re: nodetool repair hanging

2012-04-26 Thread Bill Au
My cluster is very small (300 MB), and a compaction was taking more than 2 hours.

I ended up bouncing all the nodes.  After that,  I was able to run repair
on all nodes, and each one takes less than a minute.

If this happens again I will be sure to run compactionstats and netstats.
Thanks for that tip.
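
For reference, a sketch of those two checks (1.0-era nodetool, default JMX port):

# validation compactions appear here while repair builds merkle trees
nodetool -h localhost -p 7199 compactionstats

# active streams between nodes appear here
nodetool -h localhost -p 7199 netstats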

Bill

On Wed, Apr 25, 2012 at 11:49 AM, Gregg Ulrich gulr...@netflix.com wrote:

 How much data do you have, and how long is a while?  In my experience
 repairs can take a very long time.  Check to see if validation compactions
 are running (nodetool compactionstats) or if files are streaming (nodetool
 netstats).  If either of those is in progress then your repair should be
 running.  I've seen 12-node, 50 GB clusters take days to repair to a new data
 center.

 Not sure if 1.0 is different, but in 0.X I don't believe killing the
 nodetool process stops the repair.  When we need to stop a repair we have
 bounced all of the participating nodes.  I've been told that there is no
 harm in stopping repairs.

 On Apr 24, 2012, at 2:55 PM, Bill Au wrote:

  I am running 1.0.8.  I am adding a new data center to an existing
 cluster.  Following steps outlined in another thread on the mailing list,
 things went fine except for the last step, which is to run repair on all
 the nodes in the new data center.  Repair seems to be hanging indefinitely.
  There is no activity in system.log.  I did notice that the node being
 repaired is requesting ranges from nodes in both the existing and new data
 centers.  Since there is no data in the new data center initially, I thought
 that may be why repair is hanging.  So I broke out of the repair with a
 control-C after waiting for a while.  I do see data being added to the new
 nodes.  When I ran repair for the second time it was still hanging.

  Why is repair hanging?  Is it safe to use control-C to break out of it?
  How do I recover from this?

  Bill




Data model question, storing Queue Message

2012-04-26 Thread Morgan Segalis
Hi everyone !

I'm fairly new to Cassandra and not quite familiar yet with the
column-oriented NoSQL model. I have worked on it a while, but I can't seem to
find the best model for what I'm looking for.

I have an Erlang application that lets users connect and communicate with each
other. When a user (A) sends a message to a disconnected user (B), it stores
the message in the database, waits for user (B) to connect and retrieve the
message queue, and then deletes it.

Here are some key points:
- Users are identified by integer IDs.
- Each message is unique by the combination of: sender ID - receiver ID -
message ID - time.

I have a message queue, and here are the operations I would need to do as fast
as possible:

- Store from 1 to X messages per registered user.
- Get the number of stored messages per user (can be an incremental variable
updated at each store // this is retrieved often).
- Retrieve all messages for a user at once.
- Delete all messages for a user at once.
- Delete all messages that are older than Y months (from all users).

I really don't think storage will be an issue: I have 2 TB per node, and
messages are limited to 1 KB. I'm really looking for speed rather than storage
optimization.

My configuration is 2 dedicated servers, each with:
- 4 x Intel i7 2.66 GHz
- 64-bit
- 24 GB RAM
- 2 TB disk

Thank you all.
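
Not an authoritative answer, but a common 0.8/1.0-era shape for this: one row
per receiver with time-prefixed column names (so a row slice comes back in
time order), column TTLs to expire messages older than Y months, and a counter
column family for the per-user count. A cassandra-cli sketch; all names and
values are hypothetical:

cassandra-cli -h localhost <<'EOF'
create keyspace Messaging;
use Messaging;
create column family MessageQueue with comparator = UTF8Type;
create column family MessageCounts with default_validation_class = CounterColumnType and comparator = UTF8Type;
EOF

# store a message for user 42 with a ~3-month TTL (in seconds), bump the count:
#   set MessageQueue['42']['20120426T121500:17:msg001'] = '<payload>' with ttl = 7776000;
#   incr MessageCounts['42']['count'];
# retrieve all queued messages for user 42:   get MessageQueue['42'];
# delete them all at once:                    del MessageQueue['42'];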

RE: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
I was trying to get hold of all the data, a kind of global snapshot.

I did the below:

I copied all the snapshots from each individual node, where the snapshot data
size was around 12 GB per node, into a common folder (one folder alone).

Strangely, I found duplicate file names across the snapshots, and more
strangely the sizes of the duplicate files differed, which brought the total
data size to close to 13 GB (the rest must have been overwritten), whereas the
expectation was 12*6 = 72 GB.

Does that mean that if I need to create a new ring with the same data as the
existing one, I can't just do that? Or should I start with the 13 GB copy and
check whether all the data is present, which sounds pretty illogical.

Please suggest.


From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: 'user@cassandra.apache.org'
Subject: Re: Taking a Cluster Wide Snapshot

[quoted messages and backup script snipped]




Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Rob Coli
 [quoted message snipped]

You have detected via experimentation that the namespacing of sstable
filenames per CF per node is not unique. In order to do the operation
you are doing, you have to rename them to be globally unique. Just
inflating the integer part is the easiest way.

https://issues.apache.org/jira/browse/CASSANDRA-1983
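
A hedged sketch of that rename, assuming 1.0-style component names such as
stocktouch-hc-123-Data.db (the version tag, hc here, varies by release; the
offset scheme is made up, so test on a copy first). Run it once in each node's
snapshot copy, with a different offset per node:

# add a per-node offset to the generation number of every sstable component
OFFSET=10000    # node 1 = 10000, node 2 = 20000, ...
for f in *-hc-*; do
    cf=${f%%-hc-*}          # e.g. stocktouch
    rest=${f#*-hc-}         # e.g. 123-Data.db
    gen=${rest%%-*}         # e.g. 123
    comp=${rest#*-}         # e.g. Data.db, Index.db, Filter.db, ...
    mv "$f" "${cf}-hc-$((gen + OFFSET))-${comp}"
done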

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: user Digest of: get.23021

2012-04-26 Thread Frank Ng
I am having the same issue in 1.0.7 with leveled compaction.  It seems that
the repair is flaky.  It either completes relatively fast in a TEST
environment (7 minutes) or gets stuck trying to receive a merkle tree from
a peer that is already sending it the merkle tree.

The only solution is to restart Cassandra.  But that's not good.

On Thu, Apr 26, 2012 at 2:12 PM, user-h...@cassandra.apache.org wrote:


 user Digest of: get.23021

 Topics (messages 23021 through 23021)

 repair waiting for something
23021 by: Igor





  Hi,

 10 nodes, Cassandra 1.0.3, several DCs. The weekly nodetool repair has been
 stuck for an unusually long time on node 10.254.237.2.

 output log on this node:
  INFO 11:19:42,045 Starting repair command #1, repairing 5 ranges.
  INFO 11:19:42,053 [repair #040aae00-28a1-11e1--e378018944ff] new
 session: will sync localhost/10.254.237.2, /10.254.221.2, /10.253.2.2, /
 10.254.217.2, /10.254.94.2 on range
 (85070591730234615865843651857942052864,85070591730234615865843651857942052865]
 for meter.[eventschema, schema, ids, transaction]
  INFO 11:19:42,055 [repair #040aae00-28a1-11e1--e378018944ff] requests
 for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2,
 localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
  INFO 11:19:42,063 Enqueuing flush of 
 Memtable-eventschema@1509399856(18748/23435
 serialized/live bytes, 4 ops)
  INFO 11:19:42,063 Writing Memtable-eventschema@1509399856(18748/23435
 serialized/live bytes, 4 ops)
  INFO 11:19:42,072 Completed flushing
 /spool1/cassandra/data/meter/eventschema-hb-40-Data.db (4745 bytes)
  INFO 11:19:42,073 Discarding obsolete commit
 log:CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1324019623060.log)
  INFO 11:19:42,076 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from localhost/10.254.237.2
  INFO 11:19:42,102 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from /10.254.221.2
  INFO 11:19:42,128 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from /10.254.217.2
  INFO 11:19:42,228 [repair #040aae00-28a1-11e1--e378018944ff] Received
 merkle tree for eventschema from /10.253.2.2

 And nothing after that for a long time. So the node sent requests for trees
 to the other nodes and received them all except from 10.254.94.2.

 On that 10.254.94.2 node:
 INFO 11:19:42,083 [repair #040aae00-28a1-11e1--e378018944ff] Sending
 completed merkle tree to /10.254.237.2 for (meter,eventschema)

 So the merkle tree was lost somewhere. Will this waiting resolve somehow, or
 do I need to restart the node?




Re: repair waiting for something

2012-04-26 Thread Frank Ng
I am having the same issue in 1.0.7 with leveled compaction.  It seems that
the repair is flaky.  It either completes relatively fast in a TEST
environment (7 minutes) or gets stuck trying to receive a merkle tree from
a peer that is already sending it the merkle tree.

The only solution is to restart Cassandra.  But that's not good.


Is this possible.

2012-04-26 Thread Ed Jone
Hello,

I am new to Cassandra and was hoping someone could tell me if the
following is possible.


Given that I have a column family with a list of users in each row.

Each user has the properties: name, highscore, x, y, z.

I want to use name as the column key, but I want the columns to be sorted
by highscore (always).

The only reads would be to get the top N users by highscore in a given row.
I thought about adding the weight to the name as the key (e.g.
299.76-johnsmith), but then I would not be able to update a given user.

This was not possible in the past, but I am not familiar with the newer
Cassandra versions.


Re: Is this possible.

2012-04-26 Thread Data Craftsman
Data model:

REM CQL 3.0


$ cqlsh --cql3

drop COLUMNFAMILY user_score_v3;

CREATE COLUMNFAMILY user_score_v3
(name varchar,
 highscore float,
 x int,
 y varchar,
 z varchar,
 PRIMARY KEY (name, highscore)
);

DML is as usual, the same as common RDBMS SQL.

Query:

Top 3,

SELECT name, highscore, x,y,z FROM user_score_v3 where name='abc'
ORDER BY highscore desc
LIMIT 3;

You may try Reversed Comparators, see
http://thelastpickle.com/2011/10/03/Reverse-Comparators/

Hope this is helpful.

Thanks,
Charlie | DBA


On Thu, Apr 26, 2012 at 12:34 PM, Ed Jone edjo...@gmail.com wrote:
 [quoted message snipped]


--
Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Re: Is this possible.

2012-04-26 Thread Data Craftsman
DML example,

insert into user_score_v3(name, highscore, x,y,z)
values ('abc', 299.76, 1001, '*', '*');
...


2012/4/26 Data Craftsman database.crafts...@gmail.com:
 [quoted messages snipped]



-- 
Thanks,

Charlie (@mujiang) 一个 木匠
===
Data Architect Developer
http://mujiang.blogspot.com


Node join streaming stuck at 100%

2012-04-26 Thread Bryce Godfrey
This is the second node I've joined to my cluster in the last few days, and so
far both have become stuck at 100% on a large file, according to netstats. This
is on 1.0.9; is there anything I can do to make it move on besides restarting
Cassandra? I don't see any errors or warnings in the logs on either server, and
there is plenty of disk space.

On the sender side I see this:
Streaming to: /10.20.1.152
   /opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db 
sections=1 progress=82393861085/82393861085 - 100%

On the joining node I don't see this file in netstats, and all pending streams
are sitting at 0%.





Map reduce without hdfs

2012-04-26 Thread ruslan usifov
Hello to all!

Is it possible to launch only the Hadoop MapReduce task tracker and job tracker
against a Cassandra cluster, and not launch HDFS (using something else for
shared storage)?

Thanks


Re: Map reduce without hdfs

2012-04-26 Thread Edward Capriolo
That is one of the perks of Brisk and, later, DataStax Enterprise. As it
stands, the datanode component is only used as a distributed cache for
jars. So if your job uses Cassandra for the input format and output format,
you only need the other components for temporary storage.
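
With stock Hadoop the same idea looks roughly like this: point the default
filesystem somewhere other than HDFS and start only the MapReduce daemons. A
hedged sketch using Hadoop 1.x-era property names; the host, port, and paths
are placeholders:

# core-site.xml: no HDFS; use the local (or other shared) filesystem
cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
EOF

# mapred-site.xml: where the JobTracker runs
cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:8021</value>
  </property>
</configuration>
EOF

# start only the MapReduce daemons (no namenode or datanode)
bin/hadoop-daemon.sh start jobtracker
bin/hadoop-daemon.sh start tasktracker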

On Thu, Apr 26, 2012 at 7:12 PM, ruslan usifov ruslan.usi...@gmail.com wrote:
 [quoted message snipped]


RE: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
Thanks a lot, Rob.

On another thought, I could also try copying the data of my keyspace alone from
each node to the corresponding node in the new cluster (I have both the old and
new clusters with the same layout, DC1:6 and DC2:6, and the same tokens).

Would there be any risk of the new cluster getting joined to the old cluster,
for example if the data inside the keyspace is aware of the original IPs etc.?

Is this recommended?

Regards,
Shubham

From: Rob Coli [rc...@palominodb.com]
Sent: Thursday, April 26, 2012 11:42 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

[quoted messages snipped]


Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Rob Coli
On Thu, Apr 26, 2012 at 10:38 PM, Shubham Srivastava
shubham.srivast...@makemytrip.com wrote:
 [quoted message snipped]

As a result of this very concern while @ Digg...

https://issues.apache.org/jira/browse/CASSANDRA-769

tl;dr : as long as your cluster names are unique in your cluster
config (**and you do not copy the System keyspace, letting the new
cluster initialize with the new cluster name**), nodes are at no risk
of joining the wrong cluster.
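
A sketch of a copy under those constraints; the paths and keyspace name are
placeholders, and the point is simply to carry over only the application
keyspace and give the new ring its own identity:

# on each old node: copy only the application keyspace, never system/
rsync -av /var/lib/cassandra/data/MyKeyspace/ \
    newnode:/var/lib/cassandra/data/MyKeyspace/

# on each new node: confirm the new ring has a different cluster_name
# before first startup, so it can never gossip with the old cluster
grep cluster_name /etc/cassandra/conf/cassandra.yaml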

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb