Query

2012-06-06 Thread MOHD ARSHAD SALEEM
Hi All,

I am using the Hector client for Cassandra. I wanted to know how to create a 
keyspace and column family using the APIs to read and write data, or whether I 
have to create the keyspace and column family manually using the command line 
interface.

Regards
Arshad


Re: How to include two nodes in Java code using Hector

2012-06-06 Thread Roshni Rajagopal


In Hector, when you create a cluster using the API, you specify an IP address and 
a cluster name. Thereafter, which node serves the request, or how many nodes need 
to be contacted to read/write data, is handled internally and depends on the 
cluster configuration, i.e. your replication strategy, replication factor, 
consistency levels for the column family, how many nodes are in the ring, etc. So 
you don't need to connect to each node individually via the Hector client. Once 
you connect to the cluster and keyspace, via the IP address of any node in the 
cluster, Hector calls to read/write data automatically figure out the node-level 
details and carry out the task. You won't get 50% of the data, you will get all 
of it.
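
As a rough illustration, a minimal connection sketch might look like the 
following; the cluster name, contact point and keyspace name are placeholders, 
not values taken from this thread:

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorConnectExample {
    public static void main(String[] args) {
        // One (or more) contact points; Hector routes to the rest of the ring
        // internally, so you do not list every node here.
        CassandraHostConfigurator hosts = new CassandraHostConfigurator("192.168.1.10:9160");

        // "Test Cluster" must match the cluster_name configured in cassandra.yaml.
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", hosts);

        // All reads/writes go through a Keyspace handle; node-level routing is
        // handled internally based on replication and consistency settings.
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        System.out.println("Connected to keyspace: " + keyspace.getKeyspaceName());
    }
}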


Also, when you remove a node, your data will be unavailable ONLY if it is not 
available on some other node as a replica.


Regards,


From: Prakrati Agrawal prakrati.agra...@mu-sigma.com
Reply-To: user@cassandra.apache.org
Date: Tue, 5 Jun 2012 20:05:21 -0700
To: user@cassandra.apache.org
Subject: RE: How to include two nodes in Java code using Hector

But the data is distributed across the nodes (meaning 50% of the data is on one 
node and 50% on the other), so I need to specify the node IP address somewhere 
in the code. But where do I specify that is what I am clueless about. 
Please help me

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
Sent: Tuesday, June 05, 2012 5:51 PM
To: user@cassandra.apache.org
Subject: RE: How to include two nodes in Java code using Hector

Use Consistency Level =2.
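
Assuming this means requiring two replicas per operation, one way to express it 
with Hector is a keyspace-wide consistency level policy. The names below are 
placeholders, and QUORUM is used since with replication_factor = 2 it amounts to 
both replicas; this is only an illustrative sketch:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
// With replication_factor = 2, QUORUM means both replicas must acknowledge.
ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, ccl);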

Regards
Harsh

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Tuesday, June 05, 2012 4:08 PM
To: user@cassandra.apache.org
Subject: How to include two nodes in Java code using Hector

Dear all

I am using a two node Cassandra cluster. How do I code in Java using Hector to 
get data from both the nodes? Please help

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





RE: How to include two nodes in Java code using Hector

2012-06-06 Thread Prakrati Agrawal
Thank you for the reply.
Now I have decommissioned a node, but I don't know how to recommission it. 
Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

-Original Message-
From: Roshni Rajagopal [mailto:roshni.rajago...@wal-mart.com]
Sent: Wednesday, June 06, 2012 11:42 AM
To: user@cassandra.apache.org
Subject: Re: How to include two nodes in Java code using Hector




How to make a decommissioned node join the ring again

2012-06-06 Thread Prakrati Agrawal
Dear all

I decommissioned a node. Now I want to make that node a part of the ring again. 
How do I do it? Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





Re: memory issue on 1.1.0

2012-06-06 Thread aaron morton
Mina, 
That does not sound right. 

If you have the time can you create a jira ticket describing the 
problem, please include:

* the GC logs gathered by enabling them here 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L165 (It 
would be good to see the node get into trouble if possible).
* OS, JVM and cassandra versions
* information on the schema and workload
* anything else you think is important. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 7:24 AM, Mina Naguib wrote:

 
 Hi Wade
 
 I don't know if your scenario matches mine, but I've been struggling with 
 memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
 with enabling compression and levelled compactions, so I don't know which 
 specifically is the main culprit.
 
 Specifically, all my nodes seem to lose heap memory.  As parnew and CMS do 
 their job, over any reasonable period of time, the floor of memory after a 
 GC keeps rising.  This is quite visible if you leave jconsole connected for a 
 day or so, and manifests itself as a funny-looking cone like so: 
 http://mina.naguib.ca/images/cassandra_jconsole.png
 
 Once memory pressure reaches a point where the heap can't be maintained 
 reliably below 75%, cassandra goes into survival mode - via a bunch of 
 tunables in cassandra.yaml it'll do things like flush memtables, drop caches, 
 etc - all of which, in my experience, especially with the recent off-heap 
 data structures, exacerbate the problem.
 
 I've been meaning, of course, to collect enough technical data to file a bug 
 report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
 improves the situation.
 
 What I have found however, is a band-aid which you see at the rightmost 
 section of the graph in the screenshot I posted.  That is simply to hit the 
 "Perform GC" button in jconsole.  It seems that a full System.gc() *DOES* 
 reclaim heap memory that parnew and CMS fail to reclaim.
 
 On my production cluster I have a full-GC via JMX scheduled in a rolling 
 fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
 unresponsiveness) but is a necessary evil in my situation.  Without it, my 
 nodes enter a nasty spiral of constant flushing, constant compactions, high 
 heap usage, instability and high latency.
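
 As a rough sketch of that kind of JMX-triggered full GC (host, port and class
 name here are hypothetical; it simply invokes the standard java.lang:type=Memory
 "gc" operation, the same thing jconsole's Perform GC button does):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteFullGc {
    public static void main(String[] args) throws Exception {
        // Cassandra's default JMX port is 7199; the host name is a placeholder.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cass-node-1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Equivalent to System.gc() on the remote JVM.
            mbs.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
        } finally {
            connector.close();
        }
    }
}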
 
 
 On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:
 
 Alas, upgrading to 1.1.1 did not solve my issue.
 
 -Original Message-
 From: Brandon Williams [mailto:dri...@gmail.com] 
 Sent: Monday, June 04, 2012 11:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741
 
 -Brandon
 
 On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, when I only add new data I have never seen this; in the past I have 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 A key slice to get the columns to modify, in batches of 100; in separate threads 
 I modify those columns.  I advance the slice start key each time with the 
 last key in the previous batch.  The mutations done are: update a column value in 
 one column family (token), delete a column and add a new column in another (pan).
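 
 Expressed with Hector, that paging-and-mutating loop might look roughly like the 
 sketch below; the column families, column names and String key/column types are 
 assumptions for illustration, not the poster's actual code:
 
 import me.prettyprint.cassandra.serializers.StringSerializer;
 import me.prettyprint.hector.api.Keyspace;
 import me.prettyprint.hector.api.beans.OrderedRows;
 import me.prettyprint.hector.api.beans.Row;
 import me.prettyprint.hector.api.factory.HFactory;
 import me.prettyprint.hector.api.mutation.Mutator;
 import me.prettyprint.hector.api.query.RangeSlicesQuery;
 
 // 'keyspace' is assumed to come from HFactory.createKeyspace(...) as usual.
 StringSerializer ser = StringSerializer.get();
 String startKey = "";
 while (true) {
     // Page through row keys 100 at a time; only 1 column per row is fetched
     // here because we just need the keys to drive the mutations.
     RangeSlicesQuery<String, String, String> q =
         HFactory.createRangeSlicesQuery(keyspace, ser, ser, ser)
                 .setColumnFamily("token")
                 .setKeys(startKey, "")
                 .setRange("", "", false, 1)
                 .setRowCount(100);
     OrderedRows<String, String, String> rows = q.execute().get();
     if (rows.getCount() == 0) break;
 
     Mutator<String> mutator = HFactory.createMutator(keyspace, ser);
     for (Row<String, String, String> row : rows) {
         // Update a column value in one CF, delete + add a column in another.
         mutator.addInsertion(row.getKey(), "token",
                 HFactory.createStringColumn("someColumn", "newValue"));
         mutator.addDeletion(row.getKey(), "pan", "oldColumn", ser);
         mutator.addInsertion(row.getKey(), "pan",
                 HFactory.createStringColumn("newColumn", "value"));
     }
     mutator.execute();
 
     // Advance the slice: the next page starts at the last key of this one
     // (the boundary row is revisited; a real implementation would skip it).
     startKey = rows.peekLast().getKey();
     if (rows.getCount() < 100) break;
 }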
 
 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.
 
 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 
 145) Heap is 0.7967470834946492 full.  You may need to reduce memtable 
 and/or cache sizes.  Cassandra will now flush up to the two largest 
 memtables to free up memory.  Adjust flush_largest_memtables_at 
 threshold in cassandra.yaml if you don't want Cassandra to do this 
 automatically
 INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java 
 (line 2772) Unable to reduce heap usage since there are no dirty 
 column families
 INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
 INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java 
 (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; 
 max is 8506048512
 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java 
 (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 
 5714800208 used; max is 8506048512
 
 
 Keyspace: keyspace
   Read Count: 50042632
   Read Latency: 0.23157864418482224 ms.
   Write Count: 44948323
   Write Latency: 0.019460829472992797 ms.
   Pending Tasks: 0
   Column Family: pan
   SSTable count: 5
   Space used (live): 1977467326
   Space used (total): 1977467326
   Number of Keys (estimate): 16334848
   Memtable 

how to create keyspace using cassandra API's

2012-06-06 Thread MOHD ARSHAD SALEEM
Hi All,

I am using Hector as a client for Cassandra, and I am trying to create a keyspace 
using the following API:
Keyspace keyspace = HFactory.createKeyspace("test", cluster);
but it shows the following error:
caused by: InvalidRequestException(why: Keyspace test does not exist)
Can anybody help me with how to create a keyspace in Cassandra?

Regards
Arshad


Re: Query

2012-06-06 Thread Filippo Diotalevi
Hi,  
the Javadoc (or source code) of the me.prettyprint.hector.api.factory.HFactory 
class contains all the examples to create keyspaces and column families.

To create a keyspace:

String testKeyspace = "testKeyspace"; 
KeyspaceDefinition newKeyspace = 
HFactory.createKeyspaceDefinition(testKeyspace);
cluster.addKeyspace(newKeyspace);



To create a column family and a keyspace:

String keyspace = "testKeyspace"; 
String column1 = "testcolumn";
ColumnFamilyDefinition columnFamily1 = 
HFactory.createColumnFamilyDefinition(keyspace, column1);
List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>(); 
columns.add(columnFamily1);

KeyspaceDefinition testKeyspace =
HFactory.createKeyspaceDefinition(keyspace, 
org.apache.cassandra.locator.SimpleStrategy.class.getName(), 1, columns);
cluster.addKeyspace(testKeyspace);
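
Once the schema exists, actual reads and writes go through a Keyspace handle 
rather than the KeyspaceDefinition above. A minimal sketch, assuming the keyspace 
and column family names from the snippet above and a 'cluster' obtained from 
HFactory.getOrCreateCluster:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;

Keyspace ksp = HFactory.createKeyspace("testKeyspace", cluster);

// Write one column to a row.
Mutator<String> mutator = HFactory.createMutator(ksp, StringSerializer.get());
mutator.insert("rowKey1", "testcolumn", HFactory.createStringColumn("name", "value"));

// Read it back.
ColumnQuery<String, String, String> query = HFactory.createStringColumnQuery(ksp);
query.setColumnFamily("testcolumn").setKey("rowKey1").setName("name");
HColumn<String, String> column = query.execute().get();
System.out.println(column == null ? "not found" : column.getValue());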


-- 
Filippo Diotalevi



On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote:

 Hi All,
 
 I am using Hector client for cassandra . I wanted to know how to create 
 keyspace and column family using API's to read and write data.
 or  i have to create keyspace and column family manually using command line 
 interface.
 
 Regards
 Arshad



RE: how to create keyspace using cassandra API's

2012-06-06 Thread Prakrati Agrawal
You have to create the keyspace manually first using the Cassandra CLI

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



Re: memory issue on 1.1.0

2012-06-06 Thread aaron morton
I looked through the log again. Still looks like it's overloaded and not 
handling the overload very well. 

It looks like a sustained write load of around 280K columns every 5 minutes for 
about 5 hours. It may be that the CPU is the bottleneck when it comes to GC 
throughput. You are hitting ParNew issues from the very start, and end up with 
20 second CMS pauses. Do you see high CPU load? 

Can you enable the GC logging options in cassandra-env.sh? 
 
Can you throttle back the test to a level where the server does not fail? 

Alternatively, can you dump the heap when it gets full and see what is taking 
up all the space?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 2:12 PM, Poziombka, Wade L wrote:

 Ok, so I have completely refactored to remove deletes and it still fails. So 
 it is completely unrelated to deletes.
 
 I guess I need to go back to 1.0.10?  When I originally evaluated I ran 
 1.0.8... perhaps I went a bridge too far with 1.1.
 
 I don't think I am doing anything exotic here.
 
 Here is my column family.  
 
 KsDef(name:TB_UNIT, 
 strategy_class:org.apache.cassandra.locator.SimpleStrategy, 
 strategy_options:{replication_factor=3}, 
 cf_defs:[
 
 CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, 
 comparator_type:BytesType, column_metadata:[ColumnDef(name:70 61 6E 45 6E 63, 
 validation_class:BytesType), ColumnDef(name:63 72 65 61 74 65 54 73, 
 validation_class:DateType), ColumnDef(name:63 72 65 61 74 65 44 61 74 65, 
 validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate), 
 ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, 
 validation_class:UTF8Type, index_type:KEYS, 
 index_name:EncryptionSettingsID)], caching:keys_only), 
 
 CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, 
 column_type:Standard, comparator_type:BytesType, caching:keys_only), 
 
 CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, 
 comparator_type:BytesType, column_metadata:[ColumnDef(name:75 73 65 43 6F 75 
 6E 74, validation_class:CounterColumnType)], 
 default_validation_class:CounterColumnType, caching:keys_only)
 
 ])
 
 
 -Original Message-
 From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com] 
 Sent: Tuesday, June 05, 2012 3:09 PM
 To: user@cassandra.apache.org
 Subject: RE: memory issue on 1.1.0
 
 Thank you.  I do have some of the same observations.  Do you do deletes?
 
 My observation is that without deletes (or column updates I guess) I can run 
 forever happily.  But when I run (what for me is a batch process) operations 
 that delete and modify column values I run into this.
 
 Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741 the advice 
 is to NOT do deletes individually and to truncate.  I am scrambling to try to 
 do this but curious if it will be worth the effort.
 
 Wade
 
 -Original Message-
 From: Mina Naguib [mailto:mina.nag...@bloomdigital.com] 
 Sent: Tuesday, June 05, 2012 2:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 

Cassandra not retrieving the complete data on 2 nodes

2012-06-06 Thread Prakrati Agrawal
Dear all

I was originally running a 1 node cluster. Then I added one more node to it with 
the initial token configured appropriately. Now when I run my queries I am not 
getting all my data, i.e. all columns.

Output on 2 nodes
Time taken to retrieve columns 43707 of key range is 1276
Time taken to retrieve columns 2084199 of all tickers is 54334
Time taken to count is 230776
Total number of rows in the database are 183
Total number of columns in the database are 7903753

Output on 1 node
Time taken to retrieve columns 43707 of key range is 767
Time taken to retrieve columns 382 of all tickers is 52793
Time taken to count is 268135
Total number of rows in the database are 396
Total number of columns in the database are 16316426

Please help me. Where is my data going or how should I retrieve it.

Thanks and Regards
Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





Why Hector is taking more time than Thrift

2012-06-06 Thread Prakrati Agrawal
Dear all

I am trying to evaluate the performance of Cassandra and wrote code to 
retrieve a complete row (having 43707 columns) using Thrift and Hector.
The Thrift client code took 0.767 seconds while the Hector code took 0.883 
seconds. Is it expected that Hector will be slower than Thrift? If yes, then 
why are we using Hector and not Thrift?
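
For reference, fetching an entire row with Hector typically looks something like 
the sketch below; the keyspace handle, column family and row key are placeholders, 
and the timings quoted above come from the poster's own code, not this snippet:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

// 'keyspace' is assumed to come from HFactory.createKeyspace(...).
StringSerializer ser = StringSerializer.get();
SliceQuery<String, String, String> q =
    HFactory.createSliceQuery(keyspace, ser, ser, ser)
            .setColumnFamily("ticker_data")
            .setKey("someRowKey")
            // Fetch all columns of the row in one slice.
            .setRange("", "", false, Integer.MAX_VALUE);

long start = System.currentTimeMillis();
ColumnSlice<String, String> slice = q.execute().get();
long elapsed = System.currentTimeMillis() - start;
System.out.println(slice.getColumns().size() + " columns in " + elapsed + " ms");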

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





Re: Why Hector is taking more time than Thrift

2012-06-06 Thread R. Verlangen
Hector is a higher-level client that provides some abstraction and an easy
to use interface. The Thrift API is pretty raw. So for most cases the
Hector client would be the best choice, except for use cases where the
ultimate performance is a requirement (at the cost of a lot more
maintenance as the Thrift API changes).



Re: Nodes not picking up data on repair, disk loaded unevenly

2012-06-06 Thread aaron morton
You are basically in trouble. If you can nuke it and start again it would be 
easier. If you want to figure out how to get out of it keep the cluster up and 
have a play. 


 -What I think the solution should be:
You want to get repair to work before you start deleting data. 

 At ~840GB I'm probably running close
 to the max load I should have on a node,
roughly 300GB to 400GB is the max load

 On node #1 I was able to successfully run a scrub and
 major compaction, 
In this situation running a major compaction is not what you want. It creates a 
huge file that can only be compacted if there is enough space for another huge 
file. Smaller files only need a small amount of space to be compacted. 

 Is there something I should be looking for in the logs to verify that the
 repair was successful? 
grep for repair command

The shortcut on EC2 is to add an EBS volume, tell cassandra it can store stuff 
there (in the yaml) and buy some breathing room. 


What version are you using ?

Have there been times when nodes were down? 

Clear as much space as possible from the disk. Check for snapshots in all KS's. 

What KS's (including the system KS) are taking up the most space ? Are there a 
lot of hints in the system KS (they are not replicated)?

Try to get a feel for which CF's are taking up the space or not, as the case may 
be. Look in nodetool cfstats to see how big the rows are. 

If you have enabled compression, run nodetool upgradesstables to compress them. 


In general, try to get free space on the nodes by using compaction, moving 
files to a new mount etc so that you can get repair to run. 

Cheers

 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 6:53 AM, Luke Hospadaruk wrote:

 I have a 4-node cluster with one keyspace (aside from the system keyspace)
 with the replication factor set to 4.  The disk usage between the nodes is
 pretty wildly different and I'm wondering why.  It's becoming a problem
 because one node is getting to the point where it sometimes fails to
 compact because it doesn't have enough space.
 
 I've been doing a lot of experimenting with the schema, adding/dropping
 things, changing settings around (not ideal I realize, but we're still in
 development).
 
 In an ideal world, I'd launch another cluster (this is all hosted in
 amazon), copy all the data to that, and just get rid of my current
 cluster, but the current cluster is in use by some other parties so
 rebuilding everything is impractical (although possible if it's the only
 reliable solution).
 
 $ nodetool -h localhost ring
 Address      DC         Rack    Status  State   Load       Owns    Token
 
 
 1.xx.xx.xx   Cassandra   rack1   Up Normal  837.8 GB   25.00%  0
 
 2.xx.xx.xx   Cassandra   rack1   Up Normal  1.17 TB25.00%
 42535295865117307932921825928971026432
 3.xx.xx.xx   Cassandra   rack1   Up Normal  977.23 GB  25.00%
 85070591730234615865843651857942052864
 4.xx.xx.xx   Cassandra   rack1   Up Normal  291.2 GB   25.00%
 127605887595351923798765477786913079296
 
 -Problems I'm having:
 Nodes are running out of space and are apparently unable to perform
 compactions because of it.  These machines have 1.7T total space each.
 
 The logs for node #2 have a lot of warnings about insufficient space for
 compaction.  Node number 4 was so extremely out of space (cassandra was
 failing to start because of it)that I removed all the SSTables for one of
 the less essential column families just to bring it back online.
 
 
 I have (since I started noticing these issues) enabled compression for all
 my column families.  On node #1 I was able to successfully run a scrub and
 major compaction, so I suspect that the disk usage for node #1 is about
 where all the other nodes should be.  At ~840GB I'm probably running close
 to the max load I should have on a node, so I may need to launch more
 nodes into the cluster, but I'd like to get things straightened out before
 I introduce more potential issues (token moving, etc).
 
 Node #4 seems not to be picking up all the data it should have (since the
 replication factor == number of nodes, the load should be roughly the
 same?).  I've run repairs on that node to seemingly no avail (after repair
 finishes, it still has about the same disk usage, which is much too low).
 
 
 -What I think the solution should be:
 One node at a time:
 1) nodetool drain the node
 2) shut down cassandra on the node
 3) wipe out all the data in my keyspace on the node
 4) bring cassandra back up
 5) nodetool repair
 
 -My concern:
 This is basically what I did with node #4 (although I didn't drain, and I
 didn't wipe the entire keyspace), and it doesn't seem to have regained all
 the data it's supposed to have after the repair. The column family should
 have at least 200-300GB of data, and the SSTables in the data directory
 only total about 11GB, am I missing something?
 
 Is there a way to verify that a node _really_ has all the data 

RE: Cassandra not retrieving the complete data on 2 nodes

2012-06-06 Thread Prakrati Agrawal
Please anyone reply to my query

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



RE: Query

2012-06-06 Thread MOHD ARSHAD SALEEM
Hi,
After creating the keyspace successfully, I now want to know how to read and 
write data using the APIs.

Regards
Arshad



Problem in getting data from a 2 node cluster

2012-06-06 Thread Prakrati Agrawal
Dear all,

I had a 1 node cluster. Then I added 1 more node to it.
When I ran my query on the 1 node cluster I got all my data, but when I ran my 
query on the 2 node cluster (Hector code) I am not getting the same data.
How do I ensure that my Hector code retrieves data from all the nodes?

Also, when I decommission my node and then add it again I get the following 
message:
"This node will not auto bootstrap because it is configured to be a seed node"
Please tell me the meaning of it also

Thanks and Regards
Prakrati

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





unsubscribe

2012-06-06 Thread Cyril Scetbon


--
Cyril SCETBON



Re: unsubscribe

2012-06-06 Thread Cyril Scetbon

On 6/6/12 12:13 PM, Cyril Scetbon wrote:



sorry for that

--
Cyril SCETBON



Re: Problem in getting data from a 2 node cluster

2012-06-06 Thread R. Verlangen
Did you run repair on the new node?



RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Prakrati Agrawal
What does repair do?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Prakrati Agrawal
When I run the nodetool command I get the following information:
./nodetool -h localhost ring
Address         DC          Rack   Status  State   Load       Effective-Ownership  Token
                                                                                    85070591730234615865843651857942052864
162.192.100.16  datacenter1 rack1  Up      Normal  238.22 MB  50.00%               0
162.192.100.48  datacenter1 rack1  Up      Normal  115.6 MB   50.00%               85070591730234615865843651857942052864

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



Re: Problem in getting data from a 2 node cluster

2012-06-06 Thread R. Verlangen
Repair ensures that all data is consistent and available on the node.


RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Prakrati Agrawal
Yes, I ran nodetool repair also. Still the same problem: I am getting less data 
when using my code on a 2 node cluster. Please help me

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nl]
Sent: Wednesday, June 06, 2012 4:01 PM
To: user@cassandra.apache.org
Subject: Re: Problem in getting data from a 2 node cluster

Repair ensures that all data is consistent and available on the node.
2012/6/6 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
When I run the nodetool command I get the following information
./nodetool -h localhost ring
Address DC  RackStatus State   Load
Effective-Owership  Token

   85070591730234615865843651857942052864
162.192.100.16  datacenter1 rack1   Up Normal  238.22 MB   50.00%   
   0
162.192.100.48  datacenter1 rack1   Up Normal  115.6 MB50.00%   
   85070591730234615865843651857942052864

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: Prakrati Agrawal 
[mailto:prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 3:55 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: RE: Problem in getting data from a 2 node cluster

What does repair do?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nlmailto:ro...@us2.nl]
Sent: Wednesday, June 06, 2012 3:56 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Problem in getting data from a 2 node cluster

Did you run repair on the new node?
2012/6/6 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
Dear all,

I had a 1 node cluster. Then I added 1 more node to it.
When I ran my query on 1 node cluster I got all my data but when I ran my query 
on the 2 node cluster (Hector code) I am not getting the same data.
How do I ensure that my Hector code retrieves data from all the nodes.

Also when I decommission my node and then add it again I get the following 
message.
This node will not auto bootstrap because it is configured to be a seed node
Please tell me the meaning of it also

Thanks and Regards
Prakrati

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com




RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Prakrati Agrawal
I even used CassandraHostConfigurator and added a string of hosts but still the 
same issue. Please someone help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 4:04 PM
To: user@cassandra.apache.org
Subject: RE: Problem in getting data from a 2 node cluster

Yes, I ran nodetool repair also. Still the same problem: I am getting less data 
when using my code on a 2 node cluster. Please help me.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nl]
Sent: Wednesday, June 06, 2012 4:01 PM
To: user@cassandra.apache.org
Subject: Re: Problem in getting data from a 2 node cluster

Repair ensures that all data is consistent and available on the node.
2012/6/6 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
When I run the nodetool command I get the following information
./nodetool -h localhost ring
Address DC  RackStatus State   Load
Effective-Owership  Token

   85070591730234615865843651857942052864
162.192.100.16  datacenter1 rack1   Up Normal  238.22 MB   50.00%   
   0
162.192.100.48  datacenter1 rack1   Up Normal  115.6 MB50.00%   
   85070591730234615865843651857942052864

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: Prakrati Agrawal 
[mailto:prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 3:55 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: RE: Problem in getting data from a 2 node cluster

What does repair do?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nlmailto:ro...@us2.nl]
Sent: Wednesday, June 06, 2012 3:56 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Problem in getting data from a 2 node cluster

Did you run repair on the new node?
2012/6/6 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
Dear all,

I had a 1 node cluster. Then I added 1 more node to it.
When I ran my query on 1 node cluster I got all my data but when I ran my query 
on the 2 node cluster (Hector code) I am not getting the same data.
How do I ensure that my Hector code retrieves data from all the nodes.

Also when I decommission my node and then add it again I get the following 
message.
This node will not auto bootstrap because it is configured to be a seed node
Please tell me the meaning of it also

Thanks and Regards
Prakrati

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com




RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Prakrati Agrawal
I will repeat my query once again:
I had a 1 node cluster. Then I added 1 more node to it.
When I ran my query on the 1 node cluster I got all my data, but when I ran my query 
on the 2 node cluster (Hector code) I am not getting the same data.
How do I ensure that my Hector code retrieves data from all the nodes?

Also, when I decommission my node and then add it again I get the following 
message:
"This node will not auto bootstrap because it is configured to be a seed node"
Please tell me the meaning of it as well.
The things I already tried are:

1. Used CassandraHostConfigurator with a list of both hosts - still the same issue (a sketch of this setup is shown below)

2. Ran nodetool repair on both nodes - still the same issue

Please help me out. I am badly stuck.
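
For reference, this is roughly what the point 1 setup looks like with Hector's 
CassandraHostConfigurator (the host addresses are the ones from the nodetool ring 
output earlier in this thread; the keyspace name is just a placeholder):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

// List both nodes; Hector pools connections to each and can also discover
// any other nodes in the ring on its own.
CassandraHostConfigurator hostConfig =
        new CassandraHostConfigurator("162.192.100.16:9160,162.192.100.48:9160");
hostConfig.setAutoDiscoverHosts(true);

Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", hostConfig);
Keyspace keyspace = HFactory.createKeyspace("myKeyspace", cluster);
// Queries made through this keyspace handle are routed by the cluster itself;
// the client code never has to pick the node that owns a given key.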

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 4:41 PM
To: user@cassandra.apache.org
Subject: RE: Problem in getting data from a 2 node cluster

I even used CassandraHostConfigurator and added a string of hosts but still the 
same issue. Please someone help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 4:04 PM
To: user@cassandra.apache.org
Subject: RE: Problem in getting data from a 2 node cluster

Yes, I ran nodetool repair also. Still the same problem: I am getting less data 
when using my code on a 2 node cluster. Please help me.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nl]
Sent: Wednesday, June 06, 2012 4:01 PM
To: user@cassandra.apache.org
Subject: Re: Problem in getting data from a 2 node cluster

Repair ensures that all data is consistent and available on the node.
2012/6/6 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
When I run the nodetool command I get the following information
./nodetool -h localhost ring
Address DC  RackStatus State   Load
Effective-Owership  Token

   85070591730234615865843651857942052864
162.192.100.16  datacenter1 rack1   Up Normal  238.22 MB   50.00%   
   0
162.192.100.48  datacenter1 rack1   Up Normal  115.6 MB50.00%   
   85070591730234615865843651857942052864

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: Prakrati Agrawal 
[mailto:prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 3:55 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: RE: Problem in getting data from a 2 node cluster

What does repair do?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nlmailto:ro...@us2.nl]
Sent: Wednesday, June 06, 2012 3:56 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Problem in getting data from a 2 node cluster

Did you run repair on the new node?
2012/6/6 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
Dear all,

I had a 1 node cluster. Then I added 1 more node to it.
When I ran my query on 1 node cluster I got all my data but when I ran my query 
on the 2 node cluster (Hector code) I am not getting the same data.
How do I ensure that my Hector code retrieves data from all the nodes.

Also when I decommission my node and then add it again I get the following 
message.
This node will not auto bootstrap because it is configured to be a seed node
Please tell me the meaning of it also

Thanks and Regards
Prakrati

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com




Re: Query

2012-06-06 Thread shelan Perera
Hi,

You can find detailed info here [1]

[1] https://github.com/hector-client/hector/wiki/User-Guide
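
Before digging into the guide, a minimal sketch of one write and one read with 
Hector may help. It reuses the "testKeyspace" / "testcolumn" names from the example 
quoted below, so treat the row key and column name as placeholders for your own schema:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

Keyspace ksp = HFactory.createKeyspace("testKeyspace", cluster);

// Write a single column into row "row1" of column family "testcolumn"
Mutator<String> mutator = HFactory.createMutator(ksp, StringSerializer.get());
mutator.insert("row1", "testcolumn",
        HFactory.createStringColumn("greeting", "hello"));

// Read the same column back
ColumnQuery<String, String, String> query = HFactory.createStringColumnQuery(ksp);
query.setColumnFamily("testcolumn").setKey("row1").setName("greeting");
QueryResult<HColumn<String, String>> result = query.execute();
System.out.println(result.get() == null ? "not found" : result.get().getValue());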

regards

On Wed, Jun 6, 2012 at 3:38 PM, MOHD ARSHAD SALEEM 
marshadsal...@tataelxsi.co.in wrote:

  Hi,
  After creating the keyspace successfully, now I want to know how to read and
  write data using the APIs.

 Regards
 Arshad
  --
 *From:* Filippo Diotalevi [fili...@ntoklo.com]
 *Sent:* Wednesday, June 06, 2012 2:27 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Query

   Hi,
 the Javadoc (or source code) of
 the me.prettyprint.hector.api.factory.HFactory class contains all the
 examples to create keyspaces and column families.

  To create a keyspace:

  String testKeyspace = "testKeyspace";
  KeyspaceDefinition newKeyspace
      = HFactory.createKeyspaceDefinition(testKeyspace);
  cluster.addKeyspace(newKeyspace);


  To create a keyspace together with a column family:

  String keyspace = "testKeyspace";
  String column1 = "testcolumn";
  ColumnFamilyDefinition columnFamily1
      = HFactory.createColumnFamilyDefinition(keyspace, column1);
  List<ColumnFamilyDefinition> columns =
      new ArrayList<ColumnFamilyDefinition>();
  columns.add(columnFamily1);

  KeyspaceDefinition testKeyspace =
      HFactory.createKeyspaceDefinition(keyspace,
          org.apache.cassandra.locator.SimpleStrategy.class.getName(),
          1, columns);
  cluster.addKeyspace(testKeyspace);

  --
 Filippo Diotalevi


  On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote:

   Hi All,

 I am using Hector client for cassandra . I wanted to know how to create
 keyspace and column family using API's to read and write data.
 or  i have to create keyspace and column family manually using command
 line interface.

 Regards
 Arshad





-- 
Shelan Perera

Home: http://www.shelan.org
Blog   : http://www.shelanlk.com
Twitter: shelan
skype  :shelan.perera
gtalk   :shelanrc

 I am the master of my fate:
 I am the captain of my soul.
 *invictus*


RE: Problem in getting data from a 2 node cluster

2012-06-06 Thread Tim Wintle
On Wed, 2012-06-06 at 06:54 -0500, Prakrati Agrawal wrote:
 This node will not auto bootstrap because it is configured to be a
 seed node

This means the cassandra.yaml on that node references itself as a seed
node.
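
In other words, the seeds list in that node's cassandra.yaml contains the node's 
own address, roughly like this excerpt (addresses are only illustrative, taken 
from the ring output earlier in the thread):

  seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "162.192.100.16,162.192.100.48"   # this node is 162.192.100.48

Because it sees itself as a seed it starts up without auto bootstrapping, and it 
then relies on nodetool repair (or hinted handoff / read repair) to get its data back.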


After you decommission the second node, can you still access the entire
dataset in the single node cluster, or has it been lost along the way?

What is the replication factor for your data?


Tim Wintle




Node decomission failed

2012-06-06 Thread Marc Canaleta
Hi,

We are testing Cassandra and tried to remove a node from the cluster using
nodetool decommission. The node transferred the data, then died for about
20 minutes without responding, then came back to life with a load of
50-100, was in a heavy load during about 1 hour and then returned to normal
load. It seems to have stopped receiving new data but it is still in the
cluster.

The node we tried to remove is the third one:

root@dc-cassandra-03:~# nodetool ring
Note: Ownership information does not include topology, please specify a
keyspace.
Address DC  RackStatus State   LoadOwns
   Token

   113427455640312821154458202477256070484
10.70.147.62datacenter1 rack1   Up Normal  7.14 GB
33.33%  0
10.208.51.64datacenter1 rack1   Up Normal  3.68 GB
33.33%  56713727820156410577229101238628035242
10.190.207.185  datacenter1 rack1   Up Normal  3.54 GB
33.33%  113427455640312821154458202477256070484


It seems it is still part of the cluster. What should we do? Decommission
again?

How can we know the current state of the cluster?

Thanks!


Re: MeteredFlusher in system.log entries

2012-06-06 Thread rohit bhatia
Hi..

the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
mentions that From version 0.7 onwards the worse case scenario is up
to CF Count + Secondary Index Count + memtable_flush_queue_size
(defaults to 4) + memtable_flush_writers (defaults to 1 per data
directory) memtables in memory the JVM at once..

So it implies that for flushing, Cassandra copies the memtables content.
So does this imply that writes to column families are not stopped even
when it is being flushed?

Thanks
Rohit

On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote:
 Hi Aaron

 Thanks for the link, I have gone through it. But this doesn't justify
 nodes of exactly same config/specs differing in their flushing
 frequency.
 The traffic on all node is same as we are using RandomPartitioner

 Thanks
 Rohit

 On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com wrote:
 See the section on memtable_total_space_in_mb here
  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/06/2012, at 2:27 AM, rohit bhatia wrote:

 I am trying to understand the variance in flushes frequency in a 8
 node Cassandra cluster.
 All the flushes are of the same type and initiated by MeteredFlusher.java =

 INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
 (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
 ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
 [taken from system.log]

 Number of flushes for 1 column family vary from 6 flushes per day to
 24 flushes per day among nodes of same configuration and same
 hardware.
 Could you please throw light on the what conditions does
 MeteredFlusher use to trigger memtable flushes.
 Also how accurate is the estimated size in the above logfile entry.

 Regards
 Rohit Bhatia
 Software Engineer, Media.net




Re: MeteredFlusher in system.log entries

2012-06-06 Thread rohit bhatia
Also, Could someone please explain how the factor of 7 comes in the
picture in this sentence

For example if memtable_total_space_in_mb is 100MB, and
memtable_flush_writers is the default 1 (with one data directory), and
memtable_flush_queue_size is the default 4, and a Column Family has no
secondary indexes. The CF will not be allowed to get above one seventh
of 100MB or 14MB, as if the CF filled the flush pipeline with 7
memtables of this size it would take 98MB. 

On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia rohit2...@gmail.com wrote:
 Hi..

 the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
 mentions that From version 0.7 onwards the worse case scenario is up
 to CF Count + Secondary Index Count + memtable_flush_queue_size
 (defaults to 4) + memtable_flush_writers (defaults to 1 per data
 directory) memtables in memory the JVM at once..

 So it implies that for flushing, Cassandra copies the memtables content.
 So does this imply that writes to column families are not stopped even
 when it is being flushed?

 Thanks
 Rohit

 On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote:
 Hi Aaron

 Thanks for the link, I have gone through it. But this doesn't justify
 nodes of exactly same config/specs differing in their flushing
 frequency.
 The traffic on all node is same as we are using RandomPartitioner

 Thanks
 Rohit

 On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 See the section on memtable_total_space_in_mb here
  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6/06/2012, at 2:27 AM, rohit bhatia wrote:

 I am trying to understand the variance in flushes frequency in a 8
 node Cassandra cluster.
 All the flushes are of the same type and initiated by MeteredFlusher.java =

 INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
 (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
 ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
 [taken from system.log]

 Number of flushes for 1 column family vary from 6 flushes per day to
 24 flushes per day among nodes of same configuration and same
 hardware.
 Could you please throw light on the what conditions does
 MeteredFlusher use to trigger memtable flushes.
 Also how accurate is the estimated size in the above logfile entry.

 Regards
 Rohit Bhatia
 Software Engineer, Media.net




How do I initialize Astyanax in a EJB Stateless bean

2012-06-06 Thread xsdt

Hello All,

How do I initialize Astyanax inside an EJB Stateless bean, which I am 
using to implement DAO?


Thanks

ben.jamin
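
One common pattern, sketched here under the assumption of Astyanax 1.x and an 
EJB 3.1 container (the class, pool, cluster and keyspace names are invented): build 
the AstyanaxContext once in @PostConstruct and shut it down in @PreDestroy. A 
@Singleton bean (or a shared static holder) is usually preferable to @Stateless, 
so the container's instance pool does not end up creating one Cassandra connection 
pool per bean instance.

import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.ejb.Singleton;

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

@Singleton
public class CassandraDao {

    private AstyanaxContext<Keyspace> context;
    private Keyspace keyspace;

    @PostConstruct
    public void init() {
        // Build and start the Astyanax context once for the bean's lifetime.
        context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")
                .forKeyspace("testKeyspace")
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                .withConnectionPoolConfiguration(
                        new ConnectionPoolConfigurationImpl("DaoPool")
                                .setPort(9160)
                                .setMaxConnsPerHost(5)
                                .setSeeds("127.0.0.1:9160"))
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        keyspace = context.getEntity();
    }

    @PreDestroy
    public void shutdown() {
        context.shutdown();
    }

    // DAO methods then use the keyspace field, e.g. keyspace.prepareQuery(...)
}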


Re: Nodes not picking up data on repair, disk loaded unevenly

2012-06-06 Thread Luke Hospadaruk
Thanks for the tips

Some things I found looking around:

grepping the logs for a specific repair I ran yesterday:

/var/log/cassandra# grep df14e460-af48-11e1--e9014560c7bd system.log
 INFO [AntiEntropySessions:13] 2012-06-05 19:58:51,303 AntiEntropyService.java 
(line 658) [repair #df14e460-af48-11e1--e9014560c7bd] new session: will 
sync /4.xx.xx.xx, /1.xx.xx.xx, /3.xx.xx.xx, /2.xx.xx.xx on range 
(85070591730234615865843651857942052864,127605887595351923798765477786913079296]
 for content.[article2]
 INFO [AntiEntropySessions:13] 2012-06-05 19:58:51,304 AntiEntropyService.java 
(line 837) [repair #df14e460-af48-11e1--e9014560c7bd] requests for merkle 
tree sent for article2 (to [ /4.xx.xx.xx, /1.xx.xx.xx, /3.xx.xx.xx, 
/2.xx.xx.xx])
 INFO [AntiEntropyStage:1] 2012-06-05 20:07:01,169 AntiEntropyService.java 
(line 190) [repair #df14e460-af48-11e1--e9014560c7bd] Received merkle tree 
for article2 from /4.xx.xx.xx
 INFO [AntiEntropyStage:1] 2012-06-06 04:12:30,633 AntiEntropyService.java 
(line 190) [repair #df14e460-af48-11e1--e9014560c7bd] Received merkle tree 
for article2 from /3.xx.xx.xx
 INFO [AntiEntropyStage:1] 2012-06-06 07:02:51,497 AntiEntropyService.java 
(line 190) [repair #df14e460-af48-11e1--e9014560c7bd] Received merkle tree 
for article2 from /1.xx.xx.xx

So it looks like I never got the tree from node #2 (the node which has 
particularly out of control disk usage).

These are running on amazon m1.xlarge instances with all the EBS volumes raided 
together for a total of 1.7TB.

 What version are you using ?
1.0

 Has there been times when nodes were down ?
Yes, but mostly just restarts, and mostly just one node at a time

 Clear as much space as possible from the disk. Check for snapshots in all 
 KS's.
Already done.

 What KS's (including the system KS) are taking up the most space ? Are there 
 a lot of hints in the system KS (they are not replicated)?
-There's just one KS that I'm actually using, which takes up anywhere from 
about 650GB on the node I was able to scrub and compact (that sounds like the 
right size to me) to 1.3TB on the node that is hugely bloated.
-There are pretty huge hints CFs on all but one node (the node I deleted 
data from, although I did not delete any hints from there). They're between 
175GB and 250GB depending on the node.
-Is there any way to force replay of hints to empty this out – just a full 
cluster restart when everything is working again maybe?
-Could I just disable hinted handoff and wipe out those tables?  I realize I'll 
loose those hints, but that doesn't bother me terribly.  I have a high 
replication factor and all my writes have been at cl=ONE (so all the data in 
the hints should actually exist in a CF somewhere right?).  Perhaps more 
importantly if some data has been stalled in a hints table for a week I won't 
really miss it since it basically doesn't exist right now.  I can re-write any 
data that got lost (although that's not ideal).

 Try to get a feel for what CF's are taking up the space or not as the case may 
 be. Look in nodetool cfstats to see how big the rows are.
The hints table and my tables are the only thing taking up any significant 
space on the system

 If you have enabled compression, run nodetool upgradesstables to compress them.
How much working space does this need?  The problem is that node #2 is so full I'm 
not sure any major rebuild or compaction will be successful. The other nodes 
seem to be handling things ok although they are still heavily loaded.

 In general, try to get free space on the nodes by using compaction, moving 
 files to a new mount etc so that you can get repair to run.
-I'll try adding an EBS volume or two to the bloated node and see if that 
allows me to successfully compact/repair.
-If I add another volume to that node, then run some compactions and such to 
the point where everything fits on the main volume again, I may just replace 
that node with a new one.  Can I move things off of and then kill the ebs 
volume?

Other thoughts/notes:
This cluster has a super high write load currently since I'm still building it 
out.  I frequently update every row in my CFs
I almost certainly need to add more capacity (more nodes).  The general plan is 
to get everything sort of working first though, since repairs and such are 
currently failing it seems like a bad time to add more nodes.

Thanks,
Luke

From: aaron morton aa...@thelastpickle.commailto:aa...@thelastpickle.com
Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Nodes not picking up data on repair, disk loaded unevenly

You are basically in trouble. If you can nuke it and start again it would be 
easier. If you want to figure out how to get out of it keep the cluster up and 
have a play.


-What I think the 

Re: [phpcassa] multi_get and composite, cassandra crash my mind

2012-06-06 Thread Tyler Hobbs
On Wed, Jun 6, 2012 at 2:49 AM, Juan Ezquerro LLanes arr...@gmail.comwrote:



  On Tuesday, 5 June 2012 at 19:19:02 UTC+2, Tyler Hobbs wrote:

 The Cassandra users mailing list is a better place for this question, so
 I'm moving it there.


 Hi, I need a phpcassa compatible solution . you think is better to
 move to the java world? :)


It should be doable in phpcassa either way; there's no limitation there.
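
For the "maintain your own index" option that comes up further down in the quoted 
discussion, here is a rough sketch of the write path. It is shown in Java/Hector 
purely for illustration (the phpcassa calls are analogous), it assumes the 
single-UUID row key variant of the Watchdog CF, and the "WatchdogByUser" column 
family name is invented:

import java.util.UUID;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// assumes an existing Hector Keyspace handle named "keyspace"
UUID userUuid  = UUID.randomUUID();   // key of the per-user "index" row
UUID eventUuid = UUID.randomUUID();   // key of the Watchdog row being written

Mutator<UUID> m = HFactory.createMutator(keyspace, UUIDSerializer.get());

// 1. the event itself, keyed by its own uuid
m.addInsertion(eventUuid, "Watchdog",
        HFactory.createStringColumn("error_code", "E_TIMEOUT"));

// 2. a pointer to it in a one-row-per-user CF, so "all watchdog entries for a
//    user" becomes a single row read instead of an index scan
m.addInsertion(userUuid, "WatchdogByUser",
        HFactory.createColumn(eventUuid, "", UUIDSerializer.get(), StringSerializer.get()));

m.execute();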





 Some comments inline:

 On Tue, Jun 5, 2012 at 6:47 AM, Juan Ezquerro LLanes  wrote:

 I have a columnfamily like:

 CREATE COLUMN FAMILY Watchdog
 WITH key_validation_class = 'CompositeType(**LexicalUUIDType,**
 LexicalUUIDType)'
 AND comparator = UTF8Type
 AND column_metadata = [
 {column_name: error_code, validation_class: UTF8Type,
 index_type: KEYS}
 {column_name: line, validation_class: IntegerType}
 {column_name: file_path, validation_class: UTF8Type}
 {column_name: function, validation_class: UTF8Type}
 {column_name: content, validation_class: UTF8Type}
 {column_name: additional_data, validation_class: UTF8Type}
 {column_name: date_created, validation_class: DateType,
 index_type: KEYS}
 {column_name: priority, validation_class: IntegerType,
 index_type: KEYS}
 ];

 Row key is a combo of 2 UUIDs; the first is the user's UUID. If I want to
 select all the watchdog entries of a user, how can I do it? Is it
 possible? I just know the user UUID; the other part of the key is an unknown UUID.

 The idea is simple: I have a user and I want all the records in
 watchdog, and I want a secondary index to do the search. Very simple with
 MySQL, but here I can't find the way.

 If i do with a supercolumn i can use secondary indexes, if key is
 composite there is no way for select all data related to a
 user...


 Don't use super columns.  You can't put secondary indexes on super column
 families, anyways.


 TYPO:  If i do with a supercolumn i can*'t* use secondary indexes ...
 :)




 The ugly way:

 CREATE COLUMN FAMILY Watchdog
 WITH key_validation_class = LexicalUUIDType
 AND comparator = UTF8Type
 AND column_metadata = [
 *  {column_name: user_uuid, validation_class: LexicalUUIDType,
 index_type: KEYS}*
 {column_name: error_code, validation_class: UTF8Type,
 index_type: KEYS}
 {column_name: line, validation_class: IntegerType}
 {column_name: file_path, validation_class: UTF8Type}
 {column_name: function, validation_class: UTF8Type}
 {column_name: content, validation_class: UTF8Type}
 {column_name: additional_data, validation_class: UTF8Type}
 {column_name: date_created, validation_class: DateType,
 index_type: KEYS}
 {column_name: priority, validation_class: IntegerType,
 index_type: KEYS}
 ];


 I'm not sure why you think this is the ugly way to do it.  Assuming there
 will be plenty of events for each user, this will work pretty well with a
 secondary index.  Have you tried it?


 You think that's a good idea with very large sets of data, ok, you are the
 master, i try :)

 Thanks again :)


  The other decent option is to maintain your own index in a separate
  column family with one row per user, similar to the materialized view
  approach described here:
  http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra



  But I think that is not a nice solution because you always need to search
  in all rows of very big tables to get all of a user's data...

 Please can help?

 Thanks.




 --
 Tyler Hobbs
 DataStax http://datastax.com/



RE: Cassandra not retrieving the complete data on 2 nodes

2012-06-06 Thread Poziombka, Wade L
what is your consistency level?

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 4:58 AM
To: user@cassandra.apache.org
Subject: RE: Cassandra not retrieving the complete data on 2 nodes

Can anyone please reply to my query?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Wednesday, June 06, 2012 2:34 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Cassandra not retrieving the complete data on 2 nodes

Dear all

I originally had a 1 node cluster. Then I added one more node to it with the 
initial token configured appropriately. Now when I run my queries I am not 
getting all my data, i.e. all columns.

Output on 2 nodes
Time taken to retrieve columns 43707 of key range is 1276
Time taken to retrieve columns 2084199 of all tickers is 54334
Time taken to count is 230776
Total number of rows in the database are 183
Total number of columns in the database are 7903753

Output on 1 node
Time taken to retrieve columns 43707 of key range is 767
Time taken to retrieve columns 382 of all tickers is 52793
Time taken to count is 268135
Total number of rows in the database are 396
Total number of columns in the database are 16316426

Please help me. Where is my data going or how should I retrieve it.

Thanks and Regards
Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com





Re: Cassandra not retrieving the complete data on 2 nodes

2012-06-06 Thread Tyler Hobbs
In addition to using a low consistency level, it sounds like you didn't
bootstrap the node or run a repair after it joined the ring.
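
On the consistency level point, a small sketch of pinning Hector reads and writes 
to QUORUM (the keyspace name is a placeholder). With a replication factor of 1 
this changes nothing, but with RF >= 2 it makes reads see every acknowledged 
write no matter which node the client happens to contact:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

// The consistency level policy is attached when the Keyspace handle is created.
Keyspace keyspace = HFactory.createKeyspace("myKeyspace", cluster, ccl);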





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: memory issue on 1.1.0

2012-06-06 Thread Tyler Hobbs
Just to check, do you have JNA setup correctly? (You should see a couple of
log messages about it shortly after startup.)  Truncate also performs a
snapshot by default.

On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L 
wade.l.poziom...@intel.com wrote:


 However, after all the work I issued a truncate on the old column family
 (the one replaced by this process) and I get an out of memory condition
 then.




-- 
Tyler Hobbs
DataStax http://datastax.com/


RE: memory issue on 1.1.0

2012-06-06 Thread Poziombka, Wade L
I believe so.  There are no warnings on startup.

So is there a preferred way to completely eliminate a column family?

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, June 06, 2012 1:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Just to check, do you have JNA setup correctly? (You should see a couple of log 
messages about it shortly after startup.)  Truncate also performs a snapshot by 
default.
On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L 
wade.l.poziom...@intel.commailto:wade.l.poziom...@intel.com wrote:
However, after all the work I issued a truncate on the old column family (the 
one replaced by this process) and I get an out of memory condition then.



--
Tyler Hobbs
DataStaxhttp://datastax.com/


Re: Removing a node in cluster

2012-06-06 Thread aaron morton
It depends on what you mean by remove (background info here 
http://www.datastax.com/docs/1.0/operations/cluster_management ) 

If you use nodetool decommission or nodetool removetoken the data will be 
redistributed. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 5:39 PM, Prakrati Agrawal wrote:

 Dear all
  
 I am trying to check the performance of Cassandra on adding or removing 
 nodes. I want to know what happens to my existing data if I remove a node ? 
 Please help me
  
 Thanks and Regards
  
 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
  
 



Re: How to include two nodes in Java code using Hector

2012-06-06 Thread aaron morton
The client does not have to know where the data is, that's what the cluster 
works out. See 
http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests

 Now I have decommissioned a node but now I don't know how to recommission it 
 .Please help me
http://www.datastax.com/docs/1.0/operations/cluster_management

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 6:12 PM, Prakrati Agrawal wrote:

 Thank you for the reply.
 Now I have decommissioned a node but now I don't know how to recommission it 
 .Please help me
 
 Thanks and Regards
 
 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
 
 -Original Message-
 From: Roshni Rajagopal [mailto:roshni.rajago...@wal-mart.com]
 Sent: Wednesday, June 06, 2012 11:42 AM
 To: user@cassandra.apache.org
 Subject: Re: How to include two nodes in Java code using Hector
 
 
 
 In Hector when you create a cluster using the API, you specify an IP address 
  cluster name. Thereafter internally which node serves the request or how 
 many nodes need to be contacted to read/write data depends on the cluster 
 configuration i.e. Whats your replication strategy, factor, consistency 
 levels for the col family , how many nodes are there in the ring etc. So you 
 don't individually need to connect to each node via Hector client. Once you 
 connect to the cluster  keyspace, via any IP add of any node in the cluster, 
 when you make Hector calls to read/write data, it would automatically figure 
 out the node level details and carry out the task. You won't get 50% of the 
 data, you will get all data.
 
 
 Also when you remove a node, your data will be unavailable ONLY if you don't 
 have it available in some other node as a replica..
 
 
 Regards,
 
 
 From: Prakrati Agrawal 
 prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
 Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
 user@cassandra.apache.orgmailto:user@cassandra.apache.org
 Date: Tue, 5 Jun 2012 20:05:21 -0700
 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
 user@cassandra.apache.orgmailto:user@cassandra.apache.org
 Subject: RE: How to include two nodes in Java code using Hector
 
 But the data is distributed on the nodes ( meaning 50% of data is on one node 
 and 50% of data is on another node) so I need to specify the node ip address 
 somewhere in the code. But where do I specify that is what I am clueless 
 about. Please help me
 
 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
 
 From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
 Sent: Tuesday, June 05, 2012 5:51 PM
 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
 Subject: RE: How to include two nodes in Java code using Hector
 
 Use Consistency Level =2.
 
 Regards
 Harsh
 
 From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
 Sent: Tuesday, June 05, 2012 4:08 PM
 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
 Subject: How to include two nodes in Java code using Hector
 
 Dear all
 
 I am using a two node Cassandra cluster. How do I code in Java using Hector 
 to get data from both the nodes. Please help
 
 Thanks and Regards
 
 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
 www.mu-sigma.comhttp://www.mu-sigma.com
 
 
 

Re: how to create keyspace using cassandra API's

2012-06-06 Thread aaron morton
You can use the CLI http://www.datastax.com/docs/1.0/dml/using_cli or CQL 
http://www.datastax.com/docs/1.0/dml/using_cql 
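
If it has to happen from application code instead of the CLI or CQL, a small Hector 
sketch of the same thing, guarding against the keyspace already existing (the 
one-argument keyspace definition defaults to SimpleStrategy with replication 
factor 1, so adjust as needed):

import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

// Create the keyspace on the server side only if it is missing, then obtain
// the Keyspace handle that the failing createKeyspace() call expects.
if (cluster.describeKeyspace("test") == null) {
    KeyspaceDefinition ksDef = HFactory.createKeyspaceDefinition("test");
    cluster.addKeyspace(ksDef, true);   // true = block until the schema change completes
}
Keyspace keyspace = HFactory.createKeyspace("test", cluster);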

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 9:00 PM, Prakrati Agrawal wrote:

 You have to create the keyspace manually first using Cassandra cli
  
 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
  
 From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in] 
 Sent: Wednesday, June 06, 2012 2:27 PM
 To: user@cassandra.apache.org
 Subject: how to create keyspace using cassandra API's
  
 Hi All,
 
  I am using Hector as a client in Cassandra, and I am trying to create a Keyspace
  using the following API:
  Keyspace keyspace = HFactory.createKeyspace("test", cluster);
  but it is showing the following error:
  caused by: InvalidRequestException(why:Keyspace test does not exist)
  Can anybody help me with how to create a keyspace in Cassandra?
 
 Regards
 Arshad
 



Re: Node decomission failed

2012-06-06 Thread aaron morton
Take a look in the logs for .185 and check for errors. 

Run nodetool ring from node .62 to see if it thinks .185 is in the ring. 

If all looks good, try to decommission again. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 12:32 AM, Marc Canaleta wrote:

 Hi,
 
 We are testing Cassandra and tried to remove a node from the cluster using 
 nodetool decomission. The node transferred the data, then died for about 20 
 minutes without responding, then came back to life with a load of 50-100, was 
 in a heavy load during about 1 hour and then returned to normal load. It 
 seems to have stopped receiving new data but it is still in the cluster.
 
 The node we tried to remove is the third one:
 
 root@dc-cassandra-03:~# nodetool ring
 Note: Ownership information does not include topology, please specify a 
 keyspace. 
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070484 
 10.70.147.62datacenter1 rack1   Up Normal  7.14 GB 33.33% 
  0   
 10.208.51.64datacenter1 rack1   Up Normal  3.68 GB 33.33% 
  56713727820156410577229101238628035242  
 10.190.207.185  datacenter1 rack1   Up Normal  3.54 GB 33.33% 
  113427455640312821154458202477256070484
 
 
 It seems it is still part of the cluster. What should we do? decomission 
 again?
 
 How can we know the current state of the cluster?
 
 Thanks!



Re: MeteredFlusher in system.log entries

2012-06-06 Thread aaron morton
You question was
 Could you please throw light on the what conditions does
 MeteredFlusher use to trigger memtable flushes.

The answer is that estimates of the ratio between the live size and the serialised 
size of memtables are kept. The MeteredFlusher periodically checks the 
serialised size of all memtables and uses the ratio to determine if 
memtable_total_space_in_mb has been reached. If there is a variation between 
nodes it may be that some are getting more traffic than others. 

 So it implies that for flushing, Cassandra copies the memtables content.

 
No

 So does this imply that writes to column families are not stopped even
when it is being flushed?
Yes. 
In a worst case scenario writes will block if the memtable flushing cannot keep 
up. 

 Also, Could someone please explain how the factor of 7 comes in the
picture in this sentence

In the example (see previous para) 7 is the number of memtables the CF could 
have in memory at once (forgetting about the other cf's).  

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 1:08 AM, rohit bhatia wrote:

 Also, Could someone please explain how the factor of 7 comes in the
 picture in this sentence
 
 For example if memtable_total_space_in_mb is 100MB, and
 memtable_flush_writers is the default 1 (with one data directory), and
 memtable_flush_queue_size is the default 4, and a Column Family has no
 secondary indexes. The CF will not be allowed to get above one seventh
 of 100MB or 14MB, as if the CF filled the flush pipeline with 7
 memtables of this size it would take 98MB. 
 
 On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia rohit2...@gmail.com wrote:
 Hi..
 
 the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
 mentions that From version 0.7 onwards the worse case scenario is up
 to CF Count + Secondary Index Count + memtable_flush_queue_size
 (defaults to 4) + memtable_flush_writers (defaults to 1 per data
 directory) memtables in memory the JVM at once..
 
 So it implies that for flushing, Cassandra copies the memtables content.
 So does this imply that writes to column families are not stopped even
 when it is being flushed?
 
 Thanks
 Rohit
 
 On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote:
 Hi Aaron
 
 Thanks for the link, I have gone through it. But this doesn't justify
 nodes of exactly same config/specs differing in their flushing
 frequency.
 The traffic on all node is same as we are using RandomPartitioner
 
 Thanks
 Rohit
 
 On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 See the section on memtable_total_space_in_mb here
  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
 
 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 6/06/2012, at 2:27 AM, rohit bhatia wrote:
 
 I am trying to understand the variance in flushes frequency in a 8
 node Cassandra cluster.
 All the flushes are of the same type and initiated by MeteredFlusher.java 
 =
 
 INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
 (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
 ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)
 [taken from system.log]
 
 Number of flushes for 1 column family vary from 6 flushes per day to
 24 flushes per day among nodes of same configuration and same
 hardware.
 Could you please throw light on the what conditions does
 MeteredFlusher use to trigger memtable flushes.
 Also how accurate is the estimated size in the above logfile entry.
 
 Regards
 Rohit Bhatia
 Software Engineer, Media.net
 
 



Re: memory issue on 1.1.0

2012-06-06 Thread aaron morton
use drop. 

truncate is mostly for unit tests.
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
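
If the column family has to go away entirely from Hector code, the drop can be 
issued through the Cluster handle; a minimal sketch (names are placeholders):

// Removes the column family definition and its data; with auto_snapshot enabled
// (the default) Cassandra still takes a snapshot of it first.
cluster.dropColumnFamily("myKeyspace", "old_cf");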

On 7/06/2012, at 6:22 AM, Poziombka, Wade L wrote:

 I believe so.  There are no warnings on startup. 
  
 So is there a preferred way to completely eliminate a column family?
  
 From: Tyler Hobbs [mailto:ty...@datastax.com] 
 Sent: Wednesday, June 06, 2012 1:17 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
  
 Just to check, do you have JNA setup correctly? (You should see a couple of 
 log messages about it shortly after startup.)  Truncate also performs a 
 snapshot by default.
 
 On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 However, after all the work I issued a truncate on the old column family (the 
 one replaced by this process) and I get an out of memory condition then.
 
 
 
 -- 
 Tyler Hobbs
 DataStax
 



Re: Nodes not picking up data on repair, disk loaded unevenly

2012-06-06 Thread Luke Hospadaruk
Another little question:

I just added some EBS volumes to the nodes that are particularly choked and I 
am now running major compactions on those nodes (and all is well so far).  Once 
everything gets back down to a normal size, can I move all the data back off 
the ebs volumes?
something along the lines of:

nodetool -h localhost drain
stop cassandra
remove ebs volumes from cassandra conf
cp -r /recovery/* /mnt/data
unmount/detatch/delete ebs volume
start cassandra

Then add some more nodes to the cluster to keep this from happening in the 
future.

I assume all the files stored in any of the data directories are uniquely 
named and Cassandra won't really care where they are as long as everything it 
wants is in its data directories.

I was also thinking of copying my column families (using thrift or the like) to 
fresh column families to undo any strangeness done by my major compactions, 
then getting rid of the old CFs once everything is hunky-dory.

Luke


Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-06 Thread Jim Ancona
On Tue, Jun 5, 2012 at 4:30 PM, Jim Ancona j...@anconafamily.com wrote:

 It might be a good idea for the documentation to reflect the tradeoffs
 more clearly.


Here's a proposed addition to the Secondary Index FAQ at
http://wiki.apache.org/cassandra/SecondaryIndexes

Q: How does choice of Consistency Level affect cluster availability when
using secondary indexes?
A: Because secondary indexes are distributed, you must have CL level nodes
available for *all* token ranges in the cluster in order to complete a
query. For example, with RF = 3, when two out of three consecutive nodes in
the ring are unavailable, *all* secondary index queries at CL = QUORUM will
fail, however secondary index queries at CL = ONE will succeed. This is
true regardless of cluster size.

Comments?

Jim


Cassandra 1.1.1 Fails to Start

2012-06-06 Thread Javier Sotelo
Hi All,

On SuSe Linux blade with 6GB of RAM.

with disk_access_mode mmap_index_only and mmap I see OOM map failed error
on SSTableBatchOpen thread. cat /proc/pid/maps shows a peak of 53521
right before it dies. vm.max_map_count = 1966080 and /proc/pid/limits
shows unlimited locked memory.

with disk_access_mode standard, the node does start up but I see the
repeated error:
ERROR [CompactionExecutor:6] 2012-06-06 20:24:19,772
AbstractCassandraDaemon.java (line 134) Exception in thread
Thread[CompactionExecutor:6,1,main]
java.lang.StackOverflowError
at com.google.common.collect.Sets$1.iterator(Sets.java:578)
at com.google.common.collect.Sets$1.iterator(Sets.java:578)
at com.google.common.collect.Sets$1.iterator(Sets.java:578)
...

I'm not sure the second error is related to the first. I prefer to run with
full mmap but I have run out of ideas. Is there anything else I can do to
debug this?

Here's startup settings from debug log:
 INFO [main] 2012-06-06 20:17:10,267 AbstractCassandraDaemon.java (line
121) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_31
 INFO [main] 2012-06-06 20:17:10,267 AbstractCassandraDaemon.java (line
122) Heap size: 1525415936/1525415936
 ...
 INFO [main] 2012-06-06 20:17:10,946 CLibrary.java (line 111) JNA mlockall
successful
 ...
 INFO [main] 2012-06-06 20:17:11,055 DatabaseDescriptor.java (line 191)
DiskAccessMode is standard, indexAccessMode is standard
 INFO [main] 2012-06-06 20:17:11,213 DatabaseDescriptor.java (line 247)
Global memtable threshold is enabled at 484MB
 INFO [main] 2012-06-06 20:17:11,499 CacheService.java (line 96)
Initializing key cache with capacity of 72 MBs.
 INFO [main] 2012-06-06 20:17:11,509 CacheService.java (line 107)
Scheduling key cache save to each 14400 seconds (going to save all keys).
 INFO [main] 2012-06-06 20:17:11,510 CacheService.java (line 121)
Initializing row cache with capacity of 0 MBs and provider
org.apache.cassandra.cache.SerializingCacheProvider
 INFO [main] 2012-06-06 20:17:11,513 CacheService.java (line 133)
Scheduling row cache save to each 0 seconds (going to save all keys).

Thanks In Advance,
Javier


Re: how to create keyspace using cassandra API's

2012-06-06 Thread Abhijit Chanda
You can use the Astyanax API. These sorts of minor issues are resolved in that API.

Regards,
Abhijit