Query
Hi All, I am using the Hector client for Cassandra. I wanted to know how to create a keyspace and column family using the API to read and write data, or do I have to create the keyspace and column family manually using the command-line interface?

Regards,
Arshad
Re: How to include two nodes in Java code using Hector
In Hector, when you create a cluster using the API, you specify an IP address and a cluster name. Thereafter, which node serves a request, and how many nodes need to be contacted to read or write data, depends internally on the cluster configuration: the replication strategy, the replication factor, the consistency levels for the column family, how many nodes are in the ring, and so on. So you don't need to connect to each node individually via the Hector client. Once you connect to the cluster and keyspace, via the IP address of any node in the cluster, Hector calls to read or write data will automatically figure out the node-level details and carry out the task. You won't get 50% of the data; you will get all the data. Also, when you remove a node, your data will be unavailable ONLY if it is not available on some other node as a replica.

Regards,

From: Prakrati Agrawal prakrati.agra...@mu-sigma.com
Reply-To: user@cassandra.apache.org
Date: Tue, 5 Jun 2012 20:05:21 -0700
To: user@cassandra.apache.org
Subject: RE: How to include two nodes in Java code using Hector

But the data is distributed across the nodes (meaning 50% of the data is on one node and 50% on the other), so I need to specify the node IP address somewhere in the code. But where do I specify that is what I am clueless about. Please help me.

Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com

From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
Sent: Tuesday, June 05, 2012 5:51 PM
To: user@cassandra.apache.org
Subject: RE: How to include two nodes in Java code using Hector

Use Consistency Level = 2.
Regards,
Harsh

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
Sent: Tuesday, June 05, 2012 4:08 PM
To: user@cassandra.apache.org
Subject: How to include two nodes in Java code using Hector

Dear all,

I am using a two-node Cassandra cluster. How do I code in Java, using Hector, to get data from both nodes? Please help.

Thanks and Regards,
Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com

This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.

This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***
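As a minimal sketch of the point above (Hector 1.x API; the cluster name, contact host, and keyspace name here are assumptions, not values from this thread), the client hands Hector a single contact point and Hector handles the node-level routing:

```java
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ConnectSketch {
    public static void main(String[] args) {
        // One contact point is enough -- you do not list every node in code.
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "10.0.0.1:9160");

        // Reads and writes through this Keyspace are routed to whichever
        // nodes own the data, per the replication settings on the server.
        Keyspace keyspace = HFactory.createKeyspace("myKeyspace", cluster);
    }
}
```

This is why a two-node cluster needs no per-node code: the ring topology is resolved on the server side, not in the client.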
RE: How to include two nodes in Java code using Hector
Thank you for the reply. Now I have decommissioned a node, but I don't know how to recommission it. Please help me.

Thanks and Regards,
Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com

-----Original Message-----
From: Roshni Rajagopal [mailto:roshni.rajago...@wal-mart.com]
Sent: Wednesday, June 06, 2012 11:42 AM
To: user@cassandra.apache.org
Subject: Re: How to include two nodes in Java code using Hector
How to make a decommissioned node join the ring again
Dear all,

I decommissioned a node. Now I want to make that node part of the ring again. How do I do it? Please help me.

Thanks and Regards,
Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com
Re: memory issue on 1.1.0
Mina,
That does not sound right. If you have the time, can you create a Jira ticket describing the problem? Please include:

* the GC logs gathered by enabling them here: https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L165 (it would be good to catch the node getting into trouble, if possible)
* OS, JVM and Cassandra versions
* information on the schema and workload
* anything else you think is important.

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 7:24 AM, Mina Naguib wrote:

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory. As ParNew and CMS do their job, over any reasonable period of time the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, Cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc. - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem. I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.

What I have found, however, is a band-aid, which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that ParNew and CMS fail to reclaim.
On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.

On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

Alas, upgrading to 1.1.1 did not solve my issue.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741
-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote:

Running a very write-intensive (new column, delete old column, etc.) process and failing on memory. Log file attached. Curiously, I have never seen this when adding new data; I have in the past sent hundreds of millions of new transactions. It seems to happen when I modify. My process is as follows: use a key slice to get the columns to modify, in batches of 100; in separate threads, modify those columns. I advance the slice start key each time with the last key of the previous batch. The mutations done are: update a column value in one column family (token), and delete a column and add a new column in another (pan). It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory.
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable
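For reference, the GC logging options Aaron points to live in conf/cassandra-env.sh; once uncommented, the block looks roughly like this in the 1.1-era file (the exact flag set varies slightly by version, and the log path below is an assumption):

```shell
# GC logging options in conf/cassandra-env.sh -- uncomment to enable
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# log destination is an assumption -- point it wherever you collect logs
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

Restart the node after editing; the resulting log shows each ParNew and CMS pause with heap occupancy before and after, which is what a Jira ticket on this kind of heap growth needs.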
how to create keyspace using cassandra API's
Hi All, I am using Hector as a client for Cassandra, and I am trying to create a keyspace using the following API call:

Keyspace keyspace = HFactory.createKeyspace("test", cluster);

but it shows the following error:

Caused by: InvalidRequestException(why: Keyspace test does not exist)

Can anybody help me with how to create a keyspace in Cassandra?

Regards,
Arshad
Re: Query
Hi, the Javadoc (or source code) of the me.prettyprint.hector.api.factory.HFactory class contains all the examples needed to create keyspaces and column families.

To create a keyspace:

String testKeyspace = "testKeyspace";
KeyspaceDefinition newKeyspace = HFactory.createKeyspaceDefinition(testKeyspace);
cluster.addKeyspace(newKeyspace);

To create a keyspace together with a column family:

String keyspace = "testKeyspace";
String column1 = "testcolumn";
ColumnFamilyDefinition columnFamily1 = HFactory.createColumnFamilyDefinition(keyspace, column1);
List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>();
columns.add(columnFamily1);
KeyspaceDefinition testKeyspace = HFactory.createKeyspaceDefinition(keyspace, org.apache.cassandra.locator.SimpleStrategy.class.getName(), 1, columns);
cluster.addKeyspace(testKeyspace);

--
Filippo Diotalevi

On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote:

Hi All, I am using the Hector client for Cassandra. I wanted to know how to create a keyspace and column family using the API to read and write data, or do I have to create the keyspace and column family manually using the command-line interface? Regards, Arshad
RE: how to create keyspace using cassandra API's
You have to create the keyspace manually first, using the Cassandra CLI.

Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com

From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in]
Sent: Wednesday, June 06, 2012 2:27 PM
To: user@cassandra.apache.org
Subject: how to create keyspace using cassandra API's

Hi All, I am using Hector as a client for Cassandra, and I am trying to create a keyspace using Keyspace keyspace = HFactory.createKeyspace("test", cluster); but it shows the error: Caused by: InvalidRequestException(why: Keyspace test does not exist). Can anybody help me with how to create a keyspace in Cassandra? Regards, Arshad
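For reference, the equivalent cassandra-cli session might look like this (1.x CLI syntax; the keyspace name, column family name, and replication factor are placeholders, not values from this thread):

```
create keyspace test
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = {replication_factor : 1};
use test;
create column family testcolumn with comparator = UTF8Type;
```

Note that, as the other reply in this digest shows, the keyspace can also be created programmatically via HFactory.createKeyspaceDefinition and cluster.addKeyspace; the CLI is simply the manual route.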
Re: memory issue on 1.1.0
I looked through the log again. It still looks like it's overloaded and not handling the overload very well. It looks like a sustained write load of around 280K columns every 5 minutes for about 5 hours. It may be that the CPU is the bottleneck when it comes to GC throughput. You are hitting ParNew issues from the very start, and end up with 20-second CMS pauses.

Do you see high CPU load? Can you enable the GC logging options in cassandra-env.sh? Can you throttle the test back to a level where the server does not fail? Alternatively, can you dump the heap when it gets full and see what is taking up all the space?

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 2:12 PM, Poziombka, Wade L wrote:

Ok, so I have completely refactored to remove deletes and it still fails. So it is completely unrelated to deletes. I guess I need to go back to 1.0.10? When I originally evaluated I ran 1.0.8... perhaps I went a bridge too far with 1.1. I don't think I am doing anything exotic here. Here is my column family.
KsDef(name:TB_UNIT, strategy_class:org.apache.cassandra.locator.SimpleStrategy, strategy_options:{replication_factor=3}, cf_defs:[
  CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, comparator_type:BytesType, column_metadata:[
    ColumnDef(name:70 61 6E 45 6E 63, validation_class:BytesType),
    ColumnDef(name:63 72 65 61 74 65 54 73, validation_class:DateType),
    ColumnDef(name:63 72 65 61 74 65 44 61 74 65, validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate),
    ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, validation_class:UTF8Type, index_type:KEYS, index_name:EncryptionSettingsID)], caching:keys_only),
  CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, column_type:Standard, comparator_type:BytesType, caching:keys_only),
  CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, comparator_type:BytesType, column_metadata:[
    ColumnDef(name:75 73 65 43 6F 75 6E 74, validation_class:CounterColumnType)], default_validation_class:CounterColumnType, caching:keys_only)
])

-----Original Message-----
From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com]
Sent: Tuesday, June 05, 2012 3:09 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

Thank you. I do have some of the same observations. Do you do deletes? My observation is that without deletes (or column updates, I guess) I can run forever, happily; but when I run operations that delete and modify column values (what for me is a batch process), I run into this. Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice is to NOT do deletes individually but to truncate instead. I am scrambling to try to do this, but I am curious whether it will be worth the effort.
Wade

-----Original Message-----
From: Mina Naguib [mailto:mina.nag...@bloomdigital.com]
Sent: Tuesday, June 05, 2012 2:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.
Cassandra not retrieving the complete data on 2 nodes
Dear all,

I originally had a 1-node cluster. Then I added one more node to it, with the initial token configured appropriately. Now when I run my queries I am not getting all my data, i.e. all columns.

Output on 2 nodes:
Time taken to retrieve 43707 columns of key range is 1276
Time taken to retrieve 2084199 columns of all tickers is 54334
Time taken to count is 230776
Total number of rows in the database are 183
Total number of columns in the database are 7903753

Output on 1 node:
Time taken to retrieve 43707 columns of key range is 767
Time taken to retrieve 382 columns of all tickers is 52793
Time taken to count is 268135
Total number of rows in the database are 396
Total number of columns in the database are 16316426

Please help me. Where is my data going, or how should I retrieve it?

Thanks and Regards,
Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com
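One thing worth sanity-checking after growing a ring: with the default RandomPartitioner, a balanced N-node cluster should give node i (0-based) the initial token i * 2^127 / N. A small sketch in plain Java (no Cassandra dependency; the class and method names are mine) computes what those tokens should be:

```java
import java.math.BigInteger;

public class InitialTokens {
    // RandomPartitioner's token space is [0, 2^127); a balanced ring gives
    // node i (0-based) of an N-node cluster the token i * 2^127 / N.
    static BigInteger token(int i, int n) {
        return BigInteger.ONE.shiftLeft(127)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(n));
    }

    public static void main(String[] args) {
        // Two-node ring: node 0 keeps token 0, node 1 gets 2^126.
        System.out.println("node 1 of 2: " + token(1, 2));
        // prints node 1 of 2: 85070591730234615865843651857942052864
    }
}
```

If nodetool ring shows both nodes owning ~50% with tokens like these, the missing columns are more likely a query-side issue (for example, reading at a low consistency level from a node that has not been repaired) than a token problem.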
Why Hector is taking more time than Thrift
Dear all,

I am trying to evaluate the performance of Cassandra and wrote code to retrieve a complete row (having 43707 columns) using Thrift and using Hector. The Thrift client code took 0.767 seconds, while the Hector code took 0.883 seconds. Is it expected that Hector will be slower than Thrift? If yes, then why are we using Hector and not Thrift?

Thanks and Regards,
Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com
Re: Why Hector is taking more time than Thrift
Hector is a higher-level client that provides some abstraction and an easy-to-use interface. The Thrift API is pretty raw. So for most cases the Hector client is the best choice, except for use cases where ultimate performance is a requirement (at the cost of a lot more maintenance as the Thrift API changes).

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com

Dear all, I am trying to evaluate the performance of Cassandra and wrote code to retrieve a complete row (having 43707 columns) using Thrift and Hector. The Thrift client code took 0.767 seconds, while the Hector code took 0.883 seconds. Is it expected that Hector will be slower than Thrift? If yes, then why are we using Hector and not Thrift? Thanks and Regards, Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com

--
With kind regards,
Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl
Re: Nodes not picking up data on repair, disk loaded unevenly
You are basically in trouble. If you can nuke it and start again, that would be easier. If you want to figure out how to get out of it, keep the cluster up and have a play.

> What I think the solution should be:
You want to get repair to work before you start deleting data.

> At ~840GB I'm probably running close to the max load I should have on a node
Roughly 300GB to 400GB is the max load.

> On node #1 I was able to successfully run a scrub and major compaction
In this situation, running a major compaction is not what you want. It creates a huge file that can only be compacted if there is enough space for another huge file. Smaller files only need a small amount of space to be compacted.

> Is there something I should be looking for in the logs to verify that the repair was successful?
grep for "repair command".

The shortcut on EC2 is to add an EBS volume, tell Cassandra it can store stuff there (in the yaml), and buy some breathing room.

What version are you using? Have there been times when nodes were down?

Clear as much space as possible from the disk. Check for snapshots in all keyspaces. Which keyspaces (including the system keyspace) are taking up the most space? Are there a lot of hints in the system keyspace (they are not replicated)? Try to get a feel for which CFs are taking up the space, or not, as the case may be. Look in nodetool cfstats to see how big the rows are. If you have enabled compression, run nodetool upgradesstables to compress the existing data.

In general, try to get free space on the nodes by using compaction, moving files to a new mount, etc., so that you can get repair to run.

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 6:53 AM, Luke Hospadaruk wrote:

I have a 4-node cluster with one keyspace (aside from the system keyspace) with the replication factor set to 4. The disk usage between the nodes is pretty wildly different and I'm wondering why.
It's becoming a problem because one node is getting to the point where it sometimes fails to compact because it doesn't have enough space. I've been doing a lot of experimenting with the schema, adding/dropping things, changing settings around (not ideal, I realize, but we're still in development). In an ideal world, I'd launch another cluster (this is all hosted in Amazon), copy all the data to that, and just get rid of my current cluster, but the current cluster is in use by some other parties, so rebuilding everything is impractical (although possible if it's the only reliable solution).

$ nodetool -h localhost ring
Address    DC         Rack   Status  State   Load       Owns    Token
1.xx.xx.xx Cassandra  rack1  Up      Normal  837.8 GB   25.00%  0
2.xx.xx.xx Cassandra  rack1  Up      Normal  1.17 TB    25.00%  42535295865117307932921825928971026432
3.xx.xx.xx Cassandra  rack1  Up      Normal  977.23 GB  25.00%  85070591730234615865843651857942052864
4.xx.xx.xx Cassandra  rack1  Up      Normal  291.2 GB   25.00%  127605887595351923798765477786913079296

Problems I'm having:
Nodes are running out of space and are apparently unable to perform compactions because of it. These machines have 1.7T total space each. The logs for node #2 have a lot of warnings about insufficient space for compaction. Node #4 was so extremely out of space (Cassandra was failing to start because of it) that I removed all the SSTables for one of the less essential column families just to bring it back online. I have (since I started noticing these issues) enabled compression for all my column families. On node #1 I was able to successfully run a scrub and major compaction, so I suspect that the disk usage for node #1 is about where all the other nodes should be. At ~840GB I'm probably running close to the max load I should have on a node, so I may need to launch more nodes into the cluster, but I'd like to get things straightened out before I introduce more potential issues (token moving, etc).
Node #4 seems not to be picking up all the data it should have (since the replication factor == number of nodes, the load should be roughly the same?). I've run repairs on that node to seemingly no avail (after repair finishes, it still has about the same disk usage, which is much too low).

-What I think the solution should be:
One node at a time:
1) nodetool drain the node
2) shut down cassandra on the node
3) wipe out all the data in my keyspace on the node
4) bring cassandra back up
5) nodetool repair

-My concern:
This is basically what I did with node #4 (although I didn't drain, and I didn't wipe the entire keyspace), and it doesn't seem to have regained all the data it's supposed to have after the repair. The column family should have at least 200-300GB of data, and the SSTables in the data directory only total about 11GB. Am I missing something? Is there a way to verify that a node _really_ has all the data
RE: Cassandra not retrieving the complete data on 2 nodes
Please anyone reply to my query.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 2:34 PM To: user@cassandra.apache.org Subject: Cassandra not retrieving the complete data on 2 nodes

Dear all, I was originally having a 1 node cluster. Then I added one more node to it with initial token configured appropriately. Now when I run my queries I am not getting all my data, i.e. all columns.

Output on 2 nodes:
Time taken to retrieve 43707 columns of key range is 1276
Time taken to retrieve 2084199 columns of all tickers is 54334
Time taken to count is 230776
Total number of rows in the database are 183
Total number of columns in the database are 7903753

Output on 1 node:
Time taken to retrieve 43707 columns of key range is 767
Time taken to retrieve 382 columns of all tickers is 52793
Time taken to count is 268135
Total number of rows in the database are 396
Total number of columns in the database are 16316426

Please help me. Where is my data going, or how should I retrieve it? Thanks and Regards Prakrati

This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
RE: Query
Hi, After creating the keyspace successfully, now I want to know how to read and write data using the APIs. Regards Arshad

From: Filippo Diotalevi [fili...@ntoklo.com] Sent: Wednesday, June 06, 2012 2:27 PM To: user@cassandra.apache.org Subject: Re: Query

Hi, the Javadoc (or source code) of the me.prettyprint.hector.api.factory.HFactory class contains all the examples to create keyspaces and column families.

To create a keyspace:

String testKeyspace = "testKeyspace";
KeyspaceDefinition newKeyspace = HFactory.createKeyspaceDefinition(testKeyspace);
cluster.addKeyspace(newKeyspace);

To create a column family and a keyspace:

String keyspace = "testKeyspace";
String column1 = "testcolumn";
ColumnFamilyDefinition columnFamily1 = HFactory.createColumnFamilyDefinition(keyspace, column1);
List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>();
columns.add(columnFamily1);
KeyspaceDefinition testKeyspace = HFactory.createKeyspaceDefinition(keyspace, org.apache.cassandra.locator.SimpleStrategy.class.getName(), 1, columns);
cluster.addKeyspace(testKeyspace);

-- Filippo Diotalevi

On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote: Hi All, I am using the Hector client for Cassandra. I wanted to know how to create a keyspace and column family using the APIs to read and write data, or do I have to create the keyspace and column family manually using the command line interface? Regards Arshad
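Once the keyspace and column family exist, reads and writes go through Hector's Mutator and ColumnQuery types. A minimal sketch follows; it assumes a Hector 1.x jar on the classpath and a node reachable at the Thrift port, and the cluster name, host address, row key and column names are placeholders, so adapt them to your setup (it will only run against a live cluster):

```java
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorReadWrite {
    public static void main(String[] args) {
        // Connect once; Hector routes requests to the right nodes internally.
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
        Keyspace ksp = HFactory.createKeyspace("testKeyspace", cluster);

        // Write: insert one string column into the "testcolumn" column family.
        Mutator<String> mutator = HFactory.createMutator(ksp, StringSerializer.get());
        mutator.insert("rowKey1", "testcolumn",
                HFactory.createStringColumn("colName", "colValue"));

        // Read the same column back.
        ColumnQuery<String, String, String> query = HFactory.createStringColumnQuery(ksp);
        query.setColumnFamily("testcolumn").setKey("rowKey1").setName("colName");
        QueryResult<HColumn<String, String>> result = query.execute();
        if (result.get() != null) {
            System.out.println(result.get().getValue());
        }
    }
}
```

The same Mutator can batch many insertions via addInsertion() followed by a single execute(), which is the usual pattern for bulk writes.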
Problem in getting data from a 2 node cluster
Dear all, I had a 1 node cluster. Then I added 1 more node to it. When I ran my query on the 1 node cluster I got all my data, but when I ran my query on the 2 node cluster (Hector code) I am not getting the same data. How do I ensure that my Hector code retrieves data from all the nodes?

Also, when I decommission my node and then add it again I get the following message: "This node will not auto bootstrap because it is configured to be a seed node". Please tell me the meaning of it also.

Thanks and Regards Prakrati

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
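On the bootstrap warning: a node listed in its own seeds entry is treated as a seed and skips the bootstrap phase, so it rejoins the ring without streaming data from its neighbours; running nodetool repair afterwards fills it back in. A cassandra.yaml sketch of the relevant settings (the IP addresses here are illustrative, and auto_bootstrap defaults to true in recent versions even when absent from the file):

```yaml
# cassandra.yaml (fragment) - illustrative addresses
auto_bootstrap: true   # ignored when this node appears in its own seed list
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Point a joining node at an existing node, not at itself,
          # if you want it to bootstrap (stream its data) on startup.
          - seeds: "162.192.100.16"
```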
unsubscribe
-- Cyril SCETBON
Re: unsubscribe
On 6/6/12 12:13 PM, Cyril Scetbon wrote: sorry for that -- Cyril SCETBON
Re: Problem in getting data from a 2 node cluster
Did you run repair on the new node?

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com: Dear all, I had a 1 node cluster. Then I added 1 more node to it. When I ran my query on 1 node cluster I got all my data but when I ran my query on the 2 node cluster (Hector code) I am not getting the same data. How do I ensure that my Hector code retrieves data from all the nodes. Also when I decommission my node and then add it again I get the following message. This node will not auto bootstrap because it is configured to be a seed node Please tell me the meaning of it also Thanks and Regards Prakrati

-- With kind regards, Robin Verlangen, Software engineer. W http://www.robinverlangen.nl E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
RE: Problem in getting data from a 2 node cluster
What does repair do?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nl] Sent: Wednesday, June 06, 2012 3:56 PM To: user@cassandra.apache.org Subject: Re: Problem in getting data from a 2 node cluster

Did you run repair on the new node?
RE: Problem in getting data from a 2 node cluster
When I run the nodetool command I get the following information:

./nodetool -h localhost ring
Address         DC           Rack   Status  State   Load       Effective-Ownership  Token
                                                                                    85070591730234615865843651857942052864
162.192.100.16  datacenter1  rack1  Up      Normal  238.22 MB  50.00%               0
162.192.100.48  datacenter1  rack1  Up      Normal  115.6 MB   50.00%               85070591730234615865843651857942052864

Please help me. Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
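The two tokens in that ring output are exactly the balanced initial tokens for a two-node RandomPartitioner cluster, computed as i * 2^127 / N for node i of N (the 4-node ring earlier in this digest follows the same formula). A stdlib-only sketch of the calculation:

```java
import java.math.BigInteger;

public class InitialTokens {
    // Balanced initial token for node i of n under RandomPartitioner:
    // token(i) = i * 2^127 / n
    static BigInteger token(int i, int n) {
        return BigInteger.valueOf(2).pow(127)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(n));
    }

    public static void main(String[] args) {
        // Tokens for a 2-node ring, matching the nodetool output above.
        for (int i = 0; i < 2; i++) {
            System.out.println(token(i, 2));
        }
    }
}
```

With balanced tokens each node owns an equal slice of the ring; the unequal Load figures (238 MB vs 115 MB) therefore come from replication or repair state, not from token placement.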
Re: Problem in getting data from a 2 node cluster
Repair ensures that all data is consistent and available on the node.

-- With kind regards, Robin Verlangen, Software engineer. W http://www.robinverlangen.nl E ro...@us2.nl
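Repair fixes replicas after the fact; consistency levels control how many replicas each request must touch up front. The standard rule is that a read sees the latest write whenever read replicas + write replicas > replication factor. A small stdlib-only check of that rule (the RF and CL numbers are illustrative, not taken from this cluster):

```java
public class ConsistencyCheck {
    // R + W > RF guarantees the read and write replica sets overlap,
    // so a read contacting R replicas always includes at least one
    // replica that acknowledged the most recent write at level W.
    static boolean overlaps(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 2; // e.g. a 2-node cluster with RF=2
        System.out.println(overlaps(1, 1, rf)); // ONE/ONE: no overlap, eventual only
        System.out.println(overlaps(2, 1, rf)); // read ALL, write ONE: overlap
        System.out.println(overlaps(2, 2, rf)); // QUORUM(=2)/QUORUM(=2): overlap
    }
}
```

Note this only governs staleness among replicas; if the keyspace has RF=1, each row lives on exactly one node and no consistency level can make a single node return the whole dataset.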
RE: Problem in getting data from a 2 node cluster
Yes I ran nodetool repair also. Still the same problem: I am getting lesser data when using my code on a 2 node cluster. Please help me.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
RE: Problem in getting data from a 2 node cluster
I even used CassandraHostConfigurator and added a string of hosts, but still the same issue. Please someone help me.

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
RE: Problem in getting data from a 2 node cluster
I will repeat my query once again: I had a 1 node cluster. Then I added 1 more node to it. When I ran my query on 1 node cluster I got all my data but when I ran my query on the 2 node cluster (Hector code) I am not getting the same data. How do I ensure that my Hector code retrieves data from all the nodes. Also when I decommission my node and then add it again I get the following message. This node will not auto bootstrap because it is configured to be a seed node Please tell me the meaning of it also The things I already tried are: 1. Used CassandraHostConfigurator - Still same issue 2. Used nodetool repair on both the nodes - Still same issue Please help me out. I am badly stuck Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 4:41 PM To: user@cassandra.apache.org Subject: RE: Problem in getting data from a 2 node cluster I even used CassandraHostConfigurator and added a string of hosts but still the same issue. Please someone help me Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 4:04 PM To: user@cassandra.apache.org Subject: RE: Problem in getting data from a 2 node cluster Yes I ran nodetool repair also. Still the same problem I am getting lesser data when using my code on a 2 node cluster. Please help me Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: R. Verlangen [mailto:ro...@us2.nl] Sent: Wednesday, June 06, 2012 4:01 PM To: user@cassandra.apache.org Subject: Re: Problem in getting data from a 2 node cluster Repair ensures that all data is consistent and available on the node. 
2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com When I run the nodetool command I get the following information:

    ./nodetool -h localhost ring
    Address         DC          Rack   Status  State   Load       Effective-Ownership  Token
                                                                                       85070591730234615865843651857942052864
    162.192.100.16  datacenter1 rack1  Up      Normal  238.22 MB  50.00%               0
    162.192.100.48  datacenter1 rack1  Up      Normal  115.6 MB   50.00%               85070591730234615865843651857942052864

Please help me Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 3:55 PM To: user@cassandra.apache.org Subject: RE: Problem in getting data from a 2 node cluster What does repair do? Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: R. Verlangen [mailto:ro...@us2.nl] Sent: Wednesday, June 06, 2012 3:56 PM To: user@cassandra.apache.org Subject: Re: Problem in getting data from a 2 node cluster Did you run repair on the new node? 2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com Dear all, I had a 1 node cluster. Then I added 1 more node to it. When I ran my query on the 1 node cluster I got all my data, but when I ran my query on the 2 node cluster (Hector code) I am not getting the same data. How do I ensure that my Hector code retrieves data from all the nodes? Also when I decommission my node and then add it again I get the following message: This node will not auto bootstrap because it is configured to be a seed node Please tell me the meaning of it also Thanks and Regards Prakrati Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com This email message may contain proprietary, private and confidential information.
The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software. -- With kind regards, Robin Verlangen Software engineer W http://www.robinverlangen.nl E ro...@us2.nl
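For reference, the 50.00%/50.00% ownership in the ring output comes from the two tokens splitting the RandomPartitioner range (0 to 2^127) evenly. A self-contained check of the usual balanced-token formula, token(i) = (2^127 / N) * i, reproduces the tokens shown by nodetool:

```java
import java.math.BigInteger;

public class BalancedTokens {
    // Evenly spaced initial tokens for RandomPartitioner (range 0 .. 2^127 - 1).
    static BigInteger token(int i, int nodeCount) {
        return BigInteger.ONE.shiftLeft(127)              // 2^127
                .divide(BigInteger.valueOf(nodeCount))    // range / N
                .multiply(BigInteger.valueOf(i));
    }

    public static void main(String[] args) {
        // For a 2-node ring this matches the tokens in the output above:
        System.out.println(token(0, 2)); // 0
        System.out.println(token(1, 2)); // 85070591730234615865843651857942052864
    }
}
```

This is why two balanced nodes each own exactly half the ring; less data on one node than expected points at consistency level or missing repair, not at the token layout.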
Re: Query
Hi, You can find detailed info here [1] [1] https://github.com/hector-client/hector/wiki/User-Guide regards On Wed, Jun 6, 2012 at 3:38 PM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi, After creating the keyspace successfully, now I want to know how to read and write data using the APIs. Regards Arshad *From:* Filippo Diotalevi [fili...@ntoklo.com] *Sent:* Wednesday, June 06, 2012 2:27 PM *To:* user@cassandra.apache.org *Subject:* Re: Query Hi, the Javadoc (or source code) of the me.prettyprint.hector.api.factory.HFactory class contains all the examples to create keyspaces and column families.

To create a keyspace:

    String testKeyspace = "testKeyspace";
    KeyspaceDefinition newKeyspace = HFactory.createKeyspaceDefinition(testKeyspace);
    cluster.addKeyspace(newKeyspace);

To create a column family and a keyspace:

    String keyspace = "testKeyspace";
    String column1 = "testcolumn";
    ColumnFamilyDefinition columnFamily1 = HFactory.createColumnFamilyDefinition(keyspace, column1);
    List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>();
    columns.add(columnFamily1);
    KeyspaceDefinition testKeyspace = HFactory.createKeyspaceDefinition(keyspace,
        org.apache.cassandra.locator.SimpleStrategy.class.getName(), 1, columns);
    cluster.addKeyspace(testKeyspace);

-- Filippo Diotalevi On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote: Hi All, I am using the Hector client for Cassandra. I wanted to know how to create a keyspace and column family using the APIs to read and write data, or whether I have to create the keyspace and column family manually using the command line interface. Regards Arshad -- Shelan Perera Home: http://www.shelan.org Blog: http://www.shelanlk.com Twitter: shelan skype: shelan.perera gtalk: shelanrc I am the master of my fate: I am the captain of my soul. *invictus*
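Once the keyspace and column family exist, reads and writes go through a Keyspace handle. A minimal sketch against Hector's standard API (the cluster name, address, and the testKeyspace/testcolumn names are illustrative assumptions; a running node on localhost:9160 is assumed):

```java
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorReadWriteSketch {
    public static void main(String[] args) {
        // Connect to any node; Hector routes requests to the right replicas.
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                new CassandraHostConfigurator("localhost:9160"));
        Keyspace ksp = HFactory.createKeyspace("testKeyspace", cluster);
        StringSerializer ss = StringSerializer.get();

        // Write one column into the "testcolumn" column family.
        Mutator<String> mutator = HFactory.createMutator(ksp, ss);
        mutator.insert("rowKey1", "testcolumn",
                HFactory.createStringColumn("colName", "colValue"));

        // Read it back.
        ColumnQuery<String, String, String> query =
                HFactory.createColumnQuery(ksp, ss, ss, ss);
        query.setColumnFamily("testcolumn").setKey("rowKey1").setName("colName");
        QueryResult<HColumn<String, String>> result = query.execute();
        System.out.println(result.get().getValue());
    }
}
```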
RE: Problem in getting data from a 2 node cluster
On Wed, 2012-06-06 at 06:54 -0500, Prakrati Agrawal wrote: This node will not auto bootstrap because it is configured to be a seed node This means the cassandra.yaml on that node references itself as a seed node. After you decommission the second node, can you still access the entire dataset in the single-node cluster, or has it been lost along the way? What is the replication factor for your data? Tim Wintle
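Concretely, the relevant piece of cassandra.yaml looks like this (a sketch; the address is a placeholder taken from the ring output earlier in this digest). A node whose own address appears in its seeds list skips auto-bootstrap and logs the message quoted above; pointing the new node's seeds at the existing node instead lets it bootstrap normally:

```yaml
# cassandra.yaml on the joining (second) node
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # List the established node(s), not this node's own address.
          - seeds: "162.192.100.16"
```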
Node decomission failed
Hi, We are testing Cassandra and tried to remove a node from the cluster using nodetool decommission. The node transferred the data, then died for about 20 minutes without responding, then came back to life with a load of 50-100, was under heavy load for about 1 hour, and then returned to normal load. It seems to have stopped receiving new data but it is still in the cluster. The node we tried to remove is the third one:

    root@dc-cassandra-03:~# nodetool ring
    Note: Ownership information does not include topology, please specify a keyspace.
    Address         DC          Rack   Status  State   Load     Owns    Token
                                                                        113427455640312821154458202477256070484
    10.70.147.62    datacenter1 rack1  Up      Normal  7.14 GB  33.33%  0
    10.208.51.64    datacenter1 rack1  Up      Normal  3.68 GB  33.33%  56713727820156410577229101238628035242
    10.190.207.185  datacenter1 rack1  Up      Normal  3.54 GB  33.33%  113427455640312821154458202477256070484

It seems it is still part of the cluster. What should we do? Decommission again? How can we know the current state of the cluster? Thanks!
Re: MeteredFlusher in system.log entries
Hi.. the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ mentions that from version 0.7 onwards the worst-case scenario is up to "CF Count + Secondary Index Count + memtable_flush_queue_size (defaults to 4) + memtable_flush_writers (defaults to 1 per data directory)" memtables in memory in the JVM at once. So it implies that for flushing, Cassandra copies the memtable's content. So does this imply that writes to column families are not stopped even while one is being flushed? Thanks Rohit On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote: Hi Aaron Thanks for the link, I have gone through it. But this doesn't justify nodes of exactly the same config/specs differing in their flushing frequency. The traffic on all nodes is the same as we are using RandomPartitioner Thanks Rohit On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com wrote: See the section on memtable_total_space_in_mb here http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/06/2012, at 2:27 AM, rohit bhatia wrote: I am trying to understand the variance in flush frequency in an 8 node Cassandra cluster. All the flushes are of the same type and initiated by MeteredFlusher.java = INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='Stats', ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes) [taken from system.log] The number of flushes for 1 column family varies from 6 per day to 24 per day among nodes of the same configuration and same hardware. Could you please throw light on what conditions MeteredFlusher uses to trigger memtable flushes. Also, how accurate is the estimated size in the above logfile entry? Regards Rohit Bhatia Software Engineer, Media.net
Re: MeteredFlusher in system.log entries
Also, Could someone please explain how the factor of 7 comes in the picture in this sentence For example if memtable_total_space_in_mb is 100MB, and memtable_flush_writers is the default 1 (with one data directory), and memtable_flush_queue_size is the default 4, and a Column Family has no secondary indexes. The CF will not be allowed to get above one seventh of 100MB or 14MB, as if the CF filled the flush pipeline with 7 memtables of this size it would take 98MB. On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia rohit2...@gmail.com wrote: Hi.. the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ mentions that From version 0.7 onwards the worse case scenario is up to CF Count + Secondary Index Count + memtable_flush_queue_size (defaults to 4) + memtable_flush_writers (defaults to 1 per data directory) memtables in memory the JVM at once.. So it implies that for flushing, Cassandra copies the memtables content. So does this imply that writes to column families are not stopped even when it is being flushed? Thanks Rohit On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote: Hi Aaron Thanks for the link, I have gone through it. But this doesn't justify nodes of exactly same config/specs differing in their flushing frequency. The traffic on all node is same as we are using RandomPartitioner Thanks Rohit On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com wrote: See the section on memtable_total_space_in_mb here http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/06/2012, at 2:27 AM, rohit bhatia wrote: I am trying to understand the variance in flushes frequency in a 8 node Cassandra cluster. 
All the flushes are of the same type and initiated by MeteredFlusher.java = INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='Stats', ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes) [taken from system.log] Number of flushes for 1 column family vary from 6 flushes per day to 24 flushes per day among nodes of same configuration and same hardware. Could you please throw light on the what conditions does MeteredFlusher use to trigger memtable flushes. Also how accurate is the estimated size in the above logfile entry. Regards Rohit Bhatia Software Engineer, Media.net
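For reference, the arithmetic in the quoted example can be checked directly. This sketch only reproduces the "one seventh of 100MB" numbers, taking the pipeline depth of 7 as given by the example rather than deriving it (the derivation of that count is exactly what this thread is asking about):

```java
public class MemtableThresholdSketch {
    // Per-CF flush threshold if the flush pipeline can hold `pipelineDepth`
    // memtables at once: each memtable may grow to total/depth before
    // MeteredFlusher forces a flush.
    static long thresholdMb(long totalSpaceMb, long pipelineDepth) {
        return totalSpaceMb / pipelineDepth;
    }

    public static void main(String[] args) {
        long total = 100; // memtable_total_space_in_mb from the example
        long depth = 7;   // worst-case memtables in flight, per the quoted example
        long perCf = thresholdMb(total, depth);
        System.out.println(perCf);          // 14 (MB), "one seventh of 100MB"
        System.out.println(depth * perCf);  // 98 (MB), the "98MB" in the example
    }
}
```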
How do I initialize Astyanax in a EJB Stateless bean
Hello All, How do I initialize Astyanax inside an EJB Stateless bean, which I am using to implement DAO? Thanks ben.jamin
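No answer appears in this thread. One common pattern (a sketch only, not verified against a specific Astyanax version; the cluster/keyspace names, pool settings, and seed address are all placeholder assumptions) is to build the AstyanaxContext once per bean instance in @PostConstruct and tear it down in @PreDestroy, since the container pools stateless beans:

```java
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.ejb.Stateless;

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

@Stateless
public class UserDao {
    private AstyanaxContext<Keyspace> context;
    private Keyspace keyspace;

    @PostConstruct
    void init() {
        // Built once per pooled bean instance, not on every business call.
        context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")
                .forKeyspace("testKeyspace")
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl())
                .withConnectionPoolConfiguration(
                        new ConnectionPoolConfigurationImpl("pool")
                                .setPort(9160)
                                .setMaxConnsPerHost(4)
                                .setSeeds("127.0.0.1:9160"))
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        keyspace = context.getEntity();
    }

    @PreDestroy
    void shutdown() {
        context.shutdown();
    }
}
```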
Re: Nodes not picking up data on repair, disk loaded unevenly
Thanks for the tips Some things I found looking around: grepping the logs for a specific repair I ran yesterday: /var/log/cassandra# grep df14e460-af48-11e1--e9014560c7bd system.log INFO [AntiEntropySessions:13] 2012-06-05 19:58:51,303 AntiEntropyService.java (line 658) [repair #df14e460-af48-11e1--e9014560c7bd] new session: will sync /4.xx.xx.xx, /1.xx.xx.xx, /3.xx.xx.xx, /2.xx.xx.xx on range (85070591730234615865843651857942052864,127605887595351923798765477786913079296] for content.[article2] INFO [AntiEntropySessions:13] 2012-06-05 19:58:51,304 AntiEntropyService.java (line 837) [repair #df14e460-af48-11e1--e9014560c7bd] requests for merkle tree sent for article2 (to [ /4.xx.xx.xx, /1.xx.xx.xx, /3.xx.xx.xx, /2.xx.xx.xx]) INFO [AntiEntropyStage:1] 2012-06-05 20:07:01,169 AntiEntropyService.java (line 190) [repair #df14e460-af48-11e1--e9014560c7bd] Received merkle tree for article2 from /4.xx.xx.xx INFO [AntiEntropyStage:1] 2012-06-06 04:12:30,633 AntiEntropyService.java (line 190) [repair #df14e460-af48-11e1--e9014560c7bd] Received merkle tree for article2 from /3.xx.xx.xx INFO [AntiEntropyStage:1] 2012-06-06 07:02:51,497 AntiEntropyService.java (line 190) [repair #df14e460-af48-11e1--e9014560c7bd] Received merkle tree for article2 from /1.xx.xx.xx So it looks like I never got the tree from node #2 (the node which has particularly out of control disk usage). These are running on amazon m1.xlarge instances with all the EBS volumes raided together for a total of 1.7TB. What version are you using ? 1.0 Has there been times when nodes were down ? Yes, but mostly just restarts, and mostly just one node at a time Clear as much space as possible from the disk. Check for snapshots in all KS's. Already done. What KS's (including the system KS) are taking up the most space ? Are there a lot of hints in the system KS (they are not replicated)? 
-There's just one KS that I'm actually using, which is taking up anywhere from about 650GB on the node I was able to scrub and compact (that sounds like the right size to me), to 1.3T on the node that is hugely bloated. -There are pretty huge hints CFs on all but one node (the node I deleted data from, although I did not delete any hints from there). They're between 175GB and 250GB depending on the node. -Is there any way to force replay of hints to empty this out – just a full cluster restart when everything is working again maybe? -Could I just disable hinted handoff and wipe out those tables? I realize I'll lose those hints, but that doesn't bother me terribly. I have a high replication factor and all my writes have been at cl=ONE (so all the data in the hints should actually exist in a CF somewhere, right?). Perhaps more importantly, if some data has been stalled in a hints table for a week I won't really miss it since it basically doesn't exist right now. I can re-write any data that got lost (although that's not ideal). Try to get a feel for what CF's are taking up the space or not as the case may be. Look in nodetool cfstats to see how big the rows are. The hints table and my tables are the only thing taking up any significant space on the system If you have enabled compression, run nodetool upgradesstables to compress them. how much working space does this need? Problem is that node #2 is so full I'm not sure any major rebuild or compaction will be successful. The other nodes seem to be handling things ok although they are still heavily loaded. In general, try to get free space on the nodes by using compaction, moving files to a new mount etc so that you can get repair to run. -I'll try adding an EBS volume or two to the bloated node and see if that allows me to successfully compact/repair.
-If I add another volume to that node, then run some compactions and such to the point where everything fits on the main volume again, I may just replace that node with a new one. Can I move things off of and then kill the ebs volume? Other thoughts/notes: This cluster has a super high write load currently since I'm still building it out. I frequently update every row in my CFs I almost certainly need to add more capacity (more nodes). The general plan is to get everything sort of working first though, since repairs and such are currently failing it seems like a bad time to add more nodes. Thanks, Luke From: aaron morton aa...@thelastpickle.commailto:aa...@thelastpickle.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Nodes not picking up data on repair, disk loaded unevenly You are basically in trouble. If you can nuke it and start again it would be easier. If you want to figure out how to get out of it keep the cluster up and have a play. -What I think the
Re: [phpcassa] multi_get and composite, cassandra crash my mind
On Wed, Jun 6, 2012 at 2:49 AM, Juan Ezquerro LLanes arr...@gmail.com wrote: On Tuesday, 5 June 2012 19:19:02 UTC+2, Tyler Hobbs wrote: The Cassandra users mailing list is a better place for this question, so I'm moving it there. Hi, I need a phpcassa-compatible solution. Do you think it is better to move to the Java world? :) It should be doable in phpcassa either way; there's no limitation there. Some comments inline: On Tue, Jun 5, 2012 at 6:47 AM, Juan Ezquerro LLanes wrote: I have a column family like:

    CREATE COLUMN FAMILY Watchdog
    WITH key_validation_class = 'CompositeType(LexicalUUIDType, LexicalUUIDType)'
    AND comparator = UTF8Type
    AND column_metadata = [
        {column_name: error_code, validation_class: UTF8Type, index_type: KEYS},
        {column_name: line, validation_class: IntegerType},
        {column_name: file_path, validation_class: UTF8Type},
        {column_name: function, validation_class: UTF8Type},
        {column_name: content, validation_class: UTF8Type},
        {column_name: additional_data, validation_class: UTF8Type},
        {column_name: date_created, validation_class: DateType, index_type: KEYS},
        {column_name: priority, validation_class: IntegerType, index_type: KEYS}
    ];

The row key is a combo of 2 UUIDs; the first is the user's UUID. If I want to select all the watchdog entries of a user, how can I do it? Is it possible? I just know the user UUID; the other part of the key is an unknown UUID. The idea is simple: I have a user and I want all the records in Watchdog, and I want a secondary index to do the search. Very simple with MySQL, but here I can't find the way. If I do it with a supercolumn I can't use secondary indexes; if the key is composite there is no way to select all data related to a user... Don't use super columns. You can't put secondary indexes on super column families, anyways.
:) The ugly way:

    CREATE COLUMN FAMILY Watchdog
    WITH key_validation_class = LexicalUUIDType
    AND comparator = UTF8Type
    AND column_metadata = [
        {column_name: user_uuid, validation_class: LexicalUUIDType, index_type: KEYS},
        {column_name: error_code, validation_class: UTF8Type, index_type: KEYS},
        {column_name: line, validation_class: IntegerType},
        {column_name: file_path, validation_class: UTF8Type},
        {column_name: function, validation_class: UTF8Type},
        {column_name: content, validation_class: UTF8Type},
        {column_name: additional_data, validation_class: UTF8Type},
        {column_name: date_created, validation_class: DateType, index_type: KEYS},
        {column_name: priority, validation_class: IntegerType, index_type: KEYS}
    ];

I'm not sure why you think this is the ugly way to do it. Assuming there will be plenty of events for each user, this will work pretty well with a secondary index. Have you tried it? You think that's a good idea with very large sets of data? OK, you are the master, I'll try :) Thanks again :) The other decent option is to maintain your own index in a separate column family with one row per user, similar to the materialized view approach described here: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra But I think that is not a nice solution because you always need to search in all rows of very big tables to take all the user's data... Please can you help? Thanks. -- Tyler Hobbs DataStax http://datastax.com/
RE: Cassandra not retrieving the complete data on 2 nodes
What is your consistency level? From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 4:58 AM To: user@cassandra.apache.org Subject: RE: Cassandra not retrieving the complete data on 2 nodes Please anyone reply to my query Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 2:34 PM To: user@cassandra.apache.org Subject: Cassandra not retrieving the complete data on 2 nodes Dear all I originally had a 1 node cluster. Then I added one more node to it with the initial token configured appropriately. Now when I run my queries I am not getting all my data, i.e. all columns. Output on 2 nodes Time taken to retrieve columns 43707 of key range is 1276 Time taken to retrieve columns 2084199 of all tickers is 54334 Time taken to count is 230776 Total number of rows in the database are 183 Total number of columns in the database are 7903753 Output on 1 node Time taken to retrieve columns 43707 of key range is 767 Time taken to retrieve columns 382 of all tickers is 52793 Time taken to count is 268135 Total number of rows in the database are 396 Total number of columns in the database are 16316426 Please help me. Where is my data going, or how should I retrieve it? Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system.
Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
Re: Cassandra not retrieving the complete data on 2 nodes
In addition to using a low consistency level, it sounds like you didn't bootstrap the node or run a repair after it joined the ring. On Wed, Jun 6, 2012 at 12:41 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: what is your consistency level? From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 4:58 AM To: user@cassandra.apache.org Subject: RE: Cassandra not retrieving the complete data on 2 nodes Please anyone reply to my query Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 2:34 PM To: user@cassandra.apache.org Subject: Cassandra not retrieving the complete data on 2 nodes Dear all I was originally having a 1 node cluster. Then I added one more node to it with the initial token configured appropriately. Now when I run my queries I am not getting all my data, i.e. all columns. Output on 2 nodes Time taken to retrieve columns 43707 of key range is 1276 Time taken to retrieve columns 2084199 of all tickers is 54334 Time taken to count is 230776 Total number of rows in the database are 183 Total number of columns in the database are 7903753 Output on 1 node Time taken to retrieve columns 43707 of key range is 767 Time taken to retrieve columns 382 of all tickers is 52793 Time taken to count is 268135 Total number of rows in the database are 396 Total number of columns in the database are 16316426 Please help me. Where is my data going or how should I retrieve it? Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. -- Tyler Hobbs DataStax http://datastax.com/
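In Hector the consistency level is set per keyspace via a ConsistencyLevelPolicy. A sketch of making it explicit (cluster/keyspace names and the address are placeholders; QUORUM on both reads and writes guarantees that, with RF=2, a read contacts both replicas and therefore sees every acknowledged write):

```java
import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ConsistencySketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                new CassandraHostConfigurator("localhost:9160"));

        // R + W > RF ensures reads overlap writes on at least one replica.
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

        Keyspace keyspace = HFactory.createKeyspace("testKeyspace", cluster, ccl);
        // All queries built on `keyspace` now use these levels.
    }
}
```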
Re: memory issue on 1.1.0
Just to check, do you have JNA set up correctly? (You should see a couple of log messages about it shortly after startup.) Truncate also performs a snapshot by default. On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out of memory condition then. -- Tyler Hobbs DataStax http://datastax.com/
RE: memory issue on 1.1.0
I believe so. There are no warnings on startup. So is there a preferred way to completely eliminate a column family? From: Tyler Hobbs [mailto:ty...@datastax.com] Sent: Wednesday, June 06, 2012 1:17 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Just to check, do you have JNA set up correctly? (You should see a couple of log messages about it shortly after startup.) Truncate also performs a snapshot by default. On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out of memory condition then. -- Tyler Hobbs DataStax http://datastax.com/
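If the goal is to eliminate the column family entirely rather than just empty it, dropping it through Hector's Cluster API is an option. A hedged sketch (keyspace/CF/cluster names are placeholders; note that with auto_snapshot enabled, the default, both truncate and drop snapshot the data first, so `nodetool clearsnapshot` may still be needed to reclaim the disk space):

```java
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class DropCfSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                new CassandraHostConfigurator("localhost:9160"));
        // Removes both the schema and the data for the column family,
        // unlike truncate, which keeps the (empty) CF in the schema.
        cluster.dropColumnFamily("testKeyspace", "oldColumnFamily");
    }
}
```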
Re: Removing a node in cluster
It depends on what you mean by remove (background info here: http://www.datastax.com/docs/1.0/operations/cluster_management ) If you use nodetool decommission or nodetool removetoken the data will be redistributed. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/06/2012, at 5:39 PM, Prakrati Agrawal wrote: Dear all I am trying to check the performance of Cassandra on adding or removing nodes. I want to know what happens to my existing data if I remove a node? Please help me Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
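The two removal paths mentioned above can be sketched as follows (host names and the token are placeholders; both commands need a reachable cluster):

```shell
# Remove a LIVE node, run against that node itself: it streams its ranges
# to the remaining replicas before leaving the ring.
nodetool -h node-to-remove decommission

# Remove a DEAD node, run from any surviving node, passing the dead node's
# token as shown by `nodetool ring`; the survivors re-replicate its data.
nodetool -h live-node removetoken 85070591730234615865843651857942052864

# Verify the ring afterwards.
nodetool -h localhost ring
```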
Re: How to include two nodes in Java code using Hector
The client does not have to know where the data is; that's what the cluster works out, see http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests Now I have decommissioned a node but now I don't know how to recommission it. Please help me http://www.datastax.com/docs/1.0/operations/cluster_management Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/06/2012, at 6:12 PM, Prakrati Agrawal wrote: Thank you for the reply. Now I have decommissioned a node but now I don't know how to recommission it. Please help me Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com -Original Message- From: Roshni Rajagopal [mailto:roshni.rajago...@wal-mart.com] Sent: Wednesday, June 06, 2012 11:42 AM To: user@cassandra.apache.org Subject: Re: How to include two nodes in Java code using Hector In Hector, when you create a cluster using the API, you specify an IP address and cluster name. Thereafter, internally, which node serves the request or how many nodes need to be contacted to read/write data depends on the cluster configuration, i.e. what your replication strategy, factor, and consistency levels for the column family are, how many nodes are there in the ring, etc. So you don't individually need to connect to each node via the Hector client. Once you connect to the cluster keyspace, via any IP address of any node in the cluster, when you make Hector calls to read/write data, it will automatically figure out the node-level details and carry out the task. You won't get 50% of the data, you will get all data. Also, when you remove a node, your data will be unavailable ONLY if you don't have it available in some other node as a replica.
Regards, From: Prakrati Agrawal prakrati.agra...@mu-sigma.com Reply-To: user@cassandra.apache.org Date: Tue, 5 Jun 2012 20:05:21 -0700 To: user@cassandra.apache.org Subject: RE: How to include two nodes in Java code using Hector But the data is distributed on the nodes (meaning 50% of data is on one node and 50% of data is on another node), so I need to specify the node IP address somewhere in the code. But where do I specify that is what I am clueless about. Please help me Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com] Sent: Tuesday, June 05, 2012 5:51 PM To: user@cassandra.apache.org Subject: RE: How to include two nodes in Java code using Hector Use Consistency Level = 2. Regards Harsh From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Tuesday, June 05, 2012 4:08 PM To: user@cassandra.apache.org Subject: How to include two nodes in Java code using Hector Dear all I am using a two node Cassandra cluster. How do I code in Java using Hector to get data from both the nodes? Please help Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
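Roshni's point (connect to any one node and Hector works out the routing) looks like this in code. A minimal sketch, assuming Hector 1.0.x, Thrift on port 9160, and placeholder cluster/keyspace names and a placeholder seed address:

```java
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ConnectBothNodes {
    public static void main(String[] args) {
        // Seed with a single node; with auto-discovery enabled Hector learns
        // about the rest of the ring on its own, so both nodes serve requests.
        CassandraHostConfigurator hosts =
                new CassandraHostConfigurator("10.0.0.1:9160"); // placeholder address
        hosts.setAutoDiscoverHosts(true);
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", hosts);

        // Reads and writes through this keyspace are routed to whichever
        // replicas the cluster chooses; no per-node code is needed.
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
    }
}
```

Which replicas actually answer a given request is then governed by the replication factor and the consistency level, not by which seed address the client was given.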
Re: how to create keyspace using cassandra API's
You can use the CLI http://www.datastax.com/docs/1.0/dml/using_cli or CQL http://www.datastax.com/docs/1.0/dml/using_cql

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 9:00 PM, Prakrati Agrawal wrote:

You have to create the keyspace manually first, using the Cassandra CLI.

Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com

From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in]
Sent: Wednesday, June 06, 2012 2:27 PM
To: user@cassandra.apache.org
Subject: how to create keyspace using cassandra API's

Hi All,

I am using Hector as a client for Cassandra, and I am trying to create a keyspace using the following API:

Keyspace keyspace = HFactory.createKeyspace("test", cluster);

but it shows the following error:

caused by: InvalidRequestException(why: Keyspace test does not exist)

Can anybody help me with how to create a keyspace in Cassandra?

Regards
Arshad
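For what it's worth, Hector also exposes schema-manipulation calls, so the keyspace and column family can be created from code before `HFactory.createKeyspace` is called. A hedged sketch, assuming Hector 1.0.x, a node at localhost:9160, SimpleStrategy with replication factor 1, and the names "test"/"users" from the thread:

```java
import java.util.Arrays;

import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public class CreateKeyspaceExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

        // Define a column family and a keyspace holding it.
        ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("test", "users");
        KeyspaceDefinition ksDef = HFactory.createKeyspaceDefinition(
                "test", ThriftKsDef.DEF_STRATEGY_CLASS, 1, Arrays.asList(cfDef));

        // Create the schema only if it is missing; block until schema agreement.
        if (cluster.describeKeyspace("test") == null) {
            cluster.addKeyspace(ksDef, true);
        }

        // Only now will this succeed without InvalidRequestException.
        Keyspace keyspace = HFactory.createKeyspace("test", cluster);
    }
}
```

This needs a running Cassandra node to execute; the CLI/CQL route Aaron describes remains the simpler option for one-off schema setup.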
Re: Node decomission failed
Take a look in the logs for .185 and check for errors. Run nodetool ring from node .62 to see if it thinks .185 is in the ring. If all looks good, try to decommission again.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 12:32 AM, Marc Canaleta wrote:

Hi,

We are testing Cassandra and tried to remove a node from the cluster using nodetool decommission. The node transferred the data, then died for about 20 minutes without responding, then came back to life with a load of 50-100, was under heavy load for about an hour, and then returned to normal load. It seems to have stopped receiving new data but it is still in the cluster. The node we tried to remove is the third one:

root@dc-cassandra-03:~# nodetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address         DC          Rack   Status  State   Load     Owns    Token
                                                                    113427455640312821154458202477256070484
10.70.147.62    datacenter1 rack1  Up      Normal  7.14 GB  33.33%  0
10.208.51.64    datacenter1 rack1  Up      Normal  3.68 GB  33.33%  56713727820156410577229101238628035242
10.190.207.185  datacenter1 rack1  Up      Normal  3.54 GB  33.33%  113427455640312821154458202477256070484

It seems it is still part of the cluster. What should we do? Decommission again? How can we know the current state of the cluster?

Thanks!
Re: MeteredFlusher in system.log entries
Your question was:

> Could you please throw light on what conditions MeteredFlusher uses to trigger memtable flushes.

The answer is: estimates of the ratio between the live size and the serialised size of memtables are kept. The MeteredFlusher periodically checks the serialised size of all memtables and uses the ratio to determine if memtable_total_space_in_mb has been reached. If there is a variation between nodes, it may be that some are getting more traffic than others.

> So it implies that for flushing, Cassandra copies the memtables' content.

No.

> So does this imply that writes to column families are not stopped even when they are being flushed?

Yes. In a worst-case scenario, writes will block if memtable flushing cannot keep up.

> Also, could someone please explain how the factor of 7 comes in the picture in this sentence

In the example (see previous para), 7 is the number of memtables the CF could have in memory at once (forgetting about the other CFs).

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 1:08 AM, rohit bhatia wrote:

Also, could someone please explain how the factor of 7 comes in the picture in this sentence: "For example if memtable_total_space_in_mb is 100MB, and memtable_flush_writers is the default 1 (with one data directory), and memtable_flush_queue_size is the default 4, and a Column Family has no secondary indexes. The CF will not be allowed to get above one seventh of 100MB or 14MB, as if the CF filled the flush pipeline with 7 memtables of this size it would take 98MB."

On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia rohit2...@gmail.com wrote:

Hi, the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ mentions that "From version 0.7 onwards the worst case scenario is up to CF Count + Secondary Index Count + memtable_flush_queue_size (defaults to 4) + memtable_flush_writers (defaults to 1 per data directory) memtables in memory the JVM at once."
So it implies that for flushing, Cassandra copies the memtables' content. So does this imply that writes to column families are not stopped even when they are being flushed?

Thanks
Rohit

On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia rohit2...@gmail.com wrote:

Hi Aaron

Thanks for the link, I have gone through it. But this doesn't explain why nodes of exactly the same config/specs differ in their flushing frequency. The traffic on all nodes is the same, as we are using RandomPartitioner.

Thanks
Rohit

On Wed, Jun 6, 2012 at 12:24 AM, aaron morton aa...@thelastpickle.com wrote:

See the section on memtable_total_space_in_mb here: http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 2:27 AM, rohit bhatia wrote:

I am trying to understand the variance in flush frequency in an 8-node Cassandra cluster. All the flushes are of the same type and initiated by MeteredFlusher.java:

INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='Stats', ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes) [taken from system.log]

The number of flushes for one column family varies from 6 per day to 24 per day among nodes of the same configuration and hardware. Could you please throw light on what conditions MeteredFlusher uses to trigger memtable flushes? Also, how accurate is the estimated size in the above logfile entry?

Regards
Rohit Bhatia
Software Engineer, Media.net
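The arithmetic in the quoted example can be checked directly. A small sketch; note the decomposition of 7 in the comment is an assumption on my part, as the thread itself only states the number:

```java
public class MemtableFlushMath {
    public static void main(String[] args) {
        int totalSpaceMb = 100;  // memtable_total_space_in_mb from the example
        // The flush pipeline can hold 7 memtables of one CF at once
        // (assumed here: the active memtable + memtable_flush_queue_size of 4
        //  + memtable_flush_writers of 1 + one more in flight).
        int pipelineDepth = 7;
        int perCfLimitMb = totalSpaceMb / pipelineDepth;  // 100 / 7 = 14 MB
        System.out.println("Per-CF memtable limit: ~" + perCfLimitMb + " MB, worst case "
                + pipelineDepth * perCfLimitMb + " MB of that CF in memory");
    }
}
```

This reproduces the numbers in the quoted paragraph: a per-CF cap of about 14 MB, and 7 x 14 = 98 MB if the pipeline fills with memtables of that size.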
Re: memory issue on 1.1.0
Use drop; truncate is mostly for unit tests.

A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 6:22 AM, Poziombka, Wade L wrote:

I believe so. There are no warnings on startup. So is there a preferred way to completely eliminate a column family?

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, June 06, 2012 1:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Just to check, do you have JNA set up correctly? (You should see a couple of log messages about it shortly after startup.) Truncate also performs a snapshot by default.

On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote:

However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out-of-memory condition then.

--
Tyler Hobbs
DataStax
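Aaron's suggestion, dropping rather than truncating, can be issued from Hector as well as from the CLI. A sketch under assumed names (cluster, address, keyspace, and CF are placeholders):

```java
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class DropColumnFamilyExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        // Remove the obsolete CF entirely instead of truncating it.
        cluster.dropColumnFamily("MyKeyspace", "OldCF");
    }
}
```

Note that a drop, like a truncate, still leaves snapshot files on disk by default, so reclaiming the space may also require clearing snapshots.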
Re: Nodes not picking up data on repair, disk loaded unevenly
Another little question: I just added some EBS volumes to the nodes that are particularly choked and I am now running major compactions on those nodes (and all is well so far). Once everything gets back down to a normal size, can I move all the data back off the EBS volumes? Something along the lines of:

1. nodetool -h localhost drain
2. stop cassandra
3. remove EBS volumes from the cassandra conf
4. cp -r /recovery/* /mnt/data
5. unmount/detach/delete the EBS volume
6. start cassandra

Then add some more nodes to the cluster to keep this from happening in the future. I assume all the files stored in any of the data directories are uniquely named, and Cassandra won't really care where they are as long as everything it wants is in its data directories.

I was also thinking of copying my column families (using thrift or the like) to fresh column families to undo any strangeness done by my major compactions, then getting rid of the old CFs once everything is hunky-dory.

Luke
Re: Secondary Indexes, Quorum and Cluster Availability
On Tue, Jun 5, 2012 at 4:30 PM, Jim Ancona j...@anconafamily.com wrote:

It might be a good idea for the documentation to reflect the tradeoffs more clearly. Here's a proposed addition to the Secondary Index FAQ at http://wiki.apache.org/cassandra/SecondaryIndexes

Q: How does the choice of Consistency Level affect cluster availability when using secondary indexes?

A: Because secondary indexes are distributed, you must have CL nodes available for *all* token ranges in the cluster in order to complete a query. For example, with RF = 3, when two out of three consecutive nodes in the ring are unavailable, *all* secondary index queries at CL = QUORUM will fail; however, secondary index queries at CL = ONE will succeed. This is true regardless of cluster size.

Comments?

Jim
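To see the trade-off from the client side, the read consistency level can be set per keyspace in Hector and then used for an indexed query. A sketch under assumed names (the "users" CF with an indexed "state" column is hypothetical, as are the cluster and keyspace names):

```java
import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;

public class IndexQueryAtClOne {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

        // Read at CL = ONE: per the FAQ above, this keeps secondary index
        // queries available when QUORUM would fail for some token range.
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
        Keyspace ks = HFactory.createKeyspace("test", cluster, ccl);

        StringSerializer ss = StringSerializer.get();
        IndexedSlicesQuery<String, String, String> query =
                HFactory.createIndexedSlicesQuery(ks, ss, ss, ss);
        query.setColumnFamily("users")
             .addEqualsExpression("state", "UT")
             .setRange(null, null, false, 100);
        QueryResult<OrderedRows<String, String, String>> rows = query.execute();
    }
}
```

The availability gain comes at the usual price: a CL = ONE read may not see the most recent write.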
Cassandra 1.1.1 Fails to Start
Hi All,

This is on a SuSE Linux blade with 6GB of RAM.

With disk_access_mode mmap_index_only and mmap, I see an OOM "map failed" error on the SSTableBatchOpen thread. cat /proc/pid/maps shows a peak of 53521 right before it dies. vm.max_map_count = 1966080, and /proc/pid/limits shows unlimited locked memory.

With disk_access_mode standard, the node does start up but I see the repeated error:

ERROR [CompactionExecutor:6] 2012-06-06 20:24:19,772 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:6,1,main]
java.lang.StackOverflowError
at com.google.common.collect.Sets$1.iterator(Sets.java:578)
at com.google.common.collect.Sets$1.iterator(Sets.java:578)
at com.google.common.collect.Sets$1.iterator(Sets.java:578)
...

I'm not sure the second error is related to the first. I prefer to run with full mmap, but I have run out of ideas. Is there anything else I can do to debug this? Here are the startup settings from the debug log:

INFO [main] 2012-06-06 20:17:10,267 AbstractCassandraDaemon.java (line 121) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_31
INFO [main] 2012-06-06 20:17:10,267 AbstractCassandraDaemon.java (line 122) Heap size: 1525415936/1525415936
...
INFO [main] 2012-06-06 20:17:10,946 CLibrary.java (line 111) JNA mlockall successful
...
INFO [main] 2012-06-06 20:17:11,055 DatabaseDescriptor.java (line 191) DiskAccessMode is standard, indexAccessMode is standard
INFO [main] 2012-06-06 20:17:11,213 DatabaseDescriptor.java (line 247) Global memtable threshold is enabled at 484MB
INFO [main] 2012-06-06 20:17:11,499 CacheService.java (line 96) Initializing key cache with capacity of 72 MBs.
INFO [main] 2012-06-06 20:17:11,509 CacheService.java (line 107) Scheduling key cache save to each 14400 seconds (going to save all keys).
INFO [main] 2012-06-06 20:17:11,510 CacheService.java (line 121) Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider
INFO [main] 2012-06-06 20:17:11,513 CacheService.java (line 133) Scheduling row cache save to each 0 seconds (going to save all keys).

Thanks In Advance,
Javier
Re: how to create keyspace using cassandra API's
You can use the Astyanax API; these sorts of minor issues are resolved in that API.

Regards,
Abhijit