Re: Bootstrapping taking long
My nodes all have themselves in their list of seeds - always did - and everything works. (You may ask why I did this. I don't know, I must have copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:

I was able to make the node join the ring, but I'm confused. What I did is: first, when adding the node, this node was not in its own seeds list. AFAIK this is how it's supposed to be. It was able to transfer all data to itself from other nodes, but then it stayed in the bootstrapping state. So what I did (and I don't know why it works) is add this node to the seeds list in its own storage-conf.xml file, then restart the server, and then I finally see it in the ring... If I had added the node to its own seeds list when first joining it, it would not join the ring, but when I do it in two phases it did work. So it's either my misunderstanding or a bug...

On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory ran...@gmail.com wrote:

The new node does not see itself as part of the ring; it sees all others but itself, so from that perspective the view is consistent. The only problem is that the node never finishes bootstrapping. It stays in this state for hours (it's been 20 hours now...):

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

Does the new node have itself in the list of seeds per chance? This could cause some issues if so.

On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:

I'm still at a loss. I haven't been able to resolve this. I tried adding another node at a different location on the ring, but this node too remains stuck in the bootstrapping state for many hours, without any of the other nodes being busy with anti-compaction or anything else. I don't know what's keeping it from finishing the bootstrap: no CPU, no I/O, files were already streamed, so what is it waiting for?

I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to be anything addressing a similar issue, so I figured there was no point in upgrading. But let me know if you think there is. Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:

Thanks Jake, but unfortunately the streams directory is empty, so I don't think any of the nodes is anti-compacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, would still remain in bootstrapping mode and not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume all has been moved. Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Anything else I should look out for in the configuration file, or something else?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:

In 0.6, locate the node doing anti-compaction and look in the streams subdirectory in the keyspace data dir to monitor the anti-compaction progress (it puts new SSTables for the bootstrapping node in there).

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:

Running nodetool decommission didn't help. Actually the node refused to decommission itself (because it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data nothing happened for a long time (I'm still waiting, but nothing seems to be happening). Any hints on how to analyze a stuck bootstrapping node? Thanks.

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:

Thanks Shimi, so indeed anti-compaction was run on one of the other nodes from the same DC, but to my understanding it has already ended, a few hours ago. I see plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anti-compaction; so far that was normal and it seemed that data is in its right place. But now the new node seems sort of stuck. None of the other nodes is anti-compacting right now or has been anti-compacting since then. The new node's CPU is close to zero and its iostats are almost zero, so I can't find another bottleneck that would keep it hanging. On IRC someone suggested that I maybe retry joining this node, e.g. decommission and rejoin it again. I'll try it now...

[1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting
Re: Bootstrapping taking long
Well, your ring issues don't make sense to me; the seed list should be the same across the cluster. I'm just thinking of other things to try: non-bootstrapped nodes should join the ring instantly, but reads will fail if you aren't using quorum.

On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com wrote:

I haven't tried repair. Should I?

On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote:

Have you tried not bootstrapping, but setting the token and manually calling repair?

On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com wrote:

My conclusion is lame: I tried this on several hosts and saw the same behavior. The only way I was able to join new nodes was to first start them when they are *not in* their own seeds list, and after they finish transferring the data, restart them with themselves *in* their own seeds list. After doing that the node would join the ring. This is either my misunderstanding or a bug, but the only place I found it documented stated that the new node should not be in its own seeds list. Version 0.6.6.
The size of the data, I must be doing smth wrong....
Hi, I have some data size issues. I am storing super columns with the following content: {a=1, b=2, c=3 ... n=14}. I store this 300,000 times and end up with a data size on disk of about 283 MB. On the other side I have a MySQL table which stores a bunch of data; the schema follows: 6 varchars +100, 5 ints +6. I put about 1,300,000 records in it and end up with 150 MB of data and 57 MB of index. So I think I am certainly doing something wrong... The other thing is that when I run flush and then compact, the size of my data increases, so I imagine something is copied during compaction. Is there a way to remove the unused data? (cleanup doesn't seem to do the job.) Any help reducing the size of the data would be greatly appreciated! Greetings
Re: Cassandra 0.7 - Query on network topology
On Wed, Jan 5, 2011 at 3:37 AM, Narendra Sharma narendra.sha...@gmail.com wrote:

What I am looking for is:
1. Some way to send requests for keys whose token falls between 0-25 to B and never to C, even though C will have the data due to it being a replica of B.
2. Only when B is down or not reachable should the request go to C.
3. Once the requests start going to C, they should continue unless C is down, in which case the requests should then go to B.

My understanding is that SimpleSnitch should fit here, except for enforcing #3 above.

Right, with the caveat that you'll probably want to set the dynamic snitch badness threshold to allow switching to B even if C merely gets overloaded rather than completely down. The alternative is disabling the dynamic snitch entirely.

Will SimpleSnitch come into the picture if the request from the client reaches node C directly?

Yes.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
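For context, the dynamic snitch settings mentioned above live in cassandra.yaml in 0.7. A minimal sketch, assuming the 0.7-era option names (check the yaml bundled with your exact version before relying on these):

```yaml
# Score replicas by observed latency and prefer the best-scoring one for reads.
dynamic_snitch: true

# How much worse the currently preferred replica's score may get before the
# dynamic snitch routes reads to another replica. 0.0 means "switch eagerly";
# raising it keeps requests pinned to the current replica longer.
dynamic_snitch_badness_threshold: 0.1
```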
Re: The size of the data, I must be doing smth wrong....
It's normal for Cassandra to use more disk space than MySQL. It's part of what we trade for not having to rewrite every row when you add a new column. SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. http://wiki.apache.org/cassandra/MemtableSSTable

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: The size of the data, I must be doing smth wrong....
Unlike datastores that are delimited or have fixed column sizes, Cassandra stores each row as a sorted map of columns, where a column is a tuple of {column name, column value, timestamp}. Also, the data is not stored as tersely as it is inside MySQL.
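As a back-of-the-envelope illustration of why many tiny columns carry a large relative overhead, here is a hedged sketch. The per-column and per-row constants below are invented for illustration only; the real 0.6 SSTable layout differs, but the shape of the calculation is the point: each column pays for its name, a timestamp, and length fields, which dwarfs one-byte values.

```python
# Illustrative (not exact) estimate of per-column on-disk overhead when every
# column stores its own name, value, and timestamp.

def column_size(name: bytes, value: bytes) -> int:
    # Assumed layout: 2-byte name length + name + 1-byte flags
    # + 8-byte timestamp + 4-byte value length + value.
    return 2 + len(name) + 1 + 8 + 4 + len(value)

def row_size(key: bytes, columns: dict) -> int:
    # Assumed flat per-row overhead (key length field, column index, etc.).
    per_row_overhead = 60
    return len(key) + per_row_overhead + sum(
        column_size(n, v) for n, v in columns.items())

# 300,000 rows of 14 tiny columns a..n, as in the original post.
cols = {chr(ord('a') + i).encode(): str(i).encode() for i in range(14)}
total = 300_000 * row_size(b"rowkey", cols)
print(f"~{total / 1024 / 1024:.0f} MB for ~4 MB of raw values")
```

Even with these conservative made-up constants, 14 one-byte values per row balloon to hundreds of bytes per row, so sizes in the hundreds of megabytes for 300,000 such rows are not surprising.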
Re: Bootstrapping taking long
In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node itself from the seeds list. Moreover, I did try to set AutoBootstrap=true and have the node in its own seeds list, but it would not bootstrap. I don't recall the exact message, but it was something like "I found myself in the seeds list, therefore I'm not going to bootstrap even though AutoBootstrap is true".

[1]
<!--
 ~ Turn on to make new [non-seed] nodes automatically migrate the right data
 ~ to themselves. (If no InitialToken is specified, they will pick one
 ~ such that they will get half the range of the most-loaded node.)
 ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
 ~ so that you can't subsequently accidently bootstrap a node with
 ~ data on it. (You can reset this by wiping your data and commitlog
 ~ directories.)
 ~
 ~ Off by default so that new clusters and upgraders from 0.4 don't
 ~ bootstrap immediately. You should turn this on when you start adding
 ~ new nodes to a cluster that already has data on it. (If you are upgrading
 ~ from 0.4, start your cluster with it off once before changing it to true.
 ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
 ~ I/O before your cluster starts up.)
-->
<AutoBootstrap>false</AutoBootstrap>

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com wrote:

If the seed list should be the same across the cluster, that means that nodes *should* have themselves as a seed. If that doesn't work for Ran, then that is the first problem, no?
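Putting the documented recommendation together, a minimal sketch of the 0.6-style storage-conf.xml settings for a brand-new node would look like the fragment below (the host names are placeholders; the point is that AutoBootstrap is on and the new node lists only existing nodes as seeds, not itself):

```xml
<!-- New node: bootstrap on, and do NOT list this node itself as a seed -->
<AutoBootstrap>true</AutoBootstrap>
<Seeds>
    <Seed>existing-node-1.example.com</Seed>
    <Seed>existing-node-2.example.com</Seed>
</Seeds>
```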
Re: Bootstrapping taking long
https://issues.apache.org/jira/browse/CASSANDRA-1676 - you have to use at least 0.6.7.
Re: Bootstrapping taking long
@Thibaut, wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?

On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:

https://issues.apache.org/jira/browse/CASSANDRA-1676 - you have to use at least 0.6.7.
Re: Bootstrapping taking long
Had the same Problem a while ago. Upgrading solved the problem (Don't know if you have to redeploy your cluster though) http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory ran...@gmail.com wrote: @Thibaut wrong email? Or how's Avoid dropping messages off the client request path (CASSANDRA-1676) related to the bootstrap questions I had? On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz thibaut.br...@trendiction.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-1676 you have to use at least 0.6.7 On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo edlinuxg...@gmail.comwrote: On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory ran...@gmail.com wrote: In storage-conf I see this comment [1] from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove itself from the seeds list. Moreover, I did try to set AutoBootstrap=true and have the node in its own seeds list, but it would not bootstrap. I don't recall the exact message but it was something like I found myself in the seeds list therefore I'm not going to bootstrap even though AutoBootstrap is true. [1] !-- ~ Turn on to make new [non-seed] nodes automatically migrate the right data ~ to themselves. (If no InitialToken is specified, they will pick one ~ such that they will get half the range of the most-loaded node.) ~ If a node starts up without bootstrapping, it will mark itself bootstrapped ~ so that you can't subsequently accidently bootstrap a node with ~ data on it. (You can reset this by wiping your data and commitlog ~ directories.) ~ ~ Off by default so that new clusters and upgraders from 0.4 don't ~ bootstrap immediately. You should turn this on when you start adding ~ new nodes to a cluster that already has data on it. (If you are upgrading ~ from 0.4, start your cluster with it off once before changing it to true. 
~ Otherwise, no data will be lost but you will incur a lot of unnecessary ~ I/O before your cluster starts up.) -- AutoBootstrapfalse/AutoBootstrap On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com wrote: If seed list should be the same across the cluster that means that nodes *should* have themselves as a seed. If that doesn't work for Ran, then that is the first problem, no? On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani jak...@gmail.com wrote: Well your ring issues don't make sense to me, seed list should be the same across the cluster. I'm just thinking of other things to try, non-boostrapped nodes should join the ring instantly but reads will fail if you aren't using quorum. On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com wrote: I haven't tried repair. Should I? On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote: Have you tried not bootstrapping but setting the token and manually calling repair? On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com wrote: My conclusion is lame: I tried this on several hosts and saw the same behavior, the only way I was able to join new nodes was to first start them when they are *not in* their own seeds list and after they finish transferring the data, then restart them with themselves *in* their own seeds list. After doing that the node would join the ring. This is either my misunderstanding or a bug, but the only place I found it documented stated that the new node should not be in its own seeds list. Version 0.6.6. On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn da...@lookin2.comwrote: My nodes all have themselves in their list of seeds - always did - and everything works. (You may ask why I did this. I don't know, I must have copied it from an example somewhere.) On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote: I was able to make the node join the ring but I'm confused. What I did is, first when adding the node, this node was not in the seeds list of itself. 
AFAIK this is how it's supposed to be. So it was able to transfer all data to itself from other nodes but then it stayed in the bootstrapping state. So what I did (and I don't know why it works), is add this node to the seeds list in its own storage-conf.xml file. Then restart the server and then I finally see it in the ring... If I had added the node to the seeds list of itself when first joining it, it would not join the ring but if I do it in two phases it did work. So it's either my misunderstanding or a bug... On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory ran...@gmail.com wrote: The new node does not see itself as part of the ring, it sees all others but itself, so from that perspective the view is consistent.
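For readers following along: the seed list being toggled throughout this thread is the Seeds element in each node's storage-conf.xml (0.6.x) - the debate is whether a node's own address belongs in it. A minimal sketch; the addresses are made up:

```xml
<!-- Hosts this node contacts at startup to discover the ring.
     Whether to include the node's own address here is exactly
     what the thread above is arguing about. -->
<Seeds>
    <Seed>10.0.0.1</Seed>
    <Seed>10.0.0.2</Seed>
</Seeds>
```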
Re: Bootstrapping taking long
OK, thanks, so I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand) and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it, had a different workaround), but frankly I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it isn't, I'd just like to understand why...) On Wed, Jan 5, 2011 at 5:42 PM, Thibaut Britz thibaut.br...@trendiction.com wrote: Had the same problem a while ago. Upgrading solved the problem (don't know if you have to redeploy your cluster though): http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html
Re: Bootstrapping taking long
1676 says Avoid dropping messages off the client request path. Bootstrap messages are off the client request path. So, if some of the nodes involved were loaded enough that they were dropping messages older than RPC_TIMEOUT to cope, it could lose part of the bootstrap communication permanently. On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory ran...@gmail.com wrote: OK, thanks, so I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand) and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it, had a different workaround), but frankly I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it isn't, I'd just like to understand why...) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
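To make the failure mode concrete, here is a toy sketch (ours - not Cassandra's actual MessagingService code) of a load-shedding queue that silently discards messages older than the RPC timeout before handling them. A client-path message that gets dropped is retried by the client; a bootstrap message has no client to retry it, so the bootstrap stalls. The names and the 10-second timeout are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DropStaleSketch {
    static final long RPC_TIMEOUT_MS = 10_000;

    static class Message {
        final String payload;
        final long enqueuedAtMs;
        Message(String payload, long enqueuedAtMs) {
            this.payload = payload;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    // An overloaded node sheds load by discarding queued messages older than the
    // RPC timeout: a client would have timed out and retried them anyway.
    // Returns the number of messages actually handled.
    static int drainFresh(Deque<Message> queue, long nowMs) {
        int handled = 0;
        while (!queue.isEmpty()) {
            Message m = queue.poll();
            if (nowMs - m.enqueuedAtMs > RPC_TIMEOUT_MS) continue; // silently dropped
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        Deque<Message> q = new ArrayDeque<>();
        q.add(new Message("bootstrap-stream-request", 0L)); // queued long ago, no retrying client
        q.add(new Message("client-read", 95_000L));         // still fresh
        System.out.println(drainFresh(q, 100_000L));        // prints 1: the bootstrap message was lost
    }
}
```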
Re: Converting a TimeUUID to a long (timestamp) and vice-versa
Our original intention in discussing this feature was to have back-and-forth conversion from timestamps (we were modelling similar functionality in Pycassa). Its lack of inclusion may have just been an oversight. We will add this in Hector trunk shortly - thanks for the complete code sample. On Tue, Jan 4, 2011 at 10:06 PM, Roshan Dawrani roshandawr...@gmail.com wrote: Ok, found the solution - finally! - by applying the opposite of what createTime() does in TimeUUIDUtils. Ideally I would have preferred for this solution to come from the Hector API, so I didn't have to be tied to the private createTime() implementation.

import java.util.UUID;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;

public class TryHector {
    public static void main(String[] args) throws Exception {
        final long NUM_100NS_INTERVALS_SINCE_UUID_EPOCH = 0x01b21dd213814000L;
        UUID u1 = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
        final long t1 = u1.timestamp();
        long tmp = (t1 - NUM_100NS_INTERVALS_SINCE_UUID_EPOCH) / 1;
        UUID u2 = TimeUUIDUtils.getTimeUUID(tmp);
        long t2 = u2.timestamp();
        System.out.println(u2.equals(u1));
        System.out.println(t2 == t1);
    }
}

On Wed, Jan 5, 2011 at 8:15 AM, Roshan Dawrani roshandawr...@gmail.com wrote: If I use com.eaio.uuid.UUID directly, then I am able to do what I need (attached a Java program for the same), but unfortunately I need to deal with java.util.UUID in my application and I don't have its equivalent com.eaio.uuid.UUID at the point where I need the timestamp value. Any suggestion on how I can achieve the equivalent using the Hector library's TimeUUIDUtils? On Wed, Jan 5, 2011 at 7:21 AM, Roshan Dawrani roshandawr...@gmail.com wrote: Hi Victor / Patricio, I have been using the Hector library's TimeUUIDUtils. I also just looked at TimeUUIDUtilsTest but didn't find anything similar being tested there.
Here is what I am trying, and it's not working - I am creating a time UUID, extracting its timestamp value, and with that I create another time UUID, and I am expecting both time UUIDs to have the same timestamp() value - am I doing / expecting something wrong here?

===
import java.util.UUID;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;

public class TryHector {
    public static void main(String[] args) throws Exception {
        UUID someUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
        long timestamp1 = someUUID.timestamp();
        UUID otherUUID = TimeUUIDUtils.getTimeUUID(timestamp1);
        long timestamp2 = otherUUID.timestamp();
        System.out.println(timestamp1);
        System.out.println(timestamp2);
    }
}
===

I have to create the timestamp() equivalent of my time UUIDs so I can send it to my UI client, for which it will be simpler to compare long timestamps than comparing UUIDs. Then for the long timestamp chosen by the client, I need to re-create the equivalent time UUID and go and filter the data from the Cassandra database. -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani Skype: roshandawrani On Wed, Jan 5, 2011 at 1:32 AM, Victor Kabdebon victor.kabde...@gmail.com wrote: Hi Roshan, Sorry I misunderstood your problem. It is weird that it doesn't work, it works for me... As Patricio pointed out, use Hector's standard way of creating a TimeUUID and tell us if it still doesn't work. Maybe you can paste here some of the code you use to query your columns too. Victor K. http://www.voxnucleus.fr 2011/1/4 Patricio Echagüe patric...@gmail.com In the Hector framework, take a look at TimeUUIDUtils.java. You can create a UUID using TimeUUIDUtils.getTimeUUID(long time) or TimeUUIDUtils.getTimeUUID(ClockResolution clock) and later on, TimeUUIDUtils.getTimeFromUUID(..) or just UUID.timestamp(). There are some examples in TimeUUIDUtilsTest.java. Let me know if it helps.
On Tue, Jan 4, 2011 at 10:27 AM, Roshan Dawrani roshandawr...@gmail.com wrote: Hello Victor, It is actually not that I need the 2 UUIDs to be exactly same - they need to be same timestamp wise. So, what I need is to extract the timestamp portion from a time UUID (say, U1) and then later in the cycle, use the same long timestamp value to re-create a UUID (say, U2) that is equivalent of the previous one in terms of its timestamp portion - i.e., I should be able to give this U2 and filter the data from a column family - and it should be same as if I had used the original UUID U1. Does it make any more sense than before? Any way I can do that? rgds, Roshan On Tue, Jan 4, 2011 at 11:46 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Hello Roshan, Well it
Question about replication
Hello, Is it possible to set the replication factor to some kind of ALL setting so that all data gets replicated to all nodes and if a new node is dynamically added to the cluster, the current nodes replicate their data to it? Thanks, Mayuresh
Re: The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100
The CLI sometimes gets only 100 results (even though there are more) - and sometimes gets all the results, even when there are more than 100! What is going on here? Is there some logic that says if there are too many results return 100, even though too many can be more than 100? API calls have a limit since streaming is not supported and you could potentially have almost arbitrarily large result sets. I believe cassandra-cli will allow you to set the limit if you look at the 'help' output and look for the word 'limit'. The way to iterate over large amounts of data is to do paging, with multiple queries. -- / Peter Schuller
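Peter's paging suggestion can be sketched against an in-memory sorted map standing in for a limited range query (a real client would issue something like get_range_slices with a start key and a count; the helper names here are ours):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PagingSketch {
    // Stand-in for a limited range query: returns up to 'limit' keys >= start, in order.
    static List<String> rangeSlice(NavigableMap<String, String> db, String start, int limit) {
        List<String> page = new ArrayList<>();
        for (String k : db.tailMap(start, true).keySet()) {
            if (page.size() == limit) break;
            page.add(k);
        }
        return page;
    }

    // Iterates over the whole range in pages: each query restarts at the last key
    // of the previous page (which comes back again, so it is skipped), until a
    // short page signals the range is exhausted.
    static List<String> fetchAll(NavigableMap<String, String> db, int pageSize) {
        if (pageSize < 2) throw new IllegalArgumentException("need pageSize >= 2 to make progress past the overlap");
        List<String> all = new ArrayList<>();
        String start = "";
        boolean first = true;
        while (true) {
            List<String> page = rangeSlice(db, start, pageSize);
            int fetched = page.size();
            if (!first && !page.isEmpty()) page.remove(0); // overlap with previous page
            all.addAll(page);
            if (fetched < pageSize) break;                 // short page: done
            start = all.get(all.size() - 1);
            first = false;
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> db = new TreeMap<>();
        for (int i = 0; i < 10; i++) db.put(String.format("key%02d", i), "v");
        System.out.println(fetchAll(db, 3)); // all ten keys, fetched a few at a time
    }
}
```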
Re: Cassandra 0.7 - Query on network topology
1. Some way to send requests for keys whose token fall between 0-25 to B and never to C even though C will have the data due to it being replica of B. If your data set is large, be mindful of the fact that this will cause C to be completely cold in terms of caches. I.e., when B does go down, C will take lots of iops. -- / Peter Schuller
Re: Question about replication
No. On Wed, Jan 5, 2011 at 10:38 AM, Mayuresh Kulkarni kul...@cs.rpi.edu wrote: Hello, Is it possible to set the replication factor to some kind of ALL setting so that all data gets replicated to all nodes and if a new node is dynamically added to the cluster, the current nodes replicate their data to it? Thanks, Mayuresh -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
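Some context on the "No": in 0.6 the replication factor is a fixed per-keyspace number in storage-conf.xml - there is no ALL setting that tracks cluster size. The closest approximation is setting RF equal to the number of nodes and editing it by hand (followed by repair) whenever a node is added. A sketch; the keyspace name is made up:

```xml
<Keyspace Name="MyKeyspace">
  <!-- Each key is stored on this many nodes. To keep "all data on all
       nodes" you must raise this manually every time the cluster grows. -->
  <ReplicationFactor>3</ReplicationFactor>
</Keyspace>
```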
Re: The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100
I know that there's a limit, and I just assumed that the CLI set it to 100, until I saw more than 100 results. On Wed, Jan 5, 2011 at 6:56 PM, Peter Schuller peter.schul...@infidyne.comwrote: The CLI sometimes gets only 100 results (even though there are more) - and sometimes gets all the results, even when there are more than 100! What is going on here? Is there some logic that says if there are too many results return 100, even though too many can be more than 100? API calls have a limit since streaming is not supported and you could potentially have almost arbitrary large result sets. I believe cassandra-cli will allow you to set the limit if you look at the 'help' output and look for the word 'limit'. The way to iterate over large amounts of data is to do paging, with multiple queries. -- / Peter Schuller
Re: The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100
I know that there's a limit, and I just assumed that the CLI set it to 100, until I saw more than 100 results. Ooh, sorry. Didn't read carefully enough. Not sure why you see that behavior. Sounds strange; should not be supported at the thrift level AFAIK. -- / Peter Schuller
Re: Converting a TimeUUID to a long (timestamp) and vice-versa
Roshan, just a comment on your solution. The time returned is not a simple long. It also contains some bits indicating the version. On the other hand, you are assuming that the same machine is processing your request and recreating a UUID based on a long you provide. The clockseqAndNode id will vary if another machine takes care of the request (referring to your use case). Is it possible for you to send the UUID to the view? I think that would be the correct behavior, as a simple long does not contain enough information to recreate the original UUID. Does it make sense? On Wed, Jan 5, 2011 at 8:36 AM, Nate McCall n...@riptano.com wrote: Our original intention in discussing this feature was to have back-and-forth conversion from timestamps (we were modelling similar functionality in Pycassa). Its lack of inclusion may have just been an oversight. We will add this in Hector trunk shortly - thanks for the complete code sample.
Cassandra Meetup in San Francisco Bay Area
We are hosting a Cassandra meetup in the Bay Area. Jonathan will give a talk on Cassandra 0.7. The link to the meetup page is at http://www.meetup.com/Cassandra-User-Group-Meeting/ Thanks, Mubarak
Re: Bootstrapping taking long
I see. Thanks for clarifying, Jonathan. On Wednesday, January 5, 2011, Jonathan Ellis jbel...@gmail.com wrote: 1676 says Avoid dropping messages off the client request path. Bootstrap messages are off the client request path. So, if some of the nodes involved were loaded enough that they were dropping messages older than RPC_TIMEOUT to cope, it could lose part of the bootstrap communication permanently. -- /Ran
pig cassandra contribution
I am having problem running the cassandra_loadfunc.jar on my build of cassandra. PIG_CLASSPATH=:bin/../build/cassandra_loadfunc.jar::bin/../../..//lib/antlr-3.1.3.jar:bin/../../..//lib/avro-1.2.0-dev.jar:bin/../../..//lib/clhm-production.jar:bin/../../..//lib/commons-cli-1.1.jar:bin/../../..//lib/commons-codec-1.2.jar:bin/../../..//lib/commons-collections-3.2.1.jar:bin/../../..//lib/commons-lang-2.4.jar:bin/../../..//lib/google-collections-1.0.jar:bin/../../..//lib/hadoop-core-0.20.1.jar:bin/../../..//lib/high-scale-lib.jar:bin/../../..//lib/jackson-core-asl-1.4.0.jar:bin/../../..//lib/jackson-mapper-asl-1.4.0.jar:bin/../../..//lib/jline-0.9.94.jar:bin/../../..//lib/json-simple-1.1.jar:bin/../../..//lib/libthrift.jar:bin/../../..//lib/log4j-1.2.14.jar:bin/../../..//lib/slf4j-api-1.5.8.jar:bin/../../..//lib/slf4j-log4j12-1.5.8.jar:bin/../../..//lib/spymemcached-2.4.2.jar:bin/../../..//lib/zapcat-1.2.jar:bin/../../..//build/lib/jars/ant-1.6.5.jar:bin/../../..//build/lib/jars/apache-rat-0.6.jar:bin/../../..//build/lib/jars/apache-rat-core-0.6.jar:bin/../../..//build/lib/jars/apache-rat-tasks-0.6.jar:bin/../../..//build/lib/jars/asm-3.2.jar:bin/../../..//build/lib/jars/avalon-framework-4.1.3.jar:bin/../../..//build/lib/jars/commons-cli-1.1.jar:bin/../../..//build/lib/jars/commons-collections-3.2.jar:bin/../../..//build/lib/jars/commons-lang-2.1.jar:bin/../../..//build/lib/jars/commons-logging-1.1.1.jar:bin/../../..//build/lib/jars/junit-4.6.jar:bin/../../..//build/lib/jars/log4j-1.2.12.jar:bin/../../..//build/lib/jars/logkit-1.0.1.jar:bin/../../..//build/lib/jars/paranamer-ant-2.1.jar:bin/../../..//build/lib/jars/paranamer-generator-2.1.jar:bin/../../..//build/lib/jars/qdox-1.10.jar:bin/../../..//build/lib/jars/servlet-api-2.3.jar:bin/../../..//build/apache-cassandra-0.6.4.jar:bin/../../..//build/ivy-2.1.0.jar:/usr/local/pig-0.7.0/pig.jar In Grunt I did register again just in case it is not picked up by the classpath register /usr/local/pig-0.7.0/pig.jar; register 
/home/felix/cassandra/lib/libthrift.jar; register /home/felix/cassandra/contrib/pig/build/cassandra_loadfunc.jar grunt rows = LOAD 'cassandra://test.data' USING CassandraStorge(); 2011-01-05 13:50:50,071 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve CassandraStorge using imports: [org.apache.cassandra.hadoop.pig., , org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /home/felix/cassandra/contrib/pig/pig_1294257032719.log the log file contains Pig Stack Trace --- ERROR 1070: Could not resolve CassandraStorge using imports: [org.apache.cassandra.hadoop.pig., , org.apache.pig.builtin., org.apache.pig.impl.builtin.] java.lang.RuntimeException: Cannot instantiate:CassandraStorge at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:455) at org.apache.pig.impl.logicalLayer.parser.QueryParser.NonEvalFuncSpec(QueryParser.java:5087) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1434) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: 
org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve CassandraStorge using imports: [org.apache.cassandra.hadoop.pig., , org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:440) at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:452) ... 15 more Running hadoop 0.20.2 with pig0.7.0 and have to use cassandra 0.6.4. Thanks, Felix
Re: Reclaim deleted rows space
How is minor compaction triggered? Is it triggered only when a new SSTable is added? I was wondering if triggering a compaction with minimumCompactionThreshold set to 1 would be useful. If this can happen I assume it will do compaction on files with similar size and remove deleted rows on the rest. Shimi On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller peter.schul...@infidyne.com wrote: I don't have a problem with disk space. I have a problem with the data size. [snip] Bottom line is that I want to reduce the number of requests that go to disk. Since there is enough data that is no longer valid, I can do it by reclaiming the space. The only way to do it is by running a major compaction. I can wait and let Cassandra do it for me, but then the data size will get even bigger and the response time will be worse. I can do it manually, but I prefer it to happen in the background with less impact on the system. Ok - that makes perfect sense then. Sorry for misunderstanding :) So essentially, for workloads that are teetering on the edge of cache warmness and are subject to significant overwrites or removals, it may be beneficial to perform much more aggressive background compaction even though it might waste lots of CPU, to keep the in-memory working set down. There was talk (I think in the compaction redesign ticket) about potentially improving the use of bloom filters such that obsolete data in sstables could be eliminated from the read set without necessitating actual compaction; that might help address cases like these too. I don't think there's a pre-existing silver bullet in a current release; you probably have to live with the need for greater-than-theoretically-optimal memory requirements to keep the working set in memory. -- / Peter Schuller
Re: Cassandra Meetup in San Francisco Bay Area
Thanks for organizing this, Mubarak! A little more detail -- I'll explain the new features in Cassandra 0.7 including column time-to-live, columnfamily truncation, and secondary indexes, as well as some of the features that have been backported to recent 0.6 releases (aka Why You Should Upgrade Yesterday). The focus will primarily be on how these affect application design, but we'll also touch on operational considerations. I'm excited to meet everyone! I hear there will be pizza, too. :) On Wed, Jan 5, 2011 at 1:31 PM, Mubarak Seyed biggd...@gmail.com wrote: We are hosting a Cassandra meetup in BayArea. Jonathan will give a talk on Cassandra 0.7 The link to the meetup page is at http://www.meetup.com/Cassandra-User-Group-Meeting/ Thanks, Mubarak -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Reclaim deleted rows space
Pretty sure there's logic in there that says don't bother compacting a single sstable. On Wed, Jan 5, 2011 at 2:26 PM, shimi shim...@gmail.com wrote: How is minor compaction triggered? Is it triggered only when a new SSTable is added? I was wondering if triggering a compaction with minimumCompactionThreshold set to 1 would be useful. If this can happen I assume it will do compaction on files with similar size and remove deleted rows on the rest. Shimi -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
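The "similar size" grouping Shimi refers to can be sketched roughly as follows. This is an illustrative toy, not Cassandra's actual compaction code: the 0.5x-1.5x similarity band and the minimum-bucket-size threshold are assumptions made for the example, and the final filter is the "don't bother compacting a single sstable" logic Jonathan mentions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CompactionBuckets {
    // Groups sstable sizes (bytes) into buckets of "similar size": each new file
    // joins the current bucket if it is within 0.5x-1.5x of the bucket's running
    // average. Buckets with fewer than 'threshold' members are dropped -- a lone
    // sstable is never a useful minor-compaction candidate.
    static List<List<Long>> buckets(List<Long> sizes, int threshold) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long sum = 0;
        for (long size : sorted) {
            if (!current.isEmpty()) {
                double avg = (double) sum / current.size();
                if (size < avg * 0.5 || size > avg * 1.5) { // too dissimilar: start a new bucket
                    buckets.add(current);
                    current = new ArrayList<>();
                    sum = 0;
                }
            }
            current.add(size);
            sum += size;
        }
        if (!current.isEmpty()) buckets.add(current);
        buckets.removeIf(b -> b.size() < threshold); // single (or too few) sstables: skip
        return buckets;
    }

    public static void main(String[] args) {
        // Three similar small files form the only candidate; the big one stands alone and is dropped.
        System.out.println(buckets(Arrays.asList(100L, 110L, 120L, 5000L), 2));
    }
}
```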
Re: pig cassandra contribution
Ignore the above error, I somehow passed that stage. However, I am still having a problem with it.

grunt> register /home/felix/pig-0.7.0/pig-0.7.1-dev.jar;
grunt> register /home/felix/cassandra/lib/libthrift.jar;
grunt> rows = LOAD 'cassandra://test/data' USING CassandraStorage();
grunt> cols = FOREACH rows GENERATE flatten($1);
grunt> colnames = FOREACH cols GENERATE $0;
grunt> limit_colnames = limit colnames 10;
grunt> dump limit_colnames

2011-01-05 15:44:17,378 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-01-05 15:44:17,460 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp-1545399343/tmp576746049:org.apache.pig.builtin.BinStorage) - 1-27 Operator Key: 1-27)
2011-01-05 15:44:17,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-01-05 15:44:17,507 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-01-05 15:44:17,533 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-05 15:44:17,539 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-05 15:44:17,539 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-01-05 15:44:21,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-01-05 15:44:21,841 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-05 15:44:21,842 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-01-05 15:44:21,846 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-01-05 15:44:22,115 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-05 15:44:22,133 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-01-05 15:44:22,344 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-01-05 15:44:22,348 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job. Details at logfile: /home/felix/cassandra/contrib/pig/pig_1294263823129.log

$ cat pig_1294263823129.log
Pig Stack Trace
---
ERROR 2117: Unexpected error when launching map reduce job.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias limit_colnames
        at org.apache.pig.PigServer.openIterator(PigServer.java:521)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias limit_colnames
        at org.apache.pig.PigServer.store(PigServer.java:577)
        at org.apache.pig.PigServer.openIterator(PigServer.java:504)
        ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
        at org.apache.pig.PigServer.store(PigServer.java:569)
        ... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.ExceptionInInitializerError
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
        at java.lang.Thread.dispatchUncaughtException(Thread.java:1831)

On Wed, Jan 5, 2011 at 12:02 PM, felix gao gre1...@gmail.com wrote: I am having problem running the
Re: Converting a TimeUUID to a long (timestamp) and vice-versa
Hi Patricio, Thanks for your comment. Replying inline.

2011/1/5 Patricio Echagüe patric...@gmail.com
> Roshan, just a comment in your solution. The time returned is not a simple long. It also contains some bits indicating the version.

I don't think so. The version bits from the most significant 64 bits of the UUID are not used in creating the timestamp() value. It uses only the time_low, time_mid and time_hi fields of the UUID, and not the version, as documented here: http://download.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#timestamp%28%29. When the same timestamp comes back and I call TimeUUIDUtils.getTimeUUID(tmp), it internally puts the version back in it and makes it a time UUID.

> On the other hand, you are assuming that the same machine is processing your request and recreating a UUID based on a long you provide. The clockseqAndNode id will vary if another machine takes care of the request (referring to your use case).

When I recreate my UUID using the timestamp() value, my requirement is not to arrive at exactly the same UUID from which timestamp() was derived in the first place. I need a recreated UUID *that should be equivalent in terms of its time value* - so that filtering the time-sorted columns using this time UUID works fine. So, if the lower-order 64 bits (clockseq + node) become different, I don't think it is of any concern, because the UUID comparison first goes by the most significant 64 bits, i.e. the time value, and that should settle the time comparison in my use case.

> Is it possible for you to send the UUID to the view? I think that would be the correct behavior as a simple long does not contain enough information to recreate the original UUID.

In my use case, the non-Java clients will be receiving a number of such UUIDs, and they will have to sort them chronologically. I wanted to avoid bit-based UUID comparison in these clients. The long timestamp() value is perfect for such ordering of data elements, and I send much less data over the wire. Does it make sense?

Nearly everything makes sense to me :-)

--
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani
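[Editorial aside] Roshan's round trip - strip the version from a time UUID with timestamp(), ship the long, and rebuild a time-equivalent UUID later - can be sketched in plain Java without Hector. The bit layout below follows the RFC 4122 version-1 field order (time_low, time_mid, version, time_hi); the class and method names are illustrative, not any library's API:

```java
import java.util.UUID;

public class TimeUuidSketch {

    // Rebuild the most significant 64 bits of a version-1 UUID from the
    // 60-bit value that UUID.timestamp() returns. Field layout per RFC 4122:
    // bits 63-32 time_low, 31-16 time_mid, 15-12 version, 11-0 time_hi.
    static long msbFromTimestamp(long ts) {
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi  = (ts >>> 48) & 0x0FFFL;
        return (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // 0x1000 = version 1
    }

    // A UUID that is equivalent *in time value only*: same timestamp,
    // arbitrary clockseq/node half (zero here), exactly as discussed
    // in the thread.
    static UUID fromTimestamp(long ts) {
        return new UUID(msbFromTimestamp(ts), 0L);
    }

    public static void main(String[] args) {
        UUID original = fromTimestamp(0x1E1A1B2C3D4E5FL); // any 60-bit sample value
        long ts = original.timestamp();     // version bits are not part of this
        UUID recreated = fromTimestamp(ts); // different node half, same time
        System.out.println(recreated.timestamp() == ts); // prints: true
    }
}
```

Because timestamp() and msbFromTimestamp() are exact inverses over the 60 timestamp bits, the recreated UUID compares equal to the original on the time portion, which is all the column-filtering use case needs.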
Re: Reclaim deleted rows space
Although it's not exactly the ability to list specific SSTables, the ability to compact only specific CFs will be in upcoming releases: https://issues.apache.org/jira/browse/CASSANDRA-1812

- Tyler

On Wed, Jan 5, 2011 at 7:46 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

On Wed, Jan 5, 2011 at 4:31 PM, Jonathan Ellis jbel...@gmail.com wrote:

Pretty sure there's logic in there that says don't bother compacting a single sstable.

On Wed, Jan 5, 2011 at 2:26 PM, shimi shim...@gmail.com wrote:

How is minor compaction triggered? Is it triggered only when a new SSTable is added? I was wondering if triggering a compaction with minimumCompactionThreshold set to 1 would be useful. If this can happen, I assume it will do compaction on files with similar size and remove deleted rows on the rest.

Shimi

On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller peter.schul...@infidyne.com wrote:

> I don't have a problem with disk space. I have a problem with the data size.
[snip]
> Bottom line is that I want to reduce the number of requests that go to disk. Since there is enough data that is no longer valid, I can do it by reclaiming the space. The only way to do it is by running a major compaction. I can wait and let Cassandra do it for me, but then the data size will get even bigger and the response time will be worse. I can do it manually, but I prefer it to happen in the background with less impact on the system.

Ok - that makes perfect sense then. Sorry for misunderstanding :)

So essentially, for workloads that are teetering on the edge of cache warmness and are subject to significant overwrites or removals, it may be beneficial to perform much more aggressive background compaction, even though it might waste lots of CPU, to keep the in-memory working set down.

There was talk (I think in the compaction redesign ticket) about potentially improving the use of bloom filters such that obsolete data in sstables could be eliminated from the read set without necessitating actual compaction; that might help address cases like these too.

I don't think there's a pre-existing silver bullet in a current release; you probably have to live with the need for greater-than-theoretically-optimal memory requirements to keep the working set in memory.

--
/ Peter Schuller

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

I was wondering if it made sense to have a JMX operation that can compact a list of tables by file name. This opens it up for power users to have more options than compacting an entire keyspace.
Re: Converting a TimeUUID to a long (timestamp) and vice-versa
Roshan, the first 64 bits do contain the version. The method UUID.timestamp() indeed takes it out before returning. You are right on that point. I based my comment on the UUID spec.

What I am not convinced of is that the framework should provide support to create an almost identical UUID where only the timestamp is the same (between the two UUIDs). UUID.equals() and UUID.compareTo() compare the whole bit set to say that two objects are the same. compareTo() does look at the first 64 bits first, to avoid comparing the rest in case the most significant bits already show a difference.

But coming to your point: should Hector provide that kind of support, or do you feel that the problem you have is specific to your application? I feel that a UUID is, as it says, a Unique Identifier, and creating a sort-of UUID based on a previous timestamp, disregarding the least significant bits, is not the right support for Hector to expose.

Thoughts?

On Wed, Jan 5, 2011 at 6:30 PM, Roshan Dawrani roshandawr...@gmail.com wrote: [...]

--
Patricio.-
Re: Converting a TimeUUID to a long (timestamp) and vice-versa
Hi Patricio, Some thoughts inline.

2011/1/6 Patricio Echagüe patric...@gmail.com
> Roshan, the first 64 bits does contain the version. The method UUID.timestamp() indeed takes it out before returning. You are right in that point. I based my comment on the UUID spec.

I know the first 64 bits have the version, but timestamp() doesn't, and hence it is OK to use it for chronological ordering. Anyway, we agree on it now and this point is out.

> What I am not convinced is that the framework should provide support to create an almost identical UUID where only the timestamp is the same (between the two UUIDs).

Well, I didn't really ask for the framework to provide me such an almost identical UUID. What I raised was that since Hector is computing UTC time in 100-nanosecond units as utcTime = msec * 10000 + 0x01B21DD213814000L (NUM_100NS_INTERVALS_SINCE_UUID_EPOCH), it should, at the minimum, give a utility function to do the opposite - msec = (utcTime - 0x01B21DD213814000L) / 10000 - so that if someone has to create an almost identical UUID where the timestamp is the same, as I needed, *he shouldn't need to deal with such magic numbers that are linked to Hector's guts.*

So, I don't mind creating the UUID myself, but I don't want to do the to-and-fro magic calculations that should be done inside Hector, as they are an internal design detail of Hector.

> UUID.equals() and UUID.compareTo() does compare the whole bit set to say that two objects are the same. It does compare the first 64 bits to avoid comparing the rest in case the most significant bits already show a difference.

I know it may need to look at all 128 bits eventually - but it first looks at the first 64 bits (the timestamp) and then the next 64. That's why I qualified it with "for my use case". It works for me, because the data I am filtering is already within a particular user's data-set - and the possibility of a user having 2 data points at the same nano-second value (so that the clockseq + node bits come into the picture) is functionally nil.

> But coming to your point, should Hector provide that kind of support or do you feel that the problem you have is specific to your application?

As covered above, half of my solution should go inside the Hector API, I feel. The other half - recreating the same-timestamp UUID and comparing with it - is specific to my application.

> I feel like UUID is as it says an Unique Identifier and creating a sort-of UUID based on a previous timestamp disregarding the least significant bits is not the right support Hector should expose.

The support Hector should expose is to keep its magic calculations inside, to-and-fro. Does it make any sense?

--
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani
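[Editorial aside] The to-and-fro conversion Roshan is asking Hector to expose amounts to one pair of one-liners. The sketch below assumes the 100-ns-interval convention quoted in the thread (10,000 intervals per millisecond, offset by the UUID-epoch constant); the class and method names are illustrative, not actual Hector API:

```java
public class UuidTimeConversion {

    // Offset between the UUID epoch (1582-10-15) and the Unix epoch
    // (1970-01-01), in 100-nanosecond intervals - the constant quoted
    // in the thread as NUM_100NS_INTERVALS_SINCE_UUID_EPOCH.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Milliseconds since the Unix epoch -> 100-ns intervals since the UUID epoch.
    static long millisToUuidTime(long millis) {
        return millis * 10000L + UUID_EPOCH_OFFSET;
    }

    // The inverse the thread asks for: UUID time back to milliseconds.
    static long uuidTimeToMillis(long uuidTime) {
        return (uuidTime - UUID_EPOCH_OFFSET) / 10000L;
    }

    public static void main(String[] args) {
        long millis = 1294214400000L; // sample instant (Jan 5, 2011 UTC)
        long roundTripped = uuidTimeToMillis(millisToUuidTime(millis));
        System.out.println(roundTripped == millis); // prints: true
    }
}
```

With the inverse available, a client can turn a received timestamp() value back into wall-clock milliseconds without hard-coding the epoch constant itself, which is exactly the encapsulation point being argued.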
Riptano Cassandra trainings in Baltimore and Santa Clara
Riptano has two Apache Cassandra training days coming up: Baltimore on Jan 19 and Santa Clara on Feb 4. The Baltimore training will be taught by Jake Luciani, author of Lucandra/Solandra. The Santa Clara training will be taught by Ben Coverston, Riptano's director of operations. These are both full-day, hands-on events covering application design and operations with the new features in Cassandra 0.7. For more details, see http://www.eventbrite.com/org/474011012. See you there! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com