sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
I am testing the performance of a single Cassandra node on a production
server. I wrote a script to insert one million items into Cassandra; the data
looks like this:

prefix = "benchmark_"
dct = {}
for i in range(0, 100):
    key = "%s%d" % (prefix, i)
    dct[key] = "abc" * 200

and the inserting code is like this:

from pycassa import ConsistencyLevel

with cf.batch(write_consistency_level=ConsistencyLevel.ONE) as b:
    for key, val in dct.items():
        b.insert(key, {'value': pickle.dumps(val)}, ttl=None)


Sometimes I get a timeout error (details here:
https://gist.github.com/3754965) while it's executing; other times it runs
fine.
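One thing worth checking (my assumption, not something shown by the gist) is the size of each batch mutation: a single batch covering too many rows can exceed rpc_timeout_in_ms even when the node is otherwise healthy. A minimal sketch of splitting the keys into fixed-size batches, each of which would then go through its own cf.batch() block so a timeout only costs one small retry:

```python
def chunked(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# With 1,000,000 keys and batches of 100 rows, each RPC stays small.
keys = ["benchmark_%d" % i for i in range(1000000)]
batches = list(chunked(keys, 100))
print(len(batches))  # 10000 batches of 100 keys each
```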

Meanwhile, the script and Cassandra have run smoothly on my MacBook many
times, and the MacBook is only a 2.4 GHz Intel Core 2 Duo with 8 GB of
memory, though it does have an SSD.

I really have no idea why this happens...

The reason I am doing this test is that on another production server, my
3-node cluster also gives the pycassa client "timeout" errors, making the
system unstable. I am not sure what the problem is; is it a bug in the Python
library?
Thanks for any further help!

The test script is running on server A and Cassandra is running on server B.
The CPU of B is an Intel(R) Xeon(R) X3470 @ 2.93GHz (quad-core).

The system stats on B look normal:

*vmstat 2*
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 1  0 3643716 134876 191720 235262411 14400 22  3 74  0
 1  0 3643716 132016 191728 2355180    0    0     0   288 4701 16764  9  4 87  0
 0  0 3643716 129700 191736 2357996    0    0     0  5772 3775 17139  9  4 87  0
 0  0 3643716 127468 191744 2360420   32032   404 4490 17487 11  3 85  0
*iostat -x 2*

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   230.00    1.00   15.00     6.00   980.00   123.25     0.03    2.00    8.00    1.60   1.12   1.80
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.52    1.21    1.99    0.48    0.00   84.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               7.00   184.00   12.50   12.00    78.00   784.00    70.37     0.11    4.65    8.32    0.83   1.88   4.60
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

*free -t*
             total       used       free     shared    buffers     cached
Mem:      16467952   16378592      89360          0     152032    2452216
-/+ buffers/cache:   13774344    2693608
Swap:      7287436    3643716    3643720
Total:    23755388   20022308    3733080
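As a sanity check on that free output: the "-/+ buffers/cache" row is just the Mem row with the (reclaimable) buffers and page cache subtracted, so the box is not actually out of memory, though note the ~3.6 GB of swap in use:

```python
# Figures (in kB) taken from the `free -t` output above.
used, buffers, cached = 16378592, 152032, 2452216

app_used = used - buffers - cached  # memory really held by processes
print(app_used)  # 13774344, the "used" column of the -/+ buffers/cache row
```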

*uptime*
 04:52:57 up 422 days, 19:59,  1 user,  load average: 2.71, 2.09, 1.48


Re: sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
Forgot to mention: the rpc configuration in cassandra.yaml is:

rpc_timeout_in_ms: 2

The Cassandra version on the production server is 1.1.3; the version I am
using on my MacBook is 1.0.10.



how large cassandra could scale when it need to do manual operation?

2011-07-08 Thread Yan Chunlu
hi, all:
I am curious about how large Cassandra can scale.

From the information I can find, the largest Cassandra deployment is at
Facebook, at about 150 nodes. Meanwhile, they run 2000+ nodes of Hadoop, and
Yahoo even runs 4000 Hadoop nodes.

I don't understand why that is the situation; I have only a little knowledge
of Cassandra and none of Hadoop.



Currently I am using Cassandra with 3 nodes and having problems bringing one
back after it fell out of sync. The problems I encountered make me worry
about how Cassandra could scale out:

1): load balancing needs to be performed manually on every node, according
to:

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x


2): when adding new nodes, you need to perform node repair and cleanup on
every node



3): when decommissioning a node, there is a chance it slows down the entire
cluster (I am not sure why, but I have seen people asking about it), and the
only workaround is to shut down the entire cluster, rsync the data, and start
all nodes without the decommissioned one.
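For a concrete look at what the tokens() script in point 1 produces: it just spaces nodes evenly around the RandomPartitioner's 0..2**127 token ring. A Python 3 rewrite of the snippet (the original used Python 2's print statement):

```python
def tokens(nodes):
    """Evenly spaced initial tokens for a RandomPartitioner ring."""
    return [2 ** 127 // nodes * x for x in range(nodes)]

# A 4-node ring gets tokens at 0, 1/4, 2/4 and 3/4 of the token space.
for t in tokens(4):
    print(t)
```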





After all, I think there is a lot of human work needed to maintain the
cluster, which would make it impossible to scale to thousands of nodes. I
hope I am totally wrong about all of this. Currently I am serving 1 million
page views a day with Cassandra and it makes me feel unsafe; I am afraid that
one day a node crash will corrupt the data and the whole cluster will go
wrong.



On the contrary, a relational database makes me feel safe, but it does not
scale well.



thanks for any guidance here.


Re: how large cassandra could scale when it need to do manual operation?

2011-07-09 Thread Yan Chunlu
Thank you very much for the reply, which gives me more confidence in
Cassandra. I will try the automation tools; the examples you've listed look
quite promising!


About the decommission problem, here is the link:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
I am also trying to deploy Cassandra across two datacenters (with 20ms
latency), so I worry that the network latency will make things even worse.

Maybe I misunderstood the replication factor: doesn't RF=3 mean I could lose
two nodes and still have one available (with 100% of the keys), once nodes >=
3? Besides, I am not sure what Twitter's RF setting is, but it is possible to
lose 3 nodes at the same time (Facebook once lost photos because their RAID
broke, though that rarely happens). I am strongly tempted to set RF to a very
high value...

Thanks!


On Sat, Jul 9, 2011 at 5:22 AM, aaron morton wrote:

> AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time
> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
> dozen clusters. "
> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
>
>
>
> If
> you are working with a 3 node cluster removing/rebuilding/what ever one node
> will effect 33% of your capacity. When you scale up the contribution from
> each individual node goes down, and the impact of one node going down is
> less. Problems that happen with a few nodes will go away at scale, to be
> replaced by a whole set of new ones.
>
>
> 1):  the load balance need to manually performed on every node, according
> to:
>
> Yes
>
> 2): when adding new nodes, need to perform node repair and cleanup on every
> node
>
> You only need to run cleanup, see
> http://wiki.apache.org/cassandra/Operations#Bootstrap
>
> 3) when decommission a node, there is a chance that slow down the entire
> cluster. (not sure why but I saw people ask around about it.) and the only
> way to do is shutdown the entire the cluster, rsync the data, and start all
> nodes without the decommission one.
>
> I cannot remember any specific cases where decommission requires a full
> cluster stop, do you have a link? With regard to slowing down, the
> decommission process will stream data from the node you are removing onto
> the other nodes this can slow down the target node (I think it's more
> intelligent now about what is moved). This will be exaggerated in a 3 node
> cluster as you are removing 33% of the processing and adding some
> (temporary) extra load to the remaining nodes.
>
> after all, I think there is alot of human work to do to maintain the
> cluster which make it impossible to scale to thousands of nodes,
>
> Automation, Automation, Automation is the only way to go.
>
> Chef, Puppet, CF Engine for general config and deployment; Cloud Kick,
> munin, ganglia etc for monitoring. And
> Ops Centre (http://www.datastax.com/products/opscenter) for cassandra
> specific management.
>
> I am totally wrong about all of this, currently I am serving 1 millions pv
> every day with Cassandra and it make me feel unsafe, I am afraid one day one
> node crash will cause the data broken and all cluster goes wrong
>
> With RF3 and a 3Node cluster you have room to lose one node and the cluster
> will be up for 100% of the keys. While better than having to worry about
> *the* database server, it's still entry level fault tolerance. With RF 3 in
> a 6 Node cluster you can lose up to 2 nodes and still be up for 100% of the
> keys.
>
> Is there something you are specifically concerned about with your current
> installation ?
>
> Cheers
>
>   -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>

Re: how large cassandra could scale when it need to do manual operation?

2011-07-09 Thread Yan Chunlu
I missed the consistency level part; thanks very much for the explanation.
That is clear enough.
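The arithmetic behind that explanation can be sketched as a toy model (this is just the counting rule, not the pycassa API): a request at a given consistency level succeeds for a key only if at least that many of its RF replicas are alive.

```python
def quorum(rf):
    """Replicas a QUORUM read/write must reach: floor(rf/2) + 1."""
    return rf // 2 + 1

def available(live_replicas, required):
    """True if enough replicas of a key are alive for the chosen CL."""
    return live_replicas >= required

rf = 3
print(quorum(rf))                # 2: QUORUM with RF=3 needs two replicas
print(available(1, quorum(rf)))  # False: with two replicas down, QUORUM fails
print(available(1, 1))           # True: CL.ONE still succeeds for that key
```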

On Sun, Jul 10, 2011 at 7:57 AM, aaron morton wrote:

> about the decommission problem, here is the link:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
>
> The key part of that post is "and since the second node was under heavy
> load, and not enough ram, it was busy GCing and worked horribly slow" .
>
> maybe I was misunderstanding the replication factor, doesn't it RF=3 means
> I could lose two nodes and still have one available(with 100% of the keys),
> once Nodes>=3?
>
> When you start losing replicas the CL you use dictates if the cluster is
> still up for 100% of the keys. See
> http://thelastpickle.com/2011/06/13/Down-For-Me/
>
>  I have the strong willing to set RF to a very high value...
>
> As chris said 3 is about normal, it means the QUORUM CL is only 2 nodes.
>
> I am also trying to deploy cassandra across two datacenters(with 20ms
>> latency).
>>
> Lookup LOCAL_QUORUM in the wiki
>
> Hope that helps.
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9 Jul 2011, at 02:01, Chris Goffinet wrote:
>
> As mentioned by Aaron, yes we run hundreds of Cassandra nodes across
> multiple clusters. We run with RF of 2 and 3 (most common).
>
> We use commodity hardware and see failure all the time at this scale. We've
> never had 3 nodes that were in same replica set, fail all at once. We
> mitigate risk by being rack diverse, using different vendors for our hard
> drives, designed workflows to make sure machines get serviced in certain
> time windows and have an extensive automated burn-in process of (disk,
> memory, drives) to not roll out nodes/clusters that could fail right away.

Re: how large cassandra could scale when it need to do manual operation?

2011-07-10 Thread Yan Chunlu
Thanks for the information, Chris.

That is very much like what I am going to do, though not with as many nodes
as yours. Do you place the nodes in the same datacenter? Could you give more
information about the latency between your datacenters, and also about the
replica placement strategy: do you use the "cassandra-topology.properties"
file to maintain the node list? Thanks!

Maybe I worried too much about disaster tolerance...


Re: Corrupted data

2011-07-10 Thread Yan Chunlu
I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes, and I
hadn't run node repair for more than 10 days; I was not aware that this is
critical. I ran node repair recently and one of the nodes always hangs; from
the log it seems to be doing nothing related to the repair.

So I have two problems:

1) do I need to treat every node as failed and do a rolling replacement?
Since there might be inconsistency in the cluster that I have no way to find
out about.
2) is that the reason the node repair hung? The log message says:

Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
WARNING: Failed to check the connection: java.net.SocketTimeoutException:
Read timed out

then nothing.

thanks!
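For context on why the ~10 days matter (this is the general rule, not something specific to this cluster): deletes only become permanent after gc_grace_seconds, which defaults to 864000 seconds, so repair has to complete on every node at least that often or deleted data can resurrect:

```python
# gc_grace_seconds defaults to 864000 in the Cassandra configuration.
gc_grace_seconds = 864000
repair_window_days = gc_grace_seconds // 86400  # 86400 seconds per day

print(repair_window_days)  # 10: run `nodetool repair` on each node within this window
```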

On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller  wrote:

> >> - Have you been running repair consistently ?
> >
> > Nop, only when something breaks
>
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
>
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
>
>
> --
> / Peter Schuller
>



-- 
闫春路


Re: Corrupted data

2011-07-10 Thread Yan Chunlu
Oh, the error seems to come from JMX.


Sorry, but it seems I don't have more error messages; the node repair just
never ends, and stracing the process finds nothing: it is not doing anything.

Is there any way to get more information about this? Do I need to run a major
compaction on every column family? Thanks!

On Mon, Jul 11, 2011 at 1:36 AM, aaron morton wrote:

> 1) do I need to treat every node as failure and do a rolling replacement?
>  since there might be some inconsistent in the cluster even I have no way to
> find out.
>
> see
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
> 2) is that the reason that caused the node repair hung? the log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> I cannot find that anywhere in the code base, can you provide some more
> information ?
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com


-- 
Charles


Re: Corrupted data

2011-07-10 Thread Yan Chunlu
It has already been running for about 20 hours...



-- 
Charles


cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 100zs:false:14@1310168625866434


Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
I gave Cassandra an 8GB heap and somehow it ran out of memory and crashed.
After I started it again, it just runs into the following apparently infinite
loop; the last line:

DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 100zs:false:14@1310168625866434

repeats forever.

I have 3 nodes and RF=2, so I am losing data. Does that mean I am screwed and
can't get it back?

 DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
collecting 20 of 2147483647: q74k:false:14@1308886095008943
DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: apbg:false:13@1305641597957086
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 1 of 2147483647: auje:false:13@1305641597957075
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 2 of 2147483647: ayj8:false:13@1305641597957060
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 3 of 2147483647: b4fz:false:13@1305641597957096
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 100zs:false:14@1310168625866434
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 1 of 2147483647: 1017f:false:14@1310168680375612
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
collecting 2 of 2147483647: 1018e:false:14@1310168759614715
DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
collecting 3 of 2147483647: 101dd:false:14@1310169260225339
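One detail worth noting in those lines: 2147483647 is Integer.MAX_VALUE (2**31 - 1), i.e. the slice was issued with no real column limit. A quick parse of the line layout (the field meanings here are my reading of the log format, not documented anywhere):

```python
import re

line = ("DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123) "
        "collecting 0 of 2147483647: 100zs:false:14@1310168625866434")

m = re.search(r"collecting (\d+) of (\d+): ([^:]+):(\w+):(\d+)@(\d+)", line)
collected, limit = int(m.group(1)), int(m.group(2))

print(limit == 2 ** 31 - 1)  # True: the "limit" is Integer.MAX_VALUE, i.e. unbounded
print(m.group(3))            # the column name being collected
```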


On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu  wrote:

> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> collecting 0 of 2147483647: 100zs:false:14@1310168625866434




-- 
闫春路


Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
16GB

On Thu, Jul 14, 2011 at 11:29 AM, Bret Palsson  wrote:

>  How much total memory does your machine have?
>
> --
> Bret
>


-- 
Charles


Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
The problem is that I can't bring Cassandra back. Is that because there is
not enough memory for Cassandra?

On Thu, Jul 14, 2011 at 11:29 AM, Bret Palsson  wrote:

> How much total memory does your machine have?
>
> --
> Bret
>


-- 
闫春路


Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
Okay, I am not sure whether it is an infinite loop. I changed log4j to "DEBUG" only
because Cassandra never came online after I started it; it seemed to just halt.
Once I enabled debug it started showing those messages very fast and never stopped.

I have just run nodetool cleanup, and it started reading the commitlog; it seems
normal now.

Thanks for the help. I am really a newbie with Cassandra and have no idea how
slices work. Could you give me more information? Thanks a lot!
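For what it's worth, the "2147483647" in those DEBUG lines is Integer.MAX_VALUE, i.e. a slice with no effective column limit. A minimal, hypothetical sketch of the paging idea a client should use instead (pycassa's `xget()` does this internally with its `buffer_size` parameter); the `fetch` callback and column names here are stand-ins, not anything from this thread:

```python
def paged_slice(fetch, key, buffer_size=1024):
    """Yield (column, value) pairs by repeatedly asking `fetch` for at most
    `buffer_size` columns, instead of one unbounded slice (the 2147483647
    limit seen in the DEBUG logs). `fetch(key, start, count)` must return
    columns sorted ascending by name."""
    start = ''
    while True:
        chunk = fetch(key, start, buffer_size)
        for name, value in chunk:
            yield name, value
        if len(chunk) < buffer_size:
            return
        # Resume just after the last column seen (works for byte-ordered
        # string comparators; other comparators need their own "next" key).
        start = chunk[-1][0] + '\x00'

# Toy in-memory row standing in for a wide Cassandra row.
row = [('col%04d' % i, str(i)) for i in range(10)]

def fake_fetch(key, start, count):
    return [c for c in row if c[0] >= start][:count]

cols = list(paged_slice(fake_fetch, 'k', buffer_size=3))
```

With pycassa the equivalent is roughly `cf.xget(key, buffer_size=1024)`, which streams a wide row in bounded chunks rather than materializing it in one request.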

On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis  wrote:

> That says "I'm collecting data to answer requests."
>
> I don't see anything here that indicates an infinite loop.
>
> I do see that it's saying "N of 2147483647" which looks like you're
> doing slices with a much larger limit than is advisable (good way to
> OOM the way you already did).
>
> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu  wrote:
> > I gave cassandra 8GB heap size and somehow it run out of memory and
> crashed.
> > after I start it, it just runs in to the following infinite loop, the
> last
> > line:
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > goes for ever
> > I have 3 nodes and RF=2, so I am losing data. is that means I am screwed
> and
> > can't get it back?
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: auje:false:13@1305641597957075
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: ayj8:false:13@1305641597957060
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: b4fz:false:13@1305641597957096
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: 1017f:false:14@1310168680375612
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: 1018e:false:14@1310168759614715
> > DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: 101dd:false:14@1310169260225339
> >
> > On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu 
> wrote:
> >>
> >> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> >
> >
> > --
> > 闫春路
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
闫春路


node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
When I started using Cassandra I had no idea that I should run "node
repair" frequently, so basically I have 3 nodes with RF=3 that have not had
repair run for months; the data size is 20G.

The problem is that when I start running node repair now, it eats up all the disk
I/O and the server load climbs to 20+ and keeps increasing. Worst of all, the
entire cluster slows down and cannot handle requests, so I have to stop the repair
immediately because it makes my web service unavailable.

The server has an Intel Xeon-Lynnfield 3470 Quadcore [2.93GHz] and 8G of memory,
with a Western Digital WD RE3 WD1002FBYS SATA disk.

I really have no idea what to do now, as I have already found some
data loss. Any suggestions would be appreciated.


Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
just found this:
https://issues.apache.org/jira/browse/CASSANDRA-2156

but seems only available to 0.8 and people submitted a patch for 0.6, I am
using 0.7.4, do I need to dig into the code and make my own patch?

does add compaction throttle solve the io problem?  thanks!
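For reference, in 0.8 the throttle from CASSANDRA-2156 surfaces as a cassandra.yaml setting; a sketch (the value here is only an illustration, not a tuned recommendation):

```yaml
# cassandra.yaml (0.8+): cap total compaction throughput across the node.
# 16 is an arbitrary illustrative value in MB/s; 0 disables throttling.
compaction_throughput_mb_per_sec: 16
```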

On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu  wrote:

> at the beginning of using cassandra, I have no idea that I should run "node
> repair" frequently, so basically, I have 3 nodes with RF=3 and have not run
> node repair for months, the data size is 20G.
>
> the problem is when I start running node repair now, it eat up all disk io
> and the server load became 20+ and increasing, the worst thing is, the
> entire cluster has slowed down and can not handle request. so I have to stop
> it immediately because it make my web service unavailable.
>
> the server has Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G memory,
> with Western Digital WD RE3 WD1002FBYS SATA disk.
>
> I really have no idea what to do now, as currently I have already found
> some data loss, any suggestions would be appreciated.
>



-- 
闫春路


with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
this time it is another node, the node goes down during repair, and come
back but never up, I change log level to "DEBUG" and found out it print out
the following message infinitely

DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918




On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis  wrote:

> That says "I'm collecting data to answer requests."
>
> I don't see anything here that indicates an infinite loop.
>
> I do see that it's saying "N of 2147483647" which looks like you're
> doing slices with a much larger limit than is advisable (good way to
> OOM the way you already did).
>
> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu  wrote:
> > I gave cassandra 8GB heap size and somehow it run out of memory and
> crashed.
> > after I start it, it just runs in to the following infinite loop, the
> last
> > line:
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > goes for ever
> > I have 3 nodes and RF=2, so I am losing data. is that means I am screwed
> and
> > can't get it back?
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: auje:false:13@1305641597957075
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: ayj8:false:13@1305641597957060
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: b4fz:false:13@1305641597957096
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 1 of 2147483647: 1017f:false:14@1310168680375612
> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> > collecting 2 of 2147483647: 1018e:false:14@1310168759614715
> > DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123)
> > collecting 3 of 2147483647: 101dd:false:14@1310169260225339
> >
> > On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu 
> wrote:
> >>
> >> DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> >
> >
> > --
> > 闫春路
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
闫春路


Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
Thank you very much for the help. I will try adjusting minor compaction and
also dealing with a single CF at a time.
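Aaron's smallest-CF-first approach can be scripted. A hedged sketch, assuming the per-CF form `nodetool repair <keyspace> <cf>`; the keyspace, CF names, and sizes below are hypothetical:

```python
import subprocess

def repair_commands(host, keyspace, cf_sizes):
    """Build one `nodetool repair` command per column family, smallest CF
    first, so the repairs that stream the least data run first."""
    ordered = sorted(cf_sizes, key=cf_sizes.get)
    return [['nodetool', '-h', host, 'repair', keyspace, cf]
            for cf in ordered]

# Hypothetical CF -> on-disk size in MB.
sizes = {'Comments': 15000, 'Users': 200, 'Sessions': 900}
cmds = repair_commands('10.28.53.2', 'MyKeyspace', sizes)

# In real use, run them sequentially so only one repair is active at a time:
# for cmd in cmds:
#     subprocess.check_call(cmd)
```

Running one repair at a time keeps the disk I/O spike bounded to a single CF's streaming and validation work.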

On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton wrote:

> If you have never run repair also check the section on repair on this page
> http://wiki.apache.org/cassandra/Operations About how frequently it should
> be run.
>
> There is an issue where repair can stream too much data, and this can lead
> to excessive disk use.
>
> My non scientific approach to the never run repair before problem is to
> repair a single CF at a time, starting with the small ones that are less
> likely to have differences as they will stream the smallest amount of data.
>
> If you really want to conserve disk IO during the repair consider disabling
> the minor compaction by setting the min and max thresholds to 0 via node
> tool.
>
> hope that helps.
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/07/2011, at 11:46 PM, Yan Chunlu  wrote:
>
> just found this:
> <https://issues.apache.org/jira/browse/CASSANDRA-2156>
> https://issues.apache.org/jira/browse/CASSANDRA-2156
>
> but seems only available to 0.8 and people submitted a patch for 0.6, I am
> using 0.7.4, do I need to dig into the code and make my own patch?
>
> does add compaction throttle solve the io problem?  thanks!
>
> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu < 
> springri...@gmail.com> wrote:
>
>> at the beginning of using cassandra, I have no idea that I should run
>> "node repair" frequently, so basically, I have 3 nodes with RF=3 and have
>> not run node repair for months, the data size is 20G.
>>
>> the problem is when I start running node repair now, it eat up all disk io
>> and the server load became 20+ and increasing, the worst thing is, the
>> entire cluster has slowed down and can not handle request. so I have to stop
>> it immediately because it make my web service unavailable.
>>
>> the server has Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G
>> memory, with Western Digital WD RE3 WD1002FBYS SATA disk.
>>
>> I really have no idea what to do now, as currently I have already found
>> some data loss, any suggestions would be appreciated.
>>
>
>
>
> --
> 闫春路
>
>


-- 
闫春路


Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
Sorry for the misunderstanding.  I saw many "N of 2147483647" lines where N=0 and
thought it was not doing anything.

My nodes are very unbalanced and I intended to rebalance them with "nodetool
move" after a "node repair". Could that be what makes the slices so large?

Address      Status  State   Load      Owns    Token
                                               84944475733633104818662955375549269696
10.28.53.2   Down    Normal  71.41 GB  81.09%  52773518586096316348543097376923124102
10.28.53.3   Up      Normal  14.72 GB  10.48%  70597222385644499881390884416714081360
10.28.53.4   Up      Normal  13.5 GB    8.43%  84944475733633104818662955375549269696


Should I do "nodetool move" according to
http://wiki.apache.org/cassandra/Operations#Load_balancing before doing the
repair?

Thank you for your help!
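For reference, the evenly spaced tokens that the wiki's load-balancing section describes can be computed directly. A sketch for the RandomPartitioner, whose ring spans 0 to 2**127 (the 3-node count matches this cluster; the formula is the standard i * 2**127 / N):

```python
def balanced_tokens(node_count):
    """Evenly spaced RandomPartitioner tokens for a balanced ring."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]

tokens = balanced_tokens(3)
# Each token would then be assigned with: nodetool -h <node> move <token>
```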



On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis  wrote:

> This is not an infinite loop, you can see the column objects being
> iterated over are different.
>
> Like I said last time, "I do see that it's saying "N of 2147483647"
> which looks like you're
> doing slices with a much larger limit than is advisable."
>
> On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu  wrote:
> > this time it is another node, the node goes down during repair, and come
> > back but never up, I change log level to "DEBUG" and found out it print
> out
> > the following message infinitely
> > DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
> > DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
> > DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
> > DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
> > DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
> > DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
> > DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
> > DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
> > DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
> > DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
> > collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
> >
> >
> >
> > On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis 
> wrote:
> >>
> >> That says "I'm collecting data to answer requests."
> >>
> >> I don't see anything here that indicates an infinite loop.
> >>
> >> I do see that it's saying "N of 2147483647" which looks like you're
> >> doing slices with a much larger limit than is advisable (good way to
> >> OOM the way you already did).
> >>
> >> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu 
> wrote:
> >> > I gave cassandra 8GB heap size and somehow it run out of memory and
> >> > crashed.
> >> > after I start it, it just runs in to the following infinite loop, the
> >> > last
> >> > line:
> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> > collecting 0 of 2147483647: 100zs:false:14@1310168625866434
> >> > goes for ever
> >> > I have 3 nodes and RF=2, so I am losing data. is that means I am
> screwed
> >> > and
> >> > can't get it back?
> >> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> >> > collecting 20 of 2147483647: q74k:false:14@1308886095008943
> >> > DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
> >> > collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> > collecting 0 of 2147483647: apbg:false:13@1305641597957086
> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
> >> > collecting 1 of 2147483647: auje:false:13@1305641597957075
> >> > DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFil

Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
Thanks for the reply.

Now the problem is how I can get rid of the "N of 2147483647" messages; they seem
to never end, and the node never comes UP.
The last time this happened I ran "node cleanup", and it turned out some data was
lost (I'm not sure whether the cleanup caused it).

On Thu, Jul 21, 2011 at 11:37 AM, aaron morton wrote:

> Personally I would do a repair first if you need to do one, just so you are
> confident everything is where is should be.
>
> Then do the move as described in the wiki.
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21 Jul 2011, at 15:14, Yan Chunlu wrote:
>
> sorry for the misunderstanding.  I saw many N of 2147483647 which N=0 and
> thought it was not doing anything.
>
> my node was very unbalanced and I was intend to rebalance it by "nodetool
> move" after a "node repair", does that cause the slices much large?
>
> Address Status State   LoadOwnsToken
>
>
>  84944475733633104818662955375549269696
> 10.28.53.2  Down   Normal  71.41 GB81.09%
>  52773518586096316348543097376923124102
> 10.28.53.3 Up Normal  14.72 GB10.48%
>  70597222385644499881390884416714081360
> 10.28.53.4  Up Normal  13.5 GB 8.43%
> 84944475733633104818662955375549269696
>
>
> should I do "nodetool move" according to
> http://wiki.apache.org/cassandra/Operations#Load_balancing  before doing
> repair?
>
> thank you for your help!
>
>
>
> On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis wrote:
>
>> This is not an infinite loop, you can see the column objects being
>> iterated over are different.
>>
>> Like I said last time, "I do see that it's saying "N of 2147483647"
>> which looks like you're
>> doing slices with a much larger limit than is advisable."
>>
>> On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu 
>> wrote:
>> > this time it is another node, the node goes down during repair, and come
>> > back but never up, I change log level to "DEBUG" and found out it print
>> out
>> > the following message infinitely
>> > DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
>> > DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
>> > DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
>> > DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
>> > DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
>> > DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
>> > DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
>> > DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
>> > DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
>> > DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
>> > collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
>> >
>> >
>> >
>> > On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis 
>> wrote:
>> >>
>> >> That says "I'm collecting data to answer requests."
>> >>
>> >> I don't see anything here that indicates an infinite loop.
>> >>
>> >> I do see that it's saying "N of 2147483647" which looks like you're
>> >> doing slices with a much larger limit than is advisable (good way to
>> >> OOM the way you already did).
>> >>
>> >> On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu 
>> wrote:
>> >> > I gave cassandra 8GB heap size and somehow it run out of memory and
>> >> > crashed.
>> >> > after I start it, it just runs in to the following infinite loop, the
>> >> > last
>> >> > line:
>> >> > DEBUG 

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-21 Thread Yan Chunlu
After trying "nodetool -h reagon repair key cf", I found that even repairing a
single CF involves rebuilding all sstables (observed via nodetool compactionstats).
Is that normal?

On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton wrote:

> If you have never run repair also check the section on repair on this page
> http://wiki.apache.org/cassandra/Operations About how frequently it should
> be run.
>
> There is an issue where repair can stream too much data, and this can lead
> to excessive disk use.
>
> My non scientific approach to the never run repair before problem is to
> repair a single CF at a time, starting with the small ones that are less
> likely to have differences as they will stream the smallest amount of data.
>
> If you really want to conserve disk IO during the repair consider disabling
> the minor compaction by setting the min and max thresholds to 0 via node
> tool.
>
> hope that helps.
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/07/2011, at 11:46 PM, Yan Chunlu  wrote:
>
> just found this:
> <https://issues.apache.org/jira/browse/CASSANDRA-2156>
> https://issues.apache.org/jira/browse/CASSANDRA-2156
>
> but seems only available to 0.8 and people submitted a patch for 0.6, I am
> using 0.7.4, do I need to dig into the code and make my own patch?
>
> does add compaction throttle solve the io problem?  thanks!
>
> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu < 
> springri...@gmail.com> wrote:
>
>> at the beginning of using cassandra, I have no idea that I should run
>> "node repair" frequently, so basically, I have 3 nodes with RF=3 and have
>> not run node repair for months, the data size is 20G.
>>
>> the problem is when I start running node repair now, it eat up all disk io
>> and the server load became 20+ and increasing, the worst thing is, the
>> entire cluster has slowed down and can not handle request. so I have to stop
>> it immediately because it make my web service unavailable.
>>
>> the server has Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G
>> memory, with Western Digital WD RE3 WD1002FBYS SATA disk.
>>
>> I really have no idea what to do now, as currently I have already found
>> some data loss, any suggestions would be appreciated.
>>
>
>
>
> --
> 闫春路
>
>


-- 
闫春路


Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-21 Thread Yan Chunlu
SSTable rebuilding; it might be the problem described in CASSANDRA-2280.

On Thu, Jul 21, 2011 at 7:52 PM, aaron morton wrote:

> What are you seeing in compaction stats ?
>
> You may see some of  https://issues.apache.org/jira/browse/CASSANDRA-2280
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21 Jul 2011, at 23:17, Yan Chunlu wrote:
>
> after tried nodetool -h reagon repair key cf, I found that even repair
> single CF, it involves rebuild all sstables(using nodetool compactionstats),
> is that normal?
>
> On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton wrote:
>
>> If you have never run repair also check the section on repair on this
>> page
>> http://wiki.apache.org/cassandra/Operations About how frequently it
>> should be run.
>>
>> There is an issue where repair can stream too much data, and this can lead
>> to excessive disk use.
>>
>> My non scientific approach to the never run repair before problem is to
>> repair a single CF at a time, starting with the small ones that are less
>> likely to have differences as they will stream the smallest amount of data.
>>
>> If you really want to conserve disk IO during the repair consider
>> disabling the minor compaction by setting the min and max thresholds to 0
>> via node tool.
>>
>> hope that helps.
>>
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/07/2011, at 11:46 PM, Yan Chunlu  wrote:
>>
>> just found this:
>> <https://issues.apache.org/jira/browse/CASSANDRA-2156>
>> https://issues.apache.org/jira/browse/CASSANDRA-2156
>>
>> but seems only available to 0.8 and people submitted a patch for 0.6, I am
>> using 0.7.4, do I need to dig into the code and make my own patch?
>>
>> does add compaction throttle solve the io problem?  thanks!
>>
>> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu < 
>> springri...@gmail.com> wrote:
>>
>>> at the beginning of using cassandra, I have no idea that I should run
>>> "node repair" frequently, so basically, I have 3 nodes with RF=3 and have
>>> not run node repair for months, the data size is 20G.
>>>
>>> the problem is when I start running node repair now, it eat up all disk
>>> io and the server load became 20+ and increasing, the worst thing is, the
>>> entire cluster has slowed down and can not handle request. so I have to stop
>>> it immediately because it make my web service unavailable.
>>>
>>> the server has Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G
>>> memory, with Western Digital WD RE3 WD1002FBYS SATA disk.
>>>
>>> I really have no idea what to do now, as currently I have already found
>>> some data loss, any suggestions would be appreciated.
>>>
>>
>>
>>
>> --
>> 闫春路
>>
>>
>
>
> --
> 闫春路
>
>
>


-- 
闫春路


do I need to add more nodes? minor compaction eat all IO

2011-07-23 Thread Yan Chunlu
I have three nodes and RF=3. Every time a minor compaction runs, the CPU
load (8 cores) goes above 30, and iostat -x 2 shows %util at 100%. Does
that mean I need more nodes?  The total data size is <60G.
thanks!

--


Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
I am using normal SATA disk,  actually I was worrying about whether it
is okay if every time cassandra using all the io resources?
further more when is the good time to add more nodes when I was just
using normal SATA disk and with 100r/s it could reach 100 %util

how large the data size it should be on each node?


below is my iostat -x 2 when doing node repair, I have to repair
column family separately otherwise the load will be more crazy:

Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  r_await   w_await  svctm  %util
sda        1.50    1.50  121.50  14.00   3.68   0.30     60.19    116.98  1569.46    59.49  14673.86   7.38 100.00






On Sun, Jul 24, 2011 at 8:04 AM, Jonathan Ellis  wrote:
> On Sat, Jul 23, 2011 at 4:16 PM, Francois Richard  wrote:
>> My understanding is that during compaction cassandra does a lot of non 
>> sequential readsa then dumps the results with a big sequential write.
>
> Compaction reads and writes are both sequential, and 0.8 allows
> setting a MB/s to cap compaction at.
>
> As to the original question "do I need to add more machines" I'd say
> that depends more on whether your application's SLA is met, than what
> % io util spikes to.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
As the wiki suggests
(http://wiki.apache.org/cassandra/LargeDataSetConsiderations):
"Adding nodes is a slow process if each node is responsible for a large
amount of data. Plan for this; do not try to throw additional hardware
at a cluster at the last minute."


I really would like to know the status of my cluster, and whether it is normal.


On Mon, Jul 25, 2011 at 8:59 PM, Yan Chunlu  wrote:
> I am using normal SATA disk,  actually I was worrying about whether it
> is okay if every time cassandra using all the io resources?
> further more when is the good time to add more nodes when I was just
> using normal SATA disk and with 100r/s it could reach 100 %util
>
> how large the data size it should be on each node?
>
>
> below is my iostat -x 2 when doing node repair, I have to repair
> column family separately otherwise the load will be more crazy:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               1.50     1.50  121.50   14.00     3.68     0.30
> 60.19   116.98 1569.46   59.49 14673.86   7.38 100.00
>
>
>
>
>
>
> On Sun, Jul 24, 2011 at 8:04 AM, Jonathan Ellis  wrote:
>> On Sat, Jul 23, 2011 at 4:16 PM, Francois Richard  wrote:
>>> My understanding is that during compaction cassandra does a lot of non 
>>> sequential readsa then dumps the results with a big sequential write.
>>
>> Compaction reads and writes are both sequential, and 0.8 allows
>> setting a MB/s to cap compaction at.
>>
>> As to the original question "do I need to add more machines" I'd say
>> that depends more on whether your application's SLA is met, than what
>> % io util spikes to.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>


how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
I have three nodes and RF=3. Here is the current ring:


Address  Status  State   Load      Owns    Token
                                           84944475733633104818662955375549269696
node1    Up      Normal  15.32 GB  81.09%  52773518586096316348543097376923124102
node2    Up      Normal  22.51 GB  10.48%  70597222385644499881390884416714081360
node3    Up      Normal  56.1 GB    8.43%  84944475733633104818662955375549269696


It is very unbalanced and I would like to rebalance it using
"nodetool move" asap. Unfortunately, I haven't run node repair for
a long time.

Aaron suggested it's better to run node repair on every node and then rebalance.


The problem is that node3 is under heavy load currently, and the entire
cluster slows down if I start a node repair. I had to
disablegossip and disablethrift to stop the repair.

Only Cassandra runs on that server, and I have no idea what it is
doing. The CPU load is about 20+ currently; compactionstats and
netstats show it is not doing anything.

I have changed the client so it does not connect to node3, but it still seems
to be under heavy load and I/O util is 100%.


The log seems normal (although I am not sure what the "Dropped READ
messages" warning means):

 INFO 13:21:38,191 GC for ParNew: 345 ms, 627003992 reclaimed leaving
2563726360 used; max is 4248829952
 WARN 13:21:38,560 Dropped 826 READ messages in the last 5000ms
 INFO 13:21:38,560 Pool Name                 Active   Pending
 INFO 13:21:38,560 ReadStage 8  7555
 INFO 13:21:38,561 RequestResponseStage  0 0
 INFO 13:21:38,561 ReadRepairStage   0 0



Is there any way to tell what node3 is doing? Or at least, is there any
way to keep it from slowing down the whole cluster?
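A side note on reading those pool lines: the Pending column is the backlog. A minimal sketch of scanning such output mechanically; the `find_backlogged_stages` name and the threshold of 100 are my own illustrative choices, not anything from nodetool:

```python
# Scan "Pool Name / Active / Pending" lines like those in Cassandra's logs
# (and `nodetool tpstats` output), flagging stages with a large pending backlog.
def find_backlogged_stages(lines, pending_threshold=100):
    backlogged = []
    for line in lines:
        parts = line.split()
        # Expect: <StageName> <active> <pending>; extra columns are ignored.
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            name, active, pending = parts[0], int(parts[1]), int(parts[2])
            if pending > pending_threshold:
                backlogged.append((name, active, pending))
    return backlogged

log_lines = [
    "ReadStage 8 7555",
    "RequestResponseStage 0 0",
    "ReadRepairStage 0 0",
]
print(find_backlogged_stages(log_lines))  # [('ReadStage', 8, 7555)]
```

Here the huge ReadStage backlog lines up with the "Dropped 826 READ messages" warning: reads are queuing faster than the node can serve them.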


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
Wouldn't adding new nodes put even more pressure on the cluster? How large is
your data?

On Fri, Jul 29, 2011 at 4:16 AM, Frank Duan  wrote:

> "Dropped read message" might be an indicator of a capacity issue. We
> experienced a similar issue with 0.7.6.
>
> We ended up adding two extra nodes and physically rebooted the offending
> node(s).
>
> The entire cluster then calmed down.
>
> --
> Frank Duan
> aiMatch
> fr...@aimatch.com
> c: 703.869.9951
> www.aiMatch.com
>
>


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
And by the way, my RF=3 and the other two nodes have much more capacity, so
why are requests always routed to node3?

Could I do a rebalance now, before node repair?



Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Any help? Thanks!



Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Is it okay to run nodetool move before a complete repair?

using this equation?

def tokens(nodes):
    for x in xrange(nodes):
        print 2 ** 127 / nodes * x
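For reference, here is that same formula written out as runnable code (Python 3 shown, so `//` replaces the Python 2 integer `/`; the printed values are simply what the formula yields for 3 nodes in the RandomPartitioner's 2**127 token space):

```python
# Evenly spaced RandomPartitioner tokens: node i of N gets i * (2**127 // N).
def tokens(nodes):
    return [2 ** 127 // nodes * x for x in range(nodes)]

for t in tokens(3):
    print(t)
# 0
# 56713727820156410577229101238628035242
# 113427455640312821154458202477256070484
```

Each node would then be moved to its computed token with nodetool move.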


On Mon, Aug 1, 2011 at 1:17 AM, mcasandra  wrote:

> First run nodetool move and then you can run nodetool repair. Before you
> run
> nodetool move you will need to determine tokens that each node will be
> responsible for. Then use that token to perform move.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6638649.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Could I run node repair when disable gossip and thrift?

2011-07-31 Thread Yan Chunlu
I am running 3 nodes with RF=3 on Cassandra 0.7.4.
It seems that disablegossip and disablethrift keep a node at a pretty low
load, so sometimes, when node repair is "rebuilding sstables", I disable
gossip and thrift to lower the load. I am not sure whether I can keep them
disabled for the whole procedure. Thanks!


Re: Could I run node repair when disable gossip and thrift?

2011-07-31 Thread Yan Chunlu
okay, I see. thanks a lot for the help!

On Mon, Aug 1, 2011 at 5:26 AM, aaron morton wrote:

> if you disable gossip the node will appear down to others. This would stop
> the repair starting. After repair has started it *may* still cause problems
> when new streams start (it probably does not). If the node is down other
> nodes will stop sending writes to it.
>
> disable thrift will stop clients sending writes to the node.
>
> If you disable thrift you are doing the opposite of repair. You are asking
> a node to repair it's data so it's in sync with other nodes, at the  same
> time you are preventing it from accepting writes and staying in sync with
> other nodes. So you are creating more repair work for the node.
>
> You will be increasing the amount of Read Repair, Hinted Handoff and
> Repair work the node must do. IMHO it's not a good idea.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
okay, thanks Aaron!

On Mon, Aug 1, 2011 at 5:43 AM, aaron morton wrote:

> aaron suggested it's better to run node repair on every node then
> re-balance it.
>
>
> That's me being cautious with other people's data.
>
> It looks like node 3 is overwhelmed. Try getting the move sorted.
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>


Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Thanks a lot! I will try the "move".

On Mon, Aug 1, 2011 at 7:07 AM, mcasandra  wrote:

>
> springrider wrote:
> >
> > is that okay to do nodetool move before a completely repair?
> >
> > using this equation?
> > def tokens(nodes):
> >     for x in xrange(nodes):
> >         print 2 ** 127 / nodes * x
> >
>
> Yes, use that logic to get the tokens. I think it's safe to run the move
> first and repair later. You are moving a node's data as-is, so it's no worse
> than what you have right now.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-solve-one-node-is-in-heavy-load-in-unbalanced-cluster-tp6630827p6639317.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
I have tried nodetool move but got the following error:

node3:~# nodetool -h node3 move 0
Exception in thread "main" java.lang.IllegalStateException: replication
factor (3) exceeds number of endpoints (2)
 at
org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
at
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
 at
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
at
org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
 at
org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
at
org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
 at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
 at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
 at
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
 at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
 at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
 at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
 at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)




Then nodetool shows the node as Leaving:


nodetool -h reagon ring
Address Status State   LoadOwnsToken


 84944475733633104818662955375549269696
node3  Up Normal  13.18 GB81.09%
 52773518586096316348543097376923124102
node3 Up Normal  22.85 GB10.48%
 70597222385644499881390884416714081360
node3  Up Leaving 25.44 GB8.43%
84944475733633104818662955375549269696

The log didn't show any error messages nor anything abnormal. Is there
something wrong?


I used to have RF=2, and changed it to RF=3 using cassandra-cli.
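A hedged reading of the stack trace (an assumption about 0.7.x behavior, not a confirmed diagnosis): startLeaving marks the moving node as Leaving, so SimpleStrategy briefly counts only the two remaining Normal nodes as endpoints, which is fewer than RF=3. As a sketch:

```python
# Assumed invariant behind the exception: while a node is being moved it is
# in the Leaving state, so it is excluded from the endpoint count.
def move_is_allowed(cluster_size, replication_factor):
    endpoints_during_move = cluster_size - 1  # the moving node is Leaving
    return endpoints_during_move >= replication_factor

print(move_is_allowed(3, 3))  # False: RF (3) exceeds endpoints (2)
print(move_is_allowed(4, 3))  # True: a fourth node would leave 3 endpoints
```

If that reading is right, the workarounds would be adding a node before the move or temporarily lowering RF, but treat this as a hypothesis to verify, not a recommendation.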




Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
Sorry, the ring info should be this:

nodetool -h node3 ring
Address Status State   LoadOwnsToken


 84944475733633104818662955375549269696
node1  Up Normal  13.18 GB81.09%
 52773518586096316348543097376923124102
node2 Up Normal  22.85 GB10.48%
 70597222385644499881390884416714081360
node3  Up Leaving 25.44 GB8.43%
84944475733633104818662955375549269696




Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
Also, nothing is happening with streaming:

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
 Nothing streaming from /10.28.53.11
Pool Name                    Active   Pending  Completed
Commandsn/a 0  165086750
Responses   n/a 0   99372520




Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
Forgot to mention: I am using Cassandra 0.7.4.


Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
Hi, any help? Thanks!

>>>>  at
>>>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>> at
>>>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>>  at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>>> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
>>>>  at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>  at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>> at
>>>> sun.rmi.transport.tcp.TCPTransport.handleMess

move one node for load re-balancing then it status stuck at "Leaving"

2011-08-04 Thread Yan Chunlu
I have 3 nodes and the RF used to be 2; after a while I changed it
to 3. I am using Cassandra 0.7.4.
I have tried the nodetool move but get the following error
node3:~# nodetool -h node3 move 0
Exception in thread "main" java.lang.IllegalStateException:
replication factor (3) exceeds number of endpoints (2)
at 
org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
at 
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
at 
org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
at 
org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
at org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)



then nodetool shows the node is leaving
nodetool -h node3 ring
Address         Status State   Load            Owns    Token

84944475733633104818662955375549269696
node1      Up     Normal  13.18 GB        81.09%
52773518586096316348543097376923124102
node2     Up     Normal  22.85 GB        10.48%
70597222385644499881390884416714081360
node3      Up     Leaving 25.44 GB        8.43%
84944475733633104818662955375549269696


After going through the code I found the following:
    /**
     * iterator over the Tokens in the given ring, starting with the
token for the node owning start
     * (which does not have to be a Token in the ring)
     * @param includeMin True if the minimum token should be returned
in the ring even if it has no owner.
     */
    public static Iterator<Token> ringIterator(final ArrayList<Token>
ring, Token start, boolean includeMin)



Does "starting with the token for the node owning start" mean I need
to move node1 first?  What should I do now?  Restart node3 and
start over?

Why is it stuck at "Leaving" anyway?  It is supposed to either do the
move or not do it, not just get stuck partway.
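The ringIterator doc comment above can be sketched in a few lines (my own Python illustration, not Cassandra's actual code; `include_min` handling is omitted): the iterator begins at the first ring token greater than or equal to `start`, wrapping around the ring, so `start` itself need not be a member of the ring.

```python
from bisect import bisect_left

def ring_iterator(ring, start):
    """Yield tokens starting with the token for the node owning `start`
    (the first ring token >= start, wrapping past the end of the ring)."""
    i = bisect_left(ring, start) % len(ring)  # owner of `start`; wraps to 0
    for k in range(len(ring)):
        yield ring[(i + k) % len(ring)]

print(list(ring_iterator([10, 20, 30], 25)))  # -> [30, 10, 20]
```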


Re: move one node for load re-balancing then it status stuck at "Leaving"

2011-08-05 Thread Yan Chunlu
nothing...

nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
 Nothing streaming from /10.28.53.11
Pool Name            Active   Pending   Completed
Commands                n/a         0   186669475
Responses               n/a         0   117986130


nodetool -h node3 compactionstats
compaction type: n/a
column family: n/a
bytes compacted: n/a
bytes total in progress: n/a
pending tasks: 0



On Fri, Aug 5, 2011 at 1:47 PM, mcasandra  wrote:
> Check things like netstats, disk space etc to see why it's in Leaving state.
> Anything in the logs that shows Leaving?
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/move-one-node-for-load-re-balancing-then-it-status-stuck-at-Leaving-tp6655168p6655326.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>


Re: move one node for load re-balancing then it status stuck at "Leaving"

2011-08-06 Thread Yan Chunlu
Is it possible that Cassandra's implementation only counts live nodes?

For example:
"nodetool move" on node3 causes node3 to enter "Leaving"; Cassandra then
iterates over the endpoints and finds only node1 and node2. So the
endpoint count is 2 while RF=3, and the exception is raised.

Is that true?
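A hypothetical simplification of the check behind that error (function and argument names are mine, not Cassandra's actual code): excluding the leaving node from the candidate endpoints leaves fewer endpoints than the replication factor, and the move is rejected.

```python
def calculate_natural_endpoints(nodes, leaving, rf):
    # A leaving node is excluded, so 3 nodes with one leaving yields only
    # 2 candidate endpoints; RF=3 can then no longer be satisfied.
    endpoints = [n for n in nodes if n not in leaving]
    if rf > len(endpoints):
        raise RuntimeError("replication factor (%d) exceeds number of "
                           "endpoints (%d)" % (rf, len(endpoints)))
    return endpoints[:rf]

# Mirrors the cluster above: 3 nodes, node3 leaving, RF=3 -> error.
try:
    calculate_natural_endpoints(["node1", "node2", "node3"], {"node3"}, 3)
except RuntimeError as e:
    print(e)  # replication factor (3) exceeds number of endpoints (2)
```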



On Fri, Aug 5, 2011 at 3:20 PM, Yan Chunlu  wrote:

> nothing...
>
> nodetool -h node3 netstats
> Mode: Normal
> Not sending any streams.
>  Nothing streaming from /10.28.53.11
> Pool NameActive   Pending  Completed
> Commandsn/a 0  186669475
> Responses   n/a 0  117986130
>
>
> nodetool -h node3 compactionstats
> compaction type: n/a
> column family: n/a
> bytes compacted: n/a
> bytes total in progress: n/a
> pending tasks: 0
>
>
>
> On Fri, Aug 5, 2011 at 1:47 PM, mcasandra  wrote:
> > Check things like netstats, disk space etc to see why it's in Leaving
> state.
> > Anything in the logs that shows Leaving?
> >
> > --
> > View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/move-one-node-for-load-re-balancing-then-it-status-stuck-at-Leaving-tp6655168p6655326.html
> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
> at Nabble.com.
> >
>


Re: move one node for load re-balancing then it status stuck at "Leaving"

2011-08-07 Thread Yan Chunlu
thanks for the help!

On Sun, Aug 7, 2011 at 2:10 PM, Dikang Gu  wrote:

> Yes, I think you are right.
>
> The "nodetool move" will move the keys on the node to the other two nodes,
> and the required replication is 3, but you will only have 2 live nodes after
> the move, so you have the exception.
>
>
> On Sun, Aug 7, 2011 at 2:03 PM, Yan Chunlu  wrote:
>
>> is that possible that the implements of cassandra only calculate live
>> nodes?
>>
>> for example:
>> "node move node3" cause node3 "Leaving", then cassandra iterate over the
>> endpoints and found node1 and node2. so the endpoints is 2, but RF=3,
>> Exception raised.
>>
>> is that true?
>>
>>
>>
>> On Fri, Aug 5, 2011 at 3:20 PM, Yan Chunlu  wrote:
>>
>>> nothing...
>>>
>>> nodetool -h node3 netstats
>>> Mode: Normal
>>> Not sending any streams.
>>>  Nothing streaming from /10.28.53.11
>>> Pool NameActive   Pending  Completed
>>> Commandsn/a 0  186669475
>>> Responses   n/a 0  117986130
>>>
>>>
>>> nodetool -h node3 compactionstats
>>> compaction type: n/a
>>> column family: n/a
>>> bytes compacted: n/a
>>> bytes total in progress: n/a
>>> pending tasks: 0
>>>
>>>
>>>
>>> On Fri, Aug 5, 2011 at 1:47 PM, mcasandra 
>>> wrote:
>>> > Check things like netstats, disk space etc to see why it's in Leaving
>>> state.
>>> > Anything in the logs that shows Leaving?
>>> >
>>> > --
>>> > View this message in context:
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/move-one-node-for-load-re-balancing-then-it-status-stuck-at-Leaving-tp6655168p6655326.html
>>> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
>>> at Nabble.com.
>>> >
>>>
>>
>>
>
>
> --
> Dikang Gu
>
> 0086 - 18611140205
>
>


Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-07 Thread Yan Chunlu
thanks for the confirmation aaron!

On Sun, Aug 7, 2011 at 4:01 PM, aaron morton wrote:

> move first removes the node from the cluster, then adds it back
> http://wiki.apache.org/cassandra/Operations#Moving_nodes
>
> If you have 3 nodes and rf 3, removing the node will result in the error
> you are seeing. There is not enough nodes in the cluster to implement the
> replication factor.
>
> You can drop the RF down to 2 temporarily and then put it back to 3 later,
> see http://wiki.apache.org/cassandra/Operations#Replication
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5 Aug 2011, at 03:39, Yan Chunlu wrote:
>
> hi, any  help? thanks!
>
> On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu  wrote:
>
>> forgot to mention I am using cassandra 0.7.4
>>
>>
>> On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu  wrote:
>>
>>> also nothing happens about the streaming:
>>>
>>> nodetool -h node3 netstats
>>> Mode: Normal
>>> Not sending any streams.
>>>  Nothing streaming from /10.28.53.11
>>> Pool NameActive   Pending  Completed
>>> Commands    n/a 0  165086750
>>> Responses   n/a 0   99372520
>>>
>>>
>>>
>>> On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu wrote:
>>>
>>>> sorry the ring info should be this:
>>>>
>>>> nodetool -h node3 ring
>>>> Address Status State   LoadOwnsToken
>>>>
>>>>
>>>>  84944475733633104818662955375549269696
>>>> node1  Up Normal  13.18 GB81.09%
>>>>  52773518586096316348543097376923124102
>>>> node2 Up Normal  22.85 GB10.48%
>>>>  70597222385644499881390884416714081360
>>>> node3  Up Leaving 25.44 GB8.43%
>>>> 84944475733633104818662955375549269696
>>>>
>>>>
>>>>
>>>> On Thu, Aug 4, 2011 at 4:55 PM, Yan Chunlu wrote:
>>>>
>>>>> I have tried the nodetool move but get the following error
>>>>>
>>>>> node3:~# nodetool -h node3 move 0
>>>>> Exception in thread "main" java.lang.IllegalStateException: replication
>>>>> factor (3) exceeds number of endpoints (2)
>>>>>  at
>>>>> org.apache.cassandra.locator.SimpleStrategy.calculateNaturalEndpoints(SimpleStrategy.java:60)
>>>>> at
>>>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:930)
>>>>>  at
>>>>> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:896)
>>>>> at
>>>>> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1596)
>>>>>  at
>>>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1734)
>>>>> at
>>>>> org.apache.cassandra.service.StorageService.move(StorageService.java:1709)
>>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>  at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>  at
>>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>>> at
>>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>>>  at
>>>>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>>> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>>>  at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>>>> at
>>>>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>>>  at
>>>>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>>>> at
>>>>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>>>  at
>>>>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionIm

node restart taking too long

2011-08-14 Thread Yan Chunlu
I got 3 nodes and RF=3. When I was repairing node3, it seems a lot of
data was generated, and the server could not handle the load and crashed.
After coming back, node3 has not returned for more than 96 hours.

For 34GB of data, node2 could restart and be back online within 1 hour.

I am not sure what's wrong with node3. Should I restart node3 again?
thanks!

Address Status State   LoadOwnsToken

113427455640312821154458202477256070484
node1 Up Normal  34.11 GB33.33%  0
node2 Up Normal  31.44 GB33.33%
56713727820156410577229101238628035242
node3 Down   Normal  177.55 GB   33.33%
113427455640312821154458202477256070484


the log shows it is still going on, not sure why it is so slow:


 INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154) Opening
/cassandra/data/COMMENT
 INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
reading saved cache /cassandra/saved_caches/COMMENT-RowCache
 INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
completed loading (1744370 ms; 200000 keys) row cache for COMMENT
 INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
reading saved cache /cassandra/saved_caches/COMMENT-RowCache
 INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java (line
96) Saved COMMENT-RowCache (20 items) in 2535 ms
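For scale, the arithmetic on that "completed loading" line (using the 200,000-key figure Aaron Morton cites later in this thread) shows why startup drags:

```python
load_ms = 1_744_370   # "completed loading (1744370 ms; ...)" from the log above
keys = 200_000        # the row-cache size cited later in the thread
seconds = load_ms / 1000.0
print("%.0f minutes, ~%.0f rows/s" % (seconds / 60, keys / seconds))
# -> 29 minutes, ~115 rows/s: every cached row is re-read from disk at startup
```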


Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
But it seems the row cache setting is cluster wide; how will changing
the row cache affect read speed?

On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis  wrote:

> Or leave row cache enabled but disable cache saving (and remove the
> one already on disk).
>
> On Sun, Aug 14, 2011 at 5:05 PM, aaron morton 
> wrote:
> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> > completed loading (1744370 ms; 20 keys) row cache for COMMENT
> >
> > It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
> > pretty big row cache, I would suggest reducing or disabling it.
> > Background
> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
> >
> > and server can not afford the load then crashed. after come back, node 3
> can
> > not return for more than 96 hours
> >
> > Crashed how ?
> > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
> > Watch nodetool compactionstats to see when the Merkle tree build finishes
> > and nodetool netstats to see which CF's are streaming.
> > Cheers
> > -
> > Aaron Morton
> > Freelance Cassandra Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
> >
> >
> > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
> > generated.  and server can not afford the load then crashed.
> > after come back, node 3 can not return for more than 96 hours
> >
> > for 34GB data, the node 2 could restart and back online within 1 hour.
> >
> > I am not sure what's wrong with node3 and should I restart node 3 again?
> > thanks!
> >
> > Address Status State   LoadOwnsToken
> >
> > 113427455640312821154458202477256070484
> > node1 Up Normal  34.11 GB33.33%  0
> > node2 Up Normal  31.44 GB33.33%
> > 56713727820156410577229101238628035242
> > node3 Down   Normal  177.55 GB   33.33%
> > 113427455640312821154458202477256070484
> >
> >
> > the log shows it is still going on, not sure why it is so slow:
> >
> >
> >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154)
> Opening
> > /cassandra/data/COMMENT
> >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
> > completed loading (1744370 ms; 20 keys) row cache for COMMENT
> >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
> >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java
> (line
> > 96) Saved COMMENT-RowCache (20 items) in 2535 ms
> >
> >
> >
> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
 I saw a lot of SliceQueryFilter entries after changing the log level to
DEBUG.  I just think even bringing up a new node would be faster than
starting the old one.  It is weird.

DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
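As a side observation (mine, not from the thread), two of the magic values in those DEBUG lines decode straightforwardly: the count is Java's Integer.MAX_VALUE, i.e. an unbounded slice with no column limit, and the hex string is just a column name.

```python
# "of 2147483647" is Integer.MAX_VALUE, i.e. the slice has no column limit
# (whole rows are being read); the hex prefix is the column name.
print(2**31 - 1)                              # -> 2147483647
print(bytes.fromhex("76616c7565").decode())   # -> value
```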



On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu  wrote:

> but it seems the row cache is cluster wide, how will  the change of row
> cache affect the read speed?
>
>
> On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis  wrote:
>
>> Or leave row cache enabled but disable cache saving (and remove the
>> one already on disk).
>>
>> On Sun, Aug 14, 2011 at 5:05 PM, aaron morton 
>> wrote:
>> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>> > completed loading (1744370 ms; 20 keys) row cache for COMMENT
>> >
>> > It's taking 29 minutes to load 200,000 rows in the  row cache. Thats a
>> > pretty big row cache, I would suggest reducing or disabling it.
>> > Background
>> http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>> >
>> > and server can not afford the load then crashed. after come back, node 3
>> can
>> > not return for more than 96 hours
>> >
>> > Crashed how ?
>> > You may be seeing https://issues.apache.org/jira/browse/CASSANDRA-2280
>> > Watch nodetool compactionstats to see when the Merkle tree build
>> finishes
>> > and nodetool netstats to see which CF's are streaming.
>> > Cheers
>> > -
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>> >
>> >
>> > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot data
>> > generated.  and server can not afford the load then crashed.
>> > after come back, node 3 can not return for more than 96 hours
>> >
>> > for 34GB data, the node 2 could restart and back online within 1 hour.
>> >
>> > I am not sure what's wrong with node3 and should I restart node 3 again?
>> > thanks!
>> >
>> > Address Status State   LoadOwnsToken
>> >
>> > 113427455640312821154458202477256070484
>> > node1 Up Normal  34.11 GB33.33%  0
>> > node2 Up Normal  31.44 GB33.33%
>> > 56713727820156410577229101238628035242
>> > node3 Down   Normal  177.55 GB   33.33%
>> > 113427455640312821154458202477256070484
>> >
>> >
>> > the log shows it is still going on, not sure why it is so slow:
>> >
>> >
>> >  INFO [main] 2011-08-14 08:55:47,734 SSTableReader.java (line 154)
>> Opening
>> > /cassandra/data/COMMENT
>> >  INFO [main] 2011-08-14 08:55:47,828 ColumnFamilyStore.java (line 275)
>> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java (line 547)
>> > completed loading (1744370 ms; 20 keys) row cache for COMMENT
>> >  INFO [main] 2011-08-14 09:24:52,299 ColumnFamilyStore.java (line 275)
>> > reading saved cache /cassandra/saved_caches/COMMENT-RowCache
>> >  INFO [CompactionExecutor:1] 2011-08-14 10:24:55,480 CacheWriter.java
>> (line
>> > 96) Saved COMMENT-RowCache (20 items) in 2535 ms
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>


Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
Does this need to be cluster wide, or could I just modify the caches
on one node?  I could not connect to the node with cassandra-cli;
it says "connection refused":


[default@unknown] connect node2/9160;
Exception connecting to node2/9160. Reason: Connection refused.


So if I change the cache size via other nodes, how will node2 be
notified of the change?  Would killing Cassandra and starting it
again make it pick up the schema change?



On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer  wrote:
> Hi,
>
> yes, we saw exactly the same messages. We got rid of these by doing the
> following:
>
> * Set all row & key caches in your CFs to 0 via cassandra-cli
> * Kill Cassandra
> * Remove all files in the saved_caches directory
> * Start Cassandra
> * Slowly bring back row & key caches (if desired, we left them off)
>
> Cheers,
>
>        T.
>
> On 16/08/11 23:35, Yan Chunlu wrote:
>>
>>  I saw alot slicequeryfilter things if changed the log level to DEBUG.
>>  just
>> thought even bring up a new node will be faster than start the old
>> one. it
>> is wired
>>
>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>> DEBUG [main] 2011-08-16 06:32:50,881 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:42@1313201711997005
>> DEBUG [main] 2011-08-16 06:32:50,910 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:96@1312939986190155
>> DEBUG [main] 2011-08-16 06:32:50,954 SliceQueryFilter.java (line 123)
>> collecting 0 of 2147483647: 76616c7565:false:621@1313192538616112
>>
>>
>>
>> On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu <springri...@gmail.com> wrote:
>>
>>    but it seems the row cache is cluster wide, how will  the change of row
>>    cache affect the read speed?
>>
>>
>>    On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>>        Or leave row cache enabled but disable cache saving (and remove the
>>        one already on disk).
>>
>>        On Sun, Aug 14, 2011 at 5:05 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>         >  INFO [main] 2011-08-14 09:24:52,198 ColumnFamilyStore.java
>> (line 547)
>>         > completed loading (1744370 ms; 20 keys) row cache for
>> COMMENT
>>         >
>>         > It's taking 29 minutes to load 200,000 rows in the  row cache.
>> Thats a
>>         > pretty big row cache, I would suggest reducing or disabling it.
>>         > Background
>>
>>  http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra
>>         >
>>         > and server can not afford the load then crashed. after come
>> back,
>>        node 3 can
>>         > not return for more than 96 hours
>>         >
>>         > Crashed how ?
>>         > You may be seeing
>> https://issues.apache.org/jira/browse/CASSANDRA-2280
>>         > Watch nodetool compactionstats to see when the Merkle tree build
>>        finishes
>>         > and nodetool netstats to see which CF's are streaming.
>>         > Cheers
>>         > -
>>         > Aaron Morton
>>         > Freelance Cassandra Developer
>>         > @aaronmorton
>>         > http://www.thelastpickle.com
>>         > On 15 Aug 2011, at 04:23, Yan Chunlu wrote:
>>         >
>>         >
>>         > I got 3 nodes and RF=3, when I repairing ndoe3, it seems alot
>> data
>>         > generated.  and server can not af

Re: node restart taking too long

2011-08-17 Thread Yan Chunlu
But the data in saved_caches is relatively small. Will that cause the
load problem?

 ls  -lh  /cassandra/saved_caches/
total 32M
-rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53 cass-CommentSortsCache-KeyCache
-rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29 cass-CommentSortsCache-RowCache
-rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
-rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
-rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
-rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
-rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
-rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
-rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
-rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
-rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
-rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
-rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
-rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
-rw-r--r-- 1 cass cass   28 2011-08-14 12:50 system-HintsColumnFamily-KeyCache
-rw-r--r-- 1 cass cass    5 2011-08-14 12:50 system-LocationInfo-KeyCache
-rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
-rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache

On Wed, Aug 17, 2011 at 4:31 PM, aaron morton  wrote:
> If you have a node that cannot start up due to issues loading the saved cache 
> delete the files in the saved_cache directory before starting it.
>
> The settings to save the row and key cache are per CF. You can change them 
> with an update column family statement via the CLI when attached to any node. 
> You may then want to check the saved_caches directory and delete any files 
> that are left (not sure if they are automatically deleted).
>
> i would recommend:
> - stop node 2
> - delete it's saved_cache
> - make the schema change via another node
> - startup node 2
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>
>> does this need to be cluster wide? or I could just modify the caches
>> on one node?   since I could not connect to the node with
>> cassandra-cli, it says "connection refused"
>>
>>
>> [default@unknown] connect node2/9160;
>> Exception connecting to node2/9160. Reason: Connection refused.
>>
>>
>> so if I change the cache size via other nodes, how could node2 be
>> notified the changing?    kill cassandra and start it again could make
>> it update the schema?
>>
>>
>>
>> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer  wrote:
>>> Hi,
>>>
>>> yes, we saw exactly the same messages. We got rid of these by doing the
>>> following:
>>>
>>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>>> * Kill Cassandra
>>> * Remove all files in the saved_caches directory
>>> * Start Cassandra
>>> * Slowly bring back row & key caches (if desired, we left them off)
>>>
>>> Cheers,
>>>
>>>        T.
>>>
>>> On 16/08/11 23:35, Yan Chunlu wrote:
>>>>
>>>>  I saw alot slicequeryfilter things if changed the log level to DEBUG.
>>>>  just
>>>> thought even bring up a new node will be faster than start the old
>>>> one. it
>>>> is wired
>>>>
>>>> DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:225@1313068845474382
>>>> DEBUG [main] 2011-08-16 06:32:49,245 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:453@1310999270198313
>>>> DEBUG [main] 2011-08-16 06:32:49,251 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:26@1313199902088827
>>>> DEBUG [main] 2011-08-16 06:32:49,576 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:157@1313097239332314
>>>> DEBUG [main] 2011-08-16 06:32:50,674 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:41729@1313190821826229
>>>> DEBUG [main] 2011-08-16 06:32:50,811 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:6@1313174157301203
>>>> DEBUG [main] 2011-08-16 06:32:50,867 SliceQueryFilter.java (line 123)
>>>> collecting 0 of 2147483647: 76616c7565:false:98@1312011362250907
>>>>

Re: node restart taking too long

2011-08-18 Thread Yan Chunlu
thanks a lot for all the help!  I have gone through the steps and
successfully brought up node2 :)

On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen  wrote:
> Because the file only preserves the keys of the records, not the whole
> records. Records for those saved keys are loaded into Cassandra during
> its startup.
>
> On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu  wrote:
>>
>> but the data size in the saved_caches directory is relatively small:
>>
>> will that cause the load problem?
>>
>>  ls  -lh  /cassandra/saved_caches/
>> total 32M
>> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
>> cass-CommentSortsCache-KeyCache
>> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
>> cass-CommentSortsCache-RowCache
>> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
>> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
>> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
>> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
>> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
>> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
>> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
>> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
>> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
>> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50 cass-SavesByAccount-KeyCache
>> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
>> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
>> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
>> system-HintsColumnFamily-KeyCache
>> -rw-r--r-- 1 cass cass5 2011-08-14 12:50 system-LocationInfo-KeyCache
>> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
>> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
>>
>> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton 
>> wrote:
>> > If you have a node that cannot start up due to issues loading the saved
>> > cache delete the files in the saved_cache directory before starting it.
>> >
>> > The settings to save the row and key cache are per CF. You can change
>> > them with an update column family statement via the CLI when attached
to any
>> > node. You may then want to check the saved_caches directory and delete
any
>> > files that are left (not sure if they are automatically deleted).
>> >
>> > i would recommend:
>> > - stop node 2
>> > - delete it's saved_cache
>> > - make the schema change via another node
>> > - startup node 2
>> >
>> > Cheers
>> >
>> > -
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
>> >
>> >> does this need to be cluster wide? or I could just modify the caches
>> >> on one node?   since I could not connect to the node with
>> >> cassandra-cli, it says "connection refused"
>> >>
>> >>
>> >> [default@unknown] connect node2/9160;
>> >> Exception connecting to node2/9160. Reason: Connection refused.
>> >>
>> >>
>> >> so if I change the cache size via other nodes, how could node2 be
>> >> notified the changing?kill cassandra and start it again could make
>> >> it update the schema?
>> >>
>> >>
>> >>
>> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer 
>> >> wrote:
>> >>> Hi,
>> >>>
>> >>> yes, we saw exactly the same messages. We got rid of these by doing
>> >>> the
>> >>> following:
>> >>>
>> >>> * Set all row & key caches in your CFs to 0 via cassandra-cli
>> >>> * Kill Cassandra
>> >>> * Remove all files in the saved_caches directory
>> >>> * Start Cassandra
>> >>> * Slowly bring back row & key caches (if desired, we left them off)
>> >>>
>> >>> Cheers,
>> >>>
>> >>>T.
>> >>>
>> >>> On 16/08/11 23:35, Yan Chunlu wrote:
>> >>>>
>> >>>>  I saw alot slicequeryfilter things if changed the log level to
>> >>>> DEBUG.
>> >>>>  just
>> >>>> thought even bring up a new node will be faster than st

Re: node restart taking too long

2011-08-18 Thread Yan Chunlu
just found out that a schema change made via cassandra-cli didn't
reach node2, and node2 became unreachable.

I did as this document:
http://wiki.apache.org/cassandra/FAQ#schema_disagreement

but after that I just got two schema versions:



ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2]


is it enough to delete the Schema* and Migrations* sstables and restart the node?
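The FAQ procedure amounts to stopping the node, deleting the Schema* and Migrations* SSTables from the system keyspace's data directory, and restarting. A small sketch of selecting exactly those files (the path is illustrative; use whatever your data_file_directories setting points to):

```python
import glob
import os

def schema_sstables(system_dir):
    """Collect the Schema* and Migrations* SSTable files that the
    FAQ's schema-disagreement procedure says to delete (with the
    node stopped) before restarting it."""
    files = []
    for pattern in ("Schema*", "Migrations*"):
        files.extend(glob.glob(os.path.join(system_dir, pattern)))
    return sorted(files)

# Example usage (hypothetical path; run only with the node stopped):
# for f in schema_sstables("/cassandra/data/system"):
#     os.remove(f)
```

This leaves every other system SSTable (LocationInfo, HintsColumnFamily, ...) untouched, which is the point of the procedure: the node rebuilds its schema from another replica on restart.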



On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu  wrote:

> thanks a lot for  all the help!  I have gone through the steps and
> successfully brought up the node2 :)
>
>
> On Thu, Aug 18, 2011 at 10:51 AM, Boris Yen  wrote:
> > Because the file only preserve the "key" of records, not the whole
> record.
> > Records for those saved key will be loaded into cassandra during the
> startup
> > of cassandra.
> >
> > On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu 
> wrote:
> >>
> >> but the data size in the saved_cache are relatively small:
> >>
> >> will that cause the load problem?
> >>
> >>  ls  -lh  /cassandra/saved_caches/
> >> total 32M
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-12 19:53
> >> cass-CommentSortsCache-KeyCache
> >> -rw-r--r-- 1 cass cass 2.9M 2011-08-17 04:29
> >> cass-CommentSortsCache-RowCache
> >> -rw-r--r-- 1 cass cass 2.7M 2011-08-12 18:50 cass-CommentVote-KeyCache
> >> -rw-r--r-- 1 cass cass 140K 2011-08-12 19:53 cass-device_images-KeyCache
> >> -rw-r--r-- 1 cass cass  33K 2011-08-12 18:51 cass-Hide-KeyCache
> >> -rw-r--r-- 1 cass cass 4.6M 2011-08-12 19:53 cass-images-KeyCache
> >> -rw-r--r-- 1 cass cass 2.6M 2011-08-12 19:53 cass-LinksByUrl-KeyCache
> >> -rw-r--r-- 1 cass cass 2.5M 2011-08-12 18:50 cass-LinkVote-KeyCache
> >> -rw-r--r-- 1 cass cass 7.5M 2011-08-12 18:50 cass-cache-KeyCache
> >> -rw-r--r-- 1 cass cass 3.7M 2011-08-12 21:51 cass-cache-RowCache
> >> -rw-r--r-- 1 cass cass 1.8M 2011-08-12 18:51 cass-Save-KeyCache
> >> -rw-r--r-- 1 cass cass 111K 2011-08-12 19:50
> cass-SavesByAccount-KeyCache
> >> -rw-r--r-- 1 cass cass  864 2011-08-12 19:49 cass-VotesByDay-KeyCache
> >> -rw-r--r-- 1 cass cass 249K 2011-08-12 19:49 cass-VotesByLink-KeyCache
> >> -rw-r--r-- 1 cass cass   28 2011-08-14 12:50
> >> system-HintsColumnFamily-KeyCache
> >> -rw-r--r-- 1 cass cass5 2011-08-14 12:50
> system-LocationInfo-KeyCache
> >> -rw-r--r-- 1 cass cass   54 2011-08-13 13:30 system-Migrations-KeyCache
> >> -rw-r--r-- 1 cass cass   76 2011-08-13 13:30 system-Schema-KeyCache
> >>
> >> On Wed, Aug 17, 2011 at 4:31 PM, aaron morton 
> >> wrote:
> >> > If you have a node that cannot start up due to issues loading the
> saved
> >> > cache delete the files in the saved_cache directory before starting
> it.
> >> >
> >> > The settings to save the row and key cache are per CF. You can change
> >> > them with an update column family statement via the CLI when attached
> to any
> >> > node. You may then want to check the saved_caches directory and delete
> any
> >> > files that are left (not sure if they are automatically deleted).
> >> >
> >> > i would recommend:
> >> > - stop node 2
> >> > - delete it's saved_cache
> >> > - make the schema change via another node
> >> > - startup node 2
> >> >
> >> > Cheers
> >> >
> >> > -
> >> > Aaron Morton
> >> > Freelance Cassandra Developer
> >> > @aaronmorton
> >> > http://www.thelastpickle.com
> >> >
> >> > On 17/08/2011, at 2:59 PM, Yan Chunlu wrote:
> >> >
> >> >> does this need to be cluster wide? or I could just modify the caches
> >> >> on one node?   since I could not connect to the node with
> >> >> cassandra-cli, it says "connection refused"
> >> >>
> >> >>
> >> >> [default@unknown] connect node2/9160;
> >> >> Exception connecting to node2/9160. Reason: Connection refused.
> >> >>
> >> >>
> >> >> so if I change the cache size via other nodes, how could node2 be
> >> >> notified the changing?kill cassandra and start it again could
> make
> >> >> it update the schema?
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Aug 17, 2011 at 5:59 AM, Teijo Holzer 
> >> >> wrote:
> >> >>> Hi,
> >> 

Re: node restart taking too long

2011-08-19 Thread Yan Chunlu
the log file shows the following; I'm not sure what 'Couldn't find
cfId=1000' means (Google just returned useless results):


INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found
table data in data directories. Consider using JMX to call
org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
 INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
Creating new commitlog segment
/cassandra/commitlog/CommitLog-1313670197705.log
 INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
/cassandra/commitlog/CommitLog-1313670030512.log
 INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
reading /cassandra/commitlog/CommitLog-1313670030512.log
 INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay
complete
 INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
Cassandra version: 0.7.4
 INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift
API version: 19.4.0
 INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading
persisted ring state
 INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414) Starting
up server gossip
 INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
 INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
 INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
bytes)
 INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java
(line 396) Compacting
[SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
 INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using
saved token 113427455640312821154458202477256070484
 INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
 INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
ERROR [MutationStage:28] 2011-08-18 07:23:18,246 RowMutationVerbHandler.java
(line 86) Error in row mutation
org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find
cfId=1000
at
org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
at
org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
 INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node
/node1 has restarted, now UP again
ERROR [ReadStage:1] 2011-08-18 07:23:18,254
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
keyspace prjkeyspace
at
org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
at
org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
at
org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
at
org.apache.cassandra.db.SliceByNamesReadCommand.(SliceByNamesReadCommand.java:44)
at
org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
at
org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
at
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)



On Fri, Aug 19, 2011 at 5:44 AM, aaron morton wrote:

> Look in the logs to find out why the migration did not get to node2.
>
> Otherwise yes you can drop those files.
>
> Cheers
>
>   -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/08/2011, at 11:25 PM, Yan Chunlu wrote:
>
> just found out that changes via cassandra-cli, the schema change didn't
> reach node2. and node2 became unreachable
>
> I did as this document:
> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
>
> but after that I just got two schema versons:
>
>
>
> ddcada52-c96a-11e0-99af-3bd951658d61: [node1, node3]
>  2127b2ef-6998-11e0-b45b-3bd95

Re: node restart taking too long

2011-08-20 Thread Yan Chunlu
any suggestions? thanks!

On Fri, Aug 19, 2011 at 10:26 PM, Yan Chunlu  wrote:

> the log file shows as follows, not sure what does 'Couldn't find cfId=1000'
> means(google just returned useless results):
>
>
> INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> Found table data in data directories. Consider using JMX to call
> org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
>  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> Creating new commitlog segment
> /cassandra/commitlog/CommitLog-1313670197705.log
>  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
> /cassandra/commitlog/CommitLog-1313670030512.log
>  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
> reading /cassandra/commitlog/CommitLog-1313670030512.log
>  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay
> complete
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> Cassandra version: 0.7.4
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365) Thrift
> API version: 19.4.0
>  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378) Loading
> persisted ring state
>  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> Starting up server gossip
>  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
> Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
> bytes)
>  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823 CompactionManager.java
> (line 396) Compacting
> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
>  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using
> saved token 113427455640312821154458202477256070484
>  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
> Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2 operations)
>  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> RowMutationVerbHandler.java (line 86) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find
> cfId=1000
> at
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> at
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
>  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623) Node
> /node1 has restarted, now UP again
> ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> keyspace prjkeyspace
> at
> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> at
> org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:388)
> at
> org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:93)
> at
> org.apache.cassandra.db.SliceByNamesReadCommand.(SliceByNamesReadCommand.java:44)
> at
> org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:110)
> at
> org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:122)
> at
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:67)
>
>
>
> On Fri, Aug 19, 2011 at 5:44 AM, aaron morton wrote:
>
>> Look in the logs to work find out why the migration did not get to node2.
>>
>> Otherwise yes you can drop those files.
>>
>> Cheers
>>
>>   -
>> Aaron Morton
>> Freelance Cassandra De

Re: node restart taking too long

2011-08-20 Thread Yan Chunlu
that could be the reason: I did a nodetool repair (unfinished; the data
size grew roughly six times, 30G vs 170G) and there are probably some
unclean sstables on that node.

however, upgrading is tough work for me right now.  could nodetool
scrub help?  or should I decommission the node and join it again?


On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis  wrote:

> This means you should upgrade, because we've fixed bugs about ignoring
> deleted CFs since 0.7.4.
>
> On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu  wrote:
> > the log file shows as follows, not sure what does 'Couldn't find
> cfId=1000'
> > means(google just returned useless results):
> >
> > INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453)
> Found
> > table data in data directories. Consider using JMX to call
> > org.apache.cassandra.service.StorageService.loadSchemaFromYaml().
> >  INFO [main] 2011-08-18 07:23:17,705 CommitLogSegment.java (line 50)
> > Creating new commitlog segment
> > /cassandra/commitlog/CommitLog-1313670197705.log
> >  INFO [main] 2011-08-18 07:23:17,716 CommitLog.java (line 155) Replaying
> > /cassandra/commitlog/CommitLog-1313670030512.log
> >  INFO [main] 2011-08-18 07:23:17,734 CommitLog.java (line 314) Finished
> > reading /cassandra/commitlog/CommitLog-1313670030512.log
> >  INFO [main] 2011-08-18 07:23:17,744 CommitLog.java (line 163) Log replay
> > complete
> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 364)
> > Cassandra version: 0.7.4
> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 365)
> Thrift
> > API version: 19.4.0
> >  INFO [main] 2011-08-18 07:23:17,756 StorageService.java (line 378)
> Loading
> > persisted ring state
> >  INFO [main] 2011-08-18 07:23:17,766 StorageService.java (line 414)
> Starting
> > up server gossip
> >  INFO [main] 2011-08-18 07:23:17,771 ColumnFamilyStore.java (line 1048)
> > Enqueuing flush of Memtable-LocationInfo@832310230(29 bytes, 1
> operations)
> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,772 Memtable.java (line 157)
> > Writing Memtable-LocationInfo@832310230(29 bytes, 1 operations)
> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,822 Memtable.java (line 164)
> > Completed flushing /cassandra/data/system/LocationInfo-f-66-Data.db (80
> > bytes)
> >  INFO [CompactionExecutor:1] 2011-08-18 07:23:17,823
> CompactionManager.java
> > (line 396) Compacting
> >
> [SSTableReader(path='/cassandra/data/system/LocationInfo-f-63-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-64-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-65-Data.db'),SSTableReader(path='/cassandra/data/system/LocationInfo-f-66-Data.db')]
> >  INFO [main] 2011-08-18 07:23:17,853 StorageService.java (line 478) Using
> > saved token 113427455640312821154458202477256070484
> >  INFO [main] 2011-08-18 07:23:17,854 ColumnFamilyStore.java (line 1048)
> > Enqueuing flush of Memtable-LocationInfo@18895884(53 bytes, 2
> operations)
> >  INFO [FlushWriter:1] 2011-08-18 07:23:17,854 Memtable.java (line 157)
> > Writing Memtable-LocationInfo@18895884(53 bytes, 2 operations)
> > ERROR [MutationStage:28] 2011-08-18 07:23:18,246
> RowMutationVerbHandler.java
> > (line 86) Error in row mutation
> > org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't
> find
> > cfId=1000
> > at
> >
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:117)
> > at
> >
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
> > at
> >
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:50)
> > at
> >
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > at java.lang.Thread.run(Thread.java:636)
> >  INFO [GossipStage:1] 2011-08-18 07:23:18,255 Gossiper.java (line 623)
> Node
> > /node1 has restarted, now UP again
> > ERROR [ReadStage:1] 2011-08-18 07:23:18,254
> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> > java.lang.IllegalArgumentException: Unknown ColumnFamily prjcache in
> > keyspace prjkeyspace
> > at
> >
> org.apache.cassandra.config.DatabaseDescriptor.getComparator(DatabaseDescriptor.java:966)
> > at
> >
> org.apache.cassandra.db.ColumnFa

how to know if nodetool cleanup is safe?

2011-08-21 Thread Yan Chunlu
since "nodetool cleanup" can remove hinted handoffs, will it cause
data loss?


would it possible for this kind of data loss?

2011-08-21 Thread Yan Chunlu
I was aware that deleted items might come back alive without proper
node repair.

how about modified items? for example, 'A'=>{1,2,3} is later changed
to 'A'=>{4,5}: is it possible that 'A' changes back to {1,2,3}?

I encountered this mystery problem after going through a messy
procedure with the cassandra nodes (repair, interrupted repair,
restart, removing the migration/schema sstables, disabling gossip,
disabling thrift, flush, changing RF from 2->3 then 3-> then 2->3,
rebalancing the cluster using move, changing key cache and row cache
to 0 and back to 20), something like that.  Is it possible that I
somehow made the old version of the data replace the new version?

I have 3 nodes and the client read/write consistency level is QUORUM;
I changed it to ONE for a while and then changed it back.
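For background on why an older value could resurface: Cassandra reconciles two versions of the same column purely by the client-supplied timestamp (last-write-wins), so an earlier value can only win if the later write somehow carried a lower timestamp (clock skew on a client, for instance). A toy illustration of that rule (not the pycassa or Cassandra API):

```python
def reconcile(a, b):
    """Pick the winner between two (value, timestamp) versions of the
    same column the way Cassandra does: higher timestamp wins."""
    return a if a[1] >= b[1] else b

old = ("{1,2,3}", 100)  # earlier write
new = ("{4,5}", 200)    # later write
assert reconcile(old, new) == new  # normally the newer value wins

# Only a replacement write with a *lower* timestamp lets the old
# value come back:
skewed = ("{4,5}", 50)
assert reconcile(old, skewed) == old
```

So unless client timestamps went backwards, the cache/RF/repair churn described above should not by itself turn 'A' back into {1,2,3}.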


The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
I encountered this problem while updating the key cache and row cache.
I once updated them to "0" (disabled) while node2 was not available;
when it came back, the nodes eventually had the same schema version.

[default@prjspace] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
79d072cc-cc62-11e0-a753-5525ca993302: [node3, node1, node2]



every time I change the row cache and key cache back per CF, it shows
the following error:

[default@prjspace] update column family CommentCache with
keys_cached=20;


Waiting for schema agreement...
Warning: unreachable nodes node3
The schema has not settled in 10 seconds;
further migrations are ill-advised until it does.
Versions are
f7f24ef4-caf7-11e0-9b1d-5525ca993302:[node1],79d072cc-cc62-11e0-a753-5525ca993302:[node2],UNREACHABLE:[node3]


I really have no idea what this means.


Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-08-21 Thread Yan Chunlu
just tried it and it works like a charm!  thanks a lot for the great
work!



On Mon, Aug 22, 2011 at 9:47 AM, SebWajam  wrote:

> Hi,
>
> I'm working on this project for a few months now and I think it's mature
> enough to post it here:
> Cassandra Cluster Admin on GitHub
>
> Basically, it's a GUI for Cassandra. If you're like me and used MySQL for a
> while (and still using it!), you get used to phpMyAdmin and its simple and
> easy to use user interface. I thought it would be nice to have a similar
> tool for Cassandra and I couldn't find any, so I build my own!
>
> Supported actions:
>
>- Keyspace manipulation (add/edit/drop)
>- Column Family manipulation (add/edit/truncate/drop)
>- Row manipulation on column family and super column family
>(insert/edit/remove)
>- Basic data browser to navigate in the data of a column family (seems
>to be the favorite feature so far)
>- Support Cassandra 0.8+ atomic counters
>- Support management of multiple Cassandra clusters
>
> Bug report and/or pull request are always welcome!
>
> --
> View this message in context: Cassandra Cluster Admin - phpMyAdmin for
> Cassandra
> Sent from the cassandra-u...@incubator.apache.org mailing list
> archive at Nabble.com.
>


Re: The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
thanks for the migration tip, but the schema is in agreement.

[default@prjspace] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
79d072cc-cc62-11e0-a753-5525ca993302: [node3, node1, node2]

the tpstats output shows nothing pending:

MigrationStage                0         0           5614

is the MigrationStage a routine operation, or does it execute just once
at the start of the node?


is it possible that every time I update the schema it suddenly starts a
migration and my update gets interrupted?

it happens EVERY time I update the schema; that's the part I was
worrying about, but after the error, describe cluster didn't show
anything wrong.


On Mon, Aug 22, 2011 at 10:19 AM, Edward Capriolo wrote:

>
>
> On Sun, Aug 21, 2011 at 10:09 PM, Yan Chunlu wrote:
>
>> I have encountered this problem while update the key cache and row cache.
>>  I once updated them to "0"(disable) while node2 was not available, when it
>> comeback they eventually have the same schema version.
>>
>> [default@prjspace] describe cluster;
>> Cluster Information:
>>Snitch: org.apache.cassandra.locator.SimpleSnitch
>>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>Schema versions:
>> 79d072cc-cc62-11e0-a753-5525ca993302: [node3, node1, node2]
>>
>>
>>
>> ever time I change the row cache && key cache back per CF, it show the
>> following error:
>>
>> [default@prjspace] update column family CommentCache with
>> keys_cached=20;
>>
>>
>> Waiting for schema agreement...
>> Warning: unreachable nodes node3The schema has not settled in 10 seconds;
>> further migrations are ill-advised until it does.
>> Versions are
>> f7f24ef4-caf7-11e0-9b1d-5525ca993302:[node1],79d072cc-cc62-11e0-a753-5525ca993302:[node2],UNREACHABLE:[node3]
>>
>>
>> really have no idea what does this means
>>
>>
>>
>>
> Unreachable can mean that the node is still completing the schema
> migration. Only one operation can happen in this stage at a time. Try:
>
> [default@unknown] describe cluster;
>
> Might say that a node is unreachable. That can mean that node has a thread
> in the migration stage. You can check like so.
>
> # /usr/local/cassandra/bin/nodetool -h cdbsd03 -p 8585 tpstats | grep
> Migration
>
> On drop column family or truncate operations I have seen nodes stay in
> UNKNOWN state for a while.
>
> If they continue having trouble try a restart. If they still have trouble
> follow the FAQ's advice about correcting schema disagreement.
>
> Edward
>
>
>
>


get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-21 Thread Yan Chunlu
connecting to cassandra-cli and issuing "list mycf", I got:

RowKey: comments_62559
=> (column=76616c7565,
value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,
timestamp=1312791934150273)


and using
get mycf['comments_62559'] returns:
(column=76616c7565,
value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,
timestamp=1312791934150273)



but
get mycf['comments_62559'][76616c7565];

returns 'Value was not found'

did I do something wrong?
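A quick way to see what that hex column name actually encodes: 76616c7565 is just the ASCII bytes of the string "value" (one line of modern Python, shown here as an offline check):

```python
# Decode the hex column name that cassandra-cli printed.
name = bytes.fromhex("76616c7565").decode("ascii")
print(name)  # value
```

Knowing the decoded name explains the replies below: addressing the column by its decoded string (or via the CLI's typed accessor functions) is what makes the lookup succeed.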


Re: get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-22 Thread Yan Chunlu
the cassandra-cli version I am using is shipped with the cassandra 0.7.4
package;

but I could get results by the column name "14np_20nl":
get mycf2[14np][14np_20nl];




On Mon, Aug 22, 2011 at 1:20 PM, Jonathan Ellis  wrote:

> My guess: you're using an old version of the cli that isn't dealing
> with bytestype column names correctly
>
> On Mon, Aug 22, 2011 at 12:08 AM, Yan Chunlu 
> wrote:
> > connect to cassandra-cli and issue the list my cf I got
> > RowKey: comments_62559
> > => (column=76616c7565,
> >
> value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,
> > timestamp=1312791934150273)
> >
> > and using
> > get mycf['comments_62559'] could return
> > (column=76616c7565,
> >
> value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,
> > timestamp=1312791934150273)
> >
> >
> > but
> > get mycf['comments_62559'][76616c7565];
> > returns 'Value was not found'
> > did I do something wrong?
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-22 Thread Yan Chunlu
thanks a lot!

On Mon, Aug 22, 2011 at 10:14 PM, Edward Capriolo wrote:

>
>
> On Mon, Aug 22, 2011 at 1:08 AM, Yan Chunlu  wrote:
>
>> connect to cassandra-cli and issue the list my cf I got
>>
>> RowKey: comments_62559
>> => (column=76616c7565,
>> value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,
>> timestamp=1312791934150273)
>>
>>
>> and using
>> get mycf['comments_62559'] could return
>> (column=76616c7565,
>> value=28286c70310a4c3236373632334c0a614c3236373733304c0a614c3236373737304c0a614c3236373932324c0a614c3236373934364c0a614c3236383137314c0a614c3236383330334c0a614c3236383934314c0a614c3236383938394c0,
>> timestamp=1312791934150273)
>>
>>
>>
>> but
>> get mycf['comments_62559'][76616c7565];
>>
>> returns 'Value was not found'
>>
>> did I do something wrong?
>>
>
> Yes, probably. Based on how you have defined your column families, the data
> stored in your columns may be displayed differently. By default the storage
> is byte[]. The CLI decides whether to convert them to hex strings (each
> major version of C*, i.e. 0.6.x, 0.7.x and 0.8.x, was selective about what
> it converted and why).
>
> In any case there are two fixes:
> 1) Update the column family metadata and set the types correctly: ASCII,
> UTF8, LONG, etc.
>
> 2) Use the ASSUME keyword in the CLI to convert the rows to readable
> displays,
> and when selecting columns use CLI functions such as: get
> CF[ascii('x')][ascii('y')] to get what you are actually asking for.
>
> The CLI is more correct in current versions than it was in the past in
> regard to types and conversions, but if you do not define CF metadata it
> makes you scratch your head at times, because it is not exactly clear that
> it is showing you a hex-encoded byte[] and not an ASCII string.
>
> Edward
>
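As a side note on the hex output above: the column name 76616c7565 is just the hex encoding of the ASCII string "value", which is why the bare token in `get mycf['comments_62559'][76616c7565]` does not match the stored bytes. A quick check (shown in Python 3 for brevity, though the list's pycassa code was Python 2) demonstrates the round trip that the CLI's `ascii()` function effectively performs:

```python
# "76616c7565" is the hex column name shown by cassandra-cli.
col_hex = "76616c7565"

# Decode the hex bytes back to the ASCII column name.
col_name = bytes.fromhex(col_hex).decode("ascii")
print(col_name)  # -> value

# Encoding the other way reproduces the CLI's hex display.
assert col_name.encode("ascii").hex() == col_hex
```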


Re: how to know if nodetool cleanup is safe?

2011-08-24 Thread Yan Chunlu
got it! thanks a lot for the explanation!

On Wed, Aug 24, 2011 at 1:06 AM, Edward Capriolo wrote:

>
> On Tue, Aug 23, 2011 at 11:56 AM, Sam Overton  wrote:
>
>> On 21 August 2011 12:34, Yan Chunlu  wrote:
>>
>>> since "nodetool cleanup" could remove hinted handoffs, will it cause
>>> data loss?
>>
>>
>> Hi Yan,
>>
>> Hints are not guaranteed to be delivered and "nodetool cleanup" is one of
>> the reasons for that. This will only cause data-loss if you are writing at
>> CL.ANY where a hint counts as a write. If you are writing at CL.ONE or above
>> then at least one replica must receive the data for the write to succeed, so
>> losing hints will not cause data-loss.
>>
>> If a hint is not delivered then the replica to which it was intended will
>> become consistent after a read-repair, or after manual anti-entropy repair.
>>
>> Sam
>>
>> --
>> Sam Overton
>> Acunu | http://www.acunu.com | @acunu
>>
>
> If you run nodetool tpstats on each node in your cluster and none of
> them has active or pending threads in the Hinted stage, no hints are
> currently being delivered. But as pointed out above, Hinted Handoff is a
> best-effort system.
>
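Edward's check can be scripted. The sketch below parses `nodetool tpstats` output for the hinted-handoff stage; note that the sample text and column layout are assumptions, since the exact stage name and format vary across Cassandra versions:

```python
# Parse (simulated) `nodetool tpstats` output and flag in-flight hints.
# The sample text below is illustrative; real output varies by version.
sample_tpstats = """\
Pool Name                    Active   Pending      Completed
ReadStage                         0         0         104714
MutationStage                     0         0         201133
HintedHandoff                     0         2             15
"""

def hints_in_flight(tpstats_text):
    """Return (active, pending) for the hinted-handoff stage, or None."""
    for line in tpstats_text.splitlines():
        if line.startswith("Hinted"):
            fields = line.split()
            # Typical layout: name, active, pending, completed, ...
            return int(fields[1]), int(fields[2])
    return None

active, pending = hints_in_flight(sample_tpstats)
print(active, pending)  # -> 0 2
```

To run this against a live cluster you would feed it the output of `subprocess.check_output(["nodetool", "-h", host, "tpstats"], text=True)` for each node, and only consider hints quiescent when every node reports zero active and pending.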


cassandra auto create snapshots?

2011-08-29 Thread Yan Chunlu
just found that the data dir consumes a lot of space, because there are
many snapshots in it.

but I have set snapshot_before_compaction: false.  is it possible that
cassandra creates those snapshots automatically?  can I delete them?

the dir names are strange (normally they should contain date info, like this
one: 1309954201568-20110706snap):

1309954201568-20110706snap  1313655860450  1313657278693  1313658563469
 1313660368230  1313661946829  1313673041895  1313978414627  1313993893774
 1313994151125
1309954218127-20110706snap  1313655977397  1313657539385  1313658794663
 1313660540216  1313665575966  1313684621571  1313981129181  1313993899685
 1313994436489
1309954367559-20110706snap  1313656318839  1313657769222  1313659117385
 1313660791414  1313670545340  1313688842151  1313993527645  1313994045681
 1313994537282
1313655524690   1313656750454  1313657987723  1313659841880
 1313661186615  1313671377867  1313821800047  1313993601281  1313994093703
 1313994548470
1313655632823   1313657058205  1313658288595  1313660165346
 1313661433088  1313672861562  1313822718672  1313993882415  1313994138019
 1313994791621
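The bare numeric directory names are less opaque than they look: they appear to be Unix timestamps in milliseconds (this epoch-millisecond interpretation is an assumption, but it is consistent with the dated 20110706snap-style names). A quick way to date one:

```python
from datetime import datetime, timezone

# One of the unexplained snapshot directory names from the listing.
name = "1313655860450"

# Interpreting it as epoch milliseconds gives a plausible date.
when = datetime.fromtimestamp(int(name) / 1000, tz=timezone.utc)
print(when.date())  # -> 2011-08-18
```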


Re: cassandra auto create snapshots?

2011-08-29 Thread Yan Chunlu
so those snapshots are useless?   I didn't drop any CF/KS; could "nodetool
move" or "nodetool repair" cause the problem?

On Tue, Aug 30, 2011 at 5:23 AM, Jonathan Ellis  wrote:

> Perhaps you are seeing auto-snapshots before destructive events such
> as truncate or drop CF/KS.
>
> On Mon, Aug 29, 2011 at 4:19 PM, Yan Chunlu  wrote:
> > just found the data dir consume a lot of space, which is because there
> was
> > many snapshots in it.
> > but I have set snapshot_before_compaction: false.  is that possible that
> > cassandra create those snapshot automatically?  could I delete them?
> > the dir names is strange(normally it should contain date info like this
> one:
> > 1309954201568-20110706snap):
> > 1309954201568-20110706snap  1313655860450  1313657278693  1313658563469
> >  1313660368230  1313661946829  1313673041895  1313978414627
>  1313993893774
> >  1313994151125
> > 1309954218127-20110706snap  1313655977397  1313657539385  1313658794663
> >  1313660540216  1313665575966  1313684621571  1313981129181
>  1313993899685
> >  1313994436489
> > 1309954367559-20110706snap  1313656318839  1313657769222  1313659117385
> >  1313660791414  1313670545340  1313688842151  1313993527645
>  1313994045681
> >  1313994537282
> > 1313655524690   1313656750454  1313657987723  1313659841880
> >  1313661186615  1313671377867  1313821800047  1313993601281
>  1313994093703
> >  1313994548470
> > 1313655632823   1313657058205  1313658288595  1313660165346
> >  1313661433088  1313672861562  1313822718672  1313993882415
>  1313994138019
> >  1313994791621
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: cassandra auto create snapshots?

2011-08-29 Thread Yan Chunlu
thanks for the help.   have you tried using those snapshots to recover a
node?  I have not found anything related to those auto-created snapshots in
the wiki page; they don't even have a timestamp, so I am not sure how to use
those files

On Tue, Aug 30, 2011 at 10:27 AM, Jonathan Ellis  wrote:

> No.
>
> On Mon, Aug 29, 2011 at 8:15 PM, Yan Chunlu  wrote:
> > so it was useless?   I didn't drop any CF/KS,  could "nodetool move",
> > "nodetool repair" cause the problem?
> >
> > On Tue, Aug 30, 2011 at 5:23 AM, Jonathan Ellis 
> wrote:
> >>
> >> Perhaps you are seeing auto-snapshots before destructive events such
> >> as truncate or drop CF/KS.
> >>
> >> On Mon, Aug 29, 2011 at 4:19 PM, Yan Chunlu 
> wrote:
> >> > just found the data dir consume a lot of space, which is because there
> >> > was
> >> > many snapshots in it.
> >> > but I have set snapshot_before_compaction: false.  is that possible
> that
> >> > cassandra create those snapshot automatically?  could I delete them?
> >> > the dir names is strange(normally it should contain date info like
> this
> >> > one:
> >> > 1309954201568-20110706snap):
> >> > 1309954201568-20110706snap  1313655860450  1313657278693
>  1313658563469
> >> >  1313660368230  1313661946829  1313673041895  1313978414627
> >> >  1313993893774
> >> >  1313994151125
> >> > 1309954218127-20110706snap  1313655977397  1313657539385
>  1313658794663
> >> >  1313660540216  1313665575966  1313684621571  1313981129181
> >> >  1313993899685
> >> >  1313994436489
> >> > 1309954367559-20110706snap  1313656318839  1313657769222
>  1313659117385
> >> >  1313660791414  1313670545340  1313688842151  1313993527645
> >> >  1313994045681
> >> >  1313994537282
> >> > 1313655524690   1313656750454  1313657987723
>  1313659841880
> >> >  1313661186615  1313671377867  1313821800047  1313993601281
> >> >  1313994093703
> >> >  1313994548470
> >> > 1313655632823   1313657058205  1313658288595
>  1313660165346
> >> >  1313661433088  1313672861562  1313822718672  1313993882415
> >> >  1313994138019
> >> >  1313994791621
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


what's the difference between repair CF separately and repair the entire node?

2011-09-08 Thread Yan Chunlu
I have 3 nodes and RF=3.  I tried to repair every node in the cluster by
running "nodetool repair mykeyspace mycf" on every column family.  it
finished within 3 hours; the data size is no more than 50GB.
after that, I tried running plain "nodetool repair" immediately to repair
the entire node, but 48 hours have passed and it is still going on.
"compactionstats" shows it is doing "SSTable rebuild".

so I am frustrated: why is "nodetool repair" so slow?  how is it different
from repairing every CF?

I didn't try to repair the system keyspace; does it also need repair?
 thanks!
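For reference, repairing CF-by-CF as described above is just a loop over `nodetool repair <keyspace> <cf>`. A small sketch that builds the per-CF invocations (host, keyspace, and CF names are the placeholders from this thread; pass each command to `subprocess.check_call` to actually run it):

```python
def repair_commands(host, keyspace, column_families):
    """Build one `nodetool repair` invocation per column family."""
    return [["nodetool", "-h", host, "repair", keyspace, cf]
            for cf in column_families]

# Placeholder names from the thread; substitute your real CF list.
for cmd in repair_commands("localhost", "mykeyspace", ["mycf"]):
    print(" ".join(cmd))
# -> nodetool -h localhost repair mykeyspace mycf
```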


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-12 Thread Yan Chunlu
I am using 0.7.4.  so is it always okay to do routine repair on a
column-family basis? thanks!

On Fri, Sep 9, 2011 at 3:25 PM, Sylvain Lebresne  wrote:
>
> On Fri, Sep 9, 2011 at 4:18 AM, Yan Chunlu  wrote:
> > I have 3 nodes and RF=3.  I  tried to repair every node in the cluster by
> > using "nodetool repair mykeyspace mycf" on every column family.  it finished
> > within 3 hours, the data size is no more than 50GB.
> > after the repair, I have tried using nodetool repair immediately to repair
> > the entire node, but 48 hours has past it still going on. "compactionstats"
> > shows it is doing "SSTable rebuild".
> > so I am frustrating about why does "nodetool repair" so slow?   how does it
> > different with repair every CF?
>
> What version of Cassandra are you using? If you are using something < 0.8.2,
> then it may be because "nodetool repair" used to schedule its sub-tasks
> poorly, in ways that were counter-productive (fixed by CASSANDRA-2816).
>
> If you are using a more recent version, then it's an interesting report.
>
> > I didn't tried to repair the system keyspace, does it also need to repair?
>
> It doesn't.
>
> --
> Sylvain


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-12 Thread Yan Chunlu
I think it is a serious problem since I cannot "repair".  I am
using cassandra on production servers. is there some way to fix it
without upgrading?  I heard that 0.8.x is still not quite ready for
production environments.

thanks!

On Tue, Sep 13, 2011 at 1:44 AM, Peter Schuller
 wrote:
>> I am using 0.7.4.  so it is always okay to do the routine repair on
>> Column Family basis? thanks!
>
> It's "okay" but won't do what you want; due to a bug you'll see
> streaming of data for other column families than the one you're trying
> to repair. This will be fixed in 1.0.
>
> --
> / Peter Schuller (@scode on twitter)
>


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-13 Thread Yan Chunlu
I don't need to repair one CF at a time either.

the "node repair" took a week and was still running; compactionstats and
netstats showed nothing running on any node, and there was also no error
message and no exception, so I really have no idea what it was doing.  I
stopped it yesterday.  maybe I should run repair again while disabling
compaction on all nodes?

thanks!


On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller  wrote:

> > I think it is a serious problem since I can not "repair".  I am
> > using cassandra on production servers. is there some way to fix it
> > without upgrade?  I heard of that 0.8.x is still not quite ready in
> > production environment.
>
> It is a serious issue if you really need to repair one CF at a time.
> However, looking at your original post it seems this is not
> necessarily your issue. Do you need to, or was your concern rather the
> overall time repair took?
>
> There are other things that are improved in 0.8 that affect 0.7. In
> particular, (1) in 0.7 compaction, including validating compactions
> that are part of repair, is non-concurrent so if your repair starts
> while there is a long-running compaction going it will have to wait,
> and (2) semi-related is that the merkle tree calculation that is part
> of repair/anti-entropy may happen "out of synch" if one of the nodes
> participating happens to be busy with compaction. This in turn causes
> additional data to be sent as part of repair.
>
> That might be why your immediately following repair took a long time,
> but it's difficult to tell.
>
> If you're having issues with repair and large data sets, I would
> generally say that upgrading to 0.8 is recommended. However, if you're
> on 0.7.4, beware of
> https://issues.apache.org/jira/browse/CASSANDRA-3166
>
> --
> / Peter Schuller (@scode on twitter)
>


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Yan Chunlu
is 0.8 ready for production use?   as far as I know, many companies
including reddit.com are currently using 0.7; how do they get around the
repair problem?

On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne wrote:

> On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu  wrote:
> > me neither don't want to repair one CF at the time.
> > the "node repair" took a week and still running, compactionstats and
> > netstream shows nothing is running on every node,  and also no error
> > message, no exception, really no idea what was it doing,
>
> To add to the list of things repair does wrong in 0.7, we'll have to add
> that if one of the nodes participating in the repair (so any node that
> shares a range with the node on which repair was started) goes down (even
> for a short time), then the repair will simply hang forever doing nothing.
> And no specific error message will be logged. That could be what happened.
> Again, recent releases of 0.8 fix that too.
>
> --
> Sylvain
>
> > I stopped yesterday.  maybe I should run repair again while disable
> > compaction on all nodes?
> > thanks!
> >
> > On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
> >  wrote:
> >>
> >> > I think it is a serious problem since I can not "repair".  I am
> >> > using cassandra on production servers. is there some way to fix it
> >> > without upgrade?  I heard of that 0.8.x is still not quite ready in
> >> > production environment.
> >>
> >> It is a serious issue if you really need to repair one CF at the time.
> >> However, looking at your original post it seems this is not
> >> necessarily your issue. Do you need to, or was your concern rather the
> >> overall time repair took?
> >>
> >> There are other things that are improved in 0.8 that affect 0.7. In
> >> particular, (1) in 0.7 compaction, including validating compactions
> >> that are part of repair, is non-concurrent so if your repair starts
> >> while there is a long-running compaction going it will have to wait,
> >> and (2) semi-related is that the merkle tree calculation that is part
> >> of repair/anti-entropy may happen "out of synch" if one of the nodes
> >> participating happen to be busy with compaction. This in turns causes
> >> additional data to be sent as part of repair.
> >>
> >> That might be why your immediately following repair took a long time,
> >> but it's difficult to tell.
> >>
> >> If you're having issues with repair and large data sets, I would
> >> generally say that upgrading to 0.8 is recommended. However, if you're
> >> on 0.7.4, beware of
> >> https://issues.apache.org/jira/browse/CASSANDRA-3166
> >>
> >> --
> >> / Peter Schuller (@scode on twitter)
> >
> >
>


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Yan Chunlu
thanks a lot for the help!

 I have read the post and think 0.8 might be good enough for me, especially
0.8.5.

also, changing gc_grace_seconds is an acceptable solution.



On Wed, Sep 14, 2011 at 4:03 PM, Sylvain Lebresne wrote:

> On Wed, Sep 14, 2011 at 9:27 AM, Yan Chunlu  wrote:
> > is 0.8 ready for production use?
>
> some related discussion here:
> http://www.mail-archive.com/user@cassandra.apache.org/msg17055.html
> but my personal answer is yes.
>
> >  as I know currently many companies including reddit.com are using 0.7,
> how
> > does they get rid of the repair problem?
>
> Repair problems in 0.7 don't hit everyone equally. For some people, it
> works relatively well, even if not in the most efficient way. Also, for
> some workloads (if you don't do many deletes, for instance), you can set
> a big gc_grace_seconds value (say a month) and only run repair that often,
> which can make repair inefficiencies more bearable.
> That being said, I can't speak for "many companies", but I do advise
> evaluating
> an upgrade to 0.8.
>
> --
> Sylvain
>
> >
> > On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne 
> > wrote:
> >>
> >> On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu 
> wrote:
> >> > me neither don't want to repair one CF at the time.
> >> > the "node repair" took a week and still running, compactionstats and
> >> > netstream shows nothing is running on every node,  and also no error
> >> > message, no exception, really no idea what was it doing,
> >>
> >> To add to the list of things repair does wrong in 0.7, we'll have to add
> >> that
> >> if one of the node participating in the repair (so any node that share a
> >> range
> >> with the node on which repair was started) goes down (even for a short
> >> time),
> >> then the repair will simply hang forever doing nothing. And no specific
> >> error message will be logged. That could be what happened. Again, recent
> >> releases of 0.8 fix that too.
> >>
> >> --
> >> Sylvain
> >>
> >> > I stopped yesterday.  maybe I should run repair again while disable
> >> > compaction on all nodes?
> >> > thanks!
> >> >
> >> > On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
> >> >  wrote:
> >> >>
> >> >> > I think it is a serious problem since I can not "repair".  I am
> >> >> > using cassandra on production servers. is there some way to fix it
> >> >> > without upgrade?  I heard of that 0.8.x is still not quite ready in
> >> >> > production environment.
> >> >>
> >> >> It is a serious issue if you really need to repair one CF at the
> time.
> >> >> However, looking at your original post it seems this is not
> >> >> necessarily your issue. Do you need to, or was your concern rather
> the
> >> >> overall time repair took?
> >> >>
> >> >> There are other things that are improved in 0.8 that affect 0.7. In
> >> >> particular, (1) in 0.7 compaction, including validating compactions
> >> >> that are part of repair, is non-concurrent so if your repair starts
> >> >> while there is a long-running compaction going it will have to wait,
> >> >> and (2) semi-related is that the merkle tree calculation that is part
> >> >> of repair/anti-entropy may happen "out of synch" if one of the nodes
> >> >> participating happen to be busy with compaction. This in turns causes
> >> >> additional data to be sent as part of repair.
> >> >>
> >> >> That might be why your immediately following repair took a long time,
> >> >> but it's difficult to tell.
> >> >>
> >> >> If you're having issues with repair and large data sets, I would
> >> >> generally say that upgrading to 0.8 is recommended. However, if
> you're
> >> >> on 0.7.4, beware of
> >> >> https://issues.apache.org/jira/browse/CASSANDRA-3166
> >> >>
> >> >> --
> >> >> / Peter Schuller (@scode on twitter)
> >> >
> >> >
> >
> >
>


segment fault with 0.8.5

2011-09-14 Thread Yan Chunlu
just tried the cassandra 0.8.5 binary version, and got a segmentation fault

I am using the Sun JDK, so this is not CASSANDRA-2441


OS is Debian 5.0


java -version

java version "1.6.0_04"

Java(TM) SE Runtime Environment (build 1.6.0_04-b12)

Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)


uname -a

Linux mao 2.6.27.59 #1 SMP Mon Jul 25 14:30:33 CST 2011 i686 GNU/Linux


I also found that the format of the configuration file "cassandra.yaml" is
different; are the formats compatible?



thanks!


how did hprof file generated?

2011-09-15 Thread Yan Chunlu
on one of my nodes, I found many hprof files in the cassandra installation
directory; they are using as much as 200GB of disk space.  other nodes don't
have those files.

it turns out that those files are used for memory analysis, but I am not
sure how they are generated?


like these:

java_pid10626.hprof  java_pid13898.hprof  java_pid17061.hprof
 java_pid21002.hprof  java_pid23194.hprof  java_pid29241.hprof
 java_pid5013.hprof
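The number embedded in each file name is the PID of the JVM that wrote the dump, which helps match each dump to a particular Cassandra start in your logs. A minimal sketch of pulling the PIDs out (file names taken from the listing above):

```python
import re

# Heap dumps are named java_pid<PID>.hprof; extract the PIDs.
names = ["java_pid10626.hprof", "java_pid13898.hprof",
         "java_pid5013.hprof"]

def dump_pid(name):
    m = re.fullmatch(r"java_pid(\d+)\.hprof", name)
    return int(m.group(1)) if m else None

print([dump_pid(n) for n in names])  # -> [10626, 13898, 5013]
```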


Re: how did hprof file generated?

2011-09-15 Thread Yan Chunlu
got it! thanks!

On Thu, Sep 15, 2011 at 4:10 PM, Peter Schuller  wrote:

> > in one of my node, I found many hprof files in the cassandra installation
> > directory, they are using as much as 200GB disk space.  other nodes
> didn't
> > have those files.
> > turns out that those files are used for memory analyzing, not sure how
> they
> > are generated?
>
> You're probably getting OutOfMemory exceptions. Cassandra by default
> runs with -XX:+HeapDumpOnOutOfMemoryError, which writes a heap dump on
> each OutOfMemoryError. If this is the case, you probably need to increase
> your heap size or adjust Cassandra settings.
>
> --
> / Peter Schuller (@scode on twitter)
>


"Ignorning message." showing in the log while upgrade to 0.8

2011-09-16 Thread Yan Chunlu
I am running local tests of upgrading cassandra from 0.7.4 to 0.8.5.
after upgrading node1, two problems happened:

1,  node2 keep saying:

"Received connection from newer protocol version. Ignorning message."

is that normal behaviour?

2, while running "describe cluster" on node1, it shows node2 unreachable:
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
UNREACHABLE: [node2]
05f1ee3b-e063-11e0-97d5-63c2fb3f0ca8: [node1, node3]

node3 seems to act normally.


I saw the JMX port has changed since 0.8; is that the reason the node was
unreachable?


thanks!


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-16 Thread Yan Chunlu
after killing node1 and starting it again, node3 has the same problem as
node2...

On Fri, Sep 16, 2011 at 10:42 PM, Yan Chunlu  wrote:

> I am running local tests about upgrade cassandra.  upgrade from 0.7.4 to
> 0.8.5
> after upgrade one node1,  two problem happened:
>
> 1,  node2 keep saying:
>
> "Received connection from newer protocol version. Ignorning message."
>
> is that normal behaviour?
>
> 2, while running "describe cluster" on node1, it shows node2 unreachable:
> Cluster Information:
>Snitch: org.apache.cassandra.locator.SimpleSnitch
>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>Schema versions:
> UNREACHABLE: [node2]
> 05f1ee3b-e063-11e0-97d5-63c2fb3f0ca8: [node1, node3]
>
> node3 seems act normal.
>
>
> I saw the JMXPORT has changed since 0.8, is that the reason node was
> unreachable?
>
>
> thanks!
>
>
>


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-16 Thread Yan Chunlu
and also the load is unusual (node1 had 80MB of data before the upgrade):

bash-3.2$ bin/nodetool -h localhost ring
Address DC  RackStatus State   LoadOwns
   Token

   93798607613553124915572813490354413064
node2   datacenter1 rack1   Up Normal  86.03 MB46.81%
 3303745385038694806791595159000401786
node3   datacenter1 rack1   Up Normal  67.68 MB26.65%
 48642301133762927375044585593194981764
node1   datacenter1 rack1   Up Normal  114.81 KB   26.54%
 93798607613553124915572813490354413064



On Fri, Sep 16, 2011 at 10:48 PM, Yan Chunlu  wrote:

> after kill node1 and start it again, node 3 has the same problems with
> node2...
>
>
> On Fri, Sep 16, 2011 at 10:42 PM, Yan Chunlu wrote:
>
>> I am running local tests about upgrade cassandra.  upgrade from 0.7.4 to
>> 0.8.5
>> after upgrade one node1,  two problem happened:
>>
>> 1,  node2 keep saying:
>>
>> "Received connection from newer protocol version. Ignorning message."
>>
>> is that normal behaviour?
>>
>> 2, while running "describe cluster" on node1, it shows node2 unreachable:
>> Cluster Information:
>>Snitch: org.apache.cassandra.locator.SimpleSnitch
>>Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>Schema versions:
>> UNREACHABLE: [node2]
>> 05f1ee3b-e063-11e0-97d5-63c2fb3f0ca8: [node1, node3]
>>
>> node3 seems act normal.
>>
>>
>> I saw the JMXPORT has changed since 0.8, is that the reason node was
>> unreachable?
>>
>>
>> thanks!
>>
>>
>>
>


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-17 Thread Yan Chunlu
so the schema versions being inconsistent is abnormal?

is there a document explaining the possible problems while upgrading?  I
think it would concern many cassandra users, because if anything unexpected
happens during an upgrade it might cause serious problems on a production
server, and there is no way to revert the operation.

On Fri, Sep 16, 2011 at 11:29 PM, Jonathan Ellis  wrote:

> On Fri, Sep 16, 2011 at 9:42 AM, Yan Chunlu  wrote:
> > I am running local tests about upgrade cassandra.  upgrade from 0.7.4 to
> > 0.8.5
> > after upgrade one node1,  two problem happened:
> > 1,  node2 keep saying:
> > "Received connection from newer protocol version. Ignorning message."
> > is that normal behaviour?
>
> Yes.  It will take a few exchanges before the new node knows to use
> the older protocol with the 0.7 nodes.
>
> > 2, while running "describe cluster" on node1, it shows node2 unreachable:
> > Cluster Information:
> >Snitch: org.apache.cassandra.locator.SimpleSnitch
> >Partitioner: org.apache.cassandra.dht.RandomPartitioner
> >Schema versions:
> > UNREACHABLE: [node2]
> > 05f1ee3b-e063-11e0-97d5-63c2fb3f0ca8: [node1, node3]
> > node3 seems act normal.
> >
> > I saw the JMXPORT has changed since 0.8, is that the reason node was
> > unreachable?
>
> No.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-17 Thread Yan Chunlu
so it is also not fixed in 0.8.5?  I am using the binary version of 0.8.5.
does applying the fix require compiling from source?

On Sun, Sep 18, 2011 at 3:28 AM, Peter Schuller  wrote:

> >> I am running local tests about upgrade cassandra.  upgrade from 0.7.4 to
> >> 0.8.5
>
> [snip]
>
> > Yes.  It will take a few exchanges before the new node knows to use
> > the older protocol with the 0.7 nodes.
>
> Not going 0.7.4->0.8.4, due to
>
>   https://issues.apache.org/jira/browse/CASSANDRA-3166
>
> OP: Applying that fix should fix your problem.
>
> --
> / Peter Schuller (@scode on twitter)
>


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-18 Thread Yan Chunlu
thanks!   is the load info also a bug?  node1 is supposed to have 80MB.

bash-3.2$ bin/nodetool -h localhost ring
Address DC  RackStatus State   LoadOwns
   Token

   93798607613553124915572813490354413064
node2   datacenter1 rack1   Up Normal  86.03 MB46.81%
 3303745385038694806791595159000401786
node3   datacenter1 rack1   Up Normal  67.68 MB26.65%
 48642301133762927375044585593194981764
node1   datacenter1 rack1   Up Normal  114.81 KB   26.54%
 93798607613553124915572813490354413064



On Sun, Sep 18, 2011 at 2:36 PM, Peter Schuller  wrote:

> > It's fixed on 0.8.6. For 0.8.5 you would have to build it from source
> > with the patch applied, yes.
> >
> > (Actually, in my opinion this bugfix is a good reason to release 0.8.6.)
>
> Turns out I had managed to miss the fact that a 0.8.6 release is being
> voted on so I'd expect it to happen soonish. You might wait for that,
> or apply the fix and build from source.
>
> --
> / Peter Schuller (@scode on twitter)
>


cassandra crashed while repairing, leave node size X3

2011-09-18 Thread Yan Chunlu
while doing repair on node3, the "Load" kept increasing; suddenly cassandra
encountered an OOM, and the "Load" stopped at 140GB.  after cassandra came
back, I tried nodetool cleanup but it seems not to be working

does node repair generate many temp sstables?   how do I get rid of them?
 thanks!

Address Status State   LoadOwnsToken


 113427455640312821154458202477256070484
node1  Up Normal  43 GB   33.33%  0

node2 Up Normal  59.52 GB33.33%
 56713727820156410577229101238628035242
node3  Down   Normal  142.57 GB   33.33%
 113427455640312821154458202477256070484


Re: cassandra crashed while repairing, leave node size X3

2011-09-18 Thread Yan Chunlu
so does major compaction actually "clean it" or just "merge it"?  I am
afraid it will leave me with a single large file

On Mon, Sep 19, 2011 at 10:26 AM, Anand Somani  wrote:

> In my tests I have seen repair sometimes take a lot of space (2-3 times);
> cleanup did not clean it, and the only way I could clean it was by running
> a major compaction.
>
>
> On Sun, Sep 18, 2011 at 6:51 PM, Yan Chunlu  wrote:
>
>> while doing repair on node3, the "Load" keep increasing, suddenly
>> cassandra has encountered OOM, and the "Load" stopped at 140GB,  after
>> cassandra came back, I tried node cleanup but it seems not working
>>
>> does node repair generate many temp sstables?   how to get rid of them?
>>  thanks!
>>
>> Address Status State   LoadOwnsToken
>>
>>
>>  113427455640312821154458202477256070484
>> node1  Up Normal  43 GB   33.33%  0
>>
>> node2 Up Normal  59.52 GB33.33%
>>  56713727820156410577229101238628035242
>> node3  Down   Normal  142.57 GB   33.33%
>>  113427455640312821154458202477256070484
>>
>
>


Re: cassandra crashed while repairing, leave node size X3

2011-09-19 Thread Yan Chunlu
I am using 0.7.4 too, and will wait for the stable 0.8.6 release because
of CASSANDRA-3166.

are you already using 0.8.6 in production?

2011/9/19 Jonas Borgström 

> On 09/19/2011 04:26 AM, Anand Somani wrote:
> > In my tests I have seen repair sometimes take a lot of space (2-3
> > times), cleanup did not clean it, the only way I could clean that was
> > using major compaction.
>
> Do you remember with what version you saw these problems?
>
> I've had the same problems with 0.7.4 but so far my repair tests with
> 0.8.6 seems to behave a lot better.
>
> / Jonas
>


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-19 Thread Yan Chunlu
any help on this? thanks!

On Sun, Sep 18, 2011 at 5:04 PM, Yan Chunlu  wrote:

> thanks!   is the load info also a bug?  node1 supposed to have 80MB.
>
> bash-3.2$ bin/nodetool -h localhost ring
> Address DC  RackStatus State   LoadOwns
>Token
>
>93798607613553124915572813490354413064
> node2   datacenter1 rack1   Up Normal  86.03 MB46.81%
>  3303745385038694806791595159000401786
> node3   datacenter1 rack1   Up Normal  67.68 MB26.65%
>  48642301133762927375044585593194981764
> node1   datacenter1 rack1   Up Normal  114.81 KB   26.54%
>  93798607613553124915572813490354413064
>
>
>
> On Sun, Sep 18, 2011 at 2:36 PM, Peter Schuller <
> peter.schul...@infidyne.com> wrote:
>
>> > It's fixed on 0.8.6. For 0.8.5 you would have to build it from source
>> > with the patch applied, yes.
>> >
>> > (Actually, in my opinion this bugfix is a good reason to release 0.8.6.)
>>
>> Turns out I had managed to miss the fact that a 0.8.6 release is being
>> voted on so I'd expect it to happen soonish. You might wait for that,
>> or apply the fix and build from source.
>>
>> --
>> / Peter Schuller (@scode on twitter)
>>
>
>


Re: cassandra crashed while repairing, leave node size X3

2011-09-19 Thread Yan Chunlu
got it, thanks!

On Tue, Sep 20, 2011 at 12:27 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > In my tests I have seen repair sometimes take a lot of space (2-3 times),
> > cleanup did not clean it, the only way I could clean that was using major
> > compaction.
>
> https://issues.apache.org/jira/browse/CASSANDRA-2816 (follow links to
> other jiras)
> https://issues.apache.org/jira/browse/CASSANDRA-2699
>
> And yes, to the one who asked: 'cleanup' only removes data that is not
> supposed to be on the node; repair transfers data that *should* be on
> the node, so only a compaction will cut down the size after a
> repair-induced spike of load (data size).
>
> --
> / Peter Schuller (@scode on twitter)
>


Re: "Ignorning message." showing in the log while upgrade to 0.8

2011-09-20 Thread Yan Chunlu
sorry, my bad. I messed up the yaml file.  I will try it again. thanks a lot!

On Tue, Sep 20, 2011 at 5:08 PM, aaron morton wrote:

> Is the data still physically on node 1 ? During start up does it log about
> opening the SSTables ?
>
> Another occasional problem is the schema being out of sync; node 1 may
> not have all the CFs and so will not have opened their SSTables.
>
> Check the logs and check if the physical data is there.
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/09/2011, at 3:54 PM, Yan Chunlu wrote:
>
> any help on this? thanks!
>
> On Sun, Sep 18, 2011 at 5:04 PM, Yan Chunlu  wrote:
>
>> thanks!   is the load info also a bug?  node1 is supposed to have 80MB.
>>
>> bash-3.2$ bin/nodetool -h localhost ring
>> Address DC  RackStatus State   Load
>>  OwnsToken
>>
>>  93798607613553124915572813490354413064
>> node2   datacenter1 rack1   Up Normal  86.03 MB46.81%
>>  3303745385038694806791595159000401786
>> node3   datacenter1 rack1   Up Normal  67.68 MB26.65%
>>  48642301133762927375044585593194981764
>> node1   datacenter1 rack1   Up Normal  114.81 KB   26.54%
>>  93798607613553124915572813490354413064
>>
>>
>>
>> On Sun, Sep 18, 2011 at 2:36 PM, Peter Schuller <
>> peter.schul...@infidyne.com> wrote:
>>
>>> > It's fixed on 0.8.6. For 0.8.5 you would have to build it from source
>>> > with the patch applied, yes.
>>> >
>>> > (Actually, in my opinion this bugfix is a good reason to release
>>> 0.8.6.)
>>>
>>> Turns out I had managed to miss the fact that a 0.8.6 release is being
>>> voted on so I'd expect it to happen soonish. You might wait for that,
>>> or apply the fix and build from source.
>>>
>>> --
>>> / Peter Schuller (@scode on twitter)
>>>
>>
>>
>
>


Re: [RELEASE] Apache Cassandra 0.8.6 released

2011-09-20 Thread Yan Chunlu
Great!  just waiting for it.

On Tue, Sep 20, 2011 at 6:12 PM, Sylvain Lebresne wrote:

> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 0.8.6.
>
> Cassandra is a highly scalable second-generation distributed database,
> bringing together Dynamo's fully distributed design and Bigtable's
> ColumnFamily-based data model. You can read more here:
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a maintenance/bug fix release[1]. In particular, it fixes a
> bug preventing rolling upgrades from the 0.7 series. Please pay attention
> to
> the release notes[2] before upgrading and let us know[3] if you were to
> encounter any problem.
>
> Have fun!
>
>
> [1]: http://goo.gl/COUVm (CHANGES.txt)
> [2]: http://goo.gl/3CjzD (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>


progress of sstableloader keeps 0?

2011-09-22 Thread Yan Chunlu
I took a snapshot of one node in a 0.7.4 cluster (N=RF=3), then used
sstableloader to load the snapshot data into a separate one-node cluster
(N=RF=1).


after executing  "bin/sstableloader  /disk2/mykeyspace/"


it says "Starting client (and waiting 30 seconds for gossip) ..."
"Streaming relevant part of  cf1.db. to [10.23.2.4]"

then it shows the progress indicator and stops. nothing changes after
that.

progress: [/10.28.53.16 1/72 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]]]


I used nodetool to check node 10.23.2.4: nothing changed, no data was copied
to it, and the data dir kept its original size. is there anything
wrong? how can I tell what is going on there?

thanks!


Re: Moving to a new cluster

2011-09-22 Thread Yan Chunlu
hi Aaron:

could you explain more about the issue where repair makes space usage go
crazy?

I am planning to upgrade my cluster from 0.7.4 to 0.8.6, because repair
never works on 0.7.4 for me; more specifically, CASSANDRA-2280 and
CASSANDRA-2156.


from your description, I am really worried that 0.8.6 might make it worse...

thanks!

On Thu, Sep 22, 2011 at 7:25 AM, aaron morton wrote:

> How much data is on the nodes in cluster 1 and how much disk space on
> cluster 2 ? Be aware that Cassandra 0.8 has an issue where repair can go
> crazy and use a lot of space.
>
> If you are not regularly running repair I would also repair before the
> move.
>
> The repair after the copy is a good idea but should technically not be
> necessary. If you can practice the move watch the repair to see if much is
> transferred (check the logs). There is always a small transfer, but if you
> see data been transferred for several minutes I would investigate.
>
> When you start a repair it will repair with the other nodes it replicates
> data with, so you only need to run it on every RFth node. Start it on one,
> watch the logs to see who it talks to, and then start it on the first node it
> does not talk to. And so on.
>
> Add a snapshot before the clean (repair will also snapshot before it runs)
>
> Scrub is not needed unless you are migrating or you have file errors.
>
> If your cluster is online, consider running the clean every RFth node
> rather than all at once (e.g. 1,4, 7, 10 then 2,5,8,11). It will have less
> impact on clients.
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/09/2011, at 10:27 AM, Philippe wrote:
>
> Hello,
> We're currently running on a 3-node RF=3 cluster. Now that we have a better
> grip on things, we want to replace it with a 12-node RF=3 cluster of
> "smaller" servers. So I wonder what the best way to move the data to the new
> cluster would be. I can afford to stop writing to the current cluster for
> whatever time is necessary. Has anyone written up something on this subject
> ?
>
> My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in
> cluster 2 are node2.1->2.12)
>
>- stop writing to current cluster & drain it
>- get a snapshot on each node
>- Since it's RF=3, each node should have all the data, so assuming I
>set the tokens correctly I would move the snapshot from node1.1 to node2.1,
>2.2, 2.3 and 2.4 then node1.2->node2.5,2.6,2.,2.8, etc. This is because the
>range for node1.1 is now spread across 2.1->2.4
>- Run repair & clean & scrub on each node (more or less in //)
>
> What do you think ?
> Thanks
>
>
>
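The every-RFth-node scheduling Aaron describes can be sketched as follows (a hedged illustration; node names are hypothetical and `nodes` is assumed to be in ring/token order):

```python
# Group nodes so that no two concurrently-cleaned nodes replicate the
# same range; with RF=3 on 12 nodes this yields (1,4,7,10), (2,5,8,11),
# (3,6,9,12), matching the e.g. in the message above.
def cleanup_batches(nodes, rf):
    """Split ring-ordered nodes into rf groups that can run cleanup in turn."""
    return [nodes[offset::rf] for offset in range(rf)]

nodes = ["node%d" % i for i in range(1, 13)]  # 12-node cluster
for batch in cleanup_batches(nodes, rf=3):
    print(batch)
```

Each batch can run `nodetool cleanup` concurrently; the batches themselves run one after another to limit impact on clients.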


Re: progress of sstableloader keeps 0?

2011-09-22 Thread Yan Chunlu
sorry, I did not look into it earlier. After checking, I found a
version-mismatch exception in the log:
ERROR [Thread-17] 2011-09-22 08:24:24,248 AbstractCassandraDaemon.java (line
139) Fatal exception in thread Thread[Thread-17,5,main]
java.lang.RuntimeException: Cannot recover SSTable
/disk2/cassandra/data/reddit/Comments-tmp-f-1 due to version mismatch.
(current version is g).
at
org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
at
org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1097)
at
org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
at
org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
at
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
at
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)


does that mean I need to run scrub before running the loader?  could I just
delete it and keep going?  thanks!

On Fri, Sep 23, 2011 at 2:16 AM, Jonathan Ellis  wrote:

> Did you check for errors in logs on both loader + target?
>
> On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu 
> wrote:
> > I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3).   use
> > sstableloader to load the snapshot data to another 1 node
> cluster(N=RF=1).
> >
> > after execute  "bin/sstableloader  /disk2/mykeyspace/"
> >
> > it says"Starting client (and waiting 30 seconds for gossip) ..."
> > "Streaming revelant part of  cf1.db. to [10.23.2.4]"
> > then showing the progress indicator and stopped. nothing changed after
> > then.
> > progress: [/10.28.53.16 1/72 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]]]
> >
> > I use nodetool to check the node 10.23.2.4, nothing changed. no data
> copied
> > to it. and the data dir also keep its original size. is there anything
> > wrong? how can I tell what was going on there?
> > thanks!
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: progress of sstableloader keeps 0?

2011-09-24 Thread Yan Chunlu
yes, I did.  I thought 0.8 was backward compatible. are there other ways to
load 0.7's data into 0.8?  will copying the data dir directly work?   I would
like to put the load of three nodes onto one node.

 thanks!

On Sun, Sep 25, 2011 at 11:52 AM, aaron morton wrote:

> Looks like it is complaining that you are trying to load a 0.7 SSTable in
> 0.8.
>
>
> Cheers
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/09/2011, at 5:23 PM, Yan Chunlu wrote:
>
> sorry I did not look into it  after check it I found version mismatch
> exception is in the log:
> ERROR [Thread-17] 2011-09-22 08:24:24,248 AbstractCassandraDaemon.java
> (line 139) Fatal exception in thread Thread[Thread-17,5,main]
> java.lang.RuntimeException: Cannot recover SSTable
> /disk2/cassandra/data/reddit/Comments-tmp-f-1 due to version mismatch.
> (current version is g).
> at
> org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
> at
> org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1097)
> at
> org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
> at
> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>
>
> does that mean I need to run scrub before running the loader?  could I just
> delete it and keep going?  thanks!
>
> On Fri, Sep 23, 2011 at 2:16 AM, Jonathan Ellis  wrote:
>
>> Did you check for errors in logs on both loader + target?
>>
>> On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu 
>> wrote:
>> > I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3).   use
>> > sstableloader to load the snapshot data to another 1 node
>> cluster(N=RF=1).
>> >
>> > after execute  "bin/sstableloader  /disk2/mykeyspace/"
>> >
>> > it says"Starting client (and waiting 30 seconds for gossip) ..."
>> > "Streaming revelant part of  cf1.db. to [10.23.2.4]"
>> > then showing the progress indicator and stopped. nothing changed after
>> > then.
>> > progress: [/10.28.53.16 1/72 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]]]
>> >
>> > I use nodetool to check the node 10.23.2.4, nothing changed. no data
>> copied
>> > to it. and the data dir also keep its original size. is there anything
>> > wrong? how can I tell what was going on there?
>> > thanks!
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
>


Re: Moving to a new cluster

2011-09-24 Thread Yan Chunlu
thanks!  is that the same problem described in this thread?


http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/nodetool-repair-caused-high-disk-space-usage-td6695542.html

On Sun, Sep 25, 2011 at 11:33 AM, aaron morton wrote:

> It can result in a lot of data on the node you run repair on. Where a lot
> means perhaps 2 or more  times more data.
>
> My unscientific approach is to repair one CF at a time so you can watch the
> disk usage and repair the smaller CF's first. After the repair compact if
> you need to.
>
> I think  the amount of extra data will be related to how out of sync things
> are, so once you get repair working smoothly it will be less of problem.
>
> Cheers
>
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/09/2011, at 3:04 AM, Yan Chunlu wrote:
>
>
> hi Aaron:
>
> could you explain more about the issue about repair make space usage going
> crazy?
>
> I am planning to upgrade my cluster from 0.7.4 to 0.8.6, which is because
> the repair never works on 0.7.4 for me.
> more specifically, 
> CASSANDRA-2280<https://issues.apache.org/jira/browse/CASSANDRA-2280>
>  and CASSANDRA-2156 <https://issues.apache.org/jira/browse/CASSANDRA-2156>
> .
>
>
> from your description, I really worried about 0.8.6 might make it worse...
>
> thanks!
>
> On Thu, Sep 22, 2011 at 7:25 AM, aaron morton wrote:
>
>> How much data is on the nodes in cluster 1 and how much disk space on
>> cluster 2 ? Be aware that Cassandra 0.8 has an issue where repair can go
>> crazy and use a lot of space.
>>
>> If you are not regularly running repair I would also repair before the
>> move.
>>
>> The repair after the copy is a good idea but should technically not be
>> necessary. If you can practice the move watch the repair to see if much is
>> transferred (check the logs). There is always a small transfer, but if you
>> see data been transferred for several minutes I would investigate.
>>
>> When you start a repair it will repair with the other nodes it replicates
>> data with, so you only need to run it on every RFth node. Start it on one,
>> watch the logs to see who it talks to, and then start it on the first node it
>> does not talk to. And so on.
>>
>> Add a snapshot before the clean (repair will also snapshot before it runs)
>>
>> Scrub is not needed unless you are migrating or you have file errors.
>>
>> If your cluster is online, consider running the clean every RFth node
>> rather than all at once (e.g. 1,4, 7, 10 then 2,5,8,11). It will have less
>> impact on clients.
>>
>> Cheers
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/09/2011, at 10:27 AM, Philippe wrote:
>>
>> Hello,
>> We're currently running on a 3-node RF=3 cluster. Now that we have a
>> better grip on things, we want to replace it with a 12-node RF=3 cluster of
>> "smaller" servers. So I wonder what the best way to move the data to the new
>> cluster would be. I can afford to stop writing to the current cluster for
>> whatever time is necessary. Has anyone written up something on this subject
>> ?
>>
>> My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in
>> cluster 2 are node2.1->2.12)
>>
>>- stop writing to current cluster & drain it
>>- get a snapshot on each node
>>- Since it's RF=3, each node should have all the data, so assuming I
>>set the tokens correctly I would move the snapshot from node1.1 to 
>> node2.1,
>>2.2, 2.3 and 2.4 then node1.2->node2.5,2.6,2.,2.8, etc. This is because 
>> the
>>range for node1.1 is now spread across 2.1->2.4
>>- Run repair & clean & scrub on each node (more or less in //)
>>
>> What do you think ?
>> Thanks
>>
>>
>>
>
>


Re: progress of sstableloader keeps 0?

2011-09-25 Thread Yan Chunlu
thanks!  another problem: what if the cluster sizes are not the same?

in my case I am moving data from a 3-node cluster to 1 node, and the keyspace
files on the 3 nodes might use the same names...

I am using the new cluster only for emergency use, so only 1 node is
attached.

On Sun, Sep 25, 2011 at 5:20 PM, aaron morton wrote:

> That can read data from previous versions, i.e. if you upgrade to 0.8 it
> can read the existing files from 0.7.
>
> But what you are doing with the sstable loader is (AFAIK) only copying the
> Data portion of the CF. Once the table is loaded the node will then build
> the Index and the Filter; this is the createBuilder() call in the stack. It's
> throwing because version 0.8 does not want to make version 0.8 Index and
> Filter files for a version 0.7 Data file.
>
> We get the same problem when upgrading from 0.7 to 0.8, where Repair will
> not work because it is streaming a 0.7 version data file and the recipient
> then tries to build the Index and Filter files.
>
> So to read 0.7 data from 0.8 just copy over *all* the files for the
> keyspace (data, filter and index). Then scrub the nodes so that repair can
> work.
>
> Hope that helps.
>
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/09/2011, at 6:07 PM, Yan Chunlu wrote:
>
> yes, I did.  thought 0.8 is downward compatible. is there other ways to
> load 0.7's data into 0.8?  will copy the data dir directly will work?   I
> would like to put load of three nodes into one node.
>
>  thanks!
>
> On Sun, Sep 25, 2011 at 11:52 AM, aaron morton wrote:
>
>> Looks like it is complaining that you are trying to load a 0.7 SSTable in
>> 0.8.
>>
>>
>> Cheers
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 23/09/2011, at 5:23 PM, Yan Chunlu wrote:
>>
>> sorry I did not look into it  after check it I found version mismatch
>> exception is in the log:
>> ERROR [Thread-17] 2011-09-22 08:24:24,248 AbstractCassandraDaemon.java
>> (line 139) Fatal exception in thread Thread[Thread-17,5,main]
>> java.lang.RuntimeException: Cannot recover SSTable
>> /disk2/cassandra/data/reddit/Comments-tmp-f-1 due to version mismatch.
>> (current version is g).
>> at
>> org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
>> at
>> org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1097)
>> at
>> org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
>> at
>> org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
>> at
>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>>
>>
>> does that mean I need to run scrub before running the loader?  could I
>> just delete it and keep going?  thanks!
>>
>> On Fri, Sep 23, 2011 at 2:16 AM, Jonathan Ellis wrote:
>>
>>> Did you check for errors in logs on both loader + target?
>>>
>>> On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu 
>>> wrote:
>>> > I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3).   use
>>> > sstableloader to load the snapshot data to another 1 node
>>> cluster(N=RF=1).
>>> >
>>> > after execute  "bin/sstableloader  /disk2/mykeyspace/"
>>> >
>>> > it says"Starting client (and waiting 30 seconds for gossip) ..."
>>> > "Streaming revelant part of  cf1.db. to [10.23.2.4]"
>>> > then showing the progress indicator and stopped. nothing changed after
>>> > then.
>>> > progress: [/10.28.53.16 1/72 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]]]
>>> >
>>> > I use nodetool to check the node 10.23.2.4, nothing changed. no data
>>> copied
>>> > to it. and the data dir also keep its original size. is there anything
>>> > wrong? how can I tell what was going on there?
>>> > thanks!
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
>>
>>
>
>
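A rough sketch of the copy-everything step Aaron suggests, assuming 0.7-style component names such as `Comments-f-1-Data.db` (all paths here are hypothetical):

```python
import os
import shutil

# Copy *all* SSTable component files (-Data.db, -Index.db, -Filter.db, ...)
# for a keyspace, not just the -Data.db files, so that 0.8 can open the
# 0.7 tables directly instead of trying to rebuild Index/Filter from a
# streamed Data file.
def copy_keyspace(src_dir, dst_dir):
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".db"):
            shutil.copy2(os.path.join(src_dir, name), dst_dir)
            copied.append(name)
    return copied

# copy_keyspace("/disk2/mykeyspace-snapshot",            # hypothetical paths
#               "/var/lib/cassandra/data/mykeyspace")
```

After the copy, running `nodetool scrub` on the target node rewrites the tables in the current format, as Aaron notes, so that repair can stream them.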


Re: progress of sstableloader keeps 0?

2011-09-25 Thread Yan Chunlu
thank you very much, Aaron. your explanation is clear and very
helpful!

On Mon, Sep 26, 2011 at 4:58 AM, aaron morton wrote:

> If you had RF3 in a 3 node cluster and everything was repaired you *should*
> be ok to only take the data from 1 node, if the cluster is not receiving
> writes.
>
> If you want to merge the data from 3 nodes, rename the files; AFAIK they do
> not have to have contiguous file numbers.
>
> Cheers
>
>
>  -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/09/2011, at 10:45 PM, Yan Chunlu wrote:
>
> thanks!  another problem is what if cluster number are not the same?
>
> in my case I am move 3 nodes cluster data to 1 node,  the keyspace files in
> 3 nodes might use the same name...
>
> I am using the new cluster only for emergency usage, so only 1 node is
> attached.
>
> On Sun, Sep 25, 2011 at 5:20 PM, aaron morton wrote:
>
>> That can read data from previous versions, i.e. if you upgrade to 0.8 it
>> can read the existing files from 0.7.
>>
>> But what you are doing with the sstable loader is (AFAIK) only copying the
>> Data portion of the CF. Once the table is loaded the node will then build
>> the Index and the Filter; this is the createBuilder() call in the stack. It's
>> throwing because version 0.8 does not want to make version 0.8 Index and
>> Filter files for a version 0.7 Data file.
>>
>> We get the same problem when upgrading from 0.7 to 0.8, where Repair will
>> not work because it is streaming a 0.7 version data file and the recipient
>> then tries to build the Index and Filter files.
>>
>> So to read 0.7 data from 0.8 just copy over *all* the files for the
>> keyspace (data, filter and index). Then scrub the nodes so that repair can
>> work.
>>
>> Hope that helps.
>>
>>
>>  -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 25/09/2011, at 6:07 PM, Yan Chunlu wrote:
>>
>> yes, I did.  thought 0.8 is downward compatible. is there other ways to
>> load 0.7's data into 0.8?  will copy the data dir directly will work?   I
>> would like to put load of three nodes into one node.
>>
>>  thanks!
>>
>> On Sun, Sep 25, 2011 at 11:52 AM, aaron morton 
>> wrote:
>>
>>> Looks like it is complaining that you are trying to load a 0.7 SSTable in
>>> 0.8.
>>>
>>>
>>> Cheers
>>>
>>>  -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 23/09/2011, at 5:23 PM, Yan Chunlu wrote:
>>>
>>> sorry I did not look into it  after check it I found version mismatch
>>> exception is in the log:
>>> ERROR [Thread-17] 2011-09-22 08:24:24,248 AbstractCassandraDaemon.java
>>> (line 139) Fatal exception in thread Thread[Thread-17,5,main]
>>> java.lang.RuntimeException: Cannot recover SSTable
>>> /disk2/cassandra/data/reddit/Comments-tmp-f-1 due to version mismatch.
>>> (current version is g).
>>> at
>>> org.apache.cassandra.io.sstable.SSTableWriter.createBuilder(SSTableWriter.java:240)
>>> at
>>> org.apache.cassandra.db.compaction.CompactionManager.submitSSTableBuild(CompactionManager.java:1097)
>>> at
>>> org.apache.cassandra.streaming.StreamInSession.finished(StreamInSession.java:110)
>>> at
>>> org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:104)
>>> at
>>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
>>> at
>>> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
>>> at
>>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>>>
>>>
>>> does that mean I need to run scrub before running the loader?  could I
>>> just delete it and keep going?  thanks!
>>>
>>> On Fri, Sep 23, 2011 at 2:16 AM, Jonathan Ellis wrote:
>>>
>>>> Did you check for errors in logs on both loader + target?
>>>>
>>>> On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu 
>>>> wrote:
>>>> > I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3).   use
>>>> > sstableloader to load the snapshot data to another 1 node
>>>> cluster(N=RF=1).
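Aaron's suggestion in this thread to merge SSTables from several nodes by renaming them can be sketched like this (a hedged illustration assuming 0.7-style names such as `Comments-f-1-Data.db`; it only plans the renames so they can be reviewed before applying):

```python
import os
import re

# <CF>-<version letter>-<generation>-<Component>.db, e.g. Comments-f-1-Data.db
NAME_RE = re.compile(r"^(?P<cf>.+)-(?P<ver>[a-z])-(?P<gen>\d+)-(?P<comp>\w+)\.db$")

def plan_renames(source_dirs):
    """Plan unique generation numbers for SSTables coming from several nodes.

    All components of one table (Data/Index/Filter/Statistics) must keep the
    same new generation, and no two source tables may share a generation.
    Returns (source_path, new_file_name) pairs; applying them is left to the
    operator (e.g. os.rename into the target data directory).
    """
    next_gen = 1
    plan = []
    for src in source_dirs:
        gen_map = {}  # old generation -> new generation, scoped per source dir
        for name in sorted(os.listdir(src)):
            m = NAME_RE.match(name)
            if not m:
                continue
            old = m.group("gen")
            if old not in gen_map:
                gen_map[old] = next_gen
                next_gen += 1
            new_name = "%s-%s-%d-%s.db" % (m.group("cf"), m.group("ver"),
                                           gen_map[old], m.group("comp"))
            plan.append((os.path.join(src, name), new_name))
    return plan
```

This is only a sketch of the renaming idea, not a tested migration tool; as with the copy approach, the node should be scrubbed afterwards.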

how does compaction_throughput_kb_per_sec affect disk io?

2011-09-26 Thread Yan Chunlu
I am using the default 16MB/s while running repair, but the disk io is still
quite high:

Device: rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb     136.00     0.00  506.00   26.00 63430.00  5880.00   260.56   101.73  224.38    6.60 4462.62   1.88 100.00

Device: rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb      58.50 21978.00  131.00  122.50 16226.00 52596.00   542.97   122.98  870.28   10.02 1790.24   3.94 100.00




the rkB/s and wkB/s reach almost 60MB/s; did I misunderstand the meaning of
the compaction throttle?


cassandra version is 0.8.6
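A quick back-of-the-envelope check of the first iostat sample shows the totals are well above the compaction cap, which fits the fact that flushes, reads, and (in 0.8) repair streaming are not subject to compaction throttling:

```python
# rkB/s and wkB/s from the first iostat sample above.
rkb_per_s = 63430.0
wkb_per_s = 5880.0

total_mb_per_s = (rkb_per_s + wkb_per_s) / 1024.0
compaction_cap_mb_per_s = 16.0  # the configured compaction throughput limit

print("total io:  %.1f MB/s" % total_mb_per_s)   # ~67.7 MB/s
print("above cap: %.1f MB/s" % (total_mb_per_s - compaction_cap_mb_per_s))
```

So most of the observed bandwidth must come from io that the compaction throttle does not cover.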


Re: how does compaction_throughput_kb_per_sec affect disk io?

2011-09-26 Thread Yan Chunlu
okay, thanks!

On Mon, Sep 26, 2011 at 10:38 PM, Jonathan Ellis  wrote:

> compaction throughput doesn't affect flushing or reads
>
> On Mon, Sep 26, 2011 at 7:40 AM, Yan Chunlu  wrote:
> > I am using the default 16MB when running repair. but the disk io is still
> > quite high:
> > Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
> avgrq-sz
> > avgqu-sz   await r_await w_await  svctm  %util
> > sdb 136.00 0.00  506.00   26.00 63430.00  5880.00
> 260.56
> > 101.73  224.386.60 4462.62   1.88 100.00
> > Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
> avgrq-sz
> > avgqu-sz   await r_await w_await  svctm  %util
> > sdb  58.50 21978.00  131.00  122.50 16226.00 52596.00
> 542.97
> > 122.98  870.28   10.02 1790.24   3.94 100.00
> >
> >
> >
> > the  rkB/s  and wKB/s are almost 60MB, did I misunderstand the meaning of
> > compaction throttle?
> >
> > cassandra version is 0.8.6
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


anyway to disable row/key cache on single node while starting it?

2011-09-27 Thread Yan Chunlu
again I was running repair on a single CF and it crashed because of OOM,
leaving 286GB of data (it should be 40GB).   the problem here is that it takes
very, very long to bring the node back up, seemingly because it is loading the
row cache.  the last time I encountered this, people suggested deleting
everything in the saved_caches directory and updating the schema to set the
row/key cache sizes to 0, but that schema change is cluster-wide and affects
the other nodes.

so is there any way to stop the node from loading the cache while starting?

thanks!


anyway to throttle nodetool repair?

2011-09-27 Thread Yan Chunlu
I saw the ticket about compaction throttling; just wondering, is it necessary
to add an option, or is there already a way to throttle repair?

every time I run nodetool repair, it uses all the disk io and the server load
goes up quickly; is there any way to make it smoother?


Re: anyway to throttle nodetool repair?

2011-10-10 Thread Yan Chunlu
so how about disk io?  is there any way to use ionice to control it?

I have tried to adjust the priority with "ionice -c3 -p [cassandra pid]",
but it seems not to be working...

On Wed, Sep 28, 2011 at 12:02 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:

> > I saw the ticket about compaction throttling, just wonder is that
> necessary
> > to add an option or is there anyway to do repair throttling?
> > every time I run nodetool repair, it uses all disk io and the server load
> > goes up quickly, just wonder is there anyway to make it smoother.
>
> The validating compaction that is part of repair is subject to
> compaction throttling.
>
> The streaming of sstables afterwards is not however. In 1.0 there is
> thottling of streaming:
> https://issues.apache.org/jira/browse/CASSANDRA-3080
>
> --
> / Peter Schuller (@scode on twitter)
>
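One possible reason the `ionice -c3 -p` attempt described above has no visible effect is that on Linux the io scheduling class is tracked per thread, so setting it on the JVM's main pid does not cover its other, already-running threads. A hedged sketch (assumes Linux, root privileges, and the CFQ io scheduler; `cassandra_pid` is hypothetical):

```python
import os
import subprocess

def thread_ids(pid):
    """List every task (thread) id of a process via /proc/<pid>/task."""
    return sorted(os.listdir("/proc/%d/task" % pid))

def ionice_all_threads(pid, io_class=3):
    # Apply the idle class (3) to each thread individually; a single
    # `ionice -p <pid>` only changes the one task it is pointed at.
    for tid in thread_ids(pid):
        subprocess.call(["ionice", "-c", str(io_class), "-p", tid])

# ionice_all_threads(cassandra_pid)
```

Even then, idle-class io only helps when other processes compete for the disk; it does not slow Cassandra down when it is the only io consumer.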

