Re: Inconsistent data after adding a new DC and rebuilding

2017-04-11 Thread George Sigletos
Thanks for your reply. Yes, it would be nice to know the root cause.

Now running a full repair. Hopefully this will solve the problem
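For reference, a sketch of such a repair (keyspace name is hypothetical; on
2.1 a plain repair is already full, while on 2.2+ incremental is the default
so a full repair needs the -full flag):

nodetool repair -pr mykeyspace            # run on every node in both DCs
# on 2.2+: nodetool repair -full -pr mykeyspace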

On Tue, Apr 11, 2017 at 9:43 AM, Roland Otta <roland.o...@willhaben.at>
wrote:

> well .. thats pretty much the same we saw in our environment (cassandra
> 3.7).
>
> in our case a full repair fixed the issues.
> but no doubt .. it would be more satisfying to know the root cause for
> that issue
>
> br,
> roland
>
>
> On Mon, 2017-04-10 at 19:12 +0200, George Sigletos wrote:
>
> In 3 out of 5 nodes of our new DC the rebuild process finished
> successfully. On the other two nodes it did not (the process was hanging,
> doing nothing), so we killed it, removed all data and started again. This
> time it finished successfully.
>
> Here is the netstats output of one of the newly added nodes:
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 269142
> Mismatch (Blocking): 169866
> Mismatch (Background): 4
> Pool Name    Active   Pending  Completed   Dropped
> Commands        n/a         2   10031126      1935
> Responses       n/a        97   22565129       n/a
>
>
> On Mon, Apr 10, 2017 at 5:28 PM, Roland Otta <roland.o...@willhaben.at>
> wrote:
>
> Hi,
>
> we have seen similar issues here.
>
> have you verified that your rebuilds finished successfully? we have
> seen rebuilds that stopped streaming and working but never finished.
> what does nodetool netstats show for your newly built nodes?
>
> br,
> roland
>
>
> On Mon, 2017-04-10 at 17:15 +0200, George Sigletos wrote:
>
> Hello,
>
> We recently added a new datacenter to our cluster and ran "nodetool
> rebuild -- " on all 5 new nodes, one by one.
>
> After this process finished we noticed there is data missing from the new
> datacenter, although it exists on the current one.
>
> How would that be possible? Should I maybe have run repair on all nodes of
> the current DC before adding the new one?
>
> Running Cassandra 2.1.15
>
> Kind regards,
> George
>
>
>
>
>
>


Re: Inconsistent data after adding a new DC and rebuilding

2017-04-10 Thread George Sigletos
In 3 out of 5 nodes of our new DC the rebuild process finished
successfully. On the other two nodes it did not (the process was hanging,
doing nothing), so we killed it, removed all data and started again. This
time it finished successfully.

Here is the netstats output of one of the newly added nodes:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 269142
Mismatch (Blocking): 169866
Mismatch (Background): 4
Pool Name    Active   Pending  Completed   Dropped
Commands        n/a         2   10031126      1935
Responses       n/a        97   22565129       n/a


On Mon, Apr 10, 2017 at 5:28 PM, Roland Otta <roland.o...@willhaben.at>
wrote:

> Hi,
>
> we have seen similar issues here.
>
> have you verified that your rebuilds finished successfully? we have
> seen rebuilds that stopped streaming and working but never finished.
> what does nodetool netstats show for your newly built nodes?
>
> br,
> roland
>
>
> On Mon, 2017-04-10 at 17:15 +0200, George Sigletos wrote:
>
> Hello,
>
> We recently added a new datacenter to our cluster and ran "nodetool
> rebuild -- " on all 5 new nodes, one by one.
>
> After this process finished we noticed there is data missing from the new
> datacenter, although it exists on the current one.
>
> How would that be possible? Should I maybe have run repair on all nodes of
> the current DC before adding the new one?
>
> Running Cassandra 2.1.15
>
> Kind regards,
> George
>
>
>
>


Inconsistent data after adding a new DC and rebuilding

2017-04-10 Thread George Sigletos
Hello,

We recently added a new datacenter to our cluster and ran "nodetool rebuild
-- " on all 5 new nodes, one by one.

After this process finished we noticed there is data missing from the new
datacenter, although it exists on the current one.

How would that be possible? Should I maybe have run repair on all nodes of
the current DC before adding the new one?

Running Cassandra 2.1.15

Kind regards,
George


Re: Change the IP of a live node

2017-03-16 Thread George Sigletos
This was a network problem on our side after all, which we fixed. Cassandra
connections between 192.168.xxx <-> 10.179.xxx on port 7000 were being
blocked.
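For anyone hitting the same symptom, a quick way to test inter-node
connectivity on the storage port (7000 by default, 7001 with SSL) from one
node to another is, for example:

nc -vz 10.179.xx.xx 7000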

On Wed, Mar 15, 2017 at 2:47 PM, Ryan Svihla <r...@foundev.pro> wrote:

> I've actually changed the IP address quite a bit (gossip complains on
> startup and happily picks up the new address). I think this may be easier
> than it looks: can those IP addresses route to one another?
>
> As in can the first node with 192.168.xx.xx hit the node with
> 10.179.xx.xx on that interface?
>
> On Wed, Mar 15, 2017 at 9:37 AM, kurt greaves <k...@instaclustr.com>
> wrote:
>
>> Cassandra uses the IP address for more or less everything. It's possible
>> to change it through some hackery, but it's probably not a great idea. The
>> node's system tables will still reference the old IP, which is likely your
>> problem here.
>>
>> On 14 March 2017 at 18:58, George Sigletos <sigle...@textkernel.nl>
>> wrote:
>>
>>> To give a complete picture, my node has actually two network interfaces:
>>> eth0 for 192.168.xx.xx and eth1 for 10.179.xx.xx
>>>
>>> On Tue, Mar 14, 2017 at 7:46 PM, George Sigletos <sigle...@textkernel.nl
>>> > wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to change the IP of a live node (I am not replacing a dead
>>>> one).
>>>>
>>>> So I stop the service on my node (not a seed node), I change the IP
>>>> from 192.168.xx.xx to 10.179.xx.xx, and modify "listen_address" and
>>>> "rpc_address" in the cassandra.yaml, while I also set auto_bootstrap:
>>>> false. Then I restart but it fails to see the rest of the cluster:
>>>>
>>>> Datacenter: DC1
>>>> ===============
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address        Load     Tokens  Owns  Host ID                               Rack
>>>> DN  192.168.xx.xx  ?        256     ?     241f3002-8f89-4433-a521-4fa4b070b704  r1
>>>> UN  10.179.xx.xx   3.45 TB  256     ?     3b07df3b-683b-4e2d-b307-3c48190c8f1c  RAC1
>>>> DN  192.168.xx.xx  ?        256     ?     19636f1e-9417-4354-8364-6617b8d3d20b  r1
>>>> DN  192.168.xx.xx  ?        256     ?     9c65c71c-f5dd-4267-af9e-a20881cf3d48  r1
>>>> DN  192.168.xx.xx  ?        256     ?     ee75219f-0f2c-4be0-bd6d-038315212728  r1
>>>>
>>>> Am I doing anything wrong? Thanks in advance
>>>>
>>>> Kind regards,
>>>> George
>>>>
>>>
>>>
>>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


Re: Change the IP of a live node

2017-03-14 Thread George Sigletos
To give a complete picture, my node has actually two network interfaces:
eth0 for 192.168.xx.xx and eth1 for 10.179.xx.xx

On Tue, Mar 14, 2017 at 7:46 PM, George Sigletos <sigle...@textkernel.nl>
wrote:

> Hello,
>
> I am trying to change the IP of a live node (I am not replacing a dead
> one).
>
> So I stop the service on my node (not a seed node), I change the IP from
> 192.168.xx.xx to 10.179.xx.xx, and modify "listen_address" and
> "rpc_address" in the cassandra.yaml, while I also set auto_bootstrap:
> false. Then I restart but it fails to see the rest of the cluster:
>
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load     Tokens  Owns  Host ID                               Rack
> DN  192.168.xx.xx  ?        256     ?     241f3002-8f89-4433-a521-4fa4b070b704  r1
> UN  10.179.xx.xx   3.45 TB  256     ?     3b07df3b-683b-4e2d-b307-3c48190c8f1c  RAC1
> DN  192.168.xx.xx  ?        256     ?     19636f1e-9417-4354-8364-6617b8d3d20b  r1
> DN  192.168.xx.xx  ?        256     ?     9c65c71c-f5dd-4267-af9e-a20881cf3d48  r1
> DN  192.168.xx.xx  ?        256     ?     ee75219f-0f2c-4be0-bd6d-038315212728  r1
>
> Am I doing anything wrong? Thanks in advance
>
> Kind regards,
> George
>


Change the IP of a live node

2017-03-14 Thread George Sigletos
Hello,

I am trying to change the IP of a live node (I am not replacing a dead
one).

So I stop the service on my node (not a seed node), I change the IP from
192.168.xx.xx to 10.179.xx.xx, and modify "listen_address" and
"rpc_address" in the cassandra.yaml, while I also set auto_bootstrap:
false. Then I restart but it fails to see the rest of the cluster:

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load     Tokens  Owns  Host ID                               Rack
DN  192.168.xx.xx  ?        256     ?     241f3002-8f89-4433-a521-4fa4b070b704  r1
UN  10.179.xx.xx   3.45 TB  256     ?     3b07df3b-683b-4e2d-b307-3c48190c8f1c  RAC1
DN  192.168.xx.xx  ?        256     ?     19636f1e-9417-4354-8364-6617b8d3d20b  r1
DN  192.168.xx.xx  ?        256     ?     9c65c71c-f5dd-4267-af9e-a20881cf3d48  r1
DN  192.168.xx.xx  ?        256     ?     ee75219f-0f2c-4be0-bd6d-038315212728  r1
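For reference, the procedure described above amounts to something like the
following (service name and config path are assumptions, not from this
thread):

sudo service cassandra stop
# edit /etc/cassandra/cassandra.yaml:
#   listen_address: 10.179.xx.xx
#   rpc_address: 10.179.xx.xx
#   auto_bootstrap: false
sudo service cassandra start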

Am I doing anything wrong? Thanks in advance

Kind regards,
George


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Even when I set a lower request-timeout in order to trigger a timeout,
still no WARN or ERROR in the logs

On Wed, Sep 28, 2016 at 8:22 PM, George Sigletos <sigle...@textkernel.nl>
wrote:

> Hi Joaquin,
>
> Unfortunately neither WARN nor ERROR found in the system logs across the
> cluster when executing truncate. Sometimes it executes immediately, other
> times it takes 25 seconds, given that I have connected with
> --request-timeout=30 seconds.
>
> The nodes are a bit busy compacting. On a freshly restarted cluster,
> truncate seems to work without problems.
>
> Some warnings that I see around that time but not exactly when executing
> truncate are:
> WARN  [CompactionExecutor:2] 2016-09-28 20:03:29,646
> SSTableWriter.java:241 - Compacting large partition
> system/hints:6f2c3b31-4975-470b-8f91-e706be89a83a (133819308 bytes)
>
> Kind regards,
> George
>
> On Wed, Sep 28, 2016 at 7:54 PM, Joaquin Casares <
> joaq...@thelastpickle.com> wrote:
>
>> Hi George,
>>
>> Try grepping for WARN and ERROR on the system.logs across all nodes when
>> you run the command. Could you post any of the recent stacktraces that you
>> see?
>>
>> Cheers,
>>
>> Joaquin Casares
>> Consultant
>> Austin, TX
>>
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos <sigle...@textkernel.nl
>> > wrote:
>>
>>> Thanks a lot for your reply.
>>>
>>> I understand that truncate is an expensive operation. But throwing a
>>> timeout while truncating a table that is already empty?
>>>
>>> A workaround is to set a high --request-timeout when connecting. Even 20
>>> seconds is not always enough
>>>
>>> Kind regards,
>>> George
>>>
>>>
>>> On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>> Truncate does a few things (based on version)
>>>>   truncate takes snapshots
>>>>   truncate causes a flush
>>>>   in very old versions truncate causes a schema migration.
>>>>
>>>> In newer versions like cassandra 3.4 you have this knob.
>>>>
>>>> # How long the coordinator should wait for truncates to complete
>>>> # (This can be much longer, because unless auto_snapshot is disabled
>>>> # we need to flush first so we can snapshot before removing the data.)
>>>> truncate_request_timeout_in_ms: 60000
>>>>
>>>>
>>>> In older versions you cannot control when this call will time out; it
>>>> is fairly normal that it does!
>>>>
>>>>
>>>> On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos <
>>>> sigle...@textkernel.nl> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I keep executing a TRUNCATE command on an empty table and it throws
>>>>> OperationTimedOut randomly:
>>>>>
>>>>> cassandra@cqlsh> truncate test.mytable;
>>>>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>>>> cassandra@cqlsh> truncate test.mytable;
>>>>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>>>>
>>>>> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
>>>>> anybody come across the same error?
>>>>>
>>>>> Thanks,
>>>>> George
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Hi Joaquin,

Unfortunately neither WARN nor ERROR found in the system logs across the
cluster when executing truncate. Sometimes it executes immediately, other
times it takes 25 seconds, given that I have connected with
--request-timeout=30 seconds.

The nodes are a bit busy compacting. On a freshly restarted cluster,
truncate seems to work without problems.

Some warnings that I see around that time but not exactly when executing
truncate are:
WARN  [CompactionExecutor:2] 2016-09-28 20:03:29,646 SSTableWriter.java:241
- Compacting large partition
system/hints:6f2c3b31-4975-470b-8f91-e706be89a83a (133819308 bytes)

Kind regards,
George

On Wed, Sep 28, 2016 at 7:54 PM, Joaquin Casares <joaq...@thelastpickle.com>
wrote:

> Hi George,
>
> Try grepping for WARN and ERROR on the system.logs across all nodes when
> you run the command. Could you post any of the recent stacktraces that you
> see?
>
> Cheers,
>
> Joaquin Casares
> Consultant
> Austin, TX
>
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos <sigle...@textkernel.nl>
> wrote:
>
>> Thanks a lot for your reply.
>>
>> I understand that truncate is an expensive operation. But throwing a
>> timeout while truncating a table that is already empty?
>>
>> A workaround is to set a high --request-timeout when connecting. Even 20
>> seconds is not always enough
>>
>> Kind regards,
>> George
>>
>>
>> On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>>
>>> Truncate does a few things (based on version)
>>>   truncate takes snapshots
>>>   truncate causes a flush
>>>   in very old versions truncate causes a schema migration.
>>>
>>> In newer versions like cassandra 3.4 you have this knob.
>>>
>>> # How long the coordinator should wait for truncates to complete
>>> # (This can be much longer, because unless auto_snapshot is disabled
>>> # we need to flush first so we can snapshot before removing the data.)
>>> truncate_request_timeout_in_ms: 60000
>>>
>>>
>>> In older versions you cannot control when this call will time out; it is
>>> fairly normal that it does!
>>>
>>>
>>> On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos <
>>> sigle...@textkernel.nl> wrote:
>>>
>>>> Hello,
>>>>
>>>> I keep executing a TRUNCATE command on an empty table and it throws
>>>> OperationTimedOut randomly:
>>>>
>>>> cassandra@cqlsh> truncate test.mytable;
>>>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>>> cassandra@cqlsh> truncate test.mytable;
>>>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>>>
>>>> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
>>>> anybody come across the same error?
>>>>
>>>> Thanks,
>>>> George
>>>>
>>>>
>>>
>>
>


Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Thanks a lot for your reply.

I understand that truncate is an expensive operation. But throwing a
timeout while truncating a table that is already empty?

A workaround is to set a high --request-timeout when connecting. Even 20
seconds is not always enough
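For reference, the workaround looks like this (host and timeout value are
illustrative; cqlsh's --request-timeout is given in seconds):

cqlsh cassiebeta-01 -u cassandra -p cassandra --request-timeout=60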

Kind regards,
George


On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> Truncate does a few things (based on version)
>   truncate takes snapshots
>   truncate causes a flush
>   in very old versions truncate causes a schema migration.
>
> In newer versions like cassandra 3.4 you have this knob.
>
> # How long the coordinator should wait for truncates to complete
> # (This can be much longer, because unless auto_snapshot is disabled
> # we need to flush first so we can snapshot before removing the data.)
> truncate_request_timeout_in_ms: 60000
>
>
> In older versions you cannot control when this call will time out; it is
> fairly normal that it does!
>
>
> On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos <sigle...@textkernel.nl>
> wrote:
>
>> Hello,
>>
>> I keep executing a TRUNCATE command on an empty table and it throws
>> OperationTimedOut randomly:
>>
>> cassandra@cqlsh> truncate test.mytable;
>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>> cassandra@cqlsh> truncate test.mytable;
>> OperationTimedOut: errors={}, last_host=cassiebeta-01
>>
>> Having a 3 node cluster running 2.1.14. No connectivity problems. Has
>> anybody come across the same error?
>>
>> Thanks,
>> George
>>
>>
>


TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread George Sigletos
Hello,

I keep executing a TRUNCATE command on an empty table and it throws
OperationTimedOut randomly:

cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01
cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01

Having a 3 node cluster running 2.1.14. No connectivity problems. Has
anybody come across the same error?

Thanks,
George


Re: cqlsh problem

2016-09-20 Thread George Sigletos
This appears in the system log:

Caused by: java.lang.RuntimeException:
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
received only 2 responses.
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:276)
~[apache-cassandra-2.1.14.jar:2.1.14]
at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:97)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:67)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:124)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.auth.PermissionsCache$1.load(PermissionsCache.java:121)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
~[guava-16.0.jar:na]
at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317)
~[guava-16.0.jar:na]
at
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2280)
~[guava-16.0.jar:na]
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2195)
~[guava-16.0.jar:na]
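A common remedy for these auth read timeouts (an assumption on my part, not
something verified in this thread) is to raise the replication factor of the
system_auth keyspace and repair it, since lookups for the default superuser
are performed at QUORUM:

ALTER KEYSPACE system_auth WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3};  -- DC name/RF are examples
-- then, on each node: nodetool repair system_auth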


On Tue, Sep 20, 2016 at 11:12 AM, George Sigletos <sigle...@textkernel.nl>
wrote:

> I am also getting the same error:
> cqlsh  -u cassandra -p cassandra
>
> Connection error: ('Unable to connect to any servers', {'':
> OperationTimedOut('errors=Timed out creating connection (5 seconds),
> last_host=None',)})
>
> But it is not consistent. Sometimes I manage to connect. It is random.
> Using 2.1.14
>
> On Tue, Jun 14, 2016 at 4:29 AM, joseph gao <gaojf.bok...@gmail.com>
> wrote:
>
>> hi, Patrick, [image: inline image 1],
>> the netstat -lepunt output looks like the above
>>
>> 2016-05-27 23:16 GMT+08:00 Patrick McFadin <pmcfa...@gmail.com>:
>>
>>> Can you do a netstat -lepunt and show the output? If Cassandra is
>>> running you aren't trying to connect to the ip/port it's bound to.
>>>
>>> Patrick
>>>
>>>
>>> On Monday, May 23, 2016, joseph gao <gaojf.bok...@gmail.com> wrote:
>>>
>>>> I used to think it's firewall/network issues too. So I made ufw
>>>> inactive. I really don't know what the reason is.
>>>>
>>>> 2016-05-09 19:01 GMT+08:00 kurt Greaves <k...@instaclustr.com>:
>>>>
>>>>> Don't be fooled, despite saying tcp6 and :::*, it still listens on
>>>>> IPv4. As far as I'm aware this happens on all 2.1 Cassandra nodes, and may
>>>>> just be an oddity of netstat. It would be unrelated to your connection
>>>>> timeout issues, that's most likely related to firewall/network issues.
>>>>>
>>>>> On 9 May 2016 at 09:59, joseph gao <gaojf.bok...@gmail.com> wrote:
>>>>>
>>>>>> It doesn't work, still using ipv6 [image: inline image 1]
>>>>>>
>>>>>> And I already set [image: inline image 2]
>>>>>>
>>>>>> Now I'm using cqlsh 4.1.1 with port 9160 instead of 5.x.x.
>>>>>>
>>>>>> Hopefully this could be resolved, Thanks!
>>>>>>
>>>>>> 2016-03-30 22:13 GMT+08:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>>>>>
>>>>>>> Hi Joseph,
>>>>>>>
>>>>>>> why cassandra using tcp6 for 9042 port like :
>>>>>>>> tcp6   0   0  0.0.0.0:9042   :::*   LISTEN
>>>>>>>>
>>>>>>>
>>>>>>> if I remember correctly, in 2.1 and higher, cqlsh uses native
>>>>>>> transport, port 9042  (instead of thrift port 9160) and your clients (if
>>>>>>> any) are also probably using native transport (port 9042). So yes, this
>>>>>>> could be an issue indeed.
>>>>>>>
>>>>>>> You should have something like:
>>>>>>>
>>>>>>> tcp    0   0  1.2.3.4:9042   :::*   LISTEN
>>>>>>>
>>>>>>> You are using IPv6 and no rpc address. Try setting it to the listen
>>>>>>> address and using IPv4.
>>>>>>>
>>>>>>> C*heers,
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> Alain Rodriguez - al...@thelastpickle.com
>>>>>>>
>>>>>

Re: cqlsh problem

2016-09-20 Thread George Sigletos
I am also getting the same error:
cqlsh  -u cassandra -p cassandra

Connection error: ('Unable to connect to any servers', {'':
OperationTimedOut('errors=Timed out creating connection (5 seconds),
last_host=None',)})

But it is not consistent. Sometimes I manage to connect. It is random.
Using 2.1.14

On Tue, Jun 14, 2016 at 4:29 AM, joseph gao  wrote:

> hi, Patrick, [image: inline image 1],
> the netstat -lepunt output looks like the above
>
> 2016-05-27 23:16 GMT+08:00 Patrick McFadin :
>
>> Can you do a netstat -lepunt and show the output? If Cassandra is running
>> you aren't trying to connect to the ip/port it's bound to.
>>
>> Patrick
>>
>>
>> On Monday, May 23, 2016, joseph gao  wrote:
>>
>>> I used to think it's firewall/network issues too. So I made ufw
>>> inactive. I really don't know what the reason is.
>>>
>>> 2016-05-09 19:01 GMT+08:00 kurt Greaves :
>>>
 Don't be fooled, despite saying tcp6 and :::*, it still listens on
 IPv4. As far as I'm aware this happens on all 2.1 Cassandra nodes, and may
 just be an oddity of netstat. It would be unrelated to your connection
 timeout issues, that's most likely related to firewall/network issues.

 On 9 May 2016 at 09:59, joseph gao  wrote:

> It doesn't work, still using ipv6 [image: inline image 1]
>
> And I already set [image: inline image 2]
>
> Now I'm using cqlsh 4.1.1 with port 9160 instead of 5.x.x.
>
> Hopefully this could be resolved, Thanks!
>
> 2016-03-30 22:13 GMT+08:00 Alain RODRIGUEZ :
>
>> Hi Joseph,
>>
>> why cassandra using tcp6 for 9042 port like :
>>> tcp6   0   0  0.0.0.0:9042   :::*   LISTEN
>>>
>>
>> if I remember correctly, in 2.1 and higher, cqlsh uses native
>> transport, port 9042  (instead of thrift port 9160) and your clients (if
>> any) are also probably using native transport (port 9042). So yes, this
>> could be an issue indeed.
>>
>> You should have something like:
>>
>> tcp    0   0  1.2.3.4:9042   :::*   LISTEN
>>
>> You are using IPv6 and no rpc address. Try setting it to the listen
>> address and using IPv4.
>>
>> C*heers,
>>
>> ---
>>
>> Alain Rodriguez - al...@thelastpickle.com
>>
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>>
>> http://www.thelastpickle.com
>>
>> 2016-03-30 6:09 GMT+02:00 joseph gao :
>>
>>> why cassandra using tcp6 for 9042 port like :
>>> tcp6   0  0 0.0.0.0:9042:::*
>>>  LISTEN
>>> would this be the problem
>>>
>>> 2016-03-30 11:34 GMT+08:00 joseph gao :
>>>
 still have not fixed it. cqlsh: error: no such option:
 --connect-timeout
 cqlsh version 5.0.1



 2016-03-25 16:46 GMT+08:00 Alain RODRIGUEZ :

> Hi Joseph.
>
> As I can't reproduce here, I believe you are having network issue
> of some kind.
>
> MacBook-Pro:~ alain$ cqlsh --version
> cqlsh 5.0.1
> MacBook-Pro:~ alain$ echo 'DESCRIBE KEYSPACES;' | cqlsh
> --connect-timeout=5 --request-timeout=10
> system_traces  system
> MacBook-Pro:~ alain$
>
> It's been a few days, did you manage to fix it ?
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-03-21 9:59 GMT+01:00 joseph gao :
>
>> cqlsh version 5.0.1. nodetool tpstats looks
>> good, the log looks good, and I used the specified port 9042. It
>> immediately returns failure (in less
>> than 3 seconds). By the way, where should I use '--connect-timeout'?
>> cqlsh
>> does not seem to have such a parameter.
>>
>> 2016-03-18 17:29 GMT+08:00 Alain RODRIGUEZ :
>>
>>> Is the node fully healthy or rejecting some requests ?
>>>
>>> What are the outputs for "grep -i "ERROR"
>>> /var/log/cassandra/system.log" and "nodetool tpstats"?
>>>
>>> Any error? Any pending / blocked or dropped messages?
>>>
>>> Also did you try using distinct ports (9160 for thrift, 9042 for
>>> native) - out of curiosity, not sure this will help.
>>>
>>> What is your version of cqlsh "cqlsh --version" ?
>>>
>>> doesn't work most times. But sometimes it just works fine

>>>
>>> Do you feel like this is due to a timeout (query being too big,
>>> 

Re: Consistency level ONE and using withLocalDC

2016-06-09 Thread George Sigletos
Hi Alain,

Thank you for your answer.

I recently queried my cluster multiple times with consistency ONE and
"myLocalDC" set as the local DC (withUsedHostsPerRemoteDc=1).

However, sometimes (not always) I got a response from the node in the remote
DC. All my nodes in "myLocalDC" were up and running.

I was facing a data inconsistency issue: when connecting to the remote node
I got an empty result, while when connecting to "myLocalDC" I got the
expected result back.

I was expecting that, since all nodes in "myLocalDC" were up and running, no
request would have been sent to the remote node.

I had to solve the problem by setting consistency "LOCAL_ONE" until I repair
the remote node. Alternatively, I could have set
withUsedHostsPerRemoteDc=0.
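For reference, a minimal sketch of the driver setup being described (Java
driver 2.x builder API; contact point and DC name are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

Cluster cluster = Cluster.builder()
    .addContactPoint("192.168.xx.xx")
    .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
        .withLocalDc("myLocalDC")
        .withUsedHostsPerRemoteDc(0)  // 0 = never fall back to a remote DC
        .build())
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .build();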

Kind regards,
George

On Wed, Jun 8, 2016 at 7:10 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi George,
>
> Would that be correct?
>
>
> I think it is actually quite the opposite :-).
>
> It is very well explained here:
> https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.Builder.html#withUsedHostsPerRemoteDc-int-
>
> Connections are opened to the X nodes in the remote DC, but they will only
> be used as a fallback, and only if the operation is not using a LOCAL_*
> consistency level.
>
> Sorry I have been so long answering you.
>
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-05-20 17:54 GMT+02:00 George Sigletos <sigle...@textkernel.nl>:
>
>> Hello,
>>
>> Using withLocalDC="myLocalDC" and withUsedHostsPerRemoteDc>0 will
>> guarantee that you will connect to one of the nodes in "myLocalDC",
>>
>> but DOES NOT guarantee that your read/write request will be acknowledged
>> by a "myLocalDC" node. It may well be acknowledged by a remote DC node as
>> well, even if "myLocalDC" is up and running.
>>
>> Would that be correct? Thank you
>>
>> Kind regards,
>> George
>>
>
>


Re: Error while rebuilding a node: Stream failed

2016-06-02 Thread George Sigletos
I gave up on the rebuild completely.

Now I am running `nodetool repair` and in case of network issues I retry
for the token ranges that failed using the -st and -et options of `nodetool
repair`.
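For reference, such a retry looks like this (token boundaries and keyspace
name are made up for the example):

nodetool repair -st -3074457345618258603 -et -1537228672809129302 mykeyspace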

That would be good enough for now, till we fix our network problems.

On Sat, May 28, 2016 at 7:05 PM, George Sigletos <sigle...@textkernel.nl>
wrote:

> No luck unfortunately. It seems that the connection to the destination
> node was lost.
>
> However there was progress compared to the previous times. A lot more data
> was streamed.
>
> (From source node)
> INFO  [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 -
> InetAddress /54.172.235.227 is now DOWN
> INFO  [HANDSHAKE-/54.172.235.227] 2016-05-28 17:53:58,238
> OutboundTcpConnection.java:487 - Handshaking version with /54.172.235.227
> ERROR [STREAM-IN-/54.172.235.227] 2016-05-28 17:54:08,938
> StreamSession.java:505 - [Stream #d25a05c0-241f-11e6-bb50-1b05ac77baf9]
> Streaming error occurred
> java.io.IOException: Connection timed out
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> ~[na:1.7.0_79]
> at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
> ~[na:1.7.0_79]
> at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
> at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
> at sun.nio.ch.SocketAdaptor$SocketInputStream.read(Unknown Source)
> ~[na:1.7.0_79]
> at sun.nio.ch.ChannelInputStream.read(Unknown Source)
> ~[na:1.7.0_79]
> at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown
> Source) ~[na:1.7.0_79]
> at
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
> INFO  [SharedPool-Worker-1] 2016-05-28 17:54:59,612 Gossiper.java:993 -
> InetAddress /54.172.235.227 is now UP
>
> On Fri, May 27, 2016 at 5:37 PM, George Sigletos <sigle...@textkernel.nl>
> wrote:
>
>> I am trying once more using more aggressive tcp settings, as recommended
>> here
>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootIdleFirewall.html>
>>
>> sudo sysctl -w net.ipv4.tcp_keepalive_time=60 
>> net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10
>>
>> (added to /etc/sysctl.conf and run sysctl -p /etc/sysctl.conf on all
>> nodes)
>>
>> Let's see what happens. I don't know what else to try. I have even
>> further increased streaming_socket_timeout_in_ms
>>
>>
>>
>> On Fri, May 27, 2016 at 4:56 PM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> I'm afraid raising streaming_socket_timeout_in_ms won't help much in
>>> this case because the incoming connection on the source node is timing out
>>> on the network layer, and streaming_socket_timeout_in_ms controls the
>>> socket timeout in the app layer and throws SocketTimeoutException (not 
>>> java.io.IOException:
>>> Connection timed out). So you should probably use more aggressive tcp
>>> keep-alive settings (net.ipv4.tcp_keepalive_*) on both hosts, did you try
>>> tuning that? Even that might not be sufficient as some routers tend to
>>> ignore tcp keep-alives and just kill idle connections.
>>>
>>> As said before, this will ultimately be fixed by adding keep-alive to
>>> the app layer on CASSANDRA-11841. If tuning tcp keep-alives does not help,
>>> one extreme approach would be to backport this to 2.1 (unless some
>>> experienced operator out there has a more creative approach).
>>>
>>> @eevans, I'm not sure he is using a mixed version cluster, it seem he
>>> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>>>
>>> 2016-05-27 11:39 GMT-03:00 Eric Evans <john.eric.ev...@gmail.com>:
>>>
>>>> From the various stacktraces in this thread, it's obvious you are
>>>> mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
>>>> supported with mixed Cassandra versions.  Sometimes it will work,
>>>> sometimes it won't (and it will definitely not work in this instance).
>>>>
>>>> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
>>>> the new nodes using 2.1.13, and upgrade after.
>>>>

Re: Error while rebuilding a node: Stream failed

2016-05-28 Thread George Sigletos
No luck unfortunately. It seems that the connection to the destination node
was lost.

However there was progress compared to the previous times. A lot more data
was streamed.

(From source node)
INFO  [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 -
InetAddress /54.172.235.227 is now DOWN
INFO  [HANDSHAKE-/54.172.235.227] 2016-05-28 17:53:58,238
OutboundTcpConnection.java:487 - Handshaking version with /54.172.235.227
ERROR [STREAM-IN-/54.172.235.227] 2016-05-28 17:54:08,938
StreamSession.java:505 - [Stream #d25a05c0-241f-11e6-bb50-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_79]
at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.ChannelInputStream.read(Unknown Source) ~[na:1.7.0_79]
at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown
Source) ~[na:1.7.0_79]
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257)
~[apache-cassandra-2.1.14.jar:2.1.14]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
INFO  [SharedPool-Worker-1] 2016-05-28 17:54:59,612 Gossiper.java:993 -
InetAddress /54.172.235.227 is now UP

On Fri, May 27, 2016 at 5:37 PM, George Sigletos <sigle...@textkernel.nl>
wrote:

> I am trying once more using more aggressive tcp settings, as recommended
> here
> <https://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootIdleFirewall.html>
>
> sudo sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_probes=3 
> net.ipv4.tcp_keepalive_intvl=10
>
> (added to /etc/sysctl.conf and run sysctl -p /etc/sysctl.conf on all nodes)
>
> Let's see what happens. I don't know what else to try. I have even further
> increased streaming_socket_timeout_in_ms
>
>
>
> On Fri, May 27, 2016 at 4:56 PM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> I'm afraid raising streaming_socket_timeout_in_ms won't help much in this
>> case because the incoming connection on the source node is timing out on
>> the network layer, and streaming_socket_timeout_in_ms controls the socket
>> timeout in the app layer and throws SocketTimeoutException (not 
>> java.io.IOException:
>> Connection timed out). So you should probably use more aggressive tcp
>> keep-alive settings (net.ipv4.tcp_keepalive_*) on both hosts, did you try
>> tuning that? Even that might not be sufficient as some routers tend to
>> ignore tcp keep-alives and just kill idle connections.
>>
>> As said before, this will ultimately be fixed by adding keep-alive to the
>> app layer on CASSANDRA-11841. If tuning tcp keep-alives does not help, one
>> extreme approach would be to backport this to 2.1 (unless some experienced
>> operator out there has a more creative approach).
>>
>> @eevans, I'm not sure he is using a mixed version cluster, it seem he
>> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>>
>> 2016-05-27 11:39 GMT-03:00 Eric Evans <john.eric.ev...@gmail.com>:
>>
>>> From the various stacktraces in this thread, it's obvious you are
>>> mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
>>> supported with mixed Cassandra versions.  Sometimes it will work,
>>> sometimes it won't (and it will definitely not work in this instance).
>>>
>>> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
>>> the new nodes using 2.1.13, and upgrade after.
>>>
>>> On Fri, May 27, 2016 at 8:41 AM, George Sigletos <sigle...@textkernel.nl>
>>> wrote:
>>>
>>> >>>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>>> >>>> StreamSession.java:505 - [Stream
>>> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>>> >>>> Streaming error occurred
>>> >>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>> >>>> at
>>> >>>>
>>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> >>>> a

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread George Sigletos
I am trying once more using more aggressive tcp settings, as recommended
here
<https://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootIdleFirewall.html>

sudo sysctl -w net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10

(added to /etc/sysctl.conf and run sysctl -p /etc/sysctl.conf on all nodes)

Let's see what happens. I don't know what else to try. I have even further
increased streaming_socket_timeout_in_ms
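To confirm the kernel picked up the new values:

sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_probes net.ipv4.tcp_keepalive_intvl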



On Fri, May 27, 2016 at 4:56 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> I'm afraid raising streaming_socket_timeout_in_ms won't help much in this
> case because the incoming connection on the source node is timing out on
> the network layer, and streaming_socket_timeout_in_ms controls the socket
> timeout in the app layer and throws SocketTimeoutException (not 
> java.io.IOException:
> Connection timed out). So you should probably use more aggressive tcp
> keep-alive settings (net.ipv4.tcp_keepalive_*) on both hosts, did you try
> tuning that? Even that might not be sufficient as some routers tend to
> ignore tcp keep-alives and just kill idle connections.
>
> As said before, this will ultimately be fixed by adding keep-alive to the
> app layer on CASSANDRA-11841. If tuning tcp keep-alives does not help, one
> extreme approach would be to backport this to 2.1 (unless some experienced
> operator out there has a more creative approach).
>
> @eevans, I'm not sure he is using a mixed version cluster, it seem he
> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>
> 2016-05-27 11:39 GMT-03:00 Eric Evans <john.eric.ev...@gmail.com>:
>
>> From the various stacktraces in this thread, it's obvious you are
>> mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
>> supported with mixed Cassandra versions.  Sometimes it will work,
>> sometimes it won't (and it will definitely not work in this instance).
>>
>> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
>> the new nodes using 2.1.13, and upgrade after.
>>
>> On Fri, May 27, 2016 at 8:41 AM, George Sigletos <sigle...@textkernel.nl>
>> wrote:
>>
>> >>>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>> >>>> StreamSession.java:505 - [Stream
>> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>> >>>> Streaming error occurred
>> >>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>> >>>> at
>> >>>>
>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> >>>> at
>> >>>>
>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> >>>> at
>> >>>>
>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> >>>> at
>> >>>>
>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> >>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>> >>>>
>> >>>> And this is from the source node:
>> >>>>
>> >>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
>> >>>> StreamSession.java:505 - [Stream
>> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>> >>>> Streaming error occurred
>> >>>> java.io.IOException: Broken pipe
>> >>>> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>> >>>> ~[na:1.7.0_79]
>> >>>> at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown
>> Source)
>> >>>> ~[na:1.7.0_79]
>> >>>> at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>> >>>> ~[na:1.7.0_79]
>> >>>> at
>> >>>>
>> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> >>>> at
>> >>>>
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> >>>> at
>> >>>

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread George Sigletos
Hello,

No, there is no version mix. The first stack traces were indeed from 2.1.13.
Then I upgraded all nodes to 2.1.14 and am still getting the same errors.


On Fri, May 27, 2016 at 4:39 PM, Eric Evans <john.eric.ev...@gmail.com>
wrote:

> From the various stacktraces in this thread, it's obvious you are
> mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
> supported with mixed Cassandra versions.  Sometimes it will work,
> sometimes it won't (and it will definitely not work in this instance).
>
> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
> the new nodes using 2.1.13, and upgrade after.
>
> On Fri, May 27, 2016 at 8:41 AM, George Sigletos <sigle...@textkernel.nl>
> wrote:
>
> >>>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
> >>>> StreamSession.java:505 - [Stream
> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
> >>>> Streaming error occurred
> >>>> java.lang.RuntimeException: Outgoing stream handler has been closed
> >>>> at
> >>>>
> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
> >>>>
> >>>> And this is from the source node:
> >>>>
> >>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
> >>>> StreamSession.java:505 - [Stream
> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
> >>>> Streaming error occurred
> >>>> java.io.IOException: Broken pipe
> >>>> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> >>>> ~[na:1.7.0_79]
> >>>> at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown
> Source)
> >>>> ~[na:1.7.0_79]
> >>>> at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
> >>>> ~[na:1.7.0_79]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
> >>>> [apache-cassandra-2.1.14.jar:2.1.14]
> >>>> at
> >>>>
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
> >>>> [apache-cassandra-2.1.14.jar:2.1.14]
>
>
> >>>>>>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
> >>>>>>>>>>> StreamSession.java:620 - [Stream
> #2c290460-20d4-11e6-930f-1b05ac77baf9]
> >>>>>>>>>>> Remote peer 192.168.1.140 failed stream session.
> >>>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
> >>>>>>>>>>> StreamSession.java:505 - [Stream
> #2c290460-20d4-11e6-930f-1b05ac77baf9]
> >>>>>>>>>>> Streaming error occurred
> >>>>>>

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread George Sigletos
) ~[na:1.7.0_79]
at
org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:338)
[apache-cassandra-2.1.14.jar:2.1.14]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]

On Thu, May 26, 2016 at 7:05 PM, George Sigletos <sigle...@textkernel.nl>
wrote:

> The time the first streaming failure occurs varies from a few hours to 1+
> day.
>
> We also experience slowness problems with the destination node on Amazon.
> Rebuild is slow. That may also contribute to the problem.
>
> Unfortunately we only kept the logs of the source node and there is no
> other error prior to the streaming failure.
>
> Only compaction, flushing and writing memtable info messages.
>
> We are running the rebuild once more using destination node's external IP.
> If it fails again I will post the errors here.
>
> On Thu, May 26, 2016 at 5:20 PM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> How long does it take after you trigger the rebuild process before it
>> fails?
>>
>> Was there any error before [STREAM-IN-/192.168.1.141] on the destination
>> node or [STREAM-OUT-/172.31.22.104] on the source node? Those are
>> showing consequences of the root error. In particular what were the last
>> messages on [STREAM-OUT-/192.168.1.141] and [STREAM-IN-/172.31.22.104] ?
>>
>> > Streaming does not seem to be resumed again from this node. Shall I
>> just kill again the entire rebuild process?
>>
>> Yes, resumable rebuild will be supported on CASSANDRA-10810.
>>
>> 2016-05-26 8:20 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>>
>>> I tried again after setting streaming_socket_timeout_in_ms to 1 day on
>>> all nodes and upgrading to 2.1.14.
>>>
>>> My tcp_keepalive_time is set to 2 hours and tcp_keepalive_probes to 9.
>>> That should be OK, I would think.
>>>
>>> I get streaming error again, shortly after starting the rebuild process.
>>> This is from the destination node:
>>>
>>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>>> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>>> Streaming error occurred
>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>> at
>>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at
>>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at
>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at
>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>
>>> And this is from the source node:
>>>
>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
>>> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>>> Streaming error occurred
>>> java.io.IOException: Broken pipe
>>> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>>> ~[na:1.7.0_79]
>>> at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
>>> ~[na:1.7.0_79]
>>> at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>>> ~[na:1.7.0_79]
>>> at
>>> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at
>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at
>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>> at
>>> org.apache.cassandra.streaming.messages.OutgoingFileMessag

Re: Error while rebuilding a node: Stream failed

2016-05-26 Thread George Sigletos
The time the first streaming failure occurs varies from a few hours to 1+
day.

We also experience slowness problems with the destination node on Amazon.
Rebuild is slow. That may also contribute to the problem.

Unfortunately we only kept the logs of the source node and there is no
other error prior to the streaming failure.

Only compaction, flushing and writing memtable info messages.

We are running the rebuild once more using destination node's external IP.
If it fails again I will post the errors here.

On Thu, May 26, 2016 at 5:20 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> How long does it take after you trigger the rebuild process before it
> fails?
>
> Was there any error before [STREAM-IN-/192.168.1.141] on the destination
> node or [STREAM-OUT-/172.31.22.104] on the source node? Those are showing
> consequences of the root error. In particular what were the last messages
> on [STREAM-OUT-/192.168.1.141] and [STREAM-IN-/172.31.22.104] ?
>
> > Streaming does not seem to be resumed again from this node. Shall I just
> kill again the entire rebuild process?
>
> Yes, resumable rebuild will be supported on CASSANDRA-10810.
>
> 2016-05-26 8:20 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>
>> I tried again after setting streaming_socket_timeout_in_ms to 1 day on all
>> nodes and upgrading to 2.1.14.
>>
>> My tcp_keepalive_time is set to 2 hours and tcp_keepalive_probes to 9.
>> That should be OK, I would think.
>>
>> I get streaming error again, shortly after starting the rebuild process.
>> This is from the destination node:
>>
>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>> Streaming error occurred
>> java.lang.RuntimeException: Outgoing stream handler has been closed
>> at
>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>
>> And this is from the source node:
>>
>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
>> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>> Streaming error occurred
>> java.io.IOException: Broken pipe
>> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>> ~[na:1.7.0_79]
>> at
>> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
>> [apache-cassandra-2.1.14.jar:2.1.14]
>> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
>> [apache-cassandra-2.1.14.jar:2.1.14]
>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>> INFO  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,111
>> StreamResultFuture.java:180 - [Stream
>> #74c57bc0-231a-11e6-a698-1b05ac77baf9] Session with /172.31.22.104 is
>> complete
>> WARN  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,114
>> StreamResultFuture.java:207 - [Stream
>> #74c57bc0-231a-11e6-a698-1b05ac77baf9] Stream failed
>>
>> > 

Re: Error while rebuilding a node: Stream failed

2016-05-26 Thread George Sigletos
I tried again after setting streaming_socket_timeout_in_ms to 1 day on all
nodes and upgrading to 2.1.14.
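For reference, that is the following line in cassandra.yaml on every node
(86400000 ms = 1 day):

streaming_socket_timeout_in_ms: 86400000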

My tcp_keepalive_time is set to 2 hours and tcp_keepalive_probes to 9.
That should be OK, I would think.
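Those values can be double-checked with:

cat /proc/sys/net/ipv4/tcp_keepalive_time    # 7200 seconds = 2 hours
cat /proc/sys/net/ipv4/tcp_keepalive_probes  # 9
cat /proc/sys/net/ipv4/tcp_keepalive_intvl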

I get streaming error again, shortly after starting the rebuild process.
This is from the destination node:

ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
at
org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
~[apache-cassandra-2.1.14.jar:2.1.14]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]

And this is from the source node:

ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
~[na:1.7.0_79]
at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
~[na:1.7.0_79]
at
org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
~[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
[apache-cassandra-2.1.14.jar:2.1.14]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
[apache-cassandra-2.1.14.jar:2.1.14]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
INFO  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,111
StreamResultFuture.java:180 - [Stream
#74c57bc0-231a-11e6-a698-1b05ac77baf9] Session with /172.31.22.104 is
complete
WARN  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,114
StreamResultFuture.java:207 - [Stream
#74c57bc0-231a-11e6-a698-1b05ac77baf9] Stream failed


Streaming does not seem to be resumed again from this node. Shall I just
kill again the entire rebuild process?

On Thu, May 26, 2016 at 12:17 AM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> If increasing or disabling streaming_socket_timeout_in_ms on the source
> node does not fix it, you may want to have a look on your tcp keep alive
> settings on the source and destination nodes as intermediate
> routers/firewalls may be killing the connections due to inactivity. See
> this for more information:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>
> This will ultimately fixed by CASSANDRA-11841 by adding keep-alive to the
> streaming protocol.
>
> 2016-05-25 18:09 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>
>> Thanks a lot for your help. I will try that tomorrow. The first time that
>> I tried to rebuild, streaming_socket_timeout_in_ms was 0 and still failed.
>> Below is the directly previous error on the source node:
>>
>> ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437
>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>> Streaming error occurred
>> java.io.IOException: Connection timed out
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
>> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>> ~[na:1.7.0_79]
>> at
>> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
>> ~[apache-cassandra-2.1.13.jar:2.1.13

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Thanks a lot for your help. I will try that tomorrow. The first time that I
tried to rebuild, streaming_socket_timeout_in_ms was 0 (i.e. disabled) and it
still failed. Below is the immediately preceding error on the source node:

ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437
StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_79]
at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250)
~[apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]

On Wed, May 25, 2016 at 10:28 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> > Workaround is to set a larger streaming_socket_timeout_in_ms **on
> > the source node**; the new default will be 86400000 ms (1 day).
>
> 2016-05-25 17:23 GMT-03:00 Paulo Motta <pauloricard...@gmail.com>:
>
>> Was there any other ERROR preceding this on this node (in particular the
>> last few lines of [STREAM-IN-/172.31.22.104])? If it's a
>> SocketTimeoutException, then what is happening is that the default
>> streaming socket timeout of 1 hour is not sufficient to stream a single
>> file and the stream session fails. Workaround is to set a larger
>> streaming_socket_timeout_in_ms; the new default will be 86400000 ms (1
>> day).
>>
>> We are addressing this on
>> https://issues.apache.org/jira/browse/CASSANDRA-11839.
>>
>> 2016-05-25 16:42 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>>
>>> Hello again,
>>>
>>> Here is the error message from the source
>>>
>>> INFO  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275
>>> StreamResultFuture.java:180 - [Stream
>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104 is
>>> complete
>>> WARN  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276
>>> StreamResultFuture.java:207 - [Stream
>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-25 00:44:57,353
>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>> Streaming error occurred
>>> java.lang.AssertionError: Memory was freed
>>> at
>>> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at
>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>
>>> On Wed, May 25, 2016 at 8:49 PM, Paulo Motta <pauloricard...@gmail.com>
>>> wrote:
>>>
>>>> This is the log of the destination/rebuilding node, you need to check
>>>> what the error message is on the stream source node (192.168.1.140).

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Hello again,

Here is the error message from the source

INFO  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275
StreamResultFuture.java:180 - [Stream
#2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104 is
complete
WARN  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276
StreamResultFuture.java:207 - [Stream
#2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
ERROR [STREAM-OUT-/172.31.22.104] 2016-05-25 00:44:57,353
StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
Streaming error occurred
java.lang.AssertionError: Memory was freed
at
org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97)
~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
~[apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]

On Wed, May 25, 2016 at 8:49 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> This is the log of the destination/rebuilding node, you need to check what
> the error message is on the stream source node (192.168.1.140).
>
>
> 2016-05-25 15:22 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>
>> Hello,
>>
>> Here is additional stack trace from system.log:
>>
>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
>> StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>> Remote peer 192.168.1.140 failed stream session.
>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>> Streaming error occurred
>> java.io.IOException: Connection timed out
>> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>> ~[na:1.7.0_79]
>> at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>> at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>> ~[na:1.7.0_79]
>> at
>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>> at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>> [apache-cassandra-2.1.13.jar:2.1.13]
>> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323)
>> [apache-cassandra-2.1.13.jar:2.1.13]
>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>> INFO  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625
>> StreamResultFuture.java:180 - [Stream
>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140 is
>> complete
>> WARN  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627
>> StreamResultFuture.java:207 - [Stream
>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628
>> StorageService.java:1075 - Error while rebuilding node
>> org.apache.cassandra.streaming.StreamException: Stream failed
>> at
>> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>> at
>> com.google.common.util.concurrent.Futures$4.run

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Hello,

Here is additional stack trace from system.log:

ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
Remote peer 192.168.1.140 failed stream session.
ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
~[na:1.7.0_79]
at sun.nio.ch.SocketDispatcher.write(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.write(Unknown Source) ~[na:1.7.0_79]
at
org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323)
[apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
INFO  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625
StreamResultFuture.java:180 - [Stream
#2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140 is
complete
WARN  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627
StreamResultFuture.java:207 - [Stream
#2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628
StorageService.java:1075 - Error while rebuilding node
org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
~[guava-16.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
~[guava-16.0.jar:na]
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256)
~[apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_79]
ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629
StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
~[na:1.7.0_79]
at sun.nio.ch.SocketDispatcher.write(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
~[na:1.7.0_79]
at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
at sun.nio.ch.SocketChannelImpl.write(Unknown Source) ~[na:1.7.0_79]
at
org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
~[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
[apache-cassandra-2.1.13.jar:2.1.13]
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
[apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]


On Wed, May 25, 2016 at 5:23 PM, Paulo Motta 
wrote:

> The stack trace from the rebuild command does not show the root cause of 

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Hi Mike,

Yes I am using NetworkTopologyStrategy. I checked
cassandra-rackdc.properties on the new node:
dc=DCamazon-1
rack=RACamazon-1

I also checked the jira link you sent me. My network topology seems
correct: I have 4 nodes in DC1 and 1 node in DCamazon-1, which I can verify
by running "nodetool status".
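
In case it's relevant, the keyspace replication does include the new DC,
along these lines (keyspace name and RF values are illustrative):

ALTER KEYSPACE my_keyspace WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DCamazon-1': 1};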

Now I am running a full repair on the Amazon node. I have given up on
rebuilding.

Kind regards,
George



On Wed, May 25, 2016 at 8:50 AM, Mike Yeap <wkk1...@gmail.com> wrote:

> Hi George, are you using NetworkTopologyStrategy as the replication
> strategy for your keyspace? If yes, can you check the
> cassandra-rackdc.properties of this new node?
>
> https://issues.apache.org/jira/browse/CASSANDRA-8279
>
>
> Regards,
> Mike Yeap
>
> On Wed, May 25, 2016 at 2:31 PM, George Sigletos <sigle...@textkernel.nl>
> wrote:
>
>> I am getting this error repeatedly while I am trying to add a new DC
>> consisting of one node in AWS to my existing cluster. I have tried 5 times
>> already. Running Cassandra 2.1.13
>>
>> I have also set:
>> streaming_socket_timeout_in_ms: 360
>> in all of my nodes
>>
>> Does anybody have any idea how this can be fixed? Thanks in advance
>>
>> Kind regards,
>> George
>>
>> P.S.
>> The complete stack trace:
>> -- StackTrace --
>> java.lang.RuntimeException: Error while rebuilding node: Stream failed
>> at
>> org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1076)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>> at java.lang.reflect.Method.invoke(Unknown Source)
>> at sun.reflect.misc.Trampoline.invoke(Unknown Source)
>> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>> at java.lang.reflect.Method.invoke(Unknown Source)
>> at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
>> at
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
>> at
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
>> at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown
>> Source)
>> at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
>> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
>> at
>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
>> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown Source)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown
>> Source)
>> at
>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown
>> Source)
>> at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown
>> Source)
>> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>> at java.lang.reflect.Method.invoke(Unknown Source)
>> at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
>> at sun.rmi.transport.Transport$2.run(Unknown Source)
>> at sun.rmi.transport.Transport$2.run(Unknown Source)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at sun.rmi.transport.Transport.serviceCall(Unknown Source)
>> at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown
>> Source)
>> at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
>> at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(Unknown
>> Source)
>> at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(Unknown Source)
>> at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(Unknown Source)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source)
>> at java.lang.Thread.run(Unknown Source)
>>
>
>


Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
I am getting this error repeatedly while I am trying to add a new DC
consisting of one node in AWS to my existing cluster. I have tried 5 times
already. Running Cassandra 2.1.13

I have also set:
streaming_socket_timeout_in_ms: 360
in all of my nodes

Does anybody have any idea how this can be fixed? Thanks in advance

Kind regards,
George

P.S.
The complete stack trace:
-- StackTrace --
java.lang.RuntimeException: Error while rebuilding node: Stream failed
at
org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1076)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.reflect.misc.Trampoline.invoke(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown
Source)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown
Source)
at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown
Source)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown
Source)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
at sun.rmi.transport.Transport$2.run(Unknown Source)
at sun.rmi.transport.Transport$2.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Unknown Source)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.access$400(Unknown
Source)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(Unknown Source)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)


Consistency level ONE and using withLocalDC

2016-05-20 Thread George Sigletos
Hello,

Using withLocalDC="myLocalDC" and withUsedHostsPerRemoteDc>0 will guarantee
that you will connect to one of the nodes in "myLocalDC",

but DOES NOT guarantee that your read/write request will be acknowledged by
a "myLocalDC" node. It may well be acknowledged by a remote DC node instead,
even if "myLocalDC" is up and running.
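
In Python driver terms, what I mean is roughly the following (the Java
builder's withLocalDC/withUsedHostsPerRemoteDc map to the constructor
arguments below; the host and numbers are illustrative):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy

# Prefer coordinators in myLocalDC, but keep up to 2 hosts per remote DC
# in the query plan as a fallback.
policy = DCAwareRoundRobinPolicy(local_dc='myLocalDC',
                                 used_hosts_per_remote_dc=2)
session = Cluster(['10.0.0.1'], load_balancing_policy=policy).connect()

# With ConsistencyLevel.ONE the acknowledging replica may live in any DC;
# LOCAL_ONE would require the acknowledgement to come from myLocalDC.
session.default_consistency_level = ConsistencyLevel.LOCAL_ONE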

Would that be correct? Thank you

Kind regards,
George


Re: opscenter doesn't work with cassandra 3.0

2016-01-26 Thread George Sigletos
Unfortunately DataStax decided to discontinue OpsCenter for open source
Cassandra, starting from version 2.2.

Pity

On Wed, Jan 6, 2016 at 6:00 PM, Michael Shuler 
wrote:

> On 01/06/2016 10:55 AM, Michael Shuler wrote:
> > On 01/06/2016 01:47 AM, Wills Feng wrote:
> >> Looks like opscenter doesn't support cassandra 3.0?
> >
> > This is correct. OpsCenter does not support Cassandra >= 3.0.
>
> It took me a minute to find the correct document:
>
>
> http://docs.datastax.com/en/upgrade/doc/upgrade/opscenter/opscCompatibility.html
>
> According to this version table, OpsCenter does not officially support
> Cassandra > 2.1.
>
> --
> Michael
>


Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread George Sigletos
Hello,

We had a similar problem where we needed to migrate data from one cluster
to another.

We ended up using Spark to accomplish this. It is fast and reliable but
some downtime was required after all.

We minimized the downtime by doing a first run, and then running incremental
updates.
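
A rough sketch of the kind of job we ran, via pyspark and the Spark
Cassandra connector (keyspace/table/column names are illustrative, and the
connector packaging/config is omitted):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf().setAppName("cassandra-migration")
        .set("spark.cassandra.connection.host", "source-node-1"))
sql = SQLContext(SparkContext(conf=conf))

# Read the table from the source cluster.
df = (sql.read.format("org.apache.spark.sql.cassandra")
      .options(keyspace="my_ks", table="my_table").load())

# For the incremental passes we filtered on a timestamp column, e.g.:
# df = df.filter(df.updated_at > last_run_started_at)

# Write to the target cluster by overriding the connection host per write.
(df.write.format("org.apache.spark.sql.cassandra")
 .options(keyspace="my_ks", table="my_table",
          **{"spark.cassandra.connection.host": "target-node-1"})
 .mode("append").save())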

Kind regards,
George



On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M 
wrote:

>
> Hello all,
>
> We have two clusters X and Y with same keyspaces but distinct data sets.
> We are planning to merge these into single cluster. What would be ideal
> steps to achieve this without downtime for applications? We have time
> series data stream continuously writing to Cassandra.
>
> We have ruled out export/import as that will make us lose data during
> the time of copy.
>
> We also ruled out sstableloader as that is not reliable. It fails often
> and there is no way to start from where it failed.
>
> Any suggestions will help.
>
> Thanks and Regards
> Noorul
>


Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?

2015-12-21 Thread George Sigletos
Roughly half a TB of data.

There is a timestamp column in the tables we migrated and we did use that
to achieve incremental updates.

I don't know anything about kairosdb, but I can see from the docs that
there exists a row timestamp column. Could you maybe use that one?

Kind regards,
George

On Mon, Dec 21, 2015 at 12:53 PM, Noorul Islam K M <noo...@noorul.com>
wrote:

> George Sigletos <sigle...@textkernel.nl> writes:
>
> > Hello,
> >
> > We had a similar problem where we needed to migrate data from one cluster
> > to another.
> >
> > We ended up using Spark to accomplish this. It is fast and reliable but
> > some downtime was required after all.
> >
> > We minimized the downtime by doing a first run, and then run incremental
> > updates.
> >
>
> How much data are you talking about?
>
> How did you achieve incremental run? We are using kairosdb and some of
> the other schemas does not have a way to filter based on date.
>
> Thanks and Regards
> Noorul
>
> > Kind regards,
> > George
> >
> >
> >
> > On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M <noo...@noorul.com>
> > wrote:
> >
> >>
> >> Hello all,
> >>
> >> We have two clusters X and Y with same keyspaces but distinct data sets.
> >> We are planning to merge these into single cluster. What would be ideal
> >> steps to achieve this without downtime for applications? We have time
> >> series data stream continuously writing to Cassandra.
> >>
> >> We have ruled out export/import as that will make us lose data during
> >> the time of copy.
> >>
> >> We also ruled out sstableloader as that is not reliable. It fails often
> >> and there is no way to start from where it failed.
> >>
> >> Any suggestions will help.
> >>
> >> Thanks and Regards
> >> Noorul
> >>
>


Re: Running sstableloader from every node when migrating?

2015-12-01 Thread George Sigletos
Thank you Robert and Anuja,

sstable2json does not seem to be the right tool for this: there is no
documentation beyond Cassandra 1.2, and it requires a specific sstable to be
given, which means a lot of manual work.

The documentation also mentions it is good for testing/debugging, but I
would need to migrate nearly 1 TB of data from a 6-node cluster to a 3-node
one. Copying sstables and running nodetool refresh does not seem like a
great option either, unless I am missing something.

Using sstableloader seems a more logical option, though it is still a
bottleneck if you need to run it for every node in your source cluster. What
if you had a 100-node cluster?

I am thinking of just running a simple script instead, one that selects
data from the source cluster and inserts it into the target one.
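
Something minimal like this, using the Python driver (table and column names
are made up, and there is no batching, parallelism or error handling):

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

source = Cluster(["source-node-1"]).connect("my_keyspace")
target = Cluster(["target-node-1"]).connect("my_keyspace")

insert = target.prepare("INSERT INTO my_table (id, payload) VALUES (?, ?)")

# Page through the source table so we never hold the full data set in memory.
rows = source.execute(SimpleStatement("SELECT id, payload FROM my_table",
                                      fetch_size=1000))
for row in rows:
    target.execute(insert, (row.id, row.payload))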

Kind regards,
George

On Tue, Dec 1, 2015 at 7:54 AM, anuja jain <anujaja...@gmail.com> wrote:

> Hello George,
> You can use sstable2json to create the json of your keyspace and then load
> this json into your keyspace in the new cluster using the json2sstable
> utility.
>
> On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos <sigle...@textkernel.nl>
>> wrote:
>>
>>> We would like to migrate one keyspace from a 6-node cluster to a 3-node
>>> one.
>>>
>>
>> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
>>
>> =Rob
>>
>>
>
>


Running sstableloader from every node when migrating?

2015-11-19 Thread George Sigletos
Hello,

We would like to migrate one keyspace from a 6-node cluster to a 3-node one.

Since an individual node does not contain all the data, this means we should
run sstableloader 6 times, once for each node of our cluster.

To be precise: run "nodetool flush <keyspace>" on each source node, then run
"sstableloader -d <3 target nodes> <path to the flushed keyspace/table
sstables>".
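
Concretely, per source node (host names and the data path are illustrative):

nodetool flush my_keyspace
sstableloader -d target1,target2,target3 /var/lib/cassandra/data/my_keyspace/my_table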

Would that be the correct approach?

Thank you in advance,
George


java.lang.IllegalArgumentException: Mutation of X bytes is too large for the maxiumum size of Y

2015-10-06 Thread George Sigletos
Hello,

I have been frequently receiving those warnings:

java.lang.IllegalArgumentException: Mutation of 35141120 bytes is too large
for the maxiumum size of 33554432
 at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:221)
~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:383)
~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:363)
~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.db.Mutation.apply(Mutation.java:214)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[apache-cassandra-2.1.9.jar:2.1.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
~[na:1.7.0_75]
at
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
~[apache-cassandra-2.1.9.jar:2.1.9]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
[apache-cassandra-2.1.9.jar:2.1.9]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_75]

Sometimes I can trigger them myself by trying to add the contents of a
text document that is less than 1 MB.

Initially I increased "commitlog_segment_size_in_mb" from 32 to 64, and I am
thinking of increasing it further to 96.
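
My reasoning for 96: 33554432 bytes is exactly half of a 64 MB segment, and
as far as I can tell Cassandra rejects any single mutation larger than half
of commitlog_segment_size_in_mb. So

commitlog_segment_size_in_mb: 96

would raise the cap to roughly 48 MB, which would cover the 35141120-byte
mutation above.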

But would that be a solution to the problem? What could possibly be causing
this?

Thank you,
George


Re: java.lang.IllegalArgumentException: Mutation of X bytes is too large for the maxiumum size of Y

2015-10-06 Thread George Sigletos
I see no dropped mutations in any of the nodes:

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ 0
MUTATION 0
_TRACE   0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0

On Tue, Oct 6, 2015 at 5:35 PM, Kiran mk <coolkiran2...@gmail.com> wrote:

> Do you see more dropped mutation messages in the nodetool tpstats output?
> On Oct 6, 2015 7:51 PM, "George Sigletos" <sigle...@textkernel.nl> wrote:
>
>> Hello,
>>
>> I have been frequently receiving those warnings:
>>
>> java.lang.IllegalArgumentException: Mutation of 35141120 bytes is too
>> large for the maxiumum size of 33554432
>>  at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:221)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:383)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:363)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.db.Mutation.apply(Mutation.java:214)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
>> Source) ~[na:1.7.0_75]
>> at
>> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>> ~[apache-cassandra-2.1.9.jar:2.1.9]
>> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>> [apache-cassandra-2.1.9.jar:2.1.9]
>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_75]
>>
>> Sometimes I can trigger them myself by trying to add the contents of a
>> text document that is less than 1 MB.
>>
>> Initially I increased "commitlog_segment_size_in_mb" from 32 to 64, and I
>> am thinking of increasing it further to 96.
>>
>> But would that be a solution to the problem? What could possibly be
>> causing this?
>>
>> Thank you,
>> George
>>
>


Re: Repair corrupt SSTable from power outage?

2015-10-02 Thread George Sigletos
I'm also facing problems with corrupt sstables, and couldn't run
sstablescrub successfully either.

I restarted my nodes with disk failure policy "best_effort", then I ran
"nodetool scrub <keyspace> <table>".

Once done, I removed the corrupt sstables manually and started a repair.
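
Roughly, per affected node:

1. set disk_failure_policy: best_effort in cassandra.yaml and restart
2. nodetool scrub <keyspace> <table>
3. manually delete the sstable files still reported corrupt
4. nodetool repair <keyspace>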


On Thu, Oct 1, 2015 at 7:27 PM, John Anderson  wrote:

> I have a 25 node cluster and we lost power on one of the racks last night
> and now 6 of our nodes will not start up and we are getting the following
> error:
>
> INFO  [main] 2015-10-01 10:19:22,111 CassandraDaemon.java:122 - JMX is
> enabled to receive remote connections on port: 7199
> INFO  [main] 2015-10-01 10:19:22,124 CacheService.java:111 - Initializing
> key cache with capacity of 100 MBs.
> INFO  [main] 2015-10-01 10:19:22,129 CacheService.java:133 - Initializing
> row cache with capacity of 0 MBs
> INFO  [main] 2015-10-01 10:19:22,133 CacheService.java:150 - Initializing
> counter cache with capacity of 50 MBs
> INFO  [main] 2015-10-01 10:19:22,135 CacheService.java:161 - Scheduling
> counter cache save to every 7200 seconds (going to save all keys).
> INFO  [main] 2015-10-01 10:19:22,211 ColumnFamilyStore.java:363 -
> Initializing system.sstable_activity
> INFO  [SSTableBatchOpen:1] 2015-10-01 10:19:22,639 SSTableReader.java:478
> - Opening
> /mnt/cassandra/data/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-45
> (805 bytes)
> ERROR [SSTableBatchOpen:1] 2015-10-01 10:19:22,657 FileUtils.java:447 -
> Exiting forcefully due to file system exception on startup, disk failure
> policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> java.io.EOFException
> at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_60]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_60]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_60]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_60]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> Caused by: java.io.EOFException: null
> at
> java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
> ~[na:1.8.0_60]
> at java.io.DataInputStream.readUTF(DataInputStream.java:589)
> ~[na:1.8.0_60]
> at java.io.DataInputStream.readUTF(DataInputStream.java:564)
> ~[na:1.8.0_60]
> at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> ... 14 common frames omitted
>
>
> I found some people recommending scrubbing the sstable so I attempted that
> and got the following error:
>
> bin/sstablescrub system sstable_activity -v
>
>
> ERROR 17:26:03 Exiting forcefully due to file system exception on startup,
> disk failure policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException:
> java.io.EOFException
> at
> org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at
> org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>   

Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

2015-09-30 Thread George Sigletos
Hello again and sorry for the late response,

Still having problems with upgrading from 2.1.8 to 2.1.9.

I decided to start the problematic nodes with "disk_failure_policy:
best_effort"

Currently running "nodetool scrub <keyspace> <table>"

Then I will remove the corrupted sstables and run a repair afterwards.

This is way too many manual steps. I was wondering: why not just remove
the entire /var/lib/cassandra/data folder plus the commitlogs, restart the
node, and wait for it to catch up?

Kind regards,
George


On Fri, Sep 25, 2015 at 12:01 AM, Robert Coli  wrote:

> On Thu, Sep 24, 2015 at 3:00 PM, Robert Coli  wrote:
>
>> A node which has lost a SSTable also needs to be repaired immediately.
>>
>
> Forgot to mention, you can repair via this technique :
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
>
> =Rob
>
>


Corrupt sstables when upgrading from 2.1.8 to 2.1.9

2015-09-15 Thread George Sigletos
Hello,

I tried to upgrade two of our clusters from 2.1.8 to 2.1.9. In some, but
not all nodes, I got errors about corrupt sstables when restarting. I
downgraded back to 2.1.8 for now.

Has anybody else faced the same problem? Should sstablescrub fix the
problem? I didn't try that yet.

Kind regards,
George

ERROR [SSTableBatchOpen:3] 2015-09-14 10:16:03,296 FileUtils.java:447 -
Exiting forcefully due to file system exception on startup, disk failure
policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException:
java.io.EOFException
at
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387)
~[apache-cassandra-2.1.9.jar:2.1.9]
at
org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534)
~[apache-cassandra-2.1.9.jar:2.1.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source) [na:1.7.0_75]
at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source) [na:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source) [na:1.7.0_75]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_75]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedShort(Unknown Source)
~[na:1.7.0_75]
at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75]
at java.io.DataInputStream.readUTF(Unknown Source) ~[na:1.7.0_75]
at
org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
~[apache-cassandra-2.1.9.jar:2.1.9]