Odd number of files on one node during repair (was: To Repair or Not to Repair)

2019-08-16 Thread Oleksandr Shulgin
On Tue, Aug 13, 2019 at 6:14 PM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

>
> I was wondering about this again, as I've noticed one of the nodes in our
> cluster accumulating ten times the number of files compared to the average
> across the rest of the cluster.  The files are all coming from a table with
> TWCS, and repair (running with Reaper) is ongoing.  The sudden growth
> started around 24 hours ago, when the affected node was restarted due to a
> failing AWS EC2 System check.
>

And now, as the next weekly repair has started, the same node is showing the
problem again.  The number of files went up to 6,000 in the last 7 hours,
compared to the average of ~1,500 on the rest of the nodes, which remains
more or less constant.
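
For reference, a rough way to compare this per node (just a sketch, assuming
the default data directory layout and using my_keyspace/my_twcs_table as
placeholder names for the actual table) is something like:

  nodetool tablestats my_keyspace.my_twcs_table | grep 'SSTable count'
  find /var/lib/cassandra/data/my_keyspace/my_twcs_table-*/ -type f | wc -l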

Any advice on how to debug this?

Regards,
--
Alex


Re: Assassinate fails

2019-08-16 Thread Alex
Hello Alain, 

it has been a long time - I had to wait for a quiet week to try this. I
finally did, so I thought I'd give you some feedback.

Short reminder: one of the nodes of my 3.9 cluster died and I replaced it.
But it still appeared in nodetool status, on one node with a "null" host_id
and on another with the same host_id as its replacement. nodetool assassinate
failed and I could not decommission or remove any other node in the cluster.

Basically, after taking a backup and preparing another cluster in case
anything went wrong, I ran:

DELETE FROM system.peers WHERE peer = '192.168.1.18'; 

and restarted Cassandra on the two nodes that were still seeing the zombie node.

After the first restart, the Cassandra system.log was filled with:

WARN  [MutationStage-2] 2019-08-15 15:31:44,735 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[MutationStage-2,5,main]:
java.lang.NullPointerException: null

So... I restarted again. The error disappeared. I ran a full repair and
everything seems to be back in order. I could decommission a node without
any problems.
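
For anyone hitting the same thing: a quick sanity check at this point (just a
sketch, using the same peer address as above) is to confirm the ghost entry
is really gone and the ring looks clean, e.g.:

  cqlsh -e "SELECT peer, host_id FROM system.peers WHERE peer = '192.168.1.18';"
  nodetool status

The SELECT should return no rows, and nodetool status should only list the
live nodes.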

Thanks for your help!

Alex 

On 05.04.2019 10:55, Alain RODRIGUEZ wrote:

> Alex, 
> 
>> Well, I tried: rolling restart did not work its magic.
> 
> Sorry to hear that, and sorry for misleading you. My faith in the rolling
> restart's magical power went down a bit, but I still think it was worth a try :D.
> 
>> @Alain: In system.peers I see both the dead node and its replacement with
>> the same ID:
>>
>>  peer         | host_id
>> --------------+--------------------------------------
>>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>> 
>> Is it expected?
>> 
>> If I cannot fix this, I think I will add new nodes and remove, one by one, 
>> the nodes that show the dead node in nodetool status.
> 
> Well, no. This is clearly not good or expected, I would say.
> 
> TL;DR - SUGGESTED FIX: 
> What I would try, to fix this, is removing this row. It *should* be safe, but
> that's only my opinion, and only on the condition that you remove *only* the
> 'ghost/dead' nodes. Any mistake here would probably be costly. Again, be aware
> that you're touching a sensitive part when messing with system tables. Think
> it twice, check it twice, take a copy of the SSTables/a snapshot. Then I would
> go for it and observe the changes on one node first. If no harm is done,
> continue to the next node.
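>
> For example, one way to take that copy (just a sketch; "before-peers-fix" is
> an arbitrary snapshot tag) would be to snapshot the system keyspace on the
> node you are about to touch:
>
>   nodetool snapshot -t before-peers-fix system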
> 
> Considering the old node is '192.168.1.18', I would run this on all nodes
> (maybe after testing it on one node first) to keep it simple, or just on the
> nodes that show the ghost node(s):
> 
> "DELETE FROM SYSTEM.PEERS WHERE PEER = '192.168.1.18';" 
> 
> Maybe you will need to restart, but I think you won't even need it. I have
> good hope that this should finally fix your issue with no harm.
> 
> MORE CONTEXT - IDEA OF THE PROBLEM: 
> The above is clearly an issue, I would say, and most probably the source of
> your troubles here. The problem is that I lack understanding: from where I
> stand, this kind of bug should not happen anymore in Cassandra (I have not
> seen anything similar for a while).
> 
> I would blame:
> - A corner-case scenario (unlikely, the system tables have been rather solid
> for a while now). Or maybe you are using an old C* version? It *might* be
> related to this (or something similar):
> https://issues.apache.org/jira/browse/CASSANDRA-7122
> - A really weird operation (a succession of actions might have put you in
> this state, but it is hard for me to say what).
> - KairosDB? I don't know it or what it does. Might it be less reliable than
> Cassandra is, and have led to this issue? Maybe, I have no clue once again.
> 
> RISK OF THIS OPERATION AND CURRENT SITUATION: 
> Also, I *think* the current situation is relatively 'stable' (maybe just some
> hints being stored for nothing, and possibly not being able to add more nodes
> or change the schema?). This is the kind of situation where 'rushing' a
> solution without understanding the impacts and risks can make things go
> terribly wrong. Take the time to analyse my suggested fix, maybe read the
> ticket above, etc. When you're ready, back up the data, prepare the DELETE
> command carefully, and observe how one node reacts to the fix first.
> 
> As you can see, I think it's the 'right' fix, but I'm not comfortable with
> this operation, and you should not be either :).
> To share my feeling about this operation in arbitrary numbers: I would say
> there is a 95% chance this does not hurt and a 90% chance it fixes the issue,
> but if something goes wrong, if we are in the 5% where it does not go well,
> there is a non-negligible probability that you will destroy your cluster in a
> very bad way. I guess what I am trying to say is: be careful, watch your step,
> make sure you remove the right line, and ensure it works on one node with no
> harm.
> I have shared my feeling and I would try this fix, but it's ultimately your
> responsibility, and I won't be behind the machine when you fix it.