Odd number of files on one node during repair (was: To Repair or Not to Repair)
On Tue, Aug 13, 2019 at 6:14 PM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> I was wondering about this again, as I've noticed one of the nodes in our
> cluster accumulating ten times the number of files compared to the average
> across the rest of the cluster. The files are all coming from a table with
> TWCS, and repair (running with Reaper) is ongoing. The sudden growth
> started around 24 hours ago, when the affected node was restarted due to
> a failing AWS EC2 system check.

And now that the next weekly repair has started, the same node shows the problem again. The number of files went up to 6,000 in the last 7 hours, compared to the average of ~1,500 on the rest of the nodes, which remains more or less constant.

Any advice on how to debug it?

Regards,
--
Alex
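One possible first step in debugging this is to confirm which table directories hold the extra files on the affected node and compare that with a healthy node. The following is only a minimal sketch, not something posted in the thread: the function name `count_sstables` is hypothetical, and it assumes the usual on-disk layout of `<data_dir>/<keyspace>/<table>-<id>/` with one `*-Data.db` file per SSTable. Because every SSTable consists of several component files, the total file count on disk will be a multiple of the numbers reported here.

```python
from collections import Counter
from pathlib import Path


def count_sstables(data_dir):
    """Count SSTable data files per table directory under data_dir.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/<...>-Data.db.
    Files in deeper directories (snapshots, backups) are not counted,
    because the glob pattern only matches exactly three levels down.
    """
    counts = Counter()
    for data_file in Path(data_dir).glob("*/*/*-Data.db"):
        table_dir = data_file.parent
        # Key each count by "<keyspace>/<table directory>".
        counts[f"{table_dir.parent.name}/{table_dir.name}"] += 1
    return counts
```

Running `count_sstables("/var/lib/cassandra/data").most_common(5)` on both the affected node and a healthy one (the path is an assumption, adjust it to the configured data directory) would show whether the growth really is confined to the TWCS table.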
Re: To Repair or Not to Repair
On Thu, Mar 14, 2019 at 9:55 PM Jonathan Haddad wrote:

> My coworker Alex (from The Last Pickle) wrote an in-depth blog post on
> TWCS. We recommend not running repair on tables that use TWCS.
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Hi,

I was wondering about this again, as I've noticed one of the nodes in our cluster accumulating ten times the number of files compared to the average across the rest of the cluster. The files are all coming from a table with TWCS, and repair (running with Reaper) is ongoing. The sudden growth started around 24 hours ago, when the affected node was restarted due to a failing AWS EC2 system check.

Now I'm wondering again whether we should be running those repairs at all. ;-)

In the Summary of the blog post linked above, the following is written:

It is advised to disable read repair on TWCS tables, and use an aggressive tombstone purging strategy as digest mismatches during reads will still trigger read repairs.

Was it meant to read "disable anti-entropy repair" instead? I find it confusing otherwise.

Regards,
--
Alex
RE: To Repair or Not to Repair
Beautiful, thank you very much!

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Thursday, March 14, 2019 4:55 PM
To: user
Subject: Re: To Repair or Not to Repair

My coworker Alex (from The Last Pickle) wrote an in-depth blog post on TWCS. We recommend not running repair on tables that use TWCS.

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

It's enough of a problem that we added a feature to Reaper to auto-blacklist TWCS / DTCS tables from being repaired. We wrote about it here:

http://thelastpickle.com/blog/2019/02/15/reaper-1_4-released.html

Hope this helps!
Jon

On Fri, Mar 15, 2019 at 9:48 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:

It seems that running a repair works really well, quickly and efficiently, when repairing a column family that does not use TWCS. Has anyone else had a similar experience? Wondering if running TWCS is doing more harm than good, as it chews up a lot of CPU, and for extended periods of time, in comparison to CFs with a compaction strategy of STCS.

Thanks,

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
Re: To Repair or Not to Repair
My coworker Alex (from The Last Pickle) wrote an in-depth blog post on TWCS. We recommend not running repair on tables that use TWCS.

http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

It's enough of a problem that we added a feature to Reaper to auto-blacklist TWCS / DTCS tables from being repaired. We wrote about it here:

http://thelastpickle.com/blog/2019/02/15/reaper-1_4-released.html

Hope this helps!
Jon

On Fri, Mar 15, 2019 at 9:48 AM Nick Hatfield wrote:

> It seems that running a repair works really well, quickly and efficiently,
> when repairing a column family that does not use TWCS. Has anyone else had
> a similar experience? Wondering if running TWCS is doing more harm than
> good, as it chews up a lot of CPU, and for extended periods of time, in
> comparison to CFs with a compaction strategy of STCS.
>
> Thanks,

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
To Repair or Not to Repair
It seems that running a repair works really well, quickly and efficiently, when repairing a column family that does not use TWCS. Has anyone else had a similar experience? Wondering if running TWCS is doing more harm than good, as it chews up a lot of CPU, and for extended periods of time, in comparison to CFs with a compaction strategy of STCS.

Thanks,