[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324190#comment-14324190 ]
Marcus Eriksson commented on CASSANDRA-8366:
--------------------------------------------

I tried it once more, with autocompaction disabled to remove a bit of randomness (incremental repairs are sensitive to compactions, since an sstable that was actually repaired will not be anticompacted if it has already been compacted away).

after 1 run with incremental repair:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
1,5G	/home/marcuse/.ccm/8366/node1/data/r1/
1,5G	/home/marcuse/.ccm/8366/node2/data/r1/
1,5G	/home/marcuse/.ccm/8366/node3/data/r1/
4,4G	total
{code}
all sstables were marked as repaired, and after 1 run with standard repair:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
1,5G	/home/marcuse/.ccm/8366/node1/data/r1/
1,5G	/home/marcuse/.ccm/8366/node2/data/r1/
1,5G	/home/marcuse/.ccm/8366/node3/data/r1/
4,4G	total
{code}
but, after an incremental repair with compactions enabled:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
2,3G	/home/marcuse/.ccm/8366/node1/data/r1/
2,8G	/home/marcuse/.ccm/8366/node2/data/r1/
2,0G	/home/marcuse/.ccm/8366/node3/data/r1/
6,9G	total
{code}
And the reason is that we validate the wrong sstables:
1. we send out a prepare message to all nodes; each node selects which sstables to repair
2. time passes, sstables get compacted (basically randomly)
3. we start validating those of the sstables picked in (1) *that still exist*. This set will differ between nodes.
4. overstream, pain

Bug.
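The race in steps 1-4 can be sketched in miniature. The Java below is a hypothetical illustration (all class and method names are mine, not actual Cassandra internals): two replicas snapshot the same sstable set at prepare time, one of them compacts in between, and filtering each prepared set down to the survivors at validation time leaves the nodes validating different data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the race described above; the names here are
// illustrative, NOT actual Cassandra internals.
public class RepairRaceSketch {

    // Step 1: at prepare time, each node snapshots the sstables to repair.
    static Set<String> prepare(Set<String> liveSstables) {
        return new HashSet<>(liveSstables);
    }

    // Step 2: a compaction replaces two sstables with a merged one.
    static void compact(Set<String> liveSstables, String a, String b, String merged) {
        liveSstables.remove(a);
        liveSstables.remove(b);
        liveSstables.add(merged);
    }

    // Step 3: validation keeps only the prepared sstables that still exist.
    static Set<String> validationSet(Set<String> prepared, Set<String> liveSstables) {
        Set<String> survivors = new HashSet<>(prepared);
        survivors.retainAll(liveSstables);
        return survivors;
    }

    public static void main(String[] args) {
        // Two replicas start out with identical sstables.
        Set<String> node1 = new HashSet<>(Arrays.asList("sst-1", "sst-2", "sst-3"));
        Set<String> node2 = new HashSet<>(node1);

        Set<String> prepared1 = prepare(node1);
        Set<String> prepared2 = prepare(node2);

        // Between prepare and validation, node1 happens to compact; node2 does not.
        compact(node1, "sst-1", "sst-2", "sst-4");

        Set<String> validated1 = validationSet(prepared1, node1); // only sst-3 survives
        Set<String> validated2 = validationSet(prepared2, node2); // all three survive

        // Step 4: the nodes now build merkle trees over different data.
        System.out.println("node1 validates: " + validated1);
        System.out.println("node2 validates: " + validated2);
        System.out.println("same set: " + validated1.equals(validated2)); // prints false
    }
}
```

Because the merkle trees are then built over different sstable sets, the comparison reports spurious mismatches and the nodes stream data they both already have.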
Stand by for patch

> Repair grows data on nodes, causes load to become unbalanced
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8366
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 4 node cluster
>                      2.1.2 Cassandra
>                      Inserts and reads are done with CQL driver
>            Reporter: Jan Karlsson
>            Assignee: Marcus Eriksson
>         Attachments: results-10000000-inc-repairs.txt, results-17500000_inc_repair.txt, results-5000000_1_inc_repairs.txt, results-5000000_2_inc_repairs.txt, results-5000000_full_repair_then_inc_repairs.txt, results-5000000_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh
>
> There seems to be something weird going on when repairing data.
> I have a program that runs for 2 hours, inserting 250 random numbers and reading 250 times per second. It creates 2 keyspaces with SimpleStrategy and an RF of 3. I use size-tiered compaction for my cluster.
> After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up a lot more. I saw the load shoot up to 8 times the original size multiple times with incremental repair (from 2G to 16G).
> With nodes 9, 8, 7 and 6 the repro procedure looked like this (note that running a full repair first is not a requirement to reproduce):
> {noformat}
> After 2 hours of 250 reads + 250 writes per second:
> UN  9  583.39 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  584.01 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  583.72 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  583.84 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> Repair -pr -par on all nodes sequentially
> UN  9  746.29 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  751.02 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  748.89 MB  256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.34 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  2.41 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.53 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.6 GB     256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  2.17 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> after rolling restart
> UN  9  1.47 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  1.5 GB     256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.46 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.19 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> compact all nodes sequentially
> UN  9  989.99 MB  256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  994.75 MB  256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  1.46 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.82 MB  256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  1.98 GB    256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.3 GB     256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  3.71 GB    256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> restart once more
> UN  9  2 GB       256  ?  28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.05 GB    256  ?  f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  4.1 GB     256  ?  2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB    256  ?  b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> {noformat}
> Is there something I'm missing, or is this strange behavior?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)