[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324190#comment-14324190 ]

Marcus Eriksson commented on CASSANDRA-8366:
--------------------------------------------

I tried it once more, with autocompaction disabled to remove a bit of 
randomness (incremental repairs are sensitive to compactions, since an 
sstable that was actually repaired will not be anticompacted if it has 
already been compacted away).
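
A minimal sketch of that skip, in plain Java with hypothetical names rather 
than the actual Cassandra code: anticompaction only considers sstables that 
both took part in the repair and are still live, so anything compacted away 
in the meantime silently stays unrepaired.
{code}
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only, not the actual Cassandra code: after an
// incremental repair finishes, each node anticompacts (marks as repaired)
// the sstables that took part. An sstable that compaction has since
// replaced is no longer live, so it drops out of the intersection and
// its data stays marked unrepaired even though it was in fact repaired.
class AnticompactionSkip
{
    static Set<String> toAnticompact(Set<String> participated, Set<String> live)
    {
        Set<String> result = new HashSet<>(participated);
        result.retainAll(live); // compacted-away sstables are skipped here
        return result;
    }
}
{code}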

After 1 run with incremental repair:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
1,5G    /home/marcuse/.ccm/8366/node1/data/r1/
1,5G    /home/marcuse/.ccm/8366/node2/data/r1/
1,5G    /home/marcuse/.ccm/8366/node3/data/r1/
4,4G    total
{code}
All sstables were marked as repaired.

And, after 1 run with standard repair:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
1,5G    /home/marcuse/.ccm/8366/node1/data/r1/
1,5G    /home/marcuse/.ccm/8366/node2/data/r1/
1,5G    /home/marcuse/.ccm/8366/node3/data/r1/
4,4G    total
{code}

But, after an incremental repair with compactions enabled:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
2,3G    /home/marcuse/.ccm/8366/node1/data/r1/
2,8G    /home/marcuse/.ccm/8366/node2/data/r1/
2,0G    /home/marcuse/.ccm/8366/node3/data/r1/
6,9G    total
{code}

And the reason is that we validate the wrong sstables:

1. we send out a prepare message to all nodes, and each node selects which 
sstables to repair
2. time passes, sstables get compacted (essentially at random)
3. we start validating only those sstables picked in (1) *that still 
exist*. This set will differ between nodes.
4. overstream, pain (see the sketch below)
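
A minimal, self-contained sketch of steps 1-4 (plain Java with hypothetical 
sstable names, no actual Cassandra types), showing how two replicas holding 
the same logical data end up validating different subsets of it:
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class PrepareValidateRace
{
    public static void main(String[] args)
    {
        // (1) prepare: each node records which of its local sstables to repair
        Set<String> prepared1 = new HashSet<>(Arrays.asList("a-1", "a-2", "a-3"));
        Set<String> prepared2 = new HashSet<>(Arrays.asList("b-1", "b-2", "b-3"));

        // (2) time passes: node1 compacts a-2 and a-3 into a new sstable a-4,
        // node2 compacts nothing
        Set<String> live1 = new HashSet<>(Arrays.asList("a-1", "a-4"));
        Set<String> live2 = new HashSet<>(Arrays.asList("b-1", "b-2", "b-3"));

        // (3) validation: only the prepared sstables that still exist are
        // validated, so node1 builds its merkle tree from a subset of the data
        Set<String> validated1 = new TreeSet<>(prepared1);
        validated1.retainAll(live1);
        Set<String> validated2 = new TreeSet<>(prepared2);
        validated2.retainAll(live2);

        // (4) the trees disagree on ranges that were never actually out of
        // sync, and those ranges get streamed anyway: overstream, pain
        System.out.println("node1 validates " + validated1); // [a-1]
        System.out.println("node2 validates " + validated2); // [b-1, b-2, b-3]
    }
}
{code}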

Bug. Stand by for patch

> Repair grows data on nodes, causes load to become unbalanced
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8366
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 4 node cluster
> 2.1.2 Cassandra
> Inserts and reads are done with CQL driver
>            Reporter: Jan Karlsson
>            Assignee: Marcus Eriksson
>         Attachments: results-10000000-inc-repairs.txt, 
> results-17500000_inc_repair.txt, results-5000000_1_inc_repairs.txt, 
> results-5000000_2_inc_repairs.txt, 
> results-5000000_full_repair_then_inc_repairs.txt, 
> results-5000000_inc_repairs_not_parallel.txt, 
> run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, 
> run3_no_compact_before_repair.log, test.sh, testv2.sh
>
>
> There seems to be something weird going on when repairing data.
> I have a program that runs for 2 hours, inserting 250 random numbers and 
> performing 250 reads per second. It creates 2 keyspaces with SimpleStrategy 
> and an RF of 3. I use size-tiered compaction for my cluster.
> After those 2 hours I run a repair and the load of all nodes goes up. If I 
> run incremental repair the load goes up a lot more. I saw the load shoot up 
> to 8 times the original size multiple times with incremental repair (from 
> 2G to 16G).
> With nodes 9, 8, 7 and 6 the repro procedure looked like this:
> (Note that running full repair first is not a requirement to reproduce.)
> {noformat}
> After 2 hours of 250 reads + 250 writes per second:
> UN  9  583.39 MB  256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  584.01 MB  256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  583.72 MB  256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  583.84 MB  256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> Repair -pr -par on all nodes sequentially
> UN  9  746.29 MB  256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  751.02 MB  256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  748.89 MB  256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.34 MB  256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  2.41 GB    256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.53 GB    256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.6 GB     256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  2.17 GB    256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> after rolling restart
> UN  9  1.47 GB    256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  1.5 GB     256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.46 GB    256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.19 GB    256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> compact all nodes sequentially
> UN  9  989.99 MB  256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  994.75 MB  256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  1.46 GB    256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.82 MB  256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  1.98 GB    256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.3 GB     256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  3.71 GB    256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB    256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> restart once more
> UN  9  2 GB       256     ?       28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.05 GB    256     ?       f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  4.1 GB     256     ?       2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB    256     ?       b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> {noformat}
> Is there something I'm missing, or is this strange behavior?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
