[ https://issues.apache.org/jira/browse/CASSANDRA-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933122#action_12933122 ]

Jonathan Ellis commented on CASSANDRA-1674:
-------------------------------------------

Committed, but I think there is a bug.  With a setup similar to the above (200K keys instead of 1M), the ring before the RF change is:

{code}
Address         Status State   Load            Token
                                               106239986353888428655683112465158427815
127.0.0.2       Up     Normal  37.97 MB        21212647344528771789748883276744400257
127.0.0.3       Up     Normal  18.98 MB        63523312719601176253752035031089272162
127.0.0.1       Up     Normal  19.05 MB        106239986353888428655683112465158427815
{code}
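(For reference, the 200K keys would have been loaded with py_stress's insert mode beforehand; the exact write invocation isn't captured here, but it would look something like this:)

{code}$ python contrib/py_stress/stress.py -n 200000 -o insert{code}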

post-repair is
{code}
Address         Status State   Load            Token
                                               106239986353888428655683112465158427815
127.0.0.2       Up     Normal  57.01 MB        21212647344528771789748883276744400257
127.0.0.3       Up     Normal  56.94 MB        63523312719601176253752035031089272162
127.0.0.1       Up     Normal  19.07 MB        106239986353888428655683112465158427815
{code}

So eyeballing it, that looks reasonable: 127.0.0.3's load went from 18.98 MB to 56.94 MB, which is roughly its own data plus node 2's 37.97 MB.  But when I kill node 2 and run

{code}$ python contrib/py_stress/stress.py -n 200000 -o read{code}

I get a ton of key-not-found exceptions, which indicates that not all of the data on node 2 actually got replicated to node 3.
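Putting the whole reproduction together, the sequence is roughly the following (a sketch; which host the repair was run against and how node 2 is killed are assumptions, not recorded above):

{code}
# raise the keyspace RF, then repair, kill node 2, and re-read everything
$ bin/nodetool -h 127.0.0.1 repair
$ kill <pid of the 127.0.0.2 process>
$ python contrib/py_stress/stress.py -n 200000 -o read
{code}

If repair had streamed everything it should have, the read pass would find every key on the surviving nodes instead of throwing key-not-found.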

> Repair using abnormally large amounts of disk space
> ---------------------------------------------------
>
>                 Key: CASSANDRA-1674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1674
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Stu Hood
>             Fix For: 0.6.9, 0.7.0
>
>         Attachments: 
> 0001-Only-repair-the-intersecting-portion-of-a-differing-ra.txt, 
> for-0.6-0001-Only-repair-the-intersecting-portion-of-a-differing-ra.txt
>
>
> I'm watching a repair on a 7 node cluster.  Repair was sent to one node; the 
> node had 18G of data.  No other node has more than 28G.  The node where the 
> repair initiated is now up to 261G with 53/60 AES tasks outstanding.
> I have seen repair take more space than expected on 0.6 but nothing this 
> extreme.
> Other nodes in the cluster are occasionally logging
> WARN [ScheduledTasks:1] 2010-10-28 08:31:14,305 MessagingService.java (line 
> 515) Dropped 7 messages in the last 1000ms
> The cluster is quiesced except for the repair.  Not sure if the dropped
> messages are contributing to the disk space (b/c of retries?).
