[jira] [Commented] (TS-2500) Disk failure can disable incorrect stripes

Alan M. Carroll (JIRA) Thu, 16 Jan 2014 08:42:39 -0800

    [ 
https://issues.apache.org/jira/browse/TS-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873565#comment-13873565
 ]


Alan M. Carroll commented on TS-2500:
-------------------------------------

Leif: see attached patch.

For each host record, there is an array of stripes (Vol*) and a stripe 
assignment array. The values in the assignment array are indices into the 
stripe array. When a disk fails, the assignment array is updated. The first 
step is to remove all stripes that are on failed disks, creating a shorter, 
temporary stripe array which is used for populating the assignment table. The 
bug is that the assignment table is populated with indices into this shorter, 
temporary array rather than indices into the host record stripe array. For 
instance, if there are 10 stripes and a disk failure removes 2 of them, then 
the assignment array will be populated with values in the range 0..7, which 
does remove 2 stripes from the active cache. But those two stripes will be the 
last two stripes in the host record stripe array, not necessarily the 2 
stripes on the failed disk.
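
To make the failure mode concrete, here is a minimal sketch of the buggy 
behavior. It is not the Traffic Server code: the Stripe struct, the 
rebuild_buggy function and the round-robin fill are all hypothetical 
simplifications (the real assignment logic is more involved). It only shows 
what happens when the table is filled with indices into the compacted array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stripe (Vol); only the field needed here.
struct Stripe {
  int disk; // id of the disk this stripe lives on
};

// Sketch of the bug: the assignment table is filled with positions in the
// compacted temporary array, not positions in the host record stripe array.
// Round-robin assignment is an assumption made to keep the example short.
std::vector<std::size_t>
rebuild_buggy(const std::vector<Stripe> &stripes, int failed_disk, std::size_t table_size)
{
  // Compact away the stripes on the failed disk.
  std::vector<Stripe> survivors;
  for (const Stripe &s : stripes) {
    if (s.disk != failed_disk) {
      survivors.push_back(s);
    }
  }
  assert(!survivors.empty());

  std::vector<std::size_t> table;
  table.reserve(table_size);
  for (std::size_t slot = 0; slot < table_size; ++slot) {
    // BUG: this value indexes 'survivors', but readers of the table
    // will use it to index 'stripes'.
    table.push_back(slot % survivors.size());
  }
  return table;
}
```

With 10 stripes where stripes 2 and 3 sit on the failed disk, every table 
entry falls in 0..7, so stripes 2 and 3 are still assigned while the healthy 
stripes 8 and 9 are never used.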

Apparently this was known, because the data needed to do it correctly is 
computed; it's just not actually *used*. This is the 'mapping' array, which 
converts from short/tmp array indices to host record indices. The patch 
really just uses that array where needed, although a few other tweaks are 
included to make this easier to debug in the future.
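
As an illustration of the idea only (this is not the attached patch), the 
sketch below translates each temporary index through a mapping array before 
storing it. The function name rebuild_with_mapping, the on_failed_disk flags 
and the round-robin fill are hypothetical and chosen to keep the example 
self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the corrected rebuild. on_failed_disk[i] says whether host
// record stripe i is on a failed disk. All names here are hypothetical.
std::vector<std::size_t>
rebuild_with_mapping(const std::vector<bool> &on_failed_disk, std::size_t table_size)
{
  // mapping[tmp_index] == index of that stripe in the host record array.
  std::vector<std::size_t> mapping;
  for (std::size_t i = 0; i < on_failed_disk.size(); ++i) {
    if (!on_failed_disk[i]) {
      mapping.push_back(i);
    }
  }
  assert(!mapping.empty());

  std::vector<std::size_t> table;
  table.reserve(table_size);
  for (std::size_t slot = 0; slot < table_size; ++slot) {
    // Pick a stripe in the short/tmp array (round-robin is an assumption)...
    std::size_t tmp_index = slot % mapping.size();
    // ...then convert to a host record index before storing it.
    table.push_back(mapping[tmp_index]);
  }
  return table;
}
```

For the same 10 stripe case with stripes 2 and 3 failed, the mapping is 
{0, 1, 4, 5, 6, 7, 8, 9}, so the table never refers to 2 or 3 and stripes 8 
and 9 stay in service.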

> Disk failure can disable incorrect stripes
> ------------------------------------------
>
>                 Key: TS-2500
>                 URL: https://issues.apache.org/jira/browse/TS-2500
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>            Reporter: Alan M. Carroll
>            Assignee: Alan M. Carroll
>             Fix For: 4.2.0
>
>         Attachments: ts-2500.diff
>
>
> When a disk fails, the last stripes in a set are removed from service, which 
> may or may not be the stripes associated with the failed disk.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
