[
https://issues.apache.org/jira/browse/TS-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873565#comment-13873565
]
Alan M. Carroll commented on TS-2500:
-------------------------------------
Leif, see the attached patch.
For each host record, there is an array of stripes (Vol*) and a stripe
assignment array. The values in the assignment array are indices into the
stripe array. When a disk fails, the assignment array is updated. The first
step is to remove all stripes that are on failed disks, creating a shorter,
temporary stripe array that is used to populate the assignment table. The bug
is that the assignment table is populated with indices into this shorter,
temporary array rather than indices into the host record stripe array. For
instance, if there are 10 stripes and a disk failure removes 2 of them, the
assignment array will be populated with values in the range 0..7, effectively
removing 2 stripes from the active cache. But those two stripes will be the
last two stripes in the host record stripe array, not necessarily the 2 stripes
on the failed disk.
Apparently this was known, because the data needed to do it correctly is
computed; it's just not actually *used*. This is the 'mapping' array, which
converts from short/tmp array indices to host record indices. The patch
really just uses that array where needed, although a few other tweaks are
included to make this easier to debug in the future.
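To make the index mix-up concrete, here is a minimal standalone sketch. It is
not the Traffic Server code: Stripe, rebuild_assignments, references, and the
round-robin fill are hypothetical stand-ins chosen for illustration. Only the
idea of a 'mapping' array from short/tmp indices to host record indices comes
from the description above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stripe (Vol); only the failure flag matters here.
struct Stripe {
  bool on_failed_disk;
};

// Rebuild the assignment table after a disk failure.
// use_mapping == false reproduces the bug: indices into the temporary
// (shortened) array are stored directly in the table.
std::vector<int> rebuild_assignments(const std::vector<Stripe> &stripes,
                                     std::size_t table_size, bool use_mapping)
{
  // mapping[i] is the host record index of the i-th surviving stripe.
  std::vector<int> mapping;
  for (std::size_t i = 0; i < stripes.size(); ++i) {
    if (!stripes[i].on_failed_disk)
      mapping.push_back(static_cast<int>(i));
  }

  std::vector<int> table(table_size, -1);
  if (mapping.empty())
    return table; // no usable stripes left

  for (std::size_t slot = 0; slot < table_size; ++slot) {
    // Round-robin is an assumption; it stands in for the real distribution.
    std::size_t tmp_idx = slot % mapping.size();
    table[slot] = use_mapping ? mapping[tmp_idx] : static_cast<int>(tmp_idx);
  }
  return table;
}

// True if any slot in the table points at the given host record stripe index.
bool references(const std::vector<int> &table, int stripe_index)
{
  return std::find(table.begin(), table.end(), stripe_index) != table.end();
}
```

With 10 stripes and stripes 2 and 5 marked as failed, the use_mapping == false
table still points at 2 and 5 and never at 8 or 9, while the use_mapping ==
true table skips 2 and 5 and keeps 8 and 9 in service.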
> Disk failure can disable incorrect stripes
> ------------------------------------------
>
> Key: TS-2500
> URL: https://issues.apache.org/jira/browse/TS-2500
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Reporter: Alan M. Carroll
> Assignee: Alan M. Carroll
> Fix For: 4.2.0
>
> Attachments: ts-2500.diff
>
>
> When a disk fails, the last stripes in a set are removed from service, which
> may or may not be the stripes associated with the failed disk.