[jira] [Commented] (TS-3132) Can't recover the disk, after disk replacement

Phil Sorber (JIRA) Fri, 12 Jun 2015 13:03:21 -0700

    [ 
https://issues.apache.org/jira/browse/TS-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583983#comment-14583983
 ]


Phil Sorber commented on TS-3132:
---------------------------------

{noformat}
Hi Phil,
 
I just tracked down the source of our problem with replacement drives not being 
picked up by ATS. My grasp on the ATS code is still rather limited, but with 
enough printf’s in the right spots, I have a general idea of what’s going on.
 
First, I need to tell you guys that I’ve verified several times over that the 
issue we’re seeing exists with both ATS 3.2 and 5.2. It has nothing to do with 
the differences between IPCDN’s drive replacement procedure and ours (i.e. 
neither your reboot nor our /dev/ats/cachediskXX symlinks are causing the delta 
in behavior).
 
In our storage.config, we are not listing an explicit volume number for each 
drive, so the list of volumes for that drive stays empty until the content of 
the drive is scanned. Because a replacement drive is new and carries no volume 
info, it never gets a volume list and is left in a limbo state.
 
The quick and dirty workaround that I just tested with ATS 3.2 is to add a 
‘volume=1’ to every entry in storage.config, but I believe that the correct 
solution is probably a change to the ‘fillExclusiveDisks()’ function in ATS’s 
iocore/Cache.cc source.
 
Tomorrow, I’ll see how easy it is to put a patch together for this, but I’m 
guessing it’s trickier than my understanding of the code would allow in any 
short amount of time.
 
-John
{noformat}
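
For anyone who wants to try the 'volume=1' workaround John describes, here is a minimal sketch of a helper that appends a volume number to every storage.config entry that does not already have one. The helper name add_default_volume and the sample device paths are hypothetical, and the sketch assumes one entry per line with '#' comments. It only rewrites text and does not touch ATS itself.

```python
def add_default_volume(config_text: str, volume: int = 1) -> str:
    """Append 'volume=N' to every storage.config entry that lacks one."""
    out = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # Leave blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        # Leave entries that already pin a volume untouched.
        if any(tok.startswith("volume=") for tok in stripped.split()):
            out.append(line)
            continue
        out.append(f"{line.rstrip()} volume={volume}")
    return "\n".join(out) + "\n"


# Hypothetical sample input; real device paths will differ per host.
sample = """\
# hypothetical storage.config: one raw device per line
/dev/sdj
/dev/sdk
/dev/sdl volume=2
"""

print(add_default_volume(sample), end="")
```

Entries that already name a volume are left alone, so the helper can be run more than once on the same file. Whether volume 1 is the right choice depends on the local volume.config, which this sketch does not inspect.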

> Can't recover the disk, after disk replacement
> ----------------------------------------------
>
>                 Key: TS-3132
>                 URL: https://issues.apache.org/jira/browse/TS-3132
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 4.2.0, 5.1.0
>         Environment: CentOS 6.5 x86_64
>            Reporter: seri,Kim
>            Assignee: Alan M. Carroll
>             Fix For: sometime
>
>
> {quote}
> \[Oct  9 13:55:18.257\] Server \{0x2add2d1d0700\} WARNING: cache disk 
> operation failed READ -1 5
> \[Oct  9 13:55:18.259\] Server \{0x2add03736700\} WARNING: Error accessing 
> Disk /dev/sdk \[1/5\]
> \[Oct  9 13:55:18.265\] Server \{0x2add2cdcc700\} WARNING: cache disk 
> operation failed READ -1 5
> \[Oct  9 13:55:18.271\] Server \{0x2add02120700\} WARNING: Error accessing 
> Disk /dev/sdk \[2/5\]
> \[Oct  9 13:55:18.273\] Server \{0x2add2dbda700\} WARNING: cache disk 
> operation failed READ -1 5
> \[Oct  9 13:55:18.280\] Server \{0x2add02827700\} WARNING: Error accessing 
> Disk /dev/sdk \[3/5\]
> \[Oct  9 13:55:18.325\] Server \{0x2add2c7c6700\} WARNING: cache disk 
> operation failed READ -1 5
> \[Oct  9 13:55:18.331\] Server \{0x2add01d1c700\} WARNING: Error accessing 
> Disk /dev/sdk \[4/5\]
> \[Oct  9 22:36:49.223\] Server \{0x2add2cbca700\} WARNING: cache disk 
> operation failed READ -1 5
> \[Oct  9 22:36:49.226\] Server \{0x2add01b1a700\} WARNING: too many errors 
> accessing disk /dev/sdk \[5/5\]: declaring disk bad
> \[Oct  9 22:36:49.405\] Server \{0x2add2c3c2700\} WARNING: cache disk 
> operation failed READ -1 5
> {quote}
> Because of the disk failure:
> 1. I stopped Traffic Server, replaced the disk, and started Traffic Server.
> {quote}
> \[Oct 14 16:53:04.131\] Server \{0x2ada6e423700\} WARNING: disk header 
> different for disk /dev/sdk: clearing the disk
> \[Oct 14 16:53:04.225\] Server \{0x2ada67757140\} NOTE: traffic server running
> \[Oct 14 16:53:11.890\] Server \{0x2ada6f231700\} NOTE: cache enabled
> {quote}
> 2. But Traffic Server failed to recover the disk.
> How can I recover the failed disk?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
