RE: how to get rid of bad blocks in a file on PERC 5/I?

2010-05-03 Thread Patrick_Fischer
This was in my mind the whole time as I read your error — it sounded like a
punctured stripe, and now it is confirmed :-)
So it is not a HW error, only a stripe which is damaged. The PERC 6/i is
the earliest controller with a feature to repair it.
On earlier controllers it is much more difficult:

- the punctured stripe can occur in a written stripe or in an empty stripe
- to repair it in a written stripe you need a tool to locate the file
and overwrite it with a known-good copy (sometimes the backup SW will tell
you which file/s is/are damaged)
- in an empty stripe you need to write to those blocks (in the past you
could download the MHDD utility for this, but it has been a long time
since I last used it)

The consistency check can't fix it.

The other way is: back up the data, delete the array, recreate it with a
full initialize, and restore. (That is what support will tell you, as all
other options are too difficult.)

Observations on how it CAN occur: (only my experience, and under rare
circumstances — at most in one of 1000 HDD issues)
- if you try to rebuild onto a disk with a media error
- if a predictive-failure disk was removed without setting it
offline


Some lines out of a Dell document addressing this error on earlier
PERCs:

If media errors reside in user space (allocated space)
The first step to fix a punctured stripe is to do a full backup of the
logical disk. This will show if there are any media errors in user
space, i.e. one or several files will be reported as corrupt. Any file
reported corrupt must be overwritten with a known good copy. DO NOT
DELETE THE FILE since this will basically mean that the media errors are
"moved" to free space. Clearing media errors in free space is possible
but will require some downtime.
 
If a copy of the file doesn't exist, that data will be lost. To still be
able to clear/overwrite the media errors you will have to create a dummy
file with the same name and size and use it to overwrite the corrupt
file. 
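The dummy-file step above can be sketched with standard Linux tools. This is only an illustration — the filename is made up, it assumes GNU dd, and it uses the equivalent in-place variant (overwriting the corrupt file's own blocks with zeros, keeping the exact size) rather than copying a separately created dummy file over it:

```shell
# Sketch only: corrupt.bin stands in for the real corrupt file on /data.
FILE=corrupt.bin
head -c 10000 /dev/urandom > "$FILE"   # stand-in for the existing corrupt file

SIZE=$(stat -c %s "$FILE")             # keep exactly the same size

# Overwrite the file's data in place with zeros. conv=notrunc avoids
# truncating/reallocating, so (on a conventional filesystem) the same
# punctured LBAs get rewritten rather than fresh blocks being allocated.
# iflag=count_bytes (GNU dd) lets count be given in bytes.
dd if=/dev/zero of="$FILE" bs=1M iflag=count_bytes count="$SIZE" conv=notrunc 2>/dev/null

echo "overwrote $SIZE bytes"
```

On the real system the write to the punctured LBAs is what matters; the zero content is then replaced by restoring a known-good copy of the file.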

The next step is to wait until Patrol Read has completed at least one
cycle/iteration, then check the Windows event log/PERC controller log.
The punctured stripe has been fixed if sense key
3 11 00 no longer shows up.
 
It's not unusual for media errors to still be reported after
replacing corrupt files, but the number of affected LBAs should at
least have been reduced. This tells us that any remaining media errors
reside in free space.

If media errors reside in free space (unallocated space)
Media errors in free space can be cleared by using the MHDD program.
It's a freeware DOS program that can be used to write to a specific
LBA/specific disk on the PERC controller. It will require some downtime
since the system will need to be booted on a DOS diskette.
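MHDD is a DOS tool, but the underlying idea — write to the failing LBA so the drive can remap it — can be sketched with dd under Linux. A sketch only, demonstrated against a scratch image standing in for the physical disk; the device name would really be something like /dev/sdX, and the LBA here is borrowed from the controller log elsewhere in this thread:

```shell
# disk.img stands in for the real member disk; LBA 4430 is the example
# sector from the PERC log in this thread.
DEV=disk.img
LBA=4430
head -c $((8192 * 512)) /dev/urandom > "$DEV"   # 8192-sector stand-in "disk"

# Write one 512-byte sector of zeros at the given LBA. seek counts in
# bs-sized blocks, and conv=notrunc leaves the rest of the device untouched.
dd if=/dev/zero of="$DEV" bs=512 seek="$LBA" count=1 conv=notrunc 2>/dev/null
echo "rewrote LBA $LBA"
```

On a real drive this write is what triggers the firmware's write-and-verify path, so a weak sector gets reallocated instead of staying pending.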


-Original Message-
From: Bond Masuda [mailto:bond.mas...@jlbond.com] 
Sent: Monday, May 03, 2010 9:05 PM
To: Fischer, Patrick
Cc: linux-poweredge-Lists
Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?

Thanks Patrick for your reply.

I know my original message was long, so perhaps it was missed, but I did
run a consistency check, at least twice. However, after each CC run, we
tested a dd_rescue attempt on the file in question and still had
unreadable blocks. I was expecting one of two things: 1) the consistency
check reporting back all sorts of problems, or 2) the unreadable blocks
would go away. Neither was the case and hence I decided to reach out.

I had forgotten about the "action=exportlog", thanks for reminding me
about that. This is what I found:

04/29/10 14:45:10: EVT#17279-04/29/10 14:45:10:  97=Puncturing bad block on PD 03(e0/s3) at 4430
04/29/10 14:45:10: EVT#17280-04/29/10 14:45:10:  97=Puncturing bad block on PD 06(e0/s6) at 4430
04/29/10 14:45:11: EVT#17282-04/29/10 14:45:11:  97=Puncturing bad block on PD 04(e0/s4) at 4430
04/29/10 14:45:11: EVT#17283-04/29/10 14:45:11:  97=Puncturing bad block on PD 06(e0/s6) at 4430
04/29/10 14:45:11: EVT#17284-04/29/10 14:45:11:  97=Puncturing bad block on PD 03(e0/s3) at 4430

-Bond

> -Original Message-
> From: linux-poweredge-boun...@dell.com [mailto:linux-poweredge-
> boun...@dell.com] On Behalf Of patrick_fisc...@dell.com
> Sent: Monday, May 03, 2010 2:46 AM
> To: t...@seoss.co.uk; adam.niel...@uq.edu.au
> Cc: linux-powere...@lists.us.dell.com
> Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?
> 
> Consistency Check:
> Check consistency. A check consistency determines the integrity of a
> virtual disk's redundant data. When necessary, this feature rebuilds
> the redundant information.
> Source:
> http://support.dell.com/support/edocs/software/svradmin/6.2/en/OMSS/cntrls.htm#wp681476
> 
> The remapping of bad sectors should happen automatically when the
> controller tries to write to or read from a sector that can't be
> written.
> 
> Please always check the controller log if you get filesystem errors
> like you described.

RE: how to get rid of bad blocks in a file on PERC 5/I?

2010-05-03 Thread Bond Masuda
Thanks Patrick for your reply.

I know my original message was long, so perhaps it was missed, but I did run
a consistency check, at least twice. However, after each CC run, we tested a
dd_rescue attempt on the file in question and still had unreadable blocks. I
was expecting one of two things: 1) the consistency check reporting back all
sorts of problems, or 2) the unreadable blocks would go away. Neither was
the case and hence I decided to reach out.

I had forgotten about the "action=exportlog", thanks for reminding me about
that. This is what I found:

04/29/10 14:45:10: EVT#17279-04/29/10 14:45:10:  97=Puncturing bad block on
PD 03(e0/s3) at 4430
04/29/10 14:45:10: EVT#17280-04/29/10 14:45:10:  97=Puncturing bad block on
PD 06(e0/s6) at 4430
04/29/10 14:45:11: EVT#17282-04/29/10 14:45:11:  97=Puncturing bad block on
PD 04(e0/s4) at 4430
04/29/10 14:45:11: EVT#17283-04/29/10 14:45:11:  97=Puncturing bad block on
PD 06(e0/s6) at 4430
04/29/10 14:45:11: EVT#17284-04/29/10 14:45:11:  97=Puncturing bad block on
PD 03(e0/s3) at 4430

-Bond

> -Original Message-
> From: linux-poweredge-boun...@dell.com [mailto:linux-poweredge-
> boun...@dell.com] On Behalf Of patrick_fisc...@dell.com
> Sent: Monday, May 03, 2010 2:46 AM
> To: t...@seoss.co.uk; adam.niel...@uq.edu.au
> Cc: linux-powere...@lists.us.dell.com
> Subject: RE: how to get rid of bad blocks in a file on PERC 5/I?
> 
> Consistency Check:
> Check consistency. A check consistency determines the integrity of a
> virtual disk's redundant data. When necessary, this feature rebuilds
> the redundant information.
> Source:
> http://support.dell.com/support/edocs/software/svradmin/6.2/en/OMSS/cntrls.htm#wp681476
> 
> The remapping of bad sectors should happen automatically when the
> controller tries to write to or read from a sector that can't be
> written.
> 
> Please always check the controller log if you get filesystem errors
> like you described.
> Check the log for bad LBAs on the disks, e.g. by searching the log
> file for "bad".
> Check the count of affected LBAs and whether they occur on multiple
> disks, which would indicate a punctured stripe.
> 
> You can get the log via Megacli or OpenManage:
> 
> Server Administrator CLI:
> omconfig storage controller action=exportlog controller=0
> - where controller=0 = ID of the involved controller

___
Linux-PowerEdge mailing list
Linux-PowerEdge@dell.com
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq


RE: how to get rid of bad blocks in a file on PERC 5/I?

2010-05-03 Thread Patrick_Fischer
Consistency Check:
Check consistency. A check consistency determines the integrity of a virtual 
disk's redundant data. When necessary, this feature rebuilds the redundant 
information.
Source:
http://support.dell.com/support/edocs/software/svradmin/6.2/en/OMSS/cntrls.htm#wp681476

The remapping of bad sectors should happen automatically when the controller
tries to write to or read from a sector that can't be written.

Please always check the controller log if you get filesystem errors like you
described.
Check the log for bad LBAs on the disks, e.g. by searching the log file for "bad".
Check the count of affected LBAs and whether they occur on multiple disks,
which would indicate a punctured stripe.

You can get the log via Megacli or OpenManage:

Server Administrator CLI:
omconfig storage controller action=exportlog controller=0
- where controller=0 = ID of the involved controller

Megacli:
Megacli -FwTermLog -Dsply -a0 > log.txt
- where a0 = controller 0
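As a sketch of the log-checking steps above — run here against a saved copy rather than a live controller, with sample lines taken from the exported log quoted elsewhere in this thread:

```shell
# log.txt stands in for the file produced by "omconfig ... action=exportlog"
# or "Megacli -FwTermLog -Dsply -a0 > log.txt".
cat > log.txt <<'EOF'
04/29/10 14:45:10: EVT#17279-04/29/10 14:45:10:  97=Puncturing bad block on PD 03(e0/s3) at 4430
04/29/10 14:45:10: EVT#17280-04/29/10 14:45:10:  97=Puncturing bad block on PD 06(e0/s6) at 4430
04/29/10 14:45:11: EVT#17282-04/29/10 14:45:11:  97=Puncturing bad block on PD 04(e0/s4) at 4430
EOF

# Case-insensitive search for bad-block events:
grep -i 'bad' log.txt

# Count how many distinct physical disks report the error at the same LBA -
# more than one is the signature of a punctured stripe:
grep -io 'PD [0-9]*' log.txt | sort -u | wc -l
```

With these sample lines the count comes out as 3 disks at LBA 4430, which is exactly the multi-disk pattern Patrick describes.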


Best Regards/ Mit freundlichen Grüßen 

Patrick Fischer
Senior Server Analyst 
Dell | Global Support Services

Dell Halle GmbH in Vertretung für Dell GmbH 
Raffineriestraße 28, 06112 Halle (Saale), Germany 
Germany:  +49 69 9792 4299
Austria:+43 820 240 58 256

mailto:patrick_fisc...@dell.com, www.dell.de 

Geschäftsführer: Barbara Wittmann, Michael Kenney
Vorsitzender des Aufsichtsrates: Jürgen Renz
Eingetragen beim Amtsgericht Stendal unter HRB 215298, USt.-ID: DE242553389
WEEE-Reg.Nr. der Dell GmbH: DE 49515708


-Original Message-
From: linux-poweredge-boun...@dell.com 
[mailto:linux-poweredge-boun...@dell.com] On Behalf Of Tim Small
Sent: Saturday, May 01, 2010 12:17 PM
To: Adam Nielsen
Cc: linux-poweredge-Lists
Subject: Re: how to get rid of bad blocks in a file on PERC 5/I?

On 01/05/10 07:16, Adam Nielsen wrote:
> disks, as long as the hardware RAID controller can keep up with the disks
> there would be no difference in performance.
>

I remember reading a benchmark which showed that under random I/O 
patterns, Linux software RAID performed better on (from memory) an 
8-disk RAID5, due to better use of SCSI scatter/gather.  I think this was 
vs. MegaRAID, but it was a while ago.  I've not carried out any benchmarks 
myself, and have no idea whether this goes for SATA NCQ as well...

Tim.


-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309




Re: how to get rid of bad blocks in a file on PERC 5/I?

2010-05-01 Thread Tim Small
On 01/05/10 07:16, Adam Nielsen wrote:
> disks, as long as the hardware RAID controller can keep up with the disks
> there would be no difference in performance.
>

I remember reading a benchmark which showed that under random I/O 
patterns, Linux software RAID performed better on (from memory) an 
8-disk RAID5, due to better use of SCSI scatter/gather.  I think this was 
vs. MegaRAID, but it was a while ago.  I've not carried out any benchmarks 
myself, and have no idea whether this goes for SATA NCQ as well...

Tim.


-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309



Re: how to get rid of bad blocks in a file on PERC 5/I?

2010-04-30 Thread Adam Nielsen
> I suppose the argument isn't really about software vs hardware RAID
> though... if the firmware on the PERC5/I was released GPL, then the
> community can maintain and fix bugs just as well as software RAID in Linux.

Yes, I think those who dislike hardware RAID feel this way not because it's
hardware, but because the existing implementations leave a lot to be desired.
 If the firmware was GPL'd and bugs could be fixed as easily as in software
RAID then I'm sure you would find many more people moving to hardware RAID.

> Generally speaking, we've been very impressed with the PERC5/6 controllers.
> On a single controller and proper tuning with 8 drives/RAID-5, we easily
> achieve 400-500MB/sec sequential read/writes. For higher performance, we've
> used dual controllers with software RAID-1 and achieved almost double. We
> consider this a pretty cheap solution for that type of performance and the
> amount of storage space we get with the high capacity SATA drives.

My main reason for disliking hardware RAID is not the performance, but the
reliability.  As you've discovered, when something goes wrong like a dodgy
disk, it can be rather difficult to get to the bottom of what's happening.  If
the firmware was open source that sort of problem could be fixed, so really
it's the lack of control that I (and probably others) dislike.

> Although I haven't tried benchmarking an equivalent setup with software
> RAID-5, I would be curious to see what kind of performance can be achieved.
> Would using the fast CPUs we have these days for RAID-5 be faster than
> RAID-5 offloaded to a PERC5/6 controller?

I've always assumed that a CPU will be able to perform many more calculations
than a dedicated RAID controller, but I've never actually checked whether this
is the case.  At any rate you do have to share that CPU with other tasks so
it's possible that a high CPU task may slow your disks down, or lots of disk
activity will slow down your applications.  Whether this slowdown would still
result in faster performance than a hardware controller is a good question.  I
believe recent CPUs are able to perform well over 4GB/sec in RAID calculations
so I would suspect the answer is yes.  Of course the real bottleneck is the
disks, as long as the hardware RAID controller can keep up with the disks
there would be no difference in performance.

Cheers,
Adam.



RE: how to get rid of bad blocks in a file on PERC 5/I?

2010-04-30 Thread Bond Masuda
Hi Tim, Adam

Thanks for responding. Remarks inline below...

> In the case of a RAID controller, standard practise is for the
> controller to reconstruct the data from the other drives, and then
> issue
> the write instruction back to the original drive.  The better RAID
> implementations will actually REPORT THIS TO YOU, when it happens (e.g.
> Linux software RAID, so that you know the drive may be unwell).  To
> make
> matters worse you can't even reliably check the SMART data yourself
> with
> some of the Dell/LSI controllers - and LSI/Dell don't seem to care
> enough to fix this...
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=14831
 
With smartmontools 5.39.1, I was able to get the SMART stats off the drives
through the PERC5/I:

smartctl -A -d megaraid,X /dev/sdc, where X={0,1,...,7}
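On the live system each drive behind the controller would be queried in turn (e.g. `for i in 0 1 2 3 4 5 6 7; do smartctl -A -d megaraid,$i /dev/sdc; done`). As a sketch of scanning that output for the suspicious attribute Bond mentions below, run here against a saved sample table — the layout follows smartmontools' attribute format, and every value except the 277 from this thread is made up:

```shell
# smart.txt stands in for the output of "smartctl -A -d megaraid,6 /dev/sdc".
cat > smart.txt <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       277
EOF

# Flag any drive whose Multi_Zone_Error_Rate raw value is nonzero:
awk '$2 == "Multi_Zone_Error_Rate" && $NF != 0 {print "suspect: raw value", $NF}' smart.txt
```

Here only the Multi_Zone_Error_Rate line trips the check, matching Bond's observation that drive 1:6 stood out while all other drives showed 0.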

> Using smartctl to check what's gone on with the drive itself would
> be the best thing to do, I think...  Recent smartctl has support for
> communicating with drives behind PERCs.

indeed we did. most of the SMART data was normal except on 1:6 which showed

attribute#200 Multi_Zone_Error_Rate was 277 (all other drives it was 0)

> ACK.  My conclusion is also to use AHCI, and software RAID.  It's more
> reliable generally, and if you do find a bug, the maintainers are
> responsive (or you can even fix it yourself, or pay someone else to -
> this is Open Source right?  Presumably that's why people use Linux in
> the first place?).  Oh, and it's cheaper too.

i have nothing against software RAID, we chose hardware RAID since it
essentially came with the PowerEdge server and we figured given it was
available there would be an advantage of using hardware RAID in RAID-5.

I suppose the argument isn't really about software vs hardware RAID
though... if the firmware on the PERC5/I was released GPL, then the
community can maintain and fix bugs just as well as software RAID in Linux.

Generally speaking, we've been very impressed with the PERC5/6 controllers.
On a single controller and proper tuning with 8 drives/RAID-5, we easily
achieve 400-500MB/sec sequential read/writes. For higher performance, we've
used dual controllers with software RAID-1 and achieved almost double. We
consider this a pretty cheap solution for that type of performance and the
amount of storage space we get with the high capacity SATA drives.

Although I haven't tried benchmarking an equivalent setup with software
RAID-5, I would be curious to see what kind of performance can be achieved.
Would using the fast CPUs we have these days for RAID-5 be faster than
RAID-5 offloaded to a PERC5/6 controller?

-Bond





Re: how to get rid of bad blocks in a file on PERC 5/I?

2010-04-30 Thread Tim Small
Adam Nielsen wrote:
> I believe that when hard disks discover they have a bad sector 
> they attempt to remap it themselves, but it may not always happen right 
> away.  So it's possible that by the time you rebuilt the array the 
> sectors had been relocated.
>   

I believe the standard behaviour is:

- Read and apply simple (fast/hardware-implemented, AKA "online") error
correction.

- If that fails, try to use more complex (slow/firmware-implemented,
AKA "offline") ECC - retry this a (usually configurable) number of times.

- In the case of successful correction (we have the user data),
write the data back to the sector, and then read-check it to see if it
was written successfully.

  - If the re-read-verify is OK, then continue as normal (maybe
increment one of the SMART counters).

  - If the re-read-verify fails, then reallocate the sector
(use a "spare" hidden reserved sector elsewhere on the disk).  Increment
the SMART "reallocated sector" count.

- If the "offline" ECC fails, then we've really lost data, so
return a read error to the disk controller and mark the sector as
"pending" - attempting to read the sector again may restart the
"offline" correction attempts.




If the controller later tries to WRITE to that sector instead of reading
it, then the drive will do the "write, and verify" step again as above
with the new data (i.e. see if the data can then be read, and if-not
then reallocate it).

In the case of a RAID controller, standard practise is for the
controller to reconstruct the data from the other drives, and then issue
the write instruction back to the original drive.  The better RAID
implementations will actually REPORT THIS TO YOU, when it happens (e.g.
Linux software RAID, so that you know the drive may be unwell).  To make
matters worse you can't even reliably check the SMART data yourself with
some of the Dell/LSI controllers - and LSI/Dell don't seem to care
enough to fix this...

https://bugzilla.kernel.org/show_bug.cgi?id=14831

> However given the subsequent failures I would think that the drive may 
> actually be fine - maybe you can run a self test on it without going 
> through a RAID controller.
>   

Using smartctl to check what's gone on with the drive itself would
be the best thing to do, I think...  Recent smartctl has support for
communicating with drives behind PERCs.


> I don't know whether the situation has improved in recent years, the 
> experiences were enough to persuade me to switch to software RAID which 
> I have stuck with ever since.
>   

ACK.  My conclusion is also to use AHCI, and software RAID.  It's more
reliable generally, and if you do find a bug, the maintainers are
responsive (or you can even fix it yourself, or pay someone else to -
this is Open Source right?  Presumably that's why people use Linux in
the first place?).  Oh, and it's cheaper too.

Tim.


-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309



Re: how to get rid of bad blocks in a file on PERC 5/I?

2010-04-29 Thread Adam Nielsen
> I could use some help trying to get rid of some bad blocks on a RAID-5 on
> PERC 5/I controller.

I'm certainly no expert with this but just have a few thoughts I'd 
mention.  I believe that when hard disks discover they have a bad sector 
they attempt to remap it themselves, but it may not always happen right 
away.  So it's possible that by the time you rebuilt the array the 
sectors had been relocated.

However given the subsequent failures I would think that the drive may 
actually be fine - maybe you can run a self test on it without going 
through a RAID controller.

It's also my experience (albeit with older RAID controllers) that when 
the controller reports the array as OK, it means it can communicate with 
all the disks in it.  It's not a comment on whether any of the disks are 
working properly or not.  In fact the controllers I have used (old 
MegaRAID cards) work great if a disk dies, but they are a bit 
unpredictable when a disk works but has read problems.  They don't seem 
to be designed to cope with dodgy disks, only flat out broken ones.

I don't know whether the situation has improved in recent years, the 
experiences were enough to persuade me to switch to software RAID which 
I have stuck with ever since.

> But, after the
> xfs_repair, xfs_check says /data is in good condition.

I also don't know whether xfs_repair actually repairs the *data* in the 
filesystem.  I suspect it only checks the filesystem structure, and 
assumes the data itself is correct.  If there is no XFS metadata stored 
in a bad block, it wouldn't surprise me that the tools never detect it.

You could try removing each disk one at a time and trying to create an 
image of it, that will attempt a read from every block on the disk. 
Presumably at some point you will hit a disk that fails half way through 
the procedure, unless the disks really are all fine.
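Adam's read-test suggestion can be sketched with plain dd. It is demonstrated here on a scratch file; on the real system the input would be the suspect member disk attached outside the RAID controller (e.g. /dev/sdd — a hypothetical device name):

```shell
# member.img stands in for the suspect disk; on real hardware: IN=/dev/sdd
IN=member.img
head -c $((4 * 1024 * 1024)) /dev/urandom > "$IN"

# conv=noerror,sync makes dd continue past unreadable blocks, padding them
# with zeros instead of aborting, so every block gets a read attempt and
# failures show up in dd's stderr / block counts rather than stopping the run.
dd if="$IN" of=image.out bs=64K conv=noerror,sync 2> dd.log && echo "read-test complete"

cmp -s "$IN" image.out && echo "image matches source"
```

On a healthy disk the image matches the source exactly; a disk with pending sectors would leave zero-padded holes and error lines in dd.log, pinpointing which member is really failing.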

Cheers,
Adam.



RE: how to get rid of bad blocks in a file on PERC 5/I?

2010-04-29 Thread Bond Masuda
Hi everyone,

I could use some help trying to get rid of some bad blocks on a RAID-5 on
PERC 5/I controller.

Let me start by describing the setup:

- 8x SATA 500GB HDD on PERC 5/I with Dell firmware 5.2.2-0072
- The 8 drives are setup as RAID-5 and shows up in RHEL 5.4 as /dev/sdc
- /dev/sdc is formatted with XFS (XFS support from CentOS repository)
- /dev/sdc is mounted at /data
- for this discussion, let's call the disks 0:0, 0:1, 0:2, 0:3, 1:4, 1:5,
1:6, and 1:7

So, disk 1:6 all of a sudden is marked as "failed" and /dev/sdc becomes
degraded. Just in case another disk goes bad, we decided to take a backup at
that moment; this would provide us with a copy of the "latest" instead of
1-day-or-older stuff. Our backup methodology is just a simple rsync of /data
to an external drive. 

During the backup run, we noticed that there was one 4GB file that did not
copy correctly. So, we used dd_rescue to make a copy of it but found that
there are 8 blocks that are not readable. (blocks 5608072-5608079, the point
is they are in a contiguous range, filesystem-wise)
is they are in a contiguous range, file system-wise)

So, at this point, we're glad that we just did a backup of everything since
now we're concerned that the rebuild of 1:6 might not succeed if there are
unreadable sectors somewhere. Just to see what might happen, and since 1:6
seems to still be spinning, we decided to force it to rebuild without
replacing it. To our surprise, the rebuild of 1:6 actually succeeded!?!??!
We then run a 'omconfig storage vdisk action=checkconsistency controller=0
vdisk=0' and it completes successfully! Does checkconsistency make hard
drives remap bad sectors?

Now /dev/sdc is once again in good health, or so we think, but we're suspicious
of 1:6. At this point in time, we unmount /data and run xfs_check on it. It
reports "block ?/? type unknown not expected", so we run xfs_repair
on /data. Everything seems to complete okay and we cleanly mount /data
again. We go back to examine the 4GB file. We tried another dd_rescue on it
but got the exact same results; 8 blocks in the exact same range as before
did not read. When we use rsync to copy this file, we get the following
console error messages:

end_request: I/O error, dev sdc, sector 6012984362
end_request: I/O error, dev sdc, sector 6012984362
end_request: I/O error, dev sdc, sector 6012984874
end_request: I/O error, dev sdc, sector 6012984874

These messages make me think that there are bad sectors on one or more of
the disks. (would you agree?) What I don't get is: we were having
this problem in degraded mode (when 1:6 was in "failed" state), so how could
it rebuild 1:6 with read errors somewhere other than 1:6?

In the middle of investigating all this, disk 1:6 again goes into "failed"
state. When we used 'omreport storage pdisk controller=0' some of the fields
for 1:6 were filled with garbage. This time, we think 1:6 is really toast
and decide to replace it with a spare we have. We put in the spare drive and
it begins to rebuild. Again, we're not sure it will be able to successfully
rebuild since we think there are read errors somewhere else. But to our
surprise, the new 1:6 disk rebuilds successfully and /dev/sdc is once again
in Status=Ok, State=Ready.

We go back to investigating the 4GB file with bad blocks. We try another
dd_rescue to copy it, but this time we get 16 bad blocks! The 1st 8 are
exactly as before (5608072-5608079), but there were also blocks
5608584-5608591; the 2nd range of 8 blocks being contiguous but separate
from the 1st 8. Problem getting worse?
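One piece of arithmetic worth doing on the numbers above (my own observation, not a claim from the thread): both the gap between the two dd_rescue bad-block runs and the gap between the kernel's failing sectors come out to exactly 512 units, which looks too regular for random surface damage and is consistent with a repeating on-disk (stripe-related) geometry:

```shell
# Numbers taken verbatim from this message.
first_run=5608072     # start of first unreadable run (dd_rescue blocks)
second_run=5608584    # start of second unreadable run
echo "gap between bad-block runs: $((second_run - first_run)) blocks"

sector_a=6012984362   # first failing sector from the kernel I/O errors
sector_b=6012984874   # second failing sector
echo "gap between failing sectors: $((sector_b - sector_a)) sectors"
```

Both gaps print as 512, i.e. the errors recur at a fixed period rather than spreading randomly — which fits the punctured-stripe diagnosis later in the thread better than a single dying drive.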

On the one hand, since the rebuild of 1:6 succeeded twice, this makes us
think this is NOT a hard disk issue, but maybe a XFS issue? But, after the
xfs_repair, xfs_check says /data is in good condition. And the "I/O error,
dev sdc, sector " makes us think it could be a hard disk issue?

Some thoughts? Advice?
-Bond




