[opensuse] raid question (2)

2007-06-26 Thread Lorenzo Cerini

Hi all,
I'm experiencing a problem with a software RAID1 on openSUSE 10.0.

I replaced one disk (sdb).
Then I started the resync of my two RAID arrays.
Now, every two and a half hours, I get this in my logs:

Jun 26 13:21:48 axis kernel: sda: Current: sense key: Medium Error
Jun 26 13:21:48 axis kernel: end_request: I/O error, dev sda, sector 293699707
Jun 26 13:21:48 axis kernel: raid1: sda: unrecoverable I/O read error for block 239159032
Jun 26 13:21:48 axis kernel:  disk 0, wo:0, o:1, dev:sda3
Jun 26 13:21:48 axis kernel:  disk 0, wo:0, o:1, dev:sda3
Jun 26 13:21:48 axis kernel:  disk 0, wo:0, o:1, dev:sda3

The server works fine and I have a working backup of my sensitive data,
but I would like to work around this block somehow so that the resync
between the disks can finish.
Any help appreciated.
Thanks in advance,


Lorenzo Cerini

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [opensuse] raid question (2)

2007-06-26 Thread John Andersen
On Tuesday 26 June 2007, Lorenzo Cerini wrote:
 Hi all,
 I'm experiencing a problem with a software RAID1 on openSUSE 10.0.

 I replaced one disk (sdb).
 Then I started the resync of my two RAID arrays.
 Now, every two and a half hours, I get this in my logs:

 Jun 26 13:21:48 axis kernel: sda: Current: sense key: Medium Error
 Jun 26 13:21:48 axis kernel: end_request: I/O error, dev sda, sector 293699707
 Jun 26 13:21:48 axis kernel: raid1: sda: unrecoverable I/O read error for block 239159032
 Jun 26 13:21:48 axis kernel:  disk 0, wo:0, o:1, dev:sda3
 Jun 26 13:21:48 axis kernel:  disk 0, wo:0, o:1, dev:sda3
 Jun 26 13:21:48 axis kernel:  disk 0, wo:0, o:1, dev:sda3

Sounds to me like you replaced sdb only to find there are also problems
with sda.

Two possibilities spring to mind:
1) you replaced the wrong disk
2) whatever took out sdb also affected sda
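
One way to narrow this down is to check where the failing read actually sits. A minimal sketch, assuming the number raid1 prints is a 512-byte sector offset inside the member partition. That is an assumption, though the 19 June log elsewhere in this thread fits it: disk sector 25690407 on sdb against raid1 sector 25690344 on sdb1 leaves 63, the usual start of a first partition. The helper name partition_offset is made up for illustration:

```shell
# Hypothetical helper: subtract the raid1-reported sector from the
# whole-disk sector to estimate where the member partition starts.
# Assumes both figures count 512-byte sectors.
partition_offset() {
    echo $(( $1 - $2 ))
}

partition_offset 25690407 25690344     # sdb vs sdb1, 19 June log: prints 63
partition_offset 293699707 239159032   # sda vs sda3, 26 June log: prints 54540675
```

If fdisk -lu /dev/sda shows sda3 starting at that second figure, the bad spot would be inside sda3 itself, which is the only copy the md1 resync can read from.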



-- 
_
John Andersen




Re: [opensuse] raid question

2007-06-25 Thread Lorenzo Cerini

I thank you all.

As Carlos said in one of the last posts: first replace, then investigate.
It was not an option to have the accounting office's server of a shipping company
(which means they work 24 hours a day, 7 days a week) stopped, for any reason,
for more than 20 minutes.
I just replaced the disk, added the new disk to the RAID, and resynced.


Besides, just for completeness, the failed disk did have some real hardware
trouble.




L.Cerini




Re: [opensuse] raid question

2007-06-23 Thread Jonathan Arsenault
On Fri, 2007-06-22 at 18:24 +0700, Fajar Priyanto wrote:
 First cylinder: (just enter)
 Last cylinder: +1000M (1GB)

I want my 24M back!

-- 
Why can't humans just reboot instead of sleeping, so much wasted cycles 
-Zombie Coder.
Jonathan Arsenault - [EMAIL PROTECTED] - http://jarpack.net




[opensuse] raid question

2007-06-22 Thread Lorenzo Cerini

Hi all,
I have a little trouble with a software RAID1 array.
I built it with openSUSE 10.0.

Now one disk has left the array, showing the (F) failure flag for one partition.
Here is the output of cat /proc/mdstat:

Personalities : [raid1]
md1 : active raid1 sda3[0]
 129017920 blocks [2/1] [U_]

md0 : active raid1 sdb1[2](F) sda1[0]
 26217984 blocks [2/1] [U_]

I am quite far from this server's location, so I need to know:
how reasonable is it to assume the disk is not broken and just use 'badblocks -f'?

And if I want to replace it, which is the easiest way?
Do I partition the new disk here (maybe with fdisk, though I do not know how
to give the partitions the RAID type, id=fd), swap it for the old one,
and then use raidhotadd? Or, if the new disk gets the /dev/sdb identifier anyway,
will the kernel do it for me at boot time?

Thanks in advance,
L.Cerini




Re: [opensuse] raid question

2007-06-22 Thread Carlos E. R.

The Friday 2007-06-22 at 12:40 +0200, Lorenzo Cerini wrote:

 Hi all,
 I have a little trouble with a software RAID1 array.
 I built it with openSUSE 10.0.
 
 Now one disk has left the array, showing the (F) failure flag for one partition.
 Here is the output of cat /proc/mdstat:
...
 I am quite far from this server's location, so I need to know:
 how reasonable is it to assume the disk is not broken and just use 'badblocks -f'?
 
 And if I want to replace it, which is the easiest way?


Be aware that software RAID will drop a disk for a simple glitch, a
temporary failure. Scan the logs for errors, check the drive (SMART
tests), etc. An attempt to write to a bad block would show up in the log.

Then re-enable the disk, and watch it.

- -- 
Cheers,
   Carlos E. R.




Re: [opensuse] raid question

2007-06-22 Thread Fajar Priyanto
On Friday 22 June 2007 17:40, Lorenzo Cerini wrote:
 Hi all,
 I have a little trouble with a software RAID1 array.
 I built it with openSUSE 10.0.

 Now one disk has left the array, showing the (F) failure flag for one
 partition. Here is the output of cat /proc/mdstat:

 Personalities : [raid1]
 md1 : active raid1 sda3[0]
   129017920 blocks [2/1] [U_]

 md0 : active raid1 sdb1[2](F) sda1[0]
   26217984 blocks [2/1] [U_]

 I am quite far from this server's location, so I need to know:
 how reasonable is it to assume the disk is not broken and just use 'badblocks -f'?

 And if I want to replace it, which is the easiest way?
 Do I partition the new disk here (maybe with fdisk, though I do not know how
 to give the partitions the RAID type, id=fd), swap it for the old one,
 and then use raidhotadd? Or, if the new disk gets the /dev/sdb identifier
 anyway, will the kernel do it for me at boot time?

Hello Lorenzo,
It seems like your raid arrays are in a pretty bad state.
/dev/md1 is broken, and /dev/md0 is too.
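
For reference, the [2/1] [U_] fields are what mark an array as degraded: two members expected, one active. A small sketch that picks such arrays out of /proc/mdstat-style text. The function name degraded_arrays is hypothetical, and it assumes the status field is the last one on the "blocks" line, as in the output quoted above:

```shell
# Hypothetical helper: read mdstat-style text on stdin and print the name
# of every md array whose member-status field (e.g. [U_]) contains an
# underscore, meaning at least one member is missing.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the array name
        /\[[U_]+\]/ && name != "" {          # the "blocks ... [U_]" line
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}

# Usage on a live system:
#   degraded_arrays < /proc/mdstat
```

Fed the mdstat output quoted above, this should list both md1 and md0.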

Here's a suggestion on how to troubleshoot it:
1. If you have some important data on that server, back it up first to a safe
location other than the above-mentioned server, using scp, rsync, or anything else.
2. You can try to rebuild the arrays one by one:
For /dev/md1:
mdadm /dev/md1 -a /dev/sdb3 (assuming the broken pair is sdb3)

For /dev/md0:
Remove the F member first:
mdadm /dev/md0 -r /dev/sdb1
Add it again:
mdadm /dev/md0 -a /dev/sdb1
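
The remove-then-add sequence can be wrapped so that it is rehearsed before it touches anything. A minimal sketch, using the long forms of the same mdadm options (--remove for -r, --add for -a). The run and readd_member helpers and the DRY_RUN variable are invented for this example; by default it only prints the commands:

```shell
# Dry-run by default: print the command instead of running it.
# Set DRY_RUN=0 (as root) to execute for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: drop a failed member from an array and add it
# (or its replacement) back, which starts a resync onto it.
readd_member() {
    array=$1
    member=$2
    run mdadm "$array" --remove "$member"
    run mdadm "$array" --add "$member"
}

readd_member /dev/md0 /dev/sdb1
```

Once run for real, the resync progress should be visible in cat /proc/mdstat.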

To prepare the new disk, please take note of the current partition scheme on
the server (fdisk -l /dev/sda, fdisk -l /dev/sdb). You must make the partitions
on the new disk EXACTLY like those on the existing one.
Then partition it using fdisk, for example for sdb:
fdisk /dev/sdb
n (new)
primary partition (1-4)
First cylinder: (just enter)
Last cylinder: +1000M (1GB)
Repeat for other partitions.

Then, change the type of the partition as software raid:
t
L (for list of codes)
fd (software raid)
Repeat for other partitions

w (save)
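
As an alternative to retyping the layout by hand, the partition table can usually be copied from the surviving disk. A sketch, assuming sfdisk is installed, the new disk is at least as large as the old one, and /dev/sda is the good disk while /dev/sdb is the new, empty one (swap them and the wrong table gets overwritten). The helper name is hypothetical and it only prints the command:

```shell
# Hypothetical helper: print (not run) the pipeline that dumps the
# partition table of the source disk and writes it to the target disk.
clone_table_cmd() {
    src=$1
    dst=$2
    echo "sfdisk -d $src | sfdisk $dst"
}

clone_table_cmd /dev/sda /dev/sdb
```

The dump carries the partition type as well, so members typed fd on the source should come out as fd on the copy.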

--
I have the following note showing the actual fdisk menu, attached as a text file;
hopefully it can go through the list.

Remember, back up your data first! Stay safe.
HTH,
-- 
Fajar Priyanto | Reg'd Linux User #327841 | Linux tutorial 
http://linux2.arinet.org
6:23pm up 5:51, 2.6.18.2-34-default GNU/Linux 
Let's use OpenOffice. http://www.openoffice.org
[EMAIL PROTECTED] ~]# fdisk /dev/hda

The number of cylinders for this disk is set to 4865.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): p

Disk /dev/hda: 40.0 GB, 40020664320 bytes
255 heads, 63 sectors/track, 4865 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1           2       16033+  83  Linux
/dev/hda2               3        1177     9438187+  83  Linux
/dev/hda3            1178        1488     2498107+  83  Linux
/dev/hda4            1489        4865    27125752+   5  Extended
/dev/hda5            1489        1814     2618563+  83  Linux
/dev/hda6            1815        1945     1052226   83  Linux
/dev/hda7            1946        2059      915673+  82  Linux swap
/dev/hda8            2060        2075      128488+  83  Linux
/dev/hda9            2076        2792     5759271   83  Linux
/dev/hda10           2793        3002     1686793+  83  Linux
/dev/hda11           3003        4829    14675346   83  Linux
/dev/hda12           4830        4832       24066   83  Linux

Command (m for help): v
539838 unallocated sectors

Command (m for help): n
First cylinder (4833-4865, default 4833): 
Using default value 4833
Last cylinder or +size or +sizeM or +sizeK (4833-4865, default 4865): +50M

Command (m for help): p

Disk /dev/hda: 40.0 GB, 40020664320 bytes
255 heads, 63 sectors/track, 4865 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1           2       16033+  83  Linux
/dev/hda2               3        1177     9438187+  83  Linux
/dev/hda3            1178        1488     2498107+  83  Linux
/dev/hda4            1489        4865    27125752+   5  Extended
/dev/hda5            1489        1814     2618563+  83  Linux

Re: [opensuse] raid question

2007-06-22 Thread Carlos E. R.

The Friday 2007-06-22 at 13:59 +0200, Lorenzo Cerini wrote:

 So, maybe it is better if I re-enable the disk and see what happens.
 There is no risk of data loss, since we regularly back up everything
 useful at midnight.

Just have a look at the logs first.

- -- 
Cheers,
   Carlos E. R.




Re: [opensuse] raid question

2007-06-22 Thread Lorenzo Cerini

So, maybe it is better if I re-enable the disk and see what happens.
There is no risk of data loss, since we regularly back up everything
useful at midnight.
L.




Re: [opensuse] raid question

2007-06-22 Thread Lorenzo Cerini

I found this in my logs.
I don't think it is a bad-block problem, but I cannot tell whether it
is a hardware problem:
Jun 19 03:16:42 axis kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Jun 19 03:16:42 axis kernel: ata2: error=0x40 { UncorrectableError }
Jun 19 03:16:46 axis kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Jun 19 03:16:46 axis kernel: ata2: error=0x40 { UncorrectableError }
Jun 19 03:16:50 axis kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Jun 19 03:16:50 axis kernel: ata2: error=0x40 { UncorrectableError }
Jun 19 03:16:53 axis kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Jun 19 03:16:53 axis kernel: ata2: error=0x40 { UncorrectableError }
Jun 19 03:16:57 axis kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Jun 19 03:16:57 axis kernel: ata2: error=0x40 { UncorrectableError }
Jun 19 03:16:57 axis kernel: SCSI error : 1 0 0 0 return code = 0x802
Jun 19 03:16:57 axis kernel: sdb: Current: sense key: Medium Error
Jun 19 03:16:57 axis kernel: Additional sense: Unrecovered read error - auto reallocate failed
Jun 19 03:16:57 axis kernel: end_request: I/O error, dev sdb, sector 25690407
Jun 19 03:16:57 axis kernel: raid1: Disk failure on sdb1, disabling device.
Jun 19 03:16:57 axis kernel: Operation continuing on 1 devices
Jun 19 03:16:57 axis kernel: raid1: sdb1: rescheduling sector 25690344
Jun 19 03:16:57 axis kernel: RAID1 conf printout:
Jun 19 03:16:57 axis kernel:  --- wd:1 rd:2
Jun 19 03:16:57 axis kernel:  disk 0, wo:0, o:1, dev:sda1
Jun 19 03:16:57 axis kernel:  disk 1, wo:1, o:0, dev:sdb1
Jun 19 03:16:57 axis kernel: RAID1 conf printout:
Jun 19 03:16:57 axis kernel:  --- wd:1 rd:2
Jun 19 03:16:57 axis kernel:  disk 0, wo:0, o:1, dev:sda1
Jun 19 03:16:57 axis kernel: raid1: sda1: redirecting sector 25690344 to another mirror
L.




Re: [opensuse] raid question

2007-06-22 Thread Carlos E. R.

The Friday 2007-06-22 at 14:34 +0200, Lorenzo Cerini wrote:

 I found this in my logs.
 I don't think it is a bad-block problem, but I cannot tell whether it
 is a hardware problem:

I think so.

 Jun 19 03:16:57 axis kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
 Jun 19 03:16:57 axis kernel: ata2: error=0x40 { UncorrectableError }
 Jun 19 03:16:57 axis kernel: SCSI error : 1 0 0 0 return code = 0x802
 Jun 19 03:16:57 axis kernel: sdb: Current: sense key: Medium Error
 Jun 19 03:16:57 axis kernel: Additional sense: Unrecovered read error - auto reallocate failed

I'm not familiar with scsi errors, but if I interpret it correctly, the 
drive tried to relocate the bad sector to somewhere else and failed: 
that's not good, it might mean that the drive has no more spare sectors 
for remapping and is thus at the end of its life.

You should investigate the SMART logs (smartctl -a /dev/sdb). If you can
determine that what I said is the case, then you should replace the drive
promptly. If in doubt, run the short and long diagnostics. They don't
catch everything, but if they say bad, it is bad.
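
The check described here comes down to four smartctl invocations. A sketch that only prints them, so they can be reviewed first; the helper name smart_check_cmds is hypothetical, and the optional second argument is for any extra options a given setup turns out to need:

```shell
# Hypothetical helper: print the smartctl commands for a basic check.
# An optional second argument is inserted as extra options.
smart_check_cmds() {
    dev=$1
    opts=${2:+ $2}
    echo "smartctl -a$opts $dev"           # full report, including the error log
    echo "smartctl -t short$opts $dev"     # start the short self-test
    echo "smartctl -t long$opts $dev"      # start the long self-test
    echo "smartctl -l selftest$opts $dev"  # read the self-test results
}

smart_check_cmds /dev/sdb
```

The self-tests run inside the drive in the background, so the results are read afterwards with the last command.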


See? First admin rule: read the logs ;-)

- -- 
Cheers,
   Carlos E. R.




Re: [opensuse] raid question

2007-06-22 Thread Lorenzo Cerini

The trouble is that these are SATA disks, not SCSI.
So I have no smartctl (or at least that is what smartctl tells me).

L.




Re: [opensuse] raid question

2007-06-22 Thread Carlos E. R.

The Friday 2007-06-22 at 16:07 +0200, Lorenzo Cerini wrote:

 The trouble is that these are SATA disks, not SCSI.
 So I have no smartctl (or at least that is what smartctl tells me).

Doesn't smartctl work with SATA drives yet? I thought that had been 
solved.

- -- 
Cheers,
   Carlos E. R.



Re: [opensuse] raid question

2007-06-22 Thread Rauch Christian

Carlos E. R. wrote:
 
 The Friday 2007-06-22 at 16:07 +0200, Lorenzo Cerini wrote:
 
 The trouble is that these are SATA disks, not SCSI.
 So I have no smartctl (or at least that is what smartctl tells me).
 
 Doesn't smartctl work with SATA drives yet? I thought that had been
 solved.

It works here on 10.2, but the OP is using 10.0. It seems that it was
not fixed there.

Regards,
Chris
- --
http://rauchs-home.de - home of yet another suse repository ;)



Re: [opensuse] raid question

2007-06-22 Thread Greg Freemyer

On 6/22/07, Rauch Christian [EMAIL PROTECTED] wrote:

Carlos E. R. wrote:

 The Friday 2007-06-22 at 16:07 +0200, Lorenzo Cerini wrote:

 The trouble is that these are SATA disks, not SCSI.
 So I have no smartctl (or at least that is what smartctl tells me).

 Doesn't smartctl work with SATA drives yet? I thought that had been
 solved.

It works here on 10.2, but the OP is using 10.0. It seems that it was
not fixed there.

Regards,
Chris


It used to require a -d ata argument.  The OP should try that.
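
With the device type forced, the attribute table can be checked for remapping activity. A sketch, assuming the usual smartctl -A table layout, where the attribute name is the second column and the raw value the tenth. The function name and the choice of attributes are illustrative only:

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print
# name=raw_value for the attributes that track bad or remapped sectors.
smart_remap_summary() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2 "=" $10 }'
}

# Usage (as root):
#   smartctl -A -d ata /dev/sdb | smart_remap_summary
```

Non-zero raw values here would point at a disk that is already remapping sectors or waiting to.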

Also, SATA drives do not reallocate on read, only on write.

Since the OP has the sector #, he should use dd to read in the sector
from the good drive to a temp file.  Then use dd to write it back out
to the failed drive.  In theory the bad drive will see that someone is
writing to a bad sector and re-map it to one of the spare sectors.
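
Roughly, that would be two dd commands. A sketch that only prints them, assuming 512-byte sectors, a sector number counted from the start of the whole disk (as in the end_request log line), and two disks partitioned identically so that the same sector holds the same mirrored data. The helper name and the temp file path are made up:

```shell
# Hypothetical helper: print (not run) the two dd commands.
# Arguments: good disk, bad disk, sector number, optional temp file.
rewrite_sector_cmds() {
    good=$1
    bad=$2
    sector=$3
    tmp=${4:-/tmp/sector.bin}
    echo "dd if=$good of=$tmp bs=512 skip=$sector count=1"   # read from the good disk
    echo "dd if=$tmp of=$bad bs=512 seek=$sector count=1"    # write to the bad disk
}

rewrite_sector_cmds /dev/sda /dev/sdb 25690407
```

A wrong sector number or swapped device names would overwrite good data, so the printed commands are worth double-checking before running them by hand.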

FYI: There was some discussion about mdraid doing this automatically
on a failed read, but I don't think it has been implemented yet.

Greg
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century



Re: [opensuse] raid question

2007-06-22 Thread Carlos E. R.

The Friday 2007-06-22 at 16:52 -0400, Greg Freemyer wrote:

 Also, SATA drives do not reallocate on read, only on write.

Same as PATA. It is done by the disk hardware, no cpu intervention.


 Since the OP has the sector #, he should use dd to read in the sector
 from the good drive to a temp file.  Then use dd to write it back out
 to the failed drive.  In theory the bad drive will see that someone is
 writing to a bad sector and re-map it to one of the spare sectors.

That's not possible; it seems you haven't read previous posts:

| Jun 19 03:16:57 axis kernel: sdb: Current: sense key: Medium Error
| Jun 19 03:16:57 axis kernel: Additional sense: Unrecovered read error - auto reallocate failed
| Jun 19 03:16:57 axis kernel: end_request: I/O error, dev sdb, sector 25690407

Remapping has already failed.

- -- 
Cheers,
   Carlos E. R.



Re: [opensuse] raid question

2007-06-22 Thread Greg Freemyer

On 6/22/07, Carlos E. R. [EMAIL PROTECTED] wrote:


The Friday 2007-06-22 at 16:52 -0400, Greg Freemyer wrote:

 Also, SATA drives do not reallocate on read, only on write.

Same as PATA. It is done by the disk hardware, no cpu intervention.


 Since the OP has the sector #, he should use dd to read in the sector
 from the good drive to a temp file.  Then use dd to write it back out
 to the failed drive.  In theory the bad drive will see that someone is
 writing to a bad sector and re-map it to one of the spare sectors.

That's not possible; it seems you haven't read previous posts:

| Jun 19 03:16:57 axis kernel: sdb: Current: sense key: Medium Error
| Jun 19 03:16:57 axis kernel: Additional sense: Unrecovered read error - auto reallocate failed
| Jun 19 03:16:57 axis kernel: end_request: I/O error, dev sdb, sector 25690407

Remapping has already failed.


failed on a read the way I read it.  I suggested to do a write.  I
don't know what subsystem generated the above.  Maybe the dmraid layer
tried a write after the failed read?  Don't know, but I would still
try to do a write manually via dd.

FYI: I don't think the SATA error code interpretation by the SCSI
layer is 100% accurate, so I would not trust anything SATA-related
that is being reported by the SCSI layer that libata is currently
kludged underneath.  Hopefully someday libata will become its own
full-fledged subsystem without any of the scsi core code causing confusion.

Greg
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century



Re: [opensuse] raid question

2007-06-22 Thread Carlos E. R.

The Friday 2007-06-22 at 18:33 -0400, Greg Freemyer wrote:

 On 6/22/07, Carlos E. R. robin.listas@ wrote:

  | Jun 19 03:16:57 axis kernel: sdb: Current: sense key: Medium Error
  | Jun 19 03:16:57 axis kernel: Additional sense: Unrecovered read error - auto reallocate failed
  | Jun 19 03:16:57 axis kernel: end_request: I/O error, dev sdb, sector 25690407
 
  Remapping has already failed.
 
 failed on a read the way I read it.  

That is irrelevant: remapping was triggered and failed.

 I suggested to do a write.  I don't know what subsystem generated the 
 above.  Maybe the dmraid layer tried a write after the failed read?  
 Don't know, but I would still try to do a write manually via dd.

Something tried remapping and failed, and that is the important thing. If 
it is the HD remapping that failed, as I think it is, then the failure is 
crucial and the HD needs replacing ASAP, no toying.

In fact, trying to write to that sector will probably fail and the 
remapping will fail, too. Should fail.

That's why reading the SMART log is so important in this case. If he has 
10.0 and smartctl can't read it, then he should use a 10.2 rescue system 
and read that log.

Or replace the disk first, investigate later.

- -- 
Cheers,
   Carlos E. R.




Re: [opensuse] raid question

2007-06-22 Thread Greg Freemyer

On 6/22/07, Carlos E. R. [EMAIL PROTECTED] wrote:


The Friday 2007-06-22 at 18:33 -0400, Greg Freemyer wrote:

 On 6/22/07, Carlos E. R. robin.listas@ wrote:

  | Jun 19 03:16:57 axis kernel: sdb: Current: sense key: Medium Error
  | Jun 19 03:16:57 axis kernel: Additional sense: Unrecovered read error - auto reallocate failed
  | Jun 19 03:16:57 axis kernel: end_request: I/O error, dev sdb, sector 25690407
 
  Remapping has already failed.

 failed on a read the way I read it.

That is irrelevant: remapping was triggered and failed.

 I suggested to do a write.  I don't know what subsystem generated the
 above.  Maybe the dmraid layer tried a write after the failed read?
 Don't know, but I would still try to do a write manually via dd.

Something tried remapping and failed, and that is the important thing. If
it is the HD remapping that failed, as I think it is, then the failure is
crucial and the HD needs replacing ASAP, no toying.

In fact, trying to write to that sector will probably fail and the
remapping will fail, too. Should fail.

That's why reading the SMART log is so important in this case. If he has
10.0 and smartctl can't read it, then he should use a 10.2 rescue system
and read that log.

Or replace the disk first, investigate later.


I googled the error message.

Found it in 
http://tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-22.html

So it appears to be coming out of the SCSI layer that sits above
libata.  As I said before, I would not trust those error messages
since there is not a good mapping of ATA errors into the SCSI world.
Checking smart logs makes sense, but based on that single error I
would not be replacing hardware.

Greg
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century