Re: [gentoo-user] dying hard drive

2010-07-22 Thread Paul Hartman

On Thu, Jul 22, 2010 at 1:11 AM, Mick wrote:
> On Thursday 22 July 2010 05:14:08 David Relson wrote:
>> /var/log/messages has indicated a slew of XFS problems on an external
>> USB hard drive (see attachment).  These look pretty fatal.  Anybody
>> think the file system is recoverable?
>
> You'll have to try to recover it, to see if it is possible:  xfs is vulnerable
> to power interruptions, so a faulty USB cable can cause corruption.

I had exactly this problem with a USB HDD formatted with xfs. The USB
cable that it came with was rubbish... the drive would disconnect &
reconnect on its own for no apparent reason, and corruption happened
of course. I replaced it with another cable and it worked fine after
that.

A few months later the power supply started to fail; it would
occasionally not provide enough power and the drive would go offline
or start beeping/clicking. At first I thought the disk was bad
(clicking is never good) but it was actually the sound of the drive
trying to spin up and failing. Eventually the power brick couldn't
even spin up the drive at all. I replaced the power supply and now the
drive works fine again, for now...

Re: [gentoo-user] dying hard drive

2010-07-22 Thread Mick

On Thursday 22 July 2010 05:14:08 David Relson wrote:
> /var/log/messages has indicated a slew of XFS problems on an external
> USB hard drive (see attachment).  These look pretty fatal.  Anybody
> think the file system is recoverable?

You'll have to try to recover it, to see if it is possible:  xfs is vulnerable 
to power interruptions, so a faulty USB cable can cause corruption.  I haven't 
had a corrupted xfs system for years now, so I put initial experiences down to 
early (buggy) versions of the drivers.  In my case, I was not able to recover 
and I had to reformat and start again.  After a couple of early mortality 
cases the fs in question carried on for 4 years without a single problem.

Try xfs_check and xfs_repair with the drive unmounted, but first use 
xfsdump/xfsrestore or dd to make a backup just in case.
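A minimal sketch of that order of operations, assuming the damaged partition is /dev/sdb1 (as in David's log) and that another disk has room for a full image of it. The helper names image_partition and check_image are hypothetical. dd copies the partition and carries on past unreadable blocks; xfs_repair then inspects the copy in no-modify mode (-n), with -f telling it the target is an image file rather than a device. Since the mount in the log died during log recovery, xfs_repair may refuse to run until the log is dealt with. Its -L option zeroes the log at the cost of whatever metadata changes it held, so try that on the copy before touching the real partition.

```shell
#!/bin/sh
# Sketch only: SRC and IMG are placeholders (e.g. /dev/sdb1 and a file on
# another disk with enough free space). Run as root with the drive unmounted.

# Copy SRC to IMG block by block. conv=noerror keeps dd going after a read
# error; conv=sync pads any short or failed read out to a full block of
# zeros so later data stays at the right offset. A small block size limits
# how much gets zeroed around each bad spot.
image_partition() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=4096 conv=noerror,sync
}

# Inspect the image without changing it: -n is no-modify mode and -f tells
# xfs_repair that the target is a regular file, not a block device.
check_image() {
    xfs_repair -n -f "$1"
}
```

Usage, as root with the drive unmounted: `image_partition /dev/sdb1 /mnt/spare/sdb1.img && check_image /mnt/spare/sdb1.img` (the image path is a placeholder).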

> Also, palimpsest is reporting (graphically) that my external hard drive is
> about to die.  Can I save its report to a text file???

Sorry, can't help with that because I'm not familiar with the application.  
You could use sys-apps/smartmontools if you want a console application that 
you can copy and paste from.
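On the palimpsest question: smartctl prints its report to standard output, so saving it as text is a matter of redirecting it. A minimal sketch follows; the function name save_smart_report is hypothetical, and whether SMART data gets through the FreeAgent's USB bridge at all is an assumption. Many USB enclosures need an explicit device type such as `-d sat`, and some older bridges do not pass SMART commands through.

```shell
#!/bin/sh
# Save a full SMART report for DEVICE into FILE.
# DEVTYPE is optional; USB enclosures often need "sat", but that depends on
# the bridge chip (an assumption, not something the thread confirms).
save_smart_report() {
    dev=$1 out=$2 devtype=${3:-}
    if [ -n "$devtype" ]; then
        smartctl -a -d "$devtype" "$dev" > "$out"
    else
        smartctl -a "$dev" > "$out"
    fi
    # smartctl's exit status is a bit mask; nonzero can simply mean the disk
    # has logged errors, so report it instead of treating it as fatal.
    status=$?
    echo "smartctl exit status: $status (report written to $out)" >&2
    return 0
}
```

For example `save_smart_report /dev/sdb sdb-smart.txt sat`. The exit status is reported rather than treated as a failure because a dying disk will usually set some of smartctl's status bits even when the report itself was read fine.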
-- 
Regards,
Mick

[gentoo-user] dying hard drive

2010-07-21 Thread David Relson

/var/log/messages has indicated a slew of XFS problems on an external
USB hard drive (see attachment).  These look pretty fatal.  Anybody
think the file system is recoverable?

Also, palimpsest is reporting (graphically) that my external hard drive is
about to die.  Can I save its report to a text file???

Jul 21 23:53:23 osage kernel: usb 2-1: new high speed USB device using ehci_hcd and address 2
Jul 21 23:53:23 osage kernel: usb 2-1: New USB device found, idVendor=0bc2, idProduct=3001
Jul 21 23:53:23 osage kernel: usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jul 21 23:53:23 osage kernel: usb 2-1: Product: FreeAgent
Jul 21 23:53:23 osage kernel: usb 2-1: Manufacturer: Seagate
Jul 21 23:53:23 osage kernel: usb 2-1: SerialNumber: 2GEX0DP4
Jul 21 23:53:23 osage kernel: scsi4 : usb-storage 2-1:1.0
Jul 21 23:53:24 osage kernel: scsi 4:0:0:0: Direct-Access Seagate FreeAgent102D PQ: 0 ANSI: 4
Jul 21 23:53:24 osage kernel: sd 4:0:0:0: Attached scsi generic sg1 type 0
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] Write Protect is off
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] Mode Sense: 1c 00 00 00
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] Assuming drive cache: write through
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] Assuming drive cache: write through
Jul 21 23:53:28 osage kernel: sdb: sdb1
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] Assuming drive cache: write through
Jul 21 23:53:28 osage kernel: sd 4:0:0:0: [sdb] Attached SCSI disk

Jul 21 23:54:18 osage kernel: XFS: bad magic number
Jul 21 23:54:18 osage kernel: XFS: SB validate failed
Jul 21 23:54:36 osage kernel: XFS mounting filesystem sdb1
Jul 21 23:54:36 osage kernel: Starting XFS recovery on filesystem: sdb1 (logdev: internal)

Jul 21 23:55:12 osage kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1544 of file fs/xfs/xfs_alloc.c.  Caller 0x81122bf8
Jul 21 23:55:12 osage kernel: Pid: 4415, comm: mount Not tainted 2.6.34-gentoo-r1 #1
Jul 21 23:55:12 osage kernel: Call Trace:
Jul 21 23:55:12 osage kernel: [] ? xfs_free_extent+0x7d/0x94
Jul 21 23:55:12 osage kernel: [] ? xfs_free_ag_extent+0x42e/0x662
Jul 21 23:55:12 osage kernel: [] ? xfs_free_extent+0x7d/0x94
Jul 21 23:55:12 osage kernel: [] ? xfs_trans_get_efd+0x21/0x29
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_process_efi+0x113/0x171
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_process_efis+0x4d/0x8a
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_finish+0x14/0xac
Jul 21 23:55:12 osage kernel: [] ? xfs_mountfs+0x48f/0x556
Jul 21 23:55:12 osage kernel: [] ? kmem_zalloc+0xd/0x28
Jul 21 23:55:12 osage kernel: [] ? xfs_mru_cache_create+0x111/0x14c
Jul 21 23:55:12 osage kernel: [] ? xfs_fs_fill_super+0x199/0x300
Jul 21 23:55:12 osage kernel: [] ? get_sb_bdev+0x125/0x16d
Jul 21 23:55:12 osage kernel: [] ? xfs_fs_fill_super+0x0/0x300
Jul 21 23:55:12 osage kernel: [] ? vfs_kern_mount+0xaa/0x179
Jul 21 23:55:12 osage kernel: [] ? do_kern_mount+0x43/0xe1
Jul 21 23:55:12 osage kernel: [] ? do_mount+0x766/0x7e2
Jul 21 23:55:12 osage kernel: [] ? copy_from_user+0x13/0x25
Jul 21 23:55:12 osage kernel: [] ? sys_mount+0x84/0xc5
Jul 21 23:55:12 osage kernel: [] ? system_call_fastpath+0x16/0x1b
Jul 21 23:55:12 osage kernel: Filesystem "sdb1": XFS internal error xfs_trans_cancel at line 1161 of file fs/xfs/xfs_trans.c.  Caller 0x8114f0b9
Jul 21 23:55:12 osage kernel: 
Jul 21 23:55:12 osage kernel: Pid: 4415, comm: mount Not tainted 2.6.34-gentoo-r1 #1
Jul 21 23:55:12 osage kernel: Call Trace:
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_process_efi+0x163/0x171
Jul 21 23:55:12 osage kernel: [] ? xfs_trans_cancel+0x56/0xd3
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_process_efi+0x163/0x171
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_process_efis+0x4d/0x8a
Jul 21 23:55:12 osage udevd-work[3570]: '/bin/mount -a' unexpected exit with status 0x000b
Jul 21 23:55:12 osage kernel: [] ? xlog_recover_finish+0x14/0xac
Jul 21 23:55:12 osage kernel: [] ? xfs_mountfs+0x48f/0x556
Jul 21 23:55:12 osage kernel: [] ? kmem_zalloc+0xd/0x28
Jul 21 23:55:12 osage kernel: [] ? xfs_mru_cache_create+0x111/0x14c
Jul 21 23:55:12 osage kernel: [] ? xfs_fs_fill_super+0x199/0x300
Jul 21 23:55:12 osage kernel: [] ? get_sb_bdev+0x125/0x16d
Jul 21 23:55:12 osage kernel: [] ? xfs_fs_fill_super+0x0/0x300
Jul 21 23:55:12 osage kernel: [] ? vfs_kern_mount+0xaa/0x179
Jul 21 23:55:12 osage kernel: [] ? do_kern_mount+0x43/0xe1
Jul 21 23:55:12 osage kernel: [] ? do_mount+0x766/0x7e2
Jul 21 23:55:12 osage kernel: [] ? copy_from_user+0x13/0x25
Jul 21 23:55:12 osage kernel: [] ? sys_mount+0x84/0xc5
Jul 21 23:55:12 osage kernel: [] ? system_call_fastpath+0x16/0x1b
Jul 21 23:55:12 osage kernel: xfs_force_shutdown(sdb1,0x8) called from line 1162 of file fs/xfs/xfs_trans.c.  Return address = 0x81156cd5
Jul 21 23:55:12 osage kernel:

Re: [gentoo-user] dying hard drive?

2006-01-19 Thread matthew . garman

On Fri, Jan 13, 2006 at 06:15:20PM -0700, Richard Fish wrote:
> I was able to resurrect a drive with a similar problem with:
> dd if=/dev/zero of=/dev/hda bs=32k
> You can then check that the drive is working with:
> dd if=/dev/hda of=/dev/null bs=32k
> 
> If either command fails, then it is time to replace the drive.  In
> my case, that drive was still working perfectly 18 months later
> when I sold it to someone else.

I don't think that's going to work for me:

# dd if=/dev/zero of=/dev/hda bs=32k
dd: writing `/dev/hda': No space left on device
4884091+0 records in
4884090+0 records out

# dd if=/dev/hda of=/dev/null bs=32k
dd: reading `/dev/hda': Input/output error
3229627+1 records in
3229627+1 records out
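The record counts can be turned into disk positions. A hedged worked example, assuming 512-byte sectors (the unit the kernel's LBAsect values count in): the zeroing pass wrote 4884090 full 32 KiB records, i.e. 160,041,861,120 bytes, which is within one record of the 160,041,885,696-byte capacity smartctl reports. So the "No space left on device" error there is just dd reaching the end of the disk, which is normal when writing a whole block device; it is the read pass that failed. That pass stopped after 3229627 full records (plus one partial), around sector 3229627 × 64 = 206,696,128, only 86 sectors short of the LBAsect=206696214 in the original kernel log, so it appears to have hit the same bad spot. The helper names are hypothetical.

```shell
#!/bin/sh
# Convert a count of full dd records of BS bytes into a byte offset and a
# 512-byte sector number (hypothetical helpers; integer shell arithmetic,
# which needs 64-bit integers for disk-sized values).
records_to_bytes() {
    echo $(( $1 * $2 ))
}

records_to_sector() {
    echo $(( $1 * $2 / 512 ))
}
```

With the numbers above, `records_to_bytes 4884090 32768` gives 160041861120 and `records_to_sector 3229627 32768` gives 206696128.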

D'oh!

Time to find that RMA form!

Thanks for the help,
Matt

-- 
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
-- 
gentoo-user@gentoo.org mailing list

Re: [gentoo-user] dying hard drive?

2006-01-13 Thread Richard Fish

On 1/13/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> I keep getting hard drive errors in my kernel log/dmesg that have me
> worried.  From /var/log/kernel/current:
>
> Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> - Last output repeated 7 times -
> Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
> Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
> Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

These mean the blocks are corrupt, and cannot be read.  Whatever was
on those blocks is now lost.
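The bracketed words in those kernel lines are the bits of two ATA registers spelled out. A sketch of the decoding, assuming the standard ATA status-register bit layout and the names the old IDE driver printed for each bit: status=0x59 is 0x40 + 0x10 + 0x08 + 0x01, and error=0x40 is the uncorrectable-data bit. The function names are hypothetical.

```shell
#!/bin/sh
# Decode an ATA status byte into the words the old Linux IDE driver logged.
# Assumed bit layout of the ATA status register:
# 0x80 BSY, 0x40 DRDY, 0x20 DF, 0x10 DSC, 0x08 DRQ, 0x04 CORR, 0x02 IDX, 0x01 ERR
decode_ata_status() {
    s=$(( $1 ))
    if [ $(( s & 0x80 )) -ne 0 ]; then
        # While BSY is set the other bits are not meaningful.
        echo "Busy"
        return
    fi
    out=""
    [ $(( s & 0x40 )) -ne 0 ] && out="$out DriveReady"
    [ $(( s & 0x20 )) -ne 0 ] && out="$out DeviceFault"
    [ $(( s & 0x10 )) -ne 0 ] && out="$out SeekComplete"
    [ $(( s & 0x08 )) -ne 0 ] && out="$out DataRequest"
    [ $(( s & 0x04 )) -ne 0 ] && out="$out CorrectedError"
    [ $(( s & 0x02 )) -ne 0 ] && out="$out Index"
    [ $(( s & 0x01 )) -ne 0 ] && out="$out Error"
    echo "${out# }"
}

# In the error register, 0x40 is the uncorrectable-data (UNC) bit that the
# driver printed as "UncorrectableError".
is_uncorrectable() {
    [ $(( $1 & 0x40 )) -ne 0 ]
}
```

`decode_ata_status 0x59` prints `DriveReady SeekComplete DataRequest Error`, matching the log: the drive was ready and had finished seeking, but the command ended with the error bit set, and the error register says why.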

> On the drive.  Apparently, an error was found (details below).  I'm
> not sure if this drive is actually dying, though, as the following
> article (by the smartmontools author) suggests that one or two
> errors on a drive is nothing to worry about.  Also, the SMART
> overall-health self-assessment test comes back as PASSED.

I was able to resurrect a drive with a similar problem with:

dd if=/dev/zero of=/dev/hda bs=32k

!DANGER! The above command will destroy all data on the drive. But by
writing to those sectors you can make the drive remap them to spare
sectors reserved for that purpose.

You can then check that the drive is working with:

dd if=/dev/hda of=/dev/null bs=32k

If either command fails, then it is time to replace the drive.  In my
case, that drive was still working perfectly 18 months later when I
sold it to someone else.

In any case, time to make sure you have a good backup.

-Richard


Re: [gentoo-user] dying hard drive?

2006-01-13 Thread Willie Wong

On Fri, Jan 13, 2006 at 03:39:46PM -0600, Penguin Lover [EMAIL PROTECTED] 
squawked:
> 
> I keep getting hard drive errors in my kernel log/dmesg that have me
> worried.  From /var/log/kernel/current:
> 
> Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> - Last output repeated 7 times -
> Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
> Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
> Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> 

Do you run SMARTD? If you do, did it complain? 
(grep SMART /var/log/everything/*)

Usually UncorrectableError means that some spots on your hard drive are
not readable. And if it keeps complaining, it might be a sign that
something is wrong with your drive. (Of course, it could also be flaky
connectors.)

Maybe you can take a look at 
http://www.samsung.com/Products/HardDiskDrive/troubleshooting/index.htm

A lot of times you get one or two bad sectors due to environmental
issues: a power blip, for one, and my roommate slamming the door too
hard on his way out, for another. If that is the case, most hard drive
vendors provide a diagnostic tool that lets you remap those few
sectors to spare ones on the disk. (Yes, drives keep a few extra
sectors just for that purpose.)
> 
> The drive is a 160 GB PATA Samsung.  It's about two or three years
> old, running 24x7 (although lightly).  The drive has three
> partitions, all are ext3.


> 
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       00%      11486         262886799
> # 2  Short offline       Completed without error       00%      11483         -

W
-- 
Statistics are like a Bikini: 
  showing interesting details but hiding the important stuff.
Sortir en Pantoufles: up 62 days, 14:29

Re: [gentoo-user] dying hard drive?

2006-01-13 Thread Tim Igoe

[EMAIL PROTECTED] wrote:

> I keep getting hard drive errors in my kernel log/dmesg that have me
> worried.  From /var/log/kernel/current:
>
> Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
> - Last output repeated 7 times -
> Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
> Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
> Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

I noticed exactly the same message less than an hour before my Maxtor
DiamondMax 9 packed in just before Xmas. Annoyingly, my drive wouldn't
mount the main data partition, but everything else seemed intact. I
managed to recover all my data from the drive using dd once I had a new
drive.

I'd recommend backing up anything that's essential on the drive and
preparing for it to give up the ghost.


[gentoo-user] dying hard drive?

2006-01-13 Thread matthew . garman


I keep getting hard drive errors in my kernel log/dmesg that have me
worried.  From /var/log/kernel/current:

Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
- Last output repeated 7 times -
Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }


The drive is a 160 GB PATA Samsung.  It's about two or three years
old, running 24x7 (although lightly).  The drive has three
partitions, all are ext3.

When I started seeing the above messages, I ran 

fsck.ext3 -f -v -c -c /dev/hda?

on all three partitions.  Note that the "-c" flag includes the bad
blocks check.

I also ran

smartctl -t long /dev/hda

on the drive.  Apparently, an error was found (details below).  I'm
not sure if this drive is actually dying, though, as the following
article (by the smartmontools author) suggests that one or two
errors on a drive are nothing to worry about.  Also, the SMART
overall-health self-assessment test comes back as PASSED.

http://www.linuxjournal.com/article/6983

But the constant kernel messages, along with the error in the "long"
SMART test, concern me.  At this point, I'm not really sure what my
next steps should be, so I'm looking for any suggestions or advice.
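The PASSED verdict and the long self-test answer different questions: PASSED is the drive's own overall assessment based on its attribute thresholds, while the extended self-test log records an actual read failure together with the first LBA it could not read. A hedged sketch of pulling that LBA out of `smartctl -l selftest` output, assuming the layout of the self-test log lines quoted in Willie's reply (LBA as the last field of a "read failure" line); the function name first_error_lbas is hypothetical.

```shell
#!/bin/sh
# Print the LBA_of_first_error for each self-test that ended in a read
# failure, reading `smartctl -l selftest` output on stdin.
first_error_lbas() {
    awk '/read failure/ { print $NF }'
}
```

For this drive, `smartctl -l selftest /dev/hda | first_error_lbas` would be expected to print 262886799, the LBA from the failed extended test.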

Thanks!
Matt



# smartctl -a /dev/hda

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SP1614N
Serial Number:    0642J1FW903226
Firmware Version: TM100-24
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Fri Jan 13 15:24:27 2006 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline
data collection:                 (5760) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  96) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   061   061   000    Pre-fail  Always       -       6528
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       73
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   098   098   000    Old_age   Always       -       11505h+32m
 10 Spin_Retry_Count        0x0013   253   253   049    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   163   127   000    Old_age   Always       -       25
195 Hardware_ECC_Recovered  0x000a   100   100   000    Old_age   Always       -       265460048
196 Reallocated_Event_Count 0x0012   100   100   000
