Re: invalid superblock - *again*

2006-08-27 Thread Neil Brown
On Tuesday August 22, [EMAIL PROTECTED] wrote:
> On Tuesday, 22 August 2006 at 03:18, Neil Brown wrote:
> > >
> > > Most notable: [   38.536733] md: kicking non-fresh sdd1 from array!
> > > What does this mean?
> >
> > It means that the 'event' count on sdd1 is old compared to that on
> > the other partitions.  The most likely explanation is that when the
> > array was last running, sdd1 was not part of it.
> 
> Event count - so: a certain command or set of instructions was sent to all 
> disks, but one didn't get it, hence the raid module can't ensure that the 
> data on that disk is consistent with the rest of the array?
> 

Not exactly.  Events are things like starting and stopping the array,
adding or removing drives, drive failures, and clean <-> dirty
transitions.  If the event counts are not consistent, then when the
array was last stopped, at least one drive was missing from the
array. 
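The mismatch can be checked directly: mdadm -E prints an 'Events' line for
each member superblock, and a kicked member's count will lag the others. A
minimal sketch (device names are the ones from this thread; the loop needs
root and real devices, so the runnable part below only demonstrates
extracting the counter from sample -E output):

```shell
# On a live system (needs root; device names from this thread are assumed):
#   for d in /dev/sd[abcd]1; do
#       echo "$d: $(mdadm -E "$d" | awk '/Events/ {print $3}')"
#   done
#
# The same field extraction, demonstrated offline on an -E excerpt
# taken from later in this thread:
awk '/Events/ {print $3}' <<'EOF'
         Events : 0.765488
EOF
```

If one device reports a noticeably lower count than its peers, that is the
member md will kick as "non-fresh" at assembly time.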

> > > What's happening here? What can I do? Do I have to readd sdd and resync?
> > > Or is there an easier way out? What causes these issues?
> >
> > Yes, you need to add sdd1 back to the array and it will resync.
> 
> Ok, if that's what it takes.
> 
> > I would need some precise recent history of the array to know why this
> > happened.  That might not be easy to come by.
> 
> Depends on what exactly you mean. Disk age? SMART data? Hardware types? Logs? 
> OS?
> 

Complete kernel logs since a time when the array was known to be good
might be enough - then I could track all the 'events' and see where it
went out of sync.

> I don't have more than a few vague guesses about what might have happened. 
> First, it is possible that the file systems on the array were not 
> unmounted properly during shutdown because a remote NFS mount was hogging 
> them. If that were the case, LVM could not have shut down properly, the md 
> device would not have stopped, and the machine simply powered down.
> That would explain it.

Maybe, but even shutting down with the array still active shouldn't
cause the event counts to go out of sync.  It should just trigger a
resync.
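The distinction shows up in /proc/mdstat afterwards: an unclean shutdown
leaves the array marked dirty and md recomputes parity across all members
("resync"), whereas a member added back after being kicked is rebuilt from
the others ("recovery"). A hypothetical sketch of the two states (format
only; block counts and geometry taken from the mdadm -E output in this
thread, progress figures elided):

```shell
# Dirty array after unclean shutdown - all four members present, parity check:
#   md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
#         732563712 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
#         [=>...................]  resync = ...
#
# Re-added member being rebuilt - array stays degraded until recovery completes:
#   md0 : active raid5 sdd1[4] sdc1[2] sdb1[1] sda1[0]
#         732563712 blocks level 5, 32k chunk, algorithm 2 [4/3] [UUU_]
#         [=>...................]  recovery = ...
```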

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: invalid superblock - *again*

2006-08-22 Thread Dexter Filmore
On Tuesday, 22 August 2006 at 03:18, Neil Brown wrote:
> >
> > Most notable: [   38.536733] md: kicking non-fresh sdd1 from array!
> > What does this mean?
>
> It means that the 'event' count on sdd1 is old compared to that on
> the other partitions.  The most likely explanation is that when the
> array was last running, sdd1 was not part of it.

Event count - so: a certain command or set of instructions was sent to all 
disks, but one didn't get it, hence the raid module can't ensure that the 
data on that disk is consistent with the rest of the array?

> > What's happening here? What can I do? Do I have to readd sdd and resync?
> > Or is there an easier way out? What causes these issues?
>
> Yes, you need to add sdd1 back to the array and it will resync.

Ok, if that's what it takes.

> I would need some precise recent history of the array to know why this
> happened.  That might not be easy to come by.

Depends on what exactly you mean. Disk age? SMART data? Hardware types? Logs? 
OS?

I don't have more than a few vague guesses about what might have happened. 
First, it is possible that the file systems on the array were not 
unmounted properly during shutdown because a remote NFS mount was hogging 
them. If that were the case, LVM could not have shut down properly, the md 
device would not have stopped, and the machine simply powered down.
That would explain it.
Slackware has no RAID runlevel scripts out of the box; I wrote them myself.
Maybe such conditions are not handled properly.

What speaks against that theory is that nfsd is stopped before LVM and RAID 
are handled.

-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com


Re: invalid superblock - *again*

2006-08-21 Thread Neil Brown
On Tuesday August 22, [EMAIL PROTECTED] wrote:
> On Monday, 21 August 2006 at 13:04, Dexter Filmore wrote:
> I seriously don't know what's going on here.
> I upgraded packages and rebooted the machine to find that now disk 4 of 4 is 
> not assembled.
> 
> Here's dmesg and mdadm -E 
> 
> * dmesg **
> [   38.439644] md: md0 stopped.
> [   38.536089] md: bind
> [   38.536301] md: bind
> [   38.536501] md: bind
> [   38.536702] md: bind
> [   38.536733] md: kicking non-fresh sdd1 from array!
> [   38.536751] md: unbind<sdd1>
> [   38.536765] md: export_rdev(sdd1)
> [   38.536794] raid5: device sda1 operational as raid disk 0
> [   38.536812] raid5: device sdc1 operational as raid disk 2
> [   38.536831] raid5: device sdb1 operational as raid disk 1
> [   38.537453] raid5: allocated 4195kB for md0
> [   38.537471] raid5: raid level 5 set md0 active with 3 out of 4 devices, 
> algorithm 2
> [   38.537499] RAID5 conf printout:
> [   38.537513]  --- rd:4 wd:3 fd:1
> [   38.537528]  disk 0, o:1, dev:sda1
> [   38.537543]  disk 1, o:1, dev:sdb1
> [   38.537558]  disk 2, o:1, dev:sdc1
> *
> 
> Most notable: [   38.536733] md: kicking non-fresh sdd1 from array!
> What does this mean?

It means that the 'event' count on sdd1 is old compared to that on
the other partitions.  The most likely explanation is that when the
array was last running, sdd1 was not part of it.

> 
> What's happening here? What can I do? Do I have to readd sdd and resync? Or 
> is 
> there an easier way out? What causes these issues?
> 

Yes, you need to add sdd1 back to the array and it will resync.
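Concretely, that would look something like the following (a sketch only, not
verified against this exact setup; it needs root, and assumes md0 and sdd1 as
named in this thread):

```shell
# Put the kicked member back; md starts a recovery onto it:
mdadm /dev/md0 --add /dev/sdd1

# Watch rebuild progress:
cat /proc/mdstat
```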

I would need some precise recent history of the array to know why this
happened.  That might not be easy to come by.

NeilBrown


Re: invalid superblock - *again*

2006-08-21 Thread Dexter Filmore
On Monday, 21 August 2006 at 13:04, Dexter Filmore wrote:
I seriously don't know what's going on here.
I upgraded packages and rebooted the machine to find that now disk 4 of 4 is 
not assembled.

Here's dmesg and mdadm -E 

* dmesg **
[   38.439644] md: md0 stopped.
[   38.536089] md: bind
[   38.536301] md: bind
[   38.536501] md: bind
[   38.536702] md: bind
[   38.536733] md: kicking non-fresh sdd1 from array!
[   38.536751] md: unbind<sdd1>
[   38.536765] md: export_rdev(sdd1)
[   38.536794] raid5: device sda1 operational as raid disk 0
[   38.536812] raid5: device sdc1 operational as raid disk 2
[   38.536831] raid5: device sdb1 operational as raid disk 1
[   38.537453] raid5: allocated 4195kB for md0
[   38.537471] raid5: raid level 5 set md0 active with 3 out of 4 devices, 
algorithm 2
[   38.537499] RAID5 conf printout:
[   38.537513]  --- rd:4 wd:3 fd:1
[   38.537528]  disk 0, o:1, dev:sda1
[   38.537543]  disk 1, o:1, dev:sdb1
[   38.537558]  disk 2, o:1, dev:sdc1
*

Most notable: [   38.536733] md: kicking non-fresh sdd1 from array!
What does this mean?

* mdadm -E /dev/sdd1 
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 7f103422:7be2c2ce:e67a70be:112a2914
  Creation Time : Tue May  9 01:11:41 2006
     Raid Level : raid5
    Device Size : 244187904 (232.88 GiB 250.05 GB)
     Array Size : 732563712 (698.63 GiB 750.15 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Tue Aug 22 01:42:36 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 33b2d59b - correct
         Events : 0.765488

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1

*

What's happening here? What can I do? Do I have to readd sdd and resync? Or is 
there an easier way out? What causes these issues?

