Re: Raid-10 mount at startup always has problem

2007-12-17 Thread Daniel L. Miller

Daniel L. Miller wrote:

Doug Ledford wrote:

Nah.  Even if we had concluded that udev was to blame here, I'm not
entirely certain that we hadn't left Daniel with the impression that we
suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm
sure no one has given him a fix for the problem (although Neil did
request a change that will give debug output, but not solve the
problem), so not dropping it entirely would seem appropriate as well.
  
I've opened a bug report on Ubuntu's Launchpad.net.  Scott James 
Remnant asked me to cc him on Neil's incremental reference - we'll see 
what happens from here.


Thanks for the help guys.  At the moment, I've changed my mdadm.conf 
to explicitly list the drives, instead of the auto=partition 
parameter.  We'll see what happens on the next reboot.
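
For readers following along, a minimal sketch of what an mdadm.conf with an explicit device list might look like; the device names and array parameters below are placeholders, not Daniel's actual configuration:

  # /etc/mdadm.conf sketch: list the members explicitly instead of auto=partition
  # (device names, level and count are illustrative placeholders)
  DEVICE /dev/sda /dev/sdb /dev/sdc /dev/sdd
  ARRAY /dev/md0 level=raid10 num-devices=4 devices=/dev/sda,/dev/sdb,/dev/sdc,/dev/sdd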


I don't know if it means anything, but I'm using a self-compiled 
2.6.22 kernel - with initrd.  At least I THINK I'm using initrd - I 
have an image, but I don't see an initrd line in my grub config.  
Hmm... I'm going to add a stanza that includes the initrd and see what 
happens also.


Wow.  Been a while since I asked about this - I just realized a reboot 
or two has come and gone.  I checked my md status - everything was 
online!  Cool.


My current dmesg output:
sata_nv 0000:00:07.0: version 3.4
ACPI: PCI Interrupt Link [LTID] enabled at IRQ 23
ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [LTID] -> GSI 23 (level, high) -> IRQ 23

sata_nv 0000:00:07.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:07.0 to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xc20001428480 ctl 0xc200014284a0 bmdma 0x00011410 irq 23
ata2: SATA max UDMA/133 cmd 0xc20001428580 ctl 0xc200014285a0 bmdma 0x00011418 irq 23

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0x, segment boundary 0x, hw segs 61

scsi 1:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0x, segment boundary 0x, hw segs 61

ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LSI1] -> GSI 22 (level, high) -> IRQ 22

sata_nv 0000:00:08.0: Using ADMA mode
PCI: Setting latency timer of device 0000:00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xc2000142a480 ctl 0xc2000142a4a0 bmdma 0x00011420 irq 22
ata4: SATA max UDMA/133 cmd 0xc2000142a580 ctl 0xc2000142a5a0 bmdma 0x00011428 irq 22

ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata3: bounce limit 0x, segment boundary 0x, hw segs 61

scsi 3:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata4: bounce limit 0x, segment boundary 0x, hw segs 61

sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

sd 1:0:0:0: [sdb] 312581808 

Re: Raid-10 mount at startup always has problem

2007-11-01 Thread Bill Davidsen

Daniel L. Miller wrote:

Doug Ledford wrote:

Nah.  Even if we had concluded that udev was to blame here, I'm not
entirely certain that we hadn't left Daniel with the impression that we
suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm
sure no one has given him a fix for the problem (although Neil did
request a change that will give debug output, but not solve the
problem), so not dropping it entirely would seem appropriate as well.
  
I've opened a bug report on Ubuntu's Launchpad.net.  Scott James 
Remnant asked me to cc him on Neil's incremental reference - we'll see 
what happens from here.


Thanks for the help guys.  At the moment, I've changed my mdadm.conf 
to explicitly list the drives, instead of the auto=partition 
parameter.  We'll see what happens on the next reboot.


I don't know if it means anything, but I'm using a self-compiled 
2.6.22 kernel - with initrd.  At least I THINK I'm using initrd - I 
have an image, but I don't see an initrd line in my grub config.  
Hmm... I'm going to add a stanza that includes the initrd and see what 
happens also.



What did that do?

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Luca Berra

On Sun, Oct 28, 2007 at 08:21:34PM -0400, Bill Davidsen wrote:

Because you didn't stripe align the partition, your bad.
  
Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID 

the real stripe (track) size of the storage, you must read the manual
and/or bug technical support for that info.
you're about to create), or ??? I don't notice my FC6 or FC7 install 
programs using any special partition location to start, I have only run 
(tried to run) FC8-test3 for the live CD, so I can't say what it might 
do. CentOS4 didn't do anything obvious, either, so unless I really 
misunderstand your position at redhat, that would be your bad.  ;-)


If you mean start a partition on a pseudo-CHS boundary, fdisk seems to 
use what it thinks are cylinders for that.

Yes, fdisk will create partitions at sector 63 (due to CHS being braindead
as well as fictional: 63 sectors-per-track).
Most arrays use 64 or 128 spt, and array caches are aligned accordingly,
so 63 is almost always the wrong choice.

for the default choice you must consider what spt your array uses, iirc
(this is from memory, so double check these figures)
IBM 64 spt (i think)
EMC DMX 64
EMC CX 128???
HDS (and HP XP) except OPEN-V 96
HDS (and HP XP) OPEN-V 128
HP EVA 4/6/8 with XCS 5.x state that no alignment is needed, even though i
never found a technical explanation for that.
previous HP EVA versions did need it (maybe 64 spt).
you might then want to consider how data is laid out on the storage, but
i believe the storage cache is enough to deal with that issue.

Please note that 0 is always well aligned.

Note for people who are now wondering WTH i am talking about:

consider a storage with 64 spt, an io size of 4k and a partition starting
at sector 63.
the first io request will require two ios from the storage (one for sector 63,
and one for sectors 64 to 70)
the next 7 ios (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
on the same track
the 8th will again require to be split, and so on.
this causes the storage to do 1 unnecessary io every 8. YMMV.
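
To make the arithmetic above concrete, a tiny shell sketch (same assumptions: 64 spt, 8-sector requests, partition starting at sector 63) that counts how many of 64 consecutive requests straddle a track boundary:

  # count 8-sector (4k) requests that cross a 64-sector track boundary when
  # the partition starts at sector 63 (illustrative only)
  n=0
  for i in $(seq 0 63); do
      start=$((63 + i * 8)); end=$((start + 7))
      [ $((start / 64)) -ne $((end / 64)) ] && n=$((n + 1))
  done
  echo "$n of 64 requests are split"   # prints: 8 of 64 requests are split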

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Luca Berra

On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:

Doug Ledford wrote:

Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.
  
I've been re-reading this post numerous times - trying to ignore the 
burgeoning flame war :) - and this last sentence finally clicked with me.



I am sorry Daniel; when i read Doug and Bill stating that your issue
was not having a partition table, i immediately took the bait and forgot
about your original issue.
I have no reason to believe your problem is due to not having a
partition table on your devices.


sda: unknown partition table

sdb: unknown partition table

sdc: unknown partition table

sdd: unknown partition table

the above clearly shows that the kernel does not see a partition table
where there is none (the thing that happens in some cases and bit Doug so hard).
Note, it does not happen at random; it should happen only if you use a
partitioned md device with a superblock at the end, or if you configure
it wrongly as Doug did. (i am not accusing Doug of being stupid at all,
it is a fairly common mistake to make and we should try to prevent it
in mdadm as much as we can)
Again, having the kernel find a partition table where there is none
should not pose a problem at all, unless there is some badly designed software
like udev/hal that believes it knows better than you about what you have
on your disks.
But _NEITHER OF THESE IS YOUR PROBLEM_ imho.

I am also sorry to say that i fail to identify what the source of your
problem is; we should try harder instead of flaming each other.

Is it possible to reproduce it on the live system,
e.g. unmount, stop the array, start it again and mount?
I bet it will work flawlessly in this case.
Then i would disable starting this array at boot, and start it manually
when the system is up (stracing mdadm, so we can see what it does).
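
A rough sketch of those steps as commands; the md device name, mount point and trace path are examples, not taken from Daniel's setup:

  # reproduce on the live system: unmount, stop the array, reassemble under strace
  umount /mnt/raid                      # example mount point
  mdadm --stop /dev/md0
  strace -f -o /tmp/mdadm.trace mdadm --assemble --scan --verbose
  cat /proc/mdstat                      # check whether it came up with all members
  mount /mnt/raid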

I am also wondering about this:
md: md0: raid array is not clean -- starting background reconstruction
does your system shut down properly?
do you see the message about stopping md at the very end of the
reboot/halt process?

L.


--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Bill Davidsen

Luca Berra wrote:

On Sun, Oct 28, 2007 at 08:21:34PM -0400, Bill Davidsen wrote:

Because you didn't stripe align the partition, your bad.
  
Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID 

the real stripe (track) size of the storage, you must read the manual
and/or bug technical support for that info.


That's my point: there *is* no real stripe (track) size of the storage, 
because modern drives use zone bit recording, and sectors per track 
depend on the track and change within a partition. See

 http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm
 http://www.storagereview.com/guide2000/ref/hdd/op/mediaTracks.html
you're about to create), or ??? I don't notice my FC6 or FC7 install 
programs using any special partition location to start, I have only 
run (tried to run) FC8-test3 for the live CD, so I can't say what it 
might do. CentOS4 didn't do anything obvious, either, so unless I 
really misunderstand your position at redhat, that would be your 
bad.  ;-)


If you mean start a partition on a pseudo-CHS boundary, fdisk seems 
to use what it thinks are cylinders for that.
Yes, fdisk will create partition at sector 63 (due to CHS being 
braindead,

other than fictional: 63 sectors-per-track)
most arrays use 64 or 128 spt, and array cache are aligned accordingly.
So 63 is almost always the wrong choice.


As the above links show, there's no right choice.


for the default choice you must consider what spt your array uses, iirc
(this is from memory, so double check these figures)
IBM 64 spt (i think)
EMC DMX 64
EMC CX 128???
HDS (and HP XP) except OPEN-V 96
HDS (and HP XP) OPEN-V 128
HP EVA 4/6/8 with XCS 5.x state that no alignment is needed even if i
never found a technical explanation about that.
previous HP EVA versions did (maybe 64).
you might then want to consider how data is laid out on the storage, but
i believe the storage cache is enough to deal with that issue.

Please note that 0 is always well aligned.

Note to people who is now wondering WTH i am talking about.

consider a storage with 64 spt, an io size of 4k and partition starting
at sector 63.
first io request will require two ios from the storage (1 for sector 63,
and one for sectors 64 to 70)
the next 7 io (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
on the same track
the 8th will again require to be split, and so on.
this causes the storage to do 1 unnecessary io every 8. YMMV.
No one makes drives with fixed spt any more. Your assumptions are a 
decade out of date.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Doug Ledford
On Sun, 2007-10-28 at 20:21 -0400, Bill Davidsen wrote:
 Doug Ledford wrote:
  On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:

  On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
  
  The partition table is the single, (mostly) universally recognized
  arbiter of what possible data might be on the disk.  Having a partition
  table may not make mdadm recognize the md superblock any better, but it
  keeps all that other stuff from even trying to access data that it
  doesn't have a need to access and prevents random luck from turning your
  day bad.

  on a pc maybe, but that is 20 years old design.
  
 
  So?  Unix is 35+ year old design, I suppose you want to switch to Vista
  then?
 

  partition table design is limited because it is still based on C/H/S,
  which do not exist anymore.
  Put a partition table on a big storage, say a DMX, and enjoy a 20%
  performance decrease.
  
 
  Because you didn't stripe align the partition, your bad.

 Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID 
 you're about to create), or ??? I don't notice my FC6 or FC7 install 
 programs using any special partition location to start, I have only run 
 (tried to run) FC8-test3 for the live CD, so I can't say what it might 
 do. CentOS4 didn't do anything obvious, either, so unless I really 
 misunderstand your position at redhat, that would be your bad.  ;-)
 
 If you mean start a partition on a pseudo-CHS boundary, fdisk seems to 
 use what it thinks are cylinders for that.
 
 Please clarify what alignment provides a performance benefit.

Luca was specifically talking about the big multi-terabyte to petabyte
hardware arrays on the market.  DMX, DDN, and others.  When they export
a volume to the OS, there is an underlying stripe layout to that volume.
If you don't use any partition table at all, you are automatically
aligned with their stripes.  However, if you do, then you have to align
your partition on a chunk boundary, or else performance drops pretty
dramatically because more writes than not end up crossing chunk
boundaries unnecessarily.  It's only relevant when you are talking about
a raid device that shows the OS a single logical disk made from lots of
other disks.
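
Purely as an illustration of the aligned-partition case: with parted one can start the partition on a boundary that matches the array's chunk size. The 64-sector (32KiB) start and the device name below are assumptions; the correct value depends on the actual stripe/chunk geometry of the exported volume.

  # start the single partition on a 64-sector boundary (values are examples;
  # older parted versions may also want an fs-type argument to mkpart)
  parted -s /dev/sdx unit s mklabel msdos mkpart primary 64 312581807
  parted -s /dev/sdx unit s print      # verify the start sector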


-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Doug Ledford
On Mon, 2007-10-29 at 09:22 -0400, Bill Davidsen wrote:

  consider a storage with 64 spt, an io size of 4k and partition starting
  at sector 63.
  first io request will require two ios from the storage (1 for sector 63,
  and one for sectors 64 to 70)
  the next 7 io (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
  on the same track
  the 8th will again require to be split, and so on.
  this causes the storage to do 1 unnecessary io every 8. YMMV.
 No one makes drives with fixed spt any more. Your assumptions are a 
 decade out of date.

You're missing the point: it's not about drive tracks, it's about array
tracks, aka chunks.  A 64k write, that should write to one and only one
chunk, ends up spanning two.  That increases the amount of writing the
array has to do and the number of disks it busies for a typical single
I/O operation.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Doug Ledford
On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote:
 On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
 Doug Ledford wrote:
 Anyway, I happen to *like* the idea of using full disk devices, but the
 reality is that the md subsystem doesn't have exclusive ownership of the
 disks at all times, and without that it really needs to stake a claim on
 the space instead of leaving things to chance IMO.

 I've been re-reading this post numerous times - trying to ignore the 
 burgeoning flame war :) - and this last sentence finally clicked with me.
 
 I am sorry Daniel, when i read Doug and Bill, stating that your issue
 was not having a partition table, i immediately took the bait and forgot
 about your original issue.

I never said *his* issue was lack of a partition table, I just said I
don't recommend that because it's flaky.  The last statement I made
about his issue was to ask whether the problem was happening
during initrd time or sysinit time, to identify whether it was failing
before or after / was mounted and so narrow down where the issue might
lie.  Then we got off on the tangent about partitions, and at the same
time Neil started asking about udev, at which point it came out that
he's running ubuntu, and as much as I would like to help, the fact of
the matter is that I've never touched ubuntu and wouldn't have the
faintest clue, so I let Neil handle it.  At which point he found that
the udev scripts in ubuntu are being stupid, and from the looks of it
are the cause of the problem.  So, I've considered the initial issue
root caused for a bit now.


 like udev/hal that believes it knows better than you about what you have
 on your disks.
 but _NEITHER OF THESE IS YOUR PROBLEM_ imho

Actually, it looks like udev *is* the problem, but not because of
partition tables.

 I am also sorry to say that i fail to identify what the source of your
 problem is, we should try harder instead of flaming between us.

We can do both, or at least I can :-P

 Is it possible to reproduce it on the live system
 e.g. unmount, stop array, start it again and mount.
 I bet it will work flawlessly in this case.
 then i would disable starting this array at boot, and start it manually
 when the system is up (stracing mdadm, so we can see what it does)
 
 I am also wondering about this:
 md: md0: raid array is not clean -- starting background reconstruction
 does your system shut down properly?
 do you see the message about stopping md at the very end of the
 reboot/halt process?

The root cause is that as udev adds the sata devices one at a time, on
each add of a sata device it invokes mdadm to see if there is an array
to start, and it doesn't use incremental mode on mdadm.  As a result, as
soon as there are 3 out of the 4 disks present, mdadm starts the array
in degraded mode.  It's probably a race between the mdadm started on the
third disk and the mdadm started on the fourth disk that results in the
message about being unable to set the array info.  The one losing the
race gets the error as the other one has already manipulated the array
(for example, the 4th disk's mdadm could be trying to add the first disk
to the array, but it's already there, so it gets this error and bails).

So, as much as you might dislike mkinitrd since 5.0, Luca, it doesn't
have this particular problem ;-)  In the initrd we produce, it loads all
the SCSI/SATA/etc drivers first, then calls mkblkdevs, which forces all
of the devices to appear in /dev, and only then does it start the
mdadm/lvm configuration.  Daniel, I make no promises whatsoever that
this will even work at all, as it may fail to load modules or hit all
other sorts of weirdness, but if you want to test the theory, you can
download the latest mkinitrd from fedoraproject.org, use it to create an
initrd image under some name other than your default image name, then
manually edit your boot config to have an extra stanza that uses the
mkinitrd-generated initrd image instead of the ubuntu image, and then
just see if it brings the md device up cleanly instead of in degraded
mode.  That should be a fairly quick and easy way to test whether Neil's
analysis of the udev script was right.
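
For anyone who wants to poke at the assembly-ordering theory by hand on a running (non-boot-critical) array, a rough sketch using the incremental mode Neil mentions; the device and array names are examples and this assumes a reasonably recent mdadm:

  # stop the array, then feed the members in one at a time the way a
  # well-behaved udev rule would; incremental mode only starts the array
  # once it is complete
  mdadm --stop /dev/md0
  for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
      mdadm --incremental "$d"
      cat /proc/mdstat               # watch it stay inactive until the last member
  done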

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Gabor Gombas
On Mon, Oct 29, 2007 at 08:41:39AM +0100, Luca Berra wrote:

 consider a storage with 64 spt, an io size of 4k and partition starting
 at sector 63.
 first io request will require two ios from the storage (1 for sector 63,
 and one for sectors 64 to 70)
 the next 7 io (71-78,79-86,87-94,95-102,103-110,111-118,119-126) will be
 on the same track
 the 8th will again require to be split, and so on.
 this causes the storage to do 1 unnecessary io every 8. YMMV.

That's only true for random reads. If the OS does sufficient read-ahead
then sequential reads are affected much less. But the killers are
misaligned random writes, since then (considering RAID5/6 for simplicity)
the stripe has to be read from all component disks before it can be
written back.

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Doug Ledford
On Sun, 2007-10-28 at 22:59 -0700, Daniel L. Miller wrote:
 Doug Ledford wrote:
  Anyway, I happen to *like* the idea of using full disk devices, but the
  reality is that the md subsystem doesn't have exclusive ownership of the
  disks at all times, and without that it really needs to stake a claim on
  the space instead of leaving things to chance IMO.

 I've been re-reading this post numerous times - trying to ignore the 
 burgeoning flame war :) - and this last sentence finally clicked with me.
 
 As I'm a novice Linux user - and not involved in development at all - 
 bear with me if I'm stating something obvious.  And if I'm wrong - 
 please be gentle!
 
 1.  md devices are not native to the kernel - they are 
 created/assembled/activated/whatever by a userspace program.

My real point was that md doesn't own the disks, meaning that during
startup, and at other points in time, software other than the md stack
can attempt to use the disk directly.  That software may be the linux
file system code, linux lvm code, or in some case entirely different OS
software.  Given that these situations can arise, using a partition
table to mark the space as in use by linux is what I meant by staking a
claim.  It doesn't keep the linux kernel from using it because it thinks
it owns it, but it does stop other software from attempting to use it.

 2.  Because md devices are non-native devices, and are composed of 
 native devices, the kernel may try to use those components directly 
 without going through md.

In the case of superblocks at the end, yes.  The kernel may see the
underlying file system or lvm disk label even if the md device is not
started.

 3.  Creating a partition table somehow (I'm still not clear how/why) 
 reduces the chance the kernel will access the drive directly without md.

The partition table is more to tell other software that linux owns the
space and to avoid mistakes where someone runs fdisk on a disk
accidentally and wipes out your array because they added a partition
table on what they thought was a new disk (more likely when you have
large arrays of disks attached via fiber channel or such than in a
single system).  Putting the superblock at the beginning of the md
device is the main thing that guarantees the kernel will never try to
use what's inside the md device without the md device running.

 These concepts suddenly have me terrified over my data integrity.  Is 
 the md system so delicate that BOOT sequence can corrupt it?

If you have your superblocks at the end of the devices, then there are
certain failure modes that can cause data inconsistencies.  Generally
speaking they won't harm the array itself, it's just that the different
disks in a raid1 array might contain different data.  If you don't use
partitions, then the majority of failure scenarios involve things like
accidental use of fdisk on the unpartitioned device, access of the
device by other OSes, that sort of thing.

   How is it 
 more reliable AFTER the completed boot sequence?

Once the array is up and running, the constituent disks are marked as
busy in the operating system, which prevents other portions of the linux
kernel and other software in general from getting at the md owned disks.

 Nothing in the documentation (that I read - granted I don't always read 
 everything) stated that partitioning prior to md creation was necessary 
 - in fact references were provided on how to use complete disks.  Is 
 there an official position on, To Partition, or Not To Partition?  
 Particularly for my application - dedicated Linux server, RAID-10 
 configuration, identical drives.
 
 And if partitioning is the answer - what do I need to do with my live 
 dataset?  Drop one drive, partition, then add the partition as a new 
 drive to the set - and repeat for each drive after the rebuild finishes?

You *probably*, and I emphasize probably, don't need to do anything.  I
emphasize it because I don't know enough about your situation to say so
with 100% certainty.  If I'm wrong, it's not my fault.

Now, that said, here's the gist of the situation.  There are specific
failure cases that can corrupt data in an md raid1 array mainly related
to superblocks at the end of devices.  There are specific failure cases
where an unpartitioned device can be accidentally partitioned or where a
partitioned md array in combination with superblocks at the end and
using a whole disk device can be misrecognized as a partitioned normal
drive.  There are, on the other hand, cases where it's perfectly safe to
use unpartitioned devices, or superblocks at the end of devices.  My
recommendation when someone asks what to do is to use partitions, and to
use superblocks at the beginning of the devices (except for /boot since
that isn't supported at the moment).  The reason I give that advice is
that I assume if a person knows enough to know when it's safe to use
unpartitioned devices, like Luca, then they wouldn't be asking me for
advice.  So since they 

Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Richard Scobie

Daniel L. Miller wrote:

Nothing in the documentation (that I read - granted I don't always read 
everything) stated that partitioning prior to md creation was necessary 
- in fact references were provided on how to use complete disks.  Is 
there an official position on, To Partition, or Not To Partition?  
Particularly for my application - dedicated Linux server, RAID-10 
configuration, identical drives.


My simplistic reason for always making one partition on md drives, about 
100MB smaller than the full space, has been as insurance to allow use of 
a replacement drive from another manufacturer, which while nominally 
marked as the same size as the originals, is in fact slightly smaller.
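
A sketch of that approach; the device name and sizes are examples, the point is simply to end the partition roughly 100MB before the end of the disk and flag it for raid use:

  # one partition that stops ~100MB short of the end of a nominal 160GB disk
  # (numbers are illustrative; adjust to the real drive)
  parted -s /dev/sdx mklabel msdos
  parted -s /dev/sdx mkpart primary 1MB 159900MB
  parted -s /dev/sdx set 1 raid on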


Regards,

Richard
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Luca Berra

On Mon, Oct 29, 2007 at 11:47:19AM -0400, Doug Ledford wrote:

On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote:

On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
Doug Ledford wrote:
Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.
   
I've been re-reading this post numerous times - trying to ignore the 
burgeoning flame war :) - and this last sentence finally clicked with me.


I am sorry Daniel, when i read Doug and Bill, stating that your issue
was not having a partition table, i immediately took the bait and forgot
about your original issue.


I never said *his* issue was lack of partition table, I just said I
don't recommend that because it's flaky.  The last statement I made

maybe i misread you but Bill was quite clear.


about his issue was to ask about whether the problem was happening
during initrd time or sysinit time to try and identify if it was failing
before or after / was mounted to try and determine where the issue might
lay.  Then we got off on the tangent about partitions, and at the same
time Neil started asking about udev, at which point it came out that
he's running ubuntu, and as much as I would like to help, the fact of
the matter is that I've never touched ubuntu and wouldn't have the
faintest clue, so I let Neil handle it.  At which point he found that
the udev scripts in ubuntu are being stupid, and from the looks of it
are the cause of the problem.  So, I've considered the initial issue
root caused for a bit now.

It seems i made an idiot of myself by missing half of the thread, and i
even knew ubuntu was braindead in their use of udev at startup, since a
similar discussion came up on the lvm or the dm-devel mailing list (that
time iirc it was about lvm over multipath)


like udev/hal that believes it knows better than you about what you have
on your disks.
but _NEITHER OF THESE IS YOUR PROBLEM_ imho


Actually, it looks like udev *is* the problem, but not because of
partition tables.

you are right.

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Doug Ledford
On Mon, 2007-10-29 at 22:29 +0100, Luca Berra wrote:
 At which point he found that
 the udev scripts in ubuntu are being stupid, and from the looks of it
 are the cause of the problem.  So, I've considered the initial issue
 root caused for a bit now.
 It seems i made an idiot of myself by missing half of the thread, and i
 even knew ubuntu was braindead in their use of udev at startup, since a
 similar discussion came up on the lvm or the dm-devel mailing list (that
 time iirc it was about lvm over multipath)

Nah.  Even if we had concluded that udev was to blame here, I'm not
entirely certain that we hadn't left Daniel with the impression that we
suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm
sure no one has given him a fix for the problem (although Neil did
request a change that will give debug output, but not solve the
problem), so not dropping it entirely would seem appropriate as well.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-29 Thread Daniel L. Miller

Doug Ledford wrote:

Nah.  Even if we had concluded that udev was to blame here, I'm not
entirely certain that we hadn't left Daniel with the impression that we
suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm
sure no one has given him a fix for the problem (although Neil did
request a change that will give debug output, but not solve the
problem), so not dropping it entirely would seem appropriate as well.
  
I've opened a bug report on Ubuntu's Launchpad.net.  Scott James Remnant 
asked me to cc him on Neil's incremental reference - we'll see what 
happens from here.


Thanks for the help guys.  At the moment, I've changed my mdadm.conf to 
explicitly list the drives, instead of the auto=partition parameter.  
We'll see what happens on the next reboot.


I don't know if it means anything, but I'm using a self-compiled 2.6.22 
kernel - with initrd.  At least I THINK I'm using initrd - I have an 
image, but I don't see an initrd line in my grub config.  Hmm... I'm 
going to add a stanza that includes the initrd and see what happens also.
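
For reference, a grub (legacy menu.lst) stanza with an initrd line generally looks like the sketch below; the kernel version, root device and image paths are placeholders, not Daniel's actual files:

  title   Linux 2.6.22 (with initrd)
  root    (hd0,0)
  kernel  /boot/vmlinuz-2.6.22 root=/dev/md0 ro
  initrd  /boot/initrd.img-2.6.22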


--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-28 Thread Luca Berra

On Sat, Oct 27, 2007 at 04:47:30PM -0400, Doug Ledford wrote:

On Sat, 2007-10-27 at 09:50 +0200, Luca Berra wrote:

On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:
On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
 On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
 The partition table is the single, (mostly) universally recognized
 arbiter of what possible data might be on the disk.  Having a partition
 table may not make mdadm recognize the md superblock any better, but it
 keeps all that other stuff from even trying to access data that it
 doesn't have a need to access and prevents random luck from turning your
 day bad.
 on a pc maybe, but that is 20 years old design.

So?  Unix is 35+ year old design, I suppose you want to switch to Vista
then?
unix is a 35+ year old design that evolved in time, some ideas were
kept, some ditched.


BSD disk labels are still in use, SunOS disk labels are still in use,

i am not a solaris expert, do they still use disk labels under vxvm?
oh, by the way, disklabels do not support the partition type attribute.


partition tables are somewhat on the way out, but only because they are
being replaced by the new EFI disk partitioning method.  The only place
where partitionless devices are common is in dedicated raid boxes where
the raid controller is the only thing that will *ever* see that disk.

well i am more used to other os (HP, AIX) where lvm is the common means of
accessing disk devices.




by default fdisk misaligns partition tables 
and aligning them is more complex than just doing without.


So.  You really need to take the time and to understand the alignment of
the device because then and only then can you pass options to mke2fs to

yes and i am not the only person in the world doing that.


Linux works properly with a partition table, so this is a specious
statement.
It should also work properly without one.


Most of the time it does.  But those times where it can fail, the
failure is due to not taking the precautions necessary to prevent it:
aka labeling disk usage via some sort of partition table/disklabel/etc.

I strongly disagree.
the failure is badly designed software.


Did you stick your mmc card in there during the install of the OS?

My laptop has a built-in mmc slot, so i sometimes leave a card plugged
in. But the mmc thing was just an example, it is not that critical.

i don't count myself as a moron, what i am trying to say is that
partition tables are one way of organizing disk space, not the only one.


Using whole disk devices isn't a means of organizing space.  It's a way
to get a rather miniscule amount of space back by *not* organizing the
space.

if i am using, say, lvm to organize disk space, a partition table is
unnecessary to the organization, and it is natural not to use one.


This whole argument seems to boil down to you wanting to perfectly
optimize your system for your use case which includes controlling the
environment enough that you know it's safe to not partition your disks,
where as I argue that although this works in controlled environments, it
is known to have failure modes in other environments, and I would be
totally remiss if I recommended to my customers that they should take
the risk that you can ignore because of your controlled environment
since I know a lot of my customers *don't* have a controlled environment
such as you do.


The whole argument to me boils down to the fact that not having a partition
table on a device is possible, and software that does not consider this
eventuality is flawed; recommending working around flawed software is
just burying your head in the sand.
But i believe i did not convince you one ounce more than you convinced
me, so i'll quit this thread which is getting too far.

Regards,
L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-28 Thread Doug Ledford
On Sun, 2007-10-28 at 14:37 +0100, Luca Berra wrote:
 On Sat, Oct 27, 2007 at 04:47:30PM -0400, Doug Ledford wrote:

 Most of the time it does.  But those times where it can fail, the
 failure is due to not taking the precautions necessary to prevent it:
 aka labeling disk usage via some sort of partition table/disklabel/etc.
 I strongly disagree.
 the failure is badly designed software.

Then you need to blame Ingo who made putting the superblock at the end
of the device the standard.  If the superblock were always at the
beginning, then this whole argument would be moot.  Things would be
reliable the way you want.

 Using whole disk devices isn't a means of organizing space.  It's a way
 to get a rather miniscule amount of space back by *not* organizing the
 space.
 if i am using, say lvm to organize disk space, a partition table is
 unnecessary to the organization, and it is natural not using them.

If you are using straight lvm then you don't have this problem anyway.
Lvm doesn't allow the underlying physical device to *look* like a valid,
partitioned, single device.  Md does when the superblock is at the end.
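
As an aside, a quick sketch of how to check what a whole-disk member actually advertises (the device name is an example):

  # does this device carry an md superblock, and what do generic probes see?
  mdadm --examine /dev/sdd     # prints the md superblock, if one is found
  blkid /dev/sdd               # shows any filesystem/raid signature libblkid detects
  fdisk -l /dev/sdd            # shows whatever looks like a partition table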

 This whole argument seems to boil down to you wanting to perfectly
 optimize your system for your use case which includes controlling the
 environment enough that you know it's safe to not partition your disks,
 where as I argue that although this works in controlled environments, it
 is known to have failure modes in other environments, and I would be
 totally remiss if I recommended to my customers that they should take
 the risk that you can ignore because of your controlled environment
 since I know a lot of my customers *don't* have a controlled environment
 such as you do.
 
 The whole argument to me boils down to the fact that not having a partition
 table on a device is possible, and software that do not consider this
 eventuality is flawed,

It's simply not possible to differentiate with 100% certainty between an md
whole disk partitioned device with a superblock at the end and a regular
device.  Period.  You can try to be clever, but you can also get tripped
up.  The flaw is not with the software, it's with a design that allowed
this to happen.

  and recommending working around flawed software is
 just burying your head in the sand.

If a design is broken but in place, I have no choice but to work around
it.  Anything else is just stupid.

 But i believe i did not convince you one ounce more than you convinced
 me, so i'll quit this thread which is getting too far.
 
 Regards,
 L.
 
 -- 
 Luca Berra -- [EMAIL PROTECTED]
 Communication Media & Services S.r.l.
  /"\
  \ / ASCII RIBBON CAMPAIGN
   X  AGAINST HTML MAIL
  / \
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-28 Thread Bill Davidsen

Doug Ledford wrote:

On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
  

On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:


The partition table is the single, (mostly) universally recognized
arbiter of what possible data might be on the disk.  Having a partition
table may not make mdadm recognize the md superblock any better, but it
keeps all that other stuff from even trying to access data that it
doesn't have a need to access and prevents random luck from turning your
day bad.
  

on a pc maybe, but that is 20 years old design.



So?  Unix is 35+ year old design, I suppose you want to switch to Vista
then?

  

partition table design is limited because it is still based on C/H/S,
which do not exist anymore.
Put a partition table on a big storage, say a DMX, and enjoy a 20%
performance decrease.



Because you didn't stripe align the partition, your bad.
  
Align to /what/ stripe? Hardware (CHS is fiction), software (of the RAID 
you're about to create), or ??? I don't notice my FC6 or FC7 install 
programs using any special partition location to start, I have only run 
(tried to run) FC8-test3 for the live CD, so I can't say what it might 
do. CentOS4 didn't do anything obvious, either, so unless I really 
misunderstand your position at redhat, that would be your bad.  ;-)


If you mean start a partition on a pseudo-CHS boundary, fdisk seems to 
use what it thinks are cylinders for that.


Please clarify what alignment provides a performance benefit.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-28 Thread Daniel L. Miller

Doug Ledford wrote:

Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.
  
I've been re-reading this post numerous times - trying to ignore the 
burgeoning flame war :) - and this last sentence finally clicked with me.


As I'm a novice Linux user - and not involved in development at all - 
bear with me if I'm stating something obvious.  And if I'm wrong - 
please be gentle!


1.  md devices are not native to the kernel - they are 
created/assembled/activated/whatever by a userspace program.
2.  Because md devices are non-native devices, and are composed of 
native devices, the kernel may try to use those components directly 
without going through md.
3.  Creating a partition table somehow (I'm still not clear how/why) 
reduces the chance the kernel will access the drive directly without md.


These concepts suddenly have me terrified over my data integrity.  Is 
the md system so delicate that the BOOT sequence can corrupt it?  How is it 
more reliable AFTER the completed boot sequence?


Nothing in the documentation (that I read - granted I don't always read 
everything) stated that partitioning prior to md creation was necessary 
- in fact references were provided on how to use complete disks.  Is 
there an official position on "To Partition, or Not To Partition"?  
Particularly for my application - dedicated Linux server, RAID-10 
configuration, identical drives.


And if partitioning is the answer - what do I need to do with my live 
dataset?  Drop one drive, partition, then add the partition as a new 
drive to the set - and repeat for each drive after the rebuild finishes?

--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-27 Thread Luca Berra

On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:

On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:

On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
The partition table is the single, (mostly) universally recognized
arbiter of what possible data might be on the disk.  Having a partition
table may not make mdadm recognize the md superblock any better, but it
keeps all that other stuff from even trying to access data that it
doesn't have a need to access and prevents random luck from turning your
day bad.
on a pc maybe, but that is 20 years old design.


So?  Unix is 35+ year old design, I suppose you want to switch to Vista
then?

unix is a 35+ year old design that evolved in time, some ideas were
kept, some ditched.


partition table design is limited because it is still based on C/H/S,
which do not exist anymore.
Put a partition table on a big storage, say a DMX, and enjoy a 20%
performance decrease.


Because you didn't stripe align the partition, your bad.

:)
by default fdisk misaligns partition tables
and aligning them is more complex than just doing without.


Oh, and let's not go into what can happen if you're talking about a dual
boot machine and what Windows might do to the disk if it doesn't think
the disk space is already spoken for by a linux partition.
Why the hell should the existence of windows limit the possibility of
linux working properly?


Linux works properly with a partition table, so this is a specious
statement.

It should also work properly without one.


If i have a pc that dualboots windows i will take care of using the
common denominator of a partition table; if it is my big server i will
probably not, since it won't boot anything other than Linux.


Doesn't really gain you anything, but your choice.  Besides, the
question wasn't why shouldn't Luca Berra use whole disk devices, it
was why I don't recommend using whole disk devices, and my
recommendation wasn't based in the least bit upon a single person's use
scenario.

If i am the only person in the world that believes partition tables
should not be required then i'll shut up.


On the contrary, i once inserted an mmc memory card, which had been
initialized on my mobile phone, into the mmc slot of my laptop, and was
faced with a load of errors about mmcblk0 having an invalid partition
table.


So?  The messages are just informative, feel free to ignore them.

but did not anaconda propose to wipe unpartitioned disks?


The phone dictates the format, only a moron would say otherwise.  But,
then again, the phone doesn't care about interoperability and many other
issues on memory cards that it thinks it owns, so only a moron would
argue that because a phone doesn't use a partition table that nothing
else in the computer realm needs to either.

i don't count myself as a moron, what i am trying to say is that
partition tables are one way of organizing disk space, not the only one.


Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.
Start removing the partition detection code from the blasted kernel and
move it to userspace, which is already in place, but it is not the
default.


Which just moves where the work is done, not what work needs to be done.

and it also permits deciding whether it has to be done or not.

It's a change for no benefit and a waste of time.

the waste of time was having to put code in mdadm to undo partition
detection on component devices, where partition detection should not
have taken place.



--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-27 Thread Luca Berra

On Fri, Oct 26, 2007 at 06:53:40PM +0200, Gabor Gombas wrote:

On Fri, Oct 26, 2007 at 11:15:13AM +0200, Luca Berra wrote:


on a pc maybe, but that is 20 years old design.
partition table design is limited because it is still based on C/H/S,
which do not exist anymore.


The MS-DOS format is not the only possible partition table layout. Other
formats such as GPT do not have such limitations.


Put a partition table on a big storage, say a DMX, and enjoy a 20%
performance decrease.


I assume your big storage uses some kind of RAID. Are your partitions
stripe-aligned? (Btw. that has nothing to do with partitions, LVM can
also suffer if PEs are not aligned).

mine are; unfortunately the default is to start them at 32256 bytes into
the device.


Oh, and let's not go into what can happen if you're talking about a dual
boot machine and what Windows might do to the disk if it doesn't think
the disk space is already spoken for by a linux partition.

Why the hell should the existence of windows limit the possibility of
linux working properly?

what i am saying is that a dual boot machine is not the only scenario we
have.


On the contrary, i once inserted an mmc memory card, which had been
initialized on my mobile phone, into the mmc slot of my laptop, and was
faced with a load of errors about mmcblk0 having an invalid partition
table. Obviously it had none, it was a plain fat filesystem.
Is the solution partitioning it? I don't think the phone would
agree.


Well, it said it could not find a valid partition table. That was the
truth. Why is it a problem if the kernel states a fact?

it is random. reformatting it made the kernel message go away.
i wonder if by chance something would decide it is a valid partition
table

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
 X  AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-27 Thread Gabor Gombas
On Sat, Oct 27, 2007 at 09:50:55AM +0200, Luca Berra wrote:

 Because you didn't stripe align the partition, your bad.
 :)
 by default fdisk misaligns partition tables
 and aligning them is more complex than just doing without.

Why use fdisk then? Use parted instead. It's not the kernel's fault if
you use tools not suited for a given task...

 Linux works properly with a partition table, so this is a specious
 statement.
 It should also work properly without one.

It does:

sd 0:0:2:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 0:0:2:0: [sdc] 7812333568 512-byte hardware sectors (315 MB)
sd 0:0:2:0: [sdc] Write Protect is off
sd 0:0:2:0: [sdc] Mode Sense: 23 00 00 00
sd 0:0:2:0: [sdc] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
 sdc: unknown partition table

Works perfectly without any partition tables...

You seem to be annoyed because the kernel tells you that there is no
partition table it recognizes - but if that bothers you so, simply stop
reading the kernel logs. My kernel also tells me that it failed to find
an AGP bridge - by your logic that should mean that everyone still using
AGP-capable motherboards should toss their system to the junkyard?!?

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-27 Thread Doug Ledford
On Sat, 2007-10-27 at 09:50 +0200, Luca Berra wrote:
 On Fri, Oct 26, 2007 at 03:26:33PM -0400, Doug Ledford wrote:
 On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
  On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
  The partition table is the single, (mostly) universally recognized
  arbiter of what possible data might be on the disk.  Having a partition
  table may not make mdadm recognize the md superblock any better, but it
  keeps all that other stuff from even trying to access data that it
  doesn't have a need to access and prevents random luck from turning your
  day bad.
  on a pc maybe, but that is 20 years old design.
 
 So?  Unix is 35+ year old design, I suppose you want to switch to Vista
 then?
 unix is a 35+ year old design that evolved in time, some ideas were
 kept, some ditched.

BSD disk labels are still in use, SunOS disk labels are still in use,
partition tables are somewhat on the way out, but only because they are
being replaced by the new EFI disk partitioning method.  The only place
where partitionless devices are common is in dedicated raid boxes where
the raid controller is the only thing that will *ever* see that disk.
Sometimes they do it on big SAN/NAS stuff because they don't want to
align the partition table to the underlying device's stripe layout, but
even then they do so in a tightly controlled environment where they know
exactly which machines will be allowed to even try and access the
device.

  partition table design is limited because it is still based on C/H/S,
  which do not exist anymore.
  Put a partition table on a big storage, say a DMX, and enjoy a 20%
  performance decrease.
 
 Because you didn't stripe align the partition, your bad.
 :)
  by default fdisk misaligns partition tables
 and aligning them is more complex than just doing without.

So.  You really need to take the time to understand the alignment of
the device, because then and only then can you pass options to mke2fs to
align the fs metadata with the stripes as well, thereby buying you even
more performance than just leaving off the partition table (assuming
that's what you use; I don't know if other mkfs programs have the same
options for aligning metadata with stripes).  And if you take the time
to understand the underlying stripe layout for the mkfs stuff, then you
can use the same information to align the partition table.
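
As a hedged example of the mke2fs side of this: with a 64KiB chunk and 4KiB filesystem blocks the stride would be 16 blocks. The numbers below are illustrative, not a recommendation for any particular array:

  # tell ext3 about the underlying chunk size so its metadata lands on chunk
  # boundaries (64KiB chunk / 4KiB block = stride of 16; adjust to the real geometry)
  mke2fs -j -b 4096 -E stride=16 /dev/md0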

  Oh, and let's not go into what can happen if you're talking about a dual
  boot machine and what Windows might do to the disk if it doesn't think
  the disk space is already spoken for by a linux partition.
  Why the hell should the existence of windows limit the possibility of
  linux working properly.
 
 Linux works properly with a partition table, so this is a specious
 statement.
 It should also work properly without one.

Most of the time it does.  But those times where it can fail, the
failure is due to not taking the precautions necessary to prevent it:
aka labeling disk usage via some sort of partition table/disklabel/etc.

  If i have a pc that dualboots windows i will take care of using the
  common denominator of a partition table, if it is my big server i will
  probably not. since it won't boot anything else than Linux.
 
 Doesn't really gain you anything, but your choice.  Besides, the
 question wasn't why shouldn't Luca Berra use whole disk devices, it
 was why I don't recommend using whole disk devices, and my
 recommendation wasn't based in the least bit upon a single person's use
 scenario.
 If i am the only person in the world that believes partition tables
 should not be required then i'll shut up.
 
  On the opposite, i once inserted an mmc memory card, which had been
  initialized on my mobile phone, into the mmc slot of my laptop, and was
  faced with a load of error about mmcblk0 having an invalid partition
  table.
 
 So?  The messages are just informative, feel free to ignore them.
 but did not anaconda propose to wipe unpartitioned disks?

Did you stick your mmc card in there during the install of the OS?
That's the only time anaconda ever runs, and therefore the only time it
ever checks your devices.  It makes sense that during the initial
install, when the OS is only configured to see locally connected
devices, or possibly iSCSI devices that you have specifically told it to
probe, that it would then ask you the question about those devices.
Other network attached or shared devices are generally added after the
initial install.

 The phone dictates the format, only a moron would say otherwise.  But,
 then again, the phone doesn't care about interoperability and many other
 issues on memory cards that it thinks it owns, so only a moron would
 argue that because a phone doesn't use a partition table that nothing
 else in the computer realm needs to either.
 i don't count myself as a moron, what i am trying to say is that
 partition tables are one way of organizing disk space, not the only one.

Using whole disk devices isn't a 

Re: Raid-10 mount at startup always has problem

2007-10-26 Thread Neil Brown
On Thursday October 25, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  It might be worth finding out where mdadm is being run in the init
  scripts and add a -v flag, and redirecting stdout/stderr to some log
  file.
  e.g.
 mdadm -As -v > /var/log/mdadm-$$ 2>&1
 
  And see if that leaves something useful in the log file.
 

 I haven't rebooted yet, but here's my /etc/udev/rules.d/70-mdadm.rules
 file (BTW - running on Ubuntu 7.10 Gutsy):
 
 SUBSYSTEM=="block", ACTION=="add|change",
 ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
 /sbin/mdadm -As -v > /var/log/mdadm-$$ 2>&1"

Yes, that would do exactly what you are experiencing.
Every time a component of a raid array is discovered, it will try to
assemble all known arrays.
So one drive appears, it tries to assemble the array but there aren't
enough so it gives up.
Then two drives.  Chances are there still aren't enough, so it gives
up again.
Then when there are three drives it will successfully assemble the
array - degraded.

Then when there are 4 drives, it will be too late.  I cannot see why
that would lead to the cannot update array info error, but it
certainly explains the rest.

That is really bad stuff to have in udev.
The --incremental mode was written precisely for use in udev.  I
wonder why they didn't use it
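
As a rough sketch (not Ubuntu's actual rule, just an illustration of the
idea), an incremental rule would assemble only the device that just appeared
rather than re-scanning everything:

  SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid*", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"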

Maybe you should log a bug report with Ubuntu and suggest they discuss
their udev scripts with the developer of mdadm (that would be me I
guess).

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-26 Thread Luca Berra

On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:

partition table (something that the Fedora/RHEL installers do to all
disks without partition tables...well, the installer tells you there's
no partition table and asks if you want to initialize it, but if someone
is in a hurry and hits yes when they meant no, bye bye data).

Cool feature



The partition table is the single, (mostly) universally recognized
arbiter of what possible data might be on the disk.  Having a partition
table may not make mdadm recognize the md superblock any better, but it
keeps all that other stuff from even trying to access data that it
doesn't have a need to access and prevents random luck from turning your
day bad.

on a pc maybe, but that is 20 years old design.
partition table design is limited because it is still based on C/H/S,
which do not exist anymore.
Put a partition table on a big storage, say a DMX, and enjoy a 20%
performance decrease.


Oh, and let's not go into what can happen if you're talking about a dual
boot machine and what Windows might do to the disk if it doesn't think
the disk space is already spoken for by a linux partition.

Why the hell should the existence of windows limit the possibility of
linux working properly.
If i have a pc that dualboots windows i will take care of using the
common denominator of a partition table, if it is my big server i will
probably not. since it won't boot anything else than Linux.


And, in particular with mdadm, I once created a full disk md raid array
on a couple disks, then couldn't get things arranged like I wanted, so I
just partitioned the disks and then created new arrays in the partitions
(without first manually zeroing the superblock for the whole disk
array).  Since I used a version 1.0 superblock on the whole disk array,
and then used version 1.1 superblocks in the partitions, the net result
was that when I ran mdadm -Eb, mdadm would find both the 1.1 and 1.0
superblocks in the last partition on the disk.  Confused both myself and
mdadm for a while.

yes, this is fun
On the opposite, i once inserted an mmc memory card, which had been
initialized on my mobile phone, into the mmc slot of my laptop, and was
faced with a load of error about mmcblk0 having an invalid partition
table. Obviously it had none, it was a plain fat filesystem.
Is the solution partitioning it? I don't think the phone would
agree.


Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.

Start removing the partition detection code from the blasted kernel and
move it to userspace, which is already in place, but it is not the
default.



--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/\
\ /  ASCII RIBBON CAMPAIGN
 X   AGAINST HTML MAIL
/ \
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-26 Thread Gabor Gombas
On Fri, Oct 26, 2007 at 11:15:13AM +0200, Luca Berra wrote:

 on a pc maybe, but that is 20 years old design.
 partition table design is limited because it is still based on C/H/S,
 which do not exist anymore.

The MS-DOS format is not the only possible partition table layout. Other
formats such as GPT do not have such limitations.
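
Purely as an illustration (device name assumed), a GPT label can be put on a
large disk with parted, with no C/H/S arithmetic involved:

  parted /dev/sdb mklabel gpt
  parted /dev/sdb print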

 Put a partition table on a big storage, say a DMX, and enjoy a 20%
 performance decrease.

I assume your big storage uses some kind of RAID. Are your partitions
stripe-aligned? (Btw. that has nothing to do with partitions, LVM can
also suffer if PEs are not aligned).

 Oh, and let's not go into what can happen if you're talking about a dual
 boot machine and what Windows might do to the disk if it doesn't think
 the disk space is already spoken for by a linux partition.
 Why the hell should the existence of windows limit the possibility of
 linux working properly.

Well, if you want to convert a Windows partition to Linux by just
changing the partition type, running mke2fs over it, and filling it with
data, Windows will happily ignore the partition table change and will
overwrite your data without any notice on the next boot (happened with
one colleague, not fun to debug). So much for automatic device type
detection...

 On the opposite, i once inserted an mmc memory card, which had been
 initialized on my mobile phone, into the mmc slot of my laptop, and was
 faced with a load of error about mmcblk0 having an invalid partition
 table. Obviously it had none, it was a plain fat filesystem.
 Is the solution partitioning it? I don't think the phone would
 agree.

Well, it said it could not find a valid partition table. That was the
truth. Why is it a problem if the kernel states a fact?

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-26 Thread Doug Ledford
On Fri, 2007-10-26 at 11:15 +0200, Luca Berra wrote:
 On Thu, Oct 25, 2007 at 02:40:06AM -0400, Doug Ledford wrote:
 The partition table is the single, (mostly) universally recognized
 arbiter of what possible data might be on the disk.  Having a partition
 table may not make mdadm recognize the md superblock any better, but it
 keeps all that other stuff from even trying to access data that it
 doesn't have a need to access and prevents random luck from turning your
 day bad.
 on a pc maybe, but that is 20 years old design.

So?  Unix is 35+ year old design, I suppose you want to switch to Vista
then?

 partition table design is limited because it is still based on C/H/S,
 which do not exist anymore.
 Put a partition table on a big storage, say a DMX, and enjoy a 20%
 performance decrease.

Because you didn't stripe align the partition, your bad.

 Oh, and let's not go into what can happen if you're talking about a dual
 boot machine and what Windows might do to the disk if it doesn't think
 the disk space is already spoken for by a linux partition.
 Why the hell should the existence of windows limit the possibility of
 linux working properly.

Linux works properly with a partition table, so this is a specious
statement.

 If i have a pc that dualboots windows i will take care of using the
 common denominator of a partition table, if it is my big server i will
 probably not. since it won't boot anything else than Linux.

Doesn't really gain you anything, but your choice.  Besides, the
question wasn't why shouldn't Luca Berra use whole disk devices, it
was why I don't recommend using whole disk devices, and my
recommendation wasn't based in the least bit upon a single person's use
scenario.

 And, in particular with mdadm, I once created a full disk md raid array
 on a couple disks, then couldn't get things arranged like I wanted, so I
 just partitioned the disks and then created new arrays in the partitions
 (without first manually zeroing the superblock for the whole disk
 array).  Since I used a version 1.0 superblock on the whole disk array,
 and then used version 1.1 superblocks in the partitions, the net result
 was that when I ran mdadm -Eb, mdadm would find both the 1.1 and 1.0
 superblocks in the last partition on the disk.  Confused both myself and
 mdadm for a while.
 yes, this is fun
 On the opposite, i once inserted an mmc memory card, which had been
 initialized on my mobile phone, into the mmc slot of my laptop, and was
 faced with a load of error about mmcblk0 having an invalid partition
 table.

So?  The messages are just informative, feel free to ignore them.

  Obviously it had none, it was a plain fat filesystem.
 Is the solution partitioning it? I don't think the phone would
 agree.

The phone dictates the format, only a moron would say otherwise.  But,
then again, the phone doesn't care about interoperability and many other
issues on memory cards that it thinks it owns, so only a moron would
argue that because a phone doesn't use a partition table that nothing
else in the computer realm needs to either.

 Anyway, I happen to *like* the idea of using full disk devices, but the
 reality is that the md subsystem doesn't have exclusive ownership of the
 disks at all times, and without that it really needs to stake a claim on
 the space instead of leaving things to chance IMO.
 Start removing the partition detection code from the blasted kernel and
 move it to userspace, which is already in place, but it is not the
 default.

Which just moves where the work is done, not what work needs to be done.
It's a change for no benefit and a waste of time.

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Neil Brown
On Wednesday October 24, [EMAIL PROTECTED] wrote:
 Current mdadm.conf:
 DEVICE partitions
 ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
 
 still have the problem where on boot one drive is not part of the 
 array.  Is there a log file I can check to find out WHY a drive is not 
 being added?  It's been a while since the reboot, but I did find some 
 entries in dmesg - I'm appending both the md lines and the physical disk 
 related lines.  The bottom shows one disk not being added (this time it 
 was sda) - and the disk that gets skipped on each boot seems to be 
 random - there's no consistent failure:

Odd but interesting.
Does it sometimes fail to start the array altogether?

 md: md0 stopped.
 md: md0 stopped.
 md: bind<sdc>
 md: bind<sdd>
 md: bind<sdb>
 md: md0: raid array is not clean -- starting background reconstruction
 raid10: raid set md0 active with 3 out of 4 devices
 md: couldn't update array info. -22
  ^^^

This is the most surprising line, and hence the one most likely to
convey helpful information.

This message is generated when a process calls SET_ARRAY_INFO on an
array that is already running, and the changes implied by the new
array_info are not supportable.

The only way I can see this happening is if two copies of mdadm are
running at exactly the same time and both are trying to assemble
the same array.  The first calls SET_ARRAY_INFO and assembles the
(partial) array.  The second calls SET_ARRAY_INFO and gets this error.
Not all devices are included because when one mdadm went to
look at a device, the other had it locked and so the first just
ignored it.

I just tried that, and sometimes it worked, but sometimes it assembled
with 3 out of 4 devices.  I didn't get the couldn't update array info
message, but that doesn't prove I'm wrong.

I cannot imagine how that might be happening (two at once) unless
maybe 'udev' had been configured to do something as soon as devices
were discovered, but that seems unlikely.

It might be worth finding out where mdadm is being run in the init
scripts and add a -v flag, and redirecting stdout/stderr to some log
file.
e.g.
   mdadm -As -v > /var/log/mdadm-$$ 2>&1

And see if that leaves something useful in the log file.
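
On a Debian/Ubuntu style system, one way to hunt down those invocations is
something along these lines (the paths are a guess, adjust as needed):

  grep -rn mdadm /etc/init.d /etc/rcS.d /etc/udev/rules.d 2>/dev/null
  # the copy inside the initramfs can be listed without rebooting:
  zcat /boot/initrd.img-$(uname -r) | cpio -t 2>/dev/null | grep mdadm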

BTW, I don't think your problem has anything to do with the fact that
you are using whole drives.
While it is debatable whether that is a good idea or not (I like the
idea, but Doug doesn't and I respect his opinion) I doubt it would
contribute to the current problem.


Your description makes me nearly certain that there is some sort of
race going on (that is the easiest way to explain randomly differing
behaviours).   The race is probably between different code 'locking'
(opening with O_EXCL) the various devices.  Given the above error
message, two different 'mdadm's seems most likely, but an mdadm and a
mount-by-label scan could probably do it too.

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Doug Ledford
On Wed, 2007-10-24 at 22:43 -0700, Daniel L. Miller wrote:
 Bill Davidsen wrote:
  Daniel L. Miller wrote:
  Current mdadm.conf:
  DEVICE partitions
  ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
  UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
 
  still have the problem where on boot one drive is not part of the 
  array.  Is there a log file I can check to find out WHY a drive is 
  not being added?  It's been a while since the reboot, but I did find 
  some entries in dmesg - I'm appending both the md lines and the 
  physical disk related lines.  The bottom shows one disk not being 
 added (this time it was sda) - and the disk that gets skipped on each 
  boot seems to be random - there's no consistent failure:
 
  I suspect the base problem is that you are using whole disks instead 
  of partitions, and the problem with the partition table below is 
  probably an indication that you have something on that drive which 
  looks like a partition table but isn't. That prevents the drive from 
  being recognized as a whole drive. You're lucky, if the data looked 
  enough like a partition table to be valid the o/s probably would have 
  tried to do something with it.
  [...]
  This may be the rare case where you really do need to specify the 
  actual devices to get reliable operation.
 OK - I'm officially confused now (I was just unofficially before).  WHY 
 is it a problem using whole drives as RAID components?  I would have 
 thought that building a RAID storage unit with identically sized drives 
 - and using each drive's full capacity - is exactly the way you're 
 supposed to!

As much as anything else this can be summed up as you are thinking of
how you are using the drives and not how unexpected software on your
system might try and use your drives.  Without a partition table, none
of the software on your system can know what to do with the drives
except mdadm when it finds an md superblock.  That doesn't stop other
software from *trying* to find out how to use your drives though.  That
includes the kernel trying to look for a valid partition table, mount
possibly scanning the drive for a file system label, lvm scanning for an
lvm superblock, mtools looking for a dos filesystem, etc.  Under normal
conditions, the random data on your drive will never look valid to these
other pieces of software.  But, once in a great while, it will look
valid.  And that's when all hell breaks loose.  Or worse, you run a
partition program such as fdisk on the device and it initializes the
partition table (something that the Fedora/RHEL installers do to all
disks without partition tables...well, the installer tells you there's
no partition table and asks if you want to initialize it, but if someone
is in a hurry and hits yes when they meant no, bye bye data).

The partition table is the single, (mostly) universally recognized
arbiter of what possible data might be on the disk.  Having a partition
table may not make mdadm recognize the md superblock any better, but it
keeps all that other stuff from even trying to access data that it
doesn't have a need to access and prevents random luck from turning your
day bad.

Oh, and let's not go into what can happen if you're talking about a dual
boot machine and what Windows might do to the disk if it doesn't think
the disk space is already spoken for by a linux partition.

And, in particular with mdadm, I once created a full disk md raid array
on a couple disks, then couldn't get things arranged like I wanted, so I
just partitioned the disks and then created new arrays in the partitions
(without first manually zeroing the superblock for the whole disk
array).  Since I used a version 1.0 superblock on the whole disk array,
and then used version 1.1 superblocks in the partitions, the net result
was that when I ran mdadm -Eb, mdadm would find both the 1.1 and 1.0
superblocks in the last partition on the disk.  Confused both myself and
mdadm for a while.
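
As an aside, the usual way to avoid that kind of leftover-superblock
confusion is to wipe the old metadata before re-using the disks; roughly
(device names purely illustrative):

  # stop the old whole-disk array, then erase its superblocks
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sda /dev/sdb
  # now it is safe to partition the disks and create new arrays on sda1, sdb1, ...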

Anyway, I happen to *like* the idea of using full disk devices, but the
reality is that the md subsystem doesn't have exclusive ownership of the
disks at all times, and without that it really needs to stake a claim on
the space instead of leaving things to chance IMO.
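
A sketch of what staking that claim can look like (hypothetical device names;
one full-size partition of type fd per disk, with the array built on the
partitions rather than the bare disks):

  for d in sda sdb sdc sdd; do echo ',,fd' | sfdisk /dev/$d; done
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1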

   I should mention that the boot/system drive is IDE, and 
 NOT part of the RAID.  So I'm not worried about losing the system - but 
 I AM concerned about the data.  I'm using four drives in a RAID-10 
 configuration - I thought this would provide a good blend of safety and 
 performance for a small fileserver.
 
 Because it's RAID-10 - I would ASSuME that I can drop one drive (after 
 all, I keep booting one drive short), partition if necessary, and add it 
 back in.  But how would splitting these disks into partitions improve 
 either stability or performance?

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  

Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Doug Ledford
On Thu, 2007-10-25 at 16:12 +1000, Neil Brown wrote:

  md: md0 stopped.
  md: md0 stopped.
  md: bind<sdc>
  md: bind<sdd>
  md: bind<sdb>
  md: md0: raid array is not clean -- starting background reconstruction
  raid10: raid set md0 active with 3 out of 4 devices
  md: couldn't update array info. -22
   ^^^
 
 This is the most surprising line, and hence the one most likely to
 convey helpful information.
 
 This message is generated when a process calls SET_ARRAY_INFO on an
 array that is already running, and the changes implied by the new
 array_info are not supportable.
 
 The only way I can see this happening is if two copies of mdadm are
 running at exactly the same time and both are trying to assemble
 the same array.  The first calls SET_ARRAY_INFO and assembles the
 (partial) array.  The second calls SET_ARRAY_INFO and gets this error.
 Not all devices are included because when one mdadm went to
 look at a device, the other had it locked and so the first just
 ignored it.

If mdadm copy A gets three of the devices, I wouldn't think mdadm copy B
would have been able to get enough devices to decide to even try and
assemble the array (assuming that once copy A locked the devices during
open, that it then held the devices until time to assemble the array).

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Daniel L. Miller

Neil Brown wrote:

It might be worth finding out where mdadm is being run in the init
scripts and add a -v flag, and redirecting stdout/stderr to some log
file.
e.g.
   mdadm -As -v > /var/log/mdadm-$$ 2>&1

And see if that leaves something useful in the log file.

  
I haven't rebooted yet, but here's my /etc/udev/rules.d/70-mdadm.rules 
file (BTW - running on Ubuntu 7.10 Gutsy):


SUBSYSTEM=="block", ACTION=="add|change",
ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
/sbin/mdadm -As -v > /var/log/mdadm-$$ 2>&1"


# This next line (only) is put into the initramfs,
#  where we run a strange script to activate only some of the arrays
#  as configured, instead of mdadm -As:
#initramfs# SUBSYSTEM=="block", ACTION=="add|change",
ENV{ID_FS_TYPE}=="linux_raid*", RUN+="watershed -i udev-mdadm
/scripts/local-top/mdadm from-udev"



Could that initramfs line be causing the problem?
--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Bill Davidsen

Neil Brown wrote:

On Wednesday October 24, [EMAIL PROTECTED] wrote:
  

Current mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part


still have the problem where on boot one drive is not part of the 
array.  Is there a log file I can check to find out WHY a drive is not 
being added?  It's been a while since the reboot, but I did find some 
entries in dmesg - I'm appending both the md lines and the physical disk 
 related lines.  The bottom shows one disk not being added (this time it 
was sda) - and the disk that gets skipped on each boot seems to be 
random - there's no consistent failure:



Odd but interesting.
Does it sometimes fail to start the array altogether?

  

md: md0 stopped.
md: md0 stopped.
md: bind<sdc>
md: bind<sdd>
md: bind<sdb>
md: md0: raid array is not clean -- starting background reconstruction
raid10: raid set md0 active with 3 out of 4 devices
md: couldn't update array info. -22


  ^^^

This is the most surprising line, and hence the one most likely to
convey helpful information.

This message is generated when a process calls SET_ARRAY_INFO on an
array that is already running, and the changes implied by the new
array_info are not supportable.

The only way I can see this happening is if two copies of mdadm are
running at exactly the same time and both are trying to assemble
the same array.  The first calls SET_ARRAY_INFO and assembles the
(partial) array.  The second calls SET_ARRAY_INFO and gets this error.
Not all devices are included because when one mdadm went to
look at a device, the other had it locked and so the first just
ignored it.

I just tried that, and sometimes it worked, but sometimes it assembled
with 3 out of 4 devices.  I didn't get the couldn't update array info
message, but that doesn't prove I'm wrong.

I cannot imagine how that might be happening (two at once) unless
maybe 'udev' had been configured to do something as soon as devices
were discovered, but that seems unlikely.

It might be worth finding out where mdadm is being run in the init
scripts and add a -v flag, and redirecting stdout/stderr to some log
file.
e.g.
   mdadm -As -v > /var/log/mdadm-$$ 2>&1

And see if that leaves something useful in the log file.

BTW, I don't think your problem has anything to do with the fact that
you are using whole drives.
  


You don't think the unknown partition table on sdd is related? Because 
I read that as a sure indication that the system isn't considering the 
drive as one without a partition table, and therefore isn't looking for 
the superblock on the whole device. And as Doug pointed out, once you 
decide that there is a partition table lots of things might try to use it.

While it is debatable whether that is a good idea or not (I like the
idea, but Doug doesn't and I respect his opinion) I doubt it would
contribute to the current problem.


Your description makes me nearly certain that there is some sort of
race going on (that is the easiest way to explain randomly differing
behaviours).   The race is probably between different code 'locking'
(opening with O_EXCL) the various devices.  Given the above error
message, two different 'mdadm's seems most likely, but an mdadm and a
mount-by-label scan could probably do it too.
  

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Daniel L. Miller

Bill Davidsen wrote:
You don't think the unknown partition table on sdd is related? 
Because I read that as a sure indication that the system isn't 
considering the drive as one without a partition table, and therefore 
isn't looking for the superblock on the whole device. And as Doug 
pointed out, once you decide that there is a partition table lots of 
things might try to use it.  
Now, would the drive letters (sd[a-d]) change from reboot-to-reboot?  
Because it's not consistent - so far I've seen each of the four drives 
at one time or another fail during the boot.


I've added the verbose logging to the udev mdadm rule, and I've also 
manually specified the drives in mdadm.conf instead of leaving it on 
auto.  Curious what the next boot will bring.
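
For anyone following along, explicitly listing the drives rather than relying
on scanning would look roughly like this (a guess at the edit, not a quote of
it):

  DEVICE /dev/sda /dev/sdb /dev/sdc /dev/sdd
  ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a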


--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-25 Thread Neil Brown
On Thursday October 25, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
 
  BTW, I don't think your problem has anything to do with the fact that
  you are using whole drives.

 
 You don't think the unknown partition table on sdd is related? Because 
 I read that as a sure indication that the system isn't considering the 
 drive as one without a partition table, and therefore isn't looking for 
 the superblock on the whole device. And as Doug pointed out, once you 
 decide that there is a partition table lots of things might try to use it.

"unknown partition table" is what I would expect when using a whole
drive.
It just means the first block doesn't look like a partition table,
and if you have some early block of an ext3 (or other) filesystem in
the first block (as you would in this case), you wouldn't expect it to
look like a partition table.
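
One quick way to confirm the superblock really is being found on the whole
device (sketched with this system's device names):

  mdadm -E /dev/sdd               # should report the 0.90 superblock and the array UUID
  mdadm -D /dev/md0 | grep UUID   # compare with what the running array reports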

I don't understand what you are trying to say with your second
sentence.

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-10-24 Thread Daniel L. Miller

Daniel L. Miller wrote:

Richard Scobie wrote:

Daniel L. Miller wrote:


And you didn't ask, but my mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a


Try adding

auto=part

at the end of your mdadm.conf ARRAY line.

Thanks - will see what happens on my next reboot.


Current mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part


still have the problem where on boot one drive is not part of the 
array.  Is there a log file I can check to find out WHY a drive is not 
being added?  It's been a while since the reboot, but I did find some 
entries in dmesg - I'm appending both the md lines and the physical disk 
 related lines.  The bottom shows one disk not being added (this time it 
was sda) - and the disk that gets skipped on each boot seems to be 
random - there's no consistent failure:


[...]
md: raid10 personality registered for level 10
[...]
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
[...]
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xc20001428480 ctl 0xc200014284a0 
bmdma 0x00011410 irq 23
ata2: SATA max UDMA/133 cmd 0xc20001428580 ctl 0xc200014285a0 
bmdma 0x00011418 irq 23

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0x, segment boundary 0x, hw 
segs 61

scsi 1:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0x, segment boundary 0x, hw 
segs 61

ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt :00:08.0[A] - Link [LSI1] - GSI 22 (level, 
high) - IRQ 22

sata_nv :00:08.0: Using ADMA mode
PCI: Setting latency timer of device :00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xc2000142a480 ctl 0xc2000142a4a0 
bmdma 0x00011420 irq 22
ata4: SATA max UDMA/133 cmd 0xc2000142a580 ctl 0xc2000142a5a0 
bmdma 0x00011428 irq 22

ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata3: bounce limit 0x, segment boundary 0x, hw 
segs 61

scsi 3:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata4: bounce limit 0x, segment boundary 0x, hw 
segs 61

sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sdb: unknown partition table
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sdc: unknown partition table
sd 2:0:0:0: [sdc] Attached SCSI disk
sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 

Re: Raid-10 mount at startup always has problem

2007-10-24 Thread Doug Ledford
On Wed, 2007-10-24 at 07:22 -0700, Daniel L. Miller wrote:
 Daniel L. Miller wrote:
  Richard Scobie wrote:
  Daniel L. Miller wrote:
 
  And you didn't ask, but my mdadm.conf:
  DEVICE partitions
  ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
  UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
 
  Try adding
 
  auto=part
 
   at the end of your mdadm.conf ARRAY line.
  Thanks - will see what happens on my next reboot.
 
 Current mdadm.conf:
 DEVICE partitions
 ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
 
 still have the problem where on boot one drive is not part of the 
 array.  Is there a log file I can check to find out WHY a drive is not 
 being added?

It usually means either the device is busy at the time the raid startup
happened, or the device wasn't created by udev yet at the time the
 startup happened.  Is it failing to start the array properly in the
initrd or is this happening after you've switched to the rootfs and are
running the startup scripts?


 md: md0 stopped.
 md: md0 stopped.
 md: bind<sdc>
 md: bind<sdd>
 md: bind<sdb>

Whole disk raid devices == bad.  Lots of stuff can go wrong with that
setup.

 md: md0: raid array is not clean -- starting background reconstruction
 raid10: raid set md0 active with 3 out of 4 devices
 md: couldn't update array info. -22
 md: resync of RAID array md0
 md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
 md: using maximum available idle IO bandwidth (but not more than 20 
 KB/sec) for resync.
 md: using 128k window, over a total of 312581632 blocks.
 Filesystem md0: Disabling barriers, not supported by the underlying device
 XFS mounting filesystem md0
 Starting XFS recovery on filesystem: md0 (logdev: internal)
 Ending XFS recovery on filesystem: md0 (logdev: internal)
 
 
 
-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Raid-10 mount at startup always has problem

2007-10-24 Thread Daniel L. Miller

Bill Davidsen wrote:

Daniel L. Miller wrote:

Current mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part


still have the problem where on boot one drive is not part of the 
array.  Is there a log file I can check to find out WHY a drive is 
not being added?  It's been a while since the reboot, but I did find 
some entries in dmesg - I'm appending both the md lines and the 
physical disk related lines.  The bottom shows one disk not being 
 added (this time it was sda) - and the disk that gets skipped on each 
boot seems to be random - there's no consistent failure:


I suspect the base problem is that you are using whole disks instead 
of partitions, and the problem with the partition table below is 
probably an indication that you have something on that drive which 
looks like a partition table but isn't. That prevents the drive from 
being recognized as a whole drive. You're lucky, if the data looked 
enough like a partition table to be valid the o/s probably would have 
tried to do something with it.

[...]
This may be the rare case where you really do need to specify the 
actual devices to get reliable operation.
OK - I'm officially confused now (I was just unofficially before).  WHY 
is it a problem using whole drives as RAID components?  I would have 
thought that building a RAID storage unit with identically sized drives 
- and using each drive's full capacity - is exactly the way you're 
supposed to!  I should mention that the boot/system drive is IDE, and 
NOT part of the RAID.  So I'm not worried about losing the system - but 
I AM concerned about the data.  I'm using four drives in a RAID-10 
configuration - I thought this would provide a good blend of safety and 
performance for a small fileserver.


Because it's RAID-10 - I would ASSuME that I can drop one drive (after 
all, I keep booting one drive short), partition if necessary, and add it 
back in.  But how would splitting these disks into partitions improve 
either stability or performance?
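
Roughly, that drop-and-re-add cycle would look like the sketch below (device
names illustrative, one drive at a time; note the new partition must still be
at least as large as the array's used device size or the add will be refused):

  mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd
  echo ',,fd' | sfdisk /dev/sdd     # one full-size partition, type fd
  mdadm /dev/md0 --add /dev/sdd1
  cat /proc/mdstat                  # wait for the resync to finish before the next drive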


--
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-09-09 Thread Daniel L. Miller

Bill Davidsen wrote:

Daniel L. Miller wrote:

Hi!

I have a four-disk Raid-10 array that I created and mount with 
mdadm.  It seems like every re-boot, either the array is not 
recognized altogether, or one of the disks is not added.  Manually 
adding using mdadm works.


What superblock version and partition type did you use? mdadm -D please.
Thanks for the reply.  I've been wondering why no one answered me - then 
discovered your answer in my mailbox!  Must have been hiding somewhere . 
. . .


Anyway -
mdadm -D /dev/md0
/dev/md0:
   Version : 00.90.03
 Creation Time : Tue Oct  3 19:11:53 2006
Raid Level : raid10
Array Size : 312581632 (298.10 GiB 320.08 GB)
 Used Dev Size : 156290816 (149.05 GiB 160.04 GB)
  Raid Devices : 4
 Total Devices : 4
Preferred Minor : 0
   Persistence : Superblock is persistent

   Update Time : Sun Sep  9 18:51:17 2007
 State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
 Spare Devices : 0

Layout : near=2, far=1
Chunk Size : 32K

  UUID : 9d94b17b:f5fac31a:577c252b:0d4c4b2a
Events : 0.10811466

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd

And you didn't ask, but my mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a


Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-09-09 Thread Richard Scobie

Daniel L. Miller wrote:


And you didn't ask, but my mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a



Hi Daniel,

Try adding

auto=part

at the end of your mdadm.conf ARRAY line.

Regards,

Richard

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-09-09 Thread Daniel L. Miller

Richard Scobie wrote:

Daniel L. Miller wrote:


And you didn't ask, but my mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a


Try adding

auto=part

at the end of your mdadm.conf ARRAY line.

Thanks - will see what happens on my next reboot.

Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid-10 mount at startup always has problem

2007-08-28 Thread Bill Davidsen

Daniel L. Miller wrote:

Hi!

I have a four-disk Raid-10 array that I created and mount with mdadm.  
It seems like every re-boot, either the array is not recognized 
altogether, or one of the disks is not added.  Manually adding using 
mdadm works.


What superblock version and partition type did you use? mdadm -D please.


Ubuntu, custom compiled kernel, 2.6.22
mdadm 2.6.2
Sata hard drives, nvidia CK804 controller - NOT using nvidia raid.




--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html