Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread BERTRAND Joël

Hello,

	Any news about this trouble? Any ideas? I'm trying to fix it, but I 
don't see any specific interaction between raid5 and istd. Has anyone 
tried to reproduce this bug on an arch other than sparc64? I only use 
sparc32 and sparc64 servers and I cannot test on other archs. Of course, I 
have a laptop, but I cannot create a raid5 array on its internal HD to 
test this configuration ;-)


Please note that I won't read my mail until next Saturday morning 
(CEST).


After disconnection of the iSCSI target:

Tasks: 232 total,   7 running, 224 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 15.2%sy,  0.0%ni, 84.3%id,  0.0%wa,  0.1%hi,  0.3%si,  0.0%st

Mem:   4139032k total,  4127584k used,    11448k free,    95752k buffers
Swap:  7815536k total,        0k used,  7815536k free,  3758792k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9738 root      15  -5     0    0    0 R  100  0.0   4:56.82 md_d0_raid5
 9774 root      15  -5     0    0    0 R  100  0.0   5:52.41 istd1
 9739 root      15  -5     0    0    0 R   14  0.0   0:28.90 md_d0_resync
 9916 root      20   0  3248 1544 1120 R    2  0.0   0:00.56 top
 4129 root      20   0 41648 5024 2432 S    0  0.1   2:56.17 fail2ban-server
    1 root      20   0  2576  960  816 S    0  0.0   0:01.58 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1


Regards,

JKB


Re: Time to deprecate old RAID formats?

2007-10-24 Thread David Greaves
Doug Ledford wrote:
> On Mon, 2007-10-22 at 16:39 -0400, John Stoffel wrote:
>
>> I don't agree completely.  I think the superblock location is a key
>> issue, because if you have a superblock location which moves depending
>> on the filesystem or LVM you use to look at the partition (or full disk)
>> then you need to be even more careful about how to poke at things.
>
> This is the heart of the matter.  When you consider that each file
> system and each volume management stack has a superblock, and some
> store their superblocks at the end of devices and some at the beginning,
> and they can be stacked, then it becomes next to impossible to make sure
> a stacked setup is never recognized incorrectly under any circumstance.

I wonder if we should really be talking not about superblock versions 1.0, 1.1,
1.2, etc., but about a data format (0.9 vs 1.0) and a location (end, start, offset4k)?

This would certainly make things a lot clearer to new users:

mdadm --create /dev/md0 --metadata 1.0 --meta-location offset4k


mdadm --detail /dev/md0

/dev/md0:
Version : 01.0
  Metadata-locn : End-of-device
  Creation Time : Fri Aug  4 23:05:02 2006
 Raid Level : raid0


And there you have the deprecation... only two superblock versions and no real
changes to code etc
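
For comparison, a minimal sketch of how the same choice is spelled with
current mdadm, where the location is folded into the metadata version
number (note that --meta-location above is proposed syntax, not an
existing option, and the device names here are arbitrary):

    # superblock at the end of each member: today's "1.0"
    mdadm --create /dev/md0 --metadata=1.0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

    # superblock 4K from the start of each member: today's "1.2"
    mdadm --create /dev/md0 --metadata=1.2 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1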

David


Re: Time to deprecate old RAID formats?

2007-10-24 Thread John Stoffel
 Bill == Bill Davidsen [EMAIL PROTECTED] writes:

Bill John Stoffel wrote:
 Why do we have three different positions for storing the superblock?  

Bill Why do you suggest changing anything until you get the answer to
Bill this question? If you don't understand why there are three
Bill locations, perhaps that would be a good initial investigation.

Because I've asked this question before and not gotten an answer, nor
is it answered in the man page for mdadm on why we have this setup. 

Bill Clearly the short answer is that they reflect three stages of
Bill Neil's thinking on the topic, and I would bet that he had a good
Bill reason for moving the superblock when he did it.

So let's hear Neil's thinking about all this?  Or should I just work
up a patch to do what I suggest and see how that flies? 

Bill Since you have to support all of them or break existing arrays,
Bill and they all use the same format so there's no saving of code
Bill size to mention, why even bring this up?

Because of the confusion factor.  Again, since no one has been able to
articulate a reason why we have three different versions of the 1.x
superblock, nor have I seen any good reasons for why we should have
them, I'm going by the KISS principle to reduce the options to the
best one.

And no, I'm not advocating getting rid of legacy support, but I AM
advocating that we settle on ONE standard format going forward as the
default for all new RAID superblocks.

John


Re: Raid-10 mount at startup always has problem

2007-10-24 Thread Daniel L. Miller

Daniel L. Miller wrote:

Richard Scobie wrote:

Daniel L. Miller wrote:


And you didn't ask, but my mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a


Try adding

auto=part

at the end of your mdadm.conf ARRAY line.

Thanks - will see what happens on my next reboot.


Current mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part


I still have the problem where on boot one drive is not part of the 
array.  Is there a log file I can check to find out WHY a drive is not 
being added?  It's been a while since the reboot, but I did find some 
entries in dmesg - I'm appending both the md lines and the physical-disk 
related lines.  The bottom shows one disk not being added (this time it 
was sda) - and the disk that gets skipped on each boot seems to be 
random - there's no consistent failure:


[...]
md: raid10 personality registered for level 10
[...]
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
[...]
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xc20001428480 ctl 0xc200014284a0 
bmdma 0x00011410 irq 23
ata2: SATA max UDMA/133 cmd 0xc20001428580 ctl 0xc200014285a0 
bmdma 0x00011418 irq 23

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata2.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata1: bounce limit 0x, segment boundary 0x, hw 
segs 61

scsi 1:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata2: bounce limit 0x, segment boundary 0x, hw 
segs 61

ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
ACPI: PCI Interrupt :00:08.0[A] - Link [LSI1] - GSI 22 (level, 
high) - IRQ 22

sata_nv :00:08.0: Using ADMA mode
PCI: Setting latency timer of device :00:08.0 to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0xc2000142a480 ctl 0xc2000142a4a0 
bmdma 0x00011420 irq 22
ata4: SATA max UDMA/133 cmd 0xc2000142a580 ctl 0xc2000142a5a0 
bmdma 0x00011428 irq 22

ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata3.00: configured for UDMA/133
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 31/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata3: bounce limit 0x, segment boundary 0x, hw 
segs 61

scsi 3:0:0:0: Direct-Access ATA  ST3160811AS  3.AA PQ: 0 ANSI: 5
ata4: bounce limit 0x, segment boundary 0x, hw 
segs 61

sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sda: unknown partition table
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sdb: unknown partition table
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sd 2:0:0:0: [sdc] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

sdc: unknown partition table
sd 2:0:0:0: [sdc] Attached SCSI disk
sd 3:0:0:0: [sdd] 312581808 512-byte hardware sectors (160042 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 

Re: Time to deprecate old RAID formats?

2007-10-24 Thread Mike Snitzer
On 10/24/07, John Stoffel [EMAIL PROTECTED] wrote:
  Bill == Bill Davidsen [EMAIL PROTECTED] writes:

 Bill John Stoffel wrote:
  Why do we have three different positions for storing the superblock?

 Bill Why do you suggest changing anything until you get the answer to
 Bill this question? If you don't understand why there are three
 Bill locations, perhaps that would be a good initial investigation.

 Because I've asked this question before and not gotten an answer, nor
 is it answered in the man page for mdadm on why we have this setup.

 Bill Clearly the short answer is that they reflect three stages of
 Bill Neil's thinking on the topic, and I would bet that he had a good
 Bill reason for moving the superblock when he did it.

 So let's hear Neil's thinking about all this?  Or should I just work
 up a patch to do what I suggest and see how that flies?

 Bill Since you have to support all of them or break existing arrays,
 Bill and they all use the same format so there's no saving of code
 Bill size to mention, why even bring this up?

 Because of the confusion factor.  Again, since noone has been able to
 articulate a reason why we have three different versions of the 1.x
 superblock, nor have I seen any good reasons for why we should have
 them, I'm going by the KISS principle to reduce the options to the
 best one.

 And no, I'm not advocating getting rid of legacy support, but I AM
 advocating that we settle on ONE standard format going forward as the
 default for all new RAID superblocks.

Why exactly are you on this crusade to find the one best v1
superblock location?  Giving people the freedom to place the
superblock where they choose isn't a bad thing.  Would adding
something like "If in doubt, 1.1 is the safest choice." to the mdadm
man page give you the KISS warm-fuzzies you're pining for?

The fact that, after you read the manpage, you didn't even know that
the only difference between the v1.x variants is where the superblock
is placed indicates that you're not in a position to be so
tremendously evangelical about effecting code changes that limit
existing options.

Mike


Re: Time to deprecate old RAID formats?

2007-10-24 Thread Bill Davidsen

John Stoffel wrote:

Bill == Bill Davidsen [EMAIL PROTECTED] writes:



Bill John Stoffel wrote:
  
Why do we have three different positions for storing the superblock?  
  


Bill Why do you suggest changing anything until you get the answer to
Bill this question? If you don't understand why there are three
Bill locations, perhaps that would be a good initial investigation.

Because I've asked this question before and not gotten an answer, nor
is it answered in the man page for mdadm on why we have this setup. 


Bill Clearly the short answer is that they reflect three stages of
Bill Neil's thinking on the topic, and I would bet that he had a good
Bill reason for moving the superblock when he did it.

So let's hear Neil's thinking about all this?  Or should I just work
up a patch to do what I suggest and see how that flies? 
  


If you are only going to change the default, I think you're done, since 
people report problems with bootloaders starting versions other than 
0.90. And until I hear Neil's thinking on this, I'm not sure that I know 
what the default location and type should be. In fact, reading the 
discussion I suspect it should be different for RAID-1 (should be at the 
end) and all other types (should be near the front). That retains the 
ability to mount one part of the mirror as a single partition, while 
minimizing the possibility of bad applications seeing something which 
looks like a filesystem at the start of a partition and trying to run 
fsck on it.

Bill Since you have to support all of them or break existing arrays,
Bill and they all use the same format so there's no saving of code
Bill size to mention, why even bring this up?

Because of the confusion factor.  Again, since noone has been able to
articulate a reason why we have three different versions of the 1.x
superblock, nor have I seen any good reasons for why we should have
them, I'm going by the KISS principle to reduce the options to the
best one.

And no, I'm not advocating getting rid of legacy support, but I AM
advocating that we settle on ONE standard format going forward as the
default for all new RAID superblocks.
  


Unfortunately the solution can't be any simpler than the problem, and 
that's why I'm dubious that anything but the documentation should be 
changed, or an additional metadata target added per the discussion 
above, perhaps "best1" for "best 1.x format based on the raid level".


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Raid-10 mount at startup always has problem

2007-10-24 Thread Doug Ledford
On Wed, 2007-10-24 at 07:22 -0700, Daniel L. Miller wrote:
 Daniel L. Miller wrote:
  Richard Scobie wrote:
  Daniel L. Miller wrote:
 
  And you didn't ask, but my mdadm.conf:
  DEVICE partitions
  ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
  UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a
 
  Try adding
 
  auto=part
 
  at the end of you mdadm.conf ARRAY line.
  Thanks - will see what happens on my next reboot.
 
 Current mdadm.conf:
 DEVICE partitions
 ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
 
 still have the problem where on boot one drive is not part of the 
 array.  Is there a log file I can check to find out WHY a drive is not 
 being added?

It usually means either the device is busy at the time the raid startup
happened, or the device wasn't created by udev yet at the time the
startup happened.  Is it failing to start the array properly in the
initrd or is this happening after you've switched to the rootfs and are
running the startup scripts?
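
A minimal sketch of how to narrow that down after a failed boot, assuming
the usual /dev/md0 node and that the members enumerate as sda-sdd as in
the dmesg below:

    cat /proc/mdstat                  # which members actually got bound
    mdadm --detail /dev/md0           # array state and which slot is missing
    mdadm --examine /dev/sd[abcd]     # per-member superblock and event counts
    dmesg | grep 'md:'                # bind/autodetect messages from this boot

If the missing member's event count lags the others, it was left out at
assembly time rather than kicked later.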


 md: md0 stopped.
 md: md0 stopped.
 md: bind<sdc>
 md: bind<sdd>
 md: bind<sdb>

Whole disk raid devices == bad.  Lots of stuff can go wrong with that
setup.

 md: md0: raid array is not clean -- starting background reconstruction
 raid10: raid set md0 active with 3 out of 4 devices
 md: couldn't update array info. -22
 md: resync of RAID array md0
 md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
 md: using maximum available idle IO bandwidth (but not more than 20 
 KB/sec) for resync.
 md: using 128k window, over a total of 312581632 blocks.
 Filesystem md0: Disabling barriers, not supported by the underlying device
 XFS mounting filesystem md0
 Starting XFS recovery on filesystem: md0 (logdev: internal)
 Ending XFS recovery on filesystem: md0 (logdev: internal)
 
 
 
-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




MD driver document

2007-10-24 Thread tirumalareddy marri

 Hi,
   I am looking for the best way to understand the MD
driver (including raid5/6) architecture. I am
developing a driver for one of the PPC-based SoCs. I have
done some code reading and tried to use a HW debugger to
walk through the code, but it was not much help.

  If you have any pointers or documents, I would
greatly appreciate it if you could share them.

Thanks and regards,
 Marri





Re: MD driver document

2007-10-24 Thread Dan Williams
On 10/24/07, tirumalareddy marri [EMAIL PROTECTED] wrote:

  Hi,
I am looking for best way of understanding MD
 driver(including raid5/6) architecture. I am
 developing driver for one of the PPC based SOC. I have
 done some code reading and tried to use HW debugger to
 walk through the code. But it was not much help.

   If you have any pointers or documents, I will
 greatly appreciate if you can share it.


I started out with include/linux/raid/raid5.h.  Also, running it with
the debug print statements turned on will get you familiar with the
code flow.

Lastly, I wrote the following paper which is already becoming outdated:
http://downloads.sourceforge.net/xscaleiop/ols_paper_2006.pdf

 Thanks and regards,
  Marri


--
Dan


Re: Software RAID when it works and when it doesn't

2007-10-24 Thread Bill Davidsen

Alberto Alonso wrote:

On Tue, 2007-10-23 at 18:45 -0400, Bill Davidsen wrote:

  
I'm not sure the timeouts are the problem, even if md did its own 
timeout, it then needs a way to tell the driver (or device) to stop 
retrying. I don't believe that's available, certainly not everywhere, 
and anything other than everywhere would turn the md code into a nest of 
exceptions.





If we lose the ability to communicate with that drive I don't see it
as a problem (that's the whole point, we kick it out of the array). So,
if we can't tell the driver about the failure we are still OK, md could
successfully deal with misbehaved drivers.


I think what you really want is to notice how long the drive and driver 
took to recover or fail, and take action based on that. In general "kick 
the drive" is not optimal for a few bad spots, even if the drive's 
recovery sucks.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread Bill Davidsen

BERTRAND Joël wrote:

Hello,

Any news about this trouble ? Any idea ? I'm trying to fix it, but 
I don't see any specific interaction between raid5 and istd. Does 
anyone try to reproduce this bug on another arch than sparc64 ? I only 
use sparc32 and 64 servers and I cannot test on other archs. Of 
course, I have a laptop, but I cannot create a raid5 array on its 
internal HD to test this configuration ;-)


Sure you can: a few loopback devices and a few iSCSI targets, and you're in 
business. I think the ongoing discussion of timeouts and whatnot may 
bear some fruit eventually, perhaps not as fast as you would like. By 
Saturday a solution may emerge.
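
A minimal sketch of the loopback half, assuming free loop devices and an
unused /dev/md1 (sizes are arbitrary):

    for i in 0 1 2 3; do
        dd if=/dev/zero of=/tmp/r5test$i bs=1M count=1 seek=1023   # ~1GB sparse file
        losetup /dev/loop$i /tmp/r5test$i
    done
    mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

Export the result over iSCSI (or layer raid1 on top) to mimic the failing
setup, and the laptop becomes a test rig.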


Please note that I won't read my mails until next saturday morning 
(CEST). 



--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979




Re: Time to deprecate old RAID formats?

2007-10-24 Thread Bill Davidsen

Doug Ledford wrote:
> On Mon, 2007-10-22 at 16:39 -0400, John Stoffel wrote:
>
>> I don't agree completely.  I think the superblock location is a key
>> issue, because if you have a superblock location which moves depending
>> on the filesystem or LVM you use to look at the partition (or full disk)
>> then you need to be even more careful about how to poke at things.
>
> This is the heart of the matter.  When you consider that each file
> system and each volume management stack has a superblock, and some
> store their superblocks at the end of devices and some at the beginning,
> and they can be stacked, then it becomes next to impossible to make sure
> a stacked setup is never recognized incorrectly under any circumstance.
> It might be possible if you use static device names, but our users
> *long* ago complained very loudly when adding a new disk or removing a
> bad disk caused their setup to fail to boot.  So, along came mount by
> label and auto scans for superblocks.  Once you do that, you *really*
> need all the superblocks at the same end of a device so when you stack
> things, it always works properly.

Let me be devil's advocate: I noted in another post that location might 
be raid level dependent. For raid-1, putting the superblock at the end 
allows the BIOS to treat a single partition as a bootable unit. For all 
other arrangements the end location puts the superblock where it is 
slightly more likely to be overwritten, and where it must be moved if 
the partition grows or whatever.


There really may be no right answer.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Software RAID when it works and when it doesn't

2007-10-24 Thread Alberto Alonso
On Wed, 2007-10-24 at 16:04 -0400, Bill Davidsen wrote:

 I think what you really want is to notice how long the drive and driver 
 took to recover or fail, and take action based on that. In general kick 
 the drive is not optimal for a few bad spots, even if the drive 
 recovery sucks.

The problem is that the driver never comes back and the whole
array hangs, waiting forever. That's why a timeout within the
md code is needed to recover from this type of driver.

Alberto



Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread Dan Williams
On 10/24/07, BERTRAND Joël [EMAIL PROTECTED] wrote:
 Hello,

 Any news about this trouble ? Any idea ? I'm trying to fix it, but I
 don't see any specific interaction between raid5 and istd. Does anyone
 try to reproduce this bug on another arch than sparc64 ? I only use
 sparc32 and 64 servers and I cannot test on other archs. Of course, I
 have a laptop, but I cannot create a raid5 array on its internal HD to
 test this configuration ;-)


Can you collect some oprofile data, as Ming suggested, so we can maybe
see what md_d0_raid5 and istd1 are fighting about?  Hopefully it is as
painless to run on sparc as it is on IA:

opcontrol --start --vmlinux=/path/to/vmlinux
wait
opcontrol --stop
opreport --image-path=/lib/modules/`uname -r` -l

--
Dan


Re: Time to deprecate old RAID formats?

2007-10-24 Thread Neil Brown
On Tuesday October 23, [EMAIL PROTECTED] wrote:
 On Tue, 2007-10-23 at 19:03 -0400, Bill Davidsen wrote:
  John Stoffel wrote:
   Why do we have three different positions for storing the superblock?  
 
  Why do you suggest changing anything until you get the answer to this 
  question? If you don't understand why there are three locations, perhaps 
  that would be a good initial investigation.
  
  Clearly the short answer is that they reflect three stages of Neil's 
  thinking on the topic, and I would bet that he had a good reason for 
  moving the superblock when he did it.
 
 I believe, and Neil can correct me if I'm wrong, that 1.0 (at the end of
 the device) is to satisfy people that want to get at their raid1 data
 without bringing up the device or using a loop mount with an offset.
 Version 1.1, at the beginning of the device, is to prevent accidental
 access to a device when the raid array doesn't come up.  And version 1.2
 (4k from the beginning of the device) would be suitable for those times
 when you want to embed a boot sector at the very beginning of the device
 (which really only needs 512 bytes, but a 4k offset is as easy to deal
 with as anything else).  From the standpoint of wanting to make sure an
 array is suitable for embedding a boot sector, the 1.2 superblock may be
 the best default.
 

Exactly correct.

Another perspective is that I chickened out of making a decision and
chose to support all the credible possibilities that I could think of.
And showed that I didn't have enough imagination.  The other
possibility that I should have included (as has been suggested in this
conversation, and previously on this list) is to store the superblock
both at the beginning and the end for redundancy.  However I cannot
decide whether to combine the 1.0 and 1.1 locations, or the 1.0 and
1.2.  And I don't think I want to support both (maybe I've learned my
lesson).

As for where the metadata should be placed, it is interesting to
observe that the SNIA's DDFv1.2 puts it at the end of the device.
And as DDF is an industry standard sponsored by multiple companies it
must be ..
Sorry.  I had intended to say "correct", but when it came to it, my
fingers refused to type that word in that context.

DDF is in a somewhat different situation though.  It assumes that the
components are whole devices, and that the controller has exclusive
access - there is no way another controller could interpret the
devices differently before the DDF controller has a chance.

DDF is also interesting in that it uses 512 byte alignment for
metadata.  The 'anchor' block is in the last sector of the device.
This contrasts with current md metadata which is all 4K aligned.
Given that the drive manufacturers seem to be telling us that 4096 is
the new 512, I think 4K alignment was a good idea.
It could be that DDF actually specifies the anchor to reside in the
last block rather than the last sector, and it could be that the
spec allows for block size to be device specific - I'd have to hunt
through the spec again to be sure.

For the record, I have no intention of deprecating any of the metadata
formats, not even 0.90.
It is conceivable that I could change the default, though that would
require a decision as to what the new default would be.  I think it
would have to be 1.0 or it would cause too much confusion.

I think it would be entirely appropriate for a distro (especially an
'enterprise' distro) to choose a format and location that it was going
to standardise on and support, and make that the default on that
distro (by using a CREATE line in mdadm.conf).  Debian has already
done this by making 1.0 the default.
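
A minimal sketch of what such a distro default might look like in
mdadm.conf, assuming a CREATE line that sets the metadata format:

    CREATE metadata=1.0 auto=yes

Arrays created without an explicit --metadata would then pick up 1.0 by
default on that system.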

I certainly accept that the documentation is probably less than
perfect (by a large margin).  I am more than happy to accept patches
or concrete suggestions on how to improve that.  I always think it is
best if a non-developer writes documentation (and a developer reviews
it), as then it is more likely to address the issues that a
non-developer will want to read about, and in a way that will make
sense to a non-developer (i.e. I'm too close to the subject to write
good doco).

NeilBrown



Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-24 Thread David Miller
From: Dan Williams [EMAIL PROTECTED]
Date: Wed, 24 Oct 2007 16:49:28 -0700

 Hopefully it is as painless to run on sparc as it is on IA:
 
 opcontrol --start --vmlinux=/path/to/vmlinux
 wait
 opcontrol --stop
 opreport --image-path=/lib/modules/`uname -r` -l

It is painless, I use it all the time.

The only caveat is to make sure the /path/to/vmlinux is
the pre-stripped kernel image.  The images installed
under /boot/ are usually stripped and thus not suitable
for profiling.
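
A quick way to check which image you have, as a sketch: run file(1) on the
image you plan to hand to opcontrol.

    file /path/to/vmlinux
    # "ELF ... not stripped" is what you want; a stripped image has no
    # symbols for opreport to resolve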


Re: Time to deprecate old RAID formats?

2007-10-24 Thread Jeff Garzik

Neil Brown wrote:

On Tuesday October 23, [EMAIL PROTECTED] wrote:
As for where the metadata should be placed, it is interesting to
observe that the SNIA's DDFv1.2 puts it at the end of the device.
And as DDF is an industry standard sponsored by multiple companies it
must be ..
Sorry.  I had intended to say correct, but when it came to it, my
fingers refused to type that word in that context.

DDF is in a somewhat different situation though.  It assumes that the
components are whole devices, and that the controller has exclusive
access - there is no way another controller could interpret the
devices differently before the DDF controller has a chance.


grin agreed.



DDF is also interesting in that it uses 512 byte alignment for
metadata.  The 'anchor' block is in the last sector of the device.
This contrasts with current md metadata which is all 4K aligned.
Given that the drive manufacturers seem to be telling us that 4096 is
the new 512, I think 4K alignment was a good idea.
It could be that DDF actually specifies the anchor to reside in the
last block rather than the last sector, and it could be that the
spec allows for block size to be device specific - I'd have to hunt
through the spec again to be sure.


It's a bit of a mess.

Yes, with 1K and 4K sector devices starting to appear, as long as the 
underlying partitioning gets the initial partition alignment correct, 
this /should/ continue functioning as normal.


If for whatever reason you wind up with an odd-aligned 1K sector device 
and your data winds up aligned to even numbered [hard] sectors, 
performance will definitely suffer.


Mostly this is out of MD's hands, and up to the sysadmin and 
partitioning tools to get hard-sector alignment right.
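
As a sketch of the check: with 512-byte logical sectors, a partition is
4 KiB aligned when its start sector is a multiple of 8.

    fdisk -lu /dev/sda        # -u prints start/end in sectors rather than cylinders
    # start_sector % 8 == 0  means the partition begins on a 4 KiB boundary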




For the record, I have no intention of deprecating any of the metadata
formats, not even 0.90.


strongly agreed



It is conceivable that I could change the default, though that would
require a decision as to what the new default would be.  I think it
would have to be 1.0 or it would cause too much confusion.


A newer default would be nice.

Jeff




Re: Raid-10 mount at startup always has problem

2007-10-24 Thread Daniel L. Miller

Bill Davidsen wrote:

Daniel L. Miller wrote:

Current mdadm.conf:
DEVICE partitions
ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 
UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part


still have the problem where on boot one drive is not part of the 
array.  Is there a log file I can check to find out WHY a drive is 
not being added?  It's been a while since the reboot, but I did find 
some entries in dmesg - I'm appending both the md lines and the 
physical disk related lines.  The bottom shows one disk not being 
added (this time is was sda) - and the disk that gets skipped on each 
boot seems to be random - there's no consistent failure:


I suspect the base problem is that you are using whole disks instead 
of partitions, and the problem with the partition table below is 
probably an indication that you have something on that drive which 
looks like a partition table but isn't. That prevents the drive from 
being recognized as a whole drive. You're lucky, if the data looked 
enough like a partition table to be valid the o/s probably would have 
tried to do something with it.

[...]
This may be the rare case where you really do need to specify the 
actual devices to get reliable operation.
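
A sketch of what specifying the actual devices might look like in
mdadm.conf, assuming the four RAID members enumerate as sda-sdd:

    DEVICE /dev/sd[abcd]
    ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a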
OK - I'm officially confused now (I was just unofficially before).  WHY 
is it a problem using whole drives as RAID components?  I would have 
thought that building a RAID storage unit with identically sized drives 
- and using each drive's full capacity - is exactly the way you're 
supposed to do it!  I should mention that the boot/system drive is IDE, and 
NOT part of the RAID.  So I'm not worried about losing the system - but 
I AM concerned about the data.  I'm using four drives in a RAID-10 
configuration - I thought this would provide a good blend of safety and 
performance for a small fileserver.


Because it's RAID-10 - I would ASSuME that I can drop one drive (after 
all, I keep booting one drive short), partition if necessary, and add it 
back in.  But how would splitting these disks into partitions improve 
either stability or performance?


--
Daniel