raid10 far layout outperforms offset at writing? (was: Help with chunksize on raid10 -p o3 array)

2007-03-19 Thread Peter Rabbitson

Peter Rabbitson wrote:
> I have been trying to figure out the best chunk size for raid10 before 
> migrating my server to it (currently raid1). I am looking at 3 offset 
> stripes, as I want to have two drive failure redundancy, and offset 
> striping is said to have the best write performance, with read 
> performance equal to far.


Incorporating suggestions from previous posts (thank you everyone), I 
used this modified script: http://rabbit.us/pool/misc/raid_test2.txt 
To negate the effects of caching, free memory was pushed below 200 MB by 
filling a tmpfs mount with swap disabled. Here is what I got with the far 
layout (-p f3): http://rabbit.us/pool/misc/raid_far.html The clear winner 
is 1M chunks, and it is very consistent at any block size. I was even more 
surprised to see that my read speed was identical to that of a raid0, 
getting near the _maximum_ physical speed of 4 drives (roughly 55MB 
sustained across 1.2G). Unlike the offset layout, far really shines at 
reading stuff back. The write speed did not suffer noticeably compared to 
offset striping. Here are the results for offset (-p o3) for comparison: 
http://rabbit.us/pool/misc/raid_offset.html. They roughly seem to 
correlate with my earlier testing using dd.


So I guess the way to go for this system will be f3, although the md(4) 
man page says that the offset layout should be more beneficial. Is there 
anything I missed while setting up my o3 array that would explain why I 
got worse performance for both read and write compared to f3?


Once again thanks everyone for the help.
Peter
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


strange test results

2007-03-19 Thread Tomka Gergely
Hi!

I am running tests on our new test machine. It has two dual-core Xeons, 
an Intel 5000 chipset, two 3ware SATA RAID cards on PCIe, and 15 SATA2 
disks, and runs Debian etch. More info at the bottom.

The first phase of the test is probing various RAID levels. So I 
configured the cards to export the 15 disks as JBOD, and hacked together 
a testing script. The script builds RAID arrays, waits for sync, and then 
runs this command:

iozone -eM -s 4g -r 1024 -i0 -i1 -i2 -i8 -t16 -+u
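
For reference, a short annotated reading of that command line, with a
helper that estimates the total amount of data the run touches. The flag
meanings are taken from iozone's usage text; the helper name
iozone_total_gib is hypothetical, and it assumes each of the -t threads
works on its own file of the -s size.

```shell
# Flags used in the iozone run above:
#   -e       include flush (fsync) in the timings
#   -M       record the uname string in the output
#   -s 4g    file size
#   -r 1024  record size in KB (1 MB)
#   -i0 -i1 -i2 -i8  write/rewrite, read/reread, random read/write, random mix
#   -t16     throughput mode with 16 threads
#   -+u      report CPU utilisation
#
# Aggregate data set in GiB, assuming one -s sized file per thread.
iozone_total_gib() {
    threads=$1 size_gib=$2
    echo $(( threads * size_gib ))
}
```

Under that assumption, iozone_total_gib 16 4 gives the size of the whole
working set, which matters when judging how much the page cache can
influence the numbers.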

The graphs of the results are here:

http://gergely.tomka.hu/dt/index.html

And I have a lot of questions.

http://gergely.tomka.hu/dt/1.html

This graph is crazy, like thunderbolts. But raid50 is generally slower 
than raid5. Why?

http://gergely.tomka.hu/dt/3.html

This is the only graph I can explain :)

http://gergely.tomka.hu/dt/4.html

With random readers, why does raid0 slow down? And why is raid10 faster 
than raid0?

http://gergely.tomka.hu/dt/2.html

Why can't raid6 get faster with more disks, the way raid5 and raid50 do?

So lots of questions. I am generally surprised by the non-linearity of 
some results and the lack of acceleration with more disks on other 
results. And now, the details:

Hardware:

Base Board Information
Manufacturer: Supermicro
Product Name: X7DB8
Processor Information
Socket Designation: LGA771/CPU1
Type: Central Processor
Family: Xeon
Manufacturer: Intel
ID: 64 0F 00 00 FF FB EB BF
Signature: Type 0, Family 15, Model 6, Stepping 4
(two cpus)
Memory Device
Array Handle: 0x0017
Error Information Handle: No Error
Total Width: 72 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: 1
Locator: DIMM x 4
Bank Locator: Bank1
Type: DDR2
Type Detail: Synchronous
Speed: 533 MHz (1.9 ns)
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
(two of this also)

ursula:~# tw_cli show

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
c0    9590SE-8ML   8       7        7       0        1       1       -
c1    9590SE-8ML   8       8        8       0        1       1       -

The tests generally:
mdadm
mkfs.xfs
blockdev --setra 524288 md (maybe not a good idea for multiple arrays)
do iozone test
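
A note on the readahead step: blockdev --setra takes its value in
512-byte sectors, so the figure above is larger than it may look. A small
sketch of the conversion (the helper name setra_to_mib is made up for
this example):

```shell
# Convert a "blockdev --setra" value (512-byte sectors) to MiB.
setra_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}
```

Calling setra_to_mib 524288 shows how much readahead is being requested
per md device, which may help explain the caution about using it with
multiple arrays.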

raid10 is two-disk raid1s in raid0, and raid50 is three-disk raid5s in 
raid0.

These tests have been running for a week and are now slowly finishing. For 
this reason, replicating the tests to filter out accidents is not a good 
option.

Any comments?

-- 
Tomka Gergely, [EMAIL PROTECTED]


Raid1 replaced with raid10?

2007-03-19 Thread Peter Rabbitson

Hi,

I just tried an idea I got after fiddling with raid10, and to my dismay 
it worked as I thought it would. I used two small partitions on separate 
disks to create a raid1 array. Then I did dd if=/dev/md2 of=/dev/null. I 
got only one of the disks reading. Nothing unexpected. Then I created a 
raid10 array on the same two partitions with the options -l10 -n2 -pf2. 
The same dd executed at twice the speed, reading _simultaneously_ from 
both drives. I did some bonnie++ benchmarking with the same result: raid1 
reads only from a single disk, raid10 from both. Write performance is 
worse (about 10% slower) with raid10, but you get twice the read speed.
In this light the obvious question is: can raid10 be used as a drop-in 
replacement for raid1, or is there a caveat with having the number of 
disks equal to the number of chunk copies?
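
The comparison described above could be repeated roughly as follows. This
is a dry-run sketch: the function only prints the commands. The long
option spellings correspond to the -l, -n and -p options quoted in the
post; the function name, the bs=1M block size and whatever partition
names are passed in are assumptions for illustration. Note that
mdadm --create destroys existing data on the named partitions.

```shell
# Dry run: prints the commands for the raid1 vs raid10 f2 read test.
#   $1: md device (e.g. /dev/md2), $2 and $3: the two partitions to use
print_raid1_vs_raid10_steps() {
    md=$1 a=$2 b=$3
    cat <<EOF
mdadm --create $md --level=1 --raid-devices=2 $a $b
dd if=$md of=/dev/null bs=1M
mdadm --stop $md
mdadm --create $md --level=10 --raid-devices=2 --layout=f2 $a $b
dd if=$md of=/dev/null bs=1M
EOF
}
```

Watching per-disk activity during each dd (for example with iostat) would
show whether one or both members are being read.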


Peter


Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Bill Davidsen

Michael Schwarz wrote:

> More than ever, I am convinced that it is actually a hardware problem, but
> I am curious for the opinions of both of you on whether the system
> (meaning, I guess, the combination of usb-storage driver and raid) is
> really doing the best with what it has.

See below, but the short answer is there is probably room for improvement.

> My last effort was to switch to a different computer. When I did, I got in
> the dmesg log (unfortunately, not preserved, although I should be able to
> recreate) that one of the flash drives had bad blocks. Some part of the
> system eventually decided it was a dead device (I believe dmesg indicated
> the scsi subsystem said so). The device (it happened to be /dev/sdc) was
> peremptorily dropped from the system. This appears to be what hung the
> raid system.
>
> (Why these messages never appeared on the other computer is beyond me;
> obviously some difference in how the actual USB controller reports errors,
> but, as I said, I've never studied USB drivers or hardware. In fact, once
> you get beyond the UARTs you are getting sophisticated to me)
>
> I've built an array of five known-good devices and so far it works
> swimmingly (at least on the hardware that was better at error reporting).
>
> So it seems to me that there is probably nothing actually wrong with the
> drivers or their interactions and it leaves me only asking if there should
> be some sort of improvement in error reporting/recovery up to userland.
>
> If I am right and the scsi system was marking a device as dead, shouldn't
> the userland read against the md device get an error instead of an
> indefinite hang?
  


Let me make sure I have this scenario right... one write process (dd or 
cp) hangs, but you can still access data on the array, so the devices 
(all of them?) are working. It would be useful at that point to see if 
/proc/mdstat shows one device as failed.
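
One way to make that check scriptable: md flags a failed member in
/proc/mdstat with "(F)" after the device name. The sketch below reads
mdstat-format text on stdin and prints each array together with any
member carrying that flag. The function name failed_members is made up
for this example, and whether a RAID-0 array ever marks members this way
is not something this thread establishes.

```shell
# Read /proc/mdstat-style text on stdin; print "<array> <member>" for
# every member device flagged as failed, e.g. "sdb1[1](F)".
failed_members() {
    awk '/^md/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(F\)/) {
                sub(/\[.*/, "", $i)   # strip "[slot](F)" from the name
                print $1, $i
            }
    }'
}
```

Typical use would be failed_members < /proc/mdstat while the copy is
hung; empty output means no member is currently flagged as failed.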


Given that I have described the behavior, I would think that there is 
still a problem in the driver or md somewhere: hangs should time out, 
errors should be reported up, and if this is caused by a lost write 
completion, I would hope that would be timed out and reported. That's my 
read on it: these "just hangs" cases are probably undetected or 
mishandled errors which should be passed up and reported to the 
application, or retried and completed. Or handled in some better way than 
what you describe.


Bad hardware is a fact of life, if you feel like chasing this more, an 
understanding of what the hardware did wrong and what the kernel didn't 
do right would be helpful. Of course the failure mode may be so rare, 
and the fix so time-consuming that it won't get fixed, but it can get 
documented.

> Beyond this question which I leave to you (although I'd love to hear your
> answers/thoughts), I think we can safely say that the problem was hardware
> (even if hard to find). If either of you would like, I'd be happy to find
> time this week to recreate the error on my better PC and send that
> along.
>
> As for rolling a custom kernel with more message buffer, well, I'm going
> to be getting into a new device driver in the coming months, so a custom
> debug kernel is definitely in my future, but I'm not sure when.
>
> I must say, the kernel has become a much more complex beastie since 2.2.x!
> (Although it also appears to be improved and somewhat more organized --
> but definitely MUCH larger!)
>
> Thank you both so much! I wouldn't even have diagnosed my hardware problem
> without your prompts. I'm very grateful. Let me know if you'd like those
> dmesg logs or if you'd just like to let it go!

  

--

bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Michael Schwarz
I'm going to hang on to the hardware. This is a pilot/demo that may lead
to development of a new device, and, if so, I'll be getting back into
device driver writing. Working this problem would be great practice for
that. So I will do it. The only problem is I don't know when!

I believe I can replicate the problem, so I'll find time (perhaps next
weekend) to capture the data of interest.

Mr. Stern: Where might I go for low level programming information on USB
devices? I'm interested in registers/DMA/packet formats, etc.

I've found info on the USB protocol itself, but I haven't found info on
devices. Obviously I can dig through kernel source, but documents would be
nice! Again, if this is an unreasonable request for you to do my
homework, just say so! I won't be offended. I'm sure I can find it myself
given time, but if you happen to have some URLs handy, they'd be
appreciated.

YET AGAIN thank you both! You've been of great help.

-- 
Michael Schwarz

Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Alan Stern
On Mon, 19 Mar 2007, Michael Schwarz wrote:

> I'm going to hang on to the hardware. This is a pilot/demo that may lead
> to development of a new device, and, if so, I'll be getting back into
> device driver writing. Working this problem would be great practice for
> that. So I will do it. The only problem is I don't know when!
>
> I believe I can replicate the problem, so I'll find time (perhaps next
> weekend) to capture the data of interest.

Michael, you don't seem to appreciate the basic principles for tracking 
down problems.

First: Simplify.  Get rid of everything that isn't relevant
to the problem and could serve to distract you.  In particular,
don't run X.  That will eliminate around half of your running
processes and shrink the stack dump down so that it might fit
in the kernel buffer without overflowing.

Second: Simplify.  Don't run kernels that have been modified by
Fedora or anybody else.  Use a plain vanilla kernel from
kernel.org.

Third: Simplify.  Try not to collect the same data over and over
again (take a look at the starts of all those dmesg files you
compressed and emailed).  You can clear the kernel's log buffer
after dumping it by doing "dmesg -c >/dev/null".

Fourth: Be prepared to make changes.  This means making changes
to the kernel configuration or source code, another reason for 
using a stock kernel.

To get some really useful data, you need to build a kernel with 
CONFIG_USB_DEBUG turned on.  Without that setting there won't be any 
helpful debugging information in the log.

Then you should run a minimal system.  Single-user mode would be best, 
but that can be _too_ bare-bones.  No GUI will suffice.

Then you should clear the kernel log before starting the big file 
copy.  Basically nothing that happens before then is important, because 
nothing has gone wrong.

Then after the hang occurs, see what shows up in the dmesg log.  And get a 
stack dump.
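
Put together, the procedure might look like the sketch below. It is only
an outline under stated assumptions: the function names, the fixed wait
before the dump, and the idea of passing the copy command as a string are
all invented for this example. The stack dump uses the SysRq "t" trigger
through /proc/sysrq-trigger, which needs root and a kernel built with
magic SysRq support.

```shell
# True if the given kernel config file has CONFIG_USB_DEBUG=y.
usb_debug_enabled() {
    grep -q '^CONFIG_USB_DEBUG=y' "$1"
}

# Run as root on a console with no GUI.
#   $1: command that triggers the hang (hypothetical, e.g. a large cp)
#   $2: file to save the kernel log to
#   $3: seconds to wait for the hang to appear (assumed default: 600)
capture_hang_log() {
    dmesg -c >/dev/null           # discard everything logged so far
    sh -c "$1" &                  # start the copy in the background
    sleep "${3:-600}"             # give the hang time to occur
    echo t >/proc/sysrq-trigger   # ask the kernel to dump all task stacks
    dmesg >"$2"                   # save only what was logged during the test
}
```

Checking usb_debug_enabled against the running kernel's config file first
avoids collecting a log that has no USB debugging output in it.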

> Mr. Stern: Where might I go for low level programming information on USB
> devices? I'm interested in registers/DMA/packet formats, etc.

Are you interested in USB devices (i.e., flash drives, webcams, and so on
-- the things you plug in to a USB connection) or USB controllers (the 
hardware in your computer that manages the USB bus)?

> I've found info on the USB protocol itself, but I haven't found info on
> devices. Obviously I can dig through kernel source, but documents would be
> nice! Again, if this is an unreasonable request for you to do my
> homework, just say so! I won't be offended. I'm sure I can find it myself
> given time, but if you happen to have some URLs handy, they'd be
> appreciated.

There are three types of USB controllers used in personal computers: UHCI, 
OHCI, and EHCI.  Links to their specifications are available here:

http://www.usb.org/developers/resources/

Specifications for various classes of USB devices are available here:

http://www.usb.org/developers/devclass_docs

Alan Stern



Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Michael Schwarz
Comments below.

-- 
Michael Schwarz

> On Mon, 19 Mar 2007, Michael Schwarz wrote:
>
>> I'm going to hang on to the hardware. This is a pilot/demo that may lead
>> to development of a new device, and, if so, I'll be getting back into
>> device driver writing. Working this problem would be great practice for
>> that. So I will do it. The only problem is I don't know when!
>>
>> I believe I can replicate the problem, so I'll find time (perhaps next
>> weekend) to capture the data of interest.
>
> Michael, you don't seem to appreciate the basic principles for tracking
> down problems.

I want to bristle at this. I've been a professional software developer for
nearly 20 years. But I can't because all of your points below are, of
course, dead on for tracking down a device-level problem.


>   First: Simplify.  Get rid of everything that isn't relevant
>   to the problem and could serve to distract you.  In particular,
>   don't run X.  That will eliminate around half of your running
>   processes and shrink the stack dump down so that it might fit
>   in the kernel buffer without overflowing.

Right on. And I know this; I should have had two boxes where I was
working; one where I could do browsy-emaily things separate from the
problem I was working.


>   Second: Simplify.  Don't run kernels that have been modified by
>   Fedora or anybody else.  Use a plain vanilla kernel from
>   kernel.org.

Yeah; But here was where I lacked confidence. I used to know every inch of
my kernel and my hardware, but, as previously stated, that was back in the
2.2.x days. I wasn't confident that I could run my hardware with a
plain-vanilla kernel or that I could successfully roll my own working
2.6.x kernel in a timely manner. But, of course, I understand why this is
a good idea.


>   Third: Simplify.  Try not to collect the same data over and over
>   again (take a look at the starts of all those dmesg files you
>   compressed and emailed).  You can clear the kernel's log buffer
>   after dumping it by doing "dmesg -c >/dev/null".

Thanks, I actually didn't know that flag. Makes me feel pretty stupid...


>   Fourth: Be prepared to make changes.  This means making changes
>   to the kernel configuration or source code, another reason for
>   using a stock kernel.

I agree -- I just lacked confidence doing so with newer kernels. I used to
ALWAYS build my own kernel right up through the 2.2.x series, building the
kernel to exactly match my hardware. I just haven't kept up. And if you
compare the 2.2.x kernel's configuration parameter list to the 2.6.x,
well, you can maybe understand why I was reluctant to launch on that when
under time pressure. But your point (I gather) is that if I had, it might
well have taken less time than it did...


> To get some really useful data, you need to build a kernel with
> CONFIG_USB_DEBUG turned on.  Without that setting there won't be any
> helpful debugging information in the log.

Before I send any more info on this problem, I will do this and all of the
above.


> Then you should run a minimal system.  Single-user mode would be best,
> but that can be _too_ bare-bones.  No GUI will suffice.

Will do.


> Then you should clear the kernel log before starting the big file
> copy.  Basically nothing that happens before then is important, because
> nothing has gone wrong.
>
> Then after the hang occurs, see what shows up in the dmesg log.  And get a
> stack dump.
>
>> Mr. Stern: Where might I go for low level programming information on USB
>> devices? I'm interested in registers/DMA/packet formats, etc.
>
> Are you interested in USB devices (i.e., flash drives, webcams, and so on
> -- the things you plug in to a USB connection) or USB controllers (the
> hardware in your computer that manages the USB bus)?

Firstly the controllers, then specific devices.


>> I've found info on the USB protocol itself, but I haven't found info on
>> devices. Obviously I can dig through kernel source, but documents would be
>> nice! Again, if this is an unreasonable request for you to do my
>> homework, just say so! I won't be offended. I'm sure I can find it myself
>> given time, but if you happen to have some URLs handy, they'd be
>> appreciated.
>
> There are three types of USB controllers used in personal computers: UHCI,
> OHCI, and EHCI.  Links to their specifications are available here:
>
>   http://www.usb.org/developers/resources/

Thanks. This is just what I wanted.


> Specifications for various classes of USB devices are available here:
>
>   http://www.usb.org/developers/devclass_docs

And this. Thank you much. I won't post on this issue again until I've
cleared the decks of the items you mention above. Thanks again.


> Alan Stern





Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ)

2007-03-19 Thread Alan Stern
On Mon, 19 Mar 2007, Michael Schwarz wrote:

> Yeah; But here was where I lacked confidence. I used to know every inch of
> my kernel and my hardware, but, as previously stated, that was back in the
> 2.2.x days. I wasn't confident that I could run my hardware with a
> plain-vanilla kernel or that I could successfully roll my own working
> 2.6.x kernel in a timely manner. But, of course, I understand why this is
> a good idea.

It's not so hard to do, if you start from a known-good configuration.  
For instance, you could take the config your current distribution's kernel
is built from and just use it, although it would take a long time to build
because it includes so many drivers.  Whittling it down to just the
drivers you need would be tedious but not very difficult.
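
As a rough sketch of that starting point: copy a known-good config into
an unpacked kernel.org tree and let the build system fill in anything new.
The function name seed_kernel_config is hypothetical, and the location of
the distribution's config file varies (on many systems it is installed
under /boot as config-<version>).

```shell
# Seed a kernel source tree with a known-good configuration.
#   $1: known-good config file (e.g. the distribution's config)
#   $2: unpacked kernel.org source tree
seed_kernel_config() {
    cp "$1" "$2/.config"
}

# Then, inside the tree:
#   make oldconfig   # prompts only for options new to this kernel version
#   make             # long build: a distribution config enables many drivers
```

Trimming drivers afterwards can be done with make menuconfig, rebuilding
and retesting after each round so a bad cut is easy to spot.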

Alan Stern



Re: [PATCH] [PPC32] ADMA support for PPC 440SPe processors.

2007-03-19 Thread Michael Ellerman
On Mon, 2007-03-19 at 17:13 +0100, Benjamin Herrenschmidt wrote:
> BTW folks. Would it be hard to change your spe_ prefixes to something
> else ? There's already enough confusion between the freescale SPE unit
> and the cell SPEs :-)
>
> (such confusion is annoying when grepp'ing for code that might touch a
> given functionality for example).

Please please please!

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



Re: [PATCH] [PPC32] ADMA support for PPC 440SPe processors.

2007-03-19 Thread Stefan Roese
On Tuesday 20 March 2007 04:06, Michael Ellerman wrote:
> On Mon, 2007-03-19 at 17:13 +0100, Benjamin Herrenschmidt wrote:
> > BTW folks. Would it be hard to change your spe_ prefixes to something
> > else ? There's already enough confusion between the freescale SPE unit
> > and the cell SPEs :-)
> >
> > (such confusion is annoying when grepp'ing for code that might touch a
> > given functionality for example).
>
> Please please please!

OK. Who can resist so much pleading. ;-)

Best regards,
Stefan