Re: Help RAID5 reshape Oops / backup-file

2007-10-11 Thread Nagilum

- Message from [EMAIL PROTECTED] -
Date: Fri, 12 Oct 2007 09:51:08 +1000
From: Neil Brown <[EMAIL PROTECTED]>
Reply-To: Neil Brown <[EMAIL PROTECTED]>
 Subject: Re: Help RAID5 reshape Oops / backup-file
  To: Nagilum <[EMAIL PROTECTED]>
  Cc: linux-raid@vger.kernel.org



On Thursday October 11, [EMAIL PROTECTED] wrote:

Ok, after looking in "Grow.c" I can see that the backup file is
removed once the critical section has passed:

if (backup_file)
unlink(backup_file);

printf(Name ": ... critical section passed.\n");

Since I had passed that point I'll try to find out where
Grow_restart() stumbles. By looking at it I'm not even sure it's able
to "resume" and not just restart. :-/



It isn't a problem that you didn't specify a backup-file.
If you don't, mdadm uses some spare space on one of the new drives.
After the critical section has passed, the backup file isn't needed
any longer.
The problem is that mdadm still wants to find and recover from it.

I thoroughly tested mdadm restarting from a crash during the critical
section, but it looks like I didn't properly test restarting from a
later crash.

I think if you just change the 'return 1' at the end of Grow_restart
to 'return 0' it should work for you.

I'll try to get this fixed properly (and tested) and release a 2.6.4.

NeilBrown




- End message from [EMAIL PROTECTED] -

Thanks, I changed Grow_restart as suggested, now I get:
nas:~/mdadm-2.6.3# ./mdadm -A /dev/md0 /dev/sd[a-e]
mdadm: /dev/md0 assembled from 3 drives and 2 spares - not enough to  
start the array.


nas:~/mdadm-2.6.3# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sda[0] sde[6] sdd[5] sdc[2] sdb[1]
  2441543360 blocks

unused devices: <none>

which is similar to what the old mdadm is telling me.
I'll try to find out where it gets the idea these are spares..
Would it be a good idea to update to vanilla 2.6.23 instead of running  
Debian Etch's 2.6.18-5?

If there is anything I can do to help with v2.6.4 let me know!
Thanks,
Alex.


#_  __  _ __ http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__  _(_) /_  _  [EMAIL PROTECTED] \n +491776461165 #
#  // _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#   /___/ x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #




cakebox.homeunix.net - all the machine one needs..





Re: [PATCH] Expose the degraded status of an assembled array through sysfs

2007-10-11 Thread Neil Brown
On Wednesday October 10, [EMAIL PROTECTED] wrote:
> On Mon, Sep 10, 2007 at 06:51:14PM +0200, Iustin Pop wrote:
> > The 'degraded' attribute is useful to quickly determine if the array is
> > degraded, instead of parsing 'mdadm -D' output or relying on the other
> > techniques (number of working devices against number of defined devices, 
> > etc.).
> > The md code already keeps track of this attribute, so it's useful to export 
> > it.
> > 
> > Signed-off-by: Iustin Pop <[EMAIL PROTECTED]>
> > ---
> > Note: I sent this back in January and people agreed it was a good
> > idea.  However, it has not been picked up. So here I resend it again.
> 
> Ping? Neil, could you spare a few moments to look at this? (and sorry for
> bothering you)

Yeh thanks for your patience.  September was not a good time for
getting my attention.

Yes, I think this is both sensible and useful.

I might just change..

> > @@ -2842,6 +2842,12 @@ sync_max_store(mddev_t *mddev, const char *buf, size_t len)
> >  static struct md_sysfs_entry md_sync_max =
> >  __ATTR(sync_speed_max, S_IRUGO|S_IWUSR, sync_max_show, sync_max_store);
> >  
> > +static ssize_t
> > +degraded_show(mddev_t *mddev, char *page)
> > +{
> > +   return sprintf(page, "%i\n", mddev->degraded);
> > +}

... the %i to a %d though.   At first I thought it was a typo, but
then checked the man page and discovered that %d and %i both mean the
same thing (so why support them both I wonder).

Thanks,
NeilBrown
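
For anyone wanting to poke at the new attribute once a kernel with this
patch is running, it appears alongside the other md entries in sysfs.
A minimal check, assuming the array is /dev/md0 (adjust the device name
as needed):

# prints 0 when no member devices are missing; otherwise the number
# of failed/missing devices the array is currently running without
cat /sys/block/md0/md/degraded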


Re: Help RAID5 reshape Oops / backup-file

2007-10-11 Thread Neil Brown
On Thursday October 11, [EMAIL PROTECTED] wrote:
> Ok, after looking in "Grow.c" I can see that the backup file is  
> removed once the critical section has passed:
> 
>   if (backup_file)
>   unlink(backup_file);
> 
>   printf(Name ": ... critical section passed.\n");
> 
> Since I had passed that point I'll try to find out where  
> Grow_restart() stumbles. By looking at it I'm not even sure it's able  
> to "resume" and not just restart. :-/
> 

It isn't a problem that you didn't specify a backup-file.
If you don't, mdadm uses some spare space on one of the new drives.
After the critical section has passed, the backup file isn't needed
any longer.
The problem is that mdadm still wants to find and recover from it.

I thoroughly tested mdadm restarting from a crash during the critical
section, but it looks like I didn't properly test restarting from a
later crash.

I think if you just change the 'return 1' at the end of Grow_restart
to 'return 0' it should work for you.

I'll try to get this fixed properly (and tested) and release a 2.6.4.

NeilBrown


Re: RAID 5 performance issue.

2007-10-11 Thread Justin Piszcz



On Thu, 11 Oct 2007, Andrew Clayton wrote:


On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:


Andrew Clayton wrote:

On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

 >> Can you start a 'vmstat 1' in one window, then start whatever
 >> you do to get crappy performance.  That would be interesting to see.

In trying to find something simple that can show the problem I'm
seeing, I think I may have found the culprit.

Just testing on my machine at home, I made this simple program.

/* fslattest.c */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>


int main(int argc, char *argv[])
{
char file[255];

if (argc < 2) {
printf("Usage: fslattest file\n");
exit(1);
}

strncpy(file, argv[1], 254);
printf("Opening %s\n", file);

while (1) {
int testfd = open(file,
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);
close(testfd);
unlink(file);
sleep(1);
}

exit(0);
}


If I run this program under strace in my home directory (XFS file
system on a (new) disk (no raid involved) all to its own), like

$ strace -T -e open ./fslattest test

It doesn't look too bad.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.005043> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.016844>

If I then start up a dd in the same place.

$ dd if=/dev/zero of=bigfile bs=1M count=500

Then I see the problem I'm seeing at work.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<2.000348> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<2.224636> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>

Doing the same on my other disk which is Ext3 and contains the root
fs, it doesn't ever stutter

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.015423> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.92>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.93> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.88>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.000103> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.96>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.94> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.91> open("test",
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
<0.000107>


Somewhere in there was the dd, but you can't tell.

I've found if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds with occasional spikes > 1
second.

When doing this on the raid array.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.71>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>

dd kicks in

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.94>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.63>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>

dd finishes
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>


I guess I should take this to the XFS folks.


Try mounting the filesystem "noatime" and see if that's part of the
problem.


Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
regression.

http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

Cheers,

Andrew



Nice!  Thanks for reporting the final result after 1-2 weeks of
debugging/discussion; nice that you found it.


Justin.


Re: Different sized disks for RAID1+0 or RAID10.

2007-10-11 Thread Neil Brown
On Wednesday October 10, [EMAIL PROTECTED] wrote:
> 
> I've currently got a pair of identical drives in a RAID1 set for
> my data partition. I'll be getting a pair of bigger drives in a
> bit, and I was wondering if I could RAID1 those (of course) and
> then RAID0 the two differently sized mds. Even better, will RAID10
> let me do this? 
> 
> I don't need to grow the current RAID1 into this new beast, I've
> got a place I can copy the existing data so I can start from
> scratch.
> 
> I imagine the answer is: "sure, RAID10 / RAID0 lets you do this,
> but you don't get the striping performance benefit for some of
> the data", which would be ok with me until the smaller drives go
> bad and I replace them.

RAID0 happily handles devices of different sizes and uses all the
available space.
RAID10 will only use as much space off each device as the smallest
device allows.

So you should have 2 RAID1 arrays of different sizes, and use RAID0 to
combine them.
Don't use RAID10.

NeilBrown
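
For concreteness, a rough sketch of that layout (device names here are
placeholders and partitioning details are left out; treat it as an
outline rather than a recipe):

# two RAID1 pairs, each sized to its own pair of disks
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
# RAID0 across the two mirrors uses the full space of both
mdadm --create /dev/md3 --level=0 --raid-devices=2 /dev/md1 /dev/md2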


Re: How do i limit the bandwidth-usage while resyncing on RAID 1?

2007-10-11 Thread Neil Brown
On Wednesday October 10, [EMAIL PROTECTED] wrote:
> Hello List,
> 
> while resyncing, the process takes the whole bandwidth from disk to disk.
> 
> This leads to a VERY unhappy situation, because the system on this raid is
> impractically slow now, since it has to wait for disk I/O.
> 
> How can I tune this? I want something like "nice -n 19 dm-mirror" ;)

Are you using MD raid, or DM?

If DM, then it is my understanding that you cannot ratelimit.

If MD, then you already have the answer from a previous post.

NeilBrown


Re: AW: How do i limit the bandwidth-usage while resyncing on RAID 1?

2007-10-11 Thread Richard Scobie

Rustedt, Florian wrote:

 Hi Richard,

Seems to me that you misunderstood? There's no rsync in RAID afaik?
This is an internal driver...?



Hi Florian,

My mistake - I read "rsync" in the Subject of your mail instead of "resync".

Try "echo X > /sys/block/mdY/md/sync_speed_max"

where X is the speed you require in kB/s and Y is your md device.

Regards,

Richard
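
For example, to cap the resync of md0 at roughly 10 MB/s (the numbers
are only an illustration; pick values that suit your disks):

echo 10000 > /sys/block/md0/md/sync_speed_max
# optionally lower the floor the resync tries to maintain under load
echo 1000 > /sys/block/md0/md/sync_speed_min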


Re: RAID 5 performance issue.

2007-10-11 Thread Andrew Clayton
On Thu, 11 Oct 2007 13:06:39 -0400, Bill Davidsen wrote:

> Andrew Clayton wrote:
> > On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:
> >
> >   >> Can you start a 'vmstat 1' in one window, then start whatever
> >   >> you do to get crappy performance.  That would be interesting to see.
> >
> > In trying to find something simple that can show the problem I'm
> > seeing, I think I may have found the culprit.
> >
> > Just testing on my machine at home, I made this simple program.
> >
> > /* fslattest.c */
> >
> > #define _GNU_SOURCE
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> >
> >
> > int main(int argc, char *argv[])
> > {
> > char file[255];
> >
> > if (argc < 2) {
> > printf("Usage: fslattest file\n");
> > exit(1);
> > }
> >
> > strncpy(file, argv[1], 254);
> > printf("Opening %s\n", file);
> >
> > while (1) {
> > int testfd = open(file,
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);
> > close(testfd);
> > unlink(file);
> > sleep(1);
> > }
> >
> > exit(0);
> > }
> >
> >
> > If I run this program under strace in my home directory (XFS file
> > system on a (new) disk (no raid involved) all to its own), like
> >
> > $ strace -T -e open ./fslattest test
> >
> > It doesn't look too bad.
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.005043> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.016844>
> >
> > If I then start up a dd in the same place.
> >
> > $ dd if=/dev/zero of=bigfile bs=1M count=500
> >
> > Then I see the problem I'm seeing at work.
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <2.000348> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <2.224636> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>
> >
> > Doing the same on my other disk which is Ext3 and contains the root
> > fs, it doesn't ever stutter
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.015423> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.92>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.93> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.88>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000103> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.96>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.94> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.91> open("test",
> > O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
> > <0.000107>
> >
> >
> > Somewhere in there was the dd, but you can't tell.
> >
> > I've found if I mount the XFS filesystem with nobarrier, the
> > latency is reduced to about 0.5 seconds with occasional spikes > 1
> > second.
> >
> > When doing this on the raid array.
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.71>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>
> >
> > dd kicks in
> >
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.94>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.63>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>
> >
> > dd finishes
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
> > open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>
> >
> >
> > I guess I should take this to the XFS folks.
> 
> Try mounting the filesystem "noatime" and see if that's part of the
> problem.

Yeah, it's mounted noatime. Looks like I tracked this down to an XFS
regression.

http://marc.info/?l=linux-fsdevel&m=119211228609886&w=2

Cheers,

Andrew


Re: Different sized disks for RAID1+0 or RAID10.

2007-10-11 Thread Goswin von Brederlow
Kelly Byrd <[EMAIL PROTECTED]> writes:

> On Thu, 11 Oct 2007 11:38:04 -0400, Bill Davidsen <[EMAIL PROTECTED]> wrote:
>> Kelly Byrd wrote:
>>> I've currently got a pair of identical drives in a RAID1 set for
>>> my data partition. I'll be getting a pair of bigger drives in a
>>> bit, and I was wondering if I could RAID1 those (of course) and
>>> then RAID0 the two differently sized mds. Even better, will RAID10
>>> let me do this?
>>>
>> 
>> RAID-10 will let you do this, read past threads of this list for
>> discussion of using the "far" option to gain performance.

It will? I don't mean that it will just do a raid10 the size of the
smaller disk.

Does raid10 do a 4-disk raid the size of the smaller disks, followed by
a 2-disk raid for the remaining space?

>>> I don't need to grow the current RAID1 into this new beast, I've
>>> got a place I can copy the existing data so I can start from
>>> scratch.
>>>
>
> Doesn't the 'far' option trade write performance to gain read
> performance? This is a desktop, not at all a "mostly read" type
> workload. 

My tests with large files show no degradation in write and nearly
double speed on read. But that might differ for you.

MfG
Goswin
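
For anyone wanting to repeat such a comparison, a rough sketch of
creating a two-disk raid10 with the "far 2" layout (device and array
names below are placeholders):

mdadm --create /dev/md4 --level=10 --layout=f2 --raid-devices=2 /dev/sdc1 /dev/sdd1
# crude sequential read check against the new array
dd if=/dev/md4 of=/dev/null bs=1M count=2048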


Re: RAID 5 performance issue.

2007-10-11 Thread Bill Davidsen

Andrew Clayton wrote:

On Fri, 5 Oct 2007 16:56:03 -0400, John Stoffel wrote:

  

Can you start a 'vmstat 1' in one window, then start whatever you do
to get crappy performance.  That would be interesting to see.



In trying to find something simple that can show the problem I'm
seeing, I think I may have found the culprit.

Just testing on my machine at home, I made this simple program.

/* fslattest.c */

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>


int main(int argc, char *argv[])
{
char file[255];

if (argc < 2) {
printf("Usage: fslattest file\n");
exit(1);
}

strncpy(file, argv[1], 254);
printf("Opening %s\n", file);

while (1) {
int testfd = open(file, 
O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600);

close(testfd);
unlink(file);
sleep(1);
}

exit(0);
}


If I run this program under strace in my home directory (XFS file system
on a (new) disk (no raid involved) all to its own), like

$ strace -T -e open ./fslattest test

It doesn't look too bad.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.005043>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000212>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.016844>

If I then start up a dd in the same place.

$ dd if=/dev/zero of=bigfile bs=1M count=500

Then I see the problem I'm seeing at work.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.000348>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.594441>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <2.224636>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <1.074615>

Doing the same on my other disk which is Ext3 and contains the root fs,
it doesn't ever stutter

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.015423>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.92>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.93>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.88>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000103>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.96>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.94>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000114>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.91>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000274>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3 <0.000107>


Somewhere in there was the dd, but you can't tell.

I've found if I mount the XFS filesystem with nobarrier, the
latency is reduced to about 0.5 seconds with occasional spikes > 1
second.

When doing this on the raid array.

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.009164>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.71>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.002667>

dd kicks in

open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <11.580238>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <3.94>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.63>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <4.297978>

dd finishes 


open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.000199>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.013413>
open("test", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3 <0.025134>


I guess I should take this to the XFS folks.


Try mounting the filesystem "noatime" and see if that's part of the problem.

--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Different sized disks for RAID1+0 or RAID10.

2007-10-11 Thread Bill Davidsen

Kelly Byrd wrote:


On Thu, 11 Oct 2007 11:38:04 -0400, Bill Davidsen <[EMAIL PROTECTED]> wrote:
  

Kelly Byrd wrote:


I've currently got a pair of identical drives in a RAID1 set for
my data partition. I'll be getting a pair of bigger drives in a
bit, and I was wondering if I could RAID1 those (of course) and
then RAID0 the two differently sized mds. Even better, will RAID10
let me do this?

  

RAID-10 will let you do this, read past threads of this list for
discussion of using the "far" option to gain performance.


I don't need to grow the current RAID1 into this new beast, I've
got a place I can copy the existing data so I can start from
scratch.

  


Doesn't the 'far' option trade write performance to gain read
performance? This is a desktop, not at all a "mostly read" type
workload. 

  
Is your load not read-mostly? The things I want to have happen quickly 
are things like boot, start application, load a document, saved page, or 
man page, compile a kernel (that may not be typical), play an mp3 or 
video, load image(s) in gimp or similar, read mail... all things which 
feel faster if you favor read performance.


I think of it this way: most of the stuff I write is buffered by the 
system and I don't have to wait for it (unless it's huge). Most of the 
large stuff I read, as noted above, is stuff I wait for.


If you look at the times you have to wait for i/o, I bet you will decide 
a desktop is read-mostly after all.
  

I imagine the answer is: "sure, RAID10 / RAID0 lets you do this,
but you don't get the striping performance benefit for some of
the data", which would be ok with me until the smaller drives go
bad and I replace them.

  

Replacing the smaller drives could be an adventure if you plan to go to
larger replacement drives. I don't recall the issues involved with using
larger partitions and RAID-10, there's another issue for you to research.




Will do.



  



--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Different sized disks for RAID1+0 or RAID10.

2007-10-11 Thread Kelly Byrd



On Thu, 11 Oct 2007 11:38:04 -0400, Bill Davidsen <[EMAIL PROTECTED]> wrote:
> Kelly Byrd wrote:
>> I've currently got a pair of identical drives in a RAID1 set for
>> my data partition. I'll be getting a pair of bigger drives in a
>> bit, and I was wondering if I could RAID1 those (of course) and
>> then RAID0 the two differently sized mds. Even better, will RAID10
>> let me do this?
>>
> 
> RAID-10 will let you do this, read past threads of this list for
> discussion of using the "far" option to gain performance.
>> I don't need to grow the current RAID1 into this new beast, I've
>> got a place I can copy the existing data so I can start from
>> scratch.
>>

Doesn't the 'far' option trade write performance to gain read
performance? This is a desktop, not at all a "mostly read" type
workload. 


>> I imagine the answer is: "sure, RAID10 / RAID0 lets you do this,
>> but you don't get the striping performance benefit for some of
>> the data", which would be ok with me until the smaller drives go
>> bad and I replace them.
>>
> 
> Replacing the smaller drives could be an adventure if you plan to go to
> larger replacement drives. I don't recall the issues involved with using
> larger partitions and RAID-10, there's another issue for you to research.
> 

Will do.




Re: Different sized disks for RAID1+0 or RAID10.

2007-10-11 Thread Bill Davidsen

Kelly Byrd wrote:

I've currently got a pair of identical drives in a RAID1 set for
my data partition. I'll be getting a pair of bigger drives in a
bit, and I was wondering if I could RAID1 those (of course) and
then RAID0 the two differently sized mds. Even better, will RAID10
let me do this? 
  


RAID-10 will let you do this, read past threads of this list for 
discussion of using the "far" option to gain performance.

I don't need to grow the current RAID1 into this new beast, I've
got a place I can copy the existing data so I can start from
scratch.

I imagine the answer is: "sure, RAID10 / RAID0 lets you do this,
but you don't get the striping performance benefit for some of
the data", which would be ok with me until the smaller drives go
bad and I replace them.
  


Replacing the smaller drives could be an adventure if you plan to go to 
larger replacement drives. I don't recall the issues involved with using 
larger partitions and RAID-10, there's another issue for you to research.


--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: Help RAID5 reshape Oops / backup-file

2007-10-11 Thread Nagilum
Ok, after looking in "Grow.c" I can see that the backup file is  
removed once the critical section has passed:


if (backup_file)
unlink(backup_file);

printf(Name ": ... critical section passed.\n");

Since I had passed that point I'll try to find out where  
Grow_restart() stumbles. By looking at it I'm not even sure it's able  
to "resume" and not just restart. :-/



- Message from [EMAIL PROTECTED] -
Date: Tue, 09 Oct 2007 20:58:47 +0200
From: Nagilum <[EMAIL PROTECTED]>
Reply-To: Nagilum <[EMAIL PROTECTED]>
 Subject: Help RAID5 reshape Oops / backup-file
  To: linux-raid@vger.kernel.org



Hi,
During the process of reshaping a Raid5 from 3 (/dev/sd[a-c]) to 5
devices (/dev/sd[a-e]) the system was accidentally shut down.
I know I was stupid, I should have used a --backup-file, but stupid me didn't.
Thanks for not rubbing it in any further. :(
Ok, here is what I have:

nas:~# uname -a
Linux nas 2.6.18-5-amd64 #1 SMP Thu Aug 30 01:14:54 UTC 2007 x86_64 GNU/Linux
nas:~# mdadm --version
mdadm - v2.5.6 - 9 November 2006
nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
 Version : 00.91.03
   Creation Time : Sat Sep 15 21:11:41 2007
  Raid Level : raid5
 Device Size : 488308672 (465.69 GiB 500.03 GB)
Raid Devices : 5
   Total Devices : 5
Preferred Minor : 0
 Persistence : Superblock is persistent

 Update Time : Mon Oct  8 23:59:27 2007
   State : active, degraded, Not Started
  Active Devices : 3
Working Devices : 5
  Failed Devices : 0
   Spare Devices : 2

  Layout : left-symmetric
  Chunk Size : 16K

   Delta Devices : 2, (3->5)

UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
  Events : 0.470134

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       0        0        3      removed
       4       0        0        4      removed

       5       8       48        -      spare   /dev/sdd
       6       8       64        -      spare   /dev/sde


nas:~# mdadm -E /dev/sd[a-e]
/dev/sda:
   Magic : a92b4efc
 Version : 00.91.00
UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
   Creation Time : Sat Sep 15 21:11:41 2007
  Raid Level : raid5
 Device Size : 488308672 (465.69 GiB 500.03 GB)
  Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
Raid Devices : 5
   Total Devices : 5
Preferred Minor : 0

   Reshape pos'n : 872095808 (831.70 GiB 893.03 GB)
   Delta Devices : 2 (3->5)

 Update Time : Mon Oct  8 23:59:27 2007
   State : clean
  Active Devices : 5
Working Devices : 5
  Failed Devices : 0
   Spare Devices : 0
Checksum : f425054d - correct
  Events : 0.470134

  Layout : left-symmetric
  Chunk Size : 16K

      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       64        3      active sync   /dev/sde
   4     4       8       48        4      active sync   /dev/sdd
/dev/sdb:
   Magic : a92b4efc
 Version : 00.91.00
UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
   Creation Time : Sat Sep 15 21:11:41 2007
  Raid Level : raid5
 Device Size : 488308672 (465.69 GiB 500.03 GB)
  Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
Raid Devices : 5
   Total Devices : 5
Preferred Minor : 0

   Reshape pos'n : 872095808 (831.70 GiB 893.03 GB)
   Delta Devices : 2 (3->5)

 Update Time : Mon Oct  8 23:59:27 2007
   State : clean
  Active Devices : 5
Working Devices : 5
  Failed Devices : 0
   Spare Devices : 0
Checksum : f425055f - correct
  Events : 0.470134

  Layout : left-symmetric
  Chunk Size : 16K

      Number   Major   Minor   RaidDevice State
this     1       8       16        1      active sync   /dev/sdb

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       64        3      active sync   /dev/sde
   4     4       8       48        4      active sync   /dev/sdd
/dev/sdc:
   Magic : a92b4efc
 Version : 00.91.00
UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
   Creation Time : Sat Sep 15 21:11:41 2007
  Raid Level : raid5
 Device Size : 488308672 (465.69 GiB 500.03 GB)
  Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
Raid Devices : 5
   Total Devices : 5
Preferred Minor : 0

   Reshape pos'n : 872095808 (831.70 GiB 893.03 GB)
   Delta Devices

AW: How do i limit the bandwidth-usage while resyncing on RAID 1?

2007-10-11 Thread Rustedt, Florian
Thank you ;)

Hoping that did it!

Kind regards, Florian

> -Original Message-
> From: Tomasz Chmielewski [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 10 October 2007 17:16
> To: Rustedt, Florian
> Cc: linux-raid@vger.kernel.org
> Subject: Re: How do i limit the bandwidth-usage while 
> resyncing on RAID 1?
> 
> Rustedt, Florian wrote:
> > Hello List,
> > 
> > while resyncing, the process takes the whole bandwidth from 
> disk to disk.
> > 
> > This leads to a VERY unhappy situation, because the system on this 
> > raid is impractically slow now, since it has to wait for disk I/O.
> > 
> > How can I tune this? I want something like "nice -n 19 dm-mirror" ;)
> 
> Look into sync_speed_min and sync_speed_max in /sys/block/mdX/md.
> 
> 
> --
> Tomasz Chmielewski
> http://blog.wpkg.org



AW: How do i limit the bandwidth-usage while resyncing on RAID 1?

2007-10-11 Thread Rustedt, Florian
 Hi Richard,

 Seems to me that you misunderstood? There's no rsync in RAID afaik?
This is an internal driver...?

Kind regards, Florian

> -Original Message-
> From: Richard Scobie [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, 10 October 2007 21:46
> To: Linux RAID Mailing List
> Subject: Re: How do i limit the bandwidth-usage while 
> resyncing on RAID 1?
> 
> Rustedt, Florian wrote:
> 
> > How can I tune this? I want something like "nice -n 19 dm-mirror" ;)
> 
> Have a look at man rsync - the --bwlimit=KBPS option.
> 
> Regards,
> 
> Richard
> 
