How mdadm can support > 2T

2006-11-14 Thread 俞先印
I want to create a RAID-0 array with mdadm 2.5.6 on kernel 2.6.18-iop3, running on an
Intel IOP80331 (32-bit), using 5 disks of 500 GB each, but the array cannot go beyond
2 TB.  How can I get support for arrays larger than 2 TB on a 32-bit CPU?

Command and log:
# mdadm -C /dev/md0 -l0 -n5 /dev/sd[c,d,e,f,g]
# mdadm --detail /dev/md0

[EMAIL PROTECTED]:/# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Jan  1 00:29:29 1970
     Raid Level : raid0
     Array Size : 294448832 (280.81 GiB 301.52 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Jan  1 00:29:29 1970
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           UUID : ebdd57fe:8eb46fdf:884d06b0:5db18b9d
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       8       96        4      active sync   /dev/sdg
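
The usual suspects for a 2 TB ceiling on 32-bit (not confirmed in this thread) are a kernel
built without large-block-device support and the default 0.90 superblock. A minimal sketch
of what could be checked, assuming the same five disks:

# Check whether the kernel was built with CONFIG_LBD ("support for large
# block devices"), which 32-bit kernels need for block devices over 2 TB.
grep CONFIG_LBD /boot/config-$(uname -r)
zgrep CONFIG_LBD /proc/config.gz    # alternative, if /proc/config.gz is available

# Try creating the array with a version-1 superblock instead of the
# default 0.90, then re-check the reported size.
mdadm -C /dev/md0 -e 1.0 -l0 -n5 /dev/sd[cdefg]
mdadm --detail /dev/md0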





Re: Recovering from default FC6 install

2006-11-14 Thread Bill Davidsen

Doug Ledford wrote:
> On Sun, 2006-11-12 at 01:00 -0500, Bill Davidsen wrote:
>> I tried something new on a test system, using the install partitioning
>> tools to partition the disk. I had three drives and went with RAID-1 for
>> boot, and RAID-5+LVM for the rest. After the install was complete I
>> noted that it was solid busy on the drives, and found that the base RAID
>> appears to have been created (a) with no superblock and (b) with no
>> bitmap. That last is an issue; as a test system it WILL be getting hung
>> and rebooted, and recovering the 1.5TB took hours.
>>
>> Is there an easy way to recover this? The LVM dropped on it has a lot of
>> partitions, and there is a lot of data in them after several hours of
>> feeding with GigE, so I can't readily back up and recreate by hand.
>>
>> Suggestions?
>
> First, the Fedora installer *always* creates persistent arrays, so I'm
> not sure what is making you say it didn't, but they should be
> persistent.

I got the detail on the md device, then -E on the components, and got a
"no super block found" message, which made me think it wasn't there.
Given that, I didn't have much hope for the part which starts "assuming
that they are persistent", but I do thank you for the information; I'm
sure it will be useful.


I did try recreating, from the running FC6 rather than the rescue, since
the large data was on its own RAID and I could umount the f/s and stop
the array. Alas, I think a "grow" is needed somewhere; after
configuration, start, and mount of the f/s on the RAID-5, e2fsck told me
my data was toast. Shortest time to solution was to recreate the f/s and
reload the data.


The RAID-1 stuff is small; a total rebuild is acceptable in the case of
a failure.


FC install suggestion: more optional control over the RAID features
during creation. Maybe there's an "advanced features" button in the
install and I just missed it, but there should be one, since the
non-average user might be able to do useful things with the chunk size
and specify a bitmap. I would think that a bitmap should be the default
on large arrays, assuming that >1TB is still large for the moment.
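
On the bitmap point, a write-intent bitmap can usually be added after the fact; a minimal
sketch, assuming a reasonably recent mdadm and kernel and using /dev/md2 only as a
stand-in for the big RAID-5:

# Add an internal write-intent bitmap to an existing, clean array.
mdadm --grow /dev/md2 --bitmap=internal

# Confirm it took effect (a bitmap line should also show up in /proc/mdstat).
mdadm --detail /dev/md2
cat /proc/mdstat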


Instructions and attachments saved for future use, trimmed here.

--
bill davidsen <[EMAIL PROTECTED]>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5 hang on get_active_stripe

2006-11-14 Thread Chris Allen
You probably guessed that no matter what I did, I never, ever saw the
problem when your trace was installed. I'd guess at some obscure
timing-related problem. I can still trigger it consistently with a
vanilla 2.6.17_SMP though, but again only when bitmaps are turned on.




Neil Brown wrote:

On Tuesday October 10, [EMAIL PROTECTED] wrote:
> Very happy to. Let me know what you'd like me to do.

Cool, thanks.

At the end is a patch against 2.6.17.11, though it should apply against
any later 2.6.17 kernel.
Apply this and reboot.

Then run

   while true
   do cat /sys/block/mdX/md/stripe_cache_active
      sleep 10
   done > /dev/null

(maybe write a little script or whatever).  Leave this running. It
affects the check for "has raid5 hung".  Make sure to change "mdX" to
whatever is appropriate.

Occasionally look in the kernel logs for
   plug problem:

if you find that, send me the surrounding text - there should be about
a dozen lines following this one.
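
One way to capture that context, assuming the messages end up in the usual kernel log
(the log file path below is only an example):

dmesg | grep -A 12 "plug problem:"
grep -A 12 "plug problem:" /var/log/kern.log   # or wherever klogd writes on your system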

Hopefully this will let me know which is the last thing to happen: a plug
or an unplug.
If the last is a "plug", then the timer really should still be
pending, but isn't (this is impossible).  So I'll look more closely at
that option.
If the last is an "unplug", then the 'Plugged' flag should really be
clear but it isn't (this is impossible).  So I'll look more closely at
that option.

Dean is running this, but he only gets the hang every couple of
weeks.  If you get it more often, that would help me a lot.

Thanks,
NeilBrown


diff ./.patches/orig/block/ll_rw_blk.c ./block/ll_rw_blk.c
--- ./.patches/orig/block/ll_rw_blk.c   2006-08-21 09:52:46.0 +1000
+++ ./block/ll_rw_blk.c 2006-10-05 11:33:32.0 +1000
@@ -1546,6 +1546,7 @@ static int ll_merge_requests_fn(request_
  * This is called with interrupts off and no requests on the queue and
  * with the queue lock held.
  */
+static atomic_t seq = ATOMIC_INIT(0);
 void blk_plug_device(request_queue_t *q)
 {
 	WARN_ON(!irqs_disabled());
@@ -1558,9 +1559,16 @@ void blk_plug_device(request_queue_t *q)
 		return;
 
 	if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) {
+		q->last_plug = jiffies;
+		q->plug_seq = atomic_read(&seq);
+		atomic_inc(&seq);
 		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
 		blk_add_trace_generic(q, NULL, 0, BLK_TA_PLUG);
-	}
+	} else
+		q->last_plug_skip = jiffies;
+	if (!timer_pending(&q->unplug_timer) &&
+	    !q->unplug_work.pending)
+		printk("Neither Timer or work are pending\n");
 }
 
 EXPORT_SYMBOL(blk_plug_device);
@@ -1573,10 +1581,17 @@ int blk_remove_plug(request_queue_t *q)
 {
 	WARN_ON(!irqs_disabled());
 
-	if (!test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
+	if (!test_and_clear_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags)) {
+		q->last_unplug_skip = jiffies;
 		return 0;
+	}
 
 	del_timer(&q->unplug_timer);
+	q->last_unplug = jiffies;
+	q->unplug_seq = atomic_read(&seq);
+	atomic_inc(&seq);
+	if (test_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
+		printk("queue still (or again) plugged\n");
 	return 1;
 }
 
@@ -1635,7 +1650,7 @@ static void blk_backing_dev_unplug(struc
 static void blk_unplug_work(void *data)
 {
 	request_queue_t *q = data;
-
+	q->last_unplug_work = jiffies;
 	blk_add_trace_pdu_int(q, BLK_TA_UNPLUG_IO, NULL,
 				q->rq.count[READ] + q->rq.count[WRITE]);
 
@@ -1649,6 +1664,7 @@ static void blk_unplug_timeout(unsigned 
 	blk_add_trace_pdu_int(q, BLK_TA_UNPLUG_TIMER, NULL,
 				q->rq.count[READ] + q->rq.count[WRITE]);
 
+	q->last_unplug_timeout = jiffies;
 	kblockd_schedule_work(&q->unplug_work);
 }
 

diff ./.patches/orig/drivers/md/raid1.c ./drivers/md/raid1.c
--- ./.patches/orig/drivers/md/raid1.c  2006-08-10 17:28:01.0 +1000
+++ ./drivers/md/raid1.c  2006-09-04 21:58:31.0 +1000
@@ -1486,7 +1486,6 @@ static void raid1d(mddev_t *mddev)
 					d = conf->raid_disks;
 				d--;
 				rdev = conf->mirrors[d].rdev;
-				atomic_add(s, &rdev->corrected_errors);
 				if (rdev &&
 				    test_bit(In_sync, &rdev->flags)) {
 					if (sync_page_io(rdev->bdev,
@@ -1509,6 +1508,9 @@ static void raid1d(mddev_t *mddev)
 							 s<<9, conf->tmppage, READ) == 0)
 						/* Well, this device is dead */

Re: Raid 1 up after raidhotadd without rebuild

2006-11-14 Thread Neil Brown
On Tuesday November 14, [EMAIL PROTECTED] wrote:
> 
> Any ideas why this is and how to fix it?

You don't mention a kernel version.  If it is 2.6.18, upgrade to
2.6.18.2.

NeilBrown


Re[4]: RAID1 submirror failure causes reboot?

2006-11-14 Thread Jim Klimov
Hello Jens,

JA> Then lets wait for Jim to repeat his testing with all the debugging
JA> options enabled, that should make us a little wiser.
Ok, I'll boot the kernel rebuilt with these options and report any
findings later. In the meantime, I'll move on to the other questions raised.

I remember that when I ran with most of the debugging options ON last week,
it made the server run darn slow, with a load average around 40-60 and
little responsiveness, and numerous messages like:
  Hangcheck: hangcheck value past margin!
  BUG: soft lockup detected on CPU#1!
which hid the interesting traces :)
However, I can post these captured traces of several system lifetimes
to the list or privately.

Concerning other questions:
1) The workload on the software raid is rather small. It's a set of
   system partitions which keep the fileserver's logs, etc. The file
   storage is on 3Ware cards and has a substantial load. The MD arrays are
   checked nightly, though (echo check > sync_action), and most often
   this is what triggers the problem. These drives only contain mirrored
   partitions, so there should be no I/O to them outside of MD,
   except for the rare cases of lilo running :)

2) I installed a third submirror disk this weekend; it's an IDE
   slave device, hdd (next to the failing hdc). Since then I have gotten
   errors on other partitions, attached below as "2*)".
   
3) The failures which lead to reboots are usually preceded by a long
   history of dma_intr errors 0x40 and 0x51; the sample I sent was
   already rather complete. A few errors preceded it every 5 seconds,
   making the full trace look like this:

[87319.049902] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[87319.057393] hdc: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315631
[87319.067205] ide: failed opcode was: unknown
[87323.956399] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[87323.963681] hdc: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315631
[87323.973171] ide: failed opcode was: unknown
[87328.846265] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[87328.853485] hdc: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315631
[87328.862834] ide: failed opcode was: unknown
[87333.736127] hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[87333.743535] hdc: dma_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315631
[87333.752876] ide: failed opcode was: unknown
[87333.806569] ide1: reset: success
[87338.675891] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87338.685143] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87338.694791] ide: failed opcode was: unknown
[87343.557424] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87343.566388] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87343.576105] ide: failed opcode was: unknown
[87348.472226] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87348.481170] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87348.490843] ide: failed opcode was: unknown
[87353.387028] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87353.395735] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315711
[87353.405500] ide: failed opcode was: unknown
[87353.461342] ide1: reset: success
[87358.326783] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87358.335739] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
[87358.345395] ide: failed opcode was: unknown
[87363.208313] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87363.217319] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
[87363.228371] ide: failed opcode was: unknown
[87368.106472] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87368.115414] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
[87368.125275] ide: failed opcode was: unknown
[87372.979686] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87372.988706] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
[87372.998849] ide: failed opcode was: unknown
[87373.052152] ide1: reset: success
[87377.927744] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87377.936682] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
[87377.946399] ide: failed opcode was: unknown
[87382.800953] hdc: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
[87382.809881] hdc: task_in_intr: error=0x01 { AddrMarkNotFound }, LBAsect=176315718, sector=176315718
[87382.819511] ide: failed opcode was: u

Re: Raid 1 up after raidhotadd without rebuild

2006-11-14 Thread Elmar Weber

Neil Brown wrote:
> On Tuesday November 14, [EMAIL PROTECTED] wrote:
>> Any ideas why this is and how to fix it?
>
> You don't mention a kernel version.  If it is 2.6.18, upgrade to
> 2.6.18.2.


It's 2.6.18; I'll upgrade and try. I checked the log and my guess is that
it's "[PATCH] md: Fix bug where spares don't always get rebuilt properly
when they become live." Exactly my problem.


Thanks for the fast help.

ciao,
elm


Raid 1 up after raidhotadd without rebuild

2006-11-14 Thread Elmar Weber

Hello,

I'm trying to rebuild a raid1 array by adding the replacement disk (sda) 
to the old one (sdf). I zero'd the new disk (sda) and created the 
partition table corresponding to the existing one.


I adapted my raidtab to the new device node (which changed due to another
disk replacement) and am now trying to add the new disk, but the RAID-1 is
not rebuilding: the newly added disk is immediately marked as U(p) in the
mdstat output.


Here's what I do step by step (mdstat output reduced to one raid array, 
but the problem exists with every partition used as a raid 1):


# cat /proc/mdstat

md1 : active raid1 sdf1[1]
  112728000 blocks [2/1] [_U]


# raidhotadd /dev/md1 /dev/sda1 && cat /proc/mdstat

md1 : active raid1 sda1[1] sdf1[0]
  112728000 blocks [2/2] [UU]


That's it: no resync, nothing. When I do an fsck on md1 there are
errors en masse. raidtools doesn't even seem to touch sda1, since when I
create a filesystem with content on it, I can still mount it after adding
it to the raid.
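
For comparison, a minimal sketch of the same hot-add done with mdadm instead of the old
raidtools, assuming the device names above:

# Hot-add the replacement partition; a resync onto sda1 should start
# immediately and show up in /proc/mdstat.
mdadm /dev/md1 --add /dev/sda1
cat /proc/mdstat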



Any ideas why this is and how to fix it?


Thanks in advance & ciao,
elm


Here is the entry in the syslog:
==
Nov 15 09:34:46 server md: bind
Nov 15 09:34:46 server RAID1 conf printout:
Nov 15 09:34:46 server --- wd:1 rd:2
Nov 15 09:34:46 server disk 0, wo:1, o:1, dev:sda1
Nov 15 09:34:46 server disk 1, wo:0, o:1, dev:sdf1
Nov 15 09:34:46 server md: syncing RAID array md2
Nov 15 09:34:46 server md: minimum _guaranteed_ reconstruction speed: 
1000 KB/sec/disc.
Nov 15 09:34:46 server md: using maximum available idle IO bandwidth 
(but not more than 20 KB/sec) for reconstruction.
Nov 15 09:34:46 server md: using 128k window, over a total of 112728000 
blocks.

Nov 15 09:34:46 server md: md1: sync done.
Nov 15 09:34:46 server RAID1 conf printout:
Nov 15 09:34:46 server --- wd:2 rd:2
Nov 15 09:34:46 server disk 0, wo:0, o:1, dev:sda1
Nov 15 09:34:46 server disk 1, wo:0, o:1, dev:sdf1
==

Here is the raidtab:
==
raiddev /dev/md1
raid-level  1
nr-raid-disks   2
nr-spare-disks  0
persistent-superblock   1
chunk-size 8k

device  /dev/sda1
raid-disk   0
device  /dev/sdf1
raid-disk   1
==
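
For reference, a rough mdadm.conf sketch of the same array (normally the UUID reported by
"mdadm --detail /dev/md1" would be used instead of listing devices):

# /etc/mdadm/mdadm.conf (sketch only)
DEVICE /dev/sda1 /dev/sdf1
ARRAY /dev/md1 level=raid1 num-devices=2 devices=/dev/sda1,/dev/sdf1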



--
"Religion and family are the two greatest enemies of progress."
(André Gide (1869 - 1951), French writer)


safest way to swap in a new physical disk

2006-11-14 Thread Will Sheffler

Hi.

What is the safest way to switch out a disk in a software raid array
created with mdadm? I'm not talking about replacing a failed disk; I
want to take a healthy disk in the array and swap it for another
physical disk. Specifically, I have an array made up of ten 250 GB
software-raid partitions on eight 300 GB disks and two 250 GB disks,
plus a hot spare. I want to switch the 250 GB disks to new 300 GB disks
so everything matches. Is there a way to do this without risking a
rebuild? I can't back everything up, so I want to be as risk-free as
possible.


I guess what I want is to do something like this:

(1) Unmount the array
(2) Un-create the array
(3) Somehow exactly duplicate partition X to a partition Y on a new disk
(4) Re-create the array with X gone and Y in its place
(5) Check if the array is OK without changing/activating it
(6) If there is a problem, switch from Y back to X and have it as  
though nothing changed


The part I'm worried about is (3), as I've tried duplicating  
partition images before and it never works right. Is there a way to  
do this with mdadm?
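
One way step (3) is often attempted, sketched here only as an illustration, with the
array stopped and /dev/old1 and /dev/new1 standing in for the source and target
partitions:

# Raw copy of the old partition onto the new (equal or larger) one.
dd if=/dev/old1 of=/dev/new1 bs=1M

# If both partitions are exactly the same size, the checksums should match.
md5sum /dev/old1 /dev/new1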


For what it's worth, mdadm / linux software raid handles this setup  
beautifully... easy to set up, easy to maintain, easy to fix... I've  
never had any trouble. And I didn't go broke buying raid controllers.  
GREAT software!


Thanks a bunch!
-Will Sheffler