Re: IBM xSeries stop responding during RAID1 reconstruction
Mr. James W. Laferriere wrote:
> Hello Gabor ,
> On Tue, 20 Jun 2006, Gabor Gombas wrote:
>> On Tue, Jun 20, 2006 at 03:08:59PM +0200, Niccolo Rigacci wrote:
>>> Do you know if it is possible to switch the scheduler at runtime?
>> echo cfq > /sys/block/<device>/queue/scheduler
> At least one can do an ls of the /sys/block area and then do an automated echo cfq down the tree. Does anyone know of a method to set a default scheduler? Scanning down a list or manually maintaining a list seems to be a bug in the waiting. Tia, JimL

Thought I posted this... it can be set in the kernel build or in the boot parameters from grub/lilo. 2nd thought: set it to cfq by default, then at the END of rc.local, if there are no arrays rebuilding, change to something else if you like.

--
bill davidsen <[EMAIL PROTECTED]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
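[Editor's sketch] Bill's rc.local idea can be written down roughly as follows. This is an untested illustration; the function names are mine, not from the thread, and the /proc and /sys paths are parameterized so the logic can be exercised off-box:

```shell
#!/bin/sh
# Boot with elevator=cfq, then at the end of rc.local switch every block
# device to another scheduler only if no md array is currently rebuilding.

# True when /proc/mdstat (or a test copy of it) shows no resync/recovery.
no_resync_running() {
    ! grep -qE 'resync|recovery' "${1:-/proc/mdstat}"
}

# "Automated echo cfq down the tree": write the given scheduler name into
# every writable queue/scheduler file under the base directory.
set_all_schedulers() {
    sched="$1"; base="${2:-/sys/block}"
    for f in "$base"/*/queue/scheduler; do
        [ -w "$f" ] && echo "$sched" > "$f"
    done
}

# At the end of rc.local one might then do (scheduler choice is yours):
#   no_resync_running && set_all_schedulers anticipatory
```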
Re: IBM xSeries stop responding during RAID1 reconstruction
On Tue, Jun 20, 2006 at 08:00:13AM -0700, Mr. James W. Laferriere wrote:
> At least one can do a ls of the /sys/block area & then do an automated echo cfq down the tree . Does anyone know of a method to set a default scheduler ?

RTFM: Documentation/kernel-parameters.txt in the kernel source.

Gabor

--
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
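[Editor's note] The parameter Gabor is pointing at is `elevator=`, which kernel-parameters.txt documents for choosing the default I/O scheduler at boot. A hedged grub-legacy example; the kernel version and root device below are illustrative, not taken from the thread:

```shell
# /boot/grub/menu.lst fragment: append elevator=cfq to the kernel line
# (with lilo, add it to the append= option instead).
title  Debian GNU/Linux
root   (hd0,0)
kernel /boot/vmlinuz-2.6.16 root=/dev/md3 ro elevator=cfq
```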
Re: IBM xSeries stop responding during RAID1 reconstruction
> At least one can do a ls of the /sys/block area & then do an automated echo cfq down the tree . Does anyone know of a method to set a default scheduler ?

Maybe I didn't understand the question... You decide what schedulers are available at kernel compile time; also at kernel compile time you decide which is the default I/O scheduler.

--
Niccolo Rigacci
Firenze - Italy

Iraq, peace mission: 38475 dead - www.iraqbodycount.net
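[Editor's sketch] The compile-time choice looks roughly like this in a 2.6-era .config; the exact symbol names vary between kernel versions (the DEFAULT_* choice symbols are newer than some kernels discussed here), so check your own tree:

```shell
# .config fragment: build cfq in and make it the default I/O scheduler.
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_AS is not set
CONFIG_DEFAULT_IOSCHED="cfq"
```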
Re: IBM xSeries stop responding during RAID1 reconstruction
Hello Gabor ,
On Tue, 20 Jun 2006, Gabor Gombas wrote:
>> On Tue, Jun 20, 2006 at 03:08:59PM +0200, Niccolo Rigacci wrote:
>>> Do you know if it is possible to switch the scheduler at runtime?
> echo cfq > /sys/block/<device>/queue/scheduler

At least one can do an ls of the /sys/block area and then do an automated echo cfq down the tree. Does anyone know of a method to set a default scheduler? Scanning down a list or manually maintaining a list seems to be a bug in the waiting. Tia, JimL

--
+---------------------+--------------------------+----------------+
| James W. Laferriere | System Techniques        | Give me VMS    |
| Network Engineer    | 3600 14th Ave SE #20-103 | Give me Linux  |
| [EMAIL PROTECTED]   | Olympia , WA. 98501      | only on AXP    |
+---------------------+--------------------------+----------------+
Re: IBM xSeries stop responding during RAID1 reconstruction
On Tue, Jun 20, 2006 at 03:08:59PM +0200, Niccolo Rigacci wrote:
> Do you know if it is possible to switch the scheduler at runtime?

echo cfq > /sys/block/<device>/queue/scheduler

Gabor

--
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences, Laboratory of Parallel and Distributed Systems
Address: H-1132 Budapest Victor Hugo u. 18-22. Hungary
Phone/Fax: +36 1 329-78-64 (secretary)
W3: http://www.lpds.sztaki.hu
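[Editor's sketch] On 2.6 kernels the scheduler file lists every compiled-in scheduler with the active one in brackets, e.g. `noop [anticipatory] deadline cfq`. A small helper for reading it back; the function name is mine, the bracketed format is standard sysfs behaviour:

```shell
#!/bin/sh
# Extract the active scheduler (the bracketed entry) from a
# /sys/block/<dev>/queue/scheduler line supplied on stdin.
active_scheduler() {
    sed 's/.*\[\(.*\)\].*/\1/'
}

# On a live system:
#   active_scheduler < /sys/block/sda/queue/scheduler
#   echo cfq > /sys/block/sda/queue/scheduler    # switch at runtime
```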
Re: IBM xSeries stop responding during RAID1 reconstruction
On Mon, Jun 19, 2006 at 05:05:56PM +0200, Gabor Gombas wrote:
> IMHO a much better fix is to use the cfq I/O scheduler during the rebuild.

Yes, changing the default I/O scheduler to cfq (DEFAULT_CFQ) solves the problem very well, I get over 40 MB/s resync speed with no lock-up at all! Thank you very much, I think we can elaborate a new FAQ entry.

Do you know if it is possible to switch the scheduler at runtime?

--
Niccolo Rigacci
Firenze - Italy

Iraq, peace mission: 38475 dead - www.iraqbodycount.net
Re: IBM xSeries stop responding during RAID1 reconstruction
On Wed, Jun 14, 2006 at 10:46:09AM -0500, Bill Cizek wrote:
> I was able to work around this by lowering /proc/sys/dev/raid/speed_limit_max to a value below my disk throughput value (~ 50 MB/s) as follows:

IMHO a much better fix is to use the cfq I/O scheduler during the rebuild. The default anticipatory scheduler gives horrible latencies and can cause the machine to appear as 'locked up' if there is heavy I/O load like a RAID reconstruct or heavy database usage. The price of cfq is lower throughput (higher RAID rebuild time) than with the anticipatory I/O scheduler.

Gabor

--
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
Re: IBM xSeries stop responding during RAID1 reconstruction
On Thursday 15 June 2006 12:13, you wrote:
> If this is causing a lockup, then there is something else wrong, just as any single process should not - by writing constantly to disks - be able to clog up the whole system.
>
> Maybe if you could get the result of alt-sysrq-P

I tried some kernel changes enabling the HyperThreading on the (single) P4 processor and enabling CONFIG_PREEMPT_VOLUNTARY=y, but with no success.

During the lockup Alt-SysRq-P constantly says that:

  EIP is at mwait_idle+0x1a/0x2e

While Alt-SysRq-T shows - among other processes - the MD syncing and the locked-up bash; these are the hand-copied call traces:

md3_resync:
  device_barrier default_wake_function sync_request __generic_unplug_device
  md_do_sync schedule md_thread md_thread kthread kthread
  kernel_thread_helper

bash:
  io_schedule sync_buffer sync_buffer __wait_on_bit_lock sync_buffer
  out_of_line_wait_on_bit_lock wake_bit_function __lock_buffer
  do_get_write_access __ext3_get_inode_loc journal_get_write_access
  ext3_reserve_inode_write ext3_mark_inode_dirty ext3_dirty_inode
  __mark_inode_dirty update_atime vfs_readdir sys_getdents64 filldir64
  syscall_call

This is also the top output, which runs regularly during the lockup:

top - 11:40:41 up 7 min, 2 users, load average: 8.70, 4.92, 2.04
Tasks: 70 total, 1 running, 69 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2% us, 0.7% sy, 0.0% ni, 98.7% id, 0.0% wa, 0.0% hi, 0.5% si
Mem:   906212k total,   58620k used,  847592k free,    3420k buffers
Swap: 1951736k total,       0k used, 1951736k free,   23848k cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
  829 root    10  -5     0    0    0 S    1  0.0  0:01.70 md3_raid1
 2823 root    10  -5     0    0    0 D    1  0.0  0:01.62 md3_resync
    1 root    16   0  1956  656  560 S    0  0.1  0:00.52 init
    2 root    RT   0     0    0    0 S    0  0.0  0:00.00 migration/0
    3 root    34  19     0    0    0 S    0  0.0  0:00.00 ksoftirqd/0
    4 root    RT   0     0    0    0 S    0  0.0  0:00.00 watchdog/0
    5 root    RT   0     0    0    0 S    0  0.0  0:00.00 migration/1
    6 root    34  19     0    0    0 S    0  0.0  0:00.00 ksoftirqd/1
    7 root    RT   0     0    0    0 S    0  0.0  0:00.00 watchdog/1
    8 root    10  -5     0    0    0 S    0  0.0  0:00.01 events/0
    9 root    10  -5     0    0    0 S    0  0.0  0:00.01 events/1
   10 root    10  -5     0    0    0 S    0  0.0  0:00.00 khelper
   11 root    10  -5     0    0    0 S    0  0.0  0:00.00 kthread
   14 root    10  -5     0    0    0 S    0  0.0  0:00.00 kblockd/0
   15 root    10  -5     0    0    0 S    0  0.0  0:00.00 kblockd/1
   16 root    11  -5     0    0    0 S    0  0.0  0:00.00 kacpid
  152 root    20   0     0    0    0 S    0  0.0  0:00.00 pdflush
  153 root    15   0     0    0    0 D    0  0.0  0:00.00 pdflush
  154 root    17   0     0    0    0 S    0  0.0  0:00.00 kswapd0
  155 root    11  -5     0    0    0 S    0  0.0  0:00.00 aio/0
  156 root    11  -5     0    0    0 S    0  0.0  0:00.00 aio/1
  755 root    10  -5     0    0    0 S    0  0.0  0:00.00 kseriod
  796 root    10  -5     0    0    0 S    0  0.0  0:00.00 ata/0
  797 root    11  -5     0    0    0 S    0  0.0  0:00.00 ata/1
  799 root    11  -5     0    0    0 S    0  0.0  0:00.00 scsi_eh_0
  800 root    11  -5     0    0    0 S    0  0.0  0:00.00 scsi_eh_1
  825 root    15   0     0    0    0 S    0  0.0  0:00.00 kirqd
  831 root    10  -5     0    0    0 D    0  0.0  0:00.00 md2_raid1
  833 root    10  -5     0    0    0 S    0  0.0  0:00.00 md1_raid1
  834 root    10  -5     0    0    0 D    0  0.0  0:00.00 md0_raid1
  835 root    15   0     0    0    0 D    0  0.0  0:00.00 kjournald
  932 root    18  -4  2192  584  368 S    0  0.1  0:00.19 udevd
 1698 root    10  -5     0    0    0 S    0  0.0  0:00.00 khubd
 2031 root    22   0     0    0    0 S    0  0.0  0:00.00 kjournald
 2032 root    15   0     0    0    0 D    0  0.0  0:00.00 kjournald
 2142 daemon  16   0  1708  364  272 S    0  0.0  0:00.00 portmap
 2464 root    16   0  2588  932  796 S    0  0.1  0:00.01 syslogd

--
Niccolo Rigacci
Firenze - Italy

War against Iraq? Not in my name!
Re: IBM xSeries stop responding during RAID1 reconstruction
On Thursday June 15, [EMAIL PROTECTED] wrote:
> On Wed, Jun 14, 2006 at 10:46:09AM -0500, Bill Cizek wrote:
>> Niccolo Rigacci wrote:
>>> When the sync is complete, the machine starts to respond again perfectly.
>> I was able to work around this by lowering /proc/sys/dev/raid/speed_limit_max to a value below my disk throughput value (~ 50 MB/s) as follows:
>> $ echo 45000 > /proc/sys/dev/raid/speed_limit_max
> Thanks!
> This hack seems to solve my problem too. So it seems that the RAID subsystem does not detect a proper speed to throttle the sync.

The RAID subsystem doesn't try to detect a 'proper' speed. When there is nothing else happening, it just drives the disks as fast as they will go.

If this is causing a lockup, then there is something else wrong, just as any single process should not - by writing constantly to disks - be able to clog up the whole system.

Maybe if you could get the result of alt-sysrq-P or even alt-sysrq-T while the system seems to hang.

NeilBrown
Re: IBM xSeries stop responding during RAID1 reconstruction
On Wed, Jun 14, 2006 at 10:46:09AM -0500, Bill Cizek wrote:
> Niccolo Rigacci wrote:
>> When the sync is complete, the machine starts to respond again perfectly.
> I was able to work around this by lowering /proc/sys/dev/raid/speed_limit_max to a value below my disk throughput value (~ 50 MB/s) as follows:
> $ echo 45000 > /proc/sys/dev/raid/speed_limit_max

Thanks! This hack seems to solve my problem too. So it seems that the RAID subsystem does not detect a proper speed to throttle the sync.

Can you please send me some details of your system?

- SATA chipset (or motherboard model)?
- Disks make/model?
- Do you have the config file of the kernel that you were running (look at the /boot/config-<version> file)?

I wonder if kernel preemption can be blamed for that, or whether the burst speed of the disks can fool the throttle calculation.

--
Niccolo Rigacci
Firenze - Italy

Iraq, peace mission: 38355 dead - www.iraqbodycount.net
Re: IBM xSeries stop responding during RAID1 reconstruction
Niccolo Rigacci wrote:
> Hi to all, I have a new IBM xSeries 206m with two SATA drives. I installed a Debian Testing (Etch) and configured a software RAID as shown:
>
> Personalities : [raid1]
> md1 : active raid1 sdb5[1] sda5[0]
>       1951744 blocks [2/2] [UU]
> md2 : active raid1 sdb6[1] sda6[0]
>       2931712 blocks [2/2] [UU]
> md3 : active raid1 sdb7[1] sda7[0]
>       39061952 blocks [2/2] [UU]
> md0 : active raid1 sdb1[1] sda1[0]
>       582 blocks [2/2] [UU]
>
> I experience this problem: whenever a volume is reconstructing (syncing), the system stops responding. The machine is alive, because it responds to ping, and the console is responsive, but I cannot get past the login prompt. It seems that every disk activity is delayed and blocking. When the sync is complete, the machine starts to respond again perfectly.
>
> Any hints on how to start debugging?

I ran into a similar problem using kernel 2.6.16.14 on an ASUS motherboard: when I mirrored two SATA drives it seemed to block all other disk I/O until the sync was complete. My symptoms were the same: all consoles were non-responsive and when I tried to login it just sat there until the sync was complete.

I was able to work around this by lowering /proc/sys/dev/raid/speed_limit_max to a value below my disk throughput value (~ 50 MB/s) as follows:

$ echo 45000 > /proc/sys/dev/raid/speed_limit_max

That kept my system usable but didn't address the underlying problem of the raid resync not being appropriately throttled. I ended up configuring my system differently so this became a moot point for me.

Hope this helps,
Bill
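[Editor's sketch] Bill's workaround, written so the old limit can be restored after the rebuild. The function name is mine and the sysctl path is parameterized so the logic can be exercised against an ordinary file; on a real box the default /proc path applies and root is required. The value is in KB/s:

```shell
#!/bin/sh
# Cap the md resync rate below the disks' sequential throughput, and
# print the previous cap so the caller can restore it afterwards.
cap_resync_speed() {
    limit_kb="$1"; f="${2:-/proc/sys/dev/raid/speed_limit_max}"
    old=$(cat "$f")            # remember the old cap
    echo "$limit_kb" > "$f"    # apply the new one
    echo "$old"                # report the old value for later restore
}

# Usage on a live system (as root):
#   old=$(cap_resync_speed 45000)       # throttle during rebuild
#   ...watch the resync finish in /proc/mdstat...
#   echo "$old" > /proc/sys/dev/raid/speed_limit_max
```

A persistent cap can instead go in /etc/sysctl.conf as `dev.raid.speed_limit_max = 45000`, though that throttles every future resync as well.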