So I broke the RAID and used a single partition for iozone testing with 5000MB chunks. No errors were reported. It's the combination of EXT4 and the mdraid module that causes the soft lockups. :(
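The conclusion above depends on whether soft lockups appear in the kernel log during the 5000MB run. The hypothetical helper `find_soft_lockups` below is a minimal sketch that scans log lines for soft-lockup reports. It assumes the 2.6.32-era message format `BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`, so check it against your own `/var/log/messages` before relying on it.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed 2.6.32-era soft-lockup format, e.g.
#   "BUG: soft lockup - CPU#1 stuck for 67s! [iozone:4242]"
# The greedy comm group lets task names that contain ':' still parse,
# because the pid is taken from the last ':<digits>]' on the line.
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int
    seconds: int
    comm: str
    pid: int


def find_soft_lockups(lines: Iterable[str]) -> List[SoftLockup]:
    """Return one SoftLockup record per matching kernel log line."""
    hits: List[SoftLockup] = []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            hits.append(
                SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
            )
    return hits
```

For example, `find_soft_lockups(open("/var/log/messages"))` yields one record per report. Comparing the ext4-on-md run with the single-partition run then comes down to comparing the counts.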
Benchmarking results will be released at a later time.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Brian Long
Sent: Thursday, January 12, 2012 3:12 PM
To: [email protected]
Subject: Re: [rhelv6-list] RHEL6.2 Kernel/EXT4 bug

I responded too quickly. I thought it was finished, but it is still running.

I thought RHEL used cfq by default.

/Brian/

On 1/12/12 2:54 PM, Musayev, Ilya wrote:
> I guess I can break the RAID and try again on a single drive. I will let you
> know what happens.
>
> Did you actually do the 5000MB test with iozone?
>
> My 100MB and 1000MB tests are fine; only when I go into the larger 5000MB
> range with iozone do I start having issues. I could probably narrow it down
> and find the exact break point, but I don't think it matters, as this should
> not happen at all and does not occur with XFS. At this point, I'm leaning
> more toward XFS, as I get better or on-par metrics compared to EXT4 without
> any issues.
>
> I'm also curious as to why your IO scheduler was set to cfq; if I recall
> correctly, noop should have been the default.
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Brian Long
> Sent: Thursday, January 12, 2012 1:41 PM
> To: [email protected]
> Subject: Re: [rhelv6-list] RHEL6.2 Kernel/EXT4 bug
>
> On 1/12/12 12:14 PM, Musayev, Ilya wrote:
>> Curious if anyone has seen this in their RHEL6.2 setups. If you have
>> 6.1 or 6.2, please try this out and see what happens. The list of commands
>> to reproduce is below; the latest iozone is required.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=773377
>
> I put the same kernel on my RH 6.2 workstation with a single drive and ran
> iozone with the same parameters. I don't have the drive mirrored, and I had
> to change the scheduler to noop since it was cfq by default.
>
> The only partition I had with enough free space is encrypted, so kcryptd was
> taking 100% CPU while running iozone.
> Have you narrowed it down to md-only? What happens if you run the same test
> on just one of your drives?
>
> I got a kernel oops early on, but no ext4 errors:
> Jan 12 12:51:26 brilong-lnx2 kernel: ------------[ cut here ]------------
> Jan 12 12:51:26 brilong-lnx2 kernel: WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Not tainted)
> Jan 12 12:51:26 brilong-lnx2 kernel: Hardware name: IBM System x3200 -[4362PAY]-
> Jan 12 12:51:26 brilong-lnx2 kernel: Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput sg microcode serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support tg3 i3000_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> Jan 12 12:51:26 brilong-lnx2 kernel: Pid: 23, comm: kblockd/1 Not tainted 2.6.32-220.2.1.el6.x86_64 #1
> Jan 12 12:51:26 brilong-lnx2 kernel: Call Trace:
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff814eccc5>] ? thread_return+0x232/0x79d
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff812494d0>] ? blk_unplug_work+0x0/0x70
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff812494d0>] ? blk_unplug_work+0x0/0x70
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8108b15c>] ? worker_thread+0x1fc/0x2a0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff810906a6>] ? kthread+0x96/0xa0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
> Jan 12 12:51:26 brilong-lnx2 kernel: ---[ end trace aeef27db2e12775f ]---
>
> /Brian/

--
Brian Long | Corporate Security Programs Org | Cisco

_______________________________________________
rhelv6-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv6-list
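The thread disagrees on whether cfq or noop is the RHEL 6.2 default. Rather than relying on memory, you can read the active elevator from sysfs. The kernel lists the available schedulers in `/sys/block/<dev>/queue/scheduler` and brackets the active one (e.g. `noop deadline [cfq]`). The sketch below assumes that layout. The helper names (`parse_scheduler`, `current_scheduler`, `set_scheduler`) are hypothetical. Writing a scheduler name to that file needs root, and which schedulers are offered depends on the kernel build.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def _sched_path(dev: str) -> Path:
    # Assumed sysfs location, e.g. /sys/block/sda/queue/scheduler
    return Path("/sys/block") / dev / "queue" / "scheduler"


def parse_scheduler(text: str) -> Tuple[str, List[str]]:
    """Split a sysfs scheduler line into (active, available).

    The active scheduler is the token wrapped in square brackets.
    """
    active: Optional[str] = None
    available: List[str] = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    if active is None:
        raise ValueError(f"no active scheduler marked in {text!r}")
    return active, available


def current_scheduler(dev: str) -> str:
    """Return the active I/O scheduler for a block device such as 'sda'."""
    active, _ = parse_scheduler(_sched_path(dev).read_text())
    return active


def set_scheduler(dev: str, name: str) -> None:
    """Switch dev to scheduler `name` (like `echo noop > .../scheduler`); needs root."""
    path = _sched_path(dev)
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {dev}: {available}")
    path.write_text(name)
```

Calling `current_scheduler("sda")` before each iozone run records which elevator the numbers were taken under. This makes cfq and noop results comparable across machines.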
