Hi Valérie,
Valerie Clement wrote:
Hi Maeda,
Here is a patch that should fix the hang problem you reported. Could you
try it?
Before applying this patch (fix_kernbench_pb.patch), you must remove the
previous one I sent you (reclaim_mapped_pages.patch).
Sometimes on my machine, under memory pressure in a class, kswapd is no
longer woken up. The patch "fix_shrink_atlimit_pb.patch" fixes this
problem.
The problem still occurred during the test run. After reading the
patches, I'm afraid I may not have described the problem correctly.
Let me restate it.
The following is kswapd's stack backtrace during the hang. kswapd never
wakes up unless start_this_handle returns.
kswapd0 D a000000100262920 0 235 1 952 11 (L-TLB)
Call Trace:
[<a00000010072ece0>] schedule+0x1140/0x1320
sp=e0000001fe42fb50 bsp=e0000001fe4290a0
[<a000000100262920>] start_this_handle+0x680/0xae0
sp=e0000001fe42fb50 bsp=e0000001fe429030
[<a000000100262f50>] journal_start+0x1d0/0x260
sp=e0000001fe42fc10 bsp=e0000001fe428ff0
[<a000000100251a50>] ext3_journal_start_sb+0xd0/0x100
sp=e0000001fe42fc10 bsp=e0000001fe428fc8
[<a000000100241e60>] ext3_ordered_writepage+0x100/0x320
sp=e0000001fe42fc10 bsp=e0000001fe428f88
[<a0000001001133e0>] shrink_ckrmzone+0xc60/0x1940
sp=e0000001fe42fc10 bsp=e0000001fe428ec0
[<a000000100115880>] kswapd+0x760/0x9a0
sp=e0000001fe42fd80 bsp=e0000001fe428e38
[<a0000001000149f0>] kernel_thread_helper+0xd0/0x100
sp=e0000001fe42fe30 bsp=e0000001fe428e10
[<a0000001000094a0>] start_kernel_thread+0x20/0x40
sp=e0000001fe42fe30 bsp=e0000001fe428e10
On the other hand, start_this_handle never returns unless the
__alloc_pages call in the following cc1 backtrace succeeds, because
ext3_create has started a journal handle and has not stopped it yet.
cc1 is waiting for kswapd to reclaim some pages, but kswapd is waiting
for cc1 to stop its journal handle, which requires cc1's __alloc_pages
to succeed. Deadlock.
To escape from this situation, __alloc_pages must succeed regardless
of the class limit. Does your patch try to deal with this problem?
cc1 D a000000100730550 0 27939 27938 27940 (NOTLB)
Call Trace:
[<a00000010072ece0>] schedule+0x1140/0x1320
sp=e0000001e334fbc0 bsp=e0000001e3349518
[<a000000100730550>] schedule_timeout+0x110/0x180
sp=e0000001e334fbc0 bsp=e0000001e33494e8
[<a0000001007303a0>] io_schedule_timeout+0x80/0xc0
sp=e0000001e334fbf0 bsp=e0000001e33494c0
[<a00000010039a620>] blk_congestion_wait+0xe0/0x120
sp=e0000001e334fbf0 bsp=e0000001e3349490
[<a0000001000fe100>] __alloc_pages+0x140/0x760
sp=e0000001e334fc50 bsp=e0000001e3349428
[<a000000100137a10>] alloc_pages_current+0x170/0x1a0
sp=e0000001e334fc60 bsp=e0000001e33493f0
[<a0000001000f4b20>] find_or_create_page+0x60/0x120
sp=e0000001e334fc60 bsp=e0000001e33493b0
[<a000000100150580>] __getblk+0x220/0x5a0
sp=e0000001e334fc60 bsp=e0000001e3349360
[<a000000100246370>] ext3_getblk+0x150/0x4e0
sp=e0000001e334fc60 bsp=e0000001e3349310
[<a000000100246740>] ext3_bread+0x40/0x180
sp=e0000001e334fcc0 bsp=e0000001e33492d0
[<a00000010024daa0>] ext3_add_entry+0x9a0/0xf80
sp=e0000001e334fcd0 bsp=e0000001e33491b8
[<a00000010024e2f0>] ext3_add_nondir+0x30/0xe0
sp=e0000001e334fda0 bsp=e0000001e3349180
[<a00000010024e550>] ext3_create+0x1b0/0x240
sp=e0000001e334fda0 bsp=e0000001e3349130
[<a000000100168e00>] vfs_create+0x120/0x1e0
sp=e0000001e334fdb0 bsp=e0000001e33490f0
[<a000000100169670>] open_namei+0x370/0xf40
sp=e0000001e334fdb0 bsp=e0000001e3349060
[<a000000100142dc0>] filp_open+0x40/0xa0
sp=e0000001e334fdc0 bsp=e0000001e3349030
[<a000000100143510>] do_sys_open+0x90/0x1c0
sp=e0000001e334fe30 bsp=e0000001e3348fd8
[<a000000100143690>] sys_open+0x50/0x80
sp=e0000001e334fe30 bsp=e0000001e3348f80
[<a00000010000c840>] ia64_ret_from_syscall+0x0/0x20
sp=e0000001e334fe30 bsp=e0000001e3348f80
[<a000000000010640>] __kernel_syscall_via_break+0x0/0x20
sp=e0000001e3350000 bsp=e0000001e3348f80
During the hang, the processes sleeping in blk_congestion_wait
periodically wake up and call ckrm_shrink_atlimit, shown below, to
kick kswapd. However, that does not wake up a kswapd that is sleeping
in start_this_handle.
Even worse, because CLS_AT_LIMIT stays set during the hang,
ckrm_shrink_atlimit always returns at the second if statement, so it
is effectively a no-op.
void
ckrm_shrink_atlimit(struct ckrm_mem_res *cls)
{
	struct zone *zone;
	unsigned long flags;

	if (!cls || (cls->pg_limit == CKRM_SHARE_DONTCARE))
		return;
	/* Returns here whenever CLS_AT_LIMIT is already set (the case during the hang). */
	if (test_and_set_bit(CLS_AT_LIMIT, &cls->flags))
		return;

	/* Restart the shrink counter once per ckrm_mem_shrink_interval seconds. */
	if (time_after(jiffies, cls->last_shrink +
				ckrm_mem_shrink_interval * HZ)) {
		cls->last_shrink = jiffies;
		atomic_set(&cls->shrink_count, 0);
	}
	atomic_inc(&cls->shrink_count);
	/* Too many shrink requests in this interval: clear the bit and give up. */
	if (atomic_read(&cls->shrink_count) > ckrm_mem_shrink_count) {
		clear_bit(CLS_AT_LIMIT, &cls->flags);
		return;
	}
	cls->max_shrink_atlimit++;

	/* Queue the class on ckrm_shrink_list and wake kswapd. */
	spin_lock_irqsave(&ckrm_mem_lock, flags);
	list_add(&cls->shrink_list, &ckrm_shrink_list);
	spin_unlock_irqrestore(&ckrm_mem_lock, flags);
	for_each_zone(zone) {
		wakeup_kswapd(zone, 0);
		break; /* only once is enough */
	}
}
Thanks,
MAEDA Naoaki
_______________________________________________
ckrm-tech mailing list
https://lists.sourceforge.net/lists/listinfo/ckrm-tech