Hi TJ, Song,

Sorry for the late reply.

On 05/08/2015 11:23 PM, Tejun Heo wrote:

> Cc'ing Lai, Gu and Kamezawa as they've been working in the area for a
> while now.  Gu, is this related to what you've been working on?


Yes, it is the same issue, and we are still working on it. Please refer to the
following threads for details:
https://lkml.org/lkml/2015/4/24/143
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989

Regards,
Gu

> 
> Thanks.
> 
> On Fri, May 08, 2015 at 07:16:40PM +0800, Song Xiumiao wrote:
>> From: songxiumiao <songxium...@inspur.com>
>>
>> By analysing the call trace of the bug, we found that create_worker()
>> allocates memory from node0. Because node0 is offline, the allocation
>> fails. This patch adds a condition to ensure that the node is online,
>> so that the system only allocates memory from an online node.
>>
>> The bug information follows:
>> [root@localhost ~]# echo 1 > /sys/devices/system/cpu/cpu90/online
>> [  225.611209] smpboot: Booting Node 2 Processor 90 APIC 0x40
>> [18446744029.482996] kvm: enabling virtualization on CPU90
>> [  225.725503] TSC synchronization [CPU#43 -> CPU#90]:
>> [  225.730952] Measured 672516581900 cycles TSC warp between CPUs, turning 
>> off TSC clock.
>> [  225.739800] tsc: Marking TSC unstable due to check_tsc_sync_source failed
>> [  225.755126] BUG: unable to handle kernel paging request at 
>> 0000000000001b08
>> [  225.762931] IP: [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
>> [  225.770247] PGD 449bb0067 PUD 46110e067 PMD 0
>> [  225.775248] Oops: 0000 [#1] SMP
>> [  225.778875] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT 
>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
>> [  225.868198] CPU: 43 PID: 5400 Comm: bash Not tainted 
>> 4.0.0-rc4-bug-fixed-remove #16
>> [  225.876754] Hardware name: Insyde Brickland/Type2 - Board Product Name1, 
>> BIOS Brickland.05.04.15.0024 02/28/2015
>> [  225.888122] task: ffff88045a3d8da0 ti: ffff880446120000 task.ti: 
>> ffff880446120000
>> [  225.896484] RIP: 0010:[<ffffffff81182597>]  [<ffffffff81182597>] 
>> __alloc_pages_nodemask+0xb7/0x940
>> [  225.906509] RSP: 0018:ffff880446123918  EFLAGS: 00010246
>> [  225.912443] RAX: 0000000000001b00 RBX: 0000000000000010 RCX: 
>> 0000000000000000
>> [  225.920416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>> 00000000002052d0
>> [  225.928388] RBP: ffff880446123a08 R08: ffff880460eca0c0 R09: 
>> 0000000060eca101
>> [  225.936361] R10: ffff88046d007300 R11: ffffffff8108dd31 R12: 
>> 000000000001002a
>> [  225.944334] R13: 00000000002052d0 R14: 0000000000000001 R15: 
>> 00000000000040d0
>> [  225.952306] FS:  00007f9386450740(0000) GS:ffff88046db60000(0000) 
>> knlGS:0000000000000000
>> [  225.961346] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  225.967765] CR2: 0000000000001b08 CR3: 00000004612a3000 CR4: 
>> 00000000001407e0
>> [  225.975735] Stack:
>> [  225.977981]  00000000002052d0 0000000000000000 0000000000000003 
>> ffff88045a3d8da0
>> [  225.986291]  ffff880446123988 ffffffff811c7f81 ffff88045a3d8da0 
>> 0000000000000000
>> [  225.994597]  000080d000000002 ffff88046d005500 000000000003000f 
>> 002052d0002052d0
>> [  226.002904] Call Trace:
>> [  226.005645]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
>> [  226.012557]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
>> [  226.019173]  [<ffffffff811d3957>] new_slab+0xa7/0x460
>> [  226.024826]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
>> [  226.030960]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
>> [  226.037679]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
>> [  226.043812]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
>> [  226.051299]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
>> [  226.057236]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
>> [  226.063357]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
>> [  226.069974]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
>> [  226.077076]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
>> [  226.084468]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
>> [  226.091084]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
>> [  226.098189]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
>> [  226.103929]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
>> [  226.109574]  [<ffffffff81077919>] cpu_up+0x89/0xb0
>> [  226.114923]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
>> [  226.121350]  [<ffffffff814386dd>] device_online+0x6d/0xa0
>> [  226.127382]  [<ffffffff814387a5>] online_store+0x95/0xa0
>> [  226.133322]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
>> [  226.139457]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
>> [  226.145586]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
>> [  226.152109]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
>> [  226.157853]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
>> [  226.164954]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
>> [  226.170595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
>> [  226.177306] Code: 30 97 00 89 45 bc 83 e1 0f b8 22 01 32 01 01 c9 d3 f8 
>> 83 e0 03 89 9d 6c ff ff ff 83 e3 10 89 45 c0 0f 85 6d 01 00 00 48 8b 45 88 
>> <48> 83 78 08 00 0f 84 51 01 00 00 b8 01
>> [  226.199175] RIP  [<ffffffff81182597>] __alloc_pages_nodemask+0xb7/0x940
>> [  226.206576]  RSP <ffff880446123918>
>> [  226.210471] CR2: 0000000000001b08
>> [  226.227939] ---[ end trace 30d753e1e1124696 ]---
>> [  226.412591] Kernel panic - not syncing: Fatal exception
>> [  226.430948] Kernel Offset: disabled
>> [  226.434845] drm_kms_helper: panic occurred, switching back to text console
>> [  226.618325] ---[ end Kernel panic - not syncing: Fatal exception
>> [  226.625047] ------------[ cut here ]------------
>> [  226.630213] WARNING: CPU: 43 PID: 5400 at arch/x86/kernel/smp.c:124 
>> native_smp_send_reschedule+0x5d/0x60()
>> [  226.640999] Modules linked in: xt_CHECKSUM ip6t_rpfilter ip6t_REJECT 
>> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntracd
>> [  226.730275] CPU: 43 PID: 5400 Comm: bash Tainted: G      D         
>> 4.0.0-rc4-bug-fixed-remove #16
>> [  226.740189] Hardware name: Insyde Brickland/Type2 - Board Product Name1, 
>> BIOS Brickland.05.04.15.0024 02/28/2015
>> [  226.751558]  0000000000000000 00000000aa535e80 ffff88046db63d58 
>> ffffffff8167aa08
>> [  226.759865]  0000000000000000 0000000000000000 ffff88046db63d98 
>> ffffffff810772da
>> [  226.768173]  ffff88046db63d98 0000000000000000 ffff88046d615380 
>> 000000000000002b
>> [  226.776480] Call Trace:
>> [  226.779212]  <IRQ>  [<ffffffff8167aa08>] dump_stack+0x45/0x57
>> [  226.785657]  [<ffffffff810772da>] warn_slowpath_common+0x8a/0xc0
>> [  226.792367]  [<ffffffff8107740a>] warn_slowpath_null+0x1a/0x20
>> [  226.798886]  [<ffffffff8104a64d>] native_smp_send_reschedule+0x5d/0x60
>> [  226.806182]  [<ffffffff810b4fe5>] trigger_load_balance+0x145/0x1b0
>> [  226.813093]  [<ffffffff810a348c>] scheduler_tick+0x9c/0xe0
>> [  226.819228]  [<ffffffff810e0a21>] update_process_times+0x51/0x60
>> [  226.825946]  [<ffffffff810f0925>] tick_sched_handle.isra.18+0x25/0x60
>> [  226.833143]  [<ffffffff810f09a4>] tick_sched_timer+0x44/0x80
>> [  226.839467]  [<ffffffff810e1737>] __run_hrtimer+0x77/0x1d0
>> [  226.845590]  [<ffffffff810f0960>] ? tick_sched_handle.isra.18+0x60/0x60
>> [  226.852980]  [<ffffffff810e1b13>] hrtimer_interrupt+0x103/0x230
>> [  226.859596]  [<ffffffff8104d3d9>] local_apic_timer_interrupt+0x39/0x60
>> [  226.866883]  [<ffffffff81684d85>] smp_apic_timer_interrupt+0x45/0x60
>> [  226.873982]  [<ffffffff81682ded>] apic_timer_interrupt+0x6d/0x80
>> [  226.880690]  <EOI>  [<ffffffff81675abe>] ? panic+0x1c3/0x204
>> [  226.887036]  [<ffffffff81675ab7>] ? panic+0x1bc/0x204
>> [  226.892682]  [<ffffffff81018949>] oops_end+0x109/0x120
>> [  226.898422]  [<ffffffff81675285>] no_context+0x2ee/0x366
>> [  226.904359]  [<ffffffff81675370>] __bad_area_nosemaphore+0x73/0x1cc
>> [  226.911361]  [<ffffffff816756ae>] bad_area+0x44/0x4c
>> [  226.916910]  [<ffffffff81062b1a>] __do_page_fault+0x2ea/0x420
>> [  226.923331]  [<ffffffff81062c81>] do_page_fault+0x31/0x70
>> [  226.929364]  [<ffffffff81683f08>] page_fault+0x28/0x30
>> [  226.935106]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
>> [  226.941235]  [<ffffffff81182597>] ? __alloc_pages_nodemask+0xb7/0x940
>> [  226.948430]  [<ffffffff81182705>] ? __alloc_pages_nodemask+0x225/0x940
>> [  226.955725]  [<ffffffff811c7f81>] ? alloc_pages_current+0x91/0x100
>> [  226.962624]  [<ffffffff811d27c3>] ? deactivate_slab+0x383/0x400
>> [  226.969239]  [<ffffffff811d3957>] new_slab+0xa7/0x460
>> [  226.974885]  [<ffffffff81678c75>] __slab_alloc+0x310/0x470
>> [  226.981015]  [<ffffffff8130caf6>] ? get_from_free_list+0x46/0x60
>> [  226.987727]  [<ffffffff8108dd31>] ? alloc_worker+0x21/0x50
>> [  226.993851]  [<ffffffff811d46c1>] kmem_cache_alloc_node_trace+0x91/0x250
>> [  227.001340]  [<ffffffff8108dd31>] alloc_worker+0x21/0x50
>> [  227.007275]  [<ffffffff8108ff23>] create_worker+0x53/0x1e0
>> [  227.013404]  [<ffffffff81092092>] alloc_unbound_pwq+0x2a2/0x510
>> [  227.020019]  [<ffffffff810924b4>] wq_update_unbound_numa+0x1b4/0x220
>> [  227.027112]  [<ffffffff81092828>] workqueue_cpu_up_callback+0x308/0x3d0
>> [  227.034502]  [<ffffffff8109784e>] notifier_call_chain+0x4e/0x80
>> [  227.041117]  [<ffffffff8109796e>] __raw_notifier_call_chain+0xe/0x10
>> [  227.048219]  [<ffffffff810774f3>] cpu_notify+0x23/0x50
>> [  227.053961]  [<ffffffff81077878>] _cpu_up+0x188/0x1a0
>> [  227.059597]  [<ffffffff81077919>] cpu_up+0x89/0xb0
>> [  227.064950]  [<ffffffff8166fba0>] cpu_subsys_online+0x40/0x90
>> [  227.071372]  [<ffffffff814386dd>] device_online+0x6d/0xa0
>> [  227.077395]  [<ffffffff814387a5>] online_store+0x95/0xa0
>> [  227.083332]  [<ffffffff814358a8>] dev_attr_store+0x18/0x30
>> [  227.089460]  [<ffffffff8126d76d>] sysfs_kf_write+0x3d/0x50
>> [  227.095589]  [<ffffffff8126cc1a>] kernfs_fop_write+0x12a/0x180
>> [  227.102108]  [<ffffffff811f1bb7>] vfs_write+0xb7/0x1f0
>> [  227.107850]  [<ffffffff810232bc>] ? do_audit_syscall_entry+0x6c/0x70
>> [  227.114950]  [<ffffffff811f2835>] SyS_write+0x55/0xd0
>> [  227.120595]  [<ffffffff81681f09>] system_call_fastpath+0x12/0x17
>> [  227.127306] ---[ end trace 30d753e1e1124697 ]---
>>
>> Signed-off-by: Song Xiumiao <songxium...@inspur.com>
>> Signed-off-by: Gong Zhaogang <gongzhaog...@inspur.com>
>> Tested-by: Liu Changsheng <liuchangsh...@inspur.com>
>> Reviewed-by: xiaofeng.yan <xiaofeng....@inspur.com>
>> Reviewed-by: Fan Dongdong <fa...@inspur.com>
>> ---
>>  kernel/workqueue.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 586ad91..cae6277 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3253,7 +3253,8 @@ static struct worker_pool *get_unbound_pool(const 
>> struct workqueue_attrs *attrs)
>>      if (wq_numa_enabled) {
>>              for_each_node(node) {
>>                      if (cpumask_subset(pool->attrs->cpumask,
>> -                                       wq_numa_possible_cpumask[node])) {
>> +                                       wq_numa_possible_cpumask[node]) &&
>> +                                       node_online(node)) {
>>                              pool->node = node;
>>                              break;
>>                      }
>> -- 
>> 1.9.1
>>
>>
> 
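
For illustration only, the check that the quoted patch adds to
get_unbound_pool() can be modelled in user space roughly as below. This is
a sketch under stated assumptions, not the kernel code: cpumask_model_t,
mask_subset(), pick_pool_node(), NR_NODES and NO_NODE are all hypothetical
names, CPU masks are reduced to 64-bit words, and the online state of each
node is passed in as a plain array instead of being read via node_online().
NO_NODE stands in for the kernel's NUMA_NO_NODE (-1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES 4      /* hypothetical node count for the model */
#define NO_NODE  (-1)   /* stand-in for the kernel's NUMA_NO_NODE */

/* Hypothetical model of a cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* True if every CPU set in 'sub' is also set in 'super'. */
static bool mask_subset(cpumask_model_t sub, cpumask_model_t super)
{
    return (sub & ~super) == 0;
}

/*
 * Return the first node whose possible-CPU mask covers the pool's CPUs
 * AND which is currently online. Without the online test, a pool could
 * be bound to an offline node and later node-specific allocations for
 * it would target that node.
 */
static int pick_pool_node(cpumask_model_t pool_cpus,
                          const cpumask_model_t possible[NR_NODES],
                          const bool online[NR_NODES])
{
    assert(possible != NULL && online != NULL);

    for (int node = 0; node < NR_NODES; node++) {
        if (mask_subset(pool_cpus, possible[node]) && online[node])
            return node;
    }
    return NO_NODE; /* no suitable online node: leave the pool unbound */
}
```

With four CPUs per node, a pool whose CPUs all belong to node 1 is bound to
node 1 while that node is online, and is left at NO_NODE once node 1 is
marked offline.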

