On 2015-10-02 20:54, Guillaume Nault wrote:
On Fri, Oct 02, 2015 at 11:01:45AM +0300, Denys Fedoryshchenko wrote:
Here is a similar panic after the patch was applied (it might be a
different bug), captured over netconsole:
[126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted 4.2.2-build-0087 #2
[126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
[126348.618193] task: ffff8817cfbe0000 ti: ffff8817c6350000 task.ti: ffff8817c6350000
[126348.618696] RIP: 0010:[<ffffffffa00ea129>] [<ffffffffa00ea129>] pppoe_release+0x56/0x142 [pppoe]
[126348.619306] RSP: 0018:ffff8817c6353e28 EFLAGS: 00010202
[126348.619601] RAX: 0000000000000000 RBX: ffff8817a92b0400 RCX: 0000000000000000
[126348.620152] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI: ffffffff8180c18a
[126348.620715] RBP: ffff8817c6353e68 R08: 0000000000000000 R09: 0000000000000000
[126348.621254] R10: ffff88173c02b210 R11: 0000000000000293 R12: ffff8817b3c18000
[126348.621784] R13: ffff8817b3c18030 R14: ffff8817967f1140 R15: ffff8817d226c920
[126348.622330] FS: 00007f9444db9700(0000) GS:ffff8817dee00000(0000) knlGS:0000000000000000
[126348.622876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[126348.623202] CR2: 0000000000000428 CR3: 00000017c70b2000 CR4: 00000000001406f0
[126348.623760] Stack:
[126348.624056] 0000000100200018 0000000000000000 0000000100000000 ffff8817b3c18000
[126348.624925] ffffffffa00ec280 ffff8817b3c18030 ffff8817967f1140 ffff8817d226c920
[126348.625736] ffff8817c6353e88 ffffffff8180820a ffff88173c02b200 0000000000000008
[126348.626533] Call Trace:
[126348.626873] [<ffffffff8180820a>] sock_release+0x1a/0x70
[126348.627183] [<ffffffff8180826d>] sock_close+0xd/0x11
[126348.627512] [<ffffffff81152c61>] __fput+0xdf/0x193
[126348.627845] [<ffffffff81152d43>] ____fput+0x9/0xb
[126348.628169] [<ffffffff810d098e>] task_work_run+0x78/0x8f
[126348.628517] [<ffffffff810038a9>] do_notify_resume+0x40/0x4e
[126348.628837] [<ffffffff818a5a0a>] int_signal+0x12/0x17
Ok, so there's another possibility for pppoe_release() to be called
while sk->sk_state is PPPOX_{CONNECTED,BOUND,ZOMBIE} but po->pppoe_dev
is NULL. I'll check the code to see if I can find any race wrt.
po->pppoe_dev and sk->sk_state settings.
In a previous message, you said you'd try reverting 287f3a943fef
("pppoe: Use workqueue to die properly when a PADT is received") and
related patches. I guess "related patches" means 665a6cd809f4 ("pppoe:
drop pppoe device in pppoe_unbind_sock_work"), right?
Did these reverts give any successful result?
BTW, please don't top-post.
I am just applying a "dirty" patch like the one below. I can't remember
for certain whether I did a git revert, because it was a while ago that
I spotted this bug. With this patch applied, the pppoe server no longer
reboots.
diff -Naur linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c linux-4.2.2-changed/drivers/net/ppp/pppoe.c
--- linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c 2015-09-29 20:38:27.000000000 +0300
+++ linux-4.2.2-changed/drivers/net/ppp/pppoe.c 2015-10-04 19:05:55.697732991 +0300
@@ -519,7 +519,7 @@
}
bh_unlock_sock(sk);
- if (!schedule_work(&po->proto.pppoe.padt_work))
+// if (!schedule_work(&po->proto.pppoe.padt_work))
sock_put(sk);
}
@@ -633,7 +633,7 @@
lock_sock(sk);
- INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
+// INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
error = -EINVAL;
if (sp->sa_protocol != PX_PROTO_OE)