Hello, AUFS users,

Many people are reporting that AUFS hangs up when a Java process exits:
[1]https://github.com/docker/docker/issues/18180
(Interestingly, the problem seems particular to Java.)

I suspect this is a deadlock related to i_mutex operations in
aufs_destroy_inode(), but lockdep did not detect any deadlock.
Could anyone please look into this?

$ ps -eLf
root    1054  822  1054 11   3 07:30 ?     00:00:00 [java] <defunct>
root    1054  822  1058 83   3 07:30 ?     00:00:01 [java] <defunct>
root    1054  822  1065  0   3 07:30 ?     00:00:00 [java] <defunct>

$ cat /proc/1054/task/1054/stack
[<ffffffff82074dd3>] do_exit+0x8e8/0x92c
[<ffffffff82074e8b>] do_group_exit+0x47/0xc4
[<ffffffff82074f17>] __wake_up_parent+0x0/0x23
[<ffffffff825d46ee>] system_call_fastpath+0x12/0x76
[<ffffffffffffffff>] 0xffffffffffffffff

$ cat /proc/1054/task/1058/stack
[<ffffffff821103be>] generic_file_write_iter+0x33/0xc1
[<ffffffff8215c0b7>] new_sync_write+0x5b/0x7a
[<ffffffff8228ccd5>] do_xino_fwrite+0x54/0x83
[<ffffffff8228cfb6>] xino_fwrite.part.29+0x3a/0x49
[<ffffffff8228d24e>] xino_fwrite+0x29/0x6c
[<ffffffff8228d331>] au_xino_do_write+0xa0/0xf7
[<ffffffff8228de9e>] au_xino_delete_inode+0x146/0x199
[<ffffffff8229d678>] au_iinfo_fin+0xec/0x19d
[<ffffffff8228928a>] aufs_destroy_inode+0xe/0x25
[<ffffffff821710f5>] destroy_inode+0x36/0x4f
[<ffffffff82171256>] evict+0x148/0x150
[<ffffffff82172599>] iput+0x177/0x1af
[<ffffffff8216e888>] __dentry_kill+0x128/0x198
[<ffffffff8216eac0>] dput+0x1c8/0x1f0
[<ffffffff8215d828>] __fput+0x181/0x198
[<ffffffff8215d86d>] ____fput+0x9/0xb
[<ffffffff8208aafa>] task_work_run+0x85/0x9c
[<ffffffff82074910>] do_exit+0x425/0x92c
[<ffffffff82074e44>] do_group_exit+0x0/0xc4
[<ffffffff825d46ee>] system_call_fastpath+0x12/0x76
[<ffffffffffffffff>] 0xffffffffffffffff

$ cat /proc/1054/task/1065/stack
[<ffffffff820e7ff4>] zap_pid_ns_processes+0x12e/0x175
[<ffffffff820749d1>] do_exit+0x4e6/0x92c
[<ffffffff82074e8b>] do_group_exit+0x47/0xc4
[<ffffffff8207f2c8>] get_signal+0x590/0x5bb
[<ffffffff8200d338>] do_signal+0x23/0x599
[<ffffffff8200d8c0>] do_notify_resume+0x12/0x4e
[<ffffffff825d48e6>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff

$ dmesg
INFO: task java:1065 blocked for more than 120 seconds.
      Tainted: G      O   4.1.13-boot2docker #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java       D ffff880037b7bc68   0  1065  1033 0x00000000
 ffff880037b7bc68 0000000000000002 ffff880037b7c000 ffff880000916008
 ffff880037852110 ffff880037b7bd28 ffff880000916008 ffff880037b7bc88
 ffffffff825d02dd 0000000000010920 0000000000000002 ffff880037b7bcc8
Call Trace:
 [<ffffffff825d02dd>] schedule+0x6f/0x7e
 [<ffffffff820e7ff4>] zap_pid_ns_processes+0x12e/0x175
 [<ffffffff820749d1>] do_exit+0x4e6/0x92c
 [<ffffffff820a5360>] ? lock_release_holdtime.part.28+0x72/0x79
 [<ffffffff82074e8b>] do_group_exit+0x47/0xc4
 [<ffffffff8207f2c8>] get_signal+0x590/0x5bb
 [<ffffffff8200d338>] do_signal+0x23/0x599
 [<ffffffff82097f47>] ? local_clock+0x19/0x22
 [<ffffffff820a5360>] ? lock_release_holdtime.part.28+0x72/0x79
 [<ffffffff825d4893>] ? int_very_careful+0x5/0x46
 [<ffffffff820a76e0>] ? trace_hardirqs_on_caller+0x183/0x19f
 [<ffffffff8200d8c0>] do_notify_resume+0x12/0x4e
 [<ffffffff825d48e6>] int_signal+0x12/0x17
no locks held by java/1065.

I have also attached `proc-lockdep.tbz`, which contains
/proc/{locks,lockdep_chains,lockdep_stats,lock_stat,lockdep}.

=== Reproducibility ===
 * almost 100% when a single CPU is assigned to the process
 * less than 1% when multiple CPUs are assigned

=== How to Reproduce ===
 1. Build a VM with this Boot2Docker ISO (kernel 4.1.13 + aufs4.1 + lockdep):
    [2]https://github.com/AkihiroSuda/boot2docker/releases/tag/1.9.1-lockdep
 2. Run `docker run -it --rm java taskset 0x1 java`.

Regards,
Akihiro Suda
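P.S. The defunct threads above were spotted by hand with `ps -eLf` and
`cat /proc/<pid>/task/<tid>/stack`. A scan like the following sketch can
locate such D-state (uninterruptible sleep) threads generically on any
affected machine; it only assumes standard Linux procfs, and note that
reading /proc/<pid>/task/<tid>/stack itself usually requires root:

```shell
#!/bin/sh
# Sketch: find all threads in state D (uninterruptible sleep), the state
# the hung-task detector reported for java/1065, and dump their kernel
# stacks where permitted.
for stat in /proc/[0-9]*/task/[0-9]*/stat; do
    [ -r "$stat" ] || continue
    # /proc/<tid>/stat is "pid (comm) state ..."; the comm may contain
    # spaces, so strip through the closing paren before reading the state.
    state=$(awk '{ sub(/.*\) /, ""); print $1 }' "$stat" 2>/dev/null)
    if [ "$state" = "D" ]; then
        task_dir=${stat%/stat}
        echo "=== TID ${task_dir##*/} (state D) ==="
        cat "$task_dir/stack" 2>/dev/null \
            || echo "(stack unreadable; run as root)"
    fi
done
```

On a hung system this should print the same kind of trace as the
`cat /proc/1054/task/1065/stack` output above, without knowing the PID
in advance.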
References

1. https://github.com/docker/docker/issues/18180
2. https://github.com/AkihiroSuda/boot2docker/releases/tag/1.9.1-lockdep