I haven't found any obvious source in aufs for the extra fput() calls which I suspect are causing this problem. Could you either give me more information so that I can run the same tests myself (I'm guessing the problem isn't arch-specific), or else reproduce the problem with the kernel patched with something like the following? Perhaps we can catch it in the act. However, we might also just catch legitimate fput() calls, since the erroneous ones could occur while the refcount is still positive ...

diff --git a/fs/file_table.c b/fs/file_table.c
index df66450fb443..d4911a6e8331 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -264,7 +264,9 @@ static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
 void fput(struct file *file)
 {
-	if (atomic_long_dec_and_test(&file->f_count)) {
+	long cnt = atomic_long_dec_return(&file->f_count);
+	WARN_ON(cnt < 0);
+	if (cnt == 0) {
 		struct task_struct *task = current;
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1650062

Title:
  Ubuntu16.04.01VM:Docker-Powervm aufs bad file panic while running
  tests in a docker container

Status in linux package in Ubuntu:
  New

Bug description:
  == Comment: #0 - Vinutha GS - 2016-12-13 02:47:35 ==
  When some of the base and io tests were run inside a docker container,
  the lpar crashed. The stack trace and other details are below.

  Steps to re-create -
  1. Install 16.04.02 on a PowerVM lpar.
  2. Ran setup general.
  3. Ran docker scripts [home-grown scripts], which do the docker package
     installation and other setup required to run STAF cases inside a
     docker container.
  4. We have a docker image from which we launch containers and start
     tests inside the containers. If complete details are required on how
     to execute the scripts, please let me know.
  5. STAF Base and IO tests were started inside containers successfully;
     after some time, I see the partition is in XMON.

  Docker info -
  docker info
  Containers: 0
   Running: 0
   Paused: 0
   Stopped: 0
  Images: 0
  Server Version: 1.12.1
  Storage Driver: aufs
   Root Dir: /var/lib/docker/aufs
   Backing Filesystem: extfs
   Dirs: 0
   Dirperm1 Supported: true
  Logging Driver: json-file
  Cgroup Driver: cgroupfs
  Plugins:
   Volume: local
   Network: null host bridge overlay
  Swarm: inactive
  Runtimes: runc
  Default Runtime: runc
  Security Options: apparmor
  Kernel Version: 4.4.0-53-generic
  Operating System: Ubuntu 16.04.1 LTS
  OSType: linux
  Architecture: ppc64le
  CPUs: 24
  Total Memory: 49.89 GiB
  Name: bamlp3
  ID: I7VI:G4RJ:RHTQ:WNGV:52FK:K7AZ:YDJQ:KFUM:P3UA:MZ3I:5XUY:WV3N
  Docker Root Dir: /var/lib/docker
  Debug Mode (client): false
  Debug Mode (server): false
  Registry: https://index.docker.io/v1/
  WARNING: No swap limit support
  Insecure Registries:
   127.0.0.0/8

  docker ps -a
  CONTAINER ID  IMAGE         COMMAND                 CREATED         STATUS         PORTS  NAMES
  61f2b8ab0a86  32d545c3ea01  "/bin/sh -c ./staf_io"  24 minutes ago  Up 24 minutes         bamlp3-io
  151da0322172  590e44f15214  "/bin/sh -c ./staf_ba"  30 minutes ago  Up 30 minutes         bamlp3-base

  Stack trace -
  8:mon> t
  [c000000a5e147d10] d00000000a04ca98 aufs_flush_nondir+0x38/0x50 [aufs]
  [c000000a5e147d40] c0000000002e0428 filp_close+0x68/0xe0
  [c000000a5e147dc0] c00000000030f71c __close_fd+0xcc/0x150
  [c000000a5e147e00] c0000000002e04d4 SyS_close+0x34/0x90
  [c000000a5e147e30] c000000000009204 system_call+0x38/0xb4
  --- Exception: c00 (System Call) at 00003fff8bc217d8
  SP (3fffd85203b0) is in userspace
  8:mon> e
  cpu 0x8: Vector: 300 (Data Access) at [c000000a5e147a40]
      pc: d00000000a04bdd4: au_do_flush+0x44/0x220 [aufs]
      lr: d00000000a04ca98: aufs_flush_nondir+0x38/0x50 [aufs]
      sp: c000000a5e147cc0
     msr: 8000000000009033
     dar: 28
   dsisr: 40000000
    current = 0xc000000a8b7fc8e0
    paca    = 0xc00000000fb44c00   softe: 0   irq_happened: 0x01
      pid   = 11936, comm = remap_file_page
  8:mon>

  Release details -
  uname -r
  4.4.0-53-generic

  uname -a
  Linux bamlp4 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:36 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  == Comment: #6 - Vinutha GS - 2016-12-14 03:16:18 ==
  Please find the attached sosreport. Also, I have followed the steps
  for kdump; it is enabled now. I'm going to start the tests once again.

  == Comment: #12 - Kevin W. Rudd - 2016-12-14 16:06:46 ==
  The basic reason for the panic is that close was called on a file that
  was no longer valid. The f_count value was -8 for some reason, so it
  passed the following check in filp_close():

          if (!file_count(filp)) {
                  printk(KERN_ERR "VFS: Close: file count is 0\n");
                  return 0;
          }

  It then blew up in au_do_flush() because f_inode was NULL.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1650062/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
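
To illustrate the analysis in comment #12, here is a small userspace sketch (not kernel code) of why a count of -8 gets past a zero-only check, and of what a post-decrement check like the one in the debugging patch would and would not flag. The struct fake_file type and the helpers close_check_rejects() and fput_checked() are made up for this illustration, and C11 atomics stand in for the kernel's atomic_long_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file: only the refcount is modelled. */
struct fake_file {
	atomic_long f_count;
};

/*
 * Models the check quoted from filp_close(): the close is rejected only
 * when the count is exactly zero, so a negative count is not rejected.
 */
static bool close_check_rejects(struct fake_file *f)
{
	return atomic_load(&f->f_count) == 0;
}

/*
 * Models the patched fput(): decrement, then look at the resulting value.
 * *underflow is set when the count went negative, which is the case the
 * WARN_ON(cnt < 0) in the patch is meant to flag.
 * Returns the count after the decrement.
 */
static long fput_checked(struct fake_file *f, bool *underflow)
{
	/* atomic_fetch_sub returns the old value, so subtract 1 again. */
	long cnt = atomic_fetch_sub(&f->f_count, 1) - 1;

	*underflow = cnt < 0;
	return cnt;
}
```

With f_count at -8, close_check_rejects() returns false and fput_checked() reports an underflow at -9. With f_count at 2, a stray fput_checked() call simply yields 1 with no underflow reported, which is the caveat mentioned with the patch: an erroneous put made while the count is still positive looks the same as a legitimate one.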