I haven't found any obvious source in aufs for the extra fputs, which I
suspect are causing this problem. If you could either give me more
information so I can run the same tests myself (I'm guessing the problem
isn't arch-specific) or reproduce the problem with a kernel patched with
something like the following, perhaps we can catch it in the act.
However, we might just catch legitimate fputs instead, since the
erroneous ones could occur while the refcount is still positive...

diff --git a/fs/file_table.c b/fs/file_table.c
index df66450fb443..d4911a6e8331 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -264,7 +264,9 @@ static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
 void fput(struct file *file)
 {
-       if (atomic_long_dec_and_test(&file->f_count)) {
+       long cnt = atomic_long_dec_return(&file->f_count);
+       WARN_ON(cnt < 0);
+       if (cnt == 0) {
                struct task_struct *task = current;
 
                if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
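
The logic of the added check can be tried outside the kernel. The following is a userspace sketch, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic_long_* helpers. The names fake_file, put_orig(), put_patched() and first_warning_step() are hypothetical, and the warned flag stands in for WARN_ON(). It also models the caveat above: if a spurious put runs while the count is still positive, the warning fires later, on a legitimate put.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file: only the refcount matters here. */
struct fake_file {
    atomic_long f_count;
};

/* Pre-patch logic, like atomic_long_dec_and_test(): true only when the
 * new count is exactly 0, so a drop from 0 to -1 passes silently. */
static bool put_orig(struct fake_file *f)
{
    return atomic_fetch_sub(&f->f_count, 1) - 1 == 0;
}

/* Patched logic: compute the new value (like atomic_long_dec_return()),
 * flag it if negative (stand-in for WARN_ON), and release only at 0. */
static bool put_patched(struct fake_file *f, bool *warned)
{
    long cnt = atomic_fetch_sub(&f->f_count, 1) - 1;
    if (cnt < 0)
        *warned = true;
    return cnt == 0;
}

/* Hypothetical scenario: two legitimate references, and a spurious put
 * that runs first while the count is still positive. Returns the 1-based
 * index of the put that triggers the warning, or 0 if none does. */
static int first_warning_step(void)
{
    struct fake_file f;
    atomic_init(&f.f_count, 2);
    for (int step = 1; step <= 3; step++) {
        bool warned = false;
        put_patched(&f, &warned);
        if (warned)
            return step;
    }
    return 0;
}
```

In this scenario first_warning_step() returns 3: the count goes 2, 1, 0 (the file is released while a legitimate user still holds it), and only the third put, which is a legitimate one, drives it to -1 and triggers the warning.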

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1650062

Title:
  Ubuntu16.04.01VM:Docker-Powervm aufs bad file panic while running
  tests in a docker container

Status in linux package in Ubuntu:
  New

Bug description:
  == Comment: #0 - Vinutha GS - 2016-12-13 02:47:35 ==
  When some of the base and IO tests were run inside a docker container, the
  lpar crashed. The stack trace and other details are below.

  Steps to re-create -
  1. Install 16.04.02 on a PowerVM lpar.
  2. Run setup general.
  3. Run the docker scripts (home-grown scripts), which do the docker package
     installation and other setup required to run STAF cases inside a docker
     container.
  4. We have a docker image from which we launch containers and start tests
     inside them. If complete details on how to execute the scripts are
     required, please let me know.
  5. STAF Base and IO tests were started inside the containers successfully;
     after some time, the partition was in XMON.

  Docker info -
  docker info
  Containers: 0
   Running: 0
   Paused: 0
   Stopped: 0
  Images: 0
  Server Version: 1.12.1
  Storage Driver: aufs
   Root Dir: /var/lib/docker/aufs
   Backing Filesystem: extfs
   Dirs: 0
   Dirperm1 Supported: true
  Logging Driver: json-file
  Cgroup Driver: cgroupfs
  Plugins:
   Volume: local
   Network: null host bridge overlay
  Swarm: inactive
  Runtimes: runc
  Default Runtime: runc
  Security Options: apparmor
  Kernel Version: 4.4.0-53-generic
  Operating System: Ubuntu 16.04.1 LTS
  OSType: linux
  Architecture: ppc64le
  CPUs: 24
  Total Memory: 49.89 GiB
  Name: bamlp3
  ID: I7VI:G4RJ:RHTQ:WNGV:52FK:K7AZ:YDJQ:KFUM:P3UA:MZ3I:5XUY:WV3N
  Docker Root Dir: /var/lib/docker
  Debug Mode (client): false
  Debug Mode (server): false
  Registry: https://index.docker.io/v1/
  WARNING: No swap limit support
  Insecure Registries:
   127.0.0.0/8

  docker ps -a
  CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
  61f2b8ab0a86        32d545c3ea01        "/bin/sh -c ./staf_io"   24 minutes ago      Up 24 minutes                           bamlp3-io
  151da0322172        590e44f15214        "/bin/sh -c ./staf_ba"   30 minutes ago      Up 30 minutes                           bamlp3-base

  
  Stack trace -
  8:mon> t
  [c000000a5e147d10] d00000000a04ca98 aufs_flush_nondir+0x38/0x50 [aufs]
  [c000000a5e147d40] c0000000002e0428 filp_close+0x68/0xe0
  [c000000a5e147dc0] c00000000030f71c __close_fd+0xcc/0x150
  [c000000a5e147e00] c0000000002e04d4 SyS_close+0x34/0x90
  [c000000a5e147e30] c000000000009204 system_call+0x38/0xb4
  --- Exception: c00 (System Call) at 00003fff8bc217d8
  SP (3fffd85203b0) is in userspace
  8:mon> e
  cpu 0x8: Vector: 300 (Data Access) at [c000000a5e147a40]
      pc: d00000000a04bdd4: au_do_flush+0x44/0x220 [aufs]
      lr: d00000000a04ca98: aufs_flush_nondir+0x38/0x50 [aufs]
      sp: c000000a5e147cc0
     msr: 8000000000009033
     dar: 28
   dsisr: 40000000
    current = 0xc000000a8b7fc8e0
    paca    = 0xc00000000fb44c00         softe: 0        irq_happened: 0x01
      pid   = 11936, comm = remap_file_page
  8:mon>

  Release details -
  uname -r
  4.4.0-53-generic

  uname -a
  Linux bamlp4 4.4.0-53-generic #74-Ubuntu SMP Fri Dec 2 15:59:36 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  == Comment: #6 - Vinutha GS  - 2016-12-14 03:16:18 ==
  Please find the attached sosreport.
  Also, I have followed the steps for kdump; it is now enabled.
  I'm going to start the tests once again.

  == Comment: #12 - Kevin W. Rudd - 2016-12-14 16:06:46 ==
  The basic reason for the panic is that close was called on a file
  that was no longer valid.  The f_count value was -8 for some reason,
  so it passed the following check in filp_close():

          if (!file_count(filp)) {
                  printk(KERN_ERR "VFS: Close: file count is 0\n");
                  return 0;
          }

  It then blew up in au_do_flush() because f_inode was NULL.
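
  A count of -8 slips through because the quoted test only rejects exactly
  zero. The following is a minimal sketch in plain C. zero_check_rejects() mirrors the
  !file_count(filp) test, and strict_check_rejects() is a hypothetical stricter
  variant, not something in the kernel source quoted here. A stricter check
  would only turn the crash into an error message; it would not find the
  extra fput.

```c
#include <stdbool.h>

/* Mirrors the quoted filp_close() test: only a count of exactly 0 is
 * rejected, so any negative (underflowed) count is treated as valid. */
static bool zero_check_rejects(long f_count)
{
    return !f_count;
}

/* Hypothetical stricter test that would also reject an underflowed count. */
static bool strict_check_rejects(long f_count)
{
    return f_count <= 0;
}
```

  With f_count == -8, zero_check_rejects() returns false, matching the
  behaviour described above: filp_close() carries on and reaches
  au_do_flush() with the invalid file.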

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1650062/+subscriptions