On Mon, 31 Mar 2014, sf...@users.sourceforge.net wrote:

> According to this netconsole.log,
> - ln:1567 enters aufs, and aufs tries copying-up a file by calling 
> kworker/5:1:73.
> - kworker/5:1:73 tries opening a file on afs and gets stuck.
> - rs:main Q:Reg:1571 is also running and issues access(2), enters aufs
>  and gets stuck.
>
> What is the process "rs:main Q:Reg" whose pid is 1571?
> And which file is it going to access(2)?

In this case, Debian's 'savelog' command tried to rename a logfile which 
existed only in the ro branch (in AFS).

I tried to create a minimal example:
http://users.minet.uni-jena.de/~erik/aufs/mv.log

In the last line, the 'mv' command stalls.

The corresponding kernel messages are in
http://users.minet.uni-jena.de/~erik/aufs/netconsole.log

> How do you mount aufs and openafs?

Like the old nfsroot client, this is an OpenAFS-rooted client. The 
directories are bind mounts, e.g.:

mount -v -o bind -r afs/mirz/linuxclients/linuxpool-jessie64/var /var
…

The aufs mount on top of this directory is as follows (/run/shm/aufs/var is 
just an example of an empty directory; usually it is the local disk):

mount -t aufs -o br:/run/shm/aufs/var=rw:/afs/mirz/linuxclients/linuxpool-jessie64/var=ro none /var
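
To make the branch layout explicit, the option string can be assembled from 
the two directories. This is only a sketch: the helper name aufs_br_opt is 
made up for illustration, and the mount itself is just echoed because it 
needs root.

```shell
# Hypothetical helper: build the aufs "br:" option from a writable branch
# and a read-only branch, in the same order as in the mount command above.
aufs_br_opt() {
    rw_branch=$1
    ro_branch=$2
    printf 'br:%s=rw:%s=ro\n' "$rw_branch" "$ro_branch"
}

# Print the resulting mount command instead of running it (mount needs root).
opt=$(aufs_br_opt /run/shm/aufs/var /afs/mirz/linuxclients/linuxpool-jessie64/var)
echo "mount -t aufs -o $opt none /var"
```

The first branch is the writable one, so a copy-up of a file that exists 
only in the AFS branch ends up in /run/shm/aufs/var.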

> If you make something like recursive mount, ie. mount aufs under
> openafs, then your system may not work since it looks like that openafs
> has a global mutex (based upon just my current guess).

That is bad news, because this setup has worked for many years.

> ----------------------------------------
> - /proc/mounts (instead of the output of mount(8))
> - /sys/module/aufs/*
> - /sys/fs/aufs/* (if you have them)
> - /debug/aufs/* (if you have them)
> - linux kernel version
>  if your kernel is not plain, for example modified by distributor,
>  the url where i can download its source is necessary too.
> - aufs version which was printed at loading the module or booting the
>  system, instead of the date you downloaded.
> - configuration (define/undefine CONFIG_AUFS_xxx)
> - kernel configuration or /proc/config.gz (if you have it)
> ----------------------------------------

You can find all of these files (including 600 MB of unpacked kernel source 
code) and outputs at http://users.minet.uni-jena.de/~erik/aufs/

Printing the branches in /sys/fs/aufs/ stalls once the above-mentioned 
'mv' command has been entered.

>In order to make sure this, if you enable CONFIG_MAGIC_SYSRQ, try
>SysRq+d and SysRq+w after the deadlock.
>If the global mutex in openafs is related to this problem, then kernel
>prints "afs_global_lock" somewhere in the message.

Unfortunately there seems to be a bug in Debian's current OpenAFS release, 
which is why OpenAFS did not compile with CONFIG_LOCKDEP (a bug report has 
been filed). The output of SysRq+w is in netconsole.log. The 
afs_global_lock does not appear in it.
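
For reference, the same dumps can also be requested without a keyboard by 
writing the key letter to /proc/sysrq-trigger. A minimal sketch, assuming 
root and CONFIG_MAGIC_SYSRQ; the function name sysrq_dump and the 
SYSRQ_TRIGGER override are made up for illustration:

```shell
# Hypothetical helper: send one or more SysRq keys via the trigger file.
# SYSRQ_TRIGGER can point to another file for a dry run; the default is the
# real /proc/sysrq-trigger, which is writable by root only.
sysrq_dump() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    for key in "$@"; do
        # One key letter per write, as the kernel handles them one at a time.
        printf '%s\n' "$key" > "$trigger" || return 1
    done
}

# After the deadlock (as root):
#   sysrq_dump w    # blocked (uninterruptible) tasks
#   sysrq_dump d    # held locks; only available with lock debugging built in
```

The output goes to the kernel log, so it shows up in netconsole.log just 
like the keyboard variant.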

Thank you, Erik


