Kirill Kolyshkin:
> Can you tell me what is your reason to not retry reading by default? The
> code has just checked that this is an aufs mount so it should definitely
> be present in /proc/mounts. Unless, of course, this mount was unmounted
> by someone in between statfs() and reading. If you have this exact case
> in mind (I can't think of anything else) and don't want to retry because
> of efficiency, you can add another statfs() to after reading /proc/mounts
> and not finding the mount -- that way you can be sure that the mount is
> still there but it eluded the /proc/mounts.
Yes, that race is what I had in mind.
In other words, it is hard to identify the reason why /proc/mounts
doesn't show the entry. Is it a problem in /proc/mounts, or did someone
else unmount it? Additionally, I guess (hope) such parallel
mounts/unmounts are rare. And I wonder whether "2" is absolutely the
right number. Can "3" never happen? Never?
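
To make the ambiguity concrete, here is a minimal user-space sketch of a
bounded re-read of the mount table. It is only an illustration, not the
code under discussion: the helper names mount_listed() and
mount_listed_retry() and the max_reads parameter are hypothetical, and
the table is read with the glibc getmntent(3) family.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `table` lists `dir` as a mount point
 * of type `fstype`, 0 if it does not, -1 if the table cannot be opened.
 */
int mount_listed(const char *table, const char *dir, const char *fstype)
{
	struct mntent *ent;
	int found = 0;
	FILE *fp = setmntent(table, "r");

	if (!fp)
		return -1;
	while ((ent = getmntent(fp)) != NULL) {
		if (!strcmp(ent->mnt_dir, dir) &&
		    !strcmp(ent->mnt_type, fstype)) {
			found = 1;
			break;
		}
	}
	endmntent(fp);
	return found;
}

/*
 * Hypothetical helper: re-read the table up to `max_reads` times and
 * stop at the first read that finds the entry or fails.
 */
int mount_listed_retry(const char *table, const char *dir,
		       const char *fstype, int max_reads)
{
	int i, ret = 0;

	for (i = 0; i < max_reads; i++) {
		ret = mount_listed(table, dir, fstype);
		if (ret != 0)
			break;
	}
	return ret;
}
```

A caller would pass "/proc/mounts" as the table. Note that a final result
of 0 still cannot tell "someone unmounted it" apart from "the read missed
it", whatever value is chosen for max_reads.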
I am afraid the statfs() you suggest won't help. Even if it tells us
that the dir is on aufs, that is not proof that the dir is an aufs mount
point. It can be a subdir of another aufs mount.
An extra stat(2) call may help on this point. It will tell us the inode
number, and if it is AUFS_ROOT_INO, then that path is the aufs mount
point. But I wonder, do we really have to issue both stat(2) and
statfs(2) just to make sure the aufs mount is still there? Isn't that
rather heavy and racy?
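
For illustration, the combined check could be sketched in user space as
below. The helper name is_aufs_mountpoint() is hypothetical, and the
values of AUFS_SUPER_MAGIC and AUFS_ROOT_INO are written from my reading
of aufs_type.h, so treat them as assumptions and verify them against the
header you build with.

```c
#include <sys/stat.h>
#include <sys/vfs.h>

/* Assumed values, to be checked against aufs_type.h. */
#define AUFS_SUPER_MAGIC 0x61756673 /* 'a' 'u' 'f' 's' */
#define AUFS_ROOT_INO    2

/*
 * Hypothetical helper: returns 1 if `path` is the root of an aufs mount,
 * 0 if it is not, -1 if either syscall fails.
 */
int is_aufs_mountpoint(const char *path)
{
	struct statfs fs;
	struct stat st;

	/* statfs(2) only says the path lives on aufs; a subdir matches too. */
	if (statfs(path, &fs) < 0)
		return -1;
	if (fs.f_type != AUFS_SUPER_MAGIC)
		return 0;

	/* stat(2) tells a mount root from a subdir by the inode number. */
	if (stat(path, &st) < 0)
		return -1;
	return st.st_ino == AUFS_ROOT_INO;
}
```

The two syscalls are not atomic, so the mount can still change between
them. The sketch shows the cost of the check, and it does not remove the
race.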
> I have also took a deeper look at that other error I mentioned earlier.
> Found out it's a race in au_xino_create(). In case xino mount option is
> provided (we use xino=/dev/shm/aufs.xino in Docker), and multiple mounts
> (note: for different mount points) are performed in parallel, one mount
> can reach
>
> > file = vfsub_filp_open(fpath, O_RDWR | O_CREAT | O_EXCL | O_LARGEFILE, 0666);
>
> line of code, while another mount already created that file, but haven't
> unlinked it yet.
>
> As a result, we have an error like these in the kernel log:
>
> [2233986.956753] aufs au_xino_create:767:dockerd[17144]: open /dev/shm/aufs.xino(-17)
> [2233988.732636] aufs au_xino_create:767:dockerd[17518]: open /dev/shm/aufs.xino(-17)
Thank you very much for the report.
Here -17 means EEXIST, the "File exists" error.
This is expected behaviour (and I am glad to know it is working as
expected). As you might know, the default location of the XINO file is
the top dir of the first writable branch, and a writable branch cannot
be shared between multiple aufs mounts. So by default each XINO file is
dedicated to a single aufs mount. It is not shared, so no conflict
happens.
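
The same error can be reproduced in user space, which may help when
explaining the log lines. The sketch below is only an analogue of the
quoted vfsub_filp_open() call, not aufs code: create_exclusive() is a
hypothetical helper, and O_LARGEFILE is left out to keep the example
portable.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` exclusively. Returns 0 on success
 * or a negative errno value, in the same style as the kernel log.
 */
int create_exclusive(const char *path)
{
	/* O_CREAT | O_EXCL fails when the file already exists. */
	int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0666);

	if (fd < 0)
		return -errno;
	close(fd);
	return 0;
}
```

Calling it twice on the same path without an unlink in between returns 0
the first time and -EEXIST the second time, which is the window the two
parallel mounts hit on the shared path.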
> Currently, I am working around this unfortunate issue by calling mount(2)
> under an exclusive lock, to make sure no two aufs mounts (again, for
> different mount points) are performed in parallel, but perhaps there is
> a better way?
>
> I am going to mitigate this race by adding a random suffix to xino file
> name; do you think it is a decent workaround?
If your first writable branch is somewhere on /dev/shm, then you can
remove the "xino=" option. In that case, the XINO files will be created
under /dev/shm and will not be shared. Moreover, the "xino=" option is
generally something like a last resort. If the filesystem of your first
writable branch doesn't support XINO, or you want a little gain in the
aufs internal XINO handling, you may want "xino=". Otherwise you can
omit it.
Of course, adding a random/unique suffix is a good idea. If I were you,
I'd use $$ as in a shell script, like
mount -t aufs -obr=...,xino=/dev/shm/aufs.xino.$$ ...
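
When the mounts are issued from a program rather than from a shell, a
similar suffix can be built by hand. The sketch below is an assumption
about how one might do it, not a tested recipe: xino_option() and its
seq parameter are hypothetical. In a shell, $$ is the PID of the shell,
so separate shells get separate names. Threads of one process all share
a PID, so this sketch also appends a caller-supplied sequence number.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: format "xino=<base>.<pid>.<seq>" into `buf`.
 * Returns 0 on success, -1 if the result does not fit in `len` bytes.
 */
int xino_option(char *buf, size_t len, const char *base, unsigned long seq)
{
	int n = snprintf(buf, len, "xino=%s.%ld.%lu",
			 base, (long)getpid(), seq);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be joined with the other options (br=... and
so on) before being handed to mount(2). The caller is responsible for
making seq unique within the process, for example with a counter.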
J. R. Okajima