https://bz.mercurial-scm.org/show_bug.cgi?id=5935
Bug ID: 5935
Summary: Cannot push immediately after commit, but one second later it works
Product: Mercurial
Version: 4.0
Hardware: PC
OS: Linux
Status: UNCONFIRMED
Severity: bug
Priority: wish
Component: Mercurial
Assignee: bugzi...@mercurial-scm.org
Reporter: nicolas.barb...@gmail.com
CC: mercurial-devel@mercurial-scm.org

Hey Mercurial developers,

We think we have stumbled on a very confusing race condition, probably
related to the handling of hardlinks that are used when cloning locally.

Situation:

* We have some "shared by using setgid" directories: two users belong to
  the same group, both the original repo and the clone are in directories
  that have that common group as their group, the setgid bit is set, and
  the users have a umask of 002. This way, both users can
  read/write/delete/etc. files and directories in this whole directory
  hierarchy.
* We have a script that commits something (a change in a single file) and
  then immediately pushes that commit.
* The push is over the filesystem, from some other directory (not in /var)
  to /var/local/hg/XXX/YYY on the same filesystem (so hardlinking is
  possible).

Problem:

* This push fails the first time (right after we cloned the repo from
  /var/local/hg/XXX/YYY) as follows. We use set -x, so the commands have a
  "+" in front of them:

    + umask 0002
    + id
    uid=1015(siemen) gid=1016(siemen) groups=1016(siemen),1013(XXX)
    + ls -l /var/local/hg/XXX/YYY/.hg/store/00changelog.i
    -rw-rw-r-- 2 itsme XXX 18760 Jun 8 15:48 /var/local/hg/XXX/YYY/.hg/store/00changelog.i
    + hg commit -u 'Release <rele...@zzz.com>' -m 'Changed artifact version to QQQ.'
    + ls -l /var/local/hg/XXX/YYY/.hg/store/00changelog.i
    -rw-rw-r-- 1 itsme XXX 18760 Jun 8 15:48 /var/local/hg/XXX/YYY/.hg/store/00changelog.i
    + hg push /var/local/hg/XXX/YYY
    pushing to /var/local/hg/XXX/YYY
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files
    transaction abort!
    rollback failed - please run hg recover
    abort: Operation not permitted: '/var/local/hg/XXX/YYY/.hg/store/00changelog.i'

Further analysis:

* It seems that Mercurial is trying to do something with the file
  /var/local/hg/XXX/YYY/.hg/store/00changelog.i for which it doesn't have
  permission (e.g., change the ownership or permissions), because it is
  running as "siemen" while the file is owned by "itsme". It is not clear
  to me what it is trying to do, so improving the error message to include
  some context would be great.
* We had quite a few such repositories that all had the exact same
  problem, so we could try multiple times.
* When we put a "read A" between the commit and the push, and waited about
  a second before pressing enter, the push reproducibly succeeded, while
  before we added that wait, it reproducibly failed. Waiting only a
  fraction of a second was not enough; it had to be about a second or so.
* Before, we did the pushing over SSH (to another server). We only started
  having the problem once we switched to filesystem-level clones.
* After we either recovered the target repo and re-pushed manually, or
  after the push succeeded because of the wait, we didn't have the problem
  anymore. The difference seems to be that 00changelog.i is not hardlinked
  anymore, but replaced with a copy.

My guess:

It seems that there is some kind of race condition in the test that tries
to determine whether a file is still hardlinked or not. That race
condition seems to be between two different processes that run in sequence
(the push only starts after the commit has already ended). It seems that
00changelog.i in the target repo still looks hardlinked to the pushing
process, even though the previous commit should have made a copy. Might it
be that the kernel keeps the file descriptor until a bit after the process
that had it open has already ended?
And that an open file descriptor increases the hardlink count, because it
should prevent the file from being removed physically?

I assume that this problem might not have been detected before because it
requires the combination of the "directory sharing through setgid", locally
cloned repositories, and a first push that happens right (less than a
second) after a commit.

Sorry for not having more time to investigate this better. I hope that this
rings a bell for someone who knows the Mercurial code.

Versions:

Mercurial 4.0 (as packaged by Debian)
Debian 9.4
Linux 4.9.0-6-amd64
Filesystem: ext4

Greetings,

Nicolas
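
The guess about open file descriptors and the hardlink count can be checked
outside Mercurial. The following is a minimal sketch, not Mercurial's code:
the file names origin.i and clone.i and the helpers nlink, break_hardlink and
demo are hypothetical, and the copy-then-rename step is only an assumption
about how a hardlink might be broken. It records st_nlink (the number that
`ls -l` prints in its second column, 2 and then 1 in the transcript) after
linking, while a descriptor is open, and after one side is replaced by a copy.

```python
import os
import shutil
import tempfile


def nlink(path):
    """Return the hardlink count (st_nlink) of path."""
    return os.stat(path).st_nlink


def break_hardlink(path):
    """Replace path with a private copy so it stops sharing an inode.

    Generic copy-then-rename sketch (an assumption, not Mercurial's code).
    """
    directory = os.path.dirname(path)
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    shutil.copyfile(path, tmp)   # copy the contents into a fresh inode
    os.replace(tmp, path)        # rename the copy over the old name


def demo():
    """Return the link counts observed at each step."""
    workdir = tempfile.mkdtemp()
    try:
        # Hypothetical stand-ins for the target repo's and the clone's file.
        origin = os.path.join(workdir, "origin.i")
        clone = os.path.join(workdir, "clone.i")
        with open(origin, "wb") as f:
            f.write(b"revlog data")
        os.link(origin, clone)   # what a local clone does when it hardlinks
        counts = {"after_link": nlink(origin)}

        # Hold a descriptor open and look at the link count again.
        with open(clone, "rb"):
            counts["while_open"] = nlink(origin)

        # Break the link on the clone side, as a write to the clone would.
        break_hardlink(clone)
        counts["origin_after_break"] = nlink(origin)
        counts["clone_after_break"] = nlink(clone)
        return counts
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(demo())
```

On a POSIX filesystem st_nlink counts directory entries only, so the
"while_open" value should match "after_link"; if it does on the affected
machine as well, an open descriptor is unlikely to explain why the file still
looks hardlinked to the pushing process.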