Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Mon, Oct 31, 2016 at 10:55:32AM -0700, Junio C Hamano wrote: > > So I guess it's possible that it produces a noticeable effect in some > > cases, but I'm still somewhat doubtful. And actually repacking your > > repository had a greater effect in every case I measured (in addition to > > providing other speedups). > > Let's keep doubting. I prefer one-step-at-a-time approach to > things anyway, and what I plan in the near term are: > > * use the "open() with O_NOATIME|O_CLOEXEC, gradually losing the >bits during fallback" approach in the ls/git-open-cloexec topic, >in order to help ls/filter-process topic be part of the upcoming >release; > > * simplify the logic to the "open(2) with O_CLOEXEC, set O_NOATIME >with fcntl(2)" in jc/git-open-cloexec~1 after 2.11 ships; > > * cook "drop the latter half of setting O_NOATIME" which is at the >tip of jc/git-open-cloexec in 'next', and while Linus is looking >the other way ^W^W^W^W^W^W^W after people had chance to complain >with numbers, merge it to a future release iff it still looked OK >to drop O_NOATIME thing. Great, that sounds like a good way to proceed (and if the final step never happens, no big deal). -Peff
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Jeff King writes: > If you set a probe on touch_atime() in the kernel (which is called for > every attempt to smudge the atime, regardless of mount options, but is > skipped when the descriptor was opened with O_NOATIME), you can see the > impact. Here's a command I picked because it reads a lot of objects (run > on my git.git clone): > > $ perf stat -e probe:touch-atime git log -Sfoo >/dev/null > > And the probe:touch_atime counts before (stock git) and after (a patch > to drop O_NOATIME): > > before: 22,235 >after: 22,362 > > So that's only half a percent difference. And it's on a reasonably messy > clone that is partway to triggering an auto-repack: > ... > So I guess it's possible that it produces a noticeable effect in some > cases, but I'm still somewhat doubtful. And actually repacking your > repository had a greater effect in every case I measured (in addition to > providing other speedups). Let's keep doubting. I prefer one-step-at-a-time approach to things anyway, and what I plan in the near term are: * use the "open() with O_NOATIME|O_CLOEXEC, gradually losing the bits during fallback" approach in the ls/git-open-cloexec topic, in order to help ls/filter-process topic be part of the upcoming release; * simplify the logic to the "open(2) with O_CLOEXEC, set O_NOATIME with fcntl(2)" in jc/git-open-cloexec~1 after 2.11 ships; * cook "drop the latter half of setting O_NOATIME" which is at the tip of jc/git-open-cloexec in 'next', and while Linus is looking the other way ^W^W^W^W^W^W^W after people had chance to complain with numbers, merge it to a future release iff it still looked OK to drop O_NOATIME thing.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > On Sat, Oct 29, 2016 at 1:25 AM, Johannes Schindelin > wrote: >> >> Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on >> Windows, but we can ask open() to open said file handle with the correct >> flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows. > > Ok. So then I have no issues with it, and let's use O_CLOEXEC if it > exists and fcntl(FD_CLOEXEC) if O_CLOEXEC doesn't exist. I am not sure if the fallback to FD_CLOEXEC is worth doing, because it does not exist on Windows, where leaking an open file descriptor to a child process matters more than other platforms [*1*]. Of course, the world is not all Windows, and there may be other platforms this fallback may help. But there is a chance that some other thread may fork between the time we open(2) and the time we call fcntl(2), leaving FD_CLOEXEC ineffective and unreliable, so that's another thing that makes me doubt if FD_CLOEXEC fallback is worth doing. Having said that, here is a rewrite of [*2*] to use the fallback would look like. After this topic settles, we may want to also update the tempfile.c::create_tempfile() implementation to use the helper function we use here. [Footnotes] *1* The call in ce_compare_data() is to see if the contents in the working tree file match what is in the index. Perhaps the program is "git checkout another-branch" that knows this path is absent in the branch being switched to and trying to make sure we lose no local modification. When the path needs to be passed through the clean/smudge filter before hashing to see if the working tree contents hashes to the object in the index, we need to fork and then after the filter finishes its processing, we close the file descriptor and then unlink(2) the path if it is clean. We are about to add a variant of clean/smudge mechanism where a lazily spawned process can clean contents of multiple paths and that is where cloexec starts to matter. This function starts to inspect one path, opens a fd to it, finds that it needs filtering, spawns a long-lingering process, while leaking the original fd to it. After feeding the contents via pipe and then receiving the filtered contents (the protocol is not full duplex and there won't be a stuck pipe), the parent would decide that the path is clean and attempts to unlink(2). Windows does not let you unlink a path with open filedescriptor held, causing "checkout" to fail. As it is only one fd that leaks to the child, no matter how many subsequent paths are filtered by the same long-lingering child process, it will be a far less important issue on other platforms that lets you unlink(2) such a file. *2* http://public-inbox.org/git/xmqqh97w38gj@gitster.mtv.corp.google.com --- cache.h | 1 + git-compat-util.h | 4 read-cache.c | 9 + sha1_file.c | 47 --- 4 files changed, 34 insertions(+), 27 deletions(-) diff --git a/cache.h b/cache.h index a902ca1f8e..6f1c21a352 100644 --- a/cache.h +++ b/cache.h @@ -1122,6 +1122,7 @@ extern int write_sha1_file(const void *buf, unsigned long len, const char *type, extern int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type, unsigned char *sha1, unsigned flags); extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *); extern int force_object_loose(const unsigned char *sha1, time_t mtime); +extern int git_open_cloexec(const char *name, int flags); extern int git_open(const char *name); extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size); extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz); diff --git a/git-compat-util.h b/git-compat-util.h index 43718dabae..64407c701c 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -683,6 +683,10 @@ char *gitstrdup(const char *s); #define O_CLOEXEC 0 #endif +#ifndef FD_CLOEXEC +#define FD_CLOEXEC 0 +#endif + #ifdef FREAD_READS_DIRECTORIES #ifdef fopen #undef fopen diff --git a/read-cache.c b/read-cache.c index db5d910642..c27d3e240b 100644 --- a/read-cache.c +++ b/read-cache.c @@ -156,14 +156,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) static int ce_compare_data(const struct cache_entry *ce, struct stat *st) { int match = -1; - static int cloexec = O_CLOEXEC; - int fd = open(ce->name, O_RDONLY | cloexec); - - if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) { - /* Try again w/o O_CLOEXEC: the kernel might not support it */ - cloexec &= ~O_CLOEXEC; - fd = open(ce->name, O_RDONLY | cloexec); - } + int fd = git_open_cloexec(ce->name, O_RDONLY); if (fd >= 0) { unsigned char sha1[20]; diff --git a/sha1_file.c
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Fri, Oct 28, 2016 at 09:13:41AM -0700, Linus Torvalds wrote: > On Fri, Oct 28, 2016 at 4:11 AM, Johannes Schindelin > wrote: > > > > You guys. I mean: You guys! You sure make my life hard. A brief look at > > mingw.h could have answered your implicit question: > > So here's what you guys should do: > > - leave O_NOATIME damn well alone. It works. It has worked for 10+ > years. Stop arguing against it, people who do. For some definition of worked, perhaps. If you set a probe on touch_atime() in the kernel (which is called for every attempt to smudge the atime, regardless of mount options, but is skipped when the descriptor was opened with O_NOATIME), you can see the impact. Here's a command I picked because it reads a lot of objects (run on my git.git clone): $ perf stat -e probe:touch-atime git log -Sfoo >/dev/null And the probe:touch_atime counts before (stock git) and after (a patch to drop O_NOATIME): before: 22,235 after: 22,362 So that's only half a percent difference. And it's on a reasonably messy clone that is partway to triggering an auto-repack: $ git count-objects -v count: 6167 size: 61128 in-pack: 275773 packs: 18 size-pack: 86857 prune-packable: 25 garbage: 0 size-garbage: 0 Running "git gc" drops the probe count to 21,733. It makes a bigger difference for some commands (it's more like 10% for git-status). And smaller for others ("git log -p" triggers it over 100,000 times). One thing missing in that count is how many of those calls would have resulted in an actual disk write. Looking at strace, most of the filesystem activity is opening .gitattributes files, and we end up opening the same ones repeatedly (e.g., t/.gitattributes in git.git). Multiple hits for a given inode in the same second get coalesced into at most a single disk write. So I guess it's possible that it produces a noticeable effect in some cases, but I'm still somewhat doubtful. And actually repacking your repository had a greater effect in every case I measured (in addition to providing other speedups). Like I said, I'm OK keeping O_NOATIME. It's just not that much code. But if you really care about the issue of dirtying inodes via atime, you should look into vastly increasing our use of O_NOATIME. Or possibly looking at caching more in the attribute code. -Peff
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Sat, Oct 29, 2016 at 1:25 AM, Johannes Schindelin wrote: > > Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on > Windows, but we can ask open() to open said file handle with the correct > flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows. Ok. So then I have no issues with it, and let's use O_CLOEXEC if it exists and fcntl(FD_CLOEXEC) if O_CLOEXEC doesn't exist. Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Hi, On Fri, 28 Oct 2016, Junio C Hamano wrote: > Linus Torvalds writes: > > > Apparently windows doesn't even support it, at least not mingw > > Assuming that the above was a misunderstanding, assuming that we can > do O_CLOEXEC (but not FD_CLOEXEC) on Windows just fine, where stray > file descriptors held open in the children matter more, and ... Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on Windows, but we can ask open() to open said file handle with the correct flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows. And for the record: I agree with Junio that we cannot simply drop this O_CLOEXEC business. Because it was introduced to fix bugs. Thank you for your time, Johannes
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > Apparently windows doesn't even support it, at least not mingw Assuming that the above was a misunderstanding, assuming that we can do O_CLOEXEC (but not FD_CLOEXEC) on Windows just fine, where stray file descriptors held open in the children matter more, and ... > In contrast, O_NOATIME isn't a maintenance problem, since it's purely > an optimization and has no semantic difference, so it not existing on > some platform is immaterial. ... assuming that I didn't screw up my use of fcntl() to set O_NOATIME on a fd opened with O_CLOEXEC, are you OK with the approach in patch *1*? We can drop *2* to keep O_NOATIME, which has been working for those with old kernels with many loose objects, and it is not too much code to keep. *1* http://public-inbox.org/git/xmqqh97w38gj@gitster.mtv.corp.google.com *2* http://public-inbox.org/git/xmqqd1ik38f4@gitster.mtv.corp.google.com
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > On Fri, Oct 28, 2016 at 9:48 AM, Junio C Hamano wrote: >> >> Setting FD_CLOEXEC with fcntl(2) may be racy, but in what way >> it is broken to open(2) with O_CLOEXEC? > > Apparently windows doesn't even support it, at least not mingw. So > you'll have to have some supprt for just explicitly closing things at > execve anyway. And if you do that for windows, then what's the point > of using O_CLOEXEC on Linux, and having two totally different paths > that will just mean maintenance headache. Ah, that's where your reaction comes from. If I understood Dscho correctly, what he said was that I cannot set FD_CLOEXEC via fcntl() emulation they have. It wasn't an objection to O_CLOEXEC. In fact, since 05d1ed6148 ("mingw: ensure temporary file handles are not inherited by child processes", 2016-08-22), we have been relying on open(2) with O_CLOEXEC even on Windows (by emulating it with O_NOINHERIT, which Windows has) on some codepaths.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Fri, Oct 28, 2016 at 9:48 AM, Junio C Hamano wrote: > > Excuse me, but care to elaborate a bit more? It just seems entirely pointless to change the games with O_xyz. > Setting FD_CLOEXEC with fcntl(2) may be racy, but in what way > it is broken to open(2) with O_CLOEXEC? Apparently windows doesn't even support it, at least not mingw. So you'll have to have some supprt for just explicitly closing things at execve anyway. And if you do that for windows, then what's the point of using O_CLOEXEC on Linux, and having two totally different paths that will just mean maintenance headache. In contrast, O_NOATIME isn't a maintenance problem, since it's purely an optimization and has no semantic difference, so it not existing on some platform is immaterial. End result: leave O_NOATIME as it is (working), and just forget about O_CLOEXEC. I have no actual idea about mingw support for this, but Johannes' email certainly implies it's painful on Windows. Some googling also shows people doing things like #ifdef O_CLOEXEC flags |= O_CLOEXEC; #endif or #ifndef O_CLOEXEC #define O_CLOEXEC 0 #endif so O_CLOEXEC definitely isn't considered portable. Do you really want to rely on it? Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > On Fri, Oct 28, 2016 at 4:11 AM, Johannes Schindelin > wrote: >> >> You guys. I mean: You guys! You sure make my life hard. A brief look at >> mingw.h could have answered your implicit question: > > So here's what you guys should do: > > - leave O_NOATIME damn well alone. It works. It has worked for 10+ > years. Stop arguing against it, people who do. > > - get rid of all O_CLOEXEC games. They don't work. If you want to > close file descriptors at execve(), you - gasp - close the file > descriptor before doing an execve. > > So O_CLOEXEC or FD_CLOEXEC is broken. > > DO NOT BREAK O_NOATIME JUST TO ADD COMPLETELY NEW BREAKAGE. Excuse me, but care to elaborate a bit more? Did I botch the way I ask fcntl(2) to set O_NOATIME in *1*? Its follow-up in *2* is optional and as peff said in *3*, I do not think we mind dropping that step and keeping O_NOATIME. It is not that much code to keep. Setting FD_CLOEXEC with fcntl(2) may be racy, but in what way it is broken to open(2) with O_CLOEXEC? If we want to close file descriptors we opened at execve() time ourselves, we'd need to somehow keep track of which file descriptors are meant to be closed upon execve() time, and one way to mark that may be to use O_CLOEXEC, but once we mark a file descriptor with the bit, execve() would take care of "closing" it for us---wouldn't that be the whole point of that bit? *1* http://public-inbox.org/git/xmqqh97w38gj@gitster.mtv.corp.google.com *2* http://public-inbox.org/git/xmqqd1ik38f4@gitster.mtv.corp.google.com *3* http://public-inbox.org/git/20161028075104.la24zydnr3ogb...@sigill.intra.peff.net
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Fri, Oct 28, 2016 at 4:11 AM, Johannes Schindelin wrote: > > You guys. I mean: You guys! You sure make my life hard. A brief look at > mingw.h could have answered your implicit question: So here's what you guys should do: - leave O_NOATIME damn well alone. It works. It has worked for 10+ years. Stop arguing against it, people who do. - get rid of all O_CLOEXEC games. They don't work. If you want to close file descriptors at execve(), you - gasp - close the file descriptor before doing an execve. So O_CLOEXEC or FD_CLOEXEC is broken. DO NOT BREAK O_NOATIME JUST TO ADD COMPLETELY NEW BREAKAGE. Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Junio C Hamano writes: > Junio C Hamano writes: > >> Hmph. This may not fly well in practice, though. > > We can flip the order around, and then it becomes simpler to later > drop atime support. The first step goes like this. And the second step would be like this. -- >8 -- Subject: [PATCH] sha1_file: stop opening files with O_NOATIME When we open object files, we try to do so with O_NOATIME. This dates back to 144bde78e9 (Use O_NOATIME when opening the sha1 files., 2005-04-23), which is an optimization to avoid creating a bunch of dirty inodes when we're accessing many objects. But a few things have changed since then: 1. In June 2005, git learned about packfiles, which means we would do a lot fewer atime updates (rather than one per object access, we'd generally get one per packfile). 2. In late 2006, Linux learned about "relatime", which is generally the default on modern installs. So performance around atimes updates is a non-issue there these days. All the world isn't Linux, but as it turns out, Linux is the only platform to implement O_NOATIME in the first place. So it's very unlikely that this code is helping anybody these days. Helped-by: Jeff King [jc: took idea and log message from peff] Signed-off-by: Junio C Hamano --- cache.h | 2 +- sha1_file.c | 21 - 2 files changed, 1 insertion(+), 22 deletions(-) diff --git a/cache.h b/cache.h index 6f1c21a352..f440d3fd1e 100644 --- a/cache.h +++ b/cache.h @@ -1123,7 +1123,7 @@ extern int hash_sha1_file_literally(const void *buf, unsigned long len, const ch extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *); extern int force_object_loose(const unsigned char *sha1, time_t mtime); extern int git_open_cloexec(const char *name, int flags); -extern int git_open(const char *name); +#define git_open(name) git_open_cloexec(name, O_RDONLY) extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size); extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz); extern int parse_sha1_header(const char *hdr, unsigned long *sizep); diff --git a/sha1_file.c b/sha1_file.c index dfbf398183..25d9359402 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -27,14 +27,6 @@ #include "list.h" #include "mergesort.h" -#ifndef O_NOATIME -#if defined(__linux__) && (defined(__i386__) || defined(__PPC__)) -#define O_NOATIME 0100 -#else -#define O_NOATIME 0 -#endif -#endif - #define SZ_FMT PRIuMAX static inline uintmax_t sz_fmt(size_t s) { return s; } @@ -1572,19 +1564,6 @@ int git_open_cloexec(const char *name, int flags) return fd; } -int git_open(const char *name) -{ - static int noatime = O_NOATIME; - int fd = git_open_cloexec(name, O_RDONLY); - - if (0 <= fd && (noatime & O_NOATIME)) { - int flags = fcntl(fd, F_GETFL); - if (fcntl(fd, F_SETFL, flags | noatime)) - noatime = 0; - } - return fd; -} - static int stat_sha1_file(const unsigned char *sha1, struct stat *st) { struct alternate_object_database *alt; -- 2.10.1-791-g404733b9cf
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Junio C Hamano writes: > Hmph. This may not fly well in practice, though. We can flip the order around, and then it becomes simpler to later drop atime support. The first step goes like this. -- >8 -- From: Junio C Hamano Date: Fri, 28 Oct 2016 06:23:07 -0700 Subject: [PATCH] git_open(): untangle possible NOATIME and CLOEXEC interactions The way we structured the fallback/retry mechanism for opening with O_NOATIME and O_CLOEXEC meant that if we failed due to lack of support to open the file with O_NOATIME option (i.e. EINVAL), we would still try to drop O_CLOEXEC first and retry, and then drop O_NOATIME. A platform on which O_NOATIME is defined in the header without support from the kernel wouldn't have a chance to open with O_CLOEXEC option due to this code structure. Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter is mostly about performance, while the former can affect correctness. Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set O_NOATIME on the resulting file descriptor. open(2) itself does not cause atime to be updated according to Linus [*1*]. The helper to do the former can be usable in the codepath in ce_compare_data() that was recently added to open a file descriptor with O_CLOEXEC; use it while we are at it. *1* Signed-off-by: Junio C Hamano --- cache.h | 1 + read-cache.c | 9 + sha1_file.c | 39 +++ 3 files changed, 21 insertions(+), 28 deletions(-) diff --git a/cache.h b/cache.h index a902ca1f8e..6f1c21a352 100644 --- a/cache.h +++ b/cache.h @@ -1122,6 +1122,7 @@ extern int write_sha1_file(const void *buf, unsigned long len, const char *type, extern int hash_sha1_file_literally(const void *buf, unsigned long len, const char *type, unsigned char *sha1, unsigned flags); extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned char *); extern int force_object_loose(const unsigned char *sha1, time_t mtime); +extern int git_open_cloexec(const char *name, int flags); extern int git_open(const char *name); extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size); extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz); diff --git a/read-cache.c b/read-cache.c index db5d910642..c27d3e240b 100644 --- a/read-cache.c +++ b/read-cache.c @@ -156,14 +156,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) static int ce_compare_data(const struct cache_entry *ce, struct stat *st) { int match = -1; - static int cloexec = O_CLOEXEC; - int fd = open(ce->name, O_RDONLY | cloexec); - - if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) { - /* Try again w/o O_CLOEXEC: the kernel might not support it */ - cloexec &= ~O_CLOEXEC; - fd = open(ce->name, O_RDONLY | cloexec); - } + int fd = git_open_cloexec(ce->name, O_RDONLY); if (fd >= 0) { unsigned char sha1[20]; diff --git a/sha1_file.c b/sha1_file.c index 09045df1dc..dfbf398183 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1559,31 +1559,30 @@ int check_sha1_signature(const unsigned char *sha1, void *map, return hashcmp(sha1, real_sha1) ? -1 : 0; } -int git_open(const char *name) +int git_open_cloexec(const char *name, int flags) { - static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; - - for (;;) { - int fd; - - errno = 0; - fd = open(name, O_RDONLY | sha1_file_open_flag); - if (fd >= 0) - return fd; + static int cloexec = O_CLOEXEC; + int fd = open(name, flags | cloexec); + if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) { /* Try again w/o O_CLOEXEC: the kernel might not support it */ - if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) { - sha1_file_open_flag &= ~O_CLOEXEC; - continue; - } + cloexec &= ~O_CLOEXEC; + fd = open(name, flags | cloexec); + } + return fd; +} - /* Might the failure be due to O_NOATIME? */ - if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { - sha1_file_open_flag &= ~O_NOATIME; - continue; - } - return -1; +int git_open(const char *name) +{ + static int noatime = O_NOATIME; + int fd = git_open_cloexec(name, O_RDONLY); + + if (0 <= fd && (noatime & O_NOATIME)) { + int flags = fcntl(fd, F_GETFL); + if (fcntl(fd, F_SETFL, flags | noatime)) + noatime = 0; } + return fd; } static int stat_sha1_file(const unsigned char *sha1, struct stat *st) -- 2.10.1-791-g404733b9cf
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Hi, On Thu, 27 Oct 2016, Junio C Hamano wrote: > Junio C Hamano writes: > > > Linus Torvalds writes: > > > >> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano wrote: > >>> > >>> Would the best endgame shape for this function be to open with > >>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2) > >>> but ignoring an error from it, I guess? That would be the closest > >>> to what we historically had, I would think. > >> > >> I think that's the best model. > > > > OK, so perhaps like this. > > Hmph. This may not fly well in practice, though. > > To Unix folks, CLOEXEC is not a huge correctness issue. A child > process may hold onto an open file descriptor a bit longer than the > lifetime of the parent but as long as the child eventually exits, > nothing is affected. Over there, things are different. The parent > cannot even rename(2) or unlink(2) a file it created and closed > while the child is still holding the file descriptor open and the > lack of CLOEXEC will make the parent fail. I do not know how well > fcntl(2) emulation works on Windows, but I would not be surprised > if J6t or Dscho comes back and says that FD_CLOEXEC given to F_SETFD > would not work while O_CLOEXEC given to open(2) does. You guys. I mean: You guys! You sure make my life hard. A brief look at mingw.h could have answered your implicit question: static inline int fcntl(int fd, int cmd, ...) { if (cmd == F_GETFD || cmd == F_SETFD) return 0; errno = EINVAL; return -1; } So while you discuss in your Linux Ivory Tower how to optimize Git for Linux, and Linux only, I'll have to drop everything else and spend the rest of my Friday trying to find a way to adjust a file handle *immediately after opening it with undesired flags* (when it could have been opened with the desired flags, as suggested, to begin with). Ciao, Johannes
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Thu, Oct 27, 2016 at 03:38:59PM -0700, Linus Torvalds wrote: > On Thu, Oct 27, 2016 at 3:24 AM, Jeff King wrote: > > > > +cc Linus as the original author of 144bde78e9 in case there is > > something subtle I'm missing, but this really just seems like it's > > an outdated optimization. > > I'd *really* like to keep O_NOATIME if at all possible. It made a huge > difference on older kernels, and I'm not convinced that relatime > really fixes it as well as O_NOATIME. > > There are people who don't like relatime. And even if you do have > relatime enabled, it will update atime once every day, so then this > makes your filesystem have a storm of nasty inode writebacks if you > haven't touched that git repo in a while. The existence of "relatime" is only half the story of its outdatedness. The other half is packfiles, so that we are paying atime only once per packfile, not once per object (technically once per mmap(), so on a 32-bit system with large packfiles, it would be multiple, depending on your window size). So I'm not convinced that "storm" is really the right word in a modern context. The atime updates due to object accesses are probably smaller than those from all the other read() calls being done on non-object files (like config, refs, etc). That being said, if you really care, it's not that much code to keep. -Peff
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Junio C Hamano wrote: > Junio C Hamano writes: > > > Linus Torvalds writes: > > > >> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano wrote: > >>> > >>> Would the best endgame shape for this function be to open with > >>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2) > >>> but ignoring an error from it, I guess? That would be the closest > >>> to what we historically had, I would think. > >> > >> I think that's the best model. Actually, I would flip the order of flags. O_CLOEXEC is more important from a correctness standpoint. > > OK, so perhaps like this. > > Hmph. This may not fly well in practice, though. > > To Unix folks, CLOEXEC is not a huge correctness issue. A child > process may hold onto an open file descriptor a bit longer than the > lifetime of the parent but as long as the child eventually exits, I'm not too familiar with C internals of git; but I know we use threads in some places, and fork+execve in others. If our usage of threads and execve intersects, and we run untrusted code in an execve-ed child, then only having cloexec on open() will save us time when auditing for leaking FDs. fcntl(fd, F_SETFD, O_CLOEXEC) is racy in if there are other threads doing execve; so I wouldn't rely on it as a first choice. So I suppose something like this: static int noatime = 1; int fd = open(... | O_CLOEXEC); ...error checking and retrying... if (fd >= 0 && noatime && fcntl(fd, F_SETFL, O_NOATIME) != 0) noatime = 0; return fd;
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Junio C Hamano writes: > Linus Torvalds writes: > >> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano wrote: >>> >>> Would the best endgame shape for this function be to open with >>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2) >>> but ignoring an error from it, I guess? That would be the closest >>> to what we historically had, I would think. >> >> I think that's the best model. > > OK, so perhaps like this. Hmph. This may not fly well in practice, though. To Unix folks, CLOEXEC is not a huge correctness issue. A child process may hold onto an open file descriptor a bit longer than the lifetime of the parent but as long as the child eventually exits, nothing is affected. Over there, things are different. The parent cannot even rename(2) or unlink(2) a file it created and closed while the child is still holding the file descriptor open and the lack of CLOEXEC will make the parent fail. I do not know how well fcntl(2) emulation works on Windows, but I would not be surprised if J6t or Dscho comes back and says that FD_CLOEXEC given to F_SETFD would not work while O_CLOEXEC given to open(2) does.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano wrote: >> >> Would the best endgame shape for this function be to open with >> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2) >> but ignoring an error from it, I guess? That would be the closest >> to what we historically had, I would think. > > I think that's the best model. OK, so perhaps like this. -- >8 -- Subject: git_open(): untangle possible NOATIME and CLOEXEC interactions The way we structured the fallback-retry for opening with O_NOATIME and O_CLOEXEC meant that if we failed due to lack of support to open the file with O_NOATIME option (i.e. EINVAL), we would still try to drop O_CLOEXEC first and retry, and then drop O_NOATIME. A platform on which O_NOATIME is defined in the header without support from the kernel wouldn't have a chance to open with O_CLOEXEC option due to this code structure. Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter is mostly about performance, while the former can affect correctness. Let's revert the recent changes to the way git_open() attempts to open a file with O_NOATIME and retries without to the original sequence, and then use a separate fcntl(fd, F_SETFD, FD_CLOEXEC) on the resulting file descriptor. The helper to do the latter can be usable in the codepath in ce_compare_data() that was recently added to open a file descriptor with O_CLOEXEC, so let's refactor that codepath with the helper while we are at it. Signed-off-by: Junio C Hamano --- git-compat-util.h | 5 +++-- read-cache.c | 12 sha1_file.c | 49 ++--- 3 files changed, 37 insertions(+), 29 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index 43718dabae..a751630db5 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -679,9 +679,10 @@ char *gitstrdup(const char *s); #define getpagesize() sysconf(_SC_PAGESIZE) #endif -#ifndef O_CLOEXEC -#define O_CLOEXEC 0 +#ifndef FD_CLOEXEC +#define FD_CLOEXEC 0 #endif +extern int git_set_cloexec(int); #ifdef FREAD_READS_DIRECTORIES #ifdef fopen diff --git a/read-cache.c b/read-cache.c index db5d910642..fb91514885 100644 --- a/read-cache.c +++ b/read-cache.c @@ -156,17 +156,13 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) static int ce_compare_data(const struct cache_entry *ce, struct stat *st) { int match = -1; - static int cloexec = O_CLOEXEC; - int fd = open(ce->name, O_RDONLY | cloexec); - - if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) { - /* Try again w/o O_CLOEXEC: the kernel might not support it */ - cloexec &= ~O_CLOEXEC; - fd = open(ce->name, O_RDONLY | cloexec); - } + int fd = open(ce->name, O_RDONLY); if (fd >= 0) { unsigned char sha1[20]; + + /* do not let child processes to hold onto the open fd */ + git_set_cloexec(fd); if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0)) match = hashcmp(sha1, ce->oid.hash); /* index_fd() closed the file descriptor already */ diff --git a/sha1_file.c b/sha1_file.c index 09045df1dc..41383a6c20 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1559,31 +1559,42 @@ int check_sha1_signature(const unsigned char *sha1, void *map, return hashcmp(sha1, real_sha1) ? -1 : 0; } -int git_open(const char *name) +int git_set_cloexec(int fd) { - static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; + static int cloexec = FD_CLOEXEC; - for (;;) { - int fd; + if (cloexec) { + if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) + cloexec = 0; + /* +* We might want to diagnose and complain upon seeing +* an error from this call, but let's keep the same +* behaviour as before for now. +*/ + } + return 0; +} - errno = 0; - fd = open(name, O_RDONLY | sha1_file_open_flag); - if (fd >= 0) - return fd; +int git_open(const char *name) +{ + static int noatime = O_NOATIME; + int fd; - /* Try again w/o O_CLOEXEC: the kernel might not support it */ - if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) { - sha1_file_open_flag &= ~O_CLOEXEC; - continue; - } + errno = 0; + fd = open(name, O_RDONLY | noatime); - /* Might the failure be due to O_NOATIME? */ - if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { - sha1_file_open_flag &= ~O_NOATIME; - continue; - } - return -1; + /* Might the failure be due to O_NOATIME? */ + if ((noatime & O_NOATIME) &&
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano wrote: > > Would the best endgame shape for this function be to open with > O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2) > but ignoring an error from it, I guess? That would be the closest > to what we historically had, I would think. I think that's the best model. Note that the O_NOATIME code is very much designed to try O_NOATIME only _once_. Because even when the kernel supports O_NOATIME, if you have a shared object tree where you may not be the owner of all the files, a O_NOATIME open can fail with NOPERM ("You are not allowed to hide your accesses to this file"). This is why it uses that static unsigned int sha1_file_open_flag = O_NOATIME; and if the O_NOATIME open ever fails (and the no-O_NOATIME open succeeds), it clears that flag. Exactly so that it will *not* end up in some kind of "let's open and fail and re-open" loop. It's designed to fail once. Or at least that's how it used to be originally. This code has obviously changed since that early design. Now it seems to clear it for any non-ENOENT error. Which looks fine too. Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > But the basic issue still remains - I'd really prefer to have NOATIME > stay around for all those poor misguided souls that for some reason > don't like "relatime" or run old kernels. But whether it is with > O_NOATIME at open time or with F_SETFL, I don't care. Understood. Would the best endgame shape for this function be to open with O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2) but ignoring an error from it, I guess? That would be the closest to what we historically had, I would think.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Thu, Oct 27, 2016 at 4:09 PM, Linus Torvalds wrote: > > That said, now that I think about it, I should double-check: maybe > open() doesn't actually set atime at all, and we *could* do NOATIME > with SETFL after all. Checked. Yup. O_NOATIME could easily be done with SETFL:, because as with O_CLOEXEC, it only affects operations _after_ the open. The open itself doesn't set the access time. So I was full of it. But the basic issue still remains - I'd really prefer to have NOATIME stay around for all those poor misguided souls that for some reason don't like "relatime" or run old kernels. But whether it is with O_NOATIME at open time or with F_SETFL, I don't care. Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Thu, Oct 27, 2016 at 3:56 PM, Junio C Hamano wrote: > > I also thought O_NOATIME shouldn't be an effective fcntl(2) thing, > for the reasons you stated above, when I was studying the area > because of the discussion on these patches. I was surprised to see > that http://man7.org/linux/man-pages/man2/fcntl.2.html contradicts > by saying F_SETFL can take O_NOATIME. > > Perhaps it deserves a bugreport? It's not a bug per se. You can indeed change O_NOATIME with F_SETFL. And it actually does matter, since atime is normally changed by mmap(), read() and write() calls. So F_SETFL O_NOATIME actually does make sense, but it doesn't cover the atime update done by open() itself. That said, now that I think about it, I should double-check: maybe open() doesn't actually set atime at all, and we *could* do NOATIME with SETFL after all. The exact [acm]time semantics are damn subtle. Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Linus Torvalds writes: > Note that you can *not* do the same thing with O_NOATIME, since the > whole point of O_NOATIME is that it changes the behavior of the open > itself (unlike O_CLOEXEC which changes _later_ behavior, and can > always be replaced by FD_CLOEXEC fnclt modulo races that are > immaterial for git) A tangent. I also thought O_NOATIME shouldn't be an effective fcntl(2) thing, for the reasons you stated above, when I was studying the area because of the discussion on these patches. I was surprised to see that http://man7.org/linux/man-pages/man2/fcntl.2.html contradicts by saying F_SETFL can take O_NOATIME. Perhaps it deserves a bugreport?
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Thu, Oct 27, 2016 at 3:24 AM, Jeff King wrote: > > +cc Linus as the original author of 144bde78e9 in case there is > something subtle I'm missing, but this really just seems like it's > an outdated optimization. I'd *really* like to keep O_NOATIME if at all possible. It made a huge difference on older kernels, and I'm not convinced that relatime really fixes it as well as O_NOATIME. There are people who don't like relatime. And even if you do have relatime enabled, it will update atime once every day, so then this makes your filesystem have a storm of nasty inode writebacks if you haven't touched that git repo in a while. If this is purely about mixing things up with O_CLOEXEC, then that is *trivially* fixable by just using fcntl(fd, F_SETFD, FD_CLOEXEC); after the open. Which is what you have to do anyway if you want to be portable, so its' not like you could avoid that complexity in the first place. Note that you can *not* do the same thing with O_NOATIME, since the whole point of O_NOATIME is that it changes the behavior of the open itself (unlike O_CLOEXEC which changes _later_ behavior, and can always be replaced by FD_CLOEXEC fnclt modulo races that are immaterial for git) Linus
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Jeff King writes: > +cc Linus as the original author of 144bde78e9 in case there is > something subtle I'm missing, but this really just seems like it's > an outdated optimization. > > -- >8 -- > Subject: [PATCH] sha1_file: stop opening files with O_NOATIME > > When we open object files, we try to do so with O_NOATIME. > This dates back to 144bde78e9 (Use O_NOATIME when opening > the sha1 files., 2005-04-23), which is an optimization to > avoid creating a bunch of dirty inodes when we're accessing > many objects. But a few things have changed since then: > > 1. In June 2005, git learned about packfiles, which means > we would do a lot fewer atime updates (rather than one > per object access, we'd generally get one per packfile). > > 2. In late 2006, Linux learned about "relatime", which is > generally the default on modern installs. So > performance around atimes updates is a non-issue there > these days. > > All the world isn't Linux, but as it turns out, Linux > is the only platform to implement O_NOATIME in the > first place. > > So it's very unlikely that this code is helping anybody > these days. > > It's not a particularly large amount of code, but the > fallback-retry creates complexity. E.g., we do a similar > fallback for CLOEXEC; which one should take precedence, or > should we try all possible combinations? Dropping O_NOATIME > makes those questions go away. > > Signed-off-by: Jeff King > --- We may want to lose the surrounding for (;;) loop as there is only one flag to retry without, which was the original code structure back when 144bde78e9 ("Use O_NOATIME when opening the sha1 files.", 2005-04-23) was written and refactored by 44d1c19ee8 ("Make loose object file reading more careful", 2008-06-14). IOW, this on top. The update to ce_compare_data() Lars has in a0a6cb9662 ("read-cache: make sure file handles are not inherited by child processes", 2016-10-24) could then made into a call to git_open(). sha1_file.c | 23 --- 1 file changed, 8 insertions(+), 15 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 6f02a57d8b..e18ea053e6 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1553,24 +1553,17 @@ int check_sha1_signature(const unsigned char *sha1, void *map, int git_open(const char *name) { - static int sha1_file_open_flag = O_CLOEXEC; - - for (;;) { - int fd; - - errno = 0; - fd = open(name, O_RDONLY | sha1_file_open_flag); - if (fd >= 0) - return fd; + static int cloexec = O_CLOEXEC; + int fd; + errno = 0; + fd = open(name, O_RDONLY | cloexec); + if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) { /* Try again w/o O_CLOEXEC: the kernel might not support it */ - if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) { - sha1_file_open_flag &= ~O_CLOEXEC; - continue; - } - - return -1; + cloexec &= ~O_CLOEXEC; + fd = open(name, O_RDONLY | cloexec); } + return fd; } static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Wed, Oct 26, 2016 at 02:15:33PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote: > > > >> > I actually wonder if it is worth carrying around the O_NOATIME hack at > >> > all. > >> > >> Yes, I share the thought. We no longer have too many loose objects > >> to matter. > >> > >> I do not mind flipping the order, but I'd prefer to cook the result > >> even longer. I am tempted to suggest we take two step route: > >> > >> - ship 2.11 with the "atime has been there and we won't regress it" > >>shape, while cooking the "cloexec is semantically more > >>important" version in 'next' during the feature freeze > >> > >> - immediately after 2.11 merge it to 'master' for 2.12 to make sure > >>there is no fallout. > > > > That sounds reasonable, though I'd consider jumping straight to "NOATIME > > is not worth it; drop it" as the patch for post-2.11. > > That endgame is fine by me too. Thanks for a sanity-check. So here's that endgame patch. My main concern with it was that there might be non-Linux systems that could be affected. But when I dug into it, I found that this code was never activated anywhere besides Linux in the first place. So I really doubt this will have any negative impact at all. I certainly don't mind cooking it until post-2.11, though. +cc Linus as the original author of 144bde78e9 in case there is something subtle I'm missing, but this really just seems like it's an outdated optimization. -- >8 -- Subject: [PATCH] sha1_file: stop opening files with O_NOATIME When we open object files, we try to do so with O_NOATIME. This dates back to 144bde78e9 (Use O_NOATIME when opening the sha1 files., 2005-04-23), which is an optimization to avoid creating a bunch of dirty inodes when we're accessing many objects. But a few things have changed since then: 1. In June 2005, git learned about packfiles, which means we would do a lot fewer atime updates (rather than one per object access, we'd generally get one per packfile). 2. In late 2006, Linux learned about "relatime", which is generally the default on modern installs. So performance around atimes updates is a non-issue there these days. All the world isn't Linux, but as it turns out, Linux is the only platform to implement O_NOATIME in the first place. So it's very unlikely that this code is helping anybody these days. It's not a particularly large amount of code, but the fallback-retry creates complexity. E.g., we do a similar fallback for CLOEXEC; which one should take precedence, or should we try all possible combinations? Dropping O_NOATIME makes those questions go away. Signed-off-by: Jeff King --- sha1_file.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 09045df1dc..6f02a57d8b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -27,14 +27,6 @@ #include "list.h" #include "mergesort.h" -#ifndef O_NOATIME -#if defined(__linux__) && (defined(__i386__) || defined(__PPC__)) -#define O_NOATIME 0100 -#else -#define O_NOATIME 0 -#endif -#endif - #define SZ_FMT PRIuMAX static inline uintmax_t sz_fmt(size_t s) { return s; } @@ -1561,7 +1553,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map, int git_open(const char *name) { - static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; + static int sha1_file_open_flag = O_CLOEXEC; for (;;) { int fd; @@ -1577,11 +1569,6 @@ int git_open(const char *name) continue; } - /* Might the failure be due to O_NOATIME? */ - if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { - sha1_file_open_flag &= ~O_NOATIME; - continue; - } return -1; } } -- 2.10.1.916.g0d2035c
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Jeff King writes: > On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote: > >> > I actually wonder if it is worth carrying around the O_NOATIME hack at >> > all. >> >> Yes, I share the thought. We no longer have too many loose objects >> to matter. >> >> I do not mind flipping the order, but I'd prefer to cook the result >> even longer. I am tempted to suggest we take two step route: >> >> - ship 2.11 with the "atime has been there and we won't regress it" >>shape, while cooking the "cloexec is semantically more >>important" version in 'next' during the feature freeze >> >> - immediately after 2.11 merge it to 'master' for 2.12 to make sure >>there is no fallout. > > That sounds reasonable, though I'd consider jumping straight to "NOATIME > is not worth it; drop it" as the patch for post-2.11. That endgame is fine by me too. Thanks for a sanity-check.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote: > > I actually wonder if it is worth carrying around the O_NOATIME hack at > > all. > > Yes, I share the thought. We no longer have too many loose objects > to matter. > > I do not mind flipping the order, but I'd prefer to cook the result > even longer. I am tempted to suggest we take two step route: > > - ship 2.11 with the "atime has been there and we won't regress it" >shape, while cooking the "cloexec is semantically more >important" version in 'next' during the feature freeze > > - immediately after 2.11 merge it to 'master' for 2.12 to make sure >there is no fallout. That sounds reasonable, though I'd consider jumping straight to "NOATIME is not worth it; drop it" as the patch for post-2.11. -Peff
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Jeff King writes: > Of the two flags, I would say CLOEXEC is the more important one to > respect because it may actually impact correctness (e.g., leaking > descriptors to sub-processes). Whereas O_NOATIME is purely a performance > optimization. I tend to agree. > I actually wonder if it is worth carrying around the O_NOATIME hack at > all. Yes, I share the thought. We no longer have too many loose objects to matter. I do not mind flipping the order, but I'd prefer to cook the result even longer. I am tempted to suggest we take two step route: - ship 2.11 with the "atime has been there and we won't regress it" shape, while cooking the "cloexec is semantically more important" version in 'next' during the feature freeze - immediately after 2.11 merge it to 'master' for 2.12 to make sure there is no fallout.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Wed, Oct 26, 2016 at 09:23:21AM -0700, Junio C Hamano wrote: > >> + /* Might the failure be due to O_NOATIME? */ > >> + if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { > >> + sha1_file_open_flag &= ~O_NOATIME; > >> + continue; > >> + } > > > > We drop O_NOATIME, and end up with an empty flag field. > > > > But we will never have tried just O_CLOEXEC, which might have worked. > > Yes, doing so would smudge atime, so one question is which one > between noatime or cloexec is more important to be done at open(2) > time. Yes, but the missing case is one where we know that O_NOATIME does not work (but O_CLOEXEC does), so we know we have to smudge the atime. Of the two flags, I would say CLOEXEC is the more important one to respect because it may actually impact correctness (e.g., leaking descriptors to sub-processes). Whereas O_NOATIME is purely a performance optimization. I actually wonder if it is worth carrying around the O_NOATIME hack at all. Linus added it on 2005-04-23 via 144bde78e9; the aim was to reduce the cost of opening loose object files. Some things have changed since then: 1. In June 2005, git learned about packfiles, which means we would do a lot fewer atime updates (rather than one per object access, we'd generally get one per packfile). 2. In late 2006, Linux learned about "relatime", which is generally the default on modern installs. So performance around atime updates is a non-issue there these days. All the world isn't Linux, of course, but I can't help that feel that atime performance hackery is something that belongs at the system level, not in individual applications. So I don't have hard numbers, but I'd be surprised if O_NOATIME is really buying us anything these days. -Peff
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
Jeff King writes: >> +/* Try again w/o O_CLOEXEC: the kernel might not support it */ >> +if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) { >> +sha1_file_open_flag &= ~O_CLOEXEC; >> continue; >> } > > So if we start with O_CLOEXEC|O_NOATIME, we drop CLOEXEC here and try > again with just O_NOATIME. And then if _that_ fails... > >> +/* Might the failure be due to O_NOATIME? */ >> +if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { >> +sha1_file_open_flag &= ~O_NOATIME; >> +continue; >> +} > > We drop O_NOATIME, and end up with an empty flag field. > > But we will never have tried just O_CLOEXEC, which might have worked. Yes, doing so would smudge atime, so one question is which one between noatime or cloexec is more important to be done at open(2) time. It may be possible to open(2) only with cloexec and then fcntl(2) FD_SET noatime immediately after, but going that way would explode the combination even more, as it may not be possible to set these two flags the other way around. > I'm not sure it's worth worrying about or not; I don't know which > systems are actually lacking either of the flags, or if they tend to > have both.
Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
On Tue, Oct 25, 2016 at 11:16:20AM -0700, Junio C Hamano wrote: > diff --git a/sha1_file.c b/sha1_file.c > index 5d2bcd3ed1..09045df1dc 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -1571,12 +1571,17 @@ int git_open(const char *name) > if (fd >= 0) > return fd; > > - /* Might the failure be due to O_NOATIME? */ > - if (errno != ENOENT && sha1_file_open_flag) { > - sha1_file_open_flag = 0; > + /* Try again w/o O_CLOEXEC: the kernel might not support it */ > + if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) { > + sha1_file_open_flag &= ~O_CLOEXEC; > continue; > } So if we start with O_CLOEXEC|O_NOATIME, we drop CLOEXEC here and try again with just O_NOATIME. And then if _that_ fails... > + /* Might the failure be due to O_NOATIME? */ > + if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { > + sha1_file_open_flag &= ~O_NOATIME; > + continue; > + } We drop O_NOATIME, and end up with an empty flag field. But we will never have tried just O_CLOEXEC, which might have worked. Because of the order here, this would not be a regression (i.e., any system that used to work will still eventually find a working comb), but it does mean that systems without O_NOATIME do not get the benefit of the new O_CLOEXEC protection. Unfortunately, I think covering all of the cases would be 2^nr_flags. That's only 4 right now, but it does not bode well as a pattern. I'm not sure it's worth worrying about or not; I don't know which systems are actually lacking either of the flags, or if they tend to have both. -Peff
[PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC
From: Lars Schneider All processes that the Git main process spawns inherit the open file descriptors of the main process. These leaked file descriptors can cause problems. Use the O_CLOEXEC flag similar to 05d1ed61 to fix the leaked file descriptors. Signed-off-by: Lars Schneider Signed-off-by: Junio C Hamano --- * And the remainder of original 1/2, again taking suggestion by DScho. sha1_file.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 5d2bcd3ed1..09045df1dc 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1561,7 +1561,7 @@ int check_sha1_signature(const unsigned char *sha1, void *map, int git_open(const char *name) { - static int sha1_file_open_flag = O_NOATIME; + static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC; for (;;) { int fd; @@ -1571,12 +1571,17 @@ int git_open(const char *name) if (fd >= 0) return fd; - /* Might the failure be due to O_NOATIME? */ - if (errno != ENOENT && sha1_file_open_flag) { - sha1_file_open_flag = 0; + /* Try again w/o O_CLOEXEC: the kernel might not support it */ + if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) { + sha1_file_open_flag &= ~O_CLOEXEC; continue; } + /* Might the failure be due to O_NOATIME? */ + if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) { + sha1_file_open_flag &= ~O_NOATIME; + continue; + } return -1; } } -- 2.10.1-777-gd068e6bde7