Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-31 Thread Jeff King
On Mon, Oct 31, 2016 at 10:55:32AM -0700, Junio C Hamano wrote:

> > So I guess it's possible that it produces a noticeable effect in some
> > cases, but I'm still somewhat doubtful. And actually repacking your
> > repository had a greater effect in every case I measured (in addition to
> > providing other speedups).
> 
> Let's keep doubting.  I prefer one-step-at-a-time approach to
> things anyway, and what I plan in the near term are:
> 
>  * use the "open() with O_NOATIME|O_CLOEXEC, gradually losing the
>bits during fallback" approach in the ls/git-open-cloexec topic,
>in order to help ls/filter-process topic be part of the upcoming
>release;
> 
>  * simplify the logic to the "open(2) with O_CLOEXEC, set O_NOATIME
>with fcntl(2)" in jc/git-open-cloexec~1 after 2.11 ships;
> 
>  * cook "drop the latter half of setting O_NOATIME" which is at the
>tip of jc/git-open-cloexec in 'next', and while Linus is looking
>the other way ^W^W^W^W^W^W^W after people had chance to complain
>with numbers, merge it to a future release iff it still looked OK
>to drop O_NOATIME thing.

Great, that sounds like a good way to proceed (and if the final step
never happens, no big deal).

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-31 Thread Junio C Hamano
Jeff King  writes:

> If you set a probe on touch_atime() in the kernel (which is called for
> every attempt to smudge the atime, regardless of mount options, but is
> skipped when the descriptor was opened with O_NOATIME), you can see the
> impact. Here's a command I picked because it reads a lot of objects (run
> on my git.git clone):
>
>   $ perf stat -e probe:touch-atime git log -Sfoo >/dev/null
>
> And the probe:touch_atime counts before (stock git) and after (a patch
> to drop O_NOATIME):
>
>   before: 22,235
>after: 22,362
>
> So that's only half a percent difference. And it's on a reasonably messy
> clone that is partway to triggering an auto-repack:
> ...
> So I guess it's possible that it produces a noticeable effect in some
> cases, but I'm still somewhat doubtful. And actually repacking your
> repository had a greater effect in every case I measured (in addition to
> providing other speedups).

Let's keep doubting.  I prefer one-step-at-a-time approach to
things anyway, and what I plan in the near term are:

 * use the "open() with O_NOATIME|O_CLOEXEC, gradually losing the
   bits during fallback" approach in the ls/git-open-cloexec topic,
   in order to help ls/filter-process topic be part of the upcoming
   release;

 * simplify the logic to the "open(2) with O_CLOEXEC, set O_NOATIME
   with fcntl(2)" in jc/git-open-cloexec~1 after 2.11 ships;

 * cook "drop the latter half of setting O_NOATIME" which is at the
   tip of jc/git-open-cloexec in 'next', and while Linus is looking
   the other way ^W^W^W^W^W^W^W after people had chance to complain
   with numbers, merge it to a future release iff it still looked OK
   to drop O_NOATIME thing.


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-31 Thread Junio C Hamano
Linus Torvalds  writes:

> On Sat, Oct 29, 2016 at 1:25 AM, Johannes Schindelin
>  wrote:
>>
>> Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on
>> Windows, but we can ask open() to open said file handle with the correct
>> flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows.
>
> Ok. So then I have no issues with it, and let's use O_CLOEXEC if it
> exists and fcntl(FD_CLOEXEC) if O_CLOEXEC doesn't exist.

I am not sure if the fallback to FD_CLOEXEC is worth doing, because
it does not exist on Windows, where leaking an open file descriptor
to a child process matters more than other platforms [*1*].  

Of course, the world is not all Windows, and there may be other
platforms this fallback may help.  But there is a chance that some
other thread may fork between the time we open(2) and the time we
call fcntl(2), leaving FD_CLOEXEC ineffective and unreliable, so
that's another thing that makes me doubt if FD_CLOEXEC fallback is
worth doing.

Having said that, here is a rewrite of [*2*] to use the fallback
would look like.  After this topic settles, we may want to also
update the tempfile.c::create_tempfile() implementation to use the
helper function we use here.


[Footnotes]

*1* The call in ce_compare_data() is to see if the contents in the
working tree file match what is in the index.  Perhaps the
program is "git checkout another-branch" that knows this path is
absent in the branch being switched to and trying to make sure
we lose no local modification.  When the path needs to be passed
through the clean/smudge filter before hashing to see if the
working tree contents hashes to the object in the index, we need
to fork and then after the filter finishes its processing, we
close the file descriptor and then unlink(2) the path if it is
clean.  We are about to add a variant of clean/smudge mechanism
where a lazily spawned process can clean contents of multiple
paths and that is where cloexec starts to matter.  This function
starts to inspect one path, opens a fd to it, finds that it
needs filtering, spawns a long-lingering process, while leaking
the original fd to it.  After feeding the contents via pipe and
then receiving the filtered contents (the protocol is not full
duplex and there won't be a stuck pipe), the parent would decide
that the path is clean and attempts to unlink(2).  Windows does
not let you unlink a path with open filedescriptor held, causing
"checkout" to fail.  As it is only one fd that leaks to the
child, no matter how many subsequent paths are filtered by the
same long-lingering child process, it will be a far less
important issue on other platforms that lets you unlink(2) such
a file.

*2* http://public-inbox.org/git/xmqqh97w38gj@gitster.mtv.corp.google.com

---
 cache.h   |  1 +
 git-compat-util.h |  4 
 read-cache.c  |  9 +
 sha1_file.c   | 47 ---
 4 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index a902ca1f8e..6f1c21a352 100644
--- a/cache.h
+++ b/cache.h
@@ -1122,6 +1122,7 @@ extern int write_sha1_file(const void *buf, unsigned long 
len, const char *type,
 extern int hash_sha1_file_literally(const void *buf, unsigned long len, const 
char *type, unsigned char *sha1, unsigned flags);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
+extern int git_open_cloexec(const char *name, int flags);
 extern int git_open(const char *name);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
diff --git a/git-compat-util.h b/git-compat-util.h
index 43718dabae..64407c701c 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -683,6 +683,10 @@ char *gitstrdup(const char *s);
 #define O_CLOEXEC 0
 #endif
 
+#ifndef FD_CLOEXEC
+#define FD_CLOEXEC 0
+#endif
+
 #ifdef FREAD_READS_DIRECTORIES
 #ifdef fopen
 #undef fopen
diff --git a/read-cache.c b/read-cache.c
index db5d910642..c27d3e240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -156,14 +156,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct 
stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
int match = -1;
-   static int cloexec = O_CLOEXEC;
-   int fd = open(ce->name, O_RDONLY | cloexec);
-
-   if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) {
-   /* Try again w/o O_CLOEXEC: the kernel might not support it */
-   cloexec &= ~O_CLOEXEC;
-   fd = open(ce->name, O_RDONLY | cloexec);
-   }
+   int fd = git_open_cloexec(ce->name, O_RDONLY);
 
if (fd >= 0) {
unsigned char sha1[20];
diff --git a/sha1_file.c 

Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-31 Thread Jeff King
On Fri, Oct 28, 2016 at 09:13:41AM -0700, Linus Torvalds wrote:

> On Fri, Oct 28, 2016 at 4:11 AM, Johannes Schindelin
>  wrote:
> >
> > You guys. I mean: You guys! You sure make my life hard. A brief look at
> > mingw.h could have answered your implicit question:
> 
> So here's what you guys should do:
> 
>  - leave O_NOATIME damn well alone. It works. It has worked for 10+
> years. Stop arguing against it, people who do.

For some definition of worked, perhaps.

If you set a probe on touch_atime() in the kernel (which is called for
every attempt to smudge the atime, regardless of mount options, but is
skipped when the descriptor was opened with O_NOATIME), you can see the
impact. Here's a command I picked because it reads a lot of objects (run
on my git.git clone):

  $ perf stat -e probe:touch-atime git log -Sfoo >/dev/null

And the probe:touch_atime counts before (stock git) and after (a patch
to drop O_NOATIME):

  before: 22,235
   after: 22,362

So that's only half a percent difference. And it's on a reasonably messy
clone that is partway to triggering an auto-repack:

  $ git count-objects -v
  count: 6167
  size: 61128
  in-pack: 275773
  packs: 18
  size-pack: 86857
  prune-packable: 25
  garbage: 0
  size-garbage: 0

Running "git gc" drops the probe count to 21,733.

It makes a bigger difference for some commands (it's more like 10% for
git-status). And smaller for others ("git log -p" triggers it over
100,000 times).

One thing missing in that count is how many of those calls would have
resulted in an actual disk write. Looking at strace, most of the
filesystem activity is opening .gitattributes files, and we end up
opening the same ones repeatedly (e.g., t/.gitattributes in git.git).
Multiple hits for a given inode in the same second get coalesced into at
most a single disk write.

So I guess it's possible that it produces a noticeable effect in some
cases, but I'm still somewhat doubtful. And actually repacking your
repository had a greater effect in every case I measured (in addition to
providing other speedups).

Like I said, I'm OK keeping O_NOATIME. It's just not that much code. But
if you really care about the issue of dirtying inodes via atime, you
should look into vastly increasing our use of O_NOATIME. Or possibly
looking at caching more in the attribute code.

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-29 Thread Linus Torvalds
On Sat, Oct 29, 2016 at 1:25 AM, Johannes Schindelin
 wrote:
>
> Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on
> Windows, but we can ask open() to open said file handle with the correct
> flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows.

Ok. So then I have no issues with it, and let's use O_CLOEXEC if it
exists and fcntl(FD_CLOEXEC) if O_CLOEXEC doesn't exist.

  Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-29 Thread Johannes Schindelin
Hi,

On Fri, 28 Oct 2016, Junio C Hamano wrote:

> Linus Torvalds  writes:
> 
> > Apparently windows doesn't even support it, at least not mingw
> 
> Assuming that the above was a misunderstanding, assuming that we can
> do O_CLOEXEC (but not FD_CLOEXEC) on Windows just fine, where stray
> file descriptors held open in the children matter more, and ...

Correct. We cannot change an open file handle's state ("FD_CLOEXEC") on
Windows, but we can ask open() to open said file handle with the correct
flag ("O_CLOEXEC", which is mapped to O_NOINHERIT on Windows.

And for the record: I agree with Junio that we cannot simply drop this
O_CLOEXEC business.

Because it was introduced to fix bugs.

Thank you for your time,
Johannes


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Junio C Hamano
Linus Torvalds  writes:

> Apparently windows doesn't even support it, at least not mingw

Assuming that the above was a misunderstanding, assuming that we can
do O_CLOEXEC (but not FD_CLOEXEC) on Windows just fine, where stray
file descriptors held open in the children matter more, and ...

> In contrast, O_NOATIME isn't a maintenance problem, since it's purely
> an optimization and has no semantic difference, so it not existing on
> some platform is immaterial.

... assuming that I didn't screw up my use of fcntl() to set
O_NOATIME on a fd opened with O_CLOEXEC, are you OK with the
approach in patch *1*?  We can drop *2* to keep O_NOATIME, which has
been working for those with old kernels with many loose objects, and
it is not too much code to keep.

*1* http://public-inbox.org/git/xmqqh97w38gj@gitster.mtv.corp.google.com
*2* http://public-inbox.org/git/xmqqd1ik38f4@gitster.mtv.corp.google.com


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Junio C Hamano
Linus Torvalds  writes:

> On Fri, Oct 28, 2016 at 9:48 AM, Junio C Hamano  wrote:
>>
>> Setting FD_CLOEXEC with fcntl(2) may be racy, but in what way
>> it is broken to open(2) with O_CLOEXEC?
>
> Apparently windows doesn't even support it, at least not mingw. So
> you'll have to have some supprt for just explicitly closing things at
> execve anyway. And if you do that for windows, then what's the point
> of using O_CLOEXEC on Linux, and having two totally different paths
> that will just mean maintenance headache.

Ah, that's where your reaction comes from.  If I understood Dscho
correctly, what he said was that I cannot set FD_CLOEXEC via fcntl()
emulation they have.  It wasn't an objection to O_CLOEXEC.  In fact,
since 05d1ed6148 ("mingw: ensure temporary file handles are not
inherited by child processes", 2016-08-22), we have been relying on
open(2) with O_CLOEXEC even on Windows (by emulating it with
O_NOINHERIT, which Windows has) on some codepaths.




Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Linus Torvalds
On Fri, Oct 28, 2016 at 9:48 AM, Junio C Hamano  wrote:
>
> Excuse me, but care to elaborate a bit more?

It just seems entirely pointless to change the games with O_xyz.

> Setting FD_CLOEXEC with fcntl(2) may be racy, but in what way
> it is broken to open(2) with O_CLOEXEC?

Apparently windows doesn't even support it, at least not mingw. So
you'll have to have some supprt for just explicitly closing things at
execve anyway. And if you do that for windows, then what's the point
of using O_CLOEXEC on Linux, and having two totally different paths
that will just mean maintenance headache.

In contrast, O_NOATIME isn't a maintenance problem, since it's purely
an optimization and has no semantic difference, so it not existing on
some platform is immaterial.

End result: leave O_NOATIME as it is (working), and just forget about O_CLOEXEC.

I have no actual idea about mingw support for this, but Johannes'
email certainly implies it's painful on Windows. Some googling also
shows people doing things like

  #ifdef O_CLOEXEC
  flags |= O_CLOEXEC;
  #endif

or

  #ifndef O_CLOEXEC
  #define O_CLOEXEC 0
  #endif

so O_CLOEXEC definitely isn't considered portable. Do you really want
to rely on it?

  Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Junio C Hamano
Linus Torvalds  writes:

> On Fri, Oct 28, 2016 at 4:11 AM, Johannes Schindelin
>  wrote:
>>
>> You guys. I mean: You guys! You sure make my life hard. A brief look at
>> mingw.h could have answered your implicit question:
>
> So here's what you guys should do:
>
>  - leave O_NOATIME damn well alone. It works. It has worked for 10+
> years. Stop arguing against it, people who do.
>
>  - get rid of all O_CLOEXEC games. They don't work. If you want to
> close file descriptors at execve(), you - gasp - close the file
> descriptor before doing an execve.
>
> So O_CLOEXEC or FD_CLOEXEC is broken.
>
> DO NOT BREAK O_NOATIME JUST TO ADD COMPLETELY NEW BREAKAGE.

Excuse me, but care to elaborate a bit more?

Did I botch the way I ask fcntl(2) to set O_NOATIME in *1*?  Its
follow-up in *2* is optional and as peff said in *3*, I do not think
we mind dropping that step and keeping O_NOATIME.  It is not that
much code to keep.

Setting FD_CLOEXEC with fcntl(2) may be racy, but in what way
it is broken to open(2) with O_CLOEXEC?

If we want to close file descriptors we opened at execve() time
ourselves, we'd need to somehow keep track of which file descriptors
are meant to be closed upon execve() time, and one way to mark that
may be to use O_CLOEXEC, but once we mark a file descriptor with the
bit, execve() would take care of "closing" it for us---wouldn't that
be the whole point of that bit?


*1* http://public-inbox.org/git/xmqqh97w38gj@gitster.mtv.corp.google.com
*2* http://public-inbox.org/git/xmqqd1ik38f4@gitster.mtv.corp.google.com
*3* 
http://public-inbox.org/git/20161028075104.la24zydnr3ogb...@sigill.intra.peff.net


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Linus Torvalds
On Fri, Oct 28, 2016 at 4:11 AM, Johannes Schindelin
 wrote:
>
> You guys. I mean: You guys! You sure make my life hard. A brief look at
> mingw.h could have answered your implicit question:

So here's what you guys should do:

 - leave O_NOATIME damn well alone. It works. It has worked for 10+
years. Stop arguing against it, people who do.

 - get rid of all O_CLOEXEC games. They don't work. If you want to
close file descriptors at execve(), you - gasp - close the file
descriptor before doing an execve.

So O_CLOEXEC or FD_CLOEXEC is broken.

DO NOT BREAK O_NOATIME JUST TO ADD COMPLETELY NEW BREAKAGE.

 Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>> Hmph.  This may not fly well in practice, though.  
>
> We can flip the order around, and then it becomes simpler to later
> drop atime support.  The first step goes like this.

And the second step would be like this.

-- >8 --
Subject: [PATCH] sha1_file: stop opening files with O_NOATIME

When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects.  But a few things have changed since then:

  1. In June 2005, git learned about packfiles, which means
 we would do a lot fewer atime updates (rather than one
 per object access, we'd generally get one per packfile).

  2. In late 2006, Linux learned about "relatime", which is
 generally the default on modern installs. So
 performance around atimes updates is a non-issue there
 these days.

 All the world isn't Linux, but as it turns out, Linux
 is the only platform to implement O_NOATIME in the
 first place.

So it's very unlikely that this code is helping anybody
these days.

Helped-by: Jeff King 
[jc: took idea and log message from peff]
Signed-off-by: Junio C Hamano 
---
 cache.h |  2 +-
 sha1_file.c | 21 -
 2 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/cache.h b/cache.h
index 6f1c21a352..f440d3fd1e 100644
--- a/cache.h
+++ b/cache.h
@@ -1123,7 +1123,7 @@ extern int hash_sha1_file_literally(const void *buf, 
unsigned long len, const ch
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
 extern int git_open_cloexec(const char *name, int flags);
-extern int git_open(const char *name);
+#define git_open(name) git_open_cloexec(name, O_RDONLY)
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
diff --git a/sha1_file.c b/sha1_file.c
index dfbf398183..25d9359402 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -27,14 +27,6 @@
 #include "list.h"
 #include "mergesort.h"
 
-#ifndef O_NOATIME
-#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
-#define O_NOATIME 0100
-#else
-#define O_NOATIME 0
-#endif
-#endif
-
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
@@ -1572,19 +1564,6 @@ int git_open_cloexec(const char *name, int flags)
return fd;
 }
 
-int git_open(const char *name)
-{
-   static int noatime = O_NOATIME;
-   int fd = git_open_cloexec(name, O_RDONLY);
-
-   if (0 <= fd && (noatime & O_NOATIME)) {
-   int flags = fcntl(fd, F_GETFL);
-   if (fcntl(fd, F_SETFL, flags | noatime))
-   noatime = 0;
-   }
-   return fd;
-}
-
 static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
 {
struct alternate_object_database *alt;
-- 
2.10.1-791-g404733b9cf



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Hmph.  This may not fly well in practice, though.  

We can flip the order around, and then it becomes simpler to later
drop atime support.  The first step goes like this.

-- >8 --
From: Junio C Hamano 
Date: Fri, 28 Oct 2016 06:23:07 -0700
Subject: [PATCH] git_open(): untangle possible NOATIME and CLOEXEC interactions

The way we structured the fallback/retry mechanism for opening with
O_NOATIME and O_CLOEXEC meant that if we failed due to lack of
support to open the file with O_NOATIME option (i.e. EINVAL), we
would still try to drop O_CLOEXEC first and retry, and then drop
O_NOATIME.  A platform on which O_NOATIME is defined in the header
without support from the kernel wouldn't have a chance to open with
O_CLOEXEC option due to this code structure.

Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.

Instead use O_CLOEXEC to open the file, and then use fcntl(2) to set
O_NOATIME on the resulting file descriptor.  open(2) itself does not
cause atime to be updated according to Linus [*1*].

The helper to do the former can be usable in the codepath in
ce_compare_data() that was recently added to open a file descriptor
with O_CLOEXEC; use it while we are at it.

*1* 

Signed-off-by: Junio C Hamano 
---
 cache.h  |  1 +
 read-cache.c |  9 +
 sha1_file.c  | 39 +++
 3 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/cache.h b/cache.h
index a902ca1f8e..6f1c21a352 100644
--- a/cache.h
+++ b/cache.h
@@ -1122,6 +1122,7 @@ extern int write_sha1_file(const void *buf, unsigned long 
len, const char *type,
 extern int hash_sha1_file_literally(const void *buf, unsigned long len, const 
char *type, unsigned char *sha1, unsigned flags);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
+extern int git_open_cloexec(const char *name, int flags);
 extern int git_open(const char *name);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
diff --git a/read-cache.c b/read-cache.c
index db5d910642..c27d3e240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -156,14 +156,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct 
stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
int match = -1;
-   static int cloexec = O_CLOEXEC;
-   int fd = open(ce->name, O_RDONLY | cloexec);
-
-   if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) {
-   /* Try again w/o O_CLOEXEC: the kernel might not support it */
-   cloexec &= ~O_CLOEXEC;
-   fd = open(ce->name, O_RDONLY | cloexec);
-   }
+   int fd = git_open_cloexec(ce->name, O_RDONLY);
 
if (fd >= 0) {
unsigned char sha1[20];
diff --git a/sha1_file.c b/sha1_file.c
index 09045df1dc..dfbf398183 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1559,31 +1559,30 @@ int check_sha1_signature(const unsigned char *sha1, 
void *map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-int git_open(const char *name)
+int git_open_cloexec(const char *name, int flags)
 {
-   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
-
-   for (;;) {
-   int fd;
-
-   errno = 0;
-   fd = open(name, O_RDONLY | sha1_file_open_flag);
-   if (fd >= 0)
-   return fd;
+   static int cloexec = O_CLOEXEC;
+   int fd = open(name, flags | cloexec);
 
+   if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) {
/* Try again w/o O_CLOEXEC: the kernel might not support it */
-   if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) {
-   sha1_file_open_flag &= ~O_CLOEXEC;
-   continue;
-   }
+   cloexec &= ~O_CLOEXEC;
+   fd = open(name, flags | cloexec);
+   }
+   return fd;
+}
 
-   /* Might the failure be due to O_NOATIME? */
-   if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
-   sha1_file_open_flag &= ~O_NOATIME;
-   continue;
-   }
-   return -1;
+int git_open(const char *name)
+{
+   static int noatime = O_NOATIME;
+   int fd = git_open_cloexec(name, O_RDONLY);
+
+   if (0 <= fd && (noatime & O_NOATIME)) {
+   int flags = fcntl(fd, F_GETFL);
+   if (fcntl(fd, F_SETFL, flags | noatime))
+   noatime = 0;
}
+   return fd;
 }
 
 static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
-- 
2.10.1-791-g404733b9cf



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Johannes Schindelin
Hi,

On Thu, 27 Oct 2016, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Linus Torvalds  writes:
> >
> >> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano  wrote:
> >>>
> >>> Would the best endgame shape for this function be to open with
> >>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2)
> >>> but ignoring an error from it, I guess?  That would be the closest
> >>> to what we historically had, I would think.
> >>
> >> I think that's the best model.
> >
> > OK, so perhaps like this.
> 
> Hmph.  This may not fly well in practice, though.  
> 
> To Unix folks, CLOEXEC is not a huge correctness issue.  A child
> process may hold onto an open file descriptor a bit longer than the
> lifetime of the parent but as long as the child eventually exits,
> nothing is affected.  Over there, things are different.  The parent
> cannot even rename(2) or unlink(2) a file it created and closed
> while the child is still holding the file descriptor open and the
> lack of CLOEXEC will make the parent fail.  I do not know how well
> fcntl(2) emulation works on Windows, but I would not be surprised
> if J6t or Dscho comes back and says that FD_CLOEXEC given to F_SETFD
> would not work while O_CLOEXEC given to open(2) does.

You guys. I mean: You guys! You sure make my life hard. A brief look at
mingw.h could have answered your implicit question:

static inline int fcntl(int fd, int cmd, ...)
{
if (cmd == F_GETFD || cmd == F_SETFD)
return 0;
errno = EINVAL;
return -1;
}

So while you discuss in your Linux Ivory Tower how to optimize Git for
Linux, and Linux only, I'll have to drop everything else and spend the
rest of my Friday trying to find a way to adjust a file handle
*immediately after opening it with undesired flags* (when it could have
been opened with the desired flags, as suggested, to begin with).

Ciao,
Johannes


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-28 Thread Jeff King
On Thu, Oct 27, 2016 at 03:38:59PM -0700, Linus Torvalds wrote:

> On Thu, Oct 27, 2016 at 3:24 AM, Jeff King  wrote:
> >
> > +cc Linus as the original author of 144bde78e9 in case there is
> > something subtle I'm missing, but this really just seems like it's
> > an outdated optimization.
> 
> I'd *really* like to keep O_NOATIME if at all possible. It made a huge
> difference on older kernels, and I'm not convinced that relatime
> really fixes it as well as O_NOATIME.
> 
> There are people who don't like relatime. And even if you do have
> relatime enabled, it will update atime once every day, so then this
> makes your filesystem have a storm of nasty inode writebacks if you
> haven't touched that git repo in a while.

The existence of "relatime" is only half the story of its outdatedness.

The other half is packfiles, so that we are paying atime only once per
packfile, not once per object (technically once per mmap(), so on a
32-bit system with large packfiles, it would be multiple, depending on
your window size).

So I'm not convinced that "storm" is really the right word in a modern
context. The atime updates due to object accesses are probably smaller
than those from all the other read() calls being done on non-object
files (like config, refs, etc).

That being said, if you really care, it's not that much code to keep.

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Eric Wong
Junio C Hamano  wrote:
> Junio C Hamano  writes:
> 
> > Linus Torvalds  writes:
> >
> >> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano  wrote:
> >>>
> >>> Would the best endgame shape for this function be to open with
> >>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2)
> >>> but ignoring an error from it, I guess?  That would be the closest
> >>> to what we historically had, I would think.
> >>
> >> I think that's the best model.

Actually, I would flip the order of flags.  O_CLOEXEC is more
important from a correctness standpoint.

> > OK, so perhaps like this.
> 
> Hmph.  This may not fly well in practice, though.  
> 
> To Unix folks, CLOEXEC is not a huge correctness issue.  A child
> process may hold onto an open file descriptor a bit longer than the
> lifetime of the parent but as long as the child eventually exits,

I'm not too familiar with C internals of git; but I know we use
threads in some places, and fork+execve in others.

If our usage of threads and execve intersects, and we run
untrusted code in an execve-ed child, then only having cloexec
on open() will save us time when auditing for leaking FDs.

fcntl(fd, F_SETFD, O_CLOEXEC) is racy in if there are other
threads doing execve; so I wouldn't rely on it as a first
choice.

So I suppose something like this:

static int noatime = 1;
int fd = open(... | O_CLOEXEC);
...error checking and retrying...

if (fd >= 0 && noatime && fcntl(fd, F_SETFL, O_NOATIME) != 0)
noatime = 0;

return fd;


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Junio C Hamano
Junio C Hamano  writes:

> Linus Torvalds  writes:
>
>> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano  wrote:
>>>
>>> Would the best endgame shape for this function be to open with
>>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2)
>>> but ignoring an error from it, I guess?  That would be the closest
>>> to what we historically had, I would think.
>>
>> I think that's the best model.
>
> OK, so perhaps like this.

Hmph.  This may not fly well in practice, though.  

To Unix folks, CLOEXEC is not a huge correctness issue.  A child
process may hold onto an open file descriptor a bit longer than the
lifetime of the parent but as long as the child eventually exits,
nothing is affected.  Over there, things are different.  The parent
cannot even rename(2) or unlink(2) a file it created and closed
while the child is still holding the file descriptor open and the
lack of CLOEXEC will make the parent fail.  I do not know how well
fcntl(2) emulation works on Windows, but I would not be surprised
if J6t or Dscho comes back and says that FD_CLOEXEC given to F_SETFD
would not work while O_CLOEXEC given to open(2) does.


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Junio C Hamano
Linus Torvalds  writes:

> On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano  wrote:
>>
>> Would the best endgame shape for this function be to open with
>> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2)
>> but ignoring an error from it, I guess?  That would be the closest
>> to what we historically had, I would think.
>
> I think that's the best model.

OK, so perhaps like this.

-- >8 --
Subject: git_open(): untangle possible NOATIME and CLOEXEC interactions

The way we structured the fallback-retry for opening with O_NOATIME
and O_CLOEXEC meant that if we failed due to lack of support to open
the file with O_NOATIME option (i.e. EINVAL), we would still try to
drop O_CLOEXEC first and retry, and then drop O_NOATIME.  A platform
on which O_NOATIME is defined in the header without support from the
kernel wouldn't have a chance to open with O_CLOEXEC option due to
this code structure.

Arguably, O_CLOEXEC is more important than O_NOATIME, as the latter
is mostly about performance, while the former can affect correctness.
Let's revert the recent changes to the way git_open() attempts to
open a file with O_NOATIME and retries without to the original
sequence, and then use a separate fcntl(fd, F_SETFD, FD_CLOEXEC) on
the resulting file descriptor.  The helper to do the latter can be
usable in the codepath in ce_compare_data() that was recently added
to open a file descriptor with O_CLOEXEC, so let's refactor that
codepath with the helper while we are at it.

Signed-off-by: Junio C Hamano 
---
 git-compat-util.h |  5 +++--
 read-cache.c  | 12 
 sha1_file.c   | 49 ++---
 3 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 43718dabae..a751630db5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -679,9 +679,10 @@ char *gitstrdup(const char *s);
 #define getpagesize() sysconf(_SC_PAGESIZE)
 #endif
 
-#ifndef O_CLOEXEC
-#define O_CLOEXEC 0
+#ifndef FD_CLOEXEC
+#define FD_CLOEXEC 0
 #endif
+extern int git_set_cloexec(int);
 
 #ifdef FREAD_READS_DIRECTORIES
 #ifdef fopen
diff --git a/read-cache.c b/read-cache.c
index db5d910642..fb91514885 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -156,17 +156,13 @@ void fill_stat_cache_info(struct cache_entry *ce, struct 
stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
int match = -1;
-   static int cloexec = O_CLOEXEC;
-   int fd = open(ce->name, O_RDONLY | cloexec);
-
-   if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) {
-   /* Try again w/o O_CLOEXEC: the kernel might not support it */
-   cloexec &= ~O_CLOEXEC;
-   fd = open(ce->name, O_RDONLY | cloexec);
-   }
+   int fd = open(ce->name, O_RDONLY);
 
if (fd >= 0) {
unsigned char sha1[20];
+
+   /* do not let child processes to hold onto the open fd */
+   git_set_cloexec(fd);
if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
match = hashcmp(sha1, ce->oid.hash);
/* index_fd() closed the file descriptor already */
diff --git a/sha1_file.c b/sha1_file.c
index 09045df1dc..41383a6c20 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1559,31 +1559,42 @@ int check_sha1_signature(const unsigned char *sha1, 
void *map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-int git_open(const char *name)
+int git_set_cloexec(int fd)
 {
-   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
+   static int cloexec = FD_CLOEXEC;
 
-   for (;;) {
-   int fd;
+   if (cloexec) {
+   if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0)
+   cloexec = 0;
+   /*
+* We might want to diagnose and complain upon seeing
+* an error from this call, but let's keep the same
+* behaviour as before for now.
+*/
+   }
+   return 0;
+}
 
-   errno = 0;
-   fd = open(name, O_RDONLY | sha1_file_open_flag);
-   if (fd >= 0)
-   return fd;
+int git_open(const char *name)
+{
+   static int noatime = O_NOATIME;
+   int fd;
 
-   /* Try again w/o O_CLOEXEC: the kernel might not support it */
-   if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) {
-   sha1_file_open_flag &= ~O_CLOEXEC;
-   continue;
-   }
+   errno = 0;
+   fd = open(name, O_RDONLY | noatime);
 
-   /* Might the failure be due to O_NOATIME? */
-   if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
-   sha1_file_open_flag &= ~O_NOATIME;
-   continue;
-   }
-   return -1;
+   /* Might the failure be due to O_NOATIME? */
+   if ((noatime & O_NOATIME) && 

Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Linus Torvalds
On Thu, Oct 27, 2016 at 4:36 PM, Junio C Hamano  wrote:
>
> Would the best endgame shape for this function be to open with
> O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2)
> but ignoring an error from it, I guess?  That would be the closest
> to what we historically had, I would think.

I think that's the best model.

Note that the O_NOATIME code is very much designed to try O_NOATIME
only _once_. Because even when the kernel supports O_NOATIME, if you
have a shared object tree where you may not be the owner of all the
files, a O_NOATIME open can fail with NOPERM ("You are not allowed to
hide your accesses to this file").

This is why it uses that

static unsigned int sha1_file_open_flag = O_NOATIME;

and if the O_NOATIME open ever fails (and the no-O_NOATIME open
succeeds), it clears that flag. Exactly so that it will *not* end up
in some kind of "let's open and fail and re-open" loop. It's designed
to fail once.

Or at least that's how it used to be originally. This code has
obviously changed since that early design. Now it seems to clear it
for any non-ENOENT error. Which looks fine too.

  Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Junio C Hamano
Linus Torvalds  writes:

> But the basic issue still remains - I'd really prefer to have NOATIME
> stay around for all those poor misguided souls that for some reason
> don't like "relatime" or run old kernels. But whether it is with
> O_NOATIME at open time or with F_SETFL, I don't care.

Understood.  

Would the best endgame shape for this function be to open with
O_NOATIME (and retry without), and then add CLOEXEC with fcntl(2)
but ignoring an error from it, I guess?  That would be the closest
to what we historically had, I would think.





Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Linus Torvalds
On Thu, Oct 27, 2016 at 4:09 PM, Linus Torvalds
 wrote:
>
> That said, now that I think about it, I should double-check: maybe
> open() doesn't actually set atime at all, and we *could* do NOATIME
> with SETFL after all.

Checked. Yup. O_NOATIME could easily be done with SETFL:, because as
with O_CLOEXEC, it only affects operations _after_ the open. The open
itself doesn't set the access time.

So I was full of it.

But the basic issue still remains - I'd really prefer to have NOATIME
stay around for all those poor misguided souls that for some reason
don't like "relatime" or run old kernels. But whether it is with
O_NOATIME at open time or with F_SETFL, I don't care.

Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Linus Torvalds
On Thu, Oct 27, 2016 at 3:56 PM, Junio C Hamano  wrote:
>
> I also thought O_NOATIME shouldn't be an effective fcntl(2) thing,
> for the reasons you stated above, when I was studying the area
> because of the discussion on these patches.  I was surprised to see
> that http://man7.org/linux/man-pages/man2/fcntl.2.html contradicts
> by saying F_SETFL can take O_NOATIME.
>
> Perhaps it deserves a bugreport?

It's not a bug per se.

You can indeed change O_NOATIME with F_SETFL. And it actually does
matter, since atime is normally changed by mmap(), read() and write()
calls.

So F_SETFL O_NOATIME actually does make sense, but it doesn't cover
the atime update done by open() itself.

That said, now that I think about it, I should double-check: maybe
open() doesn't actually set atime at all, and we *could* do NOATIME
with SETFL after all.

The exact [acm]time semantics are damn subtle.

 Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Junio C Hamano
Linus Torvalds  writes:

> Note that you can *not* do the same thing with O_NOATIME, since the
> whole point of O_NOATIME is that it changes the behavior of the open
> itself (unlike O_CLOEXEC which changes _later_ behavior, and can
> always be replaced by FD_CLOEXEC fnclt modulo races that are
> immaterial for git)

A tangent.  

I also thought O_NOATIME shouldn't be an effective fcntl(2) thing,
for the reasons you stated above, when I was studying the area
because of the discussion on these patches.  I was surprised to see
that http://man7.org/linux/man-pages/man2/fcntl.2.html contradicts
by saying F_SETFL can take O_NOATIME.  

Perhaps it deserves a bugreport?




Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Linus Torvalds
On Thu, Oct 27, 2016 at 3:24 AM, Jeff King  wrote:
>
> +cc Linus as the original author of 144bde78e9 in case there is
> something subtle I'm missing, but this really just seems like it's
> an outdated optimization.

I'd *really* like to keep O_NOATIME if at all possible. It made a huge
difference on older kernels, and I'm not convinced that relatime
really fixes it as well as O_NOATIME.

There are people who don't like relatime. And even if you do have
relatime enabled, it will update atime once every day, so then this
makes your filesystem have a storm of nasty inode writebacks if you
haven't touched that git repo in a while.

If this is purely about mixing things up with O_CLOEXEC, then that is
*trivially* fixable by just using

 fcntl(fd, F_SETFD, FD_CLOEXEC);

after the open.

Which is what you have to do anyway if you want to be portable, so
its' not like you could avoid that complexity in the first place.

Note that you can *not* do the same thing with O_NOATIME, since the
whole point of O_NOATIME is that it changes the behavior of the open
itself (unlike O_CLOEXEC which changes _later_ behavior, and can
always be replaced by FD_CLOEXEC fnclt modulo races that are
immaterial for git)

 Linus


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Junio C Hamano
Jeff King  writes:

> +cc Linus as the original author of 144bde78e9 in case there is
> something subtle I'm missing, but this really just seems like it's
> an outdated optimization.
>
> -- >8 --
> Subject: [PATCH] sha1_file: stop opening files with O_NOATIME
>
> When we open object files, we try to do so with O_NOATIME.
> This dates back to 144bde78e9 (Use O_NOATIME when opening
> the sha1 files., 2005-04-23), which is an optimization to
> avoid creating a bunch of dirty inodes when we're accessing
> many objects.  But a few things have changed since then:
>
>   1. In June 2005, git learned about packfiles, which means
>  we would do a lot fewer atime updates (rather than one
>  per object access, we'd generally get one per packfile).
>
>   2. In late 2006, Linux learned about "relatime", which is
>  generally the default on modern installs. So
>  performance around atimes updates is a non-issue there
>  these days.
>
>  All the world isn't Linux, but as it turns out, Linux
>  is the only platform to implement O_NOATIME in the
>  first place.
>
> So it's very unlikely that this code is helping anybody
> these days.
>
> It's not a particularly large amount of code, but the
> fallback-retry creates complexity. E.g., we do a similar
> fallback for CLOEXEC; which one should take precedence, or
> should we try all possible combinations? Dropping O_NOATIME
> makes those questions go away.
>
> Signed-off-by: Jeff King 
> ---

We may want to lose the surrounding for (;;) loop as there is only
one flag to retry without, which was the original code structure
back when 144bde78e9 ("Use O_NOATIME when opening the sha1 files.",
2005-04-23) was written and refactored by 44d1c19ee8 ("Make loose
object file reading more careful", 2008-06-14).

IOW, this on top.  The update to ce_compare_data() Lars has in
a0a6cb9662 ("read-cache: make sure file handles are not inherited by
child processes", 2016-10-24) could then made into a call to
git_open().

 sha1_file.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 6f02a57d8b..e18ea053e6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1553,24 +1553,17 @@ int check_sha1_signature(const unsigned char *sha1, 
void *map,
 
 int git_open(const char *name)
 {
-   static int sha1_file_open_flag = O_CLOEXEC;
-
-   for (;;) {
-   int fd;
-
-   errno = 0;
-   fd = open(name, O_RDONLY | sha1_file_open_flag);
-   if (fd >= 0)
-   return fd;
+   static int cloexec = O_CLOEXEC;
+   int fd;
 
+   errno = 0;
+   fd = open(name, O_RDONLY | cloexec);
+   if ((cloexec & O_CLOEXEC) && fd < 0 && errno == EINVAL) {
/* Try again w/o O_CLOEXEC: the kernel might not support it */
-   if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) {
-   sha1_file_open_flag &= ~O_CLOEXEC;
-   continue;
-   }
-
-   return -1;
+   cloexec &= ~O_CLOEXEC;
+   fd = open(name, O_RDONLY | cloexec);
}
+   return fd;
 }
 
 static int stat_sha1_file(const unsigned char *sha1, struct stat *st)


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-27 Thread Jeff King
On Wed, Oct 26, 2016 at 02:15:33PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote:
> >
> >> > I actually wonder if it is worth carrying around the O_NOATIME hack at
> >> > all.
> >> 
> >> Yes, I share the thought.  We no longer have too many loose objects
> >> to matter.
> >> 
> >> I do not mind flipping the order, but I'd prefer to cook the result
> >> even longer.  I am tempted to suggest we take two step route:
> >> 
> >>  - ship 2.11 with the "atime has been there and we won't regress it"
> >>shape, while cooking the "cloexec is semantically more
> >>important" version in 'next' during the feature freeze
> >> 
> >>  - immediately after 2.11 merge it to 'master' for 2.12 to make sure
> >>there is no fallout.
> >
> > That sounds reasonable, though I'd consider jumping straight to "NOATIME
> > is not worth it; drop it" as the patch for post-2.11.
> 
> That endgame is fine by me too.  Thanks for a sanity-check.

So here's that endgame patch. My main concern with it was that there
might be non-Linux systems that could be affected. But when I dug into
it, I found that this code was never activated anywhere besides Linux in
the first place. So I really doubt this will have any negative impact at
all. I certainly don't mind cooking it until post-2.11, though.

+cc Linus as the original author of 144bde78e9 in case there is
something subtle I'm missing, but this really just seems like it's
an outdated optimization.

-- >8 --
Subject: [PATCH] sha1_file: stop opening files with O_NOATIME

When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects.  But a few things have changed since then:

  1. In June 2005, git learned about packfiles, which means
 we would do a lot fewer atime updates (rather than one
 per object access, we'd generally get one per packfile).

  2. In late 2006, Linux learned about "relatime", which is
 generally the default on modern installs. So
 performance around atimes updates is a non-issue there
 these days.

 All the world isn't Linux, but as it turns out, Linux
 is the only platform to implement O_NOATIME in the
 first place.

So it's very unlikely that this code is helping anybody
these days.

It's not a particularly large amount of code, but the
fallback-retry creates complexity. E.g., we do a similar
fallback for CLOEXEC; which one should take precedence, or
should we try all possible combinations? Dropping O_NOATIME
makes those questions go away.

Signed-off-by: Jeff King 
---
 sha1_file.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 09045df1dc..6f02a57d8b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -27,14 +27,6 @@
 #include "list.h"
 #include "mergesort.h"
 
-#ifndef O_NOATIME
-#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
-#define O_NOATIME 0100
-#else
-#define O_NOATIME 0
-#endif
-#endif
-
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
@@ -1561,7 +1553,7 @@ int check_sha1_signature(const unsigned char *sha1, void 
*map,
 
 int git_open(const char *name)
 {
-   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
+   static int sha1_file_open_flag = O_CLOEXEC;
 
for (;;) {
int fd;
@@ -1577,11 +1569,6 @@ int git_open(const char *name)
continue;
}
 
-   /* Might the failure be due to O_NOATIME? */
-   if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
-   sha1_file_open_flag &= ~O_NOATIME;
-   continue;
-   }
return -1;
}
 }
-- 
2.10.1.916.g0d2035c



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote:
>
>> > I actually wonder if it is worth carrying around the O_NOATIME hack at
>> > all.
>> 
>> Yes, I share the thought.  We no longer have too many loose objects
>> to matter.
>> 
>> I do not mind flipping the order, but I'd prefer to cook the result
>> even longer.  I am tempted to suggest we take two step route:
>> 
>>  - ship 2.11 with the "atime has been there and we won't regress it"
>>shape, while cooking the "cloexec is semantically more
>>important" version in 'next' during the feature freeze
>> 
>>  - immediately after 2.11 merge it to 'master' for 2.12 to make sure
>>there is no fallout.
>
> That sounds reasonable, though I'd consider jumping straight to "NOATIME
> is not worth it; drop it" as the patch for post-2.11.

That endgame is fine by me too.  Thanks for a sanity-check.


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 10:52:41AM -0700, Junio C Hamano wrote:

> > I actually wonder if it is worth carrying around the O_NOATIME hack at
> > all.
> 
> Yes, I share the thought.  We no longer have too many loose objects
> to matter.
> 
> I do not mind flipping the order, but I'd prefer to cook the result
> even longer.  I am tempted to suggest we take two step route:
> 
>  - ship 2.11 with the "atime has been there and we won't regress it"
>shape, while cooking the "cloexec is semantically more
>important" version in 'next' during the feature freeze
> 
>  - immediately after 2.11 merge it to 'master' for 2.12 to make sure
>there is no fallout.

That sounds reasonable, though I'd consider jumping straight to "NOATIME
is not worth it; drop it" as the patch for post-2.11.

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Junio C Hamano
Jeff King  writes:

> Of the two flags, I would say CLOEXEC is the more important one to
> respect because it may actually impact correctness (e.g., leaking
> descriptors to sub-processes). Whereas O_NOATIME is purely a performance
> optimization.

I tend to agree.

> I actually wonder if it is worth carrying around the O_NOATIME hack at
> all.

Yes, I share the thought.  We no longer have too many loose objects
to matter.

I do not mind flipping the order, but I'd prefer to cook the result
even longer.  I am tempted to suggest we take two step route:

 - ship 2.11 with the "atime has been there and we won't regress it"
   shape, while cooking the "cloexec is semantically more
   important" version in 'next' during the feature freeze

 - immediately after 2.11 merge it to 'master' for 2.12 to make sure
   there is no fallout.



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Jeff King
On Wed, Oct 26, 2016 at 09:23:21AM -0700, Junio C Hamano wrote:

> >> +  /* Might the failure be due to O_NOATIME? */
> >> +  if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
> >> +  sha1_file_open_flag &= ~O_NOATIME;
> >> +  continue;
> >> +  }
> >
> > We drop O_NOATIME, and end up with an empty flag field.
> >
> > But we will never have tried just O_CLOEXEC, which might have worked.
> 
> Yes, doing so would smudge atime, so one question is which one
> between noatime or cloexec is more important to be done at open(2)
> time.

Yes, but the missing case is one where we know that O_NOATIME does not
work (but O_CLOEXEC does), so we know we have to smudge the atime.

Of the two flags, I would say CLOEXEC is the more important one to
respect because it may actually impact correctness (e.g., leaking
descriptors to sub-processes). Whereas O_NOATIME is purely a performance
optimization.

I actually wonder if it is worth carrying around the O_NOATIME hack at
all.  Linus added it on 2005-04-23 via 144bde78e9; the aim was to reduce
the cost of opening loose object files. Some things have changed since
then:

  1. In June 2005, git learned about packfiles, which means we would do
 a lot fewer atime updates (rather than one per object access, we'd
 generally get one per packfile).

  2. In late 2006, Linux learned about "relatime", which is generally
 the default on modern installs. So performance around atime updates
 is a non-issue there these days.

 All the world isn't Linux, of course, but I can't help that feel
 that atime performance hackery is something that belongs at the
 system level, not in individual applications.

So I don't have hard numbers, but I'd be surprised if O_NOATIME is
really buying us anything these days.

-Peff


Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-26 Thread Junio C Hamano
Jeff King  writes:

>> +/* Try again w/o O_CLOEXEC: the kernel might not support it */
>> +if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) {
>> +sha1_file_open_flag &= ~O_CLOEXEC;
>>  continue;
>>  }
>
> So if we start with O_CLOEXEC|O_NOATIME, we drop CLOEXEC here and try
> again with just O_NOATIME. And then if _that_ fails...
>
>> +/* Might the failure be due to O_NOATIME? */
>> +if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
>> +sha1_file_open_flag &= ~O_NOATIME;
>> +continue;
>> +}
>
> We drop O_NOATIME, and end up with an empty flag field.
>
> But we will never have tried just O_CLOEXEC, which might have worked.

Yes, doing so would smudge atime, so one question is which one
between noatime or cloexec is more important to be done at open(2)
time.

It may be possible to open(2) only with cloexec and then fcntl(2)
FD_SET noatime immediately after, but going that way would explode
the combination even more, as it may not be possible to set these
two flags the other way around.

> I'm not sure it's worth worrying about or not; I don't know which
> systems are actually lacking either of the flags, or if they tend to
> have both.



Re: [PATCH v3 2/3] sha1_file: open window into packfiles with O_CLOEXEC

2016-10-25 Thread Jeff King
On Tue, Oct 25, 2016 at 11:16:20AM -0700, Junio C Hamano wrote:

> diff --git a/sha1_file.c b/sha1_file.c
> index 5d2bcd3ed1..09045df1dc 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1571,12 +1571,17 @@ int git_open(const char *name)
>   if (fd >= 0)
>   return fd;
>  
> - /* Might the failure be due to O_NOATIME? */
> - if (errno != ENOENT && sha1_file_open_flag) {
> - sha1_file_open_flag = 0;
> + /* Try again w/o O_CLOEXEC: the kernel might not support it */
> + if ((sha1_file_open_flag & O_CLOEXEC) && errno == EINVAL) {
> + sha1_file_open_flag &= ~O_CLOEXEC;
>   continue;
>   }

So if we start with O_CLOEXEC|O_NOATIME, we drop CLOEXEC here and try
again with just O_NOATIME. And then if _that_ fails...

> + /* Might the failure be due to O_NOATIME? */
> + if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
> + sha1_file_open_flag &= ~O_NOATIME;
> + continue;
> + }

We drop O_NOATIME, and end up with an empty flag field.

But we will never have tried just O_CLOEXEC, which might have worked.

Because of the order here, this would not be a regression (i.e., any
system that used to work will still eventually find a working comb), but
it does mean that systems without O_NOATIME do not get the benefit of
the new O_CLOEXEC protection.

Unfortunately, I think covering all of the cases would be 2^nr_flags.
That's only 4 right now, but it does not bode well as a pattern.

I'm not sure it's worth worrying about or not; I don't know which
systems are actually lacking either of the flags, or if they tend to
have both.

-Peff