[PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-15 Thread Ben Peart
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.

Signed-off-by: Ben Peart 
---
 Makefile   |   1 +
 apply.c|   2 +-
 builtin/update-index.c |   2 +
 cache.h|  10 +-
 config.c   |  14 +++
 config.h   |   1 +
 diff-lib.c |   2 +
 dir.c  |  27 --
 dir.h  |   2 +
 entry.c|   4 +-
 environment.c  |   1 +
 fsmonitor.c| 253 +
 fsmonitor.h|  61 
 preload-index.c|   6 +-
 read-cache.c   |  49 --
 submodule.c|   2 +-
 unpack-trees.c |   8 +-
 17 files changed, 419 insertions(+), 26 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index f2bb7f2f63..9d6ec9c1e9 100644
--- a/Makefile
+++ b/Makefile
@@ -786,6 +786,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/apply.c b/apply.c
index 71cbbd141c..9061cc5f15 100644
--- a/apply.c
+++ b/apply.c
@@ -3399,7 +3399,7 @@ static int verify_index_match(const struct cache_entry 
*ce, struct stat *st)
return -1;
return 0;
}
-   return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+   return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR);
 }
 
 #define SUBMODULE_PATCH_WITHOUT_INDEX 1
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e1ca0759d5..6f39ee9274 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -16,6 +16,7 @@
 #include "pathspec.h"
 #include "dir.h"
 #include "split-index.h"
+#include "fsmonitor.h"
 
 /*
  * Default to not allowing changes to the list of files. The
@@ -233,6 +234,7 @@ static int mark_ce_flags(const char *path, int flag, int 
mark)
else
active_cache[pos]->ce_flags &= ~flag;
active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+   mark_fsmonitor_invalid(&the_index, active_cache[pos]);
cache_tree_invalidate_path(&the_index, path);
active_cache_changed |= CE_ENTRY_CHANGED;
return 0;
diff --git a/cache.h b/cache.h
index a916bc79e3..eccab968bd 100644
--- a/cache.h
+++ b/cache.h
@@ -203,6 +203,7 @@ struct cache_entry {
 #define CE_ADDED (1 << 19)
 
 #define CE_HASHED(1 << 20)
+#define CE_FSMONITOR_VALID   (1 << 21)
 #define CE_WT_REMOVE (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED(1 << 23)
 
@@ -326,6 +327,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED(1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -344,6 +346,7 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   uint64_t fsmonitor_last_update;
 };
 
 extern struct index_state the_index;
@@ -679,8 +682,10 @@ extern void *read_blob_data_from_index(const struct 
index_state *, const char *,
 #define CE_MATCH_IGNORE_MISSING0x08
 /* enable stat refresh */
 #define CE_MATCH_REFRESH   0x10
-extern int ie_match_stat(const struct index_state *, const struct cache_entry 
*, struct stat *, unsigned int);
-extern int ie_modified(const struct index_state *, const struct cache_entry *, 
struct stat *, unsigned int);
+/* do stat comparison even if CE_FSMONITOR_VALID is true */
+#define CE_MATCH_IGNORE_FSMONITOR 0X20
+

RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-15 Thread David Turner
> -Original Message-
> From: Ben Peart [mailto:benpe...@microsoft.com]
> Sent: Friday, September 15, 2017 3:21 PM
> To: benpe...@microsoft.com
> Cc: David Turner ; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file 
> system
> monitor to speed up detecting new or changed files.
 
> +int git_config_get_fsmonitor(void)
> +{
> + if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> + core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> +
> + if (core_fsmonitor && !*core_fsmonitor)
> + core_fsmonitor = NULL;
> +
> + if (core_fsmonitor)
> + return 1;
> +
> + return 0;
> +}

This functions return values are backwards relative to the rest of the 
git_config_* functions.

[snip]

+>  /*
+>   * With fsmonitor, we can trust the untracked cache's valid field.
+>   */

[snip]

> +int read_fsmonitor_extension(struct index_state *istate, const void *data,
> + unsigned long sz)
> +{

If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.

[snip]

> + /* a fsmonitor process can return '*' to indicate all entries are 
> invalid */

That's not documented in your documentation.  Also, I'm not sure I like it: 
what 
if I have a file whose name starts with '*'?  Yeah, that would be silly, but 
this indicates the need 
for the protocol to have some sort of signaling mechanism that's out-of-band  
Maybe 
have some key\0value\0 pairs and then \0\0 and then the list of files?  Or, if 
you want to keep
it really simple, allow an entry of '/' (which is an invalid filename) to mean 
'all'.

> +void add_fsmonitor(struct index_state *istate) {
> + int i;
> +
> + if (!istate->fsmonitor_last_update) {
[snip]
> + /* reset the untracked cache */

Is this really necessary?  Shouldn't the untracked cache be in a correct state 
already? 

> +/*
> + * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate

Nit: "s/entries/entry's/".
 



Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread Ben Peart

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file 
system
monitor to speed up detecting new or changed files.
  

+int git_config_get_fsmonitor(void)
+{
+   if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+   if (core_fsmonitor && !*core_fsmonitor)
+   core_fsmonitor = NULL;
+
+   if (core_fsmonitor)
+   return 1;
+
+   return 0;
+}


This functions return values are backwards relative to the rest of the 
git_config_* functions.


I'm confused.  If core.fsmonitor is configured, it returns 1. If it is 
not configured, it returns 0. I don't make use of the -1 /* default 
value */ option as I didn't see any use/value in this case. What is 
backwards?




[snip]

+>   /*
+>* With fsmonitor, we can trust the untracked cache's valid field.
+>*/



Did you intend to make a comment here?


[snip]


+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+   unsigned long sz)
+{


If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.



Good catch!  Thank you.


[snip]


+   /* a fsmonitor process can return '*' to indicate all entries are 
invalid */


That's not documented in your documentation.  Also, I'm not sure I like it: what
if I have a file whose name starts with '*'?  Yeah, that would be silly, but 
this indicates the need
for the protocol to have some sort of signaling mechanism that's out-of-band  
Maybe
have some key\0value\0 pairs and then \0\0 and then the list of files?  Or, if 
you want to keep
it really simple, allow an entry of '/' (which is an invalid filename) to mean 
'all'.



Yea, this was an optimization I added late in the game to get around an 
issue in Watchman where it returns the name of every file the first time 
you query it (rather than the set of files that have actually changed 
since the requested time).


I didn't realize the wild card '*' was a valid character for a filename. 
 I'll switch to '/' as you suggest as I don't want to complicate things 
unnecessarily to handle this relatively rare optimization.  I'll also 
get it documented properly.  Thanks!



+void add_fsmonitor(struct index_state *istate) {
+   int i;
+
+   if (!istate->fsmonitor_last_update) {

[snip]

+   /* reset the untracked cache */


Is this really necessary?  Shouldn't the untracked cache be in a correct state 
already?



When fsmonitor is not turned on, I'm not explicitly invalidating 
untracked cache directory entries as git makes changes to files. While I 
doubt the sequence happens of 1) git making changes to files, *then* 2) 
turning on fsmonitor - I thought it better safe than sorry to assume 
that pattern won't ever happen in the future.  Especially since turning 
on the extension is rare and the cost is low.



+/*
+ * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate


Nit: "s/entries/entry's/".
  



RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread David Turner
> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, September 18, 2017 9:07 AM
> To: David Turner ; 'Ben Peart'
> 
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a 
> file
> system monitor to speed up detecting new or changed files.
> 
> Thanks for taking the time to review/provide feedback!
> 
> On 9/15/2017 5:35 PM, David Turner wrote:
> >> -Original Message-
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Friday, September 15, 2017 3:21 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner ; ava...@gmail.com;
> >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
> >> a file system monitor to speed up detecting new or changed files.
> >
> >> +int git_config_get_fsmonitor(void)
> >> +{
> >> +  if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> >> +  core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> >> +
> >> +  if (core_fsmonitor && !*core_fsmonitor)
> >> +  core_fsmonitor = NULL;
> >> +
> >> +  if (core_fsmonitor)
> >> +  return 1;
> >> +
> >> +  return 0;
> >> +}
> >
> > This functions return values are backwards relative to the rest of the
> git_config_* functions.
> 
> I'm confused.  If core.fsmonitor is configured, it returns 1. If it is not
> configured, it returns 0. I don't make use of the -1 /* default value */ 
> option
> as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

> > [snip]
> >
> > +>  /*
> > +>   * With fsmonitor, we can trust the untracked cache's valid field.
> > +>   */
> >
> 
> Did you intend to make a comment here?

Sorry.  I was going to make a comment that I didn't see how that could work 
since we weren't touching the untracked cache here, but then I saw the bit 
further down.   I'm still not sure it works (see comment on 10/12), but at
least it could in theory work.
 



Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread Ben Peart



On 9/18/2017 9:32 AM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:peart...@gmail.com]
Sent: Monday, September 18, 2017 9:07 AM
To: David Turner ; 'Ben Peart'

Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
p...@peff.net
Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file
system monitor to speed up detecting new or changed files.

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
a file system monitor to speed up detecting new or changed files.



+int git_config_get_fsmonitor(void)
+{
+   if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+   if (core_fsmonitor && !*core_fsmonitor)
+   core_fsmonitor = NULL;
+
+   if (core_fsmonitor)
+   return 1;
+
+   return 0;
+}


This functions return values are backwards relative to the rest of the

git_config_* functions.

I'm confused.  If core.fsmonitor is configured, it returns 1. If it is not
configured, it returns 0. I don't make use of the -1 /* default value */ option
as I didn't see any use/value in this case. What is backwards?


The other git_config_* functions return 1 for error and 0 for success.


Hmm, I followed the model (ie copy/paste :)) used by the tracked cache. 
If you look at how it uses, the return value, it is 0 == false == remove 
the extension, 1 == true == add the extension.  I'm doing the same with 
fsmonitor.


static void tweak_untracked_cache(struct index_state *istate)
{
switch (git_config_get_untracked_cache()) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
remove_untracked_cache(istate);
break;
case 1: /* true */
add_untracked_cache(istate);
break;
default: /* unknown value: do nothing */
break;
}
}

void tweak_fsmonitor(struct index_state *istate)
{
switch (git_config_get_fsmonitor()) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
remove_fsmonitor(istate);
break;
case 1: /* true */
add_fsmonitor(istate);
break;
default: /* unknown value: do nothing */
break;
}
}




[snip]

+>   /*
+>* With fsmonitor, we can trust the untracked cache's valid field.
+>*/



Did you intend to make a comment here?


Sorry.  I was going to make a comment that I didn't see how that could work
since we weren't touching the untracked cache here, but then I saw the bit
further down.   I'm still not sure it works (see comment on 10/12), but at
least it could in theory work.
  



The logic here assumes that when the index is written out, it is valid 
including the untracked cache and the CE_FSMONITOR_VALID bits. 
Therefore it should still all be valid as of the time the fsmonitor was 
queried and the index saved.


Another way of thinking about this is that the fsmonitor extension is 
simply adding another persisted bit on each index entry.  It just gets 
persisted/restored differently than the other persisted bits.


Obviously, before we can use it assuming it reflects the *current* state 
of the working directory, we have to refresh the bits via the refresh logic.