Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-17 Thread Jeff King
On Wed, May 17, 2017 at 08:15:03PM +0200, Johannes Sixt wrote:

> Am 17.05.2017 um 16:26 schrieb Ben Peart:
> > On 5/16/2017 3:13 PM, Johannes Sixt wrote:
> > > Am 16.05.2017 um 19:17 schrieb Ben Peart:
> > > > OK, now I'm confused as to the best path for adding a get_be64.  This
> > > > one is trivial:
> > > > 
> > > > #define get_be64(p) ntohll(*(uint64_t *)(p))
> > > 
> > > I cringe when I see a cast like this. Unless you can guarantee that p is
> > > char* (bare or signed or unsigned), you fall prey to strict aliasing
> > > violations, aka undefined behavior. And I'm not even mentioning correct
> > > alignment, yet.
> > 
> > Note, this macro is only used where the CPU architecture is OK with
> > unaligned memory access.
> 
> I'm not worried about the unaligned memory access: It either works, or we
> get a SIGBUS. The undefined behavior is more worrisome because the code may
> work or not, and we can never be sure which it is.

I don't think there's much we can do, though. That's how all of the
get_be* macros are designed to work (and there's really no point in
using them on something that isn't a char pointer).

I agree it would be nice to have some type safety there if we can get
it, though. I wonder if:

  static inline uint32_t get_be32(unsigned char *p)
  {
	return ntohl(*(unsigned int *)p);
  }

would generate the same code. It does mean we may have problems between
signed/unsigned buffers, though.

-Peff
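
As an illustration of the type-safety point above, here is a minimal sketch of an inline function that reads the bytes one at a time instead of casting the pointer. It is not the get_be32 from compat/bswap.h; the name sketch_get_be32 and the choice of a const void * parameter are assumptions made for this example. Taking void * sidesteps the signed/unsigned buffer question, and because the access goes through unsigned char, neither strict aliasing nor alignment is an issue.

```c
#include <stdint.h>

/*
 * Hypothetical name; not the get_be32 from git's compat/bswap.h.
 * Reads four bytes in big-endian order without casting the pointer
 * to a wider type, so the load is valid at any address.
 */
static inline uint32_t sketch_get_be32(const void *ptr)
{
	const unsigned char *p = ptr;

	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] <<  8) |
	       ((uint32_t)p[3] <<  0);
}
```

Whether a given compiler turns this into a single load plus byte swap is something to check in the generated code rather than assume.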


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-17 Thread Johannes Sixt

Am 17.05.2017 um 16:26 schrieb Ben Peart:

On 5/16/2017 3:13 PM, Johannes Sixt wrote:

Am 16.05.2017 um 19:17 schrieb Ben Peart:

OK, now I'm confused as to the best path for adding a get_be64.  This
one is trivial:

#define get_be64(p) ntohll(*(uint64_t *)(p))


I cringe when I see a cast like this. Unless you can guarantee that p is
char* (bare or signed or unsigned), you fall prey to strict aliasing
violations, aka undefined behavior. And I'm not even mentioning correct
alignment, yet.


Note, this macro is only used where the CPU architecture is OK with 
unaligned memory access.


I'm not worried about the unaligned memory access: It either works, or 
we get a SIGBUS. The undefined behavior is more worrisome because the 
code may work or not, and we can never be sure which it is.


-- Hannes


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-17 Thread Ben Peart



On 5/16/2017 3:13 PM, Johannes Sixt wrote:

Am 16.05.2017 um 19:17 schrieb Ben Peart:

OK, now I'm confused as to the best path for adding a get_be64.  This
one is trivial:

#define get_be64(p) ntohll(*(uint64_t *)(p))


I cringe when I see a cast like this. Unless you can guarantee that p is
char* (bare or signed or unsigned), you fall prey to strict aliasing
violations, aka undefined behavior. And I'm not even mentioning correct
alignment, yet.

-- Hannes


Note, this macro is only used where the CPU architecture is OK with 
unaligned memory access.  You can see it in context with many similar 
macros and casts in bswap.h.  It's outside the scope of this patch 
series to fix them all.  Perhaps a separate patch series?


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-16 Thread Ben Peart



On 5/16/2017 5:41 PM, Jonathan Tan wrote:

I'm not very familiar with this part of the code - here is a partial
review.

Firstly, if someone invokes update-index, I wonder if it's better just
to do a full refresh (e.g. by deleting the last_update time from the
index).


A full refresh can be very expensive when the working directory is large 
(the specific case this patch series is trying to improve).  Instead, 
the code does the minimal update required to keep things fast but still 
return correct results.




Also, the change to unpack-trees.c doesn't match my mental model. I
notice that it is in a function related to sparse checkout, but if the
working tree changes for whatever reason, it seems simpler to just let
the hook do its thing. As far as I can tell, it is fine to have files
overzealously marked as FSMONITOR_DIRTY.


The case this (and the others like it) is solving is when the index is 
updated but there may not be any change to the associated file in the 
working directory.  When this occurs, the hook won't indicate any change 
has happened so the index and working directory could be out of sync. 
To be sure this doesn't happen, the index entry is marked 
CE_FSMONITOR_DIRTY to ensure the file is checked.


This is pretty simple to demonstrate - a simple "git reset HEAD~1" will 
do it as a mixed reset updates the index but doesn't touch the files in 
the working directory.




On 05/15/2017 12:13 PM, Ben Peart wrote:

diff --git a/cache.h b/cache.h
index 40ec032a2d..64aa6e57cd 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED             (1 << 19)

 #define CE_HASHED            (1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED        (1 << 23)

@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED   (1 << 5)
 #define SPLIT_INDEX_ORDERED  (1 << 6)
 #define UNTRACKED_CHANGED    (1 << 7)
+#define FSMONITOR_CHANGED    (1 << 8)

 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	time_t last_update;
+	struct ewah_bitmap *bitmap;


Here a bitmap is introduced, presumably corresponding to the entries in
"struct cache_entry **cache", but there is also a CE_FSMONITOR_DIRTY
that can be set in each "struct cache_entry". This seems redundant and
probably at least worth explaining in a comment.



The ewah bitmap is loaded from the index extension and saved until it 
can be processed after the untracked cache has been loaded and 
initialized in post_read_index_from().  I'm not opposed to documenting 
that to make it clearer but I've just followed the same pattern the 
untracked cache, and split index extensions use which don't specifically 
document it either.



+/*
+ * Call the query-fsmonitor hook passing the time of the last saved results.
+ */
+static int query_fsmonitor(time_t last_update, struct strbuf *buffer)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char date[64];
+	const char *argv[3];
+
+	if (!(argv[0] = find_hook("query-fsmonitor")))
+		return -1;
+
+	snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
+	argv[1] = date;
+	argv[2] = NULL;
+	cp.argv = argv;
+	cp.out = -1;
+
+	return capture_command(&cp, buffer, 1024);
+}


Output argument could probably be named better.


I agree.  I've renamed it query_result for the next iteration.



Also, would the output of this command be very large? If yes, it might
be better to process it little by little instead of buffering the whole
thing first.



The output is usually quite small as it is the list of files modified 
in the working directory since the last command that requested the 
updated list.



+void write_fsmonitor_extension(struct strbuf *sb, struct index_state* istate);


Space before * (in the .h and .c files).



Thanks, missed that.  I'll fix it for the next iteration.



Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-16 Thread Jonathan Tan

I'm not very familiar with this part of the code - here is a partial review.

Firstly, if someone invokes update-index, I wonder if it's better just 
to do a full refresh (e.g. by deleting the last_update time from the index).


Also, the change to unpack-trees.c doesn't match my mental model. I 
notice that it is in a function related to sparse checkout, but if the 
working tree changes for whatever reason, it seems simpler to just let 
the hook do its thing. As far as I can tell, it is fine to have files 
overzealously marked as FSMONITOR_DIRTY.


On 05/15/2017 12:13 PM, Ben Peart wrote:

diff --git a/cache.h b/cache.h
index 40ec032a2d..64aa6e57cd 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED             (1 << 19)

 #define CE_HASHED            (1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED        (1 << 23)

@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED (1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)

 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	time_t last_update;
+	struct ewah_bitmap *bitmap;


Here a bitmap is introduced, presumably corresponding to the entries in 
"struct cache_entry **cache", but there is also a CE_FSMONITOR_DIRTY 
that can be set in each "struct cache_entry". This seems redundant and 
probably at least worth explaining in a comment.



+/*
+ * Call the query-fsmonitor hook passing the time of the last saved results.
+ */
+static int query_fsmonitor(time_t last_update, struct strbuf *buffer)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char date[64];
+	const char *argv[3];
+
+	if (!(argv[0] = find_hook("query-fsmonitor")))
+		return -1;
+
+	snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
+	argv[1] = date;
+	argv[2] = NULL;
+	cp.argv = argv;
+	cp.out = -1;
+
+	return capture_command(&cp, buffer, 1024);
+}


Output argument could probably be named better.

Also, would the output of this command be very large? If yes, it might 
be better to process it little by little instead of buffering the whole 
thing first.



+void write_fsmonitor_extension(struct strbuf *sb, struct index_state* istate);


Space before * (in the .h and .c files).



Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-16 Thread Johannes Sixt

Am 16.05.2017 um 19:17 schrieb Ben Peart:
OK, now I'm confused as to the best path for adding a get_be64.  This 
one is trivial:


#define get_be64(p) ntohll(*(uint64_t *)(p))


I cringe when I see a cast like this. Unless you can guarantee that p is 
char* (bare or signed or unsigned), you fall prey to strict aliasing 
violations, aka undefined behavior. And I'm not even mentioning correct 
alignment, yet.


-- Hannes


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-16 Thread Jeff King
On Tue, May 16, 2017 at 01:17:56PM -0400, Ben Peart wrote:

> > Thanks for the pointers.  I'll update this to use the existing get_be32
> > and have created a get_be64 and will use that for the last_update.
> 
> OK, now I'm confused as to the best path for adding a get_be64.  This one is
> trivial:
> 
> #define get_be64(p)   ntohll(*(uint64_t *)(p))
> 
> but should the unaligned version be:
> 
> #define get_be64(p)   ( \
>   (*((unsigned char *)(p) + 0) << 56) | \
>   (*((unsigned char *)(p) + 1) << 48) | \
>   (*((unsigned char *)(p) + 2) << 40) | \
>   (*((unsigned char *)(p) + 3) << 32) | \
>   (*((unsigned char *)(p) + 4) << 24) | \
>   (*((unsigned char *)(p) + 5) << 16) | \
>   (*((unsigned char *)(p) + 6) <<  8) | \
>   (*((unsigned char *)(p) + 7) <<  0) )
> 
> or would it be better to do it like this:
> 
> #define get_be64(p)   ( \
>   ((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \
>   ((uint64_t)get_be32((unsigned char *)(p) + 4) <<  0))

I'd imagine the compiler would generate quite similar code between the
two, and the second is much shorter and easier to read, so I'd probably
prefer it.

> or with a static inline function like git_bswap64:

Try "git log -Sinline compat/bswap.h", which turns up the history of why
it went from a macro to an inline function.

The get_be macros are simple enough that they can remain as macros,
though I'd have no objection personally to them being inline functions.
I'd expect modern compilers to be able to optimize similarly, and it
removes the restriction that you can't call the macro with an argument
that has side effects.

-Peff
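
To make the two points above concrete, here is a minimal sketch of the "two 32-bit halves" variant written as an inline function. The names sketch_be32 and sketch_be64 are hypothetical and are not what compat/bswap.h defines. One detail worth noting for the byte-by-byte macro quoted above: each operand has to be widened to uint64_t before shifting, because an unsigned char is promoted to int, and shifting an int by 32 or more bits is undefined where int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper: byte-wise big-endian 32-bit read. */
static inline uint32_t sketch_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/*
 * Hypothetical 64-bit read composed of two 32-bit halves. The high
 * half is widened to uint64_t before the shift. The argument is
 * evaluated exactly once, so a call such as sketch_be64(p++) is safe,
 * which a macro that mentions (p) twice cannot promise.
 */
static inline uint64_t sketch_be64(const unsigned char *p)
{
	return ((uint64_t)sketch_be32(p) << 32) | sketch_be32(p + 4);
}
```

The function form also gives the compiler a parameter type to check, at the cost of requiring callers with char or signed char buffers to cast.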


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-16 Thread Ben Peart



On 5/15/2017 9:55 PM, Ben Peart wrote:



On 5/15/2017 8:34 PM, Jeff King wrote:

On Tue, May 16, 2017 at 12:22:14AM +, brian m. carlson wrote:


On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:

+istate->last_update = (time_t)ntohll(*(uint64_t *)index);
+index += sizeof(uint64_t);
+
+ewah_size = ntohl(*(uint32_t *)index);
+index += sizeof(uint32_t);


To answer the question you asked in your cover letter, you cannot write
this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
ARM systems, and it will perform poorly on PowerPC and other ARM
systems[0].

If you got that pointer from malloc and have only indexed multiples of 8
on it, you're good.  But if you're not sure, you probably want to use
memcpy.  If the compiler can determine that it's not necessary, it will
omit the copy and perform a direct load.


I think get_be32() does exactly what we want for the ewah_size read. For
the last_update one, we don't have a get_be64() yet, but it should be
easy to make based on the 16/32 versions.


Thanks for the pointers.  I'll update this to use the existing get_be32
and have created a get_be64 and will use that for the last_update.



OK, now I'm confused as to the best path for adding a get_be64.  This 
one is trivial:


#define get_be64(p) ntohll(*(uint64_t *)(p))

but should the unaligned version be:

#define get_be64(p) ( \
(*((unsigned char *)(p) + 0) << 56) | \
(*((unsigned char *)(p) + 1) << 48) | \
(*((unsigned char *)(p) + 2) << 40) | \
(*((unsigned char *)(p) + 3) << 32) | \
(*((unsigned char *)(p) + 4) << 24) | \
(*((unsigned char *)(p) + 5) << 16) | \
(*((unsigned char *)(p) + 6) <<  8) | \
(*((unsigned char *)(p) + 7) <<  0) )

or would it be better to do it like this:

#define get_be64(p) ( \
((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \
((uint64_t)get_be32((unsigned char *)(p) + 4) <<  0))

or with a static inline function like git_bswap64:

or something else entirely?

I'm not sure why the different styles in this one file and which I 
should be emulating.





(I note also that time_t is not necessarily 64-bits in the first place,
but David said something about this not really being a time_t).



The in memory representation is a time_t as that is the return value of
time(NULL) but it is converted to/from a 64 bit value when written/read
to the index extension so that the index format is the same no matter
the native size of time_t.


-Peff



Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread Jeff King
On Mon, May 15, 2017 at 09:55:12PM -0400, Ben Peart wrote:

> > > > +   istate->last_update = (time_t)ntohll(*(uint64_t *)index);
> [...]
> > (I note also that time_t is not necessarily 64-bits in the first place,
> > but David said something about this not really being a time_t).
> 
> The in memory representation is a time_t as that is the return value of
> time(NULL) but it is converted to/from a 64 bit value when written/read to
> the index extension so that the index format is the same no matter the
> native size of time_t.

OK. I guess your cast here will truncate on 32-bit systems, but
presumably not until 2038, so we can perhaps ignore it for now (and
anyway, time(NULL) will be broken on such a system at that point).

-Peff
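
The on-disk versus in-memory split described above can be sketched as follows. This is an illustration under assumptions, not the code from the patch: all four function names are hypothetical, and the byte loops stand in for whatever put/get helpers the series ends up using. The stored field is always eight bytes, and the only place the width of time_t matters is the final cast on read.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: store a 64-bit value big-endian, byte by byte. */
static inline void sketch_put_be64(unsigned char *out, uint64_t v)
{
	int i;

	for (i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Hypothetical helper: load a big-endian 64-bit value, byte by byte. */
static inline uint64_t sketch_load_be64(const unsigned char *in)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return v;
}

/* On disk: always 8 bytes, regardless of the native size of time_t. */
static inline void sketch_write_last_update(unsigned char *out, time_t t)
{
	sketch_put_be64(out, (uint64_t)t);
}

/* In memory: time_t. The cast narrows if time_t is 32 bits wide. */
static inline time_t sketch_read_last_update(const unsigned char *in)
{
	return (time_t)sketch_load_be64(in);
}
```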


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread Ben Peart



On 5/15/2017 8:34 PM, Jeff King wrote:

On Tue, May 16, 2017 at 12:22:14AM +, brian m. carlson wrote:


On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:

+   istate->last_update = (time_t)ntohll(*(uint64_t *)index);
+   index += sizeof(uint64_t);
+
+   ewah_size = ntohl(*(uint32_t *)index);
+   index += sizeof(uint32_t);


To answer the question you asked in your cover letter, you cannot write
this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
ARM systems, and it will perform poorly on PowerPC and other ARM
systems[0].

If you got that pointer from malloc and have only indexed multiples of 8
on it, you're good.  But if you're not sure, you probably want to use
memcpy.  If the compiler can determine that it's not necessary, it will
omit the copy and perform a direct load.


I think get_be32() does exactly what we want for the ewah_size read. For
the last_update one, we don't have a get_be64() yet, but it should be
easy to make based on the 16/32 versions.


Thanks for the pointers.  I'll update this to use the existing get_be32 
and have created a get_be64 and will use that for the last_update.




(I note also that time_t is not necessarily 64-bits in the first place,
but David said something about this not really being a time_t).



The in memory representation is a time_t as that is the return value of 
time(NULL) but it is converted to/from a 64 bit value when written/read 
to the index extension so that the index format is the same no matter 
the native size of time_t.



-Peff



Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread Ben Peart


On 5/15/2017 5:21 PM, David Turner wrote:



-Original Message-
From: Ben Peart [mailto:peart...@gmail.com]
Sent: Monday, May 15, 2017 3:14 PM
To: git@vger.kernel.org
Cc: gits...@pobox.com; benpe...@microsoft.com; pclo...@gmail.com;
johannes.schinde...@gmx.de; David Turner ;
p...@peff.net
Subject: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to
speed up detecting new or changed files.



@@ -342,6 +344,8 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   time_t last_update;
+   struct ewah_bitmap *bitmap;


The name 'bitmap' doesn't tell the reader much about what it is used for.


+static int update_istate(const char *name, void *is) {


Rename to mark_file_dirty?  Also why does it take a void pointer?  Or return 
int (rather than void)?



Thanks for the feedback.  I'll do some renaming and change the types passed.



+void refresh_by_fsmonitor(struct index_state *istate) {
+   static has_run_once = FALSE;
+   struct strbuf buffer = STRBUF_INIT;


Rename to query_result? Also I think you're leaking it.



Good catch!  I missed the leak there.  Fixed for the next roll.


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread Jeff King
On Tue, May 16, 2017 at 12:22:14AM +, brian m. carlson wrote:

> On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:
> > +   istate->last_update = (time_t)ntohll(*(uint64_t *)index);
> > +   index += sizeof(uint64_t);
> > +
> > +   ewah_size = ntohl(*(uint32_t *)index);
> > +   index += sizeof(uint32_t);
> 
> To answer the question you asked in your cover letter, you cannot write
> this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
> Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
> ARM systems, and it will perform poorly on PowerPC and other ARM
> systems[0].
> 
> If you got that pointer from malloc and have only indexed multiples of 8
> on it, you're good.  But if you're not sure, you probably want to use
> memcpy.  If the compiler can determine that it's not necessary, it will
> omit the copy and perform a direct load.

I think get_be32() does exactly what we want for the ewah_size read. For
the last_update one, we don't have a get_be64() yet, but it should be
easy to make based on the 16/32 versions.

(I note also that time_t is not necessarily 64-bits in the first place,
but David said something about this not really being a time_t).

-Peff


Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread brian m. carlson
On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:
> + istate->last_update = (time_t)ntohll(*(uint64_t *)index);
> + index += sizeof(uint64_t);
> +
> + ewah_size = ntohl(*(uint32_t *)index);
> + index += sizeof(uint32_t);

To answer the question you asked in your cover letter, you cannot write
this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
ARM systems, and it will perform poorly on PowerPC and other ARM
systems[0].

If you got that pointer from malloc and have only indexed multiples of 8
on it, you're good.  But if you're not sure, you probably want to use
memcpy.  If the compiler can determine that it's not necessary, it will
omit the copy and perform a direct load.

[0] To be technically correct, all of those systems except SPARC can
have unaligned access fixed up automatically, depending on the kernel
settings.  But such a fixup involves taking a trap into the kernel,
performing two aligned loads and bit shifting, and returning to
userspace, which performs about as well as you'd expect.  For that
reason, Debian build machines have such fixups turned off and will just
SIGBUS.
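
A minimal sketch of the memcpy approach suggested above, assuming a POSIX system for ntohl(). The helper names sketch_is_aligned8 and sketch_load_be32 are hypothetical. memcpy places no alignment requirement on its source, so the load is valid at any offset into the buffer; the alignment test is the same expression as in the message and is included only to show what it checks.

```c
#include <arpa/inet.h>	/* ntohl(); assumes a POSIX system */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the alignment test quoted in the message. */
static inline int sketch_is_aligned8(const void *p)
{
	return ((uintptr_t)p & 7) == 0;
}

/*
 * Hypothetical helper: copy four bytes into a properly aligned local
 * variable, then convert from network (big-endian) byte order.
 */
static inline uint32_t sketch_load_be32(const void *src)
{
	uint32_t v;

	memcpy(&v, src, sizeof(v));
	return ntohl(v);
}
```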
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204




RE: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread David Turner

> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, May 15, 2017 3:14 PM
> To: git@vger.kernel.org
> Cc: gits...@pobox.com; benpe...@microsoft.com; pclo...@gmail.com;
> johannes.schinde...@gmx.de; David Turner ;
> p...@peff.net
> Subject: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor
> to speed up detecting new or changed files.

> @@ -342,6 +344,8 @@ struct index_state {
>   struct hashmap dir_hash;
>   unsigned char sha1[20];
>   struct untracked_cache *untracked;
> + time_t last_update;
> + struct ewah_bitmap *bitmap;

The name 'bitmap' doesn't tell the reader much about what it is used for.

> +static int update_istate(const char *name, void *is) {

Rename to mark_file_dirty?  Also why does it take a void pointer?  Or return 
int (rather than void)?

> +void refresh_by_fsmonitor(struct index_state *istate) {
> + static has_run_once = FALSE;
> + struct strbuf buffer = STRBUF_INIT;

Rename to query_result? Also I think you're leaking it.



[PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-15 Thread Ben Peart
When the index is read from disk, the query-fsmonitor index extension is
used to flag the last known potentially dirty index and untracked cache
entries.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes in the working directory.

Before the 'fsmonitor-dirty' flags are used to limit the scope of the
files to be checked, the query-fsmonitor hook proc is called with the
time the index was last updated.  The hook proc returns the list of
files changed since that last updated time and the list of
potentially dirty entries is updated to reflect the current state.

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes.

Signed-off-by: Ben Peart 

---
 Makefile               |   1 +
 builtin/update-index.c |   1 +
 cache.h                |   5 ++
 config.c               |   5 ++
 dir.c                  |  13 +++
 dir.h                  |   2 +
 entry.c                |   1 +
 environment.c          |   1 +
 fsmonitor.c            | 233 +
 fsmonitor.h            |   9 ++
 read-cache.c           |  28 +-
 unpack-trees.c         |   1 +
 12 files changed, 298 insertions(+), 2 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index 94cce645a5..89acff1f46 100644
--- a/Makefile
+++ b/Makefile
@@ -761,6 +761,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index ebfc09faa0..32fd977b43 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 		else
 			active_cache[pos]->ce_flags &= ~flag;
 		active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+		active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
 		cache_tree_invalidate_path(&the_index, path);
 		active_cache_changed |= CE_ENTRY_CHANGED;
 		return 0;
diff --git a/cache.h b/cache.h
index 40ec032a2d..64aa6e57cd 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED             (1 << 19)
 
 #define CE_HASHED            (1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED        (1 << 23)
 
@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED (1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	time_t last_update;
+	struct ewah_bitmap *bitmap;
 };
 
 extern struct index_state the_index;
@@ -767,6 +771,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
+extern int core_fsmonitor;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index d971cc3474..d146c88399 100644
--- a/config.c
+++ b/config.c
@@ -1224,6 +1224,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.fsmonitor")) {
+		core_fsmonitor = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return platform_core_config(var, value);
 }
diff --git a/dir.c b/dir.c
index 1b5558fdf9..da428489e2 100644
--- a/dir.c
+++ b/dir.c
@@ -1652,6 +1652,18 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	refresh_by_fsmonitor(&the_index);
+	if (dir->untracked->use_fsmonitor) {
+		/*
+		 * With fsmonitor, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1665,6 +1677,7 @@