RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-15 Thread David Turner
> -Original Message-
> From: Ben Peart [mailto:benpe...@microsoft.com]
> Sent: Friday, September 15, 2017 3:21 PM
> To: benpe...@microsoft.com
> Cc: David Turner ; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file 
> system
> monitor to speed up detecting new or changed files.
 
> +int git_config_get_fsmonitor(void)
> +{
> + if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> + core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> +
> + if (core_fsmonitor && !*core_fsmonitor)
> + core_fsmonitor = NULL;
> +
> + if (core_fsmonitor)
> + return 1;
> +
> + return 0;
> +}

This functions return values are backwards relative to the rest of the 
git_config_* functions.

[snip]

+>  /*
+>   * With fsmonitor, we can trust the untracked cache's valid field.
+>   */

[snip]

> +int read_fsmonitor_extension(struct index_state *istate, const void *data,
> + unsigned long sz)
> +{

If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.

[snip]

> + /* a fsmonitor process can return '*' to indicate all entries are 
> invalid */

That's not documented in your documentation.  Also, I'm not sure I like it: 
what 
if I have a file whose name starts with '*'?  Yeah, that would be silly, but 
this indicates the need 
for the protocol to have some sort of signaling mechanism that's out-of-band  
Maybe 
have some key\0value\0 pairs and then \0\0 and then the list of files?  Or, if 
you want to keep
it really simple, allow an entry of '/' (which is an invalid filename) to mean 
'all'.

> +void add_fsmonitor(struct index_state *istate) {
> + int i;
> +
> + if (!istate->fsmonitor_last_update) {
[snip]
> + /* reset the untracked cache */

Is this really necessary?  Shouldn't the untracked cache be in a correct state 
already? 

> +/*
> + * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate

Nit: "s/entries/entry's/".
 



Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread Ben Peart

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file 
system
monitor to speed up detecting new or changed files.
  

+int git_config_get_fsmonitor(void)
+{
+   if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+   if (core_fsmonitor && !*core_fsmonitor)
+   core_fsmonitor = NULL;
+
+   if (core_fsmonitor)
+   return 1;
+
+   return 0;
+}


This functions return values are backwards relative to the rest of the 
git_config_* functions.


I'm confused.  If core.fsmonitor is configured, it returns 1. If it is 
not configured, it returns 0. I don't make use of the -1 /* default 
value */ option as I didn't see any use/value in this case. What is 
backwards?




[snip]

+>   /*
+>* With fsmonitor, we can trust the untracked cache's valid field.
+>*/



Did you intend to make a comment here?


[snip]


+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+   unsigned long sz)
+{


If git_config_get_fsmonitor returns 0, fsmonitor_dirty will leak.



Good catch!  Thank you.


[snip]


+   /* a fsmonitor process can return '*' to indicate all entries are 
invalid */


That's not documented in your documentation.  Also, I'm not sure I like it: what
if I have a file whose name starts with '*'?  Yeah, that would be silly, but 
this indicates the need
for the protocol to have some sort of signaling mechanism that's out-of-band  
Maybe
have some key\0value\0 pairs and then \0\0 and then the list of files?  Or, if 
you want to keep
it really simple, allow an entry of '/' (which is an invalid filename) to mean 
'all'.



Yea, this was an optimization I added late in the game to get around an 
issue in Watchman where it returns the name of every file the first time 
you query it (rather than the set of files that have actually changed 
since the requested time).


I didn't realize the wild card '*' was a valid character for a filename. 
 I'll switch to '/' as you suggest as I don't want to complicate things 
unnecessarily to handle this relatively rare optimization.  I'll also 
get it documented properly.  Thanks!



+void add_fsmonitor(struct index_state *istate) {
+   int i;
+
+   if (!istate->fsmonitor_last_update) {

[snip]

+   /* reset the untracked cache */


Is this really necessary?  Shouldn't the untracked cache be in a correct state 
already?



When fsmonitor is not turned on, I'm not explicitly invalidating 
untracked cache directory entries as git makes changes to files. While I 
doubt the sequence happens of 1) git making changes to files, *then* 2) 
turning on fsmonitor - I thought it better safe than sorry to assume 
that pattern won't ever happen in the future.  Especially since turning 
on the extension is rare and the cost is low.



+/*
+ * Clear the given cache entries CE_FSMONITOR_VALID bit and invalidate


Nit: "s/entries/entry's/".
  



RE: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread David Turner
> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Monday, September 18, 2017 9:07 AM
> To: David Turner ; 'Ben Peart'
> 
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a 
> file
> system monitor to speed up detecting new or changed files.
> 
> Thanks for taking the time to review/provide feedback!
> 
> On 9/15/2017 5:35 PM, David Turner wrote:
> >> -Original Message-
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Friday, September 15, 2017 3:21 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner ; ava...@gmail.com;
> >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
> >> a file system monitor to speed up detecting new or changed files.
> >
> >> +int git_config_get_fsmonitor(void)
> >> +{
> >> +  if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
> >> +  core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> >> +
> >> +  if (core_fsmonitor && !*core_fsmonitor)
> >> +  core_fsmonitor = NULL;
> >> +
> >> +  if (core_fsmonitor)
> >> +  return 1;
> >> +
> >> +  return 0;
> >> +}
> >
> > This functions return values are backwards relative to the rest of the
> git_config_* functions.
> 
> I'm confused.  If core.fsmonitor is configured, it returns 1. If it is not
> configured, it returns 0. I don't make use of the -1 /* default value */ 
> option
> as I didn't see any use/value in this case. What is backwards?

The other git_config_* functions return 1 for error and 0 for success.

> > [snip]
> >
> > +>  /*
> > +>   * With fsmonitor, we can trust the untracked cache's valid field.
> > +>   */
> >
> 
> Did you intend to make a comment here?

Sorry.  I was going to make a comment that I didn't see how that could work 
since we weren't touching the untracked cache here, but then I saw the bit 
further down.   I'm still not sure it works (see comment on 10/12), but at
least it could in theory work.
 



Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-18 Thread Ben Peart



On 9/18/2017 9:32 AM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:peart...@gmail.com]
Sent: Monday, September 18, 2017 9:07 AM
To: David Turner ; 'Ben Peart'

Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
p...@peff.net
Subject: Re: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize a file
system monitor to speed up detecting new or changed files.

Thanks for taking the time to review/provide feedback!

On 9/15/2017 5:35 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Friday, September 15, 2017 3:21 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v6 04/12] fsmonitor: teach git to optionally utilize
a file system monitor to speed up detecting new or changed files.



+int git_config_get_fsmonitor(void)
+{
+   if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
+   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+
+   if (core_fsmonitor && !*core_fsmonitor)
+   core_fsmonitor = NULL;
+
+   if (core_fsmonitor)
+   return 1;
+
+   return 0;
+}


This functions return values are backwards relative to the rest of the

git_config_* functions.

I'm confused.  If core.fsmonitor is configured, it returns 1. If it is not
configured, it returns 0. I don't make use of the -1 /* default value */ option
as I didn't see any use/value in this case. What is backwards?


The other git_config_* functions return 1 for error and 0 for success.


Hmm, I followed the model (ie copy/paste :)) used by the tracked cache. 
If you look at how it uses, the return value, it is 0 == false == remove 
the extension, 1 == true == add the extension.  I'm doing the same with 
fsmonitor.


static void tweak_untracked_cache(struct index_state *istate)
{
switch (git_config_get_untracked_cache()) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
remove_untracked_cache(istate);
break;
case 1: /* true */
add_untracked_cache(istate);
break;
default: /* unknown value: do nothing */
break;
}
}

void tweak_fsmonitor(struct index_state *istate)
{
switch (git_config_get_fsmonitor()) {
case -1: /* keep: do nothing */
break;
case 0: /* false */
remove_fsmonitor(istate);
break;
case 1: /* true */
add_fsmonitor(istate);
break;
default: /* unknown value: do nothing */
break;
}
}




[snip]

+>   /*
+>* With fsmonitor, we can trust the untracked cache's valid field.
+>*/



Did you intend to make a comment here?


Sorry.  I was going to make a comment that I didn't see how that could work
since we weren't touching the untracked cache here, but then I saw the bit
further down.   I'm still not sure it works (see comment on 10/12), but at
least it could in theory work.
  



The logic here assumes that when the index is written out, it is valid 
including the untracked cache and the CE_FSMONITOR_VALID bits. 
Therefore it should still all be valid as of the time the fsmonitor was 
queried and the index saved.


Another way of thinking about this is that the fsmonitor extension is 
simply adding another persisted bit on each index entry.  It just gets 
persisted/restored differently than the other persisted bits.


Obviously, before we can use it assuming it reflects the *current* state 
of the working directory, we have to refresh the bits via the refresh logic.