Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-20 Thread Junio C Hamano
Junio C Hamano  writes:

> Ben Peart  writes:
>
>> On 9/20/2017 9:46 PM, Junio C Hamano wrote:
>>> Ben Peart  writes:
>>>
 Lets see how my ascii art skills do at describing this:
>>>
>>> Your ascii art is fine.  If you said upfront that the capital
>>> letters signify points in time, lower letters are file-touching
>>> events, and time flows from left to right, it would have been
>>> perfect ;-)
>>
>> Rats, so close and yet... ;-)
>
> Nah, sorry for forgetting to add "... but I could guess that was the
> case after reading a few paragraphs, at which point I rewound and
> started reading from the beginning, and it was crystal clear."
>
>> Yes, I suppose we _could_ add a 2nd bit (and then add the logic to set
>> that bit every time a fsmonitor change was made) but I don't see that
>> it really buys us anything useful.  The force write flag in
>> update-index is off by default and the only scenario we have that
>> someone would set it is for test cases where the perf of writing out
>> the index when it is not needed just doesn't matter.
>
> I tend to agree now.  
>
> My reaction primarily came from ...

oops. please ignore the last paragraph, or transplant it to the
other thread X-<.


Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-20 Thread Junio C Hamano
Ben Peart  writes:

> On 9/20/2017 9:46 PM, Junio C Hamano wrote:
>> Ben Peart  writes:
>>
>>> Lets see how my ascii art skills do at describing this:
>>
>> Your ascii art is fine.  If you said upfront that the capital
>> letters signify points in time, lower letters are file-touching
>> events, and time flows from left to right, it would have been
>> perfect ;-)
>
> Rats, so close and yet... ;-)

Nah, sorry for forgetting to add "... but I could guess that was the
case after reading a few paragraphs, at which point I rewound and
started reading from the beginning, and it was crystal clear."

> Yes, I suppose we _could_ add a 2nd bit (and then add the logic to set
> that bit every time a fsmonitor change was made) but I don't see that
> it really buys us anything useful.  The force write flag in
> update-index is off by default and the only scenario we have that
> someone would set it is for test cases where the perf of writing out
> the index when it is not needed just doesn't matter.

I tend to agree now.  

My reaction primarily came from that I couldn't quite tell what the
IGNORE_* bit was ment to do and assumed that it meant pretty much
the same thing as existing ones like "valid bit is untrustworthy, so
do not pay attention to it".  It turns out that this one has quite a
different meaning, that is not connected to how much we should trust
state maintained by the fsmonitor, which force me off-track.

Thanks.


Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-20 Thread Ben Peart



On 9/20/2017 9:46 PM, Junio C Hamano wrote:

Ben Peart  writes:


Lets see how my ascii art skills do at describing this:


Your ascii art is fine.  If you said upfront that the capital
letters signify points in time, lower letters are file-touching
events, and time flows from left to right, it would have been
perfect ;-)



Rats, so close and yet... ;-)


There is no real concern about accumulating too many changes as 1) the
processing cost for additional modified files is fairly trivial and 2)
the index ends up getting written out pretty frequently anyway as
files are added/removed/staged/etc which updates the
fsmonitor_last_update time.


I still see some impedance mismatch here.  The optimization
described is valid but the code to do the optimization would avoid
writing the index file out when the in-core index is dirty only
because the status reported by fsmonitor changed--if there were
other changes (e.g. the code added a new index entry), even with the
optimization, you would still want to write the index out, right?



Yes, that is exactly how it works.  FSMONITOR_CHANGED is only set when 
the fsmonitor index extension is added or removed but all the other 
logic to flag the index dirty (CE_ENTRY_CHANGED/REMOVED/ADDED, 
UNTRACKED_CHANGED, etc) still exists and will still trigger the index to 
be written out as it always has.  It's not to say some of these other 
changes could not do the same optimization (I haven't looked) if 
recomputing is cheaper than writing out the index.



With or without the need for forced flush to help debugging, would
that suggest that you need two bits now, instead of just a single
'active-cache-changed' bit?

By keeping track of that new bit that tells us if we have
fsmonitor-only changes that we _could_ flush, this patch can further
reduce the need for forced flushing (i.e. if we know we didn't get
fsmonitor induced dirtyness, force_write can still be no-op), no?



Yes, I suppose we _could_ add a 2nd bit (and then add the logic to set 
that bit every time a fsmonitor change was made) but I don't see that it 
really buys us anything useful.  The force write flag in update-index is 
off by default and the only scenario we have that someone would set it 
is for test cases where the perf of writing out the index when it is not 
needed just doesn't matter.




The challenge came when it was time to test that the changes to the
index were correct.  Since they are lazily written by default, I
needed a way to force the write so that I could verify the index on
disk was correct.  Hence, this patch.



OPT_END()
};
   @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char
**argv, const char *prefix)
die("BUG: bad untracked_cache value: %d", untracked_cache);
}
   -if (active_cache_changed) {
+   if (active_cache_changed || force_write) {
if (newfd < 0) {
if (refresh_args.flags & REFRESH_QUIET)
exit(128);


Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-20 Thread Junio C Hamano
Ben Peart  writes:

> Lets see how my ascii art skills do at describing this:

Your ascii art is fine.  If you said upfront that the capital
letters signify points in time, lower letters are file-touching
events, and time flows from left to right, it would have been
perfect ;-)

> There is no real concern about accumulating too many changes as 1) the
> processing cost for additional modified files is fairly trivial and 2)
> the index ends up getting written out pretty frequently anyway as
> files are added/removed/staged/etc which updates the
> fsmonitor_last_update time.

I still see some impedance mismatch here.  The optimization
described is valid but the code to do the optimization would avoid
writing the index file out when the in-core index is dirty only
because the status reported by fsmonitor changed--if there were
other changes (e.g. the code added a new index entry), even with the
optimization, you would still want to write the index out, right?

With or without the need for forced flush to help debugging, would
that suggest that you need two bits now, instead of just a single
'active-cache-changed' bit?

By keeping track of that new bit that tells us if we have
fsmonitor-only changes that we _could_ flush, this patch can further
reduce the need for forced flushing (i.e. if we know we didn't get
fsmonitor induced dirtyness, force_write can still be no-op), no?

>
> The challenge came when it was time to test that the changes to the
> index were correct.  Since they are lazily written by default, I
> needed a way to force the write so that I could verify the index on
> disk was correct.  Hence, this patch.
>
>
>>> OPT_END()
>>> };
>>>   @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char
>>> **argv, const char *prefix)
>>> die("BUG: bad untracked_cache value: %d", untracked_cache);
>>> }
>>>   - if (active_cache_changed) {
>>> +   if (active_cache_changed || force_write) {
>>> if (newfd < 0) {
>>> if (refresh_args.flags & REFRESH_QUIET)
>>> exit(128);


Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-20 Thread Ben Peart



On 9/20/2017 1:47 AM, Junio C Hamano wrote:

Ben Peart  writes:


+   OPT_SET_INT(0, "force-write-index", _write,
+   N_("write out the index even if is not flagged as 
changed"), 1),


Hmph.  The only time this makes difference is when the code forgets
to mark active_cache_changed even when it actually made a change to
the index, no?  I do understand the wish to be able to observe what
_would_ be written if such a bug did not exist in order to debug the
other aspects of the change in this series, but at the same time I
fear that we may end up sweeping the problem under the rug by
running the tests with this option.



This is to enable a performance optimization I discovered while perf 
testing the patch series.  It enables us to do a lazy index write for 
fsmonitor detected changes but still always generate correct results.


Lets see how my ascii art skills do at describing this:

1) Index marked dirty on every fsmonitor change:
A---x---B---y---C

2) Index *not* marked dirty on fsmonitor changes:
A---x---B---x,y---C

Assume the index is written and up-to-date at point A.

In scenario #1 above, the index is marked fsmonitor dirty every time the 
fsmonitor detects a file that has been modified.  At point B, the 
fsmonitor integration script returns that file 'x' has been modified 
since A, the index is marked dirty and then written to disk with a 
last_update time of B.  At point C, the script returns 'y' as the 
changes since point B, the index is marked dirty and written to disk again.


In scenario #2, the index is *not* marked fsmonitor dirty when changed 
are detected.  At point B, the script returns 'x' but the index is not 
flagged dirty nor written to disk.  At point C, the script will return 
'x' and 'y' (since both have been changed since time 'A') and again the 
index is not marked dirty nor written to disk.


Correct results are generated in both scenarios but in scenario 2, there 
were 2 fewer index writes.  In short, the changed files were accumulated 
as the cost of processing 2 files at point C (vs 1) has no measurable 
difference in perf but the savings of two unnecessary index writes is 
significant (especially when the index gets large).


There is no real concern about accumulating too many changes as 1) the 
processing cost for additional modified files is fairly trivial and 2) 
the index ends up getting written out pretty frequently anyway as files 
are added/removed/staged/etc which updates the fsmonitor_last_update time.


The challenge came when it was time to test that the changes to the 
index were correct.  Since they are lazily written by default, I needed 
a way to force the write so that I could verify the index on disk was 
correct.  Hence, this patch.




OPT_END()
};
  
@@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)

die("BUG: bad untracked_cache value: %d", untracked_cache);
}
  
-	if (active_cache_changed) {

+   if (active_cache_changed || force_write) {
if (newfd < 0) {
if (refresh_args.flags & REFRESH_QUIET)
exit(128);


Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-19 Thread Junio C Hamano
Ben Peart  writes:

> + OPT_SET_INT(0, "force-write-index", _write,
> + N_("write out the index even if is not flagged as 
> changed"), 1),

Hmph.  The only time this makes difference is when the code forgets
to mark active_cache_changed even when it actually made a change to
the index, no?  I do understand the wish to be able to observe what
_would_ be written if such a bug did not exist in order to debug the
other aspects of the change in this series, but at the same time I
fear that we may end up sweeping the problem under the rug by
running the tests with this option.

>   OPT_END()
>   };
>  
> @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const 
> char *prefix)
>   die("BUG: bad untracked_cache value: %d", untracked_cache);
>   }
>  
> - if (active_cache_changed) {
> + if (active_cache_changed || force_write) {
>   if (newfd < 0) {
>   if (refresh_args.flags & REFRESH_QUIET)
>   exit(128);