Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
Junio C Hamanowrites: > Ben Peart writes: > >> On 9/20/2017 9:46 PM, Junio C Hamano wrote: >>> Ben Peart writes: >>> Lets see how my ascii art skills do at describing this: >>> >>> Your ascii art is fine. If you said upfront that the capital >>> letters signify points in time, lower letters are file-touching >>> events, and time flows from left to right, it would have been >>> perfect ;-) >> >> Rats, so close and yet... ;-) > > Nah, sorry for forgetting to add "... but I could guess that was the > case after reading a few paragraphs, at which point I rewound and > started reading from the beginning, and it was crystal clear." > >> Yes, I suppose we _could_ add a 2nd bit (and then add the logic to set >> that bit every time a fsmonitor change was made) but I don't see that >> it really buys us anything useful. The force write flag in >> update-index is off by default and the only scenario we have that >> someone would set it is for test cases where the perf of writing out >> the index when it is not needed just doesn't matter. > > I tend to agree now. > > My reaction primarily came from ... oops. please ignore the last paragraph, or transplant it to the other thread X-<.
Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
Ben Peartwrites: > On 9/20/2017 9:46 PM, Junio C Hamano wrote: >> Ben Peart writes: >> >>> Lets see how my ascii art skills do at describing this: >> >> Your ascii art is fine. If you said upfront that the capital >> letters signify points in time, lower letters are file-touching >> events, and time flows from left to right, it would have been >> perfect ;-) > > Rats, so close and yet... ;-) Nah, sorry for forgetting to add "... but I could guess that was the case after reading a few paragraphs, at which point I rewound and started reading from the beginning, and it was crystal clear." > Yes, I suppose we _could_ add a 2nd bit (and then add the logic to set > that bit every time a fsmonitor change was made) but I don't see that > it really buys us anything useful. The force write flag in > update-index is off by default and the only scenario we have that > someone would set it is for test cases where the perf of writing out > the index when it is not needed just doesn't matter. I tend to agree now. My reaction primarily came from that I couldn't quite tell what the IGNORE_* bit was ment to do and assumed that it meant pretty much the same thing as existing ones like "valid bit is untrustworthy, so do not pay attention to it". It turns out that this one has quite a different meaning, that is not connected to how much we should trust state maintained by the fsmonitor, which force me off-track. Thanks.
Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
On 9/20/2017 9:46 PM, Junio C Hamano wrote: Ben Peartwrites: Lets see how my ascii art skills do at describing this: Your ascii art is fine. If you said upfront that the capital letters signify points in time, lower letters are file-touching events, and time flows from left to right, it would have been perfect ;-) Rats, so close and yet... ;-) There is no real concern about accumulating too many changes as 1) the processing cost for additional modified files is fairly trivial and 2) the index ends up getting written out pretty frequently anyway as files are added/removed/staged/etc which updates the fsmonitor_last_update time. I still see some impedance mismatch here. The optimization described is valid but the code to do the optimization would avoid writing the index file out when the in-core index is dirty only because the status reported by fsmonitor changed--if there were other changes (e.g. the code added a new index entry), even with the optimization, you would still want to write the index out, right? Yes, that is exactly how it works. FSMONITOR_CHANGED is only set when the fsmonitor index extension is added or removed but all the other logic to flag the index dirty (CE_ENTRY_CHANGED/REMOVED/ADDED, UNTRACKED_CHANGED, etc) still exists and will still trigger the index to be written out as it always has. It's not to say some of these other changes could not do the same optimization (I haven't looked) if recomputing is cheaper than writing out the index. With or without the need for forced flush to help debugging, would that suggest that you need two bits now, instead of just a single 'active-cache-changed' bit? By keeping track of that new bit that tells us if we have fsmonitor-only changes that we _could_ flush, this patch can further reduce the need for forced flushing (i.e. if we know we didn't get fsmonitor induced dirtyness, force_write can still be no-op), no? Yes, I suppose we _could_ add a 2nd bit (and then add the logic to set that bit every time a fsmonitor change was made) but I don't see that it really buys us anything useful. The force write flag in update-index is off by default and the only scenario we have that someone would set it is for test cases where the perf of writing out the index when it is not needed just doesn't matter. The challenge came when it was time to test that the changes to the index were correct. Since they are lazily written by default, I needed a way to force the write so that I could verify the index on disk was correct. Hence, this patch. OPT_END() }; @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) die("BUG: bad untracked_cache value: %d", untracked_cache); } -if (active_cache_changed) { + if (active_cache_changed || force_write) { if (newfd < 0) { if (refresh_args.flags & REFRESH_QUIET) exit(128);
Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
Ben Peartwrites: > Lets see how my ascii art skills do at describing this: Your ascii art is fine. If you said upfront that the capital letters signify points in time, lower letters are file-touching events, and time flows from left to right, it would have been perfect ;-) > There is no real concern about accumulating too many changes as 1) the > processing cost for additional modified files is fairly trivial and 2) > the index ends up getting written out pretty frequently anyway as > files are added/removed/staged/etc which updates the > fsmonitor_last_update time. I still see some impedance mismatch here. The optimization described is valid but the code to do the optimization would avoid writing the index file out when the in-core index is dirty only because the status reported by fsmonitor changed--if there were other changes (e.g. the code added a new index entry), even with the optimization, you would still want to write the index out, right? With or without the need for forced flush to help debugging, would that suggest that you need two bits now, instead of just a single 'active-cache-changed' bit? By keeping track of that new bit that tells us if we have fsmonitor-only changes that we _could_ flush, this patch can further reduce the need for forced flushing (i.e. if we know we didn't get fsmonitor induced dirtyness, force_write can still be no-op), no? > > The challenge came when it was time to test that the changes to the > index were correct. Since they are lazily written by default, I > needed a way to force the write so that I could verify the index on > disk was correct. Hence, this patch. > > >>> OPT_END() >>> }; >>> @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char >>> **argv, const char *prefix) >>> die("BUG: bad untracked_cache value: %d", untracked_cache); >>> } >>> - if (active_cache_changed) { >>> + if (active_cache_changed || force_write) { >>> if (newfd < 0) { >>> if (refresh_args.flags & REFRESH_QUIET) >>> exit(128);
Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
On 9/20/2017 1:47 AM, Junio C Hamano wrote: Ben Peartwrites: + OPT_SET_INT(0, "force-write-index", _write, + N_("write out the index even if is not flagged as changed"), 1), Hmph. The only time this makes difference is when the code forgets to mark active_cache_changed even when it actually made a change to the index, no? I do understand the wish to be able to observe what _would_ be written if such a bug did not exist in order to debug the other aspects of the change in this series, but at the same time I fear that we may end up sweeping the problem under the rug by running the tests with this option. This is to enable a performance optimization I discovered while perf testing the patch series. It enables us to do a lazy index write for fsmonitor detected changes but still always generate correct results. Lets see how my ascii art skills do at describing this: 1) Index marked dirty on every fsmonitor change: A---x---B---y---C 2) Index *not* marked dirty on fsmonitor changes: A---x---B---x,y---C Assume the index is written and up-to-date at point A. In scenario #1 above, the index is marked fsmonitor dirty every time the fsmonitor detects a file that has been modified. At point B, the fsmonitor integration script returns that file 'x' has been modified since A, the index is marked dirty and then written to disk with a last_update time of B. At point C, the script returns 'y' as the changes since point B, the index is marked dirty and written to disk again. In scenario #2, the index is *not* marked fsmonitor dirty when changed are detected. At point B, the script returns 'x' but the index is not flagged dirty nor written to disk. At point C, the script will return 'x' and 'y' (since both have been changed since time 'A') and again the index is not marked dirty nor written to disk. Correct results are generated in both scenarios but in scenario 2, there were 2 fewer index writes. In short, the changed files were accumulated as the cost of processing 2 files at point C (vs 1) has no measurable difference in perf but the savings of two unnecessary index writes is significant (especially when the index gets large). There is no real concern about accumulating too many changes as 1) the processing cost for additional modified files is fairly trivial and 2) the index ends up getting written out pretty frequently anyway as files are added/removed/staged/etc which updates the fsmonitor_last_update time. The challenge came when it was time to test that the changes to the index were correct. Since they are lazily written by default, I needed a way to force the write so that I could verify the index on disk was correct. Hence, this patch. OPT_END() }; @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) die("BUG: bad untracked_cache value: %d", untracked_cache); } - if (active_cache_changed) { + if (active_cache_changed || force_write) { if (newfd < 0) { if (refresh_args.flags & REFRESH_QUIET) exit(128);
Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
Ben Peartwrites: > + OPT_SET_INT(0, "force-write-index", _write, > + N_("write out the index even if is not flagged as > changed"), 1), Hmph. The only time this makes difference is when the code forgets to mark active_cache_changed even when it actually made a change to the index, no? I do understand the wish to be able to observe what _would_ be written if such a bug did not exist in order to debug the other aspects of the change in this series, but at the same time I fear that we may end up sweeping the problem under the rug by running the tests with this option. > OPT_END() > }; > > @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const > char *prefix) > die("BUG: bad untracked_cache value: %d", untracked_cache); > } > > - if (active_cache_changed) { > + if (active_cache_changed || force_write) { > if (newfd < 0) { > if (refresh_args.flags & REFRESH_QUIET) > exit(128);