Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
On Tue, Feb 13, 2018 at 5:24 PM, Duy Nguyen wrote:
> I am worried that always doing the right thing may carry a performance
> penalty (this is based purely on reading verify_path() code, no actual
> benchmarking). For safety, you can always set safe_path to zero. But
> if you do a lot of invalidation and something starts to slow down,
> then you can consider setting safe_path to 1 (if it's actually safe to
> do so).

Fair enough. Thanks for articulating the reasoning.
Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
On Wed, Feb 14, 2018 at 12:57 AM, Junio C Hamano wrote:
> Duy Nguyen writes:
>
>> It's very tempting considering that the amount of changes is much
>> smaller. But I think we should go with my version. The hope is when a
>> _new_ call site appears, the author would think twice before passing
>> zero or one to the safe_path argument.
>
> Wouldn't it be a better API if the author of new callsite does not
> have to think twice and can instead rely on the called function
> untracked_cache_invalidate_path() to always do the right thing?

I am worried that always doing the right thing may carry a performance
penalty (this is based purely on reading verify_path() code, no actual
benchmarking). For safety, you can always set safe_path to zero. But
if you do a lot of invalidation and something starts to slow down,
then you can consider setting safe_path to 1 (if it's actually safe to
do so).

I think we do mass invalidation in some cases, so I will try to
actually benchmark that and see if this safe_path argument is justified
or if we can always call verify_path().
--
Duy
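Duy's plan is to measure whether the check actually costs anything under mass invalidation. A minimal timing harness for that kind of measurement might look like the sketch below. looks_like_dotgit_path() and bench_path_checks() are hypothetical names, and the predicate is a simplified stand-in that only looks for a ".git" component, so its cost says nothing definitive about git's real verify_path().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Simplified stand-in check (NOT git's verify_path()): any ".git" component? */
static int looks_like_dotgit_path(const char *path)
{
	const char *p = path;

	while ((p = strstr(p, ".git")) != NULL) {
		/* ".git" must start a component and end it */
		if ((p == path || p[-1] == '/') && (p[4] == '/' || p[4] == '\0'))
			return 1;
		p += 4;
	}
	return 0;
}

/*
 * Hypothetical micro-benchmark: apply the check `rounds` times to each
 * of the `n` paths, store the elapsed CPU time in *seconds, and return
 * how many of the paths match.
 */
static size_t bench_path_checks(const char **paths, size_t n, int rounds,
				double *seconds)
{
	volatile size_t sink = 0;	/* keeps the timed loop from being optimized away */
	size_t hits = 0;
	clock_t start;

	assert(seconds != NULL);
	start = clock();
	for (int r = 0; r < rounds; r++)
		for (size_t i = 0; i < n; i++)
			sink += looks_like_dotgit_path(paths[i]);
	*seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
	(void)sink;

	for (size_t i = 0; i < n; i++)
		hits += looks_like_dotgit_path(paths[i]);
	return hits;
}
```

A meaningful benchmark would instead time untracked_cache_invalidate_path() itself over the paths produced by a real mass invalidation, once with and once without the verify_path() call.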
Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
Duy Nguyen writes:

> It's very tempting considering that the amount of changes is much
> smaller. But I think we should go with my version. The hope is when a
> _new_ call site appears, the author would think twice before passing
> zero or one to the safe_path argument.

Wouldn't it be a better API if the author of new callsite does not
have to think twice and can instead rely on the called function
untracked_cache_invalidate_path() to always do the right thing?
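The two API shapes being debated can be contrasted in a small sketch. Everything in it is hypothetical: path_is_ok() is a simplified stand-in for verify_path(), do_invalidate() is a placeholder for the real untracked-cache work, and the checks_run counter exists only to show when the check is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how often the (simplified) path check actually runs. */
static unsigned long checks_run;

/* Simplified stand-in for verify_path(): rejects any ".git" component. */
static int path_is_ok(const char *path)
{
	const char *p = path;

	assert(path != NULL);
	checks_run++;
	while ((p = strstr(p, ".git")) != NULL) {
		int starts_component = (p == path || p[-1] == '/');
		int ends_component = (p[4] == '/' || p[4] == '\0');

		if (starts_component && ends_component)
			return 0;
		p += 4;
	}
	return 1;
}

/* Placeholder for the real untracked-cache invalidation work. */
static int do_invalidate(const char *path)
{
	(void)path;
	return 1;
}

/* Shape 1 (Duy): the caller may vouch for the path via safe_path. */
static int invalidate_with_knob(const char *path, int safe_path)
{
	if (!safe_path && !path_is_ok(path))
		return 0;	/* read_directory() would never look here */
	return do_invalidate(path);
}

/* Shape 2 (Junio): the callee always checks; callers need not think. */
static int invalidate_always_checked(const char *path)
{
	if (!path_is_ok(path))
		return 0;
	return do_invalidate(path);
}
```

The trade-off shows up in the knob version: a caller that passes safe_path = 1 for a path that did not come from the index (for example ".git/index" reported by an fsmonitor hook) skips the check and invalidates anyway.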
Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
On Wed, Feb 7, 2018 at 11:59 PM, Ben Peart wrote:
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..d431da46f5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
>  	if (!de)
>  		return treat_path_fast(dir, untracked, cdir, istate, path,
>  				       baselen, pathspec);
> -	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
> +	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
>  		return path_none;
>  	strbuf_setlen(path, baselen);
>  	strbuf_addstr(path, de->d_name);
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 0af7c4edba..019576f306 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>
>  static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>  {
> -	int pos = index_name_pos(istate, name, strlen(name));
> +	int pos;
>
> +	if (!verify_path(name))
> +		return;
> +
> +	pos = index_name_pos(istate, name, strlen(name));
>  	if (pos >= 0) {
>  		struct cache_entry *ce = istate->cache[pos];
>  		ce->ce_flags &= ~CE_FSMONITOR_VALID;

It's very tempting considering that the amount of changes is much
smaller. But I think we should go with my version. The hope is when a
_new_ call site appears, the author would think twice before passing
zero or one to the safe_path argument.
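Ben's version moves the check to the point where an untrusted name enters, so everything downstream can assume a valid path. A rough analogue, assuming a sorted array of hypothetical struct entry records in place of the index and bsearch() in place of index_name_pos(); the flag and helper names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag, modeled loosely on CE_FSMONITOR_VALID. */
#define ENTRY_FSMONITOR_VALID 0x1u

/* Hypothetical stand-in for a cache entry. */
struct entry {
	const char *name;
	unsigned flags;
};

/* Simplified stand-in for verify_path(): rejects any ".git" component. */
static int name_is_valid(const char *name)
{
	const char *p = name;

	while ((p = strstr(p, ".git")) != NULL) {
		if ((p == name || p[-1] == '/') && (p[4] == '/' || p[4] == '\0'))
			return 0;
		p += 4;
	}
	return 1;
}

/* bsearch() comparator: key is a path, elem is a struct entry. */
static int cmp_name(const void *key, const void *elem)
{
	const struct entry *e = elem;

	return strcmp(key, e->name);
}

/*
 * Validate the untrusted name once, at the entry point, so the lookup
 * and flag update below can assume a valid path. `entries` must be
 * sorted by name. Returns 1 if an entry was marked as needing refresh.
 */
static int refresh_callback(struct entry *entries, size_t n, const char *name)
{
	struct entry *e;

	assert(name != NULL);
	if (!name_is_valid(name))
		return 0;
	e = bsearch(name, entries, n, sizeof(*entries), cmp_name);
	if (!e)
		return 0;
	e->flags &= ~ENTRY_FSMONITOR_VALID;
	return 1;
}
```

Names that come straight out of an entry (Ben's point about mark_fsmonitor_invalid()) never pass through this callback, which is why they need no check.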
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index eb2d13bbcf..756beb0d8e 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in the same state' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' '
> +	test_create_repo dot-git &&
> +	(
> +		cd dot-git &&
> +		mkdir -p .git/hooks &&
> +		: >tracked &&
> +		: >modified &&
> +		mkdir dir1 &&
> +		: >dir1/tracked &&
> +		: >dir1/modified &&
> +		mkdir dir2 &&
> +		: >dir2/tracked &&
> +		: >dir2/modified &&
> +		write_integration_script &&
> +		git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +		git update-index --untracked-cache &&
> +		git update-index --fsmonitor &&
> +		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
> +		git status &&
> +		test-dump-untracked-cache >../before
> +	) &&
> +	cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
> +	printf ".git\0"
> +	printf ".git/index\0"
> +	printf "dir1/.git\0"
> +	printf "dir1/.git/index\0"
> +	EOF
> +	(
> +		cd dot-git &&
> +		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
> +		git status &&
> +		test-dump-untracked-cache >../after
> +	) &&
> +	grep "directory invalidation" trace-before >>before &&
> +	grep "directory invalidation" trace-after >>after &&
> +	# UNTR extension unchanged, dir invalidation count unchanged
> +	test_cmp before after
> +'
> +
>  test_done
>
> base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
> --
> 2.15.0.windows.1
>
--
Duy
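The new test makes the fsmonitor hook print NUL-terminated paths, including ".git" and "dir1/.git/index", and expects the untracked cache to stay unchanged. As a hypothetical sketch of what should happen to that output, the following walks such a buffer and counts which paths would be skipped because they contain a ".git" component. has_dotgit() and classify_hook_output() are illustrative names, not git functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified ".git"-component check; NOT git's verify_path(). */
static int has_dotgit(const char *path)
{
	const char *p = path;

	while ((p = strstr(p, ".git")) != NULL) {
		if ((p == path || p[-1] == '/') && (p[4] == '/' || p[4] == '\0'))
			return 1;
		p += 4;
	}
	return 0;
}

/*
 * Walk a buffer of NUL-terminated paths, the shape the test's hook
 * emits with printf "...\0", and count how many paths would still be
 * handed to untracked-cache invalidation versus skipped.
 */
static void classify_hook_output(const char *buf, size_t len,
				 size_t *invalidated, size_t *skipped)
{
	size_t i = 0;

	assert(invalidated != NULL && skipped != NULL);
	*invalidated = 0;
	*skipped = 0;
	while (i < len) {
		const char *path = buf + i;
		const char *nul = memchr(path, '\0', len - i);

		if (!nul)
			break;	/* unterminated trailing record: ignore it */
		if (nul > path) {	/* skip empty records */
			if (has_dotgit(path))
				(*skipped)++;
			else
				(*invalidated)++;
		}
		i += (size_t)(nul - path) + 1;
	}
}
```

With the four paths the test appends, all four are skipped, which matches the test's expectation that the UNTR extension and the directory invalidation count do not change.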
Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
On 2/7/2018 4:21 AM, Nguyễn Thái Ngọc Duy wrote:
> read_directory() code ignores all paths named ".git" even if it's not
> a valid git repository. See treat_path() for details. Since ".git" is
> basically invisible to read_directory(), when we are asked to
> invalidate a path that contains ".git", we can safely ignore it
> because the slow path would not consider it anyway.
>
> This helps when fsmonitor is used and we have a real ".git" repo at
> worktree top. Occasionally .git/index will be updated and if the
> fsmonitor hook does not filter it, untracked cache is asked to
> invalidate the path ".git/index".
>
> Without this patch, we invalidate the root directory unnecessarily,
> which:
>
> - makes read_directory() fall back to slow path for root directory
>   (slower)
>
> - makes the index dirty (because UNTR extension is updated). Depending
>   on the index size, writing it down could also be slow.

Thank you again, this patch makes much more sense to me.

> A note about the new "safe_path" knob. Since this new check could be
> relatively expensive, avoid it when we know it's not needed. If the
> path comes from the index, it can't contain ".git". If it does
> contain, we may be screwed up at many more levels, not just this one.

I do have a simplifying suggestion to make. I noticed that other uses of
verify_path() check the path at the point where a potentially erroneous
path is passed in, so that all the underlying code can assume it is
valid. I think that makes sense here as well, and it makes for a smaller
patch.

> diff --git a/fsmonitor.h b/fsmonitor.h
> index cd3cc0ccf2..65f3743636 100644
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
>  {
>  	if (core_fsmonitor) {
>  		ce->ce_flags &= ~CE_FSMONITOR_VALID;
> -		untracked_cache_invalidate_path(istate, ce->name);
> +		untracked_cache_invalidate_path(istate, ce->name, 1);

This test isn't needed because we're pulling the name right out of the
cache entry, so it doesn't need to be verified.
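The commit message's argument rests on one predicate: whether any '/'-separated component of a path is exactly ".git". A minimal sketch follows; has_dotgit_component() and bytes_equal() are hypothetical names, and the check is far narrower than git's verify_path(). The ignore_case flag loosely mirrors the strcmp() to fspathcmp() change in treat_path(), which makes the ".git" comparison case-insensitive when core.ignorecase is set.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare exactly n bytes, optionally ignoring ASCII case. */
static int bytes_equal(const char *a, const char *b, size_t n, int ignore_case)
{
	for (size_t i = 0; i < n; i++) {
		unsigned char ca = (unsigned char)a[i];
		unsigned char cb = (unsigned char)b[i];

		if (ignore_case) {
			ca = (unsigned char)tolower(ca);
			cb = (unsigned char)tolower(cb);
		}
		if (ca != cb)
			return 0;
	}
	return 1;
}

/*
 * Hypothetical helper: return 1 if any '/'-separated component of
 * `path` is ".git", i.e. a path read_directory() would never visit.
 */
static int has_dotgit_component(const char *path, int ignore_case)
{
	assert(path != NULL);
	while (*path) {
		const char *slash = strchr(path, '/');
		size_t len = slash ? (size_t)(slash - path) : strlen(path);

		if (len == 4 && bytes_equal(path, ".git", 4, ignore_case))
			return 1;
		if (!slash)
			break;
		path = slash + 1;	/* move to the next component */
	}
	return 0;
}
```

Only whole components count, so ".gitignore" and "foo.git/bar" are not matched, while ".git/index" and "dir1/.git" are.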
Here is a modified version of your patch for consideration:

read_directory() code ignores all paths named ".git" even if it's not a
valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to invalidate
a path that contains ".git", we can safely ignore it because the slow
path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unnecessarily,
which:

- makes read_directory() fall back to slow path for root directory
  (slower)

- makes the index dirty (because UNTR extension is updated). Depending
  on the index size, writing it down could also be slow.

Noticed-by: Ævar Arnfjörð Bjarmason
Signed-off-by: Nguyễn Thái Ngọc Duy
Signed-off-by: Ben Peart
---

Notes:
    Base Ref: master
    Web-Diff: https://github.com/benpeart/git/commit/218a577618
    Checkout: git fetch https://github.com/benpeart/git verify_path-v1 && git checkout 218a577618

 dir.c                       |  2 +-
 fsmonitor.c                 |  6 +-
 t/t7519-status-fsmonitor.sh | 39 +++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..d431da46f5 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	if (!de)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
 				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
 	strbuf_addstr(path, de->d_name);
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..019576f306 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que

 static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
 {
-	int pos = index_name_pos(istate, name, strlen(name));
+	int pos;

+	if (!verify_path(name))
+		return;
+
+	pos = index_name_pos(istate, name, strlen(name));
 	if (pos >= 0) {
 		struct cache_entry *ce = istate->cache[pos];
 		ce->ce_flags &= ~CE_FSMONITOR_VALID;
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the