Recently Michael and I were working on a patch series (not yet
published), which did something like:


  const char *path = git_path("foo");

  ... do stuff with path ...

  for_each_ref(some_callback, NULL);

  ... do some other stuff ...

  unlink(path);

Clever readers may have spotted the bug immediately, but we did not,
until we found that random loose refs were being deleted from the
repository.

The problem is that git_path uses a static buffer that gets overwritten
by subsequent calls. The ref code uses it to iterate over all of the
loose refs in a directory, so our original path is trashed before
for_each_ref returns. To make it even more exciting, git_path
actually has a ring of _four_ buffers, so any trivial test you write
will probably work just fine; it's only when you use a real repository
that it causes problems (and then, only if the code path is such that
the loose refs were not previously accessed and cached!).
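To illustrate why a small test passes while a real repository fails, here
is a minimal sketch of a four-slot ring of static buffers. It is not
git's actual code: demo_path, demo_iterate, and demo_path_survives are
hypothetical stand-ins, and the buffer length is an arbitrary assumption.

```c
#include <stdio.h>
#include <string.h>

#define RING_SIZE 4
#define PATH_BUF_LEN 256

/*
 * Hypothetical stand-in for git_path(): formats a path into one of
 * four static buffers, rotating to the next slot on every call.
 */
static const char *demo_path(const char *name)
{
	static char ring[RING_SIZE][PATH_BUF_LEN];
	static unsigned int next;
	char *buf = ring[next++ % RING_SIZE];

	snprintf(buf, PATH_BUF_LEN, ".git/%s", name);
	return buf;
}

/*
 * Hypothetical stand-in for a callee (such as ref iteration) that
 * performs `calls` path lookups of its own.
 */
static void demo_iterate(int calls)
{
	int i;

	for (i = 0; i < calls; i++) {
		char name[64];

		snprintf(name, sizeof(name), "refs/heads/branch%d", i);
		demo_path(name);
	}
}

/*
 * Returns 1 if a held pointer still reads ".git/foo" after `calls`
 * intervening lookups, 0 if its slot was reused and overwritten.
 */
static int demo_path_survives(int calls)
{
	const char *path = demo_path("foo");

	demo_iterate(calls);
	return strcmp(path, ".git/foo") == 0;
}
```

In this sketch, up to three intervening lookups leave the held pointer
intact, and the fourth wraps around the ring and overwrites it. That
matches the behavior described above, where a trivial test with few
refs looks fine.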

Michael likened git_path to "a hand-grenade with the pin pulled out",
and I tend to agree. On the other hand, it's pretty darn useful to be
able to get a quick path without having to deal with memory allocation
and ownership.  This patch series tries to document the danger, and
remove some of the more questionable uses. I don't know whether this is
fixing any actual latent bugs; I traced a number of the code paths
manually, but never found a bug. There were some near misses, though,
which make me believe that seemingly-unrelated refactoring could
introduce a bug.
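The general shape of the safer pattern (taking an owned copy, in the
spirit of the "prefer git_pathdup to git_path" patch listed below) can be
sketched as follows. This is an illustration under assumptions, not the
code from the series: demo_path, demo_pathdup, and
demo_owned_path_survives are hypothetical names, and the real
git_pathdup has a different signature.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 4
#define PATH_BUF_LEN 256

/* Hypothetical stand-in for git_path(): rotating static buffers. */
static const char *demo_path(const char *name)
{
	static char ring[RING_SIZE][PATH_BUF_LEN];
	static unsigned int next;
	char *buf = ring[next++ % RING_SIZE];

	snprintf(buf, PATH_BUF_LEN, ".git/%s", name);
	return buf;
}

/*
 * Hypothetical stand-in for a "dup" variant: copies the path out of
 * the shared buffer at once, so the caller owns the result and must
 * free() it.
 */
static char *demo_pathdup(const char *name)
{
	const char *tmp = demo_path(name);
	size_t len = strlen(tmp) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, tmp, len);
	return copy;
}

/*
 * Returns 1 if the owned copy still reads ".git/foo" after `calls`
 * further lookups have cycled through the static buffers.
 */
static int demo_owned_path_survives(int calls)
{
	char *path = demo_pathdup("foo");
	int i, ok;

	for (i = 0; i < calls; i++)
		demo_path("refs/heads/other");
	ok = strcmp(path, ".git/foo") == 0;
	free(path);
	return ok;
}
```

The cost is an allocation and an explicit free() at every call site,
which is the convenience trade-off mentioned above.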

I stopped short of trying to eradicate git_path entirely, and settled
for:

  git grep -E '[^_](git_|mk)path\('

producing a fairly tame-looking set of function calls. It's OK to pass
the result of git_path() to a system call, or something that is a thin
wrapper around one (e.g., strbuf_read_file).

I think this takes us most of the way there. I left out a few cases
where introducing allocations would have been awkward, and I verified
that there were no bugs (e.g., rerere_path). And I left out a few spots
that conflict with topics in "next" (and luckily, in all cases what is
in next makes the problem go away, so we do not have to follow up for
those sites).

Along the way, there are a few cleanups (e.g., I polished off the recent
hold_lock_file_for_append topic which was on the list, as it had some
problematic calls).

  [01/17]: cache.h: clarify documentation for git_path, et al
  [02/17]: cache.h: complete set of git_path_submodule helpers
  [03/17]: t5700: modernize style
  [04/17]: add_to_alternates_file: don't add duplicate entries
  [05/17]: remove hold_lock_file_for_append
  [06/17]: prefer git_pathdup to git_path in some possibly-dangerous cases
  [07/17]: prefer mkpathdup to mkpath in assignments
  [08/17]: remote.c: drop extraneous local variable from migrate_file
  [09/17]: refs.c: remove extra git_path calls from read_loose_refs
  [10/17]: path.c: drop git_path_submodule
  [11/17]: refs.c: simplify strbufs in reflog setup and writing
  [12/17]: refs.c: avoid repeated git_path calls in rename_tmp_log
  [13/17]: refs.c: avoid git_path assignment in lock_ref_sha1_basic
  [14/17]: refs.c: remove_empty_directories can take a strbuf
  [15/17]: find_hook: keep our own static buffer
  [16/17]: get_repo_path: refactor path-allocation
  [17/17]: memoize common git-path "constant" files

-Peff