Re: how to remove from history just *one* version of a file/dir?

2017-09-15 Thread Jeff King
On Fri, Sep 15, 2017 at 07:06:43AM -0400, Robert P. J. Day wrote:

> > I think you want to stick with a --tree-filter (or an
> > --index-filter), but just selectively decide when to do the
> > deletion. For example, if you can tell the difference between the
> > two states based on the presence of some file, then perhaps:
> >
> >   git filter-branch --prune-empty --index-filter '
> >     if git rev-parse --verify :dir/sentinel >/dev/null 2>&1
> >     then
> >       git rm --cached -rf dir
> >     fi
> >   ' HEAD
> >
> > The "--prune-empty" is optional, but will drop commits that become
> > empty because they _only_ touched that directory.
> >
> > We use ":dir/sentinel" to see if the entry is in the index, because
> > the index filter won't have the tree checked out. Likewise, we need
> > to use "rm --cached" to just touch the index.
> 
>   got it. one last query -- i note that there is no "else" clause in
> that code for "--index-filter". am i assuming correctly that if i was
> using "--tree-filter" instead, i really would need if/then/else along
> the lines of:
> 
>   if blah ; then
>     skip_commit "$@"
>   else
>     git commit-tree "$@"
>   fi
> 
> thank you kindly.

No, I think a tree-filter would just be:

  if test -e dir/sentinel
  then
    rm -rf dir
    git add -u
  fi

(I can't remember if the "add -u" is necessary or not; I rarely use tree
filters).

In other words, for each commit you are just saying "if the bad version
of the directory is present, then get rid of it". You shouldn't need to
deal with commit-tree at all. The filter-branch script will take care of
committing whatever tree state your filter leaves in place.
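A minimal sketch of the tree-filter variant, run end to end in a throwaway repository so it can be tried without risk. The history (a README, a first generation of dir marked by dir/sentinel, its removal, then a second generation of dir without the sentinel), the file names and the demo identity are all made up for illustration. The sketch leaves out the "git add -u" step, on the assumption that filter-branch itself records files that the tree filter deletes; if your git version behaves differently, add it back. FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the warning (and pause) that newer git versions print before running filter-branch.

```shell
#!/bin/sh
# Throwaway-repo demo of the tree-filter approach: the first generation
# of "dir" (marked by dir/sentinel) is removed, the second one is kept.
set -e

# Hypothetical demo identity; the env var below only silences
# filter-branch's warning/pause in newer git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo hello > README
git add README && git commit -q -m "add README"

# First (unwanted) generation of dir, recognizable by dir/sentinel.
mkdir -p dir
echo old > dir/old.txt
touch dir/sentinel
git add dir && git commit -q -m "add old dir"
git rm -q -r dir && git commit -q -m "remove old dir"

# Second generation under the same name, without the sentinel.
mkdir -p dir
echo new > dir/new.txt
git add dir && git commit -q -m "add new dir"

# No "git add -u" here: assumed unnecessary because filter-branch
# records the deletions made by the tree filter on its own.
git filter-branch --prune-empty --tree-filter '
    if test -e dir/sentinel
    then
        rm -rf dir
    fi
' HEAD

git log --oneline                 # only "add README" and "add new dir" remain
git ls-tree -r --name-only HEAD   # README and dir/new.txt
```

Both the "add old dir" and "remove old dir" commits end up with the same tree as their parent once the old directory is gone, which is why --prune-empty drops them.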

Do note that I didn't test either of the versions I sent to you, so it's
possible I'm missing some subtle thing. But I'm pretty sure the general
direction is correct.

-Peff


Re: how to remove from history just *one* version of a file/dir?

2017-09-15 Thread Robert P. J. Day
On Thu, 14 Sep 2017, Jeff King wrote:

> On Thu, Sep 14, 2017 at 07:32:11AM -0400, Robert P. J. Day wrote:
>
> >   [is this the right place to ask questions about git usage? or is
> > there a different forum where one can submit possibly
> > embarrassingly silly questions?]
>
> No, this is the right place for embarrassing questions. :)
>
> >   say, early on, one commits a sizable directory of content, call
> > it /mydir. that directory sits there for a while until it becomes
> > obvious it's out of date and worthless and should never have been
> > committed. the obvious solution would seem to be:
> >
> >   $ git filter-branch --tree-filter 'rm -rf /mydir' HEAD
> >
> > correct?
>
> That would work, though note that using an --index-filter would be
> more efficient (since it avoids checking out each tree as it walks
> the history).

  i'm just digging into --index-filter as we speak, i realize it's
noticeably faster.

> >   however, say one version of that directory was committed early
> > on, then later tossed for being useless with "git rm", and
> > subsequently replaced by newer content under exactly the same
> > name. now i'd like to go back and delete the history related to
> > that early version of /mydir, but not the second.
>
> Makes sense as a goal.
>
> >   obviously, i can't use the above command as it would delete both
> > versions. so it appears the solution would be a trivial
> > application of the "--commit-filter" option:
> >
> >   git filter-branch --commit-filter '
> >     if [ "$GIT_COMMIT" = "" ] ; then
> >       skip_commit "$@";
> >     else
> >       git commit-tree "$@";
> >     fi' HEAD
> >
> > where  is the commit that introduced the first version of
> > /mydir. do i have that right? is there a simpler way to do this?
>
> No, this won't work. Filter-branch is not walking the history and
> applying the changes to each commit, like rebase does.  It's
> literally operating on each commit object, and recall that each
> commit object points to a tree that is a snapshot of the repository
> contents.
>
> So if you skip a commit, that commit itself goes away. But the
> commit after it (which didn't touch the unwanted contents) will
> still mention those contents in its tree.

  ah, of course, duh.

> I think you want to stick with a --tree-filter (or an
> --index-filter), but just selectively decide when to do the
> deletion. For example, if you can tell the difference between the
> two states based on the presence of some file, then perhaps:
>
>   git filter-branch --prune-empty --index-filter '
>     if git rev-parse --verify :dir/sentinel >/dev/null 2>&1
>     then
>       git rm --cached -rf dir
>     fi
>   ' HEAD
>
> The "--prune-empty" is optional, but will drop commits that become
> empty because they _only_ touched that directory.
>
> We use ":dir/sentinel" to see if the entry is in the index, because
> the index filter won't have the tree checked out. Likewise, we need
> to use "rm --cached" to just touch the index.

  got it. one last query -- i note that there is no "else" clause in
that code for "--index-filter". am i assuming correctly that if i was
using "--tree-filter" instead, i really would need if/then/else along
the lines of:

  if blah ; then
    skip_commit "$@"
  else
    git commit-tree "$@"
  fi

thank you kindly.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: how to remove from history just *one* version of a file/dir?

2017-09-14 Thread Jeff King
On Thu, Sep 14, 2017 at 07:32:11AM -0400, Robert P. J. Day wrote:

>   [is this the right place to ask questions about git usage? or is
> there a different forum where one can submit possibly embarrassingly
> silly questions?]

No, this is the right place for embarrassing questions. :)

>   say, early on, one commits a sizable directory of content, call it
> /mydir. that directory sits there for a while until it becomes obvious
> it's out of date and worthless and should never have been committed.
> the obvious solution would seem to be:
> 
>   $ git filter-branch --tree-filter 'rm -rf /mydir' HEAD
> 
> correct?

That would work, though note that using an --index-filter would be more
efficient (since it avoids checking out each tree as it walks the
history).
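For comparison, a minimal sketch of the --index-filter equivalent of removing the whole directory from every commit, run in a throwaway repository. The file names and demo identity are assumptions for illustration. Paths in the index are relative to the repository root, so the directory is named "mydir" rather than "/mydir"; --ignore-unmatch keeps "git rm" from failing on any commit where the directory is absent.

```shell
#!/bin/sh
# Throwaway-repo demo: drop "mydir" from every commit using --index-filter,
# which edits only the index and never checks out each tree.
set -e

# Hypothetical demo identity; the env var below only silences
# filter-branch's warning/pause in newer git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Commit 1: the unwanted directory plus a file we want to keep.
mkdir mydir
echo old > mydir/data.txt
echo hello > README
git add . && git commit -q -m "add README and mydir"

# Commit 2: an unrelated change that does not touch mydir.
echo more > other.txt
git add other.txt && git commit -q -m "add other.txt"

# Index paths are repository-relative ("mydir", not "/mydir").
# --ignore-unmatch keeps git rm from failing on commits without mydir.
git filter-branch --index-filter \
    'git rm --cached -r -q --ignore-unmatch mydir' HEAD

git ls-tree -r --name-only HEAD    # README and other.txt; mydir is gone
```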

>   however, say one version of that directory was committed early on,
> then later tossed for being useless with "git rm", and subsequently
> replaced by newer content under exactly the same name. now i'd like to
> go back and delete the history related to that early version of
> /mydir, but not the second.

Makes sense as a goal.

>   obviously, i can't use the above command as it would delete both
> versions. so it appears the solution would be a trivial application of
> the "--commit-filter" option:
> 
>   git filter-branch --commit-filter '
>     if [ "$GIT_COMMIT" = "" ] ; then
>       skip_commit "$@";
>     else
>       git commit-tree "$@";
>     fi' HEAD
> 
> where  is the commit that introduced the first version of
> /mydir. do i have that right? is there a simpler way to do this?

No, this won't work. Filter-branch is not walking the history and
applying the changes to each commit, like rebase does.  It's literally
operating on each commit object, and recall that each commit object
points to a tree that is a snapshot of the repository contents.

So if you skip a commit, that commit itself goes away. But the commit
after it (which didn't touch the unwanted contents) will still mention
those contents in its tree.
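The point about snapshots can be checked directly. This is a minimal sketch in a throwaway repository, with made-up file names and demo identity: it skips only the commit that introduced mydir and then looks at the rewritten tip. The skipped commit disappears from the log, but the next commit's tree still contains mydir/data.txt.

```shell
#!/bin/sh
# Throwaway-repo demo: skipping the commit that introduced mydir does not
# remove mydir from the snapshots (trees) of later commits.
set -e

# Hypothetical demo identity; the env var below only silences
# filter-branch's warning/pause in newer git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo hello > README
git add README && git commit -q -m "add README"

mkdir mydir
echo old > mydir/data.txt
git add mydir && git commit -q -m "add old mydir"
bad=$(git rev-parse HEAD)    # the commit we will try to skip

echo more > other.txt
git add other.txt && git commit -q -m "add other.txt"

# Splice the value of $bad into the filter text itself rather than
# relying on the filter seeing this script's (unexported) variables.
git filter-branch --commit-filter '
    if [ "$GIT_COMMIT" = '"$bad"' ]; then
        skip_commit "$@"
    else
        git commit-tree "$@"
    fi' HEAD

git log --oneline                 # two commits: the skipped one is gone...
git ls-tree -r --name-only HEAD   # ...but mydir/data.txt is still in the tree
```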

I think you want to stick with a --tree-filter (or an --index-filter),
but just selectively decide when to do the deletion. For example, if you
can tell the difference between the two states based on the presence of
some file, then perhaps:

  git filter-branch --prune-empty --index-filter '
    if git rev-parse --verify :dir/sentinel >/dev/null 2>&1
    then
      git rm --cached -rf dir
    fi
  ' HEAD

The "--prune-empty" is optional, but will drop commits that become empty
because they _only_ touched that directory.

We use ":dir/sentinel" to see if the entry is in the index, because the
index filter won't have the tree checked out. Likewise, we need to use
"rm --cached" to just touch the index.

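A minimal sketch that runs the sentinel-based index filter above against a throwaway repository. The history it builds (a README, a first generation of dir marked by dir/sentinel, the "git rm" of that directory, then a second generation of dir without the sentinel), the file names and the demo identity are assumptions for illustration; the filter itself is the one quoted above.

```shell
#!/bin/sh
# Throwaway-repo demo of the sentinel-based --index-filter: the first
# generation of "dir" (marked by dir/sentinel) is removed, the second kept.
set -e

# Hypothetical demo identity; the env var below only silences
# filter-branch's warning/pause in newer git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

echo hello > README
git add README && git commit -q -m "add README"

# First (unwanted) generation of dir, recognizable by dir/sentinel.
mkdir -p dir
echo old > dir/old.txt
touch dir/sentinel
git add dir && git commit -q -m "add old dir"
git rm -q -r dir && git commit -q -m "remove old dir"

# Second generation under the same name, without the sentinel.
mkdir -p dir
echo new > dir/new.txt
git add dir && git commit -q -m "add new dir"

# ":dir/sentinel" looks the path up in the index, which is all an
# index filter has; "rm --cached" changes only the index.
git filter-branch --prune-empty --index-filter '
    if git rev-parse --verify :dir/sentinel >/dev/null 2>&1
    then
        git rm --cached -rf dir
    fi
' HEAD

git log --oneline                 # only "add README" and "add new dir" remain
git ls-tree -r --name-only HEAD   # README and dir/new.txt
```

filter-branch keeps the pre-rewrite history under refs/original/, so the old commits stay reachable until that ref is deleted.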
-Peff


how to remove from history just *one* version of a file/dir?

2017-09-14 Thread Robert P. J. Day

  [is this the right place to ask questions about git usage? or is
there a different forum where one can submit possibly embarrassingly
silly questions?]

  i've been perusing "git filter-branch", and i'm curious if i have
the right idea about how to very selectively get rid of some useless
history.

  say, early on, one commits a sizable directory of content, call it
/mydir. that directory sits there for a while until it becomes obvious
it's out of date and worthless and should never have been committed.
the obvious solution would seem to be:

  $ git filter-branch --tree-filter 'rm -rf /mydir' HEAD

correct?

  however, say one version of that directory was committed early on,
then later tossed for being useless with "git rm", and subsequently
replaced by newer content under exactly the same name. now i'd like to
go back and delete the history related to that early version of
/mydir, but not the second.

  obviously, i can't use the above command as it would delete both
versions. so it appears the solution would be a trivial application of
the "--commit-filter" option:

  git filter-branch --commit-filter '
    if [ "$GIT_COMMIT" = "" ] ; then
      skip_commit "$@";
    else
      git commit-tree "$@";
    fi' HEAD

where  is the commit that introduced the first version of
/mydir. do i have that right? is there a simpler way to do this?

rday
