Re: [PATCH v3 1/4] automatically ban strcpy()

2018-07-28 Thread Jeff King
On Fri, Jul 27, 2018 at 10:34:20AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0
> >
> > Interesting. I'd have probably done it with an interactive rebase:
> >
> >   $ git rebase -i HEAD~4
> >   [change first "pick" to "edit"; after stopping...]
> >   $ git reset --hard HEAD^ ;# throw away patch 1
> >   $ git am -s mbox ;# apply single patch
> >   $ git rebase --continue
> >
> > Which is really the same thing,...
> 
> I have unfounded fear for doing anything other than "edit", "commit
> --amend", "rebase --continue", or "rebase --abort" during a "rebase
> -i" session.  
> 
> Especiallly "reset --hard" with anything but HEAD.  I guess that is
> because I do not fully trust/understand how the sequencer machinery
> replays the remainder of todo tasks on top of the HEAD that is
> different from what it left me to futz with when it relinquished the
> control.

I jump around via "reset --hard" all the time during interactive
rebases. I don't recall having any issues, though that does not mean
there isn't a corner case lurking. :)

> Also "am" during "rebase -i" is scary to me, as "am" also tries to
> keep its own sequencing state.  Do you know if "rebase --continue"
> would work correctly in the above sequence if "am" conflicted, I
> gave up, and said "am --abort", for example?  I don't offhand know.

This one is trickier. I _assumed_ it would be fine to "am" during a
rebase, but I didn't actually try it (in fact, I rarely do anything
exotic with "am", since my main use is just test-applying patches on a
detached head).

It _seems_ to work with this example:

-- >8 --
git init repo
cd repo

for i in 1 2 3 4; do
  echo $i >file
  git add file
  git commit -m $i
done
git format-patch --stdout -1 >patch

git reset --hard HEAD^
GIT_EDITOR='perl -i -pe "/2$/ and s/pick/edit/"' git rebase -i HEAD~2
git am patch
git am --abort
-- 8< --

I think because "am" lives in $GIT_DIR/rebase-apply, and "rebase -i"
lives in ".git/rebase-merge". Of course "rebase" can use the
rebase-apply directory, but I think interactive-rebase never will.

So it works, but mostly by luck. :)

In my ideal world, operations like this that can be continued would be
stored in a stack, and each stack element would know its operation type.
So you could do:

  # push a rebase onto the stack
  git rebase foo

  # while stopped, you might do another operation which pushes onto the
  # stack
  git am ~/patch

  # aborting an operation (or finishing it naturally) pops it off the
  # stack; now we just have the rebase on the stack
  git am --abort

  # aborting an operation that's not at the top of the stack would
  # either be an error, or could auto-abort everybody on top
  git am ~/patch
  git rebase --abort ;# aborts the am, too!

  # you could even nest similar operations; by default we find the
  # top-most one in the stack, but you could refer to them by stack
  # position.
  #
  # put us in a rebase that stops at a conflict
  git rebase foo
  # oops, rewrite the last few commits as part of fixing the conflict
  git rebase -i HEAD~3
  # nope, let's abort the whole thing (stack level 0)
  git rebase --abort=0

  # it would also be nice to have generic commands to manipulate the
  # stack
  git op list ;# show the stack
  git op abort ;# abort the top operation, whatever it is
  git op continue ;# continue the top operation, whatever it is

I've hacked something similar to "git op continue" myself and find it
very useful, but:

  - it's intimately aware of all the possible operations, including some
custom ones that I have. I wouldn't need to if each operation
touched a well-known directory to push itself on the stack, and
provided a few commands in the stack directory for things like "run
this to abort me".

  - it sometimes behaves weirdly, because I "canceled" an operation with
"reset" or "checkout", and later I expect to continue operation X,
but find that some other operation Y was waiting. Having a stack
means it's much easier to see which operations are still hanging
around (we have this already with the prompt logic that shows things
like rebases, but again, it has to be intimate with which operations
are storing data in $GIT_DIR)

Anyway. That's something I've dreamed about for a while, but the thought
of retro-fitting the existing multi-command operations turned me off.
The current systems _usually_ works.

-Peff


Re: [PATCH v3 1/4] automatically ban strcpy()

2018-07-27 Thread Junio C Hamano
Jeff King  writes:

>>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0
>
> Interesting. I'd have probably done it with an interactive rebase:
>
>   $ git rebase -i HEAD~4
>   [change first "pick" to "edit"; after stopping...]
>   $ git reset --hard HEAD^ ;# throw away patch 1
>   $ git am -s mbox ;# apply single patch
>   $ git rebase --continue
>
> Which is really the same thing,...

I have unfounded fear for doing anything other than "edit", "commit
--amend", "rebase --continue", or "rebase --abort" during a "rebase
-i" session.  

Especiallly "reset --hard" with anything but HEAD.  I guess that is
because I do not fully trust/understand how the sequencer machinery
replays the remainder of todo tasks on top of the HEAD that is
different from what it left me to futz with when it relinquished the
control.

Also "am" during "rebase -i" is scary to me, as "am" also tries to
keep its own sequencing state.  Do you know if "rebase --continue"
would work correctly in the above sequence if "am" conflicted, I
gave up, and said "am --abort", for example?  I don't offhand know.


Re: [PATCH v3 1/4] automatically ban strcpy()

2018-07-27 Thread Jeff King
On Thu, Jul 26, 2018 at 10:33:27AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > So here's a replacement for just patch 1 (I'm assuming this creates less
> > work than re-posting them all, but it may not be if Junio prefers
> > dealing with a whole new mbox rather than a "rebase -i", "reset --hard
> > HEAD^", "git am" -- let me know if you'd prefer it the other way).
> 
> A single patch replacement that is clearly marked which one to
> replace and which other ones to keep, like you did here, is fine.
> The amount of work is about the same either way.
> 
> 0) I would first do these to make sure that I can replace:
> [..]

Thanks. As always, I find it interesting to see your workflows.

> 1-b) With a single patch replacement, it is quite different.
> 
>  $ git checkout HEAD~4;# we are replacing 1/4 of the original
>  $ git am -s mbox   ;# that single patch
>  $ git show-branch HEAD @{-1}
> [...]
> The most natural thing to do at this point is
> 
>  $ git cherry-pick -3 @{-1}
> 
> But we know range-pick is buggy and loses core.rewriteref, so
> instead I did this, which I know carries the notes forward:
> 
>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0

Interesting. I'd have probably done it with an interactive rebase:

  $ git rebase -i HEAD~4
  [change first "pick" to "edit"; after stopping...]
  $ git reset --hard HEAD^ ;# throw away patch 1
  $ git am -s mbox ;# apply single patch
  $ git rebase --continue

Which is really the same thing, but "cheats" around the cherry-pick
problem by using rebase (which I think handles the rewriteref stuff
correctly even in interactive mode).

I guess if we wanted to be really fancy, just replacing the first "pick"
with "x git am -s mbox" would automate it. That might be handy for the
multi-patch case.

Anyway, thanks for handling it. :)

-Peff


Re: [PATCH v3 1/4] automatically ban strcpy()

2018-07-26 Thread Junio C Hamano
Jeff King  writes:

> So here's a replacement for just patch 1 (I'm assuming this creates less
> work than re-posting them all, but it may not be if Junio prefers
> dealing with a whole new mbox rather than a "rebase -i", "reset --hard
> HEAD^", "git am" -- let me know if you'd prefer it the other way).

A single patch replacement that is clearly marked which one to
replace and which other ones to keep, like you did here, is fine.
The amount of work is about the same either way.

0) I would first do these to make sure that I can replace:

 $ git checkout jk/banned-functions
 $ git log --first-parent --oneline master..
 $ git log --first-parent --oneline next..

If 'next' has some patches that are not in 'master' from the topic,
I must refrain from replacing them (in this case, there is no such
issue).

1-a) With a wholesale replacement,

 $ git checkout master...   ;# try to keep the same base
 $ git am -s mbox   ;# on detached HEAD
 $ git tbdiff ..@{-1} @{-1}..   ;# or range-diff

If the range-/tbdiff tells me that earlier N patches are the same,
the above is followed by something like (pardon off-by-one)

 $ git rebase --onto @{-1}~N HEAD~N

to preserve as many original commits as possible.

1-b) With a single patch replacement, it is quite different.

 $ git checkout HEAD~4  ;# we are replacing 1/4 of the original
 $ git am -s mbox   ;# that single patch
 $ git show-branch HEAD @{-1}

That would give me something like this

* [HEAD] automatically ban strcpy()
 ! [@{-1}] banned.h: mark strncpy() as banned
--
*  [HEAD] automatically ban strcpy()
 + [@{-1}] banned.h: mark strncpy() as banned
 + [@{-1}^] banned.h: mark sprintf() as banned
 + [@{-1}~2] banned.h: mark strcat() as banned
 + [@{-1}~3] automatically ban strcpy()
-- [HEAD^] Merge branch 'sb/blame-color' into jk/banned-function

The most natural thing to do at this point is

 $ git cherry-pick -3 @{-1}

But we know range-pick is buggy and loses core.rewriteref, so
instead I did this, which I know carries the notes forward:

 $ git rebase --onto HEAD @{-1}~3 @{-1}^0

Side note: The point of last "^0" is that I do not want to lose
the topic branch jk/banned-functions not just yet.

If I need to re-apply separate pieces of the original history, it
becomes very painful to emulate these multiple cherry-picks with
multiple "rebase --onto", though.  That is where "the amount of work
is about the same" comes from.  If cherry-pick worked correctly,
selective replacement should be less work for me.

Anyway, we already preserved as many original commits, but unlike
1-a, did so manually when we decided to replace selective patches.
So there is no further rebasing with this approach.

2) I now have two diverged histories in HEAD and @{-1} that I can
compare with range-/tbdiff in either case: 

 $ git tbdiff ..@{-1} @{-1}..

After the usual inspection and testing, replacement is concluded by

 $ git checkout -B @{-1}

which takes me back to (an updated) jk/banned-functions topic.



[PATCH v3 1/4] automatically ban strcpy()

2018-07-26 Thread Jeff King
On Thu, Jul 26, 2018 at 02:58:40AM -0400, Jeff King wrote:

> On Tue, Jul 24, 2018 at 01:20:58PM -0400, Eric Sunshine wrote:
> 
> > On Tue, Jul 24, 2018 at 5:26 AM Jeff King  wrote:
> > >   1. We'll only trigger with -Wimplicit-function-declaration
> > >  (and only stop compilation with -Werror). These are
> > >  generally enabled by DEVELOPER=1. If you _don't_ have
> > >  that set, we'll still catch the problem, but only at
> > >  link-time, with a slightly less useful message:
> > >
> > >  If instead we convert this to a reference to an
> > >  undefined variable, that always dies immediately. But
> > >  gcc seems to print the set of errors twice, which
> > >  clutters things up.
> > 
> > The above does a pretty good job of convincing me that this ought to
> > be implemented via an undefined variable rather than undefined
> > function, exactly because it is the newcomer or casual contributor who
> > is most likely to trip over a banned function, and almost certainly
> > won't have DEVELOPER=1 set. The gcc clutter seems a minor point
> > against the benefit this provides to that audience.
> 
> OK. I was on the fence, but it should be pretty trivial to switch. Let
> me see if I can just make a replacement for patch 1, or if the whole
> thing needs to be rebased on top.

The rest apply cleanly (because they're far enough away textually from
the definition of BANNED).

So here's a replacement for just patch 1 (I'm assuming this creates less
work than re-posting them all, but it may not be if Junio prefers
dealing with a whole new mbox rather than a "rebase -i", "reset --hard
HEAD^", "git am" -- let me know if you'd prefer it the other way).

-- >8 --
Subject: [PATCH] automatically ban strcpy()

There are a few standard C functions (like strcpy) which are
easy to misuse. E.g.:

  char path[PATH_MAX];
  strcpy(path, arg);

may overflow the "path" buffer. Sometimes there's an earlier
constraint on the size of "arg", but even in such a case
it's hard to verify that the code is correct. If the size
really is unbounded, you're better off using a dynamic
helper like strbuf:

  struct strbuf path = STRBUF_INIT;
  strbuf_addstr(path, arg);

or if it really is bounded, then use xsnprintf to show your
expectation (and get a run-time assertion):

  char path[PATH_MAX];
  xsnprintf(path, sizeof(path), "%s", arg);

which makes further auditing easier.

We'd usually catch undesirable code like this in a review,
but there's no automated enforcement. Adding that
enforcement can help us be more consistent and save effort
(and a round-trip) during review.

This patch teaches the compiler to report an error when it
sees strcpy (and will become a model for banning a few other
functions). This has a few advantages over a separate
linting tool:

  1. We know it's run as part of a build cycle, so it's
 hard to ignore. Whereas an external linter is an extra
 step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
 parsing the code anyway.

  3. We know it's robust against false positives (unlike a
 grep-based linter).

The two big disadvantages are:

  1. We'll only check code that is actually compiled, so it
 may miss code that isn't triggered on your particular
 system. But since presumably people don't add new code
 without compiling it (and if they do, the banned
 function list is the least of their worries), we really
 only care about failing to clean up old code when
 adding new functions to the list. And that's easy
 enough to address with a manual audit when adding a new
 function (which is what I did for the functions here).

  2. If this ends up generating false positives, it's going
 to be harder to disable (as opposed to a separate
 linter, which may have mechanisms for overriding a
 particular case).

 But the intent is to only ban functions which are
 obviously bad, and for which we accept using an
 alternative even when this particular use isn't buggy
 (e.g., the xsnprintf alternative above).

The implementation here is simple: we'll define a macro for
the banned function which replaces it with a reference to a
descriptively named but undeclared identifier.  Replacing it
with any invalid code would work (since we just want to
break compilation).  But ideally we'd meet these goals:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The output with this patch looks like this (using gcc 7, on
a checkout with 022d2ac1f3