Re: [PATCH v3 1/4] automatically ban strcpy()
On Fri, Jul 27, 2018 at 10:34:20AM -0700, Junio C Hamano wrote: > Jeff King writes: > > >> $ git rebase --onto HEAD @{-1}~3 @{-1}^0 > > > > Interesting. I'd have probably done it with an interactive rebase: > > > > $ git rebase -i HEAD~4 > > [change first "pick" to "edit"; after stopping...] > > $ git reset --hard HEAD^ ;# throw away patch 1 > > $ git am -s mbox ;# apply single patch > > $ git rebase --continue > > > > Which is really the same thing,... > > I have unfounded fear for doing anything other than "edit", "commit > --amend", "rebase --continue", or "rebase --abort" during a "rebase > -i" session. > > Especiallly "reset --hard" with anything but HEAD. I guess that is > because I do not fully trust/understand how the sequencer machinery > replays the remainder of todo tasks on top of the HEAD that is > different from what it left me to futz with when it relinquished the > control. I jump around via "reset --hard" all the time during interactive rebases. I don't recall having any issues, though that does not mean there isn't a corner case lurking. :) > Also "am" during "rebase -i" is scary to me, as "am" also tries to > keep its own sequencing state. Do you know if "rebase --continue" > would work correctly in the above sequence if "am" conflicted, I > gave up, and said "am --abort", for example? I don't offhand know. This one is trickier. I _assumed_ it would be fine to "am" during a rebase, but I didn't actually try it (in fact, I rarely do anything exotic with "am", since my main use is just test-applying patches on a detached head). It _seems_ to work with this example: -- >8 -- git init repo cd repo for i in 1 2 3 4; do echo $i >file git add file git commit -m $i done git format-patch --stdout -1 >patch git reset --hard HEAD^ GIT_EDITOR='perl -i -pe "/2$/ and s/pick/edit/"' git rebase -i HEAD~2 git am patch git am --abort -- 8< -- I think because "am" lives in $GIT_DIR/rebase-apply, and "rebase -i" lives in ".git/rebase-merge". Of course "rebase" can use the rebase-apply directory, but I think interactive-rebase never will. So it works, but mostly by luck. :) In my ideal world, operations like this that can be continued would be stored in a stack, and each stack element would know its operation type. So you could do: # push a rebase onto the stack git rebase foo # while stopped, you might do another operation which pushes onto the # stack git am ~/patch # aborting an operation (or finishing it naturally) pops it off the # stack; now we just have the rebase on the stack git am --abort # aborting an operation that's not at the top of the stack would # either be an error, or could auto-abort everybody on top git am ~/patch git rebase --abort ;# aborts the am, too! # you could even nest similar operations; by default we find the # top-most one in the stack, but you could refer to them by stack # position. # # put us in a rebase that stops at a conflict git rebase foo # oops, rewrite the last few commits as part of fixing the conflict git rebase -i HEAD~3 # nope, let's abort the whole thing (stack level 0) git rebase --abort=0 # it would also be nice to have generic commands to manipulate the # stack git op list ;# show the stack git op abort ;# abort the top operation, whatever it is git op continue ;# continue the top operation, whatever it is I've hacked something similar to "git op continue" myself and find it very useful, but: - it's intimately aware of all the possible operations, including some custom ones that I have. I wouldn't need to if each operation touched a well-known directory to push itself on the stack, and provided a few commands in the stack directory for things like "run this to abort me". - it sometimes behaves weirdly, because I "canceled" an operation with "reset" or "checkout", and later I expect to continue operation X, but find that some other operation Y was waiting. Having a stack means it's much easier to see which operations are still hanging around (we have this already with the prompt logic that shows things like rebases, but again, it has to be intimate with which operations are storing data in $GIT_DIR) Anyway. That's something I've dreamed about for a while, but the thought of retro-fitting the existing multi-command operations turned me off. The current systems _usually_ works. -Peff
Re: [PATCH v3 1/4] automatically ban strcpy()
Jeff King writes: >> $ git rebase --onto HEAD @{-1}~3 @{-1}^0 > > Interesting. I'd have probably done it with an interactive rebase: > > $ git rebase -i HEAD~4 > [change first "pick" to "edit"; after stopping...] > $ git reset --hard HEAD^ ;# throw away patch 1 > $ git am -s mbox ;# apply single patch > $ git rebase --continue > > Which is really the same thing,... I have unfounded fear for doing anything other than "edit", "commit --amend", "rebase --continue", or "rebase --abort" during a "rebase -i" session. Especiallly "reset --hard" with anything but HEAD. I guess that is because I do not fully trust/understand how the sequencer machinery replays the remainder of todo tasks on top of the HEAD that is different from what it left me to futz with when it relinquished the control. Also "am" during "rebase -i" is scary to me, as "am" also tries to keep its own sequencing state. Do you know if "rebase --continue" would work correctly in the above sequence if "am" conflicted, I gave up, and said "am --abort", for example? I don't offhand know.
Re: [PATCH v3 1/4] automatically ban strcpy()
On Thu, Jul 26, 2018 at 10:33:27AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > So here's a replacement for just patch 1 (I'm assuming this creates less > > work than re-posting them all, but it may not be if Junio prefers > > dealing with a whole new mbox rather than a "rebase -i", "reset --hard > > HEAD^", "git am" -- let me know if you'd prefer it the other way). > > A single patch replacement that is clearly marked which one to > replace and which other ones to keep, like you did here, is fine. > The amount of work is about the same either way. > > 0) I would first do these to make sure that I can replace: > [..] Thanks. As always, I find it interesting to see your workflows. > 1-b) With a single patch replacement, it is quite different. > > $ git checkout HEAD~4;# we are replacing 1/4 of the original > $ git am -s mbox ;# that single patch > $ git show-branch HEAD @{-1} > [...] > The most natural thing to do at this point is > > $ git cherry-pick -3 @{-1} > > But we know range-pick is buggy and loses core.rewriteref, so > instead I did this, which I know carries the notes forward: > > $ git rebase --onto HEAD @{-1}~3 @{-1}^0 Interesting. I'd have probably done it with an interactive rebase: $ git rebase -i HEAD~4 [change first "pick" to "edit"; after stopping...] $ git reset --hard HEAD^ ;# throw away patch 1 $ git am -s mbox ;# apply single patch $ git rebase --continue Which is really the same thing, but "cheats" around the cherry-pick problem by using rebase (which I think handles the rewriteref stuff correctly even in interactive mode). I guess if we wanted to be really fancy, just replacing the first "pick" with "x git am -s mbox" would automate it. That might be handy for the multi-patch case. Anyway, thanks for handling it. :) -Peff
Re: [PATCH v3 1/4] automatically ban strcpy()
Jeff King writes: > So here's a replacement for just patch 1 (I'm assuming this creates less > work than re-posting them all, but it may not be if Junio prefers > dealing with a whole new mbox rather than a "rebase -i", "reset --hard > HEAD^", "git am" -- let me know if you'd prefer it the other way). A single patch replacement that is clearly marked which one to replace and which other ones to keep, like you did here, is fine. The amount of work is about the same either way. 0) I would first do these to make sure that I can replace: $ git checkout jk/banned-functions $ git log --first-parent --oneline master.. $ git log --first-parent --oneline next.. If 'next' has some patches that are not in 'master' from the topic, I must refrain from replacing them (in this case, there is no such issue). 1-a) With a wholesale replacement, $ git checkout master... ;# try to keep the same base $ git am -s mbox ;# on detached HEAD $ git tbdiff ..@{-1} @{-1}.. ;# or range-diff If the range-/tbdiff tells me that earlier N patches are the same, the above is followed by something like (pardon off-by-one) $ git rebase --onto @{-1}~N HEAD~N to preserve as many original commits as possible. 1-b) With a single patch replacement, it is quite different. $ git checkout HEAD~4 ;# we are replacing 1/4 of the original $ git am -s mbox ;# that single patch $ git show-branch HEAD @{-1} That would give me something like this * [HEAD] automatically ban strcpy() ! [@{-1}] banned.h: mark strncpy() as banned -- * [HEAD] automatically ban strcpy() + [@{-1}] banned.h: mark strncpy() as banned + [@{-1}^] banned.h: mark sprintf() as banned + [@{-1}~2] banned.h: mark strcat() as banned + [@{-1}~3] automatically ban strcpy() -- [HEAD^] Merge branch 'sb/blame-color' into jk/banned-function The most natural thing to do at this point is $ git cherry-pick -3 @{-1} But we know range-pick is buggy and loses core.rewriteref, so instead I did this, which I know carries the notes forward: $ git rebase --onto HEAD @{-1}~3 @{-1}^0 Side note: The point of last "^0" is that I do not want to lose the topic branch jk/banned-functions not just yet. If I need to re-apply separate pieces of the original history, it becomes very painful to emulate these multiple cherry-picks with multiple "rebase --onto", though. That is where "the amount of work is about the same" comes from. If cherry-pick worked correctly, selective replacement should be less work for me. Anyway, we already preserved as many original commits, but unlike 1-a, did so manually when we decided to replace selective patches. So there is no further rebasing with this approach. 2) I now have two diverged histories in HEAD and @{-1} that I can compare with range-/tbdiff in either case: $ git tbdiff ..@{-1} @{-1}.. After the usual inspection and testing, replacement is concluded by $ git checkout -B @{-1} which takes me back to (an updated) jk/banned-functions topic.
[PATCH v3 1/4] automatically ban strcpy()
On Thu, Jul 26, 2018 at 02:58:40AM -0400, Jeff King wrote: > On Tue, Jul 24, 2018 at 01:20:58PM -0400, Eric Sunshine wrote: > > > On Tue, Jul 24, 2018 at 5:26 AM Jeff King wrote: > > > 1. We'll only trigger with -Wimplicit-function-declaration > > > (and only stop compilation with -Werror). These are > > > generally enabled by DEVELOPER=1. If you _don't_ have > > > that set, we'll still catch the problem, but only at > > > link-time, with a slightly less useful message: > > > > > > If instead we convert this to a reference to an > > > undefined variable, that always dies immediately. But > > > gcc seems to print the set of errors twice, which > > > clutters things up. > > > > The above does a pretty good job of convincing me that this ought to > > be implemented via an undefined variable rather than undefined > > function, exactly because it is the newcomer or casual contributor who > > is most likely to trip over a banned function, and almost certainly > > won't have DEVELOPER=1 set. The gcc clutter seems a minor point > > against the benefit this provides to that audience. > > OK. I was on the fence, but it should be pretty trivial to switch. Let > me see if I can just make a replacement for patch 1, or if the whole > thing needs to be rebased on top. The rest apply cleanly (because they're far enough away textually from the definition of BANNED). So here's a replacement for just patch 1 (I'm assuming this creates less work than re-posting them all, but it may not be if Junio prefers dealing with a whole new mbox rather than a "rebase -i", "reset --hard HEAD^", "git am" -- let me know if you'd prefer it the other way). -- >8 -- Subject: [PATCH] automatically ban strcpy() There are a few standard C functions (like strcpy) which are easy to misuse. E.g.: char path[PATH_MAX]; strcpy(path, arg); may overflow the "path" buffer. Sometimes there's an earlier constraint on the size of "arg", but even in such a case it's hard to verify that the code is correct. If the size really is unbounded, you're better off using a dynamic helper like strbuf: struct strbuf path = STRBUF_INIT; strbuf_addstr(path, arg); or if it really is bounded, then use xsnprintf to show your expectation (and get a run-time assertion): char path[PATH_MAX]; xsnprintf(path, sizeof(path), "%s", arg); which makes further auditing easier. We'd usually catch undesirable code like this in a review, but there's no automated enforcement. Adding that enforcement can help us be more consistent and save effort (and a round-trip) during review. This patch teaches the compiler to report an error when it sees strcpy (and will become a model for banning a few other functions). This has a few advantages over a separate linting tool: 1. We know it's run as part of a build cycle, so it's hard to ignore. Whereas an external linter is an extra step the developer needs to remember to do. 2. Likewise, it's basically free since the compiler is parsing the code anyway. 3. We know it's robust against false positives (unlike a grep-based linter). The two big disadvantages are: 1. We'll only check code that is actually compiled, so it may miss code that isn't triggered on your particular system. But since presumably people don't add new code without compiling it (and if they do, the banned function list is the least of their worries), we really only care about failing to clean up old code when adding new functions to the list. And that's easy enough to address with a manual audit when adding a new function (which is what I did for the functions here). 2. If this ends up generating false positives, it's going to be harder to disable (as opposed to a separate linter, which may have mechanisms for overriding a particular case). But the intent is to only ban functions which are obviously bad, and for which we accept using an alternative even when this particular use isn't buggy (e.g., the xsnprintf alternative above). The implementation here is simple: we'll define a macro for the banned function which replaces it with a reference to a descriptively named but undeclared identifier. Replacing it with any invalid code would work (since we just want to break compilation). But ideally we'd meet these goals: - it should be portable; ideally this would trigger everywhere, and does not need to be part of a DEVELOPER=1 setup (because unlike warnings which may depend on the compiler or system, this is a clear indicator of something wrong in the code). - it should generate a readable error that gives the developer a clue what happened - it should avoid generating too much other cruft that makes it hard to see the actual error - it should mention the original callsite in the error The output with this patch looks like this (using gcc 7, on a checkout with 022d2ac1f3