Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Junio C Hamano
Jeff King  writes:

> Thanks for an interesting (and exotic) counter-point. For strcpy(), you
> always have to run strlen() anyway to know you're not going to overflow,
> so you might as well memcpy() at that point. But if you really are OK
> with truncation, then using strncpy() lets you copy while walking over
> the string only once, which is more efficient.  On the other hand,
> strncpy() fills out NULs to the end of the destination array, so you are
> paying an extra cost for small strings.
>
>> So I used strncpy() advisedly, and I ignore people running Coccinelle
>> scripts and blindly sending patches to "fix" ext4.

Given that strncpy() was invented in V7 days for specifically the
purpose of filling the filename field, it is not surprising at all
that it is the perfect tool for the same purpose in ext4 ;-)

> We don't have any remaining calls to strncpy() in Git, so this lets us
> punt on deciding if the ban is worth changing what's there. We can wait
> for somebody to decide they _really_ need strncpy, and we can discuss it
> then with a concrete case.

Yes, and the plan (or at least your plan) is to limit the banned
list to things that really has no reason to be used, the above
sounds like a good approach.



Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 10:48:37AM -0700, Elijah Newren wrote:

> > Is it possible to extend this to ban variables as well? I'm still
> > going to delete the_index from library code. Once that work is done I
> 
> Or perhaps constants, such as PATH_MAX to avoid problems like this one
> from 2.18.0 timeframe:
> https://public-inbox.org/git/7d1237c7-5a83-d766-7d93-5f0d59166...@web.de/

I've been slowly trying to eradicate PATH_MAX from our code base. And I
would be happy with an eventual automated ban there. Unlike the_index,
it comes from the system, so it's in the same boat as strcpy() etc.

That said, I think it's less urgent. The urgent problem fixed by the
patch you linked was the use of strcpy() to overflow the buffer. Without
that, it just becomes a normal bug where we do not handle long paths
well on some operating systems.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 04:41:34PM +0200, Duy Nguyen wrote:

> > Let's start by banning strcpy() and sprintf(). It's not
> > impossible to use these correctly, but it's easy to do so
> > incorrectly, and there's always a better option.
> 
> Is it possible to extend this to ban variables as well? I'm still
> going to delete the_index from library code. Once that work is done I
> will ban it from entering again (it's only allowed in builtin/ for
> example). The next target after that would be the_repository.
> 
> Right now I will do this by not declaring the variable [2], which ends
> up with a much less friendly compile warning. I did something similar
> [1] in an earlier iteration but did not do extensive research on this
> topic like you did.

IMHO just not declaring the variable is the right move (and is what I
would do with these functions if libc wasn't already declaring them).

The compile error may be less friendly, but these things are temporary
as topics-in-flight catch up. Whereas strcpy() will be with us forever.

I also think that removing variables is a special case of a larger
technique: when we change function semantics, we try to change the
signatures, too. So there you'll get a type error, or not enough
arguments, or whatever. And the next step is usually "git log" or "git
blame" to see what happened, and what the conversion should look like.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 09:22:41AM -0400, Theodore Y. Ts'o wrote:

> On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> > Ditto for sprintf, where you should _always_ be using at least xsnprintf
> > (or some better tool, depending on the situation).  And for strncpy,
> > strlcpy (or again, some better tool) is strictly an improvement.
> 
> Nitpick: this may be true for git, but it's not strictly true in all
> cases.  I actually have a (non-git) case where strncpy *is* the right
> tool.  And this is one where I'm copying a NUL-terminated string into
> a fixed-length charater array (in the ext4 superblock) which is *not*
> NUL-terminated.  In that case, strncpy works(); but strlcpy() does not
> do what I want.

Thanks for an interesting (and exotic) counter-point. For strcpy(), you
always have to run strlen() anyway to know you're not going to overflow,
so you might as well memcpy() at that point. But if you really are OK
with truncation, then using strncpy() lets you copy while walking over
the string only once, which is more efficient.  On the other hand,
strncpy() fills out NULs to the end of the destination array, so you are
paying an extra cost for small strings.

> So I used strncpy() advisedly, and I ignore people running Coccinelle
> scripts and blindly sending patches to "fix" ext4.

Even after hearing your counter-point, I think I'm still OK with the
ban. Because as you've noticed, even if the call is fine, it carries an
ongoing maintenance cost. Every time you (or somebody else) audits, you
have to skip over that known-good call. And you have to deal with
well-meaning patches. So to me there's value in eliminating it even if
it's an acceptable tool.

We don't have any remaining calls to strncpy() in Git, so this lets us
punt on deciding if the ban is worth changing what's there. We can wait
for somebody to decide they _really_ need strncpy, and we can discuss it
then with a concrete case.

> But perhaps that's also a solution for git?  You don't have to
> necessarily put them on a banned list; you could instead have some
> handy, pre-set scripts that scan the entire code base looking for
> "bad" functions with cleanups automatically suggested.  This could be
> run at patch review time, and/or periodically (especially before a
> release).

We do this already with coccinelle for some transformations. But I think
there's real value in linting that's always run, and is cheap. Catching
these things early means we don't have to spend time on them in review,
or dealing with follow-up patches.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 7:41 AM, Duy Nguyen  wrote:
> On Thu, Jul 19, 2018 at 10:40 PM Jeff King  wrote:
>>
>> There are a few standard C functions (like strcpy) which are
>> easy to misuse. We generally discourage these in reviews,
>> but we haven't put advice in CodingGuidelines, nor provided
>> any automated enforcement. The latter is especially
>> important because it's more consistent, and it can often
>> save a round-trip of review.
>>
>> Let's start by banning strcpy() and sprintf(). It's not
>> impossible to use these correctly, but it's easy to do so
>> incorrectly, and there's always a better option.
>
> Is it possible to extend this to ban variables as well? I'm still
> going to delete the_index from library code. Once that work is done I

Or perhaps constants, such as PATH_MAX to avoid problems like this one
from 2.18.0 timeframe:
https://public-inbox.org/git/7d1237c7-5a83-d766-7d93-5f0d59166...@web.de/


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 02:32:29AM -0700, Junio C Hamano wrote:

> > Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> > but I think it's silly for it to be. I would never add it to this list.
> 
> A tangent, but is that because they want you to use memmove()
> instead so that you do not have to worry about overlapping copies,
> perhaps?

That was my first thought, too, but nope. They recommend memcpy_s()
instead. Which in my opinion adds very little value, while missing the
most common misuse of memcpy I've seen in practice (the overlapping
thing).

Helpers like our COPY_ARRAY() are much more useful for preventing sizing
mistakes, IMHO. But again, I'd never ban memcpy. The right tool for
encouraging COPY_ARRAY() is coccinelle (because the matching is
complicated, but also because we can mechanically turn it into the right
thing, whereas a strcpy is going to require some manual reworking).

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Duy Nguyen
On Thu, Jul 19, 2018 at 10:40 PM Jeff King  wrote:
>
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.
>
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.

Is it possible to extend this to ban variables as well? I'm still
going to delete the_index from library code. Once that work is done I
will ban it from entering again (it's only allowed in builtin/ for
example). The next target after that would be the_repository.

Right now I will do this by not declaring the variable [2], which ends
up with a much less friendly compile warning. I did something similar
[1] in an earlier iteration but did not do extensive research on this
topic like you did.

[1] https://public-inbox.org/git/20180606073933.14755-1-pclo...@gmail.com/T/
[2] https://public-inbox.org/git/20180616054157.32433-16-pclo...@gmail.com/
-- 
Duy


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Theodore Y. Ts'o
On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> Ditto for sprintf, where you should _always_ be using at least xsnprintf
> (or some better tool, depending on the situation).  And for strncpy,
> strlcpy (or again, some better tool) is strictly an improvement.

Nitpick: this may be true for git, but it's not strictly true in all
cases.  I actually have a (non-git) case where strncpy *is* the right
tool.  And this is one where I'm copying a NUL-terminated string into
a fixed-length charater array (in the ext4 superblock) which is *not*
NUL-terminated.  In that case, strncpy works(); but strlcpy() does not
do what I want.

So I used strncpy() advisedly, and I ignore people running Coccinelle
scripts and blindly sending patches to "fix" ext4.

But perhaps that's also a solution for git?  You don't have to
necessarily put them on a banned list; you could instead have some
handy, pre-set scripts that scan the entire code base looking for
"bad" functions with cleanups automatically suggested.  This could be
run at patch review time, and/or periodically (especially before a
release).

- Ted


Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Derrick Stolee

On 7/19/2018 4:39 PM, Jeff King wrote:

There are a few standard C functions (like strcpy) which are
easy to misuse. We generally discourage these in reviews,
but we haven't put advice in CodingGuidelines, nor provided
any automated enforcement. The latter is especially
important because it's more consistent, and it can often
save a round-trip of review.

Let's start by banning strcpy() and sprintf(). It's not
impossible to use these correctly, but it's easy to do so
incorrectly, and there's always a better option.

For enforcement, we can rely on the compiler and just
introduce code which breaks compilation when it sees these
functions. This has a few advantages:

   1. We know it's run as part of a build cycle, so it's
  hard to ignore. Whereas an external linter is an extra
  step the developer needs to remember to do.

   2. Likewise, it's basically free since the compiler is
  parsing the code anyway.

   3. We know it's robust against false positives (unlike a
  grep-based linter).


I know I'm late to the party, but I just wanted to chime in that I think 
this is a great idea. The more checks we have as part of the developer 
cycle means we have fewer checks that need to be caught manually in 
review (or can be missed in review).


Thanks,

-Stolee



Re: [PATCH 1/2] introduce "banned function" list

2018-07-20 Thread Junio C Hamano
Jeff King  writes:

>> Is it a downside that it is cumbersome to override?  This is not a
>> rhetorical question.  I am not convinced there will not be a legit
>> circumstance that calling strcpy (or whatever we are going to ban)
>> is the best solution and it is safe.  By "best", what I mean is "you
>> could instead use memcpy/strncpy/whatever" can legitimately be
>> argued with "but still using memcpy/strncpy/whatever is inferior
>> than using strcpy in this case for such and such reasons".
>
> In my opinion, no, this is not a problem.
>
> My plan is to only add functions which are truly worthless.

OK.

> Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> but I think it's silly for it to be. I would never add it to this list.

A tangent, but is that because they want you to use memmove()
instead so that you do not have to worry about overlapping copies,
perhaps?



Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:

> Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> but I think it's silly for it to be. I would never add it to this list.

I forgot my footnote, which was going to be:

  I'm bringing up that list not because I think it's necessarily a good
  list, but because it's _a_ list. And as I was recently subjected to an
  audit that referenced it, I've been thinking a lot about ban-lists and
  whether they are useful (and specifically for which functions).

  It's at https://msdn.microsoft.com/en-us/library/bb288454.aspx if
  you're curious, but again, that is absolutely not the ban-list I am
  working towards. To what I posted already, I'd probably add strcat()
  and vsprintf() based on discussions here, and then call it done.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 03:46:08PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > For enforcement, we can rely on the compiler and just
> > introduce code which breaks compilation when it sees these
> > functions. This has a few advantages:
> >
> >   1. We know it's run as part of a build cycle, so it's
> >  hard to ignore. Whereas an external linter is an extra
> >  step the developer needs to remember to do.
> >
> >   2. Likewise, it's basically free since the compiler is
> >  parsing the code anyway.
> >
> >   3. We know it's robust against false positives (unlike a
> >  grep-based linter).
> >
> > The one big disadvantage is that it will only check code
> > that is actually compiled, so it may miss code that isn't
> > triggered on your particular system. But since presumably
> > people don't add new code without compiling it (and if they
> > do, the banned function list is the least of their worries),
> > we really only care about failing to clean up old code when
> > adding new functions to the list. And that's easy enough to
> > address with a manual audit when adding a new function
> > (which is what I did for the functions here).
> >
> > That leaves only the question of how to trigger the
> > compilation error. The goals are:
> 
> I actually have another question, though.
> 
> Is it a downside that it is cumbersome to override?  This is not a
> rhetorical question.  I am not convinced there will not be a legit
> circumstance that calling strcpy (or whatever we are going to ban)
> is the best solution and it is safe.  By "best", what I mean is "you
> could instead use memcpy/strncpy/whatever" can legitimately be
> argued with "but still using memcpy/strncpy/whatever is inferior
> than using strcpy in this case for such and such reasons".

In my opinion, no, this is not a problem.

My plan is to only add functions which are truly worthless. So with
strcpy(), for example, you _can_ make sure the buffer is big enough
before calling strcpy. But to do so, you by definition must have just
called strlen(), at which point you may as well use memcpy(). It's more
obviously correct, and it's more efficient.

Ditto for sprintf, where you should _always_ be using at least xsnprintf
(or some better tool, depending on the situation).  And for strncpy,
strlcpy (or again, some better tool) is strictly an improvement.  If
these were our functions, I would literally delete them from our
codebase. This is the moral equivalent. ;)

Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
but I think it's silly for it to be. I would never add it to this list.

Likewise snprintf(). That is a function that _can_ be dangerous due to
unexpected truncation, and I think we should avoid it in general. But I
would not ban it, as it is the correct tool in a few situations (you
have a fixed buffer and returning a truncation error code is fine --
many of our syscall wrappers are in this exact situation). So I would
probably not add it to the ban list (but I feel less strongly than I do
about memcpy).

The nuclear option for overriding is "#undef that-function" in a
particular instance. That covers the rest of the translation unit, but I
think that's probably fine. If we have a function which is mostly
questionable, but we need it sometimes, then we ought to be putting a
sane wrapper around it. And _that_ wrapper can sit in its own file and
#undef as it pleases, keeping the unsafety contained within.

Don't get me wrong, if you have another suggestion that allows easier
one-off overrides, I'm happy to hear it. But I think building this into
the compilation step and having it on-by-default is a huge advantage in
simplicity and having people remember to run it. And I couldn't come up
with a better way to do that.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 05:59:47PM -0400, Eric Sunshine wrote:

> > The one I brainstormed (but forgot to mention) is that it might be
> > possible for a platform to have strcpy as a macro already? In which case
> > we'd need to #undef it or risk a compilation error (even if the macro
> > isn't actually used).
> 
> I have some recollection (perhaps long outdated or just wrong) of
> Microsoft headers spewing deprecation warnings about "unsafe"
> functions. I don't know whether they did that by covering functions
> with macros or by decorating the function with a deprecation attribute
> or by some other mechanism, but such concern seems well-founded.
> #undef'ing them might indeed be a very good preventative tactic.

Yeah, these functions are definitely on their "SDL banned list". I don't
know how they implement that. At that level, I'd really expect it to be
done with a deprecated attribute next to the declaration (I also
considered trying to add deprecation attributes, too, but I think it's
hard to do without re-declaring the function, and anyway it's "just" a
warning).

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 02:47:35PM -0700, Stefan Beller wrote:

> > But that may be overly paranoid.  Once upon a time there was some rules
> > lawyering around CodingGuidelines, but I think that was successfully
> > shut down and hasn't reared its head for several years.
> 
> Heh; My preference would be to keep docs as short and concise as
> possible (and lawyering sounds like "verbosity is a feature" :-P) but
> still giving advice on controversial (i.e. not enforced by a machine in
> a deterministic fashion) things such as having braces around one
> statement for example or leaving them out.

I think we literally had somebody say "I don't have to abide by this in
a review because it wasn't in CodingGuidelines." But then, that is
perhaps indicative of other problems.

> So maybe I would have rather asked, why we start out with these two
> functions. And you seem to call them "obviously bad", and you take both
> of them because they need to be handled differently due to the variadic 
> macros.
> (And those two are "obviously worse" than strcat, as they are used more often.
> But strcat, being on MS ban list[1], would do just as fine)

Ooh, strcat is another one that should be added.

I actually thought about splitting it into three commits (introduce
mechanism, then one per function), but it felt like stringing it out.
You are probably right, though, that each function deserves its own
explanation. And the commit message is already quite long.

> >   We'll include strcpy() and sprintf() in the initial list of banned
> >   functions. While these _can_ be used carefully by surrounding them
> >   with extra code, there's no inherent check that the size of the
> >   destination buffer is big enough, it's very easy to overflow the
> >   buffer.
> 
> Sounds good to me, maybe even add "We've had problems with that
> in the past, see 'git log -S strcpy'", but that may be too much again.

Actually, that's a good point. We've had actual bugs due strcpy(). I can
similarly point to bad uses of strcat().

I think I have sufficient fodder for a re-roll along these lines
(assuming we like the idea at all; Junio seemed to have some
reservations, but I'll reply there separately).

-Peff


RE: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Randall S. Becker
On July 19, 2018 6:46 PM, Junio wrote:
> Jeff King  writes:
> 
> > For enforcement, we can rely on the compiler and just introduce code
> > which breaks compilation when it sees these functions. This has a few
> > advantages:
> >
> >   1. We know it's run as part of a build cycle, so it's
> >  hard to ignore. Whereas an external linter is an extra
> >  step the developer needs to remember to do.
> >
> >   2. Likewise, it's basically free since the compiler is
> >  parsing the code anyway.
> >
> >   3. We know it's robust against false positives (unlike a
> >  grep-based linter).
> >
> > The one big disadvantage is that it will only check code that is
> > actually compiled, so it may miss code that isn't triggered on your
> > particular system. But since presumably people don't add new code
> > without compiling it (and if they do, the banned function list is the
> > least of their worries), we really only care about failing to clean up
> > old code when adding new functions to the list. And that's easy enough
> > to address with a manual audit when adding a new function (which is
> > what I did for the functions here).
> >
> > That leaves only the question of how to trigger the compilation error.
> > The goals are:
> 
> I actually have another question, though.
> 
> Is it a downside that it is cumbersome to override?  This is not a
rhetorical
> question.  I am not convinced there will not be a legit circumstance that
> calling strcpy (or whatever we are going to ban) is the best solution and
it is
> safe.  By "best", what I mean is "you could instead use
> memcpy/strncpy/whatever" can legitimately be argued with "but still using
> memcpy/strncpy/whatever is inferior than using strcpy in this case for
such
> and such reasons".

Putting on my old-guy compiler hat, this sounds like a more complex activity
that something akin to lint might be useful at handling. Having a
post-processor that searches for offending functions but also supports
annotations explaining exceptions (why you really had to use strncpy because
the NULL was hiding in a bad place and you promise to fix it), might be
useful. Personally, I'd rather know that that code compiles first and then
violates rules that I can fix following basic prototyping than getting
yelled at up front - but that's just me. I can't suggest a good thing to
handle this, short of augmenting lint, and if we were in java, annotations
would be the way to go, but this seems like a problem that other products
have solved.

Cheers,
Randall

-- Brief whoami:
  NonStop developer since approximately NonStop(2112884442)
  UNIX developer since approximately 421664400
-- In my real life, I talk too much.





Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Junio C Hamano
Jeff King  writes:

> For enforcement, we can rely on the compiler and just
> introduce code which breaks compilation when it sees these
> functions. This has a few advantages:
>
>   1. We know it's run as part of a build cycle, so it's
>  hard to ignore. Whereas an external linter is an extra
>  step the developer needs to remember to do.
>
>   2. Likewise, it's basically free since the compiler is
>  parsing the code anyway.
>
>   3. We know it's robust against false positives (unlike a
>  grep-based linter).
>
> The one big disadvantage is that it will only check code
> that is actually compiled, so it may miss code that isn't
> triggered on your particular system. But since presumably
> people don't add new code without compiling it (and if they
> do, the banned function list is the least of their worries),
> we really only care about failing to clean up old code when
> adding new functions to the list. And that's easy enough to
> address with a manual audit when adding a new function
> (which is what I did for the functions here).
>
> That leaves only the question of how to trigger the
> compilation error. The goals are:

I actually have another question, though.

Is it a downside that it is cumbersome to override?  This is not a
rhetorical question.  I am not convinced there will not be a legit
circumstance that calling strcpy (or whatever we are going to ban)
is the best solution and it is safe.  By "best", what I mean is "you
could instead use memcpy/strncpy/whatever" can legitimately be
argued with "but still using memcpy/strncpy/whatever is inferior
than using strcpy in this case for such and such reasons".



Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Eric Sunshine
On Thu, Jul 19, 2018 at 5:27 PM Jeff King  wrote:
> On Thu, Jul 19, 2018 at 05:11:15PM -0400, Eric Sunshine wrote:
> > On Thu, Jul 19, 2018 at 4:39 PM Jeff King  wrote:
> > > + * This header lists functions that have been banned from our code base,
> > > + * because they're too easy to misuse (and even if used correctly,
> > > + * complicate audits). Including this header turns them into compile-time
> > > + * errors.
> >
> > When the above talks about "including this header", the implication is
> > that it must be included _after_ the system header(s) which declare
> > the banned functions. I wonder if that requirement should be stated
> > here explicitly.
>
> Hmm, does it need to be? I had originally intended it to be included
> before, actually, though in the end I put it later.
> I guess it would yield declarations like strcpy_is_banned(), which would
> cause _different_ errors (probably link-time ones).

Yes, that's what I meant. You'd only get link-time errors if banned.h
was included before the system headers (assuming I'm thinking about
this correctly).

> > (Probably not worth a re-roll.)
>
> Yeah, I doubt it matters much either way, since the inclusion is done
> automatically in git-compat-util.h.

Exactly.

> I had also originally imagined this to be triggered via DEVELOPER=1,
> with something like "-include banned.h" in CFLAGS. But I think it
> probably is appropriate for everybody to run it, since it shouldn't
> cause any false positives or other compilation issues.

Agreed.

> The one I brainstormed (but forgot to mention) is that it might be
> possible for a platform to have strcpy as a macro already? In which case
> we'd need to #undef it or risk a compilation error (even if the macro
> isn't actually used).

I have some recollection (perhaps long outdated or just wrong) of
Microsoft headers spewing deprecation warnings about "unsafe"
functions. I don't know whether they did that by covering functions
with macros or by decorating the function with a deprecation attribute
or by some other mechanism, but such concern seems well-founded.
#undef'ing them might indeed be a very good preventative tactic.


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 2:32 PM Jeff King  wrote:
>
> On Thu, Jul 19, 2018 at 02:15:54PM -0700, Stefan Beller wrote:
>
> > >  Documentation/CodingGuidelines |  3 +++
> >
> > I'd prefer to not add more text to our documentation
> > (It is already long and in case you run into this problem
> > the error message is clear enough IMHO)
>
> I'm fine with that too. I just wondered if somebody would complain in
> the opposite way: your patch enforces this, but we never made it an
> "official" guideline.
>
> But that may be overly paranoid.  Once upon a time there was some rules
> lawyering around CodingGuidelines, but I think that was successfully
> shut down and hasn't reared its head for several years.

Heh; My preference would be to keep docs as short and concise as
possible (and lawyering sounds like "verbosity is a feature" :-P) but
still giving advice on controversial (i.e. not enforced by a machine in
a deterministic fashion) things such as having braces around one
statement for example or leaving them out.

> > Regarding the introduction of the functions to this list,
> > I would imagine people would find the commit that introduced
> > a function to the ban list to look for a reason why.
> > Can we include a link[1] to explain why we discourage
> > strcpy and sprintf in this commit?
>
> I hoped that it was mostly common knowledge, but that's probably not a
> good assumption. I agree if there's a good reference, we should link to
> it.

In this case I am just an advocate for a better history. Countless
times I wanted
to know *why* something is the way it is and mailing list and history are not
very good at that, as my specific question was not addressed there.

Either it was obvious to all people involved at the time or not written down
why that solution (which -- in hindsight -- sucks), was superior to the other
solution (which may or may not have been known at the time).

So maybe I would have rather asked, why we start out with these two
functions. And you seem to call them "obviously bad", and you take both
of them because they need to be handled differently due to the variadic macros.
(And those two are "obviously worse" than strcat, as they are used more often.
But strcat, being on MS ban list[1], would do just as fine)

(I agree that) Any other function can be added on top with its own
commit message on *why* that function. Over time I would expect that the
reason is that we got bitten by it, or some other project got famously bitten
by it or that some smart people came up with a list[1]. The exception is
this one, which basically says: "Here is the mechanism how to do it and
$X is the obvious thing to put in first", which I agree with the mechanism
as well as that $X seems bad.

[1] https://msdn.microsoft.com/en-us/library/bb288454.aspx

Now that I am deep down the rathole of discussing a tangent, I just found
[2], when searching for "how much common knowledge is wrong", do you
know if there is such a list for (CS) security related things?

[2] http://www.marcandangel.com/2008/06/12/60-popular-pieces-of-false-knowledge/

>
> > [1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
> >   This is the best I found. So not sure if it worth it.
>
> Yeah, this one is so-so, because it goes into a lot more detail. I think
> we can assume that people know that overflowing a buffer is bad. Maybe
> just a short paragraph like:
>
>   We'll include strcpy() and sprintf() in the initial list of banned
>   functions. While these _can_ be used carefully by surrounding them
>   with extra code, there's no inherent check that the size of the
>   destination buffer is big enough, it's very easy to overflow the
>   buffer.

Sounds good to me, maybe even add "We've had problems with that
in the past, see 'git log -S strcpy'", but that may be too much again.

Thanks,
Stefan


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 02:15:54PM -0700, Stefan Beller wrote:

> >  Documentation/CodingGuidelines |  3 +++
> 
> I'd prefer to not add more text to our documentation
> (It is already long and in case you run into this problem
> the error message is clear enough IMHO)

I'm fine with that too. I just wondered if somebody would complain in
the opposite way: your patch enforces this, but we never made it an
"official" guideline.

But that may be overly paranoid.  Once upon a time there was some rules
lawyering around CodingGuidelines, but I think that was successfully
shut down and hasn't reared its head for several years.

> > +#define strcpy(x,y) BANNED(strcpy)
> > +
> > +#ifdef HAVE_VARIADIC_MACROS
> 
> In a split second I thought you forgot sprintf that was
> mentioned in the commit message, but then I kept on reading
> just to find it here. I wonder if we want put this #ifdef at the
> beginning of the file (after the guard) as then we can have
> a uncluttered list of banned functions here. The downside is that
> the use of strcpy would not be banned in case you have no
> variadic macros, but we'd still catch it quickly as most people
> have them. Undecided.

Right, exactly. We should catch what we can on lesser platforms, and
everything on modern ones. My hope is that people would not generally
have to touch this file. I don't think we'll be adding banned functions
often.

> Regarding the introduction of the functions to this list,
> I would imagine people would find the commit that introduced
> a function to the ban list to look for a reason why.
> Can we include a link[1] to explain why we discourage
> strcpy and sprintf in this commit?

I hoped that it was mostly common knowledge, but that's probably not a
good assumption. I agree if there's a good reference, we should link to
it.

> [1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
>   This is the best I found. So not sure if it worth it.

Yeah, this one is so-so, because it goes into a lot more detail. I think
we can assume that people know that overflowing a buffer is bad. Maybe
just a short paragraph like:

  We'll include strcpy() and sprintf() in the initial list of banned
  functions. While these _can_ be used carefully by surrounding them
  with extra code, there's no inherent check that the size of the
  destination buffer is big enough, it's very easy to overflow the
  buffer.

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Jeff King
On Thu, Jul 19, 2018 at 05:11:15PM -0400, Eric Sunshine wrote:

> On Thu, Jul 19, 2018 at 4:39 PM Jeff King  wrote:
> > [...]
> > Let's start by banning strcpy() and sprintf(). It's not
> > impossible to use these correctly, but it's easy to do so
> > incorrectly, and there's always a better option.
> > [...]
> > Signed-off-by: Jeff King 
> > ---
> > diff --git a/banned.h b/banned.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + * This header lists functions that have been banned from our code base,
> > + * because they're too easy to misuse (and even if used correctly,
> > + * complicate audits). Including this header turns them into compile-time
> > + * errors.
> > + */
> 
> When the above talks about "including this header", the implication is
> that it must be included _after_ the system header(s) which declare
> the banned functions. I wonder if that requirement should be stated
> here explicitly.

Hmm, does it need to be? I had originally intended it to be included
before, actually, though in the end I put it later.

I guess it would yield declarations like strcpy_is_banned(), which would
cause _different_ errors (probably link-time ones).

> (Probably not worth a re-roll.)

Yeah, I doubt it matters much either way, since the inclusion is done
automatically in git-compat-util.h.

I had also originally imagined this to be triggered via DEVELOPER=1,
with something like "-include banned.h" in CFLAGS. But I think it
probably is appropriate for everybody to run it, since it shouldn't
cause any false positives or other compilation issues.

The one I brainstormed (but forgot to mention) is that it might be
possible for a platform to have strcpy as a macro already? In which case
we'd need to #undef it or risk a compilation error (even if the macro
isn't actually used).

-Peff


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Stefan Beller
On Thu, Jul 19, 2018 at 1:39 PM Jeff King  wrote:
>
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.

Thanks for writing these patches. I would think this is a
good idea to have as I certainly would use banned functions
if not told by the compiler not to.

>  Documentation/CodingGuidelines |  3 +++

I'd prefer to not add more text to our documentation
(It is already long and in case you run into this problem
the error message is clear enough IMHO)


> @@ -0,0 +1,19 @@
> +#ifndef BANNED_H
> +#define BANNED_H
> +
> +/*
> + * This header lists functions that have been banned from our code base,
> + * because they're too easy to misuse (and even if used correctly,
> + * complicate audits). Including this header turns them into compile-time
> + * errors.
> + */
> +
> +#define BANNED(func) sorry_##func##_is_a_banned_function()
> +
> +#define strcpy(x,y) BANNED(strcpy)
> +
> +#ifdef HAVE_VARIADIC_MACROS

In a split second I thought you forgot sprintf that was
mentioned in the commit message, but then I kept on reading
just to find it here. I wonder if we want put this #ifdef at the
beginning of the file (after the guard) as then we can have
a uncluttered list of banned functions here. The downside is that
the use of strcpy would not be banned in case you have no
variadic macros, but we'd still catch it quickly as most people
have them. Undecided.

Regarding the introduction of the functions to this list,
I would imagine people would find the commit that introduced
a function to the ban list to look for a reason why.
Can we include a link[1] to explain why we discourage
strcpy and sprintf in this commit?


[1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
  This is the best I found. So not sure if it worth it.

Thanks,
Stefan


Re: [PATCH 1/2] introduce "banned function" list

2018-07-19 Thread Eric Sunshine
On Thu, Jul 19, 2018 at 4:39 PM Jeff King  wrote:
> [...]
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.
> [...]
> Signed-off-by: Jeff King 
> ---
> diff --git a/banned.h b/banned.h
> @@ -0,0 +1,19 @@
> +/*
> + * This header lists functions that have been banned from our code base,
> + * because they're too easy to misuse (and even if used correctly,
> + * complicate audits). Including this header turns them into compile-time
> + * errors.
> + */

When the above talks about "including this header", the implication is
that it must be included _after_ the system header(s) which declare
the banned functions. I wonder if that requirement should be stated
here explicitly.

(Probably not worth a re-roll.)

> +#define BANNED(func) sorry_##func##_is_a_banned_function()
> +
> +#define strcpy(x,y) BANNED(strcpy)
> diff --git a/git-compat-util.h b/git-compat-util.h
> @@ -1239,4 +1239,6 @@ extern void unleak_memory(const void *ptr, size_t len);
>  #define UNLEAK(var) do {} while (0)
>  #endif
>
> +#include "banned.h"