Re: OT: Open letter to the Linux World

Alexander Holler Sat, 06 Sep 2014 13:03:12 -0700

On 05.09.2014 08:31, Alexander Holler wrote:

On 04.09.2014 21:18, Rob Landley wrote:

What's actually wrong with C++ at a language design level.

Short version:


OMG.

It's better than C. In almost every aspect. Stop. Nothing else. Of
course, if you want to write something like systemd in Python, Perl,
Pascal, Modula or Erlang, feel free to do so. And if you want more security
bugs, feel free to still use C for string handling instead of
std::string. Or still write your own sorted list for every structure (or
just don't and go the slow way, because you can't find the time to do it
right in C). And ...

You don't have to understand how templates work to use e.g.
std::string. Other people do the hard stuff for you. So don't panic.

I've brought up the criticism of using C in a critical and very security-sensitive piece of software in userland, so I've decided a bit more explanation might make sense.

First, as you don't seem to have noticed, or don't know, or ignore the difference, let me repeat that this thread is about a piece of software which runs in userland. So, please, keep away any comments from Linus where he talks about kernelspace. I'm pretty sure he knows the difference.

Now let me bring up a very small piece of code which you can find in a similar fashion in almost every piece of software which comes into contact with strings. And not just in one place or function, but in dozens or even hundreds of places (inline, not in functions) in one project.


First in C++:

void foo_bar(const std::string& foo, const std::string& bar, std::string& foobar)
{
        foobar = foo + bar;
}

For those who don't know C++: this concatenates the two strings named foo and bar and puts the result into foobar.
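To make the calling side concrete, here is a minimal sketch of how such a function might be used. The caller example_usage() and the string values are made up for illustration; only foo_bar() itself comes from the example above.

```cpp
#include <cassert>
#include <string>

// The function from the example above.
void foo_bar(const std::string& foo, const std::string& bar, std::string& foobar)
{
        foobar = foo + bar;
}

// Hypothetical caller, for illustration only.
void example_usage()
{
        std::string result;

        // String literals are converted to temporary std::string objects.
        foo_bar("system", "d", result);
        assert(result == "systemd");

        // std::string tracks its own length, so there is no size to compute
        // and no terminating null byte to forget.
        assert(result.size() == 7);
}
```

The caller neither allocates nor frees anything; result cleans up after itself when it goes out of scope.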


Now an example of how you would have to do that in C:

char *foo_bar(const char *foo, const char *bar)
{
        char *foobar = malloc(strlen(foo) + strlen(bar));

        strcpy(foobar, foo);
        strcat(foobar, bar);

        return foobar;
}

Do you see the difference and spot all the problems?

At first I thought about not posting the answer to see the responses, but that would just have ended up with a lot of people calling me a fool and/or assuming I can't write proper C. And it carries the risk that some inexperienced people might copy, paste and use it.


So first of all: THE ABOVE EXAMPLE IN C IS BROKEN.

The very first problem is that foobar is allocated with the wrong size, because the allocation doesn't account for the terminating null byte. A very common problem, already found in countless places.


But there are several more problems:

- What happens if foo or bar isn't terminated with a null byte?

- What happens if malloc fails?

- Who is the owner of foo, bar and/or foobar? Does the caller still own foo and bar afterwards? Will the caller own foobar? (That is, who is responsible for freeing foo, bar and foobar once they are no longer used?)


So now we extend the above C example:

char *foo_bar(const char *foo, const char *bar)
{
        char *foobar;

        if (!foo || !bar)
                return NULL;

        foobar = malloc(strlen(foo) + strlen(bar) + 1);

        if (!foobar)
                return NULL;

        strcpy(foobar, foo);
        strcat(foobar, bar);

        return foobar;
}

This still has some problems. First, the caller has to check whether foo_bar() has returned NULL. Forgetting that check is a very common bug, already found in countless places too.

Next, there is still the unsolvable problem of what happens if foo or bar isn't terminated with a null byte (in other words, they aren't C strings). So you have to check all callers up to the source of foo and bar to be sure the program doesn't crash in the possibly far, far away place called foo_bar().
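For comparison, a std::string carries its length with it, so a buffer without a terminating null byte can be turned into a string as long as its length is known. A minimal sketch, where from_buffer() and example_unterminated() are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a buffer that is NOT
// null-terminated. The length is passed explicitly instead of being
// discovered by scanning for a null byte.
std::string from_buffer(const char *data, std::size_t len)
{
        return std::string(data, len);
}

// Hypothetical caller, for illustration only.
void example_unterminated()
{
        const char raw[3] = {'f', 'o', 'o'};  // no terminating null byte

        std::string foo = from_buffer(raw, sizeof(raw));
        assert(foo.size() == 3);

        // From here on nobody has to care about termination anymore.
        assert(foo + "bar" == "foobar");
}
```

This does not make a wrong length harmless, but the length has to be stated once, at the source of the data, instead of being assumed in every function that later touches the string.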

And still nothing about ownership. Someone who just looks at the prototype, or sees a call of foo_bar() somewhere, has no idea about the ownership of foo, bar and the returned foobar unless a comment says so.
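If one has to stay with malloc()ed C strings, C++ can at least put the ownership into the prototype. The following is only a sketch under that assumption; FreeDeleter, OwnedCString and foo_bar_owned() are made-up names and not part of any existing code:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter that releases malloc()ed memory with free().
struct FreeDeleter {
        void operator()(char *p) const { std::free(p); }
};

// Hypothetical alias: the return type itself says "the caller owns this".
using OwnedCString = std::unique_ptr<char, FreeDeleter>;

// Same logic as the extended C example, but the result frees itself.
OwnedCString foo_bar_owned(const char *foo, const char *bar)
{
        if (!foo || !bar)
                return nullptr;

        OwnedCString foobar(static_cast<char *>(
                std::malloc(std::strlen(foo) + std::strlen(bar) + 1)));

        if (!foobar)
                return nullptr;

        std::strcpy(foobar.get(), foo);
        std::strcat(foobar.get(), bar);

        return foobar;
}

// Hypothetical caller, for illustration only.
void example_ownership()
{
        OwnedCString s = foo_bar_owned("foo", "bar");
        assert(s && std::strcmp(s.get(), "foobar") == 0);
        // No free() needed: s releases the memory when it goes out of scope.
}
```

The caller still has to check for a null result, and foo and bar still have to be proper C strings, so this only answers the ownership question, not the others.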

So just this very simple bit of string handling in C already leaves several questions open and is 17 lines long, which have to be reviewed very carefully (e.g. to not miss the off-by-one bug). Compare this with the 4 lines in C++, which are almost impossible to get wrong or to use wrongly.

And, again, this thread is about a piece of software which runs with process ID 1, wants to control the whole system and has all the permissions to modify the system in almost every possible way. It doesn't run as some user with restricted permissions or in a chroot or something similar. Some parts might, but for sure not all (read the above "far, far away" again).


And now some stats. I've just checked out systemd:

git grep -E "strcat|strncat|strcpy|strncpy|strlen" | wc -l
570

git grep -E "strcat|strncat|strcpy|strncpy|memcpy|strlen" | wc -l
850

OK, not every one of those places might be part of PID 1. And several places are trivial calls like strlen("ATTR"), but it gives an idea of how many places exist in systemd which might contain a problem which isn't trivial to spot.

And regardless of how clever and experienced the people writing this piece of software are, everyone is prone to making e.g. such an off-by-one bug. Maybe he writes the piece of code after having worked 12 hours, maybe he got interrupted while writing the code and continued a day later, maybe it's full moon or maybe his last meal wasn't what it should have been.


Whatever.

This means every piece of code in that software has to be reviewed multiple times (reviewers aren't perfect either), and, as long as the software changes, every piece which changes has to be reviewed multiple times again.

I could continue with examples for lists, sets and similar data structures, which have to be invented again and again and again in C, whereas in C++ people can reuse code which is already in use by many, many people in many, many other projects.
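As a small illustration of that reuse, a sorted collection without duplicates takes a few lines with std::set. The helper name sorted_unique() and the sample values in example_sorted() are invented for this sketch:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the distinct names in sorted order.
// std::set keeps its elements ordered and drops duplicates on insert,
// so no hand-written sorted list is needed.
std::vector<std::string> sorted_unique(const std::vector<std::string>& names)
{
        std::set<std::string> ordered(names.begin(), names.end());

        return std::vector<std::string>(ordered.begin(), ordered.end());
}

// Hypothetical caller, for illustration only.
void example_sorted()
{
        std::vector<std::string> in;
        in.push_back("udev");
        in.push_back("journal");
        in.push_back("udev");

        std::vector<std::string> out = sorted_unique(in);
        assert(out.size() == 2);
        assert(out[0] == "journal");
        assert(out[1] == "udev");
}
```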

And to come to your argument about how simple everything in C is: just look at the macro container_of(). I wouldn't say it's so simple that everyone understands what it does. And it's just part of the Linux kernel, which means limited documentation, and many people have never heard of it before, compared with stuff which can be found in standard libraries.

And, again, this is not about kernelspace. It isn't the Linux kernel, where Linus has managed to organize an army of people who look at every line again and again (and still sometimes miss a bug). Most software projects don't have as many resources (human or not) available as the Linux kernel. In fact, it's an absolute exception.

So you just don't want to use error-prone C (unless really necessary) in new and non-trivial projects where it is a major problem if something fails in the code.

Doing so just means nothing has been learned from the (of course relatively short) history of software development.
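For readers who have never seen it, here is a simplified sketch of the idea behind container_of(): given a pointer to a member, compute the address of the structure that contains it. This is not the kernel's actual definition (which relies on GNU C extensions); the macro name container_of_sketch and the structs list_node and device are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simplified sketch: subtract the member's offset inside 'type' from the
// member pointer to get back to the start of the enclosing object.
#define container_of_sketch(ptr, type, member) \
        (reinterpret_cast<type *>( \
                reinterpret_cast<char *>(ptr) - offsetof(type, member)))

// Hypothetical intrusive list node, embedded in other structures.
struct list_node {
        list_node *next;
};

// Hypothetical structure which embeds a list node.
struct device {
        int id;
        list_node node;
};

// Recover the device from a pointer to its embedded node.
device *device_from_node(list_node *n)
{
        return container_of_sketch(n, device, node);
}

// Hypothetical caller, for illustration only.
void example_container_of()
{
        device d = {42, {nullptr}};

        assert(device_from_node(&d.node) == &d);
        assert(device_from_node(&d.node)->id == 42);
}
```

Nothing stops a caller from passing a list_node which isn't embedded in a device; the compiler accepts it and the result is garbage.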


Alexander Holler

PS: Please don't try to tell me that even the above C++ example ends up as code similar to the C code. std::string is used by even more people than review the Linux kernel code. Besides that, it was designed and reviewed by clever people too. And, just to repeat it again, we are talking about userspace, not kernelspace.


