On Fri, Jun 21, 2024 at 10:45:01AM -0400, Greg Reagle wrote:
> Basically what you might call "safety and ease issues."

I won't comment on the other languages mentioned because my view on
nearly all of them is negative. But most of my "safety and ease of use"
with C I've been able to solve by just building better "primitives"
instead of changing language - which comes with significant cost such as
having to adapt to new tooling, new ecosystem (which are almost always
significantly less mature than C) and so on.

Below are two of the most impactful changes. I was going to elaborate on
these but it was getting *really* long very quickly so I've kept it to a
brief overview followed by a real-world project that demonstrates these
techniques.

(Do note that this "style" of C programming is quite different to the
suckless one, so I wouldn't be surprised if it generates some
"backlash".)

Sized strings
-------------

        typedef struct {
                uint8_t   *s;   // or you can use `(unsigned) char *`
                ptrdiff_t  len; // or you can use `size_t`, see explanation below
        } Str;

* Cheap access to the string length which allows expressing string
  operations much more naturally (read: less error prone).
* Zero copy sub-string. This is a huge one, because it avoids many
  spurious copies and/or allocations.
* Unifies "strings" and "memory blobs", and so the various string
  functions you will build can be reused on memory blobs with embedded
  nul bytes.
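
To make the points above concrete, here is a minimal sketch of a few
primitives built on `Str`. The names `S`, `str_slice`, `str_equal`,
`str_cut` and `Cut` are made up for illustration and are not taken from
any particular library. Note that nothing here copies or allocates, and
embedded nul bytes are handled like any other byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t   *s;
    ptrdiff_t  len;
} Str;

// Wrap a string literal; the length comes from sizeof, not strlen.
#define S(lit) ((Str){ (uint8_t *)(lit), (ptrdiff_t)sizeof(lit) - 1 })

// Zero-copy sub-string covering [beg, end), clamped to the valid range.
static Str str_slice(Str s, ptrdiff_t beg, ptrdiff_t end)
{
    if (end > s.len) end = s.len;
    if (end < 0)     end = 0;
    if (beg < 0)     beg = 0;
    if (beg > end)   beg = end;
    return (Str){ s.s + beg, end - beg };
}

// Byte-wise equality; works on blobs with embedded nul bytes too.
static int str_equal(Str a, Str b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.s, b.s, (size_t)a.len) == 0);
}

typedef struct {
    Str head;   // everything before the delimiter
    Str tail;   // everything after the delimiter
    int found;  // non-zero if the delimiter was present
} Cut;

// Split `s` at the first `delim`. Both halves point into `s`.
static Cut str_cut(Str s, uint8_t delim)
{
    for (ptrdiff_t i = 0; i < s.len; i++) {
        if (s.s[i] == delim) {
            return (Cut){ str_slice(s, 0, i), str_slice(s, i + 1, s.len), 1 };
        }
    }
    return (Cut){ s, str_slice(s, s.len, s.len), 0 };
}
```

With these, `str_cut(S("key=value"), '=')` yields `key` and `value` as
two views into the original bytes, with no temporary buffer involved.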

The most important thing to note here is that there is no "capacity"
member. The string does not - and should not - be tangled with
allocation details as that will be handled below.

(I've also mostly been using signed sizes as they get rid of a lot of
unnecessary signed vs unsigned mixing and have more intuitive behavior
around 0. This is the same reason why Go uses signed sizes:
https://github.com/golang/go/issues/27460#issuecomment-418203686)
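
A small sketch of the "behavior around 0" point, using a hypothetical
helper `last_index`. With a signed index the backwards loop below is
correct as written. With `size_t`, `len - 1` would wrap around to a huge
value when `len` is 0, and `i >= 0` would always be true.

```c
#include <stddef.h>

// Index of the last occurrence of `c` in buf[0..len), or -1 if absent.
static ptrdiff_t last_index(const unsigned char *buf, ptrdiff_t len, unsigned char c)
{
    for (ptrdiff_t i = len - 1; i >= 0; i--) {
        if (buf[i] == c) {
            return i;
        }
    }
    return -1;
}
```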


Region based memory management
------------------------------

The idea is very simple: instead of managing each allocation
individually, you group allocations by their lifetime.
https://en.wikipedia.org/wiki/Region-based_memory_management

E.g: if you build a tree where each node is `malloc`-ed then to free the
tree, you'd need to traverse it and free everything individually. With
a region-based scheme, you'd simply free the entire region. No traversal
needed.

        // allocating node on a region
        for (...) {
                Node *np = alloc_node(&tree_arena);
                // ...
        }
        // free-ing the entire tree
        free_arena(&tree_arena);
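
For reference, a minimal sketch of what could sit behind `alloc_node` and
`free_arena`. Everything here (the `Arena` layout, `arena_init`,
`arena_alloc`, the `alloc` macro, `tree_insert`) is an assumption made
for illustration: a simple bump allocator over a single `malloc`-ed
block. Real implementations differ in how they obtain and grow memory.
It uses C11 `_Alignof` and assumes `arena_init` succeeded before any
allocation is made.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *base;  // start of the backing block (kept for free_arena)
    uint8_t *beg;   // next free byte
    uint8_t *end;   // one past the last usable byte
} Arena;

// Returns non-zero on success.
static int arena_init(Arena *a, ptrdiff_t cap)
{
    a->base = malloc((size_t)cap);
    a->beg  = a->base;
    a->end  = a->base ? a->base + cap : NULL;
    return a->base != NULL;
}

// Bump-allocate `count` zeroed objects; `align` must be a power of two.
// Returns NULL when the arena is exhausted.
static void *arena_alloc(Arena *a, ptrdiff_t size, ptrdiff_t align, ptrdiff_t count)
{
    ptrdiff_t pad   = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    ptrdiff_t avail = a->end - a->beg - pad;
    if (size <= 0 || count < 0 || avail < 0 || count > avail / size) {
        return NULL;
    }
    void *p = a->beg + pad;
    a->beg += pad + size * count;
    return memset(p, 0, (size_t)(size * count));
}

#define alloc(a, T, n) ((T *)arena_alloc((a), sizeof(T), _Alignof(T), (n)))

// Releases every allocation made from the arena in one call.
static void free_arena(Arena *a)
{
    free(a->base);
    a->base = a->beg = a->end = NULL;
}

typedef struct Node Node;
struct Node {
    Node *left, *right;
    int   key;
};

static Node *alloc_node(Arena *a)
{
    return alloc(a, Node, 1);
}

// Insert `key` into a binary search tree whose nodes all live in `a`.
static Node *tree_insert(Node *root, int key, Arena *a)
{
    Node **pp = &root;
    while (*pp) {
        pp = key < (*pp)->key ? &(*pp)->left : &(*pp)->right;
    }
    Node *np = alloc_node(a);
    if (np) {
        np->key = key;
        *pp = np;
    }
    return root;
}
```

Note that there is no `free_node` at all: the nodes have no individual
lifetime, only the tree as a whole does.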

The tree example is a bit niche but this scheme is much more
impactful for temporary allocations. You can make many of them and then
discard them all in one call.

        checkpoint = arena_snapshot(&scratch);
        int *p = alloc(&scratch, int, 32);
        int *q = alloc(&scratch, int, 64);
        float *v = NULL;
        if (rand() & 0x1) v = alloc(&scratch, float, 128); // silly demo
        // do "work"
        // ...
        // frees all allocations made inside of `scratch` since
        // `checkpoint` was captured
        arena_reset(&scratch, checkpoint);
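
A minimal sketch of how `arena_snapshot` and `arena_reset` could work,
assuming a bump allocator where the whole allocation state is a single
pointer. The `Arena` layout, the `Checkpoint` type and the `sum_squares`
demo are hypothetical. Under that assumption a snapshot is just a copy
of the pointer and a reset just restores it.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *beg;  // next free byte
    uint8_t *end;  // one past the last usable byte
} Arena;

typedef uint8_t *Checkpoint;

// Bump-allocate `count` objects (uninitialized); `align` must be a
// power of two. Returns NULL when the arena is exhausted.
static void *arena_alloc(Arena *a, ptrdiff_t size, ptrdiff_t align, ptrdiff_t count)
{
    ptrdiff_t pad   = (ptrdiff_t)(-(uintptr_t)a->beg & (uintptr_t)(align - 1));
    ptrdiff_t avail = a->end - a->beg - pad;
    if (size <= 0 || count < 0 || avail < 0 || count > avail / size) {
        return NULL;
    }
    void *p = a->beg + pad;
    a->beg += pad + size * count;
    return p;
}

#define alloc(a, T, n) ((T *)arena_alloc((a), sizeof(T), _Alignof(T), (n)))

static Checkpoint arena_snapshot(Arena *a)
{
    return a->beg;
}

// Discards everything allocated since `c` was captured.
static void arena_reset(Arena *a, Checkpoint c)
{
    a->beg = c;
}

// Sum of the squares 1..n, computed via a temporary array in `scratch`.
// Returns -1 if the scratch arena is too small.
static long sum_squares(Arena *scratch, int n)
{
    Checkpoint checkpoint = arena_snapshot(scratch);
    long total = -1;
    long *tmp = alloc(scratch, long, n);
    if (tmp) {
        total = 0;
        for (int i = 0; i < n; i++) tmp[i] = (long)(i + 1) * (i + 1);
        for (int i = 0; i < n; i++) total += tmp[i];
    }
    arena_reset(scratch, checkpoint);  // the temporary is gone, no free() needed
    return total;
}
```

The caller's arena is left exactly as it was found, on both the success
and the failure path, without tracking any individual pointer.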


Example and resources
---------------------

For a real world example of these techniques, I recommend looking at
u-config:

        https://github.com/skeeto/u-config
        https://nullprogram.com/blog/2023/01/18/#u-config-implementation
        (implementation notes)

It's a `pkg-config` replacement which requires quite a heavy bit of
string manipulation. If you were to implement this using nul-strings
you'd be in a lot of pain. But using sized strings and an arena
allocator, the u-config code is fairly easy to hack on and also very
robust (it withstands hours of fuzz-testing without any buffer
overflows).

Since this has gotten pretty big already, I'll drop a couple of
resources for those who are interested:

On arena allocators:
* https://nullprogram.com/blog/2023/09/27/  (practical tips and tricks)
* https://www.rfleury.com/p/untangling-lifetimes-the-arena-allocator
  (a bit more theoretical)
Other allocator schemes (niche, but can be useful depending on the situation):
* https://floooh.github.io/2018/06/17/handles-vs-pointers.html
  (generational handles when you have truly dynamic lifetimes that cannot
  be expressed using an arena)
* https://www.gingerbill.org/series/memory-allocation-strategies/
  (pool & buddy allocator. pretty niche, but good to be aware of in case
  they happen to be a good fit for your problem.)
Rants:
* https://www.youtube.com/watch?v=f4ioc8-lDc0&t=4482s (against
  overcomplicated memory management solutions that stem from "single
  element thinking").
* https://www.symas.com/post/the-sad-state-of-c-strings
* https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/

- NRK
