Hi people. I've been working on this colour library when I have free time (almost never!), and I want to move on to blending/filtering functions, but that work is blocked on saturating arithmetic. I've started on a saturating integer library a few times, but it's a much bigger job than it appears, and I haven't had enough time for it. An efficient implementation tends to be significantly different for every int width and for signed vs. unsigned. I see no use for an inefficient implementation in a colour library; images tend to be millions of pixels, and inefficiency adds up very quickly.
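To illustrate why the implementation differs by width and signedness, here is a rough sketch in C (chosen only for illustration; it is not from the colour library, and the names `sat_add_u8`, `sat_add_i8` and `sat_add_u32` are made up). The 8-bit unsigned case can detect wrap-around, the 8-bit signed case can widen and clamp, and the 32-bit unsigned case has no cheap wider type, so it relies on the wrap check alone.

```c
#include <stdint.h>

/* Unsigned 8-bit: if the truncated sum is smaller than an operand, it wrapped. */
static inline uint8_t sat_add_u8(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? UINT8_MAX : sum;
}

/* Signed 8-bit: widen to int, then clamp at both ends of the range. */
static inline int8_t sat_add_i8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned 32-bit: unsigned overflow is well defined in C, so the same
   wrap check works without needing a 64-bit intermediate. */
static inline uint32_t sat_add_u32(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;
    return sum < a ? UINT32_MAX : sum;
}
```

These are the straightforward versions; a tuned library would likely want branchless forms or compiler intrinsics per type, which is where the work multiplies.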
I wonder if anyone has an interest in the area and wants to have a go at it? It's a pretty big job. It should support scalars, packed vectors (i.e., 4 bytes in an int), and SIMD vectors (wider vectors using hardware SIMD ops). Each step up in vector width can gain considerable efficiency. It's hard to write a useful colour blending library without the full set of these available.
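For the packed case, something like the following is what I have in mind: a sketch of a saturating add on four unsigned bytes packed in a 32-bit int, without unpacking them. This is only one possible approach, written in C for illustration, and `sat_add_4xu8` is a hypothetical name.

```c
#include <stdint.h>

/* Saturating add of four unsigned 8-bit lanes packed into one uint32_t. */
static inline uint32_t sat_add_4xu8(uint32_t a, uint32_t b) {
    const uint32_t LO7 = 0x7F7F7F7Fu;  /* low 7 bits of every lane */
    const uint32_t HI1 = 0x80808080u;  /* top bit of every lane */

    /* Add the low 7 bits of each lane; the result fits in 8 bits,
       so nothing carries into the neighbouring lane. */
    uint32_t low = (a & LO7) + (b & LO7);

    /* Per-lane wrapped sum: fold the top bits in with XOR. */
    uint32_t sum = low ^ ((a ^ b) & HI1);

    /* Carry out of each lane = majority(a7, b7, carry into bit 7). */
    uint32_t carry = ((a & b) | ((a | b) & low)) & HI1;

    /* Expand each carry bit into a full 0xFF lane mask and saturate. */
    uint32_t mask = (carry >> 7) * 0xFFu;
    return sum | mask;
}
```

For example, `sat_add_4xu8(0xFF10F080, 0x01207F80)` gives `0xFF30FFFF`: three lanes overflow and clamp to `0xFF`, and the remaining lane adds normally. The SIMD step would replace all of this with a single hardware op per wide vector where the target has one.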