@Stefan_Salewski - I think this idea of having a family of overloaded procs to sort on standard types is very good. I have almost raised it myself several times.
It would be useful for `cmp`-less `sort` procs on such type to be _stable_ sorts (meaning elements with identical keys retain relative positions in the array). Then a user could implement a simple 2-level or 3-level sort on distinct embedded keys by just calling such sort procs 2 or 3 times. (A multi-level sort orders elements first by a primary key, then a secondary key for equal primary keys, and so on). This arrangement has another performance bonus potentially much greater than the inlining effect mentioned so far in this thread. It allows one to tune which sorting algorithm is used based on the nature of the key, e.g. how long it is, integer or floating point or string, etc. For example, for single-byte integer keys it is hard to be faster than [counting sort](https://en.wikipedia.org/wiki/Counting_sort) which does two pretty linear sweeps through the input - the first to histogram byte values and the second to output the answer (with a keyspace-sized prefix sum in between to convert counts to output offsets). The simplest version of that does require an additional copy of the input array which may have some consequences on the proc signature/interface/specification.