The option to add at least one guard page would be useful whether or not it's tied to randomization. It's not feasible to do that in userspace for mmap as a whole, only for specific users of mmap like malloc, and even there it adds significant overhead compared to a kernel implementation. The API could optionally let you choose minimum and maximum guard region sizes, with the kernel picking a random size in that range whenever they're not equal. It's important for the guard to be an enforced gap rather than something another allocation can later fill in. It will obviously help a lot more with a hardened allocator designed to take advantage of it than with glibc malloc or jemalloc.
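For comparison, the userspace emulation in an allocator looks roughly like this (a sketch, not code from any real allocator; it assumes an arc4random_uniform() in libc as bionic and the BSDs provide, so substitute a getrandom-based helper elsewhere):

    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Emulated guarded mapping: reserve [guard | usable | guard] as
     * PROT_NONE, then open up only the interior. The guard size is
     * chosen at random between min_guard and max_guard (page-aligned). */
    static void *guarded_map(size_t size, size_t min_guard, size_t max_guard)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size = (size + page - 1) & ~(page - 1);

        size_t guard = min_guard;
        if (max_guard > min_guard)
            guard += (size_t)arc4random_uniform((max_guard - min_guard) / page + 1) * page;

        char *base = mmap(NULL, size + 2 * guard, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;

        if (mprotect(base + guard, size, PROT_READ | PROT_WRITE)) {
            munmap(base, size + 2 * guard);
            return NULL;
        }
        return base + guard;
    }

The PROT_NONE guards can't be filled in while they're mapped, but each one is an extra VMA counted against vm.max_map_count, and the extra system call per allocation is where the overhead relative to a kernel implementation comes from.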
I don't think it makes sense for the kernel to attempt mitigations to hide libraries. The best way to do that is in userspace, by having the linker reserve a large PROT_NONE region for mapping libraries (both at initialization and for dlopen), including a random gap to act as a separate ASLR base. If an attacker has library addresses, it's hard to see much point in hiding the other libraries from them, but it does make sense to keep them from learning the location of any executable code when they only leak non-library addresses. An isolated library region + gap is a feature we implemented in CopperheadOS and it works well, although we haven't ported it to Android 7.x or 8.x. I don't think the kernel can bring much, if anything, to the table here. Similarly, it's inherently the responsibility of libc to randomize the lower bits for secondary stacks.

Fine-grained randomized mmap isn't going to be used if it causes unpredictable levels of fragmentation or has a high or unpredictable performance cost, so I don't think it makes sense to approach it so aggressively that people can't use it. The OpenBSD randomized mmap is a fairly conservative implementation aimed at avoiding excessive fragmentation. They do a bit more than adding random gaps, by switching between different 'pivots', but that isn't very high benefit. The main benefit is having random bits of unmapped space all over the heap when it's combined with their hardened allocator, which heavily uses small mmap mappings and has a fair bit of malloc-level randomization of its own (it's a bitmap / hash table based slab allocator using 4k regions with a page span cache). We use a port of it to Android with added hardening features, but we're missing the fine-grained mmap randomization it's meant to have underneath what it does itself.

The default vm.max_map_count = 65530 is also a major problem for fine-grained mmap randomization of any kind, since random gaps prevent VMA merging and drive the mapping count up. On top of that, there's the still-unresolved 32-bit reference count overflow issue on high memory machines via max_map_count * pid_max (65530 mappings times the default pid_max of 32768 is already just under 2^31).
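For the library region, the linker-side logic can be as simple as the following sketch (the names and sizes here are mine for illustration, not from the CopperheadOS implementation):

    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LIB_REGION_SIZE (1UL << 32)  /* 4GiB reservation on 64-bit */
    #define LIB_MAX_GAP     (1UL << 28)  /* up to 256MiB of random gap */

    static char *lib_next;  /* bump cursor for library mappings */

    /* Reserve the region once at startup, then skip a random gap into
     * it so the libraries get their own ASLR base. */
    static int reserve_library_region(void)
    {
        char *base = mmap(NULL, LIB_REGION_SIZE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (base == MAP_FAILED)
            return -1;

        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        lib_next = base + (size_t)arc4random_uniform(LIB_MAX_GAP / page) * page;
        return 0;
    }

Each library (at initialization or via dlopen) is then mapped over the reservation at lib_next with MAP_FIXED and the cursor advances; the untouched remainder stays PROT_NONE, so unrelated mappings can't land between the libraries.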
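The secondary stack case is similar in spirit: the mapping itself can only be page-aligned, so libc has to produce the low-bit entropy by shifting the initial stack pointer down inside the mapping, roughly like this (constants are illustrative):

    #include <stdlib.h>
    #include <sys/mman.h>

    #define STACK_SIZE  (1UL << 23)  /* 8MiB thread stack */
    #define STACK_SLACK (1UL << 16)  /* up to 64KiB of low-bit entropy */

    static void *randomized_stack_top(void)
    {
        char *base = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
        if (base == MAP_FAILED)
            return NULL;

        /* keep the 16-byte alignment the ABI requires */
        size_t offset = (size_t)arc4random_uniform(STACK_SLACK / 16) * 16;
        return base + STACK_SIZE - offset;
    }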
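For contrast with the enforced guards above, conservative gap randomization amounts to leaving random unmapped holes between mappings. A userspace illustration of the idea (the kernel or OpenBSD versions place the gap during allocation rather than unmapping afterwards):

    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Over-reserve, then unmap a random page-aligned prefix so small
     * unmapped holes accumulate across the heap. */
    static void *map_with_random_gap(size_t size, size_t max_gap)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t gap = (size_t)arc4random_uniform(max_gap / page + 1) * page;

        char *base = mmap(NULL, size + gap, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;
        if (gap)
            munmap(base, gap);  /* leave an unmapped hole below the mapping */
        return base + gap;
    }

Unlike a guard, such a hole can be reused by a later mapping, which is why the payoff only really shows up alongside an allocator that keeps lots of small mmap mappings with unmapped space scattered between them.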