Have you tried link-time optimization (LTO) and/or profile-guided optimization (PGO)? Programs that use your library would have to enable them too, but you can at least try them in your benchmarks. For example, see <https://forum.nim-lang.org/t/6295>
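A minimal sketch of how this might look in a project's `config.nims`, assuming GCC or Clang is the C backend. Nim emits one C file per module, so LTO is what lets the C compiler inline across module boundaries. PGO takes two builds: an instrumented build that you run on a representative workload, then a rebuild that uses the recorded profile. The `pgoGen` and `pgoUse` defines are hypothetical names chosen for this example, not built-in Nim switches.

```nim
# config.nims: sketch assuming GCC or Clang as the backend C compiler.

# Link-time optimization: pass -flto to both the C compiler and the linker
# so optimizations (e.g. inlining) can cross the per-module C files.
switch("passC", "-flto")
switch("passL", "-flto")

# Profile-guided optimization, selected by hypothetical defines:
#   phase 1 (-d:pgoGen): build an instrumented binary that records a profile
#   phase 2 (-d:pgoUse): rebuild using the recorded profile data
when defined(pgoGen):
  switch("passC", "-fprofile-generate")
  switch("passL", "-fprofile-generate")
elif defined(pgoUse):
  switch("passC", "-fprofile-use")
  switch("passL", "-fprofile-use")
```

With a hypothetical benchmark `bench.nim`, the workflow would be roughly `nim c -d:danger -d:pgoGen bench.nim`, then run `./bench`, then `nim c -f -d:danger -d:pgoUse bench.nim`. The `-f` forces the C files to be recompiled. Keep the same nimcache directory between the two builds so the compiler can find the profile files it wrote next to the object files.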
