https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
We arrive at different final optimization results depending on PUSH_ARGS_REVERSED (see PR67203). The current (GCC 6) final state has either 3 or 4 calls, depending on that setting. That is only because the final DCE (which removes malloc/free pairs) needs some more DSE first, but at that point DSE only runs after DCE. The late dce/dse passes are the only pair with this particular ordering; all other pairs run the other way around, which would end up removing all malloc/free pairs in this case (finally). Of course DSE and DCE depend on each other, so exchanging the last two isn't trivial surgery. Ideally DSE would have at least a basic DCE embedded, or we'd finally merge both passes (given that DSE is quite ad hoc anyway).
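
To illustrate the DCE/DSE dependency described above, here is a minimal sketch in C. It is not the testcase from this PR; the function name `bump` is hypothetical and chosen only for illustration. The store into the allocated block is never read, so it is a candidate for DSE, and only once that store is gone does the malloc/free pair have no remaining uses for DCE to consider. Which pass actually removes what depends on the GCC version and pass pipeline, and a case this small may well be cleaned up by earlier passes.

```c
#include <stdlib.h>

/* Hypothetical example, not taken from the bug report. */
int bump(int x)
{
    int *p = malloc(sizeof *p);   /* allocation with no observable use */
    if (!p)
        return x + 1;             /* same result if allocation fails */
    *p = x;                       /* dead store: never read before free() */
    free(p);                      /* pairs with the malloc() above */
    return x + 1;                 /* result does not depend on *p */
}
```

Compiling with `-O2 -fdump-tree-optimized` and counting the remaining `malloc`/`free` calls in the dump is one way to check what a given GCC version leaves behind.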