Just a heads up on LZ4. I have spent roughly 3 hours optimizing my decompressor, and while I had stunning success, a speed-up of about 400%, I am still about 600x slower than the C variant.
It is still a mystery to me why that is, since the generated code is both smaller and works almost without spills :)