> Please don't top post.

Sorry about that.

> Which function needs 1KB of stack space? That's quite a lot.

FSE_buildCTable_wksp(), FSE_compress_wksp(), and HUF_readDTableX4()
each required over 1 KB of stack space.

> I can see in [1] that there are some on-stack buffers replaced by
> pointers to the workspace. That's good, but I would like to know if
> there's any hidden gem that grabs the precious stack space.

I've been hunting down the functions that use the most stack space and
replacing their on-stack buffers with pointers into the workspace. I
compiled the code with -Wframe-larger-than=512 and reduced the stack
usage of all offending functions. In the next version of the patch, no
function uses more than 400 B of stack space. We'll be porting the
changes back upstream as well.

> Hm, I'd suggest to create a version optimized for kernel, eg. expecting
> that 4+ GB buffer will never be used and you can use the most fitting
> int type. This should affect only the function signatures, not the
> algorithm implementation, so porting future zstd changes should be
> straightforward.

If the functions were exposed, then I would agree 100%. However, since
these are internal functions, and the rest of zstd uses size_t to
represent buffer sizes, I think it would be awkward to change just the
FSE/HUF functions. I also prefer size_t because it is friendlier to the
optimizer, especially the loop optimizer, since the compiler doesn't
have to worry about a narrower unsigned index wrapping around.

On a related note, zstd performs automatic optimizations to improve
compression speed and reduce memory usage when given small sources,
which is the common case in the kernel.
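
For anyone following along, here is a rough sketch of the kind of change
described above: moving a 1 KB on-stack table into a caller-provided
workspace. This is not the actual zstd code. The function names, the
EXAMPLE_* macros, and the error convention are all made up for
illustration, and the symbol-counting body is only a stand-in for the
real FSE/HUF work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none of them exist in zstd. */
#define EXAMPLE_MAX_SYMBOL 255
#define EXAMPLE_WKSP_SIZE  ((EXAMPLE_MAX_SYMBOL + 1) * sizeof(uint32_t))
#define EXAMPLE_ERROR      ((size_t)-1)

/*
 * Before: the count table lives on the stack.
 * 256 entries * 4 bytes = 1024 bytes of stack for this frame alone,
 * which a build with -Wframe-larger-than=512 would be expected to flag.
 */
size_t example_max_count_stack(const uint8_t *src, size_t src_size)
{
    uint32_t count[EXAMPLE_MAX_SYMBOL + 1];
    size_t i, max = 0;

    memset(count, 0, sizeof(count));
    for (i = 0; i < src_size; i++)
        count[src[i]]++;
    for (i = 0; i <= EXAMPLE_MAX_SYMBOL; i++)
        if (count[i] > max)
            max = count[i];
    return max;
}

/*
 * After: the caller passes in a workspace, and the table becomes a
 * pointer into it. The frame now holds only a few scalars.
 * Returns EXAMPLE_ERROR if the workspace is missing, too small, or
 * not aligned for uint32_t.
 */
size_t example_max_count_wksp(const uint8_t *src, size_t src_size,
                              void *workspace, size_t wksp_size)
{
    uint32_t *count = (uint32_t *)workspace;
    size_t i, max = 0;

    if (workspace == NULL || wksp_size < EXAMPLE_WKSP_SIZE)
        return EXAMPLE_ERROR;
    if ((uintptr_t)workspace % sizeof(uint32_t) != 0)
        return EXAMPLE_ERROR;

    memset(count, 0, EXAMPLE_WKSP_SIZE);
    for (i = 0; i < src_size; i++)
        count[src[i]]++;
    for (i = 0; i <= EXAMPLE_MAX_SYMBOL; i++)
        if (count[i] > max)
            max = count[i];
    return max;
}
```

Both variants return the same result for the same input. The difference
is only where the 1 KB table lives: in the second one the caller
allocates EXAMPLE_WKSP_SIZE bytes once (for example alongside the rest
of its compression context) and reuses them across calls.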