On Tue, Aug 04, 2020 at 02:57:50AM +0000, Nick Terrell wrote:
>
>
> > On Aug 3, 2020, at 6:56 PM, Arvind Sankar <[email protected]> wrote:
> >
>
> > -- I see that ZSTD_copy8 is already using __builtin_memcpy,
> > but there must be more that can be optimized? There's a couple 1/2-byte
> > sized copies in huf_decompress.c.
>
> Oh wow, I totally missed that, I guess I stopped looking once performance
> was about what I expected, nice find!
>
> I suspect it is mostly the memcpy inside of HUF_decodeSymbolX4(), since
> that should be the only hot one [1].
>
> Do you want to put up the patch to fix the memcpy’s in zstd Huffman, or
> should I?
>
> I will be submitting a patch upstream to migrate all of zstd’s memcpy()
> calls to use __builtin_memcpy(), since I plan on updating the version in
> the kernel to upstream zstd in the next few months. I was waiting until
> the compressed kernel patch set landed, so I didn't distract from it.
>
> [0] https://gist.github.com/terrelln/9bd53321a669f62683c608af8944fbc2
> [1] https://github.com/torvalds/linux/blob/master/lib/zstd/huf_decompress.c#L598
>
> Best,
> Nick
>
It's better if you do the zstd changes I think, as I'm not familiar with the code at all. Thanks.

