Cool. Learn something new every day. ;-)

On Oct 17, 2022, 12:26 PM -0700, Matthew Ahrens via openzfs-developer <developer@lists.open-zfs.org>, wrote:
> Yes, to expand on what Rich said, there was a talk about Intel QAT offload of
> gzip at the 2018 OpenZFS Developer Summit:
> ZFS Hardware Acceleration with QAT
> Weigang Li
> Intel
> slides
> video
> The results presented show >2x throughput with <1/2 the CPU used, and similar
> compression to gzip software (I'm guessing with the default gzip level).
>
> QAT support has been in ZFS since 0.8.0.
>
> --matt
>
> On Mon, Oct 17, 2022 at 12:15 PM Rich <rincebr...@gmail.com> wrote:
> > > I believe the Intel QAT support we have will happily offload gzip for
> > > you, though I don't know if it makes any promises about what level
> > > equivalent of gzip it hands you back...
> > >
> > > - Rich
> > >
> > > > On Mon, Oct 17, 2022 at 12:02 PM Garrett D'Amore <garr...@damore.org>
> > > > wrote:
> > > > > That’s about what I would have expected.
> > > > >
> > > > > Having an offload for high levels of compression (e.g. GZIP 9 or
> > > > > something) would be cool, but I don’t think it exists yet. And it
> > > > > would be hard to write that in a way that doesn’t punish things for
> > > > > the folks who *don’t* have the offload hardware.
> > > > >
> > > > > - Garrett
> > > > >
> > > > > On Oct 17, 2022, 8:44 AM -0700, Sanjay G Nadkarni via
> > > > > openzfs-developer <developer@lists.open-zfs.org>, wrote:
> > > > > >
> > > > > > We have been doing regular performance runs using various workloads
> > > > > > over NFS (v3, v4.1), SMB3, iSCSI and FC16 & 32 for the past few
> > > > > > years. Compression is enabled for all datasets and zvols in our
> > > > > > runs. What we have observed is, under load, compression consumes
> > > > > > the highest CPU cycles, after that it is a toss up of dnode locking
> > > > > > (a well known issue) and other things that might come into play
> > > > > > depending on the protocol.
> > > > > >
> > > > > > At least in our use cases, checksumming of blocks does not appear
> > > > > > to be an issue.
> > > > > >
> > > > > > -Sanjay
> > > > > >
> > > > > > On 10/14/22 10:15 AM, Garrett D'Amore wrote:
> > > > > > > I can tell you from past experience that offloads like what you
> > > > > > > are proposing are rarely worth it. The setup and teardown of the
> > > > > > > mappings to allow the data transport are not necessarily cheap.
> > > > > > > You can avoid that by having a preallocated region, but then you
> > > > > > > need to copy the data. Fortunately for this case you only need
> > > > > > > to copy once, since the result will be very small compared to the
> > > > > > > data.
> > > > > > >
> > > > > > > Then there is the complexity (additional branches, edge cases,
> > > > > > > etc.) that has to be coded. These become performance sapping as
> > > > > > > well.
> > > > > > >
> > > > > > > Add to this the fact that CPUs are always getting faster, and
> > > > > > > advancements like extensions to the SIMD instructions mean that
> > > > > > > the disparity between the offload and just doing the natural
> > > > > > > thing inline gets ever smaller.
> > > > > > >
> > > > > > > At the end of the day, it’s often the case that your “offload” is
> > > > > > > actually a performance killer.
> > > > > > >
> > > > > > > The exceptions to this are when the work is truly expensive. For
> > > > > > > example, running (in the old days) RSA on an offload engine makes
> > > > > > > a lot of sense. (I’m not sure it does for elliptic curve crypto
> > > > > > > though.) Running 3DES (again if you wanted to do that, which you
> > > > > > > should not) used to make sense. AES used to, but with AES-NI not
> > > > > > > anymore. I suspect that for SHA2 it’s a toss-up. Fletcher
> > > > > > > probably does not make sense.
> > > > > > > If you want to compress, LZJB does
> > > > > > > not make sense, but GZIP (especially at higher levels) would, if
> > > > > > > you had such a device.
> > > > > > >
> > > > > > > Algorithms are always getting better (newer ones that are more
> > > > > > > optimized for actual CPUs etc.) and CPUs are always improving.
> > > > > > > The GPU is probably best reserved for the truly expensive
> > > > > > > operations for which it was designed: complex transforms for 3D
> > > > > > > rendering, expensive hashing (although I wish that wasn’t a
> > > > > > > thing), long running scientific analysis, machine learning, etc.
> > > > > > >
> > > > > > > As an I/O accelerator, not so much.
> > > > > > > On Oct 14, 2022, 7:52 AM -0700, Thijs Cramer
> > > > > > > <thijs.cra...@gmail.com>, wrote:
> > > > > > > > I've been searching the GitHub Repository and the Mailing list,
> > > > > > > > but couldn't find any discussion about this.
> > > > > > > > I know it's probably silly, but I would like to understand the
> > > > > > > > workings.
> > > > > > > >
> > > > > > > > Let's say one could offload the Checksumming process to a
> > > > > > > > dedicated GPU. This might save some amount of CPU, *but* might
> > > > > > > > increase latency incredibly.
> > > > > > > >
> > > > > > > > To my understanding ZFS uses the Fletcher4 Checksum Algorithm
> > > > > > > > by default, and this requires a pass of the data in-memory as
> > > > > > > > it calculates the checksum. If we skip this step, and instead
> > > > > > > > send the data to the GPU, that would also require a pass of the
> > > > > > > > data (no gains there).
> > > > > > > >
> > > > > > > > The actual calculation does not seem that hard for a CPU:
> > > > > > > > there are specific SIMD instructions for calculating specific
> > > > > > > > Checksums, and after a quick pass over the code, it seems they
> > > > > > > > are already used (if available).
> > > > > > > >
> > > > > > > > I think the only time that a GPU could calculate checksums
> > > > > > > > 'faster' is with a form of readahead.
> > > > > > > > If you would pre-read a lot of data, and dump it to the GPU's
> > > > > > > > internal memory, and make the GPU calculate checksums of the
> > > > > > > > entire block in parallel, it might be able to do it faster than
> > > > > > > > a CPU.
> > > > > > > >
> > > > > > > > Has anyone considered the idea?
> > > > > > > >
> > > > > > > > - Thijs
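
For anyone following the Fletcher4 part of the thread, here is a rough Python sketch of the running-sum structure being discussed. It is only an illustration, not the OpenZFS code: the function name `fletcher4` is made up for this example, and it assumes the input is read as little-endian 32-bit words feeding four 64-bit accumulators that wrap on overflow.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators are assumed to wrap at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Illustrative Fletcher-4 style checksum (hypothetical helper, not ZFS code).

    Reads the buffer as little-endian 32-bit words (an assumption made
    for this sketch) and returns the four running sums (a, b, c, d).
    """
    if len(data) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        # One pass over the data, four additions per 32-bit word.
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


if __name__ == "__main__":
    # Two words, 1 and 2, encoded little-endian.
    print(fletcher4(struct.pack("<II", 1, 2)))
```

Each word costs only a handful of additions, so the work is dominated by reading the data once. That fits the point made above that shipping the same bytes to another device would need its own pass over the data.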
------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/T2be6db01da63a639-M9d1dd333674db9391bc0a362
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription