Thanks, Barry. We'll discuss this at the next ODP public call.
On Fri, Sep 8, 2017 at 5:06 PM, Barry Spinney <spin...@mellanox.com> wrote:

Here is my promised document on "Lossless Data Compression Requirements and
Use Cases", in response to Shally's request for compression use cases. It is
mostly a complete first draft, with just a couple of "To Be Done" parts still
remaining. I seem to recall some constraints on the mailing lists regarding
the use of email attachments and rich document formats, so I chose to use a
simple text file format - despite the fact that it looks a little ugly.

**********************************************************************


                     Lossless Data Compression
                     Requirements and Use Cases

                     Sep 8/2017  Barry Spinney


This document describes the use cases that would justify a HW based
compression/decompression engine on Mellanox SoC chips. An API for such a
compression engine is currently being considered for addition to the
OpenDataPlane (ODP) standard. For these use cases some basic requirements
are also listed.

This document considers a number of uses of data compression in computer
systems - especially those uses that are networking related. For a number
of such uses, we note whether that usage would justify the addition of
compression/decompression HW for Mellanox chips, as well as how the use
case would affect the ODP API. The document is organized as a taxonomy or
flow chart of the compression/decompression categories.


1 Low Speed Uses:
-----------------
Note that we ignore the many uses of compression that are usually done
offline (often in SW) OR that happen in real time but at low speeds. Here we
define low speeds as those use cases that can be handled reasonably well in
SW and so don't justify special compression/decompression HW. Today this
usually means speeds less than 1 Gbps.


2 High Speed Uses:
------------------


2.1 Lossy versus Lossless:
--------------------------
First we distinguish the two major categories: lossy compression use cases
and lossless compression use cases.


2.1.1 Lossy Compression:
------------------------
Lossy compression/decompression is an important use of compression on the
Internet. In particular, lossy audio and video compression is widespread;
however, the algorithms used here - like those described in the MPEG
standards - are both very different from and far more complex than the
algorithms used for lossless data compression. Often the compression can be
done offline rather than in real time - though decompression generally
needs to be done in real time.

Since the current ODP compression API is designed for lossless compression,
the lossy use cases are ignored as irrelevant.


2.1.2 Lossless Compression:
---------------------------
Lossless data compression generally uses a small set of simpler algorithms,
which are more amenable to HW implementation. This is the only category
considered for the current ODP HW.


2.1.2.1 Offline vs Real-time:
-----------------------------
Lossless data compression/decompression can be done "offline" or in
"real-time". By real-time we mean uses where the data to be compressed or
decompressed is arriving over a high speed network port and needs to be
dealt with at network speeds and with low latency.
<TBD better definition of offline/real-time>


2.1.2.2 Offline:
----------------
An important offline use case could be whole file compression. However this
is currently often done by the OS (somewhat transparently) and implemented
in kernel SW.

Another example is compression/decompression of large files/archives for
download (e.g. a compressed ISO file or a compressed tar file).
Unfortunately, while gzip and zip are still used, it is more common today
to use algorithms like bzip2 and xz instead. These newer algorithms are
much more difficult to implement in HW because of the massive history
buffers they use.
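For a concrete point of reference, the SW path that such a HW engine would
offload for the gzip/zip style formats is typically zlib. The following is a
minimal, purely illustrative sketch of one-shot DEFLATE compression of an
in-memory buffer (gzip framing would instead use deflateInit2() with
windowBits set to 15 + 16):

    /* Minimal sketch: one-shot zlib-wrapped DEFLATE compression.
     * Build with: cc example.c -lz
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    int main(void)
    {
        const char *src = "example payload that would normally be a whole file";
        uLong src_len = (uLong)strlen(src);

        /* compressBound() gives a worst-case output size for this input. */
        uLongf dst_len = compressBound(src_len);
        Bytef *dst = malloc(dst_len);
        if (dst == NULL)
            return 1;

        /* Level 6 is zlib's default speed/ratio trade-off. */
        if (compress2(dst, &dst_len, (const Bytef *)src, src_len, 6) != Z_OK) {
            fprintf(stderr, "compress2 failed\n");
            return 1;
        }

        printf("compressed %lu bytes to %lu bytes\n",
               (unsigned long)src_len, (unsigned long)dst_len);
        free(dst);
        return 0;
    }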
While these use cases cannot justify the addition of compression/
decompression HW on their own, if such HW exists for other uses, then it
would be nice if the HW could handle this case as well. However this should
be considered a low priority.


2.1.2.3 Real-time:
------------------
Real-time use cases occur as a result of some (usually standardized) network
protocol. These protocols can be divided into:
 a) those that compress at the packet level - like IPComp and IPsec
 b) those that compress at the disk sector/block level
 c) those that compress at the file level


2.1.2.3.1 Packet Level Compression:
-----------------------------------
Packet level compression/decompression on high speed networks is not that
common, with perhaps the exception of the IPComp protocol (RFC 3173) -
especially in conjunction with the IPsec protocols (several dozen RFCs) -
which is the only use case discussed here.


2.1.2.3.1.1 IPComp/IPsec:
-------------------------
While IPComp could in theory be used in a fairly general way, in practice
today it is almost always used as the "compression" part of the IPsec
protocol, so we ignore all other uses of IPComp.

While low speed IPsec tunnels might use and benefit from IPComp
compression/decompression, because they are low speed this will be done in
SW and has no relevance for an ODP HW (i.e. high speed)
compression/decompression engine.

Conversely, compression on high speed IPsec tunnels is less common, BUT one
could argue that this is primarily because of the lack of HW compression!
So we consider HW compression to be an enabler for high speed IPsec
compression. Of course, since this use case isn't solving a currently
common usage, it may not justify the addition of this HW on its own, BUT if
such HW already exists (i.e. is justified by some other use case), then
this use case becomes fairly relevant.


2.1.2.3.2 Disk Sector/Block Compression:
----------------------------------------
This use case is often not specified in the disk block access protocols
themselves, but instead is done as a side effect of reading and writing
disk blocks by protocols like iSCSI, NFS, NVMe-oF, etc.
<TBD I wasn't able to reach the Mellanox architect knowledgeable about this
use case, so this section is still To Be Done>


2.1.2.3.3 File Level Compression:
---------------------------------
This use case is one of the main use cases that can justify the addition
of a compression HW engine.


2.1.2.3.3.1 SSL/TLS:
--------------------
While the SSL/TLS protocol includes the capability to do
compression/decompression as part of its operation - assuming both parties
agree that they want to do it - this turns out to be almost never used.
Since the feature is so rarely used, the upcoming new version of the TLS
protocol spec has actually removed this capability from the protocol.

Hence, while on paper this looks like a promising use case, not only does
it NOT justify adding compression HW, it is NOT worth considering this use
case even if such HW is already there! Hence we completely ignore this case.


2.1.2.3.3.2 HTTP:
-----------------
This is the MOST important use case - both for justifying the addition of
compression/decompression HW and for actually using it on high speed
networks.

Note that the current IANA HTTP Transfer Coding Registry lists the following
compression formats (ignoring the old deprecated aliases):
 a) compress - UNIX "compress" data format
 b) deflate  - "deflate" compressed data inside of the "zlib" data format
 c) gzip     - GZIP file format

It is IMPORTANT to understand how the HTTP protocol works and how
compression fits into this protocol. Some of the details below have been
simplified to make them easier to understand; hence the description should
not be considered complete or fully accurate, but the concepts and issues
raised should be correct.

First of all, HTTP is not a packet based protocol but a byte-stream based
protocol running over TCP. This means that HTTP request headers and
response headers, along with data stream headers like chunked data headers,
are not necessarily related to IP packet boundaries. Instead HTTP sends a
byte stream to TCP, which is then free to carve up these bytes into TCP/IP
packet payloads in any way it wishes.

Next, HTTP is not compressing/decompressing packet payloads, but instead is
working on whole files being transferred (technically it operates on
"resources" as indicated by a Uniform Resource Identifier (URI), but the
distinction from a compress/decompress perspective is moot). Although HTTP
has the concept of many "methods" operating on these resources, in practice
the vast majority of requests involve methods that simply get or put
"files", and these files can be (and often are) compressed - at the HTTP
level. Here we ignore files that HTTP considers uncompressed but whose file
format itself uses some compression algorithm. Examples are JPEG files
(typically using lossy compression) and sometimes PNG files (using lossless
compression). Since compressing an already compressed file isn't useful,
one doesn't want to do HTTP compression on e.g. JPEG files.

The HTTP side receiving files (compressed or not) needs some way to know
when a file transfer request is done. With HTTP/1.1, web servers will
usually want to transfer multiple files over a single TCP connection, so
the "old way" of knowing that the transfer is done - seeing the TCP
connection terminated - isn't sufficient. Instead HTTP either uses a
Content-Length header field OR uses the chunked Transfer-Encoding method.

In particular, chunked encoding is widely used. This method modifies the
file body being transferred (compressed or not) by dividing it into a
series of chunks, with a newly inserted chunk header at the start of each
chunk. A chunk consists of a chunk-size (a variable number of hex digits),
followed by an optional chunk-extension, followed by a CRLF (CRLF means the
two characters Carriage Return followed immediately by Line Feed). Next
come the actual file bytes - whose length is given by the chunk-size -
followed by CRLF. This pattern is repeated until the entire file has been
transferred; a final chunk with a chunk-size of zero marks the end of the
transfer.
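To make these inserted chunk headers concrete, here is a small, purely
illustrative C sketch. It walks a buffer holding a tiny hard-coded chunked
body and prints the offset and length of each chunk header versus the
actual file data - exactly the kind of metadata that the decompression
discussion below argues must accompany each packet. For brevity it ignores
chunk-extensions and trailers and assumes the whole body is contiguous in
memory; as noted below, real chunk headers can be split across TCP packets
in arbitrary ways.

    /* Illustrative only: locate chunk headers vs. file data in a chunked
     * HTTP message body held in one contiguous buffer.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* "Wiki" (4 bytes) + "pedia" (5 bytes) + terminating 0-size chunk. */
        const char body[] = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        const char *p = body;

        for (;;) {
            const char *hdr = p;
            char *end;
            unsigned long size = strtoul(p, &end, 16); /* hex chunk-size */

            p = strstr(end, "\r\n");     /* skip any chunk-extension      */
            if (p == NULL)
                break;
            p += 2;                      /* past the CRLF ending the header */

            printf("chunk header at offset %2ld, length %ld;  "
                   "file data at offset %2ld, length %lu\n",
                   (long)(hdr - body), (long)(p - hdr),
                   (long)(p - body), size);

            if (size == 0)               /* 0-size chunk ends the body    */
                break;
            p += size + 2;               /* skip the data and its CRLF    */
        }
        return 0;
    }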
So if this HTTP payload is fed directly into a decompression engine, bad
things will happen because of the insertion of these chunk headers. So the
decompression API needs a way to indicate where the file data starts in the
packet and also where the chunk headers are, so that they can be skipped.
Note that the file data will often not start immediately after a packet's
TCP header, because every file transfer has a variable length HTTP request
header or HTTP response header preceding its first byte. Also, the chunk
headers can be split across TCP packets in an arbitrary way.

In addition, the result of decompressing this file isn't logically a set of
packets, but instead is a (potentially huge) memory buffer holding the
decompressed/original file data. This memory buffer will often need to be
implemented as a linked list of contiguous memory regions/buffers; ideally
these regions/buffers would have a size on the order of 16K to 64K. So the
preferred API would be one which calls an asynchronous decompress function
with a batch of packets and a pointer to one byte past where the last
decompressed byte for this file was written. The batch of input packets
also needs a way to specify where the file data starts and ends in each
packet (via a byte offset from, say, the TCP header start plus a count of
the file bytes in this packet) AND also the byte offsets and lengths of the
chunk headers. Note that, while not common, one could have multiple chunk
headers in the same packet. If one has a large number of chunk headers in
the same packet (say > 8 or 16), then it is probably best to detect it,
abort the connection and call it an attack.
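To pin these requirements down, here is a purely hypothetical, compile-only
C sketch. The type and function names are invented for illustration - they
are not the existing ODP compression API - but they express the per-packet
metadata, the output buffer list and the asynchronous batch call described
above:

    /* HYPOTHETICAL sketch only - not the existing ODP API. */
    #include <stdint.h>

    #define MAX_CHUNK_HDRS_PER_PKT 8   /* beyond this, treat as an attack */

    /* One inserted HTTP chunk header inside a packet's TCP payload. */
    struct chunk_hdr_loc {
        uint16_t offset;               /* bytes from the TCP header start */
        uint16_t length;               /* length of the chunk header      */
    };

    /* Per-packet input descriptor for one decompress request. */
    struct decomp_pkt_desc {
        const uint8_t *pkt;            /* start of the TCP header         */
        uint32_t file_data_offset;     /* where the compressed file bytes */
        uint32_t file_data_len;        /*   start and how many there are  */
        uint32_t num_chunk_hdrs;       /* chunk headers to skip over      */
        struct chunk_hdr_loc chunk_hdrs[MAX_CHUNK_HDRS_PER_PKT];
    };

    /* Output side: a linked list of contiguous regions (ideally 16K-64K
     * each) that together hold the decompressed file. */
    struct out_buf {
        uint8_t *base;
        uint32_t capacity;
        uint32_t used;                 /* one past the last byte written  */
        struct out_buf *next;
    };

    /* Asynchronous decompress of a batch of packets belonging to one HTTP
     * file transfer; completion would be signalled through whatever event
     * mechanism the implementation already provides. */
    int decomp_submit_batch(void *session,
                            const struct decomp_pkt_desc *pkts,
                            uint32_t num_pkts,
                            struct out_buf *out_tail);

The output buffer list half of this sketch would apply essentially
unchanged in the compression direction discussed next.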
The preferred compression API is less clear cut. First consider the output
part of the API. One eventually needs to send the compressed file data in a
series of TCP/HTTP packets. But if one supplies the compression engine with
a set of "empty" packets, then who adds the various protocol headers? The
L2 and L3 headers aren't hard, but adding a TCP header is a lot harder
because of sequence numbers, flags, etc. Probably harder still is adding
the variable length HTTP request/response header before the compressed data
AND the chunk headers in the middle of the compressed data! So it is
probably better to have the compressed data appended to a linked list of
contiguous memory buffers (just like the output for the decompress case)
and then let the ODP application add the various HTTP headers and pass the
result to some ODP compatible TCP stack.

*******************************************************

Thanx Barry.