Here is my promised document on "Lossless Data Compression Requirements and Use Cases", in response to Shally's request for compression Use cases. Mostly a complete first draft, with just a couple of still "To Be Done" parts. I seem to recall some constraints on the mailing lists of the use of email attachments and rich document formats, so I chose to use a simple text file format - despite the fact that it looks a little ugly.
**********************************************************************
Lossless Data Compression Requirements and Use Cases
Sep 8/2017
Barry Spinney

This document describes the Use Cases that would justify a HW based compression/decompression engine on Mellanox SoC chips. An API for such a compression engine is currently being considered for addition to the OpenDataPlane (ODP) standard. For these Use Cases some basic requirements are also listed.

This document considers a number of uses of data compression in computer systems - and especially those uses that are networking related. For a number of such uses, we note whether that usage would justify the addition of compression/decompression HW for Mellanox chips, as well as how this use case would affect the ODP API.

This document is organized as a taxonomy or flow chart of the compression/decompression categories.

1 Low Speed Uses:
-----------------
Note also that we ignore many uses of compression where it is usually done offline (often in SW) OR where it happens in real-time but at low speeds. Here we define low speeds as those use cases that can be handled fairly reasonably in SW and so don't justify special compression/decompression HW. Today this usually means speeds less than 1 Gbps.

2 High Speed Uses:
------------------

2.1 Lossy versus Lossless:
--------------------------
First we discuss the two major compression categories: lossy compression use cases and lossless compression use cases.

2.1.1 Lossy Compression:
------------------------
Lossy compression/decompression is an important use of compression in the Internet. In particular lossy audio and video compression is widespread. However, the algorithms used here - like those described in the MPEG standards - are both very different from and far more complex than the algorithms used for lossless data compression. Often the compression can be done off-line rather than in real time - though decompression generally needs to be done in real time.
Since the current ODP compression API is designed for lossless compression, the lossy use cases are ignored/irrelevant.

2.1.2 Lossless Compression:
---------------------------
Lossless data compression generally uses a small set of simpler algorithms - which are more amenable to HW implementation. This is the only category considered for the current ODP HW.

2.1.2.1 Offline vs Real-time:
-----------------------------
Lossless data compression/decompression can be done "offline" or in "real-time". By real-time we mean uses where the data to be compressed or decompressed is arriving over a high speed network port and needs to be dealt with at network speeds and with low latency.

<TBD better definition of offline/real-time>

2.1.2.2 Offline:
----------------
An important offline use case could be whole file compression. However this is currently often done by the OS (somewhat transparently) and implemented in Kernel SW.

Another example is compression/decompression of large files/archives for download (e.g. a compressed ISO file or a compressed tar file). Unfortunately, while gzip and zip are still used, it is more common today to instead use algorithms like bzip2 and xz. But these newer algorithms are much more difficult to implement in HW because of the massive history buffers that are used.

While these use cases cannot justify the addition of compression/decompression HW, if such HW exists for other uses, then it might be nice if the HW could handle this case as well. However this should be considered a low priority.

2.1.2.3 Real-time:
------------------
Real-time use cases occur as a result of some (standardized?) network protocol.
These protocols can be divided into:
a) those that compress at the packet level - like IPComp and IPsec
b) those that compress at the disk sector/block level
c) those that compress at the file level

2.1.2.3.1 Packet Level Compression:
-----------------------------------
Packet level compression/decompression - on high speed networks - is not that common, with perhaps the exception being the IPComp protocol (RFC 3173) - especially in conjunction with the IPsec protocols (several dozen RFCs), which is the only use case discussed.

2.1.2.3.1.1 IPComp/IPsec:
-------------------------
While IPComp in theory could be used in a fairly general way, in practice today it is almost always used as the "compression" part of the IPsec protocol. So we ignore all other uses of IPComp.

While low speed IPsec tunnels might use and benefit from IPComp compression/decompression, because they are low speed the compression is going to be done in SW and has no relevance for an ODP HW (i.e. high speed) compression/decompression engine.

Conversely, compression on high speed IPsec tunnels is less common, BUT one could argue that this is primarily because of the lack of HW compression! So we consider HW compression to be an enabler for high speed IPsec compression. Of course, since this use case isn't solving a current common usage, it may not by itself justify the addition of this HW, BUT if such HW already exists (i.e. is justified by some other use case), then this use case becomes fairly relevant.

2.1.2.3.2 Disk Sector/Block Compression:
----------------------------------------
This use case is often not specified in the disk block access protocols themselves, but instead is done as a side effect of reading and writing disk blocks by protocols like iSCSI, NFS, NVMe-oF, etc.
<TBD I wasn't able to reach the Mellanox architect knowledgeable about this use case, so this section is still To Be Done>

2.1.2.3.3 File Level Compression:
---------------------------------
This use case is one of the main use cases that can justify the addition of a compression HW engine.

2.1.2.3.3.1 SSL/TLS:
--------------------
While the SSL/TLS protocol includes the capability to do compression/decompression as part of its operation - assuming both parties agree that they want to do it - it turns out to be almost never used. Since this feature is so rarely used, the upcoming new version of the TLS protocol spec has actually removed this capability from the protocol.

Hence, while on paper, this looks like a promising use case, not only does it NOT justify adding compression HW, it is NOT worth considering this use case even if such HW is already there! Hence we completely ignore this case.

2.1.2.3.3.2 HTTP:
-----------------
This is the MOST important use case - both for justifying the addition of compression/decompression HW and for actually using it on high speed networks.

Note that the current IANA HTTP Transfer Coding Registry lists the following compression formats (ignoring the old deprecated aliases):
a) compress - UNIX "compress" data format
b) deflate - "deflate" compressed data inside of the "zlib" data format
c) gzip - GZIP file format

It is IMPORTANT to understand how the HTTP protocol works and how compression fits into this protocol. Some of the details below have been simplified to make it easier to understand, hence the description below should not be considered accurate or complete, but the concepts and issues raised should be correct.

First of all HTTP is not a packet based protocol but a byte-stream based protocol running over TCP. What this means is that HTTP request headers and response headers, along with data stream headers like chunked data headers, are not necessarily related to IP packet boundaries.
Instead HTTP sends a byte stream to TCP which is then free to carve up these bytes into TCP/IP packet payloads in any way it wishes.

Next, HTTP is not compressing/decompressing packet payloads, but instead is working on whole files being transferred (technically it operates on "resources" as indicated by a Uniform Resource Identifier (URI), but the distinction from a compress/decompress perspective is moot). Although HTTP has the concept of many "methods" operating on these resources, in practice the vast majority of requests involve methods that simply get or put "files", and these files can be (and often are) compressed - at the HTTP level.

Here we ignore files that HTTP considers to be not compressed, but where the actual file format uses some compression algorithm. Examples are JPEG files (typically using lossy compression) and sometimes PNG files (using lossless compression). Since compressing an already compressed file isn't useful, one doesn't want to do HTTP compression on e.g. JPEG files.

The HTTP side receiving files (compressed or not) needs some way to know when that file transfer request is done. With HTTP/1.1, Web Servers will usually want to transfer multiple files over a single TCP connection, so the "old way" of knowing that the transfer is done, by seeing the TCP connection terminated, isn't sufficient. Instead HTTP either uses a Content-Length header field OR uses the chunked Transfer-Encoding method.

In particular chunked encoding is widely used. This method modifies the file body being transferred (compressed or not) by dividing it into a series of chunks, where each chunk begins with a newly inserted chunk header. A chunk consists of a chunk-size (variable number of hex digits) followed by an optional chunk-extension followed by a CRLF (CRLF means the two characters of Carriage Return followed immediately by Line Feed). Next come the actual file bytes - whose length is indicated by the chunk-size - followed by CRLF.
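To make the chunk layout just described concrete, here is a minimal SW sketch in Python. It is only an illustration and not a proposed ODP API: the names ChunkedGzipDecoder and to_chunked are hypothetical, it assumes the file was compressed with gzip, and it ignores chunk trailers and error recovery. It accepts the HTTP body in arbitrary slices (standing in for TCP segments), strips the chunk headers, and passes only the file bytes to the decompressor.

```python
import gzip
import zlib


class ChunkedGzipDecoder:
    """Hypothetical helper: strips HTTP/1.1 chunk framing, then gunzips."""

    def __init__(self):
        self._buf = bytearray()   # received bytes not yet parsed
        self._remaining = 0       # file bytes still expected in this chunk
        self._state = "size"      # "size" -> "data" -> "crlf" -> ... -> "done"
        # wbits = 16 + MAX_WBITS tells zlib to expect the gzip wrapper.
        self._inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)

    @property
    def finished(self):
        return self._state == "done"

    def feed(self, segment):
        """Consume one arbitrary slice of the body; return decompressed bytes."""
        self._buf += segment
        out = bytearray()
        while self._buf and self._state != "done":
            if self._state == "size":
                eol = self._buf.find(b"\r\n")
                if eol < 0:
                    break  # chunk header is split across slices; wait for more
                line = bytes(self._buf[:eol])
                del self._buf[:eol + 2]
                # Drop any optional chunk-extension after ';'.
                self._remaining = int(line.split(b";", 1)[0].strip(), 16)
                if self._remaining == 0:
                    # Zero-size chunk marks the end (trailers are ignored here).
                    out += self._inflater.flush()
                    self._state = "done"
                else:
                    self._state = "data"
            elif self._state == "data":
                take = min(self._remaining, len(self._buf))
                out += self._inflater.decompress(bytes(self._buf[:take]))
                del self._buf[:take]
                self._remaining -= take
                if self._remaining == 0:
                    self._state = "crlf"
            else:  # "crlf": the CRLF that follows the chunk's file bytes
                if len(self._buf) < 2:
                    break
                if self._buf[:2] != b"\r\n":
                    raise ValueError("missing CRLF after chunk data")
                del self._buf[:2]
                self._state = "size"
        return bytes(out)


def to_chunked(body, chunk_size):
    """Hypothetical helper: wrap a body in chunked Transfer-Encoding framing."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        piece = body[i:i + chunk_size]
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


# Usage: gzip a "file", chunk it, then deliver it in 11-byte slices so that
# chunk headers and file bytes straddle slice boundaries.
original = b"hello, chunked world\n" * 200
wire = to_chunked(gzip.compress(original), chunk_size=16)
decoder = ChunkedGzipDecoder()
restored = b"".join(decoder.feed(wire[i:i + 11]) for i in range(0, len(wire), 11))
```

Feeding the framed bytes straight into the decompressor instead would fail, since the chunk-size digits and CRLFs would be interpreted as compressed data.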
This pattern is then repeated until the entire file has been transferred.

So if this HTTP payload is fed directly into a decompression engine, bad things will happen because of the insertion of these chunk headers. So the decompression API needs to have a way to indicate where the file data starts in the packet and also where the chunked headers are so that they can be skipped. Note that the file data will often not start immediately after a packet's TCP header, because every file transfer has a variable length HTTP request header or HTTP response header preceding its first byte. Also the chunked headers could be split across TCP packets in an arbitrary way.

In addition the result of the decompression of this file isn't logically a set of packets, but instead is a (potentially huge) memory buffer holding the decompressed/original file data. By the way this memory buffer often will need to actually be implemented as a linked list of contiguous memory regions/buffers. Ideally these regions/buffers would have a size of the order of 16K to 64K.

So the preferred API would be one which calls an asynchronous decompress function with a batch of packets and a pointer to one byte past where the last decompressed byte for this file was written. The batch of input packets also needs a way to specify where the file data starts and ends in each packet (via a byte offset from say the TCP header start plus a count of the file bytes in this packet) AND also the byte offsets and lengths of the chunked headers. Note that, while not common, one could have multiple chunked headers in the same packet. Ideally, if one has a large number of chunked headers in the same packet (say > 8 or 16), then it is probably best to detect it, abort the connection and call it an attack.

The preferred compression API is less clear cut. First consider the output part of the API. One eventually needs to send the compressed file data in a series of TCP/HTTP packets.
But if one supplies the compression engine with a set of "empty" packets, then who adds the various protocol headers? The L2 and L3 headers aren't hard, but adding a TCP header is a lot harder because of sequence numbers, flags, etc. But probably harder still is adding the variable length HTTP request/response header before the compressed data AND the chunk headers in the middle of the compressed data!

So it is probably better to have the compressed data be appended to a linked list of contiguous memory buffers (just like the output for the decompress case) and then let the ODP application add the various HTTP headers and pass the result to some ODP compatible TCP stack.

*******************************************************

Thanx Barry.