[lng-odp] [COMPRESSION,RFCv1]

Barry Spinney Fri, 08 Sep 2017 15:09:05 -0700

Here is my promised document on "Lossless Data Compression Requirements and
Use Cases", in response to Shally's request for compression use cases.  It is
mostly a complete first draft, with just a couple of parts still "To Be Done".
I seem to recall some constraints on the mailing lists regarding the use of
email attachments and rich document formats, so I chose to use a simple text
file format - despite the fact that it looks a little ugly.


**********************************************************************


                          Lossless Data Compression
                         Requirements and Use Cases

   Sep 8/2017                                                   Barry Spinney


    This document describes the Use Cases that would justify a HW based
compression/decompression engine on Mellanox SoC chips.  An API for such a
compression engine is currently being considered for addition to the
OpenDataPlane (ODP) standard.  For these Use Cases some basic requirements are
also listed.

    This document considers a number of uses of data compression in computer
systems - and especially those uses that are networking related.  For a number
of such uses, we note whether that usage would justify the addition of
compression/decompression HW for Mellanox chips, as well as how this use case
would affect the ODP API.  This document is organized as a taxonomy or flow
chart of the compression/decompression categories.


1 Low Speed Uses:
-----------------
    Note that we ignore the many uses of compression that are usually done
offline (often in SW) or that happen in real-time but at low speeds.  Here we
define low speeds as those that can be handled reasonably well in SW and so
don't justify special compression/decompression HW.  Today this usually means
speeds less than 1 Gbps.


2 High Speed Uses:
------------------


2.1 Lossy versus Lossless:
--------------------------
    First we discuss the two major compression categories of lossy compression
use cases and lossless compression use cases.


2.1.1 Lossy Compression:
------------------------
    Lossy compression/decompression is an important use of compression in
the Internet.  In particular, lossy audio and video compression is widespread.
However, the algorithms used here - like those described in the MPEG
standards - are both very different from and far more complex than the
algorithms used for lossless data compression.  Often the compression can be
done offline rather than in real time - though decompression generally needs
to be done in real time.

    Since the current ODP compression API is designed for lossless compression,
the lossy use cases are ignored/irrelevant.


2.1.2 Lossless Compression:
---------------------------
    Lossless data compression generally uses a small set of simpler algorithms -
which are more amenable to HW implementation.  This is the only category
considered for the current ODP HW.


2.1.2.1 Offline vs Real-time:
-----------------------------
    Lossless data compression/decompression can be done "offline" or in
"real-time".  By real-time we mean uses where the data to be compressed or
decompressed is arriving over a high speed network port and needs to be dealt
with at network speeds and with low latency.
   <TBD better definition of offline/real-time>


2.1.2.2 Offline:
----------------
    An important offline use case could be whole file compression.  However this
is currently often done by the OS (somewhat transparently) and implemented in
Kernel SW.

    Another example is compression/decompression of large files/archives
for download (e.g. a compressed ISO file or a compressed tar file).
Unfortunately, while gzip and zip are still used, it is more common today
to instead use algorithms like bzip2 and xz.  But these newer algorithms are
much more difficult to implement in HW because of the massive history buffers
that are used.

    While these use cases cannot justify the addition of compression/
decompression HW, if such HW exists for other uses, then it might be nice if
the HW could handle this case as well.  However this should be
considered a low priority.


2.1.2.3 Real-time:
------------------
    Real-time use cases occur as a result of some (standardized?) network
protocol.  These protocols can be divided into:
    a) those that compress at the packet level - like IPComp and IPsec
    b) those that compress at the disk sector/block level
    c) those that compress at the file level


2.1.2.3.1 Packet Level Compression:
-----------------------------------
    Packet level compression/decompression - on high speed networks - is not
that common.  The main exception is perhaps the IPComp protocol (RFC 3173),
especially in conjunction with the IPsec protocols (several dozen RFCs), and
this is the only packet level use case discussed here.


2.1.2.3.1.1 IPComp/IPsec:
-------------------------
    While IPComp in theory could be used in a fairly general way, in practice
today, it is almost always used as the "compression" part of the IPsec protocol.
So we ignore all other uses of IPComp.

    While low speed IPsec tunnels might use and benefit from IPComp compression/
decompression - because it is low speed it is going to be done in SW and has no
relevance for an ODP HW (i.e. high speed) compression/decompression engine.

    Conversely, compression on high speed IPsec tunnels is less common, BUT
one could argue that this is primarily because of the lack of HW compression!
So we consider HW compression to be an enabler for high speed IPsec compression.
Of course since this use case isn't solving a current common usage, this use
case may not justify the addition of this HW, BUT if such HW already exists
(i.e. is justified by some other use case), then this use case becomes fairly
relevant.


2.1.2.3.2 Disk Sector/Block Compression:
----------------------------------------
    This use case is often not specified in the disk block access protocols
themselves, but instead is done as a side effect of reading and writing
disk blocks by protocols like iSCSI, NFS, NVMe-oF, etc.
    <TBD I wasn't able to reach the Mellanox architect knowledgeable about this
use case, so this section is still To Be Done>


2.1.2.3.3 File Level Compression:
---------------------------------
    This use case is one of the main use cases that can justify the addition
of a compression HW engine.


2.1.2.3.3.1 SSL/TLS:
--------------------
    While the SSL/TLS protocol includes the capability to do compression/
decompression as part of its operation - assuming both parties agree that
they want to do it - it turns out to be almost never used.  Since this feature
is so rarely used, the upcoming new version of the TLS protocol spec has
actually removed this capability from the protocol.

    Hence, while on paper, this looks like a promising use case, not only does
it NOT justify adding compression HW, it is NOT worth considering this use
case even if such HW is already there!  Hence we completely ignore this case.


2.1.2.3.3.2 HTTP:
-----------------
    This is the MOST important use case - both for justifying the addition of
compression/decompression HW and for actually using it on high speed networks.

    Note that the current IANA HTTP Transfer Coding Registry lists the
following compression formats (ignoring the old deprecated aliases):
    a) compress - UNIX "compress" data format
    b) deflate - "deflate" compressed data inside of the "zlib" data format
    c) gzip - GZIP file format
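
    As a rough software illustration of how two of these codings map onto a
single deflate decoder, the sketch below uses Python's standard zlib module,
where the "wbits" argument selects the zlib or gzip wrapper.  The names
WBITS_FOR_CODING, make_decompressor and decode are our own, invented for this
example, and "compress" (LZW) is left out because the Python standard library
has no decoder for it.

```python
import gzip
import zlib

# Coding token -> zlib "wbits" value (Python zlib convention):
#   MAX_WBITS       expects a zlib header and Adler-32 trailer
#   MAX_WBITS | 16  expects a gzip header and CRC-32 trailer
WBITS_FOR_CODING = {
    "deflate": zlib.MAX_WBITS,
    "gzip": zlib.MAX_WBITS | 16,
}


def make_decompressor(coding):
    """Return a streaming decompressor for an HTTP coding token."""
    try:
        wbits = WBITS_FOR_CODING[coding]
    except KeyError:
        # "compress" (LZW) and unknown tokens are not handled in this sketch.
        raise ValueError("unsupported coding: " + coding) from None
    return zlib.decompressobj(wbits=wbits)


def decode(coding, body):
    """Decompress a complete body that uses the given coding."""
    stream = make_decompressor(coding)
    return stream.decompress(body) + stream.flush()
```

    The point of the sketch is only that "deflate" and "gzip" share the same
underlying algorithm and differ in their framing, so one engine with a
selectable wrapper could plausibly serve both.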

    It is IMPORTANT to understand how the HTTP protocol works and how
compression fits into this protocol.  Some of the details below have been
simplified to make it easier to understand, hence the description below should
not be considered accurate or complete, but the concepts and issues raised
should be correct.

    First of all HTTP is not a packet based protocol but a byte-stream based
protocol running over TCP.  What this means is that HTTP request headers and
response headers, along with data stream headers like chunked data headers,
are not necessarily related to IP packet boundaries.  Instead HTTP sends a
byte stream to TCP which is then free to carve up these bytes into TCP/IP
packet payloads in any way it wishes.

    Next, HTTP is not compressing/decompressing pkt payloads, but instead is
working on whole files being transferred (technically it operates on
"resources" as indicated by a Uniform Resource Identifier (URI), but the
distinction from a compress/decompress perspective is moot).  Although HTTP
has the concept of many "methods" operating on these resources, in practice the
vast majority of requests involve methods that simply get or put "files",
and these files can be (and often are) compressed - at the HTTP level.  Here
we ignore files that HTTP considers to be not compressed, but the actual
file format uses some compression algorithm.  Examples are JPEG files
(typically using lossy compression) and sometimes PNG files (using lossless
compression).  Since compressing an already compressed file isn't useful,
one doesn't want to do HTTP compression on e.g. JPEG files.
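
    The point about already compressed files can be seen with a small
experiment.  The sketch below trial-compresses a sample and reports whether it
shrinks by a useful amount.  The function name worth_compressing and the 10%
threshold are assumptions made for this example, and pseudo-random bytes stand
in for the body of a JPEG file.  Real servers usually decide by media type
rather than by trial compression.

```python
import random
import zlib


def worth_compressing(sample, min_saving=0.10):
    """Trial-compress a sample; True if it shrinks by at least min_saving."""
    if not sample:
        return False
    packed = zlib.compress(sample)
    return len(packed) <= len(sample) * (1.0 - min_saving)


# Repetitive markup compresses very well.
text = b"<p>Lossless compression works well on markup.</p>\n" * 200

# Pseudo-random bytes behave like already-compressed data (assumed stand-in).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))
```

    Here worth_compressing(text) is true while worth_compressing(noise) is
false, since deflate cannot shrink data that has no remaining redundancy.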

    The HTTP side receiving files (compressed or not) needs some way to know
when that file transfer request is done.  With HTTP/1.1, Web Servers will
usually want to transfer multiple files over a single TCP connection, so the
"old way" of knowing that the transfer is done by seeing the TCP connection
terminated, isn't sufficient.  Instead HTTP either uses a Content-Length header
field OR uses the chunked Transfer-Encoding method.

    In particular chunked encoding is widely used.  This method modifies the
file body being transferred (compressed or not) by dividing it into a series
of chunks, each of which starts with a newly inserted chunk header.  The
chunk header consists of a chunk-size (variable number of hex digits)
followed by an optional chunk-extension followed by a CRLF (CRLF means the
two characters of Carriage Return followed immediately by Line Feed).  Next
come the actual file bytes - whose length is indicated by the chunk-size -
followed by CRLF.  This pattern is then repeated until the entire file has
been transferred, at which point a final chunk with a chunk-size of zero marks
the end of the body.
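
    The chunk layout described above can be sketched in a few lines of Python.
This is a simplified illustration for a complete, well-formed body held in
memory: it emits no chunk-extensions, ignores trailer fields, and ends the body
with the zero-size chunk required by the HTTP/1.1 specification.  The function
names encode_chunked and decode_chunked are our own.

```python
def encode_chunked(body, chunk_size=4096):
    """Wrap body in HTTP/1.1 chunked framing (no chunk extensions)."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        piece = body[start:start + chunk_size]
        out += b"%x\r\n" % len(piece)   # chunk-size in hex, then CRLF
        out += piece + b"\r\n"          # chunk data, then CRLF
    out += b"0\r\n\r\n"                 # zero-size last chunk, no trailers
    return bytes(out)


def decode_chunked(framed):
    """Strip chunk framing from a complete, well-formed chunked body."""
    body = bytearray()
    pos = 0
    while True:
        eol = framed.index(b"\r\n", pos)
        # Anything after ';' on the header line is a chunk-extension.
        size = int(framed[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)          # last chunk reached
        body += framed[pos:pos + size]
        pos += size + 2                 # skip data and its trailing CRLF
```

    For example, encode_chunked(b"hello world", 4) produces three chunks of
sizes 4, 4 and 3 followed by the zero-size chunk, and decode_chunked() of that
result returns the original eleven bytes.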

    So if this HTTP payload is fed directly into a decompression engine, the
inserted chunk headers will be treated as compressed data and the engine will
either fail or produce corrupt output.  So the
decompression API needs to have a way to indicate where the file data starts
in the packet and also where the chunked headers are so that they can be
skipped.  Note that the file data will often not start immediately after
a packet's TCP header, because every file transfer has a variable length
HTTP request header or HTTP response header preceding its first byte.  Also
the chunked headers could be split across TCP packets in an arbitrary way.
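
    To make the "split across TCP packets" problem concrete, here is a minimal
sketch of a small state machine that is fed one TCP payload at a time and
reports which byte ranges of that payload are file data.  It assumes the
payloads arrive in order and that the HTTP request/response header has already
been stripped.  It ignores trailer fields after the last chunk.  The class name
ChunkSpanFinder and its fields are hypothetical, not part of any ODP API.

```python
class ChunkSpanFinder:
    """Tracks chunked framing across TCP segments of one HTTP message body."""

    def __init__(self):
        self.header = bytearray()  # partial chunk header carried over
        self.data_left = 0         # file bytes still owed by current chunk
        self.crlf_left = 0         # bytes of the post-data CRLF still to skip
        self.done = False          # True once the zero-size chunk is seen

    def feed(self, segment):
        """Return [(offset, length), ...] of file data inside this segment."""
        spans = []
        pos = 0
        while pos < len(segment) and not self.done:
            if self.data_left:
                # Inside chunk data: everything up to chunk end is file data.
                n = min(self.data_left, len(segment) - pos)
                spans.append((pos, n))
                pos += n
                self.data_left -= n
                if self.data_left == 0:
                    self.crlf_left = 2
            elif self.crlf_left:
                # Skip the CRLF that follows chunk data (may be split).
                n = min(self.crlf_left, len(segment) - pos)
                pos += n
                self.crlf_left -= n
            else:
                # Collect header bytes up to and including the LF.
                nl = segment.find(b"\n", pos)
                if nl < 0:
                    self.header += segment[pos:]
                    pos = len(segment)
                    continue
                self.header += segment[pos:nl + 1]
                pos = nl + 1
                size_field = bytes(self.header).split(b";", 1)[0].strip()
                self.header.clear()
                size = int(size_field, 16)  # ValueError if malformed
                if size == 0:
                    self.done = True
                else:
                    self.data_left = size
        return spans
```

    The (offset, length) pairs returned by feed() are the kind of per-packet
information that an application would have to hand to a decompression engine,
regardless of how the framing happens to fall across packet boundaries.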

    In addition the result of the decompression of this file isn't logically
a set of packets, but instead is a (potentially huge) memory buffer holding
the decompressed/original file data.  By the way this memory buffer often
will need to actually be implemented as a linked list of contiguous memory
regions/buffers.  Ideally these regions/buffers would have a size of the
order of 16K to 64K.  So the preferred API would be one which calls an
asynchronous decompress function with a batch of pkts and a pointer to one byte
past where the last decompressed byte for this file was written.  The batch of
input pkts also needs a way to specify where the file data starts and ends
in each packet (via a byte offset from say the TCP header start plus a
count of the file bytes in this packet) AND also the byte offsets and lengths
of the chunked headers.  Note that, while not common, one could have multiple
chunked headers in the same packet.  If one has a large number of
chunked headers in the same packet (say > 8 or 16), then it is probably best
to detect it, abort the connection and call it an attack.
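
    A software mock-up of the decompression interface outlined above might
look like the following.  This is only a sketch under several assumptions: it
is synchronous rather than asynchronous, zlib stands in for the HW engine, the
64K region size and the limit of 16 chunk headers per packet are picked from
the ranges mentioned above, and all names (PktDesc, OutputChain,
decompress_batch) are hypothetical rather than proposed ODP identifiers.

```python
import zlib
from dataclasses import dataclass, field

REGION_SIZE = 64 * 1024      # assumed region size (upper end of 16K-64K)
MAX_HEADERS_PER_PKT = 16     # assumed attack threshold


@dataclass
class PktDesc:
    payload: bytes           # TCP payload of one packet
    data_spans: list         # [(offset, length)] of file bytes in payload
    chunk_headers: int = 0   # number of chunk headers found in this packet


@dataclass
class OutputChain:
    """Decompressed output kept as a list of bounded contiguous regions."""
    regions: list = field(default_factory=list)

    def append(self, data):
        """Append data, opening a new region whenever the last one is full."""
        view = memoryview(data)
        while len(view):
            if not self.regions or len(self.regions[-1]) == REGION_SIZE:
                self.regions.append(bytearray())
            room = REGION_SIZE - len(self.regions[-1])
            self.regions[-1] += view[:room]
            view = view[room:]


def decompress_batch(stream, pkts, out):
    """Feed only the file bytes of each packet to stream; append output to out."""
    for pkt in pkts:
        if pkt.chunk_headers > MAX_HEADERS_PER_PKT:
            raise ValueError("too many chunk headers in one packet")
        for off, length in pkt.data_spans:
            out.append(stream.decompress(pkt.payload[off:off + length]))
```

    The caller would create one zlib.decompressobj() and one OutputChain per
file, call decompress_batch() for each batch of packets, and finish with
out.append(stream.flush()).  The tail of the last region plays the role of
the "one byte past the last decompressed byte" pointer.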

    The preferred compression API is less clear cut.  First consider the
output part of the API.  One eventually needs to send the compressed file
data in a series of TCP/HTTP packets.  But if one supplies the compression
engine with a set of "empty" packets, then who adds the various protocol
headers?  The L2 and L3 headers aren't hard, but adding a TCP header is
a lot harder because of sequence numbers, flags, etc.  But probably harder
still is adding the variable length HTTP request/response header before the
compressed data AND the chunk headers in the middle of the compressed data!
So it is probably better to have the compressed data be appended to a linked
list of contiguous memory buffers (just like the output for the decompress
case) and then let the ODP application add the various HTTP headers and pass
the result to some ODP compatible TCP stack.
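
    The division of labour suggested above can be sketched as two separate
steps: the engine only produces compressed bytes in a list of buffers, and the
application adds the HTTP framing afterwards.  In this sketch zlib again stands
in for the HW engine, gzip framing is an arbitrary choice, the 16K buffer size
is an assumption, and the names compress_to_buffers and frame_as_chunks are
hypothetical.

```python
import zlib


def compress_to_buffers(data, buf_size=16 * 1024):
    """Engine step: compress data (gzip format) into buffers of <= buf_size."""
    comp = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)
    packed = comp.compress(data) + comp.flush()
    return [packed[i:i + buf_size] for i in range(0, len(packed), buf_size)]


def frame_as_chunks(buffers):
    """Application step: wrap each compressed buffer in chunked framing."""
    out = bytearray()
    for buf in buffers:
        out += b"%x\r\n" % len(buf) + buf + b"\r\n"
    out += b"0\r\n\r\n"      # zero-size last chunk, no trailers
    return bytes(out)
```

    The application would then prepend its HTTP request/response header to the
output of frame_as_chunks() and hand the byte stream to the TCP stack, which
keeps all knowledge of sequence numbers and packet boundaries out of the
compression engine.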
 
*******************************************************

Thanx Barry.
