https://bz.apache.org/bugzilla/show_bug.cgi?id=69985

            Bug ID: 69985
           Summary: mod_mime_magic should not uncompress data
           Product: Apache httpd-2
           Version: 2.4.66
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: mod_mime_magic
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

For any file that is not assigned a Content-Type elsewhere, mod_mime_magic
checks for several sets of magic bytes that may identify compressed data. For
files that meet both conditions, it forks and execs gzip, passing it a small
block of data, so that it can then guess the Content-Type of the uncompressed
data.
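
As a rough illustration of the magic-byte check described above, the following
Python sketch tests whether a buffer starts with the gzip signature. It is not
the module's C code: the name `looks_gzip` is hypothetical, and only the gzip
signature (0x1f 0x8b, per RFC 1952) is shown, whereas the module checks several
signatures.

```python
import gzip

# A gzip member begins with ID1=0x1f, ID2=0x8b (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzip(head: bytes) -> bool:
    """Return True if the buffer begins with the gzip signature."""
    return head.startswith(GZIP_MAGIC)


if __name__ == "__main__":
    print(looks_gzip(gzip.compress(b"hello")))
    print(looks_gzip(b"plain text"))
```

A check like this only inspects the first bytes of the file, so any file with
those leading bytes would be treated as compressed.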

Please remove this functionality because:

1: It is not RFC compliant

Standards documents and references consistently recommend against setting
Content-Encoding for files that are already compressed.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Encoding
- "If the original media is already encoded (e.g., as a .zip file), this
information is not included in the Content-Encoding header."

https://www.rfc-editor.org/rfc/rfc9110.html#name-content-encoding - "If the
media type includes an inherent encoding, such as a data format that is always
compressed, then that encoding would not be restated in Content-Encoding even
if it happens to be the same algorithm as one of the content codings"

2: It is unpredictable (or at least difficult to diagnose)

This functionality provides Content-Type and Content-Encoding for compressed
files that don't have a file extension that maps to a MIME type. One case that
might be common is serving compressed software archives. In this case, a
file named "my-soft-0.1.2.crate" will get Content-Type
"application/x-troff-man" (apparently because it has a ".2" extension, matching
"application/x-troff-man  man 1 2 3 4 5 6 7 8" in /etc/mime.types), but a file
named "my-soft-1.0.0.crate" will not match anything in mime.types, so
mod_mime_magic gives it Content-Type application/x-tar and Content-Encoding
x-gzip.
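
The difference between the two file names can be sketched as follows. This is a
simplified Python model, not mod_mime's implementation: it assumes that every
dot-separated component after the first is treated as an extension and that the
last matching one decides the type. The table holds only the mime.types line
quoted above, and `load_mime_types` and `guess_type` are hypothetical names.

```python
# The single mime.types line quoted in this report; a real file has many more.
MIME_TYPES = "application/x-troff-man  man 1 2 3 4 5 6 7 8\n"


def load_mime_types(text):
    """Map each extension to its media type."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        for ext in fields[1:]:
            table[ext.lower()] = fields[0]
    return table


def guess_type(filename, table):
    """Assumed model: each component after the first dot is an extension,
    and the last one found in the table wins."""
    content_type = None
    for ext in filename.split(".")[1:]:
        content_type = table.get(ext.lower(), content_type)
    return content_type


if __name__ == "__main__":
    table = load_mime_types(MIME_TYPES)
    for name in ("my-soft-0.1.2.crate", "my-soft-1.0.0.crate"):
        print(name, guess_type(name, table))
```

Under this model the first name matches on ".1" and ".2", while the second
matches nothing and would fall through to mod_mime_magic.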

When this causes problems, it can be difficult for users to understand why
those problems only affect some files. Which leads us to:

3: It is counter-productive

Adding "Content-Encoding: x-gzip" will cause *most* clients to uncompress the
file before writing it to storage. For sites that intend to serve software
archives, that behavior will cause the resulting file on disk to have a
different size and checksum than expected. Processes that verify checksums or
signatures of the files they download will be unable to work with files that
end in ".9.crate" or ".0.crate", for example.

The current behavior makes it difficult to serve static compressed files.
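
The size and checksum mismatch can be reproduced with a short Python sketch.
The payload is a stand-in for a real archive, and `digest` is a hypothetical
helper; `served` models the bytes on the server, and `stored` models what a
client writes to disk after honoring Content-Encoding.

```python
import gzip
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the uncompressed contents of a software archive.
payload = b"example archive contents\n" * 100

served = gzip.compress(payload)   # the compressed file as published
stored = gzip.decompress(served)  # what a decoding client saves

published = digest(served)        # checksum the publisher announces

if __name__ == "__main__":
    print("sizes match:    ", len(stored) == len(served))
    print("checksums match:", digest(stored) == published)
```

Both comparisons fail, because the client has saved the decoded bytes rather
than the file that was published.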

4: It impacts performance negatively

The overhead of forking, exec'ing gzip, and uncompressing a block of data is
fairly substantial.

5: It is unsafe

All of the previous points consider the impact of this behavior on trusted
data. The problems are all worse for sites that serve untrusted data, which
might be provided by malicious users. Such users might provide files whose
names do not match a known MIME type, which contain the magic bytes, and which
are intended to exploit the local gzip binary. gzip could be vulnerable to
compression bombs or other exploits.

Passing untrusted data to an external tool for processing by default is wildly
unsafe.

I have a patch that removes the decompression feature, staged here:

https://github.com/gordonmessmer/httpd/pull/1
