[ 
https://issues.apache.org/jira/browse/COUCHDB-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filipe Manana updated COUCHDB-583:
----------------------------------

           Component/s: Database Core
           Description: 
This feature allows Couch to gzip compress attachments as they are being 
received and store them in compressed form.

When a client downloads an attachment (e.g. GET 
somedb/somedoc/attachment.txt), the attachment is sent in compressed form if 
the client's HTTP request lists gzip as an acceptable content coding in its 
"Accept-Encoding" header. Otherwise CouchDB decompresses the attachment before 
sending it to the client.
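The store-once, serve-by-negotiation flow above can be sketched in Python as 
follows. This is not the patch's Erlang code: the function names are 
hypothetical, the default level of 6 is an arbitrary choice for the sketch 
(the real level comes from default.ini), and the Accept-Encoding parsing is 
simplified to look only at gzip, x-gzip and the "*" wildcard with their 
q-values.

```python
import gzip
from typing import Optional


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """Return True if an Accept-Encoding header value permits gzip.

    Simplified: an explicit q=0 on gzip refuses it; '*' acts as a fallback.
    """
    if not accept_encoding:
        return False
    gzip_q = None
    star_q = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if coding in ("gzip", "x-gzip"):
            gzip_q = q
        elif coding == "*":
            star_q = q
    if gzip_q is not None:
        return gzip_q > 0
    return star_q is not None and star_q > 0


def store_attachment(data: bytes, level: int = 6) -> bytes:
    # Compress once, at write time; the bytes kept on disk are gzip.
    return gzip.compress(data, compresslevel=level)


def serve_attachment(stored: bytes, accept_encoding: Optional[str]):
    """Return (headers, body) for a GET of a gzip-stored attachment."""
    if accepts_gzip(accept_encoding):
        # Client accepts gzip: send the stored bytes untouched.
        return {"Content-Encoding": "gzip"}, stored
    # Client does not accept gzip: decompress before sending.
    return {}, gzip.decompress(stored)
```

The point of the design is that the compression cost is paid once on write, 
while gzip-capable clients are served with no CPU spent on compression.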

Attachments are compressed only if their MIME type matches one of those listed 
in a separate config file. Compression level is also configurable in the 
default.ini file.
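A minimal sketch of the MIME-type filter and configurable level. The ticket 
does not give the config format, so the [attachments] section and the 
compression_level and compressible_types keys below are hypothetical, as is 
the use of shell-style wildcards such as "text/*".

```python
import configparser
import fnmatch

# Hypothetical config: section and key names are illustrative only.
EXAMPLE_INI = """
[attachments]
compression_level = 8
compressible_types = text/*, application/javascript, application/json
"""


def load_settings(ini_text):
    """Return (compression level, list of lowercase MIME patterns)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["attachments"]
    level = section.getint("compression_level")
    patterns = [p.strip().lower()
                for p in section.get("compressible_types").split(",")
                if p.strip()]
    return level, patterns


def is_compressible(content_type, patterns):
    # Drop parameters such as "; charset=utf-8" before matching.
    mime = content_type.split(";", 1)[0].strip().lower()
    return any(fnmatch.fnmatchcase(mime, pat) for pat in patterns)
```

For example, "text/plain; charset=utf-8" would be compressed, while an 
already-compressed type such as "image/png" would be stored as-is.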

This follows Damien's suggestion from 30 November:

"Perhaps we need a separate user editable ini file to specify compressable or 
non-compressable files (would probably be too big for the regular ini file). 
What do other web servers do?

Also, a potential optimization is to compress the file while writing to disk, 
and serve the compressed bytes directly to clients that can handle it, and 
decompressed for those that can't. For compressable types, it's a win for both 
disk IO for reads and writes, and CPU on read."

Patch attached.

  was:
The patch attached after this ticket was created adds the following feature.

A new optional HTTP query parameter, "compression", is added to the attachments 
API.
Its value can be either "gzip" or "deflate".

When a client requests an attachment (GET HTTP request) and the query parameter 
"compression" is present, CouchDB sends the attachment to the client compressed 
and sets the Content-Encoding header to gzip or deflate accordingly.
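The query-parameter behaviour can be sketched with Python's zlib module. The 
function name is hypothetical, and where the patch answers an unknown value 
with a 500 error, this sketch raises ValueError and leaves the mapping to an 
HTTP status to the caller. Note that HTTP "deflate" means the zlib-wrapped 
format (RFC 1950), not a raw deflate stream.

```python
import zlib


def compress_for_query(body: bytes, compression):
    """Return (headers, payload) for ?compression=gzip|deflate."""
    if compression is None:
        # No query parameter: send the attachment uncompressed.
        return {}, body
    if compression == "gzip":
        wbits = 16 + zlib.MAX_WBITS   # gzip wrapper (RFC 1952)
    elif compression == "deflate":
        wbits = zlib.MAX_WBITS        # zlib wrapper (RFC 1950), i.e. HTTP "deflate"
    else:
        raise ValueError("unsupported compression: %r" % compression)
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    payload = compressor.compress(body) + compressor.flush()
    return {"Content-Encoding": compression}, payload
```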

Further, it adds a new config option, "treshold_for_chunking_comp_responses" 
(httpd section), that specifies an attachment length threshold. If an 
attachment's length is greater than or equal to this threshold, the HTTP 
response is chunked (in addition to being compressed).
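A small sketch of reading the threshold and applying the ">=" rule. The option 
name and the httpd section come from the description above; the value 1048576 
is an illustrative choice, not the patch's default, and the helper names are 
hypothetical.

```python
import configparser

# Illustrative [httpd] section; the threshold value is an assumption.
EXAMPLE_INI = """
[httpd]
treshold_for_chunking_comp_responses = 1048576
"""


def chunking_threshold(ini_text):
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("httpd", "treshold_for_chunking_comp_responses")


def use_chunked(attachment_length, threshold):
    # ">=" as described: an attachment exactly at the threshold is chunked too.
    return attachment_length >= threshold
```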

Note that sending non-chunked compressed responses requires holding all the 
compressed blocks in memory before sending them to the client. This is a 
necessary "evil": the length of the compressed body is only known after the 
whole body has been compressed, and non-chunked responses must set the 
"Content-Length" header. With chunked responses, each compressed block can be 
sent as soon as it is produced, without accumulating all of them in memory.
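The memory argument above can be illustrated with a generator that compresses 
input blocks incrementally and wraps each piece of output in an HTTP/1.1 chunk 
frame, so no Content-Length (and no full buffer) is needed. This is a sketch in 
Python, not the patch's code; the function name and default level are 
assumptions.

```python
import zlib
from typing import Iterable, Iterator


def chunked_gzip(blocks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress blocks incrementally and emit HTTP/1.1 chunked frames.

    Each frame is <size in hex>\\r\\n<data>\\r\\n; the body ends with 0\\r\\n\\r\\n.
    Only the compressor's current output is held, never the whole body.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for block in blocks:
        data = compressor.compress(block)
        if data:  # zlib may buffer and return b""; a zero-size chunk would end the body
            yield b"%x\r\n%s\r\n" % (len(data), data)
    tail = compressor.flush()
    if tail:
        yield b"%x\r\n%s\r\n" % (len(tail), tail)
    yield b"0\r\n\r\n"  # last-chunk, no trailers
```

Skipping empty compressor output matters: in chunked encoding a zero-length 
chunk marks the end of the body, so emitting one early would truncate the 
response.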

Examples:

$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt?compression=gzip'
$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt?compression=deflate'
$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt'   # attachment will not be compressed
$ curl 'http://localhost:5984/testdb/testdoc1/readme.txt?compression=rar'   # will give a 500 error code

Etap test case included.

Feedback would be very welcome.

cheers

           Environment: CouchDB trunk  (was: CouchDB trunk revision 885240)
    Remaining Estimate:     (was: 24h)
     Original Estimate:     (was: 24h)
               Summary: storing attachments in compressed form and serving them 
in compressed form if accepted by the client  (was: adding 
?compression=(gzip|deflate) optional parameter to the attachment download API)

> storing attachments in compressed form and serving them in compressed form if 
> accepted by the client
> ----------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-583
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-583
>             Project: CouchDB
>          Issue Type: New Feature
>          Components: Database Core, HTTP Interface
>         Environment: CouchDB trunk
>            Reporter: Filipe Manana
>         Attachments: couchdb-583-trunk-3rd-try.patch, 
> couchdb-583-trunk-4th-try-trunk.patch, couchdb-583-trunk-5th-try.patch, 
> couchdb-583-trunk-6th-try.patch, jira-couchdb-583-1st-try-trunk.patch, 
> jira-couchdb-583-2nd-try-trunk.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
