Thanks to Hamza's work on a cache plugin and the hardship of
implementing compression in the current api, this week has been very
fruitful.
The api and the necessary protocol abstraction have reached a state where
I've added all the features I intended to add, and even a bit more.
# Compression
Compression is a required feature for any modern web server, and adding
it is a high-priority target for my GSoC.
And when I say compression, I mean for all content, not just for static
files or some specific plugins.
Doing this with the old plugin api is impossible without a lot of magic
inside the `mk_socket_*` functions, similar to how PolarSSL keeps socket
state, but with quite a bit more overhead.
Implementing a transport plugin for compression is not an option.
Now, since the new plugin api no longer writes data directly to the
socket, it is actually possible to implement such compression.
All response data is kept in `mk_queue` structures, which are vectors of
buffers or files, so compression should be as simple as compressing the
`mk_queue` content and writing it to the socket.
My first attempt at this approach was to replace the very simple
function that writes a queue to the socket with one that compresses the
data and then writes it to the socket.
This is how `mk_http_queue_send()` looked before I started implementing
this:
```c
ssize_t mk_http_queue_send(int socket, struct mk_queue *queue)
{
    int ret;
    struct iovec io[10];
    ssize_t wret = 0;
    int fd;
    off_t offset;
    size_t bytes = 0;

    ret = mk_queue_iovec(queue, io, 10);
    if (ret > 0) {
        wret = writev(socket, io, ret);
    }
    else if (!mk_queue_get_file(queue, &fd, &offset, &bytes)) {
        wret = mk_socket_send_file(socket, fd, &offset, bytes);
    }

    if (wret > 0) {
        return mk_queue_mark_used(queue, wret);
    }
    else {
        return wret;
    }
}
```
Unfortunately, this function blew up in complexity and argument count to
support the feature, because the client needs to tell us to compress,
and we also need to tell the client that we compressed.
Also, the whole point of compressing is to reduce the content length,
but the content length is already written in the headers.
And if the response is chunked, we must either check for the chunked
headers, or chunk after compression.
All in all, this attempt failed, but it gave some very interesting
insights into the problems that need to be solved.
Specifically, I started passing `mk_queue` structures from one function
to another and recognized this pattern as filtering.
# Filters
When content is written directly to sockets, it is assumed that the
buffers are available for reuse as soon as the call succeeds.
This causes overhead if one tries to add filtering between the plugin
and the client socket, as data needs to be copied multiple times.
But with the `mk_queue` structure, only references to buffers are passed
around.
This makes the overhead of filters basically nil unless we touch the
data.
But more important than the very low overhead, it makes it very simple
to implement chunked encoding, compression or even caching.
This approach is of course not unique to Monkey or new; both nginx and
Apache implement something similar in their content delivery chains.
Of course, implementing filtering is non-trivial and required quite a
few iterations, but it was definitely worth it considering how hard it
was to implement compression without it.
In the final implementation, a filter is two function pointers, one
processing state variable and one function context variable.
If headers need to be added after all other content has been filtered,
they can be delayed.
When waiting for IO, other content can be filtered without blocking.
```c
enum mk_filter_tag {
    MK_FILTER_HEADER,
    MK_FILTER_BODY,
};

int filter(enum mk_filter_tag,
           struct mk_queue *input,
           struct mk_queue *output,
           struct mk_filter_data *context);

int filter_end(enum mk_filter_tag,
               struct mk_queue *output,
               struct mk_filter_data *context);
```
## Deflate filter
My second attempt at compression is a deflate plugin which relies on
last week's plugin_chain.
It will compress the response from plugins located after it in the
Plugins directive.
So to compress all static content under `/compress/`, the following
entry should be present in the vhost configuration:
```
[Location]
Path /compress/
Plugins deflate static
```
The plugin chain will then pass any new request first to `deflate`, and
if `deflate` does not want to handle it, it passes it on to `static`.
Deflate never handles any requests, but if it is possible to compress
the response, it will add a `deflate` filter to the request, along with
whatever headers it can safely add.
When `static` has served the request, the response will then be passed
through the deflate filter and be compressed.
The final plugin implements working deflate compression using `zlib` in
268 rows of code of less than 80 characters each, including the includes
and data structures.
And I did not try to make it compact.
But of course, it is very slow.
## HTTP transport filter
After this, I added a generalized HTTP transport filter which either
chunk the response or adds content length if non is supplied.
This is also a very simple to implement as a filter, thanks to the
possibility of delaying headers.
## Content caching
Compressing content is really slow, which makes it almost useless unless
some kind of caching is deployed.
Some of this week's changes are inspired by problems discussed on the
mailing list recently.
Most importantly, it is very hard to do any kind of caching with
monkey's current internals.
So how hard would it be to cache with the new api, and how would it be
done?
First off, similar to how the deflate plugin requires plugin chains to
work, caching would also rely on them.
The following entry would cache all data under the path `/cached/`:
```
[Location]
Path /cached/
Plugins cache static
```
So first off, a caching plugin would be one request handler and one
content filter.
The request handler in pseudo/C code would look something like this:
```c
static cache;

int cache_handler(*plugin, *request)
{
    info = request_info(request);
    if (info.path in cache) {
        /* Serve cache content */
        return MK_REQUEST_END;
    }
    entry = new cache entry ptr;
    mk_filter_add(request, cache_filter, entry);
    return MK_REQUEST_NOT_ME;
}
```
So in one statement: serve the request from cache if present, else add
the cache filter.
All response data would then be passed through `cache_filter` once any
downstream plugin (in this case `static`) has served the request.
The filter can then both pass data to the next filter and write it to
the cache.
Such a plugin would add generic caching to monkey.
I've not written any code for this and it isn't very high on my
priority list, so some help here would be very welcome.
# Overview & Next week
Monkey now allows for a third plugin pattern, post-processing plugins,
in addition to the two previous patterns, utility (do everything in
request handler) and proxy (use backend connections).
Next week will be spent is to test/document the new api, write a
proxy-type plugin with event handling, implement abort request cleanup
and maybe start implementing multiple socket support.
[Code](https://github.com/ksonny/monkey/tree/plugin_next)
[Perm](https://lotrax.org/gsoc/gsoc-2013-status-future-protocols-week-6.html)
--
Sonny Karlsson
_______________________________________________
Monkey mailing list
[email protected]
http://lists.monkey-project.com/listinfo/monkey