[Polipo-users] How I would write Polipo today [was: An analysis of why...]

Juliusz Chroboczek Thu, 01 Nov 2012 08:12:53 -0700

>> That brings us to a different issue -- which is how I would write
>> Polipo now, with all that I've learnt over the years.


> please tell us

You're provoking me.

First of all, you need to understand what Polipo was designed for.
Application-layer proxies are evil, since the application layer is an
end-to-end layer, but they turn out to be useful for three reasons:

  (1) optimising traffic and caching;
  (2) using non-default routing;
  (3) filtering or modifying content.

Application (3), of course, should ideally be done in the browser, which
is why the "forbidden URL" capabilities of Polipo are kept primitive.
Application (2) is better done using a VPN, but it turns out to be one
of the main applications of Polipo -- whether using Tor, tunnelling out
of a fascist network over ssh to access a Polipo in the Free World, or
simply tunnelling out of Proxad's network to get decent throughput from
YouTube.

Polipo was designed for application (1) -- caching and optimising.  At
the time, browsers had very primitive HTTP support, and I was accessing
the Internet over a 32 kbps modem; Polipo was able to reduce latency by
roughly a third: pages would show in 14 seconds or so instead of 20.

Because of that, Polipo is fundamentally a caching proxy -- there are
almost no user-space buffers, we read directly into the cache and serve
from the cache (concurrently).  Tunnelling functionality was added
almost as an afterthought -- different tunnelling implementations are
used for POST and for CONNECT, the former a horrible hack, the latter
fairly clean and efficient (but not optimally so).
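To illustrate what "reading directly into the cache" can mean, here is a minimal sketch assuming an in-memory object whose body is stored as an array of fixed-size chunks. The 4096-byte chunk size, the 1 MiB cap and the names `struct object`, `object_read` and `object_free` are hypothetical, chosen for illustration and not taken from Polipo's source.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK_SIZE 4096   /* assumed chunk size */
#define MAX_CHUNKS 256    /* assumed limit: objects up to 1 MiB */

/* Hypothetical in-memory object: the body is kept as fixed-size chunks,
   chunk i holding bytes [i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE). */
struct object {
    char *chunks[MAX_CHUNKS];
    size_t length;             /* body bytes received so far */
};

/* Read the next piece of the body from fd straight into the chunk that
   will hold it.  There is no intermediate user-space buffer, so other
   clients can be served from the chunks as soon as length grows.
   Returns bytes read, 0 at end of file, or -1 on error (errno set). */
static ssize_t
object_read(int fd, struct object *obj)
{
    size_t index, offset;
    ssize_t n;

    assert(obj != NULL);
    index = obj->length / CHUNK_SIZE;
    offset = obj->length % CHUNK_SIZE;
    if(index >= MAX_CHUNKS) {
        errno = EFBIG;
        return -1;
    }
    if(obj->chunks[index] == NULL) {
        obj->chunks[index] = malloc(CHUNK_SIZE);
        if(obj->chunks[index] == NULL)
            return -1;
    }
    /* Never read past the end of the current chunk. */
    do {
        n = read(fd, obj->chunks[index] + offset, CHUNK_SIZE - offset);
    } while(n < 0 && errno == EINTR);
    if(n > 0)
        obj->length += (size_t)n;
    return n;
}

/* Release all chunks of an object. */
static void
object_free(struct object *obj)
{
    for(size_t i = 0; i < MAX_CHUNKS; i++) {
        free(obj->chunks[i]);
        obj->chunks[i] = NULL;
    }
    obj->length = 0;
}
```

Because each read is bounded by the space left in the current chunk, a chunk never has to be copied or resized once data has landed in it.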

The Internet has changed somewhat since 2003.  Latency has become even
more of a concern, while throughput ("bandwidth") is usually not
a concern any more.  So you clearly want to be optimising the HTTP
traffic (pipelining etc.), but you don't necessarily want to limit the
number of server-side connections, and you certainly don't want to be
systematically using an on-disk cache.
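A hedged sketch of the kind of scheduling this implies: send a request on an idle connection if there is one, otherwise pipeline it onto a lightly loaded connection, and otherwise open a new server connection rather than queueing the request inside the proxy. The structure, the names, the `max_depth` parameter and the exact order of preference are assumptions for illustration, not the algorithm in Polipo's server.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state for one origin server. */
struct server_conn {
    int in_flight;       /* requests sent but not yet answered */
    bool can_pipeline;   /* server is believed to be pipelining-safe */
};

#define OPEN_NEW_CONNECTION (-1)

/* Choose where to send the next request.  Assumed policy:
     1. reuse an idle connection if there is one;
     2. otherwise pipeline onto the least-loaded pipelining-capable
        connection whose pipeline is shorter than max_depth;
     3. otherwise return OPEN_NEW_CONNECTION.  There is deliberately no
        upper bound on the number of connections, so requests never
        wait in a proxy-side queue. */
static int
choose_connection(const struct server_conn *conns, size_t n, int max_depth)
{
    int best = OPEN_NEW_CONNECTION;

    assert(conns != NULL || n == 0);
    for(size_t i = 0; i < n; i++) {
        if(conns[i].in_flight == 0)
            return (int)i;
    }
    for(size_t i = 0; i < n; i++) {
        if(!conns[i].can_pipeline || conns[i].in_flight >= max_depth)
            continue;
        if(best == OPEN_NEW_CONNECTION ||
           conns[i].in_flight < conns[best].in_flight)
            best = (int)i;
    }
    return best;
}
```

Whether pipelining onto a busy connection should be preferred to opening a new one is a latency trade-off (head-of-line blocking versus a fresh handshake); the order above is just one plausible choice.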

If I were to write Polipo from scratch, I'd start with a simple but
extremely fast tunnelling proxy with good support for pipelining and
connection scheduling (roughly Polipo's server.c, but with slightly less
smart connection scheduling: you want the number of server-side
connections to grow without bound nowadays, so much of the queueing
stuff can go away).  I'd still put in conversion between HTTP/1.0 and
HTTP/1.1, since it increases robustness against stone-age servers
without adding too much complexity.
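One piece of such a conversion, sketched under assumptions: when an HTTP/1.0 server delimits a body by closing the connection, a proxy that wants to keep its HTTP/1.1 client connection persistent can re-frame the body with chunked transfer-coding. The helper `format_chunk` is hypothetical; the framing it produces ("<hex size>\r\n<data>\r\n", terminated by "0\r\n\r\n") is the standard HTTP/1.1 chunked format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: frame len bytes of body data as one HTTP/1.1
   chunk, "<size in hex>\r\n<data>\r\n".  A call with len == 0 writes the
   last-chunk followed by an empty trailer, "0\r\n\r\n".  The result is
   NUL-terminated for convenience.  Returns the number of bytes written
   (excluding the NUL), or -1 if out is too small. */
static int
format_chunk(char *out, size_t outlen, const char *data, size_t len)
{
    int n;

    assert(out != NULL && (data != NULL || len == 0));
    if(len == 0) {
        n = snprintf(out, outlen, "0\r\n\r\n");
        return (n < 0 || (size_t)n >= outlen) ? -1 : n;
    }
    n = snprintf(out, outlen, "%zx\r\n", len);
    if(n < 0 || (size_t)n + len + 3 > outlen)
        return -1;
    memcpy(out + n, data, len);
    memcpy(out + n + len, "\r\n", 3);   /* copies the CRLF and a NUL */
    return n + (int)len + 2;
}
```

A proxy relaying a close-delimited HTTP/1.0 body would call this once per block read from the server, and once with `len == 0` when the server closes the connection.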

I'd then add a caching implementation.  The in-memory cache would be
similar to the one in Polipo (I still don't see how to improve on
that), while the on-disk cache would be much simpler, dealing with
complete objects only, since partial objects save bandwidth but don't
reduce latency.  I'd then make Polipo decide dynamically whether to
tunnel or to cache, probably depending on the Cache-Control header and
the object size.
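The per-response decision between tunnelling and caching might look like the following sketch. The 1 MiB threshold, the choice of directives (`no-store`, `private`), tunnelling responses whose length is unknown, and all the names are assumptions made for illustration; the parser is also simplified and does not handle commas inside quoted strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed threshold: larger objects are tunnelled, not cached. */
#define MAX_CACHEABLE_SIZE (1024L * 1024L)

enum disposition { TUNNEL, CACHE };

/* True if s begins with prefix, ignoring ASCII case. */
static bool
starts_with_nocase(const char *s, const char *prefix)
{
    for(; *prefix != '\0'; s++, prefix++) {
        if(tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;
    }
    return true;
}

/* True if the comma-separated Cache-Control value contains directive
   as a whole token (optionally followed by "=value"). */
static bool
has_directive(const char *value, const char *directive)
{
    size_t len;

    assert(directive != NULL);
    if(value == NULL)
        return false;
    len = strlen(directive);
    while(*value != '\0') {
        while(*value == ' ' || *value == '\t' || *value == ',')
            value++;
        if(starts_with_nocase(value, directive)) {
            char next = value[len];
            if(next == '\0' || next == ',' || next == '=' ||
               next == ' ' || next == '\t')
                return true;
        }
        while(*value != '\0' && *value != ',')
            value++;
    }
    return false;
}

/* content_length < 0 means the length is not known in advance. */
static enum disposition
choose_disposition(const char *cache_control, long content_length)
{
    if(has_directive(cache_control, "no-store") ||
       has_directive(cache_control, "private"))
        return TUNNEL;
    if(content_length < 0 || content_length > MAX_CACHEABLE_SIZE)
        return TUNNEL;
    return CACHE;
}
```

Tunnelling large or uncacheable objects keeps them off the caching path entirely, which matches the goal of not systematically paying for an on-disk cache.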

I'd write the networking bits in CPC (Continuation Passing C) and the
data structures stuff in plain C.  Keeping the CPC code localised means
that it would be easy to translate to C should people feel they cannot
use CPC.

-- Juliusz

_______________________________________________
Polipo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/polipo-users
