Re: [pmacct-discussion] HTTP Virtual Hosts classification

Chris Wilson Wed, 18 Feb 2009 09:28:04 -0800

Hi all,

On Wed, 18 Feb 2009, Paolo Lucente wrote:


> In concept, and as documentation says, what you want to achieve is 
> feasible and your understanding of the classifier() is correct - you 
> only have to write down your own patterns: re-phrased, regular 
> expressions are typically employed to recognize protocols but they can 
> be of course used to recognize virtual hosts when in presence of 
> text-based protocols (ie. HTTP, FTP or POP3).
> 
> As you said this is quite innovative and interesting - so let me know if 
> i can support you somehow (feel also free to contact me privately). For 
> now i have not received any feedback which can help you dimensioning the 
> solution - so can't say how easy it would be to deploy in this sense; 
> perhaps somebody reading can fill this gap?

I have thought about doing this as well. The main problem that I had with 
using classifiers is that I ultimately would have to implement a TCP 
engine to reassemble the stream from packets (perhaps the one in snort can 
be borrowed?). Otherwise the Host: header could (accidentally or 
deliberately) be split across multiple packets. There is plenty of 
opportunity for exploitation here as well, e.g. multiple Host: headers, 
invalid characters in headers, packets that look like HTTP requests in the 
middle of streams, bad Content-Lengths, etc.
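
To illustrate the split-header problem, here is a minimal Python sketch. The pattern and the function name find_host are hypothetical, not taken from pmacct's classifiers. It shows a Host: regular expression that matches the reassembled stream but misses the header when each packet payload is inspected on its own.

```python
import re

# Hypothetical classifier pattern: capture the value of an HTTP Host: header.
HOST_RE = re.compile(rb"\r\nHost:[ \t]*([^\r\n]+)\r\n", re.IGNORECASE)

def find_host(payload: bytes):
    """Return the Host header value found in payload, or None."""
    m = HOST_RE.search(payload)
    return m.group(1).decode("ascii", "replace") if m else None

request = b"GET / HTTP/1.1\r\nHost: www.example.org\r\nAccept: */*\r\n\r\n"

# Simulate the request arriving in two TCP segments, cut inside the header value.
cut = request.index(b"example")
segments = [request[:cut], request[cut:]]

# Matching each payload separately misses the header entirely...
per_packet = [find_host(s) for s in segments]

# ...while matching the reassembled stream finds it.
reassembled = find_host(b"".join(segments))

print(per_packet, reassembled)
```

The sketch only covers the benign case of in-order segments; it does nothing about duplicate Host: headers or the other tricks listed above.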

What I was planning to do, but have not done yet, is to:

* force everyone to use an HTTP proxy (transparent or not) so that dealing 
with malicious requests becomes someone else's problem;

* use the HTTP proxy's logging features to capture the full details of 
both requests (inbound to proxy and outbound from proxy) along with the 
requested URI and current time;

* save all this in a separate table in the database;

* left join from pmacct's acct_v* table to the proxy table on the unique 
quadruple (ip_src,ip_dst,src_port,dst_port) and time.
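
The join in the last step could look roughly like the sketch below, which uses an in-memory SQLite database so that it runs on its own. Everything beyond the four quadruple columns is an assumption: the table names acct and proxy_log, the stamp_inserted, bytes, ts and uri columns, and the 60-second matching window would all need adapting to the real pmacct schema and accounting interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed, simplified stand-in for pmacct's acct_v* table.
CREATE TABLE acct (ip_src TEXT, ip_dst TEXT, src_port INT, dst_port INT,
                   stamp_inserted TEXT, bytes INT);
-- Hypothetical table filled from the proxy's logs.
CREATE TABLE proxy_log (ip_src TEXT, ip_dst TEXT, src_port INT, dst_port INT,
                        ts TEXT, uri TEXT);
INSERT INTO acct VALUES
  ('10.0.0.5', '192.0.2.1', 40001, 3128, '2009-02-18 09:00:00', 5120),
  ('10.0.0.6', '192.0.2.1', 40002, 3128, '2009-02-18 09:00:00', 2048);
INSERT INTO proxy_log VALUES
  ('10.0.0.5', '192.0.2.1', 40001, 3128, '2009-02-18 09:00:30',
   'http://www.example.org/');
""")

# Left join on the quadruple plus a time window (60 seconds is an assumption),
# so flows without a matching proxy entry still appear, with a NULL uri.
rows = conn.execute("""
    SELECT a.ip_src, a.src_port, a.bytes, p.uri
    FROM acct AS a
    LEFT JOIN proxy_log AS p
      ON  p.ip_src = a.ip_src AND p.ip_dst = a.ip_dst
      AND p.src_port = a.src_port AND p.dst_port = a.dst_port
      AND p.ts >= a.stamp_inserted
      AND p.ts <  datetime(a.stamp_inserted, '+60 seconds')
    ORDER BY a.src_port
""").fetchall()

for row in rows:
    print(row)
```

The time condition matters because client ports are reused: without it, an old proxy entry could be attributed to a later flow with the same quadruple.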

This was appropriate for my situation as I wanted everyone to use a 
caching proxy anyway to save bandwidth, and hopefully to authenticate. 
However, I discovered that Squid's logging formats do not provide all the 
information that I needed to reliably match up the connection (no client 
port, see http://www.visolve.com/squid/squid30/logs.php#logformat).

The external ACL program does have enough information for this
(http://www.visolve.com/squid/squid30/externalsupport.php#external_acl_type), 
so writing a program to run as an external ACL helper and log the 
information to the database is a possibility. 
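
A rough sketch of such a helper follows, in Python with SQLite standing in for the real database. It assumes Squid is configured to pass the helper five space-separated fields per request (client address, client port, destination host, destination port, URI); the exact format tokens for that should be checked against the external_acl_type documentation linked above. The helper_log table and the function names are hypothetical. The helper answers OK to every well-formed line, so it only records requests and never blocks them.

```python
import io
import sqlite3
import time

# Hypothetical log table; column names are illustrative only.
SCHEMA = """CREATE TABLE IF NOT EXISTS helper_log (
    ip_src TEXT, src_port INTEGER, dst_host TEXT, dst_port INTEGER,
    ts REAL, uri TEXT)"""

def handle_line(line, conn, now=None):
    """Log one request line and return the reply to send back to Squid."""
    fields = line.split()
    if len(fields) != 5:
        return "ERR"  # malformed input: refuse rather than guess
    src, src_port, dst, dst_port, uri = fields
    try:
        row = (src, int(src_port), dst, int(dst_port),
               time.time() if now is None else now, uri)
    except ValueError:
        return "ERR"  # non-numeric port
    conn.execute("INSERT INTO helper_log VALUES (?, ?, ?, ?, ?, ?)", row)
    conn.commit()
    return "OK"

def serve(stdin, stdout, conn):
    """Read one request per line, answer each one, flushing every reply."""
    conn.execute(SCHEMA)
    for line in stdin:
        stdout.write(handle_line(line, conn) + "\n")
        stdout.flush()

# Small self-contained demonstration using in-memory streams and database.
demo_in = io.StringIO(
    "10.0.0.5 40001 www.example.org 80 http://www.example.org/\n"
    "bogus line\n")
demo_out = io.StringIO()
demo_conn = sqlite3.connect(":memory:")
serve(demo_in, demo_out, demo_conn)
replies = demo_out.getvalue().splitlines()
logged = demo_conn.execute(
    "SELECT ip_src, src_port, dst_host, dst_port, uri FROM helper_log"
).fetchall()
print(replies, logged)
```

Run under Squid, the same loop would be driven by serve(sys.stdin, sys.stdout, conn) with a persistent database connection. The sketch does not handle escaped field values or concurrent helper channels.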

In our case this also was not good enough, as it does not tell us whether 
the request will be served from the cache or not, and therefore does not 
correspond to the client's real bandwidth usage.

I would be very interested to see what you do in this space.

Cheers, Chris.
-- 
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
