Hello haproxy-list,
We are working on a near-real-time web statistics architecture based on the
haproxy access logs.
(A first draft of a Flume plugin is available here:
https://github.com/figarocms/flume-haproxy-extractor
It is currently designed to fit our needs specifically, but we are working
on making it more general-purpose.)
However, HTTP logs are quite difficult to parse, especially when headers are
captured:
* Request and response headers can both be captured, each set enclosed in
{}, but if you capture only one of them in a frontend there is no way to
know whether it is the request or the response headers, because when a
capture is not set the field does not appear in the log at all.
My suggestion is to add a letter before "{" to indicate whether it is the
request ( e.g. "Q{" ) or response ( "R{" ) set of captured headers, or to
always make both fields appear.
* Also, the name of the captured header is not given, only the values
appear, so you also have to know the order of the captures in the
configuration to parse the field correctly.
Maybe the whole header line could be put in the field.
* The headers are separated by a "|", but any "|" already present in the
header values is not escaped, which could cause parsing problems, I
suppose.
I have no real answer to this problem, but enclosing the header values in "
(and escaping that character inside them, as many string libraries do)
could be a good improvement.
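
To illustrate the points above, here is a minimal Python sketch. The log
fragment is made up for the example, the "Q{" / "R{" format with quoted
values is only the suggestion from this mail (not something haproxy emits
today), and the function names are purely illustrative.

```python
import re

# Hypothetical fragment of an HTTP log line: only ONE header block is
# present, so nothing says whether it holds request or response captures.
fragment = '{Mozilla/5.0|http://example.com/a|b} "GET /index.html HTTP/1.1"'

BLOCK = re.compile(r"\{([^}]*)\}")


def split_captures(text):
    """Return the values of every {...} block, split naively on '|'."""
    return [m.group(1).split("|") for m in BLOCK.finditer(text)]


# Two headers were captured (User-Agent, Referer), but the unescaped '|'
# inside the Referer value makes the naive split return three fields.
blocks = split_captures(fragment)
print(blocks)

# Sketch of the suggested format (an assumption, not a haproxy feature):
# a Q/R prefix per block, double-quoted values, backslash escaping.
PREFIXED = re.compile(r'([QR])\{((?:"(?:[^"\\]|\\.)*"\|?)*)\}')
VALUE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_prefixed(text):
    """Map 'Q' / 'R' to the list of unescaped header values."""
    result = {}
    for kind, body in PREFIXED.findall(text):
        result[kind] = [re.sub(r"\\(.)", r"\1", v) for v in VALUE.findall(body)]
    return result


proposed = 'Q{"Mozilla/5.0"|"http://example.com/a|b"} R{"text/html"}'
print(parse_prefixed(proposed))
```

With the prefixed and quoted form, the parser no longer needs to guess which
block it is reading, and a "|" inside a value stays part of that value. It
still does not carry the header names, so the second point would need the
whole header line in the field.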
And as I mentioned before, the default syslog size of 1024 is very short for
today's statistical needs. I had to increase the constant to 4096 (capturing
the full UA and Referer is costly).
Best regards,
--
Damien