Re: Adding a unique id header to each request?

2012-12-11 Thread Roy Smith
Hi Willie,

Thanks for your note.  Yes, I remember you mentioning that you had plans to do 
this at some point.  I just didn't realize it was included in 1.5.  The 
flexible log format is nice too.  I had hard-wired a format that made sense for 
us, but your way lets people pick whatever makes sense for them.
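
For anyone else migrating off a private patch, the replacement stanza should 
look roughly like this (a sketch only; the format string is the example from 
the 1.5 configuration.txt, which we haven't tuned for our site yet):

frontend www-in
   bind *:80
   mode http
   option httplog
   # client ip:port, frontend ip:port, timestamp, request counter, pid, in hex
   unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
   unique-id-header X-Unique-ID
   capture request header X-Unique-ID len 64
   default_backend app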

Now that the official haproxy code has this, there's really no point in me 
continuing to maintain our version.  It was nice to get to write some C code 
again, however.  These days, I mostly live in Python.  It's good to get close 
to the metal once in a while, so you don't forget how computers really work.

I totally agree about keeping the clocks in sync.  And, I should add, running 
everything in UTC.  I never again want to try to remember if I'm +4 or +5, or 
deal with the fact that we rolled over from summer time to winter time in the 
middle of a log file :-)

We're really looking forward to the SSL support in 1.5.  We currently terminate 
our SSL sessions with nginx (in addition to using nginx as our application 
front-end), so our data flow looks like nginx -> haproxy -> nginx.  It'll be 
nice to shorten that chain.
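
If I'm reading the 1.5-dev docs right, the front hop should collapse to 
something like this (an untested sketch; the cert path is made up):

frontend https-in
   bind *:443 ssl crt /etc/haproxy/site.pem
   mode http
   default_backend app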

Have you experienced any issues running your SSL code on Ubuntu precise?  We 
tried to upgrade our SSL hosts from lucid to precise and started getting 
intermittent errors when users authenticated to our application.  We were 
never able to figure out what was going on, and eventually solved the problem 
by retreating to lucid.  We don't know if it's an nginx issue or a problem 
with the underlying SSL library.  I'm curious whether you've seen anything 
similar.
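
(For the archives: per Willy's reply quoted below, the answer was simpler than 
my guess; the call presumably becomes just

   if (! http_find_header2(hdr, hdr_len, msg->chn->buf->p, &txn->hdr_idx, &ctx)) {

since buf->p already points at the start of the current message.)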


On Dec 11, 2012, at 5:54 PM, Willy Tarreau wrote:

> Hi Roy,
> 
> On Tue, Dec 11, 2012 at 05:08:17PM -0500, Roy Smith wrote:
>> Hmmm, yeah, this looks like exactly what my patch does.  Cool.
> 
> Indeed, now I remember this old discussion where several of us had
> different requirements. As I said at the time, we (Exceliance) had
> plans to develop this. After we had redesigned the log format, it was
> quite easy to extend the feature to define an HTTP header format.
> 
> That way anyone can define the format they need. It is quite possible
> that some of the things you're currently doing are not possible with
> the current format (I have no idea which), but you're welcome to
> extend it if something is missing.
> 
> Now to respond to your initial questions:
> 
>>>> I've just finished patching haproxy-1.5-dev14 to use my unique-id header
>>>> code, which we discussed a year or so ago.  There's one thing I'm not 100%
>>>> sure about.  You changed the interface of http_find_header2().
>>>> Previously, the third argument was an int, now it's a char*.  I'm assuming
>>>> that where I used to do:
>>>> 
>>>> +if (! http_find_header2(hdr, hdr_len, msg->sol, &txn->hdr_idx, &ctx)) {
>>>> 
>>>> I now want
>>>> 
>>>> +if (! http_find_header2(hdr, hdr_len, msg->chn->buf->p + msg->sol, &txn->hdr_idx, &ctx)) {
>>>> 
>>>> yes?  This is in proto_http.c, http_wait_for_request().
> 
> From what I'm seeing, no, you just need msg->chn->buf->p, which is the
> pointer to the beginning of the current message in the request buffer. Please
> have a look at doc/internal/buffer-ops.fig to get a better idea. The
> "entities.fig" file in the same directory might also be of interest to you.
> 
>>>> Have you given any more thought to picking up this patch for the official
>>>> version?  We've been running it in production for well over a year with no
>>>> problems.  Yesterday's traffic volume was 73 million HTTP requests, so I
>>>> think we've given it a fair workout.  The feature has proven invaluable
>>>> for debugging problems in the site, as it lets us easily correlate lines
>>>> in our haproxy, nginx, and application log files.  I'm sure many other
>>>> haproxy users would find it equally useful.
> 
> I'm totally convinced of the usefulness of such a feature. I insisted
> years ago that we put this on our Aloha roadmap because of it. But only
> people who have previously used such a feature in production know its value.
> Once you have this, you can completely rethink the log workflow, processing
> and analysis. And the second best thing to have is perfectly synchronized
> clocks on all your servers :-)
> 
> Best regards!
> Willy
> 


---
Roy Smith
r...@panix.com






Re: Adding a unique id header to each request?

2012-12-11 Thread Roy Smith
Hmmm, yeah, this looks like exactly what my patch does.  Cool.



On Dec 11, 2012, at 4:45 PM, Baptiste wrote:

> Hi Roy,
> 
> For a few months now, HAProxy has let you define a unique id.
> Check the keyword "unique-id-format" in the haproxy 1.5 configuration.txt.
> 
> Maybe it provides what you want (and makes your patch unnecessary).
> 
> cheers
> 
> 
> On Tue, Dec 11, 2012 at 10:39 PM, Roy Smith  wrote:
>> Hi Willy,
>> 
>> I've just finished patching haproxy-1.5-dev14 to use my unique-id header 
>> code, which we discussed a year or so ago.  There's one thing I'm not 100% 
>> sure about.  You changed the interface of http_find_header2().  Previously, 
>> the third argument was an int, now it's a char*.  I'm assuming that where I 
>> used to do:
>> 
>> +if (! http_find_header2(hdr, hdr_len, msg->sol, &txn->hdr_idx, &ctx)) {
>> 
>> I now want
>> 
>> +if (! http_find_header2(hdr, hdr_len, msg->chn->buf->p + msg->sol, &txn->hdr_idx, &ctx)) {
>> 
>> yes?  This is in proto_http.c, http_wait_for_request().
>> 
>> Have you given any more thought to picking up this patch for the official 
>> version?  We've been running it in production for well over a year with no 
>> problems.  Yesterday's traffic volume was 73 million HTTP requests, so I 
>> think we've given it a fair workout.  The feature has proven invaluable for 
>> debugging problems in the site, as it lets us easily correlate lines in our 
>> haproxy, nginx, and application log files.  I'm sure many other haproxy 
>> users would find it equally useful.
>> 
>> ---
>> Roy Smith
>> r...@panix.com
>> 
>> 
> 


---
Roy Smith
r...@panix.com






Re: Adding a unique id header to each request?

2012-12-11 Thread Roy Smith
Hi Willy,

I've just finished patching haproxy-1.5-dev14 to use my unique-id header code, 
which we discussed a year or so ago.  There's one thing I'm not 100% sure 
about.  You changed the interface of http_find_header2().  Previously, the 
third argument was an int, now it's a char*.  I'm assuming that where I used to 
do:

+if (! http_find_header2(hdr, hdr_len, msg->sol, &txn->hdr_idx, &ctx)) {

I now want

+if (! http_find_header2(hdr, hdr_len, msg->chn->buf->p + msg->sol, &txn->hdr_idx, &ctx)) {

yes?  This is in proto_http.c, http_wait_for_request().

Have you given any more thought to picking up this patch for the official 
version?  We've been running it in production for well over a year with no 
problems.  Yesterday's traffic volume was 73 million HTTP requests, so I think 
we've given it a fair workout.  The feature has proven invaluable for debugging 
problems in the site, as it lets us easily correlate lines in our haproxy, 
nginx, and application log files.  I'm sure many other haproxy users would find 
it equally useful.

---
Roy Smith
r...@panix.com




Re: How does http_find_header() work?

2011-04-01 Thread Roy Smith

On Apr 1, 2011, at 2:11 AM, Willy Tarreau wrote:

> On Thu, Mar 31, 2011 at 08:10:04AM -0400, Roy Smith wrote:
>> I didn't really write a specification, but I think a critical part of the 
>> spec
>> would be that the only guarantee about the id string is that it's unique
> 
> It's unique within a delimited perimeter.

Well, yes, but that's a general problem which is difficult to solve.  There 
are really only two ways to generate unique identifiers.  One is to use some 
probabilistic algorithm with large enough entropy that the chances of a 
collision are very low.  UUIDs and crypto hashes of various flavors are all 
examples of this approach.

The other is to have some globally administered namespace.  Examples would be 
global IPv6 addresses (and IPv4, before evil things like NAT came along!), 
telephone numbers, and the like.  Even so, those are only useful for 
cooperating parties, not

> providing proofs in case of litigation

since they can be spoofed by evil-doers.
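
To make the first (probabilistic) approach concrete, here's a minimal sketch 
(assuming a POSIX system with /dev/urandom; error handling kept to a bare 
minimum):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Draw 128 random bits and hex-encode them.  At that entropy, the odds of a
 * collision stay negligible at any realistic request volume. */
int make_random_id(char *out, size_t outlen)
{
	unsigned char buf[16];
	ssize_t n = -1;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd >= 0) {
		n = read(fd, buf, sizeof(buf));
		close(fd);
	}
	if (n != (ssize_t)sizeof(buf) || outlen < 2 * sizeof(buf) + 1)
		return -1;
	for (size_t i = 0; i < sizeof(buf); i++)
		sprintf(out + 2 * i, "%02x", buf[i]);
	return 0;
}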

I understand that other people may have other uses for this capability, which 
would imply additional requirements.  I'm fine with switching to a different 
way of generating the unique ids.  My concerns are three:

1) Efficiency.  Computing a UUID on every request, for example, would probably 
be prohibitively expensive.

2) Portability.  I only have a very limited number of systems available to me 
at the moment (Ubuntu and OSX).  As things get more complex (i.e. involving 
crypto), they also get less portable, and I have no way to test my code on 
other systems.

3) My time.  I have a limited amount of time I can devote to this activity. I'm 
certainly willing to invest more time than I have if I can make this more 
useful to a wider audience.  We find haproxy a useful tool in our business and 
giving back something to the community is an obligation I take seriously.  On 
the other hand, I don't want to get dragged down a rat-hole implementing every 
conceivable bell and whistle.

So, with the above three points in mind, what's a reasonable path to continue 
this and get it to a point where it's worth including in the core distribution?


--
Roy Smith
roy.sm...@s7labs.com






Re: How does http_find_header() work?

2011-03-31 Thread Roy Smith
My intent was just to have a unique string that could be searched for in the 
logs.  Building it by smashing together the hostid, pid, timestamp, etc., was 
just a fast hack to get something unique.  I made one attempt to compact the 
string by running it through md5, but then I realized that the more bells and 
whistles I hung on this, the less portable it would be (i.e. not everybody 
might have the same md5 API I was using).

For my purpose, all I need is something that's unique.  If anything, rather 
than making it human readable, I think a better way to approach this would be 
to make it more compact, by running it through some kind of message digest and 
perhaps even printing it in some encoding more compact than hex (say, base64).
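
For what it's worth, base64 needs no external library at all.  Here's a 
self-contained sketch (standard alphabet, '=' padding), under which 16 raw 
bytes come out as 24 characters instead of 32 hex digits:

#include <stddef.h>

/* out must hold at least 4 * ((len + 2) / 3) + 1 bytes. */
static void b64_encode(const unsigned char *in, size_t len, char *out)
{
	static const char tbl[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
	size_t i, o = 0;

	for (i = 0; i + 2 < len; i += 3) {	/* full 3-byte groups */
		unsigned v = (in[i] << 16) | (in[i+1] << 8) | in[i+2];
		out[o++] = tbl[(v >> 18) & 63];
		out[o++] = tbl[(v >> 12) & 63];
		out[o++] = tbl[(v >> 6) & 63];
		out[o++] = tbl[v & 63];
	}
	if (len - i == 1) {			/* one trailing byte */
		unsigned v = in[i] << 16;
		out[o++] = tbl[(v >> 18) & 63];
		out[o++] = tbl[(v >> 12) & 63];
		out[o++] = '=';
		out[o++] = '=';
	} else if (len - i == 2) {		/* two trailing bytes */
		unsigned v = (in[i] << 16) | (in[i+1] << 8);
		out[o++] = tbl[(v >> 18) & 63];
		out[o++] = tbl[(v >> 12) & 63];
		out[o++] = tbl[(v >> 6) & 63];
		out[o++] = '=';
	}
	out[o] = '\0';
}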

I didn't really write a specification, but I think a critical part of the spec 
would be that the only guarantee about the id string is that it's unique, and 
that the specific format is subject to change without warning.  That would 
discourage people from trying to use it to embed whatever information seems 
useful at the moment.  The right way to recover additional information about 
the request is to use the id to correlate across logs.  For example, when we 
first discussed this in January, it was suggested (IIRC) that we might want to 
embed the IP address where the request came from.  While I can see how that 
might be useful, that information is already available.  If you see something 
in a downstream log that interests you and you want to know what IP it came 
from, use the unique id to find the corresponding entry in the front-end 
haproxy log, and the IP address will be there.


On Mar 31, 2011, at 4:43 AM, Bart van der Schans wrote:

> Hi,
> 
> Thx Roy, this would be very useful to have. I'm just wondering about
> the id format. If all the "fields" correspond to something meaningful,
> like host_id, pid, timestamp, etcetera, would it make sense to have
> them in a more human readable format?
> 
> Regards,
> Bart
> 
> On Thu, Mar 31, 2011 at 4:30 AM, Roy Smith  wrote:
>> Willy,
>> 
>> This turned out to be surprisingly straightforward.  Patch attached 
>> (against the 1.4.11 sources).
>> 
>> To enable generation of the X-Unique-Id headers, you add "unique-id" to a 
>> listen stanza in the config file.  This doesn't make any sense unless you're 
>> in http mode (although my code doesn't check for that, which could 
>> reasonably be considered a bug).  What this does is add a header that looks 
>> like:
>> 
>> X-Unique-Id: CB0A6819.4B7D.4D93DFDB.C69B.10
>> 
>> to each incoming request.  This gets done before the header capture 
>> processing happens, so you can use the existing "capture request header" to 
>> log the newly added headers.  There's nothing magic about the format of the 
>> Id code.  In the current version, it's just a mashup of the hostid, haproxy 
>> pid, a timestamp, and a sequence number.  The sequence numbers count up to 
>> 1000, and then the leading part is regenerated.  I'm sure there are better 
>> schemes that could be used.
>> 
>> Here's a sample config stanza:
>> 
>> listen test-nodes 0.0.0.0:19199
>>   mode http
>>   option httplog
>>   balance leastconn
>>   capture request header X-Unique-Id len 64
>>   unique-id
>>   server localhost localhost:9199 maxconn 8 weight 10 check inter 60s fastinter 60s rise 2
>> 
>> If there is already an X-Unique-Id header on the incoming request, it is left 
>> untouched.
>> 
>> A little documentation:
>> 
>> We've got (a probably very typical) web application which consists of many 
>> moving parts mashed together.  In our case, it's an haproxy front end, an 
>> nginx layer (which does SSL conversion and some static file serving), 
>> Apache/PHP for the main application logic, and a number of ancillary 
>> processes which the PHP code talks to over HTTP (possibly with more 
>> haproxies in the middle).  Plus mongodb.  Each of these moving parts 
>> generates a log file, but it's nearly impossible to correlate entries across 
>> the various logs.
>> 
>> To fix the problem, we're going to use haproxy to assign every incoming 
>> request a unique id.  All the various bits and pieces will log that id in 
>> their own log files, and pass it along in the HTTP requests they make to 
>> other services, which in turn will log it.  We're not yet sure how to deal 
>> with mongodb, but even if we can't get it to log our ids, we'll still have a 
>> very powerful tool for looking at overall performance through the entire 
>> application suite.

Re: How does http_find_header() work?

2011-03-31 Thread Roy Smith
Looking at the docs, I see some keywords that use "option" and some that 
don't.  Is there a general rule as to when a keyword should use "option"?  
Could you give me a use case where you would want to apply this conditionally?

As far as stripping headers goes, one of the problems is that the unique id 
insertion happens very early, before the header capture, so the inserted id can 
be logged.  I took a quick look through the code just now and couldn't find 
where the reqidel processing happens, but I would imagine it's after the header 
capture.


On Mar 31, 2011, at 3:04 AM, Bryan Talbot wrote:

> This would be useful, but having a format similar to what's currently
> used for forwardfor would be nice:
> 
> option uniqueid [{if | unless} <condition>] [ header <name> ]
> 
> I would also like to be sure that any incoming values for the header
> could be stripped (using reqidel) and still have the new one added
> properly.
> 
> -Bryan
> 
> 
> On Wed, Mar 30, 2011 at 7:30 PM, Roy Smith  wrote:
>> Willy,
>> 
>> This turned out to be surprisingly straightforward.  Patch attached 
>> (against the 1.4.11 sources).
>> 
>> To enable generation of the X-Unique-Id headers, you add "unique-id" to a 
>> listen stanza in the config file.  This doesn't make any sense unless you're 
>> in http mode (although my code doesn't check for that, which could 
>> reasonably be considered a bug).  What this does is add a header that looks 
>> like:
>> 
>> X-Unique-Id: CB0A6819.4B7D.4D93DFDB.C69B.10
>> 
>> to each incoming request.  This gets done before the header capture 
>> processing happens, so you can use the existing "capture request header" to 
>> log the newly added headers.  There's nothing magic about the format of the 
>> Id code.  In the current version, it's just a mashup of the hostid, haproxy 
>> pid, a timestamp, and a sequence number.  The sequence numbers count up to 
>> 1000, and then the leading part is regenerated.  I'm sure there are better 
>> schemes that could be used.
>> 
>> Here's a sample config stanza:
>> 
>> listen test-nodes 0.0.0.0:19199
>>   mode http
>>   option httplog
>>   balance leastconn
>>   capture request header X-Unique-Id len 64
>>   unique-id
>>   server localhost localhost:9199 maxconn 8 weight 10 check inter 60s fastinter 60s rise 2
>> 
>> If there is already an X-Unique-Id header on the incoming request, it is left 
>> untouched.
>> 
>> A little documentation:
>> 
>> We've got (a probably very typical) web application which consists of many 
>> moving parts mashed together.  In our case, it's an haproxy front end, an 
>> nginx layer (which does SSL conversion and some static file serving), 
>> Apache/PHP for the main application logic, and a number of ancillary 
>> processes which the PHP code talks to over HTTP (possibly with more 
>> haproxies in the middle).  Plus mongodb.  Each of these moving parts 
>> generates a log file, but it's nearly impossible to correlate entries across 
>> the various logs.
>> 
>> To fix the problem, we're going to use haproxy to assign every incoming 
>> request a unique id.  All the various bits and pieces will log that id in 
>> their own log files, and pass it along in the HTTP requests they make to 
>> other services, which in turn will log it.  We're not yet sure how to deal 
>> with mongodb, but even if we can't get it to log our ids, we'll still have a 
>> very powerful tool for looking at overall performance through the entire 
>> application suite.
>> 
>> Thanks so much for the assistance you provided, not to mention making 
>> haproxy available in the first place.  Is there any possibility you could 
>> pick this up and integrate it into a future version of haproxy?  Right now, 
>> we're maintaining this in a private fork, but I'd prefer not to have to do 
>> that.  I suspect this may also be useful for other people.  If there are any 
>> modifications I could make which would help you, please let me know.
>> 
>> 
>> 


--
Roy Smith
roy.sm...@s7labs.com






Re: How does http_find_header() work?

2011-03-30 Thread Roy Smith
Willy,

This turned out to be surprisingly straightforward.  Patch attached (against 
the 1.4.11 sources).

To enable generation of the X-Unique-Id headers, you add "unique-id" to a 
listen stanza in the config file.  This doesn't make any sense unless you're in 
http mode (although my code doesn't check for that, which could reasonably be 
considered a bug).  What this does is add a header that looks like:

X-Unique-Id: CB0A6819.4B7D.4D93DFDB.C69B.10

to each incoming request.  This gets done before the header capture processing 
happens, so you can use the existing "capture request header" to log the newly 
added headers.  There's nothing magic about the format of the Id code.  In the 
current version, it's just a mashup of the hostid, haproxy pid, a timestamp, 
and a sequence number.  The sequence numbers count up to 1000, and then the 
leading part is regenerated.  I'm sure there are better schemes that could be 
used.
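
For the curious, the generation logic is roughly the following (a simplified 
sketch, not the literal patch code; the field widths and the fourth, random 
field are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static char id_base[64];
static unsigned id_counter;

/* Rebuild the leading "hostid.pid.timestamp.random" part. */
static void refresh_id_base(void)
{
	snprintf(id_base, sizeof(id_base), "%08lX.%04X.%08lX.%04X",
	         (unsigned long)gethostid(), (unsigned)getpid(),
	         (unsigned long)time(NULL), (unsigned)(rand() & 0xFFFF));
}

/* Per request: the base plus a sequence number, regenerating the base
 * every time the sequence wraps at 1000. */
static void next_unique_id(char *out, size_t outlen)
{
	if (id_counter % 1000 == 0)
		refresh_id_base();
	snprintf(out, outlen, "%s.%X", id_base, id_counter++ % 1000);
}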

Here's a sample config stanza:

listen test-nodes 0.0.0.0:19199
   mode http
   option httplog
   balance leastconn
   capture request header X-Unique-Id len 64
   unique-id
   server localhost localhost:9199 maxconn 8 weight 10 check inter 60s fastinter 60s rise 2

If there is already an X-Unique-Id header on the incoming request, it is left 
untouched.

A little documentation:

We've got (a probably very typical) web application which consists of many 
moving parts mashed together.  In our case, it's an haproxy front end, an nginx 
layer (which does SSL conversion and some static file serving), Apache/PHP for 
the main application logic, and a number of ancillary processes which the PHP 
code talks to over HTTP (possibly with more haproxies in the middle).  Plus 
mongodb.  Each of these moving parts generates a log file, but it's nearly 
impossible to correlate entries across the various logs.

To fix the problem, we're going to use haproxy to assign every incoming request 
a unique id.  All the various bits and pieces will log that id in their own log 
files, and pass it along in the HTTP requests they make to other services, 
which in turn will log it.  We're not yet sure how to deal with mongodb, but 
even if we can't get it to log our ids, we'll still have a very powerful tool 
for looking at overall performance through the entire application suite.

Thanks so much for the assistance you provided, not to mention making haproxy 
available in the first place.  Is there any possibility you could pick this up 
and integrate it into a future version of haproxy?  Right now, we're 
maintaining this in a private fork, but I'd prefer not to have to do that.  I 
suspect this may also be useful for other people.  If there are any modifications 
I could make which would help you, please let me know.




patch
Description: Binary data


Re: How does http_find_header() work?

2011-03-28 Thread Roy Smith
On Mon, 2011-03-28 at 07:22 +0200, Willy Tarreau wrote:

> On Sun, Mar 27, 2011 at 09:28:15PM -0400, Roy Smith wrote:
> > Thanks!  Looks like "capture request header" does exactly what I want.
> > 
> > Well, mostly.  This gets me logging if the header exists.  Now I need to 
> > figure out how to insert the header if it doesn't (taking us back to the 
> > conversation we had in January).
> 
> Then maybe you just have to change the code in the early processing,
> just before the header captures ?


OK, this looks like it works.  I still need to do something smarter
about generating the unique ids (some combination of hostid, pid,
time, and a counter should do it), and add a way to control this from
the config file.  Before I get too deep into this, does the following
patch (against haproxy-1.4.11) look reasonable for the basics?


--- a/src/proto_http.c  Sat Mar 26 12:38:10 2011 -0400
+++ b/src/proto_http.c  Mon Mar 28 13:41:04 2011 -0400
@@ -2713,6 +2713,19 @@
/* transfer length unknown*/
txn->flags &= ~TX_REQ_XFER_LEN;
 
+	/* 4b: Insert header for tracing, if needed */
+	ctx.idx = 0;
+	{
+		const char hdr[] = "X-Unique-Id";
+		const size_t hdr_len = sizeof(hdr);
+		if (! http_find_header(hdr, msg->sol, &txn->hdr_idx, &ctx)) {
+			static int id_counter = 0;
+			char hdr_val[hdr_len + sizeof(": 1234567890") + 1];
+			snprintf(hdr_val, sizeof(hdr_val), "%s: %d", hdr, id_counter++);
+			http_header_add_tail(req, &txn->req, &txn->hdr_idx, hdr_val);
+		}
+	}
+
/* 5: we may need to capture headers */
if (unlikely((s->logs.logwait & LW_REQHDR) && s->fe->req_cap))
capture_headers(msg->sol, &txn->hdr_idx,
@@ -2757,7 +2770,6 @@
 */
 
use_close_only = 0;
-   ctx.idx = 0;
/* set TE_CHNK and XFER_LEN only if "chunked" is seen last */
while ((txn->flags & TX_REQ_VER_11) &&
   http_find_header2("Transfer-Encoding", 17, msg->sol, 
&txn->hdr_idx, &ctx)) {




Re: How does http_find_header() work?

2011-03-27 Thread Roy Smith
Thanks!  Looks like "capture request header" does exactly what I want.

Well, mostly.  This gets me logging if the header exists.  Now I need to figure 
out how to insert the header if it doesn't (taking us back to the conversation 
we had in January).




On Mar 27, 2011, at 2:22 PM, Willy Tarreau wrote:

> If you need to log some request data, you have to do that during the request
> processing, and copy them somewhere in the session. You should probably use
> the header capture for that, they'll do that for free. Alternatively, you may
> add a fixed size string to the session struct and use it for your needs.


--
Roy Smith
roy.sm...@s7labs.com






How does http_find_header() work?

2011-03-27 Thread Roy Smith
I'm trying to modify haproxy-1.4.11 (on Ubuntu) to do some additional logging.  
I've added some experimental code to http_sess_log():

--- a/src/proto_http.c  Sat Mar 26 12:38:10 2011 -0400
+++ b/src/proto_http.c  Sat Mar 26 19:18:30 2011 -0400
@@ -1081,6 +1081,7 @@
static char tmpline[MAX_SYSLOG_LEN];
int t_request;
int hdr;
+   struct hdr_ctx ctx;

/* if we don't want to log normal traffic, return now */
err = (s->flags & (SN_ERR_MASK | SN_REDISP)) ||
@@ -1149,6 +1150,14 @@
  '#', url_encode_map, uri);
*(h++) = '"';
}
+
+	if (http_find_header("X-Unique-ID", txn->req.sol, &txn->hdr_idx, &ctx)) {
+//		if (h < tmpline + ctx.vlen - 4) {
+//			memcpy(tmpline, ctx.val, ctx.vlen);
+//			h += ctx.vlen;
+//		}
+	}
+
*h = '\0';

svid = (tolog & LW_SVID) ?

when this code runs, I get a segfault as soon as it tries to log anything.  I'm 
obviously calling http_find_header() wrong, but I can't figure out what I 
should be doing.  When I look at the resulting core file with gdb, it shows 
that txn->req.sol is not initialized.

#0  http_find_header2 (name=0x44d6af "X-Unique-ID", len=11, sol=0x1, idx=0x795960, ctx=0x7fff37575a80) at src/proto_http.c:513
513 eol = sol + idx->v[cur_idx].len;

Is there something I need to be doing to initialize the sol element of the 
request?


--
Roy Smith
roy.sm...@s7labs.com




Parsing haproxy log files (python)

2011-03-18 Thread Roy Smith
Before I reinvent the wheel, has anybody already written code to parse
haproxy log messages with Python?





Re: Adding a unique id header to each request?

2011-01-27 Thread Roy Smith
It sounds like you have given this more thought than I have (for which I am 
grateful).

Still, my need is for a unique tag.  If you provide me with a unique tag which 
also encodes some useful information about source and time, and is guaranteed 
not to roll over in the face of a flood attack, and all the other good things 
you describe, then I'm still happy because you have met my basic need :-)

I do see your point about stripping incoming ids.  It seems to me that this 
should be a configurable option.  If we had several haproxys (haproxies?) 
stacked, I guess you would want the first one to strip existing tags and all 
the later ones to keep them, since the tag they see would be the one added by 
the first proxy in the chain.
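
Something like this hypothetical stanza is what I have in mind for the 
outermost proxy (assuming some eventual keyword, call it "unique-id", and that 
the id insertion would happen after reqidel; the inner proxies would simply 
omit the reqidel line):

listen outer-www 0.0.0.0:80
   mode http
   # strip any id injected by the client, then add our own
   reqidel ^X-Unique-Id:
   unique-id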


On Jan 27, 2011, at 5:44 PM, Willy Tarreau wrote:

> Hi Roy,
> 
> On Thu, Jan 27, 2011 at 02:51:37PM -0500, Roy Smith wrote:
>>> Try to think about these cases:
>>> - what to do with reqids that are already present in requests
>> 
>> I don't see how that would happen.  Maybe if we did something like stack one
>> haproxy behind another, but I don't see any reason we would do that.
> 
> Maybe you have a simple enough setup. I know places where you can pass through
> between 5 and 6 haproxies along a whole chain, simply because application
> components are chained and each stage includes a load balancing feature.
> 
>> But, if it were to happen somehow, I think we would just leave it untouched
>> (and log that we saw it).
> 
> That's just the most common thing to do for the inner instances, but the outer
> one needs an easy way to strip it, otherwise external users can inject the ID
> they want into your system.
> 
>> The goal is to have a unique identifier so that every process that's
>> involved in responding to a request can log the id, allowing us to correlate
>> logs. If the incoming request already has such an id, there's no reason to
>> change it.
> 
> Yes you have, see above ;-)
> 
>>> - what to do with reqids in responses
>>>   => compare them with the request's, block if they do not match
>>>   => delete them or not depending on where you're responding
>> 
>> I'm not sure I understand what you're asking.
> 
> Once you deploy unique IDs, it's common to seek better application
> integration and have deeper application components return the ID they
> received in the responses. That way, the outer component can compare
> the ID it added with the ID it received in the response and ensure that
> there was no session crossing in the whole chain, as it unfortunately
> happens from time to time with buggy applications or components (mainly
> in threaded environments).
> 
>>> - what to log
>>>   => do we always need to log a full reqid or can we sometimes just
>>>  log one part of it
>> 
>> What do you mean by logging "one part" of an ID?  The ID is just a unique 
>> tag.  I don't understand how it can be divided into parts.
> 
> An ID can only be unique in a limited space * time. When you have multiple
> processes running on the same machine, a per-process counter is not enough
> anymore so you need to discriminate on the process too, otherwise you end
> up generating multiple identical "unique IDs". Then you add some machines
> and you repeat the same process. Then you take into account the risk of
> rollover of the values and you have to add a timestamp.
> 
> After about 10 years of feedback using unique IDs, I can say that some
> features are definitely important:
> 
>  - having a timestamp allows you to easily sort your events and correlate
>    them by time. It also shows you where to look
> 
>  - having some origin information (whether it's the instance which received
>    the first event or the source address itself) helps a lot to correlate logs
>    when some are missing. Logs are *always* missing when you want to
>    correlate large amounts. You discover that one FS got full or that one
>    syslog server was being restarted, or simply that you're dropping a few
>    of them on the wire or in system queues, etc... When you can identify
>    *where* the ID was created, you can reconstitute the missing parts of
>    the chain (assuming you're not missing too many, of course).
> 
>  - having some source information generally helps you quickly search for other
>    occurrences of a similar suspicious event at places where it's hard to
>    log source information. However, it's far from being a requirement, as
>    there are always alternatives. It's just that it helps a lot.
> 
>  - having the ability to certify with good enough confidence that you'

Re: Adding a unique id header to each request?

2011-01-27 Thread Roy Smith

On Jan 26, 2011, at 1:09 AM, Willy Tarreau wrote:

> Hi Roy,
> 
> On Sun, Jan 23, 2011 at 10:04:57AM -0500, Roy Smith wrote:
>> Cool.  What can I do to help?
> 
> Could you try to identify precisely how it would be used at your site?
> Try to think about these cases:
>  - what to do with reqids that are already present in requests

I don't see how that would happen.  Maybe if we did something like stack one 
haproxy behind another, but I don't see any reason we would do that.  But, if 
it were to happen somehow, I think we would just leave it untouched (and log 
that we saw it).  The goal is to have a unique identifier so that every process 
that's involved in responding to a request can log the id, allowing us to 
correlate logs.  If the incoming request already has such an id, there's no 
reason to change it.

> - what to do with reqids in responses
>    => compare them with the request's, block if they do not match
>    => delete them or not depending on where you're responding

I'm not sure I understand what you're asking.

>  - what to log
>    => do we always need to log a full reqid or can we sometimes just
>       log one part of it

What do you mean by logging "one part" of an ID?  The ID is just a unique tag.  
I don't understand how it can be divided into parts.

>  - what info to include in it.

For what I want, the only information it needs to have is a unique tag so we 
can trace a request through the system.  I would think a constant string 
defined in the config file plus a counter would be sufficient.

> One of my customers has designed one years
> ago and has been asking for its integration into haproxy. Lack of time
> and priorities, etc... It basically includes src ip+port, dest ip+port,
> instance name, timestamp, pid and counter.

I could see how people might want all that stuff, but for my use, just the 
counter would be sufficient.

> I identified that it had to
> be extended to support IPv6 and possibly more than 64k req/s/instance.
> Overall I'm fine with it because it has made it possible to trace requests
> through a long chain for years. But maybe you have different needs that
> should be studied before something is implemented. For instance, it's
> possible that some users might prefer to hash it so that info present
> in it cannot be reversed when passing it to external service providers.

Again, I could see why people might have those concerns, but for my use, just a 
simple counter is fine.  All I need is a unique tag attached to every incoming 
request.

--
Roy Smith
r...@panix.com







Re: Adding a unique id header to each request?

2011-01-23 Thread Roy Smith
Cool.  What can I do to help?


On Jan 23, 2011, at 12:51 AM, Willy Tarreau wrote:

> Hi Roy,
> 
> On Sat, Jan 22, 2011 at 05:26:12PM -0500, Roy Smith wrote:
>> Is there a way to have haproxy generate (and log) a unique ID for every 
>> incoming request, and add that ID as a header line to each outgoing proxied 
>> request, i.e. something like:
>> 
>> X-Unique-ID: 12345ABCDE
>> 
>> It doesn't matter much what the identifier is, as long as it's unique and 
>> logged.  The idea here is to be able to correlate log entries from all the 
>> various moving pieces that are involved in responding to a request.
> 
> I agree. We've had this in the TODO list since version 1.1:
> 
>  - insert/learn/check/log unique request ID, and add the ability
>    to block bad responses.
> 
> But now we've planned to work on it at Exceliance. We need to make it
> configurable enough so that it can handle various users' needs.
> 
> Regards,
> Willy
> 


--
Roy Smith
r...@panix.com








Adding a unique id header to each request?

2011-01-22 Thread Roy Smith
Is there a way to have haproxy generate (and log) a unique ID for every 
incoming request, and add that ID as a header line to each outgoing proxied 
request, i.e. something like:

X-Unique-ID: 12345ABCDE

It doesn't matter much what the identifier is, as long as it's unique and 
logged.  The idea here is to be able to correlate log entries from all the 
various moving pieces that are involved in responding to a request.

--
Roy Smith
r...@panix.com








Re: Can't get server check to work with virtual hosts

2010-08-17 Thread Roy Smith
Ah, OK, that's getting me closer.  Thanks!  Now I've got

>option httpchk GET /index.html HTTP/1.1\r\nHost:test1.cluster6.corp.amiestreet.com
>server webA test1.cluster6.corp.amiestreet.com:80 cookie A check inter 2s
>server webB test2.cluster6.corp.amiestreet.com:80 cookie B check inter 2s

and it's sending the correct headers, at least for test1.  The problem is that 
it's also sending "Host: test1..." to test2.  I don't see how to configure it 
to send each host the correct header.


On Aug 17, 2010, at 5:23 PM, Graeme Donaldson wrote:

> Hi Roy
> 
> You simply need to send an HTTP 1.1 request with a Host: header in the http 
> check, like this:
> 
> option httpchk GET /index.html\r\nHost: vhost.example.com
> 
> Graeme.
> 
> On 17 August 2010 23:19, Roy Smith  wrote:
> I'm running "HA-Proxy version 1.3.22" on Ubuntu Linux.  I've got apache set 
> up with two virtual hosts, and I want to use haproxy to round-robin between 
> them.  Ultimately, these virtual hosts will be on different machines, but for 
> my testing environment, they're on the same box.  I've got a config file I'm 
> using for testing:
> 
> > global
> > maxconn 100
> >
> > listen webfarm cluster6:23000
> >mode http
> >option httpclose
> >balance roundrobin
> >cookie SERVERID insert indirect
> >timeout server 5s
> >timeout client 5s
> >timeout connect 5s
> >option httpchk GET /index.html HTTP/1.0
> >server webA test1.cluster6.corp.amiestreet.com:80 cookie A check inter 2s
> >server webB test2.cluster6.corp.amiestreet.com:80 cookie B check inter 2s
> 
> 
> As soon as I start haproxy up, I get:
> 
> > [WARNING] 228/171101 (20636) : Server webfarm/webA is DOWN. 1 active and 0 
> > backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> > [WARNING] 228/171102 (20636) : Server webfarm/webB is DOWN. 0 active and 0 
> > backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> > [ALERT] 228/171102 (20636) : proxy 'webfarm' has no server available!
> 
> 
> The problem seems to be that when it sends the HTTP requests to apache, it 
> leaves out the Host: header.  For example, strace shows that wget does:
> 
> > write(3, "GET /index.html HTTP/1.0\r\nUser-Agent: Wget/1.12 (linux-gnu)\r\nAccept: */*\r\nHost: test1.cluster6.corp.amiestreet.com\r\nConnection: Keep-Alive\r\n\r\n", 142) = 142
> 
> 
> but haproxy just does:
> 
> > sendto(5, "GET /index.html HTTP/1.0\r\n\r\n", 28, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 28
> 
> How do I get haproxy to work and play well with virtual hosts?
> 
> 
> 



Can't get server check to work with virtual hosts

2010-08-17 Thread Roy Smith
I'm running "HA-Proxy version 1.3.22" on Ubuntu Linux.  I've got apache set up 
with two virtual hosts, and I want to use haproxy to round-robin between them.  
Ultimately, these virtual hosts will be on different machines, but for my 
testing environment, they're on the same box.  I've got a config file I'm using 
for testing:

> global
> maxconn 100
> 
> listen webfarm cluster6:23000
>mode http
>option httpclose
>balance roundrobin
>cookie SERVERID insert indirect
>timeout server 5s
>timeout client 5s
>timeout connect 5s
>option httpchk GET /index.html HTTP/1.0
>server webA test1.cluster6.corp.amiestreet.com:80 cookie A check inter 2s
>server webB test2.cluster6.corp.amiestreet.com:80 cookie B check inter 2s


As soon as I start haproxy up, I get:

> [WARNING] 228/171101 (20636) : Server webfarm/webA is DOWN. 1 active and 0 
> backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> [WARNING] 228/171102 (20636) : Server webfarm/webB is DOWN. 0 active and 0 
> backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> [ALERT] 228/171102 (20636) : proxy 'webfarm' has no server available!


The problem seems to be that when it sends the HTTP requests to apache, it 
leaves out the Host: header.  For example, strace shows that wget does:

> write(3, "GET /index.html HTTP/1.0\r\nUser-Agent: Wget/1.12 (linux-gnu)\r\nAccept: */*\r\nHost: test1.cluster6.corp.amiestreet.com\r\nConnection: Keep-Alive\r\n\r\n", 142) = 142


but haproxy just does:

> sendto(5, "GET /index.html HTTP/1.0\r\n\r\n", 28, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 28

How do I get haproxy to work and play well with virtual hosts?