Re: "option httpchk" is reporting servers as down when they're not
Hi Thomas,

On Mon, Mar 09, 2009 at 05:20:49PM -0400, Allen, Thomas wrote:
> Hi Willy,
>
> Hm, changing to "60s" for each gave me 100% 504 errors, I removed all
> three. Bad idea, I know, but at least it works then.

Then use "6", that's the old way of doing it :-)

> I'm running 1.2.18 because the HAProxy homepage calls it the Latest
> version.

Ah OK, version 1.2 did not have the time units. Well, in fact it's not exactly marked as the only latest version; it's the latest version of branch 1.2, and 1.2 is the only branch not tainted by development, I admit.

> I've removed all cookies from this IP, cleared my cache, and still it
> seems that only one server is being hit. But the stats page reports an
> equal distribution, so it's anybody's guess. What would be a simple way
> to log the distribution? I find it difficult to determine this even in
> debug mode (I'm running the proxy in daemon mode, of course).

It is in the logs: you have the server's name (assuming you're logging with "option httplog"). Something like this is possible if you're playing with only one client: if the number of objects on a page is a multiple of the number of servers and you're in round-robin mode, then each time you fetch a page you'll alternately fetch objects from both servers and come back to the first one for the next click. Of course that does not happen as soon as you have at least one other client. And since I saw 20 sessions on your stats after my access, I'm tempted to think that it could be related.

Regards,
Willy
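Willy's "use 6" remark refers to the 1.2-branch syntax, where timeout values are raw milliseconds with no unit suffix. A minimal sketch of that old-style syntax (the listener address and server lines are placeholders, not from the thread):

    # HAProxy 1.2: timeout values are plain milliseconds, no unit suffix
    listen web 0.0.0.0:80
        mode http
        balance roundrobin
        option httplog
        clitimeout 60000    # 60 s of client inactivity
        contimeout 5000     # 5 s to establish the server connection
        srvtimeout 60000    # 60 s of server inactivity
        server samp1 10.0.0.1:80 check
        server samp2 10.0.0.2:80 check

With "option httplog", each log line includes the name of the server that handled the request, which is what makes the distribution visible.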
RE: "option httpchk" is reporting servers as down when they're not
Hi Willy,

Hm, changing to "60s" for each gave me 100% 504 errors, I removed all three. Bad idea, I know, but at least it works then.

I'm running 1.2.18 because the HAProxy homepage calls it the Latest version.

I've removed all cookies from this IP, cleared my cache, and still it seems that only one server is being hit. But the stats page reports an equal distribution, so it's anybody's guess. What would be a simple way to log the distribution? I find it difficult to determine this even in debug mode (I'm running the proxy in daemon mode, of course).

Thanks,
Thomas Allen
Web Developer, ASCE
703.295.6355

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Monday, March 09, 2009 4:58 PM
To: Allen, Thomas
Cc: Jeffrey 'jf' Lim; haproxy@formilux.org
Subject: Re: "option httpchk" is reporting servers as down when they're not

On Mon, Mar 09, 2009 at 04:15:34PM -0400, Allen, Thomas wrote:
> I used the unit 'S' for my timeouts, as in
>
> clitimeout 60S
> contimeout 60S
> srvtimeout 60S
>
> Is that to be avoided? I assumed it meant "seconds."

OK, it's just a minor problem. You have to use a lower-case "s": 60s. It's stupid that the parser did not catch this mistake; I should improve it. By default it ignores unknown chars, so you clearly had 60 ms here. BTW, there's no use in setting large contimeouts; you should usually stay with lower values such as 5-10s. Oh, BTW, what version are you running? Your stats page looks old. The time units were introduced in 1.3.14, so I hope you're at least at this level.

> I'm using roundrobin and adding the httpclose option. I've been using
> cookie stickiness (which will be important for this website), but after
> disabling this stickiness, I get the same results. I tried clearing out
> the server cookie before and opening the page in multiple browsers, and
> still got these results.
Then it is possible that haproxy could not manage to connect to your server in 60 ms, then immediately retried on the other one, and stuck to that one.

Regards,
Willy
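The fix Willy describes, with lower-case unit suffixes (supported from 1.3.14 on), looks like this as a sketch (the surrounding section is a placeholder):

    defaults
        mode http
        clitimeout 60s   # lower-case "s" means seconds; "60S" was parsed as 60 ms
        srvtimeout 60s   # server must answer within 60 s or a 504 is returned
        contimeout 5s    # connection setup only; keep this short, e.g. 5-10s

Per Willy's explanation, the parser ignores unknown characters such as an upper-case "S", so "60S" silently degrades to 60 of the default unit (milliseconds), which explains both the instant retries and the 504s.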
Re: "option httpchk" is reporting servers as down when they're not
On Mon, Mar 09, 2009 at 04:15:34PM -0400, Allen, Thomas wrote:
> I used the unit 'S' for my timeouts, as in
>
> clitimeout 60S
> contimeout 60S
> srvtimeout 60S
>
> Is that to be avoided? I assumed it meant "seconds."

OK, it's just a minor problem. You have to use a lower-case "s": 60s. It's stupid that the parser did not catch this mistake; I should improve it. By default it ignores unknown chars, so you clearly had 60 ms here. BTW, there's no use in setting large contimeouts; you should usually stay with lower values such as 5-10s. Oh, BTW, what version are you running? Your stats page looks old. The time units were introduced in 1.3.14, so I hope you're at least at this level.

> I'm using roundrobin and adding the httpclose option. I've been using
> cookie stickiness (which will be important for this website), but after
> disabling this stickiness, I get the same results. I tried clearing out
> the server cookie before and opening the page in multiple browsers, and
> still got these results.

Then it is possible that haproxy could not manage to connect to your server in 60 ms, then immediately retried on the other one, and stuck to that one.

Regards,
Willy
RE: "option httpchk" is reporting servers as down when they're not
I used the unit 'S' for my timeouts, as in

clitimeout 60S
contimeout 60S
srvtimeout 60S

Is that to be avoided? I assumed it meant "seconds."

I'm using roundrobin and adding the httpclose option. I've been using cookie stickiness (which will be important for this website), but after disabling this stickiness, I get the same results. I tried clearing out the server cookie before and opening the page in multiple browsers, and still got these results.

Thanks,
Thomas Allen
Web Developer, ASCE
703.295.6355

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Monday, March 09, 2009 4:09 PM
To: Allen, Thomas
Cc: Jeffrey 'jf' Lim; haproxy@formilux.org
Subject: Re: "option httpchk" is reporting servers as down when they're not

Hi Thomas,

Just replying quickly, as I'm in a hurry.

On Mon, Mar 09, 2009 at 04:01:29PM -0400, Allen, Thomas wrote:
> That, along with specifying HTTP/1.1, did it, so thanks! What should I
> load into "Host:"? It seems to work fine with "www", but I'd prefer to
> use something I understand. Please keep in mind that none of this is
> yet associated with a domain, so www.mydomain.com would be inaccurate.

Of course, www.mydomain.com was an example. Often web servers are fine with just "www", but normally you should use the same host name that your server will respond to. Sometimes you can also put the server's IP address. Some servers also accept an empty header (so just "Host:" and nothing else).

> Beginning very recently, I get a 504 Gateway Timeout for about 30% of
> all requests. What could be causing this?

Responses taking too much time. Are you sure that your "timeout server" is properly set? Maybe you have put times in milliseconds there thinking they were in seconds?

> More importantly, I'm not convinced that HAProxy is successfully
> forwarding requests to both servers, although I could be wrong. As you
> can see on the two app instances, each reports a separate internal IP
> to help diagnose. It appears that only SAMP1 receives requests,
> although both pass health checks now.

I see both servers receiving 20 sessions, so that seems fine. Among the possible reasons for what you observe:
- ensure you're using "balance roundrobin" and not any sort of hash or source-based algorithm;
- ensure that you have not enabled cookie stickiness, or that you close your browser before retrying;
- ensure that you have "option httpclose" and that your browser is not simply pushing all requests in the same session tunnelled to the first server haproxy connected to.

Regards,
Willy
Re: "option httpchk" is reporting servers as down when they're not
Hi Thomas,

Just replying quickly, as I'm in a hurry.

On Mon, Mar 09, 2009 at 04:01:29PM -0400, Allen, Thomas wrote:
> That, along with specifying HTTP/1.1, did it, so thanks! What should I
> load into "Host:"? It seems to work fine with "www", but I'd prefer to
> use something I understand. Please keep in mind that none of this is
> yet associated with a domain, so www.mydomain.com would be inaccurate.

Of course, www.mydomain.com was an example. Often web servers are fine with just "www", but normally you should use the same host name that your server will respond to. Sometimes you can also put the server's IP address. Some servers also accept an empty header (so just "Host:" and nothing else).

> Beginning very recently, I get a 504 Gateway Timeout for about 30% of
> all requests. What could be causing this?

Responses taking too much time. Are you sure that your "timeout server" is properly set? Maybe you have put times in milliseconds there thinking they were in seconds?

> More importantly, I'm not convinced that HAProxy is successfully
> forwarding requests to both servers, although I could be wrong. As you
> can see on the two app instances, each reports a separate internal IP
> to help diagnose. It appears that only SAMP1 receives requests,
> although both pass health checks now.

I see both servers receiving 20 sessions, so that seems fine. Among the possible reasons for what you observe:
- ensure you're using "balance roundrobin" and not any sort of hash or source-based algorithm;
- ensure that you have not enabled cookie stickiness, or that you close your browser before retrying;
- ensure that you have "option httpclose" and that your browser is not simply pushing all requests in the same session tunnelled to the first server haproxy connected to.

Regards,
Willy
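A sketch of a configuration matching Willy's checklist (the addresses and server names are placeholders; only the keywords come from the thread):

    listen web 0.0.0.0:80
        mode http
        balance roundrobin   # not a hash or source-based algorithm
        option httpclose     # close after each request so the next one is balanced again
        # note: no "cookie" keyword on the server lines, i.e. no stickiness
        server samp1 10.0.0.1:80 check
        server samp2 10.0.0.2:80 check

Without "option httpclose", a browser reusing one keep-alive connection sends every request down the same tunnel, so all of them land on whichever server the first request reached.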
RE: "option httpchk" is reporting servers as down when they're not
That, along with specifying HTTP/1.1, did it, so thanks! What should I load into "Host:"? It seems to work fine with "www", but I'd prefer to use something I understand. Please keep in mind that none of this is yet associated with a domain, so www.mydomain.com would be inaccurate.

Beginning very recently, I get a 504 Gateway Timeout for about 30% of all requests. What could be causing this? More importantly, I'm not convinced that HAProxy is successfully forwarding requests to both servers, although I could be wrong. As you can see on the two app instances, each reports a separate internal IP to help diagnose. It appears that only SAMP1 receives requests, although both pass health checks now.

Load balancer: http://174.129.240.119/ and stats (temporarily unblocked): http://174.129.240.119/status/lb
SAMP1: http://174.129.251.234/
SAMP2: http://174.129.244.252/

Thanks,
Thomas Allen
Web Developer, ASCE
703.295.6355

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: Friday, March 06, 2009 1:39 PM
To: Allen, Thomas
Cc: Jeffrey 'jf' Lim; haproxy@formilux.org
Subject: Re: "option httpchk" is reporting servers as down when they're not

Hi Thomas,

On Thu, Mar 05, 2009 at 08:45:20AM -0500, Allen, Thomas wrote:
> Hi Jeff,
>
> The thing is that if I don't include the health check, the load
> balancer works fine and each server receives equal distribution. I
> have no idea why the servers would be reported as "down" but still
> work when unchecked.

It is possible that your servers expect the "Host:" header to be set during the checks. There's a trick to do it right now (don't forget to escape spaces):

    option httpchk GET /index.php HTTP/1.0\r\nHost:\ www.mydomain.com

Also, you should check the server's logs to see why it is reporting the service as down. And as a last resort, a tcpdump of the traffic between haproxy and a failed server will show you both the request and the complete error from the server.

Regards,
Willy
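Putting the pieces from this exchange together, a minimal health-check sketch might look like this (the listener, check path, host name and server addresses are placeholders; Thomas reported that an HTTP/1.1 request line was what his servers needed):

    listen web 0.0.0.0:80
        mode http
        balance roundrobin
        # HTTP/1.1 check with an escaped Host header, as discussed above
        option httpchk GET /index.php HTTP/1.1\r\nHost:\ www.mydomain.com
        server samp1 10.0.0.1:80 check
        server samp2 10.0.0.2:80 check

The "check" keyword on each server line is what enables the httpchk probe for that server; without it the server is never tested and is always considered up.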
Re: HaProxy ACL (fwd) - access control
On Wed, 11 Feb 2009, Willy Tarreau wrote:

> Hi Krzysztof,

Hi Willy,

First, please excuse that it took me nearly one month to reply to your letter, shame on me. :(

> On Wed, Feb 11, 2009 at 05:58:42PM +0100, Krzysztof Oledzki wrote:
>> As you are probably aware, recently there was a mail quoted below,
>> asking about the redirect feature. It encouraged me to think a little
>> more about it, so: shouldn't we rather put the feature into the
>> use_backend chain instead of "req allow/deny/block"? This would simply
>> allow one to do something like:
>>
>> --- cut here ---
>> use_backend www_php4 if payment
>> redirect prefix https://pay.xxx.bg if pay_xxx
>> default_backend www_php4
>> --- cut here ---
>>
>> instead of:
>>
>> --- cut here ---
>> use_backend www_php4 if payment
>> redirect prefix https://pay.xxx.bg if pay_xxx !payment
>> default_backend www_php4
>> --- cut here ---
>>
>> It would also allow creating more complicated rules without duplicating
>> the whole "use_backend xxx if abc" section, which is sometimes *very*
>> complicated, into "req allow if abc". What is your opinion?
>
> To be fair, this is how I believed it worked. But I now remember there
> is a specific "redirect rules" list for them, so I believed wrong. I've
> just looked at the diagram I posted yesterday, and use_backend is
> separated there. Now that I'm thinking about it, I believe the reason I
> wanted use_backend in a different list was because it is not really an
> action. I mean, "allow", "deny", "redirect", "tarpit", ... all terminate
> processing or rule evaluation. Having "use_backend" stop evaluation is
> problematic, as it's not an action, it's a dynamic configuration. In
> fact, it's problematic for existing setups mostly, because new setups
> could be written by carefully interleaving use_backend statements in the
> middle of the allow/redirect/deny rules.

Yes, your concern is perfectly right. Changing such behavior would be wrong and most likely useless.
> And I agree that for the long term, it's better to remember that rule
> ordering is what matters than remembering which one applies before or
> after which one.

I doubt it will ever be possible to ensure 100% rule ordering; it is especially inconvenient for automatically generated configs that allow including manually created rulesets.

> In fact, I think that having use_backend evaluated last makes sense
> since it's really how it's supposed to work. But I agree that for the
> poor guy writing the rules, it would be easier to be able to put it
> before other rules. In fact, there's a solution consisting in using
> "allow" to escape the rules, but it requires that the rule is duplicated
> for the use_backend one, which is not always very convenient.

So, after such a long time of thinking (1 month, right? ;)), I believe we should keep it as-is and prevent duplication by simply writing a "best practices" chapter in the docs and suggesting the use of a dedicated backend to handle redirects:

    backend pax_redirects
        redirect prefix https://pay.xxx.bg if pay_xxx
        (...)

    backend XX
        use_backend www_php4 if payment
        use_backend pax_redirects if pay_xxx
        default_backend www_php4

We may also add a warning or even disallow using use_backend and redirect in the same proxy. I believe it is the best solution - we already have everything that is needed, let's use it.

> Hmmm, now I remember why redirect was in another list. I'm pretty sure
> the reason was that we wanted to be able to redirect after an allow
> (which is not necessarily a strong requirement). There are other
> aspects to consider:
>  - eventually, tcp will support use_backend/allow/deny/tarpit
>  - outgoing processing will obviously never support use_backend
>  - the backend's incoming rules will support use_server (to force a
>    server) but not use_backend.
> All these elements may easily be solved by a simple rule list and strict
> action enforcing to ensure we never have the wrong action at the wrong
> place. I'm still afraid of breaking existing setups.
> Same problem as for the "block" keyword after all. Don't you think we
> should create a new "set-backend" keyword to merge it with the whole
> list, and let "use_backend" slowly die (for instance, we mark it
> deprecated in version 1.4 with a big warning)? This means we would
> have:
>
>   block
>   allow|deny|redirect|tarpit|set-backend
>   use_backend

I'm not sure. I like the idea of two-step request processing:
 - first decide if we need to "allow|deny|tarpit" a request;
 - then decide which backend to use (frontend) or which server/redirect (backend).

We may add a set-backend directive, but I think use_backend will still be useful, so keeping both could be hard to maintain in the long term.

> Another idea would consist in splitting access rules from traffic
> management rules. I mean, "allow", "deny", "tarpit" grant or deny
> access. Even the tarpit could be considered as an extended deny. Then
> we have "use_backend", "redirect", and maybe later things about QoS,
> logging, etc... which would make sense in a separate list.

Yes. This is definitely the way I think
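The ordering behaviour under discussion can be illustrated with a small frontend sketch (the ACL definitions and bind address are invented placeholders; only the routing lines mirror Krzysztof's example). Because redirect rules live in their own list and are evaluated before use_backend, the !payment guard is what keeps a payment request from being redirected:

    frontend www
        bind :80
        mode http
        # placeholder ACLs, invented for illustration
        acl payment url_beg /payment
        acl pay_xxx url_beg /pay
        # redirect rules are evaluated before use_backend, hence the guard
        redirect prefix https://pay.xxx.bg if pay_xxx !payment
        use_backend www_php4 if payment
        default_backend www_php4

If use_backend and redirect were merged into one ordered list, the guard could be dropped and the first matching rule would simply win, which is the simplification Krzysztof was proposing.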