Re: "option httpchk" is reporting servers as down when they're not

2009-03-09 Thread Willy Tarreau
Hi Thomas,

On Mon, Mar 09, 2009 at 05:20:49PM -0400, Allen, Thomas wrote:
> Hi Willy,
> 
> Hm, changing to "60s" for each gave me 100% 504 errors, I removed all
> three. Bad idea, I know, but at least it works then. 

then use "6", that's the old way of doing it :-)

> I'm running 1.2.18 because the HAProxy homepage calls it the Latest
> version.

Ah OK, version 1.2 did not have the time units. Well, in fact it's not
exactly marked as the only latest version, it's the latest version of
branch 1.2, and I admit 1.2 is the only branch not tainted by development.

> I've removed all cookies from this IP, cleared my cache, and still it
> seems that only one server is being hit. But the stats page reports an
> equal distribution, so it's anybody's guess. What would be a simple way
> to log the distribution? I find it difficult to determine this even in
> debug mode (I'm running the proxy in daemon mode, of course).

it is in the logs, you have the server's name (assuming you're logging
with "option httplog"). Something like this is possible if you're playing
with only one client. If the number of objects on a page is a multiple of
the number of servers and you're in round-robin mode, then each time
you fetch a page, you'll alternately fetch objects from both servers
and come back to the first one for the next click. Of course that does
not happen as soon as you have at least one other client. And since I
saw 20 sessions on your stats after my access, I'm tempted to think
that it could be related.
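(One quick way to see the distribution from an "option httplog" log: the
server appears in the backend/server field, which is the 9th
whitespace-separated field in the default HTTP log format. A rough per-server
count looks like this; the two log lines below are made-up samples standing
in for real /var/log/haproxy entries:)

```shell
# Count hits per backend/server in httplog output; the two sample lines
# below stand in for real haproxy log entries.
printf '%s\n' \
 'Mar  9 17:00:01 lb haproxy[42]: 10.0.0.1:3001 [09/Mar/2009:17:00:01.000] fe be/SAMP1 0/0/1/2/3 200 512 - - ---- 1/1/1/1/0 0/0 "GET / HTTP/1.0"' \
 'Mar  9 17:00:02 lb haproxy[42]: 10.0.0.2:3002 [09/Mar/2009:17:00:02.000] fe be/SAMP2 0/0/1/2/3 200 512 - - ---- 1/1/1/1/0 0/0 "GET / HTTP/1.0"' \
 | awk '{count[$9]++} END {for (s in count) print count[s], s}' | sort -k2
```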

Regards,
Willy




RE: "option httpchk" is reporting servers as down when they're not

2009-03-09 Thread Allen, Thomas
Hi Willy,

Hm, changing to "60s" for each gave me 100% 504 errors, I removed all
three. Bad idea, I know, but at least it works then. 

I'm running 1.2.18 because the HAProxy homepage calls it the Latest
version.

I've removed all cookies from this IP, cleared my cache, and still it
seems that only one server is being hit. But the stats page reports an
equal distribution, so it's anybody's guess. What would be a simple way
to log the distribution? I find it difficult to determine this even in
debug mode (I'm running the proxy in daemon mode, of course).

Thanks,
Thomas Allen
Web Developer, ASCE
703.295.6355

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu] 
Sent: Monday, March 09, 2009 4:58 PM
To: Allen, Thomas
Cc: Jeffrey 'jf' Lim; haproxy@formilux.org
Subject: Re: "option httpchk" is reporting servers as down when they're
not

On Mon, Mar 09, 2009 at 04:15:34PM -0400, Allen, Thomas wrote:
> I used the unit 'S' for my timeouts, as in
> 
> clitimeout 60S
> contimeout 60S
> srvtimeout 60S 
> 
> Is that to be avoided? I assumed it meant "seconds."

OK it's just a minor problem. You have to use a lower-case "s" : 60s.
It's stupid that the parser did not catch this mistake. I should improve
it. By default, it ignores unknown chars, so you clearly had 60 ms here.
BTW, there's no use in setting large contimeouts. You should usually stay
with lower values such as 5-10s. Oh BTW, what version are you running ?
Your stats page looks old. The time units were introduced in 1.3.14, so
I hope you're at least at this level.

> I'm using roundrobin and adding the httpclose option. I've been using
> cookie stickiness (which will be important for this website), but after
> disabling this stickiness, I get the same results. I tried clearing out
> the server cookie before and opening the page in multiple browsers, and
> still got these results.

Then it is possible that haproxy could not manage to connect to your
server in 60ms, then immediately retried on the other one, and stuck
to that one.
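(The knobs involved in that scenario, as a sketch with illustrative values;
"option redispatch" is the directive that allows a failed connection to be
retried on another server:)

    contimeout 5000      # connect timeout: 5s expressed in milliseconds
    retries 3            # per-server connection attempts
    option redispatch    # after retries fail, try another server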

Regards,
Willy




Re: "option httpchk" is reporting servers as down when they're not

2009-03-09 Thread Willy Tarreau
On Mon, Mar 09, 2009 at 04:15:34PM -0400, Allen, Thomas wrote:
> I used the unit 'S' for my timeouts, as in
> 
> clitimeout 60S
> contimeout 60S
> srvtimeout 60S 
> 
> Is that to be avoided? I assumed it meant "seconds."

OK it's just a minor problem. You have to use a lower-case "s" : 60s.
It's stupid that the parser did not catch this mistake. I should improve
it. By default, it ignores unknown chars, you you clearly had 60 ms here.
BTW, there's no use in setting large contimeouts. You should usually stay
with lower values such as 5-10s. Oh BTW, what version are you running ?
Your stats page looks old. The time units were introduced in 1.3.14, so
I hope you're at least at this level.

> I'm using roundrobin and adding the httpclose option. I've been using
> cookie stickiness (which will be important for this website), but after
> disabling this stickiness, I get the same results. I tried clearing out
> the server cookie before and opening the page in multiple browsers, and
> still got these results.

Then it is possible that haproxy could not manage to connect to your
server in 60ms, then immediately retried on the other one, and sticked
to that one.

Regards,
Willy




RE: "option httpchk" is reporting servers as down when they're not

2009-03-09 Thread Allen, Thomas
I used the unit 'S' for my timeouts, as in

clitimeout 60S
contimeout 60S
srvtimeout 60S 

Is that to be avoided? I assumed it meant "seconds."

I'm using roundrobin and adding the httpclose option. I've been using
cookie stickiness (which will be important for this website), but after
disabling this stickiness, I get the same results. I tried clearing out
the server cookie before and opening the page in multiple browsers, and
still got these results.

Thanks,
Thomas Allen
Web Developer, ASCE
703.295.6355

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu] 
Sent: Monday, March 09, 2009 4:09 PM
To: Allen, Thomas
Cc: Jeffrey 'jf' Lim; haproxy@formilux.org
Subject: Re: "option httpchk" is reporting servers as down when they're
not

Hi Thomas,

just replying quick, as I'm in a hurry.

On Mon, Mar 09, 2009 at 04:01:29PM -0400, Allen, Thomas wrote:
> That, along with specifying HTTP/1.1, did it, so thanks! What should I
> load into "Host:" ? It seems to work fine with "www", but I'd prefer to
> use something I understand. Please keep in mind that none of this is yet
> associated with a domain, so www.mydomain.com would be inaccurate.

Of course, www.mydomain.com was an example. Often web servers are fine
with just "www" but normally you should use the same host name that
your server will respond to. Sometimes you can also put the server's
IP address. Some servers also accept an empty header (so just "Host:"
and nothing else).

> Beginning very recently, I get a 504 Gateway Timeout for about 30% of
> all requests. What could be causing this?

responses taking too much time. Are you sure that your "timeout server"
is properly set ? Maybe you have put times in milliseconds there thinking
they were in seconds ?

> More importantly, I'm not
> convinced that HAProxy is successfully forwarding requests to both
> servers, although I could be wrong. As you can see on the two app
> instances, each reports a separate internal IP to help diagnose. It
> appears that only SAMP1 receives requests, although both pass health
> checks now.

I see both servers receiving 20 sessions, so that seems fine.
Among possible reasons for what you observe :
  - ensure you're using "balance roundrobin" and not any sort of
hash or source-based algorithm

  - ensure that you have not enabled cookie stickiness, or that
you close your browser before retrying.

  - ensure that you have "option httpclose" and that your browser
is not simply pushing all requests in the same session tunnelled
to the first server haproxy connected to.
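(Pulled together into a minimal sketch matching those three points; the
proxy name is a placeholder, the addresses are the ones from this thread:)

    listen app 0.0.0.0:80
        balance roundrobin        # no hash or source-based algorithm
        option httpclose          # re-balance each request
        # no "cookie" keyword here => no stickiness
        server samp1 174.129.251.234:80 check
        server samp2 174.129.244.252:80 check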

Regards,
Willy




Re: "option httpchk" is reporting servers as down when they're not

2009-03-09 Thread Willy Tarreau
Hi Thomas,

just replying quick, as I'm in a hurry.

On Mon, Mar 09, 2009 at 04:01:29PM -0400, Allen, Thomas wrote:
> That, along with specifying HTTP1.1, did it, so thanks! What should I
> load into "Host:" ? It seems to work fine with "www", but I'd prefer to
> use something I understand. Please keep in mind that none of this is yet
> associated with a domain, so www.mydomain.com would be inaccurate.

Of course, www.mydomain.com was an example. Often web servers are fine
with just "www" but normally you should use the same host name that
your server will respond to. Sometimes you can also put the server's
IP address. Some servers also accept an empty header (so just "Host:"
and nothing else).

> Beginning very recently, I get a 504 Gateway Timeout for about 30% of
> all requests. What could be causing this?

responses taking too much time. Are you sure that your "timeout server"
is properly set ? Maybe you have put times in milliseconds there thinking
they were in seconds ?

> More importantly, I'm not
> convinced that HAProxy is successfully forwarding requests to both
> servers, although I could wrong. As you can see on the two app
> instances, each reports a separate internal IP to help diagnose. It
> appears that only SAMP1 receives requests, although both pass health
> checks now.

I see both servers receiving 20 sessions, so that seems fine.
Among possible reasons for what you observe :
  - ensure you're using "balance roundrobin" and not any sort of
hash or source-based algorithm

  - ensure that you have not enabled cookie stickiness, or that
you close your browser before retrying.

  - ensure that you have "option httpclose" and that your browser
is not simply pushing all requests in the same session tunnelled
to the first server haproxy connected to.

Regards,
Willy




RE: "option httpchk" is reporting servers as down when they're not

2009-03-09 Thread Allen, Thomas
That, along with specifying HTTP/1.1, did it, so thanks! What should I
load into "Host:" ? It seems to work fine with "www", but I'd prefer to
use something I understand. Please keep in mind that none of this is yet
associated with a domain, so www.mydomain.com would be inaccurate.

Beginning very recently, I get a 504 Gateway Timeout for about 30% of
all requests. What could be causing this? More importantly, I'm not
convinced that HAProxy is successfully forwarding requests to both
servers, although I could be wrong. As you can see on the two app
instances, each reports a separate internal IP to help diagnose. It
appears that only SAMP1 receives requests, although both pass health
checks now.

Load balancer: http://174.129.240.119/ and stats (temporarily unblocked)
http://174.129.240.119/status/lb
SAMP1: http://174.129.251.234/
SAMP2: http://174.129.244.252/

Thanks,
Thomas Allen
Web Developer, ASCE
703.295.6355

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu] 
Sent: Friday, March 06, 2009 1:39 PM
To: Allen, Thomas
Cc: Jeffrey 'jf' Lim; haproxy@formilux.org
Subject: Re: "option httpchk" is reporting servers as down when they're
not

Hi Thomas,

On Thu, Mar 05, 2009 at 08:45:20AM -0500, Allen, Thomas wrote:
> Hi Jeff,
> 
> The thing is that if I don't include the health check, the load
> balancer works fine and each server receives equal distribution. I have
> no idea why the servers would be reported as "down" but still work when
> unchecked.

It is possible that your servers expect the "Host:" header to
be set during the checks. There's a trick to do it right now
(don't forget to escape spaces) :

option httpchk GET /index.php HTTP/1.0\r\nHost:\ www.mydomain.com

Also, you should check the server's logs to see why it is reporting
the service as down. And as a last resort, a tcpdump of the traffic
between haproxy and a failed server will show you both the request
and the complete error from the server.
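(A sketch of how that check plugs into the server lines; the interval and
threshold values are illustrative:)

    option httpchk GET /index.php HTTP/1.0\r\nHost:\ www.mydomain.com
    server samp1 174.129.251.234:80 check inter 2000 rise 2 fall 3
    server samp2 174.129.244.252:80 check inter 2000 rise 2 fall 3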

Regards,
Willy




Re: HaProxy ACL (fwd) - access control

2009-03-09 Thread Krzysztof Oledzki



On Wed, 11 Feb 2009, Willy Tarreau wrote:


Hi Krzysztof,


Hi Willy,

First, please excuse that it took me nearly one month to reply to your 
mail, shame on me. :(



On Wed, Feb 11, 2009 at 05:58:42PM +0100, Krzysztof Oledzki wrote:

As you are probably aware, recently there was a mail quoted below, asking
about the redirect feature. It encouraged me to think a little more about
it, so: shouldn't we rather put the feature into the use_backend chain instead
of "req allow/deny/block"? This would simply allow one to do something like:

--- cut here ---
use_backend www_php4 if payment
redirect prefix https://pay.xxx.bg if pay_xxx
default_backend www_php4
--- cut here ---

instead of:
--- cut here ---
use_backend www_php4 if payment
redirect prefix https://pay.xxx.bg if pay_xxx !payment
default_backend www_php4
--- cut here ---

It would also allow creating more complicated rules, without duplicating
the whole "use_backend xxx if abc" section, which is sometimes *very*
complicated, into "req allow if abc".

What is your opinion?


To be fair, this is how I believed it worked. But I now remember there
is a specific "redirect rules" list for them, so I believed wrong.

I've just looked at the diagram I posted yesterday, and use_backend
is separated there. Now that I'm thinking about it, I believe the reason
I wanted use_backend in a different list was because it is not really an
action. I mean, "allow", "deny", "redirect", "tarpit", ... all terminate
processing or rule evaluation. Having "use_backend" stop evaluation is
problematic, as it's not an action, it's a dynamic configuration.
In fact, it's problematic for existing setups mostly, because new setups 
could be written by carefully interleaving use_backend statements in the 
middle of the allow/redirect/deny rules.


Yes, your concern is perfectly right. Changing such behavior would be 
wrong and most likely useless.


And I agree that for the long term, it's better to remember that rule 
ordering is what matters than to remember which one applies before or 
after which one.


I doubt it will ever be possible to ensure 100% rule ordering; in 
particular it is quite inconvenient for automatically generated configs 
that allow including manually created rulesets.



In fact, I think that having use_backend evaluated last makes sense
since it's really how it's supposed to work. But I agree that for the
poor guy writing the rules, it would be easier to be able to put it
before other rules. In fact, there's a solution consisting in using
"allow" to escape the rules, but it requires that the rule is
duplicated for the use_backend one, which is not always very convenient.


So, after such a long time of thinking (1 month, right? ;) ) I believe we 
should keep it as-is and prevent duplication by simply writing a "best 
practices" chapter in the docs and suggesting the use of a dedicated backend 
to handle redirects:


backend pax_redirects
  redirect prefix https://pay.xxx.bg if pay_xxx
(...)

backend XX
  use_backend www_php4 if payment
  use_backend pax_redirects if pay_xxx
  default_backend www_php4

We may also add a warning or even disallow using use_backend and redirect 
in the same proxy. I believe it is the best solution - we already have 
everything that is needed, let's use it.



Hmmm now I remember why redirect was in another list. I'm pretty sure
the reason was that we wanted to be able to redirect after an allow
(which is not necessarily a strong requirement).

There are other aspects to consider :
 - eventually, tcp will support use_backend/allow/deny/tarpit
 - outgoing processing will obviously never support use_backend
 - the backend's incoming rules will support use_server (to force
   a server) but not use_backend.

All these elements may easily be solved by a simple rule list and
strict action enforcing to ensure we never have the wrong action
at the wrong place.

I'm still afraid of breaking existing setups. Same problem as for the
"block" keyword after all.

Don't you think we should create a new "set-backend" keyword to merge
it with the whole list, and let "use_backend" slowly die (for instance,
we mark it deprecated in version 1.4 with a big warning) ?
This means we would have :
  block
  allow|deny|redirect|tarpit|set-backend
  use_backend


I'm not sure. I like the idea of two step request processing:
 - first decide if we need to "allow|deny|tarpit" a request
 - then decide which backend to use (frontend) or which server/redirect 
(backend)

We may add a set-backend directive but I think use_backend will still be
useful, and keeping both could be hard to maintain in the long term.


Another idea would consist in splitting access rules from traffic
management rules. I mean, "allow", "deny", "tarpit" grant or deny
access. Even the tarpit could be considered as an extended deny.
Then we have "use_backend", "redirect", and maybe later things
about QoS, logging, etc... which would make sense in a separate
list.


Yes. This is definitely the way I thin