Re: HAProxy for PostgreSQL Failover
Alan,

On 2011-06-15 19:54, Alan Gutierrez wrote:
> I'd like to use HAProxy to implement a simple proxy that can perform
> failover for a pair of PostgreSQL servers configured as master/slave
> with PostgreSQL 9.0 streaming replication to replicate the master to
> the slave. Only the master is active for client connections; if the
> master fails, the clients should connect to the slave while an
> administrator recovers the master.

You might also want to have a look at pgpool-II [1], which is a proxy
specifically designed for failover, replication and load balancing of
Postgres servers. Recent versions can take advantage of the built-in
asynchronous replication feature of Postgres 9. Using this, you can

* configure failover and recovery
* potentially utilize the second machine for read-only queries.

--Holger

[1] http://pgpool.projects.postgresql.org
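If you do stay with HAProxy for this, a minimal TCP-mode sketch is
possible. The addresses and names below are assumptions, not from the
original thread; the `backup` keyword keeps the slave out of rotation
until the master's health check fails. Note that a plain TCP check only
verifies the port answers, not that the node is a writable master:

```
listen postgres
    bind :5432
    mode tcp
    option tcplog
    # hypothetical addresses; plain TCP health checks
    server pg-master 192.168.0.10:5432 check
    # "backup": only used once no non-backup server is available
    server pg-slave  192.168.0.11:5432 check backup
```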
Re: Strange problem with haproxy
Thanks Willy,

The information that you gave me was very helpful. I checked your
suggestions and you are right. I made the client upload files from
another place and it seems to work quite nicely. So I guess I have to
figure out whether my internet provider or my hosting provider is doing
the caching/proxying.

Cheers for the great product that you've made and the helpful mailing
list.

Regards,
Galfy

On Tue, Jun 21, 2011 at 10:29 PM, Willy Tarreau wrote:
> On Tue, Jun 21, 2011 at 04:23:52PM +0200, Galfy Pundee wrote:
>> Hi all,
>> At the moment I am evaluating haproxy as a reverse proxy. Everything
>> seems to work great except one thing, and I do not know where the
>> problem is. The situation in which I get the problem is the following:
>> 1. The client uploads a big file to haproxy.
>> 2. All incoming data seems to be sent from the client, but the backend
>>    takes quite some time to save the data to the hard disk.
>> 3. Meanwhile the client reports a timeout.
>> 4. Nothing more is sent from the client to the server, but the backend
>>    seems to continue writing the file until it has written all the
>>    bytes.
>>
>> As a result, all uploads of big files are reported by the client as
>> failed, even though they are stored with a delay. How can I make the
>> client aware of this fact? Why is this happening? Can it be that
>> haproxy is receiving the data very fast, but the backend cannot store
>> it as fast? Any suggestions or similar experiences?
>
> This cannot be the case, because haproxy's buffers are of relatively
> small size (16 kB by default), so it cannot hold the whole request.
> However, what you describe is common when there is a client-side
> anti-virus which parses POST requests. What happens is exactly what
> you're describing: the client sends everything very fast to the local
> gateway, waits for a long time while the data are uploaded over a slow
> link, and finally times out.
>
> Do you know if there is any such thing on the client side? Alternatively,
> it is possible that the socket buffers on the client are far too large
> and able to hold the whole request at once without the client
> application being aware of it. This is less common, because what you
> describe suggests that the amount of data is huge, but it is still a
> possible scenario that I've observed with a USB 3G key with huge
> buffers.
>
> Ideally you should run tcpdump on the client side and in front of
> haproxy to see what's happening. But if you're facing a proxy which
> buffers everything before passing on the request, I see no easy
> solution to the problem.
>
> Regards,
> Willy
Re: halog assistance
So I'm still debugging these potentially slow backends, but I've got
some pretty wildly ranging times showing up in the haproxy log.

Some clients are really slow to send their request headers:
945/0/1/2/948

Some backends are really slow to send their response:
0/0/0/1556/1556

More background: the clients are our Scala web app making requests to a
backend HTTP service through haproxy. Scala, as many might know, runs on
top of the JVM. A colleague of mine theorizes that garbage collection
runs might get in the way of these requests, thereby explaining the
extremely slow request headers. We'll be looking into tuning the clients
somehow, whether it be GC tuning or simply bulking up the pool of
servers to reduce GC overhead per app server.

Is it possible that the slow backends as reported by haproxy are not
actually slow, but that the clients are holding up the conversation and
not pulling the request through haproxy? Does haproxy fully buffer the
response like nginx can? Is there some tuning I can do to help prove
that the clients are slow and not the backends?

On Tue, Jun 21, 2011 at 2:15 PM, Willy Tarreau wrote:
> Hi David,
>
> On Tue, Jun 21, 2011 at 02:04:50PM -0700, David Birdsong wrote:
>> I'm in the process of trying to debug a somewhat sluggish set of
>> backends. The backends are ~20 Python Tornado web servers that
>> implement a simple blocking db call to mongodb. My theory is that the
>> request rate can periodically overload the backends' ability to serve,
>> when outside forces slow the app down slightly and in turn cause
>> connections to sit briefly in each app server's listen queue while it
>> clears out requests.
>>
>> I'm pretty sure halog can help me figure this out for certain, but I
>> can't seem to invoke the correct command line args, or I'm not
>> comprehending the output properly.
>>
>> For example, halog -ad 10 gives me lines like:
>>
>> 21:02:21.020 75741020 11 1
>> 21:02:21.050 75741050 13 1
>> 21:02:21.313 75741313 17 1
>> 21:02:21.522 75741522 10 1
>> 21:02:21.549 75741549 13 1
>> 21:02:21.661 75741661 11 2
>> 21:02:21.704 75741704 12 1
>> 21:02:21.762 75741762 15 1
>>
>> I expect this to mean: 'filter out any lines that indicate an accept
>> time below 10 ms and show me anything greater'. However, -ad is an
>> input filter, so I have no idea what the output means.
>
> No, -ad is one of the few hacks that was initially developed in halog
> for a specific purpose. It reports at what dates there was a hole
> without any requests that lasted more than 10 ms, and the number of
> requests which suddenly arrived after the hole. It helped spot system
> issues on a machine running haproxy.
>
> What you should use instead are -srv (report per-server stats), -st
> (stats on status codes), -tc (stats on termination codes), and -pct
> (output percentiles of connect times, response times and data times).
> You can also make use of -u (report stats by URL, and optionally sort
> by average time, errors, etc.).
>
> Do not hesitate to add new features to halog; each time I had to use it
> on a serious problem, I found that some filtering or output capability
> was missing, and added it.
>
> If you need more assistance on specific log extracts, please send them
> off-list.
>
> Regards,
> Willy
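An aside for readers decoding timer fields like 945/0/1/2/948 quoted in
this thread: in haproxy's HTTP log format these are the Tq/Tw/Tc/Tr/Tt
times in milliseconds (request headers, queue wait, connect, server
response, total). A quick command-line sketch to label one, using a
sample value from the thread above:

```shell
# Split an haproxy timer field (Tq/Tw/Tc/Tr/Tt, milliseconds) into
# named parts; the sample value is taken from the thread above.
echo "945/0/1/2/948" | awk -F/ '{printf "Tq=%s Tw=%s Tc=%s Tr=%s Tt=%s\n", $1, $2, $3, $4, $5}'
# → Tq=945 Tw=0 Tc=1 Tr=2 Tt=948
```

A large first field with small middle fields, as here, points at the
client being slow to send its request rather than the server being slow
to answer.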
Re: halog assistance
Hi David,

On Wed, Jun 22, 2011 at 12:18:20PM -0700, David Birdsong wrote:
> So I'm still debugging these potentially slow backends, but I've got
> some pretty wildly ranging times showing up in the haproxy log.
>
> Some clients are really slow to send their request headers:
> 945/0/1/2/948
>
> Some backends are really slow to send their response:
> 0/0/0/1556/1556
>
> More background: the clients are our Scala web app making requests to
> a backend HTTP service through haproxy. Scala, as many might know,
> runs on top of the JVM. A colleague of mine theorizes that garbage
> collection runs might get in the way of these requests, thereby
> explaining the extremely slow request headers.

This is indeed possible. If I were you, I'd take a network capture to
see how it goes. You might as well discover some packet losses on the
NIC of the load balancer, explaining the slowness on both sides.

> We'll be looking into tuning the clients somehow, whether it be GC
> tuning or simply bulking up the pool of servers to reduce GC overhead
> per app server.
>
> Is it possible that the slow backends as reported by haproxy are not
> actually slow, but that the clients are holding up the conversation
> and not pulling the request through haproxy? Does haproxy fully buffer
> the response like nginx can? Is there some tuning I can do to help
> prove that the clients are slow and not the backends?

What happens precisely is that haproxy measures the response time as the
time elapsed between the moment it sends the full request headers and
the moment it receives the full response headers. So yes, if the client
takes time to emit some required data, it can alter the apparent
response time.

There are a few things you can do. First, capture the content-length
header from the request and log it:

  capture request header content-length len 10

This way you'll be able to correlate the amount of forwarded data with
the response time.
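For context, a minimal sketch of where that capture directive sits; the
frontend name, port and backend are assumptions for illustration only:

```
frontend www
    bind :80
    mode http
    # log the first 10 characters of the request's Content-Length
    # header in the captured-headers section of the log line
    capture request header content-length len 10
    default_backend app
```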
Second, there is something quite ugly you can do if all of your incoming
requests are short enough to fit in a buffer and you're always
interested in checking for approximately the same length: you enable TCP
inspection to say that if a content-length header indicates a value of
more than a certain number of bytes, then you want to block the request
until either a timeout fires or that number of bytes is finally
received. Since there cannot be a relation between both values, you have
to set the values by hand yourself (though you can use multiple
combinations). Keep in mind that all buffered bytes are accounted for,
so headers are included in the byte count. Thus you're only interested
in bodies larger than a few kilobytes and still smaller than the buffer.

For instance, let's assume that you're running with 16 kB buffers, minus
1 kB for rewrites = 15 kB. We'd set a few rules to say that if the
client advertises more than 2, 4, 8 or 12 kB, then we wait for that
amount of bytes to be received before passing the request to the HTTP
parser, and never wait more than 5 seconds:

  listen test
      bind:
      tcp-request inspect-delay 5s
      tcp-request content accept if HTTP METH_POST { hdr_val(content-length) ge 12000 } { req_len gt 12000 }
      tcp-request content accept if HTTP METH_POST { hdr_val(content-length) ge 8000 } { req_len gt 8000 }
      tcp-request content accept if HTTP METH_POST { hdr_val(content-length) ge 4000 } { req_len gt 4000 }
      tcp-request content accept if HTTP METH_POST { hdr_val(content-length) ge 2000 } { req_len gt 2000 }
      mode http
      ...

What's interesting here (ugliness aside) is that the time spent waiting
for the client will be logged in the first field (request time). There
are still cases where some more bytes will be awaited from the client
due to the very coarse granularity above, but it can already be enough
to show that if you wait for the client, the server time almost
completely disappears.
A third possibility consists in abusing the url_param check_post
feature, which waits for a specific parameter in the POST request in
order to perform load balancing. If it does not find it, then it falls
back to round robin. Assuming you're doing round robin right now, a
solution could be to enable url_param checking on a parameter you're
certain never to get:

  balance url_param dfsgfdsgfds check_post 15000

Warning: make sure the size you're waiting for above (15000) is always
smaller than your request buffer size, otherwise requests that are too
large will be rejected. The statement above says that the request body
will be buffered until it matches the advertised content-length or
reaches the specified size.

A last solution could consist in writing a new ACL match,
"http_body_complete" or something like this, which compares the buffered
body size with the advertised content-length, and using it in a
tcp-request statement as above.

If you want to process slightly larger requests, you can make use of the
"tune.bufsize" statement in
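Putting those last two pieces together, a hedged sketch; the section
names, server address and buffer size are illustrative assumptions, and
tune.bufsize is mentioned in the thread as a global-section setting:

```
global
    # assumption: enlarge request buffers so bodies up to ~30 kB can
    # be buffered before forwarding (the default is 16384 bytes)
    tune.bufsize 32768

backend app
    mode http
    # "dfsgfdsgfds" is deliberately a parameter that never appears, so
    # balancing falls back to round robin while up to 15000 body bytes
    # are buffered before the request is passed to the server
    balance url_param dfsgfdsgfds check_post 15000
    server app1 10.0.0.11:8080 check
```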