Re: Observations about reloads and DNS SRV records
Hey Baptiste,

I'll try it out next week when I get back (currently on vacation) and let you know.

Thanks!

Tait

On Tue., Jul. 3, 2018 at 06:24 Baptiste wrote:
> Hi,
>
> Actually, the problem was deeper than I first thought.
> In its current state, the state file and SRV records are simply not compatible.
> I had to add a new field to the state file format to support this.
>
> Could you please confirm the attached patch fixes your issues?
>
> Baptiste
>
> On Mon, Jun 25, 2018 at 11:48 AM, Baptiste wrote:
>> Hi,
>>
>> Forget the backend id; it's the wrong answer to that problem.
>> I was investigating another potential issue, but this does not fix the
>> original problem reported here.
>>
>> Here is the answer I delivered today on Discourse, where other people
>> have also reported the same issue:
>>
>> Just to let you know that I think I found the cause of the issue, but I
>> don't have a fix yet.
>> I'll come back to you this week with more info and hopefully a fix.
>> The issue seems to be in srv_init_addr(), because srv->hostname is not
>> set (NULL).
>>
>> Baptiste
Re: Using different sources when connecting to a server
Hello Baptiste,

On Wed, Jul 4, 2018 at 1:07 PM, Baptiste wrote:
> Hi Aurélien,
>
> My 2 cents.
>
>> I'm trying to add a feature which allows HAProxy to use more than one
>> source when connecting to a server of a backend. The main reason is to
>> avoid duplicating the 'server' lines to reach more than 64k connections
>> from HAProxy to one server.
>
> Cool!
>
>> So far I thought of two ways:
>> - each time the 'source' keyword is encountered on a 'server' line,
>>   duplicate the original 'struct server' and fill 'conn_src' with
>>   the correct source information. It's easy to implement but does
>>   not scale at all. In fact it mimics the multiple 'server' lines.
>>   The big advantage is that it can use all existing features that
>>   deal with 'struct server' (the 'balance' keyword, for example).
>> - use a list of 'struct conn_src' in 'struct server' and 'struct
>>   proxy' and choose the best source (using round-robin, leastconn,
>>   etc.) when a connection is about to be established.
>
> I also prefer the second option.
> So we would have 2 LBing algorithms? One to choose the server and one to
> choose the source IP to use?

It depends. Considering this feature could be (only?) useful to address
the 64k maximum connections, maybe hardcoding a leastconn algorithm is
enough.

>> The config syntax would look like this:
>>
>> server srv 127.0.0.1:9000 source 127.0.0.2 source 127.0.0.3 source
>> 127.0.0.4 source 127.0.0.5 source 127.0.0.6 source 127.0.1.0/24
>>
>> Not using ip1,ip2,ip/cidr,... avoids confusion when using keywords like
>> usesrc, interface, etc.
>
> Sure, but at least I don't want to set 255 sources for a "source
> 10.0.0.0/24", so please confirm you'll still allow CIDR notation.

Yes, look at the last 'source' in my config line example. What I found
tedious is to use something like this:

server srv 127.0.0.1:9000 source 127.0.0.2,127.0.0.3,127.0.0.4,127.0.1.0/24 usesrc clientip,client [...]

>> Checks to the server would be done from each source but it can be very
>> slow to cover the whole range.
>
> I would make this optional. From a pure LBing safety point of view, I
> understand the requirement.
> That said, in some cases, we may not want to run tens or hundreds of
> health checks per second.
> I see different options:
> - check from all source IPs
> - check from the host IP address (as if no source were configured)
> - check from one source IP per source subnet
>
>> The main problem I see is how to efficiently store all sources for each
>> server. Using the CIDR syntax can quickly allow millions of sources to
>> be used, and if we want to use algorithms like 'leastconn', we need to
>> remember how many connections are still active on a particular source
>> (using round-robin + an index into the range would otherwise have been
>> one solution).
>> I have some ideas but I would like to know the preferred way.
>
> Well, storing a 32-bit hash and counting on this pattern (and
> automatically ejecting server source+dest IPs which have reached 64K
> concurrent connections).

Using a leastconn algorithm with very long connections will quickly fill
the list/tree with entries with a counter of 1.

> I have a question: what would be the impact on "retries"? At first, we
> could use it as of today. But later, we may want to retry from a
> different source IP.

--
Aurélien Nephtali
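As an editorial aside to the thread above: a minimal sketch of what a leastconn pick over a per-server list of configured sources could look like. This is purely illustrative, not HAProxy code; 'struct src_entry' and 'pick_leastconn_src' are made-up names for the idea being discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only (not HAProxy code): a leastconn selection over a
 * per-server list of configured sources. */
struct src_entry {
	const char *addr;   /* textual source address */
	int cur_conns;      /* connections currently active via this source */
};

/* Return the index of the least-loaded usable source, or -1 when every
 * source+destination tuple has reached the per-tuple port ceiling
 * (the ~64k limit motivating the feature). */
int pick_leastconn_src(const struct src_entry *srcs, size_t n, int max_conns)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (srcs[i].cur_conns >= max_conns)
			continue; /* tuple exhausted: skip it, i.e. "eject" it */
		if (best < 0 || srcs[i].cur_conns < srcs[(size_t)best].cur_conns)
			best = (int)i;
	}
	return best;
}
```

Note that a linear scan like this is only reasonable for a handful of sources; a CIDR-expanded list would need a tree or similar, which is exactly the storage question raised in the thread.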
Re: haproxy 1.9 status update
Sorry to wake up an old thread, but I'm very concerned by the lack of
"architecture guide" documentation for HAProxy.

Did we make any progress on this topic?

Baptiste
Re: Using different sources when connecting to a server
Hi Aurélien,

My 2 cents.

> I'm trying to add a feature which allows HAProxy to use more than one
> source when connecting to a server of a backend. The main reason is to
> avoid duplicating the 'server' lines to reach more than 64k connections
> from HAProxy to one server.

Cool!

> So far I thought of two ways:
> - each time the 'source' keyword is encountered on a 'server' line,
>   duplicate the original 'struct server' and fill 'conn_src' with
>   the correct source information. It's easy to implement but does
>   not scale at all. In fact it mimics the multiple 'server' lines.
>   The big advantage is that it can use all existing features that
>   deal with 'struct server' (the 'balance' keyword, for example).
> - use a list of 'struct conn_src' in 'struct server' and 'struct
>   proxy' and choose the best source (using round-robin, leastconn,
>   etc.) when a connection is about to be established.

I also prefer the second option. So we would have 2 LBing algorithms?
One to choose the server and one to choose the source IP to use?

> The config syntax would look like this:
>
> server srv 127.0.0.1:9000 source 127.0.0.2 source 127.0.0.3 source
> 127.0.0.4 source 127.0.0.5 source 127.0.0.6 source 127.0.1.0/24
>
> Not using ip1,ip2,ip/cidr,... avoids confusion when using keywords like
> usesrc, interface, etc.

Sure, but at least I don't want to set 255 sources for a "source
10.0.0.0/24", so please confirm you'll still allow CIDR notation.

> Checks to the server would be done from each source but it can be very
> slow to cover the whole range.

I would make this optional. From a pure LBing safety point of view, I
understand the requirement. That said, in some cases, we may not want to
run tens or hundreds of health checks per second.
I see different options:
- check from all source IPs
- check from the host IP address (as if no source were configured)
- check from one source IP per source subnet

> The main problem I see is how to efficiently store all sources for each
> server. Using the CIDR syntax can quickly allow millions of sources to
> be used, and if we want to use algorithms like 'leastconn', we need to
> remember how many connections are still active on a particular source
> (using round-robin + an index into the range would otherwise have been
> one solution).
> I have some ideas but I would like to know the preferred way.

Well, storing a 32-bit hash and counting on this pattern (and
automatically ejecting server source+dest IPs which have reached 64K
concurrent connections).

I have a question: what would be the impact on "retries"? At first, we
could use it as of today. But later, we may want to retry from a
different source IP.

Baptiste
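As an editorial aside: the 32-bit hash idea mentioned above could be sketched roughly as below. This is purely illustrative, not HAProxy code; the hash here is plain FNV-1a over the source/destination tuple, and 'tuple_hash'/'tuple_usable' are invented names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only (not HAProxy code): key a source+destination tuple
 * by a 32-bit hash so concurrent connections can be counted per tuple.
 * Plain FNV-1a; a real implementation might use a different hash. */
static uint32_t tuple_hash(uint32_t src_ip, uint32_t dst_ip, uint16_t dst_port)
{
	uint8_t buf[10];
	uint32_t h = 2166136261u; /* FNV-1a offset basis */
	int i;

	buf[0] = (uint8_t)(src_ip >> 24); buf[1] = (uint8_t)(src_ip >> 16);
	buf[2] = (uint8_t)(src_ip >> 8);  buf[3] = (uint8_t)src_ip;
	buf[4] = (uint8_t)(dst_ip >> 24); buf[5] = (uint8_t)(dst_ip >> 16);
	buf[6] = (uint8_t)(dst_ip >> 8);  buf[7] = (uint8_t)dst_ip;
	buf[8] = (uint8_t)(dst_port >> 8); buf[9] = (uint8_t)dst_port;

	for (i = 0; i < 10; i++) {
		h ^= buf[i];
		h *= 16777619u; /* FNV prime */
	}
	return h;
}

/* A tuple stays usable while its concurrent-connection counter is below
 * the ephemeral-port ceiling; past that it would be ejected from the
 * candidate list, as suggested in the thread. */
static int tuple_usable(int cur_conns)
{
	return cur_conns < 64000;
}
```

The hash only keys the counter storage (e.g. a tree of per-tuple counters); Aurélien's objection still applies: with long-lived connections most entries sit at a count of 1 and the structure grows with the number of tuples in use.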
Re: Connections stuck in CLOSE_WAIT state with h2
> > 20180629.1347 mpeh2 fd25 h2c_error - st04 fl0002 err05
> Just hit h2c_error - H2_ERR_STREAM_CLOSED

After adding more debug, I found the following pattern around h2c_error in
hanging connections:

... everything OK until now
20180629.1826 e901:backend.srvrep[000e:001a]: HTTP/1.1 200 OK
20180629.1826 e901:backend.srvcls[000e:adfd]
20180629.1826 mpeh2 fd14 h2s_close/h2c - id006d h2c_st04 h2c_fl streams:12
20180629.1826 mpeh2 fd14 h2s_close/real - id006d st04 fl3101 streams:12 -> 11

h2c_error

20180629.1826 mpeh2 fd14 h2_process_demux/11 - st04 fl
20180629.1826 mpeh2 fd14 h2c_error - st04 fl err05
20180629.1826 e8e7:backend.srvcls[000e:adfd]
20180629.1826 mpeh2 fd14 h2_process_mux/01a - st06 fl
20180629.1826 mpeh2 fd14 h2_process_mux/01b - st07 fl0100
20180629.1826 e8dd:backend.srvcls[000e:adfd]

A few more streams are closed before the log for file descriptor 14 ends
and the connection hangs in CLOSE_WAIT:

20180629.1827 x01.clicls[000e:adfd]
20180629.1827 e8dd:backend.closed[000e:adfd]
20180629.1827 e8df:backend.clicls[000e:adfd]
20180629.1827 e8df:backend.closed[000e:adfd]
20180629.1827 mpeh2 fd14 h2s_destroy - id0045 st06 fl3081 streams:11
20180629.1827 mpeh2 fd14 h2s_close/h2c - id0045 h2c_st07 h2c_fl0100 streams:11
20180629.1827 mpeh2 fd14 h2s_close/real - id0045 st06 fl3081 streams:11 -> 10

I have not seen this pattern (h2_process_demux/11 followed by h2c_error and
h2_process_mux/01a + h2_process_mux/01b) in other connections, only in those
in CLOSE_WAIT state.

Here is the piece of code with the added debug h2_process_demux/11, from the
sources of haproxy 1.8.12:

		/* RFC7540#5.1:closed: if this state is reached as a
		 * result of sending a RST_STREAM frame, the peer that
		 * receives the RST_STREAM might have already sent
		 * frames on the stream that cannot be withdrawn. An
		 * endpoint MUST ignore frames that it receives on
		 * closed streams after it has sent a RST_STREAM
		 * frame. An endpoint MAY choose to limit the period
		 * over which it ignores frames and treat frames that
		 * arrive after this time as being in error.
		 */
		if (!(h2s->flags & H2_SF_RST_SENT)) {
			/* RFC7540#5.1:closed: any frame other than
			 * PRIO/WU/RST in this state MUST be treated as
			 * a connection error
			 */
			if (h2c->dft != H2_FT_RST_STREAM &&
			    h2c->dft != H2_FT_PRIORITY &&
			    h2c->dft != H2_FT_WINDOW_UPDATE) {
				send_log(NULL, LOG_NOTICE, "mpeh2 fd%d h2_process_demux/11 - st%02x fl%08x\n",
					 mpeh2_h2c_fd(h2c), mpeh2_h2c_st0(h2c), mpeh2_h2c_flags(h2c));
				h2c_error(h2c, H2_ERR_STREAM_CLOSED);
				goto strm_err;
			}
		}

Here is the piece of code with the added debug h2_process_mux/01a and 01b,
from the sources of haproxy 1.8.12:

 fail:
	if (unlikely(h2c->st0 >= H2_CS_ERROR)) {
		send_log(NULL, LOG_NOTICE, "mpeh2 fd%d h2_process_mux/01a - st%02x fl%08x\n",
			 mpeh2_h2c_fd(h2c), mpeh2_h2c_st0(h2c), mpeh2_h2c_flags(h2c));
		if (h2c->st0 == H2_CS_ERROR) {
			if (h2c->max_id >= 0) {
				h2c_send_goaway_error(h2c, NULL);
				if (h2c->flags & H2_CF_MUX_BLOCK_ANY)
					return 0;
			}
			h2c->st0 = H2_CS_ERROR2; // sent (or failed hard) !
		}
		send_log(NULL, LOG_NOTICE, "mpeh2 fd%d h2_process_mux/01b - st%02x fl%08x\n",
			 mpeh2_h2c_fd(h2c), mpeh2_h2c_st0(h2c), mpeh2_h2c_flags(h2c));
		return 1;
	}

I hope this helps in isolating the problem. If it's not enough, I can add
more debug to the h2 mux if someone with better knowledge of the source code
and the h2 protocol suggests where.

Milan
Using different sources when connecting to a server
Hello,

I'm trying to add a feature which allows HAProxy to use more than one
source when connecting to a server of a backend. The main reason is to
avoid duplicating the 'server' lines to reach more than 64k connections
from HAProxy to one server.

So far I thought of two ways:
- each time the 'source' keyword is encountered on a 'server' line,
  duplicate the original 'struct server' and fill 'conn_src' with
  the correct source information. It's easy to implement but does
  not scale at all. In fact it mimics the multiple 'server' lines.
  The big advantage is that it can use all existing features that
  deal with 'struct server' (the 'balance' keyword, for example).
- use a list of 'struct conn_src' in 'struct server' and 'struct
  proxy' and choose the best source (using round-robin, leastconn,
  etc.) when a connection is about to be established.

The config syntax would look like this:

server srv 127.0.0.1:9000 source 127.0.0.2 source 127.0.0.3 source
127.0.0.4 source 127.0.0.5 source 127.0.0.6 source 127.0.1.0/24

Not using ip1,ip2,ip/cidr,... avoids confusion when using keywords like
usesrc, interface, etc.

Checks to the server would be done from each source, but it can be very
slow to cover the whole range.

The main problem I see is how to efficiently store all sources for each
server. Using the CIDR syntax can quickly allow millions of sources to
be used, and if we want to use algorithms like 'leastconn', we need to
remember how many connections are still active on a particular source
(using round-robin + an index into the range would otherwise have been
one solution).

I have some ideas but I would like to know the preferred way.

Thanks.

--
Aurélien Nephtali
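As an editorial aside to the storage concern above: a tiny hypothetical helper (not HAProxy code; 'cidr_source_count' is an invented name) shows how fast CIDR-specified source lists grow with the prefix length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not HAProxy code): how many candidate source
 * addresses an IPv4 'source' argument in CIDR notation expands to.
 * Illustrates why per-source bookkeeping gets expensive with short
 * prefixes. */
uint64_t cidr_source_count(int prefix)
{
	if (prefix < 0 || prefix > 32)
		return 0; /* not a valid IPv4 prefix length */
	return 1ULL << (32 - prefix);
}
```

A 'source 127.0.1.0/24' as in the example above expands to 256 candidates, while a /8 already reaches over 16 million, which is where a per-source connection counter for 'leastconn' becomes the storage problem described.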