Re: Help tracking "connection refused" under pressure on v2.9
On Wed, Mar 27, 2024 at 02:26:47PM -0300, Ricardo Nabinger Sanchez wrote: > On Wed, 27 Mar 2024 11:06:39 -0300 > Felipe Wilhelms Damasio wrote: > > > kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 > > sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000] > > We managed to get a core file, and so created ticket #2508 > (https://github.com/haproxy/haproxy/issues/2508) with more details. Thanks guys! So there seems to be an annoying bug. However I'm not sure how this is related to your "connection refused", except if you try to connect at the moment the process crashes and restarts, of course. I'm seeing that the bug here is stktable_requeue_exp() calling task_queue() with an invalid task expiration. I'm having a look now. I'll respond in the issue with what I can find, thanks for your report. Since you were speaking about FD count and maxconn at 900k, please let me take this opportunity for a few extra sanity checks. By default we assign up to about 50% of the FD to pipes (i.e. up to 25% pipes compared to connections), so if maxconn is 900k you can reach 1800 + 900 = 2700k FD. One thing to keep in mind is that /proc/sys/fs/nr_open sets a per-process hard limit and usually is set to 1M, and that /proc/sys/fs/file-max sets a system-wide limit and depends on the amount of RAM, so both may interact with such a large setting. We could for example imagine that at ~256k connections with as many pipes you're reaching around 1M FDs and that the connection from socat to the CLI socket cannot be accepted and is rejected. Since you recently updated your kernel, it might be worth checking if the default values are still in line with your usage. Cheers, Willy
RFC: PKCS#11 create private keys in worker process - take 3
Fix typo in patch formatting. Richard diff --git a/include/haproxy/ssl_ckch.h b/include/haproxy/ssl_ckch.h index 94c53b301..00ba2bf18 100644 --- a/include/haproxy/ssl_ckch.h +++ b/include/haproxy/ssl_ckch.h @@ -72,5 +72,14 @@ int __ssl_store_load_locations_file(char *path, int create_if_none, enum cafile_ extern struct cert_exts cert_exts[]; extern int (*ssl_commit_crlfile_cb)(const char *path, X509_STORE *ctx, char **err); +struct ckch_data_cache { + struct ckch_data_cache *next; + struct ckch_data *key; + uint8_t *data; + int len; +}; + +struct ckch_data_cache *ckch_data_get(void *key); + #endif /* USE_OPENSSL */ #endif /* _HAPROXY_SSL_CRTLIST_H */ diff --git a/src/ssl_ckch.c b/src/ssl_ckch.c index add42b69e..9c256f193 100644 --- a/src/ssl_ckch.c +++ b/src/ssl_ckch.c @@ -516,11 +516,63 @@ int ssl_sock_load_files_into_ckch(const char *path, struct ckch_data *data, char * * Return 0 on success or != 0 on failure */ + +struct ckch_data_cache *cache[31] = {0}; + +/* + * This caches the BIO data in case the private key i + * + * to be instantiated in the child process (used for HSM keys) + * + * The key ckch_data* and the value is ckch_data_cache* + * + */ +static void ckch_data_put(void *key, uint8_t *data, int len) +{ + int idx = (unsigned long)key % 31; + struct ckch_data_cache *prev, *cur, *last = NULL; + + ha_notice("caching ckch_data:%p len:%d\n", key, len); + + for(prev = cache[idx]; prev != NULL;) { + last = prev; + prev = prev->next; + } + + cur = malloc(sizeof(struct ckch_data_cache)); + memset(cur, 0, sizeof(struct ckch_data_cache)); + if (last == NULL) + cache[idx] = cur; + else + last->next = cur; + cur->key = key; + cur->data = malloc(len); + memcpy(cur->data, data, len); + cur->len = len; +} + +struct ckch_data_cache *ckch_data_get(void *key) +{ + int idx = (unsigned long)key % 31; + struct ckch_data_cache *prev; + + for(prev = cache[idx]; prev != NULL; prev = prev->next) { + if (prev->key == key) + return prev; + } + + return NULL; +} + int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *data , char **err) { BIO *in = NULL; int ret = 1; EVP_PKEY *key = NULL; + static char src[16384]; + char *src_temp; + int len; + int filetype = 0; if (buf) { /* reading from a buffer */ @@ -538,14 +590,25 @@ int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *d if (BIO_read_filename(in, path) <= 0) goto end; + filetype = 1; } /* Read Private Key */ + if(filetype) { + len = BIO_read(in, src, sizeof(src)); + BIO_reset(in); + } else { + len = BIO_get_mem_data(in, _temp); + memcpy(src, src_temp, len); + } + key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL); if (key == NULL) { memprintf(err, "%sunable to load private key from file '%s'.\n", err && *err ? *err : "", path); goto end; + } else { + ckch_data_put(data, (uint8_t*)src, len); } ret = 0; @@ -581,6 +644,10 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d EVP_PKEY *key = NULL; HASSL_DH *dh = NULL; STACK_OF(X509) *chain = NULL; + int filetype = 0; + int len; + static char src[16384]; + char *src_temp; if (buf) { /* reading from a buffer */ @@ -603,11 +670,23 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d err && *err ? *err : "", path); goto end; } + filetype = 1; } /* Read Private Key */ + if(filetype) { + len = BIO_read(in, src, sizeof(src)); + BIO_reset(in); + } else { + len = BIO_get_mem_data(in, _temp); + memcpy(src, src_temp, len); + } + key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL); /* no need to check for errors here, because the private key could be loaded later */ + if(key && global.mode_MWORKER) { + ckch_data_put(data, (uint8_t *)src, len); + } #ifndef OPENSSL_NO_DH /* Seek back to beginning of file */ diff --git a/src/ssl_sock.c b/src/ssl_sock.c index 824bdaa72..64bd7ac04 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -2009,10 +2009,77 @@ struct methodVersions methodVersions[] = { {SSL_OP_NO_TLSv1_3, MC_SSL_O_NO_TLSV13, ctx_set_TLSv13_func, ssl_set_TLSv13_func, "TLSv1.3"}, /* CONF_TLSV13 */ }; +/* cache SSL_CTX* to ckch_data + * so we can recreate private key + */ +struct ssl_ctx_cache { + void *key; + struct ckch_data_cache *value; + struct ssl_ctx_cache *next; + EVP_PKEY *pkey; +}; +static struct ssl_ctx_cache *ctx_cache[31]; + +void ssl_ctx_put(void *key, void *data) +{ + int idx = (unsigned long)key % 31; + struct ssl_ctx_cache *prev, *cur, *last = NULL; + + ha_notice("ssl_ctx_cache key = %p value = %p\n", key, data); + + for(prev = ctx_cache[idx]; prev != NULL;) { + last = prev; + prev = prev->next; + } + + cur = malloc(sizeof(struct ssl_ctx_cache)); + memset(cur, 0, sizeof(struct ssl_ctx_cache)); + if (last == NULL) + ctx_cache[idx] = cur; + else + last->next = cur; + cur->next = NULL; + cur->key = key; + cur->value = data; +} + +struct ssl_ctx_cache
RFC: PKCS#11 create private keys in worker process - take 2
Apologies for the badly pasted diff Richard diff --git a/include/haproxy/ssl_ckch.h b/include/haproxy/ssl_ckch.h index 94c53b301..00ba2bf18 100644 --- a/include/haproxy/ssl_ckch.h +++ b/include/haproxy/ssl_ckch.h @@ -72,5 +72,14 @@ int __ssl_store_load_locations_file(char *path, int create_if_none, enum cafile_ extern struct cert_exts cert_exts[]; extern int (*ssl_commit_crlfile_cb)(const char *path, X509_STORE *ctx, char **err); +struct ckch_data_cache { + struct ckch_data_cache *next; + struct ckch_data *key; + uint8_t *data; + int len; +}; + +struct ckch_data_cache *ckch_data_get(void *key); + #endif /* USE_OPENSSL */ #endif /* _HAPROXY_SSL_CRTLIST_H */ diff --git a/src/ssl_ckch.c b/src/ssl_ckch.c index add42b69e..f659014ba 100644 --- a/src/ssl_ckch.c +++ b/src/ssl_ckch.c @@ -516,11 +516,63 @@ int ssl_sock_load_files_into_ckch(const char *path, struct ckch_data *data, char * * Return 0 on success or != 0 on failure */ + +struct ckch_data_cache *cache[31] = {0}; + +/* + * This caches the BIO data in case the private key i + * + * to be instantiated in the child process (used for HSM keys) + * + * The key ckch_data* and the value is ckch_data_cache* + * + */ +static void ckch_data_put(void *key, uint8_t *data, int len) +{ + int idx = (unsigned long)key % 31; + struct ckch_data_cache *prev, *cur, *last = NULL; + + ha_notice("caching ckch_data:%p len:%d\n", key, len); + + for(prev = cache[idx]; prev != NULL;) { + last = prev; + prev = prev->next; + } + + cur = malloc(sizeof(struct ckch_data_cache)); + memset(cur, 0, sizeof(struct ckch_data_cache)); + if (last == NULL) + cache[idx] = cur; + else + last = cur; + cur->key = key; + cur->data = malloc(len); + memcpy(cur->data, data, len); + cur->len = len; +} + +struct ckch_data_cache *ckch_data_get(void *key) +{ + int idx = (unsigned long)key % 31; + struct ckch_data_cache *prev; + + for(prev = cache[idx]; prev != NULL; prev = prev->next) { + if (prev->key == key) + return prev; + } + + return NULL; +} + int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *data , char **err) { BIO *in = NULL; int ret = 1; EVP_PKEY *key = NULL; + static char src[16384]; + char *src_temp; + int len; + int filetype = 0; if (buf) { /* reading from a buffer */ @@ -538,14 +590,25 @@ int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *d if (BIO_read_filename(in, path) <= 0) goto end; + filetype = 1; } /* Read Private Key */ + if(filetype) { + len = BIO_read(in, src, sizeof(src)); + BIO_reset(in); + } else { + len = BIO_get_mem_data(in, _temp); + memcpy(src, src_temp, len); + } + key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL); if (key == NULL) { memprintf(err, "%sunable to load private key from file '%s'.\n", err && *err ? *err : "", path); goto end; + } else { + ckch_data_put(data, (uint8_t*)src, len); } ret = 0; @@ -581,6 +644,10 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d EVP_PKEY *key = NULL; HASSL_DH *dh = NULL; STACK_OF(X509) *chain = NULL; + int filetype = 0; + int len; + static char src[16384]; + char *src_temp; if (buf) { /* reading from a buffer */ @@ -603,11 +670,23 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d err && *err ? *err : "", path); goto end; } + filetype = 1; } /* Read Private Key */ + if(filetype) { + len = BIO_read(in, src, sizeof(src)); + BIO_reset(in); + } else { + len = BIO_get_mem_data(in, _temp); + memcpy(src, src_temp, len); + } + key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL); /* no need to check for errors here, because the private key could be loaded later */ + if(key && global.mode_MWORKER) { + ckch_data_put(data, (uint8_t *)src, len); + } #ifndef OPENSSL_NO_DH /* Seek back to beginning of file */ diff --git a/src/ssl_sock.c b/src/ssl_sock.c index 824bdaa72..64bd7ac04 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -2009,10 +2009,77 @@ struct methodVersions methodVersions[] = { {SSL_OP_NO_TLSv1_3, MC_SSL_O_NO_TLSV13, ctx_set_TLSv13_func, ssl_set_TLSv13_func, "TLSv1.3"}, /* CONF_TLSV13 */ }; +/* cache SSL_CTX* to ckch_data + * so we can recreate private key + */ +struct ssl_ctx_cache { + void *key; + struct ckch_data_cache *value; + struct ssl_ctx_cache *next; + EVP_PKEY *pkey; +}; +static struct ssl_ctx_cache *ctx_cache[31]; + +void ssl_ctx_put(void *key, void *data) +{ + int idx = (unsigned long)key % 31; + struct ssl_ctx_cache *prev, *cur, *last = NULL; + + ha_notice("ssl_ctx_cache key = %p value = %p\n", key, data); + + for(prev = ctx_cache[idx]; prev != NULL;) { + last = prev; + prev = prev->next; + } + + cur = malloc(sizeof(struct ssl_ctx_cache)); + memset(cur, 0, sizeof(struct ssl_ctx_cache)); + if (last == NULL) + ctx_cache[idx] = cur; + else + last->next = cur; + cur->next = NULL; + cur->key = key; + cur->value = data; +} + +struct ssl_ctx_cache
RFC: PKCS#11 create private keys in worker process diff
diff --git a/include/haproxy/ssl_ckch.h b/include/haproxy/ssl_ckch.h index 94c53b301..00ba2bf18 100644 --- a/include/haproxy/ssl_ckch.h +++ b/include/haproxy/ssl_ckch.h @@ -72,5 +72,14 @@ int __ssl_store_load_locations_file(char *path, int create_if_none, enum cafile_ extern struct cert_exts cert_exts[]; extern int (*ssl_commit_crlfile_cb)(const char *path, X509_STORE *ctx, char **err); +struct ckch_data_cache { + struct ckch_data_cache *next; + struct ckch_data *key; + uint8_t *data; + int len; +}; + +struct ckch_data_cache *ckch_data_get(void *key); + #endif /* USE_OPENSSL */ #endif /* _HAPROXY_SSL_CRTLIST_H */ diff --git a/src/ssl_ckch.c b/src/ssl_ckch.c index add42b69e..f659014ba 100644 --- a/src/ssl_ckch.c +++ b/src/ssl_ckch.c @@ -516,11 +516,63 @@ int ssl_sock_load_files_into_ckch(const char *path, struct ckch_data *data, char * * Return 0 on success or != 0 on failure */ + +struct ckch_data_cache *cache[31] = {0}; + +/* + * This caches the BIO data in case the private key i + * + * to be instantiated in the child process (used for HSM keys) + * + * The key ckch_data* and the value is ckch_data_cache* + * + */ +static void ckch_data_put(void *key, uint8_t *data, int len) +{ + int idx = (unsigned long)key % 31; + struct ckch_data_cache *prev, *cur, *last = NULL; + + ha_notice("caching ckch_data:%p len:%d\n", key, len); + + for(prev = cache[idx]; prev != NULL;) { + last = prev; + prev = prev->next; + } + + cur = malloc(sizeof(struct ckch_data_cache)); + memset(cur, 0, sizeof(struct ckch_data_cache)); + if (last == NULL) + cache[idx] = cur; + else + last = cur; + cur->key = key; + cur->data = malloc(len); + memcpy(cur->data, data, len); + cur->len = len; +} + +struct ckch_data_cache *ckch_data_get(void *key) +{ + int idx = (unsigned long)key % 31; + struct ckch_data_cache *prev; + + for(prev = cache[idx]; prev != NULL; prev = prev->next) { + if (prev->key == key) + return prev; + } + + return NULL; +} + int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *data , char **err) { BIO *in = NULL; int ret = 1; EVP_PKEY *key = NULL; + static char src[16384]; + char *src_temp; + int len; + int filetype = 0; if (buf) { /* reading from a buffer */ @@ -538,14 +590,25 @@ int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *d if (BIO_read_filename(in, path) <= 0) goto end; + filetype = 1; } /* Read Private Key */ + if(filetype) { + len = BIO_read(in, src, sizeof(src)); + BIO_reset(in); + } else { + len = BIO_get_mem_data(in, _temp); + memcpy(src, src_temp, len); + } + key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL); if (key == NULL) { memprintf(err, "%sunable to load private key from file '%s'.\n", err && *err ? *err : "", path); goto end; + } else { + ckch_data_put(data, (uint8_t*)src, len); } ret = 0; @@ -581,6 +644,10 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d EVP_PKEY *key = NULL; HASSL_DH *dh = NULL; STACK_OF(X509) *chain = NULL; + int filetype = 0; + int len; + static char src[16384]; + char *src_temp; if (buf) { /* reading from a buffer */ @@ -603,11 +670,23 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d err && *err ? *err : "", path); goto end; } + filetype = 1; } /* Read Private Key */ + if(filetype) { + len = BIO_read(in, src, sizeof(src)); + BIO_reset(in); + } else { + len = BIO_get_mem_data(in, _temp); + memcpy(src, src_temp, len); + } + key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL); /* no need to check for errors here, because the private key could be loaded later */ + if(key && global.mode_MWORKER) { + ckch_data_put(data, (uint8_t *)src, len); + } #ifndef OPENSSL_NO_DH /* Seek back to beginning of file */ diff --git a/src/ssl_sock.c b/src/ssl_sock.c index 824bdaa72..64bd7ac04 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -2009,10 +2009,77 @@ struct methodVersions methodVersions[] = { {SSL_OP_NO_TLSv1_3, MC_SSL_O_NO_TLSV13, ctx_set_TLSv13_func, ssl_set_TLSv13_func, "TLSv1.3"}, /* CONF_TLSV13 */ }; +/* cache SSL_CTX* to ckch_data + * so we can recreate private key + */ +struct ssl_ctx_cache { + void *key; + struct ckch_data_cache *value; + struct ssl_ctx_cache *next; + EVP_PKEY *pkey; +}; +static struct ssl_ctx_cache *ctx_cache[31]; + +void ssl_ctx_put(void *key, void *data) +{ + int idx = (unsigned long)key % 31; + struct ssl_ctx_cache *prev, *cur, *last = NULL; + + ha_notice("ssl_ctx_cache key = %p value = %p\n", key, data); + + for(prev = ctx_cache[idx]; prev != NULL;) { + last = prev; + prev = prev->next; + } + + cur = malloc(sizeof(struct ssl_ctx_cache)); + memset(cur, 0, sizeof(struct ssl_ctx_cache)); + if (last == NULL) + ctx_cache[idx] = cur; + else + last->next = cur; + cur->next = NULL; + cur->key = key; + cur->value = data; +} + +struct ssl_ctx_cache *ssl_ctx_get(void *key) +{ + int idx = (unsigned long)key % 31; + struct ssl_ctx_cache *prev;
RFC: PKCS#11 create private keys in worker process
Hello, This is an RFC to recreate private keys in the worker process for PKCS#11, so that HSM keys can be used in -W mode. - ssl_ckch.c: add map of ckch_data to PEM data - ssl_sock.c: add map of SSL_CTX* to ckch_data - maps are implemented using buckets of linked lists it is explicit and in the code for easier review instead of using more optimized hashmap implementations - when the SSL context is created and the correct SSL_CTX is assigned with SSL_use_SSL_CTX the private key data is retrieved just once once, cached, and installed into the SSL_CTX; this is done in the worker process - the PEM data has an arbitrary limit of 16384 bytes Regards Richard
Re: Help tracking "connection refused" under pressure on v2.9
On Wed, 27 Mar 2024 11:06:39 -0300 Felipe Wilhelms Damasio wrote: > kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 > sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000] We managed to get a core file, and so created ticket #2508 (https://github.com/haproxy/haproxy/issues/2508) with more details. Cheers, -- Ricardo Nabinger Sanchez https://www.taghos.com.br/
[PATCH 0/1] CI improvement, display coredumps if any
it is pretty rare case, however displaying "bt" may provide some ideas what went wrong Ilya Shipitsin (1): CI: vtest: show coredumps if any .github/workflows/vtest.yml | 26 +- 1 file changed, 25 insertions(+), 1 deletion(-) -- 2.44.0
[PATCH 1/1] CI: vtest: show coredumps if any
if any coredump is found, it is passed to gdb with 'thread apply all bt full' --- .github/workflows/vtest.yml | 26 +- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/.github/workflows/vtest.yml b/.github/workflows/vtest.yml index 8c461385f..a704c92dc 100644 --- a/.github/workflows/vtest.yml +++ b/.github/workflows/vtest.yml @@ -49,6 +49,13 @@ jobs: - uses: actions/checkout@v4 with: fetch-depth: 100 + +- name: Setup coredumps + if: ${{ startsWith(matrix.os, 'ubuntu-') }} + run: | +sudo sysctl -w fs.suid_dumpable=1 +sudo sysctl kernel.core_pattern=/tmp/core.%h.%e.%t + # # Github Action cache key cannot contain comma, so we calculate it based on job name # @@ -92,7 +99,8 @@ jobs: libpcre2-dev \ libsystemd-dev \ ninja-build \ - socat + socat \ + gdb - name: Install brew dependencies if: ${{ startsWith(matrix.os, 'macos-') }} run: | @@ -150,6 +158,7 @@ jobs: # This is required for macOS which does not actually allow to increase # the '-n' soft limit to the hard limit, thus failing to run. ulimit -n 65536 +ulimit -c unlimited make reg-tests HAPROXY_ARGS="-dI" VTEST_PROGRAM=../vtest/vtest REGTESTS_TYPES=default,bug,devel - name: Config syntax check memleak smoke testing if: ${{ contains(matrix.name, 'ASAN') }} @@ -175,3 +184,18 @@ jobs: echo "::endgroup::" done exit 1 + +- name: Show coredumps + if: ${{ failure() && steps.vtest.outcome == 'failure' }} + run: | +failed=false +shopt -s nullglob +for file in /tmp/core.*; do + failed=true + printf "::group::" + gdb -ex 'thread apply all bt full' ./haproxy $file + echo "::endgroup::" +done +if [ "$failed" = true ]; then + exit 1; +fi -- 2.44.0
[PATCH] BUG/MINOR: server: fix persistence cookie for dynamic servers
When adding a server dynamically, we observe that when a backend has a dynamic persistence cookie, the new server has no cookie as we receive the following HTTP header: set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/ Whereas we were expecting to receive something like the following, which is what we receive for a server added in the config file: set-cookie: test-cookie=abcdef1234567890; path=/ After investigating code path, srv_set_dyncookie() is never called when adding a server through CLI, it is only called when parsing config file or using "set server bkd1/srv1 addr". To fix this, call srv_set_dyncookie() inside cli_parse_add_server(). This patch must be backported up to 2.4. --- src/server.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/server.c b/src/server.c index 555cae82c..a93798f03 100644 --- a/src/server.c +++ b/src/server.c @@ -5732,6 +5732,11 @@ static int cli_parse_add_server(char **args, char *payload, struct appctx *appct */ srv->rid = (srv_id_reuse_cnt) ? (srv_id_reuse_cnt / 2) : 0; + /* generate new server's dynamic cookie if enabled on backend */ + if (be->ck_opts & PR_CK_DYNAMIC) { + srv_set_dyncookie(srv); + } + /* adding server cannot fail when we reach this: * publishing EVENT_HDL_SUB_SERVER_ADD */ -- 2.34.1
Re: Help tracking "connection refused" under pressure on v2.9
On Wed, 27 Mar 2024 11:06:39 -0300 Felipe Wilhelms Damasio wrote: > kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 sp:7fd7c002f100 > error:0 in haproxy[42c000+1f7000] In our build, this would be where instruction pointer was: (gdb) list *0x5b10e6 0x5b10e6 is in __task_queue (src/task.c:285). 280(wq != _ctx->timers && wq != _ctx->timers)); 281 #endif 282 /* if this happens the process is doomed anyway, so better catch it now 283 * so that we have the caller in the stack. 284 */ 285 BUG_ON(task->expire == TICK_ETERNITY); 286 287 if (likely(task_in_wq(task))) 288 __task_unlink_wq(task); 289 However, we can't produce a stack trace from only the instruction pointer; at least I don't know how (but would love to learn if it is possible). We are trying to get a core dump, too. Cheers, -- Ricardo Nabinger Sanchez https://www.taghos.com.br/
Re: Help tracking "connection refused" under pressure on v2.9
Hi, We've confirmed a few findings after we poured ~75-80Gbps of traffic on purpose on a single machine: - haproxy does indeed crashes; - hence, we have no stats socket to collect a few things; It seems that under pressure (not sure which conditions yet) the kernel seems to be killing it. dmesg shows: kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000] This is a relatively new kernel: Linux ndt-spo-12 6.1.60 #1 SMP PREEMPT_DYNAMIC Wed Oct 25 19:17:36 UTC 2023 x86_64 Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz GenuineIntel GNU/Linux And it seems to happen on different kernels. Does anyone have any tips on how to proceed to track this down? Before the crash, "show info" showed only around 27,000 CurConn, so not a great deal for maxconn 90. Thanks! On Tue, Mar 26, 2024 at 11:33 PM Felipe Wilhelms Damasio wrote: > > Hi, > > Since we don't really know how to track this one, we thought it might > be better to reach out here to get feedback. > > We're using haproxy to deliver streaming files under pressure > (80-90Gbps per machine). When using h1/http, splice-response is a > great help to keep load under control. We use branch v2.9 at the > moment. > > However, we've hit a bug with splice-response (Github issue created) > and we had to use all day our haproxies without splicing. > > When we reach a certain load, a "connection refused" alarm starting > buzzing like crazy (2-3 times every 30 minutes). This alarm is simply > a connect to localhost with 500ms timeout: > > socat /dev/null tcp4-connect:127.0.0.1:80,connect-timeout=0.5 > > The log file indicates the port is virtually closed: > > 2024/03/27 01:06:04 socat[984480] E read(6, 0xe98000, 8192): Connection > refused > > The thing is haproxy process is very much alive...so we just restart > it everytime this happens. > > What data do you suggest we collect to help track this down? Not sure > if the stats socket is available, but we can definitely try and get > some information. > > We're not running out of fds, or even connections with/without backlog > (we have a global maxconn of 90 with ~30,000 streaming sessions > active and we have tcp_max_syn_backlog set to 262144), we checked. But > it seems to correlate with heavy traffic. > > Thanks! > > -- > Felipe Damasio -- Felipe Damasio
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi again Anthony, I'm still having a few comments, but I think nothing that I cannot address while merging it: On Wed, Mar 13, 2024 at 12:33:54PM -0400, Anthony Deschamps wrote: > +static inline u32 chash_compute_server_key(struct server *s) > +{ > + u32 key = 0; > + struct server_inetaddr srv_addr; > + > + enum srv_hash_key hash_key = s->hash_key; In general, it's better to prefer the common "reverse christmas tree" declaration order, i.e. put larger statements first and smaller next, and more importantly not leave empty lines in the middle of declarations as these usually indicate the beginning of statements. > + > + /* If hash-key is addr or addr-port then we need the address, but if we > + * can't determine the address then we fall back on hashing the puid. > + */ > + switch (s->hash_key) { Here I initially got confused by the use of s->hash_key while the rest below uses hash_key, I'd change it to hash_key as well since it's already assigned at the top. > + case SRV_HASH_KEY_ADDR: > + case SRV_HASH_KEY_ADDR_PORT: > + server_get_inetaddr(s, _addr); > + if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) { > + hash_key = SRV_HASH_KEY_ID; > + } > + break; > + > + default: > + break; > + } > + > + if (hash_key == SRV_HASH_KEY_ADDR_PORT) { > + key = full_hash(srv_addr.port.svc); > + } Given that "key" is used immediately below and that I had to scroll up a bit to verify that it was properly initialized, I'd rather set the default zero value here in an else statement for ease of inspection later. > + switch (hash_key) { > + case SRV_HASH_KEY_ADDR_PORT: > + case SRV_HASH_KEY_ADDR: > + switch (srv_addr.family) { > + case AF_INET: > + key = full_hash(key + srv_addr.addr.v4.s_addr); > + break; > + case AF_INET6: > + key = XXH32(srv_addr.addr.v6.s6_addr, 16, key); > + break; > + default: (...) > + Arguments : > +may be one of the following : > + > + id The node keys will be derived from the server's position in > + the server list. > + Here it's "the server's numeric identifier as set from 'id' or which defaults to its position in the list". The rest is good. If you want I can perform these tiny cosmetic adjustments myself so as to save you a round trip. Thanks! Willy
Re: [PATCH] BUG/MINOR: server: fix persistence cookie for dynamic servers
On Fri, Mar 22, 2024 at 09:45:59AM +, Damien Claisse wrote: > Hi Amaury! > Sorry for the HTML message, I have to use a *** well-known enterprise MUA :( > Le 22/03/2024 09:09, « Amaury Denoyelle » a écrit : >> This patch raises several interrogations. First, dynamic servers are >> currently intended to not support cookies, hence why the keyword is >> disabled for them. It has been done as a convenience but maybe it would >> be a good time to review it carefully and see if whole cookie support >> can be enabled. > Indeed, there could be an opportunity to revisit this. What we observed is > that, even with dynamic servers, calling “set server bkd1/srv1 addr a.b.c.d” > would re-add the dynamic cookie for this server, and calling “enable > dynamic-cookie backend bkd1” would re-compute cookie for all servers in the > backend. It is expected as in these calls code path there is a call to > srv_set_dyncookie(). So there currently is at least a partial support for > dynamic cookies on dynamic servers, even if it’s not expected :) >> Second, I'm unsure srv_set_dyncookie() should be called on >> _srv_parse_init(). This function is also called for configuration file >> servers. In particular, I do not know how we should handled duplicate >> cookie values in this case. > Not sure we really create duplicates here as we basically reset the same > cookie on the server, not on another one in the backend, at least I didn’t > observe such warnings in my logs while testing this patch yet. But I agree > that in the context of haproxy startup there would be 2 calls to > srv_set_dyncookie() per server which is useless and could create issues. > Maybe I could move that at the end of cli_parse_add_server()? Feel free to > suggest any better option. Okay, I had the time to review dynamic cookie handling. For me it's fine to use srv_set_dyncookie() for dynamic servers. I think however its new invocation should be placed near the end of cli_parse_add_server(). Maybe with an extra check (be->ck_opts & PR_CK_DYNAMIC) to be similar with invocation in check_config_validity(). This location will prevent duplicate invocation of srv_set_dyncookie() for static servers, as for them the function must be called on post parsing to ensure the backend key has been parsed. Could you please adjust your patch ? Also, as it is a bugfix, you can specify that it must be backported up to 2.4. In the meantime, I will continue code inspection to determine if cookie keyword can also be enabled for dynamic servers as well. Regards, -- Amaury Denoyelle
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Anthony, On Sun, Mar 24, 2024 at 10:11:41PM -0400, Anthony Deschamps wrote: > Hi Willy, > > I'm just checking in to see if there's anything left I can help address here. Thanks for the ping and sorry for the delay. It just fell through the cracks in the middle of all other stuff I'm currently having to deal with in parallel, as every time we're starting to get closer to a release :-( I'll try to deal with it today. Willy