Re: Help tracking "connection refused" under pressure on v2.9

2024-03-27 Thread Willy Tarreau
On Wed, Mar 27, 2024 at 02:26:47PM -0300, Ricardo Nabinger Sanchez wrote:
> On Wed, 27 Mar 2024 11:06:39 -0300
> Felipe Wilhelms Damasio  wrote:
> 
> > kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26
> > sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]
> 
> We managed to get a core file, and so created ticket #2508
> (https://github.com/haproxy/haproxy/issues/2508) with more details.

Thanks guys! So there seems to be an annoying bug. However I'm not sure
how this is related to your "connection refused", except if you try to
connect at the moment the process crashes and restarts, of course. I'm
seeing that the bug here is stktable_requeue_exp() calling task_queue()
with an invalid task expiration. I'm having a look now. I'll respond in
the issue with what I can find, thanks for your report.

Since you were speaking about FD count and maxconn at 900k, please let
me take this opportunity for a few extra sanity checks. By default we
assign up to about 50% of the FD to pipes (i.e. up to 25% pipes compared
to connections), so if maxconn is 900k you can reach 1800 + 900 = 2700k
FD. One thing to keep in mind is that /proc/sys/fs/nr_open sets a
per-process hard limit and usually is set to 1M, and that
/proc/sys/fs/file-max sets a system-wide limit and depends on the amount
of RAM, so both may interact with such a large setting. We could for
example imagine that at ~256k connections with as many pipes you're
reaching around 1M FDs and that the connection from socat to the CLI
socket cannot be accepted and is rejected. Since you recently updated
your kernel, it might be worth checking if the default values are still
in line with your usage.

Cheers,
Willy



RFC: PKCS#11 create private keys in worker process - take 3

2024-03-27 Thread Richard Chan
Fix typo in patch formatting.

Richard
diff --git a/include/haproxy/ssl_ckch.h b/include/haproxy/ssl_ckch.h
index 94c53b301..00ba2bf18 100644
--- a/include/haproxy/ssl_ckch.h
+++ b/include/haproxy/ssl_ckch.h
@@ -72,5 +72,14 @@ int __ssl_store_load_locations_file(char *path, int create_if_none, enum cafile_
 extern struct cert_exts cert_exts[];
 extern int (*ssl_commit_crlfile_cb)(const char *path, X509_STORE *ctx, char **err);
 
+struct ckch_data_cache {
+	struct ckch_data_cache *next;
+	struct ckch_data *key;
+	uint8_t *data;
+	int len;
+};
+
+struct ckch_data_cache *ckch_data_get(void *key);
+
 #endif /* USE_OPENSSL */
 #endif /* _HAPROXY_SSL_CRTLIST_H */
diff --git a/src/ssl_ckch.c b/src/ssl_ckch.c
index add42b69e..9c256f193 100644
--- a/src/ssl_ckch.c
+++ b/src/ssl_ckch.c
@@ -516,11 +516,63 @@ int ssl_sock_load_files_into_ckch(const char *path, struct ckch_data *data, char
  *
  *  Return 0 on success or != 0 on failure
  */
+
+struct ckch_data_cache *cache[31] = {0};
+
+/*
+ * This caches the BIO data in case the private key i
+ *
+ * to be instantiated in the child process (used for HSM keys)
+ *
+ * The key ckch_data* and the value is ckch_data_cache*
+ *
+ */
+static void ckch_data_put(void *key, uint8_t *data, int len)
+{
+	int idx = (unsigned long)key % 31;
+	struct ckch_data_cache *prev, *cur, *last = NULL;
+
+	ha_notice("caching ckch_data:%p len:%d\n", key, len);
+
+	for(prev = cache[idx]; prev != NULL;) {
+		last = prev;
+		prev = prev->next;
+	}
+
+	cur = malloc(sizeof(struct ckch_data_cache));
+	memset(cur, 0, sizeof(struct ckch_data_cache));
+	if (last == NULL)
+		cache[idx] = cur;
+	else
+		last->next = cur;
+	cur->key = key;
+	cur->data = malloc(len);
+	memcpy(cur->data, data, len);
+	cur->len = len;
+}
+
+struct ckch_data_cache *ckch_data_get(void *key)
+{
+	int idx = (unsigned long)key % 31;
+	struct ckch_data_cache *prev;
+
+	for(prev = cache[idx]; prev != NULL; prev = prev->next) {
+		if (prev->key == key)
+			return prev;
+	}
+
+	return NULL;
+}
+
 int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *data , char **err)
 {
 	BIO *in = NULL;
 	int ret = 1;
 	EVP_PKEY *key = NULL;
+	static char src[16384];
+	char *src_temp;
+	int len;
+	int filetype = 0;
 
 	if (buf) {
 		/* reading from a buffer */
@@ -538,14 +590,25 @@ int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *d
 
 		if (BIO_read_filename(in, path) <= 0)
 			goto end;
+		filetype = 1;
 	}
 
 	/* Read Private Key */
+	if(filetype) {
+		len = BIO_read(in, src, sizeof(src));
+		BIO_reset(in);
+	} else {
+		len = BIO_get_mem_data(in, _temp);
+		memcpy(src, src_temp, len);
+	}
+
 	key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL);
 	if (key == NULL) {
 		memprintf(err, "%sunable to load private key from file '%s'.\n",
 		  err && *err ? *err : "", path);
 		goto end;
+	} else {
+		ckch_data_put(data, (uint8_t*)src, len);
 	}
 
 	ret = 0;
@@ -581,6 +644,10 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d
 	EVP_PKEY *key = NULL;
 	HASSL_DH *dh = NULL;
 	STACK_OF(X509) *chain = NULL;
+	int filetype = 0;
+	int len;
+	static char src[16384];
+	char *src_temp;
 
 	if (buf) {
 		/* reading from a buffer */
@@ -603,11 +670,23 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d
 			  err && *err ? *err : "", path);
 			goto end;
 		}
+		filetype = 1;
 	}
 
 	/* Read Private Key */
+	if(filetype) {
+		len = BIO_read(in, src, sizeof(src));
+		BIO_reset(in);
+	} else {
+		len = BIO_get_mem_data(in, _temp);
+		memcpy(src, src_temp, len);
+	}
+
 	key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL);
 	/* no need to check for errors here, because the private key could be loaded later */
+	if(key && global.mode_MWORKER) {
+		ckch_data_put(data, (uint8_t *)src, len);
+	}
 
 #ifndef OPENSSL_NO_DH
 	/* Seek back to beginning of file */
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 824bdaa72..64bd7ac04 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -2009,10 +2009,77 @@ struct methodVersions methodVersions[] = {
 	{SSL_OP_NO_TLSv1_3, MC_SSL_O_NO_TLSV13, ctx_set_TLSv13_func, ssl_set_TLSv13_func, "TLSv1.3"}, /* CONF_TLSV13 */
 };
 
+/* cache SSL_CTX* to ckch_data
+ * so we can recreate private key
+ */
+struct ssl_ctx_cache {
+	void *key;
+	struct ckch_data_cache *value;
+	struct ssl_ctx_cache *next;
+	EVP_PKEY *pkey;
+};
+static struct ssl_ctx_cache *ctx_cache[31];
+
+void ssl_ctx_put(void *key, void *data)
+{
+	int idx = (unsigned long)key % 31;
+	struct ssl_ctx_cache *prev, *cur, *last = NULL;
+
+	ha_notice("ssl_ctx_cache key = %p value = %p\n", key, data);
+
+	for(prev = ctx_cache[idx]; prev != NULL;) {
+		last = prev;
+		prev = prev->next;
+	}
+
+	cur = malloc(sizeof(struct ssl_ctx_cache));
+   memset(cur, 0, sizeof(struct ssl_ctx_cache));
+	if (last == NULL)
+		ctx_cache[idx] = cur;
+	else
+		last->next = cur;
+	cur->next = NULL;
+	cur->key = key;
+	cur->value = data;
+}
+
+struct ssl_ctx_cache 

RFC: PKCS#11 create private keys in worker process - take 2

2024-03-27 Thread Richard Chan
Apologies for the badly pasted diff

Richard
diff --git a/include/haproxy/ssl_ckch.h b/include/haproxy/ssl_ckch.h
index 94c53b301..00ba2bf18 100644
--- a/include/haproxy/ssl_ckch.h
+++ b/include/haproxy/ssl_ckch.h
@@ -72,5 +72,14 @@ int __ssl_store_load_locations_file(char *path, int create_if_none, enum cafile_
 extern struct cert_exts cert_exts[];
 extern int (*ssl_commit_crlfile_cb)(const char *path, X509_STORE *ctx, char **err);
 
+struct ckch_data_cache {
+	struct ckch_data_cache *next;
+	struct ckch_data *key;
+	uint8_t *data;
+	int len;
+};
+
+struct ckch_data_cache *ckch_data_get(void *key);
+
 #endif /* USE_OPENSSL */
 #endif /* _HAPROXY_SSL_CRTLIST_H */
diff --git a/src/ssl_ckch.c b/src/ssl_ckch.c
index add42b69e..f659014ba 100644
--- a/src/ssl_ckch.c
+++ b/src/ssl_ckch.c
@@ -516,11 +516,63 @@ int ssl_sock_load_files_into_ckch(const char *path, struct ckch_data *data, char
  *
  *  Return 0 on success or != 0 on failure
  */
+
+struct ckch_data_cache *cache[31] = {0};
+
+/*
+ * This caches the BIO data in case the private key i
+ *
+ * to be instantiated in the child process (used for HSM keys)
+ *
+ * The key ckch_data* and the value is ckch_data_cache*
+ *
+ */
+static void ckch_data_put(void *key, uint8_t *data, int len)
+{
+	int idx = (unsigned long)key % 31;
+	struct ckch_data_cache *prev, *cur, *last = NULL;
+
+	ha_notice("caching ckch_data:%p len:%d\n", key, len);
+
+	for(prev = cache[idx]; prev != NULL;) {
+		last = prev;
+		prev = prev->next;
+	}
+
+	cur = malloc(sizeof(struct ckch_data_cache));
+	memset(cur, 0, sizeof(struct ckch_data_cache));
+	if (last == NULL)
+		cache[idx] = cur;
+	else
+		last = cur;
+	cur->key = key;
+	cur->data = malloc(len);
+	memcpy(cur->data, data, len);
+	cur->len = len;
+}
+
+struct ckch_data_cache *ckch_data_get(void *key)
+{
+	int idx = (unsigned long)key % 31;
+	struct ckch_data_cache *prev;
+
+	for(prev = cache[idx]; prev != NULL; prev = prev->next) {
+		if (prev->key == key)
+			return prev;
+	}
+
+	return NULL;
+}
+
 int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *data , char **err)
 {
 	BIO *in = NULL;
 	int ret = 1;
 	EVP_PKEY *key = NULL;
+	static char src[16384];
+	char *src_temp;
+	int len;
+	int filetype = 0;
 
 	if (buf) {
 		/* reading from a buffer */
@@ -538,14 +590,25 @@ int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct ckch_data *d
 
 		if (BIO_read_filename(in, path) <= 0)
 			goto end;
+		filetype = 1;
 	}
 
 	/* Read Private Key */
+	if(filetype) {
+		len = BIO_read(in, src, sizeof(src));
+		BIO_reset(in);
+	} else {
+		len = BIO_get_mem_data(in, _temp);
+		memcpy(src, src_temp, len);
+	}
+
 	key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL);
 	if (key == NULL) {
 		memprintf(err, "%sunable to load private key from file '%s'.\n",
 		  err && *err ? *err : "", path);
 		goto end;
+	} else {
+		ckch_data_put(data, (uint8_t*)src, len);
 	}
 
 	ret = 0;
@@ -581,6 +644,10 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d
 	EVP_PKEY *key = NULL;
 	HASSL_DH *dh = NULL;
 	STACK_OF(X509) *chain = NULL;
+	int filetype = 0;
+	int len;
+	static char src[16384];
+	char *src_temp;
 
 	if (buf) {
 		/* reading from a buffer */
@@ -603,11 +670,23 @@ int ssl_sock_load_pem_into_ckch(const char *path, char *buf, struct ckch_data *d
 			  err && *err ? *err : "", path);
 			goto end;
 		}
+		filetype = 1;
 	}
 
 	/* Read Private Key */
+	if(filetype) {
+		len = BIO_read(in, src, sizeof(src));
+		BIO_reset(in);
+	} else {
+		len = BIO_get_mem_data(in, _temp);
+		memcpy(src, src_temp, len);
+	}
+
 	key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL);
 	/* no need to check for errors here, because the private key could be loaded later */
+	if(key && global.mode_MWORKER) {
+		ckch_data_put(data, (uint8_t *)src, len);
+	}
 
 #ifndef OPENSSL_NO_DH
 	/* Seek back to beginning of file */
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 824bdaa72..64bd7ac04 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -2009,10 +2009,77 @@ struct methodVersions methodVersions[] = {
 	{SSL_OP_NO_TLSv1_3, MC_SSL_O_NO_TLSV13, ctx_set_TLSv13_func, ssl_set_TLSv13_func, "TLSv1.3"}, /* CONF_TLSV13 */
 };
 
+/* cache SSL_CTX* to ckch_data
+ * so we can recreate private key
+ */
+struct ssl_ctx_cache {
+	void *key;
+	struct ckch_data_cache *value;
+	struct ssl_ctx_cache *next;
+	EVP_PKEY *pkey;
+};
+static struct ssl_ctx_cache *ctx_cache[31];
+
+void ssl_ctx_put(void *key, void *data)
+{
+	int idx = (unsigned long)key % 31;
+	struct ssl_ctx_cache *prev, *cur, *last = NULL;
+
+	ha_notice("ssl_ctx_cache key = %p value = %p\n", key, data);
+
+	for(prev = ctx_cache[idx]; prev != NULL;) {
+		last = prev;
+		prev = prev->next;
+	}
+
+	cur = malloc(sizeof(struct ssl_ctx_cache));
+   memset(cur, 0, sizeof(struct ssl_ctx_cache));
+	if (last == NULL)
+		ctx_cache[idx] = cur;
+	else
+		last->next = cur;
+	cur->next = NULL;
+	cur->key = key;
+	cur->value = data;
+}
+
+struct ssl_ctx_cache 

RFC: PKCS#11 create private keys in worker process diff

2024-03-27 Thread Richard Chan
diff --git a/include/haproxy/ssl_ckch.h b/include/haproxy/ssl_ckch.h
index 94c53b301..00ba2bf18 100644
--- a/include/haproxy/ssl_ckch.h
+++ b/include/haproxy/ssl_ckch.h
@@ -72,5 +72,14 @@ int __ssl_store_load_locations_file(char *path, int
create_if_none, enum cafile_
 extern struct cert_exts cert_exts[];
 extern int (*ssl_commit_crlfile_cb)(const char *path, X509_STORE *ctx,
char **err);

+struct ckch_data_cache {
+ struct ckch_data_cache *next;
+ struct ckch_data *key;
+ uint8_t *data;
+ int len;
+};
+
+struct ckch_data_cache *ckch_data_get(void *key);
+
 #endif /* USE_OPENSSL */
 #endif /* _HAPROXY_SSL_CRTLIST_H */
diff --git a/src/ssl_ckch.c b/src/ssl_ckch.c
index add42b69e..f659014ba 100644
--- a/src/ssl_ckch.c
+++ b/src/ssl_ckch.c
@@ -516,11 +516,63 @@ int ssl_sock_load_files_into_ckch(const char *path,
struct ckch_data *data, char
  *
  *  Return 0 on success or != 0 on failure
  */
+
+struct ckch_data_cache *cache[31] = {0};
+
+/*
+ * This caches the BIO data in case the private key i
+ *
+ * to be instantiated in the child process (used for HSM keys)
+ *
+ * The key ckch_data* and the value is ckch_data_cache*
+ *
+ */
+static void ckch_data_put(void *key, uint8_t *data, int len)
+{
+ int idx = (unsigned long)key % 31;
+ struct ckch_data_cache *prev, *cur, *last = NULL;
+
+ ha_notice("caching ckch_data:%p len:%d\n", key, len);
+
+ for(prev = cache[idx]; prev != NULL;) {
+ last = prev;
+ prev = prev->next;
+ }
+
+ cur = malloc(sizeof(struct ckch_data_cache));
+ memset(cur, 0, sizeof(struct ckch_data_cache));
+ if (last == NULL)
+ cache[idx] = cur;
+ else
+ last = cur;
+ cur->key = key;
+ cur->data = malloc(len);
+ memcpy(cur->data, data, len);
+ cur->len = len;
+}
+
+struct ckch_data_cache *ckch_data_get(void *key)
+{
+ int idx = (unsigned long)key % 31;
+ struct ckch_data_cache *prev;
+
+ for(prev = cache[idx]; prev != NULL; prev = prev->next) {
+ if (prev->key == key)
+ return prev;
+ }
+
+ return NULL;
+}
+
 int ssl_sock_load_key_into_ckch(const char *path, char *buf, struct
ckch_data *data , char **err)
 {
  BIO *in = NULL;
  int ret = 1;
  EVP_PKEY *key = NULL;
+ static char src[16384];
+ char *src_temp;
+ int len;
+ int filetype = 0;

  if (buf) {
  /* reading from a buffer */
@@ -538,14 +590,25 @@ int ssl_sock_load_key_into_ckch(const char *path,
char *buf, struct ckch_data *d

  if (BIO_read_filename(in, path) <= 0)
  goto end;
+ filetype = 1;
  }

  /* Read Private Key */
+ if(filetype) {
+ len = BIO_read(in, src, sizeof(src));
+ BIO_reset(in);
+ } else {
+ len = BIO_get_mem_data(in, _temp);
+ memcpy(src, src_temp, len);
+ }
+
  key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL);
  if (key == NULL) {
  memprintf(err, "%sunable to load private key from file '%s'.\n",
   err && *err ? *err : "", path);
  goto end;
+ } else {
+ ckch_data_put(data, (uint8_t*)src, len);
  }

  ret = 0;
@@ -581,6 +644,10 @@ int ssl_sock_load_pem_into_ckch(const char *path, char
*buf, struct ckch_data *d
  EVP_PKEY *key = NULL;
  HASSL_DH *dh = NULL;
  STACK_OF(X509) *chain = NULL;
+ int filetype = 0;
+ int len;
+ static char src[16384];
+ char *src_temp;

  if (buf) {
  /* reading from a buffer */
@@ -603,11 +670,23 @@ int ssl_sock_load_pem_into_ckch(const char *path,
char *buf, struct ckch_data *d
   err && *err ? *err : "", path);
  goto end;
  }
+ filetype = 1;
  }

  /* Read Private Key */
+ if(filetype) {
+ len = BIO_read(in, src, sizeof(src));
+ BIO_reset(in);
+ } else {
+ len = BIO_get_mem_data(in, _temp);
+ memcpy(src, src_temp, len);
+ }
+
  key = PEM_read_bio_PrivateKey(in, NULL, NULL, NULL);
  /* no need to check for errors here, because the private key could be
loaded later */
+ if(key && global.mode_MWORKER) {
+ ckch_data_put(data, (uint8_t *)src, len);
+ }

 #ifndef OPENSSL_NO_DH
  /* Seek back to beginning of file */
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 824bdaa72..64bd7ac04 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -2009,10 +2009,77 @@ struct methodVersions methodVersions[] = {
  {SSL_OP_NO_TLSv1_3, MC_SSL_O_NO_TLSV13, ctx_set_TLSv13_func,
ssl_set_TLSv13_func, "TLSv1.3"}, /* CONF_TLSV13 */
 };

+/* cache SSL_CTX* to ckch_data
+ * so we can recreate private key
+ */
+struct ssl_ctx_cache {
+ void *key;
+ struct ckch_data_cache *value;
+ struct ssl_ctx_cache *next;
+ EVP_PKEY *pkey;
+};
+static struct ssl_ctx_cache *ctx_cache[31];
+
+void ssl_ctx_put(void *key, void *data)
+{
+ int idx = (unsigned long)key % 31;
+ struct ssl_ctx_cache *prev, *cur, *last = NULL;
+
+ ha_notice("ssl_ctx_cache key = %p value = %p\n", key, data);
+
+ for(prev = ctx_cache[idx]; prev != NULL;) {
+ last = prev;
+ prev = prev->next;
+ }
+
+ cur = malloc(sizeof(struct ssl_ctx_cache));
+   memset(cur, 0, sizeof(struct ssl_ctx_cache));
+ if (last == NULL)
+ ctx_cache[idx] = cur;
+ else
+ last->next = cur;
+ cur->next = NULL;
+ cur->key = key;
+ cur->value = data;
+}
+
+struct ssl_ctx_cache *ssl_ctx_get(void *key)
+{
+ int idx = (unsigned long)key % 31;
+ struct ssl_ctx_cache *prev;

RFC: PKCS#11 create private keys in worker process

2024-03-27 Thread Richard Chan
Hello,

This is an RFC to recreate private keys in the worker process
for PKCS#11, so that HSM keys can be used in -W mode.

- ssl_ckch.c: add map of ckch_data to PEM data
- ssl_sock.c: add map of SSL_CTX* to ckch_data
- maps are implemented using buckets of linked lists
  it is explicit and in the code for easier review instead of using
  more optimized hashmap implementations
- when the SSL context is created and the correct SSL_CTX is assigned
 with SSL_use_SSL_CTX
  the private key data is retrieved just once once, cached, and installed
into the
SSL_CTX;
  this is done in the worker process
- the PEM data has an arbitrary limit of 16384 bytes

Regards
Richard


Re: Help tracking "connection refused" under pressure on v2.9

2024-03-27 Thread Ricardo Nabinger Sanchez
On Wed, 27 Mar 2024 11:06:39 -0300
Felipe Wilhelms Damasio  wrote:

> kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26
> sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]

We managed to get a core file, and so created ticket #2508
(https://github.com/haproxy/haproxy/issues/2508) with more details.

Cheers,

-- 
Ricardo Nabinger Sanchez https://www.taghos.com.br/



[PATCH 0/1] CI improvement, display coredumps if any

2024-03-27 Thread Ilya Shipitsin
it is pretty rare case, however displaying "bt" may provide some ideas what 
went wrong

Ilya Shipitsin (1):
  CI: vtest: show   coredumps if any

 .github/workflows/vtest.yml | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

-- 
2.44.0




[PATCH 1/1] CI: vtest: show coredumps if any

2024-03-27 Thread Ilya Shipitsin
if any coredump is found, it is passed to gdb with
'thread apply all bt full'
---
 .github/workflows/vtest.yml | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/vtest.yml b/.github/workflows/vtest.yml
index 8c461385f..a704c92dc 100644
--- a/.github/workflows/vtest.yml
+++ b/.github/workflows/vtest.yml
@@ -49,6 +49,13 @@ jobs:
 - uses: actions/checkout@v4
   with:
 fetch-depth: 100
+
+- name: Setup coredumps
+  if: ${{ startsWith(matrix.os, 'ubuntu-') }}
+  run: |
+sudo sysctl -w fs.suid_dumpable=1
+sudo sysctl kernel.core_pattern=/tmp/core.%h.%e.%t
+
 #
 # Github Action cache key cannot contain comma, so we calculate it based on 
job name
 #
@@ -92,7 +99,8 @@ jobs:
   libpcre2-dev \
   libsystemd-dev \
   ninja-build \
-  socat
+  socat \
+  gdb
 - name: Install brew dependencies
   if: ${{ startsWith(matrix.os, 'macos-') }}
   run: |
@@ -150,6 +158,7 @@ jobs:
 # This is required for macOS which does not actually allow to increase
 # the '-n' soft limit to the hard limit, thus failing to run.
 ulimit -n 65536
+ulimit -c unlimited
 make reg-tests HAPROXY_ARGS="-dI" VTEST_PROGRAM=../vtest/vtest 
REGTESTS_TYPES=default,bug,devel
 - name: Config syntax check memleak smoke testing
   if: ${{ contains(matrix.name, 'ASAN') }}
@@ -175,3 +184,18 @@ jobs:
   echo "::endgroup::"
 done
 exit 1
+
+- name: Show coredumps
+  if: ${{ failure() && steps.vtest.outcome == 'failure' }}
+  run: |
+failed=false
+shopt -s nullglob
+for file in /tmp/core.*; do
+  failed=true
+  printf "::group::"
+  gdb -ex 'thread apply all bt full' ./haproxy $file
+  echo "::endgroup::"
+done
+if [ "$failed" = true ]; then
+  exit 1;
+fi
-- 
2.44.0




[PATCH] BUG/MINOR: server: fix persistence cookie for dynamic servers

2024-03-27 Thread Damien Claisse
When adding a server dynamically, we observe that when a backend has a
dynamic persistence cookie, the new server has no cookie as we receive
the following HTTP header:
set-cookie: test-cookie=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Whereas we were expecting to receive something like the following, which
is what we receive for a server added in the config file:
set-cookie: test-cookie=abcdef1234567890; path=/

After investigating code path, srv_set_dyncookie() is never called when
adding a server through CLI, it is only called when parsing config file
or using "set server bkd1/srv1 addr".

To fix this, call srv_set_dyncookie() inside cli_parse_add_server().

This patch must be backported up to 2.4.
---
 src/server.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/server.c b/src/server.c
index 555cae82c..a93798f03 100644
--- a/src/server.c
+++ b/src/server.c
@@ -5732,6 +5732,11 @@ static int cli_parse_add_server(char **args, char 
*payload, struct appctx *appct
 */
srv->rid = (srv_id_reuse_cnt) ? (srv_id_reuse_cnt / 2) : 0;
 
+   /* generate new server's dynamic cookie if enabled on backend */
+   if (be->ck_opts & PR_CK_DYNAMIC) {
+   srv_set_dyncookie(srv);
+   }
+
/* adding server cannot fail when we reach this:
 * publishing EVENT_HDL_SUB_SERVER_ADD
 */
-- 
2.34.1



Re: Help tracking "connection refused" under pressure on v2.9

2024-03-27 Thread Ricardo Nabinger Sanchez
On Wed, 27 Mar 2024 11:06:39 -0300
Felipe Wilhelms Damasio  wrote:

> kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26 sp:7fd7c002f100 
> error:0 in haproxy[42c000+1f7000]

In our build, this would be where instruction pointer was:

(gdb) list *0x5b10e6
0x5b10e6 is in __task_queue (src/task.c:285).
280(wq != _ctx->timers && wq != _ctx->timers));
281 #endif
282 /* if this happens the process is doomed anyway, so better 
catch it now
283  * so that we have the caller in the stack.
284  */
285 BUG_ON(task->expire == TICK_ETERNITY);
286
287 if (likely(task_in_wq(task)))
288 __task_unlink_wq(task);
289

However, we can't produce a stack trace from only the instruction
pointer; at least I don't know how (but would love to learn if it is
possible).

We are trying to get a core dump, too.

Cheers,

-- 
Ricardo Nabinger Sanchez https://www.taghos.com.br/



Re: Help tracking "connection refused" under pressure on v2.9

2024-03-27 Thread Felipe Wilhelms Damasio
Hi,

We've confirmed a few findings after we poured ~75-80Gbps of traffic
on purpose on a single machine:
- haproxy does indeed crashes;
- hence, we have no stats socket to collect a few things;

It seems that under pressure (not sure which conditions yet) the
kernel seems to be killing it. dmesg shows:

kernel: traps: haproxy[2057993] trap invalid opcode ip:5b3e26
sp:7fd7c002f100 error:0 in haproxy[42c000+1f7000]

This is a relatively new kernel:

Linux ndt-spo-12 6.1.60 #1 SMP PREEMPT_DYNAMIC Wed Oct 25 19:17:36 UTC
2023 x86_64 Intel(R) Xeon(R) Gold 6338N CPU @ 2.20GHz GenuineIntel
GNU/Linux

And it seems to happen on different kernels.

Does anyone have any tips on how to proceed to track this down?

Before the crash, "show info" showed only around 27,000 CurConn, so
not a great deal for maxconn 90.

Thanks!

On Tue, Mar 26, 2024 at 11:33 PM Felipe Wilhelms Damasio
 wrote:
>
> Hi,
>
> Since we don't really know how to track this one, we thought it might
> be better to reach out here to get feedback.
>
> We're using haproxy to deliver streaming files under pressure
> (80-90Gbps per machine). When using h1/http, splice-response is a
> great help to keep load under control. We use branch v2.9 at the
> moment.
>
> However, we've hit a bug with splice-response (Github issue created)
> and we had to use all day our haproxies without splicing.
>
> When we reach a certain load, a "connection refused" alarm starting
> buzzing like crazy (2-3 times every 30 minutes). This alarm is simply
> a connect to localhost with 500ms timeout:
>
> socat /dev/null  tcp4-connect:127.0.0.1:80,connect-timeout=0.5
>
> The log file indicates the port is virtually closed:
>
> 2024/03/27 01:06:04 socat[984480] E read(6, 0xe98000, 8192): Connection 
> refused
>
> The thing is haproxy process is very much alive...so we just restart
> it everytime this happens.
>
> What data do you suggest we collect to help track this down? Not sure
> if the stats socket is available, but we can definitely try and get
> some information.
>
> We're not running out of fds, or even connections with/without backlog
> (we have a global maxconn of 90 with ~30,000 streaming sessions
> active and we have tcp_max_syn_backlog set to 262144), we checked. But
> it seems to correlate with heavy traffic.
>
> Thanks!
>
> --
> Felipe Damasio



-- 
Felipe Damasio



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-27 Thread Willy Tarreau
Hi again Anthony,

I'm still having a few comments, but I think nothing that I cannot address
while merging it:

On Wed, Mar 13, 2024 at 12:33:54PM -0400, Anthony Deschamps wrote:
> +static inline u32 chash_compute_server_key(struct server *s)
> +{
> + u32 key = 0;
> + struct server_inetaddr srv_addr;
> +
> + enum srv_hash_key hash_key = s->hash_key;

In general, it's better to prefer the common "reverse christmas tree"
declaration order, i.e. put larger statements first and smaller next,
and more importantly not leave empty lines in the middle of declarations
as these usually indicate the beginning of statements.

> +
> + /* If hash-key is addr or addr-port then we need the address, but if we
> +  * can't determine the address then we fall back on hashing the puid.
> +  */
> + switch (s->hash_key) {

Here I initially got confused by the use of s->hash_key while the rest
below uses hash_key, I'd change it to hash_key as well since it's already
assigned at the top.

> + case SRV_HASH_KEY_ADDR:
> + case SRV_HASH_KEY_ADDR_PORT:
> + server_get_inetaddr(s, _addr);
> + if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) {
> + hash_key = SRV_HASH_KEY_ID;
> + }
> + break;
> +
> + default:
> + break;
> + }
> +
> + if (hash_key == SRV_HASH_KEY_ADDR_PORT) {
> + key = full_hash(srv_addr.port.svc);
> + }

Given that "key" is used immediately below and that I had to scroll up a
bit to verify that it was properly initialized, I'd rather set the default
zero value here in an else statement for ease of inspection later.

> + switch (hash_key) {
> + case SRV_HASH_KEY_ADDR_PORT:
> + case SRV_HASH_KEY_ADDR:
> + switch (srv_addr.family) {
> + case AF_INET:
> + key = full_hash(key + srv_addr.addr.v4.s_addr);
> + break;
> + case AF_INET6:
> + key = XXH32(srv_addr.addr.v6.s6_addr, 16, key);
> + break;
> + default:
(...)

> +  Arguments :
> +may be one of the following :
> +
> +  id The node keys will be derived from the server's position in
> + the server list.
> +

Here it's "the server's numeric identifier as set from 'id' or which
defaults to its position in the list".

The rest is good.

If you want I can perform these tiny cosmetic adjustments myself so as
to save you a round trip.

Thanks!
Willy



Re: [PATCH] BUG/MINOR: server: fix persistence cookie for dynamic servers

2024-03-27 Thread Amaury Denoyelle
On Fri, Mar 22, 2024 at 09:45:59AM +, Damien Claisse wrote:
> Hi Amaury!
> Sorry for the HTML message, I have to use a *** well-known enterprise MUA :(
> Le 22/03/2024 09:09, « Amaury Denoyelle »  a écrit :
>> This patch raises several interrogations. First, dynamic servers are
>> currently intended to not support cookies, hence why the keyword is
>> disabled for them. It has been done as a convenience but maybe it would
>> be a good time to review it carefully and see if whole cookie support
>> can be enabled.
> Indeed, there could be an opportunity to revisit this. What we observed is 
> that, even with dynamic servers, calling “set server bkd1/srv1 addr a.b.c.d” 
> would re-add the dynamic cookie for this server, and calling “enable 
> dynamic-cookie backend bkd1” would re-compute cookie for all servers in the 
> backend. It is expected as in these calls code path there is a call to 
> srv_set_dyncookie(). So there currently is at least a partial support for 
> dynamic cookies on dynamic servers, even if it’s not expected :)
>> Second, I'm unsure srv_set_dyncookie() should be called on
>> _srv_parse_init(). This function is also called for configuration file
>> servers. In particular, I do not know how we should handled duplicate
>> cookie values in this case.
> Not sure we really create duplicates here as we basically reset the same 
> cookie on the server, not on another one in the backend, at least I didn’t 
> observe such warnings in my logs while testing this patch yet. But I agree 
> that in the context of haproxy startup there would be 2 calls to 
> srv_set_dyncookie() per server which is useless and could create issues. 
> Maybe I could move that at the end of cli_parse_add_server()? Feel free to 
> suggest any better option.

Okay, I had the time to review dynamic cookie handling. For me it's fine
to use srv_set_dyncookie() for dynamic servers. I think however its new
invocation should be placed near the end of cli_parse_add_server().
Maybe with an extra check (be->ck_opts & PR_CK_DYNAMIC) to be similar
with invocation in check_config_validity().

This location will prevent duplicate invocation of srv_set_dyncookie()
for static servers, as for them the function must be called on post
parsing to ensure the backend key has been parsed.

Could you please adjust your patch ? Also, as it is a bugfix, you can
specify that it must be backported up to 2.4.

In the meantime, I will continue code inspection to determine if cookie
keyword can also be enabled for dynamic servers as well.

Regards,

-- 
Amaury Denoyelle



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-27 Thread Willy Tarreau
Hi Anthony,

On Sun, Mar 24, 2024 at 10:11:41PM -0400, Anthony Deschamps wrote:
> Hi Willy,
> 
> I'm just checking in to see if there's anything left I can help address here.

Thanks for the ping and sorry for the delay. It just fell through the
cracks in the middle of all other stuff I'm currently having to deal
with in parallel, as every time we're starting to get closer to a
release :-(

I'll try to deal with it today.

Willy