Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread Willy Tarreau
On Thu, Nov 25, 2021 at 04:29:55PM +, David CARLIER wrote:
> Ok I applied your suggestions and move back the malloc_trim/mallinfo
> part as it was before.

Thanks, now merged!
Willy



Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread David CARLIER
Ok I applied your suggestions and move back the malloc_trim/mallinfo
part as it was before.

On Thu, 25 Nov 2021 at 14:37, Willy Tarreau  wrote:
>
> On Thu, Nov 25, 2021 at 01:19:39PM +, David CARLIER wrote:
> > Here a patchset instead :)
>
> Thanks!
>
> I've reviewed it, I'm having some comments below:
>
> > From e8daa477b53a43ab39113cf0e9c43d9bbda1e9a9 Mon Sep 17 00:00:00 2001
> > From: David Carlier 
> > Date: Thu, 25 Nov 2021 10:26:50 +
> > Subject: [PATCH 1/3] MINOR: pool: refactoring to make it available for other
> >  systems/allocators.
> >
> > Actually it's focusing on linux/glibc, we re trying to expand it for other 
> > possible systems and allocators combinations.
> > ---
> >  include/haproxy/compat.h |  9 +++--
> >  src/pool.c   | 38 +++---
> >  2 files changed, 14 insertions(+), 33 deletions(-)
> >
> > diff --git a/include/haproxy/compat.h b/include/haproxy/compat.h
> > index 25b15a1f0..b801dac08 100644
> > --- a/include/haproxy/compat.h
> > +++ b/include/haproxy/compat.h
> > @@ -263,9 +263,14 @@ typedef struct { } empty_t;
> >  #endif
> >
> >  /* glibc 2.33 provides mallinfo2() that overcomes mallinfo()'s type 
> > limitations */
> > -#if (defined(__GNU_LIBRARY__) && (__GLIBC__ > 2 || __GLIBC__ == 2 && 
> > __GLIBC_MINOR__ >= 33))
> > +#if defined(__GNU_LIBRARY__)
> >  #include 
> > -#define HA_HAVE_MALLINFO2
> > +#if (__GLIBC__ > 2 || __GLIBC__ == 2 && __GLIBC_MINOR__ >= 33)
> > +#define HA_MALLINFO_API mallinfo2
> > +#else
> > +#define HA_MALLINFO_API mallinfo
> > +#endif
> > +#define HA_HAVE_MALLINFO
> >  #endif
> >
> >  /* FreeBSD also has malloc_usable_size() but it requires malloc_np.h */
> > diff --git a/src/pool.c b/src/pool.c
> > index af46b4469..16ef76cf0 100644
> > --- a/src/pool.c
> > +++ b/src/pool.c
> > @@ -42,14 +42,15 @@ int mem_poison_byte = -1;
> >  static int mem_fail_rate = 0;
> >  #endif
> >
> > -#if defined(HA_HAVE_MALLOC_TRIM)
> >  static int using_libc_allocator = 0;
> >
> >  /* ask the allocator to trim memory pools */
> >  static void trim_all_pools(void)
> >  {
> > +#ifdef HAVE_MALLOC_TRIM
> >   if (using_libc_allocator)
> >   malloc_trim(0);
> > +#endif
> >  }
> >
> >  /* check if we're using the same allocator as the one that provides
> > @@ -60,48 +61,23 @@ static void trim_all_pools(void)
> >   */
> >  static void detect_allocator(void)
> >  {
> > -#ifdef HA_HAVE_MALLINFO2
> > - struct mallinfo2 mi1, mi2;
> > -#else
> > - struct mallinfo mi1, mi2;
> > -#endif
> > +#if defined(HA_HAVE_MALLINFO)
> > + struct HA_MALLINFO_API mi1, mi2;
> >   void *ptr;
> >
> > -#ifdef HA_HAVE_MALLINFO2
> > - mi1 = mallinfo2();
> > -#else
> > - mi1 = mallinfo();
> > -#endif
> > + mi1 = HA_MALLINFO_API();
> >   ptr = DISGUISE(malloc(1));
> > -#ifdef HA_HAVE_MALLINFO2
> > - mi2 = mallinfo2();
> > -#else
> > - mi2 = mallinfo();
> > -#endif
> > + mi2 = HA_MALLINFO_API();
> >   free(DISGUISE(ptr));
> >
> >   using_libc_allocator = !!memcmp(, , sizeof(mi1));
> > +#endif
> >  }
>
> I find it unwelcome to use a single macro for the struct name and
> the function. It just turns out that they are the same but they're
> for different object types, and if one day we have mallinfo3() that
> works on struct mallinfo2, we're doomed.

I hope it will never come to such things, sounds terrible to me :)
even tough I have hard time to imagine a situation like this.

And the gain in legibility
> is not that great overall, I'd rather keep them with the ugly ifdefs
> here. If you prefer, just write two blocks in the function, one for
> mallinfo2 and one for the other. That's what I did the first time
> but wasn't much convinced about the benefit, but at this point it's
> a matter of taste.
>
> If you really want to have it this way, otherwise use two different
> macros, e.g.:
>   #define HA_MALLINFO_TYPE struct mallinfo
>   #define HA_MALLINFO_FUNC(x) mallinfo(x)
>
> But again this adds abstraction that doesn't help much, especially in
> tricky code which is written to diagnose certain things.
>
> > From c0f5cf6f935392e1ea69a84aae104de5ed06e68c Mon Sep 17 00:00:00 2001
> > From: David Carlier 
> > Date: Thu, 25 Nov 2021 13:00:54 +
> > Subject: [PATCH 2/3] MEDIUM: pool : detecting jemalloc usage at runtime.
> >
> > Here we're just looking up if the mallctl's jemalloc API symbol is present.
> > ---
> >  src/pool.c | 26 +++---
> >  1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/pool.c b/src/pool.c
> > index 16ef76cf0..65c037911 100644
> > --- a/src/pool.c
> > +++ b/src/pool.c
> > @@ -43,6 +43,7 @@ static int mem_fail_rate = 0;
> >  #endif
> >
> >  static int using_libc_allocator = 0;
> > +static int(*my_mallctl)(const char *, void *, size_t *, void *, size_t) = 
> > NULL;
> >
> >  /* ask the allocator to trim memory pools */
> >  static void trim_all_pools(void)
> > @@ -61,17 +62,28 @@ static void trim_all_pools(void)
> >   */
> >  

Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread Willy Tarreau
On Thu, Nov 25, 2021 at 01:19:39PM +, David CARLIER wrote:
> Here a patchset instead :)

Thanks!

I've reviewed it, I'm having some comments below:

> From e8daa477b53a43ab39113cf0e9c43d9bbda1e9a9 Mon Sep 17 00:00:00 2001
> From: David Carlier 
> Date: Thu, 25 Nov 2021 10:26:50 +
> Subject: [PATCH 1/3] MINOR: pool: refactoring to make it available for other
>  systems/allocators.
> 
> Actually it's focusing on linux/glibc, we re trying to expand it for other 
> possible systems and allocators combinations.
> ---
>  include/haproxy/compat.h |  9 +++--
>  src/pool.c   | 38 +++---
>  2 files changed, 14 insertions(+), 33 deletions(-)
> 
> diff --git a/include/haproxy/compat.h b/include/haproxy/compat.h
> index 25b15a1f0..b801dac08 100644
> --- a/include/haproxy/compat.h
> +++ b/include/haproxy/compat.h
> @@ -263,9 +263,14 @@ typedef struct { } empty_t;
>  #endif
>  
>  /* glibc 2.33 provides mallinfo2() that overcomes mallinfo()'s type 
> limitations */
> -#if (defined(__GNU_LIBRARY__) && (__GLIBC__ > 2 || __GLIBC__ == 2 && 
> __GLIBC_MINOR__ >= 33))
> +#if defined(__GNU_LIBRARY__)
>  #include 
> -#define HA_HAVE_MALLINFO2
> +#if (__GLIBC__ > 2 || __GLIBC__ == 2 && __GLIBC_MINOR__ >= 33)
> +#define HA_MALLINFO_API mallinfo2
> +#else
> +#define HA_MALLINFO_API mallinfo
> +#endif
> +#define HA_HAVE_MALLINFO
>  #endif
>  
>  /* FreeBSD also has malloc_usable_size() but it requires malloc_np.h */
> diff --git a/src/pool.c b/src/pool.c
> index af46b4469..16ef76cf0 100644
> --- a/src/pool.c
> +++ b/src/pool.c
> @@ -42,14 +42,15 @@ int mem_poison_byte = -1;
>  static int mem_fail_rate = 0;
>  #endif
>  
> -#if defined(HA_HAVE_MALLOC_TRIM)
>  static int using_libc_allocator = 0;
>  
>  /* ask the allocator to trim memory pools */
>  static void trim_all_pools(void)
>  {
> +#ifdef HAVE_MALLOC_TRIM
>   if (using_libc_allocator)
>   malloc_trim(0);
> +#endif
>  }
>  
>  /* check if we're using the same allocator as the one that provides
> @@ -60,48 +61,23 @@ static void trim_all_pools(void)
>   */
>  static void detect_allocator(void)
>  {
> -#ifdef HA_HAVE_MALLINFO2
> - struct mallinfo2 mi1, mi2;
> -#else
> - struct mallinfo mi1, mi2;
> -#endif
> +#if defined(HA_HAVE_MALLINFO)
> + struct HA_MALLINFO_API mi1, mi2;
>   void *ptr;
>  
> -#ifdef HA_HAVE_MALLINFO2
> - mi1 = mallinfo2();
> -#else
> - mi1 = mallinfo();
> -#endif
> + mi1 = HA_MALLINFO_API();
>   ptr = DISGUISE(malloc(1));
> -#ifdef HA_HAVE_MALLINFO2
> - mi2 = mallinfo2();
> -#else
> - mi2 = mallinfo();
> -#endif
> + mi2 = HA_MALLINFO_API();
>   free(DISGUISE(ptr));
>  
>   using_libc_allocator = !!memcmp(, , sizeof(mi1));
> +#endif
>  }

I find it unwelcome to use a single macro for the struct name and
the function. It just turns out that they are the same but they're
for different object types, and if one day we have mallinfo3() that
works on struct mallinfo2, we're doomed. And the gain in legibility
is not that great overall, I'd rather keep them with the ugly ifdefs
here. If you prefer, just write two blocks in the function, one for
mallinfo2 and one for the other. That's what I did the first time
but wasn't much convinced about the benefit, but at this point it's
a matter of taste.

If you really want to have it this way, otherwise use two different
macros, e.g.:
  #define HA_MALLINFO_TYPE struct mallinfo
  #define HA_MALLINFO_FUNC(x) mallinfo(x)

But again this adds abstraction that doesn't help much, especially in
tricky code which is written to diagnose certain things.

> From c0f5cf6f935392e1ea69a84aae104de5ed06e68c Mon Sep 17 00:00:00 2001
> From: David Carlier 
> Date: Thu, 25 Nov 2021 13:00:54 +
> Subject: [PATCH 2/3] MEDIUM: pool : detecting jemalloc usage at runtime.
> 
> Here we're just looking up if the mallctl's jemalloc API symbol is present.
> ---
>  src/pool.c | 26 +++---
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/src/pool.c b/src/pool.c
> index 16ef76cf0..65c037911 100644
> --- a/src/pool.c
> +++ b/src/pool.c
> @@ -43,6 +43,7 @@ static int mem_fail_rate = 0;
>  #endif
>  
>  static int using_libc_allocator = 0;
> +static int(*my_mallctl)(const char *, void *, size_t *, void *, size_t) = 
> NULL;
>  
>  /* ask the allocator to trim memory pools */
>  static void trim_all_pools(void)
> @@ -61,17 +62,28 @@ static void trim_all_pools(void)
>   */
>  static void detect_allocator(void)
>  {
> + extern int mallctl(const char *, void *, size_t *, void *, size_t) 
> __attribute__((weak));
> +
> + my_mallctl = mallctl;
> + 
> + if (!my_mallctl)
> + my_mallctl = get_sym_curr_addr("mallctl");
> +
> + if (!my_mallctl) {
>  #if defined(HA_HAVE_MALLINFO)
> - struct HA_MALLINFO_API mi1, mi2;
> - void *ptr;
> + struct HA_MALLINFO_API mi1, mi2;
> + void *ptr;
>  
> - mi1 = HA_MALLINFO_API();
> - ptr = 

Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread Илья Шипицин
чт, 25 нояб. 2021 г. в 18:22, David CARLIER :

> On Thu, 25 Nov 2021 at 12:25, Willy Tarreau  wrote:
> >
> > On Thu, Nov 25, 2021 at 04:38:27PM +0500,  ??? wrote:
> > > > Thus I think that instead of focusing on the OS we ought to continue
> > > > to focus on the allocator and improve runtime detection:
> > > >
> > > >   - glibc (currently detected using detect_allocator)
> > > > => use malloc_trim()
> > > >   - jemalloc at build time (mallctl != NULL)
> > > > => use mallctl() as you did
> > > >   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") !=
> NULL)
> > > > => use mallctl() as you did
> > > >   - others
> > > > => no trimming
> > > >
> > >
> > > I never imagined earlier that high level applications (such as reverse
> > > https/tcp proxy) cares about such low level things as allocator
> behaviour.
> > > no jokes, really.
> >
> > Yes it does count a lot. That's also why we spent a lot of time
> optimizing
> > the pools, to limit the number of calls to the system's allocator for
> > everything that uses a fixed size. I've seen some performance graphs in
> > our internal ticket tracker showing the memory consumption between and
> > after the switch to jemalloc, and the CPU usage as well, and sometimes
> > it was very important.
> >
> > Glibc improved quite a bit recently (2.28 or 2.33 I don't remember) by
> > implementing a per-thread cache in its ptmalloc. But in our case it's
> > still not as good as jemalloc, and neither perform as well as our
> > thread-local pools for fixed sizes.
> >
> > I'm seeing in a paper about snmalloc that it performs exceptionally well
> > for small allocations. I just don't know how this degrades depending on
> > the access patterns. For example some allocators are fast when you free()
> > in the exact reverse allocation order, but can start to fragment or have
> > more work to do finding holes if you don't free() in the exact same
> order.
> >
>
> If you re curious there is also mimalloc (with a pretty rich C api)
> from Microsoft too.
>
>
I had bad experience with tcmalloc on arm64. It turned out that it was not
properly tested under arm64.
actually, I think "do we really need massive alloc/free" instead of using
preallocated objects.


> > But that's something to keep an eye on in the future.
> >
> > Willy
>


Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread David CARLIER
On Thu, 25 Nov 2021 at 12:25, Willy Tarreau  wrote:
>
> On Thu, Nov 25, 2021 at 04:38:27PM +0500,  ??? wrote:
> > > Thus I think that instead of focusing on the OS we ought to continue
> > > to focus on the allocator and improve runtime detection:
> > >
> > >   - glibc (currently detected using detect_allocator)
> > > => use malloc_trim()
> > >   - jemalloc at build time (mallctl != NULL)
> > > => use mallctl() as you did
> > >   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
> > > => use mallctl() as you did
> > >   - others
> > > => no trimming
> > >
> >
> > I never imagined earlier that high level applications (such as reverse
> > https/tcp proxy) cares about such low level things as allocator behaviour.
> > no jokes, really.
>
> Yes it does count a lot. That's also why we spent a lot of time optimizing
> the pools, to limit the number of calls to the system's allocator for
> everything that uses a fixed size. I've seen some performance graphs in
> our internal ticket tracker showing the memory consumption between and
> after the switch to jemalloc, and the CPU usage as well, and sometimes
> it was very important.
>
> Glibc improved quite a bit recently (2.28 or 2.33 I don't remember) by
> implementing a per-thread cache in its ptmalloc. But in our case it's
> still not as good as jemalloc, and neither perform as well as our
> thread-local pools for fixed sizes.
>
> I'm seeing in a paper about snmalloc that it performs exceptionally well
> for small allocations. I just don't know how this degrades depending on
> the access patterns. For example some allocators are fast when you free()
> in the exact reverse allocation order, but can start to fragment or have
> more work to do finding holes if you don't free() in the exact same order.
>

If you re curious there is also mimalloc (with a pretty rich C api)
from Microsoft too.

> But that's something to keep an eye on in the future.
>
> Willy



Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread David CARLIER
Here a patchset instead :)

On Thu, 25 Nov 2021 at 12:25, Willy Tarreau  wrote:
>
> On Thu, Nov 25, 2021 at 04:38:27PM +0500,  ??? wrote:
> > > Thus I think that instead of focusing on the OS we ought to continue
> > > to focus on the allocator and improve runtime detection:
> > >
> > >   - glibc (currently detected using detect_allocator)
> > > => use malloc_trim()
> > >   - jemalloc at build time (mallctl != NULL)
> > > => use mallctl() as you did
> > >   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
> > > => use mallctl() as you did
> > >   - others
> > > => no trimming
> > >
> >
> > I never imagined earlier that high level applications (such as reverse
> > https/tcp proxy) cares about such low level things as allocator behaviour.
> > no jokes, really.
>
> Yes it does count a lot. That's also why we spent a lot of time optimizing
> the pools, to limit the number of calls to the system's allocator for
> everything that uses a fixed size. I've seen some performance graphs in
> our internal ticket tracker showing the memory consumption between and
> after the switch to jemalloc, and the CPU usage as well, and sometimes
> it was very important.
>
> Glibc improved quite a bit recently (2.28 or 2.33 I don't remember) by
> implementing a per-thread cache in its ptmalloc. But in our case it's
> still not as good as jemalloc, and neither perform as well as our
> thread-local pools for fixed sizes.
>
> I'm seeing in a paper about snmalloc that it performs exceptionally well
> for small allocations. I just don't know how this degrades depending on
> the access patterns. For example some allocators are fast when you free()
> in the exact reverse allocation order, but can start to fragment or have
> more work to do finding holes if you don't free() in the exact same order.
>
> But that's something to keep an eye on in the future.
>
> Willy
From 7206aff8ce6b6cb8c9b9b36d33b4f218f33cf0d1 Mon Sep 17 00:00:00 2001
From: David Carlier 
Date: Thu, 25 Nov 2021 13:14:37 +
Subject: [PATCH 3/3] MEDIUM: pool: using jemalloc when relevant.

Finally, we trim the arenas with jemalloc if detected otherwise falling back
 to glibc's malloc_trim.
---
 src/pool.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/pool.c b/src/pool.c
index 65c037911..3ec7a5386 100644
--- a/src/pool.c
+++ b/src/pool.c
@@ -48,10 +48,24 @@ static int(*my_mallctl)(const char *, void *, size_t *, void *, size_t) = NULL;
 /* ask the allocator to trim memory pools */
 static void trim_all_pools(void)
 {
+	if (using_libc_allocator) {
+		if (my_mallctl) {
+			unsigned int i, narenas = 0;
+			size_t len = sizeof(narenas);
+
+			if (my_mallctl("arenas.narenas", , , NULL, 0) == 0) {
+for (i = 0; i < narenas; i ++) {
+	char mib[32] = {0};
+	snprintf(mib, sizeof(mib), "arena.%u.purge", i);
+	(void)my_mallctl(mib, NULL, NULL, NULL, 0);
+}
+			}
+		} else {
 #ifdef HAVE_MALLOC_TRIM
-	if (using_libc_allocator)
-		malloc_trim(0);
+			malloc_trim(0);
 #endif
+		}
+	}
 }
 
 /* check if we're using the same allocator as the one that provides
-- 
2.33.1

From e8daa477b53a43ab39113cf0e9c43d9bbda1e9a9 Mon Sep 17 00:00:00 2001
From: David Carlier 
Date: Thu, 25 Nov 2021 10:26:50 +
Subject: [PATCH 1/3] MINOR: pool: refactoring to make it available for other
 systems/allocators.

Actually it's focusing on linux/glibc, we re trying to expand it for other possible systems and allocators combinations.
---
 include/haproxy/compat.h |  9 +++--
 src/pool.c   | 38 +++---
 2 files changed, 14 insertions(+), 33 deletions(-)

diff --git a/include/haproxy/compat.h b/include/haproxy/compat.h
index 25b15a1f0..b801dac08 100644
--- a/include/haproxy/compat.h
+++ b/include/haproxy/compat.h
@@ -263,9 +263,14 @@ typedef struct { } empty_t;
 #endif
 
 /* glibc 2.33 provides mallinfo2() that overcomes mallinfo()'s type limitations */
-#if (defined(__GNU_LIBRARY__) && (__GLIBC__ > 2 || __GLIBC__ == 2 && __GLIBC_MINOR__ >= 33))
+#if defined(__GNU_LIBRARY__)
 #include 
-#define HA_HAVE_MALLINFO2
+#if (__GLIBC__ > 2 || __GLIBC__ == 2 && __GLIBC_MINOR__ >= 33)
+#define HA_MALLINFO_API mallinfo2
+#else
+#define HA_MALLINFO_API mallinfo
+#endif
+#define HA_HAVE_MALLINFO
 #endif
 
 /* FreeBSD also has malloc_usable_size() but it requires malloc_np.h */
diff --git a/src/pool.c b/src/pool.c
index af46b4469..16ef76cf0 100644
--- a/src/pool.c
+++ b/src/pool.c
@@ -42,14 +42,15 @@ int mem_poison_byte = -1;
 static int mem_fail_rate = 0;
 #endif
 
-#if defined(HA_HAVE_MALLOC_TRIM)
 static int using_libc_allocator = 0;
 
 /* ask the allocator to trim memory pools */
 static void trim_all_pools(void)
 {
+#ifdef HAVE_MALLOC_TRIM
 	if (using_libc_allocator)
 		malloc_trim(0);
+#endif
 }
 
 /* check if we're using the same allocator as the one that provides
@@ -60,48 +61,23 @@ static void trim_all_pools(void)
  */
 static void detect_allocator(void)
 {

Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread Willy Tarreau
On Thu, Nov 25, 2021 at 04:38:27PM +0500,  ??? wrote:
> > Thus I think that instead of focusing on the OS we ought to continue
> > to focus on the allocator and improve runtime detection:
> >
> >   - glibc (currently detected using detect_allocator)
> > => use malloc_trim()
> >   - jemalloc at build time (mallctl != NULL)
> > => use mallctl() as you did
> >   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
> > => use mallctl() as you did
> >   - others
> > => no trimming
> >
> 
> I never imagined earlier that high level applications (such as reverse
> https/tcp proxy) cares about such low level things as allocator behaviour.
> no jokes, really.

Yes it does count a lot. That's also why we spent a lot of time optimizing
the pools, to limit the number of calls to the system's allocator for
everything that uses a fixed size. I've seen some performance graphs in
our internal ticket tracker showing the memory consumption between and
after the switch to jemalloc, and the CPU usage as well, and sometimes
it was very important.

Glibc improved quite a bit recently (2.28 or 2.33 I don't remember) by
implementing a per-thread cache in its ptmalloc. But in our case it's
still not as good as jemalloc, and neither perform as well as our
thread-local pools for fixed sizes.

I'm seeing in a paper about snmalloc that it performs exceptionally well
for small allocations. I just don't know how this degrades depending on
the access patterns. For example some allocators are fast when you free()
in the exact reverse allocation order, but can start to fragment or have
more work to do finding holes if you don't free() in the exact same order.

But that's something to keep an eye on in the future.

Willy



Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread David CARLIER
On Thu, 25 Nov 2021 at 11:38, Илья Шипицин  wrote:
>
>
>
> чт, 25 нояб. 2021 г. в 14:54, Willy Tarreau :
>>
>> Hi David,
>>
>> On Wed, Nov 24, 2021 at 08:08:39PM +, David CARLIER wrote:
>> > Hi
>> >
>> > here a little patch for FreeBSD to support memory arenas trimming.
>> (...)
>> > FreeBSD uses a slighty simplified version of jemalloc as libc allocator
>> > since many years (there is thoughts to eventually switch to snmalloc
>> >  but not before a long time).
>> > We detect the libc in the least hacky way in this case aiming as jemalloc
>> >  specific API then we try to purge arenas as much as we can.
>>
>> This one is interesting because according to the jemalloc doc on
>> my machine it could also work on linux with jemalloc, provided that
>> jemalloc is linked at build time (otherwise mallctl remains NULL).
>> However it remains available uding dlopen().
>>
>> Thus I think that instead of focusing on the OS we ought to continue
>> to focus on the allocator and improve runtime detection:
>>
>>   - glibc (currently detected using detect_allocator)
>> => use malloc_trim()
>>   - jemalloc at build time (mallctl != NULL)
>> => use mallctl() as you did
>>   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
>> => use mallctl() as you did
>>   - others
>> => no trimming
>

Fair enough I thought it was solely about system libc, but indeed the
Linux/jemalloc
 combination is very common, interesting to look at indeed.
Ok will provide the patchsets later on :-)
>
> I never imagined earlier that high level applications (such as reverse 
> https/tcp proxy) cares about such low level things as allocator behaviour.
> no jokes, really.
>

This is how you stand out ;-)
>
>>
>>
>> That would cover the vast majority of use cases where trimming is relevant.
>> I'm quite interested in jemalloc on linux, including when used at runtime
>> using LD_PRELOAD, because I see it quite frequently in high-performance
>> environments. When site operators see their memory usage fall two-fold and
>> the CPU as well by just prepending an LD_PRELOAD in their startup script,
>> they don't need to think twice and it's quickly done.
>>
>> Thus I think we could proceed like this in the pools init code:
>>
>>void detect_allocator()
>>{
>> extern int mallctl(const char *, void *, size_t *, void *, size_t) 
>> __attribute__((weak));
>>
>> my_mallctl = mallctl;
>> if (!my_mallctl)
>> my_mallctl = get_sym_curr_addr("mallctl");
>>
>> if (!my_mallctl) {
>> /* existing tests based on mallinfo/mallinfo2 */
>> }
>>}
>>
>> At this point we'd have:
>>   - my_mallctl not null when using jemalloc, hence trim() would always
>> use your code that purges all arenas
>>
>>   - otherwise if using_libc_allocator is set, we can call malloc_trim()
>>
>>   - otherwise it's an unknown allocator and we don't do anything.
>>
>> Thus could you please turn your patch into a small series that does
>> this ? The steps I'm seeing are:
>>   1. changing the detection logic so that it's always performed, and
>>  not just on HA_HAVE_MALLOC_TRIM. This means the variants of
>>  functions is_trim_enabled, trim_all_pools and detect_allocator()
>>  are all remerged into a single variant. trim_all_pools() will be
>>  the only one to rely on HA_HAVE_MALLOC_TRIM.
>>
>>   2. introduce mallctl availability detection and put its pointer into
>>  a separate one that we can get unsing dlfcn when linked in.
>>
>>   3. modify trim_all_pools() to use mallctl() when available, otherwise
>>  rely on current code with malloc_trim().
>>
>> This should make a significant improvement with plug-n-play detection
>> for the majority of users.
>>
>> Then if you want to introduce runtime detection for snmalloc to do the
>> same, this would make everything much easier.
>>
>> Thanks,
>> Willy
>>



Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread Илья Шипицин
чт, 25 нояб. 2021 г. в 14:54, Willy Tarreau :

> Hi David,
>
> On Wed, Nov 24, 2021 at 08:08:39PM +, David CARLIER wrote:
> > Hi
> >
> > here a little patch for FreeBSD to support memory arenas trimming.
> (...)
> > FreeBSD uses a slighty simplified version of jemalloc as libc allocator
> > since many years (there is thoughts to eventually switch to snmalloc
> >  but not before a long time).
> > We detect the libc in the least hacky way in this case aiming as jemalloc
> >  specific API then we try to purge arenas as much as we can.
>
> This one is interesting because according to the jemalloc doc on
> my machine it could also work on linux with jemalloc, provided that
> jemalloc is linked at build time (otherwise mallctl remains NULL).
> However it remains available uding dlopen().
>
> Thus I think that instead of focusing on the OS we ought to continue
> to focus on the allocator and improve runtime detection:
>
>   - glibc (currently detected using detect_allocator)
> => use malloc_trim()
>   - jemalloc at build time (mallctl != NULL)
> => use mallctl() as you did
>   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
> => use mallctl() as you did
>   - others
> => no trimming
>

I never imagined earlier that high level applications (such as reverse
https/tcp proxy) cares about such low level things as allocator behaviour.
no jokes, really.



>
> That would cover the vast majority of use cases where trimming is relevant.
> I'm quite interested in jemalloc on linux, including when used at runtime
> using LD_PRELOAD, because I see it quite frequently in high-performance
> environments. When site operators see their memory usage fall two-fold and
> the CPU as well by just prepending an LD_PRELOAD in their startup script,
> they don't need to think twice and it's quickly done.
>
> Thus I think we could proceed like this in the pools init code:
>
>void detect_allocator()
>{
> extern int mallctl(const char *, void *, size_t *, void *, size_t)
> __attribute__((weak));
>
> my_mallctl = mallctl;
> if (!my_mallctl)
> my_mallctl = get_sym_curr_addr("mallctl");
>
> if (!my_mallctl) {
> /* existing tests based on mallinfo/mallinfo2 */
> }
>}
>
> At this point we'd have:
>   - my_mallctl not null when using jemalloc, hence trim() would always
> use your code that purges all arenas
>
>   - otherwise if using_libc_allocator is set, we can call malloc_trim()
>
>   - otherwise it's an unknown allocator and we don't do anything.
>
> Thus could you please turn your patch into a small series that does
> this ? The steps I'm seeing are:
>   1. changing the detection logic so that it's always performed, and
>  not just on HA_HAVE_MALLOC_TRIM. This means the variants of
>  functions is_trim_enabled, trim_all_pools and detect_allocator()
>  are all remerged into a single variant. trim_all_pools() will be
>  the only one to rely on HA_HAVE_MALLOC_TRIM.
>
>   2. introduce mallctl availability detection and put its pointer into
>  a separate one that we can get unsing dlfcn when linked in.
>
>   3. modify trim_all_pools() to use mallctl() when available, otherwise
>  rely on current code with malloc_trim().
>
> This should make a significant improvement with plug-n-play detection
> for the majority of users.
>
> Then if you want to introduce runtime detection for snmalloc to do the
> same, this would make everything much easier.
>
> Thanks,
> Willy
>
>


Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Willy Tarreau
On Thu, Nov 25, 2021 at 01:29:13PM +0300, Dmitry Sivachenko wrote:
> 
> > On 25 Nov 2021, at 13:09, Willy Tarreau  wrote:
> > 
> > Please try the two attached patches. They re-backport something that
> > we earlier failed to backport that simplifies the ugly ifdefs everywhere
> > that virtually break every single backport related to SSL.
> > 
> > For me they work with/without SSL and with older versions (tested as far
> > as 0.9.8).
> > 
> > Thanks,
> > Willy
> > <0001-CLEANUP-servers-do-not-include-openssl-compat.patch><0002-CLEANUP-server-always-include-the-storage-for-SSL-se.patch>
> 
> 
> These two patches do fix the build.

OK thanks Dmitry. For now we'll probably just keep the workaround that
Amaury pushed in the mean time, but I'm pretty sure that sooner or later
we'll see yet another breakage, and if so it's likely that we decide to
merge them.

Cheers,
Willy



Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Dmitry Sivachenko



> On 25 Nov 2021, at 13:29, Amaury Denoyelle  wrote:
> 
> Dmitry, the patches that Willy provided you should fix the issue. Now,
> do you need a 2.4.10 to be emitted early with it or is it possible for
> you to keep the patches in your tree so we can have a more substantial
> list of change for a new version ?
> 

As for me there is no hurry: I'll add patches to FreeBSD ports collection.




Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Amaury Denoyelle
On Thu, Nov 25, 2021 at 11:42:01AM +0300, Dmitry Sivachenko wrote:
> On 24 Nov 2021, at 12:57, Christopher Faulet  wrote:
> > > > Hi,
> > > HAProxy 2.4.9 was released on 2021/11/23. It added 36 new commits
> > after version 2.4.8.
> > 
> Hello,
> version 2.4.9 fails to build with OpenSSL turned off:
>  src/server.c:207:51: error: no member named 'ssl_ctx' in 'struct server'
> if (srv->mux_proto || srv->use_ssl != 1 || !srv->ssl_ctx.alpn_str) {
> ~~~  ^
> src/server.c:241:37: error: no member named 'ssl_ctx' in 'struct server'
> const struct ist alpn = ist2(srv->ssl_ctx.alpn_str,
>  ~~~  ^
> src/server.c:242:37: error: no member named 'ssl_ctx' in 'struct server'
>  srv->ssl_ctx.alpn_len);
>  ~~~  ^
> Version 2.4.8 builds fine.
> 
> 

Dmitry, the patches that Willy provided you should fix the issue. Now,
do you need a 2.4.10 to be emitted early with it or is it possible for
you to keep the patches in your tree so we can have a more substantial
list of change for a new version ?

-- 
Amaury Denoyelle



Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Dmitry Sivachenko


> On 25 Nov 2021, at 13:09, Willy Tarreau  wrote:
> 
> Please try the two attached patches. They re-backport something that
> we earlier failed to backport that simplifies the ugly ifdefs everywhere
> that virtually break every single backport related to SSL.
> 
> For me they work with/without SSL and with older versions (tested as far
> as 0.9.8).
> 
> Thanks,
> Willy
> <0001-CLEANUP-servers-do-not-include-openssl-compat.patch><0002-CLEANUP-server-always-include-the-storage-for-SSL-se.patch>


These two patches do fix the build.

Thanks!


Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Willy Tarreau
On Thu, Nov 25, 2021 at 11:02:52AM +0100, Amaury Denoyelle wrote:
> On Thu, Nov 25, 2021 at 11:42:01AM +0300, Dmitry Sivachenko wrote:
> > On 24 Nov 2021, at 12:57, Christopher Faulet  wrote:
> > > > > Hi,
> > > > HAProxy 2.4.9 was released on 2021/11/23. It added 36 new commits
> > > after version 2.4.8.
> > > 
> > Hello,
> > version 2.4.9 fails to build with OpenSSL turned off:
> >  src/server.c:207:51: error: no member named 'ssl_ctx' in 'struct server'
> > if (srv->mux_proto || srv->use_ssl != 1 || !srv->ssl_ctx.alpn_str) {
> > ~~~  ^
> > src/server.c:241:37: error: no member named 'ssl_ctx' in 'struct server'
> > const struct ist alpn = ist2(srv->ssl_ctx.alpn_str,
> >  ~~~  ^
> > src/server.c:242:37: error: no member named 'ssl_ctx' in 'struct server'
> >  srv->ssl_ctx.alpn_len);
> >  ~~~  ^
> > Version 2.4.8 builds fine.
> > 
> > 
> 
> Thanks for your report. One of my commit to handle properly websocket on
> the server side introduces this issue. I'm working on a fix.

Please try the two attached patches. They re-backport something that
we earlier failed to backport that simplifies the ugly ifdefs everywhere
that virtually break every single backport related to SSL.

For me they work with/without SSL and with older versions (tested as far
as 0.9.8).

Thanks,
Willy
>From ce5ca630697a069ffbd81169663e5dbeb554179a Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Wed, 6 Oct 2021 11:23:32 +0200
Subject: CLEANUP: servers: do not include openssl-compat

This is exactly the same as for listeners, servers only include
openssl-compat to provide the SSL_CTX type to use as two pointers to
contexts, and to detect if NPN, ALPN, and cipher suites are supported,
and save up to 5 pointers in the ssl_ctx struct if not supported. This
is pointless, as these ones have all been supported for about a decade,
and including this file comes with a long dependency chain that impacts
lots of other files. The ctx was made a void*.

Now the build time was significantly reduced, from 9.2 to 8.1 seconds,
thanks to opensslconf.h being included "only" 456 times instead of 2424
previously!

The total number of lines of code compiled was reduced by 15%.

(cherry picked from commit 340ef2502eae2a37781e460d3590982c0e437fbd)
[wt: this is backported to get rid of the painful #ifdef around SSL
 fields that regularly break backports]
Signed-off-by: Willy Tarreau 
---
 include/haproxy/server-t.h | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h
index 429195388..32b649bf3 100644
--- a/include/haproxy/server-t.h
+++ b/include/haproxy/server-t.h
@@ -35,9 +35,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -341,7 +339,7 @@ struct server {
 #ifdef USE_OPENSSL
char *sni_expr; /* Temporary variable to store a sample 
expression for SNI */
struct {
-   SSL_CTX *ctx;
+   void *ctx;
struct {
unsigned char *ptr;
int size;
@@ -353,9 +351,7 @@ struct server {
__decl_thread(HA_RWLOCK_T lock); /* lock the cache and SSL_CTX 
during commit operations */
 
char *ciphers;  /* cipher suite to use if 
non-null */
-#ifdef HAVE_SSL_CTX_SET_CIPHERSUITES
char *ciphersuites; /* TLS 1.3 cipher suite 
to use if non-null */
-#endif
int options;/* ssl options */
int verify; /* verify method (set of 
SSL_VERIFY_* flags) */
struct tls_version_filter methods;  /* ssl methods */
@@ -363,14 +359,10 @@ struct server {
char *ca_file;  /* CAfile to use on verify */
char *crl_file; /* CRLfile to use on verify */
struct sample_expr *sni;/* sample expression for SNI */
-#ifdef OPENSSL_NPN_NEGOTIATED
char *npn_str;  /* NPN protocol string */
int npn_len;/* NPN protocol string length */
-#endif
-#ifdef TLSEXT_TYPE_application_layer_protocol_negotiation
char *alpn_str; /* ALPN protocol string */
int alpn_len;   /* ALPN protocol string length 
*/
-#endif
} ssl_ctx;
 #ifdef USE_QUIC
struct quic_transport_params quic_params; /* QUIC transport parameters 
*/
-- 
2.28.0

>From 6d395b766fd816cf2e7feea3286a689e635e35f9 Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Wed, 6 Oct 2021 14:48:37 +0200
Subject: CLEANUP: server: always include the storage for SSL settings

The SSL stuff in struct server takes less than 3% of it 

Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Amaury Denoyelle
On Thu, Nov 25, 2021 at 11:42:01AM +0300, Dmitry Sivachenko wrote:
> On 24 Nov 2021, at 12:57, Christopher Faulet  wrote:
> > > > Hi,
> > > HAProxy 2.4.9 was released on 2021/11/23. It added 36 new commits
> > after version 2.4.8.
> > 
> Hello,
> version 2.4.9 fails to build with OpenSSL turned off:
>  src/server.c:207:51: error: no member named 'ssl_ctx' in 'struct server'
> if (srv->mux_proto || srv->use_ssl != 1 || !srv->ssl_ctx.alpn_str) {
> ~~~  ^
> src/server.c:241:37: error: no member named 'ssl_ctx' in 'struct server'
> const struct ist alpn = ist2(srv->ssl_ctx.alpn_str,
>  ~~~  ^
> src/server.c:242:37: error: no member named 'ssl_ctx' in 'struct server'
>  srv->ssl_ctx.alpn_len);
>  ~~~  ^
> Version 2.4.8 builds fine.
> 
> 

Thanks for your report. One of my commit to handle properly websocket on
the server side introduces this issue. I'm working on a fix.

-- 
Amaury Denoyelle



Re: [PATCH]: MEDIUM: pool little FreeBSD support improvement.

2021-11-25 Thread Willy Tarreau
Hi David,

On Wed, Nov 24, 2021 at 08:08:39PM +, David CARLIER wrote:
> Hi
> 
> here a little patch for FreeBSD to support memory arenas trimming.
(...)
> FreeBSD uses a slighty simplified version of jemalloc as libc allocator
> since many years (there is thoughts to eventually switch to snmalloc
>  but not before a long time).
> We detect the libc in the least hacky way in this case aiming as jemalloc
>  specific API then we try to purge arenas as much as we can.

This one is interesting because according to the jemalloc doc on
my machine it could also work on linux with jemalloc, provided that
jemalloc is linked at build time (otherwise mallctl remains NULL).
However it remains available uding dlopen().

Thus I think that instead of focusing on the OS we ought to continue
to focus on the allocator and improve runtime detection:

  - glibc (currently detected using detect_allocator)
=> use malloc_trim()
  - jemalloc at build time (mallctl != NULL)
=> use mallctl() as you did
  - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
=> use mallctl() as you did
  - others
=> no trimming

That would cover the vast majority of use cases where trimming is relevant.
I'm quite interested in jemalloc on linux, including when used at runtime
using LD_PRELOAD, because I see it quite frequently in high-performance
environments. When site operators see their memory usage fall two-fold and
the CPU as well by just prepending an LD_PRELOAD in their startup script,
they don't need to think twice and it's quickly done.

Thus I think we could proceed like this in the pools init code:

   void detect_allocator()
   {
extern int mallctl(const char *, void *, size_t *, void *, size_t) 
__attribute__((weak));

my_mallctl = mallctl;
if (!my_mallctl)
my_mallctl = get_sym_curr_addr("mallctl");

if (!my_mallctl) {
/* existing tests based on mallinfo/mallinfo2 */
}
   }

At this point we'd have:
  - my_mallctl not null when using jemalloc, hence trim() would always
use your code that purges all arenas

  - otherwise if using_libc_allocator is set, we can call malloc_trim()

  - otherwise it's an unknown allocator and we don't do anything.

Thus could you please turn your patch into a small series that does
this ? The steps I'm seeing are:
  1. changing the detection logic so that it's always performed, and
 not just on HA_HAVE_MALLOC_TRIM. This means the variants of
 functions is_trim_enabled, trim_all_pools and detect_allocator()
 are all remerged into a single variant. trim_all_pools() will be
 the only one to rely on HA_HAVE_MALLOC_TRIM.

  2. introduce mallctl availability detection and put its pointer into
 a separate one that we can get unsing dlfcn when linked in.

  3. modify trim_all_pools() to use mallctl() when available, otherwise
 rely on current code with malloc_trim().

This should make a significant improvement with plug-n-play detection
for the majority of users.

Then if you want to introduce runtime detection for snmalloc to do the
same, this would make everything much easier.

Thanks,
Willy



Re: [ANNOUNCE] haproxy-2.4.9

2021-11-25 Thread Dmitry Sivachenko
On 24 Nov 2021, at 12:57, Christopher Faulet  wrote:
> 
> 
> Hi,
> 
> HAProxy 2.4.9 was released on 2021/11/23. It added 36 new commits
> after version 2.4.8.
> 


Hello,

version 2.4.9 fails to build with OpenSSL turned off:

 src/server.c:207:51: error: no member named 'ssl_ctx' in 'struct server'
if (srv->mux_proto || srv->use_ssl != 1 || !srv->ssl_ctx.alpn_str) {
~~~  ^
src/server.c:241:37: error: no member named 'ssl_ctx' in 'struct server'
const struct ist alpn = ist2(srv->ssl_ctx.alpn_str,
 ~~~  ^
src/server.c:242:37: error: no member named 'ssl_ctx' in 'struct server'
 srv->ssl_ctx.alpn_len);
 ~~~  ^

Version 2.4.8 builds fine.





Re: [PATCH 1/1] BUG/MINOR: lua: remove loop initial declarations

2021-11-25 Thread Christopher Faulet

Le 11/24/21 à 22:16, Bertrand Jacquin a écrit :

HAProxy is documented to support gcc >= 3.4 as per INSTALL file, however
hlua.c makes use of c11 only loop initial declarations leading to build
failure when using gcc-4.9.4:

   x86_64-unknown-linux-gnu-gcc -Iinclude  -Wchar-subscripts -Wcomment -Wformat -Winit-self -Wmain 
-Wmissing-braces -Wno-pragmas -Wparentheses -Wreturn-type -Wsequence-point -Wstrict-aliasing 
-Wswitch -Wtrigraphs -Wuninitialized -Wunknown-pragmas -Wunused-label -Wunused-variable 
-Wunused-value -Wpointer-sign -Wimplicit -pthread -fdiagnostics-color=auto -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -O3 -msse -mfpmath=sse 
-march=core2 -g -fPIC -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -fwrapv 
-Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered 
-Wno-missing-field-initializers -Wtype-limits  -DUSE_EPOLL  -DUSE_NETFILTER   -DUSE_PCRE2 
-DUSE_PCRE2_JIT -DUSE_POLL -DUSE_THREAD -DUSE_BACKTRACE   -DUSE_TPROXY -DUSE_LINUX_TPROXY 
-DUSE_LINUX_SPLICE -DUSE_LIBCRYPT -DUSE_CRYPT_H -DUSE_GETADDRINFO -DUSE_OPENSSL -DUSE_LUA 
-DUSE_ACCEPT4   -DUSE_SLZ -DUSE_CPU_AFFINITY -DUSE_TFO -DUSE_NS -DUSE_DL -DUSE_RT  -DUSE_PRCTL  
-DUSE_THREAD_DUMP-DUSE_PCRE2 -DPCRE2_CODE_UNIT_WIDTH=8  -I/usr/local/include 
-DCONFIG_HAPROXY_VERSION=\"2.5.0\" -DCONFIG_HAPROXY_DATE=\"2021/11/23\" -c -o 
src/connection.o src/connection.c
   src/hlua.c: In function 'hlua_config_prepend_path':
   src/hlua.c:11292:2: error: 'for' loop initial declarations are only allowed 
in C99 or C11 mode
 for (size_t i = 0; i < 2; i++) {
 ^
   src/hlua.c:11292:2: note: use option -std=c99, -std=gnu99, -std=c11 or 
-std=gnu11 to compile your code

This commit moves loop iterator to an explicit declaration.

No backport needed as this issue was introduced in v2.5-dev10~69 with
commit 9e5e586e35c5 ("BUG/MINOR: lua: Fix lua error handling in
`hlua_config_prepend_path()`")
---
  src/hlua.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/hlua.c b/src/hlua.c
index 08735374af77..8dea91e75832 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -11249,6 +11249,7 @@ static int hlua_config_prepend_path(char **args, int 
section_type, struct proxy
char *path;
char *type = "path";
struct prepend_path *p = NULL;
+   size_t i;
  
  	if (too_many_args(2, args, err, NULL)) {

goto err;
@@ -11289,7 +11290,7 @@ static int hlua_config_prepend_path(char **args, int 
section_type, struct proxy
 * thread. The remaining threads will be initialized based on
 * prepend_path_list.
 */
-   for (size_t i = 0; i < 2; i++) {
+   for (i = 0; i < 2; i++) {
lua_State *L = hlua_states[i];
const char *error;
  



Thanks, merged now with backports info updated.

--
Christopher Faulet