Re: [PATCH] improve performance of std::allocator::deallocate

2019-05-20 Thread Pádraig Brady



On 04/02/2019 07:33 PM, Padraig Brady wrote:
> On 03/07/2019 03:43 AM, Jonathan Wakely wrote:
>> OK, that makes me feel better about it. It's presumably much easier to
>> upgrade to 5.2 from 5.0 or 5.1 than it would be from 4.x.
>>
 How complicated is the fix to prevent the crashes? Would it be
 feasible for distros to backport that fix? I see that RHEL8 has
 jemalloc 5.0.1 for example, but if the fix could be backported to that
 release then it's less of a problem.
>>> The patch set is simple enough:
>>> https://github.com/jemalloc/jemalloc/pull/1341/commits
>> Thanks. That does seem reasonable for distros and other packagers to
>> backport, if they want to support 5.0 or 5.1 for their users.
>>
>> I'm leaning towards accepting the patch for gcc-9 (and if not, we
>> should do it early in the gcc-10 cycle).
>>
> FYI jemalloc 5.2 is released with the fix for zero sized deallocations:
> https://github.com/jemalloc/jemalloc/releases/tag/5.2.0
>
Friendly ping.
I can create a bug to track if you prefer.

cheers,
Pádraig


Re: [PATCH] improve performance of std::allocator::deallocate

2019-04-02 Thread Pádraig Brady



On 03/07/2019 03:43 AM, Jonathan Wakely wrote:
> On 06/03/19 22:27 +0000, Pádraig Brady wrote:
>>
>>
>> On 03/06/2019 01:44 AM, Jonathan Wakely wrote:
>>> On 06/03/19 09:20 +, Pádraig Brady wrote:
>>>> On 03/06/2019 12:50 AM, Jonathan Wakely wrote:
>>>>> On 06/03/19 02:43 +, Pádraig Brady wrote:
>>>>>>
>>>>>>
>>>>>> On 02/26/2019 04:23 PM, Padraig Brady wrote:
>>>>>>>
>>>>>>>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.
>>>>>>>>
>>>>>>>> How serious is the bug? What are the symptoms?
>>>>>>>>
>>>>>>> I've updated the commit summary to say it's a crash.
>>>>>>> Arguably that's better than mem corruption.
>>>>>>>
>>>>>>>> It looks like 5.1.0 is less than a year old, so older versions are
>>>>>>>> presumably still in wide use.
>>>>>>>>
>>>>>>>> We could potentially workaround it by making
>>>>>>>> new_allocator::allocate(0) call ::operator new(1) when
>>>>>>>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call
>>>>>>>> ::operator delete(p, 1). Obviously I'd prefer not to do that,
>>>>>>>> because
>>>>>>>> the default operator new already has an equivalent check, and only
>>>>>>>> programs linking to jemalloc require the workaround.
>>>>>>>>
>>>>>>> Right the jemalloc fix was released May 2018.
>>>>>>> It would be great to avoid the extra workarounds.
>>>>>>> Given this would be released about a year after the jemalloc fix
>>>>>>> was
>>>>>>> released,
>>>>>>> and that this would only manifest upon program rebuild,
>>>>>>> I would be inclined to not add any workarounds.
>>>>>>> For reference tcmalloc and glibc malloc were seen to work fine with
>>>>>>> this.
>>>>>>
>>>>>> Actually the jemalloc issue will only be fixed with the release
>>>>>> of 5.2
>>>>>> (a few weeks away).
>>>>>> I've updated the commit message in the attached accordingly.
>>>>>
>>>>> Hmm, I'm a bit nervous about making a change that will cause crashes
>>>>> unless using an unreleased version (I know it will be released by the
>>>>> time GCC 9.1 is released, but some people might upgrade GCC without
>>>>> upgrading jemalloc).
>>>> Yes it's not ideal. It does make it a lot less risky that one has to
>>>> rebuild programs to get the new functionality, so existing programs
>>>> will be unaffected. Also -fsized-deallocation is only enabled by
>>>> default on gcc with -std >= c++14.
>>>
>>> The default is -std=gnu++14 so it's enabled unless you explicitly
>>> choose an older dialect or add -fno-sized-deallocation.
>> Good point :)
>>>
>>>>> On the other hand, zero sized allocations should be rare in practice.
>>>> Yes they were rare in testing here
>>>>
>>>> So programs _rebuilt_ against the following would need to update
>>>> to jemalloc 5.2:
>>>>
>>>>  zero sized allocs, jemalloc<5.2, c++>=14, GCC>=9.1
>>>>
>>>> Hopefully that's a small enough set.
>>>
>>> Which versions of jemalloc replace operator delete(void*, size_t) ?
>>>
>>> Was that something new in 5.0 or did older versions already provide a
>>> replacement for the sized operator delete?
>>>
>>> If it was introduced in 5.0 then there won't be a problem for 3.x and
>>> 4.x because they'll use the default definition from libstdc++ which
>>> just calls ::operator delete(p).
>> Again very good point. The replacements were only added in 5.0:
>> https://github.com/jemalloc/jemalloc/commit/2319152d
>
> OK, that makes me feel better about it. It's presumably much easier to
> upgrade to 5.2 from 5.0 or 5.1 than it would be from 4.x.
>
>>>
>>> How complicated is the fix to prevent the crashes? Would it be
>>> feasible for distros to backport that fix? I see that RHEL8 has
>>> jemalloc 5.0.1 for example, but if the fix could be backported to that
>>> release then it's less of a problem.
>> The patch set is simple enough:
>> https://github.com/jemalloc/jemalloc/pull/1341/commits
>
> Thanks. That does seem reasonable for distros and other packagers to
> backport, if they want to support 5.0 or 5.1 for their users.
>
> I'm leaning towards accepting the patch for gcc-9 (and if not, we
> should do it early in the gcc-10 cycle).
>
FYI jemalloc 5.2 is released with the fix for zero sized deallocations:
https://github.com/jemalloc/jemalloc/releases/tag/5.2.0

cheers,
Pádraig


Re: [PATCH] improve performance of std::allocator::deallocate

2019-03-06 Thread Pádraig Brady



On 03/06/2019 01:44 AM, Jonathan Wakely wrote:
> On 06/03/19 09:20 +0000, Pádraig Brady wrote:
>> On 03/06/2019 12:50 AM, Jonathan Wakely wrote:
>>> On 06/03/19 02:43 +0000, Pádraig Brady wrote:
>>>>
>>>>
>>>> On 02/26/2019 04:23 PM, Padraig Brady wrote:
>>>>>
>>>>>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.
>>>>>>
>>>>>> How serious is the bug? What are the symptoms?
>>>>>>
>>>>> I've updated the commit summary to say it's a crash.
>>>>> Arguably that's better than mem corruption.
>>>>>
>>>>>> It looks like 5.1.0 is less than a year old, so older versions are
>>>>>> presumably still in wide use.
>>>>>>
>>>>>> We could potentially workaround it by making
>>>>>> new_allocator::allocate(0) call ::operator new(1) when
>>>>>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call
>>>>>> ::operator delete(p, 1). Obviously I'd prefer not to do that,
>>>>>> because
>>>>>> the default operator new already has an equivalent check, and only
>>>>>> programs linking to jemalloc require the workaround.
>>>>>>
>>>>> Right the jemalloc fix was released May 2018.
>>>>> It would be great to avoid the extra workarounds.
>>>>> Given this would be released about a year after the jemalloc fix was
>>>>> released,
>>>>> and that this would only manifest upon program rebuild,
>>>>> I would be inclined to not add any workarounds.
>>>>> For reference tcmalloc and glibc malloc were seen to work fine with
>>>>> this.
>>>>
>>>> Actually the jemalloc issue will only be fixed with the release of 5.2
>>>> (a few weeks away).
>>>> I've updated the commit message in the attached accordingly.
>>>
>>> Hmm, I'm a bit nervous about making a change that will cause crashes
>>> unless using an unreleased version (I know it will be released by the
>>> time GCC 9.1 is released, but some people might upgrade GCC without
>>> upgrading jemalloc).
>> Yes it's not ideal. It does make it a lot less risky that one has to
>> rebuild programs to get the new functionality, so existing programs
>> will be unaffected. Also -fsized-deallocation is only enabled by
>> default on gcc with -std >= c++14.
>
> The default is -std=gnu++14 so it's enabled unless you explicitly
> choose an older dialect or add -fno-sized-deallocation.
Good point :)
>
>>> On the other hand, zero sized allocations should be rare in practice.
>> Yes they were rare in testing here
>>
>> So programs _rebuilt_ against the following would need to update
>> to jemalloc 5.2:
>>
>>  zero sized allocs, jemalloc<5.2, c++>=14, GCC>=9.1
>>
>> Hopefully that's a small enough set.
>
> Which versions of jemalloc replace operator delete(void*, size_t) ?
>
> Was that something new in 5.0 or did older versions already provide a
> replacement for the sized operator delete?
>
> If it was introduced in 5.0 then there won't be a problem for 3.x and
> 4.x because they'll use the default definition from libstdc++ which
> just calls ::operator delete(p).
Again very good point. The replacements were only added in 5.0:
https://github.com/jemalloc/jemalloc/commit/2319152d
>
> How complicated is the fix to prevent the crashes? Would it be
> feasible for distros to backport that fix? I see that RHEL8 has
> jemalloc 5.0.1 for example, but if the fix could be backported to that
> release then it's less of a problem.
The patch set is simple enough:
https://github.com/jemalloc/jemalloc/pull/1341/commits

cheers,
Pádraig.


Re: [PATCH] improve performance of std::allocator::deallocate

2019-03-06 Thread Pádraig Brady
On 03/06/2019 12:50 AM, Jonathan Wakely wrote:
> On 06/03/19 02:43 +0000, Pádraig Brady wrote:
>>
>>
>> On 02/26/2019 04:23 PM, Padraig Brady wrote:
>>>
>>>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.
>>>>
>>>> How serious is the bug? What are the symptoms?
>>>>
>>> I've updated the commit summary to say it's a crash.
>>> Arguably that's better than mem corruption.
>>>
>>>> It looks like 5.1.0 is less than a year old, so older versions are
>>>> presumably still in wide use.
>>>>
>>>> We could potentially workaround it by making
>>>> new_allocator::allocate(0) call ::operator new(1) when
>>>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call
>>>> ::operator delete(p, 1). Obviously I'd prefer not to do that, because
>>>> the default operator new already has an equivalent check, and only
>>>> programs linking to jemalloc require the workaround.
>>>>
>>> Right the jemalloc fix was released May 2018.
>>> It would be great to avoid the extra workarounds.
>>> Given this would be released about a year after the jemalloc fix was
>>> released,
>>> and that this would only manifest upon program rebuild,
>>> I would be inclined to not add any workarounds.
>>> For reference tcmalloc and glibc malloc were seen to work fine with
>>> this.
>>
>> Actually the jemalloc issue will only be fixed with the release of 5.2
>> (a few weeks away).
>> I've updated the commit message in the attached accordingly.
>
> Hmm, I'm a bit nervous about making a change that will cause crashes
> unless using an unreleased version (I know it will be released by the
> time GCC 9.1 is released, but some people might upgrade GCC without
> upgrading jemalloc).
Yes it's not ideal. It does make it a lot less risky that one has to
rebuild programs to get the new functionality, so existing programs
will be unaffected. Also -fsized-deallocation is only enabled by
default on gcc with -std >= c++14.
>
> On the other hand, zero sized allocations should be rare in practice.
Yes they were rare in testing here

So programs _rebuilt_ against the following would need to update
to jemalloc 5.2:

  zero sized allocs, jemalloc<5.2, c++>=14, GCC>=9.1

Hopefully that's a small enough set.

thanks,
Pádraig


Re: [PATCH] improve performance of std::allocator::deallocate

2019-03-05 Thread Pádraig Brady


On 02/26/2019 04:23 PM, Padraig Brady wrote:
>
>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.
>>
>> How serious is the bug? What are the symptoms?
>>
> I've updated the commit summary to say it's a crash.
> Arguably that's better than mem corruption.
>
>> It looks like 5.1.0 is less than a year old, so older versions are
>> presumably still in wide use.
>>
>> We could potentially workaround it by making
>> new_allocator::allocate(0) call ::operator new(1) when
>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call
>> ::operator delete(p, 1). Obviously I'd prefer not to do that, because
>> the default operator new already has an equivalent check, and only
>> programs linking to jemalloc require the workaround.
>>
> Right the jemalloc fix was released May 2018.
> It would be great to avoid the extra workarounds.
> Given this would be released about a year after the jemalloc fix was
> released,
> and that this would only manifest upon program rebuild,
> I would be inclined to not add any workarounds.
> For reference tcmalloc and glibc malloc were seen to work fine with this.

Actually the jemalloc issue will only be fixed with the release of 5.2
(a few weeks away).
I've updated the commit message in the attached accordingly.

thanks,
Pádraig
From fec53f1500f5db6a2fe1b81c36a6695fd334758e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Fri, 22 Feb 2019 17:21:57 -0800
Subject: [PATCH] std::allocator::deallocate support sized-deallocation

Pass the size to the allocator so that it may optimize deallocation.
This was seen to significantly reduce the work required in jemalloc,
with about 40% reduction in CPU cycles in the free path.
Note jemalloc >= 5.2 is required to fix a crash with 0 sizes.

* libstdc++-v3/include/ext/new_allocator.h (deallocate): Pass the size
to the deallocator with -fsized-deallocation.
---
 libstdc++-v3/include/ext/new_allocator.h | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h
index e245391..f1ff7da 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -116,16 +116,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // __p is not permitted to be a null pointer.
   void
-  deallocate(pointer __p, size_type)
+  deallocate(pointer __p, size_type __t)
   {
 #if __cpp_aligned_new
 	if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
 	  {
-	::operator delete(__p, std::align_val_t(alignof(_Tp)));
+	::operator delete(__p,
+# if __cpp_sized_deallocation
+			  __t * sizeof(_Tp),
+# endif
+			  std::align_val_t(alignof(_Tp)));
 	return;
 	  }
 #endif
-	::operator delete(__p);
+	::operator delete(__p
+#if __cpp_sized_deallocation
+			  , __t * sizeof(_Tp)
+#endif
+			 );
   }
 
   size_type
-- 
2.5.5



Re: [PATCH] improve performance of std::allocator::deallocate

2019-02-26 Thread Pádraig Brady


On 02/26/2019 05:50 AM, Jonathan Wakely wrote:
> On 23/02/19 02:04 +0000, Pádraig Brady wrote:
>> Attached is a simple patch which has been extensively tested within
>> Facebook,
>> and is enabled by default in our code base.
>>
>> Passing the size to the allocator allows it to optimize deallocation,
>> and this was seen to significantly reduce the work required in jemalloc,
>> with about 40% reduction in CPU cycles in the free path.
>
> Thanks, the patch looks good.
>
> I would prefer to only change the function body, as otherwise LTO
> sometimes produces link-time warnings about ODR violations, because
> the same function is defined on two different lines. The ODR detection
> heuristics aren't smart enough to complain when the function
> declarator is always defined on the same line, but with two different
> function bodies.

We've not noticed issues here with LTO, though that could be
due to global enablement of -fsized-deallocation.
Anyway your suggestion is neater. Updated patch attached.
> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.
>
> How serious is the bug? What are the symptoms?
>
I've updated the commit summary to say it's a crash.
Arguably that's better than mem corruption.

> It looks like 5.1.0 is less than a year old, so older versions are
> presumably still in wide use.
>
> We could potentially workaround it by making
> new_allocator::allocate(0) call ::operator new(1) when
> __cpp_sized_deallocation is defined, and deallocate(p, 0) call
> ::operator delete(p, 1). Obviously I'd prefer not to do that, because
> the default operator new already has an equivalent check, and only
> programs linking to jemalloc require the workaround.
>
Right the jemalloc fix was released May 2018.
It would be great to avoid the extra workarounds.
Given this would be released about a year after the jemalloc fix was
released,
and that this would only manifest upon program rebuild,
I would be inclined to not add any workarounds.
For reference tcmalloc and glibc malloc were seen to work fine with this.

thanks!
Pádraig.
From 54b69ee9ac7a188fa938108b4eac46a66416d0ce Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Fri, 22 Feb 2019 17:21:57 -0800
Subject: [PATCH] std::allocator::deallocate support sized-deallocation

Pass the size to the allocator so that it may optimize deallocation.
This was seen to significantly reduce the work required in jemalloc,
with about 40% reduction in CPU cycles in the free path.
Note jemalloc >= 5.1 is required to fix a crash with 0 sizes.

* libstdc++-v3/include/ext/new_allocator.h (deallocate): Pass the size
to the deallocator with -fsized-deallocation.
---
 libstdc++-v3/include/ext/new_allocator.h | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h
index e245391..f1ff7da 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -116,16 +116,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // __p is not permitted to be a null pointer.
   void
-  deallocate(pointer __p, size_type)
+  deallocate(pointer __p, size_type __t)
   {
 #if __cpp_aligned_new
 	if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
 	  {
-	::operator delete(__p, std::align_val_t(alignof(_Tp)));
+	::operator delete(__p,
+# if __cpp_sized_deallocation
+			  __t * sizeof(_Tp),
+# endif
+			  std::align_val_t(alignof(_Tp)));
 	return;
 	  }
 #endif
-	::operator delete(__p);
+	::operator delete(__p
+#if __cpp_sized_deallocation
+			  , __t * sizeof(_Tp)
+#endif
+			 );
   }
 
   size_type
-- 
2.5.5



[PATCH] improve performance of std::allocator::deallocate

2019-02-22 Thread Pádraig Brady
Attached is a simple patch which has been extensively tested within
Facebook,
and is enabled by default in our code base.

Passing the size to the allocator allows it to optimize deallocation,
and this was seen to significantly reduce the work required in jemalloc,
with about 40% reduction in CPU cycles in the free path.

Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.

cheers,
Pádraig
From eb337e751bdeb5db7d41dd584c179c2cf19485a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Fri, 22 Feb 2019 17:21:57 -0800
Subject: [PATCH] std::allocator::deallocate support sized-deallocation

Pass the size to the allocator so that it may optimize deallocation.
This was seen to significantly reduce the work required in jemalloc,
with about 40% reduction in CPU cycles in the free path.
Note jemalloc >= 5.1 is required to fix a bug with 0 sizes.

* libstdc++-v3/include/ext/new_allocator.h (deallocate): Pass the size
to the deallocator with -fsized-deallocation.
---
 libstdc++-v3/include/ext/new_allocator.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h
index e245391..061d8ce 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -114,6 +114,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
   }
 
+#if __cpp_sized_deallocation
+  // __p is not permitted to be a null pointer.
+  void
+  deallocate(pointer __p, size_type __t)
+  {
+#if __cpp_aligned_new
+	if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
+	  {
+	::operator delete(__p, __t * sizeof(_Tp),
+			  std::align_val_t(alignof(_Tp)));
+	return;
+	  }
+#endif
+	::operator delete(__p, __t * sizeof(_Tp));
+  }
+#else
   // __p is not permitted to be a null pointer.
   void
   deallocate(pointer __p, size_type)
@@ -127,6 +143,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 	::operator delete(__p);
   }
+#endif
 
   size_type
   max_size() const _GLIBCXX_USE_NOEXCEPT
-- 
2.5.5