Re: [PATCH] improve performance of std::allocator::deallocate
On 04/02/2019 07:33 PM, Padraig Brady wrote: > On 03/07/2019 03:43 AM, Jonathan Wakely wrote: >> OK, that makes me feel better about it. It's presumably much easier to >> upgrade to 5.2 from 5.0 or 5.1 than it would be from 4.x. >> How complicated is the fix to prevent the crashes? Would it be feasible for distros to backport that fix? I see that RHEL8 has jemalloc 5.0.1 for example, but if the fix could be backported to that release then it's less of a problem. >>> The patch set is simple enough: >>> https://github.com/jemalloc/jemalloc/pull/1341/commits >> Thanks. That does seem reasonable for distros and other packagers to >> backport, if they want to support 5.0 or 5.1 for their users. >> >> I'm leaning towards accepting the patch for gcc-9 (and if not, we >> should do it early in the gcc-10 cycle). >> > FYI jemalloc 5.2 is released with the fix for zero sized deallocations: > https://github.com/jemalloc/jemalloc/releases/tag/5.2.0 > Friendly ping. I can create a bug to track if you prefer. cheers, Pádraig
Re: [PATCH] improve performance of std::allocator::deallocate
On 03/07/2019 03:43 AM, Jonathan Wakely wrote: > On 06/03/19 22:27 +0000, Pádraig Brady wrote: >> >> >> On 03/06/2019 01:44 AM, Jonathan Wakely wrote: >>> On 06/03/19 09:20 +, Pádraig Brady wrote: >>>> On 03/06/2019 12:50 AM, Jonathan Wakely wrote: >>>>> On 06/03/19 02:43 +, Pádraig Brady wrote: >>>>>> >>>>>> >>>>>> On 02/26/2019 04:23 PM, Padraig Brady wrote: >>>>>>> >>>>>>>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. >>>>>>>> >>>>>>>> How serious is the bug? What are the symptoms? >>>>>>>> >>>>>>> I've updated the commit summary to say it's a crash. >>>>>>> Arguably that's better than mem corruption. >>>>>>> >>>>>>>> It looks like 5.1.0 is less than a year old, so older versions are >>>>>>>> presumably still in wide use. >>>>>>>> >>>>>>>> We could potentially workaround it by making >>>>>>>> new_allocator::allocate(0) call ::operator new(1) when >>>>>>>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call >>>>>>>> ::operator delete(p, 1). Obviously I'd prefer not to do that, >>>>>>>> because >>>>>>>> the default operator new already has an equivalent check, and only >>>>>>>> programs linking to jemalloc require the workaround. >>>>>>>> >>>>>>> Right the jemalloc fix was released May 2018. >>>>>>> It would be great to avoid the extra workarounds. >>>>>>> Given this would be released about a year after the jemalloc fix >>>>>>> was >>>>>>> released, >>>>>>> and that this would only manifest upon program rebuild, >>>>>>> I would be inclined to not add any workarounds. >>>>>>> For reference tcmalloc and glibc malloc were seen to work fine with >>>>>>> this. >>>>>> >>>>>> Actually the jemalloc issue will only be fixed with the release >>>>>> of 5.2 >>>>>> (a few weeks away). >>>>>> I've updated the commit message in the attached accordingly. >>>>> >>>>> Hmm, I'm a bit nervous about making a change that will cause crashes >>>>> unless using an unreleased version (I know it will be released by the >>>>> time GCC 9.1 is released, but some people might upgrade GCC without >>>>> upgrading jemalloc). >>>> Yes it's not ideal. It does make it a lot less risky that one has to >>>> rebuild programs to get the new functionality, so existing programs >>>> will be unaffected. Also -fsized-deallocation is only enabled by >>>> default on gcc with -std >= c++14. >>> >>> The default is -std=gnu++14 so it's enabled unless you explicitly >>> choose an older dialect or add -fno-sized-deallocation. >> Good point :) >>> >>>>> On the other hand, zero sized allocations should be rare in practice. >>>> Yes they were rare in testing here >>>> >>>> So programs _rebuilt_ against the following would need to update >>>> to jemalloc 5.2: >>>> >>>> zero sized allocs, jemalloc<5.2, c++>=14, GCC>=9.1 >>>> >>>> Hopefully that's a small enough set. >>> >>> Which versions of jemalloc replace operator delete(void*, size_t) ? >>> >>> Was that something new in 5.0 or did older versions already provide a >>> replacement for the sized operator delete? >>> >>> If it was introduced in 5.0 then there won't be a problem for 3.x and >>> 4.x because they'll use the default definition from libstdc++ which >>> just calls ::operator delete(p). >> Again very good point. The replacements were only added in 5.0: >> https://github.com/jemalloc/jemalloc/commit/2319152d > > OK, that makes me feel better about it. It's presumably much easier to > upgrade to 5.2 from 5.0 or 5.1 than it would be from 4.x. > >>> >>> How complicated is the fix to prevent the crashes? Would it be >>> feasible for distros to backport that fix? I see that RHEL8 has >>> jemalloc 5.0.1 for example, but if the fix could be backported to that >>> release then it's less of a problem. >> The patch set is simple enough: >> https://github.com/jemalloc/jemalloc/pull/1341/commits > > Thanks. That does seem reasonable for distros and other packagers to > backport, if they want to support 5.0 or 5.1 for their users. > > I'm leaning towards accepting the patch for gcc-9 (and if not, we > should do it early in the gcc-10 cycle). > FYI jemalloc 5.2 is released with the fix for zero sized deallocations: https://github.com/jemalloc/jemalloc/releases/tag/5.2.0 cheers, Pádraig
Re: [PATCH] improve performance of std::allocator::deallocate
On 03/06/2019 01:44 AM, Jonathan Wakely wrote: > On 06/03/19 09:20 +0000, Pádraig Brady wrote: >> On 03/06/2019 12:50 AM, Jonathan Wakely wrote: >>> On 06/03/19 02:43 +0000, Pádraig Brady wrote: >>>> >>>> >>>> On 02/26/2019 04:23 PM, Padraig Brady wrote: >>>>> >>>>>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. >>>>>> >>>>>> How serious is the bug? What are the symptoms? >>>>>> >>>>> I've updated the commit summary to say it's a crash. >>>>> Arguably that's better than mem corruption. >>>>> >>>>>> It looks like 5.1.0 is less than a year old, so older versions are >>>>>> presumably still in wide use. >>>>>> >>>>>> We could potentially workaround it by making >>>>>> new_allocator::allocate(0) call ::operator new(1) when >>>>>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call >>>>>> ::operator delete(p, 1). Obviously I'd prefer not to do that, >>>>>> because >>>>>> the default operator new already has an equivalent check, and only >>>>>> programs linking to jemalloc require the workaround. >>>>>> >>>>> Right the jemalloc fix was released May 2018. >>>>> It would be great to avoid the extra workarounds. >>>>> Given this would be released about a year after the jemalloc fix was >>>>> released, >>>>> and that this would only manifest upon program rebuild, >>>>> I would be inclined to not add any workarounds. >>>>> For reference tcmalloc and glibc malloc were seen to work fine with >>>>> this. >>>> >>>> Actually the jemalloc issue will only be fixed with the release of 5.2 >>>> (a few weeks away). >>>> I've updated the commit message in the attached accordingly. >>> >>> Hmm, I'm a bit nervous about making a change that will cause crashes >>> unless using an unreleased version (I know it will be released by the >>> time GCC 9.1 is released, but some people might upgrade GCC without >>> upgrading jemalloc). >> Yes it's not ideal. It does make it a lot less risky that one has to >> rebuild programs to get the new functionality, so existing programs >> will be unaffected. Also -fsized-deallocation is only enabled by >> default on gcc with -std >= c++14. > > The default is -std=gnu++14 so it's enabled unless you explicitly > choose an older dialect or add -fno-sized-deallocation. Good point :) > >>> On the other hand, zero sized allocations should be rare in practice. >> Yes they were rare in testing here >> >> So programs _rebuilt_ against the following would need to update >> to jemalloc 5.2: >> >> zero sized allocs, jemalloc<5.2, c++>=14, GCC>=9.1 >> >> Hopefully that's a small enough set. > > Which versions of jemalloc replace operator delete(void*, size_t) ? > > Was that something new in 5.0 or did older versions already provide a > replacement for the sized operator delete? > > If it was introduced in 5.0 then there won't be a problem for 3.x and > 4.x because they'll use the default definition from libstdc++ which > just calls ::operator delete(p). Again very good point. The replacements were only added in 5.0: https://github.com/jemalloc/jemalloc/commit/2319152d > > How complicated is the fix to prevent the crashes? Would it be > feasible for distros to backport that fix? I see that RHEL8 has > jemalloc 5.0.1 for example, but if the fix could be backported to that > release then it's less of a problem. The patch set is simple enough: https://github.com/jemalloc/jemalloc/pull/1341/commits cheers, Pádraig.
Re: [PATCH] improve performance of std::allocator::deallocate
On 03/06/2019 12:50 AM, Jonathan Wakely wrote: > On 06/03/19 02:43 +0000, Pádraig Brady wrote: >> >> >> On 02/26/2019 04:23 PM, Padraig Brady wrote: >>> >>>> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. >>>> >>>> How serious is the bug? What are the symptoms? >>>> >>> I've updated the commit summary to say it's a crash. >>> Arguably that's better than mem corruption. >>> >>>> It looks like 5.1.0 is less than a year old, so older versions are >>>> presumably still in wide use. >>>> >>>> We could potentially workaround it by making >>>> new_allocator::allocate(0) call ::operator new(1) when >>>> __cpp_sized_deallocation is defined, and deallocate(p, 0) call >>>> ::operator delete(p, 1). Obviously I'd prefer not to do that, because >>>> the default operator new already has an equivalent check, and only >>>> programs linking to jemalloc require the workaround. >>>> >>> Right the jemalloc fix was released May 2018. >>> It would be great to avoid the extra workarounds. >>> Given this would be released about a year after the jemalloc fix was >>> released, >>> and that this would only manifest upon program rebuild, >>> I would be inclined to not add any workarounds. >>> For reference tcmalloc and glibc malloc were seen to work fine with >>> this. >> >> Actually the jemalloc issue will only be fixed with the release of 5.2 >> (a few weeks away). >> I've updated the commit message in the attached accordingly. > > Hmm, I'm a bit nervous about making a change that will cause crashes > unless using an unreleased version (I know it will be released by the > time GCC 9.1 is released, but some people might upgrade GCC without > upgrading jemalloc). Yes it's not ideal. It does make it a lot less risky that one has to rebuild programs to get the new functionality, so existing programs will be unaffected. Also -fsized-deallocation is only enabled by default on gcc with -std >= c++14. > > On the other hand, zero sized allocations should be rare in practice. Yes they were rare in testing here So programs _rebuilt_ against the following would need to update to jemalloc 5.2: zero sized allocs, jemalloc<5.2, c++>=14, GCC>=9.1 Hopefully that's a small enough set. thanks, Pádraig
Re: [PATCH] improve performance of std::allocator::deallocate
On 02/26/2019 04:23 PM, Padraig Brady wrote: > >> Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. >> >> How serious is the bug? What are the symptoms? >> > I've updated the commit summary to say it's a crash. > Arguably that's better than mem corruption. > >> It looks like 5.1.0 is less than a year old, so older versions are >> presumably still in wide use. >> >> We could potentially workaround it by making >> new_allocator::allocate(0) call ::operator new(1) when >> __cpp_sized_deallocation is defined, and deallocate(p, 0) call >> ::operator delete(p, 1). Obviously I'd prefer not to do that, because >> the default operator new already has an equivalent check, and only >> programs linking to jemalloc require the workaround. >> > Right the jemalloc fix was released May 2018. > It would be great to avoid the extra workarounds. > Given this would be released about a year after the jemalloc fix was > released, > and that this would only manifest upon program rebuild, > I would be inclined to not add any workarounds. > For reference tcmalloc and glibc malloc were seen to work fine with this. Actually the jemalloc issue will only be fixed with the release of 5.2 (a few weeks away). I've updated the commit message in the attached accordingly. thanks, Pádraig From fec53f1500f5db6a2fe1b81c36a6695fd334758e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= Date: Fri, 22 Feb 2019 17:21:57 -0800 Subject: [PATCH] std::allocator::deallocate support sized-deallocation Pass the size to the allocator so that it may optimize deallocation. This was seen to significantly reduce the work required in jemalloc, with about 40% reduction in CPU cycles in the free path. Note jemalloc >= 5.2 is required to fix a crash with 0 sizes. * libstdc++-v3/include/ext/new_allocator.h (deallocate): Pass the size to the deallocator with -fsized-deallocation. --- libstdc++-v3/include/ext/new_allocator.h | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h index e245391..f1ff7da 100644 --- a/libstdc++-v3/include/ext/new_allocator.h +++ b/libstdc++-v3/include/ext/new_allocator.h @@ -116,16 +116,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // __p is not permitted to be a null pointer. void - deallocate(pointer __p, size_type) + deallocate(pointer __p, size_type __t) { #if __cpp_aligned_new if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__) { - ::operator delete(__p, std::align_val_t(alignof(_Tp))); + ::operator delete(__p, +# if __cpp_sized_deallocation + __t * sizeof(_Tp), +# endif + std::align_val_t(alignof(_Tp))); return; } #endif - ::operator delete(__p); + ::operator delete(__p +#if __cpp_sized_deallocation + , __t * sizeof(_Tp) +#endif + ); } size_type -- 2.5.5
Re: [PATCH] improve performance of std::allocator::deallocate
On 02/26/2019 05:50 AM, Jonathan Wakely wrote: > On 23/02/19 02:04 +0000, Pádraig Brady wrote: >> Attached is a simple patch which has been extensively tested within >> Facebook, >> and is enabled by default in our code base. >> >> Passing the size to the allocator allows it to optimize deallocation, >> and this was seen to significantly reduce the work required in jemalloc, >> with about 40% reduction in CPU cycles in the free path. > > Thanks, the patch looks good. > > I would prefer to only change the function body, as otherwise LTO > sometimes produces link-time warnings about ODR violations, because > the same function is defined on two different lines. The ODR detection > heuristics aren't smart enough to complain when the function > declarator is always defined on the same line, but with two different > function bodies. We've not noticed issues here with LTO, though that could be due to global enablement of -fsized-deallocation. Anyway your suggestion is neater. Updated patch attached. > Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. > > How serious is the bug? What are the symptoms? > I've updated the commit summary to say it's a crash. Arguably that's better than mem corruption. > It looks like 5.1.0 is less than a year old, so older versions are > presumably still in wide use. > > We could potentially workaround it by making > new_allocator::allocate(0) call ::operator new(1) when > __cpp_sized_deallocation is defined, and deallocate(p, 0) call > ::operator delete(p, 1). Obviously I'd prefer not to do that, because > the default operator new already has an equivalent check, and only > programs linking to jemalloc require the workaround. > Right the jemalloc fix was released May 2018. It would be great to avoid the extra workarounds. Given this would be released about a year after the jemalloc fix was released, and that this would only manifest upon program rebuild, I would be inclined to not add any workarounds. For reference tcmalloc and glibc malloc were seen to work fine with this. thanks! Pádraig. From 54b69ee9ac7a188fa938108b4eac46a66416d0ce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= Date: Fri, 22 Feb 2019 17:21:57 -0800 Subject: [PATCH] std::allocator::deallocate support sized-deallocation Pass the size to the allocator so that it may optimize deallocation. This was seen to significantly reduce the work required in jemalloc, with about 40% reduction in CPU cycles in the free path. Note jemalloc >= 5.1 is required to fix a crash with 0 sizes. * libstdc++-v3/include/ext/new_allocator.h (deallocate): Pass the size to the deallocator with -fsized-deallocation. --- libstdc++-v3/include/ext/new_allocator.h | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h index e245391..f1ff7da 100644 --- a/libstdc++-v3/include/ext/new_allocator.h +++ b/libstdc++-v3/include/ext/new_allocator.h @@ -116,16 +116,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // __p is not permitted to be a null pointer. void - deallocate(pointer __p, size_type) + deallocate(pointer __p, size_type __t) { #if __cpp_aligned_new if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__) { - ::operator delete(__p, std::align_val_t(alignof(_Tp))); + ::operator delete(__p, +# if __cpp_sized_deallocation + __t * sizeof(_Tp), +# endif + std::align_val_t(alignof(_Tp))); return; } #endif - ::operator delete(__p); + ::operator delete(__p +#if __cpp_sized_deallocation + , __t * sizeof(_Tp) +#endif + ); } size_type -- 2.5.5
[PATCH] improve performance of std::allocator::deallocate
Attached is a simple patch which has been extensively tested within Facebook, and is enabled by default in our code base. Passing the size to the allocator allows it to optimize deallocation, and this was seen to significantly reduce the work required in jemalloc, with about 40% reduction in CPU cycles in the free path. Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. cheers, Pádraig From eb337e751bdeb5db7d41dd584c179c2cf19485a8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= Date: Fri, 22 Feb 2019 17:21:57 -0800 Subject: [PATCH] std::allocator::deallocate support sized-deallocation Pass the size to the allocator so that it may optimize deallocation. This was seen to significantly reduce the work required in jemalloc, with about 40% reduction in CPU cycles in the free path. Note jemalloc >= 5.1 is required to fix a bug with 0 sizes. * libstdc++-v3/include/ext/new_allocator.h (deallocate): Pass the size to the deallocator with -fsized-deallocation. --- libstdc++-v3/include/ext/new_allocator.h | 17 + 1 file changed, 17 insertions(+) diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h index e245391..061d8ce 100644 --- a/libstdc++-v3/include/ext/new_allocator.h +++ b/libstdc++-v3/include/ext/new_allocator.h @@ -114,6 +114,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp))); } +#if __cpp_sized_deallocation + // __p is not permitted to be a null pointer. + void + deallocate(pointer __p, size_type __t) + { +#if __cpp_aligned_new + if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__) + { + ::operator delete(__p, __t * sizeof(_Tp), + std::align_val_t(alignof(_Tp))); + return; + } +#endif + ::operator delete(__p, __t * sizeof(_Tp)); + } +#else // __p is not permitted to be a null pointer. void deallocate(pointer __p, size_type) @@ -127,6 +143,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif ::operator delete(__p); } +#endif size_type max_size() const _GLIBCXX_USE_NOEXCEPT -- 2.5.5