Re: [PATCH 0/2] Introduce gcc_qsort
> I meant "anti-commutativity"; sorry about the mixup. symmetry/antisymmetry is more appropriate for binary relations (commutativity is for binary operations). -- Eric Botcazou
Re: [PATCH 0/2] Introduce gcc_qsort
On Fri, May 11, 2018 at 03:03:09PM +0300, Alexander Monakov wrote: > On Fri, 11 May 2018, Segher Boessenkool wrote: > > > In general such address-based comparisons steps are invalid; they lack > > > anti-reflexivity. So comparators are not actually allowed to do that. > > > > I don't know what you mean here? Every comparator is required to be > > *reflexive*! Subtracting two pointers to elements of the same array gives > > a perfectly fine total order as far as I see (it is the same as the > > difference between the array indices of those elements). > > I meant "anti-commutativity"; sorry about the mixup. Such strategy does not > give a valid total order because the order changes while in-progress sorting > reorders elements: Ah. Right. C11 7.22.5/4. Sorry for being dense, I'll drink more coffee. Segher
Re: [PATCH 0/2] Introduce gcc_qsort
On Fri, 11 May 2018, Segher Boessenkool wrote: > > In general such address-based comparisons steps are invalid; they lack > > anti-reflexivity. So comparators are not actually allowed to do that. > > I don't know what you mean here? Every comparator is required to be > *reflexive*! Subtracting two pointers to elements of the same array gives > a perfectly fine total order as far as I see (it is the same as the > difference between the array indices of those elements). I meant "anti-commutativity"; sorry about the mixup. Such strategy does not give a valid total order because the order changes while in-progress sorting reorders elements: suppose you have elements A and B such that A precedes B and compare A < B via address test. At some point, the sort routine may swap A with some unrelated element C, moving A past B. Now B < A. This is a contradiction. > In any case, every qsort call that is converted to one with this weaker > contract will need to be checked. Or we can wait for bug reports ;-) I think such comparators have a good chance of failing existing qsort_chk validation. Alexander
Re: [PATCH 0/2] Introduce gcc_qsort
On Fri, May 11, 2018 at 01:35:00PM +0300, Alexander Monakov wrote: > On Fri, 11 May 2018, Segher Boessenkool wrote: > > For example from reload1.c: > > No, this example is not relevant, because ... > > > static int > > reload_reg_class_lower (const void *r1p, const void *r2p) > > { > > int r1 = *(const short *) r1p, r2 = *(const short *) r2p; > > > > // ... > > > > return r1 - r2; > > this subtracts integers, not pointers. Oh whoops, I read that too fast. Sorry. > In general such address-based comparisons steps are invalid; they lack > anti-reflexivity. So comparators are not actually allowed to do that. I don't know what you mean here? Every comparator is required to be *reflexive*! Subtracting two pointers to elements of the same array gives a perfectly fine total order as far as I see (it is the same as the difference between the array indices of those elements). In any case, every qsort call that is converted to one with this weaker contract will need to be checked. Or we can wait for bug reports ;-) Segher
Re: [PATCH 0/2] Introduce gcc_qsort
On Fri, May 11, 2018 at 05:29:05AM -0500, Segher Boessenkool wrote: > Such a comparator will still work with temporary arrays if both elements > are always in the same temp array (in deterministic order, etc.) array; but > it won't work if the comparator did e.g. > > return (r1 - reload_order) - (r2 - reload_order); > > (it would be undefined behaviour, to start with). Yeah, UB, but likely would still "work". What wouldn't work is if a comparator assumes certain sizes of the array and e.g. stores the difference r1 - reload_order in char/short/int because all callers guarantee that nmemb * size fits into such types. Jakub
Re: [PATCH 0/2] Introduce gcc_qsort
On Fri, 11 May 2018, Segher Boessenkool wrote: > For example from reload1.c: No, this example is not relevant, because ... > static int > reload_reg_class_lower (const void *r1p, const void *r2p) > { > int r1 = *(const short *) r1p, r2 = *(const short *) r2p; > > // ... > > return r1 - r2; this subtracts integers, not pointers. In general such address-based comparisons steps are invalid; they lack anti-reflexivity. So comparators are not actually allowed to do that. Alexander
Re: [PATCH 0/2] Introduce gcc_qsort
On Thu, May 10, 2018 at 07:40:57PM +0200, Richard Biener wrote: > On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov > wrote: > >/* This implements a sort function suitable for GCC use cases: > > - signature-compatible to C qsort, but relaxed contract: > > - may apply the comparator to elements in a temporary buffer > > What consequences has this or rather how is this observable and makes > comparators behave differently? A comparison function is allowed to compare the addresses of the elements, or compute what should be the index into the array that is sorted from the pointers to the elements, etc. That will not work if the pointers to the elements do not actually point into the original array. For example from reload1.c: static int reload_reg_class_lower (const void *r1p, const void *r2p) { int r1 = *(const short *) r1p, r2 = *(const short *) r2p; // ... return r1 - r2; } (called as static short reload_order[MAX_RELOADS]; // ... qsort (reload_order, n_reloads, sizeof (short), reload_reg_class_lower); ). Such a comparator will still work with temporary arrays if both elements are always in the same temp array (in deterministic order, etc.) array; but it won't work if the comparator did e.g. return (r1 - reload_order) - (r2 - reload_order); (it would be undefined behaviour, to start with). Segher
Re: [PATCH 0/2] Introduce gcc_qsort
On Thu, 10 May 2018, Richard Biener wrote: > > - signature-compatible to C qsort, but relaxed contract: > > - may apply the comparator to elements in a temporary buffer > > What consequences has this or rather how is this observable and makes > comparators behave differently? The only serious consequence I'm aware of is this: the buffer must be sufficiently aligned to match the alignment requirement of elements being sorted Alexander
Re: [PATCH 0/2] Introduce gcc_qsort
On Thu, 10 May 2018, Jakub Jelinek wrote: > Have you gathered some statistics on the element sizes and how often they > appear in qsort calls (perhaps weighted by n*log(n) of the element count) > across bootstrap+regtest? No, but Adhemerval Zanella collected stats on bootstrap, and they are similar to my observations: https://sourceware.org/ml/libc-alpha/2018-01/msg00629.html > glibc uses indirect sorting (sorts pointers to the elements or indexes and > just reshuffles at the end) and has special case for the most commonly used > small element size (4/8). With C++ templates you could achieve that even > without macros, just by instantiating the mergesort and its helpers for the > few common cases. Or is that not worth it (as in, we never sort really > large (say > 32 bytes) element sizes and the 4 or 8 bytes long element sizes > aren't common enough to see benefit by using constant size memcpy for those > cases? I think it's not worth it as branches by element size are not too frequent, off the critical path, and easy for predictors. So doing it via templates would cause significant code growth for no speed gain. As for indirect sorting, we rarely sort elements of size other than 4/8, so I believe that's not worth it either. Alexander
Re: [PATCH 0/2] Introduce gcc_qsort
On May 10, 2018 5:56:39 PM GMT+02:00, Alexander Monakov wrote: >Hello. > >This introduces a replacement for qsort() in GCC. The main selling >point is >reproducibility (currently compiler output may change depending on how >libc >qsort reorders not-bitwise-identical elements that compare equal) with >a >small improvement speed-wise and small code growth (under 2K on >x86-64). > >The opening comment in sort.cc gives a brief implementation overview: > >/* This implements a sort function suitable for GCC use cases: > - signature-compatible to C qsort, but relaxed contract: > - may apply the comparator to elements in a temporary buffer What consequences has this or rather how is this observable and makes comparators behave differently? Otherwise thanks for doing this. Will review tomorrow. Richard. > - may abort on allocation failure > - deterministic (but not necessarily stable) > - fast, especially for common cases (0-5 elements of size 8 or 4) > > The implementation uses a network sort for up to 5 elements and > a merge sort on top of that. Neither stage has branches depending on >comparator result, trading extra arithmetic for branch mispredictions. >*/ > >I used a Sandy Bridge CPU to collect statistics on tramp3d -O2 >compilation. > >Overall the new implementation is roughly 30% faster compared to Glibc >qsort, >with 2x or more speedup for cases with tiny element count. I see one >instance >where the new approach is significantly (1.5x) slower: it is ipa-icf.c: >sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 >entries) >and needs 14 indirect loads just to reach values to compare, so when >branch >prediction manages to guess correctly, it allows to overlap execution >of two >comparators and better hide their cache miss latency. > >Overall GCC spends about 0.3% time under qsort, but this doesn't >automatically >mean that this brings a 0.1% speed improvement: it may be larger or >smaller >depending on how new code affects cache behavior and branch predictors >in >other code, and it's not trivial to measure precisely. > >I can go into more detail about measured stats if there's interest :) > >Makefile.in changes are separated to patch 2 in the hope it'd make >review >easier, but the two patches will need to be applied together. > >Bootstrapped/regtested on x86-64, OK for trunk? > >Alexander Monakov (2): > gcc_qsort: source code changes > gcc_qsort: build system changes > > gcc/Makefile.in | 9 ++- >gcc/sort.cc | 232 > > gcc/system.h| 7 +- > gcc/vec.c | 2 +- > 4 files changed, 243 insertions(+), 7 deletions(-) > create mode 100644 gcc/sort.cc
Re: [PATCH 0/2] Introduce gcc_qsort
On Thu, May 10, 2018 at 06:56:39PM +0300, Alexander Monakov wrote: > Overall the new implementation is roughly 30% faster compared to Glibc qsort, > with 2x or more speedup for cases with tiny element count. I see one instance > where the new approach is significantly (1.5x) slower: it is ipa-icf.c: > sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 entries) > and needs 14 indirect loads just to reach values to compare, so when branch > prediction manages to guess correctly, it allows to overlap execution of two > comparators and better hide their cache miss latency. > > Overall GCC spends about 0.3% time under qsort, but this doesn't automatically > mean that this brings a 0.1% speed improvement: it may be larger or smaller > depending on how new code affects cache behavior and branch predictors in > other code, and it's not trivial to measure precisely. > > I can go into more detail about measured stats if there's interest :) > > Makefile.in changes are separated to patch 2 in the hope it'd make review > easier, but the two patches will need to be applied together. > > Bootstrapped/regtested on x86-64, OK for trunk? Have you gathered some statistics on the element sizes and how often they appear in qsort calls (perhaps weighted by n*log(n) of the element count) across bootstrap+regtest? glibc uses indirect sorting (sorts pointers to the elements or indexes and just reshuffles at the end) and has special case for the most commonly used small element size (4/8). With C++ templates you could achieve that even without macros, just by instantiating the mergesort and its helpers for the few common cases. Or is that not worth it (as in, we never sort really large (say > 32 bytes) element sizes and the 4 or 8 bytes long element sizes aren't common enough to see benefit by using constant size memcpy for those cases? Jakub
[PATCH 0/2] Introduce gcc_qsort
Hello. This introduces a replacement for qsort() in GCC. The main selling point is reproducibility (currently compiler output may change depending on how libc qsort reorders not-bitwise-identical elements that compare equal) with a small improvement speed-wise and small code growth (under 2K on x86-64). The opening comment in sort.cc gives a brief implementation overview: /* This implements a sort function suitable for GCC use cases: - signature-compatible to C qsort, but relaxed contract: - may apply the comparator to elements in a temporary buffer - may abort on allocation failure - deterministic (but not necessarily stable) - fast, especially for common cases (0-5 elements of size 8 or 4) The implementation uses a network sort for up to 5 elements and a merge sort on top of that. Neither stage has branches depending on comparator result, trading extra arithmetic for branch mispredictions. */ I used a Sandy Bridge CPU to collect statistics on tramp3d -O2 compilation. Overall the new implementation is roughly 30% faster compared to Glibc qsort, with 2x or more speedup for cases with tiny element count. I see one instance where the new approach is significantly (1.5x) slower: it is ipa-icf.c: sort_congruence_class_groups_by_decl_uid. It sorts a big array (1500 entries) and needs 14 indirect loads just to reach values to compare, so when branch prediction manages to guess correctly, it allows to overlap execution of two comparators and better hide their cache miss latency. Overall GCC spends about 0.3% time under qsort, but this doesn't automatically mean that this brings a 0.1% speed improvement: it may be larger or smaller depending on how new code affects cache behavior and branch predictors in other code, and it's not trivial to measure precisely. I can go into more detail about measured stats if there's interest :) Makefile.in changes are separated to patch 2 in the hope it'd make review easier, but the two patches will need to be applied together. Bootstrapped/regtested on x86-64, OK for trunk? Alexander Monakov (2): gcc_qsort: source code changes gcc_qsort: build system changes gcc/Makefile.in | 9 ++- gcc/sort.cc | 232 gcc/system.h| 7 +- gcc/vec.c | 2 +- 4 files changed, 243 insertions(+), 7 deletions(-) create mode 100644 gcc/sort.cc -- 2.13.3