Re: [PATCH 1/5] lib/sort: Make swap functions more generic
Hi George,

On Fri, Mar 15, 2019 at 4:36 AM George Spelvin wrote:
> >> swap_bytes / swap_4byte_words / swap_8byte_words
> >> swap_bytes / swap_ints / swap_longs
> >> swap_1 / swap_4 / swap_8
> >> Pistols at dawn?
>
> On Thu, 14 Mar 2019 at 22:59:55 +0300, Andrey Abramov wrote:
> > Yes, in my opinion, swap_bytes / swap_ints / swap_longs are the
> > most readable because we have both swap_ints and swap_longs functions
> > (in one file near each other), so I don't think that there will be
> > any confusion about size.
>
> Yes, that's what I thought. They're three related but different
> functions, suffixed _bytes, _ints, and _longs. What could the
> difference possibly be? And if anyone has any lingering doubts,
> the functions are right there, with exquisitely clear comments.
>
> Not to mention where they're used. Is "is_aligned(base, size, 8)"
> remotely obscure? Especially in context:
>
> 	if (is_aligned(base, size, 8))
> 		swap_func = swap_longs;
> 	else if (is_aligned(base, size, 4))
> 		swap_func = swap_ints;
> 	else
> 		swap_func = swap_bytes;
>
> What subtle and mysterious code.
>
> > But actually, it doesn't matter which name you take, because
> > the meaning of each, in my opinion, is obvious enough, so I don't
> > mind about any of these options.
>
> I'm just amazed that this piece of bikeshedding is the most
> contentious thing about the patch series.
>
> I mean, if I'd named them:
> 	llanfairpwllgwyngyll()
> 	shravanabelagola()
> 	zheleznodorozhny()
> or
> 	peckish()
> 	esuriant()
> 	hungry()
> then yes, those would be bad names.
>
> I prefer the shorter _ints and _longs names, but this is just
> not a hill I want to die on.

Argument of least surprise: don't call something a duck if it's not
guaranteed to behave like a duck. If I read "long", this triggers a
warning flag in my head: be careful, this is 32-bit on 32-bit platforms,
and 64-bit on 64-bit platforms.
There's a reason the newer ioread{8,16,32}() I/O accessors use explicit
sizes, unlike the ancient x86-centric read[bwl]().

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
				-- Linus Torvalds
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
>> swap_bytes / swap_4byte_words / swap_8byte_words
>> swap_bytes / swap_ints / swap_longs
>> swap_1 / swap_4 / swap_8
>> Pistols at dawn?

On Thu, 14 Mar 2019 at 22:59:55 +0300, Andrey Abramov wrote:
> Yes, in my opinion, swap_bytes / swap_ints / swap_longs are the
> most readable because we have both swap_ints and swap_longs functions
> (in one file near each other), so I don't think that there will be
> any confusion about size.

Yes, that's what I thought. They're three related but different
functions, suffixed _bytes, _ints, and _longs. What could the
difference possibly be? And if anyone has any lingering doubts,
the functions are right there, with exquisitely clear comments.

Not to mention where they're used. Is "is_aligned(base, size, 8)"
remotely obscure? Especially in context:

	if (is_aligned(base, size, 8))
		swap_func = swap_longs;
	else if (is_aligned(base, size, 4))
		swap_func = swap_ints;
	else
		swap_func = swap_bytes;

What subtle and mysterious code.

> But actually, it doesn't matter which name you take, because
> the meaning of each, in my opinion, is obvious enough, so I don't
> mind about any of these options.

I'm just amazed that this piece of bikeshedding is the most
contentious thing about the patch series.

I mean, if I'd named them:
	llanfairpwllgwyngyll()
	shravanabelagola()
	zheleznodorozhny()
or
	peckish()
	esuriant()
	hungry()
then yes, those would be bad names.

I prefer the shorter _ints and _longs names, but this is just
not a hill I want to die on.
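[The dispatch quoted above is easy to demonstrate outside the kernel. The following is a standalone userspace sketch, not the lib/sort.c code: the helper bodies, the swap_func_t typedef, and the strict (always check the base address) variant of is_aligned() are filled in here purely for illustration.]

```c
#include <stddef.h>
#include <stdint.h>

/* Strict variant: always fold the base address in, as if
 * CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS were unset. */
static int is_aligned(const void *base, size_t size, unsigned char align)
{
	unsigned char lsbits = (unsigned char)size;

	lsbits |= (unsigned char)(uintptr_t)base;
	return (lsbits & (align - 1)) == 0;
}

/* Swap one byte at a time; works for any size and alignment. */
static void swap_bytes(void *a, void *b, size_t n)
{
	do {
		char t = ((char *)a)[--n];
		((char *)a)[n] = ((char *)b)[n];
		((char *)b)[n] = t;
	} while (n);
}

/* Swap in 32-bit chunks; size must be a nonzero multiple of 4. */
static void swap_ints(void *a, void *b, size_t n)
{
	do {
		uint32_t t = *(uint32_t *)((char *)a + (n -= 4));
		*(uint32_t *)((char *)a + n) = *(uint32_t *)((char *)b + n);
		*(uint32_t *)((char *)b + n) = t;
	} while (n);
}

/* Swap in 64-bit chunks; size must be a nonzero multiple of 8. */
static void swap_longs(void *a, void *b, size_t n)
{
	do {
		uint64_t t = *(uint64_t *)((char *)a + (n -= 8));
		*(uint64_t *)((char *)a + n) = *(uint64_t *)((char *)b + n);
		*(uint64_t *)((char *)b + n) = t;
	} while (n);
}

typedef void (*swap_func_t)(void *a, void *b, size_t n);

/* The dispatch from the thread, verbatim in spirit. */
static swap_func_t choose_swap(const void *base, size_t size)
{
	if (is_aligned(base, size, 8))
		return swap_longs;
	else if (is_aligned(base, size, 4))
		return swap_ints;
	else
		return swap_bytes;
}
```

Whichever helper the dispatch selects, the observable effect is the same; only the access width differs.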
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
> Pistols at dawn?
>	swap_bytes
>	swap_4byte_words
>	swap_8byte_words
> or
>	swap_bytes / swap_ints / swap_longs
>	swap_1 / swap_4 / swap_8

Yes, in my opinion, swap_bytes / swap_ints / swap_longs are the most
readable because we have both swap_ints and swap_longs functions (in
one file near each other), so I don't think that there will be any
confusion about size.

But actually, it doesn't matter which name you take, because the
meaning of each, in my opinion, is obvious enough, so I don't mind
about any of these options.

--
With Best Regards,
Andrey Abramov
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
On Thu, Mar 14, 2019 at 11:53:55AM +, George Spelvin wrote:
> On Thu, 14 Mar 2019 at 11:41:26 +0100, Geert Uytterhoeven wrote:
> > On Thu, Mar 14, 2019 at 11:10 AM George Spelvin wrote:
> >> On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
> >>>>> How about one of:
> >>>>> 	swap_bytes / swap_ints / swap_longs
> >>>>> 	swap_1 / swap_4 / swap_8
> >>>
> >>> longs are ambiguous, so I would prefer bit-sized types.
> >>
> >> I already implemented Andrey's suggestions, which were the exact
> >> opposite of yours.
> >>
> >> Pistols at dawn?
>
> I just want to pick some names and move on. Since nothing so far seems
> to satisfy everyone, I'll go with the ponderous but utterly unambiguous:
> 	swap_bytes
> 	swap_4byte_words
> 	swap_8byte_words

Works for me, or if someone insists, it might also be bits there.

--
With Best Regards,
Andy Shevchenko
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
On Thu, 14 Mar 2019 at 11:41:26 +0100, Geert Uytterhoeven wrote:
> On Thu, Mar 14, 2019 at 11:10 AM George Spelvin wrote:
>> On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
>>>> How about one of:
>>>> 	swap_bytes / swap_ints / swap_longs
>>>> 	swap_1 / swap_4 / swap_8
>>>
>>> longs are ambiguous, so I would prefer bit-sized types.
>>
>> I already implemented Andrey's suggestions, which were the exact
>> opposite of yours.
>>
>> Pistols at dawn?

I didn't explain the joke because jokes aren't funny if explained, but
just in case: by suggesting a clearly ridiculous method, I was saying
"I have no idea how to resolve this conflict."

> Prepared to fix all future long vs. int bugs?

In the entire kernel? Or just the one small source file where these
statically scoped helper functions are visible? In case of uncertainty,
the comments and the code are right there, just a few lines away from
the one and only call site, so I don't expect much confusion.

I care a lot about function names when they are exported, but in this
case we're talking about what colour to paint the *inside* of the bike
shed.

I just want to pick some names and move on. Since nothing so far seems
to satisfy everyone, I'll go with the ponderous but utterly unambiguous:
	swap_bytes
	swap_4byte_words
	swap_8byte_words
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
Hi George,

On Thu, Mar 14, 2019 at 11:10 AM George Spelvin wrote:
> On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
> >> How about one of:
> >> 	swap_bytes / swap_ints / swap_longs
> >> 	swap_1 / swap_4 / swap_8
> >
> > longs are ambiguous, so I would prefer bit-sized types.
>
> I already implemented Andrey's suggestions, which were the exact
> opposite of yours.
>
> Pistols at dawn?

Prepared to fix all future long vs. int bugs?

Gr{oetje,eeting}s,

						Geert
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
>> Although I'm thinking of:
>>
>> static bool __attribute_const__
>> is_aligned(const void *base, size_t size, unsigned char align)
>> {
>> 	unsigned char lsbits = (unsigned char)size;
>>
>> 	(void)base;
>> #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> 	lsbits |= (unsigned char)(uintptr_t)base;
>> #endif
>> 	return (lsbits & (align - 1)) == 0;
>> }
>>
>> Any preference?
>
> I think it would be better.
>
>> I find "u32s" confusing; I keep reading the "s" as "signed" rather
>> than a plural.
>>
>> How about one of:
>> 	swap_bytes / swap_ints / swap_longs
>> 	swap_1 / swap_4 / swap_8
>
> In my opinion "swap_bytes / swap_ints / swap_longs" are the most readable.

On Thu, 14 Mar 2019 at 11:29:58 +0200, Andy Shevchenko wrote:
> On Sat, Mar 09, 2019 at 03:53:41PM +, l...@sdf.org wrote:
>> static bool __attribute_const__
>> is_aligned(const void *base, size_t size, unsigned char align)
>> {
>> 	unsigned char lsbits = (unsigned char)size;
>> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> 	(void)base;
>> #else
>> 	lsbits |= (unsigned char)(uintptr_t)base;
>> #endif
>> 	return (lsbits & (align - 1)) == 0;
>> }
>>
>> Any preference?
>
> This one looks better in a sense we don't suppress the warnings when
> it's not needed.
>
>>> For such primitives that operate on top of arrays we usually
>>> append 's' to the name. Currently the name is misleading.
>>>
>>> Perhaps u32s_swap().
>>
>> I don't worry much about the naming of static helper functions.
>> If they were exported, it would be a whole lot more important!
>>
>> I find "u32s" confusing; I keep reading the "s" as "signed" rather
>> than a plural.
>
> For signedness we use prefixes; for plural, suffixes. I don't see the
> point of confusion. And this is in use in the kernel a lot.
>
>> How about one of:
>> 	swap_bytes / swap_ints / swap_longs
>> 	swap_1 / swap_4 / swap_8
>
> longs are ambiguous, so I would prefer bit-sized types.

I already implemented Andrey's suggestions, which were the exact
opposite of yours.

Pistols at dawn?

>>>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>>>
>>> Why #ifdef is better than if (IS_ENABLED()) ?
>>
>> Because CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is bool and not
>> tristate. IS_ENABLED tests for 'y' or 'm', but we don't need it
>> for something that's only on or off.
>
> There is IS_BUILTIN(), though it's a common practice to use IS_ENABLED()
> even for boolean options (I think because of the naming of the macro).

Well, as I said earlier, #ifdef is the most common form in the kernel.
It's also the shortest to write, and I like the fact that it's slightly
simpler. (Admittedly, "IS_ENABLED" does not take a lot of brain power
to interpret, but it *is* one more macro that might be hiding magic.)
So I'm not inclined to change it without a substantial reason.
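[For readers curious about the "magic" being alluded to: IS_ENABLED() works even when the config symbol is entirely undefined, via an argument-placeholder preprocessor trick. Below is a simplified userspace re-derivation of that trick; the kernel's real version in include/linux/kconfig.h is structured slightly differently, and the macro names here are chosen for the sketch.]

```c
/* Simplified sketch of the kconfig IS_ENABLED() trick.  When the
 * config symbol is defined as 1, token-pasting it onto the placeholder
 * prefix produces "0,», which shifts an extra argument in so that
 * __take_second_arg() selects 1.  When the symbol is undefined, the
 * paste yields a single junk token and __take_second_arg() selects 0. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define __is_defined(x)			___is_defined(x)
#define ___is_defined(val)		____is_defined(__ARG_PLACEHOLDER_##val)
#define ____is_defined(arg1_or_junk)	__take_second_arg(arg1_or_junk 1, 0, 0)

#define CONFIG_FOO 1
/* CONFIG_BAR deliberately left undefined */

enum {
	foo_enabled = __is_defined(CONFIG_FOO),
	bar_enabled = __is_defined(CONFIG_BAR),
};
```

Since the whole evaluation happens in the preprocessor, the result is a compile-time constant usable in ordinary `if` conditions, which is exactly what makes `if (IS_ENABLED(...))` an alternative to `#ifdef`.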
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
On Sat, Mar 09, 2019 at 03:53:41PM +, l...@sdf.org wrote:
> Andy Shevchenko wrote:
> > On Thu, Feb 21, 2019 at 06:30:28AM +, George Spelvin wrote:
> >> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> >
> > Why #ifdef is better than if (IS_ENABLED()) ?
>
> Because CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is bool and not
> tristate. IS_ENABLED tests for 'y' or 'm', but we don't need it
> for something that's only on or off.

There is IS_BUILTIN(), though it's a common practice to use IS_ENABLED()
even for boolean options (I think because of the naming of the macro).

> Looking through the kernel, I see both, but #ifdef or #if defined()
> are definitely in the majority:
>
> lib/lzo/lzo1x_compress.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ64)
> lib/siphash.c:#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> lib/string.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> lib/strncpy_from_user.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> lib/zlib_inflate/inffast.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>
> I see a few IS_ENABLED uses in include/crypto/ and kernel/bpf/.
>
> It makes no real difference; #ifdef is simpler to me.

> static bool __attribute_const__
> is_aligned(const void *base, size_t size, unsigned char align)
> {
> 	unsigned char lsbits = (unsigned char)size;
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> 	(void)base;
> #else
> 	lsbits |= (unsigned char)(uintptr_t)base;
> #endif
> 	return (lsbits & (align - 1)) == 0;
> }
>
> Any preference?

This one looks better in a sense we don't suppress the warnings when
it's not needed.

> > For such primitives that operate on top of arrays we usually append
> > 's' to the name. Currently the name is misleading.
> >
> > Perhaps u32s_swap().
>
> I don't worry much about the naming of static helper functions.
> If they were exported, it would be a whole lot more important!
>
> I find "u32s" confusing; I keep reading the "s" as "signed" rather
> than a plural.

For signedness we use prefixes; for plural, suffixes. I don't see the
point of confusion. And this is in use in the kernel a lot.

> How about one of:
> 	swap_bytes / swap_ints / swap_longs
> 	swap_1 / swap_4 / swap_8

longs are ambiguous, so I would prefer bit-sized types.

> > Shouldn't simple memcpy cover these case for both 32- and 64-bit
> > architectures?
>
> This isn't a memcpy, it's a memory *swap*. To do it with memcpy
> requires:
> 	memcpy(temp_buffer, a, size);
> 	memcpy(a, b, size);
> 	memcpy(b, temp_buffer, size);
>
> This is 1.5x as much memory access, and you have to find a large
> enough temp_buffer. (I didn't think a variable-length array on
> the stack would make people happy.)
>
> Also, although it is a predictable branch, memcpy() has to check the
> alignment of its inputs each call. The reason for these helpers is
> to factor that out.

Makes sense.

--
With Best Regards,
Andy Shevchenko
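[The three-memcpy point quoted above is easy to see in a standalone sketch. The function names and the 128-byte bounce-buffer limit here are illustrative choices, not the kernel code.]

```c
#include <stddef.h>
#include <string.h>

enum { TMP_MAX = 128 };

/* Swap via copying: three passes over the data plus a bounce buffer,
 * i.e. 1.5x the memory traffic of a true swap. */
static void swap_via_memcpy(void *a, void *b, size_t size)
{
	unsigned char tmp[TMP_MAX];	/* caller must guarantee size <= TMP_MAX */

	memcpy(tmp, a, size);
	memcpy(a, b, size);
	memcpy(b, tmp, size);
}

/* In-place swap: each byte of a and b is read once and written once,
 * and no scratch storage is needed. */
static void swap_in_place(void *a, void *b, size_t size)
{
	unsigned char *p = a, *q = b;

	while (size--) {
		unsigned char t = p[size];
		p[size] = q[size];
		q[size] = t;
	}
}
```

Both produce the same result; the difference is the extra pass and the scratch buffer, which is the thread's argument for dedicated swap helpers.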
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
Thank you for your thoughtful comments!

On Wed, 13 Mar 2019 at 23:23:44 +0100, Rasmus Villemoes wrote:
> On 21/02/2019 07.30, George Spelvin wrote:
>> + * @align: required aignment (typically 4 or 8)
>
> typo aLignment

Thanks; fixed!

>> + * Returns true if elements can be copied using word loads and stores.
>> + * The size must be a multiple of the alignment, and the base address must
>> + * be if we do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
>
> I wonder if we shouldn't unconditionally require the base to be aligned.
> It of course depends on what exactly 'efficient' means, but if the base
> is not aligned, we know that every single load and store will be
> unaligned, and we're doing lots of them. IIRC even on x86 it's usually
> two cycles instead of one, and even more when we cross a cache line
> boundary (which we do rather often - e.g. in your 40 byte example,
> almost every swap will hit at least one). One could also have some data
> that is naturally 4-byte aligned with an element size that happens to be
> a multiple of 8 bytes; it would be a bit sad to use 8-byte accesses when
> 4-byte would all have been aligned.

Well, a 2-cycle unaligned access is still lots faster than 4 byte
accesses. I think that's a decent interpretation of
"EFFICIENT_UNALIGNED_ACCESS": faster than doing it a byte at a time.
So the change is a net win; we're just wondering if it could be
optimized more.

The 4/8 issue is similar, but not as extreme. Is one unaligned 64-bit
access faster than two aligned 32-bit accesses? As you note, it's
usually only twice the cost, so it's no slower, and there's less loop
overhead. So I don't think it's a bad thing here, either.

Re your comments on cache lines, ISTR that x86 can do an unaligned load
with minimal penalty as long as it doesn't cross a cache line:
https://software.intel.com/en-us/articles/reducing-the-impact-of-misaligned-memory-accesses/
So you win on most of the accesses, hopefully enough to pay for the one
unaligned access.

Apparently in processors more recent than the P4 example Intel used
above, it's even less:
https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/

On 32-bit machines, it's actually a 4-byte swap, unrolled twice; there
are no 64-bit memory accesses. So the concern is only about 8-byte
alignment on 64-bit machines. The great majority of call sites sort
structures with pointer or long members, so are aligned and the
question is moot.

I don't think it's worth overthinking the issue on behalf of the
performance of some rare corner cases. I have considered doing away
with the two word-copy loops and just having one machine-word-sized
loop plus a byte fallback.

>> +	unsigned int lsbits = (unsigned int)size;
>
> Drop that cast.

Actually, in response to an earlier comment, I changed it (and the type
of lsbits) to (unsigned char), to emphasize that I *really* only care
about a handful of low bits. I know gcc doesn't warn about implicit
narrowing casts, but I prefer to make them explicit for documentation
reasons. "Yes, I really mean to throw away the high bits." Compilers
on machines without native byte operations are very good at using
larger registers and optimizing away mask operations. (Remember that
this is inlined, and "align" is a constant 4 or 8.)

>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +	(void)base;
>> +#else
>> +	lsbits |= (unsigned int)(size_t)base;
>
> The kernel usually casts pointers to long or unsigned long. If some
> oddball arch has size_t something other than unsigned long, this would
> give a "warning: cast from pointer to integer of different size". So
> just cast to unsigned long, and drop the cast to unsigned int.

I changed this to uintptr_t and called it good.

>> static void u32_swap(void *a, void *b, int size)
>> {
>> -	u32 t = *(u32 *)a;
>> -	*(u32 *)a = *(u32 *)b;
>> -	*(u32 *)b = t;
>> +	size_t n = size;
>> +
>
> Since the callback has int in its prototype (we should fix that globally
> at some point...), might as well use a four-byte local variable, to
> avoid rex prefix on x86-64. So just make that "unsigned" or "u32".

Agreed about the eventual global fix. If you care, the only places a
non-NULL swap function is passed in are:

- arch/arc/kernel/unwind.c
- arch/powerpc/kernel/module_32.c
- arch/powerpc/kernel/module_64.c
! arch/x86/kernel/unwind_orc.c (2x)
- fs/ocfs2/dir.c
- fs/ocfs2/refcounttree.c (3x)
- fs/ocfs2/xattr.c (3x)
- fs/ubifs/find.c
! kernel/jump_label.c
! lib/extable.c

The ones marked with "-" are simple memory swaps that could (should!)
be replaced with NULL. The ones marked with "!" actually do something
non-trivial.

Actually, I deliberately used a pointer-sized index, since the loop
uses (a + n) addressing. Hardware indexed addressing on 64-bit machines
generally (I vaguely recall Aarch64 is an exception) requires a 64-bit
index; using a smaller index requires some masking or sign-extending.
I thought it was safer to bet that the c
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
On Wed, Mar 13, 2019 at 10:23 PM Rasmus Villemoes wrote:
> On 21/02/2019 07.30, George Spelvin wrote:
> > +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > +	(void)base;
> > +#else
> > +	lsbits |= (unsigned int)(size_t)base;
>
> The kernel usually casts pointers to long or unsigned long. If some
> oddball arch has size_t something other than unsigned long, this would
> give a "warning: cast from pointer to integer of different size". So
> just cast to unsigned long, and drop the cast to unsigned int.

The proper integer equivalent of a pointer is uintptr_t, so please use
that instead of size_t.

Gr{oetje,eeting}s,

						Geert
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
On 21/02/2019 07.30, George Spelvin wrote:
> Rather than u32_swap and u64_swap working on 4- and 8-byte objects
> directly, let them handle any multiple of 4 or 8 bytes. This speeds
> up most users of sort() by avoiding fallback to the byte copy loop.
>
> Despite what commit ca96ab859ab4 ("lib/sort: Add 64 bit swap function")
> claims, very few users of sort() sort pointers (or pointer-sized
> objects); most sort structures containing at least two words.
> (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte
> struct acpi_fan_fps.)
>
> x86-64 code size 872 -> 885 bytes (+8)
>
> Signed-off-by: George Spelvin
> ---
>  lib/sort.c | 117 +++--
>  1 file changed, 96 insertions(+), 21 deletions(-)
>
> diff --git a/lib/sort.c b/lib/sort.c
> index d6b7a202b0b6..dff2ab2e196e 100644
> --- a/lib/sort.c
> +++ b/lib/sort.c
> @@ -11,35 +11,110 @@
>  #include
>  #include
>
> -static int alignment_ok(const void *base, int align)
> +/**
> + * alignment_ok - is this pointer & size okay for word-wide copying?
> + * @base: pointer to data
> + * @size: size of each element
> + * @align: required aignment (typically 4 or 8)
> + *

typo aLignment

> + * Returns true if elements can be copied using word loads and stores.
> + * The size must be a multiple of the alignment, and the base address must
> + * be if we do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
> + *

I wonder if we shouldn't unconditionally require the base to be aligned.
It of course depends on what exactly 'efficient' means, but if the base
is not aligned, we know that every single load and store will be
unaligned, and we're doing lots of them. IIRC even on x86 it's usually
two cycles instead of one, and even more when we cross a cache line
boundary (which we do rather often - e.g. in your 40 byte example,
almost every swap will hit at least one). One could also have some data
that is naturally 4-byte aligned with an element size that happens to be
a multiple of 8 bytes; it would be a bit sad to use 8-byte accesses when
4-byte would all have been aligned.

> + * For some reason, gcc doesn't know to optimize "if (a & mask || b & mask)"
> + * to "if ((a | b) & mask)", so we do that by hand.
> + */
> +static bool __attribute_const__
> +alignment_ok(const void *base, size_t size, unsigned int align)
>  {
> -	return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
> -		((unsigned long)base & (align - 1)) == 0;
> +	unsigned int lsbits = (unsigned int)size;

Drop that cast.

> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> +	(void)base;
> +#else
> +	lsbits |= (unsigned int)(size_t)base;

The kernel usually casts pointers to long or unsigned long. If some
oddball arch has size_t something other than unsigned long, this would
give a "warning: cast from pointer to integer of different size". So
just cast to unsigned long, and drop the cast to unsigned int.

If you do drop CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, this all becomes
much simpler, of course:

	lsbits = size | (unsigned long)base;

> +#endif
> +	lsbits &= align - 1;
> +	return lsbits == 0;
>  }
>
> +/**
> + * u32_swap - swap two elements in 32-bit chunks
> + * @a, @b: pointers to the elements
> + * @size: element size (must be a multiple of 4)
> + *
> + * Exchange the two objects in memory. This exploits base+index addressing,
> + * which basically all CPUs have, to minimize loop overhead computations.
> + *
> + * For some reason, on x86 gcc 7.3.0 adds a redundant test of n at the
> + * bottom of the loop, even though the zero flag is stil valid from the
> + * subtract (since the intervening mov instructions don't alter the flags).
> + * Gcc 8.1.0 doesn't have that problem.
> + */
>  static void u32_swap(void *a, void *b, int size)
>  {
> -	u32 t = *(u32 *)a;
> -	*(u32 *)a = *(u32 *)b;
> -	*(u32 *)b = t;
> +	size_t n = size;
> +

Since the callback has int in its prototype (we should fix that globally
at some point...), might as well use a four-byte local variable, to
avoid rex prefix on x86-64. So just make that "unsigned" or "u32".

> +	do {
> +		u32 t = *(u32 *)(a + (n -= 4));
> +		*(u32 *)(a + n) = *(u32 *)(b + n);
> +		*(u32 *)(b + n) = t;
> +	} while (n);
>  }

Hm, are we absolutely sure there's no weird .config where some struct
ends up being empty, yet we still sort an array of them? It's far
fetched, of course, but perhaps an "if (unlikely(!size)) return;" at
the start of sort() would be in order. Dunno, I'll leave it to you.

Rasmus
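[The empty-struct worry above is concrete: with the do/while form, a zero size would wrap n around before the loop condition is ever tested. A hedged userspace sketch of the suggested guard follows; the function name is made up and this is an approximation of the patch's loop, not the patch itself.]

```c
#include <stddef.h>
#include <stdint.h>

/* Approximation of the patch's 32-bit swap loop plus the suggested
 * guard.  Without the early return, n -= 4 on n == 0 would wrap to
 * SIZE_MAX - 3 and the loop would stomp far outside both objects. */
static void u32s_swap_guarded(void *a, void *b, size_t size)
{
	size_t n = size;

	if (!n)		/* the "if (unlikely(!size)) return;" idea */
		return;
	do {
		uint32_t t = *(uint32_t *)((char *)a + (n -= 4));
		*(uint32_t *)((char *)a + n) = *(uint32_t *)((char *)b + n);
		*(uint32_t *)((char *)b + n) = t;
	} while (n);
}
```

Placing the guard once at the top of sort(), as suggested, keeps the hot swap loops branch-free while still covering the degenerate case.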
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
Andy Shevchenko wrote:
> Shouldn't simple memcpy cover these case for both 32- and 64-bit
> architectures?

Speaking of replacing swap with copying via temporary buffers, one idea
that did come to mind was avoiding swap for sufficiently small objects.

Every sift-down is actually a circular shift. Once the target position
has been found, we rotate the path from the root to the target up one,
with the target filled in by the previous root. (When breaking down the
heap, there's one additional link in the cycle, where the previous root
goes to the end of the heap and the end of the heap goes to the target.)

If we had a temporary buffer (128 bytes would handle most things), we
could copy the previous root there and *copy*, rather than swap, the
elements on the path to the target up, then finally copy the previous
root to the target.

However, to rotate up, this must be done in top-down order. The way the
code currently works with premultiplied offsets, it's easy to traverse
bottom-up, but very awkward to retrace the top-down path. The only
solution I can think of is to build a bitmap of the left/right turnings
from the root to the leaf, and then back it up to the target. There are
a few ways to do this:

1) The obvious big-endian order. The bitmap is simply the 1-based
   position of the leaf. To add a level, shift left and add the new bit
   at the bottom. To back up a step, shift right. To retrace, create a
   1-bit mask equal to the msbit of the index
   ((smear_right(x) >> 1) + 1) and repeatedly shift the mask right.

2) The reverse order. We use a 1-bit mask while building the bitmap,
   and while retracing, just examine the lsbit while shifting the
   bitmap right.

3) As option 1, but don't build the bitmap as we're walking down;
   rather reconstruct it from the premultiplied offset using
   reciprocal_divide().

Nothing really jumps out to me as The Right Way to do it. I don't want
to bloat the code to the point that it would be easier to implement a
different algorithm entirely.
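[Option 1 can be made concrete: in a 1-based binary heap, a node's index *is* the big-endian record of left(0)/right(1) turns from the root, so retracing top-down only needs an msbit mask. The helper below is a hypothetical userspace illustration of that retracing step, not part of the patch; the name retrace_path is invented here.]

```c
#include <stddef.h>

/* Retrace the root-to-target path of a 1-based binary heap top-down.
 * The bits of `target` below its leading 1, read msbit first, are the
 * left/right turns.  Visited nodes (excluding the root) are written to
 * steps[]; the total step count is returned. */
static size_t retrace_path(size_t target, size_t *steps, size_t max)
{
	size_t mask = 1, node = 1, n = 0;

	/* Build a mask of the highest bit below target's leading 1,
	 * i.e. the ((smear_right(x) >> 1) + 1) idea from the thread. */
	while (mask <= target / 2)
		mask <<= 1;
	mask >>= 1;

	for (; mask; mask >>= 1) {
		node = 2 * node + !!(target & mask);	/* 0 = left, 1 = right */
		if (n < max)
			steps[n] = node;
		n++;
	}
	return n;
}
```

With the path recoverable top-down, the circular shift becomes a run of copies (each parent copied up over its child) finished by one copy of the saved root into the target slot.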
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
> Although I'm thinking of:
>
> static bool __attribute_const__
> is_aligned(const void *base, size_t size, unsigned char align)
> {
> 	unsigned char lsbits = (unsigned char)size;
>
> 	(void)base;
> #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> 	lsbits |= (unsigned char)(uintptr_t)base;
> #endif
> 	return (lsbits & (align - 1)) == 0;
> }
>
> Any preference?

I think it would be better.

> I find "u32s" confusing; I keep reading the "s" as "signed" rather
> than a plural.
>
> How about one of:
> 	swap_bytes / swap_ints / swap_longs
> 	swap_1 / swap_4 / swap_8

In my opinion "swap_bytes / swap_ints / swap_longs" are the most
readable. (Good job)
Re: [PATCH 1/5] lib/sort: Make swap functions more generic
Thank you for the feedback!

Andy Shevchenko wrote:
> On Thu, Feb 21, 2019 at 06:30:28AM +, George Spelvin wrote:
>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>
> Why #ifdef is better than if (IS_ENABLED()) ?

Because CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is bool and not
tristate. IS_ENABLED tests for 'y' or 'm', but we don't need it for
something that's only on or off.

Looking through the kernel, I see both, but #ifdef or #if defined()
are definitely in the majority:

lib/lzo/lzo1x_compress.c:#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(LZO_USE_CTZ64)
lib/siphash.c:#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
lib/string.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
lib/strncpy_from_user.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
lib/zlib_inflate/inffast.c:#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS

I see a few IS_ENABLED uses in include/crypto/ and kernel/bpf/.

It makes no real difference; #ifdef is simpler to me.

>> +	(void)base;
>
> I understand the possible weirdness of the case, but 0 pointer is also
> good, no?

I don't quite understand the question. A NULL pointer is aligned as
far as alignment_ok is concerned. In other words, word accesses to
*NULL will (not) work just as well as byte accesses. None of this
matters because the callers never pass in NULL pointers.

>> +#else
>> +	lsbits |= (unsigned int)(size_t)base;
>
> Perhaps you meant simple (uintptr_t) here and maybe above as well.

No, I meant to truncate it to the same type as "align". We only care
about the low few bits; I could have used (unsigned char) just as well.
My reason for choosing unsigned int rather than unsigned char was that
both x86 and ARM are a tiny bit more efficient at 32-bit operations
(x86 avoids a REX prefix; ARM gates off the high half of the ALU to
save power), but only x86 does byte operations natively.

Still, compilers know how to promote to word operations, so maybe using
unsigned char would make it clearer to the reader that "yes, I meant to
do that!". I've changed it to:

static bool __attribute_const__
is_aligned(const void *base, size_t size, unsigned char align)
{
	unsigned char lsbits = (unsigned char)size;
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	(void)base;
#else
	lsbits |= (unsigned char)(uintptr_t)base;
#endif
	return (lsbits & (align - 1)) == 0;
}

Although I'm thinking of:

static bool __attribute_const__
is_aligned(const void *base, size_t size, unsigned char align)
{
	unsigned char lsbits = (unsigned char)size;

	(void)base;
#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	lsbits |= (unsigned char)(uintptr_t)base;
#endif
	return (lsbits & (align - 1)) == 0;
}

Any preference?

>> +#endif
>> +	lsbits &= align - 1;
>> +	return lsbits == 0;
>
> and this can be one return operation.

It was just to avoid some annoying parentheses; C's precedence rules
and GCC's warnings are fighting me here. If others prefer
"return (lsbits & (align - 1)) == 0;" I'm happy to change.

>> static void u32_swap(void *a, void *b, int size)
>
> For such primitives that operate on top of arrays we usually append
> 's' to the name. Currently the name is misleading.
>
> Perhaps u32s_swap().

I don't worry much about the naming of static helper functions. If
they were exported, it would be a whole lot more important!

I find "u32s" confusing; I keep reading the "s" as "signed" rather
than a plural.

How about one of:
	swap_bytes / swap_ints / swap_longs
	swap_1 / swap_4 / swap_8

> Shouldn't simple memcpy cover these case for both 32- and 64-bit
> architectures?

This isn't a memcpy, it's a memory *swap*. To do it with memcpy
requires:
	memcpy(temp_buffer, a, size);
	memcpy(a, b, size);
	memcpy(b, temp_buffer, size);

This is 1.5x as much memory access, and you have to find a large
enough temp_buffer. (I didn't think a variable-length array on the
stack would make people happy.)

Also, although it is a predictable branch, memcpy() has to check the
alignment of its inputs each call. The reason for these helpers is to
factor that out.