On Tue, Sep 25, 2018 at 5:33 AM Jason Ekstrand <ja...@jlekstrand.net> wrote: > > On Mon, Sep 24, 2018 at 7:15 PM Marek Olšák <mar...@gmail.com> wrote: >> >> This patch also handles all types, just differently. If you change the >> typedefs in the header, you'll get a different type and the code is >> exactly the same for all types, but that's not important (to me >> anyway). >> >> It also supports signed division. (not important to me either, but may >> be important to you) > > > Mine supported signed division as well though what's here might be a bit more > clever. I'll have to give it some thought. > >> >> Did you figure out the algorithm by yourself or did you copy it from >> somewhere? The reason I'm asking is that yours only seems to implement >> the Round-Up algorithm and you said: >> >> "In particular, we want to have m < 2^N so that we don't have any >> overflow problems. Unfortunately, this isn't always achievable." > > > Yes, mine is based on the round up algorithm. However, it's not the blind > round-up algorithm; it's a bit smarter than that. > >> >> Let me tell you what. This patch achieves it ALWAYS. > > > I don't think that's true. You still have an N+1 bit multiplier, you just > call it the increment bit. The saturated add, however, is a neat trick that > probably lets you avoid the weirness around adding in the increment factor. > I'll need to look at the web-site you linked and think about this stuff again > before I can verify it.
If your hw doesn't have a saturated add, you can still achieve the result with: addcarry (32-bit add doing "+1") subborrow (saturation) It takes advantage of the fact that increment <= 1. The idea is that addcarry returns carry=1 if it overflowed, and subborrow subtracts 1 if carry=1. It avoids all 64-bit math and you only need one 32-bit intermediate register for the division. > >> >> This patch implements 2 algorithms for unsigned division: Round-Up and >> Round-Down. The paper I linked shows that the Round Down algorithm >> generates better code for some divisors than the Round Up algorithm, >> because the multiplier always fits into 32 bits. The most operations >> you'll ever need are: 2 shifts, 32-bit saturated ADD and UMUL_HI. >> >> Marek >> >> On Mon, Sep 24, 2018 at 7:41 PM, Marek Olšák <mar...@gmail.com> wrote: >> > Did you copy the code from the same author? >> > >> > Does your version also have an interface for dividing by a uniform >> > instead of a compile time constant? >> > >> > Note that this algorithm was originally only written for >> > non-power-of-two divisors and I extended it to support 1 and >> > power-of-two divisors in order to support dividing by a uniform in a >> > generic way. The other two generic variants that I added are also >> > important. One of them assumes no unsigned wraparounds and the other >> > one assumes operands have 31 bits and the divisor is >= 2. >> > >> > Marek >> > >> > On Mon, Sep 24, 2018 at 10:00 AM, Jason Ekstrand <ja...@jlekstrand.net> >> > wrote: >> >> Very similar.... And mine handles 8, 16, and 64-bit types. :-D >> >> >> >> --Jason >> >> >> >> On Mon, Sep 24, 2018 at 8:53 AM Ian Romanick <i...@freedesktop.org> wrote: >> >>> >> >>> I didn't look really closely at either set, but this seems really >> >>> similar to something Jason sent out a week or two. Perhaps you guys >> >>> could unify these? >> >>> >> >>> On 09/23/2018 09:57 AM, Marek Olšák wrote: >> >>> > From: Marek Olšák <marek.ol...@amd.com> >> >>> > >> >>> > Compilers can use this to generate optimal code for integer division >> >>> > by a constant. >> >>> > >> >>> > Additionally, an unsigned division by a uniform that is constant but >> >>> > not >> >>> > known at compile time can still be optimized by passing 2-4 division >> >>> > factors to the shader as uniforms and executing one of the fast_udiv* >> >>> > variants. The signed division algorithm doesn't have this capability. >> >>> > --- >> >>> > src/util/Makefile.sources | 2 + >> >>> > src/util/fast_idiv_by_const.c | 245 >> >>> > ++++++++++++++++++++++++++++++++++++++++++ >> >>> > src/util/fast_idiv_by_const.h | 173 +++++++++++++++++++++++++++++ >> >>> > src/util/meson.build | 2 + >> >>> > 4 files changed, 422 insertions(+) >> >>> > create mode 100644 src/util/fast_idiv_by_const.c >> >>> > create mode 100644 src/util/fast_idiv_by_const.h >> >>> > >> >>> > diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources >> >>> > index b562d6c..f741b2a 100644 >> >>> > --- a/src/util/Makefile.sources >> >>> > +++ b/src/util/Makefile.sources >> >>> > @@ -3,20 +3,22 @@ MESA_UTIL_FILES := \ >> >>> > bitscan.h \ >> >>> > bitset.h \ >> >>> > build_id.c \ >> >>> > build_id.h \ >> >>> > crc32.c \ >> >>> > crc32.h \ >> >>> > debug.c \ >> >>> > debug.h \ >> >>> > disk_cache.c \ >> >>> > disk_cache.h \ >> >>> > + fast_idiv_by_const.c \ >> >>> > + fast_idiv_by_const.h \ >> >>> > format_r11g11b10f.h \ >> >>> > format_rgb9e5.h \ >> >>> > format_srgb.h \ >> >>> > futex.h \ >> >>> > half_float.c \ >> >>> > half_float.h \ >> >>> > hash_table.c \ >> >>> > hash_table.h \ >> >>> > list.h \ >> >>> > macros.h \ >> >>> > diff --git a/src/util/fast_idiv_by_const.c >> >>> > b/src/util/fast_idiv_by_const.c >> >>> > new file mode 100644 >> >>> > index 0000000..f247b66 >> >>> > --- /dev/null >> >>> > +++ b/src/util/fast_idiv_by_const.c >> >>> > @@ -0,0 +1,245 @@ >> >>> > +/* >> >>> > + * Copyright © 2018 Advanced Micro Devices, Inc. >> >>> > + * >> >>> > + * Permission is hereby granted, free of charge, to any person >> >>> > obtaining a >> >>> > + * copy of this software and associated documentation files (the >> >>> > "Software"), >> >>> > + * to deal in the Software without restriction, including without >> >>> > limitation >> >>> > + * the rights to use, copy, modify, merge, publish, distribute, >> >>> > sublicense, >> >>> > + * and/or sell copies of the Software, and to permit persons to whom >> >>> > the >> >>> > + * Software is furnished to do so, subject to the following >> >>> > conditions: >> >>> > + * >> >>> > + * The above copyright notice and this permission notice (including >> >>> > the >> >>> > next >> >>> > + * paragraph) shall be included in all copies or substantial portions >> >>> > of the >> >>> > + * Software. >> >>> > + * >> >>> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >> >>> > EXPRESS OR >> >>> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >> >>> > MERCHANTABILITY, >> >>> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >> >>> > SHALL >> >>> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES >> >>> > OR >> >>> > OTHER >> >>> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >> >>> > ARISING >> >>> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >> >>> > DEALINGS >> >>> > + * IN THE SOFTWARE. >> >>> > + */ >> >>> > + >> >>> > +/* Imported from: >> >>> > + * >> >>> > https://raw.githubusercontent.com/ridiculousfish/libdivide/master/divide_by_constants_codegen_reference.c >> >>> > + * Paper: >> >>> > + * >> >>> > http://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf >> >>> > + * >> >>> > + * The author, ridiculous_fish, wrote: >> >>> > + * >> >>> > + * ''Reference implementations of computing and using the "magic >> >>> > number" >> >>> > + * approach to dividing by constants, including codegen >> >>> > instructions. >> >>> > + * The unsigned division incorporates the "round down" optimization >> >>> > per >> >>> > + * ridiculous_fish. >> >>> > + * >> >>> > + * This is free and unencumbered software. Any copyright is >> >>> > dedicated >> >>> > + * to the Public Domain.'' >> >>> > + */ >> >>> > + >> >>> > +#include "fast_idiv_by_const.h" >> >>> > +#include "u_math.h" >> >>> > +#include <limits.h> >> >>> > +#include <assert.h> >> >>> > + >> >>> > +/* uint_t and sint_t can be replaced by different integer types and >> >>> > the >> >>> > code >> >>> > + * will work as-is. The only requirement is that sizeof(uintN) == >> >>> > sizeof(intN). >> >>> > + */ >> >>> > + >> >>> > +struct util_fast_udiv_info >> >>> > +util_compute_fast_udiv_info(uint_t D, unsigned num_bits) >> >>> > +{ >> >>> > + /* The numerator must fit in a uint_t */ >> >>> > + assert(num_bits > 0 && num_bits <= sizeof(uint_t) * CHAR_BIT); >> >>> > + assert(D != 0); >> >>> > + >> >>> > + /* The eventual result */ >> >>> > + struct util_fast_udiv_info result; >> >>> > + >> >>> > + if (util_is_power_of_two_nonzero(D)) { >> >>> > + unsigned div_shift = util_logbase2(D); >> >>> > + >> >>> > + if (div_shift) { >> >>> > + /* Dividing by a power of two. */ >> >>> > + result.multiplier = 1 << 31; > > > This is wrong for non-32-bit Indeed. Do we need other bit sizes for NIR? > >> >> >>> > + result.pre_shift = 0; >> >>> > + result.post_shift = div_shift - 1; >> >>> > + result.increment = 0; >> >>> > + return result; >> >>> > + } else { >> >>> > + /* Dividing by 1. */ >> >>> > + /* Assuming: floor((num + 1) * (2^32 - 1) / 2^32) = num */ >> >>> > + result.multiplier = UINT_MAX; > > > So is this. > > Can we at the very least pull in the unit tests from my series? Yes. Marek _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev