On Fri, Mar 22, 2024 at 12:09 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> > If this were "<=" then for long arrays we could assume there is
> > always more than one block, and wouldn't need to check if any elements
> > remain -- first block, then a single loop and it's done.
> >
> > The loop could also then be a "do while" since it doesn't have to
> > check the exit condition up front.
>
> Good idea. That causes us to re-check all of the tail elements when the
> number of elements is evenly divisible by nelem_per_iteration, but that
> might be worth the trade-off.

Yeah, if there's no easy way to avoid that it's probably fine. I wonder if
we can subtract one first to force even multiples to round down, although
I admit I haven't thought through the consequences of that.

> [v8]

Seems pretty good. It'd be good to see the results of 2- vs. 4-register
before committing, because that might lead to some restructuring, but
maybe it won't, and v8 is already an improvement over HEAD.

/* Process the remaining elements one at a time. */

This now does all of them if that path is taken, so "remaining" can be
removed.
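
A minimal standalone sketch of the loop shape discussed above, not the patch itself. It assumes a block size of 4 elements and uses a plain scalar helper in place of the SIMD comparison. The names lfind32_sketch, block_contains, and NELEM_PER_ITERATION are hypothetical. The "subtract one first" step is one possible reading of the idea: rounding (nelem - 1) down to a block boundary so that an exact multiple leaves one full block for the final check instead of re-checking the tail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size; a real SIMD version would derive it from the
 * vector width and the number of registers used per iteration. */
#define NELEM_PER_ITERATION 4

/* Scalar stand-in for the SIMD block comparison (assumption): checks
 * exactly NELEM_PER_ITERATION elements without an early exit. */
static bool
block_contains(uint32_t key, const uint32_t *block)
{
	bool		found = false;

	for (size_t j = 0; j < NELEM_PER_ITERATION; j++)
		found |= (block[j] == key);
	return found;
}

static bool
lfind32_sketch(uint32_t key, const uint32_t *base, size_t nelem)
{
	size_t		i = 0;
	size_t		tail_idx;

	/* With "<=", anything past this point has more than one block. */
	if (nelem <= NELEM_PER_ITERATION)
	{
		/* Process the elements one at a time. */
		for (; i < nelem; i++)
		{
			if (base[i] == key)
				return true;
		}
		return false;
	}

	/*
	 * Subtract one before rounding down to a block boundary, so an exact
	 * multiple of the block size rounds down by one full block.
	 */
	tail_idx = (nelem - 1) & ~((size_t) NELEM_PER_ITERATION - 1);

	/* nelem > block size here, so the first block is always in bounds. */
	do
	{
		if (block_contains(key, &base[i]))
			return true;
		i += NELEM_PER_ITERATION;
	} while (i < tail_idx);

	/* Final block, possibly overlapping elements already checked. */
	return block_contains(key, &base[nelem - NELEM_PER_ITERATION]);
}
```

With nelem = 8 the loop checks elements 0..3 and the final block checks 4..7, so nothing is visited twice. With nelem = 9 the final block covers 5..8 and re-checks three elements.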