Hello, I would like to "ping" my patch series for s390x.
Any thoughts or comments? Would you prefer keeping addmul_1 and addmul_2 as assembly?

Regards,
Marius

On 8/5/21 09:03, Marius Hillenbrand wrote:
> Hi,
>
> Changes from v1:
> - add tuneup results from z13
> - fix mul_basecase to use #included and inlined addmul_2
>
> Based on your feedback on my previous patches, I rewrote addmul_1/mul_1
> and added implementations for addmul_2/mul_2 and mul_basecase. They are
> still based on multiplying 64x64->128 in GPR pairs and accumulating
> 128-bit-wise in vector registers.
>
> The code passes "make check", of course, and I have run "try" for ~72
> hours for each of the functions (on top of countless iterations of the
> relevant individual test cases in tests/devel).
>
> GMPbench.base.multiply improves by about 50% on z15, and the overall
> score in GMPbench improves by ~35%. The patches do not yet include new
> tuneup parameters.
>
> All the implementations are in C with enough inline assembly to result
> in decent code. mul_basecase #includes and inlines the (add)mul
> functions to avoid calls and unnecessary branches.
>
> All the (add)mul_1/2 functions are 4x unrolled for the first operand
> (i.e., 4 mults per iteration in addmul_1, 8 mults in addmul_2).
> mul_basecase is structured so that it branches on (un % 4) to select the
> correct loop prologue only once on entry, and does not need branches for
> that in each body of addmul.
>
> The accumulation structure in addmul_2 is maybe a little unexpected. The
> idea there is to prefer 128-bit adds without carry to full adds with
> carry-in and carry-out whenever possible, because the latter require two
> instructions for each sum and have instruction grouping limitations. The
> resulting code performs better than strictly using adds with
> carry-in/out for the moderate number of limbs that are relevant for
> mul_basecase.
>
> Regards,
> Marius
>
>
> _______________________________________________
> gmp-devel mailing list
> gmp-devel@gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-devel
>

-- 
Marius Hillenbrand
Linux on Z development
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Gregor Pillen / Management: Dirk Wittkopp
Registered office: Böblingen / Register court: Amtsgericht Stuttgart, HRB 243294
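As a point of reference for the functions discussed in the quoted cover letter, here is a minimal portable C sketch of what mul_1, addmul_1 and mul_basecase compute, including 4x unrolling with a prologue that handles n % 4 limbs once before the main loop. It assumes 64-bit limbs and the GCC/Clang `unsigned __int128` extension in place of the 64x64->128 product kept in a GPR pair. `limb_t`, `dlimb_t`, `ref_mul_1`, `ref_addmul_1` and `ref_mul_basecase` are hypothetical names, not GMP's API, and this is not the s390x code from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, not GMP's API: a 64-bit limb and a 128-bit
   double limb. unsigned __int128 is a GCC/Clang extension standing in
   for the 64x64->128 product that the patches keep in a GPR pair. */
typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out (high) limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^64-1)^2 + (2^64-1) < 2^128, so this cannot overflow. */
        dlimb_t p = (dlimb_t)up[i] * v + cy;
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 64);
    }
    return cy;
}

/* One addmul step: rp[i] += up[i] * v + cy. The 128-bit sum is at most
   (2^64-1)^2 + 2*(2^64-1) = 2^128-1, so it never overflows. */
#define ADDMUL_STEP(i)                                      \
    do {                                                    \
        dlimb_t p_ = (dlimb_t)up[(i)] * v + rp[(i)] + cy;   \
        rp[(i)] = (limb_t)p_;                               \
        cy = (limb_t)(p_ >> 64);                            \
    } while (0)

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   The switch handles n % 4 limbs once; afterwards the main loop
   always processes exactly four limbs per iteration. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    size_t i = 0;
    switch (n % 4) {
    case 3: ADDMUL_STEP(i); i++; /* fall through */
    case 2: ADDMUL_STEP(i); i++; /* fall through */
    case 1: ADDMUL_STEP(i); i++; /* fall through */
    default: break;
    }
    for (; i < n; i += 4) {      /* 4x unrolled main loop */
        ADDMUL_STEP(i);
        ADDMUL_STEP(i + 1);
        ADDMUL_STEP(i + 2);
        ADDMUL_STEP(i + 3);
    }
    return cy;
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1], assuming un >= 1,
   vn >= 1, and that rp overlaps neither input. */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    rp[un] = ref_mul_1(rp, up, un, vp[0]);          /* first row */
    for (size_t j = 1; j < vn; j++)                 /* remaining rows */
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

Unlike the patches, this sketch re-dispatches on n % 4 inside every `ref_addmul_1` call; per the cover letter, mul_basecase in the patches inlines the (add)mul bodies so that this branch is taken only once on entry. The sketch also propagates the carry serially through `cy` rather than accumulating 128-bit sums in vector registers.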