On 02/23/2018 07:15 AM, Peter Maydell wrote:
>> +static const uint64_t expand_bit_data[5][2] = {
>> +    { 0x1111111111111111ull, 0x2222222222222222ull },
>> +    { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
>> +    { 0x000f000f000f000full, 0x00f000f000f000f0ull },
>> +    { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
>> +    { 0x000000000000ffffull, 0x00000000ffff0000ull }
>> +};
>> +
>> +/* Expand units of 2**N bits to units of 2**(N+1) bits,
>> +   with the higher bits zero.  */
>
> In bitops.h we call this operation "half shuffle" (where
> it is specifically working on units of 1 bit size), and
> the inverse "half unshuffle". Worth mentioning that (or
> using similar terminology) ?

I hadn't noticed this helper.  I'll at least mention it.

FWIW, the half_un/shuffle operation is what you get with N=0, which
corresponds to a byte predicate interleave.  We need the intermediate
steps for half, single, and double predicate interleaves.

>> +static uint64_t expand_bits(uint64_t x, int n)
>> +{
>> +    int i, sh;
>
> Worth asserting that n is within the range we expect it to be ?
> (what range is that? 0 to 4?)

N goes from 0-3; I goes from 0-4.  N will have been controlled by decode,
so I'm not sure it's worth an assert.  Even if I did add one, I wouldn't
want it here, at the center of a loop kernel.

>> +    d[0] = nn + (mm << (1 << esz));
>
> Is this actually doing an addition, or is it just an odd
> way of writing a bitwise OR when neither of the two
> inputs have 1 in the same bit position?

It could be an OR.  Here I'm hoping that the compiler will use a
shift-add instruction.  If I wrote it with an OR, the compiler would
first have to prove by itself that the two operands share no set bits,
which it wouldn't necessarily be able to do.

>> +    d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
>
> This looks like it's using addition for logical OR again ?

Yes.  Although this time I admit it'll never produce an LEA.

>> +    /* For VL which is not a power of 2, the results from M do not
>> +       align nicely with the uint64_t for D.  Put the aligned results
>> +       from M into TMP_M and then copy it into place afterward.  */
>
> How much risu testing did you do of funny vector lengths ?

As much as I can with the unlicensed Foundation Platform: all lengths
from 1-4.  Unfortunately, that does leave a few multi-word predicate
paths untested, but many of the routines loop identically within this
length and beyond.

>> +static const uint64_t even_bit_esz_masks[4] = {
>> +    0x5555555555555555ull,
>> +    0x3333333333333333ull,
>> +    0x0f0f0f0f0f0f0f0full,
>> +    0x00ff00ff00ff00ffull
>> +};
>
> Comment describing the purpose of these numbers would be useful.

Ack.


r~
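
P.S. To make the N=0 "half shuffle" case and the add-versus-OR point
concrete, here is a minimal sketch.  Only the table, the function
signature and the "nn + (mm << (1 << esz))" expression come from the
quoted patch.  The loop body, the names expand_bits_sketch and
zip_low_sketch, and the assert (there only to illustrate the point, it
is not something I'd put in the real kernel) are assumptions for
illustration.

```c
#include <assert.h>
#include <stdint.h>

static const uint64_t expand_bit_data[5][2] = {
    { 0x1111111111111111ull, 0x2222222222222222ull },
    { 0x0303030303030303ull, 0x0c0c0c0c0c0c0c0cull },
    { 0x000f000f000f000full, 0x00f000f000f000f0ull },
    { 0x000000ff000000ffull, 0x0000ff000000ff00ull },
    { 0x000000000000ffffull, 0x00000000ffff0000ull }
};

/*
 * Hypothetical loop body (not quoted from the patch): expand units of
 * 2**n bits in the low 32 bits of x to units of 2**(n+1) bits, with
 * the higher bits of each unit zero.  n is assumed to be 0..3.
 */
static uint64_t expand_bits_sketch(uint64_t x, int n)
{
    int i, sh;

    x &= 0xffffffffu;
    /* Step I moves the upper half of each 2**(I+1)-bit group up by 2**I. */
    for (i = 4, sh = 16; i >= n; i--, sh >>= 1) {
        x = ((x & expand_bit_data[i][1]) << sh)
          | (x & expand_bit_data[i][0]);
    }
    return x;
}

/*
 * Hypothetical interleave of the low halves of two predicate words.
 * After expansion, every other unit of nn and mm is zero, so the
 * shifted mm lands exactly in the gaps of nn.
 */
static uint64_t zip_low_sketch(uint64_t n, uint64_t m, int esz)
{
    uint64_t nn = expand_bits_sketch(n, esz);
    uint64_t mm = expand_bits_sketch(m, esz);
    uint64_t hi = mm << (1 << esz);

    /* No bit is set in both operands, so + and | give the same result. */
    assert((nn & hi) == 0);
    return nn + hi;
}
```

With N=0, expand_bits_sketch(0xb, 0) moves bits 0, 1, 3 to bits 0, 2, 6
and returns 0x45, which is the half shuffle.  With N=3,
expand_bits_sketch(0xabcd, 3) returns 0x00ab00cd.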