On Wed, 13 Oct 2021 at 15:44, Vincent Palatin <vpala...@rivosinc.com> wrote:
>
> On Wed, Oct 13, 2021 at 3:13 PM Philipp Tomsich
> <philipp.toms...@vrull.eu> wrote:
> >
> > I had a much simpler version initially (using 3 x mask/shift/or, for
> > 12 instructions after setup of constants), but took up the suggestion
> > to optimize based on haszero(v)...
> > Indeed this appears to not do what we expect, when there's only 0x01
> > set in a byte.
> >
> > The less optimized form, with a single constant, that would still do
> > what we want is:
> >    /* set high-bit for non-zero bytes */
> >    constant = dup_const_tl(MO_8, 0x7f);
> > >    tmp = v & constant;   // AND
> > >    tmp += constant;      // ADD
> > >    tmp |= v;             // OR
> > >    /* extract high-bit to low-bit, for each byte */
> > >    tmp &= ~constant;     // ANDC
> > >    tmp >>= 7;            // SHR
> > >    /* multiply with 0xff to populate entire byte where the low-bit is set */
> > >    tmp *= 0xff;          // MUL
> >
> > > I'll submit a patch with this one later today, once I have had a
> > > chance to run this through a full test.
>
>
> Thanks for the insight.
>
> I have tried it, implemented as:
> ```
> static void gen_orc_b(TCGv ret, TCGv source1)
> {
>     TCGv  tmp = tcg_temp_new();
>     TCGv  constant = tcg_constant_tl(dup_const_tl(MO_8, 0x7f));
>
>     /* set high-bit for non-zero bytes */
>     tcg_gen_and_tl(tmp, source1, constant);
>     tcg_gen_add_tl(tmp, tmp, constant);
>     tcg_gen_or_tl(tmp, tmp, source1);
>     /* extract high-bit to low-bit, for each byte */
>     tcg_gen_andc_tl(tmp, tmp, constant);
>     tcg_gen_shri_tl(tmp, tmp, 7);
>
>     /* Replicate the lsb of each byte across the byte. */
>     tcg_gen_muli_tl(ret, tmp, 0xff);
>
>     tcg_temp_free(tmp);
> }
> ```
>
> It does pass my own test sequences.

I am running it against SPEC at the moment, with optimized
strlen/strcpy/strcmp functions that use orc.b.
The verdict on that should be available later today...
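
For a quick host-side sanity check independent of TCG, the sequence quoted
above can be mirrored on a plain 64-bit integer and compared against a
byte-by-byte reference. This is only a sketch: it assumes a 64-bit
target_ulong, and the names orc_b_ref, orc_b_swar and orc_b_selfcheck are
made up for illustration (they are not part of the patch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: every non-zero byte becomes 0xff, every zero byte stays 0x00. */
static uint64_t orc_b_ref(uint64_t v)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if ((v >> (8 * i)) & 0xff) {
            r |= (uint64_t)0xff << (8 * i);
        }
    }
    return r;
}

/* Same steps as the quoted sequence, written on a plain uint64_t. */
static uint64_t orc_b_swar(uint64_t v)
{
    /* Assumption: 64-bit value of dup_const_tl(MO_8, 0x7f). */
    const uint64_t c = 0x7f7f7f7f7f7f7f7fULL;
    uint64_t tmp;

    tmp = v & c;        /* AND:  low 7 bits of each byte */
    tmp += c;           /* ADD:  bit 7 set iff low 7 bits are non-zero;
                                 at most 0x7f + 0x7f, so no carry leaves a byte */
    tmp |= v;           /* OR:   also catch bytes whose own bit 7 was set */
    tmp &= ~c;          /* ANDC: keep only bit 7 of each byte */
    tmp >>= 7;          /* SHR:  move it down to bit 0 of each byte */
    return tmp * 0xff;  /* MUL:  0x01 -> 0xff in each flagged byte */
}

/* Compare both versions on a few hand-picked inputs, including 0x01 bytes. */
static void orc_b_selfcheck(void)
{
    static const uint64_t samples[] = {
        0x0000000000000000ULL,
        0x0000000000000001ULL,
        0x0000000000000100ULL,
        0x0100000000000000ULL,
        0x8000000000000000ULL,
        0x0001000100800000ULL,
        0xffffffffffffffffULL,
    };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(orc_b_swar(samples[i]) == orc_b_ref(samples[i]));
    }
}
```

Calling orc_b_selfcheck() from a small test program exercises the cases
where a byte contains only 0x01, next to a zero byte or on its own.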

Philipp.
