https://bugs.kde.org/show_bug.cgi?id=385408

--- Comment #21 from Vadim Barkov <vbr...@gmail.com> ---
(In reply to Julian Seward from comment #19)
> (In reply to Vadim Barkov from comment #18)
> > Created attachment 108754 [details]
> > Initial vector support (chapter 21) (fixed doc)
> 
> This is pretty good on the whole, but there are a couple of areas of concern.
> Comments below.
> 
> 
> +++ b/memcheck/mc_translate.c
> 
> +      case Iop_Perm8x16x2:
> +         /* (V128, V128, V128) -> V128 */
> +         return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2,
> vatom3));
> 
> This isn't right.  It needs to steer the shadow values but using atom3 as
> the steering value, not vatom3.  Also it needs to somehow take into account
> the fact that atom3 might be undefined.  See the handling of Iop_Perm8x8 in
> the same file.
> 

Is this code right?

      case Iop_Perm8x16x2:
         /* (V128, V128, V128) -> V128 */
            complainIfUndefined(mce, atom3, NULL);
            return mkUifUV128(
                   mce,
                   assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2,
atom3)),
                   mkPCast8x16(mce, vatom3)
                );

> VEX/priv/host_s390_defs.c
> 
> 
> +s390_amode *
> +s390_amode_for_stack_pointer(Int offset)
> +{
> +   if (fits_unsigned_12bit(offset))
> +      return s390_amode_b12(offset, s390_hreg_stack_pointer());
> +
> +   if (fits_signed_20bit(offset))
> +      return s390_amode_b20(offset, s390_hreg_stack_pointer());
> 
> If in both cases you're returning a 's390_amode*', is there any purpose
> in having both variants?  Why not always just use the 20 bit variant?

To be honest this function is heavily inspired by "s390_amode_for_guest_state"
one from the same file and I don't know why we have two variants. I guess it's
a try to save memory or something else.
Since the functions do the very similar job they have similar implementation.

> VEX/pub/libvex_guest_s390x.h
> 
> +   /*   64 */  union{ ULong guest_f0;  V128  guest_v0; };
> ..
> +   /*  304 */  union{ ULong guest_f15; V128  guest_v15; };
> 
> This is definitely not right, because the two union components have
> different sizes (8 vs 16 bytes).  Please remove the _f[0-15] variants and
> the unions themselves, and instead use (eg) guest_vN.w64[0] or .w64[1]
> instead.  This is what various other ports do, that have similar register
> arrangements.

Fixed. 

> VEX/priv/guest_s390_toIR.c
> +s390_format_VRX_VRRD(const HChar *(*irgen)(UChar v1, IRTemp op2addr),
> +                    UChar v1, UChar x2, UChar b2, UShort d2, UChar rxb)
> +{
> +   const HChar *mnm;
> +   IRTemp op2addr = newTemp(Ity_I64);
> +
> +   if (! s390_host_has_vx) {
> +      emulation_failure(EmFail_S390X_vx);
> +      return;
> +   }
> 
> Is this really right?  Looking at emulation_failure(), I see that this
> produces a jump of the kind Ijk_EmFail.  For other targets, and
> instruction-decode failure produces a jump of the kind Ijk_NoDecode, which
> is different -- it causes Valgrind to send SIGILL to the guest.
> 
> So I don't think this is really right.  Unless the s390 response to an
> undecodeable insn is something different than SIGILL.
> 

"s390_host_has_vx" is not a decoding failure reaction. It indicates that
valgrind decoded vector instruction which it cannot emulate / handle due to
missing vector facility. 
In the same way valgrind (on s390x machine) calls emulation_failure() when it
decodes some FP insn and doesn't have fpext (floating-point extension)
facility.
So I think this is right.

> +s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4)
> +s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4)
> +s390_irgen_VUPH(UChar v1, UChar v2, UChar m3)
> +s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3)
> +s390_irgen_VUPL(UChar v1, UChar v2, UChar m3)
> +s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3)
> 
> Many of these look like they are performing vector operations by doing one
> lane at a time.  Whilst this will generate working code, it will also cause
> the translations to be big.  In extreme cases of this (on other
> architectures), given long sequences of vector instructions in the input, we
> have had problems in which VEX's internal storage overflowed and so it had
> to abort.  I would therefore encourage you to generate shorter sequences
> that make more extensive use of the vector IROps, if possible.

 Fixed.

> one/tests/s390x/vector.c
> 
> These tests test decoding, but they don't test the arithmetic much at all.
> Please add more tests, possibly by running each test (eg) 50 times with
> random data.  That often shakes out edge cases.  See for example
> none/tests/amd64/avx-1.c.

Done.

-- 
You are receiving this mail because:
You are watching all bug changes.

Reply via email to