https://bugs.kde.org/show_bug.cgi?id=385408
--- Comment #21 from Vadim Barkov <vbr...@gmail.com> --- (In reply to Julian Seward from comment #19) > (In reply to Vadim Barkov from comment #18) > > Created attachment 108754 [details] > > Initial vector support (chapter 21) (fixed doc) > > This is pretty good on the whole, but there are a couple of areas of concern. > Comments below. > > > +++ b/memcheck/mc_translate.c > > + case Iop_Perm8x16x2: > + /* (V128, V128, V128) -> V128 */ > + return assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, > vatom3)); > > This isn't right. It needs to steer the shadow values but using atom3 as > the steering value, not vatom3. Also it needs to somehow take into account > the fact that atom3 might be undefined. See the handling of Iop_Perm8x8 in > the same file. > Is this code right? case Iop_Perm8x16x2: /* (V128, V128, V128) -> V128 */ complainIfUndefined(mce, atom3, NULL); return mkUifUV128( mce, assignNew('V', mce, Ity_V128, triop(op, vatom1, vatom2, atom3)), mkPCast8x16(mce, vatom3) ); > VEX/priv/host_s390_defs.c > > > +s390_amode * > +s390_amode_for_stack_pointer(Int offset) > +{ > + if (fits_unsigned_12bit(offset)) > + return s390_amode_b12(offset, s390_hreg_stack_pointer()); > + > + if (fits_signed_20bit(offset)) > + return s390_amode_b20(offset, s390_hreg_stack_pointer()); > > If in both cases you're returning a 's390_amode*', is there any purpose > in having both variants? Why not always just use the 20 bit variant? To be honest this function is heavily inspired by "s390_amode_for_guest_state" one from the same file and I don't know why we have two variants. I guess it's a try to save memory or something else. Since the functions do the very similar job they have similar implementation. > VEX/pub/libvex_guest_s390x.h > > + /* 64 */ union{ ULong guest_f0; V128 guest_v0; }; > .. > + /* 304 */ union{ ULong guest_f15; V128 guest_v15; }; > > This is definitely not right, because the two union components have > different sizes (8 vs 16 bytes). Please remove the _f[0-15] variants and > the unions themselves, and instead use (eg) guest_vN.w64[0] or .w64[1] > instead. This is what various other ports do, that have similar register > arrangements. Fixed. > VEX/priv/guest_s390_toIR.c > +s390_format_VRX_VRRD(const HChar *(*irgen)(UChar v1, IRTemp op2addr), > + UChar v1, UChar x2, UChar b2, UShort d2, UChar rxb) > +{ > + const HChar *mnm; > + IRTemp op2addr = newTemp(Ity_I64); > + > + if (! s390_host_has_vx) { > + emulation_failure(EmFail_S390X_vx); > + return; > + } > > Is this really right? Looking at emulation_failure(), I see that this > produces a jump of the kind Ijk_EmFail. For other targets, and > instruction-decode failure produces a jump of the kind Ijk_NoDecode, which > is different -- it causes Valgrind to send SIGILL to the guest. > > So I don't think this is really right. Unless the s390 response to an > undecodeable insn is something different than SIGILL. > "s390_host_has_vx" is not a decoding failure reaction. It indicates that valgrind decoded vector instruction which it cannot emulate / handle due to missing vector facility. In the same way valgrind (on s390x machine) calls emulation_failure() when it decodes some FP insn and doesn't have fpext (floating-point extension) facility. So I think this is right. > +s390_irgen_VMRH(UChar v1, UChar v2, UChar v3, UChar m4) > +s390_irgen_VMRL(UChar v1, UChar v2, UChar v3, UChar m4) > +s390_irgen_VUPH(UChar v1, UChar v2, UChar m3) > +s390_irgen_VUPLH(UChar v1, UChar v2, UChar m3) > +s390_irgen_VUPL(UChar v1, UChar v2, UChar m3) > +s390_irgen_VUPLL(UChar v1, UChar v2, UChar m3) > > Many of these look like they are performing vector operations by doing one > lane at a time. Whilst this will generate working code, it will also cause > the translations to be big. In extreme cases of this (on other > architectures), given long sequences of vector instructions in the input, we > have had problems in which VEX's internal storage overflowed and so it had > to abort. I would therefore encourage you to generate shorter sequences > that make more extensive use of the vector IROps, if possible. Fixed. > one/tests/s390x/vector.c > > These tests test decoding, but they don't test the arithmetic much at all. > Please add more tests, possibly by running each test (eg) 50 times with > random data. That often shakes out edge cases. See for example > none/tests/amd64/avx-1.c. Done. -- You are receiving this mail because: You are watching all bug changes.