[valgrind] [Bug 203380] Add a client request to register debug info for code ranges
https://bugs.kde.org/show_bug.cgi?id=203380

Julian Seward changed:
  Attachment #158504 is obsolete: 0 -> 1

--- Comment #7 from Julian Seward ---
Created attachment 158505
  --> https://bugs.kde.org/attachment.cgi?id=158505&action=edit
Majorly revised and improved patch, take 2 (with debug printing removed)

-- 
You are receiving this mail because:
You are watching all bug changes.
[valgrind] [Bug 203380] Add a client request to register debug info for code ranges
https://bugs.kde.org/show_bug.cgi?id=203380

Julian Seward changed:
  Attachment #36062 is obsolete: 0 -> 1
  Attachment #57659 is obsolete: 0 -> 1

--- Comment #6 from Julian Seward ---
Created attachment 158504
  --> https://bugs.kde.org/attachment.cgi?id=158504&action=edit
Majorly revised and improved patch

Here's a revised and improved patch that I have used recently for profiling
webassembly applications running on SpiderMonkey. It's still not particularly
sophisticated, but works well enough to be useful.

It retains the basic design of earlier versions of the patch, in that it
stands outside the per-object DebugInfo mechanism used for AOT code. Hence it
is localised complexity that does not impact the existing mechanisms.
[valgrind] [Bug 380942] Experimental: add MESI protocol simulation to Callgrind
https://bugs.kde.org/show_bug.cgi?id=380942 Julian Seward changed: What|Removed |Added Attachment #105965|0 |1 is obsolete|| --- Comment #3 from Julian Seward --- Created attachment 156636 --> https://bugs.kde.org/attachment.cgi?id=156636=edit valgrind-bug380942-Callgrind-MESI.diff-2023Feb23-rebased Here's the same patch rebased to V sources of 23 Feb 2023. --fair-sched=yes has been hardwired (see comment 1) and the scheduling quantum has been reduced from 1200 to 800, for safety. TL;DR to build is now: git clone https://sourceware.org/git/valgrind.git mesi2023 cd mesi2023 patch -p1 < valgrind-bug380942-Callgrind-MESI.diff-2023Feb23-rebased ./autogen.sh ./configure --prefix=`pwd`/Inst --enable-only64bit make -j 8 all make -j 8 install To run: /path/to/mesi2023/Inst/bin/valgrind \ --tool=callgrind --simulate-mesi=yes kcachegrind callgrind.out. See comment 1 for more details on the RFO events. These are the "requests for ownership" (cache line transfers) which contain the info of interest here. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 383010] Add support for AVX-512 instructions
https://bugs.kde.org/show_bug.cgi?id=383010

--- Comment #70 from Julian Seward ---
Created attachment 145020
  --> https://bugs.kde.org/attachment.cgi?id=145020&action=edit
Fix handling of no-mask reg-reg versions of VEXPAND* and VCOMPRESS*

Here's a bug fix for the VEXPAND and VCOMPRESS instructions, specifically
for the register-to-register, mask-free versions. By "mask-free" I mean they
do not specify any of `{k1}` to `{k7}`. (I think that makes the instructions
into trivial reg-to-reg copies, but that's irrelevant). The bug is that the
generated IR acts as if `{k0}` had been specified, and so the result depends
on whatever value is in `k0` at the time.

I worry that there are potentially other places where the IR is generated
using `getKReg(mask)` when really it should be `mask ? getKReg(mask) :
mkU64(0)`, and that testing isn't catching these. Not sure though.
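The suspected pattern can be illustrated with a small standalone model. This is a sketch of the decode-time choice described in the comment, not valgrind's actual code; the names (`k_regs`, `get_mask_*`) are hypothetical:

```c
#include <stdint.h>

/* Simulated k0..k7 mask registers. k0's contents are arbitrary at the
   point the instruction is decoded. */
static uint64_t k_regs[8];

/* Buggy shape: always reads the encoded k register, even when no mask
   specifier was present (mask field == 0), so the result depends on
   whatever junk is in k0. */
uint64_t get_mask_buggy(unsigned mask_field) {
    return k_regs[mask_field];
}

/* Fixed shape, following the comment's suggested
   `mask ? getKReg(mask) : mkU64(0)`: an absent mask specifier yields a
   fixed value instead of reading k0. */
uint64_t get_mask_fixed(unsigned mask_field) {
    return mask_field ? k_regs[mask_field] : 0;
}
```

With junk in `k0`, the two disagree exactly when no mask is encoded, which matches the symptom: results that depend on the prior contents of `k0`.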
[valgrind] [Bug 383010] Add support for AVX-512 instructions
https://bugs.kde.org/show_bug.cgi?id=383010

--- Comment #68 from Julian Seward ---
Created attachment 144930
  --> https://bugs.kde.org/attachment.cgi?id=144930&action=edit
valgrind-avx512-rollup-fixes-2021Dec29.diff

Rollup fixes to be applied on top of (after) the patches in comments 58, 59
and 60:

* fixes the problem described in comment 67. The patch set extends
  AMD64Instr::CStore and AMD64Instr::CLoad to also handle 8- and 16-bit
  conditional stores and loads. However, the emit_AMD64Instr clauses for
  these cases were not correct and were still generating 64-bit
  transactions. This fixes them. That removes a bunch of incorrect results
  in regression tests, and crashing when running real programs. The test
  case avx512-skx is still failing, though.

* [trivial fixes] fixes segfaults caused by insufficient alignment in test
  cases avx512-l1.c and avx512-l2.c

* [temporary] disables a few test cases in avx512-l1.c since they don't run
  on my hardware (Core i5-1135G7), even natively.
[valgrind] [Bug 383010] Add support for AVX-512 instructions
https://bugs.kde.org/show_bug.cgi?id=383010

--- Comment #67 from Julian Seward ---
Created attachment 144910
  --> https://bugs.kde.org/attachment.cgi?id=144910&action=edit
Demonstrates misbehaving `vmovdqu8 %ymm7, (%r9){%k5}`

I've been testing the patches from comments 58, 59 and 60 against the trunk,
using Fedora 35 running on a Core i5-1135G7. It passes the tests in the
comment 60 patch, but causes regressions in various other tests.

I tracked one problem down to an incorrect implementation of 256-bit stores
that use a guard register (k1 .. k7). This causes glibc's memset() to
misbehave, hence causing --tool=none runs to fail. Testcase is attached.
I imagine it's caused by an incorrect translation into IR, but I haven't
figured out how that translation is done.
[valgrind] [Bug 446103] Memcheck: `--track-origins=yes` causes extreme slowdowns for large mmap/munmap
https://bugs.kde.org/show_bug.cgi?id=446103

Julian Seward changed:
  Status: REPORTED -> RESOLVED
  Resolution: --- -> FIXED

--- Comment #2 from Julian Seward ---
Fixed, 8ee1656165902125c414d598cff788c7bb0b1556.
[valgrind] [Bug 446103] Memcheck: `--track-origins=yes` causes extreme slowdowns for large mmap/munmap
https://bugs.kde.org/show_bug.cgi?id=446103

--- Comment #1 from Julian Seward ---
Created attachment 144170
  --> https://bugs.kde.org/attachment.cgi?id=144170&action=edit
Proposed patch (needs commit message)

Attached is a proposed fix. It makes gigabyte-sized SARPs about 3 x faster
when --track-origins=yes is in use.

Note, from comment 0:
> a huge AVL tree is created

This was speculation and is totally wrong. The slowdowns were for other
reasons.
[valgrind] [Bug 446236] Issues with Valgrind 3.8.1 version, after compiler upgrade
https://bugs.kde.org/show_bug.cgi?id=446236

--- Comment #2 from Julian Seward ---
Also, if it really is 3.8.1 that you are using .. that is ancient and
unsupported. Try again with the most recent version, 3.18.1.
[valgrind] [Bug 446103] New: Memcheck: `--track-origins=yes` causes extreme slowdowns for large mmap/munmap
https://bugs.kde.org/show_bug.cgi?id=446103

            Bug ID: 446103
           Summary: Memcheck: `--track-origins=yes` causes extreme
                    slowdowns for large mmap/munmap
           Product: valgrind
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: memcheck
          Assignee: jsew...@acm.org
          Reporter: jsew...@acm.org
  Target Milestone: ---

Created attachment 143957
  --> https://bugs.kde.org/attachment.cgi?id=143957&action=edit
C test case

The attached test program repeatedly mmaps/munmaps a 2GB chunk of memory.
Without `--track-origins=yes`, it runs in about 0.7 seconds. But with origin
tracking enabled, it takes 95.5 seconds; that is, it runs 137 times slower.

The problem occurs because the 2GB mmaps/munmaps cause Memcheck to perform a
set-address-range-perms (SARP) operation on the area, changing it to
"defined" and then back to "noaccess". Large-range SARPs are (highly)
optimised for the V/A-bit (normal) Memcheck shadow memory, but they are very
inefficient for the origin-tracking shadow area.

Origin-tracking shadows are kept in a 64MB 2-way set associative cache,
which is pretty fast. But lines ejected from that are kept in an OSet
(AVL-tree) based backing store. Because these SARP lengths greatly exceed
the size of the cache, a huge AVL tree is created, which is very time- and
space-inefficient.

This was initially noticed in
https://bugzilla.mozilla.org/show_bug.cgi?id=1742851.
[valgrind] [Bug 445415] New: arm64 front end: alignment checks missing for atomic instructions
https://bugs.kde.org/show_bug.cgi?id=445415

            Bug ID: 445415
           Summary: arm64 front end: alignment checks missing for atomic
                    instructions
           Product: valgrind
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: vex
          Assignee: jsew...@acm.org
          Reporter: jsew...@acm.org
  Target Milestone: ---

For the arm64 front end, none of the atomic instructions have address
alignment checks included in their IR. They all should.

The effect of missing alignment checks in the IR is that, since this IR will
in most cases be translated back to atomic instructions in the back end, we
will get alignment traps (SIGBUS) on the host side and not on the guest
side, which is (very) incorrect behaviour of the simulation.
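The guard the front end needs to emit is just a natural-alignment test on the effective address, with a misaligned address routed to a guest-side SIGBUS rather than left to trap on the host. A minimal model of that predicate (illustrative only, not the IR helper itself):

```c
#include <stdint.h>

/* An atomic access of `size` bytes (1, 2, 4, 8, or 16 for the arm64
   atomics) is architecturally required to be size-aligned. If this
   predicate is false, the simulated guest should receive SIGBUS; the
   access must never reach a real host atomic instruction. */
int atomic_addr_is_aligned(uint64_t addr, unsigned size) {
    return (addr & (uint64_t)(size - 1)) == 0;
}
```

In IR terms this corresponds to a conditional side-exit on the address before the atomic operation, in the same way other guest-visible traps are modelled.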
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89 (LD{,A}XP and ST{,L}XP)
https://bugs.kde.org/show_bug.cgi?id=444399

--- Comment #9 from Julian Seward ---
Landed, 530df882b8f60ecacaf2b9b8a719f7ea1c1d1650. I think it's OK, and the
patch contains test cases, but .. please test.
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89 (LD{,A}XP and ST{,L}XP)
https://bugs.kde.org/show_bug.cgi?id=444399

Julian Seward changed:
  Summary: disInstr(arm64): unhandled instruction 0xC87F2D89
       ->  disInstr(arm64): unhandled instruction 0xC87F2D89
           (LD{,A}XP and ST{,L}XP)
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

Bug 444399 depends on bug 445354, which changed state.

Bug 445354 Summary: arm64 backend: incorrect code emitted for doubleword CAS
https://bugs.kde.org/show_bug.cgi?id=445354

  Status: REPORTED -> RESOLVED
  Resolution: --- -> FIXED
[valgrind] [Bug 445354] arm64 backend: incorrect code emitted for doubleword CAS
https://bugs.kde.org/show_bug.cgi?id=445354

Julian Seward changed:
  Status: REPORTED -> RESOLVED
  Resolution: --- -> FIXED

--- Comment #1 from Julian Seward ---
Fixed, 7dbe2fed72886874f2eaf57dc07929542ae55b58.
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

Julian Seward changed:
  Attachment #143328 is obsolete: 0 -> 1

--- Comment #8 from Julian Seward ---
Created attachment 143474
  --> https://bugs.kde.org/attachment.cgi?id=143474&action=edit
Final proposed patch

Final proposed patch. This includes the fix for blocking bug 445354, which
is small and which I will land separately and first.
[valgrind] [Bug 445354] arm64 backend: incorrect code emitted for doubleword CAS
https://bugs.kde.org/show_bug.cgi?id=445354

Julian Seward changed:
  Blocks: 444399

Referenced Bugs:
https://bugs.kde.org/show_bug.cgi?id=444399
[Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

Julian Seward changed:
  Depends on: 445354

Referenced Bugs:
https://bugs.kde.org/show_bug.cgi?id=445354
[Bug 445354] arm64 backend: incorrect code emitted for doubleword CAS
[valgrind] [Bug 445354] New: arm64 backend: incorrect code emitted for doubleword CAS
https://bugs.kde.org/show_bug.cgi?id=445354

            Bug ID: 445354
           Summary: arm64 backend: incorrect code emitted for doubleword
                    CAS
           Product: valgrind
           Version: unspecified
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: vex
          Assignee: jsew...@acm.org
          Reporter: jsew...@acm.org
                CC: m...@klomp.org
  Target Milestone: ---

The sequence of instructions emitted by the arm64 backend for doubleword
compare-and-swap is incorrect. This could lead to incorrect simulation of
the AArch8.1 atomic instructions (CASP, at least), and causes failures in
the upcoming fix for v8.0 support for LD{,A}XP/ST{,L}XP in bug 444399. In
the worst case it can cause segfaulting in the generated code, because it
could jump backwards unexpectedly far.

The problem is the sequence emitted for ARM64in_CASP:

* the jump offsets are incorrect, both for `bne out` (x 2) and
  `cbnz w1, loop`.

* using w1 to hold the success indication of the stxp instruction trashes
  the previous value in x1. But the value in x1 is an output of
  ARM64in_CASP, hence one of the two output registers is corrupted. That
  confuses any code downstream that wants to inspect those values to find
  out whether the transaction succeeded or not.

The fixes are to:

* fix the branch offsets

* use a different register to hold the stxp success indication; w3 is a
  convenient choice.
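The register-corruption half of the bug can be shown with a plain C model of the emitted sequence (this is an illustration of the ldxp/stxp data flow, not valgrind code; the `use_w3_for_status` switch stands in for the proposed fix):

```c
#include <stdint.h>

/* x0/x1 are ARM64in_CASP's outputs: they must still hold the values
   loaded by ldxp when the sequence finishes, success or not. */
typedef struct { uint64_t x0, x1; } CASPOut;

CASPOut casp_model(uint64_t mem[2],
                   uint64_t exp0, uint64_t exp1,   /* expected pair */
                   uint64_t new0, uint64_t new1,   /* replacement pair */
                   int use_w3_for_status) {
    CASPOut out;
    out.x0 = mem[0];                  /* ldxp x0, x1, [mem] */
    out.x1 = mem[1];
    if (out.x0 == exp0 && out.x1 == exp1) {
        mem[0] = new0;                /* stxp succeeds; status = 0 */
        mem[1] = new1;
        if (!use_w3_for_status)
            out.x1 = 0;               /* bug: status written into w1,
                                         clobbering the x1 output */
        /* fixed version: status goes to a scratch register (w3),
           so x1 keeps the loaded value */
    }
    return out;
}
```

On a successful swap, the buggy path returns `x1 == 0` (the stxp status) instead of the old memory value, which is exactly the corruption described in the second bullet.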
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

--- Comment #7 from Julian Seward ---
Created attachment 143328
  --> https://bugs.kde.org/attachment.cgi?id=143328&action=edit
WIP patch that will possibly get you back on the road. DO NOT LAND.

Fixing this is a whole trip because the various IR and arm64 frameworks were
not really designed to accommodate it. Anyway, here is a WIP patch. It seems
to work for simple tests (in the patch) but is not fully tested. It will not
work if you run with `--sim-hints=fallback-llsc` or if the fallback LL/SC
implementation is auto-selected, based on your processor, at startup.

It applies against the head and also against a vanilla 3.18.1 tarball,
although I haven't tested it in the latter case.

If anyone wants to test it and let me know whether it works, that would be
appreciated. I will try to finish it up properly this coming week.
[valgrind] [Bug 434283] disInstr(arm64): unhandled instruction 0xC87FAB5F (LDAXP and STLXP)
https://bugs.kde.org/show_bug.cgi?id=434283

Julian Seward changed:
  CC: jsew...@acm.org added
  Status: REPORTED -> RESOLVED
  Resolution: --- -> DUPLICATE

--- Comment #1 from Julian Seward ---
I'm going to close this as a dup of bug 444399, where there is, now, a
preliminary fix available.

*** This bug has been marked as a duplicate of bug 444399 ***
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

Julian Seward changed:
  CC: a...@ao2.it added

--- Comment #6 from Julian Seward ---
*** Bug 434283 has been marked as a duplicate of this bug. ***
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

--- Comment #5 from Julian Seward ---
I looked into this a bit. It does indeed appear that LD{A}XP and ST{L}XP
exist in AArch64 8.0 but are not implemented in V. I am somewhat surprised
by this since I distinctly remember carefully making a list of all
instructions that needed to be implemented, when doing the initial AArch64
port, so I'm not sure how these got forgotten.

I will fix it, but it may not be an immediate fix. VEX's intermediate
representation has a way to represent doubleword CAS, but can only represent
single-word load-exclusive / store-check, so it will need to be extended
accordingly, and that may have some minor knock-on effect on other
architectures.

I would guess that the immediate cause of the failure is that LLVM 12 has
started generating these instructions. That would explain why rustc shows
the problem in comment 0 -- presumably that is rustc nightly -- and also why
clang++ 12 shows the problem in comment 3.
[valgrind] [Bug 444545] memchr behaves differently under valgrind --tool=massif
https://bugs.kde.org/show_bug.cgi?id=444545

--- Comment #10 from Julian Seward ---
That is all very strange. Q: for the failing case, does adding
`--alignment=32` make any difference? That changes the alignment produced by
`malloc`.
[valgrind] [Bug 444545] memchr behaves differently under valgrind --tool=massif
https://bugs.kde.org/show_bug.cgi?id=444545

--- Comment #6 from Julian Seward ---
> Model name: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

Hmm, nothing particularly exotic there.

> Sorry this is proving more troublesome than I expected.

It's OK; however, we can't fix it if we can't reproduce it. Can you maybe
re-check your testing setup?

TBH I seriously doubt that Massif or the V framework as a whole mishandles
memchr, since I've run huge apps (eg, Firefox) on Massif and I see no such
breakage. Plus there are multiple auto-test runs every night, on various F35
targets, and I'm not aware of such breakage in them.

That said, it may be that you've fallen across some obscure corner case that
is broken. But I can't imagine what that could be.
[valgrind] [Bug 444545] memchr behaves differently under valgrind --tool=massif
https://bugs.kde.org/show_bug.cgi?id=444545

Julian Seward changed:
  CC: jsew...@acm.org added

--- Comment #4 from Julian Seward ---
I can't reproduce this, running on Fedora 35, x86_64, using V from git and
the system gcc, 11.2.1. This is on an Intel i7-3615QM, in short an older
machine that supports up to avx but not avx2. Can you post the results of
/usr/bin/lscpu on your F35 install?
[valgrind] [Bug 444399] disInstr(arm64): unhandled instruction 0xC87F2D89
https://bugs.kde.org/show_bug.cgi?id=444399

--- Comment #2 from Julian Seward ---
I can't reproduce this, testing on Fedora 33 on Parallels Workstation
running on an M1 Mac Mini, with either rustc-1.55 or rustc-1.56.

I suspect this is some kind of hardware capabilities problem, in that
rustc-generated code is using instructions that V doesn't claim to support.
From irc I see that the hardware you used here was a "Graviton 2", which has
Neoverse N1 cores, and they support AArch64-v8.4. The M1 is AArch64-v8.6 I
think, although which of those extensions are available within Parallels I
don't know.

That said, I'm still surprised it fails for you, given that it doesn't fail
here, *and* given that V doesn't even fully support v8.2 of the instruction
set.

The output of /usr/bin/lscpu for F33-on-Parallels-on-M1 is below. Can you
show the output on the failing target?

Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           ARM
Model:               0
Stepping:            r0p0
BogoMIPS:            48.00
NUMA node0 CPU(s):   0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass
                                 disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid
       asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit
       uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
[valgrind] [Bug 433510] FreeBSD support, part 12
https://bugs.kde.org/show_bug.cgi?id=433510

--- Comment #6 from Julian Seward ---
(In reply to Mark Wielaard from comment #5)
> (In reply to Paul Floyd from comment #4)
> > > - TRACE_SYMTAB("rx_map: avma %#lx size %lu foff %ld\n",
> > > + TRACE_SYMTAB("rx_map: avma %#lx size %lu foff %lld\n",
> >
> > ^^^ generates warnings
> >
> > I think that both versions probably generate warnings on either 32bit
> > or 64bit systems, and that we should use a macro to choose l or ll.
>
> I don't think so. I believe the actual pointers and offsets are either
> 32/64bit already and match the lx/lu/ld directives.

As a general comment, the way I've handled these problems in the past is
simply to cast the relevant values to the widest possible type -- a 64-bit
signed or unsigned int (Long or ULong) -- and then used %lld, %llx, etc.
That makes the problem go away without need of macros or other complexity,
although you do have to be a bit careful about whether the cast requires
signed or unsigned widening.
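The casting approach described in the comment can be sketched as follows (a hypothetical helper, not the actual TRACE_SYMTAB call site; `Long`/`ULong` are valgrind's 64-bit typedefs, re-declared here to keep the sketch self-contained):

```c
#include <stdint.h>
#include <stdio.h>

typedef long long          Long;    /* always 64-bit signed */
typedef unsigned long long ULong;   /* always 64-bit unsigned */

/* Widen each value to 64 bits at the call site and use %llx/%llu/%lld,
   so the format string is correct on both 32- and 64-bit hosts without
   any macro selecting between l and ll. Note the pointer goes through
   uintptr_t (unsigned widening) while the offset is widened signed. */
int format_map_line(char* buf, size_t n,
                    void* avma, size_t size, long foff) {
    return snprintf(buf, n, "rx_map: avma %#llx size %llu foff %lld",
                    (ULong)(uintptr_t)avma, (ULong)size, (Long)foff);
}
```

The signed/unsigned caveat from the comment is visible here: casting a negative `foff` through `ULong` instead of `Long` would print a huge positive number on 32-bit hosts.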
[valgrind] [Bug 442168] Floating-point erroneous behavior with memcheck with regards to class (isfinite(), etc.)
https://bugs.kde.org/show_bug.cgi?id=442168

--- Comment #8 from Julian Seward ---
(In reply to Xavier Roche from comment #7)

I checked V's {V,}MOVSD implementation more, and still find no problem. I
think this is unrelated to those insns. If I had to guess, I'd say it
*might* be related to the use of

+ vcmpneq_oqsd .LCPI1_1(%rip), %xmm1, %xmm1

since the implementation is a bit shaky:

   // 0xC: this isn't really right because it returns all-1s when
   // either operand is a NaN, and it should return all-0s.
   case 0x4: XXX(False, False, Iop_CmpEQ32Fx4, True); break; // NEQ_UQ
   case 0xC: XXX(False, False, Iop_CmpEQ32Fx4, True); break; // NEQ_OQ

(from `findSSECmpOp`). But that's just a guess; without a way to reproduce
the failure, I think the chances of resolving this are pretty low.
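The two predicates conflated in the quoted snippet have different NaN behaviour, which is what the in-source comment is pointing at. Reference semantics per the Intel predicate definitions, written out as scalar C (a model for discussion, not valgrind's SIMD implementation):

```c
#include <math.h>
#include <stdbool.h>

/* NEQ_UQ (predicate 0x4): true if the operands are unordered (either
   is a NaN) OR they differ. A NaN input yields all-1s. */
bool cmp_neq_uq(double a, double b) {
    return isnan(a) || isnan(b) || a != b;
}

/* NEQ_OQ (predicate 0xC): true only if both operands are ordered AND
   they differ. A NaN input must yield all-0s -- this is the case the
   quoted code gets wrong by treating 0xC like 0x4. */
bool cmp_neq_oq(double a, double b) {
    return !isnan(a) && !isnan(b) && a != b;
}
```

So a program using `vcmpneq_oqsd` on a NaN would see true under the quoted implementation where the hardware gives false.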
[valgrind] [Bug 442168] Floating-point erroneous behavior with memcheck with regards to class (isfinite(), etc.)
https://bugs.kde.org/show_bug.cgi?id=442168

--- Comment #5 from Julian Seward ---
Looking at the Intel docs for MOVSD and VMOVSD, and comparing against what
guest_amd64_toIR.c implements, I don't see any error in the implementation.
Plus, both of those must be at least moderately commonly used, and so if
there was really an emulation bug with them we would probably have heard
about it long ago. So I find it a bit difficult to believe that this
MOVSD/VMOVSD difference is really the root cause of whatever it is that you
are seeing.
[valgrind] [Bug 442168] Floating-point erroneous behavior with memcheck with regards to class (isfinite(), etc.)
https://bugs.kde.org/show_bug.cgi?id=442168

--- Comment #4 from Julian Seward ---
(In reply to Xavier Roche from comment #2)
> The difference between the correctly executed code under valgrind and the
> faulty one:
> - movsd %xmm0, (%rsp)          # 8-byte Spill
> + vmovsd %xmm0, (%rsp)         # 8-byte Spill
> - movsd (%rsp), %xmm0          # 8-byte Reload
> + vmovsd (%rsp), %xmm0         # 8-byte Reload
> - movsd .LCPI6_0(%rip), %xmm0  # xmm0 = mem[0],zero
> + vmovsd .LCPI6_0(%rip), %xmm0 # xmm0 = mem[0],zero

Can you give some more information about why you think the change from movsd
to vmovsd causes the error? Also, which one gives correct execution and
which doesn't?
[valgrind] [Bug 432387] s390x: z15 instructions support
https://bugs.kde.org/show_bug.cgi?id=432387

--- Comment #5 from Julian Seward ---
(In reply to Andreas Arnez from comment #4)
> > + s390_insn_assert("vcdlg", m3 == 2 || m3 == 3);
> > (multiple times)
> > This will fail only on internal logic errors, correct? .. it can't fail
> > only because the insn can't be decoded, right?
>
> Actually, the latter. The s390_insn_assert macro passes a special kind of
> decoding failure (a "specification exception") up through the call chain
> if the asserted condition is not met.

So .. I am a bit unclear about this, but it doesn't matter. It's enough to
say that it should be impossible to cause the front end to assert by feeding
it invalid instructions. Instead, all invalid insns should cause a SIGILL to
be handed to the guest (or whatever is appropriate on s390). So long as the
above holds, then I'm happy.
[valgrind] [Bug 440893] False positive "uninitialzied value" error on memeory allocated with new[]
https://bugs.kde.org/show_bug.cgi?id=440893

--- Comment #3 from Julian Seward ---
> It's strange that in my case the data block is always filled with 0's
> (abort() is never called), I'm using gcc 8.5.0 and c++17.

The Linux kernel is obliged to zero out all memory pages it gives to a
process, since not doing so would be a massive security hole. That's quite
possibly what you're observing.

> Just for my information is there a reason why valgrind only complained
> about the first element of the array being uninitialized, and not the
> rest of the array?

Because all of those accesses will have the same stack trace, and Memcheck
will only show the first one. Use -v if you want per-stack-trace error
counts.
[valgrind] [Bug 432387] s390x: z15 instructions support
https://bugs.kde.org/show_bug.cgi?id=432387

Julian Seward changed:
  CC: jsew...@acm.org added

--- Comment #3 from Julian Seward ---
Nice work! The following look OK to land as-is:

Subject: [PATCH 01/13] s390x: Misc-insn-3, bitwise logical 3-way instructions
Subject: [PATCH 02/13] s390x: Misc-insn-3, "select" instructions
Subject: [PATCH 03/13] s390x: Misc-insn-3, new POPCNT variant
Subject: [PATCH 04/13] s390x: Misc-insn-3, MVCRL
Subject: [PATCH 06/13] s390x: Vec-enh-2, extend VSL, VSRA, and VSRL
Subject: [PATCH 09/13] s390x: Vec-enh-2, VSLD and VSRD
Subject: [PATCH 10/13] s390x: Vec-enh-2, VSTRS
Subject: [PATCH 11/13] s390x: Mark arch13 features as supported
Subject: [PATCH 12/13] s390x: Vec-enh-2, test cases
Subject: [PATCH 13/13] s390x: Wrap up misc-insn-3 and vec-enh-2 support

For patches 05/13 and 07/13, I have a couple of queries/comments, but
nothing serious. Provided those assertions won't fail and that there are no
build-time dependency problems for the test cases, it's OK to land these two
too.

Subject: [PATCH 05/13] s390x: Misc-insn-3, test case

none/tests/s390x/Makefile.am
@@ -19,7 +19,8 @@
 INSN_TESTS = clc clcle cvb cvd icm lpr tcxb lam_stam xc mvst add sub mul \
	spechelper-ltr spechelper-or \
	spechelper-icm-1 spechelper-icm-2 spechelper-tmll \
	spechelper-tm laa vector lsc2 ppno vector_string vector_integer \
-	vector_float add-z14 sub-z14 mul-z14 bic
+	vector_float add-z14 sub-z14 mul-z14 bic \
+	misc3

Is `misc3` safe to always-build? Or would it require gating on assembler
features, in the style of "if BUILD_DFP_TESTS .." just below? .. ah .. later
.. I see misc3.c doesn't require any assembler support, since it just does
`insn rrf,0x" #opcode ",%[out],%[a],%[b],0\n\t"`.

diff --git a/none/tests/s390x/misc3.vgtest b/none/tests/s390x/misc3.vgtest
new file mode 100644
index 0..d051a06bd
--- /dev/null
+++ b/none/tests/s390x/misc3.vgtest
@@ -0,0 +1 @@
+prog: misc3

Similarly, does this require gating on host hwcaps? Looking at your _toIR.c
implementation, maybe not, since that appears to be down-translating (I
mean, translating from the z15 dialect into IR which will be re-emitted as
instructions for some earlier zSeries CPU). I suppose this means that on
older platforms, it will be possible to compile this test, and run it on V,
but not to run it natively.

Subject: [PATCH 07/13] s390x: Vec-enh-2, extend VCDG, VCDLG, VCGD, and VCLGD

+ s390_insn_assert("vcdlg", m3 == 2 || m3 == 3);
(multiple times)

This will fail only on internal logic errors, correct? .. it can't fail only
because the insn can't be decoded, right?

Subject: [PATCH 08/13] s390x: Vec-enh-2, VLBR and friends

+++ b/VEX/priv/host_s390_isel.c
+ case Iop_Perm8x16:

Seeing use of Iop_Perm8x16 made me wonder if it had a definition that is
independent of the guest/host endianness .. and, no it doesn't.
libvex_ir.h says:

   for i in 0 .. 15 . result[i] = argL[ argR[i] ]

so the meaning of "[n]" is itself ambiguous. Anyway, no need to change
anything in your code.
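The libvex_ir.h definition quoted in the review can be written out directly as a loop; the endianness concern is about which end of the 128-bit value element "[0]" denotes, not about the loop itself. A small reference model (masking the index to 0..15 is an added safety assumption; the quoted definition does not say what happens for out-of-range indices):

```c
#include <stdint.h>

/* Iop_Perm8x16 per libvex_ir.h:
   for i in 0 .. 15 . result[i] = argL[ argR[i] ] */
void perm8x16(uint8_t result[16],
              const uint8_t argL[16], const uint8_t argR[16]) {
    for (int i = 0; i < 16; i++)
        result[i] = argL[argR[i] & 15];  /* index masked mod 16 */
}
```

For example, a control vector of 15,14,...,0 reverses the bytes of argL, and an identity control vector copies it unchanged.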
[valgrind] [Bug 432801] Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals
https://bugs.kde.org/show_bug.cgi?id=432801

--- Comment #44 from Julian Seward ---
I meant, unsigned < and <= scalar comparisons on undefined data. Duh.
[valgrind] [Bug 432801] Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals
https://bugs.kde.org/show_bug.cgi?id=432801

--- Comment #43 from Julian Seward ---
As a general note, I have been investigating unsigned < and <= comparisons
on scalar values for arm64, and expect to continue doing so as a background
task for the next few weeks. So, this is not forgotten. Ideally we can
arrive at a cheap and effective solution that works for all targets.
[valgrind] [Bug 440095] No support for "dc cvac" and "dc cvap" on Arm64
https://bugs.kde.org/show_bug.cgi?id=440095

--- Comment #1 from Julian Seward ---
Try copying the scheme used for 'dc cvau' in guest_arm64_toIR.c. That might
just work for the cases you care about too.
[valgrind] [Bug 438630] Adding zero variants of arm64 v8.2 FCMEQ, FCMGE, FCMGT, FCMLE and FCMLT instructions
https://bugs.kde.org/show_bug.cgi?id=438630

Julian Seward changed:
  CC: jsew...@acm.org added

--- Comment #2 from Julian Seward ---
(In reply to ahashmi from comment #1)
> Created attachment 139381 [details]
> Adds zero variants of arm64 v8.2 FCMEQ, FCMGE, FCMGT, FCMLE and FCMLT
> instructions

This seems fine; r+ to land. Has no assertions on the decode-fail paths,
which is as desired.
[valgrind] [Bug 438038] Addition of arm64 v8.2 FCCMP, FCCMPE, FCMEQ, FCMGE and FCMGT instructions
https://bugs.kde.org/show_bug.cgi?id=438038

Julian Seward changed:
  CC: jsew...@acm.org added

--- Comment #3 from Julian Seward ---
(In reply to ahashmi from comment #1)
> Created attachment 139231 [details]
> Adds arm64 v8.2 FCCMP, FCCMPE, FCMEQ, FCMGE and FCMGT instructions

Looks mostly fine. Comments below. r+ provided the SIGILL-vs-assert thing is
fixed.

In dis_AdvSIMD_fp_conditional_compare:

+ else if (ty == 3) {
+    if ((archinfo->hwcaps & VEX_HWCAPS_ARM64_FP16) == 0)
+       return False;
+    ity  = Ity_F16;
+    irop = Iop_CmpF16;
+ }
+ else
+    vassert(0); /* ty = 2 => illegal encoding */

No .. ensure SIGILL is raised (decode failure) for any undecodable insns.
Never assert.

+ setFlags_COPY(nzcv_28x0);
+ DIP("fccmp%s %s, %s, #%u, %s\n", isCMPE ? "e" : "",
+     nameQRegLO(nn, ity), nameQRegLO(mm, ity), nzcv, nameCC(cond));
+ return True;
  return False;

This is a strange sequence. The second return is unreachable. (Maybe I
misread?)

+ if (e->Iex.Binop.op == Iop_CmpF64 || e->Iex.Binop.op == Iop_CmpF32 ||
+     e->Iex.Binop.op == Iop_CmpF16) {
+    HReg (*iselExpr)(ISelEnv*, IRExpr*);
+    ARM64Instr* (*VCmp)(HReg, HReg);

As a nit, in such situations I'd prefer if you set these both to NULL so as
to protect against future snafus .. I know it's redundant as the code
stands. Viz:

   HReg (*iselExpr)(ISelEnv*, IRExpr*)  = NULL;
   ARM64Instr* (*VCmp)(HReg, HReg)      = NULL;
[valgrind] [Bug 436873] Addition of arm64 v8.2 vector FABD, FACGE, FACGT and FADD instructions
https://bugs.kde.org/show_bug.cgi?id=436873 --- Comment #4 from Julian Seward --- (In reply to ahashmi from comment #3) > The review fixes for the scalar variants patch > https://bugs.kde.org/show_bug.cgi?id=436411 (hwcaps-gating and > iselIntExpr_RIL_wrk issues) means this latest patch works. Tested with > regtest and combinations of --memcheck:leak-check=yes --tool=memcheck > --track-origins=yes. Ah, my favourite kind of patch to review -- mostly test cases :-) I didn't actually see any gating stuff in the three new frontend (guest_arm64_toIR.c) cases, but maybe it's done elsewhere? Anyways I'll take your word for it. So long as the relevant gating is done as per the scalar patch, then +1 to land. J -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 436411] Addition of arm64 v8.2 scalar FABD, FACGE, FACGT and FADD instructions
https://bugs.kde.org/show_bug.cgi?id=436411 --- Comment #5 from Julian Seward --- (In reply to ahashmi from comment #4) > Created attachment 138947 [details] > Adds arm64 v8.2 scalar FABD, FACGE, FACGT and FADD instructions That all looks fine; please land. Just before you do -- if you haven't already -- please though run the test cases once by hand on Memcheck, with --track-origins=yes, and check nothing breaks. This has been known to happen in the past, at least to me :-) -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 435665] PPC ISA 3.0 copy, paste, cpabort instructions are not supported
https://bugs.kde.org/show_bug.cgi?id=435665 --- Comment #14 from Julian Seward --- (In reply to Tulio Magno Quites Machado Filho from comment #13) > (In reply to Carl Love from comment #12) > Correct. The address is always mmap'ed. Ok. Then I think you're good to go as-is, apart from the ifdef thing. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 435665] PPC ISA 3.0 copy, paste, cpabort instructions are not supported
https://bugs.kde.org/show_bug.cgi?id=435665 --- Comment #11 from Julian Seward --- (In reply to Carl Love from comment #10) > Created attachment 138846 [details] > PPC add copy, paste, cpabort support > > Updated patch to add the Memcheck trickery to get Memcheck to detect > undefined bits in the copy paste buffer. I *think* this is OK now, with two caveats: +UInt copy_paste_abort_dirty_helper(UInt addr, UInt op) { This will not build on anything except a Power target. It needs to be ifdef'd in the usual way. Ok to land if you fix the above plus sanity check the following issue. Here's another clarifying comment; I think you should add something like this. It's about the "paste" effects annotations. If I understand the code correctly, for "paste" you will now have annotations Ifx_Write at (wherever the target address is) for 128 bytes and also the function takes two args `EA_base` and `operation` and it returns the uint32_t that you mention. How Memcheck will instrument that is: if there is any undefinedness in the inputs, then all of the outputs will be undefined. Hence, if EA_base or operation contain any undefined bits, then the return value is undefined and the specified 128-byte memory area is undefined after the call; else the return value is defined and the specified 128-byte memory area is defined after the call. The point is .. if I understand "paste" correctly, the destination address is not part of the "normal" program address space. So the Ifx_Write annotation in this case is irrelevant. Indeed, if the destination address is part of *some other* address space, then we actually need to omit that Ifx_Write annotation, because otherwise Memcheck will update its status bits for that address range in this program's address space, which would be wrong. So .. I hope that makes sense. -- You are receiving this mail because: You are watching all bug changes.
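The instrumentation rule described above -- any undefinedness in a dirty call's inputs makes every output undefined -- can be modelled in a few lines of C. This is a simplified sketch, not Memcheck's actual code; the function names are mine, though the all-or-nothing widening it models is what the Memcheck sources call a "pessimising cast".

```c
#include <assert.h>
#include <stdint.h>

/* V-bits convention: a 1 bit means "undefined".  If any input bit is
   undefined, every output bit is treated as undefined. */
static uint32_t pcast32(uint32_t vbits)
{
   return vbits != 0 ? 0xFFFFFFFFu : 0x00000000u;
}

/* V-bits of a dirty call's result, given the V-bits of its two args
   (here standing in for EA_base and operation). */
static uint32_t call_result_vbits(uint32_t v_ea_base, uint32_t v_operation)
{
   return pcast32(v_ea_base | v_operation);
}
```

So one undefined bit in either argument poisons the whole result, which is exactly why the 128-byte area's definedness tracks the inputs' definedness in the description above.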
[valgrind] [Bug 435665] PPC ISA 3.0 copy, paste, cpabort instructions are not supported
https://bugs.kde.org/show_bug.cgi?id=435665 --- Comment #9 from Julian Seward --- So .. I had an idea about how to get the annotations right. Problem is, it's a nasty hack (although simple), and I had hoped that the passage of a bit of time would cause me to think of something cleaner. But nothing came to mind, so here it is. I *think* it'll work, but it needs implementing and testing, to check that if you do pass undefined values to the accelerator, you do actually get a Memcheck error. The basic idea is for your helper function -- the one that copies 128 bytes [or whatever] to the accelerator -- to return a fake return value. At present it doesn't return anything. Specifically, you change the helper to return a UWord and that UWord should be (for example) zero. Then, add a fake use of the fake value, in a way that will cause Memcheck to test it, as follows. Furthermore, this fake use must not be something that IR optimisation can later remove. On the IR side (in guest_ppc_toIR.c), you allocate an IRTemp of type I64 (assuming 64-bit only operation) to receive the fake value, and set the IRDirty::tmp field to be that temp. Also for the IRDirty structure, you must set mFx/mAddr/mSize so as to declare the memory area it is reading. Immediately after emitting the IRDirty (in guest_ppc_toIR.c), place an IRStmt_Exit that uses the fake return value. Something very similar to this: stmt( IRStmt_Exit( binop(Iop_CmpNE64, fake_return_val_tmp, mkU64(0)), Ijk_SigTRAP, mode64 ? IRConst_U64(cia) : IRConst_U32((UInt)cia), OFFB_CIA )); where cia is the current instruction address (it is often passed around; see e.g. do_trap in guest_ppc_toIR.c). The effects of this trickery are as follows: (1) the above stmt() asks the IR to exit, asking Valgrind to hand the program a SIGTRAP at this point, if the fake return value is nonzero, however .. (2) .. that never happens, because your helper always returns zero.
(3) Memcheck will believe that any undefinedness in the area read by the helper will be propagated through to the fake return value, and will generate instrumentation to cause that to happen. (4) Memcheck will instrument the IRStmt_Exit to check the definedness computed by (3) and emit an error if the fake return value contains any undefined bits. Hence you get the warning we want. (5) We note that the IR optimisation passes do not know what value the helper call will return. Hence we are guaranteed that they can't optimise away the IRStmt_Exit and its associated check. Bad, huh?! I wish there were a cleaner way. I suggest you test it at least as follows: (a) check that, if the memory area is fully defined, no error is printed. (b) check that, if the memory area contains even one undefined bit, an error is printed. (c) check that, if the helper function does return nonzero, then you really do get SIGTRAP synthesised at the instruction. Not that this is important, but just as a kind of sanity check. Good luck! I am happy to look at patches, etc, as ever. -- You are receiving this mail because: You are watching all bug changes.
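The fake-return-value trick above can be sketched in plain C. This is only a model of the control flow the generated IR would have (the names are hypothetical; the real helper and IRStmt_Exit live in the PPC front end): the branch is never taken at run time, but because it tests the helper's result, Memcheck's instrumentation checks the result's definedness, and undefinedness in the 128 source bytes surfaces there.

```c
#include <stdint.h>
#include <string.h>

/* Model of the dirty helper: copies 128 bytes to the "accelerator"
   and returns a fake value that is always zero. */
static uint64_t copy_to_accel_helper(uint8_t *accel, const uint8_t *src)
{
   memcpy(accel, src, 128);
   return 0;   /* fake return value -- always zero in practice */
}

/* Model of the IR that follows the call: exit with SIGTRAP if the
   fake value is nonzero.  The exit is dead code dynamically, but the
   IR optimiser cannot prove that, so the definedness check survives. */
static int run_copy(uint8_t *accel, const uint8_t *src)
{
   uint64_t fake = copy_to_accel_helper(accel, src);
   if (fake != 0)
      return -1;  /* would be Ijk_SigTRAP in the real IR */
   return 0;
}
```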
[valgrind] [Bug 436873] Addition of arm64 v8.2 vector FABD, FACGE, FACGT and FADD instructions
https://bugs.kde.org/show_bug.cgi?id=436873 Julian Seward changed: What|Removed |Added CC||jsew...@acm.org --- Comment #2 from Julian Seward --- This is structurally very similar to the scalar-version patch at https://bugs.kde.org/show_bug.cgi?id=436411#c2 (which is good) and so the same comments for that patch (in comment 3) apply here, meaning, the concern about hwcaps-gating. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 436411] Addition of arm64 v8.2 scalar FABD, FACGE, FACGT and FADD instructions
https://bugs.kde.org/show_bug.cgi?id=436411 Julian Seward changed: What|Removed |Added CC||jsew...@acm.org --- Comment #3 from Julian Seward --- This is for the most part good; apart from the following I have no concerns. I see two issues, both pertaining to CPU feature-gating. The first is in the front end: + if (bitU == 1 && size == X11 && opcode == BITS5(0,0,0,1,0)) { + /* 1,11,00010 FABD h_h_h */ .. and the same for the other added instructions: these need to be feature-gated, so that they are not accepted on non-8.2 capable hosts. Consider what would happen if, with these changes, we tried to run a binary containing these insns on an 8.0 host. Then the front end will accept the insns, generate (eg Iop_Add16F), which the back end will translate back into an 8.2 insn, and we will take a SIGILL in the generated code. But that's not the correct behaviour: what is required is that the front end declines to decode the insn, and that in turn will cause the existing front end machinery to eventually deliver a SIGILL to the guest. The second area of concern is similar but in the back end, regarding the Iex_Const case for iselIntExpr_RIL_wrk. To be honest I'm not sure that this should have been changed at all -- unless the RIL field, with its division into N/R/S fields, has acquired a new interpretation, but not a new layout, in 8.2 instructions. Can you clarify: does the existing immediate-for-bitfield instruction-fragment (here called RIL, and in the ARM ARM described using the N/R/S triple) get a new interpretation in 8.2 instructions? If it doesn't, and is instead a new kind of immediate field for these new insns, then it seems to me that we'd need a new struct altogether, in similar style to, but different from, ARM64RI6/ARM64RIL/ARM64RIA/ARM64AMode.
Independent of all of that, there's still the issue that iselIntExpr_RIL_wrk can't be allowed to have different behaviour than it does now, without first testing that we're on an 8.2 host rather than an 8.0 host. In the backend, the top level iselSB_ARM64 passes around a `hwcaps_host`, and that should contain whatever gating info is needed. In the front end, disInstr_ARM64 similarly passes around an `archinfo` and so the `archinfo->hwcaps` field will contain whatever's needed for gating in the front end. + /* This is a temporary alert while work is done on adding v8.2 + * support. It will be removed and fixed when Ity_I16 immediate + * encodings are understood better as v8.2 work progresses. + */ + vpanic("iselIntExpr_RIL_wrk: TODO: unhandled Ity_I16 for Iex_Const"); Regardless of all of the above, that can't land .. it would potentially destabilise the back end. If you don't want to handle the case specially, then jump to the "default case" bit at the end of the function, which can handle anything. (The same scheme applies to many other isel-functions too). -- You are receiving this mail because: You are watching all bug changes.
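The gating pattern being asked for in both comments above can be sketched as follows. This is an illustrative standalone model, not the actual guest_arm64_toIR.c code, and the hwcap bit value is made up: the decoder declines (returns False) when the host lacks the feature, and the front-end machinery then delivers SIGILL to the guest, rather than accepting the insn and crashing later in generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bit -- not VEX's actual VEX_HWCAPS_* value. */
#define HWCAP_V8_2_FP16  (1u << 0)

/* Decode an (imaginary) 8.2 half-precision insn.  Returning false
   means "decode failure", which the caller turns into SIGILL for the
   guest -- the correct behaviour on an 8.0-only host. */
static bool decode_fp16_insn(uint32_t hwcaps)
{
   if ((hwcaps & HWCAP_V8_2_FP16) == 0)
      return false;   /* not gated in: decline, never assert */
   /* ... emit IR for the 8.2 instruction here ... */
   return true;
}
```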
[valgrind] [Bug 435665] PPC ISA 3.0 copy, paste, cpabort instructions are not supported
https://bugs.kde.org/show_bug.cgi?id=435665 --- Comment #7 from Julian Seward --- I looked at the PowerISA 3.0b documentation you linked to. Given that this is a copy from "normal" memory to an accelerator, I think you don't have the option to implement it precisely. I'd say the least worst is to implement it so that, for the copy, Memcheck will flag an error if any of the 128 bytes are undefined, and for the paste we make it look as if the 128 bytes are completely defined. That way, at least you'll know if you're sending undefined values to the accelerator. Exactly how to achieve this I am not sure. It will take some fiddling with the annotations on the dirty helper calls. I'll contemplate it more. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 435665] PPC ISA 3.0 copy, paste, cpabort instructions are not supported
https://bugs.kde.org/show_bug.cgi?id=435665 --- Comment #5 from Julian Seward --- I'm somewhat concerned, at a fundamental level .. if I understand correctly, these instructions allow one to copy memory from one place to another (so why bother? Why not just do normal loads/stores?) but -- at least as implemented -- there's no way to tell Memcheck or any other tool, that the values in the destination area are derived from values in the source area. So you'll wind up with definedness false positives or negatives as a result of using them. Can you explain the background to the insns, what they are used for, etc, so we can see if there's an implementation that fits better in the instrumentation framework? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434296] s390x: False-positive memcheck diagnostics from vector string instructions
https://bugs.kde.org/show_bug.cgi?id=434296 --- Comment #8 from Julian Seward --- (In reply to Andreas Arnez from comment #6) > OK, done. Unless there's anything else, I'm going to push with this change > then. Yes, pls land. Looks fine to me. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434296] s390x: False-positive memcheck diagnostics from vector string instructions
https://bugs.kde.org/show_bug.cgi?id=434296 --- Comment #5 from Julian Seward --- (In reply to Andreas Arnez from comment #4) > > the front end, set the returned DisResult::hint field to Dis_HintVerbose. > Ah, good point, will do. Is there a recommended threshold when to set > this flag? No .. it's an ad-hoc hack :-) but I would suggest you do it for all of the insns covered by this bug, since they look like they could each generate 20 or more IRStmts per guest insn. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 431157] PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2
https://bugs.kde.org/show_bug.cgi?id=431157 --- Comment #29 from Julian Seward --- (In reply to Carl Love from comment #27) > Created attachment 137063 [details] > patch to add support for the scv instruction This looks OK to me. > One issue with the patch is the SC_FLAG and SCV_FLAG defines. These > #defines were created to set a flag indicating which system call instruction > is to be used. Unfortunately I could not find an include file that is > visible in VEX as well as coregrind/m_syswrap and in > coregrind/pub_core_syscall.h. As a result, I had to add the defines for > these values in three different places with a comment pointing at the other > locations. Rather ugly. Wasn't sure if we wanted to create an include file > for these that would then be globally available?? Suggestions on how to > better was to make these defines more globally visible would be nice. Nothing better immediately springs to mind. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434840] PPC64 darn instruction not supported
https://bugs.kde.org/show_bug.cgi?id=434840 --- Comment #7 from Julian Seward --- (In reply to Carl Love from comment #6) > Created attachment 137204 [details] > patch to add darn instruction support +static Int dis_darn ( UInt prefix, UInt theInstr, + + IRExpr** args = mkIRExprVec_1( mkU32( L ) ); + + d = unsafeIRDirty_1_N ( + rD, + 0/*regparms*/, + "darn_dirty_helper", + fnptr_to_fnentry( vbi, _dirty_helper ), + args ); This wants a comment saying that there are no effects on memory or guest state, since it is unusual for a dirty helper to have no such effects. Otherwise looks fine to me. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 435665] PPC ISA 3.0 copy, paste, cpabort instructions are not supported
https://bugs.kde.org/show_bug.cgi?id=435665 --- Comment #2 from Julian Seward --- (In reply to Carl Love from comment #1) > Created attachment 137535 [details] > PPC add copy, paste, cpabort support + d = unsafeIRDirty_1_N ( + cr0, + 0/*regparms*/, + "copy_paste_abort_dirty_helper", + fnptr_to_fnentry( vbi, _paste_abort_dirty_helper ), + args ); + stmt( IRStmt_Dirty(d) ); Doesn't this need some statement of effects? (mem/regs read/written) Without that Memcheck won't know what effect the instruction has. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434296] s390x: False-positive memcheck diagnostics from vector string instructions
https://bugs.kde.org/show_bug.cgi?id=434296 Julian Seward changed: What|Removed |Added CC||jsew...@acm.org --- Comment #3 from Julian Seward --- From an implementation view this all looks fine. I have only one minor query: - /* Check for specification exception */ - vassert(m3 < 3); - vassert((m5 & 0b1110) == 0); + s390_insn_assert("vistr", m3 < 3 && m5 == (m5 & 1)); Is it possible for any incoming instruction to cause this assertion to fail? If so that should be reported as SIGILL, and not cause an assertion failure. From the code-expansion point of view, I am somewhat concerned that this may cause failures in VEX due to generating too many instructions in the back end, etc. But I can't think of any obvious improvement and it's clear you've spent time thinking about this too. The one thing I would suggest is that, for the instructions in question, in the front end, set the returned DisResult::hint field to Dis_HintVerbose. This tells the top level block-disassembly logic that the instruction just translated to IR is "verbose" and so will cause it to stop pulling new insns into the block sooner rather than later. For examples look for dres->hint = Dis_HintVerbose; in guest_amd64_toIR.c. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 433863] s390x: memcheck/tests/s390x/{cds,cs,csg} failures
https://bugs.kde.org/show_bug.cgi?id=433863 Julian Seward changed: What|Removed |Added CC||jsew...@acm.org --- Comment #4 from Julian Seward --- (In reply to Andreas Arnez from comment #3) > (In reply to Andreas Arnez from comment #2) > > Created attachment 137982 [details] > > Remove memcheck test cases for cs, cds, and csg > Any objections/comments? Otherwise I'm planning to apply the patch early > next week. This seems reasonable to me. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 433801] PPC ISA 3.1 support is missing, part 10
https://bugs.kde.org/show_bug.cgi?id=433801 --- Comment #8 from Julian Seward --- (In reply to Carl Love from comment #4) > Created attachment 136294 [details] > functional tests for Reduced-Precision: Missing Integer-based Outer > Product Operations OK to land. Again, please ensure any new files end up in the tarball. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 433801] PPC ISA 3.1 support is missing, part 10
https://bugs.kde.org/show_bug.cgi?id=433801 --- Comment #7 from Julian Seward --- (In reply to Carl Love from comment #2) > Created attachment 136292 [details] > functional tests for Reduced-Precision - bfloat16 Outer Product & Format > Conversion Operations OK to land. Please make sure though that all the new files get included in the tarball. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 433801] PPC ISA 3.1 support is missing, part 10
https://bugs.kde.org/show_bug.cgi?id=433801 --- Comment #6 from Julian Seward --- (In reply to Carl Love from comment #3) > Created attachment 136293 [details] > PPC64: Reduced-Precision: Missing Integer-based Outer Product Operations -static UInt exts8( UInt src) +static ULong exts8( UInt src) -static UInt extz8( UInt src) +static ULong extz8( UInt src) Mark these as 'static'. Otherwise, OK to land. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 433801] PPC ISA 3.1 support is missing, part 10
https://bugs.kde.org/show_bug.cgi?id=433801 --- Comment #5 from Julian Seward --- (In reply to Carl Love from comment #1) > Created attachment 136291 [details] > Reduced-Precision - bfloat16 Outer Product & Format Conversion Operations +static Float conv_bf16_to_float( UInt input ) +{ .. + output is 64-bit float. + bias +127, exponent 8-bits, fraction 22-bits Is this comment correct? 1 sign bit + 8 exponent bits + 22 mantissa bits looks much more like a 32-bit float than a 64-bit float. -- Is there an inconsistency in naming these functions? It appears that in some places, a 32-bit float is called `_float` in the name, but in others it is called `_f32`. Eg. +static Float conv_bf16_to_float( UInt input ) vs +static UInt conv_f32_to_bf16( UInt input ) Can you either fix the inconsistencies (if they exist) and/or also add a comment at the top explaining the naming? --- +ULong convert_from_f32tobf16_helper( ULong src ) { In this file, either mark functions as 'static' or add a comment saying they are called from generated code. -- You are receiving this mail because: You are watching all bug changes.
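For reference on the format question raised above: bfloat16 is exactly the top 16 bits of an IEEE binary32 -- 1 sign bit, 8 exponent bits (bias 127), 7 fraction bits -- which is why "64-bit float" with a "22-bit fraction" in the quoted comment looks wrong. A minimal sketch of the two conversions (my own function names, not the patch's; the narrowing here truncates, whereas real hardware typically rounds to nearest-even):

```c
#include <stdint.h>
#include <string.h>

/* Widen bf16 -> f32: place the 16 bits in the top half of a binary32. */
static float bf16_to_f32(uint16_t bf)
{
   uint32_t bits = (uint32_t)bf << 16;
   float f;
   memcpy(&f, &bits, sizeof f);   /* bit-level reinterpretation */
   return f;
}

/* Narrow f32 -> bf16 by truncation (drops the low 16 fraction bits). */
static uint16_t f32_to_bf16_trunc(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);
   return (uint16_t)(bits >> 16);
}
```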
[valgrind] [Bug 429375] PPC ISA 3.1 support is missing, part 9
https://bugs.kde.org/show_bug.cgi?id=429375 --- Comment #5 from Julian Seward --- Looks OK to land now. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434840] PPC64 darn instruction not supported
https://bugs.kde.org/show_bug.cgi?id=434840 --- Comment #5 from Julian Seward --- (In reply to Carl Love from comment #4) > Created attachment 136999 [details] > patch to add darn instruction support +ULong darn_dirty_helper ( UInt L ) +{ + ... +# else + val = 0xULL; /* error */ +# endif Given that you initialise `val` to the error value outside of any ifdef, this #else clause is redundant. You could remove it. +static Int dis_darn ( UInt prefix, UInt theInstr, + const VexAbiInfo* vbi ) .. + if (L == 3) + /* Hardware reports illegal instruction if L = 3. */ + vpanic("ERROR: darn instruction L=3 is not valid\n"); Generate SIGILL for an unrecognised instruction, don't assert. Please always do this -- I've asked multiple times before. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434840] PPC64 darn instruction not supported
https://bugs.kde.org/show_bug.cgi?id=434840 --- Comment #2 from Julian Seward --- (In reply to Carl Love from comment #1) Handing it off to the hardware via a helper function is the right thing to do. But you can't use a clean helper for this; that breaks the rules for clean helpers. A clean helper * must produce a value which depends only on its arguments * given the same argument, always produces the same result. and the IR optimiser relies (or may rely) on the above being true. You need to use a dirty helper. Also I'm not happy about the replacement C implementations; I'm not sure when they would ever get used. I'd prefer if they were removed. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 401416] Compile failure with openmpi 4.0
https://bugs.kde.org/show_bug.cgi?id=401416 Julian Seward changed: What|Removed |Added Resolution|--- |FIXED Status|CONFIRMED |RESOLVED --- Comment #8 from Julian Seward --- Fixed, 3415e1e1acc5095607663071db299f961bd65bde. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 401416] Compile failure with openmpi 4.0
https://bugs.kde.org/show_bug.cgi?id=401416 --- Comment #5 from Julian Seward --- (In reply to Mark Wielaard from comment #4) > Is there a reason to remove them completely instead of keep using the if > defined (...) constructs? I had wondered that too. I'm trying the if-defined scheme right now. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434193] GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialised value" report
https://bugs.kde.org/show_bug.cgi?id=434193 Julian Seward changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Julian Seward --- Committed, 21571252964c4d7860210cbe9f60a49eb6881824. Thanks for the patch (and analysis). -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432801] Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals
https://bugs.kde.org/show_bug.cgi?id=432801 --- Comment #22 from Julian Seward --- (In reply to Eyal from comment #21) > Will anyone consider the patch? This (problem) is definitely something I want to deal with, and your patch seems like a good starting place. However, 3.17 is imminent and I would prefer to defer till after 3.17 is done. In part because I'd like to do more general qualifying runs against gcc-11 and clang-11 compiled code. The patch might need to be generalised, so as to make it handle different vector widths, lane widths and even maybe the scalar cases. Regarding the frequency of false positives, this is the first time I have heard of this problem. By comparison, we have many bug reports that relate to (eg) clang/gcc doing equality comparisons on partially defined data. I'd accept also though, that if this is a new optimisation in clang, then we might get more such reports in the future. But right now it doesn't seem critical enough to put into 3.17. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434193] GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialised value" report
https://bugs.kde.org/show_bug.cgi?id=434193 --- Comment #6 from Julian Seward --- Here's my proposed fix. It works for your test case on x86_64 but I didn't test it on arm64. Does it work for you? diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c index 516988bdd..759f00b3a 100644 --- a/memcheck/mc_translate.c +++ b/memcheck/mc_translate.c @@ -8590,12 +8590,14 @@ IRSB* MC_(instrument) ( VgCallbackClosure* closure, # elif defined(VGA_amd64) mce.dlbo.dl_Add32 = DLexpensive; mce.dlbo.dl_Add64 = DLauto; + mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive; mce.dlbo.dl_CmpEQ32_CmpNE32 = DLexpensive; mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; # elif defined(VGA_ppc64le) // Needed by (at least) set_AV_CR6() in the front end. mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; # elif defined(VGA_arm64) + mce.dlbo.dl_CmpEQ16_CmpNE16 = DLexpensive; mce.dlbo.dl_CmpEQ64_CmpNE64 = DLexpensive; # endif -- You are receiving this mail because: You are watching all bug changes.
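The cheap-vs-expensive distinction the patch toggles can be modelled in a few lines of C. This is a simplified sketch of the idea, not Memcheck's actual instrumentation; the key point is that for an equality comparison, a single *defined* bit that differs between the operands already determines the answer ("not equal"), so the result is defined even if other bits are undefined.

```c
#include <stdint.h>

/* V-bits convention: a 1 bit means "undefined". */

/* Cheap scheme: any undefined input bit makes the result undefined. */
static int naive_eq_defined(uint16_t va, uint16_t vb)
{
   return (va | vb) == 0;
}

/* Expensive (DLexpensive) scheme for CmpEQ16/CmpNE16: if the operands
   differ in a bit that is defined in both, the result is defined. */
static int expensive_eq_defined(uint16_t a, uint16_t va,
                                uint16_t b, uint16_t vb)
{
   uint16_t both_defined = (uint16_t)~(va | vb);
   uint16_t differ       = (uint16_t)(a ^ b);
   if (differ & both_defined)
      return 1;                 /* certainly unequal: result defined */
   return (va | vb) == 0;       /* otherwise need fully defined inputs */
}
```

This is why the scheme suppresses the inlined-strcmp false positive: the comparison that terminates the loop differs on defined bytes, so Memcheck no longer complains about the undefined tail.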
[valgrind] [Bug 434193] GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialised value" report
https://bugs.kde.org/show_bug.cgi?id=434193 --- Comment #5 from Julian Seward --- (In reply to Mike Crowe from comment #4) > It ain't half slow on anything more than toy test cases though. :) That was a general test to establish whether lack of precise instrumentation was indeed the problem. Now that we've established that, I can enable precise instrumentation just for the case CmpEQ16/NE16 on those two targets, which shouldn't be much of a per hit, unless your program spends all its time doing 16-bit equality comparisons. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 434193] GCC 9+ inlined strcmp causes "Conditional jump or move depends on uninitialised value" report
https://bugs.kde.org/show_bug.cgi?id=434193 --- Comment #3 from Julian Seward --- Does running with --expensive-definedness-checks=yes make any difference? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 430429] valgrind.h doesn't compile on s390x with clang
https://bugs.kde.org/show_bug.cgi?id=430429 --- Comment #6 from Julian Seward --- (In reply to Paul Floyd from comment #5) > Quick confirmation for amd64 Fedora. No problems with clang 11 or clang 13 Thanks. (general comment, not suggesting any action right now): Looking at valgrind.h now, I am thinking that it is a mistake for us to use `unsigned long int` as the basic machine-word type for 64-bit targets. Perhaps we should change those en masse to `unsigned long long int`, so as to avoid any ambiguity in the type-sizes. TBH I am not sure why it uses `unsigned long int` at all. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 431157] PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2
https://bugs.kde.org/show_bug.cgi?id=431157 --- Comment #25 from Julian Seward --- (In reply to Tom Hughes from comment #24) > Indeed if that is -2 that would be errno 2 aka ENOENT which would make sense. Agreed. So then it would seem to me that the problem is that the patch doesn't update getSyscallStatusFromGuestState. How this works is: the assembly code in syscall-ppc64{le,be}-linux.S hands the syscall to the host, and when it returns, it copies the result back into the guest state: /* put the result back in the threadstate */ 3: std 3,OFFSET_ppc64_GPR3(30) /* gst->GPR3 = sc result */ /* copy cr0.so back to simulated state */ mfcr 5 /* r5 = CR */ rlwinm 5,5,4,31,31 /* r5 = (CR >> 28) & 1 */ stb 5,OFFSET_ppc64_CR0_0(30) /* gst->CR0.SO = cr0.so */ So far so good. No problem there. The problem occurs slightly later, when the main driver logic in syswrap-main.c calls getSyscallStatusFromGuestState to find out whether the syscall failed or not, in a target-independent way. And I am guessing this needs to be updated too. That currently computes the result as canonical->sres = VG_(mk_SysRes_ppc64_linux)( gst->guest_GPR3, cr0so ); meaning, it is using the legacy (R3 + CR0.SO) scheme to figure out whether the call failed, regardless of which scheme (sc or scv) the guest originally requested. Hence also, the syscall appears to fail or succeed depending on what junk the kernel left for us in CR0.SO, hence the apparently random failures that have been observed. Fixing it would be pretty simple if getSyscallStatusFromGuestState can find out for sure which syscall scheme was requested. That is probably safest done by passing it sci->orig_args.
Another complication (sigh) is that this only covers the cases where we run the syscall async: if (sci->flags & SfMayBlock) { /* Syscall may block, so run it asynchronously */ vki_sigset_t mask; But the sync case will also need fixing: } else { /* run the syscall directly */ and hence it looks like convert_SysRes_to_SyscallStatus will need fixing too. -- You are receiving this mail because: You are watching all bug changes.
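The target-independent decision being described could be sketched like this. This is a hypothetical helper, not the actual getSyscallStatusFromGuestState code: `sc` signals failure via CR0.SO with the positive errno in R3, while `scv` ignores CR0.SO and returns a negated errno, with failures occupying the range -4095..-1 as on other Linux ABIs (comment 23 above recalls the range as -1 to -4096; the kernel's MAX_ERRNO convention is 4095).

```c
#include <stdbool.h>
#include <stdint.h>

enum SyscallScheme { SCHEME_SC, SCHEME_SCV };

/* Decide failure according to the scheme the guest actually used,
   which is the fix being proposed: pass the scheme down rather than
   always applying the legacy R3 + CR0.SO interpretation. */
static bool syscall_failed(enum SyscallScheme scheme,
                           uint64_t r3, bool cr0_so)
{
   if (scheme == SCHEME_SC)
      return cr0_so;                    /* legacy: flag in CR0.SO */
   /* scv: failure iff the result lies in [-4095, -1] */
   return (int64_t)r3 < 0 && (int64_t)r3 >= -4095;
}
```

Under the legacy-only interpretation, an `scv` result of -2 (ENOENT) paired with a stale CR0.SO of zero looks like a "successful" open returning a negative fd, which matches the observed symptoms.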
[valgrind] [Bug 431157] PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2
https://bugs.kde.org/show_bug.cgi?id=431157 --- Comment #23 from Julian Seward --- (In reply to Tom Hughes from comment #22) > So if thinks the open was a success with a negative file descriptor returned? Ah, maybe it's trying to tell us something :-) I vaguely remember reading that this new syscall convention denotes failure by returning a value between -1 and -4096 [bizarrely, like x86-32 in the earliest days of Linux]. If that's true, it would imply that the call was passed to the kernel using the new convention, it failed, but we mistakenly believe, in the post-syscall handling, that it succeeded. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 430429] valgrind.h doesn't compile on s390x with clang
https://bugs.kde.org/show_bug.cgi?id=430429 --- Comment #4 from Julian Seward --- (In reply to Andreas Arnez from comment #2) @Andreas: if the patch works on s390, I think you should land it. > @Julian: Isn't the patch necessary for other 64-bit platforms as well then? > Such as ppc64le or amd64? Well, we have no reports of such problems on either ppc64le or amd64. Though looking at the code, I fundamentally agree with you .. I don't see why s390 should have a problem here but not the others. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 431157] PPC_FEATURE2_SCV needs to be masked in AT_HWCAP2
https://bugs.kde.org/show_bug.cgi?id=431157 --- Comment #20 from Julian Seward --- Looking at the end of the log there: SYSCALL[279836,1](291) sys_newfstatat ( 24, 0x4a73980(), 0x4ad2578 )[sync] --> Failure(0x9) SYSCALL[279836,1](90) sys_mmap ( 0x0, 0, 1, 2, 24, 0 ) --> [pre-fail] Failure(0x16) ==279836== ==279836== Process terminating with default action of signal 11 (SIGSEGV): dumping core The mmap call is marked "[pre-fail]", which means that V never passed it to the kernel. Instead it decided to "fail" it itself, for the reason 0x16, whatever that is. Might be worth investigating. Also I wonder if the failure of the immediately preceding sys_newfstatat syscall is related. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432552] [AArch64] invalid error emitted for pre-decremented byte and half-word addresses
https://bugs.kde.org/show_bug.cgi?id=432552 Julian Seward changed: What|Removed |Added Resolution|--- |FIXED Status|REPORTED|RESOLVED --- Comment #2 from Julian Seward --- Fixed, 2ab8ab38a372ab1f10094588dc5ad8f299c13ca6. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432552] [AArch64] invalid error emitted for pre-decremented byte and half-word addresses
https://bugs.kde.org/show_bug.cgi?id=432552 Julian Seward changed: What|Removed |Added Summary|[AArch64] invalid error |[AArch64] invalid error |emitted for pre-incremented |emitted for pre-decremented |byte and half-word |byte and half-word |addresses |addresses -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8
https://bugs.kde.org/show_bug.cgi?id=429354 --- Comment #6 from Julian Seward --- Thanks for the fixes. r+; please land. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432552] [AArch64] invalid error emitted for pre-incremented byte and half-word addresses
https://bugs.kde.org/show_bug.cgi?id=432552 --- Comment #1 from Julian Seward --- I suspect this can be fixed by finding the following at or near line 4967 of guest_arm64_toIR.c Bool earlyWBack = wBack && simm9 < 0 && (szB == 8 || szB == 4) && how == BITS2(1,1) && nn == 31 && !isLoad; and changing it to also accept szB == 2 and szB == 1. Given the existing logic, it's not surprising that you observe .. This error is emitted for byte and half-word stores. The error does not happen for word and double-word stores. I haven't tested this proposed fix yet though. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432801] Valgrind 3.16.1 reports a jump based on uninitialized memory somehow related to clang and signals
https://bugs.kde.org/show_bug.cgi?id=432801 --- Comment #17 from Julian Seward --- Interesting analysis, and a plausible patch; thank you for that. This seems like a new trick from LLVM. I'm still struggling to understand what's going on, though. I can see that for (size_t i = 0; i < plen; ++i) hp += pattern[i]; could be vectorised as you say, so that it loads 4 bytes at a time, and uses punpcklbw twice to interleave them as described in comment 12. But: * where's the addition instruction that merges the lanes together? I don't see that. * what is the purpose of the pcmpgtd instruction? The original sources contain a scalar comparison against zero if (hp==j) { j++; } Is that related? If so, how does a scalar 32-bit equality test against zero get translated into a vector 32x4 signed-greater-than operation? --- In the patch, there's mention of biasing: + // From here on out, we're dealing with biased integers instead of 2's + // complement. What does that mean, in this context? Regarding the test: * you put it in memcheck/tests/x86; "x86" here means 32-bit only. Is that what you intended? I would have expected it to go in the "amd64" directory. * because the test is written in C, whether or not it tests what you expect it to test depends entirely on the compiler used to compile it. And most likely, it won't be vectorised, or won't be vectorised in the same way. This kind of test really needs to be written in assembly (inline assembly) so we know what we're testing. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8
https://bugs.kde.org/show_bug.cgi?id=429354 --- Comment #4 from Julian Seward --- return a & 0xFF, I meant. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8
https://bugs.kde.org/show_bug.cgi?id=429354 --- Comment #3 from Julian Seward --- (In reply to Julian Seward from comment #2) >uint16_t getMSBs_8x16(vec128) >{ > let hiHalf = vec128[127:64] // LE numbering > let loHalf = vec128[ 63:0] > // In each byte lane, copy the MSB to all bit positions > hiHalf = shift_right_signed_8x8(hiHalf, 7); > loHalf = shift_right_signed_8x8(loHalf, 7); > // Now each byte lane is either 0x00 or 0xFF > // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain > // either 0x00 or 0x40, etc > hiHalf &= 0x8040201008040201; > loHalf &= 0x8040201008040201; > hi8msbs = add_across_lanes_8x8(hiHalf) > lo8msbs = add_across_lanes_8x8(loHalf) > return (hi8msbs << 8) | lo8msbs; >} One more thought, regarding add_across_lanes_8x8(). In fact you can do this semi-reasonably using standard 64-bit scalar code, because of the nature of the values involved. Specifically, we are adding together 8 bytes, each of which is either zero or it has a 1 bit in a different location. Hence there will never be any carry-bit propagation at all in the addition, and so it can be implemented -- for this particular use case only -- as uint64_t add_across_lanes_8x8(uint64_t a) { a += (a >> 8); a += (a >> 16); a += (a >> 32); return a; } (I *think*) -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429375] PPC ISA 3.1 support is missing, part 9
https://bugs.kde.org/show_bug.cgi?id=429375 --- Comment #3 from Julian Seward --- == https://bugs.kde.org/attachment.cgi?id=133479 test cases for VSX-PCV generate operations OK to land == https://bugs.kde.org/attachment.cgi?id=134757 VSX- PCV Generate Operation functional support Some concerns here. /*---*/ /*--- Misc integer helpers. ---*/ /*---*/ +void write_VSX_entry (VexGuestPPC64State* gst, UInt reg_offset, + ULong *vsx_entry); Mark this as static or give it a comment that it is called from generated code (one or the other) Also it seems like this function prototype is in a .c file, not a header. Seems not right. +/*--*/ +/* VSX Vector Generate PCV from Mask helpers ---*/ +/*--*/ +void write_VSX_entry (VexGuestPPC64State* gst, UInt reg_offset, + ULong *vsx_entry) comment or 'static' pls +void vector_gen_pvc_byte_mask_dirty_helper( VexGuestPPC64State* gst, +ULong src_hi, ULong src_lo, +UInt reg_offset, UInt imm ) { and here + } else { + vex_printf("ERROR, vector_gen_pvc_byte_mask_dirty_helper, imm value %u not supported.\n", + imm); + } After printing, do vassert(0); so the system stops, rather than continues with incorrect data. + return; +} the return is redundant +void vector_gen_pvc_hword_mask_dirty_helper( VexGuestPPC64State* gst, + ULong src_hi, ULong src_lo, + UInt reg_offset, UInt imm ) { comment or 'static' + } else { + vex_printf("ERROR, vector_gen_pvc_hword_dirty_mask_helper, imm value %u not supported.\n", + imm); + } and assert, and rm redundant return +void vector_gen_pvc_word_mask_dirty_helper( VexGuestPPC64State* gst, +ULong src_hi, ULong src_lo, +UInt reg_offset, UInt imm ) { same comments +void vector_gen_pvc_dword_mask_dirty_helper( VexGuestPPC64State* gst, + ULong src_hi, ULong src_lo, + UInt reg_offset, UInt imm ) { same comments + IREffect VSX_fx = Ifx_Write; Inline this at its single use point, since it is never changed. 
+ IRExpr** args = mkIRExprVec_5( + IRExpr_GSPTR(), + mkexpr( src_hi ), + mkexpr( src_lo ), + mkU32( reg_offset ), + mkU64( IMM ) ); + + + No need for three blank lines here. + d->nFxState = 1; + vex_bzero(&d->fxState, sizeof(d->fxState)); + d->fxState[0].fx = VSX_fx; + d->fxState[0].size = sizeof(U128); + d->fxState[0].offset = reg_offset; You're sure this declaration of effects is correct, right? The called helper really only writes one of the vector registers and doesn't read any? + default: + vex_printf("dis_vector_generate_pvc_from_mask (opc2)\n"); + return False; + } Don't print in failure paths; just return False. == -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429354] PPC ISA 3.1 support is missing, part 8
https://bugs.kde.org/show_bug.cgi?id=429354 --- Comment #2 from Julian Seward --- == https://bugs.kde.org/attachment.cgi?id=133462 Test cases for VSX Mask Manipulation operations OK to land == https://bugs.kde.org/attachment.cgi?id=133461 Functional support for ISA 3.1, VSX Mask manipulation operations I have some concerns about the verboseness/inefficency of the generated IR. +static IRExpr * copy_MSB_bit_fields ( IRExpr *src, UInt size ) +{ + /* Input src is a V128 value. Input size is the number of bits in each +* vector field. The function copies the MSB of each vector field into +* the low order bits of the 64-bit result. +*/ You can do it like this, but it looks very expensive. Note that there is an IROp which (if I understand this correctly) does what you want: /* MISC CONVERSION -- get high bits of each byte lane, a la x86/amd64 pmovmskb */ Iop_GetMSBs8x16, /* V128 -> I16 */ You could use that instead (and I'd encourage you to do so), although it does mean you'd have to handle this in the POWER backend, obviously. Although now I think about it, I am not sure what the deal is with endianness -- Iop_GetMSBs8x16 has been used so far only on little-endian targets. If Power has an instruction that can add all the lanes of a SIMD value together, creating a single number, then there's a faster way to do this, that doesn't involve Iop_GetMSBs8x16. 
Something like this: uint16_t getMSBs_8x16(vec128) { let hiHalf = vec128[127:64] // LE numbering let loHalf = vec128[ 63:0] // In each byte lane, copy the MSB to all bit positions hiHalf = shift_right_signed_8x8(hiHalf, 7); loHalf = shift_right_signed_8x8(loHalf, 7); // Now each byte lane is either 0x00 or 0xFF // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain // either 0x00 or 0x40, etc hiHalf &= 0x8040201008040201; loHalf &= 0x8040201008040201; hi8msbs = add_across_lanes_8x8(hiHalf) lo8msbs = add_across_lanes_8x8(loHalf) return (hi8msbs << 8) | lo8msbs; } There are variants, but you get the general idea from the above. + if (IFIELD(theInstr, 1, 5) == 0xA)//mtvsrbmi + inst_select = 0x; Add a comment to explain that this is a special-case hack for mtvsrbmi. + default: + vex_printf("dis_VSR_byte_mask (opc2)\n"); + return False; + } Don't print on failure paths; only return False. + for(i = 0; i< max; i++) { + ones_lo[i+1] = newTemp( Ity_I64 ); + test_lo[i] = newTemp( Ity_I64 ); + ones_hi[i+1] = newTemp( Ity_I64 ); + test_hi[i] = newTemp( Ity_I64 ); .. This seems really inefficient; is there no way to do this somewhat in parallel using SIMD IROps ? == -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429352] PPC ISA 3.1 support is missing, part 7
https://bugs.kde.org/show_bug.cgi?id=429352 --- Comment #10 from Julian Seward --- == https://bugs.kde.org/attachment.cgi?id=133459 Test cases for new IOps OK to land == https://bugs.kde.org/attachment.cgi?id=135083 Fix pcast for existing code OK to land == https://bugs.kde.org/attachment.cgi?id=135176 Add ACC support into routine get_otrack_shadow_offset_wrk() OK to land == https://bugs.kde.org/attachment.cgi?id=135082 Functional support for ISA 3.1 New Iops OK to land -- I don't see any significant problems with it -- but first please fix the following comments. +UInt generate_DFP_FPRF_value_helper( UInt gfield, If this is intended to be called from VEX-generated code, add a comment to that effect, and add an extern qualifier. If it isn't, add a static qualifier. (== follow the existing conventions) +PPCInstr* PPCInstr_AvTernaryInt128 ( PPCAvOp op, HReg dst, + HReg src1, HReg src2, HReg src3 ) { + PPCInstr* i = LibVEX_Alloc_inline(sizeof(PPCInstr)); + i->tag = Pin_AvTrinaryInt128; + i->Pin.AvTrinaryInt128.op = op; There's still an inconsistency in the naming: Ternary vs Trinary. I'd prefer you use Ternary consistently, everywhere. @@ -2321,6 +2428,21 @@ void ppPPCInstr ( const PPCInstr* i, Bool mode64 ) ppHRegPPC(i->Pin.Dfp128Cmp.dst); vex_printf(",8,28,31"); return; + + case Pin_XFormUnary994: + if (Px_DFPTOIQS) { This isn't right. It needs to be if (i->Pin.XFormUnary994.op == Px_DFPTOIQS) or something like that. + case Pin_XFormUnary994: { + int inst_sel; Regarding "int", please use the project-defined types, always: Int, Char, UChar etc, not "int", "char" Also, here, "inst_sel" is local to each of the two case-clauses, so move it into each. + case Iop_D128toI128S: { + HReg srcHi = newVRegF(env); + HReg srcLo = newVRegF(env); .. + iselDfp128Expr( &srcHi, &srcLo, env, e->Iex.Binop.arg2, IEndianess ); Isn't it the case that iselDfp128Expr always writes its first two arguments (meaning, it selects the output virtual registers itself) ? 
If so, then it's pointless to initialise srcHi/srcLo by calling newVRegF, because those vregs will be ignored. Better to initialise them with HReg_INVALID (or is it INVALID_HREG). Similar comment in some other places further down the file. + // store the two 64-bit pars pairs + HReg rHi, rLo; + HReg dst = newVRegV(env); + + iselInt128Expr(&rHi, &rLo, env, e->Iex.Unop.arg, IEndianess); For example here: now there's no initialisation at all. Initialise rHi/rLo to HREG_INVALID here. @@ -314,8 +316,10 @@ void ppIROp ( IROp op ) case Iop_F128LOtoF64: vex_printf("F128LOtoF64"); return; case Iop_I32StoF128: vex_printf("I32StoF128"); return; case Iop_I64StoF128: vex_printf("I64StoF128"); return; + case Iop_I128StoF128: vex_printf("128StoF128"); return; In the output text, the initial "I" is missing @@ -4753,15 +4769,19 @@ IRAtom* expr2vbits_Binop ( MCEnv* mce, uifu = mkUifU1; difd = mkDifD1; and_or_ty = Ity_I1; improve = mkImproveOR1; goto do_And_Or; - do_And_Or: + do_And_Or: Please restore the original indentation for this label. == -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432510] RFE: ENOMEM fault injection mode
https://bugs.kde.org/show_bug.cgi?id=432510 --- Comment #2 from Julian Seward --- (In reply to Frank Ch. Eigler from comment #0) Before we get into discussing how to implement stuff, I think it's important to have a more precise statement of what functionality is required. Can you give more details on what errors should be triggered, how they should be reported, and when they should be triggered? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432161] Addition of arm64 v8.2 FABS, FNEG and FSQRT instructions
https://bugs.kde.org/show_bug.cgi?id=432161 --- Comment #6 from Julian Seward --- (In reply to ahashmi from comment #5) Looks good to me. Land! -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 432161] Addition of arm64 v8.2 FABS, FNEG and FSQRT instructions
https://bugs.kde.org/show_bug.cgi?id=432161 --- Comment #3 from Julian Seward --- Nicely done! I did spot one bug though: @@ -3752,6 +3775,8 @@ IRAtom* expr2vbits_Binop ( MCEnv* mce, case Iop_I32StoF32x4: case Iop_F32toI32Sx4: + case Iop_Sqrt16Fx8: + return unary16Fx8_w_rm(mce, vatom1, vatom2); case Iop_Sqrt32Fx4: return unary32Fx4_w_rm(mce, vatom1, vatom2); case Iop_Sqrt64Fx2: This isn't quite right, in that the two added lines change the behaviour for the fallthrough cases immediately above. I mean: consider what happens for Iop_I32StoF32x4 and Iop_F32toI32Sx4 with and without the added lines. Otherwise it's fine. One final question: did you check that the relevant test programs run OK (without any assertion failures) on memcheck, by running them by hand, with --tool=memcheck and also --tool=memcheck --track-origins=yes ? If not, could you please do that and make sure nothing breaks. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429352] PPC ISA 3.1 support is missing, part 7
https://bugs.kde.org/show_bug.cgi?id=429352 --- Comment #5 from Julian Seward --- (In reply to Carl Love from comment #1) > Created attachment 133459 [details] > Test cases for new IOps > > The test cases This seems OK to me. But per comments on https://bugs.kde.org/attachment.cgi?id=133458, you need to ensure that absolutely all of the Power instruction tests run successfully with --tool=memcheck and --tool=memcheck --track-origins=yes. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429352] PPC ISA 3.1 support is missing, part 7
https://bugs.kde.org/show_bug.cgi?id=429352 --- Comment #4 from Julian Seward --- (In reply to Carl Love from comment #0) > Created attachment 133458 [details] > Functional support for ISA 3.1, New Iops > > Power PC missing ISA 3.1 support. Additional patches, part 7 It seems to me that the mc_translate.c sections of the patch aren't right, which could cause Memcheck to assert when running programs containing the new instructions. -- diff --git a/memcheck/mc_translate.c b/memcheck/mc_translate.c @@ -3410,8 +3412,7 @@ IRAtom* expr2vbits_Triop ( MCEnv* mce, unary64Fx2_w_rm(mce, vatom1, vatom2), unary64Fx2_w_rm(mce, vatom1, vatom3))); - - default: + default: Nit: return the `default:` to its correct indentation -- + // CARLL not sure about this yet. Remove this pls -- - do_And_Or: + /* Int 128-bit Integer two arg */ + // CARLL not sure about this yet. + case Iop_DivU128: + case Iop_DivS128: + case Iop_DivU128E: + case Iop_DivS128E: + case Iop_ModU128: + case Iop_ModS128: + uifu = mkUifUV128; difd = mkDifDV128; + and_or_ty = Ity_V128; improve = mkImproveORV128; goto do_And_Or; + This strikes me as wrong. `do_And_Or` is only intended to instrument AND and OR operations. But here, you're using it to instrument Div/Mod operations, which isn't correct. -- @@ -5001,15 +5024,18 @@ IRExpr* expr2vbits_Unop ( MCEnv* mce, IROp op, IRAtom* atom ) return assignNew('V', mce, Ity_I64, unop(Iop_128HIto64, vatom)); case Iop_F128LOtoF64: /* F128 -> low half of F128 */ case Iop_D128LOtoD64: /* D128 -> low half of D128 */ + case Iop_TruncF128toI128S: /* F128 -> I128S */ + case Iop_TruncF128toI128U: /* F128 -> I128U */ return assignNew('V', mce, Ity_I64, unop(Iop_128to64, vatom)); This also doesn't seem correct to me, in the sense that it simply ignores the definedness of the most significant 64 bits of the input, and, I am guessing, will create general incorrectly-typed IR. Has it been tested? 
These cases TruncF128toI128S and TruncF128toI128U do not belong with F128LOtoF64 and D128LOtoD64. I suggest you add them instead to the group that includes NegF128, AbsF128 and RndF128. -- But even that (present code) doesn't seem correct: case Iop_NegF128: case Iop_AbsF128: case Iop_RndF128: case Iop_TruncF128toI64S: /* F128 -> I64S */ case Iop_TruncF128toI32S: /* F128 -> I32S (result stored in 64-bits) */ case Iop_TruncF128toI64U: /* F128 -> I64U */ case Iop_TruncF128toI32U: /* F128 -> I32U (result stored in 64-bits) */ return mkPCastTo(mce, Ity_I128, vatom); NegF128/AbsF128/RndF128 are correct, because their shadow value will be an I128 type and hence it is correct to do ` mkPCastTo(mce, Ity_I128, vatom)`. But TruncF128toI64S and Iop_TruncF128toI64U need to be PCast-ed to an I64 value -- so they belong somewhere else in this file, not here, and TruncF128toI32S and TruncF128toI32U need to be PCast-ed to an I32 value, again, somewhere else. Similarly these 6 case Iop_ReinterpF64asI64: case Iop_ReinterpI64asF64: case Iop_ReinterpI32asF32: case Iop_ReinterpF32asI32: case Iop_ReinterpI64asD64: case Iop_ReinterpD64asI64: are probably not correct; they need to be PCasted to I64/I32 appropriately. -- Some smaller things, in other places in the patch: diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c +static void generate_store_DFP_FPRF_value( ULong irType, IRExpr *src, + const VexAbiInfo* vbi ) ... + } else { + vex_printf("ERROR generate_store_DFP_FPRF_value, unknown value for irType 0x%lx\n", (unsigned long ) irType); + } Uhh .. this isn't good. Please remove it, and either assert/panic here if reaching here indicates a bug in this front end, or cause this to generate a SIGILL (insn-not-decoded) in the normal way. -- + } + break; } Indent the `break` correctly, so it's easier to see what scope it's relevant to. 
-- diff --git a/VEX/priv/host_ppc_defs.c b/VEX/priv/host_ppc_defs.c +PPCInstr* PPCInstr_AvTrinaryInt128 ( PPCAvOp op, HReg dst, + HReg src1, HReg src2, HReg src3 ) { I believe the normal terminology for a 3-way thing is "ternary", not "trinary" (I've never heard the latter word before, AFAIR) -- @@ -4902,13 +5075,27 @@ Int emit_PPCInstr ( /*MB_MOD*/Bool* is_profInc, case Pfp_IDUTOQ: // xscvudqp p = mkFormVXR0( p, 63, fr_dst, 2, fr_src, 836, 0, endness_host ); break; + case Pfp_IQSTOQ: // xscvsqqp +// vex_printf("CARLL, issue xscvsqqp instruction. If not on P10, I will crash now on illegal inst.\n"); + p = mkFormVXR0( p,
[valgrind] [Bug 431556] Complete arm64 FADDP v8.2 instruction support started in 413547
https://bugs.kde.org/show_bug.cgi?id=431556 Julian Seward changed: What|Removed |Added Resolution|--- |FIXED Status|REPORTED|RESOLVED --- Comment #6 from Julian Seward --- Landed, 8fa9e36f7c269563feebbf41a7a658bc7879ad39. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 431556] Complete arm64 FADDP v8.2 instruction support started in 413547
https://bugs.kde.org/show_bug.cgi?id=431556 Julian Seward changed: What|Removed |Added CC||jsew...@acm.org --- Comment #3 from Julian Seward --- (In reply to ahashmi from comment #2) > Created attachment 134852 [details] > Patch completes addition of arm64 v8.2 faddp instructions. Looks good to me; a couple of small queries, but basically is landable. --- The cases are distinguished as follows: isD == True, bitQ == 1 => 2d isD == False, bitQ == 1 => 4s isD == False, bitQ == 0 => 2s + isH == True, bitQ == 0 => 4h + isH == False, bitQ == 1 => 8h Is this comment out of date? The function it applies to takes an ARM64VecESize now, not isH / isD. --- + if (1) test_faddp_4h_00_00_00(TyH); The tests where the three register numbers are the same .. are they of any value? In particular, they won't expose mixups where the wrong register number is used in decode. Those cases are covered by the _N_N+1_N+2 variants afaics. Is there some other reason to keep the N_N_N variants? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413547] regression test does not check for Arm 64 features
https://bugs.kde.org/show_bug.cgi?id=413547 Julian Seward changed: What|Removed |Added Status|REPORTED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Julian Seward --- Landed, 3b1710d38cf19619242c9113a2dbe291e914a8c2. Thank you for the work on this! -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 414268] Enable AArch64 feature detection and decoding for v8.x instructions (where x>0)
https://bugs.kde.org/show_bug.cgi?id=414268 Julian Seward changed: What|Removed |Added Status|CONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #18 from Julian Seward --- Landed, cb52fee5ddbc2c0e936fd1efe5107a1afcf375cf. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413547] regression test does not check for Arm 64 features
https://bugs.kde.org/show_bug.cgi?id=413547 --- Comment #7 from Julian Seward --- You can get to a "patch is obsolete" check box by following the twisty maze attached to the "Details" link on the "Attachments" box. Anyways, review comments. This looks fine, on the whole. OK to land providing the apparently-bogus faddp implementation issue is resolved. Most of the code in this patch is, I assume, boilerplate test-support functions taken from other tests in none/tests/arm64. + /* Temporary clause to test hwcaps functionality. */ + if (bitU == 0 && sz <= X00 && opcode == BITS5(0,1,1,0,1)) { + /* 0,00,01101 ADDP h_2h */ + if ((archinfo->hwcaps & VEX_HWCAPS_ARM64_FP16) == 0) + return False; + DIP("faddp h%u, v%u.2h\n", dd, nn); + vex_printf("Unimplemented v8.2 instruction scalar FADDP: 0x%08x\n", insn); + return True; + } Should this be present? It certainly looks like it shouldn't be; if it accepts the instruction then it gives no implementation for it, which is definitely bogus. +static Double randDouble ( void ) +{ + ensure_special_values_initted(); + UChar c = randUChar(); + if (c >= 128) { + // return a normal number most of the time. "most" is somewhat misleading; "half" might be more accurate. Ditto in randFloat. > tests/arm64_features.c Just as a general sanity check .. is the way that the capabilities are established (by looking at auxv HWCAPS) consistent with the outcome of the discussion in bug 414268 ? I'm not looking for any change here. Instead I just want to check that there's an overall top-level coherent story about how capabilities in V are done for AArch64. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413547] regression test does not check for Arm 64 features
https://bugs.kde.org/show_bug.cgi?id=413547 Julian Seward changed: What|Removed |Added Attachment #133667|0 |1 is obsolete|| -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 413547] regression test does not check for Arm 64 features
https://bugs.kde.org/show_bug.cgi?id=413547 --- Comment #5 from Julian Seward --- Assad, am I right to understand that the comment 3 patch has been made redundant by the comment 4 patch? -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 414268] Enable AArch64 feature detection and decoding for v8.x instructions (where x>0)
https://bugs.kde.org/show_bug.cgi?id=414268 --- Comment #14 from Julian Seward --- Thank you all for the high quality discussion and analysis. This looks fine to me; please land. One tiny nit .. please #undef get_cpu_ftr and get_ftr just before the closing brace of the fn that defines them. I agree with Peter's and Mark's view that this should be feature-based; but I see the difficulties in making that work cleanly because (unlike on x86) there's no "single source of truth" (viz, CPUID) that we can fake up for the guest, at our own convenience. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 429864] s390x: C++ atomic test_and_set yields false-positive memcheck diagnostics
https://bugs.kde.org/show_bug.cgi?id=429864 --- Comment #1 from Julian Seward --- Yes. There's a special Iop_CasCmpEQ{32,64} or some such, designed specifically to deal with this problem. See libvex_ir.h for details. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 428716] VEX/useful/smchash.c:39:16: error: Memory leak: gb [memleak]
https://bugs.kde.org/show_bug.cgi?id=428716 Julian Seward changed: What|Removed |Added CC||jsew...@acm.org --- Comment #6 from Julian Seward --- I don't think this is worth bothering with. It's not part of the build. It's just a program I hacked up to collect stats used to develop the hash functions that you see at the bottom of VEX/priv/guest_generic_bb_to_IR.c. -- You are receiving this mail because: You are watching all bug changes.
[valgrind] [Bug 427404] PPC ISA 3.1 support is missing, part 6
https://bugs.kde.org/show_bug.cgi?id=427404 --- Comment #11 from Julian Seward --- OK to land provided that setup_fxstate_struct Read/Write/Modify annotations are set more precisely, so as to avoid false positives and false negatives from Memcheck. -- You are receiving this mail because: You are watching all bug changes.