[valgrind] [Bug 484742] unhandled instruction 0x4E9096B7

2024-03-30 Thread Joost VandeVondele
https://bugs.kde.org/show_bug.cgi?id=484742

--- Comment #2 from Joost VandeVondele  ---
OK, this seems to be related to the +dotprod part of the ISA (which we detect
based on the asimddp flag). The unhandled instruction probably comes from either
  acc = vdotq_s32(acc, a0, b0);
or
  output[i] = vaddvq_s32(sum);
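
The reported word can be checked against the SDOT (vector) encoding, which is what `vdotq_s32` normally compiles to. This is a minimal Python sketch, assuming the bit layout `0 Q 0 01110 size 0 Rm 100101 Rn Rd` with `size == 10` from the Arm Architecture Reference Manual (quoted from memory, so worth double-checking). The name `decode_sdot_vector` is made up for illustration and is not part of valgrind.

```python
# Assumed fixed bits of SDOT (vector): bit 31 = 0, U (bit 29) = 0,
# bits 28-24 = 01110, size (bits 23-22) = 10, bit 21 = 0, bits 15-10 = 100101.
SDOT_MASK = 0xBFE0FC00
SDOT_VALUE = 0x0E809400


def decode_sdot_vector(word):
    """Return assembly text if `word` matches SDOT (vector), else None."""
    if word & SDOT_MASK != SDOT_VALUE:
        return None
    q = (word >> 30) & 1        # 0: 64-bit vectors, 1: 128-bit vectors
    rm = (word >> 16) & 0x1F    # second source register
    rn = (word >> 5) & 0x1F     # first source register
    rd = word & 0x1F            # accumulator / destination register
    acc, src = ("4s", "16b") if q else ("2s", "8b")
    return f"sdot v{rd}.{acc}, v{rn}.{src}, v{rm}.{src}"


print(decode_sdot_vector(0x4E9096B7))  # sdot v23.4s, v21.16b, v16.16b
```

Under that assumed layout the word is a dot-product instruction, which points at `vdotq_s32` rather than `vaddvq_s32`.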

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 484742] unhandled instruction 0x4E9096B7

2024-03-30 Thread Joost VandeVondele
https://bugs.kde.org/show_bug.cgi?id=484742

--- Comment #1 from Joost VandeVondele  ---
Also reproduces on a Raspberry Pi 5: 

Raspberry Pi 5

$ cat /proc/cpuinfo 
processor   : 0
BogoMIPS: 108.00
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x4
CPU part: 0xd0b
CPU revision: 1
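
For reference, the asimddp check on a `Features` line like the one above can be sketched as follows. This is only an illustration in Python, not the detection code actually used in our build; the helper names `cpu_features` and `has_dotprod` are hypothetical, and the sample text copies the feature list shown above onto a single line.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags from the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def has_dotprod(cpuinfo_text):
    """True if the CPU advertises the dot-product extension (asimddp)."""
    return "asimddp" in cpu_features(cpuinfo_text)


# Sample text taken from the Raspberry Pi 5 output above.
SAMPLE = (
    "processor   : 0\n"
    "Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp "
    "asimdhp cpuid asimdrdm lrcpc dcpop asimddp\n"
)

print(has_dotprod(SAMPLE))
```

On a real system the text would come from reading `/proc/cpuinfo`.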


[valgrind] [Bug 484742] New: unhandled instruction 0x4E9096B7

2024-03-30 Thread Joost VandeVondele
https://bugs.kde.org/show_bug.cgi?id=484742

            Bug ID: 484742
           Summary: unhandled instruction 0x4E9096B7
    Classification: Developer tools
           Product: valgrind
           Version: 3.22.0
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: jsew...@acm.org
          Reporter: joost.vandevond...@gmail.com
  Target Milestone: ---

On NVIDIA's Grace CPU, valgrind fails to run a binary (that otherwise runs fine)
because it cannot handle an instruction:

```
==45347== Memcheck, a memory error detector
==45347== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==45347== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==45347== Command: ./stockfish bench
==45347== 
Stockfish dev-20240329-ec598b38 by the Stockfish developers (see AUTHORS file)

Position: 1/48 (rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1)
info string NNUE evaluation using nn-1ceb1ade0001.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
disInstr(arm64): unhandled instruction 0x4E9096B7
disInstr(arm64): 0100'1110 1001'0000 1001'0110 1011'0111
==45347== valgrind: Unrecognised instruction at address 0x40f684.
==45347==    at 0x40F684:
Stockfish::Eval::NNUE::Network, Stockfish::Eval::NNUE::FeatureTransformer<2560u,
&Stockfish::StateInfo::accumulatorBig> >::evaluate(Stockfish::Position const&,
bool, int*, bool) const [clone .constprop.0] (in
/users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x40E667:
Stockfish::Eval::evaluate(Stockfish::Eval::NNUE::Networks const&,
Stockfish::Position const&, int) (in /users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x42A3F7: Stockfish::Search::Worker::iterative_deepening() (in
/users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x4280E7: Stockfish::Search::Worker::start_searching() (in
/users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x4210EB: Stockfish::Thread::idle_loop() (in
/users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x42103F: Stockfish::NativeThread::NativeThread(void (Stockfish::Thread::*&&)(),
Stockfish::Thread*&&)::{lambda(void*)#1}::_FUN(void*) (in
/users/vjoost/Stockfish/src/stockfish)
==45347==    by 0x507875B: start_thread (in /lib64/libpthread-2.31.so)
==45347==    by 0x54BFEEB: thread_start (in /lib64/libc-2.31.so)
```
Linux OS.

The program is compiled using `gcc version 12.3.0`, with target:
`-march=armv8.2-a+dotprod`. 
/proc/cpuinfo gives:
```
Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve
asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull
svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd4f
CPU revision: 0
```


[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2016-10-19 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #22 from Joost VandeVondele  ---
In my case, the issue has disappeared, and the 'only' thing that changed is that
the server has been updated and is now running Red Hat Enterprise Linux Server
release 7.2, which includes, for example, a newer kernel
(3.10.0-327.13.1.el7.x86_64). Valgrind, gcc, etc. are still the same versions.
So I suspect that some interaction with the OS was causing this.



[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2016-01-25 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #14 from Joost VandeVondele  ---
Also no luck with --sanity-level=4.

The fact that it cannot be reproduced on demand certainly does not make this
any easier. I wonder whether something external to valgrind could be
triggering it.



[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2016-01-22 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #12 from Joost VandeVondele  ---
Since the error keeps recurring, I have now tried self-hosting. Running:

/data/vjoost/test/outer/install/bin/valgrind --sim-hints=enable-outer
--trace-children=yes --smc-check=all-non-file --run-libc-freeres=no
--tool=memcheck -v /data/vjoost/test/inner/install/bin/valgrind
--suppressions=/data/vjoost/toolchain-r16494/install/valgrind.supp
--max-stackframe=2168152 --error-exitcode=42 --vgdb-prefix=./inner
--core-redzone-size=1000 --tool=memcheck -v
/data/schuetto/auto_regtesting/regtests/cp2k/exe/local_valgrind/cp2k.sdbg
ethanol_both_rcut10.0_e1-1_v1-4_RSR.inp

(i.e. self-hosting with an enlarged redzone, on our executable from a failed
run, with its arguments and parameters), I get a seemingly correct run.
The output will be attached as out.innerouter.2. It may be worthwhile for an
expert to look at it.
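
The structure of that command line (outer valgrind, its options, then the inner valgrind with its own options, then the client) can be sketched as below. This is a hedged Python illustration only: the option strings are the ones quoted above, while the function name `self_hosting_argv` and the short paths are placeholders, not the real locations.

```python
import shlex

# Options passed to the outer valgrind, as quoted in the command above.
OUTER_OPTS = [
    "--sim-hints=enable-outer",
    "--trace-children=yes",
    "--smc-check=all-non-file",
    "--run-libc-freeres=no",
    "--tool=memcheck",
    "-v",
]


def self_hosting_argv(outer, inner, inner_opts, client_argv):
    """Build the argv: outer valgrind runs inner valgrind, which runs the client."""
    return [outer, *OUTER_OPTS, inner, *inner_opts, *client_argv]


argv = self_hosting_argv(
    outer="outer/install/bin/valgrind",   # placeholder path
    inner="inner/install/bin/valgrind",   # placeholder path
    inner_opts=[
        "--error-exitcode=42",
        "--vgdb-prefix=./inner",
        "--core-redzone-size=1000",
        "--tool=memcheck",
        "-v",
    ],
    client_argv=["./cp2k.sdbg", "input.inp"],  # placeholder client and input
)

# Print a shell-quoted version for inspection; running it needs both builds.
print(shlex.join(argv))
```

Keeping the two option lists separate makes it harder to pass an inner-only option such as --core-redzone-size to the outer instance by mistake.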

However, after noticing a warning about stack switching in that output, I added
--max-stackframe=68009224472 (as suggested, though it seems a bit large ;-), and
that led to a run with a different error (Memcheck: the 'impossible' happened:
create_MC_Chunk: shadow area is accessible).



[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2016-01-22 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #11 from Joost VandeVondele  ---
Created attachment 96783
  --> https://bugs.kde.org/attachment.cgi?id=96783&action=edit
self hosting output 3



[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2016-01-22 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #10 from Joost VandeVondele  ---
Created attachment 96782
  --> https://bugs.kde.org/attachment.cgi?id=96782&action=edit
self-hosting output 2



[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2015-12-20 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #8 from Joost VandeVondele  ---
The failures are observed on RedHatEnterpriseServer Release:6.7.

Over the weekend, I ran valgrind on essentially just the startup of the
relevant binary (having it print its version number).
With >20 runs (10s each) I had no failure. This was on a different machine,
running Red Hat 7.2.

I'll try something similar on the other machine, but the failure does not seem
easy to trigger.
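
How many clean runs a reproduction attempt needs before it says anything can be estimated from the failure rate reported elsewhere in this thread (a few thousand test cases per night, one failure roughly every 10 days). The Python sketch below is back-of-the-envelope arithmetic under loud assumptions: 3000 runs per night stands in for "a few thousand", failures are treated as independent per run, and the function name is made up.

```python
import math

RUNS_PER_NIGHT = 3000        # assumption: stands in for "a few thousand"
DAYS_BETWEEN_FAILURES = 10   # from the thread: roughly one failure per 10 days

# Estimated probability that a single run hits the assertion.
p_fail = 1.0 / (RUNS_PER_NIGHT * DAYS_BETWEEN_FAILURES)


def runs_for_detection(p, confidence=0.95):
    """Number of independent runs needed to see >= 1 failure with `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


def p_all_clean(p, runs):
    """Probability that `runs` independent runs all pass."""
    return (1.0 - p) ** runs


print(f"per-run failure probability ~ {p_fail:.2e}")
print(f"runs for a 95% chance of one failure: {runs_for_detection(p_fail)}")
print(f"chance that 1000 runs are all clean: {p_all_clean(p_fail, 1000):.3f}")
```

Under these assumptions, a reproduction attempt needs on the order of tens of thousands of runs before a clean result means much.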

Dynamic libraries in my case are few, and standard I suppose:

> ldd /data/vjoost/clean/cp2k/cp2k/exe/local_valgrind/cp2k.sdbg 
linux-vdso.so.1 =>  (0x7ffe09f0d000)
libstdc++.so.6 => /data/vjoost/toolchain-r16447/install/lib64/libstdc++.so.6 (0x7f7d5a89)
libgfortran.so.3 => /data/vjoost/toolchain-r16447/install/lib64/libgfortran.so.3 (0x7f7d5a56f000)
libm.so.6 => /lib64/libm.so.6 (0x00323320)
libgcc_s.so.1 => /data/vjoost/toolchain-r16447/install/lib64/libgcc_s.so.1 (0x7f7d5a33c000)
libquadmath.so.0 => /data/vjoost/toolchain-r16447/install/lib64/libquadmath.so.0 (0x7f7d5a0fd000)
libc.so.6 => /lib64/libc.so.6 (0x003232e0)
/lib64/ld-linux-x86-64.so.2 (0x003232a0)

There are many more static libraries involved, and all are compiled with debug
info. The binary is also large (~142Mb).



[valgrind] [Bug 356457] valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2015-12-17 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

--- Comment #3 from Joost VandeVondele  ---
This just happened again, but it is really rare (this is a 12-core server
running valgrind about 12 h a day, and the failure seems to occur roughly every
10 days). Are any of the suggestions mentioned above possible without runtime
overhead and excessive I/O?

==25277== Memcheck, a memory error detector
==25277== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==25277== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==25277== Command:
/data/schuetto/auto_regtesting/regtests/cp2k/exe/local_valgrind/cp2k.pdbg
Pa.inp
==25277== 
blockSane: fail -- redzone-hi

valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)'
failed.



[valgrind] [Bug 356457] New: valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)' failed.

2015-12-09 Thread Joost VandeVondele via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356457

            Bug ID: 356457
           Summary: valgrind: m_mallocfree.c:2042 (vgPlain_arena_free):
                    Assertion 'blockSane(a, b)' failed.
           Product: valgrind
           Version: unspecified
          Platform: Compiled Sources
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: jsew...@acm.org
          Reporter: joost.vandevond...@mat.ethz.ch

Our nightly tester runs a few thousand test cases, and usually all goes fine.
However, some rare condition can apparently trigger the following assertion:

==10071== Memcheck, a memory error detector
==10071== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==10071== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==10071== Command:
/data/schuetto/auto_regtesting/regtests/cp2k/exe/local_valgrind/cp2k.sdbg
O-B97-q6.inp
==10071== 
blockSane: fail -- redzone-hi

valgrind: m_mallocfree.c:2042 (vgPlain_arena_free): Assertion 'blockSane(a, b)'
failed.

host stacktrace:
==10071==    at 0x38083F48: show_sched_status_wrk (m_libcassert.c:343)
==10071==    by 0x38084064: report_and_quit (m_libcassert.c:415)
==10071==    by 0x380841F1: vgPlain_assert_fail (m_libcassert.c:481)
==10071==    by 0x380925E6: vgPlain_arena_free (m_mallocfree.c:2042)
==10071==    by 0x3811B51E: vgModuleLocal_img_done (image.c:778)
==10071==    by 0x380BAFF0: vgModuleLocal_read_elf_debug_info (readelf.c:3027)
==10071==    by 0x380B343A: di_notify_ACHIEVE_ACCEPT_STATE (debuginfo.c:749)
==10071==    by 0x380B343A: vgPlain_di_notify_mmap (debuginfo.c:1067)
==10071==    by 0x380D963D: vgModuleLocal_generic_PRE_sys_mmap
(syswrap-generic.c:2367)
==10071==    by 0x3810D6A1: vgSysWrap_amd64_linux_sys_mmap_before
(syswrap-amd64-linux.c:637)
==10071==    by 0x380D60A4: vgPlain_client_syscall (syswrap-main.c:1905)
==10071==    by 0x380D2B9A: handle_syscall (scheduler.c:1118)
==10071==    by 0x380D424E: vgPlain_scheduler (scheduler.c:1435)
==10071==    by 0x380E37B6: thread_wrapper (syswrap-linux.c:102)
==10071==    by 0x380E37B6: run_a_thread_NORETURN (syswrap-linux.c:155)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 10071)
==10071==    at 0x3232A1761A: mmap (in /lib64/ld-2.12.so)
==10071==    by 0x3232A076B9: _dl_map_object_from_fd (in /lib64/ld-2.12.so)
==10071==    by 0x3232A08399: _dl_map_object (in /lib64/ld-2.12.so)
==10071==    by 0x3232A0C3A1: openaux (in /lib64/ld-2.12.so)
==10071==    by 0x3232A0E285: _dl_catch_error (in /lib64/ld-2.12.so)
==10071==    by 0x3232A0CA84: _dl_map_object_deps (in /lib64/ld-2.12.so)
==10071==    by 0x3232A0330F: dl_main (in /lib64/ld-2.12.so)
==10071==    by 0x3232A160AD: _dl_sysdep_start (in /lib64/ld-2.12.so)
==10071==    by 0x3232A014A3: _dl_start (in /lib64/ld-2.12.so)
==10071==    by 0x3232A00B07: ??? (in /lib64/ld-2.12.so)
==10071==    by 0x1: ???
==10071==    by 0xFFF000A32: ???
==10071==    by 0xFFF000A7C: ???


Note: see also the FAQ in the source distribution.

I have attempted to reproduce this (running exactly the same commands and
binary, etc.), but the run did not fail. The error appears to happen before our
executable is really running, so maybe something goes wrong at startup?

Reproducible: Couldn't Reproduce
