gdb issues?

2023-10-10 Thread Havard Eidnes
Hi,

I have recently had a bear of a time getting the new rust which
landed in pkgsrc-wip the other day to build natively on several
of the targets we support for NetBSD.

The problem is that the "bootstrap" program (a rust executable)
lands on its nose with a SIGSEGV, and dumps core (without leaving
a discernible error message in the build log, so I had to ktrace
to find *that* out, argh!)

However, it appears that gdb has problems dealing with the
combination of the executable and the core file.  I see similar
problems on the following platforms:  NetBSD/macppc 10.0_BETA and
NetBSD/i386 9.3.

I'm beginning to wonder if it's my "gdb driving skills" which are
lacking, or whether it really works this poorly in other NetBSD
contexts as well...

The symptom looks like this on macppc 10.0_BETA:

: {18} gdb /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap work/rustc-1.73.0-src/bootstrap.core
GNU gdb (GDB) 11.0.50.20200914-git
...
Reading symbols from /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap...
[New process 19376]

warning: Error reading shared library list entry at 0x4b

warning: Error reading shared library list entry at 0x4b
Core was generated by `bootstrap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xfdc52444 in ?? ()
warning: Unsupported auto-load script at offset 0 in section .debug_gdb_scripts
of file /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) i reg
r0             0xbe2b18            12462872
r1             0xfffd4bf0          4294790128
r2             0xfdbbd008          4256944136
r3             0x0                 0
r4             0x0                 0
r5             0xfdedc1f8          4260217336
r6             0xfdedc1f8          4260217336
r7             0x0                 0
r8             0x36                54
r9             0x0                 0
r10            0x1                 1
r11            0xfdc52408          4257555464
r12            0xfdef9400          4260336640
r13            0xf9fda0            16383392
r14            0xc37e74            12811892
r15            0x8                 8
r16            0xc37f39            12812089
r17            0xc                 12
r18            0xc37f45            12812101
r19            0xb                 11
r20            0xc37f50            12812112
r21            0x5                 5
r22            0xc37f55            12812117
r23            0x11                17
r24            0xc37f66            12812134
r25            0x0                 0
r26            0x1                 1
r27            0x0                 0
r28            0xfdedc1f8          4260217336
r29            0xfffd4c80          4294790272
r30            0xfde6c584          4259759492
r31            0x4                 4
pc             0xfdc52444          0xfdc52444
msr
cr             0x42000248          1107296840
lr             0xfdc52414          0xfdc52414
ctr            0xfdc52408          4257555464
xer            0x0                 0
fpscr          0xfff80000          -524288
vscr
vrsave
(gdb) i target

Symbols from "/usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap".
Local core dump file:
`/usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/bootstrap.core', file type elf32-powerpc.
0x00010000 - 0x00f29000 is load0
0x00f41000 - 0x00fa0000 is load1
0x00fa0000 - 0x00fa03c8 is load2a
0x00fa03c8 - 0x00fa1000 is load2b
0xfd600000 - 0xfd608128 is load3a
0xfd608128 - 0xfd610000 is load3b
0xfd610000 - 0xfd800000 is load4
0xfda18000 - 0xfda2c000 is load5
0xfda2c000 - 0xfda2d2dc is load6a
0xfda2d2dc - 0xfda4c000 is load6b
0xfda4c000 - 0xfda4d41c is load7a
0xfda4d41c - 0xfda58000 is load7b
0xfda58000 - 0xfda5834c is load8a
0xfda5834c - 0xfda74000 is load8b
0xfda74000 - 0xfda765c0 is load9a
0xfda765c0 - 0xfda88000 is load9b
0xfda88000 - 0xfda88384 is load10a
0xfda88384 - 0xfda8c000 is load10b
0xfda8c000 - 0xfda8cb7c is load11a
0xfda8cb7c - 0xfda98000 is load11b
0xfda98000 - 0xfda981b4 is load12a
0xfda981b4 - 0xfdab4000 is load12b
0xfdab4000 - 0xfdab52e0 is load13a
0xfdab52e0 - 0xfdac8000 is load13b
0xfdac8000 - 0xfdac85bc is load14a
0xfdac85bc - 0xfdad4000 is load14b
0xfdad4000 - 0xfdad414c is load15a
0xfdad414c - 0xfdaf0000 is load15b
0xfdaf0000 - 0xfdaf04c4 is load16a
0xfdaf04c4 - 0xfdaf4000 is load16b
0xfdaf4000 - 0xfdaf407c is load17a
0xfdaf407c - 0xfdaf8000 is load17b
0xfdaf8000 - 0xfdaf8278 is load18a
0xfdaf8278 - 0xfdafc000 is load18b
0xfdafc000 - 0xfdafc23c is load19a
0xfdafc23c - 0xfdb10000 is load19b
0xfdb10000 - 0xfdb10120 is load20a

Re: gdb issues?

2023-10-11 Thread Havard Eidnes
Hi,

following up on my own message, I finally had the presence of
mind to look at what gdb on armv7 would tell me, if anything,
because that build failed as well.

And... it tells quite a bit more than the other two:

armv7: {2} gdb /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap work/rustc-1.73.0-src/bootstrap.core
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7--netbsdelf-eabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap...
[New process 1]
Core was generated by `bootstrap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
warning: Unsupported auto-load script at offset 0 in section .debug_gdb_scripts
of file /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) where
#0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
#1  0x03d2bf8c in std::sys::unix::thread::available_parallelism ()
#2  0x03cff460 in std::thread::available_parallelism ()
#3  0x0383ed74 in ::augment_args::DEFAULT_VALUE::{{closure}} () at flags.rs:110
#4  0x0347d7a8 in core::ops::function::FnOnce::call_once ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.72.0-src/library/core/src/ops/function.rs:250
#5  0x0347f29c in core::ops::function::FnOnce::call_once ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.72.0-src/library/core/src/ops/function.rs:250
#6  0x033f87c8 in once_cell::sync::Lazy::force::{{closure}} ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1212
#7  0x033f8be0 in once_cell::sync::OnceCell::get_or_init::{{closure}} ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1023
#8  0x0383c624 in once_cell::imp::OnceCell::initialize::{{closure}} ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/imp_std.rs:85
#9  0x03cdf5d8 in core::ops::function::impls:: for &mut F>::call_mut ()
#10 0x03ce0d98 in once_cell::imp::initialize_or_wait ()
#11 0x0383c010 in once_cell::imp::OnceCell::initialize ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/imp_std.rs:81
#12 0x033f9880 in once_cell::sync::OnceCell::get_or_try_init ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1063
#13 0x033f89b0 in once_cell::sync::OnceCell::get_or_init ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1023
#14 0x033f86a8 in once_cell::sync::Lazy::force ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1211
#15 0x033f8580 in  as core::ops::deref::Deref>::deref ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1221
#16 0x03414ff4 in ::augment_args () at flags.rs:110
#17 0x0340eee0 in ::command () at flags.rs:33
#18 0x0383d784 in clap_builder::derive::Parser::parse_from ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/clap_builder-4.2.4/src/derive.rs:52
#19 0x0340e7a4 in bootstrap::flags::Flags::parse () at flags.rs:199
#20 0x033d73c8 in bootstrap::config::Config::parse_inner () at config.rs:1117
#21 0x03681018 in bootstrap::config::Config::parse () at config.rs:1113
#22 0x03381578 in bootstrap::main () at bin/main.rs:20
(gdb) i reg
r0             0x0                 0
r1             0x0                 0
r2             0x0                 0
r3             0x1                 1
r4             0x0                 0
r5             0x60ed41e8          1626161640
r6             0x1                 1
r7             0x0                 0
r8             0x0                 0
r9             0x7ff64328          2146845480
r10            0x3d61425           64361509
r11            0x7ff642cc          2146845388
r12            0x7ff642d0          2146845392
sp             0x7ff642c0          0x7ff642c0
lr             0x3d2bf8c           64143244
pc             0x60d0fe74          0x60d0fe74 <_cpuset_isset+36>
cpsr           0x20030010          537067536
(gdb) x/i 0x60d0fe74
=> 0x60d0fe74 <_cpuset_isset+36>:   ldr r3, [r1, r2, lsl #2]
(gdb) 

At least it gives a bit of clue about where to go looking for the
null pointer de-reference, so that's at least something...

Meanwhile, the arm64/9.0 

Re: gdb issues?

2023-10-11 Thread Valery Ushakov
On Wed, Oct 11, 2023 at 09:31:19 +0200, Havard Eidnes wrote:

> armv7: {2} gdb  .core
> GNU gdb (GDB) 8.3
> Copyright (C) 2019 Free Software Foundation, Inc.
> [another dozen or so lines of fsf spam]

Pro tip: gdb -q :)

-uwe


Re: new rust (was: gdb issues?)

2023-10-11 Thread Havard Eidnes
> Program terminated with signal SIGSEGV, Segmentation fault.
...
> #0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
> #1  0x03d2bf8c in std::sys::unix::thread::available_parallelism ()

...

> At least it gives a bit of clue about where to go looking for the
> null pointer de-reference, so that's at least something...

This gets me to

work/rustc-1.73.0-src/library/std/src/sys/unix/thread.rs

which says:

    #[cfg(target_os = "netbsd")]
    {
        unsafe {
            let set = libc::_cpuset_create();
            if !set.is_null() {
                let mut count: usize = 0;
                if libc::pthread_getaffinity_np(libc::pthread_self(),
                    libc::_cpuset_size(set), set) == 0 {
                    for i in 0..u64::MAX {
                        match libc::_cpuset_isset(i, set) {
                            -1 => break,
                            0 => continue,
                            _ => count = count + 1,
                        }
                    }
                }
                libc::_cpuset_destroy(set);
                if let Some(count) = NonZeroUsize::new(count) {
                    return Ok(count);
                }
            }
        }
    }

which on the surface looks innocent enough, and this is as near
as I can tell the same code as in rust 1.72.1, while the code in
1.71.1 is different, and falls back to using sysctl with this
code (the bootstrap program may be linked with the "old" standard
library, so the problem may have been in 1.72.1 too):

    let mut cpus: libc::c_uint = 0;
    let mut cpus_size = crate::mem::size_of_val(&cpus);

    unsafe {
        cpus = libc::sysconf(libc::_SC_NPROCESSORS_ONLN) as libc::c_uint;
    }

    // Fallback approach in case of errors or no hardware threads.
    if cpus < 1 {
        let mut mib = [libc::CTL_HW, libc::HW_NCPU, 0, 0];
        let res = unsafe {
            libc::sysctl(
                mib.as_mut_ptr(),
                2,
                &mut cpus as *mut _ as *mut _,
                &mut cpus_size as *mut _ as *mut _,
                ptr::null_mut(),
                0,
            )
        };

        // Handle errors if any.
        if res == -1 {
            return Err(io::Error::last_os_error());
        } else if cpus == 0 {
            return Err(io::const_io_error!(io::ErrorKind::NotFound,
                "The number of hardware threads is not known for the target platform"));
        }
    }
    Ok(unsafe { NonZeroUsize::new_unchecked(cpus as usize) })

(Actually, the fallback code is there in 1.73.0 and 1.72.1 too,
it's just not used due to the addition of the netbsd-specific
section above...)

The cpuset(3) man page says

 cpuset_isset(cpu, set)
  Checks if CPU specified by cpu is set in the CPU-set set.
  Returns the positive number if set, zero if not set, and -1 if
  cpu is invalid.

but ... under which conditions would it seg-fault inside that function?
Looking at the C code in common doesn't reveal anything frightening...

However, an attempt at a trivial re-implementation "to count
CPUs" in this manner in C does not trigger this issue on any of
my "problematic" platforms (or on amd64 for that matter):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	int count = 0;
	cpuset_t *cset;
	int i;
	int ret;

	cset = cpuset_create();
	if (cset != NULL) {
		cpuset_zero(cset);
		if (pthread_getaffinity_np(pthread_self(),
		    cpuset_size(cset), cset) == 0)
		{
			for (i = 0; i<256; i++) {
				ret = cpuset_isset(i, cset);
				if (ret == -1)
					break;
				if (ret == 0)
					continue;
				count++;
			}
		}
	}
	printf("cpus: %d\n", count);
	return 0;
}

but also fails to count the number of CPUs (prints 0). So what
am I (and/or rust) doing wrong?  Or ... is this code simply wrong
anyway, and we need to re-instate the 1.71.1 code path by ripping
out the NetBSD-specific section quoted above?

Meanwhile, the warning in the pthread_getaffinity_np man page is
ignored:

 Portable applications should not use the pthread_setaffinity_np() and
 pthread_getaffinity_np() functions.

Although it could perhaps be argued that rust isn't all that
portable..., and perhaps in particular this piece of code?

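Debugging the C program reveals that pthread_getaffinity_np() has
done exactly nothing to the "cset" contents as near as I can
tell, the "bits" entry doesn't change.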

Re: new rust (was: gdb issues?)

2023-10-15 Thread RVP

On Wed, 11 Oct 2023, Havard Eidnes wrote:


> > Program terminated with signal SIGSEGV, Segmentation fault.
> ...
> > #0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
> > #1  0x03d2bf8c in std::sys::unix::thread::available_parallelism ()
>
> ...
>
> > At least it gives a bit of clue about where to go looking for the
> > null pointer de-reference, so that's at least something...
>
> This gets me to
>
> work/rustc-1.73.0-src/library/std/src/sys/unix/thread.rs
>
> which says:
>
>    for i in 0..u64::MAX {
>        match libc::_cpuset_isset(i, set) {
> [...]
> but ... under which conditions would it seg-fault inside that function?



What does the Rust-side declaration of _cpuset_isset() look like? Does
it take an int by any chance, while you're passing a u64 to it here? A C
compiler will complain about that if you use `-m32', but that's all. I
don't know how the Rust FFI will handle this. That's all I can think
of...


> Debugging the C program reveals that pthread_getaffinity_np() has
> done exactly nothing to the "cset" contents as near as I can
> tell, the "bits" entry doesn't change.



pthread_getaffinity_np() _can_ be used to get the no. of "online"
CPUs on both Linux and FreeBSD, but it looks (from my perusal just
now) like threads default to no affinity on NetBSD, and the scheduler
just picks whatever CPU is available, unless the affinity is
explicitly set, in which case it's inherited. That would explain why
the set comes back empty and the C program prints 0.

I think you should just use sysconf(_SC_NPROCESSORS_ONLN) or the
equivalent on NetBSD.

HTH,

-RVP