Re: HEADS UP: plan to switch many ports over to GCC 12 soon

2024-06-29 Thread Lloyd Parkes
On Sun, 2024-06-30 at 09:44 +1000, matthew green wrote:
> sh3 may be problematic,

I can almost certainly test sh3. I should even be able to make a new
bootable disk for an end to end test.

Ngā mihi,
Lloyd


Re: syslog, ENOBUFS and non-C implementations

2024-03-09 Thread Lloyd Parkes
Kia ora koutou katoa,

On Tue, 2024-03-05 at 11:19 +0100, Havard Eidnes wrote:
> 
> Secondly: is it something particular we are doing on the NetBSD
> end of things which contributes to this problem?  Doesn't other
> OSes return ENOBUFS if syslogd isn't able to keep up by consuming
> the messages at the receiving end?

It seems that other OSes do not return ENOBUFS. The following is from a
Debian manual page for send(2).

> ENOBUFS
>        The output queue for a network interface was full.  This
>        generally indicates that the interface has stopped sending,
>        but may be caused by transient congestion.  (Normally, this
>        does not occur in Linux.  Packets are just silently dropped
>        when a device queue overflows.)

This doesn't seem like the most useful behaviour to me, but it will
explain what you are seeing.

Ngā mihi,
Lloyd



Re: GitHub mirror stopped mirroring

2024-01-27 Thread Lloyd Parkes
On Sat, 2024-01-27 at 16:17 -0800, Chris Hanson wrote:
> It looks like there are CVS commits that haven’t made it to the
> GitHub mirror yet.
> 
> Anyone know what’s up with that?
> 
>   -- Chris
> 

The Mercurial mirror also hasn't been updated for a week.

Ngā mihi,
Lloyd



Re: Small mystery with grep -o -i

2024-01-24 Thread Lloyd Parkes
Kia ora koutou katoa,

On Wed, 2024-01-24 at 22:13 +0100, Rhialto wrote:
> The CHANGES file from upstream
> https://git.savannah.gnu.org/git/grep.git
> lists:
> 
> ...
> 
> related to commit 70e236167c3973fc428d2b5b297218fde9b68e73, committed
> 2010-03-17

And the source browser is at:

https://git.savannah.gnu.org/cgit/grep.git/commit/?id=70e236167c3973fc428d2b5b297218fde9b68e73

Ngā mihi,
Lloyd



Re: mktemp POSIX (and Linux) divergence

2024-01-16 Thread Lloyd Parkes
Kia ora,
In one out of one samples I checked, the result of mktemp was only
checked for != NULL rather than a more robust check. That same sample
(xsrc/external/mit/smproxy/dist/save.c) also preferred to use mkstemp.

mktemp() got removed from POSIX because using it is just wrong. Given
how much work NetBSD and the external source maintainers have clearly
put into removing uses of mktemp() IMHO I don't think fiddling with it
is warranted. 

Ngā mihi,
Lloyd



Re: openssl3+postfix issue (ca md too weak)

2023-11-13 Thread Lloyd Parkes
Maybe rebuild Postfix with the option -DSSL_SECOP_PEER ? That causes 
Postfix to always set security level 0 when using TLS.


Cheers,
Lloyd


Re: openssl3+postfix issue (ca md too weak)

2023-11-13 Thread Lloyd Parkes




On 14/11/23 10:56, Joerg Sonnenberger wrote:


NIST has been sunsetting SHA1 for a long time, 2016 in fact. In many cases, 
there is a better trust chain
for Comodo intermediary certificates and admins should be installing those.


I'm not sure that's what Comodo has, even though it is the normal way of 
doing things.


I found a Comodo web page that said SHA1 will be fine, so don't worry, 
and if you are worried, you can buy a different certificate. That same 
web page's link to their intermediate certificates is a dead link. 
Comodo does not fill me with confidence.


I'm going to guess that the default @SECLEVEL of openssl needs to be 
adjusted if there is no Postfix specific way to adjust it. Apparently 
you can set the environment variable OPENSSL_CONF to run with a custom 
openssl configuration which can avoid reducing the security level of the 
rest of your system. Searching for "openssl @SECLEVEL" gave me the usual 
levels of StackExchange clarity, so ymmv.
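
For what it's worth, a minimal sketch of the OPENSSL_CONF approach might look like the following. The helper name write_seclevel_conf and the file path are made up for illustration, the level you actually need depends on your certificates, and a file like this replaces the system openssl.cnf for that one process rather than extending it. Whether Postfix passes the variable through to its daemons is a separate question that this does not answer.

```sh
# Write a minimal OpenSSL configuration that only changes the default
# security level.
# $1 = output file (hypothetical path), $2 = security level (0-5).
write_seclevel_conf() {
    cat > "$1" <<EOF
openssl_conf = openssl_init

[openssl_init]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
CipherString = DEFAULT:@SECLEVEL=$2
EOF
}

# Example (not run here): only the one command sees the relaxed settings.
# write_seclevel_conf /tmp/openssl-seclevel.cnf 1
# OPENSSL_CONF=/tmp/openssl-seclevel.cnf openssl version
```

Setting the variable on a single command line keeps the lower security level away from everything else on the system, which is the point of doing it this way.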


Cheers,
Lloyd


Re: Problems with dhcpcd

2023-10-19 Thread Lloyd Parkes




On 20/10/23 00:32, Roy Marples wrote:

I've just landed dhcpcd-10.0.4 into -current and pkgsrc which fixes this issue.
Sorry for the delay.
Let me know if it works for you!


Thanks. I'll give it a go as soon as the change makes it into 
anonhg.netbsd.org, which might take a day or two.


Cheers,
Lloyd



Re: random lockups

2023-10-18 Thread Lloyd Parkes




On 19/10/23 02:12, Greg Troxel wrote:

I realize this could be a vast number of things, flaky power, bad power
supply, bad RAM, but it feels correlated with updating.   I think this
update included a zfs actually-return-memory fix (which is very welcome
but epsilon scary).

Is anyone else seeing problems, especially new problems with netbsd-10?


Yes. A week or so ago I had a similar-ish sounding lockup on 10.99.10
compiled on Oct 7. I was building NetBSD releases with two -j8 build
jobs running concurrently on a Xeon E3 with four cores, hyperthreading
and 32 GB of ECC memory.


In my case my two extant ssh sessions were still working, but I couldn't
make new ssh sessions. Running "sync" hung one ssh session and the other
hung while running "shutdown".


I'm a little bit suspicious that I pushed ZFS a bit too hard. All of my 
source and object files are on ZFS on that system. That system is one of 
a number of Xeon E3 systems I built years ago that have been thoroughly 
reliable.


Cheers,
Lloyd


Re: Problems with dhcpcd

2023-10-08 Thread Lloyd Parkes




On 8/10/23 15:30, Lloyd Parkes wrote:
I added some debugging to /libexec/dhcpcd-run-hooks and things started 
working ... better.


I created an empty file called /var/log/debug and added "exec >> 
/var/log/debug 2>&1" to the top of /libexec/dhcpcd-run-hooks.


With this change the hostname is reliably set by the time I ssh in to 
the Raspberry Pi after a reboot...


I found the problem. The syslog function in /libexec/dhcpcd-run-hooks 
tries to echo text to stdout/stderr and the shell script gets killed 
with SIGPIPE when it's being run in the background.


Commenting out the lines

case "$lvl" in
err|error)  echo "$interface: $*" >&2;;
*)  echo "$interface: $*";;
esac

allows the script to run correctly.

Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is 
another way that allows the script to run correctly.
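
The SIGPIPE behaviour can be reproduced outside dhcpcd with a small sketch like this one. It is only a stand-in for the real hook script: the function name run_hook is made up, and the closed pipe is simulated by piping into "true", which exits straight away.

```sh
# Run a tiny stand-in for the hook script with its stdout connected to a
# pipe whose reader ("true") exits immediately.
# $1 = "trap" to ignore SIGPIPE first, anything else for the default.
run_hook() {
    sh -c '
        if [ "$1" = trap ]; then trap "" PIPE; fi
        sleep 1                             # let the reader go away
        echo "usmsc0: Setting hostname"     # write to the closed pipe
        echo "hook finished" >&2            # only reached if still alive
    ' sh "$1" | true
}

# run_hook trap     # the echo fails, but the script carries on
# run_hook default  # the script is normally killed at the first echo
```

With the trap in place the failed write becomes an ordinary error that the shell reports and then ignores, so the rest of the script still runs.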


Cheers,
Lloyd


Re: Problems with dhcpcd

2023-10-07 Thread Lloyd Parkes




On 8/10/23 14:54, Lloyd Parkes wrote:

That was all with dhcpcd 10.0.3. I'll try and look into the source code 
and see why the reason field might make a difference. It could just be 
something causing trouble in the early boot and the reason is just a 
coincidence.


I added some debugging to /libexec/dhcpcd-run-hooks and things started 
working ... better.


I created an empty file called /var/log/debug and added "exec >> 
/var/log/debug 2>&1" to the top of /libexec/dhcpcd-run-hooks.


With this change the hostname is reliably set by the time I ssh in to 
the Raspberry Pi after a reboot, but the hostname is not set quickly 
enough for Postfix to notice it and the initial hostname prompt on the 
console still says "Amnesiac".


Every time I reboot the Raspberry Pi, I get one more line in 
/var/log/debug that says "usmsc0: Setting hostname: 
rpi3-1.must-have-coffee.gen.nz".


Cheers,
Lloyd


Re: Problems with dhcpcd

2023-10-07 Thread Lloyd Parkes




On 7/10/23 05:01, Roy Marples wrote:

So it took 12 seconds to complete the DHCP transaction and validate the 
addresses are good before applying the DHCP lease.
Without -B, dhcpcd will fork to the background right away so any assignments 
from the DHCP lease won't apply right away.

Is this what you are seeing? 


Partly, yes. I am confident that when I was testing "dhcpcd -d", I
wasn't waiting long enough for the hostname to be set.


Is the hostname even there? 


Yes. I checked the dhcpcd.leases file by hand as well as using tcpdump. 
While I couldn't read the binary contents of dhcpcd.leases, the hostname 
was pretty obvious.



You can examine the contents of your leases with `dhcpcd -U`.


Thank you. That has given me more information.

The hostname is never set when I reboot the Raspberry Pi and in that 
case the output from "dhcpcd -U" is


reason=BOUND
interface=usmsc0
protocol=dhcp
ip_address=10.0.1.54
subnet_cidr=24
network_number=10.0.1.0
subnet_mask=255.255.255.0
routers=10.0.1.1
domain_name_servers=10.0.1.42 10.0.1.40
host_name=rpi3-1
domain_name=must-have-coffee.gen.nz
broadcast_address=10.0.1.255
dhcp_lease_time=86400
dhcp_message_type=5
dhcp_server_identifier=10.0.1.9
domain_search=must-have-coffee.gen.nz

If I simply run "/etc/rc.d/dhcpcd restart" any time after the Raspberry 
Pi has booted, then I get a hostname. At this time the output from 
"dhcpcd -U" is


reason=REBOOT
interface=usmsc0
protocol=dhcp
ip_address=10.0.1.54
subnet_cidr=24
network_number=10.0.1.0
subnet_mask=255.255.255.0
routers=10.0.1.1
domain_name_servers=10.0.1.42 10.0.1.40
host_name=rpi3-1
domain_name=must-have-coffee.gen.nz
broadcast_address=10.0.1.255
dhcp_lease_time=86400
dhcp_message_type=5
dhcp_server_identifier=10.0.1.9
domain_search=must-have-coffee.gen.nz

Running diff shows that the only difference is the "reason=" field.
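
For anyone wanting to repeat the comparison, a sketch along these lines works on two saved dumps. The function name lease_diff and the file names are hypothetical; the input is simply the name=value output captured from "dhcpcd -U" at the two points in time.

```sh
# Compare two saved lease dumps and print only the names of the
# variables whose values differ.
# $1 and $2 are files containing name=value lines.
lease_diff() {
    diff "$1" "$2" | sed -n 's/^[<>] \([^=]*\)=.*/\1/p' | sort -u
}

# Example (file names are made up):
# lease_diff /tmp/lease.after-boot /tmp/lease.after-restart
```

Printing only the variable names makes it obvious at a glance when a single field such as "reason" is the only thing that changed.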

That was all with dhcpcd 10.0.3. I'll try and look into the source code 
and see why the reason field might make a difference. It could just be 
something causing trouble in the early boot and the reason is just a 
coincidence.


Cheers,
Lloyd


Problems with dhcpcd

2023-09-30 Thread Lloyd Parkes

Hi all,
I've installed 10.99.9 from about a day ago onto an old Raspberry Pi and 
I just can't get it to correctly set its hostname from DHCP. (I have 
removed the hostname=rpi from /etc/rc.conf).


What I have discovered so far is that if I manually run "dhcpcd -d" then 
no hostname gets set. If I run "dhcpcd -d -B" then the hostname does get 
set. This doesn't make sense.


Here are the logs from the failed run (console and /var/log/messages).
Dhcpcd doesn't seem to be running the hooks for "CARRIER", which is
something that does happen with "dhcpcd -d -B". Interestingly, the
message "executing: /libexec/dhcpcd-run-hooks ..." is not logged to
syslog by either invocation of dhcpcd.


Any advice would be welcome.

Cheers,
Lloyd

>8---
dhcpcd-10.0.2 starting
chrooting as _dhcpcd to /var/chroot/dhcpcd
sandbox: posix resource limited
spawned manager process on PID 330
spawned privileged proxy on PID 331
spawned network proxy on PID 332
spawned controller proxy on PID 333
DUID 00:01:00:01:2c:a8:c9:27:b8:27:eb:d8:e5:08
lo0: ignoring due to interface type and no config
usmsc0: executing: /libexec/dhcpcd-run-hooks PREINIT
forked to background, child pid 330
no interfaces have a carrier

Sep 30 22:15:28  dhcpcd[325]: dhcpcd-10.0.2 starting
Sep 30 22:15:28  dhcpcd[331]: DUID 00:01:00:01:2c:a8:c9:27:b8:27:eb:d8:e5:08
Sep 30 22:15:29  dhcpcd[331]: no interfaces have a carrier
Sep 30 22:15:29  dhcpcd[331]: usmsc0: waiting for carrier
Sep 30 22:15:29  dhcpcd[331]: usmsc0: carrier acquired
Sep 30 22:15:29  dhcpcd[331]: usmsc0: IAID eb:d8:e5:08
Sep 30 22:15:29  dhcpcd[331]: usmsc0: adding address fe80::ba27:ebff:fed8:e508
Sep 30 22:15:29  dhcpcd[331]: usmsc0: soliciting an IPv6 router
Sep 30 22:15:30  dhcpcd[331]: usmsc0: carrier lost - roaming
Sep 30 22:15:37  dhcpcd[331]: usmsc0: carrier acquired
Sep 30 22:15:40  dhcpcd[331]: usmsc0: IAID eb:d8:e5:08
Sep 30 22:15:40  dhcpcd[331]: usmsc0: Router Advertisement from fe80::20d:b9ff:fe50:fe8c
Sep 30 22:15:40  dhcpcd[331]: usmsc0: adding address 2406:1e00:b410:3501:ba27:ebff:fed8:e508/64
Sep 30 22:15:40  dhcpcd[331]: usmsc0: adding route to 2406:1e00:b410:3501::/64
Sep 30 22:15:40  dhcpcd[331]: usmsc0: adding default route via fe80::20d:b9ff:fe50:fe8c
Sep 30 22:15:40  dhcpcd[331]: usmsc0: rebinding lease of 10.0.1.53
Sep 30 22:15:46  dhcpcd[331]: usmsc0: leased 10.0.1.53 for 86400 seconds
Sep 30 22:15:52  dhcpcd[331]: usmsc0: adding route to 10.0.1.0/24
Sep 30 22:15:52  dhcpcd[331]: usmsc0: adding default route via 10.0.1.1


Re: Error (cross) building tools from macOS

2023-09-18 Thread Lloyd Parkes




On 18/09/23 04:15, Benny Siegert wrote:

Hi!

I tried to build NetBSD-current from source on a Macbook Air M2. However, the 
tools build fails because gcc cannot find zstd while linking. My command line 
was:

% ./build.sh -j 6 -N 1 -U -O ../obj -m evbarm -a aarch64 release

Any ideas?


Maybe ../obj wasn't clean?

I built with "build.sh -j 6 -U -m evbarm -a aarch64 ... tools" on an M1 
Pro and it completed fine. This was just after doing two Xcode updates 
and one macOS Sonoma update today.


Cheers,
Lloyd


Re: Call for testing: certctl, postinstall, TLS trust anchors

2023-09-05 Thread Lloyd Parkes




On 4/09/23 08:47, Taylor R Campbell wrote:

We're preparing to ship TLS trust anchors in base and configure them
so that applications like ftp(1) and pkg_add(1) can do TLS validation
out of the box.


Nice.

I will have to wait until after the repository conversion systems come 
back online. Oh well.


Cheers,
Lloyd


Re: Why can't a WireGuard interface route packets to itself?

2023-07-30 Thread Lloyd Parkes
PPPoE is a point to point protocol and the public IP addresses 
114.23.17.255 and 114.23.164.222 are normal IP addresses. 114.23.164.222 
is my local IP address and 114.23.17.255 is my ISP's IP address. Both 
can be treated as /32.


AFAICT the important fact is that the route to 114.23.164.222 has lo0 in 
the Interface column meaning (according to the manual page) that lo0 
will be used to reach that IP address. In your case, wg0 will be used, 
which means the packet will be transmitted over WireGuard to the remote 
end. This doesn't do what you want.


I expect that you will need to dig deeper into WireGuard. It's quite 
possible that this is a bug in WireGuard. Or you might just have 
something misconfigured. I don't know anything about WireGuard and only 
a little bit about PPPoE.


Cheers,
Lloyd

On 31/07/23 10:18, logothesia wrote:

Beware of possible line wrapping.


No problem :)


Destination        Gateway            Flags    Refs      Use    Mtu Interface
114.23.17.255      114.23.164.222     UH          -        -      -  pppoe0
114.23.164.222     pppoe0             UHl         -        -      -  lo0


10/8               10.0.0.1           U           -        -      -  wg0
10.0.0.1           wg0                UHl         -        -      -  wg0

I'm not entirely sure what I'm looking at; is 114.23.17.255 a broadcast
address? I assume it's not a /24, right? In any case, 114.23.164.222
looks a lot like my 10.0.0.1, minus the interface, which is set to lo0.
Should I set mine to lo0?


127/8              127.0.0.1          UGRS        -        -  33624  lo0
127.0.0.1          lo0                UHl         -        -  33624  lo0


Barring the MTU, my loopback routes are pretty much identical.



Re: Why can't a WireGuard interface route packets to itself?

2023-07-30 Thread Lloyd Parkes




On 31/07/23 02:18, logothesia wrote:

Hi folks,

I have a very simple WG network with only two machines: 10.0.0.1 (NetBSD), and
10.0.0.2 (linux). Indeed they can ping each other just fine, but attempting to
ping 10.0.0.1 from itself yields the following error:

% ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
ping: sendto: No route to host
...

Is this intended behavior? If so, it seems very strange to me. Here is my
routing table:

% netstat -rn
Internet:
Destination        Gateway            Flags    Refs      Use    Mtu Interface
...
10/8               10.0.0.1           U           -        -      -  wg0
10.0.0.1           wg0                UHl         -        -      -  wg0
...

Looks fine, no?


It does look a bit different from pppoe0, which I chose because it is
probably the closest thing I have to a WireGuard interface.


I get the following from netstat and it looks like pppoe adds a route 
via localhost to itself. Beware of possible line wrapping.


drumhunter$ netstat  -rnfinet
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use    Mtu Interface
default            114.23.164.222     US          -        -      -  pppoe0
...
114.23.17.255      114.23.164.222     UH          -        -      -  pppoe0
114.23.164.222     pppoe0             UHl         -        -      -  lo0
127/8              127.0.0.1          UGRS        -        -  33624  lo0
127.0.0.1          lo0                UHl         -        -  33624  lo0
...

Cheers,
Lloyd


Re: modesetting vs intel in 10.0

2023-07-08 Thread Lloyd Parkes




On 9/07/23 10:06, David Brownlee wrote:

What would be a good benchmark to stress the system a little?


A while back someone mentioned pkgsrc/benchmarks/glmark2 and so I 
started using that. It seems reasonable.


Cheers,
Lloyd



Re: Redirecting 80 to 443

2023-06-06 Thread Lloyd Parkes
On Tue, 2023-06-06 at 19:54 +0200, Sagar Acharya wrote:
> I'm using current httpd server shipped with NetBSD to host my website
> at link below. It is awesome. I am shocked as to how complicated
> server hosting is made by commercial companies!

The NetBSD httpd is very minimal and it is missing many features that you
might expect a web server to have. Features such as ...

> I wanted to know now, since I have added a cert, how do I redirect
> all port 80 requests to port 443?

this one. I think you will need to write a CGI program to get this
feature. It's not very important though because modern web browsers
will try https before http if you just type in a partial URL such as
www.example.com. 
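
A CGI program for this could be as small as the following sketch. It assumes the server sets SERVER_NAME and REQUEST_URI (the latter is not part of the CGI/1.1 standard) and honours the Status header; the function name redirect_to_https is made up, and how you arrange for every plain http request to reach the script is left open.

```sh
#!/bin/sh
# Emit a CGI response that redirects the client to the https:// version
# of the same URL.
# $1 = host name, $2 = request path.
redirect_to_https() {
    printf 'Status: 301 Moved Permanently\r\n'
    printf 'Location: https://%s%s\r\n' "$1" "$2"
    printf '\r\n'
}

# When run as a CGI program the web server normally supplies these
# variables. REQUEST_URI is an assumption, so fall back to "/".
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    redirect_to_https "${SERVER_NAME:-localhost}" "${REQUEST_URI:-/}"
fi
```

Keeping the response in a function makes it easy to check the headers by hand before installing the script anywhere.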


> Thank you devs. Also, please link me to pages where I can learn how
> to use Lua with cgi.

The httpd manual page describes how to invoke Lua scripts, but these
aren't CGI scripts. The manual page also describes how to invoke CGI
scripts of course.

Cheers,
Lloyd



Re: current installation image fails to boot up

2023-06-04 Thread Lloyd Parkes
You can use "userconf" commands from the boot prompt to disable 
auto-detection of problematic devices. You should be able to interrupt 
the boot and disable ugen0 with the command "userconf disable ugen*".


You can also put that command into boot.cfg so that you don't have to 
interrupt the boot process and type it in all the time. It's all in the 
manual pages for "boot" and "boot.cfg".
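
As a sketch, and going from my reading of the boot.cfg manual page rather than from a tested system, the line to add to boot.cfg would look something like this:

```
userconf=disable ugen*
```

Check the manual page on your release before relying on it, because the exact syntax is an assumption on my part.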


Cheers,
Lloyd

On 2/06/23 14:10, bsd...@tuta.io wrote:

When I boot up from the NetBSD HEAD image, it fails to enumerate USB devices 
and I get this error:

ubt0: vendor 8087 (0x8087) product 0a2b (0x0a2b), rev 2.00/0.10, addr 3
ugen0 at uhub1 port 10
ugen0: Broadcom Corp (0x0a5c) 5888 (0x5832), rev 1.10/1.01, addr 4
panic: kernel diagnostic assertion "dev->ud_ifaces == NULL" failed: file
"/usr/src/sys/dev/usb/usb_subr.c", line 870, ud_ifaces=OxfffTadv425d89940
cpu1: Begin traceback...
vpanic() at netbsd:vpanic+0x173
kern_assert() at netbsd:kern_assert+0x4b
usbd_set_config_index() at netbsd:usbd_set_config_index+0x59d
ugenif_attach() at netbsd:ugenif_attach+0x241
ugen_attach() at netbsd:ugen_attach+0x5c
config_attach_internal() at netbsd:config_attach_internal+0x1a7
config_found_acquire() at netbsd:config_found_acquire+0xd9
config_found() at netbsd:config_found+0x32
usbd_attachwholedevice() at netbsd:usbd_attachwholedevice+0xf8
usbd_probe_and_attach() at netbsd:usbd_probe_and_attach+0x14a
xhci_new_device() at netbsd:xhci_new_device+0x54a
uhub_explore() at netbsd:uhub_explore+0x4bd
usb_discover() at netbsd:usb_discover+0x4f
usb_event_thread() at netbsd:usb_event_thread+0x46
cpu1: End traceback... :

I did an OCR from a camera image, and tried my best to fix the text. So, some 
text might be a little off. Does anyone know the workaround for this?

Thank you
Salil




Re: httpd with tls on bootup

2023-05-28 Thread Lloyd Parkes
The manual page for httpd says that the certificate needs to be in PEM
format and that -I changes the port number. Perhaps using -Z to specify
the certificate changes the default port for you, perhaps it doesn't;
you'll have to experiment.
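
As a starting point for that experiment, the rc.conf entries might look like the sketch below. The certificate and key paths are made up, and I am assuming that -Z takes the certificate file followed by the private key file, so check the manual page before copying this.

```sh
# /etc/rc.conf fragment (a sketch, not a tested configuration)
httpd=YES
# Hypothetical PEM file locations. Add "-I 443" to the flags if -Z alone
# turns out not to change the port.
httpd_flags="-Z /etc/openssl/certs/www.pem /etc/openssl/private/www.key"
```

Since rc.conf is read by the shell, the quoting around the flags matters.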


On 29/05/23 03:55, Sagar Acharya wrote:

Dear folks,

I tried to configure httpd but I'm unable to do so. Can you please help with 
it. What I want is on bootup, I want httpd to host root_dir /var/www . I have

httpd=YES

What I'm unable to understand is what would be the format of cert? How to host 
on port 443 with rc?
Thanking you
Sagar Acharya
http://humaaraartha.in 


Re: I can't cross build 10 on 10??

2023-04-28 Thread Lloyd Parkes

And now it works fine.

The only change I made was to update my build host to the latest 
10_BETA. I had already updated the base and comp sets on that host 
before I sent my first email because it smelled like tool(chain) confusion.


Thanks for the advice even though I didn't end up needing it.

Cheers,
Lloyd


I can't cross build 10 on 10??

2023-04-27 Thread Lloyd Parkes

Hi all,
I recently started trying to build 10_BETA for some of my non-amd64 
platforms using my 10_BETA amd64 build hosts and it just won't work.


It looks like something is going wrong with the host tools and they are 
trying to use host includes when they should be using in-tree includes. 
One concise example is



#   compile  csu/gcrt0.o
/home/lloyd/NetBSD/tooldir/bin/sparc--netbsdelf-gcc 
--sysroot=/home/lloyd/NetBSD/destdir -DELFSIZE=32 
-I/home/lloyd/NetBSD/src/lib/csu/arch/sparc 
-I/home/lloyd/NetBSD/src/lib/csu/common -I.  -x assembler-with-cpp -c 
/home/lloyd/NetBSD/src/lib/csu/arch/sparc/crt0.S -o gcrt0.o.S.o
/home/lloyd/NetBSD/tooldir/bin/sparc--netbsdelf-gcc -O2  -std=gnu99
-Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith 
-Wno-sign-compare  -Wsystem-headers   -Wno-traditional   -Wa,--fatal-warnings  
-Wreturn-type -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra
-Wno-unused-parameter -Wno-sign-compare -Wold-style-definition -Wsign-compare 
-Wformat=2  -Wno-format-zero-length -Werror 
--sysroot=/home/lloyd/NetBSD/destdir -DELFSIZE=32 
-I/home/lloyd/NetBSD/src/lib/csu/arch/sparc 
-I/home/lloyd/NetBSD/src/lib/csu/common -I.  -c -fPIC -DMCRT0 
/home/lloyd/NetBSD/src/lib/csu/common/crt0-common.c -o gcrt0.o.c.o
/home/lloyd/NetBSD/src/lib/csu/common/crt0-common.c:150:10: fatal error: 
machine/elf_support.h: No such file or directory
  150 | #include <machine/elf_support.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
*** Failed target: gcrt0.o
...
┌──(lloyd@ceph4)-[~/NetBSD/src]
└─$ find . -name elf_support.h  
./sys/arch/sparc/include/elf_support.h

./sys/arch/sparc64/include/elf_support.h


For some reason, the include path isn't set correctly. I can see that 
the missing files are in the tree.


I'm using sources from Mercurial and I can see that my local changes are 
utterly trivial and I do _seem_ to have reasonably up to date sources 
judging by the most recent commit message that says "Tickets #145 - #150".



┌──(lloyd@ceph4)-[~/NetBSD/src]
└─$ hg status
M etc/rc.d/entropy
M share/man/man7/entropy.7
M sys/arch/amd64/conf/GENERIC
┌──(lloyd@ceph4)-[~/NetBSD/src]
└─$ hg sum   
parent: 1023970:7893cdb2b634 
 Tickets #145 - #150

branch: netbsd-10
commit: 3 modified
update: (current)


I've tried deleting destdir, objdir, releasedir and tooldir with no 
change in behaviour.


Do I remember this being discussed before? Am I going mad? Probably. Any 
help will be appreciated.


Cheers,
Lloyd


Re: Building old systems

2023-04-19 Thread Lloyd Parkes




On 20/04/23 10:34, Brook Milligan wrote:

I am trying to build an old kernel with build.sh on a recent (9.99.108) amd64 
system.  However, compiling nbmake fails immediately with errors like

/usr/bin/ld: buf.o:(.bss+0x0): multiple definition of `debug_file'; 
arch.o:(.bss+0x0): first defined here

Unless I am doing something silly, it clearly is not possible for a current 
system to build old tools and kernels at arbitrary points in time.

What is the best strategy for building old kernels to, for example, bisect the 
code?


This problem is only going to get worse as 9.x gets older and more 
people use newer hosts.


It occurs to me that we could fix the nbmake source code and release it 
as 9.4. It won't fix things for people stuck on 9.[0-3], but it's better 
than a poke in the eye with a sharp stick.


Cheers,
Lloyd


Re: How to recover a root partition with damaged boot blocks

2023-04-05 Thread Lloyd Parkes




On 5/04/23 18:00, matthew green wrote:


ps see "man 7 entropy" for how to fix the problem you observed.


FWIW I have PR 57254 in Gnats that provides a patch to /etc/rc.d/entropy 
so that whenever the system boots with insufficient entropy appropriate 
messages are logged. It doesn't change the behaviour of anything, it 
just adds more log messages in quite a useful way if I do say so myself.


The intention is to catch any unforeseen ways that entropy might be
forgotten about. I was thinking of custom image builds, but losing
/var/db/entropy-file during a power cut is an excellent scenario as well.


Cheers,
Lloyd


Re: ipmi0: incorrect critical max

2023-03-21 Thread Lloyd Parkes




On 22/03/23 03:45, Stephen Borrill wrote:

> On Sat, 18 Mar 2023, Lloyd Parkes wrote:
>
>> On 18/03/23 05:14, Stephen Borrill wrote:
>>> On an HP Microserver Gen10 Plus, I found that soon after booting, I
>>> get the following alert:
>>>
>>> ...
>>>    Current  CritMax  WarnMax WarnMin  CritMin  Unit
>>> [ipmi0]
>>>     11-LOM-CORE:    59.253    0.000 110.471    degC
>>
>> Just out of interest, in the BIOS (RBSU) what is the Power Management
>> / Power Regulator set to? It will have settings such as "Dynamic Power
>> Savings Mode" and "OS Control Mode".
>
> I set it to Maximum I/O Performance (words may not match exactly, it is
> in a box waiting to be installed at a customer).


OK. When you don't set it to OS Controlled, the HPE RBSU chops power 
management out of the ACPI in a way that makes Linux complain about 
corrupt ACPI information.


I realise that you are looking at IPMI, not ACPI, but it does have that 
HPE smell of ugly removal from your view because the RBSU is managing 
it. That could just be coincidence of course.


Cheers,
Lloyd


Re: ipmi0: incorrect critical max

2023-03-17 Thread Lloyd Parkes




On 18/03/23 05:14, Stephen Borrill wrote:
On an HP Microserver Gen10 Plus, I found that soon after booting, I get 
the following alert:

...
   Current  CritMax  WarnMax  WarnMin  CritMin  Unit
[ipmi0]
    11-LOM-CORE:    59.253    0.000  110.471    degC



Just out of interest, in the BIOS (RBSU) what is the Power Management / 
Power Regulator set to? It will have settings such as "Dynamic Power 
Savings Mode" and "OS Control Mode".


Cheers,
Lloyd


Re: GZIP warnings when building

2023-02-15 Thread Lloyd Parkes
Oh, it's used to pass the compression level through pax to gzip. That'd
explain it. Maybe we should switch to --use-compress-program="nbgzip
-LEVEL"???
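
The general idea, putting the level on gzip's command line rather than in the GZIP environment variable, can be sketched like this. The function name pack_dir is made up and tar stands in for pax, so this is an illustration and not what maketars does.

```sh
# Archive directory $1 into the gzip-compressed file $2, passing the
# compression level on gzip's command line instead of via $GZIP.
# $3 = compression level (defaults to 9). tar stands in for pax here.
pack_dir() {
    ( cd "$1" && tar -cf - . ) | gzip -"${3:-9}" -n > "$2"
}

# Example (paths are made up):
# pack_dir /tmp/destdir /tmp/base.tgz 9
```

Because nothing is read from the environment, gzip has no reason to warn about it.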


Cheers

On 16/02/23 06:28, Martin Husemann wrote:

On Thu, Feb 16, 2023 at 06:18:54AM +1300, Lloyd Parkes wrote:

I saw this too. I think the way we select the gzip compression level (using
the GZIP environment variable) is now deprecated for some reason.


I have not seen it anywhere, but I see we set GZIP=-9n for pax -z
in src/distrib/sets/maketars.

Martin


Re: GZIP warnings when building

2023-02-15 Thread Lloyd Parkes
I saw this too. I think the way we select the gzip compression level 
(using the GZIP environment variable) is now deprecated for some reason.


Cheers,
Lloyd

On 16/02/23 04:44, Martin Husemann wrote:

On Wed, Feb 15, 2023 at 03:09:37PM +0100, Christian Groessler wrote:

Hi,

probably known and considered not important, but I wanted to mention that
I'm getting warnings like this when building a release:


Sounds like a local problem - what is in your $GZIP (and why)?

Martin


Re: 10.0 BETA : Poor audio quality if 3.5mm jack is fully inserted

2023-02-09 Thread Lloyd Parkes
This could be a hardware issue and it could be a hardware issue that 
Linux is "fixing" for you.


I have seen this exact broken behaviour before when connecting 
headphones to a headset socket or vice versa. I can't remember which way 
round it was.


It is entirely reasonable for modern OSes and audio chipsets to be able
to detect this and fix it. Hmm, Wikipedia says that some modern systems
do detect this.


You can buy or make your own headphone to headset adapter if we can't 
fix the device driver, but fixing the device driver is probably easier 
than buying an adapter. The adapters seem a bit rare, probably because 
the device drivers always fix the problem these days. Or you could try 
swapping out your headphones/headset with a headset/headphones?


An explanation of what is (probably) going on
=============================================

This appears to be what is going on, but it is just FYI since it doesn't
fix NetBSD.


The poor audio quality from a fully inserted jack is because what you
are hearing is the _difference_ between the left and right channels
played through both speakers at once. The left and right channels are
being driven normally, but the ground isn't connected, so instead of
getting full power across each speaker all you get is the voltage
difference of the left and right channels split over the two speakers. I
think that's what's going on here.


When you pull the jack out slightly, the ground ring now gets connected 
fine, but the right audio channel is no longer connected. This gives you 
a good left channel only. I'm pretty sure this is right.


See also https://en.wikipedia.org/wiki/Phone_connector_(audio) and 
https://en.wikipedia.org/wiki/Phone_connector_(audio)#Interoperability


On 9/02/23 17:13, Mayuresh wrote:

On NetBSD 10.0 BETA, amd64, build of 10 Jan 23, on an asus laptop I face
this problem:

If I use a 3.5mm headphone jack the audio is of very poor quality, barely
some sort of a noise.

If I insert the jack partially, then the audio is proper, but only through
the left speaker.

The outputs.master2 control seems to be relevant to the headphone jack.
Have set others to 0.

 $ mixerctl -a
 outputs.master=0,0
 outputs.master2=254,254
 inputs.reclvl=92,92
 inputs.reclvl.mute=on
 outputs.master3=0,0
 outputs.master3.mute=off
 record.monitor=0,0
 outputs.master4=0,0
 outputs.master4.mute=off
 inputs.reclvl2=0,0
 inputs.reclvl2.mute=on
 outputs.dacsel=DAC00,DAC01
 record.source=ADC02

Possibly relevant bits of dmesg

 $ grep audio /var/run/dmesg.boot
 hdaudio0 at pci0 dev 14 function 0: HD Audio Controller
 hdaudio0: interrupting at msi0 vec 0
 hdaudio0: HDA ver. 1.0, OSS 6, ISS 7, BSS 0, SDO 1, 64-bit
 hdafg0 at hdaudio0 vendor 0x10EC product 0x0256 nid 0x01: Realtek product 
0256
nid=02 [audio output] [source: dac]
nid=03 [audio output] [source: dac]
 audio0 at hdafg0: playback, capture, full duplex, independent
 audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for playback
 audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for recording
 spkr0 at audio0: PC Speaker (synthesized)
 hdafg1 at hdaudio0 vendor 0x8086 product 0x280D nid 0x01: Intel HDMI/DP
nid=02 [audio output] [source: dac]
 audio1 at hdafg1: playback, capture, full duplex, independent
 audio1: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for playback
 audio1: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for recording
 spkr1 at audio1: PC Speaker (synthesized)


Note: On Linux the headphone jack works fine, with fully inserted jack
both left and right speakers work. So no hardware issue.

Please help.



Re: NetBSD 10.0_BETA envstat hangs

2023-01-08 Thread Lloyd Parkes




On 7/01/23 03:47, Mayuresh wrote:

On Thu, Jan 05, 2023 at 10:12:27AM +1300, Lloyd Parkes wrote:

2) For the the fact that the device appears to be getting attached to
sysmon_envsys(9) even though device configuration failed.


I filed for the other two, but did not find the word sysmon_envsys in
dmesg. What observation shall I report in the PR?


The sysmon_envsys subsystem is an internal kernel API, which is why its 
manual page is in section 9 of the manual. sysmon_envsys probably 
doesn't log much (if anything) to the console and so you have to know 
what's going on behind the scenes before you can have an idea of what to 
expect, which is an important part of the PR. Sorry.


Every device driver has an attach function that is responsible for 
configuring all the I/O needed to access the hardware and then 
registering that device with the kernel in some way so that NetBSD can 
then use the hardware. If the I/O can't be configured, then there is no 
point trying to register the device with the kernel because the device 
driver can't access the hardware.


With many drivers I would expect that if the dmesg output says 
"autoconfiguration error", then the device would not be registered with 
the kernel. You can verify that acpibat0 has been registered with 
sysmon_envsys by running "envstat -D", which hopefully won't hang your 
system.


In summary, when I see an "autoconfiguration error" for a device in 
dmesg, I expect that device to be unavailable, i.e. it won't be reported 
in "envstat -D".


My expectations may be wrong, this wouldn't be the first time.

Cheers,
Lloyd


Re: NetBSD 10.0_BETA envstat hangs

2023-01-04 Thread Lloyd Parkes




On 4/01/23 23:32, Mayuresh wrote:

envstat hangs (doesn't come out with Ctrl-C) on a laptop with following
description:

# uname -a
NetBSD asusn 10.0_BETA NetBSD 10.0_BETA (GENERIC) #0: Mon Dec 19 14:00:35 UTC 2022  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

# dmesg | grep acpibat0
[ 1.046558] acpibat0 at acpi0 (BAT0, PNP0C0A-0): ACPI Battery
[ 1.046558] acpibat0: ACPI 4.0 functionality present
[ 2.056386] acpibat0: autoconfiguration error: failed to evaluate _STA: AE_NO_MEMORY
[62.666971] acpibat0: workqueue busy: updates stopped

Please advise.



If I got that on my laptop, I would create 3 PRs.

1) For the autoconfiguration error.
2) For the fact that the device appears to be getting attached to 
sysmon_envsys(9) even though device configuration failed.
3) For the fact that envstat appears to use a system call that cannot 
be interrupted.


For #3 run "ps alx" and see if the STAT column for your uninterruptible 
envstat process has the letter "D" in it. If so, this indicates that the 
system call is in an uninterruptible state normally associated with 
short term disk I/O.
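
If you would rather not eyeball the listing, the check can be sketched in Python. This is only a rough sketch: the sample text below is made up, and the column layout is an assumption that should be compared with the header your own "ps alx" prints.

```python
# Hypothetical `ps alx` output; the column layout is an assumption and
# should be checked against the real header on the affected machine.
SAMPLE = """\
UID PID PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY   TIME    COMMAND
  0 640    1   0  85  0 21000 3100 pause  Ss   pts/0 0:00.05 -sh
  0 812  640   0  85  0 19840 1520 tstile D+   pts/0 0:00.01 envstat
"""


def uninterruptible(ps_output, command="envstat"):
    """Return (pid, stat) for matching processes whose STAT contains 'D'."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col, stat_col = header.index("PID"), header.index("STAT")
    hits = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the splits.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # a column was empty; skip the row rather than guess
        if command in fields[-1] and "D" in fields[stat_col]:
            hits.append((int(fields[pid_col]), fields[stat_col]))
    return hits


print(uninterruptible(SAMPLE))
```

Feeding it the real output of "ps alx" instead of SAMPLE gives the process IDs worth quoting in the PR.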


Cheers,
Lloyd



Re: 10_BETA: Nice QOL improvements to the installer

2022-12-21 Thread Lloyd Parkes




On 22/12/22 02:33, Mayuresh wrote:

On Wed, Dec 21, 2022 at 09:25:30AM +1300, Lloyd Parkes wrote:

I was installing on amd64 and the installer let me choose either MBR or GPT,


I notice a separate image marked "bios"

 NetBSD-10.0_BETA-amd64-bios-install.img.gz
 NetBSD-10.0_BETA-amd64-install.img.gz

So wonder, which installer you tried that gave you these two options. I
got an impression that these two are just different installers - 1 UEFI
and 1 BIOS.


I used the second (non-BIOS) image because I guessed it might be a 
hybrid installer. I think that my old NUCs only support BIOS booting 
from USB sticks, but I could easily be wrong.


Cheers,
Lloyd


10_BETA: Nice QOL improvements to the installer

2022-12-20 Thread Lloyd Parkes

Hi all,
It's been a while since I've run the NetBSD installer, so I don't know 
how recent the changes I've seen are, but I do like them.


I was installing on amd64 and the installer let me choose either MBR or 
GPT, which is nice. The NUC I was installing on is old enough that it 
can only UEFI boot from wd0 and some OSes don't allow a UEFI install 
from a BIOS booted USB installer.


I also noticed that the installer knew where the install files were 
locally on the USB stick and just installed them. Is that new?


I specifically chose a NUC without a hardware RNG so that I could test a 
tweak to the /etc/rc.d/entropy script and was pleasantly surprised to 
see that the installer detected this and applied a giant clue hammer. 
That was very nice.


Cheers,
Lloyd


Re: py310-pdf-parser doesn't install py_pdf_parser

2022-12-06 Thread Lloyd Parkes




On 6/12/22 22:35, Sagar Acharya wrote:

I installed the package py310-pdf-parser. It should install the module 
py_pdf_parser, which should then be importable in python3.10.

It throws an error
No such module.


You should normally provide more details, such as where you got the 
package from and a short program showing the code that produces the error.


I looked into a copy of the pkgsrc source code I have lying around and 
your problem seems to be that the Python code is installed as a program 
and not as a module. It is installed as /usr/pkg/bin/pdf-parser.py.


This does seem to be what the author of the software intends.

Cheers,
Lloyd


Re: getrandom() error with linux emulation

2022-11-19 Thread Lloyd Parkes




On 19/11/22 12:31, Brook Milligan wrote:

I am running a linux application (no source code unfortunately) and 
encountering an error that seems from the message (which is cryptic) to be 
related to a getrandom() function call.  This is with a NetBSD/amd64 9.99.99 
kernel dating from August.

I recall all the discussions floating around about the random generator 
functions, but did not fully track them.  Now I wonder if that is relevant 
somehow.


Probably not. I think the /etc/rc.d/ script got updated to provide a 
smoother system admin experience.


You can run "sysctl kern.entropy.needed" to see if your system is 
waiting for more entropy. The value should be 0 if your system has 
enough entropy.
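
If you want to script that check, here is a minimal sketch in Python. It assumes the usual "name = value" form that sysctl prints, and it only parses a line you hand it; the helper names are mine.

```python
def entropy_needed(sysctl_line):
    """Parse a line such as 'kern.entropy.needed = 0' and return the number."""
    name, sep, value = sysctl_line.partition("=")
    if not sep or name.strip() != "kern.entropy.needed":
        raise ValueError("unexpected sysctl output: %r" % sysctl_line)
    return int(value.strip())


def has_enough_entropy(sysctl_line):
    # Zero means the kernel is not waiting for any more entropy.
    return entropy_needed(sysctl_line) == 0


print(has_enough_entropy("kern.entropy.needed = 0"))
print(has_enough_entropy("kern.entropy.needed = 256"))
```

The 256 in the second call is just an illustrative non-zero value, not something taken from a real system.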


You can also run "rndctl -l" to list entropy sources. There should be at 
least two rows (and probably only two rows) with a non-zero "Estimated 
bits" value. One of them will be the "seed" device which is the seed 
from /var/db/entropy-file, and there should be some other hardware 
dependent device with a type of "rng" which is your CPU's random number 
generator (assuming your CPU has one, modern ones do).



Are there differences between NetBSD and Linux in the getrandom() 
function or implementations that would cause emulation to fail?

Is there any way to debug this kind of situation?


I generally sift through the output from ktruss and try and infer some 
kind of application behaviour. Sometimes that works.


Cheers,
Lloyd


ACPI problem post 9.99.79 on amd64

2022-11-05 Thread Lloyd Parkes

Hi all,
I thought I ought to refresh the NetBSD running on my Xeons, but current 
kernels panic during boot. This affects 9.99.105 and a bunch of earlier 
versions, but I don't know how much earlier.


I finally found a serial cable that works with our boot loader so I now 
have some console messages I can copy and paste.


The most obviously useful messages seem to be:

[   1.0157457] lpt0 at acpi0 (PEC2, PNP0401-2): io 0x-0x10006,0x3ff-0x406 irq 15

[   1.0157457] extent_alloc_region: extent `ioport' (0x0 - 0x)
[   1.0157457] extent_alloc_region: start 0x, end 0x10006
[   1.0157457] panic: extent_alloc_region: region lies outside extent
[   1.0157457] cpu0: Begin traceback...


The address range 0x3ff-0x406 looks almost like it's right, and the 
IRQ 15 almost looks right, but the range 0x-0x10006 is well outside 
the old ISA range.
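
To make the complaint concrete, here is a small Python sketch of the kind of bounds check that the panic message describes. It is not the kernel's code. The 16-bit x86 I/O port space (0x0 to 0xffff) is the only fact it relies on. The start of the bad range is elided in the log above, so a placeholder is used, and the result does not depend on it.

```python
IOPORT_START = 0x0000
IOPORT_END = 0xFFFF  # x86 I/O port addresses are 16 bits wide


def region_in_extent(start, end, ext_start=IOPORT_START, ext_end=IOPORT_END):
    """True if [start, end] lies entirely inside [ext_start, ext_end]."""
    return ext_start <= start <= end <= ext_end


# The second range from the dmesg line fits.
print(region_in_extent(0x3FF, 0x406))

# A conventional parallel port range (0x378 is the value acpidump shows).
print(region_in_extent(0x378, 0x37F))

# The first range ends at 0x10006. Its start is elided in the log, so 0x0 is
# a placeholder; any start gives the same answer because the end is too big.
PLACEHOLDER_START = 0x0
print(region_in_extent(PLACEHOLDER_START, 0x10006))
```

The last call is the one that fails, which matches "region lies outside extent".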


Running /usr/sbin/acpidump shows the numbers 0x378 and 7 a fair few 
times near the text PNP040x which makes me think the firmware might 
have the right data in it, implying that the kernel is parsing the ACPI 
data incorrectly.


The older kernel works because it doesn't seem to have support for lpt0 
at acpi0.


I also can't find a way to disable the device in the BIOS (I really 
don't like Intel firmware). I tried running "userconf disable lpt*" in 
the boot loader, but that didn't seem to make any difference (i.e. the 
kernel still tried to attach lpt0).


On top of all that, the motherboard doesn't even have a physical 
parallel port.


I can remove lpt* from my kernel easily enough, but does anyone know 
what might be going on? I'm happy to run any debugging stuff that anyone 
wants.


Cheers,
Lloyd


Re: How to limit amount of virtual memory used for files (was: Re: Tuning ZFS memory usage on NetBSD - call for advice)

2022-09-21 Thread Lloyd Parkes
I wrote a quick and dirty program to trigger vast amounts of File memory 
usage only to remind myself of something important.


Having File memory use all available RAM with only scraps left is not a 
problem by itself. The memory is there to be used and it is getting 
used. I used to have to explain this as part of my $dayjob, so it's a 
bit embarrassing that I forgot it.


What is a problem is if that memory isn't freed up for other uses when 
necessary. Håvard's email about his G4 Mac Mini is an excellent example 
of a problem. A problem I have experienced in the past was a program 
failing with out of memory errors while processing 128MB of data on a 
system with 256MB of RAM. A problem doesn't have to be a crash. It could 
simply be unnecessary swap being used leading to terrible performance.


Can we put together a catalogue of clearly defined problems so that we 
can reproduce them and investigate further? While Håvard appears to have 
solved his problem, I'm pretty sure I have an unused G4 Mac Mini of my 
own that I can try and reproduce his problem on.


Cheers,
Lloyd


Re: macppc system wedging under memory pressure

2022-09-15 Thread Lloyd Parkes
You aren't the first person to have problems with memory pressure. We 
really are going to have to get around to documenting the memory 
management algorithms and all the tuning knobs.


I used to use this page (https://imil.net/NetBSD/mirror/vm_tune.html), 
but I have no idea how current it is. Also, I haven't used my smaller 
systems for a while now.


In the past, I used to set vm.filemax to 5 because I never want a page 
that I can simply reread to force an anonymous page to be written out to 
swap.


Cheers,
Lloyd

On 9/09/22 08:30, Havard Eidnes wrote:

Hi,

I'm running NetBSD-current on one of my 1G Mac Mini G4 systems,
doing pkgsrc bulk building.

This go-around I've managed to build llvm, and next up is rust.  This
is proving to be difficult -- my system will consistently wedge its
user-land (still responding to ping, no response on the console or any
ongoing ssh sessions; well, not entirely correct, it will echo one
carriage-return on the console with a newline, but then that is wedged
as well).  Also, I have still not managed to break into DDB on this
system, so each and every time I have to power-cycle the box.  This
also means that all I have to go on is output from "top -s 1", "vmstat
1" and "systat vm", and this is the latest information I got from
these programs when it wedged just now:

load averages:  1.10,  1.13,  1.05;   up 0+02:01:45        21:59:52
103 threads: 5 idle, 6 runnable, 90 sleeping, 1 zombie, 1 on CPU
CPU states:  1.0% user,  5.9% nice, 93.1% system,  0.0% interrupt,  0.0% idle
Memory: 559M Act, 274M Inact, 12M Wired, 186M Exec, 162M File, 36K Free
Swap: 3026M Total, 80M Used, 2951M Free / Pools: 134M Used

   PID   LID USERNAME PRI STATE       TIME   WCPU    CPU NAME      COMMAND
  6376 26281 1138      78 RUN         2:03 89.10% 88.96% rustc     rustc
     0   109 root     126 pgdaemon    0:20 15.48% 15.48% pgdaemon  [system]
   733   733 he        85 poll        0:14  2.93%  2.93% -         sshd
   164   164 he        85 RUN         0:06  1.17%  1.17% -         systat

Notice the rather small amount of "Free" memory, and the rather
high rate of system CPU.  The "vmstat 1" output for the last few
seconds:

 procs       memory            page                  disk   faults        cpu
 r b      avm    fre  flt  re  pi   po   fr    sr w0   in   sy   cs  us sy id
 1 0   634804   4164 1869   0   0    0 1358  1358  0  280    0  425  97  3  0
 3 0   637876   1016  786   0   0    0    0     0  0  213    0  410  99  1  0
 2 0   636336   2512  816   4   0    0 1192  1202  0  326    0  508  98  2  0
 2 0   633448   5456  617   0   0    0 1355  1371  0  228    0  374  99  1  0
 2 0   634964   3780  430   0   0    0    0     0  0  250    0  452  98  2  0
 2 0   635988   2740  260   0   0    0    0     0  0  261    0  496  98  2  0
 2 0   637396   1376  386   0   0    0    0     0  0  300    0  459  97  3  0
 2 0   634912   4060  775   0   0    0 1354  1354  0  190    0  245 100  0  0
 2 0   636940   2308  437   0   0    0    0     0  0  250    0  415 100  0  0
 2 0   637912   1064  473   0   0    0    0     0  0  251    0  406 100  0  0
 2 0   633580   5408  175   0   0    0 1262  1270  0  254    0  403  99  1  0
 2 0   637288   1740 1002   0   0    0    0     0  0  278    0  521  97  3  0
 2 0   634340   4324  713   0   0    0 1354  1357  0  296    0  471  96  4  0
 2 0   636388   2160  540   0   0    0    0     0  0  216    0  361  98  2  0
 2 0   637412   1116  258   0   0    0    0     0  0  254    0  405  98  2  0
 2 0   637556   4872  178  12   0  996 1122 42861  4  307    0  442  30 70  0
 2 0   638064   9620 1105   3   0 1228 1228  2305 70  411    0  667  19 81  0
 2 0   639624   7416  550   0   0    0    0     0  0  319    0  584  97  3  0
 2 0   644744   2200 1299   0   0    0    0     0  0  279    0  416  93  7  0
 6 0   646924   2716  537   0   0 1356  672  2403 14  412    0  497  35 65  0
 4 0   654792     36 2022  32   0 1354 1366  7910 91  241    0 6735   7 93  0

while "systat vm" doesn't really give any more information than
the above:

 6 usersLoad  1.10  1.13  1.05  Thu Sep  8 21:59:51

Proc:r  d  sCsw  Traps SysCal  Intr   Soft  Fault PAGING   SWAPPING
  8 3355471  302 75398 in  out   in  out
 ops64
   68.2% Sy   0.0% Us  31.8% Ni   0.0% In   0.0% Idpages  1027
|||||||||||
==forks
   fkppw
Anon   509096  50%   zero472 Interrupts   fksvm
Exec   190804  18%   wired   12000   100 cpu0 clock   pwait
File   166072  16%   inact  280984   openpic irq 29   relck
Meta82832   2%   bufs 6500   openpic irq 63   rlkok
  (kB)real   swaponly  free38 openpic irq 39 1 

Re: Switching to the new DHCP from ISC?

2022-09-03 Thread Lloyd Parkes




On 3/09/22 22:40, Joerg Sonnenberger wrote:

On Sat, Sep 03, 2022 at 10:00:04AM +1200, Lloyd Parkes wrote:

Does anyone know of a maintained DHCP relay implementation?


The better question for me is: are DHCP relay servers still in use?


DHCP relays are used a lot, but the ISC one probably not so much. It 
dumps core when sending a DHCP response packet back to the client. Maybe 
older versions of NetBSD have an ISC DHCP relay with less bit rot?


Since we are talking about NetBSD, it is always an option to just run a 
DHCP server instead of a DHCP relay. That doesn't mean it's a good 
option though.


If it turns out that neither ISC nor TNF want to support the old DHCP 
relay, then we may as well switch to upstream supported code.


Cheers,
Lloyd

p.s. I have a fix for the DHCP relay core dump. I don't know where the 
root cause bit rot is, but I do have a fairly acceptable patch.


Switching to the new DHCP from ISC?

2022-09-02 Thread Lloyd Parkes

Hi all,
Are there any plans to switch the DHCP server we have to the new Kea one 
from ISC? The relay that comes with it dumps core when relaying DHCP 
responses.


I've fixed my copy of dhcrelay, but when I went to send the patch 
upstream, I saw this


Please note that this project is in maintenance mode - we are
not actively adding new functionality and may not respond to
non-critical issues

and this

The client and relay portions of ISC DHCP are no longer
maintained

but also this

The Kea distribution does not currently include either a client
or a relay.

Does anyone know of a maintained DHCP relay implementation?

Cheers,
Lloyd


Re: Tuning ZFS memory usage on NetBSD - call for advice

2022-08-31 Thread Lloyd Parkes

It might not be ZFS related. But it could be.

Someone else reported excessive, ongoing, increasing "File" usage a 
while back and I was somewhat dismissive because they were running a 
truckload of apps at the same time (not in VMs).


I did manage to reproduce his problem on an empty non-ZFS NetBSD system, 
so there is definitely something going on where "File" pages are not 
getting reclaimed when there is pressure on the memory system.


I haven't got around to looking into it any deeper though.

BTW the test was to copy a single large file (>1TB?) from SATA storage 
to USB storage. Since the file is held open for the duration of the copy 
(I used dd IIRC) this might end up exercising many of the same code 
paths as a VM accessing a disk image.


Cheers,
Lloyd

On 31/08/22 22:52, Matthias Petermann wrote:

Hello all,

The "Memory usage" section of [1] describes the memory requirements 
of ZFS.


It further mentions that the tunables that exist in FreeBSD do not exist 
in NetBSD. Especially for the size of the ARC there seems to be no limit 
for NetBSD:


"vfs.zfs.arc_max - Upper size of the ARC. The default is all RAM but 1 
GB, or 5/8 of all RAM, whichever is more. Use a lower value if the 
system runs any other daemons or processes that may require memory. 
Adjust this value at runtime with sysctl(8) and set it in 
/boot/loader.conf or /etc/sysctl.conf."


So far so good... I have here the concrete case that I use ZFS ZVOLS as 
backend storage for virtual machines (Qemu/nvmm). The host has 8192 MB 
RAM available, of which are allocated to the VMs:


* net (512 MB RAM)
* iot (1024 MB RAM)
* mail (512 MB RAM)
* app (2048 MB RAM)

This should leave 4096 MB for the host - while ZFS would claim 5120 MB 
(5/8) of the available RAM. On this, after a while, the value under 
"File" in top increases to over 3 GB, with the consequence that the 
system starts swapping to the swap partition.
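
The arithmetic behind those figures can be laid out as a short Python sketch. It only uses the numbers given above. Note that the quoted handbook text says "whichever is more", which read literally gives the larger of the two candidates. Whether NetBSD applies the same default is not established here, so treat both ARC figures as candidates rather than facts.

```python
HOST_MB = 8192
VM_MB = {"net": 512, "iot": 1024, "mail": 512, "app": 2048}

vm_total_mb = sum(VM_MB.values())          # RAM handed to the guests
host_left_mb = HOST_MB - vm_total_mb       # what remains for the host

five_eighths_mb = HOST_MB * 5 // 8         # the 5/8 figure used above
all_but_1g_mb = HOST_MB - 1024             # "all RAM but 1 GB"
quoted_default_mb = max(five_eighths_mb, all_but_1g_mb)  # "whichever is more"

print("guests:", vm_total_mb, "MB; left for host:", host_left_mb, "MB")
print("5/8 of RAM:", five_eighths_mb, "MB; all but 1 GB:", all_but_1g_mb, "MB")
print("either candidate exceeds what is left:",
      five_eighths_mb > host_left_mb and all_but_1g_mb > host_left_mb)
```

Either way the ARC ceiling is larger than the memory left over after the guests, which is consistent with the swapping described.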


This raises the following questions for me:

- how can I investigate the composition of the amount of memory 
displayed under "File" in top more precisely?


- are there any hidden tuning possibilities for ZFS in NetBSD (maybe 
boot parameters etc.) or compile-time settings?


- what kind of memory can basically be swapped out? Only memory of the 
processes (e.g. RAM of the Qemu VMs) or also parts of the ZFS ARC?


- Which value does ZFS use to determine the ARC upper limit? The 
physical RAM or the physical RAM + swap? Background to the question: in 
my example, would I perhaps be better off disabling swap?


Kind regards
Matthias

Btw, sorry for the cross-posting. The host is running on a NetBSD 
9.3_STABLE, however the topic seems relevant for current as well and I 
would have the possibility to test or compare on current as well.



[1] https://wiki.netbsd.org/zfs/
[2] https://docs.freebsd.org/en/books/handbook/zfs/#zfs-advanced


Re: HPE H240 in HBA mode

2022-08-13 Thread Lloyd Parkes
I have seen some somewhat official recommendations from HPE to leave 
such controllers in RAID mode and to then create an individual RAID 0 
target for each disk.


It seems more complicated than is needed, but it's what they said to do 
and it did seem to work.


Cheers

On 13/08/22 19:35, os...@fessel.org wrote:

Hej,
while trying to get rid of the issues with the mfii driver when running Xen, I 
popped an H240 controller into my DL380.  This works fine, but there seems 
to be no real driver for that card.  When running in RAID mode, ciss claims 
this device and works.  But I want to avoid double RAID overhead (this runs 
ZFS), so I configured the controller to HBA mode.

Obviously, the ciss driver now does not recognize the connected drives:
[ 1.03] ciss1 at pci12 dev 0 function 0: HP Smart Array 10
[ 1.03] ciss1: interrupting at msix6 vec 0
[ 1.03] ciss1: 0 LDs, HW rev 1, FW 7.00/7.00, 64bit fifo rro, method perf 0x2005
[ 1.03] scsibus2 at ciss1: 0 targets, 1 lun per target
Looks to me ciss only operated with the HP virtual disks.
Is there a driver for HBA mode on these cards?

Cheers
Oskar


Re: Broken references in manual pages

2022-07-12 Thread Lloyd Parkes




On 10/07/22 14:59, Robert Elz wrote:


Fixing these things is a noble goal, but breaking the build to
make that happen is not.


I didn't envision breaking the build, but I also have to admit that I 
didn't envision how this wouldn't break the build either.


Maybe a new build.sh operation?

Cheers,
Lloyd


Re: Broken references in manual pages

2022-07-09 Thread Lloyd Parkes




On 10/07/22 00:55, Roland Illig wrote:


Shouldn't there be an automatic check for these?  Running this check
after a "make distribution" should be fairly easy.


That sounds like an excellent idea to me.

Fixing the problems will be much harder than finding them, but there is 
no good excuse for broken references.


Cheers,
Lloyd


Re: i386/amd64 image generated trough mkimage stuck on primary bootsrap at boot

2022-07-08 Thread Lloyd Parkes




On 8/07/22 21:01, RVP wrote:

On Fri, 8 Jul 2022, br0nko wrote:


8 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   2369473        63     4.2BSD      0     0     0  # (Cyl.      0*-   1156)
 c:   2312129        63     unused      0     0        # (Cyl.      0*-   1128)
 d:   2369536         0     unused      0     0        # (Cyl.      0 -   1156)



This doesn't look right, does it? Offset is 63 instead of 64,


63 is a popular offset because the BIOS field for track length can only 
hold values 0-63.
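
For anyone who wants the arithmetic, here is a small Python sketch of the standard CHS to LBA conversion. The sector field is 6 bits wide and sectors are numbered from 1, so a track holds at most 63 sectors, and the first sector of the second track (cylinder 0, head 1, sector 1) is LBA 63. The head count of 255 is an assumption for illustration; it does not affect the result while the cylinder is 0.

```python
MAX_SECTORS_PER_TRACK = (1 << 6) - 1  # 6-bit sector field, numbered from 1


def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Standard CHS -> LBA conversion; sectors are 1-based."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector number out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


# Assumed geometry: 255 heads (illustrative), 63 sectors per track.
print(MAX_SECTORS_PER_TRACK)
print(chs_to_lba(0, 1, 1, heads_per_cylinder=255, sectors_per_track=63))
```

So a partition that starts on the first track boundary after the MBR starts at sector 63.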


Cheers,
Lloyd


Re: i386/amd64 image generated trough mkimage stuck on primary bootsrap at boot

2022-07-08 Thread Lloyd Parkes




On 8/07/22 19:57, br0nko wrote:

On Thursday, July 7th, 2022 at 11:25 PM, Mike Pumford wrote:


On 07/07/2022 15:40, br0nko wrote:


Hi,

0: NetBSD (sysid 169)
start 63, size 1568384 (766 MB, Cyls 0/1/1-97/160/62), Active
beg: cylinder 0, head 1, sector 1
end: cylinder 97, head 160, sector 62
Information from PBR:
Not bootable: All bytes are identical (0x00)
Not bootable: Bad magic number (0x)


No MBR boot code in the partition table.

fdisk -i /dev/rvnd0

should resolve that I think.


I did give a try, using an amd64 mkimage image build from current tree 
(9.99.98/amd64):


Note that, in general, NetBSD fdisk, installboot and disklabel can be 
run directly against the image itself without needing to use vnconfig. 
Sometimes you might need a flag to tell the command that it is being 
given a disk image instead of a real disk, but that's about it.


Cheers,
Lloyd


Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers

2022-04-28 Thread Lloyd Parkes

I keep forgetting this.

On 20/04/22 06:36, Brian Buhrow wrote:

...
Any ideas would be greatly appreciated.



You should be able to boot in single user mode and ask the kernel not to 
configure the bge driver. You should be able to run userconf in single 
user mode with "boot -sc" and then use the command "disable bge".


I haven't been able to try this for myself though.

Cheers,
Lloyd


Re: Installing bootx64.efi on NetBSD current?

2022-04-10 Thread Lloyd Parkes




On 10/04/22 23:57, Martin Husemann wrote:

I think I have just fixed this...


I think you have as well.

Thanks,
Lloyd


Re: Installing bootx64.efi on NetBSD current?

2022-04-09 Thread Lloyd Parkes




On 9/04/22 23:25, Martin Husemann wrote:


First question is if the files exist in that install medium - can you
netboot into that netbsd-INSTALL.gz again, go to the shell and
check contents of /usr/mdec?


I did manage to find my way this far before getting distracted by a 
slightly more important problem on another system.


I saw that the sysinst source did seem to copy both bootia32.efi and 
bootx64.efi without bothering to check if either copy succeeded. When 
I checked /usr/mdec in the boot image, neither file was there.


My /usr/mdec contains only boot, bootxx_ffsv1, bootxx_ffsv2, 
bootxx_lfsv2, gptmbr.bin, mbr, mbr_bootsel, and mbr_ext.


Cheers,
Lloyd


Installing bootx64.efi on NetBSD current?

2022-04-08 Thread Lloyd Parkes
I installed NetBSD current from late March by net booting 
netbsd-INSTALL.gz on an amd64 system of mine. The net boot and disk 
boot were both done with UEFI and sysinst correctly offered to create 
GPT filesystems for me.


Even though the FAT system partition was created correctly, bootx64.efi 
didn't get installed in it. The directories were there, just not the 
bootable file.


Is that supposed to happen? I don't think that's supposed to happen.

If someone wants to point me at a likely place in the source code, I'm 
happy to have a go at fixing this.


Cheers,
Lloyd


Re: the entropy bug, and device timeouts (was: Note: two files changed and hashes/signatures updated for NetBSD 8.1)

2022-01-25 Thread Lloyd Parkes




On 25/01/22 17:57, Greg A. Woods wrote:


I have fixes that restore the previous option to use "untrusted"
hardware as an entropy source.  They may need some updating to be truly
complete in the most recent -current, as I'm still back at 9.99.81.


The change was more subtle than that I think. Untrusted hardware was 
used as an entropy source, but it didn't count towards the "enough" that 
was needed to bootstrap the rnd system from nothing.


On 7 May 2020 a change was committed to /etc/rc.d/random_seed so that a 
seed file is created at boot time if you don't already have one. I 
haven't checked because I really can't be bothered right now, but I'm 
pretty sure that's all that's required.


Cheers,
Lloyd


Re: Serious bugs in NetBSD-current, have they been fixed?

2021-10-24 Thread Lloyd Parkes

Hi Tom,

On 25/10/21 14:01, Thomas Mueller wrote:

One of these bugs relates to entropy and how it impedes building many packages 
in pkgsrc.

I seemed to get around this bug on one computer but not the other.


It's the old story that it's not a bug, but a feature. It's quite 
possible that it is "fixed" now (I'm running 9.99.79).


I'm going to assume that you want entropy that is "good enough" rather 
than "guaranteed" because that's going to be easier for all of us.


I think what you need to do is the following:

1) Run some command like "ls -lR /" to generate some entropy
2) Run "sysctl -w kern.entropy.consolidate=1"
3) Run "/etc/rc.d/random_seed stop" to store the entropy

This is only needed on systems that don't have a built-in secure random 
number generator and I don't have any such systems running right now.


Rebooting a machine with "shutdown -r now" will run step 3 for you. If 
you are like me, then you might be in the habit of running "reboot" when 
setting up a new machine instead of "shutdown -r now". Doing that skips 
step 3 and doesn't end well.



Other bug is longer-standing and plagued me in NetBSD 8.99.51 and again in 
9.99.82.

That bug causes device timeouts on some types of hard drive but not all.

Sample output is, excerpt from /var/run/dmesg.boot on the following reboot:

...
wd1d: device timeout writing fsbn 2391623176 of 2391623176-2391623199 (wd1 bn 2391623176; cn 2372642 tn 0 sn 40), xfer e0, retry 3


You haven't told us what sort of hardware this is. Drive models, 
motherboard chipsets, etc.


My personal experience is that this indicates a physical failure of a 
hard disk. I've also seen errors caused by SATA cables and SSD firmware. 
I have a small stack of failed HDDs and SSDs and the only reason it 
isn't a large stack is because I've thrown half of them away.


Try running a SMART tool to ask the disk what it thinks is going on. 
The SMART report is quite hard to read, but very thorough. Run "dd 
if=/dev/rwd1d of=/dev/null bs=32k" and see if it hangs at the same place 
every time. If it does, then you have a bad block on your disk.
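
If dd doesn't make it obvious where the read stalls, the same sequential read can be sketched in Python with a progress callback. This is only a rough stand-in for dd, not a diagnostic tool: it reads 32k blocks and either returns the offset of the first read error or, if the read hangs, leaves the last reported offset on your screen.

```python
import os


def scan(path, block_size=32 * 1024, progress=None):
    """Read path sequentially; return (bytes_read, failed_offset_or_None)."""
    fd = os.open(path, os.O_RDONLY)
    offset = 0
    try:
        while True:
            if progress is not None:
                # If the next read hangs, this is the last offset reported.
                progress(offset)
            try:
                chunk = os.read(fd, block_size)
            except OSError:
                return offset, offset  # read error at this offset
            if not chunk:
                return offset, None    # clean end of device or file
            offset += len(chunk)
    finally:
        os.close(fd)
```

Something like scan("/dev/rwd1d", progress=print) run as root would be the equivalent of the dd command above. Run it twice and compare the last offset printed.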


You can also try booting into SeaTools from Seagate to run a full health 
check on the disk (works for non-Seagate disks).


To test the cable, just use another cable that is as different as 
possible from the current cable and hope for the best. SMART will tell 
you about some cable problems, but then we get into knowing how to read 
the report.


Cheers,
Lloyd


Re: High vm scan rate and dropped keystrokes thru X?

2021-07-27 Thread Lloyd Parkes




On 27/07/21 12:19 am, Paul Ripke wrote:

On Mon, Jul 26, 2021 at 05:53:19PM +1200, Lloyd Parkes wrote:

That's 12GB of RAM in use and 86MB of RAM free. Sounds pretty awful to me.


Sounds normal to me - I don't expect to see any free RAM unless I've just
- exited a large process
- deleted a large file with large cache footprint
- released a large chunk of RAM by other means (mmap, madvise, semctl, etc).


I haven't run NetBSD on a desktop for a while now, but I still think 
12GB is a lot of memory in use. Maybe I'll get a new MacBook when they 
start shipping 32GB Apple CPU ones and then put NetBSD on my current 
MacBook.



A big chunk of it is in file cache, which is unsurprising when reading
thru a 400GiB file...


Page activity lasts 20s and at 30MB/s that means you should have 600MB 
of file data active. Add 50% for inactive pages and that's still only 
900MB. I'm willing to bet money that zstd only reads each block of data 
once (sequentially in fact) and so it doesn't need any file data cache 
at all. File metadata is a different matter, but that probably stays 
active and there won't be much of it.
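
Spelled out as a tiny Python sketch, using only the figures in the paragraph above (a 20s activity window, 30MB/s, and 50% extra for inactive pages):

```python
def file_cache_estimate(read_rate_mb_s, active_window_s, inactive_fraction):
    """Return (active_mb, total_mb) for a purely sequential reader."""
    active_mb = read_rate_mb_s * active_window_s
    total_mb = active_mb * (1 + inactive_fraction)
    return active_mb, total_mb


active_mb, total_mb = file_cache_estimate(30, 20, 0.5)
print(active_mb, total_mb)
```

That is 600MB active and 900MB in total, a long way short of the 12GB in use.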


I suspect that your vm.filemax is set to more memory than you have 
available for the file cache and once that happens anonymous pages start 
to get swapped out. My experience is that while anonymous pages sound 
unimportant, they are in fact the most important pages to keep in RAM. 
Thinking about it, they are the irreplaceable bits of all our running 
software.


Try setting vm.filemin=5 and vm.filemax=10. Really. I did it when 
processing vast amounts of files in CVS and it worked for me.


Out of curiosity, what are you doing with zstd? You mentioned backups. 
Is this dump or restore? dump implements its own file cache, which won't 
help with the memory burden.


"top -ores" will tell you what programs are using the most anonymous 
pages, which might help identify where all this memory pressure is 
coming from.


Cheers,
Lloyd


Re: High vm scan rate and dropped keystrokes thru X?

2021-07-25 Thread Lloyd Parkes
It has been a very long time since I had to look at UVM stuff, but 
luckily past me posted to 
https://mail-index.netbsd.org/tech-repository/2010/02/01/msg000364.html. 
Well done past me.


Copying from that post, I was using
  vm.anonmin = 10
  vm.filemin = 5
  vm.execmin = 5
  vm.anonmax = 90
  vm.filemax = 10
  vm.execmax = 30


On 25/07/21 5:37 pm, Paul Ripke wrote:

NetBSD 9.2, amd64, 16GiB RAM, quad core + hyperthreading.


Sounds normal enough.


  procsmemory  page   disks   faults  cpu
  r b  avmfre  flt  re  pi   po   fr   sr w0 w1   in   sy  cs us sy id
  0 2 12214336  86564 4043   0   0000 66 66 2415 9142 4588 0  3 97


That's 12GB of RAM in use and 86MB of RAM free. Sounds pretty awful to me.

What does top or vmstat -s say about pages active/inactive and 
anonymous/cached file/cached executable pages. This might give you a 
hint about where all your memory has gone and what it is being used for.


Cheers,
Lloyd



Re: build.sh live-image

2021-05-29 Thread Lloyd Parkes




On 30/05/21 8:55 am, Rhialto wrote:


Another thing I noticed is that /etc/rc.d/resize_disklabel looks at the
wrong MBR partition to check for NetBSD: it looks at partition 1 but
should look at partition 0. 


resize_disklabel is designed for use on the Raspberry Pi where the 
NetBSD partition comes after the FAT boot partition and so it is 
partition 1.


When building amd64 images, I've resorted to rewriting the 
resize_disklabel script from scratch.


Cheers,
Lloyd


Re: Problem reports for version control systems

2021-04-30 Thread Lloyd Parkes




On 30/04/21 9:50 pm, Joerg Sonnenberger wrote:

On Fri, Apr 30, 2021 at 05:31:53PM +1200, Lloyd Parkes wrote:

     ceph4% hg --version
     Mercurial Distributed SCM (version 5.3.2)


Please note that this is quite an old version and a lot of work on
improving both CPU time and memory use has been spent since then.


I have a pkgsrc from about the 9.0 days that I keep copying around for 
"reasons". Maybe it's time to upgrade.


Cheers,
Lloyd


Re: Problem reports for version control systems

2021-04-30 Thread Lloyd Parkes




On 30/04/21 8:36 pm, Hauke Fath wrote:

Out of curiosity: Do you use a ZIL SLOG* volume with that setup? I
remember cvs operations used to be a lot slower on spinning rust than
on SSD.


I'm not using a SLOG. I couldn't be bothered setting one up on my crash 
and burn systems. It doesn't seem to be too bad, except for when I try 
and run "rm -rf src".



Both from home (16 MBit DSL) and $WORKPLACE I am frequently running cvs
updates through filtering routers (pf(4) here), and basically never see
connection issues.


Germany is pretty much the opposite of New Zealand. It's close to 
everywhere, but its last mile access speeds are a bit infamous.


I'm running some tests on other local clients and against other CVS 
mirrors in the hope that I come up with a better characterisation of the 
problem than "it doesn't work".


Cheers,
Lloyd



Problem reports for version control systems

2021-04-29 Thread Lloyd Parkes

Hi all,
The problem reports people have in their emails are completely 
inadequate for trying to determine what is going wrong for people trying 
to access the NetBSD source.


Since I was the first person to post an inadequate report in this first 
batch, I'll go first at trying to do better. There are three sections to 
this email. First, a description of the host and network I am running 
this on. Second, a description of what I did, what I got, and what I 
expected to get. Third, I'll speculate on possible causes.


Host and Network


I am in New Zealand, which tends to have good internet access to the 
home and Long Fat Networks (LFNs) to the rest of the world.


The host is a Xeon E3-1241 v3 @ 3.50GHz with eight hyperthreads and 32GB 
RAM. It has a 128GB SSD for / and 4x4TB disks in a raidz zpool on /vol. 
All work is being done on /vol. The OS is NetBSD 9.99.79.


The network is a 1Gb/s LAN through to a smaller NetBSD router running 
NPF with MSS clamping enabled so that I can get Netflix. My ISP does not 
use CGN for my IPv4 connection. My IPv6 connection is tunnelled through 
to Hurricane Electric in Sydney, Australia.


    ceph4% cvs -v

    Concurrent Versions System (CVS) 1.12.13 (client/server)
    with CVSACL Patch 1.2.5 (cvsacl.sourceforge.net)

    ceph4% hg --version
    Mercurial Distributed SCM (version 5.3.2)

What I Did, Got, and Expected
=============================

In all cases, I pretty much copied instructions from the NetBSD web site.

    ceph4% cd /vol/src/cylc/src-CVS
    ceph4% export CVSROOT="anon...@anoncvs.netbsd.org:/cvsroot"
    ceph4% export CVS_RSH="ssh"
    ceph4% cvs checkout -A -P src
    cvs checkout: Updating src
    U src/BUILDING
    U src/Makefile
    U src/Makefile.inc
    U src/README.md
    ...
    U src/crypto/external/bsd/heimdal/dist/lib/wind/test-utf8.c
    [very long pause here]
    client_loop: send disconnect: Broken pipe
    cvs [checkout aborted]: end of file from server (consult above messages if any)


I expected the output to not pause too much and for the listing to run 
through to the end of src/usr.sbin without disconnecting and aborting.


    ceph4% hg clone -U https://anonhg.NetBSD.org/src src-hg
    [lots of progress stuff gets displayed here]
    applying clone bundle from https://cdn.NetBSD.org/_bundles/src/77d2a2ece3a06d837da45acd0fda80086ab4113c.zstd.hg


    adding changesets
    adding manifests
    adding file changes
    added 931876 changesets with 2425841 changes to 439702 files (+417 heads)

    finished applying clone bundle
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    adding changesets
    adding manifests
    adding file changes
    added 22612 changesets with 161374 changes to 110024 files (+6 heads)
    new changesets 26c8f37631b6:6f32acf2c5e1 (230 drafts)
    280602 local changesets published

This is what I expected.

Speculation
===========

I'm suspicious of the long pause in the CVS example. Maybe one end or 
the other of the CVS connection spent too long doing disk I/O or 
calculating stuff and then NPF timed-out the connection?
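
One way to test the idle-timeout guess would be to have ssh send 
keepalives during the long pause. The sketch below is only an 
illustration: add_keepalive_stanza is a made-up helper name, the 60 
second interval and the count of 5 are arbitrary choices, and it assumes 
cvs is run over ssh through CVS_RSH as shown above. ServerAliveInterval 
and ServerAliveCountMax are standard OpenSSH client options.

```shell
# Hypothetical helper: append a keepalive stanza for one host to an
# ssh client configuration file.
add_keepalive_stanza() {
    host=$1
    config=$2
    # Send a keepalive after 60 idle seconds; give up after 5 unanswered ones.
    printf 'Host %s\n    ServerAliveInterval 60\n    ServerAliveCountMax 5\n' \
        "$host" >> "$config"
}

# Example invocation (not run here):
#   add_keepalive_stanza anoncvs.netbsd.org "$HOME/.ssh/config"
```

If the checkout stops disconnecting with the stanza in place, an idle 
timeout somewhere on the path looks more likely than a server fault.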


Cheers,
Lloyd


Re: GCC 10 available for testing etc. in -current.

2021-04-18 Thread Lloyd Parkes



On 19/04/21 10:21 am, Lloyd Parkes wrote:


On 17/04/21 6:30 pm, Lloyd Parkes wrote:
I am using the Mercurial repository at https://anonhg.NetBSD.org/src 
for fetching the source code because it's nice and quick


So I'm now downloading the source code through CVS instead of 
Mercurial because nobody else seems to be having the same problems 
that I'm having building with GCC 10.


I found the answer.

Someone moved the bookmark called "@" in a way that Mercurial wasn't 
willing to blindly propagate into my local, central copy of the 
Mercurial src repository. This could easily have been done by part of 
the CVS to Mercurial tracking that is done rather than by a real person.


Mercurial then left the original "@" bookmark in place in my central 
copy of the src repository, created a bookmark called "@default" to 
track the upstream "@", and finally issued me a warning about a 
"divergent bookmark". I didn't know what that meant or that I was using 
the "@" bookmark. It turns out that "@" is the default thing to checkout 
if such a bookmark exists.


Of course, when cloning to a build box Mercurial just cloned my central 
repository without needing to warn me about divergent bookmarks (there 
was no divergence at this step) and I was unknowingly using an old 
version of the source tree with no possibility of it ever getting updated.


The fix is for me to explicitly checkout the branch "trunk" when I don't 
want a specific branch. And also to keep an eye on messages when I 
update my local, central copy of the Mercurial src repository.
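
As a rough sketch of that fix, assuming a local clone path passed in as 
an argument: update_to_trunk is a made-up helper name, while -R, pull, 
bookmarks and update are ordinary Mercurial commands. Listing the 
bookmarks on every update is just one way of keeping an eye out for 
divergent ones.

```shell
# Hypothetical helper: pull into a local clone, list its bookmarks so
# any divergent ones are visible, then check out "trunk" by name
# instead of relying on the default "@" bookmark.
update_to_trunk() {
    repo=$1
    hg -R "$repo" pull &&
    hg -R "$repo" bookmarks &&
    hg -R "$repo" update trunk
}

# Example invocation (not run here):
#   update_to_trunk /path/to/src-hg
```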


Now I can get back to building things with GCC 10 and then wondering how 
to get all the hardware booted.


Cheers,
Lloyd



Re: GCC 10 available for testing etc. in -current.

2021-04-18 Thread Lloyd Parkes



On 17/04/21 6:30 pm, Lloyd Parkes wrote:
I am using the Mercurial repository at https://anonhg.NetBSD.org/src 
for fetching the source code because it's nice and quick


So I'm now downloading the source code through CVS instead of Mercurial 
because nobody else seems to be having the same problems that I'm having 
building with GCC 10.


I've been running CVS for more than two hours now, and it has terminated 
with a broken connection 10 (make that 11) times so far. I'm rerunning 
with "cvs update", so I am making progress. Even so, CVS has only just 
got to paths starting with the letter "d".
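
The rerunning can be automated with a small loop. This is only a sketch: 
retry is a made-up helper name, the attempt limit is arbitrary, and 
there is no delay between attempts.

```shell
# Hypothetical helper: rerun a command until it succeeds or the
# attempt limit is reached.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example invocation from inside the partly checked out tree (not run here):
#   retry 20 cvs -q update -dP
```

Each rerun picks up roughly where the previous one broke off, so the 
loop only has to survive long enough for one pass to reach the end.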


As someone who has used CVS for almost 30 years now, I'm happy to say 
that CVS just needs to be retired. Have a flag day, check that the 
content in the HEADs all match up and then make CVS read-only. I know 
many of us are old and grumpy, but that model T isn't our every day 
drive any more.


Cheers,
Lloyd

p.s. The code in Mercurial seems to match what I can browse on the FTP 
site, so I really am at a loss as to why I can't build, so I am checking 
everything.




Re: GCC 10 available for testing etc. in -current.

2021-04-17 Thread Lloyd Parkes



On 15/04/21 2:19 pm, matthew green wrote:

the steps are fairly simple:

- update -current srcs
- build.sh with no -u (update), and set -V HAVE-GCC=10 as a
   option.  this ensures that everything is actually rebuilt
   with the new compiler.


I'm guessing that should be "-V HAVE_GCC=10", but even so I just can't 
get this to build. I always get the message "cc: error: CET_HOST_FLAGS@: 
No such file or directory". I'm going to see if I can find where this 
has come from. Does it ring any bells for anyone?


I am using the Mercurial repository at https://anonhg.NetBSD.org/src for 
fetching the source code because it's nice and quick and I'm building on 
a Linux Xeon I had lying around because it's also nice and quick.


I'm trying to download the -current source tarballs now, but I'm getting 
150KB/s and so I'm only up to external.tar.gz. I have also just deployed 
NetBSD current to another Xeon I have lying around and maybe the 
deployment will have actually worked.


I'll speculate that something is using the native sed/awk instead of the 
one from tools.


Cheers,
Lloyd



Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Lloyd Parkes
With some trepidation, I'm going to dip into this conversation even 
though I haven't read all of it. I don't have the mental fortitude for 
that. I have two suggestions, one short and one long.


Firstly, we could just have an rc.d script that checks to see if the 
system has /var/db/entropy-file or an rng device, and if not then it 
prints a warning and then generates some simplistic entropy with "ls -lR 
/ ; dd if=/dev/urandom of=/dev/random bs=32 count=1 ; sysctl -w 
kern.entropy.consolidate=1". The system owner has been warned and the 
system proceeds to run.
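
A very rough sketch of the check such a script might make is below. The 
function names are made up, it is not wired into the rc.d framework, and 
it only looks for the seed file; the check for an rng device is left 
out because I'm not sure of the best way to do it.

```shell
# Hypothetical check: succeed if a non-empty seed file exists at the
# given path.
have_seed_file() {
    [ -s "$1" ]
}

# Hypothetical boot-time step: report whether a seed is present and
# warn the system owner if it is not.
check_entropy_seed() {
    seed=${1:-/var/db/entropy-file}
    if have_seed_file "$seed"; then
        echo "entropy: seed file $seed present"
        return 0
    fi
    echo "WARNING: no entropy seed at $seed; falling back to weak seeding" >&2
    return 1
}

# Fallback from the text, to run only when check_entropy_seed fails
# (not run here):
#   ls -lR /
#   dd if=/dev/urandom of=/dev/random bs=32 count=1
#   sysctl -w kern.entropy.consolidate=1
```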


Secondly, we could fix what I see as the biggest problem with the new 
implementation right now, which is that it is unreasonably 
difficult for people to work out how to make their system go forwards 
once it has stopped. Note that making the system go forwards is easy, 
it's working out what to do that's hard. We can fix that.


The current implementation prints out a message whenever it blocks a 
process that wants randomness, which immediately makes this 
implementation superior to all others that I have ever seen. The number 
of times I've logged into systems that have stalled on boot and made 
them finish booting by running "ls -lR /" over the past 20 years is too 
large to count. I don't know if I just needed to wait longer for the boot 
to finish, or if generating entropy was the fix, and I will never know. 
This is nuts.


We can use the message to point the system administrator to a manual 
page that tells them what to do, and by "tells them what to do", I mean 
in plain simple language, right at the top of the page, without scaring 
them.


How about this..

"entropy: pid %d (%s) blocking due to lack of entropy, see entropy(4)"

and then in entropy(4) we can start with something like

"If you are reading this because you have read a kernel message telling 
you that a process is blocking due to a lack of entropy then it is 
almost certainly because your hardware doesn't have a reliable source of 
randomness. If you have no particular requirements for cryptographic 
security on your system, you can generate some entropy and then tell the 
kernel that this entropy is 'enough' with the commands

    ls -lR /
    dd if=/dev/urandom of=/dev/random bs=32 count=1
    sysctl -w kern.entropy.consolidate=1

If you have strong requirements for cryptographic security on your system 
then you should run 'rndctl -S /root/seed' on a system with a hardware 
random number generator (most modern CPUs), copy the seed file over to 
this system as /var/db/entropy-file and then run 'rndctl -L 
/var/db/entropy-file'.


This only needs to be done once since scripts in rc.d will take care of 
saving and restoring system entropy in /var/db/entropy-file across reboots."


We could even do both of these things.



Re: How to determine if graphics is supported by radeondrm?

2021-03-20 Thread Lloyd Parkes



On 20/03/21 3:41 pm, Lloyd Parkes wrote:
I also tried my Intel based laptop, but I only had an MBR image and HP 
seemed to have removed the old BIOS boot option in their newer 
firmware so I couldn't even boot the image.


I just tried the 9.1 image on my ASUS UX550V laptop and it did load a 
proper GLX renderer for the Intel GPU. I hadn't tried that laptop 
initially because it also has an nVidia GPU in it, but then I realised 
that NetBSD will probably ignore the nVidia GPU and it did.


None of the trackpads on my laptops worked with the NetBSD image I was 
using and I did not look into why.


Cheers,
Lloyd



Re: How to determine if graphics is supported by radeondrm?

2021-03-19 Thread Lloyd Parkes

Hi all,

On 18/03/21 10:03 am, Rhialto wrote:

For example, if I look at an "AMD Ryzen 3" cpu, which supposedly has
integrated graphics "AMD Radeon Vega 8, integrated GPU". Grepping -i for
"Vega" in src/sys/external/bsd/drm2/dist/drm yields no results; I take
it this is a bad sign?


I booted a live image of 9.1 that I found on my Ryzen 3 laptop and it 
ends up running the VESA driver with the "llvmpipe" OpenGL renderer. The 
Xorg log file shows that X thought about the AMD driver, but ended up 
using the VESA one. The log shows a long list of AMD GPU models, which 
looks like Xorg's way of saying that it doesn't know what model of AMD 
GPU I have.


Linux says the laptop has "AMD Ryzen 3 3300U with Radeon Vega Mobile 
Gfx" and Google tells me this is a Picasso/Radeon Vega 6.


I found the file amdgpu_device.c with the amdgpu_asic_name definition at 
the top. NetBSD has many, many entries missing from this list. :-(


I also tried my Intel based laptop, but I only had an MBR image and HP 
seemed to have removed the old BIOS boot option in their newer firmware 
so I couldn't even boot the image.


Lloyd



Re: pkgsrc build hang of python38

2021-03-01 Thread Lloyd Parkes



On 2/03/21 6:51 am, Riccardo Mottola wrote:

I thought your cited entropy fix was a "one time" fix, not to be reissued
again (when? at every reboot? kernel update?)

I don't think it is that user friendly to regularly "feed" this
entropy... and it gets lost at the next reboot? Could we have a script
on boot?


The entropy is preserved across reboot by the shutdown hooks, so you 
normally only need to do this once. If you have the bad habit (like I 
do) of rebooting some systems with the "reboot" command instead of 
"shutdown -r", then your entropy will not be preserved. Once I set up 
some entropy and started using "shutdown -r" everything worked fine.


Cheers,
Lloyd



Re: pkgsrc build hang of python38

2021-02-27 Thread Lloyd Parkes



On 28/02/21 9:50 am, Riccardo Mottola wrote:
python38 hangs during build. I tried several times, it hangs in 
exactly the same place again. I don't know of course where the issue 
comes from, since all is new now :)


Do you have any console messages? Maybe messages about blocking the 
python process because of not enough entropy?


That's a problem I encountered when I was building Python on current. If 
you are working on the console, then it's all obvious, but if you aren't 
on the console then it's all mysterious.


If that is your problem, then generate some entropy with something like 
"ls -lR /" and then run


dd if=/dev/urandom of=/dev/random bs=32 count=1
sysctl -w kern.entropy.consolidate=1

See 
https://mail-index.netbsd.org/current-users/2020/05/01/msg038495.html 
for all the details.


Cheers,
Lloyd




Re: zpool import lossage

2021-02-16 Thread Lloyd Parkes
This is all off the top of my head and while I use ZFS almost daily, not 
on NetBSD :-(, it's been a few years since I poked at the internals.


Your action of creating a symlink seems like a reasonable 
workaround/solution to your issue. You should be able to create the 
symlink in any directory and tell zfs import which directory to use.
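
Something like the following is what I have in mind. It is untested on 
NetBSD: make_import_dir is a made-up helper name and /tmp/zfs-import is 
an arbitrary directory, while -d is the usual zpool import option for 
naming a directory to search for devices.

```shell
# Hypothetical helper: create a directory holding symlinks to the
# given device nodes, leaving /dev itself untouched.
make_import_dir() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for dev in "$@"; do
        ln -s "$dev" "$dir/$(basename "$dev")" || return 1
    done
}

# Example using the names from the message below (not run here):
#   make_import_dir /tmp/zfs-import /dev/wd0f
#   zpool import -d /tmp/zfs-import pool1 tank0
```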


ZFS generally expects to use a whole GPT labelled disk and so I expect 
that BSD labelled partitions are not checked. Since almost everyone 
starts with ZFS by doing exactly what you did, adding information to the 
wiki is a good idea.


I think that /etc/zfs is used for maintaining certain system state 
information about imported pools across reboots and so I'm not overly 
surprised to see that it is empty after you exported the pool. It might 
just optimise the boot time import of the pool.


Cheers

On 17/02/21 2:39 pm, Greg Troxel wrote:

(I'm testing on 9, but am guessing this is similar on current and will
if anywhere be fixed there and not necessarily pulled up to 9.)

I'm starting to try out zfs.   So far I don't have any data that
matters.

On a 1T SSD I have wd0[abe] as root/swap/usr as an unremarkable netbsd-9
system, on an unremarkable amd64 desktop with 8G of RAM.

I created pool1 with wd0f, which is the rest of the 1T disk, about 850G,
not raid of any kind.  I created a few filesystems, changed their mount
points, changed their options, and mounted one over NFS from another
machine, and all seemed ok.  (Yes, I realize the doctrine that "use the
whole disk as a zfs component" is the preferred approach.)

I wanted to rename my pool from pool1 to tank0, for no good reason,
mostly trying to do all the scary things while the only data I had was a
pkgsrc checkout, but partly having seen Stephen Borrill's report of
import trouble.

So I did

   zpool export pool1

and sure enough all my zfs stuff was gone.

Then I did, per the man page:

   zpool import

and nothing was found.  After a bunch of reading and ktracing, I
realized that there is no record of the pool in /etc/zfs or anywhere
else I could find, and the notion is that zpool import will somehow find
all the disks that have zfs data on them, apparently by opening all
disks and looking for some kind of ZFSMAGIC.  But it looked at wd0 and
not the slices.  There was no apparent way to ask it to look at wd0f
specifically.  So I did

   cd /dev; rm wd0; ln -s wd0f wd0

which is icky, but then zpool import found wd0f and I could

   zpool import pool1 tank0

So this feels like a significant bug, and matches Stephen Borrill's
report.  I think we're heading to documenting this in the wiki, or at
least I am.

Does anyone think I have this wrong?
Is anyone inclined to do anything more serious?


Re: Running into unknown user errors while building -current for the Rasberry Pi

2016-12-31 Thread Lloyd Parkes

> On 29/12/2016, at 10:39 AM, Brian Buhrow  wrote:
> 
>   hello.  thanks for the feedback.  The earlier post about looking at
> the mtree source code gave a clue about what to do.  The build is running
> as root and not in a chrooted environment.  So, mtree is doing lookups
> against the host's password and group files.  My uid for user postgres was
> the same as for the build environment's user games.  While I contend that
> shouldn't matter and the cross build environment should be completely
> independent of the host environment, an immediate fix is to fix the uid for
> games on the host system and see what happens.  

I agree that you shouldn’t need to adjust the users on the build host and that 
this is a bug in the build system.

You don’t need to worry about chroot because you have specified a DESTDIR with 
-D. 

I expect that your builds will work fine without adjusting your local postgres 
user if you build with the -U option for an unprivileged (non-root) build. The 
reason for this is that instead of running all the chown and chmod commands, 
they are instead logged to a file during the build. This in turn means that the 
build system will never use the actual ownership of any of the files in DESTDIR 
when building a filesystem image.

Cheers,
Lloyd


Re: Running into unknown user errors while building -current for the Rasberry Pi

2016-12-31 Thread Lloyd Parkes
Hi Brian,
Can you provide the snippet of the log from a build with -N 2 that shows the 
command line for nbmakefs? If you have trouble coming up with a log small 
enough for you to feel comfortable cross-posting, feel free to email it to me 
directly, or use something like pastebin.

nbmakefs should have an option passed to it “-F mtree-specfile” which will 
be “the specification” that the error message is referring to, and looking at 
line 7774 should give some clue about what might have gone wrong.
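
For pulling that line out, a sketch along these lines might help. 
show_spec_line is a made-up helper name and the spec file path is 
whatever follows -F in your log; it relies only on mtree spec entries 
carrying the owner as a uname= keyword.

```shell
# Hypothetical helper: print one line of an mtree spec file, followed
# by the user name that line assigns (if any).
show_spec_line() {
    spec=$1
    lineno=$2
    line=$(sed -n "${lineno}p" "$spec")
    echo "$line"
    # Split the entry into keywords and keep only the uname= value.
    echo "$line" | tr ' ' '\n' | sed -n 's/^uname=//p'
}

# Example invocation (not run here):
#   show_spec_line /path/to/mtree-specfile 7774
```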

It would be interesting to see if an unprivileged build (with -U and not as 
root) exhibits the same error.

Cheers,
Lloyd


> On 29/12/2016, at 8:26 AM, Brian Buhrow  wrote:
> 
>   Hello.  I'm sure this problem is pilot error on my part, but I'm
> having trouble figuring out where the build process is picking up the user
> postgres in the destination build environment.  
> I'm hosting the build on a NetBSD-5.2/i386 system.  The build command looks
> like:
> 
> ./build.sh -D /var/tmp/netbsd-rpi -O /usr/local/netbsd/obj-rpi -j 4 -m evbarm 
> -a earmv7hf release
> 
>   I'm almost all the way through the build process, when I get the
> following error:
> 
> 
> . . . 
> 
> Populating 
> `/usr/local/netbsd/obj-rpi/releasedir/evbarm/binary/gzimg/armv7.img'
> Image `/usr/local/netbsd/obj-rpi/releasedir/evbarm/binary/gzimg/armv7.img' 
> complete
> === Populating ffs filesystem ===
> nbmakefs: unknown user `postgres'
> nbmakefs: failed at line 7774 of the specification
> *** [smp_armv7] Error code 1
> 
>   What is the process using for the specification?  
> Yes, the user postgres exists on the host system, but why is that getting
> picked up for the target build?
> 
> Any ideas on how to get more information or just solve the issue?  I see
> similar issues in the past, but most of them predate the build.sh script
> and there's nothing obvious in the BUILDING or UPDATING files that I
> noticed.
> 
> -thanks
> -Brian
> 
> 
> 




Re: Building on OS X - how?

2016-08-17 Thread Lloyd Parkes

> On 16/08/2016, at 7:41 PM, matthew green  wrote:
> 
>> I've been trying to find when this breakage occurred,
> 
> it happened when your port switched to GCC 5.  sorry :-)

Yeah. While looking for the “backend” directory I saw gcc.old and gcc. A quick 
look showed me they were 4.x and 5.x and thought to myself “yeah, that’ll do 
it”.

The logs should now be available at 

https://www.must-have-coffee.gen.nz/LOG-distribution.txt
https://www.must-have-coffee.gen.nz/LOG-cleandir.txt
https://www.must-have-coffee.gen.nz/LOG-genmatch.txt
https://www.must-have-coffee.gen.nz/libcpp-config.log
https://www.must-have-coffee.gen.nz/libiberty-config.log

Cheers,
Lloyd

Re: Building on OS X - how?

2016-08-16 Thread Lloyd Parkes

> On 12/08/2016, at 3:34 AM, Thor Lancelot Simon  wrote:
> 
> It's curious that this doesn't break the tools build, and doesn't
> prevent using the built tools to build a kernel!  If this can break
> the cross-build of the target compiler, I think we must have suddenly
> sprouted a rather serious instance of host/target confusion.

I’ve been trying to find when this breakage occurred, but I’ve given up because 
it seems to have been around for some months at least and other occasional 
compile problems with the tree aren’t helping me pick arbitrary dates in the 
past to build in.

I think you are right about the host/target confusion because the error 
messages refer to /usr/bin/cc and x86_64. I can also see Mach-O format object 
files outside the tools object directory, which I assume to be a sign of 
something going badly wrong.

I’ll try following mrg’s suggestion and see what I get.

Cheers,
Lloyd