Re: pkg_rolling-replace and mismatched version

2024-06-06 Thread Greg Troxel
Riccardo Mottola  writes:

> sudo pkg_add -v /usr/packages/All/perl-5.38.2.tgz
> pkg_add: no pkg found for '/usr/packages/All/perl-5.38.2.tgz', sorry.
> pkg_add: 1 package addition failed
>
> -rw-r--r--  1 root  wheel   18M May 17 18:32 perl-5.38.2.tgz

run "tar tfvz" on it and see what's inside.
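For reference, a sketch of what that inspection should show. The archive here is a scratch stand-in built on the spot (the real package path from the thread won't exist everywhere); in a valid pkgsrc package you should see +CONTENTS, +COMMENT, etc. at the top of the listing, and their absence would explain "no pkg found".

```shell
# Build a throwaway archive with a +CONTENTS entry and list it, the
# same way you would list the real perl-5.38.2.tgz.
tmp=$(mktemp -d)
echo "@name perl-5.38.2" > "$tmp/+CONTENTS"
tar -czf "$tmp/pkg.tgz" -C "$tmp" ./+CONTENTS
listing=$(tar -tvzf "$tmp/pkg.tgz")
echo "$listing"
rm -rf "$tmp"
```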

> Mystery is that lintpkgsrc did check:
>
> => Full dependency perl>=5.36: found perl-5.36.0
> But pkg_info told me it wasn't there.. how can this check pass?
>
> eowyn$ perl
> -sh: perl: not found
>
>
> I thus got perl from the prebuild binaries and installed it. Now
> lintpgsrc is running... it will take some time on this oldie. But I
> wonder what's going on.

Check that you have only one PKG_DBDIR (/usr/pkg/pkgdb, and not
/var/db/pkg).

Check for directories in PKG_DBDIR that are empty, or otherwise
irregular.

find / -name perl
and see if it is in /usr/local or some other place.
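Those checks can be sketched as shell, assuming the pkgsrc-default paths mentioned above:

```shell
# Look for more than one package database; only one PKG_DBDIR should
# exist (/usr/pkg/pkgdb is the modern default, /var/db/pkg the
# historic location).
for db in /usr/pkg/pkgdb /var/db/pkg; do
    if [ -d "$db" ]; then
        echo "package database present: $db"
    fi
done

# Empty per-package directories in PKG_DBDIR are a sign of damage.
if [ -d /usr/pkg/pkgdb ]; then
    find /usr/pkg/pkgdb -mindepth 1 -maxdepth 1 -type d -empty
fi

# Finally, hunt for stray perl binaries outside the expected prefix:
#   find / -name perl -type f 2>/dev/null
checked=yes
```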

I am pretty convinced that something is wrong with your system.


Re: pkg_rolling-replace and mismatched version

2024-06-06 Thread Greg Troxel
It sounds like you are not having a pkg_rolling-replace problem, in that
the underlying make replace does not work.  Please see the
pkg_rolling-replace man page where it asks that problems with a make
replace (that was ordered reasonably) not be reported as
pkg_rolling-replace issues.

I would suggest checking your pkgdb (that you only have one, rebuild,
rebuild-tree, check).
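Concretely, the pkgdb check might look like this (guarded so it is a no-op on systems without pkg_install; pkg_admin generally needs root to rewrite the database):

```shell
# Let pkg_admin sanity-check and regenerate the package database.
if command -v pkg_admin >/dev/null 2>&1; then
    pkg_admin rebuild       # regenerate the pkgdb from +CONTENTS files
    pkg_admin rebuild-tree  # recompute the dependency tree
    pkg_admin check         # verify checksums of installed files
else
    echo "pkg_admin not found; not a pkgsrc system"
fi
ran=yes
```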

Also, when you upgrade NetBSD from 9 to 10, you can run old packages,
but you cannot (soundly, in general) run a mixed package set.   So you
need to mark all packages rebuild=YES or do something equivalent.
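Marking every package for rebuild can be sketched like this; `pkg_admin set` is the mechanism pkg_rolling-replace consults, but treat the loop itself as an illustration:

```shell
# Mark all installed packages so pkg_rolling-replace rebuilds them.
# Guarded so this does nothing where the pkg_install tools are absent.
if command -v pkg_admin >/dev/null 2>&1 && command -v pkg_info >/dev/null 2>&1; then
    for pkg in $(pkg_info -e '*'); do
        pkg_admin set rebuild=YES "$pkg"
    done
else
    echo "pkg_install tools not found"
fi
marked=yes
```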



Re: Memory leaks in c/pthread libraries

2024-06-05 Thread Greg Troxel
Brian Marcotte  writes:

> Since upgrading to NetBSD-10, we've seen memory leaks in several
> daemons which use libpthread:
>
> gpg-agent
> opendmarc
> dkimpy_milter (python3)
> syslog-ng (in some cases)
> mysqld
> mariadbd
>
> In most cases, the daemons leak as they are used, but running this
> will show the leak just sitting there:
>
>   gpg-agent --daemon
>
> I opened PR#57831 on this issue back in January.
>
> Has anyone noticed this?

I am seeing the leak with gpg-agent and mariadbd.  I don't run the rest
on 10.

gpg-agent had a SIZE of 17G with a reasonable rss, after 2h21h of
uptime.  This is a machine where gpg is basically not in use.  So it's
pretty bad.

And, on a  RPI4 also with 10:

 4212 mariadb   850  3734M  130M poll/0145:16  0.00%  0.00% mariadbd

since 26May.  bad, but not nearly as much.



Re: Mailing List Oddity

2024-05-22 Thread Greg Troxel
Andrew Ball  writes:

> Why is there a regional-london mailing list but no regional-uk? -Andy

I would not assume there is any particular reason.

There is a regional-boston list but it has not, in the last many years,
had any useful traffic.  I think it got deleted but I'm not sure.   And
I was probably the moderator.  Overall, this is a clue!


Re: getconf LONG_BIT in NetBSD

2024-05-19 Thread Greg Troxel
Not that this is super helpful, but posix says:

  https://pubs.opengroup.org/onlinepubs/9699919799/

  All of the following variables shall be supported: 

The names of the symbolic constants listed under the headings
``Maximum Values'' and ``Minimum Values'' in the description of the
 header in the Base Definitions volume of POSIX.1-2017,
without the enclosing braces.


and, in limits.h:

  https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html

  LONG_BIT appears under "Numerical Limits", not max or min.

and therefore, your friendly but unhelpful language lawyer says that
NetBSD's behavior is not improper.

LONG_BIT is defined in limits.h.  It is straightforward to write a
program to print it, and run that from configure (at the expense of
cross).  It's also straightforward to add it to the getconf sources, and
rebuild, but that doesn't help packaging.

You could also insert a bit of shell that if uname is netbsd, looks at
uname -p and has a table.  gross, but pretty easy.
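That table-in-shell hack could look something like this; the MACHINE_ARCH names in the cases are illustrative guesses, not a complete list:

```shell
# Prefer getconf where it knows LONG_BIT; fall back to a uname table.
if long_bit=$(getconf LONG_BIT 2>/dev/null) && [ -n "$long_bit" ]; then
    : # getconf worked (e.g. on Linux)
else
    case "$(uname -p 2>/dev/null)" in
        x86_64|aarch64|sparc64|powerpc64) long_bit=64 ;;
        i386|i686|earm*|powerpc|sparc)    long_bit=32 ;;
        *)                                long_bit=unknown ;;
    esac
fi
echo "LONG_BIT=$long_bit"
```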

Maybe somebody else knows a better way.


Re: Please forgive a blatant plug: I reviewed v10 for the Reg

2024-05-01 Thread Greg Troxel
Liam Proven  writes:

> Step 1: a binary interoperability standard, so apps from any BSD can
> execute on any other BSD (on the same CPU architecture, obviously.)

This is not so much about the binaries as about the ABI for libc and
other core libs.   But I suspect this works better than you think, if one
arranges for the other libs and deals with ELF tags.

> Step 2: identify the core OS elements that are widely different, and
> those that are largely shared because they are upstream FOSS code.

All BSDs come from 386BSD and Net/2, said in a very broad brush way.

> Unique and different:
> * Kernels
> * LibC
> * Init daemon, maybe?
> * Packaging tools

It's not accurate to say that the kernels are unique.  There is a lot of
common code.  You can't really slice it that way, because what's common
and what isn't is not organized along the kinds of lines you are
listing.

> Largely common:
> * Shells
> * Coreutils?

coreutils is a term for a GNU package, and many/most of those
utilities in BSDs aren't from that.   Really, coreutils is the Linux
reimplementation of traditional utilities.

> * Console-level userland?
> * X11 server?
> * Other core servers, such as HTTPD, SSH, etc.?
> * Compilers

Lots of things are the same because they are from the same upstream,
e.g. OpenSSH.  But others are different.  Again, you can't really slice
it that way so easily.

> Mostly separate:
> * Desktop environments and window managers
> * Upstream apps such as editors, web browsers, office suites, etc.

Mostly these are the same upstreams, perhaps packaged differently.
emacs runs on all of them.  I'm not sure what your point is here.


I see the biggest benefit to adding kernel support for other BSD's
interfaces, and things like stealing updated zfs code from FreeBSD.
But that's all work, and people do work when they feel like it.


Re: Pkgsrc issue

2024-05-01 Thread Greg Troxel
Havard Eidnes  writes:

>> I think I may have struck a nerve.
>
> I'm not sure I understand what that's getting at.  It's not a
> particularly "sensitive topic" to me or us in general, I would
> think.  The fact that english is a second language for me may
> prevent me from interpreting this comment appropriately.

I don't think 'struck a nerve' fits either.  We aren't upset that Don is
choosing to run old software.  We're just pointing out that it has
issues and recommending upgrades.  And saying that we are done with 8,
so we're not going to spend time helping.

It doesn't bother me at all that someone is running 8.2.  I am only
bothered if they demand that others make things work anyway -- and
that's not happening in this case.

>> The consensus is that 8.2 is very old and I should upgrade.

As of today, it is formally desupported by NetBSD.

>> As a software developer, I understand completely.
>>
>> As a lowly user, I never want to upgrade *anything* that is
>> working - ever. (You _never_ trade non-working for working, you
>> trade the old bugs for a new set of bugs. ;-> )

Then why aren't you using the packages from when you first installed
your 8.2 system?  The problem is that you want some old software and
some new software.  It's fine to want new software -- in fact I'd say
it's not good to keep running unmaintained software.

There's a parallel in the GNU/Linux world where people run "LTS" that
has a very long lifetime, which I describe as "intentionally deciding to
run old software".   Free Software licenses give people the right to
make this choice, and just like GNU/Linux LTS you are welcome to run
8.2, or really even 1.6.1 :-)

But often people, having chosen LTS, want to build new software on that
old system, and this is not only logically inconsistent with their LTS
choice, but tends to run into trouble.

A fairly large number of these people expect the current releases of
packages to build on their ancient systems.  By ancient, I mean
"snapshots of 2014".  This causes problems for people that maintain
software (the upstream maintainers).

In pkgsrc, I take the view that if the upstream code doesn't build on a
platform when a human follows the upstream build instructions, that's
not a pkgsrc bug, but an upstream bug.  Certainly we have a lot of
workarounds for such bugs, but they are supposed to be filed upstream.

For me personally, I gave up on 8 about a year and a half ago, and will
likely have given up on 9 within a year.  That doesn't mean pkgsrc will
formally desupport it -- just that I won't be helping in any significant
way, other than to add cautions to release notes that people using 9 are
likely to have problems.




Re: $squid is not set properly

2024-04-30 Thread Greg Troxel


> Greg Troxel wrote:
>> Almost certainly, you have squid installed via pkgsrc, or you have a
>> leftover /etc/rc.d/squid because you used to.
>
> oh, got you! Squid was there. Actually, I really had the package
> installed and left unused. Possibly some testing. The logs indicated
> it did some runs in the past.
> I just removed it (no dependencies found) and instruction to remove
> squid user and such appeared. I cleaned everything up as per messages.
>
> Next reboot... all fine.

This is why I recommend paying attention and keeping things tidy.  My
friend Marie says it's good for software too, not just houses!


Re: $squid is not set properly

2024-04-30 Thread Greg Troxel
Riccardo Mottola  writes:

> Hello,
>
> on boot, I see this message:
> /etc/rc: WARNING: $squid is not set properly - see rc.conf(5).
>
>
> but I have no squid related lines in my /etc/rc.conf and I don't use
> squid, so I don't think I need it (or does the system need it?).

Almost certainly, you have squid installed via pkgsrc, or you have a
leftover /etc/rc.d/squid because you used to.

I generally recommend

  - going over your "keepable" (manually installed) packages and "pkgin
uk" those you don't actually want and then "pkgin ar"

  - unpacking the release etc and xetc sets e.g. to /usr/netbsd-etc and
then "diff -ur /usr/netbsd-etc/etc /etc" and *thoughtfully* reducing
differences to those that you desire, pausing to understand all
differences


One could argue that having a package installed that has an rc.d file
which is not enabled is not a bug and that no warning should be
generated.  The base system has defaults (/etc/defaults/rc.conf) so you
don't get this warning for base rc.d files.  Everything installed by
pkgsrc has to default to off (running daemons merely because of
installation, without configuration, is not ok).

All that said, the typical approach is to ignore these lines.



Re: OAUTH TOTP

2024-04-29 Thread Greg Troxel
Staffan Thomen  writes:

> It used to be that google authenticator didn't automatically back up
> your secrets, so you had to be very careful to copy them over when you
> got a new phone and if your old phone was unusable you were hosed.

> This has since been fixed, and it will back them up to the google
> cloud like any other app's private data.

As long as it's e2e so google can't read it, that's ok.


> I will leave any tinfoiling about backing up secrets to the cloud unsaid.

I think you're joking, but it's not fair to call it tinfoiling.  Putting
TOTP seeds in the cloud where the cloud provider can read them is like a
password manager with cloud storage that does not encrypt the passwords.
Except 2fa is supposed to be better than passwords.  So that's just not
a reasonable thing to do.  Arguably, a password manager should also be
encrypting the URLs, not just the passwords, as the set of places at
which you have accounts is also sensitive.  I suspect there's a problem
with that too.

> AndOTP is an opensource alternative, and I will second a vote for
> KeePassXC in general.

Yes, there are other open source TOTP apps, and yes you need to pay
attention to backups.

Also, my understanding is that bitwarden will store seeds and do TOTP, I
think if you have a paid cloud account or if you are selfhosting
(vaultwarden) -- but I haven't tried it yet.


Re: OAUTH TOTP

2024-04-29 Thread Greg Troxel
Benny Siegert  writes:

> The cheapest way to have TOTP is to install Google Authenticator on
> your phone.

Be careful when you choose a TOTP program that you are able to back up
the seeds yourself, and that the program does not send the seeds to the
cloud without adequate protection in the name of cross-device syncing.
Last I heard Google Authenticator was not ok, but maybe that has changed
and it is now impossible to sync without e2e encryption that is
inaccessible to google.

> Hopefully, you can use proper Security Keys too (WebAuthn and
> whatnot), in which case I highly recommend a Yubikey.

I also recommend yubikeys.


Re: Pkgsrc issue

2024-04-27 Thread Greg Troxel
Don Lee  writes:

> Mr Nestor has given me a good path for my needs. I’m wondering if I
> can apply any of my resources to TNF efforts. I was thinking of
> setting up a machine here, but I might do better by simply giving a
> machine to TNF. (The NetBSD Foundation?)
>
> NetBSD has been good to me. I’d like to return the favor.

Yes, TNF == The NetBSD Foundation.

I'm not on the board and can't really speak for TNF, but from helping
with pkgsrc and talking about bulk build setups, I know that hardware is
one thing, and a place to put it, power, network, remote hands, and
admin resources is much more.  You might ask he@ who is doing powerpc
builds if they could use more, but I would guess that's really not the
limit.  There is always a call for money which is used to pay people
sometimes, and to buy hardware.  (I don't know where it came from, but
the x86_64/i386 bulk build setup is recently massively improved with new
hardware, but it took a lot of work by many people to get it all done.)

It is very useful if you figure out what's wrong and fix it and send
patches.  For that, it's best if you update to 10.


Re: Pkgsrc issue

2024-04-27 Thread Greg Troxel
Don Lee  writes:

> I have extra static IPs, and several PPC Mac mini machines. I wonder
> how hard it would be for me to set one up with enough disk space to do
> bulk builds. Would that even be helpful? I imagine that to be useful,
> a machine would have to have some administrative massaging to set up
> users/ssh/ftp/etc.

You building your own packages is useful for you.  You are welcome to
make them available, and we list sets that are long-term maintained in
SEE_ALSO.  We have a rule that only packages built on machines
controlled by TNF members are posted on ftp.netbsd.org.

> I have not done the pkgsrc bootstrap for quite a while, and don’t know
> how much disk space I would need. Fortunately, NetBSD 8.2 on PPC seem
> to be very stable, so running for a couple of weeks on a build should
> be fine.

I meant that if you need a small subset of packages, then building those
yourself is reasonable.  If you try to build the entire set, it's
going to take a very long time.   For example, I build what I need for
earmv7hf-el myself, but my setup would take years to build the full set,
and it doesn't have enough memory.


Re: Pkgsrc issue

2024-04-27 Thread Greg Troxel
Don Lee  writes:

> I have a PPC Mac Mini running NetBSD 8.2. It’s stable and functional. It 
> serves me well.
>
> I have been using pkgin to install packages and update them with "pkgin 
> upgrade”.
>
> My recent attempts to upgrade have been ended by pkgin telling me:
>
> +mercy$ pkgin upgrade
> calculating dependencies...|
> glib2>=2.76.4nb1 is not available in the repository
> proceed ? [y/N]
>
> So I checked on glib2, and it really should be in the pkgsrc binary
> packages, but it is not there in the 8.2 ppc pkgsrc binaries. I also
> have amd64 machines and they have glib2.

NetBSD has machines to do builds for some architectures, basically
x86_64, i386, earmv[67]hf-el, and aarch64.

Beyond that individual developers do bulk builds and publish the
results.  These are almost all "slow architectures", which means that
the available compute resources are such that it takes a long time (more
than 2 weeks) to complete a bulk build.  For many, it takes a whole
quarter.

Thus the build currently pointed to for macppc/8 is 2023Q4.  You are
welcome to look at the ftp server and choose a build yourself -- but
there isn't anything newer.

> How should I go about reporting this issue and hopefully getting it fixed.

You have reported it :-)

There are a lot of reasons a package might not succeed:

  bulk build ran out of time that quarter before the hardware is used to
  start the next quarter

  hiccup or resource issue on the build system

  the package actually won't build on the branch

The place to find bulk build reports is the pkgsrc-bulk mailing list,
but I am not able to find the report for this build with quick search.

So, you could check out pkgsrc, bootstrap it to a scratch prefix, and
try to build glib2, see if it succeeds, and if not fix it.
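As a sketch of that, with example paths (a pkgsrc checkout at /usr/pkgsrc and a throwaway prefix; `--unprivileged` assumes a non-root build):

```shell
# Bootstrap pkgsrc into a scratch prefix and try building the one
# package.  Guarded: does nothing without a pkgsrc checkout.
if [ -d /usr/pkgsrc/bootstrap ]; then
    (cd /usr/pkgsrc/bootstrap &&
     ./bootstrap --prefix=/tmp/scratch-pkg --unprivileged)
    (cd /usr/pkgsrc/devel/glib2 &&
     /tmp/scratch-pkg/bin/bmake package)
fi
done_sketch=yes
```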

However:

  we no longer maintain 2023Q4, now that 2024Q1 was created

  pkgsrc no longer supports NetBSD 8, more or less right now, as NetBSD
  8 is formally EOL on Tuesday.  (The notion of "supports" is a bit
  funny, as people fix what they want, regardless.)

> I *believe* I could fix this by upgrading NetBSD to
> 9.0+. Unfortunately, that would be hard for me, at least now.

Whether there are binary package sets for various versions is a question
you might want to answer, but in general, you are now overdue for an
upgrade.

This query isn't, as far as I know, easy to run via the web, but in
case it helps:

$ ls -l pub/pkgsrc/packages/NetBSD/powerpc/*_20*Q*/All/glib2-2*
-rw-r--r--  1 he  netbsd  5353306 Dec 18 08:33 
pub/pkgsrc/packages/NetBSD/powerpc/10.0_2023Q3/All/glib2-2.76.5.tgz
-rw-r--r--  1 he  netbsd  5301256 Oct 24  2023 
pub/pkgsrc/packages/NetBSD/powerpc/8.0_2023Q3/All/glib2-2.76.5.tgz
-rw-r--r--  1 he  netbsd  5372926 Oct 23  2023 
pub/pkgsrc/packages/NetBSD/powerpc/9.0_2023Q3/All/glib2-2.76.5.tgz
-rw-r--r--  1 he  netbsd  5435273 Feb 12 06:31 
pub/pkgsrc/packages/NetBSD/powerpc/9.0_2023Q4/All/glib2-2.78.1nb1.tgz

which shows that 9 has glib2 for 2023Q4, and that there isn't a 10.0
build for 2023Q4 on the ftp server.

You of course can build your own packages.  That might be more
reasonable than you think, especially if you are operating in a
lightweight old-school manner.


Re: cryptic pkgin SSL cert error

2024-04-23 Thread Greg Troxel
David Brownlee  writes:

> Do you have security/mozilla-rootcerts-openssl installed? (which
> should provide a full set of certs in /etc/openssl). Alternatively
> what do you have in /etc/openssl
>
> For netbsd-10 /etc/openssl is populated by the OS, but doing that
> would be a breaking change on netbsd-9, however it may be that the
> latest pkgin is enforcing SSL certificates by default on netbsd-9
> which would be... unhelpful in this case

I don't see it as unhelpful -- doctrine has always been that the sysadmin
should choose which CAs to configure as trust anchors.  In 10, that's
still more or less doctrine, except the default set is mozilla (or ish)
rather than the empty set.  If you haven't set up trust anchors, lots of
things are troubled.



Re: Mail delivery from Postfix to remote IMAP

2024-04-22 Thread Greg Troxel
Rhialto  writes:

> The trouble with plain forwarding is that my mail server's domain name
> doesn't match the domain name in the From: header, and doesn't match the
> envelope FROM domain, and it doesn't match the SPF policy of the sender
> domain etc etc. Those are things that are checked by DKIM/DMARC/SPF.

DKIM checks the signature.
SPF checks the sending server.
DMARC doesn't check anything, but specifies that a message should be
disfavored unless either DKIM or SPF passes.

Not modifying the message is exactly the right thing to do.

> And you can't change the From: header because that is changing the mail
> (and invalidates the DKIM signature), and neither can you change the
> envelope FROM address because bounces (as far as they happen) won't work.

It's bad to change either, regardless.

> Unfortunately DKIM is designed to break forwarding... I can't think of a
> way to change an email message to make it DKIM-compliant.

You can't; that's the point.

> Mailing lists can get away with changing the From: header to something
> like "l...@example.org (Rhialto via Example-List)" (and that's already
> an ugly thing to do) but that's not an option for individual mails.

I don't think they get away with it.  They do it anyway and people that
understand standards tell them they are doing it wrong.  But their
internet license is not revoked and they aren't jailed, if that's what
you mean by get away.


There's something else, which is that spam filtering is a local call, so
you can't reason "if I do X it will be ok".  It might or might not be,
and it can change in the future.

Because of this, I think delivering to IMAP via some kind of IMAP client
delivery agent is reasonable.

The other thing to do is to tell them that they have an account on your
domain, and they can IMAP to you to get mail, and use submission to your
server to send mail, and that's that.


Re: How to remove old libraries ?

2024-04-03 Thread Greg Troxel
BERTRAND Joël  writes:

>   My server runs NetBSD for a very long time (if I remember, first
> installation was done with NetBSD 4.0). Motherborad was changed, raid1/5
> volumes also... But installation was done a long time ago and upgraded
> from sources.

Many others have the same sorts of feelings.

  > ls -ltr /usr/lib/libstdc*
  -r--r--r--  1 root  wheel   342941 Jul 13  2003 /usr/lib/libstdc++.so.4.0
  lrwxr-xr-x  1 root  wheel   16 Jul 13  2003 /usr/lib/libstdc++.so.4 -> 
libstdc++.so.4.0
  -r--r--r--  1 root  wheel   915503 May  7  2006 /usr/lib/libstdc++.so.5.0
  lrwxr-xr-x  1 root  wheel   16 May  7  2006 /usr/lib/libstdc++.so.5 -> 
libstdc++.so.5.0
  -r--r--r--  1 root  wheel  1118243 Nov 17  2014 /usr/lib/libstdc++.so.6.0
  lrwxr-xr-x  1 root  wheel   16 Nov 17  2014 /usr/lib/libstdc++.so.6 -> 
libstdc++.so.6.0
  -r--r--r--  1 root  wheel  1312289 Oct 27  2018 /usr/lib/libstdc++.so.7.3
  lrwxr-xr-x  1 root  wheel   16 Oct 27  2018 /usr/lib/libstdc++.so.7 -> 
libstdc++.so.7.3
  -r--r--r--  1 root  wheel  1458944 Apr 18  2020 /usr/lib/libstdc++.so.8.0
  lrwxr-xr-x  1 root  wheel   16 Jan  2  2022 /usr/lib/libstdc++.so.8 -> 
libstdc++.so.8.0
  -r--r--r--  1 root  wheel  5774150 Mar 31 18:19 /usr/lib/libstdc++_p.a
  -r--r--r--  1 root  wheel  5459714 Mar 31 18:19 /usr/lib/libstdc++.a
  -r--r--r--  1 root  wheel  2417272 Mar 31 18:19 /usr/lib/libstdc++.so.9.0
  lrwxr-xr-x  1 root  wheel   16 Mar 31 18:19 /usr/lib/libstdc++.so.9 -> 
libstdc++.so.9.0
  lrwxr-xr-x  1 root  wheel   16 Mar 31 18:19 /usr/lib/libstdc++.so -> 
libstdc++.so.9.0

>   Now, I have some trouble to build Libreoffice. I have found that this
> bug comes from old libraries (libstdc++7). Effectively, /lib and
> /usr/lib contain a lot of old shared libraries. How can I purge these
> directories without doing a mistake ?

The basic answer is "carefully".

But I would suggest that after you have done an install/upgrade
recently, you can do something like

  cd /usr/lib
  find . -name \*.so\* -mtime +365 > /tmp/OLD
  more /tmp/OLD
  cat /tmp/OLD | xargs ls -ltr | more
  # think, and make sure you aren't deleting libc, and that rescue is ok
  # and that you could recover your system
  tar cfvz ~/usr-lib-prune.tgz `cat /tmp/OLD`
  cat /tmp/OLD | xargs rm

because the important thing is to be able to recover without too much
pain if you do make a mistake.

Then you can look at /lib and you can be more aggressive.

Note that this will break compat of old binaries.  But that may be ok.




Re: changed major shlib versions in netbsd-9

2024-03-26 Thread Greg Troxel
Martin Husemann  writes:

> On Tue, Mar 26, 2024 at 03:46:21PM -0400, Greg Troxel wrote:
>> Or, do we claim that these libs are private to bind, and thus this is
>> not an ABI change?
>
> We do, but it is phishy. There was a recent discussion to move it to
> some more private directory like /usr/lib/bind/lib*.so (especially to
> avoid collisions with bind from pkgsrc).

So:

  - this was thought about and believed within the rules
  - I should just delete the destdir and continue
  - we believe that I will then have no problems

?

Is that right?


changed major shlib versions in netbsd-9

2024-03-26 Thread Greg Troxel
Perhaps my build is messed up, but I just updated along netbsd-9 and
netbsd-10 and rebuilt.

On -10, I got new shlib versions for bind and unbound libs.  That's ok
because 10 is not yet released.

On -9, I see in my destdir:

  -r--r--r--  1 gdt  wheel  143494 Mar 26 12:59 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9.a
  lrwxr-xr-x  1 gdt  wheel  16 Mar 26 13:00 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9.so -> libbind9.so.21.0
  lrwxr-xr-x  1 gdt  wheel  16 Jan 30 12:49 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9.so.13 -> libbind9.so.13.0
  -r--r--r--  1 gdt  wheel   74720 Jan 30 12:49 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9.so.13.0
  lrwxr-xr-x  1 gdt  wheel  16 Mar 26 13:00 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9.so.21 -> libbind9.so.21.0
  -r--r--r--  1 gdt  wheel   95744 Mar 26 12:59 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9.so.21.0
  -r--r--r--  1 gdt  wheel  147542 Mar 26 12:59 
/usr/obj/gdt-9/destdir/amd64/usr/lib/libbind9_p.a

which makes me think we had a pullup with an ABI change.

I think this is tickets 1804, 1805.

Or, do we claim that these libs are private to bind, and thus this is
not an ABI change?

(I have not tried to boot this system - I'm at the stage where the build
failed because there are old files in destdir.  That's  not an error in
current, but it's irregular in 9.)


Re: How Does One Build Libdvdcss?

2024-01-07 Thread Greg Troxel
Glad it was useful.

Not that you did, but note that TNF asks that help to locate this
program not be provided in PRs or commits, and it follows that it
should not be provided on the mailing list either.

And if you don't like this, write to your congresscritter.   Since
you're in .il.us, this is your problem too...


Re: How Does One Build Libdvdcss?

2024-01-07 Thread Greg Troxel
"Jay F. Shachter"  writes:

> How does one obtain libdvdcss on NetBSD?  If I am not mistaken, the
> procedure involves:
>
>cd /usr/pkgsrc/multimedia/libdvdcss
>make install clean
>
> But (also if I am not mistaken), one must first make some sort of
> change in /etc/mk.conf before issuing the above commands.  What is the
> procedure?  As always, thank you in advance for any and all replies.

I'm not going to give you advice about this package.

General advice for your next similar situation:

1) Instead of asking an open-ended question (that makes it seem like you
have not tried to figure things out), try to do what you want to do, see
what happens, try to resolve problems, and then if/when you are stuck,
write and in addition to mentioning your goal, say what you did and what
happened.  When you do, include the key lines from output; don't
summarize them.

2) When wondering about a package, read the files in the package
directory.  I realize that if you haven't learned BSD make and pkgsrc
development the code is hard to read, but often it is possible to
understand something anyway.  Sometimes people working on packages even
include comments.  This might be one of those times, and it might not!


Re: Network and port redirection with QEMU not working with package compiled on 10.0_RC1

2024-01-03 Thread Greg Troxel
I really doubt reinstalling is necessary.

When you upgrade packages, make sure you have every single package from
a consistent build - same branch, same OS version.

diff your /etc from unpacking the etc.tgz and xetc.tgz sets someplace
else.  Understand the differences.   I try to minimize them if I don't
intend them to differ.

Check your firewall configs extra carefully.


firewall by mac address, ignore in dhcpd?

2023-12-15 Thread Greg Troxel
I have a system with a wm(4) interface, and a vlan.  I have wifi where
one ssid goes on trunk and one goes on a specific other vlan tag,
configured as vlan0.  dhcpd serves one subnet to wm0 and another to
vlan0.

For reasons that are not clear, I am seeing packets from hosts that
should be on the vlan also appear on wm0, and I want dhcpd to ignore
those.   I think this may be a Unifi bug.

I dimly remember there was a facility to firewall by mac address, but I
can't find it now in ipfilter.  I don't see it in npf either.  But, that
might block it from the stack, not dhcpd which at least used to use bpf.

In dhcpd, I can ignore by mac address, globally.  And I can 'deny' in
the pool for wm0.  But I need these hosts to get addrs on vlan1.  If I
deny in wm0, then they get NAKs for "no address in pool" and I want them
to be ignored.

So:

  any way to firewall by mac addr?

  any way to have dhcpd ignore by mac on one subnet but not the other?



Re: How big should wd0e (/var) be

2023-12-14 Thread Greg Troxel
Benny Siegert  writes:

> On Wed, Dec 13, 2023 at 3:52 AM  wrote:
>> p.s. NetBSD does have ZFS, but I haven't tried it, and I seem to recall
>> some discussion of stability issues.
>
> I have been using it for a few years, never had any stability issues.
> However, root on ZFS remains a bit challenging IIRC. I put ZFS on /usr
> instead, which requires having a minimal /usr that is used during
> boot.

There are definitely stability issues.  I have had two machines lock up
hard, multiple times, with netbsd-10 and pool with a single disk.  It
seems to take lots of activity (pkgsrc builds with elevated MAKE_JOBS)
and the system running low on RAM.  It smells to me like a locking
error that it usually gets away with.

One of the systems that locks up has 32G of RAM.  But I run firefox on
it, and now I tend to exit firefox periodically and that seems to
help.  I say seems, because this is not reproducible at will and thus
hard to reason about.




Re: How big should wd0e (/var) be

2023-12-11 Thread Greg Troxel
Brett Lymn  writes:

> Sure, there are some instances where separating /var and /var/tmp and
> even /var/mail are a good thing but it really isn't a must do.

Agreed; I didn't mean to say it was mandatory.  I was objecting to "it's
never a good idea".

>> correct size for /var depends on what you put in it!
>
> Yes, that is the problem and it is really painful when you undershoot or
> even overshoot on other partitions and find you really need that space
> elsewhere.

Yes, but if / isn't the whole disk or really large, you run into the
same space issue with / that you did with /var.

On my current, non-space-challenged system which has 32G of RAM and a 4T
SSD, I have
  2GB /
  40 GB swap
  8 GB /var
  64 GB /usr
  rest zfs pool


/var would have 2G usage but for an unusual reason it is just over 3G.
/usr (which has /usr/pkg and /usr/pkg/pgsql, and about 1400 packages) is
at 20G.  And no swap is used.   Still, saving 50G wouldn't really change
my zfs pool size.

If you are tight on space, then you have to think harder.


Re: How big should wd0e (/var) be

2023-12-11 Thread Greg Troxel
Benny Siegert  writes:

> The correct size is "not on its own partition". Why not have /var
> along with / on the same filesystem?

That's a matter of opinion.   Traditional practice was to separate it,
to protect / from filling up, and to keep fs churn on / down for greater
likelihood of things being ok.

correct size for /var depends on what you put in it!


Re: pkgconf freetype and flags

2023-11-29 Thread Greg Troxel
nia  writes:

>> > we see that for freetype -Wl,-R/usr/pkg/lib is missing and this causes
>> > me various issues during configures and builds.
>> 
>> This is the typical bug.
>> 
>> You can look at the freetype2 package and the file from upstream and how
>> LDFLAGS get substituted in.
>> 
>> Basically things are often linuxy and assume that everything goes in
>> /usr and there is no need to set rpath.
>
> Specifically:
>
> - RPATH is not portable. Some OSes don't support it at all.

It's true that it's complicated.  But when generating a .pc file the
right thing to do is to respect the local conventions.  I view including
rpath (if not /usr) as the standard approach.

>   Many distributors recommend against it and advise leaving
>   library path configuration to the system's administrator.

True, but that doesn't make their opinion correct.  These are also
people that put everything in /usr and thus are unconcerned that their
opinion results in a non-working setup for people that don't do that.

> - In pkgsrc we use PKGCONFIG_OVERRIDE to automatically insert
>   RPATH to .pc files on supported systems.

Thanks; I was not aware of that.  It would be nice if Someone(tm) added
it to the pkgsrc guide.

python311 at least seems to need this fixed.  I had run into linking
with the pc file not working and just hand patched the installed
version.  (This is for building something outside of pkgsrc.)
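For the archives, the mechanism nia mentions is a one-line addition to
a package Makefile; the template filename below is a guess, and the
real name varies per package:

```
# sketch of a pkgsrc package Makefile fragment; the .pc template
# name here is hypothetical -- use the one the package actually ships
PKGCONFIG_OVERRIDE+=	freetype2.pc.in
```

As I understand it, pkgsrc then rewrites the generated .pc file so
that its Libs line carries the platform's rpath flag (-Wl,-R/usr/pkg/lib
on a default NetBSD install).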


Re: X on 10.0 RC1 is unusable on my laptop

2023-11-25 Thread Greg Troxel
Mike Pumford  writes:

> On 24/11/2023 16:28, Patrick Welche wrote:
>> I notice that my artifacts (8th gen) disappear really quickly /
>> hardly exist if the system is under load. Otherwise, they also self
>> clear after about a second. Someone (tnn? rvp?) mentioned the
>> possibility of cache lines not being flushed in this context.
>> 
> That matches my experience exactly. They completely disappear if the
> system is under any sort of load. I'd always assumed it was a cache
> thing and got exactly the same behaviour on 9-STABLE as well.

Do we think that the GPU reads from RAM separately from the CPU cache,
so that bits written to a write-back cache are not in memory and thus
not rendered correctly?  Is there some way to flush that cache after
writing, or to set that region to write-through, or ?


Re: recent strange sudo behavior, probably due to /dev/pts

2023-11-17 Thread Greg Troxel
Greg Troxel  writes:

> After a lot of investigating including writing a quick read/printf
> program to examine the sudo timestamp files, the problem appears to be
> that the timestamp records are "TS_PPID" rather than "TS_TTY".  The
> parent is something deep in make, and thus different every time.  So
> this is not really a pkgsrc issue.

Followup for the archives.

There are two problems that combine for hard-to-debug bad behavior.

1) devname(3)

devname(3) converts a device major/minor to a pathname.  sudo uses this.
However, it uses /var/run/dev.cdb, which is created by dev_mkdb(8) which
runs at boot.

So if you boot your system with /dev/ttyp*, mount ptyfs, rm /dev/ttyp*,
and log out and back in, you will still get e.g. /dev/ttyp5 from
devname.  (I saw this in the sudo logs.)


2) sudo bugs

sudo's default is 'tty'.  It gets the major/minor.  But it doesn't just
store that in the timestamp file.  It calls devname, and if it can't
stat the result, *silently*, even with respect to debugging statements,
switches to ppid mode.  If it can stat, it stores the dev_t in the file.

See plugins/sudoers/timestamp.c, line 415, where it checks that stat
succeeds on the path.

The bad bug is that failure to stat a path that should be valid should
be fatal, not a silent flip to ppid.  The lesser bug is that if the
thing that matters is the dev_t, it should just use that.



Once realizing, re-running dev_mkdb caused devname to return /dev/pts/5
instead, and things work as I would expect.
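To see the mapping concretely, here is a rough Python stand-in for
devname(3) that scans /dev directly instead of consulting the
/var/run/dev.cdb cache; the directory list and search order are
illustrative only:

```python
import os
import stat

def devname_scan(rdev, dirs=("/dev/pts", "/dev")):
    """Map a raw device number back to a path by brute force."""
    for d in dirs:
        try:
            names = os.listdir(d)
        except OSError:
            continue                      # directory absent: skip it
        for name in names:
            p = os.path.join(d, name)
            try:
                st = os.lstat(p)
            except OSError:
                continue                  # node vanished: skip it
            # character device with a matching dev_t is our answer
            if stat.S_ISCHR(st.st_mode) and st.st_rdev == rdev:
                return p
    return None

if __name__ == "__main__":
    rdev = os.lstat("/dev/null").st_rdev
    print("/dev/null maps back to", devname_scan(rdev))
```

If the cdb cache is stale, devname(3) and a direct scan like this can
disagree, which is exactly the situation sudo tripped over.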







Re: recent strange sudo behavior, probably due to /dev/pts

2023-11-17 Thread Greg Troxel
tlaro...@kergis.com writes:

> FWIW: In the "tty" entry there is: "If no terminal is present, the behavior
> is the same ass ppid." Could it be that the tmux instances are not
> recognized as terminal / tty anymore ?
>
> In doc/CHANGES:
>
>   tmux(1): Import version 3.3a [wiz 20230628]

Interesting and a good point.  I just ran sudo in an xterm without tmux
and the entry in the cache file is still type 3.  But great to take tmux
out of the picture; I should have tried that before posting.



recent strange sudo behavior, probably due to /dev/pts

2023-11-17 Thread Greg Troxel
My system is netbsd-10.  It was installed around 2003 and has been
updated since then, both to each NetBSD stable branch, and to new disks
usually via dump/restore and sometimes rsync.  Other than the problem I
am describing in this message, occasional hangs that I blame on ZFS, and
X display glitches, it is working well.

I have logged in to ttyE0, and ran "xinit .xsession" which starts xfce4
and I am in an xterm with tmux.

I had recently noticed that my system did not have /dev/pts, so I grabbed
the line from /etc/fstab on a more recently-installed system.  I then
removed the /dev/ttyp* type entries.  This messed up tmux but logging
out and starting fresh was ok, and this was not surprising.  Now the
system is apparently ok except for sudo/pkgsrc.

I have for a long time had in mk.conf:

  SU_CMD=sudo /bin/sh -c

and "make replace" has invoked sudo, with sudo respecting the
don't-ask-for-password timer as documented.  Recently this stopped
working.

After a lot of investigating including writing a quick read/printf
program to examine the sudo timestamp files, the problem appears to be
that the timestamp records are "TS_PPID" rather than "TS_TTY".  The
parent is something deep in make, and thus different every time.  So
this is not really a pkgsrc issue.

I have added:
  Defaults timestamp_type=tty




Has anyone else seen sudo refusing to use tty as a timestamp type?


sudoers(5) excerpt:

 timestamp_typesudoers uses per-user time stamp files for credential
   caching.  The timestamp_type option can be used to
   specify the type of time stamp record used.  It has the
   following possible values:

   global  A single time stamp record is used for all of a
   user's login sessions, regardless of the
   terminal or parent process ID.  An additional
   record is used to serialize password prompts
   when sudo is used multiple times in a pipeline,
   but this does not affect authentication.

   ppidA single time stamp record is used for all
   processes with the same parent process ID
   (usually the shell).  Commands run from the
   same shell (or other common parent process)
   will not require a password for
   timestamp_timeout minutes (5 by default).
   Commands run via sudo with a different parent
   process ID, for example from a shell script,
   will be authenticated separately.

   tty One time stamp record is used for each
   terminal, which means that a user's login
   sessions are authenticated separately.  If no
   terminal is present, the behavior is the same
   as ppid.  Commands run from the same terminal
   will not require a password for
   timestamp_timeout minutes (5 by default).

   kernel  The time stamp is stored in the kernel as an
   attribute of the terminal device.  If no
   terminal is present, the behavior is the same
   as ppid.  Negative timestamp_timeout values are
   not supported and positive values are limited
   to a maximum of 60 minutes.  This is currently
   only supported on OpenBSD.

   The default value is tty.

   This setting is only supported by version 1.8.21 or
   higher.



Re: network adapters, IP addresses, ports, domain names

2023-11-16 Thread Greg Troxel
st...@prd.co.uk (Steve Blinkhorn) writes:

> So can two different IP addresses on the same adapter each use the same
> port number each for its own distinct purposes?

It's not that IP addresses use ports.  It's that one progam can have a
listening socket bound to addra:p and another can bind to addrb:p.  As
far as I know, the logic in bind does not care if addra and addrb are
assigned to the same interface.  (It's not 'adaptor' as there could be
vlan interfaces on it, and there could be gif, tap, and so on.)

See also SO_REUSEADDR, but I think you don't want to go there.   You have
declined to explain your problem so I'll stop, but many problems like
this are from one program binding to *:p and then the other one can't.
Again, fstat -n is the way to see what is going on.



> I would assume they can since I run different web servers that use
> ports 80 and 443 in this way.

ports and addresses are not necessarily exactly parallel in general.
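The *:p collision mentioned above is easy to demonstrate; a minimal
Python sketch (the second bind fails with EADDRINUSE on NetBSD and
Linux alike):

```python
import errno
import socket

# First program binds the wildcard address on some port p ...
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("0.0.0.0", 0))               # port 0: kernel picks one
port = s1.getsockname()[1]

# ... and a second bind of a specific address on the same port loses,
# because the wildcard binding overlaps every local address.
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))
    result = "succeeded"
except OSError as e:
    result = errno.errorcode.get(e.errno, "failed")
print("second bind:", result)
s1.close()
s2.close()
```

With SO_REUSEADDR on both sockets the overlap would be permitted, but
that is rarely what a pair of unrelated daemons wants.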


Re: NetBSD as an NTP stratum 1 server

2023-11-16 Thread Greg Troxel
Manuel Bouyer  writes:

>> If the concern is to keep time sync when the Internet is down, 1 ms of
>> fuzz is ok.  If you are trying to build something to distribute time to
>> other people, and especially to be a public stratum 1, then it's not ok.
>
> Sure; my feeling (but I may be wrong) is that is this case the 1s NMEA 
> messages
> may be good enough for NTP to sync, and the PPS may not bring much.
>
> I have a setup where I only use the NMEA message with gpsd and ntpd, no PPS.
> ntpd has no problem to sync but I don't know how accurate it is (this host
> is not connected to internet). For this use case 1s accuracy is good enough :)

If within 1s is good enough, NMEA only is ok.

If you have a system that is well-connected  and you set up NMEA you will
find out that there are delays and that these vary by type of receiver.
You use 'time1' in ntp.conf to adjust this.  I have 3 devices that need
values of
  0.116
  0.0445
  0.072
to be close.

That is very coarse compared to 1 ms for PPS.  But indeed, if you have a
machine that is not connected at all, it's really hard to tell it is a
second off.

So basically you might think
  NMEA only: 200 ms
  USB PPS: 1 ms
  gpio PPS: 100 us

and to do better (or to know you are doing better) you need to really
pay attention.
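A sketch of what that looks like in ntp.conf, using the NMEA refclock
(driver 20); the unit number and the time1 value (the first device's
offset above) are illustrative:

```
# /dev/gps0 is unit 0 of the NMEA refclock, address 127.127.20.u
server 127.127.20.0 minpoll 4
fudge  127.127.20.0 time1 0.116    # measured per-receiver delay, seconds
```

The time1 value has to be measured against good references for each
receiver model; there is no way to compute it from first principles.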


Re: network adapters, IP addresses, ports, domain names

2023-11-16 Thread Greg Troxel
st...@prd.co.uk (Steve Blinkhorn) writes:

> In a situation where a NetBSD machine (9.2 amd64 if it matters) has
> multiple network adapters each with multiple IP addresses
> corresponding to diverse domain names, to what are port numbers
> uniquely attached?

That question is too vague to be answered.

When a program calls socket/bind/listen on say port 1234, it can end up
listening on

  INADDR_ANY:1234
  INADDR6_ANY:1234
  127.0.0.1:1234
  ::1:1234
  ip-address-1:1234
  ipv6-address-1:1234

or there can be multiple sockets.  So it depends.

fstat -n is very useful here.
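For instance, two sockets can listen on the same port as long as the
bound addresses differ.  This Python sketch uses 127.0.0.1 and
127.0.0.2; the latter works where all of 127/8 is loopback (e.g.
Linux), while on NetBSD you would substitute two addresses actually
configured on the host:

```python
import socket

a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(("127.0.0.1", 0))            # port 0: kernel picks one
port = a.getsockname()[1]
a.listen(1)

# Same port number, different address: no conflict, because a
# listening socket is identified by the (address, port) pair.
b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
b.bind(("127.0.0.2", port))
b.listen(1)

bound = (a.getsockname(), b.getsockname())
print("two listeners on port", port)
a.close()
b.close()
```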


Re: NetBSD as an NTP stratum 1 server

2023-11-16 Thread Greg Troxel
Brad Spencer  writes:

>> There is actually support for PPS with USB devices that put the pps
>> signal on one of the modem pins, and I'm using it.  I think that I added
>> it a few years ago, and it was pretty easy.  However, there are two

I did, and it's been 9 years.

>> issues:
>>
>>   almost no USB GPS receivers bring PPS out on DCD.  I have a GR-601W or
>>   some model number like that, with a ublox 6, that was a special run
>>   that wired PPS to DCD inside.  If you have wired up a USB/serial chip
>>   and a GPS chip, then you may have this too.  But if you just bought a
>>   "gps mouse" or dongle, it is highly unlikely.
>>
>>   There is 1 ms of fuzz on timing.
>
> Oh, very nice to hear that and I was unaware that support existed.  My
> last test on this matter was with gpsd and it didn't seem to find the
> defines needed to make that work and my quick look at the source was not
> in the USB tty code so I missed seeing the support.

Sounds like you found it, but in case anybody else is looking:
  src/sys/dev/usb/ucom.c
  revision 1.106 from 2014-07-25

and the 1 ms fuzz is noted in ucom(4).

> The use of DCD in this manner is like a GPIO pin, but as you say, the USB
> fuzz may get in the way as the signal is transported via a USB packet.

AIUI, gpsd does not work with PPS on NetBSD at all because it is written
to the linux extension of the PPS API, not the RFC-specified version
that NetBSD implements.  These issues are about TIOCMWAIT not being
implemented in pps code in general, not about any particular pps
provider.

ntpd's pps code works with with the ucom pps, or at least it did pretty
recently.  (My ntp setup has decayed, likely due to other issues, not
this.)



Re: NetBSD as an NTP stratum 1 server

2023-11-16 Thread Greg Troxel
Brad Spencer  writes:

> No, there is no support for the /dev/ttyXX based IOCTLs that glue a PPS
> signal to a TTY port [1].  If there is an output on your GPS device for
> a pure GPIO style PPS signal, something that is either 5v or 3.3v in
> nature and pulses once per second at a digital logic level, you can feed
> that into a GPIO pin and use gpiopps(4) to utilize the pulse-per-second
> that way [2].  I have been doing that with GPS modules for years with
> NetBSD and run 2 GPS Stratum 1 NTP servers and 1 WWVB Stratum 1 NTP
> server.

There is actually support for PPS with USB devices that put the pps
signal on one of the modem pins, and I'm using it.  I think that I added
it a few years ago, and it was pretty easy.  However, there are two
issues:

  almost no USB GPS receivers bring PPS out on DCD.  I have a GR-601W or
  some model number like that, with a ublox 6, that was a special run
  that wired PPS to DCD inside.  If you have wired up a USB/serial chip
  and a GPS chip, then you may have this too.  But if you just bought a
  "gps mouse" or dongle, it is highly unlikely.

  There is 1 ms of fuzz on timing.

> As for stability and accuracy... using just the USB data alone will
> yield very poor results, as has been mentioned.  That can, however, be
> used for a quick test with the NMEA driver that ntpd has, just don't be
> impressed by it.  By adding the digital PPS signal into the mix that
> will deal with the USB problems and you will get a good result once the
> device and ntpd stabilizes.  My modules present their NMEA output as
> digital tty (uart) signals that I hook to a FTDI chip and into a USB
> port... so the effect is very similar to what you are probably doing.  I
> also use ntpd which can deal with both a /dev/ttyXX NMEA device and
> /dev/gpioppsX PPS device at the same time.  In this arrangement, you
> won't be using shared memory and your output would look something like
> this:

It is true that using USB PPS has 1 ms of fuzz.  However, people say
"stratum 1" and make varying assumptions about what they care about.

If the concern is to keep time sync when the Internet is down, 1 ms of
fuzz is ok.  If you are trying to build something to distribute time to
other people, and especially to be a public stratum 1, then it's not ok.


(The advice about "use gpio" is good - I am just trying to clarify USB PPS.)


Re: [cross]compiling world and mk.conf

2023-11-08 Thread Greg Troxel
Martin Husemann  writes:

> Alternatively you can use conditionals in mk.conf, like:
>
> .if ${MACHINE} == "sparc"
> CFLAGS+= -mcpu=v8 -mtune=supersparc
> .endif
>
> or
>
> .if ${MACHINE_ARCH} != shark
> MKKDEBUG=yes
> .endif


I would strongly recommend the .if method and having one file.  I would
expect things you want are mostly the same and only slightly different.
Note also that you can .include from mk.conf and you can do that from
within an .if.
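For example, a conditional .include keeps the per-machine noise in its
own file (the path is made up):

```
# in mk.conf; /etc/mk.sparc is a hypothetical per-machine fragment
.if ${MACHINE} == "sparc"
.include "/etc/mk.sparc"
.endif
```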



Re: pkgconf freetype and flags

2023-10-19 Thread Greg Troxel
Riccardo Mottola  writes:

> who sets what pkgconf returns for the packages? Is it upstream or does
> it come from NetBSD?

For packages:
generally, upstream, and pkgsrc tries to fix it if it is not right.  It
is often not right...

For the base system: it's really up to NetBSD.

> I think there is an issue with freetype, missing the other part.
>
> Here:
> osgiliath: {64} pkg-config --libs freetype2
> -L/usr/pkg/lib -lfreetype
> osgiliath: {65} pkg-config --libs-only-other freetype2
> 
>
> if I compare it with nettle, which works fine:
> osgiliath: {72} pkg-config --libs nettle
> -Wl,-R/usr/pkg/lib -L/usr/pkg/lib -lnettle
> osgiliath: {73} pkg-config --libs-only-other nettle
> -Wl,-R/usr/pkg/lib
>
> we see that for freetype -Wl,-R/usr/pkg/lib is missing and this causes
> me various issues during configures and builds.

This is the typical bug.

You can look at the freetype2 package and the file from upstream and how
LDFLAGS get substituted in.

Basically things are often linuxy and assume that everything goes in
/usr and there is no need to set rpath.
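The symptom shows up directly in the Libs line of the installed .pc
file; a sketch of broken versus fixed for a /usr/pkg prefix:

```
# typical upstream output, fine only if libdir is on the default path:
Libs: -L/usr/pkg/lib -lfreetype
# what a NetBSD/pkgsrc install wants:
Libs: -Wl,-R/usr/pkg/lib -L/usr/pkg/lib -lfreetype
```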


Re: bad cksum when npf is started???

2023-10-19 Thread Greg Troxel
George Georgalis  writes:

> I've been struggling to resolve an odd networking issue.
> Initially I expected it was an npf.conf misconfiguration,
> but that conf has been pared down to almost nothing, yet
> when I load the config, networking stops. Now, I suspect the
> issue is a vswitch breaking tcp cksum, but I'm not sure why
> that would only be an issue when npfctl is enabled?

Are you using any hardware offloading?  If so, try to stop.

I do not understand "vswitch", precisely.  Is that code running on
NetBSD?


Re: Where are the 10.0 packages?

2023-09-28 Thread Greg Troxel
"Jay F. Shachter"  writes:

> I have just now been trying to install some packages onto my NetBSD 10
> system (which I don't often get to do, because it cannot see my
> wireless device, but today I was able to give it Internet access by
> connecting two laptops to an Ethernet hub, and making the NetBSD
> system a dhcp client of the other laptop), and I cannot, because the
> packages are no longer at
> http://ftp.NetBSD.org/pub/pkgsrc/packages/NetBSD/amd64/10.0/All where
> they used to be.  The amd64 directory now has nothing higher than 9.
> What do I need to put into my PKG_PATH variable to access the 10.0
> packages (when I am next able to do so, which won't be for a long
> while)?  Thank you in advance for any and all replies.

The NetBSD 10 branch (note that NetBSD 10.0 has not been released)
recently had an ABI break, so packges built before the ABI change no
longer worked.  We have withdrawn all pre-ABI-change packages as they
will not work with up-to-date 10.  And, because the point of 10 is to
help test for the upcoming 10.0 release, everyone running 10 should be
running up-to-date 10.

Packages from after the ABI break are available for aarch64 and
earmv[67]hf-el.  Someone is working on packages for 2023Q3 for amd64 but
I don't have a firm ETA.  Tomorrow is pretty much unthinkable and 30
days is pretty likely.  In between is much harder to talk about!


Re: i915 drm error

2023-09-03 Thread Greg Troxel
In that case, it seems there is something unusual about the monitor, and
that intelfb or X is misparsing it.  Either that or our code is just
buggy.  I would suggest

  Configure a modeline in X explicitly, so parsing the EDID is out of
  the equation.  Probably the fastest path to working.

  Add debugging to the edid parsing, look at the bits, and follow along
  with the spec and figure out what we are doing wrong, or if the
  monitor is wrong and we could make a more helpful choice in the face
  of bad data, or something else, and fix it.  The best path to helping
  everybody, and beware that I could be confused here.  But this is of
  course much harder.  This may need doing in intelfb and in X.

Note that the linux utility printed:

  #Extension block found. Parsing...
  #WARNING: I may have missed a mode (CEA mode 95)
  #WARNING: I may have missed a mode (CEA mode 97)
  #WARNING: I may have missed a mode (CEA mode 94)
  #WARNING: I may have missed a mode (CEA mode 96)
  #WARNING: I may have missed a mode (CEA mode 93)
  Modeline  "Mode 16" +hsync +vsync

which is a clue that something is irregular.  I would read the source
code for that too, as it might have a workaround with a good comment, in
the best case.  (Reminder that it is fine to steal concepts as they are
not subject to copyright (concept attribution remains polite of course),
but not to copy even comments from a GPL-licensed file to a BSD-licensed
file.  This assuming you would send a patch in to NetBSD.)


Re: i915 drm error

2023-09-03 Thread Greg Troxel
bsd...@tuta.io writes:

> Hi,
>
> I am on NetBSD 10 beta. I get a DRM error while booting. And when I start 
> Xorg, unsurprisingly I get just a mere cursor and hangs.

> 319 Sep  2 18:17:01 bsd /netbsd: [  30.8266723] intelfb0: framebuffer at 
> 0x90009000, size 3840x2160, depth 32, stride 15360
> 320 Sep  2 18:17:01 bsd /netbsd: [  30.9366715] 
> {drm:netbsd:pipe_config_infoframe_mismatch+0x40} *ERROR* mismatch in hdmi 
> infoframe
> 321 Sep  2 18:17:01 bsd /netbsd: [  31.2766713] 
> {drm:netbsd:pipe_config_infoframe_mismatch+0x4e} *ERROR* expected:
> 322 Sep  2 18:17:01 bsd /netbsd: [  31.6066705] i915drmkms0: 68 bytes @ 
> 0xa725f40bcbe8
> 323 Sep  2 18:17:01 bsd /netbsd: [  31.9366699] 81 00 00 00 01 00 00 00  03 
> 0c 00 00 00 00 00 00 | 
> 324 Sep  2 18:17:01 bsd /netbsd: [  32.291] ff ff ff ff 00 00 00 00  00 
> 00 00 00 00 00 00 00 | 
> 325 Sep  2 18:17:01 bsd /netbsd: [  32.6066684] 00 00 00 00 00 00 00 00  00 
> 00 00 00 00 00 00 00 | 
> 326 Sep  2 18:17:01 bsd /netbsd: [  32.9366677] 00 00 00 00 00 00 00 00  00 
> 00 00 00 00 00 00 00 | 
> 327 Sep  2 18:17:01 bsd /netbsd: [  33.272] 00 00 00 00   
>    | 
> 328 Sep  2 18:17:01 bsd /netbsd: [  33.606] 
> {drm:netbsd:pipe_config_infoframe_mismatch+0x6f} *ERROR* found:
> 329 Sep  2 18:17:01 bsd /netbsd: [  33.9366659] i915drmkms0: 68 bytes @ 
> 0xa725f40ebbe8
> 330 Sep  2 18:17:01 bsd /netbsd: [  34.249] 81 00 00 00 01 04 00 00  00 
> 00 00 00 00 00 00 00 | 
> 331 Sep  2 18:17:01 bsd /netbsd: [  34.6066645] 00 00 00 00 00 00 00 00  00 
> 00 00 00 00 00 00 00 | 
> 332 Sep  2 18:17:01 bsd /netbsd: [  34.9366643] 00 00 00 00 00 00 00 00  00 
> 00 00 00 00 00 00 00 | 
> 333 Sep  2 18:17:01 bsd /netbsd: [  35.229] 00 00 00 00 00 00 00 00  00 
> 00 00 00 00 00 00 00 | 
> 334 Sep  2 18:17:01 bsd /netbsd: [  35.6066625] 00 00 00 00   
>    | 
> 335 Sep  2 18:17:01 bsd /netbsd: [  35.9366619] warning: 
> /usr/src/sys/external/bsd/drm2/dist/drm/i915/display/intel_display.c:14031: 
> pipe state doesn't match!
> 336 Sep  2 18:17:01 bsd /netbsd: [  36.212] no data for est. mode 
> 640x480x67

What kind of monitor do you have plugged in?  To HDMI?  Do you have only
one attached?  Do you have another one to try?   It looks like there is
unexpected EDID data and this is not well handled.   You might read the
kernel code and try to improve the error handling to skip bad modes and
just leave the good ones, and maybe print them all out.

Does the monitor really support 3840x2160?

Does it work with other operating systems?



Re: Using cross compiler created by build.sh

2023-09-03 Thread Greg Troxel
Brook Milligan  writes:

> I am building software that targets an Arm board.  When the OS is
> NetBSD, I can use the cross-compiler created by build.sh to build
> software for the board.
>
> However, I also need to build software for the same board but a
> different OS, e.g., Debian.  Can I use the same compiler?  Is pointing
> it to the appropriate headers/libraries with -I/-L sufficient?  Are
> there other tricks?

Perhaps, but you will likely also need some sort of --nostdinc and
--sysroot.
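A sketch of the idea, with every path a placeholder; note that the
cross-gcc from build.sh targets NetBSD, so even with a Debian sysroot
the emitted objects may not be usable on Linux without further work:

```
# hypothetical invocation; $TOOLDIR is the build.sh tool directory
$TOOLDIR/bin/aarch64--netbsd-gcc -nostdinc \
    --sysroot=/opt/debian-arm64-root -o hello hello.c
```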


Re: syncthing: too many open files?

2023-08-26 Thread Greg Troxel
David Brownlee  writes:

> On Sat, 26 Aug 2023 at 09:37, nia  wrote:
>>
>> Has anyone ran into syncthing spamming the following in its log
>> when faced with a large directory (1402 files, 173 subdirectories,
>> ~22.8 GiB)?
>>
>>  Listen (BEP/tcp): Accepting connection: accept tcp [::]:22000: accept4: too 
>> many open files
>>
>> It's unable to sync anything.
>>
>> kern.maxfiles is 32000, ulimit -n is 3000 (the max it'll let
>> my user account set).
>
> For anything more than moderate numbers of files you need to disable
> "Watch for Changes" under the advanced tab for the folder on NetBSD.
> (I'm using syncthing with > 350,000 files so hit this quite early on :)

This is consistent with my fuzzy memory.

> I'm sure there used to be a MESSAGE.NetBSD for this a while back :-p

cvs says no, and that would have been wrong anyway.

This is not a pkgsrc problem.  The same thing would happen to anyone
building syncthing from upstream sources on NetBSD.  An upstream bug
report is in order, both to document and to default watch off, or to
dynamically flip it off on running out of fds or something.

Also perhaps files/README-NetBSD installed in $docdir/syncthing to
remediate upstream docs, with a link to the upstream bug reports.




Re: Libreoffice: Error about /usr/lib/libstdc++.so.7

2023-08-26 Thread Greg Troxel
Bruce Nagel  writes:

> pkg_admin rebuild:
>
> "pkg_admin: Package `openjdk17-1.17.0.6.10nb1' has no @name, aborting."

Less drastically, you could pkg_delete -f openjdk17.


> pkg_admin rebuild-tree:
>
> pkg_admin: Dependency poppler-22.10.0{,nb*} of poppler-cpp-22.10.0 unresolved
> pkg_admin: Dependency osabi-NetBSD-9.3 of x11-links-1.35 unresolved
> pkg_admin: Dependency webkit-gtk>=2.36.8nb1 of liferea-1.12.8nb11 unresolved
> Done.
>
> So it seems the pkgin database is corrupted, do I have more issues with the
> pkgdb to address as well based on those results?

Those are not about pkgin.  They are deeper, in the pkgdb.  You can
either nuke as nia suggested and start over, or you can dig in and read
the control files and when you find a package with broken control files
pkg_delete -f it.  You didn't mention whether you looked and e.g. if
poppler was present in /usr/pkg/pkgdb.  If you aren't comfortable with
that, then you have arrived at starting over.



Re: syncthing: too many open files?

2023-08-26 Thread Greg Troxel
nia  writes:

> Has anyone ran into syncthing spamming the following in its log
> when faced with a large directory (1402 files, 173 subdirectories,
> ~22.8 GiB)?
>
>  Listen (BEP/tcp): Accepting connection: accept tcp [::]:22000: accept4: too 
> many open files
>
> It's unable to sync anything.
>
> kern.maxfiles is 32000, ulimit -n is 3000 (the max it'll let
> my user account set).

I have a directory (including .stfolder and .stignore):

  11G total
  6007 files
  301 directories

and it was working fine on 9 a week ago.  (I have not yet started
syncthing on 10 with a new layout and move to zfs because I feel I need
extra backups first, probably unjustified fear.)

I remember this happening, but not super clearly.  I dimly remember
turning off some watch options and leaving it at periodic scan, and also
that this used to not be an issue but then maybe a feature appeared.  I
think there is a code path that instead of inotify uses kqueue and
somehow adds every file.




Re: dhcpd(8) and unused or old MAC addresses

2023-08-23 Thread Greg Troxel
Brett Lymn  writes:

>> But dhcpd keeps track of previous leases long after their expiration; I
>> have had entries in the lease file from 6 months ago.  It will assign
>> addresses from the pool that have never been used for a lease, and then
>> I am pretty sure it will start reusing addresses probably in order of
>> least recently leased.  This is 100% compliant with the spec and means
>> that if a device gets an address and comes back next week, it will get
>> the same address.  I think this is what Rocky is seeing in the lease file.
>
> I have seen the opposite happen on a linux server where a vm I was
> constantly rebuilding would get a different IP despite the interface
> MAC being the same every time.

That is also spec-compliant behavior.

With dhcpd?

> As long as it doesn't double allocate it really shouldn't matter if
> dhpcd hands out the same address to something reappearing after the
> lease time has expired it.

Unless something or somebody cares.  It has no right to, per the
protocol, but it is often helpful to humans for it to be stable.  This
is why I suggested to Rocky to configure addresses.


Re: dhcpd(8) and unused or old MAC addresses

2023-08-22 Thread Greg Troxel
Brett Lymn  writes:

> On Tue, Aug 22, 2023 at 09:41:48AM -0400, Greg Troxel wrote:
>> 
>> > Is there a way to "free" their entries, to let dhcpd(8) forget about
>> > them, so that the relative IP addresses are re-usable? Each device
>> > which receives an IP address is recorded in /var/db/dhcpd.leases. Is it
>> > enough to manually delete its entry in that database file, or some
>> > other operation is needed?
>> 
>> This works and I do it all the time.
>
> The lease time is supposed to control how quickly the IP addresses are
> reused - half way through the lease time the client is supposed to renew
> the address, if this happens then the lease time is restarted.  If the
> client fails to renew the address and does not renew it at the expiry
> time then

This I agree with, finishing "the lease has expired and the address will
not be currently assigned"

> the address will be reused.

This isn't really true.   The address will be available for
reassignment, according to the protocol, so that if the server chose to
reuse it, it would not be wrong.

But dhcpd keeps track of previous leases long after their expiration; I
have had entries in the lease file from 6 months ago.  It will assign
addresses from the pool that have never been used for a lease, and then
I am pretty sure it will start reusing addresses probably in order of
least recently leased.  This is 100% compliant with the spec and means
that if a device gets an address and comes back next week, it will get
the same address.  I think this is what Rocky is seeing in the lease file.

> So, you could reduce your lease time so ephemeral devices will get a
> lease and it will be released quickly.  The balance being if you have
> devices that you want having the same address are off for a while then
> they may not get the same address again.  For those you could
> statically assing a particular mac address an IP and have another
> part of the range available for dynamic allocation.


I don't see lease time as very related here.  I'm using:

  default-lease-time 3600;
  max-lease-time 14400;

but I have leases in /var/db/dhcpd.leases from August 7th, and only then
because I cleaned out one from December.

1 hr, 1 day, is not going to matter much.  If you don't use all the
addresses, a returning device (with same mac addr) will get the same
address, even after a year.

I have observed Ubiquiti EdgeRouterLite (with their firmware; this isn't
port-mips :-) doing essentially the same thing with reusing addresses.


Re: dhcpd(8) and unused or old MAC addresses

2023-08-22 Thread Greg Troxel
rockyho...@firemail.cc writes:

>> But basically you shouldn't care, unless you want your IP address space
>> tidy.  If that's what you want, then you probably should configure your
>> dhcpd:
> [...]
>>   statically assign (NOT in the pool!) in dhcpd addresses to specific
>>   devices based on IP address or client-id (you can steal these lines
>>   from the lease file) organized in some way that makes sense to you,
>>   like a block of 8 or 10 for one person's phone-type devices, a block
>>   for access points, etc.
> [...]
>> The basic expression is
>>   host foo {
>> hardware ethernet bar;
>> fixed-address 192.168.100.11;
>>   }
>
> I am not sure that I understood what you are meaning. Are you
> suggesting to explicitly specify in /etc/dhcpd.conf some entries for
> some selected devices, so that - if they make a dhcp request - they
> always get the same IP, fully controlled by me?

Yes, that is exactly what I am suggesting.

I identify every device on my network and organize them as if I am
assigning static IP addresses, but I do it with many of those statements
inside the network declaration in dhcpd.conf.  The hosts have no idea
and just do dhcp, but they all end up on the address I want for each.

I meant to suggest that if you don't like what just happens, then you
probably want even more control.   I don't see sort-of-caring about
which addresses are used, between not caring and really caring, as a
sensible place to be, but probably it is and I just don't understand
why.


Re: dhcpd(8) and unused or old MAC addresses

2023-08-22 Thread Greg Troxel
rockyho...@firemail.cc writes:

> II{R,U}C, dhcpd(8) actually uses a pseudo-static IP assigment policy:
> it tries to relate the same MAC address always to the same IP address.
> I have a NetBSD 9.0 machine which acts as DHCP server and it seems to
> behave exactly this way.

This is true.

> However, there are some MAC addresses (which "reserve" an IP address in
> the dhcp IP range) that are no more present in my local network: for
> example, some devices that are no more used, or some other devices that
> (after a software update: sometimes it happens, at least with mobile
> phones) changed the MAC address of their NICs.

True.

> Is there a way to "free" their entries, to let dhcpd(8) forget about
> them, so that the relative IP addresses are re-usable? Each device
> which receives an IP address is recorded in /var/db/dhcpd.leases. Is it
> enough to manually delete its entry in that database file, or some
> other operation is needed?

This works and I do it all the time.

Edit the file and delete the specific lease.  Then restart dhcpd.
However, I am unclear on when it writes the lease file and if it reads
it other than at startup.   I think it reads it on startup, and then the
working copy in RAM is what counts and it writes it when it issues a
lease.   So if you write/restart without a lease renewal, this works.

Best practice is surely to stop the daemon, edit, and restart.
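
Mechanically, that edit can be scripted.  This is a sketch (the
function name is made up) that assumes the standard ISC lease-block
layout, "lease <ip> {" with the closing brace on a line of its own, and
writes a filtered copy rather than touching the live file:

```shell
# Drop the lease block for one IP from a dhcpd.leases-style file.
# Assumes each block starts with "lease <ip> {" and ends with a
# lone "}" -- inspect the output before moving it into place.
filter_lease() {
    ip=$1
    leases=$2
    awk -v ip="$ip" '
        $1 == "lease" && $2 == ip { skip = 1 }  # entering the block to drop
        !skip { print }                         # keep everything else
        skip && $1 == "}" { skip = 0 }          # closing brace ends the block
    ' "$leases"
}
```

Something like `filter_lease 192.168.100.50 /var/db/dhcpd.leases >
/tmp/leases.new`, followed by a manual diff, stopping dhcpd, copying the
file into place, and restarting, keeps the stop-edit-restart window
short.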


But basically you shouldn't care, unless you want your IP address space
tidy.  If that's what you want, then you probably should configure your
dhcpd:

  a range of only part of your network, e.g. for a /24 only put 128-254 in
  the pool

  statically assign (NOT in the pool!) in dhcpd addresses to specific
  devices based on IP address or client-id (you can steal these lines
  from the lease file) organized in some way that makes sense to you,
  like a block of 8 or 10 for one person's phone-type devices, a block
  for access points, etc.

  periodically review what's in the lease file and assign static or hunt
  it down and disconnect it


The basic expression is

  host foo {
hardware ethernet bar;
fixed-address 192.168.100.11;
  }


My dhcpd.conf is large, but it's not really complicated.

After you edit and restart, make sure it's running because it will fail
to start on syntax error and you will notice hours later when devices
can't get leases.
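
Putting the pieces together, a minimal dhcpd.conf along these lines
(addresses, the host name, and the MAC are made up for illustration)
keeps the dynamic pool in the top half of a /24 and pins known hosts
below it:

```conf
subnet 192.168.100.0 netmask 255.255.255.0 {
  # dynamic pool: only 128-254, leaving the bottom half for static
  range 192.168.100.128 192.168.100.254;
  option routers 192.168.100.1;

  # pinned host, outside the pool
  host foo {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.100.11;
  }
}
```

Since a syntax error keeps dhcpd from starting, running `dhcpd -t`
(configuration test mode) before the restart catches that failure mode
early.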


Re: Samba, ZFS and xattr

2023-08-21 Thread Greg Troxel
Hauke Fath  writes:

> On 8/14/23 22:14, Chavdar Ivanov wrote:
>>> supermicro# zfs set xattr=on pool0/backup/timemachine
>>> property 'xattr' not supported on NetBSD: permission denied
>>>
>>> If I'm not mistaken, this should be the step to set xattr?
>> According to Oracle's documentation, yes; it should be on by
>> default.
>> On NetBSD (-current from yesterday) I get the same as above.
>
> To my understanding, NetBSD's zfs is not the latest and greatest.

That is my understanding also.

> I am in the same boat - we have Time Machine backups to a FreeBSD
> Samba zfs server at work, and I failed to reproduce the setup at home
> on NetBSD 10 because of the missing xattr support.

I added this to the zfs wiki page:

https://wiki.netbsd.org/zfs/


Re: UEFI installation

2023-08-14 Thread Greg Troxel
Mark Davies  writes:

> What sort of Dell is it?   We've got lots of them of various ages so
> I'm fairly familiar with the BIOS settings.  Any halfway recent one
> will let you UEFI boot off anything, but the newer they are the more
> restrictive they are in what they will legacy boot from.  Recent ones
> will only allow legacy boot from external USB.

It is an Optiplex 3070 SFF, from 2019, with 32 GB RAM and i7-9700 (9th
gen) and Intel UHD Graphics 630.  It came with a Kingwin SSD with
Windows which I swapped out for a bigger one I have more confidence in,
and can only really handle a single 2.5" drive.

Have there been Dell or HP systems that don't let you turn off Secure
Boot?  That would make them useless ;-(

It turns out that if you just turn off Secure Boot and leave the rest,
it will boot UEFI off USB if present, and if not the internal SATA disk.
There are options for "legacy option ROM" and MBR boot that confused me
into thinking that it would not boot off USB without them, but I think
they meant would not boot MBR from USB.

Once I had booted from USB via UEFI, confirmed via sysctl
machdep.bootmethod, the installer did the right thing.

> I haven't tried a netbsd-9 install recently, but last week I did a
> uefi install of 10_beta all from the installer and that was straight
> forward and went smoothly.  Previously I'd always broken out and set
> up the disks manually - as per
> https://wiki.netbsd.org/Installation_on_UEFI_systems/

I have a working system from the netbsd-9 installer normal path, with
just "set sizes".  It proposed 128 MiB for EFI, and then I set up / swap
/usr /var (oddly out of order) how I wanted them, and left the rest free
for zfs later.  After a few PEBKAC, totally not the installer's fault,
the builtin re0 works fine.

The computer has VGA, HDMI and DisplayPort, which is an interesting set,
and right now I only have hand-me-down monitors that do VGA and HDMI, so
I used a VGA cable.  It did not detect monitor resolution and ended up
at 1024x768 on both a Dell 1600x1200 and an HP w2207 (1680x1050).  I
realize I may have a bad VGA cable and that I can work around this with
xrandr and Modelines in xorg.conf, like in the old days, and that a
post-2007 monitor is in order.

Other than the video resolution, things appear to be working well.
There are a few unconfigured things in dmesg but they don't seem likely
to be important.  I will be moving to NetBSD 10, but am trying to do fewer
things at once.




Re: UEFI installation

2023-08-14 Thread Greg Troxel
Thanks to martin@ and mlelstv@ for hints.  I have updated the wiki page:

  https://wiki.netbsd.org/Installation_on_UEFI_systems/

please feel free to fix it or tell me I did it wrong; I try to update
things after getting help to help the next person or future me after
this is paged out.


Re: UEFI installation

2023-08-14 Thread Greg Troxel
Martin Husemann  writes:

> But the part that I don't understand: why can't you get your machine to
> boot the USB install image in UEFI mode? With stupid x86 firmware everything
> is possible but I would guess it is more likely that some setting should
> allow booting from USB in UEFI mode. Maybe the boot order settings
> priorize CSM/legacy mode for USB over native (UEFI) boot and you can move
> the UEFI USB boot up in the list?

It may just be me not figuring it out.  The in-BIOS labels for settings
are hard to understand.  I'll see, but at least now I understand what's
going on.



UEFI installation

2023-08-14 Thread Greg Troxel
(I have a new 2019 Dell, and I'll post details in the thread where I
asked about hardware after it is working.)

Windows is set up to boot gpt/UEFI on the 1T low-end SSD that I have set
aside.  I'm thus trying to install onto a new 4T SSD.

The BIOS situation is a little funky.  It's clearly UEFI, but it has a
"legacy option rom" setting that seems to allow for booting via USB in
which case it would use MBR.  One seems to need to put the bios in that
mode for USB and then back to normal for UEFI boot from internal disk --
but I'm really not sure.

I am installing netbsd-9 (because the system I am migrating is netbsd-9
and I am trying to do as little as possible in one step).   The
installer boots, and I did a gpt install.

Flipping back to internal/UEFI boot, it failed to boot, and with the
install image utility menu I see that it is gpt, but there is no EFI
partition which explains it.

So it seems the installer detected that it booted from MBR instead of
UEFI and set up MBR boot (probably gptboot), skipping the EFI gpt partition.
Maybe I'm over-assuming.

I am therefore thinking that either:

  I need to coax the BIOS into booting UEFI from USB (or maybe boot an
  install CD)

  I need to tell the installer that I want a UEFI install

  I need to break into utility menu and make the EFI partition myself

and would appreciate clues from those who have gone down this road
before.  (I have been sufficiently retro that I have rarely dealt with
UEFI.)

For the EFI partition, what are the rules?  It seems like

  it's first

  it's maybe aligned to ?

  the size is at least X and less than Y

  it contains bootx64.efi (and maybe bootia32.efi, but is that necessary
  on a post-2010 CPU?)

and making it match what windows did seems safe enough

This page seems out of date as the install image names are different now
(on netbsd-9): 

  https://wiki.netbsd.org/Installation_on_UEFI_systems/


Re: ZFS Bogosity

2023-08-14 Thread Greg Troxel
Michael van Elst  writes:

>> Alternatively, I see that we add wedges to hw.disknames. My system has
>> a NetBSD boot image on a flash drive this minute, and:
>>   hw.disknames = wd0 cd0 sd0 dk0 dk1
>> so if we add dk0, which is really no different logically than sd0a, it
>> seems like we should add disklabel partitions like wd3e.
>
> The sysctl value reflects the device drivers from the autoconfig
> process. It has no knowledge about disk content. This discrepancy
> between disklabel partitions and wedges is maybe the largest problem
> of wedges.

I read that as "changing hw.disknames is difficult and likely to lead to
trouble".

In this case, wedges show up as logical disks in hw.disknames, so it's
really that partitions aren't disks.  Which is fine, except some code
assumes that only full disks matter, which is wrong.

> You can avoid this by ignoring and phasing out disklabel
> partitions (which only work for "small" disks anyway) and use wedges
> also to handle the bsd disklabel.

Do you mean "create dkN entries from the disklabel"?  That doesn't
happen by default.

>> It also seems odd to special case /dev, vs using the dir if passed and
>> doing hw.disknames->dev if not, but it seems best to minimally munge
>> upstream.
>
> That's an optimization to avoid scanning and probing all entries
> in /dev/ which can take some time and may have unwanted side effects
> when you scan non-disk devices.

Makes sense, but the man page should probably be louder about this,
then.  I'll try to get this into the zfs howto.

>> What happens on FreeBSD?  Are they so firm on gpt-only, geom and
>> zfs-on-whole disk that this doesn't come up?  It seems obvious that
>> people easing into zfs are going to use partial disks, if only as
>>   / sw /usr on ffs, and
>>   rest as sole storage for a pool
>> .
>
> For ZFS it rarely makes sense to use partitions. You need this as
> a workaround to allow bootstrap without ZFS support in the boot
> process, and you need it to exchange media with other systems
> that use the partition table as identifier (but who uses disklabel?).

I still use disklabels on disks that are <= 2T, and I don't see that as
really odd.

>> > You can also cache devices, then the pool devices are just used as
>> > stored in the cache instead of scanning all disks for labels. But
>> > that doesn't work nicely with wedges or anything else with changing
>> > device units.
>> 
>> I found that my single pool tank0 with a single component wd0f has a
>> cache file in /etc/zfs/zpool.cache.   But that seems not a general
>> solution for import.
>
> That is the general solution of ZFS to avoid the time consuming scanning
> of /dev/ at startup, and you could arrange disk devices into something
> like /dev/disk to avoid side effects on non-disk devices. You see,
> it's a design for something else.

I do have /etc/zfs/tank0 with:

  $ ls -l /etc/zfs/tank0/
  total 0
  lrwxr-xr-x  1 root  wheel  9 Feb 17  2021 wd0f -> /dev/wd0f

and that seemed to work.
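
The symlink-directory approach can be scripted.  This is a sketch (the
function name is made up, and the paths are parameters only so the
example is self-contained; on the real system the directory would be
/etc/zfs/tank0 pointing at /dev/wd0f):

```shell
# Build a directory of symlinks to just the pool's member devices,
# so "zpool import -d <dir>" only probes those instead of scanning
# all of /dev.
make_pool_devdir() {
    dir=$1; shift
    mkdir -p "$dir"
    for dev in "$@"; do
        ln -sf "$dev" "$dir/$(basename "$dev")"
    done
}
```

E.g. `make_pool_devdir /etc/zfs/tank0 /dev/wd0f` followed by
`zpool import -d /etc/zfs/tank0 tank0`.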

Yes, the cache solution works fine on boot.  But it does not solve
export/import so cleanly.

> Or you could arrange for "volume labels" to be visible in the filesystem
> (e.g. use devpubd to create symlinks for wedges, or invent a device
> filesystem that is filled by the kernel) and let zfs scan these.
>
> Making zfs scan disklabel partitions derived from hw.disknames seems
> to be the easier method though.

Yes, that seems not to have any real downsides and would make it behave
as expected.

We probably do need to make sure we don't alias /dev/wd3e and /dev/dk7
which is a wedge created from it.  But presumably there is a header with a
uuid and zfs already copes.


Re: 10 fails to build

2023-08-13 Thread Greg Troxel
Paul Ripke  writes:

> I just finished an update build for netbsd-10 amd64 - apart from some
> flist shenanigans, it went smoothly.

Thanks.  I also removed my tools objdir and restarted, and that didn't
seem to fix it, and then I re-did includes.  Now I have succeeded to
the point of flist shenanigans :-) and just rm'd stuff.  So I am not
sure what happened, but if it doesn't happen to others it's not worth
worrying about.

> SIGILL is a little odd. The only 3 causes I've seen in the wild:
> - code bug, function pointer flying off somewhere unexpected.
> - dodgy hardware.
> - the `ud` family of instructions injected by clang/llvm/gcc in sanitised
>   builds to catch undefined behaviour.
>
> I wonder which case fits here... if you got a core, you may be able to
> check if PC points to a ud2, etc. instruction? Though I'm not seeing gcc
> built with -fsanitize=undefined in my build log.

I don't have a core file.

Fair point about dodgy hardware.  I'm currently using a 2007 Dell mid
tower and it may have issues.  But I could build everything else, pkgsrc
builds are ok and this one file would fail entirely repeatably.

I'll move on to i386 and earmv7hf-el when this works.


Re: ZFS Bogosity

2023-08-13 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> g...@lexort.com (Greg Troxel) writes:
>
>>David Brownlee  writes:
>>> https://gnats.netbsd.org/57583
>
>>Do you think this is just a bug, in that it wrongly fails to look at
>>wd3e etc. if there is /dev/zfs?
>
> The code scans all devices in the specified device directory, unless
> it's /dev/. Then it uses sysctl hw.disknames to enumerate disk devices,
> and it doesn't care about disklabel partitions.

Given that wd3e is a name for a disk special file with defined size, it
would seem that we should change that.  It seems to make just as much
sense to probe wd3[a-p] as it does to probe wd3 (which is wd3d).
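
As a sketch of what that probing could look like (the function name is
made up, and the device directory is a parameter purely so the idea can
be illustrated anywhere; on the real system it would be /dev with names
from `sysctl -n hw.disknames`):

```shell
# Enumerate disklabel-partition device nodes (wd3a .. wd3p) that
# actually exist for each whole-disk name, the way a
# hw.disknames-based scan could be extended.
list_partitions() {
    devdir=$1; shift
    for d in "$@"; do
        for p in a b c d e f g h i j k l m n o p; do
            [ -e "$devdir/$d$p" ] && echo "$devdir/$d$p"
        done
    done
    return 0
}
```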

Alternatively, I see that we add wedges to hw.disknames.   My system has
a NetBSD boot image on a flash drive this minute, and:
  hw.disknames = wd0 cd0 sd0 dk0 dk1
so if we add dk0, which is really no different logically than sd0a, it
seems like we should add disklabel partitions like wd3e.

What else cares about hw.disknames?

It also seems odd to special case /dev, vs using the dir if passed and
doing hw.disknames->dev if not, but it seems best to minimally munge
upstream.

What happens on FreeBSD?  Are they so firm on gpt-only, geom and
zfs-on-whole disk that this doesn't come up?  It seems obvious that
people easing into zfs are going to use partial disks, if only as
  / sw /usr on ffs, and
  rest as sole storage for a pool
.

> You can also cache devices, then the pool devices are just used as
> stored in the cache instead of scanning all disks for labels. But
> that doesn't work nicely with wedges or anything else with changing
> device units.

I found that my single pool tank0 with a single component wd0f has a
cache file in /etc/zfs/zpool.cache.   But that seems not a general
solution for import.


Re: ZFS Bogosity

2023-08-13 Thread Greg Troxel
David Brownlee  writes:

> https://gnats.netbsd.org/57583

Do you think this is just a bug, in that it wrongly fails to look at
wd3e etc. if there is /dev/zfs?

What is the point of /dev/zfs (is that how zpool/zfs control works?) and
is there any reason this should matter?  Do you think this is just a bug?



10 fails to build

2023-08-13 Thread Greg Troxel
I'm doing an update build, but I did a cleandir in libc.  This file
fails and the rest of gdtoa seems troubled too.  (up to date netbsd-10)

~/NetBSD-10/src/lib/libc > /usr/obj/gdt-10/tools/bin/nbmake-amd64   
#   compile  libc/dtoa.o
/usr/obj/gdt-10/tools/bin/x86_64--netbsd-gcc -O2   -std=gnu99-Wall 
-Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare  
-Wsystem-headers   -Wno-traditional   -Wa,--fatal-warnings  -Wreturn-type 
-Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter 
-Wno-sign-compare -Wold-style-definition -Wsign-compare -Wformat=2  
-Wno-format-zero-length  -Werror   -fPIE -fstack-protector -Wstack-protector   
--param ssp-buffer-size=1--sysroot=/usr/obj/gdt-10/destdir/amd64 -D_LIBC 
-DLIBC_SCCS -DSYSLIBC_SCCS -D_REENTRANT -D_DIAGNOSTIC 
-I/home/n0/gdt/NetBSD-10/src/lib/csu/common -DHESIOD -DINET6 -DNLS -DYP 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/include 
-I/home/n0/gdt/NetBSD-10/src/lib/libc -I. -I/home/n0/gdt/NetBSD-10/src/sys 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/compat/../locale 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/compat/stdlib 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/compat/../stdlib -D__BUILD_LEGACY 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/../../common/lib/libc/quad 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/../../common/lib/libc/string 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/../../common/lib/libc/arch/x86_64/string 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/../../common/lib/libc/arch/x86_64/atomic 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/../../common/lib/libc/hash/sha3 
-D__DBINTERFACE_PRIVATE -I/home/n0/gdt/NetBSD-10/src/libexec/ld.elf_so 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/dlfcn 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/gdtoa 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/locale -DHonor_FLT_ROUNDS 
-I/home/n0/gdt/NetBSD-10/src/lib/libc/arch/x86_64/gdtoa -DWITH_RUNE 
-I/home/n0/gdt/NetBSD-10/src/lib/libc -D_ACL_PRIVATE -DPOSIX_MISTAKE 
-DCOMPAT__RES -DUSE_POLL -DPORTMAP -DUSG_COMPAT  -D_FORTIFY_SOURCE=2 -c
/home/n0/gdt/NetBSD-10/src/lib/libc/gdtoa/dtoa.c -o dtoa.o.o
/home/n0/gdt/NetBSD-10/src/lib/libc/gdtoa/dtoa.c: In function '__dtoa':
/home/n0/gdt/NetBSD-10/src/lib/libc/gdtoa/dtoa.c:262:2: internal compiler 
error: Illegal instruction
  262 |  ds = (dval()-1.5)*0.289529654602168 + 0.1760912590558 + 
i*0.301029995663981;
  |  ^~
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.



Re: would anybody use binary packages for NetBSD/i386 10?

2023-08-13 Thread Greg Troxel
Benny Siegert  writes:

> I would like to create NetBSD 10 based CI images for Go in the near future. 
> Having binary packages for i386 makes this immensely easier.

Thanks.  There have been several people who say they'd use them, so that
seems enough not to rock the boat.  You just never know until you ask!

(I didn't perceive a need for them to be particularly timely, though, so
if they came to exist but were a week behind amd64, probably there would
be no great upset.)


Re: would anybody use binary packages for NetBSD/i386 10?

2023-08-13 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> g...@lexort.com (Greg Troxel) writes:
>
>>it was underpowered, that I might or might not ever power up again, and
>>if I did I wouldn't use ftp.n.o packages on it.
>
> What else? Self-compiling on a system you already consider outdated? :)

I would use my i386 vm on a "modern" system from 2010, that I have set
up to build packages, on which I do my own cowboy pullups to get new
versions mid-branch, which I can do easily since I don't need to worry
about breaking other people.

And if I did power it up, I would only want bash/tmux/m4/emacs-nox11
and a few other things.  I certainly don't want kde, firefox, or any
desktop stuff.  It has only 512M of RAM.  That's half of why it's
powered off.

> Binary packages are more important on systems that we consider old,
> doesn't have to be a VAX.

Sure, but the question is how many actual people care, and how that
trades off against other builds that would be useful that we don't
have.  For example pkgsrc-current builds for netbsd-10 and 9 would be
very useful to improve the state of pkgsrc for all, even if the builds
aren't published.

pkgsrc doesn't have invasive tracking so we don't really have good data.


would anybody use binary packages for NetBSD/i386 10?

2023-08-13 Thread Greg Troxel
In contemplating bulk builds and resources, I wonder if there are still
people who:

  are running NetBSD/i386 (as opposed to amd64)

  are using the binary packages from quarterly branches on ftp.netbsd.org

  are running NetBSD 10 already, or who intend to move to it soon or
  after release

If you have a system that meets the above, please either reply here (the
first few people :-) or just answer me privately.  (I'd also be
interested in which category below your use is.)

Basically, I would think about not doing bulk builds if very few want
them, relative to the effort/resources required to create them.


My guess is that at this point, i386 use is limited to

  a) old embedded-type systems (soekris)
  b) systems that are running i386 because they were first installed many
 years ago and haven't been converted to amd64 for no good reason or
 for some odd special case odd reason
  c) build systems to support category a/b systems, for testing or
 building private binary package sets
  d) retrocomputing

and that the amount of use with ftp.n.o binary packages is extremely
small.

As a personal example -- and I am somewhat trailing edge -- I know of
two NetBSD/i386 systems in category b (one each no good reason and one
special case odd reason), and 2 in category c.  I have one system that
would be category a, replaced several years ago and powered off because
it was underpowered, that I might or might not ever power up again, and
if I did I wouldn't use ftp.n.o packages on it.




Re: Libreoffice: Error about /usr/lib/libstdc++.so.7

2023-08-09 Thread Greg Troxel
Bruce Nagel  writes:

> I confirmed that in the following files, PKG_DBDIR=/usr/pkg/pkgdb is the
> file path set:
>
>  /usr/pkg/etc/pkg_install.conf
>  /etc/mk.conf
>  /etc/pkg_install.conf

That sounds good then.  So you should have a package database in
/usr/pkg/pkgdb (meaning a bunch of dirs with things like +CONTENTS in
them).  And `pkg_info` should list packages that correspond to those
dirs, more or less exactly.

And, /var/db/pkg should not exist.
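
A quick cross-check of that correspondence can be sketched like this
(the function name is made up, and the pkgdb path and package list are
parameters only so the example is self-contained; on the real system
you would feed it /usr/pkg/pkgdb and the names from pkg_info):

```shell
# Report packages whose pkgdb directory is missing its +CONTENTS file,
# i.e. entries that are absent or irregular.
check_pkgdb() {
    pkgdb=$1; shift
    for pkg in "$@"; do
        [ -f "$pkgdb/$pkg/+CONTENTS" ] ||
            echo "missing or incomplete pkgdb entry: $pkg"
    done
    return 0
}
```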

> It does appear that pkgin is still putting info in /var/db/pkgin, because
> the cache, pkg_install-err.log, and pkgin.db files there have all been updated
> today (when I updated pkgin using pkgin):
>
>  contents of /var/db/pkgin:
>
>  drwxr-xr-x  2 root  wheel 23552 Aug  8 12:42 cache
>  -rw-r--r--  1 root  wheel   4959205 Aug  8 12:42 pkg_install-err.log
>  -rw-r--r--  1 root  wheel  35078144 Aug  8 12:42 pkgin.db
>
> So it appears pkgin is still writing to /var/db/pkgin for some reason, what
> config file controls that for pkgin?

That's fine.  That's the pkgin database, which is distinct from the
pkgdb.   The pkgdb is what the base-system pkg_info, pkg_add,
pkg_delete, etc. (and their pkgsrc-updated versions) operate on.

pkgin is a program that calls these pkg_foo, and has an additional
database.

> The manpage for pkgin says:
>
> /var/db/pkgin
>This directory contains the individual files and
>directories used by pkgin listed below.
>
> The version of pkgin I have installed is 22.10.0nb2, the most current showing
> on pkgsrc.org, so it should be putting things in /usr/pkg/pkgdb, correct?
>
> Maybe I missed something in https://www.pkgsrc.org/pkgdb-change/

Nope, that's not about pkgin at all.  I think your pkgdb setup is ok.

>>You may also want to update pkgin.   I have no idea how old your
>>installed packages are.   As I said before, having some packages old, and
>>especially very old, is asking for trouble because essentially nobody
>>who contributes to pkgsrc does it that way.
>
> My starting point on this system was a fresh NetBSD 9.3 install (in April).
>  I have now installed pkgin-22.10.0.nb2 (and with it pkgin also
> installed
> pkg_install-2025).
>   'pkgin upgrade' still gives "pkgin: empty local package list."

At this point I think you are having a pkgin problem, which I am not
super good at diagnosing.  I think that you can remove the contents of
/var/db/pkgin and it will regenerate.  The idea of "keep" packages is
actually stored in the automatic=YES variable in the pkgdb.

>>> 'pkg_admin check' gives a lot of errors to the effect of:
>>>
>>> pkg_admin: libreoffice-7.5.1.2: File 
>>> `/usr/pkg/libreoffice-7.5.1.2/lib/libreoffice/sha
>>> re/extensions/dict-is/license.txt' is in +CONTENTS but not on filesystem!
>>>
>>> but mentions no other packages, just libreoffice.
>>
>>So your installed libreoffice is broken.  I suggest 'pkgin rm
>>libreoffice' to get to a safer state.
>
> Attempting this gives the "pkgin: empty local package list." error

See previous comment.

>>Also check if you have /usr/bin/pkg_info and /usr/pkg/sbin/pkg_info both
>>and if they print the same thing.
>
> /usr/pkg/sbin/pkg_info -V  gives  2025
> /usr/sbin/pkg_info -V  gives  20201218

That seems ok, given that you have PKG_DBDIR set.

>>Try also "pkgin sk" and "pkgin -n upgrade".
>
> "pkgin sk" gives: a long list (looks like all of my installed packages)

It should be far smaller than your full installed package list.


> "pkgin -n upgrade" - gives error: "pkgin: empty local package list."
>
> I'm puzzled as to why pkgin is storing its information in /var/db/pkgin
> (and why the manpage still says that's where it should store it).

Because that's ${VARBASE}/pkgin, which is ok.   There is a fair
question "why is VARBASE on NetBSD /var which is outside ${PREFIX}?"
But that is historically how it is.

So at this point you appear to have a corrupted pkgin database.


Re: Libreoffice: Error about /usr/lib/libstdc++.so.7

2023-08-07 Thread Greg Troxel
Bruce Nagel  writes:

> Using either my original pkgsrc url or that e.g. url I get this error when trying
> to do 'pkgin upgrade':
>
> pkgin: empty local package list.
>
> 'pkgin list' gives the same: Requested list is empty.

It is possible you have crossed wires about where your pkgdb is.  I
suggest you read carefully:
  https://www.pkgsrc.org/pkgdb-change/

and look around in your filesystem and see the script and be super
careful.

You may also want to update pkgin.   I have no idea how old your
installed packages are.   As I said before, having some packages old, and
especially very old, is asking for trouble because essentially nobody
who contributes to pkgsrc does it that way.

> 'pkg_admin check' gives a lot of errors to the effect of:
>
> pkg_admin: libreoffice-7.5.1.2: File 
> `/usr/pkg/libreoffice-7.5.1.2/lib/libreoffice/sha
> re/extensions/dict-is/license.txt' is in +CONTENTS but not on filesystem!
>
> but mentions no other packages, just libreoffice.

So your installed libreoffice is broken.  I suggest 'pkgin rm
libreoffice' to get to a safer state.

> 'pkg_info -a' gives a long list of packages that looks like what I have
> installed.

Also check if you have /usr/bin/pkg_info and /usr/pkg/sbin/pkg_info both
and if they print the same thing.

> If I try to do: 'pkgin install libreoffice' it now offers to install it,
> without the libsdtc++ error it was giving before, but due to the 'empty
> package list' error I have not done that yet, figuring there's a bigger
> issue to resolve.

Try also "pkgin sk" and "pkgin -n upgrade".


seeking desktop hardware recommendation

2023-08-07 Thread Greg Troxel
My hand-me-down old Dell gamer box is having thermal issues.  (It's from
2010, so not a complaint.)  I have been using hand-me-down boxes for a
while, which means I accept hardware not knowing if it works with
NetBSD, and usually it does because it's 5 years old when I get it,
after my Windows-using gamer friend gets something shinier.  This means
I am fairly clueless about buying modern hardware for NetBSD.

My system has a 2010 4-core CPU, 24G RAM and aside from being worried
about thermal issues, my only real complaint is that I'd like more CPU.
Once upgrading, of course I want more RAM, but I'm not really bothered by
24G right now.  So:

I'm looking for a mini to mid tower, with:

  - fairly beefy CPU for package and base system builds
  - metal case for RF noise mitigation
  - UL listed power supply
  - quiet when not compiling, but doesn't have to be silent

  - 64G of RAM, preferably expandable to 128G, even more preferably by
adding later and not discarding

  - GbE

  - basic sound

  - graphics sufficient to run X on a single biggish monitor, say
2560x1440, so that I can run firefox, gimp, qgis, and xterm.

  - perhaps CDRW/DVDRW, but if that's USB external and works with netbsd
that's ok

  - space for at least 3 2.5" SSDs.  no need for spinning disks.
smaller/faster SSD slots a bonus

  - works well with NetBSD 10 (am running 9, but about to upgrade
anyway)

  - no need to deal with binary blobs

and questions:

In general, my impression is that Intel built-in graphics and Radeons
should just work, except they won't be really accelerated.   Any advice
about which is likely to have that arrive sooner?

I see that intel 13th gen has P and E cores, and it looks like P cores
have 2x hyperthreading and E cores do not.  Does NetBSD play well with
this?  If I just leave HT on, and the kernel sees the 2P+E "cpus",
will that be basically ok?

Do people think I should lean to AMD vs Intel?  I'm steering for
low-grief, and low power/fan when not doing much.   I realize this flips
the onboard graphics too.



I am curious if anyone has gotten one of these, and if so, if they could
post a trip report (or send it privately just to me, or for me to send
the text, if they wish to be stealthy).

  https://system76.com/desktops/thelio-mira#specs



Any other place to just order a box, even if it's a screwdriver shop and
you suggest particular motherboard/cpu?

I am also considering refurb, which seems like a way to buy 5 year old
computers for $200.  Any particular recommendations for that would be
appreciated.  For a $200-$300 computer, I'm ok with 32G of RAM; it
doesn't need to have the lifetime of a higher-cost box.  There are lots
of refurb Dells for not much on Amazon, and I'm guessing this is because
of a vast number coming out of corporate service because they are no
longer eligible for service contracts, 5-6 year fixed life thinking, or
because the CPUs won't run W11.


Re: Libreoffice: Error about /usr/lib/libstdc++.so.7

2023-08-07 Thread Greg Troxel
Bruce Nagel  writes:

> NetBSD Bast 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug  4 15:30:37 UTC 2022  
> mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>
> packages are coming from:
> http://cdn.NetBSD.org/pub/pkgsrc/packages/NetBSD/amd64/9.3/All

OK, not super surprising, but could have been all sorts of things.

That is likely still packages from 2023Q1, but 2023Q2 packages are
mostly built, perhaps fully.  You may wish to use a URL with an explicit
quarter rather than relying on the symlink, e.g.

https://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/amd64/9.0_2023Q2/All/


however, as always, you should make a backup before you start.
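
Concretely, pinning a quarter means putting the explicit URL in pkgin's
repositories.conf (the path below is the usual default under the
/usr/pkg prefix; adjust if yours differs):

```conf
# /usr/pkg/etc/pkgin/repositories.conf
https://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/amd64/9.0_2023Q2/All
```

After editing, `pkgin update` refreshes pkgin's view of the repository,
and `pkgin upgrade` then tracks that one consistent build.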


> Currently-installed gcc reports its version as 7.5.0nb4, 7.5.0nb6 is to be
> installed.

The /usr/lib/ file you reported as something complaining about is from
the base system.  I would see if it exists.  (In general c++ is
difficult to deal with as different compiler versions are not ABI
compatible, and I suspect that is part of the issue.)

> So standard NetBSD/pkgsrc practice is to do a 'pkgin upgrade' rather than
> upgrading individual packages using 'pkgin install <pkg>' as one might do
> on e.g., a Linux system?

Well, in theory there are dependency rules and pkgin should not let you
violate them.   However often a new package will want a dependency which
is newer than one you have installed.   And then something else you have
installed will have had a dependency replaced out from under it, and
it's hard to say if it will be ok.

To get concrete

  progA 3 and libX 1 are installed from 2022Q4

time passes, wait until 2023Q2 is available.   Now, progA and libX both
have new versions in the repository, progA 4 and libX 2

 pkgin install progB which is at version 5
   let's say progB also depends on libX 2
   so it upgrades libX to 2, and installs progB 5

 but now, progA 3 is expecting libX 1 and doesn't have it

 Maybe pkgin recognizes this and rejects the installation, or maybe it
 upgrades progA.  But knowing whether an installed binary package will
 be ok with a new dependency is basically unsolvable.
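The mismatch can be sketched in a few lines of plain shell (a toy using
the made-up names above; real pkg dependencies are >= patterns, not
exact matches):

```shell
# After "pkgin install progB" pulled in libX-2, progA's recorded
# dependency on libX-1 no longer matches anything installed.
installed="progA-3 progB-5 libX-2"
progA_wants="libX-1"
case " $installed " in
  *" $progA_wants "*) echo "progA: dependency satisfied" ;;
  *) echo "progA: wants $progA_wants, which is no longer installed" ;;
esac
# prints: progA: wants libX-1, which is no longer installed
```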


I view it as a clue that I don't understand what happens in these
partial upgrade scenarios.  I think it's just asking for trouble and I
don't do it.  That doesn't mean it can't, shouldn't or doesn't work.

I am unclear on what happens in say Debian (as this is about a package
manager not the kernel).   But the choices are basically:

  don't allow upgrading something if there is a depending package not
  being upgraded

  if a package is upgraded, refresh/upgrade everything that depends on
  it

  somehow have data about what is an ABI break and what will work

  upgrade it anyway and break

I would expect Debian does one of the first two, but on the two Debian
systems I deal with, I just 'apt update' and 'apt upgrade' and have thus
never had a reason to figure it out.


This is why I am recommending that for binary packages with pkgin, you
pick a consistent build and "pkgin upgrade" to it, so that all our
packages are from the same consistent build.


Re: Libreoffice: Error about /usr/lib/libstdc++.so.7

2023-08-07 Thread Greg Troxel
Bruce Nagel  writes:

> When attempting to upgrade libreoffice using pkgin, I am getting the following
> before the (massive) list of packages to be updated:

You didn't say what version of NetBSD, which arch, and where the binary
packages you are using are coming from.

> /usr/lib/libstdc++.so.7, needed by gcc7-7.5.0nb6 is not present in this 
> system.

> After having updated some other packages (installing a new version of 
> Firefox52
> in hopes that it wouldn't crash like crazy) trying to run libreoffice gives
> this error:

It is in general unsound to update some but not all packages.   I guess
in theory the dependency rules should express what's necessary and
partial updates should be ok.


Re: Intermitent loss of WiFi

2023-08-04 Thread Greg Troxel
bsd...@tuta.io writes:

> $ doas dmesg | grep -w iwm0
> [ 1.031913] iwm0 at pci2 dev 0 function 0: Intel Dual Band Wireless AC 
> 8265 (rev. 0x78)
> [ 1.031913] iwm0: interrupting at msi3 vec 0
> [ 4.380047] iwm0: hw rev 0x230, fw ver 22.361476.0, address 
> [ 10910.342873] iwm0: autoconfiguration error: fatal firmware error
> [ 1.033640] iwm0 at pci2 dev 0 function 0: Intel Dual Band Wireless AC 
> 8265 (rev. 0x78)
> [ 1.033640] iwm0: interrupting at msi3 vec 0
> [ 5.657031] iwm0: hw rev 0x230, fw ver 22.361476.0, address 
> [ 28069.286994] iwm0: autoconfiguration error: device timeout

I guess you have rebooted.

> I don't remember seeing this last line in 10.0 Beta. So, I think current 
> correctly logs the error. I get tons of pci devices timeout:
>
> [ 30993.196990] autoconfiguration error: pms_disable: command error
> [ 30994.196994] pckbport: command timeout
> [ 31004.356990] autoconfiguration error: pms_enable: command error 35
> [ 31005.356994] pckbport: command timeout
> [ 31015.516990] autoconfiguration error: pms_disable: command error
> [ 31016.516994] pckbport: command timeout
> [ 31026.678834] autoconfiguration error: pms_enable: command error 35
>
> These may not have anything to do with iwm, because as I understand these are 
> config errors for mouse and keyboard (though they both work fine).

This is perhaps pointing to a more general problem with device access.
These other issues could be unrelated though.



Re: Intermitent loss of WiFi

2023-08-03 Thread Greg Troxel
bsd...@tuta.io writes:

> I am on NetBSD 10.0 Beta.
> I get intermitent loss of WiFi with my iwm driver.

This sort of problem happens with some adaptors sometimes, but is not
super common.  I have a few urtwn(4) devices.  One, on a 2006 macbook
(i386) has been reliable.  Another, on a RPI3 (earmv7hf-el) has
occasional failures and I have a cron script to detect that and down/up.
Both of those experiences were on 8 and 9; I'm getting ready to move to
10.

Please describe your hardware and which architecture you are running.
We have no idea what kind of cpu/system etc.  even if it seems likely an
i386 or amd64 laptop.

Please post the dmesg lines for iwm attachment.  You can blur out the
mac address.

You say "loss of WiFi", but I wonder if you can narrow that down.

How often does this happen?

Does "ifconfig" show it still associated?
On the router/AP/whatever, can you tell if it is gone from the network?
If you run tcpdump -n on the adaptor, do you see any incoming frames?
If you "ping -n [router's IP addr]", do you see outgoing frames?
What does netstat -i show for in/out/error counts when it is "lost"?

(I'm not really asking you to post all of this, but to try it all and
see what you can figure out.)

> I think this is the error message in dmesg (I am not sure whether this DRM is 
> i915 or iwm)
> [   968.284136] {drm:netbsd:intel_pipe_update_start+0x33b} *ERROR* Potential 
> atomic update failure on pipe A: -35

That's about graphics.  Does that happen just once, or does it happen
again when the wifi fails again?  Is there anything else in dmesg or
syslog?

> The workaround is to execute the following twice:
> doas ifconfig iwm0 down
> doas ifconfig iwm0 up
>
> It needs to be executed twice, because after the first down and up, I get:
> ifconfig: exec_matches: Resource temporarily unavailable

Try waiting 10s after down before up.  Run ifconfig in between.
Basically try every inspection method you can think of.

If you are up for it, you can read the driver src/sys/dev/pci/if_iwm.c
and turn on IWM_DEBUG and set iwm_debug to nonzero and get lots of
messages.






Re: zfs pool behavior - is it ever freed?

2023-07-31 Thread Greg Troxel
I took your patch and have been adding comments to help me understand
things, as well as debug logging.  I also switched how it works, to have
an ifdef for netbsd approch vs others, to make it less confusing -- but
it amounts to the same thing.  (I understand your intent was to touch as
few lines as possible and agree that your approach is also sensible.)

I conclude:

  the default behavior is to set the ARC size to all memory except 1 GB

  Even on a high-memory machine, without memory pressure mechanisms, the
  current code is dangerous -- even if in practice it is usually ok.

  If the ARC size is more moderate, things are ok

  The ARC tends to fill with metadata, and I believe this is because the
  vnode cache has refs to the ARC so it is not evictable

  We don't have any real evidence that huge ARC is much better than
  biggish ARC on real workloads.

I attach my patch, which I am not really proposing for committing this
minute.  But I suggest anyone trying to run zfs on 8G and below try it.
I think it would be interesting to hear how it affects systems with lots
of memory.

On a system with 6000 MB (yes, that's not a power of 2 - xen config), I
end up with 750 MB of arc_c.   There is often 2+ GB of pool usage, but
that's of course also non-zfs.  The system is stable, so far.  (I have
pkgsrc, distfiles, binary packages in zfs; the OS is in UFS2.)

ARCI 002 arc_abs_min 16777216
ARCI 002 arc_c_min 196608000
ARCI 005 arc_c_max 786432000
ARCI 010 arc_c_min 196608000
ARCI 010 arc_p 393216000
ARCI 010 arc_c 786432000
ARCI 010 arc_c_max 786432000
ARCI 011 arc_meta_limit 196608000



DIFF.arc
Description: Binary data


Re: nvmm woes: won't load

2023-07-29 Thread Greg Troxel
"Jonathan A. Kollasch"  writes:

> All too old, need the Unrestricted Guest feature on Intel for nvmm.

I updated the man page.


Re: zfs pool behavior - is it ever freed?

2023-07-29 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> t...@netbsd.org (Tobias Nygren) writes:
>
>>There exists ZFS code which hooks into UVM to drain memory -- but part
>>of it is ifdef __i386 for some reason. See arc_kmem_reap_now().
>
> That's an extra for 32bit systems (later code replaced __i386 with
> the proper macro) where kernel address space is much smaller.

Sure, but I don't see why it shouldn't always be hooked up.

The upstream code ends up setting

  arc_c_min: 1/32 of memory.  I think this is "amount below which arc
  will not be pushed, even under memory pressure".

  arc_c: 1/8 of memory.  I think this is "target size of ARC"

  arc_c_max: all memory except 1GB.  I think this is "size above which
  ARC will be hard prohibited from growing".
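As back-of-envelope shell arithmetic (my paraphrase of those defaults,
not the actual kernel code):

```shell
# Rough shape of the upstream ARC sizing, in MB, for a machine with
# MEM_MB of RAM (8 GB assumed here):
MEM_MB=${MEM_MB:-8192}
echo "arc_c_min = $((MEM_MB / 32)) MB  (1/32 of memory)"
echo "arc_c     = $((MEM_MB / 8)) MB  (1/8 of memory)"
echo "arc_c_max = $((MEM_MB - 1024)) MB  (all but 1 GB)"
```

For 8 GB that works out to 256/1024/7168 MB, i.e. the hard ceiling is
almost all of RAM.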

The idea that arc_c_max is so big is only reasonable if there is an
effective, known-working mechanism to free memory under pressure.

I'm in the middle of trying to document what the variables mean and
rationalize the size calculation, and will post a diff fairly soon.

With tnn@'s patch, I am seeing

  $ sysctl kstat.zfs.misc.arcstats|egrep size
  kstat.zfs.misc.arcstats.size = 48786184
  kstat.zfs.misc.arcstats.compressed_size = 36969472
  kstat.zfs.misc.arcstats.uncompressed_size = 185070080
  kstat.zfs.misc.arcstats.overhead_size = 1348096
  kstat.zfs.misc.arcstats.hdr_size = 10160672
  kstat.zfs.misc.arcstats.data_size = 0
  kstat.zfs.misc.arcstats.metadata_size = 38317568
  kstat.zfs.misc.arcstats.other_size = 307944
  kstat.zfs.misc.arcstats.anon_size = 16384
  kstat.zfs.misc.arcstats.mru_size = 28901888
  kstat.zfs.misc.arcstats.mru_ghost_size = 87785472
  kstat.zfs.misc.arcstats.mfu_size = 9399296
  kstat.zfs.misc.arcstats.mfu_ghost_size = 97047552
  kstat.zfs.misc.arcstats.l2_size = 0
  kstat.zfs.misc.arcstats.l2_asize = 0
  kstat.zfs.misc.arcstats.l2_hdr_size = 0

but also vmstat -m

  In use 1445870K, total allocated 3323556K; utilization 43.5%

so I think either ARC use is not being accounted for correctly or there
is substantial non-ARC use.   But, my system is stable with 6G of RAM
under Xen. (Which is good because nvmm requires newish CPUs and I can't
run it.)


Also, I don't understand "ghost" yet.


Re: zfs pool behavior - is it ever freed?

2023-07-29 Thread Greg Troxel
tlaro...@polynum.com writes:

> On Sat, Jul 29, 2023 at 12:42:13PM +0200, Tobias Nygren wrote:
>> On Fri, 28 Jul 2023 20:04:56 -0400
>> Greg Troxel  wrote:
>> 
>> > The upstream code tries to find a min/target/max under the assumption
>> > that there is a mechanism to free memory under pressure -- which there
>> > is not.
>> 
>> There exists ZFS code which hooks into UVM to drain memory -- but part
>> of it is ifdef __i386 for some reason. See arc_kmem_reap_now().
>
> FWIW, with jemalloc, there is the possibility to configure to give back
> memory to the system.
>
> Since jemalloc is incorporated in NetBSD, one(TM) should perhaps look if
> the feature is available and what has to be done in this area to use
> it.

It's unlikely to be reasonable to rototill the way zfs allocates memory.  The
issue is just that things aren't hooked up.


Re: zfs pool behavior - is it ever freed?

2023-07-28 Thread Greg Troxel


Tobias Nygren  writes:

> On Thu, 27 Jul 2023 06:43:45 -0400
> Greg Troxel  wrote:
>
>>   Thus it seems there is a limit for zfs usage, but it is simply
>>   sometimes too high depending on available RAM.
>
> I use this patch on my RPi4, which I feel improves things.
> People might find it helpful.
> There ought to be writable sysctl knobs for some of the ZFS
> tuneables, but looks like it isn't implemented in NetBSD yet.

It definitely helps.

The upstream code tries to find a min/target/max under the assumption
that there is a mechanism to free memory under pressure -- which there
is not.

Reading the code, the cache is supposed to free things if size >
target.  I am not sure that works.   It makes sense to have a target/max
gap so that the free can be async.  That's often lost.

The code to set min/target/max does not seem clearly sensible to me.
It seems to set arc_max to all RAM except 1 GB.  No wonder we have trouble.

Looking at kstats, I see excessive meta usage, way above the limit.  In
this case min/max are historical, not controls.

Here, the meta limit is 200M, which is 1/4 of the 800M target for the whole
cache (1/8 of 6GB, my allocation to dom0).  That seems reasonable.
But there is 1.2G of metadata.  Perhaps that is uncompressed size.


kstat.zfs.misc.arcstats.arc_meta_used = 1225255744
kstat.zfs.misc.arcstats.arc_meta_limit = 201326592
kstat.zfs.misc.arcstats.arc_meta_max = 1407252032
kstat.zfs.misc.arcstats.arc_meta_min = 100663296



Re: nvmm woes: won't load

2023-07-28 Thread Greg Troxel
"Jonathan A. Kollasch"  writes:

> All too old, need the Unrestricted Guest feature on Intel for nvmm.

Thanks - that's a huge clue, not turned up in my previous searching.

Interesting -- that's not what the man page says, so I guess we should
fix the man page.

reading, it seems "Unrestricted Guest" is sort of linked to Extended
Page Tables.

What is the 'cpuctl identify' codepoint for unrestricted guest?

What is the approximate date by which (non-celeron/reduced-power/etc.)
Intel CPUs generally support nvmm?

(It turns out the computer I thought was more 2014ish has a CPU from 1Q
2010:
  
https://www.intel.com/content/www/us/en/products/sku/41447/intel-core-i7930-processor-8m-cache-2-80-ghz-4-80-gts-intel-qpi/specifications.html

That cpu claims EPT, but cpuctl doesn't show it.)


nvmm woes: won't load

2023-07-28 Thread Greg Troxel
I am trying to run qemu/nvmm because of zfs memory problems.  But I'd
like nvmm to work anyway, so zfs is irrelevant here.

I have a Dell Inspiron 560 from around 2010 (my computers were free to
good home, so I'm not really sure).  This is a netbsd-10 system that's
up to date as of June 28.  (Will update, but I have not seen anything
related.)

Intel says this supports VT-x:
 
https://ark.intel.com/content/www/us/en/ark/products/42801/intel-pentium-processor-e5700-2m-cache-3-00-ghz-800-mhz-fsb.html

cpuctl says:

  cpu0: "Pentium(R) Dual-Core  CPU  E5700  @ 3.00GHz"
  cpu0: Intel Xeon 31xx, 33xx, 52xx, 54xx, Core 2 Quad 8xxx and 9xxx 
(686-class), 2992.50 MHz
  cpu0: family 0x6 model 0x17 stepping 0xa (id 0x1067a)
  cpu0: features 
0xbfebfbff
  cpu0: features 0xbfebfbff
  cpu0: features 0xbfebfbff
  cpu0: features1 0xc00e3bd
  cpu0: features1 0xc00e3bd
  cpu0: features2 0x20100800
  cpu0: features3 0x1

In the BIOS there is an option to enable virtualization and it was
enabled.  Apparently "VT-X" from the marketing people is "VMX" in more
nerdy contexts.

I have a dim memory of running nvmm before on this machine, but it's
dim; would have been with anita.  Probably under 9.

Trying to load nvmm:

  # modload nvmm
  modload: nvmm: Not supported

and in the log

  NVMM: proc-based-ctls requirements not satisfied
  autoconfiguration error: nvmm: cpu not supported
  WARNING: module error: modcmd(CMD_INIT) failed for `nvmm', error 86


On a 2014 system:

  cpu0: "Intel(R) Core(TM) i7 CPU 930  @ 2.80GHz"
  cpu0: Intel Core i7, Xeon 34xx, 35xx and 55xx (Nehalem) (686-class), 2800.15 
MHz
  cpu0: family 0x6 model 0x1a stepping 0x5 (id 0x106a5)
  cpu0: features 
0xbfebfbff
  cpu0: features 0xbfebfbff
  cpu0: features 0xbfebfbff
  cpu0: features1 0x98e3bd
  cpu0: features1 0x98e3bd
  cpu0: features2 0x28100800
  cpu0: features3 0x1
  cpu0: features7 0x9c00

trying to modload gets me:

  NVMM: proc-based-ctls2 requirements not satisfied

So, is there something wrong here, or is it that nvmm needs a pretty
recent CPU?  nvmm(4) implies "VMX is enough" and if that's not mostly
true it would be nice to fix it.

(The 2010 machine runs xen just fine.)


Re: zfs pool behavior - is it ever freed?

2023-07-28 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> g...@lexort.com (Greg Troxel) writes:
>
>>I'm not either, but if there is a precise description/code of what they
>>did, that lowers the barrier to us stealing* it.  (* There is of course
>>a long tradition of improvements from various *BSD being applied to
>>others.)
>
> The FreeBSD code is already there and I have exposed a few settings:
>
> vfs.zfs_arc.meta_limit = 0
> vfs.zfs_arc.meta_min = 0
> vfs.zfs_arc.shrink_shift = 0
> vfs.zfs_arc.max = 5292193280
> vfs.zfs_arc.min = 661524160
> vfs.zfs_arc.compressed = 1
>
> but that's not enough to control the behaviour.

Is that in current only?   I don't see that in netbsd-10.

I did some code reading and it looks like the arc parameters are
computed at module load time, from the systctl/whatever values as they
exist at that moment, and then not adjusted.  But I didn't read that far
and am still trying to understand.  The ARC sizing rules are pretty
complicated.


Re: zfs pool behavior - is it ever freed?

2023-07-28 Thread Greg Troxel
Mr Roooster  writes:

> I'm not sure they did a lot more than expose the ARC limit as a sysctl.

I'm not either, but if there is a precise description/code of what they
did, that lowers the barrier to us stealing* it.  (* There is of course
a long tradition of improvements from various *BSD being applied to
others.)

> I moved to FreeBSD from Net a few years ago (mainly to get ZFS), and
> have had similar issues under heavy load with a large ARC. It wouldn't
> crash or hang, but it would always favour killing something over
> flushing the ARC under pressure. I did a little bit of digging and got
> the impression this was the way it was intended to work. (Although
> reading this thread it may be a little more complex than that. :) )

Somebody may intend that, but it seems obviously buggy to kill processes
rather than to drop data from a cache.

> Once I limited my ARC my problems went away. I limited mine to 16 gig
> on a 96 gig system, but I was running some processes with high memory
> usage. I've not had cause to increase it though, and the system runs
> reliably. It has a few zpools, and I'm running a VM of an iSCSI
> exposed ZVOL, so it get a decent amount of use.

Did I hear that right -- you had problems on a 96 GB system with the
default settings?  What was the default limit?

Did you -- or could you -- characterize the performance impact on ZFS of
having ARC limited to say 8/16/24G?  And is this with spinning disks or
SSD, with or without L2ARC?

> (This is my home system, not a production system, however it does have
> something like 10 HDDs in, so is often quite I/O loaded).

Wow, that's a lot of disks!


Re: zfs pool behavior - is it ever freed?

2023-07-27 Thread Greg Troxel
Mike Pumford  writes:

> Now I might be reading it wrong but that suggest to me that it would
> be an awful idea to run ZFS on a system that needs memory for things
> other than filesystem caching as there is no way for those memory
> needs to force ZFS to give up its pool usage.

As I infer the kernel behavior from reading tnn@'s patch, there is a
limit on the amount of ARC storage.  On my 8G system, it seems ARC ends
up around 2-2.5G and doesn't grow.  One can debate what the limit should
be -- and clearly that's too big for a 4G system, but it does seem to be
bounded.

> If I've read it right there needs to be a mechanism for memory
> pressure to force ZFS to release memory. Doing it after all the
> processes have been swapped to disk is way too late as the chances are
> the system will become non-responsive by then. From memory this was a
> problem FreeBSD had to solve as well.

It would be interesting to read a description of what they did.  That
seems easier than figuring it out from scratch.

> Even with the conventional BSD FFS I have to set vm.filemin and
> vm.filemax to quite low values to stop the kernel prioritizing file
> system cache over process memory and thats on a system with 16GB of
> RAM. Without that tuning I'd regularly have processes effectively
> rendered unresponsive as they were completely swapped out in favor of
> FS cache.

Yes, but the FS cache is allowed to grow to most of memory.  The ARC
size has a limit that if you have as much memory as the people that
wrote the code contemplated, is not nearly "most of memory".

Another thing I don't understand is how ARC relates to the vnode cache
and the buffer cache that stores file contents, and in particular if
there are two copies of things.

> What's the equivalent lever for ZFS?

Some variable not hooked up to a sysctl!


Re: zfs pool behavior - is it ever freed?

2023-07-27 Thread Greg Troxel
David Brownlee  writes:

> I would definitely like to see something like this in-tree soonest for
> low memory (<6GB?) machines, but I'd prefer not to affect machines
> with large amounts of memory used as dedicated ZFS fileservers (at
> least not until its easily tunable)

Can you apply this locally and spiff it up so that for say >= 8GB or >=
16GB the new rule doesn't fire?  That seems the fastest path to fixing
what is clearly very broken.  (IMHO avoiding pathological behavior is
more important than what is likely a minor efficiency issue, but it's
easier to avoid that discussion.)

(We don't have any data on the table that says this would hurt, either,
assuming that anyone using zfs is either using ssd only or has l2arc on
ssd.  Actual data would be interesting!)


Re: zfs pool behavior - is it ever freed?

2023-07-27 Thread Greg Troxel
Tobias Nygren  writes:

> I use this patch on my RPi4, which I feel improves things.
> People might find it helpful.

That looks very helpful; I'll try it.

> There ought to be writable sysctl knobs for some of the ZFS
> tuneables, but looks like it isn't implemented in NetBSD yet.

That seems not that hard -- it would be great if someone(tm) did that
and mailed a patch.

> --- external/cddl/osnet/dist/uts/common/fs/zfs/arc.c  3 Aug 2022 01:53:06 
> -   1.22
> +++ external/cddl/osnet/dist/uts/common/fs/zfs/arc.c  27 Jul 2023 11:10:40 
> -
> @@ -6100,6 +6100,10 @@ arc_init(void)
>   else
>   arc_c_max = arc_c_min;
>   arc_c_max = MAX(arc_c * 5, arc_c_max);
> +#if defined(__NetBSD__) && defined(_KERNEL)
> +/* XXX prevent ARC from eating more than 12% of kmem */
> + arc_c_max = MIN(arc_c, vmem_size(heap_arena, VMEM_ALLOC | VMEM_FREE) / 
> 8);
> +#endif
>
>   /*
>* In userland, there's only the memory pressure that we artificially

That seems eminently sensible and is sort of what I was thinking of
heading to.  Interesting q about /8 vs /16, but it's a reasonable enough
value to avoid lockups and that's 90% of the benefit.
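For scale, here is what a kmem/8 cap works out to at a few hypothetical
heap sizes (pure arithmetic, not the patch itself):

```shell
# arc_c_max under the patch is bounded by heap/8; /16 would halve these.
for kmem_gb in 4 8 16 32; do
  echo "heap ${kmem_gb} GB -> ARC capped at $((kmem_gb * 1024 / 8)) MB"
done
```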

I wonder if we should commit that as obviously better than where we are
now, where machines of <= 4G fail badly.

It would be interesting for people with 8G and 16G machines to try this
patch.  On 8G the cap will be somewhat lower than the current default;
on 16G it may not be lower at all.

Also perhaps a dmesg printout of what arc_c_max is set to, to help in
figuring things out.

(I suppose one can gdb it, too, for testing.)



Re: zfs pool behavior - is it ever freed?

2023-07-27 Thread Greg Troxel
I have a bit of data, perhaps merged with some off-list comments:

  People say that a 16G machine is ok with zfs, and I have seen no
  reports of real trouble.

  When I run my box with 4G, it locks up.

  When I run my box with 8G, I end up with pool usage in the 3 G to 3.5
  G range.  It feels like there's a limit as I've never seen it above
  3.5G.  vmstat -m says (after a lot of things happening):
In use 1975994K, total allocated 3110132K; utilization 63.5%

  On machines I have handy to check without zfs (amd64 if not labeled):
In use 198214K, total allocated 217912K; utilization 91.0%
   (1G, n9 rpi3, operating near RAM capacity)
In use 67140K, total allocated 71664K; utilization 93.7%
   (1G, n9 rpi3, doing very little)
In use 813025K, total allocated 864324K; utilization 94.1%
   (4G, n9, operates a backup disk (ufs2) and little else)
In use 901729K, total allocated 975280K; utilization 92.5%
   (4G, n9, router and various home servers)
In use 574035K, total allocated 652188K; utilization 88.0%
   (5G, n9, no building, mail+everything_else server)
In use 2841803K, total allocated 3120148K; utilization 91.1%
   (24G, n9, 14G tmpfs, has built a lot of packages)
  
  On the zfs box, the big users are:
zio_buf_512 dnode_t dmu_buf_impl zio_buf_16384 zfs_znode_cache


My conclusions:

  Generally in NetBSD pool usage for caching scales appropriately with
  RAM and/or responds to pressure.  That's why we see almost no reports
  of trouble except for zfs.

  A machine without zfs that is in the 4G class will use 0.5-1G for pools.

  A 4G machine with zfs, and an 8G machine, tend to end up around 3.5G
  for pools.  It seems that zfs uses 2.5-3G, regardless of what's
  available.

  Thus it seems there is a limit for zfs usage, but it is simply
  sometimes too high depending on available RAM.

  Utilization is particularly poor on the zfs machine, 64% vs 88-94% for
  the rest.

  Our howto should say:

32G is pretty clearly enough.  Nobody thinks there will be trouble.
16G is highly likely enough; we have no reports of trouble.
8G will probably work but ill advised for production use.
    4G will not work; we have no reports of successful long-term operation

When you run out, it's ugly.  External tickle after sync(8) works to
reboot.  Other wdog approaches unclear.


Additional data welcome of course.


Re: zfs pool behavior - is it ever freed?

2023-07-22 Thread Greg Troxel
Hauke Fath  writes:

> On Fri, 21 Jul 2023 08:31:46 -0400, Greg Troxel wrote:
> [zfs memory pressure]
>
>>   Are others having this problem?
>
> I have two machines, one at home (-10) and one at work (-9), in a 
> similar role as yours (fileserver and builds). While both have had 
> their moments, those have never been zfs related.
>
> They both have 32 GB RAM. The home machine, currently running a 
> netbsd-9 build natively and pkg_rr in a VM, is using 16 GB for pools as 
> we speak. 

Using half the ram for pools feels like perhaps a bug, depending -- even
if you are getting away with it.

I am curious:

  What VM approach?

  How much ram in the domU (generic term even if not xen)?

  Are you using NFS from the domU to dom0?  domU running zfs?  Something
  else?

  Is the 16G for pools the sum of the dom0 and domU pools?  Or ?

> My guess would be that your 8 GB are simply not enough for sustaining 
> both zfs and builds.

I think that's how it is, but it seems obviously buggy for that to be
the case.  It is dysfunctional to drive the system into lockup by
caching things that don't need to be cached.  The ffs vnode cache, for
example, does not do this.

The zfs howto currently talks about zfs taking 1G plus 1G per 1T of
disk.  For me that would be 1.8G, which would be ok.  But that's not
what happens.
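That rule of thumb as arithmetic (0.8 TB is my estimate for the ~840 GB
pool in this thread):

```shell
# HOWTO rule of thumb: 1 GB base plus 1 GB per 1 TB of pool.
awk -v tb=0.8 'BEGIN { printf "expected zfs memory: %.1f GB\n", 1 + tb }'
# prints: expected zfs memory: 1.8 GB
```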

Thanks for the data point; I'll probably edit the zfs HOWTO.  As it is
we should probably be recommending against zfs unless you have 64G of
RAM :-( as even your system doesn't seem healthy memory usage wise.


Re: zfs pool behavior - is it ever freed?

2023-07-21 Thread Greg Troxel
This script worked to reboot after a wedge.  Assuming one has a
watchdog of course.

  #!/bin/sh
  # Arm a hardware watchdog, then tickle it only after a zfs write and
  # a sync(8) complete; if either wedges, the timer expires and the
  # watchdog resets the machine.

  if [ `id -u` != 0 ]; then
  echo run as root
  exit 1
  fi

  # enable the TCO watchdog with a 360 second timeout
  wdogctl -e -p 360 tco0

  while true; do
  echo -n "LOOP: "; date
  # prove zfs still makes progress before tickling
  date > /tank0/n0/do-wdog
  sync
  # tickle (restart) the watchdog timer
  wdogctl -t
  sleep 60
  done


zfs pool behavior - is it ever freed?

2023-07-21 Thread Greg Troxel
I'm having trouble with zfs causing a system to run out of memory, when
I think it should work ok.  I have tried to err on the side of TMI.

I have a semi-old computer (2010) that is:
  netbsd-10
  amd64
  8GB RAM
  1T SSD
  cpu0: "Pentium(R) Dual-Core  CPU  E5700  @ 3.00GHz"
  cpu1: "Pentium(R) Dual-Core  CPU  E5700  @ 3.00GHz"

and it basically works fine, besides being a bit slow by today's
standards.  I am using it as a build and fileserver, heading to
eventually running pbulk, either in domUs or chroots.  I have recently
moved 2 physical machines (netbsd-9 i386 and amd64) to domUs; I use
these to build packages for production use.  (The machines are 2006 and
2008 mac notebooks, with painfully slow spinning disks and 4G of RAM
each -- but they work.)

wd0 has a disklabel, with / and /usr as normal FFSv2 (a and e), normal
swap on wd0b.  wd0f is defined as most of the disk, and is the sole
component of tank0:

  #> zpool status
    pool: tank0
   state: ONLINE
    scan: scrub repaired 0 in 0h8m with 0 errors on Tue Jul  4 20:31:03 2023
  config:

        NAME                   STATE     READ WRITE CKSUM
        tank0                  ONLINE       0     0     0
          /etc/zfs/tank0/wd0f  ONLINE       0     0     0

  errors: No known data errors

I have a bunch of filesystems, for various pkgsrc branches (created from
snapshots), etc:

  NAME   USED  AVAIL  REFER  MOUNTPOINT
  tank0  138G   699G26K  /tank0
  tank0/b0  6.16G   699G  6.16G  /tank0/b0
  tank0/ccache  24.1G   699G  24.1G  /tank0/ccache
  tank0/distfiles   35.1G   699G  35.1G  /tank0/distfiles
  tank0/n0  31.5K   699G  31.5K  /tank0/n0
  tank0/obj 3.48G   699G  3.48G  /tank0/obj
  tank0/packages7.27G   699G  7.27G  /tank0/packages
  tank0/pkgsrc-2022Q1130M   699G   567M  /tank0/pkgsrc-2022Q1
  tank0/pkgsrc-2022Q2145M   699G   569M  /tank0/pkgsrc-2022Q2
  tank0/pkgsrc-2022Q3194M   699G   566M  /tank0/pkgsrc-2022Q3
  tank0/pkgsrc-2022Q4130M   699G   573M  /tank0/pkgsrc-2022Q4
  tank0/pkgsrc-2023Q1147M   699G   582M  /tank0/pkgsrc-2023Q1
  tank0/pkgsrc-2023Q2148M   699G   583M  /tank0/pkgsrc-2023Q2
  tank0/pkgsrc-current  10.3G   699G  1.14G  /tank0/pkgsrc-current
  tank0/pkgsrc-wip   623M   699G   623M  /tank0/pkgsrc-wip
  tank0/u0  1.91M   699G  1.91M  /tank0/u0
  tank0/vm  49.5G   699G23K  /tank0/vm
  tank0/vm/n9-amd64 33.0G   722G  10.1G  -
  tank0/vm/n9-i386  16.5G   711G  4.38G  -
  tank0/ztmp 121M   699G   121M  /tank0/ztmp

which all feels normal to me.


I used to usually boot this as GENERIC.  Now I'm booting xen with 4G:

  menu=GENERIC:rndseed /var/db/entropy-file;boot netbsd
  menu=GENERIC single user:rndseed /var/db/entropy-file;boot netbsd -s
  menu=Xen:load /netbsd-XEN3_DOM0.gz root=wd0a rndseed=/var/db/entropy-file 
console=pc;multiboot /xen.gz dom0_mem=4096M
  menu=Xen single user:load /netbsd-XEN3_DOM0.gz root=wd0a 
rndseed=/var/db/entropy-file console=pc -s;multiboot /xen.gz dom0_mem=4096M
  menu=GENERIC.ok:rndseed /var/db/entropy-file;boot netbsd
  menu=Drop to boot prompt:prompt
  default=3
  timeout=5
  clear=1

I find that after doing things like cvs update in pkgsrc, I have a vast
amount of memory in pools:

  Memory: 629M Act, 341M Inact, 16M Wired, 43M Exec, 739M File, 66M Free
  Swap: 16G Total, 16G Free / Pools: 3372M Used

vmstat -m, sorted by Npage and showing > 1E4:

  zio_buf_16384 16384 57643153341 33786 22341 11445 30831 0   inf 
7143
  zio_buf_2560 2560   18636017890 15244  2467 12777 12777 0   inf 
12031
  ffsdino2 264   5406070   348374 28691 15875 12816 13522 0   inf   
 0
  zfs_znode_cache 248 245152   0   206469 1301518 12997 13015 0   inf  
665
  ffsino   280   5402490   348016 30887 17156 13731 14488 0   inf   
 0
  zio_buf_2048 2048   36944036004 15617   599 15018 15026 0   inf 
14259
  zio_buf_1536 2048   41491040737 18313 6 18307 18313 0   inf 
17657
  zio_buf_1024 1536   55808054191 22942   357 22585 22942 0   inf 
21442
  dmu_buf_impl_t 216 5388280   440673 2301611 23005 23016 0   inf  
380
  arc_buf_hdr_t_f 208 657474   0   556468 25273   638 24635 25096 0   inf 
7913
  zio_data_buf_51 1024 187177  0   157005 45575 14127 31448 45575 0   inf 
10220
  vcachepl 640   266639056918 34959 2 34957 34958 0   inf   
 1
  dnode_t  640   5761980   485522 70645  9470 61175 70645 0   inf 
11511
  zio_buf_512 1024   8482400   798838 141743 15535 126208 128224  0   inf 
96759
  Memory resource pool statistics
  NameSize Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg 
Idle

sysctl:

  kstat.zfs.misc.arcstats.size = 283598992

If I continue to do things, the system locks up and needs to have the
reset button pushed.  I'm now trying an external tickle watchdog 

Re: NetBSD & disks with 4K sector size

2023-07-20 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> g...@lexort.com (Greg Troxel) writes:
>
>>mlel...@serpens.de (Michael van Elst) writes:
>>> The xbd driver lies about the sector size and always reports 512byte
>>> sectors. If you pass through a 4k sector host disk, this make some
>>> I/O operations fail.
>
>>What do you suggest we do?
>
> I suggested to make it tell the truth, all data is provided by the
> backend. Then it just works like other drivers.

Thanks - I have run that up the flagpole on port-xen,  with status

  EWOULDBLOCK|ENOPATCH

and hopefully that async call will return successfully!


Re: NetBSD & disks with 4K sector size

2023-07-20 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> g...@lexort.com (Greg Troxel) writes:
>
>>With any luck, this is supported and the xbd driver in NetBSD is just
>>not noticing the sector size variable and it's a fairly small matter of
>>programming.
>
> The xbd driver lies about the sector size and always reports 512byte
> sectors. If you pass through a 4k sector host disk, this make some
> I/O operations fail.

What do you suggest we do?


Re: NetBSD & disks with 4K sector size

2023-07-20 Thread Greg Troxel
Probably best to use port-xen now that we think it's a xen thing.  I
would suggest investigating everything suggested to you and then sending
a new message based on your future understanding to port-xen.

1) I am unclear on whether the xbd abstraction has the concept of 4K
sector sizes, and if so how that is communicated, and if not how the
dom0 is supposed to map things.

With any luck, this is supported and the xbd driver in NetBSD is just
not noticing the sector size variable and it's a fairly small matter of
programming.

Looking quickly at sys/arch/xen/xbd_xenbus.c, I see some concept of
sector size, but it would take me a while to understand and trace all
this.

I suggest trying a current or netbsd-10 PV or PVH domU and seeing what happens.



2) In your HVM success, do the disks appear as 4K within the domU?
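
One quick check from inside a NetBSD domU is disklabel(8)'s
"bytes/sector" line.  A sketch of extracting it, run here against
captured sample output since the real command needs the domU itself
(device name and sample values are illustrative):

```shell
# Pull the reported sector size out of disklabel(8) output.
# "sample" stands in for `disklabel xbd0` on a real domU.
sample='# /dev/rxbd0d:
type: ESDI
bytes/sector: 512
sectors/track: 63'

echo "$sample" | awk -F': ' '/^bytes\/sector/ { print $2 }'   # prints 512
```

If xbd is lying as described above, this will say 512 even when the
backing host disk is 4K.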


Re: cctlds in wtf

2023-06-29 Thread Greg Troxel
Jan Schaumann  writes:

> It's not uncommon for me to want to easily look up
> what country a specific ccTLD is for (literally "wtf
> is md").  I was wondering if people objected to adding
> the whole catalog to acronyms.comp for wtf(1)?

I don't think this is a good idea (which should be read as an
objection).  These aren't acronyms; they are codepoints, and a vast
number of 2-letter combinations have values.  It seems like a tremendous
amount of noise.  If you want to look up a 2-letter or 3-letter country
code, that's a different question than asking for an acronym.  Jeremy
has pointed out that it's there already.

> If people see value in adding those, would you also
> want to have the gTLDs added?  I kinda feel like
> that'd be too many.

It is definitely too many, and in the wrong place.

Now, if you want to add an argument -c or something to wtf(6) so that it
looks up the argument which must be a 2 or 3 letter cc instead of
treating it as an acronym, by consulting /usr/share/misc/country or
domain, that seems ok.
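
A minimal sketch of that kind of lookup.  The field positions assumed
here (2-letter code third-from-last, 3-letter code second-from-last)
are from memory of /usr/share/misc/country and may need adjusting; the
sample data is illustrative, not copied from the real file:

```shell
# Hypothetical "wtf -c"-style helper: look up a 2- or 3-letter country
# code in a country table (name... CC CCC number per line).
lookup_cc() {
    code=$(echo "$1" | tr '[:lower:]' '[:upper:]')
    awk -v c="$code" '$(NF-2) == c || $(NF-1) == c { print }' "$2"
}

cat > /tmp/country.sample <<'EOF'
Moldova, Republic of   MD   MDA   498
Netherlands   NL   NLD   528
EOF

lookup_cc md /tmp/country.sample   # prints the Moldova line
```

On a real system you would point it at /usr/share/misc/country instead
of the sample file.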



Re: Call for testing: Diagnostics for broken downloads from NetBSD.org CDN

2023-06-17 Thread Greg Troxel
mlel...@serpens.de (Michael van Elst) writes:

> Fastly caches data in segments:
>
> https://docs.fastly.com/en/guides/segmented-caching
>
> If some segments of a file can be fetched from the backend
> and others cannot, fastly will deliver a partial file and
> return an error. For a complete file all segments must have
> been fetched and cached succssfully.

Is a client that writes a partial file and exits considered buggy?  Or
are users obligated to check for exit status 0 and if not rm the file?
Or something else?  This all just seems a little odd to me, but I am far
from a fetch/CDN expert.
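
For what it's worth, the defensive-client pattern would be to treat any
nonzero exit as "the output may be partial" and remove it.  A sketch,
where fetch_cmd is a placeholder for whatever fetcher is actually in
use (ftp(1), curl, ...):

```shell
# Defensive fetch wrapper: on any nonzero exit, assume the output file
# may be partial and remove it rather than leaving it behind.
fetch_checked() {
    url=$1; out=$2
    if ! fetch_cmd "$url" "$out"; then
        rm -f "$out"
        return 1
    fi
}
```

Whether clients are obligated to do this, or the CDN is obligated never
to deliver a partial body, is exactly the question here.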


Re: CentOS emulation

2023-06-11 Thread Greg Troxel
Fekete Zoltán  writes:

> I want to experiment with CentOS on my NetBSD box. Is it enough if I
> set up kernel, /proc, /dev, then copy the contents of /etc and /lib64
> from a real CentOS installation over /linux/emul? Is there any further
> specific task I should consider?

/emul/linux, not the other way around.

You might want /usr/bin from a minimal system, if you are running things
that want Linux-flavored tools.

Not sure it's enough, but it sounds like the right path to head down.
netbsd-10 is surely better than 9, and current is better still if there
are newish or just newly-encountered-in-netbsd-emul syscalls.  You might
have to implement some more.
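
As a sketch, the skeleton might be built like this.  The prefix
defaults to a scratch directory so the layout can be rehearsed without
root; for the real thing run it as root with EMUL=/emul/linux, and note
the exact directory set is an assumption, not a tested recipe:

```shell
# Build the /emul/linux skeleton under $EMUL (scratch prefix by default).
EMUL=${EMUL:-/tmp/emul-rehearsal/emul/linux}
mkdir -p "$EMUL/etc" "$EMUL/lib64" "$EMUL/usr/bin"
# then copy /etc and /lib64 (and optionally a minimal /usr/bin)
# from a real CentOS installation into $EMUL
```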


Re: Meaning of file flags

2023-05-23 Thread Greg Troxel
I just pulled out "The Design and Implementation of the 4.4BSD Operating
System".  It talks about immutable, append-only, and nodump on page 263.
There is no mention of "archive".

The book is also clear about 16 bits of user flags and 16 bits of
system flags.  chflags(1) is confusing about this, and chflags(2) is
better, showing that there are system flags and user flags (SF_ and
UF_).
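
That split can be sketched numerically: the low 16 bits hold the
owner-changeable (UF_*) flags and the high 16 bits the super-user
(SF_*) flags, so decoding a flags word into its halves is just a mask
and a shift:

```shell
# Split a file-flags word into its owner-changeable (UF_*, low 16 bits)
# and super-user (SF_*, high 16 bits) halves, per chflags(2).
flag_halves() {
    f=$(( $1 ))
    printf 'user=0x%04x system=0x%04x\n' \
        $(( f & 0xffff )) $(( (f >> 16) & 0xffff ))
}

flag_halves 0x00010001   # prints: user=0x0001 system=0x0001
```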

It turns out there are more flags in sys/stat.h than are documented in
chflags(2):

/*
 * Definitions of flags stored in file flags word.
 *
 * Super-user and owner changeable flags.
 */
#define UF_SETTABLE  0x0000ffff  /* mask of owner changeable flags */
#define UF_NODUMP    0x00000001  /* do not dump file */
#define UF_IMMUTABLE 0x00000002  /* file may not be changed */
#define UF_APPEND    0x00000004  /* writes to file may only append */
#define UF_OPAQUE    0x00000008  /* directory is opaque wrt. union */
/*  UF_NOUNLINK  0x00000010 [NOT IMPLEMENTED] */
/*
 * Super-user changeable flags.
 */
#define SF_SETTABLE  0xffff0000  /* mask of superuser changeable flags */
#define SF_ARCHIVED  0x00010000  /* file is archived */
#define SF_IMMUTABLE 0x00020000  /* file may not be changed */
#define SF_APPEND    0x00040000  /* writes to file may only append */
/*  SF_NOUNLINK  0x00100000 [NOT IMPLEMENTED] */
#define SF_SNAPSHOT  0x00200000  /* snapshot inode */
#define SF_LOG       0x00400000  /* WAPBL log file inode */
#define SF_SNAPINVAL 0x00800000  /* snapshot is invalid */

where we see that there is perhaps a relationship, or at least a bit
reuse, between NODUMP and ARCHIVED.

The three undocumented flags appear to be implementation details and not
part of the syscall interface.

But, reading the sources, I see that the archive bit was indeed added
for MS-DOS compat.  Still, it's far from clear what it means in BSD
other than that it enables read/write access to an archive bit in the
underlying filesystem.  This seems to come from Lite in 1994, but I'm
running out of the will to conduct software archaeology.

So perhaps we should describe it as

  archive bit set in underlying foreign filesystem

and perhaps deny setting it in filesystems that don't implement an
archive bit.

(Probably, the use of this bit is now historical, as it seems at best
awkward for backups, in that one of course wants to make backups to
different media sets, to be able to restore precisely without adding
back deleted files, and to have deduplication.)




Re: Meaning of file flags

2023-05-23 Thread Greg Troxel
Rocky Hotas  writes:

>> +The
>> +.Va arch
>> +flag is only used in connection with certain
>> +filesystems (e.g., MS-DOS), where it indicates whether
>> +a file has been modified since it was last backed up.

I find that very surprising.  AIUI, these flags date from 4.4BSD, and
there was not a strong culture of interoperating with MS-DOS
filesystems.  And, people were aware of TOPS-20 and more mainframe type
systems, where there was an idea that a file would be migrated to tape
but still have an entry in the filesystem.

I would suggest reading the 4.4BSD sources, or our own history, to see
what the flags did.  It is also possible they were defined as a possible
future good idea and never really used.



> Considering the first lines of the section `Usage' here:
>
>  

That's about

  "CP/M, Microsoft operating systems, OS/2, and AmigaOS"

according to the article.  I do not expect the authors of the BSD code
to have considered themselves to be implementing a compatible feature
from CP/M.


Re: Meaning of file flags

2023-05-22 Thread Greg Troxel
Rocky Hotas  writes:

> I still can't understand the meaning of `nodump' and `arch'.

I can help with nodump...

man 2 chflags

contains

   UF_NODUMP Do not dump the file.

and dump(8) has an -h flag to respect the nodump flag.

Basically, if you use -h0 when calling dump, then files that have the
nodump flag (and everything under a dir that has the nodump flag) will
be omitted from the backup.




Re: Help understanding pkgsrc snapshots

2023-04-17 Thread Greg Troxel
vom513  writes:

> So say instead of using the pkgsrc tarballs I did a CVS clone
> (checkout ? I’m more familiar with git).

Yes, checkout.   CVS creates a working tree locally.  Unlike git it does
not copy the repo.

> Then I build a couple of packages.  At some point after this I want to
> refresh from CVS - is there some top level “make clean” type thing I
> need to do ?  Whats the best practice here ?  I guess my thought is
> around the packages that I built now have various files they didn’t
> before by virtue of being built.  I would be concerned about pulling
> from CVS and the “base” files in these dirs having been changed/added
> etc.

Two related issues:

  With pkgsrc, you do "make package-install" and IMHO always want to
  "make clean" afterwards.  You can "make clean-depends", or you can set
DEPENDS_TARGET= bin-install clean
  in mk.conf to try to use binary packages and clean up afterwards.

  Separately from workdirs, after you update, some packages might be out
  of date.  My approach is pkg_rolling-replace; others have other
  approaches.
  There used to be a nice wiki page about this, but it appears to have
  been deleted!


Re: Help understanding pkgsrc snapshots

2023-04-16 Thread Greg Troxel
Also:

  If you want to mix packages you build and a published set, you have to
  match the sources to the sources they used, which means same quarterly
  branch.  It's ok to update along the branch -- because of our rules
  for applying changes to the branch -- that's what it's for.

  2023Q1 builds for sparc are surely woefully incomplete today, only 3
  weeks in.   It's likely 2022Q4 is the right choice today.

  Ask for specific advice about sparc on port-sparc.


Re: Help understanding pkgsrc snapshots

2023-04-16 Thread Greg Troxel
vom513  writes:

> - This is no way intended to be anything negative against NetBSD devs.

Asking questions is totally fine.

> I’ve been playing with NetBSD 9.3 on one of my old sparcstations (32
> bit).

> - Take a look at http://ftp.netbsd.org/pub/pkgsrc/pkgsrc-2022Q4/ In
> that dir there are bundles that have the snapshot name in them, and
> others that don’t.  Very hard to tell what is what.  I would assume
> the ones with the snapshot in the name are the ones of that snapshot
> based on the file dates ?  What are the others with dates ahead of the
> snapshot date ?

These are tarballs from CVS checkouts.  I recommend using CVS directly.
(I do not use these tarballs.)

The central confusion is that pkgsrc-2022Q4 is a branch, and there is
also a tag that the branch is rooted at.  The branch is updated over
time (until the next one) for security fixes.  The one with 2022Q4 in
the name is the original branch content, and the one without is the most
recent version of the branch.  That is 25 March because once 2023Q1 is
created, we don't do anything further with 2022Q4.

> - Compare these two dirs:
>   http://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/amd64/
>   http://ftp.netbsd.org/pub/pkgsrc/packages/NetBSD/sparc/
>
> amd64 has a lot more subdirs, versions etc.  Not consistent.  Is this because 
> of the “tier” status of the arch’s ?

Sort of.  Builds get done by people (within the NetBSD developers) as
they want to spend time, and as resources are available.  There are
NetBSD-owned resources for i386/amd64 and aarch64/earmv[67], and the
rest is basically hardware owned by individual NetBSD developers.  And,
as you know, it takes a long time to build things on sparc.

It's not so much that we have decided sparc won't be allocated
resources; you are just seeing what people volunteer to do.  (Which is
in my view very impressive.)


> - So as time moves on - I would guess older packages* dir’s get
> removed.  Are there any timelines/guidelines on when this happens ?
> My thinking is along the lines of say Ubuntu where unless it’s LTS (5
> years) - the intermediate versions repos go dark.  So running a
> non-LTS system - you will find you can’t (easily) install packages
> after such and such a time…

See archive.netbsd.org which has the older ones.

Basically, the last two quarters are kept.   See
  https://www.pkgsrc.org/quarterly/
and "Current vs. old binary package sets".

There are no pkgsrc LTS branches.  We create a new branch every 3
months, and at the time of creation maintenance stops on the old one.

If there are complete binary package sets for a newer branch, then you
can "pkgin fug" and upgrade all.   I do this on amd64, after making sure
that my package sets (that I build) have what I need.

It is a hard problem to have enough packages for what everybody wants on
sparc, without someone funding a cluster of 32 sparcs  and paying for
space/HVAC/electricity and sysadmin.


Re: Blocklistd + postfix

2023-04-07 Thread Greg Troxel
Brook Milligan  writes:

>> On Apr 6, 2023, at 2:19 PM, Martin Neitzel wrote:
>> 
>>> Is it possible for the NetBSD postfix to trigger blocklistd events?
>> 
>> For what it's worth, nothing in /usr/libexec/postfix uses the lib.
>
> Does it make sense that failed SMTP authentication should trigger blocklistd 
> events?

I think it's always tricky.   Basically, anything that goes wrong that
indicates an unauthorized user trying to obtain access seems like a
reasonable thing to include.   Failing authentication for submission
seems like exactly that case.

Of course, problems include authorized users doing things wrong and
getting blocked, and authorized users sharing IP addresses with
attackers getting blocked -- but that doesn't seem to be about SMTP
specifically.


Re: TOTP apps, and WebAuthn recommended devices?

2023-03-25 Thread Greg Troxel
Thanks very much for the detailed response.

One thing that's not 100% clear to me:

  One device (plus a second one as a backup!)


A device can fail or be lost, so the backup concept is obvious, and
perhaps should extend to a third.

Are the backup devices independent in that you

  enroll device A on a site

  enroll device B on the same site

and then either one will be accepted by the site to login, and they
otherwise don't have anything to do with each other?  I mean no transfer
of keymat, or other linkage.

So therefore one could have a secondary backup in a place far away
that's somewhat hard to get to, and when visiting it every few months,
enroll that backup as an additional key in the sites that were added to
the working device (carried with you) and the primary backup.

And yes, I realize that one needs physical access control on all the
devices, except that an attack requires pw + one of the devices.


Good point about TOTP and phishing.   Password via password manager and
TOTP mitigates that, as not typing in passwords means autofill needs to
work by URL match.

But, I'm mostly coming from "I need to cope with this world because
various sites are making it required", and I wanted to really understand
before digging in.  Important sites like adafruit, for instance,
supposedly to protect RPI purchases from bots, because nobody could
possibly code a bot that does TOTP, or something like that.



