Re: change in /usr/bin/bc with CTRL-d no longer exit [fixed in -CURRENT]

2024-09-17 Thread Stefan Esser

On 16.09.24 at 17:01, Cy Schubert wrote:

In message 
, Warner Losh writes:

[...]

The irony here is that I fixed this very bug 2 or 3 years ago.

Warner



It's curious (or maybe not) how some bugs keep returning time and
time again.


Version 7.0.2 has just been committed to -CURRENT, with the MFC planned
after 3 days. (Or is this a bug fix that might be committed without delay,
since it restores the code to what it was in version 7.0.0?)

The issue was caused by differences in behavior between editline on FreeBSD
and Linux versus MacOS, and an attempt to fix ^D on the latter.

Regards, STefan



Re: pkg scripts need updating

2024-05-14 Thread Stefan Esser




On 15.05.24 at 02:21, Enji Cooper wrote:



On May 14, 2024, at 7:19 AM, Michael Butler  wrote:

After commit aa48259f337100e79933d660fec8856371f761ed to src, which removed
security_daily_compat_var, I get these warnings daily:

aaron.protected-networks.net login failures:

aaron.protected-networks.net refused connections:
/usr/local/etc/periodic/security/405.pkg-base-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/405.pkg-base-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/405.pkg-base-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/405.pkg-base-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/405.pkg-base-audit: security_daily_compat_var: not found

Checking for security vulnerabilities in base (userland & kernel):
Database fetched: 2024-05-12T14:16-04:00
0 problem(s) in 0 installed package(s) found.
0 problem(s) in 0 installed package(s) found.
/usr/local/etc/periodic/security/410.pkg-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/410.pkg-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/410.pkg-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/410.pkg-audit: security_daily_compat_var: not found
/usr/local/etc/periodic/security/410.pkg-audit: security_daily_compat_var: not found

Checking for packages with security vulnerabilities:
Database fetched: 2024-05-12T14:16-04:00
/usr/local/etc/periodic/security/460.pkg-checksum: security_daily_compat_var: not found
/usr/local/etc/periodic/security/460.pkg-checksum: security_daily_compat_var: not found
/usr/local/etc/periodic/security/460.pkg-checksum: security_daily_compat_var: not found

Checking for packages with mismatched checksums:


Have you tried emailing the issue to the committer/filing a bug report to bring 
this to their attention?
Cheers,


The messages are caused by running:

/usr/local/etc/periodic/security/405.pkg-base-audit
/usr/local/etc/periodic/security/460.pkg-checksum
/usr/local/etc/periodic/security/410.pkg-audit

These scripts have been installed by pkg-1.12.2 on my system ...

Best regards, STefan



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-26 Thread Stefan Esser

On 26.01.24 at 17:09, Rodney W. Grimes wrote:

On 25.01.24 at 16:38, Ed Maste wrote:

On Wed, 24 Jan 2024 at 12:30, Warner Losh  wrote:
sbin/growfs/tests/legacy_test.pl
tools/regression/msdosfs/msdosfstest-2.sh
tools/regression/tmpfs/t_vnd
tools/tools/nanobsd/legacy.sh


All these scripts that currently depend on bsdlabel could
easily be converted to exclusively use gpart instead.

Other scripts that had been identified to use bsdlabel or
disklabel are unused / not relevant for FreeBSD.

[...]

The bsdlabel/disklabel/fdisk programs could be rewritten using
gpart without too much effort, at least for the use cases that


After looking at the source code it appears that there is
no need to rewrite any of the bsdlabel/disklabel code, since
it already uses geom calls to access the partition data and
only uses direct disk writes to write out the partition table
(as does gpart, AFAICT).

So, I do not see any dependencies on deprecated kernel features.

I have not compared the bsdlabel code and gpart_write_partcode()
in detail, but I do not see much of a difference at first glance.

Therefore, bsdlabel and disklabel could be kept in the base
system, IMHO. (But fdisk should go ...)


That would be wonderful.  Even just getting it to spit out
the FULL MBR values that are in a protective type 0xEE MBR
would go a long way toward diagnosing some corrupt GPT disks.


If you need access to the protective MBR of a GPT partition,
this feature should be added to gpart instead, IMHO.

But what's wrong with using "file -s" for this purpose:

# file -s /dev/nda0
/dev/nda0: DOS/MBR boot sector; partition 1 : ID=0xee, start-CHS (0x0,0,2), end-CHS (0x3ff,255,63), startsector 1, 1953525167 sectors, extended partition table (last)


Do you need more information from the protective MBR?

This will not work on mounted file systems, though. But if
you got the disk mounted, I'd expect you do not really need
this information ...
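To make the fields in that output concrete, here is a minimal C sketch that decodes one entry of a classic MBR partition table the way "file -s" reports it. It assumes the standard layout (table at byte offset 446, four 16-byte entries, 0x55 0xAA signature at offset 510, little-endian LBA fields); the names mbr_part, mbr_parse() and mbr_print() are made up for this illustration and are not part of gpart or file(1).

```c
#include <stdint.h>
#include <stdio.h>

/* Decoded form of one 16-byte MBR partition table entry. */
struct mbr_part {
	uint8_t		status;		/* 0x80 = bootable */
	uint8_t		type;		/* 0xee = GPT protective */
	unsigned	start_c, start_h, start_s;
	unsigned	end_c, end_h, end_s;
	uint32_t	start_lba;
	uint32_t	nsectors;
};

/* Little-endian 32-bit load, independent of the host byte order. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Packed CHS: byte 0 is the head, bits 0-5 of byte 1 the sector, and
 * bits 6-7 of byte 1 are the high bits of the 10-bit cylinder in byte 2.
 */
static void decode_chs(const uint8_t *p, unsigned *c, unsigned *h,
    unsigned *s)
{
	*h = p[0];
	*s = p[1] & 0x3f;
	*c = (unsigned)(p[1] & 0xc0) << 2 | p[2];
}

/* Parse entry 'idx' (0..3) of a 512-byte MBR sector; -1 on error. */
static int mbr_parse(const uint8_t sec[512], int idx, struct mbr_part *mp)
{
	const uint8_t *e;

	if (idx < 0 || idx > 3 || sec[510] != 0x55 || sec[511] != 0xaa)
		return -1;
	e = sec + 446 + 16 * idx;
	mp->status = e[0];
	decode_chs(e + 1, &mp->start_c, &mp->start_h, &mp->start_s);
	mp->type = e[4];
	decode_chs(e + 5, &mp->end_c, &mp->end_h, &mp->end_s);
	mp->start_lba = le32(e + 8);
	mp->nsectors = le32(e + 12);
	return 0;
}

/* Print the fields in roughly the order "file -s" reports them. */
static void mbr_print(const struct mbr_part *mp)
{
	printf("ID=0x%02x, start-CHS (0x%x,%u,%u), end-CHS (0x%x,%u,%u), "
	    "startsector %lu, %lu sectors\n", (unsigned)mp->type,
	    mp->start_c, mp->start_h, mp->start_s,
	    mp->end_c, mp->end_h, mp->end_s,
	    (unsigned long)mp->start_lba, (unsigned long)mp->nsectors);
}
```

Given the first sector of the disk above, mbr_parse(sec, 0, &mp) followed by mbr_print(&mp) should report the same type, CHS and LBA values as the "file -s" line: cylinder 0x3ff, head 255, sector 63 is simply the largest value the packed CHS fields can hold.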


have not become obsolete (e.g. CHS specifications) and only for
use in scripts (i.e. no fdisk interactive edit mode).


You are fooling yourself if you think an MBR and CHS values
are obsolete.  GPT *IS* a type 0xEE MBR, and see how many
BIOSes you can crash by writing garbage (especially 0x0)
to the CHS values.  That MBR must have proper values, and
you can't just ignore that they exist.


Again something that gpart should be able to diagnose and fix.

Doesn't "gpart recover" already fix such protective MBRs?


Even parsing of the disktab format and a conversion to gpart
backup format for use by gpart restore should not be too hard.

That would keep the commands available for those that use them
in scripts outside the FreeBSD sources, but would also allow to
remove the kernel interfaces used by those legacy tools.

I'd be willing to write those emulations of legacy tools, if
there is interest in going that way ...


I would be interested in seeing these.
For me gpart does do a lot of things, but it is missing
some very low level stuff that it probably should have.


I read that to mean that gpart is useful for standard setup
operations, but it lacks commands that might be useful to
diagnose inconsistent parameters?

Well, adding consistency checks and warnings about potential
issues might not have been on the requirements sheet, but if
you specify checks that should be performed, these could be
added to "gpart show", or a "gpart check" command could be
implemented.

If you want such consistency checks added, then specify them
in a feature request PR, for example.
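As an illustration of what such a specification could contain, here is a C sketch of checks on a protective MBR. It assumes the layout the UEFI specification prescribes for a protective MBR (entry 0 of type 0xEE with starting CHS 0/0/2, starting LBA 1, size of the disk minus one capped at 0xFFFFFFFF, the other three entries zero). pmbr_check() and the PMBR_* flags are hypothetical; gpart does not currently offer such a command.

```c
#include <stdint.h>

/* Hypothetical problem flags for a "gpart check"-style report. */
#define PMBR_BAD_SIGNATURE	0x01u	/* no 0x55 0xAA at offset 510 */
#define PMBR_BAD_TYPE		0x02u	/* entry 0 is not type 0xEE */
#define PMBR_BAD_START_CHS	0x04u	/* start CHS is not (0,0,2) */
#define PMBR_BAD_START_LBA	0x08u	/* entry 0 does not start at LBA 1 */
#define PMBR_BAD_SIZE		0x10u	/* size is not min(disk - 1, 2^32 - 1) */
#define PMBR_EXTRA_ENTRIES	0x20u	/* entries 1..3 are not all zero */

/* Little-endian 32-bit load, independent of the host byte order. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Compare a protective MBR with the layout described in the UEFI
 * specification, for a disk of 'disk_sectors' logical blocks.
 * Returns 0 if nothing suspicious was found, else a mask of PMBR_* bits.
 * The ending CHS field is not checked in this sketch.
 */
static unsigned pmbr_check(const uint8_t sec[512], uint64_t disk_sectors)
{
	const uint8_t *e = sec + 446;
	uint64_t want_size = disk_sectors - 1;
	unsigned problems = 0;

	if (sec[510] != 0x55 || sec[511] != 0xaa)
		problems |= PMBR_BAD_SIGNATURE;
	if (e[4] != 0xee)
		problems |= PMBR_BAD_TYPE;
	/* Garbage (e.g. all zero) CHS values are what upsets some BIOSes. */
	if (e[1] != 0x00 || e[2] != 0x02 || e[3] != 0x00)
		problems |= PMBR_BAD_START_CHS;
	if (le32(e + 8) != 1)
		problems |= PMBR_BAD_START_LBA;
	if (want_size > UINT32_MAX)
		want_size = UINT32_MAX;
	if (le32(e + 12) != want_size)
		problems |= PMBR_BAD_SIZE;
	/* Bytes 16..63 of the table hold entries 1..3. */
	for (int i = 16; i < 64; i++)
		if (e[i] != 0)
			problems |= PMBR_EXTRA_ENTRIES;
	return problems;
}
```

Returning a bit mask rather than stopping at the first problem lets a "show" or "check" command list every inconsistency at once, e.g. a stale size after a disk image has been grown.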

Best regards, STefan



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-26 Thread Stefan Esser

On 25.01.24 at 16:38, Ed Maste wrote:

On Wed, 24 Jan 2024 at 12:30, Warner Losh  wrote:


Those are the only users in the tree, but not for long :)


I have some reviews open to remove some old fdisk / disklabel /
bsdlabel invocations from the tree.

With those applied, for fdisk I see the following references
(excluding sbin/fdisk/* and comments, old examples, etc.):

contrib/netbsd-tests/sbin/gpt/t_gpt.sh


This test contains NetBSD specific details and will not run
on FreeBSD.


tests/sys/cddl/zfs/bin/zpool_smi.ksh


More than 99% of the tests in tests/sys/cddl/zfs are skipped,
including this one, which relies on commands that do not exist
on FreeBSD.


For bsdlabel / disklabel:

sbin/growfs/tests/legacy_test.pl


This test could easily be changed to use gpart.


tools/regression/msdosfs/msdosfstest-2.sh


Trivially fixed.


tools/regression/tmpfs/t_vnd


Trivially fixed.


tools/tools/nanobsd/legacy.sh


It already uses gpart and could easily be fixed.


contrib/netbsd-tests/kernel/t_umount.sh
contrib/netbsd-tests/kernel/t_umountstress.sh
contrib/netbsd-tests/sbin/gpt/t_gpt.sh


These are unused and won't run without modification.


sbin/newfs/runtest00.sh
sbin/newfs/runtest01.sh


Unused and do not run on a current version of FreeBSD.


These will need to be addressed before actually removing any of these
binaries, of course.


I could fix those that are actually usable and installed on
a current FreeBSD system within at most 1 hour.


I wouldn't object to making these ports, but both these programs use 'sekret'
bits from the kernel that might not remain exposed as we clean things up.
Though the IOCTLs they do (or used to do) may no longer be relevant. It's
been so long that I've forgotten.


If we eventually stop exporting those kernel interfaces the tools
would fail anyway, so IMO we can keep providing the kernel interfaces
along with the headers etc, and keep building from source until/unless
we drop support altogether.


The bsdlabel/disklabel/fdisk programs could be rewritten using
gpart without too much effort, at least for the use cases that
have not become obsolete (e.g. CHS specifications) and only for
use in scripts (i.e. no fdisk interactive edit mode).

Even parsing of the disktab format and a conversion to gpart
backup format for use by gpart restore should not be too hard.

That would keep the commands available for those that use them
in scripts outside the FreeBSD sources, but would also allow to
remove the kernel interfaces used by those legacy tools.

I'd be willing to write those emulations of legacy tools, if
there is interest in going that way ...
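As a rough sketch of the disktab conversion mentioned above: the C function below picks the per-partition capabilities pX# (size in sectors), oX# (offset) and tX= (fstype) out of one disktab entry and writes text in the layout that "gpart backup" prints for "gpart restore" (a "SCHEME ENTRIES" header line followed by "index type start size" lines). The header line "BSD 8", the skipping of the raw 'c' partition, the small fstype-to-gpart-type table and the name disktab_to_gpart() are all assumptions for this illustration; entry names, continuation lines and "tc=" references are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NPARTS 8	/* disklabel partitions 'a'..'h' */

struct dtpart {
	unsigned long long offset, size;
	char type[16];
};

/* Map a few disklabel fstype names to gpart type aliases (subset only). */
static const char *gpart_type(const char *fstype)
{
	if (strcmp(fstype, "4.2BSD") == 0)
		return "freebsd-ufs";
	if (strcmp(fstype, "swap") == 0)
		return "freebsd-swap";
	return NULL;
}

/*
 * Convert the pX#, oX# and tX= capabilities of one disktab entry into
 * text for "gpart restore".  Returns the output length or -1 on error.
 */
static int disktab_to_gpart(const char *caps, char *out, size_t outsz)
{
	struct dtpart p[NPARTS];
	char buf[512], *f, *save;
	const char *type;
	size_t used;
	int i, n;

	memset(p, 0, sizeof(p));
	if (strlen(caps) >= sizeof(buf))
		return -1;
	strcpy(buf, caps);
	for (f = strtok_r(buf, ":", &save); f != NULL;
	    f = strtok_r(NULL, ":", &save)) {
		if (strlen(f) < 3 || f[1] < 'a' || f[1] >= 'a' + NPARTS)
			continue;	/* not a per-partition capability */
		i = f[1] - 'a';
		if (f[0] == 'p' && f[2] == '#')
			p[i].size = strtoull(f + 3, NULL, 0);
		else if (f[0] == 'o' && f[2] == '#')
			p[i].offset = strtoull(f + 3, NULL, 0);
		else if (f[0] == 't' && f[2] == '=')
			snprintf(p[i].type, sizeof(p[i].type), "%s", f + 3);
	}
	n = snprintf(out, outsz, "BSD %d\n", NPARTS);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	used = (size_t)n;
	for (i = 0; i < NPARTS; i++) {
		/* Skip unused slots and 'c', the raw partition. */
		if (i == 2 || p[i].size == 0)
			continue;
		if ((type = gpart_type(p[i].type)) == NULL)
			return -1;
		n = snprintf(out + used, outsz - used, "%d %s %llu %llu\n",
		    i + 1, type, p[i].offset, p[i].size);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Unknown fstypes make the conversion fail instead of guessing, which seems the safer default for a tool whose output is fed straight into "gpart restore".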

Regards, STefan



Re: vt and keyboard accents

2023-02-01 Thread Stefan Esser

On 29.01.23 at 01:54, Yuri wrote:

Looking into an issue with accent input for vt and the cz (so
/usr/share/vt/keymaps/cz.kbd) keyboard, where some of the accents
work and others result in weird, unrelated characters being output.

Checking kbdcontrol -d output, there is an obvious difference with
keymap contents -- all mappings are trimmed down to 1 byte after reading:

kbdcontrol:
   dacu  180  ( 180 180 ) ( 'S' 'Z' ) ( 'Z' 'y' ) ( 's' '[' )
  ( 'z' 'z' ) ( 'R' 'T' ) ( 'A' 193 ) ( 'L' '9' )
  ( 'C' 006 ) ( 'E' 201 ) ( 'I' 205 ) ( 'N' 'C' )
  ( 'O' 211 ) ( 'U' 218 ) ( 'Y' 221 ) ( 'r' 'U' )
  ( 'a' 225 ) ( 'l' ':' ) ( 'c' 007 ) ( 'e' 233 )
  ( 'i' 237 ) ( 'n' 'D' ) ( 'o' 243 ) ( 'u' 250 )
  ( 'y' 253 )

keymap:
   dacu  0xb4  ( 0xb4   0xb4   ) ( 'S' 0x015a ) ( 'Z' 0x0179 ) ( 's' 0x015b )
               ( 'z' 0x017a ) ( 'R' 0x0154 ) ( 'A' 0xc1   ) ( 'L' 0x0139 )
               ( 'C' 0x0106 ) ( 'E' 0xc9   ) ( 'I' 0xcd   ) ( 'N' 0x0143 )
               ( 'O' 0xd3   ) ( 'U' 0xda   ) ( 'Y' 0xdd   ) ( 'r' 0x0155 )
               ( 'a' 0xe1   ) ( 'l' 0x013a ) ( 'c' 0x0107 ) ( 'e' 0xe9   )
               ( 'i' 0xed   ) ( 'n' 0x0144 ) ( 'o' 0xf3   ) ( 'u' 0xfa   )
               ( 'y' 0xfd   )

Source of the problem is the following definition in sys/sys/kbio.h:

struct acc_t {
 u_char  accchar;
 u_char  map[NUM_ACCENTCHARS][2];
};

While the keymaps were converted to have the unicode characters for vt
in the commit below, the array to store them (map) was missed, or was
there a reason for this?

---
commit 7ba08f814546ece02e0193edc12cf6eb4d5cb8d4
Author: Stefan Eßer 
Date:   Sun Aug 17 19:54:21 2014 +

    Attempt at converting the SYSCONS keymaps to Unicode for use with
    NEWCONS.
    I have spent many hours comparing source and destination formats,
    and hope to have caught the most severe conversion errors.
---

I have tried the following patch and it allows me to enter all accents
documented in the keymap, though I must admit I'm not sure it does not
have hidden issues:

diff --git a/sys/sys/kbio.h b/sys/sys/kbio.h
index 7f17bda76c5..fffeb63e226 100644
--- a/sys/sys/kbio.h
+++ b/sys/sys/kbio.h
@@ -200,7 +200,7 @@ typedef struct okeymap okeymap_t;

  struct acc_t {
 u_char  accchar;
-   u_char  map[NUM_ACCENTCHARS][2];
+   int map[NUM_ACCENTCHARS][2];
  };



I have extended the range of the map array entries to 16 bits,
which is sufficient for all currently defined keymap entries,
see commit 1e0853ee8403.
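The truncation accounts exactly for the kbdcontrol output quoted above: 0x015a stored in a u_char becomes 0x5a ('Z'), 0x0179 becomes 0x79 ('y'), 0x015b becomes 0x5b ('['), while Latin-1 values such as 0xc1 survive. A small userland illustration in C; the struct names and the table size are made up, and uint16_t merely stands in for the 16-bit entries of the fix, without claiming it is the exact type used in the commit.

```c
#include <stdint.h>

#define NPAIRS	4	/* small illustrative table, not NUM_ACCENTCHARS */

/* Pre-fix layout: one byte per map entry, as in the quoted struct acc_t. */
struct acc_narrow {
	unsigned char	accchar;
	unsigned char	map[NPAIRS][2];
};

/* Illustrative layout with 16-bit entries, as described for the fix. */
struct acc_wide {
	unsigned char	accchar;
	uint16_t	map[NPAIRS][2];
};

/* A few "dacu" (dead acute) pairs from cz.kbd, as listed in the keymap. */
static const unsigned dacu_pairs[NPAIRS][2] = {
	{ 'S', 0x015a }, { 'Z', 0x0179 }, { 's', 0x015b }, { 'A', 0xc1 },
};

static void fill_acc(struct acc_narrow *narrow, struct acc_wide *wide)
{
	narrow->accchar = wide->accchar = 0xb4;	/* dead acute */
	for (int i = 0; i < NPAIRS; i++) {
		for (int j = 0; j < 2; j++) {
			/* Conversion to unsigned char keeps only the low 8 bits. */
			narrow->map[i][j] = (unsigned char)dacu_pairs[i][j];
			wide->map[i][j] = (uint16_t)dacu_pairs[i][j];
		}
	}
}
```

Sixteen bits cover the whole Basic Multilingual Plane, which is enough for the composed characters that appear in the current keymaps.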

Thanks for reporting!

Regards, STefan



Re: domain names and internationalization?

2022-09-20 Thread Stefan Esser

On 19.09.22 at 22:27, Rick Macklem wrote:

Hi,

Recently there has been discussion on the NFSv4 IETF working
group email list w.r.t. internationalization for the domain name
it uses for users/groups.


Hi Rick,

I do assume that you know about RFC 3492 (Punycode):

https://datatracker.ietf.org/doc/html/rfc3492


Right now, I am pretty sure the FreeBSD nfsuserd(8) only works
for ascii domain names, but...


You can manually translate domain names into their Punycode
representation. The NFS code could work with them and only
translate them back to UTF-8 (or whatever) for display purposes.

For pure ASCII this is an identity transformation. For names
that actually represent UTF-8 strings, the value sent to
DNS servers (and stored locally in the daemon) could be the
internally stored Punycode representation.
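For illustration, a minimal C sketch of the Punycode encoding procedure from RFC 3492 (encoding direction only, input given as an array of Unicode code points, no overflow checks, so only suitable for short DNS labels). It is not the libidn2 API, and it leaves out what IDNA adds on top: the "xn--" prefix, which is only applied to labels that contain non-ASCII characters (ASCII labels are passed through unchanged, the identity case mentioned above), and the normalization and mapping rules. punycode_encode() and the PC_* names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bootstring parameters that RFC 3492 fixes for Punycode. */
enum {
	PC_BASE = 36, PC_TMIN = 1, PC_TMAX = 26, PC_SKEW = 38,
	PC_DAMP = 700, PC_INITIAL_BIAS = 72, PC_INITIAL_N = 0x80
};

/* Digit values 0..25 map to 'a'..'z', 26..35 to '0'..'9'. */
static char pc_digit(uint32_t d)
{
	return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function (RFC 3492, section 6.1). */
static uint32_t pc_adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
	uint32_t k = 0;

	delta = firsttime ? delta / PC_DAMP : delta / 2;
	delta += delta / numpoints;
	while (delta > ((PC_BASE - PC_TMIN) * PC_TMAX) / 2) {
		delta /= PC_BASE - PC_TMIN;
		k += PC_BASE;
	}
	return k + ((PC_BASE - PC_TMIN + 1) * delta) / (delta + PC_SKEW);
}

/*
 * Encode 'len' code points as Punycode (without the "xn--" prefix).
 * Returns the output length, or -1 if 'outsz' is too small.
 */
static int punycode_encode(const uint32_t *in, size_t len, char *out,
    size_t outsz)
{
	uint32_t n = PC_INITIAL_N, delta = 0, bias = PC_INITIAL_BIAS;
	uint32_t h, b, m, q, k, t;
	size_t i, o = 0;

	if (outsz == 0)
		return -1;
	/* Copy the basic (ASCII) code points first, in order. */
	for (i = 0; i < len; i++) {
		if (in[i] >= 0x80)
			continue;
		if (o + 1 >= outsz)
			return -1;
		out[o++] = (char)in[i];
	}
	h = b = (uint32_t)o;
	if (b > 0) {
		if (o + 1 >= outsz)
			return -1;
		out[o++] = '-';
	}
	while (h < len) {
		/* Smallest code point >= n that still has to be inserted. */
		for (m = UINT32_MAX, i = 0; i < len; i++)
			if (in[i] >= n && in[i] < m)
				m = in[i];
		delta += (m - n) * (h + 1);
		n = m;
		for (i = 0; i < len; i++) {
			if (in[i] < n)
				delta++;
			if (in[i] != n)
				continue;
			/* Emit delta as a variable-length base-36 integer. */
			for (q = delta, k = PC_BASE;; k += PC_BASE) {
				t = k <= bias ? PC_TMIN :
				    k >= bias + PC_TMAX ? PC_TMAX : k - bias;
				if (q < t)
					break;
				if (o + 1 >= outsz)
					return -1;
				out[o++] = pc_digit(t + (q - t) % (PC_BASE - t));
				q = (q - t) / (PC_BASE - t);
			}
			if (o + 1 >= outsz)
				return -1;
			out[o++] = pc_digit(q);
			bias = pc_adapt(delta, h + 1, h == b);
			delta = 0;
			h++;
		}
		delta++;
		n++;
	}
	out[o] = '\0';
	return (int)o;
}
```

For example, "bücher" encodes to "bcher-kva" and "münchen" to "mnchen-3ya"; with the IDNA prefix these become the DNS labels "xn--bcher-kva" and "xn--mnchen-3ya".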


I am hoping someone knows what DNS does in this area (the
working group list uses terms like umlaut, which I have never
even heard of;-).


That's the contraction of "ae", "oe", "ue" that was introduced
into the German writing system long ago, with the "e"
abbreviated to two dots above the vowel, e.g. "ae" --> "ä".
Just a convenience rule to speed up manually copying the bible
in monasteries in medieval times ;-)

But there are many other accented letters in other languages,
that can be used in internationalized domain names, and the
whole set of Unicode characters can be represented using
Punycode.


I know essentially nothing about internationalization, so any hints
will be appreciated.


For a start:

https://en.wikipedia.org/wiki/Internationalized_domain_name
https://en.wikipedia.org/wiki/Punycode

There are C implementations of the transformations, e.g. in the
dns/libidn2 port.

We do not seem to have equivalent library functions in the
FreeBSD base system yet, but probably should provide them.

Best regards, STefan



Re: main-n257625-587649902329-dirty?

2022-08-26 Thread Stefan Esser

On 26.08.22 at 18:55, Nuno Teixeira wrote:

Hello to all,

Today I updated and uname -a shows main-n257625-587649902329-dirty.
Why is showing -dirty?


The -dirty tag is appended to the commit if there are local changes,
i.e., your system has been built using Git commit 587649902329 with
uncommitted changes.
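The version tag can be read as branch name, the number after "-n" (which I take to be a commit count), the abbreviated commit hash, and the optional "-dirty" marker. A small C parser makes the pieces explicit; it assumes exactly the form <branch>-n<count>-<hash>[-dirty] seen in this tag, and struct srcvers and parse_srcvers() are hypothetical names, not part of the FreeBSD build tools.

```c
#include <stdlib.h>
#include <string.h>

/* Fields of a version tag such as "main-n257625-587649902329-dirty". */
struct srcvers {
	char		branch[64];
	unsigned long	count;		/* number after "-n" */
	char		hash[41];	/* abbreviated commit hash */
	int		dirty;		/* built with uncommitted changes */
};

/*
 * Split a tag of the assumed form <branch>-n<count>-<hash>[-dirty].
 * Returns 0 on success or -1 if the string does not match that form.
 */
static int parse_srcvers(const char *s, struct srcvers *v)
{
	const char *n, *h, *end;
	char *ep;
	size_t len;

	if ((n = strstr(s, "-n")) == NULL || n[2] < '0' || n[2] > '9')
		return -1;
	len = (size_t)(n - s);
	if (len == 0 || len >= sizeof(v->branch))
		return -1;
	memcpy(v->branch, s, len);
	v->branch[len] = '\0';
	v->count = strtoul(n + 2, &ep, 10);
	if (*ep != '-')
		return -1;
	h = ep + 1;
	end = strchr(h, '-');
	if (end != NULL && strcmp(end, "-dirty") != 0)
		return -1;
	len = end != NULL ? (size_t)(end - h) : strlen(h);
	if (len == 0 || len >= sizeof(v->hash))
		return -1;
	memcpy(v->hash, h, len);
	v->hash[len] = '\0';
	v->dirty = end != NULL;
	return 0;
}
```

A clean build of the same commit would yield the same fields with dirty == 0.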

Regards, STefan



Re: Updating EFI boot loader results in boot hangup

2022-08-14 Thread Stefan Esser

On 14.08.22 at 04:20, Oleg Lelchuk wrote:

Yes, Yasuhiro and I have the same error.


Just a "me too", also on ZFS, on a Ryzen 3 based system.

Booting the latest USB snapshot image worked, but not after I copied
the whole of /boot from that USB stick to my ZFS boot partition.

The system is usable if I boot from USB and manually mount the ZFS
file systems over the USB boot image.

Failed boot log:

https://people.freebsd.org/~se/ZFS-Boot-Failure.jpg

But I doubt that it adds any new information ...



Re: Accessibility in the FreeBSD installer and console

2022-07-08 Thread Stefan Esser

On 08.07.22 at 12:53, Hans Petter Selasky wrote:

Hi,

Here is the complete patch for Voice-Over in the FreeBSD console:

https://reviews.freebsd.org/D35754

You need to install espeak from pkg and then install the 
/etc/devd/accessibility.conf file and then run sysctl 
kern.vt.accessibility.enable=1 after booting the new kernel.


It is freaking awesome!

There might be some bugs, but it worked fine for me!


The espeak port was scheduled for deletion on 2022-06-30 (but has
not been deleted yet):

DEPRECATED= Last release in 2014 and deprecated upstream
EXPIRATION_DATE=2022-06-30

There is espeak-ng, which took over the sources, and I have
prepared a port update.

I had asked a member of the portmgr team whether it would be
preferred to update the espeak port to use sources from
the espeak-ng repository (the version numbers continue from
the last espeak release), or whether I should create a new
port for espeak-ng.

But I have not received any response to that question ...

The current status of the espeak-ng port is that it generates
WAV output, but it no longer works with the FreeBSD sound
system.

The 2 sound options available in the espeak port seem to no
longer be supported.

I did not have time to look into this issue, but I do assume
that the sound output from the old espeak code could easily
be restored in espeak-ng.

Regards, STefan

PS: A compressed tar of the WIP espeak-ng port is attached,
in case you are interested. I expect it to be stripped
off by the mailing list ...




Re: How to suppress prompt on bc 5.3.1

2022-06-16 Thread Stefan Esser
On 16.06.22 at 03:22, Michael Butler wrote:
> On 6/15/22 18:47, Masachika ISHIZUKA wrote:
>>>>>    I updated to master-n256084-5dd1f6f1441 (1400061) and this
>>>>> leads to bc 5.3.1.
>>>>>    Previously, 'BC_ENV_ARGS=-P' or 'bc -P' were working, but they
>>>>> don't work on 5.3.1.  Is there any way to suppress the prompt?
>>>>
>>>> This is fixed in 5.3.2:
>>>
>>> This version is already available as a port, but it cannot be
>>> built in the base system at the default WARNS level.
>>>
>>> I had suggested a different patch that was tested in base and
>>> have re-submitted that patch after noticing the issue with the
>>> current code.
>>
>>    Thank you for commit.
>>
>>    'bc' 5.3.3 works fine on master-n256152-ce00b11940a.
> 
> There are still some buildworld leftovers in /tmp from this series of
> updates ..
> 
> imb@toshi:/home/imb> ll /tmp/*tmp
> -rw-r--r--  1 root  wheel  6768 Jun 15 12:18 /tmp/bc_help.txt.XBwzl9An2r.tmp
> -rw-r--r--  1 root  wheel  6768 Jun 15 11:26 /tmp/bc_help.txt.zLR884lFpG.tmp
> -rw-r--r--  1 root  wheel  5797 Jun 15 12:18 /tmp/dc_help.txt.RGfFwWi2Yh.tmp
> -rw-r--r--  1 root  wheel  5797 Jun 15 11:26 /tmp/dc_help.txt.oltK8Dc7mR.tmp
> -rw-r--r--  1 root  wheel  3416 Jun 15 12:18 /tmp/lib.bc.NuspIZHi5s.tmp
> -rw-r--r--  1 root  wheel  3416 Jun 15 11:26 /tmp/lib.bc.wwZJr98NlN.tmp
> -rw-r--r--  1 root  wheel  9069 Jun 15 11:26 /tmp/lib2.bc.5ywBAhZgwg.tmp
> -rw-r--r--  1 root  wheel  9069 Jun 15 12:18 /tmp/lib2.bc.D4PYZlxfWk.tmp

These leftover files appear to be due to a commented-out "rm" at the
end of scripts/functions.sh, probably due to debugging of recent changes
in gen/genstr.sh:

# Remove multiple blank lines.
uniq "$_filter_text_temp" "$_filter_text_out"

# Remove the temp file.
#rm -rf "$_filter_text_temp"

# Reset IFS.
IFS="$_filter_text_ifs"
}

You can fix it locally by removing the "#" in front of that rm command.

I'll contact the author of this software to get this fixed in his repository
and will then import that change. (The author does not want downstream
versions to diverge from his repository, and I respect this wish.)

Thank you for reporting this issue!

Best regards, STefan




Re: How to suppress prompt on bc 5.3.1

2022-06-14 Thread Stefan Esser
On 14.06.22 at 21:22, Herbert J. Skuhra wrote:
> On Tue, 14 Jun 2022 03:01:41 +0200, Masachika ISHIZUKA wrote:
>>
>>   I updated to master-n256084-5dd1f6f1441 (1400061) and this
>> leads to bc 5.3.1.
>>   Previously, 'BC_ENV_ARGS=-P' or 'bc -P' were working, but they
>> don't work on 5.3.1.  Is there any way to suppress the prompt?
> 
> This is fixed in 5.3.2:

This version is already available as a port, but it cannot be
built in the base system at the default WARNS level.

I had suggested a different patch that was tested in base and
have re-submitted that patch after noticing the issue with the
current code.

Regards, STefan




Re: "pkg upgrade" failing with "Fail to create temporary file: ... Not a directory"

2022-04-28 Thread Stefan Esser
On 28.04.22 at 09:11, Baptiste Daroussin wrote:
> It is 2 things, it is a port problem of maintainers who do not check for
> upgradability of their packages, and it can also be seen as something pkg can
> deal with, but a complicated case, so I don't know yet how.
> 
> The main issue is a file in vX which becomes a directory in vX+1 which goes in
> the way pkg does extract files to be as atomic as possible.

This case could be caught and dealt with by removing the file or by moving
it out of the way (to a temporary name to allow it to be recovered if the
subsequent steps fail or to be deleted if they succeed).

Further special conditions may apply - but since there is no way a file
and directory can exist under the same name (on FreeBSD, at least), it is
safe to assume that the file will not be kept when the package is installed.
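A sketch of the "move it out of the way" idea in C, using only POSIX calls. replace_file_with_dir() and the ".pkgold" backup suffix are hypothetical; this is not how pkg currently extracts files, and a real implementation would keep the backup until the whole package has been installed successfully.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure 'path' is a directory, even if an older
 * package version installed a regular file (or symlink) under that name.
 * The old entry is renamed to a backup name first, so that it can be put
 * back if creating the directory fails.
 */
static int replace_file_with_dir(const char *path)
{
	char backup[1024];
	struct stat sb;
	int saved;

	if (lstat(path, &sb) == -1)
		return errno == ENOENT ? mkdir(path, 0755) : -1;
	if (S_ISDIR(sb.st_mode))
		return 0;		/* already a directory */
	if (snprintf(backup, sizeof(backup), "%s.pkgold", path) >=
	    (int)sizeof(backup))
		return -1;		/* name too long */
	if (rename(path, backup) == -1)
		return -1;
	if (mkdir(path, 0755) == -1) {
		saved = errno;
		(void)rename(backup, path);	/* roll back */
		errno = saved;
		return -1;
	}
	/*
	 * Simplification: the backup is removed right away here; pkg would
	 * defer this until all later installation steps have succeeded.
	 */
	return unlink(backup);
}
```

Since a file and a directory cannot share a name, the rename is the only step that touches the old file, which keeps recovery to a single rename() in the failure path.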

Regards, STefan




Re: pciconf -lbvV crashes kernel main-8d72c409c

2022-02-06 Thread Stefan Esser
On 06.02.22 at 01:19, Michael Jung wrote:
> Dump header from device: /dev/ada0p2
> Architecture: amd64
> Architecture Version: 2
> Dump Length: 900231168
> Blocksize: 512
> Compression: none
> Dumptime: 2022-02-04 15:48:08 -0500
> Hostname: draid.mikej.com
> Magic: FreeBSD Kernel Dump
> Version String: FreeBSD 14.0-CURRENT #1 main-8d72c409c: Thu Feb 3 18:14:01 EST 2022
> mikej@draid:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> Panic String: length mismatch
> Dump Parity: 1692982593
> Bounds: 2
> Dump Status: good

This is caused by the following code fragments:

	/*
	 * Calculate the amount of space needed in the data buffer.  An
	 * identifier element is always present followed by the read-only
	 * and read-write keywords.
	 */
	len = sizeof(struct pci_vpd_element) + strlen(vpd->vpd_ident);
	for (i = 0; i < vpd->vpd_rocnt; i++)
		len += sizeof(struct pci_vpd_element) + vpd->vpd_ros[i].len;
	for (i = 0; i < vpd->vpd_wcnt; i++)
		len += sizeof(struct pci_vpd_element) + vpd->vpd_w[i].len;
	[...]
	vpd_user = lvio->plvi_data;
	[...]
	vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
	vpd_element.pve_flags = 0;
	for (i = 0; i < vpd->vpd_rocnt; i++) {
		vpd_element.pve_keyword[0] = vpd->vpd_ros[i].keyword[0];
		vpd_element.pve_keyword[1] = vpd->vpd_ros[i].keyword[1];
		vpd_element.pve_datalen = vpd->vpd_ros[i].len;
		error = copyout(&vpd_element, vpd_user, sizeof(vpd_element));
		if (error)
			return (error);
		error = copyout(vpd->vpd_ros[i].value, vpd_user->pve_data,
		    vpd->vpd_ros[i].len);
		if (error)
			return (error);
		vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
	}
	vpd_element.pve_flags = PVE_FLAG_RW;
	for (i = 0; i < vpd->vpd_wcnt; i++) {
		vpd_element.pve_keyword[0] = vpd->vpd_w[i].keyword[0];
		vpd_element.pve_keyword[1] = vpd->vpd_w[i].keyword[1];
		vpd_element.pve_datalen = vpd->vpd_w[i].len;
		error = copyout(&vpd_element, vpd_user, sizeof(vpd_element));
		if (error)
			return (error);
		error = copyout(vpd->vpd_w[i].value, vpd_user->pve_data,
		    vpd->vpd_w[i].len);
		if (error)
			return (error);
		vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
	}
	KASSERT((char *)vpd_user - (char *)lvio->plvi_data == len,
	    ("length mismatch"));

The KASSERT triggered, indicating that the amount of data copied out
differs from the amount that had been calculated beforehand.

It would be interesting to compare the pre-computed "len" and the actual
amount of data (i.e. the operands of == in the KASSERT).

The definition of PVE_NEXT_LEN looks correct, but in order to completely
understand what the issue is, a dump of the VPD range should be analyzed
(or you could add trace output to both the calculation of "len" and to
the fetching of the VPD data that advances vpd_user).
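The invariant behind the KASSERT can be modelled in userland: one pass sums header plus payload sizes, a second pass advances a pointer by header plus the length stored in the element, and the check reports both numbers instead of just "length mismatch". The element layout below, including its 8-bit length field, is a hypothetical stand-in chosen only to show how the two passes can disagree; it is not a claim about the actual cause of this panic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical element header for this model: keyword, flags, length. */
struct elem {
	char	keyword[2];
	uint8_t	flags;
	uint8_t	datalen;
};

/* Pass 1: precompute the buffer size, the way "len" is computed. */
static size_t calc_len(const size_t *lens, int n)
{
	size_t len = 0;

	for (int i = 0; i < n; i++)
		len += sizeof(struct elem) + lens[i];
	return len;
}

/* Pass 2: advance by header size plus the length stored in the header. */
static size_t walk_len(const size_t *lens, int n)
{
	struct elem e = { { 'V', '0' }, 0, 0 };
	size_t off = 0;

	for (int i = 0; i < n; i++) {
		e.datalen = (uint8_t)lens[i];	/* silently truncates > 255 */
		off += sizeof(struct elem) + e.datalen;
	}
	return off;
}

/* Report both operands of the comparison, not just the mismatch. */
static int check_lengths(const size_t *lens, int n)
{
	size_t expect = calc_len(lens, n), actual = walk_len(lens, n);

	if (expect != actual) {
		fprintf(stderr, "length mismatch: expected %zu, got %zu\n",
		    expect, actual);
		return -1;
	}
	return 0;
}
```

In the kernel the same idea amounts to passing both values to the KASSERT message, e.g. ("length mismatch: %zu != %td", len, (char *)vpd_user - (char *)lvio->plvi_data).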

Regards, STefan

PS: You may want to build a kernel with the attached patch, which prints
the calculated lengths after each element that is added to "len".
The KASSERT will only trigger if the actual length exceeds the expected
value, and the printf() output should go to the console device.
My system does not seem to have a single device that provides VPD,
therefore the patch has only been compile tested ...

diff --git a/sys/dev/pci/pci_user.c b/sys/dev/pci/pci_user.c
index a5f849e85c2d..c771db0b5070 100644
--- a/sys/dev/pci/pci_user.c
+++ b/sys/dev/pci/pci_user.c
@@ -565,6 +565,7 @@ pci_list_vpd(device_t dev, struct pci_list_vpd_io *lvio)
size_t len;
int error, i;
 
+   printf("%p / %p\n", lvio->plvi_data, PVE_NEXT_LEN(lvio->plvi_data, 1));
vpd = pci_fetch_vpd_list(dev);
if (vpd->vpd_reg == 0 || vpd->vpd_ident == NULL)
return (ENXIO);
@@ -575,10 +576,15 @@ pci_list_vpd(device_t dev, struct pci_list_vpd_io *lvio)
 * and read-write keywords.
 */
len = sizeof(struct pci_vpd_element) + strlen(vpd->vpd_ident);
-   for (i = 0; i < vpd->vpd_rocnt; i++)
+   printf("LEN(%d): %zu\n", -1, len);
+   for (i = 0; i < vpd->vpd_rocnt; i++) {
len += sizeof(struct pci_vpd_element) + vpd->vpd_ros[i].len;
-   for (i = 0; i < vpd->vpd_wcnt; i++)
+   printf("LEN(%d): %zu\n", i, len);
+   }
+   for (i = 0; i < vpd->vpd_wcnt; i++) {
len += sizeof(struct pci_vpd_element) + vpd->vpd_w[i].len;
+   printf("LEN(%d): %zu\n", i, len);
+   }
 
if (lvio->plvi_len == 0) {
lvio->plvi_len = len;
@@ -606,6 +612,7 @@ pci_list_vpd(device_t dev, struct pci_lis

[REVIEW] Fix of sysctlbyname() accesses to user sub-tree variables

2022-02-05 Thread Stefan Esser
I have created https://reviews.freebsd.org/D34171 for a patch
that restores the lost support for accesses to the user sub-tree
in sysctlbyname().

E.g. sysctlbyname("user.cs_path", ...) returns 0 to indicate no
error, but only an empty string, since the actual result string
is to be provided by the user-land code in the C library.

This functionality exists in sysctl(), which used to be called
by sysctlbyname(), but after an optimization that reduces the
number of system calls required, sysctl() is no longer called
and thus the empty result obtained from the kernel is returned.
(The system call is only used to check access rights, and a
non-zero return value would be returned to the caller, but the
actual value of the result string is not known to the kernel.)

One user land application affected by this issue is "whereis"
(just fixed in -CURRENT, MFC to -STABLE planned). But more
out-of-tree users of sysctlbyname() may exist that try to access
user sub-tree variables, and thus this function should be fixed
to return the same results as sysctl() in all cases, as it did
before the optimization was implemented.

The code in the review special cases accesses to "user.*" and
uses sysctl() to fill in the actual value, but keeps the faster
direct system call for the variables actually maintained in the
kernel. It is simplified relative to the "old" implementation to
account for the implicit assumption that user.* names may only
have 2 elements in the OID array. (Codified in sysctl() and
would cause error returns if that assumption was violated.)
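A userland model in C of the dispatch described above, for readers who want to see the control flow: names under "user." are answered from a table kept in the C library, everything else goes to a stand-in for the direct system call. All names here are hypothetical, the cs_path value is only illustrative, errors are returned as errno values instead of -1 plus errno, and the access-rights system call that the real change still performs is omitted.

```c
#include <errno.h>
#include <string.h>

/* A variable whose value is provided by the C library, not the kernel. */
struct user_var {
	const char *name;
	const char *value;
};

/* Illustrative table; the value shown is not taken from the review. */
static const struct user_var user_vars[] = {
	{ "user.cs_path", "/usr/bin:/bin:/usr/sbin:/sbin" },
};

/* Stand-in for the direct system call used for kernel-maintained names. */
static int kernel_lookup(const char *name, char *buf, size_t *len)
{
	static const char ostype[] = "FreeBSD";

	if (strcmp(name, "kern.ostype") != 0)
		return ENOENT;
	if (*len < sizeof(ostype))
		return ENOMEM;
	memcpy(buf, ostype, sizeof(ostype));
	*len = sizeof(ostype);
	return 0;
}

static int model_sysctlbyname(const char *name, char *buf, size_t *len)
{
	size_t i, need;

	/* Fast path: anything outside user.* is answered by the kernel. */
	if (strncmp(name, "user.", 5) != 0)
		return kernel_lookup(name, buf, len);
	/* user.* names have exactly two components, e.g. "user.cs_path". */
	if (strchr(name + 5, '.') != NULL)
		return ENOENT;
	for (i = 0; i < sizeof(user_vars) / sizeof(user_vars[0]); i++) {
		if (strcmp(name, user_vars[i].name) != 0)
			continue;
		need = strlen(user_vars[i].value) + 1;
		if (*len < need)
			return ENOMEM;
		memcpy(buf, user_vars[i].value, need);
		*len = need;
		return 0;
	}
	return ENOENT;
}
```

Only the rare user.* lookups take the slower path, so the optimization keeps its benefit for kernel-maintained variables.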

I'd appreciate a review and an approval of the change.

Regards, STefan



Re: Kernel changes causing AMDGPU / DRM to fail? i2c related?

2022-01-30 Thread Stefan Esser
On 30.01.22 at 19:23, Vladimir Kondratyev wrote:
> On 30.01.2022 00:25, Stefan Esser wrote:
>> After rebooting with freshly built world, kernel and the amdgpu driver
>> my console stopped working. It goes blank and the display goes into a
>> power save mode, as soon as the amdgpu driver is loaded.
>>
>> The GPU (a Radeon R7 250E) is correctly detected as before, but there
>> is an error message "drmn0: [drm] Cannot find any crtc or sizes".
>>
[...]
>> [drm] AMDGPU Display Connectors
>> [drm] Connector 0:
>> [drm]   DP-1
>> [drm]   HPD4
>> [drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
>> [drm]   Encoders:
>> [drm] DFP1: INTERNAL_UNIPHY2
>> [drm] Connector 1:
>> [drm]   HDMI-A-1
>> [drm]   HPD1
>> [drm]   DDC: 0x195c 0x195c 0x195d 0x195d 0x195e 0x195e 0x195f 0x195f
>> [drm]   Encoders:
>> [drm] DFP2: INTERNAL_UNIPHY2
>> [drm] Connector 2:
>> [drm]   DVI-I-1
>> [drm]   HPD2
>> [drm]   DDC: 0x1958 0x1958 0x1959 0x1959 0x195a 0x195a 0x195b 0x195b
>> [drm]   Encoders:
>> [drm] DFP3: INTERNAL_UNIPHY
>> [drm] CRT1: INTERNAL_KLDSCP_DAC1
>> drmn0: [drm] Cannot find any crtc or sizes
>> drmn0: [drm] Cannot find any crtc or sizes
>> drmn0: [drm] Cannot find any crtc or sizes
>> [drm] Initialized amdgpu 3.37.0 20150101 for drmn0 on minor 0
>>
>> A successful driver attach from a reboot a few days ago had ended in:
>>
>> [drm] CRT1: INTERNAL_KLDSCP_DAC1
>> [drm] fb mappable at 0xE0503000
>> [drm] vram apper at 0xE000
>> [drm] size 33177600
>> [drm] fb depth is 24
>> [drm]    pitch is 15360
>> [drm] Initialized amdgpu 3.36.0 20150101 for drmn0 on minor 0
>>
>> Regards, STefan
> 
> drm-kmod commit 534aa199c10d forced it to use i2c from base.

Hi Vladimir,

thank you for the information! I'm using drm-devel-kmod, and in fact found
that 5.5.19.g20211230 works, while 5.7.19.g20220126 (committed as 0c38674b389ad
on 2022-01-26) causes the failure.

> You may try to checkout previous revision (444dc58f0247) to find out if
> in-base i2c is guilty or not.

Assuming that the same change to use the system i2c code has been in the latest
commit to the drm-devel-kmod port, this should be proven, now. ;-)

This is the list of in-kernel i2c modules on my system (a Ryzen 9 5950 on an
ASUS mainboard with B550 chip-set):

$ kldstat -v | grep iic
 68 iicsmb/smbus
 67 iicbus/iicsmb
 66 iichb/iicbus
 65 iicbb/iicbus
 64 iicbus/iic
 63 iicbus/ic
213 lkpi_iicbb/iicbb
212 lkpi_iic/lkpi_iicbb
211 lkpi_iic/iicbus
210 drmn/lkpi_iic
 56 iichid/hidbus

Can I help debug this issue?

I could re-install the latest version and boot with hw.dri.drm_debug or
dev.drm.drm_debug set?

Or are there other settings to get a debug log from the i2c side?

Regards, STefan

PS: I'm keeping the CC to current@, since this might be an issue in the i2c
kernel code ...




Latest drm-devel-kmod port not working? (was: Re: Kernel changes causing AMDGPU / DRM to fail? i2c related?)

2022-01-29 Thread Stefan Esser
On 29.01.22 at 23:25, Tomoaki AOKI wrote:
> On Sat, 29 Jan 2022 22:25:17 +0100
> Stefan Esser  wrote:
> 
>> After rebooting with freshly built world, kernel and the amdgpu driver
>> my console stopped working. It goes blank and the display goes into a
>> power save mode, as soon as the amdgpu driver is loaded.
>>
>> The GPU (a Radeon R7 250E) is correctly detected as before, but there
>> is an error message "drmn0: [drm] Cannot find any crtc or sizes".
[...]
> Are you sure your ports tree is up-to-date and graphics/drm-*-kmod
> you installed (IIRC, should be needed for -intel and -amdgpu drivers)
> is also updated? drm-*-kmod could be affected by LinuxKPI updates in
> base.

Yes, I rebuild the system from sources at least once a day, but only
reboot every few days, especially if there has been any change
that might cause incompatibilities between kernel and user land.

And I always rebuild all KLDs together with the kernel, including
all the driver module ports relevant for X11.

>  *There can be some (sometimes very wide) timeframe between LinuxKPI
>   update and corresponding linux-*-kmod catches up with it.

X11 was working just fine on a system built and rebooted a few days
ago.

> Looking into cgit.freebsd.org, at least drm-current-kmod is updated 36
> hours ago. Not sure it's related or not, though.

I had missed that update - and YES you are right. I'm using the devel
version and after reverting the update to drm_v5.5.19_7 the driver
does attach again.

Thanks for the hint!

>  *I always prefer nvidia dGPU because of these dangerous time spans.
>   So I've been forced to choose ThinkPad P series (without "s"), which
>   usually can disable the CPU-integrated Intel GPU and run the nvidia GPU
>   alone through a BIOS setting.

I had issues with Nvidia cards (and the closed source driver modules
and libraries) before and for that reason bought a passively cooled
Radeon card for this development workstation.

Best regards, STefan




Kernel changes causing AMDGPU / DRM to fail? i2c related?

2022-01-29 Thread Stefan Esser
After rebooting with freshly built world, kernel and the amdgpu driver
my console stopped working. It goes blank and the display goes into a
power save mode, as soon as the amdgpu driver is loaded.

The GPU (a Radeon R7 250E) is correctly detected as before, but there
is an error message "drmn0: [drm] Cannot find any crtc or sizes".

I'm asking here and not on the ports list, since the AMDGPU driver has
not been updated for half a year. But to be sure that there is no mismatch
between kernel and user land, I have rebuilt all X11 server and library
ports.

There have been changes affecting the i2c driver, IIRC, and the error
message seems to point at an issue obtaining information from the LCD
display.

The output of "grep drm /var/run/dmesg.boot" follows:

[drm] amdgpu kernel modesetting enabled.
drmn0:  on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
[drm] initializing kernel modesetting (VERDE 0x1002:0x683F 0x174B:0xA001 0x00).
[drm] register mmio base: 0xFCE0
[drm] register mmio size: 262144
[drm] add ip block number 0 
[drm] add ip block number 1 
[drm] add ip block number 2 
[drm] add ip block number 3 
[drm] add ip block number 4 
[drm] add ip block number 5 
[drm] add ip block number 6 
[drm] BIOS signature incorrect 0 0
[drm] vm size is 512 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
drmn0: successfully loaded firmware image 'amdgpu/verde_mc.bin'
drmn0: VRAM: 1024M 0x00F4 - 0x00F43FFF (1024M used)
drmn0: GART: 1024M 0x00FF - 0x00FF3FFF
[drm] Detected VRAM RAM=1024M, BAR=256M
[drm] RAM width 128bits GDDR5
[drm] amdgpu: 1024M of VRAM memory ready
[drm] amdgpu: 3072M of GTT memory ready.
[drm] GART: num cpu pages 262144, num gpu pages 262144
drmn0: PCIE GART of 1024M enabled (table at 0x00F40050).
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
drmn0: successfully loaded firmware image 'amdgpu/verde_pfp.bin'
drmn0: successfully loaded firmware image 'amdgpu/verde_me.bin'
drmn0: successfully loaded firmware image 'amdgpu/verde_ce.bin'
drmn0: successfully loaded firmware image 'amdgpu/verde_rlc.bin'
drmn0: successfully loaded firmware image 'amdgpu/verde_smc.bin'
[drm] Internal thermal controller without fan control
[drm] amdgpu: dpm initialized
[drm] Connector DP-1: get mode from tunables:
[drm]   - kern.vt.fb.modes.DP-1
[drm]   - kern.vt.fb.default_mode
[drm] Connector HDMI-A-1: get mode from tunables:
[drm]   - kern.vt.fb.modes.HDMI-A-1
[drm]   - kern.vt.fb.default_mode
[drm] Connector DVI-I-1: get mode from tunables:
[drm]   - kern.vt.fb.modes.DVI-I-1
[drm]   - kern.vt.fb.default_mode
[drm] AMDGPU Display Connectors
[drm] Connector 0:
[drm]   DP-1
[drm]   HPD4
[drm]   DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[drm]   Encoders:
[drm] DFP1: INTERNAL_UNIPHY2
[drm] Connector 1:
[drm]   HDMI-A-1
[drm]   HPD1
[drm]   DDC: 0x195c 0x195c 0x195d 0x195d 0x195e 0x195e 0x195f 0x195f
[drm]   Encoders:
[drm] DFP2: INTERNAL_UNIPHY2
[drm] Connector 2:
[drm]   DVI-I-1
[drm]   HPD2
[drm]   DDC: 0x1958 0x1958 0x1959 0x1959 0x195a 0x195a 0x195b 0x195b
[drm]   Encoders:
[drm] DFP3: INTERNAL_UNIPHY
[drm] CRT1: INTERNAL_KLDSCP_DAC1
drmn0: [drm] Cannot find any crtc or sizes
drmn0: [drm] Cannot find any crtc or sizes
drmn0: [drm] Cannot find any crtc or sizes
[drm] Initialized amdgpu 3.37.0 20150101 for drmn0 on minor 0

A successful driver attach after a reboot a few days ago had ended with:

[drm] CRT1: INTERNAL_KLDSCP_DAC1
[drm] fb mappable at 0xE0503000
[drm] vram apper at 0xE000
[drm] size 33177600
[drm] fb depth is 24
[drm]pitch is 15360
[drm] Initialized amdgpu 3.36.0 20150101 for drmn0 on minor 0

Regards, STefan


Re: UBSAN report for main [so: 14] /usr/bin/whatis: non-zero (48) and zero offsets from null pointer in qsort.c

2022-01-12 Thread Stefan Esser
On 12.01.22 at 08:50, Jan Kokemüller wrote:
> On 11.01.22 22:08, Stefan Esser wrote:
>> diff --git a/lib/libc/stdlib/qsort.c b/lib/libc/stdlib/qsort.c
>> index 5016fff7895f..51c41e802330 100644
>> --- a/lib/libc/stdlib/qsort.c
>> +++ b/lib/libc/stdlib/qsort.c
>> @@ -108,6 +108,8 @@ local_qsort(void *a, size_t n, size_t es, cmp_t *cmp, void *thunk)
>>  int cmp_result;
>>  int swap_cnt;
>>
>> +if (__predict_false(a == NULL))
>> +return;
>>  loop:
>>  swap_cnt = 0;
>>  if (n < 7) {
>>
>> This would also prevent the NULL pointer arithmetic for
>> ports that might pass a == NULL and n == 0 in certain cases.
> 
> The UB happens in this line, when "a == NULL" and "n == 0", right?
> 
> for (pm = (char *)a + es; pm < (char *)a + n * es; pm += es)
> 
> This is arithmetic on a pointer (the NULL pointer) which is not part of an
> array, which is UB.

Yes.

> Then, wouldn't "if (__predict_false(n == 0))" be more appropriate than
> checking for "a == NULL" here? Testing for "a == NULL" might suppress UBSAN warnings of
> valid bugs, i.e. when "qsort" is called with "a == NULL" and "n != 0". In that
> case UBSAN _should_ trigger.

Yes, but not only UBSAN would trigger, the program would probably
crash due to an attempt to access an unmapped page.

> UBSAN should not trigger when n == 0, though. At least, when "a" does point to
> a valid array. But what about the case of "a == NULL && n == 0"? Is that
> deemed UB? It looks like at least FreeBSD's "qsort_s" implementation says it's legal.

This might be legal, but in the current implementation it leads to
adding the element size to a NULL pointer. The addition happens in
the initialization part of the for loop, before the loop condition
is evaluated; for n == 0 no iteration is performed, since
a + es < a + n * es is false for es > 0.

There is no functional difference if the case of a == NULL and
n == 0 is silently ignored.

But you are correct: returning early whenever a == NULL would also
cover a == NULL with n != 0, and thus prevent the program abort that
such a call ought to cause.

> a != NULL (pointing to valid array), n != 0  ->  "normal" case, no UB
> a != NULL (pointing to valid array), n == 0  ->  should not trigger UB, and
>                                                  doesn't in the current
>                                                  implementation

It does trigger UB in a way that does not cause issues (or else
the problem would have been detected before). a == NULL makes the
calculation of pm = (char *)a + es undefined, but the value of pm
will never be used if n == 0.

> a == NULL, n == 0                            ->  should not trigger UB?
>                                                  (debatable)
> 
> So if "a == NULL && n == 0" was deemed legal, then there would be no bug in
> "mansearch.c", right?

IMHO it is not a question of whether this is "legal" or not; we should
prevent the undefined behavior that results from execution reaching the
initialization part of the for loop.

And you are correct, the patch should probably be:

diff --git a/lib/libc/stdlib/qsort.c b/lib/libc/stdlib/qsort.c
index 5016fff7895f..eef51d2dd3b3 100644
--- a/lib/libc/stdlib/qsort.c
+++ b/lib/libc/stdlib/qsort.c
@@ -108,6 +108,8 @@
 	int cmp_result;
 	int swap_cnt;
 
+	if (__predict_false(a == NULL && n == 0))
+		return;
 loop:
 	swap_cnt = 0;
 	if (n < 7) {

With this patch, a call with a == NULL and n != 0 will still be detected
by UBSAN, and with typical element sizes and cmp functions it will also
make the program fail.
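For illustration, a minimal sketch of how the guard from the patch above behaves, modelled loosely on the n < 7 insertion-sort path of local_qsort(). It is not the libc code; isort(), swap_bytes() and cmp_int() are hypothetical names. An empty input with a == NULL returns before any pointer arithmetic on a, while a == NULL with n != 0 is still treated as a caller bug.

```c
#include <assert.h>
#include <stddef.h>

typedef int cmp_t(const void *, const void *);

/* Exchange two elements of es bytes each. */
static void
swap_bytes(char *x, char *y, size_t es)
{
	while (es-- > 0) {
		char t = *x;

		*x++ = *y;
		*y++ = t;
	}
}

/*
 * Hypothetical insertion sort in the style of the n < 7 path of
 * local_qsort(); a sketch, not the libc implementation.
 */
void
isort(void *a, size_t n, size_t es, cmp_t *cmp)
{
	char *pm, *pl;

	/* Empty input without an array: return before computing a + es. */
	if (a == NULL && n == 0)
		return;
	/* a == NULL with n != 0 is a caller bug and must not be hidden. */
	assert(a != NULL);

	/* For n == 0 the loop body is never entered, so cmp is never called. */
	for (pm = (char *)a + es; pm < (char *)a + n * es; pm += es)
		for (pl = pm; pl > (char *)a && cmp(pl - es, pl) > 0; pl -= es)
			swap_bytes(pl - es, pl, es);
}

/* Example comparator for int elements. */
int
cmp_int(const void *x, const void *y)
{
	int a = *(const int *)x, b = *(const int *)y;

	return ((a > b) - (a < b));
}
```

A call such as isort(NULL, 0, sizeof(int), NULL) therefore returns without touching either pointer, which is also the n == 0 and cmp == NULL case Mark asked about.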

Regards, STefan

PS: I just saw Mark's reply regarding n == 0 and cmp == NULL. That
case is already covered: with n == 0, cmp is never dereferenced, since
it is only called in the loop body, which is not entered for n == 0.


Re: UBSAN report for main [so: 14] /usr/bin/whatis: non-zero (48) and zero offsets from null pointer in qsort.c

2022-01-11 Thread Stefan Esser
On 11.01.22 at 21:08, Mark Millard wrote:
> On 2022-Jan-11, at 05:19, Stefan Esser  wrote:
[...]
>> The undefined behavior is caused by insufficient checking of parameters
>> in mansearch.c.
>>
>> As part of the initializations performed at the start of mansearch(),
>> the variables cur and *res are initialized to 0 resp. NULL:
>>
>>  cur = maxres = 0;   
>>  if (res != NULL)
>>  *res = NULL;
>>
>> If no match is found, these values are unchanged at line 223, where res
>> is checked to be non-NULL, but then *res is passed to qsort() and that
>> is still NULL.
>>
>> Suggested fix (also attached to avoid white-space issues):
>>
>> --- usr.bin/mandoc/mansearch.c
>> +++ usr.bin/mandoc/mansearch.c
>> @@ -220,7 +220,7 @@
>>  if (cur && search->firstmatch)
>>  break;
>>  }
>> -if (res != NULL)
>> +if (res != NULL && *res != NULL)
>>  qsort(*res, cur, sizeof(struct manpage), manpage_compare);
>>  if (chdir_status && getcwd_status && chdir(buf) == -1)
>>  warn("%s", buf);
>>
>> (File name as in OpenBSD, it is contrib/mandoc/mansearch.c in FreeBSD.)
> 
> Cool. Thanks.
> 
> (But I'm not a committer so someone else
> will have to deal with doing an update to
> the file in git --and likely MFC'ing it.)
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> 

I have submitted a bug report to our upstream (OpenBSD), but the
issue could also be fixed (or rather undefined behavior prevented)
by a simple patch that makes qsort() detect the NULL pointer:

diff --git a/lib/libc/stdlib/qsort.c b/lib/libc/stdlib/qsort.c
index 5016fff7895f..51c41e802330 100644
--- a/lib/libc/stdlib/qsort.c
+++ b/lib/libc/stdlib/qsort.c
@@ -108,6 +108,8 @@ local_qsort(void *a, size_t n, size_t es, cmp_t *cmp, void *thunk)
 	int cmp_result;
 	int swap_cnt;
 
+	if (__predict_false(a == NULL))
+		return;
 loop:
 	swap_cnt = 0;
 	if (n < 7) {

This would also prevent the NULL pointer arithmetic for ports that
might pass a == NULL and n == 0 in certain cases.

I'll apply this patch tomorrow, if there are no objections.

Regards, STefan


Re: UBSAN report for main [so: 14] /usr/bin/whatis: non-zero (48) and zero offsets from null pointer in qsort.c

2022-01-11 Thread Stefan Esser
On 11.01.22 at 08:40, Mark Millard wrote:
> # whatis dog
> /usr/main-src/lib/libc/stdlib/qsort.c:114:23: runtime error: applying non-zero offset 48 to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/lib/libc/stdlib/qsort.c:114:23 in
> /usr/main-src/lib/libc/stdlib/qsort.c:114:44: runtime error: applying zero offset to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/lib/libc/stdlib/qsort.c:114:44 in
> whatis: nothing appropriate
> 
> This seems to be only for the not-found case.
> 
> ===
> Mark Millard
> marklmi at yahoo.com

The undefined behavior is caused by insufficient checking of parameters
in mansearch.c.

As part of the initializations performed at the start of mansearch(),
the variables cur and *res are initialized to 0 and NULL, respectively:

	cur = maxres = 0;
	if (res != NULL)
		*res = NULL;

If no match is found, these values are unchanged at line 223, where res
is checked to be non-NULL, but *res, which is still NULL, is then passed
to qsort().

Suggested fix (also attached to avoid white-space issues):

--- usr.bin/mandoc/mansearch.c
+++ usr.bin/mandoc/mansearch.c
@@ -220,7 +220,7 @@
if (cur && search->firstmatch)
break;
}
-   if (res != NULL)
+   if (res != NULL && *res != NULL)
qsort(*res, cur, sizeof(struct manpage), manpage_compare);
if (chdir_status && getcwd_status && chdir(buf) == -1)
warn("%s", buf);

(File name as in OpenBSD, it is contrib/mandoc/mansearch.c in FreeBSD.)
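Seen from the caller's side, the fix amounts to the guarded call in the patch above. A minimal sketch follows; struct manpage here is a hypothetical, reduced stand-in for mandoc's structure, manpage_compare() compares only a made-up field, and sort_results() is a hypothetical helper rather than a mandoc function.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for mandoc's struct manpage. */
struct manpage {
	const char	*names;
	int		 bits;
};

/* Illustrative comparator: order by the (made-up) bits field only. */
static int
manpage_compare(const void *vp1, const void *vp2)
{
	const struct manpage *mp1 = vp1, *mp2 = vp2;

	return ((mp1->bits > mp2->bits) - (mp1->bits < mp2->bits));
}

/*
 * Sort the results only if an array was actually allocated. When no
 * match was found, *res is still NULL and cur is 0, so qsort() is not
 * called with a NULL base pointer.
 */
void
sort_results(struct manpage **res, size_t cur)
{
	if (res != NULL && *res != NULL)
		qsort(*res, cur, sizeof(struct manpage), manpage_compare);
}
```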

Regards, STefan


Re: FYI: An example type of UBSAN failure during kyua test -k /usr/tests/Kyuafile

2022-01-07 Thread Stefan Esser
On 07.01.22 at 12:49, Mark Millard wrote:
> Having done a buildworld with both WITH_ASAN= and WITH_UBSAN=
> after finding what to control to allow the build, I installed
> it in a directory tree for chroot use and have
> "kyua test -k /usr/tests/Kyuafile" running.
> 
> I see evidence of various examples of one type of undefined
> behavior: "applying zero offset to null pointer"
> 
> # more /usr/obj/DESTDIRs/main-amd64-xSAN-chroot/tmp/kyua.FKD2vh/356/stderr.txt
> /usr/main-src/lib/libc/stdio/fread.c:133:10: runtime error: applying zero offset to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/lib/libc/stdio/fread.c:133:10 in
> /usr/main-src/lib/libc/stdio/fread.c:133:10: runtime error: applying zero offset to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/lib/libc/stdio/fread.c:133:10 in
> /usr/main-src/usr.bin/sed/process.c:715:18: runtime error: applying zero offset to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/usr.bin/sed/process.c:715:18 in
> /usr/main-src/lib/libc/stdio/fread.c:133:10: runtime error: applying zero offset to null pointer
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/lib/libc/stdio/fread.c:133:10 in
> Fail: stderr not empty
> --- /dev/null   2022-01-07 10:29:57.182903000 +
> +++ /tmp/kyua.FKD2vh/356/work/check.Mk9llD/stderr   2022-01-07 10:29:57.17310 +
> @@ -0,0 +1,2 @@
> +/usr/main-src/lib/libc/stdio/fread.c:133:10: runtime error: applying zero offset to null pointer
> +SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/main-src/lib/libc/stdio/fread.c:133:10 in
> Files left in work directory after failure: mntpt, mounterr
> 
> 
> In general the lib/libc/stdio/fread.c:133:10 example seems to
> be in a place that would make it fairly common.

Interesting find:

	while (resid > (r = fp->_r)) {
		(void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
		fp->_p += r;	/* line 133 */
		/* fp->_r = 0 ... done in __srefill */
		p += r;
		resid -= r;

If fp->_p == NULL in line 133, then NULL has been passed as the source
address to memcpy() in the line above, and I'd think that is undefined
behavior, even if a length of 0 is passed at the same time.

Maybe the code block quoted above (lines 132 to 136) should be wrapped
in "if (r > 0) {}"?
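A minimal sketch of that suggestion, using a hypothetical, reduced stand-in for the two FILE fields involved (_p and _r) and a made-up helper drain_buffer(). The memcpy() and all pointer increments only happen when there is buffered data, so a NULL buffer pointer with r == 0 is never used. This is not the libc code, which additionally has to refill the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the FILE fields used in fread.c. */
struct hyp_buf {
	unsigned char	*p;	/* next unread byte; may be NULL before the first refill */
	size_t		 r;	/* number of unread bytes at p */
};

/*
 * Copy up to resid buffered bytes to *dst and advance both pointers.
 * All pointer arithmetic is skipped when nothing is buffered.
 */
size_t
drain_buffer(struct hyp_buf *fp, unsigned char **dst, size_t resid)
{
	size_t r = fp->r < resid ? fp->r : resid;

	if (r > 0) {
		memcpy(*dst, fp->p, r);
		fp->p += r;
		fp->r -= r;
		*dst += r;
	}
	return (r);
}
```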

Regards, STefan


Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook

2021-12-08 Thread Stefan Esser
On 08.12.21 at 18:11, John Baldwin wrote:
> So the new changes always build a temporary tree (vs trying to build
> /var/db/etupdate/current in place).  For -n it should be that it just
> doesn't change /var/db/etcupdate/current at the end, but if it did the
> move anyway that would explain the bug you are seeing.  That does indeed
> look broken.  Please file a PR as a reminder for me to fix it.

If you work on this then please have a look at PR 247519, too:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247519

This problem led to my entire customized /etc getting lost when I
invoked etcupdate with WITH_DIRDEPS_BUILD defined in /etc/src-env.conf.

But I can imagine other reasons for the make commands in build_tree()
to fail without an error exit, leading to an empty etcupdate/current
tree and the subsequent deletion of files in /etc.

Regards, STefan


Re: git: 5e04571cf3cf - main - sys/bitset.h: reduce visibility of BIT_* macros

2021-12-07 Thread Stefan Esser
On 07.12.21 at 01:50, Mark Millard wrote:
> 
> 
> On 2021-Dec-6, at 14:48, Mark Millard  wrote:
> 
>> On 2021-Dec-6, at 14:19, Mark Millard  wrote:
>>
>>> This broke building lang/gcc11 so may be a exp run is appropriate:
>>>
>>> In file included from /usr/include/sys/cpuset.h:39,
>>>from /usr/include/sched.h:36,
>>>from /usr/include/pthread.h:48,
>>>from 
>>> /wrkdirs/usr/ports/lang/gcc11/work/gcc-11.2.0/gcc/jit/libgccjit.c:27:
>>> /usr/include/sys/bitset.h:314:36: error: attempt to use poisoned "malloc"
>>> 314 | #define __BITSET_ALLOC(_s, mt, mf) malloc(__BITSET_SIZE((_s)), mt, 
>>> (mf))
>>> |^
[...]
>> Just like the poudriere-devel based build on aarch64,
>> amd64's poudriere-devel based build got:
>>
>> In file included from /usr/include/sys/cpuset.h:39,
>> from /usr/include/sched.h:36,
>> from /usr/include/pthread.h:48,
>> from 
>> /wrkdirs/usr/ports/lang/gcc11/work/gcc-11.2.0/gcc/jit/libgccjit.c:27:
>> /usr/include/sys/bitset.h:314:36: error: attempt to use poisoned "malloc"
>>  314 | #define __BITSET_ALLOC(_s, mt, mf) malloc(__BITSET_SIZE((_s)), mt, 
>> (mf))
>>  |^
[...]
> This happens from the sequence below, where system.h use in
> the:
> 
> work/gcc-11.2.0/gcc/jit/{libgccjit,jit-recording,jit-playback}.c
> 
> builds is what poisons malloc in each case (and poisons more):
> 
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> #include "timevar.h"
> #include "typed-splay-tree.h"
> #include "cppbuiltin.h"
> #include 
> 
> After the poison-point, new macro definitions can not
> reference malloc (and such) --nor can normal code. But
> macros defined prior to the poison-point can contain
> malloc (and such) and the use of such macros after
> the poison point is okay.
> 
> So, if pthread.h is to define a macro referencing
> malloc (say), then it needs to be included before
> system.h is included in the way that things are set up
> in this code.

Hi Mark,

sorry for (indirectly) causing the breakage ...

The problem seems to be the inclusion of extra functionality in sched.h,
which is required by a number of programs that use autoconf. They probe
for one detail of sched.h and then try to use functionality that, up to
the commit you are referencing, had been hidden (made conditional on
_WITH_CPU_SET_T).

The line that contains the malloc() is part of a macro definition, and
AFAIU there is no actual use of __BITSET_ALLOC in the gcc code.

In fact, I could not find a single use of __BITSET_ALLOC in userland
code. Therefore, the line that contains the malloc() could be made
conditional on _KERNEL being defined.
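A minimal, self-contained model of the mechanism may make this clearer. All HYP_ names are hypothetical stand-ins (the real header uses __BITSET_SIZE and __BITSET_ALLOC, defined differently), and the #pragma GCC poison line plays the role of GCC's system.h. Once malloc is poisoned, a later #define that mentions it is rejected, but a definition inside a group skipped by #ifdef _KERNEL is never processed, so a userland build compiles; removing the #ifdef should reproduce the attempt to use poisoned "malloc" error. The sketch assumes a GCC- or Clang-compatible compiler.

```c
#include <stdlib.h>	/* malloc() is declared before it gets poisoned */

/*
 * Model of GCC's system.h: from here on, any use of the identifier
 * malloc, including inside a new #define, is an error.
 */
#pragma GCC poison malloc

/*
 * Hypothetical, simplified model of the part of sys/bitset.h that is
 * included afterwards (via pthread.h and sched.h in the GCC build).
 */
#define HYP_BITSET_SIZE(_s)	(((_s) + 7) / 8)

#ifdef _KERNEL
/* Kernel only: uses the three-argument malloc(9) of the kernel. */
#define HYP_BITSET_ALLOC(_s, mt, mf) malloc(HYP_BITSET_SIZE((_s)), mt, (mf))
#endif
```

Including <stdlib.h> before the poison point mirrors what Mark described: anything seen before poisoning is fine, only later uses are rejected.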

> I've only tried to build lang/gcc11 (only as supplied
> by the port). There could be more failure points for
> the lang/gcc11 build that were skipped.

At least the other gcc ports will be affected. I do not think that there
are many other ports that reject the definition of a macro using malloc()
in the way gcc's poisoning does.

> It seems likely that multiple lang/gcc* would have
> such issues but I normally only build the lang/gcc11
> one at this point.

I have not tested the other ports, but I do assume the same.

I'll try to build the world with __BITSET_ALLOC conditional on
some macro that is not defined in the gcc build.

Regards, STefan


Re: [REVIEW] Hide BIT_* macros from userland code

2021-12-03 Thread Stefan Esser
On 02.12.21 at 17:46, Shawn Webb wrote:
> Hey Stefan,
> 
> On Thu, Dec 02, 2021 at 05:26:55PM +0100, Stefan Esser wrote:
>> I have created
>>
>>  https://reviews.freebsd.org/D33235
>>
>> to remove the BIT_* macros used in the kernel from the userland API.
>>
>> They conflict with differing definitions in some 3rd party code and
>> lead to compile issues in a number of ports (via CPU_* macros based
>> on the BIT_* macros).
>>
>> See PR259787 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259787
>> for an example of such a problem.
> 
> I recently was in a position to evaluate BIT_* macros for userland
> use. It was around the time when the conversation regarding hiding
> BIT_* from userland, which conversation caused me to find another
> solution.
> 
> I think such an API is incredibly useful, so I wonder if there's a way
> to satisfy both. For example, maybe prefix the userland side with a
> USERLAND_ or something similar? Kernel would use BIT_* and userland
> would use USERLAND_BIT_* (just spitballing, not actually advocating
> for "USERLAND_BIT_*" but rather just the idea of it.)

Hi Shawn,

I have updated the patch set in review D33235 and have added you to
the reviewer list.

IMHO the approach proposed by Konstantin Belousov is better than the
introduction of prefixed macro names for the userland.

A simple #define _WANT_FREEBSD_BITSET makes the __BIT* macros available
by their traditional names; no other changes are required in the code.

This does not solve the potential case of a program that wants to use
both the BSD and GLIBC variants of the macros in a single source file,
but I think that such a case is contrived and does not occur in actual
code.

And in any case, IMHO the __BIT* names are as good as the USERLAND_BIT*
names you suggested (and I understand that you were not advocating that
specific name, so a prefix of __ might be considered to match what you
proposed ;-) ).

And you are of course free to map __BIT* to any other prefixed name in
a header file in your code ...
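To illustrate the visibility scheme, a simplified, self-contained sketch: SYS_BIT_* stands in for the always-visible __BIT* macros, WANT_TRADITIONAL_BITSET for _WANT_FREEBSD_BITSET, and APP_BIT_* for an application's own prefixed names. All of these names are hypothetical, and unlike the real BIT_* macros the sketch omits the set-size argument.

```c
#include <limits.h>

/* ---- hypothetical system header ---- */

/* Bits per word of the set. */
#define SYS_BITSET_BPW	(sizeof(unsigned long) * CHAR_BIT)

/* Always visible, in a prefixed name space (SYS_ stands in for __). */
#define SYS_BIT_SET(n, p) \
	((p)[(n) / SYS_BITSET_BPW] |= 1UL << ((n) % SYS_BITSET_BPW))
#define SYS_BIT_ISSET(n, p) \
	(((p)[(n) / SYS_BITSET_BPW] >> ((n) % SYS_BITSET_BPW)) & 1UL)

/* Traditional names only on request. */
#ifdef WANT_TRADITIONAL_BITSET
#define BIT_SET(n, p)	SYS_BIT_SET(n, p)
#define BIT_ISSET(n, p)	SYS_BIT_ISSET(n, p)
#endif

/* ---- application header: private names that cannot clash ---- */
#define APP_BIT_SET(n, p)	SYS_BIT_SET(n, p)
#define APP_BIT_ISSET(n, p)	SYS_BIT_ISSET(n, p)
```

Third-party code that brings its own BIT_SET is unaffected unless it explicitly asks for the traditional names.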

An update of the bitset(9) man page might be a good idea, explaining
the visibility rules and _WANT_FREEBSD_BITSET for system utilities that
need to work with kernel-style bitsets.

Regards, STefan


[REVIEW] Hide BIT_* macros from userland code

2021-12-02 Thread Stefan Esser
I have created

https://reviews.freebsd.org/D33235

to remove the BIT_* macros used in the kernel from the userland API.

They conflict with differing definitions in some 3rd party code and
lead to compile issues in a number of ports (via CPU_* macros based
on the BIT_* macros).

See PR259787 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259787
for an example of such a problem.


Re: problem with re(4) interface

2021-11-22 Thread Stefan Esser
On 22.11.21 at 18:55, Warner Losh wrote:
> On Mon, Nov 22, 2021 at 10:51 AM Chuck Tuffli  wrote:
> 
>> On Mon, Nov 22, 2021 at 9:34 AM Chris  wrote:
>>>
>>> On 2021-11-22 08:47, Chuck Tuffli wrote:
 Running on a recent-ish -current
 # uname -a
 FreeBSD stargate.tuffli.net 14.0-CURRENT FreeBSD 14.0-CURRENT
 main-81b22a9892 GENERIC  amd64

 I'm having trouble using the second NIC interface in a bridge to
>> provide
 network connectivity to bhyve VMs and need some help figuring out what
>> is
 wrong.
>> ...
>>> Because there's subtle differences between them; are you using the re
>> driver
>>> from base, or from ports?
>>
>> The driver is from base. Didn't realize there was one in ports.
>>
> 
> The ports driver is tricky... It's an older, buggier version of the base
> driver... *BUT*
> a number of issues that aren't fixed in base are fixed in it (mostly
> dealing better with
> errata)...  Ideally, we'd pull in the actual fixes from this driver, but
> it's a huge patch-set
> where it's unclear which bits are for what thing fixed, so nobody (that I
> know of) has
> gone through and even come up with an ugly patch for -current.

I had hoped to be able to merge RTL8125 support into our driver, based on
the Realtek version the port uses, but gave up for lack of documentation
that describes the RTL8125 chips and their PHYs.

But in preparation for this work I have analyzed the differences between
our driver in base and the one from Realtek. The Realtek driver:

- lacks support for a lot of newer features (e.g. NETMAP)
- has lots of conditional sections for antique FreeBSD versions
- special cases some 50 chip versions with regard to features, timing, ...
- contains microcode patches for nearly every RTL chip version

Even for the RTL8125 (the latest Realtek chip), the driver distinguishes
4 chip versions (some need microcode patches, some do not).

I have created patches to bring our version more in line with additions
present in the Realtek driver (e.g. register definitions for RTL8125), but
decided not to commit them, since I had no way of testing them with the
variety of hardware the driver supports.

I could commit the register definitions and other changes that I consider
low risk (even if I do not have the particular hardware revision the changes
address).

It is sad that Realtek does not provide developers with detailed information.
I'll look again for any leaked RTL8125 data books, but last I checked, there
were none.

Regards, STefan


Re: Incompatible change in LLD13 causing link errors?

2021-11-17 Thread Stefan Esser

On 17.11.21 at 21:20, Dimitry Andric wrote:
> On 17 Nov 2021, at 21:07, Stefan Esser  wrote:
>>
>> I have just received pkg-fallout for a port that has not been touched
>> for several months, specifically lang/silq.
>>
>> ld.lld: error: undefined hidden symbol: __start___minfo
>>>>> referenced by terminal.d
>>>>>  silq.o:(ldc.register_dso)
>>
>> ld.lld: error: undefined hidden symbol: __stop___minfo
>>>>> referenced by terminal.d
>>>>>  silq.o:(ldc.register_dso)
>> cc: error: linker command failed with exit code 1 (use -v to see invocation)
>> Error: /usr/bin/cc failed with status: 1
>> *** Error code 1
>>
>> This port builds correctly with LLD12 from a port, but fails with the
>> error message included above for both LLD13 from a port and LLD from
>> the FreeBSD-CURRENT base system.
> 
> See https://bugs.llvm.org/show_bug.cgi?id=52384 where this is discussed.
> Executive summary is to add -Wl,-z,nostart-stop-gc to your LDFLAGS, for
> now at least. But as you can see in the upstream PR, not everybody is
> happy with them flipping the default to on.

Hi Dimitry,

thank you for the quick reply!

It seems that the breakage of LDC had been noticed (by Jessica?) a few
weeks ago, and that a possible solution could be to build LDC with
LLVM >= 13.0.0.

But apparently LDC-1.23.0 cannot be built with llvm13, and a naive
attempt to upgrade the LDC port to 1.28.0 failed (MAINTAINER in CC).

Since LDC currently depends on LLVM10 I'll just add that as a
dependency to my failing port and hard-code lld10 as the linker
to use.

A better fix could be to import the (apparently not yet completely
accepted) patch

https://github.com/ldc-developers/ldc/pull/3850/

into LDC; it explicitly defines the garbage-collected symbols in rt.dso.
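For background, a small sketch of the linker feature that __start___minfo and __stop___minfo rely on (section and symbol names here are hypothetical; LDC uses the section __minfo). For an output section whose name is a valid C identifier, an ELF linker defines __start_<name> and __stop_<name>, and code can walk the records between them. The lld 13 change discussed in the LLVM bug report above concerns how such references interact with section garbage collection (--gc-sections); -z nostart-stop-gc is the switch Dimitry mentioned. The sketch assumes a GCC- or Clang-compatible compiler and is meant to be linked without --gc-sections.

```c
#include <stddef.h>

/*
 * Each record is placed in the output section "hyp_minfo". Because the
 * name is a valid C identifier, the linker defines __start_hyp_minfo and
 * __stop_hyp_minfo around all records collected from all objects.
 * ("used" keeps the compiler from dropping the otherwise unreferenced
 * static objects.)
 */
#define HYP_MINFO(name, value) \
	static const int name __attribute__((used, section("hyp_minfo"))) = (value)

HYP_MINFO(minfo_a, 1);
HYP_MINFO(minfo_b, 2);

/* Provided by the linker, not defined anywhere in the source. */
extern const int __start_hyp_minfo[];
extern const int __stop_hyp_minfo[];

/* Walk all records between the linker-provided bounds. */
int
minfo_sum(size_t *count)
{
	const int *p;
	int sum = 0;

	*count = (size_t)(__stop_hyp_minfo - __start_hyp_minfo);
	for (p = __start_hyp_minfo; p < __stop_hyp_minfo; p++)
		sum += *p;
	return (sum);
}
```

If the section is discarded, the __start_/__stop_ symbols end up undefined at link time, which matches the shape of the error reported for silq.o.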

Anyway, my lang/silq port seems to be fixed by using llvm10 (poudriere
test builds ongoing).

Regards, STefan


Incompatible change in LLD13 causing link errors?

2021-11-17 Thread Stefan Esser
I have just received pkg-fallout for a port that has not been touched
for several months, specifically lang/silq.

ld.lld: error: undefined hidden symbol: __start___minfo
>>> referenced by terminal.d
>>>   silq.o:(ldc.register_dso)

ld.lld: error: undefined hidden symbol: __stop___minfo
>>> referenced by terminal.d
>>>   silq.o:(ldc.register_dso)
cc: error: linker command failed with exit code 1 (use -v to see invocation)
Error: /usr/bin/cc failed with status: 1
*** Error code 1

This port builds correctly with LLD12 from a port, but fails with the
error message included above for both LLD13 from a port and LLD from
the FreeBSD-CURRENT base system.

There seems to be a difference in the visibility of symbols between
LLD versions 12 and 13, but I have no idea what changed and which LLD
flags might be available to restore the previous behavior.

Any ideas?


Re: stat(1) isn't honouring locale

2021-10-30 Thread Stefan Esser
On 30.10.21 at 14:12, Jamie Landeg-Jones wrote:
> Stefan Esser wrote:
>
>>> % date +%+
>>> Fri 29 Oct 2021 00:15:05 BST
>>>
>>> % stat -t%+ -f '%Sm' .
>>> Fri Oct 29 00:13:38 BST 2021
>>> -
>
>> thank you for reporting this issue and suggesting a fix.
>>
>> I have committed your proposed fix to -CURRENT as Git commit
>> 20f8331aca892ff8
>> and plan to MFC it to 13-STABLE in a few days.
>>
>> I'm CCing to the release engineer, since this might be a change that
>> we want to include in the upcoming 12.3 release (currently in beta).
>
> Thanks, and thanks for the quick response! I wasn't sure if it was an
> oversight, or if there was something I missed.

The man page does not mention a locale dependency, and strftime() in a
program that never calls setlocale() formats dates for the POSIX locale.

But I think that it was an oversight, since the date command respects
the locale by default, and with the change you suggested it is possible
to get the stat output in the locale-specific format as well as in the
format previously displayed.
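A minimal sketch of the behavior under discussion. It uses the standard %c conversion rather than the BSD %+ extension so that it also builds elsewhere, and format_utc() is a hypothetical helper, not code from stat.c. A program starts in the "C" (POSIX) locale, so strftime() uses POSIX formats until setlocale() is called; passing "" takes LC_TIME from the environment (LC_ALL, LC_TIME, LANG), which is what the proposed fix adds.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/*
 * Format t (as UTC, to keep the output independent of the time zone)
 * using fmt. If use_env is nonzero, LC_TIME is first set from the
 * environment, as in the proposed stat(1) change; note that this
 * setting stays in effect for the rest of the program. Otherwise the
 * program's initial "C" locale is used.
 */
size_t
format_utc(char *buf, size_t len, time_t t, const char *fmt, int use_env)
{
	struct tm *tm;

	if (use_env)
		(void)setlocale(LC_TIME, "");
	tm = gmtime(&t);
	if (tm == NULL)
		return (0);
	return (strftime(buf, len, fmt, tm));
}
```

Running the program with LANG=C (or LC_ALL=C) would still give the previous POSIX format even with use_env set.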

This might be a change that violates POLA, since the output format of an
existing application changes, and I'm not sure whether an MFC to 12.3
would be too large a change at this point in the release cycle.

If it is merged to 13-STABLE, I'll add an "add to release notes" marker.

But I do think that this is a worthwhile change that has just been
forgotten when other utilities have been made locale aware.

Regards, STefan


Re: stat(1) isn't honouring locale

2021-10-30 Thread Stefan Esser
On 29.10.21 at 20:15, Jamie Landeg-Jones wrote:
> stat(1) isn't honouring locale.
> 
> The manual page says:
> 
>  -t timefmt
>   Display timestamps using the specified format.  This format 
> is passed directly to strftime(3).
> 
> strftime(3) says:
> 
>%+is replaced by national representation of the date and time (the 
> format is similar to that produced by date(1)).
> 
> However:
> 
> -
> % date
> Fri Oct 29 00:14:12 BST 2021
> 
> % date +%+
> Fri Oct 29 00:14:19 BST 2021
> 
> % stat -t%+ -f '%Sm' .
> Fri Oct 29 00:13:38 BST 2021
> -
> 
> % setenv LANG en_GB.UTF-8
> 
> % date
> Fri 29 Oct 2021 00:14:57 BST
> 
> % date +%+
> Fri 29 Oct 2021 00:15:05 BST
> 
> % stat -t%+ -f '%Sm' .
> Fri Oct 29 00:13:38 BST 2021
> -
> 
> Including  and adding:
> 
> (void) setlocale(LC_TIME, "");
> 
> before the call to strftime() in usr.bin/stat/stat.c fixes this
> 
> Is there any reason this isn't in place?

Hi Jamie,

thank you for reporting this issue and suggesting a fix.

I have committed your proposed fix to -CURRENT as Git commit 20f8331aca892ff8
and plan to MFC it to 13-STABLE in a few days.

I'm CCing the release engineer, since this might be a change that we want
to include in the upcoming 12.3 release (currently in beta).

Regards, STefan


Re: RFC: Use of VOP_ALLOCATE() by NFSV4.2 nfsd

2021-10-13 Thread Stefan Esser
On 10.10.21 at 05:52, Alan Somers wrote:
> On Sat, Oct 9, 2021 at 7:13 PM Rick Macklem wrote:
>> This leads me to a couple of questions:
>> - Is there a good reason for not using vop_stdallocate() for ZFS?
> 
> Yes.  posix_fallocate is supposed to guarantee that subsequent writes
> to the file will not fail with ENOSPC.  But ZFS, being a copy-on-write
> file system, cannot possibly guarantee that.  See SVN r325320.

This is not entirely true: ZFS supports reservations, and it could
thus support the pre-allocation of space that is later "filled".
These reservations would be subtracted from the free space total,
and it would be guaranteed that this free space is available for
the file for which the pre-allocation has been requested.

This would require the allocate() call to record the block range for
which an allocation is requested (and for which no disk blocks are
currently allocated) without assigning any backing blocks at that time.

Later writes to that range would allocate disk blocks and, at the
same time, reduce the reserved amount and remove the now allocated
range from the recorded pre-allocation range.

This would of course require adding the block ranges that are reserved
but not yet backed by disk blocks to the znode, and keeping the total
count of blocks reserved for this purpose in a separate variable, in
addition to other types of reservations.
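A deliberately tiny toy model of the bookkeeping described above may help. It is purely hypothetical and not ZFS code: a file is limited to 64 blocks tracked in two bitmaps, and the pool has a single free-block counter. allocate() subtracts the not yet backed part of the range from the free space and records it as reserved, and a later first write to that range is served from the reservation, so it cannot run out of space. Copy-on-write rewrites of already backed blocks, which need new space and are the point of Alan's objection, are intentionally not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; bit i of each mask describes block i. */
struct toy_pool {
	uint64_t free_blocks;	/* neither backing data nor reserved */
};

struct toy_file {
	uint64_t backed;	/* blocks that have disk blocks assigned */
	uint64_t reserved;	/* pre-allocated blocks, not yet backed */
};

static unsigned
count_bits(uint64_t m)
{
	unsigned n = 0;

	for (; m != 0; m &= m - 1)
		n++;
	return (n);
}

/* Mask for blocks [first, first + count); requires first + count <= 64. */
static uint64_t
range_mask(unsigned first, unsigned count)
{
	uint64_t m;

	if (count == 0)
		return (0);
	m = (count == 64) ? UINT64_MAX : (UINT64_C(1) << count) - 1;
	return (m << first);
}

/*
 * allocate(): reserve the part of the range that is neither backed nor
 * already reserved, subtracting it from the pool's free space. Fails
 * (think ENOSPC) if the pool cannot cover it.
 */
bool
toy_allocate(struct toy_pool *pool, struct toy_file *f, unsigned first,
    unsigned count)
{
	uint64_t want = range_mask(first, count) & ~(f->backed | f->reserved);
	unsigned n = count_bits(want);

	if (n > pool->free_blocks)
		return (false);
	pool->free_blocks -= n;
	f->reserved |= want;
	return (true);
}

/*
 * First write to not yet backed blocks: reserved blocks are taken out of
 * the reservation (this cannot fail), all others come from the free pool.
 */
bool
toy_write_new(struct toy_pool *pool, struct toy_file *f, unsigned first,
    unsigned count)
{
	uint64_t range = range_mask(first, count) & ~f->backed;
	unsigned n = count_bits(range & ~f->reserved);

	if (n > pool->free_blocks)
		return (false);
	pool->free_blocks -= n;
	f->reserved &= ~range;
	f->backed |= range;
	return (true);
}
```

In this model, once toy_allocate() has succeeded for blocks 0 to 7, writes to those blocks succeed even after the rest of the pool has been used up, while a write to block 8 fails.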

>> - Should I try and support both file system types via vop_stdallocate()
>>   or not support Allocate at all?
> 
> Since you can't possibly support it for ZFS (not to mention other file
> systems like fusefs) you'll have to not support it at all.

While I do think that an allocate() operation could be implemented
in ZFS, it is obvious that this does not apply to all possible
fusefs filesystems (which do not even need to support the concept
of an allocation of blocks or ranges).

Regards, STefan


fetch -v error output broken?

2021-09-09 Thread Stefan Esser
I have just opened PR 258387 for this issue, which occurred during testing
of a port with an invalid MASTER_SITES entry.

The error output of "fetch -v" should consist of the server's messages, but
it appears that the buffer gets overwritten with data of unknown origin
(mostly NUL bytes), e.g.:

$ fetch -v http://distcache.us-west.freebsd.org/x 2>&1 | hd
00000000  72 65 73 6f 6c 76 69 6e  67 20 73 65 72 76 65 72  |resolving server|
00000010  20 61 64 64 72 65 73 73  3a 20 64 69 73 74 63 61  | address: distca|
00000020  63 68 65 2e 75 73 2d 77  65 73 74 2e 66 72 65 65  |che.us-west.free|
00000030  62 73 64 2e 6f 72 67 3a  38 30 0a 72 65 71 75 65  |bsd.org:80.reque|
00000040  73 74 69 6e 67 20 68 74  74 70 3a 2f 2f 64 69 73  |sting http://dis|
00000050  74 63 61 63 68 65 2e 75  73 2d 77 65 73 74 2e 66  |tcache.us-west.f|
00000060  72 65 65 62 73 64 2e 6f  72 67 2f 78 0a 0d 0a 00  |reebsd.org/x....|
00000070  67 69 6e 78 00 00 00 0a  34 30 34 20 4e 6f 74 20  |ginx....404 Not |
00000080  46 6f 75 6e 64 0d 0a 00  00 00 00 00 10 02 00 50  |Found..........P|
00000090  95 14 01 c9 00 00 00 00  00 00 00 00 0a 0d 0a 00  |................|
000000a0  74 6c 65 3e 34 30 34 20  4e 6f 74 20 46 6f 75 6e  |tle>404 Not Foun|
000000b0  64 0d 0a 00 00 00 00 00  10 02 00 50 95 14 01 c9  |d..........P....|
000000c0  00 00 00 00 00 00 00 00  0a 34 30 34 20 4e 6f 74  |.........404 Not|
000000d0  20 46 6f 75 6e 64 0d 0a  00 0a 00 00 00 00 00 10  | Found..........|
000000e0  02 00 50 95 14 01 c9 00  00 00 00 00 00 00 00 0a  |..P.............|
000000f0  6e 67 69 6e 78 0d 0a 00  3e 0d 0a 00 0a 00 00 00  |nginx...>.......|
00000100  00 00 10 02 00 50 95 14  01 c9 00 00 00 00 00 00  |.....P..........|
00000110  00 00 0a 0d 0a 00 72 3e  6e 67 69 6e 78 0d 0a 00  |......r>nginx...|
00000120  3e 0d 0a 00 0a 00 00 00  00 00 10 02 00 50 95 14  |>............P..|
00000130  01 c9 00 00 00 00 00 00  00 00 0a 0d 0a 00 72 3e  |..............r>|
00000140  6e 67 69 6e 78 0d 0a 00  3e 0d 0a 00 0a 00 00 00  |nginx...>.......|
00000150  00 00 10 02 00 50 95 14  01 c9 00 00 00 00 00 00  |.....P..........|
00000160  00 00 0a 66 65 74 63 68  3a 20 68 74 74 70 3a 2f  |...fetch: http:/|
00000170  2f 64 69 73 74 63 61 63  68 65 2e 75 73 2d 77 65  |/distcache.us-we|
00000180  73 74 2e 66 72 65 65 62  73 64 2e 6f 72 67 2f 78  |st.freebsd.org/x|
00000190  3a 20 4e 6f 74 20 46 6f  75 6e 64 0a              |: Not Found.|
0000019c

The expected output is returned by wget, e.g.:

$ wget -d http://distcache.us-west.freebsd.org/x
[...]
404 Not Found
Registered socket 3 for persistent reuse.
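Skipping 146 bytes of body: [<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>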
] done.
2021-09-09 15:37:02 ERROR 404: Not Found.

Part of the HTML response can be found in the fetch output, too:

"r>nginx" is obviously a fragment of "<center>nginx</center>", but "r>nginx"
appears twice, 40 bytes apart.

"tle>404 Not Found" is a fragment of "<title>404 Not Found</title>", with
"404 Not Found" appearing a total of 3 times ...

This could be the result of recent changes to memcpy(), which used to
tolerate overlapping buffers but no longer does on -CURRENT.

But the 4 occurrences of memcpy() in libfetch/http.c and libfetch/common.c
seem to be sane, and I did not look any further for the source of the data
corruption.
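To illustrate the distinction this suspicion relies on, here is the kind of buffer compaction a line-reading network client might do. consume_front() is a hypothetical helper written for this example, not code from libfetch: when the regions overlap, only memmove() is defined to work.

```c
#include <assert.h>
#include <string.h>

/*
 * Drop the first n bytes of a buffer holding len bytes by moving the
 * rest to the front. Source [buf + n, buf + len) and destination
 * [buf, buf + len - n) overlap whenever n < len - n, so memcpy() would
 * be undefined behaviour here; memmove() is specified to handle overlap.
 */
static size_t
consume_front(char *buf, size_t len, size_t n)
{
	assert(n <= len);
	memmove(buf, buf + n, len - n);
	return (len - n);
}
```

Replacing memmove() with memcpy() in such a helper may appear to work with one libc implementation and silently corrupt data with another, which is why a change in memcpy() internals is a plausible suspect for output like the above.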




Re: -CURRENT compilation time

2021-09-08 Thread Stefan Esser
On 08.09.21 at 10:57, David Chisnall wrote:
> On 07/09/2021 18:02, Stefan Esser wrote:
>> Wouldn't this break META_MODE?
> 
> I have never managed to get META_MODE to work but my understanding is that
> META_MODE is addressing a problem that doesn't really exist in any other build
> system that I've used: that dependencies are not properly tracked.

META_MODE allows for complex interdependencies. They are not an issue in the
GPL/Linux world, since components there are not integrated in the way that
has been practice in BSD for many decades.

> When I do a build of LLVM with the upstream build system with no changes, it
> takes Ninja approximately a tenth of a second to stat all of the relevant files
> and tell me that I have no work to do.  META_MODE apparently lets the FreeBSD
> build system extract these dependencies and do something similar, but it's not
> enabled by default and it's difficult to make work.

I tend to disagree with the last 5 words of your last sentence.

It took me just a few seconds to activate, and it has worked without fault
since.

There are only 2 trivial steps, but it is easy to miss the fact that
WITH_META_MODE has to be added to /etc/src-env.conf, not /etc/src.conf:

1) Add "WITH_META_MODE=yes" to /etc/src-env.conf (create file, if it does
   not exist)

2) Add "device filemon" to your kernel configuration, or add filemon to the
   kld_list variable in /etc/rc.conf to load the kernel module

(The kernel module can of course also be manually loaded at any time.)

>> I'd rather be able to continue building the world within a few minutes
>> (generally much less than 10 minutes, as long as there is no major LLVM
>> upgrade) than have a faster LLVM build and then a slower build of the world 
>> ...
> 
> The rest of this thread has determined that building LLVM accounts for half of
> the build time in a clean FreeBSD build.  LLVM's CMake is not a great example:
> it has been incrementally improved since CMake 2.8 and doesn't yet use any of
> the modern CMake features that allow encapsulating targets and providing
> import / export configurations.

The build of LLVM is skipped if META_MODE is enabled, unless there
really was a change to some LLVM header that causes a complete rebuild.

A further speed-up can be had with ccache, but I found that it does not
make much of a difference on my system.

> In spite of that, it generates a ninja file that compiles *significantly*
> faster than the bmake-based system in FreeBSD.  In other projects that I've
> worked on with a similar-sized codebase to FreeBSD that use CMake + Ninja,
> I've never had the same problems with build speed that I have with FreeBSD.

Possible, but if I watch the LLVM build with top or systat, I see that
all my cores are busy nearly throughout the full build. There are two
methods that could theoretically speed up the build:

1) make use of idle CPU cores

2) reduce the number of object files to build

I do not see that there is much potential for 1), since there is a high
degree of parallelism:

>>> World build completed on Wed Sep  1 13:40:14 CEST 2021
>>> World built in 99 seconds, ncpu: 32, make -j32
--
   98.69 real   741.61 user   234.55 sys

>>> World build completed on Thu Sep  2 23:22:04 CEST 2021
>>> World built in 98 seconds, ncpu: 32, make -j32
--
   98.34 real   780.41 user   228.67 sys

>>> World build completed on Fri Sep  3 19:09:39 CEST 2021
>>> World built in 165 seconds, ncpu: 32, make -j32
--
  164.84 real  1793.62 user   241.11 sys

>>> World build completed on Sun Sep  5 20:23:29 CEST 2021
>>> World built in 135 seconds, ncpu: 32, make -j32
--
  135.59 real   695.45 user   214.76 sys

>>> World build completed on Mon Sep  6 21:10:44 CEST 2021
>>> World built in 478 seconds, ncpu: 32, make -j32
--
  479.22 real 11374.40 user   474.19 sys

>>> World build completed on Wed Sep  8 11:51:03 CEST 2021
>>> World built in 652 seconds, ncpu: 32, make -j32
--
  652.14 real 17857.03 user   753.41 sys

Calculating "(user + sys) / real", I get factors between about 7 (in case
of only minor changes) and 28 for larger recompiles (e.g. if lots
of source files depend on an updated header), with 32 the theoretical
limit for all cores continuously active during the build.
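The arithmetic behind these factors, applied to the six measurements quoted above, is sketched below. The struct and function names are made up for this example; the numbers are copied from the timing lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One "real / user / sys" line printed after a "make -j32" world build. */
struct build_time {
	double	real, user, sys;	/* seconds */
};

/* The six measurements quoted above. */
static const struct build_time builds[] = {
	{  98.69,   741.61, 234.55 },
	{  98.34,   780.41, 228.67 },
	{ 164.84,  1793.62, 241.11 },
	{ 135.59,   695.45, 214.76 },
	{ 479.22, 11374.40, 474.19 },
	{ 652.14, 17857.03, 753.41 },
};

/* Average number of busy CPUs over the build: (user + sys) / real. */
static double
parallelism(const struct build_time *t)
{
	assert(t->real > 0.0);
	return ((t->user + t->sys) / t->real);
}

static void
print_parallelism(int ncpu)
{
	size_t i;

	for (i = 0; i < sizeof(builds) / sizeof(builds[0]); i++)
		printf("%7.2f s real: %4.1f of %d CPUs busy on average\n",
		    builds[i].real, parallelism(&builds[i]), ncpu);
}
```

For these runs the factors come out at about 9.9, 10.3, 12.3, 6.7, 24.7 and 28.5, so only the large rebuilds get close to the limit of 32.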

META_MODE does not understand that upd

Re: -CURRENT compilation time

2021-09-07 Thread Stefan Esser
On 07.09.21 at 15:51, David Chisnall wrote:
> On 06/09/2021 20:34, Wolfram Schneider wrote:
>> With the option WITHOUT_TOOLCHAIN=yes the world build time is 2.5
>> times faster (real or user+sys), down from 48 min to 19.5 min real
>> time.
> 
> Note that building LLVM with the upstream CMake + Ninja build system is
> *significantly* faster on a decent multicore machine than the FreeBSD
> bmake-based in-tree version.
> 
> One of the things I'd love to prototype if I had time is a CMake-based build
> system for FreeBSD so that we could get all of the tooling integration from the
> compile_commands.json, reuse LLVM's (and any other contrib things that use
> CMake) build system without having to recreate it, and be able to use ninja, to
> build.

Wouldn't this break META_MODE?

I'd rather be able to continue building the world within a few minutes
(generally much less than 10 minutes, as long as there is no major LLVM
upgrade) than have a faster LLVM build and then a slower build of the world ...

Regards, STefan




Re: awk behaviour?

2021-07-29 Thread Stefan Esser
On 29.07.21 at 18:42, Michael Butler via freebsd-current wrote:
> On 7/29/21 6:09 AM, Michael Gmelin wrote:
>>
>>
>> On Wed, 28 Jul 2021 16:02:30 -0400
>> Ed Maste  wrote:
>>
>>> On Wed, 28 Jul 2021 at 15:15, Michael Butler via freebsd-current
>>>  wrote:

>>>> What prompted the question was my (obviously poor) attempt to debug
>>>> and resolve this failure when attempting to build a release for
>>>> i386 on an amd64 ..
>>>
>>> This will be due to my 4e224e4be7c3. I'm not sure exactly what's
>>> happening yet, but I can provoke this behaviour if `${PKG_CMD}
>>> --version` outputs something other than a single line with the version
>>> number.
>>>
>>
>> Could it be, that the pkg binary isn't installed in $LOCALBASE/sbin/pkg,
>> (whatever LOCALBASE is at that point)? This would make pkg --version
>> shows its bootstrap message:
>>
>>    The package management tool is not yet installed on your system.
>>    Do you want to fetch and install it now? [y/N]:
>>
>> which could explain the behavior.
>>
>> Just speculating...
> 
> This is consistent with the behaviour I'm now seeing after the most recent 
> patch.
> 
> In the chroot environment used by a cross-compilation, there is no installed
> pkg port. When pkg is invoked in the target environment, it now waits on the
> yes/no response,

Passing "ASSUME_ALWAYS_YES=yes" in the environment should cause the
installation to proceed without waiting for user input.

Regards, STefan





Add support for -c to sha256sum to fix port build failures

2021-06-18 Thread Stefan Esser
The sha256 et al. programs have recently been extended to provide
GNU compatible features if invoked as sha256sum.

This now leads to port build failures, since some ports assume that
the -c option is implemented and treat an error exit of sha256sum -c
as an indication of corrupted source files.

I have created

https://reviews.freebsd.org/D30812

as a quick attempt to provide a GNU compatible sha256sum -c feature,
and I'd appreciate a review of this change.

An alternative to adding this feature would be to change all ports
that now fail because they assume that sha256sum provides this
option.

I could have used linked list macros, but given the simple structure
I did not think the extra dependency was required here, and it does
not really simplify the program, IMHO.
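The core of a GNU compatible -c mode is reading back the lines that sha256sum itself prints: a hex digest, a space, a space or '*' (binary mode), and the file name. The following is a minimal sketch of just that parsing step. parse_checksum_line() is a hypothetical helper written for this example and is not taken from D30812; escaped file names and the BSD-style "SHA256 (name) = digest" form are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define SHA256_HEX_LEN	64	/* a SHA-256 digest printed as hex */

/*
 * Split one line of GNU-style output, "<digest><space><space or '*'><name>",
 * into digest and file name in place. Returns false for malformed lines.
 */
static bool
parse_checksum_line(char *line, const char **digest, const char **name)
{
	char *n;
	size_t i;

	/* Stops at the terminating NUL, so short lines are rejected safely. */
	for (i = 0; i < SHA256_HEX_LEN; i++)
		if (!isxdigit((unsigned char)line[i]))
			return (false);
	if (line[SHA256_HEX_LEN] != ' ' ||
	    (line[SHA256_HEX_LEN + 1] != ' ' && line[SHA256_HEX_LEN + 1] != '*'))
		return (false);

	n = line + SHA256_HEX_LEN + 2;
	n[strcspn(n, "\n")] = '\0';	/* drop the line terminator */
	if (*n == '\0')
		return (false);

	line[SHA256_HEX_LEN] = '\0';	/* terminate the digest */
	*digest = line;
	*name = n;
	return (true);
}
```

A -c implementation would then hash each named file, compare the result with the digest, print "name: OK" or "name: FAILED", and exit non-zero if any file fails, which is the exit status the affected ports rely on.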

Regards, STefan





Re: Problems with realtek NIC

2021-05-02 Thread Stefan Esser
On 02.05.21 at 01:37, Greg Rivers wrote:
> On Saturday, 1 May 2021 16:45:03 CDT Stefan Esser wrote:
>> On 01.05.21 at 21:48, Greg Rivers via freebsd-current wrote:
>>> On Saturday, 1 May 2021 14:09:46 CDT Nilton Jose Rizzo wrote:
>>>> I using a FreeBSD 14-Current and get random error with my NIC. The
>>>> watchdog timer send a timeout message and I loose connection temporaly. In
>>>> logs show only this message:
>>>>
>>> Switch to the official Realtek driver in ports: net/realtek-re-kmod
>>
>> The "official" RealTek driver is based on a very old version of "our"
>> driver that was written by Bill Paul.
>>
>> It lacks many features that have been introduced in FreeBSD in the
>> last decade (or even earlier) like NETMAP-Support.
>>
>> The RealTek-driver has special cases for some 50 variants of RealTek
>> Ethernet chips and contains individual firmware patches for nearly all
>> of them.
>>
>> I had started to merge chip specific changes from the official driver
>> to the FreeBSD driver in the hope to get it to support the RTL8125A/B
>> chips. But I have stopped that project for lack of RTL8125 documentation,
>> especially regarding the PHY, which has its own driver module in our
>> version but not the RealTek code. (And somebody claimed to know that
>> another FreeBSD developer was working on RTL8125 support but did not
>> tell who that might be and whether he had documentation.)
>>
>> Anyway, there are changes regarding the initialization and error recovery
>> of different RealTek chips in the official driver that could be merged
>> into our version. But I do not know whether these changes require the
>> firmware changes provided by the RealTek driver to correctly work.
>>
> Thanks for the information Stefan, and for your work on FreeBSD. My use
> of the term "official" was apparently inaccurate. I was not aware of the
> deficiencies in the RealTek driver. I would prefer to use the FreeBSD
> driver, but I don't for purely pragmatic reasons: the FreeBSD driver
> continually locks up and resets under load (as described by the OP),
> while the RealTek driver does not.

Hi Greg,

the RealTek driver is "official" in the sense that it is provided by the
vendor and written with knowledge of all the (many!) deficiencies of
the RealTek Ethernet chips. And yes, the FreeBSD driver definitely needs
work to fully support all variants of the RealTek chip. I guess that, when
it runs into chip specifics or hardware issues that it does not cover, the
FreeBSD driver often lacks the special code (or firmware patches) and has
to recover chip operation by going through a hard reset.

If you look at the "official" driver sources, you'll find #ifdefs for
FreeBSD versions before 4.9, but that is not the reason the main driver
source file is more than 30000 lines long.

The driver distinguishes between more than 70 different chip versions
(identified by MACFG_3 to MACFG_84 with some IDs missing). And each one
has specific requirements regarding firmware patches, initialization and
reset behavior, error handling, ...

I have analyzed these differences (see the attached file), but for lack
of RTL8125 documentation I have not proceeded with this project for now.
(The column lx_fw identifies firmware patches used by the Linux driver,
while rt_fw identifies those embedded into the RealTek driver for
FreeBSD - and those differ somewhat, and I have no idea why ...).

I have local modifications of the re driver in my sources, but I have
another project that I really want to get ready in the next few
months (after working on it for nearly 2 years) and I do not want to
become responsible for issues of the RealTek driver in base (after
committing fixes that also might cause regressions, if they need to
be accompanied by firmware patches ...)

> FWIW, here are the particulars on the RealTek chip-set that I've got:
> 
> re0:  port 
> 0xe000-0xe0ff mem 0xb0804000-0xb0804fff,0xb0800000-0xb0803fff irq 16 at 
> device 0.0 on pci1
> re0: Using 1 MSI-X message
> re0: Chip rev. 0x2c80
> re0: MAC rev. 0x0010

The Chip rev. indicates that you have got an RTL8168E_VL, which does not
need a firmware patch according to the RealTek driver, but gets one in Linux.

It is identified by MACFG_38 in the RealTek driver, BTW, and there are
only a few chip specific code fragments relevant to that chip. It seems
it needs special handling when the MAC address is programmed, but I did
not spot any other special code for that particular chip in the "official"
driver.

> re0@pci0:1:0:0: class=0x020000 rev=0x06 hdr=0x00 vendor=0x10ec device=0x8168 
> subvendor=0x1458 subdevice=0xe000
> vendor = 'Realtek Semiconductor Co., Ltd.'
>

Re: Problems with realtek NIC

2021-05-01 Thread Stefan Esser
On 01.05.21 at 21:48, Greg Rivers via freebsd-current wrote:
> On Saturday, 1 May 2021 14:09:46 CDT Nilton Jose Rizzo wrote:
>> I using a FreeBSD 14-Current and get random error with my NIC. The watchdog 
>> timer send a timeout message and I loose connection temporaly. In logs show 
>> only this message:
>>
> Switch to the official Realtek driver in ports: net/realtek-re-kmod

The "official" RealTek driver is based on a very old version of "our"
driver that was written by Bill Paul.

It lacks many features that have been introduced in FreeBSD in the
last decade (or even earlier), such as netmap support.

The RealTek driver has special cases for some 50 variants of RealTek
Ethernet chips and contains individual firmware patches for nearly all
of them.

I had started to merge chip specific changes from the official driver
to the FreeBSD driver in the hope of getting it to support the RTL8125A/B
chips. But I have stopped that project for lack of RTL8125 documentation,
especially regarding the PHY, which has its own driver module in our
version but not in the RealTek code. (And somebody claimed to know that
another FreeBSD developer was working on RTL8125 support, but did not
say who that might be and whether he had documentation.)

Anyway, there are changes regarding the initialization and error recovery
of different RealTek chips in the official driver that could be merged
into our version. But I do not know whether these changes require the
firmware changes provided by the RealTek driver to work correctly.

Regards, STefan





Re: git magic in contrib/bc

2021-04-28 Thread Stefan Esser
On 28.04.21 at 20:44, Michael Gmelin wrote:
> 
> 
> On Wed, 28 Apr 2021 20:00:38 +0300
> Yuri Pankov  wrote:
> 
>> Not sure if it's just me, but I'm seeing a bit of git weirdness in
>> contrib/bc:
> 
> I'm seeing the same here, also when doing:
> 
>   rm .git/index
>   git reset
>   git status
> 
> after this, `git diff' also shows what changed in those files (basically
> every line). It's all whitespace characters, as `git diff -w' is empty.
> 
> Turns out EOLs changed, I suspect this is due to the eol overrides in
> contrib/bc/.gitattributes. If I comment those out, "git diff" is silent
> again.

Yes, I recently committed the new file .gitattributes as part of an
upgrade.

I assume that the affected files are only used for the Windows build
that was added in version 4.0.0.

I do not know how to fix this problem (and whether this is just a
nuisance or an actual problem).

The upstream repository is https://git.yzena.com/gavin/bc. Before the
commit to vendor/bc in our repository, I performed a "diff -r" of the
distfile of the math/gh-bc port against the files in vendor/bc, and
thus any change that we apply locally will need to be upstreamed.

Regards, STefan





[SOLVED] Re: Strange behavior after running under high load

2021-04-02 Thread Stefan Esser

On 28.03.21 at 16:39, Stefan Esser wrote:

After a period of high load, my now idle system needs 4 to 10 seconds to
run any trivial command - even after 20 minutes of no load ...


I have run some Monte-Carlo simulations for a few hours, with initially 35
processes running in parallel for some 10 seconds each.

The load decreased over time since some parameter sets were faster to process.
All in all 63000 processes ran within some 3 hours.

When the system became idle, interactive performance was very bad. Running
any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have
to have this system working, I plan to reboot it later today, but will keep
it in this state for some more time to see whether this state persists or
whether the system recovers from it.

Any ideas what might cause such a system state???


It seems that Mateusz Guzik was right to mention performance issues when
the system is very low on vnodes. (Thanks!)

I have been able to reproduce the issue and have checked vnode stats:

kern.maxvnodes: 620370
kern.minvnodes: 155092
vm.stats.vm.v_vnodepgsout: 6890171
vm.stats.vm.v_vnodepgsin: 18475530
vm.stats.vm.v_vnodeout: 228516
vm.stats.vm.v_vnodein: 1592444
vfs.wantfreevnodes: 155092
vfs.freevnodes: 47  <- obviously too low ...
vfs.vnodes_created: 19554702
vfs.numvnodes: 621284
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 6412

The freevnodes value stayed in this region over several minutes, with
typical program start times (e.g. for "uptime") in the region of 10 to
15 seconds.

After raising kern.maxvnodes from 600,000 to 2,000,000, system performance
is restored and I get:

kern.maxvnodes: 2000000
kern.minvnodes: 500000
vm.stats.vm.v_vnodepgsout: 7875198
vm.stats.vm.v_vnodepgsin: 20788679
vm.stats.vm.v_vnodeout: 261179
vm.stats.vm.v_vnodein: 1817599
vfs.wantfreevnodes: 500000
vfs.freevnodes: 205988  <- much higher, but still well below wantfreevnodes
vfs.vnodes_created: 19956502
vfs.numvnodes: 912880
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 20702

I do not know why the performance impact is so high: there are still a few
free vnodes (more than required for the shared libraries needed to start,
e.g., the uptime program). Most probably each attempt to get a vnode
triggers a clean-up attempt that runs for a significant time, but has no
chance of getting anywhere near the goal of 155k or 500k free vnodes.

Anyway, kern.maxvnodes can be changed at run-time, so this is easy to
fix. It seems that no message is logged to report this situation.
A rate-limited hint to raise the limit should help other affected users.
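Such a hint could look roughly like the following, written as user-space C for illustration only. rate_ok() mimics the semantics of the kernel's ratecheck(9) with whole seconds; maybe_warn_vnodes(), its 1% threshold and the 60 second interval are made-up assumptions, and a real check would live in the kernel's vnode allocation path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define HINT_INTERVAL	60	/* seconds between hints (arbitrary) */

/*
 * Returns true at most once per 'interval' seconds, in the spirit of the
 * kernel's ratecheck(9). 'now' is a parameter so the logic is testable.
 */
static bool
rate_ok(time_t *last, time_t interval, time_t now)
{
	if (*last != 0 && now - *last < interval)
		return (false);
	*last = now;
	return (true);
}

/*
 * Hypothetical check for the vnode allocation path: when free vnodes fall
 * below 1% of the target (an arbitrary threshold), suggest a larger
 * kern.maxvnodes, but no more often than once per HINT_INTERVAL.
 */
static bool
maybe_warn_vnodes(time_t *last, time_t now, long freevnodes,
    long wantfreevnodes)
{
	if (freevnodes >= wantfreevnodes / 100)
		return (false);
	if (!rate_ok(last, HINT_INTERVAL, now))
		return (false);
	fprintf(stderr, "vnode shortage: %ld free, %ld wanted; "
	    "consider raising kern.maxvnodes\n", freevnodes, wantfreevnodes);
	return (true);
}
```

With the values from the first sysctl listing (47 free, 155092 wanted) the hint fires once per minute at most; with the values after raising the limit (205988 free, 500000 wanted) it stays silent.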

Regards, STefan





Re: Strange behavior after running under high load

2021-03-29 Thread Stefan Esser

On 29.03.21 at 08:45, Andrea Venturoli wrote:

On 3/28/21 4:39 PM, Stefan Esser wrote:

After a period of high load, my now idle system needs 4 to 10 seconds to
run any trivial command - even after 20 minutes of no load ...


High CPU load or high disk load?


High CPU load, 3 times the number of CPU threads in this particular
batch run.

Fewer than 10 files of less than 100 KB each were written per second.


ZFS? Snapshots?


ZFS and automatic snapshots of the file system every hour.


12.x? 13.x?


-CURRENT as of some 24 hours before the issue occurred:

FreeBSD 14.0-CURRENT #33 main-n245694-90d2f7c413f9-dirty: Sat Mar 27 15:35:37 CET 2021


I've seen something similar: after a high load period, system crawled so much 
that services were not answering in a reasonable time (e.g. mail would fail 
with "no such mailbox"!).


Program start-up was very slow, but interactive response once running was
normal (e.g. execution of internal shell commands like "echo *").


Even rebooting didn't fix it, until I deleted some autosnapshots.


Rebooting fixed it in my case.

top or other tools would show no disk activity, although the disks were working
as mad.


No disk activity in my case. The system was idle without any load, but the
issue persisted over many hours (up to the moment when I decided to reboot
the system to get it back into a usable state).


Not sure it's the same case you experienced, though.


Probably not, but you seem to have hit another case where a resource limit
was reached and the system did not deal with the situation gracefully.

Thanks for replying ...

Regards, STefan





Re: Strange behavior after running under high load

2021-03-29 Thread Stefan Esser

On 29.03.21 at 03:11, Mateusz Guzik wrote:

This may be the problem fixed in
e9272225e6bed840b00eef1c817b188c172338ee ("vfs: fix vnlru marker
handling for filtered/unfiltered cases").


My system had been up for less than 24 hours, running a kernel and world
built from the latest -CURRENT less than 1 hour before the reboot:

FreeBSD 14.0-CURRENT #33 main-n245694-90d2f7c413f9-dirty: Sat Mar 27 15:35:37 CET 2021


The fix had been committed some 9 days before that kernel was built.


However, there is a long standing performance bug where if vnode limit
is hit, and there is nothing to reclaim, the code is just going to
sleep for one second.


There are no log entries that give any hint as to what occurred.
But I do assume that these events are not logged ... (?)

Yes, I could have checked that and will do so if the issue occurs
again. I plan to generate more output files in the same way that
triggered the issue yesterday, and since the system is very slow
but still able to execute commands, I can try to debug it; I just
have to know where to start looking ...

Thank you for your reply!

Regards, STefan





Re: Strange behavior after running under high load

2021-03-28 Thread Stefan Esser

On 28.03.21 at 17:44, Andriy Gapon wrote:

On 28/03/2021 17:39, Stefan Esser wrote:

After a period of high load, my now idle system needs 4 to 10 seconds to
run any trivial command - even after 20 minutes of no load ...


I have run some Monte-Carlo simulations for a few hours, with initially 35
processes running in parallel for some 10 seconds each.


I saw somewhat similar symptoms with 13-CURRENT some time ago.
To me it looked like even small kernel memory allocations took a very long time.
But it was hard to properly diagnose that as my favorite tool, dtrace, was also
affected by the same problem.


That could have been the case - but I had to reboot to recover the system.

I had let it sit idle for a few hours, and the last "time uptime" before
the reboot took 15 seconds of real time to complete.

Response from within the shell (e.g. "echo *") was instantaneous, though.

I tried to trace the program execution of "uptime" with truss and found
that the shared libraries were loaded at about one or two per second
until all were attached, and then the program quickly printed the
expected results.

I could probably recreate the issue by running the same set of programs
that triggered it a few hours ago, but this is a production system and
I need it to be operational through the week ...

Regards, STefan





Strange behavior after running under high load

2021-03-28 Thread Stefan Esser

After a period of high load, my now idle system needs 4 to 10 seconds to
run any trivial command - even after 20 minutes of no load ...


I have run some Monte-Carlo simulations for a few hours, with initially 35 
processes running in parallel for some 10 seconds each.


The load decreased over time since some parameter sets were faster to process.
All in all 63000 processes ran within some 3 hours.

When the system became idle, interactive performance was very bad. Running
any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have
to have this system working, I plan to reboot it later today, but will keep
it in this state for some more time to see whether this state persists or
whether the system recovers from it.

Any ideas what might cause such a system state???


The system has a Ryzen 5 3600 CPU (6 cores/12 threads) and 32 GB of RAM.

The following are a few commands that I have tried on this now practically
idle system:

$ time vmstat -n 1
  procs    memory    page                      disks     faults       cpu
  r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
  2  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  482 7.2K  934 11  1 88

real    0m9,357s
user    0m0,001s
sys     0m0,018

[... wait 1 minute ...]

$ time vmstat -n 1
  procs    memory    page                      disks     faults       cpu
  r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
  1  0  0  26G 925M 1.2K   1   4   0 1.4K  239   0  482 7.2K  933 11  1 88

real    0m9,821s
user    0m0,003s
sys     0m0,389s

$ systat -vm

 4 usersLoad  0.10  0.72  3.57  Mar 28 16:15
Mem usage:  97%Phy 55%Kmem   VN PAGER   SWAP 
PAGER
Mem:  REAL   VIRTUAL in   out in  
out

Tot   Share TotShare Free   count
Act  2387M460K  26481M 460K 923M   pages
All  2605M218M  27105M 572Mioflt  Interrupts
Proc:  cow 132 total
   r   p   ds   w   Csw  Trp  Sys  Int  Sof  Flt52 zfod 96 hpet0:t0
  316   356   39  225  132   21   53   ozfod nvme0:admi
  %ozfod nvme0:io0
  0.1%Sys   0.0%Intr  0.0%User  0.0%Nice 99.9%Idle daefr nvme0:io1
|||||||||||prcfr nvme0:io2
   totfr nvme0:io3
dtbuf  react nvme0:io4
Namei  Name-cache   Dir-cache620370 maxvn  pdwak nvme0:io5
Callshits   %hits   %627486 numvn  168 pdpgs27 xhci0 66
   18  14  7865 frevn  intrn ahci0 67
17539M wire xhci1 68
Disks  nvd0  ada0  ada1  ada2  ada3  ada4   cd0   430M act   9 re0 69
KB/t   0.00  0.00  0.00  0.00  0.00  0.00  0.00 12696M inact hdac0 76
tps   0 0 0 0 0 0 0 54276K laund vgapci0 78
MB/s   0.00  0.00  0.00  0.00  0.00  0.00  0.00   923M free
%busy 0 0 0 0 0 0 0  0 buf

[... 5 minutes later ...]

$ time vmstat -n 1
 procs    memory    page                      disks     faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr nv0   in   sy   cs us sy id
 1  0  0  26G 922M 1.2K   1   4   0 1.4K  239   0  481 7.2K  931 11  1 88

real    0m4,270s
user    0m0,000s
sys     0m0,019s

$ time uptime
16:20  up 23:23, 4 users, load averages: 0,17 0,39 2,68

real    0m10,840s
user    0m0,001s
sys     0m0,374s

$ time uptime
16:37  up 23:40, 4 users, load averages: 0,29 0,27 0,96

real    0m9,273s
user    0m0,000s
sys     0m0,020s





Re: On 14-CURRENT: no ports options anymore?

2021-03-13 Thread Stefan Esser

On 13.03.21 at 20:17, Hartmann, O. wrote:

Since I moved on to 14-CURRENT, I face a very strange behaviour when trying to set
options via "make config" or via poudriere accordingly. I always get "===> Options
unchanged" (when options has been already set and I'd expect a dialog menu).
This misbehaviour is throughout ALL 14-CURRENT systems (the oldest is at FreeBSD
14.0-CURRENT #49 main-n245422-cecfaf9bede9: Fri Mar 12 16:08:09 CET 2021 amd64).

I do not see such a behaviour with 13-STABLE, 12-STABLE, 12.2-RELENG.

How to fix this? What happened?


Hi Oliver,

please check your TERM setting and, if it is not a common one such as
xterm, vt100 or vt320, test with one of those.

I had this problem when my TERM variable was xterm-color, which
used to be supported but apparently no longer is.

Regards, STefan





Re: jails: /pool/jails/fulljailmake -> /pool/jails/fulljailbmake: No such file or directory

2021-02-15 Thread Stefan Esser

On 15.02.21 at 11:47, Mateusz Guzik wrote:

Can you try this with reverting:

commit ee10666327b622c2f20a4ac17e7a5673b04e7c9a
Author: Simon J. Gerraty 
Date:   Sun Feb 14 17:20:10 2021 -0800

 Links for bmake and bmake.1

 Some folk forget that make is bmake, and want the links...

 MFC after: 1 week

diff --git a/usr.bin/bmake/Makefile.inc b/usr.bin/bmake/Makefile.inc
index 96431c19d2af..8c4cb659e1d8 100644
--- a/usr.bin/bmake/Makefile.inc
+++ b/usr.bin/bmake/Makefile.inc
@@ -9,6 +9,8 @@

  .if exists(${.CURDIR}/tests)
  PROG= make
+LINKS= make bmake
+MLINKS= ${MAN} b${MAN}
  .endif

  .if !defined(MK_SHARED_TOOLCHAIN) || ${MK_SHARED_TOOLCHAIN} == "no"

If reverting this does not help, can you try with:
sysctl vfs.cache_fast_lookup=0

On 2/15/21, O. Hartmann  wrote:

The base host is running FreeBSD 14.0-CURRENT #6 main-n244784-8563de2f279:
Fri Feb 12 12:48:34 CET 2021 amd64, the source tree is at "commit
5dce03847fdc7bc6eb959282c0ae2117b1991746".


Updating jails via "ezjail-admin update -i", or for poudriere based CURRENT
(14-CURRENT) jails via "poudriere jail -j jail -u -b", installation of world
fails due to an error, shown below:

[...]

===> usr.bin/bmake (install)
install  -s -o root -g wheel -m 555   make /pool/jails/fulljail/usr/bin/make
install  -o root -g wheel -m 444 make.1.gz  /pool/jails/fulljail/usr/share/man/man1/
rm -f /pool/jails/fulljail/usr/share/man/man1/bmake.1 /pool/jails/fulljail/usr/share/man/man1/bmake.1.gz;  install -l h -o root -g wheel -m 444  /pool/jails/fulljail/usr/share/man/man1/make.1.gz /pool/jails/fulljail/usr/share/man/man1/bmake.1.gz
install -l h -o root -g wheel -m 555  /pool/jails/fulljailmake /pool/jails/fulljailbmake
install: link /pool/jails/fulljailmake -> /pool/jails/fulljailbmake: No such file or directory
*** Error code 71


I've got the same problem in a simple buildworld/installworld:

===> usr.bin/bmake/tests/variables/t0 (install)
install  -o root  -g wheel -m 555  legacy_test //usr/tests/usr.bin/bmake/variables/t0/legacy_test

installing DIRS testsFILESDIR
install  -d -m 0755 -o root  -g wheel //usr/tests/usr.bin/bmake/variables/t0
install  -o root  -g wheel -m 444 /usr/git/src/usr.bin/bmake/tests/variables/t0/Makefile.test //usr/tests/usr.bin/bmake/variables/t0/Makefile.test
install  -o root  -g wheel -m 444 /usr/git/src/usr.bin/bmake/tests/variables/t0/expected.status.1 //usr/tests/usr.bin/bmake/variables/t0/expected.status.1
install  -o root  -g wheel -m 444 /usr/git/src/usr.bin/bmake/tests/variables/t0/expected.stderr.1 //usr/tests/usr.bin/bmake/variables/t0/expected.stderr.1
install  -o root  -g wheel -m 444 /usr/git/src/usr.bin/bmake/tests/variables/t0/expected.stdout.1 //usr/tests/usr.bin/bmake/variables/t0/expected.stdout.1
install  -o root  -g wheel -m 444  Kyuafile //usr/tests/usr.bin/bmake/variables/t0/Kyuafile

install -l h -o root -g wheel -m 555  /make /bmake
install: link /make -> /bmake: No such file or directory

It seems that "/" is used as a path prefix instead of "/usr/bin/".

Removing the LINKS line solves the issue for me, but with a correct
path prefix the link could still be installed.

Regards, STefan





Re: problem building virtualbox-ose-kmod

2021-01-26 Thread Stefan Esser

On 26.01.21 at 07:34, monochrome wrote:
I'm having this issue building virtualbox-ose-kmod. It has been like
this for quite a while now, maybe over a month, but I deinstalled it
and forgot. Now that I've moved from 13-current to stable/13, I thought
I would try to put it back, but it still won't build. I haven't seen
anyone else with this problem; did I miss a memo?


I sent a patch to vbox@ on 2021-01-16, but only received an
automatic reply that it had to be accepted by the moderator of the
list (and never got any further reply or reaction to it).

The signature of vm_map_protect() has changed, but the port has not
been updated.

Here is the patch in case the attachment gets stripped (but probably
with messed-up white-space):

Index: files/patch-src_VBox_Runtime_r0drv_freebsd_memobj-r0drv-freebsd.c
===
--- files/patch-src_VBox_Runtime_r0drv_freebsd_memobj-r0drv-freebsd.c (revision 561738)
+++ files/patch-src_VBox_Runtime_r0drv_freebsd_memobj-r0drv-freebsd.c (working copy)
@@ -421,7 +421,8 @@
 @@ -826,6 +885,7 @@ DECLHIDDEN(int) rtR0MemObjNativeProtect(PRTR0MEMOBJINT
  ProtectionFlags |= VM_PROT_EXECUTE;
  
- int krc = vm_map_protect(pVmMap, AddrStart, AddrEnd, ProtectionFlags, FALSE);
+-int krc = vm_map_protect(pVmMap, AddrStart, AddrEnd, ProtectionFlags, FALSE);
++int krc = vm_map_protect(pVmMap, AddrStart, AddrEnd, ProtectionFlags, 0, VM_MAP_PROTECT_SET_PROT);
 +IPRT_FREEBSD_RESTORE_EFL_AC();
  if (krc == KERN_SUCCESS)
  return VINF_SUCCESS;

It seems that __FreeBSD_version was bumped to 1300135 less than
2 hours before commit 0659df6faddfb27ba54a2cae2a12552cf4f823a0, so
the patch could be made conditional on that __FreeBSD_version value,
but I did not bother to add the condition since all my systems have
been updated to newer versions.

Regards, STefan


--- memobj-r0drv-freebsd.o ---
/usr/ports/emulators/virtualbox-ose-kmod/work/VirtualBox-5.2.44/out/freebsd.amd64/release/bin/src/vboxdrv/r0drv/freebsd/memobj-r0drv-freebsd.c:887:80: error: too few arguments to function call, expected 6, have 5
     int krc = vm_map_protect(pVmMap, AddrStart, AddrEnd, ProtectionFlags, FALSE);

   ~~    ^
/usr/src/sys/vm/vm_map.h:517:5: note: 'vm_map_protect' declared here
int vm_map_protect(vm_map_t map, vm_offset_t start, vm_offset_t end,
     ^
1 error generated.
*** [memobj-r0drv-freebsd.o] Error code 1




[REVIEW] new function getlocalbase() - D27236 and D27237

2020-11-16 Thread Stefan Esser

I have created two Phabricator reviews for my proposed implementation
of getlocalbase():

https://reviews.freebsd.org/D27236
https://reviews.freebsd.org/D27237

The first one implements getlocalbase() with semantics quite similar
to getenv("LOCALBASE"), which it will replace in a number of places in
the base system.

This implementation returns a pointer to the first available value of:

1) getenv("LOCALBASE")
2) sysctlbyname("user.localbase", ...)
3) _PATH_LOCALBASE or "/usr/local"
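The lookup order listed above can be sketched in C as follows. This is an illustration only, not the code under review: the helper name resolve_localbase() is hypothetical, and it takes the environment and sysctl values as arguments (on FreeBSD they would come from getenv("LOCALBASE") and sysctlbyname("user.localbase", ...)) so that the precedence can be tried on any system. Treating an empty sysctl string as "not set" is an assumption, based on the kernel returning a zero-length string when no value has been written.

```c
#include <stddef.h>
#include <string.h>

#ifndef _PATH_LOCALBASE
#define _PATH_LOCALBASE "/usr/local"	/* fallback default */
#endif

/*
 * Hypothetical helper mirroring the order described above:
 *   1) the LOCALBASE environment variable,
 *   2) the user.localbase sysctl value,
 *   3) the compiled-in default.
 * An empty sysctl string is treated as "not set" (assumption).
 * Pointers are returned as is, not copied into a buffer.
 */
static const char *
resolve_localbase(const char *env_value, const char *sysctl_value)
{
	if (env_value != NULL)
		return env_value;
	if (sysctl_value != NULL && sysctl_value[0] != '\0')
		return sysctl_value;
	return _PATH_LOCALBASE;
}
```

Like the version described in this mail, the sketch hands back the original pointers, so a value taken from the environment is not protected against later modification of the environment.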

I had considered copying any result of 1) or 3) into the same static
buffer used to retrieve the sysctl result, but for now I have implemented
a version that returns the pointers as is (for getenv(), a pointer into
the environment; for the fall-back strings, a pointer into the area for
read-only strings).

I'd be willing to change this to either:

a) retrieve a value on the first invocation and copy it to the buffer
b) retrieve a new value on each invocation and copy it to the buffer

Most programs will call getlocalbase() at most once and will either
construct the required path to e.g. a config file directory from it,
or they will store a local copy. The const-qualified return type should
prevent accidental overwriting of the data at the returned address (but
it does not really protect, e.g., the environment variable area, just as
when getenv() is called directly).

If returning the pointer into the environment is considered too
dangerous, I'd prefer to implement variant a).

A potential problem exists due to the unlimited length of the string
returned by getenv("LOCALBASE"), i.e. it could cause a path name
longer than PATH_MAX to be created. I do not want to introduce
a potential error return or to silently discard superfluous data
from the returned value, and therefore prefer to return the pointer
into the environment area without guarantees regarding the maximum
length of the string pointed to.


The second review replaces getenv() calls with getlocalbase() in
code that already used getenv(). The code is simplified but has
unchanged behavior if LOCALBASE is defined in the environment.
If it is undefined, then the sysctl value or the hard-coded value
is returned (the only difference being that the sysctl allows the
system-wide value to be changed without recompiling world).

In one program the constant _PATH_LOCALBASE was concatenated to
a relative file path, and in that case the same approach can be
used as in the other two (with snprintf() filling a local buffer).
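A caller that builds a path from the prefix with snprintf() might look like this sketch. The function localbase_path() is hypothetical; it checks the return value of snprintf() so that an overly long LOCALBASE string is reported to the caller instead of being silently truncated, which is one way a program could deal with the path length concern mentioned above without changing getlocalbase() itself.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 1024	/* fallback for strict compilation modes */
#endif

/*
 * Hypothetical helper: write "<localbase>/<relpath>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int
localbase_path(char *buf, size_t buflen, const char *localbase,
    const char *relpath)
{
	int n = snprintf(buf, buflen, "%s/%s", localbase, relpath);

	/* snprintf() returns the length it needed; >= buflen means truncation. */
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return 0;
}
```

A typical call would pass a `char path[PATH_MAX]` buffer and a relative name such as "etc/pkg.conf", and treat a -1 result as a configuration error.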

I have not looked for other programs that should be modified to
use getlocalbase(), but all affected by my recent _PATH_LOCALBASE
commit are candidates ...

I'd appreciate comments in the Phabricator reviews or by mail.

Regards, STefan




Re: pkg.c revision 367687 breaks pkg

2020-11-15 Thread Stefan Esser

On 15.11.20 at 21:34, Kyle Evans wrote:

On Sun, Nov 15, 2020 at 2:05 PM Stefan Esser  wrote:


On 15.11.20 at 20:41, Kyle Evans wrote:

This is a separate (valid) problem, but not directly related to
Scott's work here. sysctlbyname now goes directly to the kernel with
no chance for the user.* sysctls to intercept. That should
independently be fixed to maintain the illusion that they're real
sysctl's.


user.localbase is a real sysctl, but with a default value returned
when sysctl(3) is used.



Yup.


The getlocalbase() function should not depend on this default value,
since it contains an identical default value that can be returned if
sysctlbyname fails (or rather returns a zero length string in case
no other value has been written to the kernel).



I don't care about this particular application, to be honest, but
about the general problem. libc has a sizable chunk of code in
sysctl(3) dealing with user.* sysctls, and sysctlbyname will never see
it. This isn't documented in the manpage, and IMO it's really just an
oversight; libc should still be able to provide the values as seen in
^/lib/libc/gen/sysctl.c whether you've invoked sysctl() or
sysctlbyname(). At a glance, it looks like localbase is the only one
that's also tunable, most of these don't really even need to take a
trip to the kernel to read.


I added user.localbase to -CURRENT a few days ago.

Having it under "user" seemed a logical choice and I have preserved
the semantics of all the existing R/O cases.

The trip through the kernel has the effect that the access conditions
specified in kern_mib.c are checked before the value is provided by
libc.

I consider this a sensible approach, since it consolidates the
access checks / policy in the kernel, independently of detailed checks
in libc.

The values returned by libc are read-only system parameters, and they
could also be passed into the kernel, to be returned from there, but
this would not provide any useful added functionality.

Having sysctlbyname() implement the same logic for accesses to the
user sysctl name-space seems sensible, and if nobody beats me to it,
I'd be willing to provide a patch for review.


Back to getlocalbase() and its supposed semantics:

Is it useful to have it return a NULL prefix (functionally equivalent
to returning "/")?

Having LOCALBASE/etc identical to /etc could lead to unexpected
behavior (e.g. to files being found twice if a program collects data
from both places), but might still be valid?

I'd consider an undefined return value from sysctl() to indicate that
the system default of (e.g.) "/usr/local" should be used, while a value
of "/" maps LOCALBASE files to be found relative to the root directory.
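To make the two cases above concrete, here is a hypothetical C helper (join_localbase() is not an existing API) that treats an unset or empty user.localbase value as "use the compiled-in default" and a value of "/" as "relative to the root directory", so that LOCALBASE/etc becomes /etc rather than //etc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: join the effective LOCALBASE prefix with a
 * relative directory such as "etc".  Returns 0 on success, -1 if the
 * result does not fit into buf.
 */
static int
join_localbase(char *buf, size_t buflen, const char *value,
    const char *fallback, const char *rel)
{
	/* An unset or empty value selects the compiled-in default. */
	const char *prefix =
	    (value == NULL || value[0] == '\0') ? fallback : value;
	size_t plen = strlen(prefix);
	/* Avoid "//etc" when the prefix already ends in '/', e.g. "/". */
	const char *sep = (plen > 0 && prefix[plen - 1] == '/') ? "" : "/";
	int n = snprintf(buf, buflen, "%s%s%s", prefix, sep, rel);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With a value of "/", a program that reads both LOCALBASE/etc and /etc would end up reading the same directory twice, which is the surprising behavior mentioned above.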

Regards, STefan
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: pkg.c revision 367687 breaks pkg

2020-11-15 Thread Stefan Esser

On 15.11.20 at 20:41, Kyle Evans wrote:

This is a separate (valid) problem, but not directly related to
Scott's work here. sysctlbyname now goes directly to the kernel with
no chance for the user.* sysctls to intercept. That should
independently be fixed to maintain the illusion that they're real
sysctl's.


user.localbase is a real sysctl, but with a default value returned
when sysctl(3) is used.

The getlocalbase() function should not depend on this default value,
since it contains an identical default value that can be returned if
sysctlbyname fails (or rather returns a zero length string in case
no other value has been written to the kernel).

Regards, STefan




Literal references to /usr/local in shell scripts

2020-10-26 Thread Stefan Esser

The following shell scripts (or configuration files parsed by a
shell) contain literal references to /usr/local:

libexec/rc/rc.conf                      # many variables
libexec/rc/rc.shutdown                  # PATH component

sys/conf/newvers.sh                     # search for svnversion, git, hg

usr.bin/man/man.sh                      # man_default_path, config_local

usr.sbin/autofs/autofs/include_ldap     # path to ldapsearch
usr.sbin/autofs/autofs/special_media    # path to mount.exfat, ntfs-3g
usr.sbin/bsdconfig/bsdconfig            # BSDCFG_LOCAL_LIBE
usr.sbin/certctl/certctl.sh             # TRUSTPATH, BLACKLISTPATH
usr.sbin/crashinfo/crashinfo.sh         # path to gdb
usr.sbin/periodic/periodic.conf         # local_periodic variable

On systems with non-default LOCALBASE these scripts need to be
adjusted.

In the case of rc.shutdown, for example, shutdown routines will
not be executed for a LOCALBASE other than /usr/local.

The rc.shutdown, autofs/*, certctl.sh, and crashinfo scripts will
be run with root privileges and must not use an untrusted LOCALBASE
value (but could refer to a sysctl variable). The same applies to
the periodic script that relies on the local_periodic variable set
in periodic.conf (but probably overridden in periodic.conf.local,
if required).

rc.conf could use a $LOCALBASE variable instead of literal values
to construct paths to files provided by ports/packages, so that each
value does not need to be modified in the system's /etc/rc.conf
file (which will fail if new variables referring to /usr/local
are introduced in the default configuration).

The list of shell scripts checked excludes those in the contrib, release,
tests, and tools directories, since I think those will, in general, be
used with the default LOCALBASE.


[REVIEW] replace literal uses of /usr/local with a macro [D26942]

2020-10-25 Thread Stefan Esser

I have created

https://reviews.freebsd.org/D26942

as a suggested patch to remove nearly 20 literal uses of /usr/local
in the base system.

This requires adding an include of paths.h to some of the source files
(.c or .h), but none of these includes is leaked to /usr/include and
they are thus only visible during the build.

I have built the world with this patch applied and the resulting
binaries are unchanged.

The definition of _PATH_LOCALBASE in paths.h could at a later time
be derived from the value of LOCALBASE (in src/Makefile.inc1 or
overridden by the user in src.conf), but this is a change that
should be discussed separately from this review.

Please comment on this patch, the decision to not touch contrib
sources and which follow-up steps to perform next (e.g. similar
changes to shell scripts or configuration files).


OpenZFS: L2ARC shrinking over time?

2020-10-12 Thread Stefan Esser
After the switch-over to OpenZFS in -CURRENT I have observed that the 
L2ARC shrinks over time (at a rate of 10 to 20 MB/day).


My system uses a 1 TB NVME SSD partitioned as 64 GB of SWAP (generally
unused) and 256 GB of ZFS cache (L2ARC) to speed up reads from a 3*6 TB
raidz1.

(L2ARC persistence is great, especially on a system that is used for
development and rebooted into the latest -CURRENT about once per week!)


After a reboot, the full cache partition is available, but even in
measurements taken only minutes apart the reported size of the L2ARC
is declining.

The following two values were obtained just 120 seconds apart:

kstat.zfs.misc.arcstats.l2_asize: 273831726080

kstat.zfs.misc.arcstats.l2_asize: 273831644160

[After finishing the text of this mail I have checked the value of
that variable another time - maybe 10 minutes have passed ...

kstat.zfs.misc.arcstats.l2_asize: 273827724288

That corresponds to some 4 MB lost over maybe 10 minutes ...]


I have first noticed this effect with the zfs-stats command updated
to support the OpenZFS sysctl variables (committed to ports a few days
ago).

After 6 days of uptime the output of "uptime; zfs-stats -L" is:


12:31PM  up 6 days, 7 mins, 2 users, load averages: 2.67, 0.73, 0.36


ZFS Subsystem Report                            Mon Oct 12 12:31:57 2020


L2 ARC Summary: (HEALTHY)
        Low Memory Aborts:                      87
        Free on Write:                          5.81k
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0

L2 ARC Size: (Adaptive)                         160.09  GiB
        Decompressed Data Size:                 373.03  GiB
        Compression Factor:                     2.33
        Header Size:                    0.12%   458.14  MiB

L2 ARC Evicts:
        Lock Retries:                           61
        Upon Reading:                           9

L2 ARC Breakdown:                               12.66   m
        Hit Ratio:                      75.69%  9.58    m
        Miss Ratio:                     24.31%  3.08    m
        Feeds:                                  495.76  k

L2 ARC Writes:
        Writes Sent:                    100.00% 48.94   k




After a reboot and with the persistent L2ARC now reported to be
available again (and filled with the expected amount of data):


13:24  up 28 mins, 2 users, load averages: 0,09 0,05 0,01


ZFS Subsystem Report                            Mon Oct 12 13:24:56 2020


L2 ARC Summary: (HEALTHY)
        Low Memory Aborts:                      0
        Free on Write:                          0
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0

L2 ARC Size: (Adaptive)                         255.03  GiB
        Decompressed Data Size:                 633.21  GiB
        Compression Factor:                     2.48
        Header Size:                    0.14%   901.41  MiB

L2 ARC Breakdown:                               9.11k
        Hit Ratio:                      35.44%  3.23k
        Miss Ratio:                     64.56%  5.88k
        Feeds:                                  1.57k

L2 ARC Writes:
        Writes Sent:                    100.00% 205



I do not know whether this is just an accounting effect, or whether the
usable size of the L2ARC is actually shrinking, but since there is data
in the L2ARC after the reboot, I assume it is just an accounting error.

But I think this should still be researched and fixed - there might be
a wrap-around after several weeks of up-time, and if the size value
is not only used for display purposes, this might lead to unexpected
behavior.




Re: geeqie, and neverball build problem on 13-current

2020-09-24 Thread Stefan Esser

On 24.09.20 at 11:24, Niclas Zeising wrote:

On 2020-09-24 11:17, monochrome wrote:
Not sure how long this has been a problem, I noticed with the new 
version of geeqie (geeqie-devel builds fine) and found the neverball 
problem when rebuilding all packages to investigate. neverball output 
changes with consecutive build attempts, geeqie does not.


This is related to the update of LLVM to 11. With this update, builds
use -fno-common by default, which means a global variable cannot be
defined in more than one compilation unit. GCC 10 has the same default.
A quick fix is to add -fcommon to CFLAGS, but the proper fix is to update
the application source to define the variable in only one place.


This was very easy to fix (like most of the ports affected by the
-fno-common issue).

The port is updated (r549911) and packages will appear in due time.
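The usual shape of the proper -fno-common fix described in the quoted explanation is shown in this single-file C sketch. The variable and file names are made up; in a real port the declaration lives in a shared header and the definition in exactly one .c file, and they are combined here only so that the example compiles on its own.

```c
/*
 * Hypothetical names.  Before the fix, a shared header contained
 *     int verbose;
 * (a tentative definition).  With -fcommon, the copies in each object
 * file were merged as a "common" symbol; with -fno-common every .c file
 * that includes the header defines `verbose`, and linking fails with a
 * multiple-definition error.
 */

/* In the shared header (e.g. "app.h"): declare only. */
extern int verbose;

/* In exactly one source file (e.g. "globals.c"): define it. */
int verbose = 0;

/* Any other source file that includes the header uses it as before. */
static void
set_verbose(int level)
{
	verbose = level;
}
```

Adding -fcommon to CFLAGS instead only restores the old merging behavior; the extern declaration plus single definition works with either default.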

Regards, STefan




Re: OpenZFS and L2ARC

2020-09-10 Thread Stefan Esser
On 09.09.20 at 21:26, John Baldwin wrote:

A simple fix might be to use CTLFLAG_SKIP so that you only invoke the
expensive sysctls if you request them by name, but not if you request
the 'kstat.zfs' tree.


I have looked at /sys/contrib/openzfs/module/zfs/dbuf.c where I had
assumed that the "kstat.zfs.misc.dbufs" sysctl node was created, but
did not spot the location on a quick search.

The kstat nodes are created by kstat_install() and AFAICT, there is
currently no parameter that directly allows creating the sysctl node
with CTLFLAG_SKIP.

This long delay affects sysctl -a and I'd really hope that it can be
fixed in a way that suppresses this large debug output unless it is
specifically requested by passing the full node name ...

Regards, STefan




Re: OpenZFS and L2ARC

2020-09-09 Thread Stefan Esser

On 09.09.20 at 08:46, Stefan Esser wrote:

On 09.09.20 at 00:45, Graham Perrin wrote:
Recalling 
<https://lists.freebsd.org/pipermail/freebsd-current/2020-March/075661.html>, 
on 28/03/2020 15:17, Allan Jude wrote:


 >> …
 >>
 >> Basically 'arc' was converted to a subtree.
 >>
 >> We should add some backwards compat sysctls to cover some of
 >> these renames etc so configs and scripts don't break etc.


This is not possible for quite a number of sysctls, since there is
no simple 1:1 mapping for many of them.


And there is an annoyance that I had noticed before but now have
tracked down:

$ time sysctl kstat.zfs.misc.dbufs | wc
    55327 2047031 16333472

real    0m16,446s
user    0m0,055s
sys    0m16,397s

Somebody decided to put a complete list of dbufs under this sysctl
and thus querying "kstat.zfs.misc" takes that long (16 seconds to
generate 16 MB of output on my system), even if only a few other
values in "kstat.zfs.misc" are needed.

I do not know whether there is any chance to get that debug output
moved out of "misc", e.g. into a new "debug" sub-tree. I'm afraid
that on Linux there are scripts that expect it under this name.

If that is not acceptable to the upstream, we should locally modify the
sysctl tree to move that variable out of "misc", IMHO. (While not
taking much time, "kstat.zfs.misc.dbgmsg" should also be relocated to
a "debug" sub-tree, IMHO ...)

zfs-stats needs tens of values from "misc", and if they are not all
added individually to the Kstat array, this delay will dominate the
response time of every zfs-stats invocation.

It is not too hard to add the new variables in zfs-stats and to
adapt the calculations to derive meaningful values to display.

But if it always takes 16 seconds to generate any output, I'm not
likely to use it too often ...


Update: I have created a fork of zfs-stats to work on:

https://github.com/stesser/zfs-stats

The initial changes work around the long delay mentioned above, use the
correct name for the vdev cache size variable, and display the size, the
data contents and the corresponding compression factor of the compressed
L2ARC.

I'll create pull requests to inform the upstream of these changes.
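The compression factor displayed for the compressed L2ARC can be derived from two arcstats counters. The sketch below is an assumption about that calculation, not code taken from the zfs-stats fork: it treats kstat.zfs.misc.arcstats.l2_size as the logical (uncompressed) data size and kstat.zfs.misc.arcstats.l2_asize as the space allocated on the cache device, and the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper.  Assumption: l2_size is the logical
 * (uncompressed) amount of data in the L2ARC and l2_asize the space
 * allocated for it on the cache device.
 */
static double
l2arc_compression_factor(uint64_t l2_size, uint64_t l2_asize)
{
	if (l2_asize == 0)
		return 0.0;		/* empty cache: no meaningful ratio */
	return (double)l2_size / (double)l2_asize;
}

/* Convert a byte count to GiB for display. */
static double
bytes_to_gib(uint64_t bytes)
{
	return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For example, the l2_asize value of 273831726080 bytes from the L2ARC shrinking report corresponds to about 255.03 GiB, and 633.21 GiB of data in 255.03 GiB of allocated space gives a factor of about 2.48, in line with the zfs-stats output shown there.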




Re: OpenZFS and L2ARC

2020-09-08 Thread Stefan Esser

On 09.09.20 at 00:45, Graham Perrin wrote:

On 08/09/2020 08:43, Stefan Esser wrote:
OpenZFS seems to work quite well for me, in general, but I have 
questions regarding the L2ARC statistics.



…


The sysutils/zfs-stats port reports the following values for
this system, BTW:


ZFS Subsystem Report    Tue Sep  8 09:02:46 2020



…



Quite a number of sysctl variable names have changed, and the port
needs to be adapted to the new names (therefore there are lots of 0
values in the -L output).

The following names used by zfs-stats do not exist in OpenZFS:

kstat.zfs.misc.arcstats.recycle_miss
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned
kstat.zfs.misc.arcstats.l2_write_buffer_iter
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter
kstat.zfs.misc.arcstats.l2_write_full
kstat.zfs.misc.arcstats.l2_write_in_l2
kstat.zfs.misc.arcstats.l2_write_io_in_progress
kstat.zfs.misc.arcstats.l2_write_not_cacheable
kstat.zfs.misc.arcstats.l2_write_passed_headroom
kstat.zfs.misc.arcstats.l2_write_pios
kstat.zfs.misc.arcstats.l2_write_spa_mismatch
kstat.zfs.misc.arcstats.l2_write_trylock_fail
kstat.zfs.misc.arcstats.l2_writes_hdr_miss
vfs.zfs.vdev.cache.size

The existence of vfs.zfs.vdev.cache.size vs vfs.zfs.vdev.cache_size
can be used to detect OpenZFS, and is easily fixed.

But the above listed L2ARC values seem to have been removed from or
have never existed in OpenZFS, and I did not find any substitutes.

Are there any plans to re-create them in OpenZFS on FreeBSD or are
they gone for good?


Recalling 
<https://lists.freebsd.org/pipermail/freebsd-current/2020-March/075661.html>, 
on 28/03/2020 15:17, Allan Jude wrote:


 >> …
 >>
 >> Basically 'arc' was converted to a subtree.
 >>
 >> We should add some backwards compat sysctls to cover some of
 >> these renames etc so configs and scripts don't break etc.


This is not possible for quite a number of sysctls, since there is
no simple 1:1 mapping for many of them.


And there is an annoyance that I had noticed before but now have
tracked down:

$ time sysctl kstat.zfs.misc.dbufs | wc
   55327 2047031 16333472

real    0m16,446s
user    0m0,055s
sys     0m16,397s

Somebody decided to put a complete list of dbufs under this sysctl
and thus querying "kstat.zfs.misc" takes that long (16 seconds to
generate 16 MB of output on my system), even if only a few other
values in "kstat.zfs.misc" are needed.

I do not know whether there is any chance to get that debug output
moved out of "misc", e.g. into a new "debug" sub-tree. I'm afraid
that on Linux there are scripts that expect it under this name.

If that is not acceptable to the upstream, we should locally modify the
sysctl tree to move that variable out of "misc", IMHO. (While not
taking much time, "kstat.zfs.misc.dbgmsg" should also be relocated to
a "debug" sub-tree, IMHO ...)

zfs-stats needs tens of values from "misc", and if they are not all
added individually to the Kstat array, this delay will dominate the
response time of every zfs-stats invocation.

It is not too hard to add the new variables in zfs-stats and to
adapt the calculations to derive meaningful values to display.

But if it always takes 16 seconds to generate any output, I'm not
likely to use it too often ...

Regards, STefan




OpenZFS and L2ARC

2020-09-08 Thread Stefan Esser
OpenZFS seems to work quite well for me, in general, but I have 
questions regarding the L2ARC statistics.


The system uses a 3 * 6 TB raidz1 (plus further ZFS volumes that
are not relevant here, since they have no level 2 ARC) and a 1 TB M.2
SSD with a 256 GB partition for the L2ARC (most of the remaining space
on the SSD is currently unused).

The L2ARC seems to have filled to the limit of 256 GB, but after
several reboots, sysctl reports an L2ARC size of nearly twice the
allocated space:

kstat.zfs.misc.arcstats.l2_size: 534620858880

That is 497 GiB, which would be possible with an lz4 compression
factor of about 2, if the value reported is not the space allocated
but the actual (uncompressed) data held by the L2ARC.


The sysutils/zfs-stats port reports the following values for
this system, BTW:


ZFS Subsystem Report                            Tue Sep  8 09:02:46 2020


L2 ARC Summary: (HEALTHY)
        Passed Headroom:                        0
        Tried Lock Failures:                    0
        IO In Progress:                         0
        Low Memory Aborts:                      7
        Free on Write:                          123
        Writes While Full:                      0
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0
        SPA Mismatch:                           0

L2 ARC Size: (Adaptive)                         497.91  GiB
        Header Size:                    0.11%   558.83  MiB

L2 ARC Evicts:
        Lock Retries:                           6
        Upon Reading:                           0

L2 ARC Breakdown:                               5.75m
        Hit Ratio:                      81.94%  4.71m
        Miss Ratio:                     18.06%  1.04m
        Feeds:                                  235.04  k

L2 ARC Buffer:
        Bytes Scanned:                          0       Bytes
        Buffer Iterations:                      0
        List Iterations:                        0
        NULL List Iterations:                   0

L2 ARC Writes:
        Writes Sent:                    100.00% 22.67   k



With the previous FreeBSD ZFS (without persistent L2ARC), I never got
more than a 20% hit ratio on the L2ARC between reboots.

Quite a number of sysctl variable names have changed, and the port
needs to be adapted to the new names (therefore there are lots of 0
values in the -L output).

The following names used by zfs-stats do not exist in OpenZFS:

kstat.zfs.misc.arcstats.recycle_miss
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned
kstat.zfs.misc.arcstats.l2_write_buffer_iter
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter
kstat.zfs.misc.arcstats.l2_write_full
kstat.zfs.misc.arcstats.l2_write_in_l2
kstat.zfs.misc.arcstats.l2_write_io_in_progress
kstat.zfs.misc.arcstats.l2_write_not_cacheable
kstat.zfs.misc.arcstats.l2_write_passed_headroom
kstat.zfs.misc.arcstats.l2_write_pios
kstat.zfs.misc.arcstats.l2_write_spa_mismatch
kstat.zfs.misc.arcstats.l2_write_trylock_fail
kstat.zfs.misc.arcstats.l2_writes_hdr_miss
vfs.zfs.vdev.cache.size

The existence of vfs.zfs.vdev.cache.size vs vfs.zfs.vdev.cache_size
can be used to detect OpenZFS, and is easily fixed.

But the above listed L2ARC values seem to have been removed from or
have never existed in OpenZFS, and I did not find any substitutes.

Are there any plans to re-create them in OpenZFS on FreeBSD or are
they gone for good?

I'd like to update the zfs-stats port for compatibility with OpenZFS ...


Re: /usr/src/usr.bin/gh-bc don't know how to make /usr/src/contrib/bc/locales/en_US.UTF-8.msg

2020-09-02 Thread Stefan Esser

On 02.09.20 at 01:42, Julian H. Stacey wrote:

Hi curr...@freebsd.org,

/usr/src/usr.bin/gh-bc don't know how to make /usr/src/contrib/bc/locales/en_US.UTF-8.msg
With .ctm_status src-cur 14656 .svn_revision 364986 /usr/src/usr.bin/gh-bc


Hi Julian,

since I'm building -CURRENT at least once a day with this bc
and there have been no other reports, this does appear to be
a local problem on your system (or a problem with CTM).


Avoided for now with /etc/src.conf WITHOUT_GH_BC=YES


Yes, but the correct fix is to provide the missing file, which
is a symbolic link to en_US.msg. In fact, most of the message
catalogs are provided by symlinks (71 of 96 files in the locales
directory).

My assumption is that CTM does not correctly encode and create
symbolic links, so they are missing on your system ... (but I did
not check the CTM sources to verify that assumption).

This ought to be fixed in CTM, and then a delta should be created
that provides these missing symlinks. Since you are one of very
few CTM users left, you may want to create a patch ...

Since the mailing list strips binary attachments, I'm including a
compressed and uuencoded TAR file with these symlinks below.
Extract it from within src/contrib/bc/locales to generate them:

begin 644 gh-bc-locales.tar.bz2
M0EIH.3%!6293618`GIIH`'ZI[JAY5#VJAUJ'0S
MB5)<^7/.[OVC><;E,R%I@YC!.4$C!%0&\T"DP#,3(KT4RW8;8ZL<=^PJ'\5#
MMBEQ*ABH=RH=50RH94.NHDD@:R,W/GV>3:MKS5-J$-83WJDS.
M)1%%A['191"3G$0\(S*S-#O-:6214=J+2@HJ,CILM%IM\<((&8``>!/:`2:D
M`UNUOIK>%^6W9OX=W/T^/CPUIGHWXYQ\=I4-M:%0\]0^BH8J'XK\#
MO8SO8QF,X3FF939;&8.-0ZE0RH>JH>;DJ'ON,DNYPY9FOK!SW<\J'94-;]CL
M6M^I%DP@060$<&-XODQ>V8=20"BM(8:NE8B<)(!>+T9O4D`:<5G*7:#R20#-
MX2Y(!CAAV:S,Z0V--71:%+%-XN5>7>I(!F2`60,2JD1@+-54NJ)C#AQJ&HVV
MTWXU#THM];M^IOIIEF\-(+*24PU)`*M+=:<7JRV2`54-.&<===0X94.VH]TG
M:B\D76B^^G&>V7DL@]%87K];RF:UIIFK6M9K@>-1E'6G"M+5>2+3Y3U9GG1<
M65M3M1?DTG"H[?8,9,U)P>M?4J"P(OHDY2!**Z,5&+4D"H>1>,-@`P(<($XD
MY3WP^Y/=\8ASA=*BJX555YBB'.%PF`G,;A1$V!^BP`_)=0Z='LPS,F99VU#^
MR@D$#B+N2*<*$@+`$[F```
`
end

But since further symlinks will occur if more locales are added,
the problem will re-appear for CTM users, unless symlinks are
supported by CTM.

Regards, STefan




Re: bc and dc -e/-f and Copyright

2020-08-02 Thread Stefan Esser
On 31.07.20 at 19:44, Walter von Entferndt wrote:
> Hi, innocent user here.
> On Friday, 31 July 2020, 14:00:00 CEST, Gavin Howard wrote:
>> 3. I could restore the expected behavior of -e and -f and add -E and
>> -F options that do the same thing but do not exit. I think I like this
> +1
> This is the most intuitive & natural choice.

An even better alternative has been suggested and implemented, I'm
waiting for that version to be tagged in the repository and will
then update our base version:

The -e and -f options will in general exit after all input has been
processed, but an additional "-f -" following those options will
allow additional commands to be supplied via stdin (and thus reproduce
the current behavior, if desired).

Regards, STefan


Re: problem building dev/e1000

2019-02-15 Thread Stefan Esser
On 15.02.19 at 21:28, Warner Losh wrote:
> On Fri, Feb 15, 2019 at 12:54 PM Ian Lepore  wrote:
> 
>> On Fri, 2019-02-15 at 12:32 -0700, Warner Losh wrote:
>>> On Fri, Feb 15, 2019 at 12:17 PM Ian Lepore  wrote:
>> I guess the question would be how many things does '...' represent now
>> and in the future?  What it would need to be, given our current
>> inflexible config(8) is
>>
>>  net/iflib.c optional ether pci em | ether pci igb | ...
>>
>> So if ... is 2 or 3 more drivers, that's not so bad.  If iflib is
>> eventually going to be used by dozens of drivers, even the parens would
>> make for a pretty ugly solution.
> 
> Immediately, there's at least half a dozen. Count on there being a dozen or
> two eventually.

I had been thinking about a dependencies file for config,
which either reports missing device/options lines or adds
obvious dependencies.

This could also detect other missing prerequisites, e.g.
if inet or inet6 is to be compiled in, but no link-layer
support (e.g. ether or wlan) is.

The same applies to iflib for drivers that need it, miibus,
CAM, ...

The purpose of such a dependency check would be to give a
clear indication of the missing definitions, especially if
the dependencies have changed, as with iflib becoming
non-standard.

Open points:

- How to generate these dependency files (they could be derived
  from driver sources, e.g. an #include of some header might
  indicate that some "device xyz" is required).

- Loadable kernel modules are an alternative method to provide
  a device driver or protocol (in such cases, the dependency
  check could warn that a driver is not compiled in and will
  be required in the form of an LKM).
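A minimal sketch of what such a check might look like, assuming a
hand-written rule (INET or INET6 needs ether or wlan). The function
names and the rule are hypothetical; nothing like this exists in
config(8) today, and a real implementation would derive its rules
from the dependency files described above.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed config(8) dependency check.
# The single rule below is illustrative only.

# Print the names declared on "device" or "options" lines of a
# kernel config file, one per line (comments stripped first).
kernconf_names() {
	sed -e 's/#.*//' "$1" |
	    awk '$1 == "device" || $1 == "options" { for (i = 2; i <= NF; i++) print $i }'
}

# check_kernconf_deps FILE: warn if a protocol is configured
# without any of the link layers it needs.
check_kernconf_deps() {
	names=$(kernconf_names "$1")
	kernconf_has() { printf '%s\n' "$names" | grep -qx "$1"; }
	if kernconf_has INET || kernconf_has INET6; then
		if ! kernconf_has ether && ! kernconf_has wlan; then
			echo "warning: INET/INET6 configured, but no link layer (ether or wlan)"
		fi
	fi
}
```

Usage would be "check_kernconf_deps /sys/amd64/conf/MYKERNEL"; a
real tool would load many such rules from a table rather than hard
coding them.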


I have had a look at /sys/conf/files and noticed that a
number of lines contain bugs. E.g. there are network drivers
that depend on "inet" and will not be compiled in if only
"inet6" has been selected in the config file.
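Candidates for this kind of bug could be listed with a rough
heuristic like the following sketch (the function name is made up):
it prints lines that mention the word "inet" but not "inet6". It
works line by line, so it ignores continuation lines, and it will
also report code that legitimately depends on IPv4 alone, so every
hit needs manual review.

```sh
# List sys/conf/files entries that are conditional on "inet" but
# do not also mention "inet6". Comment lines are skipped.
inet_only_entries() {
	awk '
	/^[ \t]*#/ { next }	# skip comment lines
	{
		v4 = 0; v6 = 0
		for (i = 2; i <= NF; i++) {
			if ($i == "inet") v4 = 1
			if ($i == "inet6") v6 = 1
		}
		if (v4 && !v6) print
	}' "${1:-/sys/conf/files}"
}
```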

Regards, STefan


Re: ctm(1) deprecation in the FreeBSD base system?

2018-12-23 Thread Stefan Esser
On 23.12.18 at 02:39, Montgomery-Smith, Stephen wrote:
> On 12/21/18 10:03 PM, Julian H. Stacey wrote:
>>> The port Makefile that I have prepared is attached below for reference.
>>> Regards, STefan
>>
>> Thanks Stefan,
>> I took current /usr/ports/misc/ctm/
>> & converted Stephen's & my diffs to be automatic ports patches:
>>  http://berklix.com/~jhs/src/bsd/fixes/freebsd/ports/gen/misc/ctm/files/
>>  http://berklix.com/~jhs/src/bsd/fixes/freebsd/ports/gen/misc/ctm/README.JHS
>>
>> Stephens diffs are essential, without them CTM broke long ago, 
>> (5 digit numeric names maybe ?)
>>
>> I haven't checked all execution as my ctm_rmail scripts run
>> automaticaly on an older release, not my current box, but this is
>> running OK so far:
>>  ctm -q /pub/FreeBSD/development/CTM/svn-cur/svn-cur.07000xEmpty.xz ;
>>  ctm -q /pub/FreeBSD/development/CTM/svn-cur/svn-cur.07[0-9][0-9][0-9].xz
>>
>> Stephen may be best person to test delta builds, as hes the delta originator.
>>
>> Soonish I'll set up a 
>> [freebsd-]ctm-src-12 on http://mailman.berklix.org/mailman/listinfo
>> if Stephens' & my requests to postmaster @ & mailman @ freebsd.org
>> continues to get no response.
> 
> Thank you for doing this, Stefan.  The additional patches mentioned by
> Julian need to be included with the port before it can work and be
> tested.  They are absolutely needed for the svn ctm deltas.

Hi Stephen and Julian,

I have converted the diffs into port patches (make makepatch) and
updated the man-page revision dates of ctm.8 and ctm_rmail.8.

The port update has been committed as r488168.

I hope this brings the port to the level required to make CTM
usable again.

The patches could be imported into the GitHub repo (freebsd/ctm) after
testing, but for now the patched port should suffice ...

Best regards, STefan


Re: ctm(1) deprecation in the FreeBSD base system?

2018-11-11 Thread Stefan Esser
On 23.10.18 at 22:21, Warner Losh wrote:
> On Tue, Oct 23, 2018 at 2:13 PM Rodney W. Grimes wrote:
> At the most/least we should not go very far,
> the only thing that needs done soon is a gonein(13) commited
> to head and MFC'ed to stable/12 by thursday.
> 
> All the other details should wait until a depreication policy
> revision is completed that includes how to deal with this.
> 
> There's no reason at all to wait. We can create the port. We can create the
> github repo. We can move the history there. We won't  be removing it before we
> have a chance to socialize the removal and give people a chance to cut over.
> None of this requires a new policy. Everybody agrees we should do it. We
> shouldn't let some perceived policy get in the way of just moving forward.

I have created a review for the removal of CTM on phabricator:

https://reviews.freebsd.org/D17935

The goal of this review is not to get approval for the removal within a few
days, but to have all relevant changes documented and open for review.

Since the removal affects ObsoleteFiles, there will be some churn if another
entry is added before approval, but that is easy enough to deal with ...

I'd still appreciate comments and a suggestion as to when to perform the
removal.

In the meantime, I could add a gonein(13) now, to draw further attention
to the deprecation of CTM.

Regards, STefan


Re: ctm(1) deprecation in the FreeBSD base system?

2018-10-23 Thread Stefan Esser
On 23.10.18 at 19:06, Warner Losh wrote:
> 
> 
> On Tue, Oct 23, 2018 at 10:44 AM Stefan Esser  <mailto:s...@freebsd.org>> wrote:
> 
> On 23.10.18 at 17:27, Montgomery-Smith, Stephen wrote:
> > I have no problem turning ctm into a port.  But I would appreciate
> > advice on whether there is a standard or easy process for converting
> > software from the FreeBSD base to a port.  If not, I can muddle my way
> > through it.  But give me some time (a few months) to get it done,
> > because the rest of my life is making heavy demands on me right now.
> 
> Hi Stephen,
> 
> I could spend a few hours to perform the conversion to a port and to
> test it. I was a happy CTM user many years ago, and I can understand
> that it still may be useful in special situations.
> 
> The source archive will need to be hosted somewhere. Do you have a
> preference (e.g., on a FreeBSD server, or on Github, Gitlab, ...)?
> 
> 
> It's trivial to setup a new repository on github.com/freebsd/ctm
> <http://github.com/freebsd/ctm> for this purpose. With the right magic, we
> could even retain the commit history.


I have a complete port (in the sense that it builds, installs and packages),
but there are a few details that should be fixed on this occasion:

1) The man-pages install in man1 for binaries in sbin (--> change to man8)
2) The Makefiles use LIBADD (--> change to use LDADD)
3) The README file contains a reference to CVSUP (--> clean up)

I do not have write access to freebsd on GitHub, and I'd appreciate it if
the files from src/usr.sbin/ctm could be moved there (preserving the history
would of course be preferable, but I'm not sure that it is of much use).

I could then push my local changes (required to make the port build) to
the GitHub repo (or add a few small patch files to the port).

The port Makefile that I have prepared is attached below for reference.

Regards, STefan

-
# $FreeBSD$

PORTNAME=	ctm
PORTVERSION=	2.0
CATEGORIES=	ports-mgmt

MAINTAINER=	s...@freebsd.org
COMMENT=	Create, receive, and apply FreeBSD source updates per mail

LICENSE=	Beerware
LICENSE_NAME=	Beerware
LICENSE_TEXT=	"THE BEER-WARE LICENSE" (Revision 42): \
		 wrote this file.  As long as you retain this notice you \
		can do whatever you want with this stuff. If we meet some day, and you think \
		this stuff is worth it, you can buy me a beer in return.  Poul-Henning Kamp
LICENSE_PERMS=	dist-mirror dist-sell pkg-mirror pkg-sell auto-accept

USES=		tar:txz

#USE_GITHUB=	yes
#GH_ACCOUNT=	freebsd

do-install:
.for f in ctm ctm_dequeue ctm_rmail ctm_smail
	${INSTALL_PROGRAM} ${WRKSRC}/${f}/${f} \
	    ${STAGEDIR}${PREFIX}/sbin
.endfor
# XXX: the programs live in sbin, so these pages should go to man8.
.for f in ctm ctm_rmail
	${INSTALL_MAN} ${WRKSRC}/${f}/${f}.1 \
	    ${STAGEDIR}${MAN1PREFIX}/man/man1
.endfor
# ctm_dequeue and ctm_smail are documented in ctm_rmail.1 (should be man8, too).
.for f in ctm_dequeue ctm_smail
	${INSTALL_MAN} ${WRKSRC}/ctm_rmail/ctm_rmail.1 \
	    ${STAGEDIR}${MAN1PREFIX}/man/man1/${f}.1
.endfor

.include <bsd.port.mk>


Re: ctm(1) deprecation in the FreeBSD base system?

2018-10-23 Thread Stefan Esser
On 23.10.18 at 17:27, Montgomery-Smith, Stephen wrote:
> I have no problem turning ctm into a port.  But I would appreciate
> advice on whether there is a standard or easy process for converting
> software from the FreeBSD base to a port.  If not, I can muddle my way
> through it.  But give me some time (a few months) to get it done,
> because the rest of my life is making heavy demands on me right now.

Hi Stephen,

I could spend a few hours to perform the conversion to a port and to
test it. I was a happy CTM user many years ago, and I can understand
that it still may be useful in special situations.

The source archive will need to be hosted somewhere. Do you have a
preference (e.g., on a FreeBSD server, or on Github, Gitlab, ...)?

Regards, STefan


Re: careless commits disrupt

2018-10-23 Thread Stefan Esser
On 22.10.18 at 23:09, Julian H. Stacey wrote:
> Hi, Reference:
>> From:        Stefan Esser 
>> Date:Fri, 12 Oct 2018 11:44:59 +0200
> 
> Stefan Esser wrote:
>> I might have mentioned, that I always preserve old shared libraries in
>> /usr/lib/compat before running "make delete-old-libs". 
> 
> Good idea, are you doing that manually, or do you have a patch to share ?

I do it manually, but I have a script that checks for programs in the
base system and in packages that use any library moved to the lib/compat
directory (see below).

The base system checks should never produce any output (unless there are
old, obsolete binaries).

I use the package list to identify packages that must be rebuilt to
make the compat libraries obsolete. Since there may be shared library
version conflicts, if some

Regards, STefan



#!/bin/sh

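find_compat_depend () {
	local dir="$1"
	local pattern="$2"

	# Print each regular file whose ldd output mentions a library in a
	# .../compat/ directory; the file name is passed to the inner shell
	# as "$1" instead of being pasted into the command string.
	find "$dir" -type f ${pattern:+-name "$pattern"} \
	    -exec sh -c 'ldd "$1" 2>/dev/null | grep -q /compat/ && echo "$1"' sh {} \;
}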

echo "Base system programs referencing compat libraries:"
find_compat_depend /bin ""
find_compat_depend /sbin ""
find_compat_depend /libexec ""
#find_compat_depend /lib "lib*.so.*"
find_compat_depend /usr/bin ""
find_compat_depend /usr/sbin ""
find_compat_depend /usr/libexec ""
#find_compat_depend /usr/lib "lib*.so.*"

echo
echo "Installed packages referencing compat libraries:"
{
find_compat_depend /usr/local/bin ""
find_compat_depend /usr/local/sbin ""
find_compat_depend /usr/local/libexec ""
find_compat_depend /usr/local/lib "lib*.so.*"
} | xargs pkg which -q | sort -u


Also affects "pkg" (was: Re: OpenSSL 1.1.1 libssl.so version number)

2018-10-13 Thread Stefan Esser
On 13.10.18 at 01:56, Don Lewis wrote:
> Prior to the OpenSSL 1.1.1 import, the base OpenSSL library was
> /usr/lib/libssl.so.8.  The security/openssl port (1.0.2p) installed
> ${LOCALBASE}/lib/ilbssl.so.9 and the security/openssl-devel port
> (1.1.0i) installed ${LOCALBASE}/lib/libssl.so.11.  After the import, the
> base OpenSSL library is /usr/lib/libssl.so.9.  Now if you build ports
> with DEFAULT_VERSIONS+=ssl=openssl, the library that actually gets used
> is ambiguous because there are now two different versions of libssl.so
> (1.0.2p and 1.1.1) with the same shared library version number.
> 
> I stumbled across this when debugging a virtualbox-ose configure
> failure.  The test executable was linked to the ports version of
> libssl.so but rtld chose the base libssl.so at run time.

I'm seeing something possibly related in pkg:

$ ldd /usr/local/lib/libpkg.so.4 | grep ssl
libssl.so.9 => /usr/lib/libssl.so.9 (0x800679000)

This results in:

$ pkg -v
ld-elf.so.1: /usr/local/lib/libcrypto.so.9: version OPENSSL_1_1_0 required by /usr/local/lib/libpkg.so.4 not defined

My work-around was to copy pkg-static over pkg (I have not checked
whether the static version is linked against the system or the ports
version of the library, but I assume the latter).

I have "DEFAULT_VERSIONS+= ssl=openssl" in make.conf, but the same
problem exists if pkg is built without that setting.

My local version of portmaster has been changed to use pkg-static
instead of pkg, and I plan to commit that change to the portmaster
port to make it resilient against this problem.
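For other scripts, the same idea might be sketched as follows. This
is not the actual portmaster change; the function name pkg_cmd is
made up. It prefers the statically linked pkg-static, which cannot
be broken by a shared library mismatch like the libssl/libcrypto one
above, and falls back to pkg.

```sh
# Print the path of the pkg binary to use: pkg-static if it is in
# PATH, else pkg (nothing is printed if neither is found).
pkg_cmd() {
	if command -v pkg-static >/dev/null 2>&1; then
		command -v pkg-static
	else
		command -v pkg
	fi
}
```

A script would then run e.g. PKG=$(pkg_cmd) followed by "$PKG" info.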

Regards, STefan


Re: careless commits disrupt

2018-10-12 Thread Stefan Esser
On 12.10.18 at 07:39, Dag-Erling Smørgrav wrote:
> Julian H. Stacey  writes:
>> Stefan Esser  writes:
>>> You should also delete old files:
>>>
>>> cd /usr/src
>>> make delete-old
>>> make delete-old-libs
>> I just ran that. It deleted lots of stuff. & I'd only run it 2 days ago.
>> I should have run it just before buildworld though.
>> It's not suggested in the top of 
>>   https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html
>> just at base of page.
> 
> That's because you should *never* run delete-old or delete-old-libs from
> a source tree that is newer than your installed system.  It may delete
> files which have been obsoleted by changes you haven't yet built and
> installed, to the point where you may be unable to build and install
> those changes.  In this particular case, it will, at the very least,
> break ssh and svn / svnlite.

Yes, sorry, running make delete-old-libs before buildworld is not a good
idea, unless the old libraries have been copied to /usr/lib/compat beforehand.


The advice to run "make delete-old-libs" came from the following message
from Glen Barber:

https://lists.freebsd.org/pipermail/freebsd-current/2018-October/071581.html

But that advice was not to delete old files before make buildworld, but only
before starting the required port upgrades ...


I might have mentioned that I always preserve old shared libraries in
/usr/lib/compat before running "make delete-old-libs". This allows old
binaries to run, but prevents new binaries from being linked against these
libraries (this should not matter for make buildworld, but it does for
building ports, which I do in the same script that invokes buildworld, for
critical kernel modules that are to be built from ports).

No binary or library should reference a library whose path contains
/compat/ after all upgrades have been performed, obviously ...
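The preservation step could be sketched like this, assuming the
caller supplies the list of libraries about to be removed, one path
per line on stdin (e.g. a hand-made list). The function name and the
input format are assumptions, not an existing tool.

```sh
# Copy each library named on stdin into a compat directory
# (default /usr/lib/compat) before "make delete-old-libs" runs.
preserve_old_libs() {
	compatdir=${1:-/usr/lib/compat}
	mkdir -p "$compatdir" || return 1
	while IFS= read -r lib; do
		[ -f "$lib" ] || continue	# skip missing files
		cp -p "$lib" "$compatdir/" && echo "preserved $lib"
	done
}
```

After copying, the run-time linker hints probably need to be
refreshed (e.g. with "service ldconfig start") before old binaries
find the libraries there.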

Regards, STefan


Re: careless commits disrupt

2018-10-11 Thread Stefan Esser
On 11.10.18 at 18:20, Julian H. Stacey wrote:
>> On Thu, Oct 11, 2018 at 05:54:08PM +0200, Julian H. Stacey wrote:
>>> 
>>> 3_sxnet.pico v3_tlsf.pico v3_utl.pico v3err.pico |  tsort -q`  -lpthread
>>> /usr/bin/ld: error: unable to find library -lpthread
>>> cc: error: linker command failed with exit code 1 (use -v to see invocation)
>>> *** Error code 1
>>>
>>> Stop.
>>> make[4]: stopped in /usr/src/secure/lib/libcrypto
>>>
>>> Yes I'm current:
>>> .ctm_status src-cur 13733
>>> .svn_revision 339303
>>> 
>>
>> I had no issues this morning performing a src-based update from
>> head/amd64 @r339278 to r339303 -- either on my laptop or my build
>> machine.
> 
> Hi David, 
> Then it seems this one is hopefuly my local problem then,
> so I'll revert to vanilla r339303 & retry. Thanks.

It worked for me when I started with a clean /usr/obj.

You should also delete old files:

cd /usr/src
make delete-old
make delete-old-libs

Regards, STefan


Re: options COMPAT_AOUT to file UPDATING to know

2018-07-14 Thread Stefan Esser
On 13.07.18 at 19:56, joun...@yahoo.co.uk wrote:
> 
> === Reason:
> 
> In compiling the kernel again after a long time after 'pkg upgrade' the
> following errors. The Intel graphics card is in use and something had changed,
> the 'startx' did not start the XFCE session. This was the reason to compile
> the kernel again with the new sources of today. After two retries taking some
> time to complite, it would be helpful to ...
> 
> === Symptom:
> 
> --- kernel.full ---
> linking kernel.full
> ld.lld: error: undefined symbol: aout_sysvec
> >>> referenced by imgact_gzip.c:240 (/usr/src/sys/kern/imgact_gzip.c:240)
> >>>               imgact_gzip.o:(Flush)
> 
> === Resolution:
> 
> Adding
> 
> options COMPAT_AOUT
> 
> to the kernel configuration file.
> 
> This added the necessary 'imgact_aout.o' to the linking and the 'aout_sysvec'
> was found.

It seems you have "device gzip" in your kernel configuration?

This is a long-obsolete (15 years?) option, which let you compress your
a.out binaries with gzip and execute them as if they were uncompressed. In
that case, the binaries were not paged in as normal, but loaded as one blob.

This was a useful feature when laptops had slow 200 MB hard disks, since the
space saving was substantial.

ELF binaries could never be compressed and executed that way, and the option
is not present in any kernel configuration in the FreeBSD sources. It is only
mentioned in NOTES, where a clear remark states that this option requires
COMPAT_AOUT and is only useful for a.out binaries.

Regards, STefan


Re: Is kern.sched.preempt_thresh=0 a sensible default?

2018-06-09 Thread Stefan Esser
On 07.06.18 at 19:14, Andriy Gapon wrote:
> On 03/05/2018 12:41, Andriy Gapon wrote:
>> I think that we need preemption policies that might not be expressible as 
>> one or
>> two numbers.  A policy could be something like this:
>> - interrupt threads can preempt only threads from "lower" classes: real-time,
>> kernel, timeshare, idle;
>> - interrupt threads cannot preempt other interrupt threads
>> - real-time threads can preempt other real-time threads and threads from 
>> "lower"
>> classes: kernel, timeshare, idle
>> - kernel threads can preempt only threads from lower classes: timeshare, idle
>> - interactive timeshare threads can only preempt batch and idle threads
>> - batch threads can only preempt idle threads
> 
> Here is a sketch of the idea: https://reviews.freebsd.org/D15693

Hi Andriy,

I highly appreciate your effort to improve the scheduling in SCHED_ULE.

But I'm afraid that your scheme will not fix the problem. As you may
know, there are a number of problems with SCHED_ULE, which lead quite a
number of users to prefer SCHED_4BSD even on multi-core systems.

The problems I'm aware of:

1) On UP systems, I/O intensive applications may be starved by compute
   intensive processes that are allowed to consume their full quantum of
   time (limiting reads to some 10 per second in the worst case).

2) Similarly, on SMP systems with a load higher than the number of cores
   (virtual cores in case of HT), compute-bound processes can slow down
   a cp of a large file from 100s of MB/s to 100s of KB/s, under certain
   circumstances.

3) Programs that evenly split the load across all available cores have
   been suffering from sub-optimal assignment of threads to cores. E.g. on
   a CPU with 8 (virtual) cores, this resulted in 6 cores running the load
   in nominal time, 1 core taking twice as long because 2 threads were
   scheduled to run on it, and 1 core being mostly idle. Even if the
   load was initially evenly distributed, a woken-up process that ran on
   one core destroyed the symmetry, and it was not recovered. (This was a
   problem e.g. for parallel programs using MPI or the like.)

4) The real-time behavior of SCHED_ULE is weak due to interactive
   processes (e.g. the X server) being put into the "time-share" class
   and then suffering from the problems described as 1) or 2) above.
   (You distinguish time-share and batch processes, which both are
   allowed to consume their full quanta even if a higher priority
   process in their class becomes runnable. I think this will not
   give the required responsiveness e.g. for an X server.)
   They should be considered I/O intensive if they often don't use
   their full quantum, without taking into account the significant
   amount of CPU time they may use at times. (I.e. the criterion for
   time-sharing should not be the CPU time consumed, but rather the
   fraction of quanta not fully used because the CPU was given up
   voluntarily.) With many real-time threads it may be hard to identify
   interactive threads, since they are involuntarily preempted too
   often; this must be considered when sampling voluntary vs.
   involuntary context switches.

5) The NICE parameter has hardly any effect on the scheduling. Processes
   started with nice 19 get nearly the same share of the CPU as processes
   at nice 0, while traditionally they would only run when a core was
   otherwise idle. Nice values between 0 and 19 have even less effect
   (hardly any).
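To illustrate the criterion proposed in 4), a thread could be
classified by the share of its context switches that were voluntary
(it gave up the CPU before its quantum expired). This is only a
sketch of the idea, not what SCHED_ULE does; the function name and
the 25% threshold are arbitrary assumptions.

```sh
# classify_thread VOLUNTARY INVOLUNTARY [THRESHOLD_PERCENT]
# Prints "interactive" if at least THRESHOLD percent of the context
# switches were voluntary, "timeshare" otherwise, "unknown" if none.
classify_thread() {
	vol=$1 invol=$2 threshold=${3:-25}
	total=$((vol + invol))
	if [ "$total" -eq 0 ]; then
		echo unknown
	elif [ $((100 * vol / total)) -ge "$threshold" ]; then
		echo interactive
	else
		echo timeshare
	fi
}
```

On FreeBSD, ps(1) reports per-process totals with the nvcsw and
nivcsw keywords, e.g. "ps -ax -o pid=,nvcsw=,nivcsw=,comm=", which
could feed this function; a scheduler would of course have to sample
recent intervals rather than lifetime totals.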

I have not had time to try the patch in that review, but I think that
the cause of scheduling problems is not localized in that function.

And a solution should be based on typical use cases or sample scenarios
being applied to a scheduling policy. There are some easy cases (e.g. a
"random" load of independent processes like a parallel make run), where
only cache effects are relevant (try to keep a thread on its CPU as long
as possible and, if interrupted, continue it on that CPU if you can assume
there is still significant cached state).

There have been extensive KTR traces that showed the scheduler behavior
under specific loads, especially MPI, and there have been attempts to
fix the uneven distribution of processes for that case (but AFAIR not
with good success).

Your patches may be part of the solution, with at least 3 other parts
remaining:

1) The classification of interactive and time-share should be separate.
   Interactive means that the process does not use its full quantum in
   a non-negligible fraction of cases. The X server or a DBMS server
   should not be considered compute intensive, or request rates will
   be as low as 10 per second (if the time-share quantum is on the
   order of 100 ms).

2) The scheduling should guarantee symmetric distribution of the load
   for scenarios such as parallel programs with MPI. Since OpenMP and
   other mechanisms have similar requirements, this will become more
   relevant over time.

3) The nice-ness of a process should be relevant, to g

Strange problem with tar in -CURRENT (VM problem?)

2018-05-12 Thread Stefan Esser
While searching for the reason why an upgrade of math/atlas failed on my
amd64 -CURRENT system, I found that tar fails to create an archive of some
10 KB.

It is killed (-9) after some 30 seconds, during which it grows seemingly
without bounds.

The port processes some TAR files in order to fix up paths in them with
the following shell loop (edited for readability):

cd ${WRKDIR}/ATLAS/CONFIG/ARCHS
for t in *.tgz ; do
	/bin/mv ${t} ${t}.bak
	/usr/bin/tar -s '/gcc/gcc6/' -xf ${t}.bak
	/usr/bin/tar -czf ${t} ${t%.tgz}	# (***)
	/bin/rm -f -r ${t%.tgz} ${t}.bak
done

The command that fails is the one marked (***). I have tried to trace it
with ktrace and truss, but only see that a large amount of memory is mapped
and the tar process is killed without having produced any output. I have
added "-v" to watch progress, and the log also indicates that tar does not
even start to write to the archive. Removing the "z" option makes no
difference.
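To reproduce the failing step in isolation, one archive could be
re-created the way the loop does it, but with the virtual memory of
a subshell capped, so that a runaway tar fails quickly instead of
growing to some 100 GB. This is a sketch: the function name and the
1 GB limit are assumptions, and the bsdtar-specific "-s" path
substitution is left out to keep it portable.

```sh
# repack_tgz ARCHIVE.tgz: extract and re-create the archive in the
# current directory, with "ulimit -v" (in KB) guarding the tar -c.
# On failure, ARCHIVE.tgz.bak and the extracted tree are kept.
repack_tgz() {
	t=$1
	dir=${t%.tgz}
	mv "$t" "$t.bak" || return 1
	tar -xzf "$t.bak" || return 1
	( ulimit -v 1048576 && tar -czf "$t" "$dir" )
	status=$?
	if [ "$status" -eq 0 ]; then
		rm -rf "$dir" "$t.bak"
	fi
	return $status
}
```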

Typical "ps l" output is:

UID  PID PPID CPU PRI NI   VSZ  RSS MWCHAN STAT TT TIME COMMAND
  0 2269 2254   0  30  0 105946804 21244044 pfault D 0  0:31,48 /usr/bin/tar -czf Core232SSE3.tgz Core232SSE3 (bsdtar)

VSZ is 105946804 KB or about 100 GB, RSS 21 GB when tar is killed ...

The files to be processed are:

-rw-r--r--  1 root  wheel  11399 May 12 16:40 AMD64K10h32SSE3.tgz
-rw-r--r--  1 root  wheel  11697 May 12 16:40 AMD64K10h64SSE3.tgz
-rw-r--r--  1 root  wheel   1305 May 12 16:40 BOZOL1.tgz
-rw-r--r--  1 root  wheel   9909 May 12 16:40 Core232SSE3.tgz
drwxr-xr-x  5 root  wheel  9 Feb 25  2009 Core264SSE3/
-rw-r--r--  1 root  wheel  0 May 12 16:40 Core264SSE3.tgz
-rw-r--r--  1 root  wheel  10212 May 14  2011 Core264SSE3.tgz.bak
-rw-r--r--  1 root  wheel   8544 May 14  2011 Corei164SSE3.tgz
[...]

The failure may be caused by a race condition, since tar sometimes fails on
a later file (e.g. Corei164SSE3.tgz).

If I replace "${TAR} -czf" with "gtar -czf", then the port can be built.

But I do not think that this is a problem in BSDTAR: the failure can be
reproduced (also after a buildworld/buildkernel and reboot), yet there
have been no changes to BSDTAR since the libarchive upgrade in January.

I guess this is a VM problem that happens to show itself in this specific
program invocation. (The system runs without other obvious problems, and
tar works outside this specific usage in the port ...)

Any ideas?

Best regards, STefan


Re: grep extremely slow for LC_CTYPE=C? [SOLVED]

2018-05-03 Thread Stefan Esser
On 03.05.18 at 17:28, Kyle Evans wrote:
> On Thu, May 3, 2018 at 10:19 AM, Stefan Esser  wrote:
>> On 03.05.18 at 16:41, Kyle Evans wrote:
>>> Hmm... what does `grep -V` look like, just to confirm?
>>
>> Ah, yes, good point ...
>>
>> $ which grep
>> /usr/bin/grep
>>
>> $ grep -V
>> grep (GNU grep) 2.5.1-FreeBSD
>>
>> Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> So, it seems I have to complain somewhere else about this behavior ...
> 
> Eh, no worries there. Newer GNU grep sucks less, and we're going to
> replace it Real Soon Now (TM).

Thank you very much - your reply was really helpful!

I just tested with GNU grep 2.27 (the current port version), and it does not
show the extreme slowness of the old version in FreeBSD, but it is still more
than 10 times slower than BSD grep on my test data.

>> But I have (for a long time) in my /etc/src.conf:
>>
>> WITH_BSDGREP=yes
>> WITH_BSD_GREP_FASTMATCH= yes
>> WITHOUT_GNU_GREP_COMPAT= yes
>>
>> And before seeing the grep -V output, I was convinced that I had been using
>> BSD grep (i.e. that it replaced GNU grep with above options) by default ...
>>
>> But now I see that I need to invoke bsdgrep under that name. It is very fast,
>> but does not give the expected (correct?) result, which is the single line
>> that is not suppressed by the pattern match ...
> 
> This is actually because you've typo'd WITH_BSD_GREP. =) WITH_BSD_GREP
> will replace /usr/bin/grep with bsdgrep and put GNU grep at
> /usr/bin/gnugrep.

Yes, that was what I had expected, and I had correctly spelled WITH_BSD_PATCH,
but never bothered to check that I got the "grep" I wanted ...
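A typo like WITH_BSDGREP is silently ignored. A small helper could
flag knobs that are not in a list of known option names; the helper
is hypothetical (not an existing FreeBSD tool), and the list passed
to it is up to the caller, e.g. taken from src.conf(5).

```sh
# unknown_knobs FILE KNOWN_NAME...: print every WITH_*/WITHOUT_* knob
# set in FILE whose name (prefix stripped) is not among KNOWN_NAME.
unknown_knobs() {
	conf=$1; shift
	sed -n 's/^\(WITH[A-Z0-9_]*\)[[:space:]]*=.*/\1/p' "$conf" |
	while read -r knob; do
		name=${knob#WITHOUT_}
		name=${name#WITH_}
		case " $* " in
		*" $name "*) ;;
		*) echo "$knob" ;;
		esac
	done
}
```

Alternatively, "make -C /usr/src showconfig" prints the resulting
MK_* values, which should have shown that MK_BSD_GREP was not set.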

> I also recommend using WITHOUT_BSD_GREP_FASTMATCH / not using
> WITH_BSD_GREP_FASTMATCH. See below response.

It is so much faster than GNU grep on this use-case anyway ;-)

$ sh grep-test.sh
All/mpfr-3.1.7.tgz
0.14 real 0.13 user 0.00 sys
All/mpfr-3.1.7.tgz
0.13 real 0.13 user 0.00 sys

This is a factor of 30 to 40 better than with our GNU grep (for the UTF-8
case, where it finishes in finite time; orders of magnitude faster for
LANG=C ;-) ).

And yes, FASTMATCH was responsible for the erroneous result in my previous
tests with BSD grep. Now that I have rebuilt it without that option, it works
perfectly for me :)

> BSD_GREP_FASTMATCH is best left off (default on HEAD)- it was disabled
> because the version of tre ("fastmatch") that bsdgrep uses is buggy
> and I don't want to invest the time to fix it. The performance of the
> version we use isn't any better than our libc regex(3), so I made the
> decision to switch it to that and focus efforts on optimizing our
> general regex implementation instead.

A decision I can well understand and sympathize with.

How about removing the BSD_GREP_FASTMATCH option, then?

> I have plans to replace our libc regex(3) with Onigmo [1], which is at
> least twice as fast as what we have and comes with all kinds of other
> extensions- GNU extensions will be exposed via libregex, and I also
> plan to install Onigmo on its own so that others can use that with its
> own interface. The difference between it and libregex will be that
> libregex exposes a regex(3) interface for using extensions with an
> option to go REG_POSIX.
> 
> [1] https://github.com/k-takata/Onigmo

Great plan! But for now BSD grep seems well up to the task, and my only
remaining problem is that I need to support stable releases that use (and
will stay with) the old GNU grep, so I'll need to keep the work-around (or
perhaps depend on the port version?).
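Such a work-around might look like the following sketch (the name
fast_grep is made up): use bsdgrep where it is installed, otherwise
run grep with a UTF-8 LC_CTYPE, which made the old GNU grep finish in
reasonable time. The locale name en_US.UTF-8 is an assumption, and a
set LC_ALL would override LC_CTYPE.

```sh
# Run bsdgrep if available, else grep in a UTF-8 character locale.
fast_grep() {
	if command -v bsdgrep >/dev/null 2>&1; then
		bsdgrep "$@"
	else
		LC_CTYPE=en_US.UTF-8 grep "$@"
	fi
}
```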

Thanks again!

Best regards, STefan


Re: grep extremely slow for LC_CTYPE=C?

2018-05-03 Thread Stefan Esser
On 03.05.18 at 16:41, Kyle Evans wrote:

Hi Kyle,

thank you for the fast reply. You were right to request grep -V output,
but see below ... ;-)

> On Thu, May 3, 2018 at 9:08 AM, Stefan Esser  wrote:
>> The first "grep" needs 3.5 seconds to finish on my system, but the second
>> one (with LC_CTYPE=C or no locale set at all) runs for minutes (I did not
>> bother to check whether it finishes at all).
>>
>> Is this a bug in grep?
>>
>> Maybe there is something odd in the data file (loading the pattern is not
>> slower with LC_CTYPE=C, it takes 0.8 seconds on my system), but this is a
>> problem that was observed with "real" data, not a specifically constructed
>> worst case.
>>
>> Any ideas what's causing this behavior?
>>
>> I'm currently setting the UTF-8 locale as in the first invocation above
>> to make grep run in reasonable time, but I'd expect it to be faster in
>> the C locale ...
>>
>> Regards, STefan
> 
> Hmm... what does `grep -V` look like, just to confirm?

Ah, yes, good point ...

$ which grep
/usr/bin/grep

$ grep -V
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So, it seems I have to complain somewhere else about this behavior ...

But I have (for a long time) in my /etc/src.conf:

WITH_BSDGREP=yes
WITH_BSD_GREP_FASTMATCH= yes
WITHOUT_GNU_GREP_COMPAT= yes

And before seeing the grep -V output, I was convinced that I had been using
BSD grep (i.e. that it replaced GNU grep with above options) by default ...

But now I see that I need to invoke bsdgrep under that name. It is very fast,
but does not give the expected (correct?) result, which is the single line
that is not suppressed by the pattern match ...

> These are the results on my local system:
> 
> root@viper:/tmp/grep# ./grep-test.sh
> All/mpfr-3.1.7.tgz
> 0.10 real 0.10 user 0.00 sys
> All/mpfr-3.1.7.tgz
> 0.09 real 0.08 user 0.00 sys
> 
> But I don't immediately recall if I have local modifications in
> regex(3)/bsdgrep that might have affected this. =(

Yes, that's the correct result and extremely fast!

But on my system (with only "bsdgrep" substituted for "grep") I get

$ sh bsdgrep-test.sh | wc
0.15 real 0.14 user 0.00 sys
0.15 real 0.15 user 0.00 sys
    3362    3362   94700

I.e. only about 1/3 of the lines are suppressed by the pattern, while all
but 1 line should be ...
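To check which lines `grep -v -f grep-test-pattern grep-test-data` ought to print, independently of GNU grep, bsdgrep or their fast-match code, a plain regex(3) loop can serve as a reference. This is a minimal sketch, assuming one basic regular expression per pattern-file line (grep's default syntax); the helper names are hypothetical and reading the two files is left out.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile each pattern as a POSIX basic RE. */
static int
compile_patterns(regex_t *out, const char *const *src, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* REG_NOSUB: only match/no-match is needed, not offsets */
		if (regcomp(&out[i], src[i], REG_NOSUB) != 0) {
			while (i > 0)		/* undo what was compiled so far */
				regfree(&out[--i]);
			return -1;
		}
	}
	return 0;
}

static void
free_patterns(regex_t *pats, size_t n)
{
	for (size_t i = 0; i < n; i++)
		regfree(&pats[i]);
}

/* 1 if "grep -v" would print `line`, i.e. no pattern matches it. */
static int
line_selected_by_v(const char *line, const regex_t *pats, size_t n)
{
	assert(line != NULL);
	for (size_t i = 0; i < n; i++)
		if (regexec(&pats[i], line, 0, NULL, 0) == 0)
			return 0;		/* matched: line is suppressed */
	return 1;				/* unmatched: line is printed */
}
```

A line is printed only if no pattern matches it, so comparing this loop's output with bsdgrep's would show whether the 3362 surviving lines point at the matcher or are really unmatched.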

Or is one of the build options that I used unsafe?

Best regards, STefan
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


grep extremely slow for LC_CTYPE=C?

2018-05-03 Thread Stefan Esser
Hi all,

while working on a new portmaster version, I found that bsdgrep is much
faster in a UTF-8 locale than in the C locale, much to my surprise.

I have uploaded a small shell-script with test data that can be fetched
from:

https://people.freebsd.org/~se/grep-test.txz

The script uses "grep -v -f patternfile datafile" to select the lines of
datafile that are not matched by any pattern in patternfile:

#---
#!/bin/sh

LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8

export LANG LC_CTYPE

time grep -v -f grep-test-pattern grep-test-data

LANG=C
LC_CTYPE=C
#unset LANG LC_CTYPE # is an alternative leading to the same result ...

time grep -v -f grep-test-pattern grep-test-data
#---
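Unsetting LANG and LC_CTYPE, as in the commented-out line, typically has the same effect as setting them to C, because a program that calls setlocale(3) with an empty string falls back to the POSIX "C" locale when no locale variables are set. The following is a minimal sketch of what grep would see at start-up, assuming it makes such a setlocale() call; `ctype_max_bytes()` is a hypothetical helper, and the sketch says nothing about why the C locale is slower here.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper: select LC_CTYPE `name` ("" = take it from LC_ALL,
 * LC_CTYPE or LANG in the environment) and return the maximum number of
 * bytes per character in that locale, or 0 if it is not available.
 */
static size_t
ctype_max_bytes(const char *name)
{
	assert(name != NULL);
	if (setlocale(LC_CTYPE, name) == NULL)
		return 0;
	return MB_CUR_MAX;	/* 1 in the C locale, >1 in UTF-8 locales */
}
```

Calling `ctype_max_bytes("")` with `LC_CTYPE=C` exported (and LC_ALL unset) should yield 1, and with `LC_CTYPE=en_US.UTF-8` a larger value, provided that locale is installed.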

The first "grep" needs 3.5 seconds to finish on my system, but the second
one (with LC_CTYPE=C or no locale set at all) runs for minutes (I did not
bother to check whether it finishes at all).

Is this a bug in grep?

Maybe there is something odd in the data file (loading the pattern is not
slower with LC_CTYPE=C, it takes 0.8 seconds on my system), but this is a
problem that was observed with "real" data, not a specifically constructed
worst case.

Any ideas what's causing this behavior?

I'm currently setting the UTF-8 locale as in the first invocation above
to make grep run in reasonable time, but I'd expect it to be faster in
the C locale ...

Regards, STefan


Re: Is kern.sched.preempt_thresh=0 a sensible default?

2018-04-05 Thread Stefan Esser
On 04.04.18 at 18:45, Andriy Gapon wrote:
> On 04/04/2018 16:19, Stefan Esser wrote:
>> I have identified the cause of the extremely low I/O performance (2 to 6 read
>> operations scheduled per second).
>>
>> The default value of kern.sched.preempt_thresh=0 does not give any CPU to the
>> I/O bound process unless a (long) time slice expires (kern.sched.quantum=94488
>> on my system with HZ=1000) or one of the CPU bound processes voluntarily gives
>> up the CPU (or exits).
>>
>> Any non-zero value of preempt_thresh lets the system perform I/O in parallel
>> with the CPU bound processes, again.
> 
> Let me guess... you have a custom kernel configuration and, unlike GENERIC
> (assuming x86), it does not have 'options PREEMPTION'?

Yes, thank you for pointing that out!!!

I used to have PREEMPTION and FULL_PREEMPTION in my kernel configuration,
and apparently have deleted both options when only FULL_PREEMPTION was
supposed to go ...


After looking at sched_ule.c and top/machine.c, it appears that the value
of preempt_thresh corresponds to the PRI value as shown by top (or ps -l)
plus PZERO, which is defined as PRI_MIN_KERN (80) + 20 = 100.
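The mapping described above is plain arithmetic. This minimal sketch uses the constants as stated in this message (PRI_MIN_KERN = 80, PZERO = PRI_MIN_KERN + 20) plus PRI_MAX = 255 from sys/priority.h; the function names are hypothetical.

```c
#include <assert.h>

/* Values as described above (sys/priority.h at the time of writing). */
#define PRI_MIN_KERN	80
#define PZERO		(PRI_MIN_KERN + 20)	/* 100 */
#define PRI_MAX		255

/* Kernel priority (as used by preempt_thresh) for a PRI shown by top. */
static int
kern_pri_from_top(int top_pri)
{
	int kern_pri = top_pri + PZERO;

	assert(kern_pri >= 0 && kern_pri <= PRI_MAX);
	return kern_pri;
}

/* PRI value top would display for a given kernel priority. */
static int
top_pri_from_kern(int kern_pri)
{
	assert(kern_pri >= 0 && kern_pri <= PRI_MAX);
	return kern_pri - PZERO;
}
```

For example, an interactive process at PRI 20 maps to kernel priority 120, a threshold of 124 corresponds to PRI 24 in top, and the PREEMPTION default of 80 corresponds to a PRI of -20.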

What I do not understand, though, is that the decision about a preemption
is only based on the calculated new priority of the thread, but not at all
on the priority of other running threads (except the idle thread).

On my system, a "real" batch job (i.e. one that does not voluntarily give
up the CPU due to I/O) seems to have a PRI value of 80 to 100 (growing
over time), while an interactive process has a PRI of 20, a maximally
"niced" interactive process has 52.

So, I'd expect a reasonable default value of preempt_thresh to be slightly
above 120 (e.g. 124), to prevent I/O heavy threads from stealing the CPU from
each other too often, and to prevent "niced" processes from doing the same ...

The two values configured into the kernel (80 for PREEMPTION and 255 for
FULL_PREEMPTION) seem to be extremes, but nothing in between (e.g. 124)
is offered. Such a value can only be set via sysctl, and I have not found
any document describing the correspondence between the threshold value and
the PRI value, besides the kernel sources ...


Is PRI_MIN_KERN=80 really a good default value for the preemption threshold?

Regards, STefan


Is kern.sched.preempt_thresh=0 a sensible default? (was: Re: Extremely low disk throughput under high compute load)

2018-04-04 Thread Stefan Esser
On 02.04.18 at 00:18, Stefan Esser wrote:
> On 01.04.18 at 18:33, Warner Losh wrote:
>> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser > <mailto:s...@freebsd.org>> wrote:
>>
>> My i7-2600K based system with 24 GB RAM was in the midst of a buildworld -j8
>> (starting from a clean state) which caused a load average of 12 for more than
>> 1 hour, when I decided to move a directory structure holding some 10 GB to its
>> own ZFS file system. File sizes varied, but were mostly in the range of 500KB.
>>
>> I had just thrown away /usr/obj, but /usr/src was cached in ARC and thus there
>> was nearly no disk activity caused by the buildworld.
>>
>> The copying proceeded at a rate of at most 10 MB/s, but most of the time less
>> than 100 KB/s were transferred. The "cp" process had a PRIO of 20 and thus a
>> much better priority than the compute bound compiler processes, but it got
>> just 0.2% to 0.5% of 1 CPU core. Apparently, the copy process was scheduled
>> at such a low rate, that it only managed to issue a few controller writes per
>> second.
>>
>> The system is healthy and does not show any problems or anomalies under
>> normal use (e.g., file copies are fast, without the high compute load).
>>
>> This was with SCHED_ULE on a -CURRENT without WITNESS or malloc debugging.
>>
>> Is this a regression in -CURRENT?
>>
>> Does 'sync' push a lot of I/O to the disk?
> 
> Each sync takes 0.7 to 1.5 seconds to complete, but since reading is so
> slow, not much is written.
> 
> Normal gstat output for the 3 drives the RAIDZ1 consists of:
> 
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      2      2     84   39.1      0      0    0.0     7.8 ada0
>     0      4      4     92   66.6      0      0    0.0    26.6 ada1
>     0      6      6    259   66.9      0      0    0.0    36.2 ada3
> dT: 1.058s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      1      1     60   70.6      0      0    0.0     6.7 ada0
>     0      3      3     68   71.3      0      0    0.0    20.2 ada1
>     0      6      6    242   65.5      0      0    0.0    28.8 ada3
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      5      5    192   44.8      0      0    0.0    22.4 ada0
>     0      6      6    160   61.9      0      0    0.0    26.5 ada1
>     0      6      6    172   43.7      0      0    0.0    26.2 ada3
> 
> This includes the copy process and the reads caused by "make -j 8 world"
> (but I assume that all the source files are already cached in ARC).

I have identified the cause of the extremely low I/O performance (2 to 6 read
operations scheduled per second).

The default value of kern.sched.preempt_thresh=0 does not give any CPU to the
I/O bound process unless a (long) time slice expires (kern.sched.quantum=94488
on my system with HZ=1000) or one of the CPU bound processes voluntarily gives
up the CPU (or exits).
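The value kern.sched.quantum=94488 can be reproduced if the time slice is one tenth of a second's worth of statistics-clock ticks, reported in microseconds. This is a minimal sketch, assuming stathz = 127 (a value kern.clockrate commonly reports on x86 with HZ=1000) and assuming sched_ule.c computes the sysctl with integer division as below; the divisor of 10 is also an assumption, not something stated in this thread.

```c
#include <assert.h>

#define SLICE_DIVISOR	10	/* assumed: slice = 1/10 s of stat ticks */

/* Time slice in microseconds, as kern.sched.quantum would report it. */
static int
quantum_usec(int stathz)
{
	assert(stathz >= SLICE_DIVISOR);
	int period_us = 1000000 / stathz;		/* one stat tick, integer us */
	int slice_ticks = stathz / SLICE_DIVISOR;	/* ticks per slice */

	return period_us * slice_ticks;
}
```

With these assumptions, `quantum_usec(127)` gives 7874 us x 12 ticks = 94488 us, i.e. a time slice of roughly 94.5 ms.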

Any non-zero value of preempt_thresh lets the system perform I/O in parallel
with the CPU bound processes, again.
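The effect of a zero threshold can be illustrated with a simplified model. This is a sketch reconstructed from memory of sched_shouldpreempt() in sys/kern/sched_ule.c, not a copy of it; the constants PRI_MIN_IDLE = 224 and PRI_MAX_INTERACT = 151 are assumptions to check against the actual tree. Priorities are kernel priorities, i.e. top's PRI plus 100.

```c
#include <assert.h>

/* Assumed constants; check sys/priority.h and sys/kern/sched_ule.c. */
#define PRI_MIN_IDLE		224
#define PRI_MAX_INTERACT	151

/*
 * Simplified model of the SCHED_ULE preemption check.  Lower numbers are
 * better.  pri: thread being made runnable; cpri: thread currently running
 * on the target CPU; remote: target CPU is not the current one.
 */
static int
should_preempt(int pri, int cpri, int preempt_thresh, int remote)
{
	assert(pri >= 0 && pri <= 255 && cpri >= 0 && cpri <= 255);
	if (pri >= cpri)		/* not better than the running thread */
		return 0;
	if (cpri >= PRI_MIN_IDLE)	/* the idle thread is always preempted */
		return 1;
	if (preempt_thresh == 0)	/* threshold 0: preemption switched off */
		return 0;
	if (pri <= preempt_thresh)	/* important enough to preempt */
		return 1;
	/* interactive thread vs. batch thread on another CPU */
	return remote && pri <= PRI_MAX_INTERACT && cpri > PRI_MAX_INTERACT;
}
```

With kernel priorities of about 120 for `cp` and 175 to 197 for the compilers, `should_preempt(120, 190, 0, 1)` is 0, while any non-zero threshold lets the remote case fire. That would be in line with the observation above that any non-zero value helped, but whether the real scheduler takes exactly this path is not verified here.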

I'm not sure about the bias relative to the PRI values displayed by top, but
in my view a process with a PRI above 72 (in top) should be eligible for preemption.

What value of preempt_thresh should I use to get that behavior?


And, more importantly: Is preempt_thresh=0 a reasonable default???

This prevents I/O bound processes from making reasonable progress if all CPU
cores/threads are busy. In my case, performance dropped from > 10 MB/s to just
a few hundred KB per second, i.e. by a factor of 30. (The %busy values in my
previous mail are misleading: At 10 MB/s the disk was about 70% busy ...)


Should preempt_thresh be set to some (possibly high, to only preempt long
running processes) value?

Regards, STefan


Re: Extremely low disk throughput under high compute load

2018-04-01 Thread Stefan Esser
On 01.04.18 at 18:33, Warner Losh wrote:
> 
> 
> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser  <mailto:s...@freebsd.org>> wrote:
> 
> My i7-2600K based system with 24 GB RAM was in the midst of a buildworld -j8
> (starting from a clean state) which caused a load average of 12 for more than
> 1 hour, when I decided to move a directory structure holding some 10 GB to its
> own ZFS file system. File sizes varied, but were mostly in the range of 500KB.
> 
> I had just thrown away /usr/obj, but /usr/src was cached in ARC and thus there
> was nearly no disk activity caused by the buildworld.
> 
> The copying proceeded at a rate of at most 10 MB/s, but most of the time less
> than 100 KB/s were transferred. The "cp" process had a PRIO of 20 and thus a
> much better priority than the compute bound compiler processes, but it got
> just 0.2% to 0.5% of 1 CPU core. Apparently, the copy process was scheduled
> at such a low rate, that it only managed to issue a few controller writes per
> second.
> 
> The system is healthy and does not show any problems or anomalies under
> normal use (e.g., file copies are fast, without the high compute load).
> 
> This was with SCHED_ULE on a -CURRENT without WITNESS or malloc debugging.
> 
> Is this a regression in -CURRENT?
> 
> Does 'sync' push a lot of I/O to the disk?

Each sync takes 0.7 to 1.5 seconds to complete, but since reading is so
slow, not much is written.

Normal gstat output for the 3 drives the RAIDZ1 consists of:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      2      2     84   39.1      0      0    0.0     7.8 ada0
    0      4      4     92   66.6      0      0    0.0    26.6 ada1
    0      6      6    259   66.9      0      0    0.0    36.2 ada3
dT: 1.058s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      1      1     60   70.6      0      0    0.0     6.7 ada0
    0      3      3     68   71.3      0      0    0.0    20.2 ada1
    0      6      6    242   65.5      0      0    0.0    28.8 ada3
dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      5      5    192   44.8      0      0    0.0    22.4 ada0
    0      6      6    160   61.9      0      0    0.0    26.5 ada1
    0      6      6    172   43.7      0      0    0.0    26.2 ada3

This includes the copy process and the reads caused by "make -j 8 world"
(but I assume that all the source files are already cached in ARC).

During sync:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    101      9    132   14.6     90   1760    5.6    59.7 ada0
    2    110     16    267   15.0     92   1756    6.0    50.7 ada1
    2     82     13    291   17.8     67   1653    7.4    34.3 ada3

ZFS is configured to flush dirty buffers after 5 seconds, so there are
not many dirty buffers in RAM at any time, anyway.

> Is the effective throughput of CP tiny or large? It's tiny, if I read right,
> and the I/O is slow (as opposed to it all buffering in memory and being slow
> to drain own), right?

Yes, reading is very slow, with less than 10 read operations scheduled
per second.

Top output taken at the same time as above gstat samples:

last pid: 24306;  load averages: 12.07, 11.51,  8.13    up 2+05:41:57  00:10:22
132 processes: 10 running, 122 sleeping
CPU: 98.2% user,  0.0% nice,  1.7% system,  0.1% interrupt,  0.0% idle
Mem: 1069M Active, 1411M Inact, 269M Laundry, 20G Wired, 1076M Free
ARC: 16G Total, 1234M MFU, 14G MRU, 83M Anon, 201M Header, 786M Other
 14G Compressed, 30G Uncompressed, 2.09:1 Ratio
Swap: 24G Total, 533M Used, 23G Free, 2% Inuse

  PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
24284 root         1  92    0   228M   199M CPU6    6   0:11 101.34% c++
24287 root         1  91    0   269M   241M CPU3    3   0:10 101.32% c++
24266 root         1  97    0   303M   276M CPU0    0   0:17 101.13% c++
24297 root         1  85    0   213M   184M CPU1    1   0:06  98.40% c++
24281 root         1  93    0   245M   217M CPU7    7   0:12  96.76% c++
24300 root         1  76    0   114M 89268K RUN     2   0:02  83.22% c++
24303 root         1  75    0   105M 79908K CPU4    4   0:01  59.94% c++
24302 root         1  52    0 74940K 47264K wait    4   0:00   0.35% c++
24299 root         1  52    0 74960K 47268K wait    2   0:00   0.33% c++
20954 root         1  20    0 15528K  4900K zio->i  3   0:02   0.11% cp

ARC is limited to 18 GB to leave 6 GB RAM for use by the kernel and user programs.

vfs.zfs.arc_meta_limit: 45
vfs.zfs.arc_free_target: 42339
vfs.zfs.compresse

Extremely low disk throughput under high compute load

2018-04-01 Thread Stefan Esser
My i7-2600K based system with 24 GB RAM was in the midst of a buildworld -j8
(starting from a clean state) which caused a load average of 12 for more than
1 hour, when I decided to move a directory structure holding some 10 GB to its
own ZFS file system. File sizes varied, but were mostly in the range of 500 KB.

I had just thrown away /usr/obj, but /usr/src was cached in ARC and thus there
was nearly no disk activity caused by the buildworld.

The copying proceeded at a rate of at most 10 MB/s, but most of the time less
than 100 KB/s were transferred. The "cp" process had a PRIO of 20 and thus a
much better priority than the compute bound compiler processes, but it got
just 0.2% to 0.5% of 1 CPU core. Apparently, the copy process was scheduled
at such a low rate that it only managed to issue a few controller writes per
second.

The system is healthy and does not show any problems or anomalies under
normal use (e.g., file copies are fast, without the high compute load).

This was with SCHED_ULE on a -CURRENT without WITNESS or malloc debugging.

Is this a regression in -CURRENT?

Regards, STefan


Re: Boot failure: panic: No heap setup

2018-03-30 Thread Stefan Esser
On 29.03.18 at 07:15, Toomas Soome wrote:
> 
> 
>> On 29 Mar 2018, at 01:06, Stefan Esser  wrote:
>>
>> On 28.03.18 at 22:28, Warner Losh wrote:
>>>> Hmmm, the code references point into the boot loader code - I had
>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>
>>>>> [1]
>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>><https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56>
>>>>
>>>>
>>>> Seems that setbase has either not been called or has been called with
>>>> base=0.
>>>
>>>Right, which is odd...
>>>
>>>>> [2]
>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>
>>> <https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688>
>>>>
>>>>
>>>> I had thought, that the zfs boot code has been initialized before the
>>>> menu is displayed?
>>>
>>>Right, all of this should be done long before we get to the
>>>interpreter. Can you break into the loader prompt and try the `heap`
>>>command, see what that outputs? CC'ing imp@ because he actually knows
>>>things.
>>>
>>> Totally weird. I'd add a printf to the sethead() function to display its args
>>> and see if you get this panic before/after that printf...
>>
>> I'm currently using a Forth-enabled boot loader again, since this is a
>> "production" machine (my home server, which also receives and keeps all
>> my work email, for example).
>>
>> I'll build a clean world with the LUA loader and test it on one of the
>> next days. Tests will include the "heap" loader command and I'll add the
>> printf (though, if sbrk() has really not been called, I guess that will
>> not go too well ...).
>>
>> Is it possible, that the setheap function is called a second time, just
>> before jumping into the kernel? (In that case adding the printf might
>> crash the loader in the first setheap call ...)
>>
>> Since the loader menu (and escaping from the menu) works, there must be
>> a valid heap, at that time.
>>
> 
> indeed. and assuming the message really is from loader, it means, there must
> be memory corruption - if so, you can check which variables are located
> close to heap related ones… Also, since you have the working menu, it has to
> be related to actual loading. Since the loading itself has been working so
> far, it should be related to lua specific bits which are preparing towards
> to call load functions.

Ok, some more data points:

1) A printf in setheap reported plausible values during start-up of zfsboot.
   The menu appeared and wiped away the values so fast that I could not take
   a photo or write them down.

2) I have rebuilt world and kernel based on r331763. Booting resulted in the
   same panic as reported before. There was no debug output from the patched
   setheap call before the panic (which indicates that it was not called a
   second time).

3) In order to get my system to boot, I interrupted loading of zfsloader and
   forced loading of the previous version (from a world build with Forth in
   the loader). Booting succeeded with the latest kernel ...

It looks as if sbrk() was called in zfsloader before setheap() had been used
to initialize the heap parameters, if lua is enabled instead of Forth. See
stand/i386/loader/main.c:124 for the location of the setheap call in the
loader.
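The failure mode discussed here can be illustrated with a minimal bump allocator. This is a hypothetical sketch of the setheap()/sbrk() contract as described in this thread, not the libsa code: the `demo_*` names, the 16-byte alignment and the status codes are assumptions. Until setheap() has stored a non-NULL base, every sbrk() call hits the "No heap setup" path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the loader heap; demo_* names are made up. */
#define DEMO_ALIGN_MASK	((uintptr_t)15)	/* assumed 16-byte alignment */

enum demo_status { DEMO_OK, DEMO_NO_HEAP, DEMO_NO_MEM };

static char	*demo_heapbase;		/* NULL until demo_setheap() runs */
static size_t	 demo_heapsize;		/* bytes handed out so far */
static size_t	 demo_maxheap;		/* usable bytes after alignment */

/* Record the heap bounds; must happen before the first demo_sbrk(). */
static void
demo_setheap(void *base, void *top)
{
	uintptr_t b = ((uintptr_t)base + DEMO_ALIGN_MASK) & ~DEMO_ALIGN_MASK;
	uintptr_t t = (uintptr_t)top;

	assert(t >= b);
	demo_heapbase = (char *)b;
	demo_maxheap = (size_t)(t - b);
	demo_heapsize = 0;
}

/* Bump allocator: hand out the next `incr` zeroed bytes of the heap. */
static enum demo_status
demo_sbrk(size_t incr, void **out)
{
	*out = NULL;
	if (demo_heapbase == NULL)
		return DEMO_NO_HEAP;	/* libsa's sbrk() panics("No heap setup") here */
	if (incr > demo_maxheap - demo_heapsize)
		return DEMO_NO_MEM;	/* heap exhausted */
	*out = demo_heapbase + demo_heapsize;
	memset(*out, 0, incr);
	demo_heapsize += incr;
	return DEMO_OK;
}
```

The sketch only shows why a missing base triggers the panic; it does not explain how the Lua-enabled zfsloader could reach sbrk() before setheap(), which is the open question here.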

This is obviously hard to debug, though, since printf cannot be called at that
point. A pure write(2) should be possible without heap, but since the console
has not been initialized at the point of the setheap invocation, there is no
working output device, AFAIK.

I do not see how any sbrk() call could occur before setheap is called. And
there does not appear to be any other setheap function (or macro) in the
tree that could override the one defined in stand/libsa/sbrk.c ...

I have no idea how to proceed from here ...

But now I'm sure it is a problem in zfsloader (or loader in general?).

Hmmm: How is the panic message printed by sbrk() without an initialized heap?
The definition of panic in stand/libsa/panic.c relies on a working printf!

I should be able to use printf in the same way as panic does, but I did
not succeed when I tried to use it early in zfsloader ...

Regards, STefan


Re: Boot failure: panic: No heap setup

2018-03-28 Thread Stefan Esser
On 28.03.18 at 22:28, Warner Losh wrote:
> > Hmmm, the code references point into the boot loader code - I had
> > expected that there is a problem in the kernel, not the boot loader.
> >
> >> [1]
> >> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
> 
> >
> >
> > Seems that setbase has either not been called or has been called with
> > base=0.
> 
> Right, which is odd...
> 
> >> [2]
> >> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
> 
> >
> >
> > I had thought, that the zfs boot code has been initialized before the
> > menu is displayed?
> 
> Right, all of this should be done long before we get to the
> interpreter. Can you break into the loader prompt and try the `heap`
> command, see what that outputs? CC'ing imp@ because he actually knows
> things.
> 
> Totally weird. I'd add a printf to the sethead() function to display its args
> and see if you get this panic before/after that printf...

I'm currently using a Forth-enabled boot loader again, since this is a
"production" machine (my home server, which also receives and keeps all
my work email, for example).

I'll build a clean world with the LUA loader and test it on one of the
next days. Tests will include the "heap" loader command and I'll add the
printf (though, if sbrk() has really not been called, I guess that will
not go too well ...).

Is it possible that the setheap function is called a second time, just
before jumping into the kernel? (In that case adding the printf might
crash the loader in the first setheap call ...)

Since the loader menu (and escaping from the menu) works, there must be
a valid heap, at that time.

Regards, STefan


Re: Boot failure: panic: No heap setup

2018-03-27 Thread Stefan Esser

On 27.03.18 at 21:31, Kyle Evans wrote:

> On Tue, Mar 27, 2018 at 11:06 AM, Stefan Esser  wrote:
>> A few weeks ago I tried the LUA boot and found, that my kernel did not start
>> (i.e. did not print the initial FreeBSD version line), but instead stopped
>> with:
>
> Oy =/
>
>> panic: No heap setup
>>
>> I recovered by booting from an alternate boot device and kept my system
>> running until today, where I decided to give the LUA boot another try.
>>
>> The boot failure happened again, with identical message:
>>
>>  panic: No heap setup
>
> Hmm... that's an sbrk panic [1], indicating that setheap hadn't been
> called. zfsgptboot is zfsboot with gpt bits included, so the relevant
> setheap call is [2] I believe. It's not immediately clear to me how
> switching interpreters could actually be breaking it in this way.
>
> At what point are you hitting this panic? After menu, before kernel transition?

The menu is displayed and I can unload the kernel and load the kernel
and modules from an alternate path. The lua code seems to work just fine,
but as soon as I enter the "boot" command, the panic happens.

This happens when the loader transfers control to the kernel but before
any other output is generated. I tried booting a GENERIC kernel just to
be sure this is not caused by an out-dated kernel config file.

>> I tried booting a GENERIC kernel, but only rebuilding the boot loader
>> (gptzfsloader in my case) without LUA support fixed the issue for me ...
>>
>> The system is -CURRENT (built today) on amd64 (not converted to UEFI, yet).

Hmmm, the code references point into the boot loader code - I had
expected that there is a problem in the kernel, not the boot loader.

> [1] https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56

Seems that setbase has either not been called or has been called with base=0.

> [2] https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688

I had thought, that the zfs boot code has been initialized before the
menu is displayed?

Or do I misunderstand this phase of the boot process???

Regards, STefan


Boot failure: panic: No heap setup

2018-03-27 Thread Stefan Esser
A few weeks ago I tried the LUA boot and found that my kernel did not start
(i.e. did not print the initial FreeBSD version line), but instead stopped
with:

panic: No heap setup

I recovered by booting from an alternate boot device and kept my system
running until today, when I decided to give the LUA boot another try.

The boot failure happened again, with the identical message:

panic: No heap setup

I tried booting a GENERIC kernel, but only rebuilding the boot loader
(gptzfsloader in my case) without LUA support fixed the issue for me ...

The system is -CURRENT (built today) on amd64 (not converted to UEFI, yet).

Further information is available on request. For now, I'm back to booting
with the Forth based loader ...

STefan


Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

2018-03-06 Thread Stefan Esser
On 05.03.18 at 21:39, Larry Rosenman wrote:
> Upgraded to:
> 
> FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385: Sun 
> Mar  4 12:48:52 CST 2018 
> r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/VT-LER  amd64
> +1200060 1200060
> 
> Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use and
> swapping.
> 
> See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png
> 
> Ideas?

I'm seeing the same, and currently work around this with a reasonably limited
vfs.zfs.arc_max.

Without such a limit I see (on a system with 24 GB RAM):

CPU:  0.3% user,  0.0% nice,  0.9% system,  0.1% interrupt, 98.8% idle
Mem: 14M Active, 1228K Inact, 32K Laundry, 23G Wired, 376M Free
ARC: 19G Total, 3935M MFU, 14G MRU, 82M Anon, 223M Header, 876M Other
 18G Compressed, 36G Uncompressed, 2.02:1 Ratio
Swap: 24G Total, 888M Used, 23G Free, 3% Inuse, 8892K In, 5136K Out

sysctl vfs.zfs.arc_max=15988656640 results in:

Mem: 129M Active, 72M Inact, 36K Laundry, 18G Wired, 5149M Free
ARC: 15G Total, 3997M MFU, 10G MRU, 40M Anon, 205M Header, 877M Other
 13G Compressed, 28G Uncompressed, 2.08:1 Ratio
Swap: 24G Total, 796M Used, 23G Free, 3% Inuse, 16K In

The system was mostly idle at both times, just some Samba traffic and
mail being checked by spamassassin. And I noticed it (this time) when
the spamassassin processes were aborted due to a time limit.

I think that this problem must have been introduced in the last few
weeks, but I cannot give a better estimate (I do not reboot that often).

But I had already applied the arc_max setting a week ago (and had not
put it in sysctl.conf in the hope that the ARC growth was a temporary
problem in the ZFS code, soon to be fixed ...).

Regards, STefan


Re: Intel CPU design flaw - FreeBSD affected?

2018-01-04 Thread Stefan Esser
On 04.01.18 at 12:56, Darren Reed wrote:
> On 4/01/2018 11:51 AM, Mark Heily wrote:
>> On Jan 2, 2018 19:05, "Warner Losh"  wrote:
>>
>> The register article says the specifics are under embargo still. That would
>> make it hard for anybody working with Intel to comment publicly on the flaw
>> and any mitigations that may be underway. It would be unwise to assume that
>> all the details are out until the embargo lifts.
>>
>>
>> Details of the flaws are now published at:
>>
>> https://meltdownattack.com
> 
> The web page has both: meltdown and spectre.
> Most people are only talking about meltdown which doesn't hit AMD.
> spectre impacts *both* Intel and AMD.
> 
> SuSE are making available a microcode patch for AMD 17h processors that
> disables branch prediction:
> 
> https://lists.opensuse.org/opensuse-security-announce/2018-01/msg4.html

Disabling branch prediction will have a very noticeable effect on execution
speed in general (while split page tables only affect programs that perform
system calls at a high frequency).

I have not fully read the Meltdown and Spectre papers yet, but I assume
that the attack on branch prediction tries to counter KASLR, which we do
not support at all in FreeBSD.

So, I guess, we do not have to bother with disabling of branch prediction in
FreeBSD for the time being?

Regards, STefan


Re: cve-2017-13077 - WPA2 security vulni

2017-10-16 Thread Stefan Esser
On 16.10.17 at 12:38, blubee blubeeme wrote:
> well, that's a cluster if I ever seen one.
> 
> On Mon, Oct 16, 2017 at 6:35 PM, Poul-Henning Kamp 
> wrote:
> 
>> 
>> In message > gmail.com>
>> , blubee blubeeme writes:
>>
>>> Does anyone on FreeBSD know if it's affected by this?
>>> https://cve.mitre.org/cgi-bin/cvename.cgi?name=2017-13077
>>
>> It is, same as Linux, we use the same wpa_supplicant software

The attached patch applies the official fix, committed by the WPA
developers in https://w1.fi/cgit/hostap/commit/?id=a00e946, to our
version of wpa_supplicant in /usr/src/contrib.

Regards, STefan
Index: contrib/wpa/src/rsn_supp/wpa.c
===
--- contrib/wpa/src/rsn_supp/wpa.c  (revision 324638)
+++ contrib/wpa/src/rsn_supp/wpa.c  (working copy)
@@ -1534,6 +1534,14 @@
 			sm->ptk_set = 1;
 			os_memcpy(&sm->ptk, &sm->tptk, sizeof(sm->ptk));
 			os_memset(&sm->tptk, 0, sizeof(sm->tptk));
+			/*
+			 * This assures the same TPTK in sm->tptk can never be
+			 * copied twice to sm->pkt as the new PTK. In
+			 * combination with the installed flag in the wpa_ptk
+			 * struct, this assures the same PTK is only installed
+			 * once.
+			 */
+			sm->renew_snonce = 1;
 		}
 	}
 

Re: swapfile query

2017-08-20 Thread Stefan Esser
On 20.08.17 at 01:39, Greg 'groggy' Lehey wrote:
>> 3. should total swap be 1x 2x or some other multiple of RAM these days?
> 
> It never needed to be.  The only issue is that if you want processor
> dumps, you once needed a swap partition (and not a swap file) at least
> marginally larger than memory.  With compressed dumps, that
> requirement is relaxed, but I suspect that a 4 GB partition could be
> too small.

Well, no, it (2x RAM) used to be needed at one time ... ;-)

The VAX supported paging, but did not use a multi-level page table as
most CPUs do today. There was a linear list of page addresses per
process, and new page allocations could lead to a situation where
there was no free space in this list. This required a kind of garbage
collection run, which was implemented by swapping out all processes
and starting with a clean state. That in turn required swap of 2 times
RAM, to prevent a deadlock (when a new page needed to be allocated
to complete the swap-out).

This MMU was used in at least all VAX-11/7xx models, the µVAX 2 and µVAX 3,
and thus in many of the machines used to run BSD back in the 80s ...

And thus, swap of at least 2 times RAM used to be not just a best
practice, but a strict requirement for stable operation of these
machines.

Regards, STefan


Re: csh script help

2017-04-14 Thread Stefan Esser
On 14.04.17 at 15:47, Ernie Luzar wrote:
> To aid in debugging the script I'm writing, I place "echo" commands
> throughout so I can kind of have a trace of the logic as different
> conditions are processed. Normally I just delete these "echo" commands
> after I get the script working.
> 
> But this time I want to try something different. I want to
> enable/disable the echo commands in mass. So in the beginning of the
> script I added these 2 lints.
> 
> #trace=""  # use to enable trace echo
> trace="#"  # use to disable trace echo
> 
> In front of each of the echo commands I added this,
>  $trace echo "what ever."
> 
> When I exec the script I get error message  #: not found

This is to be expected ;-)

> What is happing here? Is the substitution to late?

No.

> Is there a way to fix this?

Use ":" instead of "#" to insert a "null command" before the echo:

% set trace=""
% $trace echo Hello
Hello
% set trace=":"
% $trace echo Hello
%

Regards, STefan


Re: Possible zpool online, resilvering issue

2016-08-10 Thread Stefan Esser
On 10.08.2016 at 18:53, Ultima wrote:
> Hello,
> 
>> I didn't see any reply on the list, so I thought I might let you know
> 
> Sorry, never received this reply (till now) xD
> 
>> what I assume is happening:
> 
>> ZFS never updates data in place, which affects inode updates, e.g. if
>> a file has been read and access times must be updated. (For that reason,
>> many ZFS file systems are configured to ignore access time updates).
> 
>> Even if there were only R/O accesses to files in the pool, there will
>> have been updates to the inodes, which were missed by the offlined
>> drives (unless you ignore atime updates).
> 
>> But even if there are no access time updates, ZFS might have written
>> new uberblocks and other meta information. Check the POOL history and
>> see if there were any TXGs created during the scrub.
> 
>> If you scrub the pool while it is off-line, it should stay stable
>> (but if any information about the scrub, the offlining of drives etc.
>> is recorded in the pool's history log, differences are to be expected).
> 
>> Just my $.02 ...
> 
>> Regards, STefan
> 
> Thanks for the reply. I'm not completely sure what would be considered a
> TXG. I maintained normal operations during most of this noise, and this
> pool has quite a bit of activity during normal operations. My zpool
> history looks like it goes on forever, and the last scrub shows that it
> repaired 9.48G. Was that all for these access time updates? I guess that
> would be a little less than 2.5G worth per disk.
> 
> The zpool history looks like it goes on forever (733373 lines). This pool
> has much of this activity with poudriere. All the entries I see are
> clone, destroy, rollback and snapshotting. I can't really say how much
> but at least 500 (prob much more than that) entries between the last two
> scrubs. Atime is off on all datasets.
> 
>  So to be clear, this is expected behavior with atime=off + TXGs during
> offline time? I had thought that the resilver after onlining the disk
> would bring that disk up-to-date with the pool. I guess my understanding
> was a bit off.

Sorry, you'll have to ask somebody more familiar with ZFS internals
than me.

I just wanted to point out that a scrub might change the state of the
drives, even though no file data is modified.

Some 10 GB "repaired" on a 35000 GB pool is not much; it is about what
I'd expect to be required for meta-data.

BTW: The pool history is sorted chronologically; you only need to check
the last few lines (written after the start time of the scrub, or
rather written after offlining some of the disk drives).
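A minimal sh sketch of that check, assuming the usual `zpool history` output in which each entry starts with a timestamp such as `2016-08-10.18:53:01`. The function name `history_since` and the pool name `tank` are placeholders.

```sh
#!/bin/sh
# Print the history entries recorded at or after a given time.
# Entries look like "2016-08-10.18:53:01 zpool scrub tank"; the
# fixed-width timestamp sorts lexicographically, so a plain string
# comparison against the cutoff selects the newer lines.
# Header lines such as "History for 'tank':" are skipped.
history_since() {
    awk -v since="$1" '$1 ~ /^[0-9][0-9][0-9][0-9]-/ && $1 >= since'
}

# Usage (pool name and time are placeholders):
#   zpool history tank | history_since 2016-08-10.18:00:00
```

`zpool history -i` should additionally list internally logged events, which may help when looking for transaction groups written during the scrub.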

Regards, STefan


Re: [patch] syscons/vt keymap: Norwegian country code conflicts with default value

2014-09-22 Thread Stefan Esser
On 22.09.2014 at 19:28, Tijl Coosemans wrote:
> On Mon, 22 Sep 2014 14:09:54 +0200 Stefan Esser  wrote:
>> On 21.09.2014 at 18:39, Gyrd Thane Lange wrote:
>>> Hi,
>>>
>>> Recent changes in keymap naming for syscons/vt to use shorter names
>>> has exposed a conflict with the value "no" both used as country code
>>> for Norway and as a default value indicating that no keymap is set.
>>>
>>> The attached patch proposes to use "" (empty string) as default value
>>> instead.
>>
>> Hi Gyrd,
>>
>> thank you for reporting the issue!
>>
>> I have just committed a slightly different patch to -CURRENT and plan
>> to merge it to 10-STABLE in time for the next BETA.
>>
>> You may want to check out r271958 ...
>>
>>
>> The approach I have chosen is to let "NO" continue to stand for "do
>> not load any keymap", while "no" is now recognized as equivalent to
>> "no.kbd".
>>
>>
>> The new semantics of the keymap parameter in rc.conf are:
>>
>>  keymap='' ==> do not load any keymap (unchanged)
>>  keymap=NO ==> do not load any keymap (unchanged)
>>  keymap=no ==> load Norwegian keymap  (new)
>>
>> This may still catch people that have edited rc.conf to use "no" in
>> the meaning "no keymap" by accident, but I see no other approach that
>> better complies with POLA ...
> 
> Maybe NONE.  It's already being used in a number of cases.

This was one of the alternatives I considered before the commit.

Reasons for my choice of "no" vs. "NO" (and not "NONE"):

1) NO is the default (in defaults/rc.conf) and may have found its way
   into individual rc.conf files. I wanted to preserve its meaning.

2) With NONE, tools like bsdconfig would need to be made version-aware,
   using NO for releases before 10.1 and NONE for 10.1 and later.

IIRC, the use of NONE in rc.conf should be limited to cases that need
a value besides NO (e.g. in the case of sendmail_enable, where both
NO and NONE have special meaning).

If there are no strong arguments against the patch that I committed,
I'd like to MFC it to -STABLE.

But if there are better alternatives (and I do not think that "NONE"
is better, sorry), I'd like to hear about them in time for BETA3 ...

Regards, STefan


Re: [patch] syscons/vt keymap: Norwegian country code conflicts with default value

2014-09-22 Thread Stefan Esser
On 21.09.2014 at 18:39, Gyrd Thane Lange wrote:
> Hi,
> 
> Recent changes in keymap naming for syscons/vt to use shorter names
> has exposed a conflict with the value "no" both used as country code
> for Norway and as a default value indicating that no keymap is set.
> 
> The attached patch proposes to use "" (empty string) as default value
> instead.

Hi Gyrd,

thank you for reporting the issue!

I have just committed a slightly different patch to -CURRENT and plan
to merge it to 10-STABLE in time for the next BETA.

You may want to check out r271958 ...


The approach I have chosen is to let "NO" continue to stand for "do
not load any keymap", while "no" is now recognized as equivalent to
"no.kbd".


The new semantics of the keymap parameter in rc.conf are:

keymap='' ==> do not load any keymap (unchanged)
keymap=NO ==> do not load any keymap (unchanged)
keymap=no ==> load Norwegian keymap  (new)

This may still catch people that have edited rc.conf to use "no" in
the meaning "no keymap" by accident, but I see no other approach that
better complies with POLA ...
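The new rules can be modelled in a few lines of sh. This is a hypothetical sketch, not the actual rc.d code; the function name `keymap_action` is made up. Note that `case` patterns are case-sensitive, which is what lets "NO" and "no" mean different things.

```sh
#!/bin/sh
# Hypothetical model of the keymap rc.conf semantics after r271958.
# Prints "none" when no keymap should be loaded, else "load <name>".
keymap_action() {
    case "$1" in
    ''|NO)
        echo "none"            # empty or NO: do not load any keymap
        ;;
    *)
        echo "load $1"         # anything else, including "no" (Norwegian)
        ;;
    esac
}

keymap_action "no"             # prints "load no"
```

Spelling the value out as `keymap=no.kbd` avoids the ambiguity regardless of which interpretation a given version uses.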

An alternative to an MFC to 10-STABLE (and 10.1) might be to mention
this specific case in the release notes (and to suggest "keymap=no.kbd"
as a work-around). But I'll try to get approval for the MFC in time for
10.1-BETA3 ...

Regards, STefan


Re: GNU LICENSING

2014-09-13 Thread Stefan Esser
On 13.09.2014 at 09:04, jon.ruse wrote:
> I was wondering how to apply to the gnu licensing and how to sign and
> commit to the licensing laws.. Would you mind telling me how to
> assign to one please?? And one other thing in the license of most gnu
> licensing they go on to mention the 'AS IS' commitment but I don't
> fully understand, as well could you give me five minutes of your busy
> time to explain please??

You just accept the GPL and act accordingly; there
is nothing to sign. If you violate the license terms
and somebody notices, you may be sued, though.

But you really should ask a project that uses the GPL!

The BSD projects use the much simpler BSD license for
all internally developed code, which in its 2-clause
form just requires that you do not remove the copyright
notice from any source files and that you distribute a
copy of the license with any binaries. And it excludes
warranties (in the "PROVIDED ... AS-IS" sentence), as
usual for such licenses.

[...]

> I do donate to the fsf donation station if that is anything meaning??

No it doesn't - donations do not affect what you are
allowed to do under the GPL.

Regards, STefan


Re: nscd not caching

2014-08-18 Thread Stefan Esser
On 17.08.2014 at 18:10, Adam McDougall wrote:
> On 08/17/2014 09:09, Eggert, Lars wrote:
>> Nobody using nscd? Really?
> 
> I would test for you, but we retired our NIS infrastructure at least a
> year ago.  I did have it working on a test client at some point, but I
> didn't push it into production because I found a couple issues (below).
[...]
> The two main problems I recall were nscd making java crash, and nscd
> holding on to negative cache lookups too long, causing failures while
> installing ports that depend on adding users/groups for a following file
> permission change.  I can't remember if the latter issue was fixed at
> some point.  I also can't remember if I was receiving perfectly accurate
> results from the cache either.

I added the "negative-confidence-threshold" option to nscd a few
years ago.  If it is set to a number greater than 1 (the default is 1),
that number of failed lookups is required before a negative cache
entry is created.  Setting this value to 3 should allow for 2 probes
for the presence of a UID or username before the cache returns a
failure without bothering to re-check the source.  The value should
be low enough to prevent flooding a remote source with requests if
an entry really does not exist.

The default was left unchanged - you need to increase the value to
see any effect of this threshold.  3 might be a reasonable default
for the user database.  But I never bothered to suggest and discuss
an increased default value on the mail-lists ...
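For example, an nscd.conf entry along the following lines should raise the threshold for the passwd and group caches. This assumes that the option uses the same `keyword cachename value` layout as the other nscd.conf(5) settings. Please check the manual page of the installed release before relying on it.

```
# /etc/nscd.conf (sketch): require 3 failed lookups before a
# negative answer is cached for users and groups
negative-confidence-threshold passwd 3
negative-confidence-threshold group 3
```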

[...]
> I dabbled with nscd a bit after we switched from NIS to LDAP.  I think I
> recall lookups being slightly slower WITH the cache, plus I would get
> some duplicated group entries returned on all but the first getent
> group.  The short version is we in no way seem to benefit or require a
> cache of LDAP with our site size, so I'm just not using nscd.  I didn't
> make bug reports for these issues, I had to prioritize towards more
> pressing issues.  I'm trying to do better about reporting bugs.

I also found that there were glitches when I tested the extension
to cache only the nth negative reply. The code is not easy to read
and change (IMHO), and I did not succeed when I tried to reproduce
and debug these glitches.

Regards, STefan


Re: Several minor annoyances on Current

2014-07-26 Thread Stefan Esser
On 26.07.2014 at 13:24, Lars Engels wrote:
> On Sat, Jul 26, 2014 at 02:03:09AM -0700, Beeblebrox wrote:
>> @ Ian: I added your code and rebuilt the kernel. 
>> /boot/loader.conf also has vfs.nfs.bootp_disable="YES" Previously
>> described problem persists.
>> 
>> SEPARATE KBD_KEYMAP ISSUE: The keymap for keyboard fails to be
>> set from rc.conf with keymap="fr.iso.kbd" Boot message shows this
>> but there is no record of the message in log files/dmesg:
>> "Configuring syscons: keymapkbdcontrol: keymap file "fr.iso.kbd"
>> not found: No such file or directory" Keymap gets set when I
>> change rc.conf entry to:
>> keymap="/usr/share/syscons/keymaps/fr.iso.kbd"
> 
> Yes, I also stumbled upon this. Usually the following worked:
> 
> # kbdcontrol -l german.iso
> # kbdcontrol -l german.iso.kbd
> # kbdcontrol -l /usr/share/syscons/keymaps/german.iso.kbd
> 
> In the last few months on HEAD only the latest invocation works.

Please retry with kbdcontrol as of SVN revision 269120.

With this change, "kbdcontrol -l" looks at up to 4 places for
keyboard map definitions:

1) Under the path found in the environment variable KEYMAP_PATH
2) Under the full path name specified as parameter to -l
3) Under the newcons-specific path (/usr/share/vt/keymaps) (***)
4) Under the old path used for syscons (/usr/share/syscons/keymaps)

(***) Only if using newcons as reported by sysctl kern.vty

Before this change, newcons had no case 4); it only looked in vt/keymaps.
But many keymaps are identical for syscons and newcons and only a
few have been converted and placed into /usr/share/vt/keymaps.
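The lookup order can be sketched in sh as follows. This is a hypothetical model, not kbdcontrol's C code. The names `find_keymap`, `try_file`, `VT_DIR` and `SC_DIR` are made up, `KEYMAP_PATH` is treated as a single directory, and the second argument stands in for the value of `sysctl kern.vty`. Trying each name both as given and with a ".kbd" suffix is an assumption based on the invocations quoted above.

```sh
#!/bin/sh
# Hypothetical model of the "kbdcontrol -l" keymap search order.
VT_DIR=${VT_DIR:-/usr/share/vt/keymaps}
SC_DIR=${SC_DIR:-/usr/share/syscons/keymaps}

# Print the first existing file among "$1" and "$1.kbd".
try_file() {
    for f in "$1" "$1.kbd"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# find_keymap NAME VTY  (VTY is "vt" for newcons, anything else for syscons)
find_keymap() {
    name=$1
    vty=$2
    # 1) directory named by KEYMAP_PATH, if set
    if [ -n "${KEYMAP_PATH:-}" ]; then
        try_file "$KEYMAP_PATH/$name" && return 0
    fi
    # 2) the name exactly as given (absolute or relative path)
    try_file "$name" && return 0
    # 3) newcons keymaps, only when running vt(4)
    if [ "$vty" = "vt" ]; then
        try_file "$VT_DIR/$name" && return 0
    fi
    # 4) the old syscons keymaps as the final fallback
    try_file "$SC_DIR/$name"
}

# Usage: find_keymap fr.iso "$(sysctl -n kern.vty)"
```

Step 4 is the point of the change. A keymap that was never converted for vt is still found in the syscons directory.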

Regards, STefan


Re: buildworld fails (missing /usr/share/mk/src.opts.mk)

2014-05-06 Thread Stefan Esser
On 06.05.2014 16:28, Warner Losh wrote:
> First off, thanks for looking into this. Build issues are no fun. :(
> On May 6, 2014, at 8:11 AM, Stefan Esser  wrote:
>> On 06.05.2014 15:18, Warner Losh wrote:
>>> On May 6, 2014, at 7:16 AM, Warner Losh  wrote:
>>>> On May 6, 2014, at 6:39 AM, Stefan Esser  wrote:
>>>>> On 06.05.2014 13:44, Trond Endrestøl wrote:
>>>>>> On Tue, 6 May 2014 13:24+0200, Stefan Esser wrote:
>>>>>>> On 06.05.2014 11:52, Stefan Esser wrote:
>>>>>> tinderbox still complains about usr.bin/bmake/Makefile.inc.
>>>>>
>>>>> Hmmm, I managed to buildworld -HEAD after this patch, but it
>>>>> is possible, that I had src.opts.mk installed in /usr/share/mk
>>>>> when I started the build.
>>>>>
>>>>> (I later deleted it, to be sure that the version in the source 
>>>>> directory was found and used when building modules, which the 
>>>>> commit actually fixed.)
>>>>>
>>>>> I guess the remaining problem is caused by
>>>>>
>>>>> .include "src.opts.mk"
>>>>>
>>>>> in line 3 of src/usr.bin/bmake/Makefile.inc
>>>>>
>>>>> Changing this line to read ".include <src.opts.mk>" seems to
>>>>> fix it on my system.
>>>>>
>>>>> --- usr.bin/bmake/Makefile.inc~
>>>>> +++ usr.bin/bmake/Makefile.inc
>>>>> @@ -1,6 +1,6 @@
>>>>>  # $FreeBSD$
>>>>>
>>>>> -.include "src.opts.mk"
>>>>> +.include <src.opts.mk>
> 
> This change I think actually is right. And it needs to be an ‘sinclude’
> to support the fmake upgrade path, but I need to double check that
> can’t be worked around in the environment.

Yes, the upgrade path is broken and that is what the tinderbox builds
complain about.

I just verified this by installing fmake as make on -HEAD. It fails
to build bmake as reported by tinderbox.

>>>> What is your source system? This is absolutely the wrong change,
>>>> and shouldn’t be necessary at all. These changes survived a
>>>> universe run and a few build worlds on other systems.
>>
>> I'm on a fresh -CURRENT (built the previous day) and with sources
>> as of r265439.
> 
> OK. My current is a bit dated, so I’ll spin up a new one.

I guess there are fewer problems when starting with -CURRENT than if
you come from 9-STABLE, so that appears to be the better test.

>> I agree that the change to bmake/Makefile.inc was wrong, though
>> it was needed to get a "make cleandir" working in that directory.
> 
> Yea, I’m trying to get one that works all the time… I think I have it
> which should silence the tinderboxes… In hindsight, perhaps
> I should have pushed this in first thing this morning rather than
> last thing last night...

Well, it was early morning in this part of the world, when the
tinderbox mails started to arrive ;-)

But the change (separation of src options and corresponding logic)
is definitely worth the nuisance of temporary breakage ...

>>> 
>>>
>>> so I’d like to know how to recreate it, since I didn’t see this in
>>> any of my testing over the last two weeks...
>>
>> The tinderbox builds all fail in bmake, and while I changed
>> Makefile.inc to fix just that kind of problem on my system, it
>> may have worked by accident (because of a forgotten src.opts.mk
>> in /usr/share/mk - it had been installed by a previous attempt
>> to work around these problems).
> 
> The initial bootstrap of bmake, or the later build of bmake? I was
> able to reproduce the former, but haven’t seen the latter fail.

Yes, it always seems to be the initial build that fails. And that
phase is skipped on systems with a usable bmake ...

>> To recapitulate the order of events:
>>
>> 1) make buildkernel failed due to 2 missing includes of
>>   src.opts.mk. The affected files were:
>>
>>  sys/conf/kmod.mk
>>  sys/modules/drm2/Makefile
>>
>>   Adding an .include <src.opts.mk> seems to have fixed this
>>   problem. Maybe "src.opts.mk" would have been more correct,
>>   but I checked without src.opts.mk in /usr/share/mk and the
>>   file was found in src/share/mk.
> 
> I’ll look at these. I might have introduced the issues after I stopped
> building the 75 kernels in make universe after I made it through once.
> My bad...
> 
>> 2) tinderbox still complained about the test for MK_SHARED_TOOLCHAIN
>>   in bmake/Makefile.inc (I deleted the mails and thus cannot
>>   easily quote the exact error message). I tried

Re: [head tinderbox] failure on amd64/amd64

2014-05-06 Thread Stefan Esser
On 06.05.2014 16:20, FreeBSD Tinderbox wrote:
> TB --- 2014-05-06 14:20:19 - tinderbox 2.21 running on 
> freebsd-current.sentex.ca
> TB --- 2014-05-06 14:20:19 - FreeBSD freebsd-current.sentex.ca 9.2-STABLE 
> FreeBSD 9.2-STABLE #0 r263721: Tue Mar 25 09:27:39 EDT 2014 
> d...@freebsd-current.sentex.ca:/usr/obj/usr/src/sys/GENERIC  amd64
> TB --- 2014-05-06 14:20:19 - starting HEAD tinderbox run for amd64/amd64
> TB --- 2014-05-06 14:20:19 - cleaning the object tree
> TB --- 2014-05-06 14:20:19 - /usr/local/bin/svn stat --no-ignore /src
> TB --- 2014-05-06 14:20:24 - At svn revision 265446
> TB --- 2014-05-06 14:20:25 - building world
> TB --- 2014-05-06 14:20:25 - CROSS_BUILD_TESTING=YES
> TB --- 2014-05-06 14:20:25 - MAKEOBJDIRPREFIX=/obj
> TB --- 2014-05-06 14:20:25 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
> TB --- 2014-05-06 14:20:25 - SRCCONF=/dev/null
> TB --- 2014-05-06 14:20:25 - TARGET=amd64
> TB --- 2014-05-06 14:20:25 - TARGET_ARCH=amd64
> TB --- 2014-05-06 14:20:25 - TZ=UTC
> TB --- 2014-05-06 14:20:25 - __MAKE_CONF=/dev/null
> TB --- 2014-05-06 14:20:25 - cd /src
> TB --- 2014-05-06 14:20:25 - /usr/bin/make -B buildworld
 Building an up-to-date make(1)
> --
> "Makefile.inc", line 3: Could not find src.opts.mk
> make: fatal errors encountered -- cannot continue
> *** [bmake] Error code 1

I just noticed that the build errors occur when building -HEAD on 9.2.

Without checking the Makefiles, I assume that this target is skipped
on systems that have a reasonably recent bmake installed (as is the
case on my build system).

Regards, STefan


Re: buildworld fails (missing /usr/share/mk/src.opts.mk)

2014-05-06 Thread Stefan Esser
On 06.05.2014 16:15, Benjamin Kaduk wrote:
> On Tue, 6 May 2014, Stefan Esser wrote:
>> 2) tinderbox still complained about the test for MK_SHARED_TOOLCHAIN
>>   in bmake/Makefile.inc (I deleted the mails and thus cannot
>>   easily quote the exact error message). I tried to fix this by
> 
> http://lists.freebsd.org/pipermail/freebsd-current/2014-May/049744.html

Hi Ben,

thank you for the link, I did not know that the messages are archived
in real-time ...

Hmm, I was sure that there was a problem with MK_SHARED_TOOLCHAIN or
some other variable defined in src.opts.mk. I just checked my shell
history, and it must have been SOURCELESS_UCODE and MK_KERNEL_SYMBOLS
(which both made buildkernel fail).

Regards, STefan


Re: buildworld fails (missing /usr/share/mk/src.opts.mk)

2014-05-06 Thread Stefan Esser
On 06.05.2014 15:18, Warner Losh wrote:
> 
> On May 6, 2014, at 7:16 AM, Warner Losh  wrote:
> 
>> 
>> On May 6, 2014, at 6:39 AM, Stefan Esser  wrote:
>> 
>>> On 06.05.2014 13:44, Trond Endrestøl wrote:
>>>> On Tue, 6 May 2014 13:24+0200, Stefan Esser wrote:
>>>>> On 06.05.2014 11:52, Stefan Esser wrote:
>>>>>> Hi Warner,
>>>>>> 
>>>>>> as already reported by Jenkins, HEAD does not build.
>>>>>> 
>>>>>> Seems that this is caused by src.opts.mk missing in
>>>>>> /usr/share/mk during the cleandir phase. I guess this is
>>>>>> kind of a bootstrap issue - the definitions are looked up
>>>>>> in the installed base, not in the src tree - but did not
>>>>>> verify this assumption.
>>>>>> 
>>>>>> A work-around is to manually install src.opts.mk:
>>>>>> 
>>>>>> # make -C /usr/src/share/mk install
>>>>>> 
>>>>>> (which might deserve an UPDATING entry). Falling back on
>>>>>> the file in the src directory might be a better solution
>>>>>> ...
>>>>>> 
>>>>>> Regards, STefan
>>>>> 
>>>>> Following up to my earlier mail:
>>>>> 
>>>>> The diagnosis was wrong - the main Makefiles include
>>>>> src.opts.mk from the source directory. But two sub-ordinate
>>>>> Makefiles missed to include the new options file
>>>>> (sys/conf/kmod.mk and sys/modules/drm2/Makefile).
>>>>> 
>>>>> I committed a fix/work-around to stop the flood of
>>>>> tinderbox messages (r265433).
>>>> 
>>>> tinderbox still complains about usr.bin/bmake/Makefile.inc.
>>> 
>>> Hmmm, I managed to buildworld -HEAD after this patch, but it
>>> is possible, that I had src.opts.mk installed in /usr/share/mk
>>> when I started the build.
>>> 
>>> (I later deleted it, to be sure that the version in the source 
>>> directory was found and used when building modules, which the 
>>> commit actually fixed.)
>>> 
>>> I guess the remaining problem is caused by
>>> 
>>> .include "src.opts.mk"
>>> 
>>> in line 3 of src/usr.bin/bmake/Makefile.inc
>>> 
>>> Changing this line to read ".include <src.opts.mk>" seems to
>>> fix it on my system.
>>> 
>>> --- usr.bin/bmake/Makefile.inc~
>>> +++ usr.bin/bmake/Makefile.inc
>>> @@ -1,6 +1,6 @@
>>>  # $FreeBSD$
>>>
>>> -.include "src.opts.mk"
>>> +.include <src.opts.mk>
>>>
>>>  .if defined(.PARSEDIR)
>>>  # make sure this is available to unit-tests/Makefile
>>> 
>>> It is possible that the build will still fail at a later
>>> stage, though (buildworld is still running).
>>> 
>>> I committed the above patch, since it gets buildworld through
>>> the bmake subdirectory at least (r265436). If buildworld fails
>>> again, then I'll commit any further missing fixes in one go.
>>> I'll know in some 20 minutes.
>> 
>> What is your source system? This is absolutely the wrong change,
>> and shouldn’t be necessary at all. These changes survived a
>> universe run and a few build worlds on other systems.

I'm on a fresh -CURRENT (built the previous day) and with sources
as of r265439.

I agree that the change to bmake/Makefile.inc was wrong, though
it was needed to get a "make cleandir" working in that directory.

> 
> 
> so I’d like to know how to recreate it, since I didn’t see this in
> any of my testing over the last two weeks...

The tinderbox builds all fail in bmake, and while I changed
Makefile.inc to fix just that kind of problem on my system, it
may have worked by accident (because of a forgotten src.opts.mk
in /usr/share/mk - it had been installed by a previous attempt
to work around these problems).

To recapitulate the order of events:

1) make buildkernel failed due to 2 missing includes of
   src.opts.mk. The affected files were:

sys/conf/kmod.mk
sys/modules/drm2/Makefile

   Adding an .include <src.opts.mk> seems to have fixed this
   problem. Maybe "src.opts.mk" would have been more correct,
   but I checked without src.opts.mk in /usr/share/mk and the
   file was found in src/share/mk.

2) tinderbox still complained about the test for MK_SHARED_TOOLCHAIN
   in bmake/Makefile.inc (I deleted the mails and thus cannot
   easily quote the exact error message). I tried to fix this by
   changing the include syntax in bmake/Makefile.inc, but have
   just reverted this change. It made buildworld complete on my
   system, but tinderbox complains loudly.

A work-around for the second problem is to manually install
src.opts.mk in /usr/share/mk before attempting to build bmake.

Regards, STefan


Re: buildworld fails (missing /usr/share/mk/src.opts.mk)

2014-05-06 Thread Stefan Esser
On 06.05.2014 14:39, Stefan Esser wrote:
> On 06.05.2014 13:44, Trond Endrestøl wrote:
>> On Tue, 6 May 2014 13:24+0200, Stefan Esser wrote:
>>> On 06.05.2014 11:52, Stefan Esser wrote:
>>> I committed a fix/work-around to stop the flood of tinderbox messages
>>> (r265433).
>>
>> tinderbox still complains about usr.bin/bmake/Makefile.inc.
> 
> Hmmm, I managed to buildworld -HEAD after this patch, but it is
> possible, that I had src.opts.mk installed in /usr/share/mk when
> I started the build.
> 
> (I later deleted it, to be sure that the version in the source
> directory was found and used when building modules, which the
> commit actually fixed.)
> 
> I guess the remaining problem is caused by
> 
> .include "src.opts.mk"
> 
> in line 3 of src/usr.bin/bmake/Makefile.inc
> 
> Changing this line to read ".include <src.opts.mk>" seems to fix
> it on my system.
> 
> --- usr.bin/bmake/Makefile.inc~
> +++ usr.bin/bmake/Makefile.inc
> @@ -1,6 +1,6 @@
>  # $FreeBSD$
> 
> -.include "src.opts.mk"
> +.include <src.opts.mk>
> 
>  .if defined(.PARSEDIR)
>  # make sure this is available to unit-tests/Makefile
> 
> It is possible that the build will still fail at a later stage,
> though (buildworld is still running).
> 
> I committed the above patch, since it gets buildworld through the
> bmake subdirectory at least (r265436). If buildworld fails again,
> then I'll commit any further missing fixes in one go. I'll know
> in some 20 minutes.

My -HEAD buildworld completed without error for r265436,
but the tinderbox still complains at r265439.

When I had looked for the cause of the build errors, I had
modified several other Makefiles, but I reverted all these
temporary changes before the last buildworld, which succeeded.
(I had added "-I $(.CURDIR)/share/mk" to several invocations
of sub-make processes, but I checked: these have been removed,
and they already were during the successful buildworld.)

I cannot reproduce the failure in buildworld, even after
deleting src.opts.mk from /usr/share/mk to simulate a
system on which that file has not been installed yet.


But if I just go to usr.bin/bmake and try to build it on
a system without already installed src.opts.mk, I get the
error reported by tinderbox.


Reverting r265436 seems to help: "make -I /usr/src/share/mk"
in src/usr.bin/bmake finds src.opts.mk in the source directory.
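This work-around can be wrapped in a small helper that adds the -I option only when src.opts.mk is not installed yet. This is a hypothetical sketch. `src_make_flags`, `SRCTOP` and `MK_DIR` are made-up names, and the defaults assume the usual /usr/src and /usr/share/mk layout without spaces in the paths.

```sh
#!/bin/sh
# Hypothetical helper: print the extra make(1) flags needed to build a
# single subdirectory of the source tree on a system that does not yet
# have src.opts.mk installed in the system makefile directory.
SRCTOP=${SRCTOP:-/usr/src}
MK_DIR=${MK_DIR:-/usr/share/mk}

src_make_flags() {
    if [ ! -f "$MK_DIR/src.opts.mk" ]; then
        # Not installed yet: point make at the copy in the source tree.
        printf '%s\n' "-I $SRCTOP/share/mk"
    fi
    return 0
}

# Usage (unquoted on purpose, so that "-I" and the path become two words):
#   cd /usr/src/usr.bin/bmake && make $(src_make_flags)
```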

But I added that patch as the final step required to fix
buildworld on my system ...

I'll see whether the build completes with r265436 reverted
and will back this commit out, if successful.

Regards, STefan


Re: buildworld fails (missing /usr/share/mk/src.opts.mk)

2014-05-06 Thread Stefan Esser
On 06.05.2014 13:44, Trond Endrestøl wrote:
> On Tue, 6 May 2014 13:24+0200, Stefan Esser wrote:
>> On 06.05.2014 11:52, Stefan Esser wrote:
>>> Hi Warner,
>>>
>>> as already reported by Jenkins, HEAD does not build.
>>>
>>> Seems that this is caused by src.opts.mk missing in /usr/share/mk
>>> during the cleandir phase. I guess this is kind of a bootstrap
>>> issue - the definitions are looked up in the installed base, not
>>> in the src tree - but did not verify this assumption.
>>>
>>> A work-around is to manually install src.opts.mk:
>>>
>>> # make -C /usr/src/share/mk install
>>>
>>> (which might deserve an UPDATING entry). Falling back on the file
>>> in the src directory might be a better solution ...
>>>
>>> Regards, STefan
>>
>> Following up to my earlier mail:
>>
>> The diagnosis was wrong - the main Makefiles include src.opts.mk from
>> the source directory. But two subordinate Makefiles failed to include
>> the new options file (sys/conf/kmod.mk and sys/modules/drm2/Makefile).
>>
>> I committed a fix/work-around to stop the flood of tinderbox messages
>> (r265433).
> 
> tinderbox still complains about usr.bin/bmake/Makefile.inc.

Hmmm, I managed to buildworld -HEAD after this patch, but it is
possible, that I had src.opts.mk installed in /usr/share/mk when
I started the build.

(I later deleted it, to be sure that the version in the source
directory was found and used when building modules, which the
commit actually fixed.)

I guess the remaining problem is caused by

.include "src.opts.mk"

in line 3 of src/usr.bin/bmake/Makefile.inc

Changing this line to read ".include <src.opts.mk>" seems to fix
it on my system.

--- usr.bin/bmake/Makefile.inc~
+++ usr.bin/bmake/Makefile.inc
@@ -1,6 +1,6 @@
 # $FreeBSD$

-.include "src.opts.mk"
+.include <src.opts.mk>

 .if defined(.PARSEDIR)
 # make sure this is available to unit-tests/Makefile

It is possible that the build will still fail at a later stage,
though (buildworld is still running).

I committed the above patch, since it gets buildworld through the
bmake subdirectory at least (r265436). If buildworld fails again,
then I'll commit any further missing fixes in one go. I'll know
in some 20 minutes.

Regards, STefan

