Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-23 Thread Mark Millard via freebsd-stable

On 2021-May-23, at 01:27, Mark Millard  wrote:

> On 2021-May-23, at 00:44, Mark Millard  wrote:
> 
>> On 2021-May-21, at 17:56, Rick Macklem  wrote:
>> 
>>> Mark Millard wrote:
>>> [stuff snipped]
>>>> Well, why is it that ls -R, find, and diff -r all get file
>>>> name problems via genet0 but diff -r gets no problems
>>>> comparing the content of files that it does match up (the
>>>> vast majority)? Any clue how the problems could possibly
>>>> be unique to the handling of file names/paths? Does it
>>>> suggest anything else to look into for getting some more
>>>> potentially useful evidence?
>>> Well, all I can do is describe the most common TSO related
>>> failure:
>>> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>>> is slightly less than 64K bytes (many TSO implementations are
>>> limited to 64K or 32 discontiguous segments, think 32 2K
>>> mbuf clusters), the driver decides it is ok, but when the MAC
>>> header is added it exceeds what the hardware can handle correctly...
>>> --> This will happen when reading a regular file that is slightly less
>>> than a multiple of 64K in size.
>>> or
>>> --> This will happen when reading just about any large directory,
>>> since the directory reply for a 64K request is converted to Sun XDR
>>> format and clipped at the last full directory entry that will fit within
>>> 64K.
>>> For ports, where most files are small, I think you can tell which is more
>>> likely to happen.
>>> --> If TSO is disabled, I have no idea how this might matter, but??
>>> 
>>>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>>>> do not report changes in Ierrs or Idrop in a before vs.
>>>> after failures comparison. (There may be better figures
>>>> to look at for all I know.)
>>>> 
>>>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>>>> and got no obvious change in behavior.
>>> All we know is that the data is getting corrupted somehow.
>>> 
>>> NFS traffic looks very different than typical TCP traffic. It is
>>> mostly small messages travelling in both directions concurrently,
>>> with some large messages thrown in the mix.
>>> All I'm saying is that, testing a net interface with something like
>>> bulk data transfer in one direction doesn't verify it works for NFS
>>> traffic.
>>> 
>>> Also, the large RPC messages are a chain of about 33 mbufs of
>>> various lengths, including a mix of partial clusters and regular
>>> data mbufs, whereas a bulk send on a socket will typically
>>> result in an mbuf chain of a lot of full 2K clusters.
>>> --> As such, NFS can be good at tickling subtle bugs in the
>>> net driver related to mbuf handling.
>>> 
>>> rick
>>> 
>>>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>>>> committed to "main" by rscheff@ with a two week MFC, so it
>>>>> should be in stable/13 soon. Not sure if an errata can be done
>>>>> for it for releng13.0?
>>>> 
>>>> That update is reported to be causing "rack" related panics:
>>>> 
>>>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>>> 
>>>> reports (via links):
>>>> 
>>>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
>>>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>>> 
>>>> Still, I have a non-debug update to main building and will
>>>> likely do a debug build as well. llvm is rebuilding, so
>>>> the builds will take a notable time.
>> 
>> I got the following built and installed on the two
>> machines:
>> 
>> # uname -apKU
>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
>> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>>   arm64 aarch64 1400013 1400013
>> 
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
>> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>>   arm64 aarch64 1400013 1400013
>>

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-23 Thread Mark Millard via freebsd-stable

On 2021-May-23, at 00:44, Mark Millard  wrote:

> On 2021-May-21, at 17:56, Rick Macklem  wrote:
> 
>> Mark Millard wrote:
>> [stuff snipped]
>>> Well, why is it that ls -R, find, and diff -r all get file
>>> name problems via genet0 but diff -r gets no problems
>>> comparing the content of files that it does match up (the
>>> vast majority)? Any clue how the problems could possibly
>>> be unique to the handling of file names/paths? Does it
>>> suggest anything else to look into for getting some more
>>> potentially useful evidence?
>> Well, all I can do is describe the most common TSO related
>> failure:
>> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>> is slightly less than 64K bytes (many TSO implementations are
>> limited to 64K or 32 discontiguous segments, think 32 2K
>> mbuf clusters), the driver decides it is ok, but when the MAC
>> header is added it exceeds what the hardware can handle correctly...
>> --> This will happen when reading a regular file that is slightly less
>>  than a multiple of 64K in size.
>> or
>> --> This will happen when reading just about any large directory,
>> since the directory reply for a 64K request is converted to Sun XDR
>> format and clipped at the last full directory entry that will fit within 
>> 64K.
>> For ports, where most files are small, I think you can tell which is more
>> likely to happen.
>> --> If TSO is disabled, I have no idea how this might matter, but??
>> 
>>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>>> do not report changes in Ierrs or Idrop in a before vs.
>>> after failures comparison. (There may be better figures
>>> to look at for all I know.)
>>> 
>>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>>> and got no obvious change in behavior.
>> All we know is that the data is getting corrupted somehow.
>> 
>> NFS traffic looks very different than typical TCP traffic. It is
>> mostly small messages travelling in both directions concurrently,
>> with some large messages thrown in the mix.
>> All I'm saying is that, testing a net interface with something like
>> bulk data transfer in one direction doesn't verify it works for NFS
>> traffic.
>> 
>> Also, the large RPC messages are a chain of about 33 mbufs of
>> various lengths, including a mix of partial clusters and regular
>> data mbufs, whereas a bulk send on a socket will typically
>> result in an mbuf chain of a lot of full 2K clusters.
>> --> As such, NFS can be good at tickling subtle bugs in the
>> net driver related to mbuf handling.
>> 
>> rick
>> 
>>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>>> committed to "main" by rscheff@ with a two week MFC, so it
>>>> should be in stable/13 soon. Not sure if an errata can be done
>>>> for it for releng13.0?
>>> 
>>> That update is reported to be causing "rack" related panics:
>>> 
>>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>> 
>>> reports (via links):
>>> 
>>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
>>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>> 
>>> Still, I have a non-debug update to main building and will
>>> likely do a debug build as well. llvm is rebuilding, so
>>> the builds will take a notable time.
> 
> I got the following built and installed on the two
> machines:
> 
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>   arm64 aarch64 1400013 1400013
> 
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
>   arm64 aarch64 1400013 1400013
> 
> Note that both are booted with debug builds of main.
> 
> Using the context with the alternate EtherNet device that has not
> had an associated diff -r, find, or ls -R failure yet,
> I still got a panic that looks likely to be unrelated:
> 
> # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
> # diff -r /usr/ports/ /mnt/ | more
> nvme0: cpl does not map to outstanding cmd

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-23 Thread Mark Millard via freebsd-stable

On 2021-May-21, at 17:56, Rick Macklem  wrote:

> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how the problems could possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>  is slightly less than 64K bytes (many TSO implementations are
>  limited to 64K or 32 discontiguous segments, think 32 2K
>  mbuf clusters), the driver decides it is ok, but when the MAC
>  header is added it exceeds what the hardware can handle correctly...
> --> This will happen when reading a regular file that is slightly less
>   than a multiple of 64K in size.
> or
> --> This will happen when reading just about any large directory,
>  since the directory reply for a 64K request is converted to Sun XDR
>  format and clipped at the last full directory entry that will fit within 
> 64K.
> For ports, where most files are small, I think you can tell which is more
> likely to happen.
> --> If TSO is disabled, I have no idea how this might matter, but??
> 
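To make the size boundary in the TSO description above concrete, here is a minimal arithmetic sketch. The 65535-byte limit and the 14-byte Ethernet header are assumptions chosen for illustration (a given driver or NIC may use different limits), and the function name is hypothetical; nothing here is taken from the genet driver.

```python
# Illustrative arithmetic only; not taken from any driver source.
IP_MAXPACKET = 65535   # assumed TSO size limit (the "64K" in the description)
ETHER_HDR_LEN = 14     # Ethernet MAC header without a VLAN tag


def overflows_after_mac_header(tcp_ip_len, limit=IP_MAXPACKET,
                               mac_hdr=ETHER_HDR_LEN):
    """Return True when a packet passes a size check on its own but
    exceeds the limit once the MAC header is prepended."""
    return tcp_ip_len <= limit and tcp_ip_len + mac_hdr > limit


if __name__ == "__main__":
    # Sizes just below the limit are the ones that slip through.
    for size in (65000, 65521, 65522, 65535, 65536):
        print(size, overflows_after_mac_header(size))
```

Under these assumptions only replies within 14 bytes of the limit are affected, which fits the point that a directory reply clipped to just under 64K is a likely candidate.
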
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>> 
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
> 
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that, testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
> 
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
>  net driver related to mbuf handling.
> 
> rick
> 
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>> 
>> That update is reported to be causing "rack" related panics:
>> 
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>> 
>> reports (via links):
>> 
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>> 
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.

I got the following built and installed on the two
machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
  arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 
main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72
  arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.

Using the context with the alternate EtherNet device that has not
had an associated diff -r, find, or ls -R failure yet,
I still got a panic that looks likely to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0: sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
so

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-21 Thread Mark Millard via freebsd-stable

On 2021-May-21, at 09:00, Rick Macklem  wrote:

> Mark Millard wrote:
>> On 2021-May-20, at 22:19, Rick Macklem  wrote:
> [stuff snipped]
>>> ps: I do not think that r367492 could cause this, but it would be
>>> nice if you try a kernel with the r367492 patch reverted.
>>> It is currently in all of releng13, stable13 and main, although
>>> the patch to fix this was just reviewed and may hit main soon.
>> 
>> Do you want a debug kernel to be used? Do you have a preference
>> for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
>> stick to the base version things are now based on --or do you
>> want me to update to more recent? (That last only applies if
>> main or stable/13 is to be put to use.)
> Well, it sounds like you've isolated it to the genet interface.
> Good sleuthing.
> Unfortunately, NFS is only as good as the network fabric under it.
> However, it's usually hangs or poor performance. Except maybe
> for the readdir issue that Jason Bacon reported and resolved via
> an upgrade, this is a first.
> --> In the old days, I would have expected IP checksums to catch
>   this, but I'm guessing the hardware/net driver are doing them
>   these days?

Well, why is it that ls -R, find, and diff -r all get file
name problems via genet0 but diff -r gets no problems
comparing the content of files that it does match up (the
vast majority)? Any clue how the problems could possibly
be unique to the handling of file names/paths? Does it
suggest anything else to look into for getting some more
potentially useful evidence?

I'll note that netstat -I ue0 -d and netstat -I genet0 -d
do not report changes in Ierrs or Idrop in a before vs.
after failures comparison. (There may be better figures
to look at for all I know.)

I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
and got no obvious change in behavior.

> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?

That update is reported to be causing "rack" related panics:

https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html

reports (via links):

panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ 
/syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632

Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.

> Thanks for isolating this, rick
> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.

I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context) [RPi4B genet0 involved in problem]

2021-05-21 Thread Mark Millard via freebsd-stable

[Looks like the RPi4B genet0 handling is involved.]

On 2021-May-20, at 22:56, Mark Millard  wrote:
> 
> On 2021-May-20, at 22:19, Rick Macklem  wrote:
> 
>> Ok, so it isn't related to "soft".
>> I am wondering if it is something specific to what
>> "diff -r" does?
>> 
>> Could you try:
>> # cd /usr/ports
>> # ls -R > /tmp/x
>> # cd /mnt
>> # ls -R > /tmp/y
>> # cd /tmp
>> # diff -u -p x y
>> --> To see if "ls -R" finds any difference?
>> 
> 
> # diff -u -p x y 
> --- x   2021-05-20 22:35:48.021663000 -0700
> +++ y   2021-05-20 22:39:03.691936000 -0700
> @@ -227209,10 +227209,10 @@ patch-chrome_browser_background_background__mode__mana
>  patch-chrome_browser_background_background__mode__optimizer.cc
>  patch-chrome_browser_browser__resources.grd
>  patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc
> +patch-chrome_browser_chrome__browser
>  patch-chrome_browser_chrome__browser__interface__binders.cc
>  patch-chrome_browser_chrome__browser__main.cc
>  patch-chrome_browser_chrome__browser__main__linux.cc
> -patch-chrome_browser_chrome__browser__main__posix.cc
>  patch-chrome_browser_chrome__content__browser__client.cc
>  patch-chrome_browser_chrome__content__browser__client.h
>  patch-chrome_browser_crash__upload__list_crash__upload__list.cc
> 
> # find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
> /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> 
> # find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
> /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
> /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc
> 
> So: patch-chrome_browser_chrome__browser appears to be a
> truncated: patch-chrome_browser_chrome__browser__main__posix.cc
> file name and find also gets the same oddity.
> 
> (Note: This had /usr/ports in a main context and /mnt/
> referring to a release/13.0.0 context.)
> 
>> ps: I do not think that r367492 could cause this, but it would be
>> nice if you try a kernel with the r367492 patch reverted.
>> It is currently in all of releng13, stable13 and main, although
>> the patch to fix this was just reviewed and may hit main soon.
> 
> Do you want a debug kernel to be used? Do you have a preference
> for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
> stick to the base version things are now based on --or do you
> want me to update to more recent? (That last only applies if
> main or stable/13 is to be put to use.)
> 
>> . . . old history deleted . . .

I reversed the roles of the faster vs. somewhat slower
machine and so far my diff -r attempts for this found
no differences. The machines were using different types
of EtherNet devices.

So I've substituted a different EtherNet device onto
the slower machine: the same type of USB3 EtherNet
device in use on the faster machine (instead of
using the RPi4B's builtin EtherNet). So the below
testing is with both machines having a:

ugen0.6:  at usbus0
ure0 on uhub0
ure0:  on usbus0
miibus1:  on ure0
rgephy0:  PHY 0 on miibus1
rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto

in use.

I rebooted with this connected instead of the genet0
interface.

Mounting the slower machine's /usr/ports/ as /mnt/ from the faster machine:
No differences found by diff -r this way (expected result).

Mounting the faster machine's /usr/ports/ as /mnt/ from the slower machine:
No differences found by diff -r this way (expected result).

Doing diff -r's from bo

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable




On 2021-May-20, at 22:19, Rick Macklem  wrote:

> Ok, so it isn't related to "soft".
> I am wondering if it is something specific to what
> "diff -r" does?
> 
> Could you try:
> # cd /usr/ports
> # ls -R > /tmp/x
> # cd /mnt
> # ls -R > /tmp/y
> # cd /tmp
> # diff -u -p x y
> --> To see if "ls -R" finds any difference?
> 
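The same name-only comparison can be done in one pass without the intermediate /tmp/x and /tmp/y files. This is a minimal sketch, assuming both trees are reachable as ordinary paths (for example /usr/ports and /mnt); the function names are hypothetical and not part of any FreeBSD tool.

```python
import os


def _raise(err):
    # os.walk() silently skips directories it cannot read unless an
    # onerror handler is given; re-raise so a failed read is visible.
    raise err


def relative_names(root):
    """Collect every file and directory path under root, relative to root."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(root, onerror=_raise):
        for name in dirnames + filenames:
            names.add(os.path.relpath(os.path.join(dirpath, name), root))
    return names


def name_differences(local_root, nfs_root):
    """Return (names only under local_root, names only under nfs_root)."""
    local = relative_names(local_root)
    remote = relative_names(nfs_root)
    return sorted(local - remote), sorted(remote - local)


if __name__ == "__main__":
    only_local, only_nfs = name_differences("/usr/ports", "/mnt")
    for name in only_local:
        print("-", name)
    for name in only_nfs:
        print("+", name)
```

Raising on read errors is a deliberate choice here: it keeps a failed directory read from being mistaken for missing entries.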

# diff -u -p x y 
--- x   2021-05-20 22:35:48.021663000 -0700
+++ y   2021-05-20 22:39:03.691936000 -0700
@@ -227209,10 +227209,10 @@ patch-chrome_browser_background_background__mode__mana
 patch-chrome_browser_background_background__mode__optimizer.cc
 patch-chrome_browser_browser__resources.grd
 patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc
+patch-chrome_browser_chrome__browser
 patch-chrome_browser_chrome__browser__interface__binders.cc
 patch-chrome_browser_chrome__browser__main.cc
 patch-chrome_browser_chrome__browser__main__linux.cc
-patch-chrome_browser_chrome__browser__main__posix.cc
 patch-chrome_browser_chrome__content__browser__client.cc
 patch-chrome_browser_chrome__content__browser__client.h
 patch-chrome_browser_crash__upload__list_crash__upload__list.cc

# find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

# find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

So: patch-chrome_browser_chrome__browser appears to be a
truncated: patch-chrome_browser_chrome__browser__main__posix.cc
file name and find also gets the same oddity.
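
A check for this specific pattern (a name seen only over NFS that is a strict prefix of a name seen only locally) can be sketched as follows. This is a minimal sketch, assuming the two directory listings are already available as lists of names; the function name is hypothetical.

```python
def likely_truncations(local_names, nfs_names):
    """Pair each name seen only over NFS with every local-only name
    that it is a strict prefix of."""
    local_only = set(local_names) - set(nfs_names)
    nfs_only = set(nfs_names) - set(local_names)
    pairs = []
    for short in sorted(nfs_only):
        for full in sorted(local_only):
            if full != short and full.startswith(short):
                pairs.append((short, full))
    return pairs


if __name__ == "__main__":
    # Names from the www/chromium/files listings shown above.
    local = [
        "patch-chrome_browser_chrome__browser__main__posix.cc",
        "patch-chrome_browser_chrome__browser__main.cc",
        "patch-chrome_browser_chrome__browser__main__linux.cc",
        "patch-chrome_browser_chrome__browser__interface__binders.cc",
    ]
    nfs = [
        "patch-chrome_browser_chrome__browser",
        "patch-chrome_browser_chrome__browser__main.cc",
        "patch-chrome_browser_chrome__browser__main__linux.cc",
        "patch-chrome_browser_chrome__browser__interface__binders.cc",
    ]
    for short, full in likely_truncations(local, nfs):
        print(short, "<-", full)
```

A prefix match only suggests truncation; it does not prove that the short name came from that particular full name.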

(Note: This had /usr/ports in a main context and /mnt/
referring to a release/13.0.0 context.)

> ps: I do not think that r367492 could cause this, but it would be
> nice if you try a kernel with the r367492 patch reverted.
> It is currently in all of releng13, stable13 and main, although
> the patch to fix this was just reviewed and may hit main soon.

Do you want a debug kernel to be used? Do you have a preference
for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
stick to the base version things are now based on --or do you
want me to update to more recent? (That last only applies if
main or stable/13 is to be put to use.)

> . . . old history deleted . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable

[Direct drive connection to machine: no problem.]

On 2021-May-20, at 21:40, Mark Millard  wrote:

> [main test example and main/releng/13 mixed example]
> 
> On 2021-May-20, at 20:36, Mark Millard  wrote:
> 
>> [stable/13 test: example ends up being odder. That might
>> allow eliminating some potential alternatives.]
>> 
>> On 2021-May-20, at 19:38, Mark Millard  wrote:
>>> 
>>> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>>>> 
>>>> Oh, one additional thing that I'll dare to top post...
>>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>>> This has not yet been resolved in "main" etc and could explain
>>>> why an RPC could time out for a soft mount.
>>> 
>>> See later notes that I added: soft mount is not required
>>> to see the problem.
>>> 
>>>> You can revert the patch in r367492 to avoid the problem.
>>> 
>>> If I understand right, you are indicating that this would
>>> not apply to the non-soft mount case that I got.
>>> 
>>>> Disabling TSO, LRO are also de-facto standard things to do when
>>>> you observe weird NFS  behaviour, because they are often broken
>>>> in various network device drivers.
>>> 
>>> I'll have to figure out how to experiment with such. Things
>>> are at defaults rather generally on the systems. I'm not
>>> literate in the subject areas.
>>> 
>>> I'm the only user of the machines and network. It is not
>>> outward facing. It is a rather small EtherNet network.
>>> 
>>>> rick
>>>> 
>>>> 
>>>> From: owner-freebsd-sta...@freebsd.org  
>>>> on behalf of Rick Macklem 
>>>> Sent: Thursday, May 20, 2021 8:55 PM
>>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>>>> (in a zfs file systems context)
>>>> 
>>>> Mark Millard wrote:
>>>>> [I warn that I'm a fairly minimal user of NFS
>>>>> mounts, not knowing all that much. I'm mostly
>>>>> reporting this in case it ends up as evidence
>>>>> via eventually matching up with others observing
>>>>> possibly related oddities.]
>>>>> 
>>>>> I got the following odd sequence (that I've
>>>>> mixed notes into). It involved a diff -r over NFS
>>>>> showing differences (files missing) and then a
>>>>> later diff finding matches for the same files,
>>>>> no file system changes made on either machine.
>>>>> I'm unable to reproduce the oddity on demand.
>>>>> 
>>>>> Note: A larger scope diff -r originally returned the
>>>>> below as well, but doing the narrower diff -r did
>>>>> repeat the result and that is what I show. (I
>>>>> make no use of devel/ice .)
>>>>> 
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>>> . . .
>>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>>> 
>>>>> Note: The above was not expected. So I tried:
>>>>> 
>>>>> # ls -Tld /mnt/devel/ice/files/*
>>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>>>> /mnt/devel/ice/files/Make.rules.FreeBSD
>>> . . .
>>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>>> 
>>>>> Note: So that indicated that the files were there on the
>>>>> machine that /mnt references. So attempting the original
>>>>> diff -r again:
>>>>> 
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> #
>>>>> 
>>>>> (Empty difference.)
>>>>> 
>>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>>> the odd result of the diff -r no longer happened: no
>>>>> differences reported.
>>>>> 
>>>>> 
>>>>> 
>>>>> For reference (both machines reported):
>>>>> 
>>>>> . . .
>>>>> The original mount command w

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable

[main test example and main/releng/13 mixed example]

On 2021-May-20, at 20:36, Mark Millard  wrote:

> [stable/13 test: example ends up being odder. That might
> allow eliminating some potential alternatives.]
> 
> On 2021-May-20, at 19:38, Mark Millard  wrote:
>> 
>> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>>> 
>>> Oh, one additional thing that I'll dare to top post...
>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>> This has not yet been resolved in "main" etc and could explain
>>> why an RPC could time out for a soft mount.
>> 
>> See later notes that I added: soft mount is not required
>> to see the problem.
>> 
>>> You can revert the patch in r367492 to avoid the problem.
>> 
>> If I understand right, you are indicating that this would
>> not apply to the non-soft mount case that I got.
>> 
>>> Disabling TSO, LRO are also de-facto standard things to do when
>>> you observe weird NFS  behaviour, because they are often broken
>>> in various network device drivers.
>> 
>> I'll have to figure out how to experiment with such. Things
>> are at defaults rather generally on the systems. I'm not
>> literate in the subject areas.
>> 
>> I'm the only user of the machines and network. It is not
>> outward facing. It is a rather small EtherNet network.
>> 
>>> rick
>>> 
>>> 
>>> From: owner-freebsd-sta...@freebsd.org  
>>> on behalf of Rick Macklem 
>>> Sent: Thursday, May 20, 2021 8:55 PM
>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>>> (in a zfs file systems context)
>>> 
>>> Mark Millard wrote:
>>>> [I warn that I'm a fairly minimal user of NFS
>>>> mounts, not knowing all that much. I'm mostly
>>>> reporting this in case it ends up as evidence
>>>> via eventually matching up with others observing
>>>> possibly related oddities.]
>>>> 
>>>> I got the following odd sequence (that I've
>>>> mixed notes into). It involved a diff -r over NFS
>>>> showing differences (files missing) and then a
>>>> later diff finding matches for the same files,
>>>> no file system changes made on either machine.
>>>> I'm unable to reproduce the oddity on demand.
>>>> 
>>>> Note: A larger scope diff -r originally returned the
>>>> below as well, but doing the narrower diff -r did
>>>> repeat the result and that is what I show. (I
>>>> make no use of devel/ice .)
>>>> 
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>> . . .
>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>> 
>>>> Note: The above was not expected. So I tried:
>>>> 
>>>> # ls -Tld /mnt/devel/ice/files/*
>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>>> /mnt/devel/ice/files/Make.rules.FreeBSD
>> . . .
>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>> 
>>>> Note: So that indicated that the files were there on the
>>>> machine that /mnt references. So attempting the original
>>>> diff -r again:
>>>> 
>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>> #
>>>> 
>>>> (Empty difference.)
>>>> 
>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>> the odd result of the diff -r no longer happened: no
>>>> differences reported.
>>>> 
>>>> 
>>>> 
>>>> For reference (both machines reported):
>>>> 
>>>> . . .
>>>> The original mount command was on CA72_16Gp_ZFS:
>>>> 
>>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>>> The likely explanation for this is your use of a "soft" mount.
>>> - If the NFS server is slow to respond or there is a temporary network 
>>> issue,
>>> the RPC request can time out and then the
>>> syscall can fail with EINTR/ETIMEDOUT. Since almost nothing, including the
>>>  readdir(3) libc functions ex

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable

[stable/13 test: example ends up being odder. That might
allow eliminating some potential alternatives.]

On 2021-May-20, at 19:38, Mark Millard  wrote:
> 
> On 2021-May-20, at 18:09, Rick Macklem  wrote:
>> 
>> Oh, one additional thing that I'll dare to top post...
>> r367492 broke the TCP upcalls that the NFS server uses, such
>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>> This has not yet been resolved in "main" etc and could explain
>> why an RPC could time out for a soft mount.
> 
> See later notes that I added: soft mount is not required
> to see the problem.
> 
>> You can revert the patch in r367492 to avoid the problem.
> 
> If I understand right, you are indicating that this would
> not apply to the non-soft mount case that I got.
> 
>> Disabling TSO, LRO are also de-facto standard things to do when
>> you observe weird NFS  behaviour, because they are often broken
>> in various network device drivers.
> 
> I'll have to figure out how to experiment with such. Things
> are at defaults rather generally on the systems. I'm not
> literate in the subject areas.
> 
> I'm the only user of the machines and network. It is not
> outward facing. It is a rather small Ethernet network.
> 
>> rick
>> 
>> 
>> From: owner-freebsd-sta...@freebsd.org  on 
>> behalf of Rick Macklem 
>> Sent: Thursday, May 20, 2021 8:55 PM
>> To: FreeBSD-STABLE Mailing List; Mark Millard
>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
>> (in a zfs file systems context)
>> 
>> Mark Millard wrote:
>>> [I warn that I'm a fairly minimal user of NFS
>>> mounts, not knowing all that much. I'm mostly
>>> reporting this in case it ends up as evidence
>>> via eventually matching up with others observing
>>> possibly related oddities.]
>>> 
>>> I got the following odd sequence (that I've
>>> mixed notes into). It involved a diff -r over NFS
>>> showing differences (files missing) and then a
>>> later diff finding matches for the same files,
>>> no file system changes made on either machine.
>>> I'm unable to reproduce the oddity on demand.
>>> 
>>> Note: A larger scope diff -r originally returned the
>>> below as well, but doing the narrower diff -r did
>>> repeat the result and that is what I show. (I
>>> make no use of devel/ice .)
>>> 
>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
> . . .
>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>> 
>>> Note: The above was not expected. So I tried:
>>> 
>>> # ls -Tld /mnt/devel/ice/files/*
>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>>> /mnt/devel/ice/files/Make.rules.FreeBSD
> . . .
>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>> 
>>> Note: So that indicated that the files were there on the
>>> machine that /mnt references. So attempting the original
>>> diff -r again:
>>> 
>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>> #
>>> 
>>> (Empty difference.)
>>> 
>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>> the odd result of the diff -r no longer happened: no
>>> differences reported.
>>> 
>>> 
>>> 
>>> For reference (both machines reported):
>>> 
>>> . . .
>>> The original mount command was on CA72_16Gp_ZFS:
>>> 
>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>> The likely explanation for this is your use of a "soft" mount.
>> - If the NFS server is slow to respond or there is a temporary network issue,
>>  the RPC request can time out and then the
>>  syscall can fail with EINTR/ETIMEDOUT. Since almost nothing, including the
>>   readdir(3) libc functions expect syscalls to fail this way...
>>   Then the cached directory is messed up.
>>   Doing the "ls" read the directory again and fixed the problem.
>> 
>> Try to reproduce it for a mount without the "soft" option.
>> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
>> can usually get rid of it.)
>> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 
>> 1985
>> and I still feel that

Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable




> On 2021-May-20, at 18:09, Rick Macklem  wrote:
> 
> Oh, one additional thing that I'll dare to top post...
> r367492 broke the TCP upcalls that the NFS server uses, such
> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
> This has not yet been resolved in "main" etc and could explain
> why an RPC could time out for a soft mount.

See later notes that I added: soft mount is not required
to see the problem.

> You can revert the patch in r367492 to avoid the problem.

If I understand right, you are indicating that this would
not apply to the non-soft mount case that I got.

> Disabling TSO, LRO are also de-facto standard things to do when
> you observe weird NFS  behaviour, because they are often broken
> in various network device drivers.

I'll have to figure out how to experiment with such. Things
are at defaults rather generally on the systems. I'm not
literate in the subject areas.

I'm the only user of the machines and network. It is not
outward facing. It is a rather small Ethernet network.

> rick
> 
> 
> From: owner-freebsd-sta...@freebsd.org  on 
> behalf of Rick Macklem 
> Sent: Thursday, May 20, 2021 8:55 PM
> To: FreeBSD-STABLE Mailing List; Mark Millard
> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs 
> (in a zfs file systems context)
> 
> Mark Millard wrote:
>> [I warn that I'm a fairly minimal user of NFS
>> mounts, not knowing all that much. I'm mostly
>> reporting this in case it ends up as evidence
>> via eventually matching up with others observing
>> possibly related oddities.]
>> 
>> I got the following odd sequence (that I've
>> mixed notes into). It involved a diff -r over NFS
>> showing differences (files missing) and then a
>> later diff finding matches for the same files,
>> no file system changes made on either machine.
>> I'm unable to reproduce the oddity on demand.
>> 
>> Note: A larger scope diff -r originally returned the
>> below as well, but doing the narrower diff -r did
>> repeat the result and that is what I show. (I
>> make no use of devel/ice .)
>> 
>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
. . .
>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>> 
>> Note: The above was not expected. So I tried:
>> 
>> # ls -Tld /mnt/devel/ice/files/*
>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 
>> /mnt/devel/ice/files/Make.rules.FreeBSD
. . .
>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 
>> /mnt/devel/ice/files/patch-scripts-TestUtil.py
>> 
>> Note: So that indicated that the files were there on the
>> machine that /mnt references. So attempting the original
>> diff -r again:
>> 
>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>> #
>> 
>> (Empty difference.)
>> 
>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>> the odd result of the diff -r no longer happened: no
>> differences reported.
>> 
>> 
>> 
>> For reference (both machines reported):
>> 
>> . . .
>> The original mount command was on CA72_16Gp_ZFS:
>> 
>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
> The likely explanation for this is your use of a "soft" mount.
> - If the NFS server is slow to respond or there is a temporary network issue,
>   the RPC request can time out and then the
>   syscall can fail with EINTR/ETIMEDOUT. Since almost nothing, including the
>   readdir(3) libc functions expect syscalls to fail this way...
>   Then the cached directory is messed up.
>   Doing the "ls" read the directory again and fixed the problem.
> 
> Try to reproduce it for a mount without the "soft" option.
> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
> can usually get rid of it.)
> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 
> 1985
> and I still feel that way.
> --> If you can reproduce it without "soft" then I can't explain it.
>  To be honest, the directory reading/caching code in the NFSv3 client
>  hasn't changed significantly in literally decades, as far as I can 
> remember.

Well . . . trying an even wider scope diff than
the original . . .

# umount /mnt/
# mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
Only in /usr/ports/databases/mongodb42/files/aarch64: 
patch-src_th

releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)

2021-05-20 Thread Mark Millard via freebsd-stable

fbsd-based-on-what-commit.sh 
branch: releng/13.0
merge-base: ea31abc261ffc01b6ff5671bffb15cf910a07f4b
merge-base: CommitDate: 2021-04-09 00:14:30 +
ea31abc261ff (HEAD -> releng/13.0, tag: release/13.0.0, freebsd/releng/13.0) 
13.0: update to RELEASE
n244733 (--first-parent --count for merge-base)

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 
releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
  arm64 aarch64 1300139 1300139

# ~/fbsd-based-on-what-commit.sh 
branch: releng/13.0
merge-base: ea31abc261ffc01b6ff5671bffb15cf910a07f4b
merge-base: CommitDate: 2021-04-09 00:14:30 +
ea31abc261ff (HEAD -> releng/13.0, tag: release/13.0.0, freebsd/releng/13.0) 
13.0: update to RELEASE
n244733 (--first-parent --count for merge-base)

From zfs list commands (one machine per line shown):

zopt0/usr/ports   2.13G   236G 2.13G  /usr/ports
zroot/usr/ports   2.13G   113G 2.13G  /usr/ports

I've no clue if ZFS is important to the oddity
or not.

The original mount command was on CA72_16Gp_ZFS:

# mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/

The network is just a local Ethernet.
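
Since the oddity cannot be reproduced on demand, one way to gather evidence is to list the same tree more than once and record any names that show up in one pass but not another, alongside the usual "only in one side" comparison. The following is a minimal sketch of that idea; the function names are hypothetical, and it only looks at file names (not content), which is where the problem was observed. A later pass may be answered from the NFS client's cache, so a clean result does not prove much.

```python
import os

def names_under(root):
    """Return the set of paths (relative to root) seen by one os.walk pass.

    Note: os.walk silently skips directories it fails to list unless an
    onerror callback is supplied, so listing errors are not reported here.
    """
    seen = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            seen.add(os.path.normpath(os.path.join(rel, name)))
    return seen

def only_in(left, right):
    """Names present under only one of the two trees (like diff -r's 'Only in')."""
    l, r = names_under(left), names_under(right)
    return sorted(l - r), sorted(r - l)

def unstable_names(root, passes=2):
    """Names that appear in some passes over root but not in all of them."""
    listings = [names_under(root) for _ in range(passes)]
    return set.union(*listings) - set.intersection(*listings)
```

For the case reported here this could be called as only_in("/usr/ports/devel/ice/files", "/mnt/devel/ice/files") and unstable_names("/mnt/devel/ice/files"); a non-empty result from the second call would indicate that the listing itself changed between passes with no file system changes made.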

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Fresh releng/13.0 release/13.0.0 install: "newsyslog: malformed 'at' value" messages

2021-05-06 Thread Mark Millard via freebsd-stable

Having used bsdinstall to make a USB3 SSD on a RPi4B
(zfs-on-root, GPT partition, RPi4B materials copied
to msdos file system), booting gets error notices:

newsyslog: malformed 'at' value:
/var/log/all.log      600  7   *     @T00    J

newsyslog: malformed 'at' value:
/var/log/auth.log     600  7   1000  @0101T  JC

newsyslog: malformed 'at' value:
/var/log/daily.log    640  7   *     @T00    JN

newsyslog: malformed 'at' value:
/var/log/maillog      640  7   *     @T00    JC

newsyslog: malformed 'at' value:
/var/log/messages     644  5   1000  @0101T  JC

newsyslog: malformed 'at' value:
/var/log/utx.log      644  3   *     @01T05  B

newsyslog: malformed 'at' value:
/var/log/daemon.log   644  5   1000  @0101T  JC

It is apparently complaining about some of the
content in:

# more /etc/newsyslog.conf
# configuration file for newsyslog
# $FreeBSD$
#
# Entries which do not specify the '/pid_file' field will cause the
# syslogd process to be signalled when that log file is rotated.  This
# action is only appropriate for log files which are written to by the
# syslogd process (ie, files listed in /etc/syslog.conf).  If there
# is no process which needs to be signalled when a given log file is
# rotated, then the entry for that file should include the 'N' flag.
#
# Note: some sites will want to select more restrictive protections than the
# defaults.  In particular, it may be desirable to switch many of the 644
# entries to 640 or 600.  For example, some sites will consider the
# contents of maillog, messages, and lpd-errs to be confidential.  In the
# future, these defaults may change to more conservative ones.
#
# logfilename          [owner:group]  mode count size when   flags [/pid_file] [sig_num]
/var/log/all.log                      600  7     *    @T00   J
/var/log/auth.log                     600  7     1000 @0101T JC
/var/log/console.log                  600  5     1000 *      J
/var/log/cron                         600  3     1000 *      JC
/var/log/daily.log                    640  7     *    @T00   JN
/var/log/debug.log                    600  7     1000 *      JC
/var/log/init.log                     644  3     1000 *      J
/var/log/kerberos.log                 600  7     1000 *      J
/var/log/maillog                      640  7     *    @T00   JC
/var/log/messages                     644  5     1000 @0101T JC
/var/log/monthly.log                  640  12    *    $M1D0  JN
/var/log/devd.log                     644  3     1000 *      JC
/var/log/security                     600  10    1000 *      JC
/var/log/utx.log                      644  3     *    @01T05 B
/var/log/weekly.log                   640  5     *    $W6D0  JN
/var/log/daemon.log                   644  5     1000 @0101T JC

<include> /etc/newsyslog.conf.d/[!.]*.conf
<include> /usr/local/etc/newsyslog.conf.d/[!.]*.conf


Specifically, the 7 lines with "@" involved under "when" get the
complaints.
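
To make the "7 lines" observation easy to re-check, here is a small sketch that picks out the entries whose "when" field starts with "@". The entries are re-typed from the listing above with the columns separated; the field positions follow the "logfilename [owner:group] mode count size when flags" header, and the helper name at_entries is made up for this illustration. It does not attempt to reproduce newsyslog's own parsing or explain why the values are rejected.

```python
# Entries re-typed from /etc/newsyslog.conf as shown above.
CONF = """\
/var/log/all.log      600  7   *     @T00    J
/var/log/auth.log     600  7   1000  @0101T  JC
/var/log/console.log  600  5   1000  *       J
/var/log/cron         600  3   1000  *       JC
/var/log/daily.log    640  7   *     @T00    JN
/var/log/debug.log    600  7   1000  *       JC
/var/log/init.log     644  3   1000  *       J
/var/log/kerberos.log 600  7   1000  *       J
/var/log/maillog      640  7   *     @T00    JC
/var/log/messages     644  5   1000  @0101T  JC
/var/log/monthly.log  640  12  *     $M1D0   JN
/var/log/devd.log     644  3   1000  *       JC
/var/log/security     600  10  1000  *       JC
/var/log/utx.log      644  3   *     @01T05  B
/var/log/weekly.log   640  5   *     $W6D0   JN
/var/log/daemon.log   644  5   1000  @0101T  JC
"""

def at_entries(conf_text):
    """Return (logfile, when) pairs whose 'when' field begins with '@'."""
    hits = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and <include> style directives.
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        fields = line.split()
        # An optional owner:group field (containing ':') shifts the rest by one.
        mode_idx = 2 if len(fields) > 1 and ":" in fields[1] else 1
        when_idx = mode_idx + 3  # mode, count, size, then when
        if len(fields) > when_idx and fields[when_idx].startswith("@"):
            hits.append((fields[0], fields[when_idx]))
    return hits

for logfile, when in at_entries(CONF):
    print(logfile, when)
```

The entries it selects are the same seven log files named in the boot-time notices; the "$M1D0" and "$W6D0" entries and the "*" entries are not selected.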


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: zpool list -p 's FREE vs. zfs list -p's AVAIL ? FREE-AVAIL == 6_675_374_080 (199G zroot pool)

2021-05-05 Thread Mark Millard via freebsd-stable




On 2021-May-5, at 17:01, Yuri Pankov  wrote:

> Mark Millard via freebsd-current wrote:
>> Context:
>> 
>> # gpart show -pl da0
>> =>       40  468862048    da0  GPT  (224G)
>>          40     532480  da0p1  efiboot0  (260M)
>>      532520       2008         - free -  (1.0M)
>>      534528   25165824  da0p2  swp12a  (12G)
>>    25700352   25165824  da0p4  swp12b  (12G)
>>    50866176  417994752  da0p3  zfs0  (199G)
>>   468860928       1160         - free -  (580K)
>> 
>> There is just one pool: zroot and it is on zfs0 above.
>> 
>> # zpool list -p
>> NAME           SIZE        ALLOC          FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>> zroot  213674622976  71075655680  142598967296        -         -    28   33   1.00  ONLINE  -
>> 
>> So FREE: 142_598_967_296
>> (using _ to make it more readable)
>> 
>> # zfs list -p zroot 
>> NAME  USED AVAIL REFER  MOUNTPOINT
>> zroot  71073697792  135923593216 98304  /zroot
>> 
>> So AVAIL: 135_923_593_216
>> 
>> FREE-AVAIL == 6_675_374_080
>> 
>> 
>> 
>> The questions:
>> 
>> Is this sort of unavailable pool-free-space normal?
>> Is this some sort of expected overhead that just is
>> not explicitly reported? Possibly a "FRAG"
>> consequence?
> 
> From zpoolprops(8):
> 
> free    The amount of free space available in the pool.  By contrast,
>         the zfs(8) available property describes how much new data can be
>         written to ZFS filesystems/volumes.  The zpool free property is
>         not generally useful for this purpose, and can be substantially
>         more than the zfs available space. This discrepancy is due to
>         several factors, including raidz parity; zfs reservation, quota,
>         refreservation, and refquota properties; and space set aside by
>         spa_slop_shift (see zfs-module-parameters(5) for more
>         information).

Thanks for pointing to the reference material.

6_675_374_080/213_674_622_976 =approx= 0.03124 =approx= 1.0/32.0

and spa_slop_shift's description reports:

QUOTE
   spa_slop_shift (int)
   Normally, we don't allow the last 3.2%
   (1/(2^spa_slop_shift)) of space in the pool to be consumed.
   This ensures that we don't run the pool completely out of
   space, due to unaccounted changes (e.g. to the MOS).  It
   also limits the worst-case time to allocate space.  If we
   have less than this amount of free space, most ZPL
   operations (e.g. write, create) will return ENOSPC.

   Default value: 5.
END QUOTE

So in my simple context, apparently not much else
contributes and the figures are basically as
expected.
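
The arithmetic can be re-checked with the figures reported in this thread. This is only a sketch: it assumes the slop space is simply SIZE >> spa_slop_shift with the default shift of 5, which is a simplification of whatever the ZFS code actually does, and the variable names are just labels for the columns shown above.

```python
# Figures copied from the "zpool list -p" and "zfs list -p zroot" output.
SIZE = 213_674_622_976
ALLOC = 71_075_655_680
FREE = 142_598_967_296
USED = 71_073_697_792
AVAIL = 135_923_593_216

SPA_SLOP_SHIFT = 5  # documented default; assumed to be in effect here

slop = SIZE >> SPA_SLOP_SHIFT   # 1/32 of the pool size
gap = FREE - AVAIL              # the difference asked about
accounting = ALLOC - USED       # pool-level vs. dataset-level usage

print(f"SIZE/32:      {slop:_}")
print(f"FREE - AVAIL: {gap:_}")
print(f"ALLOC - USED: {accounting:_}")
print(f"gap/SIZE:     {gap / SIZE:.5f}")
print("gap + (ALLOC - USED) == SIZE/32:", gap + accounting == slop)
```

With these numbers, FREE - AVAIL plus the small ALLOC - USED difference equals SIZE/32 exactly, which fits the reading that not much besides the slop reservation contributes in this simple single-disk context.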

Thanks again.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


zpool list -p 's FREE vs. zfs list -p's AVAIL ? FREE-AVAIL == 6_675_374_080 (199G zroot pool)

2021-05-05 Thread Mark Millard via freebsd-stable

Context:

# gpart show -pl da0
=>       40  468862048    da0  GPT  (224G)
         40     532480  da0p1  efiboot0  (260M)
     532520       2008         - free -  (1.0M)
     534528   25165824  da0p2  swp12a  (12G)
   25700352   25165824  da0p4  swp12b  (12G)
   50866176  417994752  da0p3  zfs0  (199G)
  468860928       1160         - free -  (580K)

There is just one pool: zroot and it is on zfs0 above.

# zpool list -p
NAME           SIZE        ALLOC          FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zroot  213674622976  71075655680  142598967296        -         -    28   33   1.00  ONLINE  -

So FREE: 142_598_967_296
(using _ to make it more readable)

# zfs list -p zroot 
NAME  USED AVAIL REFER  MOUNTPOINT
zroot  71073697792  135923593216 98304  /zroot

So AVAIL: 135_923_593_216

FREE-AVAIL == 6_675_374_080



The questions:

Is this sort of unavailable pool-free-space normal?
Is this some sort of expected overhead that just is
not explicitly reported? Possibly a "FRAG"
consequence?


For reference:

# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:31:48 with 0 errors on Sun May  2 19:52:14 2021
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0p3     ONLINE       0     0     0

errors: No known data errors


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: ZFS rename with associated snapshot present: odd error message

2021-05-05 Thread Mark Millard via freebsd-stable

On 2021-May-5, at 05:28, Mark Millard  wrote:

> On 2021-May-5, at 02:47, Andriy Gapon  wrote:
> 
>> On 05/05/2021 01:59, Mark Millard via freebsd-current wrote:
>>> I had a:
>>> # zfs list -tall
>>> NAME   USED  AVAIL REFER  MOUNTPOINT
>>> . . .
>>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm  1.44G   117G   96K 
>>>  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
>>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style  1.44G  - 1.44G 
>>>  -. . .
>>> . . .
>>> (copied/pasted from somewhat earlier) and then attempted:
>>> # zfs rename zroot/DESTDIRs/13_0R-CA72-instwrld-norm 
>>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0
>>> cannot open 'zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style': snapshot 
>>> delimiter '@' is not expected here
>>> Despite the "cannot open" message, the result looks like:
>>> # zfs list -tall
>>> NAME   USED  AVAIL 
>>> REFER  MOUNTPOINT
>>> . . .
>>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0  1.44G   114G  
>>>  96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt-0
>>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0@dirty-style  1.44G  - 
>>> 1.44G  -
>>> . . .
>>> Still, it leaves me wondering if everything is okay
>>> given that internal attempt to use the old name with
>>> @dirty-style when it was apparently no longer
>>> available under that naming.
>>> For reference:
>>> # uname -apKU
>>> FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 
>>> releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 
>>> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
>>>   arm64 aarch64 1300139 1300139
>> 
>> Cannot reproduce here (but with much simpler names and on stable/13):
>> zfs create testz/test
>> zfs snapshot testz/test@snap1
>> zfs rename testz/test testz/test2
>> 
>> All worked.
>> 
> 
> I've noticed that sometimes in my explorations it has been
> silent instead of complaining. I've no clue at this point
> what prior activity (or lack of activity) determines
> whether a message will be generated.

One difference in context is that your above sort of sequence
generates the after-snapshot context (using some things I have
around now):

zroot/DESTDIRs/13_0R-CA53-poud 1.45G   127G 1.45G  
/usr/obj/DESTDIRs/13_0R-CA53-poud
zroot/DESTDIRs/13_0R-CA53-poud@test   0B  - 1.45G  -

where my example had something more like (hand edited
the above just for illustration):

zroot/DESTDIRs/13_0R-CA53-poud 1.45G   125G   96K  
/usr/obj/DESTDIRs/13_0R-CA53-poud
zroot/DESTDIRs/13_0R-CA53-poud@test1.45G  - 1.45G  -

before the rename. In other words, I'd updated the
original (almost?) completely after the snapshot
(as a side effect of my overall activity). It was
only later that I tried the rename to track a new
purpose/context that I was going to switch to.

I'm not claiming that such is sufficient to
(always? ever?) reproduce the message. I'm
just pointing out that I'd had some significant
activity on the writable file system before the
rename.

Some of my activity has been more like your test
and I'd not seen the problems from such. But it
is not a very good comparison/contrast context
so I'd not infer much. I still can not at-will
set up a context to produce the messages.



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [aarch64 test did not reproduce the issue]

2021-05-05 Thread Mark Millard via freebsd-stable

On 2021-May-4, at 20:26, Mark Millard  wrote:

> On 2021-May-4, at 13:38, Mark Millard  wrote:
> 
>> [The first buildworld is still in process. So while waiting . . .]
>> 
>> On 2021-May-4, at 10:31, Mark Millard  wrote:
>> 
>>> I probably know why the huge count of differences this time
>>> unlike the original report . . .
>>> 
>>> Previously I built based on a checked-in branch as part of
>>> my experimenting. This time it was in a -dirty form (not
>>> checked in), again as part of my experimental exploration.
>>> 
>>> WITH_REPRODUCIBLE_BUILD= makes a distinction between these
>>> if I remember right: (partially?) disabling itself for
>>> -dirty style.
>>> 
>>> To reproduce the original style of test I need to create
>>> a branch with my few patches checked in and do the
>>> buildworlds from that branch.
>>> 
>>> This will, of course, take a while.
>>> 
>>> Sorry for the noise.
>>> 
>> 
>> I've confirmed some of the details of the large number of
>> files with differences while waiting for the 1st buildworld:
>> 
>> The 4 bytes at the end of the .gnu_debuglink section
>> that are ending up different are the checksum for the
>> .debug file. The .debug files have differences such as:
>> 
>> │ -<1a>   DW_AT_comp_dir: (indirect string) 
>> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
>> │ +<1a>   DW_AT_comp_dir: (indirect string) 
>> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
>> 
>> So I need to build, snapshot (in case need
>> to reference), install, clean-out, build,
>> install elsewhere, compare. (Or analogous
>> that uses the same build base-path for both
>> installs despite separate buildworld's.)
>> This is separate from any potential -dirty
>> vs. checked-in handling variation by
>> WITH_REPRODUCIBLE_BUILD= .
>> 
>> My process that produced the original armv7
>> report happened to do that before I accidentally
>> discovered the presence of the few files with
>> differences. My new experiments were different
>> and I'd not thought of needing to vary the
>> procedure to get you the right evidence.
>> 
> 
> The two aarch64 test installs did not show any
> differences in a "diff -rq" . Ignoring *.meta
> files generated during the builds, the build
> directory tree snapshots showed just the
> differences:
> 
> # diff -rq 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr 
> | grep -v '\.meta' | more
> Files 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
>  and 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
>  differ
> Files 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk
>  and 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk
>  differ
> 
> # diff -u 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
>  
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
> --- 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
>  2021-05-04 13:45:14.463351000 -0700
> +++ 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
>  2021-05-04 19:04:32.338203000 -0700
> @@ -4,7 +4,7 @@
> ** Words from CORE set written in FICL
> ** Author: John Sadler (john_sad...@alum.mit.edu)
> ** Created: 27 December 1997
> -** Last update: Tue May  4 13:45:14 PDT 2021
> +** Last update: Tue May  4 19:04:32 PDT 2021
> ***/
> /*
> ** DO NOT EDIT THIS FILE -- it is generated by softwords/softcore.awk
> 
> # diff -u 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk
>  
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk
> --- 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/sn

Re: ZFS rename with associated snapshot present: odd error message

2021-05-05 Thread Mark Millard via freebsd-stable




On 2021-May-5, at 02:47, Andriy Gapon  wrote:

> On 05/05/2021 01:59, Mark Millard via freebsd-current wrote:
>> I had a:
>> # zfs list -tall
>> NAME   USED  AVAIL REFER  MOUNTPOINT
>> . . .
>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm  1.44G   117G   96K  
>> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style  1.44G  - 1.44G  
>> -. . .
>> . . .
>> (copied/pasted from somewhat earlier) and then attempted:
>> # zfs rename zroot/DESTDIRs/13_0R-CA72-instwrld-norm 
>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0
>> cannot open 'zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style': snapshot 
>> delimiter '@' is not expected here
>> Despite the "cannot open" message, the result looks like:
>> # zfs list -tall
>> NAME   USED  AVAIL 
>> REFER  MOUNTPOINT
>> . . .
>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0  1.44G   114G   
>> 96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt-0
>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0@dirty-style  1.44G  - 
>> 1.44G  -
>> . . .
>> Still, it leaves me wondering if everything is okay
>> given that internal attempt to use the old name with
>> @dirty-style when it was apparently no longer
>> available under that naming.
>> For reference:
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 
>> releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 
>> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
>>   arm64 aarch64 1300139 1300139
> 
> Cannot reproduce here (but with much simpler names and on stable/13):
> zfs create testz/test
> zfs snapshot testz/test@snap1
> zfs rename testz/test testz/test2
> 
> All worked.
> 

I've noticed that sometimes in my explorations it has been
silent instead of complaining. I've no clue at this point
what prior activity (or lack of activity) determines
whether a message will be generated.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [aarch64 test did not reproduce the issue]

2021-05-04 Thread Mark Millard via freebsd-stable



On 2021-May-4, at 13:38, Mark Millard  wrote:

> [The first buildworld is still in process. So while waiting . . .]
> 
> On 2021-May-4, at 10:31, Mark Millard  wrote:
> 
>> I probably know why the huge count of differences this time
>> unlike the original report . . .
>> 
>> Previously I built based on a checked-in branch as part of
>> my experimenting. This time it was in a -dirty form (not
>> checked in), again as part of my experimental exploration.
>> 
>> WITH_REPRODUCIBLE_BUILD= makes a distinction between these
>> if I remember right: (partially?) disabling itself for
>> -dirty style.
>> 
>> To reproduce the original style of test I need to create
>> a branch with my few patches checked in and do the
>> buildworlds from that branch.
>> 
>> This will, of course, take a while.
>> 
>> Sorry for the noise.
>> 
> 
> I've confirmed some of the details of the large number of
> files with differences while waiting for the 1st buildworld:
> 
> The 4 bytes at the end of the .gnu_debuglink section
> that are ending up different are the checksum for the
> .debug file. The .debug files have differences such as:
> 
> │ -<1a>   DW_AT_comp_dir: (indirect string) 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
> │ +<1a>   DW_AT_comp_dir: (indirect string) 
> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
> 
> So I need to build, snapshot (in case need
> to reference), install, clean-out, build,
> install elsewhere, compare. (Or analogous
> that uses the same build base-path for both
> installs despite separate buildworld's.)
> This is separate from any potential -dirty
> vs. checked-in handling variation by
> WITH_REPRODUCIBLE_BUILD= .
> 
> My process that produced the original armv7
> report happened to do that before I accidentally
> discovered the presence of the few files with
> differences. My new experiments were different
> and I'd not thought of needing to vary the
> procedure to get you the right evidence.
> 
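
On the quoted .gnu_debuglink point: the trailing 4 bytes are a CRC-32 over the separate .debug file's contents, so any byte that differs in the .debug file (such as a DW_AT_comp_dir string that embeds the build base-path) also changes those 4 bytes in the stripped binary. A small sketch of that effect follows. It assumes zlib.crc32 is an acceptable stand-in for the checksum the toolchain uses, and the two byte strings are stand-ins built from the paths quoted above, not real .debug file contents.

```python
import zlib

def debuglink_crc(data: bytes) -> int:
    """CRC-32 over the full contents of a (stand-in) .debug file."""
    return zlib.crc32(data) & 0xFFFFFFFF

# Stand-ins for two .debug files that differ only in the embedded build path.
COMMON_TAIL = b"/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64"
debug_alt = b"DW_AT_comp_dir: /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt" + COMMON_TAIL
debug_norm = b"DW_AT_comp_dir: /usr/obj/BUILDs/13_0R-CA72-nodbg-clang" + COMMON_TAIL

crc_alt = debuglink_crc(debug_alt)
crc_norm = debuglink_crc(debug_norm)

print(f"-alt build path: {crc_alt:08x}")
print(f"norm build path: {crc_norm:08x}")
print("checksums differ:", crc_alt != crc_norm)
```

This is why the comparison needs both installs to come from the same build base-path: otherwise every file with a .gnu_debuglink section can be expected to differ, independent of anything WITH_REPRODUCIBLE_BUILD= does.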

The two aarch64 test installs did not show any
differences in a "diff -rq" . Ignoring *.meta
files generated during the builds, the build
directory tree snapshots showed just the
differences:

# diff -rq /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr | grep -v '\.meta' | more
Files /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c and /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c differ
Files /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk and /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk differ
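
For anyone wanting to script this kind of comparison, here is a minimal Python sketch of the same idea: walk one tree, skip the *.meta files, and report files whose contents differ. The function name differing_files and its skip_suffix parameter are hypothetical. Unlike the grep -v '\.meta' filter above, it only skips names that end in ".meta", and unlike diff -rq it does not report files that exist in only one tree.

```python
import filecmp
import os


def differing_files(left, right, skip_suffix=".meta"):
    """Return sorted relative paths of files whose contents differ.

    Files ending in skip_suffix are ignored, and files present in only
    one tree are not reported (a simplification compared to diff -rq).
    """
    differing = []
    for dirpath, _dirnames, filenames in os.walk(left):
        rel_dir = os.path.relpath(dirpath, left)
        for name in filenames:
            if name.endswith(skip_suffix):
                continue
            a = os.path.join(dirpath, name)
            b = os.path.join(right, rel_dir, name)
            # shallow=False forces a byte-by-byte content comparison
            # instead of trusting size and modification time.
            if os.path.isfile(b) and not filecmp.cmp(a, b, shallow=False):
                differing.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(differing)
```

Calling differing_files() on the two snapshot usr directories would be expected to list only softcore.c and toolchain-metadata.mk, assuming the trees are as shown above.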

# diff -u /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c
--- /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c 2021-05-04 13:45:14.463351000 -0700
+++ /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c 2021-05-04 19:04:32.338203000 -0700
@@ -4,7 +4,7 @@
 ** Words from CORE set written in FICL
 ** Author: John Sadler (john_sad...@alum.mit.edu)
 ** Created: 27 December 1997
-** Last update: Tue May  4 13:45:14 PDT 2021
+** Last update: Tue May  4 19:04:32 PDT 2021
 ***/
 /*
 ** DO NOT EDIT THIS FILE -- it is generated by softwords/softcore.awk

# diff -u /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk
--- /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk 2021-05-04 10:55:26.030179000 -0700
+++ /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk 2021-05-04 16:14:24.513346000 -0700
@@ -1,4 +1,4 @@
-.info Using cached toolchain metadata from build at CA72_4c8G_ZFS on Tue May  4 10:55:26 PDT 2021
+.info Using cached toolchain metadata from build at CA72_4c8G_ZFS on Tue May  4 16:14:24 PDT 2021
 _LOADED_TOOLCHAI

ZFS rename with associated snapshot present: odd error message

2021-05-04 Thread Mark Millard via freebsd-stable

I had a:

# zfs list -tall
NAME                                                 USED   AVAIL  REFER  MOUNTPOINT
. . .
zroot/DESTDIRs/13_0R-CA72-instwrld-norm              1.44G   117G    96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style  1.44G      -  1.44G  -
. . .
. . .

(copied/pasted from somewhat earlier) and then attempted:

# zfs rename zroot/DESTDIRs/13_0R-CA72-instwrld-norm zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0
cannot open 'zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style': snapshot delimiter '@' is not expected here

Despite the "cannot open" message, the result looks like:

# zfs list -tall
NAME                                                  USED   AVAIL  REFER  MOUNTPOINT
. . .
zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0              1.44G   114G    96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt-0
zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0@dirty-style  1.44G      -  1.44G  -
. . .

Still, it leaves me wondering if everything is okay,
given the internal attempt to use the old name with
@dirty-style when the snapshot was apparently no longer
available under that name.

For reference:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 
releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
  arm64 aarch64 1300139 1300139


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [Ignore recent test: -dirty vs. checked-in usage difference]

2021-05-04 Thread Mark Millard via freebsd-stable

[The first buildworld is still in process. So while waiting . . .]

On 2021-May-4, at 10:31, Mark Millard  wrote:

> I probably know why the huge count of differences this time
> unlike the original report . . .
> 
> Previously I built based on a checked-in branch as part of
> my experimenting. This time it was in a -dirty form (not
> checked in), again as part of my experimental exploration.
> 
> WITH_REPRODUCIBLE_BUILD= makes a distinction between these
> if I remember right: (partially?) disabling itself for
> -dirty style.
> 
> To reproduce the original style of test I need to create
> a branch with my few patches checked in and do the
> buildworlds from that branch.
> 
> This will, of course, take a while.
> 
> Sorry for the noise.
> 

I've confirmed some of the details of the large number of
files with differences while waiting for the 1st buildworld:

The 4 bytes at the end of the .gnu_debuglink section
that are ending up different are the checksum for the
.debug file. The .debug files have differences such as:

│ -<1a>   DW_AT_comp_dir: (indirect string) 
/usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
│ +<1a>   DW_AT_comp_dir: (indirect string) 
/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
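
A side note on why a path change alone moves those 4 bytes: the .gnu_debuglink section ends with a CRC-32 of the whole .debug file, so any byte that differs in the .debug file (such as the DW_AT_comp_dir string above) changes the checksum. A minimal Python sketch follows, assuming zlib's CRC-32 matches the one the toolchain writes and using made-up stand-in bytes rather than real DWARF data. The name debuglink_crc_bytes and the sample byte strings are hypothetical and only for illustration.

```python
import zlib


def debuglink_crc_bytes(debug_file_content, byteorder="little"):
    """Return the 4 checksum bytes a .gnu_debuglink section would end with.

    Assumption: the checksum is the ordinary CRC-32 that zlib computes,
    stored in the target's byte order (little-endian for aarch64).
    """
    crc = zlib.crc32(debug_file_content) & 0xFFFFFFFF
    return crc.to_bytes(4, byteorder)


# Stand-in "debug files": identical except for the recorded build directory.
COMMON = b"DW_AT_comp_dir: "
norm = COMMON + b"/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64"
alt = COMMON + b"/usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64"

# The two checksums differ even though only the directory name changed.
print(debuglink_crc_bytes(norm).hex(), debuglink_crc_bytes(alt).hex())
```

This is only meant to show the mechanism: the installed binary itself can be otherwise identical and still differ in exactly the checksum bytes.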

So I need to build, snapshot (in case I need
to reference it), install, clean out, build,
install elsewhere, and compare. (Or an analogous
procedure that uses the same build base-path for both
installs despite separate buildworlds.)
This is separate from any potential -dirty
vs. checked-in handling variation by
WITH_REPRODUCIBLE_BUILD= .

My process that produced the original armv7
report happened to do that before I accidentally
discovered the presence of the few files with
differences. My new experiments were different
and I'd not thought of needing to vary the
procedure to get you the right evidence.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


diffoscope's odd UnicodeDecodeError error message: reason found

2021-05-04 Thread Mark Millard via freebsd-stable

I had reported in the reproducible build list messages:

> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
> [...]
> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently 
> disabled as the "tlsh" module is unavailable.
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
> invalid start byte

Well, it turns out that the file name pattern was
incorrect and only matched one file.

By contrast:

# diffoscope /.zfs/snapshot/2021-04-*/bin/sh
$<3/>2021-05-04 11:05:25 W: diffoscope.main: Fuzzy-matching is currently 
disabled as the "tlsh" module is unavailable.

worked fine.

And making the "one file" status obvious:

# diffoscope c_tests/a.out
$<3/>2021-05-04 11:11:45 W: diffoscope.main: Fuzzy-matching is currently 
disabled as the "tlsh" module is unavailable.
$<3/>Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, 
in main
sys.exit(run_diffoscope(parsed_args))
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, 
in run_diffoscope
difference = load_diff_from_path(path1)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
line 31, in load_diff_from_path
return load_diff(codecs.getreader("utf-8")(fp), path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
line 35, in load_diff
return JSONReaderV1().load(fp, path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", 
line 33, in load
raw = json.load(fp)
  File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
  File "/usr/local/lib/python3.7/codecs.py", line 504, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
invalid start byte

Not exactly an obvious error message for the issue.
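
One plausible reading of the "byte 0xb7 in position 18" detail: per the traceback, a single argument is loaded as a JSON diff through a utf-8 reader, and in an ELF file the e_machine field starts at offset 18. For aarch64 that field is EM_AARCH64 = 183 = 0xb7, which is not a valid UTF-8 start byte. The Python sketch below rebuilds that situation with a hand-made 20-byte header, not the real a.out. The helper names are hypothetical, and the e_type and OSABI values are assumptions that do not affect the result.

```python
import codecs
import io
import struct

EM_AARCH64 = 183  # 0xb7, the ELF machine number for aarch64


def fake_aarch64_elf_header():
    """Build the first 20 bytes of a little-endian 64-bit aarch64 ELF file."""
    # e_ident: magic, ELFCLASS64, ELFDATA2LSB, EV_CURRENT,
    # ELFOSABI_FREEBSD (9, assumed), then 8 bytes of zero padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 9]) + bytes(8)
    e_type = 2  # ET_EXEC (assumed; ET_DYN would behave the same here)
    return e_ident + struct.pack("<HH", e_type, EM_AARCH64)


def first_bad_utf8_byte(data):
    """Decode data through a utf-8 stream reader, as the traceback shows.

    Returns (position, byte value) of the first undecodable byte,
    or None when everything decodes cleanly.
    """
    reader = codecs.getreader("utf-8")(io.BytesIO(data))
    try:
        reader.read()
    except UnicodeDecodeError as err:
        return err.start, err.object[err.start]
    return None


print(first_bad_utf8_byte(fake_aarch64_elf_header()))
```

That would also explain why every aarch64 program tried gave the same byte value and position.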

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [Ignore recent test: -dirty vs. checked-in usage difference]

2021-05-04 Thread Mark Millard via freebsd-stable

I probably know why there is a huge count of differences
this time, unlike in the original report . . .

Previously I built based on a checked-in branch as part of
my experimenting. This time it was in a -dirty form (not
checked in), again as part of my experimental exploration.

WITH_REPRODUCIBLE_BUILD= makes a distinction between these
if I remember right: (partially?) disabling itself for
-dirty style.

To reproduce the original style of test I need to create
a branch with my few patches checked in and do the
buildworlds from that branch.

This will, of course, take a while.

Sorry for the noise.



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-04 Thread Mark Millard via freebsd-stable

[Just adding readelf -S info since it seems to show more.]

On 2021-May-4, at 10:01, Mark Millard  wrote:


> On 2021-May-4, at 08:51, Mark Millard  wrote:
> 
>> On 2021-May-4, at 06:01, Ed Maste  wrote:
>> 
>>> On Mon, 3 May 2021 at 22:26, Mark Millard  wrote:
>>>> 
>>>> But I'll note that I've built and installed py37-diffoscope
>>>> (new to me). A basic quick test showed that it reports:
>>>> 
>>>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" 
>>>> module is unavailable.
>>> 
>>> I just looked up tlsh - it's "A Locality Sensitive Hash"; I presume
>>> diffoscope uses it to infer file renames. I believe the warning
>>> emitted here should have no impact on the output we're looking for.
>> 
>> Okay.
>> 
>>> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and
>>> I suspect that is the issue. If you don't have LANG set already, try
>>> setting LANG=C.UTF-8 in your environment.
>> 
>> That is not the issue for the UnicodeDecodeError:
>> 
>> # echo $LANG
>> C.UTF-8
>> 
>> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
>> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently 
>> disabled as the "tlsh" module is unavailable.
>> $<3/>Traceback (most recent call last):
>> File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, 
>> in main
>>   sys.exit(run_diffoscope(parsed_args))
>> File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, 
>> in run_diffoscope
>>   difference = load_diff_from_path(path1)
>> File 
>> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
>> line 31, in load_diff_from_path
>>   return load_diff(codecs.getreader("utf-8")(fp), path)
>> File 
>> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
>> line 35, in load_diff
>>   return JSONReaderV1().load(fp, path)
>> File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", 
>> line 33, in load
>>   raw = json.load(fp)
>> File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
>>   return loads(fp.read(),
>> File "/usr/local/lib/python3.7/codecs.py", line 504, in read
>>   newchars, decodedbytes = self.decode(data, self.errors)
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
>> invalid start byte
>> 
> 
> Well, the list of differing files is huge. But this seems to
> be .gnu_debuglink content for the area it is in.

Specifically: the last 4 bytes of the .gnu_debuglink section.

> I'll note
> that I did installworld but not the likes of distrib-dirs
> or distribution this time.
> 
> This test did buildworld to two distinct directories:
> 
> zroot/BUILDs/13_0R-CA72-nodbg-clang       5.13G   118G  5.13G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang
> zroot/BUILDs/13_0R-CA72-nodbg-clang-alt   4.28G   118G  4.28G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt
> 
> and installworld to 2 distinct directories:
> 
> zroot/DESTDIRs/13_0R-CA72-instwrld-alt    1.44G   118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt
> zroot/DESTDIRs/13_0R-CA72-instwrld-norm   1.44G   118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
> 
> Previously (armv7 target) I had built, installed, rebuilt
> to same directory (after clean-out) and installed to an
> alternate directory. That had gotten only a few files
> different but I do not know (yet) if it was the procedural
> difference that made the difference.
> 
> Prefix of the list of different files this time:
> 
> # diff -rq /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/ 
> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/ | more
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/[ and 
> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/[ differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat and 
> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags and 
> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio and 
> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio differ
> . . .
> 
> Looking, aarch64 seems to typically get a back-to-back
> sequence of 4 bytes different in native programs in my
> builds:
> 
> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat 
> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-al

Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-04 Thread Mark Millard via freebsd-stable



On 2021-May-4, at 08:51, Mark Millard  wrote:

> On 2021-May-4, at 06:01, Ed Maste  wrote:
> 
>> On Mon, 3 May 2021 at 22:26, Mark Millard  wrote:
>>> 
>>> But I'll note that I've built and installed py37-diffoscope
>>> (new to me). A basic quick test showed that it reports:
>>> 
>>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" 
>>> module is unavailable.
>> 
>> I just looked up tlsh - it's "A Locality Sensitive Hash"; I presume
>> diffoscope uses it to infer file renames. I believe the warning
>> emitted here should have no impact on the output we're looking for.
> 
> Okay.
> 
>> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and
>> I suspect that is the issue. If you don't have LANG set already, try
>> setting LANG=C.UTF-8 in your environment.
> 
> That is not the issue for the UnicodeDecodeError:
> 
> # echo $LANG
> C.UTF-8
> 
> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently 
> disabled as the "tlsh" module is unavailable.
> $<3/>Traceback (most recent call last):
>  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, 
> in main
>sys.exit(run_diffoscope(parsed_args))
>  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, 
> in run_diffoscope
>difference = load_diff_from_path(path1)
>  File 
> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 
> 31, in load_diff_from_path
>return load_diff(codecs.getreader("utf-8")(fp), path)
>  File 
> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 
> 35, in load_diff
>return JSONReaderV1().load(fp, path)
>  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", 
> line 33, in load
>raw = json.load(fp)
>  File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
>return loads(fp.read(),
>  File "/usr/local/lib/python3.7/codecs.py", line 504, in read
>newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
> invalid start byte
> 

Well, the list of differing files is huge. But this seems to
be .gnu_debuglink content for the area it is in. I'll note
that I did installworld but not the likes of distrib-dirs
or distribution this time.

This test did buildworld to two distinct directories:

zroot/BUILDs/13_0R-CA72-nodbg-clang       5.13G   118G  5.13G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang
zroot/BUILDs/13_0R-CA72-nodbg-clang-alt   4.28G   118G  4.28G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt

and installworld to 2 distinct directories:

zroot/DESTDIRs/13_0R-CA72-instwrld-alt    1.44G   118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt
zroot/DESTDIRs/13_0R-CA72-instwrld-norm   1.44G   118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm

Previously (armv7 target) I had built, installed, rebuilt
to same directory (after clean-out) and installed to an
alternate directory. That had gotten only a few files
different but I do not know (yet) if it was the procedural
difference that made the difference.

Prefix of the list of different files this time:

# diff -rq /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/ /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/ | more
Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/[ and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/[ differ
Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat differ
Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags differ
Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio differ
. . .

Looking, aarch64 seems to typically get a back-to-back
sequence of 4 bytes different in native programs in my
builds:

# cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
3bd4 1d 65
3bd5 eb a3
3bd6 bb ca
3bd7 8e 1a

# ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
-r-xr-xr-x  1 root  wheel  18448 May  4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
-r-xr-xr-x  1 root  wheel  18448 May  3 23:16:36 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
. . .
 25 .gnu_debuglink 0010      3bc8  2**0
  CONTENTS, READONLY

3bd4-0
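
The offsets line up with the usual .gnu_debuglink layout: a NUL-terminated debug file name, zero padding up to the next 4-byte boundary within the section, then a 4-byte checksum. A small Python sketch of that arithmetic follows, taking the section file offset 0x3bc8 from the listing above and assuming the debug file name is "cat.debug" (an assumption, the name is not shown in the output). The function name debuglink_crc_range is hypothetical.

```python
def debuglink_crc_range(section_offset, debug_name):
    """Locate the checksum inside a .gnu_debuglink section.

    Returns (first checksum offset, last checksum offset, section size),
    with offsets counted from the start of the file.
    """
    name_bytes = len(debug_name.encode("ascii")) + 1  # include the NUL
    padded = (name_bytes + 3) // 4 * 4                # round up to 4 bytes
    crc_start = section_offset + padded
    return crc_start, crc_start + 3, padded + 4


start, end, size = debuglink_crc_range(0x3bc8, "cat.debug")
print(hex(start), hex(end), hex(size))  # 0x3bd4 0x3bd7 0x10
```

Under those assumptions the checksum occupies 0x3bd4 through 0x3bd7, the same four offsets that cmp -x reported, and the computed section size agrees with the 0x10 in the section listing.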

Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-04 Thread Mark Millard via freebsd-stable

On 2021-May-4, at 06:01, Ed Maste  wrote:

> On Mon, 3 May 2021 at 22:26, Mark Millard  wrote:
>> 
>> But I'll note that I've built and installed py37-diffoscope
>> (new to me). A basic quick test showed that it reports:
>> 
>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" 
>> module is unavailable.
> 
> I just looked up tlsh - it's "A Locality Sensitive Hash"; I presume
> diffoscope uses it to infer file renames. I believe the warning
> emitted here should have no impact on the output we're looking for.

Okay.

> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and
> I suspect that is the issue. If you don't have LANG set already, try
> setting LANG=C.UTF-8 in your environment.

That is not the issue for the UnicodeDecodeError:

# echo $LANG
C.UTF-8

# diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
$<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently 
disabled as the "tlsh" module is unavailable.
$<3/>Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, 
in main
sys.exit(run_diffoscope(parsed_args))
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, 
in run_diffoscope
difference = load_diff_from_path(path1)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
line 31, in load_diff_from_path
return load_diff(codecs.getreader("utf-8")(fp), path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
line 35, in load_diff
return JSONReaderV1().load(fp, path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", 
line 33, in load
raw = json.load(fp)
  File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
  File "/usr/local/lib/python3.7/codecs.py", line 504, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
invalid start byte

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-04 Thread Mark Millard via freebsd-stable




On 2021-May-3, at 21:27, Mark Millard  wrote:

> On 2021-May-3, at 19:26, Mark Millard  wrote:
> 
>> On 2021-May-3, at 10:51, Mark Millard  wrote:
>> 
>>> On 2021-May-3, at 07:47, Ed Maste  wrote:
>>> 
>>>> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current
>>>>  wrote:
>>>>> 
>>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and 
>>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ
>>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and 
>>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ
>>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and 
>>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ...
>>>> 
>>>> This is unexpected. Unfortunately I haven't looked at reproducibility
>>>> in a while, and my work was all on x86. This could be a regression or
>>>> a longstanding issue with arm64.
>>>> 
>>>> If you install the diffoscope package (py37-diffoscope) and run it on
>>>> the two directories / files it should give a more convenient view of
>>>> the differences. (Or, if you can make a tarball of the differing files
>>>> I can take a look.)
>>> 
>>> I no longer have the same content in those directory
>>> trees: newer rebuild and the same buildworld used to
>>> installworld to both places, instead of 2 different
>>> buildworld's. I'm also unsure how reproducible getting
>>> differences was.
>>> 
>>> I can eventually do experiments to test multiple separate
>>> buildworld's and installworld's, but the machine is busy
>>> building ports and the llvm builds involved means it
>>> will be some time before I'd switch activities. And the
>>> buildworld's involve llvm builds as well and take notable
>>> time themselves. So my next comparison will not be any
>>> time soon.
>>> 
>>> I'll let you know if I manage to generate another example,
>>> this time being sure to keep the data. If I try multiple
>>> times without finding any differences, I'll eventually
>>> decide "enough is enough" and let you know.
>> 
>> I've still got a long ways to go to do the first
>> actual comparison of builds.
>> 
>> But I'll note that I've built and installed py37-diffoscope
>> (new to me). A basic quick test showed that it reports:
>> 
>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" 
>> module is unavailable.
>> 
>> As I'm not familiar with the tool, you might need to send
>> notes about how you want me to use the tool to get the
>> output that you would want. (And, so, I get to learn . . .)
> 
> I've tried another experiment (* in the path matches "28" and "30"):
> 
> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
> $<3/>2021-05-03 21:08:48 W: diffoscope.main: Fuzzy-matching is currently 
> disabled as the "tlsh" module is unavailable.
> $<3/>Traceback (most recent call last):
>  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, 
> in main
>sys.exit(run_diffoscope(parsed_args))
>  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, 
> in run_diffoscope
>difference = load_diff_from_path(path1)
>  File 
> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 
> 31, in load_diff_from_path
>return load_diff(codecs.getreader("utf-8")(fp), path)
>  File 
> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 
> 35, in load_diff
>return JSONReaderV1().load(fp, path)
>  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", 
> line 33, in load
>raw = json.load(fp)
>  File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
>return loads(fp.read(),
>  File "/usr/local/lib/python3.7/codecs.py", line 504, in read
>newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
> invalid start byte
> 
> The two older snapshots of a Boot Environment have
> bin/sh files that compare equal. But every program I
> tried the above sort of thing against got the same
> UnicodeDecodeError result from diffoscope, with matching
> byte value and position.
> 
> These snapshots have more than an installworld in them
> and so are messy to compare overall. But the
> installworld (and

Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-03 Thread Mark Millard via freebsd-stable

On 2021-May-3, at 19:26, Mark Millard  wrote:

> On 2021-May-3, at 10:51, Mark Millard  wrote:
> 
>> On 2021-May-3, at 07:47, Ed Maste  wrote:
>> 
>>> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current
>>>  wrote:
>>>> 
>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and 
>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ
>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and 
>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ
>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and 
>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ...
>>> 
>>> This is unexpected. Unfortunately I haven't looked at reproducibility
>>> in a while, and my work was all on x86. This could be a regression or
>>> a longstanding issue with arm64.
>>> 
>>> If you install the diffoscope package (py37-diffoscope) and run it on
>>> the two directories / files it should give a more convenient view of
>>> the differences. (Or, if you can make a tarball of the differing files
>>> I can take a look.)
>> 
>> I no longer have the same content in those directory
>> trees: newer rebuild and the same buildworld used to
>> installworld to both places, instead of 2 different
>> buildworld's. I'm also unsure how reproducible getting
>> differences was.
>> 
>> I can eventually do experiments to test multiple separate
>> buildworld's and installworld's, but the machine is busy
>> building ports and the llvm builds involved means it
>> will be some time before I'd switch activities. And the
>> buildworld's involve llvm builds as well and take notable
>> time themselves. So my next comparison will not be any
>> time soon.
>> 
>> I'll let you know if I manage to generate another example,
>> this time being sure to keep the data. If I try multiple
>> times without finding any differences, I'll eventually
>> decide "enough is enough" and let you know.
> 
> I've still got a long ways to go to do the first
> actual comparison of builds.
> 
> But I'll note that I've built and installed py37-diffoscope
> (new to me). A basic quick test showed that it reports:
> 
> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module 
> is unavailable.
> 
> As I'm not familiar with the tool, you might need to send
> notes about how you want me to use the tool to get the
> output that you would want. (And, so, I get to learn . . .)

I've tried another experiment (* in the path matches "28" and "30"):

# diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
$<3/>2021-05-03 21:08:48 W: diffoscope.main: Fuzzy-matching is currently 
disabled as the "tlsh" module is unavailable.
$<3/>Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, 
in main
sys.exit(run_diffoscope(parsed_args))
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, 
in run_diffoscope
difference = load_diff_from_path(path1)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
line 31, in load_diff_from_path
return load_diff(codecs.getreader("utf-8")(fp), path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", 
line 35, in load_diff
return JSONReaderV1().load(fp, path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", 
line 33, in load
raw = json.load(fp)
  File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
  File "/usr/local/lib/python3.7/codecs.py", line 504, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: 
invalid start byte

The two older snapshots of a Boot Environment have
bin/sh files that compare equal. But every program I
tried the above sort of thing against got the same
UnicodeDecodeError result from diffoscope, with matching
byte value and position.

These snapshots have more than an installworld in them
and so are messy to compare overall. But the
installworld (and installkernel) content shows
differences similar to what I reported before, as far as
example files with differences go. But this is aarch64,
not armv7.

It will still be a notable time before I have simple
installworld trees to compare.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-03 Thread Mark Millard via freebsd-stable

On 2021-May-3, at 10:51, Mark Millard  wrote:

> On 2021-May-3, at 07:47, Ed Maste  wrote:
> 
>> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current
>>  wrote:
>>> 
>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and 
>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ
>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and 
>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ
>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and 
>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ...
>> 
>> This is unexpected. Unfortunately I haven't looked at reproducibility
>> in a while, and my work was all on x86. This could be a regression or
>> a longstanding issue with arm64.
>> 
>> If you install the diffoscope package (py37-diffoscope) and run it on
>> the two directories / files it should give a more convenient view of
>> the differences. (Or, if you can make a tarball of the differing files
>> I can take a look.)
> 
> I no longer have the same content in those directory
> trees: newer rebuild and the same buildworld used to
> installworld to both places, instead of 2 different
> buildworld's. I'm also unsure how reproducible getting
> differences was.
> 
> I can eventually do experiments to test multiple separate
> buildworld's and installworld's, but the machine is busy
> building ports and the llvm builds involved means it
> will be some time before I'd switch activities. And the
> buildworld's involve llvm builds as well and take notable
> time themselves. So my next comparison will not be any
> time soon.
> 
> I'll let you know if I manage to generate another example,
> this time being sure to keep the data. If I try multiple
> times without finding any differences, I'll eventually
> decide "enough is enough" and let you know.

I've still got a long ways to go to do the first
actual comparison of builds.

But I'll note that I've built and installed py37-diffoscope
(new to me). A basic quick test showed that it reports:

W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.

As I'm not familiar with the tool, you might need to send
notes about how you want me to use the tool to get the
output that you would want. (And, so, I get to learn . . .)
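
Ed's alternative of sending a tarball of the differing files
needs a list of those files first. A minimal sketch of one way
to get that list from diff -rq output follows. The helper name
differing_files is made up for illustration, and the parsing
assumes that no path contains whitespace.

```sh
#!/bin/sh
# Sketch only: differing_files is a hypothetical helper, not a FreeBSD tool.
# Prints, relative to the first tree, each file that differs between trees.
# Assumption: paths contain no whitespace.
differing_files() {
    tree_a=${1%/}
    tree_b=${2%/}
    # LC_ALL=C keeps the "Files X and Y differ" message untranslated.
    # diff exits non-zero when the trees differ; that is expected here.
    { LC_ALL=C diff -rq "$tree_a" "$tree_b" || true; } |
        awk -v prefix="$tree_a/" '
            $1 == "Files" && $NF == "differ" {
                path = $2
                # Strip the first tree prefix so the path is relative.
                if (index(path, prefix) == 1)
                    path = substr(path, length(prefix) + 1)
                print path
            }'
}
```

The resulting list could then be saved to a file and handed to
tar(1) to build one tarball per tree.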

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-05-03 Thread Mark Millard via freebsd-stable

On 2021-May-3, at 07:47, Ed Maste  wrote:

> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current
>  wrote:
>> 
>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and 
>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ
>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and 
>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ
>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and 
>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ...
> 
> This is unexpected. Unfortunately I haven't looked at reproducibility
> in a while, and my work was all on x86. This could be a regression or
> a longstanding issue with arm64.
> 
> If you install the diffoscope package (py37-diffoscope) and run it on
> the two directories / files it should give a more convenient view of
> the differences. (Or, if you can make a tarball of the differing files
> I can take a look.)

I no longer have the same content in those directory
trees: newer rebuild and the same buildworld used to
installworld to both places, instead of 2 different
buildworld's. I'm also unsure how reproducible getting
differences was.

I can eventually do experiments to test multiple separate
buildworld's and installworld's, but the machine is busy
building ports and the llvm builds involved means it
will be some time before I'd switch activities. And the
buildworld's involve llvm builds as well and take notable
time themselves. So my next comparison will not be any
time soon.

I'll let you know if I manage to generate another example,
this time being sure to keep the data. If I try multiple
times without finding any differences, I'll eventually
decide "enough is enough" and let you know.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?

2021-04-29 Thread Mark Millard via freebsd-stable

I did 2 test buildworld's based on:

# ~/fbsd-based-on-what-freebsd.sh 
branch: releng/13.0
merge-base: ea31abc261ffc01b6ff5671bffb15cf910a07f4b
merge-base: CommitDate: 2021-04-09 00:14:30 +
ea31abc261ff (HEAD -> releng/13.0, tag: release/13.0.0, freebsd/releng/13.0) 
13.0: update to RELEASE
n244733 (--first-parent --count for merge-base)

and produced separate build trees. I also installed the
world build into two separate directory trees:

/usr/obj/DESTDIRs/13_0R-CA7-chroot/
vs.
/usr/obj/DESTDIRs/13_0R-CA7-poud/

This was for other reasons. But eventually I happened
to do a diff -rq of the two trees and ended up with
the output showing some differing files:

Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/lib/debug/sbin/ping.debug and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/lib/debug/sbin/ping.debug differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/lib/debug/usr/sbin/ntpd.debug and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/lib/debug/usr/sbin/ntpd.debug differ
Files 
/usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/lib/debug/usr/tests/sbin/ping/in_cksum_test.debug
 and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/lib/debug/usr/tests/sbin/ping/in_cksum_test.debug
 differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntp-keygen and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntp-keygen differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntpd and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntpd differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntpdate and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntpdate differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntpdc and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntpdc differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/sntp and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/sntp differ
Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/tests/sbin/ping/in_cksum_test and 
/usr/obj/DESTDIRs/13_0R-CA7-poud/usr/tests/sbin/ping/in_cksum_test differ

(That is all.)

For as much as I've looked at (not much), it looks to be
variations in byte-padding values.
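
A quick way to see how many bytes differ in one pair of files,
and where the first difference is, is cmp -l. The sketch below
only counts and locates differing bytes; it does not by itself
show that they are padding. The helper name byte_diff_summary
is hypothetical.

```sh
#!/bin/sh
# Sketch only: byte_diff_summary is a hypothetical helper.
# cmp -l prints one line per differing byte: the 1-based decimal offset
# followed by the two byte values in octal.
byte_diff_summary() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk '
            { n++; if (n == 1) first = $1 }
            END {
                if (n) {
                    printf "%d differing bytes, first at offset %d\n", n, first
                } else {
                    print "no differing bytes"
                }
            }'
}
```

For example, it could be pointed at the two sbin/ping files
listed above.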

The builds both were set up to tune for cortex-a7 explicitly.
I patch top's source code. I patch the OOM kill code to report
the specific reason for a kill. I still have some bcm2838
pci/xhci patching in place from an old investigation, but
that would be kernel code. None of the patching is specific
to the above list of files.

The hosting context was:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #1 
releng/13.0-n244733-ea31abc261ff-dirty: Wed Apr 28 05:45:27 PDT 2021 
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/src/arm64.aarch64/sys/GENERIC-NODBG-CA72
  arm64 aarch64 1300139 1300139

based on building the same source code (tuning for
cortex-a72).

It was the same media for all the activity. Unlike
the past many years for me, the context is using ZFS
instead of UFS, not that I think that makes a
difference here.

The differences do not mess up my activity but others
might notice and care about such differences.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: (D29934) Reorder commented steps in UPDATING following sequential order. (was: etcupdate -p vs. root on zfs (and bectl use and such): no /usr/src/etc/master.passwd (for example))

2021-04-25 Thread Mark Millard via freebsd-stable



On 2021-Apr-25, at 08:14, Graham Perrin  wrote:

> On 23/04/2021 08:39, Mark Millard via freebsd-current wrote:
> 
>>   [3]
> 
> 
> With regard to mounting ZFS file systems in single user mode
> 
> What's currently footnote 3 will probably become footnote 4, please see:
> 
> <https://reviews.freebsd.org/D29934#inline-186101>
> 
> … and so on.

If it were me, I'd probably do something to make the
mounting of file systems and such have an explicit
reminder as its own step, something like:


[4]
mergemaster -Fp [5]

I just do not think of such as part of <reboot in single user>:
it is already rebooted in single user at that point in my
view.

Sorry that I missed what was there in UPDATING.

However, /usr/src/Makefile has:

#  1.  `cd /usr/src'   (or to the directory containing your source tree).
#  2.  `make buildworld'
#  3.  `make buildkernel KERNCONF=YOUR_KERNEL_HERE' (default is GENERIC).
#  4.  `make installkernel KERNCONF=YOUR_KERNEL_HERE'   (default is GENERIC).
#   [steps 3. & 4. can be combined by using the "kernel" target]
#  5.  `reboot'(in single user mode: boot -s from the loader prompt).
#  6.  `mergemaster -p'
#  7.  `make installworld'
#  8.  `mergemaster'(you may wish to use -i, along with -U or -F).
#  9.  `make delete-old'
# 10.  `reboot'
# 11.  `make delete-old-libs' (in case no 3rd party program uses them anymore)

without such material, even in footnotes.


Side notes:

"From the bootblocks, boot -s, and then do":
"From the boot loader, boot -s, and then do"?

etcupdate vs. mergemaster and the $FreeBSD$ issue?
Is mergemaster going to stay as the recommended
command to use? If so, with which command line
options?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Despite the documentation, "etcupdate extract" handles -D destdir (and its contribution to the default workdir)

2021-04-24 Thread Mark Millard via freebsd-stable

# etcupdate -?
Illegal option -?

usage: etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball]
 [-A patterns] [-D destdir] [-I patterns] [-L logfile]
 [-M options]
   etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options]
 <tarball>
   etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
   etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L logfile]
 [-M options]
   etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
   etcupdate status [-d workdir] [-D destdir]

The "etcupdate extract" material does not show -D destdir as valid.


# man etcupdate
. . .
SYNOPSIS
 etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball]
   [-A patterns] [-D destdir] [-I patterns] [-L logfile]
   [-M options]
 etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options]
   tarball
 etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
 etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L logfile]
   [-M options]
 etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
 etcupdate status [-d workdir] [-D destdir]
. . .

Again the "etcupdate extract" material does not show -D destdir as valid.

But I used it:

# etcupdate extract -D usr/obj/DESTDIRs/13_0R-CA7-for-chroot

and it created and filled in the workdir:

/usr/obj/DESTDIRs/13_0R-CA7-for-chroot/var/db/etcupdate/


I have not checked whether "etcupdate build" has a similar
issue.
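
The observed relationship between -D destdir and the default
workdir can be written down as a one-line rule. This is a
sketch of the behavior seen above, not a copy of etcupdate's
own code, and default_workdir is a hypothetical name.

```sh
#!/bin/sh
# Sketch only: default_workdir is a hypothetical helper that mirrors the
# observed result, where the workdir landed under <destdir>/var/db/etcupdate.
default_workdir() {
    destdir=${1:-}
    destdir=${destdir%/}    # drop one trailing slash, if any
    printf '%s/var/db/etcupdate\n' "$destdir"
}
```

With no destdir given, the rule yields /var/db/etcupdate.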

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

https://artifact.ci.freebsd.org/snapshot/stable-13/?C=M=D messed up dates and HASHID-only use make things extremely hard to find "in time order"

2021-04-23 Thread Mark Millard via freebsd-stable

The following uses one example to illustrate problems finding
artifacts; the problems are not limited to the example's specifics.

I have historically used https://artifact.ci.freebsd.org/snapshot/
to do build-less approximate bisecting (and other things). Such
use has been badly hampered since the git-related URL conventions
were put in place. The example below illustrates the mess in how
things are currently presented.

https://artifact.ci.freebsd.org/snapshot/stable-13/?C=M=D

lists ac845558f7b626d9a31b8f6dab686c45d39dc5a0/ as having
date/time 2021-Apr-10 18:43 .

But:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/?C=M=D
lists:

powerpc/ and arm/ as having date/times 2021-Apr-10 18:54 and 2021-Apr-10 18:50
yet lists...
i386/ and arm64/ as having date/times 2021-Feb-19 19:00 and 2021-Feb-19 18:50 .

But it gets worse:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/powerpc/?C=M=D

shows an empty directory. Same for:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/arm/?C=M=D

By contrast,

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/i386/?C=M=D

shows i386/ with date/time 2021-Apr-10 18:43 but

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/i386/i386/?C=M=D

shows all the file dates as 2021-Feb-19 19:00 .

Going back to arm64/ I find a similar 2021-Feb-19 dating,
although 2021-Feb does show up in more places:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/arm64/?C=M=D

shows aarch64/ with date/time 2021-Feb-19 18:50 and

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/arm64/aarch64/?C=M=D

shows the files also having the date/time 2021-Feb-19 18:50 .


In my view the choice to only use the hash-id for the commit
in the url is a usability mistake and the url prefix should
be of a form more like (for this example context):

https://artifact.ci.freebsd.org/snapshot/stable-13/n??-HASHID/

where the ?? is from:

git rev-list --first-parent --count

(as used elsewhere by FreeBSD).

(The HASHID might be just the 12 character prefix instead of the
whole hash-id as well.)

Such a convention would be more independent of dates possibly
being touched on the file server and would make time ordered
finding of things (such as for build-less approximate bisecting)
far more reasonable.
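
As a sketch of the suggested naming, the following builds a
directory name of the form n<count>-<12 character hash>. The
helper name snapshot_dir_name is hypothetical. In a git
checkout the two inputs could come from
"git rev-list --first-parent --count HEAD" and
"git rev-parse HEAD".

```sh
#!/bin/sh
# Sketch only: snapshot_dir_name is a hypothetical helper illustrating the
# proposed "n<count>-<short hash>" naming; it is not an existing convention
# of the artifact server.
snapshot_dir_name() {
    count=$1
    hash=$2
    # Keep only the first 12 characters of the commit hash.
    short=$(printf '%s' "$hash" | cut -c1-12)
    printf 'n%s-%s\n' "$count" "$short"
}
```

A count of 244733 and a hash starting with ea31abc261ff would
give n244733-ea31abc261ff, and such names sort in commit order
as long as the counts have the same number of digits.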



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Is stable/13 going to start getting snapshot builds?

2021-04-23 Thread Mark Millard via freebsd-stable

Is stable/13 going to start getting snapshot builds?

(As stands, main , stable/12 , and stable/11 are getting them.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

etcupdate -p vs. root on zfs (and bectl use and such): no /usr/src/etc/master.passwd (for example)

2021-04-23 Thread Mark Millard via freebsd-stable

FYI: The default bsdinstall result for auto ZFS that I tried
has a separate zroot/usr/src dataset, which zfs mounts at
/usr/src .

UPDATING and such places indicate sequences like:
(think etcupdate where it lists mergemaster and ignore
-F and -Fi)


make buildworld
make buildkernel KERNCONF=YOUR_KERNEL_HERE
make installkernel KERNCONF=YOUR_KERNEL_HERE
[1]
<reboot in single user> [3]
mergemaster -Fp [5]
NOTE: What /usr/src/etc/master.passwd here? (for example)
make installworld
mergemaster -Fi [4]
make delete-old [6]


etcupdate has the logic for handling -p:

	if [ -n "$preworld" ]; then
		# Build a limited tree that only contains files that are
		# crucial to installworld.
		for file in $PREWORLD_FILES; do
			name=$(basename $file)
			mkdir -p $1/etc >&3 2>&1 || return 1
			cp -p $SRCDIR/$file $1/etc/$name || return 1
		done

Note the "$SRCDIR/$file". But for a boot -s after
installing the kernel there is only zroot/ROOT/NAME
and no zroot/usr/src zfs mount so /usr/src/ is empty.

This leads to needing an additional step:

zfs mount zroot/usr/src

(The instructions do not deal with making / writable at this
stage either.)
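
A small guard could make the missing mount obvious before
etcupdate -p is attempted. The sketch below only checks that
the example file named above is present and non-empty. The
function name preworld_sources_present is hypothetical, and
the default of /usr/src is an assumption that matches the
layout described here.

```sh
#!/bin/sh
# Sketch only: preworld_sources_present is a hypothetical guard, not part
# of etcupdate. It succeeds when <srcdir>/etc/master.passwd exists and is
# non-empty, which is what the cp in the excerpt above relies on.
preworld_sources_present() {
    srcdir=${1:-/usr/src}
    [ -s "$srcdir/etc/master.passwd" ]
}

# Possible use in single-user mode (dataset name as in this posting):
#   preworld_sources_present /usr/src || zfs mount zroot/usr/src
```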



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

13.0-RELEASE bsdinstall failure : looked for MANIFEST in wrong place (not with *.txz files)

2021-04-17 Thread Mark Millard via freebsd-stable

Booted RPi4 via microSD card dd'd from:

FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img

I attempted a bsdinstall onto a USB3 SSD. The following
reports what happened.

# bsdinstall

default keymap Select
Hostname OK
ftp mirror OK
Auto (ZFS) OK
Install Select
stripe OK
[*] da0 OK
Last Chance! da0 YES

Error while fetching
file:///usr/freebsd-dist/MANIFEST
: No such file or directory

OK
Exit

NOTE: the path is /usr/freebsd-dist/MANIFEST instead
of /mnt/usr/freebsd-dist/MANIFEST but . . .

# df -m
Filesystem             1M-blocks Used  Avail Capacity  Mounted on
/dev/ufs/rootfs            28862 3217  23336    12%    /
devfs                          0    0      0   100%    /dev
/dev/msdosfs/MSDOSBOOT        49   24     25    49%    /boot/msdos
tmpfs                         50    0     49     1%    /tmp
zroot/ROOT/default        197406  183 197222     0%    /mnt
zroot/tmp                 197222    0 197222     0%    /mnt/tmp
zroot/usr/home            197222    0 197222     0%    /mnt/usr/home
zroot/usr/ports           197222    0 197222     0%    /mnt/usr/ports
zroot/usr/src             197222    0 197222     0%    /mnt/usr/src
zroot/var/audit           197222    0 197222     0%    /mnt/var/audit
zroot/var/crash           197222    0 197222     0%    /mnt/var/crash
zroot/var/log             197222    0 197222     0%    /mnt/var/log
zroot/var/mail            197222    0 197222     0%    /mnt/var/mail
zroot/var/tmp             197222    0 197222     0%    /mnt/var/tmp
zroot                     197222    0 197222     0%    /mnt/zroot

# ls -Tla /mnt/usr/freebsd-dist/
total 187454
drwxr-xr-x  2 root  wheel  4 Apr  9 07:39:20 2021 .
drwxr-xr-x  6 root  wheel  6 Apr  9 07:39:20 2021 ..
-rw-r--r--  1 root  wheel  165248188 Apr  9 07:39:20 2021 base.txz
-rw-r--r--  1 root  wheel   26552108 Apr  9 07:39:21 2021 kernel.txz

# ls -Tla /usr/freebsd-dist/
ls: /usr/freebsd-dist/: No such file or directory


NOTE: creating /usr/freebsd-dist/ with a copy of the MANIFEST
file in it was enough to get past this issue: it is doing
Archive Extraction now.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

powerpc64le is missing in: https://www.freebsd.org/platforms/

2021-03-30 Thread Mark Millard via freebsd-stable

When I looked at https://www.freebsd.org/platforms/ I noticed
that "64-bit little-endian PowerPC" powerpc64le is not listed.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-23 Thread Mark Millard via freebsd-stable

On 2021-Mar-22, at 22:51, Kevin Oberman  wrote:

> On Mon, Mar 22, 2021 at 8:19 AM Adrian Chadd  wrote:
>> On Mon, 15 Mar 2021 at 14:58, Kevin Oberman  wrote:
>> 
>> > >
>> > > It appears that the messages are associated with reading
>> > > the disk(s), not directly with writing them, where the
>> > > reads take more than "hz * 20" time units to complete.
>> > > (I'm looking at main (14) code.) What might contribute
>> > > to the time taken for the pending read(s)?
>> > >
>> > The reference to hz * 20 woke up a few sleeping memory cells. I forgot that
>> > I cleaned up my loader.conf. It was largely a copy of the one on my
>> > decade-old T520. I commented out "kern.hz=100". I don't recall the details,
>> > but I think it was actually from an even older system, my T42 from before I
>> > retired.
>> >
>> > In any case, restoring this setting has greatly improved the situation. I
>> > now have really bad disk I/O performance on large disk to disk activity
>> > (untarring the firefox distro) instead of terrible performance and the
>> > system freezes have vanished, though I do see pauses in response to clicks
>> > or text entry, but the display remains active and the pauses are short... 1
>> > to 15 seconds, I'd guess. No, I have no idea what this indicates.
>> 
>> ... which drive controller is this? Is it just a laptop ATA disk?
>> 
>> > I'm still not seeing the performance I was seeing back in February when 40
>> > MB/s for extended intervals was common and I once untarred firefox.tar.gz2
>> > in under a minute and performance seldom dropped below 1.4 MB/s.
>> 
>> Did you find a resolution?  I wonder if setting kern.hz is kicking
>> some process(es) to get some time more frequently due to bugs
>> elsewhere in the system (interrupts, IPI handling, wake-ups, etc)
>> 
>> 
>> 
>> -adrian

> No resolution. This is a Lenovo L15 ThinkPad with a 2TB ATAPI drive.

I've not found documentation indicating the "which drive
controller" answer. That may have to be answered from boot
messages or boot -v messages or other such on FreeBSD.
(I've no access to such a machine.)

You might want to put a copy of such a log someplace that
folks could look at it. There may be commands that some
folks would like to see the output of. (I'm not all that
likely to be one that could put such to use but other
folks might be able to.)

Intel® Celeron®? 10th Generation Intel Core™ i3? i5? i7?

> The current drive is a Seagate.  All testing has been done since I got it 
> back from Lenovo in late January. I can read or write the drive at reasonable 
> rates that exceed 50 MB/s. Extracting a tar distribution file is painful. I 
> have had firefox extracts take over a half hour. Worse, if I do other 
> operations while the extract is taking place, I often see a 30 second (and, 
> occasionally 60 second) display freezes

I thought that you had reported that use of kern.hz=100
had led to "the system freezes have vanished" and "pauses
are short... 1 to 15 seconds". Did more testing show that
to not be always the case?

> as well as log reports that of "swap_pager: indefinite wait buffer:"

Unfortunately, I do not know how to investigate what is leading to
those messages being generated. Figuring that out would seem to be
important but I do not know what to monitor to at least potentially
eliminate some possibilities.

One possible thing to look at is something like "gstat -spod"
output spanning the time of the untar. It would at least
indicate if a large queue backlog was accumulating on the
device. And the ms/r and ms/w columns would give a clue if
commands are sitting in the queues for long periods. (The
"d" may be a waste: no BIO_DELETEs possible? Also, the r/s
vs. ms/r are not rescaled reciprocals but distinct
measurements. Similarly for: the w/s vs. ms/w.)

Given the "indefinite wait buffer" messages, I expect
the ms/r and/or ms/w figures to be large at least some
of the time. Knowing how large may be of use to someone.
But I can not eliminate anything with such information.

>  This is a bit odd as I have 20G of RAM and am pretty close to no swap space 
> activity, but, of course, paging does occur. 

With 20 GiBytes of RAM, what is going on at the time that
leads to paging activity? I'm thinking of just untarring
the firefox file, not building firefox or such. Can you
test such an untar in a context that is not otherwise
paging (nor swapping)? If yes, is the behavior different
in any readily noticeable way?

> This system is CometLake and graphics are not supported on 12. I am not 
> absolutely sure that there is no

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-15 Thread Mark Millard via freebsd-stable

On 2021-Mar-15, at 14:57, Kevin Oberman  wrote:

> Responses in-line.
> 
> On Sun, Mar 14, 2021 at 3:09 PM Mark Millard  wrote:
> 
>> On 2021-Mar-14, at 11:09, Kevin Oberman  wrote:
>> 
>> > . . .
>> >  
>> > Seems to only occur on large r/w operations from/to the same disk. "sp
>> > big-file /other/file/on/same/disk" or tar/untar operations on large files.
>> > Hit this today updating firefox.
>> > 
>> > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other
>> > things while it was running slowly, the disk would appear to lock up. E.g.
>> > pwd(1) seemed to completely lock up the system, but I could still ping it
>> > and, after about 30 seconds, things came back to life. It was also not
>> > instantaneous. Disc activity dropped to <1MB/s for a few seconds before
>> > everything froze.
>> > 
>> > During the untar of firefox, I saw; this several times. I also looked at my
>> > console where I found these errors during :
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096
>> 
>> Does anyone know:
>> Are those messages normal "reading is taking a rather long
>> time" notices or is their presence more useful information
>> in some way about the type of problem or context for the
>> problem?
>> 
> As for the tests:
> Are these messages always present when near a time frame
> when the problem occurs? Never present in a near time
> frame to a period when the problem does not occur?
> In a large number of tests, these errors have not repeated. They baffle me for 
> another reason. This system has 20G of RAM. Typically, all swap space is 
> unused. ATM I see 16384M free out of 16384. Not sure that I have ever seen it 
> used, though it might have been while building rust. I have not built rust 
> for a month.
> 
> It appears that the messages are associated with reading
> the disk(s), not directly with writing them, where the
> reads take more than "hz * 20" time units to complete.
> (I'm looking at main (14) code.) What might contribute
> to the time taken for the pending read(s)?
> The reference to hz * 20 woke up a few sleeping memory cells. I forgot that I 
> cleaned up my loader.conf. It was largely a copy of the one on my decade-old 
> T520. I commented out "kern.hz=100". I don't recall the details, but I think 
> it was actually from an even older system, my T42 from before I retired.
> 
> In any case, restoring this setting has greatly improved the situation. I now 
> have really bad disk I/O performance on large disk to disk activity 
> (untarring the firefox distro) instead of terrible performance and the system 
> freezes have vanished, though I do see pauses in response to clicks or text 
> entry, but the display remains active and the pauses are short... 1 to 15 
> seconds, I'd guess. No, I have no idea what this indicates.

Interesting.
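
On the "hz * 20" figure: assuming the timeout is counted in
clock ticks and hz is the number of ticks per second (the usual
kernel convention), the wall-clock length is 20 seconds
whatever kern.hz is set to. The sketch below is only that
arithmetic. The name ticks_to_seconds is hypothetical, and 1000
is used as an example of a larger hz value alongside the 100
mentioned above.

```sh
#!/bin/sh
# Sketch only: ticks_to_seconds is a hypothetical helper for the arithmetic.
# Assumption: the timeout is in ticks and hz is ticks per second.
ticks_to_seconds() {
    ticks=$1
    hz=$2
    echo $(( ticks / hz ))
}

# hz=1000: 1000 * 20 = 20000 ticks -> 20 seconds
# hz=100:   100 * 20 =  2000 ticks -> 20 seconds
```

If that reading is right, changing kern.hz changes the tick
count of the timeout but not its length in seconds.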

> I'm still not seeing the performance I was seeing back in February when 40 
> MB/s for extended intervals was common and I once untarred firefox.tar.gz2 in 
> under a minute and performance seldom dropped below 1.4 MB/s.
>  
>>> /*
>>>  * swap_pager_getpages() - bring pages in from swap
>>>  *
>>>  *  Attempt to page in the pages in array "ma" of length "count".  The
>>>  *  caller may optionally specify that additional pages preceding and
>>>  *  succeeding the specified range be paged in.  The number of such 
>>> pages
>>>  *  is returned in the "rbehind" and "rahead" parameters, and they will
>>>  *  be in the inactive queue upon return.
>>>  *
>>>  *  The pages in "ma" must be busied and will remain busied upon return.
>>>  */
>>> static int
>>> swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count,
>>> int *rbehind, int *rahead)
>>> {
>>> . . .
>>> /*
>>>  * Wait for the pages we want to complete.  VPO_SWAPINPROG is always
>>>  * cleared on completion.  If an I/O error occurs, SWAPBLK_NONE
>>>  * is set in the metadata for each page in the request.
>>>  */
>>> VM_OBJECT_WLOCK(object);
>>> /* This could be implemented more efficiently with aflags */
>>> while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
>>> ma[0]->oflags |= VPO_SWAPSLEEP;
>>>

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-14 Thread Mark Millard via freebsd-stable

reasonable values for 
> everything. No indication of a HW problem. The system performs well unless I 
> do something that tries a bulk disk data move. Building world takes about 75 
> minutes. I just have a very hard time building big ports.

Almost like things were stuck-sleeping and then the
sleep(s) finished?


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-05 Thread Mark Millard via freebsd-stable

Konstantin Belousov kostikbel at gmail.com wrote on
Fri Mar 5 23:12:13 UTC 2021 :

> On Sat, Mar 06, 2021 at 12:27:55AM +0200, Christos Chatzaras wrote:
. . .
> > Command: /usr/bin/time -l portsnap extract (these tests done with 2 
> > different idle servers but with same 4TB HDDs models)
> > 
> > FreeBSD 12.2p4
> > 
> >        99.45 real        34.90 user        59.63 sys
> >       100.00 real        34.91 user        59.97 sys
> >        82.95 real        35.98 user        60.68 sys
> > 
> > FreeBSD 13.0-RC1
> > 
> >       217.43 real        75.67 user       110.97 sys
> >       125.50 real        63.00 user        96.47 sys
> >       118.93 real        62.91 user        96.28 sys
> . . .
> In the portsnap results for 13RC1, the variance is too high to conclude
> anything, I think.
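
The spread in the quoted "real" times can be put into numbers
with a few lines of awk. This is only a restatement of the
three samples per release quoted above, which is far too few
for real statistics. The helper name summarize_real is
hypothetical.

```sh
#!/bin/sh
# Sketch only: summarize_real is a hypothetical helper. It reads one "real"
# time per line on stdin and prints the mean and the max-min range.
summarize_real() {
    awk '
        {
            v = $1 + 0              # force numeric comparison
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
            sum += v
        }
        END {
            if (NR) printf "mean %.2f range %.2f\n", sum / NR, max - min
        }'
}

# Example: printf '99.45\n100.00\n82.95\n' | summarize_real
```

For the figures above the range is about 17 seconds for 12.2p4
and about 98 seconds for 13.0-RC1, most of the latter coming
from the first 13.0-RC1 run.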

I'll note that there are other reports of wide variance
in transfer rates observed during an overall operation
such as "make extract". The one I'm thinking of is:

https://lists.freebsd.org/pipermail/freebsd-stable/2021-March/093251.html

which is an update to earlier reports, but based on more recent
stable/13. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253968
comment 4 has some more notes about the context. The "make extract"
for firefox likely is not as complicated as the portsnap extract
example's execution structure.

Might be something to keep an eye on if there are on-going
examples of such variance over time.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-05 Thread Mark Millard via freebsd-stable

On 2021-Mar-4, at 14:16, Mark Millard  wrote:

> Christos Chatzaras chris at cretaforce.gr wrote on
> Thu Mar 4 21:41:01 UTC 2021 :
> 
> 
>> After finding slow filesystem operations with 13.0-BETA2 I did more tests.
>> 
>> All tests done with same hardware (Seagate ST4000NM0245 4TB HDD - 2 disks 
>> with RAID-1 using gmirror).
>> 
>> Filesystem mounted with noatime.
>> 
>> Command used:
>> 
>> /usr/bin/time -l portsnap extract
>> 
>> but similar differences I see with "/usr/bin/time -l rm -fr /usr/ports"
> 
> I doubt that "rm -fr" gets large differences of the
> type:
> 
> (from 12.2p4:)
> 0  messages sent
> 0  messages received
> vs. (13.0-BETA4 and 14.0-CURRENT:)
>  4412  messages sent
>   2536379  messages received

The more I think about the above figures, the more
it seems like 12.2 probably just does not track
messages sent and received, especially given the
lack of huge "voluntary context switches" differences
vs. 13.0-BETA4 and 14.0-CURRENT. (I expect the message
sends/receives to context switch, but I might be
wrong.)

> In other words, large variations in Inter-Process-Communication
> counts, especially "received".
> 
> It is not obvious that the "portsnap extract" issue
> is dominated by file system I/O vs IPC issues.
> 
> portsnap is a script and does something that looks
> like the following, with the "while read" happening
> over 29000 times:
> 
> . . . | while read FILE HASH; do
>     echo ${PORTSDIR}/${FILE}
>     if ! [ -s "${WORKDIR}/files/${HASH}.gz" ]; then
>         echo "files/${HASH}.gz not found -- snapshot corrupt."
>         return 1
>     fi
>     case ${FILE} in
>     */)
>         rm -rf ${PORTSDIR}/${FILE%/}
>         mkdir -p ${PORTSDIR}/${FILE}
>         tar -xz --numeric-owner -f ${WORKDIR}/files/${HASH}.gz \
>             -C ${PORTSDIR}/${FILE}
>         ;;
>     *)
>         rm -f ${PORTSDIR}/${FILE}
>         tar -xz --numeric-owner -f ${WORKDIR}/files/${HASH}.gz \
>             -C ${PORTSDIR} ${FILE}
>         ;;
>     esac
> done; then
> 
> I expect that the "tar -xz . . . *.gz" sort of commands
> also involve internal IPC use. (It looked like the
> portsnap script has not changed noticeably since
> something like late 2016.)

I wonder if the large user and/or sys differences between
12.2 and 13.0-BETA4 might be in process creation given the
over 29000 repetitions of the loop and the number of
processes created per loop iteration.

The block input and output figures make no clear
difference that I can tell:

29  block input operations
  2783  block output operations
vs.
   716  block input operations
   868  block output operations

There is also:

  11821398  page reclaims
vs.
  12288156  page reclaims

but none of that suggests the scale of differences in:

       98.18 real        35.31 user        59.31 sys
vs.
      163.81 real        71.93 user       107.32 sys

So it might be that "time -l" just does not report
on what makes up much of the difference.
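
To make the scale comparison concrete, the ratios can be computed from
the figures quoted above. This is just arithmetic on the reported
numbers; the helper name "ratio" is made up for the illustration.

```sh
#!/bin/sh
# ratio A B: print B/A to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

# 12.2 figure first, 13.0-BETA4 figure second.
echo "real:          $(ratio 98.18 163.81)x"
echo "user:          $(ratio 35.31 71.93)x"
echo "sys:           $(ratio 59.31 107.32)x"
echo "page reclaims: $(ratio 11821398 12288156)x"
```

That gives roughly 1.67x real, 2.04x user, and 1.81x sys, against only
about 1.04x for page reclaims.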

Given the scale of the differences, I'd not expect
the variations in the likes of "involuntary context
switches" or the like to explain much of the
observed differences.

(I avoid 14.0-CURRENT for this because of its debug
build status that was reported. I avoid 13.0-BETA2
because of known block input/output operation count
issues.)

> (13.0-BETA2 showed a large "voluntary context switches"
> difference as well, but I ignore that middle step in
> the version sequence here.)
> 
> So I expect publishing the "rm -fr /usr/ports" figures
> from "time -l" would be appropriate. I do not know if
> the reports should be via separate topic or not but I
> doubt the figures with large differences will be the
> same for most-modern vs. older: I do not expect notable
> IPC from "rm -fr".
> 
>> --
>> 
>> FreeBSD 12.2p4 
>> 
>>        98.18 real        35.31 user        59.31 sys
>>     49064  maximum resident set size
>>        21  average shared memory size
>>         3  average unshared data size
>>        86  average unshared stack size
>>  11821398  page reclaims
>>         0  page faults
>>         0  swaps
>>        29  block input operations
>>      2783  block output operations
>>         0  messages sent
>> 0  me

Re: Filesystem operations slower in 13.0 than 12.2

2021-03-04 Thread Mark Millard via freebsd-stable

>175  block output operations
>   4412  messages sent
>2536379  messages received
>  0  signals received
> 385527  voluntary context switches
>369  involuntary context switches
> 
> --
> 
> Differences between 13.0 and 14-CURRENT maybe related to debugging features.
> 
> But 13.0-BETA4 is slower than 12.2. Does someone have more information about 
> this?

Again, I expect that the "time -l" figures may point in
different directions for "portsnap extract" vs.
"rm -fr /usr/ports" in your context. The question may
need to be split because the answers may be different.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: FreeBSD 13.0-BETA2 and slow IO

2021-03-02 Thread Mark Millard via freebsd-stable

Kevin Oberman rkoberman at gmail.com wrote on
Mon Mar 1 07:11:32 UTC 2021 :

> On Sun, Feb 28, 2021 at 12:49 PM Christos Chatzaras 
> wrote:
> 
> > Did someone test if this is fixed in BETA4?
> >
> 
> Just tried to "make extract" on firefox and I am still seeing transfer
> rates around 1.7M when I would expect more like 50M. If I see the same
> thing others are, it runs for a while at >40MB and abruptly drops to
> 1.5-20M for some random time varying from a few seconds to minutes before
> jumping back to >40MB. Is this what others are seeing?

I'll note that someone submitted:

https://lists.freebsd.org/pipermail/freebsd-bugs/2021-March/100124.html

against 13.0-BETA4 for the UFS journaled soft-updates
related performance issue(s). They compared something
to 12.1-RELEASE for illustration.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: How do I know if my 13-stable has security patches?

2021-02-25 Thread Mark Millard via freebsd-stable

aster-9312e0fd1vendor/openzfs                            Martin Matuska   4 days  36  -247/+716
* | Fix build after 2c7dc6bae9fd.                        Alexander Motin  4 days  1   -0/+4
* | Refactor CTL datamove KPI.                           Alexander Motin  4 days  12  -162/+94
* | jail: Add pr_state to struct prison                  Jamie Gritton    4 days  2   -51/+65
* | vfs: shrink struct vnode to 448 bytes on LP64        Mateusz Guzik    4 days  1   -1/+12
* | jail: fix build after the previous commit            Mateusz Guzik    4 days  1   -1/+1
* | jail: Change the locking around pr_ref and pr_uref   Jamie Gritton    4 days  6   -235/+232
* | sctp: improve computation of an alternate net        Michael Tuexen   5 days  1   -36/+49
* | sctp: clear a pointer to a net which will be removed
. . . (all the prior history) . . .

and an empty vs. non-empty status is easier to tell
apart.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: When did pkg(8) drop support for 12-stable?

2021-02-23 Thread Mark Millard via freebsd-stable

On 2021-Feb-23, at 18:08, Chris  wrote:

> On 2021-02-23 17:42, Mark Millard wrote:
>> (Warner is only CC'd here.)
>> Warner Losh imp at bsdimp.com wrote on
>> Wed Feb 24 01:04:13 UTC 2021 :
>>> On Tue, Feb 23, 2021, 4:51 PM Chris  wrote:
>>> > Given this is a pkg(8) error, I brought it up on ports@
>>> > but it was suggested I (also?) bring it up here on stable@
>>> >
>>> > OK awhile back I installed a copy of 12 stable from the
>>> > usb stick image. I tweaked it to my wishes then got called
>>> > away and haven't been able to get back to it until the other
>>> > day. This is still a fresh install which has a populated /usr/src.
>>> > So I
>>> > svnlite co svn://svn.freebsd.org/ports/head /usr/ports
>>> > followed by a
>>> > cd /usr/ports/ports-mgmt/pkg/ && make install clean
>>> > which returns
>>> > make
>>> > /!\ ERROR: /!\
>>> >
>>> > Ports Collection support for your FreeBSD version has ended, and no ports
>>> > are guaranteed to build on this system. Please upgrade to a supported release.
>>> >
>>> > No support will be provided if you silence this message by defining
>>> > ALLOW_UNSUPPORTED_SYSTEM.
>>> >
>>> > *** Error code 1
>>> >
>>> > Stop.
>>> > Err what? Ok while I think this was from stable 12.1, it's still 12,
>>> > and it's on stable. So what gives?
>>> >
>>> 12.1 has reached EOL now that 12.2 has been out a while.
>> From release/12.1.0/ :
>> "Tag releng/12.1@r354233 as release/12.1.0 (12.1-RELEASE)"
>> I think that implicit in Warner's response is that
>> versions of stable/12/ that are not after r354233 are
>> also EOL. One needs to have stable/12/ material from
>> after -r354233 in order for it to be supported.
>> He might even mean that stable/12/ material from before:
>> "Tag releng/12.2@r366954 as release/12.2.0 (12.2-RELEASE)"
>> would also be considered as not supported.
>> To be safe you should be using stable/12/ material from
>> on or after -r366954 in order to have a supported
>> context.
>> (I'm not sure if anything is explicit about the status
>> of stable/12/ material between releng/12.1@r354233
>> and releng/12.2@r366954 .)
> A HUGE thanks for all of this, Mark. This is EXACTLY what I needed.
> 
> # uname -apKU
> FreeBSD fbsd12dev 12.1-STABLE FreeBSD 12.1-STABLE r363918 GENERIC  amd64 amd64 1201522 1201522
> which pretty well confirms what you deduced.
> I'm still a bit confused. It seems to me that it didn't _used_
> to be that way. But my brain isn't using ECC. So a couple of
> bits may be flipped.

The implication of all of stable/12/ being supported
would be support of stable/12/ from on or after its
creation:

QUOTE
Revision 339434 - Directory Listing
Modified Fri Oct 19 00:09:24 2018 UTC (2 years, 4 months ago) by gjb
Copied from: head revision 339432
Copy head@r339432 to stable/12 as part of the 12.0-RELEASE cycle.

Additional post-branch commits will follow.
END QUOTE

Such does not seem likely to me. What would be the
point of dropping 12.0-RELEASE support and
12.1-RELEASE support if such stable/12/ history was
covered, some of that history being minor variations
on the 12.0-RELEASE or 12.1-RELEASE ?

Note:
Despite some claims in other messages, svn -r363918
is not 12.1-RELEASE ( not -r354233 ) and -r363918
is shown as (only) in stable/12/ by svn. Your
claim of 12-STABLE was correct, just not detailed
enough.
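
As an illustration of the range comparison, here is a small sketch
that places a stable/12 svn revision relative to the two tag revisions
quoted above. The function name is made up, and what each range means
for support status is only my inference from Warner's note, not an
official statement.

```sh
#!/bin/sh
# Revision numbers taken from the tag messages quoted in this thread.
R_12_1=354233   # releng/12.1@r354233 tagged as release/12.1.0
R_12_2=366954   # releng/12.2@r366954 tagged as release/12.2.0

# classify_rev REV: say where a stable/12 svn revision falls.
# (Hypothetical helper; accepts "r363918" or "363918".)
classify_rev() {
    rev=${1#r}
    if [ "$rev" -le "$R_12_1" ]; then
        echo "not after the 12.1-RELEASE tag"
    elif [ "$rev" -lt "$R_12_2" ]; then
        echo "between the 12.1-RELEASE and 12.2-RELEASE tags"
    else
        echo "on or after the 12.2-RELEASE tag"
    fi
}

classify_rev r363918
```

For the r363918 from the uname output, this reports the in-between
range, which is the range whose status I am unsure of.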


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: When did pkg(8) drop support for 12-stable?

2021-02-23 Thread Mark Millard via freebsd-stable

(Warner is only CC'd here.)

Warner Losh imp at bsdimp.com wrote on
Wed Feb 24 01:04:13 UTC 2021 :

> On Tue, Feb 23, 2021, 4:51 PM Chris  wrote:
> 
> > Given this is a pkg(8) error, I brought it up on ports@
> > but it was suggested I (also?) bring it up here on stable@
> >
> > OK awhile back I installed a copy of 12 stable from the
> > usb stick image. I tweaked it to my wishes then got called
> > away and haven't been able to get back to it until the other
> > day. This is still a fresh install which has a populated /usr/src.
> > So I
> > svnlite co svn://svn.freebsd.org/ports/head /usr/ports
> > followed by a
> > cd /usr/ports/ports-mgmt/pkg/ && make install clean
> > which returns
> > make
> > /!\ ERROR: /!\
> >
> > Ports Collection support for your FreeBSD version has ended, and no ports
> > are guaranteed to build on this system. Please upgrade to a supported release.
> >
> > No support will be provided if you silence this message by defining
> > ALLOW_UNSUPPORTED_SYSTEM.
> >
> > *** Error code 1
> >
> > Stop.
> > Err what? Ok while I think this was from stable 12.1, it's still 12,
> > and it's on stable. So what gives?
> >
> 
> 12.1 has reached EOL now that 12.2 has been out a while.

From release/12.1.0/ :

"Tag releng/12.1@r354233 as release/12.1.0 (12.1-RELEASE)"

I think that implicit in Warner's response is that
versions of stable/12/ that are not after r354233 are
also EOL. One needs to have stable/12/ material from
after -r354233 in order for it to be supported.

He might even mean that stable/12/ material from before:

"Tag releng/12.2@r366954 as release/12.2.0 (12.2-RELEASE)"

would also be considered as not supported.

To be safe you should be using stable/12/ material from
on or after -r366954 in order to have a supported
context.

(I'm not sure if anything is explicit about the status
of stable/12/ material between releng/12.1@r354233
and releng/12.2@r366954 .)

Since you did not provide the output from the
likes of "uname -apKU" (or some rough equivalent)
I've no direct clue which version you were trying.
But you should be able to compare to the above to
see which range the material is from.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: git to svn update frequency ?

2021-02-18 Thread Mark Millard via freebsd-stable



On 2021-Feb-18, at 05:33, Mark Millard  wrote:

> mike tancsa mike at sentex.net wrote on
> Thu Feb 18 10:33:14 UTC 2021 :
> 
>> On 2/17/2021 12:10 PM, Warner Losh wrote:
>>> On Feb 17, 2021, at 6:05 AM, mike tancsa  wrote:
>>>>I noticed on a box that I update RELENG_12 via git there are more
>>>> recent commits then if I use svnlite to track.  Are they only
>>>> periodically updated ? If so, how frequently do they get refreshed ? 
>>>> e.g. I see the new OpenSSL version in git, but not when I update via
>>>> svnlite.
>>> Yes. There is a lag for a number of reasons. The updates happen on a 
>>> batched basis (it’s a script I wrote) and then there’s a delay in 
>>> replication to the main subversion servers. I believe that the rate is on 
>>> the scale of hourly, but lwhsu will have to answer that detail.
>>> 
>> Hi Warner & Li-Wen,
>> 
>>I think something might be broken somewhere ? The last update is
>> from ~ 36 hrs ago and there have been many commits to the git repo since
>> for RELENG_12.
>> 
>> # svnlite update
>> Updating '.':
>> At revision 369283.
>> #
> 
> You are referencing 12, not 13 . . .
> 
> https://cgit.freebsd.org/src/log/?h=releng/12.0
> 
> shows the most recent releng/12.0 in git is from 2021-Jan-28:
> 
> Commit message (Expand)                               Author         Age         Files  Lines
> Add UPDATING entries and bump version. releng/12.0    Gordon Tetlow  2020-01-28  2      -1/+17
> 
> 
> Are you confusing stable/12 and releng/12.0 or 
> possibly releng/12.0 and releng/13.0 ?

Dumb of me to show releng/12.0 instead of releng/12.2 , I guess.
But I luck out: releng/12.2 was only one day more recent . . .

https://cgit.freebsd.org/src/log/?h=releng/12.2

shows:

Commit message (Expand)                             Author    Age         Files  Lines
Add UPDATING entry and bump version releng/12.2     Ed Maste  2021-01-29  2      -1/+17


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: git to svn update frequency ?

2021-02-18 Thread Mark Millard via freebsd-stable

mike tancsa mike at sentex.net wrote on
Thu Feb 18 10:33:14 UTC 2021 :

> On 2/17/2021 12:10 PM, Warner Losh wrote:
> > On Feb 17, 2021, at 6:05 AM, mike tancsa  wrote:
> >> I noticed on a box that I update RELENG_12 via git there are more
> >> recent commits than if I use svnlite to track.  Are they only
> >> periodically updated ? If so, how frequently do they get refreshed ? 
> >> e.g. I see the new OpenSSL version in git, but not when I update via
> >> svnlite.
> > Yes. There is a lag for a number of reasons. The updates happen on a 
> > batched basis (it’s a script I wrote) and then there’s a delay in 
> > replication to the main subversion servers. I believe that the rate is on 
> > the scale of hourly, but lwhsu will have to answer that detail.
> >
> Hi Warner & Li-Wen,
> 
> I think something might be broken somewhere ? The last update is
> from ~ 36 hrs ago and there have been many commits to the git repo since
> for RELENG_12.
> 
> # svnlite update
> Updating '.':
> At revision 369283.
> #

You are referencing 12, not 13 . . .

https://cgit.freebsd.org/src/log/?h=releng/12.0

shows the most recent releng/12.0 in git is from 2021-Jan-28:

Commit message (Expand)                               Author         Age         Files  Lines
Add UPDATING entries and bump version. releng/12.0    Gordon Tetlow  2020-01-28  2      -1/+17


Are you confusing stable/12 and releng/12.0 or 
possibly releng/12.0 and releng/13.0 ?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: where to upgrade 12-stable now, svn still, or git?

2021-02-17 Thread Mark Millard via freebsd-stable




On 2021-Feb-12, at 23:03, Mark Millard  wrote:

> Dewayne Geraghty dewayne at heuristicsystems.com.au wrote on
> Sat Feb 13 06:04:52 UTC 2021 :
> 
>> The main list we used was:
>> 
>> https://lists.freebsd.org/pipermail/svn-src-stable-12/
>> 
>> but that appears dead.
>> . . .
>> https://lists.freebsd.org/pipermail/svn-src-release/
>> 
>> suspect also dead.
> 
> I should have mentioned this area in my reply to tech-lists.
> This part of things is more git based now, probably
> meaning more use of https://svnweb.freebsd.org/ to look
> at commits/check-ins is needed in order to see the modern
> cross-references between svn and git. (Such is not available
> from the git side.)
> 
> (Older history in svn does not have git references as
> far as I know.)
> 
>> I suspect that
>> 
>> https://lists.freebsd.org/pipermail/dev-commits-src-branches/2021-January/thread.html
>> 
>> is the stable-12 equivalent but are incremental patch releases also
>> available here?
> 
> 
> That covers stable/11 , stable/12 , and stable/13 . But no list
> that I know of covers any releng/* or release/* commit activity.

That last sentence is false as of today for releng/13.0 :

https://lists.freebsd.org/pipermail/dev-commits-src-branches/2021-February/thread.html

lists 7 releng/13.0 entries, the first being:

git: 00abeecb4a25 - releng/13.0 - pf: Slightly relax pf_rule_addr validation   
Kristof Provost


> For the git side of things, one has to look at the likes of
> branches via cgit (or whatever) via the likes of:
> 
> https://cgit.freebsd.org/src/log/?h=releng/12.2
> https://cgit.freebsd.org/src/log/?h=releng/13.0
> 
> Something like release/12.2.0 seems to be via a tag
> on a commit. So https://cgit.freebsd.org/src/log/?h=releng/12.2
> lists it but https://cgit.freebsd.org/src/log/?h=stable/12
> does not. (There are commits to releng/12.2 after the
> release/12.2.0 tag.)
> 
> Of course, for 12 there still is:
> 
> https://svnweb.freebsd.org/base/release/12.2.0/
> https://svnweb.freebsd.org/base/releng/12.2/
> 
> as a svn side view of things that has the modern
> cross references to git included.




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: where to upgrade 12-stable now, svn still, or git?

2021-02-12 Thread Mark Millard via freebsd-stable

Dewayne Geraghty dewayne at heuristicsystems.com.au wrote on
Sat Feb 13 06:04:52 UTC 2021 :

> The main list we used was:
> 
> https://lists.freebsd.org/pipermail/svn-src-stable-12/
> 
> but that appears dead.
> . . .
> https://lists.freebsd.org/pipermail/svn-src-release/
> 
> suspect also dead.

I should have mentioned this area in my reply to tech-lists.
This part of things is more git based now, probably
meaning more use of https://svnweb.freebsd.org/ to look
at commits/check-ins is needed in order to see the modern
cross-references between svn and git. (Such is not available
from the git side.)

(Older history in svn does not have git references as
far as I know.)

> I suspect that
> 
> https://lists.freebsd.org/pipermail/dev-commits-src-branches/2021-January/thread.html
> 
> is the stable-12 equivalent but are incremental patch releases also
> available here?


That covers stable/11 , stable/12 , and stable/13 . But no list
that I know of covers any releng/* or release/* commit activity.

For the git side of things, one has to look at the likes of
branches via cgit (or whatever) via the likes of:

https://cgit.freebsd.org/src/log/?h=releng/12.2
https://cgit.freebsd.org/src/log/?h=releng/13.0

Something like release/12.2.0 seems to be via a tag
on a commit. So https://cgit.freebsd.org/src/log/?h=releng/12.2
lists it but https://cgit.freebsd.org/src/log/?h=stable/12
does not. (There are commits to releng/12.2 after the
release/12.2.0 tag.)

Of course, for 12 there still is:

https://svnweb.freebsd.org/base/release/12.2.0/
https://svnweb.freebsd.org/base/releng/12.2/

as a svn side view of things that has the modern
cross references to git included.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: where to upgrade 12-stable now, svn still, or git?

2021-02-12 Thread Mark Millard via freebsd-stable

tech-lists tech-lists at zyxst.net wrote on
Sat Feb 13 04:11:46 UTC 2021 :

>  Basically I'm asking "which is the source for truth now".


The official answer for 12 as I understand it: git,
where the commits are initially made before they are
converted into svn. (Thus svn is time delayed.)

But that is complicated because the official builds
for releng, release, and even snapshots, are still
based on svn (not git) for 12 and still use rDD
numbering from svn and will be for the life of 12
(and 11).

(There is no svn branch for 13 and later.)

So, tracking relationships to official builds is easier
from the svn side of things and it also provides pointers
back to git.

To me that makes answers to "which is the source for
truth now" problematical: it would be hard to avoid
mixing the criteria from what I can tell.

(But, I'm unlikely to deal with before 13 and so likely
will be able to avoid the issue myself.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: where to upgrade 12-stable now, svn still, or git?

2021-02-12 Thread Mark Millard via freebsd-stable

> As subject, where to get sources for 12-stable upgrade now? Is it still
> svn or is it git?

Probably your choice. But one thing that could
bias towards svn is that the svn information
spans identifying both the svn and the git
material but the git commit does not identify
the svn material. For example, via:

https://svnweb.freebsd.org/base/stable/12/lib/?sortby=rev&sortdir=down&view=log

is the following . . .

QUOTE
Revision 369260 - Directory Listing 
Modified Fri Feb 12 21:02:48 2021 UTC (4 hours, 49 minutes ago) by dim
test_inf_inputs: Use atf_tc_expect_fail() instead of atf_tc_skip()

Reviewed By: lwhsu
Differential Revision: https://reviews.freebsd.org/D28396


(cherry picked from commit 4d2edf3af1dbd8a3e7cf1b22343a1ecfc2dd41ba)

Fix lib/msun's ctrig_test/test_inf_inputs test case with clang >= 10

This sprinkles a few strategic volatiles in an attempt to defeat clang's
optimization interfering with the expected floating-point exception
flags.

Reported by: lwhsu
PR: 244732

(cherry picked from commit ac76bc1145dd7f4476e5d982ce8f355f71015713)

Git Hash:   f2a88e744701de1b37d7463828f2147f96e39d58
Git Author: arichard...@freebsd.org
END QUOTE

So both -r369260 and git hash-ids are indicated.
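
For scripted use, the git hash can be pulled out of such an svn log
message. This is a minimal sketch that assumes the message text
carries a line of the form "Git Hash:" followed by a 40-hex-digit id
at the start of a line, as in the quote above; the function name is
made up.

```sh
#!/bin/sh
# git_hash_from_svn_log: read an svn log message on stdin and print
# the 40-hex-digit value of its "Git Hash:" line, if there is one.
git_hash_from_svn_log() {
    sed -n 's/^Git Hash:[[:space:]]*\([0-9a-f]\{40\}\)[[:space:]]*$/\1/p'
}
```

Piping the text of the -r369260 log message through it should print
f2a88e744701de1b37d7463828f2147f96e39d58. Older svn history without
git references prints nothing.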

By contrast, the cgit commit's display does not identify the
svn side's -r369260 :

QUOTE
diff options
context:
space:  
mode:   
author      Alex Richardson   2021-01-29 09:28:40 +
committer   Dimitry Andric    2021-02-12 20:50:28 +
commit      f2a88e744701de1b37d7463828f2147f96e39d58 (patch)
tree        0db8207a810f40d7f82c2033f8377ed38ce08ba2
parent      9525ccc84e337f4261425fc8fbf9f0de18500a1b (diff)
download    src-f2a88e744701de1b37d7463828f2147f96e39d58.tar.gz
            src-f2a88e744701de1b37d7463828f2147f96e39d58.zip
test_inf_inputs: Use atf_tc_expect_fail() instead of atf_tc_skip() stable/12
Reviewed By: lwhsu
Differential Revision: https://reviews.freebsd.org/D28396


(cherry picked from commit 4d2edf3af1dbd8a3e7cf1b22343a1ecfc2dd41ba)

Fix lib/msun's ctrig_test/test_inf_inputs test case with clang >= 10

This sprinkles a few strategic volatiles in an attempt to defeat clang's
optimization interfering with the expected floating-point exception
flags.

Reported by: lwhsu
PR: 244732


(cherry picked from commit ac76bc1145dd7f4476e5d982ce8f355f71015713)
END QUOTE


Matching up stable revisions with releng/12.3/ or release/12.3.0/ in
the future would be easier starting from svn material in the first
place and would provide identification for git as well.

But I've no clue if such would be important to what you might need
to do with 12.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: swap space issues

2020-06-29 Thread Mark Millard via freebsd-stable




On 2020-Jun-29, at 14:12, Donald Wilde  wrote:

> On 6/29/20, Mark Millard  wrote:
>> [I'm now subscribed so my messages should go through to
>> the list.]
>> 
>> On 2020-Jun-29, at 06:17, Donald Wilde  wrote:
>> 
>>> . . .
>> 
>> You report using:
>> 
>> # For possibly insufficient swap/paging space
>> # (might run out), increase the pageout delay
>> # that leads to Out Of Memory killing of
>> # processes:
>> vm.pfault_oom_attempts= 10
>> vm.pfault_oom_wait= 1
>> # (The multiplication is the total but there
>> # are other potential tradeoffs in the factors
>> # multiplied for the same total.)
>> 
>> Note: kib might be interested in what happens
>> for, say, 10 and 1, 5 and 2, and 1 and 10.
>> He has asked for such before from someone
>> having OOM problems but, to my knowledge,
>> no one has taken him up on such testing.
>> (He might be only after 10/1 and 1/10 or
>> other specific figures. Best to ask him if
>> you want to try such things for him.)
> 
> Who is 'kib'? I'm still learning the current team of the Project.

Konstantin Belousov

Also known as kib (from kib at freebsd.org).
Also known as kostik (from part of his gmail address?).


>> I've always set up to use vm.pfault_oom_attempts=-1
>> (avoiding running out of swap space by how I
>> configure things and what I choose to run). I
>> avoid things like tmpfs that compete for RAM,
>> especially in low memory contexts.
> 
> Until you explained what you have taught me, I thought these were
> swap-related issues.
> 
> TBH, I am getting disgusted with Synth, as good as it (by spec, not
> actuality) is supposed to be.

While I experimented with Synth a little a long time ago,
I normally stick to tools and techniques that work across
amd64, powerpc64, aarch64, 32-bit powerpc, and armv7 when
I can. So, the experiment was strictly temporary on one
environment at the time.

> CCache I've used for years, and never had this kind of issue.
>> 
>> For 64-bit environments I've never had to have
>> enough swapspace that the boot reported an issue
>> for kern.maxswzone : more swap is allowed for
>> the same amount of RAM as is allowed for a 32-bit
>> environment.
> 
> Now that you've opened the possibility, it would explain how it goes
> from <3% swap use to OOM in moments... it's not a swap usage issue!
> That's an important thing to learn.
> 
> Not having heard from anyone else, I'm in the process of zeroing my
> drive and starting over.
>> 
>> In the 64-bit type of context with 1 GiByte+
>> of RAM I do -j4 buildworld buildkernel with 3072 MiBytes
>> of swap. For 2 GiByte+ of RAM I use 4 poudriere builders
>> (one per core), each allowed 4 processes
>> (ALLOW_MAKE_JOBS=yes), so the load average can at times
>> reach around 16 over significant periods. I also use
>> USB SSDs instead of spinning rust. The port builds
>> include a couple of llvm's and other toolchains. But
>> there could be other stuff around that would not fit.
>> 
>> (So synth for you vs. poudriere for me is a
>> difference in our contexts. Also, I stick to
>> default kern.maxswzone use without boot
>> messages about exceeding the maximum
>> recommended amount. Increasing kern.maxswzone
>> trades off KVM available for other purposes and
>> I avoid the tradeoffs that I do not understand.)
> [snip]
>> (My context is head, not stable.)
> 
> Thanks for documenting your usage. I'll store a pointer to this week's
> -stable archives  so I can come back to this when I get to smaller
> builds.
>> 
>> . . .
>> 
>>> What got corrupted was one of the /usr/.ccache directories, but
>>> 'ccache -C' doesn't clear it.
>> 
>> I've not used ccache. So that is another variation
>> in our contexts.
>> 
>> I use UFS, not ZFS. I avoid tmpfs and such that compete
>> for memory.
> 
> I'm using UFS on MBR partitions.

GPT for root file systems for me, other than any old PowerMacs
(APM). (On the small arm's I just use microsd cards to get to
booting the root file system on a GPT based USB SSD via a
technique that works the same for all such arms that I
sometimes have access to, other than the RPi4's at this stage.)

>> . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: swap space issues

2020-06-29 Thread Mark Millard via freebsd-stable

[I'm now subscribed so my messages should go through to
the list.]

On 2020-Jun-29, at 06:17, Donald Wilde  wrote:

> [adding maintainers of synth and ccache]
> 
> On 6/29/20, Mark Millard  wrote:
>> Based on "small arm system" context experiments
>> mostly . . .
>> 
>> If your console messages do not include
>> messages about "swap_pager_getswapspace(...): failed",
>> then it is unlikely that being out of swap space
>> is the actual issue even when it reports: "was killed:
>> out of swap space" messages. For such contexts, making
>> the swap area bigger does not help.
>> 
> 
> It did not show those getswapspace messages.

Any other console messages that might be of interest?
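
The rule of thumb from my earlier note could be applied to saved
console output with something like the following sketch. The function
name and the result wording are made up; the two message patterns are
the ones discussed in this thread.

```sh
#!/bin/sh
# classify_oom_log: read console/log text on stdin and apply the
# rule of thumb: an "out of swap space" kill only points at swap
# size if swap_pager_getswapspace failures are also present.
classify_oom_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'swap_pager_getswapspace.*failed'; then
        echo "getswapspace failures present: swap may really be exhausted"
    elif printf '%s\n' "$log" | grep -q 'was killed: out of swap space'; then
        echo "kill without getswapspace failures: look at other causes"
    else
        echo "no OOM kill messages found"
    fi
}
```

Typical use would be along the lines of "dmesg | classify_oom_log".
The second result corresponds to the sustained-low-free-RAM and
delayed-pageout causes listed in the quoted text below.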

>> In other words, "was killed: out of swap space"
>> is frequently a misnomer and not to be believed
>> for "why" the kill happened or what should be
>> done about it --without other evidence also being
>> present anyway.
>> 
>> Other causes include:
>> 
>> Sustained low free RAM (via stays-runnable processes).
>> A sufficiently delayed pageout.
>> The swap blk uma zone was exhausted.
>> The swap pctrie uma zone was exhausted.
>> 
>> (stays-runnable processes are not swapped out
>> [kernel stacks are not swapped out] but do actively
>> compete for RAM via paging activity. In such a
>> context, free RAM can stay low.)
>> 
>> The below material does not deal with the
>> "exhausted" causes but does deal with
>> the other 2.
>> 
>> Presuming that you are getting "was killed: out
>> of swap space" notices but are not getting
>> "swap_pager_getswapspace failed" notices and
>> that kern.maxswzone vs. system load has not
>> been adjusted in a way that leads to bad
>> memory tradeoffs . . .
>> 
>> I recommend attempting use of, say, (from
>> my /etc/sysctl.conf ):
>> 
> Attached is what I tried, but when I ran synth again, I got a
> corrupted HDD that fsck refuses to fix, whether in 1U mode or with fs
> mounted. It just will not SALVAGE even when I add the -y flag.

That is a horrible result.

I assume that you rebooted after editing
sysctl.conf or manually applied the
values separately instead.

What sort of console messages were generated?
Was the corruption the only issue? Did the system
crash? In what way?

Your notes on what you set have an incorrect
comment about a case that you did not use:

# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1 # infinite

vm.pfault_oom_attempts being -1 is a special
value that disables the logic for the
vm.pfault_oom_attempts and vm.pfault_oom_wait
pair: Willing to wait indefinitely relative to
how long the pageout takes, no retries. (Other
OOM criteria may still be active.)

You report using:

# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
vm.pfault_oom_attempts= 10
vm.pfault_oom_wait= 1
# (The multiplication is the total but there
# are other potential tradeoffs in the factors
# multiplied for the same total.)

Note: kib might be interested in what happens
for, say, 10 and 1, 5 and 2, and 1 and 10.
He has asked for such before from someone
having OOM problems but, to my knowledge,
no one has taken him up on such testing.
(He might be only after 10/1 and 1/10 or
other specific figures. Best to ask him if
you want to try such things for him.)
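
To spell out the "multiplication is the total" remark: the three
pairs above all give the same product, so they would differ only in
how the total is split between retries and per-retry waiting. A small
sketch (the function name is made up, and it treats
vm.pfault_oom_wait as a number of seconds):

```sh
#!/bin/sh
# oom_total_wait ATTEMPTS WAIT: approximate total seconds a page
# fault may wait before the pfault OOM logic gives up.
# ATTEMPTS of -1 is the special "disabled" value.
oom_total_wait() {
    attempts=$1
    wait_secs=$2
    if [ "$attempts" -eq -1 ]; then
        echo "indefinite"
    else
        echo $((attempts * wait_secs))
    fi
}

for pair in "10 1" "5 2" "1 10"; do
    # word splitting of $pair into two arguments is intentional
    echo "$pair -> $(oom_total_wait $pair)"
done
```

Each of the three pairs reports 10, while -1 for the attempts reports
"indefinite" whatever the wait value is.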

I've always set up to use vm.pfault_oom_attempts=-1
(avoiding running out of swap space by how I
configure things and what I choose to run). I
avoid things like tmpfs that compete for RAM,
especially in low memory contexts.

For 64-bit environments I've never had to have
enough swapspace that the boot reported an issue
for kern.maxswzone : more swap is allowed for
the same amount of RAM as is allowed for a 32-bit
environment.

In the 64-bit type of context with 1 GiByte+
of RAM I do -j4 buildworld buildkernel with 3072 MiBytes
of swap. For 2 GiByte+ of RAM I use 4 poudriere builders
(one per core), each allowed 4 processes
(ALLOW_MAKE_JOBS=yes), so the load average can at times
reach around 16 over significant periods. I also use
USB SSDs instead of spinning rust. The port builds
include a couple of llvm's and other toolchains. But
there could be other stuff around that would not fit.

(So synth for you vs. poudriere for me is a
difference in our contexts. Also, I stick to
default kern.maxswapzone use without boot
messages about exceeding the maximum
recommended amount. Increasing kern.maxswapzone
trades off KVM available for other purposes and
I avoid the tradeoffs that I do not understand.)

For 32-bit e

Re: How to boot from GPT partition without "bootme" attribute?

2018-10-30 Thread Mark Millard via freebsd-stable

Lev Serebryakov lev at FreeBSD.org wrote on
Tue Oct 30 18:37:14 UTC 2018 :

>  I have disk with GPT scheme and three partitions:
> 
> p1 - freebsd-boot
> p2 - freebsd-ufs
> p3 - freebsd-ufs
> 
>  pmbr is installed on this disk, and gptboot is installed on p1. Both p2
> and p3 contains valid FreeBSD installation, with /boot/loader, kernel,
> and everything.
> 
>  I have attribute "bootme" set on p3, but not on p2.
> 
>  What should I do to boot from p2?
> 
>  I've tried to interrupt gptboot and override its choice:
> 
>  0:ad(0p3)/boot/loader
> 
>  with
> 
>  0:ad(0p2)/boot/loader
> 
>  After that loader, loaded from p2, loads kernel from p3 and boots
> system from p3!

Are the kernels on p2 and p3 distinct in an identifiable way?
Can you be sure it was not a mix of the p2 kernel and p3 world
that booted? I ask because . . .

One way to control what world is booted is to adjust the
/etc/fstab where the /boot/kernel/kernel is loaded from,
having that /etc/fstab point to a different / area. I do
this on small, single board computers to get the kernel
from a microsd card but world from a USB storage media
device. (I tend to use some form of labeling style
reference to avoid device numbering dependencies.) The
/etc/fstab where world is from has / agreeing and directs
swap partition bindings and such that are appropriate to
the specific world.

(I've frequently had a world on the microsd card that the
initial /etc/fstab can be edited to point to. This gives
me a way to boot if there is a problem for the USB media.)

I've done such things in gpt and non-gpt contexts.

Any chance that the /etc/fstab initially used points
to p3's world for / ?
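
One way to check that, sketched in Python under the
assumption of the usual whitespace-separated fstab fields
(device, mount point, type, and so on). The function name
and the sample label paths are only illustrative.

```python
def root_device_from_fstab(fstab_text: str):
    """Return the device field of the entry mounted at /, or None."""
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            return fields[0]
    return None

# Hypothetical fstab content using label-style references.
sample = """\
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/gpt/MyRoot /           ufs     rw       1     1
/dev/gpt/MySwap none        swap    sw       0     0
"""
print(root_device_from_fstab(sample))
```

Running this against the /etc/fstab on the partition that
the kernel was loaded from shows which / area that kernel
will end up mounting.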

There are also things like /boot/loader.conf having
something like:

vfs.root.mountfrom='ufs:/dev/gpt/MyRoot'

to control where things are booted from.

>  If I have MBR, I could override "active" slice in boot0 MBR loader
> interactively.
> 
>  Is it analogous feature for GPT?



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

What will be tier 1 for 12.0-Release?

2018-10-24 Thread Mark Millard via freebsd-stable

I note that https://pkg.freebsd.org/ does not list FreeBSD:12:aarch64
under the Tier-2 support Package sets but instead on the list with
i386 and amd64. But the same is true for FreeBSD:11:aarch64 .

FreeBSD:12:armv7 is listed in the Tier-2 support package sets list.
The same is true for FreeBSD:12:armv6 .

https://www.freebsd.org/platforms/ and
https://www.freebsd.org/doc/en_US.ISO8859-1/articles/committers-guide/archs.html
are, of course, not updated so far. (12.0 is not released yet and maybe
nothing is changing in the status.)

It may be that the FreeBSD Core Team has not yet covered this for
12.0 or that it waits to see how the release goes for the potential
status changes before declaring a status changed. (So I may be
asking this too early.)

Just curious.


Good to see that there are pkg builds for powerpc64 these
days:  FreeBSD:12:powerpc64 and FreeBSD:11:powerpc64 are
listed in the Tier-2 support package sets list as well.


Technically the reported lists are from: pkg0.isc.freebsd.org

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-22 Thread Mark Millard via freebsd-stable

[I'm unable to reproduce the under-Hyper-V early kernel
crash for a WITH_ZFS= (implicit) build that includes the
for-loaders patch I was given to try.]

On 2018-Oct-22, at 10:01 AM, Mark Millard  wrote:

> [I will note that the loader problem has been shown to
> not be involved in the kernel problem that this
> "Subject:" was originally for.]
> 
> On 2018-Oct-22, at 9:26 AM, Warner Losh  wrote:
> 
>> On Mon, Oct 22, 2018 at 6:39 AM Mark Millard  wrote:
>>> On 2018-Oct-22, at 4:07 AM, Toomas Soome  wrote:
>>> 
>>>> On 22 Oct 2018, at 13:58, Mark Millard  wrote:
>>>>> 
>>>>> On 2018-Oct-22, at 2:27 AM, Toomas Soome  wrote:
>>>>>> 
>>>>>>> On 22 Oct 2018, at 06:30, Warner Losh  wrote:
>>>>>>> 
>>>>>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh  wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>>>>>>>> freebsd-stable@freebsd.org> wrote:
>>>>>>>> 
>>>>>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>>>>>>>> after installing the build, Hyper-V based boots are
>>>>>>>>> working.]
>>>>>>>>> 
>>>>>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard  wrote:
>>>>>>>>> 
>>>>>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard  
>>>>>>>>>> wrote:
>>>>>>>>>> . . .
>>>>>>> 
>>>>>> 
>>>>>> It would help to get output from loader lsdev -v command.
>>>>> 
>>>>> That turned out to be very interesting: The non-ZFS loader
>>>>> crashes during the listing, during disk8, which shows a
>>>>> x0 instead of a x512.
>>>>> 
>>>> 
>>>> Yes, thats the root cause there. The non-zfs loader does only *read* the 
>>>> boot disk, thats why the issue was not revealed there. 
>>>> 
>>>> It would help to identify the sector size for that disk, at least from OS, 
>>>> so we can compare with what we can get from INT13.
>>>> 
>>>> I have pretty good idea what to look there, but I am afraid we need to run 
>>>> few tests with you to understand why that disk is reporting sector size 0 
>>>> there.
>>>> 
>>>> 
>>> 
>>> Looks like I guessed wrong about the device
>>> for "drive8".
>>> 
>>> So I unplugged the only other external
>>> storage device, so the original drives
>>> 0-13 become 0-11 overall.
>>> 
>>> The machine has a multi-LUN media card reader with
>>> no cards plugged in. It is built-in rather than
>>> one that I plugged into a port. It has 4 LUN's.
>>> 
>>> So 8+4=12 and drives 0-7 show up with media before
>>> it tries any of the 4 LUN's with no card in place.
>>> 
>>> I conclude that "drive8" is an empty LUN in a media
>>> card reader.
>>> 
>>> I conclude that there is no sector size available for
>>> any of the empty LUNs in the media reader.
>>> 
>> I think you are probably right and we're hitting some divide by 0 error when 
>> we should just ignore the disk.
> 
> In the Hyper-V context, the loader and kernel do not
> see the 4-LUN media reader at all: only drives with
> normal freebsd-* style partitions and free space.
> This explains why I did not see a loader problem
> in that context.
> 
> So I conclude that the kernel crash under Hyper-V
> associated with -r338807 is a separate issue even
> though WITHOUT_ZFS= seems to have avoided the
> crash.
> 
> My plan is to continue with the -r338807 investigation
> after the loader problem is fixed in my builds. Then
> I'll go back to trying builds using WITH_ZFS= (implicit),
> both native boots and Hyper-V based ones.

So much for my ability to make that inference correctly:

The WITH_ZFS= (implicit) build worked fine for booting
natively and via Hyper-V when the patch to fix the loaders
was included in the build. I'm now unable to reproduce
this kernel-time crash.

The patch was from: https://reviews.freebsd.org/D11174

The empty LUN's in the media reader now get messages that
look something like:

disk8: Read 1 sector(s) from 0 to 0xe000 (0x8000): 0x31

early in the loader activity.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-22 Thread Mark Millard via freebsd-stable

[I will note that the loader problem has been shown to
not be involved in the kernel problem that this
"Subject:" was originally for.]

On 2018-Oct-22, at 9:26 AM, Warner Losh  wrote:

> On Mon, Oct 22, 2018 at 6:39 AM Mark Millard  wrote:
>> On 2018-Oct-22, at 4:07 AM, Toomas Soome  wrote:
>> 
>> > On 22 Oct 2018, at 13:58, Mark Millard  wrote:
>> >> 
>> >> On 2018-Oct-22, at 2:27 AM, Toomas Soome  wrote:
>> >>> 
>> >>>> On 22 Oct 2018, at 06:30, Warner Losh  wrote:
>> >>>> 
>> >>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh  wrote:
>> >>>> 
>> >>>>> 
>> >>>>> 
>> >>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>> >>>>> freebsd-stable@freebsd.org> wrote:
>> >>>>> 
>> >>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>> >>>>>> after installing the build, Hyper-V based boots are
>> >>>>>> working.]
>> >>>>>> 
>> >>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard  wrote:
>> >>>>>> 
>> >>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard  
>> >>>>>>> wrote:
>> >>>>>>> . . .
>> >>>> 
>> >>> 
>> >>> It would help to get output from loader lsdev -v command.
>> >> 
>> >> That turned out to be very interesting: The non-ZFS loader
>> >> crashes during the listing, during disk8, which shows a
>> >> x0 instead of a x512.
>> >> 
>> > 
>> > Yes, thats the root cause there. The non-zfs loader does only *read* the 
>> > boot disk, thats why the issue was not revealed there. 
>> > 
>> > It would help to identify the sector size for that disk, at least from OS, 
>> > so we can compare with what we can get from INT13.
>> > 
>> > I have pretty good idea what to look there, but I am afraid we need to run 
>> > few tests with you to understand why that disk is reporting sector size 0 
>> > there.
>> > 
>> > 
>> 
>> Looks like I guessed wrong about the device
>> for "drive8".
>> 
>> So I unplugged the only other external
>> storage device, so the original drives
>> 0-13 become 0-11 overall.
>> 
>> The machine has a multi-LUN media card reader with
>> no cards plugged in. It is built-in rather than
>> one that I plugged into a port. It has 4 LUN's.
>> 
>> So 8+4=12 and drives 0-7 show up with media before
>> it tries any of the 4 LUN's with no card in place.
>> 
>> I conclude that "drive8" is an empty LUN in a media
>> card reader.
>> 
>> I conclude that there is no sector size available for
>> any of the empty LUNs in the media reader.
>> 
> I think you are probably right and we're hitting some divide by 0 error when 
> we should just ignore the disk.
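
For illustration only: the sketch below (Python, not the
loader's actual code; the function name and the 8192-byte
request are made up) shows the kind of guard being
suggested. A byte count is converted to a sector count by
dividing by the sector size, so a device reporting the x0
seen for disk8 has to be skipped before that division.

```python
def sectors_for_bytes(nbytes: int, sector_size: int):
    """Sectors needed to hold nbytes, or None if the sector size is unusable."""
    if sector_size <= 0:
        # An empty card-reader LUN can report 0; dividing would fault.
        return None
    return (nbytes + sector_size - 1) // sector_size

# Sector sizes as reported: x512 for a normal disk, x0 for disk8.
reported = {"disk0": 512, "disk8": 0}
for name, sector_size in reported.items():
    count = sectors_for_bytes(8192, sector_size)
    if count is None:
        print(f"{name}: no usable sector size, skipping")
    else:
        print(f"{name}: {count} sectors")
```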

In the Hyper-V context, the loader and kernel do not
see the 4-LUN media reader at all: only drives with
normal freebsd-* style partitions and free space.
This explains why I did not see a loader problem
in that context.

So I conclude that the kernel crash under Hyper-V
associated with -r338807 is a separate issue even
though WITHOUT_ZFS= seems to have avoided the
crash.

My plan is to continue with the -r338807 investigation
after the loader problem is fixed in my builds. Then
I'll go back to trying builds using WITH_ZFS= (implicit),
both native boots and Hyper-V based ones.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-22 Thread Mark Millard via freebsd-stable

On 2018-Oct-22, at 4:07 AM, Toomas Soome  wrote:

> On 22 Oct 2018, at 13:58, Mark Millard  wrote:
>> 
>> On 2018-Oct-22, at 2:27 AM, Toomas Soome  wrote:
>>> 
>>>> On 22 Oct 2018, at 06:30, Warner Losh  wrote:
>>>> 
>>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh  wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>>>>> freebsd-stable@freebsd.org> wrote:
>>>>> 
>>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>>>>> after installing the build, Hyper-V based boots are
>>>>>> working.]
>>>>>> 
>>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard  wrote:
>>>>>> 
>>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard  wrote:
>>>>>>> . . .
>>>> 
>>> 
>>> It would help to get output from loader lsdev -v command.
>> 
>> That turned out to be very interesting: The non-ZFS loader
>> crashes during the listing, during disk8, which shows a
>> x0 instead of a x512.
>> 
> 
> Yes, thats the root cause there. The non-zfs loader does only *read* the boot 
> disk, thats why the issue was not revealed there. 
> 
> It would help to identify the sector size for that disk, at least from OS, so 
> we can compare with what we can get from INT13.
> 
> I have pretty good idea what to look there, but I am afraid we need to run 
> few tests with you to understand why that disk is reporting sector size 0 
> there.
> 
> 

Looks like I guessed wrong about the device
for "drive8".

So I unplugged the only other external
storage device, so the original drives
0-13 become 0-11 overall.

The machine has a multi-LUN media card reader with
no cards plugged in. It is built-in rather than
one that I plugged into a port. It has 4 LUN's.

So 8+4=12 and drives 0-7 show up with media before
it tries any of the 4 LUN's with no card in place.

I conclude that "drive8" is an empty LUN in a media
card reader.

I conclude that there is no sector size available for
any of the empty LUNs in the media reader.

> 
> 
>> Hand transcribed from pictures:
>> 
>> OK lsdev -v
>> disk devices
>> disk0: BIOS drive C (937703088 x 512):
>>     disk0p1: FreeBSD boot 512K
>>     disk0p2: FreeBSD UFS  356G
>>     disk0p3: FreeBSD swap 15G
>>     disk0p4: FreeBSD swap 76G
>> disk1: BIOS drive D (16514064 x 512):
>>     disk1s1: Linux   2048KB
>>     disk1s2: Unknown 952GB
>> disk2: BIOS drive E (16514064 x 512):
>>     disk2p1: Unknown 128MB
>> disk3: BIOS drive F (16514064 x 512):
>>     disk3p1: Unknown 128MB
>> disk4: BIOS drive G (16434495 x 512):
>>     disk4p1: Unknown 128MB
>>     disk4p2: DOS/Windows 1716GB
>> disk5: BIOS drive H (16434495 x 512):
>>     disk5p1: FreeBSD boot 512K
>>     disk5p2: FreeBSD UFS  176G
>>     disk5p3: FreeBSD swap 193G
>>     disk5p4: FreeBSD swap 15G
>> disk6: BIOS drive I (16434495 x 512):
>>     disk6p1: Unknown 499MB
>>     disk6p2: EFI 99MB
>>     disk6p3: Unknown 16MB
>>     disk6p4: DOS/Windows 886G
>> disk7: BIOS drive J (16434495 x 512):
>>     disk7p1: FreeBSD boot 512K
>>     disk7p2: FreeBSD UFS  953G
>> disk8: BIOS drive K (262144 x 0):
>> 
>> int=  err=  efl=00010246  eip=000286bd
>> eax=  ebx=72b50430  ecx=  edx=
>> esi=  edi=00092080  ebp=00091eec  esp=00091ea8
>> cs=002b  ds=0033  es=0033fs=0033  gs=0033  ss=0033
>> cs:eip=f7 f1 89 c1 85 d2 0f 85-d8 01 00 00 6a 05 58 85
>>   f6 0f 88 75 01 00 00 89-cb c1 fb 1f 89 ca 03 55
>> ss:esp=09 00 00 00 00 00 00 00-0a 00 00 00 02 00 00 00
>>   00 00 00 00 00 00 00 00-78 1f 09 00 33 45 04 00
>> BTX halted
>> 
>> I expect that "disk8" is what gpart show -p
>> from a native boot showed as:
>> 
>> =>       1  60062499    da1  MBR  (29G)
>>          1        31         - free -  (16K)
>>         32  60062468  da1s1  fat32lba  (29G)
>> 
>> (That gpart show -p output is in another of the
>> list messages.)
>> 
>>> Also if you could test boot loader with UEFI - for example get to loader 
>>> prompt via usb/cd boot and then get the same lsdev -v output.
>> 
>> Still true given the above crash? Or, going the
>> other way, should "drive8" be left as it is in
>> order to be sure to do this test with the drive
>> present?
>> 
>> If I do this test later, it will take a bit to
>> get media to do it with. (It is about 4AM i

Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-22 Thread Mark Millard via freebsd-stable

On 2018-Oct-22, at 2:27 AM, Toomas Soome  wrote:
> 
>> On 22 Oct 2018, at 06:30, Warner Losh  wrote:
>> 
>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh  wrote:
>> 
>>> 
>>> 
>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>>> freebsd-stable@freebsd.org> wrote:
>>> 
>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>>> after installing the build, Hyper-V based boots are
>>>> working.]
>>>> 
>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard  wrote:
>>>> 
>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard  wrote:
>>>>> 
>>>>>> I attempted to jump from head -r334014 to -r339076
>>>>>> on a threadripper 1950X board and the boot fails.
>>>>>> This is both native booting and under Hyper-V,
>>>>>> same machine and root file system in both cases.
>>>>> 
>>>>> I did my investigation under Hyper-V after seeing
>>>>> a boot failure native.
>>>>> 
>>>>> Looks like the native failure is even earlier,
>>>>> before db> is even possible, possibly during
>>>>> early loader activity.
>>>>> 
>>>>> So this report is really for running under
>>>>> Hyper-V: -r338804 boots and -r338810 does
>>>>> not. By contrast -r334804 does not boot native.
>>>>> (But I've little information for that context.)
>>>>> 
>>>>> Sorry for the confusion. I rushed the report
>>>>> in hopes of getting to sleep. It was not to be.
>>>>> 
>>>>>> It fails just after the FreeBSD/SMP lines,
>>>>>> reporting "kernel trap 9 with interrupts disabled".
>>>>>> 
>>>>>> It fails in pmap_force_invalidate_cache_range at
>>>>>> a clflusl (%rax) instruction that produces a
>>>>>> "Fatal trap 9: general protection fault while
>>>>>> in kernel mode". cpuid=0 apic id= 00
>>>>>> 
>>>>>> I used kernel.txz files from:
>>>>>> 
>>>>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>>>>>> 
>>>>>> to narrow the range of kernel builds for working -> failing
>>>>>> and got:
>>>>>> 
>>>>>> -r338804 boots fine
>>>>>> (no amd64 kernel builds between to try)
>>>>>> -r338810+ fails (any that I tried, anyway)
>>>>>> 
>>>>>> In that range is -r338807 :
>>>>>> 
>>>>>> QUOTE
>>>>>> Author: kib
>>>>>> Date: Wed Sep 19 19:35:02 2018
>>>>>> New Revision: 338807
>>>>>> URL:
>>>>>> https://svnweb.freebsd.org/changeset/base/338807
>>>>>> 
>>>>>> 
>>>>>> Log:
>>>>>> Convert x86 cache invalidation functions to ifuncs.
>>>>>> 
>>>>>> This simplifies the runtime logic and reduces the number of
>>>>>> runtime-constant branches.
>>>>>> 
>>>>>> Reviewed by: alc, markj
>>>>>> Sponsored by:The FreeBSD Foundation
>>>>>> Approved by: re (gjb)
>>>>>> Differential revision:
>>>>>> https://reviews.freebsd.org/D16736
>>>>>> 
>>>>>> Modified:
>>>>>> head/sys/amd64/amd64/pmap.c
>>>>>> head/sys/amd64/include/pmap.h
>>>>>> head/sys/dev/drm2/drm_os_freebsd.c
>>>>>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>>>>>> head/sys/i386/i386/pmap.c
>>>>>> head/sys/i386/i386/vm_machdep.c
>>>>>> head/sys/i386/include/pmap.h
>>>>>> head/sys/x86/iommu/intel_utils.c
>>>>>> END QUOTE
>>>>>> 
>>>>>> There do seem to be changes associated with
>>>>>> clflush(...) use. Looking at:
>>>>>> 
>>>>>> 
>>>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>>>>>> 
>>>>>> it appears that pmap_force_invalidate_cache_range has not
>>>>>> changed since -r338807.
>>>>>> 
>>>>>> It seems that -r338806 and -r338810 would be unlikely
>>>>>> contributors.

Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-22 Thread Mark Millard via freebsd-stable




On 2018-Oct-21, at 8:30 PM, Warner Losh  wrote:

> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh  wrote:
> 
> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable 
>  wrote:
>> [I built based on WITHOUT_ZFS= for other reasons. But,
>> after installing the build, Hyper-V based boots are
>> working.]
>> 
>> On 2018-Oct-20, at 2:09 AM, Mark Millard  wrote:
>> 
>> > On 2018-Oct-20, at 1:39 AM, Mark Millard  wrote:
>> > 
>> >> I attempted to jump from head -r334014 to -r339076
>> >> on a threadripper 1950X board and the boot fails.
>> >> This is both native booting and under Hyper-V,
>> >> same machine and root file system in both cases.
>> > 
>> > I did my investigation under Hyper-V after seeing
>> > a boot failure native.
>> > 
>> > Looks like the native failure is even earlier,
>> > before db> is even possible, possibly during
>> > early loader activity.
>> > 
>> > So this report is really for running under
>> > Hyper-V: -r338804 boots and -r338810 does
>> > not. By contrast -r334804 does not boot native.
>> > (But I've little information for that context.)
>> > 
>> > Sorry for the confusion. I rushed the report
>> > in hopes of getting to sleep. It was not to be.
>> > 
>> >> It fails just after the FreeBSD/SMP lines,
>> >> reporting "kernel trap 9 with interrupts disabled".
>> >> 
>> >> It fails in pmap_force_invalidate_cache_range at
>> >> a clflusl (%rax) instruction that produces a
>> >> "Fatal trap 9: general protection fault while
>> >> in kernel mode". cpuid=0 apic id= 00
>> >> 
>> >> I used kernel.txz files from:
>> >> 
>> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>> >> 
>> >> to narrow the range of kernel builds for working -> failing
>> >> and got:
>> >> 
>> >> -r338804 boots fine
>> >> (no amd64 kernel builds between to try)
>> >> -r338810+ fails (any that I tried, anyway)
>> >> 
>> >> In that range is -r338807 :
>> >> 
>> >> QUOTE
>> >> Author: kib
>> >> Date: Wed Sep 19 19:35:02 2018
>> >> New Revision: 338807
>> >> URL: 
>> >> https://svnweb.freebsd.org/changeset/base/338807
>> >> 
>> >> 
>> >> Log:
>> >> Convert x86 cache invalidation functions to ifuncs.
>> >> 
>> >> This simplifies the runtime logic and reduces the number of
>> >> runtime-constant branches.
>> >> 
>> >> Reviewed by: alc, markj
>> >> Sponsored by:The FreeBSD Foundation
>> >> Approved by: re (gjb)
>> >> Differential revision:   
>> >> https://reviews.freebsd.org/D16736
>> >> 
>> >> Modified:
>> >> head/sys/amd64/amd64/pmap.c
>> >> head/sys/amd64/include/pmap.h
>> >> head/sys/dev/drm2/drm_os_freebsd.c
>> >> head/sys/dev/drm2/i915/intel_ringbuffer.c
>> >> head/sys/i386/i386/pmap.c
>> >> head/sys/i386/i386/vm_machdep.c
>> >> head/sys/i386/include/pmap.h
>> >> head/sys/x86/iommu/intel_utils.c
>> >> END QUOTE
>> >> 
>> >> There do seem to be changes associated with
>> >> clflush(...) use. Looking at:
>> >> 
>> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>> >> 
>> >> it appears that pmap_force_invalidate_cache_range has not
>> >> changed since -r338807.
>> >> 
>> >> It seems that -r338806 and -r338810 would be unlikely
>> >> contributors.
>> > 
>> 
>> I went after my native-boot loader problem first because I
>> could switch kernels via the loader for booting FreeBSD under
>> Hyper-V. Switching loaders is more of a problem.
>> 
>> In order to avoid the loader-time crash I switched to building
>> installing based on WITHOUT_ZFS= . I've had no active use of
>> ZFS in years. (The old official-build loaders that worked were
>> non-ZFS ones.)
>> 
>> This took care of the native-boot loader-crash --and, to my
>> surprise, also the Hyper-V-boot kernel-time crash.
>> 
>> My private builds now boot the 1950X in both contexts just
>> fine.
>> 
>> During my early investigation I did pick up specific changes
>> from after -r339076 tha

Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-21 Thread Mark Millard via freebsd-stable

[I built based on WITHOUT_ZFS= for other reasons. But,
after installing the build, Hyper-V based boots are
working.]

On 2018-Oct-20, at 2:09 AM, Mark Millard  wrote:

> On 2018-Oct-20, at 1:39 AM, Mark Millard  wrote:
> 
>> I attempted to jump from head -r334014 to -r339076
>> on a threadripper 1950X board and the boot fails.
>> This is both native booting and under Hyper-V,
>> same machine and root file system in both cases.
> 
> I did my investigation under Hyper-V after seeing
> a boot failure native.
> 
> Looks like the native failure is even earlier,
> before db> is even possible, possibly during
> early loader activity.
> 
> So this report is really for running under
> Hyper-V: -r338804 boots and -r338810 does
> not. By contrast -r334804 does not boot native.
> (But I've little information for that context.)
> 
> Sorry for the confusion. I rushed the report
> in hopes of getting to sleep. It was not to be.
> 
>> It fails just after the FreeBSD/SMP lines,
>> reporting "kernel trap 9 with interrupts disabled".
>> 
>> It fails in pmap_force_invalidate_cache_range at
>> a clflusl (%rax) instruction that produces a
>> "Fatal trap 9: general protection fault while
>> in kernel mode". cpuid=0 apic id= 00
>> 
>> I used kernel.txz files from:
>> 
>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>> 
>> to narrow the range of kernel builds for working -> failing
>> and got:
>> 
>> -r338804 boots fine
>> (no amd64 kernel builds between to try)
>> -r338810+ fails (any that I tried, anyway)
>> 
>> In that range is -r338807 :
>> 
>> QUOTE
>> Author: kib
>> Date: Wed Sep 19 19:35:02 2018
>> New Revision: 338807
>> URL: 
>> https://svnweb.freebsd.org/changeset/base/338807
>> 
>> 
>> Log:
>> Convert x86 cache invalidation functions to ifuncs.
>> 
>> This simplifies the runtime logic and reduces the number of
>> runtime-constant branches.
>> 
>> Reviewed by: alc, markj
>> Sponsored by:The FreeBSD Foundation
>> Approved by: re (gjb)
>> Differential revision:   
>> https://reviews.freebsd.org/D16736
>> 
>> Modified:
>> head/sys/amd64/amd64/pmap.c
>> head/sys/amd64/include/pmap.h
>> head/sys/dev/drm2/drm_os_freebsd.c
>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>> head/sys/i386/i386/pmap.c
>> head/sys/i386/i386/vm_machdep.c
>> head/sys/i386/include/pmap.h
>> head/sys/x86/iommu/intel_utils.c
>> END QUOTE
>> 
>> There do seem to be changes associated with
>> clflush(...) use. Looking at:
>> 
>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>> 
>> it appears that pmap_force_invalidate_cache_range has not
>> changed since -r338807.
>> 
>> It seems that -r338806 and -r338810 would be unlikely
>> contributors.
> 
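
The working -> failing narrowing quoted above amounts to a
bisection over whichever revisions have snapshot kernels
available. A minimal Python sketch of that idea, assuming a
single good -> bad transition: the boots() callable is a
hypothetical stand-in for the manual boot test, and the list
only holds revisions named in this thread, not the full set
of snapshots.

```python
def narrow_range(revisions, boots):
    """Binary search for the adjacent (last good, first bad) revisions.

    Assumes revisions is sorted, revisions[0] boots, revisions[-1] fails,
    and there is exactly one good -> bad transition in between.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[lo], revisions[hi]

available = [334014, 338804, 338810, 339076]
# Stand-in for the manual test: pretend everything up to r338804 boots.
good, bad = narrow_range(available, lambda rev: rev <= 338804)
print(f"-r{good} boots, -r{bad} fails")
```

Any commit between the two returned revisions that has no
snapshot build (such as -r338807 here) still has to be
judged by inspection or by a private build.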

I went after my native-boot loader problem first because I
could switch kernels via the loader for booting FreeBSD under
Hyper-V. Switching loaders is more of a problem.

In order to avoid the loader-time crash I switched to building
installing based on WITHOUT_ZFS= . I've had no active use of
ZFS in years. (The old official-build loaders that worked were
non-ZFS ones.)

This took care of the native-boot loader-crash --and, to my
surprise, also the Hyper-V-boot kernel-time crash.

My private builds now boot the 1950X in both contexts just
fine.

During my early investigation I did pick up specific changes
from after -r339076 that seemed to be tied to Ryzen and such.
(They made no difference to the boot problems at the time
but I saw no reason to remove them.)

# uname -apKU
FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun Oct 21 
16:44:25 PDT 2018 
markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG
  amd64 amd64 1200084 1200084


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works [ WITHOUT_ZFS= fixes it ]

2018-10-21 Thread Mark Millard via freebsd-stable

[Building and installing based on WITHOUT_ZFS= allows the
resulting loader to work correctly on the 1950X.]

On 2018-Oct-21, at 12:05 AM, Mark Millard  wrote:

> On 2018-Oct-20, at 10:32 PM, Warner Losh  wrote:
> 
>> On Sat, Oct 20, 2018 at 11:04 PM Mark Millard  wrote:
>> [I found what change led to the 1950X boot crashing
>> with BTX halted.]
>> 
>>> On 2018-Oct-20, at 12:44 PM, Mark Millard  wrote:
>>> 
>>>> [Adding some vintage information for a loader
>>>> that allowed a native boot.]
>>>> 
>>>> On 2018-Oct-20, at 4:00 AM, Mark Millard  wrote:
>>>> 
>>>>> I attempted to jump from head -r334014 to -r339076
>>>>> on a threadripper 1950X board and the native
>>>>> FreeBSD boot failed very early. (Hyper-V use of
>>>>> the same media did not have this issue.)
>>>>> 
>>>>> But copying over an older /boot/loader from another
>>>>> storage device with a FreeBSD head version that has
>>>>> not been updated yet got past the problem being
>>>>> reported here. (For other reasons, the kernel has
>>>>> been moved back to -r338804 --and with that,
>>>>> and the older /boot/loader, the 1950X native-boots
>>>>> FreeBSD all the way just fine.)
>>>> 
>>>> I found one /boot/loader.old that was dated
>>>> in the update'd file system as 2018-May 20,
>>>> instead of 2018-Apr-03 from the older file
>>>> system. May 20 would apparently mean a little
>>>> below -r334014 . It native-booted okay, as did
>>>> the April one.
>>>> 
>>>> [I do not know how to inspect a /boot/loader*
>>>> to find out what -r?? it is from.]
>>>> 
>>>> Unfortunately, I had done more than one -r339076
>>>> install from -r334014 before rebooting and
>>>> no -r334014 loaders were still present:
>>>> the other *.old files from a few minutes before
>>>> the ones I had the boot problem with.
>>>> 
>>>> I might be able to extract loaders from various:
>>>> 
>>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz
>>>> 
>>>> materials and try substituting them in order to
>>>> narrow the range for works -> fails. If I can,
>>>> this likely would take a fair amount of time in
>>>> my context.
>>>> 
>>>> Other notes:
>>>> 
>>>> It turns out that only Hyper-V based use needed
>>>> a -r334804 kernel: Native booting with the older
>>>> loaders and newer kernels works fine.
>>>> 
>>>> Windows 10 Pro 64bit also has no problems
>>>> booting and operating the machine.
>>>> 
>>>> The native-boot problem does seem to be FreeBSD
>>>> loader-vintage specific.
>>>> 
>>>>> For the BTX failure the display ends up with
>>>>> (hand transcribed, ". . ." for an omission):
>>>>> 
>>>>> BTX loader 1.00 BTX version is 1.02
>>>>> Console: internal video/keyboard
>>>>> BIOS drive C: is disk0
>>>>> . . .
>>>>> BIOS drive P: is disk13
>>>>> -
>>>>> int=  err=  efl=00010246  eip=96fd
>>>>> eax=74d48000  ebx=74d4e5e0  ecx=0011  edx=
>>>>> esi=74d4e380  edi=74d4e5b0  ebp=00091da0  esp=00091d60
>>>>> cs=002b  ds=0033  es=0033fs=0033  gs=0033  ss=0033
>>>>> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
>>>>> 45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
>>>>> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
>>>>> 00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
>>>>> BTX halted
>>>> 
>>>> I've no clue what of that output might be loader vintage
>>>> specific. It might not be of use without knowing the
>>>> exact build of the loader.
>>>> 
>>>>> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
>>>>> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.
>>>> 
>>>> For reference for the board's BIOS:
>>>> 
>>>> Version: F11e
>>>> Dated: 2018-Sep-17
>>>> Description: Update AGESA 1.1.0.1a
>>> 
>>> Using:
>>> 
>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz
>>> 
>>

Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works [ -r336532 broke it ]

2018-10-21 Thread Mark Millard via freebsd-stable

On 2018-Oct-20, at 10:32 PM, Warner Losh  wrote:

> On Sat, Oct 20, 2018 at 11:04 PM Mark Millard  wrote:
> [I found what change led to the 1950X boot crashing
> with BTX halted.]
> 
>> On 2018-Oct-20, at 12:44 PM, Mark Millard  wrote:
>> 
>> > [Adding some vintage information for a loader
>> > that allowed a native boot.]
>> > 
>> > On 2018-Oct-20, at 4:00 AM, Mark Millard  wrote:
>> > 
>> >> I attempted to jump from head -r334014 to -r339076
>> >> on a threadripper 1950X board and the native
>> >> FreeBSD boot failed very early. (Hyper-V use of
>> >> the same media did not have this issue.)
>> >> 
>> >> But copying over an older /boot/loader from another
>> >> storage device with a FreeBSD head version that has
>> >> not been updated yet got past the problem being
>> >> reported here. (For other reasons, the kernel has
>> >> been moved back to -r338804 --and with that,
>> >> and the older /boot/loader, the 1950X native-boots
>> >> FreeBSD all the way just fine.)
>> > 
>> > I found one /boot/loader.old that was dated
>> > in the updated file system as 2018-May-20,
>> > instead of 2018-Apr-03 from the older file
>> > system. May 20 would apparently mean a little
>> > below -r334014 . It native-booted okay, as did
>> > the April one.
>> > 
>> > [I do not know how to inspect a /boot/loader*
>> > to find out what -r?? it is from.]
>> > 
>> > Unfortunately, I had done more than one -r339076
>> > install from -r334014 before rebooting and
>> > no -r334014 loaders were still present:
>> > the other *.old files from a few minutes before
>> > the ones I had the boot problem with.
>> > 
>> > I might be able to extract loaders from various:
>> > 
>> > https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz
>> > 
>> > materials and try substituting them in order to
>> > narrow the range for works -> fails. If I can,
>> > this likely would take a fair amount of time in
>> > my context.
>> > 
>> > Other notes:
>> > 
>> > It turns out that only Hyper-V based use needed
>> > a -r338804 kernel: Native booting with the older
>> > loaders and newer kernels works fine.
>> > 
>> > Windows 10 Pro 64bit also has no problems
>> > booting and operating the machine.
>> > 
>> > The native-boot problem does seem to be FreeBSD
>> > loader-vintage specific.
>> > 
>> >> For the BTX failure the display ends up with
>> >> (hand transcribed, ". . ." for an omission):
>> >> 
>> >> BTX loader 1.00 BTX version is 1.02
>> >> Console: internal video/keyboard
>> >> BIOS drive C: is disk0
>> >> . . .
>> >> BIOS drive P: is disk13
>> >> -
>> >> int=  err=  efl=00010246  eip=96fd
>> >> eax=74d48000  ebx=74d4e5e0  ecx=0011  edx=
>> >> esi=74d4e380  edi=74d4e5b0  ebp=00091da0  esp=00091d60
>> >> cs=002b  ds=0033  es=0033fs=0033  gs=0033  ss=0033
>> >> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
>> >>  45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
>> >> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
>> >>  00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
>> >> BTX halted
>> > 
>> > I've no clue what of that output might be loader vintage
>> > specific. It might not be of use without knowing the
>> > exact build of the loader.
>> > 
>> >> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
>> >> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.
>> > 
>> > For reference for the board's BIOS:
>> > 
>> > Version: F11e
>> > Dated: 2018-Sep-17
>> > Description: Update AGESA 1.1.0.1a
>> 
>> Using:
>> 
>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz
>> 
>> materials I found that:
>> 
>> -r336492: worked (loader vs. zfsloader: not linked)
>> (no more amd64 builds until . . .)
>> -r336538: failed (loader vs. zfsloader: linked)
>> 
>> (Later ones that I tried also failed.)
>> 
>> Looks like this broke for booting the 1950X 
>> system in question when the following was
>> checked in:
>> 
>> Author: imp
>> Date: Fri Jul 20 05:17:37 2018
>>

Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works [ -r336532 broke it ]

2018-10-21 Thread Mark Millard via freebsd-stable

[I found what change led to the 1950X boot crashing
with BTX halted.]

On 2018-Oct-20, at 12:44 PM, Mark Millard  wrote:

> [Adding some vintage information for a loader
> that allowed a native boot.]
> 
> On 2018-Oct-20, at 4:00 AM, Mark Millard  wrote:
> 
>> I attempted to jump from head -r334014 to -r339076
>> on a threadripper 1950X board and the native
>> FreeBSD boot failed very early. (Hyper-V use of
>> the same media did not have this issue.)
>> 
>> But copying over an older /boot/loader from another
>> storage device with a FreeBSD head version that has
>> not been updated yet got past the problem being
>> reported here. (For other reasons, the kernel has
>> been moved back to -r338804 --and with that,
>> and the older /boot/loader, the 1950X native-boots
>> FreeBSD all the way just fine.)
> 
> I found one /boot/loader.old that was dated
> in the updated file system as 2018-May-20,
> instead of 2018-Apr-03 from the older file
> system. May 20 would apparently mean a little
> below -r334014 . It native-booted okay, as did
> the April one.
> 
> [I do not know how to inspect a /boot/loader*
> to find out what -r?? it is from.]
> 
> Unfortunately, I had done more than one -r339076
> install from -r334014 before rebooting and
> no -r334014 loaders were still present:
> the other *.old files from a few minutes before
> the ones I had the boot problem with.
> 
> I might be able to extract loaders from various:
> 
> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz
> 
> materials and try substituting them in order to
> narrow the range for works -> fails. If I can,
> this likely would take a fair amount of time in
> my context.
> 
> Other notes:
> 
> It turns out that only Hyper-V based use needed
> a -r338804 kernel: Native booting with the older
> loaders and newer kernels works fine.
> 
> Windows 10 Pro 64bit also has no problems
> booting and operating the machine.
> 
> The native-boot problem does seem to be FreeBSD
> loader-vintage specific.
> 
>> For the BTX failure the display ends up with
>> (hand transcribed, ". . ." for an omission):
>> 
>> BTX loader 1.00 BTX version is 1.02
>> Console: internal video/keyboard
>> BIOS drive C: is disk0
>> . . .
>> BIOS drive P: is disk13
>> -
>> int=  err=  efl=00010246  eip=96fd
>> eax=74d48000  ebx=74d4e5e0  ecx=0011  edx=
>> esi=74d4e380  edi=74d4e5b0  ebp=00091da0  esp=00091d60
>> cs=002b  ds=0033  es=0033fs=0033  gs=0033  ss=0033
>> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
>>  45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
>> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
>>  00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
>> BTX halted
> 
> I've no clue what of that output might be loader vintage
> specific. It might not be of use without knowing the
> exact build of the loader.
> 
>> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
>> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.
> 
> For reference for the board's BIOS:
> 
> Version: F11e
> Dated: 2018-Sep-17
> Description: Update AGESA 1.1.0.1a

Using:

https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz

materials I found that:

-r336492: worked (loader vs. zfsloader: not linked)
(no more amd64 builds until . . .)
-r336538: failed (loader vs. zfsloader: linked)

(Later ones that I tried also failed.)

Looks like this broke for booting the 1950X 
system in question when the following was
checked in:

Author: imp
Date: Fri Jul 20 05:17:37 2018
New Revision: 336532
URL: 
https://svnweb.freebsd.org/changeset/base/336532


Log:
  Collapse zfsloader functionality back down into loader.
. . .
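
The narrowing described above (swap in loaders from successive snapshot builds until the works -> fails boundary is found) amounts to a binary search over the revisions that actually have builds. The following is only a sketch of that idea, not something that was run for this report: the revision list is hypothetical apart from -r336492 and -r336538, and demo_boots_ok is a made-up stand-in for the manual step of installing a loader and attempting a native boot.

```c
#include <assert.h>
#include <stddef.h>

/* Callback: nonzero if the loader from this snapshot revision boots. */
typedef int (*boots_ok_fn)(unsigned revision);

/*
 * revs[] lists the revisions that have snapshot builds, ascending.
 * revs[0] must be known to work and revs[n - 1] known to fail.
 * Returns the index of the first failing build, so the breaking
 * commit lies in the range (revs[i - 1], revs[i]].
 */
static size_t
first_failing_build(const unsigned *revs, size_t n, boots_ok_fn boots_ok)
{
	size_t good, bad;

	assert(revs != NULL && boots_ok != NULL && n >= 2);
	good = 0;	/* highest index known to work */
	bad = n - 1;	/* lowest index known to fail */
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (boots_ok(revs[mid]))
			good = mid;
		else
			bad = mid;
	}
	return (bad);
}

/* Hypothetical build list; only 336492 and 336538 come from the thread. */
static const unsigned demo_revs[] = {
	334014, 335000, 336000, 336492, 336538, 337000, 339076
};
#define	DEMO_NREVS	(sizeof(demo_revs) / sizeof(demo_revs[0]))

/* Simulated boot test: pretend everything before -r336532 boots. */
static int
demo_boots_ok(unsigned revision)
{
	return (revision < 336532);
}
```

With n builds this needs about log2(n) boot attempts instead of n, which matters when each attempt is a manual loader swap and reboot.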


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works

2018-10-20 Thread Mark Millard via freebsd-stable

[Adding some vintage information for a loader
that allowed a native boot.]

On 2018-Oct-20, at 4:00 AM, Mark Millard  wrote:

> I attempted to jump from head -r334014 to -r339076
> on a threadripper 1950X board and the native
> FreeBSD boot failed very early. (Hyper-V use of
> the same media did not have this issue.)
> 
> But copying over an older /boot/loader from another
> storage device with a FreeBSD head version that has
> not been updated yet got past the problem being
> reported here. (For other reasons, the kernel has
> been moved back to -r338804 --and with that,
> and the older /boot/loader, the 1950X native-boots
> FreeBSD all the way just fine.)

I found one /boot/loader.old that was dated
in the updated file system as 2018-May-20,
instead of 2018-Apr-03 from the older file
system. May 20 would apparently mean a little
below -r334014 . It native-booted okay, as did
the April one.

[I do not know how to inspect a /boot/loader*
to find out what -r?? it is from.]

Unfortunately, I had done more than one -r339076
install from -r334014 before rebooting and
no -r334014 loaders were still present:
the other *.old files from a few minutes before
the ones I had the boot problem with.

I might be able to extract loaders from various:

https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz

materials and try substituting them in order to
narrow the range for works -> fails. If I can,
this likely would take a fair amount of time in
my context.

Other notes:

It turns out that only Hyper-V based use needed
a -r338804 kernel: Native booting with the older
loaders and newer kernels works fine.

Windows 10 Pro 64bit also has no problems
booting and operating the machine.

The native-boot problem does seem to be FreeBSD
loader-vintage specific.

> For the BTX failure the display ends up with
> (hand transcribed, ". . ." for an omission):
> 
> BTX loader 1.00 BTX version is 1.02
> Console: internal video/keyboard
> BIOS drive C: is disk0
> . . .
> BIOS drive P: is disk13
> -
> int=  err=  efl=00010246  eip=96fd
> eax=74d48000  ebx=74d4e5e0  ecx=0011  edx=
> esi=74d4e380  edi=74d4e5b0  ebp=00091da0  esp=00091d60
> cs=002b  ds=0033  es=0033fs=0033  gs=0033  ss=0033
> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
>   45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
>   00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
> BTX halted

I've no clue what of that output might be loader vintage
specific. It might not be of use without knowing the
exact build of the loader.

> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.

For reference for the board's BIOS:

Version: F11e
Dated: 2018-Sep-17
Description: Update AGESA 1.1.0.1a

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works

2018-10-20 Thread Mark Millard via freebsd-stable

I attempted to jump from head -r334014 to -r339076
on a threadripper 1950X board and the native
FreeBSD boot failed very early. (Hyper-V use of
the same media did not have this issue.)

But copying over an older /boot/loader from another
storage device with a FreeBSD head version that has
not been updated yet got past the problem being
reported here. (For other reasons, the kernel has
been moved back to -r338804 --and with that,
and the older /boot/loader, the 1950X native-boots
FreeBSD all the way just fine.)

For the BTX failure the display ends up with
(hand transcribed, ". . ." for an omission):

BTX loader 1.00 BTX version is 1.02
Console: internal video/keyboard
BIOS drive C: is disk0
. . .
BIOS drive P: is disk13
-
int=  err=  efl=00010246  eip=96fd
eax=74d48000  ebx=74d4e5e0  ecx=0011  edx=
esi=74d4e380  edi=74d4e5b0  ebp=00091da0  esp=00091d60
cs=002b  ds=0033  es=0033fs=0033  gs=0033  ss=0033
cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
   45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
BTX halted


The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-20 Thread Mark Millard via freebsd-stable




On 2018-Oct-20, at 1:39 AM, Mark Millard  wrote:

> I attempted to jump from head -r334014 to -r339076
> on a threadripper 1950X board and the boot fails.
> This is both native booting and under Hyper-V,
> same machine and root file system in both cases.

I did my investigation under Hyper-V after seeing
a boot failure native.

Looks like the native failure is even earlier,
before db> is even possible, possibly during
early loader activity.

So this report is really for running under
Hyper-V: -r338804 boots and -r338810 does
not. By contrast -r338804 does not boot native.
(But I've little information for that context.)

Sorry for the confusion. I rushed the report
in hopes of getting to sleep. It was not to be.

> It fails just after the FreeBSD/SMP lines,
> reporting "kernel trap 9 with interrupts disabled".
> 
> It fails in pmap_force_invalidate_cache_range at
> a clflush (%rax) instruction that produces a
> "Fatal trap 9: general protection fault while
> in kernel mode". cpuid=0 apic id= 00
> 
> I used kernel.txz files from:
> 
> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
> 
> to narrow the range of kernel builds for working -> failing
> and got:
> 
> -r338804 boots fine
> (no amd64 kernel builds between to try)
> -r338810+ fails (any that I tried, anyway)
> 
> In that range is -r338807 :
> 
> QUOTE
> Author: kib
> Date: Wed Sep 19 19:35:02 2018
> New Revision: 338807
> URL: 
> https://svnweb.freebsd.org/changeset/base/338807
> 
> 
> Log:
>  Convert x86 cache invalidation functions to ifuncs.
> 
>  This simplifies the runtime logic and reduces the number of
>  runtime-constant branches.
> 
>  Reviewed by: alc, markj
>  Sponsored by:The FreeBSD Foundation
>  Approved by: re (gjb)
>  Differential revision:   
> https://reviews.freebsd.org/D16736
> 
> Modified:
>  head/sys/amd64/amd64/pmap.c
>  head/sys/amd64/include/pmap.h
>  head/sys/dev/drm2/drm_os_freebsd.c
>  head/sys/dev/drm2/i915/intel_ringbuffer.c
>  head/sys/i386/i386/pmap.c
>  head/sys/i386/i386/vm_machdep.c
>  head/sys/i386/include/pmap.h
>  head/sys/x86/iommu/intel_utils.c
> END QUOTE
> 
> There do seem to be changes associated with
> clflush(...) use. Looking at:
> 
> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
> 
> it appears that pmap_force_invalidate_cache_range has not
> changed since -r338807.
> 
> It seems that -r338806 and -r338810 would be unlikely
> contributors.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

2018-10-20 Thread Mark Millard via freebsd-stable

I attempted to jump from head -r334014 to -r339076
on a threadripper 1950X board and the boot fails.
This is both native booting and under Hyper-V,
same machine and root file system in both cases.

It fails just after the FreeBSD/SMP lines,
reporting "kernel trap 9 with interrupts disabled".

It fails in pmap_force_invalidate_cache_range at
a clflush (%rax) instruction that produces a
"Fatal trap 9: general protection fault while
in kernel mode". cpuid=0 apic id= 00

I used kernel.txz files from:

https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/

to narrow the range of kernel builds for working -> failing
and got:

-r338804 boots fine
(no amd64 kernel builds between to try)
-r338810+ fails (any that I tried, anyway)

In that range is -r338807 :

QUOTE
Author: kib
Date: Wed Sep 19 19:35:02 2018
New Revision: 338807
URL: 
https://svnweb.freebsd.org/changeset/base/338807


Log:
  Convert x86 cache invalidation functions to ifuncs.
  
  This simplifies the runtime logic and reduces the number of
  runtime-constant branches.
  
  Reviewed by:  alc, markj
  Sponsored by: The FreeBSD Foundation
  Approved by:  re (gjb)
  Differential revision:
https://reviews.freebsd.org/D16736

Modified:
  head/sys/amd64/amd64/pmap.c
  head/sys/amd64/include/pmap.h
  head/sys/dev/drm2/drm_os_freebsd.c
  head/sys/dev/drm2/i915/intel_ringbuffer.c
  head/sys/i386/i386/pmap.c
  head/sys/i386/i386/vm_machdep.c
  head/sys/i386/include/pmap.h
  head/sys/x86/iommu/intel_utils.c
END QUOTE

There do seem to be changes associated with
clflush(...) use. Looking at:

https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432

it appears that pmap_force_invalidate_cache_range has not
changed since -r338807.

It seems that -r338806 and -r338810 would be unlikely
contributors.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: Heads up: OFED build by default

2018-08-07 Thread Mark Millard via freebsd-stable

Building OFED by default in head led to the following change, so
that ci.freebsd.org's FreeBSD-head-amd64-gcc builds no longer
fail/stop in all_subdir_lib/ofed :

Author: jhb
Date: Mon Aug  6 23:51:08 2018
New Revision: 337399
URL: 
https://svnweb.freebsd.org/changeset/base/337399


Log:
  Make the system C11 atomics headers fully compatible with external GCC.
  
  The <sys/cdefs.h> and <stdatomic.h> headers already included support for
  C11 atomics via intrinsics in modern versions of GCC, but these versions
  tried to "hide" atomic variables inside a wrapper structure.  This wrapper
  is not compatible with GCC's internal <stdatomic.h> header, so that if
  GCC's <stdatomic.h> was used together with <sys/cdefs.h>, use of C11
  atomics would fail to compile.  Fix this by not hiding atomic variables
  in a structure for modern versions of GCC.  The headers already avoid
  using a wrapper structure on clang.
  
  Note that this wrapper was only used if C11 was not enabled (e.g.
  via -std=c99), so this also fixes compile failures if a modern version
  of GCC was used with -std=c11 but with FreeBSD's <stdatomic.h> instead
  of GCC's <stdatomic.h> and this change fixes that case as well.
  
  Reported by:  Mark Millard
  Reviewed by:  kib
  Differential Revision:
https://reviews.freebsd.org/D16585


Modified:
  head/sys/sys/cdefs.h
  head/sys/sys/stdatomic.h


Without this FreeBSD-head-amd64-gcc was getting:

--- all_subdir_lib/ofed ---
In file included from /workspace/src/contrib/ofed/librdmacm/cma.h:43:0,
                 from /workspace/src/contrib/ofed/librdmacm/acm.c:42:
/workspace/src/contrib/ofed/librdmacm/cma.h: In function 'fastlock_init':
/workspace/src/contrib/ofed/librdmacm/cma.h:60:2: error: invalid initializer
  atomic_store(&lock->cnt, 0);
  ^
In file included from /workspace/src/contrib/ofed/librdmacm/acm.c:42:0:
/workspace/src/contrib/ofed/librdmacm/cma.h: In function 'fastlock_acquire':
/workspace/src/contrib/ofed/librdmacm/cma.h:68:2: error: operand type 'struct <anonymous> *' is incompatible with argument 1 of '__atomic_fetch_add'
  if (atomic_fetch_add(&lock->cnt, 1) > 0)
  ^~
/workspace/src/contrib/ofed/librdmacm/cma.h: In function 'fastlock_release':
/workspace/src/contrib/ofed/librdmacm/cma.h:73:2: error: operand type 'struct <anonymous> *' is incompatible with argument 1 of '__atomic_fetch_sub'
  if (atomic_fetch_sub(&lock->cnt, 1) > 1)
  ^~
. . .
--- all_subdir_lib/ofed ---
*** [acm.o] Error code 1
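
For readers unfamiliar with the code that failed to compile, here is a rough, self-contained sketch of the counter pattern that the error messages point at. It is not the librdmacm source: the demo_ names are made up, and the semaphore that the real fastlock pairs with the counter is left out, so each function just reports whether the caller would have had to block or wake a waiter. The point is only that atomic_store/atomic_fetch_add/atomic_fetch_sub are applied directly to an atomic member through a pointer, which is the usage that a wrapper-struct representation of atomics breaks under GCC's own <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the lock type used in cma.h. */
typedef struct {
	atomic_int cnt;		/* holders plus waiters */
} demo_fastlock_t;

static inline void
demo_fastlock_init(demo_fastlock_t *lock)
{
	assert(lock != NULL);
	atomic_store(&lock->cnt, 0);
}

/* Returns true if the lock was already held (caller would block). */
static inline bool
demo_fastlock_acquire(demo_fastlock_t *lock)
{
	return (atomic_fetch_add(&lock->cnt, 1) > 0);
}

/* Returns true if another thread was waiting (caller would wake it). */
static inline bool
demo_fastlock_release(demo_fastlock_t *lock)
{
	return (atomic_fetch_sub(&lock->cnt, 1) > 1);
}
```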


Side notes:

A modern enough /usr/ports avoids the separate float.h
problem for the devel/*-gcc ports: -r476273 fixed
devel/powerpc64-gcc (the master port for the
devel/*-gcc ports) to avoid this.

With both fixes in place I was able to buildworld
buildkernel via amd64's xtoolchain ( so via its
use of devel/amd64-gcc ). (The build servers likely
do not have -r476273 yet.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: zfs problems after rebuilding system [SOLVED]

2018-03-05 Thread Mark Millard via freebsd-stable

Eugene Grosbein eugen at grosbein.net wrote on
Mon Mar 5 12:20:47 UTC 2018 :

> 05.03.2018 19:10, Dimitry Andric wrote:
> 
>>> When no boot drive is detected early enough, the kernel goes to the
>>> mountroot prompt.  That seems to hold a Giant lock which inhibits
>>> further progress being made.  Sometimes progress can be made by trying
>>> to mount unmountable partitions on other drives, but this usually goes
>>> too fast, especially if the USB drive often times out.
>> 
>> What I would like to know, is why our USB stack has such timeout issues
>> at all.  When I boot Linux on the same type of hardware, I never see USB
>> timeouts.  They must be doing something right, or maybe they just don't
>> bother checking some status bits that we are very strict about?
> 
> This is heavily hardware-dependent. You may have no issues with some
> software+hardware combination and long timeouts with same software
> but different hardware.

Dimitry's example is for changing the software for the same(?) hardware,
if I understand right. (FreeBSD vs. some Linux distribution.)

(?: He did say "type of".)

Perhaps that type of hardware can be used to figure out the difference.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: 50 percent swap used, but "ps auxww" output shows no processes swapped out

2018-02-03 Thread Mark Millard via freebsd-stable

Brandon Allbery allbery.b at gmail.com wrote on
Sat Feb 3 21:18:53 UTC 2018 :

> Swapping whole processes out is not really a thing any more. Individual
> pages are paged to/from memory; if a memory page has no backing file, it
> will be allocated a block in swap space as its backing storage.
> 
> (I'm not sure "W" status even means swap; I thought whole-process swapping
> wasn't even supported any more.)

From what I've seen on the lists there is a technical distinction
made between "kernel stacks for the process no longer memory resident"
(swapped out) and other pages for the process having paged to disk and
not being resident.

But many tools do not seem to present that point of view and still
reflect an older view in the terminology used, including in
documentation. One has to interpret what one is shown as I understand.

As an example, top can show RES being zero despite the kernel stacks
for the process not having been moved to disk. RES zero might not
mean what one might expect about "swapped out".

I do not know if a W after the first letter in state (STAT) for
"ps auxww" tracks the kernel-stacks' resident-vs-not status for the
process or not. (Matching your "not sure" status.)

> On Sat, Feb 3, 2018 at 4:14 PM, Michael Voorhis  wrote:
> 
> > Hi all,
> >
> > I've got an amd64 system running 11.1-STABLE r325027, with something
> > like 20G of swap. "swapinfo" shows that half the swap is used.
> >
> > So of course I'm curious to know which processes have been swapped
> > out. I'm not using any "tmpfs" filesystems; no ZFS, no huge amounts of
> > wired-down memory. The system's got 16 processors and 128G of RAM. "ps
> > auxww" output shows *no* processes that are swapped out (2nd character
> > in "STAT" field is "W"). Not a single one. The only process with a W in
> > the stat field at all is the "[intr]" kernel thread.
> >
> > What is using the swapspace

The so-called swapspace is really the paging/swap-space with
most of the use being paging typically. (As Brandon indicated.)

Once a page is paged out, if the process sticks around but
does not use or free the page, the page likely stays
paged-out. (I'm guessing some at the intended results for
default tuning --and that you probably are using default
tuning.) So the in-use swapspace is likely from one or
more existing processes that did page-outs earlier.

(Expect my descriptions to be over simplified, but hopefully
pointing in the right general direction.)

> > Please educate me.
> >

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: Ryzen issues on FreeBSD ?

2018-01-27 Thread Mark Millard via freebsd-stable

Don Lewis truckman at FreeBSD.org wrote on
Sat Jan 27 08:23:27 UTC 2018 :

>   PID    TID COMM       TDNAME  CPU  PRI STATE   WCHAN
> 90692 100801 python2.7  -        -1  124 sleep   usem
> 90692 100824 python2.7  -        -1  124 sleep   usem
. . .


# grep -r '"usem"' /usr/src/sys/
/usr/src/sys/dev/qlnx/qlnxe/ecore_dbg_fw_funcs.c:   "usem", { true, true, true }, true, DBG_USTORM_ID,
/usr/src/sys/kern/kern_umtx.c:  error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
/usr/src/sys/kern/kern_umtx.c:  error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);

/usr/src/sys/kern/kern_umtx.c has :

#if defined(COMPAT_FREEBSD9) || defined(COMPAT_FREEBSD10)
static int
do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout)
{
. . .
        error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
. . .
#endif
. . .
static int
do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout)
{
. . .
        error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
. . .


The comparison/contrast for:

> 90692 101629 python2.7  -        -1  125 sleep   umtxn



# grep -r '"umtxn"' /usr/src/sys/
/usr/src/sys/kern/kern_umtx.c:  error = umtxq_sleep(uq, "umtxn", timeout == NULL ?

/usr/src/sys/kern/kern_umtx.c has:

static int
do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
    struct _umtx_time *timeout, int mode)
{
. . .
        /*
         * We set the contested bit, sleep. Otherwise the lock changed
         * and we need to retry or we lost a race to the thread
         * unlocking the umtx.
         */
        umtxq_lock(&uq->uq_key);
        umtxq_unbusy(&uq->uq_key);
        if (old == owner)
                error = umtxq_sleep(uq, "umtxn", timeout == NULL ?
                    NULL : &timo);
        umtxq_remove(uq);
        umtxq_unlock(&uq->uq_key);
        umtx_key_release(&uq->uq_key);
. . .

Both contexts are umtxq_sleep usage:

/*
 * Put thread into sleep state, before sleeping, check if
 * thread was removed from umtx queue.
 */
static inline int
umtxq_sleep(struct umtx_q *uq, const char *wmesg, struct abs_timeout *abstime)
. . .


Note: I'm guessing that /usr/src/sys/dev/qlnx/qlnxe/ecore_dbg_fw_funcs.c
is not involved.
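
To keep the grep results above in one place, here is a small lookup-table sketch that maps the two wait-channel names seen in the thread listing to the kern_umtx.c functions that pass them to umtxq_sleep. The table contents come from the excerpts above; the struct, table, and function names are made up for illustration, and the table deliberately covers only these two names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One WCHAN string and the kernel functions that sleep with it. */
struct wmesg_origin {
	const char *wmesg;
	const char *functions;
};

/* Only the two names discussed here; not a complete list. */
static const struct wmesg_origin umtx_wmesg_table[] = {
	{ "usem",  "do_sem_wait / do_sem2_wait" },
	{ "umtxn", "do_lock_normal" },
};

/* Returns the function names for a WCHAN value, or NULL if not listed. */
static const char *
umtx_wmesg_origin(const char *wmesg)
{
	size_t i;

	assert(wmesg != NULL);
	for (i = 0; i < sizeof(umtx_wmesg_table) /
	    sizeof(umtx_wmesg_table[0]); i++) {
		if (strcmp(umtx_wmesg_table[i].wmesg, wmesg) == 0)
			return (umtx_wmesg_table[i].functions);
	}
	return (NULL);
}
```

In other words, the "usem" threads are waiting on userland semaphores and the "umtxn" thread is waiting on a normal userland mutex; both end up in the same umtxq_sleep path.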

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: Ryzen issues on FreeBSD ?

2018-01-24 Thread Mark Millard via freebsd-stable

Mike Pumford michaelp at bsquare.com wrote on
Wed Jan 24 12:03:04 UTC 2018 :

> I've run into this on modern Intel systems as well. The RAM is sold as 
> 2400 but that's actually an overclock profile. If I actually enabled it 
> (despite both board and RAM being qualified for that) the system ends up 
> locking up or crashing as soon as you stress it. Go back to the standard 
> DDR profile advertised by the RAM and it is totally stable.

The reported failures are during idle time, as I understand it. Things
work when the CPUs are kept busy, from what I've read in the
various notes. The hang-ups are during idle times.

"the system ends up locking up or crashing as soon as you stress it"
does not sound like a matching context.

That a slower RAM speed might help idle behave correctly is interesting
given the Zen and Ryzen dependence on RAM speed for the speed of its
internal interconnect-fabric's operation.

I'll note that, if one goes through the referenced Linux exchanges about
this, Ryzen Threadripper's examples are also reported to have the problem.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Mark Millard via freebsd-stable

On 2018-Jan-21, at 12:17 PM, Don Lewis  wrote:

> On 20 Jan, Mark Millard wrote:
>> Don Lewis truckman at FreeBSD.org wrote on
>> Sat Jan 20 02:35:40 UTC 2018 :
>> 
>>> The only real problem with the old CPUs is the random segfault problem
>>> and some other random strangeness, like the lang/ghc build almost always
>>> failing.
>> 
>> 
>> At one time you had written
>> ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
>> comment #103 on 2017-Oct-09):
>> 
>> QUOTE
>> The ghc build failure seems to be gone after upgrading to a
>> more recent 12.0-CURRENT.  I will try to bisect for the fix
>> when I have a chance.
>> END QUOTE
>> 
>> Did that not pan out? Did you conclude it was
>> hardware-context specific?
> 
> I was never able to reproduce the problem.  It seems like it failed on
> the first ports build run after I replaced the CPU.  When I upgraded the
> OS and ports, the build succeeded.  I tried going back to much earlier
> OS and ports versions, but I could never get the ghc build to fail
> again.  I'm baffled by this ...

Sounds like the overall information is then:

Old CPU: frequent problem building ghc (nearly always
 fails as far as I know)

New CPU: rare problem building ghc
 (possibly never for some software version combinations?)

(On a Ryzen Threadripper 1950X I've not seen a failure. For the
above I'm including what I observed under Hyper-V for the 1800X
and 1950X as contributing evidence: The 1800X was an early one
and fit the "Old CPU" case above. AMD has stated that
threadrippers never had the problems that other, early Ryzen
CPUs did for heavy compiling use. So far, for me, that seems
true.)

So, it sounds like building ghc is still a good test. Back when
I had access to the 1800X Ryzen system ghc was the most reliable
failure-to-build of what I tried. It still may be useful for
that sort of test activity to classify Ryzen CPUs for the one
type of issue.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: Ryzen issues on FreeBSD ?

2018-01-21 Thread Mark Millard via freebsd-stable

Don Lewis truckman at FreeBSD.org wrote on
Sat Jan 20 02:35:40 UTC 2018 :

> The only real problem with the old CPUs is the random segfault problem
> and some other random strangeness, like the lang/ghc build almost always
> failing.


At one time you had written
( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
comment #103 on 2017-Oct-09):

QUOTE
The ghc build failure seems to be gone after upgrading to a
more recent 12.0-CURRENT.  I will try to bisect for the fix
when I have a chance.
END QUOTE

Did that not pan out? Did you conclude it was
hardware-context specific?


===
Mark Millard
marklmi26-fbsd at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: Ryzen issues on FreeBSD ?

2018-01-17 Thread Mark Millard via freebsd-stable

Mike Tancsa mike at sentex.net wrote on:
Wed Jan 17 14:31:50 UTC 2018 :

> On 1/17/2018 8:46 AM, Nimrod Levy wrote:
> > I've been seeing similar issues on Ryzen and asked some questions,
> > here 
> > https://lists.freebsd.org/pipermail/freebsd-stable/2017-December/088121.html
> > 
> > My previous queries didn't go anywhere.  
> >
>  
> 
> 
> Thats not very promising :(  Googling around, shows lots of similar
> reports both on FreeBSD and Linux, but its a lot of "I tweaked this BIOS
> setting and so far so good" but nothing definitive / conclusive.  Having
> to mess about with hardware settings for days on end hoping to fix
> random lockups is  not good.

See Bugzilla 219399 and 221029 :

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

I'm not sure how much stable/11 and the like have been
tracking things that were done in head (12) during this.
My use has only been via versions of head.

My 1800X use was basically after head was updated to deal
with what 219399 eventually was isolated to. (221029 is
from splitting off problems that were not originally known
to be separate.)

While I had problems for 1800X that are what the 221029
bugzilla above is about, I've not had such with a 1950X
in the same sorts of contexts as I had been using the
1800X. But this was under Hyper-V for both processor
variants (with matching boards).

I've only tried the 1950X with a native FreeBSD boot once
(a fair time ago). It showed a lockup problem fairly
quickly (power switch/plug time). I've never seen such
(or anything analogous) under Hyper-V with extensive use.

It does not look like I'll be investigating native FreeBSD
on the 1950X anytime soon. (I no longer have access to the
1800X.)

===
Mark Millard
marklmi26-fbsd at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)

11.1-STABLE for amd64: jumping from -r326142 to -r327228: all_subdir_cxgbe/t4_firmware failed to build

2017-12-26 Thread Mark Millard

g.txt' 't4fw_cfg.txt'"
indicates execution of the (whitespace changed below):

else
  ln -s /usr/src/sys/dev/cxgbe/firmware/t4fw_cfg.txt t4fw_cfg.txt;
  ld -b binary --no-warn-mismatch -d -warn-common -m elf_x86_64_fbsd -r -d \
     -o t4fw_cfg.txt.fwo t4fw_cfg.txt;
  rm t4fw_cfg.txt;
fi

The "E 99835 /usr/obj/amd64_clang/amd64.amd64/usr/src/tmp/usr/bin/ld"
indicates which ld was executed. "X 99835 1 0" indicates a non-zero
status return if I understand right.
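
The E and X records can be paired up mechanically. A minimal sketch,
assuming the record layout is as read above ("E <pid> <path>" for an
exec and "X <pid> <status> ..." for an exit). The name failed_execs is
a hypothetical helper, not part of any FreeBSD tool:

```shell
# Hypothetical helper: read filemon-style records on stdin and print
# "<pid> <path>" for every process whose X record has a non-zero status.
# Assumes "E <pid> <path>" and "X <pid> <status> ..." as interpreted above.
failed_execs() {
    awk '
        $1 == "E" { exe[$2] = $3 }
        $1 == "X" && $3 != 0 { print $2, exe[$2] }
    '
}

# Example use against a .meta file (the file name is illustrative only):
#   failed_execs < t4fw_cfg.txt.fwo.meta
```

For the records quoted above this would list pid 99835 together with
the ld path, which is the pairing described in the text.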

There is no "D  t4fw_cfg.txt" line to match up with the
"rm t4fw_cfg.txt", nor an "E" to match up with rm.
 


# uname -apKU
FreeBSD FBSDFS 11.1-STABLE FreeBSD 11.1-STABLE  r326142  amd64 amd64 1101506 
1101506

# svnlite info /usr/src/ | grep "Re[plv]"
Relative URL: ^/stable/11
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 327228
Last Changed Rev: 327228

# more ~/sys_build_scripts.amd64-host/make_amd64_nodebug_clang-amd64-host.sh 
kldload -n filemon && \
script /typescripts/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-$(date +%Y-%m-%d:%H:%M:%S) \
env __MAKE_CONF="/root/src.configs/make.conf" SRCCONF="/dev/null" SRC_ENV_CONF="/root/src.configs/src.conf.amd64-clang.amd64-host" \
WITH_META_MODE=yes \
MAKEOBJDIRPREFIX="/usr/obj/amd64_clang/amd64.amd64" \
make $*

# more /root/src.configs/src.conf.amd64-clang.amd64-host 
TO_TYPE=amd64
#
KERNCONF=GENERIC
TARGET=${TO_TYPE}
.if ${.MAKE.LEVEL} == 0
TARGET_ARCH=${TO_TYPE}
.export TARGET_ARCH
.endif
#
WITH_META_MODE=
#WITH_CROSS_COMPILER=
WITH_SYSTEM_COMPILER=
#
WITH_LIBCPLUSPLUS=
WITH_BINUTILS_BOOTSTRAP=
WITH_ELFTOOLCHAIN_BOOTSTRAP=
#WITH_CLANG_BOOTSTRAP=
WITH_CLANG=
WITH_CLANG_IS_CC=
WITH_CLANG_FULL=
WITH_CLANG_EXTRAS=
#WITH_LLD=
#WITHOUT_LLD_IS_LD=
#WITH_LLVM_LIBUNWIND=
#WITH_LLDB=
#PORTS_MODULES=emulators/virtualbox-ose-additions
#
WITH_BOOT=
WITH_LIB32=
#
WITHOUT_GCC_BOOTSTRAP=
WITHOUT_GCC=
WITHOUT_GCC_IS_CC=
WITHOUT_GNUCXX=
#
NO_WERROR=
#WERROR=
MALLOC_PRODUCTION=
#
WITH_REPRODUCIBLE_BUILD=
WITH_DEBUG_FILES=

Ryzen Threadripper 1950X HW but FreeBSD -r326142 running
under a Windows 10 Pro Hyper-V virtual machine. 110592
MB of RAM assigned. 29 virtual processors assigned.
Physical hard disk used, not a virtual one.

===
Mark Millard
markmi at dsl-only.net


Re: ryzen issues?

2017-12-15 Thread Mark Millard

On 12/15/2017 7:42 AM, Nimrod Levy wrote:

> I've been having a problem with a recent computer build.  I've got a Ryzen
> 5 1600 on an Asus Prime B350+ motherboard.  When it runs, it runs well.  I
> don't see things like programs crashing or anything like that.  

I have no debugging help to provide but can
describe a similar experience. I only tried
a direct-boot of FreeBSD once and it may be
a long time before I try again.

With a Ryzen Threadripper 1950X that normally
runs FreeBSD 12.0-CURRENT builds under Windows
10 Pro's Hyper-V, I once tried to boot and use
the system directly with FreeBSD --from the
same media that it runs on via Hyper-V. (Under
Hyper-V FreeBSD runs fine.)

It booted. But not too long later it hung up.
As I remember I was able to hold a power button
in for a longer than normal time to cut the
power. If I remember right, the light normally
visible on the USB keyboard when it is
operational on USB was out after the hangup
(and before the forced power-off). As I remember,
the video display stayed (but did not update).
Ethernet access stopped.

The Ryzen Threadripper 1950X does not have the
CPU problem that lower-end Ryzens have had.
(So far this is confirmed in my testing,
contrasted to my earlier access to an 1800X, also
used under Hyper-V, but never directly booted.)
So, I doubt that CPU issue is involved in the
hangup that I got.

I will note that I've seen list messages from
folks indicating some Ryzen variant for some
system --and they were not reporting such
hangups, nor did they indicate running under
any hypervisors. So, something more
local-context-special seems to be involved.

===
Mark Millard
markmi at dsl-only.net


stable/11 -r326142 (e.g.): "cat /dev/null | zstd --stdout" gets "/usr/bin/zstd: Undefined symbol "stat@FBSD_1.5"

2017-11-26 Thread Mark Millard

# cat /dev/null | zstd --stdout
/usr/bin/zstd: Undefined symbol "stat@FBSD_1.5"

# freebsd-version -ku
11.1-STABLE
11.1-STABLE

# uname -apKU
FreeBSD FBSDFS 11.1-STABLE FreeBSD 11.1-STABLE  r326142  amd64 amd64 1101506 
1101506

It was built from source:

# svnlite info /usr/src/
Path: .
Working Copy Root Path: /usr/src
URL: svn://svn.freebsd.org/base/stable/11
Relative URL: ^/stable/11
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 326142
Node Kind: directory
Schedule: normal
Last Changed Author: ae
Last Changed Rev: 326142
Last Changed Date: 2017-11-23 20:42:21 -0800 (Thu, 23 Nov 2017)

# svnlite status /usr/src/
# 

(So, no changes.)


/usr/src/lib/libc/sys/Symbol.map has:

FBSD_1.0 {
. . .
socket;
socketpair;
stat;
statfs;
swapoff;
swapon;
. . .

So 1.0 vs. 1.5 for some reason.
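
One way to cross-check which version node a Symbol.map assigns to a
symbol is to scan the map directly. A minimal sketch, assuming the
simple "NODE { name; ... };" layout shown in the excerpt above. The
name symbol_map_node is a hypothetical helper:

```shell
# Hypothetical helper: print the version node(s) under which a symbol
# is listed in Symbol.map-style text read from stdin.
symbol_map_node() {
    # $1 = symbol name
    awk -v sym="$1" '
        /^[A-Za-z0-9_.]+[ \t]*[{]/ {
            node = $1
            sub(/[{].*/, "", node)   # tolerate "NODE{" without a space
            next
        }
        {
            line = $0
            gsub(/[ \t;]/, "", line) # strip indentation and the trailing ";"
            if (line == sym) print node
        }
    '
}

# Example use (not run here):
#   symbol_map_node stat < /usr/src/lib/libc/sys/Symbol.map
```

For the excerpt above this should report FBSD_1.0 for stat, which is
the mismatch against the FBSD_1.5 that /usr/bin/zstd asks for.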


Note: Using /rescue/zstd avoids this issue.

===
Mark Millard
markmi at dsl-only.net


Re: 11.1 running on HyperV hn interface hangs

2017-09-06 Thread Mark Millard

Paul Koch paul.koch at akips.com wrote on
Wed Sep 6 09:33:26 UTC 2017 :

> We recently moved our software from 11.0-p9 to 11.1-p1, but looks like there
> is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2) where
> the virtual hn0 interface hangs with the following kernel messages:
> 
>  hn0:  on vmbus0
>  hn0: Ethernet address: 00:15:5d:31:21:0f
>  hn0: link state changed to UP
>  ...
>  hn0: RXBUF ack retry
>  hn0: RXBUF ack failed
>  last message repeated 571 times
> 
> . . .
> 
> Has anyone seen this problem before with 11.1 ?

While it is/was a personal use/experiment I have
used all the following under Windows 10 Pro's
Hyper-V with networking via hn0 Ethernet as seen
from the guest FreeBSD:

releng/11.1 (no longer around to remind me of the
            most recent -r?? but various updates)
stable/11   (various updates, -r320807 currently)
head        (various updates, -r323147 currently)

I had no problems with my use. (By no means a traffic
match to your context but definitely used.)

In all cases the Virtual Switch Manager was tied to the
(builtin) "External network" that is listed as:

Intel(R) I211 Gigabit Network Connection

in the Virtual Switch Properties pop-up for
External network. The machine is not a server.

So not totally broken as far as I can tell. Something
more specific to your context would seem to also be
involved.

Hyper-V has worked nicely for assigning 14 of the machine's
16 hardware threads to FreeBSD and doing buildworld buildkernel
and poudriere based port builds. (Windows 10 Pro not being
otherwise busy.)

===
Mark Millard
markmi at dsl-only.net


Re: svn commit: r322715 - in stable/11: etc/mtree lib/libcasper lib/libcasper/services lib/libcasper/services/cap_dns lib/libcasper/services/cap_dns/tests lib/libcasper/services/cap_grp lib/libcaspe

2017-08-29 Thread Mark Millard

Nevermind, stupid mistake on my part: armv6 was not actually
updated yet.

> On 2017-Aug-29, at 8:42 PM, Mark Millard  wrote:
> 
> installworld for -r323012 is getting things like (at least with the likes of 
> -j14):
> 
> --- pwd_test.install ---
> --- _proginstall ---
> install -N /usr/src/etc  -s -o root -g wheel -m 555   pwd_test 
> /usr/obj/DESTDIRs/clang-armv7-installworld-dist-from-src/usr/tests/lib/libcasper/services/cap_pwd/pwd_test
> install: pwd_test: No such file or directory
> *** [_proginstall] Error code 71
> 
> 
> This was on a amd64 -> armv6 cross build and local installworld
> on the amd64 file system, not a live install.
> 
> 
> # svnlite status /usr/src/ | sort
> ?   /usr/src/sys/amd64/conf/GENERIC-NODBG
> ?   /usr/src/sys/arm/conf/GENERIC-NODBG
> ?   /usr/src/sys/arm64/conf/GENERIC-NODBG
> ?   /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
> ?   /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
> M   /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
> M   /usr/src/sys/boot/powerpc/kboot/Makefile
> 
> 
> # uname -apKU
> FreeBSD FBSDx6411SL 11.1-STABLE FreeBSD 11.1-STABLE  r323012M  amd64 amd64 
> 1101502 1101502
> 
> 
> # svnlite info /usr/src/ | grep "Re[plv]"
> Relative URL: ^/stable/11
> Repository Root: svn://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 323012
> Last Changed Rev: 323012

===
Mark Millard
markmi at dsl-only.net



Re: svn commit: r322715 - in stable/11: etc/mtree lib/libcasper lib/libcasper/services lib/libcasper/services/cap_dns lib/libcasper/services/cap_dns/tests lib/libcasper/services/cap_grp lib/libcaspe

2017-08-29 Thread Mark Millard

installworld for -r323012 is getting things like (at least with the likes of 
-j14):

--- pwd_test.install ---
--- _proginstall ---
install -N /usr/src/etc  -s -o root -g wheel -m 555   pwd_test 
/usr/obj/DESTDIRs/clang-armv7-installworld-dist-from-src/usr/tests/lib/libcasper/services/cap_pwd/pwd_test
install: pwd_test: No such file or directory
*** [_proginstall] Error code 71


This was on an amd64 -> armv6 cross build and local installworld
on the amd64 file system, not a live install.


# svnlite status /usr/src/ | sort
?   /usr/src/sys/amd64/conf/GENERIC-NODBG
?   /usr/src/sys/arm/conf/GENERIC-NODBG
?   /usr/src/sys/arm64/conf/GENERIC-NODBG
?   /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
?   /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M   /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
M   /usr/src/sys/boot/powerpc/kboot/Makefile


# uname -apKU
FreeBSD FBSDx6411SL 11.1-STABLE FreeBSD 11.1-STABLE  r323012M  amd64 amd64 
1101502 1101502


# svnlite info /usr/src/ | grep "Re[plv]"
Relative URL: ^/stable/11
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 323012
Last Changed Rev: 323012

===
Mark Millard
markmi at dsl-only.net


Re: svn commit: r322875 - head/sys/dev/nvme

2017-08-28 Thread Mark Millard

On 2017-Aug-27, at 11:54 PM, Ed Schouten  wrote:

> 2017-08-25 14:53 GMT+02:00 Ed Schouten :
>> 2017-08-25 9:46 GMT+02:00 Mark Millard :
>>> It appears that at least 11.1-STABLE -r322807 does not handle
>>> -std=c++98 styles of use of _Static_assert for g++7 in that
>>> g++7 reports an error:
>> 
>> Maybe we need to do something like this?
>> 
>> Index: sys/sys/cdefs.h
>> ===
>> --- sys/sys/cdefs.h (revision 322887)
>> +++ sys/sys/cdefs.h (working copy)
>> @@ -294,7 +294,7 @@
>> #if (defined(__cplusplus) && __cplusplus >= 201103L) || \
>> __has_extension(cxx_static_assert)
>> #define _Static_assert(x, y) static_assert(x, y)
>> -#elif __GNUC_PREREQ__(4,6)
>> +#elif __GNUC_PREREQ__(4,6) && !defined(__cplusplus)
>> /* Nothing, gcc 4.6 and higher has _Static_assert built-in */
>> #elif defined(__COUNTER__)
>> #define _Static_assert(x, y) __Static_assert(x, __COUNTER__)
> 
> Could you let me know whether this patch fixes the build for you? If
> so, I'll commit it!

As a variant of stable/11 -r322807 . . .

buildworld and buildkernel seem to work fine.
(I did not try any port [re-]builds.)

Based on the same main.cc as before . . .

g++7 -std=c++98 main.cc
g++7 -Wpedantic -std=c++98 main.cc
g++7 -std=c++03 main.cc
g++7 -Wpedantic -std=c++03 main.cc

no longer complain (so no error, no
warning).

clang++ -Wpedantic -std=c++11 main.cc
clang++ -Wpedantic -std=c++98 main.cc
clang++ -Wpedantic -std=c++03 main.cc

each still give the warning but no error.

g++7 -Wpedantic -std=c++11 main.cc
g++7 -std=c++11 main.cc
clang++ -std=c++11 main.cc
clang++ -std=c++98 main.cc
clang++ -std=c++03 main.cc

are still silent, no errors, no warnings.
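
The twelve combinations above are the full cross product of two
compilers, with and without -Wpedantic, and three -std= values. A
minimal sketch that prints the same command lines for re-running the
check; static_assert_matrix is a hypothetical helper name, and g++7,
clang++ and main.cc are simply the names used in this thread:

```shell
# Print (do not run) every compiler/flag combination tried above.
static_assert_matrix() {
    src=${1:-main.cc}
    for cxx in g++7 clang++; do
        for pedantic in "" "-Wpedantic"; do
            for std in c++98 c++03 c++11; do
                # ${pedantic:+ ...} adds the flag only when it is non-empty
                echo "$cxx${pedantic:+ $pedantic} -std=$std $src"
            done
        done
    done
}

static_assert_matrix
```

Piping the output to "sh -x" would run the compiles one by one on a
system that has both compilers installed.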

Note that clang here is version 4 --the
same as in my original report that had the
g++7 rejection example. This is because of
the stable/11 context that I used. (An
intended MFC had been listed.)

If needed I could probably try under some
version of head (and so test clang version
5).

===
Mark Millard
markmi at dsl-only.net


Re: svn commit: r322875 - head/sys/dev/nvme

2017-08-25 Thread Mark Millard


On 2017-Aug-25, at 12:14 AM, David Chisnall  wrote:

> On 25 Aug 2017, at 07:32, Mark Millard  wrote:
>> 
>> As I remember _Static_assert is from C11, not
>> the older C99.
> 
> In pre-C11 dialects of C, _Static_assert is an identifier reserved for the 
> implementation.  sys/cdefs.h defines it to generate a zero-length array if 
> the condition is true or a negative-length array if it is false, emulating 
> the behaviour (though giving less helpful error messages)
> 
>> 
>> As I understand head/sys/dev/nvme/nvme.h use by
>> C++ code could now reject attempts to use
>> _Static_assert .
> 
> In C++, _Static_assert is an identifier reserved for the implementation, but 
> in C++11 or newer static_assert is a keyword.  sys/cdefs.h defines 
> _Static_assert to static_assert for newer versions of C++ and defines it to 
> the C-before-11-compatible version for C++-before-11.
> 
> TL;DR: We have gone to a lot of effort to ensure that these keywords work in 
> all C/C++ dialects, please use them, please report bugs if you find a case 
> where they don’t work.

It appears that at least 11.1-STABLE -r322807 does not handle
-std=c++98 styles of use of _Static_assert for g++7 in that
g++7 reports an error:

# uname -apKU
FreeBSD hzFreeBSD11S 11.1-STABLE FreeBSD 11.1-STABLE  r322807  amd64 amd64 
1101501 1101501

# more main.cc
#include "/usr/include/sys/cdefs.h"
_Static_assert(1,"Test");
int main(void)
{
   return 0;
}

# g++7 -std=c++98 main.cc
main.cc:2:15: error: expected constructor, destructor, or type conversion 
before '(' token
 _Static_assert(1,"Test");
   ^

So it appears that, as it stands, the _Static_assert
implementation requires a more modern C++ standard
vintage.


With the likes of -Wpedantic clang++ from 11.1-STABLE
-r322807 reports a warning:

# clang++ -Wpedantic -std=c++11 main.cc
main.cc:2:1: warning: _Static_assert is a C11-specific feature 
[-Wc11-extensions]
_Static_assert(1,"Test");
^
1 warning generated.

# clang++ -Wpedantic -std=c++98 main.cc
In file included from main.cc:1:
/usr/include/sys/cdefs.h:852:27: warning: variadic macros are a C99 feature 
[-Wvariadic-macros]
#define __locks_exclusive(...) \
  ^
. . . (more such macro reports) . . .
main.cc:2:1: warning: _Static_assert is a C11-specific feature 
[-Wc11-extensions]
_Static_assert(1,"Test");
^
11 warnings generated.

By contrast "g++7 -Wpedantic -std=c++11 main.cc" is silent about it.


===
Mark Millard
markmi at dsl-only.net


Re: svn commit: r322875 - head/sys/dev/nvme

2017-08-25 Thread Mark Millard

> Author: imp
> Date: Fri Aug 25 04:33:06 2017
> New Revision: 322875
> URL: 
> https://svnweb.freebsd.org/changeset/base/322875
> 
> 
> Log:
>   Use _Static_assert
>   
>   These files are compiled in userland too, so we can't use sys/systm.h
>   and rely on CTASSERT. Switch to using _Static_assert instead.
>   
>   MFC After: 3 days
>   Sponsored by: Netflix
> 
> Modified:
>   head/sys/dev/nvme/nvme.h
>   head/sys/dev/nvme/nvme_util.c

As I remember _Static_assert is from C11, not
the older C99.

As I understand head/sys/dev/nvme/nvme.h use by
C++ code could now reject attempts to use
_Static_assert .

There has been at least one old bugzilla report
for such. An example is 205453 (back around
2015-Dec).

From back then:

> # more main.cc
> #include "/usr/include/sys/cdefs.h"
> _Static_assert(1,"Test");
> int main(void)
> {
> return 0;
> }
> 
> For example:
> 
> # g++49 main.cc
> main.cc:2:15: error: expected constructor, destructor, or type conversion 
> before '(' token
>  _Static_assert(1,"Test");
> . . .
> g++49, g++5, and powerpc64-portbld-freebsd11.0-g++ all reject the above 
> source the same way that libcxxrt/guard.cc compiles are rejected during 
> powerpc64-portbld-freebsd11.0-g++ based buildworld lib32 -m32 compiles. 
> 
> gcc49, gcc5, and powerpc64-portbld-freebsd11.0-gcc all accept the above 
> instead (when in main.c instead of main.cc so it is handle as C code), with 
> or without the include. _Static_assert is specific to C11 and is not part of 
> C++. It takes explicit definitions to make the syntax acceptable as C++.
> 
> Note: clang++ (3.7) accepts the use of the C11 _Static_assert, with or 
> without the include, going well outside the C++ language definition.
> 
> . . .
> 
> Fixed in r297299 .

(The context was a C++ file head/contrib/libcxxrt/guard.cc so C++'s
static_assert was used instead and -std=c++11 was added for the
library in question [libcxxrt].)

Unless head/sys/dev/nvme/nvme.h is not to be used from
C++ code: use of _Static_assert in the header would appear
to be a problem.

===
Mark Millard
markmi at dsl-only.net


UNAME_r () and OSVERSION (1101501) do not agree on major version number , which poudriere bulk rejects as a combination.

2017-08-13 Thread Mark Millard

: /usr/local
Categories     : ports-mgmt
Licenses       : BSD2CLAUSE
Maintainer     : bdrew...@freebsd.org
WWW            : https://github.com/freebsd/poudriere/wiki
Comment        : Port build and test system
Options        :
        EXAMPLES       : on
        QEMU           : off
        ZSH            : on
Annotations    :
        repo_type      : binary
        repository     : FreeBSD
Flat size      : 2.09MiB
Description:
poudriere is a tool primarily designed to test package production on
FreeBSD. However, most people will find it useful to bulk build ports
for FreeBSD.

WWW: https://github.com/freebsd/poudriere/wiki


I tried to configure and use a -m null
-M /usr/obj/DESTDIRs/FBSDx6411SL-installworld-dist-from-src
based jail and a -m null -M /usr/ports based ports . . .

After configuring, my attempted poudriere bulk run
does the following:

[00:00:00] Creating the reference jail... done
[00:00:04] Mounting system devices for zrFBSDx64SLjail-default
[00:00:04] Mounting ports/packages/distfiles
[00:00:04] Converting package repository to new format
[00:00:04] Stashing existing package repository
[00:00:04] Mounting packages from: 
/usr/local/poudriere/data/packages/zrFBSDx64SLjail-default
/etc/resolv.conf -> 
/usr/local/poudriere/data/.m/zrFBSDx64SLjail-default/ref/etc/resolv.conf
[00:00:04] Starting jail zrFBSDx64SLjail-default
make: "/usr/ports/Mk/bsd.port.mk" line 1177: UNAME_r () and OSVERSION (1101501) 
do not agree on major version number.
[00:00:05] Logs: 
/usr/local/poudriere/data/logs/bulk/zrFBSDx64SLjail-default/2017-08-13_17h45m28s
[00:00:05] Loading MOVED
make: "/usr/ports/Mk/bsd.port.mk" line 1177: UNAME_r () and OSVERSION (1101501) 
do not agree on major version number.
[00:00:06] Error: Error looking up pre-build ports vars
[00:00:06] Cleaning up
[00:00:09] Unmounting file systems


And that brings us to what I put in the summary.
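
Note that the message shows UNAME_r as empty: "UNAME_r ()". With an
empty release string no major version can match, whatever OSVERSION
is. A minimal sketch of that kind of consistency check; the helper
names are hypothetical and the integer division is an assumption, not
the expression that bsd.port.mk itself uses:

```shell
# Hypothetical helpers sketching a major-version consistency check.
osversion_major() {
    # 1101501 -> 11 (assumes the major number is OSVERSION / 100000)
    echo $(( $1 / 100000 ))
}

uname_r_major() {
    # Keep everything before the first "."; empty input stays empty.
    echo "${1%%.*}"
}

versions_agree() {
    # $1 = UNAME_r value, $2 = OSVERSION value
    [ "$(uname_r_major "$1")" = "$(osversion_major "$2")" ]
}

versions_agree "11.1-STABLE" 1101501 && echo "agree"
versions_agree "" 1101501 || echo "do not agree"
```

The first call prints "agree"; the second, with the empty UNAME_r seen
in the log, prints "do not agree".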


===
Mark Millard
markmi at dsl-only.net


Re: stack_guard hardening bsdinstall option in STABLE and 11.1

2017-07-17 Thread Mark Millard

Vlad K. vlad-fbsd at acheronmedia.com wrote on
Mon Jul 17 15:03:11 UTC 2017 :

> I also asked why wasn't the bsdinstall-er option change 
> MFC'd after 1 day, two weeks ago, whether it's by omission, simply 
> ENOTIME, or something else...

Given what Konstantin Belousov described (the default
stack space sizes, and that guard pages apparently eat
into the stack space instead of the overall space being
bigger by the guard size), I think that would explain
not MFC'ing the change from CURRENT: it was known to be
a problem. (Although I expect Konstantin Belousov's note
here is the first public description of the problem's details.)

I agree that you did not get an answer for the other
part:

> I simply asked if it's safe to assume the sysctl to be an integer in
> 11.1


I've not gone through any draft 11.1-release code to
check.

===
Mark Millard
markmi at dsl-only.net


Re: lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work

2017-06-29 Thread Mark Millard

On 2017-Jun-29, at 3:10 AM, Gerald Pfeifer  wrote:

> On 28 June 2017 22:38:52 GMT+08:00, Mark Millard wrote:
>> A primary test is building lang/gcc5-devel under release/11.0.1
>> and then using it under stable/11 or some draft of release/11.1.0 .
> 
> Thank you, Mark. Let me know how it went. In the meantime I'll prepare the 
> change for gcc5 itself.

I'm not currently set up to run more than head on
any of amd64, powerpc64, powerpc, aarch64, or armv6/7
(which are all I target). And I'm in the middle of
attempting a fairly large jump to head -r320458 on
those. (powerpc 32-bit and 64-bit just failed
for libc++ time-usage compiling now that 32-bit has
64-bit time_t, including in world32/lib32 contexts
for powerpc64.)

It will likely be a while before I manage to have a
11.x context (without losing my head contexts), much
less examples from all "my" 5 TARGET_ARCH's. (Given past
wchar_t type handling problems (e.g.) for gcc targeting
powerpc family members I think it should be checked.)
I'll have to find and set up disks: I do not even have
such handy/ready at the moment.

[I got into this area by being asked questions, not by
my direct use of release/11.0.1 , stable/11 , or a
draft of release/11.1.0 .]

I'll let you know when I have some test results but
others may get some before I do.

> . . .
>> Eventually most of the lang/gcc* 's will need whatever
>> technique is used.
> 
> Yes, agreed. Version 5 is most important since it's the default; then 6; 4.x 
> is for retro computing fans ;-), so 7 will then be next.

[In my normal/head environment I'm switching to lang/gcc7-devel
for gcc (from lang/gcc6 ) but I'm odd that way.]

===
Mark Millard
markmi at dsl-only.net


Re: lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work

2017-06-28 Thread Mark Millard

On 2017-Jun-28, at 3:21 AM, Gerald Pfeifer <ger...@pfeifer.com> wrote:

> I am testing a patch for gcc5-devel right now that will disable fixincludes 
> (or rather its fixed files) being packaged.
> 
> Should that work fine for you, I will push this back to gcc5 the following 
> days.
> 
> That said, the change that triggered this is what I would expect on CURRENT, 
> not STABLE (and hence hoped we'd have more time for this change).
> 
> My Internet connectivity right now is only slightly above pigeon speed, so 
> sorry for any delays.

Thanks!

Some notes:

A primary test is building lang/gcc5-devel under release/11.0.1
and then using it under stable/11 or some draft of release/11.1.0 .

It looks like the lang/gcc5-devel build still creates and
uses the headers that go in include-fixed/ but that they are
removed from ${STAGEDIR}${TARGLIB} 's tree before installation
or packaging.

So, if I understand right, lang/gcc5-devel itself still does use
the adjusted headers to produce its own materials but when
lang/gcc5-devel is used later it does not. Definitely
something to be testing since it is a mix overall.
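
One simple thing to check is which include-fixed/ headers are still
present in a staged or installed tree. A minimal sketch; the helper
name list_fixed_headers and its argument are hypothetical, and which
headers are expected to remain after the port change is not something
this sketch decides:

```shell
# Hypothetical helper: list regular files under any include-fixed/
# directory below the given root (for example a staged gcc tree).
list_fixed_headers() {
    # $1 = root directory to scan
    find "$1" -type f -path '*/include-fixed/*' | sort
}

# Example use (path taken from the error reports in this thread):
#   list_fixed_headers /usr/local/lib/gcc5
```

If sys/types.h shows up under include-fixed/ in a package built on
release/11.0.1, that is the copy involved in the vm_ooffset_t and
vm_pindex_t errors.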

Is some form of exp-like run needed that tries to force use
of a release/11.0.1 built lang/gcc5-devel (-r444563) to build
other things under, say, stable/11  or some draft of
release/11.1.0 ? Is this odd combination even possible
currently?

A normal exp-run on release/11.0.1 without a system version
switch being involved also seems appropriate. The same could
be said of an exp-run based on a release/11.1.0 draft for
both building lang/gcc5-devel and using it to build other
things.

I had hoped that the Linux From Scratch technique of doing:

sed -i 's@\./fixinc\.sh@-c true@' gcc/Makefile.in

(or an equivalent) before gcc/Makefile.in is used would
allow lang/gcc5-devel to use the same headers in its build
that the installed compiler would then use to produce other
code --by avoiding generating most of the adjusted files in
the first place. But I guess that did not work out.
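
For reference, the effect of that substitution on a single line can be
seen without touching any file. A minimal sketch; neutralize_fixinc is
a hypothetical helper name and the sample input line is illustrative
only, not copied from any gcc release:

```shell
# Apply the Linux From Scratch substitution to text on stdin.
neutralize_fixinc() {
    sed 's@\./fixinc\.sh@-c true@'
}

# Illustrative input line only.
printf '%s\n' '$(SHELL) ./fixinc.sh "${out_dir}"' | neutralize_fixinc
```

The call to ./fixinc.sh becomes "-c true", so the shell runs a no-op
in place of the fixincludes script. The sketch uses a pipe because
FreeBSD's sed expects a backup-suffix argument after -i (for example
-i ''), unlike the in-place form quoted above.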

Eventually most of the lang/gcc* 's will need whatever
technique is used. Some, such as lang/gcc6-aux, need
more done because of binary bootstrap materials being
downloaded and used and so the build of lang/gcc6-aux
gets the problem and fails before staging happens: the
binary-bootstrap materials need to avoid the adjusted
headers that they currently contain.

===
Mark Millard
markmi at dsl-only.net


Re: lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work

2017-06-26 Thread Mark Millard

Top post on one point. . .

Patrick Powell papowell at astart.com wrote on Mon Jun 26 14:10:44 UTC 2017
(He was quoting Gerald. I was also part of some earlier discussions.)

> (Luckily this only hits with most -CURRENT versions of FreeBSD and
> older packages only.)
> 
> Gerald

Unfortunately this part is false if it is about the vm_ooffset_t
and vm_pindex_t issue: stable/11/ and release/11.1.0/  also
have the vm_ooffset_t and vm_pindex_t issue vs. lang/gcc*
packages built by release/11.0.1/ .

The issue is not limited to head (12) at this point:

Installing a gcc* package built by release/11.0.1/ fails
now for stable/11/ and the drafts of release/11.1.0/ .
Anyone progressing to one of those has to build the
lang/gcc* of interest from source under the newer system
context. (Mixing source builds and package builds is
discouraged as I understand.)

I'm not claiming which specific handling needs to be made.
But the vm_ooffset_t and vm_pindex_t changes did not even
make the UPDATING notes. Right now things look to have
the worst combination for lang/gcc* when release/11.1.0/
becomes official: lang/gcc* 's break without notification
or suggestion of a workaround.

===
Mark Millard
markmi at dsl-only.net

On 2017-Jun-24, at 5:55 PM, Mark Millard <mar...@dsl-only.net> wrote:

The following is based mostly on an extraction from a
private exchange in which a question was asked and my
answer was unsettling: incompatibilities within the
11.* family. I would not normally send to re but doing
so was explicitly mentioned. Hopefully this example is
reasonable for doing that.

Aspect #0: what is broken currently (and in the future?)
  within the 11.* family?

lang/gcc* packages built on release/11.0.1/ do not work
fully on stable/11/ or on the drafts of
release/11.1.0/ . (I leave releng/11.*/'s implicit.)

The change was -r313194 in head and was described with:

> Define the vm_ooffset_t and vm_pindex_t types as machine-independend.
> 
> The types are for the byte offset and page index in vm object.  They
> are similar to off_t, which is defined as 64bit MI integer.  Using MI
> definitions will allow to provide consistent MD values of vm
> object-related maximum sizes.

The known issue is the generation of header dependencies
in the lang/gcc* builds on release/11.0.1/ that when
used on stable/11/ or release/11.1.0/ generate reports
like:

/usr/local/lib/gcc5/gcc/x86_64-portbld-freebsd11.0/5.4.0/include-fixed/sys/types.h:266:9:
 error: '__vm_ooffset_t' does not name a type
typedef __vm_ooffset_t vm_ooffset_t;
   ^
/usr/local/lib/gcc5/gcc/x86_64-portbld-freebsd11.0/5.4.0/include-fixed/sys/types.h:268:9:
 error: '__vm_pindex_t' does not name a type
typedef __vm_pindex_t vm_pindex_t;
   ^
*** [CoinFactorization2.lo] Error code 1

Unfortunately UPDATING was not updated
for head/'s -r313194 (2017-Feb-4) --nor for
stable/11/'s -r313574 (2017-Feb-11), the MFC.
(No MFC was made to stable/10/ or to
release/10.3.0 as far as I found.)

(These changes predate the INO64 issue in head/ .
Head ends up with more issues than I'm dealing
with here.)

Aspect #1: what 11.* version builds the pre-built packages
  targeting 11.* and the apparent consequences
  (given the vm_ooffset_t and vm_pindex_t changes
   and the lang/gcc* build behavior)

This is the unsettling part for pre-built
packages:  incompatibilities within the 11.*
family for the lang/gcc* packages.

http://portsmon.freebsd.org/portoverview.py?category=%3Bamng=gcc5=

shows categories for builds for

8.4
9.3
10.1
10.3
11.0
head

(Nothing for stable/*/ .)

But the 10.3 rows show no package
builds. I would guess that they
start once 10.1 stops
(approximately).

So it may be that 11.1 will not
get package builds until 11.0
stops (approximately).

If so, unless lang/gcc* are changed
to bootstrap differently they will
configure to match release/11.0.1/
and will not be compatible with the
vm_ooffset_t and vm_pindex_t changes
in stable/11/ and release/11.1.0/ .

But, as I understand it, updating how the
lang/gcc* builds work to remove such
dependencies is under investigation.
I do not know any timing relative to
release/11.1.0/ if my understanding
is right.

Until then (if I was right):

Unless there are separate packages made for
targeting release/11.0.1/ vs. release/11.1.0/
it is not obvious when lang/gcc* packages
will be generally compatible with various
folks' choices about what to install as the
system version within the release/11.*/
and stable/11/ family. This would likely
be true even if they were built on
release/11.1.0/ : then release/11.0.1/
likely would have compatibility problems.

The ABI versioning does not cover the specific
issues involved based on how vm_ooffset_t and
vm_pindex_t were changed and what the
lang/gcc* builds do relative to such changes.
Yet there is incompatibility for some
fairly-significant-usage ports.

Aspect #2: stable/10/ and release/10.4.0/

Just covered for completeness:

I do not see a

lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work

2017-06-24 Thread Mark Millard

The following is based mostly on an extraction from a
private exchange in which a question was asked and my
answer was unsettling: incompatibilities within the
11.* family. I would not normally send to re but doing
so was explicitly mentioned. Hopefully this example is
reasonable for doing that.


Aspect #0: what is broken currently (and in the future?)
   within the 11.* family?

lang/gcc* packages built on release/11.0.1/ do not work
fully on stable/11/ or on the drafts of
release/11.1.0/ . (I leave releng/11.*/'s implicit.)

The change was -r313194 in head and was described with:

> Define the vm_ooffset_t and vm_pindex_t types as machine-independend.
> 
> The types are for the byte offset and page index in vm object.  They
> are similar to off_t, which is defined as 64bit MI integer.  Using MI
> definitions will allow to provide consistent MD values of vm
> object-related maximum sizes.

The known issue is the generation of header dependencies
in the lang/gcc* builds on release/11.0.1/ that, when
used on stable/11/ or release/11.1.0/ , generate reports
like:

/usr/local/lib/gcc5/gcc/x86_64-portbld-freebsd11.0/5.4.0/include-fixed/sys/types.h:266:9:
 error: '__vm_ooffset_t' does not name a type
typedef __vm_ooffset_t vm_ooffset_t;
^
/usr/local/lib/gcc5/gcc/x86_64-portbld-freebsd11.0/5.4.0/include-fixed/sys/types.h:268:9:
 error: '__vm_pindex_t' does not name a type
typedef __vm_pindex_t vm_pindex_t;
^
*** [CoinFactorization2.lo] Error code 1

Unfortunately UPDATING was not updated
for head/'s -r313194 (2017-Feb-4) --nor for
stable/11/'s -r313574 (2017-Feb-11), the MFC.
(No MFC was made to stable/10/ or to
release/10.3.0 as far as I found.)

(These changes predate the INO64 issue in head/ .
Head ends up with more issues than I'm dealing
with here.)


Aspect #1: what 11.* version builds the pre-built packages
   targeting 11.* and the apparent consequences
   (given the vm_ooffset_t and vm_pindex_t changes
and the lang/gcc* build behavior)

This is the unsettling part for pre-built
packages:  incompatibilities within the 11.*
family for the lang/gcc* packages.

http://portsmon.freebsd.org/portoverview.py?category=%3Bamng=gcc5=

shows categories for builds for

8.4
9.3
10.1
10.3
11.0
head

(Nothing for stable/*/ .)

But the 10.3 rows show no package
builds. I would guess that they
start once 10.1 stops
(approximately).

So it may be that 11.1 will not
get package builds until 11.0
stops (approximately).

If so, unless lang/gcc* are changed
to bootstrap differently they will
configure to match release/11.0.1/
and will not be compatible with the
vm_ooffset_t and vm_pindex_t changes
in stable/11/ and release/11.1.0/ .

But, as I understand it, updating how the
lang/gcc* builds work to remove such
dependencies is under investigation.
I do not know any timing relative to
release/11.1.0/ if my understanding
is right.

Until then (if I was right):

Unless there are separate packages made for
targeting release/11.0.1/ vs. release/11.1.0/
it is not obvious when lang/gcc* packages
will be generally compatible with various
folks' choices about what to install as the
system version within the release/11.*/
and stable/11/ family. This would likely
be true even if they were built on
release/11.1.0/ : then release/11.0.1/
likely would have compatibility problems.

The ABI versioning does not cover the specific
issues involved based on how vm_ooffset_t and
vm_pindex_t were changed and what the
lang/gcc* builds do relative to such changes.
Yet there is incompatibility for some
fairly-significant-usage ports.


Aspect #2: stable/10/ and release/10.4.0/

Just covered for completeness:

I do not see an MFC of -r313194 to stable/10/ :
its sys/sys/types.h dates back to 2015-Oct-10.
So it looks like 10.x has a permanent difference
in this area: 10.x continues to get separate
lang/gcc* package builds from 11.x and later.
No problem for this context as far as I know.




Note: To simplify I choose to not be explicit
about what authors wrote what original text.
If that becomes an issue, it is correctable.

Blame me for any errors in the above.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: GCC + FreeBSD 11.0 Stable - stat.h does not have vm_ooffset_t definition

2017-05-01 Thread Mark Millard

Gerald Pfeifer gerald at pfeifer.com wrote on
Sun Apr 30 15:20:35 UTC 2017 :

> That, or run the fixinc.sh script in 
> ./libexec/gcc/$TARGETTRIPLET/$VERSION/install-tools/fixinc.sh.


fixinc.sh is designed to be run by (for
the */* involved):

bootstrap/libexec/gcc/*/*/install-tools/mkheaders

and mkheaders does more than just run fixinc.sh
as far as changing headers goes: it also handles
limits.h, gsyslimits.h, and syslimits.h .

In more detail:

The mkheaders core loop looks like:

for ml in `cat ${itoolsdatadir}/fixinc_list`; do
  sysroot_headers_suffix=`echo ${ml} | sed -e 's/;.*$//'`
  multi_dir=`echo ${ml} | sed -e 's/^[^;]*;//'`
  subincdir=${incdir}${multi_dir}
  . ${itoolsdatadir}/mkheaders.conf
  if [ x${STMP_FIXINC} != x ] ; then
    TARGET_MACHINE="${target}" target_canonical="${target}" \
      MACRO_LIST="${itoolsdatadir}/macro_list" \
      /bin/sh ./fixinc.sh ${subincdir} \
      ${isysroot}${SYSTEM_HEADER_DIR} ${OTHER_FIXINCLUDES_DIRS}
    rm -f ${subincdir}/syslimits.h
    if [ -f ${subincdir}/limits.h ]; then
      mv ${subincdir}/limits.h ${subincdir}/syslimits.h
    else
      cp ${itoolsdatadir}/gsyslimits.h ${subincdir}/syslimits.h
    fi
  fi

  cp ${itoolsdatadir}/include${multi_dir}/limits.h ${subincdir}
done

Note that mkheaders also provides various definitions to
fixinc.sh, such as MACRO_LIST . Direct use of fixinc.sh
likely requires providing appropriate alternate
definitions for such.
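
As a rough illustration of how the two sed commands in the
mkheaders loop split each fixinc_list entry at its first ';',
here is a minimal C sketch. The function name
split_fixinc_entry and any sample entries are hypothetical and
are not part of GCC; only the split rule is taken from the sed
expressions quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of GCC): mirror the two sed commands
 * that mkheaders applies to one fixinc_list entry.
 *   sed -e 's/;.*$//'     keeps the text before the first ';'
 *   sed -e 's/^[^;]*;//'  keeps the text after the first ';'
 * If the entry has no ';', neither expression matches, so both
 * results are the whole entry.
 */
static void
split_fixinc_entry(const char *entry, char *suffix, char *multi_dir,
    size_t bufsize)
{
	const char *semi = strchr(entry, ';');

	assert(bufsize > 0);
	if (semi == NULL) {
		snprintf(suffix, bufsize, "%s", entry);
		snprintf(multi_dir, bufsize, "%s", entry);
		return;
	}
	/* Text before the first ';' becomes sysroot_headers_suffix. */
	snprintf(suffix, bufsize, "%.*s", (int)(semi - entry), entry);
	/* Text after the first ';' becomes multi_dir. */
	snprintf(multi_dir, bufsize, "%s", semi + 1);
}
```

Calling it with an entry consisting of just ";" yields two
empty strings, in which case subincdir is simply ${incdir}.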

I'll note that:

http://www.linuxfromscratch.org/lfs/view/7.1/chapter06/gcc.html

reports as one of its steps (quote):

The fixincludes script is known to occasionally erroneously attempt
to "fix" the system headers installed so far. As the headers up to
this point are known to not require fixing, issue the following
command to prevent the fixincludes script from running:

sed -i 's@\./fixinc\.sh@-c true@' gcc/Makefile.in

(End quote)

So it seems that disabling fixinc.sh's use is fairly common when
the headers are known to "not require fixing" (i.e., are known
to already be gcc compliant).

This still leaves the limits.h and gsyslimits.h and
syslimits.h code in place but does block most of the
activity.

===
Mark Millard
markmi at dsl-only.net


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-27 Thread Mark Millard

On 2017-Mar-21, at 7:21 PM, Mark Millard  wrote:

> On 2017-Mar-18, at 9:10 PM, Mark Millard  wrote:
> 
>> 
>> On 2017-Mar-18, at 5:53 PM, Mark Millard  wrote:
>> 
>>> A new, significant discovery follows. . .
>>> 
>>> While checking out use of procstat -v I ran
>>> into the following common property for the 3
>>> programs that I looked at:
>>> 
>>> A) My small test program that fails for
>>> a dynamically allocated space.
>>> 
>>> B) sh reporting Failed assertion: "tsd_booted".
>>> 
>>> C) su reporting Failed assertion: "tsd_booted".
>>> 
>>> Here are example addresses from the area of
>>> incorrectly zeroed memory (A then B then C):
>>> 
>>> (lldb) print dyn_region
>>> (region *volatile) $0 = 0x40616000
>>> 
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x40618520
>>> 
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x40618520
>> 
>> That last above was a copy/paste error. Correction:
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x4061d520
>> 
>>> The first is from dynamic allocation ending up
>>> in the area. The other two are from libc.so.7
>>> globals/statics ending up in the general area.
>>> 
>>> It looks like something is trashing a specific
>>> memory area for some reason, rather independently
>>> of what the program specifics are.
> 
> I probably should have noted that the processes
> involved were: child/parent then grandparent
> and then great grandparent. The grandparent
> was sh and the great grandparent was su.
> 
> The ancestors in the process tree are being
> damaged, not just the instances of the
> program that demonstrates the problem.
> 
>>> Other notes:
>>> 
>>> At least for my small program showing failure:
>>> 
>>> Being explicit about the combined conditions for failure
>>> for my test program. . .
>>> 
>>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>>> are required in order to make the program fail.
>>> 
>>> Note:
>>> 
>>> (lldb) print __je_tcache_maxclass
>>> (size_t) $0 = 32768
>>> 
>>> which is larger than SMALL_MAXCLASS. I've not observed
>>> failures for sizes above SMALL_MAXCLASS but not exceeding
>>> __je_tcache_maxclass.
>>> 
>>> Thus tcache use by itself does not seem sufficient for
>>> my program to get corruption of its dynamically allocated
>>> memory: the small allocation size also matters.
>>> 
>>> 
>>> Be warned that I can not eliminate the possibility that
>>> the trashing changed what region of memory it trashed
>>> for larger allocations or when tcache is disabled.
>> 
>> The pine64+ 2GB eventually got into a state where:
>> 
>> /etc/malloc.conf -> tcache:false
>> 
>> made no difference and the failure kept occurring
>> with that symbolic link in place.
>> 
>> But after a reboot of the pine64+ 2GB
>> /etc/malloc.conf -> tcache:false was again effective
>> for my test program. (It was still present from
>> before the reboot.)
>> 
>> I checked the .core files and the allocated address
>> assigned to dyn_region was the same in the tries
>> before and after the reboot. (I had put in an
>> additional raise(SIGABRT) so I'd always have
>> a core file to look at.)
>> 
>> Apparently /etc/malloc.conf -> tcache:false was
>> being ignored before the reboot for some reason?
> 
> I have also discovered that if the child process
> in an example like my program does a:
> 
> (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);
> 
> after the fork but before the sleep/swap-out/wait
> then the problem does not happen. This is without
> any read or write access to the memory between the
> fork and sleep/swap-out/wait.
> 
> By contrast such POSIX_MADV_WILLNEED use in the parent
> process does not change the failure behavior.

I've added another test program to bugzilla
217239 and 217138, one with thousands of 14
KiByte allocations.

The test program usually ends up with them all being
zeroed in the parent and child of the fork.

But I've had a couple of runs where a much smaller
prefix was messed up and then there were normal,
expected values.

#define region_size (14u*1024u)
. . .
#define num_regions (256u*1024u*1024u/region_size)

So num_regions==18724, using up most of 256 MiBytes.

Note: each region has its own 14 KiByte allocation.

But dyn_regions[1296].array[0] in one example was
the first normal value.

In another example dyn_regions[2180].array[4096] was
the first normal value.

The last is interesting for being part way through
an allocation's space. That, together with the offset
aligning with a 4 KiByte page size, would seem odd for
a pure-jemalloc issue.
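
For anyone wanting to check the arithmetic, here is a minimal
C sketch. The two #define lines are the ones quoted above; the
helper names, the 4 KiByte page size constant, and the
assumption that array is indexed in bytes are mine and are not
taken from the test program.

```c
#include <assert.h>
#include <stddef.h>

#define region_size (14u*1024u)
#define num_regions (256u*1024u*1024u/region_size)

/* Assumption: 4 KiByte pages, as discussed in the text. */
#define ASSUMED_PAGE_SIZE 4096u

/* Bytes of the 256 MiBytes not covered by whole 14 KiByte regions. */
static unsigned int
leftover_bytes(void)
{
	return (256u*1024u*1024u - num_regions*region_size);
}

/*
 * Nonzero when a byte offset within a region is a whole number of
 * assumed pages from the start of that region. Whether the resulting
 * address is itself page aligned also depends on where the allocator
 * placed the region, which this sketch does not model.
 */
static int
offset_is_page_multiple(size_t offset)
{
	assert(ASSUMED_PAGE_SIZE != 0);
	return (offset % ASSUMED_PAGE_SIZE == 0);
}
```

The integer division truncates, which is why the count is a
whole number of regions with some bytes of the 256 MiBytes
left unused.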

===
Mark Millard
markmi at dsl-only.net


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-21 Thread Mark Millard

On 2017-Mar-18, at 9:10 PM, Mark Millard <mar...@dsl-only.net> wrote:

> 
> On 2017-Mar-18, at 5:53 PM, Mark Millard <mar...@dsl-only.net> wrote:
> 
>> A new, significant discovery follows. . .
>> 
>> While checking out use of procstat -v I ran
>> into the following common property for the 3
>> programs that I looked at:
>> 
>> A) My small test program that fails for
>>  a dynamically allocated space.
>> 
>> B) sh reporting Failed assertion: "tsd_booted".
>> 
>> C) su reporting Failed assertion: "tsd_booted".
>> 
>> Here are example addresses from the area of
>> incorrectly zeroed memory (A then B then C):
>> 
>> (lldb) print dyn_region
>> (region *volatile) $0 = 0x40616000
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x40618520
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x40618520
> 
> That last above was a copy/paste error. Correction:
> 
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x4061d520
> 
>> The first is from dynamic allocation ending up
>> in the area. The other two are from libc.so.7
>> globals/statics ending up in the general area.
>> 
>> It looks like something is trashing a specific
>> memory area for some reason, rather independently
>> of what the program specifics are.

I probably should have noted that the processes
involved were: child/parent then grandparent
and then great grandparent. The grandparent
was sh and the great grandparent was su.

The ancestors in the process tree are being
damaged, not just the instances of the
program that demonstrates the problem.

>> Other notes:
>> 
>> At least for my small program showing failure:
>> 
>> Being explicit about the combined conditions for failure
>> for my test program. . .
>> 
>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>> are required in order to make the program fail.
>> 
>> Note:
>> 
>> (lldb) print __je_tcache_maxclass
>> (size_t) $0 = 32768
>> 
>> which is larger than SMALL_MAXCLASS. I've not observed
>> failures for sizes above SMALL_MAXCLASS but not exceeding
>> __je_tcache_maxclass.
>> 
>> Thus tcache use by itself does not seem sufficient for
>> my program to get corruption of its dynamically allocated
>> memory: the small allocation size also matters.
>> 
>> 
>> Be warned that I can not eliminate the possibility that
>> the trashing changed what region of memory it trashed
>> for larger allocations or when tcache is disabled.
> 
> The pine64+ 2GB eventually got into a state where:
> 
> /etc/malloc.conf -> tcache:false
> 
> made no difference and the failure kept occurring
> with that symbolic link in place.
> 
> But after a reboot of the pine64+ 2GB
> /etc/malloc.conf -> tcache:false was again effective
> for my test program. (It was still present from
> before the reboot.)
> 
> I checked the .core files and the allocated address
> assigned to dyn_region was the same in the tries
> before and after the reboot. (I had put in an
> additional raise(SIGABRT) so I'd always have
> a core file to look at.)
> 
> Apparently /etc/malloc.conf -> tcache:false was
> being ignored before the reboot for some reason?

I have also discovered that if the child process
in an example like my program does a:

(void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);

after the fork but before the sleep/swap-out/wait
then the problem does not happen. This is without
any read or write access to the memory between the
fork and sleep/swap-out/wait.

By contrast such POSIX_MADV_WILLNEED use in the parent
process does not change the failure behavior.
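
To make the ordering concrete, here is a minimal C sketch of
the child-side sequence: fork, then posix_madvise() with no
other access to the region, then a delay, then a check. It is
not the test program from the bug reports. The function name,
the 0xA5 fill pattern, the 14 KiByte size, and the use of
sleep() as a stand-in for the swap-out window are all
assumptions, and nothing here forces a swap-out, so on its own
it does not reproduce the failure.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: an example size, small enough for a small size class. */
#define example_region_size (14u*1024u)

/*
 * Hypothetical sketch: returns 0 when the child still sees the fill
 * pattern after the delay, nonzero otherwise.
 */
static int
child_willneed_check(unsigned int sleep_seconds)
{
	unsigned char *dyn_region = malloc(example_region_size);
	pid_t pid;
	int status;
	size_t i;

	if (dyn_region == NULL)
		return (-1);
	memset(dyn_region, 0xA5, example_region_size);

	pid = fork();
	if (pid < 0) {
		free(dyn_region);
		return (-1);
	}
	if (pid == 0) {
		/* Child: advise only; no read or write of the region yet. */
		(void) posix_madvise(dyn_region, example_region_size,
		    POSIX_MADV_WILLNEED);
		sleep(sleep_seconds);	/* stand-in for the swap-out window */
		for (i = 0; i < example_region_size; i++)
			if (dyn_region[i] != 0xA5)
				_exit(1);	/* pattern was lost */
		_exit(0);
	}

	/* Parent: wait for the child and report its verdict. */
	if (waitpid(pid, &status, 0) < 0) {
		free(dyn_region);
		return (-1);
	}
	free(dyn_region);
	assert(WIFEXITED(status) || WIFSIGNALED(status));
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Moving the posix_madvise() call into the parent branch gives
the variant that, per the observation above, does not change
the failure behavior.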

===
Mark Millard
markmi at dsl-only.net


Re: Unicode strageness with lldb

2017-03-20 Thread Mark Millard


Pete French petefrench at ingresso.co.uk  wrote on Mon Mar 20 14:55:44 UTC 2017:

> Using the lldb installed with 11-STABLE from an hour or so ago. Though
> I don't know when this started, as I have been using db until now.
> 
> First command I type is fine, subsequent commands, every keypress I
> type looks like this:
> 
> (lldb) \U+7F68\U+7F65\U+7F08
> 
> That's an attempted 'bt' so something is adding 0x7F00 to it somewhere.
> 
> Anyone else seeing this? It's in a standard xterm, options '-sb +lc -en utf8'
> but I have tried with other options and it does the same thing.

There was a time a while back when I was seeing such in a head (12)
context. I'm not sure about the specific values after the +'s but \U
prefixed output was definitely involved. I was not explicitly using
any of the options that you list in those ssh sessions (from a
macOS environment).

I discovered that if I typed ^C it would output a new prompt and
start taking/displaying input normally.

I've not had such an issue in a while.

I never managed to isolate what contributed to it happening.

===
Mark Millard
markmi at dsl-only.net


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-18 Thread Mark Millard


On 2017-Mar-18, at 5:53 PM, Mark Millard <mar...@dsl-only.net> wrote:

> A new, significant discovery follows. . .
> 
> While checking out use of procstat -v I ran
> into the following common property for the 3
> programs that I looked at:
> 
> A) My small test program that fails for
>   a dynamically allocated space.
> 
> B) sh reporting Failed assertion: "tsd_booted".
> 
> C) su reporting Failed assertion: "tsd_booted".
> 
> Here are example addresses from the area of
> incorrectly zeroed memory (A then B then C):
> 
> (lldb) print dyn_region
> (region *volatile) $0 = 0x40616000
> 
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x40618520
> 
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x40618520

That last above was a copy/paste error. Correction:

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x4061d520
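
With the corrected value, the distances between the three
addresses can be checked with a minimal C sketch like the one
below. The constant and function names are hypothetical, and
the 4 KiByte page size is an assumption; the addresses are the
ones reported by lldb above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiByte pages. */
#define ASSUMED_PAGE_SIZE 0x1000u

/* Addresses as reported by lldb for A, B, and C (C as corrected). */
static const uintptr_t dyn_region_addr    = 0x40616000u;	/* A */
static const uintptr_t sh_tsd_booted_addr = 0x40618520u;	/* B */
static const uintptr_t su_tsd_booted_addr = 0x4061d520u;	/* C */

/* Byte distance of addr above the start of A's region. */
static uintptr_t
byte_distance(uintptr_t addr)
{
	assert(addr >= dyn_region_addr);
	return (addr - dyn_region_addr);
}

/* Number of assumed pages between A's page and addr's page. */
static uintptr_t
page_distance(uintptr_t addr)
{
	assert(addr >= dyn_region_addr);
	return (addr / ASSUMED_PAGE_SIZE -
	    dyn_region_addr / ASSUMED_PAGE_SIZE);
}
```

Under that page-size assumption B lies 0x2520 bytes (2 pages)
and C lies 0x7520 bytes (7 pages) above A, so all three fall
within 32 KiBytes of address space.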

> The first is from dynamic allocation ending up
> in the area. The other two are from libc.so.7
> globals/statics ending up in the general area.
> 
> It looks like something is trashing a specific
> memory area for some reason, rather independently
> of what the program specifics are.
> 
> 
> Other notes:
> 
> At least for my small program showing failure:
> 
> Being explicit about the combined conditions for failure
> for my test program. . .
> 
> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
> are required in order to make the program fail.
> 
> Note:
> 
> (lldb) print __je_tcache_maxclass
> (size_t) $0 = 32768
> 
> which is larger than SMALL_MAXCLASS. I've not observed
> failures for sizes above SMALL_MAXCLASS but not exceeding
> __je_tcache_maxclass.
> 
> Thus tcache use by itself does not seem sufficient for
> my program to get corruption of its dynamically allocated
> memory: the small allocation size also matters.
> 
> 
> Be warned that I can not eliminate the possibility that
> the trashing changed what region of memory it trashed
> for larger allocations or when tcache is disabled.

The pine64+ 2GB eventually got into a state where:

/etc/malloc.conf -> tcache:false

made no difference and the failure kept occurring
with that symbolic link in place.

But after a reboot of the pine64+ 2GB
/etc/malloc.conf -> tcache:false was again effective
for my test program. (It was still present from
before the reboot.)

I checked the .core files and the allocated address
assigned to dyn_region was the same in the tries
before and after the reboot. (I had put in an
additional raise(SIGABRT) so I'd always have
a core file to look at.)

Apparently /etc/malloc.conf -> tcache:false was
being ignored before the reboot for some reason?


===
Mark Millard
markmi at dsl-only.net


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-18 Thread Mark Millard

A new, significant discovery follows. . .

While checking out use of procstat -v I ran
into the following common property for the 3
programs that I looked at:

A) My small test program that fails for
   a dynamically allocated space.

B) sh reporting Failed assertion: "tsd_booted".

C) su reporting Failed assertion: "tsd_booted".

Here are example addresses from the area of
incorrectly zeroed memory (A then B then C):

(lldb) print dyn_region
(region *volatile) $0 = 0x40616000

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x40618520

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x40618520

The first is from dynamic allocation ending up
in the area. The other two are from libc.so.7
globals/statics ending up in the general area.

It looks like something is trashing a specific
memory area for some reason, rather independently
of what the program specifics are.


Other notes:

At least for my small program showing failure:

Being explicit about the combined conditions for failure
for my test program. . .

Both tcache enabled and allocations fitting in SMALL_MAXCLASS
are required in order to make the program fail.

Note:

(lldb) print __je_tcache_maxclass
(size_t) $0 = 32768

which is larger than SMALL_MAXCLASS. I've not observed
failures for sizes above SMALL_MAXCLASS but not exceeding
__je_tcache_maxclass.

Thus tcache use by itself does not seem sufficient for
my program to get corruption of its dynamically allocated
memory: the small allocation size also matters.


Be warned that I can not eliminate the possibility that
the trashing changed what region of memory it trashed
for larger allocations or when tcache is disabled.


===
Mark Millard
markmi at dsl-only.net



Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-18 Thread Mark Millard

[Summary: I've now tested on a rpi3 in addition to a
pine64+ 2GB. Both contexts show the problem.]

On 2017-Mar-16, at 2:07 AM, Mark Millard  wrote:

> On 2017-Mar-15, at 11:07 PM, Scott Bennett  wrote:
> 
>> Mark Millard  wrote:
>> 
>>> [Something strange happened to the automatic CC: fill-in for my original
>>> reply. Also I should have mentioned that for my test program if a
>>> variant is made that does not fork the swapping works fine.]
>>> 
>>> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>>> 
>>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
>>>> 
>>>>>  On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>>>>  wrote:
>>>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <ti...@cicely7.cicely.de> wrote:
>>>>>> 
>>>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>>>>> failure from occurring. Even a small access to the memory at
>>>>>>>> that stage prevents the failure. Details follow.]
>>>>>>> 
>>>>>>> Maybe a stupid question, since you might have written it somewhere.
>>>>>>> What medium do you swap to?
>>>>>>> I've seen broken firmware on microSD cards doing silent data
>>>>>>> corruption for some access patterns.
>>>>>> 
>>>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>>> 
>>>>>> Only the kernel is from the microSD card.
>>>>>> 
>>>>>> I have several examples of the USB SSD model and have
>>>>>> never observed such problems in any other context.
>>>>>> 
>>>>>> [remainder of irrelevant material deleted  --SB]
>>>>> 
>>>>>  You gave a very long-winded non-answer to Bernd's question, so I'll
>>>>> repeat it here.  What medium do you swap to?
>>>> 
>>>> My wording of:
>>>> 
>>>> The root filesystem is on a USB SSD on a powered hub.
>>>> 
>>>> was definitely poor. It should have explicitly mentioned the
>>>> swap partition too:
>>>> 
>>>> The root filesystem and swap partition are both on the same
>>>> USB SSD on a powered hub.
>>>> 
>>>> More detail from dmesg -a for usb:
>>>> 
>>>> usbus0: 12Mbps Full Speed USB v1.0
>>>> usbus1: 480Mbps High Speed USB v2.0
>>>> usbus2: 12Mbps Full Speed USB v1.0
>>>> usbus3: 480Mbps High Speed USB v2.0
>>>> ugen0.1:  at usbus0
>>>> uhub0:  on usbus0
>>>> ugen1.1:  at usbus1
>>>> uhub1:  on usbus1
>>>> ugen2.1:  at usbus2
>>>> uhub2:  on usbus2
>>>> ugen3.1:  at usbus3
>>>> uhub3:  on usbus3
>>>> . . .
>>>> uhub0: 1 port with 1 removable, self powered
>>>> uhub2: 1 port with 1 removable, self powered
>>>> uhub1: 1 port with 1 removable, self powered
>>>> uhub3: 1 port with 1 removable, self powered
>>>> ugen3.2:  at usbus3
>>>> uhub4 on uhub3
>>>> uhub4:  on usbus3
>>>> uhub4: MTT enabled
>>>> uhub4: 4 ports with 4 removable, self powered
>>>> ugen3.3:  at usbus3
>>>> umass0 on uhub4
>>>> umass0:  on usbus3
>>>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>>>> umass0:0:0: Attached to scbus0
>>>> . . .
>>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>>> da0:  Fixed Direct Access SPC-4 SCSI device
>>>> da0: Serial Number 
>>>> da0: 40.000MB/s transfers
>>>> 
>>>> (Edited a bit because there is other material interlaced, even
>>>> internal to some lines. Also: I removed the serial number of the
>>>> specific example device.)
>> 
>>Thank you.  That presents a much clearer picture.
>>>> 
>>>>>  I will further note that any kind of USB device cannot automatically
>>>>> be trusted to behave properly.  USB devices are notorious, for example,
>>>>> 
>>>>> [reasons why deleted  --SB]
>>>>> 
>>>>>  You should identify where you page/swap to and then try substituting
>>>>> a different device for that function as a test to eliminate the 
>
