Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-23, at 01:27, Mark Millard wrote: > On 2021-May-23, at 00:44, Mark Millard wrote: > >> On 2021-May-21, at 17:56, Rick Macklem wrote: >> >>> Mark Millard wrote: >>> [stuff snipped] >>>> Well, why is it that ls -R, find, and diff -r all get file >>>> name problems via genet0 but diff -r gets no problems >>>> comparing the content of files that it does match up (the >>>> vast majority)? Any clue how could the problems possibly >>>> be unique to the handling of file names/paths? Does it >>>> suggest anything else to look into for getting some more >>>> potentially useful evidence? >>> Well, all I can do is describe the most common TSO related >>> failure: >>> - When a read RPC reply (including NFS/RPC/TCP/IP headers) >>> is slightly less than 64K bytes (many TSO implementations are >>> limited to 64K or 32 discontiguous segments, think 32 2K >>> mbuf clusters), the driver decides it is ok, but when the MAC >>> header is added it exceeds what the hardware can handle correctly... >>> --> This will happen when reading a regular file that is slightly less >>> than a multiple of 64K in size. >>> or >>> --> This will happen when reading just about any large directory, >>>since the directory reply for a 64K request is converted to Sun XDR >>>format and clipped at the last full directory entry that will fit within >>> 64K. >>> For ports, where most files are small, I think you can tell which is more >>> likely to happen. >>> --> If TSO is disabled, I have no idea how this might matter, but?? >>> >>>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d >>>> do not report changes in Ierrs or Idrop in a before vs. >>>> after failures comparison. (There may be better figures >>>> to look at for all I know.) >>>> >>>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6" >>>> and got no obvious change in behavior. >>> All we know is that the data is getting corrupted somehow. >>> >>> NFS traffic looks very different than typical TCP traffic. 
It is >>> mostly small messages travelling in both directions concurrently, >>> with some large messages thrown in the mix. >>> All I'm saying is that, testing a net interface with something like >>> bulk data transfer in one direction doesn't verify it works for NFS >>> traffic. >>> >>> Also, the large RPC messages are a chain of about 33 mbufs of >>> various lengths, including a mix of partial clusters and regular >>> data mbufs, whereas a bulk send on a socket will typically >>> result in an mbuf chain of a lot of full 2K clusters. >>> --> As such, NFS can be good at tickling subtle bugs it the >>>net driver related to mbuf handling. >>> >>> rick >>> >>>>> W.r.t. reverting r367492...the patch to replace r367492 was just >>>>> committed to "main" by rscheff@ with a two week MFC, so it >>>>> should be in stable/13 soon. Not sure if an errata can be done >>>>> for it for releng13.0? >>>> >>>> That update is reported to be causing "rack" related panics: >>>> >>>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html >>>> >>>> reports (via links): >>>> >>>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ >>>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632 >>>> >>>> Still, I have a non-debug update to main building and will >>>> likely do a debug build as well. llvm is rebuilding, so >>>> the builds will take a notable time. 
>> >> I got the following built and installed on the two >> machines: >> >> # uname -apKU >> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 >> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 >> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 >> arm64 aarch64 1400013 1400013 >> >> # uname -apKU >> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 >> main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 >> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 >> arm64 aarch64 1400013 1400013 >>
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-23, at 00:44, Mark Millard wrote: > On 2021-May-21, at 17:56, Rick Macklem wrote: > >> Mark Millard wrote: >> [stuff snipped] >>> Well, why is it that ls -R, find, and diff -r all get file >>> name problems via genet0 but diff -r gets no problems >>> comparing the content of files that it does match up (the >>> vast majority)? Any clue how could the problems possibly >>> be unique to the handling of file names/paths? Does it >>> suggest anything else to look into for getting some more >>> potentially useful evidence? >> Well, all I can do is describe the most common TSO related >> failure: >> - When a read RPC reply (including NFS/RPC/TCP/IP headers) >> is slightly less than 64K bytes (many TSO implementations are >> limited to 64K or 32 discontiguous segments, think 32 2K >> mbuf clusters), the driver decides it is ok, but when the MAC >> header is added it exceeds what the hardware can handle correctly... >> --> This will happen when reading a regular file that is slightly less >> than a multiple of 64K in size. >> or >> --> This will happen when reading just about any large directory, >> since the directory reply for a 64K request is converted to Sun XDR >> format and clipped at the last full directory entry that will fit within >> 64K. >> For ports, where most files are small, I think you can tell which is more >> likely to happen. >> --> If TSO is disabled, I have no idea how this might matter, but?? >> >>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d >>> do not report changes in Ierrs or Idrop in a before vs. >>> after failures comparison. (There may be better figures >>> to look at for all I know.) >>> >>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6" >>> and got no obvious change in behavior. >> All we know is that the data is getting corrupted somehow. >> >> NFS traffic looks very different than typical TCP traffic. 
It is >> mostly small messages travelling in both directions concurrently, >> with some large messages thrown in the mix. >> All I'm saying is that, testing a net interface with something like >> bulk data transfer in one direction doesn't verify it works for NFS >> traffic. >> >> Also, the large RPC messages are a chain of about 33 mbufs of >> various lengths, including a mix of partial clusters and regular >> data mbufs, whereas a bulk send on a socket will typically >> result in an mbuf chain of a lot of full 2K clusters. >> --> As such, NFS can be good at tickling subtle bugs it the >> net driver related to mbuf handling. >> >> rick >> >>>> W.r.t. reverting r367492...the patch to replace r367492 was just >>>> committed to "main" by rscheff@ with a two week MFC, so it >>>> should be in stable/13 soon. Not sure if an errata can be done >>>> for it for releng13.0? >>> >>> That update is reported to be causing "rack" related panics: >>> >>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html >>> >>> reports (via links): >>> >>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ >>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632 >>> >>> Still, I have a non-debug update to main building and will >>> likely do a debug build as well. llvm is rebuilding, so >>> the builds will take a notable time. 
> > I got the following built and installed on the two > machines: > > # uname -apKU > FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 > main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 > root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 > arm64 aarch64 1400013 1400013 > > # uname -apKU > FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 > main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 > root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 > arm64 aarch64 1400013 1400013 > > Note that both are booted with debug builds of main. > > Using the context with the alternate EtherNet device that has not > had an associated diff -r, find, pr ls -R failure yet > yet got a panic that looks likely to be unrelated: > > # mount -onoatime 192.168.1.187:/usr/ports/ /mnt/ > # diff -r /usr/ports/ /mnt/ | more > nvme0: cpl does not map to outstanding cmd
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-21, at 17:56, Rick Macklem wrote:

> Mark Millard wrote:
> [stuff snipped]
>> Well, why is it that ls -R, find, and diff -r all get file
>> name problems via genet0 but diff -r gets no problems
>> comparing the content of files that it does match up (the
>> vast majority)? Any clue how could the problems possibly
>> be unique to the handling of file names/paths? Does it
>> suggest anything else to look into for getting some more
>> potentially useful evidence?
> Well, all I can do is describe the most common TSO related
> failure:
> - When a read RPC reply (including NFS/RPC/TCP/IP headers)
>   is slightly less than 64K bytes (many TSO implementations are
>   limited to 64K or 32 discontiguous segments, think 32 2K
>   mbuf clusters), the driver decides it is ok, but when the MAC
>   header is added it exceeds what the hardware can handle correctly...
> --> This will happen when reading a regular file that is slightly less
>     than a multiple of 64K in size.
> or
> --> This will happen when reading just about any large directory,
>     since the directory reply for a 64K request is converted to Sun XDR
>     format and clipped at the last full directory entry that will fit within
>     64K.
> For ports, where most files are small, I think you can tell which is more
> likely to happen.
> --> If TSO is disabled, I have no idea how this might matter, but??
>
>> I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>> do not report changes in Ierrs or Idrop in a before vs.
>> after failures comparison. (There may be better figures
>> to look at for all I know.)
>>
>> I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>> and got no obvious change in behavior.
> All we know is that the data is getting corrupted somehow.
>
> NFS traffic looks very different than typical TCP traffic. It is
> mostly small messages travelling in both directions concurrently,
> with some large messages thrown in the mix.
> All I'm saying is that testing a net interface with something like
> bulk data transfer in one direction doesn't verify it works for NFS
> traffic.
>
> Also, the large RPC messages are a chain of about 33 mbufs of
> various lengths, including a mix of partial clusters and regular
> data mbufs, whereas a bulk send on a socket will typically
> result in an mbuf chain of a lot of full 2K clusters.
> --> As such, NFS can be good at tickling subtle bugs in the
>     net driver related to mbuf handling.
>
> rick
>
>>> W.r.t. reverting r367492...the patch to replace r367492 was just
>>> committed to "main" by rscheff@ with a two week MFC, so it
>>> should be in stable/13 soon. Not sure if an errata can be done
>>> for it for releng13.0?
>>
>> That update is reported to be causing "rack" related panics:
>>
>> https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html
>>
>> reports (via links):
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @
>> /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632
>>
>> Still, I have a non-debug update to main building and will
>> likely do a debug build as well. llvm is rebuilding, so
>> the builds will take a notable time.

I got the following built and installed on the two
machines:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246854-03b0505b8fe8-dirty: Sat May 22 16:25:04 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-dbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-DBG-CA72 arm64 aarch64 1400013 1400013

Note that both are booted with debug builds of main.
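
As a rough illustration of the TSO failure mode described in the quoted text: a packet just under 64K passes a size check made before the MAC header is added, and the header then pushes the frame past what the hardware handles. The sketch below only does that arithmetic. The 65535-byte limit and the 14-byte Ethernet header (no VLAN tag) are assumptions chosen for illustration, not values read from the genet driver, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed values, for illustration only (not taken from any driver):
TSO_LIMIT=65535      # assumed largest size the hardware handles
ETHER_HDR_LEN=14     # Ethernet header without a VLAN tag

# Print the on-wire length for a TCP/IP packet of $1 bytes.
frame_len() {
    echo $(( $1 + ETHER_HDR_LEN ))
}

# Succeed (exit 0) when the packet passes a check done before the MAC
# header is added but the resulting frame exceeds the assumed limit.
tso_overrun() {
    [ "$1" -le "$TSO_LIMIT" ] && [ "$(frame_len "$1")" -gt "$TSO_LIMIT" ]
}

for len in 60000 65530; do
    if tso_overrun "$len"; then
        echo "$len: passes the pre-header check, frame is $(frame_len "$len")"
    else
        echo "$len: frame is $(frame_len "$len"), not the failure case"
    fi
done
```

Only sizes within 14 bytes of the assumed limit trip the check, which fits the description of replies "slightly less than 64K" such as large directory reads.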

Using the context with the alternate EtherNet device that has not
had an associated diff -r, find, or ls -R failure yet, I nonetheless
got a panic that looks likely to be unrelated:

# mount -onoatime 192.168.1.187:/usr/ports/ /mnt/
# diff -r /usr/ports/ /mnt/ | more
nvme0: cpl does not map to outstanding cmd
cdw0: sqhd:0020 sqid:0003 cid:007e p:1 sc:00 sct:0 m:0 dnr:0
panic: received completion for unknown cmd
cpuid = 3
time = 1621743752
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x188
panic() at panic+0x44
nvme_qpair_process_completions() at nvme_qpair_process_completions+0x1fc
nvme_timeout() at nvme_timeout+0x3c
softclock_call_cc() at softclock_call_cc+0x124
so
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-21, at 09:00, Rick Macklem wrote:

> Mark Millard wrote:
>> On 2021-May-20, at 22:19, Rick Macklem wrote:
> [stuff snipped]
>>> ps: I do not think that r367492 could cause this, but it would be
>>> nice if you try a kernel with the r367492 patch reverted.
>>> It is currently in all of releng13, stable13 and main, although
>>> the patch to fix this was just reviewed and may hit main soon.
>>
>> Do you want a debug kernel to be used? Do you have a preference
>> for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
>> stick to the base version things are now based on --or do you
>> want me to update to more recent? (That last only applies if
>> main or stable/13 is to be put to use.)
> Well, it sounds like you've isolated it to the genet interface.
> Good sleuthing.
> Unfortunately, NFS is only as good as the network fabric under it.
> However, it's usually hangs or poor performance. Except maybe
> for the readdir issue that Jason Bacon reported and resolved via
> an upgrade, this is a first.
> --> In the old days, I would have expected IP checksums to catch
>     this, but I'm guessing the hardware/net driver are doing them
>     these days?

Well, why is it that ls -R, find, and diff -r all get file
name problems via genet0 but diff -r gets no problems
comparing the content of files that it does match up (the
vast majority)? Any clue how could the problems possibly
be unique to the handling of file names/paths? Does it
suggest anything else to look into for getting some more
potentially useful evidence?

I'll note that netstat -I ue0 -d and netstat -I genet0 -d
do not report changes in Ierrs or Idrop in a before vs.
after failures comparison. (There may be better figures
to look at for all I know.)

I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
and got no obvious change in behavior.

> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?

That update is reported to be causing "rack" related panics:

https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html

reports (via links):

panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @
/syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632

Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.

> Thanks for isolating this, rick
> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.

I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
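
The checksum-offload experiment in this message can be extended to TSO and LRO, which were suggested elsewhere in the thread as the usual things to disable when NFS misbehaves. The sketch below only prints a candidate command rather than running it. The ifconfig(8) capability flags are standard FreeBSD ones, but whether genet0 actually advertises each capability is an assumption to check against the options line of "ifconfig genet0". The helper name is hypothetical. (The archived command lists -rxcsum twice and no -txcsum; the archive does not show whether that is what was actually typed.)

```shell
#!/bin/sh
# Print (do not run) an ifconfig command that turns off TSO, LRO and
# IPv4/IPv6 checksum offload on interface $1.
# Assumption: the interface supports these capabilities; flags for
# capabilities a driver lacks may simply have no effect.
offload_off_cmd() {
    echo "ifconfig $1 -tso -lro -rxcsum -txcsum -rxcsum6 -txcsum6"
}

offload_off_cmd genet0
```

Running the printed command as root on both machines, remounting, and then repeating the diff -r would show whether any of the offloads is involved.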
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context) [RPi4B genet0 involved in problem]
[Looks like the RPi4B genet0 handling is involved.] On 2021-May-20, at 22:56, Mark Millard wrote: > > On 2021-May-20, at 22:19, Rick Macklem wrote: > >> Ok, so it isn't related to "soft". >> I am wondering if it is something specific to what >> "diff -r" does? >> >> Could you try: >> # cd /usr/ports >> # ls -R > /tmp/x >> # cd /mnt >> # ls -R > /tmp/y >> # cd /tmp >> # diff -u -p x y >> --> To see if "ls -R" finds any difference? >> > > # diff -u -p x y > --- x 2021-05-20 22:35:48.021663000 -0700 > +++ y 2021-05-20 22:39:03.691936000 -0700 > @@ -227209,10 +227209,10 @@ > patch-chrome_browser_background_background__mode__mana > patch-chrome_browser_background_background__mode__optimizer.cc > patch-chrome_browser_browser__resources.grd > patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc > +patch-chrome_browser_chrome__browser > patch-chrome_browser_chrome__browser__interface__binders.cc > patch-chrome_browser_chrome__browser__main.cc > patch-chrome_browser_chrome__browser__main__linux.cc > -patch-chrome_browser_chrome__browser__main__posix.cc > patch-chrome_browser_chrome__content__browser__client.cc > patch-chrome_browser_chrome__content__browser__client.h > patch-chrome_browser_crash__upload__list_crash__upload__list.cc > > # find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more > /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc > /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc > /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc > /usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc > /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc > /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc > /usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc > 
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc > > find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more > /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc > /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc > /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc > /mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc > /mnt/www/chromium/files/patch-chrome_browser_chrome__browser > /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc > /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc > /mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc > > So: patch-chrome_browser_chrome__browser appears to be a > truncated: patch-chrome_browser_chrome__browser__main__posix.cc > file name and find also gets the same oddity. > > (Note: This had /usr/ports in a main context and /mnt/ > referring to a release/13.0.0 context.) > >> ps: I do not think that r367492 could cause this, but it would be >>nice if you try a kernel with the r367492 patch reverted. >>It is currently in all of releng13, stable13 and main, although >>the patch to fix this is was just reviewed and may hit main soon. > > Do you want a debug kernel to be used? Do you have a preference > for main vs. stable/13 vs. release/13.0.0 based? Is it okay to > stick to the base version things are now based on --or do you > want me to update to more recent? (That last only applies if > main or stable/13 is to be put to use.) > >> . . . old history deleted . . . I reversed the roles of the faster vs. somewhat slower machine and so far my diff -r attempts for this found no differences. The machines were using different types of EtherNet devices. 

So I've substituted a different EtherNet device onto the slower
machine: the same type of USB3 EtherNet device in use on the
faster machine (instead of using the RPi4B's builtin EtherNet).
So the below testing is with both machines having a:

ugen0.6: at usbus0
ure0 on uhub0
ure0: on usbus0
miibus1: on ure0
rgephy0: PHY 0 on miibus1
rgephy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto

in use. I rebooted with this connected instead of the genet0
interface.

Mounting the slower machine's /usr/ports/ as /mnt/ from the faster machine:
No differences found by diff -r this way (expected result).

Mounting the faster machine's /usr/ports/ as /mnt/ from the slower machine:
No differences found by diff -r this way (expected result).

Doing diff -r's from bo
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
On 2021-May-20, at 22:19, Rick Macklem wrote:

> Ok, so it isn't related to "soft".
> I am wondering if it is something specific to what
> "diff -r" does?
>
> Could you try:
> # cd /usr/ports
> # ls -R > /tmp/x
> # cd /mnt
> # ls -R > /tmp/y
> # cd /tmp
> # diff -u -p x y
> --> To see if "ls -R" finds any difference?
>

# diff -u -p x y
--- x 2021-05-20 22:35:48.021663000 -0700
+++ y 2021-05-20 22:39:03.691936000 -0700
@@ -227209,10 +227209,10 @@ patch-chrome_browser_background_background__mode__mana
 patch-chrome_browser_background_background__mode__optimizer.cc
 patch-chrome_browser_browser__resources.grd
 patch-chrome_browser_browsing__data_chrome__browsing__data__remover__delegate.cc
+patch-chrome_browser_chrome__browser
 patch-chrome_browser_chrome__browser__interface__binders.cc
 patch-chrome_browser_chrome__browser__main.cc
 patch-chrome_browser_chrome__browser__main__linux.cc
-patch-chrome_browser_chrome__browser__main__posix.cc
 patch-chrome_browser_chrome__content__browser__client.cc
 patch-chrome_browser_chrome__content__browser__client.h
 patch-chrome_browser_crash__upload__list_crash__upload__list.cc

# find /usr/ports/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__posix.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/usr/ports/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

# find /mnt/ -name 'patch-chrome_browser_chrome__browser*' -print | more
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__main__posix.cc
/mnt/devel/electron12/files/patch-chrome_browser_chrome__browser__interface__binders.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__main__linux.cc
/mnt/www/chromium/files/patch-chrome_browser_chrome__browser__interface__binders.cc

So: patch-chrome_browser_chrome__browser appears to be a
truncated: patch-chrome_browser_chrome__browser__main__posix.cc
file name and find also gets the same oddity.

(Note: This had /usr/ports in a main context and /mnt/
referring to a release/13.0.0 context.)

> ps: I do not think that r367492 could cause this, but it would be
> nice if you try a kernel with the r367492 patch reverted.
> It is currently in all of releng13, stable13 and main, although
> the patch to fix this was just reviewed and may hit main soon.

Do you want a debug kernel to be used? Do you have a preference
for main vs. stable/13 vs. release/13.0.0 based? Is it okay to
stick to the base version things are now based on --or do you
want me to update to more recent? (That last only applies if
main or stable/13 is to be put to use.)

> . . . old history deleted . . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
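
The comparison in this message was done by eye from a unified diff. A small script can flag the specific pattern seen here, a name on the NFS side that is a proper prefix of a name present only on the local side. This is a minimal sketch under stated assumptions: the function names are hypothetical, it lists with find(1) rather than ls -R so that each line is a full relative path, and it forces the C locale so that sort(1) and comm(1) agree on ordering.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# list_tree DIR: sorted list of paths under DIR, relative to DIR.
list_tree() {
    (cd "$1" && find . -print | sort)
}

# truncated_names A B: A and B are sorted listing files. Print each
# name found only in B that is a proper prefix of a name found only
# in A, i.e. a candidate truncated file name.
truncated_names() {
    only_a=$(mktemp "${TMPDIR:-/tmp}/only_a.XXXXXX")
    only_b=$(mktemp "${TMPDIR:-/tmp}/only_b.XXXXXX")
    comm -23 "$1" "$2" > "$only_a"
    comm -13 "$1" "$2" > "$only_b"
    awk 'NR == FNR { a[$0]; next }
         { for (n in a)
               if (length(n) > length($0) && index(n, $0) == 1) {
                   print; break
               } }' "$only_a" "$only_b"
    rm -f "$only_a" "$only_b"
}
```

Typical use would be "list_tree /usr/ports > /tmp/x", "list_tree /mnt > /tmp/y", then "truncated_names /tmp/x /tmp/y". For the listings shown above this would be expected to report the www/chromium/files/patch-chrome_browser_chrome__browser entry.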
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
[Direct drive connection to machine: no problem.] On 2021-May-20, at 21:40, Mark Millard wrote: > [main test example and main/releng/13 mixed example] > > On 2021-May-20, at 20:36, Mark Millard wrote: > >> [stable/13 test: example ends up being odder. That might >> allow eliminating some potential alternatives.] >> >> On 2021-May-20, at 19:38, Mark Millard wrote: >>> >>> On 2021-May-20, at 18:09, Rick Macklem wrote: >>>> >>>> Oh, one additional thing that I'll dare to top post... >>>> r367492 broke the TCP upcalls that the NFS server uses, such >>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur. >>>> This has not yet been resolved in "main" etc and could explain >>>> why an RPC could time out for a soft mount. >>> >>> See later notes that I added: soft mount is not required >>> to see the problem. >>> >>>> You can revert the patch in r367492 to avoid the problem. >>> >>> If I understand right, you are indicating that this would >>> not apply to the non-soft mount case that I got. >>> >>>> Disabling TSO, LRO are also de-facto standard things to do when >>>> you observe weird NFS behaviour, because they are often broken >>>> in various network device drivers. >>> >>> I'll have to figure out how to experiment with such. Things >>> are at defaults rather generally on the systems. I'm not >>> literate in the subject areas. >>> >>> I'm the only user of the machines and network. It is not >>> outward facing. It is a rather small EtherNet network. >>> >>>> rick >>>> >>>> >>>> From: owner-freebsd-sta...@freebsd.org >>>> on behalf of Rick Macklem >>>> Sent: Thursday, May 20, 2021 8:55 PM >>>> To: FreeBSD-STABLE Mailing List; Mark Millard >>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs >>>> (in a zfs file systems context) >>>> >>>> Mark Millard wrote: >>>>> [I warn that I'm a fairly minimal user of NFS >>>>> mounts, not knowing all that much. 
I'm mostly >>>>> reporting this in case it ends up as evidence >>>>> via eventually matching up with others observing >>>>> possibly related oddities.] >>>>> >>>>> I got the following odd sequence (that I've >>>>> mixed notes into). It involved a diff -r over NFS >>>>> showing differences (files missing) and then a >>>>> later diff finding matches for the same files, >>>>> no file system changes made on either machine. >>>>> I'm unable to reproduce the oddity on demand. >>>>> >>>>> Note: A larger scope diff -r originally returned the >>>>> below as well, but doing the narrower diff -r did >>>>> repeat the result and that is what I show. (I >>>>> make no use of devel/ice .) >>>>> >>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD >>> . . . >>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py >>>>> >>>>> Note: The above was not expected. So I tried: >>>>> >>>>> # ls -Tld /mnt/devel/ice/files/* >>>>> -rw-r--r-- 1 root wheel 755 Apr 21 21:07:54 2021 >>>>> /mnt/devel/ice/files/Make.rules.FreeBSD >>> . . . >>>>> -rw-r--r-- 1 root wheel 2588 Apr 21 21:07:54 2021 >>>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py >>>>> >>>>> Note: So that indicated that the files were there on the >>>>> machine that /mnt references. So attempting the original >>>>> diff -r again: >>>>> >>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >>>>> # >>>>> >>>>> (Empty difference.) >>>>> >>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*" >>>>> the odd result of the diff -r no longer happened: no >>>>> differences reported. >>>>> >>>>> >>>>> >>>>> For reference (both machines reported): >>>>> >>>>> . . . >>>>> The original mount command w
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
[main test example and main/releng/13 mixed example] On 2021-May-20, at 20:36, Mark Millard wrote: > [stable/13 test: example ends up being odder. That might > allow eliminating some potential alternatives.] > > On 2021-May-20, at 19:38, Mark Millard wrote: >> >> On 2021-May-20, at 18:09, Rick Macklem wrote: >>> >>> Oh, one additional thing that I'll dare to top post... >>> r367492 broke the TCP upcalls that the NFS server uses, such >>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur. >>> This has not yet been resolved in "main" etc and could explain >>> why an RPC could time out for a soft mount. >> >> See later notes that I added: soft mount is not required >> to see the problem. >> >>> You can revert the patch in r367492 to avoid the problem. >> >> If I understand right, you are indicating that this would >> not apply to the non-soft mount case that I got. >> >>> Disabling TSO, LRO are also de-facto standard things to do when >>> you observe weird NFS behaviour, because they are often broken >>> in various network device drivers. >> >> I'll have to figure out how to experiment with such. Things >> are at defaults rather generally on the systems. I'm not >> literate in the subject areas. >> >> I'm the only user of the machines and network. It is not >> outward facing. It is a rather small EtherNet network. >> >>> rick >>> >>> >>> From: owner-freebsd-sta...@freebsd.org >>> on behalf of Rick Macklem >>> Sent: Thursday, May 20, 2021 8:55 PM >>> To: FreeBSD-STABLE Mailing List; Mark Millard >>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs >>> (in a zfs file systems context) >>> >>> Mark Millard wrote: >>>> [I warn that I'm a fairly minimal user of NFS >>>> mounts, not knowing all that much. I'm mostly >>>> reporting this in case it ends up as evidence >>>> via eventually matching up with others observing >>>> possibly related oddities.] >>>> >>>> I got the following odd sequence (that I've >>>> mixed notes into). 
It involved a diff -r over NFS >>>> showing differences (files missing) and then a >>>> later diff finding matches for the same files, >>>> no file system changes made on either machine. >>>> I'm unable to reproduce the oddity on demand. >>>> >>>> Note: A larger scope diff -r originally returned the >>>> below as well, but doing the narrower diff -r did >>>> repeat the result and that is what I show. (I >>>> make no use of devel/ice .) >>>> >>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD >> . . . >>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py >>>> >>>> Note: The above was not expected. So I tried: >>>> >>>> # ls -Tld /mnt/devel/ice/files/* >>>> -rw-r--r-- 1 root wheel 755 Apr 21 21:07:54 2021 >>>> /mnt/devel/ice/files/Make.rules.FreeBSD >> . . . >>>> -rw-r--r-- 1 root wheel 2588 Apr 21 21:07:54 2021 >>>> /mnt/devel/ice/files/patch-scripts-TestUtil.py >>>> >>>> Note: So that indicated that the files were there on the >>>> machine that /mnt references. So attempting the original >>>> diff -r again: >>>> >>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >>>> # >>>> >>>> (Empty difference.) >>>> >>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*" >>>> the odd result of the diff -r no longer happened: no >>>> differences reported. >>>> >>>> >>>> >>>> For reference (both machines reported): >>>> >>>> . . . >>>> The original mount command was on CA72_16Gp_ZFS: >>>> >>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/ >>> The likely explanation for this is your use of a "soft" mount. >>> - If the NFS server is slow to respond or there is a temporary network >>> issue, >>> the RPC request can time out and then the >>> syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the >>> readdir(3) libc functions ex
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
[stable/13 test: example ends up being odder. That might allow eliminating some potential alternatives.] On 2021-May-20, at 19:38, Mark Millard wrote: > > On 2021-May-20, at 18:09, Rick Macklem wrote: >> >> Oh, one additional thing that I'll dare to top post... >> r367492 broke the TCP upcalls that the NFS server uses, such >> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur. >> This has not yet been resolved in "main" etc and could explain >> why an RPC could time out for a soft mount. > > See later notes that I added: soft mount is not required > to see the problem. > >> You can revert the patch in r367492 to avoid the problem. > > If I understand right, you are indicating that this would > not apply to the non-soft mount case that I got. > >> Disabling TSO, LRO are also de-facto standard things to do when >> you observe weird NFS behaviour, because they are often broken >> in various network device drivers. > > I'll have to figure out how to experiment with such. Things > are at defaults rather generally on the systems. I'm not > literate in the subject areas. > > I'm the only user of the machines and network. It is not > outward facing. It is a rather small EtherNet network. > >> rick >> >> >> From: owner-freebsd-sta...@freebsd.org on >> behalf of Rick Macklem >> Sent: Thursday, May 20, 2021 8:55 PM >> To: FreeBSD-STABLE Mailing List; Mark Millard >> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs >> (in a zfs file systems context) >> >> Mark Millard wrote: >>> [I warn that I'm a fairly minimal user of NFS >>> mounts, not knowing all that much. I'm mostly >>> reporting this in case it ends up as evidence >>> via eventually matching up with others observing >>> possibly related oddities.] >>> >>> I got the following odd sequence (that I've >>> mixed notes into). 
It involved a diff -r over NFS >>> showing differences (files missing) and then a >>> later diff finding matches for the same files, >>> no file system changes made on either machine. >>> I'm unable to reproduce the oddity on demand. >>> >>> Note: A larger scope diff -r originally returned the >>> below as well, but doing the narrower diff -r did >>> repeat the result and that is what I show. (I >>> make no use of devel/ice .) >>> >>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD > . . . >>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py >>> >>> Note: The above was not expected. So I tried: >>> >>> # ls -Tld /mnt/devel/ice/files/* >>> -rw-r--r-- 1 root wheel 755 Apr 21 21:07:54 2021 >>> /mnt/devel/ice/files/Make.rules.FreeBSD > . . . >>> -rw-r--r-- 1 root wheel 2588 Apr 21 21:07:54 2021 >>> /mnt/devel/ice/files/patch-scripts-TestUtil.py >>> >>> Note: So that indicated that the files were there on the >>> machine that /mnt references. So attempting the original >>> diff -r again: >>> >>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >>> # >>> >>> (Empty difference.) >>> >>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*" >>> the odd result of the diff -r no longer happened: no >>> differences reported. >>> >>> >>> >>> For reference (both machines reported): >>> >>> . . . >>> The original mount command was on CA72_16Gp_ZFS: >>> >>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/ >> The likely explanation for this is your use of a "soft" mount. >> - If the NFS server is slow to respond or there is a temporary network issue, >> the RPC request can time out and then the >> syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the >> readdir(3) libc functions expect syscalls to fail this way... >> Then the cached directory is messed up. >> Doing the "ls" read the directory again and fixed the problem. 
>> >> Try to reproduce it for a mount without the "soft" option. >> (If a mount point is hung, due to an unresponsive server "umount -N /mnt" >> can usually get rid of it.) >> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in >> 1985 >> and I still feel that
Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
> On 2021-May-20, at 18:09, Rick Macklem wrote: > > Oh, one additional thing that I'll dare to top post... > r367492 broke the TCP upcalls that the NFS server uses, such > that intermittent hangs of NFS mounts to FreeBSD13 servers can occur. > This has not yet been resolved in "main" etc and could explain > why an RPC could time out for a soft mount. See later notes that I added: soft mount is not required to see the problem. > You can revert the patch in r367492 to avoid the problem. If I understand right, you are indicating that this would not apply to the non-soft mount case that I got. > Disabling TSO, LRO are also de-facto standard things to do when > you observe weird NFS behaviour, because they are often broken > in various network device drivers. I'll have to figure out how to experiment with such. Things are at defaults rather generally on the systems. I'm not literate in the subject areas. I'm the only user of the machines and network. It is not outward facing. It is a rather small EtherNet network. > rick > > > From: owner-freebsd-sta...@freebsd.org on > behalf of Rick Macklem > Sent: Thursday, May 20, 2021 8:55 PM > To: FreeBSD-STABLE Mailing List; Mark Millard > Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs > (in a zfs file systems context) > > Mark Millard wrote: >> [I warn that I'm a fairly minimal user of NFS >> mounts, not knowing all that much. I'm mostly >> reporting this in case it ends up as evidence >> via eventually matching up with others observing >> possibly related oddities.] >> >> I got the following odd sequence (that I've >> mixed notes into). It involved a diff -r over NFS >> showing differences (files missing) and then a >> later diff finding matches for the same files, >> no file system changes made on either machine. >> I'm unable to reproduce the oddity on demand. 
>> >> Note: A larger scope diff -r originally returned the >> below as well, but doing the narrower diff -r did >> repeat the result and that is what I show. (I >> make no use of devel/ice .) >> >> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD . . . >> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py >> >> Note: The above was not expected. So I tried: >> >> # ls -Tld /mnt/devel/ice/files/* >> -rw-r--r-- 1 root wheel 755 Apr 21 21:07:54 2021 >> /mnt/devel/ice/files/Make.rules.FreeBSD . . . >> -rw-r--r-- 1 root wheel 2588 Apr 21 21:07:54 2021 >> /mnt/devel/ice/files/patch-scripts-TestUtil.py >> >> Note: So that indicated that the files were there on the >> machine that /mnt references. So attempting the original >> diff -r again: >> >> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more >> # >> >> (Empty difference.) >> >> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*" >> the odd result of the diff -r no longer happened: no >> differences reported. >> >> >> >> For reference (both machines reported): >> >> . . . >> The original mount command was on CA72_16Gp_ZFS: >> >> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/ > The likely explanation for this is your use of a "soft" mount. > - If the NFS server is slow to respond or there is a temporary network issue, > the RPC request can time out and then the > syscall can fail with EINT/ETIMEDOUT. Since almost nothing, including the >readdir(3) libc functions expect syscalls to fail this way... >Then the cached directory is messed up. >Doing the "ls" read the directory again and fixed the problem. > > Try to reproduce it for a mount without the "soft" option. > (If a mount point is hung, due to an unresponsive server "umount -N /mnt" > can usually get rid of it.) > Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in > 1985 > and I still feel that way. 
> --> If you can reproduce it without "soft" then I can't explain it. > To be honest, the directory reading/caching code in the NFSv3 client > hasn't changed significantly in literally decades, as far as I can > remember. Well . . . trying an even wider scope diff than the original . . . # umount /mnt/ # mount -onoatime 192.168.1.170:/usr/ports/ /mnt/ # diff -r /usr/ports/ /mnt/ | more Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_ Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_th
releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
fbsd-based-on-what-commit.sh
branch: releng/13.0
merge-base: ea31abc261ffc01b6ff5671bffb15cf910a07f4b
merge-base: CommitDate: 2021-04-09 00:14:30 +
ea31abc261ff (HEAD -> releng/13.0, tag: release/13.0.0, freebsd/releng/13.0) 13.0: update to RELEASE
n244733 (--first-parent --count for merge-base)

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300139 1300139

# ~/fbsd-based-on-what-commit.sh
branch: releng/13.0
merge-base: ea31abc261ffc01b6ff5671bffb15cf910a07f4b
merge-base: CommitDate: 2021-04-09 00:14:30 +
ea31abc261ff (HEAD -> releng/13.0, tag: release/13.0.0, freebsd/releng/13.0) 13.0: update to RELEASE
n244733 (--first-parent --count for merge-base)

From zfs list commands (one machine per line shown):

zopt0/usr/ports  2.13G  236G  2.13G  /usr/ports
zroot/usr/ports  2.13G  113G  2.13G  /usr/ports

I've no clue if ZFS is important to the oddity or not.

The original mount command was on CA72_16Gp_ZFS:

# mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/

The network is just a local Ethernet.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
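Since the report describes names going missing from a directory listing and then reappearing, here is a minimal sketch (Python, not from the thread) of one way to look for that: walk the same tree several times and report any name that does not show up in every pass. The function names and the default of 3 passes are made up for illustration, and whether this would actually catch the NFS behavior reported here is not known.

```python
import os


def list_names(root, errors):
    """Return the set of relative paths (files and directories) under root."""
    names = set()
    # os.walk() silently skips directories it cannot list unless onerror is
    # given, so listing failures are collected here instead of being dropped.
    for dirpath, dirnames, filenames in os.walk(root, onerror=errors.append):
        rel = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            names.add(os.path.normpath(os.path.join(rel, name)))
    return names


def unstable_names(root, passes=3):
    """Return (names seen in some passes but not all, listing errors)."""
    errors = []
    seen = [list_names(root, errors) for _ in range(passes)]
    return set.union(*seen) - set.intersection(*seen), errors
```

For example, unstable_names("/mnt/devel/ice/files") would return an empty set and an empty error list if every pass over the mounted tree saw the same names.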
Fresh releng/13.0 release/13.0.0 install: "newsyslog: malformed 'at' value" messages
Having used bsdinstall to make a USB3 SSD on a RPi4B
(zfs-on-root, GPT partition, RPi4B materials copied
to msdos file system), booting gets error notices:

newsyslog: malformed 'at' value: /var/log/all.log     600  7  *     @T00    J
newsyslog: malformed 'at' value: /var/log/auth.log    600  7  1000  @0101T  JC
newsyslog: malformed 'at' value: /var/log/daily.log   640  7  *     @T00    JN
newsyslog: malformed 'at' value: /var/log/maillog     640  7  *     @T00    JC
newsyslog: malformed 'at' value: /var/log/messages    644  5  1000  @0101T  JC
newsyslog: malformed 'at' value: /var/log/utx.log     644  3  *     @01T05  B
newsyslog: malformed 'at' value: /var/log/daemon.log  644  5  1000  @0101T  JC

It is apparently complaining about some of the content in:

# more /etc/newsyslog.conf
# configuration file for newsyslog
# $FreeBSD$
#
# Entries which do not specify the '/pid_file' field will cause the
# syslogd process to be signalled when that log file is rotated.  This
# action is only appropriate for log files which are written to by the
# syslogd process (ie, files listed in /etc/syslog.conf).  If there
# is no process which needs to be signalled when a given log file is
# rotated, then the entry for that file should include the 'N' flag.
#
# Note: some sites will want to select more restrictive protections than the
# defaults.  In particular, it may be desirable to switch many of the 644
# entries to 640 or 600.  For example, some sites will consider the
# contents of maillog, messages, and lpd-errs to be confidential.  In the
# future, these defaults may change to more conservative ones.
#
# logfilename          [owner:group]  mode  count  size  when    flags  [/pid_file]  [sig_num]
/var/log/all.log                      600   7      *     @T00    J
/var/log/auth.log                     600   7      1000  @0101T  JC
/var/log/console.log                  600   5      1000  *       J
/var/log/cron                         600   3      1000  *       JC
/var/log/daily.log                    640   7      *     @T00    JN
/var/log/debug.log                    600   7      1000  *       JC
/var/log/init.log                     644   3      1000  *       J
/var/log/kerberos.log                 600   7      1000  *       J
/var/log/maillog                      640   7      *     @T00    JC
/var/log/messages                     644   5      1000  @0101T  JC
/var/log/monthly.log                  640   12     *     $M1D0   JN
/var/log/devd.log                     644   3      1000  *       JC
/var/log/security                     600   10     1000  *       JC
/var/log/utx.log                      644   3      *     @01T05  B
/var/log/weekly.log                   640   5      *     $W6D0   JN
/var/log/daemon.log                   644   5      1000  @0101T  JC
/etc/newsyslog.conf.d/[!.]*.conf
/usr/local/etc/newsyslog.conf.d/[!.]*.conf

Specifically, the 7 lines with "@" involved under "when"
get the complaints.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
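To make the observation about the "@" entries easy to check against a configuration file, here is a minimal sketch (Python, not part of the original message) that picks out the entries whose "when" field contains "@". It assumes whitespace-separated fields, as in the file quoted above; the function name and the shortened sample text are made up for illustration. It only locates such entries and does not attempt to validate them the way newsyslog does.

```python
# A few entries taken from the file quoted above, with fields separated by
# whitespace. This is a shortened sample, not the whole file.
SAMPLE = """\
# logfilename [owner:group] mode count size when flags [/pid_file] [sig_num]
/var/log/all.log      600  7   *     @T00    J
/var/log/console.log  600  5   1000  *       J
/var/log/monthly.log  640  12  *     $M1D0   JN
/var/log/utx.log      644  3   *     @01T05  B
"""


def at_style_entries(conf_text):
    """Return (logfile, when) pairs for entries that use an '@' time."""
    found = []
    for line in conf_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        # Look past the log file name for the first field containing '@'.
        for field in fields[1:]:
            if "@" in field:
                found.append((fields[0], field[field.index("@"):]))
                break
    return found


print(at_style_entries(SAMPLE))
# prints [('/var/log/all.log', '@T00'), ('/var/log/utx.log', '@01T05')]
```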
Re: zpool list -p 's FREE vs. zfs list -p's AVAIL ? FREE-AVAIL == 6_675_374_080 (199G zroot pool)
On 2021-May-5, at 17:01, Yuri Pankov wrote:

> Mark Millard via freebsd-current wrote:
>> Context:
>>
>> # gpart show -pl da0
>> =>        40  468862048    da0  GPT  (224G)
>>           40     532480  da0p1  efiboot0  (260M)
>>       532520       2008         - free -  (1.0M)
>>       534528   25165824  da0p2  swp12a  (12G)
>>     25700352   25165824  da0p4  swp12b  (12G)
>>     50866176  417994752  da0p3  zfs0  (199G)
>>    468860928       1160         - free -  (580K)
>>
>> There is just one pool: zroot and it is on zfs0 above.
>>
>> # zpool list -p
>> NAME           SIZE        ALLOC          FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>> zroot  213674622976  71075655680  142598967296        -         -    28   33   1.00  ONLINE  -
>>
>> So FREE: 142_598_967_296
>> (using _ to make it more readable)
>>
>> # zfs list -p zroot
>> NAME          USED         AVAIL  REFER  MOUNTPOINT
>> zroot  71073697792  135923593216  98304  /zroot
>>
>> So AVAIL: 135_923_593_216
>>
>> FREE-AVAIL == 6_675_374_080
>>
>>
>>
>> The questions:
>>
>> Is this sort of unavailable pool-free-space normal?
>> Is this some sort of expected overhead that just is
>> not explicitly reported? Possibly a "FRAG"
>> consequence?
>
> From zpoolprops(8):
>
> free    The amount of free space available in the pool.  By contrast,
>         the zfs(8) available property describes how much new data can be
>         written to ZFS filesystems/volumes.  The zpool free property is
>         not generally useful for this purpose, and can be substantially
>         more than the zfs available space.  This discrepancy is due to
>         several factors, including raidz parity; zfs reservation, quota,
>         refreservation, and refquota properties; and space set aside by
>         spa_slop_shift (see zfs-module-parameters(5) for more
>         information).

Thanks for pointing to the reference material.

6_675_374_080/213_674_622_976 =approx= 0.03124 =approx= 1.0/32.0

and spa_slop_shift's description reports:

QUOTE
spa_slop_shift (int)
        Normally, we don't allow the last 3.2% (1/(2^spa_slop_shift))
        of space in the pool to be consumed. This ensures that we
        don't run the pool completely out of space, due to
        unaccounted changes (e.g. to the MOS). It also limits the
        worst-case time to allocate space. If we have less than this
        amount of free space, most ZPL operations (e.g. write,
        create) will return ENOSPC.

        Default value: 5.
END QUOTE

So in my simple context, apparently not much else
contributes and the figures are basically as expected.

Thanks again.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
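The arithmetic in the message above can be redone in a few lines. This is a minimal sketch (Python, not from the thread) that uses only the byte counts quoted from "zpool list -p" and "zfs list -p" plus the documented default spa_slop_shift of 5; the variable names are made up. It is a check on these particular figures, not a statement of how ZFS computes AVAIL in general.

```python
# Byte counts quoted in the thread.
size = 213_674_622_976    # zpool list -p: SIZE
free = 142_598_967_296    # zpool list -p: FREE
used = 71_073_697_792     # zfs list -p zroot: USED
avail = 135_923_593_216   # zfs list -p zroot: AVAIL

spa_slop_shift = 5        # documented default: reserve 1/(2**5) = 1/32

gap = free - avail              # the FREE-AVAIL difference asked about
slop = size >> spa_slop_shift   # 1/32 of the pool size

print(gap)    # prints 6675374080
print(slop)   # prints 6677331968

# For these figures, SIZE minus the 1/32 slop minus USED matches AVAIL.
print(size - slop - used == avail)   # prints True
```

The gap is slightly under 1/32 of SIZE (about 0.03124 rather than 0.03125), which is consistent with the "not much else contributes" reading above.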
zpool list -p 's FREE vs. zfs list -p's AVAIL ? FREE-AVAIL == 6_675_374_080 (199G zroot pool)
Context:

# gpart show -pl da0
=>        40  468862048    da0  GPT  (224G)
          40     532480  da0p1  efiboot0  (260M)
      532520       2008         - free -  (1.0M)
      534528   25165824  da0p2  swp12a  (12G)
    25700352   25165824  da0p4  swp12b  (12G)
    50866176  417994752  da0p3  zfs0  (199G)
   468860928       1160         - free -  (580K)

There is just one pool: zroot and it is on zfs0 above.

# zpool list -p
NAME           SIZE        ALLOC          FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zroot  213674622976  71075655680  142598967296        -         -    28   33   1.00  ONLINE  -

So FREE: 142_598_967_296
(using _ to make it more readable)

# zfs list -p zroot
NAME          USED         AVAIL  REFER  MOUNTPOINT
zroot  71073697792  135923593216  98304  /zroot

So AVAIL: 135_923_593_216

FREE-AVAIL == 6_675_374_080

The questions:

Is this sort of unavailable pool-free-space normal?
Is this some sort of expected overhead that just is
not explicitly reported? Possibly a "FRAG"
consequence?

For reference:

# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:31:48 with 0 errors on Sun May 2 19:52:14 2021
config:

        NAME     STATE   READ  WRITE  CKSUM
        zroot    ONLINE     0      0      0
          da0p3  ONLINE     0      0      0

errors: No known data errors

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: ZFS rename with associated snapshot present: odd error message
On 2021-May-5, at 05:28, Mark Millard wrote: > On 2021-May-5, at 02:47, Andriy Gapon wrote: > >> On 05/05/2021 01:59, Mark Millard via freebsd-current wrote: >>> I had a: >>> # zfs list -tall >>> NAME USED AVAIL REFER MOUNTPOINT >>> . . . >>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm 1.44G 117G 96K >>> /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm >>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style 1.44G - 1.44G >>> -. . . >>> . . . >>> (copied/pasted from somewhat earlier) and then attempted: >>> # zfs rename zroot/DESTDIRs/13_0R-CA72-instwrld-norm >>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0 >>> cannot open 'zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style': snapshot >>> delimiter '@' is not expected here >>> Despite the "cannot open" message, the result looks like: >>> # zfs list -tall >>> NAME USED AVAIL >>> REFER MOUNTPOINT >>> . . . >>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0 1.44G 114G >>> 96K /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt-0 >>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0@dirty-style 1.44G - >>> 1.44G - >>> . . . >>> Still, it leaves me wondering if everything is okay >>> given that internal attempt to use the old name with >>> @dirty-style when it was apparently no longer >>> available under that naming. >>> For reference: >>> # uname -apKU >>> FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 >>> releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 >>> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 >>> arm64 aarch64 1300139 1300139 >> >> Cannot reproduce here (but with much simpler names and on stable/13): >> zfs create testz/test >> zfs snapshot testz/test@snap1 >> zfs rename testz/test testz/test2 >> >> All worked. >> > > I've noticed that sometimes in my explorations it has been > silent instead of complaining. I've no clue at this point > what prior activity (or lack of activity) makes the > difference for if a message will be generated vs. not. 
One difference in context is that your above sort of
sequence generates the after-snapshot context (using
some things I have around now):

zroot/DESTDIRs/13_0R-CA53-poud       1.45G  127G  1.45G  /usr/obj/DESTDIRs/13_0R-CA53-poud
zroot/DESTDIRs/13_0R-CA53-poud@test     0B     -  1.45G  -

where my example had something more like (hand edited
the above just for illustration):

zroot/DESTDIRs/13_0R-CA53-poud       1.45G  125G    96K  /usr/obj/DESTDIRs/13_0R-CA53-poud
zroot/DESTDIRs/13_0R-CA53-poud@test  1.45G     -  1.45G  -

before the rename. In other words, I'd updated the
original (almost?) completely after the snapshot
(as a side effect of my overall activity). It was
only later that I tried the rename to track a new
purpose/context that I was going to switch to.

I'm not claiming that such is sufficient to (always?
ever?) reproduce the message. I'm just pointing out
that I'd had some significant activity on the writable
file system before the rename.

Some of my activity has been more like your test and
I'd not seen the problems from such. But it is not a
very good comparison/contrast context so I'd not infer
much.

I still can not at-will set up a context to produce
the messages.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [aarch64 test did not reproduce the issue]
On 2021-May-4, at 20:26, Mark Millard wrote: > On 2021-May-4, at 13:38, Mark Millard wrote: > >> [The first buidlworld is still in process. So while waiting . . .] >> >> On 2021-May-4, at 10:31, Mark Millard wrote: >> >>> I probably know why the huge count of differences this time >>> unlike the original report . . . >>> >>> Previously I built based on a checked-in branch as part of >>> my experimenting. This time it was in a -dirty form (not >>> checked in), again as part of my experimental exploration. >>> >>> WITH_REPRODUCIBLE_BUILD= makes a distinction between these >>> if I remember right: (partially?) disabling itself for >>> -dirty style. >>> >>> To reproduce the original style of test I need to create >>> a branch with my few patches checked in and do the >>> buildworlds from that branch. >>> >>> This will, of course, take a while. >>> >>> Sorry for the noise. >>> >> >> I've confirmed some of the details of the large number of >> files with difference while waiting for the 1st buildworld : >> >> The 4 bytes at the end of the .gnu_debuglink section >> that are ending up different are the checksum for the >> .debug file. The .debug files have differences such as: >> >> │ -<1a> DW_AT_comp_dir: (indirect string) >> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64 >> │ +<1a> DW_AT_comp_dir: (indirect string) >> /usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64 >> >> So I need to build, snapshot (in case need >> to reference), install, clean-out, build, >> install elsewhere, compare. (Or analogous >> that uses the same build base-path for both >> installs despite separate buildworld's.) >> This is separate from any potential -dirty >> vs. checked-in handling variation by >> WITH_REPRODUCIBLE_BUILD= . >> >> My process that produced the original armv7 >> report happened to do that before I accidentally >> discovered the presence of the few files with >> differences. 
My new experiments were different >> and I'd not though of needing to vary the >> procedure to get you the right evidence. >> > > The two aarch64 test installs did not show any > differences in a "diff -rq" . Ignoring *.meta > files generated during the builds, the build > directory tree snapshots showed just the > differences: > > # diff -rq > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr > | grep -v '\.meta' | more > Files > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c > and > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c > differ > Files > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk > and > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk > differ > > # diff -u > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c > > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c > --- > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c > 2021-05-04 13:45:14.463351000 -0700 > +++ > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c > 2021-05-04 19:04:32.338203000 -0700 > @@ -4,7 +4,7 @@ > ** Words from CORE set written in FICL > ** Author: John Sadler (john_sad...@alum.mit.edu) > ** Created: 27 December 1997 > -** Last update: Tue May 4 13:45:14 PDT 2021 > +** Last update: Tue May 4 19:04:32 PDT 2021 > ***/ > /* > ** DO NOT EDIT THIS FILE -- it is generated by 
softwords/softcore.awk > > # diff -u > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk > > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk > --- > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/sn
Re: ZFS rename with associated snapshot present: odd error message
On 2021-May-5, at 02:47, Andriy Gapon wrote:

> On 05/05/2021 01:59, Mark Millard via freebsd-current wrote:
>> I had a:
>> # zfs list -tall
>> NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
>> . . .
>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm              1.44G   117G    96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
>> zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style  1.44G      -  1.44G  -
>> . . .
>> . . .
>> (copied/pasted from somewhat earlier) and then attempted:
>> # zfs rename zroot/DESTDIRs/13_0R-CA72-instwrld-norm zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0
>> cannot open 'zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style': snapshot delimiter '@' is not expected here
>> Despite the "cannot open" message, the result looks like:
>> # zfs list -tall
>> NAME                                                   USED  AVAIL  REFER  MOUNTPOINT
>> . . .
>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0              1.44G   114G    96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt-0
>> zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0@dirty-style  1.44G      -  1.44G  -
>> . . .
>> Still, it leaves me wondering if everything is okay
>> given that internal attempt to use the old name with
>> @dirty-style when it was apparently no longer
>> available under that naming.
>> For reference:
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300139 1300139
>
> Cannot reproduce here (but with much simpler names and on stable/13):
> zfs create testz/test
> zfs snapshot testz/test@snap1
> zfs rename testz/test testz/test2
>
> All worked.
>

I've noticed that sometimes in my explorations it has been
silent instead of complaining. I've no clue at this point
what prior activity (or lack of activity) makes the
difference for if a message will be generated vs. not.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [aarch64 test did not reproduce the issue]
On 2021-May-4, at 13:38, Mark Millard wrote: > [The first buidlworld is still in process. So while waiting . . .] > > On 2021-May-4, at 10:31, Mark Millard wrote: > >> I probably know why the huge count of differences this time >> unlike the original report . . . >> >> Previously I built based on a checked-in branch as part of >> my experimenting. This time it was in a -dirty form (not >> checked in), again as part of my experimental exploration. >> >> WITH_REPRODUCIBLE_BUILD= makes a distinction between these >> if I remember right: (partially?) disabling itself for >> -dirty style. >> >> To reproduce the original style of test I need to create >> a branch with my few patches checked in and do the >> buildworlds from that branch. >> >> This will, of course, take a while. >> >> Sorry for the noise. >> > > I've confirmed some of the details of the large number of > files with difference while waiting for the 1st buildworld : > > The 4 bytes at the end of the .gnu_debuglink section > that are ending up different are the checksum for the > .debug file. The .debug files have differences such as: > > │ -<1a> DW_AT_comp_dir: (indirect string) > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64 > │ +<1a> DW_AT_comp_dir: (indirect string) > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64 > > So I need to build, snapshot (in case need > to reference), install, clean-out, build, > install elsewhere, compare. (Or analogous > that uses the same build base-path for both > installs despite separate buildworld's.) > This is separate from any potential -dirty > vs. checked-in handling variation by > WITH_REPRODUCIBLE_BUILD= . > > My process that produced the original armv7 > report happened to do that before I accidentally > discovered the presence of the few files with > differences. My new experiments were different > and I'd not though of needing to vary the > procedure to get you the right evidence. 
> The two aarch64 test installs did not show any differences in a "diff -rq" . Ignoring *.meta files generated during the builds, the build directory tree snapshots showed just the differences: # diff -rq /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr | grep -v '\.meta' | more Files /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c and /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c differ Files /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk and /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk differ # diff -u /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c --- /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c 2021-05-04 13:45:14.463351000 -0700 +++ /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/stand/ficl/softcore.c 2021-05-04 19:04:32.338203000 -0700 @@ -4,7 +4,7 @@ ** Words from CORE set written in FICL ** Author: John Sadler (john_sad...@alum.mit.edu) ** Created: 27 December 1997 -** Last update: Tue May 4 13:45:14 PDT 2021 +** Last update: Tue May 4 19:04:32 PDT 2021 ***/ /* ** DO NOT EDIT THIS FILE -- it is generated by softwords/softcore.awk # diff -u /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk 
/usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk --- /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-0/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk 2021-05-04 10:55:26.030179000 -0700 +++ /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/.zfs/snapshot/commited-style-1/usr/13_0R-src/arm64.aarch64/toolchain-metadata.mk 2021-05-04 16:14:24.513346000 -0700 @@ -1,4 +1,4 @@ -.info Using cached toolchain metadata from build at CA72_4c8G_ZFS on Tue May 4 10:55:26 PDT 2021 +.info Using cached toolchain metadata from build at CA72_4c8G_ZFS on Tue May 4 16:14:24 PDT 2021 _LOADED_TOOLCHAI
ZFS rename with associated snapshot present: odd error message
I had a:

# zfs list -tall
NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
. . .
zroot/DESTDIRs/13_0R-CA72-instwrld-norm              1.44G   117G    96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style  1.44G      -  1.44G  -
. . .
. . .

(copied/pasted from somewhat earlier) and then attempted:

# zfs rename zroot/DESTDIRs/13_0R-CA72-instwrld-norm zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0
cannot open 'zroot/DESTDIRs/13_0R-CA72-instwrld-norm@dirty-style': snapshot delimiter '@' is not expected here

Despite the "cannot open" message, the result looks like:

# zfs list -tall
NAME                                                   USED  AVAIL  REFER  MOUNTPOINT
. . .
zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0              1.44G   114G    96K  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt-0
zroot/DESTDIRs/13_0R-CA72-instwrld-alt-0@dirty-style  1.44G      -  1.44G  -
. . .

Still, it leaves me wondering if everything is okay
given that internal attempt to use the old name with
@dirty-style when it was apparently no longer
available under that naming.

For reference:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261ff-dirty: Thu Apr 29 21:53:20 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300139 1300139

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [Ignore recent test: -dirty vs. checked-in usage difference]
[The first buildworld is still in process. So while waiting . . .]

On 2021-May-4, at 10:31, Mark Millard wrote:

> I probably know why the huge count of differences this time
> unlike the original report . . .
>
> Previously I built based on a checked-in branch as part of
> my experimenting. This time it was in a -dirty form (not
> checked in), again as part of my experimental exploration.
>
> WITH_REPRODUCIBLE_BUILD= makes a distinction between these
> if I remember right: (partially?) disabling itself for
> -dirty style.
>
> To reproduce the original style of test I need to create
> a branch with my few patches checked in and do the
> buildworlds from that branch.
>
> This will, of course, take a while.
>
> Sorry for the noise.

I've confirmed some of the details of the large number of files
with differences while waiting for the 1st buildworld:

The 4 bytes at the end of the .gnu_debuglink section that are
ending up different are the checksum for the .debug file.

The .debug files have differences such as:

│ -<1a> DW_AT_comp_dir: (indirect string) /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64
│ +<1a> DW_AT_comp_dir: (indirect string) /usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src/arm64.aarch64/lib/csu/aarch64

So I need to build, snapshot (in case I need to reference it),
install, clean-out, build, install elsewhere, compare. (Or an
analogous sequence that uses the same build base-path for both
installs despite separate buildworld's.)

This is separate from any potential -dirty vs. checked-in
handling variation by WITH_REPRODUCIBLE_BUILD= .

My process that produced the original armv7 report happened
to do that before I accidentally discovered the presence of
the few files with differences. My new experiments were
different and I'd not thought of needing to vary the procedure
to get you the right evidence.
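To illustrate why a different build path in the .debug file shows up as exactly 4 differing bytes in the stripped binary, here is a small sketch of the .gnu_debuglink layout: the debug file name, a NUL terminator, padding to a 4-byte boundary, then a 4-byte CRC of the .debug file. The function name gnu_debuglink and the two stand-in payloads are made up for illustration. The sketch also assumes that zlib.crc32 is an acceptable stand-in for the CRC the toolchain uses and that the value is stored little-endian, as would be expected on aarch64.

```python
import struct
import zlib


def gnu_debuglink(debug_name: str, debug_bytes: bytes) -> bytes:
    """Build .gnu_debuglink contents: name, NUL, pad to 4 bytes, 4-byte CRC.

    Hypothetical helper; zlib.crc32 and little-endian storage are assumptions.
    """
    name = debug_name.encode("ascii") + b"\0"
    name += b"\0" * (-len(name) % 4)  # pad so the CRC starts 4-byte aligned
    crc = zlib.crc32(debug_bytes) & 0xFFFFFFFF
    return name + struct.pack("<I", crc)


# Stand-in .debug payloads that differ only in the recorded build directory.
norm = b"DW_AT_comp_dir: /usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/13_0R-src"
alt = b"DW_AT_comp_dir: /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt/usr/13_0R-src"

link_norm = gnu_debuglink("cat.debug", norm)
link_alt = gnu_debuglink("cat.debug", alt)

# Offsets (within the section) at which the two sections differ.
differing = [i for i, (a, b) in enumerate(zip(link_norm, link_alt)) if a != b]
print(len(link_norm), differing)
```

For "cat.debug" the section comes out as 16 bytes (0x10), which matches the section size reported elsewhere in this thread for bin/cat, and any differing offsets fall in the final 4 bytes. With the section at file offset 0x3bc8, those final bytes are 0x3bd4 through 0x3bd7, the same offsets that cmp -x reported.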
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
diffoscope's odd UnicodeDecodeError error message: reason found
I had reported in the reproducible build list messages:

> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
> [...]
> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently
> disabled as the "tlsh" module is unavailable.
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18:
> invalid start byte

Well, it turns out that the file name pattern was incorrect
and only matched one file, so diffoscope was handed a single
path instead of two. By contrast:

# diffoscope /.zfs/snapshot/2021-04-*/bin/sh
$<3/>2021-05-04 11:05:25 W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.

worked fine.

And making the "one file" status obvious:

# diffoscope c_tests/a.out
$<3/>2021-05-04 11:11:45 W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.
$<3/>Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, in main
    sys.exit(run_diffoscope(parsed_args))
  File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, in run_diffoscope
    difference = load_diff_from_path(path1)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 31, in load_diff_from_path
    return load_diff(codecs.getreader("utf-8")(fp), path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 35, in load_diff
    return JSONReaderV1().load(fp, path)
  File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", line 33, in load
    raw = json.load(fp)
  File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/lib/python3.7/codecs.py", line 504, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: invalid start byte

Not exactly an obvious error message for the issue.
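A likely explanation for the specific "byte 0xb7 in position 18", and for it being identical for every program tried: per the traceback, with a single path diffoscope tries to read the file as UTF-8 JSON, and in an ELF file the 16-bit e_machine field sits at offset 18. For AArch64 that value is 183 (0xb7), which is not a valid UTF-8 start byte. The sketch below rebuilds just the first 20 bytes of such a header by hand (the chosen e_ident values are illustrative assumptions, not read from the actual files) and shows the same decode failure.

```python
import struct

EM_AARCH64 = 183  # 0xb7, the ELF e_machine value for AArch64

# First 20 bytes of a little-endian 64-bit ELF header: e_ident (16 bytes),
# then e_type and e_machine as 16-bit fields at offsets 16 and 18.
# ELFCLASS64 (2), little-endian (1), version 1, FreeBSD OSABI (9), 8 zero bytes.
e_ident = b"\x7fELF" + bytes([2, 1, 1, 9]) + bytes(8)
header_prefix = e_ident + struct.pack("<HH", 2, EM_AARCH64)  # e_type 2 = ET_EXEC

failure = None
try:
    header_prefix.decode("utf-8")
except UnicodeDecodeError as err:
    # Record where decoding stopped and which byte caused it.
    failure = (err.start, header_prefix[err.start])
    print(err)
```

Every byte before offset 18 is below 0x80, so the decoder only stops once it reaches e_machine. That is consistent with the error not depending on which aarch64 binary was passed.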
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [Ignore recent test: -dirty vs. checked-in usage difference]
I probably know why the huge count of differences this time
unlike the original report . . .

Previously I built based on a checked-in branch as part of
my experimenting. This time it was in a -dirty form (not
checked in), again as part of my experimental exploration.

WITH_REPRODUCIBLE_BUILD= makes a distinction between these
if I remember right: (partially?) disabling itself for
-dirty style.

To reproduce the original style of test I need to create
a branch with my few patches checked in and do the
buildworlds from that branch.

This will, of course, take a while.

Sorry for the noise.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
[Just adding readelf -S info since it seems to show more.] On 2021-May-4, at 10:01, Mark Millard wrote: > On 2021-May-4, at 08:51, Mark Millard wrote: > >> On 2021-May-4, at 06:01, Ed Maste wrote: >> >>> On Mon, 3 May 2021 at 22:26, Mark Millard wrote: >>>> >>>> But I'll note that I've built and stalled py37-diffoscope >>>> (new to me). A basic quick test showed that it reports: >>>> >>>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" >>>> module is unavailable. >>> >>> I just looked up tlsh - its "A Locality Sensitive Hash"; I presume >>> diffoscope uses it to infer file renames. I believe the warning >>> emitted here should have no impact on the output we're looking for. >> >> Okay. >> >>> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and >>> I suspect that is the issue. If you don't have LANG set already, try >>> setting LANG=C.UTF-8 in your environment. >> >> That is not the issue for the UnicodeDecodeError: >> >> # echo $LANG >> C.UTF-8 >> >> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh >> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently >> disabled as the "tlsh" module is unavailable. 
>> $<3/>Traceback (most recent call last): >> File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, >> in main >> sys.exit(run_diffoscope(parsed_args)) >> File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, >> in run_diffoscope >> difference = load_diff_from_path(path1) >> File >> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", >> line 31, in load_diff_from_path >> return load_diff(codecs.getreader("utf-8")(fp), path) >> File >> "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", >> line 35, in load_diff >> return JSONReaderV1().load(fp, path) >> File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", >> line 33, in load >> raw = json.load(fp) >> File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load >> return loads(fp.read(), >> File "/usr/local/lib/python3.7/codecs.py", line 504, in read >> newchars, decodedbytes = self.decode(data, self.errors) >> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: >> invalid start byte >> > > Well, the list of differing files is huge. But this seems to > be .gnu_debuglink content for the area it is in. Specifically: the last 4 bytes of the .gnu_debuglink section. > I'll note > that I did installworld but not the likes of distrib-dirs > or distribution this time. 
> > This test did buildworld to two distinct directories: > > zroot/BUILDs/13_0R-CA72-nodbg-clang 5.13G 118G 5.13G > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang > zroot/BUILDs/13_0R-CA72-nodbg-clang-alt 4.28G 118G 4.28G > /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt > > and installworld to 2 distinct directories: > > zroot/DESTDIRs/13_0R-CA72-instwrld-alt1.44G 118G 1.44G > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt > zroot/DESTDIRs/13_0R-CA72-instwrld-norm 1.44G 118G 1.44G > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm > > Previously (armv7 target) I had built, installed, rebuilt > to same directory (after clean-out) and installed to an > alternate directory. That had gotten only a few files > different but I do not know (yet) if it was the procedural > difference that made the difference. > > Prefix of the list of different files this time: > > # diff -rq /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/ > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/ | more > Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/[ and > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/[ differ > Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat and > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat differ > Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags and > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags differ > Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio and > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio differ > . . . > > Looking, aarch64 seems to typically get a back-to-back > sequence of 4 bytes different in native programs in my > builds: > > # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat > /usr/obj/DESTDIRs/13_0R-CA72-instwrld-al
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
On 2021-May-4, at 08:51, Mark Millard wrote: > On 2021-May-4, at 06:01, Ed Maste wrote: > >> On Mon, 3 May 2021 at 22:26, Mark Millard wrote: >>> >>> But I'll note that I've built and stalled py37-diffoscope >>> (new to me). A basic quick test showed that it reports: >>> >>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" >>> module is unavailable. >> >> I just looked up tlsh - its "A Locality Sensitive Hash"; I presume >> diffoscope uses it to infer file renames. I believe the warning >> emitted here should have no impact on the output we're looking for. > > Okay. > >> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and >> I suspect that is the issue. If you don't have LANG set already, try >> setting LANG=C.UTF-8 in your environment. > > That is not the issue for the UnicodeDecodeError: > > # echo $LANG > C.UTF-8 > > # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh > $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently > disabled as the "tlsh" module is unavailable. 
> $<3/>Traceback (most recent call last): > File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, > in main >sys.exit(run_diffoscope(parsed_args)) > File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, > in run_diffoscope >difference = load_diff_from_path(path1) > File > "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line > 31, in load_diff_from_path >return load_diff(codecs.getreader("utf-8")(fp), path) > File > "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line > 35, in load_diff >return JSONReaderV1().load(fp, path) > File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", > line 33, in load >raw = json.load(fp) > File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load >return loads(fp.read(), > File "/usr/local/lib/python3.7/codecs.py", line 504, in read >newchars, decodedbytes = self.decode(data, self.errors) > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: > invalid start byte > Well, the list of differing files is huge. But this seems to be .gnu_debuglink content for the area it is in. I'll note that I did installworld but not the likes of distrib-dirs or distribution this time. This test did buildworld to two distinct directories: zroot/BUILDs/13_0R-CA72-nodbg-clang 5.13G 118G 5.13G /usr/obj/BUILDs/13_0R-CA72-nodbg-clang zroot/BUILDs/13_0R-CA72-nodbg-clang-alt 4.28G 118G 4.28G /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt and installworld to 2 distinct directories: zroot/DESTDIRs/13_0R-CA72-instwrld-alt1.44G 118G 1.44G /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt zroot/DESTDIRs/13_0R-CA72-instwrld-norm 1.44G 118G 1.44G /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm Previously (armv7 target) I had built, installed, rebuilt to same directory (after clean-out) and installed to an alternate directory. 
That had gotten only a few files different but I do not know (yet) if it was the procedural difference that made the difference. Prefix of the list of different files this time: # diff -rq /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/ /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/ | more Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/[ and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/[ differ Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat differ Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags differ Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio differ . . . Looking, aarch64 seems to typically get a back-to-back sequence of 4 bytes different in native programs in my builds: # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat 3bd4 1d 65 3bd5 eb a3 3bd6 bb ca 3bd7 8e 1a # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat -r-xr-xr-x 1 root wheel 18448 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat -r-xr-xr-x 1 root wheel 18448 May 3 23:16:36 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat Sections: Idx Name Size VMA LMA File off Algn . . . 25 .gnu_debuglink 0010 3bc8 2**0 CONTENTS, READONLY 3bd4-0
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
On 2021-May-4, at 06:01, Ed Maste wrote: > On Mon, 3 May 2021 at 22:26, Mark Millard wrote: >> >> But I'll note that I've built and stalled py37-diffoscope >> (new to me). A basic quick test showed that it reports: >> >> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" >> module is unavailable. > > I just looked up tlsh - its "A Locality Sensitive Hash"; I presume > diffoscope uses it to infer file renames. I believe the warning > emitted here should have no impact on the output we're looking for. Okay. > As far as the utf-8 issues go, diffoscope requires a utf-8 locale and > I suspect that is the issue. If you don't have LANG set already, try > setting LANG=C.UTF-8 in your environment. That is not the issue for the UnicodeDecodeError: # echo $LANG C.UTF-8 # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable. $<3/>Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, in main sys.exit(run_diffoscope(parsed_args)) File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, in run_diffoscope difference = load_diff_from_path(path1) File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 31, in load_diff_from_path return load_diff(codecs.getreader("utf-8")(fp), path) File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 35, in load_diff return JSONReaderV1().load(fp, path) File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", line 33, in load raw = json.load(fp) File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load return loads(fp.read(), File "/usr/local/lib/python3.7/codecs.py", line 504, in read newchars, decodedbytes = self.decode(data, self.errors) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: invalid start byte === Mark Millard marklmi 
at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
On 2021-May-3, at 21:27, Mark Millard wrote: > On 2021-May-3, at 19:26, Mark Millard wrote: > >> On 2021-May-3, at 10:51, Mark Millard wrote: >> >>> On 2021-May-3, at 07:47, Ed Maste wrote: >>> >>>> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current >>>> wrote: >>>>> >>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and >>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ >>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and >>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ >>>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and >>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ... >>>> >>>> This is unexpected. Unfortunately I haven't looked at reproducibility >>>> in a while, and my work was all on x86. This could be a regression or >>>> a longstanding issue with arm64. >>>> >>>> If you install the diffoscope package (py37-diffoscope) and run it on >>>> the two directories / files it should give a more convenient view of >>>> the differences. (Or, if you can make a tarball of the differing files >>>> I can take a look.) >>> >>> I no longer have the same content in those directory >>> trees: newer rebuild and the same buildworld used to >>> installworld to both places, instead of 2 different >>> buildworld's. I'm also unsure how reproducible getting >>> differences was. >>> >>> I can eventually do experiments to test multiple separate >>> buildworld's and installworld's, but the machine is busy >>> building ports and the llvm builds involved means it >>> will be some time before I'd switch activities. And the >>> buildworld's involve llvm builds as well and take notable >>> time themselves. So my next comparison will not be any >>> time soon. >>> >>> I'll let you know if I manage to generate another example, >>> this time being sure to keep the data. If I try multiple >>> times without finding any differences, I'll eventually >>> decide "enough is enough" and let you know. 
>> >> I've still got a long ways to go to do the first >> actual comparison of builds. >> >> But I'll note that I've built and stalled py37-diffoscope >> (new to me). A basic quick test showed that it reports: >> >> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" >> module is unavailable. >> >> As I'm not familiar with the tool, you might need to send >> notes about how you want me to use the tool to get the >> output that you would want. (And, so, I get to learn . . .) > > I've tried another experiment (* in the path matches "28" and "30"): > > # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh > $<3/>2021-05-03 21:08:48 W: diffoscope.main: Fuzzy-matching is currently > disabled as the "tlsh" module is unavailable. > $<3/>Traceback (most recent call last): > File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, > in main >sys.exit(run_diffoscope(parsed_args)) > File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, > in run_diffoscope >difference = load_diff_from_path(path1) > File > "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line > 31, in load_diff_from_path >return load_diff(codecs.getreader("utf-8")(fp), path) > File > "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line > 35, in load_diff >return JSONReaderV1().load(fp, path) > File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", > line 33, in load >raw = json.load(fp) > File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load >return loads(fp.read(), > File "/usr/local/lib/python3.7/codecs.py", line 504, in read >newchars, decodedbytes = self.decode(data, self.errors) > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: > invalid start byte > > The two older snapshots of a Boot Environment have > bin/sh files that compare equal. 
But every program I > tried the above sort of thing against on got the same > UnicodeDecodeError result from diffoscope, byte value > and position matching. > > These snapshots have more than an installworld in them > and so are messy to compare overall. But the > installworld (and
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
On 2021-May-3, at 19:26, Mark Millard wrote: > On 2021-May-3, at 10:51, Mark Millard wrote: > >> On 2021-May-3, at 07:47, Ed Maste wrote: >> >>> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current >>> wrote: >>>> >>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and >>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ >>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and >>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ >>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and >>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ... >>> >>> This is unexpected. Unfortunately I haven't looked at reproducibility >>> in a while, and my work was all on x86. This could be a regression or >>> a longstanding issue with arm64. >>> >>> If you install the diffoscope package (py37-diffoscope) and run it on >>> the two directories / files it should give a more convenient view of >>> the differences. (Or, if you can make a tarball of the differing files >>> I can take a look.) >> >> I no longer have the same content in those directory >> trees: newer rebuild and the same buildworld used to >> installworld to both places, instead of 2 different >> buildworld's. I'm also unsure how reproducible getting >> differences was. >> >> I can eventually do experiments to test multiple separate >> buildworld's and installworld's, but the machine is busy >> building ports and the llvm builds involved means it >> will be some time before I'd switch activities. And the >> buildworld's involve llvm builds as well and take notable >> time themselves. So my next comparison will not be any >> time soon. >> >> I'll let you know if I manage to generate another example, >> this time being sure to keep the data. If I try multiple >> times without finding any differences, I'll eventually >> decide "enough is enough" and let you know. > > I've still got a long ways to go to do the first > actual comparison of builds. 
> > But I'll note that I've built and stalled py37-diffoscope > (new to me). A basic quick test showed that it reports: > > W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module > is unavailable. > > As I'm not familiar with the tool, you might need to send > notes about how you want me to use the tool to get the > output that you would want. (And, so, I get to learn . . .) I've tried another experiment (* in the path matches "28" and "30"): # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh $<3/>2021-05-03 21:08:48 W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable. $<3/>Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, in main sys.exit(run_diffoscope(parsed_args)) File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, in run_diffoscope difference = load_diff_from_path(path1) File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 31, in load_diff_from_path return load_diff(codecs.getreader("utf-8")(fp), path) File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 35, in load_diff return JSONReaderV1().load(fp, path) File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", line 33, in load raw = json.load(fp) File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load return loads(fp.read(), File "/usr/local/lib/python3.7/codecs.py", line 504, in read newchars, decodedbytes = self.decode(data, self.errors) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: invalid start byte The two older snapshots of a Boot Environment have bin/sh files that compare equal. But every program I tried the above sort of thing against on got the same UnicodeDecodeError result from diffoscope, byte value and position matching. These snapshots have more than an installworld in them and so are messy to compare overall. 
But the installworld (and installkernel) content show similar differences to what I reported before as far as example files with differences go. But this is aarch64, not armv7. It will still be notable time before I have simple installworld tree's to compare. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
On 2021-May-3, at 10:51, Mark Millard wrote:

> On 2021-May-3, at 07:47, Ed Maste wrote:
>
>> On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current
>> wrote:
>>>
>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and
>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ
>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and
>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ
>>> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and
>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ...
>>
>> This is unexpected. Unfortunately I haven't looked at reproducibility
>> in a while, and my work was all on x86. This could be a regression or
>> a longstanding issue with arm64.
>>
>> If you install the diffoscope package (py37-diffoscope) and run it on
>> the two directories / files it should give a more convenient view of
>> the differences. (Or, if you can make a tarball of the differing files
>> I can take a look.)
>
> I no longer have the same content in those directory
> trees: newer rebuild and the same buildworld used to
> installworld to both places, instead of 2 different
> buildworld's. I'm also unsure how reproducible getting
> differences was.
>
> I can eventually do experiments to test multiple separate
> buildworld's and installworld's, but the machine is busy
> building ports and the llvm builds involved means it
> will be some time before I'd switch activities. And the
> buildworld's involve llvm builds as well and take notable
> time themselves. So my next comparison will not be any
> time soon.
>
> I'll let you know if I manage to generate another example,
> this time being sure to keep the data. If I try multiple
> times without finding any differences, I'll eventually
> decide "enough is enough" and let you know.

I've still got a long way to go before I can do the first
actual comparison of builds.

But I'll note that I've built and installed py37-diffoscope
(new to me). A basic quick test showed that it reports:

W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.

As I'm not familiar with the tool, you might need to send
notes about how you want me to use the tool to get the
output that you would want. (And, so, I get to learn . . .)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
Re: FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
On 2021-May-3, at 07:47, Ed Maste wrote: > On Thu, 29 Apr 2021 at 02:50, Mark Millard via freebsd-current > wrote: >> >> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and >> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ >> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and >> /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ >> Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and >> /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ... > > This is unexpected. Unfortunately I haven't looked at reproducibility > in a while, and my work was all on x86. This could be a regression or > a longstanding issue with arm64. > > If you install the diffoscope package (py37-diffoscope) and run it on > the two directories / files it should give a more convenient view of > the differences. (Or, if you can make a tarball of the differing files > I can take a look.) I no longer have the same content in those directory trees: newer rebuild and the same buildworld used to installworld to both places, instead of 2 different buildworld's. I'm also unsure how reproducible getting differences was. I can eventually do experiments to test multiple separate buildworld's and installworld's, but the machine is busy building ports and the llvm builds involved means it will be some time before I'd switch activities. And the buildworld's involve llvm builds as well and take notable time themselves. So my next comparison will not be any time soon. I'll let you know if I manage to generate another example, this time being sure to keep the data. If I try multiple times without finding any differences, I'll eventually decide "enough is enough" and let you know. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
FYI: WITH_REPRODUCIBLE_BUILD= problem for some files?
I did 2 test buildworld's based on: # ~/fbsd-based-on-what-freebsd.sh branch: releng/13.0 merge-base: ea31abc261ffc01b6ff5671bffb15cf910a07f4b merge-base: CommitDate: 2021-04-09 00:14:30 + ea31abc261ff (HEAD -> releng/13.0, tag: release/13.0.0, freebsd/releng/13.0) 13.0: update to RELEASE n244733 (--first-parent --count for merge-base) and produced separate build trees. I also installed the world build into two separate directory trees: /usr/obj/DESTDIRs/13_0R-CA7-chroot/ vs. /usr/obj/DESTDIRs/13_0R-CA7-poud/ This was for other reasons. But eventually I happened to do a diff -rq of the two trees and ended up with the output showing some differing files: Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping and /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/sbin/ping6 and /usr/obj/DESTDIRs/13_0R-CA7-poud/sbin/ping6 differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/bin/ntpq and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/bin/ntpq differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/lib/debug/sbin/ping.debug and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/lib/debug/sbin/ping.debug differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/lib/debug/usr/sbin/ntpd.debug and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/lib/debug/usr/sbin/ntpd.debug differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/lib/debug/usr/tests/sbin/ping/in_cksum_test.debug and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/lib/debug/usr/tests/sbin/ping/in_cksum_test.debug differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntp-keygen and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntp-keygen differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntpd and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntpd differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntpdate and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntpdate differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/ntpdc and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/ntpdc differ Files 
/usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/sbin/sntp and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/sbin/sntp differ Files /usr/obj/DESTDIRs/13_0R-CA7-chroot/usr/tests/sbin/ping/in_cksum_test and /usr/obj/DESTDIRs/13_0R-CA7-poud/usr/tests/sbin/ping/in_cksum_test differ (That is all.) For as much as I've looked at (not much), it looks to be variations in byte-padding values. The builds both were set up to tune for cortex-a7 explicitly. I patch top's source code. I patch the OOM kill code to report the specific reason for a kill. I still have some bcm2838 pci/xhci patching in place from an old investigation, but that would be kernel code. None of the patching is specific to the above list of files. The hosting context was: # uname -apKU FreeBSD CA72_4c8G_ZFS 13.0-RELEASE FreeBSD 13.0-RELEASE #1 releng/13.0-n244733-ea31abc261ff-dirty: Wed Apr 28 05:45:27 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13_0R-CA72-nodbg-clang/usr/src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300139 1300139 based on building the same source code (tuning for cortex-a72). It was the same media for all the activity. Unlike the past many years for me, the context is using ZFS instead of UFS, not that I think that makes a difference here. The differences do not mess up my activity but others might notice and care about such differences. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
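For anyone wanting to script the same kind of check, the following is a rough Python analogue of the "diff -rq" comparison above, offered only as a sketch. The function name differing_files is hypothetical. Unlike diff -rq it reports only regular files that exist in both trees and differ byte-for-byte; it does not list files present on just one side.

```python
import filecmp
import os


def differing_files(left_root, right_root):
    """Yield relative paths of regular files in both trees whose bytes differ."""
    for dirpath, _dirnames, filenames in os.walk(left_root):
        for name in sorted(filenames):
            left = os.path.join(dirpath, name)
            rel = os.path.relpath(left, left_root)
            right = os.path.join(right_root, rel)
            # shallow=False forces a content comparison instead of trusting
            # size and mtime, which always differ between separate installs.
            if (os.path.isfile(left) and os.path.isfile(right)
                    and not filecmp.cmp(left, right, shallow=False)):
                yield rel


if __name__ == "__main__":
    # The two install trees named in the report above.
    for rel in differing_files("/usr/obj/DESTDIRs/13_0R-CA7-chroot",
                               "/usr/obj/DESTDIRs/13_0R-CA7-poud"):
        print(rel)
```

The resulting list of relative paths could then be fed, one pair at a time, to cmp -x or diffoscope for a closer look.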
Re: (D29934) Reorder commented steps in UPDATING following sequential order. (was: etcupdate -p vs. root on zfs (and bectl use and such): no /usr/src/etc/master.passwd (for example))
On 2021-Apr-25, at 08:14, Graham Perrin wrote:

> On 23/04/2021 08:39, Mark Millard via freebsd-current wrote:
>
>> [3]
>
> With regard to mounting ZFS file systems in single user mode
>
> What's currently footnote 3 will probably become footnote 4, please see:
>
> https://reviews.freebsd.org/D29934#inline-186101
>
> … and so on.

If it were me, I'd probably do something to make the mounting
of file systems and such have an explicit reminder as its own step,
something like:

[4] mergemaster -Fp [5]

I just do not think of such as part of : it is already rebooted
in single user at that point in my view.

Sorry that I missed what was there in UPDATING.

However, /usr/src/Makefile has:

#  1.  `cd /usr/src'  (or to the directory containing your source tree).
#  2.  `make buildworld'
#  3.  `make buildkernel KERNCONF=YOUR_KERNEL_HERE'  (default is GENERIC).
#  4.  `make installkernel KERNCONF=YOUR_KERNEL_HERE'  (default is GENERIC).
#       [steps 3. & 4. can be combined by using the "kernel" target]
#  5.  `reboot'  (in single user mode: boot -s from the loader prompt).
#  6.  `mergemaster -p'
#  7.  `make installworld'
#  8.  `mergemaster'  (you may wish to use -i, along with -U or -F).
#  9.  `make delete-old'
# 10.  `reboot'
# 11.  `make delete-old-libs' (in case no 3rd party program uses them anymore)

without such material, even in footnotes.

Side notes:

"From the bootblocks, boot -s, and then do":
"From the boot loader, boot -s, and then do"?

etcupdate vs. mergemaster and the $FreeBSD$ issue?
Is mergemaster going to stay as the recommended
command to use? If so, with which command line
options?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Despite the documentation, "etcupdate extract" handles -D destdir (and its contribution to the default workdir)
# etcupdate -?
Illegal option -?

usage: etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball] [-A patterns] [-D destdir] [-I patterns] [-L logfile] [-M options]
       etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options]
       etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
       etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L logfile] [-M options]
       etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
       etcupdate status [-d workdir] [-D destdir]

The "etcupdate extract" material does not show -D destdir as valid.

# man etcupdate
. . .
SYNOPSIS
     etcupdate [-npBF] [-d workdir] [-r | -s source | -t tarball] [-A patterns] [-D destdir] [-I patterns] [-L logfile] [-M options]
     etcupdate build [-B] [-d workdir] [-s source] [-L logfile] [-M options] tarball
     etcupdate diff [-d workdir] [-D destdir] [-I patterns] [-L logfile]
     etcupdate extract [-B] [-d workdir] [-s source | -t tarball] [-L logfile] [-M options]
     etcupdate resolve [-p] [-d workdir] [-D destdir] [-L logfile]
     etcupdate status [-d workdir] [-D destdir]
. . .

Again the "etcupdate extract" material does not show -D destdir as valid.

But I used it:

# etcupdate extract -D usr/obj/DESTDIRs/13_0R-CA7-for-chroot

and it created and filled in the workdir:

/usr/obj/DESTDIRs/13_0R-CA7-for-chroot/var/db/etcupdate/

I have not checked whether "etcupdate build" has a similar issue.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
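The relationship between -D destdir and the default workdir seen above can be written down as a tiny sketch. This models only the observed result (destdir followed by /var/db/etcupdate); it is not etcupdate's own code, and default_workdir is a made-up helper name.

```shell
#!/bin/sh
# Sketch of the observed behavior, not etcupdate's own code:
# the default workdir appears to be DESTDIR plus /var/db/etcupdate.
default_workdir() {
    destdir=${1%/}          # drop one trailing slash, if any
    echo "${destdir}/var/db/etcupdate"
}

# The destdir used in the report above:
default_workdir /usr/obj/DESTDIRs/13_0R-CA7-for-chroot
```

With an empty destdir the same rule gives /var/db/etcupdate, which is where the workdir normally lives.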
https://artifact.ci.freebsd.org/snapshot/stable-13/?C=M=D messed up dates and HASHID-only use make things extremely hard to find "in time order"
Using an example to illustrate problems finding artifacts,
the problems not being limited to the example's specifics.

I have historically used https://artifact.ci.freebsd.org/snapshot/
to do build-less approximate bisecting (and other things). Such
use has been badly broken since the git-related URL conventions
were put in place. The following illustrates the mess in how
things are currently presented.

https://artifact.ci.freebsd.org/snapshot/stable-13/?C=M=D

lists ac845558f7b626d9a31b8f6dab686c45d39dc5a0/ as having
date/time 2021-Apr-10 18:43 . But:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/?C=M=D

lists:

powerpc/ and arm/ as having date/times 2021-Apr-10 18:54 and 2021-Apr-10 18:50

yet lists...

i386/ and arm64/ as having date/times 2021-Feb-19 19:00 and 2021-Feb-19 18:50 .

But it gets worse:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/powerpc/?C=M=D

shows an empty directory. Same for:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/arm/?C=M=D

By contrast,

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/i386/?C=M=D

shows i386/ with date/time 2021-Apr-10 18:43 but

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/i386/i386/?C=M=D

shows all the file dates as 2021-Feb-19 19:00 .

Going back to arm64/ I find a similar 2021-Feb-19 dating,
although 2021-Feb does show up in more places:

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/arm64/?C=M=D

shows aarch64/ with date/time 2021-Feb-19 18:50 and

https://artifact.ci.freebsd.org/snapshot/stable-13/ac845558f7b626d9a31b8f6dab686c45d39dc5a0/arm64/aarch64/?C=M=D

shows the files also having the date/time 2021-Feb-19 18:50 .
In my view the choice to only use the hash-id for the commit in the url is a usability mistake and the url prefix should be of a form more like (for this example context): https://artifact.ci.freebsd.org/snapshot/stable-13/n??-HASHID/ where the ?? is from: git rev-list --first-parent --count (as used elsewhere by FreeBSD). (The HASHID might be just the 12 character prefix instead of the whole hash-id as well.) Such a convention would be more independent of dates possibly being touched on the file server and would make time ordered finding of things (such as for build-less approximate bisecting) far more reasonable. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
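As an illustration of the suggested n??-HASHID form, a directory name could be assembled roughly as follows. This is a sketch of the proposal only, not anything the artifact server does; snapshot_dir_name is a made-up name, and the count and hash in the example are the release/13.0.0 values (n244733, ea31abc261ff...) reported elsewhere by FreeBSD, used here just as sample inputs.

```shell
#!/bin/sh
# Sketch of the proposed nNNNNNN-HASHID directory naming.
# snapshot_dir_name is an illustrative name, not an existing tool.
snapshot_dir_name() {
    count=$1    # e.g. output of: git rev-list --first-parent --count COMMIT
    hash=$2     # full commit hash-id
    # %.12s keeps only the first 12 characters of the hash.
    printf 'n%s-%.12s\n' "$count" "$hash"
}

# Inside a git checkout the two inputs could be obtained with:
#   git rev-list --first-parent --count HEAD
#   git rev-parse HEAD
snapshot_dir_name 244733 ea31abc261ffc01b6ff5671bffb15cf910a07f4b
```

Because the count grows along the first-parent history, names of this form sort in commit order within a branch, independent of any dates on the file server.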
Is stable/13 going to start getting snapshot builds?
Is stable/13 going to start getting snapshot builds?

(As it stands, main, stable/12, and stable/11 are getting them.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
etcupdate -p vs. root on zfs (and bectl use and such): no /usr/src/etc/master.passwd (for example)
FYI:

The default bsdinstall result for auto ZFS that I tried
has a separate zroot/usr/src dataset, which zfs mounts
at /usr/src .

UPDATING and such places indicate sequences like:
(think etcupdate where it lists mergemaster and
ignore -F and -Fi)

    make buildworld
    make buildkernel KERNCONF=YOUR_KERNEL_HERE
    make installkernel KERNCONF=YOUR_KERNEL_HERE [1]
    [3]
    mergemaster -Fp [5]    NOTE: What /usr/src/etc/master.passwd here? (for example)
    make installworld
    mergemaster -Fi [4]
    make delete-old [6]

etcupdate has the logic for handling -p:

    if [ -n "$preworld" ]; then
        # Build a limited tree that only contains files that are
        # crucial to installworld.
        for file in $PREWORLD_FILES; do
            name=$(basename $file)
            mkdir -p $1/etc >&3 2>&1 || return 1
            cp -p $SRCDIR/$file $1/etc/$name || return 1
        done

Note the "$SRCDIR/$file". But for a boot -s after
installing the kernel there is only zroot/ROOT/NAME
and no zroot/usr/src zfs mount so /usr/src/ is empty.

This leads to needing an additional step:

    zfs mount zroot/usr/src

(The instructions do not deal with making / writable
at this stage either.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
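One way to notice the empty /usr/src before etcupdate -p (or mergemaster -p) trips over it is a small pre-flight check along the following lines. This is a sketch under assumptions: check_preworld_sources is a hypothetical helper, and the file list is passed in by the caller because only etc/master.passwd is named in the report above.

```shell
#!/bin/sh
# Sketch: check that files "etcupdate -p" copies from the source tree
# are present before running it in single user mode.
# check_preworld_sources is a hypothetical helper, not part of FreeBSD.
check_preworld_sources() {
    srcdir=$1
    shift
    for file in "$@"; do
        if [ ! -f "$srcdir/$file" ]; then
            echo "missing: $srcdir/$file"
            return 1
        fi
    done
    echo "ok"
}

# In single user mode on a root-on-ZFS system, something like:
#   check_preworld_sources /usr/src etc/master.passwd || zfs mount zroot/usr/src
```

The zfs mount command in the usage comment is the additional step from the report; the dataset name zroot/usr/src is the bsdinstall default mentioned there and may differ on other layouts.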
13.0-RELEASE bsdinstall failure : looked for MANIFEST in wrong place (not with *.txz files)
Booted an RPi4 via a microSD card dd'd from:

FreeBSD-13.0-RELEASE-arm64-aarch64-RPI.img

I attempted a bsdinstall onto a USB3 SSD. The following reports what happened.

# bsdinstall
default keymap Select
Hostname OK
ftp mirror OK
Auto (ZFS) OK
Install Select
stripe OK
[*] da0 OK
Last Chance! da0 YES
Error while fetching file:///usr/freebsd-dist/MANIFEST : No such file or directory
OK
Exit

NOTE: the path is /usr/freebsd-dist/MANIFEST instead of
/mnt/usr/freebsd-dist/MANIFEST but . . .

# df -m
Filesystem             1M-blocks Used  Avail Capacity  Mounted on
/dev/ufs/rootfs            28862 3217  23336    12%    /
devfs                          0    0      0   100%    /dev
/dev/msdosfs/MSDOSBOOT        49   24     25    49%    /boot/msdos
tmpfs                         50    0     49     1%    /tmp
zroot/ROOT/default        197406  183 197222     0%    /mnt
zroot/tmp                 197222    0 197222     0%    /mnt/tmp
zroot/usr/home            197222    0 197222     0%    /mnt/usr/home
zroot/usr/ports           197222    0 197222     0%    /mnt/usr/ports
zroot/usr/src             197222    0 197222     0%    /mnt/usr/src
zroot/var/audit           197222    0 197222     0%    /mnt/var/audit
zroot/var/crash           197222    0 197222     0%    /mnt/var/crash
zroot/var/log             197222    0 197222     0%    /mnt/var/log
zroot/var/mail            197222    0 197222     0%    /mnt/var/mail
zroot/var/tmp             197222    0 197222     0%    /mnt/var/tmp
zroot                     197222    0 197222     0%    /mnt/zroot

# ls -Tla /mnt/usr/freebsd-dist/
total 187454
drwxr-xr-x  2 root  wheel          4 Apr  9 07:39:20 2021 .
drwxr-xr-x  6 root  wheel          6 Apr  9 07:39:20 2021 ..
-rw-r--r--  1 root  wheel  165248188 Apr  9 07:39:20 2021 base.txz
-rw-r--r--  1 root  wheel   26552108 Apr  9 07:39:21 2021 kernel.txz

# ls -Tla /usr/freebsd-dist/
ls: /usr/freebsd-dist/: No such file or directory

NOTE: creating /usr/freebsd-dist/ with a copy of the MANIFEST
file in it was enough to get past this issue: it is doing
Archive Extraction now.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
powerpc64le is missing in: https://www.freebsd.org/platforms/
When I looked at https://www.freebsd.org/platforms/ I noticed that "64-bit little-endian PowerPC" powerpc64le is not listed.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: Filesystem operations slower in 13.0 than 12.2
On 2021-Mar-22, at 22:51, Kevin Oberman wrote: > On Mon, Mar 22, 2021 at 8:19 AM Adrian Chadd wrote: >> On Mon, 15 Mar 2021 at 14:58, Kevin Oberman wrote: >> >> > > >> > > It appears that the messages are associated with reading >> > > the disk(s), not directly with writing them, where the >> > > reads take more than "hz * 20" time units to complete. >> > > (I'm looking at main (14) code.) What might contribute >> > > to the time taken for the pending read(s)? >> > > >> > The reference to hz * 20 woke up a few sleeping memory cells. I forgot that >> > I cleaned up my loader.conf. It was largely a copy of the one on my >> > decade-old T520. I commented out "kern.hz=100". I don't recall the details, >> > but I think it was actually from an even older system, my T42 from before I >> > retired. >> > >> > In any case, restoring this setting has greatly improved the situation. I >> > now have really bad disk I/O performance on large disk to disk activity >> > (untarring the firefox distro) instead of terrible performance and the >> > system freezes have vanished, though I do see pauses in response to clicks >> > or text entry, but the display remains active and the pauses are short... 1 >> > to 15 seconds, I'd guess. No, I have no idea what this indicates. >> >> ... which drive controller is this? Is it just a laptop ATA disk? >> >> > I'm still not seeing the performance I was seeing back in February when 40 >> > MB/s for extended intervals was common and I once untarred firefox.tar.gz2 >> > in under a minute and performance seldom dropped below 1.4 MB/s. >> >> Did you find a resolution? I wonder if setting kern.hz is kicking >> some process(es) to get some time more frequently due to bugs >> elsewhere in the system (interrupts, IPI handling, wake-ups, etc) >> >> >> >> -adrian > No resolution. This is a Lenovo L15 ThinkPad with a 2TB ATAPI drive. I've not found documentation indicating the "which drive controller" answer. 
That may have to be answered from boot messages or boot -v
messages or other such on FreeBSD. (I've no access to such
a machine.) You might want to put a copy of such a log
someplace that folks could look at it. There may be commands
that some folks would like to see the output of. (I'm not
all that likely to be one that could put such to use but
other folks might be able to.)

Intel® Celeron®? 10th Generation Intel Core™ i3? i5? i7?

> The current drive is a Seagate. All testing has been done since I got it
> back from Lenovo in late January. I can read or write the drive at reasonable
> rates that exceed 50 MB/s. Extracting a tar distribution file is painful. I
> have had firefox extracts take over a half hour. Worse, if I do other
> operations while the extract is taking place, I often see a 30 second (and,
> occasionally 60 second) display freezes

I thought that you had reported that use of kern.hz=100
had led to "the system freezes have vanished" and
"pauses are short... 1 to 15 seconds". Did more testing
show that not to be always the case?

> as well as log reports that of "swap_pager: indefinite wait buffer:"

Unfortunately, I do not know how to investigate what is
leading to those messages being generated. Figuring that
out would seem to be important but I do not know what to
monitor to at least potentially eliminate some possibilities.

One possible thing to look at is something like
"gstat -spod" output spanning the time of the untar.
It would at least indicate if a large queue backlog
was accumulating on the device. And the ms/r and ms/w
columns would give a clue if commands are sitting in
the queues for long periods. (The "d" may be a waste:
no BIO_DELETEs possible? Also, the r/s vs. ms/r are not
rescaled reciprocals but distinct measurements. Similarly
for the w/s vs. ms/w.)

Given the "indefinite wait buffer" messages, I expect
the ms/r and/or ms/w figures to be large at least
some of the time. Knowing how large may be of use
to someone.
But I can not eliminate anything with such information. > This is a bit odd as I have 20G of RAM and am pretty close to no swap space > activity, but, of course, paging does occur. With 20 GiBytes of RAM, what is going on at the time that leads to paging activity? I'm thinking of just untarring the firefox file, not building firefox or such. Can you test such an untar in a context that is not otherwise paging (nor swapping)? If yes, is the behavior different in any readily noticeable way? > This system is CometLake and graphics are not supported on 12. I am not > absolutely sure that there is no
Re: Filesystem operations slower in 13.0 than 12.2
On 2021-Mar-15, at 14:57, Kevin Oberman wrote: > Responses in-line. > > On Sun, Mar 14, 2021 at 3:09 PM Mark Millard wrote: > >> On 2021-Mar-14, at 11:09, Kevin Oberman wrote: >> >> > . . . >> > >> > Seems to only occur on large r/w operations from/to the same disk. "sp >> > big-file /other/file/on/same/disk" or tar/untar operations on large files. >> > Hit this today updating firefox. >> > >> > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other >> > things while it was running slowly, the disk would appear to lock up. E.g. >> > pwd(1) seemed to completely lock up the system, but I could still ping it >> > and, after about 30 seconds, things came back to life. It was also not >> > instantaneous. Disc activity dropped to <1MB/s for a few seconds before >> > everything froze. >> > >> > During the untar of firefox, I saw; this several times. I also looked at my >> > console where I found these errors during : >> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192 >> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096 >> >> Does anyone know: >> Are those messages normal "reading is taking a rather long >> time" notices or is their presence more useful information >> in some way about the type of problem or context for the >> problem? >> > As for the tests: > Are these messages always present when near a time frame > when the problem occurs? Never present in a near time > frame to a period when the problem does not occur? > In a large number of test, these errors have not repeated. They baffle me for > another reason. This system has 20G or RAM. Tyically, all swap space is > unused. ATM I see 16384M free out of 16384. Not sure that I have ever seen it > used, though it might have been while building rust. I have not built rust > for a month. 
> > It appears that the messages are associated with reading > the disk(s), not directly with writing them, where the > reads take more than "hz * 20" time units to complete. > (I'm looking at main (14) code.) What might contribute > to the time taken for the pending read(s)? > The reference to hz * 20 woke up a few sleeping memory cells. I forgot that I > cleaned up my loader.conf. It was largely a copy of the one on my decade-old > T520. I commented out "kern.hz=100". I don't recall the details, but I think > it was actually from an even older system, my T42 from before I retired. > > In any case, restoring this setting has greatly improved the situation. I now > have really bad disk I/O performance on large disk to disk activity > (untarring the firefox distro) instead of terrible performance and the system > freezes have vanished, though I do see pauses in response to clicks or text > entry, but the display remains active and the pauses are short... 1 to 15 > seconds, I'd guess. No, I have no idea what this indicates. Interesting. > I'm still not seeing the performance I was seeing back in February when 40 > MB/s for extended intervals was common and I once untarred firefox.tar.gz2 in > under a minute and performance seldom dropped below 1.4 MB/s. > >>> /* >>> * swap_pager_getpages() - bring pages in from swap >>> * >>> * Attempt to page in the pages in array "ma" of length "count". The >>> * caller may optionally specify that additional pages preceding and >>> * succeeding the specified range be paged in. The number of such >>> pages >>> * is returned in the "rbehind" and "rahead" parameters, and they will >>> * be in the inactive queue upon return. >>> * >>> * The pages in "ma" must be busied and will remain busied upon return. >>> */ >>> static int >>> swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count, >>> int *rbehind, int *rahead) >>> { >>> . . . >>> /* >>> * Wait for the pages we want to complete. 
VPO_SWAPINPROG is always >>> * cleared on completion. If an I/O error occurs, SWAPBLK_NONE >>> * is set in the metadata for each page in the request. >>> */ >>> VM_OBJECT_WLOCK(object); >>> /* This could be implemented more efficiently with aflags */ >>> while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) { >>> ma[0]->oflags |= VPO_SWAPSLEEP; >>>
Re: Filesystem operations slower in 13.0 than 12.2
reasonable values for > everything. No indication of a HW problem. The system performs well unless I > do something that tries a bulk disk data move. Building world takes about 75 > minutes. I just have a very hard time building big ports. Almost like things were stuck-sleeping and then the sleep(s) finished? === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Filesystem operations slower in 13.0 than 12.2
Konstantin Belousov kostikbel at gmail.com wrote on
Fri Mar 5 23:12:13 UTC 2021 :

> On Sat, Mar 06, 2021 at 12:27:55AM +0200, Christos Chatzaras wrote:
. . .
> > Command: /usr/bin/time -l portsnap extract (these tests done with 2
> > different idle servers but with same 4TB HDDs models)
> >
> > FreeBSD 12.2p4
> >
> >  99.45 real  34.90 user  59.63 sys
> > 100.00 real  34.91 user  59.97 sys
> >  82.95 real  35.98 user  60.68 sys
> >
> > FreeBSD 13.0-RC1
> >
> > 217.43 real  75.67 user 110.97 sys
> > 125.50 real  63.00 user  96.47 sys
> > 118.93 real  62.91 user  96.28 sys
> . . .
> In the portsnap results for 13RC1, the variance is too high to conclude
> anything, I think.

I'll note that there are other reports of wide variance in
transfer rates observed during an overall operation such as
"make extract". The one I'm thinking of is:

https://lists.freebsd.org/pipermail/freebsd-stable/2021-March/093251.html

which is an update to earlier reports, but based on more
recent stable/13.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253968

comment 4 has some more notes about the context.

The "make extract" for firefox likely is not as complicated
as the portsnap extract example's execution structure.

It might be something to keep an eye on if examples keep
turning up over time.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
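To put rough numbers on the spread in the quoted "real" times, a small summary like the following can be run over each set of three runs. It is only a sketch: summarize_times is a made-up helper name, and three samples per release are far too few for real statistics; it just makes the range visible.

```shell
#!/bin/sh
# Sketch: mean and min/max of a list of "real" times (seconds).
# summarize_times is an illustrative name, not an existing tool.
summarize_times() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = $1 + 0; max = $1 + 0 }
        {
            sum += $1
            if (min > $1 + 0) min = $1 + 0
            if ($1 + 0 > max) max = $1 + 0
        }
        END { printf "mean=%.2f min=%.2f max=%.2f\n", sum / NR, min, max }'
}

summarize_times 99.45 100.00 82.95      # FreeBSD 12.2p4 runs quoted above
summarize_times 217.43 125.50 118.93    # FreeBSD 13.0-RC1 runs quoted above
```

For the quoted figures this prints mean=94.13 min=82.95 max=100.00 and mean=153.95 min=118.93 max=217.43: the 13.0-RC1 runs span nearly 100 seconds, mostly because of the first run, which fits the remark that the variance is too high to conclude anything.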
Re: Filesystem operations slower in 13.0 than 12.2
On 2021-Mar-4, at 14:16, Mark Millard wrote:

> Christos Chatzaras chris at cretaforce.gr wrote on
> Thu Mar 4 21:41:01 UTC 2021 :
>
>> After finding slow filesystem operations with 13.0-BETA2 I did more tests.
>>
>> All tests done with same hardware (Seagate ST4000NM0245 4TB HDD - 2 disks
>> with RAID-1 using gmirror).
>>
>> Filesystem mounted with noatime.
>>
>> Command used:
>>
>> /usr/bin/time -l portsnap extract
>>
>> but similar differences I see with "/usr/bin/time -l rm -fr /usr/ports"
>
> I doubt that "rm -fr" gets large differences of the
> type:
>
> (from 12.2p4:)
> 0 messages sent
> 0 messages received
> vs. (13.0-BETA4 and 14.0-CURRENT:)
> 4412 messages sent
> 2536379 messages received

The more I think about the above figures, the more
it seems like 12.2 probably just does not track
messages sent and received, especially given the
lack of huge "voluntary context switches" differences
vs. 13.0-BETA4 and 14.0-CURRENT. (I expect the
message sends/receives to context switch, but
I might be wrong.)

> In other words, large variations in Inter-Process-Communication
> counts, especially "received".
>
> It is not obvious that the "portsnap extract" issue
> is dominated by file system I/O vs IPC issues.
>
> portsnap is a script and does something that looks
> like the following, with the "while read" happening
> over 29000 times:
>
> . . . | while read FILE HASH; do
>     echo ${PORTSDIR}/${FILE}
>     if ! [ -s "${WORKDIR}/files/${HASH}.gz" ]; then
>         echo "files/${HASH}.gz not found -- snapshot corrupt."
>         return 1
>     fi
>     case ${FILE} in
>     */)
>         rm -rf ${PORTSDIR}/${FILE%/}
>         mkdir -p ${PORTSDIR}/${FILE}
>         tar -xz --numeric-owner -f ${WORKDIR}/files/${HASH}.gz \
>             -C ${PORTSDIR}/${FILE}
>         ;;
>     *)
>         rm -f ${PORTSDIR}/${FILE}
>         tar -xz --numeric-owner -f ${WORKDIR}/files/${HASH}.gz \
>             -C ${PORTSDIR} ${FILE}
>         ;;
>     esac
> done; then
>
> I expect that the "tar -xz . . . *.gz" sort of commands
> also involve internal IPC use.
> (It looked like the
> portsnap script has not changed noticeably since
> something like late 2016.)

I wonder if the large user and/or sys differences
between 12.2 and 13.0-BETA4 might be in process
creation given the over 29000 repetitions of
the loop and the number of processes created per
loop iteration.

The block input and output figures make no clear
difference that I can tell:

29 block input operations
2783 block output operations
vs.
716 block input operations
868 block output operations

There is also:

11821398 page reclaims
vs.
12288156 page reclaims

but none of that suggests the scale of differences
in:

 98.18 real  35.31 user  59.31 sys
vs.
163.81 real  71.93 user 107.32 sys

So it might be that "time -l" just does not report
on what makes up much of the difference. Given the
scale of the differences, I'd not expect the variations
in the likes of "involuntary context switches" or the
like to explain much of the observed differences.

(I avoid 14.0-CURRENT for this because of its
debug build status that was reported. I avoid
13.0-BETA2 because of known block input/output
operation count issues.)

> (13.0-BETA2 showed a large "voluntary context switches"
> difference as well, but I ignore that middle step in
> the version sequence here.)
>
> So I expect publishing the "rm -fr /usr/ports" figures
> from "time -l" would be appropriate. I do not know if
> the reports should be via separate topic or not but I
> doubt the figures with large differences will be the
> same for most-modern vs. older: I do not expect notable
> IPC from "rm -fr".
>
>> --
>>
>> FreeBSD 12.2p4
>>
>> 98.18 real  35.31 user  59.31 sys
>> 49064 maximum resident set size
>> 21 average shared memory size
>> 3 average unshared data size
>> 86 average unshared stack size
>> 11821398 page reclaims
>> 0 page faults
>> 0 swaps
>> 29 block input operations
>> 2783 block output operations
>> 0 messages sent
>> 0 me
Re: Filesystem operations slower in 13.0 than 12.2
>175 block output operations > 4412 messages sent >2536379 messages received > 0 signals received > 385527 voluntary context switches >369 involuntary context switches > > -- > > Differences between 13.0 and 14-CURRENT maybe related to debugging features. > > But 13.0-BETA4 is slower than 12.2. Does someone have more information about > this? Again, I expect that the "time -l" figures may point in different directions for "portsnap extract" vs. "rm -fr /usr/ports" in your context. The question may need to be split because the answers may be different. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: FreeBSD 13.0-BETA2 and slow IO
Kevin Oberman rkoberman at gmail.com wrote on Mon Mar 1 07:11:32 UTC 2021 : > On Sun, Feb 28, 2021 at 12:49 PM Christos Chatzaras > wrote: > > > Did someone test if this is fixed in BETA4? > > > > Just tried to "make extract" on firefox and I am still seeing transfer > rates around 1.7M when I would expect more like 50M. If I see the same > thing others are, it runs for a while at >40MB and abruptly drops to > 1.5-20M for some random time varying from a few seconds to minutes before > jumping back to >40MB. Is this what others are seeing? I'll note that someone submitted: https://lists.freebsd.org/pipermail/freebsd-bugs/2021-March/100124.html against 13.0-BETA4 for the UFS journaled soft-updates related performance issue(s). They compared something to 12.1-RELEASE for illustration. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: How do I know if my 13-stable has security patches?
aster-9312e0fd1vendor/openzfs Martin Matuska 4 days 36 -247/+716 * | Fix build after 2c7dc6bae9fd. Alexander Motin 4 days 1 -0/+4 * | Refactor CTL datamove KPI. Alexander Motin 4 days 12 -162/+94 * | jail: Add pr_state to struct prison Jamie Gritton 4 days 2 -51/+65 * | vfs: shrink struct vnode to 448 bytes on LP64 Mateusz Guzik 4 days 1 -1/+12 * | jail: fix build after the previous commit Mateusz Guzik 4 days 1 -1/+1 * | jail: Change the locking around pr_ref and pr_uref Jamie Gritton 4 days 6 -235/+232 * | sctp: improve computation of an alternate net Michael Tuexen 5 days 1 -36/+49 * | sctp: clear a pointer to a net which will be removed . . . (all the prior history) . . . and an empty vs. non-empty status is easier to tell apart. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: When did pkg(8) drop support for 12-stable?
On 2021-Feb-23, at 18:08, Chris wrote: > On 2021-02-23 17:42, Mark Millard wrote: >> (Warner is only CC'd here.) >> Warner Losh imp at bsdimp.com wrote on >> Wed Feb 24 01:04:13 UTC 2021 : >>> On Tue, Feb 23, 2021, 4:51 PM Chris wrote: >>> > Given this is a pkg(8) error, I brought it up on ports@ >>> > but it was suggested I (also?) bring it up here on stable@ >>> > >>> > OK awhile back I installed a copy of 12 stable from the >>> > usb stick image. I tweaked it to my wishes then got called >>> > away and haven't been able to get back to it until the other >>> > day. This is still a fresh install which has a populated /usr/src. >>> > So I >>> > svnlite co svn://svn.freebsd.org/ports/head /usr/ports >>> > followed by a >>> > cd /usr/ports/ports-mgmt/pkg/ && make install clean >>> > which returns >>> > make >>> > /!\ ERROR: /!\ >>> > >>> > Ports Collection support for your FreeBSD version has ended, and no ports >>> > are >>> > guaranteed to build on this syst >>> em. Please upgrade to a supported release. >>> > >>> > No support will be provided if you silence this message by defining >>> > ALLOW_UNSUPPORTED_SYSTEM. >>> > >>> > *** Error code 1 >>> > >>> > Stop. >>> > Err what? Ok while I think this was from stable 12.1, it's still still 12, >>> > and it's on stable. So what gives? >>> > >>> 12.1 has reached EOL now that 12.2 has been out a while. >>> From release/12.1.0/ : >> "Tag releng/12.1@r354233 as release/12.1.0 (12.1-RELEASE)" >> I think that implicit in Warner's response is that >> versions of stable/12/ that are not after r354233 are >> also EOL. One needs to have stable/12/ material from >> after -r354233 in order for it to be supported. >> He might even mean that stable/12/ material from before: >> "Tag releng/12.2@r366954 as release/12.2.0 (12.2-RELEASE)" >> would also be considered as not supported. >> To be safe you should be using stable/12/ material from >> on or after -r366954 in order to have a supported >> context. 
>> (I'm not sure if anything is explicit about the status >> of stable/12/ material between releng/12.1@r354233 >> and releng/12.2@r366954 .) > A HUGE thanks for all of this, Mark. This is EXACTLY what I needed. > > # uname -apKU > FreeBSD fbsd12dev 12.1-STABLE FreeBSD 12.1-STABLE r363918 GENERIC amd64 > amd64 1201522 1201522 > which pretty well confirms what you deduced. > I'm still a bit confused. It seems to me that it didn't _used_ > to be that way. But my brain isn't using ECC. So a couple of > bits may be flipped. The implication of all of stable/12/ being supported would be support of stable/12/ from on or after its creation: QUOTE Revision 339434 - Directory Listing Modified Fri Oct 19 00:09:24 2018 UTC (2 years, 4 months ago) by gjb Copied from: head revision 339432 Copy head@r339432 to stable/12 as part of the 12.0-RELEASE cycle. Additional post-branch commits will follow. END QUOTE Such does not seem likely to me. What would be the point of dropping 12.0-RELEASE support and 12.1-RELEASE support if such stable/12/ history was covered, some of that history being minor variations on the 12.0-RELEASE or 12.1-RELEASE ? Note: Despite some claims in other messages, svn -r363918 is not 12.1-RELEASE ( not -r354233 ) and -r363918 is shown as (only) in stable/12/ by svn. Your claim of 12-STABLE was correct, just not detailed enough. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: When did pkg(8) drop support for 12-stable?
(Warner is only CC'd here.) Warner Losh imp at bsdimp.com wrote on Wed Feb 24 01:04:13 UTC 2021 : > On Tue, Feb 23, 2021, 4:51 PM Chris wrote: > > > Given this is a pkg(8) error, I brought it up on ports@ > > but it was suggested I (also?) bring it up here on stable@ > > > > OK awhile back I installed a copy of 12 stable from the > > usb stick image. I tweaked it to my wishes then got called > > away and haven't been able to get back to it until the other > > day. This is still a fresh install which has a populated /usr/src. > > So I > > svnlite co svn://svn.freebsd.org/ports/head /usr/ports > > followed by a > > cd /usr/ports/ports-mgmt/pkg/ && make install clean > > which returns > > make > > /!\ ERROR: /!\ > > > > Ports Collection support for your FreeBSD version has ended, and no ports > > are > > guaranteed to build on this syst > em. Please upgrade to a supported release. > > > > No support will be provided if you silence this message by defining > > ALLOW_UNSUPPORTED_SYSTEM. > > > > *** Error code 1 > > > > Stop. > > Err what? Ok while I think this was from stable 12.1, it's still still 12, > > and it's on stable. So what gives? > > > > 12.1 has reached EOL now that 12.2 has been out a while. >From release/12.1.0/ : "Tag releng/12.1@r354233 as release/12.1.0 (12.1-RELEASE)" I think that implicit in Warner's response is that versions of stable/12/ that are not after r354233 are also EOL. One needs to have stable/12/ material from after -r354233 in order for it to be supported. He might even mean that stable/12/ material from before: "Tag releng/12.2@r366954 as release/12.2.0 (12.2-RELEASE)" would also be considered as not supported. To be safe you should be using stable/12/ material from on or after -r366954 in order to have a supported context. (I'm not sure if anything is explicit about the status of stable/12/ material between releng/12.1@r354233 and releng/12.2@r366954 .) 
Since you did not provide the output from the likes of "uname -apKU" (or some rough equivalent) I've no direct clue which version you were trying. But you should be able to compare to the above to see which range the material is from.

=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Re: git to svn update frequency ?
On 2021-Feb-18, at 05:33, Mark Millard wrote: > mike tancsa mike at sentex.net wrote on > Thu Feb 18 10:33:14 UTC 2021 : > >> On 2/17/2021 12:10 PM, Warner Losh wrote: >>> On Feb 17, 2021, at 6:05 AM, mike tancsa wrote: >>>>I noticed on a box that I update RELENG_12 via git there are more >>>> recent commits then if I use svnlite to track. Are they only >>>> periodically updated ? If so, how frequently do they get refreshed ? >>>> e.g. I see the new OpenSSL version in git, but not when I update via >>>> svnlite. >>> Yes. There is a lag for a number of reasons. The updates happen on a >>> batched basis (it’s a script I wrote) and then there’s a delay in >>> replication to the main subversion servers. I believe that the rate is on >>> the scale of hourly, but lwhsu will have to answer that detail. >>> >> Hi Warner & Li-Wen, >> >>I think something might be broken somewhere ? The last update is >> from ~ 36 hrs ago and there have been many commits to the git repo since >> for RELENG_12. >> >> # svnlite update >> Updating '.': >> At revision 369283. >> # > > You are referencing 12, not 13 . . . > > https://cgit.freebsd.org/src/log/?h=releng/12.0 > > shows the most recent releng/12.0 in git is from 2021-Jan-28: > > Commit message (Expand) Author Age Files Lines > Add UPDATING entries and bump version.releng/12.0 Gordon Tetlow > 2020-01-28 2 -1/+17 > > > Are you confusing stable/12 and releng/12.0 or > possibly releng/12.0 and releng/13.0 ? Dumb of me to show releng/12.0 instead of releng/12.2 , I guess. But I luck out: releng/12.2 was only one day more recent . . . 
https://cgit.freebsd.org/src/log/?h=releng/12.2 shows:

Commit message (Expand): Add UPDATING entry and bump version [releng/12.2]
Author: Ed Maste
Age: 2021-01-29
Files: 2
Lines: -1/+17

=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Re: git to svn update frequency ?
mike tancsa mike at sentex.net wrote on Thu Feb 18 10:33:14 UTC 2021 : > On 2/17/2021 12:10 PM, Warner Losh wrote: > > On Feb 17, 2021, at 6:05 AM, mike tancsa wrote: > >> I noticed on a box that I update RELENG_12 via git there are more > >> recent commits then if I use svnlite to track. Are they only > >> periodically updated ? If so, how frequently do they get refreshed ? > >> e.g. I see the new OpenSSL version in git, but not when I update via > >> svnlite. > > Yes. There is a lag for a number of reasons. The updates happen on a > > batched basis (it’s a script I wrote) and then there’s a delay in > > replication to the main subversion servers. I believe that the rate is on > > the scale of hourly, but lwhsu will have to answer that detail. > > > Hi Warner & Li-Wen, > > I think something might be broken somewhere ? The last update is > from ~ 36 hrs ago and there have been many commits to the git repo since > for RELENG_12. > > # svnlite update > Updating '.': > At revision 369283. > # You are referencing 12, not 13 . . . https://cgit.freebsd.org/src/log/?h=releng/12.0 shows the most recent releng/12.0 in git is from 2021-Jan-28: Commit message (Expand) Author Age Files Lines Add UPDATING entries and bump version.releng/12.0 Gordon Tetlow 2020-01-28 2 -1/+17 Are you confusing stable/12 and releng/12.0 or possibly releng/12.0 and releng/13.0 ? === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: where to upgrade 12-stable now, svn still, or git?
On 2021-Feb-12, at 23:03, Mark Millard wrote: > Dewayne Geraghty dewayne at heuristicsystems.com.au wrote on > Sat Feb 13 06:04:52 UTC 2021 : > >> The main list we used was: >> >> https://lists.freebsd.org/pipermail/svn-src-stable-12/ >> >> but that appears dead. >> . . . >> https://lists.freebsd.org/pipermail/svn-src-release/ >> >> suspect also dead. > > I should have mentioned this area in my reply to tech-lists. > This part of things is more git based now, probably > meaning more use of https://svnweb.freebsd.org/ to look > at commits/check-ins is needed in order to see the modern > cross-references between svn and git. (Such is not available > from the git side.) > > (Older history in svn does not have git references as > far as I know.) > >> I suspect that >> >> https://lists.freebsd.org/pipermail/dev-commits-src-branches/2021-January/thread.html >> >> is the stable-12 equivalent but are incremental patch releases also >> available here? > > > That covers stable/11 , stable/12 , and stable/13 . But no list > that I know of covers any releng/* or release/* commit activity. That last sentence is false as of today for releng/13.0 : https://lists.freebsd.org/pipermail/dev-commits-src-branches/2021-February/thread.html lists 7 releng/13.0 entries, the first being: git: 00abeecb4a25 - releng/13.0 - pf: Slightly relax pf_rule_addr validation Kristof Provost > For the git side of things, one has to look at the likes of > branches via cgit (or whatever) via the likes of: > > https://cgit.freebsd.org/src/log/?h=releng/12.2 > https://cgit.freebsd.org/src/log/?h=releng/13.0 > > Something like release/12.2.0 seems to be via a tag > on a commit. So https://cgit.freebsd.org/src/log/?h=releng/12.2 > lists it but https://cgit.freebsd.org/src/log/?h=stable/12 > does not. (There are commits to releng/12.2 after the > release/12.2.0 tag.) 
> > Of course, for 12 there still is: > > https://svnweb.freebsd.org/base/release/12.2.0/ > https://svnweb.freebsd.org/base/releng/12.2/ > > as a svn side view of things that has the modern > cross references to git included. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: where to upgrade 12-stable now, svn still, or git?
Dewayne Geraghty dewayne at heuristicsystems.com.au wrote on Sat Feb 13 06:04:52 UTC 2021 : > The main list we used was: > > https://lists.freebsd.org/pipermail/svn-src-stable-12/ > > but that appears dead. > . . . > https://lists.freebsd.org/pipermail/svn-src-release/ > > suspect also dead. I should have mentioned this area in my reply to tech-lists. This part of things is more git based now, probably meaning more use of https://svnweb.freebsd.org/ to look at commits/check-ins is needed in order to see the modern cross-references between svn and git. (Such is not available from the git side.) (Older history in svn does not have git references as far as I know.) > I suspect that > > https://lists.freebsd.org/pipermail/dev-commits-src-branches/2021-January/thread.html > > is the stable-12 equivalent but are incremental patch releases also > available here? That covers stable/11 , stable/12 , and stable/13 . But no list that I know of covers any releng/* or release/* commit activity. For the git side of things, one has to look at the likes of branches via cgit (or whatever) via the likes of: https://cgit.freebsd.org/src/log/?h=releng/12.2 https://cgit.freebsd.org/src/log/?h=releng/13.0 Something like release/12.2.0 seems to be via a tag on a commit. So https://cgit.freebsd.org/src/log/?h=releng/12.2 lists it but https://cgit.freebsd.org/src/log/?h=stable/12 does not. (There are commits to releng/12.2 after the release/12.2.0 tag.) Of course, for 12 there still is: https://svnweb.freebsd.org/base/release/12.2.0/ https://svnweb.freebsd.org/base/releng/12.2/ as a svn side view of things that has the modern cross references to git included. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: where to upgrade 12-stable now, svn still, or git?
tech-lists tech-lists at zyxst.net wrote on Sat Feb 13 04:11:46 UTC 2021 :

> Basically I'm asking "which is the source for truth now".

The official answer for 12 as I understand it: git, where the commits are initially made before they are converted into svn. (Thus svn is time delayed.)

But that is complicated because the official builds for releng, release, and even snapshots are still based on svn (not git) for 12, still use rDD numbering from svn, and will do so for the life of 12 (and 11). (There is no svn branch for 13 and later.) So, tracking relationships to official builds is easier from the svn side of things and it also provides pointers back to git.

To me that makes answers to "which is the source for truth now" problematic: it would be hard to avoid mixing the criteria, from what I can tell. (But I'm unlikely to deal with anything before 13 and so likely will be able to avoid the issue myself.)

=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Re: where to upgrade 12-stable now, svn still, or git?
> As subject, where to get sources for 12-stable upgrade now? Is it still > svn or is it git? Probably your choice. But one thing that could bias towards svn is that the svn information spans identifying both the svn and the git material but the git commit does not identify the svn material. For example, via: https://svnweb.freebsd.org/base/stable/12/lib/?sortby=rev=down=log is the following . . . QUOTE Revision 369260 - Directory Listing Modified Fri Feb 12 21:02:48 2021 UTC (4 hours, 49 minutes ago) by dim test_inf_inputs: Use atf_tc_expect_fail() instead of atf_tc_skip() Reviewed By:lwhsu Differential Revision: https://reviews.freebsd.org/D28396 (cherry picked from commit 4d2edf3af1dbd8a3e7cf1b22343a1ecfc2dd41ba) Fix lib/msun's ctrig_test/test_inf_inputs test case with clang >= 10 This sprinkles a few strategic volatiles in an attempt to defeat clang's optimization interfering with the expected floating-point exception flags. Reported by:lwhsu PR: 244732 (cherry picked from commit ac76bc1145dd7f4476e5d982ce8f355f71015713) Git Hash: f2a88e744701de1b37d7463828f2147f96e39d58 Git Author: arichard...@freebsd.org END QUOTE So both -r369260 and git hash-ids are indicated. 
By contrast, the cgit commit's display does not identify the svn side's -r369260 :

QUOTE
author Alex Richardson 2021-01-29 09:28:40 +
committer Dimitry Andric 2021-02-12 20:50:28 +
commit f2a88e744701de1b37d7463828f2147f96e39d58 (patch)
tree 0db8207a810f40d7f82c2033f8377ed38ce08ba2
parent 9525ccc84e337f4261425fc8fbf9f0de18500a1b (diff)
download src-f2a88e744701de1b37d7463828f2147f96e39d58.tar.gz
         src-f2a88e744701de1b37d7463828f2147f96e39d58.zip

test_inf_inputs: Use atf_tc_expect_fail() instead of atf_tc_skip() [stable/12]

Reviewed By: lwhsu
Differential Revision: https://reviews.freebsd.org/D28396
(cherry picked from commit 4d2edf3af1dbd8a3e7cf1b22343a1ecfc2dd41ba)

Fix lib/msun's ctrig_test/test_inf_inputs test case with clang >= 10

This sprinkles a few strategic volatiles in an attempt to defeat clang's optimization interfering with the expected floating-point exception flags.

Reported by: lwhsu
PR: 244732
(cherry picked from commit ac76bc1145dd7f4476e5d982ce8f355f71015713)
END QUOTE

Matching up stable revisions with releng/12.3/ or release/12.3.0/ in the future would be easier starting from svn material in the first place and would provide identification for git as well. But I've no clue if such would be important to what you might need to do with 12.

=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
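Since the svn log message carries a "Git Hash:" line and the git side carries nothing pointing back, going from an svn revision to its git commit can be scripted. A minimal sketch, assuming the log text has the same "Git Hash:" layout as the r369260 message quoted in this thread; the function name is hypothetical:

```sh
#!/bin/sh
# git_hash_from_svn_log   (hypothetical helper)
# Reads an svn log message on stdin and prints the 40-hex-digit value of
# any "Git Hash:" line it contains. Prints nothing if there is none.
git_hash_from_svn_log() {
    sed -n 's/^[[:space:]]*Git Hash:[[:space:]]*\([0-9a-f]\{40\}\).*$/\1/p'
}

# Sample text cut down from the r369260 message.
printf '%s\n' \
    'test_inf_inputs: Use atf_tc_expect_fail() instead of atf_tc_skip()' \
    'PR: 244732' \
    'Git Hash:   f2a88e744701de1b37d7463828f2147f96e39d58' \
    'Git Author: arichard...@freebsd.org' | git_hash_from_svn_log
```

On a real system the input would presumably come from something like "svnlite log -r 369260" against a stable/12 checkout; I have not tried that pipeline, so treat it as an assumption.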
Re: swap space issues
On 2020-Jun-29, at 14:12, Donald Wilde wrote: > On 6/29/20, Mark Millard wrote: >> [I'm now subscribed so my messages should go through to >> the list.] >> >> On 2020-Jun-29, at 06:17, Donald Wilde wrote: >> >>> . . . >> >> You report using: >> >> # For possibly insufficient swap/paging space >> # (might run out), increase the pageout delay >> # that leads to Out Of Memory killing of >> # processes: >> vm.pfault_oom_attempts= 10 >> vm.pfault_oom_wait= 1 >> # (The multiplication is the total but there >> # are other potential tradoffs in the factors >> # multiplied for the same total.) >> >> Note: kib might be interested in what happens >> for, say, 10 and 1, 5 and 2, and 1 and 10. >> He has asked for such before from someone >> having OOM problems but, to my knowledge, >> no one has taken him up on such testing. >> (He might be only after 10/1 and 1/10 or >> other specific figures. Best to ask him if >> you want to try such things for him.) > > Who is 'kib'? I'm still learning the current team of the Project. Konstantin Belousov Also known as kib (from kib at freebsd.org). Also known as kostik (from part of his gmail address?). >> I've always set up to use vm.pfault_oom_attempts=-1 >> (avoiding running out of swap space by how I >> configure things and what I choose to run). I >> avoid things like tempfs that compete for RAM, >> especially in low memory contexts. > > Until you explained what you have taught me, I thought these were > swap-related issues. > > TBH, I am getting disgusted with Synth, as good as it (by spec, not > actuality) is supposed to be. While I experimented with Synth a little a long time ago, I normally stick to tools and techniques that work across amd64, powerpc64, aarch64, 32-bit powerpc, and armv7 when I can. So, the experiment was strictly temporary on one environment at the time. > CCache I've used for years, and never had this kind of issue. 
>> >> For 64-bit environments I've never had to have >> enough swapspace that the boot reported an issue >> for kern.maxswapzone : more swap is allowed for >> the same amount of RAM as is allowed for a 32-bit >> environment. > > Now that you've opened the possibility, it would explain how it goes > from <3% swap use to OOM in moments... it's not a swap usage issue! > That's an important thing to learn. > > Not having heard from anyone else, I'm in the process of zeroing my > drive and starting over. >> >> In the 64-bit type of context with 1 GiByte+ >> of RAM I do -j4 build world buildkernel, 3072 MiBytes >> of swap. For 2 GiByte+ of RAM I use 4 poudriere builders >> (one per core), each allowed 4 processes >> (ALLOW_MAKE_JOBS=yes), so the load average can at times >> reach around 16 over significant periods. I also use >> USB SSDs instead of spinning rust. The port builds >> include a couple of llvm's and other toolchains. But >> there could be other stuff around that would not fit. >> >> (So synth for you vs. poudriere for me is a >> difference in our contexts. ALso, I stick to >> default kern.maxswapzone use without boot >> messages about exceeding the maximum >> recommended amount. Increasing kern.maxswapzone >> trades off KVM available for other purposes and >> I avoid the tradeoffs that I do not understand.) > [snip] >> (My context is head, not stable.) > > Thanks for documenting your usage. I'll store a pointer to this week's > -stable archives so I can come back to this when I get to smaller > builds. >> >> . . . >> >>> What got corrupted was one of the /usr/.ccache directories, but >>> 'ccache -C' doesn't clear it. >> >> I've not used ccache. So that is another variation >> in our contexts. >> >> I use UFS, not ZFS. I avoid tmpfs and such that complete >> for memory. > > I'm using UFS on MBR partitions. GPT for root file systems for me, other than any old PowerMacs (APM). 
(On the small arm's I just use microsd cards to get to booting the root file system on a GPT based USB SSD via a technique that works the same for all such arms that I sometimes have access to, other than the RPi4's at this stage.) >> . . . === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
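The 10/1, 5/2, and 1/10 pairs mentioned for kib all give the same product, which is the point of trying them: same total, different split between retries and per-retry wait. A small sketch of that arithmetic, with -1 treated as the special "wait indefinitely" case. The function name is hypothetical, and reading vm.pfault_oom_wait as seconds is my understanding of the sysctl, not something stated in this thread.

```sh
#!/bin/sh
# pfault_oom_total ATTEMPTS WAIT   (hypothetical helper)
# Prints ATTEMPTS * WAIT, the total the sysctl.conf comment calls
# "the multiplication". ATTEMPTS of -1 disables the attempts/wait logic,
# so no product is computed for it.
pfault_oom_total() {
    attempts=$1
    wait_secs=$2
    if [ "$attempts" -eq -1 ]; then
        echo "unlimited"
    else
        echo $(( attempts * wait_secs ))
    fi
}

# The three pairs suggested for testing, plus the -1 setting.
while read -r a w; do
    echo "attempts=$a wait=$w total=$(pfault_oom_total "$a" "$w")"
done <<EOF
10 1
5 2
1 10
-1 10
EOF
```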
Re: swap space issues
[I'm now subscribed so my messages should go through to the list.] On 2020-Jun-29, at 06:17, Donald Wilde wrote: > [adding maintainers of synth and ccache] > > On 6/29/20, Mark Millard wrote: >> Based on "small arm system" context experiments >> mostly . . . >> >> If your console messasges do not include >> messages about "swap_pager_getswapspace(...): failed", >> then it is unlikely that being out of swap space >> is the actual issue even when it reports: "was killed: >> out of swap space" messages. For such contexts, making >> the swap area bigger does not help. >> > > It did not show those getswapspace messages. Any other potentially of interest console messages? >> In other words, "was killed: out of swap space" >> is frequently a misnomer and not to be believed >> for "why" the kill happened or what should be >> done about it --without other evidence also being >> present anyway. >> >> Other causes include: >> >> Sustained low free RAM (via stays-runnable processes). >> A sufficiently delayed pageout. >> The swap blk uma zone was exhausted. >> The swap pctrie uma zone was exhausted. >> >> (stays-runnable processes are not swapped out >> [kernel stacks are not swapped out] but do actively >> compete for RAM via paging activity. In such a >> context, free RAM can stay low.) >> >> The below material does not deal with the >> the "exhausted" causes but does deal with >> the other 2. >> >> Presuming that you are getting "was killed: out >> of swap space" notices but are not getting >> "swap_pager_getswapspace failed" notices and >> that kern.maxswzone vs. system load has not >> been adjusted in a way that leads to bad >> memory tradeoffs . . . >> >> I recommend attempting use of, say, (from >> my /etc/sysctl.conf ): >> > Attached is what I tried, but when I ran synth again, I got a > corrupted HDD that fsck refuses to fix, whether in 1U mode or with fs > mounted. It just will not SALVAGE even when I add the -y flag. That is a horrible result. 
I assume that you rebooted after editing sysctl.conf or manually applied the values separately instead.

What sort of console messages were generated? Was the corruption the only issue? Did the system crash? In what way?

Your notes on what you set have an incorrect comment about a case that you did not use:

# For plunty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1 # infinite

vm.pfault_oom_attempts being -1 is a special value that disables the logic for the vm.pfault_oom_attempts and vm.pfault_oom_wait pair: Willing to wait indefinitely relative to how long the pageout takes, no retries. (Other OOM criteria may still be active.)

You report using:

# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
vm.pfault_oom_attempts= 10
vm.pfault_oom_wait= 1
# (The multiplication is the total but there
# are other potential tradoffs in the factors
# multiplied for the same total.)

Note: kib might be interested in what happens for, say, 10 and 1, 5 and 2, and 1 and 10. He has asked for such before from someone having OOM problems but, to my knowledge, no one has taken him up on such testing. (He might be only after 10/1 and 1/10 or other specific figures. Best to ask him if you want to try such things for him.)

I've always set up to use vm.pfault_oom_attempts=-1 (avoiding running out of swap space by how I configure things and what I choose to run). I avoid things like tmpfs that compete for RAM, especially in low memory contexts.

For 64-bit environments I've never had to have enough swap space that the boot reported an issue for kern.maxswzone : more swap is allowed for the same amount of RAM than is allowed for a 32-bit environment.

In the 64-bit type of context with 1 GiByte+ of RAM I do -j4 buildworld buildkernel, with 3072 MiBytes of swap. For 2 GiByte+ of RAM I use 4 poudriere builders (one per core), each allowed 4 processes (ALLOW_MAKE_JOBS=yes), so the load average can at times reach around 16 over significant periods. I also use USB SSDs instead of spinning rust. The port builds include a couple of llvm's and other toolchains. But there could be other stuff around that would not fit.

(So synth for you vs. poudriere for me is a difference in our contexts. Also, I stick to default kern.maxswzone use without boot messages about exceeding the maximum recommended amount. Increasing kern.maxswzone trades off KVM available for other purposes and I avoid the tradeoffs that I do not understand.)

For 32-bit e
Re: How to boot from GPT partition without "bootme" attribute?
Lev Serebryakov lev at FreeBSD.org wrote on Tue Oct 30 18:37:14 UTC 2018 : > I have disk with GPT scheme and three partitions: > > p1 - freebsd-boot > p2 - freebsd-ufs > p3 - freebsd-ufs > > pmbr is installed on this disk, and gptboot is installed on p1. Both p2 > and p3 contains valid FreeBSD installation, with /boot/loader, kernel, > and everything. > > I have attribute "bootme" set on p3, but not on p2. > > What should I do to boot from p2? > > I've tried to interrupt gptboot and override its choice: > > 0:ad(0p3)/boot/loader > > with > > 0:ad(0p2)/boot/loader > > After that loader, loaded from p2, loads kernel from p3 and boots > system from p3! Are the kernel's on p2 and p3 distinct in an identifiable way? Can you be sure it was not a mix of the p2 kernel and p3 world that booted? I ask because . . . One way to control what world is booted is to adjust the /etc/fstab where the /boot/kernel/kernel is loaded from, having that /etc/fstab to point to a different / area. I do this on small, single board computers to get the kernel from a microsd card but world from a USB storage media device. (I tend to use some form of labeling style reference to avoid device numbering dependencies.) The /etc/fstab where world is from has / agreeing and directs swap partition bindings and such that are appropriate to the specific world. (I've frequently had a world on the microsd card that the initial /etc/fstab can be edited to point to. This gives me a way to boot if there is a problem for the USB media.) I've done such things in gpt and non-gpt contexts. Any chance that that /etc/fstab initially used points to p3's world for / ? There are also things like /boot/loader.conf having something like: vfs.root.mountfrom='ufs:/dev/gpt/MyRoot' to control where things are booted from. > If I have MBR, I could override "active" slice in boot0 MBR loader > interactively. > > Is it analogous feature for GPT? 
=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
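To check the two possibilities raised above (an /etc/fstab whose / entry points at p3, or a vfs.root.mountfrom line in /boot/loader.conf), the copies on p2 can be inspected from the running system. A minimal sketch; both function names are hypothetical, and it only handles the simple one-assignment-per-line layout shown in this message:

```sh
#!/bin/sh
# root_mountfrom   (hypothetical helper)
# Reads loader.conf-style text on stdin and prints the last
# vfs.root.mountfrom value with its quotes removed.
root_mountfrom() {
    sed -n 's/^[[:space:]]*vfs\.root\.mountfrom[[:space:]]*=[[:space:]]*//p' |
        tr -d "\"'" | tail -n 1
}

# fstab_root   (hypothetical helper)
# Reads fstab-style text on stdin and prints the device mounted on / .
fstab_root() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1 }'
}

# Sample inputs; the MyRoot label is the one used in the message above.
printf "vfs.root.mountfrom='ufs:/dev/gpt/MyRoot'\n" | root_mountfrom
printf '# Device Mountpoint FStype Options Dump Pass\n/dev/gpt/MyRoot / ufs rw 1 1\n' | fstab_root
```

With p2 mounted somewhere such as /mnt (an assumed mount point), feeding /mnt/boot/loader.conf and /mnt/etc/fstab to these would show whether p2's own files direct the root file system to p3.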
What will be tier 1 for 12.0-Release?
I note that https://pkg.freebsd.org/ does not list FreeBSD:12:aarch64 under the Tier-2 support package sets but instead on the list with i386 and amd64. But the same is true for FreeBSD:11:aarch64 .

FreeBSD:12:armv7 is listed in the Tier-2 support package sets list. The same is true for FreeBSD:12:armv6 .

https://www.freebsd.org/platforms/ and https://www.freebsd.org/doc/en_US.ISO8859-1/articles/committers-guide/archs.html are, of course, not updated so far. (12.0 is not released yet and maybe nothing is changing in the status.)

It may be that the FreeBSD Core Team has not yet covered this for 12.0, or that it is waiting to see how the release goes before declaring any status change. (So I may be asking this too early.) Just curious.

Good to see that there are pkg builds for powerpc64 these days: FreeBSD:12:powerpc64 and FreeBSD:11:powerpc64 are listed in the Tier-2 support package sets list as well.

Technically the reported lists are from: pkg0.isc.freebsd.org

=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
[I' unable to reproduce the under-Hyper-V early kernel crash for WITH_ZFS= (implicit) build that includes the for-loaders patch I was given to try.] On 2018-Oct-22, at 10:01 AM, Mark Millard wrote: > [I will note the the loader problem has been shown to > not be involved in the kernel problem that this > "Subject:" was originally for.] > > On 2018-Oct-22, at 9:26 AM, Warner Losh wrote: > >> On Mon, Oct 22, 2018 at 6:39 AM Mark Millard wrote: >>> On 2018-Oct-22, at 4:07 AM, Toomas Soome wrote: >>> >>>> On 22 Oct 2018, at 13:58, Mark Millard wrote: >>>>> >>>>> On 2018-Oct-22, at 2:27 AM, Toomas Soome wrote: >>>>>> >>>>>>> On 22 Oct 2018, at 06:30, Warner Losh wrote: >>>>>>> >>>>>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable < >>>>>>>> freebsd-stable@freebsd.org> wrote: >>>>>>>> >>>>>>>>> [I built based on WITHOUT_ZFS= for other reasons. But, >>>>>>>>> after installing the build, Hyper-V based boots are >>>>>>>>> working.] >>>>>>>>> >>>>>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard wrote: >>>>>>>>> >>>>>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard >>>>>>>>>> wrote: >>>>>>>>>> . . . >>>>>>> >>>>>> >>>>>> It would help to get output from loader lsdev -v command. >>>>> >>>>> That turned out to be very interesting: The non-ZFS loader >>>>> crashes during the listing, during disk8, which shows a >>>>> x0 instead of a x512. >>>>> >>>> >>>> Yes, thats the root cause there. The non-zfs loader does only *read* the >>>> boot disk, thats why the issue was not revealed there. >>>> >>>> It would help to identify the sector size for that disk, at least from OS, >>>> so we can compare with what we can get from INT13. >>>> >>>> I have pretty good idea what to look there, but I am afraid we need to run >>>> few tests with you to understand why that disk is reporting sector size 0 >>>> there. >>>> >>>> >>> >>> Looks like I guessed wrong about the device >>> for "drive8". 
>>> >>> So I unplugged the only other external >>> storage device, so the original drives >>> 0-13 become 0-11 overall. >>> >>> The machine has a multi-LUN media card reader with >>> no cards plugged in. It is built-in rather than >>> one that I plugged into a port. It has 4 LUN's. >>> >>> So 8+4=12 and drives 0-7 show up with media before >>> it tries any of the 4 LUN's with no card in place. >>> >>> I conclude that "drive8" is an empty LUN in a media >>> card reader. >>> >>> I conclude that there is no sector size available for >>> any of the empty LUNs in the media reader. >>> >> I think you are probably right and we're hitting some divide by 0 error when >> we should just ignore the disk. > > In the Hyper-V context, the loader and kernel do not > see the 4-LUN media reader at all: only drives with > normal freebsd-* style partitions and free space. > This explains why I did not see a loader problem > in that context. > > So I conclude that the kernel crash under Hyper-V > associated with -r338807 is a separate issue even > though WITHOUT_ZFS= seems to have avoided the > crash. > > My plan is to continue with the -r338807 investigation > after the loader problem is fixed in my builds. Then > I've go back to trying builds using WITH_ZFS= (implicit), > both native boots and Hyper-V based ones. So much for my ability to make that inference correctly: The WITH_ZFS= (implicit) build worked fine for booting natively and via Hyper-V when the patch to fix the loaders was included in what to build. I'm now unable to reproduce this kernel-time crash. The patch was from: https://reviews.freebsd.org/D11174 The empty LUN's in the media reader now get messages that look something like: disk8: Read 1 sector(s) from 0 to 0xe000 (0x8000): 0x31 early in the loader activity. 
=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
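Warner's guess quoted above was a divide by 0 on a disk that should simply be ignored. The sketch below only illustrates that kind of guard in shell; it is not the loader's code (the loader is C, and I have not reproduced what D11174 actually changes). The function name and the sample records are hypothetical, apart from disk8 reporting a sector size of 0.

```sh
#!/bin/sh
# sector_count BYTES SECTOR_SIZE   (hypothetical helper)
# Prints BYTES / SECTOR_SIZE, or fails without dividing when the reported
# sector size is 0, as for the empty card-reader LUNs in this thread.
sector_count() {
    bytes=$1
    ssize=$2
    if [ "$ssize" -le 0 ]; then
        return 1            # no media: the caller should skip this disk
    fi
    echo $(( bytes / ssize ))
}

# Made-up "name bytes sector-size" records.
while read -r name bytes ssize; do
    if n=$(sector_count "$bytes" "$ssize"); then
        echo "$name: $n sectors"
    else
        echo "$name: no usable sector size, skipped"
    fi
done <<EOF
disk0 1048576 512
disk8 0 0
EOF
```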
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
[I will note that the loader problem has been shown to not be involved in the kernel problem that this "Subject:" was originally for.]

On 2018-Oct-22, at 9:26 AM, Warner Losh wrote:

> On Mon, Oct 22, 2018 at 6:39 AM Mark Millard wrote:
>> On 2018-Oct-22, at 4:07 AM, Toomas Soome wrote:
>>
>> > On 22 Oct 2018, at 13:58, Mark Millard wrote:
>> >>
>> >> On 2018-Oct-22, at 2:27 AM, Toomas Soome wrote:
>> >>>
>> >>>> On 22 Oct 2018, at 06:30, Warner Losh wrote:
>> >>>>
>> >>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh wrote:
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>> >>>>> freebsd-stable@freebsd.org> wrote:
>> >>>>>
>> >>>>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>> >>>>>> after installing the build, Hyper-V based boots are
>> >>>>>> working.]
>> >>>>>>
>> >>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard wrote:
>> >>>>>>
>> >>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard
>> >>>>>>> wrote:
>> >>>>>>> . . .
>> >>>>
>> >>>
>> >>> It would help to get output from loader lsdev -v command.
>> >>
>> >> That turned out to be very interesting: The non-ZFS loader
>> >> crashes during the listing, during disk8, which shows a
>> >> x0 instead of a x512.
>> >>
>> >
>> > Yes, that's the root cause there. The non-zfs loader does only *read* the
>> > boot disk, that's why the issue was not revealed there.
>> >
>> > It would help to identify the sector size for that disk, at least from OS,
>> > so we can compare with what we can get from INT13.
>> >
>> > I have pretty good idea what to look there, but I am afraid we need to run
>> > few tests with you to understand why that disk is reporting sector size 0
>> > there.
>> >
>> >
>>
>> Looks like I guessed wrong about the device
>> for "drive8".
>>
>> So I unplugged the only other external
>> storage device, so the original drives
>> 0-13 become 0-11 overall.
>>
>> The machine has a multi-LUN media card reader with
>> no cards plugged in. It is built-in rather than
>> one that I plugged into a port. It has 4 LUNs.
>>
>> So 8+4=12 and drives 0-7 show up with media before
>> it tries any of the 4 LUNs with no card in place.
>>
>> I conclude that "drive8" is an empty LUN in a media
>> card reader.
>>
>> I conclude that there is no sector size available for
>> any of the empty LUNs in the media reader.
>>
> I think you are probably right and we're hitting some divide by 0 error when
> we should just ignore the disk.

In the Hyper-V context, the loader and kernel do not see the 4-LUN media reader at all: only drives with normal freebsd-* style partitions and free space. This explains why I did not see a loader problem in that context.

So I conclude that the kernel crash under Hyper-V associated with -r338807 is a separate issue even though WITHOUT_ZFS= seems to have avoided the crash.

My plan is to continue with the -r338807 investigation after the loader problem is fixed in my builds. Then I'll go back to trying builds using WITH_ZFS= (implicit), both native boots and Hyper-V based ones.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
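To make the "just ignore the disk" idea concrete, here is a minimal sketch of the logic only. It is not the FreeBSD loader's code (which is C); the function name `describe_disk` and the MiB summary are hypothetical, and the two sample geometries are the disk0 and disk8 values from the lsdev -v listing quoted elsewhere in this thread.

```python
# Hypothetical illustration of skipping a unit that reports sector size 0.
# This is NOT the loader's implementation; names here are made up.

def describe_disk(name, sectors, sector_size):
    """Return a one-line size summary, or None when no usable sector size is reported."""
    if sector_size == 0:
        # An empty card-reader LUN may report a sector size of 0.
        # Bail out before any arithmetic that divides by the sector size.
        return None
    # This is the kind of division that would fault with sector_size == 0.
    sectors_per_mib = (1024 * 1024) // sector_size
    return f"{name}: {sectors} x {sector_size} ({sectors // sectors_per_mib} MiB)"

# Geometry values as transcribed in the lsdev -v output in this thread.
disks = [("disk0", 937703088, 512), ("disk8", 262144, 0)]
for name, sectors, sector_size in disks:
    summary = describe_disk(name, sectors, sector_size)
    print(summary if summary is not None else f"{name}: sector size 0, skipped")
```

The point of the sketch is only the ordering: the zero check comes before any division, so an empty LUN is reported and skipped rather than halting the listing.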
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
On 2018-Oct-22, at 4:07 AM, Toomas Soome wrote: > On 22 Oct 2018, at 13:58, Mark Millard wrote: >> >> On 2018-Oct-22, at 2:27 AM, Toomas Soome wrote: >>> >>>> On 22 Oct 2018, at 06:30, Warner Losh wrote: >>>> >>>> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh wrote: >>>> >>>>> >>>>> >>>>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable < >>>>> freebsd-stable@freebsd.org> wrote: >>>>> >>>>>> [I built based on WITHOUT_ZFS= for other reasons. But, >>>>>> after installing the build, Hyper-V based boots are >>>>>> working.] >>>>>> >>>>>> On 2018-Oct-20, at 2:09 AM, Mark Millard wrote: >>>>>> >>>>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard wrote: >>>>>>> . . . >>>> >>> >>> It would help to get output from loader lsdev -v command. >> >> That turned out to be very interesting: The non-ZFS loader >> crashes during the listing, during disk8, which shows a >> x0 instead of a x512. >> > > Yes, thats the root cause there. The non-zfs loader does only *read* the boot > disk, thats why the issue was not revealed there. > > It would help to identify the sector size for that disk, at least from OS, so > we can compare with what we can get from INT13. > > I have pretty good idea what to look there, but I am afraid we need to run > few tests with you to understand why that disk is reporting sector size 0 > there. > > Looks like I guessed wrong about the device for "drive8". So I unplugged the only other external storage device, so the original drives 0-13 become 0-11 overall. The machine has a multi-LUN media card reader with no cards plugged in. It is built-in rather than one that I plugged into a port. It has 4 LUN's. So 8+4=12 and drives 0-7 show up with media before it tries any of the 4 LUN's with no card in place. I conclude that "drive8" is an empty LUN in a media card reader. I conclude that there is no sector size available for any of the empty LUNs in the media reader. 
>
>
>> Hand transcribed from pictures:
>>
>> OK lsdev -v
>> disk devices
>>     disk0: BIOS drive C (937703088 x 512):
>>         disk0p1: FreeBSD boot 512K
>>         disk0p2: FreeBSD UFS 356G
>>         disk0p3: FreeBSD swap 15G
>>         disk0p4: FreeBSD swap 76G
>>     disk1: BIOS drive D (16514064 x 512):
>>         disk1s1: Linux 2048KB
>>         disk1s2: Unknown 952GB
>>     disk2: BIOS drive E (16514064 x 512):
>>         disk2p1: Unknown 128MB
>>     disk3: BIOS drive F (16514064 x 512):
>>         disk3p1: Unknown 128MB
>>     disk4: BIOS drive G (16434495 x 512):
>>         disk2p1: Unknown 128MB
>>         disk4p2: DOS/Windows 1716GB
>>     disk5: BIOS drive H (16434495 x 512):
>>         disk5p1: FreeBSD boot 512K
>>         disk5p2: FreeBSD UFS 176G
>>         disk5p3: FreeBSD swap 193G
>>         disk5p4: FreeBSD swap 15G
>>     disk6: BIOS drive I (16434495 x 512):
>>         disk6p1: Unknown 499MB
>>         disk6p2: EFI 99MB
>>         disk6p3: Unknown 16MB
>>         disk6p4: DOS/Windows 886G
>>     disk7: BIOS drive H (16434495 x 512):
>>         disk7p1: FreeBSD boot 512K
>>         disk7p2: FreeBSD UFS 953G
>>     disk8: BIOS drive K (262144 x 0):
>>
>> int= err= efl=00010246 eip=000286bd
>> eax= ebx=72b50430 ecx= edx=
>> esi= edi=00092080 ebp=00091eec esp=00091ea8
>> cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
>> cs:eip=f7 f1 89 c1 85 d2 0f 85-d8 01 00 00 6a 05 58 85
>>        f6 0f 88 75 01 00 00 89-cb c1 fb 1f 89 ca 03 55
>> ss:esp=09 00 00 00 00 00 00 00-0a 00 00 00 02 00 00 00
>>        00 00 00 00 00 00 00 00-78 1f 09 00 33 45 04 00
>> BTX halted
>>
>> I expect that "disk8" is what gpart show -p
>> from a native boot showed as:
>>
>> =>       1  60062499    da1  MBR  (29G)
>>          1        31         - free -  (16K)
>>         32  60062468  da1s1  fat32lba  (29G)
>>
>> (That gpart show -p output is in another of the
>> list messages.)
>>
>>> Also if you could test boot loader with UEFI - for example get to loader
>>> prompt via usb/cd boot and then get the same lsdev -v output.
>>
>> Still true given the above crash? Or, going the
>> other way, should "drive8" be left as it is in
>> order to be sure to do this test with the drive
>> present?
>> >> If I do this test later, it will take a bit to >> get media to do it with. (It is about 4AM i
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
On 2018-Oct-22, at 2:27 AM, Toomas Soome wrote: > >> On 22 Oct 2018, at 06:30, Warner Losh wrote: >> >> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh wrote: >> >>> >>> >>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable < >>> freebsd-stable@freebsd.org> wrote: >>> >>>> [I built based on WITHOUT_ZFS= for other reasons. But, >>>> after installing the build, Hyper-V based boots are >>>> working.] >>>> >>>> On 2018-Oct-20, at 2:09 AM, Mark Millard wrote: >>>> >>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard wrote: >>>>> >>>>>> I attempted to jump from head -r334014 to -r339076 >>>>>> on a threadripper 1950X board and the boot fails. >>>>>> This is both native booting and under Hyper-V, >>>>>> same machine and root file system in both cases. >>>>> >>>>> I did my investigation under Hyper-V after seeing >>>>> a boot failure native. >>>>> >>>>> Looks like the native failure is even earlier, >>>>> before db> is even possible, possibly during >>>>> early loader activity. >>>>> >>>>> So this report is really for running under >>>>> Hyper-V: -r338804 boots and -r338810 does >>>>> not. By contrast -r334804 does not boot native. >>>>> (But I've little information for that context.) >>>>> >>>>> Sorry for the confusion. I rushed the report >>>>> in hopes of getting to sleep. It was not to be. >>>>> >>>>>> It fails just after the FreeBSD/SMP lines, >>>>>> reporting "kernel trap 9 with interrupts disabled". >>>>>> >>>>>> It fails in pmap_force_invaldiate_cache_range at >>>>>> a clflusl (%rax) instruction that produces a >>>>>> "Fatal trap 9: general protection fault while >>>>>> in kernel mode". 
cpudid=0 apic id= 00 >>>>>> >>>>>> I used kernel.txz files from: >>>>>> >>>>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/ >>>>>> >>>>>> to narrow the range of kernel builds for working -> failing >>>>>> and got: >>>>>> >>>>>> -r338804 boots fine >>>>>> (no amd64 kernel builds between to try) >>>>>> -r338810+ fails (any that I tried, anyway) >>>>>> >>>>>> In that range is -r338807 : >>>>>> >>>>>> QUOTE >>>>>> Author: kib >>>>>> Date: Wed Sep 19 19:35:02 2018 >>>>>> New Revision: 338807 >>>>>> URL: >>>>>> https://svnweb.freebsd.org/changeset/base/338807 >>>>>> >>>>>> >>>>>> Log: >>>>>> Convert x86 cache invalidation functions to ifuncs. >>>>>> >>>>>> This simplifies the runtime logic and reduces the number of >>>>>> runtime-constant branches. >>>>>> >>>>>> Reviewed by: alc, markj >>>>>> Sponsored by:The FreeBSD Foundation >>>>>> Approved by: re (gjb) >>>>>> Differential revision: >>>>>> https://reviews.freebsd.org/D16736 >>>>>> >>>>>> Modified: >>>>>> head/sys/amd64/amd64/pmap.c >>>>>> head/sys/amd64/include/pmap.h >>>>>> head/sys/dev/drm2/drm_os_freebsd.c >>>>>> head/sys/dev/drm2/i915/intel_ringbuffer.c >>>>>> head/sys/i386/i386/pmap.c >>>>>> head/sys/i386/i386/vm_machdep.c >>>>>> head/sys/i386/include/pmap.h >>>>>> head/sys/x86/iommu/intel_utils.c >>>>>> END QUOTE >>>>>> >>>>>> There do seem to be changes associated with >>>>>> clflush(...) use. Looking at: >>>>>> >>>>>> >>>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432 >>>>>> >>>>>> it appears that pmap_force_invalidate_cache_range has not >>>>>> changed since -r338807. >>>>>> >>>>>> It seems that -r338806 and -r3388810 would be unlikely >>>>>> contributors.
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
On 2018-Oct-21, at 8:30 PM, Warner Losh wrote: > On Sun, Oct 21, 2018 at 9:28 PM Warner Losh wrote: > > On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable > wrote: >> [I built based on WITHOUT_ZFS= for other reasons. But, >> after installing the build, Hyper-V based boots are >> working.] >> >> On 2018-Oct-20, at 2:09 AM, Mark Millard wrote: >> >> > On 2018-Oct-20, at 1:39 AM, Mark Millard wrote: >> > >> >> I attempted to jump from head -r334014 to -r339076 >> >> on a threadripper 1950X board and the boot fails. >> >> This is both native booting and under Hyper-V, >> >> same machine and root file system in both cases. >> > >> > I did my investigation under Hyper-V after seeing >> > a boot failure native. >> > >> > Looks like the native failure is even earlier, >> > before db> is even possible, possibly during >> > early loader activity. >> > >> > So this report is really for running under >> > Hyper-V: -r338804 boots and -r338810 does >> > not. By contrast -r334804 does not boot native. >> > (But I've little information for that context.) >> > >> > Sorry for the confusion. I rushed the report >> > in hopes of getting to sleep. It was not to be. >> > >> >> It fails just after the FreeBSD/SMP lines, >> >> reporting "kernel trap 9 with interrupts disabled". >> >> >> >> It fails in pmap_force_invaldiate_cache_range at >> >> a clflusl (%rax) instruction that produces a >> >> "Fatal trap 9: general protection fault while >> >> in kernel mode". 
cpudid=0 apic id= 00 >> >> >> >> I used kernel.txz files from: >> >> >> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/ >> >> >> >> to narrow the range of kernel builds for working -> failing >> >> and got: >> >> >> >> -r338804 boots fine >> >> (no amd64 kernel builds between to try) >> >> -r338810+ fails (any that I tried, anyway) >> >> >> >> In that range is -r338807 : >> >> >> >> QUOTE >> >> Author: kib >> >> Date: Wed Sep 19 19:35:02 2018 >> >> New Revision: 338807 >> >> URL: >> >> https://svnweb.freebsd.org/changeset/base/338807 >> >> >> >> >> >> Log: >> >> Convert x86 cache invalidation functions to ifuncs. >> >> >> >> This simplifies the runtime logic and reduces the number of >> >> runtime-constant branches. >> >> >> >> Reviewed by: alc, markj >> >> Sponsored by:The FreeBSD Foundation >> >> Approved by: re (gjb) >> >> Differential revision: >> >> https://reviews.freebsd.org/D16736 >> >> >> >> Modified: >> >> head/sys/amd64/amd64/pmap.c >> >> head/sys/amd64/include/pmap.h >> >> head/sys/dev/drm2/drm_os_freebsd.c >> >> head/sys/dev/drm2/i915/intel_ringbuffer.c >> >> head/sys/i386/i386/pmap.c >> >> head/sys/i386/i386/vm_machdep.c >> >> head/sys/i386/include/pmap.h >> >> head/sys/x86/iommu/intel_utils.c >> >> END QUOTE >> >> >> >> There do seem to be changes associated with >> >> clflush(...) use. Looking at: >> >> >> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432 >> >> >> >> it appears that pmap_force_invalidate_cache_range has not >> >> changed since -r338807. >> >> >> >> It seems that -r338806 and -r3388810 would be unlikely >> >> contributors. >> > >> >> I went after my native-boot loader problem first because I >> could switch kernels via the loader for booting FreeBSD under >> Hyper-V. Switching loaders is more of a problem. >> >> In order to avoid the loader-time crash I switched to building >> installing based on WITHOUT_ZFS= . I've had no active use of >> ZFS in years. 
(The old official-build loaders that worked were >> non-ZFS ones.) >> >> This took care of the native-boot loader-crash --and, to my >> surprise, also the Hyper-V-boot kernel-time crash. >> >> My private builds now boot the 1950X in both contexts just >> fine. >> >> During my early investigation I did pick up specific changes >> from after -r339076 tha
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
[I built based on WITHOUT_ZFS= for other reasons. But, after installing the build, Hyper-V based boots are working.] On 2018-Oct-20, at 2:09 AM, Mark Millard wrote: > On 2018-Oct-20, at 1:39 AM, Mark Millard wrote: > >> I attempted to jump from head -r334014 to -r339076 >> on a threadripper 1950X board and the boot fails. >> This is both native booting and under Hyper-V, >> same machine and root file system in both cases. > > I did my investigation under Hyper-V after seeing > a boot failure native. > > Looks like the native failure is even earlier, > before db> is even possible, possibly during > early loader activity. > > So this report is really for running under > Hyper-V: -r338804 boots and -r338810 does > not. By contrast -r334804 does not boot native. > (But I've little information for that context.) > > Sorry for the confusion. I rushed the report > in hopes of getting to sleep. It was not to be. > >> It fails just after the FreeBSD/SMP lines, >> reporting "kernel trap 9 with interrupts disabled". >> >> It fails in pmap_force_invaldiate_cache_range at >> a clflusl (%rax) instruction that produces a >> "Fatal trap 9: general protection fault while >> in kernel mode". cpudid=0 apic id= 00 >> >> I used kernel.txz files from: >> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/ >> >> to narrow the range of kernel builds for working -> failing >> and got: >> >> -r338804 boots fine >> (no amd64 kernel builds between to try) >> -r338810+ fails (any that I tried, anyway) >> >> In that range is -r338807 : >> >> QUOTE >> Author: kib >> Date: Wed Sep 19 19:35:02 2018 >> New Revision: 338807 >> URL: >> https://svnweb.freebsd.org/changeset/base/338807 >> >> >> Log: >> Convert x86 cache invalidation functions to ifuncs. >> >> This simplifies the runtime logic and reduces the number of >> runtime-constant branches. 
>>
>> Reviewed by: alc, markj
>> Sponsored by: The FreeBSD Foundation
>> Approved by: re (gjb)
>> Differential revision:
>> https://reviews.freebsd.org/D16736
>>
>> Modified:
>> head/sys/amd64/amd64/pmap.c
>> head/sys/amd64/include/pmap.h
>> head/sys/dev/drm2/drm_os_freebsd.c
>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>> head/sys/i386/i386/pmap.c
>> head/sys/i386/i386/vm_machdep.c
>> head/sys/i386/include/pmap.h
>> head/sys/x86/iommu/intel_utils.c
>> END QUOTE
>>
>> There do seem to be changes associated with
>> clflush(...) use. Looking at:
>>
>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>>
>> it appears that pmap_force_invalidate_cache_range has not
>> changed since -r338807.
>>
>> It seems that -r338806 and -r338810 would be unlikely
>> contributors.
>

I went after my native-boot loader problem first because I could switch kernels via the loader for booting FreeBSD under Hyper-V. Switching loaders is more of a problem.

In order to avoid the loader-time crash I switched to building and installing based on WITHOUT_ZFS= . I've had no active use of ZFS in years. (The old official-build loaders that worked were non-ZFS ones.)

This took care of the native-boot loader-crash --and, to my surprise, also the Hyper-V-boot kernel-time crash.

My private builds now boot the 1950X in both contexts just fine.

During my early investigation I did pick up specific changes from after -r339076 that seemed to be tied to Ryzen and such. (They made no difference to the boot problems at the time but I saw no reason to remove them.)

# uname -apKU
FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun Oct 21 16:44:25 PDT 2018 markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1200084 1200084

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works [ WITHOUT_ZFS= fixes it ]
[Building and installing based on WITHOUT_ZFS= allows the resulting loader to work correctly on the 1950X.] On 2018-Oct-21, at 12:05 AM, Mark Millard wrote: > On 2018-Oct-20, at 10:32 PM, Warner Losh wrote: > >> On Sat, Oct 20, 2018 at 11:04 PM Mark Millard wrote: >> [I found what change lead to the 1950X boot crashing >> with BTX halted.] >> >>> On 2018-Oct-20, at 12:44 PM, Mark Millard wrote: >>> >>>> [Adding some vintage information for a loader >>>> that allowed a native boot.] >>>> >>>> On 2018-Oct-20, at 4:00 AM, Mark Millard wrote: >>>> >>>>> I attempted to jump from head -r334014 to -r339076 >>>>> on a threadripper 1950X board and the native >>>>> FreeBSD boot failed very early. (Hyper-V use of >>>>> the same media did not have this issue.) >>>>> >>>>> But copying over an older /boot/loader from another >>>>> storage device with a FreeBSD head version that has >>>>> not been updated yet got past the problem being >>>>> reported here. (For other reasons, the kernel has >>>>> been moved back to -r338804 --and with that, >>>>> and the older /boot/loader, the 1950X native-boots >>>>> FreeBSD all the way just fine.) >>>> >>>> I found one /boot/loader.old that was dated >>>> in the update'd file system as 2018-May 20, >>>> instead of 2018-Apr-03 from the older file >>>> system. May 20 would apparently mean a little >>>> below -r334014 . It native-booted okay, as did >>>> the April one. >>>> >>>> [I do not know how to inspect a /boot/loader* >>>> to find out what -r?? it is from.] >>>> >>>> Unfortunately, I had done more than one -r339076 >>>> install from -r334014 before rebooting and >>>> no -r334014 loaders were still present: >>>> the other *.old files from a few minutes before >>>> the ones I had the boot problem with. >>>> >>>> I might be able to extract loaders from various: >>>> >>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz >>>> >>>> materials and try substituting them in order to >>>> narrow the range for works -> fails. 
If I can, >>>> this likely would take a fair amount of time in >>>> my context. >>>> >>>> Other notes: >>>> >>>> It turns out that only Hyper-V based use needed >>>> a -r334804 kernel: Native booting with the older >>>> loaders and newer kernels works fine. >>>> >>>> Windows 10 Pro 64bit also has no problems >>>> booting and operating the machine. >>>> >>>> The native-boot problem does seem to be freeBSD >>>> loader-vintage specific. >>>> >>>>> For the BTX failure the display ends up with >>>>> (hand transcribed, ". . ." for an omission): >>>>> >>>>> BTX loader 1.00 BTX version is 1.02 >>>>> Console: internal video/keyboard >>>>> BIOS drive C: is disk0 >>>>> . . . >>>>> BIOS drive P: is disk13 >>>>> - >>>>> int= err= efl=00010246 eip=96fd >>>>> eax=74d48000 ebx=74d4e5e0 ecx=0011 edx= >>>>> esi=74d4e380 edi=74d4e5b0 ebp=00091da0 esp=00091d60 >>>>> cs=002b ds=0033 es=0033fs=0033 gs=0033 ss=0033 >>>>> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b >>>>> 45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00 >>>>> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >>>>> 00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00 >>>>> BTX halted >>>> >>>> I've no clue what of that output might be loader vintage >>>> specific. It might not be of use without knowing the >>>> exact build of the loader. >>>> >>>>> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0). >>>>> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed. >>>> >>>> For reference for the board's BIOS: >>>> >>>> Version: F11e >>>> Dated: 2018-Sep-17 >>>> Description: Update AGESA 1.1.0.1a >>> >>> Using: >>> >>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz >>> >&g
Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works [ -r336532 broke it ]
On 2018-Oct-20, at 10:32 PM, Warner Losh wrote: > On Sat, Oct 20, 2018 at 11:04 PM Mark Millard wrote: > [I found what change lead to the 1950X boot crashing > with BTX halted.] > >> On 2018-Oct-20, at 12:44 PM, Mark Millard wrote: >> >> > [Adding some vintage information for a loader >> > that allowed a native boot.] >> > >> > On 2018-Oct-20, at 4:00 AM, Mark Millard wrote: >> > >> >> I attempted to jump from head -r334014 to -r339076 >> >> on a threadripper 1950X board and the native >> >> FreeBSD boot failed very early. (Hyper-V use of >> >> the same media did not have this issue.) >> >> >> >> But copying over an older /boot/loader from another >> >> storage device with a FreeBSD head version that has >> >> not been updated yet got past the problem being >> >> reported here. (For other reasons, the kernel has >> >> been moved back to -r338804 --and with that, >> >> and the older /boot/loader, the 1950X native-boots >> >> FreeBSD all the way just fine.) >> > >> > I found one /boot/loader.old that was dated >> > in the update'd file system as 2018-May 20, >> > instead of 2018-Apr-03 from the older file >> > system. May 20 would apparently mean a little >> > below -r334014 . It native-booted okay, as did >> > the April one. >> > >> > [I do not know how to inspect a /boot/loader* >> > to find out what -r?? it is from.] >> > >> > Unfortunately, I had done more than one -r339076 >> > install from -r334014 before rebooting and >> > no -r334014 loaders were still present: >> > the other *.old files from a few minutes before >> > the ones I had the boot problem with. >> > >> > I might be able to extract loaders from various: >> > >> > https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz >> > >> > materials and try substituting them in order to >> > narrow the range for works -> fails. If I can, >> > this likely would take a fair amount of time in >> > my context. 
>> > >> > Other notes: >> > >> > It turns out that only Hyper-V based use needed >> > a -r334804 kernel: Native booting with the older >> > loaders and newer kernels works fine. >> > >> > Windows 10 Pro 64bit also has no problems >> > booting and operating the machine. >> > >> > The native-boot problem does seem to be freeBSD >> > loader-vintage specific. >> > >> >> For the BTX failure the display ends up with >> >> (hand transcribed, ". . ." for an omission): >> >> >> >> BTX loader 1.00 BTX version is 1.02 >> >> Console: internal video/keyboard >> >> BIOS drive C: is disk0 >> >> . . . >> >> BIOS drive P: is disk13 >> >> - >> >> int= err= efl=00010246 eip=96fd >> >> eax=74d48000 ebx=74d4e5e0 ecx=0011 edx= >> >> esi=74d4e380 edi=74d4e5b0 ebp=00091da0 esp=00091d60 >> >> cs=002b ds=0033 es=0033fs=0033 gs=0033 ss=0033 >> >> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b >> >> 45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00 >> >> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 >> >> 00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00 >> >> BTX halted >> > >> > I've no clue what of that output might be loader vintage >> > specific. It might not be of use without knowing the >> > exact build of the loader. >> > >> >> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0). >> >> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed. >> > >> > For reference for the board's BIOS: >> > >> > Version: F11e >> > Dated: 2018-Sep-17 >> > Description: Update AGESA 1.1.0.1a >> >> Using: >> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz >> >> materials I found that: >> >> -r336492: worked (loader vs. zfsloader: not linked) >> (no more amd64 builds until . . .) >> -r336538: failed (loader vs. zfsloader: linked) >> >> (Later ones that I tried also failed.) >> >> Looks like this broke for booting the 1950X >> system in question when the following was >> checked in: >> >> Author: imp >> Date: Fri Jul 20 05:17:37 2018 >>
Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works [ -r336532 broke it ]
[I found what change lead to the 1950X boot crashing with BTX halted.] On 2018-Oct-20, at 12:44 PM, Mark Millard wrote: > [Adding some vintage information for a loader > that allowed a native boot.] > > On 2018-Oct-20, at 4:00 AM, Mark Millard wrote: > >> I attempted to jump from head -r334014 to -r339076 >> on a threadripper 1950X board and the native >> FreeBSD boot failed very early. (Hyper-V use of >> the same media did not have this issue.) >> >> But copying over an older /boot/loader from another >> storage device with a FreeBSD head version that has >> not been updated yet got past the problem being >> reported here. (For other reasons, the kernel has >> been moved back to -r338804 --and with that, >> and the older /boot/loader, the 1950X native-boots >> FreeBSD all the way just fine.) > > I found one /boot/loader.old that was dated > in the update'd file system as 2018-May 20, > instead of 2018-Apr-03 from the older file > system. May 20 would apparently mean a little > below -r334014 . It native-booted okay, as did > the April one. > > [I do not know how to inspect a /boot/loader* > to find out what -r?? it is from.] > > Unfortunately, I had done more than one -r339076 > install from -r334014 before rebooting and > no -r334014 loaders were still present: > the other *.old files from a few minutes before > the ones I had the boot problem with. > > I might be able to extract loaders from various: > > https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz > > materials and try substituting them in order to > narrow the range for works -> fails. If I can, > this likely would take a fair amount of time in > my context. > > Other notes: > > It turns out that only Hyper-V based use needed > a -r334804 kernel: Native booting with the older > loaders and newer kernels works fine. > > Windows 10 Pro 64bit also has no problems > booting and operating the machine. > > The native-boot problem does seem to be freeBSD > loader-vintage specific. 
>
>> For the BTX failure the display ends up with
>> (hand transcribed, ". . ." for an omission):
>>
>> BTX loader 1.00 BTX version is 1.02
>> Console: internal video/keyboard
>> BIOS drive C: is disk0
>> . . .
>> BIOS drive P: is disk13
>> -
>> int= err= efl=00010246 eip=96fd
>> eax=74d48000 ebx=74d4e5e0 ecx=0011 edx=
>> esi=74d4e380 edi=74d4e5b0 ebp=00091da0 esp=00091d60
>> cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
>> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
>>        45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
>> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
>>        00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
>> BTX halted
>
> I've no clue what of that output might be loader vintage
> specific. It might not be of use without knowing the
> exact build of the loader.
>
>> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
>> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.
>
> For reference for the board's BIOS:
>
> Version: F11e
> Dated: 2018-Sep-17
> Description: Update AGESA 1.1.0.1a

Using:

https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz

materials I found that:

-r336492: worked (loader vs. zfsloader: not linked)
(no more amd64 builds until . . .)
-r336538: failed (loader vs. zfsloader: linked)

(Later ones that I tried also failed.)

Looks like this broke for booting the 1950X system in question when the following was checked in:

Author: imp
Date: Fri Jul 20 05:17:37 2018
New Revision: 336532
URL: https://svnweb.freebsd.org/changeset/base/336532

Log:
Collapse zfsloader functionality back down into loader.
. . .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
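The narrowing described above is a bisection over whichever snapshot builds happen to exist. A minimal sketch follows, under stated assumptions: the revision list is only the set of revisions named in this thread (not the real artifact index), and `boots_ok` is a stand-in for the manual step of installing a loader and attempting a native boot, simulated here from the reported outcome.

```python
# Sketch of bisecting a sorted list of available builds between a known-good
# and a known-bad revision. The list and the test function are illustrative.

def narrow(revisions, boots_ok):
    """revisions: ascending; first entry known good, last entry known bad.
    Returns (last_good, first_bad) among the available builds."""
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(revisions[mid]):
            lo = mid   # failure was introduced after this build
        else:
            hi = mid   # failure was introduced at or before this build
    return revisions[lo], revisions[hi]

# Revisions mentioned in this thread, used as a stand-in for the build index.
available = [334014, 336492, 336538, 338804, 338810, 339076]

# Simulated boot test: the thread reports loaders from builds before the
# -r336532 check-in working and later ones failing on this machine.
def simulated_boot(rev):
    return rev < 336532

last_good, first_bad = narrow(available, simulated_boot)
print(f"last good: -r{last_good}, first bad: -r{first_bad}")
```

Because no amd64 builds exist between the two results, the commit log for that revision range still has to be read by hand to pick out a candidate change.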
Re: head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works
[Adding some vintage information for a loader that allowed a native boot.]

On 2018-Oct-20, at 4:00 AM, Mark Millard wrote:

> I attempted to jump from head -r334014 to -r339076
> on a threadripper 1950X board and the native
> FreeBSD boot failed very early. (Hyper-V use of
> the same media did not have this issue.)
>
> But copying over an older /boot/loader from another
> storage device with a FreeBSD head version that has
> not been updated yet got past the problem being
> reported here. (For other reasons, the kernel has
> been moved back to -r338804 --and with that,
> and the older /boot/loader, the 1950X native-boots
> FreeBSD all the way just fine.)

I found one /boot/loader.old that was dated in the updated file system as 2018-May-20, instead of 2018-Apr-03 from the older file system. May 20 would apparently mean a little below -r334014 . It native-booted okay, as did the April one.

[I do not know how to inspect a /boot/loader* to find out what -r?? it is from.]

Unfortunately, I had done more than one -r339076 install from -r334014 before rebooting and no -r334014 loaders were still present: the other *.old files were from a few minutes before the ones I had the boot problem with.

I might be able to extract loaders from various:

https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/base.txz

materials and try substituting them in order to narrow the range for works -> fails. If I can, this likely would take a fair amount of time in my context.

Other notes:

It turns out that only Hyper-V based use needed a -r334804 kernel: Native booting with the older loaders and newer kernels works fine.

Windows 10 Pro 64bit also has no problems booting and operating the machine.

The native-boot problem does seem to be FreeBSD loader-vintage specific.

> For the BTX failure the display ends up with
> (hand transcribed, ". . ." for an omission):
>
> BTX loader 1.00 BTX version is 1.02
> Console: internal video/keyboard
> BIOS drive C: is disk0
> . . .
> BIOS drive P: is disk13
> -
> int= err= efl=00010246 eip=96fd
> eax=74d48000 ebx=74d4e5e0 ecx=0011 edx=
> esi=74d4e380 edi=74d4e5b0 ebp=00091da0 esp=00091d60
> cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
> cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
>        45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
> ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
>        00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
> BTX halted

I've no clue what of that output might be loader vintage specific. It might not be of use without knowing the exact build of the loader.

> The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0).
> It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.

For reference for the board's BIOS:

Version: F11e
Dated: 2018-Sep-17
Description: Update AGESA 1.1.0.1a

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
head -r339076's boot loader fails to boot threadripper 1950X system (BTX halted); an earlier version works
I attempted to jump from head -r334014 to -r339076 on a threadripper 1950X board and the native FreeBSD boot failed very early. (Hyper-V use of the same media did not have this issue.)

But copying over an older /boot/loader from another storage device with a FreeBSD head version that has not been updated yet got past the problem being reported here. (For other reasons, the kernel has been moved back to -r338804 --and with that, and the older /boot/loader, the 1950X native-boots FreeBSD all the way just fine.)

For the BTX failure the display ends up with (hand transcribed, ". . ." for an omission):

BTX loader 1.00 BTX version is 1.02
Console: internal video/keyboard
BIOS drive C: is disk0
. . .
BIOS drive P: is disk13
-
int= err= efl=00010246 eip=96fd
eax=74d48000 ebx=74d4e5e0 ecx=0011 edx=
esi=74d4e380 edi=74d4e5b0 ebp=00091da0 esp=00091d60
cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
cs:eip=66 f7 77 04 0f b7 c0 89-44 24 0c 89 5c 24 04 8b
       45 08 89 04 24 83 64 24-10 00 c7 44 24 08 01 00
ss:esp=00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
       00 00 00 00 00 00 00 00-f0 1d 89 00 00 00 00 00
BTX halted

The board is a GIGABYTE X399 AORUS Gaming 7 (rev 1.0). It has 96 GiBytes of ECC RAM, just 6 DIMMs installed.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
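The first bytes at cs:eip in a BTX dump are the instruction that faulted, so they can be decoded by hand. The sketch below handles only the opcode 0xF7 "group 3" forms that appear in this thread's dumps, assuming 32-bit protected-mode encoding (an assumption, though the 32-bit register names in the dump suggest it). The helper name `decode_f7` is made up for illustration.

```python
# Minimal decoder for the x86 0xF7 group-3 encodings seen in the BTX dumps.
# Assumes 32-bit mode; covers only register and [reg+disp8] operands.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# ModRM reg field selects the operation for opcode 0xF7.
GROUP3 = ["test", "(reserved)", "not", "neg", "mul", "imul", "div", "idiv"]

def decode_f7(hex_bytes):
    data = bytes.fromhex(hex_bytes)
    i, width = 0, 32
    if data[i] == 0x66:          # operand-size prefix: 16-bit operand
        width, i = 16, i + 1
    if data[i] != 0xF7:
        raise ValueError("not an 0xF7 group-3 instruction")
    modrm = data[i + 1]
    mod, op, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:                 # register operand
        reg = REG32[rm]
        operand = reg if width == 32 else reg[1:]
    elif mod == 1 and rm != 4:   # [reg + signed 8-bit displacement], no SIB byte
        disp = int.from_bytes(data[i + 2:i + 3], "little", signed=True)
        operand = f"{width}-bit [{REG32[rm]}{disp:+d}]"
    else:
        raise ValueError("addressing form not handled in this sketch")
    return f"{GROUP3[op]} {operand}"

print(decode_f7("66 f7 77 04"))  # start of cs:eip in the dump above
print(decode_f7("f7 f1"))        # start of cs:eip in the lsdev -v crash dump
```

Both faulting instructions decode as an unsigned divide: a 16-bit divide by the word at edi+4 here, and a divide by ecx in the lsdev -v crash. That fits the divide-by-zero suspicion raised in the thread, although the divisor values themselves are not visible in the transcribed dumps.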
Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
On 2018-Oct-20, at 1:39 AM, Mark Millard wrote:

> I attempted to jump from head -r334014 to -r339076
> on a threadripper 1950X board and the boot fails.
> This is both native booting and under Hyper-V,
> same machine and root file system in both cases.

I did my investigation under Hyper-V after seeing a boot failure native. Looks like the native failure is even earlier, before db> is even possible, possibly during early loader activity.

So this report is really for running under Hyper-V: -r338804 boots and -r338810 does not. By contrast -r334804 does not boot native. (But I've little information for that context.)

Sorry for the confusion. I rushed the report in hopes of getting to sleep. It was not to be.

> It fails just after the FreeBSD/SMP lines,
> reporting "kernel trap 9 with interrupts disabled".
>
> It fails in pmap_force_invalidate_cache_range at
> a clflush (%rax) instruction that produces a
> "Fatal trap 9: general protection fault while
> in kernel mode". cpuid=0 apic id= 00
>
> I used kernel.txz files from:
>
> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>
> to narrow the range of kernel builds for working -> failing
> and got:
>
> -r338804 boots fine
> (no amd64 kernel builds between to try)
> -r338810+ fails (any that I tried, anyway)
>
> In that range is -r338807 :
>
> QUOTE
> Author: kib
> Date: Wed Sep 19 19:35:02 2018
> New Revision: 338807
> URL:
> https://svnweb.freebsd.org/changeset/base/338807
>
> Log:
> Convert x86 cache invalidation functions to ifuncs.
>
> This simplifies the runtime logic and reduces the number of
> runtime-constant branches.
>
> Reviewed by: alc, markj
> Sponsored by: The FreeBSD Foundation
> Approved by: re (gjb)
> Differential revision:
> https://reviews.freebsd.org/D16736
>
> Modified:
> head/sys/amd64/amd64/pmap.c
> head/sys/amd64/include/pmap.h
> head/sys/dev/drm2/drm_os_freebsd.c
> head/sys/dev/drm2/i915/intel_ringbuffer.c
> head/sys/i386/i386/pmap.c
> head/sys/i386/i386/vm_machdep.c
> head/sys/i386/include/pmap.h
> head/sys/x86/iommu/intel_utils.c
> END QUOTE
>
> There do seem to be changes associated with
> clflush(...) use. Looking at:
>
> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>
> it appears that pmap_force_invalidate_cache_range has not
> changed since -r338807.
>
> It seems that -r338806 and -r338810 would be unlikely
> contributors.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
I attempted to jump from head -r334014 to -r339076 on a threadripper 1950X board and the boot fails. This is both native booting and under Hyper-V, same machine and root file system in both cases.

It fails just after the FreeBSD/SMP lines, reporting "kernel trap 9 with interrupts disabled".

It fails in pmap_force_invalidate_cache_range at a clflush (%rax) instruction that produces a "Fatal trap 9: general protection fault while in kernel mode". cpuid=0 apic id= 00

I used kernel.txz files from:

https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/

to narrow the range of kernel builds for working -> failing and got:

-r338804 boots fine
(no amd64 kernel builds between to try)
-r338810+ fails (any that I tried, anyway)

In that range is -r338807 :

QUOTE
Author: kib
Date: Wed Sep 19 19:35:02 2018
New Revision: 338807
URL: https://svnweb.freebsd.org/changeset/base/338807

Log:
Convert x86 cache invalidation functions to ifuncs.

This simplifies the runtime logic and reduces the number of
runtime-constant branches.

Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D16736

Modified:
head/sys/amd64/amd64/pmap.c
head/sys/amd64/include/pmap.h
head/sys/dev/drm2/drm_os_freebsd.c
head/sys/dev/drm2/i915/intel_ringbuffer.c
head/sys/i386/i386/pmap.c
head/sys/i386/i386/vm_machdep.c
head/sys/i386/include/pmap.h
head/sys/x86/iommu/intel_utils.c
END QUOTE

There do seem to be changes associated with clflush(...) use. Looking at:

https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432

it appears that pmap_force_invalidate_cache_range has not changed since -r338807.

It seems that -r338806 and -r338810 would be unlikely contributors.
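For readers not familiar with what "convert to ifuncs" means in the quoted log: the idea is to pick one implementation once, early, based on CPU features, instead of testing the features on every call. The sketch below is only a portable analogue using a plain function pointer. It is not the -r338807 code, and the feature flags, the enum and every demo_ name are made up for illustration. It might help show why a wrong feature report (for example from a hypervisor) could end up selecting an instruction path that then faults.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real kernel would derive these from CPUID. */
#define DEMO_FEAT_CLFLUSH    0x1u
#define DEMO_FEAT_CLFLUSHOPT 0x2u

/* Stand-ins for the instruction sequence each variant would execute. */
enum demo_flush_kind { DEMO_WBINVD, DEMO_CLFLUSH, DEMO_CLFLUSHOPT };

typedef enum demo_flush_kind (*demo_flush_fn)(size_t start, size_t end);

static enum demo_flush_kind demo_flush_wbinvd(size_t s, size_t e)
{ (void)s; (void)e; return DEMO_WBINVD; }

static enum demo_flush_kind demo_flush_clflush(size_t s, size_t e)
{ (void)s; (void)e; return DEMO_CLFLUSH; }

static enum demo_flush_kind demo_flush_clflushopt(size_t s, size_t e)
{ (void)s; (void)e; return DEMO_CLFLUSHOPT; }

/* Resolver: runs once and returns the variant matching the features. */
static demo_flush_fn demo_resolve_flush(unsigned features)
{
    if (features & DEMO_FEAT_CLFLUSHOPT)
        return demo_flush_clflushopt;
    if (features & DEMO_FEAT_CLFLUSH)
        return demo_flush_clflush;
    return demo_flush_wbinvd;   /* fallback when no line flush exists */
}

/* Selected implementation; defaults to the always-available fallback. */
static demo_flush_fn demo_flush_impl = demo_flush_wbinvd;

/* Called once at startup with the detected feature bits. */
void demo_flush_init(unsigned features)
{
    demo_flush_impl = demo_resolve_flush(features);
}

/* Hot path: no feature tests here, just an indirect call. */
enum demo_flush_kind demo_flush_range(size_t start, size_t end)
{
    assert(demo_flush_impl != NULL);
    return demo_flush_impl(start, end);
}
```

With this shape, calling demo_flush_init(DEMO_FEAT_CLFLUSH) once makes every later demo_flush_range() call take the clflush variant, whether or not that variant is actually safe in the environment.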
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: Heads up: OFED build by default
OFED in head led to the following in order for ci.freebsd.org's FreeBSD-head-amd64-gcc builds to not fail/stop in all_subdir_lib/ofed :

Author: jhb
Date: Mon Aug 6 23:51:08 2018
New Revision: 337399
URL: https://svnweb.freebsd.org/changeset/base/337399

Log:
Make the system C11 atomics headers fully compatible with external GCC.

The <sys/cdefs.h> and <stdatomic.h> headers already included support for
C11 atomics via intrinsics in modern versions of GCC, but these versions
tried to "hide" atomic variables inside a wrapper structure. This wrapper
is not compatible with GCC's internal <stdatomic.h> header, so that if
GCC's <stdatomic.h> was used together with <sys/cdefs.h>, use of C11
atomics would fail to compile. Fix this by not hiding atomic variables
in a structure for modern versions of GCC. The headers already avoid
using a wrapper structure on clang.

Note that this wrapper was only used if C11 was not enabled (e.g.
via -std=c99), so this also fixes compile failures if a modern version
of GCC was used with -std=c11 but with FreeBSD's <stdatomic.h> instead
of GCC's <stdatomic.h> and this change fixes that case as well.

Reported by: Mark Millard
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D16585

Modified:
head/sys/sys/cdefs.h
head/sys/sys/stdatomic.h

Without this FreeBSD-head-amd64-gcc was getting:

--- all_subdir_lib/ofed ---
In file included from /workspace/src/contrib/ofed/librdmacm/cma.h:43:0,
                 from /workspace/src/contrib/ofed/librdmacm/acm.c:42:
/workspace/src/contrib/ofed/librdmacm/cma.h: In function 'fastlock_init':
/workspace/src/contrib/ofed/librdmacm/cma.h:60:2: error: invalid initializer
  atomic_store(&lock->cnt, 0);
  ^
In file included from /workspace/src/contrib/ofed/librdmacm/acm.c:42:0:
/workspace/src/contrib/ofed/librdmacm/cma.h: In function 'fastlock_acquire':
/workspace/src/contrib/ofed/librdmacm/cma.h:68:2: error: operand type 'struct <anonymous> *' is incompatible with argument 1 of '__atomic_fetch_add'
  if (atomic_fetch_add(&lock->cnt, 1) > 0)
  ^~
/workspace/src/contrib/ofed/librdmacm/cma.h: In function 'fastlock_release':
/workspace/src/contrib/ofed/librdmacm/cma.h:73:2: error: operand type 'struct <anonymous> *' is incompatible with argument 1 of '__atomic_fetch_sub'
  if (atomic_fetch_sub(&lock->cnt, 1) > 1)
  ^~
. . .
--- all_subdir_lib/ofed ---
*** [acm.o] Error code 1

Side notes:

A modern enough /usr/ports avoids the separate float.h problem for the devel/*-gcc ports: -r476273 fixed devel/powerpc64-gcc (the master port for devel/*-gcc ports) to avoid this.

With both fixes in place I was able to buildworld buildkernel via amd64's xtoolchain ( so via its use of devel/amd64-gcc ). (The build servers likely do not have -r476273 yet.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
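For reference, the failing lines use the ordinary C11 <stdatomic.h> generic functions on a counter that is a member of a lock structure. A minimal sketch of that usage pattern is below. It is an assumption-laden reduction and not the librdmacm source: the demo_ names are hypothetical and the semaphore that a real lock of this kind would block on is left out. With a header that does not wrap atomic objects in a hidden structure, this sort of code is expected to compile with either GCC or clang.

```c
#include <stdatomic.h>

/* Hypothetical lock shaped like the one in the quoted errors: just a count. */
typedef struct {
    atomic_int cnt;
} demo_fastlock;

/* Start with nobody holding the lock. */
static inline void demo_fastlock_init(demo_fastlock *lock)
{
    atomic_store(&lock->cnt, 0);
}

/*
 * Returns nonzero when the lock was already held, which is the case
 * where a full implementation would go on to wait on a semaphore.
 */
static inline int demo_fastlock_acquire(demo_fastlock *lock)
{
    return atomic_fetch_add(&lock->cnt, 1) > 0;
}

/*
 * Returns nonzero when another acquirer is still counted, which is the
 * case where a full implementation would post the semaphore.
 */
static inline int demo_fastlock_release(demo_fastlock *lock)
{
    return atomic_fetch_sub(&lock->cnt, 1) > 1;
}
```

The point of the reduction is only that atomic_store, atomic_fetch_add and atomic_fetch_sub are handed a pointer to the atomic member itself, which is what the wrapper structure in the older headers got in the way of.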
Re: zfs problems after rebuilding system [SOLVED]
Eugene Grosbein eugen at grosbein.net wrote on Mon Mar 5 12:20:47 UTC 2018 :

> 05.03.2018 19:10, Dimitry Andric wrote:
>
>>> When no boot drive is detected early enough, the kernel goes to the
>>> mountroot prompt. That seems to hold a Giant lock which inhibits
>>> further progress being made. Sometimes progress can be made by trying
>>> to mount unmountable partitions on other drives, but this usually goes
>>> too fast, especially if the USB drive often times out.
>>
>> What I would like to know, is why our USB stack has such timeout issues
>> at all. When I boot Linux on the same type of hardware, I never see USB
>> timeouts. They must be doing something right, or maybe they just don't
>> bother checking some status bits that we are very strict about?
>
> This is heavily hardware-dependent. You may have no issues with some
> software+hardware combination and long timeouts with same software
> but different hardware.

Dimitry's example is for changing the software for the same(?) hardware, if I understand right. (FreeBSD vs. some Linux distribution.) (?: He did say "type of".)

Perhaps that type of hardware can be used to figure out the difference.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)
Re: 50 percent swap used, but "ps auxww" output shows no processes swapped out
Brandon Allbery allbery.b at gmail.com wrote on Sat Feb 3 21:18:53 UTC 2018 :

> Swapping whole processes out is not really a thing any more. Individual
> pages are paged to/from memory; if a memory page has no backing file, it
> will be allocated a block in swap space as its backing storage.
>
> (I'm not sure "W" status even means swap; I thought whole-process swapping
> wasn't even supported any more.)

From what I've seen on the lists there is a technical distinction made between "kernel stacks for the process no longer memory resident" (swapped out) and other pages for the process having paged to disk and not being resident.

But many tools do not seem to present that point of view and still reflect an older view in the terminology used, including in documentation. One has to interpret what one is shown, as I understand.

As an example, top can show RES being zero despite the kernel stacks for the process not having been moved to disk. RES zero might not mean what one might expect about "swapped out".

I do not know if a W after the first letter in state (STAT) for "ps auxww" tracks the kernel-stacks' resident-vs-not status for the process or not. (Matching your not sure status.)

> On Sat, Feb 3, 2018 at 4:14 PM, Michael Voorhis wrote:
>
> > Hi all,
> >
> > I've got an amd64 system running 11.1-STABLE r325027, with something
> > like 20G of swap. "swapinfo" shows that half the swap is used.
> >
> > So of course I'm curious to know which processes have been swapped
> > out. I'm not using any "tmpfs" filesystems; no ZFS, no huge amounts of
> > wired-down memory. The system's got 16 processors and 128G of RAM. "ps
> > auxww" output shows *no* processes that are swapped out (2nd character
> > in "STAT" field is "W"). Not a single one. The only process with a W in
> > the stat field at all is the "[intr]" kernel thread.
> >
> > What is using the swapspace

The so-called swapspace is really the paging/swap-space, with most of the use typically being paging. (As Brandon indicated.)

Once a page is paged out, if the process sticks around but does not use or free the page, the page likely stays paged-out. (I'm guessing some at the intended results for default tuning --and that you probably are using default tuning.)

So the in-use swapspace is likely from one or more existing processes that did page-outs earlier.

(Expect my descriptions to be oversimplified, but hopefully pointing in the right general direction.)

> > Please educate me.
> >

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)
Re: Ryzen issues on FreeBSD ?
Don Lewis truckman at FreeBSD.org wrote on Sat Jan 27 08:23:27 UTC 2018 :

>   PID    TID COMM      TDNAME CPU PRI STATE WCHAN
>
> 90692 100801 python2.7 -       -1 124 sleep usem
>
> 90692 100824 python2.7 -       -1 124 sleep usem
> . . .

# grep -r '"usem"' /usr/src/sys/
/usr/src/sys/dev/qlnx/qlnxe/ecore_dbg_fw_funcs.c: "usem", { true, true, true }, true, DBG_USTORM_ID,
/usr/src/sys/kern/kern_umtx.c: error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
/usr/src/sys/kern/kern_umtx.c: error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);

/usr/src/sys/kern/kern_umtx.c has :

#if defined(COMPAT_FREEBSD9) || defined(COMPAT_FREEBSD10)
static int
do_sem_wait(struct thread *td, struct _usem *sem, struct _umtx_time *timeout)
{
. . .
        error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
. . .
#endif
. . .
static int
do_sem2_wait(struct thread *td, struct _usem2 *sem, struct _umtx_time *timeout)
{
. . .
        error = umtxq_sleep(uq, "usem", timeout == NULL ? NULL : &timo);
. . .

The comparison/contrast for:

> 90692 101629 python2.7 -       -1 125 sleep umtxn
>

# grep -r '"umtxn"' /usr/src/sys/
/usr/src/sys/kern/kern_umtx.c: error = umtxq_sleep(uq, "umtxn", timeout == NULL ?

/usr/src/sys/kern/kern_umtx.c has:

static int
do_lock_normal(struct thread *td, struct umutex *m, uint32_t flags,
    struct _umtx_time *timeout, int mode)
{
. . .
        /*
         * We set the contested bit, sleep. Otherwise the lock changed
         * and we need to retry or we lost a race to the thread
         * unlocking the umtx.
         */
        umtxq_lock(&uq->uq_key);
        umtxq_unbusy(&uq->uq_key);
        if (old == owner)
                error = umtxq_sleep(uq, "umtxn", timeout == NULL ?
                    NULL : &timo);
        umtxq_remove(uq);
        umtxq_unlock(&uq->uq_key);
        umtx_key_release(&uq->uq_key);
. . .

Both contexts are umtxq_sleep usage:

/*
 * Put thread into sleep state, before sleeping, check if
 * thread was removed from umtx queue.
 */
static inline int
umtxq_sleep(struct umtx_q *uq, const char *wmesg, struct abs_timeout *abstime)
. . .

Note: I'm guessing that /usr/src/sys/dev/qlnx/qlnxe/ecore_dbg_fw_funcs.c is not involved.
===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)
Re: Ryzen issues on FreeBSD ?
Mike Pumford michaelp at bsquare.com wrote on Wed Jan 24 12:03:04 UTC 2018 : > I've run into this on modern Intel systems as well. The RAM is sold as > 2400 but thats actually an overclock profile. If I actually enabled it > (despite both board and RAM being qualified for that) the system ends up > locking up or crashing as soon as you stress it. Go back to the standard > DDR profile advertised by the RAM and it is totally stable. The reported fails are during idle time as I understand. Things are working when the CPU's are kept busy from what I've read in the various notes. The hang-ups are during idle times. "the system ends up locking up or crashing as soon as you stress it" does not sound like a matching context. That a slower RAM speed might help idle behave correctly is interesting given the Zen and Ryzen dependence on RAM speed for the speed of its internal interconnect-fabric's operation. I'll note that, if one goes through the referenced Linux exchanges about this, Ryzen Threadripper's examples are also reported to have the problem. === Mark Millard marklmi at yahoo.com ( markmi at dsl-only.net is going away in 2018-Feb, late) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Ryzen issues on FreeBSD ?
On 2018-Jan-21, at 12:17 PM, Don Lewis wrote:

> On 20 Jan, Mark Millard wrote:
>> Don Lewis truckman at FreeBSD.org wrote on
>> Sat Jan 20 02:35:40 UTC 2018 :
>>
>>> The only real problem with the old CPUs is the random segfault problem
>>> and some other random strangeness, like the lang/ghc build almost always
>>> failing.
>>
>>
>> At one time you had written
>> ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029
>> comment #103 on 2017-Oct-09):
>>
>> QUOTE
>> The ghc build failure seems to be gone after upgrading the a
>> more recent 12.0-CURRENT. I will try to bisect for the fix
>> when I have a chance.
>> END QUOTE
>>
>> Did that not pan out? Did you conclude it was
>> hardware-context specific?
>
> I was never able to reproduce the problem. It seems like it failed on
> the first ports build run after I replaced the CPU. When I upgraded the
> OS and ports, the build succeeded. I tried going back to much earlier
> OS and ports versions, but I could never get the ghc build to fail
> again. I'm baffled by this ...

Sounds like the overall information is then:

Old CPU: frequent problem building ghc (nearly always fails as far as I know)

New CPU: rare problem building ghc (possibly never for some software version combinations?)

(On a Ryzen Threadripper 1950X I've not seen a failure. For the above I'm including what I observed under Hyper-V for the 1800X and 1950X as contributing evidence: The 1800X was an early one and fit the "Old CPU" case above. AMD has stated that threadrippers never had the problems that other, early Ryzen CPUs did for heavy compiling use. So far, for me, that seems true.)

So, it sounds like building ghc is still a good test. Back when I had access to the 1800X Ryzen system, ghc was the most reliable failure-to-build of what I tried. It still may be useful for that sort of test activity to classify Ryzen CPUs for the one type of issue.
===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)
Re: Ryzen issues on FreeBSD ?
Don Lewis truckman at FreeBSD.org wrote on Sat Jan 20 02:35:40 UTC 2018 : > The only real problem with the old CPUs is the random segfault problem > and some other random strangeness, like the lang/ghc build almost always > failing. At one time you had written ( https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029 comment #103 on 2017-Oct-09): QUOTE The ghc build failure seems to be gone after upgrading the a more recent 12.0-CURRENT. I will try to bisect for the fix when I have a chance. END QUOTE Did that not pan out? Did you conclude it was hardware-context specific? === Mark Millard marklmi26-fbsd at yahoo.com ( markmi at dsl-only.net is going away in 2018-Feb, late) ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Ryzen issues on FreeBSD ?
Mike Tancsa mike at sentex.net wrote on: Wed Jan 17 14:31:50 UTC 2018 : > On 1/17/2018 8:46 AM, Nimrod Levy wrote: > > I've been seeing similar issues on Ryzen and asked some questions, > > here > > https://lists.freebsd.org/pipermail/freebsd-stable/2017-December/088121.html > > > > My previous queries didn't go anywhere. > > > > > > Thats not very promising :( Googling around, shows lots of similar > reports both on FreeBSD and Linux, but its a lot of "I tweaked this BIOS > setting and so far so good" but nothing definitive / conclusive. Having > to mess about with hardware settings for days on end hoping to fix > random lockups is not good. See Bugzilla 219399 and 221029 : https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029 I'm not sure how much stable/11 and the like have been tracking things that were done in head (12) during this. My use has only been via versions of head. My 1800X use was basically after head was updated to deal with what 219399 eventually was isolated to. (221029 is from splitting off problems that were not originally known to be separate.) While I had problems for 1800X that are what the 221029 bugzilla above is about, I've not had such with a 1950X in the same sorts of contexts as I had been using the 1800X. But this was under Hyper-V for both processor variants (with matching boards). I've only tried the 1950X with a native FreeBSD boot once (a fair time ago). It showed a lockup problem fairly quickly (power switch/plug time). I've never seen such (or anything analogous) under Hyper-V with extensive use. It does not look like I'll be investigating native FreeBSD on the 1950X anytime soon. (I no longer have access to the 1800X.) 
===
Mark Millard
marklmi26-fbsd at yahoo.com
( markmi at dsl-only.net is going away in 2018-Feb, late)
11.1-STABLE for amd64: jumping from -r326142 to -r327228: all_subdir_cxgbe/t4_firmware failed to build
g.txt' 't4fw_cfg.txt'" indicates execution of the (whitespace changed below): else ln -s /usr/src/sys/dev/cxgbe/firmware/t4fw_cfg.txt t4fw_cfg.txt; ld -b binary --no-warn-mismatch -d -warn-common -m elf_x86_64_fbsd -r -d -o t4fw_cfg.txt.fwo t4fw_cfg.txt; rm t4fw_cfg.txt; fi The "E 99835 /usr/obj/amd64_clang/amd64.amd64/usr/src/tmp/usr/bin/ld" indicates which ld was executed. "X 99835 1 0" indicates a non-zero status return if I understand right. There is no "D t4fw_cfg.txt" line to match up with the "rm t4fw_cfg.txt", nor an "E" to match up with rm. # uname -apKU FreeBSD FBSDFS 11.1-STABLE FreeBSD 11.1-STABLE r326142 amd64 amd64 1101506 1101506 # svnlite info /usr/src/ | grep "Re[plv]" Relative URL: ^/stable/11 Repository Root: svn://svn.freebsd.org/base Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f Revision: 327228 Last Changed Rev: 327228 # more ~/sys_build_scripts.amd64-host/make_amd64_nodebug_clang-amd64-host.sh kldload -n filemon && \ script /typescripts/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-$(date +%Y-%m-%d:%H:%M:%S) \ env __MAKE_CONF="/root/src.configs/make.conf" SRCCONF="/dev/null" SRC_ENV_CONF="/root/src.configs/src.conf.amd64-clang.amd64-host" \ WITH_META_MODE=yes \ MAKEOBJDIRPREFIX="/usr/obj/amd64_clang/amd64.amd64" \ make $* # more /root/src.configs/src.conf.amd64-clang.amd64-host TO_TYPE=amd64 # KERNCONF=GENERIC TARGET=${TO_TYPE} .if ${.MAKE.LEVEL} == 0 TARGET_ARCH=${TO_TYPE} .export TARGET_ARCH .endif # WITH_META_MODE= #WITH_CROSS_COMPILER= WITH_SYSTEM_COMPILER= # WITH_LIBCPLUSPLUS= WITH_BINUTILS_BOOTSTRAP= WITH_ELFTOOLCHAIN_BOOTSTRAP= #WITH_CLANG_BOOTSTRAP= WITH_CLANG= WITH_CLANG_IS_CC= WITH_CLANG_FULL= WITH_CLANG_EXTRAS= #WITH_LLD= #WITHOUT_LLD_IS_LD= #WITH_LLVM_LIBUNWIND= #WITH_LLDB= #PORTS_MODULES=emulators/virtualbox-ose-additions # WITH_BOOT= WITH_LIB32= # WITHOUT_GCC_BOOTSTRAP= WITHOUT_GCC= WITHOUT_GCC_IS_CC= WITHOUT_GNUCXX= # NO_WERROR= #WERROR= MALLOC_PRODUCTION= # WITH_REPRODUCIBLE_BUILD= WITH_DEBUG_FILES= 
Ryzen Threadripper 1950X HW but FreeBSD -r327142 running under a Windows 10 Pro Hyper-V virtual machine. 110592 MB of RAM assigned. 29 virtual processors assigned. Physical hard disk used, not a virtual one. === Mark Millard markmi at dsl-only.net ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ryzen issues?
On 12/15/2017 7:42 AM, Nimrod Levy wrote:

> I've been having a problem with a recent computer build. I've got a Ryzen
> 5 1600 on an Asus Prime B350+ motherboard. When it runs, it runs well. I
> don't see things like programs crashing or anything like that.

I have no debugging help to provide but can describe a similar experience. I only tried a direct-boot of FreeBSD once and it may be a long time before I try again.

With a Ryzen Threadripper 1950X that normally runs FreeBSD 12.0-CURRENT builds under Windows 10 Pro's Hyper-V, I once tried to boot and use the system directly with FreeBSD --from the same media that it runs on via Hyper-V. (Under Hyper-V FreeBSD runs fine.)

It booted. But not too long later it hung up. As I remember I was able to hold a power button in for a longer than normal time to cut the power.

If I remember right, the light normally visible on the USB keyboard when it is operational on USB was out after the hangup (and before the forced power-off). As I remember, the video display stayed (but did not update). Ethernet access stopped.

The Ryzen Threadripper 1950X does not have the CPU problem that lower end Ryzens have had. (So far this is confirmed in my testing, contrasted to my earlier access to a 1800X, also used under Hyper-V, but never directly booted.) So, I doubt that CPU issue is involved in the hangup that I got.

I will note that I've seen list messages from folks indicating some Ryzen variant for some system --and they were not reporting such hangups, nor did they indicate running under any hypervisors. So, something more local-context-special seems to be involved.

===
Mark Millard
markmi at dsl-only.net
stable/11 -r326142 (e.g.): "cat /dev/null | zstd --stdout" gets "/usr/bin/zstd: Undefined symbol "stat@FBSD_1.5"
# cat /dev/null | zstd --stdout
/usr/bin/zstd: Undefined symbol "stat@FBSD_1.5"

# freebsd-version -ku
11.1-STABLE
11.1-STABLE

# uname -apKU
FreeBSD FBSDFS 11.1-STABLE FreeBSD 11.1-STABLE r326142 amd64 amd64 1101506 1101506

It was built from source:

# svnlite info /usr/src/
Path: .
Working Copy Root Path: /usr/src
URL: svn://svn.freebsd.org/base/stable/11
Relative URL: ^/stable/11
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 326142
Node Kind: directory
Schedule: normal
Last Changed Author: ae
Last Changed Rev: 326142
Last Changed Date: 2017-11-23 20:42:21 -0800 (Thu, 23 Nov 2017)

# svnlite status /usr/src/
#

(So, no changes.)

/usr/src/lib/libc/sys/Symbol.map has:

FBSD_1.0 {
. . .
        socket;
        socketpair;
        stat;
        statfs;
        swapoff;
        swapon;
. . .

So 1.0 vs. 1.5 for some reason.

Note: Using /rescue/zstd avoids this issue.

===
Mark Millard
markmi at dsl-only.net
Re: 11.1 running on HyperV hn interface hangs
Paul Koch paul.koch at akips.com wrote on Wed Sep 6 09:33:26 UTC 2017 : > We recently moved our software from 11.0-p9 to 11.1-p1, but looks like there > is a regression in 11.1-p1 running on HyperV (Windows/HyperV 2012 R2) where > the virtual hn0 interface hangs with the following kernel messages: > > hn0: on vmbus0 > hn0: Ethernet address: 00:15:5d:31:21:0f > hn0: link state changed to UP > ... > hn0: RXBUF ack retry > hn0: RXBUF ack failed > last message repeated 571 times > > . . . > > Has anyone seen this problem before with 11.1 ? While it is/was a personal use/experiment I have used all the following under Windows 10 Pro's Hyper-V with networking via hn0 Ethernet as seen from the guest FreeBSD: releng/11.1 (no longer around to remind me of the most recent -r?? but various updates ) stable/11 (various updates, -r320807 currently) head(various updates, -r323147 currently) I had no problems with my use. (By no means a traffic match to your context but definitely used.) In all cases the Virtual Switch Manager was tied to the (builtin) "External network" that is listed as: Intel(R) I211 Gigabit Network Connection in the Virtual Switch Properties pop-up for External network. The machine is not a server. So not totally broken as far as I can tell. Something more specific to your context would seem to also be involved. Hyper-V has worked nicely for assigning 14 of the machine's 16 hardware threads to FreeBSD and doing buildworld buildkernel and poudriere based port builds. (Windows 10 Pro not being otherwise busy.) === Mark Millard markmi at dsl-only.net ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: svn commit: r322715 - in stable/11: etc/mtree lib/libcasper lib/libcasper/services lib/libcasper/services/cap_dns lib/libcasper/services/cap_dns/tests lib/libcasper/services/cap_grp lib/libcaspe
Nevermind, stupid mistake on my part: armv6 was not actually updated yet. > On 2017-Aug-29, at 8:42 PM, Mark Millard wrote: > > installworld for -r323012 is getting things like (at least with the likes of > -j14): > > --- pwd_test.install --- > --- _proginstall --- > install -N /usr/src/etc -s -o root -g wheel -m 555 pwd_test > /usr/obj/DESTDIRs/clang-armv7-installworld-dist-from-src/usr/tests/lib/libcasper/services/cap_pwd/pwd_test > install: pwd_test: No such file or directory > *** [_proginstall] Error code 71 > > > This was on a amd64 -> armv6 cross build and local installworld > on the amd64 file system, not a live install. > > > # svnlite status /usr/src/ | sort > ? /usr/src/sys/amd64/conf/GENERIC-NODBG > ? /usr/src/sys/arm/conf/GENERIC-NODBG > ? /usr/src/sys/arm64/conf/GENERIC-NODBG > ? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG > ? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG > M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp > M /usr/src/sys/boot/powerpc/kboot/Makefile > > > # uname -apKU > FreeBSD FBSDx6411SL 11.1-STABLE FreeBSD 11.1-STABLE r323012M amd64 amd64 > 1101502 1101502 > > > # svnlite info /usr/src/ | grep "Re[plv]" > Relative URL: ^/stable/11 > Repository Root: svn://svn.freebsd.org/base > Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f > Revision: 323012 > Last Changed Rev: 323012 === Mark Millard markmi at dsl-only.net ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: svn commit: r322715 - in stable/11: etc/mtree lib/libcasper lib/libcasper/services lib/libcasper/services/cap_dns lib/libcasper/services/cap_dns/tests lib/libcasper/services/cap_grp lib/libcaspe
installworld for -r323012 is getting things like (at least with the likes of -j14): --- pwd_test.install --- --- _proginstall --- install -N /usr/src/etc -s -o root -g wheel -m 555 pwd_test /usr/obj/DESTDIRs/clang-armv7-installworld-dist-from-src/usr/tests/lib/libcasper/services/cap_pwd/pwd_test install: pwd_test: No such file or directory *** [_proginstall] Error code 71 This was on a amd64 -> armv6 cross build and local installworld on the amd64 file system, not a live install. # svnlite status /usr/src/ | sort ? /usr/src/sys/amd64/conf/GENERIC-NODBG ? /usr/src/sys/arm/conf/GENERIC-NODBG ? /usr/src/sys/arm64/conf/GENERIC-NODBG ? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG ? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp M /usr/src/sys/boot/powerpc/kboot/Makefile # uname -apKU FreeBSD FBSDx6411SL 11.1-STABLE FreeBSD 11.1-STABLE r323012M amd64 amd64 1101502 1101502 # svnlite info /usr/src/ | grep "Re[plv]" Relative URL: ^/stable/11 Repository Root: svn://svn.freebsd.org/base Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f Revision: 323012 Last Changed Rev: 323012 === Mark Millard markmi at dsl-only.net ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: svn commit: r322875 - head/sys/dev/nvme
On 2017-Aug-27, at 11:54 PM, Ed Schouten wrote:

> 2017-08-25 14:53 GMT+02:00 Ed Schouten :
>> 2017-08-25 9:46 GMT+02:00 Mark Millard :
>>> It appears that at least 11.1-STABLE -r322807 does not handle
>>> -std=c++98 styles of use of _Static_assert for g++7 in that
>>> g++7 reports an error:
>>
>> Maybe we need to do something like this?
>>
>> Index: sys/sys/cdefs.h
>> ===
>> --- sys/sys/cdefs.h (revision 322887)
>> +++ sys/sys/cdefs.h (working copy)
>> @@ -294,7 +294,7 @@
>> #if (defined(__cplusplus) && __cplusplus >= 201103L) || \
>> __has_extension(cxx_static_assert)
>> #define _Static_assert(x, y) static_assert(x, y)
>> -#elif __GNUC_PREREQ__(4,6)
>> +#elif __GNUC_PREREQ__(4,6) && !defined(__cplusplus)
>> /* Nothing, gcc 4.6 and higher has _Static_assert built-in */
>> #elif defined(__COUNTER__)
>> #define _Static_assert(x, y) __Static_assert(x, __COUNTER__)
>
> Could you let me know whether this patch fixes the build for you? If
> so, I'll commit it!

As a variant of stable/11 -r322807 . . .

buildworld and buildkernel seem to work fine. (I did not try any port [re-]builds.)

Based on the same main.cc as before . . .

g++7 -std=c++98 main.cc
g++7 -Wpedantic -std=c++98 main.cc
g++7 -std=c++03 main.cc
g++7 -Wpedantic -std=c++03 main.cc

no longer complain (so no error, no warning).

clang++ -Wpedantic -std=c++11 main.cc
clang++ -Wpedantic -std=c++98 main.cc
clang++ -Wpedantic -std=c++03 main.cc

each still give the warning but no error.

g++7 -Wpedantic -std=c++11 main.cc
g++7 -std=c++11 main.cc
clang++ -std=c++11 main.cc
clang++ -std=c++98 main.cc
clang++ -std=c++03 main.cc

are still silent, no errors, no warnings.

Note that clang here is version 4 --the same as in my original report that had the g++7 rejection example. This is because of the stable/11 context that I used. (An intended MFC had been listed.)

If needed I could probably try under some version of head (and so test clang version 5).
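For anyone wondering what the fallback branches of that #if chain amount to when no built-in is available: the usual trick is a typedef of an array whose size goes negative when the condition is false, so the compiler has to reject it. The sketch below shows that general technique under my own assumptions. It is not a copy of sys/cdefs.h, the DEMO_ names are hypothetical, and it uses __LINE__ rather than __COUNTER__ only to stay within standard C.

```c
#include <stdint.h>

/* Two-level paste so that __LINE__ is expanded before ## is applied. */
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b)  DEMO_CONCAT_(a, b)

/*
 * Emulated static assertion: the array size is 1 when cond holds and
 * -1 (a compile-time error) when it does not. msg is unused here,
 * which is why the diagnostics are less helpful than a real
 * _Static_assert or static_assert.
 */
#define DEMO_STATIC_ASSERT(cond, msg) \
    typedef char DEMO_CONCAT(demo_static_assert_, __LINE__)[(cond) ? 1 : -1]

/* Hypothetical structure whose layout we want checked at compile time. */
struct demo_hdr {
    uint32_t a;
    uint32_t b;
};

DEMO_STATIC_ASSERT(sizeof(struct demo_hdr) == 8, "demo_hdr must be 8 bytes");

/* Trivial accessor so there is something to call at run time. */
unsigned long demo_hdr_size(void)
{
    return (unsigned long)sizeof(struct demo_hdr);
}
```

Changing the 8 to any other value should make the translation unit fail to compile, in C as well as in -std=c++98 style C++, which is the property the emulated form is after.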
===
Mark Millard
markmi at dsl-only.net
Re: svn commit: r322875 - head/sys/dev/nvme
On 2017-Aug-25, at 12:14 AM, David Chisnall wrote:

> On 25 Aug 2017, at 07:32, Mark Millard wrote:
>>
>> As I remember _Static_assert is from C11, not
>> the older C99.
>
> In pre-C11 dialects of C, _Static_assert is an identifier reserved for the
> implementation. sys/cdefs.h defines it to generate a zero-length array if
> the condition is true or a negative-length array if it is false, emulating
> the behaviour (though giving less helpful error messages)
>
>>
>> As I understand head/sys/dev/nvme/nvme.h use by
>> C++ code could now reject attempts to use
>> _Static_assert .
>
> In C++, _Static_assert is an identifier reserved for the implementation, but
> in C++11 or newer static_assert is a keyword. sys/cdefs.h defines
> _Static_assert to static_assert for newer versions of C++ and defines it to
> the C-before-11-compatible version for C++-before-11.
>
> TL;DR: We have gone to a lot of effort to ensure that these keywords work in
> all C/C++ dialects, please use them, please report bugs if you find a case
> where they don’t work.

It appears that at least 11.1-STABLE -r322807 does not handle -std=c++98 styles of use of _Static_assert for g++7 in that g++7 reports an error:

# uname -apKU
FreeBSD hzFreeBSD11S 11.1-STABLE FreeBSD 11.1-STABLE r322807 amd64 amd64 1101501 1101501

# more main.cc
#include "/usr/include/sys/cdefs.h"
_Static_assert(1,"Test");
int main(void)
{
return 0;
}

# g++7 -std=c++98 main.cc
main.cc:2:15: error: expected constructor, destructor, or type conversion before '(' token
_Static_assert(1,"Test");
^

So it appears that, as it stands, the _Static_assert implementation requires a more modern C++ standard vintage.

With the likes of -Wpedantic clang++ from 11.1-STABLE -r322807 reports a warning:

# clang++ -Wpedantic -std=c++11 main.cc
main.cc:2:1: warning: _Static_assert is a C11-specific feature [-Wc11-extensions]
_Static_assert(1,"Test");
^
1 warning generated.

# clang++ -Wpedantic -std=c++98 main.cc
In file included from main.cc:1:
/usr/include/sys/cdefs.h:852:27: warning: variadic macros are a C99 feature [-Wvariadic-macros]
#define __locks_exclusive(...) \
^
. . . (more such macro reports) . . .
main.cc:2:1: warning: _Static_assert is a C11-specific feature [-Wc11-extensions]
_Static_assert(1,"Test");
^
11 warnings generated.

By contrast "g++7 -Wpedantic -std=c++11 main.cc" is silent about it.

===
Mark Millard
markmi at dsl-only.net
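The pre-C11 emulation that David Chisnall describes (an array type whose length goes negative when the condition is false) can be sketched in plain C roughly as follows. This is only an illustration under assumptions, not the text of sys/cdefs.h: the names MY_CONCAT, MY_STATIC_ASSERT and static_assert_demo are hypothetical, __LINE__ stands in for the __COUNTER__ that the quoted patch context shows, and the true case uses length 1 rather than 0 so the sketch stays within ISO C.

```c
/* Hypothetical helper macros for illustration; not the ones in sys/cdefs.h. */
#define MY_CONCAT1(a, b) a##b
#define MY_CONCAT(a, b) MY_CONCAT1(a, b)

/*
 * Declare an array typedef whose length is 1 when cond holds and -1 when
 * it does not. A negative array length is rejected by the compiler, so a
 * false condition stops the build. The message argument is not used,
 * which is why the resulting diagnostic is less helpful than a real
 * _Static_assert.
 */
#define MY_STATIC_ASSERT(cond, msg) \
    typedef char MY_CONCAT(my_static_assert_line_, __LINE__)[(cond) ? 1 : -1]

MY_STATIC_ASSERT(sizeof(char) == 1, "char must be one byte");
MY_STATIC_ASSERT(sizeof(long long) >= 8, "long long must be at least 64 bits");

/* Returns 1; getting as far as run time means both checks above held. */
int static_assert_demo(void)
{
    return 1;
}
```

Changing either condition to something false (for example sizeof(char) == 2) should make the compile fail at the typedef, which is the whole point of the trick.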
Re: svn commit: r322875 - head/sys/dev/nvme
> Author: imp
> Date: Fri Aug 25 04:33:06 2017
> New Revision: 322875
> URL:
> https://svnweb.freebsd.org/changeset/base/322875
>
> Log:
> Use _Static_assert
>
> These files are compiled in userland too, so we can't use sys/systm.h
> and rely on CTASSERT. Switch to using _Static_assert instead.
>
> MFC After: 3 days
> Sponsored by: Netflix
>
> Modified:
> head/sys/dev/nvme/nvme.h
> head/sys/dev/nvme/nvme_util.c

As I remember _Static_assert is from C11, not the older C99.

As I understand head/sys/dev/nvme/nvme.h use by C++ code could now reject attempts to use _Static_assert .

There has been at least one old bugzilla report for such. An example is 205453 (back around 2015-Dec). From back then:

> # more main.cc
> #include "/usr/include/sys/cdefs.h"
> _Static_assert(1,"Test");
> int main(void)
> {
> return 0;
> }
>
> For example:
>
> # g++49 main.cc
> main.cc:2:15: error: expected constructor, destructor, or type conversion
> before '(' token
> _Static_assert(1,"Test");
> . . .
> g++49, g++5, and powerpc64-portbld-freebsd11.0-g++ all reject the above
> source the same way that libcxxrt/guard.cc compiles are rejected during
> powerpc64-portbld-freebsd11.0-g++ based buildworld lib32 -m32 compiles.
>
> gcc49, gcc5, and powerpc64-portbld-freebsd11.0-gcc all accept the above
> instead (when in main.c instead of main.cc so it is handled as C code), with
> or without the include. _Static_assert is specific to C11 and is not part of
> C++. It takes explicit definitions to make the syntax acceptable as C++.
>
> Note: clang++ (3.7) accepts the use of the C11 _Static_assert, with or
> without the include, going well outside the C++ language definition.
>
> . . .
>
> Fixed in r297299 .

(The context was a C++ file head/contrib/libcxxrt/guard.cc so C++'s static_assert was used instead and -std=c++11 was added for the library in question [libcxxrt].)

Unless head/sys/dev/nvme/nvme.h is not to be used from C++ code: use of _Static_assert in the header would appear to be a problem.
===
Mark Millard
markmi at dsl-only.net
UNAME_r () and OSVERSION (1101501) do not agree on major version number , which poudriere bulk rejects as a combination.
: /usr/local
Categories : ports-mgmt
Licenses : BSD2CLAUSE
Maintainer : bdrew...@freebsd.org
WWW: https://github.com/freebsd/poudriere/wiki
Comment: Port build and test system
Options:
EXAMPLES : on
QEMU : off
ZSH: on
Annotations:
repo_type : binary
repository : FreeBSD
Flat size : 2.09MiB
Description:
poudriere is a tool primarily designed to test package production on FreeBSD. However, most people will find it useful to bulk build ports for FreeBSD.
WWW: https://github.com/freebsd/poudriere/wiki

I tried to configure and use a

-m null -M /usr/obj/DESTDIRs/FBSDx6411SL-installworld-dist-from-src

based jail and a

-m null -M /usr/ports

based ports . . .

After configuring, my attempted poudriere bulk did the following:

[00:00:00] Creating the reference jail... done
[00:00:04] Mounting system devices for zrFBSDx64SLjail-default
[00:00:04] Mounting ports/packages/distfiles
[00:00:04] Converting package repository to new format
[00:00:04] Stashing existing package repository
[00:00:04] Mounting packages from: /usr/local/poudriere/data/packages/zrFBSDx64SLjail-default
/etc/resolv.conf -> /usr/local/poudriere/data/.m/zrFBSDx64SLjail-default/ref/etc/resolv.conf
[00:00:04] Starting jail zrFBSDx64SLjail-default
make: "/usr/ports/Mk/bsd.port.mk" line 1177: UNAME_r () and OSVERSION (1101501) do not agree on major version number.
[00:00:05] Logs: /usr/local/poudriere/data/logs/bulk/zrFBSDx64SLjail-default/2017-08-13_17h45m28s
[00:00:05] Loading MOVED
make: "/usr/ports/Mk/bsd.port.mk" line 1177: UNAME_r () and OSVERSION (1101501) do not agree on major version number.
[00:00:06] Error: Error looking up pre-build ports vars
[00:00:06] Cleaning up
[00:00:09] Unmounting file systems

And at this point we are at what I put in the summary.

===
Mark Millard
markmi at dsl-only.net
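To make the reported mismatch concrete: in the log above UNAME_r expands to an empty string while OSVERSION is 1101501, so no major version can be read from the former at all. The sketch below is a hypothetical C restatement of such a comparison, not the actual bsd.port.mk logic. The function names are made up for illustration, and it assumes the usual __FreeBSD_version layout in which the digits above the lowest five give the major version.

```c
#include <stdlib.h>

/* Major version encoded in an OSVERSION / __FreeBSD_version style value
 * (assumed layout: major * 100000 + lower-order fields). */
int osversion_major(long osversion)
{
    return (int)(osversion / 100000);
}

/* Leading integer of a `uname -r` style string such as "11.1-STABLE";
 * returns -1 when the string is missing, empty, or has no leading digits. */
int uname_r_major(const char *uname_r)
{
    char *end;
    long major;

    if (uname_r == NULL || *uname_r == '\0')
        return -1;
    major = strtol(uname_r, &end, 10);
    if (end == uname_r)
        return -1;
    return (int)major;
}

/* 1 when both sources agree on the major version, 0 otherwise. */
int versions_agree(const char *uname_r, long osversion)
{
    return uname_r_major(uname_r) == osversion_major(osversion);
}
```

With these definitions, versions_agree("11.1-STABLE", 1101501) is 1, while versions_agree("", 1101501) is 0, which corresponds to the empty "UNAME_r ()" case in the poudriere output.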
Re: stack_guard hardening bsdinstall option in STABLE and 11.1
Vlad K. vlad-fbsd at acheronmedia.com wrote on Mon Jul 17 15:03:11 UTC 2017 : > I also asked why wasn't the bsdinstall-er option change > MFC'd after 1 day, two weeks ago, whether it's by omission, simply > ENOTIME, or something else... Given what Konstantin Belousov described (default stack space sizes and apparently guard pages eat into stack space instead of the overall space being bigger by the guard size), I think that would explain not moving from CURRENT: it was known to be a problem. (Although I expect Konstantin Belousov's note here is the first public description of the problem's details.) I agree that you did not get an answer for the other part: > I simply asked if it's safe to assume the sysctl to be an integer in > 11.1 I've not gone through any draft 11.1-release code to check. === Mark Millard markmi at dsl-only.net ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work
On 2017-Jun-29, at 3:10 AM, Gerald Pfeifer wrote:

> Am 28. Juni 2017 22:38:52 GMT+08:00 schrieb Mark Millard <markmi at dsl-only.net>:
>> A primary test is building lang/gcc5-devel under release/11.0.1
>> and then using it under stable/11 or some draft of release/11.1.0 .
>
> Thank you, Mark. Let me know how it went. In the meantime I'll prepare the
> change for gcc5 itself.

I'm not currently set up to run more than head on any of amd64, powerpc64, powerpc, aarch64, or armv6/7 (which are all I target). And I'm in the middle of attempting a fairly large jump to head -r320458 on those. (powerpc 32-bit and 64-bit just failed for libc++ time-usage compiling now that 32-bit has 64-bit time_t, including in world32/lib32 contexts for powerpc64.)

It will likely be a while before I manage to have an 11.x context (without losing my head contexts), much less examples from all "my" 5 TARGET_ARCH's. (Given past wchar_t type handling problems (e.g.) for gcc targeting powerpc family members I think it should be checked.) I'll have to find and set up disks: I do not even have such handy/ready at the moment.

[I got into this area by being asked questions, not by my direct use of release/11.0.1 , stable/11 , or a draft of release/11.1.0 .]

I'll let you know when I have some test results but others may get some before I do.

> . . .
>> Eventually most of the lang/gcc* 's will need whatever
>> technique is used.
>
> Yes, agreed. Version 5 is most important since it's the default; then 6; 4.x
> is for retro computing fans ;-), so 7 will then be next.

[In my normal/head environment I'm switching to lang/gcc7-devel for gcc (from lang/gcc6 ) but I'm odd that way.]

===
Mark Millard
markmi at dsl-only.net
Re: lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work
On 2017-Jun-28, at 3:21 AM, Gerald Pfeifer <ger...@pfeifer.com> wrote:

> I am testing a patch for gcc5-devel right now that will disable fixincludes
> (or rather its fixed files) being packaged.
>
> Should that work fine for you, I will push this back to gcc5 the following
> days.
>
> That said, the change that triggered this is what I would expect on CURRENT,
> not STABLE (and hence hoped we'd have more time for this change).
>
> My Internet connectivity right now is only slightly above pigeon speed, so
> sorry for any delays.

Thanks!

Some notes:

A primary test is building lang/gcc5-devel under release/11.0.1 and then using it under stable/11 or some draft of release/11.1.0 .

It looks like the lang/gcc5-devel build still creates and uses the headers that go in include-fixed/ but that they are removed from ${STAGEDIR}${TARGLIB} 's tree before installation or packaging.

So, if I understand right, lang/gcc5-devel itself still does use the adjusted headers to produce its own materials but when lang/gcc5-devel is used later it does not. Definitely something to be testing since it is a mix overall.

Is some form of exp-like run needed that tries to force use of a release/11.0.1 built lang/gcc5-devel (-r444563) to build other things under, say, stable/11 or some draft of release/11.1.0 ? Is this odd combination even possible currently?

A normal exp-run on release/11.0.1 without a system version switch being involved also seems appropriate. The same could be said of an exp-run based on a release/11.1.0 draft for both building lang/gcc5-devel and using it to build other things.

I had hoped that the Linux From Scratch technique of doing:

sed -i 's@\./fixinc\.sh@-c true@' gcc/Makefile.in

(or an equivalent) before gcc/Makefile.in is used would allow lang/gcc5-devel to use the same headers in its build that the installed compiler would then use to produce other code --by avoiding generating most of the adjusted files in the first place. But I guess that did not work out.

Eventually most of the lang/gcc* 's will need whatever technique is used.

Some, such as lang/gcc6-aux, need more done because of binary bootstrap materials being downloaded and used and so the build of lang/gcc6-aux gets the problem and fails before staging happens: the binary-bootstrap materials need to avoid the adjusted headers that they currently contain.

===
Mark Millard
markmi at dsl-only.net
Re: lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work
Top post on one point. . .

Patrick Powell papowell at astart.com wrote on Mon Jun 26 14:10:44 UTC 2017 (He was quoting Gerald. I was also part of some earlier discussions.)

> (Luckily this only hits with most -CURRENT versions of FreeBSD and
> older packages only.)
>
> Gerald

Unfortunately this part is false if it is about the vm_ooffset_t and vm_pindex_t issue: stable/11/ and release/11.1.0/ also have the vm_ooffset_t and vm_pindex_t issue vs. lang/gcc* packages built by release/11.0.1/ .

The issue is not limited to head (12) at this point: Installing a gcc* package built by release/11.0.1/ fails now for stable/11/ and the drafts of release/11.1.0/ . Anyone progressing to one of those has to build the lang/gcc* of interest from source under the newer system context. (Mixing source builds and package builds is discouraged as I understand.)

I'm not claiming which specific handling needs to be made. But the vm_ooffset_t and vm_pindex_t changes did not even make the UPDATING notes. Right now things look to have the worst combination for lang/gcc* when release/11.1.0/ becomes official: lang/gcc* 's break without notification or suggestion of a workaround.

===
Mark Millard
markmi at dsl-only.net
lang/gcc* package builds vs. release/11.0.1/ and the future release/11.1.0 because of vm_ooffset_t and vm_pindex_t changes and how the lang/gcc* work
The following is based mostly on an extraction from a private exchange in which a question was asked and my answer was unsettling: incompatibilities within the 11.* family.

I would not normally send to re but doing so was explicitly mentioned. Hopefully this example is reasonable for doing that.

Aspect #0: what is broken currently (and in the future?) within the 11.* family?

lang/gcc* packages built on release/11.0.1/ do not work fully on stable/11/ or on the drafts of release/11.1.0/ . (I leave releng/11.*/'s implicit.)

The change involved is -r313194 in head, which was described with:

> Define the vm_ooffset_t and vm_pindex_t types as machine-independend.
>
> The types are for the byte offset and page index in vm object. They
> are similar to off_t, which is defined as 64bit MI integer. Using MI
> definitions will allow to provide consistent MD values of vm
> object-related maximum sizes.

The known issue is the generation of header dependencies in the lang/gcc* builds on release/11.0.1/ that when used on stable/11/ or release/11.1.0/ generate reports like:

/usr/local/lib/gcc5/gcc/x86_64-portbld-freebsd11.0/5.4.0/include-fixed/sys/types.h:266:9: error: '__vm_ooffset_t' does not name a type
typedef __vm_ooffset_t vm_ooffset_t;
^
/usr/local/lib/gcc5/gcc/x86_64-portbld-freebsd11.0/5.4.0/include-fixed/sys/types.h:268:9: error: '__vm_pindex_t' does not name a type
typedef __vm_pindex_t vm_pindex_t;
^
*** [CoinFactorization2.lo] Error code 1

Unfortunately UPDATING was not updated for head/'s -r313194 (2017-Feb-4) --nor for stable/11/'s -r313574 (2017-Feb-11), the MFC. (No MFC was made to stable/10/ or to release/10.3.0 as far as I found.)

(These changes predate the INO64 issue in head/ . Head ends up with more issues than I'm dealing with here.)

Aspect #1: what 11.* version builds the pre-built packages targeting 11.* and the apparent consequences (given the vm_ooffset_t and vm_pindex_t changes and the lang/gcc* build behavior)

This is the unsettling part for pre-built packages: incompatibilities within the 11.* family for the lang/gcc* packages.

http://portsmon.freebsd.org/portoverview.py?category=%3Bamng=gcc5=

shows categories for builds for

8.4
9.3
10.1
10.3
11.0
head

(Nothing for stable/*/ .) But the 10.3 rows show no package builds. I would guess that they start once 10.1 stops (approximately).

So it may be that 11.1 will not get package builds until 11.0 stops (approximately). If so, unless lang/gcc* are changed to bootstrap differently they will configure to match release/11.0.1/ and will not be compatible with the vm_ooffset_t and vm_pindex_t changes in stable/11/ and release/11.1.0/ .

But as I understand it, updating how the lang/gcc* builds work to remove such dependencies is under investigation. I do not know any timing relative to release/11.1.0/ if my understanding is right.

Until then (if I was right): Unless there are separate packages made for targeting release/11.0.1/ vs. release/11.1.0/ it is not obvious when lang/gcc* packages will be generally compatible with various folks' choices about what to install as the system version within the release/11.*/ and stable/11/ family. This would likely be true even if they were built on release/11.1.0/ : then release/11.0.1/ likely would have compatibility problems.

The ABI versioning does not cover the specific issues involved based on how vm_ooffset_t and vm_pindex_t were changed and what the lang/gcc* builds do relative to such changes. Yet there is incompatibility for some fairly-significant-usage ports.

Aspect #2: stable/10/ and release/10.4.0/

Just covered for completeness: I do not see a MFC of -r313194 to stable/10/ : its sys/sys/types.h dates back to 2015-Oct-10. So it looks like 10.x has a permanent difference in this area: 10.x continues to get separate lang/gcc* package builds from 11.x and later. No problem for this context as far as I know.

Note: To simplify I choose to not be explicit about what authors wrote what original text. If that becomes an issue, it is correctable. Blame me for any errors in the above.

===
Mark Millard
markmi at dsl-only.net
Re: GCC + FreeBSD 11.0 Stable - stat.h does not have vm_ooffset_t definition
Gerald Pfeifer gerald at pfeifer.com wrote on Sun Apr 30 15:20:35 UTC 2017 :

> That, or run the fixinc.sh script in
> ./libexec/gcc/$TARGETTRIPLET/$VERSION/install-tools/fixinc.sh.

fixinc.sh is designed to be run by (for the */* involved):

bootstrap/libexec/gcc/*/*/install-tools/mkheaders

and that mkheaders does more than just fixinc.sh as far as changing headers goes, such as limits.h and gsyslimits.h and syslimits.h .

In more detail: The mkheaders core loop looks like:

for ml in `cat ${itoolsdatadir}/fixinc_list`; do
  sysroot_headers_suffix=`echo ${ml} | sed -e 's/;.*$//'`
  multi_dir=`echo ${ml} | sed -e 's/^[^;]*;//'`
  subincdir=${incdir}${multi_dir}
  . ${itoolsdatadir}/mkheaders.conf
  if [ x${STMP_FIXINC} != x ] ; then
    TARGET_MACHINE="${target}" target_canonical="${target}" \
    MACRO_LIST="${itoolsdatadir}/macro_list" \
    /bin/sh ./fixinc.sh ${subincdir} \
    ${isysroot}${SYSTEM_HEADER_DIR} ${OTHER_FIXINCLUDES_DIRS}
    rm -f ${subincdir}/syslimits.h
    if [ -f ${subincdir}/limits.h ]; then
      mv ${subincdir}/limits.h ${subincdir}/syslimits.h
    else
      cp ${itoolsdatadir}/gsyslimits.h ${subincdir}/syslimits.h
    fi
  fi
  cp ${itoolsdatadir}/include${multi_dir}/limits.h ${subincdir}
done

Note that mkheaders also provides various definitions to fixinc.sh, such as MACRO_LIST . Direct use of fixinc.sh likely requires providing appropriate alternate definitions for such.

I'll note that:

http://www.linuxfromscratch.org/lfs/view/7.1/chapter06/gcc.html

reports as one of its steps (quote):

The fixincludes script is known to occasionally erroneously attempt to "fix" the system headers installed so far. As the headers up to this point are known to not require fixing, issue the following command to prevent the fixincludes script from running:

sed -i 's@\./fixinc\.sh@-c true@' gcc/Makefile.in

(End quote)

So it seems that disabling fixinc.sh's use is fairly common when the headers are known to "not require fixing" (i.e., are known to already be gcc compliant).

This still leaves the limits.h and gsyslimits.h and syslimits.h code in place but does block most of the activity.

===
Mark Millard
markmi at dsl-only.net
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-21, at 7:21 PM, Mark Millard wrote: > On 2017-Mar-18, at 9:10 PM, Mark Millard wrote: > >> >> On 2017-Mar-18, at 5:53 PM, Mark Millard wrote: >> >>> A new, significant discovery follows. . . >>> >>> While checking out use of procstat -v I ran >>> into the following common property for the 3 >>> programs that I looked at: >>> >>> A) My small test program that fails for >>> a dynamically allocated space. >>> >>> B) sh reporting Failed assertion: "tsd_booted". >>> >>> C) su reporting Failed assertion: "tsd_booted". >>> >>> Here are example addresses from the area of >>> incorrectly zeroed memory (A then B then C): >>> >>> (lldb) print dyn_region >>> (region *volatile) $0 = 0x40616000 >>> >>> (lldb) print &__je_tsd_booted >>> (bool *) $0 = 0x40618520 >>> >>> (lldb) print &__je_tsd_booted >>> (bool *) $0 = 0x40618520 >> >> That last above was a copy/paste error. Correction: >> >> (lldb) print &__je_tsd_booted >> (bool *) $0 = 0x4061d520 >> >>> The first is from dynamic allocation ending up >>> in the area. The other two are from libc.so.7 >>> globals/statics ending up in the general area. >>> >>> It looks like something is trashing a specific >>> memory area for some reason, rather independently >>> of what the program specifics are. > > I probably should have noted that the processes > involved were: child/parent then grandparent > and then great grandparent. The grandparent > was sh and the great grandparent was su. > > The ancestors in the process tree are being > damaged, not just the instances of the > program that demonstrates the problem. > >>> Other notes: >>> >>> At least for my small program showing failure: >>> >>> Being explicit about the combined conditions for failure >>> for my test program. . . >>> >>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS >>> are required in order to make the program fail. >>> >>> Note: >>> >>> lldb) print __je_tcache_maxclass >>> (size_t) $0 = 32768 >>> >>> which is larger than SMALL_MAXCLASS. 
I've not observed >>> failures for sizes above SMALL_MAXCLASS but not exceeding >>> __je_tcache_maxclass. >>> >>> Thus tcache use by itself does not seen sufficient for >>> my program to get corruption of its dynamically allocated >>> memory: the small allocation size also matters. >>> >>> >>> Be warned that I can not eliminate the possibility that >>> the trashing changed what region of memory it trashed >>> for larger allocations or when tcache is disabled. >> >> The pine64+ 2GB eventually got into a state where: >> >> /etc/malloc.conf -> tcache:false >> >> made no difference and the failure kept occurring >> with that symbolic link in place. >> >> But after a reboot of the pin46+ 2GB >> /etc/malloc.conf -> tcache:false was again effective >> for my test program. (It was still present from >> before the reboot.) >> >> I checked the .core files and the allocated address >> assigned to dyn_region was the same in the tries >> before and after the reboot. (I had put in an >> additional raise(SIGABRT) so I'd always have >> a core file to look at.) >> >> Apparently /etc/malloc.conf -> tcache:false was >> being ignored before the reboot for some reason? > > I have also discovered that if the child process > in an example like my program does a: > > (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED); > > after the fork but before the sleep/swap-out/wait > then the problem does not happen. This is without > any read or write access to the memory between the > fork and sleep/swap-out/wait. > > By contrast such POSIX_MADV_WILLNEED use in the parent > process does not change the failure behavior. I've added another test program to bugzilla 217239 and 217138, one with thousands of 14 KiByte allocations. The test program usually ends up with them all being zeroed in the parent and child of the fork. But I've had a couple of runs where a much smaller prefix was messed up and then there were normal, expected values. #define region_size (14u*1024u) . . . 
#define num_regions (256u*1024u*1024u/region_size)

So num_regions==18724, using up most of 256 MiBytes.

Note: each region has its own 14 KiByte allocation.

But dyn_regions[1296].array[0] in one example was the first normal value. In another example dyn_regions[2180].array[4096] was the first normal value.

The last is interesting for being part way through an allocation's space. That it aligns with a 4 KiByte page size would seem odd for a pure-jemalloc issue.

===
Mark Millard
markmi at dsl-only.net
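The arithmetic behind those figures can be checked with a small sketch. The two macros mirror the #define lines quoted in the message. Everything else (the function names, the 4096-byte page size constant, and treating the array index as a byte offset) is an assumption made for illustration, since the full test program is only in the bugzilla reports.

```c
/* Mirrors the #define lines quoted in the message. */
#define REGION_SIZE (14u * 1024u)
#define NUM_REGIONS (256u * 1024u * 1024u / REGION_SIZE)

/* Assumption: 4 KiByte pages, as the message's closing remark suggests. */
#define ASSUMED_PAGE_SIZE 4096u

/* Number of 14 KiByte regions; the integer division truncates. */
unsigned region_count(void)
{
    return NUM_REGIONS;
}

/* Bytes of the 256 MiBytes not covered by whole regions. */
unsigned leftover_bytes(void)
{
    return 256u * 1024u * 1024u - NUM_REGIONS * REGION_SIZE;
}

/* 1 when a byte offset is an exact multiple of the assumed page size. */
int is_page_multiple(unsigned byte_offset)
{
    return byte_offset % ASSUMED_PAGE_SIZE == 0;
}
```

Under those assumptions an offset of 4096 into an allocation is exactly one page in, while the 14 KiByte region size itself is 3.5 pages and so is not a page multiple.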
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-18, at 9:10 PM, Mark Millard <mar...@dsl-only.net> wrote: > > On 2017-Mar-18, at 5:53 PM, Mark Millard <mar...@dsl-only.net> wrote: > >> A new, significant discovery follows. . . >> >> While checking out use of procstat -v I ran >> into the following common property for the 3 >> programs that I looked at: >> >> A) My small test program that fails for >> a dynamically allocated space. >> >> B) sh reporting Failed assertion: "tsd_booted". >> >> C) su reporting Failed assertion: "tsd_booted". >> >> Here are example addresses from the area of >> incorrectly zeroed memory (A then B then C): >> >> (lldb) print dyn_region >> (region *volatile) $0 = 0x40616000 >> >> (lldb) print &__je_tsd_booted >> (bool *) $0 = 0x40618520 >> >> (lldb) print &__je_tsd_booted >> (bool *) $0 = 0x40618520 > > That last above was a copy/paste error. Correction: > > (lldb) print &__je_tsd_booted > (bool *) $0 = 0x4061d520 > >> The first is from dynamic allocation ending up >> in the area. The other two are from libc.so.7 >> globals/statics ending up in the general area. >> >> It looks like something is trashing a specific >> memory area for some reason, rather independently >> of what the program specifics are. I probably should have noted that the processes involved were: child/parent then grandparent and then great grandparent. The grandparent was sh and the great grandparent was su. The ancestors in the process tree are being damaged, not just the instances of the program that demonstrates the problem. >> Other notes: >> >> At least for my small program showing failure: >> >> Being explicit about the combined conditions for failure >> for my test program. . . >> >> Both tcache enabled and allocations fitting in SMALL_MAXCLASS >> are required in order to make the program fail. >> >> Note: >> >> lldb) print __je_tcache_maxclass >> (size_t) $0 = 32768 >> >> which is larger than SMALL_MAXCLASS. 
I've not observed >> failures for sizes above SMALL_MAXCLASS but not exceeding >> __je_tcache_maxclass. >> >> Thus tcache use by itself does not seen sufficient for >> my program to get corruption of its dynamically allocated >> memory: the small allocation size also matters. >> >> >> Be warned that I can not eliminate the possibility that >> the trashing changed what region of memory it trashed >> for larger allocations or when tcache is disabled. > > The pine64+ 2GB eventually got into a state where: > > /etc/malloc.conf -> tcache:false > > made no difference and the failure kept occurring > with that symbolic link in place. > > But after a reboot of the pin46+ 2GB > /etc/malloc.conf -> tcache:false was again effective > for my test program. (It was still present from > before the reboot.) > > I checked the .core files and the allocated address > assigned to dyn_region was the same in the tries > before and after the reboot. (I had put in an > additional raise(SIGABRT) so I'd always have > a core file to look at.) > > Apparently /etc/malloc.conf -> tcache:false was > being ignored before the reboot for some reason? I have also discovered that if the child process in an example like my program does a: (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED); after the fork but before the sleep/swap-out/wait then the problem does not happen. This is without any read or write access to the memory between the fork and sleep/swap-out/wait. By contrast such POSIX_MADV_WILLNEED use in the parent process does not change the failure behavior. === Mark Millard markmi at dsl-only.net ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Unicode strangeness with lldb
Pete French petefrench at ingresso.co.uk wrote on
Mon Mar 20 14:55:44 UTC 2017:

> Using the lldb installed with 11-STABLE from an hour or so ago. Though
> I don't know when this started, as I have been using db until now.
>
> First command I type is fine, subsequent commands, every keypress I
> type looks like this:
>
> (lldb) \U+7F68\U+7F65\U+7F08
>
> That's an attempted 'bt' so something is adding 0x7F00 to it somewhere.
>
> Anyone else seeing this ? It's in a standard xterm, options '-sb +lc -en utf8'
> but I have tried with other options and it does the same thing.

There was a time a while back when I was seeing such
output in a head (12) context. I'm not sure about the
specific values after the +'s, but \U prefixed output
was definitely involved.

I was not explicitly using any of the options that
you list in those ssh sessions (from a macOS
environment).

I discovered that if I typed ^C it would output a
new prompt and start taking/displaying input
normally.

I've not had such an issue in a while. I never
managed to isolate what contributed to it
happening.

===
Mark Millard
markmi at dsl-only.net
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-18, at 5:53 PM, Mark Millard <mar...@dsl-only.net> wrote:

> A new, significant discovery follows. . .
>
> While checking out use of procstat -v I ran
> into the following common property for the 3
> programs that I looked at:
>
> A) My small test program that fails for
> a dynamically allocated space.
>
> B) sh reporting Failed assertion: "tsd_booted".
>
> C) su reporting Failed assertion: "tsd_booted".
>
> Here are example addresses from the area of
> incorrectly zeroed memory (A then B then C):
>
> (lldb) print dyn_region
> (region *volatile) $0 = 0x40616000
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x40618520
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x40618520

That last above was a copy/paste error. Correction:

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x4061d520

> The first is from dynamic allocation ending up
> in the area. The other two are from libc.so.7
> globals/statics ending up in the general area.
>
> It looks like something is trashing a specific
> memory area for some reason, rather independently
> of what the program specifics are.
>
>
> Other notes:
>
> At least for my small program showing failure:
>
> Being explicit about the combined conditions for failure
> for my test program. . .
>
> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
> are required in order to make the program fail.
>
> Note:
>
> (lldb) print __je_tcache_maxclass
> (size_t) $0 = 32768
>
> which is larger than SMALL_MAXCLASS. I've not observed
> failures for sizes above SMALL_MAXCLASS but not exceeding
> __je_tcache_maxclass.
>
> Thus tcache use by itself does not seem sufficient for
> my program to get corruption of its dynamically allocated
> memory: the small allocation size also matters.
>
>
> Be warned that I can not eliminate the possibility that
> the trashing changed what region of memory it trashed
> for larger allocations or when tcache is disabled.

The pine64+ 2GB eventually got into a state where:

/etc/malloc.conf -> tcache:false

made no difference and the failure kept occurring
with that symbolic link in place.

But after a reboot of the pine64+ 2GB
/etc/malloc.conf -> tcache:false was again effective
for my test program. (It was still present from
before the reboot.)

I checked the .core files and the allocated address
assigned to dyn_region was the same in the tries
before and after the reboot. (I had put in an
additional raise(SIGABRT) so I'd always have
a core file to look at.)

Apparently /etc/malloc.conf -> tcache:false was
being ignored before the reboot for some reason?

===
Mark Millard
markmi at dsl-only.net
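Since the question above is whether the /etc/malloc.conf symbolic link was in place and what it pointed at during each run, one way to record that alongside each test is sketched below. The helper names (link_target, has_option, print_malloc_conf) are hypothetical and not part of the original test program. The sketch assumes the link target is a comma-separated option string such as "tcache:false". It only shows what the link says at the time of the call; it cannot show whether the allocator in a given process actually honored it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the target of the symbolic link at `path` into `buf` as a
 * NUL-terminated string. Returns 0 on success, or -1 if `path` is not
 * a readable symlink or the target might have been truncated.
 */
static int link_target(const char *path, char *buf, size_t buflen)
{
    assert(path != NULL && buf != NULL && buflen > 1);

    /* readlink() does not NUL-terminate and truncates silently. */
    ssize_t n = readlink(path, buf, buflen - 1);
    if (n < 0 || (size_t)n >= buflen - 1)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Return 1 if the comma-separated string `conf` contains `opt` exactly. */
static int has_option(const char *conf, const char *opt)
{
    size_t optlen = strlen(opt);

    while (*conf != '\0') {
        const char *end = strchr(conf, ',');
        size_t len = end ? (size_t)(end - conf) : strlen(conf);

        if (len == optlen && strncmp(conf, opt, len) == 0)
            return 1;
        if (end == NULL)
            break;
        conf = end + 1;
    }
    return 0;
}

/* Print what the link at `path` currently says about tcache. */
static void print_malloc_conf(const char *path)
{
    char target[256];

    if (link_target(path, target, sizeof target) != 0) {
        printf("%s: not a readable symbolic link\n", path);
        return;
    }
    printf("%s -> %s (tcache:false %s)\n", path, target,
           has_option(target, "tcache:false") ? "present" : "absent");
}
```

Calling print_malloc_conf("/etc/malloc.conf") at the start of each test run would leave a record in the output that can be compared with the core files from before and after a reboot.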
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
A new, significant discovery follows. . .

While checking out use of procstat -v I ran
into the following common property for the 3
programs that I looked at:

A) My small test program that fails for
a dynamically allocated space.

B) sh reporting Failed assertion: "tsd_booted".

C) su reporting Failed assertion: "tsd_booted".

Here are example addresses from the area of
incorrectly zeroed memory (A then B then C):

(lldb) print dyn_region
(region *volatile) $0 = 0x40616000

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x40618520

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x40618520

The first is from dynamic allocation ending up
in the area. The other two are from libc.so.7
globals/statics ending up in the general area.

It looks like something is trashing a specific
memory area for some reason, rather independently
of what the program specifics are.


Other notes:

At least for my small program showing failure:

Being explicit about the combined conditions for failure
for my test program. . .

Both tcache enabled and allocations fitting in SMALL_MAXCLASS
are required in order to make the program fail.

Note:

(lldb) print __je_tcache_maxclass
(size_t) $0 = 32768

which is larger than SMALL_MAXCLASS. I've not observed
failures for sizes above SMALL_MAXCLASS but not exceeding
__je_tcache_maxclass.

Thus tcache use by itself does not seem sufficient for
my program to get corruption of its dynamically allocated
memory: the small allocation size also matters.


Be warned that I can not eliminate the possibility that
the trashing changed what region of memory it trashed
for larger allocations or when tcache is disabled.

===
Mark Millard
markmi at dsl-only.net
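To make "the general area" concrete, the arithmetic on the reported addresses can be sketched as follows. The su address used here is the corrected one from the follow-up message (0x4061d520), not the copy/paste duplicate shown above. The 4 KiB page size is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_ASSUMED 4096u

/* Addresses as reported by lldb (su value taken from the correction). */
#define ADDR_DYN_REGION    UINT64_C(0x40616000) /* test program: dyn_region  */
#define ADDR_SH_TSD_BOOTED UINT64_C(0x40618520) /* sh: &__je_tsd_booted      */
#define ADDR_SU_TSD_BOOTED UINT64_C(0x4061d520) /* su: &__je_tsd_booted      */

static uint64_t page_index(uint64_t addr)
{
    return addr / PAGE_SIZE_ASSUMED;
}

static uint64_t page_offset(uint64_t addr)
{
    return addr % PAGE_SIZE_ASSUMED;
}

/* Print one address with its page, in-page offset, and distance from base. */
static void describe(const char *label, uint64_t addr, uint64_t base)
{
    printf("%-22s 0x%08llx  page 0x%llx  offset 0x%03llx  base+%llu\n",
           label,
           (unsigned long long)addr,
           (unsigned long long)page_index(addr),
           (unsigned long long)page_offset(addr),
           (unsigned long long)(addr - base));
}

/* Print all three reported addresses relative to dyn_region. */
static void describe_all(void)
{
    describe("dyn_region", ADDR_DYN_REGION, ADDR_DYN_REGION);
    describe("sh __je_tsd_booted", ADDR_SH_TSD_BOOTED, ADDR_DYN_REGION);
    describe("su __je_tsd_booted", ADDR_SU_TSD_BOOTED, ADDR_DYN_REGION);
}
```

Under that page-size assumption the three addresses sit on pages 0x40616, 0x40618, and 0x4061d, so the sh and su variables are 9504 and 29984 bytes above dyn_region and all three fall within eight consecutive pages of their respective address spaces.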
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
[Summary: I've now tested on a rpi3 in addition to a
pine64+ 2GB. Both contexts show the problem.]

On 2017-Mar-16, at 2:07 AM, Mark Millard wrote:

> On 2017-Mar-15, at 11:07 PM, Scott Bennett wrote:
>
>> Mark Millard wrote:
>>
>>> [Something strange happened to the automatic CC: fill-in for my original
>>> reply. Also I should have mentioned that for my test program if a
>>> variant is made that does not fork the swapping works fine.]
>>>
>>> On 2017-Mar-15, at 9:37 AM, Mark Millard wrote:
>>>
>>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote:
>>>>
>>>>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>>>> wrote:
>>>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <ti...@cicely7.cicely.de> wrote:
>>>>>>
>>>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>>>>> failure from occurring. Even a small access to the memory at
>>>>>>>> that stage prevents the failure. Details follow.]
>>>>>>>
>>>>>>> Maybe a stupid question, since you might have written it somewhere.
>>>>>>> What medium do you swap to?
>>>>>>> I've seen broken firmware on microSD cards doing silent data
>>>>>>> corruption for some access patterns.
>>>>>>
>>>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>>>
>>>>>> Only the kernel is from the microSD card.
>>>>>>
>>>>>> I have several examples of the USB SSD model and have
>>>>>> never observed such problems in any other context.
>>>>>>
>>>>>> [remainder of irrelevant material deleted --SB]
>>>>>
>>>>> You gave a very long-winded non-answer to Bernd's question, so I'll
>>>>> repeat it here. What medium do you swap to?
>>>>
>>>> My wording of:
>>>>
>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>
>>>> was definitely poor. It should have explicitly mentioned the
>>>> swap partition too:
>>>>
>>>> The root filesystem and swap partition are both on the same
>>>> USB SSD on a powered hub.
>>>>
>>>> More detail from dmesg -a for usb:
>>>>
>>>> usbus0: 12Mbps Full Speed USB v1.0
>>>> usbus1: 480Mbps High Speed USB v2.0
>>>> usbus2: 12Mbps Full Speed USB v1.0
>>>> usbus3: 480Mbps High Speed USB v2.0
>>>> ugen0.1: at usbus0
>>>> uhub0: on usbus0
>>>> ugen1.1: at usbus1
>>>> uhub1: on
>>>> usbus1
>>>> ugen2.1: at usbus2
>>>> uhub2: on usbus2
>>>> ugen3.1: at usbus3
>>>> uhub3: on
>>>> usbus3
>>>> . . .
>>>> uhub0: 1 port with 1 removable, self powered
>>>> uhub2: 1 port with 1 removable, self powered
>>>> uhub1: 1 port with 1 removable, self powered
>>>> uhub3: 1 port with 1 removable, self powered
>>>> ugen3.2: at usbus3
>>>> uhub4 on uhub3
>>>> uhub4: on
>>>> usbus3
>>>> uhub4: MTT enabled
>>>> uhub4: 4 ports with 4 removable, self powered
>>>> ugen3.3: at usbus3
>>>> umass0 on uhub4
>>>> umass0: on usbus3
>>>> umass0: SCSI over Bulk-Only; quirks = 0x0100
>>>> umass0:0:0: Attached to scbus0
>>>> . . .
>>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>>> da0: Fixed Direct Access SPC-4 SCSI device
>>>> da0: Serial Number
>>>> da0: 40.000MB/s transfers
>>>>
>>>> (Edited a bit because there is other material interlaced, even
>>>> internal to some lines. Also: I removed the serial number of the
>>>> specific example device.)
>>
>> Thank you. That presents a much clearer picture.
>>>>
>>>>> I will further note that any kind of USB device cannot automatically
>>>>> be trusted to behave properly. USB devices are notorious, for example,
>>>>>
>>>>> [reasons why deleted --SB]
>>>>>
>>>>> You should identify where you page/swap to and then try substituting
>>>>> a different device for that function as a test to eliminate the >