Re: Ryzen public errata
On 13 June 2018 at 04:16, Eitan Adler wrote:
> On 13 June 2018 at 03:35, Konstantin Belousov wrote:
>> Today I noted that AMD published the public errata document for Ryzens,
>> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>>
>> Some of the issues listed there look quite relevant to the potential
>> hangs that some people still experience with the machines. I wrote
>> a script which should apply the recommended workarounds for the errata
>> that I find interesting.
>>
>> To run it, kldload cpuctl, then apply the latest firmware update to your
>> CPU, then run the following shell script. Comments indicate the errata
>> number for the workarounds.
>>
>> Please report the results. If the script helps, I will code the kernel
>> change to apply the workarounds.
>>
>> #!/bin/sh
>>
>> # Enable workarounds for errata listed in
>> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>>
>> # 1057, 1109
>> sysctl machdep.idle_mwait=0
>> sysctl machdep.idle=hlt
>
> Is this needed if it was previously machdep.idle: acpi ?

This might explain why I've never seen the lockup issues mentioned by
other people. What would cause my machine to differ from others?

--
Eitan Adler
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
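If the script's settings do help, they can be made persistent across reboots with an /etc/sysctl.conf fragment (FreeBSD-specific; a sketch restating the two sysctls from the quoted script, assuming both remain runtime-writable):

```
# /etc/sysctl.conf -- persist the errata 1057/1109 workaround from the
# quoted script (avoid MWAIT-based idling; use HLT instead).
machdep.idle_mwait=0
machdep.idle=hlt
```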
Re: A head buildworld race visible in the ci.freebsd.org build history
On 2018-Jun-18, at 6:03 PM, Mark Millard wrote:
> On 2018-Jun-18, at 4:08 PM, Bryan Drewery wrote:
>
>> On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
>>> ranlib -D libpcap.a
>>> ranlib: fatal: Failed to open 'libpcap.a'
>>
>> Where is this error even coming from? It's not in the usr.bin/ar code
>> and ranlib does not cause it.
>>
>> # ranlib -D uh
>> ranlib: warning: uh: no such file
>
> A more complete sequence is (with some other text mixed in, as in where
> I got the text from on ci.freebsd.org):
>
> --- libvgl.a ---
> building static vgl library
> ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
> --- all_subdir_lib/libsysdecode ---
> ranlib -D libsysdecode.a
> --- all_subdir_lib/libvgl ---
> ranlib -D libvgl.a
> ranlib: fatal: Failed to open 'libvgl.a'
> --- all_subdir_lib/libsysdecode ---
> ranlib: fatal: Failed to open 'libsysdecode.a'
> --- all_subdir_lib/libvgl ---
> *** [libvgl.a] Error code 70
>
> So, in essence:
>
> ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
> ranlib -D libvgl.a
> ranlib: fatal: Failed to open 'libvgl.a'
>
> It is not obvious to me that the "Failed to open" means that there was
> "no such file". Might there be some other form of "Failed to open" for
> a file that does exist, given that ar at least created its output .a
> file?

Also, if what varies is the head system version (failing vs. working)
while both use an 11.1R jail, then it would seem to be the underlying
head system software that matters for the ar -> ranlib sequence
behavior -- not 11.1R's ar or ranlib (or the 11.1R libraries they use),
and not head's ar or ranlib, which are unused here. The only parts of
head that could be involved are the parts that the 11.1R jail does not
avoid. This suggests more basic infrastructure in head to me.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
Re: A head buildworld race visible in the ci.freebsd.org build history
On 2018-Jun-18, at 4:08 PM, Bryan Drewery wrote:
> On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
>> ranlib -D libpcap.a
>> ranlib: fatal: Failed to open 'libpcap.a'
>
> Where is this error even coming from? It's not in the usr.bin/ar code
> and ranlib does not cause it.
>
> # ranlib -D uh
> ranlib: warning: uh: no such file

A more complete sequence is (with some other text mixed in, as in where
I got the text from on ci.freebsd.org):

--- libvgl.a ---
building static vgl library
ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
--- all_subdir_lib/libsysdecode ---
ranlib -D libsysdecode.a
--- all_subdir_lib/libvgl ---
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'
--- all_subdir_lib/libsysdecode ---
ranlib: fatal: Failed to open 'libsysdecode.a'
--- all_subdir_lib/libvgl ---
*** [libvgl.a] Error code 70

So, in essence:

ar -crD libvgl.a `NM='nm' NMFLAGS='' lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o | tsort -q`
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'

It is not obvious to me that the "Failed to open" means that there was
"no such file". Might there be some other form of "Failed to open" for a
file that does exist, given that ar at least created its output .a file?

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
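The failing two-step sequence can be reproduced in isolation to see what the normal behavior looks like (a sketch only: it uses a plain text member so no compiler is needed, whereas the real build archives .o files ordered by `lorder ... | tsort -q`):

```shell
#!/bin/sh
# Stand-in for the build step in question: create a deterministic (-D)
# static archive, then immediately run ranlib -D on it, mirroring the
# quoted "ar -crD ..." / "ranlib -D ..." pair.
set -e
workdir=$(mktemp -d)
cd "$workdir"
echo 'placeholder member' > member.txt
ar -crD libdemo.a member.txt   # -c create, -r replace, -D deterministic
ranlib -D libdemo.a            # regenerate the symbol index in place
ar -t libdemo.a                # list the archive members
```

In the CI failures, the ranlib step reports "Failed to open" on the archive that ar just created, which is what makes a race (or a filesystem-level problem on the builder) plausible.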
Re: review of nfsd rc.d script patch
On 15 Jun, Rick Macklem wrote:
> Hi,
>
> For the pNFS service MDS machine, the nfsd can't be started until all
> nfs mounts in /etc/fstab are done.
> I think that adding "mountcritremote" to the "# REQUIRE:" line is
> sufficient to do this?
>
> I don't think delaying the startup of the nfsd daemon until after any
> NFS mounts are done will do any harm, but if others think it would be
> a POLA violation, I could make this dependent on the pNFS service
> being enabled.
> Does anyone think this would cause a POLA violation?

Sounds like that would break cross mounts. Back in the olden days before
the automounter, I would set up workstation clusters with hosta
exporting local filesystem /home/hosta, and hostb exporting /home/hostb.
In addition, hosta would do a bg NFS mount of /home/hostb and hostb
would do a bg NFS mount of /home/hosta. That way everybody would have a
consistent view of everything.

If a power failure took down everything, the first system up would
export its local filesystem, and even though it wouldn't be able to
mount any remote filesystems, mount would background itself and the boot
would complete. As the remaining machines came up, they would be able to
mount the remote filesystems of the machines that came up earlier, and
the early machines would mount the filesystems from the later machines
as they became available.

If nfsd is delayed until all the NFS filesystems are mounted, the above
setup would deadlock.
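The cross-mount setup described above can be sketched as an fstab entry (hostnames hypothetical; the bg option is what lets boot complete while the peer is still down):

```
# /etc/fstab on hosta (hostb has the mirror-image entry for /home/hosta).
# bg: if the first mount attempt fails, mount forks and keeps retrying in
# the background, so rc can proceed and start nfsd.
hostb:/home/hostb  /home/hostb  nfs  rw,bg  0  0
```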
Re: A head buildworld race visible in the ci.freebsd.org build history
On 18 June 2018 at 19:29, Bryan Drewery wrote:
>
> The error is coming from libarchive which had a change between those
> revisions:
>
>> r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines

Li-Wen reported that the build is done in a 11.1-rel jail though, so the
libarchive (or any userland) change shouldn't be responsible.

Can we update a canary builder to somewhere between r328278 and r88?
Re: A head buildworld race visible in the ci.freebsd.org build history
On 6/18/2018 3:27 PM, Li-Wen Hsu wrote: > On Mon, Jun 18, 2018 at 5:04 PM Mark Millard via freebsd-toolchain > wrote: >> >> On 2018-Jun-18, at 12:42 PM, Bryan Drewery wrote: >> >>> On 6/15/2018 10:55 PM, Mark Millard wrote: In watching ci.freebsd.org builds I've seen a notable number of one time failures, such as (example from powerpc64): --- all_subdir_lib/libufs --- ranlib -D libufs.a ranlib: fatal: Failed to open 'libufs.a' *** [libufs.a] Error code 70 where the next build works despite the change being irrelevant to whatever ranlib complained about. Other builds failed similarly: --- all_subdir_lib/libbsm --- ranlib -D libbsm_p.a ranlib: fatal: Failed to open 'libbsm_p.a' *** [libbsm_p.a] Error code 70 and: --- kerberos5/lib__L --- ranlib -D libgssapi_spnego_p.a --- libgssapi_spnego.a --- ranlib -D libgssapi_spnego.a --- libgssapi_spnego_p.a --- ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' *** [libgssapi_spnego_p.a] Error code 70 and so on. It is not limited to powerpc64. For example, for aarch64 there are: --- libpam_exec.a --- building static pam_exec library ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` ranlib -D libpam_exec.a ranlib: fatal: Failed to open 'libpam_exec.a' *** [libpam_exec.a] Error code 70 and: --- all_subdir_lib/libusb --- ranlib -D libusb.a ranlib: fatal: Failed to open 'libusb.a' *** [libusb.a] Error code 70 and: --- all_subdir_lib/libbsnmp --- ranlib: fatal: Failed to open 'libbsnmp.a' --- all_subdir_lib/ncurses --- --- all_subdir_lib/ncurses/panelw --- --- panel.pico --- --- all_subdir_lib/libbsnmp --- *** [libbsnmp.a] Error code 70 Even amd64 gets such: --- libpcap.a --- ranlib -D libpcap.a ranlib: fatal: Failed to open 'libpcap.a' *** [libpcap.a] Error code 70 and: --- libkafs5.a --- ranlib: fatal: Failed to open 'libkafs5.a' --- libkafs5_p.a --- ranlib: fatal: Failed to open 'libkafs5_p.a' --- cddl/lib__L --- /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: note: include 
the header or explicitly provide a declaration for 'toupper' --- kerberos5/lib__L --- *** [libkafs5_p.a] Error code 70 make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 --- libkafs5.a --- *** [libkafs5.a] Error code 70 and: --- lib__L --- ranlib -D libclang_rt.asan_cxx-i386.a ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' *** [libclang_rt.asan_cxx-i386.a] Error code 70 (Notice the variability in what .a the ranlib's fail for.) >>> >>> >>> I looked at this a few days ago and don't believe it's actually a build >>> race. I think there is something wrong with the ar/ranlib on that system >>> or something else. I've found no evidence of concurrent building of the >>> .a files in question. >> >> >> Looking at a bunch of the failures, spanning multiple >> FreeBSD-head-*-build types of builds, I see only: >> >> NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast >> NODE_NAME butler1.nyi.freebsd.org >> >> for the failures that I looked at. >> >> So your "on that system" might well be correct. > > Thanks for the insight, the build is done in a 11.1-R jail on a > -CURRENT host. butler1.nyi is running r88 (as a canary) while > other builders are mostly running r328278. I upgraded few others and > it seems can reproduce the issue, and now I downgraded all the build > slaves to r328278 before we find the root cause. > The error is coming from libarchive which had a change between those revisions: > > r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines > > MFV r328323,328324: > Sync libarchive with vendor. 
>
> Relevant vendor changes:
>   PR #893: delete dead ppmd7 alloc callbacks
>   PR #904: Fix archive freeing bug in bsdcat
>   PR #961: Fix ZIP format names
>   PR #962: Don't modify attributes for existing directories
>            when ARCHIVE_EXTRACT_NO_OVERWRITE is set
>   PR #964: Fix -Werror=implicit-fallthrough= for GCC 7
>   PR #970: zip: Allow backslash as path separator
>
> MFC after: 1 week

Nothing obvious stands out in the change to me though, from a brief
look.

--
Regards,
Bryan Drewery
Re: A head buildworld race visible in the ci.freebsd.org build history
On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
> ranlib -D libpcap.a
> ranlib: fatal: Failed to open 'libpcap.a'

Where is this error even coming from? It's not in the usr.bin/ar code
and ranlib does not cause it.

# ranlib -D uh
ranlib: warning: uh: no such file

--
Regards,
Bryan Drewery
Re: A head buildworld race visible in the ci.freebsd.org build history
On 6/18/2018 3:31 PM, Li-Wen Hsu wrote: > On Mon, Jun 18, 2018 at 6:27 PM Bryan Drewery wrote: >> >> On 6/18/2018 1:45 PM, Konstantin Belousov wrote: >>> On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote: On 6/15/2018 10:55 PM, Mark Millard wrote: > In watching ci.freebsd.org builds I've seen a notable > number of one time failures, such as (example from > powerpc64): > > --- all_subdir_lib/libufs --- > ranlib -D libufs.a > ranlib: fatal: Failed to open 'libufs.a' > *** [libufs.a] Error code 70 > > where the next build works despite the change being > irrelevant to whatever ranlib complained about. > > Other builds failed similarly: > > --- all_subdir_lib/libbsm --- > ranlib -D libbsm_p.a > ranlib: fatal: Failed to open 'libbsm_p.a' > *** [libbsm_p.a] Error code 70 > > and: > > --- kerberos5/lib__L --- > ranlib -D libgssapi_spnego_p.a > --- libgssapi_spnego.a --- > ranlib -D libgssapi_spnego.a > --- libgssapi_spnego_p.a --- > ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' > *** [libgssapi_spnego_p.a] Error code 70 > > and so on. > > > It is not limited to powerpc64. 
For example, for aarch64 > there are: > > --- libpam_exec.a --- > building static pam_exec library > ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` > ranlib -D libpam_exec.a > ranlib: fatal: Failed to open 'libpam_exec.a' > *** [libpam_exec.a] Error code 70 > > and: > > --- all_subdir_lib/libusb --- > ranlib -D libusb.a > ranlib: fatal: Failed to open 'libusb.a' > *** [libusb.a] Error code 70 > > and: > > --- all_subdir_lib/libbsnmp --- > ranlib: fatal: Failed to open 'libbsnmp.a' > --- all_subdir_lib/ncurses --- > --- all_subdir_lib/ncurses/panelw --- > --- panel.pico --- > --- all_subdir_lib/libbsnmp --- > *** [libbsnmp.a] Error code 70 > > > Even amd64 gets such: > > --- libpcap.a --- > ranlib -D libpcap.a > ranlib: fatal: Failed to open 'libpcap.a' > *** [libpcap.a] Error code 70 > > and: > > > --- libkafs5.a --- > ranlib: fatal: Failed to open 'libkafs5.a' > --- libkafs5_p.a --- > ranlib: fatal: Failed to open 'libkafs5_p.a' > --- cddl/lib__L --- > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: > note: include the header or explicitly provide a declaration > for 'toupper' > --- kerberos5/lib__L --- > *** [libkafs5_p.a] Error code 70 > > make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 > --- libkafs5.a --- > *** [libkafs5.a] Error code 70 > > and: > > > --- lib__L --- > ranlib -D libclang_rt.asan_cxx-i386.a > ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' > *** [libclang_rt.asan_cxx-i386.a] Error code 70 > > > (Notice the variability in what .a the ranlib's fail for.) > > > > > I looked at this a few days ago and don't believe it's actually a build race. I think there is something wrong with the ar/ranlib on that system or something else. I've found no evidence of concurrent building of the .a files in question. >>> >>> FWIW, I got the similar failure when I did last checks for the OFED >>> commit. For me, it was libgcc.a. 
>> If it was -lgcc_s then it's a known rare build race due to
>> tools/install.sh not handling -S.
>
> It seems a more general problem, this one:
>
> https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console
>
> calls for libcuse_p.a, while this one:
>
> https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console
>
> calls for libfifolog.a

Well, why is ar -> ranlib so special? Nothing else is failing. What
filesystem are these using for objdirs? What revision is the host
kernel?

--
Regards,
Bryan Drewery
Re: A head buildworld race visible in the ci.freebsd.org build history
On Mon, Jun 18, 2018 at 5:04 PM Mark Millard via freebsd-toolchain wrote: > > On 2018-Jun-18, at 12:42 PM, Bryan Drewery wrote: > > > On 6/15/2018 10:55 PM, Mark Millard wrote: > >> In watching ci.freebsd.org builds I've seen a notable > >> number of one time failures, such as (example from > >> powerpc64): > >> > >> --- all_subdir_lib/libufs --- > >> ranlib -D libufs.a > >> ranlib: fatal: Failed to open 'libufs.a' > >> *** [libufs.a] Error code 70 > >> > >> where the next build works despite the change being > >> irrelevant to whatever ranlib complained about. > >> > >> Other builds failed similarly: > >> > >> --- all_subdir_lib/libbsm --- > >> ranlib -D libbsm_p.a > >> ranlib: fatal: Failed to open 'libbsm_p.a' > >> *** [libbsm_p.a] Error code 70 > >> > >> and: > >> > >> --- kerberos5/lib__L --- > >> ranlib -D libgssapi_spnego_p.a > >> --- libgssapi_spnego.a --- > >> ranlib -D libgssapi_spnego.a > >> --- libgssapi_spnego_p.a --- > >> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' > >> *** [libgssapi_spnego_p.a] Error code 70 > >> > >> and so on. > >> > >> > >> It is not limited to powerpc64. 
For example, for aarch64 > >> there are: > >> > >> --- libpam_exec.a --- > >> building static pam_exec library > >> ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` > >> ranlib -D libpam_exec.a > >> ranlib: fatal: Failed to open 'libpam_exec.a' > >> *** [libpam_exec.a] Error code 70 > >> > >> and: > >> > >> --- all_subdir_lib/libusb --- > >> ranlib -D libusb.a > >> ranlib: fatal: Failed to open 'libusb.a' > >> *** [libusb.a] Error code 70 > >> > >> and: > >> > >> --- all_subdir_lib/libbsnmp --- > >> ranlib: fatal: Failed to open 'libbsnmp.a' > >> --- all_subdir_lib/ncurses --- > >> --- all_subdir_lib/ncurses/panelw --- > >> --- panel.pico --- > >> --- all_subdir_lib/libbsnmp --- > >> *** [libbsnmp.a] Error code 70 > >> > >> > >> Even amd64 gets such: > >> > >> --- libpcap.a --- > >> ranlib -D libpcap.a > >> ranlib: fatal: Failed to open 'libpcap.a' > >> *** [libpcap.a] Error code 70 > >> > >> and: > >> > >> > >> --- libkafs5.a --- > >> ranlib: fatal: Failed to open 'libkafs5.a' > >> --- libkafs5_p.a --- > >> ranlib: fatal: Failed to open 'libkafs5_p.a' > >> --- cddl/lib__L --- > >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: > >> note: include the header or explicitly provide a declaration > >> for 'toupper' > >> --- kerberos5/lib__L --- > >> *** [libkafs5_p.a] Error code 70 > >> > >> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 > >> --- libkafs5.a --- > >> *** [libkafs5.a] Error code 70 > >> > >> and: > >> > >> > >> --- lib__L --- > >> ranlib -D libclang_rt.asan_cxx-i386.a > >> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' > >> *** [libclang_rt.asan_cxx-i386.a] Error code 70 > >> > >> > >> (Notice the variability in what .a the ranlib's fail for.) > >> > >> > >> > >> > >> > > > > > > I looked at this a few days ago and don't believe it's actually a build > > race. I think there is something wrong with the ar/ranlib on that system > > or something else. 
I've found no evidence of concurrent building of the > > .a files in question. > > > Looking at a bunch of the failures, spanning multiple > FreeBSD-head-*-build types of builds, I see only: > > NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast > NODE_NAME butler1.nyi.freebsd.org > > for the failures that I looked at. > > So your "on that system" might well be correct. Thanks for the insight, the build is done in a 11.1-R jail on a -CURRENT host. butler1.nyi is running r88 (as a canary) while other builders are mostly running r328278. I upgraded few others and it seems can reproduce the issue, and now I downgraded all the build slaves to r328278 before we find the root cause. Li-Wen -- Li-Wen Hsu https://lwhsu.org ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A head buildworld race visible in the ci.freebsd.org build history
On Mon, Jun 18, 2018 at 6:27 PM Bryan Drewery wrote: > > On 6/18/2018 1:45 PM, Konstantin Belousov wrote: > > On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote: > >> On 6/15/2018 10:55 PM, Mark Millard wrote: > >>> In watching ci.freebsd.org builds I've seen a notable > >>> number of one time failures, such as (example from > >>> powerpc64): > >>> > >>> --- all_subdir_lib/libufs --- > >>> ranlib -D libufs.a > >>> ranlib: fatal: Failed to open 'libufs.a' > >>> *** [libufs.a] Error code 70 > >>> > >>> where the next build works despite the change being > >>> irrelevant to whatever ranlib complained about. > >>> > >>> Other builds failed similarly: > >>> > >>> --- all_subdir_lib/libbsm --- > >>> ranlib -D libbsm_p.a > >>> ranlib: fatal: Failed to open 'libbsm_p.a' > >>> *** [libbsm_p.a] Error code 70 > >>> > >>> and: > >>> > >>> --- kerberos5/lib__L --- > >>> ranlib -D libgssapi_spnego_p.a > >>> --- libgssapi_spnego.a --- > >>> ranlib -D libgssapi_spnego.a > >>> --- libgssapi_spnego_p.a --- > >>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' > >>> *** [libgssapi_spnego_p.a] Error code 70 > >>> > >>> and so on. > >>> > >>> > >>> It is not limited to powerpc64. 
For example, for aarch64 > >>> there are: > >>> > >>> --- libpam_exec.a --- > >>> building static pam_exec library > >>> ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` > >>> ranlib -D libpam_exec.a > >>> ranlib: fatal: Failed to open 'libpam_exec.a' > >>> *** [libpam_exec.a] Error code 70 > >>> > >>> and: > >>> > >>> --- all_subdir_lib/libusb --- > >>> ranlib -D libusb.a > >>> ranlib: fatal: Failed to open 'libusb.a' > >>> *** [libusb.a] Error code 70 > >>> > >>> and: > >>> > >>> --- all_subdir_lib/libbsnmp --- > >>> ranlib: fatal: Failed to open 'libbsnmp.a' > >>> --- all_subdir_lib/ncurses --- > >>> --- all_subdir_lib/ncurses/panelw --- > >>> --- panel.pico --- > >>> --- all_subdir_lib/libbsnmp --- > >>> *** [libbsnmp.a] Error code 70 > >>> > >>> > >>> Even amd64 gets such: > >>> > >>> --- libpcap.a --- > >>> ranlib -D libpcap.a > >>> ranlib: fatal: Failed to open 'libpcap.a' > >>> *** [libpcap.a] Error code 70 > >>> > >>> and: > >>> > >>> > >>> --- libkafs5.a --- > >>> ranlib: fatal: Failed to open 'libkafs5.a' > >>> --- libkafs5_p.a --- > >>> ranlib: fatal: Failed to open 'libkafs5_p.a' > >>> --- cddl/lib__L --- > >>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: > >>> note: include the header or explicitly provide a declaration > >>> for 'toupper' > >>> --- kerberos5/lib__L --- > >>> *** [libkafs5_p.a] Error code 70 > >>> > >>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 > >>> --- libkafs5.a --- > >>> *** [libkafs5.a] Error code 70 > >>> > >>> and: > >>> > >>> > >>> --- lib__L --- > >>> ranlib -D libclang_rt.asan_cxx-i386.a > >>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' > >>> *** [libclang_rt.asan_cxx-i386.a] Error code 70 > >>> > >>> > >>> (Notice the variability in what .a the ranlib's fail for.) > >>> > >>> > >>> > >>> > >>> > >> > >> > >> I looked at this a few days ago and don't believe it's actually a build > >> race. 
I think there is something wrong with the ar/ranlib on that system > >> or something else. I've found no evidence of concurrent building of the > >> .a files in question. > > > > FWIW, I got the similar failure when I did last checks for the OFED > > commit. For me, it was libgcc.a. > > > > If it was -lgcc_s then it's a known rare build race due to > tools/install.sh not handling -S. It seems a more general problem, this one: https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console calls for libcuse_p.a, while this one: https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console calls for libfifolog.a -- Li-Wen Hsu https://lwhsu.org ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A head buildworld race visible in the ci.freebsd.org build history
On 6/18/2018 1:45 PM, Konstantin Belousov wrote: > On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote: >> On 6/15/2018 10:55 PM, Mark Millard wrote: >>> In watching ci.freebsd.org builds I've seen a notable >>> number of one time failures, such as (example from >>> powerpc64): >>> >>> --- all_subdir_lib/libufs --- >>> ranlib -D libufs.a >>> ranlib: fatal: Failed to open 'libufs.a' >>> *** [libufs.a] Error code 70 >>> >>> where the next build works despite the change being >>> irrelevant to whatever ranlib complained about. >>> >>> Other builds failed similarly: >>> >>> --- all_subdir_lib/libbsm --- >>> ranlib -D libbsm_p.a >>> ranlib: fatal: Failed to open 'libbsm_p.a' >>> *** [libbsm_p.a] Error code 70 >>> >>> and: >>> >>> --- kerberos5/lib__L --- >>> ranlib -D libgssapi_spnego_p.a >>> --- libgssapi_spnego.a --- >>> ranlib -D libgssapi_spnego.a >>> --- libgssapi_spnego_p.a --- >>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' >>> *** [libgssapi_spnego_p.a] Error code 70 >>> >>> and so on. >>> >>> >>> It is not limited to powerpc64. 
For example, for aarch64 >>> there are: >>> >>> --- libpam_exec.a --- >>> building static pam_exec library >>> ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` >>> ranlib -D libpam_exec.a >>> ranlib: fatal: Failed to open 'libpam_exec.a' >>> *** [libpam_exec.a] Error code 70 >>> >>> and: >>> >>> --- all_subdir_lib/libusb --- >>> ranlib -D libusb.a >>> ranlib: fatal: Failed to open 'libusb.a' >>> *** [libusb.a] Error code 70 >>> >>> and: >>> >>> --- all_subdir_lib/libbsnmp --- >>> ranlib: fatal: Failed to open 'libbsnmp.a' >>> --- all_subdir_lib/ncurses --- >>> --- all_subdir_lib/ncurses/panelw --- >>> --- panel.pico --- >>> --- all_subdir_lib/libbsnmp --- >>> *** [libbsnmp.a] Error code 70 >>> >>> >>> Even amd64 gets such: >>> >>> --- libpcap.a --- >>> ranlib -D libpcap.a >>> ranlib: fatal: Failed to open 'libpcap.a' >>> *** [libpcap.a] Error code 70 >>> >>> and: >>> >>> >>> --- libkafs5.a --- >>> ranlib: fatal: Failed to open 'libkafs5.a' >>> --- libkafs5_p.a --- >>> ranlib: fatal: Failed to open 'libkafs5_p.a' >>> --- cddl/lib__L --- >>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: >>> note: include the header or explicitly provide a declaration for >>> 'toupper' >>> --- kerberos5/lib__L --- >>> *** [libkafs5_p.a] Error code 70 >>> >>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 >>> --- libkafs5.a --- >>> *** [libkafs5.a] Error code 70 >>> >>> and: >>> >>> >>> --- lib__L --- >>> ranlib -D libclang_rt.asan_cxx-i386.a >>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' >>> *** [libclang_rt.asan_cxx-i386.a] Error code 70 >>> >>> >>> (Notice the variability in what .a the ranlib's fail for.) >>> >>> >>> >>> >>> >> >> >> I looked at this a few days ago and don't believe it's actually a build >> race. I think there is something wrong with the ar/ranlib on that system >> or something else. I've found no evidence of concurrent building of the >> .a files in question. 
> FWIW, I got a similar failure when I did last checks for the OFED
> commit. For me, it was libgcc.a.

If it was -lgcc_s then it's a known rare build race due to
tools/install.sh not handling -S.

--
Regards,
Bryan Drewery
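For context, install(1)'s -S flag requests a "safe copy": the file is written to a temporary name and then rename(2)d into place, so a concurrent reader never sees a truncated target. The difference can be sketched by hand (file names hypothetical; this emulates the -S behavior rather than calling install itself):

```shell
#!/bin/sh
# Without -S, the target is rewritten in place, so a concurrent reader
# (a linker, or ranlib) can observe a partial file. The safe pattern is:
# stage the complete copy first, then atomically rename it into place
# (rename is atomic within a filesystem, so readers see either the old
# file or the complete new one).
set -e
stage=$(mktemp -d)
printf 'new archive contents\n' > "$stage/libfoo.a.new"
cp "$stage/libfoo.a.new" "$stage/libfoo.a.tmp"   # full copy first
mv "$stage/libfoo.a.tmp" "$stage/libfoo.a"       # atomic rename into place
cat "$stage/libfoo.a"
```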
Re: ESXi NFSv4.1 client id is nasty
Hi,

On 06/18/18 17:42, Rick Macklem wrote:
> Steve Wills wrote:
>> Would it be possible or reasonable to use the client ID to log a
>> message telling the admin to enable a sysctl to enable the hacks?
> Yes. However, this client implementation id is only seen by the server
> when the client makes a mount attempt. I suppose it could log the
> message and fail the mount, if the "hack" sysctl isn't set?

I hadn't thought of failing the mount, just defaulting to not enabling
the hacks unless the admin chooses to enable them, but at the same time
being proactive about telling the admin to enable them. I.e., keep the
implementation RFC-compliant, since we wouldn't be changing the behavior
based on the implementation ID, only based upon the admin setting the
sysctl, which we told them to do based on the implementation ID.

Just an idea; maybe Warner's suggestion is a better one.

Steve
Re: ESXi NFSv4.1 client id is nasty
Steve Wills wrote:
> Would it be possible or reasonable to use the client ID to log a
> message telling the admin to enable a sysctl to enable the hacks?
Yes. However, this client implementation id is only seen by the server
when the client makes a mount attempt. I suppose it could log the
message and fail the mount, if the "hack" sysctl isn't set?

rick
[stuff snipped]

From: Steve Wills
Sent: Monday, June 18, 2018 5:21:10 PM
To: Rick Macklem; freebsd-current@freebsd.org
Cc: andreas.n...@frequentis.com
Subject: Re: ESXi NFSv4.1 client id is nasty

Would it be possible or reasonable to use the client ID to log a message
telling the admin to enable a sysctl to enable the hacks?

Steve

On 06/17/18 08:35, Rick Macklem wrote:
> Hi,
>
> Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in
> ESXi 6.5u1 (VMware) against the FreeBSD server. I have given him a
> bunch of hackish patches to try and some of them do help. However, not
> all issues are resolved.
> The problem is that these hacks pretty obviously violate the NFSv4.1
> RFC (5661). (Details on these come later, for those interested in such
> things.)
>
> I can think of three ways to deal with this:
> 1 - Just leave the server as is and point people to the issues that
>     should be addressed in the ESXi client.
> 2 - Put the hacks in, but only enable them based on a sysctl not
>     enabled by default. (The main problem with this is when the server
>     also has non-ESXi mounts.)
> 3 - Enable the hacks for ESXi client mounts only, using the
>     implementation ID it presents at mount time in its ExchangeID
>     arguments.
>     - This is my preferred solution, but the RFC says:
>     An example use for implementation identifiers would be diagnostic
>     software that extracts this information in an attempt to identify
>     interoperability problems, performance workload behaviors, or
>     general usage statistics.
>     Since the intent of having access to this information is for
>     planning or general diagnosis only, the client and server MUST NOT
>     interpret this implementation identity information in a way that
>     affects interoperational behavior of the implementation. The
>     reason is that if clients and servers did such a thing, they might
>     use fewer capabilities of the protocol than the peer can support,
>     or the client and server might refuse to interoperate.
>
> Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that,
> since the hacks violate the RFC, then why not enable them in a way
> that violates the RFC.
>
> Anyhow, I would like to hear from others w.r.t. how they think this
> should be handled?
>
> Here's details on the breakage and workarounds for those interested,
> from looking at packet traces in wireshark:
>
> Fairly benign ones:
> - The client does a ReclaimComplete with one_fs == false and then does
>   a ReclaimComplete with one_fs == true. The server returns
>   NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client
>   doesn't like.
>   Workaround: Don't return an error for the one_fs == true case and
>   just treat it the same as "one_fs == false".
>   There is also a case where the client only does the ReclaimComplete
>   with one_fs == true. Since FreeBSD exports a hierarchy of file
>   systems, this doesn't indicate to the server that all reclaims are
>   done. (Other extant clients never do the "one_fs == true" variant of
>   ReclaimComplete.)
>   This case of just doing the "one_fs == true" variant is actually a
>   limitation of the server which I don't know how to fix. However, the
>   same workaround as listed above gets around it.
>
> - The client puts random garbage in the delegate_type argument for
>   Open/ClaimPrevious.
>   Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG,
>   it doesn't want a delegation, so assume OPEN_DELEGATE_NONE or
>   OPEN_DELEGATE_NONE_EXT instead of garbage. (Not sure which of the
>   two values makes it happier.)
> > Serious ones: > - The client does an OpenDowngrade with arguments set to OPEN_SHARE_ACCESS_BOTH >and OPEN_SHARE_DENY_BOTH. >Since OpenDowngrade is supposed to decrease share_access and share_deny, >the server returns NFS4ERR_INVAL. OpenDowngrade is not supposed to ever >conflict with another Open (a conflict happens when another Open has >set an OPEN_SHARE_DENY that denies the result of the OpenDowngrade), >so it cannot be failed with NFS4ERR_SHARE_DENIED. >I believe this one is done by the client for something it calls a >"device lock" and really doesn't like this failing. >Workaround: All I can think of is ignore the check for new bits not being > set >and reply NFS_OK, when no conflicting Open exists. >When there is a conflicting Open, returning NFS4ERR_INVAL seems to be > the >only option, since NFS4ERR_SHARE_DENIED isn't listed for OpenDowngrade. > > - W
Re: ESXi NFSv4.1 client id is nasty
My thoughts on this are mixed. You need certain workarounds, but they sound like they need to be on a per-client-type basis. On the one hand, you don't want to chat with different clients differently, but on the other you want it to work. I'd suggest a two-tiered approach. First, have a sysctl per workaround that's a list of client types to apply the workaround to. Have these default to ESX client, but allow for others. Second, have a master sysctl to turn on/off per-client workarounds. Have this default to off. And finally, see if you can get ESXi to fix their flaws. This is by far the best solution. The above should really only be a stop-gap, but would be extensible should this sort of thing become more of the norm than is desired. Warner On Mon, Jun 18, 2018 at 3:21 PM, Steve Wills wrote: > Would it be possible or reasonable to use the client ID to log a message > telling the admin to enable a sysctl to enable the hacks? > > Steve > > On 06/17/18 08:35, Rick Macklem wrote: > >> Hi, >> >> Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in >> ESXi 6.5u1 >> (VMware) against the FreeBSD server. I have given him a bunch of hackish >> patches >> to try and some of them do help. However not all issues are resolved. >> The problem is that these hacks pretty obviously violate the NFSv4.1 RFC >> (5661). >> (Details on these come later, for those interested in such things.) >> >> I can think of three ways to deal with this: >> 1 - Just leave the server as is and point people to the issues that >> should be addressed >> in the ESXi client. >> 2 - Put the hacks in, but only enable them based on a sysctl not enabled >> by default. >> (The main problem with this is when the server also has non-ESXi >> mounts.) >> 3 - Enable the hacks for ESXi client mounts only, using the >> implementation ID >> it presents at mount time in its ExchangeID arguments. 
>> - This is my preferred solution, but the RFC says: >> An example use for implementation identifiers would be diagnostic >> software that extracts this information in an attempt to identify >> interoperability problems, performance workload behaviors, or general >> usage statistics. Since the intent of having access to this >> information is for planning or general diagnosis only, the client and >> server MUST NOT interpret this implementation identity information in >> a way that affects interoperational behavior of the implementation. >> The reason is that if clients and servers did such a thing, they >> might use fewer capabilities of the protocol than the peer can >> support, or the client and server might refuse to interoperate. >> >> Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that, >> since the >> hacks violate the RFC, then why not enable them in a way that violates >> the RFC. >> >> Anyhow, I would like to hear from others w.r.t. how they think this >> should be handled? >> >> Here's details on the breakage and workarounds for those interested, from >> looking >> at packet traces in wireshark: >> Fairly benign ones: >> - The client does a ReclaimComplete with one_fs == false and then does a >>ReclaimComplete with one_fs == true. The server returns >>NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client >>doesn't like. >>Workaround: Don't return an error for the one_fs == true case and just >> assume >> the same as "one_fs == false". >>There is also a case where the client only does the >>ReclaimComplete with one_fs == true. Since FreeBSD exports a hierarchy >> of >>file systems, this doesn't indicate to the server that all reclaims >> are done. >>(Other extant clients never do the "one_fs == true" variant of >>ReclaimComplete.) >>This case of just doing the "one_fs == true" variant is actually a >> limitation >>of the server which I don't know how to fix. However the same >> workaround >>as listed above gets around it. 
>> >> - The client puts random garbage in the delegate_type argument for >>Open/ClaimPrevious. >>Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG, >> it doesn't >>want a delegation, so assume OPEN_DELEGATE_NONE or >> OPEN_DELEGATE_NONE_EXT >>instead of garbage. (Not sure which of the two values makes it >> happier.) >> >> Serious ones: >> - The client does an OpenDowngrade with arguments set to >> OPEN_SHARE_ACCESS_BOTH >>and OPEN_SHARE_DENY_BOTH. >>Since OpenDowngrade is supposed to decrease share_access and >> share_deny, >>the server returns NFS4ERR_INVAL. OpenDowngrade is not supposed to ever >>conflict with another Open (a conflict happens when another Open has >>set an OPEN_SHARE_DENY that denies the result of the OpenDowngrade), >>so it cannot be failed with NFS4ERR_SHARE_DENIED. >>I believe this one is done by the client for something it calls a >>"device lock" and really doesn't like this failing. >>Workar
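From an administrator's point of view, Warner's two-tiered scheme might look roughly like the following. All of these sysctl OIDs are hypothetical, invented here only to illustrate the shape of the proposal; none of them exist in the current NFS server:

```
# Hypothetical OIDs sketching the two-tiered proposal; nothing
# below exists in the tree today.

# Master switch, default 0: with it off, the server behaves strictly
# per RFC 5661 and no per-client workarounds are applied.
sysctl vfs.nfsd.client_workarounds=1

# One list per workaround, each holding implementation-ID substrings
# to match against the client's nii_name from ExchangeID; these could
# default to the ESXi client's identifier but remain extensible.
sysctl vfs.nfsd.hack_reclaim_complete="VMware ESXi"
sysctl vfs.nfsd.hack_open_downgrade="VMware ESXi"
```

The point of the split is that the master switch keeps the default behavior RFC-conformant, while the per-workaround lists let an admin scope each hack to exactly the client types that need it.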
Re: ESXi NFSv4.1 client id is nasty
Would it be possible or reasonable to use the client ID to log a message telling the admin to enable a sysctl to enable the hacks? Steve On 06/17/18 08:35, Rick Macklem wrote: Hi, Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in ESXi 6.5u1 (VMware) against the FreeBSD server. I have given him a bunch of hackish patches to try and some of them do help. However not all issues are resolved. The problem is that these hacks pretty obviously violate the NFSv4.1 RFC (5661). (Details on these come later, for those interested in such things.) I can think of three ways to deal with this: 1 - Just leave the server as is and point people to the issues that should be addressed in the ESXi client. 2 - Put the hacks in, but only enable them based on a sysctl not enabled by default. (The main problem with this is when the server also has non-ESXi mounts.) 3 - Enable the hacks for ESXi client mounts only, using the implementation ID it presents at mount time in its ExchangeID arguments. - This is my preferred solution, but the RFC says: An example use for implementation identifiers would be diagnostic software that extracts this information in an attempt to identify interoperability problems, performance workload behaviors, or general usage statistics. Since the intent of having access to this information is for planning or general diagnosis only, the client and server MUST NOT interpret this implementation identity information in a way that affects interoperational behavior of the implementation. The reason is that if clients and servers did such a thing, they might use fewer capabilities of the protocol than the peer can support, or the client and server might refuse to interoperate. Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that, since the hacks violate the RFC, then why not enable them in a way that violates the RFC. Anyhow, I would like to hear from others w.r.t. how they think this should be handled? 
Here's details on the breakage and workarounds for those interested, from looking at packet traces in wireshark: Fairly benign ones: - The client does a ReclaimComplete with one_fs == false and then does a ReclaimComplete with one_fs == true. The server returns NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client doesn't like. Workaround: Don't return an error for the one_fs == true case and just assume the same as "one_fs == false". There is also a case where the client only does the ReclaimComplete with one_fs == true. Since FreeBSD exports a hierarchy of file systems, this doesn't indicate to the server that all reclaims are done. (Other extant clients never do the "one_fs == true" variant of ReclaimComplete.) This case of just doing the "one_fs == true" variant is actually a limitation of the server which I don't know how to fix. However the same workaround as listed above gets around it. - The client puts random garbage in the delegate_type argument for Open/ClaimPrevious. Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG, it doesn't want a delegation, so assume OPEN_DELEGATE_NONE or OPEN_DELEGATE_NONE_EXT instead of garbage. (Not sure which of the two values makes it happier.) Serious ones: - The client does an OpenDowngrade with arguments set to OPEN_SHARE_ACCESS_BOTH and OPEN_SHARE_DENY_BOTH. Since OpenDowngrade is supposed to decrease share_access and share_deny, the server returns NFS4ERR_INVAL. OpenDowngrade is not supposed to ever conflict with another Open (a conflict happens when another Open has set an OPEN_SHARE_DENY that denies the result of the OpenDowngrade), so it cannot be failed with NFS4ERR_SHARE_DENIED. I believe this one is done by the client for something it calls a "device lock" and really doesn't like this failing. Workaround: All I can think of is ignore the check for new bits not being set and reply NFS_OK, when no conflicting Open exists. 
When there is a conflicting Open, returning NFS4ERR_INVAL seems to be the only option, since NFS4ERR_SHARE_DENIED isn't listed for OpenDowngrade. - When a server reboots, client does not serialize ExchangeID/CreateSession. When the server reboots, a client needs to do a serialized set of RPCs with ExchangeID followed by CreateSession to confirm it. The reply to ExchangeID has a sequence number (csr_sequence) in it and the CreateSession needs to have the same value in its csa_sequence argument to confirm the clientid issued by the ExchangeID. The client sends many ExchangeIDs and CreateSessions, so they end up failing many times due to the sequence number not matching the last ExchangeID. (This might only happen in the trunked case.) Workaround: Nothing that I can think of. - ExchangeID sometimes sends eia_clientowner.co_verifier argument as all zeros. Sometimes the client bogusly fill
Re: A head buildworld race visible in the ci.freebsd.org build history
On 2018-Jun-18, at 12:42 PM, Bryan Drewery wrote: > On 6/15/2018 10:55 PM, Mark Millard wrote: >> In watching ci.freebsd.org builds I've seen a notable >> number of one time failures, such as (example from >> powerpc64): >> >> --- all_subdir_lib/libufs --- >> ranlib -D libufs.a >> ranlib: fatal: Failed to open 'libufs.a' >> *** [libufs.a] Error code 70 >> >> where the next build works despite the change being >> irrelevant to whatever ranlib complained about. >> >> Other builds failed similarly: >> >> --- all_subdir_lib/libbsm --- >> ranlib -D libbsm_p.a >> ranlib: fatal: Failed to open 'libbsm_p.a' >> *** [libbsm_p.a] Error code 70 >> >> and: >> >> --- kerberos5/lib__L --- >> ranlib -D libgssapi_spnego_p.a >> --- libgssapi_spnego.a --- >> ranlib -D libgssapi_spnego.a >> --- libgssapi_spnego_p.a --- >> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' >> *** [libgssapi_spnego_p.a] Error code 70 >> >> and so on. >> >> >> It is not limited to powerpc64. For example, for aarch64 >> there are: >> >> --- libpam_exec.a --- >> building static pam_exec library >> ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` >> ranlib -D libpam_exec.a >> ranlib: fatal: Failed to open 'libpam_exec.a' >> *** [libpam_exec.a] Error code 70 >> >> and: >> >> --- all_subdir_lib/libusb --- >> ranlib -D libusb.a >> ranlib: fatal: Failed to open 'libusb.a' >> *** [libusb.a] Error code 70 >> >> and: >> >> --- all_subdir_lib/libbsnmp --- >> ranlib: fatal: Failed to open 'libbsnmp.a' >> --- all_subdir_lib/ncurses --- >> --- all_subdir_lib/ncurses/panelw --- >> --- panel.pico --- >> --- all_subdir_lib/libbsnmp --- >> *** [libbsnmp.a] Error code 70 >> >> >> Even amd64 gets such: >> >> --- libpcap.a --- >> ranlib -D libpcap.a >> ranlib: fatal: Failed to open 'libpcap.a' >> *** [libpcap.a] Error code 70 >> >> and: >> >> >> --- libkafs5.a --- >> ranlib: fatal: Failed to open 'libkafs5.a' >> --- libkafs5_p.a --- >> ranlib: fatal: Failed to open 'libkafs5_p.a' >> --- cddl/lib__L 
--- >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: >> note: include the header or explicitly provide a declaration for >> 'toupper' >> --- kerberos5/lib__L --- >> *** [libkafs5_p.a] Error code 70 >> >> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 >> --- libkafs5.a --- >> *** [libkafs5.a] Error code 70 >> >> and: >> >> >> --- lib__L --- >> ranlib -D libclang_rt.asan_cxx-i386.a >> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' >> *** [libclang_rt.asan_cxx-i386.a] Error code 70 >> >> >> (Notice the variability in what .a the ranlib's fail for.) >> >> >> >> >> > > > I looked at this a few days ago and don't believe it's actually a build > race. I think there is something wrong with the ar/ranlib on that system > or something else. I've found no evidence of concurrent building of the > .a files in question. Looking at a bunch of the failures, spanning multiple FreeBSD-head-*-build types of builds, I see only: NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast NODE_NAME butler1.nyi.freebsd.org for the failures that I looked at. So your "on that system" might well be correct. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: A head buildworld race visible in the ci.freebsd.org build history
On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote: > On 6/15/2018 10:55 PM, Mark Millard wrote: > > In watching ci.freebsd.org builds I've seen a notable > > number of one time failures, such as (example from > > powerpc64): > > > > --- all_subdir_lib/libufs --- > > ranlib -D libufs.a > > ranlib: fatal: Failed to open 'libufs.a' > > *** [libufs.a] Error code 70 > > > > where the next build works despite the change being > > irrelevant to whatever ranlib complained about. > > > > Other builds failed similarly: > > > > --- all_subdir_lib/libbsm --- > > ranlib -D libbsm_p.a > > ranlib: fatal: Failed to open 'libbsm_p.a' > > *** [libbsm_p.a] Error code 70 > > > > and: > > > > --- kerberos5/lib__L --- > > ranlib -D libgssapi_spnego_p.a > > --- libgssapi_spnego.a --- > > ranlib -D libgssapi_spnego.a > > --- libgssapi_spnego_p.a --- > > ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' > > *** [libgssapi_spnego_p.a] Error code 70 > > > > and so on. > > > > > > It is not limited to powerpc64. 
For example, for aarch64 > > there are: > > > > --- libpam_exec.a --- > > building static pam_exec library > > ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` > > ranlib -D libpam_exec.a > > ranlib: fatal: Failed to open 'libpam_exec.a' > > *** [libpam_exec.a] Error code 70 > > > > and: > > > > --- all_subdir_lib/libusb --- > > ranlib -D libusb.a > > ranlib: fatal: Failed to open 'libusb.a' > > *** [libusb.a] Error code 70 > > > > and: > > > > --- all_subdir_lib/libbsnmp --- > > ranlib: fatal: Failed to open 'libbsnmp.a' > > --- all_subdir_lib/ncurses --- > > --- all_subdir_lib/ncurses/panelw --- > > --- panel.pico --- > > --- all_subdir_lib/libbsnmp --- > > *** [libbsnmp.a] Error code 70 > > > > > > Even amd64 gets such: > > > > --- libpcap.a --- > > ranlib -D libpcap.a > > ranlib: fatal: Failed to open 'libpcap.a' > > *** [libpcap.a] Error code 70 > > > > and: > > > > > > --- libkafs5.a --- > > ranlib: fatal: Failed to open 'libkafs5.a' > > --- libkafs5_p.a --- > > ranlib: fatal: Failed to open 'libkafs5_p.a' > > --- cddl/lib__L --- > > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: > > note: include the header or explicitly provide a declaration for > > 'toupper' > > --- kerberos5/lib__L --- > > *** [libkafs5_p.a] Error code 70 > > > > make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 > > --- libkafs5.a --- > > *** [libkafs5.a] Error code 70 > > > > and: > > > > > > --- lib__L --- > > ranlib -D libclang_rt.asan_cxx-i386.a > > ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' > > *** [libclang_rt.asan_cxx-i386.a] Error code 70 > > > > > > (Notice the variability in what .a the ranlib's fail for.) > > > > > > > > > > > > > I looked at this a few days ago and don't believe it's actually a build > race. I think there is something wrong with the ar/ranlib on that system > or something else. I've found no evidence of concurrent building of the > .a files in question. 
FWIW, I got a similar failure when I did the last checks for the OFED commit. For me, it was libgcc.a.
RFC: ESXi client is nasty, what should I do?
Hi, I realized that the subject line "ESXi NFSv4.1 client id is nasty" wouldn't have indicated that I was looking for comments w.r.t. how to handle this poorly behaved client. Please go to the "ESXi NFSv4.1 client id is nasty" thread and comment. (It should be in the archive, if you already deleted it.) Thanks, rick
Re: A head buildworld race visible in the ci.freebsd.org build history
On 6/15/2018 10:55 PM, Mark Millard wrote: > In watching ci.freebsd.org builds I've seen a notable > number of one time failures, such as (example from > powerpc64): > > --- all_subdir_lib/libufs --- > ranlib -D libufs.a > ranlib: fatal: Failed to open 'libufs.a' > *** [libufs.a] Error code 70 > > where the next build works despite the change being > irrelevant to whatever ranlib complained about. > > Other builds failed similarly: > > --- all_subdir_lib/libbsm --- > ranlib -D libbsm_p.a > ranlib: fatal: Failed to open 'libbsm_p.a' > *** [libbsm_p.a] Error code 70 > > and: > > --- kerberos5/lib__L --- > ranlib -D libgssapi_spnego_p.a > --- libgssapi_spnego.a --- > ranlib -D libgssapi_spnego.a > --- libgssapi_spnego_p.a --- > ranlib: fatal: Failed to open 'libgssapi_spnego_p.a' > *** [libgssapi_spnego_p.a] Error code 70 > > and so on. > > > It is not limited to powerpc64. For example, for aarch64 > there are: > > --- libpam_exec.a --- > building static pam_exec library > ar -crD libpam_exec.a `NM='nm' NMFLAGS='' lorder pam_exec.o | tsort -q` > ranlib -D libpam_exec.a > ranlib: fatal: Failed to open 'libpam_exec.a' > *** [libpam_exec.a] Error code 70 > > and: > > --- all_subdir_lib/libusb --- > ranlib -D libusb.a > ranlib: fatal: Failed to open 'libusb.a' > *** [libusb.a] Error code 70 > > and: > > --- all_subdir_lib/libbsnmp --- > ranlib: fatal: Failed to open 'libbsnmp.a' > --- all_subdir_lib/ncurses --- > --- all_subdir_lib/ncurses/panelw --- > --- panel.pico --- > --- all_subdir_lib/libbsnmp --- > *** [libbsnmp.a] Error code 70 > > > Even amd64 gets such: > > --- libpcap.a --- > ranlib -D libpcap.a > ranlib: fatal: Failed to open 'libpcap.a' > *** [libpcap.a] Error code 70 > > and: > > > --- libkafs5.a --- > ranlib: fatal: Failed to open 'libkafs5.a' > --- libkafs5_p.a --- > ranlib: fatal: Failed to open 'libkafs5_p.a' > --- cddl/lib__L --- > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: > note: include the header or explicitly 
provide a declaration for > 'toupper' > --- kerberos5/lib__L --- > *** [libkafs5_p.a] Error code 70 > > make[5]: stopped in /usr/src/kerberos5/lib/libkafs5 > --- libkafs5.a --- > *** [libkafs5.a] Error code 70 > > and: > > > --- lib__L --- > ranlib -D libclang_rt.asan_cxx-i386.a > ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a' > *** [libclang_rt.asan_cxx-i386.a] Error code 70 > > > (Notice the variability in what .a the ranlib's fail for.) > > > > > I looked at this a few days ago and don't believe it's actually a build race. I think there is something wrong with the ar/ranlib on that system or something else. I've found no evidence of concurrent building of the .a files in question. -- Regards, Bryan Drewery
Current @ r335314 not bootable with Geli and ZFS
Something changed in /boot/gptzfsboot between r334610 and r335314. I built current this morning and my system is un-bootable. I am using redundant ZFS disks and only copied the updated /boot/gptzfsboot file to my ada0 drive. I was able to boot the ada1 drive that still had the gptzfsboot file from r334610. I had a similar issue a few months ago with the upgrades to the Geli + ZFS booting process. These were resolved and operation has been fine since the last 'hiccup' in the testing process. I might not be the only person running the combination of Geli encryption and a ZFS filesystem, and it should not be such an uncommon setup that I am the first to report the problem. Let me know how far back I need to revert my sources to identify the commit that broke gptzfsboot. My system goes into a continuous reboot loop before presenting the password prompt. It is very early in the startup process. Tom -- Public Keys: PGP KeyID = 0x5F22FDC1 GnuPG KeyID = 0x620836CF
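For bisecting, only the boot blocks need to change between tries: build the candidate revision, then rewrite the freebsd-boot partition on the test disk. A sketch, with the disk name and partition index as assumptions that must be adjusted to match the actual layout (`gpart show` will tell):

```
# Assumes ada0p1 is the freebsd-boot partition; adjust the disk name
# and -i index for your layout. /boot/pmbr and /boot/gptzfsboot are
# taken from the world built from the revision under test.
gpart show ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
```

Since ada1 still carries the known-good r334610 boot blocks, each failed try can be recovered by booting from that drive and rewriting ada0 again.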
Re: r335282: first stage boot failure on PCengines APU 2C4
Am Mon, 18 Jun 2018 07:42:20 -0600 Warner Losh schrieb: > On Mon, Jun 18, 2018, 3:01 AM Olivier Cochard-Labbé > wrote: > > > On Sun, Jun 17, 2018 at 10:01 AM O. Hartmann > > wrote: > > > > > Running CURRENT as routing and firewalling appliance on a PCengines APU > > > 2C4 with the > > > latest (official) SEABios available for this product, NanoBSD (FreeBSD > > > CURRENT FreeBSD > > > 12.0-CURRENT #60 r335278: Sun Jun 17 07:57:20 CEST 2018 amd64) is unable > to > > > boot recent > > > OS at the first stage (GPT partitioning, SD card memory). > > > > > > Hi, > > > > > > My nanobsd images are based on : > > [root@apu2]~# uname -a > > FreeBSD apu2 12.0-CURRENT FreeBSD 12.0-CURRENT r335286M amd64 > > > > And I don't remember to have upgraded its BIOS: > > [root@apu2]~# kenv smbios.bios.reldate > > 03/07/2016 > > [root@apu2]~# kenv smbios.system.product > > apu2 > > > > But I'm using MBR partitioning on a mSATA disk. > > > > Do you know the first version to have this problem? Are you using geli? > > Warner > > > Hello, if you addressed me (it wasn't so clear from your reply on Olivier Cochard-Labbé's reply to me), I could give you this information (I posted it to Allan Jude in response to svn commit: r335254 - in head/stand/i386: libi386 zfsboot, probably the wrong commit). Please allow me to copy-paste: [...] I realised that CURRENT r335222 from Friday, 10th June 2018 booted without problems (NanoBSD, slightly modified to boot off GPT/UEFI partitions, but in general a simple "gpart bootcode -b pmbr -p /boot/gptboot -i 2 mmcsd0" preparation of a dd'd image). Another try with recent r335282 on an APU 2C4, freshly built NanoBSD, fails to boot at the first stage: the (most recent) SEABios stops forever at "Booting from hard disk"; where usually a caret starts to spin there is vast emptiness. 
Preparing the boot partition with an older bootcode on that very same SD card via "gpart bootcode -b pmbr -p /boot/gptboot -i 2 mmcsd0" with an older booted image of CURRENT has solved the problem. The layout of the SD card is as follows, just for the record: #: gpart show mmcsd0 => 40 60751792 mmcsd0 GPT (29G) 40 1024 2 freebsd-boot (512K) 1064 2205944 3 freebsd-ufs (1.1G) 2207008 2210127 4 freebsd-ufs (1.1G) 4417135 1048576 5 freebsd-ufs (512M) 5465711 55286121 - free - (26G) [...] I didn't bisect the issue due to time constraints, but if it helps, I could try. I resolved the problem temporarily by writing the bootcode and partition code from an older image. The APU is serial console only. Kind regards, Oliver Hartmann -- O. Hartmann I object to the use or transmission of my data for advertising purposes or for market or opinion research (§ 28 Abs. 4 BDSG).
Re: Ryzen public erratas
On 6/13/2018 6:35 AM, Konstantin Belousov wrote: > > Please report the results. If the script helps, I will code the kernel > change to apply the workarounds. The hard lockups I was seeing on Ryzen and Epyc boxes are now gone with the microcode and script below. Not sure if it's one or some combo of the settings, but all the steps below have made my 2 test systems stable on RELENG_11 anyway. This was on a Ryzen 5 1600X (ASUS PRIME X370-PRO BIOS from 04/19/2018) CPU Microcode patch level: 0x8001137 And an EPYC 7281 16-Core (Supermicro H11SSL-i BIOS 04/27/2018) Microcode patch level: 0x8001227 Details of the issue were discussed at https://lists.freebsd.org/pipermail/freebsd-virtualization/2018-March/006187.html and https://lists.freebsd.org/pipermail/freebsd-stable/2018-January/088174.html TL;DR: Generating traffic via iperf3 between VMs either on bhyve or VirtualBox would make the box lock up -- no crash, just a blank screen ---Mike > > #!/bin/sh > > # Enable workarounds for erratas listed in > # https://developer.amd.com/wp-content/resources/55449_1.12.pdf > > # 1057, 1109 > sysctl machdep.idle_mwait=0 > sysctl machdep.idle=hlt > > for x in /dev/cpuctl*; do > # 1021 > cpucontrol -m '0xc0011029|=0x2000' $x > # 1033 > cpucontrol -m '0xc0011020|=0x10' $x > # 1049 > cpucontrol -m '0xc0011028|=0x10' $x > # 1095 > cpucontrol -m '0xc0011020|=0x200' $x > done > -- --- Mike Tancsa, tel +1 519 651 3400 x203 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada
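As a sanity check that the MSR writes in the script actually took effect, the modified registers can be read back with cpucontrol: passing -m with just an MSR number (no operator) reads and prints the register. A minimal sketch, to be run on each /dev/cpuctl device after applying the workarounds:

```
# Read back the MSRs the workaround script modifies; the printed
# values should show the corresponding bits set on every core.
for x in /dev/cpuctl*; do
    cpucontrol -m 0xc0011029 $x   # erratum 1021: expect bit 13 (0x2000) set
    cpucontrol -m 0xc0011020 $x   # errata 1033/1095: expect bits 4 and 9 set
    cpucontrol -m 0xc0011028 $x   # erratum 1049: expect bit 4 (0x10) set
done
```

This also makes it easy to confirm the bits survive across suspend/resume or a microcode reload, since AMD MSR state is not necessarily preserved.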
Re: r335282: first stage boot failure on PCengines APU 2C4
On Mon, Jun 18, 2018, 3:01 AM Olivier Cochard-Labbé wrote: > On Sun, Jun 17, 2018 at 10:01 AM O. Hartmann > wrote: > > > Running CURRENT as routing and firewalling appliance on a PCengines APU > > 2C4 with the > > latest (official) SEABios available for this product, NanoBSD (FreeBSD > > CURRENT FreeBSD > > 12.0-CURRENT #60 r335278: Sun Jun 17 07:57:20 CEST 2018 amd64) is unable > to > > boot recent > > OS at the first stage (GPT partitioning, SD card memory). > > > > Hi, > > > My nanobsd images are based on : > [root@apu2]~# uname -a > FreeBSD apu2 12.0-CURRENT FreeBSD 12.0-CURRENT r335286M amd64 > > And I don't remember to have upgraded its BIOS: > [root@apu2]~# kenv smbios.bios.reldate > 03/07/2016 > [root@apu2]~# kenv smbios.system.product > apu2 > > But I'm using MBR partitioning on a mSATA disk. > Do you know the first version to have this problem? Are you using geli? Warner >
Re: r335282: first stage boot failure on PCengines APU 2C4
On Sun, Jun 17, 2018 at 10:01 AM O. Hartmann wrote: > Running CURRENT as routing and firewalling appliance on a PCengines APU > 2C4 with the > latest (official) SEABios available for this product, NanoBSD (FreeBSD > CURRENT FreeBSD > 12.0-CURRENT #60 r335278: Sun Jun 17 07:57:20 CEST 2018 amd64) is unable to > boot recent > OS at the first stage (GPT partitioning, SD card memory). > > Hi, My nanobsd images are based on : [root@apu2]~# uname -a FreeBSD apu2 12.0-CURRENT FreeBSD 12.0-CURRENT r335286M amd64 And I don't remember to have upgraded its BIOS: [root@apu2]~# kenv smbios.bios.reldate 03/07/2016 [root@apu2]~# kenv smbios.system.product apu2 But I'm using MBR partitioning on a mSATA disk. Regards