Re: Ryzen public erratas

2018-06-18 Thread Eitan Adler
On 13 June 2018 at 04:16, Eitan Adler  wrote:
> On 13 June 2018 at 03:35, Konstantin Belousov  wrote:
>> Today I noted that AMD published the public errata document for Ryzens,
>> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>>
>> Some of the issues listed there look quite relevant to the potential
>> hangs that some people still experience with the machines.  I wrote
>> a script which should apply the recommended workarounds for the errata
>> that I find interesting.
>>
>> To run it, kldload cpuctl, then apply the latest firmware update to your
>> CPU, then run the following shell script.  Comments indicate the errata
>> number for the workarounds.
>>
>> Please report the results.  If the script helps, I will code the kernel
>> change to apply the workarounds.
>>
>> #!/bin/sh
>>
>> # Enable workarounds for erratas listed in
>> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>>
>> # 1057, 1109
>> sysctl machdep.idle_mwait=0
>> sysctl machdep.idle=hlt
>
>
> Is this needed if machdep.idle was previously set to acpi?

This might explain why I've never seen the lockup issues mentioned by
other people. What would cause my machine to differ from others?
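
For comparing machines, a minimal sketch of how one might list which of the two
idle settings from the quoted script would actually change on a given box. The
helper name idle_changes is made up for illustration; only the two sysctl names
and their target values come from the script above, and the sketch makes no
claim about whether the change is needed when the idle method is acpi.

```shell
#!/bin/sh
# Hypothetical helper: given the current machdep.idle and machdep.idle_mwait
# values, print the sysctl commands from the quoted script that would
# change something on this machine.
idle_changes() {
    cur_idle=$1
    cur_mwait=$2
    if [ "$cur_mwait" != "0" ]; then
        echo "sysctl machdep.idle_mwait=0"
    fi
    if [ "$cur_idle" != "hlt" ]; then
        echo "sysctl machdep.idle=hlt"
    fi
}

# On a FreeBSD machine the inputs would come from sysctl -n, for example:
#   idle_changes "$(sysctl -n machdep.idle)" "$(sysctl -n machdep.idle_mwait)"
```

An empty result means both values already match what the script sets.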



-- 
Eitan Adler
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Mark Millard
On 2018-Jun-18, at 6:03 PM, Mark Millard  wrote:
> On 2018-Jun-18, at 4:08 PM, Bryan Drewery  wrote:
> 
>> On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
>>> ranlib -D libpcap.a
>>> ranlib: fatal: Failed to open 'libpcap.a'
>> 
>> Where is this error even coming from? It's not in the usr.bin/ar code
>> and ranlib does not cause it.
>> 
>> # ranlib -D uh
>> ranlib: warning: uh: no such file
> 
> A more complete sequence is (with some
> other text mixed in, as in where I got
> the text from on ci.freebsd.org):
> 
> --- libvgl.a ---
> building static vgl library
> ar -crD libvgl.a `NM='nm' NMFLAGS=''  lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o  | tsort -q`
> --- all_subdir_lib/libsysdecode ---
> ranlib -D libsysdecode.a
> --- all_subdir_lib/libvgl ---
> ranlib -D libvgl.a
> ranlib: fatal: Failed to open 'libvgl.a'
> --- all_subdir_lib/libsysdecode ---
> ranlib: fatal: Failed to open 'libsysdecode.a'
> --- all_subdir_lib/libvgl ---
> *** [libvgl.a] Error code 70
> 
> So, in essence,
> 
> ar -crD libvgl.a `NM='nm' NMFLAGS=''  lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o  | tsort -q`
> ranlib -D libvgl.a
> ranlib: fatal: Failed to open 'libvgl.a'
> 
> It is not obvious to me that the "Failed to open" means
> that there was "no such file". Might there be some other
> form of "Failed to open" for a file that does exist from
> the ar at least having created its output .a file?
> 

Also, if what varies between failing and working builds is the head
system version, and what stays the same is running an 11.1R jail, then
it would seem to be the underlying head system software that matters
for the ar -> ranlib sequence behavior, not 11.1R's ar or ranlib or the
11.1R libraries they indirectly involve. Head's own ar and ranlib (and
their indirections) are not involved either: they go unused.

The only parts of head that could be involved are parts that the
11.1R jail does not avoid.

To me, this suggests more basic infrastructure in head.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Mark Millard



On 2018-Jun-18, at 4:08 PM, Bryan Drewery  wrote:

> On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
>> ranlib -D libpcap.a
>> ranlib: fatal: Failed to open 'libpcap.a'
> 
> Where is this error even coming from? It's not in the usr.bin/ar code
> and ranlib does not cause it.
> 
> # ranlib -D uh
> ranlib: warning: uh: no such file

A more complete sequence is (with some
other text mixed in, as in where I got
the text from on ci.freebsd.org):

--- libvgl.a ---
building static vgl library
ar -crD libvgl.a `NM='nm' NMFLAGS=''  lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o  | tsort -q`
--- all_subdir_lib/libsysdecode ---
ranlib -D libsysdecode.a
--- all_subdir_lib/libvgl ---
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'
--- all_subdir_lib/libsysdecode ---
ranlib: fatal: Failed to open 'libsysdecode.a'
--- all_subdir_lib/libvgl ---
*** [libvgl.a] Error code 70

So, in essence,

ar -crD libvgl.a `NM='nm' NMFLAGS=''  lorder main.o simple.o bitmap.o text.o mouse.o keyboard.o  | tsort -q`
ranlib -D libvgl.a
ranlib: fatal: Failed to open 'libvgl.a'

It is not obvious to me that the "Failed to open" means
that there was "no such file". Might there be some other
form of "Failed to open" for a file that does exist from
the ar at least having created its output .a file?
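
One way to tell the cases apart would be to look at the archive immediately
after ranlib fails. The following is only a sketch: describe_archive is a
made-up helper name, and it reports what a later check can see, which may
already differ from what ranlib saw if the state is changing underneath it.

```shell
#!/bin/sh
# Hypothetical diagnostic: describe the state of an archive right after a
# failed ranlib, to separate "no such file" from other open failures.
describe_archive() {
    path=$1
    if [ ! -e "$path" ]; then
        echo "missing"
    elif [ ! -r "$path" ]; then
        echo "exists but unreadable"
    elif [ ! -s "$path" ]; then
        echo "exists but empty"
    else
        echo "exists and readable"
    fi
}

# Possible use in a build step:
#   ranlib -D libvgl.a || describe_archive libvgl.a
```

If this prints "exists and readable" right after a "Failed to open", the
failure was something other than a plain missing file at the time of the check.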


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



Re: review of nfsd rc.d script patch

2018-06-18 Thread Don Lewis
On 15 Jun, Rick Macklem wrote:
> Hi,
> 
> For the pNFS service MDS machine, the nfsd can't be started until all nfs
> mounts in /etc/fstab are done.
> I think that adding "mountcritremote" to the "# REQUIRE:" line is sufficient
> to do this?
>
> I don't think delaying the startup of the nfsd daemon until after any NFS
> mounts are done will do any harm, but if others think it would be a POLA
> violation, I could make this dependent on the pNFS service being enabled.
> Does anyone think this would cause a POLA violation?

Sounds like that would break cross mounts.  Back in the olden days
before the automounter, I would set up workstation clusters with hosta
exporting local filesystem /home/hosta, and hostb exporting /home/hostb.
In addition, hosta would do a bg NFS mount of /home/hostb and hostb
would do a bg NFS mount of /home/hosta.  That way everybody would have a
consistent view of everything.  If a power failure took down everything,
the first system up would export its local filesystem and even though it
wouldn't be able to mount any remote filesystems, mount would background
itself and the boot would complete.  As the remaining machines came up,
they would be able to mount the remote filesystems of the machine that
came up earlier, and the early machines would mount the filesystems from
the later machines as they became available.

If nfsd is delayed until all the NFS filesystems are mounted, the above
setup would deadlock.
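
The ordering problem can be sketched as a toy simulation. This is not how rc.d
orders anything; it only models the premise above (a mount succeeds once the
peer's nfsd is up) for two cross-mounting hosts. The function name and the two
policy labels are invented for the illustration.

```shell
#!/bin/sh
# Toy model of hosta/hostb cross mounts.
# $1 = "independent"  : nfsd starts regardless of local NFS mounts
# $1 = "after_mounts" : nfsd starts only once the local NFS mount is done
simulate() {
    policy=$1
    a_nfsd=0 b_nfsd=0 a_mnt=0 b_mnt=0
    round=0
    while [ "$round" -lt 5 ]; do
        round=$((round + 1))
        if [ "$policy" = "independent" ]; then
            a_nfsd=1
            b_nfsd=1
        else
            if [ "$a_mnt" -eq 1 ]; then a_nfsd=1; fi
            if [ "$b_mnt" -eq 1 ]; then b_nfsd=1; fi
        fi
        # A mount succeeds only when the peer is already serving.
        if [ "$b_nfsd" -eq 1 ]; then a_mnt=1; fi
        if [ "$a_nfsd" -eq 1 ]; then b_mnt=1; fi
        if [ "$a_mnt" -eq 1 ] && [ "$b_mnt" -eq 1 ]; then
            echo "converged after $round round(s)"
            return 0
        fi
    done
    echo "deadlock: no progress after $round rounds"
    return 1
}
```

With "independent" both mounts complete in the first round; with
"after_mounts" neither host ever starts serving, so neither mount completes.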



Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Ed Maste
On 18 June 2018 at 19:29, Bryan Drewery  wrote:
>
> The error is coming from libarchive which had a change between those
> revisions:
>
>> 
>> r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines

Li-Wen reported that the build is done in a 11.1-rel jail though, so
the libarchive (or any userland) change shouldn't be responsible.

Can we update a canary builder to somewhere between r328278 and r88?


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Bryan Drewery
On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
> On Mon, Jun 18, 2018 at 5:04 PM Mark Millard via freebsd-toolchain
>  wrote:
>>
>> On 2018-Jun-18, at 12:42 PM, Bryan Drewery  wrote:
>>
>>> On 6/15/2018 10:55 PM, Mark Millard wrote:
>>>> In watching ci.freebsd.org builds I've seen a notable
>>>> number of one time failures, such as (example from
>>>> powerpc64):
>>>>
>>>> --- all_subdir_lib/libufs ---
>>>> ranlib -D libufs.a
>>>> ranlib: fatal: Failed to open 'libufs.a'
>>>> *** [libufs.a] Error code 70
>>>>
>>>> where the next build works despite the change being
>>>> irrelevant to whatever ranlib complained about.
>>>>
>>>> Other builds failed similarly:
>>>>
>>>> --- all_subdir_lib/libbsm ---
>>>> ranlib -D libbsm_p.a
>>>> ranlib: fatal: Failed to open 'libbsm_p.a'
>>>> *** [libbsm_p.a] Error code 70
>>>>
>>>> and:
>>>>
>>>> --- kerberos5/lib__L ---
>>>> ranlib -D libgssapi_spnego_p.a
>>>> --- libgssapi_spnego.a ---
>>>> ranlib -D libgssapi_spnego.a
>>>> --- libgssapi_spnego_p.a ---
>>>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
>>>> *** [libgssapi_spnego_p.a] Error code 70
>>>>
>>>> and so on.
>>>>
>>>>
>>>> It is not limited to powerpc64. For example, for aarch64
>>>> there are:
>>>>
>>>> --- libpam_exec.a ---
>>>> building static pam_exec library
>>>> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q`
>>>> ranlib -D libpam_exec.a
>>>> ranlib: fatal: Failed to open 'libpam_exec.a'
>>>> *** [libpam_exec.a] Error code 70
>>>>
>>>> and:
>>>>
>>>> --- all_subdir_lib/libusb ---
>>>> ranlib -D libusb.a
>>>> ranlib: fatal: Failed to open 'libusb.a'
>>>> *** [libusb.a] Error code 70
>>>>
>>>> and:
>>>>
>>>> --- all_subdir_lib/libbsnmp ---
>>>> ranlib: fatal: Failed to open 'libbsnmp.a'
>>>> --- all_subdir_lib/ncurses ---
>>>> --- all_subdir_lib/ncurses/panelw ---
>>>> --- panel.pico ---
>>>> --- all_subdir_lib/libbsnmp ---
>>>> *** [libbsnmp.a] Error code 70
>>>>
>>>>
>>>> Even amd64 gets such:
>>>>
>>>> --- libpcap.a ---
>>>> ranlib -D libpcap.a
>>>> ranlib: fatal: Failed to open 'libpcap.a'
>>>> *** [libpcap.a] Error code 70
>>>>
>>>> and:
>>>>
>>>>
>>>> --- libkafs5.a ---
>>>> ranlib: fatal: Failed to open 'libkafs5.a'
>>>> --- libkafs5_p.a ---
>>>> ranlib: fatal: Failed to open 'libkafs5_p.a'
>>>> --- cddl/lib__L ---
>>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
>>>>  note: include the header <ctype.h> or explicitly provide a declaration
>>>> for 'toupper'
>>>> --- kerberos5/lib__L ---
>>>> *** [libkafs5_p.a] Error code 70
>>>>
>>>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
>>>> --- libkafs5.a ---
>>>> *** [libkafs5.a] Error code 70
>>>>
>>>> and:
>>>>
>>>>
>>>> --- lib__L ---
>>>> ranlib -D libclang_rt.asan_cxx-i386.a
>>>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
>>>> *** [libclang_rt.asan_cxx-i386.a] Error code 70
>>>>
>>>>
>>>> (Notice the variability in what .a the ranlib's fail for.)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> I looked at this a few days ago and don't believe it's actually a build
>>> race. I think there is something wrong with the ar/ranlib on that system
>>> or something else. I've found no evidence of concurrent building of the
>>> .a files in question.
>>
>>
>> Looking at a bunch of the failures, spanning multiple
>> FreeBSD-head-*-build types of builds, I see only:
>>
>> NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
>> NODE_NAME   butler1.nyi.freebsd.org
>>
>> for the failures that I looked at.
>>
>> So your "on that system" might well be correct.
> 
> Thanks for the insight. The build is done in an 11.1-R jail on a
> -CURRENT host.  butler1.nyi is running r88 (as a canary) while
> other builders are mostly running r328278.  I upgraded a few others and
> they seem to reproduce the issue, so I have now downgraded all the build
> slaves to r328278 until we find the root cause.
> 

The error is coming from libarchive which had a change between those
revisions:

> 
> r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines
> 
> MFV r328323,328324:
> Sync libarchive with vendor.
> 
> Relevant vendor changes:
>   PR #893: delete dead ppmd7 alloc callbacks
>   PR #904: Fix archive freeing bug in bsdcat
>   PR #961: Fix ZIP format names
>   PR #962: Don't modify attributes for existing directories
>            when ARCHIVE_EXTRACT_NO_OVERWRITE is set
>   PR #964: Fix -Werror=implicit-fallthrough= for GCC 7
>   PR #970: zip: Allow backslash as path separator
> 
> MFC after:  1 week
> 
> 

From a brief look, though, nothing obvious stands out to me in the change.


-- 
Regards,
Bryan Drewery



signature.asc
Description: OpenPGP digital signature


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Bryan Drewery
On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
> ranlib -D libpcap.a
> ranlib: fatal: Failed to open 'libpcap.a'

Where is this error even coming from? It's not in the usr.bin/ar code
and ranlib does not cause it.

# ranlib -D uh
ranlib: warning: uh: no such file



-- 
Regards,
Bryan Drewery





Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Bryan Drewery
On 6/18/2018 3:31 PM, Li-Wen Hsu wrote:
> On Mon, Jun 18, 2018 at 6:27 PM Bryan Drewery  wrote:
>>
>> On 6/18/2018 1:45 PM, Konstantin Belousov wrote:
>>> On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote:
>>>> On 6/15/2018 10:55 PM, Mark Millard wrote:
>>>>> In watching ci.freebsd.org builds I've seen a notable
>>>>> number of one time failures, such as (example from
>>>>> powerpc64):
>>>>>
>>>>> --- all_subdir_lib/libufs ---
>>>>> ranlib -D libufs.a
>>>>> ranlib: fatal: Failed to open 'libufs.a'
>>>>> *** [libufs.a] Error code 70
>>>>>
>>>>> where the next build works despite the change being
>>>>> irrelevant to whatever ranlib complained about.
>>>>>
>>>>> Other builds failed similarly:
>>>>>
>>>>> --- all_subdir_lib/libbsm ---
>>>>> ranlib -D libbsm_p.a
>>>>> ranlib: fatal: Failed to open 'libbsm_p.a'
>>>>> *** [libbsm_p.a] Error code 70
>>>>>
>>>>> and:
>>>>>
>>>>> --- kerberos5/lib__L ---
>>>>> ranlib -D libgssapi_spnego_p.a
>>>>> --- libgssapi_spnego.a ---
>>>>> ranlib -D libgssapi_spnego.a
>>>>> --- libgssapi_spnego_p.a ---
>>>>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
>>>>> *** [libgssapi_spnego_p.a] Error code 70
>>>>>
>>>>> and so on.
>>>>>
>>>>>
>>>>> It is not limited to powerpc64. For example, for aarch64
>>>>> there are:
>>>>>
>>>>> --- libpam_exec.a ---
>>>>> building static pam_exec library
>>>>> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q`
>>>>> ranlib -D libpam_exec.a
>>>>> ranlib: fatal: Failed to open 'libpam_exec.a'
>>>>> *** [libpam_exec.a] Error code 70
>>>>>
>>>>> and:
>>>>>
>>>>> --- all_subdir_lib/libusb ---
>>>>> ranlib -D libusb.a
>>>>> ranlib: fatal: Failed to open 'libusb.a'
>>>>> *** [libusb.a] Error code 70
>>>>>
>>>>> and:
>>>>>
>>>>> --- all_subdir_lib/libbsnmp ---
>>>>> ranlib: fatal: Failed to open 'libbsnmp.a'
>>>>> --- all_subdir_lib/ncurses ---
>>>>> --- all_subdir_lib/ncurses/panelw ---
>>>>> --- panel.pico ---
>>>>> --- all_subdir_lib/libbsnmp ---
>>>>> *** [libbsnmp.a] Error code 70
>>>>>
>>>>>
>>>>> Even amd64 gets such:
>>>>>
>>>>> --- libpcap.a ---
>>>>> ranlib -D libpcap.a
>>>>> ranlib: fatal: Failed to open 'libpcap.a'
>>>>> *** [libpcap.a] Error code 70
>>>>>
>>>>> and:
>>>>>
>>>>>
>>>>> --- libkafs5.a ---
>>>>> ranlib: fatal: Failed to open 'libkafs5.a'
>>>>> --- libkafs5_p.a ---
>>>>> ranlib: fatal: Failed to open 'libkafs5_p.a'
>>>>> --- cddl/lib__L ---
>>>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
>>>>>  note: include the header <ctype.h> or explicitly provide a declaration
>>>>> for 'toupper'
>>>>> --- kerberos5/lib__L ---
>>>>> *** [libkafs5_p.a] Error code 70
>>>>>
>>>>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
>>>>> --- libkafs5.a ---
>>>>> *** [libkafs5.a] Error code 70
>>>>>
>>>>> and:
>>>>>
>>>>>
>>>>> --- lib__L ---
>>>>> ranlib -D libclang_rt.asan_cxx-i386.a
>>>>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
>>>>> *** [libclang_rt.asan_cxx-i386.a] Error code 70
>>>>>
>>>>>
>>>>> (Notice the variability in what .a the ranlib's fail for.)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> I looked at this a few days ago and don't believe it's actually a build
>>>> race. I think there is something wrong with the ar/ranlib on that system
>>>> or something else. I've found no evidence of concurrent building of the
>>>> .a files in question.
>>>
>>> FWIW, I got a similar failure when I did the last checks for the OFED
>>> commit.  For me, it was libgcc.a.
>>>
>>
>> If it was -lgcc_s then it's a known rare build race due to
>> tools/install.sh not handling -S.
> 
> It seems to be a more general problem. This one:
> 
> https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console
> 
> calls for libcuse_p.a, while this one:
> 
> https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console
> 
> calls for libfifolog.a
> 

Well, why is ar -> ranlib so special? Nothing else is failing.
What filesystem are these using for objdirs?
What revision is the host kernel?

-- 
Regards,
Bryan Drewery





Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Li-Wen Hsu
On Mon, Jun 18, 2018 at 5:04 PM Mark Millard via freebsd-toolchain
 wrote:
>
> On 2018-Jun-18, at 12:42 PM, Bryan Drewery  wrote:
>
> > On 6/15/2018 10:55 PM, Mark Millard wrote:
> >> In watching ci.freebsd.org builds I've seen a notable
> >> number of one time failures, such as (example from
> >> powerpc64):
> >>
> >> --- all_subdir_lib/libufs ---
> >> ranlib -D libufs.a
> >> ranlib: fatal: Failed to open 'libufs.a'
> >> *** [libufs.a] Error code 70
> >>
> >> where the next build works despite the change being
> >> irrelevant to whatever ranlib complained about.
> >>
> >> Other builds failed similarly:
> >>
> >> --- all_subdir_lib/libbsm ---
> >> ranlib -D libbsm_p.a
> >> ranlib: fatal: Failed to open 'libbsm_p.a'
> >> *** [libbsm_p.a] Error code 70
> >>
> >> and:
> >>
> >> --- kerberos5/lib__L ---
> >> ranlib -D libgssapi_spnego_p.a
> >> --- libgssapi_spnego.a ---
> >> ranlib -D libgssapi_spnego.a
> >> --- libgssapi_spnego_p.a ---
> >> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
> >> *** [libgssapi_spnego_p.a] Error code 70
> >>
> >> and so on.
> >>
> >>
> >> It is not limited to powerpc64. For example, for aarch64
> >> there are:
> >>
> >> --- libpam_exec.a ---
> >> building static pam_exec library
> >> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q`
> >> ranlib -D libpam_exec.a
> >> ranlib: fatal: Failed to open 'libpam_exec.a'
> >> *** [libpam_exec.a] Error code 70
> >>
> >> and:
> >>
> >> --- all_subdir_lib/libusb ---
> >> ranlib -D libusb.a
> >> ranlib: fatal: Failed to open 'libusb.a'
> >> *** [libusb.a] Error code 70
> >>
> >> and:
> >>
> >> --- all_subdir_lib/libbsnmp ---
> >> ranlib: fatal: Failed to open 'libbsnmp.a'
> >> --- all_subdir_lib/ncurses ---
> >> --- all_subdir_lib/ncurses/panelw ---
> >> --- panel.pico ---
> >> --- all_subdir_lib/libbsnmp ---
> >> *** [libbsnmp.a] Error code 70
> >>
> >>
> >> Even amd64 gets such:
> >>
> >> --- libpcap.a ---
> >> ranlib -D libpcap.a
> >> ranlib: fatal: Failed to open 'libpcap.a'
> >> *** [libpcap.a] Error code 70
> >>
> >> and:
> >>
> >>
> >> --- libkafs5.a ---
> >> ranlib: fatal: Failed to open 'libkafs5.a'
> >> --- libkafs5_p.a ---
> >> ranlib: fatal: Failed to open 'libkafs5_p.a'
> >> --- cddl/lib__L ---
> >> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
> >>  note: include the header <ctype.h> or explicitly provide a declaration
> >> for 'toupper'
> >> --- kerberos5/lib__L ---
> >> *** [libkafs5_p.a] Error code 70
> >>
> >> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
> >> --- libkafs5.a ---
> >> *** [libkafs5.a] Error code 70
> >>
> >> and:
> >>
> >>
> >> --- lib__L ---
> >> ranlib -D libclang_rt.asan_cxx-i386.a
> >> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
> >> *** [libclang_rt.asan_cxx-i386.a] Error code 70
> >>
> >>
> >> (Notice the variability in what .a the ranlib's fail for.)
> >>
> >>
> >>
> >>
> >>
> >
> >
> > I looked at this a few days ago and don't believe it's actually a build
> > race. I think there is something wrong with the ar/ranlib on that system
> > or something else. I've found no evidence of concurrent building of the
> > .a files in question.
>
>
> Looking at a bunch of the failures, spanning multiple
> FreeBSD-head-*-build types of builds, I see only:
>
> NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
> NODE_NAME   butler1.nyi.freebsd.org
>
> for the failures that I looked at.
>
> So your "on that system" might well be correct.

Thanks for the insight. The build is done in an 11.1-R jail on a
-CURRENT host.  butler1.nyi is running r88 (as a canary) while
other builders are mostly running r328278.  I upgraded a few others and
they seem to reproduce the issue, so I have now downgraded all the build
slaves to r328278 until we find the root cause.

Li-Wen

--
Li-Wen Hsu 
https://lwhsu.org


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Li-Wen Hsu
On Mon, Jun 18, 2018 at 6:27 PM Bryan Drewery  wrote:
>
> On 6/18/2018 1:45 PM, Konstantin Belousov wrote:
> > On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote:
> >> On 6/15/2018 10:55 PM, Mark Millard wrote:
> >>> In watching ci.freebsd.org builds I've seen a notable
> >>> number of one time failures, such as (example from
> >>> powerpc64):
> >>>
> >>> --- all_subdir_lib/libufs ---
> >>> ranlib -D libufs.a
> >>> ranlib: fatal: Failed to open 'libufs.a'
> >>> *** [libufs.a] Error code 70
> >>>
> >>> where the next build works despite the change being
> >>> irrelevant to whatever ranlib complained about.
> >>>
> >>> Other builds failed similarly:
> >>>
> >>> --- all_subdir_lib/libbsm ---
> >>> ranlib -D libbsm_p.a
> >>> ranlib: fatal: Failed to open 'libbsm_p.a'
> >>> *** [libbsm_p.a] Error code 70
> >>>
> >>> and:
> >>>
> >>> --- kerberos5/lib__L ---
> >>> ranlib -D libgssapi_spnego_p.a
> >>> --- libgssapi_spnego.a ---
> >>> ranlib -D libgssapi_spnego.a
> >>> --- libgssapi_spnego_p.a ---
> >>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
> >>> *** [libgssapi_spnego_p.a] Error code 70
> >>>
> >>> and so on.
> >>>
> >>>
> >>> It is not limited to powerpc64. For example, for aarch64
> >>> there are:
> >>>
> >>> --- libpam_exec.a ---
> >>> building static pam_exec library
> >>> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q`
> >>> ranlib -D libpam_exec.a
> >>> ranlib: fatal: Failed to open 'libpam_exec.a'
> >>> *** [libpam_exec.a] Error code 70
> >>>
> >>> and:
> >>>
> >>> --- all_subdir_lib/libusb ---
> >>> ranlib -D libusb.a
> >>> ranlib: fatal: Failed to open 'libusb.a'
> >>> *** [libusb.a] Error code 70
> >>>
> >>> and:
> >>>
> >>> --- all_subdir_lib/libbsnmp ---
> >>> ranlib: fatal: Failed to open 'libbsnmp.a'
> >>> --- all_subdir_lib/ncurses ---
> >>> --- all_subdir_lib/ncurses/panelw ---
> >>> --- panel.pico ---
> >>> --- all_subdir_lib/libbsnmp ---
> >>> *** [libbsnmp.a] Error code 70
> >>>
> >>>
> >>> Even amd64 gets such:
> >>>
> >>> --- libpcap.a ---
> >>> ranlib -D libpcap.a
> >>> ranlib: fatal: Failed to open 'libpcap.a'
> >>> *** [libpcap.a] Error code 70
> >>>
> >>> and:
> >>>
> >>>
> >>> --- libkafs5.a ---
> >>> ranlib: fatal: Failed to open 'libkafs5.a'
> >>> --- libkafs5_p.a ---
> >>> ranlib: fatal: Failed to open 'libkafs5_p.a'
> >>> --- cddl/lib__L ---
> >>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
> >>>  note: include the header <ctype.h> or explicitly provide a declaration
> >>> for 'toupper'
> >>> --- kerberos5/lib__L ---
> >>> *** [libkafs5_p.a] Error code 70
> >>>
> >>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
> >>> --- libkafs5.a ---
> >>> *** [libkafs5.a] Error code 70
> >>>
> >>> and:
> >>>
> >>>
> >>> --- lib__L ---
> >>> ranlib -D libclang_rt.asan_cxx-i386.a
> >>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
> >>> *** [libclang_rt.asan_cxx-i386.a] Error code 70
> >>>
> >>>
> >>> (Notice the variability in what .a the ranlib's fail for.)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> I looked at this a few days ago and don't believe it's actually a build
> >> race. I think there is something wrong with the ar/ranlib on that system
> >> or something else. I've found no evidence of concurrent building of the
> >> .a files in question.
> >
> > FWIW, I got a similar failure when I did the last checks for the OFED
> > commit.  For me, it was libgcc.a.
> >
>
> If it was -lgcc_s then it's a known rare build race due to
> tools/install.sh not handling -S.

It seems to be a more general problem. This one:

https://ci.freebsd.org/job/FreeBSD-head-aarch64-build/8190/console

calls for libcuse_p.a, while this one:

https://ci.freebsd.org/job/FreeBSD-head-mips-build/2919/console

calls for libfifolog.a

-- 
Li-Wen Hsu 
https://lwhsu.org


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Bryan Drewery
On 6/18/2018 1:45 PM, Konstantin Belousov wrote:
> On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote:
>> On 6/15/2018 10:55 PM, Mark Millard wrote:
>>> In watching ci.freebsd.org builds I've seen a notable
>>> number of one time failures, such as (example from
>>> powerpc64):
>>>
>>> --- all_subdir_lib/libufs ---
>>> ranlib -D libufs.a
>>> ranlib: fatal: Failed to open 'libufs.a'
>>> *** [libufs.a] Error code 70
>>>
>>> where the next build works despite the change being
>>> irrelevant to whatever ranlib complained about.
>>>
>>> Other builds failed similarly:
>>>
>>> --- all_subdir_lib/libbsm ---
>>> ranlib -D libbsm_p.a
>>> ranlib: fatal: Failed to open 'libbsm_p.a'
>>> *** [libbsm_p.a] Error code 70
>>>
>>> and:
>>>
>>> --- kerberos5/lib__L ---
>>> ranlib -D libgssapi_spnego_p.a
>>> --- libgssapi_spnego.a ---
>>> ranlib -D libgssapi_spnego.a
>>> --- libgssapi_spnego_p.a ---
>>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
>>> *** [libgssapi_spnego_p.a] Error code 70
>>>
>>> and so on.
>>>
>>>
>>> It is not limited to powerpc64. For example, for aarch64
>>> there are:
>>>
>>> --- libpam_exec.a ---
>>> building static pam_exec library
>>> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q` 
>>> ranlib -D libpam_exec.a
>>> ranlib: fatal: Failed to open 'libpam_exec.a'
>>> *** [libpam_exec.a] Error code 70
>>>
>>> and:
>>>
>>> --- all_subdir_lib/libusb ---
>>> ranlib -D libusb.a
>>> ranlib: fatal: Failed to open 'libusb.a'
>>> *** [libusb.a] Error code 70
>>>
>>> and:
>>>
>>> --- all_subdir_lib/libbsnmp ---
>>> ranlib: fatal: Failed to open 'libbsnmp.a'
>>> --- all_subdir_lib/ncurses ---
>>> --- all_subdir_lib/ncurses/panelw ---
>>> --- panel.pico ---
>>> --- all_subdir_lib/libbsnmp ---
>>> *** [libbsnmp.a] Error code 70
>>>
>>>
>>> Even amd64 gets such:
>>>
>>> --- libpcap.a ---
>>> ranlib -D libpcap.a
>>> ranlib: fatal: Failed to open 'libpcap.a'
>>> *** [libpcap.a] Error code 70
>>>
>>> and:
>>>
>>>
>>> --- libkafs5.a ---
>>> ranlib: fatal: Failed to open 'libkafs5.a'
>>> --- libkafs5_p.a ---
>>> ranlib: fatal: Failed to open 'libkafs5_p.a'
>>> --- cddl/lib__L ---
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
>>>  note: include the header <ctype.h> or explicitly provide a declaration for
>>> 'toupper'
>>> --- kerberos5/lib__L ---
>>> *** [libkafs5_p.a] Error code 70
>>>
>>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
>>> --- libkafs5.a ---
>>> *** [libkafs5.a] Error code 70
>>>
>>> and:
>>>
>>>
>>> --- lib__L ---
>>> ranlib -D libclang_rt.asan_cxx-i386.a
>>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
>>> *** [libclang_rt.asan_cxx-i386.a] Error code 70
>>>
>>>
>>> (Notice the variability in what .a the ranlib's fail for.)
>>>
>>>
>>>
>>>
>>>
>>
>>
>> I looked at this a few days ago and don't believe it's actually a build
>> race. I think there is something wrong with the ar/ranlib on that system
>> or something else. I've found no evidence of concurrent building of the
>> .a files in question.
> 
> FWIW, I got a similar failure when I did the last checks for the OFED
> commit.  For me, it was libgcc.a.
> 

If it was -lgcc_s then it's a known rare build race due to
tools/install.sh not handling -S.

-- 
Regards,
Bryan Drewery





Re: ESXi NFSv4.1 client id is nasty

2018-06-18 Thread Steve Wills

Hi,

On 06/18/18 17:42, Rick Macklem wrote:
> Steve Wills wrote:
>> Would it be possible or reasonable to use the client ID to log a message
>> telling the admin to enable a sysctl to enable the hacks?
>
> Yes. However, this client implementation id is only seen by the server
> when the client makes a mount attempt.
>
> I suppose it could log the message and fail the mount, if the "hack" sysctl
> isn't set?

I hadn't thought of failing the mount, just of leaving the hacks disabled by
default unless the admin chooses to enable them, while at the same time
being proactive about telling the admin to enable them.

I.e., keep the implementation RFC compliant, since we wouldn't be changing
the behavior based on the implementation ID, only based upon the admin
setting the sysctl, which we told them to do based on the implementation ID.


Just an idea, maybe Warner's suggestion is a better one.
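
The idea can be sketched as a small decision function. This is only an
illustration in shell, not server code: mount_policy and its two inputs are
invented names, and the second input stands in for a sysctl that does not
exist under any particular name yet.

```shell
#!/bin/sh
# Hypothetical sketch: the implementation ID only triggers a log message;
# behavior depends solely on the admin-controlled setting.
mount_policy() {
    client_is_esxi=$1   # 1 if the implementation ID identifies ESXi
    hacks_enabled=$2    # 1 if the admin turned the workarounds on
    if [ "$hacks_enabled" -eq 1 ]; then
        echo "workarounds=on"
    elif [ "$client_is_esxi" -eq 1 ]; then
        echo "workarounds=off log=suggest-enabling-sysctl"
    else
        echo "workarounds=off"
    fi
}
```

Note that the first branch does not look at the client at all, which is the
point: the ID never changes protocol behavior, it only changes what is logged.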

Steve




Re: ESXi NFSv4.1 client id is nasty

2018-06-18 Thread Rick Macklem
Steve Wills wrote:
>Would it be possible or reasonable to use the client ID to log a message
>telling the admin to enable a sysctl to enable the hacks?
Yes. However, this client implementation id is only seen by the server
when the client makes a mount attempt.

I suppose it could log the message and fail the mount, if the "hack" sysctl
isn't set?

rick
[stuff snipped]


From: Steve Wills 
Sent: Monday, June 18, 2018 5:21:10 PM
To: Rick Macklem; freebsd-current@freebsd.org
Cc: andreas.n...@frequentis.com
Subject: Re: ESXi NFSv4.1 client id is nasty

Would it be possible or reasonable to use the client ID to log a message
telling the admin to enable a sysctl to enable the hacks?

Steve

On 06/17/18 08:35, Rick Macklem wrote:
> Hi,
>
> Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in ESXi
> 6.5u1 (VMware) against the FreeBSD server. I have given him a bunch of
> hackish patches to try and some of them do help. However not all issues are
> resolved.
> The problem is that these hacks pretty obviously violate the NFSv4.1 RFC
> (5661).
> (Details on these come later, for those interested in such things.)
>
> I can think of three ways to deal with this:
> 1 - Just leave the server as is and point people to the issues that should
>   be addressed in the ESXi client.
> 2 - Put the hacks in, but only enable them based on a sysctl not enabled by
>   default.
>   (The main problem with this is when the server also has non-ESXi mounts.)
> 3 - Enable the hacks for ESXi client mounts only, using the implementation ID
>   it presents at mount time in its ExchangeID arguments.
>   - This is my preferred solution, but the RFC says:
> An example use for implementation identifiers would be diagnostic
> software that extracts this information in an attempt to identify
> interoperability problems, performance workload behaviors, or general
> usage statistics.  Since the intent of having access to this
> information is for planning or general diagnosis only, the client and
> server MUST NOT interpret this implementation identity information in
> a way that affects interoperational behavior of the implementation.
> The reason is that if clients and servers did such a thing, they
> might use fewer capabilities of the protocol than the peer can
> support, or the client and server might refuse to interoperate.
>
> Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that, since
> the hacks violate the RFC, then why not enable them in a way that violates
> the RFC.
>
> Anyhow, I would like to hear from others w.r.t. how they think this should
> be handled?
>
> Here are the details on the breakage and workarounds for those interested,
> from looking at packet traces in wireshark:
> Fairly benign ones:
> - The client does a ReclaimComplete with one_fs == false and then does a
>    ReclaimComplete with one_fs == true. The server returns
>    NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client
>    doesn't like.
>    Workaround: Don't return an error for the one_fs == true case and just
>    treat it the same as "one_fs == false".
>    There is also a case where the client only does the
>    ReclaimComplete with one_fs == true. Since FreeBSD exports a hierarchy of
>    file systems, this doesn't indicate to the server that all reclaims are
>    done.
>    (Other extant clients never do the "one_fs == true" variant of
>    ReclaimComplete.)
>    This case of just doing the "one_fs == true" variant is actually a
>    limitation of the server which I don't know how to fix. However the same
>    workaround as listed above gets around it.
>
> - The client puts random garbage in the delegate_type argument for
>    Open/ClaimPrevious.
>    Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG, it
>    doesn't want a delegation, so assume OPEN_DELEGATE_NONE or
>    OPEN_DELEGATE_NONE_EXT instead of garbage. (Not sure which of the two
>    values makes it happier.)
>
> Serious ones:
> - The client does an OpenDowngrade with arguments set to OPEN_SHARE_ACCESS_BOTH
>    and OPEN_SHARE_DENY_BOTH.
>    Since OpenDowngrade is supposed to decrease share_access and share_deny,
>    the server returns NFS4ERR_INVAL. OpenDowngrade is not supposed to ever
>    conflict with another Open. (A conflict happens when another Open has
>    set an OPEN_SHARE_DENY that denies the result of the OpenDowngrade.)
>    with NFS4ERR_SHARE_DENIED.
>    I believe this one is done by the client for something it calls a
>    "device lock" and really doesn't like this failing.
>    Workaround: All I can think of is ignore the check for new bits not being
>    set and reply NFS_OK, when no conflicting Open exists.
>    When there is a conflicting Open, returning NFS4ERR_INVAL seems to be the
>    only option, since NFS4ERR_SHARE_DENIED isn't listed for OpenDowngrade.
>
> - W

Re: ESXi NFSv4.1 client id is nasty

2018-06-18 Thread Warner Losh
My thoughts on this are mixed.

You need certain workarounds, but they sound like they need to be on a
per-client-type basis.
On the one hand, you don't want to chat with different clients differently,
but on the other you want it to work.

I'd suggest a two-tiered approach.

First, have a sysctl per workaround that's a list of client types to apply
the workaround to. Have these default to the ESXi client, but allow for others.

Second, have a master sysctl to turn on/off per-client workarounds. Have
this default to off.

And finally, see if you can get ESXi to fix their flaws. This is by far the
best solution. The above should really only be a stop-gap, but would be
extensible should this sort of thing become more of the norm than is
desired.
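
To make the two tiers concrete, here is a rough shell sketch of the decision
the server would make for one workaround. It only models the logic and reads
no sysctl; the knob names in the comments are hypothetical (nothing like them
exists in the tree today), and so is the function name.

```sh
#!/bin/sh
# Two-tier check, modelled as a pure function.
#   $1  master switch, e.g. a hypothetical vfs.nfsd.client_workarounds (0 or 1)
#   $2  comma-separated client types for ONE workaround, e.g. a hypothetical
#       vfs.nfsd.wa_reclaim_clients
#   $3  client type derived from the implementation ID in ExchangeID
workaround_enabled() {
    master=$1 clients=$2 impl=$3

    # Master switch: defaults to off and overrides every per-workaround list.
    [ "$master" = "1" ] || return 1

    # Per-workaround list: apply only to the client types named in it.
    case ",$clients," in
        *",$impl,"*) return 0 ;;
        *) return 1 ;;
    esac
}

if workaround_enabled 1 "ESXi" "ESXi"; then
    echo "apply workaround"
fi
```

With the master switch at 0 the same call fails regardless of the list, which
is the "default to off" behaviour described above.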

Warner

On Mon, Jun 18, 2018 at 3:21 PM, Steve Wills  wrote:

> Would it be possible or reasonable to use the client ID to log a message
> telling the admin to enable a sysctl to enable the hacks?
>
> Steve

Re: ESXi NFSv4.1 client id is nasty

2018-06-18 Thread Steve Wills
Would it be possible or reasonable to use the client ID to log a message 
telling the admin to enable a sysctl to enable the hacks?


Steve

On 06/17/18 08:35, Rick Macklem wrote:

Hi,

Andreas Nagy has been doing a lot of testing of the NFSv4.1 client in ESXi 6.5u1
(VMware) against the FreeBSD server. I have given him a bunch of hackish patches
to try and some of them do help. However not all issues are resolved.
The problem is that these hacks pretty obviously violate the NFSv4.1 RFC (5661).
(Details on these come later, for those interested in such things.)

I can think of three ways to deal with this:
1 - Just leave the server as is and point people to the issues that should be
  addressed in the ESXi client.
2 - Put the hacks in, but only enable them based on a sysctl not enabled by
  default.
  (The main problem with this is when the server also has non-ESXi mounts.)
3 - Enable the hacks for ESXi client mounts only, using the implementation ID
  it presents at mount time in its ExchangeID arguments.
  - This is my preferred solution, but the RFC says:
An example use for implementation identifiers would be diagnostic
software that extracts this information in an attempt to identify
interoperability problems, performance workload behaviors, or general
usage statistics.  Since the intent of having access to this
information is for planning or general diagnosis only, the client and
server MUST NOT interpret this implementation identity information in
a way that affects interoperational behavior of the implementation.
The reason is that if clients and servers did such a thing, they
might use fewer capabilities of the protocol than the peer can
support, or the client and server might refuse to interoperate.

Note the "MUST NOT" w.r.t. doing this. Of course, I could argue that, since the
hacks violate the RFC, then why not enable them in a way that violates the RFC.

Anyhow, I would like to hear from others w.r.t. how they think this should be
handled.

Here are details on the breakage and workarounds for those interested, from
looking at packet traces in wireshark:
Fairly benign ones:
- The client does a ReclaimComplete with one_fs == false and then does a
   ReclaimComplete with one_fs == true. The server returns
   NFS4ERR_COMPLETE_ALREADY for the second one, which the ESXi client
   doesn't like.
   Workaround: Don't return an error for the one_fs == true case and just treat
   it the same as "one_fs == false".
   There is also a case where the client only does the
   ReclaimComplete with one_fs == true. Since FreeBSD exports a hierarchy of
   file systems, this doesn't indicate to the server that all reclaims are done.
   (Other extant clients never do the "one_fs == true" variant of
   ReclaimComplete.)
   This case of just doing the "one_fs == true" variant is actually a limitation
   of the server which I don't know how to fix. However the same workaround
   as listed above gets around it.

- The client puts random garbage in the delegate_type argument for
   Open/ClaimPrevious.
   Workaround: Since the client sets OPEN4_SHARE_ACCESS_WANT_NO_DELEG, it
   doesn't want a delegation, so assume OPEN_DELEGATE_NONE or
   OPEN_DELEGATE_NONE_EXT instead of garbage. (Not sure which of the two
   values makes it happier.)

Serious ones:
- The client does an OpenDowngrade with arguments set to OPEN_SHARE_ACCESS_BOTH
   and OPEN_SHARE_DENY_BOTH.
   Since OpenDowngrade is supposed to decrease share_access and share_deny,
   the server returns NFS4ERR_INVAL. OpenDowngrade is not supposed to ever
   conflict with another Open. (A conflict happens when another Open has
   set an OPEN_SHARE_DENY that denies the result of the OpenDowngrade.)
   with NFS4ERR_SHARE_DENIED.
   I believe this one is done by the client for something it calls a
   "device lock" and really doesn't like this failing.
   Workaround: All I can think of is ignore the check for new bits not being set
   and reply NFS_OK, when no conflicting Open exists.
   When there is a conflicting Open, returning NFS4ERR_INVAL seems to be the
   only option, since NFS4ERR_SHARE_DENIED isn't listed for OpenDowngrade.
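
The "new bits" check mentioned in that workaround can be sketched in a few
lines of shell arithmetic. This is only an illustration, not the server code:
the function names are made up, and it looks only at the low share bits
(READ = 1, WRITE = 2, BOTH = 3 for access; NONE = 0, READ = 1, WRITE = 2,
BOTH = 3 for deny), ignoring the WANT bits the client may also set.

```sh
#!/bin/sh
# is_subset OLD NEW: succeeds when NEW sets no bit that is missing from OLD.
is_subset() {
    old=$1 new=$2
    [ $(( new & ~old )) -eq 0 ]
}

# downgrade_is_valid CUR_ACCESS CUR_DENY NEW_ACCESS NEW_DENY
# Strict check: a downgrade may only remove access and deny bits.
downgrade_is_valid() {
    is_subset "$1" "$3" && is_subset "$2" "$4"
}

# Open currently has access BOTH (3) and deny NONE (0); the ESXi client asks
# for access BOTH (3) and deny BOTH (3), which adds deny bits.
if downgrade_is_valid 3 0 3 3; then
    echo "NFS_OK"
else
    echo "NFS4ERR_INVAL"
fi
```

The workaround amounts to skipping this check for the ESXi client when no
other Open conflicts with the requested bits.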

- When a server reboots, client does not serialize ExchangeID/CreateSession.
   When the server reboots, a client needs to do a serialized set of RPCs
   with ExchangeID followed by CreateSession to confirm it. The reply to
   ExchangeID has a sequence number (csr_sequence) in it and the
   CreateSession needs to have the same value in its csa_sequence argument
   to confirm the clientid issued by the ExchangeID.
   The client sends many ExchangeIDs and CreateSessions, so they end up failing
   many times due to the sequence number not matching the last ExchangeID.
   (This might only happen in the trunked case.)
   Workaround: Nothing that I can think of.

- ExchangeID sometimes sends eia_clientowner.co_verifier argument as all zeros.
   Sometimes the client bogusly fill

Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Mark Millard



On 2018-Jun-18, at 12:42 PM, Bryan Drewery  wrote:

> On 6/15/2018 10:55 PM, Mark Millard wrote:
>> In watching ci.freebsd.org builds I've seen a notable
>> number of one time failures, such as (example from
>> powerpc64):
>> 
>> --- all_subdir_lib/libufs ---
>> ranlib -D libufs.a
>> ranlib: fatal: Failed to open 'libufs.a'
>> *** [libufs.a] Error code 70
>> 
>> where the next build works despite the change being
>> irrelevant to whatever ranlib complained about.
>> 
>> Other builds failed similarly:
>> 
>> --- all_subdir_lib/libbsm ---
>> ranlib -D libbsm_p.a
>> ranlib: fatal: Failed to open 'libbsm_p.a'
>> *** [libbsm_p.a] Error code 70
>> 
>> and:
>> 
>> --- kerberos5/lib__L ---
>> ranlib -D libgssapi_spnego_p.a
>> --- libgssapi_spnego.a ---
>> ranlib -D libgssapi_spnego.a
>> --- libgssapi_spnego_p.a ---
>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
>> *** [libgssapi_spnego_p.a] Error code 70
>> 
>> and so on.
>> 
>> 
>> It is not limited to powerpc64. For example, for aarch64
>> there are:
>> 
>> --- libpam_exec.a ---
>> building static pam_exec library
>> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q` 
>> ranlib -D libpam_exec.a
>> ranlib: fatal: Failed to open 'libpam_exec.a'
>> *** [libpam_exec.a] Error code 70
>> 
>> and:
>> 
>> --- all_subdir_lib/libusb ---
>> ranlib -D libusb.a
>> ranlib: fatal: Failed to open 'libusb.a'
>> *** [libusb.a] Error code 70
>> 
>> and:
>> 
>> --- all_subdir_lib/libbsnmp ---
>> ranlib: fatal: Failed to open 'libbsnmp.a'
>> --- all_subdir_lib/ncurses ---
>> --- all_subdir_lib/ncurses/panelw ---
>> --- panel.pico ---
>> --- all_subdir_lib/libbsnmp ---
>> *** [libbsnmp.a] Error code 70
>> 
>> 
>> Even amd64 gets such:
>> 
>> --- libpcap.a ---
>> ranlib -D libpcap.a
>> ranlib: fatal: Failed to open 'libpcap.a'
>> *** [libpcap.a] Error code 70
>> 
>> and:
>> 
>> 
>> --- libkafs5.a ---
>> ranlib: fatal: Failed to open 'libkafs5.a'
>> --- libkafs5_p.a ---
>> ranlib: fatal: Failed to open 'libkafs5_p.a'
>> --- cddl/lib__L ---
>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
>> note: include the header <ctype.h> or explicitly provide a declaration for
>> 'toupper'
>> --- kerberos5/lib__L ---
>> *** [libkafs5_p.a] Error code 70
>> 
>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
>> --- libkafs5.a ---
>> *** [libkafs5.a] Error code 70
>> 
>> and:
>> 
>> 
>> --- lib__L ---
>> ranlib -D libclang_rt.asan_cxx-i386.a
>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
>> *** [libclang_rt.asan_cxx-i386.a] Error code 70
>> 
>> 
>> (Notice the variability in what .a the ranlib's fail for.)
>> 
>> 
>> 
>> 
>> 
> 
> 
> I looked at this a few days ago and don't believe it's actually a build
> race. I think there is something wrong with the ar/ranlib on that system
> or something else. I've found no evidence of concurrent building of the
> .a files in question.


Looking at a bunch of the failures, spanning multiple
FreeBSD-head-*-build types of builds, I see only:

NODE_LABELS bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
NODE_NAME   butler1.nyi.freebsd.org

for the failures that I looked at.

So your "on that system" might well be correct.
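
For anyone who wants to repeat that check on saved console logs, a small
sketch along these lines pulls out the archives that ranlib failed on. It
assumes the console log text arrives on stdin in the format quoted above; the
function name is made up.

```sh
#!/bin/sh
# failed_archives: hypothetical helper. Reads a build log on stdin and prints
# each archive named in a "ranlib: fatal: Failed to open" line, once.
failed_archives() {
    sed -n "s/^ranlib: fatal: Failed to open '\(.*\)'\$/\1/p" | sort -u
}

# Example with a few lines copied from the powerpc64 failure.
printf '%s\n' \
    "--- all_subdir_lib/libufs ---" \
    "ranlib -D libufs.a" \
    "ranlib: fatal: Failed to open 'libufs.a'" \
    "*** [libufs.a] Error code 70" | failed_archives
```

Running it once per log and noting the NODE_NAME shown for that build gives a
per-node tally.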


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Konstantin Belousov
On Mon, Jun 18, 2018 at 12:42:46PM -0700, Bryan Drewery wrote:
> I looked at this a few days ago and don't believe it's actually a build
> race. I think there is something wrong with the ar/ranlib on that system
> or something else. I've found no evidence of concurrent building of the
> .a files in question.

FWIW, I got a similar failure when I did the last checks for the OFED
commit.  For me, it was libgcc.a.


RFC: ESXi client is nasty, what should I do?

2018-06-18 Thread Rick Macklem
Hi,

I realized that the subject line "ESXi NFSv4.1 client id is nasty" wouldn't have
indicated that I was looking for comments w.r.t. how to handle this poorly
behaved client.

Please go to the "ESXi NFSv4.1 client id is nasty" thread and comment.
(It should be in the archive, if you already deleted it.)

Thanks, rick


Re: A head buildworld race visible in the ci.freebsd.org build history

2018-06-18 Thread Bryan Drewery
On 6/15/2018 10:55 PM, Mark Millard wrote:
> In watching ci.freebsd.org builds I've seen a notable
> number of one time failures, such as (example from
> powerpc64):
> 
> --- all_subdir_lib/libufs ---
> ranlib -D libufs.a
> ranlib: fatal: Failed to open 'libufs.a'
> *** [libufs.a] Error code 70
> 
> where the next build works despite the change being
> irrelevant to whatever ranlib complained about.
> 
> Other builds failed similarly:
> 
> --- all_subdir_lib/libbsm ---
> ranlib -D libbsm_p.a
> ranlib: fatal: Failed to open 'libbsm_p.a'
> *** [libbsm_p.a] Error code 70
> 
> and:
> 
> --- kerberos5/lib__L ---
> ranlib -D libgssapi_spnego_p.a
> --- libgssapi_spnego.a ---
> ranlib -D libgssapi_spnego.a
> --- libgssapi_spnego_p.a ---
> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
> *** [libgssapi_spnego_p.a] Error code 70
> 
> and so on.
> 
> 
> It is not limited to powerpc64. For example, for aarch64
> there are:
> 
> --- libpam_exec.a ---
> building static pam_exec library
> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q` 
> ranlib -D libpam_exec.a
> ranlib: fatal: Failed to open 'libpam_exec.a'
> *** [libpam_exec.a] Error code 70
> 
> and:
> 
> --- all_subdir_lib/libusb ---
> ranlib -D libusb.a
> ranlib: fatal: Failed to open 'libusb.a'
> *** [libusb.a] Error code 70
> 
> and:
> 
> --- all_subdir_lib/libbsnmp ---
> ranlib: fatal: Failed to open 'libbsnmp.a'
> --- all_subdir_lib/ncurses ---
> --- all_subdir_lib/ncurses/panelw ---
> --- panel.pico ---
> --- all_subdir_lib/libbsnmp ---
> *** [libbsnmp.a] Error code 70
> 
> 
> Even amd64 gets such:
> 
> --- libpcap.a ---
> ranlib -D libpcap.a
> ranlib: fatal: Failed to open 'libpcap.a'
> *** [libpcap.a] Error code 70
> 
> and:
> 
> 
> --- libkafs5.a ---
> ranlib: fatal: Failed to open 'libkafs5.a'
> --- libkafs5_p.a ---
> ranlib: fatal: Failed to open 'libkafs5_p.a'
> --- cddl/lib__L ---
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26:
> note: include the header <ctype.h> or explicitly provide a declaration for
> 'toupper'
> --- kerberos5/lib__L ---
> *** [libkafs5_p.a] Error code 70
> 
> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
> --- libkafs5.a ---
> *** [libkafs5.a] Error code 70
> 
> and:
> 
> 
> --- lib__L ---
> ranlib -D libclang_rt.asan_cxx-i386.a
> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
> *** [libclang_rt.asan_cxx-i386.a] Error code 70
> 
> 
> (Notice the variability in what .a the ranlib's fail for.)
> 
> 
> 
> 
> 


I looked at this a few days ago and don't believe it's actually a build
race. I think there is something wrong with the ar/ranlib on that system
or something else. I've found no evidence of concurrent building of the
.a files in question.


-- 
Regards,
Bryan Drewery


Current @ r335314 not bootable with Geli and ZFS

2018-06-18 Thread Thomas Laus
Something changed in /boot/gptzfsboot between r334610 and r335314.  I
built current this morning and my system is unbootable.  I am using
redundant ZFS disks and only copied the updated /boot/gptzfsboot file to
my ada0 drive.  I was able to boot the ada1 drive that still had the
gptzfsboot file from r334610.

I had a similar issue a few months ago with the upgrades to the Geli +
ZFS booting process.  These were resolved and operation has been fine
since the last hiccup in the testing process.  I might not be the
only person running the combination of Geli encryption and a ZFS
filesystem, but it should not be such an uncommon setup that I am the
first to report the problem.

Let me know how far back I need to revert my sources to identify the commit
that broke gptzfsboot.  My system goes into a continuous reboot loop
before presenting the password prompt.  It is very early in the startup
process.
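
For what it's worth, a plain bisection over that revision range is short. The
sketch below only does the arithmetic (the helper names are made up, and it
treats every revision number in the range as a candidate, so it is an upper
bound): the first revision to test would be r334962, and the range narrows to
a single commit in at most 10 builds.

```sh
#!/bin/sh
# bisect_mid GOOD BAD: the revision halfway between a good and a bad one.
bisect_mid() {
    echo $(( ($1 + $2) / 2 ))
}

# bisect_steps GOOD BAD: how many halvings until the range is one revision.
bisect_steps() {
    span=$(( $2 - $1 )) steps=0
    while [ "$span" -gt 1 ]; do
        span=$(( (span + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

echo "first revision to test: r$(bisect_mid 334610 335314)"
echo "builds needed, at most: $(bisect_steps 334610 335314)"
```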

Tom

-- 
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF


Re: r335282: first stage boot failure on PCengines APU 2C4

2018-06-18 Thread O. Hartmann
On Mon, 18 Jun 2018 07:42:20 -0600
Warner Losh wrote:

> On Mon, Jun 18, 2018, 3:01 AM Olivier Cochard-Labbé wrote:
>
> > On Sun, Jun 17, 2018 at 10:01 AM O. Hartmann wrote:
> >
> > > Running CURRENT as routing and firewalling appliance on a PCengines APU
> > > 2C4 with the latest (official) SEABios available for this product,
> > > NanoBSD (FreeBSD CURRENT FreeBSD 12.0-CURRENT #60 r335278: Sun Jun 17
> > > 07:57:20 CEST 2018 amd64) is unable to boot recent OS at the first stage
> > > (GPT partitioning, SD card memory).
> >
> > Hi,
> >
> > My nanobsd images are based on:
> > [root@apu2]~# uname -a
> > FreeBSD apu2 12.0-CURRENT FreeBSD 12.0-CURRENT  r335286M  amd64
> >
> > And I don't remember having upgraded its BIOS:
> > [root@apu2]~# kenv smbios.bios.reldate
> > 03/07/2016
> > [root@apu2]~# kenv smbios.system.product
> > apu2
> >
> > But I'm using MBR partitioning on a mSATA disk.
>
> Do you know the first version to have this problem? Are you using geli?
>
> Warner


Hello, 

if you addressed me (it wasn't so clear from your reply to Olivier
Cochard-Labbé's reply to me), I can give you this information. I posted it to
Allan Jude in response to

svn commit: r335254 - in head/stand/i386: libi386 zfsboot

which is probably the wrong commit. Please allow me to copy-paste:

[...]

I realised that CURRENT r335222 from Friday, 10th June 2018 booted without
problems (NanoBSD, slightly modified to boot off GPT/UEFI partitions, but in
general a simple "gpart bootcode -b pmbr -p /boot/gptboot -i 2 mmcsd0"
preparation of a dd'd image).
Another try with recent r335282 on an APU 2C4, freshly built NanoBSD, fails to
boot at the first stage: the (most recent) SEABios stops forever after printing
"Booting from hard disk"; where usually a caret starts to spin, there is vast
emptiness. Preparing the boot partition with an older bootcode on that very
same SD card via "gpart bootcode -b pmbr -p /boot/gptboot -i 2 mmcsd0" from an
older booted image of CURRENT has solved the problem. The layout of the SD card
is as follows, just for the record:

#: gpart show mmcsd0
=>        40  60751792  mmcsd0  GPT  (29G)
          40      1024       2  freebsd-boot  (512K)
        1064   2205944       3  freebsd-ufs  (1.1G)
     2207008   2210127       4  freebsd-ufs  (1.1G)
     4417135   1048576       5  freebsd-ufs  (512M)
     5465711  55286121          - free -  (26G)

[...]

I didn't bisect the issue due to time constraints, but if it helps, I could
try. I resolved the problem temporarily by writing the bootcode and partition
code from an older image.

The APU is serial console only.

Kind regards,

Oliver Hartmann


-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for
market or opinion research (§ 28 Abs. 4 BDSG).


Re: Ryzen public erratas

2018-06-18 Thread Mike Tancsa
On 6/13/2018 6:35 AM, Konstantin Belousov wrote:
> 
> Please report the results.  If the script helps, I will code the kernel
> change to apply the workarounds.

The hard lockups I was seeing on Ryzen and Epyc boxes are now gone with
the microcode and script below.

Not sure if it's one or some combo of the settings, but all the steps
below have made my 2 test systems stable on RELENG_11 anyway.

This was on a Ryzen 5 1600X (ASUS PRIME X370-PRO BIOS from 04/19/2018)
CPU Microcode patch level: 0x8001137

And
EPYC 7281 16-Core (Supermicro H11SSL-i BIOS 04/27/2018 )
Microcode patch level: 0x8001227



Details of the issue were discussed at

https://lists.freebsd.org/pipermail/freebsd-virtualization/2018-March/006187.html
and
https://lists.freebsd.org/pipermail/freebsd-stable/2018-January/088174.html

TL;DR: Generating traffic via iperf3 between VMs on either bhyve or
VirtualBox would make the box lock up: no crash, just a blank screen.

---Mike


> 
> #!/bin/sh
> 
> # Enable workarounds for erratas listed in
> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
> 
> # 1057, 1109
> sysctl machdep.idle_mwait=0
> sysctl machdep.idle=hlt
> 
> for x in /dev/cpuctl*; do
>   # 1021
>   cpucontrol -m '0xc0011029|=0x2000' $x
>   # 1033
>   cpucontrol -m '0xc0011020|=0x10' $x
>   # 1049
>   cpucontrol -m '0xc0011028|=0x10' $x
>   # 1095
>   cpucontrol -m '0xc0011020|=0x200' $x
> done
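
For readers checking the masks against the errata document: the '|=' form of
cpucontrol -m ORs the mask into the current MSR value, so each line in the loop
sets one bit and leaves the others alone. The sketch below is pure arithmetic
and touches no MSR; the helper names are made up for illustration.

```sh
#!/bin/sh
# msr_or CURRENT MASK: the value an MSR would hold after "msr|=MASK".
msr_or() {
    printf '0x%x\n' $(( $1 | $2 ))
}

# bit_of MASK: position of the single bit set in MASK.
bit_of() {
    mask=$(( $1 )) bit=0
    while [ $(( mask >> 1 )) -ne 0 ]; do
        mask=$(( mask >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$bit"
}

# Masks used in the script above (errata 1021, 1033/1049, 1095).
for mask in 0x2000 0x10 0x200; do
    echo "mask $mask sets bit $(bit_of "$mask")"
done

# Both masks for MSR 0xc0011020 applied to a value of zero.
msr_or "$(msr_or 0x0 0x10)" 0x200
```

Because OR is idempotent, running the script a second time leaves the
registers unchanged.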


-- 
---
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada


Re: r335282: first stage boot failure on PCengines APU 2C4

2018-06-18 Thread Warner Losh
On Mon, Jun 18, 2018, 3:01 AM Olivier Cochard-Labbé wrote:

> On Sun, Jun 17, 2018 at 10:01 AM O. Hartmann wrote:
>
> > Running CURRENT as routing and firewalling appliance on a PCengines APU
> > 2C4 with the latest (official) SEABios available for this product,
> > NanoBSD (FreeBSD CURRENT FreeBSD 12.0-CURRENT #60 r335278: Sun Jun 17
> > 07:57:20 CEST 2018 amd64) is unable to boot recent OS at the first stage
> > (GPT partitioning, SD card memory).
>
> Hi,
>
> My nanobsd images are based on:
> [root@apu2]~# uname -a
> FreeBSD apu2 12.0-CURRENT FreeBSD 12.0-CURRENT  r335286M  amd64
>
> And I don't remember having upgraded its BIOS:
> [root@apu2]~# kenv smbios.bios.reldate
> 03/07/2016
> [root@apu2]~# kenv smbios.system.product
> apu2
>
> But I'm using MBR partitioning on a mSATA disk.

Do you know the first version to have this problem? Are you using geli?

Warner


Re: r335282: first stage boot failure on PCengines APU 2C4

2018-06-18 Thread Olivier Cochard-Labbé
On Sun, Jun 17, 2018 at 10:01 AM O. Hartmann  wrote:

> Running CURRENT as routing and firewalling appliance on a PCengines APU
> 2C4 with the latest (official) SEABios available for this product,
> NanoBSD (FreeBSD CURRENT FreeBSD 12.0-CURRENT #60 r335278: Sun Jun 17
> 07:57:20 CEST 2018 amd64) is unable to boot recent OS at the first stage
> (GPT partitioning, SD card memory).

Hi,

My nanobsd images are based on:
[root@apu2]~# uname -a
FreeBSD apu2 12.0-CURRENT FreeBSD 12.0-CURRENT  r335286M  amd64

And I don't remember having upgraded its BIOS:
[root@apu2]~# kenv smbios.bios.reldate
03/07/2016
[root@apu2]~# kenv smbios.system.product
apu2

But I'm using MBR partitioning on a mSATA disk.

Regards