https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399

--- Comment #259 from Don Lewis <truck...@freebsd.org> ---
(In reply to SF from comment #256)
Neither of my AM4 boards has a VRM frequency adjustment, and none of my large
collection of non-AM4 boards has one either.  I think this feature is pretty
rare.

The highest temperature that I observed in my testing was about 62 C, and that
was only on very hot afternoons in a room without air conditioning.  We only
recently got temperature monitoring working in FreeBSD for Ryzen, so I don't
know what the CPU temperature was in my early testing, but the room temperature
was probably 10 C lower during my overnight tests, and that didn't seem to make
any difference.  Disabling all but two cores in the BIOS also didn't make the
errors go away.  That should have reduced power consumption and heat
dissipation to something like 25W.  Reducing the CPU and RAM clock frequencies
also did not help.  Forcing the cooling fans to run at full speed full time
also did not help.  The default fan curve never cranked up the fan speed this
high.  This doesn't look like a thermal or voltage regulation issue to me.

The only thing that really seemed to improve the results that I was seeing was
tweaking the scheduler to limit the migration of threads between cores, and the
effect was not at all subtle.
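
For anyone who wants a rough user-space approximation of "limit migration"
for a single test run, the sketch below pins the calling process to one CPU.
This is not the scheduler tweak described above, which changed scheduler
behavior system-wide; it is a different and much cruder technique.  The helper
name pin_to_one_cpu is hypothetical, and the os.sched_* calls are only present
where Python exposes them (Linux does; other platforms may not), so the
function returns None when they are missing.  FreeBSD's cpuset(1) utility
serves a similar purpose from the command line.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict `pid` (0 = calling process) to a single CPU.

    Hypothetical helper for illustration only. Returns the CPU index
    chosen, or None if this platform's Python lacks the affinity calls.
    """
    if not (hasattr(os, "sched_getaffinity") and hasattr(os, "sched_setaffinity")):
        return None

    # Only pick from CPUs the process is already allowed to run on.
    allowed = os.sched_getaffinity(pid)
    cpu = min(allowed)

    # After this call the scheduler can no longer migrate the process
    # (or any children it starts afterwards) to another core.
    os.sched_setaffinity(pid, {cpu})
    return cpu


if __name__ == "__main__":
    chosen = pin_to_one_cpu()
    if chosen is None:
        print("CPU affinity calls are not available on this platform")
    else:
        print("pinned to CPU", chosen)
```

A build started from a process pinned this way runs on one core only, so it
shows whether the failures track migration, at the cost of all parallelism.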

The AMD Community Forum thread that I cited has posts from a large number of
Linux users who were experiencing the random segfault problem.  Many of them
worked with AMD customer support, who suggested trying a number of different
things (mostly voltage tweaks, disabling SMT, disabling OPCACHE, etc.) that
really didn't seem to solve the problem.  At best they reduced the frequency of
the errors.  AMD does now say that there is a "performance marginality" issue
and has been doing warranty replacements of CPUs for users who have this
problem.  Generally, people who have gotten replacement CPUs have been happy
with the results.  I don't think AMD would be spending the money to do this if
the problem could be fixed with a motherboard BIOS upgrade that would tweak the
default VRM settings.  Apparently AMD is now able to screen for this problem,
because they also stated that Threadripper is not affected, and it uses two of
the Ryzen dies (with the same stepping as the Ryzen CPU chips).

In my case, I just received a warranty CPU replacement.  The random compiler
segfaults are now gone.  The only info that I had to send AMD was my CPU part
and serial numbers, a description of my hardware (PSU, RAM, motherboard, BIOS
revision, etc.), a photo of the BIOS screen showing voltages and temperatures,
and a photo of my case interior so they could look for any potential cooling
problems.  Based on that, they approved an RMA and sent me a replacement CPU. 
It doesn't look like they thought that any BIOS tuning tweaks would be worth
trying.  I still see some random build failures, but I see the same sorts of
failures on my AMD FX-8320E.

