[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #22 from Chris Collins  ---
(In reply to Mark Millard from comment #20)

Thanks. The laptop isn't using MSI-X or MSI anyway, so I am OK on that. I will
have a look at the i5 750 dmesg to see if MSI or MSI-X is used.
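
One way to do that dmesg check, as a rough sketch: the helper below counts
boot-log lines that mention MSI or MSI-X. The function name is made up for
illustration, and whether a driver announces its interrupt type in dmesg varies
by driver (some only do so on a verbose boot), so a count of zero is not proof
that MSI is unused.

```shell
# Hypothetical helper: count boot-log lines that mention MSI or MSI-X.
# Reads dmesg text on stdin so it also works on a saved copy of the log.
count_msi_lines() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that quiet.
    grep -ci 'msi' || true
}
```

On FreeBSD the boot messages are normally kept in /var/run/dmesg.boot, so a
typical call would be `count_msi_lines < /var/run/dmesg.boot`.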

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"


[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-26 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #21 from Chris Collins  ---
No issues on the i5 750 now as well across 4 reboots and 13 buildworlds.

I may raise a new bug regarding the timers, as I also had to adjust the
timecounter on my laptop to get C-states working. Its default kept it in C1 all
the time, so there seem to be odd eventtimer and timecounter issues on older
hardware.

The VMware machine, which has no issues, has a 2016 CPU.
The i5 750 CPU was released in 2009.
The laptop CPU is a Core 2 Duo T5750 released in 2008.

Thanks guys for your help.



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #20 from Mark Millard  ---
(In reply to Chris Collins from comments #18 and #19)

Interesting --and non-obvious.

From what I've read, Message Signaled Interrupts (MSI)
from PCI 2.2+ depend on LAPIC, requiring LAPIC to be
enabled.

If LAPIC is not working correctly then MSI might not
work fully correctly either and so should be avoided
in such a context?

(I'm not familiar with the details in this area. Take
the above as hearsay.)



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #19 from Chris Collins  ---
So to confirm, as I don't think I wrote it clearly: using i8254 on my laptop I
don't get segfaults.  The default timer changed between 11.0 and 11-STABLE.

I also meant "roll of the dice" but made a typo.



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #18 from Chris Collins  ---
Ok a further update.

After a reboot, the i5 750 machine started getting segfaults again. A few
reboots later I have discovered the behaviour is fairly consistent, where a
rolld o the dice occurs on a reboot: usually if the first buildworld has no
problem I can probably do 3+ in a row with no segfault, but if the first has a
segfault then I will struggle to get just one successful buildworld.
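
A hypothetical helper for this kind of repeated testing, as a sketch only: it
runs a given command a fixed number of times, keeps a log per run, and reports
how many runs failed. The function name, the LOGDIR variable, and the log file
names are assumptions for illustration, not anything from the FreeBSD build
system.

```shell
# Hypothetical helper: run a command N times and count the failing runs.
# Usage: repeat_build N command [args...]
repeat_build() {
    runs=$1
    shift
    log_dir=${LOGDIR:-/tmp}    # assumed log location; override with LOGDIR
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Keep stdout and stderr of each run in its own log file.
        if ! "$@" >"$log_dir/repeat-build-$i.log" 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
}
```

For example, from /usr/src: `repeat_build 4 make buildworld`. A failed run here
only means that make exited non-zero; the log still has to be read to see
whether clang actually segfaulted.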

I discovered the LAPIC timer on my laptop is broken, aided by a warning on the
console. When I switched it to i8254 the problem stopped.  I then did a fresh
install of 11.0 and discovered that 11.0 uses i8254 by default but 11-STABLE
uses LAPIC. When LAPIC is used I see some other odd behaviours, e.g.
systat -v 1 updates really slowly.

I then checked my i5 750: on 11.0 it uses LAPIC by default and seems to work
OK; on 11-STABLE LAPIC has the same issues as on the laptop, and it defaults to
HPET.  At the time of this post I haven't tried a buildworld using a non-default
timer, but I am running buildworld now using i8254 on the i5 750 to see what
results I get. I will run it many times over multiple reboots.

The VMware hypervisor has no segfault problems and uses LAPIC by default,
which works fine on 11.0 and 11-STABLE.

All the current tests are with an empty src.conf aside from
'LOADER_ZFS_SUPPORT=YES' and no CPUTYPE defined, to try and simplify the
diagnosis.
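
For readers who want to try the same timer switch, a sketch under stated
assumptions: the sysctl names kern.eventtimer.choice and kern.eventtimer.timer
are the ones documented in FreeBSD's eventtimers(4), while the helper name and
the sample line in the comment are illustrative only. The helper lists the
timer names the kernel offers, so a name can be checked before it is selected.

```shell
# Hypothetical helper: list event timer names, one per line, from a
# "kern.eventtimer.choice: NAME(quality) ..." line read on stdin.
# Illustrative input:  kern.eventtimer.choice: HPET(450) LAPIC(400) i8254(100)
list_eventtimers() {
    sed -e 's/^[^:]*: *//' |              # drop the "kern.eventtimer.choice: " prefix
        tr ' ' '\n' |                     # one NAME(quality) entry per line
        sed -e 's/(.*$//' -e '/^$/d'      # strip "(quality)" and empty lines
}

# On the affected machine, as root, something along these lines:
#   sysctl kern.eventtimer.choice | list_eventtimers
#   sysctl kern.eventtimer.timer=i8254
```

The second sysctl call only changes the running system; making the choice
persistent (for example via /etc/sysctl.conf) is left out of this sketch.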



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #17 from Chris Collins  ---
Perhaps buildworld with clang 4.0 is now the ultimate hardware stability test
:)

3rd compile was fine, now running 4th.

Will still test on the server class hardware this weekend.

So it seems the diagnosis here is that clang 4.0 works the CPU harder, so it is
more likely to show up stability problems than clang 3.x?



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #16 from Chris Collins  ---
I am not insisting it's not hardware, and I continue to pursue the hardware
route.

I am about to go to bed as it is 4am here, but I upped the vcore on my CPU and
the DRAM voltage on the system and have done 2 buildworlds since with no
segfaults. It is an old CPU, so it is possible voltage degradation has occurred
to the point that stock voltage is not enough to be stable, which is why I have
raised the voltage.

I will start another buildworld now which will be a third, if it succeeds it
will be the first time 3 have worked in a row.

It is still on the GENERIC kernel as well.

I will also do more runs tomorrow with an empty src.conf.

If these new runs all work (with increased voltage, and of course if it is also
good on my Xeon), then yes, I accept that as a hardware issue, and it is
possible my old laptop has similar issues as that is old as well. :)



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #15 from Mark Millard  ---
(In reply to Mark Millard from comment #14)

My paragraph:

"If this were a general problem the build servers
would not be able to build the releases, ports,
and such."

was poorly chosen. I should have referred to just
test builds that are based on head, stable/11,
or the drafts of 11.1 . (I expect that there have
been many.) These likely start with
projects/clang*-import/ testing and continue with
head, stable/11, and the 11.1 drafts.

The official builds of releases and such likely are still
based on an older context building the newer
context. I do not know if they build and use a
bootstrap clang 4 and then use it or not when the
target is head, stable/11, or an 11.1 draft version
of some kind. It could be that only the system
compiler is built and installed but not used for
anything relative to buildworld buildkernel activity.

As I understand it, exp-runs were made for building
ports that were based on clang 4. This might
still be on-going.

My own activity is incremental updates of head,
so using clang 4 to build a bootstrap compiler
that is clang 4 when needed. Then using the
resultant clang 4 either way. (I ignore here
experimenting with devel/*xtoolchain* or using
gcc 4.2.1 where I have to [32-bit powerpc
kernel that finishes booting correctly].)

There is also likely activity of other people
working based on clang 4, including buildworld,
buildkernel, and building ports (ports that do
not force some gcc or some other toolchain).

I expect there is still enough activity based
on clang 4 that my overall argument structure
still holds: It would be good to try something
that matches a well used, well established
build configuration overall and see what
the status is for that build configuration.

I'll note that my activity is mostly based on
system-clang, not devel/llvm40 clang. Although
I have attempted devel/xtoolchain-llvm40 for
buildworld and buildkernel when there were
unusual failures like missing routines in
linking. (So far system-clang and
devel/xtoolchain-llvm40 have matched for such
build issues. But I've rarely tried this.)



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #14 from Mark Millard  ---
(In reply to Chris Collins from comment #11)

If this were a general problem the build servers
would not be able to build the releases, ports,
and such.

I do buildworld buildkernel for head on amd64, powerpc64,
aarch64, armv7, and powerpc. I've not been having such
problems. (I do cross builds amd64 ->  more than
native but do on occasion build native for the others.
My amd64 activity is under virtual box on either Windows
10 or macOS 10.12.5 at this point. The others are
directly on the hardware that I have access to.) I
build and run non-debug kernels normally despite running
versions of head.

If what you report was generally happening to others
most FreeBSD activity that is clang 4 based would be
largely "dead in the water" --but it is not. Almost
certainly some uncommon property in other environments
is a property of your environment and is involved. The
problem is isolating what is involved.

It may be time for detailed kernel config specifications.
As I remember you already listed the src.conf that you
use (comment 6). None of my src.conf content matches any
of yours. I do not have any 11.x environments at this point,
just head based, currently -r320192 .

If you have a failing environment that can use a pure
GENERIC kernel config and an empty src.conf (or some
match to a well established set of such files), you
might want to try such. If it happens to work okay
then it would form the starting point of a search
for what makes the difference. By contrast if things
still fail this gets much harder to track down.
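
A sketch of what such a baseline build might look like. Pointing SRCCONF at
/dev/null is one common way to get the effect of an empty src.conf; also
neutralising make.conf via __MAKE_CONF is an extra assumption, not something
stated above. The function name and the MAKE and SRCDIR overrides are
hypothetical conveniences (MAKE=echo gives a dry run).

```shell
# Hypothetical helper: build world and a GENERIC kernel while ignoring any
# local src.conf and make.conf. Runs in a subshell so the caller's working
# directory is left alone.
baseline_build() {
    (
        cd "${SRCDIR:-/usr/src}" || exit 1
        "${MAKE:-make}" buildworld SRCCONF=/dev/null __MAKE_CONF=/dev/null &&
        "${MAKE:-make}" buildkernel KERNCONF=GENERIC SRCCONF=/dev/null __MAKE_CONF=/dev/null
    )
}
```

Running `MAKE=echo baseline_build` first prints the two make invocations
without building anything, which is a cheap way to confirm what will be run.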

I can supply examples of my config files if needed
but I do not have defaults. (Just using clang 4 for
targeting powerpc64 or for powerpc is odd in the
first place: I gather evidence of issues that I
discover and report them, generally to llvm.) I do
have a few source file differences associated with
the experiments on non-amd64 --historically mostly
tied to powerpc64 and powerpc.

(Note: Actually powerpc (32-bit) has problems with
crashing even when sitting idle in my context, even
if built with gcc 4.2.1. I've had crashes in minutes
--or up to somewhat over 10 days 8 hours later. Usually
it has been hours but less than 9 hours. But use of
clang need not be involved at all for this so it
is not a fit to your context. And no other of my
environments has shown such behavior so far.)



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #13 from Chris Collins  ---
This is with buildworld running

root@test 11s # sysctl dev.cpu |grep temper  
dev.cpu.3.temperature: 39.0C
dev.cpu.2.temperature: 40.0C
dev.cpu.1.temperature: 39.0C
dev.cpu.0.temperature: 40.0C
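
To keep an eye on this during a whole buildworld rather than at a single
moment, one could reduce that output to the hottest core. A minimal sketch,
assuming the dev.cpu.N.temperature format shown above (the function name is
made up for illustration):

```shell
# Hypothetical helper: print the highest core temperature found in
# "sysctl dev.cpu" output read on stdin (lines like
# "dev.cpu.0.temperature: 40.0C").
max_cpu_temp() {
    awk -F': ' '/\.temperature:/ {
        t = $2
        sub(/C$/, "", t)                       # drop the trailing unit
        if (max == "" || t + 0 > max + 0) max = t
    } END { if (max != "") print max }'
}
```

Used as `sysctl dev.cpu | max_cpu_temp`; for the four readings above it prints
40.0.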

I will provide feedback Saturday or Sunday when I test on an ESXi instance. The
host machine has ECC RAM and a new Xeon chip powering it, plus server-class
storage.



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #12 from Chris Collins  ---
(In reply to Conrad Meyer from comment #10)

It has no issue with prime95 stress tests and other stress tests.

So to confirm: absolutely 100% stable with all software on the system except
a clang 4.0 buildworld.

The CPU temperature is fine and well within spec.



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #11 from Chris Collins  ---
Have now tested on an old laptop (slow hardware, so a long wait).

It has the exact same symptoms.

Stable when building 11.0 or 10.3 on older clang.

Once on 11-STABLE, random segfaults on clang 4.0.

Will test on the server-class hardware at the weekend, but given the results of
this search and my significant testing of replacement RAM etc., I think it's a
clang 4.0 issue.

Has FreeBSD ever changed compiler version on a STABLE branch before, like it
has from 11.0 to 11.1 now?

google search "clang 4.0 segfault bug site:lists.llvm.org"



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #10 from Conrad Meyer  ---
If overheating of the CPU is causing segfaults (non-overclocked), your CPU is
already damaged.  Some stress test like Prime95 or IntelBurnTest should also
reproduce the issue.



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

O. Hartmann  changed:

What      |Removed  |Added
----------+---------+-----------------------
CC        |         |ohartm...@walstatt.org

--- Comment #9 from O. Hartmann  ---
In the past I saw similar segfaults, and after all memory tests had passed
successfully, I realised that the CPU temperature had risen dramatically and the
dissipation capacity of the cooler was insufficient.

Since LLVM/CLANG 4.0.0 has been in the tree, I have noticed a dramatic
temperature increase on my Lenovo ThinkPad Edge E540, which is equipped with an
Intel i5-4200M. The temperature is something I observe very carefully. This
might be a coincidence, but I imagine that compiler developers try to use the
facilities a CPU provides to speed up compilation, so the performance is related
to power consumption and therefore heat dissipation.

On the other hand, I removed the CPU cooler and applied high-quality thermal
grease, and that dropped the CPU temperature from ~81 degrees Celsius down to
66-72 degrees Celsius at the same ambient temperature and roughly the
same OS revision (I did the grease application within one day and recompiled a
complete world from scratch, again).

So, to make it short: check the grease and thermal conductivity of your CPU
cooler. Thermal grease is not long-term stable; the same goes for thermal pads.
They get brittle and lose thermal conductivity over several years of use, and
faster when the CPU is stressed by overclocking.



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #8 from Dimitry Andric  ---
I cannot reproduce this crash with the sample you provided.  I tried:
* clang 4.0.0 (297347) on FreeBSD 11.1-BETA1 i386 and amd64
* clang 4.0.0 (297347) on FreeBSD 12.0-CURRENT i386 and amd64
* clang 5.0.0 (305575) on FreeBSD 12.0-CURRENT i386 and amd64.

It doesn't use a lot of memory either, roughly 250M max RSS:

        8.37 real         8.19 user         0.16 sys
    249616  maximum resident set size
     48201  average shared memory size
       268  average unshared data size
       249  average unshared stack size
     54447  page reclaims
      6410  page faults
         0  swaps
        32  block input operations
         5  block output operations
         0  messages sent
         0  messages received
         0  signals received
        20  voluntary context switches
       459  involuntary context switches

So memory starvation is pretty unlikely.  I would suspect hardware issues, in
this case.
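
The figures above have the layout printed by FreeBSD's `time -l`. As a small
sketch, and assuming the maximum resident set size is reported in kilobytes (as
FreeBSD's getrusage(2) does), the following hypothetical helper converts that
line to MiB:

```shell
# Hypothetical helper: read "time -l" style output on stdin and print the
# maximum resident set size in MiB, assuming the raw value is in kilobytes.
max_rss_mib() {
    awk '/maximum resident set size/ { printf "%.1f\n", $1 / 1024 }'
}
```

Fed the listing above, it prints 243.8, which matches the "roughly 250M"
estimate. Note that time(1) writes its report to stderr, so the output has to
be redirected or saved before it can be piped into the helper.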



[Bug 220184] clang 4.0.0 segfaults on buildworld

2017-06-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

Conrad Meyer  changed:

What      |Removed                   |Added
----------+--------------------------+------------------------------
CC        |                          |c...@freebsd.org,
          |                          |d...@freebsd.org
Assignee  |freebsd-b...@freebsd.org  |freebsd-toolchain@FreeBSD.org
