[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #22 from Chris Collins ---
(In reply to Mark Millard from comment #20)

Thanks. The laptop isn't using MSI-X or MSI anyway, so I am OK on that. I will have a look at the i5 750 dmesg to see if MSI or MSI-X is used.

--
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #21 from Chris Collins ---
No issues on the i5 750 now either, across 4 reboots and 13 buildworlds.

I may raise a new bug regarding the timers, as I also had to adjust the timecounter on my laptop to get C-states working; its default kept it in C1 all the time. So there seem to be odd eventtimer and timecounter issues on older hardware.

The VMware machine, which has no issues, has a 2016 CPU.
The i5 750 CPU was released in 2009.
The laptop CPU is a Core 2 Duo T5750, released in 2008.

Thanks guys for your help.
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #20 from Mark Millard ---
(In reply to Chris Collins from comments #18 and #19)

Interesting, and non-obvious.

From what I've read, Message Signaled Interrupts (MSI) from PCI 2.2+ depend on the LAPIC and require it to be enabled. If the LAPIC is not working correctly, then MSI might not work fully correctly either and so should be avoided in such a context?

(I'm not familiar with the details in this area. Take the above as hearsay.)
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #19 from Chris Collins ---
So to confirm, as I don't think I wrote it well: using i8254 on my laptop I don't get segfaults. The default timer changed between 11.0 and 11-STABLE.

I also meant "roll of the dice" but made a typo.
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #18 from Chris Collins ---
OK, a further update.

After a reboot, the i5 750 machine started getting segfaults again. A few reboots later I have discovered the behaviour is fairly consistent: a rolld o the dice [sic] occurs on a reboot. Usually, if the first buildworld has no problem, I can probably do 3+ in a row with no segfault, but if the first has a segfault then I will struggle to get just one successful buildworld.

I discovered the LAPIC timer on my laptop is broken, aided by a warning on the console; when I switched it to i8254 the problem stopped. I then fresh-installed 11.0 again and discovered that 11.0 uses i8254 by default but 11-STABLE uses LAPIC. When LAPIC is used I see some other odd behaviours, e.g. systat -v 1 will update really slowly.

I then checked my i5 750: on 11.0 it uses LAPIC by default and seems to work OK; on 11-STABLE, LAPIC has the same issues as on the laptop, and it defaults to HPET.

At the time of this post I haven't tried a buildworld using a non-default timer, but I am running buildworld now using i8254 on the i5 750 to see what results I get. I will run it many times over multiple reboots.

The VMware hypervisor has no segfault problems and uses LAPIC by default, working fine on 11.0 and 11-STABLE.

All the current tests are with an empty src.conf aside from 'LOADER_ZFS_SUPPORT=YES', and with no CPUTYPE defined, to try to simplify the diagnosis.
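The trial pattern described in comment #18 (several buildworlds per boot, repeated across reboots) can be scripted. What follows is a minimal sketch, not something taken from the bug report: the function name `count_failures` and the `LOGDIR` variable are hypothetical, and the function only counts non-zero exit statuses rather than detecting segfaults specifically.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Run CMD N times in a row and print how many runs exited non-zero.
# Each run's output is kept in $LOGDIR/trial-<i>.log (LOGDIR is a
# hypothetical variable for this sketch; it defaults to /tmp).
count_failures() {
    n=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Any non-zero exit status counts as a failed trial.
        if ! "$@" >"${LOGDIR:-/tmp}/trial-$i.log" 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `cd /usr/src && count_failures 3 make buildworld` would run three builds back to back and print the number that failed; the per-run logs can then be searched for the crashing compiler invocation.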
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #17 from Chris Collins ---
Perhaps buildworld with clang 4.0 is now the ultimate hardware stability test :)

The 3rd compile was fine; now running the 4th.

I will still test on the server-class hardware this weekend.

So it seems the diagnosis here is that clang 4.0 works the CPU harder, so it is more likely to show up stability problems than clang 3.x?
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #16 from Chris Collins ---
I am not insisting it's not hardware, and I continue to pursue the hardware route.

I am about to go to bed as it is 4am here, but I upped the vcore on my CPU and the DRAM voltage on the system and have done 2 buildworlds since with no segfaults. It is an old CPU, so it is possible that voltage degradation has occurred to the point that stock voltage is not enough to be stable, which is why I have raised the voltage. I will start another buildworld now, which will be a third; if it succeeds it will be the first time 3 have worked in a row. It is still on the GENERIC kernel as well. I will also do more runs tomorrow with an empty src.conf.

If these new runs all work (with increased voltage, and of course if it is also good on my Xeon), then yes, I accept that as a hardware issue, and it is possible my old laptop may have similar issues, as that is old as well. :)
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #15 from Mark Millard ---
(In reply to Mark Millard from comment #14)

My paragraph:

"If this were a general problem the build servers would not be able to build the releases, ports, and such."

was poorly chosen. I should have referred to just test builds that are based on head, stable/11, or the drafts of 11.1. (I expect that there have been many.) These likely start with projects/clang*-import/ testing and continue with head, stable/11, and the 11.1 drafts.

The official builds of releases and such are likely still based on an older context building the newer context. I do not know whether they build a bootstrap clang 4 and then use it when the target is head, stable/11, or an 11.1 draft version of some kind. It could be that only the system compiler is built and installed but not used for anything relative to buildworld buildkernel activity.

As I understand it, exp-runs were made for building ports that were based on clang 4. This might still be ongoing.

My own activity is incremental updates of head, so using clang 4 to build a bootstrap compiler that is clang 4 when needed, then using the resultant clang 4 either way. (I ignore here experimenting with devel/*xtoolchain* or using gcc 4.2.1 where I have to [32-bit powerpc kernel that finishes booting correctly].)

There is also likely activity of other people working based on clang 4, including buildworld, buildkernel, and building ports (ports that do not force some gcc or some other toolchain).

I expect there is still enough activity based on clang 4 that my overall argument structure still holds: it would be good to try something that matches a well-used, well-established build configuration overall and see what the status is for that build configuration.

I'll note that my activity is mostly based on system-clang, not devel/llvm40 clang, although I have attempted devel/xtoolchain-llvm40 for buildworld and buildkernel when there were unusual failures like missing routines in linking. (So far system-clang and devel/xtoolchain-llvm40 have matched for such build issues. But I've rarely tried this.)
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #14 from Mark Millard ---
(In reply to Chris Collins from comment #11)

If this were a general problem the build servers would not be able to build the releases, ports, and such.

I do buildworld buildkernel for head on amd64, powerpc64, aarch64, armv7, and powerpc. I've not been having such problems. (I do cross builds from amd64 more often than native builds, but on occasion I do build natively for the others. My amd64 activity is under VirtualBox on either Windows 10 or macOS 10.12.5 at this point. The others are directly on the hardware that I have access to.) I build and run non-debug kernels normally, despite running versions of head.

If what you report was generally happening to others, most FreeBSD activity that is clang 4 based would be largely "dead in the water", but it is not.

Almost certainly some property that is uncommon in other environments is present in your environment and is involved. The problem is isolating what is involved.

It may be time for detailed kernel config specifications. As I remember, you already listed the src.conf that you use (comment 6). None of my src.conf content matches any of yours.

I do not have any 11.x environments at this point, just head based, currently -r320192.

If you have a failing environment that can use a pure GENERIC kernel config and an empty src.conf (or some match to a well-established set of such files), you might want to try such. If it happens to work OK, then it would form the starting point of a search for what makes the difference. By contrast, if things still fail, this gets much harder to track down.

I can supply examples of my config files if needed, but I do not have defaults. (Just using clang 4 for targeting powerpc64 or powerpc is odd in the first place: I gather evidence of issues that I discover and report them, generally to llvm.) I do have a few source file differences associated with the experiments on non-amd64, historically mostly tied to powerpc64 and powerpc.

(Note: Actually powerpc (32-bit) has problems with crashing even when sitting idle in my context, even if built with gcc 4.2.1. I've had crashes in minutes, or up to somewhat over 10 days 8 hours later. Usually it has been hours but less than 9 hours. But use of clang need not be involved at all for this, so it is not a fit to your context. And none of my other environments has shown such behavior so far.)
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #13 from Chris Collins ---
This is with buildworld running:

root@test 11s # sysctl dev.cpu |grep temper
dev.cpu.3.temperature: 39.0C
dev.cpu.2.temperature: 40.0C
dev.cpu.1.temperature: 39.0C
dev.cpu.0.temperature: 40.0C

I will provide feedback Saturday or Sunday when I test on an ESXi instance; the host machine has ECC RAM and a new Xeon chip powering it, and also server-class storage.
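When sampling temperatures repeatedly during a long build, it can help to reduce the per-core readings shown in comment #13 to a single number. The following is a rough sketch under that assumption; the function name `max_temp` is hypothetical, and it expects lines in exactly the `dev.cpu.N.temperature: 39.0C` form quoted above.

```shell
#!/bin/sh
# max_temp: read `sysctl dev.cpu` output on stdin and print the highest
# dev.cpu.N.temperature value (degrees Celsius, unit suffix removed).
# Prints nothing if no temperature lines are present.
max_temp() {
    awk -F': ' '/\.temperature: / {
        t = $2
        sub(/C$/, "", t)                    # drop the trailing unit
        if (max == "" || t + 0 > max + 0) max = t
    } END { if (max != "") print max }'
}
```

On a FreeBSD system with the temperature sysctls available this would be used as `sysctl dev.cpu | max_temp`, for instance from a loop that logs one reading per minute while buildworld runs.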
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #12 from Chris Collins ---
(In reply to Conrad Meyer from comment #10)

It has no issue with Prime95 stress tests and other stress tests.

So to confirm: it is absolutely 100% stable with every piece of software on the system except clang 4.0 buildworld.

The CPU temperature is fine and well within spec.
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #11 from Chris Collins ---
I have now tested on an old laptop (slow hardware, so a long waiting time).

It has the exact same symptoms: stable when building 11.0 or 10.3 on older clang, but once on 11-STABLE, random segfaults on clang 4.0.

I will test on the server-class hardware at the weekend, but given the results of the search below and my significant testing of replacement RAM etc., I think it's a clang 4.0 issue. Has FreeBSD historically changed compiler version on a STABLE branch before, like it has from 11.0 to 11.1 now?

Google search: "clang 4.0 segfault bug site:lists.llvm.org"
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #10 from Conrad Meyer ---
If overheating of the CPU is causing segfaults (non-overclocked), your CPU is already damaged. Some stress test like Prime95 or IntelBurnTest should also reproduce the issue.
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

O. Hartmann changed:

           What    |Removed                     |Added
                 CC|                            |ohartm...@walstatt.org

--- Comment #9 from O. Hartmann ---
In the past I saw similar segfaults and, after all memory tests had passed successfully, I realised that the CPU temperature had risen dramatically and the dissipation capacity of the cooler was insufficient.

Since LLVM/CLANG 4.0.0 has been in the tree, I have noticed a dramatic temperature increase on my Lenovo ThinkPad Edge E540, which is equipped with an Intel i5-4200M. The temperature is something I observe very carefully. This might be a coincidence, but I imagine that compiler developers try to use the facilities a CPU provides to speed up compilation, so performance is related to power consumption and therefore heat dissipation.

On the other hand, I took off the CPU cooler and applied high-quality thermal grease, and that dropped the CPU temperature from ~81 degrees Celsius down to 66-72 degrees Celsius at the same ambient temperature and roughly the same OS revision (I did the grease application within one day and recompiled a complete world from scratch again).

So, to make it short: check the grease and thermal conductivity of your CPU cooler. Thermal grease is not long-term stable, and the same goes for thermal pads. They get brittle and lose thermal conductivity over several years of use, and faster when the CPU is stressed by overclocking.
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

--- Comment #8 from Dimitry Andric ---
I cannot reproduce this crash with the sample you provided. I tried:

* clang 4.0.0 (297347) on FreeBSD 11.1-BETA1 i386 and amd64
* clang 4.0.0 (297347) on FreeBSD 12.0-CURRENT i386 and amd64
* clang 5.0.0 (305575) on FreeBSD 12.0-CURRENT i386 and amd64.

It doesn't use a lot of memory either, roughly 250M max RSS:

        8.37 real         8.19 user         0.16 sys
    249616  maximum resident set size
     48201  average shared memory size
       268  average unshared data size
       249  average unshared stack size
     54447  page reclaims
      6410  page faults
         0  swaps
        32  block input operations
         5  block output operations
         0  messages sent
         0  messages received
         0  signals received
        20  voluntary context switches
       459  involuntary context switches

So memory starvation is pretty unlikely. I would suspect hardware issues, in this case.
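The "roughly 250M" figure in comment #8 follows from the maximum resident set size line of the resource report. As a small sketch of that arithmetic, assuming the report has one counter per line (as `/usr/bin/time -l` prints it on FreeBSD) and that the value is in kilobytes (the unit getrusage(2) documents for ru_maxrss on FreeBSD); the function name `max_rss_mib` is hypothetical:

```shell
#!/bin/sh
# max_rss_mib: read a `/usr/bin/time -l` style report on stdin and print
# the "maximum resident set size" value converted from KiB to MiB.
# Assumes the value is the first field of the line and is in kilobytes.
max_rss_mib() {
    awk '/maximum resident set size/ { printf "%.1f\n", $1 / 1024 }'
}
```

Because time(1) writes its report to standard error, a typical use would be `/usr/bin/time -l cc -c sample.c 2>&1 | max_rss_mib`, where `sample.c` stands in for the failing source file. For the 249616 reported above this gives about 243.8 MiB.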
[Bug 220184] clang 4.0.0 segfaults on buildworld
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220184

Conrad Meyer changed:

           What    |Removed                     |Added
                 CC|                            |c...@freebsd.org,
                   |                            |d...@freebsd.org
           Assignee|freebsd-b...@freebsd.org    |freebsd-toolchain@FreeBSD.org