https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399
--- Comment #89 from Don Lewis <truck...@freebsd.org> ---
Created attachment 184641
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=184641&action=edit
patch to move amd64 shared page to a lower address to avoid Ryzen problem with executing code near user address upper limit

I've been doing a number of experiments with openjdk7 builds to try to better characterize the Ryzen problem.

First I did a number of openjdk7 builds using cpuset to pin the build to individual cores. Using cpuset -l 0 to pin the build to the first thread on core 0 would consistently cause a silent reboot on the first or second try. Pinning the build to any of the other cores allowed me to successfully build openjdk7. I ran four builds on each of the other cores to make sure that I wasn't just getting a successful build by chance. Surprisingly, pinning the build to the second thread on core 0 was also successful. In any case, the results were consistent with my earlier tests where I disabled SMT and also all but two cores in the BIOS, since those tests always used the first thread on core 0.

I tried building openjdk7 on all cores except the first thread of core 0 by using cpuset -l 1-15 and was also successful. Based on that positive result, I tried building my default set of ~1600 ports with cpuset -l 1-15. A little over two hours into the build, the llvm40 build failed with the assertion:

    _arena.c:821: Failed assertion: "nstime_compare(&decay->epoch, &time) <= 0"

causing the ports that depend on it to be skipped, but everything else built successfully. When I restarted poudriere, the llvm40 build succeeded, but the system hung after about an hour while running java as part of the openjdk7 build.

Next I tried building with cpuset -l 2-15. The only problem that I ran into is that the gcc build failed with SIGBUS, causing its dependencies to be skipped. When I restarted poudriere, gcc5 and the remaining ports built successfully.
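The three cpuset lists used in these runs differ only in which logical CPUs they leave out. A minimal sketch of that, assuming 16 logical CPUs numbered 0-15 (implied by the 1-15 argument but not stated outright) and assuming CPUs 0 and 1 are the two SMT threads of core 0; parse_cpu_list and excluded are hypothetical helper names, not part of cpuset:

```python
# Assumption: 16 logical CPUs numbered 0-15, inferred from "cpuset -l 1-15".
NCPU = 16
ALL_CPUS = set(range(NCPU))


def parse_cpu_list(spec):
    """Expand a cpuset-style list such as "0", "1-15" or "0,2-4" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def excluded(spec):
    """Return the sorted CPU ids that a given list leaves out."""
    return sorted(ALL_CPUS - parse_cpu_list(spec))


if __name__ == "__main__":
    for spec in ("0", "1-15", "2-15"):
        allowed = parse_cpu_list(spec)
        print(f"cpuset -l {spec}: {len(allowed)} CPU(s) allowed, excluded: {excluded(spec)}")
```

Under those assumptions, -l 1-15 leaves out only CPU 0 (the first thread of core 0), while -l 2-15 leaves out both threads of core 0.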
I wanted to try to eliminate the possibility of a subtle defect in core 0 as a potential cause of the problem, so I tried adding

    hint.lapic.0.disabled=1
    hint.lapic.1.disabled=1

to /boot/loader.conf, but FreeBSD does not allow the BSP to be disabled B-(

The other thing that is unique about core 0 on my machine is that it looks like all of the external interrupts (but not interprocessor interrupts) go there. The biggest source of those seemed to be hpet, but I couldn't figure out how to disable that (other than maybe disabling ACPI totally). When I tried hint.hpet.0.clock=0, all of the CPUs got assigned interrupts from another timer.

The next thing I tried was inspired by the DragonFly patch. At least some thread implementations use signals to communicate between threads. I'm not familiar with OpenJDK, but it is possible that it is such an implementation, so it might be a heavy signal user and spend a lot of cycles in the signal trampoline code. Our signal trampoline code is in a different location than the one DragonFly uses, but it is still close to the top of user memory (in the top page). Even though I got the impression that the DragonFly patch addresses an issue with SMT, it does involve an interaction between interrupts and execution of code near the top of user memory.

As an experiment, I patched the kernel to move the location of the shared page lower by PAGE_SIZE. I'm not sure if it is necessary, but the page at the old location has the same rwx permissions and is zero filled. I don't know if the bug is triggered by executing code close to the upper address boundary or close to a permission boundary.

The preliminary results so far are very promising. With the patch applied, I am able to successfully build openjdk7 either unpinned or pinned to the first thread of core 0. I just kicked off an unpinned ~1600 port poudriere run. I should have results of that late today.

The patch is attached.

-- 
You are receiving this mail because:
You are the assignee for the bug.
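The effect of moving the shared page lower by PAGE_SIZE can be made concrete with a little address arithmetic. This is only a sketch and does not reproduce the attached patch: it assumes a 4 KB page and a user address limit of 0x0000800000000000 (the end of the lower canonical half with 48-bit virtual addresses), and the helper names are hypothetical.

```python
# Assumptions (not taken from the attached patch):
PAGE_SIZE = 4096                      # amd64 base page size
USER_LIMIT = 0x0000_8000_0000_0000    # assumed upper limit of user address space


def shared_page_address(pages_below_limit):
    """Start address of a page located N pages below the user address limit."""
    return USER_LIMIT - pages_below_limit * PAGE_SIZE


def gap_to_limit(page_start):
    """Bytes between the end of the page and the user address limit."""
    return USER_LIMIT - (page_start + PAGE_SIZE)


old_page = shared_page_address(1)  # shared page in the top page of user memory
new_page = shared_page_address(2)  # shared page moved lower by PAGE_SIZE

if __name__ == "__main__":
    print(f"old shared page: {old_page:#x}, gap to limit: {gap_to_limit(old_page)} bytes")
    print(f"new shared page: {new_page:#x}, gap to limit: {gap_to_limit(new_page)} bytes")
```

With these assumed values, the trampoline code no longer sits in the last page below the limit: one full page now separates it from the upper address boundary.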