On Apr 30 Stefan Richter wrote:
> On Apr 29 Stefan Richter wrote:
> > On Apr 26 Stefan Richter wrote:
> > > v4.6-rc solidly hangs after a short while after boot, login to X11, and
> > > doing nothing much remarkable on the just brought up X desktop.
> > >
> > > Hardware: x86-64, E3-1245 v3 (Haswell),
> > > mainboard Supermicro X10SAE,
> > > using integrated Intel graphics (HD P4600, i915 driver),
> > > C226 PCH's AHCI and USB 2/3, ASMedia ASM1062 AHCI,
> > > Intel LAN (i217, igb driver),
> > > several IEEE 1394 controllers, some of them behind
> > > PCIe bridges (IDT, PLX) or PCIe-to-PCI bridges (TI, Tundra)
> > > and one PCI-to-CardBus bridge (Ricoh)
> > >
> > > kernel.org kernel, Gentoo Linux userland
> > >
> > > 1. known good: v4.5-rc5 (gcc 4.9.3)
> > >    known bad: v4.6-rc2 (gcc 4.9.3), only tried one time
> > >
> > > 2. known good: v4.5.2 (gcc 5.2.0)
> > >    known bad: v4.6-rc5 (gcc 5.2.0), only tried one time
> > >
> > > I will send my linux-4.6-rc5/.config in a follow-up message.
>
> .config: http://www.spinics.net/lists/kernel/msg2243444.html
> lspci: http://www.spinics.net/lists/kernel/msg2243447.html
>
> Some userland package versions, in case these have any bearing:
> x11-base/xorg-drivers-1.17
> x11-base/xorg-server-1.17.4
> x11-base/xorg-x11-7.4-r2

Furthermore, there is a single display hooked up via DisplayPort.

> > After it proved impossible to capture an oops through netconsole, I
> > started git bisect. This will apparently take almost a week, as git
> > estimated 13 bisection steps and I will be allowing about 12 hours of
> > uptime as a sign for a good kernel. (In my four or five tests of bad
> > kernels before I started bisection, they hung after 3 minutes...5.5 hours
> > uptime, with no discernible difference in workload. Maybe 12 h cutoff is
> > even too short...)

I took at least 18 hours uptime (usually 24 hours) as a sign for good
kernels. During the bisection, bad kernels hung after 3 h, 2 h, 9 min,
45 min, and 4 min uptime.

Thus I arrived at a98ee79317b4 "drm/i915/fbc: enable FBC by default on
HSW and BDW" as the point where the hangs are introduced.

Quoting the changelog of the commit:

    Oh, and in case you - the person reading this commit message - found
    this commit through git bisect, please do the following:
     - Check your dmesg and see if there are error messages mentioning
       underruns around the time your problem started happening.

Well, I always had the following lines in dmesg:
[drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
[drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun

I always get these when I switch on the DisplayPort attached monitor.
Recently I changed userland from kdm to sddm and noticed that I
apparently get these when sddm shuts down. I do not know whether this
already happened with kdm.

However, "around the time your problem started happening" there is
nothing in dmesg, because "your problem" is a complete hang without
possibility of disk IO and without netconsole output.

     - Download intel-gpu-tools, compile it, and run:
       $ sudo ./tests/kms_frontbuffer_tracking --run-subtest '*fbc-*' 2>&1 | tee fbc.txt
       Then send us the fbc.txt file, especially if you get a failure.
       This will really maximize your chances of getting the bug fixed
       quickly.

Do you need this while FBC is enabled, or can I run it while FBC is
disabled?

     - Try to find a reliable way to reproduce the problem, and tell us.

The reliable way is to just wait for the kernel to hang after about
3 minutes to 5.5 hours. I have not identified any special activity which
would trigger the hang.

     - Boot with drm.debug=0xe, reproduce the problem, then send us the
       dmesg file.

I can try this, but I am skeptical about getting any useful kernel
messages from before the hang.

PS:
I am mentioning the following just in case it has any relationship
with the FBC related kernel freezes. Maybe it doesn't...
There is another recent regression on this PC, but I have not yet figured
out whether it was introduced by any particular kernel version. The
regression is: When switching from X11 to text console by [Ctrl][Alt][Fx]
or by shutting down sddm, I often only get a blank screen. I suspect that
this regression was introduced when I replaced kdm by sddm, but I am not
sure about that.
-- 
Stefan Richter
-======----- -=-= --=-=
http://arcgraph.de/sr/
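
Addendum: the dmesg check requested in the changelog (look for messages
mentioning underruns and note when they occurred) could be sketched
roughly as below. This is only an illustration, not part of the i915
tooling or intel-gpu-tools. The function name find_underruns is
hypothetical, and the sketch assumes that each saved dmesg line
optionally starts with a "[ seconds.fraction]" timestamp followed by the
message text, as in the two underrun lines quoted above.

```python
import re

# Assumption: a dmesg line may start with a "[   12.345678]" timestamp.
# The message part is matched case-insensitively so that both
# "fifo underrun" and "FIFO underrun" are caught.
UNDERRUN_RE = re.compile(
    r"^(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?(?P<msg>.*fifo underrun.*)$",
    re.IGNORECASE,
)


def find_underruns(lines):
    """Return (timestamp_or_None, message) for each line mentioning a FIFO underrun."""
    hits = []
    for line in lines:
        match = UNDERRUN_RE.match(line.rstrip("\n"))
        if match:
            ts = match.group("ts")
            hits.append((float(ts) if ts else None, match.group("msg")))
    return hits
```

With a log saved beforehand (for example via "dmesg > dmesg.txt", a
hypothetical file name), calling find_underruns(open("dmesg.txt")) lists
the underrun messages and their uptime in seconds, which can then be
compared with the time of the hang. In the case described in this
message that comparison is of limited use, since nothing is logged
around the hang itself.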