While you are probably thinking about the iwlwifi issue causing RT throttling, I have one more interesting follow-up below.

Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 12:38:53PM +0200, Martin Mokrejs wrote:
>>> It means you have (a) real-time task(s) that consume significant amount
>>
>> How can I find them?
>
> ps -deo pid,cls,cmd | grep -e RR -e FF
>
> Should do I suppose
>
>> I don't think I need the RT, I have two CPU-bound
>> processes and want to run them at max speed. Rest of the system is
>> unimportant.
>>
>> I still don't understand what the $subj message actually says. Does it say
>> the RT-requiring task was slowed down? I am a bit lost here.
>
> Yeah, they were forcibly stopped from running for a little while.
>
>>> of time. At some point we throttle them in an attempt to keep the system
>>> from falling over.
>>
>> Will I get companion "[sched_delayed] sched: RT throttling deactivated"
>> at some point?
>
> Nope, you get that message once to tell you that we throttle RT tasks.
>
>> Are python-based apps requiring the realtime features?
>
> I'm fairly sure python could use the relevant scheduling classes, but I
> don't speak snake so I really wouldn't know.
>
>> I used to get the messages below which are now gone with my CPU cooler being
>> replaced yesterday:
>>
>> [ 4172.717272] CPU1: Core temperature above threshold, cpu clock throttled
>> (total events = 153727)
>
>> mcelog report in such cases:
>>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 1 THERMAL EVENT TSC 1bf82e2a146
>> TIME 1375536062 Sat Aug 3 15:21:02 2013
>> Processor 1 heated above trip temperature. Throttling enabled.
>> Please check your system cooling. Performance will be impacted
>> STATUS 880003c3 MCGSTATUS 0
>> MCGCAP c07 APICID 2 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 42
>
> Right, those are thermal events throttling the speed of your CPU to keep
> the thing from heat damaging itself.
>
>> While my CPU cooler got replaced even now I still get (hence this email
>> thread):
>>
>> [39564.452795] blah.py[14396]: segfault at 7ff67af34a58 ip 00007ff67badff00
>> sp 00007fff771ce798 error 4 in libpython2.7.so.1.0[7ff67b9cf000+173000]
>> [44520.259205] [sched_delayed] sched: RT throttling activated
>> [48956.057816] blah.py[16623]: segfault at 2f ip 00007fd462e5d046 sp
>> 00007fff638431e0 error 4 in libpython2.7.so.1.0[7fd462d7c000+173000]
>> [49288.388797] blah.py[28631]: segfault at 7fe254b6aa58 ip 00007fe255715f00
>> sp 00007fff6ddaaff8 error 4 in libpython2.7.so.1.0[7fe255605000+173000]
>> [49942.020084] blah.py[6950]: segfault at d0 ip 00007f3e8a9acf9c sp
>> 00007fffa72288a0 error 4 in libpython2.7.so.1.0[7f3e8a904000+173000]
>> [66696.443342] blah.py[8015]: segfault at cf ip 00007f798f708f9c sp
>> 00007fff420336e0 error 4 in libpython2.7.so.1.0[7f798f660000+173000]
>> [67561.587383] blah.py[7483]: segfault at 7f7b16e01540 ip 00007f7b17a85f00
>> sp 00007fffe663d9b8 error 4 in libpython2.7.so.1.0[7f7b17975000+173000]
>> [77262.490502] blah.py[29107]: segfault at 21e1458 ip 00007fc54cd17f00 sp
>> 00007fff283c5c38 error 4 in libpython2.7.so.1.0[7fc54cc07000+173000]
>>
>>
>> So, what does this "[sched_delayed] sched: RT throttling activated" tell me?
>
> That of the past 1s, 0.95s were spent running RR/FIFO tasks. It is a
> warning that comes only once per boot and should prompt you to
> investigate.
>
> You can turn the throttle off, but be advised that running a RR/FIFO
> task at 100% can (and generally does) negatively affect the running of
> your system (as in, these tasks can prevent system duties from taking
> place and eventually make the system come to a halt).
>
>
> As to those faults, investigate if your python prog does something
> particularly weird or your runtime is in order. Otherwise I would advise
> you to run memtest for a while to make sure your machine is in proper
> working order.
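
For reference, the "0.95s of the past 1s" budget described above corresponds, as far as I understand it, to the kernel.sched_rt_runtime_us and kernel.sched_rt_period_us sysctls, where a runtime of -1 turns the throttle off. Below is a minimal Python sketch for checking the current values. It assumes both files are present under /proc/sys/kernel/ on the running kernel; the helper names rt_budget_fraction and read_rt_budget are made up for illustration.

```python
from pathlib import Path

# Assumed sysctl location; check that these files exist on your kernel.
SYSCTL_DIR = Path("/proc/sys/kernel")


def rt_budget_fraction(runtime_us, period_us):
    """Fraction of each period that RR/FIFO tasks may run, or None if unthrottled."""
    if runtime_us < 0:
        # A runtime of -1 means RT throttling is switched off.
        return None
    return runtime_us / period_us


def read_rt_budget(sysctl_dir=SYSCTL_DIR):
    """Read both sysctls and return (runtime_us, period_us, fraction)."""
    runtime_us = int((sysctl_dir / "sched_rt_runtime_us").read_text())
    period_us = int((sysctl_dir / "sched_rt_period_us").read_text())
    return runtime_us, period_us, rt_budget_fraction(runtime_us, period_us)


if __name__ == "__main__":
    try:
        print(read_rt_budget())
    except OSError as exc:
        print("could not read RT throttle sysctls:", exc)
```

With a runtime of 950000 and a period of 1000000, the fraction comes out as 0.95, which matches the figure quoted above.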

Hmm, meanwhile the core dumps filled up the /var/dumps/ directory on my / filesystem. I do not have timing information on when that happened relative to bootup. I deleted some files on the disk and thought I was done. Now, a few hours later, I noticed:

[85451.247130] traps: blah.py[30787] general protection ip:7faf7b57a046 sp:7fffd9f7b1d0 error:0 in libpython2.7.so.1.0[7faf7b499000+173000]
[87125.493730] nr_pdflush_threads exported in /proc is scheduled for removal
[87125.494238] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case. If you have one, please send an email to linux...@kvack.org.
[97959.812943] blah.py[13069]: segfault at 7f1f2cfdca58 ip 00007f1f2db87f00 sp 00007fffade41768 error 4 in libpython2.7.so.1.0[7f1f2da77000+173000]

I bet the disk was full at about time 87125. The laptop has 16GB of RAM and the core dump files are really big, 300MB to 8GB.

However, the nr_pdflush_threads message sounds scary. Does linux 3.10.9 want to delete /proc on the fly? ;-)

Martin