On 1/20/16, John Stultz <john.stu...@linaro.org> wrote:
> On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux....@gmail.com> wrote:
>> On 1/20/16, Jeff Merkey <linux....@gmail.com> wrote:
>>> On 1/20/16, Thomas Gleixner <t...@linutronix.de> wrote:
>>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>>> Nasty bug but trivial fix for this. What happens here is RAX (nsecs)
>>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>>
>>>> And how exactly does that happen?
>>>>
>>>> 0x17AE7F57C671EA7D = 1.70644e+18 nsec
>>>>                    = 1.70644e+09 sec
>>>>                    = 2.84407e+07 min
>>>>                    = 474011 hrs
>>>>                    = 19750.5 days
>>>>                    = 54.1109 years
>>>>
>>>> That's the real issue, not what you are trying to 'fix' in
>>>> timespec_add_ns()
>>>>
>>
>> I guess I am going to have to become an expert on the timekeeper and
>> learn this subsystem backwards and forwards to code a touch function
>> to keep it from crashing the system.
>>
>> On the 2.6 series kernels (and 2.2) this problem did not exist. I
>> noticed a lot of these changes came in in the late 2.6 cycles. Before
>> that time, I could leave the debugger spinning for days and linux
>> worked fine.
>>
>> For people who have to pay developers to develop code on Linux a
>> debugger is almost an essential tool since it saves hundreds of
>> thousands of dollars in development costs. Not everyone wants to
>> spend money for their employees and engineers to sit around and code
>> review every problem - customers just want their problems fixed --
>> and fast. That being said, I am having no lack of people who download
>> and use this debugger and I'm certain kgdb is heavily used by folks
>> doing development. If kernel development is too hard, people move to
>> something else based on simple economics.
>>
>> That being said, I need to get this fixed. There is no good reason a
>> debugger shouldn't be able to stop the system and leave it suspended
>> for days if necessary to run down a bug. I wrote a debugger on SMP
>> Netware that worked that way. The earliest versions of MDB worked
>> that way.
>>
>> kgdb is broken right now because of this. I am not certain it affects
>> all systems out there, but it needs to be fixed.
>>
>> If you have any ideas on how to code a touch function please send me a
>> patch or suggest how it could be done non-obtrusively, otherwise I'll
>> have to dive into the timekeeper and fix it myself and learn yet
>> another subsystem of Linux and fix its bugs. A code subsystem that
>> crashes because the timer tick is skewed or returns garbage is poorly
>> designed IMHO.
>
> Ehrm. A more productive route in solving this might be to cap the
> cycle delta we return from timekeeping_get_delta().
>
> We already do this in the CONFIG_DEBUG_TIMEKEEPING case, but adding a
> simple check to the non-debug case should be doable w/o adding too
> much overhead to this very hot path.
>
> Something like:
>     if (delta > tkr->clock->max_cycles)
>         delta = tkr->clock->max_cycles;
>
>     return delta;
>
> thanks
> -john
>

Thank you, John. This is helpful. Can you send me a patch for this so I
can test it? That way I am not touching this code and you guys can put
it in.

Jeff
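
For anyone following the thread, the cap John describes can be sketched in user space as below. This is only an illustration under stated assumptions, not the kernel's code and not the patch being requested: struct sketch_clock, struct sketch_tkr, sketch_get_delta() and sketch_selfcheck() are hypothetical names, modeled loosely on the tkr->clock->max_cycles expression in the snippet quoted above. It shows a cycle delta being computed from a counter reading and then limited to max_cycles, so a very large gap (such as one left by a long debugger stop) is not passed on unbounded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * this sketch needs are modeled. */
struct sketch_clock {
    uint64_t mask;        /* valid bits of the hardware counter */
    uint64_t max_cycles;  /* largest delta considered safe to convert */
};

struct sketch_tkr {
    const struct sketch_clock *clock;
    uint64_t cycle_last;  /* counter value at the last update */
};

/* Return the cycles elapsed since cycle_last, capped at max_cycles. */
static uint64_t sketch_get_delta(const struct sketch_tkr *tkr, uint64_t now)
{
    uint64_t delta = (now - tkr->cycle_last) & tkr->clock->mask;

    /* The check suggested in the thread: never return more than the cap. */
    if (delta > tkr->clock->max_cycles)
        delta = tkr->clock->max_cycles;

    return delta;
}

/* Small self-check: a normal gap passes through, a huge one is capped. */
static void sketch_selfcheck(void)
{
    const struct sketch_clock clk = { .mask = UINT64_MAX, .max_cycles = 1000 };
    const struct sketch_tkr tkr = { .clock = &clk, .cycle_last = 500 };

    assert(sketch_get_delta(&tkr, 600) == 100);
    assert(sketch_get_delta(&tkr, 1500) == 1000);
    assert(sketch_get_delta(&tkr, 0x17AE7F57C671EA7DULL) == 1000);
}
```

Note that a cap like this bounds the value handed to later arithmetic; any time that elapsed beyond max_cycles is simply not accounted for by this function.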