Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrti

2012-07-16 Thread John Stultz

On 07/16/2012 10:54 AM, John Stultz wrote:


Thanks for sending your config and test results.

Looking at the call trace you provided, I'm not seeing anything yet, 
but I'll be looking over the code while running my test boxes in a 
reboot loop w/ your config to see if I can't figure out the 
get_next_timer_interrupt issue you saw at bootup.


Left my two boxes (one bare-metal, the other a VM) in reboot loops all 
day with your config and haven't hit anything. :(


Do let me know if you're able to trigger this again. Given that the 
backtrace you listed doesn't seem to be directly linked to the 
leap-second-related hrtimer changes, I'm wondering if this might be something else.


But if you get any more details, do let me know and I'll try to sort it out.

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Regression][Revert request] Excessive delay or hang during resume from system suspend due to a hrti

2012-07-16 Thread John Stultz

On 07/16/2012 10:51 AM, Sedat Dilek wrote:

On Mon, Jul 16, 2012 at 7:16 AM, Sedat Dilek  wrote:

[ QUOTE ]

Hi Linus,

Please revert:

commit 5baefd6d84163443215f4a99f6a20f054ef11236
Author: John Stultz 
Date:   Tue Jul 10 18:43:25 2012 -0400

 hrtimer: Update hrtimer base offsets each hrtimer_interrupt

This breaks resume on the iBook G4 and Toshiba Portege R500 (at least), by
adding an excessive delay to it (the Toshiba box sometimes hangs hard during
resume from system suspend).  According to Andreas
(https://lkml.org/lkml/2012/7/15/66):

"Apparently during or before noirq resume the system is hanging by the same
amount of time as the system was sleeping."

which seems to agree with my observations.

Given that the two known-affected boxes are so different, it is quite probable
that the total number of affected systems is actually quite high.

Thanks!


To everyone involved: the fact that this change, which was likely to introduce
regressions from the look of it alone, has been pushed to Linus (and to -stable
at the same time!) so late in the cycle, is seriously disappointing.

Thanks,
Rafael

[ /QUOTE ]

Hi,

When I first booted into Linux-3.5-rc7 (a few hours after release), I got
a call trace in get_next_timer_interrupt() (NULL pointer dereference)
during early boot.
The machine froze.

I can't say whether this is related to the issue here, but I can
confirm that after suspend + resume the machine I am working on
(a Sandy Bridge ultrabook) is in an unusable state.
I had to cold reboot.

Regards,
- Sedat -

P.S.: Unfortunately, I could not reproduce the NULL-deref again.
Thomas gave me instructions for enabling some debugobjects
kernel options (see attached backlog from IRC).

Hi,

John asked me on IRC to send my kernel config and to clarify my
experience with the NULL deref I saw in get_next_timer_interrupt():
I only saw it once, even after several reboots. As I said at
the very beginning, the machine was unusable and needed a hard reset.
Unfortunately, I had no digicam around to take a picture of the screen.


Thanks for sending your config and test results.

Looking at the call trace you provided, I'm not seeing anything yet, but 
I'll be looking over the code while running my test boxes in a reboot 
loop w/ your config to see if I can't figure out the 
get_next_timer_interrupt issue you saw at bootup.


thanks
-john


