Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-03-02 Thread David Weinehall
On Fri, Feb 27, 2015 at 08:23:41PM +0200, Imre Deak wrote:
 We've checked today with Ville a few machines we've found from GEN2 to
 GEN5+. There was one Thinkpad x61s (GEN4) where I could reproduce the
 exact same problem and get rid of it using the same workaround. All the
 others were non-Lenovo (including GEN4) mobile and desktop platforms and
 none of these had the problem. I also tried it on a Lenovo IVB
 ultrabook, which also works fine. So the current theory is that it's
 related to GEN4 Thinkpad BIOSes, and so we need to change the condition
 to match that at least.

If you need more ThinkPads to test with I have a few at home :)


Kind regards, David
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-27 Thread David Weinehall
On Thu, Feb 26, 2015 at 09:01:27PM +0100, Daniel Vetter wrote:
 On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:

[snip]
  
  The problem seems to be that after the kernel puts the device into D3
  the BIOS still tries to access it, or otherwise assumes that it's in D0.
  This is clearly bogus, since ACPI mandates that devices are put into D3
  by the OSPM if they are not wake-up sources. In the future we want to
  unify more of the driver's runtime and system suspend paths, for example
  by skipping all the system suspend/hibernation hooks if the device is
  runtime suspended already. Accordingly for all other platforms the goal
  is still to properly power down the device during hibernation.

[snip]
 
 If we see more of these troublesome machines we might need to apply an
 even bigger hammer like gen  5 or so. But as long as there's just 1
 report I think this is the right approach.
 
 Bjorn, as soon as we have your tested-by that this work we can apply this
 and shovel it through the backporting machinery.

Isn't there a risk that this will negatively impact machines using gen4
that *don't* have a buggy BIOS?  Wouldn't a quirk tied to the laptop
or BIOS version be better suited -- or possibly a parameter that can be
passed to the driver, which would make it easier to test if others
suffering from similar symptoms on other systems suffer from the same
issue or not?

Just my 2¢.


Kind regards, David Weinehall
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-27 Thread Imre Deak
On Fri, 2015-02-27 at 14:15 +0200, David Weinehall wrote:
 On Thu, Feb 26, 2015 at 09:01:27PM +0100, Daniel Vetter wrote:
  On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:
 
 [snip]
   
   The problem seems to be that after the kernel puts the device into D3
   the BIOS still tries to access it, or otherwise assumes that it's in D0.
   This is clearly bogus, since ACPI mandates that devices are put into D3
   by the OSPM if they are not wake-up sources. In the future we want to
   unify more of the driver's runtime and system suspend paths, for example
   by skipping all the system suspend/hibernation hooks if the device is
   runtime suspended already. Accordingly for all other platforms the goal
   is still to properly power down the device during hibernation.
 
 [snip]
  
  If we see more of these troublesome machines we might need to apply an
  even bigger hammer like gen  5 or so. But as long as there's just 1
  report I think this is the right approach.
  
  Bjorn, as soon as we have your tested-by that this work we can apply this
  and shovel it through the backporting machinery.
 
 Isn't there a risk that this will negatively impact machines using gen4
 that *don't* have a buggy BIOS?  Wouldn't a quirk tied to the laptop
 or BIOS version be better suited -- or possibly a parameter that can be
 passed to the driver, which would make it easier to test if others
 suffering from similar symptoms on other systems suffer from the same
 issue or not?
 
 Just my 2¢.

We've checked today with Ville a few machines we've found from GEN2 to
GEN5+. There was one Thinkpad x61s (GEN4) where I could reproduce the
exact same problem and get rid of it using the same workaround. All the
others were non-Lenovo (including GEN4) mobile and desktop platforms and
none of these had the problem. I also tried it on a Lenovo IVB
ultrabook, which also works fine. So the current theory is that it's
related to GEN4 Thinkpad BIOSes, and so we need to change the condition
to match that at least.

--Imre

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-26 Thread Imre Deak
On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
 Imre Deak imre.d...@intel.com writes:

  That patch fixes the problem, with only pci_set_power_state commented
  out.  Do you still want me to try with pci_disable_device() commented
  out as well?
 
  No, but it would help if you could still try the two attached patch
  separately, without any of the previous workarounds. Based on the
  result, we'll follow up with a fix that adds for your specific platform
  either of the attached workarounds or simply avoids putting the device
  into D3 (corresponding to the patch you already tried).

 None of those patches made any difference.  The laptop still hangs at
 power-off.

 Not really surprising, is it?  Previous testing shows that the hang
 occurs at the pci_set_power_state(drm_dev-pdev, PCI_D3hot) call in the
 poweroff_late hook.  It is hard to see how replacing it by an attempt to
 set D3cold, or adding any call after this point, could possibly change
 anything.  The system is stil hanging at the pci_set_power_state() call.

Judging from the blinking LED, I assume that it's not
pci_set_power_state() that hangs the machine, but the hang happens in
BIOS code.

 The generic pci-driver code will put the i915 device into PCI_D3hot for
 you, won't it? Why do you need to duplicate that in the driver,
 *knowing* that doing so breaks (at least some) systems?

Letting the pci core put the device into D3 wouldn't get rid of the problem.
It's putting the device into D3 in the first place what causes it.

 I honestly don't think this let's try some random code is the proper
 way to fix this bug (or any other bug for that matter).  You need to
 start understanding the code you write, and the first step is by
 actually explaining the changes you make.

We have a good understanding about the issue: the BIOS on your platform
does something unexpected behind the back of the driver/kernel. In that
sense the patches were not random, for instance the first one is based on
an existing workaround for an issue that resembles quite a lot yours, see
pci_pm_poweroff_noirq().

 I also believe that you completely miss the fact that this bug has
 survived a full release cycle before you became aware of it, and the
 implications this has wrt other affected systems/users.  I assume you
 understand that my system isn't one-of-a-kind, This means that there are
 other affected users with identical/similar systems.  Now, if none of
 those users reported the bug to you (we all know why: Linux kernel
 development is currently limited by the available testing resources, NOT
 by the available developer resources), then how do you know that there
 aren't a number of other systems affected as well?

 Let me answer that for you:  You don't.

 Which is why you must explain the mechanism triggering the bug, proving
 that it is a chipset/system specific issue.  Because that's the only way
 you will *know* that you have solved the problem not only for me, but for
 all affected users.

 IMHO, the only safe and sane solution at the moment is the revert patch
 I posted.  It's a simple fix, reverting back to the *known* working
 state before this regression was introduced.

 Then you can start over from there, trying to implement this properly.

The current way is the proper one that we want for the generic case. The issue
on your platform is the exception, so working around that is a sensible choice.

Attached is the proposed fix for this issue.

--Imre

From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
From: Imre Deak imre.d...@intel.com
Date: Thu, 26 Feb 2015 18:38:53 +0200
Subject: [PATCH] drm/i915: gm45: work around hang during hibernation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bjørn reported that his machine hang during hibernation and eventually
bisected the problem to the following commit:

commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
Author: Imre Deak imre.d...@intel.com
Date:   Thu Oct 23 19:23:26 2014 +0300

drm/i915: add poweroff_late handler

The problem seems to be that after the kernel puts the device into D3
the BIOS still tries to access it, or otherwise assumes that it's in D0.
This is clearly bogus, since ACPI mandates that devices are put into D3
by the OSPM if they are not wake-up sources. In the future we want to
unify more of the driver's runtime and system suspend paths, for example
by skipping all the system suspend/hibernation hooks if the device is
runtime suspended already. Accordingly for all other platforms the goal
is still to properly power down the device during hibernation.

Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
Reported-and-bisected-by: Bjørn Mork bj...@mork.no
Signed-off-by: Imre Deak imre.d...@intel.com
---
 drivers/gpu/drm/i915/i915_drv.c | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c 

Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-26 Thread Ville Syrjälä
On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:
 On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
  Imre Deak imre.d...@intel.com writes:
 
   That patch fixes the problem, with only pci_set_power_state commented
   out.  Do you still want me to try with pci_disable_device() commented
   out as well?
  
   No, but it would help if you could still try the two attached patch
   separately, without any of the previous workarounds. Based on the
   result, we'll follow up with a fix that adds for your specific platform
   either of the attached workarounds or simply avoids putting the device
   into D3 (corresponding to the patch you already tried).
 
  None of those patches made any difference.  The laptop still hangs at
  power-off.
 
  Not really surprising, is it?  Previous testing shows that the hang
  occurs at the pci_set_power_state(drm_dev-pdev, PCI_D3hot) call in the
  poweroff_late hook.  It is hard to see how replacing it by an attempt to
  set D3cold, or adding any call after this point, could possibly change
  anything.  The system is stil hanging at the pci_set_power_state() call.
 
 Judging from the blinking LED, I assume that it's not
 pci_set_power_state() that hangs the machine, but the hang happens in
 BIOS code.
 
  The generic pci-driver code will put the i915 device into PCI_D3hot for
  you, won't it? Why do you need to duplicate that in the driver,
  *knowing* that doing so breaks (at least some) systems?
 
 Letting the pci core put the device into D3 wouldn't get rid of the problem.
 It's putting the device into D3 in the first place what causes it.
 
  I honestly don't think this let's try some random code is the proper
  way to fix this bug (or any other bug for that matter).  You need to
  start understanding the code you write, and the first step is by
  actually explaining the changes you make.
 
 We have a good understanding about the issue: the BIOS on your platform
 does something unexpected behind the back of the driver/kernel. In that
 sense the patches were not random, for instance the first one is based on
 an existing workaround for an issue that resembles quite a lot yours, see
 pci_pm_poweroff_noirq().
 
  I also believe that you completely miss the fact that this bug has
  survived a full release cycle before you became aware of it, and the
  implications this has wrt other affected systems/users.  I assume you
  understand that my system isn't one-of-a-kind, This means that there are
  other affected users with identical/similar systems.  Now, if none of
  those users reported the bug to you (we all know why: Linux kernel
  development is currently limited by the available testing resources, NOT
  by the available developer resources), then how do you know that there
  aren't a number of other systems affected as well?
 
  Let me answer that for you:  You don't.
 
  Which is why you must explain the mechanism triggering the bug, proving
  that it is a chipset/system specific issue.  Because that's the only way
  you will *know* that you have solved the problem not only for me, but for
  all affected users.
 
  IMHO, the only safe and sane solution at the moment is the revert patch
  I posted.  It's a simple fix, reverting back to the *known* working
  state before this regression was introduced.
 
  Then you can start over from there, trying to implement this properly.
 
 The current way is the proper one that we want for the generic case. The issue
 on your platform is the exception, so working around that is a sensible 
 choice.
 
 Attached is the proposed fix for this issue.
 
 --Imre
 

 From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
 From: Imre Deak imre.d...@intel.com
 Date: Thu, 26 Feb 2015 18:38:53 +0200
 Subject: [PATCH] drm/i915: gm45: work around hang during hibernation
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 Bjørn reported that his machine hang during hibernation and eventually
 bisected the problem to the following commit:
 
 commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
 Author: Imre Deak imre.d...@intel.com
 Date:   Thu Oct 23 19:23:26 2014 +0300
 
 drm/i915: add poweroff_late handler
 
 The problem seems to be that after the kernel puts the device into D3
 the BIOS still tries to access it, or otherwise assumes that it's in D0.
 This is clearly bogus, since ACPI mandates that devices are put into D3
 by the OSPM if they are not wake-up sources. In the future we want to
 unify more of the driver's runtime and system suspend paths, for example
 by skipping all the system suspend/hibernation hooks if the device is
 runtime suspended already. Accordingly for all other platforms the goal
 is still to properly power down the device during hibernation.
 
 Reference: 
 http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
 Reported-and-bisected-by: Bjørn Mork bj...@mork.no
 Signed-off-by: Imre Deak imre.d...@intel.com
 ---
  

Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-26 Thread Daniel Vetter
On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:
 On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
  Imre Deak imre.d...@intel.com writes:
 
   That patch fixes the problem, with only pci_set_power_state commented
   out.  Do you still want me to try with pci_disable_device() commented
   out as well?
  
   No, but it would help if you could still try the two attached patch
   separately, without any of the previous workarounds. Based on the
   result, we'll follow up with a fix that adds for your specific platform
   either of the attached workarounds or simply avoids putting the device
   into D3 (corresponding to the patch you already tried).
 
  None of those patches made any difference.  The laptop still hangs at
  power-off.
 
  Not really surprising, is it?  Previous testing shows that the hang
  occurs at the pci_set_power_state(drm_dev-pdev, PCI_D3hot) call in the
  poweroff_late hook.  It is hard to see how replacing it by an attempt to
  set D3cold, or adding any call after this point, could possibly change
  anything.  The system is stil hanging at the pci_set_power_state() call.
 
 Judging from the blinking LED, I assume that it's not
 pci_set_power_state() that hangs the machine, but the hang happens in
 BIOS code.
 
  The generic pci-driver code will put the i915 device into PCI_D3hot for
  you, won't it? Why do you need to duplicate that in the driver,
  *knowing* that doing so breaks (at least some) systems?
 
 Letting the pci core put the device into D3 wouldn't get rid of the problem.
 It's putting the device into D3 in the first place what causes it.
 
  I honestly don't think this let's try some random code is the proper
  way to fix this bug (or any other bug for that matter).  You need to
  start understanding the code you write, and the first step is by
  actually explaining the changes you make.
 
 We have a good understanding about the issue: the BIOS on your platform
 does something unexpected behind the back of the driver/kernel. In that
 sense the patches were not random, for instance the first one is based on
 an existing workaround for an issue that resembles quite a lot yours, see
 pci_pm_poweroff_noirq().
 
  I also believe that you completely miss the fact that this bug has
  survived a full release cycle before you became aware of it, and the
  implications this has wrt other affected systems/users.  I assume you
  understand that my system isn't one-of-a-kind, This means that there are
  other affected users with identical/similar systems.  Now, if none of
  those users reported the bug to you (we all know why: Linux kernel
  development is currently limited by the available testing resources, NOT
  by the available developer resources), then how do you know that there
  aren't a number of other systems affected as well?
 
  Let me answer that for you:  You don't.
 
  Which is why you must explain the mechanism triggering the bug, proving
  that it is a chipset/system specific issue.  Because that's the only way
  you will *know* that you have solved the problem not only for me, but for
  all affected users.
 
  IMHO, the only safe and sane solution at the moment is the revert patch
  I posted.  It's a simple fix, reverting back to the *known* working
  state before this regression was introduced.
 
  Then you can start over from there, trying to implement this properly.
 
 The current way is the proper one that we want for the generic case. The issue
 on your platform is the exception, so working around that is a sensible 
 choice.
 
 Attached is the proposed fix for this issue.
 
 --Imre
 

 From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
 From: Imre Deak imre.d...@intel.com
 Date: Thu, 26 Feb 2015 18:38:53 +0200
 Subject: [PATCH] drm/i915: gm45: work around hang during hibernation
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 Bjørn reported that his machine hang during hibernation and eventually
 bisected the problem to the following commit:
 
 commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
 Author: Imre Deak imre.d...@intel.com
 Date:   Thu Oct 23 19:23:26 2014 +0300
 
 drm/i915: add poweroff_late handler
 
 The problem seems to be that after the kernel puts the device into D3
 the BIOS still tries to access it, or otherwise assumes that it's in D0.
 This is clearly bogus, since ACPI mandates that devices are put into D3
 by the OSPM if they are not wake-up sources. In the future we want to
 unify more of the driver's runtime and system suspend paths, for example
 by skipping all the system suspend/hibernation hooks if the device is
 runtime suspended already. Accordingly for all other platforms the goal
 is still to properly power down the device during hibernation.
 
 Reference: 
 http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
 Reported-and-bisected-by: Bjørn Mork bj...@mork.no
 Signed-off-by: Imre Deak imre.d...@intel.com
 ---
  

Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-26 Thread Bjørn Mork
Ville Syrjälä ville.syrj...@linux.intel.com writes:

 @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device 
 *drm_dev)
  }
  
  pci_disable_device(drm_dev-pdev);
 -pci_set_power_state(drm_dev-pdev, PCI_D3hot);
 +/*
 + * During hibernation on some GM45 platforms the BIOS may try to access
 + * the device even though it's already in D3 and hang the machine. So
 + * leave the device in D0 on those platforms and hope the BIOS will
 + * power down the device properly.

 Please include the model of the known bad machine in this comment, to
 help future archaeologists.

Here are some details:

bjorn@nemi:~$ grep . /sys/class/dmi/id/{bios,product}* 2/dev/null 
/sys/class/dmi/id/bios_date:12/19/2011
/sys/class/dmi/id/bios_vendor:LENOVO
/sys/class/dmi/id/bios_version:6EET55WW (3.15 )
/sys/class/dmi/id/product_name:2776LEG
/sys/class/dmi/id/product_version:ThinkPad X301

Please let me know if you need some other data.


Bjørn
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-26 Thread Bjørn Mork
Imre Deak imre.d...@intel.com writes:

 Attached is the proposed fix for this issue.

 --Imre

 From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
 From: Imre Deak imre.d...@intel.com
 Date: Thu, 26 Feb 2015 18:38:53 +0200
 Subject: [PATCH] drm/i915: gm45: work around hang during hibernation

I can confirm that this patch solves the problem for me.  Thanks.



Bjørn
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-26 Thread Bjørn Mork
Imre Deak imre.d...@intel.com writes:

 That patch fixes the problem, with only pci_set_power_state commented
 out.  Do you still want me to try with pci_disable_device() commented
 out as well?

 No, but it would help if you could still try the two attached patch
 separately, without any of the previous workarounds. Based on the
 result, we'll follow up with a fix that adds for your specific platform
 either of the attached workarounds or simply avoids putting the device
 into D3 (corresponding to the patch you already tried).

None of those patches made any difference.  The laptop still hangs at
power-off.

Not really surprising, is it?  Previous testing shows that the hang
occurs at the pci_set_power_state(drm_dev-pdev, PCI_D3hot) call in the
poweroff_late hook.  It is hard to see how replacing it by an attempt to
set D3cold, or adding any call after this point, could possibly change
anything.  The system is stil hanging at the pci_set_power_state() call.

The generic pci-driver code will put the i915 device into PCI_D3hot for
you, won't it?  Why do you need to duplicate that in the driver,
*knowing* that doing so breaks (at least some) systems?

I honestly don't think this let's try some random code is the proper
way to fix this bug (or any other bug for that matter).  You need to
start understanding the code you write, and the first step is by
actually explaining the changes you make.

I also believe that you completely miss the fact that this bug has
survived a full release cycle before you became aware of it, and the
implications this has wrt other affected systems/users.  I assume you
understand that my system isn't one-of-a-kind, This means that there are
other affected users with identical/similar systems.  Now, if none of
those users reported the bug to you (we all know why: Linux kernel
development is currently limited by the available testing resources, NOT
by the available developer resources), then how do you know that there
aren't a number of other systems affected as well?

Let me answer that for you:  You don't.

Which is why you must explain the mechanism triggering the bug, proving
that it is a chipset/system specific issue.  Because that's the only way
you will *know* that you have solved the problem not only for me, but for
all affected users.

IMHO, the only safe and sane solution at the moment is the revert patch
I posted.  It's a simple fix, reverting back to the *known* working
state before this regression was introduced.

Then you can start over from there, trying to implement this properly.


Thanks,
Bjørn
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-25 Thread Imre Deak
On ti, 2015-02-24 at 20:00 +0100, Bjørn Mork wrote:
 Imre Deak imre.d...@intel.com writes:
 
  The poweroff handlers undo the actions of the thaw handlers. As the
  original commit stated saving the registers is not needed there, but
  it's also not a big overhead and there should be no problem doing it. We
  are planning to optimize the hibernation sequence by for example not
  shutting down the display between freeze and thaw, and also getting rid
  of unnecessary steps at the power off phase. But before that we want to
  actually unify things rather than having special cases, as maintaining
  the special code paths caused already quite a lot of problems for us so
  far.
 
 That sounds like a worthy goal.  I don't understand what you hope to
 achieve by having a poweroff_late hook, since there aren't really
 anything useful left to do at the point it is called, but if you want a
 dummy callback there for code structure reasons then fine.

There are the following reasons for shutting down the device properly
during hibernation:
- ACPI mandates that the OSPM (the kernel in our case) puts all devices
into D3 that are not wake-up sources (i915 is not) (Kudos to Ville for
pointing this out)
- Embedded panels have a well defined shutdown sequence. We don't have
any good reason to not follow this, in fact for some panels the
subsequent reinitialization could be problematic in case of a hard
power-off. (Thanks to Jani for this info)

In fact the only reason why we didn't put the device into PCI D3 before
the patch you bisected, is kind of arbitrary. The PCI core puts every
device into D3 unless its driver saves the device's PCI state on its
own. Since before that patch we did save the state, but haven't put the
device into D3 it stayed in D0. That was definitely not intentional and
as such we depended on the BIOS to correct this for us afterwards.

 But you cannot just run around breaking stuff while slowly moving
 towards this goal over multiple releases... v3.19 is currently broken
 and that seems totally unnecessary.
 
 In any case: You should have noticed this problem while testing your
 patches.  The breakage is 100% reproducible. Unfortunately I had to do a
 bisect to realize what you had done to the i915 driver, something I
 unfortunately didn't find time to do before v3.19 was released.  But I
 do find it unnecessary to release with such bugs.  Any attempt to
 exercise the code path you modified would have revealed the bug.

We tested these patches on numerous platforms and haven't seen the issue
you reported. Based on your feedback the current assumption is that this
is a bug in your BIOS, which tries to access the device despite it being
powered down. We're trying our best to test each change we commit on a
big set of platforms, but - especially on older hardware with random
BIOS versions/configurations - full coverage is not possible. So we
depend on reports like yours a lot to fill in the gaps.

  Reverting the commit may hide some other issue, so before doing that
  could you try the following patch:
  http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html
 
 Makes no difference.  I assume that patch fixes an unrelated bug? The
 age and reported symptoms indicates so.  Note that I am reporting a
 regression introduced after v3.18, while that seems to fix a bug
 introduced in v3.17. Both v3.17 and v3.18 (including v3.18.6), as
 well as earlier releases, work fine for me.

Ok, thanks for trying.

  If with that you still get the hang could you try on top of that the
  patch below, first having only pci_set_power_state uncommented, then
  both pci_set_power_state and pci_disable_device uncommented?
 
 That patch fixes the problem, with only pci_set_power_state commented
 out.  Do you still want me to try with pci_disable_device() commented
 out as well?

No, but it would help if you could still try the two attached patch
separately, without any of the previous workarounds. Based on the
result, we'll follow up with a fix that adds for your specific platform
either of the attached workarounds or simply avoids putting the device
into D3 (corresponding to the patch you already tried).

Thanks,
Imre
From fe8b90c976f14eab80cb6715d0405157163d316e Mon Sep 17 00:00:00 2001
From: Imre Deak imre.d...@intel.com
Date: Wed, 25 Feb 2015 19:38:53 +0200
Subject: [PATCH] drm/i915: zero PCI_COMMAND at the end of hibernation

Signed-off-by: Imre Deak imre.d...@intel.com
---
 drivers/gpu/drm/i915/i915_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4badb23..b226cc6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -653,6 +653,8 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
 	pci_disable_device(drm_dev-pdev);
 	pci_set_power_state(drm_dev-pdev, PCI_D3hot);
 
+	pci_write_config_word(drm_dev-pdev, PCI_COMMAND, 0);
+
 	return 0;
 }
 
-- 
2.1.0

From 

Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-24 Thread Bjørn Mork
Imre Deak imre.d...@intel.com writes:

 The poweroff handlers undo the actions of the thaw handlers. As the
 original commit stated saving the registers is not needed there, but
 it's also not a big overhead and there should be no problem doing it. We
 are planning to optimize the hibernation sequence by for example not
 shutting down the display between freeze and thaw, and also getting rid
 of unnecessary steps at the power off phase. But before that we want to
 actually unify things rather than having special cases, as maintaining
 the special code paths caused already quite a lot of problems for us so
 far.

That sounds like a worthy goal.  I don't understand what you hope to
achieve by having a poweroff_late hook, since there aren't really
anything useful left to do at the point it is called, but if you want a
dummy callback there for code structure reasons then fine.

But you cannot just run around breaking stuff while slowly moving
towards this goal over multiple releases... v3.19 is currently broken
and that seems totally unnecessary.

In any case: You should have noticed this problem while testing your
patches.  The breakage is 100% reproducible. Unfortunately I had to do a
bisect to realize what you had done to the i915 driver, something I
unfortunately didn't find time to do before v3.19 was released.  But I
do find it unnecessary to release with such bugs.  Any attempt to
exercise the code path you modified would have revealed the bug.


 Reverting the commit may hide some other issue, so before doing that
 could you try the following patch:
 http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html

Makes no difference.  I assume that patch fixes an unrelated bug? The
age and reported symptoms indicates so.  Note that I am reporting a
regression introduced after v3.18, while that seems to fix a bug
introduced in v3.17. Both v3.17 and v3.18 (including v3.18.6), as
well as earlier releases, work fine for me.

 If with that you still get the hang could you try on top of that the
 patch below, first having only pci_set_power_state uncommented, then
 both pci_set_power_state and pci_disable_device uncommented?

That patch fixes the problem, with only pci_set_power_state commented
out.  Do you still want me to try with pci_disable_device() commented
out as well?



Bjørn
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

2015-02-24 Thread Imre Deak
On ti, 2015-02-24 at 15:58 +0100, Bjørn Mork wrote:
 This fixes a bug where hibernation completes, but the system
 fails to power off after the image has been saved.
 
 Bisection lead to commit da2bc1b9db33 (drm/i915: add poweroff_late
 handler) which added a .poweroff_late hook pointing to the same
 function as the .freeze_late hook, without any justification or
 explanation why this would be correct, except for consistency.
 
 The assumption that this makes the power off handling identical to
 the S3 suspend and S4 freeze handling is completely bogus. As clearly
 documented in Documentation/power/devices.txt, the poweroff* hooks
 are part of the hibernation API along with the freeze* hooks.  The
 phases involved in hibernation are:
 
prepare, freeze, freeze_late, freeze_noirq, thaw_noirq, thaw_early,
thaw, complete, prepare, poweroff, poweroff_late, poweroff_noirq
 
 With the above sequence in mind, it should be fairly obvious that you
 cannot save registers and disable the device in both the freeze_late
 and poweroff_late phases.  Or as Documentation/power/devices.txt
 explains it:
 
  The poweroff, poweroff_late and poweroff_noirq callbacks should do 
 essentially
  the same things as the suspend, suspend_late and suspend_noirq callbacks,
  respectively.  The only notable difference is that they need not store the
  device register values, because the registers should already have been stored
  during the freeze, freeze_late or freeze_noirq phases.
 
 The consistency between suspend and hibernate was already in
 place, with freeze_late pointing to the same function as suspend_late.
 There is no need for any poweroff_late hook in this driver.

The poweroff handlers undo the actions of the thaw handlers. As the
original commit stated saving the registers is not needed there, but
it's also not a big overhead and there should be no problem doing it. We
are planning to optimize the hibernation sequence by for example not
shutting down the display between freeze and thaw, and also getting rid
of unnecessary steps at the power off phase. But before that we want to
actually unify things rather than having special cases, as maintaining
the special code paths caused already quite a lot of problems for us so
far.

Reverting the commit may hide some other issue, so before doing that
could you try the following patch:
http://lists.freedesktop.org/archives/intel-gfx/2015-February/060529.html

If with that you still get the hang could you try on top of that the
patch below, first having only pci_set_power_state uncommented, then
both pci_set_power_state and pci_disable_device uncommented?

Thanks,
Imre

diff --git a/drivers/gpu/drm/i915/i915_drv.c
b/drivers/gpu/drm/i915/i915_drv.c
index 4badb23..dc91d4b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -968,6 +968,23 @@ static int i915_pm_suspend_late(struct device *dev)
return i915_drm_suspend_late(drm_dev);
 }
 
+static int i915_pm_poweroff_late(struct device *dev)
+{
+   struct drm_device *drm_dev = dev_to_i915(dev)-dev;
+   struct drm_i915_private *dev_priv = drm_dev-dev_private;
+   int ret;
+
+   ret = intel_suspend_complete(dev_priv);
+
+   if (ret)
+   return ret;
+
+   pci_disable_device(drm_dev-pdev);
+// pci_set_power_state(drm_dev-pdev, PCI_D3hot);
+
+   return 0;
+}
+
 static int i915_pm_resume_early(struct device *dev)
 {
struct drm_device *drm_dev = dev_to_i915(dev)-dev;
@@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = {
.thaw_early = i915_pm_resume_early,
.thaw = i915_pm_resume,
.poweroff = i915_pm_suspend,
-   .poweroff_late = i915_pm_suspend_late,
+   .poweroff_late = i915_pm_poweroff_late,
.restore_early = i915_pm_resume_early,
.restore = i915_pm_resume,
 


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx