Re: [PATCH v2 8/9] PCI: Exclude PCIe ports used for tunneling in pcie_bandwidth_available()

2023-11-06 Thread Lukas Wunner
On Mon, Nov 06, 2023 at 12:44:25PM -0600, Mario Limonciello wrote:
> Tangentially related; the link speed is currently symmetric but there are
> two sysfs files.  Mika left a comment in drivers/thunderbolt/switch.c it may
> be asymmetric in the future. So we may need to keep that in mind on any
> design that builds on top of them.

Aren't asymmetric Thunderbolt speeds just a DisplayPort thing?


> As 'thunderbolt' can be a module or built in, we need to bring code into PCI
> core so that it works in early boot before it loads.

tb_switch_get_generation() is small enough that it could be moved to the
PCI core.  I doubt that we need to make thunderbolt built-in only
or move a large amount of code to the PCI core.

Thanks,

Lukas


Re: [PATCH v2 6/9] PCI: Rename is_thunderbolt to is_tunneled

2023-11-06 Thread Lukas Wunner
On Mon, Nov 06, 2023 at 10:59:13AM -0600, Mario Limonciello wrote:
> On 11/5/2023 11:39, Lukas Wunner wrote:
> > On Fri, Nov 03, 2023 at 02:07:55PM -0500, Mario Limonciello wrote:
> > > The `is_thunderbolt` bit has been used to indicate that a PCIe device
> > > contained the Intel VSEC which is used by various parts of the kernel
> > > to change behavior. To later allow usage with USB4 controllers as well,
> > > rename this to `is_tunneled`.
> > 
> > This doesn't seem to make sense.  is_thunderbolt indicates that a device
> > is part of a Thunderbolt controller.  See the code comment:
> > 
> > > - unsigned intis_thunderbolt:1;   /* Thunderbolt controller */
> > 
> > A Thunderbolt controller is not necessarily tunneled.  The PCIe switch,
> > NHI and XHCI of the Thunderbolt host controller are not tunneled at all.
> 
> I could really use some clarification which PCIe devices actually contain
> the Intel VSEC.
> 
> Is it in all 3 of those PCIe devices and not just the switch?

Yes, I've just double-checked Light Ridge, Cactus Ridge, Alpine Ridge.

Thanks,

Lukas


Re: [PATCH v2 8/9] PCI: Exclude PCIe ports used for tunneling in pcie_bandwidth_available()

2023-11-06 Thread Lukas Wunner
On Fri, Nov 03, 2023 at 02:07:57PM -0500, Mario Limonciello wrote:
> The USB4 spec specifies that PCIe ports that are used for tunneling
> PCIe traffic over USB4 fabric will be hardcoded to advertise 2.5GT/s and
> behave as a PCIe Gen1 device. The actual performance of these ports is
> controlled by the fabric implementation.
> 
> Downstream drivers such as amdgpu which utilize pcie_bandwidth_available()
> to program the device will always find the PCIe ports used for
> tunneling as a limiting factor potentially leading to incorrect
> performance decisions.
> 
> To prevent problems in downstream drivers check explicitly for ports
> being used for PCIe tunneling and skip them when looking for bandwidth
> limitations of the hierarchy. If the only device connected is a root port
> used for tunneling then report that device.

I think a better approach would be to define three new bandwidths for
Thunderbolt in enum pci_bus_speed and add appropriate descriptions in
pci_speed_string().  Those three bandwidths would be 10 GBit/s for
Thunderbolt 1, 20 GBit/s for Thunderbolt 2, 40 GBit/s for Thunderbolt 3
and 4.

Code to determine the Thunderbolt generation from the PCI ID already exists
in tb_switch_get_generation().

This will not only address the amdgpu issue you're trying to solve,
but also emit an accurate speed from __pcie_print_link_status().

The speed you're reporting with your approach is not necessarily
accurate because the next non-tunneled device in the hierarchy might
be connected with a far higher PCIe speed than what the Thunderbolt
fabric allows.

Thanks,

Lukas


Re: [PATCH v2 6/9] PCI: Rename is_thunderbolt to is_tunneled

2023-11-05 Thread Lukas Wunner
On Fri, Nov 03, 2023 at 02:07:55PM -0500, Mario Limonciello wrote:
> The `is_thunderbolt` bit has been used to indicate that a PCIe device
> contained the Intel VSEC which is used by various parts of the kernel
> to change behavior. To later allow usage with USB4 controllers as well,
> rename this to `is_tunneled`.

This doesn't seem to make sense.  is_thunderbolt indicates that a device
is part of a Thunderbolt controller.  See the code comment:

> - unsigned intis_thunderbolt:1;   /* Thunderbolt controller */

A Thunderbolt controller is not necessarily tunneled.  The PCIe switch,
NHI and XHCI of the Thunderbolt host controller are not tunneled at all.

Thanks,

Lukas


Re: [PATCH v2 3/3] drm: Call vga_switcheroo_process_delayed_switch() in drm_lastclose

2023-01-17 Thread Lukas Wunner
On Thu, Jan 12, 2023 at 09:11:56PM +0100, Thomas Zimmermann wrote:
> Several lastclose helpers call vga_switcheroo_process_delayed_switch().
> It's better to call the helper from drm_lastclose() after the kernel
> client's screen has been restored. This way, all drivers can benefit
> without having to implement their own lastclose helper. For drivers
> without vga-switcheroo, vga_switcheroo_process_delayed_switch() does
> nothing.
[...]
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -460,6 +461,8 @@ void drm_lastclose(struct drm_device * dev)
>   drm_legacy_dev_reinit(dev);
>  
>   drm_client_dev_restore(dev);
> +
> + vga_switcheroo_process_delayed_switch();
>  }

Hm, this looks like a case of midlayer fallacy:

https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html

It is a departure from the opt-in library approach we've had so far.

For switcheroo-aware EDID retrieval, there's a drm_get_edid_switcheroo()
helper.  How about introducing a switcheroo-aware lastclose helper which
drivers can reference?

Thanks,

Lukas


Re: [PATCH v5 3/7] PCI: Drop the `is_thunderbolt` attribute from PCI core

2022-02-28 Thread Lukas Wunner
On Mon, Feb 28, 2022 at 04:13:44PM -0600, Bjorn Helgaas wrote:
> On Mon, Feb 28, 2022 at 03:33:13PM +, Limonciello, Mario wrote:
> > > On Fri, Feb 25, 2022 at 11:42:24AM -0600, Bjorn Helgaas wrote:
> > > > That would just leave the "PCI_VSEC_ID_INTEL_TBT implies external-
> > > facing"
> > > > assumption above.  Not having a Thunderbolt spec, I have no idea how
> > > > you deal with that.
> > > 
> > > You can download the spec here:
[...]
> > > Inside the archive there is also the DVSEC spec with name "USB4 DVSEC
> > > Version 1.0.pdf".
> > 
> > The spec has Host_Router_indication (bits 18-19) as meaning external facing.
> > I'll respin the patch 3 for using that.
> 
> Thanks, please include the spec citation when you do.  And probably
> the URL, because it's not at all obvious how the casual reader would
> get from "is_thunderbolt" to a recent add-on to the USB4 spec.

PCI_VSEC_ID_INTEL_TBT is not mentioned at all in the USB4 spec,
hence there's no connection between "is_thunderbolt" and the USB4 spec.

It's a proprietary VSEC used by Intel and the only way to recognize
pre-USB4 Thunderbolt devices that I know of.  Its ID is also
different from the DVSEC IDs given in the above-mentioned spec.

Thanks,

Lukas


Re: [PATCH v4 00/10] Overhaul `is_thunderbolt`

2022-02-14 Thread Lukas Wunner
On Mon, Feb 14, 2022 at 06:01:50PM -0600, Mario Limonciello wrote:
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  2 +-
>  drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c  |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_vga.c   |  4 +-
>  drivers/gpu/drm/radeon/radeon_device.c  |  4 +-
>  drivers/gpu/drm/radeon/radeon_kms.c |  2 +-
>  drivers/pci/hotplug/pciehp_hpc.c|  6 +-
>  drivers/pci/pci-acpi.c  | 15 -
>  drivers/pci/pci.c   | 17 +++--
>  drivers/pci/probe.c | 52 ++-
>  drivers/pci/quirks.c| 84 +
>  drivers/platform/x86/apple-gmux.c   |  2 +-
>  drivers/thunderbolt/nhi.h   |  2 -
>  include/linux/pci.h | 25 +---
>  include/linux/pci_ids.h |  3 +
>  14 files changed, 173 insertions(+), 47 deletions(-)

That's an awful lot of additional LoC for what is primarily
a refactoring job with the intent to simplify things.

Honestly this looks like an attempt to fix something that
isn't broken.  Specifically, the is_thunderbolt bit apparently
can't be removed without adding new bits to struct pci_dev.
Not sure if that can be called progress.

Thanks,

Lukas


Re: [PATCH v3 05/12] PCI: Detect root port of internal USB4 devices by `usb4-host-interface`

2022-02-14 Thread Lukas Wunner
On Mon, Feb 14, 2022 at 09:34:26AM +0200, Mika Westerberg wrote:
> On Fri, Feb 11, 2022 at 03:45:46PM -0600, Bjorn Helgaas wrote:
> > My expectation is that "USB" (like "PCI" and "PCIe") tells me
> > something about how a device is electrically connected and how
> > software can operate it.  It doesn't really tell me anything about
> > whether those electrical connections are permanent, made through an
> > internal slot, or made through an external connector and cable.
> 
> It is used to identify "tunneled" ports (whether PCIe, USB 3.x or
> DisplayPort). Tunnels are created by software (in Linux it is the
> Thunderbolt driver) and are dynamic in nature. The USB4 links go over
> USB Type-C cable which also is something user can plug/unplug freely.
> 
> I would say it is reasonable expectation that anything behind these
> ports can be assumed as "removable".

USB gadgets may be soldered to the mainboard.  Those cannot be
unplugged freely.  It is common practice to solder USB Ethernet
or USB FTDI serial ports and nothing's preventing a vendor to solder
USB4/Thunderbolt gadgets.


Re: [PATCH v3 03/12] PCI: Move check for old Apple Thunderbolt controllers into a quirk

2022-02-13 Thread Lukas Wunner
On Sun, Feb 13, 2022 at 10:19:20AM +0100, Lukas Wunner wrote:
> Apple had been using its own scheme to put Thunderbolt controllers
> into D3cold when nothing is plugged in, about a decade before Microsoft
> defined the ACPI property.

I meant to say "half a decade", sorry.


Re: [PATCH v3 03/12] PCI: Move check for old Apple Thunderbolt controllers into a quirk

2022-02-13 Thread Lukas Wunner
On Fri, Feb 11, 2022 at 01:32:41PM -0600, Mario Limonciello wrote:
> `pci_bridge_d3_possible` currently checks explicitly for a Thunderbolt
> controller to indicate that D3 is possible.  As this is used solely
> for older Apple systems, move it into a quirk that enumerates across
> all Intel TBT controllers.

I'm not so sure if it is only needed on Apple systems.


> @@ -2954,10 +2960,6 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>   if (pci_bridge_d3_force)
>   return true;
>  
> - /* Even the oldest 2010 Thunderbolt controller supports D3. */
> - if (bridge->is_thunderbolt)
> - return true;
> -
>   /* Platform might know better if the bridge supports D3 */
>   if (platform_pci_bridge_d3(bridge))
>   return true;

The fact that Thunderbolt PCIe ports support D3 is a property of those
devices.  It's not a property of the platform or a quirk of a particular
vendor.

Hence in my view the current location of the check (pci_bridge_d3_possible())
makes sense wheras the location you're moving it to does not.


> +/* Apple machines as old as 2010 can do D3 with Thunderbolt controllers, but 
> don't specify
> + * it in the ACPI tables
> + */

Apple started shipping Thunderbolt in 2011.
Intel brought the first chips to market in 2010.

The date is meaningful at the code's current location in
pci_bridge_d3_possible() because a few lines further down
there's a 2015 BIOS cut-off date.

Microsoft came up with an ACPI property that BIOS vendors may set
so that Windows knows it may put a Thunderbolt controller into D3cold.
I'm not even sure if that property was ever officially adopted by the
ACPI spec or if it's just a Microsoft-defined "standard".

Apple had been using its own scheme to put Thunderbolt controllers
into D3cold when nothing is plugged in, about a decade before Microsoft
defined the ACPI property.

I'm not sure if other vendors came up with their own schemes to
power-manage Thunderbolt.  We may regress those with the present
patch.

Thanks,

Lukas


Re: [PATCH v2 3/9] PCI: drop `is_thunderbolt` attribute

2022-02-13 Thread Lukas Wunner
On Fri, Feb 11, 2022 at 12:23:51PM +0200, Mika Westerberg wrote:
> On Thu, Feb 10, 2022 at 04:43:23PM -0600, Mario Limonciello wrote:
> > @@ -2955,7 +2955,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > return true;
> >  
> > /* Even the oldest 2010 Thunderbolt controller supports D3. */
> > -   if (bridge->is_thunderbolt)
> > +   if (dev_is_removable(&bridge->dev))
> 
> For this, I'm not entirely sure this is what we want. The purpose of
> this check is to enable port power management of Apple systems with
> Intel Thunderbolt controller and therefore checking for "removable" here
> is kind of misleading IMHO.
[...]
> and then make a quirk in quirks.c that adds the software node property
> for the Apple systems? Or something along those lines.

Honestly, that feels wrong to me.

There are non-Apple products with Thunderbolt controllers,
e.g. Supermicro X10SAT was a Xeon board with Redwood Ridge
which was introduced in 2013.  This was way before Microsoft
came up with the HotPlugSupportInD3 property.  It was also way
before the 2015 BIOS cut-off date that we use to disable
power management on older boards.

Still, we currently whitelist the Thunderbolt ports on that
board for D3 because we know it works.  What if products like
this one use their own power management scheme and we'd cause
a power regression if we needlessly disable D3 for them now?

Thanks,

Lukas


Re: [PATCH v3 04/12] PCI: Drop the `is_thunderbolt` attribute from PCI core

2022-02-13 Thread Lukas Wunner
On Fri, Feb 11, 2022 at 01:32:42PM -0600, Mario Limonciello wrote:
> The `is_thunderbolt` attribute is currently a dumping ground for a
> variety of things.

It's not as arbitrary as it may seem.  Quite a bit of thought went into
the current design.


> Instead use the driver core removable attribute to indicate the
> detail a device is attached to a thunderbolt or USB4 chain.

You're missing the point that "is_thunderbolt" is set on the *controller*
(i.e. its upstream and downstream ports).

The controller itself is *not* removable if it's the host controller.

However a device can be assumed to be removable if it has an ancestor
which has the "is_thunderbolt" flag set.


>  static void pci_set_removable(struct pci_dev *dev)
>  {
>   struct pci_dev *parent = pci_upstream_bridge(dev);
> + u16 vsec;
> +
> + /* Is the device a Thunderbolt controller? */
> + vsec = pci_find_vsec_capability(dev, PCI_VENDOR_ID_INTEL, 
> PCI_VSEC_ID_INTEL_TBT);

This doesn't make any sense because the host controller is not
removable.


> @@ -1860,7 +1855,6 @@ int pci_setup_device(struct pci_dev *dev)
>   dev->cfg_size = pci_cfg_space_size(dev);
>  
>   /* Need to have dev->cfg_size ready */
> - set_pcie_thunderbolt(dev);
>  
>   set_pcie_untrusted(dev);

Either drop the blank line or drop the code comment if set_pcie_untrusted()
doesn't need dev->cfg_size.


> diff --git a/drivers/platform/x86/apple-gmux.c 
> b/drivers/platform/x86/apple-gmux.c
> index 57553f9b4d1d..04232fbc7d56 100644
> --- a/drivers/platform/x86/apple-gmux.c
> +++ b/drivers/platform/x86/apple-gmux.c
> @@ -596,7 +596,7 @@ static int gmux_resume(struct device *dev)
>  
>  static int is_thunderbolt(struct device *dev, void *data)
>  {
> - return to_pci_dev(dev)->is_thunderbolt;
> + return pci_is_thunderbolt_attached(to_pci_dev(dev));
>  }

No, the gmux driver changes its behavior if a Thunderbolt host
controller is present.  Not if there's a Thunderbolt-attached
device present.

Thanks,

Lukas


[PATCH] PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI

2020-10-01 Thread Lukas Wunner
Recent laptops with dual AMD GPUs fail to suspend the discrete GPU, thus
causing lockups on system sleep and high power consumption at runtime.
The discrete GPU would normally be suspended to D3cold by turning off
ACPI _PR3 Power Resources of the Root Port above the GPU.

However on affected systems, the Root Port is hotplug-capable and
pci_bridge_d3_possible() only allows hotplug ports to go to D3 if they
belong to a Thunderbolt device or if the Root Port possesses a
"HotPlugSupportInD3" ACPI property.  Neither is the case on affected
laptops.  The reason for whitelisting only specific, known to work
hotplug ports for D3 is that there have been reports of SkyLake Xeon-SP
systems raising Hardware Error NMIs upon suspending their hotplug ports:
https://lore.kernel.org/linux-pci/20170503180426.GA4058@otc-nc-03/

But if a hotplug port is power manageable by ACPI (as can be detected
through presence of Power Resources and corresponding _PS0 and _PS3
methods) then it ought to be safe to suspend it to D3.  To this end,
amend acpi_pci_bridge_d3() to whitelist such ports for D3.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1222
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1252
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1304
Reported-and-tested-by: Arthur Borsboom 
Reported-and-tested-by: matoro 
Reported-by: Aaron Zakhrov 
Reported-by: Michal Rostecki 
Reported-by: Shai Coleman 
Signed-off-by: Lukas Wunner 
Cc: sta...@vger.kernel.org
Cc: Alex Deucher 
Cc: Rafael J. Wysocki 
Cc: Mika Westerberg 
---
 drivers/pci/pci-acpi.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index d5869a0..d9aa551 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -944,6 +944,16 @@ static bool acpi_pci_bridge_d3(struct pci_dev *dev)
if (!dev->is_hotplug_bridge)
return false;
 
+   /* Assume D3 support if the bridge is power-manageable by ACPI. */
+   adev = ACPI_COMPANION(&dev->dev);
+   if (!adev && !pci_dev_is_added(dev)) {
+   adev = acpi_pci_find_companion(&dev->dev);
+   ACPI_COMPANION_SET(&dev->dev, adev);
+   }
+
+   if (adev && acpi_device_power_manageable(adev))
+   return true;
+
/*
 * Look for a special _DSD property for the root port and if it
 * is set we know the hierarchy behind it supports D3 just fine.
-- 
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Nouveau] [PATCH 1/4] drm: retrieve EDID via ACPI _DDC method

2020-08-08 Thread Lukas Wunner
On Mon, Jul 27, 2020 at 03:53:54PM -0500, Daniel Dadap wrote:
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Nit: Alphabetic ordering.

> +static u64 *get_dod_entries(acpi_handle handle, int *count)
> +{
> + acpi_status status;
> + struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
> + union acpi_object *dod;
> + int i;
> + u64 *ret = NULL;

Nits: Reverse christmas tree.
"ret" is a poor name, I'd suggest "entries" or something like that.
The spec says that _DOD is a list of 32-bit values, not 64-bit.

> + status = acpi_evaluate_object(handle, "_DOD", NULL, &buf);
> +
> + if (ACPI_FAILURE(status))
> + return NULL;

Nit: No blank line between function invocation and error check.

> + dod = buf.pointer;
> +
> + if (dod == NULL || dod->type != ACPI_TYPE_PACKAGE)
> + goto done;

Same.

> + ret = kmalloc_array(dod->package.count, sizeof(*ret), GFP_KERNEL);
> + if (ret == NULL)
> + goto done;

Nit: Usually we use "if (!ret)" or "if (ret)".

> + list_for_each_safe(node, next, &device->children) {

No, that's not safe because the ACPI namespace may change concurrently,
e.g. because a DSDT patch is applied by the user via sysfs.
Use acpi_walk_namespace() with a depth of 1 instead.

> + for (i = 0; i < num_dod_entries; i++) {
> + if (adr == dod_entries[i]) {
> + ret = do_acpi_ddc(child->handle);
> +
> + if (ret != NULL)
> + goto done;

I guess ideally we'd want to correlate the display objects with
drm_connectors or at least constrain the search to Display Type
"Internal/Integrated Digital Flat Panel" instead of picking the
first EDID found.  Otherwise we might erroneously use the DDC
for an externally attached display.

> +struct edid *drm_get_edid_acpi(void)
> +{
> +#if defined(CONFIG_ACPI) && defined(CONFIG_PCI)

No, put an empty inline stub in the header file instead of using #ifdef, see:

https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation

Patches 2, 3 and 4 need a "drm/" prefix in the Subject, e.g.
"drm/i915: ".

Please cc all ACPI-related patches to linux-acpi.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/6] RFC Support hot device unplug in amdgpu

2020-05-11 Thread Lukas Wunner
On Mon, May 11, 2020 at 02:21:57PM +0200, Daniel Vetter wrote:
> On Mon, May 11, 2020 at 1:43 PM Lukas Wunner  wrote:
> > On Mon, May 11, 2020 at 11:54:33AM +0200, Daniel Vetter wrote:
> > > - One unfortunate thing with drm_dev_unplug is that the driver core is
> > >   very opinionated and doesn't tell you whether it's a hotunplug or a
> > >   driver unload. In the former case trying to shut down hw just wastes
> > >   time (and might hit driver bugs), in the latter case driver engineers
> > >   very much expect everything to be shut down.
> > 
> > You can get that information at the PCI bus level with
> > pci_dev_is_disconnected().
> 
> Ok, so at least for pci devices you could do something like
> 
> if (pci_dev_is_disconnected())
> drm_dev_unplug();
> else
> drm_dev_unregister();
>
> In the ->remove callback and both users and developers should be
> happy.

Basically yes.  But if the driver is unbound e.g. via sysfs and the
device is hot-removed while it is being unbound, that approach fails.

So you'll need checks for pci_dev_is_disconnected() further below in
the call stack as well to avoid unpleasant side effects such as unduly
delaying unbinding or ending up in infinite loops when reading "all ones"
from PCI BARs, etc.

It may also be worth checking for pci_dev_is_disconnected() in ioctls
as well and directly returning -ENODEV, though of course that suffers
from the same race.  (The device may disappear after the check for
pci_dev_is_disconnected(), or it may have already disappeared but
pciehp hasn't updated the device's channel state yet.)

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/6] RFC Support hot device unplug in amdgpu

2020-05-11 Thread Lukas Wunner
On Mon, May 11, 2020 at 11:54:33AM +0200, Daniel Vetter wrote:
> - One unfortunate thing with drm_dev_unplug is that the driver core is
>   very opinionated and doesn't tell you whether it's a hotunplug or a
>   driver unload. In the former case trying to shut down hw just wastes
>   time (and might hit driver bugs), in the latter case driver engineers
>   very much expect everything to be shut down.

You can get that information at the PCI bus level with
pci_dev_is_disconnected().  The flag queried by this function is set
upon hot removal.  Be aware however that the device is guaranteed to
be unreachable if the function returns true, but the converse is NOT
guaranteed, i.e. the function may return false even though the device
has just gone away.

Those somewhat difficult semantics are one of the reasons why some
people are skeptical of the function's merits (notably Greg KH).
See this LWN article for more information:

https://lwn.net/Articles/767885/
(scroll down to the "Surprise removal" section)

I've suggested to Greg a few years back that we should have a flag
at the device level to indicate whether it's gone, not just at the bus
level.  That way the property could be expressed regardless of the bus
used.  It would facilitate the feature you're missing, that the driver
core tells you whether it's a surprise removal or not.  Unfortunately
Greg rejected the idea.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/1] Fiji GPU audio register timeout when in BACO state

2020-05-02 Thread Lukas Wunner
On Sat, May 02, 2020 at 09:11:58AM +0200, Takashi Iwai wrote:
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -673,6 +673,12 @@ static int amdgpu_dm_audio_component_bind(struct device 
> *kdev,
>   struct amdgpu_device *adev = dev->dev_private;
>   struct drm_audio_component *acomp = data;
>  
> + if (!device_link_add(hda_kdev, kdev, DL_FLAG_STATELESS |
> +  DL_FLAG_PM_RUNTIME)) {
> + DRM_ERROR("DM: cannot add device link to audio device\n");
> + return -ENOMEM;
> + }
> +

Doesn't this duplicate drivers/pci/quirks.c:quirk_gpu_hda() ?
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 09/19] drm/amdgpu: add additional boco checks to runtime suspend/resume

2019-10-12 Thread Lukas Wunner
On Thu, Oct 10, 2019 at 08:45:26PM -0500, Alex Deucher wrote:
> + if (amdgpu_device_supports_boco(drm_dev))
> + drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>  
> + if (amdgpu_device_supports_boco(drm_dev)) {
> + if (amdgpu_is_atpx_hybrid() ||
> + !amdgpu_has_atpx_dgpu_power_cntl())
> + pci_set_power_state(pdev, PCI_D0);
> + pci_restore_state(pdev);
> + ret = pci_enable_device(pdev);
> + if (ret)
> + return ret;
> + pci_set_master(pdev);
> + }

Two consecutive if-clauses with same condition, can be folded into one.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 02/19] drm/amdgpu: add supports_baco callback for soc15 asics.

2019-10-12 Thread Lukas Wunner
On Thu, Oct 10, 2019 at 08:45:19PM -0500, Alex Deucher wrote:
> Check the BACO capabilities from the powerplay table.
> 
> Signed-off-by: Alex Deucher 
[...]
> @@ -997,6 +1020,7 @@ static const struct amdgpu_asic_funcs vega20_asic_funcs =
>   .read_bios_from_rom = &soc15_read_bios_from_rom,
>   .read_register = &soc15_read_register,
>   .reset = &soc15_asic_reset,
> + .reset_method = &soc15_asic_reset_method,
>   .set_vga_state = &soc15_vga_set_state,
>   .get_xclk = &soc15_get_xclk,
>   .set_uvd_clocks = &soc15_set_uvd_clocks,
> @@ -1009,7 +1033,7 @@ static const struct amdgpu_asic_funcs vega20_asic_funcs 
> =
>   .get_pcie_usage = &vega20_get_pcie_usage,
>   .need_reset_on_init = &soc15_need_reset_on_init,
>   .get_pcie_replay_count = &soc15_get_pcie_replay_count,
> - .reset_method = &soc15_asic_reset_method

Seemingly unrelated change without explanation.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 08/19] drm/amdgpu: rename amdgpu_device_is_px to amdgpu_device_supports_boco

2019-10-12 Thread Lukas Wunner
On Thu, Oct 10, 2019 at 08:45:25PM -0500, Alex Deucher wrote:
> To better match what we are checking for and to align with
> amdgpu_device_supports_baco.
> 
> BACO - Bus Active, Chip Off
> BOCO - Bus Off, Chip Off

It would be useful to spell out BACO in the preceding patches as well.


> - * amdgpu_device_is_px - Is the device is a dGPU with HG/PX power control
> + * amdgpu_device_supports_boco - Is the device is a dGPU with HG/PX power 
> control

Duplicate verb.

My personal style to document boolean return values is "whether ...",
i.e. "whether device is a dGPU with HG/PX power control".
But that's just me.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] gpu: drm: radeon: Set DPM_FLAG_NEVER_SKIP when enabling PM-runtime

2019-02-16 Thread Lukas Wunner
On Fri, Feb 15, 2019 at 11:01:04AM -0500, Alex Deucher wrote:
> On Fri, Feb 15, 2019 at 10:39 AM Rafael J. Wysocki  wrote:
> > On HP ProBook 4540s, if PM-runtime is enabled in the radeon driver
> > and the direct-complete optimization is used for the radeon device
> > during system-wide suspend, the system doesn't resume.
> >
> > Preventing direct-complete from being used with the radeon device by
> > setting the DPM_FLAG_NEVER_SKIP driver flag for it makes the problem
> > go away, which indicates that direct-complete is not safe for the
> > radeon driver in general and should not be used with it (at least
> > for now).
> >
> > This fixes a regression introduced by commit c62ec4610c40
> > ("PM / core: Fix direct_complete handling for devices with no
> > callbacks") which allowed direct-complete to be applied to
> > devices without PM callbacks (again) which in turn unlocked
> > direct-complete for radeon on HP ProBook 4540s.
> 
> Do other similar drivers like amdgpu and nouveau need the same fix?
> I'm not too familiar with the direct_complete feature in general.

direct_complete means that a discrete GPU which is in D3cold upon
entering system sleep is left as is, i.e. it is not woken.  It is
also expected to still be in D3cold when resuming from system sleep
from the PM core's point of view.  (If it is in D0uninitialized, the
GPU's driver needs to ensure it is transitioned to D3cold again.)

I know for a fact that resuming the discrete GPU is not necessary
on my MacBook Pro with Nvidia GPU.  I'd expect those with AMD GPUs
to behave the same.  The apple-gmux driver takes care of putting
the GPU into D3cold on resume from system sleep if it was in D3cold
when entering system sleep (see drivers/platform/x86/apple-gmux.c,
gmux_resume()).

I think it is desirable to use direct_complete because it saves power
(no need to gratuitously wake the GPU upon entering system sleep,
only to immediately cut its power) and it also speeds up the suspend
process by about half a second.

The root cause on the HP ProBook 4540s needs to be debugged, I'd
suspect a BIOS issue which could be adressed by a quirk, either for
this particular machine or for a certain class of devices (e.g. all
machines which use PR3 to transition to D3cold) if that is necessary
to behave identically to Windows.  Or maybe the atpx vga_switcheroo
handler needs to be amended to put the GPU into D3cold on resume from
system sleep if it was runtime suspended before.

Is this machine using s2idle or does it suspend to S3?

Thanks,

Lukas

> > Fixes: c62ec4610c40 ("PM / core: Fix direct_complete handling for devices 
> > with no callbacks")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=201519
> > Reported-by: ??  
> > Tested-by: ??  
> > Signed-off-by: Rafael J. Wysocki 
> > ---
> >  drivers/gpu/drm/radeon/radeon_kms.c |1 +
> >  1 file changed, 1 insertion(+)
> >
> > Index: linux-pm/drivers/gpu/drm/radeon/radeon_kms.c
> > ===
> > --- linux-pm.orig/drivers/gpu/drm/radeon/radeon_kms.c
> > +++ linux-pm/drivers/gpu/drm/radeon/radeon_kms.c
> > @@ -172,6 +172,7 @@ int radeon_driver_load_kms(struct drm_de
> > }
> >
> > if (radeon_is_px(dev)) {
> > +   dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
> > pm_runtime_use_autosuspend(dev->dev);
> > pm_runtime_set_autosuspend_delay(dev->dev, 5000);
> > pm_runtime_set_active(dev->dev);
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [V3] vga_switcheroo: set audio client id according to bound GPU id

2018-07-17 Thread Lukas Wunner
On Tue, Jul 17, 2018 at 04:20:50PM +0800, Jim Qu wrote:
> On modern laptop, there are more and more platforms
> have two GPUs, and each of them maybe have audio codec
> for HDMP/DP output. For some dGPU which is no output,
> audio codec usually is disabled.
> 
> In currect HDA audio driver, it will set all codec as
> VGA_SWITCHEROO_DIS, the audio which is binded to UMA
> will be suspended if user use debugfs to contorl power
> 
> In HDA driver side, it is difficult to know which GPU
> the audio has binded to. So set the bound gpu pci dev
> to vga_switcheroo.
> 
> if the audio client is not the third registration, audio
> id will set in vga_switcheroo enable function. if the
> audio client is the last registration when vga_switcheroo
> _ready() get true, we should get audio client id from bound
> GPU directly.
> 
> Signed-off-by: Jim Qu 

Reviewed-by: Lukas Wunner 

@Takashi, any preference which tree to merge this through?
sound or drm-misc, either way would seem fine to me.  I think
there's going to be one final drm-misc pull sent to Dave this
week, after that it's 4.20.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [alsa-devel] [V2 2/2] vgaswitchreoo: set audio client id in vgaswitchreoo enable function

2018-07-16 Thread Lukas Wunner
On Mon, Jul 16, 2018 at 04:47:11PM +0200, Takashi Iwai wrote:
> On Mon, 16 Jul 2018 16:43:42 +0200,
> Lukas Wunner wrote:
> > 
> > On Mon, Jul 16, 2018 at 02:06:35PM +0800, Jim Qu wrote:
> > > +
> > > + list_for_each_entry(client, &vgasr_priv.clients, list) {
> > > + if (!client_is_audio(client) || client_id(client) !=
> > > + VGA_SWITCHEROO_UNKNOWN_ID)
> > 
> > Don't you have to check for
> > 
> >   client_id(client) != VGA_SWITCHEROO_UNKNOWN_ID | ID_BIT_AUDIO
> > 
> > here?  That's the value you assign when the audio client registers,
> > yet you're checking for something else, so you're always skipping
> > over audio clients.  Did you actually test this patch?
> 
> This should be OK, since client_id() is a macro to strip to the core
> id number without ID_BIT_AUDIO.  And client_is_vga() &
> client_is_audio() macros were rewritten along with it, too.

Okay, thanks for the clarification.

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [alsa-devel] [V2 2/2] vgaswitchreoo: set audio client id in vgaswitchreoo enable function

2018-07-16 Thread Lukas Wunner
On Mon, Jul 16, 2018 at 02:06:35PM +0800, Jim Qu wrote:
> +
> + list_for_each_entry(client, &vgasr_priv.clients, list) {
> + if (!client_is_audio(client) || client_id(client) !=
> + VGA_SWITCHEROO_UNKNOWN_ID)

Don't you have to check for

  client_id(client) != VGA_SWITCHEROO_UNKNOWN_ID | ID_BIT_AUDIO

here?  That's the value you assign when the audio client registers,
yet you're checking for something else, so you're always skipping
over audio clients.  Did you actually test this patch?


> On modern laptop, there are more and more platforms
> have two GPUs, and each of them maybe have audio codec
> for HDMP/DP output. For some dGPU which is no output,
> audio codec usually is disabled.
> 
> In currect HDA audio driver, it will set all codec as
> VGA_SWITCHEROO_DIS, the audio which is binded to UMA
> will be suspended if user use debugfs to control power
> 
> In HDA driver side, it is difficult to know which GPU
> the audio has bound to. So set the bound gpu pci dev
> to vgaswitchreoo, the correct audio id will be set in
> vgaswitchreoo enable function.
> 
> Signed-off-by: Jim Qu 

Apart from the issue above and that this approach doesn't work if
hda_intel registers after the two GPUs, some nits:

vgaswitchreoo => vga_switcheroo multiple times above and in the subject.


> --- a/drivers/gpu/vga/vga_switcheroo.c
> +++ b/drivers/gpu/vga/vga_switcheroo.c
> @@ -103,6 +103,7 @@
>   *   runtime pm. If true, writing ON and OFF to the vga_switcheroo debugfs
>   *   interface is a no-op so as not to interfere with runtime pm
>   * @list: client list
> + * @vga_dev: pci device, indicate which GPU is bound to current audio client
>   *
>   * Registered client. A client can be either a GPU or an audio device on a 
> GPU.
>   * For audio clients, the @fb_info and @active members are bogus.

Please add a note here that vga_dev is bogus for GPU clients.


> @@ -192,14 +193,28 @@ static void vga_switcheroo_enable(void)
>   vgasr_priv.handler->init();
>  
>   list_for_each_entry(client, &vgasr_priv.clients, list) {
> - if (client->id != VGA_SWITCHEROO_UNKNOWN_ID)
> + if (!client_is_vga(client) || client_id(client) !=
> + VGA_SWITCHEROO_UNKNOWN_ID)

Readability might improve if you break the line like this:

if (!client_is_vga(client) ||
client_id(client) != VGA_SWITCHEROO_UNKNOWN_ID)


> @@ -339,9 +357,10 @@ EXPORT_SYMBOL(vga_switcheroo_register_client);
>   */
>  int vga_switcheroo_register_audio_client(struct pci_dev *pdev,
>   const struct vga_switcheroo_client_ops *ops,
> - enum vga_switcheroo_client_id id)
> + struct pci_dev *vga_dev)
>  {
> - return register_client(pdev, ops, id | ID_BIT_AUDIO, false, true);
> + return register_client(pdev, ops, VGA_SWITCHEROO_UNKNOWN_ID |
> + ID_BIT_AUDIO, vga_dev, false, true);

Again, I'd break the line such that the two operands to the bitwise or
are on the same line:

return register_client(pdev, ops,
   VGA_SWITCHEROO_UNKNOWN_ID | ID_BIT_AUDIO,
   vga_dev, false, true);

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] vgaswitchreoo: set audio client id in vgaswitchreoo enable function

2018-07-16 Thread Lukas Wunner
On Sat, Jul 14, 2018 at 02:15:24PM +0800, jimqu wrote:
> ??? 2018/7/13 17:27, Lukas Wunner ??:
> >On Fri, Jul 13, 2018 at 04:06:02PM +0800, Jim Qu wrote:
> >>On modern laptop, there are more and more platforms
> >>have two GPUs, and each of them maybe have audio codec
> >>for HDMP/DP output. For some dGPU which is no output,
> >>audio codec usually is disabled.
> >>
> >>In currect HDA audio driver, it will set all codec as
> >>VGA_SWITCHEROO_DIS, the audio which is binded to UMA
> >>will be suspended if user use debugfs to contorl power
> >>
> >>In HDA driver side, it is difficult to know which GPU
> >>the audio has binded to. So set the bound gpu pci dev
> >>to vgaswitchroo, the correct audio id will be set in
> >>vgaswitchreoo enable function.
> >>
> >>Signed-off-by: Jim Qu 
> >The approach you've taken in this patch won't work if the HDA controller
> >is bound to a driver after both GPUs have already been bound to a driver
> >and the handler has already been registered.
> >
> >That's because vga_switcheroo_enable() is run when two GPUs have registered
> >as clients and the handler has also registered.  If the HDA controller is
> >probed afterwards, its ID will be stuck at VGA_SWITCHEROO_UNKNOWN_ID.
> 
> In genernic speaking, there are two cases , a. audio client is the third
> client. b. audio client is not the third client.
> 
> if audio is third client. vga_switcheroo_ready() is true,
> vga_switcheroo_enable() can be called in audio client register fuction.
> if audio is not the third client, vga_switcheroo_enable() will be called in
> the second GFX client register.
> In vga_switcheroo_enable() , the first list loop will set two GPU clients'
> id, and the second list loop will select audio client, set id by its bound
> gpu pci dev.

No, if audio is the third client, vga_switcheroo_ready() is *not* true
because vgasr_priv.active is true.  This is set to true once the two
GPUs and the handler have registered.  If the audio device registers
afterwards, vga_switcheroo_enable() will already have been called.
It's never called twice because it sets vgasr_priv.active = true.

As a result, the audio device is stuck at VGA_SWITCHEROO_UNKNOWN_ID.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] vgaswitchreoo: set audio client id in vgaswitchreoo enable function

2018-07-13 Thread Lukas Wunner
On Fri, Jul 13, 2018 at 04:06:02PM +0800, Jim Qu wrote:
> On modern laptop, there are more and more platforms
> have two GPUs, and each of them maybe have audio codec
> for HDMP/DP output. For some dGPU which is no output,
> audio codec usually is disabled.
> 
> In currect HDA audio driver, it will set all codec as
> VGA_SWITCHEROO_DIS, the audio which is binded to UMA
> will be suspended if user use debugfs to contorl power
> 
> In HDA driver side, it is difficult to know which GPU
> the audio has binded to. So set the bound gpu pci dev
> to vgaswitchroo, the correct audio id will be set in
> vgaswitchreoo enable function.
> 
> Signed-off-by: Jim Qu 

The approach you've taken in this patch won't work if the HDA controller
is bound to a driver after both GPUs have already been bound to a driver
and the handler has already been registered.

That's because vga_switcheroo_enable() is run when two GPUs have registered
as clients and the handler has also registered.  If the HDA controller is
probed afterwards, its ID will be stuck at VGA_SWITCHEROO_UNKNOWN_ID.

Before resubmitting this patch, please run it through scripts/checkpatch.pl,
there are multiple trivial issues here.

Also, please use scripts/get_maintainers.pl on all files you're touching
and cc the patch to the listed maintainers and reviewers.


> @@ -161,9 +163,8 @@ struct vgasr_priv {
>  };
>  
>  #define ID_BIT_AUDIO 0x100
> -#define client_is_audio(c)   ((c)->id & ID_BIT_AUDIO)
> -#define client_is_vga(c) ((c)->id == VGA_SWITCHEROO_UNKNOWN_ID || \
> -  !client_is_audio(c))
> +#define client_is_audio(c)   ((c)->id & ID_BIT_AUDIO)
> +#define client_is_vga(c) (!client_is_audio(c))
>  #define client_id(c) ((c)->id & ~ID_BIT_AUDIO)

There's a spurious whitespace change in the client_is_audio() line,
please get rid of it.


> --- a/include/linux/vga_switcheroo.h
> +++ b/include/linux/vga_switcheroo.h
> @@ -84,8 +84,8 @@ enum vga_switcheroo_state {
>   * Client identifier. Audio clients use the same identifier & 0x100.
>   */
>  enum vga_switcheroo_client_id {
> - VGA_SWITCHEROO_UNKNOWN_ID = -1,
> - VGA_SWITCHEROO_IGD,
> + VGA_SWITCHEROO_UNKNOWN_ID = 0x1000,
> + VGA_SWITCHEROO_IGD = 0,

Hm, why is this change necessary?


Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] ALSA: HDA: use PCI_BASE_CLASS_DISPLAY to cover more display adapters

2018-07-13 Thread Lukas Wunner
On Fri, Jul 13, 2018 at 04:06:01PM +0800, Jim Qu wrote:
> Signed-off-by: Jim Qu 

Empty commit message.  Please add an explanation why the change is
necessary so folks browsing the git history understand it's motivation.
I think here the motivation is that the PCI class is sometimes
PCI_CLASS_DISPLAY_3D or PCI_CLASS_DISPLAY_OTHER.


> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -31,7 +31,7 @@
>   *  CHANGES:
>   *
>   *  2004.12.01   Major rewrite by tiwai, merged the work of pshou
> - * 
> + *

Spurious unrelated change, please get rid of it, this clutters up
the git history unnecessarily.


> @@ -390,8 +390,8 @@ static char *driver_short_names[] = {
>   [AZX_DRIVER_SIS] = "HDA SIS966",
>   [AZX_DRIVER_ULI] = "HDA ULI M5461",
>   [AZX_DRIVER_NVIDIA] = "HDA NVidia",
> - [AZX_DRIVER_TERA] = "HDA Teradici", 
> - [AZX_DRIVER_CTX] = "HDA Creative", 
> + [AZX_DRIVER_TERA] = "HDA Teradici",
> + [AZX_DRIVER_CTX] = "HDA Creative",

Ditto.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [alsa-devel] ??????: [PATCH] vgaswitchroo: set audio client id according to bound gpu client id

2018-07-09 Thread Lukas Wunner
On Mon, Jul 09, 2018 at 05:52:49PM +0200, Takashi Iwai wrote:
> On Mon, 09 Jul 2018 17:47:34 +0200, Lukas Wunner wrote:
> > Since v4.17, every time the GPU is powered up, the HDA controller is
> > runtime resumed to PCI_D0.  (See the call to pci_wakeup_bus() in
> > vga_switcheroo_runtime_resume() added by dcac86b7d0.)
> > 
> > If the HDA controller can't detect presence of an HDMI display even
> > if it's in PCI_D0, then I suppose that needs to be adressed in the HDA
> > driver.  Thus so far I don't see the need to call from amdgpu into the
> > HDA driver.
> 
> It's not about the PCI power state, but the problem is that the
> detection of the HDMI and its ELD read is somewhat asynchronous.
> Basically it's handled via the hardware unsolicited event emitted over
> HD-audio bus, which was originally generated by GPU at dealing with
> the hotplug/unplug.  So, this part is racy and not 100% reliable --
> although usually it shouldn't be a big problem.
> 
> That said, it's not the exact "need" but it would make things more
> reliable and accurate (even consumes less power as we don't need to
> power up/down just for the HDMI detection).

Okay.  If amdgpu triggers the event on the HDA bus, it should call
pm_runtime_get_sync() on the HDA device beforehand.  Which device
needs to be resumed exactly to ensure ELD reception, the PCI one
or the codec?

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [alsa-devel] ??????: [PATCH] vgaswitchroo: set audio client id according to bound gpu client id

2018-07-09 Thread Lukas Wunner
On Mon, Jul 09, 2018 at 05:04:22PM +0200, Takashi Iwai wrote:
> On Mon, 09 Jul 2018 15:58:51 +0200, Alex Deucher wrote:
> > On Mon, Jul 9, 2018 at 6:16 AM, Qu, Jim  wrote:
> > > > You're saying above that the HDA controller isn't runtime resumed on
> > > > hotplug of a display.  Is that necessary to retrieve ELD or something?
> > > > I'm not sure if there's code in the HDA driver to acquire a runtime PM
> > > > ref on HPD, but maybe it's necessary.  Or maybe the code is there but
> > > > somehow no HPD interrupt is received by the HDA driver?
> > >
> > > So far, I do not find any code about audio response HPD in kernel.
> > > the HPD interrupt will sent to user mode via uevent, not sure whether
> > > audio user mode driver can receive the event or not.
> > 
> > On the gfx side at least, we can get a hotplug event via ACPI
> > (depending on the OEM design) if displays are attached/detached while
> > the dGPU is powered down.  I suppose the gfx driver could call into
> > the audio driver during one of those events.  On the gfx side at least
> > we just generate the gfx hotplug event and let userspace deal with it.
> 
> IMO, a more proper way would be to have the direct communication
> between the graphics driver and the audio driver like i915 driver
> does.  Then the audio driver can get plug/unplug event at more
> accurate timing without races.

Since v4.17, every time the GPU is powered up, the HDA controller is
runtime resumed to PCI_D0.  (See the call to pci_wakeup_bus() in
vga_switcheroo_runtime_resume() added by dcac86b7d0.)

If the HDA controller can't detect presence of an HDMI display even
if it's in PCI_D0, then I suppose that needs to be adressed in the HDA
driver.  Thus so far I don't see the need to call from amdgpu into the
HDA driver.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] vgaswitchroo: set audio client id according to bound gpu client id

2018-07-09 Thread Lukas Wunner
On Mon, Jul 09, 2018 at 08:52:33AM +, Qu, Jim wrote:
> Now, I found the audio device will auto suspend even if the gpu is active,
> and if I plug in a HDMI device it also do not resume back. 
> 
> 1. Did you encounter similar issue before?
> 2. audio will auto suspend as default at beginning of boot, is it expect
> behaviour?

Yes, that's expected.  Once you start streaming audio to attached displays,
user space opens the codec device and this causes the HDA controller to
runtime resume.  If the discrete GPU is suspended at that moment, it will
be powered on and kept powered on as long as user space is streaming audio.

Do you need to runtime resume the HDA controller even if user space isn't
streaming audio?  Why, and in which situation exactly?

You're saying above that the HDA controller isn't runtime resumed on
hotplug of a display.  Is that necessary to retrieve ELD or something?
I'm not sure if there's code in the HDA driver to acquire a runtime PM
ref on HPD, but maybe it's necessary.  Or maybe the code is there but
somehow no HPD interrupt is received by the HDA driver?

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] vgaswitchroo: set audio client id according to bound gpu client id

2018-06-29 Thread Lukas Wunner
On Fri, Jun 29, 2018 at 10:40:50AM +, Qu, Jim wrote:
> > That is no longer the case since v4.17.  The HDA controller now runtime
> > suspends autonomously, see commit 07f4f97d7b4b ("vga_switcheroo: Use
> > device link for HDA controller").
> > 
> > Your patch appears to be geared towards an older kernel version.
> > Please retest on the laptop in question with a v4.17+ kernel.
> 
> indeed? I used 4.13 on the platform. let me have a try with the patch
> you mentioned

The commit can't be cherry-picked by itself onto v4.13, it was part of
a larger series.  It's best to use a stock v4.17 kernel.

Note, a fix went into Linus' tree yesterday, commit 57cb54e53bdd
("ALSA: hda - Force to link down at runtime suspend on ATI/AMD HDMI").
Not sure if it's needed on your machine.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] vgaswitchroo: set audio client id according to bound gpu client id

2018-06-29 Thread Lukas Wunner
On Fri, Jun 29, 2018 at 08:55:40AM +, Qu, Jim wrote:
> When our dGPU does suspend by runtime pm. amdgpu driver for dgpu will
> also call vgaswtichroo to power off its audio. vgaswitchroo driver will
> find audio codec by vgaswitchroo dgpu client id(VGA_SWITCHEROO_DIS).

That is no longer the case since v4.17.  The HDA controller now runtime
suspends autonomously, see commit 07f4f97d7b4b ("vga_switcheroo: Use
device link for HDA controller").

Your patch appears to be geared towards an older kernel version.
Please retest on the laptop in question with a v4.17+ kernel.


> I think the issue should be observed on both Intel+AMD or AMD+AMD
> platform which has the same HW configuration.
> 
> 1.if dGPU has no audio codec. the issue should be always observed.
> 2.if both iGPU and dGPU has audio codecs, the issue should be random,
> it depends on the first audio found by vgaswitchroo driver is on iGPU
> or dGPU.

On discrete AMD and Nvidia GPUs, the HDA controller is function 1 and
the GPU is function 0 in the same PCI slot.

On Intel chipsets, the HDA controller and the GPU have completely
different PCI device numbers, e.g. the GPU might be :00:02.0 and
the HDA controller might be :00:1b.0.

get_bound_vga() checks that the HDA controller is function 1 and there's
a GPU in function 0 of the same slot.

Thus get_bound_vga() always returns NULL for an Intel HDA controller and
the controller is never registered with vga_switcheroo (which is fine).

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] vgaswitchroo: set audio client id according to bound gpu client id

2018-06-29 Thread Lukas Wunner
> On Thu, Jun 28, 2018 at 2:22 AM, Jim Qu  wrote:
> > On modern laptop, there are more and more platforms
> > have two GPUs, and each of them maybe have audio codec
> > for HDMP/DP output. For some dGPU which is no output,
> > audio codec usually is disabled.
> >
> > In currect HDA audio driver, it will set all codec as
> > VGA_SWITCHEROO_DIS, so if system runtime pm is enabled,
> > the audio which is binded to UMA will be suspended.

Please explain what you mean by UMA here.  Are you referring to
dual GPU systems where both GPUs are by AMD?  And the integrated
GPU is incorrectly assigned VGA_SWITCHEROO_DIS?

Starting with v4.17, the only reason the audio driver registers
with vga_switcheroo is because it can be powered down if manual
power control is used.  However manual power control is not the
default, it's primary use case nowadays is MacBook Pros, and
there are no MacBook Pros with dual AMD GPUs.  So I don't
understand what real world use case you're trying to fix.
Please explain.


> > Newer dGPUs use PCI_CLASS_DISPLAY_OTHER rather than
> > PCI_CLASS_DISPLAY_VGA.  Make sure we check for both when
> > checking for the dGPU in get_bound_vga()

Unlike the other changes in your patch, this one would seem
perfectly valid to me.  However it's a separate issue, so
please put it in a separate patch.

Instead of adding a check for PCI_CLASS_DISPLAY_OTHER, I'd prefer
you replacing the existing check for PCI_CLASS_DISPLAY_VGA with a
check for PCI_BASE_CLASS_DISPLAY, like so:

-   if ((p->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+   if ((p->class >> 16) == PCI_BASE_CLASS_DISPLAY)

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Avoid leaking PM domain on driver unbind

2018-01-23 Thread Lukas Wunner
On Mon, Jan 22, 2018 at 11:22:58PM -0500, Alex Deucher wrote:
> On Sun, Jan 21, 2018 at 7:46 AM, Lukas Wunner  wrote:
> > amdgpu_device_init() calls vga_switcheroo_init_domain_pm_ops() either
> > if the device has the PowerXpress flag set or if the user has set the
> > "runpm" module param to 1.
> >
> > However amdgpu_device_fini() calls vga_switcheroo_fini_domain_pm_ops()
> > only under the first of those two conditions.
> 
> Good catch.  Forcing runpm=1 doesn't do anything useful anyway so just
> remove it in device_init().  See the attached patch.

Ok, ack.

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: Avoid leaking PM domain on driver unbind

2018-01-21 Thread Lukas Wunner
amdgpu_device_init() calls vga_switcheroo_init_domain_pm_ops() either
if the device has the PowerXpress flag set or if the user has set the
"runpm" module param to 1.

However amdgpu_device_fini() calls vga_switcheroo_fini_domain_pm_ops()
only under the first of those two conditions.

(Note this issue isn't present in radeon.  It was introduced when the
driver was forked.)

Cc: sta...@vger.kernel.org
Cc: Alex Deucher 
Signed-off-by: Lukas Wunner 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d09c4ee9f7e1..f07eedc53761 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2083,7 +2083,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
adev->bios = NULL;
if (!pci_is_thunderbolt_attached(adev->pdev))
vga_switcheroo_unregister_client(adev->pdev);
-   if (adev->flags & AMD_IS_PX)
+   if (adev->flags & AMD_IS_PX || amdgpu_runtime_pm == 1)
vga_switcheroo_fini_domain_pm_ops(adev->dev);
vga_client_register(adev->pdev, NULL, NULL, NULL);
if (adev->rio_mem)
-- 
2.15.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend

2017-07-25 Thread Lukas Wunner
On Tue, Jul 25, 2017 at 07:14:23AM +0200, Mario Kleiner wrote:
> On 07/24/2017 03:45 PM, Florian Echtler wrote:
> > thanks for the hint. I've applied this patch to the v4.12 tree I'm
> > currently running, and it didn't seem to make any difference (i.e.
> > display stays black after suspend).
> 
> That's the same here with my patch applied. After a suspend -> resume, the
> internal panel stays black, the patch doesn't help for that. Somethig i
> didn't notice btw., apparently i never suspend->resumed it.
[...]
> Lukas idea that some hardware mux gets switched into the wrong position on
> models with TDM sounds pretty appealing to me, but how to test?

If all else fails, with a multimeter / oscilloscope. :-)  The board
schematics are available by googling for the right terms, such as
the drawing number "051-7863" and internal codename "K23".

Of interest is the eDP connector on the mainboard, page 90.  Notice
there are two power rails going into it, 3.3V (pin 31) and 12V (pins
27 - 30).  Florian obtained dmesg output of the machine coming out
of suspend by ssh'ing into it and it showed that the eDP link could
be trained properly upon system resume.  Still the panel stayed black.

My guess is that the panel's DisplayPort transceiver is powered via the
3.3V rail.  This rail is powered permanently when the system is in S0,
it cannot be switched off at runtime.  So, the Radeon card can talk to
the DisplayPort transceiver (which has power) but the 12V rail, which
presumably powers the LED backlight, is off.  You could test this theory
by attaching a multimeter after coming out of suspend:  You should see
a voltage difference of 3.3V between pins 31 and 32 (ground) and 0V
between pins 27 - 30 and 32.

The logic for the 12V rail is somewhat complicated, first there's
pin 21 ("VIDEO_ON"), this seems to come *from* the panel and presumably
signals that the link is trained. This should go high after resume.
If it does not then maybe a write to a custom DPCD register is necessary
to make it go high.  If this pin stays low then the 12V rail is not
powered.

Next there's a 74LVC157A mux (page 95 top-left).  Datasheet:
https://assets.nexperia.com/documents/data-sheet/74LVC157A.pdf
The mux can switch four wires, but Apple only needed three.
I guess all outputs (pins 7, 9, 12) must be high for the backlight
to go on.  The mux is under the control of the SMC and is presumably
switched by issuing appropriate commands to the SMC.  It's unclear
to me if the SMC has switched it to the Radeon or to the TDM source
after resume.

Assuming that the mux is switched to the Radeon card, follow input
pins 5, 10 and 14 (MXM_PNL_BL_PWM, MXM_PNL_BL_EN, MXM_PNL_PWR_EN).
They are coming from a "system management" block on the Radeon card
(pins 25, 27, 23, page 85).  Apparently these are GPIO pins for
OEM use and I guess they can be controlled by writing to the PCI BAR
of the Radeon card, but I've no idea at what address their registers
might be located.  I'll try to look at the macOS Radeon drivers with
a disassembler but this is like looking for a needle in a haystack.


> However this is something i got when connecting the external Displayport
> panel:
> 
> Jul  7 04:16:09 kleinerm-MaxtorLinux kernel: [ 1441.344792]
> [drm:radeon_dp_link_train [radeon]] *ERROR* displayport link status failed
> Jul  7 04:16:09 kleinerm-MaxtorLinux kernel: [ 1441.344819]
> [drm:radeon_dp_link_train [radeon]] *ERROR* clock recovery failed
> Jul  7 04:18:07 kleinerm-MaxtorLinux kernel: [ 1559.770783]
> drm_dp_i2c_do_msg: 20 callbacks suppressed
> Jul  7 04:30:19 kleinerm-MaxtorLinux kernel: [ 2291.143406]
> [drm:radeon_dp_link_train [radeon]] *ERROR* displayport link status failed
> Jul  7 04:30:19 kleinerm-MaxtorLinux kernel: [ 2291.143439]
> [drm:radeon_dp_link_train [radeon]] *ERROR* clock recovery failed
> Jul  7 04:30:19 kleinerm-MaxtorLinux kernel: [ 2291.356173]
> [drm:radeon_dp_link_train [radeon]] *ERROR* displayport link status failed
> Jul  7 04:30:19 kleinerm-MaxtorLinux kernel: [ 2291.356205]
> [drm:radeon_dp_link_train [radeon]] *ERROR* clock recovery failed
> 
> So link training failed, because drm_dp_dpcd_read_link_status() already
> failed to read from the dp aux channel.

The AUX channel can be terminated in either of two modes under the control
of the SMC:  100k source termination or weak sink termination (page 94/95).
Failure to communicate over AUX may be explained by being in the incorrect
mode.

HTH,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend

2017-07-17 Thread Lukas Wunner
On Fri, Jun 02, 2017 at 06:47:07PM +0200, Florian Echtler wrote:
> Regarding the SMC, there's actually only one key that consistently seems to 
> have
> a different value whether the display is on or off:
> 
> --- blank 2017-05-05 08:40:53.694565045 +0200
> +++ non_blank 2017-05-05 08:40:53.702565066 +0200
> @@ -143,7 +143,7 @@
>MSWR  [ui8 ]  0 (bytes 00)
>MVBO  [hex_]  (bytes ff ff)
>MVDC  [bin_]  (bytes 00)
> -  MVDS  [bin_]  (bytes 08)
> +  MVDS  [bin_]  (bytes 0a)
>MVE1  [si8 ]  (bytes 0d)
>MVE5  [si8 ]  (bytes 0b)
>MVHR  [flag]  (bytes 01)
> 
> However, even with my modified SmcDumpKeys.c which I can use to enable TDM, I
> cannot write to that key. Since other MV__ keys control the display, too, it
> would make sense that that is related to the display state, but it seems to 
> be a
> read-only key :-/
> 
> Running out of ideas again... any suggestions?

Sorry for the delay Florian.  Commit 564d8a2cf3ab by Mario Kleiner (+cc)
landed in Linus' tree last week and is included in 4.13-rc1.  It is
supposed to fix black screen issues with the iMac10,1 that you're also
using, though in Mario's case they seem to occur upon boot, rather than
on suspend, still might be worth a try:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=564d8a2cf3abf16575af48bdc3e86e92ee8a617d

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend

2017-05-31 Thread Lukas Wunner
On Wed, May 31, 2017 at 10:48:37AM +0200, Florian Echtler wrote:
> On 30.05.2017 12:54, Lukas Wunner wrote:
> > On Tue, May 30, 2017 at 11:34:17AM +0200, Florian Echtler wrote:
> >> On 26.05.2017 23:03, Lukas Wunner wrote:
> >>> On Fri, May 26, 2017 at 02:13:29PM +0200, Florian Echtler wrote:
> >>>> I'm running Ubuntu 16.04.2 on a 27" unibody iMac 10,1 from 2009. 
> >>>> However, even
> >>>> with the most recent HWE stack (kernel 4.8), the display stays black 
> >>>> after
> >>>> suspend. I can ssh into the machine, so wakeup is OK, but the display is
> >>>> entirely black (backlight stays off).
> > 
> > So, just to confirm, if you never plug in an external DP source and the iMac
> > comes out of sleep, the display stays black?
> 
> Correct, verified this just now.
> 
> > If you log in via ssh and look at dmesg, was radeon able to train the DP
> > link again?  No error messages, nothing?
> 
> Here are the relevant dmesg lines after wakeup:
> 
> [  157.622950] [drm] enabling PCIE gen 2 link speeds, disable with
> radeon.pcie_gen2=0
> [  157.626077] [drm] PCIE GART of 1024M enabled (table at 0x0014C000).
> [  157.626094] radeon :02:00.0: WB enabled
> [  157.626097] radeon :02:00.0: fence driver on ring 0 use gpu addr
> 0x1c00 and cpu addr 0xa1242dfd6c00
> [  157.626098] radeon :02:00.0: fence driver on ring 3 use gpu addr
> 0x1c0c and cpu addr 0xa1242dfd6c0c
> [  157.626315] radeon :02:00.0: fence driver on ring 5 use gpu addr
> 0x0005c598 and cpu addr 0xbc3081c1c598
> [  157.672183] [drm] ring test on 0 succeeded in 1 usecs
> [  157.672187] [drm] ring test on 3 succeeded in 2 usecs
> [  157.847098] [drm] ring test on 5 succeeded in 1 usecs
> [  157.847102] [drm] UVD initialized successfully.
> [  157.847121] [drm] ib test on ring 0 succeeded in 0 usecs
> [  157.847136] [drm] ib test on ring 3 succeeded in 0 usecs
> [  158.524061] [drm] ib test on ring 5 succeeded

Hm, try booting with drm.debug=0xf to see if link training for the
eDP connector succeeds.  If the link cannot be trained, it would
explain that the screen stays black.  Should this indeed be the
case, one possible explanation would be that the panel is muxed
to the external port on resume and an appropriate command needs to
be sent to the SMC to switch the mux to the radeon card.  This could
be done in the ->resume_early hook of your APP000C platform driver.
The apple-gmux driver contains something similar to power the discrete
GPU down if it was off before suspend (because the BIOS always powers
it up on resume).

Another possible explanation would be that panel power needs to be
enabled by writing to the registers which control the pins mentioned
below.


> And this is logged in Xorg.0.log after wakeup:
> 
> [   229.956] (II) RADEON(0): Allocate new frame buffer 320x200 stride 384
> [   229.956] (II) RADEON(0): VRAM usage limit set to 219596K
> 
> xrandr still shows the eDP output as connected and active with 2560x1440 
> resolution.
> 
> > Is the panel off or is it on but merely with 0% brightness?
> 
> The backlight is definitely off. I've tried to shine a torch onto the panel 
> and
> see if any content is visible, but it looks like the panel itself is off, too.
> 
> > There's a block of pins on the GPU for "system management" which contains
> > a bunch of GPIO, OEM and reserved pins.  Three of these are used to control
> > the panel when it's switched to the radeon card:
> > pin 23 PNL_PWR_EN
> > pin 25 PNL_BL_EN
> > pin 27 PNL_BL_PWM
> > 
> > Basically you'd need to know what these are set to before and after sleep.
> > I don't know how to access them, presumably via a BAR register.  A quick
> > look at the RV730-specific registers in drivers/gpu/drm/radeon/*.h did not
> > yield anything useful, but someone at AMD may be able to find this out.
> 
> Can I perhaps use radeontool for this? I.e. dump registers before and after
> sleep and see if there's a difference? Or is radeontool too old to know about
> the correct registers?

*shrug*  Unfortunately I'm not familiar at all with radeontool. :-(

Best regards,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend

2017-05-30 Thread Lukas Wunner
On Tue, May 30, 2017 at 11:34:17AM +0200, Florian Echtler wrote:
> On 26.05.2017 23:03, Lukas Wunner wrote:
> > On Fri, May 26, 2017 at 02:13:29PM +0200, Florian Echtler wrote:
> >> I'm running Ubuntu 16.04.2 on a 27" unibody iMac 10,1 from 2009. However, 
> >> even
> >> with the most recent HWE stack (kernel 4.8), the display stays black after
> >> suspend. I can ssh into the machine, so wakeup is OK, but the display is
> >> entirely black (backlight stays off).
> >>
> > The ATI card has MXM_PNL_PWR_EN and MXM_PNL_BL_EN pins.  Those must be
> > enabled for the panel to light up.  Perhaps radeon needs to be extended
> > to do that.  One theory is that this is done via ACPI, but perhaps
> > only if a certain operating system is running.  Dump the ACPI tables
> > and search DSDT and SSDT* tables for methods relating to the ATI card
> > and/or backlight.  If you find checks for OSDW there (e.g. in _PS0,
> > i.e. on resume), it means the AML code is only executed on Darwin.
> I've already had a look at the ACPI tables; there seem to be no references to
> the "official" backlight/display control functions (no _BL? or _DO? methods).
> 
> > Conversely !OSDW means the code is only executed on Windows (Boot Camp).
> > Linux masquerades as Darwin, but this can be disabled with
> > acpi_osi=!Darwin on the command line.  It's worth trying if that changes
> > the behaviour.  If it does, then macOS likely sets those pins on resume
> > and we need to do the same in radeon.
> There are OSDW checks present, however, booting with acpi_osi=!Darwin doesn't
> make a difference to the backlight behaviour.

So, just to confirm, if you never plug in an external DP source and the iMac
comes out of sleep, the display stays black?

If you log in via ssh and look at dmesg, was radeon able to train the DP
link again?  No error messages, nothing?

Is the panel off or is it on but merely with 0% brightness?

There's a block of pins on the GPU for "system management" which contains
a bunch of GPIO, OEM and reserved pins.  Three of these are used to control
the panel when it's switched to the radeon card:
pin 23 PNL_PWR_EN
pin 25 PNL_BL_EN
pin 27 PNL_BL_PWM

Basically you'd need to know what these are set to before and after sleep.
I don't know how to access them, presumably via a BAR register.  A quick
look at the RV730-specific registers in drivers/gpu/drm/radeon/*.h did not
yield anything useful, but someone at AMD may be able to find this out.

Maybe these registers are supposed to be modified by the OS when coming
out of sleep, or they're modified by video BIOS but the OS has to trigger
this somehow.

HTH,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend

2017-05-26 Thread Lukas Wunner
[cc += dri-devel, amd-gfx]

On Fri, May 26, 2017 at 02:13:29PM +0200, Florian Echtler wrote:
> I'm running Ubuntu 16.04.2 on a 27" unibody iMac 10,1 from 2009. However, even
> with the most recent HWE stack (kernel 4.8), the display stays black after
> suspend. I can ssh into the machine, so wakeup is OK, but the display is
> entirely black (backlight stays off).
> 
> Note: this appears to be entirely unrelated to the TDM stuff I posted about
> earlier. Interestingly, when I enable TDM, even after suspend, the display 
> works
> while TDM is active. Once I disable TDM, it's back to black (unintentional 
> song
> reference there).

Sounds like a radeon issue to me.

The LCD_PANEL_PWR and LCD_BKL_ON_MUX pins are coming from a mux.

In TDM mode, the pins are muxed to the SMC, so it can turn the panel on
and off.  If TDM is not in use, the pins are muxed to the ATI card.

The ATI card has MXM_PNL_PWR_EN and MXM_PNL_BL_EN pins.  Those must be
enabled for the panel to light up.  Perhaps radeon needs to be extended
to do that.  One theory is that this is done via ACPI, but perhaps
only if a certain operating system is running.  Dump the ACPI tables
and search DSDT and SSDT* tables for methods relating to the ATI card
and/or backlight.  If you find checks for OSDW there (e.g. in _PS0,
i.e. on resume), it means the AML code is only executed on Darwin.
Conversely !OSDW means the code is only executed on Windows (Boot Camp).
Linux masquerades as Darwin, but this can be disabled with
acpi_osi=!Darwin on the command line.  It's worth trying if that changes
the behaviour.  If it does, then macOS likely sets those pins on resume
and we need to do the same in radeon.

HTH,

Lukas

> 
> I've noted that in /sys/class/backlight, there are two different control
> interfaces, apple_backlight and radeon_bl0. Before suspend, I can control the
> screen brightness from system settings, even if only one of them is loaded, so
> both brightness interfaces work.
> 
> What I've already tried without success:
> 
> - Blindly switching to the console and back.
> - Calling DISPLAY=:0.0 xrandr --output eDP1 --auto via ssh.
> - Restarting X via ssh.
> - Blacklist apple_bl on the kernel cmdline (modprobe.blacklist=apple_bl).
> - Disabling the radeon backlight on the kernel cmdline (radeon.backlight=0).
> - Using radeontool to turn the backlight back on after suspend.
> - Directly using the interface in /sys/class/backlight/ to turn it back on.
> 
> I'm starting to run out of ideas... any hints?
> 
> Regards, Florian
> -- 
> SENT FROM MY DEC VT50 TERMINAL
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-23 Thread Lukas Wunner
On Tue, May 23, 2017 at 09:32:38AM +0200, Christian König wrote:
> Am 23.05.2017 um 05:55 schrieb Michel Dänzer:
> >On 23/05/17 12:50 PM, Lukas Wunner wrote:
> >>On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> >>>On 22/05/17 11:04 PM, Lukas Wunner wrote:
> >>>>On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> >>>>>On Thu, May 18 2017, Lukas Wunner wrote:
> >>>>[snip]
> >>>>>>Reported-by: Nicolai Stange 
> >>>>>>Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
> >>>>>>vga_switcheroo")
> >>>>>>Signed-off-by: Lukas Wunner 
> >>>>>>---
> >>>>>>
> >>>>>>Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> >>>>>>needs to be fixed, so sending out with a proper commit message now.
> >>>>>>The bug was only introduced to radeon, not amdgpu.
> >>>>>Tested-by: Nicolai Stange 
> >>>>>
> >>>>>Thanks for the quick fix!
> >>>>>
> >>>>>>@Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> >>>>>>land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> >>>>>>I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> >>>>>>take it yourself?  Whichever you prefer.  Thanks & sorry for the 
> >>>>>>breakage!
> >>>>I've learned this morning that Alex is on vacation.
> >>>Christian König is standing in for Alex.
> >>By his own account, he already has "all hands full replacing him [Alex]",
> >>explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
> >>this reason and lacks permission to update branches in Alex' repo on fdo:
> >>
> >>"One lesson learned from the past week is that Alex needs to stop using
> >>his personal repository on fdo.
> >>We were asked a couple of times if I couldn't update a branch there from
> >>different directions, which we obviously can't do."
> >>
> >>https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
> >>https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html
> >The important point being that Christian reviewed that patch and
> >explicitly asked Daniel to pick it up.
> 
> Wow, wait a second. I'm just catching up on this thread.
> 
> Lukas didn't committed the patch to drm-misc without a review, didn't you?
> 
> I was intentionally holding back a rb because that isn't my field of
> expertise and I was only briefly involved in the original patch.

It would have been helpful if you had communicated that, I explicitly
asked Alex which tree he'd prefer merging through.  If you're his
stand-in then why didn't you reply?

I was already wondering why you took the time to reply to Daniel's patch
(which went into drm-misc-next, so queued for 4.13), but didn't reply
at all to my patch (which affects 4.12, so arguably has higher priority).

I'd dispute that the issue at hand requires specific domain knowlege,
it's a trivial dereference of a pointer before it's set.


> Alex should
> be back by the end of the week, so no need for a rush like that.

End of the week means the patch would miss another rc cycle, and the
DRM subsystem is getting enough criticism for causing regressions as
it is, isn't it? :-(

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-23 Thread Lukas Wunner
On Mon, May 22, 2017 at 09:24:34PM +0200, Daniel Vetter wrote:
> On Thu, May 18, 2017 at 09:33:44PM +0200, Lukas Wunner wrote:
> > Nicolai Stange reports the following oops which is caused by
> > dereferencing rdev->pdev before it's subsequently set by
> > radeon_device_init().  Fix it.
> > 
> >   BUG: unable to handle kernel NULL pointer dereference at 07cb
> >   IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
> >   PGD 0
> >   P4D 0
> > 
> >   Oops:  [#1] SMP
> >   Modules linked in: amdkfd amd_iommu_v2 i915(+) radeon(+) i2c_algo_bit 
> > drm_kms_helper ttm e1000e drm sdhci_pci sdhci_acpi ptp sdhci crc32c_intel 
> > serio_raw mmc_core pps_core video i2c_hid hid_plantronics
> >   CPU: 4 PID: 389 Comm: systemd-udevd Not tainted 4.12.0-rc1-next-20170515+ 
> > #1
> >   Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
> >   task: 97d62c8f task.stack: b96f01478000
> >   RIP: 0010:radeon_driver_load_kms+0xeb/0x230 [radeon]
> >   RSP: 0018:b96f0147b9d0 EFLAGS: 00010246
> >   RAX:  RBX: 97d620085000 RCX: 00610037
> >   RDX:  RSI: 002b RDI: 
> >   RBP: b96f0147b9e8 R08: 0002 R09: b96f0147b924
> >   R10:  R11: 97d62edd2ec0 R12: 97d628d5c000
> >   R13: 00610037 R14: c0698280 R15: 
> >   FS:  7f496363d8c0() GS:97d62eb0() 
> > knlGS:
> >   CS:  0010 DS:  ES:  CR0: 80050033
> >   CR2: 07cb CR3: 00022c14c000 CR4: 001406e0
> >   Call Trace:
> >drm_dev_register+0x146/0x1d0 [drm]
> >drm_get_pci_dev+0x9a/0x180 [drm]
> >radeon_pci_probe+0xb8/0xe0 [radeon]
> >local_pci_probe+0x45/0xa0
> >pci_device_probe+0x14f/0x1a0
> >driver_probe_device+0x29c/0x450
> >__driver_attach+0xdf/0xf0
> >? driver_probe_device+0x450/0x450
> >bus_for_each_dev+0x6c/0xc0
> >driver_attach+0x1e/0x20
> >bus_add_driver+0x170/0x270
> >driver_register+0x60/0xe0
> >? 0xc0508000
> >__pci_register_driver+0x4c/0x50
> >drm_pci_init+0xeb/0x100 [drm]
> >? vga_switcheroo_register_handler+0x6a/0x90
> >? 0xc0508000
> >radeon_init+0x98/0xb6 [radeon]
> >do_one_initcall+0x52/0x1a0
> >? __vunmap+0x81/0xb0
> >? kmem_cache_alloc_trace+0x159/0x1b0
> >? do_init_module+0x27/0x1f8
> >do_init_module+0x5f/0x1f8
> >load_module+0x27ce/0x2be0
> >SYSC_finit_module+0xdf/0x110
> >? SYSC_finit_module+0xdf/0x110
> >SyS_finit_module+0xe/0x10
> >do_syscall_64+0x67/0x150
> >entry_SYSCALL64_slow_path+0x25/0x25
> >   RIP: 0033:0x7f4962295679
> >   RSP: 002b:7ffdd8c4f878 EFLAGS: 0246 ORIG_RAX: 0139
> >   RAX: ffda RBX: 55c014ed8200 RCX: 7f4962295679
> >   RDX:  RSI: 7f4962dd19c5 RDI: 0010
> >   RBP: 7f4962dd19c5 R08:  R09: 7ffdd8c4f990
> >   R10: 0010 R11: 0246 R12: 
> >   R13: 55c014ed81a0 R14: 0002 R15: 55c0149d1fca
> >   Code: 5d 5d c3 8b 05 a7 05 14 00 49 81 cd 00 00 08 00 85 c0 74 a3 e8 e7 
> > c0 0e 00 84 c0 74 9a 41 f7 c5 00 00 02 00 75 91 49 8b 44 24 10 <0f> b6 90 
> > cb 07 00 00 f6 c2 20 74 1e e9 7b ff ff ff 48 8b 40 38
> >   RIP: radeon_driver_load_kms+0xeb/0x230 [radeon] RSP: b96f0147b9d0
> >   CR2: 07cb
> >   ---[ end trace 89cc4ba7e569c65c ]---
> > 
> > Reported-by: Nicolai Stange 
> > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
> > vga_switcheroo")
> > Signed-off-by: Lukas Wunner 
> > ---
> > 
> > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > needs to be fixed, so sending out with a proper commit message now.
> > The bug was only introduced to radeon, not amdgpu.
> > 
> > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> 
> Just noticed that this has landed already in drm-misc-fixes, without any
> r-b or at least an ack from radeon driver folks. That's breaking the
> drm-misc rules, we need at least an ack for small drivers (which ra

Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-23 Thread Lukas Wunner
On Mon, May 22, 2017 at 03:35:48PM -0400, Sean Paul wrote:
> On Mon, May 22, 2017 at 04:04:07PM +0200, Lukas Wunner wrote:
> > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> > > On Thu, May 18 2017, Lukas Wunner wrote:
> > [snip]
> > > > Reported-by: Nicolai Stange 
> > > > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
> > > > vga_switcheroo")
> > > > Signed-off-by: Lukas Wunner 
> > > > ---
> > > >
> > > > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > > > needs to be fixed, so sending out with a proper commit message now.
> > > > The bug was only introduced to radeon, not amdgpu.
> > > 
> > > Tested-by: Nicolai Stange 
> > > 
> > > Thanks for the quick fix!
> > >
> > > > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > > > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > > > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > > > take it yourself?  Whichever you prefer.  Thanks & sorry for the 
> > > > breakage!
> > 
> > I've learned this morning that Alex is on vacation.  I've pushed
> > the patch to drm-misc-fixes so that the issue is fixed in 4.12-rc3.
> > 
> > @Sean Paul: I've fast-forwarded to 4.12-rc2 before pushing, please
> > shout if I've done anything wrong.  First time I'm doing this.
> 
> No shouting, but a heads-up on IRC is probably warranted for both pushing a
> patch without R-b and fast-forwarding one of the branches.

Thanks, noted.  I'm not paid for work on the DRM subsystem, so I have to
do this during breaks at $DAYJOB where I have no access to IRC, but I will
ask via e-mail in the future before going out on a limb.

Not being able to dedicate my full attention to this all the time is also
the reason why it's hard for me to get the timing perfect:  I had already
submitted a fix before you sent out your -rc2 pull and I would have hated
missing another rc cycle, yet wasn't sure when exactly you were going to
send out your -rc3 pull this week and whether I would be able to carve out
enough time to push the patch before that date without hurriedly making
major mistakes.

Kind regards,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-22 Thread Lukas Wunner
On Tue, May 23, 2017 at 12:09:49PM +0900, Michel Dänzer wrote:
> On 22/05/17 11:04 PM, Lukas Wunner wrote:
> > On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> >> On Thu, May 18 2017, Lukas Wunner wrote:
> > [snip]
> >>> Reported-by: Nicolai Stange 
> >>> Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
> >>> vga_switcheroo")
> >>> Signed-off-by: Lukas Wunner 
> >>> ---
> >>>
> >>> Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> >>> needs to be fixed, so sending out with a proper commit message now.
> >>> The bug was only introduced to radeon, not amdgpu.
> >>
> >> Tested-by: Nicolai Stange 
> >>
> >> Thanks for the quick fix!
> >>
> >>> @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> >>> land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> >>> I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> >>> take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!
> > 
> > I've learned this morning that Alex is on vacation.
> 
> Christian König is standing in for Alex.

By his own account, he already has "all hands full replacing him [Alex]",
explicitly asked Daniel to merge an amdgpu patch through drm-misc-next for
this reason and lacks permission to update branches in Alex' repo on fdo:

"One lesson learned from the past week is that Alex needs to stop using
his personal repository on fdo.
We were asked a couple of times if I couldn't update a branch there from 
different directions, which we obviously can't do."

https://lists.freedesktop.org/archives/dri-devel/2017-May/142376.html
https://lists.freedesktop.org/archives/dri-devel/2017-May/142380.html


> > I've pushed the patch to drm-misc-fixes so that the issue is fixed in
> > 4.12-rc3.
> 
> I don't think there was any particular need to bypass the normal radeon
> tree for this. There was plenty of time for the fix to get into 4.12
> final, even after Alex is back.

Well, it wouldn't be nice towards users affected by the same issue
who may waste time with bisecting to just sit on a fix twiddling thumbs.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-22 Thread Lukas Wunner
On Sun, May 21, 2017 at 09:31:09AM +0200, Nicolai Stange wrote:
> On Thu, May 18 2017, Lukas Wunner wrote:
[snip]
> > Reported-by: Nicolai Stange 
> > Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
> > vga_switcheroo")
> > Signed-off-by: Lukas Wunner 
> > ---
> >
> > Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
> > needs to be fixed, so sending out with a proper commit message now.
> > The bug was only introduced to radeon, not amdgpu.
> 
> Tested-by: Nicolai Stange 
> 
> Thanks for the quick fix!
>
> > @Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
> > land before -rc3 because Sean Paul has already sent out the -rc2 pull.
> > I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
> > take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!

I've learned this morning that Alex is on vacation.  I've pushed
the patch to drm-misc-fixes so that the issue is fixed in 4.12-rc3.

@Sean Paul: I've fast-forwarded to 4.12-rc2 before pushing, please
shout if I've done anything wrong.  First time I'm doing this.

Thanks,

Lukas

> >
> >  drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
> > b/drivers/gpu/drm/radeon/radeon_kms.c
> > index 6a68d440bc44..d0ad03674250 100644
> > --- a/drivers/gpu/drm/radeon/radeon_kms.c
> > +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> > @@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, 
> > unsigned long flags)
> > if ((radeon_runtime_pm != 0) &&
> > radeon_has_atpx() &&
> > ((flags & RADEON_IS_IGP) == 0) &&
> > -   !pci_is_thunderbolt_attached(rdev->pdev))
> > +   !pci_is_thunderbolt_attached(dev->pdev))
> > flags |= RADEON_IS_PX;
> >  
> > /* radeon_device_init should report only fatal error
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/radeon: Fix oops upon driver load on PowerXpress laptops

2017-05-18 Thread Lukas Wunner
Nicolai Stange reports the following oops which is caused by
dereferencing rdev->pdev before it's subsequently set by
radeon_device_init().  Fix it.

  BUG: unable to handle kernel NULL pointer dereference at 07cb
  IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
  PGD 0
  P4D 0

  Oops:  [#1] SMP
  Modules linked in: amdkfd amd_iommu_v2 i915(+) radeon(+) i2c_algo_bit 
drm_kms_helper ttm e1000e drm sdhci_pci sdhci_acpi ptp sdhci crc32c_intel 
serio_raw mmc_core pps_core video i2c_hid hid_plantronics
  CPU: 4 PID: 389 Comm: systemd-udevd Not tainted 4.12.0-rc1-next-20170515+ #1
  Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
  task: 97d62c8f task.stack: b96f01478000
  RIP: 0010:radeon_driver_load_kms+0xeb/0x230 [radeon]
  RSP: 0018:b96f0147b9d0 EFLAGS: 00010246
  RAX:  RBX: 97d620085000 RCX: 00610037
  RDX:  RSI: 002b RDI: 
  RBP: b96f0147b9e8 R08: 0002 R09: b96f0147b924
  R10:  R11: 97d62edd2ec0 R12: 97d628d5c000
  R13: 00610037 R14: c0698280 R15: 
  FS:  7f496363d8c0() GS:97d62eb0() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 07cb CR3: 00022c14c000 CR4: 001406e0
  Call Trace:
   drm_dev_register+0x146/0x1d0 [drm]
   drm_get_pci_dev+0x9a/0x180 [drm]
   radeon_pci_probe+0xb8/0xe0 [radeon]
   local_pci_probe+0x45/0xa0
   pci_device_probe+0x14f/0x1a0
   driver_probe_device+0x29c/0x450
   __driver_attach+0xdf/0xf0
   ? driver_probe_device+0x450/0x450
   bus_for_each_dev+0x6c/0xc0
   driver_attach+0x1e/0x20
   bus_add_driver+0x170/0x270
   driver_register+0x60/0xe0
   ? 0xc0508000
   __pci_register_driver+0x4c/0x50
   drm_pci_init+0xeb/0x100 [drm]
   ? vga_switcheroo_register_handler+0x6a/0x90
   ? 0xc0508000
   radeon_init+0x98/0xb6 [radeon]
   do_one_initcall+0x52/0x1a0
   ? __vunmap+0x81/0xb0
   ? kmem_cache_alloc_trace+0x159/0x1b0
   ? do_init_module+0x27/0x1f8
   do_init_module+0x5f/0x1f8
   load_module+0x27ce/0x2be0
   SYSC_finit_module+0xdf/0x110
   ? SYSC_finit_module+0xdf/0x110
   SyS_finit_module+0xe/0x10
   do_syscall_64+0x67/0x150
   entry_SYSCALL64_slow_path+0x25/0x25
  RIP: 0033:0x7f4962295679
  RSP: 002b:7ffdd8c4f878 EFLAGS: 0246 ORIG_RAX: 0139
  RAX: ffda RBX: 55c014ed8200 RCX: 7f4962295679
  RDX:  RSI: 7f4962dd19c5 RDI: 0010
  RBP: 7f4962dd19c5 R08:  R09: 7ffdd8c4f990
  R10: 0010 R11: 0246 R12: 
  R13: 55c014ed81a0 R14: 0002 R15: 55c0149d1fca
  Code: 5d 5d c3 8b 05 a7 05 14 00 49 81 cd 00 00 08 00 85 c0 74 a3 e8 e7 c0 0e 
00 84 c0 74 9a 41 f7 c5 00 00 02 00 75 91 49 8b 44 24 10 <0f> b6 90 cb 07 00 00 
f6 c2 20 74 1e e9 7b ff ff ff 48 8b 40 38
  RIP: radeon_driver_load_kms+0xeb/0x230 [radeon] RSP: b96f0147b9d0
  CR2: 07cb
  ---[ end trace 89cc4ba7e569c65c ]---

Reported-by: Nicolai Stange 
Fixes: 7ffb0ce31cf9 ("drm/radeon: Don't register Thunderbolt eGPU with 
vga_switcheroo")
Signed-off-by: Lukas Wunner 
---

Awaiting a Tested-by: from Nicolai, but it's clear this is a bug and
needs to be fixed, so sending out with a proper commit message now.
The bug was only introduced to radeon, not amdgpu.

@Alex Deucher: I could push this to drm-misc-fixes but then it wouldn't
land before -rc3 because Sean Paul has already sent out the -rc2 pull.
I notice you haven't sent out a pull for -rc2 yet, so maybe you want to
take it yourself?  Whichever you prefer.  Thanks & sorry for the breakage!

 drivers/gpu/drm/radeon/radeon_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 6a68d440bc44..d0ad03674250 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
if ((radeon_runtime_pm != 0) &&
radeon_has_atpx() &&
((flags & RADEON_IS_IGP) == 0) &&
-   !pci_is_thunderbolt_attached(rdev->pdev))
+   !pci_is_thunderbolt_attached(dev->pdev))
flags |= RADEON_IS_PX;
 
/* radeon_device_init should report only fatal error
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [REGRESSION] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

2017-05-18 Thread Lukas Wunner
On Wed, May 17, 2017 at 11:08:23PM +0200, Nicolai Stange wrote:
> I'm experiencing a boot failure on next-20170515:
> 
>   BUG: unable to handle kernel NULL pointer dereference at 07cb
>   IP: radeon_driver_load_kms+0xeb/0x230 [radeon]
[snip]
> Bisection lead to commit 7ffb0ce31cf9 ("drm/radeon: Don't register
> Thunderbolt eGPU with vga_switcheroo"). Reverting this commit on top of
> next-20170515 fixes the issue for me.
> 
> My box is a Dell laptop which most certainly hasn't got any Thunderbolt
> circuitry.

Thanks a lot Nicolai for reporting this, my apologies for the breakage
which turns out to be a dereference of rdev->pdev before it's set. :-(

14:   e8 e7 c0 0e 00  callq  0xec100 ; radeon_has_atpx()
19:   84 c0   test   %al,%al
1b:   74 9a   je 0xffb7
1d:   41 f7 c5 00 00 02 00test   $0x2,%r13d  ; flags & RADEON_IS_IGP
24:   75 91   jne0xffb7
26:   49 8b 44 24 10  mov0x10(%r12),%rax ; rax = rdev
2b:*  0f b6 90 cb 07 00 00movzbl 0x7cb(%rax),%edx <-- trapping 
instruction

Could you verify if the patch below fixes the issue for you?

Thanks!

Lukas

-- >8 --

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index e3e7cb1..4761f27 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -116,7 +116,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
if ((radeon_runtime_pm != 0) &&
radeon_has_atpx() &&
((flags & RADEON_IS_IGP) == 0) &&
-   !pci_is_thunderbolt_attached(rdev->pdev))
+   !pci_is_thunderbolt_attached(dev->pdev))
flags |= RADEON_IS_PX;
 
/* radeon_device_init should report only fatal error
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/5] Thunderbolt GPU fixes

2017-03-10 Thread Lukas Wunner
On Thu, Mar 09, 2017 at 04:03:47PM +0100, Daniel Vetter wrote:
> On Fri, Feb 24, 2017 at 08:19:45PM +0100, Lukas Wunner wrote:
> > Fix Thunderbolt-related issues in apple-gmux and vga_switcheroo:
> > 
> > Patch [1/5] ("Recognize Thunderbolt devices") has already been subjected
> > to a fair amount of scrutiny over at linux-pci@, I've submitted it 5 times
> > total since May 2016.  With luck it may be in ack-able shape now.
> > 
> > Patch [2/5] amends apple-gmux to handle combined DP/Thunderbolt ports
> > properly on newer MacBook Pros.
> > 
> > Patches [3/5] to [5/5] avoid registering external Thunderbolt GPUs with
> > vga_switcheroo:  Dave Airlie designed vga_switcheroo to register GPUs
> > unconditionally.  So if a desktop box has multiple GPUs, vga_switcheroo
> > will see more than one discrete GPU but that's not a problem because on
> > desktop boxes no handler is registered and thus vga_switcheroo_enable()
> > is never called.  Hybrid graphics laptops on the other hand do register
> > a handler, but are assumed to never register more than one discrete GPU.
> > However once a Thunderbolt eGPU is attached to a hybrid graphics laptop,
> > that assumption is no longer true and things go south when vga_switcheroo
> > runtime suspends the external discrete GPU and then calls the handler to
> > cut power to the internal discrete GPU.  The driver for the internal GPU
> > will sit there puzzled and typically cause a lockup.
[snip]
> > I've pushed the present series to GitHub in case anyone prefers reviewing
> > it in a GUI:
> > https://github.com/l1k/linux/commits/thunderbolt_gpu_v1
> 
> For merging, should I smash this all into drm-misc? The only thing outside
> is the apple-gmux driver ...

Merging through drm-misc would be lovely.  However I've prepared a v2 of
patch [1/5] to address Bjorn's comments (amended the commit message and a
code comment).  I'll respin the series this evening and include the acks
I've collected so far.

@Darren & Andy:
Please ack patch [5/5] of this series, barring any objections.

I'll move the apple-gmux patch to the end of the series, so merging that
one can be postponed until Darren and Andy find the time to look at it.

Thanks!

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/5] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

2017-03-09 Thread Lukas Wunner
On Wed, Mar 08, 2017 at 03:34:47PM -0500, Alex Deucher wrote:
> On Wed, Mar 8, 2017 at 12:01 AM, Lukas Wunner  wrote:
> > On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> >> On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner  wrote:
> >> > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> >> > powered off by the platform, so there's no point in registering it with
> >> > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> >> > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> >> > a lockup.
> >>
> >> I'm not necessarily opposed to this, but I'd prefer something more
> >> generic.  E.g., what happens if someone uses another dGPU in a docking
> >> station or some other sort of PCIe bridge?
> >
> > Such a dGPU is only relevant to vga_switcheroo if it can either drive
> > the panel or be powered off by the platform.  Does such a product exist?
> >
> > I've never heard of one, and think that's because such a product doesn't
> > make sense:  A docking staton is not power-constrained, it's stationary
> > and connected to a wall outlet, so there's no need to power the dGPU off
> > when it's not in use.
> >
> > And at a docking station you're usually connected to a larger monitor,
> > so having the dGPU drive the laptop's smaller panel isn't necessary either.
> > In the rare cases where there's no larger monitor, you just use the dGPU
> > for render offloading, just as you would for contemporary ATPX laptops.
> >
> > OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
> > to an ATPX machine causes failure, as explained in the commit message.
> 
> Whether you introduce additional dGPUs via thunderbolt or some
> proprietary interface or a pci bridge in a docking station the result
> is still the same.  You end up with the potential scenario you
> described in this commit message that there may be confusion as to
> which GPU is controlled via ACPI power controls.
> 
> I talked to the windows team.  They special case thunderbolt as well,

Very interesting, thanks for reaching out to them.  I've already heard
that Windows 10 supports Thunderbolt eGPUs, but only with Thunderbolt 3.
I think it's desirable that we achieve feature parity with them (without
the unnecessary restriction to Thunderbolt 3).  Older Windows versions as
well as macOS apparently require all sorts of awful hacks for eGPUs.


> so the patch is probably fine.

Is that an ack or are there any remaining concerns?


> For pcie ports in a docking station, I
> suspect there may not actually be any docking stations supported by PX
> laptops where this could be an issue.

If/when such products do become available, they can probably be identified
via specific ACPI methods or if all else fails, DMI quirks.


> For non-thunderbolt detachable
> graphics there is a new ATIF function to query the bus number of the
> supported device.  I'll send a patch out for that in a bit.

Great, thanks.


> Thinking about this more, long term we should probably only register
> with vga_switcheroo if we support display muxing which is a legacy
> feature these days.  Most systems are mux-less so we just need to
> handle dgpu power control via runtime pm.

Right now registering with vga_switcheroo is still necessary even for
muxless systems primarily because DRM drivers call
vga_switcheroo_set_dynamic_switch() to pause the HDA driver and update
the power state stored internally by vga_switcheroo.

I plan to address the former by reworking vga_switcheroo audio handling
using functional device dependencies (a new PM mechanism that appeared
in v4.10, see documentation in aad800403a87), and I think the latter will
then become obsolete as well.  I've got a concept in my head and am
pumped to code it up, just a little time-constrained at the moment. :-)

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/5] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

2017-03-08 Thread Lukas Wunner
On Wed, Mar 08, 2017 at 11:46:33AM +0100, Peter Wu wrote:
> On Wed, Mar 08, 2017 at 06:01:54AM +0100, Lukas Wunner wrote:
> > On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> > > On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner  wrote:
> > > > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> > > > powered off by the platform, so there's no point in registering it with
> > > > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> > > > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> > > > a lockup.
> > > 
> > > I think on AMD platforms
> > > at least we should be able to determine what devices are the
> > > switcheroo devices based on information in the ATIF and ATPX ACPI
> > > methods.  In that case, we can be explicit in which devices we
> > > register with vga_switcheroo.
> > 
> > Is there public documentation on these methods somewhere?
> 
> The ACPI interface is documented in
> drivers/gpu/drm/amd/include/amd_acpi.h while
> drivers/gpu/drm/amd/amdgpu/amdgpu_atpx_handler.c contains some glue for
> ACPI and the amdgpu driver (similar code exists for radeon).

Ah, thanks Peter.

Unfortunately this method will not work on Macs.  So I guess on those
we're again dependent on deducing whether a dGPU is internal or external
by looking at the PCI hierarchy.

However on non-Macs it seems that ATIF_FUNCTION_GET_GRAPHICS_DEVICE_TYPES
should return the type for each built-in GPU.

@Alex:
How reliable is this, e.g. is it possible that vendors may have forgotten
to set these bits in the BIOS?  If we depend on ATIF to determine the
type of a dGPU and the information returned is incorrect, we risk not
registering a device when we actually should, thus causing a regression.

Also, could you explain which of these types should be registered with
vga_switcheroo and which shouldn't?  Likewise, which of these can be
powered down by the platform and should thus use the vga_switcheroo
power domain?

ATIF_PX_REMOVABLE_GRAPHICS_DEVICE
ATIF_XGP_PORT
ATIF_VGA_ENABLED_GRAPHICS_DEVICE
ATIF_XGP_PORT_IN_DOCK

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/5] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

2017-03-07 Thread Lukas Wunner
On Tue, Mar 07, 2017 at 03:30:30PM -0500, Alex Deucher wrote:
> On Fri, Feb 24, 2017 at 2:19 PM, Lukas Wunner  wrote:
> > An external Thunderbolt GPU can neither drive the laptop's panel nor be
> > powered off by the platform, so there's no point in registering it with
> > vga_switcheroo.  In fact, when the external GPU is runtime suspended,
> > vga_switcheroo will cut power to the internal discrete GPU, resulting in
> > a lockup.
> 
> I'm not necessarily opposed to this, but I'd prefer something more
> generic.  E.g., what happens if someone uses another dGPU in a docking
> station or some other sort of PCIe bridge?

Such a dGPU is only relevant to vga_switcheroo if it can either drive
the panel or be powered off by the platform.  Does such a product exist?

I've never heard of one, and think that's because such a product doesn't
make sense:  A docking staton is not power-constrained, it's stationary
and connected to a wall outlet, so there's no need to power the dGPU off
when it's not in use.

And at a docking station you're usually connected to a larger monitor,
so having the dGPU drive the laptop's smaller panel isn't necessary either.
In the rare cases where there's no larger monitor, you just use the dGPU
for render offloading, just as you would for contemporary ATPX laptops.

OTOH, dGPUs in Thunderbolt enclosures *do* exist and connecting them
to an ATPX machine causes failure, as explained in the commit message.


> I think on AMD platforms
> at least we should be able to determine what devices are the
> switcheroo devices based on information in the ATIF and ATPX ACPI
> methods.  In that case, we can be explicit in which devices we
> register with vga_switcheroo.

Is there public documentation on these methods somewhere?

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] PCI: Recognize Thunderbolt devices

2017-03-04 Thread Lukas Wunner
On Fri, Feb 24, 2017 at 04:17:24PM -0600, Bjorn Helgaas wrote:
> On Fri, Feb 24, 2017 at 08:19:45PM +0100, Lukas Wunner wrote:
> > Detect on probe whether a PCI device is part of a Thunderbolt controller.
[...]
> > * If an external Thunderbolt GPU is connected to a dual GPU laptop (Mac
> >   or not), that GPU is currently registered with vga_switcheroo even
> >   though it can neither drive the laptop's panel nor be powered off by
> >   the platform.  To vga_switcheroo it will appear as if two discrete
> >   GPUs are present.  As a result, when the external GPU is runtime
> >   suspended, vga_switcheroo will cut power to the internal discrete GPU
> >   which may not be runtime suspended at all at this moment.  The
> >   solution is to not register external GPUs with vga_switcheroo, which
> >   necessitates a way to recognize if they're on a Thunderbolt daisy
> >   chain.
> 
> If I understand correctly, vga_switcheroo manages two GPUs that have a
> single output: either there's a mux that connects one GPU or the other
> to the output, or one GPU is permanently connected to the output and
> the other does offline rendering.

There are two aspects to hybrid graphics, switching the panel between
GPUs and powering off the discrete GPU.  (Some laptops can also switch
external DP ports between GPUs and some, as you say, cannot switch the
panel and only use the discrete GPU for render offloading.)


> To this non-GPU person, it sounds like the important question is
> whether two GPUs are related in this way (either they feed the same
> mux, or there's some special offline rendering connection between
> them).  That sounds unrelated to the question of how the GPUs
> themselves are connected to the PCI hierarchy.

To the best of my knowledge there's no definite way to determine whether
two GPUs are connected to the same panel via a mux.

There is also no such thing as a special render offloading connection:
Frames are computed on a discrete GPU, then copied over PCIe into the
framebuffer of the integrated GPU.  Whether that discrete GPU is
on-board or externally connected is completely transparent and not
discernible other than by looking at the PCI hierarchy.

The same issue exists for HD Audio controllers integrated into many
discrete GPUs:  To the OS these look like separate PCI functions
and in sound/pci/hda/hda_intel.c:get_bound_vga() we leverage the fact
that the GPU is always function 0 and the HD Audio is function 1 to
discover the GPU corresponding to a particular HD Audio controller.
So again that relationship is deduced from the PCI hierarchy.


> From a pure PCI perspective, I assume it would be conceivable to have
> two Thunderbolt-connected GPUs feeding into a mux.  Or to have a GPU
> (unrelated to the mux) in a non-Thunderbolt plugin slot or connected
> externally via a non-Thunderbolt switch and an iPass cable.

Technically such products would be possible, but I believe they don't
exist.  (Unlike Thunderbolt eGPUs, which have been on the market for
a few years.)

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/5] PCI: Recognize Thunderbolt devices

2017-02-24 Thread Lukas Wunner
On Fri, Feb 24, 2017 at 04:17:24PM -0600, Bjorn Helgaas wrote:
> On Fri, Feb 24, 2017 at 08:19:45PM +0100, Lukas Wunner wrote:
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -358,6 +358,7 @@ struct pci_dev {
> > unsigned intis_virtfn:1;
> > unsigned intreset_fn:1;
> > unsigned intis_hotplug_bridge:1;
> > +   unsigned intis_thunderbolt:1; /* Thunderbolt controller */
> 
> I'm not really keen on having this in the PCI core because the core
> doesn't need this or even know what it means.
> 
> pci_find_next_ext_capability() is available to drivers, and if
> Thunderbolt-connectedness is useful information to apple-gmux or GPU
> drivers, it's fine with me if you want to use it there.  I just don't
> see the benefit to having it in the core.

The above contradicts your statement 3 days earlier:

"Assuming we need it, having it in struct pci_dev is fine.
 There's no point in looking up the VSEC capability more than once."
(http://www.spinics.net/lists/linux-pci/msg58532.html)

Please explain.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 5/5] drm/amdgpu: Don't register Thunderbolt eGPU with vga_switcheroo

2017-02-24 Thread Lukas Wunner
An external Thunderbolt GPU can neither drive the laptop's panel nor be
powered off by the platform, so there's no point in registering it with
vga_switcheroo.  In fact, when the external GPU is runtime suspended,
vga_switcheroo will cut power to the internal discrete GPU, resulting in
a lockup.

Cc: Alex Deucher 
Signed-off-by: Lukas Wunner 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 944ba0d3874a..99d73504e56d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1759,7 +1759,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
runtime = true;
if (amdgpu_device_is_px(ddev))
runtime = true;
-   vga_switcheroo_register_client(adev->pdev, &amdgpu_switcheroo_ops, 
runtime);
+   if (!pci_is_thunderbolt_attached(adev->pdev))
+   vga_switcheroo_register_client(adev->pdev,
+  &amdgpu_switcheroo_ops, runtime);
if (runtime)
vga_switcheroo_init_domain_pm_ops(adev->dev, 
&adev->vga_pm_domain);
 
@@ -1922,7 +1924,8 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
amdgpu_atombios_fini(adev);
kfree(adev->bios);
adev->bios = NULL;
-   vga_switcheroo_unregister_client(adev->pdev);
+   if (!pci_is_thunderbolt_attached(adev->pdev))
+   vga_switcheroo_unregister_client(adev->pdev);
if (adev->flags & AMD_IS_PX)
vga_switcheroo_fini_domain_pm_ops(adev->dev);
vga_client_register(adev->pdev, NULL, NULL, NULL);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 61d94c745672..2f3b236721c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -103,7 +103,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
amdgpu_has_atpx() &&
(amdgpu_is_atpx_hybrid() ||
 amdgpu_has_atpx_dgpu_power_cntl()) &&
-   ((flags & AMD_IS_APU) == 0))
+   ((flags & AMD_IS_APU) == 0) &&
+   !pci_is_thunderbolt_attached(dev->pdev))
flags |= AMD_IS_PX;
 
/* amdgpu_device_init should report only fatal error
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 4/5] drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo

2017-02-24 Thread Lukas Wunner
An external Thunderbolt GPU can neither drive the laptop's panel nor be
powered off by the platform, so there's no point in registering it with
vga_switcheroo.  In fact, when the external GPU is runtime suspended,
vga_switcheroo will cut power to the internal discrete GPU, resulting in
a lockup.

Cc: Alex Deucher 
Signed-off-by: Lukas Wunner 
---
 drivers/gpu/drm/radeon/radeon_device.c | 7 +--
 drivers/gpu/drm/radeon/radeon_kms.c| 3 ++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 4b0c388be3f5..27be17f0b227 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1471,7 +1471,9 @@ int radeon_device_init(struct radeon_device *rdev,
 
if (rdev->flags & RADEON_IS_PX)
runtime = true;
-   vga_switcheroo_register_client(rdev->pdev, &radeon_switcheroo_ops, 
runtime);
+   if (!pci_is_thunderbolt_attached(rdev->pdev))
+   vga_switcheroo_register_client(rdev->pdev,
+  &radeon_switcheroo_ops, runtime);
if (runtime)
vga_switcheroo_init_domain_pm_ops(rdev->dev, 
&rdev->vga_pm_domain);
 
@@ -1564,7 +1566,8 @@ void radeon_device_fini(struct radeon_device *rdev)
/* evict vram memory */
radeon_bo_evict_vram(rdev);
radeon_fini(rdev);
-   vga_switcheroo_unregister_client(rdev->pdev);
+   if (!pci_is_thunderbolt_attached(rdev->pdev))
+   vga_switcheroo_unregister_client(rdev->pdev);
if (rdev->flags & RADEON_IS_PX)
vga_switcheroo_fini_domain_pm_ops(rdev->dev);
vga_client_register(rdev->pdev, NULL, NULL, NULL);
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
b/drivers/gpu/drm/radeon/radeon_kms.c
index 56f35c06742c..e95ceec1c97a 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -115,7 +115,8 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned 
long flags)
 
if ((radeon_runtime_pm != 0) &&
radeon_has_atpx() &&
-   ((flags & RADEON_IS_IGP) == 0))
+   ((flags & RADEON_IS_IGP) == 0) &&
+   !pci_is_thunderbolt_attached(rdev->pdev))
flags |= RADEON_IS_PX;
 
/* radeon_device_init should report only fatal error
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/5] PCI: Recognize Thunderbolt devices

2017-02-24 Thread Lukas Wunner
Detect on probe whether a PCI device is part of a Thunderbolt controller.

Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
on such devices.  Detect presence of this VSEC and cache it in a newly
added is_thunderbolt bit in struct pci_dev.  Add a helper to check
whether a given PCI device is situated on a Thunderbolt daisy chain.

The necessity arises from the following:

* To power off Thunderbolt controllers on Macs even if their BIOS is
  older than 2015, their PCIe ports need to be whitelisted for runtime
  PM.  For this we need a way to recognize them.

* Dual GPU MacBook Pros introduced 2011+ can no longer switch external
  DisplayPort ports between GPUs.  (They're no longer just used for DP
  but have become combined DP/Thunderbolt ports.)  The driver to switch
  the ports, drivers/platform/x86/apple-gmux.c, needs to detect presence
  of a Thunderbolt controller and, if found, keep external ports
  permanently switched to the discrete GPU.

* If an external Thunderbolt GPU is connected to a dual GPU laptop (Mac
  or not), that GPU is currently registered with vga_switcheroo even
  though it can neither drive the laptop's panel nor be powered off by
  the platform.  To vga_switcheroo it will appear as if two discrete
  GPUs are present.  As a result, when the external GPU is runtime
  suspended, vga_switcheroo will cut power to the internal discrete GPU
  which may not be runtime suspended at all at this moment.  The
  solution is to not register external GPUs with vga_switcheroo, which
  necessitates a way to recognize if they're on a Thunderbolt daisy
  chain.

Cc: Andreas Noever 
Cc: Michael Jamet 
Cc: Tomas Winkler 
Cc: Amir Levy 
Signed-off-by: Lukas Wunner 
---
 drivers/pci/pci.h   |  2 ++
 drivers/pci/probe.c | 21 +
 include/linux/pci.h | 23 +++
 3 files changed, 46 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index cb17db242f30..45c2b8144911 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -3,6 +3,8 @@
 
 #define PCI_FIND_CAP_TTL   48
 
+#define PCI_VSEC_ID_INTEL_TBT  0x1234  /* Thunderbolt */
+
 extern const unsigned char pcie_link_speed[];
 
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 204960e70333..7963ecc6d85f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1208,6 +1208,24 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
pdev->is_hotplug_bridge = 1;
 }
 
+static void set_pcie_thunderbolt(struct pci_dev *dev)
+{
+   int vsec = 0;
+   u32 header;
+
+   while ((vsec = pci_find_next_ext_capability(dev, vsec,
+   PCI_EXT_CAP_ID_VNDR))) {
+   pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header);
+
+   /* Is the device part of a Thunderbolt controller? */
+   if (dev->vendor == PCI_VENDOR_ID_INTEL &&
+   PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT) {
+   dev->is_thunderbolt = 1;
+   return;
+   }
+   }
+}
+
 /**
  * pci_ext_cfg_is_aliased - is ext config space just an alias of std config?
  * @dev: PCI device
@@ -1360,6 +1378,9 @@ int pci_setup_device(struct pci_dev *dev)
/* need to have dev->class ready */
dev->cfg_size = pci_cfg_space_size(dev);
 
+   /* need to have dev->cfg_size ready */
+   set_pcie_thunderbolt(dev);
+
/* "Unknown power state" */
dev->current_state = PCI_UNKNOWN;
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e2d1a124216a..36dfcfd946f4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -358,6 +358,7 @@ struct pci_dev {
unsigned intis_virtfn:1;
unsigned intreset_fn:1;
unsigned intis_hotplug_bridge:1;
+   unsigned intis_thunderbolt:1; /* Thunderbolt controller */
unsigned int__aer_firmware_first_valid:1;
unsigned int__aer_firmware_first:1;
unsigned intbroken_intx_masking:1;
@@ -2171,6 +2172,28 @@ static inline bool pci_ari_enabled(struct pci_bus *bus)
return bus->self && bus->self->ari_enabled;
 }
 
+/**
+ * pci_is_thunderbolt_attached - whether device is on a Thunderbolt daisy chain
+ * @pdev: PCI device to check
+ *
+ * Walk upwards from @pdev and check for each encountered bridge if it's
+ * part of a Thunderbolt controller.  Reaching the host bridge means @pdev
+ * is soldered to the mainboard.
+ */
+static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
+{
+   struct pci_dev *parent = pdev;
+
+   if (pdev->is_thunderbolt)
+   return true;
+
+   while ((parent = pci_upstream_bridge(parent)))
+   if (parent->is_thunderbolt)
+   return true;
+
+   

[PATCH 0/5] Thunderbolt GPU fixes

2017-02-24 Thread Lukas Wunner
Fix Thunderbolt-related issues in apple-gmux and vga_switcheroo:

Patch [1/5] ("Recognize Thunderbolt devices") has already been subjected
to a fair amount of scrutiny over at linux-pci@, I've submitted it 5 times
total since May 2016.  With luck it may be in ack-able shape now.

Patch [2/5] amends apple-gmux to handle combined DP/Thunderbolt ports
properly on newer MacBook Pros.

Patches [3/5] to [5/5] avoid registering external Thunderbolt GPUs with
vga_switcheroo:  Dave Airlie designed vga_switcheroo to register GPUs
unconditionally.  So if a desktop box has multiple GPUs, vga_switcheroo
will see more than one discrete GPU but that's not a problem because on
desktop boxes no handler is registered and thus vga_switcheroo_enable()
is never called.  Hybrid graphics laptops on the other hand do register
a handler, but are assumed to never register more than one discrete GPU.
However once a Thunderbolt eGPU is attached to a hybrid graphics laptop,
that assumption is no longer true and things go south when vga_switcheroo
runtime suspends the external discrete GPU and then calls the handler to
cut power to the internal discrete GPU.  The driver for the internal GPU
will sit there puzzled and typically cause a lockup.

This series is just a first step towards proper handling of eGPUs, another
will be support for surprise removal:  DRM drivers need to cease MMIO and
PCI config space access once a device is gone to avoid delaying device
teardown or accessing a newly attached replacement device.  Also, MMIO
reads to removed devices return "all ones", which results in an infinite
loop e.g. in nouveau's nvkm_nsec().

The question is how to recognize device removal.  One common method is to
read the vendor register with pci_device_is_present(), but this reports
a false positive if the device is present but in D3cold.  A better method
is to let the PCIe hotplug driver recognize and cache device removal.
Keith Busch has developed patches which do exactly that, they're now at
v6 and fully reviewed by Christoph Hellwig but alas were not included in
the 4.11 PCI pull for some reason:
http://www.spinics.net/lists/linux-pci/msg58123.html

I've pushed the present series to GitHub in case anyone prefers reviewing
it in a GUI:
https://github.com/l1k/linux/commits/thunderbolt_gpu_v1

Thanks,

Lukas


Lukas Wunner (5):
  PCI: Recognize Thunderbolt devices
  apple-gmux: Don't switch external DP port on 2011+ MacBook Pros
  drm/nouveau: Don't register Thunderbolt eGPU with vga_switcheroo
  drm/radeon: Don't register Thunderbolt eGPU with vga_switcheroo
  drm/amdgpu: Don't register Thunderbolt eGPU with vga_switcheroo

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  3 ++-
 drivers/gpu/drm/nouveau/nouveau_vga.c  | 10 +-
 drivers/gpu/drm/radeon/radeon_device.c |  7 +--
 drivers/gpu/drm/radeon/radeon_kms.c|  3 ++-
 drivers/pci/pci.h  |  2 ++
 drivers/pci/probe.c| 21 
 drivers/platform/x86/apple-gmux.c  | 31 +-
 include/linux/pci.h| 23 ++
 9 files changed, 99 insertions(+), 8 deletions(-)

-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Using DC in amdgpu for upcoming GPU

2016-12-13 Thread Lukas Wunner
On Tue, Dec 13, 2016 at 10:03:58AM -0500, Cheng, Tony wrote:
> On 12/13/2016 4:40 AM, Lukas Wunner wrote:
> > If the Linux community contributes to DC, I guess those contributions
> > can generally be assumed to be GPLv2 licensed.  Yet a future version
> > of the macOS driver would incorporate those contributions in the same
> > binary as their closed source OS-specific portion.
> 
> I am struggling with that these comminty contributions to DC would be.
> 
> Us AMD developer has access to HW docs and designer and we are still
> spending 50% of our time figuring out why our HW doesn't work right.
> I can't image community doing much of this heavy lifting.

True, but past experience with radeon/amdgpu is that the community has
use cases that AMD developers don't specifically cater to, e.g. due to
lack of the required hardware or resource constraints.

E.g. Mario Kleiner has contributed lots of patches for proper vsync
handling which are needed for his neuroscience software.  I've
contributed DDC switching support for MacBook Pros to radeon.
Your driver becomes more useful, you get more customers, everyone wins.


> > Do I understand DAL3.jpg correctly that the macOS driver builds on top
> > of DAL Core?  I'm asking because the graphics drivers shipping with
> > macOS as well as on Apple's EFI Firmware Volume are closed source.
> 
> macOS currently ship with their own driver.  I can't really comment on what
> macOS do without getting into trouble.

The Intel Israel folks working on Thunderbolt are similarly between
the rock that is the community's expectation of openness and the hard
place that is Apple's secrecy.  So I sympathize with your situation,
kudos for trying to do the right thing.


> I guess we can nak all changes and "rewrite" our
> own version of clean up patch community want to see?

I don't think that would be workable honestly.

One way out of this conundrum might be to use a permissive license such
as BSD for DC.  Then whenever you merge a community patch, in addition
to informing the contributor thereof, send them a boilerplate one-liner
that community contributions are assumed to be under the same license
and if the contributor disagrees they should send a short notice to
have their contribution removed.

But IANAL.

Best regards,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] Using DC in amdgpu for upcoming GPU

2016-12-13 Thread Lukas Wunner
On Mon, Dec 12, 2016 at 09:52:08PM -0500, Cheng, Tony wrote:
> With DC the display hardware programming, resource optimization, power
> management and interaction with rest of system will be fully validated
> across multiple OSs.

Do I understand DAL3.jpg correctly that the macOS driver builds on top
of DAL Core?  I'm asking because the graphics drivers shipping with
macOS as well as on Apple's EFI Firmware Volume are closed source.

If the Linux community contributes to DC, I guess those contributions
can generally be assumed to be GPLv2 licensed.  Yet a future version
of the macOS driver would incorporate those contributions in the same
binary as their closed source OS-specific portion.

I don't quite see how that would be legal but maybe I'm missing
something.

Presumably the situation with the Windows driver is the same.

I guess you could maintain a separate branch sans community contributions
which would serve as a basis for closed source drivers, but not sure if
that is feasible given your resource constraints.

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: disable CRTCs before teardown

2016-09-26 Thread Lukas Wunner
On Sun, Sep 25, 2016 at 11:34:48PM +0300, Grazvydas Ignotas wrote:
> Some code called by drm_crtc_force_disable_all() wants to wait for all
> fences, so only do fence teardown after CRTCs are disabled.
> 
> Signed-off-by: Grazvydas Ignotas 

Fixes: 84b89bdcedf8 ("drm/amdgpu: Turn off CRTCs on driver unload")
Cc: sta...@vger.kernel.org # v4.8+

Alex, would it be possible to get this fix into 4.8 this week?

Thanks!

Lukas

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 99a15ca..1a1bc79 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1822,11 +1822,11 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  
>   DRM_INFO("amdgpu: finishing device.\n");
>   adev->shutdown = true;
> + drm_crtc_force_disable_all(adev->ddev);
>   /* evict vram memory */
>   amdgpu_bo_evict_vram(adev);
>   amdgpu_ib_pool_fini(adev);
>   amdgpu_fence_driver_fini(adev);
> - drm_crtc_force_disable_all(adev->ddev);
>   amdgpu_fbdev_fini(adev);
>   r = amdgpu_fini(adev);
>   kfree(adev->ip_block_status);
> -- 
> 2.7.4
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: disable CRTCs before teardown

2016-09-26 Thread Lukas Wunner
On Sun, Sep 25, 2016 at 11:34:48PM +0300, Grazvydas Ignotas wrote:
> Some code called by drm_crtc_force_disable_all() wants to wait for all
> fences, so only do fence teardown after CRTCs are disabled.

Ugh, how embarrassing, that was added by me.

Do you have a BUG splat (e.g. soft lockup) for this?  I'd be curious to
see exactly where things explode, would also be good to have that in
the commit message.

Thanks!

Lukas

> 
> Signed-off-by: Grazvydas Ignotas 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 99a15ca..1a1bc79 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1822,11 +1822,11 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  
>   DRM_INFO("amdgpu: finishing device.\n");
>   adev->shutdown = true;
> + drm_crtc_force_disable_all(adev->ddev);
>   /* evict vram memory */
>   amdgpu_bo_evict_vram(adev);
>   amdgpu_ib_pool_fini(adev);
>   amdgpu_fence_driver_fini(adev);
> - drm_crtc_force_disable_all(adev->ddev);
>   amdgpu_fbdev_fini(adev);
>   r = amdgpu_fini(adev);
>   kfree(adev->ip_block_status);
> -- 
> 2.7.4
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [pull] radeon and amdgpu drm-next-4.9

2016-09-18 Thread Lukas Wunner
On Fri, Sep 16, 2016 at 06:07:36PM -0400, Alex Deucher wrote:
> On Fri, Sep 16, 2016 at 5:38 PM, Lukas Wunner  wrote:
> > On Fri, Sep 16, 2016 at 04:42:43PM -0400, Alex Deucher wrote:
> >>   drm/amdgpu: skip suspend/resume on DRM_SWITCH_POWER_DYNAMIC_OFF
> >>   drm/radeon: skip suspend/resume on DRM_SWITCH_POWER_DYNAMIC_OFF
> >
> > Those two are unnecessary, it can't happen that the ->suspend hook
> > is executed with the device runtime suspended.
> >
> > Since commit d14d2a8453d6 ("drm: Remove dev_pm_ops from drm_class"),
> > DRM devices are afforded direct_complete, i.e. if the GPU is runtime
> > suspended upon system sleep, it is left in this state.  The only
> > callbacks that are executed are the ->prepare and the ->complete hook.
> > All the callbacks in-between, like ->suspend, are skipped.
> >
> > Even if direct_complete is not afforded for some reason, the PM core
> > will automatically runtime resume the device before executing ->suspend.
> >
> > That ->suspend is skipped in the DRM_SWITCH_POWER_OFF case was done
> > because the device is suspended behind the PM core's back if runpm=0
> > is set.  (And it doesn't work properly because the PCI core will
> > invalidate the saved_state during ->resume_noirq.  That could be
> > solved by returning 1 from the ->prepare hook in the DRM_SWITCH_POWER_OFF
> > case, but that's a different story.)
> >
> > (Sorry for not raising this earlier, I'm not subscribed to amd-gfx.)
> 
> Thanks for the heads up.  They shouldn't hurt anything and it matches
> what other drivers do.  If you feel strongly about it, I can revert
> them later.

Well, superfluous code shouldn't be included just because it doesn't hurt
or because others didn't know better either, or removed because of
someone else's feelings. ;-)

You can find the runtime resume before execution of the driver ->suspend
callback in drivers/pci/pci-driver.c:pci_pm_suspend():

const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
...
pm_runtime_resume(dev);
...
if (pm->suspend) {
...
error = pm->suspend(dev);

The device is prevented from auto-suspending because of a runtime PM ref
acquired beforehand in drivers/base/power/main.c:device_prepare().

Best regards,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [pull] radeon and amdgpu drm-next-4.9

2016-09-16 Thread Lukas Wunner
On Fri, Sep 16, 2016 at 04:42:43PM -0400, Alex Deucher wrote:
>   drm/amdgpu: skip suspend/resume on DRM_SWITCH_POWER_DYNAMIC_OFF
>   drm/radeon: skip suspend/resume on DRM_SWITCH_POWER_DYNAMIC_OFF

Those two are unnecessary, it can't happen that the ->suspend hook
is executed with the device runtime suspended.

Since commit d14d2a8453d6 ("drm: Remove dev_pm_ops from drm_class"),
DRM devices are afforded direct_complete, i.e. if the GPU is runtime
suspended upon system sleep, it is left in this state.  The only
callbacks that are executed are the ->prepare and the ->complete hook.
All the callbacks in-between, like ->suspend, are skipped.

Even if direct_complete is not afforded for some reason, the PM core
will automatically runtime resume the device before executing ->suspend.

That ->suspend is skipped in the DRM_SWITCH_POWER_OFF case was done
because the device is suspended behind the PM core's back if runpm=0
is set.  (And it doesn't work properly because the PCI core will
invalidate the saved_state during ->resume_noirq.  That could be
solved by returning 1 from the ->prepare hook in the DRM_SWITCH_POWER_OFF
case, but that's a different story.)

(Sorry for not raising this earlier, I'm not subscribed to amd-gfx.)

Best regards,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: ATPX changes in drm-next-4.8 and D3cold handling

2016-07-28 Thread Lukas Wunner
On Thu, Jul 28, 2016 at 03:33:25PM +, Deucher, Alexander wrote:
> > From: Peter Wu [mailto:pe...@lekensteyn.nl]
> > Sent: Thursday, July 21, 2016 6:43 AM
> > In case you missed it, Dave's D3cold patches were succeeded by changes
> > in PCI core. Relevant commits in the pci/pm branch:
> > 
> > 006d44e PCI: Add runtime PM support for PCIe ports
> > 16468c7 ACPI / hotplug / PCI: Runtime resume bridge before rescan
> > d963f65 PCI: Power on bridges before scanning new devices
> > 9d26d3a PCI: Put PCIe ports into D3 during suspend
> > 43f7f88 PCI: Don't clear d3cold_allowed for PCIe ports
> 
> Did those get merged yet?

They will go into 4.8. Should have gone into 4.7 already but were
dropped at the last minute.


> I just need to revert this commit once the d3cold patches land:
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.8&id=bdfb76040068d960cb9e226876be8a508d741c4a

So you probably need to revert this now.

Best regards,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx