Re: [PATCH] drm/amdgpu: always reset asic when going into suspend
On Thu, Jan 16, 2020 at 11:15 PM Alex Deucher wrote:
> It's just papering over the problem. It would be better from a power
> perspective for the driver to just not suspend and keep running like
> normal. When the driver is not suspended runtime things like clock
> and power gating are active which keep the GPU power at a minimum.

Until we have a better solution, are there any strategies we could apply here to avoid the suspend as you say? e.g. DMI quirk these products to disable suspend? Or disable suspend on all s2idle setups?

This would certainly be better than the current situation of the machine becoming unusable on resume.

> I talked to our sbios team and they seem to think our S0ix
> implementation works pretty differently from Intel's. I'm not really
> an expert on this area however. We have a new team ramping up on this
> for Linux however.

Thanks for following up on this internally! Can I lend a product sample to the new team so that they have direct access?

Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: always reset asic when going into suspend
On Thu, Dec 19, 2019 at 10:08 PM Alex Deucher wrote:
> I think there may be some AMD specific handling needed in
> drivers/acpi/sleep.c. My understanding from reading the modern
> standby documents from MS is that each vendor needs to provide a
> platform specific PEP driver. I'm not sure how much of that current
> code is Intel specific or not.

I don't think there is anything Intel-specific in drivers/acpi/sleep.c.

Reading more about PEP, I see that Linux supports PEP devices with ACPI ID INT33A1 or PNP0D80. Indeed the Intel platforms we work with have INT33A1 devices in their ACPI tables.

This product has a \_SB.PEP ACPI device with _HID AMD0004 and _CID PNP0D80. Full acpidump:
https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be
(see ssdt7)

This PEP device responds to a _DSM with UUID argument "e3f32452-febc-43ce-9039-932122d37721", which is not the one documented at
https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf

Nevertheless, there is some data about the GPU:

    Package (0x04)
    {
        One,
        "\\_SB.PCI0.GP17.VGA",
        Zero,
        0x03
    },

However, since this data is identical to that of many other devices that suspend and resume just fine, I wonder if it is really important.

The one supported method does offer two calls which may mirror the Display Off/On Notifications in the above spec:

    Case (0x02)
    {
        \_SB.PCI0.SBRG.EC0.CSEE (0xB7)
        Return (Zero)
    }
    Case (0x03)
    {
        \_SB.PCI0.SBRG.EC0.CSEE (0xB8)
        Notify (\_SB.PCI0.SBRG.EC0.LID, 0x80) // Status Change
        Return (Zero)
    }

but I tried executing this code after suspending amdgpu, and the problem still stands: amdgpu cannot wake up correctly.

There's nothing else really interesting in the PEP device as far as I can see.

PEP things aside, I am still quite suspicious about the fact that calling amdgpu_device_suspend() then amdgpu_device_resume() on multiple products (not just this one) fails. It seems that this code flow relies on the BIOS doing something in the S3 suspend/resume path to make the device resumable by amdgpu_device_resume(), which is why we have only encountered this issue for the first time on our first AMD platform that does not support S3 suspend.

With that in mind, and lacking any better info, wouldn't it make sense for amdgpu_device_resume() to always call reset? Maybe it's not necessary in the S3 case, but it shouldn't harm anything. Or perhaps it could check if the device is alive and reset it if it's not?

Alternatively, do you have any other contacts within AMD that could help us figure out the underlying question of how to correctly suspend and resume these devices? Happy to ship an affected product sample your way.

Thanks
Daniel
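The "check if the device is alive and reset it if it's not" idea from the message above can be sketched in plain C. This is only a userspace model under stated assumptions: `struct model_gpu`, `model_gpu_is_alive()`, `model_gpu_reset()` and `model_gpu_resume()` are hypothetical names invented for illustration and are not amdgpu functions. A real liveness probe would have to read hardware state, which this model replaces with a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPU; not a real amdgpu structure. */
struct model_gpu {
	bool responsive;   /* would the hardware answer a probe? */
	int reset_count;   /* number of resets the model has performed */
};

/* Hypothetical liveness probe: a real driver would read hardware state. */
static bool model_gpu_is_alive(const struct model_gpu *gpu)
{
	return gpu->responsive;
}

/* Hypothetical reset: in this model a reset always revives the device. */
static void model_gpu_reset(struct model_gpu *gpu)
{
	gpu->responsive = true;
	gpu->reset_count++;
}

/*
 * Resume path that resets only when the device looks dead.
 * Returns 0 on success, -1 if the device is still unresponsive.
 */
static int model_gpu_resume(struct model_gpu *gpu)
{
	assert(gpu != NULL);

	if (!model_gpu_is_alive(gpu))
		model_gpu_reset(gpu);

	return model_gpu_is_alive(gpu) ? 0 : -1;
}
```

In this model a device that is already responsive (the S3 case described above, where firmware presumably helps) is resumed without a reset, while an unresponsive one is reset exactly once.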
Re: [PATCH] drm/amdgpu: always reset asic when going into suspend
Hi Alex,

On Mon, Nov 25, 2019 at 1:17 PM Daniel Drake wrote:
> Unfortunately not. The original issue still exists (dead gfx after
> resume from s2idle) and also when I trigger execution of the suspend
> or runtime suspend routines the power usage increases around 1.5W as
> before.
>
> Have you confirmed that amdgpu s2idle is working on platforms you have in
> hand?

Any further ideas here? Or any workarounds that you would consider?

This platform has been rather tricky but all of the other problems are now solved:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f897e60a12f0b9146357780d317879bce2a877dc
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d21b8adbd475dba19ac2086d3306327b4a297418
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=406857f773b082bc88edfd24967facf4ed07ac85
https://patchwork.kernel.org/patch/11263477/

amdgpu is the only breakage left before Linux can be shipped on this family of products.

Thanks
Daniel
Re: [PATCH] drm/amdgpu: always reset asic when going into suspend
On Fri, Nov 22, 2019 at 11:32 PM Alex Deucher wrote:
> Do these patches help?
> https://patchwork.freedesktop.org/patch/341775/
> https://patchwork.freedesktop.org/patch/341968/

Unfortunately not. The original issue still exists (dead gfx after resume from s2idle) and also when I trigger execution of the suspend or runtime suspend routines the power usage increases around 1.5W as before.

Have you confirmed that amdgpu s2idle is working on platforms you have in hand?

Thanks
Daniel
Re: [PATCH] drm/amdgpu: always reset asic when going into suspend
On Wed, Oct 16, 2019 at 2:43 AM Alex Deucher wrote:
> Is s2idle actually powering down the GPU?

My understanding is that s2idle (at a high level) just calls all devices' suspend routines and then puts the CPU into its deepest running state. So if there is something special to be done to power off the GPU, I believe that amdgpu is responsible for making arrangements for that to happen. In this case the amdgpu code already does:

    pci_disable_device(dev->pdev);
    pci_set_power_state(dev->pdev, PCI_D3hot);

And the PCI layer will call through to any appropriate ACPI methods related to that low power state.

> Do you see a difference in power usage? I think you are just working around
> the fact that the
> GPU never actually gets powered down.

I ran a series of experiments.

Base setup: no UI running, ran "setterm -powersave 1; setterm -blank 1" and waited 1 minute for the screen to turn off. Base power usage in this state is 4.7W as reported by BAT0/power_now.

1. Run amdgpu_device_suspend(ddev, true, true); before my change
   --> Power usage increases to 6.1W

2. Run amdgpu_device_suspend(ddev, true, true); with my change applied
   --> Power usage increases to 6.0W

3. Put amdgpu device in runtime suspend
   --> Power usage increases to 6.2W

4. Try unmodified suspend path but d3cold instead of d3hot
   --> Power usage increases to 6.1W

So, all of the suspend schemes actually increase the power usage by roughly the same amount, reset or not, with and without my patch :/

Any ideas?

Thanks,
Daniel
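The arithmetic behind "roughly the same amount" can be made explicit with a small C sketch. The readings are the ones quoted in the message above, converted to milliwatts; the table layout and the names `BASE_MW`, `experiments` and `delta_mw()` are illustrative choices, not anything from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline reading from the message above (4.7W), in milliwatts. */
#define BASE_MW 4700

struct experiment {
	const char *name;
	int power_mw;   /* reading after the suspend scheme was applied */
};

/* The four experiments, in the order they are listed in the message. */
static const struct experiment experiments[] = {
	{ "amdgpu_device_suspend, before change", 6100 },
	{ "amdgpu_device_suspend, with change",   6000 },
	{ "runtime suspend",                      6200 },
	{ "suspend with d3cold",                  6100 },
};

#define N_EXPERIMENTS (sizeof(experiments) / sizeof(experiments[0]))

/* Increase over the baseline for experiment i, in milliwatts. */
static int delta_mw(size_t i)
{
	assert(i < N_EXPERIMENTS);
	return experiments[i].power_mw - BASE_MW;
}
```

The increases come out at 1.4W, 1.3W, 1.5W and 1.4W respectively, so the four schemes differ from each other by at most 0.2W while all sit well above the 4.7W baseline.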
[PATCH] drm/amdgpu: always reset asic when going into suspend
On Asus UX434DA (Ryzen7 3700U), upon resume from s2idle, the screen
turns on again and shows the pre-suspend image, but the display remains
frozen from that point onwards.

The kernel logs show errors:

 [drm] psp command failed and response status is (0x7)
 [drm] Fence fallback timer expired on ring sdma0
 [drm] Fence fallback timer expired on ring gfx
 amdgpu :03:00.0: [drm:amdgpu_ib_ring_tests] *ERROR* IB test failed on gfx (-22).
 [drm:process_one_work] *ERROR* ib ring test failed (-22).

This can also be reproduced with pm_test:

 # echo devices > /sys/power/pm_test
 # echo freeze > /sys/power/state

The same reproducer causes the same problem on Asus X512DK (Ryzen5 3500U)
even though that model is normally able to suspend and resume OK via S3.

Experimenting, I observed that this error condition can be invoked on
any amdgpu product by executing in succession:

  amdgpu_device_suspend(drm_dev, true, true);
  amdgpu_device_resume(drm_dev, true, true);

i.e. it appears that the resume routine is unable to get the device out
of suspended state, except for the S3 suspend case where it presumably
has a bit of extra help from the firmware or hardware.

However, I also observed that the runtime suspend/resume routines work
OK when tested like this, which led me to the key difference in these
two cases: the ASIC reset, which only happens in the runtime suspend path.

Since it takes less than 1ms, we should do the ASIC reset in all
suspend paths, fixing resume from s2idle on these products.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=111811
Signed-off-by: Daniel Drake
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5a1939dbd4e3..7f4870e974fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3082,15 +3082,16 @@ int amdgpu_device_suspend(struct drm_device *dev, bool suspend, bool fbcon)
 	 */
 	amdgpu_bo_evict_vram(adev);
 
+	amdgpu_asic_reset(adev);
+	r = amdgpu_asic_reset(adev);
+	if (r)
+		DRM_ERROR("amdgpu asic reset failed\n");
+
 	pci_save_state(dev->pdev);
 	if (suspend) {
 		/* Shut down the device */
 		pci_disable_device(dev->pdev);
 		pci_set_power_state(dev->pdev, PCI_D3hot);
-	} else {
-		r = amdgpu_asic_reset(adev);
-		if (r)
-			DRM_ERROR("amdgpu asic reset failed\n");
 	}
 
 	return 0;
-- 
2.20.1
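The control-flow change in the patch above can be modeled in userspace C. This is a simplified sketch, not driver code: `struct suspend_trace`, `model_suspend_before()` and `model_suspend_after()` are hypothetical names, and the model records only whether an ASIC reset and a D3hot transition happen for a given value of the `suspend` argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Which steps a modeled suspend call performed. */
struct suspend_trace {
	bool asic_reset;      /* stands in for amdgpu_asic_reset() being called */
	bool entered_d3hot;   /* stands in for pci_set_power_state(..., PCI_D3hot) */
};

/* Before the patch: the reset happens only on the non-suspend (runtime) path. */
static struct suspend_trace model_suspend_before(bool suspend)
{
	struct suspend_trace t = { false, false };

	if (suspend)
		t.entered_d3hot = true;
	else
		t.asic_reset = true;

	return t;
}

/* After the patch: the reset happens on every path, before the D3hot step. */
static struct suspend_trace model_suspend_after(bool suspend)
{
	struct suspend_trace t = { false, false };

	t.asic_reset = true;
	if (suspend)
		t.entered_d3hot = true;

	/* Invariant the patch aims for: no suspend path skips the reset. */
	assert(t.asic_reset);
	return t;
}
```

With `suspend == true` (the s2idle and S3 path), the "before" model enters D3hot without a reset, which is the combination the commit message associates with the failed resume; the "after" model resets in both cases.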
Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
Hi Alex,

On Thu, Apr 19, 2018 at 4:13 PM, Alex Deucher wrote:
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=105684
>>>
>>> No progress made on that bug report so far.
>>> What can we do to help this advance?
>>
>> Ping, any news here? How can we help advance on this bug?
>
> Can you try one of these branches?
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.18-wip
> do they work any better?

It's been over 3 months since we reported this bug by email, over 6 weeks since we reported it on bugzilla, and still there has been no meaningful diagnostic help from AMD. This follows a similar pattern to what we have seen with other issues prior to this one.

What can we do so that this bug gets some attention from your team?

Secondarily https://bugs.freedesktop.org/show_bug.cgi?id=106228 is another bug that needs attention. We have a growing number of consumer platforms affected by this. When booted, the amdgpu screen brightness value is incorrectly read back as 0, which systemd will then store on shutdown. On next boot, it restores the very low brightness level. This can be reproduced out of the box on Fedora, Ubuntu, etc.

Thanks,
Daniel
Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
On Thu, Mar 22, 2018 at 3:09 AM, Daniel Drake <dr...@endlessm.com> wrote:
> On Tue, Feb 20, 2018 at 10:18 PM, Alex Deucher <alexdeuc...@gmail.com> wrote:
>>> It seems that we are not alone seeing amdgpu-induced stability
>>> problems on multiple Raven Ridge platforms.
>>> https://www.phoronix.com/scan.php?page=news_item=AMD-Raven-Ridge-Mobo-Linux
>>>
>>> AMD, what can we do to help?
>>
>> Please file bugs:
>> https://bugs.freedesktop.org
>
> Sorry for the delayed response. We're still seeing serious instability
> here even on the latest kernel. Filed
> https://bugs.freedesktop.org/show_bug.cgi?id=105684

No progress made on that bug report so far.
What can we do to help this advance?

Thanks,
Daniel
Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
Hi Alex,

On Tue, Feb 20, 2018 at 10:18 PM, Alex Deucher wrote:
>> It seems that we are not alone seeing amdgpu-induced stability
>> problems on multiple Raven Ridge platforms.
>> https://www.phoronix.com/scan.php?page=news_item=AMD-Raven-Ridge-Mobo-Linux
>>
>> AMD, what can we do to help?
>
> Please file bugs:
> https://bugs.freedesktop.org

Sorry for the delayed response. We're still seeing serious instability here even on the latest kernel. Filed
https://bugs.freedesktop.org/show_bug.cgi?id=105684

Thanks,
Daniel
Re: amdgpu hangs on boot or shutdown on AMD Raven Ridge CPU (Engineer Sample)
Hi,

> >>> We are working with new laptops that have the AMD Raven Ridge
> >>> chipset with this `/proc/cpuinfo`
> >>> https://gist.github.com/mschiu77/b06dba574e89b9a30cf4c450eaec49bc
> >>>
> >>> With the latest kernel 4.15, there're lots of different
> >>> panics/oops during boot so no chance to get into X. It also happens
> >>> during shutdown. Then I tried to build kernel from
> >>> git://people.freedesktop.org/~agd5f/linux on branch
> >>> amd-staging-drm-next with head on commit "drm: Fix trailing semicolon"
> >>> and update the linux-firmware. Things seem to get better, only 1 oops
> >>> observed. Here's the oops
> >>> https://gist.github.com/mschiu77/1a68f27272b24775b2040acdb474cdd3.

It seems that we are not alone seeing amdgpu-induced stability problems on multiple Raven Ridge platforms.
https://www.phoronix.com/scan.php?page=news_item=AMD-Raven-Ridge-Mobo-Linux

AMD, what can we do to help?

Thanks!
Daniel
[PATCH 4.15] drm/amd/display: call set csc_default if enable adjustment is false
From: Yue Hin Lau <yuehin@amd.com>

Signed-off-by: Yue Hin Lau <yuehin@amd.com>
Reviewed-by: Eric Bernstein <eric.bernst...@amd.com>
Acked-by: Harry Wentland <harry.wentl...@amd.com>
Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
[dr...@endlessm.com: backport to 4.15]
Signed-off-by: Daniel Drake <dr...@endlessm.com>
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h          | 2 +-
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp_cm.c       | 6 ++----
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 ++
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h               | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

Testing Acer Aspire TC-380 engineering sample (Raven Ridge), the display
comes up with an excessively green tint. This patch (from
amd-staging-drm-next) solves the issue. Can it be included in Linux 4.15?

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
index a9782b1aba47..34daf895f848 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp.h
@@ -1360,7 +1360,7 @@ void dpp1_cm_set_output_csc_adjustment(
 
 void dpp1_cm_set_output_csc_default(
 		struct dpp *dpp_base,
-		const struct default_adjustment *default_adjust);
+		enum dc_color_space colorspace);
 
 void dpp1_cm_set_gamut_remap(
 	struct dpp *dpp,
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp_cm.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp_cm.c
index 40627c244bf5..ed1216b53465 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp_cm.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_dpp_cm.c
@@ -225,14 +225,13 @@ void dpp1_cm_set_gamut_remap(
 
 void dpp1_cm_set_output_csc_default(
 		struct dpp *dpp_base,
-		const struct default_adjustment *default_adjust)
+		enum dc_color_space colorspace)
 {
 
 	struct dcn10_dpp *dpp = TO_DCN10_DPP(dpp_base);
 	uint32_t ocsc_mode = 0;
 
-	if (default_adjust != NULL) {
-		switch (default_adjust->out_color_space) {
+	switch (colorspace) {
 		case COLOR_SPACE_SRGB:
 		case COLOR_SPACE_2020_RGB_FULLRANGE:
 			ocsc_mode = 0;
@@ -253,7 +252,6 @@ void dpp1_cm_set_output_csc_default(
 		case COLOR_SPACE_UNKNOWN:
 		default:
 			break;
-		}
 	}
 
 	REG_SET(CM_OCSC_CONTROL, 0, CM_OCSC_MODE, ocsc_mode);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 961ad5c3b454..05dc01e54531 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -2097,6 +2097,8 @@ static void program_csc_matrix(struct pipe_ctx *pipe_ctx,
 			tbl_entry.color_space = color_space;
 			//tbl_entry.regval = matrix;
 			pipe_ctx->plane_res.dpp->funcs->opp_set_csc_adjustment(pipe_ctx->plane_res.dpp, &tbl_entry);
+	} else {
+		pipe_ctx->plane_res.dpp->funcs->opp_set_csc_default(pipe_ctx->plane_res.dpp, colorspace);
 	}
 }
 static bool is_lower_pipe_tree_visible(struct pipe_ctx *pipe_ctx)
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
index 83a68460edcd..9420dfb94d39 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h
@@ -64,7 +64,7 @@ struct dpp_funcs {
 
 	void (*opp_set_csc_default)(
 		struct dpp *dpp,
-		const struct default_adjustment *default_adjust);
+		enum dc_color_space colorspace);
 
 	void (*opp_set_csc_adjustment)(
 		struct dpp *dpp,
-- 
2.14.1
Re: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
Hi,

On Tue, May 30, 2017 at 3:38 PM, Nath, Arindam wrote:
>>-Original Message-
>>From: Joerg Roedel [mailto:j...@8bytes.org]
>>Sent: Monday, May 29, 2017 8:09 PM
>>To: Nath, Arindam; Lendacky, Thomas
>>Cc: io...@lists.linux-foundation.org; amd-gfx@lists.freedesktop.org;
>>Deucher, Alexander; Bridgman, John; dr...@endlessm.com;
>>Suthikulpanit, Suravee; li...@endlessm.com; Craig Stein;
>>mic...@daenzer.net; Kuehling, Felix; sta...@vger.kernel.org
>>Subject: Re: [PATCH] iommu/amd: flush IOTLB for specific domains only (v3)
>>
>>Hi Arindam,
>>
>>I met Tom Lendacky in Nuremberg last week and he told me he is
>>working on the same area of the code that this patch is for. His reason
>>for touching this code was to solve some locking problems. Maybe you two
>>can work together on a joint approach to improve this?
>
> Sure Joerg, I will work with Tom.

What was the end result here? I see that the code has been reworked in 4.13 so your original patch no longer applies. Is the reworked version also expected to solve the original issue?

Thanks
Daniel
amdgpu display corruption and hang on AMD A10-9620P
Hi,

We are working with new laptops that have this SoC:

    AMD A10-9620P RADEON R5, 10 COMPUTE CORES 4C+6G

I think this is the Bristol Ridge chipset.

During boot, the display becomes unusable at the point where the amdgpu driver loads. You can see at least two horizontal lines of garbage at this point. We have reproduced this on 4.8, 4.10 and linus master (early 4.12).

Photo: http://pasteboard.co/qrC9mh4p.jpg

Getting logs is tricky because the system appears to freeze at that point.

Is this a known issue? Is there anything we can do to help with diagnosis?

Thanks
Daniel
Re: [PATCH] iommu/amd: flush IOTLB for specific domains only
On Wed, Apr 5, 2017 at 9:01 AM, Nath, Arindam <arindam.n...@amd.com> wrote:
>
> >-Original Message-
> >From: Daniel Drake [mailto:dr...@endlessm.com]
> >Sent: Thursday, March 30, 2017 7:15 PM
> >To: Nath, Arindam
> >Cc: j...@8bytes.org; Deucher, Alexander; Bridgman, John; amd-
> >g...@lists.freedesktop.org; io...@lists.linux-foundation.org; Suthikulpanit,
> >Suravee; Linux Upstreaming Team
> >Subject: Re: [PATCH] iommu/amd: flush IOTLB for specific domains only
> >
> >On Thu, Mar 30, 2017 at 12:23 AM, Nath, Arindam <arindam.n...@amd.com>
> >wrote:
> >> Daniel, did you get chance to test this patch?
> >
> >Not yet. Should we test it alone or alongside "PCI: Blacklist AMD
> >Stoney GPU devices for ATS"?
>
> Daniel, any luck with this patch?

Sorry for the delay. The patch appears to be working fine.

Daniel
Re: [PATCH] iommu/amd: flush IOTLB for specific domains only
Hi Arindam,

You CC'd me on this - does this mean that it is a fix for the issue described in the thread "amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out" ?

Thanks
Daniel

On Mon, Mar 27, 2017 at 12:17 AM, wrote:
> From: Arindam Nath
>
> The idea behind flush queues is to defer the IOTLB flushing
> for domains for which the mappings are no longer valid. We
> add such domains in queue_add(), and when the queue size
> reaches FLUSH_QUEUE_SIZE, we perform __queue_flush().
>
> Since we have already taken lock before __queue_flush()
> is called, we need to make sure the IOTLB flushing is
> performed as quickly as possible.
>
> In the current implementation, we perform IOTLB flushing
> for all domains irrespective of which ones were actually
> added in the flush queue initially. This can be quite
> expensive especially for domains for which unmapping is
> not required at this point of time.
>
> This patch makes use of domain information in
> 'struct flush_queue_entry' to make sure we only flush
> IOTLBs for domains who need it, skipping others.
>
> Signed-off-by: Arindam Nath
> ---
>  drivers/iommu/amd_iommu.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 98940d1..6a9a048 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2227,15 +2227,16 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev)
>
>  static void __queue_flush(struct flush_queue *queue)
>  {
> -       struct protection_domain *domain;
> -       unsigned long flags;
>         int idx;
>
> -       /* First flush TLB of all known domains */
> -       spin_lock_irqsave(&amd_iommu_pd_lock, flags);
> -       list_for_each_entry(domain, &amd_iommu_pd_list, list)
> -               domain_flush_tlb(domain);
> -       spin_unlock_irqrestore(&amd_iommu_pd_lock, flags);
> +       /* First flush TLB of all domains which were added to flush queue */
> +       for (idx = 0; idx < queue->next; ++idx) {
> +               struct flush_queue_entry *entry;
> +
> +               entry = queue->entries + idx;
> +
> +               domain_flush_tlb(&entry->dma_dom->domain);
> +       }
>
>         /* Wait until flushes have completed */
>         domain_flush_complete(NULL);
> --
> 1.9.1
>
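The difference between the two flushing strategies in the quoted patch can be illustrated with a small userspace model in C. It is a sketch under stated assumptions: `struct model_domain`, `struct model_queue`, `flush_all()` and `flush_queued()` are hypothetical names, the queue size is an arbitrary illustrative value, and a counter stands in for the real `domain_flush_tlb()` call.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_SIZE  8   /* arbitrary size for the model */
#define MODEL_NUM_DOMAINS 4

/* Hypothetical stand-in for a protection domain. */
struct model_domain {
	int flush_count;   /* how many times this domain's IOTLB was flushed */
};

/* Hypothetical stand-in for the flush queue: one domain pointer per entry. */
struct model_queue {
	struct model_domain *entries[MODEL_QUEUE_SIZE];
	size_t next;       /* number of entries currently queued */
};

/* Behaviour before the patch: flush every known domain. */
static void flush_all(struct model_domain *domains, size_t n)
{
	for (size_t i = 0; i < n; i++)
		domains[i].flush_count++;
}

/*
 * Behaviour after the patch: flush only domains that have queue entries.
 * As in the quoted loop, a domain queued twice is flushed twice.
 * Emptying the queue afterwards is a simplification made by this model.
 */
static void flush_queued(struct model_queue *q)
{
	assert(q->next <= MODEL_QUEUE_SIZE);

	for (size_t idx = 0; idx < q->next; ++idx)
		q->entries[idx]->flush_count++;

	q->next = 0;
}
```

With two of four domains queued, `flush_queued()` touches two domains where `flush_all()` would touch four, which is the saving the commit message describes for work done while the lock is held.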
Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out
Hi Joerg,

Thanks for looking into this. We confirm that this workaround avoids the iommu log spam and that amdgpu appears to be working fine with it.

Daniel

On Wed, Mar 22, 2017 at 5:22 AM, j...@8bytes.org wrote:
> On Tue, Mar 21, 2017 at 04:30:55PM +, Deucher, Alexander wrote:
>> > I am preparing a debug-patch that disables ATS for these GPUs so someone
>> > with such a chip can test it.
>>
>> Thanks Joerg.
>
> Here is a debug patch, using the hard hammer of disabling the use of ATS
> completely in the AMD IOMMU driver. If it fixes the issue I am going to
> write a more upstreamable version.
>
> But for now, please test if this fixes the issue.
>
> Thanks,
>
> Joerg
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 98940d1..f019aa6 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -467,7 +467,7 @@ static int iommu_init_device(struct device *dev)
>                 struct amd_iommu *iommu;
>
>                 iommu = amd_iommu_rlookup_table[dev_data->devid];
> -               dev_data->iommu_v2 = iommu->is_iommu_v2;
> +               dev_data->iommu_v2 = false;
>         }
>
>         dev->archdata.iommu = dev_data;
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 6130278..41d0e64 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -171,7 +171,7 @@ int amd_iommus_present;
>
>  /* IOMMUs have a non-present cache? */
>  bool amd_iommu_np_cache __read_mostly;
> -bool amd_iommu_iotlb_sup __read_mostly = true;
> +bool amd_iommu_iotlb_sup __read_mostly = false;
>
>  u32 amd_iommu_max_pasid __read_mostly = ~0;
>
Re: amd-iommu: can't boot with amdgpu, AMD-Vi: Completion-Wait loop timed out
Hi,

On Mon, Mar 13, 2017 at 2:01 PM, Deucher, Alexander wrote:
> > We are unable to boot Acer Aspire E5-553G (AMD FX-9800P RADEON R7) nor
> > Acer Aspire E5-523 with standard configurations because during boot
> > the screen is flooded with the following error message over and over:
> >
> > AMD-Vi: Completion-Wait loop timed out
>
> We ran into similar issues and bisected it to commit
> b1516a14657acf81a587e9a6e733a881625eee53. I'm not too familiar with the
> IOMMU hardware to know if this is an iommu or display driver issue yet.

We can confirm that reverting this commit solves the issue.

Given that the commit is an optimization, that it has introduced a regression on multiple platforms, and that the regression has stood for 8 months, it would be common practice to revert the patch upstream until the regression is fixed. Could you please send a new patch to do this?

Also, we would be happy to test any real solutions to this issue while we still have the affected units in hand.

Thanks
Daniel