RE: [PATCH] drm/radeon: check if device is root before getting pci speed caps

2019-01-21 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -----Original Message-----
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: January 22, 2019 2:59
> To: amd-gfx list 
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/radeon: check if device is root before getting pci
> speed caps
> 
> Ping?
> 
> On Thu, Jan 17, 2019 at 2:44 PM Alex Deucher 
> wrote:
> >
> > Check whether the device is on the root bus before attempting to see what
> > speeds the PCIe port supports.  Fixes a crash with PCI passthrough in
> > a VM.
> >
> > Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109366
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/radeon/ci_dpm.c | 5 +++--
> > drivers/gpu/drm/radeon/si_dpm.c | 5 +++--
> >  2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/radeon/ci_dpm.c
> > b/drivers/gpu/drm/radeon/ci_dpm.c index d587779a80b4..a97294ac96d5
> > 100644
> > --- a/drivers/gpu/drm/radeon/ci_dpm.c
> > +++ b/drivers/gpu/drm/radeon/ci_dpm.c
> > @@ -5676,7 +5676,7 @@ int ci_dpm_init(struct radeon_device *rdev)
> > u16 data_offset, size;
> > u8 frev, crev;
> > struct ci_power_info *pi;
> > -   enum pci_bus_speed speed_cap;
> > +   enum pci_bus_speed speed_cap = PCI_SPEED_UNKNOWN;
> > struct pci_dev *root = rdev->pdev->bus->self;
> > int ret;
> >
> > @@ -5685,7 +5685,8 @@ int ci_dpm_init(struct radeon_device *rdev)
> > return -ENOMEM;
> > rdev->pm.dpm.priv = pi;
> >
> > -   speed_cap = pcie_get_speed_cap(root);
> > +   if (!pci_is_root_bus(rdev->pdev->bus))
> > +   speed_cap = pcie_get_speed_cap(root);
> > if (speed_cap == PCI_SPEED_UNKNOWN) {
> > pi->sys_pcie_mask = 0;
> > } else {
> > diff --git a/drivers/gpu/drm/radeon/si_dpm.c
> > b/drivers/gpu/drm/radeon/si_dpm.c index 8fb60b3af015..0a785ef0ab66
> > 100644
> > --- a/drivers/gpu/drm/radeon/si_dpm.c
> > +++ b/drivers/gpu/drm/radeon/si_dpm.c
> > @@ -6899,7 +6899,7 @@ int si_dpm_init(struct radeon_device *rdev)
> > struct ni_power_info *ni_pi;
> > struct si_power_info *si_pi;
> > struct atom_clock_dividers dividers;
> > -   enum pci_bus_speed speed_cap;
> > +   enum pci_bus_speed speed_cap = PCI_SPEED_UNKNOWN;
> > struct pci_dev *root = rdev->pdev->bus->self;
> > int ret;
> >
> > @@ -6911,7 +6911,8 @@ int si_dpm_init(struct radeon_device *rdev)
> > eg_pi = &ni_pi->eg;
> > pi = &eg_pi->rv7xx;
> >
> > -   speed_cap = pcie_get_speed_cap(root);
> > +   if (!pci_is_root_bus(rdev->pdev->bus))
> > +   speed_cap = pcie_get_speed_cap(root);
> > if (speed_cap == PCI_SPEED_UNKNOWN) {
> > si_pi->sys_pcie_mask = 0;
> > } else {
> > --
> > 2.20.1
> >
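
For illustration, a minimal sketch of the guard this patch introduces (the
helper name below is invented and is not part of the patch): with PCI
passthrough the GPU can sit directly on the root bus, so bus->self is NULL
and must not be handed to pcie_get_speed_cap().

    #include <linux/pci.h>

    /* Sketch only: query the upstream port's speed cap when there is one. */
    static enum pci_bus_speed sketch_upstream_speed_cap(struct pci_dev *pdev)
    {
            enum pci_bus_speed cap = PCI_SPEED_UNKNOWN;

            if (!pci_is_root_bus(pdev->bus))
                    cap = pcie_get_speed_cap(pdev->bus->self);

            return cap;
    }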


RE: [PATCH] drm/amdgpu: Add missing power attribute to APU check

2019-01-21 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -----Original Message-----
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: January 22, 2019 3:01
> To: amd-gfx list 
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: Add missing power attribute to APU
> check
> 
> Ping again?
> 
> Alex
> 
> On Tue, Jan 15, 2019 at 11:32 AM Alex Deucher 
> wrote:
> >
> > Ping?
> >
> > On Wed, Jan 9, 2019 at 10:23 PM Alex Deucher 
> wrote:
> > >
> > > Add the missing power1_average attribute to the visibility check for
> > > power attributes on APUs.  It was missed before.
> > >
> > > Signed-off-by: Alex Deucher 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > index 51eb2cf42b81..979d96278413 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > > @@ -1799,7 +1799,8 @@ static umode_t
> hwmon_attributes_visible(struct kobject *kobj,
> > > effective_mode &= ~S_IWUSR;
> > >
> > > if ((adev->flags & AMD_IS_APU) &&
> > > -   (attr == &sensor_dev_attr_power1_cap_max.dev_attr.attr ||
> > > +   (attr == &sensor_dev_attr_power1_average.dev_attr.attr ||
> > > +attr == &sensor_dev_attr_power1_cap_max.dev_attr.attr ||
> > >  attr == &sensor_dev_attr_power1_cap_min.dev_attr.attr||
> > >  attr == &sensor_dev_attr_power1_cap.dev_attr.attr))
> > > return 0;
> > > --
> > > 2.20.1
> > >


[PATCH] drm/amdkfd: Fix if preprocessor statement above kfd_fill_iolink_info_for_cpu

2019-01-21 Thread Nathan Chancellor
Clang warns:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_crat.c:866:5: warning:
'CONFIG_X86_64' is not defined, evaluates to 0 [-Wundef]
#if CONFIG_X86_64
^
1 warning generated.

Fixes: d1c234e2cd10 ("drm/amdkfd: Allow building KFD on ARM64 (v2)")
Signed-off-by: Nathan Chancellor 
---

Resending as I forgot to add the lists...

 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 5d85ff341385..2e7c44955f43 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -863,7 +863,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int 
*avail_size,
return 0;
 }
 
-#if CONFIG_X86_64
+#ifdef CONFIG_X86_64
 static int kfd_fill_iolink_info_for_cpu(int numa_node_id, int *avail_size,
uint32_t *num_entries,
struct crat_subtype_iolink *sub_type_hdr)
-- 
2.20.1
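
For reference, a hypothetical snippet (not from this patch) showing why the
warning fires: boolean Kconfig symbols are defined to 1 when enabled and left
undefined otherwise, so "#if CONFIG_X86_64" evaluates an undefined macro and
trips -Wundef on other architectures, while #ifdef and IS_ENABLED() are safe.

    #include <linux/kconfig.h>
    #include <linux/types.h>

    #ifdef CONFIG_X86_64            /* fine: tests whether the macro is defined */
    static inline void x86_64_only_init(void) { }
    #endif

    static inline bool x86_64_enabled(void)
    {
            return IS_ENABLED(CONFIG_X86_64);       /* fine: expands to 1 or 0 */
    }

    /* "#if CONFIG_X86_64" would evaluate an undefined macro on non-x86
     * builds, which is exactly the -Wundef warning Clang reported above. */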



[PATCH] drm/amdgpu: add a workaround for GDS ordered append hangs with compute queues

2019-01-21 Thread Marek Olšák
From: Marek Olšák 

I'm not increasing the DRM version because GDS isn't totally without bugs yet.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 17 
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 17 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 36 +
 include/uapi/drm/amdgpu_drm.h   |  5 
 5 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index ecbcefe49a98..f89f5734d985 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -30,20 +30,22 @@ struct amdgpu_bo;
 struct amdgpu_gds_asic_info {
uint32_ttotal_size;
uint32_tgfx_partition_size;
uint32_tcs_partition_size;
 };
 
 struct amdgpu_gds {
struct amdgpu_gds_asic_info mem;
struct amdgpu_gds_asic_info gws;
struct amdgpu_gds_asic_info oa;
+   uint32_tgds_compute_max_wave_id;
+
/* At present, GDS, GWS and OA resources for gfx (graphics)
 * is always pre-allocated and available for graphics operation.
 * Such resource is shared between all gfx clients.
 * TODO: move this operation to user space
 * */
struct amdgpu_bo*   gds_gfx_bo;
struct amdgpu_bo*   gws_gfx_bo;
struct amdgpu_bo*   oa_gfx_bo;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 7984292f9282..d971ea914755 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2257,20 +2257,36 @@ static void gfx_v7_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v7_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
  struct amdgpu_job *job,
  struct amdgpu_ib *ib,
  uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
 
+   /* Currently, there is a high possibility to get wave ID mismatch
+* between ME and GDS, leading to a hw deadlock, because ME generates
+* different wave IDs than the GDS expects. This situation happens
+* randomly when at least 5 compute pipes use GDS ordered append.
+* The wave IDs generated by ME are also wrong after suspend/resume.
+* Those are probably bugs somewhere else in the kernel driver.
+*
+* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+* GDS to 0 for this ring (me/pipe).
+*/
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID - PACKET3_SET_CONFIG_REG_START);
+   amdgpu_ring_write(ring, ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
amdgpu_ring_write(ring,
 #ifdef __BIG_ENDIAN
  (2 << 0) |
 #endif
  (ib->gpu_addr & 0xFFFC));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0x);
amdgpu_ring_write(ring, control);
 }
 
@@ -5050,20 +5066,21 @@ static void gfx_v7_0_set_irq_funcs(struct amdgpu_device 
*adev)
adev->gfx.priv_inst_irq.num_types = 1;
adev->gfx.priv_inst_irq.funcs = &gfx_v7_0_priv_inst_irq_funcs;
 }
 
 static void gfx_v7_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asci gds info */
adev->gds.mem.total_size = RREG32(mmGDS_VMID0_SIZE);
adev->gds.gws.total_size = 64;
adev->gds.oa.total_size = 16;
+   adev->gds.gds_compute_max_wave_id = RREG32(mmGDS_COMPUTE_MAX_WAVE_ID);
 
if (adev->gds.mem.total_size == 64 * 1024) {
adev->gds.mem.gfx_partition_size = 4096;
adev->gds.mem.cs_partition_size = 4096;
 
adev->gds.gws.gfx_partition_size = 4;
adev->gds.gws.cs_partition_size = 4;
 
adev->gds.oa.gfx_partition_size = 4;
adev->gds.oa.cs_partition_size = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a26747681ed6..dcdae74fc0e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6077,20 +6077,36 @@ static void gfx_v8_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v8_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
  struct amdgpu_job *job,
  struct amdgpu_ib *ib,
   

Re: [PATCH] drm: Split out drm_probe_helper.h

2019-01-21 Thread Sam Ravnborg
Hi Daniel et al.

> > 
> > Yeah the drm_crtc_helper.h header is a bit of a miniature drmP.h for legacy
> > kms drivers. Just removing it from all the atomic drivers caused lots of
> > fallout, I expect even more if you entirely remove the includes it has.
> > Maybe a todo, care to pls create that patch since it's your idea?
> 
> The main reason I bailed out initially was that this would create
> small changes to several otherwise seldomly touched files.
> And then we would later come and remove drmP.h - so lots of
> small but incremental changes to the same otherwise seldomly
> edited files.
> And the job was only partially done.
> 
> I will try to experiment with an approach where I clean up the
> include/drm/*.h files a little (like suggested above, +delete drmP.h
> and maybe a bit more).
> 
> Then to try on a driver by driver basis to make it build with a
> cleaned set of include files.
> I hope that the cleaned up driver can still build without the
> cleaned header files so the changes can be submitted piecemeal.
> 
> Will do so with an eye on the lesser maintained drivers to try it
> out to avoid creating too much churn for others.

I have now a few patches queued, but the result is not too pretty.
I did the following:

- For all files in include/drm/*.h the set of include files
  were adjusted to the minimum number of files required to make
  them build without any other files included first.

  Created one .c file for each .h file, included just that .h
  file, and adjusted to the minimal set of include files.
  In the process a lot of forward declarations were added.
  (A minimal example of such a stub follows this list.)

- Deleted drmP.h

- Fixed build of a few drivers: sti, tilcdc, gma500, tve200, via
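
As an illustration of the per-header compile test mentioned above (the file
name is hypothetical; this is not part of Sam's series), such a stub needs
only the one include and fails to build whenever the header is not
self-contained:

    /* drm_probe_helper_test.c -- hypothetical build-test stub: compiles
     * only if include/drm/drm_probe_helper.h pulls in everything it needs. */
    #include <drm/drm_probe_helper.h>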

Some observations:

- Killing all the includes not needed in the header files
  results in a lot of extra changes.
  Examples:
drm_modeset_helper_vtables.h is no longer
included by anyone, so needs to be added in many files

drm_atomic_state_helper.h is no longer included
by anyone so likewise needs to be added in many files

- It is very tedious to do this properly.
  The process I followed was:
  - delete / comment out all include files
  - add back the obvious from a quick scan of the code
  - build - fix - build - fix - build - fix ...
  -   next file...

- The result is error-prone as only the allyesconfig + allmodconfig
  variants are tested. But real-life configurations are more diverse.

Current diffstat:
   111 files changed, 771 insertions(+), 401 deletions(-)

This is for the 5 drivers alone and not the header cleanup.
So long story short - this is not good and not the way forward.

I will try to come up with a few improvements to make the
header files self-contained, but restricted to the changes that
add forward declarations/includes to avoid the churn in all the drivers.

And then post for review a few patches to clean up some headers.
If the cleanup gets a go I will try to push for their
introduction.
This will include, but will not be limited to, the above mentioned
drm_crtc_helper.h header file.

For now too much time was already spent on this, so it is at the
moment pushed back on my TODO list.
This mail also serves as a kind of "where I left off", when/if I
pick this up again.

If anyone knows of some tooling that can help in the
process of adjusting the header files I am all ears.

Sam


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

2019-01-21 Thread Mikhail Gavrilov
On Mon, 21 Jan 2019 at 22:00, Grodzovsky, Andrey
 wrote:
>
> + Harry
>
> Looks like this is happening during GPU reset due to job time out. I would 
> first try to reproduce this just with a plain reset from sysfs.
> Mikhail, also please provide addr2line for dce110_setup_audio_dto.isra.8+0x171

$ eu-addr2line -e
/lib/debug/lib/modules/5.0.0-0.rc2.git4.3.fc30.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.debug
dce110_setup_audio_dto.isra.8+0x171
drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_hw_sequencer.c:1996:25

Thanks, I hope this helps.

--
Best Regards,
Mike Gavrilov.


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 20:04, Michel Dänzer  wrote:
>
> On 2019-01-21 7:28 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 19:24, Michel Dänzer  wrote:
> >> On 2019-01-21 7:20 p.m., Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 19:04, Michel Dänzer  wrote:
>  On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:
> >> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  
> >>> wrote:
> >>>
>  Until that happens we should just change the driver ifdefs to default
>  the hacks to off and only enable them on setups where we 100%
>  positively know that they actually work.  And document that fact
>  in big fat comments.
> >>>
> >>> Well, as I mentioned in my commit log as well, if we default to off
> >>> unless CONFIG_X86, we may break working setups on MIPS and Power where
> >>> the device is in fact non-cache coherent, and relies on this
> >>> 'optimization' to get things working.
> >>
> >> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
> >> correct basic operation (the scenario Christian brought up is a very
> >> specialized use-case), so that shouldn't be an issue.
> >
> > The point is that this is only true for x86.
> >
> > On other architectures, the use of non-cached mappings on the CPU side
> > means that you /do/ rely on non-snooped transfers, since if those
> > transfers turn out not to snoop inadvertently, the accesses are
> > incoherent with the CPU's view of memory.
> 
>  The driver generally only uses non-cached mappings if
>  drm_arch/device_can_wc_memory returns true.
> >>>
> >>> Indeed. And so we should take care to only return 'true' from that
> >>> function if it is guaranteed that non-cached CPU mappings are coherent
> >>> with the mappings used by the GPU, either because that is always the
> >>> case (like on x86), or because we know that the platform in question
> >>> implements NoSnoop correctly throughout the interconnect.
> >>>
> >>> What seems to be complicating matters is that in some cases, the
> >>> device is non-cache coherent to begin with, so regardless of whether
> >>> the NoSnoop attribute is used or not, those accesses will not snoop in
> >>> the caches and be coherent with the non-cached mappings used by the
> >>> CPU. So if we restrict this optimization [on non-X86] to platforms
> >>> that are known to implement NoSnoop correctly, we may break platforms
> >>> that are implicitly NoSnoop all the time.
> >>
> >> Since the driver generally doesn't rely on non-snooped accesses for
> >> correctness, that couldn't "break" anything that hasn't always been broken.
> >
> > Again, that is only true on x86.
> >
> > On other architectures, DMA writes from the device may allocate in the
> > caches, and be invisible to the CPU when it uses non-cached mappings.
>
> Let me try one last time:
>

I could say the same :-)

> If drm_arch_can_wc_memory returns false, the driver falls back to the
> normal mode of operation, using a cacheable CPU mapping and snooped GPU
> transfers, even if userspace asks (as a performance optimization) for a
> write-combined CPU mapping and non-snooped GPU transfers via
> AMDGPU_GEM_CREATE_CPU_GTT_USWC.

I am not talking about the case where drm_arch_can_wc_memory() returns false.

I am talking about the case where it returns true, which is currently
the default for all architectures, except Power and MIPS in some
cases.

This mode of operation breaks my cache coherent arm64 system. (AMD Seattle)
With this patch applied, everything works fine.

> This normal mode of operation is also
> used for the ring buffers at the heart of the driver's operation.

But is it really the same mode of operation? Does it also vmap() the
pages? Or does it use the DMA API to allocate the ring buffers?
Because in the latter case, this will give you non-cached CPU mappings
as well if the device is non-cache coherent.

> If
> there is a platform where this normal mode of operation doesn't work,
> the driver could never have worked reliably on that platform, since
> before AMDGPU_GEM_CREATE_CPU_GTT_USWC or drm_arch_can_wc_memory even
> existed.
>

As I said, I am talking about the case where drm_arch_can_wc_memory()
returns true on a cache coherent system. This relies on NoSnoop being
implemented correctly in the platform, or a CPU architecture that
snoops the caches when doing uncached memory accesses (such as x86).


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Michel Dänzer
On 2019-01-21 7:28 p.m., Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 19:24, Michel Dänzer  wrote:
>> On 2019-01-21 7:20 p.m., Ard Biesheuvel wrote:
>>> On Mon, 21 Jan 2019 at 19:04, Michel Dänzer  wrote:
 On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:
>> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
>>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  
>>> wrote:
>>>
 Until that happens we should just change the driver ifdefs to default
 the hacks to off and only enable them on setups where we 100%
 positively know that they actually work.  And document that fact
 in big fat comments.
>>>
>>> Well, as I mentioned in my commit log as well, if we default to off
>>> unless CONFIG_X86, we may break working setups on MIPS and Power where
>>> the device is in fact non-cache coherent, and relies on this
>>> 'optimization' to get things working.
>>
>> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
>> correct basic operation (the scenario Christian brought up is a very
>> specialized use-case), so that shouldn't be an issue.
>
> The point is that this is only true for x86.
>
> On other architectures, the use of non-cached mappings on the CPU side
> means that you /do/ rely on non-snooped transfers, since if those
> transfers turn out not to snoop inadvertently, the accesses are
> incoherent with the CPU's view of memory.

 The driver generally only uses non-cached mappings if
 drm_arch/device_can_wc_memory returns true.
>>>
>>> Indeed. And so we should take care to only return 'true' from that
>>> function if it is guaranteed that non-cached CPU mappings are coherent
>>> with the mappings used by the GPU, either because that is always the
>>> case (like on x86), or because we know that the platform in question
>>> implements NoSnoop correctly throughout the interconnect.
>>>
>>> What seems to be complicating matters is that in some cases, the
>>> device is non-cache coherent to begin with, so regardless of whether
>>> the NoSnoop attribute is used or not, those accesses will not snoop in
>>> the caches and be coherent with the non-cached mappings used by the
>>> CPU. So if we restrict this optimization [on non-X86] to platforms
>>> that are known to implement NoSnoop correctly, we may break platforms
>>> that are implicitly NoSnoop all the time.
>>
>> Since the driver generally doesn't rely on non-snooped accesses for
>> correctness, that couldn't "break" anything that hasn't always been broken.
> 
> Again, that is only true on x86.
> 
> On other architectures, DMA writes from the device may allocate in the
> caches, and be invisible to the CPU when it uses non-cached mappings.

Let me try one last time:

If drm_arch_can_wc_memory returns false, the driver falls back to the
normal mode of operation, using a cacheable CPU mapping and snooped GPU
transfers, even if userspace asks (as a performance optimization) for a
write-combined CPU mapping and non-snooped GPU transfers via
AMDGPU_GEM_CREATE_CPU_GTT_USWC. This normal mode of operation is also
used for the ring buffers at the heart of the driver's operation. If
there is a platform where this normal mode of operation doesn't work,
the driver could never have worked reliably on that platform, since
before AMDGPU_GEM_CREATE_CPU_GTT_USWC or drm_arch_can_wc_memory even
existed.
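
A rough sketch of the fallback Michel describes (simplified and illustrative,
not the literal driver code): the USWC hint from userspace is only honoured
when the architecture check says write-combining is safe; otherwise the
buffer stays in the normal cacheable, snooped mode.

    #include <linux/types.h>
    #include <drm/drm_cache.h>
    #include <drm/amdgpu_drm.h>

    /* Sketch: filter the userspace buffer-object flags at creation time. */
    static u64 sketch_filter_bo_flags(u64 requested_flags)
    {
            /* If uncached CPU mappings are not guaranteed to be coherent
             * with the device on this architecture, drop the USWC hint and
             * fall back to cached mappings with snooped GPU transfers. */
            if (!drm_arch_can_wc_memory())
                    requested_flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;

            return requested_flags;
    }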


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] drm/amdgpu: Add missing power attribute to APU check

2019-01-21 Thread Alex Deucher
Ping again?

Alex

On Tue, Jan 15, 2019 at 11:32 AM Alex Deucher  wrote:
>
> Ping?
>
> On Wed, Jan 9, 2019 at 10:23 PM Alex Deucher  wrote:
> >
> > Add the missing power1_average attribute to the visibility check for
> > power attributes on APUs.  It was missed before.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > index 51eb2cf42b81..979d96278413 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> > @@ -1799,7 +1799,8 @@ static umode_t hwmon_attributes_visible(struct 
> > kobject *kobj,
> > effective_mode &= ~S_IWUSR;
> >
> > if ((adev->flags & AMD_IS_APU) &&
> > -   (attr == &sensor_dev_attr_power1_cap_max.dev_attr.attr ||
> > +   (attr == &sensor_dev_attr_power1_average.dev_attr.attr ||
> > +attr == &sensor_dev_attr_power1_cap_max.dev_attr.attr ||
> >  attr == &sensor_dev_attr_power1_cap_min.dev_attr.attr||
> >  attr == &sensor_dev_attr_power1_cap.dev_attr.attr))
> > return 0;
> > --
> > 2.20.1
> >


Re: [PATCH] drm/radeon: check if device is root before getting pci speed caps

2019-01-21 Thread Alex Deucher
Ping?

On Thu, Jan 17, 2019 at 2:44 PM Alex Deucher  wrote:
>
> Check whether the device is on the root bus before attempting to see what
> speeds the PCIe port supports.  Fixes a crash with PCI passthrough
> in a VM.
>
> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109366
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/radeon/ci_dpm.c | 5 +++--
>  drivers/gpu/drm/radeon/si_dpm.c | 5 +++--
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
> index d587779a80b4..a97294ac96d5 100644
> --- a/drivers/gpu/drm/radeon/ci_dpm.c
> +++ b/drivers/gpu/drm/radeon/ci_dpm.c
> @@ -5676,7 +5676,7 @@ int ci_dpm_init(struct radeon_device *rdev)
> u16 data_offset, size;
> u8 frev, crev;
> struct ci_power_info *pi;
> -   enum pci_bus_speed speed_cap;
> +   enum pci_bus_speed speed_cap = PCI_SPEED_UNKNOWN;
> struct pci_dev *root = rdev->pdev->bus->self;
> int ret;
>
> @@ -5685,7 +5685,8 @@ int ci_dpm_init(struct radeon_device *rdev)
> return -ENOMEM;
> rdev->pm.dpm.priv = pi;
>
> -   speed_cap = pcie_get_speed_cap(root);
> +   if (!pci_is_root_bus(rdev->pdev->bus))
> +   speed_cap = pcie_get_speed_cap(root);
> if (speed_cap == PCI_SPEED_UNKNOWN) {
> pi->sys_pcie_mask = 0;
> } else {
> diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
> index 8fb60b3af015..0a785ef0ab66 100644
> --- a/drivers/gpu/drm/radeon/si_dpm.c
> +++ b/drivers/gpu/drm/radeon/si_dpm.c
> @@ -6899,7 +6899,7 @@ int si_dpm_init(struct radeon_device *rdev)
> struct ni_power_info *ni_pi;
> struct si_power_info *si_pi;
> struct atom_clock_dividers dividers;
> -   enum pci_bus_speed speed_cap;
> +   enum pci_bus_speed speed_cap = PCI_SPEED_UNKNOWN;
> struct pci_dev *root = rdev->pdev->bus->self;
> int ret;
>
> @@ -6911,7 +6911,8 @@ int si_dpm_init(struct radeon_device *rdev)
> eg_pi = &ni_pi->eg;
> pi = &eg_pi->rv7xx;
>
> -   speed_cap = pcie_get_speed_cap(root);
> +   if (!pci_is_root_bus(rdev->pdev->bus))
> +   speed_cap = pcie_get_speed_cap(root);
> if (speed_cap == PCI_SPEED_UNKNOWN) {
> si_pi->sys_pcie_mask = 0;
> } else {
> --
> 2.20.1
>


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 19:24, Michel Dänzer  wrote:
>
> On 2019-01-21 7:20 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 19:04, Michel Dänzer  wrote:
> >>
> >> On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:
> 
>  On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  
> > wrote:
> >
> >> Until that happens we should just change the driver ifdefs to default
> >> the hacks to off and only enable them on setups where we 100%
> >> positively know that they actually work.  And document that fact
> >> in big fat comments.
> >
> > Well, as I mentioned in my commit log as well, if we default to off
> > unless CONFIG_X86, we may break working setups on MIPS and Power where
> > the device is in fact non-cache coherent, and relies on this
> > 'optimization' to get things working.
> 
>  FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
>  correct basic operation (the scenario Christian brought up is a very
>  specialized use-case), so that shouldn't be an issue.
> 
> >>>
> >>> The point is that this is only true for x86.
> >>>
> >>> On other architectures, the use of non-cached mappings on the CPU side
> >>> means that you /do/ rely on non-snooped transfers, since if those
> >>> transfers turn out not to snoop inadvertently, the accesses are
> >>> incoherent with the CPU's view of memory.
> >>
> >> The driver generally only uses non-cached mappings if
> >> drm_arch/device_can_wc_memory returns true.
> >>
> >
> > Indeed. And so we should take care to only return 'true' from that
> > function if it is guaranteed that non-cached CPU mappings are coherent
> > with the mappings used by the GPU, either because that is always the
> > case (like on x86), or because we know that the platform in question
> > implements NoSnoop correctly throughout the interconnect.
> >
> > What seems to be complicating matters is that in some cases, the
> > device is non-cache coherent to begin with, so regardless of whether
> > the NoSnoop attribute is used or not, those accesses will not snoop in
> > the caches and be coherent with the non-cached mappings used by the
> > CPU. So if we restrict this optimization [on non-X86] to platforms
> > that are known to implement NoSnoop correctly, we may break platforms
> > that are implicitly NoSnoop all the time.
>
> Since the driver generally doesn't rely on non-snooped accesses for
> correctness, that couldn't "break" anything that hasn't always been broken.
>

Again, that is only true on x86.

On other architectures, DMA writes from the device may allocate in the
caches, and be invisible to the CPU when it uses non-cached mappings.


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 19:04, Michel Dänzer  wrote:
>
> On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:
> >>
> >> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  
> >>> wrote:
> >>>
>  Until that happens we should just change the driver ifdefs to default
>  the hacks to off and only enable them on setups where we 100%
>  positively know that they actually work.  And document that fact
>  in big fat comments.
> >>>
> >>> Well, as I mentioned in my commit log as well, if we default to off
> >>> unless CONFIG_X86, we may break working setups on MIPS and Power where
> >>> the device is in fact non-cache coherent, and relies on this
> >>> 'optimization' to get things working.
> >>
> >> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
> >> correct basic operation (the scenario Christian brought up is a very
> >> specialized use-case), so that shouldn't be an issue.
> >>
> >
> > The point is that this is only true for x86.
> >
> > On other architectures, the use of non-cached mappings on the CPU side
> > means that you /do/ rely on non-snooped transfers, since if those
> > transfers turn out not to snoop inadvertently, the accesses are
> > incoherent with the CPU's view of memory.
>
> The driver generally only uses non-cached mappings if
> drm_arch/device_can_wc_memory returns true.
>

Indeed. And so we should take care to only return 'true' from that
function if it is guaranteed that non-cached CPU mappings are coherent
with the mappings used by the GPU, either because that is always the
case (like on x86), or because we know that the platform in question
implements NoSnoop correctly throughout the interconnect.

What seems to be complicating matters is that in some cases, the
device is non-cache coherent to begin with, so regardless of whether
the NoSnoop attribute is used or not, those accesses will not snoop in
the caches and be coherent with the non-cached mappings used by the
CPU. So if we restrict this optimization [on non-X86] to platforms
that are known to implement NoSnoop correctly, we may break platforms
that are implicitly NoSnoop all the time.


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Michel Dänzer
On 2019-01-21 7:20 p.m., Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 19:04, Michel Dänzer  wrote:
>>
>> On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
>>> On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:

 On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  
> wrote:
>
>> Until that happens we should just change the driver ifdefs to default
>> the hacks to off and only enable them on setups where we 100%
>> positively know that they actually work.  And document that fact
>> in big fat comments.
>
> Well, as I mentioned in my commit log as well, if we default to off
> unless CONFIG_X86, we may break working setups on MIPS and Power where
> the device is in fact non-cache coherent, and relies on this
> 'optimization' to get things working.

 FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
 correct basic operation (the scenario Christian brought up is a very
 specialized use-case), so that shouldn't be an issue.

>>>
>>> The point is that this is only true for x86.
>>>
>>> On other architectures, the use of non-cached mappings on the CPU side
>>> means that you /do/ rely on non-snooped transfers, since if those
>>> transfers turn out not to snoop inadvertently, the accesses are
>>> incoherent with the CPU's view of memory.
>>
>> The driver generally only uses non-cached mappings if
>> drm_arch/device_can_wc_memory returns true.
>>
> 
> Indeed. And so we should take care to only return 'true' from that
> function if it is guaranteed that non-cached CPU mappings are coherent
> with the mappings used by the GPU, either because that is always the
> case (like on x86), or because we know that the platform in question
> implements NoSnoop correctly throughout the interconnect.
> 
> What seems to be complicating matters is that in some cases, the
> device is non-cache coherent to begin with, so regardless of whether
> the NoSnoop attribute is used or not, those accesses will not snoop in
> the caches and be coherent with the non-cached mappings used by the
> CPU. So if we restrict this optimization [on non-X86] to platforms
> that are known to implement NoSnoop correctly, we may break platforms
> that are implicitly NoSnoop all the time.

Since the driver generally doesn't rely on non-snooped accesses for
correctness, that couldn't "break" anything that hasn't always been broken.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Michel Dänzer
On 2019-01-21 6:59 p.m., Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:
>>
>> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
>>> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  wrote:
>>>
 Until that happens we should just change the driver ifdefs to default
 the hacks to off and only enable them on setups where we 100%
 positively know that they actually work.  And document that fact
 in big fat comments.
>>>
>>> Well, as I mentioned in my commit log as well, if we default to off
>>> unless CONFIG_X86, we may break working setups on MIPS and Power where
>>> the device is in fact non-cache coherent, and relies on this
>>> 'optimization' to get things working.
>>
>> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
>> correct basic operation (the scenario Christian brought up is a very
>> specialized use-case), so that shouldn't be an issue.
>>
> 
> The point is that this is only true for x86.
> 
> On other architectures, the use of non-cached mappings on the CPU side
> means that you /do/ rely on non-snooped transfers, since if those
> transfers turn out not to snoop inadvertently, the accesses are
> incoherent with the CPU's view of memory.

The driver generally only uses non-cached mappings if
drm_arch/device_can_wc_memory returns true.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 17:35, Christoph Hellwig  wrote:
>
> On Mon, Jan 21, 2019 at 05:30:00PM +0100, Ard Biesheuvel wrote:
> > > Until that happens we should just change the driver ifdefs to default
> > > the hacks to off and only enable them on setups where we 100%
> > > positively know that they actually work.  And document that fact
> > > in big fat comments.
> >
> > Well, as I mentioned in my commit log as well, if we default to off
> > unless CONFIG_X86, we may break working setups on MIPS and Power where
> > the device is in fact non-cache coherent, and relies on this
> > 'optimization' to get things working. The same could be true for
> > non-coherent ARM systems, hence my approach to disable this hack for
> > cache coherent devices on non-X86 only.
>
> Do we break existing setups or just reduce performance due to the lack
> of WC mappings?  I thought it was the latter.

Combining this check

#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
return false;
...
#else
return true;
#endif

with the fact that the driver assumes that DMA is cache coherent, I'd
be surprised if this hardware works without the hack.

The optimization targets cache coherent systems where pushing data
through the cache into the GPU is wasteful, given that it evicts other
data and relies on snooping by the device. In this case, disabling the
optimization reduces performance.

My point is that it is highly likely that non-cache coherent systems
may accidentally be in a working state only due to this optimization,
and not because the driver deals with non-cache coherent DMA
correctly.

> The point is that
> even your check won't do what you actually said.

Well, it enables the optimization only for non-cache coherent devices.
How is that different from what I said?

>  A lot of non-Loongson
> MIPS platforms are not cache coherent.

You are assuming that whoever added that check cared about other MIPS platforms.

> As are a lot of arm setups
> and all sparc64 ones for example.  And the chance that someone will
> hack this file out in a random subsystem when adding new ports also
> is rather slim, so we'll remain broken by default.
>
> That is why I want at the very least:  a whitelist instead of a blacklist
> and some very big comments explaining what is going on here.

Ideally, we'd turn off this optimization entirely unless running on
x86. That would be my preference.

However, given the above re non-cache coherent systems, I am cautious.

>  And in
> the mid to long term even drm really needs to learn to use the
> proper APIs :(

+1


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 18:55, Michel Dänzer  wrote:
>
> On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  wrote:
> >
> >> Until that happens we should just change the driver ifdefs to default
> >> the hacks to off and only enable them on setups where we 100%
> >> positively know that they actually work.  And document that fact
> >> in big fat comments.
> >
> > Well, as I mentioned in my commit log as well, if we default to off
> > unless CONFIG_X86, we may break working setups on MIPS and Power where
> > the device is in fact non-cache coherent, and relies on this
> > 'optimization' to get things working.
>
> FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
> correct basic operation (the scenario Christian brought up is a very
> specialized use-case), so that shouldn't be an issue.
>

The point is that this is only true for x86.

On other architectures, the use of non-cached mappings on the CPU side
means that you /do/ rely on non-snooped transfers, since if those
transfers turn out not to snoop inadvertently, the accesses are
incoherent with the CPU's view of memory.


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Michel Dänzer
On 2019-01-21 5:30 p.m., Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  wrote:
> 
>> Until that happens we should just change the driver ifdefs to default
>> the hacks to off and only enable them on setups where we 100%
>> positively know that they actually work.  And document that fact
>> in big fat comments.
> 
> Well, as I mentioned in my commit log as well, if we default to off
> unless CONFIG_X86, we may break working setups on MIPS and Power where
> the device is in fact non-cache coherent, and relies on this
> 'optimization' to get things working.

FWIW, the amdgpu driver doesn't rely on non-snooped transfers for
correct basic operation (the scenario Christian brought up is a very
specialized use-case), so that shouldn't be an issue.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

2019-01-21 Thread Grodzovsky, Andrey
+ Harry

Looks like this is happening during GPU reset due to job time out. I would 
first try to reproduce this just with a plain reset from sysfs.
Mikhail, also please provide addr2line for dce110_setup_audio_dto.isra.8+0x171

Andrey

On 01/20/2019 06:59 AM, Mikhail Gavrilov wrote:

Hi folks.
Yet another kernel panic just happens:

[ 5605.179568] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000008
[ 5605.179572] #PF error: [normal kernel read fault]
[ 5605.179573] PGD 0 P4D 0
[ 5605.179576] Oops:  [#1] SMP NOPTI
[ 5605.179578] CPU: 5 PID: 2087 Comm: gnome-shell Tainted: G C
   5.0.0-0.rc2.git4.3.fc30.x86_64 #1
[ 5605.179580] Hardware name: System manufacturer System Product
Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
[ 5605.179627] RIP: 0010:build_audio_output.isra.2+0x9a/0x110 [amdgpu]
[ 5605.179629] Code: 48 89 43 24 8b 95 18 01 00 00 89 53 18 89 53 1c
48 8b 45 08 8b 80 c4 02 00 00 83 f8 04 74 51 83 e8 20 83 e0 df 75 13
48 8b 3f <48> 8b 47 08 48 8b 40 08 e8 89 3e 56 f2 89 43 2c 8b 85 58 02
00 00
[ 5605.179631] RSP: 0018:b472c8f5b8b0 EFLAGS: 00010246
[ 5605.179632] RAX:  RBX: b472c8f5b8cc RCX: 99e027772800
[ 5605.179634] RDX: 00082302 RSI: 00879ec0 RDI: 
[ 5605.179635] RBP: 99dff0f74188 R08: 0008 R09: 0100
[ 5605.179636] R10: 0001 R11: 000f R12: 99dff0f74000
[ 5605.179637] R13: 99e027cec938 R14: 99dc099b R15: 99e027cec800
[ 5605.179639] FS:  7f37b5383640() GS:99e03d60()
knlGS:
[ 5605.179640] CS:  0010 DS:  ES:  CR0: 80050033
[ 5605.179641] CR2: 0008 CR3: 0007f2b28000 CR4: 003406e0
[ 5605.179642] Call Trace:
[ 5605.179688]  dce110_setup_audio_dto.isra.8+0x171/0x1b0 [amdgpu]
[ 5605.179734]  ? bios_set_scratch_critical_state+0x79/0x140 [amdgpu]
[ 5605.179746]  dce110_apply_ctx_to_hw+0x154/0x4a0 [amdgpu]
[ 5605.179792]  ? dm_pp_apply_display_requirements+0x196/0x1a0 [amdgpu]
[ 5605.179792]  ? dce12_update_clocks+0xea/0x120 [amdgpu]
[ 5605.179792]  dc_commit_state+0x2af/0x500 [amdgpu]
[ 5605.179792]  amdgpu_dm_atomic_commit_tail+0x1ef/0xb90 [amdgpu]
[ 5605.179792]  commit_tail+0x3d/0x70 [drm_kms_helper]
[ 5605.179792]  drm_atomic_helper_commit+0xdf/0x150 [drm_kms_helper]
[ 5605.179792]  drm_atomic_helper_set_config+0x75/0x80 [drm_kms_helper]
[ 5605.179792]  drm_mode_setcrtc+0x1a4/0x6e0 [drm]
[ 5605.179792]  ? drm_mode_getcrtc+0x180/0x180 [drm]
[ 5605.179792]  drm_ioctl_kernel+0xa9/0xf0 [drm]
[ 5605.179792]  drm_ioctl+0x201/0x3a0 [drm]
[ 5605.179792]  ? drm_mode_getcrtc+0x180/0x180 [drm]
[ 5605.179792]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 5605.179792]  do_vfs_ioctl+0xa5/0x6f0
[ 5605.179792]  ksys_ioctl+0x60/0x90
[ 5605.179792]  __x64_sys_ioctl+0x16/0x20
[ 5605.179792]  do_syscall_64+0x60/0x1f0
[ 5605.179792]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 5605.179792] RIP: 0033:0x7f37b8eda39b
[ 5605.179792] Code: 0f 1e fa 48 8b 05 ed da 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd da 0c 00 f7 d8 64 89
01 48
[ 5605.179792] RSP: 002b:7ffdd307f5a8 EFLAGS: 0246 ORIG_RAX:
0010
[ 5605.179792] RAX: ffda RBX: 7ffdd307f5e0 RCX: 7f37b8eda39b
[ 5605.179792] RDX: 7ffdd307f5e0 RSI: c06864a2 RDI: 000b
[ 5605.179792] RBP: 7ffdd307f5e0 R08:  R09: 5573cad41d40
[ 5605.179792] R10: 7f37980091e0 R11: 0246 R12: c06864a2
[ 5605.179792] R13: 000b R14:  R15: 7f37980091e0
[ 5605.179792] Modules linked in: fuse rfcomm xt_CHECKSUM
ipt_MASQUERADE tun bridge stp llc devlink nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ip6table_nat nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc
vfat fat arc4 edac_mce_amd r8822be(C) kvm_amd kvm joydev
snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio
snd_hda_codec_hdmi uvcvideo mac80211 snd_hda_intel videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 snd_hda_codec snd_usb_audio
videobuf2_common crct10dif_pclmul crc32_pclmul snd_hda_core videodev
snd_usbmidi_lib snd_rawmidi snd_hwdep ghash_clmulni_intel media
snd_seq snd_seq_device btusb btrtl btbcm btintel snd_pcm bluetooth
wmi_bmof cfg80211 eeepc_wmi asus_wmi
[ 5605.179792]  sparse_keymap video snd_timer snd sp5100_tco i2c_piix4
k10temp soundcore ecdh_generic ccp rfkill pcc_cpufreq gpio_amdpt
gpio_generic acpi_cpufreq amdgpu hid_logitech_hidpp chash amd_iommu_v2
gpu_sched ttm drm_kms_helper hid_logitech_dj crc32c_intel drm igb dca
i2c_algo_bit nvme nvme_core wmi 

Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig  wrote:
>
> On Mon, Jan 21, 2019 at 05:14:37PM +0100, Ard Biesheuvel wrote:
> > > I'll add big fat comments.  But the fact that nothing is exported
> > > there should be a fairly big hint.
> > >
> >
> > I don't follow. How do other header files 'export' things in a way
> > that this header doesn't?
>
> Well, I'll add comments to make it more obvious..
>
> > As far as I can tell, these drivers allocate DMA'able memory [in
> > ttm_tt_populate()] and subsequently create their own CPU mappings for
> > it, assuming that
> > a) the default is cache coherent, so vmap()ing those pages with
> > cacheable attributes works, and
>
> Yikes.  vmapping with different attributes is generally prone to
> create problems on a lot of architectures.
>

Indeed. But if your starting point is the assumption that DMA is
always cache coherent, those vmap() attributes are never different.

> > b) telling the GPU to use NoSnoop attributes makes the accesses it
> > performs coherent with non-cacheable CPU mappings of those physical
> > pages
> >
> > Since the latter is not true for many arm64 systems, I need this patch
> > to get a working system.
>
> Do we know that this actually works anywhere but x86?
>

In theory, it could work on arm64 systems with stage2-only SMMUs and
correctly configured PCIe RCs that set the right AMBA attributes for
inbound transactions with the NoSnoop attributes set.

Unfortunately, it seems that the current ARM SMMU code will clobber
those AMBA attributes when it uses stage1 mappings, since it forces
the memory attributes to WBWA for cache coherent devices.

So, as I pointed out in the commit log, the main difference between
x86 and other arches it that it can easily tolerate when NoSnoop is
non-functional.

> In general I would call the above sequence rather bogus and would
> prefer we could get rid of such antipatterns in the kernel and just use
> dma_alloc_attrs with DMA_ATTR_WRITECOMBINE if we want writecombine
> semantics.
>

Agreed.

> Until that happens we should just change the driver ifdefs to default
> the hacks to off and only enable them on setups where we 100%
> positively know that they actually work.  And document that fact
> in big fat comments.

Well, as I mentioned in my commit log as well, if we default to off
unless CONFIG_X86, we may break working setups on MIPS and Power where
the device is in fact non-cache coherent, and relies on this
'optimization' to get things working. The same could be true for
non-coherent ARM systems, hence my approach to disable this hack for
cache coherent devices on non-X86 only.


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 16:59, Christoph Hellwig  wrote:
>
> On Mon, Jan 21, 2019 at 04:33:27PM +0100, Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 16:07, Christoph Hellwig  wrote:
> > >
> > > > +#include 
> > >
> > > This header is not for usage in device drivers, but purely for
> > > dma-mapping implementations!
> > >
> >
> > Is that documented anywhere?
>
> I'll add big fat comments.  But the fact that nothing is exported
> there should be a fairly big hint.
>

I don't follow. How do other header files 'export' things in a way
that this header doesn't?

> > > And even if something like this was valid to do, it would have to be
> > > a core function with an arch hook, and not hidden in a random driver.
> >
> > Why would it not be valid to do? Is it wrong for a driver to be aware
> > of whether a device is cache coherent or not?
> >
> > And in case it isn't, do you have an alternative suggestion on how to
> > fix this mess?
>
> For the write combine mappings we need a proper core API for how instances
> can advertise the support.  One thing I want to do fairly soon is
> error checking of the attrs argument to dma_alloc_attrs - so if you
> pass in something unsupported it will give you back an error.
>

That sounds useful.

> It seems that isn't quite enough for the drm use case, so we might
> also need a way to query these features, but that really has to go
> through the usual dma layer abstraction as well and not hacked together
> in a driver based on an educated guesstimate.

As far as I can tell, these drivers allocate DMA'able memory [in
ttm_tt_populate()] and subsequently create their own CPU mappings for
it, assuming that
a) the default is cache coherent, so vmap()ing those pages with
cacheable attributes works, and
b) telling the GPU to use NoSnoop attributes makes the accesses it
performs coherent with non-cacheable CPU mappings of those physical
pages

Since the latter is not true for many arm64 systems, I need this patch
to get a working system.
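
To make assumption (a) concrete, here is a sketch of the mapping pattern
being discussed (not the actual TTM code; the helper name is illustrative):
the same pages are vmap()ed either cacheable or write-combined, and the
driver assumes the device's accesses stay coherent with whichever was chosen.

    #include <linux/vmalloc.h>
    #include <linux/mm.h>

    /* Sketch: map already-allocated pages with cacheable or WC attributes. */
    static void *sketch_map_pages(struct page **pages, unsigned int count,
                                  bool want_wc)
    {
            pgprot_t prot = want_wc ? pgprot_writecombine(PAGE_KERNEL)
                                    : PAGE_KERNEL;

            return vmap(pages, count, VM_MAP, prot);
    }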

So how do you propose we proceed to get this fixed? Does it have to
wait for this proper core API plus followup changes for DRM?


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Christoph Hellwig
On Mon, Jan 21, 2019 at 05:30:00PM +0100, Ard Biesheuvel wrote:
> > Until that happens we should just change the driver ifdefs to default
> > the hacks to off and only enable them on setups where we 100%
> > positively know that they actually work.  And document that fact
> > in big fat comments.
> 
> Well, as I mentioned in my commit log as well, if we default to off
> unless CONFIG_X86, we may break working setups on MIPS and Power where
> the device is in fact non-cache coherent, and relies on this
> 'optimization' to get things working. The same could be true for
> non-coherent ARM systems, hence my approach to disable this hack for
> cache coherent devices on non-X86 only.

Do we break existing setups or just reduce performance due to the lack
of WC mappings?  I thought it was the latter.  The point is that
even your check won't do what you actually said.  A lot of non-Loongson
MIPS platforms are not cache coherent.  As are a lot of arm setups
and all sparc64 ones for example.  And the chance that someone will
hack this file out in a random subsystem when adding new ports also
is rather slim, so we'll remain broken by default.

That is why I want at the very least:  a whitelist instead of a blacklist
and some very big comments explaining what is going on here.  And in
the mid to long term even drm really needs to learn to use the
proper APIs :(


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Christoph Hellwig
On Mon, Jan 21, 2019 at 05:14:37PM +0100, Ard Biesheuvel wrote:
> > I'll add big fat comments.  But the fact that nothing is exported
> > there should be a fairly big hint.
> >
> 
> I don't follow. How do other header files 'export' things in a way
> that this header doesn't?

Well, I'll add comments to make it more obvious..

> As far as I can tell, these drivers allocate DMA'able memory [in
> ttm_tt_populate()] and subsequently create their own CPU mappings for
> it, assuming that
> a) the default is cache coherent, so vmap()ing those pages with
> cacheable attributes works, and

Yikes.  vmapping with different attributes is generally prone to
create problems on a lot of architectures.

> b) telling the GPU to use NoSnoop attributes makes the accesses it
> performs coherent with non-cacheable CPU mappings of those physical
> pages
> 
> Since the latter is not true for many arm64 systems, I need this patch
> to get a working system.

Do we know that this actually works anywhere but x86?

In general I would call the above sequence rather bogus and would
prefer we could get rid of such antipatterns in the kernel and just use
dma_alloc_attrs with DMA_ATTR_WRITECOMBINE if we want writecombine
semantics.
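
For reference, a minimal sketch of the allocation pattern suggested here (the
attribute is spelled DMA_ATTR_WRITE_COMBINE in the kernel; the wrapper
function is made up for illustration):

    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    /* Sketch: let the DMA API hand out a write-combined CPU mapping instead
     * of vmap()ing pages behind its back with custom attributes. */
    static void *sketch_alloc_wc(struct device *dev, size_t size,
                                 dma_addr_t *dma_handle)
    {
            return dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL,
                                   DMA_ATTR_WRITE_COMBINE);
    }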

Until that happens we should just change the driver ifdefs to default
the hacks to off and only enable them on setups where we 100%
positively know that they actually work.  And document that fact
in big fat comments.


Re: [PATCH 2/2] drm/amd/powerplay: run btc before enabling all SMU features

2019-01-21 Thread Deucher, Alexander
Series is:

Acked-by: Alex Deucher 


From: amd-gfx  on behalf of Evan Quan 

Sent: Monday, January 21, 2019 4:46:07 AM
To: amd-gfx@lists.freedesktop.org
Cc: Quan, Evan
Subject: [PATCH 2/2] drm/amd/powerplay: run btc before enabling all SMU features

BTC is needed before enabling all SMU features.

Change-Id: Ic717226528f4d09a58264524b2d8e67150a35da7
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index 60a22d8da7f0..5085b3636f8e 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -804,6 +804,11 @@ static int vega20_set_allowed_featuresmask(struct pp_hwmgr 
*hwmgr)
 return 0;
 }

+static int vega20_run_btc(struct pp_hwmgr *hwmgr)
+{
+   return smum_send_msg_to_smc(hwmgr, PPSMC_MSG_RunBtc);
+}
+
 static int vega20_run_btc_afll(struct pp_hwmgr *hwmgr)
 {
 return smum_send_msg_to_smc(hwmgr, PPSMC_MSG_RunAfllBtc);
@@ -1565,6 +1570,11 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr 
*hwmgr)
 "[EnableDPMTasks] Failed to initialize SMC table!",
 return result);

+   result = vega20_run_btc(hwmgr);
+   PP_ASSERT_WITH_CODE(!result,
+   "[EnableDPMTasks] Failed to run btc!",
+   return result);
+
 result = vega20_run_btc_afll(hwmgr);
 PP_ASSERT_WITH_CODE(!result,
 "[EnableDPMTasks] Failed to run btc afll!",
--
2.20.1



Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Christoph Hellwig
On Mon, Jan 21, 2019 at 04:33:27PM +0100, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 16:07, Christoph Hellwig  wrote:
> >
> > > +#include 
> >
> > This header is not for usage in device drivers, but purely for
> > dma-mapping implementations!
> >
> 
> Is that documented anywhere?

I'll add big fat comments.  But the fact that nothing is exported
there should be a fairly big hint.

> > And even if something like this was valid to do, it would have to be
> > a core function with an arch hook, and not hidden in a random driver.
> 
> Why would it not be valid to do? Is it wrong for a driver to be aware
> of whether a device is cache coherent or not?
> 
> And in case it isn't, do you have an alternative suggestion on how to
> fix this mess?

For the write-combine mappings we need a proper core API for how
instances can advertise their support.  One thing I want to do fairly
soon is error checking of the attrs argument to dma_alloc_attrs - so if
you pass in something unsupported it will give you back an error.

It seems that isn't quite enough for the drm use case, so we might
also need a way to query these features, but that really has to go
through the usual dma layer abstraction as well and not be hacked
together in a driver based on an educated guesstimate.
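
As an illustration of how a driver might consume that error checking once
it exists; the fallback policy and the function name below are assumptions,
not anything decided in this thread:

#include <linux/dma-mapping.h>

/* Try a write-combine allocation first; if the platform rejects the
 * attribute (or the allocation fails), fall back to the default cache
 * coherent allocation and remember that the WC/NoSnoop path is off. */
static void *example_alloc_buffer(struct device *dev, size_t size,
				  dma_addr_t *dma_handle, bool *is_wc)
{
	void *buf;

	buf = dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL,
			      DMA_ATTR_WRITE_COMBINE);
	if (buf) {
		*is_wc = true;
		return buf;
	}

	*is_wc = false;
	return dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL, 0);
}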
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 16:07, Christoph Hellwig  wrote:
>
> > +#include 
>
> This header is not for usage in device drivers, but purely for
> dma-mapping implementations!
>

Is that documented anywhere?

> > +static inline bool drm_device_can_wc_memory(struct drm_device *ddev)
> >  {
> > + if (IS_ENABLED(CONFIG_PPC))
> > + return IS_ENABLED(CONFIG_NOT_COHERENT_CACHE);
> > + else if (IS_ENABLED(CONFIG_MIPS))
> > + return !IS_ENABLED(CONFIG_CPU_LOONGSON3);
> > + else if (IS_ENABLED(CONFIG_X86))
> > + return true;
> > +
> > + return !dev_is_dma_coherent(ddev->dev);
>
> And even if something like this was valid to do, it would have to be
> a core function with an arch hook, and not hidden in a random driver.

Why would it not be valid to do? Is it wrong for a driver to be aware
of whether a device is cache coherent or not?

And in case it isn't, do you have an alternative suggestion on how to
fix this mess?
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/display: change the max clock level to 16

2019-01-21 Thread Alex Deucher
On Mon, Jan 21, 2019 at 5:02 AM Evan Quan  wrote:
>
> The gfxclk for SMU11 can have at most 16 discrete levels.
>
> Change-Id: I0c6a8db8f40206a240286471c6f7b1fffef15ea2
> Signed-off-by: Evan Quan 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/display/dc/dm_services_types.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dm_services_types.h 
> b/drivers/gpu/drm/amd/display/dc/dm_services_types.h
> index 9afd36a031a9..77200711abbe 100644
> --- a/drivers/gpu/drm/amd/display/dc/dm_services_types.h
> +++ b/drivers/gpu/drm/amd/display/dc/dm_services_types.h
> @@ -92,7 +92,7 @@ enum dm_pp_clock_type {
> (clk_type) == DM_PP_CLOCK_TYPE_FCLK ? "F" : \
> "Invalid"
>
> -#define DM_PP_MAX_CLOCK_LEVELS 8
> +#define DM_PP_MAX_CLOCK_LEVELS 16
>
>  struct dm_pp_clock_levels {
> uint32_t num_levels;
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Christoph Hellwig
> +#include 

This header is not for usage in device drivers, but purely for
dma-mapping implementations!

> +static inline bool drm_device_can_wc_memory(struct drm_device *ddev)
>  {
> + if (IS_ENABLED(CONFIG_PPC))
> + return IS_ENABLED(CONFIG_NOT_COHERENT_CACHE);
> + else if (IS_ENABLED(CONFIG_MIPS))
> + return !IS_ENABLED(CONFIG_CPU_LOONGSON3);
> + else if (IS_ENABLED(CONFIG_X86))
> + return true;
> +
> + return !dev_is_dma_coherent(ddev->dev);

And even if something like this was valid to do, it would have to be
a core function with an arch hook, and not hidden in a random driver.
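
A minimal sketch of what such a core function with an arch hook could look
like; every name here is hypothetical and only meant to illustrate the
suggested structure:

#include <linux/compiler.h>
#include <linux/device.h>

/* Weak default, overridable by an architecture that knows better. */
bool __weak arch_can_wc_memory(struct device *dev)
{
	return true;
}

/* Core helper that drivers would call instead of open-coding
 * architecture checks in a driver header. */
bool drm_core_can_wc_memory(struct device *dev)
{
	return arch_can_wc_memory(dev);
}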
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Ard Biesheuvel
Currently, the DRM code assumes that PCI devices are always cache
coherent for DMA, and that this can be selectively overridden for
some buffers using non-cached mappings on the CPU side and PCIe
NoSnoop transactions on the bus side.

Whether the NoSnoop part is implemented correctly is highly platform
specific. Whether it /matters/ if NoSnoop is implemented correctly or
not is architecture specific: on x86, such transactions are coherent
with the CPU whether the NoSnoop attribute is honored or not. On other
architectures, it depends on whether such transactions may allocate in
caches that are non-coherent with the CPU's uncached mappings.

Bottom line is that we should not rely on this optimization to work
correctly for cache coherent devices in the general case. On the
other hand, disabling this optimization for non-coherent devices
is likely to cause breakage as well, since the driver will assume
cache coherent PCIe if this optimization is turned off.

So rename drm_arch_can_wc_memory() to drm_device_can_wc_memory(), and
pass the drm_device pointer into it so we can base the return value
on whether the device is cache coherent or not if not running on
X86.

Cc: Christian Koenig 
Cc: Alex Deucher 
Cc: David Zhou 
Cc: Huang Rui 
Cc: Junwei Zhang 
Cc: Michel Daenzer 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Sean Paul 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Will Deacon 
Reported-by: Carsten Haitzler 
Signed-off-by: Ard Biesheuvel 
---
This is a followup to '[RFC PATCH] drm/ttm: force cached mappings for system
RAM on ARM'

https://lore.kernel.org/linux-arm-kernel/20190110072841.3283-1-ard.biesheu...@linaro.org/

Without t
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 +-
 drivers/gpu/drm/radeon/radeon_object.c |  2 +-
 include/drm/drm_cache.h| 19 +++
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 728e15e5d68a..777fa251838f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -480,7 +480,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
/* For architectures that don't support WC memory,
 * mask out the WC flag from the BO
 */
-   if (!drm_arch_can_wc_memory())
+   if (!drm_device_can_wc_memory(adev->ddev))
bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
 #endif
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index 833e909706a9..610889bf6ab5 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -249,7 +249,7 @@ int radeon_bo_create(struct radeon_device *rdev,
/* For architectures that don't support WC memory,
 * mask out the WC flag from the BO
 */
-   if (!drm_arch_can_wc_memory())
+   if (!drm_device_can_wc_memory(rdev->ddev))
bo->flags &= ~RADEON_GEM_GTT_WC;
 #endif
 
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index bfe1639df02d..ced63b1207a3 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -33,6 +33,8 @@
 #ifndef _DRM_CACHE_H_
 #define _DRM_CACHE_H_
 
+#include 
+#include 
 #include 
 
 void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
@@ -41,15 +43,16 @@ void drm_clflush_virt_range(void *addr, unsigned long 
length);
 u64 drm_get_max_iomem(void);
 
 
-static inline bool drm_arch_can_wc_memory(void)
+static inline bool drm_device_can_wc_memory(struct drm_device *ddev)
 {
-#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
-   return false;
-#elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
-   return false;
-#else
-   return true;
-#endif
+   if (IS_ENABLED(CONFIG_PPC))
+   return IS_ENABLED(CONFIG_NOT_COHERENT_CACHE);
+   else if (IS_ENABLED(CONFIG_MIPS))
+   return !IS_ENABLED(CONFIG_CPU_LOONGSON3);
+   else if (IS_ENABLED(CONFIG_X86))
+   return true;
+
+   return !dev_is_dma_coherent(ddev->dev);
 }
 
 #endif
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC PATCH] drm: disable WC optimization for cache coherent devices on non-x86

2019-01-21 Thread Koenig, Christian
On 21.01.19 at 11:06, Ard Biesheuvel wrote:
> Currently, the DRM code assumes that PCI devices are always cache
> coherent for DMA, and that this can be selectively overridden for
> some buffers using non-cached mappings on the CPU side and PCIe
> NoSnoop transactions on the bus side.
>
> Whether the NoSnoop part is implemented correctly is highly platform
> specific. Whether it /matters/ if NoSnoop is implemented correctly or
> not is architecture specific: on x86, such transactions are coherent
> with the CPU whether the NoSnoop attribute is honored or not. On other
> architectures, it depends on whether such transactions may allocate in
> caches that are non-coherent with the CPU's uncached mappings.
>
> Bottom line is that we should not rely on this optimization to work
> correctly for cache coherent devices in the general case. On the
> other hand, disabling this optimization for non-coherent devices
> is likely to cause breakage as well, since the driver will assume
> cache coherent PCIe if this optimization is turned off.
>
> So rename drm_arch_can_wc_memory() to drm_device_can_wc_memory(), and
> pass the drm_device pointer into it so we can base the return value
> on whether the device is cache coherent or not if not running on
> X86.
>
> Cc: Christian Koenig 
> Cc: Alex Deucher 
> Cc: David Zhou 
> Cc: Huang Rui 
> Cc: Junwei Zhang 
> Cc: Michel Daenzer 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Sean Paul 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Will Deacon 
> Reported-by: Carsten Haitzler 
> Signed-off-by: Ard Biesheuvel 

Looks sane to me, but I can't fully judge if the check is actually 
correct or not.

So Acked-by: Christian König 

> ---
> This is a followup to '[RFC PATCH] drm/ttm: force cached mappings for system
> RAM on ARM'
>
> https://lore.kernel.org/linux-arm-kernel/20190110072841.3283-1-ard.biesheu...@linaro.org/
>
> Without t
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  2 +-
>   drivers/gpu/drm/radeon/radeon_object.c |  2 +-
>   include/drm/drm_cache.h| 19 +++
>   3 files changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 728e15e5d68a..777fa251838f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -480,7 +480,7 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
>   /* For architectures that don't support WC memory,
>* mask out the WC flag from the BO
>*/
> - if (!drm_arch_can_wc_memory())
> + if (!drm_device_can_wc_memory(adev->ddev))
>   bo->flags &= ~AMDGPU_GEM_CREATE_CPU_GTT_USWC;
>   #endif
>   
> diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
> b/drivers/gpu/drm/radeon/radeon_object.c
> index 833e909706a9..610889bf6ab5 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.c
> +++ b/drivers/gpu/drm/radeon/radeon_object.c
> @@ -249,7 +249,7 @@ int radeon_bo_create(struct radeon_device *rdev,
>   /* For architectures that don't support WC memory,
>* mask out the WC flag from the BO
>*/
> - if (!drm_arch_can_wc_memory())
> + if (!drm_device_can_wc_memory(rdev->ddev))
>   bo->flags &= ~RADEON_GEM_GTT_WC;
>   #endif
>   
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index bfe1639df02d..ced63b1207a3 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -33,6 +33,8 @@
>   #ifndef _DRM_CACHE_H_
>   #define _DRM_CACHE_H_
>   
> +#include 
> +#include 
>   #include 
>   
>   void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
> @@ -41,15 +43,16 @@ void drm_clflush_virt_range(void *addr, unsigned long 
> length);
>   u64 drm_get_max_iomem(void);
>   
>   
> -static inline bool drm_arch_can_wc_memory(void)
> +static inline bool drm_device_can_wc_memory(struct drm_device *ddev)
>   {
> -#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
> - return false;
> -#elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
> - return false;
> -#else
> - return true;
> -#endif
> + if (IS_ENABLED(CONFIG_PPC))
> + return IS_ENABLED(CONFIG_NOT_COHERENT_CACHE);
> + else if (IS_ENABLED(CONFIG_MIPS))
> + return !IS_ENABLED(CONFIG_CPU_LOONGSON3);
> + else if (IS_ENABLED(CONFIG_X86))
> + return true;
> +
> + return !dev_is_dma_coherent(ddev->dev);
>   }
>   
>   #endif

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/display: change the max clock level to 16

2019-01-21 Thread Evan Quan
The gfxclk for SMU11 can have at most 16 discrete levels.

Change-Id: I0c6a8db8f40206a240286471c6f7b1fffef15ea2
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/display/dc/dm_services_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dm_services_types.h 
b/drivers/gpu/drm/amd/display/dc/dm_services_types.h
index 9afd36a031a9..77200711abbe 100644
--- a/drivers/gpu/drm/amd/display/dc/dm_services_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dm_services_types.h
@@ -92,7 +92,7 @@ enum dm_pp_clock_type {
(clk_type) == DM_PP_CLOCK_TYPE_FCLK ? "F" : \
"Invalid"
 
-#define DM_PP_MAX_CLOCK_LEVELS 8
+#define DM_PP_MAX_CLOCK_LEVELS 16
 
 struct dm_pp_clock_levels {
uint32_t num_levels;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

2019-01-21 Thread Koenig, Christian
On 21.01.19 at 05:35, Liu, Monk wrote:
> Actually that's not so crazy at all. See the ATC uses the CPU page tables to 
> provide parts of the virtual GPU address space.

So after ATC is enabled, the aperture 0->hole-start will be translated *not* by
GMC9's page table but instead by the CPU's (or the IOMMU's?) MMU table? As if
GMC9 gives up its role of doing the translation and just delivers the address
to the CPU MMU (or IOMMU) - is that correct?

More or less, yes. The GMC9 page tables still take precedence over the ATC
IIRC, but the problem is simply that two hardware engines want to use the same
address space.

So when you have a collision you just mess things up no matter what you do.

 Besides, where did you read that HOLE_START is set to 0x8000? Does any
register actually control this number? I want to take a close look at the
GMC registers regarding it.

As far as I know that is hard wired on Vega. I was once in a meeting where they 
discussed if it should be configurable on Navi, but I'm not sure if they 
actually did that.

Christian.


No offense taken, don't be sorry at all; I didn't know that part before, and
thanks for sharing it.

Thank you
/Monk

From: amd-gfx 

 On Behalf Of Koenig, Christian
Sent: Friday, January 18, 2019 8:21 PM
To: Liu, Monk ; Lou, Wentao 
; 
amd-gfx@lists.freedesktop.org; Zhu, Rex 

Cc: Deng, Emily 
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than 
AMDGPU_GMC_HOLE_START

You know what … when you explained that the range 0 to HOLE-START is not even
good to expose to UMD, I thought you had made a typo, and that's why I repeated
my question again …
Sorry, my fault then. I didn't want to sound offensive.


it’s the first time I heard that GMC9  cannot use 0 -> HOLE-START even for UMD 
general usage …
Well you actually can do it, but then you can't use the ATC or other SVA 
mechanism.


Given your assertion, the “virtual_address_offset/max” in DEV_INFO is now
*totally* wrong … I see the current KMD still gives that range as 0 to
HOLE_START.
That is actually correct and for backward compatibility with old userspace. But 
since old userspace won't use the ATC that is also not a problem.

As I said one possibility to solve this issue would be to use a low CSA address 
for SRIOV, because the ATC isn't usable with SRIOV anyway.

I would just like to avoid that because it sounds like the CSA for some reason 
doesn't work at all in the higher address range and we will certainly then run 
into issues with that on bare metal as well.


I need to check what you said with some HW guys, that sounds crazy …
Actually that's not so crazy at all. See the ATC uses the CPU page tables to 
provide parts of the virtual GPU address space.

E.g. when it is enabled you can then use the same pointer to memory on the CPU 
and the GPU.

The problem is that when the UMD now manually maps something into this range,
you can get a clash in the address space, and the MC no longer knows whether it
should send a request to the CPU or the GPU page tables.

Regards,
Christian.

On 18.01.19 at 11:57, Liu, Monk wrote:
You know what … when you explained that the range 0 to HOLE-START is not even
good to expose to UMD, I thought you had made a typo, and that's why I repeated
my question again …
It's the first time I've heard that GMC9 cannot use 0 -> HOLE-START even for
general UMD usage …
Given your assertion, the “virtual_address_offset/max” in DEV_INFO is now
*totally* wrong … I see the current KMD still gives that range as 0 to
HOLE_START.
I need to check what you said with some HW guys, that sounds crazy …

/Monk
From: Christian König 

Sent: Friday, January 18, 2019 5:12 PM
To: Liu, Monk ; Koenig, Christian 
; Lou, Wentao 
; 
amd-gfx@lists.freedesktop.org; Zhu, Rex 

Cc: Deng, Emily 
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than 
AMDGPU_GMC_HOLE_START

Hi Monk,



You see that for UMD, it can use 0 to HOLE_START
Let me say it once more: neither the UMD nor anybody else can use 0 to
HOLE_START; that region is reserved for the ATC hardware!

We unfortunately didn't know that initially and also didn't use the ATC, so we
didn't run into a problem.

But ROCm now uses the ATC on Raven/Picasso and I have a branch where I enable
it on Vega as well. So if we don't fix that, we will run into problems here.

The ATC isn't usable in combination with SRIOV and I don't think Windows uses 
it either, so they probably never ran into any issues.



Do you mean that even the UMD should not use virtual addresses that fall in
the range from 0 to HOLE_START?
Yes, exactly! That in combination with ATC use can have quite a bunch of 
strange and hard to track
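
For context on the patch under discussion, a minimal sketch of keeping the
CSA virtual address below the GMC hole start; AMDGPU_GMC_HOLE_START is the
real driver macro named in the subject, while the helper name, the size
parameter and the clamping policy are assumptions for illustration only:

#include <linux/types.h>

static u64 example_clamp_csa_vaddr(u64 requested, u64 csa_size)
{
	/* Keep the whole CSA below the hole so it cannot collide with
	 * the ATC-managed part of the address space on GMC9. */
	if (requested + csa_size > AMDGPU_GMC_HOLE_START)
		requested = AMDGPU_GMC_HOLE_START - csa_size;

	return requested;
}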

[PATCH 1/2] drm/amd/powerplay: fit the SOC clock also to the new performance level

2019-01-21 Thread Evan Quan
The SOC clock also needs to fit the new performance level.

Change-Id: I24c5c4cdff11a4d2e0946b970ed950a4fc530b0a
Signed-off-by: Evan Quan 
---
 .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 37 +++
 1 file changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index 8c1fa985c7d4..60a22d8da7f0 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -2170,6 +2170,12 @@ static int vega20_force_dpm_highest(struct pp_hwmgr 
*hwmgr)
data->dpm_table.mem_table.dpm_state.soft_max_level =
data->dpm_table.mem_table.dpm_levels[soft_level].value;
 
+   soft_level = 
vega20_find_highest_dpm_level(&(data->dpm_table.soc_table));
+
+   data->dpm_table.soc_table.dpm_state.soft_min_level =
+   data->dpm_table.soc_table.dpm_state.soft_max_level =
+   data->dpm_table.soc_table.dpm_levels[soft_level].value;
+
ret = vega20_upload_dpm_min_level(hwmgr, 0x);
PP_ASSERT_WITH_CODE(!ret,
"Failed to upload boot level to highest!",
@@ -2202,6 +2208,12 @@ static int vega20_force_dpm_lowest(struct pp_hwmgr 
*hwmgr)
data->dpm_table.mem_table.dpm_state.soft_max_level =
data->dpm_table.mem_table.dpm_levels[soft_level].value;
 
+   soft_level = vega20_find_lowest_dpm_level(&(data->dpm_table.soc_table));
+
+   data->dpm_table.soc_table.dpm_state.soft_min_level =
+   data->dpm_table.soc_table.dpm_state.soft_max_level =
+   data->dpm_table.soc_table.dpm_levels[soft_level].value;
+
ret = vega20_upload_dpm_min_level(hwmgr, 0x);
PP_ASSERT_WITH_CODE(!ret,
"Failed to upload boot level to highest!",
@@ -2218,8 +2230,32 @@ static int vega20_force_dpm_lowest(struct pp_hwmgr 
*hwmgr)
 
 static int vega20_unforce_dpm_levels(struct pp_hwmgr *hwmgr)
 {
+   struct vega20_hwmgr *data =
+   (struct vega20_hwmgr *)(hwmgr->backend);
+   uint32_t soft_min_level, soft_max_level;
int ret = 0;
 
+   soft_min_level = 
vega20_find_lowest_dpm_level(&(data->dpm_table.gfx_table));
+   soft_max_level = 
vega20_find_highest_dpm_level(&(data->dpm_table.gfx_table));
+   data->dpm_table.gfx_table.dpm_state.soft_min_level =
+   data->dpm_table.gfx_table.dpm_levels[soft_min_level].value;
+   data->dpm_table.gfx_table.dpm_state.soft_max_level =
+   data->dpm_table.gfx_table.dpm_levels[soft_max_level].value;
+
+   soft_min_level = 
vega20_find_lowest_dpm_level(&(data->dpm_table.mem_table));
+   soft_max_level = 
vega20_find_highest_dpm_level(&(data->dpm_table.mem_table));
+   data->dpm_table.mem_table.dpm_state.soft_min_level =
+   data->dpm_table.mem_table.dpm_levels[soft_min_level].value;
+   data->dpm_table.mem_table.dpm_state.soft_max_level =
+   data->dpm_table.mem_table.dpm_levels[soft_max_level].value;
+
+   soft_min_level = 
vega20_find_lowest_dpm_level(&(data->dpm_table.soc_table));
+   soft_max_level = 
vega20_find_highest_dpm_level(&(data->dpm_table.soc_table));
+   data->dpm_table.soc_table.dpm_state.soft_min_level =
+   data->dpm_table.soc_table.dpm_levels[soft_min_level].value;
+   data->dpm_table.soc_table.dpm_state.soft_max_level =
+   data->dpm_table.soc_table.dpm_levels[soft_max_level].value;
+
ret = vega20_upload_dpm_min_level(hwmgr, 0x);
PP_ASSERT_WITH_CODE(!ret,
"Failed to upload DPM Bootup Levels!",
@@ -2457,6 +2493,7 @@ static int vega20_dpm_force_dpm_level(struct pp_hwmgr 
*hwmgr,
return ret;
vega20_force_clock_level(hwmgr, PP_SCLK, 1 << sclk_mask);
vega20_force_clock_level(hwmgr, PP_MCLK, 1 << mclk_mask);
+   vega20_force_clock_level(hwmgr, PP_SOCCLK, 1 << soc_mask);
break;
 
case AMD_DPM_FORCED_LEVEL_MANUAL:
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amd/powerplay: run btc before enabling all SMU features

2019-01-21 Thread Evan Quan
BTC is needed before enabling all SMU features.

Change-Id: Ic717226528f4d09a58264524b2d8e67150a35da7
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
index 60a22d8da7f0..5085b3636f8e 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
@@ -804,6 +804,11 @@ static int vega20_set_allowed_featuresmask(struct pp_hwmgr 
*hwmgr)
return 0;
 }
 
+static int vega20_run_btc(struct pp_hwmgr *hwmgr)
+{
+   return smum_send_msg_to_smc(hwmgr, PPSMC_MSG_RunBtc);
+}
+
 static int vega20_run_btc_afll(struct pp_hwmgr *hwmgr)
 {
return smum_send_msg_to_smc(hwmgr, PPSMC_MSG_RunAfllBtc);
@@ -1565,6 +1570,11 @@ static int vega20_enable_dpm_tasks(struct pp_hwmgr 
*hwmgr)
"[EnableDPMTasks] Failed to initialize SMC table!",
return result);
 
+   result = vega20_run_btc(hwmgr);
+   PP_ASSERT_WITH_CODE(!result,
+   "[EnableDPMTasks] Failed to run btc!",
+   return result);
+
result = vega20_run_btc_afll(hwmgr);
PP_ASSERT_WITH_CODE(!result,
"[EnableDPMTasks] Failed to run btc afll!",
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amdgpu: add flags to emit_ib interface v2

2019-01-21 Thread Koenig, Christian
Reviewed-by: Christian König 

On 21.01.2019 at 07:57, "Xiao, Jack" wrote:
Replace the last bool-type parameter with a general flags parameter, so that
the last parameter can carry more information.

v2: drop setting need_ctx_switch = false

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c|  6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c|  6 +++---
 20 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index c48207b3..0b8ef2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -202,12 +202,12 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
 amdgpu_asic_flush_hdp(adev, ring);
 }

+   if (need_ctx_switch)
+   status |= AMDGPU_HAVE_CTX_SWITCH;
+
 skip_preamble = ring->current_ctx == fence_ctx;
 if (job && ring->funcs->emit_cntxcntl) {
-   if (need_ctx_switch)
-   status |= AMDGPU_HAVE_CTX_SWITCH;
 status |= job->preamble_status;
-
 amdgpu_ring_emit_cntxcntl(ring, status);
 }

@@ -221,8 +221,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
 !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, 
Preamble CE ib must be inserted anyway */
 continue;

-   amdgpu_ring_emit_ib(ring, job, ib, need_ctx_switch);
-   need_ctx_switch = false;
+   amdgpu_ring_emit_ib(ring, job, ib, status);
+   status &= ~AMDGPU_HAVE_CTX_SWITCH;
 }

 if (ring->funcs->emit_tmz)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index d87e828..d7fae26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -131,7 +131,7 @@ struct amdgpu_ring_funcs {
 void (*emit_ib)(struct amdgpu_ring *ring,
 struct amdgpu_job *job,
 struct amdgpu_ib *ib,
-   bool ctx_switch);
+   uint32_t flags);
 void (*emit_fence)(struct amdgpu_ring *ring, uint64_t addr,
uint64_t seq, unsigned flags);
 void (*emit_pipeline_sync)(struct amdgpu_ring *ring);
@@ -229,7 +229,7 @@ struct amdgpu_ring {
 #define amdgpu_ring_get_rptr(r) (r)->funcs->get_rptr((r))
 #define amdgpu_ring_get_wptr(r) (r)->funcs->get_wptr((r))
 #define amdgpu_ring_set_wptr(r) (r)->funcs->set_wptr((r))
-#define amdgpu_ring_emit_ib(r, job, ib, c) ((r)->funcs->emit_ib((r), (job), 
(ib), (c)))
+#define amdgpu_ring_emit_ib(r, job, ib, flags) ((r)->funcs->emit_ib((r), 
(job), (ib), (flags)))
 #define amdgpu_ring_emit_pipeline_sync(r) (r)->funcs->emit_pipeline_sync((r))
 #define amdgpu_ring_emit_vm_flush(r, vmid, addr) 
(r)->funcs->emit_vm_flush((r), (vmid), (addr))
 #define amdgpu_ring_emit_fence(r, addr, seq, flags) 
(r)->funcs->emit_fence((r), (addr), (seq), (flags))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 98a1b2c..c021b11 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -1035,7 +1035,7 @@ int amdgpu_vce_ring_parse_cs_vm(struct amdgpu_cs_parser 
*p, uint32_t ib_idx)
 void amdgpu_vce_ring_emit_ib(struct amdgpu_ring *ring,
 struct amdgpu_job *job,
 struct amdgpu_ib *ib,
-   bool ctx_switch)
+   uint32_t flags)
 {
 amdgpu_ring_write(ring, VCE_CMD_IB);
 amdgpu_ring_write(ring, lower_32_bits(ib->gpu_addr));
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
index 5029365..30ea54d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h
@@ -66,7 +66,7 @@ int amdgpu_vce_get_destroy_msg(struc

RE: [PATCH v2] drm/amdgpu: add flags to emit_ib interface v2

2019-01-21 Thread Zhang, Hawking
Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Xiao, Jack  
Sent: January 21, 2019 14:57
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Zhang, Hawking ; Koenig, 
Christian 
Cc: Xiao, Jack 
Subject: [PATCH v2] drm/amdgpu: add flags to emit_ib interface v2

Replace the last bool-type parameter with a general flags parameter, so that
the last parameter can carry more information.

v2: drop setting need_ctx_switch = false

Signed-off-by: Jack Xiao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  4 ++--  
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c  |  2 +-  
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c|  6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c|  6 +++---
 20 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index c48207b3..0b8ef2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -202,12 +202,12 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
amdgpu_asic_flush_hdp(adev, ring);
}
 
+   if (need_ctx_switch)
+   status |= AMDGPU_HAVE_CTX_SWITCH;
+
skip_preamble = ring->current_ctx == fence_ctx;
if (job && ring->funcs->emit_cntxcntl) {
-   if (need_ctx_switch)
-   status |= AMDGPU_HAVE_CTX_SWITCH;
status |= job->preamble_status;
-
amdgpu_ring_emit_cntxcntl(ring, status);
}
 
@@ -221,8 +221,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
!amdgpu_sriov_vf(adev)) /* for SRIOV preemption, 
Preamble CE ib must be inserted anyway */
continue;
 
-   amdgpu_ring_emit_ib(ring, job, ib, need_ctx_switch);
-   need_ctx_switch = false;
+   amdgpu_ring_emit_ib(ring, job, ib, status);
+   status &= ~AMDGPU_HAVE_CTX_SWITCH;
}
 
if (ring->funcs->emit_tmz)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index d87e828..d7fae26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -131,7 +131,7 @@ struct amdgpu_ring_funcs {
void (*emit_ib)(struct amdgpu_ring *ring,
struct amdgpu_job *job,
struct amdgpu_ib *ib,
-   bool ctx_switch);
+   uint32_t flags);
void (*emit_fence)(struct amdgpu_ring *ring, uint64_t addr,
   uint64_t seq, unsigned flags);
void (*emit_pipeline_sync)(struct amdgpu_ring *ring); @@ -229,7 +229,7 
@@ struct amdgpu_ring {  #define amdgpu_ring_get_rptr(r) 
(r)->funcs->get_rptr((r))  #define amdgpu_ring_get_wptr(r) 
(r)->funcs->get_wptr((r))  #define amdgpu_ring_set_wptr(r) 
(r)->funcs->set_wptr((r)) -#define amdgpu_ring_emit_ib(r, job, ib, c) 
((r)->funcs->emit_ib((r), (job), (ib), (c)))
+#define amdgpu_ring_emit_ib(r, job, ib, flags) 
+((r)->funcs->emit_ib((r), (job), (ib), (flags)))
 #define amdgpu_ring_emit_pipeline_sync(r) (r)->funcs->emit_pipeline_sync((r))
 #define amdgpu_ring_emit_vm_flush(r, vmid, addr) 
(r)->funcs->emit_vm_flush((r), (vmid), (addr))  #define 
amdgpu_ring_emit_fence(r, addr, seq, flags) (r)->funcs->emit_fence((r), (addr), 
(seq), (flags)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 98a1b2c..c021b11 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -1035,7 +1035,7 @@ int amdgpu_vce_ring_parse_cs_vm(struct amdgpu_cs_parser 
*p, uint32_t ib_idx)  void amdgpu_vce_ring_emit_ib(struct amdgpu_ring *ring,
struct amdgpu_job *job,
struct amdgpu_ib *ib,
-   bool ctx_switch)
+   uint32_t flags)
 {
amdgpu_ring_write(ring, VCE_CMD_IB);
amdgpu_ring_write(ring, lower_32_bits(ib->gpu_addr)); diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h 
b/drive

RE: [PATCH] drm/amd/powerplay: OD setting fix on Vega10

2019-01-21 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -Original Message-
> From: amd-gfx  On Behalf Of
> Kenneth Feng
> Sent: January 21, 2019 16:17
> To: amd-gfx@lists.freedesktop.org
> Cc: Feng, Kenneth 
> Subject: [PATCH] drm/amd/powerplay: OD setting fix on Vega10
> 
> The gfxclk for OD setting is limited to 1980MHz for non-ACG ASICs of Vega10.
> 
> Signed-off-by: Kenneth Feng 
> ---
>  .../amd/powerplay/hwmgr/vega10_processpptables.c   | 22
> +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git
> a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> index b8747a5..99d596d 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
> @@ -32,6 +32,7 @@
>  #include "vega10_pptable.h"
> 
>  #define NUM_DSPCLK_LEVELS 8
> +#define VEGA10_ENGINECLOCK_HARDMAX 198000
> 
>  static void set_hw_cap(struct pp_hwmgr *hwmgr, bool enable,
>   enum phm_platform_caps cap)
> @@ -258,7 +259,26 @@ static int init_over_drive_limits(
>   struct pp_hwmgr *hwmgr,
>   const ATOM_Vega10_POWERPLAYTABLE *powerplay_table)
> {
> - hwmgr->platform_descriptor.overdriveLimit.engineClock =
> + const ATOM_Vega10_GFXCLK_Dependency_Table
> *gfxclk_dep_table =
> + (const ATOM_Vega10_GFXCLK_Dependency_Table *)
> + (((unsigned long) powerplay_table) +
> + le16_to_cpu(powerplay_table-
> >usGfxclkDependencyTableOffset));
> + bool is_acg_enabled = false;
> + ATOM_Vega10_GFXCLK_Dependency_Record_V2
> *patom_record_v2;
> +
> + if (gfxclk_dep_table->ucRevId == 1) {
> + patom_record_v2 =
> + (ATOM_Vega10_GFXCLK_Dependency_Record_V2
> *)gfxclk_dep_table->entries;
> + is_acg_enabled =
> + (bool)patom_record_v2[gfxclk_dep_table-
> >ucNumEntries-1].ucACGEnable;
> + }
> +
> + if (powerplay_table->ulMaxODEngineClock >
> VEGA10_ENGINECLOCK_HARDMAX &&
> + !is_acg_enabled)
> + hwmgr->platform_descriptor.overdriveLimit.engineClock =
> + VEGA10_ENGINECLOCK_HARDMAX;
> + else
> + hwmgr->platform_descriptor.overdriveLimit.engineClock =
>   le32_to_cpu(powerplay_table-
> >ulMaxODEngineClock);
>   hwmgr->platform_descriptor.overdriveLimit.memoryClock =
>   le32_to_cpu(powerplay_table-
> >ulMaxODMemoryClock);
> --
> 2.7.4
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amd/powerplay: OD setting fix on Vega10

2019-01-21 Thread Kenneth Feng
The gfxclk for OD setting is limited to 1980MHz for non-ACG ASICs of Vega10.

Signed-off-by: Kenneth Feng 
---
 .../amd/powerplay/hwmgr/vega10_processpptables.c   | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c 
b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
index b8747a5..99d596d 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega10_processpptables.c
@@ -32,6 +32,7 @@
 #include "vega10_pptable.h"
 
 #define NUM_DSPCLK_LEVELS 8
+#define VEGA10_ENGINECLOCK_HARDMAX 198000
 
 static void set_hw_cap(struct pp_hwmgr *hwmgr, bool enable,
enum phm_platform_caps cap)
@@ -258,7 +259,26 @@ static int init_over_drive_limits(
struct pp_hwmgr *hwmgr,
const ATOM_Vega10_POWERPLAYTABLE *powerplay_table)
 {
-   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   const ATOM_Vega10_GFXCLK_Dependency_Table *gfxclk_dep_table =
+   (const ATOM_Vega10_GFXCLK_Dependency_Table *)
+   (((unsigned long) powerplay_table) +
+   
le16_to_cpu(powerplay_table->usGfxclkDependencyTableOffset));
+   bool is_acg_enabled = false;
+   ATOM_Vega10_GFXCLK_Dependency_Record_V2 *patom_record_v2;
+
+   if (gfxclk_dep_table->ucRevId == 1) {
+   patom_record_v2 =
+   (ATOM_Vega10_GFXCLK_Dependency_Record_V2 
*)gfxclk_dep_table->entries;
+   is_acg_enabled =
+   
(bool)patom_record_v2[gfxclk_dep_table->ucNumEntries-1].ucACGEnable;
+   }
+
+   if (powerplay_table->ulMaxODEngineClock > VEGA10_ENGINECLOCK_HARDMAX &&
+   !is_acg_enabled)
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
+   VEGA10_ENGINECLOCK_HARDMAX;
+   else
+   hwmgr->platform_descriptor.overdriveLimit.engineClock =
le32_to_cpu(powerplay_table->ulMaxODEngineClock);
hwmgr->platform_descriptor.overdriveLimit.memoryClock =
le32_to_cpu(powerplay_table->ulMaxODMemoryClock);
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx