Re: [PATCH 2/4] PCI: add functionality for resizing resources v2

2017-03-24 Thread Bjorn Helgaas
On Mon, Mar 13, 2017 at 01:41:34PM +0100, Christian König wrote:
> From: Christian König 
> 
> This allows device drivers to request resizing their BARs.
> 
> The function only tries to reprogram the windows of the bridge directly above
> the requesting device and only the BAR of the same type (usually mem, 64bit,
> prefetchable). This is done to make sure not to disturb other drivers by
> changing the BARs of their devices.
> 
> If reprogramming the bridge BAR fails, the old state is restored and -ENOSPC is
> returned to the calling device driver.
> 
> v2: rebase on changes in rbar support
> 
> Signed-off-by: Christian König 
> ---
>  drivers/pci/setup-bus.c | 61 +
>  drivers/pci/setup-res.c | 51 +
>  include/linux/pci.h |  2 ++
>  3 files changed, 114 insertions(+)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index f30ca75..cfab2c7 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1923,6 +1923,67 @@ void pci_assign_unassigned_bridge_resources(struct 
> pci_dev *bridge)
>  }
>  EXPORT_SYMBOL_GPL(pci_assign_unassigned_bridge_resources);
>  
> +int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
> +{
> + const unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> + IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
> +
> + struct resource saved;
> + LIST_HEAD(add_list);
> + LIST_HEAD(fail_head);
> + struct pci_dev_resource *fail_res;
> + unsigned i;
> + int ret = 0;
> +
> + /* Release all children from the matching bridge resource */
> + for (i = PCI_BRIDGE_RESOURCES; i < PCI_BRIDGE_RESOURCE_END; ++i) {

Nit: use post-increment unless you need pre-increment.

> + struct resource *res = &bridge->resource[i];
> +
> + if ((res->flags & type_mask) != (type & type_mask))
> + continue;
> +
> + saved = *res;
> + if (res->parent) {
> + release_child_resources(res);

Doesn't this recursively release *all* child resources?  There could
be BARs from several devices, or even windows of downstream bridges,
inside this window.  The drivers of those other devices aren't
expecting things to change here.

> + release_resource(res);
> + }
> + res->start = 0;
> + res->end = 0;
> + break;
> + }
> +
> + if (i == PCI_BRIDGE_RESOURCE_END)
> + return -ENOENT;
> +
> + __pci_bus_size_bridges(bridge->subordinate, &add_list);
> + __pci_bridge_assign_resources(bridge, &add_list, &fail_head);
> + BUG_ON(!list_empty(&add_list));
> +
> + /* restore size and flags */
> + list_for_each_entry(fail_res, &fail_head, list) {
> + struct resource *res = fail_res->res;
> +
> + res->start = fail_res->start;
> + res->end = fail_res->end;
> + res->flags = fail_res->flags;
> + }
> +
> + /* Revert to the old configuration */
> + if (!list_empty(&fail_head)) {
> + struct resource *res = &bridge->resource[i];
> +
> + res->start = saved.start;
> + res->end = saved.end;
> + res->flags = saved.flags;
> +
> + pci_claim_resource(bridge, i);
> + ret = -ENOSPC;
> + }
> +
> + free_list(&fail_head);
> + return ret;
> +}
> +
>  void pci_assign_unassigned_bus_resources(struct pci_bus *bus)
>  {
>   struct pci_dev *dev;
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index 9526e34..3bb1e29 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -363,6 +363,57 @@ int pci_reassign_resource(struct pci_dev *dev, int 
> resno, resource_size_t addsiz
>   return 0;
>  }
>  
> +int pci_resize_resource(struct pci_dev *dev, int resno, int size)
> +{
> + struct resource *res = dev->resource + resno;
> + u32 sizes = pci_rbar_get_possible_sizes(dev, resno);
> + int old = pci_rbar_get_current_size(dev, resno);
> + u64 bytes = 1ULL << (size + 20);
> + int ret = 0;

I think we should fail the request if the device is enabled, i.e., if
the PCI_COMMAND_MEMORY bit is set.  We can't safely change the BAR
while memory decoding is enabled.

I know there's code in pci_std_update_resource() that turns off
PCI_COMMAND_MEMORY, but I think that's a mistake: I think it should
fail when asked to update an enabled BAR the same way
pci_iov_update_resource() does.

I'm not sure why you call pci_reenable_device() below, but I think I
would rather have the driver do something like this:

  pci_disable_device(dev);
  pci_resize_resource(dev, 0, size);
  pci_enable_device(dev);

That way it's very clear to the driver that it must re-read all BARs
after resizing this one.

> + if (!sizes)
> + return -ENOTSUPP;
> +
> + if (!(sizes & (1 << size)))
> + return 

Re: [PATCH 3/3] drm/amdgpu/soc15: return cached values for some registers

2017-03-24 Thread Felix Kuehling
We're reporting gb_addr_config to user mode in our KFD tiling info API.

If this is no longer needed by user mode for soc15, we could just put in
a dummy value. However, I haven't been told that it can be removed for
older ASICs.

Regards,
  Felix


On 17-03-24 03:48 PM, Alex Deucher wrote:
> On Fri, Mar 24, 2017 at 3:44 PM, Christian König
>  wrote:
>> Am 24.03.2017 um 20:13 schrieb Alex Deucher:
>>> Required for SR-IOV and saves MMIO transactions.
>>>
>>> Signed-off-by: Alex Deucher 
>>
>> As far as I can see they are not used any more by userspace and the same
>> info is available in enabled_rb_pipes_mask.
>>
>> So why do you want to keep them?
> Does anything use mmGB_ADDR_CONFIG?  If not, I agree, we can drop the
> whole thing.  Not sure if any of the closed UMDs use them or not off
> hand.
>
> Alex
>
>> Regards,
>> Christian.
>>
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/soc15.c | 40
>>> ++
>>>   1 file changed, 32 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> index 804bd8d..441e0f4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>> @@ -322,6 +322,32 @@ static uint32_t soc15_read_indexed_register(struct
>>> amdgpu_device *adev, u32 se_n
>>> return val;
>>>   }
>>>   +static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
>>> +bool indexed, u32 se_num,
>>> +u32 sh_num, u32 reg_offset)
>>> +{
>>> +   if (indexed) {
>>> +   unsigned se_idx = (se_num == 0xffffffff) ? 0 : se_num;
>>> +   unsigned sh_idx = (sh_num == 0xffffffff) ? 0 : sh_num;
>>> +
>>> +   switch (reg_offset) {
>>> +   case SOC15_REG_OFFSET(GC, 0, mmCC_RB_BACKEND_DISABLE):
>>> +   return
>>> adev->gfx.config.rb_config[se_idx][sh_idx].rb_backend_disable;
>>> +   case SOC15_REG_OFFSET(GC, 0,
>>> mmGC_USER_RB_BACKEND_DISABLE):
>>> +   return
>>> adev->gfx.config.rb_config[se_idx][sh_idx].user_rb_backend_disable;
>>> +   }
>>> +
>>> +   return soc15_read_indexed_register(adev, se_num, sh_num,
>>> reg_offset);
>>> +   } else {
>>> +   switch (reg_offset) {
>>> +   case SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG):
>>> +   return adev->gfx.config.gb_addr_config;
>>> +   default:
>>> +   return RREG32(reg_offset);
>>> +   }
>>> +   }
>>> +}
>>> +
>>>   static int soc15_read_register(struct amdgpu_device *adev, u32 se_num,
>>> u32 sh_num, u32 reg_offset, u32 *value)
>>>   {
>>> @@ -345,10 +371,9 @@ static int soc15_read_register(struct amdgpu_device
>>> *adev, u32 se_num,
>>> if (reg_offset != asic_register_entry->reg_offset)
>>> continue;
>>> if (!asic_register_entry->untouched)
>>> -   *value = asic_register_entry->grbm_indexed
>>> ?
>>> -   soc15_read_indexed_register(adev,
>>> se_num,
>>> -sh_num,
>>> reg_offset) :
>>> -   RREG32(reg_offset);
>>> +   *value = soc15_get_register_value(adev,
>>> +
>>> asic_register_entry->grbm_indexed,
>>> + se_num,
>>> sh_num, reg_offset);
>>> return 0;
>>> }
>>> }
>>> @@ -358,10 +383,9 @@ static int soc15_read_register(struct amdgpu_device
>>> *adev, u32 se_num,
>>> continue;
>>> if (!soc15_allowed_read_registers[i].untouched)
>>> -   *value =
>>> soc15_allowed_read_registers[i].grbm_indexed ?
>>> -   soc15_read_indexed_register(adev, se_num,
>>> -sh_num,
>>> reg_offset) :
>>> -   RREG32(reg_offset);
>>> +   *value = soc15_get_register_value(adev,
>>> +
>>> soc15_allowed_read_registers[i].grbm_indexed,
>>> + se_num, sh_num,
>>> reg_offset);
>>> return 0;
>>> }
>>> return -EINVAL;
>>
>>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: Plan: BO move throttling for visible VRAM evictions

2017-03-24 Thread Samuel Pitoiset



On 03/24/2017 05:33 PM, Marek Olšák wrote:

Hi,

I'm sharing this idea here, because it's something that has been
decreasing our performance a lot recently, for example:
http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa

I think the problem there is that Mesa git started uploading
descriptors and uniforms to VRAM, which helps when TC L2 has a low
hit/miss ratio, but the performance can randomly drop by an order of
magnitude. I've heard rumours that kernel 4.11 has an improved
allocator that should perform better, but the situation is still far
from ideal.


I have just tried 4.11-rc3 from Torvalds's GitHub; nothing changed with
Civ6, still 23 FPS.




AMD CPUs and APUs will hopefully suffer less, because we can resize
the visible VRAM with the help of our CPU hw specs, but Intel CPUs
will remain limited to 256 MB. The following plan describes how to do
throttling for visible VRAM evictions.


1) Theory

Initially, the driver doesn't care about where buffers are in VRAM,
because VRAM buffers are only moved to visible VRAM on CPU page faults
(when the CPU touches the buffer memory but the memory is in the
invisible part of VRAM). When it happens,
amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
also marks the buffer as contiguous, which makes memory fragmentation
worse.

I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
was much higher in a CPU profiler than anything else in the kernel.


Looks like I see similar things with Civ6.




2) Monitoring via Gallium HUD

We need to expose 2 kernel counters via the INFO ioctl and display
those via Gallium HUD:
- The number of VRAM CPU page faults. (the number of calls to
amdgpu_bo_fault_reserve_notify).
- The number of bytes moved by ttm_bo_validate inside
amdgpu_bo_fault_reserve_notify.

This will help us observe what exactly is happening and fine-tune the
throttling when it's done.



Should really be useful.

Samuel.



3) Solution

a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
(amdgpu_bo::had_cpu_page_fault = true)

b) Monitor the MB/s rate at which buffers are moved by
amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
don't move the buffer to visible VRAM. Move it to GTT instead. Note
that moving to GTT can be cheaper, because moving to visible VRAM is
likely to evict a lot of buffers there and unmap them from the CPU,
but moving to GTT shouldn't evict or unmap anything.

c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
it can be moved to VRAM if:
- the GTT->VRAM move rate is low enough to allow it (this is the
existing throttling mechanism)
- the visible VRAM move rate is low enough that we will be OK with
another CPU page fault if it happens.

d) The solution can be fine-tuned with the help of Gallium HUD to get
the best performance under various scenarios. The current throttling
mechanism can serve as an inspiration.


That's it. Feel free to comment. I think this is our biggest
performance bottleneck at the moment.

Marek




Re: [PATCH 3/3] drm/amdgpu/soc15: return cached values for some registers

2017-03-24 Thread Christian König

Am 24.03.2017 um 20:13 schrieb Alex Deucher:

Required for SR-IOV and saves MMIO transactions.

Signed-off-by: Alex Deucher 


As far as I can see they are not used any more by userspace and the same 
info is available in enabled_rb_pipes_mask.


So why do you want to keep them?

Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/soc15.c | 40 ++
  1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 804bd8d..441e0f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -322,6 +322,32 @@ static uint32_t soc15_read_indexed_register(struct 
amdgpu_device *adev, u32 se_n
return val;
  }
  
+static uint32_t soc15_get_register_value(struct amdgpu_device *adev,

+bool indexed, u32 se_num,
+u32 sh_num, u32 reg_offset)
+{
+   if (indexed) {
+   unsigned se_idx = (se_num == 0xffffffff) ? 0 : se_num;
+   unsigned sh_idx = (sh_num == 0xffffffff) ? 0 : sh_num;
+
+   switch (reg_offset) {
+   case SOC15_REG_OFFSET(GC, 0, mmCC_RB_BACKEND_DISABLE):
+   return 
adev->gfx.config.rb_config[se_idx][sh_idx].rb_backend_disable;
+   case SOC15_REG_OFFSET(GC, 0, mmGC_USER_RB_BACKEND_DISABLE):
+   return 
adev->gfx.config.rb_config[se_idx][sh_idx].user_rb_backend_disable;
+   }
+
+   return soc15_read_indexed_register(adev, se_num, sh_num, 
reg_offset);
+   } else {
+   switch (reg_offset) {
+   case SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG):
+   return adev->gfx.config.gb_addr_config;
+   default:
+   return RREG32(reg_offset);
+   }
+   }
+}
+
  static int soc15_read_register(struct amdgpu_device *adev, u32 se_num,
u32 sh_num, u32 reg_offset, u32 *value)
  {
@@ -345,10 +371,9 @@ static int soc15_read_register(struct amdgpu_device *adev, 
u32 se_num,
if (reg_offset != asic_register_entry->reg_offset)
continue;
if (!asic_register_entry->untouched)
-   *value = asic_register_entry->grbm_indexed ?
-   soc15_read_indexed_register(adev, 
se_num,
-sh_num, 
reg_offset) :
-   RREG32(reg_offset);
+   *value = soc15_get_register_value(adev,
+ 
asic_register_entry->grbm_indexed,
+ se_num, 
sh_num, reg_offset);
return 0;
}
}
@@ -358,10 +383,9 @@ static int soc15_read_register(struct amdgpu_device *adev, 
u32 se_num,
continue;
  
  		if (!soc15_allowed_read_registers[i].untouched)

-   *value = soc15_allowed_read_registers[i].grbm_indexed ?
-   soc15_read_indexed_register(adev, se_num,
-sh_num, reg_offset) :
-   RREG32(reg_offset);
+   *value = soc15_get_register_value(adev,
+ 
soc15_allowed_read_registers[i].grbm_indexed,
+ se_num, sh_num, 
reg_offset);
return 0;
}
return -EINVAL;





[PATCH 2/3] drm/amdgpu/gfx9: cache RB harvest registers

2017-03-24 Thread Alex Deucher
So we don't have to look this up via MMIO when users request
the value via ioctl.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6fc4b29..fa3e579 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1237,6 +1237,11 @@ static void gfx_v9_0_setup_rb(struct amdgpu_device *adev)
data = gfx_v9_0_get_rb_active_bitmap(adev);
active_rbs |= data << ((i * 
adev->gfx.config.max_sh_per_se + j) *
   rb_bitmap_width_per_sh);
+   /* cache the values for userspace */
+   adev->gfx.config.rb_config[i][j].rb_backend_disable =
+   RREG32(SOC15_REG_OFFSET(GC, 0, 
mmCC_RB_BACKEND_DISABLE));
+   
adev->gfx.config.rb_config[i][j].user_rb_backend_disable =
+   RREG32(SOC15_REG_OFFSET(GC, 0, 
mmGC_USER_RB_BACKEND_DISABLE));
}
}
gfx_v9_0_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff);
-- 
2.5.5



[PATCH 3/3] drm/amdgpu/soc15: return cached values for some registers

2017-03-24 Thread Alex Deucher
Required for SR-IOV and saves MMIO transactions.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 40 ++
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 804bd8d..441e0f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -322,6 +322,32 @@ static uint32_t soc15_read_indexed_register(struct 
amdgpu_device *adev, u32 se_n
return val;
 }
 
+static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
+bool indexed, u32 se_num,
+u32 sh_num, u32 reg_offset)
+{
+   if (indexed) {
+   unsigned se_idx = (se_num == 0xffffffff) ? 0 : se_num;
+   unsigned sh_idx = (sh_num == 0xffffffff) ? 0 : sh_num;
+
+   switch (reg_offset) {
+   case SOC15_REG_OFFSET(GC, 0, mmCC_RB_BACKEND_DISABLE):
+   return 
adev->gfx.config.rb_config[se_idx][sh_idx].rb_backend_disable;
+   case SOC15_REG_OFFSET(GC, 0, mmGC_USER_RB_BACKEND_DISABLE):
+   return 
adev->gfx.config.rb_config[se_idx][sh_idx].user_rb_backend_disable;
+   }
+
+   return soc15_read_indexed_register(adev, se_num, sh_num, 
reg_offset);
+   } else {
+   switch (reg_offset) {
+   case SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG):
+   return adev->gfx.config.gb_addr_config;
+   default:
+   return RREG32(reg_offset);
+   }
+   }
+}
+
 static int soc15_read_register(struct amdgpu_device *adev, u32 se_num,
u32 sh_num, u32 reg_offset, u32 *value)
 {
@@ -345,10 +371,9 @@ static int soc15_read_register(struct amdgpu_device *adev, 
u32 se_num,
if (reg_offset != asic_register_entry->reg_offset)
continue;
if (!asic_register_entry->untouched)
-   *value = asic_register_entry->grbm_indexed ?
-   soc15_read_indexed_register(adev, 
se_num,
-sh_num, 
reg_offset) :
-   RREG32(reg_offset);
+   *value = soc15_get_register_value(adev,
+ 
asic_register_entry->grbm_indexed,
+ se_num, 
sh_num, reg_offset);
return 0;
}
}
@@ -358,10 +383,9 @@ static int soc15_read_register(struct amdgpu_device *adev, 
u32 se_num,
continue;
 
if (!soc15_allowed_read_registers[i].untouched)
-   *value = soc15_allowed_read_registers[i].grbm_indexed ?
-   soc15_read_indexed_register(adev, se_num,
-sh_num, reg_offset) :
-   RREG32(reg_offset);
+   *value = soc15_get_register_value(adev,
+ 
soc15_allowed_read_registers[i].grbm_indexed,
+ se_num, sh_num, 
reg_offset);
return 0;
}
return -EINVAL;
-- 
2.5.5



[PATCH 1/3] drm/amdgpu/gfx9: use hweight for calculating num_rbs

2017-03-24 Thread Alex Deucher
Match what we do for other asics.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 988c24d..6fc4b29 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1225,7 +1225,7 @@ static u32 gfx_v9_0_get_rb_active_bitmap(struct 
amdgpu_device *adev)
 static void gfx_v9_0_setup_rb(struct amdgpu_device *adev)
 {
int i, j;
-   u32 data, tmp, num_rbs = 0;
+   u32 data;
u32 active_rbs = 0;
u32 rb_bitmap_width_per_sh = adev->gfx.config.max_backends_per_se /
adev->gfx.config.max_sh_per_se;
@@ -1243,10 +1243,7 @@ static void gfx_v9_0_setup_rb(struct amdgpu_device *adev)
mutex_unlock(&adev->grbm_idx_mutex);
 
adev->gfx.config.backend_enable_mask = active_rbs;
-   tmp = active_rbs;
-   while (tmp >>= 1)
-   num_rbs++;
-   adev->gfx.config.num_rbs = num_rbs;
+   adev->gfx.config.num_rbs = hweight32(active_rbs);
 }
 
 #define DEFAULT_SH_MEM_BASES   (0x6000)
-- 
2.5.5



Re: [PATCH] Revert "drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first"

2017-03-24 Thread Julien Isorce
Hi Michel,

No this change does not help on the other issue (hard lockup).
I have no tried it in combination with the 0 -> i change.

Thx anyway.
Julien


On 24 March 2017 at 10:03, Michel Dänzer  wrote:

> On 24/03/17 12:31 AM, Zachary Michaels wrote:
> >
> > I should also note that we are experiencing another issue where the
> > kernel locks up in similar circumstances. As Julien noted, we get no
> > output, and the watchdogs don't seem to work. It may be the case that
> > Xorg and our process are calling ttm_bo_mem_force_space concurrently,
> > but I don't think we have enough information yet to say for
> > sure. Reverting this commit does not fix that issue. I have some small
> > amount of evidence indicating that bos flagged for CPU access are
> > getting placed in CPU inaccessible memory. Could that cause this sort of
> > kernel lockup?
>
> Possibly, does this help?
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/
> radeon_ttm.c
> index 37d68cd1f272..40d1bb467a71 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -198,7 +198,8 @@ static void radeon_evict_flags(struct
> ttm_buffer_object *bo,
> case TTM_PL_VRAM:
> if (rbo->rdev->ring[radeon_copy_ring_index(rbo->rdev)].ready
> == false)
> radeon_ttm_placement_from_domain(rbo,
> RADEON_GEM_DOMAIN_CPU);
> -   else if (rbo->rdev->mc.visible_vram_size <
> rbo->rdev->mc.real_vram_size &&
> +   else if (!(rbo->flags & RADEON_GEM_CPU_ACCESS) &&
> +rbo->rdev->mc.visible_vram_size <
> rbo->rdev->mc.real_vram_size &&
>  bo->mem.start < (rbo->rdev->mc.visible_vram_size
> >> PAGE_SHIFT)) {
> unsigned fpfn = rbo->rdev->mc.visible_vram_size
> >> PAGE_SHIFT;
> int i;
>
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>


Re: [PATCH 2/2] drm/amdgpu: sanitize soc15_allowed_read_registers

2017-03-24 Thread Christian König

Am 24.03.2017 um 19:38 schrieb Alex Deucher:

On Fri, Mar 24, 2017 at 11:09 AM, Christian König
 wrote:

From: Christian König 

Disallow mmCC_RB_BACKEND_DISABLE; reading it can cause GRBM problems, and the
same info is available cached as enabled_rb_pipes_mask.

NACK.  We need to implement the caching anyway for sr-iov.  We can
just port the cache changes over from VI.  Have you started looking at
that yet?


Yeah, I had that halfway implemented and then realized that the
registers aren't used by userspace any more.


The only user was addrlib and even there it actually looks incorrect to me.

Ken already had that mostly fixed in libdrm, but somehow missed that 
one. Going to send out libdrm changes to not read from it as well.


Additional to that it is completely pointless to read the register again 
when the info query already has everything from it anyway.


Christian.



Alex


Also remove duplicate mmCP_CPF_BUSY_STAT.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/soc15.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7e54d9dc..be0d47f 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -296,11 +296,9 @@ static struct amdgpu_allowed_register_entry 
soc15_allowed_read_registers[] = {
 { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_BUSY_STAT), false},
 { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_STALLED_STAT1), false},
 { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_STATUS), false},
-   { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_BUSY_STAT), false},
 { SOC15_REG_OFFSET(GC, 0, mmCP_CPC_STALLED_STAT1), false},
 { SOC15_REG_OFFSET(GC, 0, mmCP_CPC_STATUS), false},
 { SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG), false},
-   { SOC15_REG_OFFSET(GC, 0, mmCC_RB_BACKEND_DISABLE), false, true},
 { SOC15_REG_OFFSET(GC, 0, mmGC_USER_RB_BACKEND_DISABLE), false, true},
 { SOC15_REG_OFFSET(GC, 0, mmGB_BACKEND_MAP), false, false},
  };
--
2.5.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx






Re: [PATCH] Revert "drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first"

2017-03-24 Thread Julien Isorce
Hi Michel,

I double checked and you are right, the change 0 -> i works.

Cheers
Julien

On 24 March 2017 at 09:59, Michel Dänzer  wrote:

> On 24/03/17 06:50 PM, Julien Isorce wrote:
> > Hi Michel,
> >
> > (Just for other readers: my reply was delayed on the mailing lists and
> > should have been in second position.)
>
> It is on https://patchwork.freedesktop.org/patch/145731/ , did you mean
> something else?
>
> The delay was because you weren't subscribed to the amd-gfx mailing list
> yet, so your post went through the moderation queue.
>
>
> > I will have a go with that change and let you know. I do not remember if
> > I tried it for this soft lockup. But for sure it does not solve the hard
> > lockup that Zach also mentioned at the end of his reply.
>
> I'll follow up to his post about that.
>
>
> > But in general, isn't "radeon_lockup_timeout" supposed to detect this
> > situation ?
>
> No, it's for detecting GPU hangs, whereas this is a CPU "hang" (infinite
> loop).
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
>


Re: [PATCH 2/2] drm/amdgpu: sanitize soc15_allowed_read_registers

2017-03-24 Thread Alex Deucher
On Fri, Mar 24, 2017 at 11:09 AM, Christian König
 wrote:
> From: Christian König 
>
> Disallow mmCC_RB_BACKEND_DISABLE; reading it can cause GRBM problems, and the
> same info is available cached as enabled_rb_pipes_mask.

NACK.  We need to implement the caching anyway for sr-iov.  We can
just port the cache changes over from VI.  Have you started looking at
that yet?

Alex

>
> Also remove duplicate mmCP_CPF_BUSY_STAT.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 7e54d9dc..be0d47f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -296,11 +296,9 @@ static struct amdgpu_allowed_register_entry 
> soc15_allowed_read_registers[] = {
> { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_BUSY_STAT), false},
> { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_STALLED_STAT1), false},
> { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_STATUS), false},
> -   { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_BUSY_STAT), false},
> { SOC15_REG_OFFSET(GC, 0, mmCP_CPC_STALLED_STAT1), false},
> { SOC15_REG_OFFSET(GC, 0, mmCP_CPC_STATUS), false},
> { SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG), false},
> -   { SOC15_REG_OFFSET(GC, 0, mmCC_RB_BACKEND_DISABLE), false, true},
> { SOC15_REG_OFFSET(GC, 0, mmGC_USER_RB_BACKEND_DISABLE), false, true},
> { SOC15_REG_OFFSET(GC, 0, mmGB_BACKEND_MAP), false, false},
>  };
> --
> 2.5.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: drop GB_GPU_ID from the golden settings

2017-03-24 Thread Alex Deucher
On Fri, Mar 24, 2017 at 11:09 AM, Christian König
 wrote:
> From: Christian König 
>
> That register is marked deprecated, reading it results in a bus error.
>
> Signed-off-by: Christian König 

Might want to compare with the latest golden register list in CAIL to
see if it's already removed and pick up any additional changes if
there are any.  Otherwise:
Reviewed-by: Alex Deucher 

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index ad82ab7..b196431 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -88,7 +88,6 @@ static const struct amdgpu_gds_reg_offset 
> amdgpu_gds_reg_offset[] =
>  static const u32 golden_settings_gc_9_0[] =
>  {
> SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2), 0xf00ffeff, 0x0400,
> -   SOC15_REG_OFFSET(GC, 0, mmGB_GPU_ID), 0x000f, 0x,
> SOC15_REG_OFFSET(GC, 0, mmPA_SC_BINNER_EVENT_CNTL_3), 0x0003, 
> 0x82400024,
> SOC15_REG_OFFSET(GC, 0, mmPA_SC_ENHANCE), 0x3fff, 0x0001,
> SOC15_REG_OFFSET(GC, 0, mmPA_SC_LINE_STIPPLE_STATE), 0xff0f, 
> 0x,
> --
> 2.5.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: Plan: BO move throttling for visible VRAM evictions

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Marek Olšák
> Sent: Friday, March 24, 2017 12:34 PM
> To: amd-gfx mailing list
> Subject: Plan: BO move throttling for visible VRAM evictions
> 
> Hi,
> 
> I'm sharing this idea here, because it's something that has been
> decreasing our performance a lot recently, for example:
> http://openbenchmarking.org/prospect/1703011-RI-
> RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
> 
> I think the problem there is that Mesa git started uploading
> descriptors and uniforms to VRAM, which helps when TC L2 has a low
> hit/miss ratio, but the performance can randomly drop by an order of
> magnitude. I've heard rumours that kernel 4.11 has an improved
> allocator that should perform better, but the situation is still far
> from ideal.
> 
> AMD CPUs and APUs will hopefully suffer less, because we can resize
> the visible VRAM with the help of our CPU hw specs, but Intel CPUs
> will remain limited to 256 MB. The following plan describes how to do
> throttling for visible VRAM evictions.

Has anyone checked the Intel chipset docs?  Maybe they document the interface?  
There's also ACPI _SRS which should be the vendor independent way to handle 
this.

Alex

> 
> 
> 1) Theory
> 
> Initially, the driver doesn't care about where buffers are in VRAM,
> because VRAM buffers are only moved to visible VRAM on CPU page faults
> (when the CPU touches the buffer memory but the memory is in the
> invisible part of VRAM). When it happens,
> amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
> visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
> also marks the buffer as contiguous, which makes memory fragmentation
> worse.
> 
> I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
> was much higher in a CPU profiler than anything else in the kernel.
> 
> 
> 2) Monitoring via Gallium HUD
> 
> We need to expose 2 kernel counters via the INFO ioctl and display
> those via Gallium HUD:
> - The number of VRAM CPU page faults. (the number of calls to
> amdgpu_bo_fault_reserve_notify).
> - The number of bytes moved by ttm_bo_validate inside
> amdgpu_bo_fault_reserve_notify.
> 
> This will help us observe what exactly is happening and fine-tune the
> throttling when it's done.
> 
> 
> 3) Solution
> 
> a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
> (amdgpu_bo::had_cpu_page_fault = true)
> 
> b) Monitor the MB/s rate at which buffers are moved by
> amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
> don't move the buffer to visible VRAM. Move it to GTT instead. Note
> that moving to GTT can be cheaper, because moving to visible VRAM is
> likely to evict a lot of buffers there and unmap them from the CPU,
> but moving to GTT shouldn't evict or unmap anything.
> 
> c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
> it can be moved to VRAM if:
> - the GTT->VRAM move rate is low enough to allow it (this is the
> existing throttling mechanism)
> - the visible VRAM move rate is low enough that we will be OK with
> another CPU page fault if it happens.
> 
> d) The solution can be fine-tuned with the help of Gallium HUD to get
> the best performance under various scenarios. The current throttling
> mechanism can serve as an inspiration.
> 
> 
> That's it. Feel free to comment. I think this is our biggest
> performance bottleneck at the moment.
> 
> Marek
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Plan: BO move throttling for visible VRAM evictions

2017-03-24 Thread Marek Olšák
On Fri, Mar 24, 2017 at 5:45 PM, Christian König
 wrote:
> Am 24.03.2017 um 17:33 schrieb Marek Olšák:
>>
>> Hi,
>>
>> I'm sharing this idea here, because it's something that has been
>> decreasing our performance a lot recently, for example:
>>
>> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>>
>> I think the problem there is that Mesa git started uploading
>> descriptors and uniforms to VRAM, which helps when TC L2 has a low
>> hit/miss ratio, but the performance can randomly drop by an order of
>> magnitude. I've heard rumours that kernel 4.11 has an improved
>> allocator that should perform better, but the situation is still far
>> from ideal.
>>
>> AMD CPUs and APUs will hopefully suffer less, because we can resize
>> the visible VRAM with the help of our CPU hw specs, but Intel CPUs
>> will remain limited to 256 MB. The following plan describes how to do
>> throttling for visible VRAM evictions.
>>
>>
>> 1) Theory
>>
>> Initially, the driver doesn't care about where buffers are in VRAM,
>> because VRAM buffers are only moved to visible VRAM on CPU page faults
>> (when the CPU touches the buffer memory but the memory is in the
>> invisible part of VRAM). When it happens,
>> amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
>> visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
>> also marks the buffer as contiguous, which makes memory fragmentation
>> worse.
>>
>> I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
>> was much higher in a CPU profiler than anything else in the kernel.
>
>
> Good to know that my expectations on this are correct.
>
> How about fixing the need for contiguous buffers when CPU mapping them?
>
> That should actually be pretty easy to do.
>
>> 2) Monitoring via Gallium HUD
>>
>> We need to expose 2 kernel counters via the INFO ioctl and display
>> those via Gallium HUD:
>> - The number of VRAM CPU page faults. (the number of calls to
>> amdgpu_bo_fault_reserve_notify).
>> - The number of bytes moved by ttm_bo_validate inside
>> amdgpu_bo_fault_reserve_notify.
>>
>> This will help us observe what exactly is happening and fine-tune the
>> throttling when it's done.
>>
>>
>> 3) Solution
>>
>> a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
>> (amdgpu_bo::had_cpu_page_fault = true)
>
>
> What is that good for?
>
>> b) Monitor the MB/s rate at which buffers are moved by
>> amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
>> don't move the buffer to visible VRAM. Move it to GTT instead. Note
>> that moving to GTT can be cheaper, because moving to visible VRAM is
>> likely to evict a lot of buffers there and unmap them from the CPU,
>> but moving to GTT shouldn't evict or unmap anything.
>
>
> Yeah, had that idea as well. I've been working on adding a context to TTM's
> BO validation call chain.
>
> This way we could add a byte limit on how much TTM will try to evict before
> returning -ENOMEM (or better ENOSPC).
>
>> c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
>> it can be moved to VRAM if:
>> - the GTT->VRAM move rate is low enough to allow it (this is the
>> existing throttling mechanism)
>> - the visible VRAM move rate is low enough that we will be OK with
>> another CPU page fault if it happens.
>
>
> Interesting idea, need to think a bit about it.
>
> But I would say this has second priority, fixing the contiguous buffer
> requirement should be first. Going to work on that next.

Interesting. I didn't know the contiguous setting wasn't required.

Marek


Re: Plan: BO move throttling for visible VRAM evictions

2017-03-24 Thread Christian König

Am 24.03.2017 um 17:33 schrieb Marek Olšák:

Hi,

I'm sharing this idea here, because it's something that has been
decreasing our performance a lot recently, for example:
http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa

I think the problem there is that Mesa git started uploading
descriptors and uniforms to VRAM, which helps when TC L2 has a low
hit/miss ratio, but the performance can randomly drop by an order of
magnitude. I've heard rumours that kernel 4.11 has an improved
allocator that should perform better, but the situation is still far
from ideal.

AMD CPUs and APUs will hopefully suffer less, because we can resize
the visible VRAM with the help of our CPU hw specs, but Intel CPUs
will remain limited to 256 MB. The following plan describes how to do
throttling for visible VRAM evictions.


1) Theory

Initially, the driver doesn't care about where buffers are in VRAM,
because VRAM buffers are only moved to visible VRAM on CPU page faults
(when the CPU touches the buffer memory but the memory is in the
invisible part of VRAM). When it happens,
amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
also marks the buffer as contiguous, which makes memory fragmentation
worse.

I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
was much higher in a CPU profiler than anything else in the kernel.


Good to know that my expectations on this are correct.

How about fixing the need for contiguous buffers when CPU mapping them?

That should actually be pretty easy to do.


2) Monitoring via Gallium HUD

We need to expose 2 kernel counters via the INFO ioctl and display
those via Gallium HUD:
- The number of VRAM CPU page faults. (the number of calls to
amdgpu_bo_fault_reserve_notify).
- The number of bytes moved by ttm_bo_validate inside
amdgpu_bo_fault_reserve_notify.

This will help us observe what exactly is happening and fine-tune the
throttling when it's done.


3) Solution

a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
(amdgpu_bo::had_cpu_page_fault = true)


What is that good for?


b) Monitor the MB/s rate at which buffers are moved by
amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
don't move the buffer to visible VRAM. Move it to GTT instead. Note
that moving to GTT can be cheaper, because moving to visible VRAM is
likely to evict a lot of buffers there and unmap them from the CPU,
but moving to GTT shouldn't evict or unmap anything.


Yeah, had that idea as well. I've been working on adding a context to TTM's
BO validation call chain.


This way we could add a byte limit on how much TTM will try to evict 
before returning -ENOMEM (or better ENOSPC).
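As a rough illustration of that byte-limit idea (all names here are hypothetical, plain userspace C rather than the actual TTM call chain):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical validation context carrying an eviction budget; the real
 * TTM change would thread something like this through ttm_bo_validate()
 * and the eviction path. */
struct eviction_ctx {
	uint64_t bytes_evicted;	/* evicted so far in this validation */
	uint64_t bytes_limit;	/* caller-supplied budget, 0 = unlimited */
};

/* Charge one BO eviction against the budget; once the limit would be
 * exceeded, stop evicting and report -ENOSPC to the caller. */
static int eviction_charge(struct eviction_ctx *ctx, uint64_t bo_size)
{
	if (ctx->bytes_limit &&
	    ctx->bytes_evicted + bo_size > ctx->bytes_limit)
		return -ENOSPC;
	ctx->bytes_evicted += bo_size;
	return 0;
}
```

A CS-time caller could then size the budget from the current move-rate throttling state.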



c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
it can be moved to VRAM if:
- the GTT->VRAM move rate is low enough to allow it (this is the
existing throttling mechanism)
- the visible VRAM move rate is low enough that we will be OK with
another CPU page fault if it happens.


Interesting idea, need to think a bit about it.

But I would say this has second priority, fixing the contiguous buffer 
requirement should be first. Going to work on that next.



d) The solution can be fine-tuned with the help of Gallium HUD to get
the best performance under various scenarios. The current throttling
mechanism can serve as an inspiration.


That's it. Feel free to comment. I think this is our biggest
performance bottleneck at the moment.


Yeah, completely agree.

Regards,
Christian.



Marek





Re: [PATCH 1/4] PCI: add resizeable BAR infrastructure v3

2017-03-24 Thread Bjorn Helgaas
On Mon, Mar 13, 2017 at 01:41:33PM +0100, Christian König wrote:
> From: Christian König 
> 
> Just the defines and helper functions to read the possible sizes of a BAR and
> update it's size.

s/it's/its/

> See 
> https://pcisig.com/sites/default/files/specification_documents/ECN_Resizable-BAR_24Apr2008.pdf.

It's good to have the public ECN that anybody can read, but we should
also have a reference to the full spec that incorporates it, e.g.,
PCIe r3.1, sec 7.22.

> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1946,6 +1946,9 @@ void pci_request_acs(void);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
> struct pci_dev *end, u16 acs_flags);
> +u32 pci_rbar_get_possible_sizes(struct pci_dev *pdev, int bar);
> +int pci_rbar_get_current_size(struct pci_dev *pdev, int bar);
> +int pci_rbar_set_size(struct pci_dev *pdev, int bar, int size);

These should be declared in drivers/pci/pci.h unless they're needed
outside drivers/pci.  I hope they aren't needed outside, because
they're not safe to use after the PCI core has claimed resources.

>  #define PCI_VPD_LRDT 0x80/* Large Resource Data Type */
>  #define PCI_VPD_LRDT_ID(x)   ((x) | PCI_VPD_LRDT)
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index e5a2e68..6de29d6 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -932,9 +932,16 @@
>  #define PCI_SATA_SIZEOF_LONG 16
>  
>  /* Resizable BARs */
> +#define PCI_REBAR_CAP4   /* capability register */
> +#define  PCI_REBAR_CTRL_SIZES_MASK   (0xF << 4)  /* mask for sizes */
> +#define  PCI_REBAR_CTRL_SIZES_SHIFT  4   /* shift for sizes */
>  #define PCI_REBAR_CTRL   8   /* control register */
> +#define  PCI_REBAR_CTRL_BAR_IDX_MASK (7 << 0)/* mask for bar index */
> +#define  PCI_REBAR_CTRL_BAR_IDX_SHIFT0   /* shift for bar index 
> */
>  #define  PCI_REBAR_CTRL_NBAR_MASK(7 << 5)/* mask for # bars */
>  #define  PCI_REBAR_CTRL_NBAR_SHIFT   5   /* shift for # bars */
> +#define  PCI_REBAR_CTRL_BAR_SIZE_MASK(0x1F << 8) /* mask for bar 
> size */
> +#define  PCI_REBAR_CTRL_BAR_SIZE_SHIFT   8   /* shift for bar size */

s/bar/BAR/ several places above
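For reference, here is how the fields defined above decode in practice (a userspace sketch; the helper names are illustrative, not the proposed kernel API). Per the Resizable BAR ECN, bit n of the shifted-down sizes field indicates support for a BAR size of 1 MB << n, and the encoded size value in the control register maps to 2^(value + 20) bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Field layout mirrored from the pci_regs.h hunk quoted above. */
#define PCI_REBAR_CTRL_SIZES_MASK	(0xF << 4)	/* mask for sizes */
#define PCI_REBAR_CTRL_SIZES_SHIFT	4
#define PCI_REBAR_CTRL_BAR_SIZE_MASK	(0x1F << 8)	/* mask for BAR size */
#define PCI_REBAR_CTRL_BAR_SIZE_SHIFT	8

/* Bitmap of supported sizes: bit n set => (1 MB << n) is supported. */
static uint32_t rebar_possible_sizes(uint32_t cap)
{
	return (cap & PCI_REBAR_CTRL_SIZES_MASK) >> PCI_REBAR_CTRL_SIZES_SHIFT;
}

/* Encoded size value 0..19 in the control register => 1 MB .. 512 GB. */
static uint64_t rebar_size_to_bytes(uint32_t ctrl)
{
	uint32_t enc = (ctrl & PCI_REBAR_CTRL_BAR_SIZE_MASK) >>
		       PCI_REBAR_CTRL_BAR_SIZE_SHIFT;
	return 1ULL << (enc + 20);
}
```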

>  /* Dynamic Power Allocation */
>  #define PCI_DPA_CAP  4   /* capability register */
> -- 
> 2.7.4
> 


Plan: BO move throttling for visible VRAM evictions

2017-03-24 Thread Marek Olšák
Hi,

I'm sharing this idea here, because it's something that has been
decreasing our performance a lot recently, for example:
http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa

I think the problem there is that Mesa git started uploading
descriptors and uniforms to VRAM, which helps when TC L2 has a low
hit/miss ratio, but the performance can randomly drop by an order of
magnitude. I've heard rumours that kernel 4.11 has an improved
allocator that should perform better, but the situation is still far
from ideal.

AMD CPUs and APUs will hopefully suffer less, because we can resize
the visible VRAM with the help of our CPU hw specs, but Intel CPUs
will remain limited to 256 MB. The following plan describes how to do
throttling for visible VRAM evictions.


1) Theory

Initially, the driver doesn't care about where buffers are in VRAM,
because VRAM buffers are only moved to visible VRAM on CPU page faults
(when the CPU touches the buffer memory but the memory is in the
invisible part of VRAM). When it happens,
amdgpu_bo_fault_reserve_notify is called, which moves the buffer to
visible VRAM, and the app continues. amdgpu_bo_fault_reserve_notify
also marks the buffer as contiguous, which makes memory fragmentation
worse.

I verified this with DiRT Rally where amdgpu_bo_fault_reserve_notify
was much higher in a CPU profiler than anything else in the kernel.


2) Monitoring via Gallium HUD

We need to expose 2 kernel counters via the INFO ioctl and display
those via Gallium HUD:
- The number of VRAM CPU page faults. (the number of calls to
amdgpu_bo_fault_reserve_notify).
- The number of bytes moved by ttm_bo_validate inside
amdgpu_bo_fault_reserve_notify.

This will help us observe what exactly is happening and fine-tune the
throttling when it's done.
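A minimal sketch of the two counters (names hypothetical; the real code would bump these in amdgpu_bo_fault_reserve_notify and report them through the INFO ioctl, likely with atomics):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-device stats for visible-VRAM CPU page faults. */
struct vram_fault_stats {
	uint64_t num_cpu_page_faults;	/* calls to the fault handler */
	uint64_t bytes_moved;		/* bytes moved by ttm_bo_validate there */
};

/* Called once per CPU page fault that triggered a BO move. */
static void record_fault(struct vram_fault_stats *s, uint64_t bytes_moved)
{
	s->num_cpu_page_faults++;
	s->bytes_moved += bytes_moved;
}
```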


3) Solution

a) When amdgpu_bo_fault_reserve_notify is called, record the fact.
(amdgpu_bo::had_cpu_page_fault = true)

b) Monitor the MB/s rate at which buffers are moved by
amdgpu_bo_fault_reserve_notify. If we get above a specific threshold,
don't move the buffer to visible VRAM. Move it to GTT instead. Note
that moving to GTT can be cheaper, because moving to visible VRAM is
likely to evict a lot of buffers there and unmap them from the CPU,
but moving to GTT shouldn't evict or unmap anything.

c) When we get into the CS ioctl and a buffer has had_cpu_page_fault,
it can be moved to VRAM if:
- the GTT->VRAM move rate is low enough to allow it (this is the
existing throttling mechanism)
- the visible VRAM move rate is low enough that we will be OK with
another CPU page fault if it happens.

d) The solution can be fine-tuned with the help of Gallium HUD to get
the best performance under various scenarios. The current throttling
mechanism can serve as an inspiration.
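Putting b) and c) together, the core throttling test might look something like this (illustrative only; the threshold, window bookkeeping, and names are all placeholders):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Move rate measured over a recent time window. */
struct move_rate {
	uint64_t bytes_moved;	/* bytes moved within the window */
	uint64_t window_us;	/* window length in microseconds */
};

/* 3b): if visible-VRAM faults are already moving data faster than the
 * threshold, send the faulting buffer to GTT instead of visible VRAM. */
static bool should_move_to_gtt(const struct move_rate *r,
			       uint64_t threshold_mb_per_s)
{
	/* bytes per microsecond is numerically MB/s (decimal MB) */
	uint64_t mb_per_s = r->window_us ? r->bytes_moved / r->window_us : 0;

	return mb_per_s > threshold_mb_per_s;
}
```

3c) would apply the same test, plus the existing GTT->VRAM throttle, before promoting a had_cpu_page_fault buffer back to VRAM.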


That's it. Feel free to comment. I think this is our biggest
performance bottleneck at the moment.

Marek


RE: [PATCH 09/13] drm/amdgpu:fix gmc_v9 vm fault process for SRIOV

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 09/13] drm/amdgpu:fix gmc_v9 vm fault process for SRIOV
> 
> for SRIOV we cannot access registers in the IRQ routine
> with the regular KIQ method
> 
> Change-Id: Ifae3164cf12311b851ae131f58175f6ec3174f82
> Signed-off-by: Monk Liu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 51a1919..88221bb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -138,20 +138,28 @@ static int gmc_v9_0_process_interrupt(struct
> amdgpu_device *adev,
>   addr = (u64)entry->src_data[0] << 12;
>   addr |= ((u64)entry->src_data[1] & 0xf) << 44;
> 
> - if (entry->vm_id_src) {
> - status = RREG32(mmhub->vm_l2_pro_fault_status);
> - WREG32_P(mmhub->vm_l2_pro_fault_cntl, 1, ~1);
> - } else {
> - status = RREG32(gfxhub->vm_l2_pro_fault_status);
> - WREG32_P(gfxhub->vm_l2_pro_fault_cntl, 1, ~1);
> - }
> + if (!amdgpu_sriov_vf(adev)) {
> + if (entry->vm_id_src) {
> + status = RREG32(mmhub->vm_l2_pro_fault_status);
> + WREG32_P(mmhub->vm_l2_pro_fault_cntl, 1, ~1);
> + } else {
> + status = RREG32(gfxhub->vm_l2_pro_fault_status);
> + WREG32_P(gfxhub->vm_l2_pro_fault_cntl, 1, ~1);
> + }
> 
> - DRM_ERROR("[%s]VMC page fault (src_id:%u ring:%u vm_id:%u
> pas_id:%u) "
> + DRM_ERROR("[%s]VMC page fault (src_id:%u ring:%u
> vm_id:%u pas_id:%u) "
> "at page 0x%016llx from %d\n"
> "VM_L2_PROTECTION_FAULT_STATUS:0x%08X\n",
> entry->vm_id_src ? "mmhub" : "gfxhub",
> entry->src_id, entry->ring_id, entry->vm_id, entry->pas_id,
> addr, entry->client_id, status);

Fix the indentation here.

> + } else {
> + DRM_ERROR("[%s]VMC page fault (src_id:%u ring:%u
> vm_id:%u pas_id:%u) "
> +   "at page 0x%016llx from %d\n",
> +   entry->vm_id_src ? "mmhub" : "gfxhub",
> +   entry->src_id, entry->ring_id, entry->vm_id, entry->pas_id,
> +   addr, entry->client_id);

And here.  With that fixed:
Reviewed-by: Alex Deucher 


> + }
> 
>   return 0;
>  }
> --
> 2.7.4
> 


[PATCH 1/2] drm/amdgpu: drop GB_GPU_ID from the golden settings

2017-03-24 Thread Christian König
From: Christian König 

That register is marked deprecated; reading it results in a bus error.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index ad82ab7..b196431 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -88,7 +88,6 @@ static const struct amdgpu_gds_reg_offset 
amdgpu_gds_reg_offset[] =
 static const u32 golden_settings_gc_9_0[] =
 {
SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2), 0xf00ffeff, 0x0400,
-   SOC15_REG_OFFSET(GC, 0, mmGB_GPU_ID), 0x000f, 0x,
SOC15_REG_OFFSET(GC, 0, mmPA_SC_BINNER_EVENT_CNTL_3), 0x0003, 
0x82400024,
SOC15_REG_OFFSET(GC, 0, mmPA_SC_ENHANCE), 0x3fff, 0x0001,
SOC15_REG_OFFSET(GC, 0, mmPA_SC_LINE_STIPPLE_STATE), 0xff0f, 
0x,
-- 
2.5.0



[PATCH 2/2] drm/amdgpu: sanitize soc15_allowed_read_registers

2017-03-24 Thread Christian König
From: Christian König 

Disallow mmCC_RB_BACKEND_DISABLE; reading it can cause GRBM problems, and the
same info is available cached as enabled_rb_pipes_mask.

Also remove duplicate mmCP_CPF_BUSY_STAT.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7e54d9dc..be0d47f 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -296,11 +296,9 @@ static struct amdgpu_allowed_register_entry 
soc15_allowed_read_registers[] = {
{ SOC15_REG_OFFSET(GC, 0, mmCP_CPF_BUSY_STAT), false},
{ SOC15_REG_OFFSET(GC, 0, mmCP_CPF_STALLED_STAT1), false},
{ SOC15_REG_OFFSET(GC, 0, mmCP_CPF_STATUS), false},
-   { SOC15_REG_OFFSET(GC, 0, mmCP_CPF_BUSY_STAT), false},
{ SOC15_REG_OFFSET(GC, 0, mmCP_CPC_STALLED_STAT1), false},
{ SOC15_REG_OFFSET(GC, 0, mmCP_CPC_STATUS), false},
{ SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG), false},
-   { SOC15_REG_OFFSET(GC, 0, mmCC_RB_BACKEND_DISABLE), false, true},
{ SOC15_REG_OFFSET(GC, 0, mmGC_USER_RB_BACKEND_DISABLE), false, true},
{ SOC15_REG_OFFSET(GC, 0, mmGB_BACKEND_MAP), false, false},
 };
-- 
2.5.0



RE: [PATCH umr] Add new AI CG bits to umr_print_config()

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Tom St Denis
> Sent: Friday, March 24, 2017 9:49 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: StDenis, Tom
> Subject: [PATCH umr] Add new AI CG bits to umr_print_config()
> 
> Signed-off-by: Tom St Denis 

Reviewed-by: Alex Deucher 

> ---
>  src/app/print_config.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/app/print_config.c b/src/app/print_config.c
> index 6a4bf5dd594a..6dbe0d42b8dc 100644
> --- a/src/app/print_config.c
> +++ b/src/app/print_config.c
> @@ -54,6 +54,12 @@ M(AMD_CG_SUPPORT_VCE_MGCG, 1UL << 14)
>  M(AMD_CG_SUPPORT_HDP_LS, 1UL << 15)
>  M(AMD_CG_SUPPORT_HDP_MGCG, 1UL << 16)
>  M(AMD_CG_SUPPORT_ROM_MGCG, 1UL << 17)
> +M(AMD_CG_SUPPORT_DRM_LS, 1UL << 18)
> +M(AMD_CG_SUPPORT_BIF_MGCG, 1UL << 19)
> +M(AMD_CG_SUPPORT_GFX_3D_CGCG, 1UL << 20)
> +M(AMD_CG_SUPPORT_GFX_3D_CGLS, 1UL << 21)
> +M(AMD_CG_SUPPORT_DRM_MGCG, 1UL << 22)
> +M(AMD_CG_SUPPORT_DF_MGCG, 1UL << 23)
>  { NULL, 0, },
>  };
> 
> --
> 2.12.0
> 


RE: [PATCH 11/13] drm/amdgpu:fix missing programing critical registers

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:39 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 11/13] drm/amdgpu:fix missing programing critical registers
> 
> those MC_VM registers won't be programmed by VBIOS in VF
> so the driver is responsible for programming them.
> 
> Change-Id: I817371346d86bd5668ac80a486dadc1605d0b6ca
> Signed-off-by: Monk Liu 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 9 +
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 4 +++-
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 9 +
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> index 1ff019c..1d3c34d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> @@ -53,6 +53,15 @@ int gfxhub_v1_0_gart_enable(struct amdgpu_device
> *adev)
> 
>   mmMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB),
>   (u32)(value >> 44));
> 
> + if (amdgpu_sriov_vf(adev)) {
> + /* MC_VM_FB_LOCATION_BASE/TOP is NULL for VF,
> because they are VF copy registers so
> + vbios post doesn't program them, for SRIOV driver need to
> program them */
> + WREG32(SOC15_REG_OFFSET(GC, 0,
> mmMC_VM_FB_LOCATION_BASE),
> + adev->mc.vram_start >> 24);
> + WREG32(SOC15_REG_OFFSET(GC, 0,
> mmMC_VM_FB_LOCATION_TOP),
> + adev->mc.vram_end >> 24);
> + }
> +
>   /* Disable AGP. */
>   WREG32(SOC15_REG_OFFSET(GC, 0, mmMC_VM_AGP_BASE), 0);
>   WREG32(SOC15_REG_OFFSET(GC, 0, mmMC_VM_AGP_TOP), 0);
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 88221bb..d841bc9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -383,7 +383,9 @@ static int gmc_v9_0_late_init(void *handle)
>  static void gmc_v9_0_vram_gtt_location(struct amdgpu_device *adev,
>   struct amdgpu_mc *mc)
>  {
> - u64 base = mmhub_v1_0_get_fb_location(adev);
> + u64 base = 0;
> + if (!amdgpu_sriov_vf(adev))
> + base = mmhub_v1_0_get_fb_location(adev);
>   amdgpu_vram_location(adev, >mc, base);
>   adev->mc.gtt_base_align = 0;
>   amdgpu_gtt_location(adev, mc);
> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> index b1e0e6b..12025d0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> @@ -67,6 +67,15 @@ int mmhub_v1_0_gart_enable(struct amdgpu_device
> *adev)
> 
>   mmMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB),
>   (u32)(value >> 44));
> 
> + if (amdgpu_sriov_vf(adev)) {
> + /* MC_VM_FB_LOCATION_BASE/TOP is NULL for VF,
> because they are VF copy registers so
> + vbios post doesn't program them, for SRIOV driver need to
> program them */
> + WREG32(SOC15_REG_OFFSET(MMHUB, 0,
> mmMC_VM_FB_LOCATION_BASE),
> + adev->mc.vram_start >> 24);
> + WREG32(SOC15_REG_OFFSET(MMHUB, 0,
> mmMC_VM_FB_LOCATION_TOP),
> + adev->mc.vram_end >> 24);
> + }
> +
>   /* Disable AGP. */
>   WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMC_VM_AGP_BASE),
> 0);
>   WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMC_VM_AGP_TOP),
> 0);
> --
> 2.7.4
> 


RE: [PATCH 08/13] drm/amdgpu:no cg for soc15 of SRIOV

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 08/13] drm/amdgpu:no cg for soc15 of SRIOV
> 
> no CG for SRIOV on SOC15
> 
> Change-Id: Ic17e99862a875de9bfc811c72d0ab627ba58d585
> Signed-off-by: Monk Liu 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 4ebe94b..509cd0a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -781,6 +781,9 @@ static int soc15_common_set_clockgating_state(void
> *handle,
>  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> 
> + if (amdgpu_sriov_vf(adev))
> + return 0;
> +
>   switch (adev->asic_type) {
>   case CHIP_VEGA10:
>   nbio_v6_1_update_medium_grain_clock_gating(adev,
> --
> 2.7.4
> 


RE: [PATCH 06/13] drm/amdgpu:change sequence of SDMA v4 init

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 06/13] drm/amdgpu:change sequence of SDMA v4 init
> 
> must set minor_update.enable before writing a smaller value
> to wptr/doorbell, so for SRIOV we need to set that register
> bit in the hw_init period.
> 
> this could fix the SDMA ring test failure after guest reboot
> 
> Change-Id: Id863396788cc5b35550cdcac405131d41690e77a
> Signed-off-by: Monk Liu 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 37
> +-
>  1 file changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index ee3b4a9..4d9fec8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -560,8 +560,14 @@ static int sdma_v4_0_gfx_resume(struct
> amdgpu_device *adev)
>   WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_RB_BASE_HI), ring->gpu_addr >> 40);
> 
>   ring->wptr = 0;
> - WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_RB_WPTR), lower_32_bits(ring->wptr) << 2);
> - WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_RB_WPTR_HI), upper_32_bits(ring->wptr) << 2);
> +
> + /* before programing wptr to a less value, need set
> minor_ptr_update first */
> + WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_MINOR_PTR_UPDATE), 1);
> +
> + if (!amdgpu_sriov_vf(adev)) { /* only bare-metal use register
> write for wptr */
> + WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_RB_WPTR), lower_32_bits(ring->wptr) << 2);
> + WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_RB_WPTR_HI), upper_32_bits(ring->wptr) << 2);
> + }
> 
>   doorbell = RREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_DOORBELL));
>   doorbell_offset = RREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_DOORBELL_OFFSET));
> @@ -577,15 +583,23 @@ static int sdma_v4_0_gfx_resume(struct
> amdgpu_device *adev)
>   WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_DOORBELL_OFFSET), doorbell_offset);
>   nbio_v6_1_sdma_doorbell_range(adev, i, ring-
> >use_doorbell, ring->doorbell_index);
> 
> + if (amdgpu_sriov_vf(adev))
> + sdma_v4_0_ring_set_wptr(ring);
> +
> + /* set minor_ptr_update to 0 after wptr programed */
> + WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_GFX_MINOR_PTR_UPDATE), 0);
> +
>   /* set utc l1 enable flag always to 1 */
>   temp = RREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_CNTL));
>   temp = REG_SET_FIELD(temp, SDMA0_CNTL,
> UTC_L1_ENABLE, 1);
>   WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_CNTL),
> temp);
> 
> - /* unhalt engine */
> - temp = RREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_F32_CNTL));
> - temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
> - WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_F32_CNTL), temp);
> + if (!amdgpu_sriov_vf(adev)) {
> + /* unhalt engine */
> + temp = RREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_F32_CNTL));
> + temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL,
> HALT, 0);
> + WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_F32_CNTL), temp);
> + }
> 
>   /* enable DMA RB */
>   rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL,
> RB_ENABLE, 1);
> @@ -601,6 +615,11 @@ static int sdma_v4_0_gfx_resume(struct
> amdgpu_device *adev)
> 
>   ring->ready = true;
> 
> + if (amdgpu_sriov_vf(adev)) { /* bare-metal sequence
> doesn't need below to lines */
> + sdma_v4_0_ctx_switch_enable(adev, true);
> + sdma_v4_0_enable(adev, true);
> + }
> +
>   r = amdgpu_ring_test_ring(ring);
>   if (r) {
>   ring->ready = false;
> @@ -671,8 +690,6 @@ static int sdma_v4_0_load_microcode(struct
> amdgpu_device *adev)
>   (adev->sdma.instance[i].fw->data +
>   le32_to_cpu(hdr-
> >header.ucode_array_offset_bytes));
> 
> - sdma_v4_0_print_ucode_regs(adev);
> -

This should be a separate change.  With that fixed:
Reviewed-by: Alex Deucher 

It might be nice to further unify these sequences if possible.  Some of these 
may not be required for bare metal, but as long as they don't hurt anything, I 
think it makes sense to reduce the number of code paths in general.

Alex


>   WREG32(sdma_v4_0_get_reg_offset(i,
> mmSDMA0_UCODE_ADDR), 0);
> 
> 
> @@ -699,10 +716,10 @@ static int sdma_v4_0_load_microcode(struct
> 

RE: [PATCH 07/13] drm/amdgpu:two fixings for sdma v4 for SRIOV

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 07/13] drm/amdgpu:two fixings for sdma v4 for SRIOV
> 
> no hw_fini for SRIOV, otherwise other VFs will be affected
> no CG for SRIOV
> 
> Change-Id: I1b0525eb8d08754b4bd1a6ee6798bf5e41c6bc6b
> Signed-off-by: Monk Liu 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index 4d9fec8..443f850 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -1172,6 +1172,9 @@ static int sdma_v4_0_hw_fini(void *handle)
>  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> 
> + if (amdgpu_sriov_vf(adev))
> + return 0;
> +
>   sdma_v4_0_ctx_switch_enable(adev, false);
>   sdma_v4_0_enable(adev, false);
> 
> @@ -1406,6 +1409,9 @@ static int sdma_v4_0_set_clockgating_state(void
> *handle,
>  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> 
> + if (amdgpu_sriov_vf(adev))
> + return 0;
> +
>   switch (adev->asic_type) {
>   case CHIP_VEGA10:
>   sdma_v4_0_update_medium_grain_clock_gating(adev,
> --
> 2.7.4
> 


RE: [PATCH 04/13] drm/amdgpu:virt_init_setting invoke is missed!

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 04/13] drm/amdgpu:virt_init_setting invoke is missed!
> 
> this must be invoked during early init
> 
> Change-Id: I68726dd36825259913b47493ba1e9c467b368d0c
> Signed-off-by: Monk Liu 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 7e54d9dc..4ebe94b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -553,6 +553,10 @@ static int soc15_common_early_init(void *handle)
>   (amdgpu_ip_block_mask & (1 << AMD_IP_BLOCK_TYPE_PSP)))
>   psp_enabled = true;
> 
> + if (amdgpu_sriov_vf(adev)) {
> + amdgpu_virt_init_setting(adev);
> + }
> +
>   /*
>* nbio need be used for both sdma and gfx9, but only
>* initializes once
> --
> 2.7.4
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 02/13] drm/amdgpu:enable mcbp for gfx9

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 02/13] drm/amdgpu:enable mcbp for gfx9
> 
> set bit 21 of the IB.control field to actually enable
> MCBP for SRIOV
> 
> Change-Id: Ie5126d5be95e037087cf7167c28c61975f40d784
> Signed-off-by: Monk Liu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index ad82ab7..0d8fb51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -3016,6 +3016,9 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
> 
>  control |= ib->length_dw | (vm_id << 24);
> 
> +	if (amdgpu_sriov_vf(ring->adev) && (ib->flags & AMDGPU_IB_FLAG_PREEMPT))
> +		control |= (1 << 21);

Can you add proper defines for these bits?

With that fixed:
Reviewed-by: Alex Deucher 

> +
>  amdgpu_ring_write(ring, header);
>   BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
>  amdgpu_ring_write(ring,
> --
> 2.7.4
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 01/13] drm/amdgpu:imple cond_exec for gfx8

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk
> Subject: [PATCH 01/13] drm/amdgpu:imple cond_exec for gfx8
> 
> when MCBP is enabled for gfx8, cond_exec must also
> be implemented, otherwise there is a chance of a cross-engine
> (CE and ME) deadlock when a world switch happens.
> 
> Change-Id: I6bdb5f91dc6e1b56dcad43741a109a6eb08cb5fa
> Signed-off-by: Monk Liu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 28
> 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 5757300..396c075 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -6499,6 +6499,32 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
>  		(flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : ring->adev->virt.csa_vmid0_addr);
>  }
>  }
> 
> +static unsigned gfx_v8_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring)
> +{
> + unsigned ret;

New line between stack variables and code please.

> + amdgpu_ring_write(ring, PACKET3(PACKET3_COND_EXEC, 3));
> +	amdgpu_ring_write(ring, lower_32_bits(ring->cond_exe_gpu_addr));
> +	amdgpu_ring_write(ring, upper_32_bits(ring->cond_exe_gpu_addr));
> +	amdgpu_ring_write(ring, 0); /* discard following DWs if *cond_exec_gpu_addr==0 */
> + ret = ring->wptr & ring->buf_mask;
> +	amdgpu_ring_write(ring, 0x55aa55aa); /* patch dummy value later */
> + return ret;
> +}
> +
> +static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigned offset)
> +{
> + unsigned cur;

Same thing here.

With these fixed:
Reviewed-by: Alex Deucher 

> + BUG_ON(offset > ring->buf_mask);
> + BUG_ON(ring->ring[offset] != 0x55aa55aa);
> +
> + cur = (ring->wptr & ring->buf_mask) - 1;
> + if (likely(cur > offset))
> + ring->ring[offset] = cur - offset;
> + else
> + ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
> +}
> +
> +
>  static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
>  {
>   struct amdgpu_device *adev = ring->adev;
> @@ -6788,6 +6814,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = {
>   .pad_ib = amdgpu_ring_generic_pad_ib,
>   .emit_switch_buffer = gfx_v8_ring_emit_sb,
>   .emit_cntxcntl = gfx_v8_ring_emit_cntxcntl,
> + .init_cond_exec = gfx_v8_0_ring_emit_init_cond_exec,
> + .patch_cond_exec = gfx_v8_0_ring_emit_patch_cond_exec,
>  };
> 
>  static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_compute = {
> --
> 2.7.4
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 6/6] drm/amdgpu: fix to remove HDP MGCG on soc15

2017-03-24 Thread Deucher, Alexander
> -Original Message-
> From: Huang Rui [mailto:ray.hu...@amd.com]
> Sent: Friday, March 24, 2017 1:48 AM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> Cc: Huan, Alvin; Huang, Ray
> Subject: [PATCH 6/6] drm/amdgpu: fix to remove HDP MGCG on soc15
> 
> SOC15 doesn't enable HDP MGCG yet.
> 
> Signed-off-by: Huang Rui 

For the series:
Reviewed-by: Alex Deucher 


> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index dd70984..a7a0c27 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -561,7 +561,6 @@ static int soc15_common_early_init(void *handle)
>   AMD_CG_SUPPORT_GFX_CGLS |
>   AMD_CG_SUPPORT_BIF_MGCG |
>   AMD_CG_SUPPORT_BIF_LS |
> - AMD_CG_SUPPORT_HDP_MGCG |
>   AMD_CG_SUPPORT_HDP_LS |
>   AMD_CG_SUPPORT_DRM_MGCG |
>   AMD_CG_SUPPORT_DRM_LS |
> --
> 2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH umr] Add family text for family 141

2017-03-24 Thread Tom St Denis
Signed-off-by: Tom St Denis 
---
 src/app/print_config.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/app/print_config.c b/src/app/print_config.c
index 6dbe0d42b8dc..e295302ab7a3 100644
--- a/src/app/print_config.c
+++ b/src/app/print_config.c
@@ -91,6 +91,7 @@ static const struct {
{ "Kaveri", 125 },
{ "Volcanic Islands", 130 },
{ "Carrizo", 135 },
+   { "Arctic Islands", 141 },
{ NULL, 0 },
 };
 
-- 
2.12.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH umr] Add new AI CG bits to umr_print_config()

2017-03-24 Thread Tom St Denis
Signed-off-by: Tom St Denis 
---
 src/app/print_config.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/app/print_config.c b/src/app/print_config.c
index 6a4bf5dd594a..6dbe0d42b8dc 100644
--- a/src/app/print_config.c
+++ b/src/app/print_config.c
@@ -54,6 +54,12 @@ M(AMD_CG_SUPPORT_VCE_MGCG, 1UL << 14)
 M(AMD_CG_SUPPORT_HDP_LS, 1UL << 15)
 M(AMD_CG_SUPPORT_HDP_MGCG, 1UL << 16)
 M(AMD_CG_SUPPORT_ROM_MGCG, 1UL << 17)
+M(AMD_CG_SUPPORT_DRM_LS, 1UL << 18)
+M(AMD_CG_SUPPORT_BIF_MGCG, 1UL << 19)
+M(AMD_CG_SUPPORT_GFX_3D_CGCG, 1UL << 20)
+M(AMD_CG_SUPPORT_GFX_3D_CGLS, 1UL << 21)
+M(AMD_CG_SUPPORT_DRM_MGCG, 1UL << 22)
+M(AMD_CG_SUPPORT_DF_MGCG, 1UL << 23)
 { NULL, 0, },
 };
 
-- 
2.12.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH umr] Add program memory dump to wave status.

2017-03-24 Thread Edward O'Callaghan
Reviewed-by: Edward O'Callaghan 

On 03/22/2017 01:11 AM, Tom St Denis wrote:
> It will display the leading 4 words up to
> the current PC value and then 4 words after.
> 
> Signed-off-by: Tom St Denis 
> ---
>  src/app/print_waves.c | 44 +---
>  1 file changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/src/app/print_waves.c b/src/app/print_waves.c
> index f0eeeba43a99..e3662983d8d1 100644
> --- a/src/app/print_waves.c
> +++ b/src/app/print_waves.c
> @@ -35,7 +35,8 @@
>  
>  void umr_print_waves(struct umr_asic *asic)
>  {
> - uint32_t x, se, sh, cu, simd, wave, sgprs[1024], shift;
> + uint32_t x, se, sh, cu, simd, wave, sgprs[1024], shift, opcodes[8];
> + uint64_t pgm_addr;
>   struct umr_wave_status ws;
>   int first = 1, col = 0;
>  
> @@ -74,17 +75,24 @@ void umr_print_waves(struct umr_asic *asic)
> 					(unsigned long)ws.hw_id.value, (unsigned long)ws.gpr_alloc.value, (unsigned long)ws.lds_alloc.value, (unsigned long)ws.trapsts.value, (unsigned long)ws.ib_sts.value,
> 					(unsigned long)ws.tba_hi, (unsigned long)ws.tba_lo, (unsigned long)ws.tma_hi, (unsigned long)ws.tma_lo, (unsigned long)ws.ib_dbg0, (unsigned long)ws.m0
> 					);
> -				for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
> -					printf(">SGPRS[%u..%u] = { %08lx, %08lx, %08lx, %08lx }\n",
> -						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
> -						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
> -						(unsigned long)sgprs[x],
> -						(unsigned long)sgprs[x+1],
> -						(unsigned long)sgprs[x+2],
> -						(unsigned long)sgprs[x+3]);
> -			}
> +				for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
> +					printf(">SGPRS[%u..%u] = { %08lx, %08lx, %08lx, %08lx }\n",
> +						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
> +						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
> +						(unsigned long)sgprs[x],
> +						(unsigned long)sgprs[x+1],
> +						(unsigned long)sgprs[x+2],
> +						(unsigned long)sgprs[x+3]);
> 
> -			if (options.bitfields) {
> +				pgm_addr = (((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
> +				umr_read_vram(asic, ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);
> +				for (x = 0; x < sizeof(opcodes)/4; x++) {
> +					printf(">pgm[%lu@%llx] = %08lx\n",
> +						(unsigned long)ws.hw_id.vm_id,
> +						(unsigned long long)(pgm_addr + 4 * x),
> +						(unsigned long)opcodes[x]);
> +				}
> +			} else {
> 				first = 0;
> 				printf("\n--\nse%u.sh%u.cu%u.simd%u.wave%u\n",
> 					(unsigned)se, (unsigned)sh, (unsigned)cu, (unsigned)ws.hw_id.simd_id, (unsigned)ws.hw_id.wave_id);
> @@ -156,6 +164,20 @@ void umr_print_waves(struct umr_asic *asic)
> 					(unsigned long)sgprs[x+2],
> 					(unsigned long)sgprs[x+3]);
> 
> +				printf("\n\nPGM_MEM:\n");
> +				pgm_addr = (((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
> +				umr_read_vram(asic, ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);
> +				for (x = 0; x < sizeof(opcodes)/4; x++) {
> +					if (x == (sizeof(opcodes)/8))
> +

Re: [PATCH 0/6] drm/amdgpu: add get clockgating functions for new asic

2017-03-24 Thread Edward O'Callaghan
This series is,
Reviewed-by: Edward O'Callaghan 

On 03/24/2017 04:47 PM, Huang Rui wrote:
> Hi all,
> 
> This patch set adds get_clockgating functions, after that, we can use
> debugfs pm to check the dynamic clockgating status.
> 
> Thanks,
> Rui
> 
> Huang Rui (6):
>   drm/amdgpu: add get_clockgating callback for gfx v9
>   drm/amdgpu: add get_clockgating callback for nbio v6.1
>   drm/amdgpu: add get_clockgating callback for soc15
>   drm/amdgpu: add get_clockgating for sdma v4
>   drm/amdgpu: add get_clockgating callback for mmhub v1
>   drm/amdgpu: fix to remove HDP MGCG on soc15
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c  |  6 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 43 
> +
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 17 +
>  drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c  | 15 
>  drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h  |  1 +
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  | 17 +
>  drivers/gpu/drm/amd/amdgpu/soc15.c  | 35 ++-
>  7 files changed, 133 insertions(+), 1 deletion(-)
> 



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/6] drm/amdgpu: add get_clockgating callback for soc15

2017-03-24 Thread William Lewis


On 03/24/17 00:47, Huang Rui wrote:
> Signed-off-by: Huang Rui 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c |  3 +++
>   drivers/gpu/drm/amd/amdgpu/soc15.c | 34 ++
>   2 files changed, 37 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 743a852..fef89c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -55,7 +55,10 @@ static const struct cg_flag_name clocks[] = {
>   {AMD_CG_SUPPORT_VCE_MGCG, "Video Compression Engine Medium Grain Clock 
> Gating"},
>   {AMD_CG_SUPPORT_HDP_LS, "Host Data Path Light Sleep"},
>   {AMD_CG_SUPPORT_HDP_MGCG, "Host Data Path Medium Grain Clock Gating"},
> + {AMD_CG_SUPPORT_DRM_MGCG, "Digital Right Managment Medium Grain Clock Gating"},
> + {AMD_CG_SUPPORT_DRM_LS, "Digital Right Managment Light Sleep"},
s/Managment/Management/
>   {AMD_CG_SUPPORT_ROM_MGCG, "Rom Medium Grain Clock Gating"},
> + {AMD_CG_SUPPORT_DF_MGCG, "Data Fabric Medium Grain Clock Gating"},
>   {0, NULL},
>   };
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index e37c1ff..dd70984 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -782,6 +782,39 @@ static int soc15_common_set_clockgating_state(void *handle,
>   return 0;
>   }
>   
> +static void soc15_common_get_clockgating_state(void *handle, u32 *flags)
> +{
> + struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> + int data;
> +
> + nbio_v6_1_get_clockgating_state(adev, flags);
> +
> + /* AMD_CG_SUPPORT_HDP_LS */
> + data = RREG32(SOC15_REG_OFFSET(HDP, 0, mmHDP_MEM_POWER_LS));
> + if (data & HDP_MEM_POWER_LS__LS_ENABLE_MASK)
> + *flags |= AMD_CG_SUPPORT_HDP_LS;
> +
> + /* AMD_CG_SUPPORT_DRM_MGCG */
> + data = RREG32(SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_CGTT_DRM_CLK_CTRL0));
> + if (!(data & MP0_SMN_CGTT_DRM_CLK_CTRL0__SOFT_OVERRIDE0_MASK))
> + *flags |= AMD_CG_SUPPORT_DRM_MGCG;
> +
> + /* AMD_CG_SUPPORT_DRM_LS */
> + data = RREG32(SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_DRM_LIGHT_SLEEP_CTRL));
> + if (data & MP0_SMN_DRM_LIGHT_SLEEP_CTRL__MEM_LIGHT_SLEEP_EN_MASK)
> + *flags |= AMD_CG_SUPPORT_DRM_LS;
> +
> + /* AMD_CG_SUPPORT_ROM_MGCG */
> + data = RREG32(SOC15_REG_OFFSET(SMUIO, 0, mmCGTT_ROM_CLK_CTRL0));
> + if (!(data & CGTT_ROM_CLK_CTRL0__SOFT_OVERRIDE0_MASK))
> + *flags |= AMD_CG_SUPPORT_ROM_MGCG;
> +
> + /* AMD_CG_SUPPORT_DF_MGCG */
> + data = RREG32(SOC15_REG_OFFSET(DF, 0, mmDF_PIE_AON0_DfGlobalClkGater));
> + if (data & DF_MGCG_ENABLE_15_CYCLE_DELAY)
> + *flags |= AMD_CG_SUPPORT_DF_MGCG;
> +}
> +
>   static int soc15_common_set_powergating_state(void *handle,
>   enum amd_powergating_state state)
>   {
> @@ -804,4 +837,5 @@ const struct amd_ip_funcs soc15_common_ip_funcs = {
>   .soft_reset = soc15_common_soft_reset,
>   .set_clockgating_state = soc15_common_set_clockgating_state,
>   .set_powergating_state = soc15_common_set_powergating_state,
> + .get_clockgating_state= soc15_common_get_clockgating_state,
>   };

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 00/13] *** VEGA10 SRIOV PATCHES ***

2017-03-24 Thread Yu, Xiangliang
Reviewed-by: Xiangliang Yu  for the series.


Thanks!
Xiangliang Yu


> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Monk Liu
> Sent: Friday, March 24, 2017 6:38 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk 
> Subject: [PATCH 00/13] *** VEGA10 SRIOV PATCHES ***
> 
> patch 1 is a fix for VI
> patches 2 to 11 are bug fixes for vega10 SRIOV
> patches 12 and 13 are the DMAframe scheme change to fix the CE VM fault after
> world switch.
> 
> 
> Monk Liu (13):
>   drm/amdgpu:imple cond_exec for gfx8
>   drm/amdgpu:enable mcbp for gfx9
>   drm/amdgpu:add KIQ interrupt id
>   drm/amdgpu:virt_init_setting invoke is missed!
>   drm/amdgpu:fix ring init sequence
>   drm/amdgpu:change sequence of SDMA v4 init
>   drm/amdgpu:two fixings for sdma v4 for SRIOV
>   drm/amdgpu:no cg for soc15 of SRIOV
>   drm/amdgpu:fix gmc_v9 vm fault process for SRIOV
>   drm/amdgpu:fix ring_write_multiple
>   drm/amdgpu:fix missing programing critical registers
>   drm/amdgpu:changes in gfx DMAframe scheme
>   drm/amdgpu:change emit_frame_size of gfx8
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  8 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  6 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 77
> +++-
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 57 ++
> -
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 37 ++-
>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c |  9 
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 28 
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  |  9 
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 43 +-
>  drivers/gpu/drm/amd/amdgpu/soc15.c   |  7 +++
>  12 files changed, 206 insertions(+), 81 deletions(-)
> 
> --
> 2.7.4
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 10/13] drm/amdgpu:fix ring_write_multiple

2017-03-24 Thread Monk Liu
ring_write_multiple should use buf_mask instead of ptr_mask

Change-Id: Ia249b6a1a990a6c3cba5c4048de6d604bb91d0ef
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2861aee..284af3a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1804,9 +1804,9 @@ static inline void amdgpu_ring_write_multiple(struct amdgpu_ring *ring, void *src, int count_dw)
 	if (ring->count_dw < count_dw) {
 		DRM_ERROR("amdgpu: writing more dwords to the ring than expected!\n");
 	} else {
-		occupied = ring->wptr & ring->ptr_mask;
+		occupied = ring->wptr & ring->buf_mask;
 		dst = (void *)&ring->ring[occupied];
-		chunk1 = ring->ptr_mask + 1 - occupied;
+		chunk1 = ring->buf_mask + 1 - occupied;
 		chunk1 = (chunk1 >= count_dw) ? count_dw : chunk1;
 		chunk2 = count_dw - chunk1;
 		chunk1 <<= 2;
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 12/13] drm/amdgpu:changes in gfx DMAframe scheme

2017-03-24 Thread Monk Liu
1) Adapt to vulkan:
Now use a double SWITCH_BUFFER to replace the 128-nop w/a,
because since vulkan was introduced, the UMD can insert
7 ~ 16 IBs per submit, so a 256 DW size cannot hold the
whole DMAframe (if we still insert those 128 nops); the CP
team suggests using double SWITCH_BUFFERs instead of the
tricky 128-NOP w/a.

2) To fix the CE VM fault issue when MCBP is introduced:
Need one more COND_EXEC wrapping the IB part (the original
one is used for the VM switch part).

this change fixes a VM fault issue caused by the scenario
below; without this change:

>CE passes the original COND_EXEC (no MCBP issued at this
 moment) and proceeds as normal.

>DE catches up to this COND_EXEC, but this time MCBP is
 issued, thus DE treats all following packages as NOP. The
 following VM switch packages now look just like NOPs to DE,
 so DE doesn't do a VM flush at all.

>Now CE proceeds to the first IBc and triggers a VM fault,
 because DE didn't do a VM flush for this DMAframe.

3) change the estimated alloc size for gfx9:
with the new DMAframe scheme, we need to modify
emit_frame_size for gfx9

with above changes, no more 128 NOPs w/a after VM flush

Change-Id: Ib3f92d9d5a81bfff0369a00f23e1e5891797089a
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 77 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 29 -
 3 files changed, 69 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index d103270..b300929 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -167,9 +167,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		return r;
 	}
 
-	if (ring->funcs->init_cond_exec)
-		patch_offset = amdgpu_ring_init_cond_exec(ring);
-
 	if (vm) {
 		amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE go too fast than DE */
 
@@ -180,7 +177,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		}
 	}
 
-	if (ring->funcs->emit_hdp_flush
+	if (ring->funcs->init_cond_exec)
+		patch_offset = amdgpu_ring_init_cond_exec(ring);
+
+	if (ring->funcs->emit_hdp_flush
 #ifdef CONFIG_X86_64
 	    && !(adev->flags & AMD_IS_APU)
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9ff445c..74be4fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -483,42 +483,59 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job)
 		id->oa_size != job->oa_size);
 	int r;
 
-	if (ring->funcs->emit_pipeline_sync && (
-	    job->vm_needs_flush || gds_switch_needed ||
-	    amdgpu_vm_ring_has_compute_vm_bug(ring)))
-		amdgpu_ring_emit_pipeline_sync(ring);
+	if (job->vm_needs_flush || gds_switch_needed ||
+	    amdgpu_vm_is_gpu_reset(adev, id) ||
+	    amdgpu_vm_ring_has_compute_vm_bug(ring)) {
+		unsigned patch_offset = 0;
 
-	if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||
-	    amdgpu_vm_is_gpu_reset(adev, id))) {
-		struct fence *fence;
-		u64 pd_addr = amdgpu_vm_adjust_mc_addr(adev, job->vm_pd_addr);
+		if (ring->funcs->init_cond_exec)
+			patch_offset = amdgpu_ring_init_cond_exec(ring);
 
-		trace_amdgpu_vm_flush(pd_addr, ring->idx, job->vm_id);
-		amdgpu_ring_emit_vm_flush(ring, job->vm_id, pd_addr);
+		if (ring->funcs->emit_pipeline_sync &&
+		    (job->vm_needs_flush || gds_switch_needed ||
+		     amdgpu_vm_ring_has_compute_vm_bug(ring)))
+			amdgpu_ring_emit_pipeline_sync(ring);
 
-		r = amdgpu_fence_emit(ring, &fence);
-		if (r)
-			return r;
+		if (ring->funcs->emit_vm_flush && (job->vm_needs_flush ||
+		    amdgpu_vm_is_gpu_reset(adev, id))) {
+			struct fence *fence;
+			u64 pd_addr = amdgpu_vm_adjust_mc_addr(adev, job->vm_pd_addr);
 
-		mutex_lock(&adev->vm_manager.lock);
-		fence_put(id->last_flush);
-		id->last_flush = fence;
-		mutex_unlock(&adev->vm_manager.lock);
-	}
+			trace_amdgpu_vm_flush(pd_addr, ring->idx, job->vm_id);
+			amdgpu_ring_emit_vm_flush(ring, job->vm_id, pd_addr);
 
-	if (gds_switch_needed) {
-		id->gds_base = job->gds_base;
-		id->gds_size = job->gds_size;
-		id->gws_base = job->gws_base;
-		id->gws_size = job->gws_size;
-		id->oa_base = job->oa_base;
-		id->oa_size = job->oa_size;
-

[PATCH 05/13] drm/amdgpu:fix ring init sequence

2017-03-24 Thread Monk Liu
ring->buf_mask needs to be set before ring_clear_ring is
invoked; also fix ring_clear_ring, which should use buf_mask
instead of ptr_mask

Change-Id: I7778a7afe27ac2bdedcaba1b0146582100602f9d
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index e619833..10e94d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -235,6 +235,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 	ring->ring_size = roundup_pow_of_two(max_dw * 4 *
 					     amdgpu_sched_hw_submission);
 
+	ring->buf_mask = (ring->ring_size / 4) - 1;
+	ring->ptr_mask = ring->funcs->support_64bit_ptrs ?
+		0xffffffffffffffff : ring->buf_mask;
 	/* Allocate ring buffer */
 	if (ring->ring_obj == NULL) {
 		r = amdgpu_bo_create_kernel(adev, ring->ring_size, PAGE_SIZE,
@@ -248,9 +251,6 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 		}
 		amdgpu_ring_clear_ring(ring);
 	}
-	ring->buf_mask = (ring->ring_size / 4) - 1;
-	ring->ptr_mask = ring->funcs->support_64bit_ptrs ?
-		0xffffffffffffffff : ring->buf_mask;
 
 	ring->max_dw = max_dw;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 9f57eda..5fd3dd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -194,7 +194,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring);
 static inline void amdgpu_ring_clear_ring(struct amdgpu_ring *ring)
 {
 	int i = 0;
-	while (i <= ring->ptr_mask)
+	while (i <= ring->buf_mask)
 		ring->ring[i++] = ring->funcs->nop;
 
 }
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 00/13] *** VEGA10 SRIOV PATCHES ***

2017-03-24 Thread Monk Liu
patch 1 is a fix for VI
patches 2 to 11 are bug fixes for vega10 SRIOV
patches 12 and 13 are the DMAframe scheme change to fix the CE VM fault after world switch.


Monk Liu (13):
  drm/amdgpu:imple cond_exec for gfx8
  drm/amdgpu:enable mcbp for gfx9
  drm/amdgpu:add KIQ interrupt id
  drm/amdgpu:virt_init_setting invoke is missed!
  drm/amdgpu:fix ring init sequence
  drm/amdgpu:change sequence of SDMA v4 init
  drm/amdgpu:two fixings for sdma v4 for SRIOV
  drm/amdgpu:no cg for soc15 of SRIOV
  drm/amdgpu:fix gmc_v9 vm fault process for SRIOV
  drm/amdgpu:fix ring_write_multiple
  drm/amdgpu:fix missing programing critical registers
  drm/amdgpu:changes in gfx DMAframe scheme
  drm/amdgpu:change emit_frame_size of gfx8

 drivers/gpu/drm/amd/amdgpu/amdgpu.h  |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 77 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 57 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 37 ++-
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c |  9 
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 28 
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  |  9 
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 43 +-
 drivers/gpu/drm/amd/amdgpu/soc15.c   |  7 +++
 12 files changed, 206 insertions(+), 81 deletions(-)

-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 04/13] drm/amdgpu:virt_init_setting invoke is missed!

2017-03-24 Thread Monk Liu
this must be invoked during early init

Change-Id: I68726dd36825259913b47493ba1e9c467b368d0c
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7e54d9dc..4ebe94b 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -553,6 +553,10 @@ static int soc15_common_early_init(void *handle)
 	    (amdgpu_ip_block_mask & (1 << AMD_IP_BLOCK_TYPE_PSP)))
 		psp_enabled = true;
 
+	if (amdgpu_sriov_vf(adev)) {
+		amdgpu_virt_init_setting(adev);
+	}
+
 	/*
 	 * nbio need be used for both sdma and gfx9, but only
 	 * initializes once
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 06/13] drm/amdgpu:change sequence of SDMA v4 init

2017-03-24 Thread Monk Liu
must set minor_update.enable before writing a smaller value
to wptr/doorbell, so for SRIOV we need to set that register
bit during the hw_init period.

this fixes the SDMA ring test failure after guest reboot

Change-Id: Id863396788cc5b35550cdcac405131d41690e77a
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 37 +-
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index ee3b4a9..4d9fec8 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -560,8 +560,14 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
 		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_RB_BASE_HI), ring->gpu_addr >> 40);
 
 		ring->wptr = 0;
-		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_RB_WPTR), lower_32_bits(ring->wptr) << 2);
-		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_RB_WPTR_HI), upper_32_bits(ring->wptr) << 2);
+
+		/* before programing wptr to a less value, need set minor_ptr_update first */
+		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_MINOR_PTR_UPDATE), 1);
+
+		if (!amdgpu_sriov_vf(adev)) { /* only bare-metal use register write for wptr */
+			WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_RB_WPTR), lower_32_bits(ring->wptr) << 2);
+			WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_RB_WPTR_HI), upper_32_bits(ring->wptr) << 2);
+		}
 
 		doorbell = RREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_DOORBELL));
 		doorbell_offset = RREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_DOORBELL_OFFSET));
@@ -577,15 +583,23 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
 		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_DOORBELL_OFFSET), doorbell_offset);
 		nbio_v6_1_sdma_doorbell_range(adev, i, ring->use_doorbell, ring->doorbell_index);
 
+		if (amdgpu_sriov_vf(adev))
+			sdma_v4_0_ring_set_wptr(ring);
+
+		/* set minor_ptr_update to 0 after wptr programed */
+		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_GFX_MINOR_PTR_UPDATE), 0);
+
 		/* set utc l1 enable flag always to 1 */
 		temp = RREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_CNTL));
 		temp = REG_SET_FIELD(temp, SDMA0_CNTL, UTC_L1_ENABLE, 1);
 		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_CNTL), temp);
 
-		/* unhalt engine */
-		temp = RREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_F32_CNTL));
-		temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
-		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_F32_CNTL), temp);
+		if (!amdgpu_sriov_vf(adev)) {
+			/* unhalt engine */
+			temp = RREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_F32_CNTL));
+			temp = REG_SET_FIELD(temp, SDMA0_F32_CNTL, HALT, 0);
+			WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_F32_CNTL), temp);
+		}
 
 		/* enable DMA RB */
 		rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_GFX_RB_CNTL, RB_ENABLE, 1);
@@ -601,6 +615,11 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
 
 		ring->ready = true;
 
+		if (amdgpu_sriov_vf(adev)) { /* bare-metal sequence doesn't need below to lines */
+			sdma_v4_0_ctx_switch_enable(adev, true);
+			sdma_v4_0_enable(adev, true);
+		}
+
 		r = amdgpu_ring_test_ring(ring);
 		if (r) {
 			ring->ready = false;
@@ -671,8 +690,6 @@ static int sdma_v4_0_load_microcode(struct amdgpu_device *adev)
 			(adev->sdma.instance[i].fw->data +
 			 le32_to_cpu(hdr->header.ucode_array_offset_bytes));
 
-		sdma_v4_0_print_ucode_regs(adev);
-
 		WREG32(sdma_v4_0_get_reg_offset(i, mmSDMA0_UCODE_ADDR), 0);
 
 
@@ -699,10 +716,10 @@ static int sdma_v4_0_load_microcode(struct amdgpu_device *adev)
  */
 static int sdma_v4_0_start(struct amdgpu_device *adev)
 {
-	int r;
+	int r = 0;
 
 	if (amdgpu_sriov_vf(adev)) {
-		/* disable RB and halt engine */
+		sdma_v4_0_ctx_switch_enable(adev, false);
 		sdma_v4_0_enable(adev, false);
 
 		/* set RB registers */
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 13/13] drm/amdgpu:change emit_frame_size of gfx8

2017-03-24 Thread Monk Liu
and no need to insert 128 nops after the gfx8 vm flush anymore,
because a double SWITCH_BUFFER is now appended to the vm flush

Change-Id: I6ecec95236bd1745f2beaa1b34a075748813f131
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 396c075..3016e535 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6390,8 +6390,6 @@ static void gfx_v8_0_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
/* sync PFP to ME, otherwise we might get invalid PFP reads */
amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
amdgpu_ring_write(ring, 0x0);
-   /* GFX8 emits 128 dw nop to prevent CE access VM before 
vm_flush finish */
-   amdgpu_ring_insert_nop(ring, 128);
}
 }
 
@@ -6791,15 +6789,24 @@ static const struct amdgpu_ring_funcs 
gfx_v8_0_ring_funcs_gfx = {
.get_rptr = gfx_v8_0_ring_get_rptr,
.get_wptr = gfx_v8_0_ring_get_wptr_gfx,
.set_wptr = gfx_v8_0_ring_set_wptr_gfx,
-   .emit_frame_size =
-   20 + /* gfx_v8_0_ring_emit_gds_switch */
-   7 + /* gfx_v8_0_ring_emit_hdp_flush */
-   5 + /* gfx_v8_0_ring_emit_hdp_invalidate */
-   6 + 6 + 6 +/* gfx_v8_0_ring_emit_fence_gfx x3 for user fence, 
vm fence */
-   7 + /* gfx_v8_0_ring_emit_pipeline_sync */
-   128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */
-   2 + /* gfx_v8_ring_emit_sb */
-   3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt 
flush/meta-data */
+   .emit_frame_size = /* maximum 215 dw if 16 IBs are counted in */
+   5 +  /* COND_EXEC */
+   7 +  /* PIPELINE_SYNC */
+   19 + /* VM_FLUSH */
+   8 +  /* FENCE for VM_FLUSH */
+   20 + /* GDS switch */
+   4 + /* double SWITCH_BUFFER,
+  the first COND_EXEC jumps to the place just
+  prior to this double SWITCH_BUFFER  */
+   5 + /* COND_EXEC */
+   7 +  /* HDP_flush */
+   4 +  /* VGT_flush */
+   14 + /* CE_META */
+   31 + /* DE_META */
+   3 + /* CNTX_CTRL */
+   5 + /* HDP_INVL */
+   8 + 8 + /* FENCE x2 */
+   2, /* SWITCH_BUFFER */
.emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */
.emit_ib = gfx_v8_0_ring_emit_ib_gfx,
.emit_fence = gfx_v8_0_ring_emit_fence_gfx,
-- 
2.7.4



[PATCH 11/13] drm/amdgpu:fix missing programing critical registers

2017-03-24 Thread Monk Liu
Those MC_VM registers won't be programmed by the VBIOS in a VF,
so the driver is responsible for programming them.

Change-Id: I817371346d86bd5668ac80a486dadc1605d0b6ca
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 9 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 4 +++-
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 9 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 1ff019c..1d3c34d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -53,6 +53,15 @@ int gfxhub_v1_0_gart_enable(struct amdgpu_device *adev)
mmMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB),
(u32)(value >> 44));
 
+   if (amdgpu_sriov_vf(adev)) {
+   /* MC_VM_FB_LOCATION_BASE/TOP are zero for a VF because they are
+   VF copy registers, so the VBIOS post doesn't program them; for
+   SRIOV the driver needs to program them */
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmMC_VM_FB_LOCATION_BASE),
+   adev->mc.vram_start >> 24);
+   WREG32(SOC15_REG_OFFSET(GC, 0, mmMC_VM_FB_LOCATION_TOP),
+   adev->mc.vram_end >> 24);
+   }
+
/* Disable AGP. */
WREG32(SOC15_REG_OFFSET(GC, 0, mmMC_VM_AGP_BASE), 0);
WREG32(SOC15_REG_OFFSET(GC, 0, mmMC_VM_AGP_TOP), 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 88221bb..d841bc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -383,7 +383,9 @@ static int gmc_v9_0_late_init(void *handle)
 static void gmc_v9_0_vram_gtt_location(struct amdgpu_device *adev,
struct amdgpu_mc *mc)
 {
-   u64 base = mmhub_v1_0_get_fb_location(adev);
+   u64 base = 0;
+   if (!amdgpu_sriov_vf(adev))
+   base = mmhub_v1_0_get_fb_location(adev);
amdgpu_vram_location(adev, >mc, base);
adev->mc.gtt_base_align = 0;
amdgpu_gtt_location(adev, mc);
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index b1e0e6b..12025d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -67,6 +67,15 @@ int mmhub_v1_0_gart_enable(struct amdgpu_device *adev)
mmMC_VM_SYSTEM_APERTURE_DEFAULT_ADDR_MSB),
(u32)(value >> 44));
 
+   if (amdgpu_sriov_vf(adev)) {
+   /* MC_VM_FB_LOCATION_BASE/TOP are zero for a VF because they are
+   VF copy registers, so the VBIOS post doesn't program them; for
+   SRIOV the driver needs to program them */
+   WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMC_VM_FB_LOCATION_BASE),
+   adev->mc.vram_start >> 24);
+   WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMC_VM_FB_LOCATION_TOP),
+   adev->mc.vram_end >> 24);
+   }
+
/* Disable AGP. */
WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMC_VM_AGP_BASE), 0);
WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMC_VM_AGP_TOP), 0);
-- 
2.7.4
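The `>> 24` in the hunks above reflects that MC_VM_FB_LOCATION_BASE/TOP hold addresses with 16 MB (2^24 byte) granularity. A hypothetical helper mirroring that encoding (the name is illustrative, not a real driver function):

```c
#include <stdint.h>

/* MC_VM_FB_LOCATION_BASE/TOP hold the framebuffer range in units of
 * 16 MB, hence the ">> 24" applied to vram_start/vram_end above. */
static uint32_t fb_location_reg(uint64_t addr)
{
        return (uint32_t)(addr >> 24);  /* 2^24 = 16 MB granularity */
}
```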



[PATCH 07/13] drm/amdgpu:two fixings for sdma v4 for SRIOV

2017-03-24 Thread Monk Liu
No hw_fini for SRIOV, otherwise other VFs will be affected.
No CG for SRIOV.

Change-Id: I1b0525eb8d08754b4bd1a6ee6798bf5e41c6bc6b
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 4d9fec8..443f850 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1172,6 +1172,9 @@ static int sdma_v4_0_hw_fini(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
sdma_v4_0_ctx_switch_enable(adev, false);
sdma_v4_0_enable(adev, false);
 
@@ -1406,6 +1409,9 @@ static int sdma_v4_0_set_clockgating_state(void *handle,
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   if (amdgpu_sriov_vf(adev))
+   return 0;
+
switch (adev->asic_type) {
case CHIP_VEGA10:
sdma_v4_0_update_medium_grain_clock_gating(adev,
-- 
2.7.4



[PATCH 02/13] drm/amdgpu:enable mcbp for gfx9

2017-03-24 Thread Monk Liu
Set bit 21 of the IB control field to actually enable
MCBP for SRIOV.

Change-Id: Ie5126d5be95e037087cf7167c28c61975f40d784
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index ad82ab7..0d8fb51 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3016,6 +3016,9 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring 
*ring,
 
 control |= ib->length_dw | (vm_id << 24);
 
+   if (amdgpu_sriov_vf(ring->adev) && (ib->flags & 
AMDGPU_IB_FLAG_PREEMPT))
+   control |= (1 << 21);
+
 amdgpu_ring_write(ring, header);
BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
 amdgpu_ring_write(ring,
-- 
2.7.4
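The control-dword assembly in the patch above can be sketched as follows; this is a simplified model of gfx_v9_0_ring_emit_ib_gfx()'s logic, not the full function, and the PREEMPT_ENABLE_BIT name is made up for illustration.

```c
#include <stdint.h>

#define PREEMPT_ENABLE_BIT 21  /* bit 21 of the INDIRECT_BUFFER control word */

/* Sketch: length in the low bits, vm_id at bit 24, and the MCBP
 * (preemption) enable at bit 21 when running as an SRIOV VF with a
 * preemptible IB. */
static uint32_t ib_control(uint32_t length_dw, uint32_t vm_id, int preempt)
{
        uint32_t control = length_dw | (vm_id << 24);

        if (preempt)
                control |= 1u << PREEMPT_ENABLE_BIT;
        return control;
}
```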



Re: [PATCH] Revert "drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first"

2017-03-24 Thread Michel Dänzer
On 24/03/17 06:50 PM, Julien Isorce wrote:
> Hi Michel,
> 
> (Just for other readers my reply has been delayed on the mailing lists
> and should have been on second position)

It is on https://patchwork.freedesktop.org/patch/145731/ , did you mean
something else?

The delay was because you weren't subscribed to the amd-gfx mailing list
yet, so your post went through the moderation queue.


> I will have a go with that change and let you know. I do not remember if
> I tried it for this soft lockup. But for sure it does not solve the hard
> lockup that Zach also mentioned at the end of his reply.

I'll follow up to his post about that.


> But in general, isn't "radeon_lockup_timeout" supposed to detect this
> situation ?

No, it's for detecting GPU hangs, whereas this is a CPU "hang" (infinite
loop).
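For readers following the thread: the infinite eviction loop comes from an indexing mistake in radeon_evict_flags() (clamping placements[0].fpfn instead of placements[i].fpfn), fixed by the diff quoted later in this thread. A simplified model of the corrected loop, with illustrative types rather than the real TTM/radeon structures:

```c
/* Minimal model of the fix: clamp the fpfn of *each* VRAM placement,
 * so the CPU-visible VRAM placement can no longer keep fpfn == 0 and
 * accept the BO back into visible VRAM forever. */
struct fake_placement { unsigned fpfn; int is_vram; };

static void clamp_vram_fpfn(struct fake_placement *p, int n, unsigned fpfn)
{
        int i;

        for (i = 0; i < n; i++)
                if (p[i].is_vram && p[i].fpfn < fpfn)
                        p[i].fpfn = fpfn;  /* fixed: index i, not 0 */
}
```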


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH] Revert "drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first"

2017-03-24 Thread Julien Isorce
Hi Michel,

(Just for other readers my reply has been delayed on the mailing lists and
should have been on second position)

We have actually spotted this /0/i/ but somehow I convinced myself it was
intentional. The reason I found was that you wanted to set the fpfn only if
there are 2 placements, which means it will try to move from accessible to
inaccessible.

I will have a go with that change and let you know. I do not remember if I
tried it for this soft lockup. But for sure it does not solve the hard
lockup that Zach also mentioned at the end of his reply. I am saying that
because this other issue has some similarities (same ioctl call).

But in general, isn't "radeon_lockup_timeout" supposed to detect this
situation ?

Thx
Julien


On 24 March 2017 at 09:24, Michel Dänzer  wrote:

> On 23/03/17 06:26 PM, Julien Isorce wrote:
> > Hi Michel,
> >
> > When it happens, the main thread of our gl based app is stuck on a
> > ioctl(RADEON_CS). I set RADEON_THREAD=false to ease the debugging but
> > same thing happens if true. Other threads are only si_shader:0,1,2,3 and
> > are doing nothing, just waiting for jobs. I can also do sudo gdb -p
> > $(pidof Xorg) to block the X11 server, to make sure there is no ping
> > pong between 2 processes. All other processes are not loading
> > dri/radeonsi_dri.so . And adding a few traces shows that the above ioctl
> > call is looping for ever on
> > https://github.com/torvalds/linux/blob/master/drivers/gpu/
> drm/ttm/ttm_bo.c#L819
> >  drm/ttm/ttm_bo.c#L819> and
> > comes from
> > mesa https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/
> winsys/radeon/drm/radeon_drm_cs.c#n454
> > .
> >
> > After adding even more traces I can see that the bo, which is being
> > indefinitely evicted, has the flag RADEON_GEM_NO_CPU_ACCESS.
> > And it gets 3 potential placements after calling "radeon_evict_flags".
> >  1: VRAM cpu inaccessible, fpfn is 65536
> >  2: VRAM cpu accessible, fpfn is 0
> >  3: GTT, fpfn is 0
> >
> > And it looks like it continuously succeeds to move on the second
> > placement. So I might be wrong but it looks it is not even a ping pong
> > between VRAM accessible / not accessible, it just keeps being blited in
> > the CPU accessible part of the VRAM.
>
> Thanks for the detailed description! AFAICT this can only happen due to
> a silly mistake I made in this code. Does this fix it?
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/
> radeon_ttm.c
> index 5c7cf644ba1d..37d68cd1f272 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -213,8 +213,8 @@ static void radeon_evict_flags(struct
> ttm_buffer_object *bo,
> rbo->placement.num_busy_placement = 0;
> for (i = 0; i < rbo->placement.num_placement; i++)
> {
> if (rbo->placements[i].flags &
> TTM_PL_FLAG_VRAM) {
> -   if (rbo->placements[0].fpfn < fpfn)
> -   rbo->placements[0].fpfn =
> fpfn;
> +   if (rbo->placements[i].fpfn < fpfn)
> +   rbo->placements[i].fpfn =
> fpfn;
> } else {
> rbo->placement.busy_placement =
> >placements[i];
>
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>


Re: [PATCH 00/18] *** multiple level VMPT enablement ***

2017-03-24 Thread Christian König
Patches #11 and #18 are Reviewed-by: Christian König 
.


Patch #12 is a NAK; that would also increase the VM space for pre-gfx9
ASICs, and we already found that this isn't a good idea.


We should change how that value is evaluated in the different GMC 
handling code instead.


In other words, set it to -1 as the default value and handle -1 in gmc
v6/v7/v8 as 64GB, while in gmc v9 as 1TB (or even 256TB?).
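The suggestion above can be sketched as follows; the -1 sentinel and the 64GB/1TB per-GMC defaults are the reviewer's proposal here, not committed code, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Sketch: keep the module parameter at -1 by default and let each GMC
 * version substitute its own default VM size (values in GB). */
static int64_t resolve_vm_size(int64_t amdgpu_vm_size, int is_gmc_v9)
{
        if (amdgpu_vm_size == -1)
                return is_gmc_v9 ? 1024 : 64;  /* 1 TB vs 64 GB */
        return amdgpu_vm_size;
}
```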


Patch #13, #14: Good catch, but please squash into my original patch #8.

Patch #15, #16, #17: Please squash into patch #10.

Apart from that thanks a lot to pick this up while I was busy tracking 
the hw problem down.


And BTW, did you test the 48-bit (256TB) address space support as well?
That should also work now.


Regards,
Christian.

Am 24.03.2017 um 04:16 schrieb Chunming Zhou:

*** BLURB HERE ***
From Vega onwards, ASICs support multi-level VMPT; this series
implements it.

Tested successfully with 2/3/4 levels.

Christian König (10):
   drm/amdgpu: rename page_directory_fence to last_dir_update
   drm/amdgpu: add the VM pointer to the amdgpu_pte_update_params as well
   drm/amdgpu: add num_level to the VM manager
   drm/amdgpu: generalize page table level
   drm/amdgpu: handle multi level PD size calculation
   drm/amdgpu: handle multi level PD during validation
   drm/amdgpu: handle multi level PD in the LRU
   drm/amdgpu: handle multi level PD updates
   drm/amdgpu: handle multi level PD during PT updates
   drm/amdgpu: add alloc/free for multi level PDs

Chunming Zhou (8):
   drm/amdgpu: set page table depth by num_level
   drm/amdgpu: block size of multiple level vmpt prefers one page
   drm/amdgpu: fix update sub levels
   drm/amdgpu: sub levels need to update regardless of parent updates
   drm/amdgpu: clear entries allocation
   drm/amdgpu: fix entries index calculation
   drm/amdgpu: need alloc sub level even parent bo was allocated
   drm/amdgpu: enable four level vmpt

  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |   6 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |   4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c  |   2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 474 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   |  16 +-
  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c |   3 +-
  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c|   1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c|   1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c|   1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|   1 +
  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  |   2 +-
  11 files changed, 336 insertions(+), 175 deletions(-)





Re: [PATCH v2 1/2] drm/amdgpu: add optional fence out-parameter to amdgpu_vm_clear_freed

2017-03-24 Thread Christian König

Am 23.03.2017 um 20:27 schrieb Nicolai Hähnle:

From: Nicolai Hähnle 

We will add the fence to freed buffer objects in a later commit, to ensure
that the underlying memory can only be re-used after all references in
page tables have been cleared.

Signed-off-by: Nicolai Hähnle 


Reviewed-by: Christian König  for both.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 21 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  3 ++-
  4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 55d553a..85e6070 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -778,21 +778,21 @@ static int amdgpu_bo_vm_update_pte(struct 
amdgpu_cs_parser *p)
int i, r;
  
  	r = amdgpu_vm_update_page_directory(adev, vm);

if (r)
return r;
  
  	r = amdgpu_sync_fence(adev, >job->sync, vm->page_directory_fence);

if (r)
return r;
  
-	r = amdgpu_vm_clear_freed(adev, vm);

+   r = amdgpu_vm_clear_freed(adev, vm, NULL);
if (r)
return r;
  
  	r = amdgpu_vm_bo_update(adev, fpriv->prt_va, false);

if (r)
return r;
  
  	r = amdgpu_sync_fence(adev, >job->sync,

  fpriv->prt_va->last_pt_update);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index be9fb2c..4a53c43 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -535,21 +535,21 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device 
*adev,
  
  	r = amdgpu_vm_validate_pt_bos(adev, vm, amdgpu_gem_va_check,

  NULL);
if (r)
goto error;
  
  	r = amdgpu_vm_update_page_directory(adev, vm);

if (r)
goto error;
  
-	r = amdgpu_vm_clear_freed(adev, vm);

+   r = amdgpu_vm_clear_freed(adev, vm, NULL);
if (r)
goto error;
  
  	if (operation == AMDGPU_VA_OP_MAP ||

operation == AMDGPU_VA_OP_REPLACE)
r = amdgpu_vm_bo_update(adev, bo_va, false);
  
  error:

if (r && r != -ERESTARTSYS)
DRM_ERROR("Couldn't update BO_VA (%d)\n", r);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index dd7df45..2c95a75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1397,48 +1397,57 @@ static void amdgpu_vm_prt_fini(struct amdgpu_device 
*adev, struct amdgpu_vm *vm)
}
  
  	kfree(shared);

  }
  
  /**

   * amdgpu_vm_clear_freed - clear freed BOs in the PT
   *
   * @adev: amdgpu_device pointer
   * @vm: requested vm
+ * @fence: optional resulting fence (unchanged if no work needed to be done
+ * or if an error occurred)
   *
   * Make sure all freed BOs are cleared in the PT.
   * Returns 0 for success.
   *
   * PTs have to be reserved and mutex must be locked!
   */
  int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
- struct amdgpu_vm *vm)
+ struct amdgpu_vm *vm,
+ struct fence **fence)
  {
struct amdgpu_bo_va_mapping *mapping;
-   struct fence *fence = NULL;
+   struct fence *f = NULL;
int r;
  
  	while (!list_empty(>freed)) {

mapping = list_first_entry(>freed,
struct amdgpu_bo_va_mapping, list);
list_del(>list);
  
  		r = amdgpu_vm_bo_split_mapping(adev, NULL, 0, NULL, vm, mapping,

-  0, 0, );
-   amdgpu_vm_free_mapping(adev, vm, mapping, fence);
+  0, 0, );
+   amdgpu_vm_free_mapping(adev, vm, mapping, f);
if (r) {
-   fence_put(fence);
+   fence_put(f);
return r;
}
+   }
  
+	if (fence && f) {

+   fence_put(*fence);
+   *fence = f;
+   } else {
+   fence_put(f);
}
-   fence_put(fence);
+
return 0;
  
  }
  
  /**

   * amdgpu_vm_clear_invalids - clear invalidated BOs in the PT
   *
   * @adev: amdgpu_device pointer
   * @vm: requested vm
   *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ff10fa5..9d5a572 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -187,21 +187,22 @@ int amdgpu_vm_alloc_pts(struct amdgpu_device *adev,
struct amdgpu_vm *vm,
uint64_t saddr, uint64_t size);
  int 

[PATCH 5/6] drm/amdgpu: add get_clockgating callback for mmhub v1

2017-03-24 Thread Huang Rui
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index b1e0e6b..68e5f7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -552,6 +552,22 @@ static int mmhub_v1_0_set_clockgating_state(void *handle,
return 0;
 }
 
+static void mmhub_v1_0_get_clockgating_state(void *handle, u32 *flags)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   int data;
+
+   /* AMD_CG_SUPPORT_MC_MGCG */
+   data = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATHUB_MISC_CNTL));
+   if (data & ATHUB_MISC_CNTL__CG_ENABLE_MASK)
+   *flags |= AMD_CG_SUPPORT_MC_MGCG;
+
+   /* AMD_CG_SUPPORT_MC_LS */
+   data = RREG32(SOC15_REG_OFFSET(MMHUB, 0, mmATC_L2_MISC_CG));
+   if (data & ATC_L2_MISC_CG__MEM_LS_ENABLE_MASK)
+   *flags |= AMD_CG_SUPPORT_MC_LS;
+}
+
 static int mmhub_v1_0_set_powergating_state(void *handle,
enum amd_powergating_state state)
 {
@@ -573,6 +589,7 @@ const struct amd_ip_funcs mmhub_v1_0_ip_funcs = {
.soft_reset = mmhub_v1_0_soft_reset,
.set_clockgating_state = mmhub_v1_0_set_clockgating_state,
.set_powergating_state = mmhub_v1_0_set_powergating_state,
+   .get_clockgating_state = mmhub_v1_0_get_clockgating_state,
 };
 
 const struct amdgpu_ip_block_version mmhub_v1_0_ip_block =
-- 
2.7.4



[PATCH 0/6] drm/amdgpu: add get clockgating functions for new asic

2017-03-24 Thread Huang Rui
Hi all,

This patch set adds get_clockgating functions, after that, we can use
debugfs pm to check the dynamic clockgating status.

Thanks,
Rui

Huang Rui (6):
  drm/amdgpu: add get_clockgating callback for gfx v9
  drm/amdgpu: add get_clockgating callback for nbio v6.1
  drm/amdgpu: add get_clockgating callback for soc15
  drm/amdgpu: add get_clockgating for sdma v4
  drm/amdgpu: add get_clockgating callback for mmhub v1
  drm/amdgpu: fix to remove HDP MGCG on soc15

 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c  |  6 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 43 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 17 +
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c  | 15 
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c  | 17 +
 drivers/gpu/drm/amd/amdgpu/soc15.c  | 35 ++-
 7 files changed, 133 insertions(+), 1 deletion(-)

-- 
2.7.4
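The pattern shared by all six patches in this series: each IP block's get_clockgating_state() callback ORs AMD_CG_SUPPORT_* bits into *flags based on register reads, and the debugfs pm code walks a name table to print the enabled features. A sketch of that accumulation, using placeholder bit values rather than the real AMD_CG_SUPPORT_* definitions:

```c
#include <stdint.h>

#define CG_MC_MGCG  (1u << 0)  /* placeholder, not the real bit value */
#define CG_MC_LS    (1u << 1)  /* placeholder, not the real bit value */

/* Sketch: each callback only ever ORs bits in, so flags from several
 * IP blocks accumulate into one bitmask for debugfs to report. */
static void fake_get_clockgating_state(int mgcg_on, int ls_on,
                                       uint32_t *flags)
{
        if (mgcg_on)
                *flags |= CG_MC_MGCG;
        if (ls_on)
                *flags |= CG_MC_LS;
}
```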



[PATCH 4/6] drm/amdgpu: add get_clockgating for sdma v4

2017-03-24 Thread Huang Rui
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index 7347326..df4b1d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -1389,6 +1389,22 @@ static int sdma_v4_0_set_powergating_state(void *handle,
return 0;
 }
 
+static void sdma_v4_0_get_clockgating_state(void *handle, u32 *flags)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   int data;
+
+   /* AMD_CG_SUPPORT_SDMA_MGCG */
+   data = RREG32(SOC15_REG_OFFSET(SDMA0, 0, mmSDMA0_CLK_CTRL));
+   if (!(data & SDMA0_CLK_CTRL__SOFT_OVERRIDE7_MASK))
+   *flags |= AMD_CG_SUPPORT_SDMA_MGCG;
+
+   /* AMD_CG_SUPPORT_SDMA_LS */
+   data = RREG32(SOC15_REG_OFFSET(SDMA0, 0, mmSDMA0_POWER_CNTL));
+   if (data & SDMA0_POWER_CNTL__MEM_POWER_OVERRIDE_MASK)
+   *flags |= AMD_CG_SUPPORT_SDMA_LS;
+}
+
 const struct amd_ip_funcs sdma_v4_0_ip_funcs = {
.name = "sdma_v4_0",
.early_init = sdma_v4_0_early_init,
@@ -1404,6 +1420,7 @@ const struct amd_ip_funcs sdma_v4_0_ip_funcs = {
.soft_reset = sdma_v4_0_soft_reset,
.set_clockgating_state = sdma_v4_0_set_clockgating_state,
.set_powergating_state = sdma_v4_0_set_powergating_state,
+   .get_clockgating_state = sdma_v4_0_get_clockgating_state,
 };
 
 static const struct amdgpu_ring_funcs sdma_v4_0_ring_funcs = {
-- 
2.7.4



[PATCH 2/6] drm/amdgpu: add get_clockgating callback for nbio v6.1

2017-03-24 Thread Huang Rui
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c |  1 +
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c | 15 +++
 drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h |  1 +
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index 2c170f1..743a852 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -49,6 +49,7 @@ static const struct cg_flag_name clocks[] = {
{AMD_CG_SUPPORT_MC_MGCG, "Memory Controller Medium Grain Clock Gating"},
{AMD_CG_SUPPORT_SDMA_LS, "System Direct Memory Access Light Sleep"},
{AMD_CG_SUPPORT_SDMA_MGCG, "System Direct Memory Access Medium Grain 
Clock Gating"},
+   {AMD_CG_SUPPORT_BIF_MGCG, "Bus Interface Medium Grain Clock Gating"},
{AMD_CG_SUPPORT_BIF_LS, "Bus Interface Light Sleep"},
{AMD_CG_SUPPORT_UVD_MGCG, "Unified Video Decoder Medium Grain Clock 
Gating"},
{AMD_CG_SUPPORT_VCE_MGCG, "Video Compression Engine Medium Grain Clock 
Gating"},
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
index f517e9a..c0945e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.c
@@ -206,6 +206,21 @@ void nbio_v6_1_update_medium_grain_light_sleep(struct 
amdgpu_device *adev,
WREG32_PCIE(smnPCIE_CNTL2, data);
 }
 
+void nbio_v6_1_get_clockgating_state(struct amdgpu_device *adev, u32 *flags)
+{
+   int data;
+
+   /* AMD_CG_SUPPORT_BIF_MGCG */
+   data = RREG32_PCIE(smnCPM_CONTROL);
+   if (data & CPM_CONTROL__LCLK_DYN_GATE_ENABLE_MASK)
+   *flags |= AMD_CG_SUPPORT_BIF_MGCG;
+
+   /* AMD_CG_SUPPORT_BIF_LS */
+   data = RREG32_PCIE(smnPCIE_CNTL2);
+   if (data & PCIE_CNTL2__SLV_MEM_LS_EN_MASK)
+   *flags |= AMD_CG_SUPPORT_BIF_LS;
+}
+
 struct nbio_hdp_flush_reg nbio_v6_1_hdp_flush_reg;
 struct nbio_pcie_index_data nbio_v6_1_pcie_index_data;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h 
b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h
index a778d1c..a7e6f39 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v6_1.h
@@ -48,5 +48,6 @@ void nbio_v6_1_ih_control(struct amdgpu_device *adev);
 u32 nbio_v6_1_get_rev_id(struct amdgpu_device *adev);
 void nbio_v6_1_update_medium_grain_clock_gating(struct amdgpu_device *adev, 
bool enable);
 void nbio_v6_1_update_medium_grain_light_sleep(struct amdgpu_device *adev, 
bool enable);
+void nbio_v6_1_get_clockgating_state(struct amdgpu_device *adev, u32 *flags);
 
 #endif
-- 
2.7.4
