[PATCH 2/2] drm/amdgpu: remove unnecessary judgement in sdma reg offset calculation

2022-09-29 Thread Yifan Zhang
Clean up the sdma_v6_0_get_reg_offset function.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index db51230163c5..b2c71f533e93 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -67,12 +67,10 @@ static u32 sdma_v6_0_get_reg_offset(struct amdgpu_device 
*adev, u32 instance, u3
if (internal_offset >= SDMA0_HYP_DEC_REG_START &&
internal_offset <= SDMA0_HYP_DEC_REG_END) {
base = adev->reg_offset[GC_HWIP][0][1];
-   if (instance != 0)
-   internal_offset += SDMA1_HYP_DEC_REG_OFFSET * instance;
+   internal_offset += SDMA1_HYP_DEC_REG_OFFSET * instance;
} else {
base = adev->reg_offset[GC_HWIP][0][0];
-   if (instance == 1)
-   internal_offset += SDMA1_REG_OFFSET;
+   internal_offset += SDMA1_REG_OFFSET * instance;
}
 
return base + internal_offset;
-- 
2.37.3
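
For reference, here is sdma_v6_0_get_reg_offset as it reads after the change,
reconstructed from the diff above; multiplying the per-instance stride by
instance makes the instance-0 special cases unnecessary, since the product is
simply zero for instance 0:

    static u32 sdma_v6_0_get_reg_offset(struct amdgpu_device *adev,
                                        u32 instance, u32 internal_offset)
    {
            u32 base;

            if (internal_offset >= SDMA0_HYP_DEC_REG_START &&
                internal_offset <= SDMA0_HYP_DEC_REG_END) {
                    base = adev->reg_offset[GC_HWIP][0][1];
                    internal_offset += SDMA1_HYP_DEC_REG_OFFSET * instance;
            } else {
                    base = adev->reg_offset[GC_HWIP][0][0];
                    internal_offset += SDMA1_REG_OFFSET * instance;
            }

            return base + internal_offset;
    }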



[PATCH 1/2] drm/amdkfd: correct RB_SIZE in SDMA0_QUEUE0_RB_CNTL

2022-09-29 Thread Yifan Zhang
In SDMA0_QUEUE0_RB_CNTL, queue size is 2^RB_SIZE, not 2^(RB_SIZE + 1).

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
index d3e2b6a599a4..03699a9ad3d9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
@@ -329,7 +329,7 @@ static void update_mqd_sdma(struct mqd_manager *mm, void 
*mqd,
struct v10_sdma_mqd *m;
 
m = get_sdma_mqd(mqd);
-   m->sdmax_rlcx_rb_cntl = (ffs(q->queue_size / sizeof(unsigned int)) - 1)
+   m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
<< SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index 26b53b6d673e..451fcb9bb051 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
@@ -329,7 +329,7 @@ static void update_mqd_sdma(struct mqd_manager *mm, void 
*mqd,
struct v11_sdma_mqd *m;
 
m = get_sdma_mqd(mqd);
-   m->sdmax_rlcx_rb_cntl = (ffs(q->queue_size / sizeof(unsigned int)) - 1)
+   m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
<< SDMA0_QUEUE0_RB_CNTL__RB_SIZE__SHIFT |
q->vmid << SDMA0_QUEUE0_RB_CNTL__RB_VMID__SHIFT |
1 << SDMA0_QUEUE0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
-- 
2.37.3
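
As a worked example of the encoding being fixed (a standalone sketch;
order_base_2() is re-implemented here so the snippet builds in userspace):

    #include <stdio.h>

    /* Smallest order such that 2^order >= n, like the kernel helper. */
    static unsigned int order_base_2(unsigned int n)
    {
            unsigned int order = 0;

            while ((1u << order) < n)
                    order++;
            return order;
    }

    int main(void)
    {
            unsigned int queue_size = 4096; /* ring size in bytes */
            unsigned int rb_size = order_base_2(queue_size / 4);

            /* 4096 bytes = 1024 dwords, so RB_SIZE = 10 and 2^10 = 1024. */
            printf("RB_SIZE = %u -> %u dwords\n", rb_size, 1u << rb_size);
            return 0;
    }

The hardware reads the ring as 2^RB_SIZE dwords, so the field must hold log2
of the size in dwords with no extra +1.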



Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Lazar, Lijo




On 9/30/2022 12:02 AM, Alex Deucher wrote:

On Thu, Sep 29, 2022 at 10:14 AM Lazar, Lijo  wrote:




On 9/29/2022 7:30 PM, Sharma, Shashank wrote:



On 9/29/2022 3:37 PM, Lazar, Lijo wrote:

To be clear your understanding -

Nothing is automatic in PMFW. PMFW picks a priority based on the
actual mask sent by driver.

Assuming lower bits corresponds to highest priority -

If driver sends a mask with Bit3 and Bit0 set, PMFW will choose the
profile that corresponds to Bit0. If driver sends a mask with Bit4 and
Bit2 set and the rest unset, PMFW will choose the profile that
corresponds to Bit2. However, if driver sends a mask with only a single
bit set, it chooses that profile regardless of whatever the previous
profile was. It doesn't check if the existing profile > newly requested
one. That is the behavior.

So if a job chooses a profile that corresponds to Bit0, the driver
will send that. Next time, if another job chooses a profile that
corresponds to Bit1, PMFW will receive that as the new profile and
switch to it. It trusts the driver to send the proper workload mask.

Hope that gives the picture.
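
A tiny model of the selection rule just described, assuming lower bits map to
higher priority (standalone C; ffs() returns the 1-based index of the lowest
set bit):

    #include <stdio.h>
    #include <strings.h> /* ffs() */

    int main(void)
    {
            unsigned int mask1 = (1u << 3) | (1u << 0); /* Bit3 + Bit0 */
            unsigned int mask2 = (1u << 4) | (1u << 2); /* Bit4 + Bit2 */

            printf("mask1 -> Bit%d\n", ffs(mask1) - 1); /* Bit0 wins */
            printf("mask2 -> Bit%d\n", ffs(mask2) - 1); /* Bit2 wins */
            return 0;
    }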




Thanks, my understanding is also similar, referring to the core power
switch profile function here:
amd_powerplay.c::pp_dpm_switch_power_profile()
*snip code*
hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
index = fls(hwmgr->workload_mask);
index = index <= Workload_Policy_Max ? index - 1 : 0;
workload = hwmgr->workload_setting[index];
*snip_code*
hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);

Here I can see that the new workload mask is appended into the existing
workload mask (not overwritten). So if we keep sending new
workload_modes, they would be appended into the workload flags and
finally the PM will pick the most aggressive one of all these flags, as
per its policy.
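
A standalone model of the snippet quoted above (the priority values are made
up for the example): each request ORs its priority bit into workload_mask,
and fls() then selects the highest set bit, i.e. the most aggressive profile
requested so far.

    #include <stdio.h>

    /* 1-based index of the highest set bit, 0 if x == 0 (like fls()). */
    static int fls_model(unsigned int x)
    {
            int r = 0;

            while (x) {
                    r++;
                    x >>= 1;
            }
            return r;
    }

    int main(void)
    {
            unsigned int workload_mask = 0;

            workload_mask |= 1u << 3; /* hypothetical VIDEO priority */
            workload_mask |= 1u << 5; /* hypothetical COMPUTE priority */

            printf("selected index: %d\n", fls_model(workload_mask) - 1); /* 5 */
            return 0;
    }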



Actually it's misleading -

The path for sienna is -
set_power_profile_mode -> sienna_cichlid_set_power_profile_mode


This code here picks one based on a lookup table.

workload_type = smu_cmn_to_asic_specific_index(smu,
                                               CMN2ASIC_MAPPING_WORKLOAD,
                                               smu->power_profile_mode);

This is that lookup table -

static struct cmn2asic_mapping
sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = {
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT,
WORKLOAD_PPLIB_DEFAULT_BIT),
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D,
WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT),
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING,
WORKLOAD_PPLIB_POWER_SAVING_BIT),
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO,
WORKLOAD_PPLIB_VIDEO_BIT),
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR,
WORKLOAD_PPLIB_VR_BIT),
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE,
WORKLOAD_PPLIB_COMPUTE_BIT),
  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM,
WORKLOAD_PPLIB_CUSTOM_BIT),
};


And this is the place of interaction with PMFW. (1 << workload_type) is
the mask being sent.

smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
                                1 << workload_type, NULL);

In the end, driver implementation expects only one bit to be set.


Shashank and I had a discussion about this today.  I think there are a
few things we can do to handle this better:

1. Set a flag so that if the user changes the default via sysfs, that
overrides any runtime setting via an application, since presumably that
is what the user wants, and we won't change the hint at runtime.
2. Drop the GET API.  There's no need for this, the hint is just a hint.


Double-checked again based on Felix's comments on the API definition. The
driver decides the priority instead of the FW. That way we can still keep
the Get API.



3. Have the driver arbitrate between the available workload profiles
based on the numeric value of the hint (e.g., default < 3D < video <
VR < compute) as the higher values are more aggressive in most cases.
If requests come in for 3D and compute at the same time, the driver
will select compute because its value is highest.  Each hint type
would be ref counted so we'd know what state to be in every time we go
to set the state.  If all of the clients requesting compute go away,
and only 3D requestors remain, we can switch to 3D.  If all refcounts
go to 0, we go back to default.  This will not require any change to
the current workload API in the SMU code.
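
A minimal sketch of this per-hint refcount arbitration; the enum, names and
ordering below are assumptions for illustration, not the real driver
interface:

    enum workload_hint {
            HINT_DEFAULT,
            HINT_3D,
            HINT_VIDEO,
            HINT_VR,
            HINT_COMPUTE,
            HINT_COUNT
    };

    static unsigned int hint_refcount[HINT_COUNT];

    /* Highest-valued hint with a nonzero count wins; default otherwise. */
    static enum workload_hint effective_hint(void)
    {
            int i;

            for (i = HINT_COUNT - 1; i > HINT_DEFAULT; i--)
                    if (hint_refcount[i])
                            return i;
            return HINT_DEFAULT;
    }

    static void hint_get(enum workload_hint h)
    {
            hint_refcount[h]++;
            /* reprogram the SMU with effective_hint() here */
    }

    static void hint_put(enum workload_hint h)
    {
            if (hint_refcount[h])
                    hint_refcount[h]--;
            /* reprogram the SMU with effective_hint() here */
    }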


Since the PM layer decides the priority, the refcount can be kept at the
powerplay and swsmu layers instead of in any higher-level API.


The user API may keep something like req_power_profile (for any
logging/debug purpose) to record the job preference.


Thanks,
Lijo



Alex



Thanks,
Lijo


Now, when we have a single workload:
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job goes to reset function, ref_count = 0,
-> Power profile resets

Now, we have conflicts only when we see multiple workloads (Job1 and Job 2)
-> Job1: requests profile P1 via UAPI, ref count =

Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Lazar, Lijo




On 9/29/2022 11:37 PM, Felix Kuehling wrote:

On 2022-09-29 07:10, Lazar, Lijo wrote:



On 9/29/2022 2:18 PM, Sharma, Shashank wrote:



On 9/28/2022 11:51 PM, Alex Deucher wrote:

On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:




On 9/27/2022 10:40 PM, Alex Deucher wrote:

On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
 wrote:




On 9/27/2022 5:23 PM, Felix Kuehling wrote:

Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:

Hello Felix,

Thank for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:

Am 2022-09-27 um 02:12 schrieb Christian König:

Am 26.09.22 um 23:40 schrieb Shashank Sharma:

This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 


Feel free to add my acked-by, but Felix should probably take 
a look

as well.


This looks OK purely from a compute perspective. But I'm concerned
about the interaction of compute with graphics or multiple 
graphics
contexts submitting work concurrently. They would constantly 
override

or disable each other's workload hints.

For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be 
different
processes belonging to different users. Say, KFD enables the 
compute
profile first. Then the graphics context submits a job. At the 
start
of the job, the compute profile is enabled. That's a no-op 
because
KFD already enabled the compute profile. When the job 
finishes, it

disables the compute profile for everyone, including KFD. That's
unexpected.



In this case, it will not disable the compute profile, as the
reference counter will not be zero. The reset_profile() will 
only act

if the reference counter is 0.


OK, I missed the reference counter.




But I would be happy to get any inputs about a policy which can be
more sustainable and gives better outputs, for example:
- should we not allow a profile change if a PP mode is already
applied, and keep it on an early-bird basis?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as job A
has not finished yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also runs
in PP 3D
- Job B finishes, but does not reset PP as reference count is not zero
due to compute
- Job A finishes, profile reset to NONE


I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute 
profile

at the same time. I think you'll need one refcount per profile.

Regards,
 Felix


Thanks, this is exactly what I was looking for. I think Alex's initial
idea was around it, but I was under the assumption that there is only
one HW profile in SMU which keeps on getting overwritten. This can solve
our problems, as I can create an array of reference counters, and will
disable only the profile whose reference counter goes to 0.


It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you 
can

specify multiple workload types at the same time and the SMU will
arbitrate between them internally.  E.g., the most aggressive one 
will

be selected out of the ones specified.  I think in the driver we just
set one bit at a time using the current interface.  It might be 
better

to change the interface and just ref count the hint types and then
when we call the set function look at the ref counts for each hint
type and set the mask as appropriate.

Alex



Hey Alex,
Thanks for your comment. If that is the case, this current patch series
works straightforwardly, and no changes would be required. Please let me
know if my understanding is correct:

Assumption: Order of aggression: 3D > Media > Compute

- Job 1: requests mode compute: PP changed to compute, ref count 1
- Job 2: requests mode media: PP changed to media, ref count 2
- Job 3: requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref > 0,
PP still 3D
- Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref > 0,
PP still 3D
- Job 2 finishes, downs ref count to 0, PP changed to NONE

In this way, every job will be operating in the power profile of the
desired aggression or higher, and this API guarantees execution at least
in the desired power profile.


I'm not entirely sure on the relative levels of aggression, but I
believe the SMU prioritizes them by index.  E.g.
#define WORKLOAD_PPLIB_DEFAULT_BIT    0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT  3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT    5
#define WORKLOAD_PPLIB_CU

[PATCH v8] drm/sched: Add FIFO sched policy to run queue

2022-09-29 Thread Luben Tuikov
From: Andrey Grodzovsky 

When many entities are competing for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed with GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within an entity, some
jobs could be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy to entities in the run queue; choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
will start executing earlier, regardless of the length of the
entity's job queue.

v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.

v3:
Various cosmetic fixes and minor refactoring of the fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)

v6: Add missing drm_sched_rq_remove_fifo_locked

v7: Fix ts sampling bug and more cosmetic stuff (Luben)

v8: Fix module parameter string (Luben)

Cc: Luben Tuikov 
Cc: Christian König 
Cc: Direct Rendering Infrastructure - Development 

Cc: AMD Graphics 
Signed-off-by: Andrey Grodzovsky 
Tested-by: Yunxiang Li (Teddy) 
Signed-off-by: Luben Tuikov 
Reviewed-by: Luben Tuikov 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 20 +
 drivers/gpu/drm/scheduler/sched_main.c   | 96 +++-
 include/drm/gpu_scheduler.h  | 32 
 3 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a308..7060e4ed5a3148 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -443,6 +444,19 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* Update the entity's location in the min heap according to
+* the timestamp of the next job, if any.
+*/
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
+   struct drm_sched_job *next;
+
+   next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+   if (next)
+   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   }
+
return sched_job;
 }
 
@@ -507,6 +521,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ktime_get();
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, sched_job->submit_ts);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 4f2395d1a79182..ce86b03e838699 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -62,6 +62,55 @@
 #define to_drm_sched_job(sched_job)\
container_of((sched_job), struct drm_sched_job, queue_node)
 
+int drm_sched_policy = DRM_SCHED_POLICY_RR;
+
+/**
+ * DOC: sched_policy (int)
+ * Used to override default entities scheduling policy in a run queue.
+ */
+MODULE_PARM_DESC(sched_policy, "Specify schedule policy for entities on a 
runqueue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin (default), " 
__stringify(DRM_SCHED_POLICY_FIFO) " = use FIFO.");
+module_param_named(sched_policy, drm_sched_policy, int, 0444);
+
+static __always_inli
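
The ordering rule at the heart of the FIFO policy, as a standalone sketch
(the patch keeps entities in an rb tree keyed by this timestamp so extraction
is O(1); the linear scan and the names here are only for illustration):

    #include <stddef.h>

    struct entity {
            long long head_submit_ts; /* timestamp of oldest queued job */
            int has_job;
    };

    /* Pick the entity whose head job arrived earliest. */
    static struct entity *select_entity_fifo(struct entity *rq, size_t n)
    {
            struct entity *best = NULL;
            size_t i;

            for (i = 0; i < n; i++) {
                    if (!rq[i].has_job)
                            continue;
                    if (!best || rq[i].head_submit_ts < best->head_submit_ts)
                            best = &rq[i];
            }
            return best;
    }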

Re: [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-29 Thread Dan Williams
Alistair Popple wrote:
> 
> Jason Gunthorpe  writes:
> 
> > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
> >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
> >> refcount") device private pages have no longer had an extra reference
> >> count when the page is in use. However before handing them back to the
> >> owning device driver we add an extra reference count such that free
> >> pages have a reference count of one.
> >>
> >> This makes it difficult to tell if a page is free or not because both
> >> free and in use pages will have a non-zero refcount. Instead we should
> >> return pages to the drivers page allocator with a zero reference count.
> >> Kernel code can then safely use kernel functions such as
> >> get_page_unless_zero().
> >>
> >> Signed-off-by: Alistair Popple 
> >> ---
> >>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
> >>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
> >>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
> >>  lib/test_hmm.c   | 1 +
> >>  mm/memremap.c| 5 -
> >>  mm/page_alloc.c  | 6 ++
> >>  6 files changed, 10 insertions(+), 5 deletions(-)
> >
> > I think this is a great idea, but I'm surprised no dax stuff is
> > touched here?
> 
> free_zone_device_page() shouldn't be called for pgmap->type ==
> MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
> there. Except that the folio code looks like it might have introduced a
> bug. AFAICT put_page() always calls
> put_devmap_managed_page(&folio->page) but folio_put() does not (although
> folios_put() does!). So it seems folio_put() won't end up calling
> __put_devmap_managed_page_refs() as I think it should.
> 
> I think you're right about the change to __init_zone_device_page() - I
> should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
> look at Dan's patch series more closely as I suspect it might be better
> to rebase this patch on top of that.

Apologies for the delay; I was travelling the past few days. Yes, I think
this patch slots in nicely to avoid the introduction of an init_mode
[1]:

https://lore.kernel.org/nvdimm/166329940343.2786261.6047770378829215962.st...@dwillia2-xfh.jf.intel.com/

Mind if I steal it into my series?
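
The refcount rule the patch establishes, modeled in standalone C with C11
atomics (illustrative only; the real get_page_unless_zero() lives in the
kernel): with free pages at refcount 0, an atomic increment-unless-zero
doubles as a safe "is this page still in use?" test.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct page_model {
            atomic_int refcount; /* 0 means the page is free */
    };

    /* Pin the page only if it is in use (model of get_page_unless_zero()). */
    static bool get_page_unless_zero_model(struct page_model *p)
    {
            int old = atomic_load(&p->refcount);

            while (old != 0)
                    if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
                            return true; /* pinned an in-use page */
            return false; /* page was free; do not touch it */
    }

    int main(void)
    {
            struct page_model free_page = { 0 };
            struct page_model used_page = { 1 };

            printf("free page pinned? %d\n", get_page_unless_zero_model(&free_page)); /* 0 */
            printf("used page pinned? %d\n", get_page_unless_zero_model(&used_page)); /* 1 */
            return 0;
    }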


Re: [PATCH v2 2/8] mm: Free device private pages have zero refcount

2022-09-29 Thread Felix Kuehling



On 2022-09-28 08:01, Alistair Popple wrote:

Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use. However before handing them back to the
owning device driver we add an extra reference count such that free
pages have a reference count of one.

This makes it difficult to tell if a page is free or not because both
free and in use pages will have a non-zero refcount. Instead we should
return pages to the drivers page allocator with a zero reference count.
Kernel code can then safely use kernel functions such as
get_page_unless_zero().

Signed-off-by: Alistair Popple 


Acked-by: Felix Kuehling 



Cc: Jason Gunthorpe 
Cc: Michael Ellerman 
Cc: Felix Kuehling 
Cc: Alex Deucher 
Cc: Christian König 
Cc: Ben Skeggs 
Cc: Lyude Paul 
Cc: Ralph Campbell 
Cc: Alex Sierra 
Cc: John Hubbard 
Cc: Dan Williams 

---

This will conflict with Dan's series to fix reference counts for DAX[1].
At the moment this only makes changes for device private and coherent
pages, however if DAX is fixed to remove the extra refcount then we
should just be able to drop the checks for private/coherent pages and
treat them the same.

[1] - 
https://lore.kernel.org/linux-mm/166329930818.2786261.6086109734008025807.st...@dwillia2-xfh.jf.intel.com/
---
  arch/powerpc/kvm/book3s_hv_uvmem.c   |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  2 +-
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |  2 +-
  include/linux/memremap.h |  1 +
  lib/test_hmm.c   |  2 +-
  mm/memremap.c|  9 +
  mm/page_alloc.c  |  8 
  7 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index d4eacf4..9d8de68 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -718,7 +718,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
  
  	dpage = pfn_to_page(uvmem_pfn);

dpage->zone_device_data = pvt;
-   lock_page(dpage);
+   zone_device_page_init(dpage);
return dpage;
  out_clear:
spin_lock(&kvmppc_uvmem_bitmap_lock);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 776448b..97a6845 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -223,7 +223,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, 
unsigned long pfn)
page = pfn_to_page(pfn);
svm_range_bo_ref(prange->svm_bo);
page->zone_device_data = prange->svm_bo;
-   lock_page(page);
+   zone_device_page_init(page);
  }
  
  static void

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 1635661..b092988 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -326,7 +326,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
return NULL;
}
  
-	lock_page(page);

+   zone_device_page_init(page);
return page;
  }
  
diff --git a/include/linux/memremap.h b/include/linux/memremap.h

index 1901049..f68bf6d 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -182,6 +182,7 @@ static inline bool folio_is_device_coherent(const struct 
folio *folio)
  }
  
  #ifdef CONFIG_ZONE_DEVICE

+void zone_device_page_init(struct page *page);
  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
  void memunmap_pages(struct dev_pagemap *pgmap);
  void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 89463ff..688c15d 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -627,8 +627,8 @@ static struct page *dmirror_devmem_alloc_page(struct 
dmirror_device *mdevice)
goto error;
}
  
+	zone_device_page_init(dpage);

dpage->zone_device_data = rpage;
-   lock_page(dpage);
return dpage;
  
  error:

diff --git a/mm/memremap.c b/mm/memremap.c
index 25029a4..1c2c038 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -505,8 +505,17 @@ void free_zone_device_page(struct page *page)
/*
 * Reset the page count to 1 to prepare for handing out the page again.
 */
+   if (page->pgmap->type != MEMORY_DEVICE_PRIVATE &&
+   page->pgmap->type != MEMORY_DEVICE_COHERENT)
+   set_page_count(page, 1);
+}
+
+void zone_device_page_init(struct page *page)
+{
set_page_count(page, 1);
+   lock_page(page);
  }
+EXPORT_SYMBOL_GPL(zone_device_page_init);
  
  #ifdef CONFIG_FS_DAX

  bool __put_devmap_managed_page_refs(struct page *page, int refs)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d49803..4df1e43 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6744,6 +6744,14 @@ 

Re: [PATCH] drm/sched: Add FIFO sched policy to run queue

2022-09-29 Thread Luben Tuikov
Inlined:

On 2022-09-29 14:46, Luben Tuikov wrote:
> From: Andrey Grodzovsky 
> 
> When many entities are competing for the same run queue
> on the same scheduler, we observe unusually long wait
> times and some jobs get starved. This has been observed with GPUVis.
> 
> The issue is due to the Round Robin policy used by schedulers
> to pick up the next entity's job queue for execution. Under stress
> of many entities and long job queues within an entity, some
> jobs could be stuck for a very long time in their entity's
> queue before being popped from the queue and executed,
> while for other entities with smaller job queues a job
> might execute earlier even though that job arrived later
> than the job in the long queue.
> 
> Fix:
> Add a FIFO selection policy to entities in the run queue; choose the
> next entity on the run queue in such an order that if a job on one
> entity arrived earlier than a job on another entity, the first job
> will start executing earlier, regardless of the length of the
> entity's job queue.
> 
> v2:
> Switch to rb tree structure for entities based on TS of
> oldest job waiting in the job queue of an entity. Improves next
> entity extraction to O(1). Entity TS update
> O(log N) where N is the number of entities in the run-queue
> 
> Drop default option in module control parameter.
> 
> v3:
> Various cosmetic fixes and minor refactoring of the fifo update function.
> (Luben)
> 
> v4:
> Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
> 
> v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)
> 
> v6: Add missing drm_sched_rq_remove_fifo_locked
> 
> Cc: Luben Tuikov 
> Cc: Christian König 
> Cc: Direct Rendering Infrastructure - Development 
> 
> Cc: AMD Graphics 
> Signed-off-by: Andrey Grodzovsky 
> Tested-by: Yunxiang Li (Teddy) 
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 26 ++-
>  drivers/gpu/drm/scheduler/sched_main.c   | 97 +++-
>  include/drm/gpu_scheduler.h  | 32 
>  3 files changed, 149 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 6b25b2f4f5a308..f3ffce3c9304d5 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   entity->priority = priority;
>   entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>   entity->last_scheduled = NULL;
> + RB_CLEAR_NODE(&entity->rb_tree_node);
>  
>   if(num_sched_list)
>   entity->rq = &sched_list[0]->sched_rq[entity->priority];
> @@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
> drm_sched_entity *entity)
>  
>   sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
>   if (!sched_job)
> - return NULL;
> + goto skip;
>  
>   while ((entity->dependency =
>   drm_sched_job_dependency(sched_job, entity))) {
>   trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
>  
> - if (drm_sched_entity_add_dependency_cb(entity))
> - return NULL;
> + if (drm_sched_entity_add_dependency_cb(entity)) {
> + sched_job = NULL;
> + goto skip;
> + }
>   }
>  
>   /* skip jobs from entity that marked guilty */
> @@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
> drm_sched_entity *entity)
>   smp_wmb();
>  
>   spsc_queue_pop(&entity->job_queue);
> +
> + /*
> +  * It's when head job is extracted we can access the next job (or empty)
> +  * queue and update the entity location in the min heap accordingly.
> +  */
> +skip:
> + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> + drm_sched_rq_update_fifo(entity,
> +  (sched_job ? sched_job->submit_ts : 
> ktime_get()));
> +
>   return sched_job;
>  }

If there's a next job, you should update the entity's timestamp to that
of the next job. Also you shouldn't have to update the timestamp when there
is no next job. And to be true to the comment above, you should have this here:

-skip:
-   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity,
-(sched_job ? sched_job->submit_ts : 
ktime_get()));
 
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
+   struct drm_sched_job *next;
+
+   next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+   if (next)
+   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   }
+Out:
return sched_job;

And then change the "goto skip;" to "goto Out;" in the preceding code.

This way you update the entity's timestamp to that of the next (oldest) job
in the entity's job queue, if there is a next

[PATCH] drm/sched: Add FIFO sched policy to run queue

2022-09-29 Thread Luben Tuikov
From: Andrey Grodzovsky 

When many entities are competing for the same run queue
on the same scheduler, we observe unusually long wait
times and some jobs get starved. This has been observed with GPUVis.

The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within an entity, some
jobs could be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.

Fix:
Add a FIFO selection policy to entities in the run queue; choose the
next entity on the run queue in such an order that if a job on one
entity arrived earlier than a job on another entity, the first job
will start executing earlier, regardless of the length of the
entity's job queue.

v2:
Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.

v3:
Various cosmetic fixes and minor refactoring of the fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)

v6: Add missing drm_sched_rq_remove_fifo_locked

Cc: Luben Tuikov 
Cc: Christian König 
Cc: Direct Rendering Infrastructure - Development 

Cc: AMD Graphics 
Signed-off-by: Andrey Grodzovsky 
Tested-by: Yunxiang Li (Teddy) 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 26 ++-
 drivers/gpu/drm/scheduler/sched_main.c   | 97 +++-
 include/drm/gpu_scheduler.h  | 32 
 3 files changed, 149 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a308..f3ffce3c9304d5 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
 
if(num_sched_list)
entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (!sched_job)
-   return NULL;
+   goto skip;
 
while ((entity->dependency =
drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
-   if (drm_sched_entity_add_dependency_cb(entity))
-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
 
/* skip jobs from entity that marked guilty */
@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
smp_wmb();
 
spsc_queue_pop(&entity->job_queue);
+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
 }
 
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
 
trace_drm_sched_job(sched_job, entity);
atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
 
/* first job wakes up scheduler */
if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity, ts);
+
drm_sched_wakeup(entity->rq->sched);
}
 }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
ind

[PATCH v3 6/6] amd/display: indicate support for atomic async page-flips on DC

2022-09-29 Thread Simon Ser
amdgpu_dm_commit_planes() already sets the flip_immediate flag for
async page-flips. This flag is used to set the UNP_FLIP_CONTROL
register. Thus, no additional change is required to handle async
page-flips with the atomic uAPI.

v2: make it clear this commit is about DC and not only DCN

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Cc: Joshua Ashton 
Cc: Melissa Wen 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 7500e82cf06a..44235345fd57 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3808,7 +3808,6 @@ static int amdgpu_dm_mode_config_init(struct 
amdgpu_device *adev)
adev_to_drm(adev)->mode_config.prefer_shadow = 1;
/* indicates support for immediate flip */
adev_to_drm(adev)->mode_config.async_page_flip = true;
-   adev_to_drm(adev)->mode_config.atomic_async_page_flip_not_supported = 
true;
 
adev_to_drm(adev)->mode_config.fb_base = adev->gmc.aper_base;
 
-- 
2.37.3




[PATCH v3 4/6] drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits

2022-09-29 Thread Simon Ser
If the driver supports it, allow user-space to supply the
DRM_MODE_PAGE_FLIP_ASYNC flag to request an async page-flip.
Set drm_crtc_state.async_flip accordingly.

Document that drivers will reject atomic commits if an async
flip isn't possible. This allows user-space to fall back to
something else. For instance, Xorg falls back to a blit.
Another option is to wait as close to the next vblank as
possible before performing the page-flip to reduce latency.

v2: document new uAPI

v3: add comment about Intel hardware which needs one last
sync page-flip before being able to switch to async (Ville, Pekka)

Signed-off-by: Simon Ser 
Co-developed-by: André Almeida 
Signed-off-by: André Almeida 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Cc: Daniel Vetter 
Cc: Joshua Ashton 
Cc: Melissa Wen 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Ville Syrjälä 
---
 drivers/gpu/drm/drm_atomic_uapi.c | 28 +---
 include/uapi/drm/drm_mode.h   |  9 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c
index 79730fa1dd8e..ee24ed7e2edb 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -1278,6 +1278,18 @@ static void complete_signaling(struct drm_device *dev,
kfree(fence_state);
 }
 
+static void
+set_async_flip(struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *crtc_state;
+   int i;
+
+   for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+   crtc_state->async_flip = true;
+   }
+}
+
 int drm_mode_atomic_ioctl(struct drm_device *dev,
  void *data, struct drm_file *file_priv)
 {
@@ -1318,9 +1330,16 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
}
 
if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC) {
-   drm_dbg_atomic(dev,
-  "commit failed: invalid flag 
DRM_MODE_PAGE_FLIP_ASYNC\n");
-   return -EINVAL;
+   if (!dev->mode_config.async_page_flip) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported\n");
+   return -EINVAL;
+   }
+   if (dev->mode_config.atomic_async_page_flip_not_supported) {
+   drm_dbg_atomic(dev,
+  "commit failed: DRM_MODE_PAGE_FLIP_ASYNC 
not supported with atomic\n");
+   return -EINVAL;
+   }
}
 
/* can't test and expect an event at the same time. */
@@ -1418,6 +1437,9 @@ int drm_mode_atomic_ioctl(struct drm_device *dev,
if (ret)
goto out;
 
+   if (arg->flags & DRM_MODE_PAGE_FLIP_ASYNC)
+   set_async_flip(state);
+
if (arg->flags & DRM_MODE_ATOMIC_TEST_ONLY) {
ret = drm_atomic_check_only(state);
} else if (arg->flags & DRM_MODE_ATOMIC_NONBLOCK) {
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 86a292c3185a..b39e78117b18 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -942,6 +942,15 @@ struct hdr_output_metadata {
  * Request that the page-flip is performed as soon as possible, ie. with no
  * delay due to waiting for vblank. This may cause tearing to be visible on
  * the screen.
+ *
+ * When used with atomic uAPI, the driver will return an error if the hardware
+ * doesn't support performing an asynchronous page-flip for this update.
+ * User-space should handle this, e.g. by falling back to a regular page-flip.
+ *
+ * Note, some hardware might need to perform one last synchronous page-flip
+ * before being able to switch to asynchronous page-flips. As an exception,
+ * the driver will return success even though that first page-flip is not
+ * asynchronous.
  */
 #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
 #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
-- 
2.37.3




[PATCH v3 5/6] drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP

2022-09-29 Thread Simon Ser
This new kernel capability indicates whether async page-flips are
supported via the atomic uAPI. DRM clients can use it to check
for support before feeding DRM_MODE_PAGE_FLIP_ASYNC to the kernel.

Make it clear that DRM_CAP_ASYNC_PAGE_FLIP is for legacy uAPI only.

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Cc: Daniel Vetter 
Cc: Joshua Ashton 
Cc: Melissa Wen 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Ville Syrjälä 
---
 drivers/gpu/drm/drm_ioctl.c |  5 +
 include/uapi/drm/drm.h  | 10 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index ca2a6e6101dc..5b1591e2b46c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -302,6 +302,11 @@ static int drm_getcap(struct drm_device *dev, void *data, 
struct drm_file *file_
case DRM_CAP_CRTC_IN_VBLANK_EVENT:
req->value = 1;
break;
+   case DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP:
+   req->value = drm_core_check_feature(dev, DRIVER_ATOMIC) &&
+dev->mode_config.async_page_flip &&
+
!dev->mode_config.atomic_async_page_flip_not_supported;
+   break;
default:
return -EINVAL;
}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 642808520d92..b1962628ecda 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -706,7 +706,8 @@ struct drm_gem_open {
 /**
  * DRM_CAP_ASYNC_PAGE_FLIP
  *
- * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC.
+ * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for legacy
+ * page-flips.
  */
 #define DRM_CAP_ASYNC_PAGE_FLIP0x7
 /**
@@ -767,6 +768,13 @@ struct drm_gem_open {
  * Documentation/gpu/drm-mm.rst, section "DRM Sync Objects".
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE   0x14
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP 0x15
 
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
-- 
2.37.3
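
How a client could use the new capability together with the atomic flag, as a
sketch using libdrm (error handling trimmed; it assumes the new cap define has
propagated to the uAPI headers and that the atomic request was already built):

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int try_async_flip(int fd, drmModeAtomicReq *req)
    {
            uint64_t cap = 0;

            if (drmGetCap(fd, DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP, &cap) != 0 || !cap)
                    return -1; /* no async atomic flips: use vsync'ed commits */

            /*
             * The commit may still fail with EINVAL if this particular
             * update can't be flipped asynchronously; the caller should
             * then fall back, e.g. to a regular page-flip or a blit.
             */
            return drmModeAtomicCommit(fd, req,
                                       DRM_MODE_PAGE_FLIP_ASYNC |
                                       DRM_MODE_ATOMIC_NONBLOCK,
                                       NULL);
    }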




[PATCH v3 3/6] drm: introduce drm_mode_config.atomic_async_page_flip_not_supported

2022-09-29 Thread Simon Ser
This new field indicates whether the driver has the necessary logic
to support async page-flips via the atomic uAPI. This is leveraged by
the next commit to allow user-space to use this functionality.

All atomic drivers setting drm_mode_config.async_page_flip are updated
to also set drm_mode_config.atomic_async_page_flip_not_supported. We
will gradually check and update these drivers to properly handle
drm_crtc_state.async_flip in their atomic logic.

The goal of this negative flag is the same as
fb_modifiers_not_supported: we want to eventually get rid of all
drivers missing atomic support for async flips. New drivers should not
set this flag, instead they should support atomic async flips (if
they support async flips at all). IOW, we don't want more drivers
with async flip support for legacy but not atomic.

v2: only set the flag on atomic drivers (remove it on amdgpu DCE and
on radeon)

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Cc: Daniel Vetter 
Cc: Joshua Ashton 
Cc: Melissa Wen 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
Cc: Ville Syrjälä 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  1 +
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c  |  1 +
 drivers/gpu/drm/i915/display/intel_display.c  |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c |  1 +
 drivers/gpu/drm/vc4/vc4_kms.c |  1 +
 include/drm/drm_mode_config.h | 11 +++
 6 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 44235345fd57..7500e82cf06a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3808,6 +3808,7 @@ static int amdgpu_dm_mode_config_init(struct 
amdgpu_device *adev)
adev_to_drm(adev)->mode_config.prefer_shadow = 1;
/* indicates support for immediate flip */
adev_to_drm(adev)->mode_config.async_page_flip = true;
+   adev_to_drm(adev)->mode_config.atomic_async_page_flip_not_supported = 
true;
 
adev_to_drm(adev)->mode_config.fb_base = adev->gmc.aper_base;
 
diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c 
b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
index f7e7f4e919c7..ffb3a2fa797f 100644
--- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
+++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c
@@ -639,6 +639,7 @@ static int atmel_hlcdc_dc_modeset_init(struct drm_device 
*dev)
dev->mode_config.max_height = dc->desc->max_height;
dev->mode_config.funcs = &mode_config_funcs;
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
return 0;
 }
diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index 40fbf8a296e2..e025b3499c9d 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8621,6 +8621,7 @@ static void intel_mode_config_init(struct 
drm_i915_private *i915)
mode_config->helper_private = &intel_mode_config_funcs;
 
mode_config->async_page_flip = HAS_ASYNC_FLIPS(i915);
+   mode_config->atomic_async_page_flip_not_supported = true;
 
/*
 * Maximum framebuffer dimensions, chosen to match
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index a2f5df568ca5..2b5c4f24aedd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -699,6 +699,7 @@ nouveau_display_create(struct drm_device *dev)
dev->mode_config.async_page_flip = false;
else
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
drm_kms_helper_poll_init(dev);
drm_kms_helper_poll_disable(dev);
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index 4419e810103d..3fe59c6b2cf0 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -1047,6 +1047,7 @@ int vc4_kms_load(struct drm_device *dev)
dev->mode_config.helper_private = &vc4_mode_config_helpers;
dev->mode_config.preferred_depth = 24;
dev->mode_config.async_page_flip = true;
+   dev->mode_config.atomic_async_page_flip_not_supported = true;
 
ret = vc4_ctm_obj_init(vc4);
if (ret)
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 6b5e01295348..1b535d94f2f4 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -917,6 +917,17 @@ struct drm_mode_config {
 */
bool async_page_flip;
 
+   /**
+* @atomic_async_page_flip_not_supported:
+*
+* If true, the driver does not support async page-flips with the
+* atomic uAPI. This is only used by old drivers which 

[PATCH v3 2/6] amd/display: only accept async flips for fast updates

2022-09-29 Thread Simon Ser
Up until now, amdgpu was silently degrading to vsync when
user-space requested an async flip but the hardware didn't support
it.

The hardware doesn't support immediate flips when the update changes
the FB pitch, the DCC state, the rotation, enables or disables CRTCs
or planes, etc. This is reflected in the dm_crtc_state.update_type
field: UPDATE_TYPE_FAST means that immediate flip is supported.

Silently degrading async flips to vsync is not the expected behavior
from a uAPI point-of-view. Xorg expects async flips to fail if
unsupported, to be able to fall back to a blit. i915 already behaves
this way.

This patch aligns amdgpu with uAPI expectations and returns a failure
when an async flip is not possible.

v2: new patch

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
Cc: Joshua Ashton 
Cc: Melissa Wen 
Cc: Harry Wentland 
Cc: Nicholas Kazlauskas 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  |  8 
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 7b19f444624c..44235345fd57 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7629,7 +7629,15 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
/*
 * Only allow immediate flips for fast updates that don't
 * change FB pitch, DCC state, rotation or mirroing.
+*
+* dm_crtc_helper_atomic_check() only accepts async flips with
+* fast updates.
 */
+   if (crtc->state->async_flip &&
+   acrtc_state->update_type != UPDATE_TYPE_FAST)
+   drm_warn_once(state->dev,
+ "[PLANE:%d:%s] async flip with non-fast 
update\n",
+ plane->base.id, plane->name);
bundle->flip_addrs[planes_count].flip_immediate =
crtc->state->async_flip &&
acrtc_state->update_type == UPDATE_TYPE_FAST;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index 594fe8a4d02b..97ead857f507 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -388,6 +388,16 @@ static int dm_crtc_helper_atomic_check(struct drm_crtc 
*crtc,
return -EINVAL;
}
 
+   /* Only allow async flips for fast updates that don't change the FB
+* pitch, the DCC state, rotation, etc. */
+   if (crtc_state->async_flip &&
+   dm_crtc_state->update_type != UPDATE_TYPE_FAST) {
+   drm_dbg_atomic(crtc->dev,
+  "[CRTC:%d:%s] async flips are only supported for 
fast updates\n",
+  crtc->base.id, crtc->name);
+   return -EINVAL;
+   }
+
/* In some use cases, like reset, no stream is attached */
if (!dm_crtc_state->stream)
return 0;
-- 
2.37.3




[PATCH v3 1/6] drm: document DRM_MODE_PAGE_FLIP_ASYNC

2022-09-29 Thread Simon Ser
This is a subset of [1], included here because a subsequent patch
needs to document the behavior of this flag under the atomic uAPI.

v2: new patch

[1]: https://patchwork.freedesktop.org/patch/500177/

Signed-off-by: Simon Ser 
Reviewed-by: André Almeida 
Reviewed-by: Alex Deucher 
---
 include/uapi/drm/drm_mode.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index fa953309d9ce..86a292c3185a 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -936,6 +936,13 @@ struct hdr_output_metadata {
 };
 
 #define DRM_MODE_PAGE_FLIP_EVENT 0x01
+/**
+ * DRM_MODE_PAGE_FLIP_ASYNC
+ *
+ * Request that the page-flip is performed as soon as possible, ie. with no
+ * delay due to waiting for vblank. This may cause tearing to be visible on
+ * the screen.
+ */
 #define DRM_MODE_PAGE_FLIP_ASYNC 0x02
 #define DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE 0x4
 #define DRM_MODE_PAGE_FLIP_TARGET_RELATIVE 0x8
-- 
2.37.3




[PATCH v3 0/6] Add support for atomic async page-flips

2022-09-29 Thread Simon Ser
This series adds support for DRM_MODE_PAGE_FLIP_ASYNC for atomic
commits, aka. "immediate flip" (which might result in tearing).
The feature was only available via the legacy uAPI, however for
gaming use-cases it may be desirable to enable it via the atomic
uAPI too.

- Patchwork: https://patchwork.freedesktop.org/series/107683/
- User-space patch: https://github.com/Plagman/gamescope/pull/595
- IGT patch: https://patchwork.freedesktop.org/series/107681/

Main changes in v2: add docs, fail atomic commit if async flip isn't
possible.

Changes in v3: add a note in the documentation about Intel hardware,
add R-b tags.

Tested on an AMD Picasso iGPU (Simon) and an AMD Vangogh GPU (André).

Simon Ser (6):
  drm: document DRM_MODE_PAGE_FLIP_ASYNC
  amd/display: only accept async flips for fast updates
  drm: introduce drm_mode_config.atomic_async_page_flip_not_supported
  drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits
  drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
  amd/display: indicate support for atomic async page-flips on DC

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  8 ++
 .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c| 10 +++
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c  |  1 +
 drivers/gpu/drm/drm_atomic_uapi.c | 28 +--
 drivers/gpu/drm/drm_ioctl.c   |  5 
 drivers/gpu/drm/i915/display/intel_display.c  |  1 +
 drivers/gpu/drm/nouveau/nouveau_display.c |  1 +
 drivers/gpu/drm/vc4/vc4_kms.c |  1 +
 include/drm/drm_mode_config.h | 11 
 include/uapi/drm/drm.h| 10 ++-
 include/uapi/drm/drm_mode.h   | 16 +++
 11 files changed, 88 insertions(+), 4 deletions(-)

-- 
2.37.3




Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Alex Deucher
On Thu, Sep 29, 2022 at 10:14 AM Lazar, Lijo  wrote:
>
>
>
> On 9/29/2022 7:30 PM, Sharma, Shashank wrote:
> >
> >
> > On 9/29/2022 3:37 PM, Lazar, Lijo wrote:
> >> To be clear your understanding -
> >>
> >> Nothing is automatic in PMFW. PMFW picks a priority based on the
> >> actual mask sent by driver.
> >>
> >> Assuming lower bits corresponds to highest priority -
> >>
> >> If driver sends a mask with Bit3 and Bit0 set, PMFW will choose the
> >> profile that corresponds to Bit0. If driver sends a mask with Bit4 and
> >> Bit2 set and rest unset, PMFW will choose the profile that corresponds
> >> to Bit2. However if driver sends a mask only with a single bit set, it
> >> chooses the profile regardless of whatever was the previous profile. It
> >> doesn't check if the existing profile > newly requested one. That is
> >> the behavior.
> >>
> >> So if a job chooses a profile that corresponds to Bit0, driver
> >> will send that. Next time if another job chooses a profile that
> >> corresponds to Bit1, PMFW will receive that as the new profile and
> >> switch to that. It trusts the driver to send the proper workload mask.
> >>
> >> Hope that gives the picture.
> >>
> >
> >
> > Thanks, my understanding is also similar, referring to the core power
> > switch profile function here:
> > amd_powerplay.c::pp_dpm_switch_power_profile()
> > *snip code*
> > hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
> >  index = fls(hwmgr->workload_mask);
> >  index = index <= Workload_Policy_Max ? index - 1 : 0;
> >  workload = hwmgr->workload_setting[index];
> > *snip_code*
> > hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);
> >
> > Here I can see that the new workload mask is appended into the existing
> > workload mask (not overwritten). So if we keep sending new
> > workload_modes, they would be appended into the workload flags and
> > finally the PM will pick the most aggressive one of all these flags, as
> > per its policy.
> >
>
> Actually it's misleading -
>
> The path for sienna is -
> set_power_profile_mode -> sienna_cichlid_set_power_profile_mode
>
>
> This code here picks one based on a lookup table.
>
>   workload_type = smu_cmn_to_asic_specific_index(smu,
>
> CMN2ASIC_MAPPING_WORKLOAD,
>
> smu->power_profile_mode);
>
> This is that lookup table -
>
> static struct cmn2asic_mapping
> sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = {
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT,
> WORKLOAD_PPLIB_DEFAULT_BIT),
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D,
> WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT),
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING,
> WORKLOAD_PPLIB_POWER_SAVING_BIT),
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO,
> WORKLOAD_PPLIB_VIDEO_BIT),
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR,
> WORKLOAD_PPLIB_VR_BIT),
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE,
> WORKLOAD_PPLIB_COMPUTE_BIT),
>  WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM,
> WORKLOAD_PPLIB_CUSTOM_BIT),
> };
>
>
> And this is the place of interaction with PMFW. (1 << workload_type) is
> the mask being sent.
>
> smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
>  1 << workload_type, NULL);
>
> In the end, driver implementation expects only one bit to be set.

Shashank and I had a discussion about this today.  I think there are a
few things we can do to handle this better:

1. Set a flag so that if the user changes the default via sysfs, that
overrides any runtime setting via an application, since presumably that
is what the user wants, and we won't change the hint at runtime.
2. Drop the GET API.  There's no need for this, the hint is just a hint.
3. Have the driver arbitrate between the available workload profiles
based on the numeric value of the hint (e.g., default < 3D < video <
VR < compute) as the higher values are more aggressive in most cases.
If requests come in for 3D and compute at the same time, the driver
will select compute because its value is highest.  Each hint type
would be ref counted so we'd know what state to be in every time we go
to set the state.  If all of the clients requesting compute go away,
and only 3D requestors remain, we can switch to 3D.  If all refcounts
go to 0, we go back to default.  This will not require any change to
the current workload API in the SMU code.

Alex

>
> Thanks,
> Lijo
>
> > Now, when we have a single workload:
> > -> Job1: requests profile P1 via UAPI, ref count = 1
> > -> driver sends flags for p1
> > -> PM FW applies profile P1
> > -> Job executes in profile P1
> > -> Job goes to reset function, ref_count = 0,
> > -> Power profile resets
> >
> > Now, we have conflicts only when we see multiple workloads (Job1 and Job 2)
> > -> Job1: requests profile P1 via UAPI, ref count = 1
> > -> driver sends flags for p1
> > -> PM FW applies profile P1
> > -> Job executes in profile P1
> > -> Job2: requests profile P2 via UAPI, refcount = 2

Re: [PATCH v2 1/8] mm/memory.c: Fix race when faulting a device private page

2022-09-29 Thread Felix Kuehling

On 2022-09-28 08:01, Alistair Popple wrote:

When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called. However no
reference is taken on the faulting page. Therefore a concurrent
migration of the device private page can free the page and possibly the
underlying pgmap. This results in a race which can crash the kernel due
to the migrate_to_ram() function pointer becoming invalid. It also means
drivers can't reliably read the zone_device_data field because the page
may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl
to ensure it has not been freed. Unfortunately the elevated reference
count will cause the migration required to handle the fault to fail. To
avoid this failure pass the faulting page into the migrate_vma functions
so that if an elevated reference count is found it can be checked to see
if it's expected or not.


Do we really have to drag the fault_page all the way into the migrate 
structure? Is the elevated refcount still needed at that time? Maybe it 
would be easier to drop the refcount early in the ops->migrate_to_ram 
callbacks, so we won't have to deal with it in all the migration code.
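
For illustration only, such an early drop in a driver's migrate_to_ram
callback might look roughly like this (a sketch; whether the reference can
safely be dropped this early is exactly the question here):

static vm_fault_t example_migrate_to_ram(struct vm_fault *vmf)
{
	/* vmf->page is the device-private page that faulted */
	struct page *page = vmf->page;

	/*
	 * Drop the reference taken under the ptl in do_swap_page() up
	 * front, so the migrate_vma_* helpers below never see the
	 * elevated refcount.
	 */
	put_page(page);

	/* ... usual migrate_vma_setup()/copy/migrate_vma_finalize() ... */
	return 0;
}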


Regards,
  Felix




Signed-off-by: Alistair Popple 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: Ralph Campbell 
Cc: Michael Ellerman 
Cc: Felix Kuehling 
Cc: Lyude Paul 
---
  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
  include/linux/migrate.h  |  8 ++-
  lib/test_hmm.c   |  7 ++---
  mm/memory.c  | 16 +++-
  mm/migrate.c | 34 ++---
  mm/migrate_device.c  | 18 +
  9 files changed, 87 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5980063..d4eacf4 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
  static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
unsigned long start,
unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+   struct kvm *kvm, unsigned long gpa, struct page *fault_page)
  {
unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
+   struct migrate_vma mig = { 0 };
struct page *dpage, *spage;
struct kvmppc_uvmem_page_pvt *pvt;
unsigned long pfn;
@@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
mig.dst = &dst_pfn;
mig.pgmap_owner = &kvmppc_uvmem_pgmap;
mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   mig.fault_page = fault_page;
  
  	/* The requested page is already paged-out, nothing to do */

if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
@@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
*vma,
  static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
  unsigned long start, unsigned long end,
  unsigned long page_shift,
- struct kvm *kvm, unsigned long gpa)
+ struct kvm *kvm, unsigned long gpa,
+ struct page *fault_page)
  {
int ret;
  
  	mutex_lock(&kvm->arch.uvmem_lock);

-   ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
+   ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
+   fault_page);
mutex_unlock(&kvm->arch.uvmem_lock);
  
  	return ret;

@@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
bool pagein)
  {
unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
+   struct migrate_vma mig = { 0 };
struct page *spage;
unsigned long pfn;
struct page *dpage;
@@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct 
vm_fault *vmf)
  
  	if (kvmppc_svm_page_out(vmf->vma, vmf->address,

vmf->address + PAGE_SIZE, PAGE_SHIFT,
-   pvt->kvm, pvt->gpa))
+   pvt->kvm, pvt->gpa, vmf->page))
return VM_FAULT_SIGBUS;
else
return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index b059a77..776448b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -409,7 +409,7 @@ svm_migra

Re: [PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-29 Thread Dmitry Osipenko
On 9/28/22 22:32, Alex Deucher wrote:
> On Wed, Sep 28, 2022 at 3:24 PM Dmitry Osipenko
>  wrote:
>>
>> On 9/28/22 20:47, Dmitry Osipenko wrote:
>>> On 9/28/22 20:44, Deucher, Alexander wrote:
 [AMD Official Use Only - General]

 This should be fixed in a backwards compatible way with this patch:
 https://patchwork.freedesktop.org/patch/504869/
>>>
>>> Good to know that it's already addressed, thank you very much for the
>>> quick reply.
>>
>> Unfortunately, that patch doesn't help beige goby. Please fix.
> 
> https://patchwork.freedesktop.org/patch/505248/

That helped, thank you.

-- 
Best regards,
Dmitry



Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Felix Kuehling

On 2022-09-29 07:10, Lazar, Lijo wrote:



On 9/29/2022 2:18 PM, Sharma, Shashank wrote:



On 9/28/2022 11:51 PM, Alex Deucher wrote:

On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:




On 9/27/2022 10:40 PM, Alex Deucher wrote:

On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
 wrote:




On 9/27/2022 5:23 PM, Felix Kuehling wrote:

On 2022-09-27 at 10:58, Sharma, Shashank wrote:

Hello Felix,

Thank for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:

On 2022-09-27 at 02:12, Christian König wrote:

On 26.09.22 at 23:40, Shashank Sharma wrote:

This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 


Feel free to add my acked-by, but Felix should probably take a look
as well.


This looks OK purely from a compute perspective. But I'm concerned
about the interaction of compute with graphics or multiple graphics
contexts submitting work concurrently. They would constantly override
or disable each other's workload hints.

For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be different
processes belonging to different users. Say, KFD enables the compute
profile first. Then the graphics context submits a job. At the start
of the job, the compute profile is enabled. That's a no-op because
KFD already enabled the compute profile. When the job finishes, it
disables the compute profile for everyone, including KFD. That's
unexpected.



In this case, it will not disable the compute profile, as the
reference counter will not be zero. The reset_profile() will only act
if the reference counter is 0.


OK, I missed the reference counter.




But I would be happy to get any input on a policy which can be
more sustainable and gives better outcomes, for example:
- should we not allow a profile change, if a PP mode is already
applied and keep it Early bird basis ?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as job A
has not finished yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also runs
in PP 3D
- Job B finishes, but does not reset PP as reference count is not zero
due to compute
- Job A finishes, profile reset to NONE


I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute profile
at the same time. I think you'll need one refcount per profile.

Regards,
 Felix


Thanks, this is exactly what I was looking for. I think Alex's initial
idea was around it, but I was under the assumption that there is only
one HW profile in SMU which keeps on getting overwritten. This can solve
our problems, as I can create an array of reference counters, and will
disable only the profile whose reference counter goes to 0.


It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you can
specify multiple workload types at the same time and the SMU will
arbitrate between them internally.  E.g., the most aggressive one will
be selected out of the ones specified.  I think in the driver we just
set one bit at a time using the current interface.  It might be better
to change the interface and just ref count the hint types and then
when we call the set function look at the ref counts for each hint
type and set the mask as appropriate.

Alex



Hey Alex,
Thanks for your comment. If that is the case, the current patch
series works straightforwardly, and no changes would be required. Please
let me know if my understanding is correct:

Assumption: Order of aggression: 3D > Media > Compute

- Job 1: Requests mode compute: PP changed to compute, ref count 1
- Job 2: Requests mode media: PP changed to media, ref count 2
- Job 3: requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref > 0,
PP still 3D
- Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref > 0,
PP still 3D
- Job 2 finishes, downs ref count to 0, PP changed to NONE,

In this way, every job will be operating in the Power profile of desired
aggression or higher, and this API guarantees the execution at least in
the desired power profile.


I'm not entirely sure on the relative levels of aggression, but I
believe the SMU prioritizes them by index.  E.g.
#define WORKLOAD_PPLIB_DEFAULT_BIT    0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT  3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT    5
#define WORKLOAD_PPLIB_CUSTOM_BIT 6

3D < video < VR < compute < custom

Re: [PATCH v6 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-29 Thread Felix Kuehling

On 2022-09-29 12:47, Philip Yang wrote:

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
ranges allocated with xnack off will not unreserve memory when they are
subsequently freed. When switching xnack mode from on to off, ranges
allocated with xnack on will unreserve memory when they are subsequently
freed. Both cases leave the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
already-allocated svm ranges. When switching xnack mode from off to on,
we need to unreserve memory for already-allocated svm ranges.

v6: Take prange lock to access range child list
v5: Handle prange child ranges
v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 


Really Reviewed-by me this time. Feel free to submit to 
amd-staging-drm-next.


Thanks,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 +++---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 60 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
  3 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file *filep,
return kfd_smi_event_open(pdd->dev, &args->anon_fd);
  }
  
+#if IS_ENABLED(CONFIG_HSA_AMD_SVM)

+
  static int kfd_ioctl_set_xnack_mode(struct file *filep,
struct kfd_process *p, void *data)
  {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
if (args->xnack_enabled >= 0) {
if (!list_empty(&p->pqm.queues)) {
pr_debug("Process has user queues running\n");
-   mutex_unlock(&p->mutex);
-   return -EBUSY;
+   r = -EBUSY;
+   goto out_unlock;
}
-   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+   if (p->xnack_enabled == args->xnack_enabled)
+   goto out_unlock;
+
+   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
r = -EPERM;
-   else
-   p->xnack_enabled = args->xnack_enabled;
+   goto out_unlock;
+   }
+
+   r = svm_range_switch_xnack_reserve_mem(p, args->xnack_enabled);
} else {
args->xnack_enabled = p->xnack_enabled;
}
+
+out_unlock:
mutex_unlock(&p->mutex);
  
  	return r;

  }
  
-#if IS_ENABLED(CONFIG_HSA_AMD_SVM)

  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void 
*data)
  {
struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return r;
  }
  #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return -EPERM;
+}
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void 
*data)
  {
return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..f5913ba22174 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool 
update_mem_usage)
svm_range_free_dma_mappings(prange);
  
  	if (update_mem_usage && !p->xnack_enabled) {

-   pr_debug("unreserve mem limit: %lld\n", size);
+   pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
}
@@ -2956,6 +2956,64 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
return r;
  }
  
+int

+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool xnack_enabled)
+{
+   struct svm_range *prange, *pchild;
+   uint64_t reserved_size = 0;
+   uint64_t size;
+   int r = 0;
+
+   pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);
+
+   mutex_lock(&p->svms.lock);
+
+   list_for_each_entry(prange, &p->svms.list, list) {
+   svm_range_lock(prange);
+   list_for_each_entry(pchild, &prange->child_list, child_list) {
+   size = (pchild->last - pchild->start + 1) << PAGE_SHIFT;
+   if (xn

Re: [PATCH 1/2] drm/amdgpu: Enable VCN DPG for GC11_0_1

2022-09-29 Thread James Zhu

Reviewed-by: James Zhu for the series.

On 2022-09-29 12:50 p.m., Sonny Jiang wrote:

Enable VCN DPG on GC11_0_1

Signed-off-by: Sonny Jiang
---
  drivers/gpu/drm/amd/amdgpu/soc21.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 5f0d6983714a..16b757664a35 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -629,6 +629,7 @@ static int soc21_common_early_init(void *handle)
AMD_CG_SUPPORT_JPEG_MGCG;
adev->pg_flags =
AMD_PG_SUPPORT_GFX_PG |
+   AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_JPEG;
adev->external_rev_id = adev->rev_id + 0x1;
break;

[PATCH 1/2] drm/amdgpu: Enable VCN DPG for GC11_0_1

2022-09-29 Thread Sonny Jiang
Enable VCN DPG on GC11_0_1

Signed-off-by: Sonny Jiang 
---
 drivers/gpu/drm/amd/amdgpu/soc21.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c 
b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 5f0d6983714a..16b757664a35 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -629,6 +629,7 @@ static int soc21_common_early_init(void *handle)
AMD_CG_SUPPORT_JPEG_MGCG;
adev->pg_flags =
AMD_PG_SUPPORT_GFX_PG |
+   AMD_PG_SUPPORT_VCN_DPG |
AMD_PG_SUPPORT_JPEG;
adev->external_rev_id = adev->rev_id + 0x1;
break;
-- 
2.36.1



[PATCH 2/2] drm/amdgpu: Enable sram on vcn_4_0_2

2022-09-29 Thread Sonny Jiang
Enable sram on vcn_4_0_2

Signed-off-by: Sonny Jiang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f36e4f08db6d..0b52af415b28 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -191,7 +191,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
fw_name = FIRMWARE_VCN4_0_2;
if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = false;
+   adev->vcn.indirect_sram = true;
break;
case IP_VERSION(4, 0, 4):
fw_name = FIRMWARE_VCN4_0_4;
-- 
2.36.1



[PATCH v6 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-29 Thread Philip Yang
Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
ranges allocated with xnack off will not unreserve memory when they are
subsequently freed. When switching xnack mode from on to off, ranges
allocated with xnack on will unreserve memory when they are subsequently
freed. Both cases leave the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
already-allocated svm ranges. When switching xnack mode from off to on,
we need to unreserve memory for already-allocated svm ranges.

v6: Take prange lock to access range child list
v5: Handle prange child ranges
v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 60 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
 3 files changed, 80 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file *filep,
return kfd_smi_event_open(pdd->dev, &args->anon_fd);
 }
 
+#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+
 static int kfd_ioctl_set_xnack_mode(struct file *filep,
struct kfd_process *p, void *data)
 {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
if (args->xnack_enabled >= 0) {
if (!list_empty(&p->pqm.queues)) {
pr_debug("Process has user queues running\n");
-   mutex_unlock(&p->mutex);
-   return -EBUSY;
+   r = -EBUSY;
+   goto out_unlock;
}
-   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+   if (p->xnack_enabled == args->xnack_enabled)
+   goto out_unlock;
+
+   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
r = -EPERM;
-   else
-   p->xnack_enabled = args->xnack_enabled;
+   goto out_unlock;
+   }
+
+   r = svm_range_switch_xnack_reserve_mem(p, args->xnack_enabled);
} else {
args->xnack_enabled = p->xnack_enabled;
}
+
+out_unlock:
mutex_unlock(&p->mutex);
 
return r;
 }
 
-#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
 static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 {
struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
return r;
 }
 #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+   struct kfd_process *p, void *data)
+{
+   return -EPERM;
+}
 static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 {
return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..f5913ba22174 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool 
update_mem_usage)
svm_range_free_dma_mappings(prange);
 
if (update_mem_usage && !p->xnack_enabled) {
-   pr_debug("unreserve mem limit: %lld\n", size);
+   pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
}
@@ -2956,6 +2956,64 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
return r;
 }
 
+int
+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool xnack_enabled)
+{
+   struct svm_range *prange, *pchild;
+   uint64_t reserved_size = 0;
+   uint64_t size;
+   int r = 0;
+
+   pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);
+
+   mutex_lock(&p->svms.lock);
+
+   list_for_each_entry(prange, &p->svms.list, list) {
+   svm_range_lock(prange);
+   list_for_each_entry(pchild, &prange->child_list, child_list) {
+   size = (pchild->last - pchild->start + 1) << PAGE_SHIFT;
+   if (xnack_enabled) {
+   amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+   
KFD_IOC_ALLOC_MEM_F

Re: [PATCH v5 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-29 Thread Philip Yang



On 2022-09-28 15:02, Felix Kuehling wrote:


On 2022-09-28 at 12:11, Philip Yang wrote:

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory; with xnack on, we don't track unified memory usage, to
allow memory oversubscription. When switching xnack mode from off to on,
ranges allocated with xnack off will not unreserve memory when they are
subsequently freed. When switching xnack mode from on to off, ranges
allocated with xnack on will unreserve memory when they are subsequently
freed. Both cases leave the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
already-allocated svm ranges. When switching xnack mode from off to on,
we need to unreserve memory for already-allocated svm ranges.

v5: Handle prange child ranges
v4: Handle reservation memory failure
v3: Handle switching xnack mode race with svm_range_deferred_list_work
v2: Handle both switching xnack from on to off and from off to on cases

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 26 ---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 56 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  1 +
  3 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c

index 56f7307c21d2..5feaba6a77de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1584,6 +1584,8 @@ static int kfd_ioctl_smi_events(struct file 
*filep,

  return kfd_smi_event_open(pdd->dev, &args->anon_fd);
  }
  +#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
+
  static int kfd_ioctl_set_xnack_mode(struct file *filep,
  struct kfd_process *p, void *data)
  {
@@ -1594,22 +1596,29 @@ static int kfd_ioctl_set_xnack_mode(struct 
file *filep,

  if (args->xnack_enabled >= 0) {
  if (!list_empty(&p->pqm.queues)) {
  pr_debug("Process has user queues running\n");
-    mutex_unlock(&p->mutex);
-    return -EBUSY;
+    r = -EBUSY;
+    goto out_unlock;
  }
-    if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+    if (p->xnack_enabled == args->xnack_enabled)
+    goto out_unlock;
+
+    if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
  r = -EPERM;
-    else
-    p->xnack_enabled = args->xnack_enabled;
+    goto out_unlock;
+    }
+
+    r = svm_range_switch_xnack_reserve_mem(p, args->xnack_enabled);
  } else {
  args->xnack_enabled = p->xnack_enabled;
  }
+
+out_unlock:
  mutex_unlock(&p->mutex);
    return r;
  }
  -#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, 
void *data)

  {
  struct kfd_ioctl_svm_args *args = data;
@@ -1629,6 +1638,11 @@ static int kfd_ioctl_svm(struct file *filep, 
struct kfd_process *p, void *data)

  return r;
  }
  #else
+static int kfd_ioctl_set_xnack_mode(struct file *filep,
+    struct kfd_process *p, void *data)
+{
+    return -EPERM;
+}
  static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, 
void *data)

  {
  return -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index cf5b4005534c..ff47ac836bd4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range 
*prange, bool update_mem_usage)

  svm_range_free_dma_mappings(prange);
    if (update_mem_usage && !p->xnack_enabled) {
-    pr_debug("unreserve mem limit: %lld\n", size);
+    pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
  amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
  KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
  }
@@ -2956,6 +2956,60 @@ svm_range_restore_pages(struct amdgpu_device 
*adev, unsigned int pasid,

  return r;
  }
  +int
+svm_range_switch_xnack_reserve_mem(struct kfd_process *p, bool 
xnack_enabled)

+{
+    struct svm_range *prange, *pchild;
+    uint64_t reserved_size = 0;
+    uint64_t size;
+    int r = 0;
+
+    pr_debug("switching xnack from %d to %d\n", p->xnack_enabled, 
xnack_enabled);

+
+    mutex_lock(&p->svms.lock);
+
+    list_for_each_entry(prange, &p->svms.list, list) {
+    list_for_each_entry(pchild, &prange->child_list, child_list) {


I believe the child_list is not protected by the svms.lock because we 
update it in MMU notifiers. It is protected by the prange->lock of the 
parent range.


Yes, the svms lock cannot protect the range child list; we should take 
prange->lock to access the child list, because a prange may be split into 
child ranges by the unmap MMU notifier in parallel. Taking pchild->lock is 
not needed. Will send out a v6 patch.
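
In code, the pattern then boils down to (a sketch of the v6 hunk above):

	mutex_lock(&p->svms.lock);
	list_for_each_entry(prange, &p->svms.list, list) {
		svm_range_lock(prange);	/* protects prange->child_list */
		list_for_each_entry(pchild, &prange->child_list, child_list) {
			/* ... account pchild; pchild->lock not needed ... */
		}
		/* ... account prange itself ... */
		svm_range_unlock(prange);
	}
	mutex_unlock(&p->svms.lock);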


Regards,

Philip



Regards,
  Felix



+    size = (pchild->last - pchild->start + 1) << PAGE_SHIFT

Re: [PATCH] drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()

2022-09-29 Thread Felix Kuehling



On 2022-09-29 at 11:41, Hamza Mahfooz wrote:



On 2022-09-29 11:36, Felix Kuehling wrote:

I'm still seeing a warning even with this fix:

/home/fkuehlin/compute/kernel/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: 
In function 'dc_stream_remove_writeback':
/home/fkuehlin/compute/kernel/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55: 
warning: array subscript 1 is above array bounds of 'struct 
dc_writeback_info[1]' [-Warray-bounds]

   527 | stream->writeback_info[j] = stream->writeback_info[i];
   | ~~^~~



What version of GCC are you using? I don't see it on GCC 12.2 with 
this patch applied.


gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Regards,
  Felix





Regards,
   Felix


On 2022-09-27 at 16:35, Pillai, Aurabindo wrote:


[AMD Official Use Only - General]


[AMD Official Use Only - General]


Reviewed-by: Aurabindo Pillai 

--

Regards,
Jay
 


*From:* Mahfooz, Hamza 
*Sent:* Tuesday, September 27, 2022 3:12 PM
*To:* linux-ker...@vger.kernel.org 
*Cc:* Mahfooz, Hamza ; Wentland, Harry 
; Li, Sun peng (Leo) ; 
Siqueira, Rodrigo ; Deucher, Alexander 
; Koenig, Christian 
; Pan, Xinhui ; David 
Airlie ; Daniel Vetter ; Lee, 
Alvin ; Hung, Alex ; Kotarac, 
Pavle ; Wang, Chao-kai (Stylon) 
; Pillai, Aurabindo ; 
Ma, Leo ; Wu, Hersen ; 
Po-Yu Hsieh Paul ; Jimmy Kizito 
; amd-gfx@lists.freedesktop.org 
; dri-de...@lists.freedesktop.org 

*Subject:* [PATCH] drm/amd/display: fix array-bounds error in 
dc_stream_remove_writeback()

Address the following error:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: In 
function ‘dc_stream_remove_writeback’:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55: 
error: array subscript [0, 0] is outside array bounds of ‘struct 
dc_writeback_info[1]’ [-Werror=array-bounds]

  527 | stream->writeback_info[j] = stream->writeback_info[i];
  | ~~^~~
In file included from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc.h:1269,
 from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/inc/core_types.h:29,
 from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dc_common.h:29,
 from 
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:27:
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_stream.h:241:34: note: 
while referencing ‘writeback_info’

  241 | struct dc_writeback_info writeback_info[MAX_DWB_PIPES];
  |

Currently, we aren't checking to see if j remains within
writeback_info[]'s bounds. So, add a check to make sure that we aren't
overflowing the buffer.

Signed-off-by: Hamza Mahfooz 
---
 drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c

index 3ca1592ce7ac..ae13887756bf 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
@@ -520,7 +520,7 @@ bool dc_stream_remove_writeback(struct dc *dc,
 }

 /* remove writeback info for disabled writeback pipes from 
stream */

-   for (i = 0, j = 0; i < stream->num_wb_info; i++) {
+   for (i = 0, j = 0; i < stream->num_wb_info && j < 
MAX_DWB_PIPES; i++) {

 if (stream->writeback_info[i].wb_enabled) {
 if (i != j)
 /* trim the array */
--
2.37.2





Re: [PATCH] drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()

2022-09-29 Thread Hamza Mahfooz




On 2022-09-29 11:36, Felix Kuehling wrote:

I'm still seeing a warning even with this fix:

/home/fkuehlin/compute/kernel/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:
 In function 'dc_stream_remove_writeback':
/home/fkuehlin/compute/kernel/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55:
 warning: array subscript 1 is above array bounds of 'struct 
dc_writeback_info[1]' [-Warray-bounds]
   527 | stream->writeback_info[j] = stream->writeback_info[i];
   | ~~^~~



What version of GCC are you using? I don't see it on GCC 12.2 with this 
patch applied.



Regards,
   Felix


On 2022-09-27 at 16:35, Pillai, Aurabindo wrote:


[AMD Official Use Only - General]


[AMD Official Use Only - General]


Reviewed-by: Aurabindo Pillai 

--

Regards,
Jay

*From:* Mahfooz, Hamza 
*Sent:* Tuesday, September 27, 2022 3:12 PM
*To:* linux-ker...@vger.kernel.org 
*Cc:* Mahfooz, Hamza ; Wentland, Harry 
; Li, Sun peng (Leo) ; 
Siqueira, Rodrigo ; Deucher, Alexander 
; Koenig, Christian 
; Pan, Xinhui ; David 
Airlie ; Daniel Vetter ; Lee, Alvin 
; Hung, Alex ; Kotarac, Pavle 
; Wang, Chao-kai (Stylon) 
; Pillai, Aurabindo ; 
Ma, Leo ; Wu, Hersen ; Po-Yu 
Hsieh Paul ; Jimmy Kizito ; 
amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org 
*Subject:* [PATCH] drm/amd/display: fix array-bounds error in 
dc_stream_remove_writeback()

Address the following error:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: In function 
‘dc_stream_remove_writeback’:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55: 
error: array subscript [0, 0] is outside array bounds of ‘struct 
dc_writeback_info[1]’ [-Werror=array-bounds]

  527 | stream->writeback_info[j] = stream->writeback_info[i];
  | ~~^~~
In file included from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc.h:1269,
 from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/inc/core_types.h:29,
 from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dc_common.h:29,
 from 
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:27:
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_stream.h:241:34: note: 
while referencing ‘writeback_info’

  241 | struct dc_writeback_info writeback_info[MAX_DWB_PIPES];
  |

Currently, we aren't checking to see if j remains within
writeback_info[]'s bounds. So, add a check to make sure that we aren't
overflowing the buffer.

Signed-off-by: Hamza Mahfooz 
---
 drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c

index 3ca1592ce7ac..ae13887756bf 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
@@ -520,7 +520,7 @@ bool dc_stream_remove_writeback(struct dc *dc,
 }

 /* remove writeback info for disabled writeback pipes from 
stream */

-   for (i = 0, j = 0; i < stream->num_wb_info; i++) {
+   for (i = 0, j = 0; i < stream->num_wb_info && j < 
MAX_DWB_PIPES; i++) {

 if (stream->writeback_info[i].wb_enabled) {
 if (i != j)
 /* trim the array */
--
2.37.2



--
Hamza



Re: [PATCH] drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()

2022-09-29 Thread Felix Kuehling

I'm still seeing a warning even with this fix:

/home/fkuehlin/compute/kernel/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:
 In function 'dc_stream_remove_writeback':
/home/fkuehlin/compute/kernel/drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55:
 warning: array subscript 1 is above array bounds of 'struct 
dc_writeback_info[1]' [-Warray-bounds]
  527 | stream->writeback_info[j] = stream->writeback_info[i];
  | ~~^~~

Regards,
  Felix


On 2022-09-27 at 16:35, Pillai, Aurabindo wrote:


[AMD Official Use Only - General]


[AMD Official Use Only - General]


Reviewed-by: Aurabindo Pillai 

--

Regards,
Jay

*From:* Mahfooz, Hamza 
*Sent:* Tuesday, September 27, 2022 3:12 PM
*To:* linux-ker...@vger.kernel.org 
*Cc:* Mahfooz, Hamza ; Wentland, Harry 
; Li, Sun peng (Leo) ; 
Siqueira, Rodrigo ; Deucher, Alexander 
; Koenig, Christian 
; Pan, Xinhui ; David 
Airlie ; Daniel Vetter ; Lee, Alvin 
; Hung, Alex ; Kotarac, Pavle 
; Wang, Chao-kai (Stylon) 
; Pillai, Aurabindo ; 
Ma, Leo ; Wu, Hersen ; Po-Yu 
Hsieh Paul ; Jimmy Kizito ; 
amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org 
*Subject:* [PATCH] drm/amd/display: fix array-bounds error in 
dc_stream_remove_writeback()

Address the following error:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: In function 
‘dc_stream_remove_writeback’:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:527:55: 
error: array subscript [0, 0] is outside array bounds of ‘struct 
dc_writeback_info[1]’ [-Werror=array-bounds]

  527 | stream->writeback_info[j] = stream->writeback_info[i];
  | ~~^~~
In file included from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc.h:1269,
 from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/inc/core_types.h:29,
 from 
./drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dc_common.h:29,
 from 
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:27:
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_stream.h:241:34: note: 
while referencing ‘writeback_info’

  241 | struct dc_writeback_info writeback_info[MAX_DWB_PIPES];
  |

Currently, we aren't checking to see if j remains within
writeback_info[]'s bounds. So, add a check to make sure that we aren't
overflowing the buffer.

Signed-off-by: Hamza Mahfooz 
---
 drivers/gpu/drm/amd/display/dc/core/dc_stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c

index 3ca1592ce7ac..ae13887756bf 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c
@@ -520,7 +520,7 @@ bool dc_stream_remove_writeback(struct dc *dc,
 }

 /* remove writeback info for disabled writeback pipes from 
stream */

-   for (i = 0, j = 0; i < stream->num_wb_info; i++) {
+   for (i = 0, j = 0; i < stream->num_wb_info && j < 
MAX_DWB_PIPES; i++) {

 if (stream->writeback_info[i].wb_enabled) {
 if (i != j)
 /* trim the array */
--
2.37.2



Re: [RFC PATCH v2 0/9] Enable 3D LUT to AMD display drivers

2022-09-29 Thread Harry Wentland



On 2022-09-06 12:46, Melissa Wen wrote:
> Hi,
> 
> From all feedback at [3DLUT_RFC] and an extensive AMD driver
> examination, here I am back with a first attempt to wire up a user 3D
> LUT (post-blending) to DC interface via DM CRTC color mgmt :)
> 
> I'm following some specific approaches to handle user shaper LUT and
> user 3D LUT that I would like to validate if the path taken is correct.
> 
> I used a modified version [igt_tests] of Ville's IGT 3D LUT test to
> verify that the shaper and 3D LUT programming is happening. However, I
> still have doubts about hw behavior and DC MPC's current implementation
> for 3D LUT.
> 
> Despite some initial patches for code cleanup and DRM interface, my
> focus here is the inclusion of a user 3D LUT in the Display Manager,
> which is done in the last five patches of this series:
> 
> - drm/amd/display: enable DRM shaper and 3D LUT properties
> - drm/amd/display: update lut3d and shaper lut to stream
> - drm/amd/display: add user shaper LUT support to amdgpu_dm color pipeline
> - drm/amd/display: add user 3D LUT support to the amdgpu_dm color pipeline
> - drm/amd/display: encapsulate atomic regamma operation
> 
> Things to take into account:
> 
> - 3D LUT (and shaper LUT) is only available in the atomic pipeline (I
>   didn't work on any implicit conversions that are done in the legacy
>   path)
> 
> - Therefore, I'm not doing any implicit conversions for shaper LUT
>   considering the input space, which means: it's set or not. When there
>   is no shaper LUT, it's set to BYPASS, but unfortunately, it seems that
>   the BYPASS mode for shaper LUT is not supported in the current DC
>   dcn30_set_mpc_shaper_3dlut(), since it returns false when
>   mpc3_program_shaper returns false (no params). Is the combination of a
>   user 3D LUT with a bypassed shaper LUT accepted by the hw?
> 

AFAIK this should be supported. The current DC code is centered on
use-cases for other OSes but I'm not sure this is the key issue here,
though it might explain why things won't always look as expected.

mpc3_program_shaper will program the shaper LUT to bypass when params
are NULL. It will return false so dcn30_set_output_transfer_func knows
no shaper LUT is set. The reason behind this is that shaper LUT and output
gamma tend to work in tandem.
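
Simplified, the flow described above is (a sketch, not the exact dcn30 code):

	/* params == NULL: the shaper is programmed to BYPASS and false is
	 * returned, so the caller knows no shaper LUT is in use and can
	 * decide how to program the output gamma. */
	bool shaper_set = mpc->funcs->program_shaper(mpc, params, mpcc_id);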

If you have linear content incoming and want linear content on the output
side of the 3DLUT you'll need to program both shaper LUT and output gamma
LUTs with EOTF^-1 and EOTF respectively.
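
Schematically, that sandwich is (with EOTF^-1 and EOTF as the two 1D LUTs):

	output = OGAM( 3DLUT( SHAPER( input ) ) )
	       = EOTF( 3DLUT( EOTF^-1( linear input ) ) )

so the 3DLUT itself operates on non-linear values while the pipeline stays
linear end to end.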

(Btw, please watch my HDR presentation at XDC or look through the slides,
as I'll show why the current DRM LUTs can't represent EOTF^-1 functions.)

In your case I don't expect you're operating on linear content. Your content
will already be in sRGB, so shaper LUT and output gamma should really be
both in bypass mode. You might need to debug dcn30_set_output_transfer_func
to make sure it's doing what you're expecting (and add new logic if needed).

> - I also see in dcn30_set_mpc_shaper_3dlut() that some bits need to be
>   set in lut3d_func to have the 3D LUT programmed on the MPC block. In
>   this sense, I used the dc_acquire_release_mpc_3dlut() function to get
>   the lut3d_func from the resource pool, but I'm not sure if the timing to
>   acquire and release the lut3d_func from the resource pool is correct
>   (and if I can really use it directly or I should make a copy).
> 

There are a limited number of MPC 3DLUT blocks (shaper + 3DLUT) available.
Acquiring and releasing them in DM is the right way to go about this.

Unfortunately dc_acquire_release_mpc_3dlut can fail. For this reason we
should run it in atomic_check. Unfortunately this function operates on
the current state, which makes it unsuitable for atomic_check.

We need a new function that does what dc_acquire_release_mpc_3dlut does
but takes a dm_state->context as parameter and operates on that instead.
Once we have that you can call that in dm_update_crtc_state to acquire
or release the 3DLUT on a CRTC as needed.
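
Something along these lines, as a hypothetical signature only:

	/* like dc_acquire_release_mpc_3dlut(), but operating on the new
	 * state being checked rather than dc->current_state */
	bool dc_state_acquire_release_mpc_3dlut(struct dc *dc,
						struct dc_state *context,
						struct dc_stream_state *stream,
						bool acquire,
						struct dc_3dlut **lut,
						struct dc_transfer_func **shaper);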

> - Still, on this topic, I use for lut3d the same bit.out_tf to update
>   the stream in the commit_tail because it triggers
>   .set_output_transfer_func that is in charge of setting both OGAM and 3D
>   LUT on MPC. There is a chance I got it wrong here, so I appreciate any
>   input on this topic.
> 

This looks correct to me.

> - Finally, in set_output_transfer_func, AFAIU, even if a user OGAM is
>   set, it won't be programmed if the shaper LUT and 3D LUT programming
>   are successful. However, if shaper/3DLUT programming fails, OGAM can be
>   considered. Should DM only accept DRM regamma if no DRM 3D LUT is
>   passed, or allowing the programming of both is still desirable?
> 

I don't know why we skip the OGAM programming when shaper + 3DLUT are
set. This doesn't make sense to me. As described above, the thing with
the shaper LUT is that you might need to sandwich the 3DLUT with two
1DLUTs, one being the shaper, the other the OGAM. Obviously it depends on
your intended tra

[pull] amdgpu drm-fixes-6.0

2022-09-29 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 6.0.

The following changes since commit f76349cf41451c5c42a99f18a9163377e4b364ff:

  Linux 6.0-rc7 (2022-09-25 14:01:02 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-6.0-2022-09-29

for you to fetch changes up to 83ca5fb40e758e0a0257bf4e3a1148dd52c6d0f2:

  drm/amd/display: Prevent OTG shutdown during PSR SU (2022-09-29 10:07:42 
-0400)


amd-drm-fixes-6.0-2022-09-29:

amdgpu:
- GC 11.x fixes
- SMU 13.x fixes
- DCN 3.1.4 fixes
- DCN 3.2.x fixes
- GC 9.x fix
- Fence fix
- SR-IOV supend/resume fix
- PSR regression fix


Alvin Lee (1):
  drm/amd/display: Update DCN32 to use new SR latencies

Aric Cyr (1):
  drm/amd/display: Fix audio on display after unplugging another

Bokun Zhang (1):
  drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV

Eric Bernstein (1):
  drm/amd/display: Remove assert for odm transition case

Evan Quan (3):
  drm/amdgpu: avoid gfx register accessing during gfxoff
  drm/amd/pm: enable gfxoff feature for SMU 13.0.0
  drm/amd/pm: use adverse selection for dpm features unsupported by driver

Graham Sider (3):
  drm/amdkfd: fix MQD init for GFX11 in init_mqd
  drm/amdgpu: pass queue size and is_aql_queue to MES
  drm/amdkfd: fix dropped interrupt in kfd_int_process_v11

Jiadong.Zhu (2):
  drm/amdgpu: Correct the position in patch_cond_exec
  drm/amdgpu: Remove fence_process in count_emitted

Leo Li (1):
  drm/amd/display: Prevent OTG shutdown during PSR SU

Nicholas Kazlauskas (3):
  drm/amd/display: Do DIO FIFO enable after DP video stream enable
  drm/amd/display: Wrap OTG disable workaround with FIFO control
  drm/amd/display: Add explicit FIFO disable for DP blank

Samson Tam (1):
  drm/amd/display: fill in clock values when DPM is not enabled

Taimur Hassan (3):
  drm/amd/display: Avoid avoid unnecessary pixel rate divider programming
  drm/amd/display: Fix typo in get_pixel_rate_div
  drm/amd/display: Avoid unnecessary pixel rate divider programming

 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h|  2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c |  4 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c |  4 +
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  2 +
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c   |  6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c   |  4 +
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c  |  8 +-
 .../amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c | 11 ++-
 .../amd/display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c   | 14 
 .../amd/display/dc/dce110/dce110_hw_sequencer.c|  6 +-
 .../gpu/drm/amd/display/dc/dcn314/dcn314_dccg.c| 47 
 .../display/dc/dcn314/dcn314_dio_stream_encoder.c  | 25 --
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dccg.c  | 53 +
 .../gpu/drm/amd/display/dc/dcn32/dcn32_hubbub.c| 10 ++-
 .../gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c   | 43 ++-
 .../gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.h   |  2 +
 .../drm/amd/display/dc/inc/hw/clk_mgr_internal.h   |  2 +
 drivers/gpu/drm/amd/include/mes_v11_api_def.h  |  3 +-
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c   | 89 +++---
 23 files changed, 283 insertions(+), 86 deletions(-)


Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Sharma, Shashank




On 9/29/2022 4:14 PM, Lazar, Lijo wrote:



On 9/29/2022 7:30 PM, Sharma, Shashank wrote:



On 9/29/2022 3:37 PM, Lazar, Lijo wrote:

To clarify your understanding -

Nothing is automatic in PMFW. PMFW picks a priority based on the 
actual mask sent by the driver.


Assuming lower bits correspond to the highest priority -

If the driver sends a mask with Bit3 and Bit0 set, PMFW will choose the 
profile that corresponds to Bit0. If the driver sends a mask with Bit4 and 
Bit2 set and the rest unset, PMFW will choose the profile that corresponds 
to Bit2. However, if the driver sends a mask with only a single bit set, it 
chooses that profile regardless of whatever the previous profile was. 
It doesn't check whether the existing profile > the newly requested one. 
That is the behavior.


So if a job chooses a profile that corresponds to Bit0, the driver 
will send that. Next time, if another job chooses a profile that 
corresponds to Bit1, PMFW will receive that as the new profile and 
switch to it. It trusts the driver to send the proper workload mask.


Hope that gives the picture.




Thanks, my understanding is also similar, referring to the core power 
switch profile function here: 
amd_powerplay.c::pp_dpm_switch_power_profile()

*snip code*
hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
 index = fls(hwmgr->workload_mask);
 index = index <= Workload_Policy_Max ? index - 1 : 0;
 workload = hwmgr->workload_setting[index];
*snip_code*
hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);

Here I can see that the new workload mask is ORed into the 
existing workload mask (not overwritten). So if we keep sending new 
workload modes, they accumulate in the workload flags, and 
finally the PM will pick the most aggressive one of all these flags, 
as per its policy.




Actually it's misleading -

The path for sienna is -
set_power_profile_mode -> sienna_cichlid_set_power_profile_mode


This code here is picking one based on a lookup table.

  workload_type = smu_cmn_to_asic_specific_index(smu,

CMN2ASIC_MAPPING_WORKLOAD,

smu->power_profile_mode);

This is that lookup table -

static struct cmn2asic_mapping 
sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = {
     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT, 
WORKLOAD_PPLIB_DEFAULT_BIT),
     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D, 
WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT),
     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING, 
WORKLOAD_PPLIB_POWER_SAVING_BIT),
     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO, 
WORKLOAD_PPLIB_VIDEO_BIT),

     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR, WORKLOAD_PPLIB_VR_BIT),
     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE, 
WORKLOAD_PPLIB_COMPUTE_BIT),
     WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM, 
WORKLOAD_PPLIB_CUSTOM_BIT),

};


And this is the place of interaction with PMFW. (1 << workload_type) is 
the mask being sent.


    smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
     1 << workload_type, NULL);

In the end, driver implementation expects only one bit to be set.



Well, this seems like a bug in the core functions: the powerplay layer 
is doing the right thing by accumulating the workload flags, keeping in 
mind that a profile change can be requested while another profile is 
active, but the core functions are actually ignoring those flags.


This brings us to the actual PM FW expectations. If it expects 
only one flag to be set in the power-mode change message, we don't need 
to bother about this anymore. But if it can handle more than one flag 
and the core driver implementation is blocking that, we will have to fix 
it as well.
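
If it turns out PMFW really can arbitrate on a multi-bit mask, the send 
side could in principle pass the accumulated mask instead of a single bit. 
A sketch only, assuming a driver-side accumulated mask, and unverified 
against actual PMFW behavior:

	/* hypothetical: send all currently-requested workload bits at once */
	smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
					accumulated_workload_mask, NULL);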


@Alex: How can we get more information on this ?

- Shashank


Thanks,
Lijo


Now, when we have a single workload:
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job goes to reset function, ref_count = 0,
-> Power profile resets

Now, we have conflicts only when we see multiple workloads (Job1 and 
Job 2)

-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job2: requests profile P2 via UAPI, refcount = 2
-> driver sends flags for (P1|P2)
-> PM FW picks the more aggressive of the two (Say P1, stays in P1)
-> Job1 goes to reset function, ref_count = 1, job1 does not reset 
power profile
-> Job2 goes to reset function, ref_count = 0, job 2 resets Power 
profile

-> Power profile resets to None

So with this state machine, if there is only 1 job, it will be 
executed in the desired mode. But if there are multiple, the most 
aggressive profile will be picked, and every job will be executed in 
at least the requested power profile mode or higher.


Do you find any problem so far ?

- Shashank



Thanks,
Lijo


Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Lazar, Lijo




On 9/29/2022 7:30 PM, Sharma, Shashank wrote:



On 9/29/2022 3:37 PM, Lazar, Lijo wrote:

To clarify your understanding -

Nothing is automatic in PMFW. PMFW picks a priority based on the 
actual mask sent by the driver.


Assuming lower bits correspond to the highest priority -

If the driver sends a mask with Bit3 and Bit0 set, PMFW will choose the 
profile that corresponds to Bit0. If the driver sends a mask with Bit4 and 
Bit2 set and the rest unset, PMFW will choose the profile that corresponds 
to Bit2. However, if the driver sends a mask with only a single bit set, it 
chooses that profile regardless of whatever the previous profile was. 
It doesn't check whether the existing profile > the newly requested one. 
That is the behavior.


So if a job chooses a profile that corresponds to Bit0, the driver 
will send that. Next time, if another job chooses a profile that 
corresponds to Bit1, PMFW will receive that as the new profile and 
switch to it. It trusts the driver to send the proper workload mask.


Hope that gives the picture.




Thanks, my understanding is also similar, referring to the core power 
switch profile function here: 
amd_powerplay.c::pp_dpm_switch_power_profile()

*snip code*
hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
     index = fls(hwmgr->workload_mask);
     index = index <= Workload_Policy_Max ? index - 1 : 0;
     workload = hwmgr->workload_setting[index];
*snip_code*
hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);

Here I can see that the new workload mask is ORed into the existing 
workload mask (not overwritten). So if we keep sending new 
workload modes, they accumulate in the workload flags, and 
finally the PM will pick the most aggressive one of all these flags, as 
per its policy.




Actually it's misleading -

The path for sienna is -
set_power_profile_mode -> sienna_cichlid_set_power_profile_mode


This code here is picking one based on a lookup table.

 workload_type = smu_cmn_to_asic_specific_index(smu,

CMN2ASIC_MAPPING_WORKLOAD,

smu->power_profile_mode);

This is that lookup table -

static struct cmn2asic_mapping 
sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = {
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT, 
WORKLOAD_PPLIB_DEFAULT_BIT),
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D, 
WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT),
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING, 
WORKLOAD_PPLIB_POWER_SAVING_BIT),
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO, 
WORKLOAD_PPLIB_VIDEO_BIT),
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR, 
WORKLOAD_PPLIB_VR_BIT),
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE, 
WORKLOAD_PPLIB_COMPUTE_BIT),
WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM, 
WORKLOAD_PPLIB_CUSTOM_BIT),

};


And this is the place of interaction with PMFW. (1 << workload_type) is 
the mask being sent.


   smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
1 << workload_type, NULL);

In the end, driver implementation expects only one bit to be set.

Thanks,
Lijo


Now, when we have a single workload:
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job goes to reset function, ref_count = 0,
-> Power profile resets

Now, we have conflicts only when we see multiple workloads (Job1 and Job 2)
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job2: requests profile P2 via UAPI, refcount = 2
-> driver sends flags for (P1|P2)
-> PM FW picks the more aggressive of the two (Say P1, stays in P1)
-> Job1 goes to reset function, ref_count = 1, job1 does not reset power 
profile

-> Job2 goes to reset function, ref_count = 0, job 2 resets Power profile
-> Power profile resets to None

So with this state machine, if there is only 1 job, it will be 
executed in the desired mode. But if there are multiple, the most aggressive 
profile will be picked, and every job will be executed in at least the 
requested power profile mode or higher.


Do you find any problem so far ?

- Shashank



Thanks,
Lijo


Re: [PATCH v2 0/6] Add support for atomic async page-flips

2022-09-29 Thread Alex Deucher
On Tue, Aug 30, 2022 at 1:29 PM Simon Ser  wrote:
>
> This series adds support for DRM_MODE_PAGE_FLIP_ASYNC for atomic
> commits, aka. "immediate flip" (which might result in tearing).
> The feature was only available via the legacy uAPI, however for
> gaming use-cases it may be desirable to enable it via the atomic
> uAPI too.
>
> - v1: https://patchwork.freedesktop.org/series/107683/
> - User-space patch: https://github.com/Plagman/gamescope/pull/595
> - IGT patch: https://patchwork.freedesktop.org/series/107681/
>
> Main changes in v2: add docs, fail atomic commit if async flip isn't
> possible.
>
> Tested on an AMD Picasso iGPU.

Series is:
Reviewed-by: Alex Deucher 
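
For context, userspace discovery and use of the new capability would look
roughly like this (a sketch; the cap and flag names are the ones added by
this series, and fd and req are assumed to be set up already):

	#include <xf86drm.h>
	#include <xf86drmMode.h>

	uint64_t cap = 0;

	if (drmGetCap(fd, DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP, &cap) == 0 && cap) {
		/* tearing page-flip is allowed in an atomic commit */
		drmModeAtomicCommit(fd, req,
				    DRM_MODE_ATOMIC_NONBLOCK |
				    DRM_MODE_PAGE_FLIP_ASYNC |
				    DRM_MODE_PAGE_FLIP_EVENT, NULL);
	}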

>
> Simon Ser (6):
>   amd/display: only accept async flips for fast updates
>   drm: document DRM_MODE_PAGE_FLIP_ASYNC
>   drm: introduce drm_mode_config.atomic_async_page_flip_not_supported
>   drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits
>   drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
>   amd/display: indicate support for atomic async page-flips on DC
>
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  8 ++
>  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c| 10 +++
>  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c  |  1 +
>  drivers/gpu/drm/drm_atomic_uapi.c | 28 +--
>  drivers/gpu/drm/drm_ioctl.c   |  5 
>  drivers/gpu/drm/i915/display/intel_display.c  |  1 +
>  drivers/gpu/drm/nouveau/nouveau_display.c |  1 +
>  drivers/gpu/drm/vc4/vc4_kms.c |  1 +
>  include/drm/drm_mode_config.h | 11 
>  include/uapi/drm/drm.h| 10 ++-
>  include/uapi/drm/drm_mode.h   | 11 
>  11 files changed, 83 insertions(+), 4 deletions(-)
>
> --
> 2.37.2
>
>


[PATCH -next] drm/amdgpu/sdma: add missing release_firmware() in amdgpu_sdma_init_microcode()

2022-09-29 Thread Yang Yingliang
In some error paths in amdgpu_sdma_init_microcode(), release_firmware() is
not called, so the memory allocated in request_firmware() is leaked. Call
amdgpu_sdma_destroy_inst_ctx(), which calls release_firmware(), to avoid
the memory leak.

Fixes: 60704ab9 ("drm/amdgpu: add function to init SDMA microcode")
Signed-off-by: Yang Yingliang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
index 3949b7e3907f..43cf8632cc1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
@@ -222,8 +222,10 @@ int amdgpu_sdma_init_microcode(struct amdgpu_device *adev,
adev->sdma.instance[instance].fw->data;
version_major = le16_to_cpu(header->header_version_major);
 
-   if ((duplicate && instance) || (!duplicate && version_major > 1))
-   return -EINVAL;
+   if ((duplicate && instance) || (!duplicate && version_major > 1)) {
+   err = -EINVAL;
+   goto out;
+   }
 
err = amdgpu_sdma_init_inst_ctx(&adev->sdma.instance[instance]);
if (err)
@@ -272,7 +274,7 @@ int amdgpu_sdma_init_microcode(struct amdgpu_device *adev,

ALIGN(le32_to_cpu(sdma_hdr->ctl_ucode_size_bytes), PAGE_SIZE);
break;
default:
-   return -EINVAL;
+   err = -EINVAL;
}
}
 
-- 
2.25.1



Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Sharma, Shashank




On 9/29/2022 3:37 PM, Lazar, Lijo wrote:

To clarify your understanding -

Nothing is automatic in PMFW. PMFW picks a priority based on the actual 
mask sent by the driver.


Assuming lower bits correspond to the highest priority -

If the driver sends a mask with Bit3 and Bit0 set, PMFW will choose the 
profile that corresponds to Bit0. If the driver sends a mask with Bit4 and 
Bit2 set and the rest unset, PMFW will choose the profile that corresponds 
to Bit2. However, if the driver sends a mask with only a single bit set, it 
chooses that profile regardless of whatever the previous profile was. 
It doesn't check whether the existing profile > the newly requested one. 
That is the behavior.


So if a job chooses a profile that corresponds to Bit0, the driver will 
send that. Next time, if another job chooses a profile that corresponds 
to Bit1, PMFW will receive that as the new profile and switch to it. 
It trusts the driver to send the proper workload mask.


Hope that gives the picture.




Thanks, my understanding is also similar, referring to the core power 
switch profile function here: amd_powerplay.c::pp_dpm_switch_power_profile()

*snip code*
hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
index = fls(hwmgr->workload_mask);
index = index <= Workload_Policy_Max ? index - 1 : 0;
workload = hwmgr->workload_setting[index];
*snip_code*
hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);

Here I can see that the new workload mask is appended to the existing
workload mask (not overwritten). So if we keep sending new workload
modes, they are appended to the workload flags, and finally the PM will
pick the most aggressive of all these flags, as per its policy.
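
For illustration, a tiny self-contained model of that append-then-pick-highest
pattern (the names and the fls helper below are invented for the example; only
the selection logic mirrors the snippet above):

#include <stdio.h>

#define WORKLOAD_POLICY_MAX 7

/* Highest set bit, 1-based; 0 for an empty mask (a stand-in for the
 * kernel's fls()). */
static int fls_u32(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

static unsigned int workload_mask;

/* Append the requested priority bit and return the index of the most
 * aggressive profile currently requested. */
static unsigned int pick_workload(unsigned int prio)
{
	int index;

	workload_mask |= 1u << prio;	/* appended, not overwritten */
	index = fls_u32(workload_mask);
	return index <= WORKLOAD_POLICY_MAX ? index - 1 : 0;
}

int main(void)
{
	printf("%u\n", pick_workload(2));	/* 2 */
	printf("%u\n", pick_workload(5));	/* 5: more aggressive wins */
	printf("%u\n", pick_workload(1));	/* still 5 */
	return 0;
}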


Now, when we have a single workload:
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job goes to reset function, ref_count = 0,
-> Power profile resets

Now, we have conflicts only when we see multiple workloads (Job1 and Job 2)
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for p1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job2: requests profile P2 via UAPI, refcount = 2
-> driver sends flags for (P1|P2)
-> PM FW picks the more aggressive of the two (Say P1, stays in P1)
-> Job1 goes to reset function, ref_count = 1, job1 does not reset power 
profile

-> Job2 goes to reset function, ref_count = 0, job 2 resets power profile
-> Power profile resets to None

So in this state machine, if there is only one job, it will be executed in
the desired mode. But if there are multiple, the most aggressive profile
will be picked, and every job will be executed in at least the requested
power profile mode or higher.
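
A minimal sketch of that single-refcount policy (all names invented for
illustration; this is not the driver's real interface):

#include <stdio.h>

static int workload_refcount;
static unsigned int workload_mask;	/* union of all active requests */

static void workload_begin(unsigned int profile_bit)
{
	workload_refcount++;
	workload_mask |= profile_bit;	/* PMFW picks the most aggressive bit */
	printf("send mask 0x%x\n", workload_mask);
}

static void workload_end(unsigned int profile_bit)
{
	(void)profile_bit;	/* single counter: the bit is not cleared */
	if (--workload_refcount == 0) {
		workload_mask = 0;	/* last job out resets to NONE */
		printf("send mask 0x0\n");
	}
}

int main(void)
{
	workload_begin(1u << 5);	/* Job1: compute */
	workload_begin(1u << 3);	/* Job2: media; mask has both bits */
	workload_end(1u << 5);		/* Job1 done: refcount 1, no reset */
	workload_end(1u << 3);		/* Job2 done: refcount 0, reset */
	return 0;
}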


Do you find any problem so far ?

- Shashank



Thanks,
Lijo


Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Lazar, Lijo




On 9/29/2022 6:50 PM, Sharma, Shashank wrote:



On 9/29/2022 1:10 PM, Lazar, Lijo wrote:



On 9/29/2022 2:18 PM, Sharma, Shashank wrote:



On 9/28/2022 11:51 PM, Alex Deucher wrote:

On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:




On 9/27/2022 10:40 PM, Alex Deucher wrote:

On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
 wrote:




On 9/27/2022 5:23 PM, Felix Kuehling wrote:

Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:

Hello Felix,

Thanks for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:

Am 2022-09-27 um 02:12 schrieb Christian König:

Am 26.09.22 um 23:40 schrieb Shashank Sharma:

This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 


Feel free to add my acked-by, but Felix should probably take 
a look

as well.


This looks OK purely from a compute perspective. But I'm concerned
about the interaction of compute with graphics or multiple 
graphics
contexts submitting work concurrently. They would constantly 
override

or disable each other's workload hints.

For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be 
different
processes belonging to different users. Say, KFD enables the 
compute
profile first. Then the graphics context submits a job. At the 
start
of the job, the compute profile is enabled. That's a no-op 
because
KFD already enabled the compute profile. When the job 
finishes, it

disables the compute profile for everyone, including KFD. That's
unexpected.



In this case, it will not disable the compute profile, as the
reference counter will not be zero. The reset_profile() will 
only act

if the reference counter is 0.


OK, I missed the reference counter.




But I would be happy to get any inputs about a policy which can be
more sustainable and gets better outputs, for example:
- should we not allow a profile change, if a PP mode is already
applied and keep it Early bird basis ?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as 
job A is

not finished yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also 
runs

in PP 3D
- Job B finishes, but does not reset PP as reference count is 
not zero

due to compute
- Job  A finishes, profile reset to NONE


I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute 
profile

at the same time. I think you'll need one refcount per profile.

Regards,
 Felix


Thanks, This is exactly what I was looking for, I think Alex's 
initial
idea was around it, but I was under the assumption that there is 
only
one HW profile in SMU which keeps on getting overwritten. This 
can solve
our problems, as I can create an array of reference counters, and 
will

disable only the profile whose reference counter goes 0.


It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you 
can

specify multiple workload types at the same time and the SMU will
arbitrate between them internally.  E.g., the most aggressive one 
will

be selected out of the ones specified.  I think in the driver we just
set one bit at a time using the current interface.  It might be 
better

to change the interface and just ref count the hint types and then
when we call the set function look at the ref counts for each hint
type and set the mask as appropriate.

Alex



Hey Alex,
Thanks for your comment, if that is the case, this current patch 
series
works straight forward, and no changes would be required. Please 
let me

know if my understanding is correct:

Assumption: Order of aggression: 3D > Media > Compute

- Job 1: Requests mode compute: PP changed to compute, ref count 1
- Job 2: Requests mode media: PP changed to media, ref count 2
- Job 3: requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref 
> 0,

PP still 3D
- Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref 
> 0,

PP still 3D
- Job 2 finishes, downs ref count to 0, PP changed to NONE,

In this way, every job will be operating in the Power profile of 
desired
aggression or higher, and this API guarantees the execution at least in
the desired power profile.


I'm not entirely sure on the relative levels of aggression, but I
believe the SMU priorities them by index.  E.g.
#define WORKLOAD_PPLIB_DEFAULT_BIT    0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT  3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT    5
#define WORKLOAD_PPLIB_CUSTOM_BIT 6

3D < video < VR < compute < custom

Re: [PATCH 1/4] drm/amdgpu: use proper DC check in amdgpu_display_supported_domains()

2022-09-29 Thread Christian König

Feel free to add my acked-by, but that's really not my field of expertise.

Christian.

Am 29.09.22 um 15:15 schrieb Alex Deucher:

Ping on this series?

Alex

On Thu, Sep 8, 2022 at 10:39 AM Alex Deucher  wrote:

amdgpu_device_asic_has_dc_support() just checks the asic itself.
amdgpu_device_has_dc_support() is a runtime check which not
only checks the asic, but also other things in the driver
like whether virtual display is enabled.  We want the latter
here.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index c20922a5af9f..b0fa5d065d50 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -514,7 +514,7 @@ uint32_t amdgpu_display_supported_domains(struct 
amdgpu_device *adev,
  */
 if ((bo_flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC) &&
 amdgpu_bo_support_uswc(bo_flags) &&
-   amdgpu_device_asic_has_dc_support(adev->asic_type) &&
+   amdgpu_device_has_dc_support(adev) &&
 adev->mode_info.gpu_vm_support)
 domain |= AMDGPU_GEM_DOMAIN_GTT;
  #endif
--
2.37.2





Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Sharma, Shashank




On 9/29/2022 1:10 PM, Lazar, Lijo wrote:



On 9/29/2022 2:18 PM, Sharma, Shashank wrote:



On 9/28/2022 11:51 PM, Alex Deucher wrote:

On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:




On 9/27/2022 10:40 PM, Alex Deucher wrote:

On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
 wrote:




On 9/27/2022 5:23 PM, Felix Kuehling wrote:

Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:

Hello Felix,

Thanks for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:

Am 2022-09-27 um 02:12 schrieb Christian König:

Am 26.09.22 um 23:40 schrieb Shashank Sharma:

This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 


Feel free to add my acked-by, but Felix should probably take a 
look

as well.


This looks OK purely from a compute perspective. But I'm concerned
about the interaction of compute with graphics or multiple 
graphics
contexts submitting work concurrently. They would constantly 
override

or disable each other's workload hints.

For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be 
different
processes belonging to different users. Say, KFD enables the 
compute
profile first. Then the graphics context submits a job. At the 
start

of the job, the compute profile is enabled. That's a no-op because
KFD already enabled the compute profile. When the job finishes, it
disables the compute profile for everyone, including KFD. That's
unexpected.



In this case, it will not disable the compute profile, as the
reference counter will not be zero. The reset_profile() will 
only act

if the reference counter is 0.


OK, I missed the reference counter.




But I would be happy to get any inputs about a policy which can be
more sustainable and gets better outputs, for example:
- should we not allow a profile change, if a PP mode is already
applied and keep it Early bird basis ?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as 
job A is

not finished yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also 
runs

in PP 3D
- Job B finishes, but does not reset PP as reference count is 
not zero

due to compute
- Job  A finishes, profile reset to NONE


I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute 
profile

at the same time. I think you'll need one refcount per profile.

Regards,
 Felix


Thanks, This is exactly what I was looking for, I think Alex's 
initial

idea was around it, but I was under the assumption that there is only
one HW profile in SMU which keeps on getting overwritten. This can 
solve
our problems, as I can create an array of reference counters, and 
will

disable only the profile whose reference counter goes 0.


It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you can
specify multiple workload types at the same time and the SMU will
arbitrate between them internally.  E.g., the most aggressive one will
be selected out of the ones specified.  I think in the driver we just
set one bit at a time using the current interface.  It might be better
to change the interface and just ref count the hint types and then
when we call the set function look at the ref counts for each hint
type and set the mask as appropriate.

Alex



Hey Alex,
Thanks for your comment, if that is the case, this current patch series
works straight forward, and no changes would be required. Please let me
know if my understanding is correct:

Assumption: Order of aggression: 3D > Media > Compute

- Job 1: Requests mode compute: PP changed to compute, ref count 1
- Job 2: Requests mode media: PP changed to media, ref count 2
- Job 3: requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref 
> 0,

PP still 3D
- Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref 
> 0,

PP still 3D
- Job 2 finishes, downs ref count to 0, PP changed to NONE,

In this way, every job will be operating in the Power profile of 
desired

aggression or higher, and this API guarantees the execution at least in
the desired power profile.


I'm not entirely sure on the relative levels of aggression, but I
believe the SMU priorities them by index.  E.g.
#define WORKLOAD_PPLIB_DEFAULT_BIT    0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT  3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT    5
#define WORKLOAD_PPLIB_CUSTOM_BIT 6

3D < video < VR < compute < custom

VR and compute are the most aggressive.  Custom takes preference
because it's user customizable.

Re: [PATCH 1/4] drm/amdgpu: use proper DC check in amdgpu_display_supported_domains()

2022-09-29 Thread Alex Deucher
Ping on this series?

Alex

On Thu, Sep 8, 2022 at 10:39 AM Alex Deucher  wrote:
>
> amdgpu_device_asic_has_dc_support() just checks the asic itself.
> amdgpu_device_has_dc_support() is a runtime check which not
> only checks the asic, but also other things in the driver
> like whether virtual display is enabled.  We want the latter
> here.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index c20922a5af9f..b0fa5d065d50 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -514,7 +514,7 @@ uint32_t amdgpu_display_supported_domains(struct 
> amdgpu_device *adev,
>  */
> if ((bo_flags & AMDGPU_GEM_CREATE_CPU_GTT_USWC) &&
> amdgpu_bo_support_uswc(bo_flags) &&
> -   amdgpu_device_asic_has_dc_support(adev->asic_type) &&
> +   amdgpu_device_has_dc_support(adev) &&
> adev->mode_info.gpu_vm_support)
> domain |= AMDGPU_GEM_DOMAIN_GTT;
>  #endif
> --
> 2.37.2
>


[PATCH -next] drm/amd/display: change to enc314_stream_encoder_dp_blank static

2022-09-29 Thread Yang Yingliang
enc314_stream_encoder_dp_blank is only used in dcn314_dio_stream_encoder.c now,
change it to static.

Fixes: c9c2ff53cc20 ("drm/amd/display: Add explicit FIFO disable for DP blank")
Signed-off-by: Yang Yingliang 
---
 .../gpu/drm/amd/display/dc/dcn314/dcn314_dio_stream_encoder.c   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_dio_stream_encoder.c 
b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_dio_stream_encoder.c
index 0d2ffb692957..7e773bf7b895 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_dio_stream_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn314/dcn314_dio_stream_encoder.c
@@ -262,7 +262,7 @@ static bool is_two_pixels_per_containter(const struct 
dc_crtc_timing *timing)
return two_pix;
 }
 
-void enc314_stream_encoder_dp_blank(
+static void enc314_stream_encoder_dp_blank(
struct dc_link *link,
struct stream_encoder *enc)
 {
-- 
2.25.1



Re: [PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-29 Thread Michael Ellerman
Alistair Popple  writes:
> Michael Ellerman  writes:
>> Alistair Popple  writes:
>>> When the CPU tries to access a device private page the migrate_to_ram()
>>> callback associated with the pgmap for the page is called. However no
>>> reference is taken on the faulting page. Therefore a concurrent
>>> migration of the device private page can free the page and possibly the
>>> underlying pgmap. This results in a race which can crash the kernel due
>>> to the migrate_to_ram() function pointer becoming invalid. It also means
>>> drivers can't reliably read the zone_device_data field because the page
>>> may have been freed with memunmap_pages().
>>>
>>> Close the race by getting a reference on the page while holding the ptl
>>> to ensure it has not been freed. Unfortunately the elevated reference
>>> count will cause the migration required to handle the fault to fail. To
>>> avoid this failure pass the faulting page into the migrate_vma functions
>>> so that if an elevated reference count is found it can be checked to see
>>> if it's expected or not.
>>>
>>> Signed-off-by: Alistair Popple 
>>> ---
>>>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
>>>  include/linux/migrate.h  |  8 ++-
>>>  lib/test_hmm.c   |  7 ++---
>>>  mm/memory.c  | 16 +++-
>>>  mm/migrate.c | 34 ++---
>>>  mm/migrate_device.c  | 18 +
>>>  9 files changed, 87 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
>>> b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> index 5980063..d4eacf4 100644
>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>>> @@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
...
>>> @@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct 
>>> vm_fault *vmf)
>>>
>>> if (kvmppc_svm_page_out(vmf->vma, vmf->address,
>>> vmf->address + PAGE_SIZE, PAGE_SHIFT,
>>> -   pvt->kvm, pvt->gpa))
>>> +   pvt->kvm, pvt->gpa, vmf->page))
>>> return VM_FAULT_SIGBUS;
>>> else
>>> return 0;
>>
>> I don't have a UV test system, but as-is it doesn't even compile :)
>
> Ugh, thanks. I did get as far as installing a PPC cross-compiler and
> building a kernel. Apparently I did not get as far as enabling
> CONFIG_PPC_UV :)

No worries, that's really on us. If we're going to keep the code in the
tree then it should really be enabled in at least one of our defconfigs.

cheers


Re: [PATCH v2 8/8] hmm-tests: Add test for migrate_device_range()

2022-09-29 Thread Alistair Popple


Andrew Morton  writes:

> On Wed, 28 Sep 2022 22:01:22 +1000 Alistair Popple  wrote:
>
>> @@ -1401,22 +1494,7 @@ static int dmirror_device_init(struct dmirror_device 
>> *mdevice, int id)
>>
>>  static void dmirror_device_remove(struct dmirror_device *mdevice)
>>  {
>> -unsigned int i;
>> -
>> -if (mdevice->devmem_chunks) {
>> -for (i = 0; i < mdevice->devmem_count; i++) {
>> -struct dmirror_chunk *devmem =
>> -mdevice->devmem_chunks[i];
>> -
>> -memunmap_pages(&devmem->pagemap);
>> -if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
>> -release_mem_region(devmem->pagemap.range.start,
>> -   
>> range_len(&devmem->pagemap.range));
>> -kfree(devmem);
>> -}
>> -kfree(mdevice->devmem_chunks);
>> -}
>> -
>> +dmirror_device_remove_chunks(mdevice);
>>  cdev_del(&mdevice->cdevice);
>>  }
>
> Needed a bit or rework due to
> https://lkml.kernel.org/r/20220826050631.25771-1-mpent...@redhat.com.
> Please check my resolution.

Thanks. Rework looks good to me.

> --- a/lib/test_hmm.c~hmm-tests-add-test-for-migrate_device_range
> +++ a/lib/test_hmm.c
> @@ -100,6 +100,7 @@ struct dmirror {
>  struct dmirror_chunk {
>   struct dev_pagemap  pagemap;
>   struct dmirror_device   *mdevice;
> + bool remove;
>  };
>
>  /*
> @@ -192,11 +193,15 @@ static int dmirror_fops_release(struct i
>   return 0;
>  }
>
> +static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page)
> +{
> + return container_of(page->pgmap, struct dmirror_chunk, pagemap);
> +}
> +
>  static struct dmirror_device *dmirror_page_to_device(struct page *page)
>
>  {
> - return container_of(page->pgmap, struct dmirror_chunk,
> - pagemap)->mdevice;
> + return dmirror_page_to_chunk(page)->mdevice;
>  }
>
>  static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
> @@ -1218,6 +1223,85 @@ static int dmirror_snapshot(struct dmirr
>   return ret;
>  }
>
> +static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk)
> +{
> + unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT;
> + unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT;
> + unsigned long npages = end_pfn - start_pfn + 1;
> + unsigned long i;
> + unsigned long *src_pfns;
> + unsigned long *dst_pfns;
> +
> + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
> + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
> +
> + migrate_device_range(src_pfns, start_pfn, npages);
> + for (i = 0; i < npages; i++) {
> + struct page *dpage, *spage;
> +
> + spage = migrate_pfn_to_page(src_pfns[i]);
> + if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE))
> + continue;
> +
> + if (WARN_ON(!is_device_private_page(spage) &&
> + !is_device_coherent_page(spage)))
> + continue;
> + spage = BACKING_PAGE(spage);
> + dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL);
> + lock_page(dpage);
> + copy_highpage(dpage, spage);
> + dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
> + if (src_pfns[i] & MIGRATE_PFN_WRITE)
> + dst_pfns[i] |= MIGRATE_PFN_WRITE;
> + }
> + migrate_device_pages(src_pfns, dst_pfns, npages);
> + migrate_device_finalize(src_pfns, dst_pfns, npages);
> + kfree(src_pfns);
> + kfree(dst_pfns);
> +}
> +
> +/* Removes free pages from the free list so they can't be re-allocated */
> +static void dmirror_remove_free_pages(struct dmirror_chunk *devmem)
> +{
> + struct dmirror_device *mdevice = devmem->mdevice;
> + struct page *page;
> +
> + for (page = mdevice->free_pages; page; page = page->zone_device_data)
> + if (dmirror_page_to_chunk(page) == devmem)
> + mdevice->free_pages = page->zone_device_data;
> +}
> +
> +static void dmirror_device_remove_chunks(struct dmirror_device *mdevice)
> +{
> + unsigned int i;
> +
> + mutex_lock(&mdevice->devmem_lock);
> + if (mdevice->devmem_chunks) {
> + for (i = 0; i < mdevice->devmem_count; i++) {
> + struct dmirror_chunk *devmem =
> + mdevice->devmem_chunks[i];
> +
> + spin_lock(&mdevice->lock);
> + devmem->remove = true;
> + dmirror_remove_free_pages(devmem);
> + spin_unlock(&mdevice->lock);
> +
> + dmirror_device_evict_chunk(devmem);
> + memunmap_pages(&devmem->pagemap);
> + if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
> + release_mem_region(devmem->pagemap.range.s

Information about XDC 2022 - next week!

2022-09-29 Thread Jeremy White

Hi folks,

We are excited to welcome you in person to the 2022 X.Org Developers 
Conference, held in conjunction with WineConf and FOSS XR conference.


The conference will start officially on Tuesday morning, October 4th. 
The program is here:

  https://indico.freedesktop.org/event/2/timetable/#all.detailed
The official events start at 8:30 am, but we will have coffee and 
pastries available from 7:30 on Tuesday and 8 on Wednesday and Thursday.


We expect everyone attending to be vaccinated and to be respectful of 
people that are trying to avoid catching COVID. Masks are mandatory, 
except when presenting or eating.


A small number of us will gather informally at Brit’s Pub, starting at 
around 4:00 pm on Monday, October 3rd.  We’ll try to have a table with 
some sort of a sign, and folks can connect, have a drink, and then 
perhaps group up to explore alternate food.  Note that if the weather is 
nice, we may be up on the roof, so explore far to find us.


We will be on the Minneapolis campus of St. Thomas, which is a mildly 
confusing campus.  We have given instructions and a picture to guide you 
here:


https://indico.freedesktop.org/event/2/page/10-attending-xdc-wineconf-foss-xr
We are working on the remote experience, and expect to have streaming of 
all events available. The above page will have those details just as 
soon as they are finalized.


We have a page of instructions for folks that will be presenting:
  https://indico.freedesktop.org/event/2/page/18-speaker-instructions

We are also excited to announce the happy hour taking place on 
Wednesday, from 6:00 pm until 8:00 pm.  The hope is that all three 
projects can mingle and socialize and enjoy the return of in person 
meetings.


Also, this year we plan to adopt the Wine strategy of using a deliberate 
Matrix chat room just for the conference.  Matrix has a variety of apps, 
and Element, the default one is easy to configure on many devices, 
including mobile phones.  The link to that channel is here:

  https://matrix.to/#/#xdc-wineconf-fossxr-2022:matrix.org
We find the chat channel a good place to learn what restaurants and bars 
are chosen, and just a good way to track the social aspects of the 
conference.


We look forward to seeing you next week!

Cheers,

Jeremy


Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Lazar, Lijo




On 9/29/2022 2:18 PM, Sharma, Shashank wrote:



On 9/28/2022 11:51 PM, Alex Deucher wrote:

On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:




On 9/27/2022 10:40 PM, Alex Deucher wrote:

On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
 wrote:




On 9/27/2022 5:23 PM, Felix Kuehling wrote:

Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:

Hello Felix,

Thanks for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:

Am 2022-09-27 um 02:12 schrieb Christian König:

Am 26.09.22 um 23:40 schrieb Shashank Sharma:

This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 


Feel free to add my acked-by, but Felix should probably take a 
look

as well.


This looks OK purely from a compute perspective. But I'm concerned
about the interaction of compute with graphics or multiple graphics
contexts submitting work concurrently. They would constantly 
override

or disable each other's workload hints.

For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be 
different
processes belonging to different users. Say, KFD enables the 
compute
profile first. Then the graphics context submits a job. At the 
start

of the job, the compute profile is enabled. That's a no-op because
KFD already enabled the compute profile. When the job finishes, it
disables the compute profile for everyone, including KFD. That's
unexpected.



In this case, it will not disable the compute profile, as the
reference counter will not be zero. The reset_profile() will only 
act

if the reference counter is 0.


OK, I missed the reference counter.




But I would be happy to get any inputs about a policy which can be
more sustainable and gets better outputs, for example:
- should we not allow a profile change, if a PP mode is already
applied and keep it Early bird basis ?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as job 
A is

not finished yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also runs
in PP 3D
- Job B finishes, but does not reset PP as reference count is not 
zero

due to compute
- Job  A finishes, profile reset to NONE


I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute 
profile

at the same time. I think you'll need one refcount per profile.

Regards,
 Felix


Thanks, This is exactly what I was looking for, I think Alex's initial
idea was around it, but I was under the assumption that there is only
one HW profile in SMU which keeps on getting overwritten. This can 
solve

our problems, as I can create an array of reference counters, and will
disable only the profile whose reference counter goes 0.


It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you can
specify multiple workload types at the same time and the SMU will
arbitrate between them internally.  E.g., the most aggressive one will
be selected out of the ones specified.  I think in the driver we just
set one bit at a time using the current interface.  It might be better
to change the interface and just ref count the hint types and then
when we call the set function look at the ref counts for each hint
type and set the mask as appropriate.

Alex



Hey Alex,
Thanks for your comment, if that is the case, this current patch series
works straight forward, and no changes would be required. Please let me
know if my understanding is correct:

Assumption: Order of aggression: 3D > Media > Compute

- Job 1: Requests mode compute: PP changed to compute, ref count 1
- Job 2: Requests mode media: PP changed to media, ref count 2
- Job 3: requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref > 0,
PP still 3D
- Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref > 0,
PP still 3D
- Job 2 finishes, downs ref count to 0, PP changed to NONE,

In this way, every job will be operating in the Power profile of desired
aggression or higher, and this API guarantees the execution at least in
the desired power profile.


I'm not entirely sure on the relative levels of aggression, but I
believe the SMU priorities them by index.  E.g.
#define WORKLOAD_PPLIB_DEFAULT_BIT    0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT  3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT    5
#define WORKLOAD_PPLIB_CUSTOM_BIT 6

3D < video < VR < compute < custom

VR and compute are the most aggressive.  Custom takes preference
because it's user customizable.

[PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-09-29 Thread jiadong.zhu
From: "Jiadong.Zhu" 

Trigger Mid-Command Buffer Preemption according to the priority of the software
rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the software
ring, together with the fence sequence number, in its fifo queue, and pops
those records out when the fences are signalled. The locations are used to
resubmit packets in preemption scenarios by copying the chunks from the
software ring.
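
As a rough illustration of that bookkeeping (all names below are invented for
the example, not the driver's API): each frame records its span in the
software ring plus its fence sequence number; signalled entries are popped,
and the surviving spans are what would be re-copied on preemption.

struct mux_chunk {
	unsigned int start;	/* first dword of the frame in the sw ring */
	unsigned int end;	/* one past the last dword */
	unsigned int fence_seq;	/* fence emitted for this frame */
};

#define MAX_CHUNKS 16

static struct mux_chunk fifo[MAX_CHUNKS];
static unsigned int head, tail;		/* simple ring-buffer FIFO */

static void chunk_push(unsigned int start, unsigned int end, unsigned int seq)
{
	fifo[tail % MAX_CHUNKS] = (struct mux_chunk){ start, end, seq };
	tail++;
}

/* Pop every record whose fence has already signalled. */
static void chunk_pop_signalled(unsigned int last_signalled_seq)
{
	while (head != tail &&
	       fifo[head % MAX_CHUNKS].fence_seq <= last_signalled_seq)
		head++;
}

/* The entries left between head and tail are the unsignalled frames that
 * would be copied from the software ring back to the real ring when the
 * preempted queue is resubmitted. */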

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.
v7: Solve conflict for removing amdgpu_sw_ring.c.
v8: Add time threshold to judge if preemption request is needed.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Cc: Michel Dänzer 
Acked-by: Luben Tuikov 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c|  53 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 353 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  29 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
 8 files changed, 422 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 790f7bfdc654..470448bc1ebb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -54,6 +54,7 @@ struct amdgpu_fence {
 
/* RB, DMA, etc. */
struct amdgpu_ring  *ring;
+   ktime_t start_timestamp;
 };
 
 static struct kmem_cache *amdgpu_fence_slab;
@@ -199,6 +200,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct 
dma_fence **f, struct amd
rcu_assign_pointer(*ptr, dma_fence_get(fence));
 
*f = fence;
+   to_amdgpu_fence(fence)->start_timestamp = ktime_get();
 
return 0;
 }
@@ -400,6 +402,57 @@ unsigned amdgpu_fence_count_emitted(struct amdgpu_ring 
*ring)
return lower_32_bits(emitted);
 }
 
+/**
+ * amdgpu_fence_last_unsignaled_time_us - time since the earliest unsignaled fence was emitted
+ * @ring: ring the fence is associated with
+ *
+ * Find the earliest fence that is still unsignaled, and calculate the time
+ * delta between when that fence was emitted and now.
+ */
+u64 amdgpu_fence_last_unsignaled_time_us(struct amdgpu_ring *ring)
+{
+   struct amdgpu_fence_driver *drv = &ring->fence_drv;
+   struct dma_fence *fence;
+   uint32_t last_seq, sync_seq;
+
+   last_seq = atomic_read(&ring->fence_drv.last_seq);
+   sync_seq = READ_ONCE(ring->fence_drv.sync_seq);
+   if (last_seq == sync_seq)
+   return 0;
+
+   ++last_seq;
+   last_seq &= drv->num_fences_mask;
+   fence = drv->fences[last_seq];
+   if (!fence)
+   return 0;
+
+   return ktime_us_delta(ktime_get(),
+   to_amdgpu_fence(fence)->start_timestamp);
+}
+
+/**
+ * amdgpu_fence_update_start_timestamp - update the timestamp of the fence
+ * @ring: ring the fence is associated with
+ * @seq: the fence seq number to update.
+ * @timestamp: the start timestamp to update.
+ *
+ * This function is called when the fence and the related ib are about to be
+ * resubmitted to the gpu in an MCBP scenario. Thus we do not consider a race
+ * condition with amdgpu_fence_process modifying the same fence.
+ */
+void amdgpu_fence_update_start_timestamp(struct amdgpu_ring *ring, uint32_t 
seq, ktime_t timestamp)
+{
+   struct amdgpu_fence_driver *drv = &ring->fence_drv;
+   struct dma_fence *fence;
+
+   seq &= drv->num_fences_mask;
+   fence = drv->fences[seq];
+   if (!fence)
+   return;
+
+   to_amdgpu_fence(fence)->start_timestamp = timestamp;
+}
+
 /**
  * amdgpu_fence_driver_start_ring - make the fence driver
  * ready for use on the requested ring.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}
}
 
+   amdgpu_ring_ib_begin(ring);
if (job && ring->funcs->init_cond_exec)
patch_offset = amdgpu_ring_init_cond_exec(ring);
 
@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
ring->funcs->emit_wave_limit(ring, false);
 
+   amdgpu_ring_ib_end(ring);
amdgpu_ring_commit(ring);
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c

[PATCH 3/4] drm/amdgpu: Modify unmap_queue format for gfx9 (v4)

2022-09-29 Thread jiadong.zhu
From: "Jiadong.Zhu" 

1. Modify the unmap_queue packet on gfx9. Add a trailing fence to track when
   the preemption is done.
2. Modify the emit_ce_meta and emit_de_meta functions for the resumed ibs.

v2: Restyle code not to use ternary operator.
v3: Modify code format.
v4: Enable Mid-Command Buffer Preemption for gfx9 by default.

Cc: Christian Koenig 
Cc: Michel Dänzer 
Acked-by: Christian König 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 181 +++
 drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
 3 files changed, 155 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index f08ee1ac281c..e90d327a589e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -60,6 +60,7 @@ enum amdgpu_ring_priority_level {
 #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
 #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
 #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)
+#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)
 
 #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3b607c09d267..0864801241f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -753,7 +753,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
*adev);
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev);
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume);
 static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
 static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
  void *ras_error_status);
@@ -826,9 +826,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring 
*kiq_ring,

PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
 
if (action == PREEMPT_QUEUES_NO_UNMAP) {
-   amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
-   amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
-   amdgpu_ring_write(kiq_ring, seq);
+   amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
ring->buf_mask));
+   amdgpu_ring_write(kiq_ring, 0);
+   amdgpu_ring_write(kiq_ring, 0);
+
} else {
amdgpu_ring_write(kiq_ring, 0);
amdgpu_ring_write(kiq_ring, 0);
@@ -5364,11 +5365,17 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 
control |= ib->length_dw | (vmid << 24);
 
-   if (amdgpu_sriov_vf(ring->adev) && (ib->flags & 
AMDGPU_IB_FLAG_PREEMPT)) {
+   if (ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
control |= INDIRECT_BUFFER_PRE_ENB(1);
 
+   if (flags & AMDGPU_IB_PREEMPTED)
+   control |= INDIRECT_BUFFER_PRE_RESUME(1);
+
if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
-   gfx_v9_0_ring_emit_de_meta(ring);
+   gfx_v9_0_ring_emit_de_meta(ring,
+  
(!amdgpu_sriov_vf(ring->adev) &&
+  flags & AMDGPU_IB_PREEMPTED) 
?
+  true : false);
}
 
amdgpu_ring_write(ring, header);
@@ -5423,17 +5430,23 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr,
bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
bool writeback = flags & AMDGPU_FENCE_FLAG_TC_WB_ONLY;
+   bool exec = flags & AMDGPU_FENCE_FLAG_EXEC;
+   uint32_t dw2 = 0;
 
/* RELEASE_MEM - flush caches, send int */
amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
-   amdgpu_ring_write(ring, ((writeback ? (EOP_TC_WB_ACTION_EN |
-  EOP_TC_NC_ACTION_EN) :
- (EOP_TCL1_ACTION_EN |
-  EOP_TC_ACTION_EN |
-  EOP_TC_WB_ACTION_EN |
-  EOP_TC_MD_ACTION_EN)) |
-EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
-EVENT_INDEX(5)));
+
+   if (writeback) {
+   dw2 = EOP_TC_WB_ACTION_EN | EOP_TC_NC_ACTION_EN;
+   } else {
+   dw2 = EOP_TCL1_ACTION_EN | EOP_TC_ACTION_EN |
+   EOP_TC_WB_ACTION_EN | EOP_TC_MD_ACTION_EN;
+   }
+   dw2 |= EVENT_TYPE(CACHE

[PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

2022-09-29 Thread jiadong.zhu
From: "Jiadong.Zhu" 

Set ring functions with software ring callbacks on gfx9.

The software rings can be tested by the debugfs_test_ib case.

v2: Set sw_ring 2 to enable software ring by default.
v3: Remove the parameter for software ring enablement.
v4: Use amdgpu_ring_init/fini for software rings.
v5: Update for code format. Fix conflict.
v6: Remove unnecessary checks and enable software ring on gfx9 by default.
v7: Use static array for software ring names and priorities.

Acked-by: Luben Tuikov 
Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Cc: Michel Dänzer 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c |  20 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |   2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 104 ++-
 5 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 9996dadb39f7..4fdfc3ec134a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -348,6 +348,7 @@ struct amdgpu_gfx {
 
boolis_poweron;
 
+   struct amdgpu_ring  sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];
struct amdgpu_ring_mux  muxer;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 40b1277b4f0c..f08ee1ac281c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -39,6 +39,7 @@ struct amdgpu_vm;
 #define AMDGPU_MAX_RINGS   28
 #define AMDGPU_MAX_HWIP_RINGS  8
 #define AMDGPU_MAX_GFX_RINGS   2
+#define AMDGPU_MAX_SW_GFX_RINGS 2
 #define AMDGPU_MAX_COMPUTE_RINGS   8
 #define AMDGPU_MAX_VCE_RINGS   3
 #define AMDGPU_MAX_UVD_ENC_RINGS   2
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 43cab8a37754..2e64ffccc030 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -29,6 +29,14 @@
 
 #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
 
+static const struct ring_info {
+   unsigned int hw_prio;
+   const char *ring_name;
+} sw_ring_info[] = {
+   { AMDGPU_RING_PRIO_DEFAULT, "gfx_low"},
+   { AMDGPU_RING_PRIO_2, "gfx_high"},
+};
+
 int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
 unsigned int entry_size)
 {
@@ -215,3 +223,15 @@ void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count)
 {
WARN_ON(!ring->is_sw_ring);
 }
+
+const char *amdgpu_sw_ring_name(int idx)
+{
+   return idx < ARRAY_SIZE(sw_ring_info) ?
+   sw_ring_info[idx].ring_name : NULL;
+}
+
+unsigned int amdgpu_sw_ring_priority(int idx)
+{
+   return idx < ARRAY_SIZE(sw_ring_info) ?
+   sw_ring_info[idx].hw_prio : AMDGPU_RING_PRIO_DEFAULT;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
index d91629589577..28399f4b0e5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
@@ -73,4 +73,6 @@ void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count);
 void amdgpu_sw_ring_ib_begin(struct amdgpu_ring *ring);
 void amdgpu_sw_ring_ib_end(struct amdgpu_ring *ring);
 
+const char *amdgpu_sw_ring_name(int idx);
+unsigned int amdgpu_sw_ring_priority(int idx);
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6b609f33261f..3b607c09d267 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -47,6 +47,7 @@
 
 #include "amdgpu_ras.h"
 
+#include "amdgpu_ring_mux.h"
 #include "gfx_v9_4.h"
 #include "gfx_v9_0.h"
 #include "gfx_v9_4_2.h"
@@ -56,6 +57,7 @@
 #include "asic_reg/gc/gc_9_0_default.h"
 
 #define GFX9_NUM_GFX_RINGS 1
+#define GFX9_NUM_SW_GFX_RINGS  2
 #define GFX9_MEC_HPD_SIZE 4096
 #define RLCG_UCODE_LOADING_START_ADDRESS 0x2000L
 #define RLC_SAVE_RESTORE_ADDR_STARTING_OFFSET 0xL
@@ -2273,6 +2275,7 @@ static int gfx_v9_0_sw_init(void *handle)
struct amdgpu_ring *ring;
struct amdgpu_kiq *kiq;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   unsigned int hw_prio;
 
switch (adev->ip_versions[GC_HWIP][0]) {
case IP_VERSION(9, 0, 1):
@@ -2356,6 +2359,9 @@ static int gfx_v9_0_sw_init(void *handle)
sprintf(ring->name, "gfx_%d", i);
ring->use_doorbell = true;
ring->doorbell_index = adev->doorbell_index.gfx_ring0 << 1;
+
+   /* disable scheduler on the real ring */
+   ring->no_scheduler = true;
r = amdgpu_ring_init(a

[PATCH 1/4] drm/amdgpu: Introduce gfx software ring (v8)

2022-09-29 Thread jiadong.zhu
From: "Jiadong.Zhu" 

The software rings are created to support priority contexts while there is
only one hardware queue for gfx.

Every software ring has its own fence driver and can be used as an ordinary
ring by the GPU scheduler.
Multiple software rings are bound to one real ring with the ring muxer. The
packets committed to a software ring are copied to the real ring.
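
In rough terms (a sketch with invented names, not the actual muxer code), the
copy step has to honor the ring buffers' wrap-around, e.g.:

/* Copy 'count' dwords committed to a software ring, starting at 'start',
 * into the real ring, handling wrap-around via the ring masks.
 * All names are invented for illustration.
 */
static void copy_to_real_ring(const unsigned int *sw_buf,
			      unsigned int sw_mask,
			      unsigned int start, unsigned int count,
			      unsigned int *real_buf,
			      unsigned int *real_wptr,
			      unsigned int real_mask)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		real_buf[*real_wptr & real_mask] =
			sw_buf[(start + i) & sw_mask];
		(*real_wptr)++;
	}
}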

v2: Use array to store software ring entry.
v3: Remove unnecessary prints.
v4: Remove amdgpu_ring_sw_init/fini functions,
using gtt for sw ring buffer for later dma copy
optimization.
v5: Allocate ring entry dynamically in the muxer.
v6: Update comments for the ring muxer.
v7: Modify for function naming.
v8: Combine software ring functions into amdgpu_ring_mux.c

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky  
Cc: Michel Dänzer 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 217 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  76 +++
 5 files changed, 302 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 3e0e2eb7e235..add7da53950c 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+   amdgpu_ring_mux.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 53526ffb2ce1..9996dadb39f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
 #include "amdgpu_imu.h"
 #include "soc15.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
 
 /* GFX current status */
 #define AMDGPU_GFX_NORMAL_MODE 0xL
@@ -346,6 +347,8 @@ struct amdgpu_gfx {
struct amdgpu_gfx_ras   *ras;
 
boolis_poweron;
+
+   struct amdgpu_ring_mux  muxer;
 };
 
 #define amdgpu_gfx_get_gpu_clock_counter(adev) 
(adev)->gfx.funcs->get_gpu_clock_counter((adev))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..40b1277b4f0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,10 @@ struct amdgpu_ring {
boolis_mes_queue;
uint32_thw_queue_id;
struct amdgpu_mes_ctx_data *mes_ctx;
+
+   boolis_sw_ring;
+   unsigned intentry_index;
+
 };
 
 #define amdgpu_ring_parse_cs(r, p, job, ib) ((r)->funcs->parse_cs((p), (job), 
(ib)))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
new file mode 100644
index ..43cab8a37754
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,217 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include 
+
+#include "amdgpu_ring_mux.h"
+#include "amdgpu_ring.h"
+#include "amdgpu.h"
+
+#define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
+
+int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
+  

Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-29 Thread Sharma, Shashank




On 9/28/2022 11:51 PM, Alex Deucher wrote:

On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank
 wrote:




On 9/27/2022 10:40 PM, Alex Deucher wrote:

On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank
 wrote:




On 9/27/2022 5:23 PM, Felix Kuehling wrote:

Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:

Hello Felix,

Thanks for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:

Am 2022-09-27 um 02:12 schrieb Christian König:

Am 26.09.22 um 23:40 schrieb Shashank Sharma:

This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 


Feel free to add my acked-by, but Felix should probably take a look
as well.


This looks OK purely from a compute perspective. But I'm concerned
about the interaction of compute with graphics or multiple graphics
contexts submitting work concurrently. They would constantly override
or disable each other's workload hints.

For example, you have an amdgpu_ctx with
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
process that also wants the compute profile. Those could be different
processes belonging to different users. Say, KFD enables the compute
profile first. Then the graphics context submits a job. At the start
of the job, the compute profile is enabled. That's a no-op because
KFD already enabled the compute profile. When the job finishes, it
disables the compute profile for everyone, including KFD. That's
unexpected.



In this case, it will not disable the compute profile, as the
reference counter will not be zero. The reset_profile() will only act
if the reference counter is 0.


OK, I missed the reference counter.




But I would be happy to get any inputs about a policy which can be
more sustainable and gets better outputs, for example:
- should we not allow a profile change, if a PP mode is already
applied and keep it Early bird basis ?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set profile to 3D, but we do not allow it as job A is
not finished yet.

Or Policy B: Current one
- Job A sets the profile to compute
- Job B tries to set profile to 3D, and we allow it. Job A also runs
in PP 3D
- Job B finishes, but does not reset PP as reference count is not zero
due to compute
- Job  A finishes, profile reset to NONE


I think this won't work. As I understand it, the
amdgpu_dpm_switch_power_profile enables and disables individual
profiles. Disabling the 3D profile doesn't disable the compute profile
at the same time. I think you'll need one refcount per profile.

Regards,
 Felix


Thanks, This is exactly what I was looking for, I think Alex's initial
idea was around it, but I was under the assumption that there is only
one HW profile in SMU which keeps on getting overwritten. This can solve
our problems, as I can create an array of reference counters, and will
disable only the profile whose reference counter goes 0.


It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you can
specify multiple workload types at the same time and the SMU will
arbitrate between them internally.  E.g., the most aggressive one will
be selected out of the ones specified.  I think in the driver we just
set one bit at a time using the current interface.  It might be better
to change the interface and just ref count the hint types and then
when we call the set function look at the ref counts for each hint
type and set the mask as appropriate.

Alex



Hey Alex,
Thanks for your comment, if that is the case, this current patch series
works straight forward, and no changes would be required. Please let me
know if my understanding is correct:

Assumption: Order of aggression: 3D > Media > Compute

- Job 1: Requests mode compute: PP changed to compute, ref count 1
- Job 2: Requests mode media: PP changed to media, ref count 2
- Job 3: requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref > 0,
PP still 3D
- Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref > 0,
PP still 3D
- Job 2 finishes, downs ref count to 0, PP changed to NONE,

In this way, every job will be operating in the Power profile of desired
aggression or higher, and this API guarantees the execution at least in
the desired power profile.


I'm not entirely sure on the relative levels of aggression, but I
believe the SMU priorities them by index.  E.g.
#define WORKLOAD_PPLIB_DEFAULT_BIT0
#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
#define WORKLOAD_PPLIB_VIDEO_BIT  3
#define WORKLOAD_PPLIB_VR_BIT 4
#define WORKLOAD_PPLIB_COMPUTE_BIT5
#define WORKLOAD_PPLIB_CUSTOM_BIT 6

3D < video < VR < compute < custom

VR and compute are the most aggressive.  Custom takes preference
because it's user customizable.

Alex



Thanks, so this UA

RE: [PATCH] drm/amdgpu: correct the memcpy size for ip discovery firmware

2022-09-29 Thread Zhang, Hawking

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Le Ma
Sent: Thursday, September 29, 2022 15:50
To: amd-gfx@lists.freedesktop.org
Cc: Ma, Le 
Subject: [PATCH] drm/amdgpu: correct the memcpy size for ip discovery firmware

Use fw->size instead of discovery_tmr_size for fallback path.

Change-Id: I61f1ec55314ea5948ed3ef821becfdd63d876272
Signed-off-by: Le Ma 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 309d35026222..0b4f4d2f8d32 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -234,7 +234,7 @@ static int amdgpu_discovery_read_binary_from_file(struct 
amdgpu_device *adev, ui
return r;
}
 
-   memcpy((u8 *)binary, (u8 *)fw->data, adev->mman.discovery_tmr_size);
+   memcpy((u8 *)binary, (u8 *)fw->data, fw->size);
release_firmware(fw);
 
return 0;
-- 
2.17.1


[linux-next:master] BUILD REGRESSION de90d455a35e474a184c898e66a6a108c3a99434

2022-09-29 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: de90d455a35e474a184c898e66a6a108c3a99434  Add linux-next specific 
files for 20220928

Error/Warning reports:

https://lore.kernel.org/linux-mm/202209150141.wgbakqmx-...@intel.com
https://lore.kernel.org/llvm/202209220019.yr2vuxhg-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

ERROR: modpost: "devm_iio_channel_get_all" 
[drivers/power/supply/mt6370-charger.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/idma64.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/qcom/hdma.ko] undefined!
ERROR: modpost: "devm_memremap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_memunmap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" 
[drivers/char/xillybus/xillybus_of.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" 
[drivers/clk/xilinx/clk-xlnx-clock-wizard.ko] undefined!
ERROR: modpost: "iio_read_channel_processed" 
[drivers/power/supply/mt6370-charger.ko] undefined!
ERROR: modpost: "ioremap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: modpost: "iounmap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "iounmap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
aarch64-linux-ld: security/apparmor/lsm.c:1545: undefined reference to 
`zstd_max_clevel'
arch/arm64/kernel/alternative.c:199:6: warning: no previous prototype for 
'apply_alternatives_vdso' [-Wmissing-prototypes]
arch/arm64/kernel/alternative.c:295:14: warning: no previous prototype for 
'alt_cb_patch_nops' [-Wmissing-prototypes]
depmod: ERROR: Cycle detected: nf_conntrack -> nf_nat -> nf_conntrack
depmod: ERROR: Found 2 modules in dependency cycles!
drivers/gpu/drm/msm/msm_gem_shrinker.c:29:28: error: '__GFP_ATOMIC' undeclared 
(first use in this function); did you mean 'GFP_ATOMIC'?
drivers/gpu/drm/tests/drm_format_helper_test.c:381:31: warning: use of NULL 
'buf' where non-null expected [CWE-476] [-Wanalyzer-null-argument]
drivers/iommu/ipmmu-vmsa.c:946:34: warning: 'ipmmu_of_ids' defined but not used 
[-Wunused-const-variable=]
include/linux/compiler_types.h:352:45: error: call to 
'__compiletime_assert_243' declared with attribute error: FIELD_PREP: mask is 
not constant
include/linux/compiler_types.h:352:45: error: call to 
'__compiletime_assert_256' declared with attribute error: FIELD_PREP: mask is 
not constant
include/linux/compiler_types.h:352:45: error: call to 
'__compiletime_assert_265' declared with attribute error: FIELD_PREP: mask is 
not constant
include/linux/compiler_types.h:352:45: error: call to 
'__compiletime_assert_284' declared with attribute error: FIELD_PREP: mask is 
not constant
include/linux/compiler_types.h:352:45: error: call to 
'__compiletime_assert_290' declared with attribute error: FIELD_PREP: mask is 
not constant
include/linux/compiler_types.h:352:45: error: call to 
'__compiletime_assert_408' declared with attribute error: FIELD_PREP: mask is 
not constant
security/apparmor/apparmorfs.c:1204: undefined reference to `zstd_min_clevel'
security/apparmor/apparmorfs.c:1210: undefined reference to `zstd_max_clevel'
security/apparmor/lsm.c:1545: undefined reference to `zstd_min_clevel'
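
(Editorial note on the repeated FIELD_PREP failures above: FIELD_PREP() in
<linux/bitfield.h> build-asserts that its mask is a compile-time constant,
which is exactly the "FIELD_PREP: mask is not constant" assert reported. A
minimal sketch of the constraint; the open-coded fallback for runtime masks
is an illustration only:)

#include <linux/bitfield.h>
#include <linux/bitops.h>
#include <linux/bits.h>
#include <linux/types.h>

#define EXAMPLE_MASK GENMASK(7, 4)      /* constant mask: OK */

static u32 pack_const(u32 val)
{
        return FIELD_PREP(EXAMPLE_MASK, val);
}

static u32 pack_runtime(u32 mask, u32 val)
{
        /* FIELD_PREP(mask, val) here would trip the build assert;
         * shifting by the mask's lowest set bit avoids it. */
        return (val << __ffs(mask)) & mask;
}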

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- arc-randconfig-r043-20220926
|   |-- 
drivers-gpu-drm-msm-msm_gem_shrinker.c:error:__GFP_ATOMIC-undeclared-(first-use-in-this-function)
|   `-- 
include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:FIELD_PREP:mask-is-not-constant
|-- arm-buildonly-randconfig-r001-20220926
|   `-- 
include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:FIELD_PREP:mask-is-not-constant
|-- arm-randconfig-c002-20220925
|   `-- 
drivers-gpu-drm-tests-drm_format_helper_test.c:warning:use-of-NULL-buf-where-non-null-expected-CWE
|-- arm64-allyesconfig
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   `-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|-- arm64-buildonly-randconfig-r006-20220926
|   |-- 
aarch64-linux-ld:security-apparmor-lsm.c:undefined-reference-to-zstd_max_clevel
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|   |-- security-apparmor-apparmorfs.c:undefined-reference-to-zstd_max_clevel
|   |-- security-apparmor-apparmorfs.c:undefined-reference-to-zstd_min_clevel
|   `-- security-apparmor-lsm.c:undefined-reference-to-zstd_min_clevel
|-- arm64-randconfig-r035-20220926
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-f

[PATCH] drm/amdgpu: correct the memcpy size for ip discovery firmware

2022-09-29 Thread Le Ma
Use fw->size instead of discovery_tmr_size for the fallback path.

Change-Id: I61f1ec55314ea5948ed3ef821becfdd63d876272
Signed-off-by: Le Ma 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 309d35026222..0b4f4d2f8d32 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -234,7 +234,7 @@ static int amdgpu_discovery_read_binary_from_file(struct 
amdgpu_device *adev, ui
return r;
}
 
-   memcpy((u8 *)binary, (u8 *)fw->data, adev->mman.discovery_tmr_size);
+   memcpy((u8 *)binary, (u8 *)fw->data, fw->size);
release_firmware(fw);
 
return 0;
-- 
2.17.1