Hey Joe, I don't have all the answers right now, but one thing I want to mention is that, with cgroup, there is always the possibility of a user configuration that leads to under-resourcing the application. Your comments certainly highlight the need to make under-resource situations obvious to debug. (I also want to write this down so I don't forget... :) I should probably emit some dmesg for situations like this.) Thanks!
Regards,
Kenny

On Mon, Dec 2, 2019 at 5:05 PM Greathouse, Joseph <joseph.greatho...@amd.com> wrote:
>
> > -----Original Message-----
> > From: Kenny Ho <y2ke...@gmail.com>
> > Sent: Friday, November 29, 2019 12:00 AM
> >
> > Reducing audience since this is AMD specific.
> >
> > On Tue, Oct 8, 2019 at 3:11 PM Kuehling, Felix <felix.kuehl...@amd.com> wrote:
> > >
> > > On 2019-08-29 2:05 a.m., Kenny Ho wrote:
> > > > The number of logical gpu (lgpu) is defined to be the number of
> > > > compute unit (CU) for a device. The lgpu allocation limit only
> > > > applies to compute workload for the moment (enforced via kfd queue
> > > > creation.) Any cu_mask update is validated against the availability
> > > > of the compute unit as defined by the drmcg the kfd process belongs to.
> > >
> > > There is something missing here. There is an API for the application
> > > to specify a CU mask. Right now it looks like the
> > > application-specified and CGroup-specified CU masks would clobber each
> > > other. Instead the two should be merged.
> > >
> > > The CGroup-specified mask should specify a subset of CUs available for
> > > application-specified CU masks. When the cgroup CU mask changes, you'd
> > > need to take any application-specified CU masks into account before
> > > updating the hardware.
> >
> > The idea behind the current implementation is to give the sysadmin priority
> > over the user application (as that is the definition of a control
> > group.) A mask specified by the application/user is validated by
> > pqm_drmcg_lgpu_validate and rejected with EACCES if it is not
> > compatible. The alternative is to ignore the difference and have the
> > kernel guess/redistribute the assignment, but I am not sure that this
> > is a good approach since there is not enough information to allow the
> > kernel to guess the user's intention consistently and correctly.
> > (This is based on multiple conversations with you and Joe that led me to
> > believe there are situations where spreading the CU assignment across
> > multiple SEs is a good thing, but not always.)
> >
> > If the cgroup-specified mask is changed after the application has set the
> > mask, the intersection of the two masks will be set instead. It
> > is possible to have no intersection, and in this case no CU is made
> > available to the application (just like the possibility for memcgroup to
> > starve an application of the memory it needs.)
>
> I don't disagree with forcing a user to work within an lgpu's allocation. But
> there are two minor problems here:
>
> 1) We will need a way for the process to query what the lgpu's bitmap looks
> like. You and Felix are somewhat discussing this below, but I don't think the
> KFD's "number of CUs" topology information is sufficient. I can know I have
> 32 CUs, but I don't know which 32 bits in the bitmask are turned on. But your
> code in pqm_drmcg_lgpu_validate() requires a subset when setting a CU mask on
> an lgpu. A user needs to know which bits are on in the lgpu for this to work.
>
> 2) Even if we have a query API, do we have an easy way to prevent a data
> race? Do we care? For instance, if I query the existing lgpu bitmap, then try
> to set a CU mask on a subset of that, it's possible that the lgpu will change
> between the query and the set. That would make the setting fail; maybe that's
> good enough (you can just retry in a loop until it succeeds?)
>
> Do empty CU masks actually work? This seems like something we would want to
> avoid. This could happen not infrequently if someone does something like:
> * lgpu with half the CUs enabled
> * User sets a mask to use half of those CUs
> * lgpu is changed to enable the other half of the CUs --> now the user's mask
>   is fully destroyed and everything dies. :\
>
> > > The KFD topology APIs report the number of available CUs to the
> > > application.
> > > CGroups would change that number at runtime and
> > > applications would not expect that. I think the best way to deal with
> > > that would be to have multiple bits in the application-specified CU
> > > mask map to the same CU. How to do that in a fair way is not obvious.
> > > I guess a more coarse-grained division of the GPU into LGPUs would make
> > > this somewhat easier.
> >
> > Another possibility is to add a namespace to the topology sysfs such that
> > the reported number of CUs changes accordingly, although that
> > wouldn't give the user the available mask that is made available by this
> > implementation via the cgroup sysfs. Another possibility is to
> > modify the Thunk similar to what was done for device cgroup (device
> > re-mapping.)
>
> I'd vote for a set of mask query APIs in the Thunk. One for the process's
> current CU mask, and one for a queue's current CU mask. We have a setter API
> already. Since the KFD topology information is also mirrored in sysfs, I
> would worry that a process would see different KFD topology information if
> it's querying the Thunk (which would show the lgpu's number of CUs) vs. if
> it's reading sysfs (which would show the GPU's number of CUs).
>
> As mentioned above, the KFD "num CUs" is insufficient for knowing how to set
> the CU bitmask, so I don't think we should rely on it in this case. IMO, KFD
> topology should describe the real hardware regardless of how cgroups is
> limiting things. I'm willing to be told this is a bad idea, though.
>
> > > How is this problem handled for CPU cores and the interaction with CPU
> > > pthread_setaffinity_np?
> >
> > Per the documentation of pthread_setaffinity_np, "If the call is
> > successful, and the thread is not currently running on one of the CPUs
> > in cpuset, then it is migrated to one of those CPUs."
> > http://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html > > > > Regards, > > Kenny > > > > > > > > > Regards, > > > Felix > > > > > > > > > > > > > > Change-Id: I69a57452c549173a1cd623c30dc57195b3b6563e > > > > Signed-off-by: Kenny Ho <kenny...@amd.com> > > > > --- > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 4 + > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 21 +++ > > > > drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 + > > > > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 + > > > > .../amd/amdkfd/kfd_process_queue_manager.c | 140 ++++++++++++++++++ > > > > 5 files changed, 174 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > > > > index 55cb1b2094fd..369915337213 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h > > > > @@ -198,6 +198,10 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct > > > > kgd_dev *dst, struct kgd_dev *s > > > > valid; \ > > > > }) > > > > > > > > +int amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task, > > > > + struct amdgpu_device *adev, unsigned long *lgpu_bitmap, > > > > + unsigned int nbits); > > > > + > > > > /* GPUVM API */ > > > > int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, > > > > unsigned int pasid, > > > > void **vm, void > > > > **process_info, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > index 163a4fbf0611..8abeffdd2e5b 100644 > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > > > > @@ -1398,9 +1398,29 @@ amdgpu_get_crtc_scanout_position(struct > > > > drm_device *dev, unsigned int pipe, > > > > static void amdgpu_drmcg_custom_init(struct drm_device *dev, > > > > struct drmcg_props *props) > > > > { > > > > + struct amdgpu_device *adev = dev->dev_private; > > > > + > > > > + 
props->lgpu_capacity = adev->gfx.cu_info.number; > > > > + > > > > props->limit_enforced = true; > > > > } > > > > > > > > +static void amdgpu_drmcg_limit_updated(struct drm_device *dev, > > > > + struct task_struct *task, struct drmcg_device_resource > > > > *ddr, > > > > + enum drmcg_res_type res_type) { > > > > + struct amdgpu_device *adev = dev->dev_private; > > > > + > > > > + switch (res_type) { > > > > + case DRMCG_TYPE_LGPU: > > > > + amdgpu_amdkfd_update_cu_mask_for_process(task, adev, > > > > + ddr->lgpu_allocated, > > > > dev->drmcg_props.lgpu_capacity); > > > > + break; > > > > + default: > > > > + break; > > > > + } > > > > +} > > > > + > > > > static struct drm_driver kms_driver = { > > > > .driver_features = > > > > DRIVER_USE_AGP | DRIVER_ATOMIC | @@ -1438,6 +1458,7 @@ > > > > static struct drm_driver kms_driver = { > > > > .gem_prime_mmap = amdgpu_gem_prime_mmap, > > > > > > > > .drmcg_custom_init = amdgpu_drmcg_custom_init, > > > > + .drmcg_limit_updated = amdgpu_drmcg_limit_updated, > > > > > > > > .name = DRIVER_NAME, > > > > .desc = DRIVER_DESC, > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > > > index 138c70454e2b..fa765b803f97 100644 > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c > > > > @@ -450,6 +450,12 @@ static int kfd_ioctl_set_cu_mask(struct file > > > > *filp, struct kfd_process *p, > > > > return -EFAULT; > > > > } > > > > > > > > + if (!pqm_drmcg_lgpu_validate(p, args->queue_id, > > > > properties.cu_mask, cu_mask_size)) { > > > > + pr_debug("CU mask not permitted by DRM Cgroup"); > > > > + kfree(properties.cu_mask); > > > > + return -EACCES; > > > > + } > > > > + > > > > mutex_lock(&p->mutex); > > > > > > > > retval = pqm_set_cu_mask(&p->pqm, args->queue_id, > > > > &properties); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > > index 
8b0eee5b3521..88881bec7550 100644 > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > > > > @@ -1038,6 +1038,9 @@ int pqm_get_wave_state(struct > > > > process_queue_manager *pqm, > > > > u32 *ctl_stack_used_size, > > > > u32 *save_area_used_size); > > > > > > > > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 > > > > *cu_mask, > > > > + unsigned int cu_mask_size); > > > > + > > > > int amdkfd_fence_wait_timeout(unsigned int *fence_addr, > > > > unsigned int fence_value, > > > > unsigned int timeout_ms); diff --git > > > > a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > > > index 7e6c3ee82f5b..a896de290307 100644 > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c > > > > @@ -23,9 +23,11 @@ > > > > > > > > #include <linux/slab.h> > > > > #include <linux/list.h> > > > > +#include <linux/cgroup_drm.h> > > > > #include "kfd_device_queue_manager.h" > > > > #include "kfd_priv.h" > > > > #include "kfd_kernel_queue.h" > > > > +#include "amdgpu.h" > > > > #include "amdgpu_amdkfd.h" > > > > > > > > static inline struct process_queue_node *get_queue_by_qid( @@ > > > > -167,6 +169,7 @@ static int create_cp_queue(struct > > > > process_queue_manager *pqm, > > > > struct queue_properties *q_properties, > > > > struct file *f, unsigned int qid) > > > > { > > > > + struct drmcg *drmcg; > > > > int retval; > > > > > > > > /* Doorbell initialized in user space*/ @@ -180,6 +183,36 @@ > > > > static int create_cp_queue(struct process_queue_manager *pqm, > > > > if (retval != 0) > > > > return retval; > > > > > > > > + > > > > + drmcg = drmcg_get(pqm->process->lead_thread); > > > > + if (drmcg) { > > > > + struct amdgpu_device *adev; > > > > + struct drmcg_device_resource *ddr; > > > > + int mask_size; > > > > + u32 *mask; > > > > + > > > > + adev = 
(struct amdgpu_device *) dev->kgd; > > > > + > > > > + mask_size = adev->ddev->drmcg_props.lgpu_capacity; > > > > + mask = kzalloc(sizeof(u32) * round_up(mask_size, 32), > > > > + GFP_KERNEL); > > > > + > > > > + if (!mask) { > > > > + drmcg_put(drmcg); > > > > + uninit_queue(*q); > > > > + return -ENOMEM; > > > > + } > > > > + > > > > + ddr = > > > > + drmcg->dev_resources[adev->ddev->primary->index]; > > > > + > > > > + bitmap_to_arr32(mask, ddr->lgpu_allocated, mask_size); > > > > + > > > > + (*q)->properties.cu_mask_count = mask_size; > > > > + (*q)->properties.cu_mask = mask; > > > > + > > > > + drmcg_put(drmcg); > > > > + } > > > > + > > > > (*q)->device = dev; > > > > (*q)->process = pqm->process; > > > > > > > > @@ -495,6 +528,113 @@ int pqm_get_wave_state(struct > > > > process_queue_manager *pqm, > > > > > > > > save_area_used_size); > > > > } > > > > > > > > +bool pqm_drmcg_lgpu_validate(struct kfd_process *p, int qid, u32 > > > > *cu_mask, > > > > + unsigned int cu_mask_size) { > > > > + DECLARE_BITMAP(curr_mask, MAX_DRMCG_LGPU_CAPACITY); > > > > + struct drmcg_device_resource *ddr; > > > > + struct process_queue_node *pqn; > > > > + struct amdgpu_device *adev; > > > > + struct drmcg *drmcg; > > > > + bool result; > > > > + > > > > + if (cu_mask_size > MAX_DRMCG_LGPU_CAPACITY) > > > > + return false; > > > > + > > > > + bitmap_from_arr32(curr_mask, cu_mask, cu_mask_size); > > > > + > > > > + pqn = get_queue_by_qid(&p->pqm, qid); > > > > + if (!pqn) > > > > + return false; > > > > + > > > > + adev = (struct amdgpu_device *)pqn->q->device->kgd; > > > > + > > > > + drmcg = drmcg_get(p->lead_thread); > > > > + ddr = drmcg->dev_resources[adev->ddev->primary->index]; > > > > + > > > > + if (bitmap_subset(curr_mask, ddr->lgpu_allocated, > > > > + MAX_DRMCG_LGPU_CAPACITY)) > > > > + result = true; > > > > + else > > > > + result = false; > > > > + > > > > + drmcg_put(drmcg); > > > > + > > > > + return result; > > > > +} > > > > + > > > > +int 
amdgpu_amdkfd_update_cu_mask_for_process(struct task_struct *task, > > > > + struct amdgpu_device *adev, unsigned long *lgpu_bm, > > > > + unsigned int lgpu_bm_size) { > > > > + struct kfd_dev *kdev = adev->kfd.dev; > > > > + struct process_queue_node *pqn; > > > > + struct kfd_process *kfdproc; > > > > + size_t size_in_bytes; > > > > + u32 *cu_mask; > > > > + int rc = 0; > > > > + > > > > + if ((lgpu_bm_size % 32) != 0) { > > > > + pr_warn("lgpu_bm_size %d must be a multiple of 32", > > > > + lgpu_bm_size); > > > > + return -EINVAL; > > > > + } > > > > + > > > > + kfdproc = kfd_get_process(task); > > > > + > > > > + if (IS_ERR(kfdproc)) > > > > + return -ESRCH; > > > > + > > > > + size_in_bytes = sizeof(u32) * round_up(lgpu_bm_size, 32); > > > > + > > > > + mutex_lock(&kfdproc->mutex); > > > > + list_for_each_entry(pqn, &kfdproc->pqm.queues, > > > > process_queue_list) { > > > > + if (pqn->q && pqn->q->device == kdev) { > > > > + /* update cu_mask accordingly */ > > > > + cu_mask = kzalloc(size_in_bytes, GFP_KERNEL); > > > > + if (!cu_mask) { > > > > + rc = -ENOMEM; > > > > + break; > > > > + } > > > > + > > > > + if (pqn->q->properties.cu_mask) { > > > > + DECLARE_BITMAP(curr_mask, > > > > + > > > > + MAX_DRMCG_LGPU_CAPACITY); > > > > + > > > > + if (pqn->q->properties.cu_mask_count > > > > > + lgpu_bm_size) { > > > > + rc = -EINVAL; > > > > + kfree(cu_mask); > > > > + break; > > > > + } > > > > + > > > > + bitmap_from_arr32(curr_mask, > > > > + > > > > pqn->q->properties.cu_mask, > > > > + > > > > + pqn->q->properties.cu_mask_count); > > > > + > > > > + bitmap_and(curr_mask, curr_mask, lgpu_bm, > > > > + lgpu_bm_size); > > > > + > > > > + bitmap_to_arr32(cu_mask, curr_mask, > > > > + lgpu_bm_size); > > > > + > > > > + kfree(curr_mask); > > > > + } else > > > > + bitmap_to_arr32(cu_mask, lgpu_bm, > > > > + lgpu_bm_size); > > > > + > > > > + pqn->q->properties.cu_mask = cu_mask; > > > > + pqn->q->properties.cu_mask_count = > > > > + lgpu_bm_size; > > > > + > > > > 
+ rc = pqn->q->device->dqm->ops.update_queue( > > > > + pqn->q->device->dqm, pqn->q); > > > > + } > > > > + } > > > > + mutex_unlock(&kfdproc->mutex); > > > > + > > > > + return rc; > > > > +} > > > > + > > > > #if defined(CONFIG_DEBUG_FS) > > > > > > > > int pqm_debugfs_mqds(struct seq_file *m, void *data) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx