Re: [PATCH] drm: add drm device name

2019-09-17 Thread Marek Olšák
drmVersion::name = amdgpu, radeon, intel, etc.
drmVersion::desc = vega10, vega12, vega20, ...

The common Mesa code will use name and desc to select the driver.

The AMD-specific Mesa code will use desc to identify the chip.

Mesa won't receive any PCI IDs for future chips.
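
For context, here is a minimal sketch of how common userspace code could
consume these two fields through libdrm's drmGetVersion(); the name-to-driver
mapping below is only an illustration, not the actual Mesa loader logic:

#include <stdio.h>
#include <string.h>
#include <xf86drm.h>

/* Pick a userspace driver from the kernel driver name and chip name
 * reported by DRM_IOCTL_VERSION (sketch only). */
static const char *pick_driver(int fd)
{
        drmVersionPtr v = drmGetVersion(fd);
        const char *driver = NULL;

        if (!v)
                return NULL;

        /* drmVersion::name identifies the kernel driver. */
        if (strcmp(v->name, "amdgpu") == 0)
                driver = "radeonsi";
        else if (strcmp(v->name, "radeon") == 0)
                driver = "r300"; /* or r200/r600/radeonsi, depending on desc */

        /* With this patch, drmVersion::desc identifies the chip (e.g. "vega10"),
         * so no PCI ID table is needed to recognize future chips. */
        printf("kernel driver: %s, chip: %s\n", v->name, v->desc);

        drmFreeVersion(v);
        return driver;
}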

Marek


On Tue, Sep 17, 2019 at 10:33 AM Michel Dänzer  wrote:

> On 2019-09-17 1:20 p.m., Christian König wrote:
> > On 2019-09-17 at 11:27, Michel Dänzer wrote:
> >> On 2019-09-17 11:23 a.m., Michel Dänzer wrote:
> >>> On 2019-09-17 10:23 a.m., Koenig, Christian wrote:
> >>>> On 2019-09-17 at 10:17, Daniel Vetter wrote:
> >>>>> On Tue, Sep 17, 2019 at 10:12 AM Christian König
> >>>>>  wrote:
> >>>>>> On 2019-09-17 at 07:47, Jani Nikula wrote:
> >>>>>>> On Mon, 16 Sep 2019, Marek Olšák  wrote:
> >>>>>>>> The purpose is to get rid of all PCI ID tables for all drivers in
> >>>>>>>> userspace. (or at least stop updating them)
> >>>>>>>>
> >>>>>>>> Mesa common code and modesetting will use this.
> >>>>>>> I'd think this would warrant a high level description of what you
> >>>>>>> want
> >>>>>>> to achieve in the commit message.
> >>>>>> And maybe explicitly call it uapi_name or even uapi_driver_name.
> >>>>> If it's uapi_name, then why do we need a new one for every
> generation?
> >>>>> Userspace drivers tend to span a lot more than just 1 generation. And
> >>>>> if you want to have per-generation data from the kernel to userspace,
> >>>>> then imo that's much better suited in some amdgpu ioctl, instead of
> >>>>> trying to encode that into the driver name.
> >>>> Well we already have an IOCTL for that, but I thought the intention
> >>>> here
> >>>> was to get rid of the PCI-ID tables in userspace to figure out which
> >>>> driver to load.
> >>> That's just unrealistic in general, I'm afraid. See e.g. the ongoing
> >>> transition from i965 to iris for recent Intel hardware. How is the
> >>> kernel supposed to know which driver is to be used?
> >
> > Well how is userspace currently handling that? The kernel should NOT say
> > which driver to use in userspace, but rather which one is used in the
> > kernel.
>
> Would that really help though? E.g. the radeon kernel driver supports
> radeon/r200/r300/r600/radeonsi DRI drivers, the i915 one i915/i965/iris
> (and the amdgpu one radeonsi/amdgpu).
>
> The HW generation identifier proposed in these patches might be useful,
> but I suspect there'll always be cases where userspace needs to know
> more precisely.
>
>
> > Mapping that information to a userspace driver still needs to be done
> > somewhere else, but the difference is that you don't need to add all
> > PCI-IDs twice.
>
> It should only really be necessary in Mesa.
>
>
> On 2019-09-17 1:32 p.m., Daniel Vetter wrote:
> > How are other compositors solving this? I don't expect they have a
> > pciid table like modesetting copied to all of them ...
>
> They don't need any of this. The Xorg modesetting driver only did this to
> determine the client driver name to advertise via the DRI2 extension.
>
>
> --
> Earthling Michel Dänzer   |   https://redhat.com
> Libre software enthusiast | Mesa and X developer

Re: [PATCH] drm: add drm device name

2019-09-16 Thread Marek Olšák
The purpose is to get rid of all PCI ID tables for all drivers in
userspace. (or at least stop updating them)

Mesa common code and modesetting will use this.

Marek

On Sat, Sep 7, 2019 at 3:48 PM Daniel Vetter  wrote:

> On Sat, Sep 7, 2019 at 3:18 AM Rob Clark  wrote:
> >
> > On Fri, Sep 6, 2019 at 3:16 PM Marek Olšák  wrote:
> > >
> > > + dri-devel
> > >
> > > On Tue, Sep 3, 2019 at 5:41 PM Jiang, Sonny 
> wrote:
> > >>
> > >> Add a DRM device name and pass it to user space via drmVersion::desc in
> > >> the DRM_IOCTL_VERSION ioctl, instead of the unused DRM driver description.
> > >>
> > >> Change-Id: I809f6d3e057111417efbe8fa7cab8f0113ba4b21
> > >> Signed-off-by: Sonny Jiang 
> > >> ---
> > >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
> > >>  drivers/gpu/drm/drm_drv.c  | 17 +
> > >>  drivers/gpu/drm/drm_ioctl.c|  2 +-
> > >>  include/drm/drm_device.h   |  3 +++
> > >>  include/drm/drm_drv.h  |  1 +
> > >>  5 files changed, 24 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> index 67b09cb2a9e2..8f0971cea363 100644
> > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > >> @@ -2809,6 +2809,8 @@ int amdgpu_device_init(struct amdgpu_device
> *adev,
> > >> /* init the mode config */
> > >> drm_mode_config_init(adev->ddev);
> > >>
> > >> +   drm_dev_set_name(adev->ddev,
> amdgpu_asic_name[adev->asic_type]);
> > >> +
> > >> r = amdgpu_device_ip_init(adev);
> > >> if (r) {
> > >> /* failed in exclusive mode due to timeout */
> > >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > >> index 862621494a93..6c33879bb538 100644
> > >> --- a/drivers/gpu/drm/drm_drv.c
> > >> +++ b/drivers/gpu/drm/drm_drv.c
> > >> @@ -802,6 +802,7 @@ void drm_dev_fini(struct drm_device *dev)
> > >> mutex_destroy(&dev->struct_mutex);
> > >> drm_legacy_destroy_members(dev);
> > >> kfree(dev->unique);
> > >> +   kfree(dev->name);
> > >>  }
> > >>  EXPORT_SYMBOL(drm_dev_fini);
> > >>
> > >> @@ -1078,6 +1079,22 @@ int drm_dev_set_unique(struct drm_device *dev,
> const char *name)
> > >>  }
> > >>  EXPORT_SYMBOL(drm_dev_set_unique);
> > >>
> > >> +/**
> > >> + * drm_dev_set_name - Set the name of a DRM device
> > >> + * @dev: device of which to set the name
> > >> + * @name: name to be set
> > >> + *
> > >> + * Return: 0 on success or a negative error code on failure.
> > >> + */
> > >> +int drm_dev_set_name(struct drm_device *dev, const char *name)
> > >> +{
> > >> +   kfree(dev->name);
> > >> +   dev->name = kstrdup(name, GFP_KERNEL);
> > >> +
> > >> +   return dev->name ? 0 : -ENOMEM;
> > >> +}
> > >> +EXPORT_SYMBOL(drm_dev_set_name);
> > >> +
> > >>  /*
> > >>   * DRM Core
> > >>   * The DRM core module initializes all global DRM objects and makes
> them
> > >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> > >> index 2263e3ddd822..61f02965106b 100644
> > >> --- a/drivers/gpu/drm/drm_ioctl.c
> > >> +++ b/drivers/gpu/drm/drm_ioctl.c
> > >> @@ -506,7 +506,7 @@ int drm_version(struct drm_device *dev, void
> *data,
> > >> dev->driver->date);
> > >> if (!err)
> > >> err = drm_copy_field(version->desc,
> &version->desc_len,
> > >> -   dev->driver->desc);
> > >> +   dev->name);
> >
> > I suspect this needs to be something like dev->name ? dev->name :
> > dev->driver->desc
> >
> > Or somewhere something needs to arrange for dev->name to default to
> > dev->driver->desc
> >
> > And maybe this should be dev->desc instead of dev->name.. that at
> > least seems less c
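
For clarity, the fallback Rob suggests would make the drm_version() hunk look
roughly like this (a sketch, assuming the dev->name field added by this patch):

        if (!err)
                err = drm_copy_field(version->desc, &version->desc_len,
                                     dev->name ? dev->name : dev->driver->desc);

That way devices whose drivers never call drm_dev_set_name() keep reporting
the driver description as before.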

Re: [PATCH] drm: add drm device name

2019-09-06 Thread Marek Olšák
+ dri-devel

On Tue, Sep 3, 2019 at 5:41 PM Jiang, Sonny  wrote:

> Add a DRM device name and pass it to user space via drmVersion::desc in the
> DRM_IOCTL_VERSION ioctl, instead of the unused DRM driver description.
>
> Change-Id: I809f6d3e057111417efbe8fa7cab8f0113ba4b21
> Signed-off-by: Sonny Jiang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
>  drivers/gpu/drm/drm_drv.c  | 17 +
>  drivers/gpu/drm/drm_ioctl.c|  2 +-
>  include/drm/drm_device.h   |  3 +++
>  include/drm/drm_drv.h  |  1 +
>  5 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 67b09cb2a9e2..8f0971cea363 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2809,6 +2809,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> /* init the mode config */
> drm_mode_config_init(adev->ddev);
>
> +   drm_dev_set_name(adev->ddev, amdgpu_asic_name[adev->asic_type]);
> +
> r = amdgpu_device_ip_init(adev);
> if (r) {
> /* failed in exclusive mode due to timeout */
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 862621494a93..6c33879bb538 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -802,6 +802,7 @@ void drm_dev_fini(struct drm_device *dev)
> mutex_destroy(&dev->struct_mutex);
> drm_legacy_destroy_members(dev);
> kfree(dev->unique);
> +   kfree(dev->name);
>  }
>  EXPORT_SYMBOL(drm_dev_fini);
>
> @@ -1078,6 +1079,22 @@ int drm_dev_set_unique(struct drm_device *dev,
> const char *name)
>  }
>  EXPORT_SYMBOL(drm_dev_set_unique);
>
> +/**
> + * drm_dev_set_name - Set the name of a DRM device
> + * @dev: device of which to set the name
> + * @name: name to be set
> + *
> + * Return: 0 on success or a negative error code on failure.
> + */
> +int drm_dev_set_name(struct drm_device *dev, const char *name)
> +{
> +   kfree(dev->name);
> +   dev->name = kstrdup(name, GFP_KERNEL);
> +
> +   return dev->name ? 0 : -ENOMEM;
> +}
> +EXPORT_SYMBOL(drm_dev_set_name);
> +
>  /*
>   * DRM Core
>   * The DRM core module initializes all global DRM objects and makes them
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 2263e3ddd822..61f02965106b 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -506,7 +506,7 @@ int drm_version(struct drm_device *dev, void *data,
> dev->driver->date);
> if (!err)
> err = drm_copy_field(version->desc, &version->desc_len,
> -   dev->driver->desc);
> +   dev->name);
>
> return err;
>  }
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index 7f9ef709b2b6..e29912c484e4 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -123,6 +123,9 @@ struct drm_device {
> /** @unique: Unique name of the device */
> char *unique;
>
> +   /** @name: device name */
> +   char *name;
> +
> /**
>  * @struct_mutex:
>  *
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 68ca736c548d..f742e2bde467 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -798,6 +798,7 @@ static inline bool drm_drv_uses_atomic_modeset(struct
> drm_device *dev)
>
>
>  int drm_dev_set_unique(struct drm_device *dev, const char *name);
> +int drm_dev_set_name(struct drm_device *dev, const char *name);
>
>
>  #endif
> --
> 2.17.1
>

Re: [PATCH v2] drm/amdgpu: Default disable GDS for compute+gfx

2019-08-29 Thread Marek Olšák
If you decide to add it back, use this instead, it's simpler:
https://patchwork.freedesktop.org/patch/318391/?series=63775&rev=1

Maybe remove OA reservation if you don't need it.

Marek

On Thu, Aug 29, 2019 at 5:06 AM zhoucm1  wrote:

>
> On 2019/8/29 at 3:22 PM, Christian König wrote:
>
> On 2019-08-29 at 07:55, zhoucm1 wrote:
>
>
> On 2019/8/29 at 1:08 AM, Marek Olšák wrote:
>
> It can't break an older driver, because there is no older driver that
> requires the static allocation.
>
> Note that closed source drivers don't count, because they don't need
> backward compatibility.
>
> Yes, I agree, we don't need to take care of the closed-source stack.
>
> But AMDVLK is indeed an open-source stack, and many fans are using it, so we
> need to keep its compatibility, don't we?
>
>
> Actually that is still under discussion.
>
> But AMDVLK should have never ever used the static GDS space in the first
> place. We only added that for a transition period for old OpenGL, and it
> shouldn't have leaked into the upstream driver.
>
> Not sure what's the best approach here. We could revert "[PATCH]
> drm/amdgpu: remove static GDS, GWS and OA", but that would break KFD. So we
> can only choose between two evils here.
>
> The only alternative I can see which would work for both would be to still
> allocate the static GDS, GWS and OA space, but make it somehow dynamic so
> that the KFD can swap it out again.
>
> Agree with you.
>
> -David
>
>
> Christian.
>
> -David
>
>
> Marek
>
> On Wed, Aug 28, 2019 at 2:44 AM zhoucm1  wrote:
>
>>
>> On 2019/7/23 at 3:08 AM, Christian König wrote:
>> > On 2019-07-22 at 17:34, Greathouse, Joseph wrote:
>> >> Units in the GDS block default to allowing all VMIDs access to all
>> >> entries. Disable shader access to the GDS, GWS, and OA blocks from all
>> >> compute and gfx VMIDs by default. For compute, HWS firmware will set
>> >> up the access bits for the appropriate VMID when a compute queue
>> >> requires access to these blocks.
>> >> The driver will handle enabling access on-demand for graphics VMIDs.
>>
>> gds_switch depends on job->gds/gws/oa/_base/size.
>>
>> With "[PATCH] drm/amdgpu: remove static GDS, GWS and OA allocation", the
>> default allocations in the kernel were removed. If some UMD stacks don't
>> pass a gds/gws/oa allocation in the bo_list, then the kernel will not
>> enable access to them, which will break previous drivers.
>>
>> do we need to revert "[PATCH] drm/amdgpu: remove static GDS, GWS and OA
>> allocation" ?
>>
>> -David
>>
>> >>
>> >> Leaving VMID0 with full access because otherwise HWS cannot save or
>> >> restore values during task switch.
>> >>
>> >> v2: Fixed code and comment styling.
>> >>
>> >> Change-Id: I3d768a96935d2ed1dff09b02c995090f4fbfa539
>> >> Signed-off-by: Joseph Greathouse 
>> >
>> > Reviewed-by: Christian König 
>> >
>> >> ---
>> >>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 25 ++---
>> >>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 24 +---
>> >>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 24 +---
>> >>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 24 +---
>> >>   4 files changed, 69 insertions(+), 28 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> >> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> >> index 73dcb632a3ce..2a9692bc34b4 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> >> @@ -1516,17 +1516,27 @@ static void
>> >> gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
>> >>   }
>> >>   nv_grbm_select(adev, 0, 0, 0, 0);
>> >>   mutex_unlock(&adev->srbm_mutex);
>> >> +}
>> >>   -/* Initialize all compute VMIDs to have no GDS, GWS, or OA
>> >> -   acccess. These should be enabled by FW for target VMIDs. */
>> >> -for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
>> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * i, 0);
>> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * i, 0);
>> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
>> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
>> >> +static voi

Re: [PATCH v2] drm/amdgpu: Default disable GDS for compute+gfx

2019-08-28 Thread Marek Olšák
It can't break an older driver, because there is no older driver that
requires the static allocation.

Note that closed source drivers don't count, because they don't need
backward compatibility.

Marek

On Wed, Aug 28, 2019 at 2:44 AM zhoucm1  wrote:

>
> > On 2019/7/23 at 3:08 AM, Christian König wrote:
> > > On 2019-07-22 at 17:34, Greathouse, Joseph wrote:
> >> Units in the GDS block default to allowing all VMIDs access to all
> >> entries. Disable shader access to the GDS, GWS, and OA blocks from all
> >> compute and gfx VMIDs by default. For compute, HWS firmware will set
> >> up the access bits for the appropriate VMID when a compute queue
> >> requires access to these blocks.
> >> The driver will handle enabling access on-demand for graphics VMIDs.
>
> gds_switch depends on job->gds/gws/oa/_base/size.
>
> With "[PATCH] drm/amdgpu: remove static GDS, GWS and OA allocation", the
> default allocations in the kernel were removed. If some UMD stacks don't
> pass a gds/gws/oa allocation in the bo_list, then the kernel will not
> enable access to them, which will break previous drivers.
>
> do we need to revert "[PATCH] drm/amdgpu: remove static GDS, GWS and OA
> allocation" ?
>
> -David
>
> >>
> >> Leaving VMID0 with full access because otherwise HWS cannot save or
> >> restore values during task switch.
> >>
> >> v2: Fixed code and comment styling.
> >>
> >> Change-Id: I3d768a96935d2ed1dff09b02c995090f4fbfa539
> >> Signed-off-by: Joseph Greathouse 
> >
> > Reviewed-by: Christian König 
> >
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 25 ++---
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 24 +---
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 24 +---
> >>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 24 +---
> >>   4 files changed, 69 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >> index 73dcb632a3ce..2a9692bc34b4 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> >> @@ -1516,17 +1516,27 @@ static void
> >> gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
> >>   }
> >>   nv_grbm_select(adev, 0, 0, 0, 0);
> >>   mutex_unlock(&adev->srbm_mutex);
> >> +}
> >>   -/* Initialize all compute VMIDs to have no GDS, GWS, or OA
> >> -   acccess. These should be enabled by FW for target VMIDs. */
> >> -for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * i, 0);
> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * i, 0);
> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, i, 0);
> >> -WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, i, 0);
> >> +static void gfx_v10_0_init_gds_vmid(struct amdgpu_device *adev)
> >> +{
> >> +int vmid;
> >> +
> >> +/*
> >> + * Initialize all compute and user-gfx VMIDs to have no GDS,
> >> GWS, or OA
> >> + * access. Compute VMIDs should be enabled by FW for target VMIDs,
> >> + * the driver can enable them for graphics. VMID0 should maintain
> >> + * access so that HWS firmware can save/restore entries.
> >> + */
> >> +for (vmid = 1; vmid < 16; vmid++) {
> >> +WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_BASE, 2 * vmid, 0);
> >> +WREG32_SOC15_OFFSET(GC, 0, mmGDS_VMID0_SIZE, 2 * vmid, 0);
> >> +WREG32_SOC15_OFFSET(GC, 0, mmGDS_GWS_VMID0, vmid, 0);
> >> +WREG32_SOC15_OFFSET(GC, 0, mmGDS_OA_VMID0, vmid, 0);
> >>   }
> >>   }
> >>   +
> >>   static void gfx_v10_0_tcp_harvest(struct amdgpu_device *adev)
> >>   {
> >>   int i, j, k;
> >> @@ -1629,6 +1639,7 @@ static void gfx_v10_0_constants_init(struct
> >> amdgpu_device *adev)
> >>   mutex_unlock(&adev->srbm_mutex);
> >> gfx_v10_0_init_compute_vmid(adev);
> >> +gfx_v10_0_init_gds_vmid(adev);
> >> }
> >>   diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> index 3f98624772a4..48796b6824cf 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> >> @@ -1877,14 +1877,23 @@ static void gfx_v7_0_init_compute_vmid(struct
> >> amdgpu_device *adev)
> >>   }
> >>   cik_srbm_select(adev, 0, 0, 0, 0);
> >>   mutex_unlock(&adev->srbm_mutex);
> >> +}
> >>   -/* Initialize all compute VMIDs to have no GDS, GWS, or OA
> >> -   acccess. These should be enabled by FW for target VMIDs. */
> >> -for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
> >> -WREG32(amdgpu_gds_reg_offset[i].mem_base, 0);
> >> -WREG32(amdgpu_gds_reg_offset[i].mem_size, 0);
> >> -WREG32(amdgpu_gds_reg_offset[i].gws, 0);
> >> -WREG32(amdgpu_gds_reg_offset[i].oa, 0);
> >> +static void gfx_v7_0_init_gds_vmid(struct amdgpu_device *adev)
> >> +{
> >> +int vmid;
> >> +
> >> +/*
> >> + * Initialize all compu

[PATCH] Revert "drm/amdgpu: fix transform feedback GDS hang on gfx10 (v2)"

2019-08-02 Thread Marek Olšák
From: Marek Olšák 

This reverts commit b41335c6c0303d100abe89c843e52645d1974cd9.

SET_CONFIG_REG writes to memory if register shadowing is enabled,
causing a VM fault.

NGG streamout is unstable anyway, so all UMDs should use legacy
streamout. I think Mesa is the only driver using NGG streamout.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  1 -
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 12 +---
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index df8a23554831..f6ac1e9548f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -25,21 +25,20 @@
 #define __AMDGPU_GDS_H__
 
 struct amdgpu_ring;
 struct amdgpu_bo;
 
 struct amdgpu_gds {
uint32_t gds_size;
uint32_t gws_size;
uint32_t oa_size;
uint32_t gds_compute_max_wave_id;
-   uint32_t vgt_gs_max_wave_id;
 };
 
 struct amdgpu_gds_reg_offset {
uint32_tmem_base;
uint32_tmem_size;
uint32_tgws;
uint32_toa;
 };
 
 #endif /* __AMDGPU_GDS_H__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 618291df659b..e3823c8e9850 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4269,29 +4269,20 @@ static void gfx_v10_0_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
 }
 
 static void gfx_v10_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 header, control = 0;
 
-   /* Prevent a hw deadlock due to a wave ID mismatch between ME and GDS.
-* This resets the wave ID counters. (needed by transform feedback)
-* TODO: This might only be needed on a VMID switch when we change
-*   the GDS OA mapping, not sure.
-*/
-   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
-   amdgpu_ring_write(ring, mmVGT_GS_MAX_WAVE_ID);
-   amdgpu_ring_write(ring, ring->adev->gds.vgt_gs_max_wave_id);
-
if (ib->flags & AMDGPU_IB_FLAG_CE)
header = PACKET3(PACKET3_INDIRECT_BUFFER_CNST, 2);
else
header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
 
control |= ib->length_dw | (vmid << 24);
 
if (amdgpu_mcbp && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
control |= INDIRECT_BUFFER_PRE_ENB(1);
 
@@ -5023,21 +5014,21 @@ static const struct amdgpu_ring_funcs 
gfx_v10_0_ring_funcs_gfx = {
 */
5 + /* COND_EXEC */
7 + /* HDP_flush */
4 + /* VGT_flush */
14 + /* CE_META */
31 + /* DE_META */
3 + /* CNTX_CTRL */
5 + /* HDP_INVL */
8 + 8 + /* FENCE x2 */
2, /* SWITCH_BUFFER */
-   .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_gfx */
+   .emit_ib_size = 4, /* gfx_v10_0_ring_emit_ib_gfx */
.emit_ib = gfx_v10_0_ring_emit_ib_gfx,
.emit_fence = gfx_v10_0_ring_emit_fence,
.emit_pipeline_sync = gfx_v10_0_ring_emit_pipeline_sync,
.emit_vm_flush = gfx_v10_0_ring_emit_vm_flush,
.emit_gds_switch = gfx_v10_0_ring_emit_gds_switch,
.emit_hdp_flush = gfx_v10_0_ring_emit_hdp_flush,
.test_ring = gfx_v10_0_ring_test_ring,
.test_ib = gfx_v10_0_ring_test_ib,
.insert_nop = amdgpu_ring_insert_nop,
.pad_ib = amdgpu_ring_generic_pad_ib,
@@ -5175,21 +5166,20 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
 }
 
 static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
default:
adev->gds.gds_size = 0x1;
adev->gds.gds_compute_max_wave_id = 0x4ff;
-   adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
 
adev->gds.gws_size = 64;
adev->gds.oa_size = 16;
 }
 
 static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device 
*adev,
  u32 bitmap)
 {
-- 
2.17.1


Re: [PATCH] drm/amdgpu: reserve GDS resources on the gfx ring for gfx10

2019-07-17 Thread Marek Olšák
On Wed., Jul. 17, 2019, 10:43 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> On 2019-07-17 at 16:27, Marek Olšák wrote:
>
>
>
> On Wed., Jul. 17, 2019, 03:15 Christian König, <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> On 2019-07-17 at 02:06, Marek Olšák wrote:
>> > From: Marek Olšák 
>> >
>> > Hopefully we'll only use 1 gfx ring, because otherwise we'd have to have
>> > separate GDS buffers for each gfx ring.
>> >
>> > This is a workaround to ensure stability of transform feedback. Shaders
>> hang
>> > waiting for a GDS instruction (ds_sub, not ds_ordered_count).
>> >
>> > The disadvantage is that compute IBs might get a different VMID,
>> > because now gfx always has GDS and compute doesn't.
>>
>> So far compute is only using GWS, but I don't think that those
>> reservations are a good idea in general.
>>
>> How severe is the ENOMEM problem you see with using an explicit GWS
>> allocation?
>>
>
> I guess you mean GDS or OA.
>
>
> Yeah, just a typo. Compute is using GWS and we want to use GDS and OA here.
>
> There is no ENOMEM, it just hangs. I don't know why. The shader is waiting
> for ds_sub and can't continue, but GDS is idle.
>
>
> Well, could it be because we don't correctly handle non-zero offsets or
> stuff like that?
>

I don't know what you mean.


> Does it work with this hack when you don't allocate GDS/OA from the start?
> (Just allocate it twice or something like this).
>

It's only allocated once by the kernel with this hack.

Marek


> Christian.
>
>
> Marek
>
>
>> Regards,
>> Christian.
>>
>> >
>> > Signed-off-by: Marek Olšák 
>> > ---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 10 ++
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  6 ++
>> >   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 20 
>> >   4 files changed, 37 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> > index 4b514a44184c..cbd55d061b72 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> > @@ -456,20 +456,21 @@ struct amdgpu_cs_parser {
>> >   struct drm_file *filp;
>> >   struct amdgpu_ctx   *ctx;
>> >
>> >   /* chunks */
>> >   unsignednchunks;
>> >   struct amdgpu_cs_chunk  *chunks;
>> >
>> >   /* scheduler job object */
>> >   struct amdgpu_job   *job;
>> >   struct drm_sched_entity *entity;
>> > + unsignedhw_ip;
>> >
>> >   /* buffer objects */
>> >   struct ww_acquire_ctx   ticket;
>> >   struct amdgpu_bo_list   *bo_list;
>> >   struct amdgpu_mn*mn;
>> >   struct amdgpu_bo_list_entry vm_pd;
>> >   struct list_headvalidated;
>> >   struct dma_fence*fence;
>> >   uint64_tbytes_moved_threshold;
>> >   uint64_tbytes_moved_vis_threshold;
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> > index c691df6f7a57..9125cd69a124 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> > @@ -678,20 +678,28 @@ static int amdgpu_cs_parser_bos(struct
>> amdgpu_cs_parser *p,
>> >   if (r)
>> >   goto error_validate;
>> >
>> >   amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
>> >p->bytes_moved_vis);
>> >
>> >   gds = p->bo_list->gds_obj;
>> >   gws = p->bo_list->gws_obj;
>> >   oa = p->bo_list->oa_obj;
>> >
>> > + if (p->hw_ip == AMDGPU_HW_IP_GFX) {
>> > + /* Only gfx10 allocates these. */
>> > + if (!gds)
>> > + gds = p->adev->gds.gds_gfx_bo;
>> > + if (!oa)
>> > + oa = p->adev->gds.oa_gfx_bo;
>> > + }
>> > +
>> >   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
>> >   struct amdgpu_b

Re: [PATCH] drm/amdgpu: reserve GDS resources on the gfx ring for gfx10

2019-07-17 Thread Marek Olšák
On Wed., Jul. 17, 2019, 03:15 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> On 2019-07-17 at 02:06, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > Hopefully we'll only use 1 gfx ring, because otherwise we'd have to have
> > separate GDS buffers for each gfx ring.
> >
> > This is a workaround to ensure stability of transform feedback. Shaders
> hang
> > waiting for a GDS instruction (ds_sub, not ds_ordered_count).
> >
> > The disadvantage is that compute IBs might get a different VMID,
> > because now gfx always has GDS and compute doesn't.
>
> So far compute is only using GWS, but I don't think that those
> reservations are a good idea in general.
>
> How severe is the ENOMEM problem you see with using an explicit GWS
> allocation?
>

I guess you mean GDS or OA.

There is no ENOMEM, it just hangs. I don't know why. The shader is waiting
for ds_sub and can't continue, but GDS is idle.

Marek


> Regards,
> Christian.
>
> >
> > Signed-off-by: Marek Olšák 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 10 ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  6 ++
> >   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 20 
> >   4 files changed, 37 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index 4b514a44184c..cbd55d061b72 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -456,20 +456,21 @@ struct amdgpu_cs_parser {
> >   struct drm_file *filp;
> >   struct amdgpu_ctx   *ctx;
> >
> >   /* chunks */
> >   unsignednchunks;
> >   struct amdgpu_cs_chunk  *chunks;
> >
> >   /* scheduler job object */
> >   struct amdgpu_job   *job;
> >   struct drm_sched_entity *entity;
> > + unsignedhw_ip;
> >
> >   /* buffer objects */
> >   struct ww_acquire_ctx   ticket;
> >   struct amdgpu_bo_list   *bo_list;
> >   struct amdgpu_mn*mn;
> >   struct amdgpu_bo_list_entry vm_pd;
> >   struct list_headvalidated;
> >   struct dma_fence*fence;
> >   uint64_tbytes_moved_threshold;
> >   uint64_tbytes_moved_vis_threshold;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index c691df6f7a57..9125cd69a124 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -678,20 +678,28 @@ static int amdgpu_cs_parser_bos(struct
> amdgpu_cs_parser *p,
> >   if (r)
> >   goto error_validate;
> >
> >   amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
> >p->bytes_moved_vis);
> >
> >   gds = p->bo_list->gds_obj;
> >   gws = p->bo_list->gws_obj;
> >   oa = p->bo_list->oa_obj;
> >
> > + if (p->hw_ip == AMDGPU_HW_IP_GFX) {
> > + /* Only gfx10 allocates these. */
> > + if (!gds)
> > + gds = p->adev->gds.gds_gfx_bo;
> > + if (!oa)
> > + oa = p->adev->gds.oa_gfx_bo;
> > + }
> > +
> >   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> >   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> >
> >   /* Make sure we use the exclusive slot for shared BOs */
> >   if (bo->prime_shared_count)
> >   e->tv.num_shared = 0;
> >   e->bo_va = amdgpu_vm_bo_find(vm, bo);
> >   }
> >
> >   if (gds) {
> > @@ -954,20 +962,22 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device
> *adev,
> >   struct drm_amdgpu_cs_chunk_ib *chunk_ib;
> >   struct drm_sched_entity *entity;
> >
> >   chunk = &parser->chunks[i];
> >   ib = &parser->job->ibs[j];
> >   chunk_ib = (struct drm_amdgpu_cs_chunk_ib *)chunk->kdata;
> >
> >   if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB)
> >   continue;
> >
> > + parser->hw_ip = chunk_ib->ip_type;
> > +
> >   if (chunk_ib->ip_ty

[PATCH] drm/amdgpu: reserve GDS resources on the gfx ring for gfx10

2019-07-16 Thread Marek Olšák
From: Marek Olšák 

Hopefully we'll only use 1 gfx ring, because otherwise we'd have to have
separate GDS buffers for each gfx ring.

This is a workaround to ensure stability of transform feedback. Shaders hang
waiting for a GDS instruction (ds_sub, not ds_ordered_count).

The disadvantage is that compute IBs might get a different VMID,
because now gfx always has GDS and compute doesn't.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  6 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 20 
 4 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4b514a44184c..cbd55d061b72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -456,20 +456,21 @@ struct amdgpu_cs_parser {
struct drm_file *filp;
struct amdgpu_ctx   *ctx;
 
/* chunks */
unsignednchunks;
struct amdgpu_cs_chunk  *chunks;
 
/* scheduler job object */
struct amdgpu_job   *job;
struct drm_sched_entity *entity;
+   unsignedhw_ip;
 
/* buffer objects */
struct ww_acquire_ctx   ticket;
struct amdgpu_bo_list   *bo_list;
struct amdgpu_mn*mn;
struct amdgpu_bo_list_entry vm_pd;
struct list_headvalidated;
struct dma_fence*fence;
uint64_tbytes_moved_threshold;
uint64_tbytes_moved_vis_threshold;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index c691df6f7a57..9125cd69a124 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -678,20 +678,28 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
*p,
if (r)
goto error_validate;
 
amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
 p->bytes_moved_vis);
 
gds = p->bo_list->gds_obj;
gws = p->bo_list->gws_obj;
oa = p->bo_list->oa_obj;
 
+   if (p->hw_ip == AMDGPU_HW_IP_GFX) {
+   /* Only gfx10 allocates these. */
+   if (!gds)
+   gds = p->adev->gds.gds_gfx_bo;
+   if (!oa)
+   oa = p->adev->gds.oa_gfx_bo;
+   }
+
amdgpu_bo_list_for_each_entry(e, p->bo_list) {
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
 
/* Make sure we use the exclusive slot for shared BOs */
if (bo->prime_shared_count)
e->tv.num_shared = 0;
e->bo_va = amdgpu_vm_bo_find(vm, bo);
}
 
if (gds) {
@@ -954,20 +962,22 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
struct drm_amdgpu_cs_chunk_ib *chunk_ib;
struct drm_sched_entity *entity;
 
chunk = &parser->chunks[i];
ib = &parser->job->ibs[j];
chunk_ib = (struct drm_amdgpu_cs_chunk_ib *)chunk->kdata;
 
if (chunk->chunk_id != AMDGPU_CHUNK_ID_IB)
continue;
 
+   parser->hw_ip = chunk_ib->ip_type;
+
if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
(amdgpu_mcbp || amdgpu_sriov_vf(adev))) {
if (chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
if (chunk_ib->flags & AMDGPU_IB_FLAG_CE)
ce_preempt++;
else
de_preempt++;
}
 
/* each GFX command submit allows 0 or 1 IB preemptible 
for CE & DE */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index df8a23554831..0943b8e1d97e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -26,20 +26,26 @@
 
 struct amdgpu_ring;
 struct amdgpu_bo;
 
 struct amdgpu_gds {
uint32_t gds_size;
uint32_t gws_size;
uint32_t oa_size;
uint32_t gds_compute_max_wave_id;
uint32_t vgt_gs_max_wave_id;
+
+   /* Reserved OA for the gfx ring. (gfx10) */
+   uint32_t gds_gfx_reservation_size;
+   uint32_t oa_gfx_reservation_size;
+   struct amdgpu_bo *gds_gfx_bo;
+   struct amdgpu_bo *oa_gfx_bo;
 };
 
 struct amdgpu_gds_reg_offset {
uint32_tmem_base;
uint32_tmem_size;
uint32_tgws;
uint32_toa;
 };
 
 #endif /* __

Re: [PATCH] drm/amdgpu/gfx10: set SH_MEM_CONFIG.INITIAL_INST_PREFETCH

2019-07-15 Thread Marek Olšák
The patch doesn't apply. Can you rebase it?

Thanks,
Marek

On Fri, Jul 12, 2019 at 9:47 AM Haehnle, Nicolai 
wrote:

> Prefetch mode 0 is not supported and can lead to hangs with certain very
> specific code patterns. Set a sound prefetch mode for all VMIDs rather
> than forcing all shaders to set the prefetch mode at the beginning.
>
> Reduce code duplication a bit while we're at it. Note that the 64-bit
> address mode enum and the retry all enum are both 0, so the only
> functional change is in the INITIAL_INST_PREFETCH field.
>
> Signed-off-by: Nicolai Hähnle 
> --
> I haven't been able to properly test this yet, but it is the right thing
> to be doing in principle.
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 27 ++
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0d94c812df1b..b8498c359191 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -157,20 +157,27 @@ static const struct soc15_reg_golden
> golden_settings_gc_10_1_1[] =
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmTA_CNTL_AUX, 0xfff7,
> 0x0103),
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CNTL, 0x6010, 0x479c0010),
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmUTCL1_CTRL, 0x0080,
> 0x0080),
>  };
>
>  static const struct soc15_reg_golden golden_settings_gc_10_1_nv14[] =
>  {
> /* Pending on emulation bring up */
>  };
>
> +#define DEFAULT_SH_MEM_CONFIG \
> +   ((SH_MEM_ADDRESS_MODE_64 << SH_MEM_CONFIG__ADDRESS_MODE__SHIFT) | \
> +(SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
> SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \
> +(SH_MEM_RETRY_MODE_ALL << SH_MEM_CONFIG__RETRY_MODE__SHIFT) | \
> +(3 << SH_MEM_CONFIG__INITIAL_INST_PREFETCH__SHIFT))
> +
> +
>  static void gfx_v10_0_set_ring_funcs(struct amdgpu_device *adev);
>  static void gfx_v10_0_set_irq_funcs(struct amdgpu_device *adev);
>  static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev);
>  static void gfx_v10_0_set_rlc_funcs(struct amdgpu_device *adev);
>  static int gfx_v10_0_get_cu_info(struct amdgpu_device *adev,
>   struct amdgpu_cu_info *cu_info);
>  static uint64_t gfx_v10_0_get_gpu_clock_counter(struct amdgpu_device
> *adev);
>  static void gfx_v10_0_select_se_sh(struct amdgpu_device *adev, u32 se_num,
>u32 sh_num, u32 instance);
>  static u32 gfx_v10_0_get_wgp_active_bitmap_per_sh(struct amdgpu_device
> *adev);
> @@ -1476,40 +1483,35 @@ static u32
> gfx_v10_0_init_pa_sc_tile_steering_override(struct amdgpu_device *ade
> return pa_sc_tile_steering_override;
>  }
>
>  #define DEFAULT_SH_MEM_BASES   (0x6000)
>  #define FIRST_COMPUTE_VMID (8)
>  #define LAST_COMPUTE_VMID  (16)
>
>  static void gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
>  {
> int i;
> -   uint32_t sh_mem_config;
> uint32_t sh_mem_bases;
>
> /*
>  * Configure apertures:
>  * LDS: 0x6000' - 0x6001' (4GB)
>  * Scratch: 0x6001' - 0x6002' (4GB)
>  * GPUVM:   0x6001' - 0x6002' (1TB)
>  */
> sh_mem_bases = DEFAULT_SH_MEM_BASES | (DEFAULT_SH_MEM_BASES << 16);
>
> -   sh_mem_config = SH_MEM_ADDRESS_MODE_64 |
> -   SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
> -   SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT;
> -
> mutex_lock(&adev->srbm_mutex);
> for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
> nv_grbm_select(adev, 0, 0, 0, i);
> /* CP and shaders */
> -   WREG32_SOC15(GC, 0, mmSH_MEM_CONFIG, sh_mem_config);
> +   WREG32_SOC15(GC, 0, mmSH_MEM_CONFIG,
> DEFAULT_SH_MEM_CONFIG);
> WREG32_SOC15(GC, 0, mmSH_MEM_BASES, sh_mem_bases);
> }
> nv_grbm_select(adev, 0, 0, 0, 0);
> mutex_unlock(&adev->srbm_mutex);
>  }
>
>  static void gfx_v10_0_tcp_harvest(struct amdgpu_device *adev)
>  {
> int i, j, k;
> int max_wgp_per_sh = adev->gfx.config.max_cu_per_sh >> 1;
> @@ -1590,31 +1592,22 @@ static void gfx_v10_0_constants_init(struct
> amdgpu_device *adev)
> gfx_v10_0_get_cu_info(adev, &adev->gfx.cu_info);
> adev->gfx.config.pa_sc_tile_steering_override =
> gfx_v10_0_init_pa_sc_tile_steering_override(adev);
>
> /* XXX SH_MEM regs */
> /* where to put LDS, scratch, GPUVM in FSA64 space */
> mutex_lock(&adev->srbm_mutex);
> for (i = 0; i < adev->vm_manager.id_mgr[AMDGPU_GFXHUB].num_ids;
> i++) {
> nv_grbm_select(adev, 0, 0, 0, i);
> /* CP and shaders */
> -   if (i == 0) {
> -   tmp = REG_SET_FIELD(0, SH_MEM_CONFIG,
> ALIGNMENT_MODE,
> -
>  SH_MEM_ALIGNMENT_MODE_UN

Re: [PATCH] drm/amdgpu/gfx10: set SH_MEM_CONFIG.INITIAL_INST_PREFETCH

2019-07-12 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, Jul 12, 2019 at 9:47 AM Haehnle, Nicolai 
wrote:

> Prefetch mode 0 is not supported and can lead to hangs with certain very
> specific code patterns. Set a sound prefetch mode for all VMIDs rather
> than forcing all shaders to set the prefetch mode at the beginning.
>
> Reduce code duplication a bit while we're at it. Note that the 64-bit
> address mode enum and the retry all enum are both 0, so the only
> functional change is in the INITIAL_INST_PREFETCH field.
>
> Signed-off-by: Nicolai Hähnle 
> --
> I haven't been able to properly test this yet, but it is the right thing
> to be doing in principle.
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 27 ++
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0d94c812df1b..b8498c359191 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -157,20 +157,27 @@ static const struct soc15_reg_golden
> golden_settings_gc_10_1_1[] =
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmTA_CNTL_AUX, 0xfff7,
> 0x0103),
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CNTL, 0x6010, 0x479c0010),
> SOC15_REG_GOLDEN_VALUE(GC, 0, mmUTCL1_CTRL, 0x0080,
> 0x0080),
>  };
>
>  static const struct soc15_reg_golden golden_settings_gc_10_1_nv14[] =
>  {
> /* Pending on emulation bring up */
>  };
>
> +#define DEFAULT_SH_MEM_CONFIG \
> +   ((SH_MEM_ADDRESS_MODE_64 << SH_MEM_CONFIG__ADDRESS_MODE__SHIFT) | \
> +(SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
> SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \
> +(SH_MEM_RETRY_MODE_ALL << SH_MEM_CONFIG__RETRY_MODE__SHIFT) | \
> +(3 << SH_MEM_CONFIG__INITIAL_INST_PREFETCH__SHIFT))
> +
> +
>  static void gfx_v10_0_set_ring_funcs(struct amdgpu_device *adev);
>  static void gfx_v10_0_set_irq_funcs(struct amdgpu_device *adev);
>  static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev);
>  static void gfx_v10_0_set_rlc_funcs(struct amdgpu_device *adev);
>  static int gfx_v10_0_get_cu_info(struct amdgpu_device *adev,
>   struct amdgpu_cu_info *cu_info);
>  static uint64_t gfx_v10_0_get_gpu_clock_counter(struct amdgpu_device
> *adev);
>  static void gfx_v10_0_select_se_sh(struct amdgpu_device *adev, u32 se_num,
>u32 sh_num, u32 instance);
>  static u32 gfx_v10_0_get_wgp_active_bitmap_per_sh(struct amdgpu_device
> *adev);
> @@ -1476,40 +1483,35 @@ static u32
> gfx_v10_0_init_pa_sc_tile_steering_override(struct amdgpu_device *ade
> return pa_sc_tile_steering_override;
>  }
>
>  #define DEFAULT_SH_MEM_BASES   (0x6000)
>  #define FIRST_COMPUTE_VMID (8)
>  #define LAST_COMPUTE_VMID  (16)
>
>  static void gfx_v10_0_init_compute_vmid(struct amdgpu_device *adev)
>  {
> int i;
> -   uint32_t sh_mem_config;
> uint32_t sh_mem_bases;
>
> /*
>  * Configure apertures:
>  * LDS: 0x6000' - 0x6001' (4GB)
>  * Scratch: 0x6001' - 0x6002' (4GB)
>  * GPUVM:   0x6001' - 0x6002' (1TB)
>  */
> sh_mem_bases = DEFAULT_SH_MEM_BASES | (DEFAULT_SH_MEM_BASES << 16);
>
> -   sh_mem_config = SH_MEM_ADDRESS_MODE_64 |
> -   SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
> -   SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT;
> -
> mutex_lock(&adev->srbm_mutex);
> for (i = FIRST_COMPUTE_VMID; i < LAST_COMPUTE_VMID; i++) {
> nv_grbm_select(adev, 0, 0, 0, i);
> /* CP and shaders */
> -   WREG32_SOC15(GC, 0, mmSH_MEM_CONFIG, sh_mem_config);
> +   WREG32_SOC15(GC, 0, mmSH_MEM_CONFIG,
> DEFAULT_SH_MEM_CONFIG);
> WREG32_SOC15(GC, 0, mmSH_MEM_BASES, sh_mem_bases);
> }
> nv_grbm_select(adev, 0, 0, 0, 0);
> mutex_unlock(&adev->srbm_mutex);
>  }
>
>  static void gfx_v10_0_tcp_harvest(struct amdgpu_device *adev)
>  {
> int i, j, k;
> int max_wgp_per_sh = adev->gfx.config.max_cu_per_sh >> 1;
> @@ -1590,31 +1592,22 @@ static void gfx_v10_0_constants_init(struct
> amdgpu_device *adev)
> gfx_v10_0_get_cu_info(adev, &adev->gfx.cu_info);
> adev->gfx.config.pa_sc_tile_steering_override =
> gfx_v10_0_init_pa_sc_tile_steering_override(adev);
>
> /* XXX SH_MEM regs */
&

Re: [PATCH] drm/amdgpu: don't invalidate caches in RELEASE_MEM, only do the writeback

2019-07-08 Thread Marek Olšák
ping

On Tue, Jul 2, 2019 at 2:29 PM Marek Olšák  wrote:

> From: Marek Olšák 
>
> This RELEASE_MEM use has the Release semantic, which means we should write
> back but not invalidate. Invalidations only make sense with the Acquire
> semantic (ACQUIRE_MEM), or when RELEASE_MEM is used to do the combined
> Acquire-Release semantic, which is a barrier, not a fence.
>
> The undesirable side effect of doing invalidations for the Release semantic
> is that it invalidates caches while shaders are running, because the
> Release
> can execute in the middle of the next IB.
>
> UMDs should use ACQUIRE_MEM at the beginning of IBs. Doing cache
> invalidations for a fence (like in this case) doesn't do anything
> for correctness.
>
> Signed-off-by: Marek Olšák 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 210d24511dc6..a30f5d4913b9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -4296,25 +4296,21 @@ static void gfx_v10_0_ring_emit_fence(struct
> amdgpu_ring *ring, u64 addr,
> bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
>
> /* Interrupt not work fine on GFX10.1 model yet. Use fallback
> instead */
> if (adev->pdev->device == 0x50)
> int_sel = false;
>
> /* RELEASE_MEM - flush caches, send int */
> amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
> amdgpu_ring_write(ring, (PACKET3_RELEASE_MEM_GCR_SEQ |
>  PACKET3_RELEASE_MEM_GCR_GL2_WB |
> -PACKET3_RELEASE_MEM_GCR_GL2_INV |
> -PACKET3_RELEASE_MEM_GCR_GL2_US |
> -PACKET3_RELEASE_MEM_GCR_GL1_INV |
> -PACKET3_RELEASE_MEM_GCR_GLV_INV |
> -PACKET3_RELEASE_MEM_GCR_GLM_INV |
> +PACKET3_RELEASE_MEM_GCR_GLM_INV | /* must
> be set with GLM_WB */
>  PACKET3_RELEASE_MEM_GCR_GLM_WB |
>  PACKET3_RELEASE_MEM_CACHE_POLICY(3) |
>
>  PACKET3_RELEASE_MEM_EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
>  PACKET3_RELEASE_MEM_EVENT_INDEX(5)));
> amdgpu_ring_write(ring, (PACKET3_RELEASE_MEM_DATA_SEL(write64bit ?
> 2 : 1) |
>  PACKET3_RELEASE_MEM_INT_SEL(int_sel ? 2 :
> 0)));
>
> /*
>  * the address should be Qword aligned if 64bit write, Dword
>  * aligned if only send 32bit data low (discard data high)
> --
> 2.17.1
>
>

Re: The problem "ring gfx timeout" are experienced yet another AMD GPU Vega 8 user

2019-07-03 Thread Marek Olšák
It looks like memory corruption. You can try to disable IOMMU in the BIOS.

Marek

On Tue, Jul 2, 2019 at 12:07 AM Mikhail Gavrilov <
mikhail.v.gavri...@gmail.com> wrote:

> On Wed, 27 Feb 2019 at 00:57, Marek Olšák  wrote:
> >
> > Sadly, the logs don't contain any clue as to why it hangs.
> >
> > It would be helpful to check if the hang can be reproduced on Vega 56 or
> 64 as well.
> >
> > Marek
> >
>
> Hi, Marek.
>
> I'm sorry to trouble you.
> But today the user of the Vega 8 graphics described above sent me fresh logs.
>
> Current versions: kernel 5.1.15 / DRM 3.30.0 / Mesa 19.0. / LLVM 8.0.0
>
> I uploaded all logs to the MEGA cloud storage.
> Can you look at these logs, please?
>
> https://mega.nz/#F!Mt5mhKiI!8Sv2T5a6yTxBqVknhH1NjA
>
>
> --
> Best Regards,
> Mike Gavrilov.
>

[ANNOUNCE] libdrm 2.4.99

2019-07-02 Thread Marek Olšák
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


Adrian Salido (1):
  libdrm: reduce number of reallocations in drmModeAtomicAddProperty

Chunming Zhou (9):
  add cs chunk for syncobj timeline
  add timeline wait/query ioctl v2
  wrap syncobj timeline query/wait APIs for amdgpu v3
  add timeline signal/transfer ioctls v2
  expose timeline signal/export/import interfaces v2
  wrap transfer interfaces
  add syncobj timeline tests v3
  update drm.h
  enable syncobj test depending on capability

Hawking Zhang (1):
  libdrm/amdgpu: add new member in drm_amdgpu_device_info for navi10

Hemant Hariyani (1):
  libdrm: omap: Add DRM_RDWR flag to dmabuf export

Huang Rui (1):
  amdgpu: add navi family id

Ilia Mirkin (11):
  util: add C8 format, support it with SMPTE pattern
  util: fix MAKE_RGBA macro for 10bpp modes
  util: add gradient pattern
  util: add fp16 format support
  util: add cairo drawing for 30bpp formats when available
  modetest: don't pretend that atomic mode includes a format
  modetest: add an add_property_optional variant that does not print errors
  modetest: add C8 support to generate SMPTE pattern
  modetest: add the ability to specify fill patterns on the commandline
  modetest: add FP16 format support
  util: fix include path for drm_mode.h

John Stultz (2):
  libdrm: Android.mk: Add minimal Android platform check
  libdrm: amdgpu: Initialize unions with memset rather than "= {0}"

Leo Liu (1):
  tests/amdgpu/vcn: add VCN2.0 decode support

Lucas Stach (1):
  etnaviv: drop etna_bo_from_handle symbol

Marek Olšák (1):
  Bump version to 2.4.99

Marek Vasut (1):
  etnaviv: Fix double-free in etna_bo_cache_free()

Michel Dänzer (6):
  amdgpu: Add amdgpu_cs_syncobj_transfer to amdgpu-symbol-check
  amdgpu: Move union declaration to top of amdgpu_cs_ctx_override_priority
  amdgpu: Update amdgpu_bo_handle_type_kms_noimport documentation
  amdgpu: Pass file descriptor directly to amdgpu_close_kms_handle
  amdgpu: Add BO handle to table in amdgpu_bo_create
  amdgpu: Rename fd_mutex/list to dev_mutex/list

Prabhanjan Kandula (1):
  libdrm: Avoid additional drm open close

Sean Paul (1):
  libdrm: Use mmap64 instead of __mmap2

Seung-Woo Kim (2):
  tests/libkms-test-plane: fix possbile memory leak
  xf86drm: Fix possible memory leak with drmModeGetPropertyPtr()

Tao Zhou (1):
  libdrm/amdgpu: add new vram type (GDDR6) for navi10

git tag: libdrm-2.4.99

https://dri.freedesktop.org/libdrm/libdrm-2.4.99.tar.bz2
MD5:  72539626815b35159a63d45bc4c14ee6  libdrm-2.4.99.tar.bz2
SHA1: e15a3fcc2d321b03d233a245a8593abde7feefd4  libdrm-2.4.99.tar.bz2
SHA256: 4dbf539c7ed25dbb2055090b77ab87508fc46be39a9379d15fed4b5517e1da5e  
libdrm-2.4.99.tar.bz2
SHA512: 
04702eebe8dca97fac61653623804fdcb0b8b3714bdc6f5e72f0dfdce9c9524cf16f69d37aa9feac79ddc1c11939be44a216484563a612414668ea5eaeadf191
  libdrm-2.4.99.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.99.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.99.tar.gz
MD5:  4c6951cfe4094805fe1f1cb39f5dbfc2  libdrm-2.4.99.tar.gz
SHA1: 402b6b1c2db1a6b754a4ecb5775ecc74d02541e8  libdrm-2.4.99.tar.gz
SHA256: 597fb879e2f45193431a0d352d10cd79ef61a24ab31f44320168583e10cb6302  
libdrm-2.4.99.tar.gz
SHA512: 
f0f90b8d115897600d0882a82b35f825558f40189f798dcbbdfd7b5f1b9096b38aa44ebafc9bc439c5e968b0bd56a886b5367706785f980983f79138c0943644
  libdrm-2.4.99.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.99.tar.gz.sig

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEzUfFNBo3XzO+97r6/dFdWs7w8rEFAl0bpQYACgkQ/dFdWs7w
8rE7Zgf/R6BzBoY9gvGMwKeCmgRogH8CiBrNcyLrdGzahu2UcfCvc5j2nmMY7aUR
VN1/JPag5t3VWs/V2Oufd0YroYzj4swC8hkj29XXZU1wg7VJNBc0uDoF22jE9mpN
X8+34YUStTrBWjAHZ/SAVuBh152ppczP7isfAlEm+xZd2PcbV20Efmr8JVWjmpJV
2DeAes38E8uL4T/meeWOEEZVQjcA7CaTJnQzv0qnSWUI7PfjtFzlcubRbRj+BOmb
pjVrvjFFbv0B4gzUsJ8r13thWewDNNJGsMVlv7cVLRrgSkJbxEQ351h/XJVWunTb
tcaCfu0ODS+ADO63lDMllNqswrXe8A==
=TU5/
-----END PGP SIGNATURE-----

[PATCH] drm/amdgpu: don't invalidate caches in RELEASE_MEM, only do the writeback

2019-07-02 Thread Marek Olšák
From: Marek Olšák 

This RELEASE_MEM use has the Release semantic, which means we should write
back but not invalidate. Invalidations only make sense with the Acquire
semantic (ACQUIRE_MEM), or when RELEASE_MEM is used to do the combined
Acquire-Release semantic, which is a barrier, not a fence.

The undesirable side effect of doing invalidations for the Release semantic
is that it invalidates caches while shaders are running, because the Release
can execute in the middle of the next IB.

UMDs should use ACQUIRE_MEM at the beginning of IBs. Doing cache
invalidations for a fence (like in this case) doesn't do anything
for correctness.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 210d24511dc6..a30f5d4913b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4296,25 +4296,21 @@ static void gfx_v10_0_ring_emit_fence(struct 
amdgpu_ring *ring, u64 addr,
bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
 
/* Interrupt not work fine on GFX10.1 model yet. Use fallback instead */
if (adev->pdev->device == 0x50)
int_sel = false;
 
/* RELEASE_MEM - flush caches, send int */
amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
amdgpu_ring_write(ring, (PACKET3_RELEASE_MEM_GCR_SEQ |
 PACKET3_RELEASE_MEM_GCR_GL2_WB |
-PACKET3_RELEASE_MEM_GCR_GL2_INV |
-PACKET3_RELEASE_MEM_GCR_GL2_US |
-PACKET3_RELEASE_MEM_GCR_GL1_INV |
-PACKET3_RELEASE_MEM_GCR_GLV_INV |
-PACKET3_RELEASE_MEM_GCR_GLM_INV |
+PACKET3_RELEASE_MEM_GCR_GLM_INV | /* must be 
set with GLM_WB */
 PACKET3_RELEASE_MEM_GCR_GLM_WB |
 PACKET3_RELEASE_MEM_CACHE_POLICY(3) |
 
PACKET3_RELEASE_MEM_EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
 PACKET3_RELEASE_MEM_EVENT_INDEX(5)));
amdgpu_ring_write(ring, (PACKET3_RELEASE_MEM_DATA_SEL(write64bit ? 2 : 
1) |
 PACKET3_RELEASE_MEM_INT_SEL(int_sel ? 2 : 0)));
 
/*
 * the address should be Qword aligned if 64bit write, Dword
 * aligned if only send 32bit data low (discard data high)
-- 
2.17.1


Re: [PATCH 2/2] drm/amdgpu: handle AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID on gfx10

2019-06-28 Thread Marek Olšák
Thanks. I'll push both patches with emit_ib_size updated for this patch.

Marek

On Thu, Jun 27, 2019 at 3:50 AM zhoucm1  wrote:

> any reason for not caring about .emit_ib_size in this one?
>
> -David
>
>
> > On 2019-06-27 06:35, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > Signed-off-by: Marek Olšák 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 +
> >   1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > index 6baaa65a1daa..5b807a19bbbf 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> > @@ -4257,20 +4257,36 @@ static void gfx_v10_0_ring_emit_ib_gfx(struct
> amdgpu_ring *ring,
> >   }
> >
> >   static void gfx_v10_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
> >  struct amdgpu_job *job,
> >  struct amdgpu_ib *ib,
> >  uint32_t flags)
> >   {
> >   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
> >   u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
> >
> > + /* Currently, there is a high possibility to get wave ID mismatch
> > +  * between ME and GDS, leading to a hw deadlock, because ME
> generates
> > +  * different wave IDs than the GDS expects. This situation happens
> > +  * randomly when at least 5 compute pipes use GDS ordered append.
> > +  * The wave IDs generated by ME are also wrong after
> suspend/resume.
> > +  * Those are probably bugs somewhere else in the kernel driver.
> > +  *
> > +  * Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME
> and
> > +  * GDS to 0 for this ring (me/pipe).
> > +  */
> > + if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
> > + amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG,
> 1));
> > + amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID);
> > + amdgpu_ring_write(ring,
> ring->adev->gds.gds_compute_max_wave_id);
> > + }
> > +
> >   amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
> >   BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
> >   amdgpu_ring_write(ring,
> >   #ifdef __BIG_ENDIAN
> >   (2 << 0) |
> >   #endif
> >   lower_32_bits(ib->gpu_addr));
> >   amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
> >   amdgpu_ring_write(ring, control);
> >   }
> > @@ -5103,20 +5119,21 @@ static void gfx_v10_0_set_rlc_funcs(struct
> amdgpu_device *adev)
> >   }
> >   }
> >
> >   static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
> >   {
> >   /* init asic gds info */
> >   switch (adev->asic_type) {
> >   case CHIP_NAVI10:
> >   default:
> >   adev->gds.gds_size = 0x1;
> > + adev->gds.gds_compute_max_wave_id = 0x4ff;
> >   adev->gds.vgt_gs_max_wave_id = 0x3ff;
> >   break;
> >   }
> >
> >   adev->gds.gws_size = 64;
> >   adev->gds.oa_size = 16;
> >   }
> >
> >   static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct
> amdgpu_device *adev,
> > u32 bitmap)
>
>

[PATCH 2/2] drm/amdgpu: handle AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID on gfx10

2019-06-26 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 6baaa65a1daa..5b807a19bbbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4257,20 +4257,36 @@ static void gfx_v10_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v10_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
 
+   /* Currently, there is a high possibility to get wave ID mismatch
+* between ME and GDS, leading to a hw deadlock, because ME generates
+* different wave IDs than the GDS expects. This situation happens
+* randomly when at least 5 compute pipes use GDS ordered append.
+* The wave IDs generated by ME are also wrong after suspend/resume.
+* Those are probably bugs somewhere else in the kernel driver.
+*
+* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+* GDS to 0 for this ring (me/pipe).
+*/
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, 
ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
amdgpu_ring_write(ring,
 #ifdef __BIG_ENDIAN
(2 << 0) |
 #endif
lower_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, control);
 }
@@ -5103,20 +5119,21 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
}
 }
 
 static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
default:
adev->gds.gds_size = 0x1;
+   adev->gds.gds_compute_max_wave_id = 0x4ff;
adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
 
adev->gds.gws_size = 64;
adev->gds.oa_size = 16;
 }
 
 static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device 
*adev,
  u32 bitmap)
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/2] drm/amdgpu: fix transform feedback GDS hang on gfx10 (v2)

2019-06-26 Thread Marek Olšák
From: Marek Olšák 

v2: update emit_ib_size
(though it's still wrong because it was wrong before)

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 14 +++---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index dad2186f4ed5..df8a23554831 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -24,21 +24,22 @@
 #ifndef __AMDGPU_GDS_H__
 #define __AMDGPU_GDS_H__
 
 struct amdgpu_ring;
 struct amdgpu_bo;
 
 struct amdgpu_gds {
uint32_t gds_size;
uint32_t gws_size;
uint32_t oa_size;
-   uint32_tgds_compute_max_wave_id;
+   uint32_t gds_compute_max_wave_id;
+   uint32_t vgt_gs_max_wave_id;
 };
 
 struct amdgpu_gds_reg_offset {
uint32_tmem_base;
uint32_tmem_size;
uint32_tgws;
uint32_toa;
 };
 
 #endif /* __AMDGPU_GDS_H__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 16b2bcc590e7..6baaa65a1daa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4211,20 +4211,29 @@ static void gfx_v10_0_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
 }
 
 static void gfx_v10_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 header, control = 0;
 
+   /* Prevent a hw deadlock due to a wave ID mismatch between ME and GDS.
+* This resets the wave ID counters. (needed by transform feedback)
+* TODO: This might only be needed on a VMID switch when we change
+*   the GDS OA mapping, not sure.
+*/
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmVGT_GS_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, ring->adev->gds.vgt_gs_max_wave_id);
+
if (ib->flags & AMDGPU_IB_FLAG_CE)
header = PACKET3(PACKET3_INDIRECT_BUFFER_CNST, 2);
else
header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
 
control |= ib->length_dw | (vmid << 24);
 
if (amdgpu_mcbp && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
control |= INDIRECT_BUFFER_PRE_ENB(1);
 
@@ -4944,21 +4953,21 @@ static const struct amdgpu_ring_funcs 
gfx_v10_0_ring_funcs_gfx = {
 */
5 + /* COND_EXEC */
7 + /* HDP_flush */
4 + /* VGT_flush */
14 + /* CE_META */
31 + /* DE_META */
3 + /* CNTX_CTRL */
5 + /* HDP_INVL */
8 + 8 + /* FENCE x2 */
2, /* SWITCH_BUFFER */
-   .emit_ib_size = 4, /* gfx_v10_0_ring_emit_ib_gfx */
+   .emit_ib_size = 7, /* gfx_v10_0_ring_emit_ib_gfx */
.emit_ib = gfx_v10_0_ring_emit_ib_gfx,
.emit_fence = gfx_v10_0_ring_emit_fence,
.emit_pipeline_sync = gfx_v10_0_ring_emit_pipeline_sync,
.emit_vm_flush = gfx_v10_0_ring_emit_vm_flush,
.emit_gds_switch = gfx_v10_0_ring_emit_gds_switch,
.emit_hdp_flush = gfx_v10_0_ring_emit_hdp_flush,
.test_ring = gfx_v10_0_ring_test_ring,
.test_ib = gfx_v10_0_ring_test_ib,
.insert_nop = amdgpu_ring_insert_nop,
.pad_ib = amdgpu_ring_generic_pad_ib,
@@ -5092,24 +5101,23 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
default:
break;
}
 }
 
 static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
-   adev->gds.gds_size = 0x1;
-   break;
default:
adev->gds.gds_size = 0x1;
+   adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
 
adev->gds.gws_size = 64;
adev->gds.oa_size = 16;
 }
 
 static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device 
*adev,
  u32 bitmap)
 {
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/2] drm/amdgpu: fix transform feedback GDS hang on gfx10

2019-06-19 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 12 ++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index dad2186f4ed5..df8a23554831 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -24,21 +24,22 @@
 #ifndef __AMDGPU_GDS_H__
 #define __AMDGPU_GDS_H__
 
 struct amdgpu_ring;
 struct amdgpu_bo;
 
 struct amdgpu_gds {
uint32_t gds_size;
uint32_t gws_size;
uint32_t oa_size;
-   uint32_tgds_compute_max_wave_id;
+   uint32_t gds_compute_max_wave_id;
+   uint32_t vgt_gs_max_wave_id;
 };
 
 struct amdgpu_gds_reg_offset {
uint32_tmem_base;
uint32_tmem_size;
uint32_tgws;
uint32_toa;
 };
 
 #endif /* __AMDGPU_GDS_H__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 0090cba2d24d..75a34779a57c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4213,20 +4213,29 @@ static void gfx_v10_0_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
 }
 
 static void gfx_v10_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 header, control = 0;
 
+   /* Prevent a hw deadlock due to a wave ID mismatch between ME and GDS.
+* This resets the wave ID counters. (needed by transform feedback)
+* TODO: This might only be needed on a VMID switch when we change
+*   the GDS OA mapping, not sure.
+*/
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmVGT_GS_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, ring->adev->gds.vgt_gs_max_wave_id);
+
if (ib->flags & AMDGPU_IB_FLAG_CE)
header = PACKET3(PACKET3_INDIRECT_BUFFER_CNST, 2);
else
header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
 
control |= ib->length_dw | (vmid << 24);
 
if (amdgpu_mcbp && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
control |= INDIRECT_BUFFER_PRE_ENB(1);
 
@@ -5094,24 +5103,23 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
default:
break;
}
 }
 
 static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
-   adev->gds.gds_size = 0x1;
-   break;
default:
adev->gds.gds_size = 0x1;
+   adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
 
adev->gds.gws_size = 64;
adev->gds.oa_size = 16;
 }
 
 static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device 
*adev,
  u32 bitmap)
 {
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 2/2] drm/amdgpu: handle AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID on gfx10

2019-06-19 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 75a34779a57c..77507b2a4652 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4259,20 +4259,36 @@ static void gfx_v10_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v10_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
 
+   /* Currently, there is a high possibility to get wave ID mismatch
+* between ME and GDS, leading to a hw deadlock, because ME generates
+* different wave IDs than the GDS expects. This situation happens
+* randomly when at least 5 compute pipes use GDS ordered append.
+* The wave IDs generated by ME are also wrong after suspend/resume.
+* Those are probably bugs somewhere else in the kernel driver.
+*
+* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+* GDS to 0 for this ring (me/pipe).
+*/
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID);
+   amdgpu_ring_write(ring, 
ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
BUG_ON(ib->gpu_addr & 0x3); /* Dword align */
amdgpu_ring_write(ring,
 #ifdef __BIG_ENDIAN
(2 << 0) |
 #endif
lower_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr));
amdgpu_ring_write(ring, control);
 }
@@ -5105,20 +5121,21 @@ static void gfx_v10_0_set_rlc_funcs(struct 
amdgpu_device *adev)
}
 }
 
 static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asic gds info */
switch (adev->asic_type) {
case CHIP_NAVI10:
default:
adev->gds.gds_size = 0x1;
+   adev->gds.gds_compute_max_wave_id = 0x4ff;
adev->gds.vgt_gs_max_wave_id = 0x3ff;
break;
}
 
adev->gds.gws_size = 64;
adev->gds.oa_size = 16;
 }
 
 static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct amdgpu_device 
*adev,
  u32 bitmap)
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: bump the DRM version for GDS ENOMEM fixes

2019-06-04 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1f38d6fc1fe3..f9462ad2a314 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -69,23 +69,24 @@
  * - 3.23.0 - Add query for VRAM lost counter
  * - 3.24.0 - Add high priority compute support for gfx9
  * - 3.25.0 - Add support for sensor query info (stable pstate sclk/mclk).
  * - 3.26.0 - GFX9: Process AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE.
  * - 3.27.0 - Add new chunk to to AMDGPU_CS to enable BO_LIST creation.
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
  * - 3.31.0 - Add support for per-flip tiling attribute changes with DC
  * - 3.32.0 - Add syncobj timeline support to AMDGPU_CS.
+ * - 3.33.0 - Fixes for GDS ENOMEM failures in AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   32
+#define KMS_DRIVER_MINOR   33
 #define KMS_DRIVER_PATCHLEVEL  0
 
 #define AMDGPU_MAX_TIMEOUT_PARAM_LENTH 256
 
 int amdgpu_vram_limit = 0;
 int amdgpu_vis_vram_limit = 0;
 int amdgpu_gart_size = -1; /* auto */
 int amdgpu_gtt_size = -1; /* auto */
 int amdgpu_moverate = -1; /* auto */
 int amdgpu_benchmarking = 0;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: bump the DRM version for GDS ENOMEM fixes

2019-06-04 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 1f38d6fc1fe3..7daa2a8f1c08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -69,20 +69,21 @@
  * - 3.23.0 - Add query for VRAM lost counter
  * - 3.24.0 - Add high priority compute support for gfx9
  * - 3.25.0 - Add support for sensor query info (stable pstate sclk/mclk).
  * - 3.26.0 - GFX9: Process AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE.
  * - 3.27.0 - Add new chunk to to AMDGPU_CS to enable BO_LIST creation.
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
  * - 3.31.0 - Add support for per-flip tiling attribute changes with DC
  * - 3.32.0 - Add syncobj timeline support to AMDGPU_CS.
+ * - 3.33.0 - Fixes for GDS ENOMEM failures in AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
 #define KMS_DRIVER_MINOR   32
 #define KMS_DRIVER_PATCHLEVEL  0
 
 #define AMDGPU_MAX_TIMEOUT_PARAM_LENTH 256
 
 int amdgpu_vram_limit = 0;
 int amdgpu_vis_vram_limit = 0;
 int amdgpu_gart_size = -1; /* auto */
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-14 Thread Marek Olšák
This series fixes the OOM errors. However, if I torture the kernel driver
more, I can get it to deadlock and end up with unkillable processes. I can
also get an OOM error. I just ran the test 5 times:

AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears &
AMD_DEBUG=testgdsmm glxgears & AMD_DEBUG=testgdsmm glxgears &
AMD_DEBUG=testgdsmm glxgears

Marek

On Tue, May 14, 2019 at 8:31 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> This avoids OOM situations when we have lots of threads
> submitting at the same time.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index fff558cf385b..f9240a94217b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct
> amdgpu_cs_parser *p,
> }
>
> r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
> -  &duplicates, true);
> +  &duplicates, false);
> if (unlikely(r != 0)) {
> if (r != -ERESTARTSYS)
> DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: remove static GDS, GWS and OA allocation

2019-05-10 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Fri, May 10, 2019 at 1:58 PM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> As far as we know this was never used by userspace and so should be
> removed.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  6 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h | 21 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 11 +++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 24 ++---
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 32 +++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 32 +++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 39 -
>  7 files changed, 28 insertions(+), 137 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> index 5c79da8e1150..d497467b7fc6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
> @@ -81,9 +81,9 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev,
> struct drm_file *filp,
> return -ENOMEM;
>
> kref_init(&list->refcount);
> -   list->gds_obj = adev->gds.gds_gfx_bo;
> -   list->gws_obj = adev->gds.gws_gfx_bo;
> -   list->oa_obj = adev->gds.oa_gfx_bo;
> +   list->gds_obj = NULL;
> +   list->gws_obj = NULL;
> +   list->oa_obj = NULL;
>
> array = amdgpu_bo_list_array_entry(list, 0);
> memset(array, 0, num_entries * sizeof(struct
> amdgpu_bo_list_entry));
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> index f89f5734d985..dad2186f4ed5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> @@ -27,26 +27,11 @@
>  struct amdgpu_ring;
>  struct amdgpu_bo;
>
> -struct amdgpu_gds_asic_info {
> -   uint32_ttotal_size;
> -   uint32_tgfx_partition_size;
> -   uint32_tcs_partition_size;
> -};
> -
>  struct amdgpu_gds {
> -   struct amdgpu_gds_asic_info mem;
> -   struct amdgpu_gds_asic_info gws;
> -   struct amdgpu_gds_asic_info oa;
> +   uint32_t gds_size;
> +   uint32_t gws_size;
> +   uint32_t oa_size;
> uint32_tgds_compute_max_wave_id;
> -
> -   /* At present, GDS, GWS and OA resources for gfx (graphics)
> -* is always pre-allocated and available for graphics operation.
> -* Such resource is shared between all gfx clients.
> -* TODO: move this operation to user space
> -* */
> -   struct amdgpu_bo*   gds_gfx_bo;
> -   struct amdgpu_bo*   gws_gfx_bo;
> -   struct amdgpu_bo*   oa_gfx_bo;
>  };
>
>  struct amdgpu_gds_reg_offset {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index da7b4fe8ade3..87a93874d71e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -590,13 +590,10 @@ static int amdgpu_info_ioctl(struct drm_device *dev,
> void *data, struct drm_file
> struct drm_amdgpu_info_gds gds_info;
>
> memset(&gds_info, 0, sizeof(gds_info));
> -   gds_info.gds_gfx_partition_size =
> adev->gds.mem.gfx_partition_size;
> -   gds_info.compute_partition_size =
> adev->gds.mem.cs_partition_size;
> -   gds_info.gds_total_size = adev->gds.mem.total_size;
> -   gds_info.gws_per_gfx_partition =
> adev->gds.gws.gfx_partition_size;
> -   gds_info.gws_per_compute_partition =
> adev->gds.gws.cs_partition_size;
> -   gds_info.oa_per_gfx_partition =
> adev->gds.oa.gfx_partition_size;
> -   gds_info.oa_per_compute_partition =
> adev->gds.oa.cs_partition_size;
> +   gds_info.compute_partition_size = adev->gds.gds_size;
> +   gds_info.gds_total_size = adev->gds.gds_size;
> +   gds_info.gws_per_compute_partition = adev->gds.gws_size;
> +   gds_info.oa_per_compute_partition = adev->gds.oa_size;
> return copy_to_user(out, &gds_info,
> min((size_t)size, sizeof(gds_info))) ?
> -EFAULT : 0;
> }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c14198737dcd..b25922e3d1ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1758

Re: Userptr broken with the latest amdgpu driver

2019-04-08 Thread Marek Olšák
Indeed, you're right.

Marek

On Mon, Apr 8, 2019 at 4:59 PM Yang, Philip  wrote:

> Hi Marek,
>
> I guess you are using an old kernel config with a 5.x kernel, and the
> kernel config option CONFIG_HMM is missing because its dependency
> CONFIG_ZONE_DEVICE is missing in the old config file. Please update your
> kernel config file to enable the CONFIG_ZONE_DEVICE option.
>
> You should have this message in dmesg log:
>
> "HMM_MIRROR kernel config option is not enabled, "
> "add CONFIG_ZONE_DEVICE=y in config file to fix
>
> Philip
>
>
> On 2019-04-08 4:46 p.m., Marek Olšák wrote:
> > Hi,
> >
> > amdgpu_mn_register fails with -ENODEV in amdgpu_gem_userptr_ioctl.
> >
> > The regression happened within the last 1-2 months.
> >
> > Marek
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
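
For reference, the fix described by Philip above is a kernel configuration
change, not a code change. A minimal .config fragment along those lines
(option names taken from the thread; exact dependencies can differ between
kernel versions) would be:

  CONFIG_ZONE_DEVICE=y
  CONFIG_HMM=y
  CONFIG_HMM_MIRROR=y

With CONFIG_ZONE_DEVICE enabled, the HMM options become selectable again and
amdgpu_mn_register should stop failing with -ENODEV.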

Userptr broken with the latest amdgpu driver

2019-04-08 Thread Marek Olšák
Hi,

amdgpu_mn_register fails with -ENODEV in amdgpu_gem_userptr_ioctl.

The regression happened within the last 1-2 months.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH xf86-video-amdgpu] Allow changing DCC parameters between flips

2019-04-02 Thread Marek Olšák
As you probably noticed, I don't use gitlab for my own patches yet.

Marek

On Fri, Mar 1, 2019 at 3:52 AM Michel Dänzer  wrote:

>
> Thanks Marek for the patch, but xf86-video-amdgpu patches are being
> reviewed as GitLab merge requests since the last release[0].
>
> I'll create a merge request with this patch and some follow-up changes.
>
>
> [0] Isn't README.md clear enough on this?
>
> --
> Earthling Michel Dänzer   |  https://www.amd.com
> Libre software enthusiast | Mesa and X developer
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH xf86-video-amdgpu] Allow changing DCC parameters between flips

2019-02-28 Thread Marek Olšák
From: Marek Olšák 

---
 src/amdgpu_present.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/amdgpu_present.c b/src/amdgpu_present.c
index ce88bd8f..f4fc6ebd 100644
--- a/src/amdgpu_present.c
+++ b/src/amdgpu_present.c
@@ -271,26 +271,34 @@ amdgpu_present_check_flip(RRCrtcPtr crtc, WindowPtr 
window, PixmapPtr pixmap,
return FALSE;
 
if (info->drmmode.dri2_flipping)
return FALSE;
 
 #if XORG_VERSION_CURRENT <= XORG_VERSION_NUMERIC(1, 20, 99, 1, 0)
if (pixmap->devKind != screen_pixmap->devKind)
return FALSE;
 #endif
 
+   uint64_t tiling_info1 = amdgpu_pixmap_get_tiling_info(pixmap);
+   uint64_t tiling_info2 = amdgpu_pixmap_get_tiling_info(screen_pixmap);
+
/* The kernel driver doesn't handle flipping between BOs with different
-* tiling parameters correctly yet
+* tiling parameters correctly yet except DCC.
 */
-   if (amdgpu_pixmap_get_tiling_info(pixmap) !=
-   amdgpu_pixmap_get_tiling_info(screen_pixmap))
-   return FALSE;
+   if (info->family >= AMDGPU_FAMILY_AI) {
+   if (AMDGPU_TILING_GET(tiling_info1, SWIZZLE_MODE) !=
+   AMDGPU_TILING_GET(tiling_info2, SWIZZLE_MODE))
+   return FALSE;
+   } else {
+   if (tiling_info1 != tiling_info2)
+   return FALSE;
+   }
 
for (i = 0, num_crtcs_on = 0; i < config->num_crtc; i++) {
if (drmmode_crtc_can_flip(config->crtc[i]))
num_crtcs_on++;
else if (config->crtc[i] == crtc->devPrivate)
return FALSE;
}
 
if (num_crtcs_on == 0)
return FALSE;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates

2019-02-28 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Thu, Feb 28, 2019 at 9:59 AM Nicholas Kazlauskas <
nicholas.kazlaus...@amd.com> wrote:

> To help xf86-video-amdgpu and mesa know DC supports updating the
> tiling attributes for a framebuffer per-flip.
>
> Cc: Michel Dänzer 
> Cc: Marek Olšák 
> Signed-off-by: Nicholas Kazlauskas 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 223013ef8466..ae4e3eeb4ae2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -74,9 +74,10 @@
>   * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
>   * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
>   * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
> + * - 3.31.0 - Add support for per-flip tiling attribute changes with DC
>   */
>  #define KMS_DRIVER_MAJOR   3
> -#define KMS_DRIVER_MINOR   30
> +#define KMS_DRIVER_MINOR   31
>  #define KMS_DRIVER_PATCHLEVEL  0
>
>  int amdgpu_vram_limit = 0;
> --
> 2.17.1
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: The problem "ring gfx timeout" is experienced by yet another AMD GPU Vega 8 user

2019-02-26 Thread Marek Olšák
Sadly, the logs don't contain any clue as to why it hangs.

It would be helpful to check if the hang can be reproduced on Vega 56 or 64
as well.

Marek

On Tue, Feb 26, 2019 at 7:51 AM Mikhail Gavrilov <
mikhail.v.gavri...@gmail.com> wrote:

> On Tue, 26 Feb 2019 at 00:40, Marek Olšák  wrote:
> >
> > Some shaders are stuck at "s_load_dwordx4 s[32:35], s[36:37], 0x0", but
> that might mean all sorts of things.
> >
> > Do you also have the dmesg log?
> >
> > Marek
>
> All files together here:
> https://mega.nz/#F!c4RwAYDJ!0ds-bVIftIDV4KCQOaDIsw
>
> --
> Best Regards,
> Mike Gavrilov.
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: The problem "ring gfx timeout" is experienced by yet another AMD GPU Vega 8 user

2019-02-25 Thread Marek Olšák
Some shaders are stuck at "s_load_dwordx4 s[32:35], s[36:37], 0x0", but
that might mean all sorts of things.

Do you also have the dmesg log?

Marek

On Sat, Feb 9, 2019 at 12:20 PM Mikhail Gavrilov <
mikhail.v.gavri...@gmail.com> wrote:

> On Sat, 9 Feb 2019 at 22:01, Marek Olšák  wrote:
> >
> > I don't see any attachments here.
> >
> > Marek
>
>
> --
> Best Regards,
> Mike Gavrilov.
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: The problem "ring gfx timeout" is experienced by yet another AMD GPU Vega 8 user

2019-02-09 Thread Marek Olšák
I don't see any attachments here.

Marek

On Sat, Feb 9, 2019, 11:37 AM Grodzovsky, Andrey wrote:

> +Marek
>
> Can't find the last fence seqno from mmCP_EOP_LAST_FENCE_LO in gfx ring
> dump (probably that seqno wasn't really the last if the register was
> dumped several times before) but since waves were dumped could be some
> shader issue. Marek, could you please give it a quick look ?
>
> Andrey
>
> On 2/9/19 7:53 AM, Mikhail Gavrilov wrote:
> > Hi Andrey,
> > in our Linux chat yet another AMD GPU user complains on problem with
> > `ring gfx timeout`.
> > He said that problem happens when he played in the game "Hearts of Iron
> 4".
> > His config:
> > - APU: Ryzen 2200G
> > - Kernel: 4.20.6
> > - LLVM: 7.0.0
> > - MESA: 18.2.8
> >
> > All logs which he collected with UMR I attach it here.
> >
> > Can you look please what happened with his GPU?
> >
> > --
> > Best Regards,
> > Mike Gavrilov.
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add a workaround for GDS ordered append hangs with compute queues

2019-02-04 Thread Marek Olšák
FYI, when I push this, I'll bump the DRM version.

Marek

On Mon, Feb 4, 2019 at 10:55 AM Marek Olšák  wrote:

> On Mon, Feb 4, 2019 at 7:42 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> At least from coding style, backward compatibility etc.. this looks sane
>> to me, so feel free to add an Acked-by.
>>
>> But I absolutely can't judge if that is correct from the hardware point
>> of view or not.
>>
>
> Our GDS docs say that writing the register resets the wave counter.
>
>
>>
>> And I think that somebody else looking at this is mandatory for it to be
>> committed.
>>
>
> There won't be anybody else. Nobody here really understands GDS, nobody
> here really cares about GDS, and hw guys weren't helpful with the hangs. If
> I didn't discover this by luck, GDS OA would be unusable on Linux.
>
> Marek
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add a workaround for GDS ordered append hangs with compute queues

2019-02-04 Thread Marek Olšák
On Mon, Feb 4, 2019 at 7:42 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> At least from coding style, backward compatibility etc.. this looks sane
> to me, so feel free to add an Acked-by.
>
> But I absolutely can't judge if that is correct from the hardware point of
> view or not.
>

Our GDS docs say that writing the register resets the wave counter.


>
> And I think that somebody else looking at this is mandatory for it to be
> committed.
>

There won't be anybody else. Nobody here really understands GDS, nobody
here really cares about GDS, and hw guys weren't helpful with the hangs. If
I didn't discover this by luck, GDS OA would be unusable on Linux.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
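
To illustrate the userspace side of the workaround discussed above: a
process that uses GDS ordered append sets the new IB flag on its CS chunk,
and the kernel then emits the GDS_COMPUTE_MAX_WAVE_ID write before the IB.
The following is only a sketch against the raw amdgpu uapi; the rest of the
CS ioctl setup, error handling and the Mesa wrappers are assumed and are not
part of this thread.

#include <drm/amdgpu_drm.h>   /* uapi header; install path may differ */
#include <stdint.h>
#include <string.h>

/* Sketch: fill one IB chunk that asks the kernel to reset the GDS wave ID
 * counters for this me/pipe before the IB is executed. */
static void fill_gds_ib_chunk(struct drm_amdgpu_cs_chunk *chunk,
                              struct drm_amdgpu_cs_chunk_ib *ib,
                              uint64_t ib_va, uint32_t ib_bytes)
{
    memset(ib, 0, sizeof(*ib));
    ib->ip_type  = AMDGPU_HW_IP_COMPUTE;   /* or AMDGPU_HW_IP_GFX */
    ib->va_start = ib_va;
    ib->ib_bytes = ib_bytes;
    ib->flags    = AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID;

    chunk->chunk_id   = AMDGPU_CHUNK_ID_IB;
    chunk->length_dw  = sizeof(*ib) / 4;
    chunk->chunk_data = (uint64_t)(uintptr_t)ib;
}

Setting the flag only on submissions that actually use GDS ordered append
keeps the extra SET_CONFIG_REG packet out of the common path.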


Re: [PATCH v2] drm/amdgpu: Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES

2019-02-01 Thread Marek Olšák
Can you also bump the KMS version?

Thanks,
Marek

On Fri, Feb 1, 2019 at 3:09 PM Marek Olšák  wrote:

> Reviewed-by: Marek Olšák 
> Tested-by: Marek Olšák 
>
> On Fri, Feb 1, 2019 at 3:00 PM Andrey Grodzovsky <
> andrey.grodzov...@amd.com> wrote:
>
>> New chunk for dependency on start of job's execution instead on
>> the end. This is used for GPU deadlock prevention when
>> userspace uses mid-IB fences to wait for mid-IB work on other rings.
>>
>> v2: Fix typo in AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
>>
>> Signed-off-by: Andrey Grodzovsky 
>> Suggested-by: Christian Koenig 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +++-
>>  include/uapi/drm/amdgpu_drm.h  |  1 +
>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 1c49b82..3f21eca 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -214,6 +214,7 @@ static int amdgpu_cs_parser_init(struct
>> amdgpu_cs_parser *p, union drm_amdgpu_cs
>> case AMDGPU_CHUNK_ID_DEPENDENCIES:
>> case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
>> case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
>> +   case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
>> break;
>>
>> default:
>> @@ -1090,6 +1091,14 @@ static int amdgpu_cs_process_fence_dep(struct
>> amdgpu_cs_parser *p,
>>
>> fence = amdgpu_ctx_get_fence(ctx, entity,
>>  deps[i].handle);
>> +
>> +   if (chunk->chunk_id ==
>> AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES) {
>> +   struct drm_sched_fence *s_fence =
>> to_drm_sched_fence(fence);
>> +
>> +   dma_fence_put(fence);
>> +   fence = dma_fence_get(&s_fence->scheduled);
>> +   }
>> +
>> if (IS_ERR(fence)) {
>> r = PTR_ERR(fence);
>> amdgpu_ctx_put(ctx);
>> @@ -1177,7 +1186,8 @@ static int amdgpu_cs_dependencies(struct
>> amdgpu_device *adev,
>>
>> chunk = &p->chunks[i];
>>
>> -   if (chunk->chunk_id == AMDGPU_CHUNK_ID_DEPENDENCIES) {
>> +   if (chunk->chunk_id == AMDGPU_CHUNK_ID_DEPENDENCIES ||
>> +   chunk->chunk_id ==
>> AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES) {
>> r = amdgpu_cs_process_fence_dep(p, chunk);
>> if (r)
>> return r;
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index faaad04..43d03a2 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -526,6 +526,7 @@ struct drm_amdgpu_gem_va {
>>  #define AMDGPU_CHUNK_ID_SYNCOBJ_IN  0x04
>>  #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
>>  #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
>> +#define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES 0x07
>>
>>  struct drm_amdgpu_cs_chunk {
>> __u32   chunk_id;
>> --
>> 2.7.4
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2] drm/amdgpu: Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES

2019-02-01 Thread Marek Olšák
Reviewed-by: Marek Olšák 
Tested-by: Marek Olšák 

On Fri, Feb 1, 2019 at 3:00 PM Andrey Grodzovsky 
wrote:

> New chunk for dependency on start of job's execution instead on
> the end. This is used for GPU deadlock prevention when
> userspace uses mid-IB fences to wait for mid-IB work on other rings.
>
> v2: Fix typo in AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
>
> Signed-off-by: Andrey Grodzovsky 
> Suggested-by: Christian Koenig 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +++-
>  include/uapi/drm/amdgpu_drm.h  |  1 +
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 1c49b82..3f21eca 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -214,6 +214,7 @@ static int amdgpu_cs_parser_init(struct
> amdgpu_cs_parser *p, union drm_amdgpu_cs
> case AMDGPU_CHUNK_ID_DEPENDENCIES:
> case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
> case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
> +   case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
> break;
>
> default:
> @@ -1090,6 +1091,14 @@ static int amdgpu_cs_process_fence_dep(struct
> amdgpu_cs_parser *p,
>
> fence = amdgpu_ctx_get_fence(ctx, entity,
>  deps[i].handle);
> +
> +   if (chunk->chunk_id ==
> AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES) {
> +   struct drm_sched_fence *s_fence =
> to_drm_sched_fence(fence);
> +
> +   dma_fence_put(fence);
> +   fence = dma_fence_get(&s_fence->scheduled);
> +   }
> +
> if (IS_ERR(fence)) {
> r = PTR_ERR(fence);
> amdgpu_ctx_put(ctx);
> @@ -1177,7 +1186,8 @@ static int amdgpu_cs_dependencies(struct
> amdgpu_device *adev,
>
> chunk = &p->chunks[i];
>
> -   if (chunk->chunk_id == AMDGPU_CHUNK_ID_DEPENDENCIES) {
> +   if (chunk->chunk_id == AMDGPU_CHUNK_ID_DEPENDENCIES ||
> +   chunk->chunk_id ==
> AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES) {
> r = amdgpu_cs_process_fence_dep(p, chunk);
> if (r)
> return r;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index faaad04..43d03a2 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -526,6 +526,7 @@ struct drm_amdgpu_gem_va {
>  #define AMDGPU_CHUNK_ID_SYNCOBJ_IN  0x04
>  #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
>  #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
> +#define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES 0x07
>
>  struct drm_amdgpu_cs_chunk {
> __u32   chunk_id;
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
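
For completeness, the submitter's side of the new chunk is identical to an
ordinary dependency chunk except for the chunk id: the entries still name a
context, ring and fence, but the kernel waits only for the referenced job's
"scheduled" fence instead of its "finished" fence. A hedged sketch against
the raw uapi follows; how the fence sequence number is obtained and the rest
of the CS ioctl are assumed and are not part of this patch.

#include <drm/amdgpu_drm.h>   /* uapi header; install path may differ */
#include <stdint.h>
#include <string.h>

/* Sketch: build one chunk that makes this submission start once the
 * referenced job has been scheduled, not once it has completed. */
static void fill_scheduled_dep_chunk(struct drm_amdgpu_cs_chunk *chunk,
                                     struct drm_amdgpu_cs_chunk_dep *dep,
                                     uint32_t ctx_id, uint32_t ring,
                                     uint64_t fence_seq_no)
{
    memset(dep, 0, sizeof(*dep));
    dep->ip_type     = AMDGPU_HW_IP_GFX;
    dep->ip_instance = 0;
    dep->ring        = ring;
    dep->ctx_id      = ctx_id;
    dep->handle      = fence_seq_no;  /* same meaning as CHUNK_ID_DEPENDENCIES */

    chunk->chunk_id   = AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES;
    chunk->length_dw  = sizeof(*dep) / 4;
    chunk->chunk_data = (uint64_t)(uintptr_t)dep;
}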


Re: [PATCH] drm/amdgpu: add AMDGPU_IB_FLAG_GET_START_SYNCOBJ to expose scheduled fence

2019-01-29 Thread Marek Olšák
On Tue, Jan 29, 2019 at 3:01 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 28.01.19 um 22:52 schrieb Marek Olšák:
> > From: Marek Olšák 
> >
> > Normal syncobjs signal when an IB finishes. Start syncobjs signal when
> > an IB starts.
>
> That approach has quite a number of problems (for example you can't
> allocate memory at this point).
>

Even if I drop this patch, can you describe all the problems with it?
Andrey and I would like to understand this.

Thanks,
Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add AMDGPU_IB_FLAG_GET_START_SYNCOBJ to expose scheduled fence

2019-01-29 Thread Marek Olšák
On Tue, Jan 29, 2019, 3:01 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 28.01.19 um 22:52 schrieb Marek Olšák:
> > From: Marek Olšák 
> >
> > Normal syncobjs signal when an IB finishes. Start syncobjs signal when
> > an IB starts.
>
> That approach has quite a number of problems (for example you can't
> allocate memory at this point).
>
> Better add a flag that we should only sync on scheduling for a
> dependency/syncobj instead.
>

I don't understand. Can you give me an example of the interface and how the
implementation would look?

Thanks,
Marek


> Christian.
>
> >
> > Signed-off-by: Marek Olšák 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 18 ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 ++-
> >   include/uapi/drm/amdgpu_drm.h   | 13 -
> >   4 files changed, 33 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index d67f8b1dfe80..8e2f7e558bc9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -453,20 +453,21 @@ struct amdgpu_cs_parser {
> >   struct dma_fence*fence;
> >   uint64_tbytes_moved_threshold;
> >   uint64_tbytes_moved_vis_threshold;
> >   uint64_tbytes_moved;
> >   uint64_tbytes_moved_vis;
> >   struct amdgpu_bo_list_entry *evictable;
> >
> >   /* user fence */
> >   struct amdgpu_bo_list_entry uf_entry;
> >
> > + boolget_start_syncobj;
> >   unsigned num_post_dep_syncobjs;
> >   struct drm_syncobj **post_dep_syncobjs;
> >   };
> >
> >   static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
> > uint32_t ib_idx, int idx)
> >   {
> >   return p->job->ibs[ib_idx].ptr[idx];
> >   }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index 1c49b8266d69..917f3818c61c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -1022,20 +1022,23 @@ static int amdgpu_cs_ib_fill(struct
> amdgpu_device *adev,
> >   r = amdgpu_ctx_get_entity(parser->ctx, chunk_ib->ip_type,
> > chunk_ib->ip_instance,
> chunk_ib->ring,
> > &entity);
> >   if (r)
> >   return r;
> >
> >   if (chunk_ib->flags & AMDGPU_IB_FLAG_PREAMBLE)
> >   parser->job->preamble_status |=
> >   AMDGPU_PREAMBLE_IB_PRESENT;
> >
> > + if (chunk_ib->flags & AMDGPU_IB_FLAG_GET_START_SYNCOBJ)
> > + parser->get_start_syncobj = true;
> > +
> >   if (parser->entity && parser->entity != entity)
> >   return -EINVAL;
> >
> >   parser->entity = entity;
> >
> >   ring = to_amdgpu_ring(entity->rq->sched);
> >   r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
> >  chunk_ib->ib_bytes : 0, ib);
> >   if (r) {
> >   DRM_ERROR("Failed to get ib !\n");
> > @@ -1227,20 +1230,35 @@ static int amdgpu_cs_submit(struct
> amdgpu_cs_parser *p,
> >   amdgpu_mn_lock(p->mn);
> >   amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
> >   struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> >
> >   if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
> >   r = -ERESTARTSYS;
> >   goto error_abort;
> >   }
> >   }
> >
> > + if (p->get_start_syncobj) {
> > + struct drm_syncobj *syncobj;
> > +
> > + r = drm_syncobj_create(&syncobj, 0,
> > +&job->base.s_fence->scheduled);
> > + if (r)
> > + goto error_abort;
> > +
> > + r = drm_syncobj_get_handle(p->filp, syncobj,
> > +&cs->out.

[PATCH] drm/amdgpu: add AMDGPU_IB_FLAG_GET_START_SYNCOBJ to expose scheduled fence

2019-01-28 Thread Marek Olšák
From: Marek Olšák 

Normal syncobjs signal when an IB finishes. Start syncobjs signal when
an IB starts.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 18 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 ++-
 include/uapi/drm/amdgpu_drm.h   | 13 -
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d67f8b1dfe80..8e2f7e558bc9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -453,20 +453,21 @@ struct amdgpu_cs_parser {
struct dma_fence*fence;
uint64_tbytes_moved_threshold;
uint64_tbytes_moved_vis_threshold;
uint64_tbytes_moved;
uint64_tbytes_moved_vis;
struct amdgpu_bo_list_entry *evictable;
 
/* user fence */
struct amdgpu_bo_list_entry uf_entry;
 
+   boolget_start_syncobj;
unsigned num_post_dep_syncobjs;
struct drm_syncobj **post_dep_syncobjs;
 };
 
 static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
  uint32_t ib_idx, int idx)
 {
return p->job->ibs[ib_idx].ptr[idx];
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 1c49b8266d69..917f3818c61c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1022,20 +1022,23 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
r = amdgpu_ctx_get_entity(parser->ctx, chunk_ib->ip_type,
  chunk_ib->ip_instance, chunk_ib->ring,
  &entity);
if (r)
return r;
 
if (chunk_ib->flags & AMDGPU_IB_FLAG_PREAMBLE)
parser->job->preamble_status |=
AMDGPU_PREAMBLE_IB_PRESENT;
 
+   if (chunk_ib->flags & AMDGPU_IB_FLAG_GET_START_SYNCOBJ)
+   parser->get_start_syncobj = true;
+
if (parser->entity && parser->entity != entity)
return -EINVAL;
 
parser->entity = entity;
 
ring = to_amdgpu_ring(entity->rq->sched);
r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
   chunk_ib->ib_bytes : 0, ib);
if (r) {
DRM_ERROR("Failed to get ib !\n");
@@ -1227,20 +1230,35 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_mn_lock(p->mn);
amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
 
if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
r = -ERESTARTSYS;
goto error_abort;
}
}
 
+   if (p->get_start_syncobj) {
+   struct drm_syncobj *syncobj;
+
+   r = drm_syncobj_create(&syncobj, 0,
+  &job->base.s_fence->scheduled);
+   if (r)
+   goto error_abort;
+
+   r = drm_syncobj_get_handle(p->filp, syncobj,
+  &cs->out.start_syncobj);
+   if (r)
+   goto error_abort;
+   drm_syncobj_put(syncobj);
+   }
+
job->owner = p->filp;
p->fence = dma_fence_get(&job->base.s_fence->finished);
 
amdgpu_ctx_add_fence(p->ctx, entity, p->fence, &seq);
amdgpu_cs_post_dependencies(p);
 
if ((job->preamble_status & AMDGPU_PREAMBLE_IB_PRESENT) &&
!p->ctx->preamble_presented) {
job->preamble_status |= AMDGPU_PREAMBLE_IB_PRESENT_FIRST;
p->ctx->preamble_presented = true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index c806f984bcc5..a230a30722d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -64,23 +64,24 @@
  * - 3.18.0 - Export gpu always on cu bitmap
  * - 3.19.0 - Add support for UVD MJPEG decode
  * - 3.20.0 - Add support for local BOs
  * - 3.21.0 - Add DRM_AMDGPU_FENCE_TO_HANDLE ioctl
  * - 3.22.0 - Add DRM_AMDGPU_SCHED ioctl
  * - 3.23.0 - Add query for VRAM lost counter
  * - 3.24.0 - Add high priority compute support for gfx9
  * - 3.25.0 - Add support for sensor query info (stable pstate sclk/mclk).
  * - 3.26.0 - GFX9: Process AMD

Re: [PATCH] drm/amdgpu: clean up memory/GDS/GWS/OA alignment code

2019-01-28 Thread Marek Olšák
Ping

On Tue, Jan 22, 2019 at 4:45 PM Marek Olšák  wrote:

> From: Marek Olšák 
>
> - move all adjustments into one place
> - specify GDS/GWS/OA alignment in basic units of the heaps
> - it looks like GDS alignment was 1 instead of 4
>
> Signed-off-by: Marek Olšák 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  7 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  6 +++---
>  3 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index f4f00217546e..d21dd2f369da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -47,24 +47,20 @@ int amdgpu_gem_object_create(struct amdgpu_device
> *adev, unsigned long size,
>  u64 flags, enum ttm_bo_type type,
>  struct reservation_object *resv,
>  struct drm_gem_object **obj)
>  {
> struct amdgpu_bo *bo;
> struct amdgpu_bo_param bp;
> int r;
>
> memset(&bp, 0, sizeof(bp));
> *obj = NULL;
> -   /* At least align on page size */
> -   if (alignment < PAGE_SIZE) {
> -   alignment = PAGE_SIZE;
> -   }
>
> bp.size = size;
> bp.byte_align = alignment;
> bp.type = type;
> bp.resv = resv;
> bp.preferred_domain = initial_domain;
>  retry:
> bp.flags = flags;
> bp.domain = initial_domain;
> r = amdgpu_bo_create(adev, &bp, &bo);
> @@ -237,23 +233,20 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev,
> void *data,
> if (args->in.domains & (AMDGPU_GEM_DOMAIN_GDS |
> AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
> if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
> /* if gds bo is created from user space, it must be
>  * passed to bo list
>  */
> DRM_ERROR("GDS bo cannot be per-vm-bo\n");
> return -EINVAL;
> }
> flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
> -   /* GDS allocations must be DW aligned */
> -   if (args->in.domains & AMDGPU_GEM_DOMAIN_GDS)
> -   size = ALIGN(size, 4);
> }
>
> if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
> r = amdgpu_bo_reserve(vm->root.base.bo, false);
> if (r)
> return r;
>
> resv = vm->root.base.bo->tbo.resv;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 728e15e5d68a..fd9c4beeaaa4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -419,26 +419,34 @@ static int amdgpu_bo_do_create(struct amdgpu_device
> *adev,
> .interruptible = (bp->type != ttm_bo_type_kernel),
> .no_wait_gpu = false,
> .resv = bp->resv,
> .flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
> };
> struct amdgpu_bo *bo;
> unsigned long page_align, size = bp->size;
> size_t acc_size;
> int r;
>
> -   page_align = roundup(bp->byte_align, PAGE_SIZE) >> PAGE_SHIFT;
> -   if (bp->domain & (AMDGPU_GEM_DOMAIN_GDS | AMDGPU_GEM_DOMAIN_GWS |
> - AMDGPU_GEM_DOMAIN_OA))
> +   /* Note that GDS/GWS/OA allocates 1 page per byte/resource. */
> +   if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
> +   /* GWS and OA don't need any alignment. */
> +   page_align = bp->byte_align;
> size <<= PAGE_SHIFT;
> -   else
> +   } else if (bp->domain & AMDGPU_GEM_DOMAIN_GDS) {
> +   /* Both size and alignment must be a multiple of 4. */
> +   page_align = ALIGN(bp->byte_align, 4);
> +   size = ALIGN(size, 4) << PAGE_SHIFT;
> +   } else {
> +   /* Memory should be aligned at least to a page size. */
> +   page_align = ALIGN(bp->byte_align, PAGE_SIZE) >>
> PAGE_SHIFT;
> size = ALIGN(size, PAGE_SIZE);
> +   }
>
> if (!amdgpu_bo_validate_size(adev, size, bp->domain))
> return -ENOMEM;
>
> *bo_ptr = NULL;
>
> acc_size = ttm_bo_dma_acc_size

Re: [PATCH] drm/amdgpu: add a workaround for GDS ordered append hangs with compute queues

2019-01-28 Thread Marek Olšák
Ping

On Tue, Jan 22, 2019 at 3:05 PM Marek Olšák  wrote:

> From: Marek Olšák 
>
> I'm not increasing the DRM version because GDS isn't totally without bugs
> yet.
>
> v2: update emit_ib_size
>
> Signed-off-by: Marek Olšák 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  2 ++
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 19 +++-
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 21 +++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 40 +++--
>  include/uapi/drm/amdgpu_drm.h   |  5 
>  5 files changed, 82 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> index ecbcefe49a98..f89f5734d985 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> @@ -30,20 +30,22 @@ struct amdgpu_bo;
>  struct amdgpu_gds_asic_info {
> uint32_ttotal_size;
> uint32_tgfx_partition_size;
> uint32_tcs_partition_size;
>  };
>
>  struct amdgpu_gds {
> struct amdgpu_gds_asic_info mem;
> struct amdgpu_gds_asic_info gws;
> struct amdgpu_gds_asic_info oa;
> +   uint32_tgds_compute_max_wave_id;
> +
> /* At present, GDS, GWS and OA resources for gfx (graphics)
>  * is always pre-allocated and available for graphics operation.
>  * Such resource is shared between all gfx clients.
>  * TODO: move this operation to user space
>  * */
> struct amdgpu_bo*   gds_gfx_bo;
> struct amdgpu_bo*   gws_gfx_bo;
> struct amdgpu_bo*   oa_gfx_bo;
>  };
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> index 7984292f9282..a59e0fdf5a97 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> @@ -2257,20 +2257,36 @@ static void gfx_v7_0_ring_emit_ib_gfx(struct
> amdgpu_ring *ring,
>  }
>
>  static void gfx_v7_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
>   struct amdgpu_job *job,
>   struct amdgpu_ib *ib,
>   uint32_t flags)
>  {
> unsigned vmid = AMDGPU_JOB_GET_VMID(job);
> u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
>
> +   /* Currently, there is a high possibility to get wave ID mismatch
> +* between ME and GDS, leading to a hw deadlock, because ME
> generates
> +* different wave IDs than the GDS expects. This situation happens
> +* randomly when at least 5 compute pipes use GDS ordered append.
> +* The wave IDs generated by ME are also wrong after
> suspend/resume.
> +* Those are probably bugs somewhere else in the kernel driver.
> +*
> +* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME
> and
> +* GDS to 0 for this ring (me/pipe).
> +*/
> +   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
> +   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG,
> 1));
> +   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID -
> PACKET3_SET_CONFIG_REG_START);
> +   amdgpu_ring_write(ring,
> ring->adev->gds.gds_compute_max_wave_id);
> +   }
> +
> amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
> amdgpu_ring_write(ring,
>  #ifdef __BIG_ENDIAN
>   (2 << 0) |
>  #endif
>   (ib->gpu_addr & 0xFFFC));
> amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0x);
> amdgpu_ring_write(ring, control);
>  }
>
> @@ -4993,21 +5009,21 @@ static const struct amdgpu_ring_funcs
> gfx_v7_0_ring_funcs_compute = {
> .get_rptr = gfx_v7_0_ring_get_rptr,
> .get_wptr = gfx_v7_0_ring_get_wptr_compute,
> .set_wptr = gfx_v7_0_ring_set_wptr_compute,
> .emit_frame_size =
> 20 + /* gfx_v7_0_ring_emit_gds_switch */
> 7 + /* gfx_v7_0_ring_emit_hdp_flush */
> 5 + /* hdp invalidate */
> 7 + /* gfx_v7_0_ring_emit_pipeline_sync */
> CIK_FLUSH_GPU_TLB_NUM_WREG * 5 + 7 + /*
> gfx_v7_0_ring_emit_vm_flush */
> 7 + 7 + 7, /* gfx_v7_0_ring_emit_fence_compute x3 for user
> fence, vm fence */
> -   .emit_ib_size = 4, /* gfx_v7_0_ring_emit_ib_compute */
> +   .emit_ib_size = 7, /* gfx_v7_0_ring_emit_ib_co

[PATCH] drm/amdgpu: clean up memory/GDS/GWS/OA alignment code

2019-01-22 Thread Marek Olšák
From: Marek Olšák 

- move all adjustments into one place
- specify GDS/GWS/OA alignment in basic units of the heaps
- it looks like GDS alignment was 1 instead of 4

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  6 +++---
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index f4f00217546e..d21dd2f369da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -47,24 +47,20 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, 
unsigned long size,
 u64 flags, enum ttm_bo_type type,
 struct reservation_object *resv,
 struct drm_gem_object **obj)
 {
struct amdgpu_bo *bo;
struct amdgpu_bo_param bp;
int r;
 
memset(&bp, 0, sizeof(bp));
*obj = NULL;
-   /* At least align on page size */
-   if (alignment < PAGE_SIZE) {
-   alignment = PAGE_SIZE;
-   }
 
bp.size = size;
bp.byte_align = alignment;
bp.type = type;
bp.resv = resv;
bp.preferred_domain = initial_domain;
 retry:
bp.flags = flags;
bp.domain = initial_domain;
r = amdgpu_bo_create(adev, &bp, &bo);
@@ -237,23 +233,20 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void 
*data,
if (args->in.domains & (AMDGPU_GEM_DOMAIN_GDS |
AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
/* if gds bo is created from user space, it must be
 * passed to bo list
 */
DRM_ERROR("GDS bo cannot be per-vm-bo\n");
return -EINVAL;
}
flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
-   /* GDS allocations must be DW aligned */
-   if (args->in.domains & AMDGPU_GEM_DOMAIN_GDS)
-   size = ALIGN(size, 4);
}
 
if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
r = amdgpu_bo_reserve(vm->root.base.bo, false);
if (r)
return r;
 
resv = vm->root.base.bo->tbo.resv;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 728e15e5d68a..fd9c4beeaaa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -419,26 +419,34 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
.interruptible = (bp->type != ttm_bo_type_kernel),
.no_wait_gpu = false,
.resv = bp->resv,
.flags = TTM_OPT_FLAG_ALLOW_RES_EVICT
};
struct amdgpu_bo *bo;
unsigned long page_align, size = bp->size;
size_t acc_size;
int r;
 
-   page_align = roundup(bp->byte_align, PAGE_SIZE) >> PAGE_SHIFT;
-   if (bp->domain & (AMDGPU_GEM_DOMAIN_GDS | AMDGPU_GEM_DOMAIN_GWS |
- AMDGPU_GEM_DOMAIN_OA))
+   /* Note that GDS/GWS/OA allocates 1 page per byte/resource. */
+   if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
+   /* GWS and OA don't need any alignment. */
+   page_align = bp->byte_align;
size <<= PAGE_SHIFT;
-   else
+   } else if (bp->domain & AMDGPU_GEM_DOMAIN_GDS) {
+   /* Both size and alignment must be a multiple of 4. */
+   page_align = ALIGN(bp->byte_align, 4);
+   size = ALIGN(size, 4) << PAGE_SHIFT;
+   } else {
+   /* Memory should be aligned at least to a page size. */
+   page_align = ALIGN(bp->byte_align, PAGE_SIZE) >> PAGE_SHIFT;
size = ALIGN(size, PAGE_SIZE);
+   }
 
if (!amdgpu_bo_validate_size(adev, size, bp->domain))
return -ENOMEM;
 
*bo_ptr = NULL;
 
acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
   sizeof(struct amdgpu_bo));
 
bo = kzalloc(sizeof(struct amdgpu_bo), GFP_KERNEL);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b852abb9db0f..73e71e61dc99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1749,47 +1749,47 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 
/* Initialize various on-chip memory pools */
r = ttm_bo_init_mm(&adev->mman.bdev, AMDGPU_PL_GDS,
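
As a userspace-facing note on the cleanup above: GDS buffers are still
requested in bytes through GEM create, but after this patch both the size
and the alignment are expected to be multiples of 4, while GWS and OA are
counted in resources and need no alignment. A hedged sketch of the ioctl
arguments for a GDS allocation follows; the ioctl call itself and Mesa's
winsys wrappers are assumed and are not shown in the patch.

#include <drm/amdgpu_drm.h>   /* uapi header; install path may differ */
#include <stdint.h>
#include <string.h>

/* Sketch: request a small GDS buffer object. The kernel's TTM manager
 * internally counts GDS space at one byte per page unit, which is what
 * the size/page_align conversion in the patch above takes care of. */
static void fill_gds_create_args(union drm_amdgpu_gem_create *args,
                                 uint32_t gds_bytes)
{
    memset(args, 0, sizeof(*args));
    args->in.bo_size      = (gds_bytes + 3) & ~3u;  /* multiple of 4 */
    args->in.alignment    = 4;
    args->in.domains      = AMDGPU_GEM_DOMAIN_GDS;
    /* The kernel forces NO_CPU_ACCESS for GDS anyway; set it for clarity. */
    args->in.domain_flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
}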
 

[PATCH] drm/amdgpu: add a workaround for GDS ordered append hangs with compute queues

2019-01-22 Thread Marek Olšák
From: Marek Olšák 

I'm not increasing the DRM version because GDS isn't totally without bugs yet.

v2: update emit_ib_size

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 19 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 21 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 40 +++--
 include/uapi/drm/amdgpu_drm.h   |  5 
 5 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index ecbcefe49a98..f89f5734d985 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -30,20 +30,22 @@ struct amdgpu_bo;
 struct amdgpu_gds_asic_info {
uint32_ttotal_size;
uint32_tgfx_partition_size;
uint32_tcs_partition_size;
 };
 
 struct amdgpu_gds {
struct amdgpu_gds_asic_info mem;
struct amdgpu_gds_asic_info gws;
struct amdgpu_gds_asic_info oa;
+   uint32_tgds_compute_max_wave_id;
+
/* At present, GDS, GWS and OA resources for gfx (graphics)
 * is always pre-allocated and available for graphics operation.
 * Such resource is shared between all gfx clients.
 * TODO: move this operation to user space
 * */
struct amdgpu_bo*   gds_gfx_bo;
struct amdgpu_bo*   gws_gfx_bo;
struct amdgpu_bo*   oa_gfx_bo;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 7984292f9282..a59e0fdf5a97 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2257,20 +2257,36 @@ static void gfx_v7_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v7_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
  struct amdgpu_job *job,
  struct amdgpu_ib *ib,
  uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
 
+   /* Currently, there is a high possibility to get wave ID mismatch
+* between ME and GDS, leading to a hw deadlock, because ME generates
+* different wave IDs than the GDS expects. This situation happens
+* randomly when at least 5 compute pipes use GDS ordered append.
+* The wave IDs generated by ME are also wrong after suspend/resume.
+* Those are probably bugs somewhere else in the kernel driver.
+*
+* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+* GDS to 0 for this ring (me/pipe).
+*/
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID - PACKET3_SET_CONFIG_REG_START);
+   amdgpu_ring_write(ring, ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
amdgpu_ring_write(ring,
 #ifdef __BIG_ENDIAN
  (2 << 0) |
 #endif
  (ib->gpu_addr & 0xFFFFFFFC));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0xFFFFFFFF);
amdgpu_ring_write(ring, control);
 }
 
@@ -4993,21 +5009,21 @@ static const struct amdgpu_ring_funcs 
gfx_v7_0_ring_funcs_compute = {
.get_rptr = gfx_v7_0_ring_get_rptr,
.get_wptr = gfx_v7_0_ring_get_wptr_compute,
.set_wptr = gfx_v7_0_ring_set_wptr_compute,
.emit_frame_size =
20 + /* gfx_v7_0_ring_emit_gds_switch */
7 + /* gfx_v7_0_ring_emit_hdp_flush */
5 + /* hdp invalidate */
7 + /* gfx_v7_0_ring_emit_pipeline_sync */
CIK_FLUSH_GPU_TLB_NUM_WREG * 5 + 7 + /* gfx_v7_0_ring_emit_vm_flush */
7 + 7 + 7, /* gfx_v7_0_ring_emit_fence_compute x3 for user fence, vm fence */
-   .emit_ib_size = 4, /* gfx_v7_0_ring_emit_ib_compute */
+   .emit_ib_size = 7, /* gfx_v7_0_ring_emit_ib_compute */
.emit_ib = gfx_v7_0_ring_emit_ib_compute,
.emit_fence = gfx_v7_0_ring_emit_fence_compute,
.emit_pipeline_sync = gfx_v7_0_ring_emit_pipeline_sync,
.emit_vm_flush = gfx_v7_0_ring_emit_vm_flush,
.emit_gds_switch = gfx_v7_0_ring_emit_gds_switch,
.emit_hdp_flush = gfx_v7_0_ring_emit_hdp_flush,
.test_ring = gfx_v7_0_ring_test_ring,
.test_ib = gfx_v7_0_ring_test_ib,
.insert_nop = amdgpu_ring_insert_nop,
.pad_ib = amdgpu_ring_generic_pad_ib,
@@ -5050,20 +5066,21 @@ static void gfx_v7_0_set
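
The three dwords added above (the SET_CONFIG_REG packet header, the register offset and the value) are what the emit_ib_size bump from 4 to 7 accounts for. For completeness, a sketch of how userspace would request the reset for a compute IB that uses GDS ordered append; only the AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID flag comes from this patch, the helper name and the va/size parameters are illustrative:

#include <stdint.h>
#include <string.h>
#include <amdgpu_drm.h>

/* Sketch: fill the IB chunk for a compute submission that uses GDS
 * ordered append, asking the kernel to reset the ME/GDS wave ID
 * counters for this ring first. */
static void fill_gds_ordered_append_ib(struct drm_amdgpu_cs_chunk_ib *ib,
				       uint64_t ib_va, uint32_t ib_bytes)
{
	memset(ib, 0, sizeof(*ib));
	ib->ip_type  = AMDGPU_HW_IP_COMPUTE;
	ib->va_start = ib_va;		/* GPU VA of the IB (assumed) */
	ib->ib_bytes = ib_bytes;	/* IB size in bytes (assumed) */
	ib->flags    = AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID;
}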

[ANNOUNCE] libdrm 2.4.97

2019-01-22 Thread Marek Olšák
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512


Alex Deucher (1):
  amdgpu: update to latest marketing names from 18.50

Andrey Grodzovsky (3):
  amdgpu/test: Add illegal register and memory access test v2
  amdgpu/test: Disable deadlock tests for all non gfx8/9 ASICs.
  amdgpu/test: Enable deadlock test for CI family (gfx7)

Christian König (1):
  amdgpu: add VM test to exercise max/min address space

Daniel Vetter (1):
  doc: Rename README&CONTRIBUTING to .rst

Eric Anholt (2):
  Avoid hardcoded strlens in drmParseSubsystemType().
  drm: Attempt to parse SPI devices as platform bus devices.

Eric Engestrom (6):
  xf86drmHash: remove unused loop variable
  meson: fix typo in compiler flag
  tests: skip drmdevice test if the machine doesn't have any drm device
  freedreno: remove always-defined #ifdef
  xf86atomic: #undef internal define
  README: reflow the project description to improve readability

François Tigeot (2):
  xf86drm: implement drmParseSubsystemType for DragonFly
  libdrm: Use DRM_IOCTL_GET_PCIINFO on DragonFly

Leo Liu (1):
  tests/amdgpu/vcn: fix the nop command in IBs

Lucas De Marchi (2):
  gitignore: sort file
  gitignore: add _build

Marek Olšák (3):
  amdgpu: update amdgpu_drm.h
  amdgpu: add a faster BO list API
  Bump the version to 2.4.97

Mauro Rossi (1):
  android: Fix 32-bit app crashing in 64-bit Android

git tag: libdrm-2.4.97

https://dri.freedesktop.org/libdrm/libdrm-2.4.97.tar.bz2
MD5:  acef22d0c62c89692348c2dd5591393e  libdrm-2.4.97.tar.bz2
SHA1: 7635bec769a17edd140282fa2c46838c4a44bc91  libdrm-2.4.97.tar.bz2
SHA256: 77d0ccda3e10d6593398edb70b1566bfe1a23a39bd3da98ace2147692eadd123  
libdrm-2.4.97.tar.bz2
SHA512: 
3e08ee9d6c9ce265d783a59b51e22449905ea73aa27f25a082a1e9e1532f7c99e1c9f7cb966eb0970be2a08e2e5993dc9aa55093b1bff548689fdb465e7145ed
  libdrm-2.4.97.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.97.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.97.tar.gz
MD5:  a8bb09d6f4ed28191ba6e86e788dc3a4  libdrm-2.4.97.tar.gz
SHA1: af778f72d716589e9eacec9336bafc81b447cc42  libdrm-2.4.97.tar.gz
SHA256: 8c6f4d0934f5e005cc61bc05a917463b0c867403de176499256965f6797092f1  
libdrm-2.4.97.tar.gz
SHA512: 
9a7130ab5534555d7cf5ff95ac761d2cd2fe2c44eb9b63c7ad3f9b912d0f13f1e3ff099487d8e90b08514329c61adb4e73fe25404e7c2f4c26b205c64be8d114
  libdrm-2.4.97.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.97.tar.gz.sig

-BEGIN PGP SIGNATURE-

iQEyBAEBCgAdFiEEzUfFNBo3XzO+97r6/dFdWs7w8rEFAlxHR3sACgkQ/dFdWs7w
8rEg/Af3d1I0DnABd0j3GUTxUAfHn7/yYkyFunFmqD9tmhdZZ8rl+PzXMocEhtDz
dn+lrG3JHhj4O0istZBe0B8oZIyCuSk+36j5t3XEgR1SfF5YlDhXnlEMaPuJQerr
ZrdXsggmQyv1BjeaLcseHM4wdnbkcClSoHXCNqbKQLPOyS0r0xEj0Ft6QvtDfPxh
rpCMdNjIPSFhBiJqyFuaHw6dWbX1elzSIjtXpdBOYrf7mfF/laE6OX7p+P7LtwC4
PkoeuzdHqt77iGASBoQI28XfVGpfQvBrTzDI4xRGI9IGXyc5oeJuk0uTnVjT3A9I
zHocD5j8r4pQJdfb49RQzyPOaxvF
=Lxi8
-END PGP SIGNATURE-
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: add a workaround for GDS ordered append hangs with compute queues

2019-01-21 Thread Marek Olšák
From: Marek Olšák 

I'm not increasing the DRM version because GDS isn't totally without bugs yet.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 17 
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 17 
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 36 +
 include/uapi/drm/amdgpu_drm.h   |  5 
 5 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
index ecbcefe49a98..f89f5734d985 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
@@ -30,20 +30,22 @@ struct amdgpu_bo;
 struct amdgpu_gds_asic_info {
uint32_ttotal_size;
uint32_tgfx_partition_size;
uint32_tcs_partition_size;
 };
 
 struct amdgpu_gds {
struct amdgpu_gds_asic_info mem;
struct amdgpu_gds_asic_info gws;
struct amdgpu_gds_asic_info oa;
+   uint32_tgds_compute_max_wave_id;
+
/* At present, GDS, GWS and OA resources for gfx (graphics)
 * is always pre-allocated and available for graphics operation.
 * Such resource is shared between all gfx clients.
 * TODO: move this operation to user space
 * */
struct amdgpu_bo*   gds_gfx_bo;
struct amdgpu_bo*   gws_gfx_bo;
struct amdgpu_bo*   oa_gfx_bo;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 7984292f9282..d971ea914755 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2257,20 +2257,36 @@ static void gfx_v7_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v7_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
  struct amdgpu_job *job,
  struct amdgpu_ib *ib,
  uint32_t flags)
 {
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
u32 control = INDIRECT_BUFFER_VALID | ib->length_dw | (vmid << 24);
 
+   /* Currently, there is a high possibility to get wave ID mismatch
+* between ME and GDS, leading to a hw deadlock, because ME generates
+* different wave IDs than the GDS expects. This situation happens
+* randomly when at least 5 compute pipes use GDS ordered append.
+* The wave IDs generated by ME are also wrong after suspend/resume.
+* Those are probably bugs somewhere else in the kernel driver.
+*
+* Writing GDS_COMPUTE_MAX_WAVE_ID resets wave ID counters in ME and
+* GDS to 0 for this ring (me/pipe).
+*/
+   if (ib->flags & AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID) {
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, mmGDS_COMPUTE_MAX_WAVE_ID - PACKET3_SET_CONFIG_REG_START);
+   amdgpu_ring_write(ring, ring->adev->gds.gds_compute_max_wave_id);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
amdgpu_ring_write(ring,
 #ifdef __BIG_ENDIAN
  (2 << 0) |
 #endif
  (ib->gpu_addr & 0xFFFFFFFC));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0xFFFFFFFF);
amdgpu_ring_write(ring, control);
 }
 
@@ -5050,20 +5066,21 @@ static void gfx_v7_0_set_irq_funcs(struct amdgpu_device 
*adev)
adev->gfx.priv_inst_irq.num_types = 1;
adev->gfx.priv_inst_irq.funcs = &gfx_v7_0_priv_inst_irq_funcs;
 }
 
 static void gfx_v7_0_set_gds_init(struct amdgpu_device *adev)
 {
/* init asci gds info */
adev->gds.mem.total_size = RREG32(mmGDS_VMID0_SIZE);
adev->gds.gws.total_size = 64;
adev->gds.oa.total_size = 16;
+   adev->gds.gds_compute_max_wave_id = RREG32(mmGDS_COMPUTE_MAX_WAVE_ID);
 
if (adev->gds.mem.total_size == 64 * 1024) {
adev->gds.mem.gfx_partition_size = 4096;
adev->gds.mem.cs_partition_size = 4096;
 
adev->gds.gws.gfx_partition_size = 4;
adev->gds.gws.cs_partition_size = 4;
 
adev->gds.oa.gfx_partition_size = 4;
adev->gds.oa.cs_partition_size = 1;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a26747681ed6..dcdae74fc0e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6077,20 +6077,36 @@ static void gfx_v8_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,
 }
 
 static void gfx_v8_0_ring_emit_ib_compute(struct amdgpu_ring *ring,
 

Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-16 Thread Marek Olšák
FYI, I've pushed the patch because it helps simplify our amdgpu winsys
code, and I already have code that depends on it that I don't want to rewrite.

Marek

On Wed, Jan 16, 2019 at 12:39 PM Marek Olšák  wrote:

> On Wed, Jan 16, 2019 at 9:43 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Am 16.01.19 um 15:39 schrieb Marek Olšák:
>>
>>
>>
>> On Wed, Jan 16, 2019, 9:34 AM Koenig, Christian > wrote:
>>
>>> Am 16.01.19 um 15:31 schrieb Marek Olšák:
>>>
>>>
>>>
>>> On Wed, Jan 16, 2019, 7:55 AM Christian König <
>>> ckoenig.leichtzumer...@gmail.com wrote:
>>>
>>>> Well if you ask me we should have the following interface for
>>>> negotiating memory management with the kernel:
>>>>
>>>> 1. We have per process BOs which can't be shared between processes.
>>>>
>>>> Those are always valid and don't need to be mentioned in any BO list
>>>> whatsoever.
>>>>
>>>> If we knew that a per process BO is currently not in use we can
>>>> optionally tell that to the kernel to make memory management more
>>>> efficient.
>>>>
>>>> In other words instead of a list of stuff which is used we send down to
>>>> the kernel a list of stuff which is not used any more and that only
>>>> when
>>>> we know that it is necessary, e.g. when a game or application
>>>> overcommits.
>>>>
>>>
>>> Radeonsi doesn't use this because this approach caused performance
>>> degradation and also drops BO priorities.
>>>
>>>
>>> The performance degradation was mostly due to shortcomings in the LRU, which
>>> have by now been fixed.
>>>
>>> BO priorities are a different topic, but could be added to per VM BOs as
>>> well.
>>>
>>
>> What's the minimum drm version that contains the fixes?
>>
>>
>> I've pushed the last optimization this morning. No idea when it really
>> became useful, but the numbers from the closed source clients now look much
>> better.
>>
>> We should probably test and bump the drm version when we are sure that
>> this now works as expected.
>>
>
> We should, but AMD Mesa guys don't have any time.
>
> Marek
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019 at 9:43 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 16.01.19 um 15:39 schrieb Marek Olšák:
>
>
>
> On Wed, Jan 16, 2019, 9:34 AM Koenig, Christian  wrote:
>
>> Am 16.01.19 um 15:31 schrieb Marek Olšák:
>>
>>
>>
>> On Wed, Jan 16, 2019, 7:55 AM Christian König <
>> ckoenig.leichtzumer...@gmail.com wrote:
>>
>>> Well if you ask me we should have the following interface for
>>> negotiating memory management with the kernel:
>>>
>>> 1. We have per process BOs which can't be shared between processes.
>>>
>>> Those are always valid and don't need to be mentioned in any BO list
>>> whatsoever.
>>>
>>> If we knew that a per process BO is currently not in use we can
>>> optionally tell that to the kernel to make memory management more
>>> efficient.
>>>
>>> In other words instead of a list of stuff which is used we send down to
>>> the kernel a list of stuff which is not used any more and that only when
>>> we know that it is necessary, e.g. when a game or application
>>> overcommits.
>>>
>>
>> Radeonsi doesn't use this because this approach caused performance
>> degradation and also drops BO priorities.
>>
>>
>> The performance degradation was mostly due to shortcomings in the LRU, which
>> have by now been fixed.
>>
>> BO priorities are a different topic, but could be added to per VM BOs as
>> well.
>>
>
> What's the minimum drm version that contains the fixes?
>
>
> I've pushed the last optimization this morning. No idea when it really
> became useful, but the numbers from the closed source clients now look much
> better.
>
> We should probably test and bump the drm version when we are sure that
> this now works as expected.
>

We should, but AMD Mesa guys don't have any time.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: update amdgpu_drm.h

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019 at 11:25 AM Koenig, Christian 
wrote:

> Am 16.01.19 um 17:15 schrieb Marek Olšák:
>
> On Wed, Jan 16, 2019 at 2:37 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Am 15.01.19 um 20:25 schrieb Marek Olšák:
>> > From: Marek Olšák 
>>
>> Maybe note in the commit message from which upstream kernel.
>>
>
> No upstream kernel. It's from amd-staging-drm-next.
>
>
> That's a problem, see the rules for updating this.
>
> IIRC the code must land in an upstream kernel before it can be committed
> to libdrm.
>
> Christian.
>

It looks like it's all in the master branch.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: update amdgpu_drm.h

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019 at 2:37 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 15.01.19 um 20:25 schrieb Marek Olšák:
> > From: Marek Olšák 
>
> Maybe note in the commit message from which upstream kernel.
>

No upstream kernel. It's from amd-staging-drm-next.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019 at 10:15 AM Bas Nieuwenhuizen 
wrote:

> On Wed, Jan 16, 2019 at 3:38 PM Marek Olšák  wrote:
> >
> >
> >
> > On Wed, Jan 16, 2019, 7:46 AM Bas Nieuwenhuizen  wrote:
> >>
> >> So random questions:
> >>
> >> 1) In this discussion it was mentioned that some Vulkan drivers still
> >> use the bo_list interface. I think that implies radv as I think we're
> >> still using bo_list. Is there any other API we should be using? (Also,
> >> with VK_EXT_descriptor_indexing I suspect we'll be moving more towards
> >> a global bo list instead of a cmd buffer one, as we cannot know all
> >> the BOs referenced anymore, but not sure what end state here will be).
> >>
> >> 2) The other alternative mentioned was adding the buffers directly
> >> into the submit ioctl. Is this the desired end state (though as above
> >> I'm not sure how that works for vulkan)? If yes, what is the timeline
> >> for this that we need something in the interim?
> >
> >
> > Radeonsi already uses this.
> >
> >>
> >> 3) Did we measure any performance benefit?
> >>
> >> In general I'd like to ack the raw bo list creation function as
> >> this interface seems easier to use. The two arrays thing has always
> >> been kind of a pain when we want to use e.g. builtin sort functions to
> >> make sure we have no duplicate BOs, but have some comments below.
> >
> >
> > The reason amdgpu was slower than radeon was because of this inefficient
> bo list interface.
> >
> >>
> >> On Mon, Jan 7, 2019 at 8:31 PM Marek Olšák  wrote:
> >> >
> >> > From: Marek Olšák 
> >> >
> >> > ---
> >> >  amdgpu/amdgpu-symbol-check |  3 ++
> >> >  amdgpu/amdgpu.h| 56
> +-
> >> >  amdgpu/amdgpu_bo.c | 36 
> >> >  amdgpu/amdgpu_cs.c | 25 +
> >> >  4 files changed, 119 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
> >> > index 6f5e0f95..96a44b40 100755
> >> > --- a/amdgpu/amdgpu-symbol-check
> >> > +++ b/amdgpu/amdgpu-symbol-check
> >> > @@ -12,20 +12,22 @@ _edata
> >> >  _end
> >> >  _fini
> >> >  _init
> >> >  amdgpu_bo_alloc
> >> >  amdgpu_bo_cpu_map
> >> >  amdgpu_bo_cpu_unmap
> >> >  amdgpu_bo_export
> >> >  amdgpu_bo_free
> >> >  amdgpu_bo_import
> >> >  amdgpu_bo_inc_ref
> >> > +amdgpu_bo_list_create_raw
> >> > +amdgpu_bo_list_destroy_raw
> >> >  amdgpu_bo_list_create
> >> >  amdgpu_bo_list_destroy
> >> >  amdgpu_bo_list_update
> >> >  amdgpu_bo_query_info
> >> >  amdgpu_bo_set_metadata
> >> >  amdgpu_bo_va_op
> >> >  amdgpu_bo_va_op_raw
> >> >  amdgpu_bo_wait_for_idle
> >> >  amdgpu_create_bo_from_user_mem
> >> >  amdgpu_cs_chunk_fence_info_to_data
> >> > @@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
> >> >  amdgpu_cs_destroy_syncobj
> >> >  amdgpu_cs_export_syncobj
> >> >  amdgpu_cs_fence_to_handle
> >> >  amdgpu_cs_import_syncobj
> >> >  amdgpu_cs_query_fence_status
> >> >  amdgpu_cs_query_reset_state
> >> >  amdgpu_query_sw_info
> >> >  amdgpu_cs_signal_semaphore
> >> >  amdgpu_cs_submit
> >> >  amdgpu_cs_submit_raw
> >> > +amdgpu_cs_submit_raw2
> >> >  amdgpu_cs_syncobj_export_sync_file
> >> >  amdgpu_cs_syncobj_import_sync_file
> >> >  amdgpu_cs_syncobj_reset
> >> >  amdgpu_cs_syncobj_signal
> >> >  amdgpu_cs_syncobj_wait
> >> >  amdgpu_cs_wait_fences
> >> >  amdgpu_cs_wait_semaphore
> >> >  amdgpu_device_deinitialize
> >> >  amdgpu_device_initialize
> >> >  amdgpu_find_bo_by_cpu_mapping
> >> > diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> >> > index dc51659a..5b800033 100644
> >> > --- a/amdgpu/amdgpu.h
> >> > +++ b/amdgpu/amdgpu.h
> >> > @@ -35,20 +35,21 @@
> >> >  #define _AMDGPU_H_
> >> >
> >> >  #include 
> >> >  #include 
> >> >
> >> >  #ifdef __cplusplus
> >> >  extern "C" {
> >> >  #endif
> >>

Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019, 9:34 AM Koenig, Christian wrote:

> Am 16.01.19 um 15:31 schrieb Marek Olšák:
>
>
>
> On Wed, Jan 16, 2019, 7:55 AM Christian König <
> ckoenig.leichtzumer...@gmail.com wrote:
>
>> Well if you ask me we should have the following interface for
>> negotiating memory management with the kernel:
>>
>> 1. We have per process BOs which can't be shared between processes.
>>
>> Those are always valid and don't need to be mentioned in any BO list
>> whatsoever.
>>
>> If we knew that a per process BO is currently not in use we can
>> optionally tell that to the kernel to make memory management more
>> efficient.
>>
>> In other words instead of a list of stuff which is used we send down to
>> the kernel a list of stuff which is not used any more and that only when
>> we know that it is necessary, e.g. when a game or application overcommits.
>>
>
> Radeonsi doesn't use this because this approach caused performance
> degradation and also drops BO priorities.
>
>
> The performance degradation was mostly due to shortcomings in the LRU, which
> have by now been fixed.
>
> BO priorities are a different topic, but could be added to per VM BOs as
> well.
>

What's the minimum drm version that contains the fixes?

Marek


> Christian.
>
>
> Marek
>
>
>> 2. We have shared BOs which are used by more than one process.
>>
>> Those are rare and should be added to the per CS list of BOs in use.
>>
>>
>> The whole BO list interface Marek tries to optimize here should be
>> deprecated and not used any more.
>>
>> Regards,
>> Christian.
>>
>> Am 16.01.19 um 13:46 schrieb Bas Nieuwenhuizen:
>> > So random questions:
>> >
>> > 1) In this discussion it was mentioned that some Vulkan drivers still
>> > use the bo_list interface. I think that implies radv as I think we're
>> > still using bo_list. Is there any other API we should be using? (Also,
>> > with VK_EXT_descriptor_indexing I suspect we'll be moving more towards
>> > a global bo list instead of a cmd buffer one, as we cannot know all
>> > the BOs referenced anymore, but not sure what end state here will be).
>> >
>> > 2) The other alternative mentioned was adding the buffers directly
>> > into the submit ioctl. Is this the desired end state (though as above
>> > I'm not sure how that works for vulkan)? If yes, what is the timeline
>> > for this that we need something in the interim?
>> >
>> > 3) Did we measure any performance benefit?
>> >
>> > In general I'd like to ack the raw bo list creation function as
>> > this interface seems easier to use. The two arrays thing has always
>> > been kind of a pain when we want to use e.g. builtin sort functions to
>> > make sure we have no duplicate BOs, but have some comments below.
>> >
>> > On Mon, Jan 7, 2019 at 8:31 PM Marek Olšák  wrote:
>> >> From: Marek Olšák 
>> >>
>> >> ---
>> >>   amdgpu/amdgpu-symbol-check |  3 ++
>> >>   amdgpu/amdgpu.h| 56
>> +-
>> >>   amdgpu/amdgpu_bo.c | 36 
>> >>   amdgpu/amdgpu_cs.c | 25 +
>> >>   4 files changed, 119 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
>> >> index 6f5e0f95..96a44b40 100755
>> >> --- a/amdgpu/amdgpu-symbol-check
>> >> +++ b/amdgpu/amdgpu-symbol-check
>> >> @@ -12,20 +12,22 @@ _edata
>> >>   _end
>> >>   _fini
>> >>   _init
>> >>   amdgpu_bo_alloc
>> >>   amdgpu_bo_cpu_map
>> >>   amdgpu_bo_cpu_unmap
>> >>   amdgpu_bo_export
>> >>   amdgpu_bo_free
>> >>   amdgpu_bo_import
>> >>   amdgpu_bo_inc_ref
>> >> +amdgpu_bo_list_create_raw
>> >> +amdgpu_bo_list_destroy_raw
>> >>   amdgpu_bo_list_create
>> >>   amdgpu_bo_list_destroy
>> >>   amdgpu_bo_list_update
>> >>   amdgpu_bo_query_info
>> >>   amdgpu_bo_set_metadata
>> >>   amdgpu_bo_va_op
>> >>   amdgpu_bo_va_op_raw
>> >>   amdgpu_bo_wait_for_idle
>> >>   amdgpu_create_bo_from_user_mem
>> >>   amdgpu_cs_chunk_fence_info_to_data
>> >> @@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
>> >>   amdgpu_cs_destroy_syncobj
>> >>   

Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019, 7:46 AM Bas Nieuwenhuizen wrote:

> So random questions:
>
> 1) In this discussion it was mentioned that some Vulkan drivers still
> use the bo_list interface. I think that implies radv as I think we're
> still using bo_list. Is there any other API we should be using? (Also,
> with VK_EXT_descriptor_indexing I suspect we'll be moving more towards
> a global bo list instead of a cmd buffer one, as we cannot know all
> the BOs referenced anymore, but not sure what end state here will be).
>
> 2) The other alternative mentioned was adding the buffers directly
> into the submit ioctl. Is this the desired end state (though as above
> I'm not sure how that works for vulkan)? If yes, what is the timeline
> for this that we need something in the interim?
>

Radeonsi already uses this.


> 3) Did we measure any performance benefit?
>
> In general I'd like to ack the raw bo list creation function as
> this interface seems easier to use. The two arrays thing has always
> been kind of a pain when we want to use e.g. builtin sort functions to
> make sure we have no duplicate BOs, but have some comments below.
>

amdgpu was slower than radeon because of this inefficient BO
list interface.


> On Mon, Jan 7, 2019 at 8:31 PM Marek Olšák  wrote:
> >
> > From: Marek Olšák 
> >
> > ---
> >  amdgpu/amdgpu-symbol-check |  3 ++
> >  amdgpu/amdgpu.h| 56 +-
> >  amdgpu/amdgpu_bo.c | 36 
> >  amdgpu/amdgpu_cs.c | 25 +
> >  4 files changed, 119 insertions(+), 1 deletion(-)
> >
> > diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
> > index 6f5e0f95..96a44b40 100755
> > --- a/amdgpu/amdgpu-symbol-check
> > +++ b/amdgpu/amdgpu-symbol-check
> > @@ -12,20 +12,22 @@ _edata
> >  _end
> >  _fini
> >  _init
> >  amdgpu_bo_alloc
> >  amdgpu_bo_cpu_map
> >  amdgpu_bo_cpu_unmap
> >  amdgpu_bo_export
> >  amdgpu_bo_free
> >  amdgpu_bo_import
> >  amdgpu_bo_inc_ref
> > +amdgpu_bo_list_create_raw
> > +amdgpu_bo_list_destroy_raw
> >  amdgpu_bo_list_create
> >  amdgpu_bo_list_destroy
> >  amdgpu_bo_list_update
> >  amdgpu_bo_query_info
> >  amdgpu_bo_set_metadata
> >  amdgpu_bo_va_op
> >  amdgpu_bo_va_op_raw
> >  amdgpu_bo_wait_for_idle
> >  amdgpu_create_bo_from_user_mem
> >  amdgpu_cs_chunk_fence_info_to_data
> > @@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
> >  amdgpu_cs_destroy_syncobj
> >  amdgpu_cs_export_syncobj
> >  amdgpu_cs_fence_to_handle
> >  amdgpu_cs_import_syncobj
> >  amdgpu_cs_query_fence_status
> >  amdgpu_cs_query_reset_state
> >  amdgpu_query_sw_info
> >  amdgpu_cs_signal_semaphore
> >  amdgpu_cs_submit
> >  amdgpu_cs_submit_raw
> > +amdgpu_cs_submit_raw2
> >  amdgpu_cs_syncobj_export_sync_file
> >  amdgpu_cs_syncobj_import_sync_file
> >  amdgpu_cs_syncobj_reset
> >  amdgpu_cs_syncobj_signal
> >  amdgpu_cs_syncobj_wait
> >  amdgpu_cs_wait_fences
> >  amdgpu_cs_wait_semaphore
> >  amdgpu_device_deinitialize
> >  amdgpu_device_initialize
> >  amdgpu_find_bo_by_cpu_mapping
> > diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> > index dc51659a..5b800033 100644
> > --- a/amdgpu/amdgpu.h
> > +++ b/amdgpu/amdgpu.h
> > @@ -35,20 +35,21 @@
> >  #define _AMDGPU_H_
> >
> >  #include 
> >  #include 
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> >  #endif
> >
> >  struct drm_amdgpu_info_hw_ip;
> > +struct drm_amdgpu_bo_list_entry;
> >
> >
> /*--*/
> >  /* --- Defines
>  */
> >
> /*--*/
> >
> >  /**
> >   * Define max. number of Command Buffers (IB) which could be sent to
> the single
> >   * hardware IP to accommodate CE/DE requirements
> >   *
> >   * \sa amdgpu_cs_ib_info
> > @@ -767,34 +768,65 @@ int amdgpu_bo_cpu_unmap(amdgpu_bo_handle
> buf_handle);
> >   *and no GPU access is scheduled.
> >   *  1 GPU access is in fly or scheduled
> >   *
> >   * \return   0 - on success
> >   *  <0 - Negative POSIX Error code
> >   */
> >  int amdgpu_bo_wait_for_idle(amdgpu_bo_handle buf_handle,
> > uint64_t timeout_ns,

Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-16 Thread Marek Olšák
On Wed, Jan 16, 2019, 7:55 AM Christian König <
ckoenig.leichtzumer...@gmail.com wrote:

> Well if you ask me we should have the following interface for
> negotiating memory management with the kernel:
>
> 1. We have per process BOs which can't be shared between processes.
>
> Those are always valid and don't need to be mentioned in any BO list
> whatsoever.
>
> If we knew that a per process BO is currently not in use we can
> optionally tell that to the kernel to make memory management more
> efficient.
>
> In other words instead of a list of stuff which is used we send down to
> the kernel a list of stuff which is not used any more and that only when
> we know that it is necessary, e.g. when a game or application overcommits.
>

Radeonsi doesn't use this because this approach caused performance
degradation and also drops BO priorities.

Marek


> 2. We have shared BOs which are used by more than one process.
>
> Those are rare and should be added to the per CS list of BOs in use.
>
>
> The whole BO list interface Marek tries to optimize here should be
> deprecated and not used any more.
>
> Regards,
> Christian.
>
> Am 16.01.19 um 13:46 schrieb Bas Nieuwenhuizen:
> > So random questions:
> >
> > 1) In this discussion it was mentioned that some Vulkan drivers still
> > use the bo_list interface. I think that implies radv as I think we're
> > still using bo_list. Is there any other API we should be using? (Also,
> > with VK_EXT_descriptor_indexing I suspect we'll be moving more towards
> > a global bo list instead of a cmd buffer one, as we cannot know all
> > the BOs referenced anymore, but not sure what end state here will be).
> >
> > 2) The other alternative mentioned was adding the buffers directly
> > into the submit ioctl. Is this the desired end state (though as above
> > I'm not sure how that works for vulkan)? If yes, what is the timeline
> > for this that we need something in the interim?
> >
> > 3) Did we measure any performance benefit?
> >
> > In general I'd like to ack the raw bo list creation function as
> > this interface seems easier to use. The two arrays thing has always
> > been kind of a pain when we want to use e.g. builtin sort functions to
> > make sure we have no duplicate BOs, but have some comments below.
> >
> > On Mon, Jan 7, 2019 at 8:31 PM Marek Olšák  wrote:
> >> From: Marek Olšák 
> >>
> >> ---
> >>   amdgpu/amdgpu-symbol-check |  3 ++
> >>   amdgpu/amdgpu.h| 56 +-
> >>   amdgpu/amdgpu_bo.c | 36 
> >>   amdgpu/amdgpu_cs.c | 25 +
> >>   4 files changed, 119 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
> >> index 6f5e0f95..96a44b40 100755
> >> --- a/amdgpu/amdgpu-symbol-check
> >> +++ b/amdgpu/amdgpu-symbol-check
> >> @@ -12,20 +12,22 @@ _edata
> >>   _end
> >>   _fini
> >>   _init
> >>   amdgpu_bo_alloc
> >>   amdgpu_bo_cpu_map
> >>   amdgpu_bo_cpu_unmap
> >>   amdgpu_bo_export
> >>   amdgpu_bo_free
> >>   amdgpu_bo_import
> >>   amdgpu_bo_inc_ref
> >> +amdgpu_bo_list_create_raw
> >> +amdgpu_bo_list_destroy_raw
> >>   amdgpu_bo_list_create
> >>   amdgpu_bo_list_destroy
> >>   amdgpu_bo_list_update
> >>   amdgpu_bo_query_info
> >>   amdgpu_bo_set_metadata
> >>   amdgpu_bo_va_op
> >>   amdgpu_bo_va_op_raw
> >>   amdgpu_bo_wait_for_idle
> >>   amdgpu_create_bo_from_user_mem
> >>   amdgpu_cs_chunk_fence_info_to_data
> >> @@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
> >>   amdgpu_cs_destroy_syncobj
> >>   amdgpu_cs_export_syncobj
> >>   amdgpu_cs_fence_to_handle
> >>   amdgpu_cs_import_syncobj
> >>   amdgpu_cs_query_fence_status
> >>   amdgpu_cs_query_reset_state
> >>   amdgpu_query_sw_info
> >>   amdgpu_cs_signal_semaphore
> >>   amdgpu_cs_submit
> >>   amdgpu_cs_submit_raw
> >> +amdgpu_cs_submit_raw2
> >>   amdgpu_cs_syncobj_export_sync_file
> >>   amdgpu_cs_syncobj_import_sync_file
> >>   amdgpu_cs_syncobj_reset
> >>   amdgpu_cs_syncobj_signal
> >>   amdgpu_cs_syncobj_wait
> >>   amdgpu_cs_wait_fences
> >>   amdgpu_cs_wait_semaphore
> >>   amdgpu_device_deinitialize
> >>   amdgpu_device_initialize
> >>   amdgpu_find_bo_by_cpu_mapping
> 

[PATCH libdrm] amdgpu: update amdgpu_drm.h

2019-01-15 Thread Marek Olšák
From: Marek Olšák 

---
 include/drm/amdgpu_drm.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index 1ceec56d..be84e43c 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -319,20 +319,26 @@ struct drm_amdgpu_gem_userptr {
 #define AMDGPU_TILING_BANK_HEIGHT_SHIFT17
 #define AMDGPU_TILING_BANK_HEIGHT_MASK 0x3
 #define AMDGPU_TILING_MACRO_TILE_ASPECT_SHIFT  19
 #define AMDGPU_TILING_MACRO_TILE_ASPECT_MASK   0x3
 #define AMDGPU_TILING_NUM_BANKS_SHIFT  21
 #define AMDGPU_TILING_NUM_BANKS_MASK   0x3
 
 /* GFX9 and later: */
 #define AMDGPU_TILING_SWIZZLE_MODE_SHIFT   0
 #define AMDGPU_TILING_SWIZZLE_MODE_MASK0x1f
+#define AMDGPU_TILING_DCC_OFFSET_256B_SHIFT    5
+#define AMDGPU_TILING_DCC_OFFSET_256B_MASK     0xFFFFFF
+#define AMDGPU_TILING_DCC_PITCH_MAX_SHIFT      29
+#define AMDGPU_TILING_DCC_PITCH_MAX_MASK       0x3FFF
+#define AMDGPU_TILING_DCC_INDEPENDENT_64B_SHIFT 43
+#define AMDGPU_TILING_DCC_INDEPENDENT_64B_MASK  0x1
 
 /* Set/Get helpers for tiling flags. */
 #define AMDGPU_TILING_SET(field, value) \
	(((__u64)(value) & AMDGPU_TILING_##field##_MASK) << AMDGPU_TILING_##field##_SHIFT)
 #define AMDGPU_TILING_GET(value, field) \
	(((__u64)(value) >> AMDGPU_TILING_##field##_SHIFT) & AMDGPU_TILING_##field##_MASK)
 
 #define AMDGPU_GEM_METADATA_OP_SET_METADATA  1
 #define AMDGPU_GEM_METADATA_OP_GET_METADATA  2
 
@@ -658,20 +664,22 @@ struct drm_amdgpu_cs_chunk_data {
/* Subquery id: Query PSP ASD firmware version */
#define AMDGPU_INFO_FW_ASD  0x0d
/* Subquery id: Query VCN firmware version */
#define AMDGPU_INFO_FW_VCN  0x0e
/* Subquery id: Query GFX RLC SRLC firmware version */
#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_CNTL 0x0f
/* Subquery id: Query GFX RLC SRLG firmware version */
#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_GPM_MEM 0x10
/* Subquery id: Query GFX RLC SRLS firmware version */
#define AMDGPU_INFO_FW_GFX_RLC_RESTORE_LIST_SRM_MEM 0x11
+   /* Subquery id: Query DMCU firmware version */
+   #define AMDGPU_INFO_FW_DMCU 0x12
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED0x0f
 /* the used VRAM size */
 #define AMDGPU_INFO_VRAM_USAGE 0x10
 /* the used GTT size */
 #define AMDGPU_INFO_GTT_USAGE  0x11
 /* Information about GDS, etc. resource configuration */
 #define AMDGPU_INFO_GDS_CONFIG 0x13
 /* Query information about VRAM and GTT domains */
 #define AMDGPU_INFO_VRAM_GTT   0x14
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
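
A sketch of how the new DCC fields are meant to be packed together with the existing AMDGPU_TILING_SET() helper; the swizzle mode and DCC offset below are made up, and storing the result in the BO metadata via amdgpu_bo_set_metadata() is only assumed here as the usual path for sharing it between processes:

#include <stdint.h>
#include <amdgpu_drm.h>

/* Sketch: pack GFX9+ tiling flags, including the new DCC fields,
 * into the 64-bit tiling_info value. */
static uint64_t make_tiling_flags(uint64_t dcc_offset_bytes)
{
	uint64_t tiling = 0;

	tiling |= AMDGPU_TILING_SET(SWIZZLE_MODE, 25);             /* example mode */
	tiling |= AMDGPU_TILING_SET(DCC_OFFSET_256B, dcc_offset_bytes / 256);
	tiling |= AMDGPU_TILING_SET(DCC_INDEPENDENT_64B, 1);
	return tiling;
}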


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-10 Thread Marek Olšák
On Thu, Jan 10, 2019, 6:51 AM Christian König <
ckoenig.leichtzumer...@gmail.com wrote:

> Am 10.01.19 um 12:41 schrieb Marek Olšák:
>
>
>
> On Thu, Jan 10, 2019, 4:15 AM Koenig, Christian  wrote:
>
>> Am 10.01.19 um 00:39 schrieb Marek Olšák:
>>
>> On Wed, Jan 9, 2019 at 1:41 PM Christian König <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>> Am 09.01.19 um 17:14 schrieb Marek Olšák:
>>>
>>> On Wed, Jan 9, 2019 at 8:09 AM Christian König <
>>> ckoenig.leichtzumer...@gmail.com> wrote:
>>>
>>>> Am 09.01.19 um 13:36 schrieb Marek Olšák:
>>>>
>>>>
>>>>
>>>> On Wed, Jan 9, 2019, 5:28 AM Christian König <
>>>> ckoenig.leichtzumer...@gmail.com wrote:
>>>>
>>>>> Looks good, but I'm wondering what's the actual improvement?
>>>>>
>>>>
>>>> No malloc calls and 1 less for loop copying the bo list.
>>>>
>>>>
>>>> Yeah, but didn't we want to get completely rid of the bo list?
>>>>
>>>
>>> If we have multiple IBs (e.g. gfx + compute) that share a BO list, I
>>> think it's faster to send the BO list to the kernel only once.
>>>
>>>
>>> That's not really faster.
>>>
>>> The only thing we save is a single loop over all BOs to look up the
>>> handle into a pointer, and that is only a tiny fraction of the overhead.
>>>
>>> The majority of the overhead is locking the BOs and reserving space for
>>> the submission.
>>>
>>> What could really help here is to submit gfx+compute together in just one
>>> CS IOCTL. This way we would need the locking and space reservation only
>>> once.
>>>
>>> It's a bit of work in the kernel side, but certainly doable.
>>>
>>
>> OK. Any objections to this patch?
>>
>>
>> In general I'm wondering if we couldn't avoid adding so much new
>> interface.
>>
>
> There are Vulkan drivers that still use the bo_list interface.
>
>
>> For example we can avoid the malloc() when we just cache the last freed
>> bo_list structure in the device. We would just need an atomic pointer
>> exchange operation for that.
>>
>
>> This way we even don't need to change mesa at all.
>>
>
> There is still the for loop that we need to get rid of.
>
>
> Yeah, but I'm fine with handling that with an amdgpu_bo_list_create_raw which
> only takes the handles and still returns the amdgpu_bo_list structure we
> are used to.
>
> See what I'm mostly concerned about is having another CS function to
> maintain.
>

There is no maintenance cost. It's just a wrapper. Eventually all drivers
will switch to it.

Marek


>
>
>> Regarding optimization, this chunk can be replaced by a cast on 64bit:
>>
>> +chunk_array = alloca(sizeof(uint64_t) * num_chunks);
>> +for (i = 0; i < num_chunks; i++)
>> +chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
>>
>> It can't. The input is an array of structures. The ioctl takes an array
> of pointers.
>
>
> Ah! Haven't seen this, sorry for the noise.
>
> Christian.
>
>
> Marek
>
>
>> Regards,
>> Christian.
>>
>>
>> Thanks,
>> Marek
>>
>>
>>
> ___
> amd-gfx mailing 
> listamd-gfx@lists.freedesktop.orghttps://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
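
To spell out why the cast cannot replace the loop (a sketch based on the snippet quoted above): the caller holds a contiguous array of chunk structures, while the CS ioctl consumes an array of 64-bit pointer values, one per chunk, so each element has to be converted individually:

#include <stdint.h>
#include <amdgpu_drm.h>

/* Sketch: the CS ioctl takes an array of pointers to chunks, not the
 * chunk structures themselves, so casting the structure array would
 * only yield the address of the first chunk. */
static void build_chunk_pointer_array(struct drm_amdgpu_cs_chunk *chunks,
				      unsigned num_chunks,
				      uint64_t *chunk_array)
{
	for (unsigned i = 0; i < num_chunks; i++)
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
	/* chunk_array is what ends up in drm_amdgpu_cs_in::chunks. */
}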


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-10 Thread Marek Olšák
On Thu, Jan 10, 2019, 4:15 AM Koenig, Christian wrote:

> Am 10.01.19 um 00:39 schrieb Marek Olšák:
>
> On Wed, Jan 9, 2019 at 1:41 PM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Am 09.01.19 um 17:14 schrieb Marek Olšák:
>>
>> On Wed, Jan 9, 2019 at 8:09 AM Christian König <
>> ckoenig.leichtzumer...@gmail.com> wrote:
>>
>>> Am 09.01.19 um 13:36 schrieb Marek Olšák:
>>>
>>>
>>>
>>> On Wed, Jan 9, 2019, 5:28 AM Christian König <
>>> ckoenig.leichtzumer...@gmail.com wrote:
>>>
>>>> Looks good, but I'm wondering what's the actual improvement?
>>>>
>>>
>>> No malloc calls and 1 less for loop copying the bo list.
>>>
>>>
>>> Yeah, but didn't we want to get completely rid of the bo list?
>>>
>>
>> If we have multiple IBs (e.g. gfx + compute) that share a BO list, I
>> think it's faster to send the BO list to the kernel only once.
>>
>>
>> That's not really faster.
>>
>> The only thing we save is a single loop over all BOs to look up the
>> handle into a pointer, and that is only a tiny fraction of the overhead.
>>
>> The majority of the overhead is locking the BOs and reserving space for
>> the submission.
>>
>> What could really help here is to submit gfx+compute together in just one
>> CS IOCTL. This way we would need the locking and space reservation only
>> once.
>>
>> It's a bit of work in the kernel side, but certainly doable.
>>
>
> OK. Any objections to this patch?
>
>
> In general I'm wondering if we couldn't avoid adding so much new interface.
>

There are Vulkan drivers that still use the bo_list interface.


> For example we can avoid the malloc() when we just cache the last freed
> bo_list structure in the device. We would just need an atomic pointer
> exchange operation for that.
>

> This way we even don't need to change mesa at all.
>

There is still the for loop that we need to get rid of.


> Regarding optimization, this chunk can be replaced by a cast on 64bit:
>
> + chunk_array = alloca(sizeof(uint64_t) * num_chunks);
> + for (i = 0; i < num_chunks; i++)
> + chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
>
> It can't. The input is an array of structures. The ioctl takes an array of
pointers.

Marek


> Regards,
> Christian.
>
>
> Thanks,
> Marek
>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-09 Thread Marek Olšák
On Wed, Jan 9, 2019 at 1:41 PM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 09.01.19 um 17:14 schrieb Marek Olšák:
>
> On Wed, Jan 9, 2019 at 8:09 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Am 09.01.19 um 13:36 schrieb Marek Olšák:
>>
>>
>>
>> On Wed, Jan 9, 2019, 5:28 AM Christian König <
>> ckoenig.leichtzumer...@gmail.com wrote:
>>
>>> Looks good, but I'm wondering what's the actual improvement?
>>>
>>
>> No malloc calls and 1 less for loop copying the bo list.
>>
>>
>> Yeah, but didn't we want to get completely rid of the bo list?
>>
>
> If we have multiple IBs (e.g. gfx + compute) that share a BO list, I think
> it's faster to send the BO list to the kernel only once.
>
>
> That's not really faster.
>
> The only thing we save is a single loop over all BOs to look up the
> handle into a pointer, and that is only a tiny fraction of the overhead.
>
> The majority of the overhead is locking the BOs and reserving space for
> the submission.
>
> What could really help here is to submit gfx+compute together in just one
> CS IOCTL. This way we would need the locking and space reservation only
> once.
>
> It's a bit of work in the kernel side, but certainly doable.
>

OK. Any objections to this patch?

Thanks,
Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
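
A sketch of what sharing the list looks like with the raw API from this thread: the BO list is created once and its handle reused for both submissions. The amdgpu_cs_submit_raw2() signature used here (device, context, raw list handle, chunk count, chunks, sequence number) is assumed from the patch under discussion rather than quoted from it:

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch: one raw BO list shared by a gfx and a compute submission. */
static int submit_gfx_and_compute(amdgpu_device_handle dev,
				  amdgpu_context_handle ctx,
				  struct drm_amdgpu_bo_list_entry *bos,
				  uint32_t num_bos,
				  struct drm_amdgpu_cs_chunk *gfx_chunks,
				  int num_gfx_chunks,
				  struct drm_amdgpu_cs_chunk *comp_chunks,
				  int num_comp_chunks)
{
	uint32_t bo_list;
	uint64_t seq;
	int r;

	r = amdgpu_bo_list_create_raw(dev, num_bos, bos, &bo_list);
	if (r)
		return r;

	r = amdgpu_cs_submit_raw2(dev, ctx, bo_list, num_gfx_chunks,
				  gfx_chunks, &seq);
	if (!r)
		r = amdgpu_cs_submit_raw2(dev, ctx, bo_list, num_comp_chunks,
					  comp_chunks, &seq);

	amdgpu_bo_list_destroy_raw(dev, bo_list);
	return r;
}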


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-09 Thread Marek Olšák
On Wed, Jan 9, 2019 at 8:09 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 09.01.19 um 13:36 schrieb Marek Olšák:
>
>
>
> On Wed, Jan 9, 2019, 5:28 AM Christian König <
> ckoenig.leichtzumer...@gmail.com wrote:
>
>> Looks good, but I'm wondering what's the actual improvement?
>>
>
> No malloc calls and 1 less for loop copying the bo list.
>
>
> Yeah, but didn't we want to get completely rid of the bo list?
>

If we have multiple IBs (e.g. gfx + compute) that share a BO list, I think
it's faster to send the BO list to the kernel only once.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add a faster BO list API

2019-01-09 Thread Marek Olšák
On Wed, Jan 9, 2019, 5:28 AM Christian König <
ckoenig.leichtzumer...@gmail.com wrote:

> Looks good, but I'm wondering what's the actual improvement?
>

No malloc calls and 1 less for loop copying the bo list.

Marek


> Christian.
>
> Am 07.01.19 um 20:31 schrieb Marek Olšák:
> > From: Marek Olšák 
> >
> > ---
> >   amdgpu/amdgpu-symbol-check |  3 ++
> >   amdgpu/amdgpu.h| 56 +-
> >   amdgpu/amdgpu_bo.c | 36 
> >   amdgpu/amdgpu_cs.c | 25 +
> >   4 files changed, 119 insertions(+), 1 deletion(-)
> >
> > diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
> > index 6f5e0f95..96a44b40 100755
> > --- a/amdgpu/amdgpu-symbol-check
> > +++ b/amdgpu/amdgpu-symbol-check
> > @@ -12,20 +12,22 @@ _edata
> >   _end
> >   _fini
> >   _init
> >   amdgpu_bo_alloc
> >   amdgpu_bo_cpu_map
> >   amdgpu_bo_cpu_unmap
> >   amdgpu_bo_export
> >   amdgpu_bo_free
> >   amdgpu_bo_import
> >   amdgpu_bo_inc_ref
> > +amdgpu_bo_list_create_raw
> > +amdgpu_bo_list_destroy_raw
> >   amdgpu_bo_list_create
> >   amdgpu_bo_list_destroy
> >   amdgpu_bo_list_update
> >   amdgpu_bo_query_info
> >   amdgpu_bo_set_metadata
> >   amdgpu_bo_va_op
> >   amdgpu_bo_va_op_raw
> >   amdgpu_bo_wait_for_idle
> >   amdgpu_create_bo_from_user_mem
> >   amdgpu_cs_chunk_fence_info_to_data
> > @@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
> >   amdgpu_cs_destroy_syncobj
> >   amdgpu_cs_export_syncobj
> >   amdgpu_cs_fence_to_handle
> >   amdgpu_cs_import_syncobj
> >   amdgpu_cs_query_fence_status
> >   amdgpu_cs_query_reset_state
> >   amdgpu_query_sw_info
> >   amdgpu_cs_signal_semaphore
> >   amdgpu_cs_submit
> >   amdgpu_cs_submit_raw
> > +amdgpu_cs_submit_raw2
> >   amdgpu_cs_syncobj_export_sync_file
> >   amdgpu_cs_syncobj_import_sync_file
> >   amdgpu_cs_syncobj_reset
> >   amdgpu_cs_syncobj_signal
> >   amdgpu_cs_syncobj_wait
> >   amdgpu_cs_wait_fences
> >   amdgpu_cs_wait_semaphore
> >   amdgpu_device_deinitialize
> >   amdgpu_device_initialize
> >   amdgpu_find_bo_by_cpu_mapping
> > diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> > index dc51659a..5b800033 100644
> > --- a/amdgpu/amdgpu.h
> > +++ b/amdgpu/amdgpu.h
> > @@ -35,20 +35,21 @@
> >   #define _AMDGPU_H_
> >
> >   #include 
> >   #include 
> >
> >   #ifdef __cplusplus
> >   extern "C" {
> >   #endif
> >
> >   struct drm_amdgpu_info_hw_ip;
> > +struct drm_amdgpu_bo_list_entry;
> >
> >
>  
> /*--*/
> >   /* --- Defines
>  */
> >
>  
> /*--*/
> >
> >   /**
> >* Define max. number of Command Buffers (IB) which could be sent to
> the single
> >* hardware IP to accommodate CE/DE requirements
> >*
> >* \sa amdgpu_cs_ib_info
> > @@ -767,34 +768,65 @@ int amdgpu_bo_cpu_unmap(amdgpu_bo_handle
> buf_handle);
> >*and no GPU access is scheduled.
> >*  1 GPU access is in fly or scheduled
> >*
> >* \return   0 - on success
> >*  <0 - Negative POSIX Error code
> >*/
> >   int amdgpu_bo_wait_for_idle(amdgpu_bo_handle buf_handle,
> >   uint64_t timeout_ns,
> >   bool *buffer_busy);
> >
> > +/**
> > + * Creates a BO list handle for command submission.
> > + *
> > + * \param   dev  - \c [in] Device handle.
> > + *  See #amdgpu_device_initialize()
> > + * \param   number_of_buffers- \c [in] Number of BOs in the list
> > + * \param   buffers  - \c [in] List of BO handles
> > + * \param   result   - \c [out] Created BO list handle
> > + *
> > + * \return   0 on success\n
> > + *  <0 - Negative POSIX Error code
> > + *
> > + * \sa amdgpu_bo_list_destroy_raw()
> > +*/
> > +int amdgpu_bo_list_create_raw(amdgpu_device_handle dev,
> > +   uint32_t number_of_buffers,
> > +   struct drm_amdgpu_bo_list_entry *buffers,
> > +   uint32_t *result);
> > +
> > +/**

[PATCH libdrm] amdgpu: add a faster BO list API

2019-01-07 Thread Marek Olšák
From: Marek Olšák 

---
 amdgpu/amdgpu-symbol-check |  3 ++
 amdgpu/amdgpu.h| 56 +-
 amdgpu/amdgpu_bo.c | 36 
 amdgpu/amdgpu_cs.c | 25 +
 4 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 6f5e0f95..96a44b40 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -12,20 +12,22 @@ _edata
 _end
 _fini
 _init
 amdgpu_bo_alloc
 amdgpu_bo_cpu_map
 amdgpu_bo_cpu_unmap
 amdgpu_bo_export
 amdgpu_bo_free
 amdgpu_bo_import
 amdgpu_bo_inc_ref
+amdgpu_bo_list_create_raw
+amdgpu_bo_list_destroy_raw
 amdgpu_bo_list_create
 amdgpu_bo_list_destroy
 amdgpu_bo_list_update
 amdgpu_bo_query_info
 amdgpu_bo_set_metadata
 amdgpu_bo_va_op
 amdgpu_bo_va_op_raw
 amdgpu_bo_wait_for_idle
 amdgpu_create_bo_from_user_mem
 amdgpu_cs_chunk_fence_info_to_data
@@ -40,20 +42,21 @@ amdgpu_cs_destroy_semaphore
 amdgpu_cs_destroy_syncobj
 amdgpu_cs_export_syncobj
 amdgpu_cs_fence_to_handle
 amdgpu_cs_import_syncobj
 amdgpu_cs_query_fence_status
 amdgpu_cs_query_reset_state
 amdgpu_query_sw_info
 amdgpu_cs_signal_semaphore
 amdgpu_cs_submit
 amdgpu_cs_submit_raw
+amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
 amdgpu_cs_syncobj_import_sync_file
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
 amdgpu_device_deinitialize
 amdgpu_device_initialize
 amdgpu_find_bo_by_cpu_mapping
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index dc51659a..5b800033 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -35,20 +35,21 @@
 #define _AMDGPU_H_
 
 #include 
 #include 
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
 struct drm_amdgpu_info_hw_ip;
+struct drm_amdgpu_bo_list_entry;
 
 /*--*/
 /* --- Defines  */
 /*--*/
 
 /**
  * Define max. number of Command Buffers (IB) which could be sent to the single
  * hardware IP to accommodate CE/DE requirements
  *
  * \sa amdgpu_cs_ib_info
@@ -767,34 +768,65 @@ int amdgpu_bo_cpu_unmap(amdgpu_bo_handle buf_handle);
  *and no GPU access is scheduled.
  *  1 GPU access is in fly or scheduled
  *
  * \return   0 - on success
  *  <0 - Negative POSIX Error code
  */
 int amdgpu_bo_wait_for_idle(amdgpu_bo_handle buf_handle,
uint64_t timeout_ns,
bool *buffer_busy);
 
+/**
+ * Creates a BO list handle for command submission.
+ *
+ * \param   dev- \c [in] Device handle.
+ *See #amdgpu_device_initialize()
+ * \param   number_of_buffers  - \c [in] Number of BOs in the list
+ * \param   buffers- \c [in] List of BO handles
+ * \param   result - \c [out] Created BO list handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ * \sa amdgpu_bo_list_destroy_raw()
+*/
+int amdgpu_bo_list_create_raw(amdgpu_device_handle dev,
+ uint32_t number_of_buffers,
+ struct drm_amdgpu_bo_list_entry *buffers,
+ uint32_t *result);
+
+/**
+ * Destroys a BO list handle.
+ *
+ * \param   bo_list- \c [in] BO list handle.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ * \sa amdgpu_bo_list_create_raw(), amdgpu_cs_submit_raw2()
+*/
+int amdgpu_bo_list_destroy_raw(amdgpu_device_handle dev, uint32_t bo_list);
+
 /**
  * Creates a BO list handle for command submission.
  *
  * \param   dev- \c [in] Device handle.
  *See #amdgpu_device_initialize()
  * \param   number_of_resources- \c [in] Number of BOs in the list
  * \param   resources  - \c [in] List of BO handles
  * \param   resource_prios - \c [in] Optional priority for each handle
  * \param   result - \c [out] Created BO list handle
  *
  * \return   0 on success\n
  *  <0 - Negative POSIX Error code
  *
- * \sa amdgpu_bo_list_destroy()
+ * \sa amdgpu_bo_list_destroy(), amdgpu_cs_submit_raw2()
 */
 int amdgpu_bo_list_create(amdgpu_device_handle dev,
  uint32_t number_of_resources,
  amdgpu_bo_handle *resources,
  uint8_t *resource_prios,
  amdgpu_bo_list_handle *result);
 
 /**
  * Destroys a BO list handle.
  *
@@ -1580,20 +1612,42 @@ struct drm_amdgpu_cs_chunk;
 struct drm_amdgpu_cs_chunk_dep;
 struct drm_amdgpu_cs_chunk_data;
 
 int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
 amdgpu_context_handle context,

Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-11-02 Thread Marek Olšák
On Fri, Nov 2, 2018 at 3:39 AM Koenig, Christian 
wrote:

> Am 31.10.18 um 23:12 schrieb Marek Olšák:
>
> On Wed, Oct 31, 2018 at 3:59 AM Koenig, Christian <
> christian.koe...@amd.com> wrote:
>
>> Am 30.10.18 um 16:59 schrieb Michel Dänzer:
>> > On 2018-10-30 4:52 p.m., Marek Olšák wrote:
>> >> On Tue, Oct 30, 2018, 11:49 AM Marek Olšák  wrote:
>> >>> On Tue, Oct 30, 2018, 4:20 AM Michel Dänzer 
>> wrote:
>> >>>
>> >>>> On 2018-10-29 10:15 p.m., Marek Olšák wrote:
>> >>>>> You and I discussed this extensively internally a while ago. It's
>> >>>> expected
>> >>>>> and correct behavior. Mesa doesn't unmap some buffers and never
>> will.
>> >>>> It doesn't need to keep mapping the same buffer over and over again
>> >>>> though, does it?
>> >>>>
>> >>> It doesn't map it again. It just doesn't unmap. So the next map call just
>> >>> returns the pointer. It's correct to stop the counter wraparound.
>> >>>
>> >> Mesa doesn't track whether a buffer is already mapped. Libdrm tracks
>> that.
>> >> It's a feature of libdrm to return the same pointer and expect infinite
>> >> number of map calls.
>> > That's not what the reference counting in libdrm is intended for. It's
>> > for keeping track of how many independent callers have mapped the
>> > buffer. Mesa should remember that it mapped a buffer and not map it
>> again.
>>
>> Well if Mesa just wants to query the existing mapping then why not add an
>> amdgpu_bo_get_cpu_ptr() which just queries if a CPU mapping exists and
>> if yes returns the appropriate pointer or NULL otherwise?
>>
>> I mean when we want to abstract everything in libdrm then we just need
>> to add the functions we need to use this abstraction.
>>
>
> That can be future work for the sake of cleanliness and clarity, but it
> would be a waste of time and wouldn't help old Mesa.
>
>
> That it doesn't help old Mesa is unfortunate, but this is clearly a bug in
> Mesa.
>
> If old Mesa is broken then we should fix it by updating it and not add
> workarounds for specific clients in libdrm.
>

It's not a workaround. We made a decision with amdgpu to share code by
moving portions of the Mesa winsys into libdrm. The map_count is part of
that. It's highly desirable to continue with code sharing. There is nothing
broken with Mesa. Mesa won't check whether a buffer is already mapped.
That's the responsibility of libdrm as part of code sharing and we don't
want to duplicate the same logic in Mesa. It's all part of the intended
design.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-31 Thread Marek Olšák
On Wed, Oct 31, 2018 at 3:59 AM Koenig, Christian 
wrote:

> Am 30.10.18 um 16:59 schrieb Michel Dänzer:
> > On 2018-10-30 4:52 p.m., Marek Olšák wrote:
> >> On Tue, Oct 30, 2018, 11:49 AM Marek Olšák  wrote:
> >>> On Tue, Oct 30, 2018, 4:20 AM Michel Dänzer 
> wrote:
> >>>
> >>>> On 2018-10-29 10:15 p.m., Marek Olšák wrote:
> >>>>> You and I discussed this extensively internally a while ago. It's
> >>>> expected
> >>>>> and correct behavior. Mesa doesn't unmap some buffers and never will.
> >>>> It doesn't need to keep mapping the same buffer over and over again
> >>>> though, does it?
> >>>>
> >>> It doesn't map it again. It just doesn't unmap. So the next map call just
> >>> returns the pointer. It's correct to stop the counter wraparound.
> >>>
> >> Mesa doesn't track whether a buffer is already mapped. Libdrm tracks
> that.
> >> It's a feature of libdrm to return the same pointer and expect infinite
> >> number of map calls.
> > That's not what the reference counting in libdrm is intended for. It's
> > for keeping track of how many independent callers have mapped the
> > buffer. Mesa should remember that it mapped a buffer and not map it
> again.
>
> Well if Mesa just wants to query the existing mapping then why not add an
> amdgpu_bo_get_cpu_ptr() which just queries if a CPU mapping exists and
> if yes returns the appropriate pointer or NULL otherwise?
>
> I mean when we want to abstract everything in libdrm then we just need
> to add the functions we need to use this abstraction.
>

That can be future work for the sake of cleanliness and clarity, but it
would be a waste of time and wouldn't help old Mesa.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-30 Thread Marek Olšák
On Tue, Oct 30, 2018, 11:52 AM Marek Olšák  wrote:

>
>
> On Tue, Oct 30, 2018, 11:49 AM Marek Olšák  wrote:
>
>>
>>
>> On Tue, Oct 30, 2018, 4:20 AM Michel Dänzer  wrote:
>>
>>> On 2018-10-29 10:15 p.m., Marek Olšák wrote:
>>> > You and I discussed this extensively internally a while ago. It's
>>> expected
>>> > and correct behavior. Mesa doesn't unmap some buffers and never will.
>>>
>>> It doesn't need to keep mapping the same buffer over and over again
>>> though, does it?
>>>
>>
>> It doesn't map it again. It just doesn't unmap. So the next map call just
>> returns the pointer. It's correct to stop the counter wraparound.
>>
>
> Mesa doesn't track whether a buffer is already mapped. Libdrm tracks that.
> It's a feature of libdrm to return the same pointer and expect infinite
> number of map calls.
>

Mesa has had this optimization for 8 years (since the radeon
winsys). It's surprising that it surprises you now.

Marek



> Marek
>
>
>> Marek
>>
>>
>>>
>>> --
>>> Earthling Michel Dänzer   |   http://www.amd.com
>>> Libre software enthusiast | Mesa and X developer
>>>
>>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-30 Thread Marek Olšák
On Tue, Oct 30, 2018, 11:49 AM Marek Olšák  wrote:

>
>
> On Tue, Oct 30, 2018, 4:20 AM Michel Dänzer  wrote:
>
>> On 2018-10-29 10:15 p.m., Marek Olšák wrote:
>> > You and I discussed this extensively internally a while ago. It's
>> expected
>> > and correct behavior. Mesa doesn't unmap some buffers and never will.
>>
>> It doesn't need to keep mapping the same buffer over and over again
>> though, does it?
>>
>
> It doesn't map it again. It just doesn't unmap. So the next map call just
> returns the pointer. It's correct to stop the counter wraparound.
>

Mesa doesn't track whether a buffer is already mapped. Libdrm tracks that.
It's a feature of libdrm to return the same pointer and to accept an
unbounded number of map calls.

Marek


> Marek
>
>
>>
>> --
>> Earthling Michel Dänzer   |   http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-30 Thread Marek Olšák
On Tue, Oct 30, 2018, 4:20 AM Michel Dänzer  wrote:

> On 2018-10-29 10:15 p.m., Marek Olšák wrote:
> > You and I discussed this extensively internally a while ago. It's
> expected
> > and correct behavior. Mesa doesn't unmap some buffers and never will.
>
> It doesn't need to keep mapping the same buffer over and over again
> though, does it?
>

It doesn't map it again. It just doesn't unmap. So the next map call just
returns the pointer. It's correct to stop the counter wraparound.

Marek


>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 2/2] amdgpu: don't track handles for non-memory allocations

2018-10-29 Thread Marek Olšák
OK. I'll drop this patch.

Marek

On Wed, Oct 24, 2018 at 4:14 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> On 24.10.18 at 10:04, Michel Dänzer wrote:
> > On 2018-10-23 9:07 p.m., Marek Olšák wrote:
> >> From: Marek Olšák 
> >>
> >> ---
> >>   amdgpu/amdgpu_bo.c | 15 +--
> >>   1 file changed, 9 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
> >> index 81f8a5f7..00b9b54a 100644
> >> --- a/amdgpu/amdgpu_bo.c
> >> +++ b/amdgpu/amdgpu_bo.c
> >> @@ -91,26 +91,29 @@ drm_public int amdgpu_bo_alloc(amdgpu_device_handle
> dev,
> >>  if (r)
> >>  goto out;
> >>
> >>  r = amdgpu_bo_create(dev, alloc_buffer->alloc_size,
> args.out.handle,
> >>   buf_handle);
> >>  if (r) {
> >>  amdgpu_close_kms_handle(dev, args.out.handle);
> >>  goto out;
> >>  }
> >>
> >> -pthread_mutex_lock(&dev->bo_table_mutex);
> >> -r = handle_table_insert(&dev->bo_handles, (*buf_handle)->handle,
> >> -*buf_handle);
> >> -pthread_mutex_unlock(&dev->bo_table_mutex);
> >> -if (r)
> >> -amdgpu_bo_free(*buf_handle);
> >> +if (alloc_buffer->preferred_heap &
> >> +(AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) {
> > What about AMDGPU_GEM_DOMAIN_CPU? I mean, that's unlikely to actually be
> > used here, but if it were, exporting and importing the resulting BO
> > should work fine?
> >
> > Instead of white-listing the domains which can be shared, it might be
> > better to black-list those which can't, i.e. GDS/GWS/OA.
>
> Well first of all GDS can be shared between applications.
>
> Then adding a BO to the tracking doesn't add much overhead (only 8 bytes
> and only if it was the last allocated).
>
> So I don't really see a reason why we should do this?
>
> Christian.
>
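
The patch was dropped, but for clarity, the black-list variant Michel suggests
above would look roughly like this (a sketch only, reusing the code from the
dropped patch and the domains named in the discussion):

    /* Sketch: track every allocation except the domains suggested for
     * exclusion. Note that GDS turned out to be shareable, which is one
     * reason the patch was dropped. */
    if (!(alloc_buffer->preferred_heap &
          (AMDGPU_GEM_DOMAIN_GDS | AMDGPU_GEM_DOMAIN_GWS |
           AMDGPU_GEM_DOMAIN_OA))) {
            pthread_mutex_lock(&dev->bo_table_mutex);
            r = handle_table_insert(&dev->bo_handles, (*buf_handle)->handle,
                                    *buf_handle);
            pthread_mutex_unlock(&dev->bo_table_mutex);
            if (r)
                    amdgpu_bo_free(*buf_handle);
    }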
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-29 Thread Marek Olšák
You and I discussed this extensively internally a while ago. It's expected
and correct behavior. Mesa doesn't unmap some buffers and never will.

Marek

On Wed, Oct 24, 2018 at 3:45 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> That looks really ugly to me. Mapping the same BO so often is illegal
> and should be handled as error.
>
> Otherwise we will never be able to cleanly recover from a GPU lockup
> with lost state by reloading the client library.
>
> Christian.
>
> On 23.10.18 at 21:07, Marek Olšák wrote:
> > From: Marek Olšák 
> >
> > ---
> >   amdgpu/amdgpu_bo.c | 19 +--
> >   1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
> > index c0f42e81..81f8a5f7 100644
> > --- a/amdgpu/amdgpu_bo.c
> > +++ b/amdgpu/amdgpu_bo.c
> > @@ -22,20 +22,21 @@
> >*
> >*/
> >
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> >
> >   #include "libdrm_macros.h"
> >   #include "xf86drm.h"
> >   #include "amdgpu_drm.h"
> >   #include "amdgpu_internal.h"
> >   #include "util_math.h"
> >
> > @@ -442,21 +443,29 @@ drm_public int amdgpu_bo_cpu_map(amdgpu_bo_handle
> bo, void **cpu)
> >   {
> >   union drm_amdgpu_gem_mmap args;
> >   void *ptr;
> >   int r;
> >
> >   pthread_mutex_lock(&bo->cpu_access_mutex);
> >
> >   if (bo->cpu_ptr) {
> >   /* already mapped */
> >   assert(bo->cpu_map_count > 0);
> > - bo->cpu_map_count++;
> > +
> > + /* If the counter has already reached INT_MAX, don't
> increment
> > +  * it and assume that the buffer will be mapped
> indefinitely.
> > +  * The buffer is pretty unlikely to get unmapped by the
> user
> > +  * at this point.
> > +  */
> > + if (bo->cpu_map_count != INT_MAX)
> > + bo->cpu_map_count++;
> > +
> >   *cpu = bo->cpu_ptr;
> >   pthread_mutex_unlock(&bo->cpu_access_mutex);
> >   return 0;
> >   }
> >
> >   assert(bo->cpu_map_count == 0);
> >
> >   memset(&args, 0, sizeof(args));
> >
> >   /* Query the buffer address (args.addr_ptr).
> > @@ -492,21 +501,27 @@ drm_public int
> amdgpu_bo_cpu_unmap(amdgpu_bo_handle bo)
> >
> >   pthread_mutex_lock(&bo->cpu_access_mutex);
> >   assert(bo->cpu_map_count >= 0);
> >
> >   if (bo->cpu_map_count == 0) {
> >   /* not mapped */
> >   pthread_mutex_unlock(&bo->cpu_access_mutex);
> >   return -EINVAL;
> >   }
> >
> > - bo->cpu_map_count--;
> > + /* If the counter has already reached INT_MAX, don't decrement it.
> > +  * This is because amdgpu_bo_cpu_map doesn't increment it past
> > +  * INT_MAX.
> > +  */
> > + if (bo->cpu_map_count != INT_MAX)
> > + bo->cpu_map_count--;
> > +
> >   if (bo->cpu_map_count > 0) {
> >   /* mapped multiple times */
> >   pthread_mutex_unlock(&bo->cpu_access_mutex);
> >   return 0;
> >   }
> >
> >   r = drm_munmap(bo->cpu_ptr, bo->alloc_size) == 0 ? 0 : -errno;
> >   bo->cpu_ptr = NULL;
> >   pthread_mutex_unlock(&bo->cpu_access_mutex);
> >   return r;
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-29 Thread Marek Olšák
On Tue, Oct 23, 2018 at 10:38 PM Zhang, Jerry(Junwei) 
wrote:

> On 10/24/18 3:07 AM, Marek Olšák wrote:
> > From: Marek Olšák 
>
> We need commit log and sign-off here.
>
> BTW, have you encountered any issues with that?
>

I don't know what you mean. I'm pretty sure that a sign-off is not needed
for libdrm.


>
> >
> > ---
> >   amdgpu/amdgpu_bo.c | 19 +--
> >   1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
> > index c0f42e81..81f8a5f7 100644
> > --- a/amdgpu/amdgpu_bo.c
> > +++ b/amdgpu/amdgpu_bo.c
> > @@ -22,20 +22,21 @@
> >*
> >*/
> >
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> >
> >   #include "libdrm_macros.h"
> >   #include "xf86drm.h"
> >   #include "amdgpu_drm.h"
> >   #include "amdgpu_internal.h"
> >   #include "util_math.h"
> >
> > @@ -442,21 +443,29 @@ drm_public int amdgpu_bo_cpu_map(amdgpu_bo_handle
> bo, void **cpu)
> >   {
> >   union drm_amdgpu_gem_mmap args;
> >   void *ptr;
> >   int r;
> >
> >   pthread_mutex_lock(&bo->cpu_access_mutex);
> >
> >   if (bo->cpu_ptr) {
> >   /* already mapped */
> >   assert(bo->cpu_map_count > 0);
> > - bo->cpu_map_count++;
> > +
> > + /* If the counter has already reached INT_MAX, don't
> increment
> > +  * it and assume that the buffer will be mapped
> indefinitely.
> > +  * The buffer is pretty unlikely to get unmapped by the
> user
> > +  * at this point.
> > +  */
> > + if (bo->cpu_map_count != INT_MAX)
> > + bo->cpu_map_count++;
>
> If so, shall we print an error here to note that indefinite mappings
> come up?
>

No error. This is expected usage.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH libdrm 2/2] amdgpu: don't track handles for non-memory allocations

2018-10-23 Thread Marek Olšák
From: Marek Olšák 

---
 amdgpu/amdgpu_bo.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 81f8a5f7..00b9b54a 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -91,26 +91,29 @@ drm_public int amdgpu_bo_alloc(amdgpu_device_handle dev,
if (r)
goto out;
 
r = amdgpu_bo_create(dev, alloc_buffer->alloc_size, args.out.handle,
 buf_handle);
if (r) {
amdgpu_close_kms_handle(dev, args.out.handle);
goto out;
}
 
-   pthread_mutex_lock(&dev->bo_table_mutex);
-   r = handle_table_insert(&dev->bo_handles, (*buf_handle)->handle,
-   *buf_handle);
-   pthread_mutex_unlock(&dev->bo_table_mutex);
-   if (r)
-   amdgpu_bo_free(*buf_handle);
+   if (alloc_buffer->preferred_heap &
+   (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) {
+   pthread_mutex_lock(&dev->bo_table_mutex);
+   r = handle_table_insert(&dev->bo_handles, (*buf_handle)->handle,
+   *buf_handle);
+   pthread_mutex_unlock(&dev->bo_table_mutex);
+   if (r)
+   amdgpu_bo_free(*buf_handle);
+   }
 out:
return r;
 }
 
 drm_public int amdgpu_bo_set_metadata(amdgpu_bo_handle bo,
  struct amdgpu_bo_metadata *info)
 {
struct drm_amdgpu_gem_metadata args = {};
 
args.handle = bo->handle;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH libdrm 1/2] amdgpu: prevent an integer wraparound of cpu_map_count

2018-10-23 Thread Marek Olšák
From: Marek Olšák 

---
 amdgpu/amdgpu_bo.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index c0f42e81..81f8a5f7 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -22,20 +22,21 @@
  *
  */
 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 
 #include "libdrm_macros.h"
 #include "xf86drm.h"
 #include "amdgpu_drm.h"
 #include "amdgpu_internal.h"
 #include "util_math.h"
 
@@ -442,21 +443,29 @@ drm_public int amdgpu_bo_cpu_map(amdgpu_bo_handle bo, 
void **cpu)
 {
union drm_amdgpu_gem_mmap args;
void *ptr;
int r;
 
pthread_mutex_lock(&bo->cpu_access_mutex);
 
if (bo->cpu_ptr) {
/* already mapped */
assert(bo->cpu_map_count > 0);
-   bo->cpu_map_count++;
+
+   /* If the counter has already reached INT_MAX, don't increment
+* it and assume that the buffer will be mapped indefinitely.
+* The buffer is pretty unlikely to get unmapped by the user
+* at this point.
+*/
+   if (bo->cpu_map_count != INT_MAX)
+   bo->cpu_map_count++;
+
*cpu = bo->cpu_ptr;
pthread_mutex_unlock(&bo->cpu_access_mutex);
return 0;
}
 
assert(bo->cpu_map_count == 0);
 
memset(&args, 0, sizeof(args));
 
/* Query the buffer address (args.addr_ptr).
@@ -492,21 +501,27 @@ drm_public int amdgpu_bo_cpu_unmap(amdgpu_bo_handle bo)
 
pthread_mutex_lock(&bo->cpu_access_mutex);
assert(bo->cpu_map_count >= 0);
 
if (bo->cpu_map_count == 0) {
/* not mapped */
pthread_mutex_unlock(&bo->cpu_access_mutex);
return -EINVAL;
}
 
-   bo->cpu_map_count--;
+   /* If the counter has already reached INT_MAX, don't decrement it.
+* This is because amdgpu_bo_cpu_map doesn't increment it past
+* INT_MAX.
+*/
+   if (bo->cpu_map_count != INT_MAX)
+   bo->cpu_map_count--;
+
if (bo->cpu_map_count > 0) {
/* mapped multiple times */
pthread_mutex_unlock(&bo->cpu_access_mutex);
return 0;
}
 
r = drm_munmap(bo->cpu_ptr, bo->alloc_size) == 0 ? 0 : -errno;
bo->cpu_ptr = NULL;
pthread_mutex_unlock(&bo->cpu_access_mutex);
return r;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Add DCC flags for GFX9 amdgpu_bo

2018-10-23 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Tue, Oct 23, 2018 at 10:05 AM Nicholas Kazlauskas <
nicholas.kazlaus...@amd.com> wrote:

> [Why]
> Hardware support for Delta Color Compression (DCC) decompression is
> available in DC for GFX9 but there's no way for userspace to enable
> the feature.
>
> Enabling the feature can provide improved GFX performance and
> power savings in many situations.
>
> [How]
> Extend the GFX9 tiling flags to include DCC parameters. These are
> logically grouped together with tiling flags even if they are
> technically distinct.
>
> This trivially maintains backwards compatibility with existing
> users of amdgpu_gem_metadata. No new IOCTls or data structures are
> needed to support DCC.
>
> This patch helps expose DCC attributes to both libdrm and amdgpu_dm.
>
> Signed-off-by: Nicholas Kazlauskas 
> ---
>  include/uapi/drm/amdgpu_drm.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 6a0d77dcfc47..faaad04814e4 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -329,6 +329,12 @@ struct drm_amdgpu_gem_userptr {
>  /* GFX9 and later: */
>  #define AMDGPU_TILING_SWIZZLE_MODE_SHIFT   0
>  #define AMDGPU_TILING_SWIZZLE_MODE_MASK0x1f
> +#define AMDGPU_TILING_DCC_OFFSET_256B_SHIFT5
> +#define AMDGPU_TILING_DCC_OFFSET_256B_MASK 0xFF
> +#define AMDGPU_TILING_DCC_PITCH_MAX_SHIFT  29
> +#define AMDGPU_TILING_DCC_PITCH_MAX_MASK   0x3FFF
> +#define AMDGPU_TILING_DCC_INDEPENDENT_64B_SHIFT43
> +#define AMDGPU_TILING_DCC_INDEPENDENT_64B_MASK 0x1
>
>  /* Set/Get helpers for tiling flags. */
>  #define AMDGPU_TILING_SET(field, value) \
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
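
As a usage sketch, a userspace driver could pack the new DCC parameters into
the same 64-bit tiling word with the existing AMDGPU_TILING_SET() helper
referenced just above the new defines. The variable names and values below are
made up for illustration:

    uint64_t tiling_flags =
            AMDGPU_TILING_SET(SWIZZLE_MODE, swizzle_mode) |
            AMDGPU_TILING_SET(DCC_OFFSET_256B, dcc_offset >> 8) | /* 256B units */
            AMDGPU_TILING_SET(DCC_PITCH_MAX, dcc_pitch_max) |
            AMDGPU_TILING_SET(DCC_INDEPENDENT_64B, dcc_independent_64b ? 1 : 0);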
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/3] drm/amdgpu: increase the size of HQD EOP buffers

2018-10-18 Thread Marek Olšák
On Tue, Oct 9, 2018 at 12:17 PM Alex Deucher  wrote:

> On Fri, Oct 5, 2018 at 5:01 PM Marek Olšák  wrote:
> >
> > From: Marek Olšák 
> >
> > Signed-off-by: Marek Olšák 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 2 +-
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
>
> Any reason not to bump the size for gfx7 as well?
>

No, I just don't know if gfx7 supports the same size.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/3] drm/amdgpu: put HQD EOP buffers into VRAM

2018-10-05 Thread Marek Olšák
From: Marek Olšák 

This increases performance of compute queues.
EOP events (PKT3_RELEASE_MEM) are stored into these buffers.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 0e72bc09939a..000180d79f30 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2774,21 +2774,21 @@ static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
 
/* take ownership of the relevant compute queues */
amdgpu_gfx_compute_queue_acquire(adev);
 
/* allocate space for ALL pipes (even the ones we don't own) */
mec_hpd_size = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe_per_mec
* GFX7_MEC_HPD_SIZE * 2;
 
r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_GTT,
+ AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.mec.hpd_eop_obj,
  &adev->gfx.mec.hpd_eop_gpu_addr,
  (void **)&hpd);
if (r) {
dev_warn(adev->dev, "(%d) create, pin or map of HDP EOP bo 
failed\n", r);
gfx_v7_0_mec_fini(adev);
return r;
}
 
/* clear memory.  Not sure if this is required or not */
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 191feafc3b60..8b6dae7a10bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1436,21 +1436,21 @@ static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
size_t mec_hpd_size;
 
bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
 
/* take ownership of the relevant compute queues */
amdgpu_gfx_compute_queue_acquire(adev);
 
mec_hpd_size = adev->gfx.num_compute_rings * GFX8_MEC_HPD_SIZE;
 
r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_GTT,
+ AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.mec.hpd_eop_obj,
  &adev->gfx.mec.hpd_eop_gpu_addr,
  (void **)&hpd);
if (r) {
dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
return r;
}
 
memset(hpd, 0, mec_hpd_size);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index a9d3d6a3fb41..3aaacf61d85e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1247,21 +1247,21 @@ static int gfx_v9_0_mec_init(struct amdgpu_device *adev)
 
const struct gfx_firmware_header_v1_0 *mec_hdr;
 
bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
 
/* take ownership of the relevant compute queues */
amdgpu_gfx_compute_queue_acquire(adev);
mec_hpd_size = adev->gfx.num_compute_rings * GFX9_MEC_HPD_SIZE;
 
r = amdgpu_bo_create_reserved(adev, mec_hpd_size, PAGE_SIZE,
- AMDGPU_GEM_DOMAIN_GTT,
+ AMDGPU_GEM_DOMAIN_VRAM,
  &adev->gfx.mec.hpd_eop_obj,
  &adev->gfx.mec.hpd_eop_gpu_addr,
  (void **)&hpd);
if (r) {
dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", r);
gfx_v9_0_mec_fini(adev);
return r;
}
 
memset(hpd, 0, adev->gfx.mec.hpd_eop_obj->tbo.mem.size);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/3] drm/amdgpu: set GTT_USWC on reserved VRAM allocations

2018-10-05 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 904014dc5915..8e0f47343e0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -241,21 +241,23 @@ int amdgpu_bo_create_reserved(struct amdgpu_device *adev,
if (!size) {
amdgpu_bo_unref(bo_ptr);
return 0;
}
 
memset(&bp, 0, sizeof(bp));
bp.size = size;
bp.byte_align = align;
bp.domain = domain;
bp.flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
+  AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
+  (domain & AMDGPU_GEM_DOMAIN_VRAM ?
+   AMDGPU_GEM_CREATE_CPU_GTT_USWC : 0);
bp.type = ttm_bo_type_kernel;
bp.resv = NULL;
 
if (!*bo_ptr) {
r = amdgpu_bo_create(adev, &bp, bo_ptr);
if (r) {
dev_err(adev->dev, "(%d) failed to allocate kernel 
bo\n",
r);
return r;
}
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/3] drm/amdgpu: increase the size of HQD EOP buffers

2018-10-05 Thread Marek Olšák
From: Marek Olšák 

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 77e05c19022a..191feafc3b60 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -47,21 +47,21 @@
 #include "gca/gfx_8_0_enum.h"
 
 #include "dce/dce_10_0_d.h"
 #include "dce/dce_10_0_sh_mask.h"
 
 #include "smu/smu_7_1_3_d.h"
 
 #include "ivsrcid/ivsrcid_vislands30.h"
 
 #define GFX8_NUM_GFX_RINGS 1
-#define GFX8_MEC_HPD_SIZE 2048
+#define GFX8_MEC_HPD_SIZE 4096
 
 #define TOPAZ_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define CARRIZO_GB_ADDR_CONFIG_GOLDEN 0x22010001
 #define POLARIS11_GB_ADDR_CONFIG_GOLDEN 0x22011002
 #define TONGA_GB_ADDR_CONFIG_GOLDEN 0x22011003
 
 #define ARRAY_MODE(x)  ((x) << 
GB_TILE_MODE0__ARRAY_MODE__SHIFT)
 #define PIPE_CONFIG(x) ((x) << 
GB_TILE_MODE0__PIPE_CONFIG__SHIFT)
 #define TILE_SPLIT(x)  ((x) << 
GB_TILE_MODE0__TILE_SPLIT__SHIFT)
 #define MICRO_TILE_MODE_NEW(x) ((x) << 
GB_TILE_MODE0__MICRO_TILE_MODE_NEW__SHIFT)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 4b020cc4bea9..a9d3d6a3fb41 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -34,21 +34,21 @@
 #include "vega10_enum.h"
 #include "hdp/hdp_4_0_offset.h"
 
 #include "soc15_common.h"
 #include "clearstate_gfx9.h"
 #include "v9_structs.h"
 
 #include "ivsrcid/gfx/irqsrcs_gfx_9_0.h"
 
 #define GFX9_NUM_GFX_RINGS 1
-#define GFX9_MEC_HPD_SIZE 2048
+#define GFX9_MEC_HPD_SIZE 4096
 #define RLCG_UCODE_LOADING_START_ADDRESS 0x2000L
 #define RLC_SAVE_RESTORE_ADDR_STARTING_OFFSET 0xL
 
 #define mmPWR_MISC_CNTL_STATUS 0x0183
 #define mmPWR_MISC_CNTL_STATUS_BASE_IDX0
 #define PWR_MISC_CNTL_STATUS__PWR_GFX_RLC_CGPG_EN__SHIFT   0x0
 #define PWR_MISC_CNTL_STATUS__PWR_GFXOFF_STATUS__SHIFT 0x1
 #define PWR_MISC_CNTL_STATUS__PWR_GFX_RLC_CGPG_EN_MASK 0x0001L
 #define PWR_MISC_CNTL_STATUS__PWR_GFXOFF_STATUS_MASK   0x0006L
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm v2 2/2] amdgpu/test: Fix deadlock tests for AI and RV v2

2018-10-03 Thread Marek Olšák
Yes, Andrey has commit rights.

Marek

On Wed, Oct 3, 2018 at 10:34 AM Christian König
 wrote:
>
> Thanks for keeping working on this.
>
> Series is Reviewed-by: Christian König  as well.
>
> Do you now have commit rights?
>
> Christian.
>
> On 02.10.2018 at 22:47, Marek Olšák wrote:
> > For the series:
> >
> > Reviewed-by: Marek Olšák 
> >
> > Marek
> > On Fri, Sep 28, 2018 at 10:46 AM Andrey Grodzovsky
> >  wrote:
> >> Seems like AI and RV require uncached memory mapping to be able
> >> to pick up the value written to memory by the CPU after the WAIT_REG_MEM
> >> command was already launched.
> >> .
> >> Enable the test for AI and RV.
> >>
> >> v2:
> >> Update commit description.
> >>
> >> Signed-off-by: Andrey Grodzovsky 
> >> ---
> >>   tests/amdgpu/deadlock_tests.c | 13 -
> >>   1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/tests/amdgpu/deadlock_tests.c b/tests/amdgpu/deadlock_tests.c
> >> index 304482d..292ec4e 100644
> >> --- a/tests/amdgpu/deadlock_tests.c
> >> +++ b/tests/amdgpu/deadlock_tests.c
> >> @@ -80,6 +80,8 @@ static  uint32_t  minor_version;
> >>   static pthread_t stress_thread;
> >>   static uint32_t *ptr;
> >>
> >> +int use_uc_mtype = 0;
> >> +
> >>   static void amdgpu_deadlock_helper(unsigned ip_type);
> >>   static void amdgpu_deadlock_gfx(void);
> >>   static void amdgpu_deadlock_compute(void);
> >> @@ -92,13 +94,14 @@ CU_BOOL suite_deadlock_tests_enable(void)
> >>   &minor_version, 
> >> &device_handle))
> >>  return CU_FALSE;
> >>
> >> -   if (device_handle->info.family_id == AMDGPU_FAMILY_AI ||
> >> -   device_handle->info.family_id == AMDGPU_FAMILY_SI ||
> >> -   device_handle->info.family_id == AMDGPU_FAMILY_RV) {
> >> +   if (device_handle->info.family_id == AMDGPU_FAMILY_SI) {
> >>  printf("\n\nCurrently hangs the CP on this ASIC, deadlock 
> >> suite disabled\n");
> >>  enable = CU_FALSE;
> >>  }
> >>
> >> +   if (device_handle->info.family_id >= AMDGPU_FAMILY_AI)
> >> +   use_uc_mtype = 1;
> >> +
> >>  if (amdgpu_device_deinitialize(device_handle))
> >>  return CU_FALSE;
> >>
> >> @@ -183,8 +186,8 @@ static void amdgpu_deadlock_helper(unsigned ip_type)
> >>  r = amdgpu_cs_ctx_create(device_handle, &context_handle);
> >>  CU_ASSERT_EQUAL(r, 0);
> >>
> >> -   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
> >> -   AMDGPU_GEM_DOMAIN_GTT, 0,
> >> +   r = amdgpu_bo_alloc_and_map_raw(device_handle, 4096, 4096,
> >> +   AMDGPU_GEM_DOMAIN_GTT, 0, use_uc_mtype ? 
> >> AMDGPU_VM_MTYPE_UC : 0,
> >>  &ib_result_handle, 
> >> &ib_result_cpu,
> >>  
> >> &ib_result_mc_address, &va_handle);
> >>  CU_ASSERT_EQUAL(r, 0);
> >> --
> >> 2.7.4
> >>
> >> ___
> >> dri-devel mailing list
> >> dri-de...@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm v2 2/2] amdgpu/test: Fix deadlock tests for AI and RV v2

2018-10-02 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek
On Fri, Sep 28, 2018 at 10:46 AM Andrey Grodzovsky
 wrote:
>
> Seems like AI and RV require uncached memory mapping to be able
> to pick up the value written to memory by the CPU after the WAIT_REG_MEM
> command was already launched.
> .
> Enable the test for AI and RV.
>
> v2:
> Update commit description.
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  tests/amdgpu/deadlock_tests.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/tests/amdgpu/deadlock_tests.c b/tests/amdgpu/deadlock_tests.c
> index 304482d..292ec4e 100644
> --- a/tests/amdgpu/deadlock_tests.c
> +++ b/tests/amdgpu/deadlock_tests.c
> @@ -80,6 +80,8 @@ static  uint32_t  minor_version;
>  static pthread_t stress_thread;
>  static uint32_t *ptr;
>
> +int use_uc_mtype = 0;
> +
>  static void amdgpu_deadlock_helper(unsigned ip_type);
>  static void amdgpu_deadlock_gfx(void);
>  static void amdgpu_deadlock_compute(void);
> @@ -92,13 +94,14 @@ CU_BOOL suite_deadlock_tests_enable(void)
>  &minor_version, &device_handle))
> return CU_FALSE;
>
> -   if (device_handle->info.family_id == AMDGPU_FAMILY_AI ||
> -   device_handle->info.family_id == AMDGPU_FAMILY_SI ||
> -   device_handle->info.family_id == AMDGPU_FAMILY_RV) {
> +   if (device_handle->info.family_id == AMDGPU_FAMILY_SI) {
> printf("\n\nCurrently hangs the CP on this ASIC, deadlock 
> suite disabled\n");
> enable = CU_FALSE;
> }
>
> +   if (device_handle->info.family_id >= AMDGPU_FAMILY_AI)
> +   use_uc_mtype = 1;
> +
> if (amdgpu_device_deinitialize(device_handle))
> return CU_FALSE;
>
> @@ -183,8 +186,8 @@ static void amdgpu_deadlock_helper(unsigned ip_type)
> r = amdgpu_cs_ctx_create(device_handle, &context_handle);
> CU_ASSERT_EQUAL(r, 0);
>
> -   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
> -   AMDGPU_GEM_DOMAIN_GTT, 0,
> +   r = amdgpu_bo_alloc_and_map_raw(device_handle, 4096, 4096,
> +   AMDGPU_GEM_DOMAIN_GTT, 0, use_uc_mtype ? 
> AMDGPU_VM_MTYPE_UC : 0,
> &ib_result_handle, 
> &ib_result_cpu,
> &ib_result_mc_address, 
> &va_handle);
> CU_ASSERT_EQUAL(r, 0);
> --
> 2.7.4
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm 1/3] amdgpu: Propogate user flags to amdgpu_bo_va_op_raw

2018-09-27 Thread Marek Olšák
This will break old UMDs that didn't set the flags correctly. Instead,
UMDs should stop using amdgpu_bo_va_op if they want to set the flags.

Marek
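
For reference, the alternative being pointed at here is the existing
amdgpu_bo_va_op_raw() entry point, which takes the full flag set explicitly.
A sketch of such a call (the offset, size, VA and the UC mtype flag are
illustrative):

    /* Sketch: map a BO with exactly the flags the UMD wants instead of
     * relying on amdgpu_bo_va_op()'s hardcoded R|W|X set. */
    r = amdgpu_bo_va_op_raw(dev, bo, 0 /* offset */, size, va_address,
                            AMDGPU_VM_PAGE_READABLE |
                            AMDGPU_VM_PAGE_WRITEABLE |
                            AMDGPU_VM_MTYPE_UC /* extra flag */,
                            AMDGPU_VA_OP_MAP);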
On Thu, Sep 27, 2018 at 3:05 PM Andrey Grodzovsky
 wrote:
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  amdgpu/amdgpu_bo.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
> index c0f42e8..1892345 100644
> --- a/amdgpu/amdgpu_bo.c
> +++ b/amdgpu/amdgpu_bo.c
> @@ -736,7 +736,7 @@ drm_public int amdgpu_bo_va_op(amdgpu_bo_handle bo,
>uint64_t offset,
>uint64_t size,
>uint64_t addr,
> -  uint64_t flags,
> +  uint64_t extra_flags,
>uint32_t ops)
>  {
> amdgpu_device_handle dev = bo->dev;
> @@ -746,7 +746,8 @@ drm_public int amdgpu_bo_va_op(amdgpu_bo_handle bo,
> return amdgpu_bo_va_op_raw(dev, bo, offset, size, addr,
>AMDGPU_VM_PAGE_READABLE |
>AMDGPU_VM_PAGE_WRITEABLE |
> -  AMDGPU_VM_PAGE_EXECUTABLE, ops);
> +  AMDGPU_VM_PAGE_EXECUTABLE |
> +  extra_flags, ops);
>  }
>
>  drm_public int amdgpu_bo_va_op_raw(amdgpu_device_handle dev,
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
To be fair, since we have only 7 user VMIDs and 8 chunks of GDS, we
can make the 8th GDS chunk global and allocatable and use it based on
a CS flag. It would need more work and a lot of testing though. I
don't think we can do the testing part now because of the complexity
of interactions between per-VMID GDS and global GDS, but it's
certainly something that people could add in the future.

Marek

On Thu, Sep 13, 2018 at 3:04 PM, Marek Olšák  wrote:
> I was thinking about that too, but it would be too much trouble for
> something we don't need.
>
> Marek
>
> On Thu, Sep 13, 2018 at 2:57 PM, Deucher, Alexander
>  wrote:
>> Why don't we just fix up the current GDS code so it works the same as vram
>> and then we can add a new CS or context flag to ignore the current static
>> allocation for gfx.  We can ignore data persistence if it's too much
>> trouble.  Assume you always have to init the memory before you use it.
>> That's already the case.
>>
>>
>> Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
I was thinking about that too, but it would be too much trouble for
something we don't need.

Marek

On Thu, Sep 13, 2018 at 2:57 PM, Deucher, Alexander
 wrote:
> Why don't we just fix up the current GDS code so it works the same as vram
> and then we can add a new CS or context flag to ignore the current static
> allocation for gfx.  We can ignore data persistence if it's too much
> trouble.  Assume you always have to init the memory before you use it.
> That's already the case.
>
>
> Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
GDS is a temporary memory. Its purpose depends on the job, but most of
the time, the idea is:
- beginning of IB
- initialize GDS variables
- dispatch compute that works with GDS variables
- when done, copy GDS variables to memory
- repeat ...
- end of IB

GDS is like a pool of global shader GPRs.

GDS is too small for persistent data.

Marek

On Thu, Sep 13, 2018 at 1:26 PM, Christian König
 wrote:
> Are you sure of that? I mean it is rather pointless to have a Global Data
> Share when it can't be used to share anything?
>
> On the other hand I'm not opposed to get rid of all that stuff if we really
> don't need it.
>
> Christian.
>
>> On 13.09.2018 at 17:27, Marek Olšák wrote:
>>
>> That's OK. We don't need IBs to get the same VMID.
>>
>> Marek
>>
>> On Thu, Sep 13, 2018 at 4:40 AM, Christian König
>>  wrote:
>>>
>>> As discussed internally that doesn't work because threads don't necessarily
>>> get the same VMID assigned.
>>>
>>> Christian.
>>>
>>>> On 12.09.2018 at 22:33, Marek Olšák wrote:
>>>>
>>>> From: Marek Olšák 
>>>>
>>>> I've chosen to do it like this because it's easy and allows an arbitrary
>>>> number of processes.
>>>>
>>>> Signed-off-by: Marek Olšák 
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
>>>>drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125
>>>> +---
>>>>drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
>>>>drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
>>>>include/uapi/drm/amdgpu_drm.h   |  15 +--
>>>>19 files changed, 109 insertions(+), 519 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>>> index b80243d3972e..7264a4930b88 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>>>> @@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
>>>> *adev,
>>>> struct drm_file *filp,
>>>>  / sizeof(struct amdgpu_bo_list_entry))
>>>>  return -EINVAL;
>>>>  size = sizeof(struct amdgpu_bo_list);
>>>>  size += num_entries * sizeof(struct amdgpu_bo_list_entry);
>>>>  list = kvmalloc(size, GFP_KERNEL);
>>>>  if (!list)
>>>>  return -ENOMEM;
>>>>  kref_init(&list->refcount);
>>>> -   list->gds_obj = adev->gds.gds_gfx_bo;
>>>> -   list->gws_obj = adev->gds.gws_gfx_bo;
>>>> -   list->oa_obj = adev->gds.oa_gfx_bo;
>>>>  array = amdgpu_bo_list_array_entry(list, 0);
>>>>  memset(array, 0, num_entries * sizeof(struct
>>>> amdgpu_bo_list_entry));
>>>>  for (i = 0; i < num_entries; ++i) {
>>>>  struct amdgpu_bo_list_entry *entry;
>>>>  struct drm_gem_object *gobj;
>>>>  struct amdgpu_bo *bo;
>>>>  struct mm_struct *usermm;
>>>>@@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
>>>> *adev, struct drm_file *filp,
>>>>  } else {
>>>>  entry = &array[last_entry++];
>>>>   

Re: [PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-13 Thread Marek Olšák
That's OK. We don't need IBs to get the same VMID.

Marek

On Thu, Sep 13, 2018 at 4:40 AM, Christian König
 wrote:
> As discussed internally that doesn't work because threads don't necessarily
> get the same VMID assigned.
>
> Christian.
>
> On 12.09.2018 at 22:33, Marek Olšák wrote:
>>
>> From: Marek Olšák 
>>
>> I've chosen to do it like this because it's easy and allows an arbitrary
>> number of processes.
>>
>> Signed-off-by: Marek Olšák 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
>>   drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125 +---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
>>   include/uapi/drm/amdgpu_drm.h   |  15 +--
>>   19 files changed, 109 insertions(+), 519 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> index b80243d3972e..7264a4930b88 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
>> @@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev,
>> struct drm_file *filp,
>> / sizeof(struct amdgpu_bo_list_entry))
>> return -EINVAL;
>> size = sizeof(struct amdgpu_bo_list);
>> size += num_entries * sizeof(struct amdgpu_bo_list_entry);
>> list = kvmalloc(size, GFP_KERNEL);
>> if (!list)
>> return -ENOMEM;
>> kref_init(&list->refcount);
>> -   list->gds_obj = adev->gds.gds_gfx_bo;
>> -   list->gws_obj = adev->gds.gws_gfx_bo;
>> -   list->oa_obj = adev->gds.oa_gfx_bo;
>> array = amdgpu_bo_list_array_entry(list, 0);
>> memset(array, 0, num_entries * sizeof(struct
>> amdgpu_bo_list_entry));
>> for (i = 0; i < num_entries; ++i) {
>> struct amdgpu_bo_list_entry *entry;
>> struct drm_gem_object *gobj;
>> struct amdgpu_bo *bo;
>> struct mm_struct *usermm;
>>   @@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device
>> *adev, struct drm_file *filp,
>> } else {
>> entry = &array[last_entry++];
>> }
>> entry->robj = bo;
>> entry->priority = min(info[i].bo_priority,
>>   AMDGPU_BO_LIST_MAX_PRIORITY);
>> entry->tv.bo = &entry->robj->tbo;
>> entry->tv.shared = !entry->robj->prime_shared_count;
>>   - if (entry->robj->preferred_domains ==
>> AMDGPU_GEM_DOMAIN_GDS)
>> -   list->gds_obj = entry->robj;
>> -   if (entry->robj->preferred_domains ==
>> AMDGPU_GEM_DOMAIN_GWS)
>> -   list->gws_obj = entry->robj;
>> -   if (entry->robj->preferred_domains ==
>> AMDGPU_GEM_DOMAIN_OA)
>> -   list->oa_obj = entry->robj;
>> -
>> total_size += amdgpu_bo_size(entry->robj);
>> trace_amdgpu_bo_list_set(list, entry->robj);
>> }
>> list->first_userptr = first_userptr;
>> list->num_entries = num_entries;
>> trace_amdgpu_cs_bo_status(list->num_entries, total_size);
>> *result = list;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
>> index 61b089768e

[PATCH] drm/amdgpu: reserve GDS resources statically

2018-09-12 Thread Marek Olšák
From: Marek Olšák 

I've chosen to do it like this because it's easy and allows an arbitrary
number of processes.

Signed-off-by: Marek Olšák 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  10 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   3 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  20 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  19 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  24 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c |   6 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |   7 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |   3 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  14 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  21 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   6 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|   5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  61 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   8 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  34 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 125 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 123 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 124 ++-
 include/uapi/drm/amdgpu_drm.h   |  15 +--
 19 files changed, 109 insertions(+), 519 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b80243d3972e..7264a4930b88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -71,23 +71,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
/ sizeof(struct amdgpu_bo_list_entry))
return -EINVAL;
 
size = sizeof(struct amdgpu_bo_list);
size += num_entries * sizeof(struct amdgpu_bo_list_entry);
list = kvmalloc(size, GFP_KERNEL);
if (!list)
return -ENOMEM;
 
kref_init(&list->refcount);
-   list->gds_obj = adev->gds.gds_gfx_bo;
-   list->gws_obj = adev->gds.gws_gfx_bo;
-   list->oa_obj = adev->gds.oa_gfx_bo;
 
array = amdgpu_bo_list_array_entry(list, 0);
memset(array, 0, num_entries * sizeof(struct amdgpu_bo_list_entry));
 
for (i = 0; i < num_entries; ++i) {
struct amdgpu_bo_list_entry *entry;
struct drm_gem_object *gobj;
struct amdgpu_bo *bo;
struct mm_struct *usermm;
 
@@ -111,27 +108,20 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, 
struct drm_file *filp,
} else {
entry = &array[last_entry++];
}
 
entry->robj = bo;
entry->priority = min(info[i].bo_priority,
  AMDGPU_BO_LIST_MAX_PRIORITY);
entry->tv.bo = &entry->robj->tbo;
entry->tv.shared = !entry->robj->prime_shared_count;
 
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GDS)
-   list->gds_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_GWS)
-   list->gws_obj = entry->robj;
-   if (entry->robj->preferred_domains == AMDGPU_GEM_DOMAIN_OA)
-   list->oa_obj = entry->robj;
-
total_size += amdgpu_bo_size(entry->robj);
trace_amdgpu_bo_list_set(list, entry->robj);
}
 
list->first_userptr = first_userptr;
list->num_entries = num_entries;
 
trace_amdgpu_cs_bo_status(list->num_entries, total_size);
 
*result = list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
index 61b089768e1c..30f12a60aa28 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
@@ -36,23 +36,20 @@ struct amdgpu_bo_list_entry {
struct ttm_validate_buffer  tv;
struct amdgpu_bo_va *bo_va;
uint32_tpriority;
struct page **user_pages;
int user_invalidated;
 };
 
 struct amdgpu_bo_list {
struct rcu_head rhead;
struct kref refcount;
-   struct amdgpu_bo *gds_obj;
-   struct amdgpu_bo *gws_obj;
-   struct amdgpu_bo *oa_obj;
unsigned first_userptr;
unsigned num_entries;
 };
 
 int amdgpu_bo_list_get(struct amdgpu_fpriv *fpriv, int id,
   struct amdgpu_bo_list **result);
 void amdgpu_bo_list_get_list(struct amdgpu_bo_list *list,
 struct list_head *validated);
 void amdgpu_bo_list_put(struct amdgpu_bo_list *list);
 int amdgpu_bo_create_list_entry_array(struct drm_amdgpu_bo_list_in *in,
diff --git a/drivers/gpu/drm/amd/amdgp

Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-07 Thread Marek Olšák
On Fri, Sep 7, 2018 at 5:55 AM, Bas Nieuwenhuizen
 wrote:
> On Fri, Sep 7, 2018 at 6:51 AM Marek Olšák  wrote:
>>
>> Hopefully this answers some questions.
>>
>> Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all
>> chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip.
>
> For GFX6-GFX8:
> From GB_ADDR_CONFIG addrlib only uses the pipe interleave bytes which
> are 0 (=256 bytes) for all AMDGPU HW (and on GFX9 addrlib even asserts
> on that).  From MC_ARB_RAMCFG addrlib reads the number of banks and
> ranks, calculates the number of logical banks from it, but then does
> not use it. (Presumably because it is the same number as the number of
> banks in the tiling table entry?) Some bits gets used by the kernel
> (memory row size), but those get encoded in the tile split of the
> tiling table, i.e. we do not need the separate bits.
>
> for GFX9, only the DCC meta surface seems to depend on GB_ADDR_CONFIG
> (except the aforementioned pipe interleave bytes) which are constant.

On GFX9, addrlib in Mesa uses most fields from GB_ADDR_CONFIG.
GB_ADDR_CONFIG defines the tiling formats.

On older chips, addrlib reads some fields from GB_ADDR_CONFIG and uses
the chip identification for others like the number of pipes, even
though GB_ADDR_CONFIG has the information too.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] drm/amdgpu: Add macros and documentation for format modifiers.

2018-09-06 Thread Marek Olšák
Hopefully this answers some questions.

Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all
chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip.

Some 32bpp 1D tiling layouts are compatible across all chips (1D
display tiling is the same as SW_256B_D if Bpp == 4).

On GFX9, swizzle modes <= 11 are the same on all GFX9 chips. The
remaining modes depend on GB_ADDR_CONFIG and are also more efficient.
Bpp, number of samples, and resource type (2D/3D) affect the layout
too, e.g. 3D textures silently use thick tiling on GFX9.

Harvesting doesn't affect tiling layouts.

The layout changes between layers/slices a little. Always use the base
address of the whole image when programming the hardware. Don't assume
that the 2nd layer has the same layout.

> + * TODO: Can scanout really not support fastclear data?

It can, but only those encoded in the DCC buffer (0/1). There is no
DAL support for DCC though.


> + * TODO: Do some generations share DCC format?

DCC mirrors the tiling layout, so the same tiling mode means the same
DCC. Take the absolute pixel address, shift it to the right, and
you'll get the DCC element address.
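
A sketch of that address relationship (the shift amount is an assumption; it
depends on how many data bytes each DCC element covers in a given
configuration, so treat the 256:1 example as illustrative only):

    /* DCC metadata mirrors the data layout, so the DCC element for a given
     * byte of the surface is the tiled byte address scaled down by the
     * data-bytes-per-DCC-element ratio. */
    static uint64_t dcc_element_addr(uint64_t tiled_byte_addr, unsigned dcc_shift)
    {
            return tiled_byte_addr >> dcc_shift;   /* e.g. shift of 8 for 256:1 */
    }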

I would generally consider DCC as non-shareable because of different
meanings of TILING_INDEX between chips except maybe for common GFX9
layouts.


> [comments about number of bits]

We could certainly represent all formats as a list of enums, but then
we would need to convert the enums to the full description in drivers.
GFX6-8 can use TILING_INDEX (except for stencil, let's ignore
stencil). The tiling tables shouldn't change anymore because they are
optimized for the hardware, and later hw doesn't have any tables.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Refine gmc9 VM fault print.

2018-08-27 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Mon, Aug 27, 2018 at 2:55 PM, Alex Deucher  wrote:
> On Mon, Aug 27, 2018 at 2:23 PM Andrey Grodzovsky
>  wrote:
>>
>> The fault reports the page number where the fault happened and not
>> the exact faulty address. Update the print message to reflect that.
>>
>> Signed-off-by: Andrey Grodzovsky 
>
> Reviewed-by: Alex Deucher 
>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index 6763570..d44c5e2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -270,7 +270,7 @@ static int gmc_v9_0_process_interrupt(struct 
>> amdgpu_device *adev,
>> entry->src_id, entry->ring_id, entry->vmid,
>> entry->pasid, task_info.process_name, task_info.tgid,
>> task_info.task_name, task_info.pid);
>> -   dev_err(adev->dev, "  at address 0x%016llx from %d\n",
>> +   dev_err(adev->dev, "  in page starting at address 0x%016llx 
>> from %d\n",
>> addr, entry->client_id);
>> if (!amdgpu_sriov_vf(adev))
>> dev_err(adev->dev,
>> --
>> 2.7.4
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/5] drm/amdgpu: add ring soft recovery v2

2018-08-23 Thread Marek Olšák
On Thu, Aug 23, 2018 at 2:51 AM Christian König
 wrote:
>
> On 22.08.2018 at 21:32, Marek Olšák wrote:
> > On Wed, Aug 22, 2018 at 12:56 PM Alex Deucher  wrote:
> >> On Wed, Aug 22, 2018 at 6:05 AM Christian König
> >>  wrote:
> >>> Instead of hammering hard on the GPU try a soft recovery first.
> >>>
> >>> v2: reorder code a bit
> >>>
> >>> Signed-off-by: Christian König 
> >>> ---
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 ++
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 24 
> >>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  4 
> >>>   3 files changed, 34 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> index 265ff90f4e01..d93e31a5c4e7 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>> @@ -33,6 +33,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> >>> *s_job)
> >>>  struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> >>>  struct amdgpu_job *job = to_amdgpu_job(s_job);
> >>>
> >>> +   if (amdgpu_ring_soft_recovery(ring, job->vmid, 
> >>> s_job->s_fence->parent)) {
> >>> +   DRM_ERROR("ring %s timeout, but soft recovered\n",
> >>> + s_job->sched->name);
> >>> +   return;
> >>> +   }
> >> I think we should still bubble up the error to userspace even if we
> >> can recover.  Data is lost when the wave is killed.  We should treat
> >> it like a GPU reset.
> > Yes, please increment gpu_reset_counter, so that we are compliant with
> > OpenGL. Being able to recover from infinite loops is great, but test
> > suites also expect this to be properly reported to userspace via the
> > per-context query.
>
> Sure that shouldn't be a problem.
>
> > Also please bump the deadline to 1 second. Even if you kill all
> > shaders, the IB can also contain CP DMA, which may take longer than 1
> > ms.
>
> Is there any way we can get a feedback from the SQ if the kill was
> successfully?

I don't think so. The kill should be finished pretty quickly, but more
waves with infinite loops may be waiting to be launched, so you still
need to repeat the kill command. And we should ideally repeat it for 1
second.

The reason is that vertex shader waves take a lot of time to launch. A
very very very large draw call can keep launching new waves for 1
second with the same infinite loop. You would have to soft-reset all
VGTs to stop that.

>
> 1 second is way too long, since in the case of a blocked MC we need to
> start the hard reset relatively fast.

10 seconds have already passed.

I think that some hangs from corrupted descriptors may still be
recoverable just by killing waves.

Marek
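
Put together, the two requests in this thread would amount to something like
the following against the quoted v2 patch (a sketch, not the code that was
eventually merged):

    /* Sketch: keep re-issuing the wave kill for up to 1 second, since new
     * waves running the same infinite loop may keep launching, and account
     * a successful soft recovery like a reset so the per-context query
     * reports it to userspace. */
    ktime_t deadline = ktime_add_ms(ktime_get(), 1000);

    while (!dma_fence_is_signaled(fence) &&
           ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
            ring->funcs->soft_recovery(ring, vmid);

    if (dma_fence_is_signaled(fence))
            atomic_inc(&ring->adev->gpu_reset_counter);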

>
> Regards,
> Christian.
>
> >
> > Marek
> >
> > Marek
> >
> >> Alex
> >>
> >>> +
> >>>  DRM_ERROR("ring %s timeout, signaled seq=%u, emitted seq=%u\n",
> >>>job->base.sched->name, 
> >>> atomic_read(&ring->fence_drv.last_seq),
> >>>ring->fence_drv.sync_seq);
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >>> index 5dfd26be1eec..c045a4e38ad1 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >>> @@ -383,6 +383,30 @@ void 
> >>> amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring,
> >>>  amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
> >>>   }
> >>>
> >>> +/**
> >>> + * amdgpu_ring_soft_recovery - try to soft recover a ring lockup
> >>> + *
> >>> + * @ring: ring to try the recovery on
> >>> + * @vmid: VMID we try to get going again
> >>> + * @fence: timedout fence
> >>> + *
> >>> + * Tries to get a ring proceeding again when it is stuck.
> >>> + */
> >>> +bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int 
> >>> vmid,
> >>> +  struct dma_fence *fence)
> >>> +{
> >>> +   ktime_t deadline = ktime_ad

Re: [PATCH 2/5] drm/amdgpu: add ring soft recovery v2

2018-08-22 Thread Marek Olšák
On Wed, Aug 22, 2018 at 12:56 PM Alex Deucher  wrote:
>
> On Wed, Aug 22, 2018 at 6:05 AM Christian König
>  wrote:
> >
> > Instead of hammering hard on the GPU try a soft recovery first.
> >
> > v2: reorder code a bit
> >
> > Signed-off-by: Christian König 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 ++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 24 
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  4 
> >  3 files changed, 34 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 265ff90f4e01..d93e31a5c4e7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -33,6 +33,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > struct amdgpu_job *job = to_amdgpu_job(s_job);
> >
> > +   if (amdgpu_ring_soft_recovery(ring, job->vmid, 
> > s_job->s_fence->parent)) {
> > +   DRM_ERROR("ring %s timeout, but soft recovered\n",
> > + s_job->sched->name);
> > +   return;
> > +   }
>
> I think we should still bubble up the error to userspace even if we
> can recover.  Data is lost when the wave is killed.  We should treat
> it like a GPU reset.

Yes, please increment gpu_reset_counter, so that we are compliant with
OpenGL. Being able to recover from infinite loops is great, but test
suites also expect this to be properly reported to userspace via the
per-context query.

Also please bump the deadline to 1 second. Even if you kill all
shaders, the IB can also contain CP DMA, which may take longer than 1
ms.

Marek

Marek
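
For context, the per-context query referred to here is exposed to userspace
through libdrm's amdgpu_cs_query_reset_state(). A minimal sketch of how a UMD
consumes it (ctx is the amdgpu_context_handle; the handler name is
hypothetical):

    uint32_t state, hangs;

    /* Ask the kernel whether this context was affected by a GPU reset;
     * GL_ARB_robustness-style APIs forward this to the application. */
    if (!amdgpu_cs_query_reset_state(ctx, &state, &hangs) &&
        state != AMDGPU_CTX_NO_RESET)
            report_device_lost(state);   /* hypothetical UMD-side handler */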

>
> Alex
>
> > +
> > DRM_ERROR("ring %s timeout, signaled seq=%u, emitted seq=%u\n",
> >   job->base.sched->name, 
> > atomic_read(&ring->fence_drv.last_seq),
> >   ring->fence_drv.sync_seq);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > index 5dfd26be1eec..c045a4e38ad1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -383,6 +383,30 @@ void amdgpu_ring_emit_reg_write_reg_wait_helper(struct 
> > amdgpu_ring *ring,
> > amdgpu_ring_emit_reg_wait(ring, reg1, mask, mask);
> >  }
> >
> > +/**
> > + * amdgpu_ring_soft_recovery - try to soft recover a ring lockup
> > + *
> > + * @ring: ring to try the recovery on
> > + * @vmid: VMID we try to get going again
> > + * @fence: timedout fence
> > + *
> > + * Tries to get a ring proceeding again when it is stuck.
> > + */
> > +bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
> > +  struct dma_fence *fence)
> > +{
> > +   ktime_t deadline = ktime_add_us(ktime_get(), 1000);
> > +
> > +   if (!ring->funcs->soft_recovery)
> > +   return false;
> > +
> > +   while (!dma_fence_is_signaled(fence) &&
> > +  ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
> > +   ring->funcs->soft_recovery(ring, vmid);
> > +
> > +   return dma_fence_is_signaled(fence);
> > +}
> > +
> >  /*
> >   * Debugfs info
> >   */
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > index 409fdd9b9710..9cc239968e40 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > @@ -168,6 +168,8 @@ struct amdgpu_ring_funcs {
> > /* priority functions */
> > void (*set_priority) (struct amdgpu_ring *ring,
> >   enum drm_sched_priority priority);
> > +   /* Try to soft recover the ring to make the fence signal */
> > +   void (*soft_recovery)(struct amdgpu_ring *ring, unsigned vmid);
> >  };
> >
> >  struct amdgpu_ring {
> > @@ -260,6 +262,8 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring);
> >  void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring,
> > uint32_t reg0, uint32_t 
> > val0,
> > uint32_t reg1, uint32_t 
> > val1);
> > +bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
> > +  struct dma_fence *fence);
> >
> >  static inline void amdgpu_ring_clear_ring(struct amdgpu_ring *ring)
> >  {
> > --
> > 2.14.1
> >
> > ___
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedes

Re: [PATCH libdrm 6/6] amdgpu: always add all BOs to lockup table

2018-08-10 Thread Marek Olšák
OK. Thanks.

Marek

On Fri, Aug 10, 2018 at 9:06 AM, Christian König
 wrote:
> Why should it? Adding the handle is now not more than setting an array
> entry.
>
> I've tested with allocating 250k BOs of 4k size each and there wasn't any
> measurable performance differences.
>
> Christian.
>
>
> Am 09.08.2018 um 18:56 schrieb Marek Olšák:
>>
>> I don't think this is a good idea. Can you please explain why this
>> won't cause performance regressions?
>>
>> Thanks
>> Marek
>>
>> On Fri, Aug 3, 2018 at 7:34 AM, Christian König
>>  wrote:
>>>
>>> This way we can always find a BO structure by its handle.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   amdgpu/amdgpu_bo.c | 14 --
>>>   1 file changed, 4 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
>>> index 02592377..422c7c99 100644
>>> --- a/amdgpu/amdgpu_bo.c
>>> +++ b/amdgpu/amdgpu_bo.c
>>> @@ -87,6 +87,10 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,
>>>
>>>  bo->handle = args.out.handle;
>>>
>>> +   pthread_mutex_lock(&bo->dev->bo_table_mutex);
>>> +   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
>>> +   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
>>> +
>>>  pthread_mutex_init(&bo->cpu_access_mutex, NULL);
>>>
>>>  if (r)
>>> @@ -171,13 +175,6 @@ int amdgpu_bo_query_info(amdgpu_bo_handle bo,
>>>  return 0;
>>>   }
>>>
>>> -static void amdgpu_add_handle_to_table(amdgpu_bo_handle bo)
>>> -{
>>> -   pthread_mutex_lock(&bo->dev->bo_table_mutex);
>>> -   handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
>>> -   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
>>> -}
>>> -
>>>   static int amdgpu_bo_export_flink(amdgpu_bo_handle bo)
>>>   {
>>>  struct drm_gem_flink flink;
>>> @@ -240,14 +237,11 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
>>>  return 0;
>>>
>>>  case amdgpu_bo_handle_type_kms:
>>> -   amdgpu_add_handle_to_table(bo);
>>> -   /* fall through */
>>>  case amdgpu_bo_handle_type_kms_noimport:
>>>  *shared_handle = bo->handle;
>>>  return 0;
>>>
>>>  case amdgpu_bo_handle_type_dma_buf_fd:
>>> -   amdgpu_add_handle_to_table(bo);
>>>  return drmPrimeHandleToFD(bo->dev->fd, bo->handle,
>>>DRM_CLOEXEC | DRM_RDWR,
>>>(int*)shared_handle);
>>> --
>>> 2.14.1
>>>
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
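
A side note on why "adding the handle is now not more than setting an array entry": GEM handles are small, dense integers allocated by the kernel, so the per-device lookup structure needs no hashing at all; a growable array indexed by the handle gives O(1) insert, lookup and removal. The sketch below illustrates that idea in plain C; it is only an approximation of what libdrm's handle_table_insert() does, and the _sketch names make that explicit.

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct handle_table_sketch {
	void   **values;   /* values[handle] == BO pointer, or NULL */
	uint32_t max;      /* number of allocated slots */
};

static int handle_table_insert_sketch(struct handle_table_sketch *t,
				      uint32_t handle, void *value)
{
	if (handle >= t->max) {
		uint32_t new_max = (handle + 1) * 2;
		void **v = realloc(t->values, new_max * sizeof(*v));

		if (!v)
			return -ENOMEM;
		memset(v + t->max, 0, (new_max - t->max) * sizeof(*v));
		t->values = v;
		t->max = new_max;
	}
	t->values[handle] = value;   /* the "array entry" being set */
	return 0;
}

static void *handle_table_lookup_sketch(struct handle_table_sketch *t,
					uint32_t handle)
{
	return handle < t->max ? t->values[handle] : NULL;
}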


Re: [PATCH libdrm 6/6] amdgpu: always add all BOs to lookup table

2018-08-09 Thread Marek Olšák
I don't think this is a good idea. Can you please explain why this
won't cause performance regressions?

Thanks
Marek

On Fri, Aug 3, 2018 at 7:34 AM, Christian König
 wrote:
> This way we can always find a BO structure by its handle.
>
> Signed-off-by: Christian König 
> ---
>  amdgpu/amdgpu_bo.c | 14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
> index 02592377..422c7c99 100644
> --- a/amdgpu/amdgpu_bo.c
> +++ b/amdgpu/amdgpu_bo.c
> @@ -87,6 +87,10 @@ int amdgpu_bo_alloc(amdgpu_device_handle dev,
>
> bo->handle = args.out.handle;
>
> +   pthread_mutex_lock(&bo->dev->bo_table_mutex);
> +   r = handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
> +   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
> +
> pthread_mutex_init(&bo->cpu_access_mutex, NULL);
>
> if (r)
> @@ -171,13 +175,6 @@ int amdgpu_bo_query_info(amdgpu_bo_handle bo,
> return 0;
>  }
>
> -static void amdgpu_add_handle_to_table(amdgpu_bo_handle bo)
> -{
> -   pthread_mutex_lock(&bo->dev->bo_table_mutex);
> -   handle_table_insert(&bo->dev->bo_handles, bo->handle, bo);
> -   pthread_mutex_unlock(&bo->dev->bo_table_mutex);
> -}
> -
>  static int amdgpu_bo_export_flink(amdgpu_bo_handle bo)
>  {
> struct drm_gem_flink flink;
> @@ -240,14 +237,11 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
> return 0;
>
> case amdgpu_bo_handle_type_kms:
> -   amdgpu_add_handle_to_table(bo);
> -   /* fall through */
> case amdgpu_bo_handle_type_kms_noimport:
> *shared_handle = bo->handle;
> return 0;
>
> case amdgpu_bo_handle_type_dma_buf_fd:
> -   amdgpu_add_handle_to_table(bo);
> return drmPrimeHandleToFD(bo->dev->fd, bo->handle,
>   DRM_CLOEXEC | DRM_RDWR,
>   (int*)shared_handle);
> --
> 2.14.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)

2018-08-01 Thread Marek Olšák
On Wed, Aug 1, 2018 at 2:29 PM, Christian König
 wrote:
> Am 01.08.2018 um 19:59 schrieb Marek Olšák:
>>
>> On Wed, Aug 1, 2018 at 1:52 PM, Christian König
>>  wrote:
>>>
>>> Am 01.08.2018 um 19:39 schrieb Marek Olšák:
>>>>
>>>> On Wed, Aug 1, 2018 at 2:32 AM, Christian König
>>>>  wrote:
>>>>>
>>>>> Am 01.08.2018 um 00:07 schrieb Marek Olšák:
>>>>>>
>>>>>> Can this be implemented as a wrapper on top of libdrm? So that the
>>>>>> tree (or hash table) isn't created for UMDs that don't need it.
>>>>>
>>>>>
>>>>> No, the problem is that an application gets a CPU pointer from one API
>>>>> and
>>>>> tries to import that pointer into another one.
>>>>>
>>>>> In other words we need to implement this independent of the UMD who
>>>>> mapped
>>>>> the BO.
>>>>
>>>> Yeah, it could be an optional feature of libdrm, and other components
>>>> should be able to disable it to remove the overhead.
>>>
>>>
>>> The overhead is negligible, the real problem is the memory footprint.
>>>
>>> A brief look at the hash implementation in libdrm showed that this is
>>> actually really inefficient.
>>>
>>> I think we have the choice of implementing a r/b tree to map the CPU
>>> pointer
>>> addresses or implement a quadratic tree to map the handles.
>>>
>>> The latter is easy to do and would also allow to get rid of the hash table
>>> as
>>> well.
>>
>> We can also use the hash table from mesa/src/util.
>>
>> I don't think the overhead would be negligible. It would be a log(n)
>> insertion in bo_map and a log(n) deletion in bo_unmap. If you did
>> bo_map+bo_unmap 1 times, would it be negligible?
>
>
> Compared to what the kernel needs to do for updating the page tables it is
> less than 1% of the total work.
>
> The real question is if it wouldn't be simpler to use a tree for the
> handles. Since the handles are dense you can just use an unbalanced tree
> which is really easy.
>
> For a tree of the CPU mappings we would need an r/b interval tree, which is
> hard to implement and quite some overkill.
>
> Do you have any numbers how many BOs really get a CPU mapping in a real
> world application?

Without our suballocator, we sometimes exceeded the max. mmap limit
(~64K). It should be much less with the suballocator with 128KB slabs,
probably a few thousands.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
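
To make the userspace-only alternative discussed above concrete: it amounts to recording every CPU mapping in a per-device structure ordered by address, so "which BO contains this pointer?" becomes an interval search instead of an ioctl. Christian's suggestion is an r/b interval tree; the sketch below uses a plain sorted array for brevity, and all names in it are illustrative rather than taken from libdrm.

#include <stddef.h>
#include <stdint.h>

struct cpu_map_entry {
	uintptr_t cpu_addr;   /* start of the CPU mapping */
	size_t    size;       /* length of the mapping */
	void     *bo;         /* amdgpu_bo_handle in real code */
};

struct cpu_map_table {
	struct cpu_map_entry *entries;   /* kept sorted by cpu_addr */
	unsigned int count;
};

/* O(log n) lookup: find the last mapping starting at or below ptr and check
 * whether ptr falls inside it. bo_map()/bo_unmap() would pay the same
 * O(log n) to keep the array sorted on insert and remove. */
static void *find_bo_by_cpu_addr(const struct cpu_map_table *t, const void *ptr)
{
	uintptr_t addr = (uintptr_t)ptr;
	unsigned int lo = 0, hi = t->count;

	while (lo < hi) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (t->entries[mid].cpu_addr <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;
	if (addr < t->entries[lo - 1].cpu_addr + t->entries[lo - 1].size)
		return t->entries[lo - 1].bo;
	return NULL;
}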


Re: [PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)

2018-08-01 Thread Marek Olšák
On Wed, Aug 1, 2018 at 1:52 PM, Christian König
 wrote:
> Am 01.08.2018 um 19:39 schrieb Marek Olšák:
>>
>> On Wed, Aug 1, 2018 at 2:32 AM, Christian König
>>  wrote:
>>>
>>> Am 01.08.2018 um 00:07 schrieb Marek Olšák:
>>>>
>>>> Can this be implemented as a wrapper on top of libdrm? So that the
>>>> tree (or hash table) isn't created for UMDs that don't need it.
>>>
>>>
>>> No, the problem is that an application gets a CPU pointer from one API
>>> and
>>> tries to import that pointer into another one.
>>>
>>> In other words we need to implement this independent of the UMD who
>>> mapped
>>> the BO.
>>
>> Yeah, it could be an optional feature of libdrm, and other components
>> should be able to disable it to remove the overhead.
>
>
> The overhead is negligible, the real problem is the memory footprint.
>
> A brief look at the hash implementation in libdrm showed that this is
> actually really inefficient.
>
> I think we have the choice of implementing a r/b tree to map the CPU pointer
> addresses or implement a quadratic tree to map the handles.
>
> The latter is easy to do and would also allow to get rid of the hash table as
> well.

We can also use the hash table from mesa/src/util.

I don't think the overhead would be negligible. It would be a log(n)
insertion in bo_map and a log(n) deletion in bo_unmap. If you did
bo_map+bo_unmap 1 times, would it be negligible?

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)

2018-08-01 Thread Marek Olšák
On Wed, Aug 1, 2018 at 2:32 AM, Christian König
 wrote:
> Am 01.08.2018 um 00:07 schrieb Marek Olšák:
>>
>> Can this be implemented as a wrapper on top of libdrm? So that the
>> tree (or hash table) isn't created for UMDs that don't need it.
>
>
> No, the problem is that an application gets a CPU pointer from one API and
> tries to import that pointer into another one.
>
> In other words we need to implement this independent of the UMD who mapped
> the BO.

Yeah, it could be an optional feature of libdrm, and other components
should be able to disable it to remove the overhead.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[ANNOUNCE] libdrm 2.4.93

2018-07-31 Thread Marek Olšák
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


Christian König (1):
  amdgpu: make sure to set CLOEXEC on duplicated FDs

Emil Velikov (10):
  xf86drm: drmGetDevice2: error out if the fd has unknown subsys
  xf86drm: introduce drm_device_has_rdev() helper
  xf86drm: Fold drmDevice processing into process_device() helper
  xf86drm: Allocate drmDevicePtr's on stack
  xf86drm: introduce a get_real_pci_path() helper
  xf86drm: Add drmDevice support for virtio_gpu
  tests/drmdevices: install alongside other utilities
  tests/drmdevice: add a couple of printf headers
  drmdevice: convert the tabbed output into a tree
  drmdevice: print the correct host1x information

Jan Vesely (3):
  amdgpu: Take a lock before removing devices from fd_tab hash table.
  amdgpu/util_hash_table: Add helper function to count the number of 
entries in hash table
  amdgpu: Destroy fd_hash table when the last device is removed.

José Roberto de Souza (2):
  intel: Introducing Whiskey Lake platform
  intel: Introducing Amber Lake platform

Kevin Strasser (1):
  xf86drm: Be sure to closedir before return

Marek Olšák (3):
  amdgpu: don't call add_handle_to_table for KMS BO exports
  amdgpu: add amdgpu_bo_handle_type_kms_noimport
  configure.ac: bump version to 2.4.93

Mariusz Ceier (1):
  xf86drm: Fix error path in drmGetDevice2

Michel Dänzer (2):
  Always pass O_CLOEXEC when opening DRM file descriptors
  Revert "amdgpu: don't call add_handle_to_table for KMS BO exports"

Rob Clark (5):
  freedreno: add user ptr to fd_ringbuffer
  freedreno: add fd_ringbuffer_new_object()
  freedreno: small cleanup
  freedreno: slight reordering
  freedreno/msm: "stateobj" support

git tag: libdrm-2.4.93

https://dri.freedesktop.org/libdrm/libdrm-2.4.93.tar.bz2
MD5:  0ba45ad1551b2c1b6df0797a3e65f827  libdrm-2.4.93.tar.bz2
SHA1: 550ba4bb50236fc2e9138cbeadcb4942ce09410e  libdrm-2.4.93.tar.bz2
SHA256: 6e84d1dc9548a76f20b59a85cf80a0b230cd8196084f5243469d9e65354fcd3c  
libdrm-2.4.93.tar.bz2
SHA512: 
ba4221e8d6a3a9872fb6d30a0ea391e30ea0e17f249c66f067bed9c2161ed1ad8083959cb2c212834c6566c3e025f4daae31e9533d77aae19b9de6c2ab3d
  libdrm-2.4.93.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.93.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.93.tar.gz
MD5:  2df4729a0e9829d77a7a0a7a8dda2f66  libdrm-2.4.93.tar.gz
SHA1: 3947aeecb6ecc271c657638f97e8d21753cedf6e  libdrm-2.4.93.tar.gz
SHA256: bc67b2503106155c239c4e455b6718ef1b31675ea51f544c785c0e3295712861  
libdrm-2.4.93.tar.gz
SHA512: 
3ca334ee46fe50103e146463d2dab85c7a075559192a85bfe73ca2f80cb8a8847b19775b1a16271b537656e9d7a48f0e209ea7227a3e1ebd9fa3a5caf38047f2
  libdrm-2.4.93.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.93.tar.gz.sig

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEzUfFNBo3XzO+97r6/dFdWs7w8rEFAlthEUoACgkQ/dFdWs7w
8rGVWAf/bV1GNvp0Aakm95UhIHC61CvZk7PxnSADd3SlZC6BE9K5WNK2jumtXQln
o1EmXc1WS2b02jPQVX4+qv/F8gxVlHdKDbi52Rk1RnK0ii7gP2oGf+4q0EUrdwoq
HYE6XHppUFgBVRBXTa0vCqVBo/KYWRgPlSPNlEsxigPzRk00Qt0vWEjQiN8vTmH2
+YReSIvOQxmZ/CSU8/+JaV395S+7nc59HG18xHvjcC6F4AelWBFdAA+P782yTeJ5
nr952bkr7+Z5/n/XWUkzdTr11YSHev5N24JYxopvXgbmDr+Dz/hdIYyy8MGJSKp0
q+2RVWP6WlNlXh2KUvwPM9SZydxkDA==
=DmZ7
-----END PGP SIGNATURE-----
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: return bo itself if userptr is cpu addr of bo (v3)

2018-07-31 Thread Marek Olšák
Can this be implemented as a wrapper on top of libdrm? So that the
tree (or hash table) isn't created for UMDs that don't need it.

Marek

On Tue, Jul 31, 2018 at 6:13 AM, Christian König
 wrote:
> Am 31.07.2018 um 11:54 schrieb Zhang, Jerry (Junwei):
>>
>> On 07/31/2018 05:04 PM, Christian König wrote:
>>>
>>> Am 31.07.2018 um 10:58 schrieb Zhang, Jerry (Junwei):

 On 07/31/2018 04:13 PM, Christian König wrote:
>
> Am 31.07.2018 um 10:05 schrieb Zhang, Jerry (Junwei):
>>
>> On 07/31/2018 03:03 PM, Christian König wrote:
>>>
>>> Am 31.07.2018 um 08:58 schrieb Zhang, Jerry (Junwei):

 On 07/30/2018 06:47 PM, Christian König wrote:
>
> Am 30.07.2018 um 12:02 schrieb Junwei Zhang:
> [SNIP]
> Please double check if that is still up to date.


 We may have to replace drm_gem_object_reference() with
 drm_gem_object_get().

 On 2nd thought, do we really need to do reference every time?
>>>
>>>
>>> Yes, that's a must have. Otherwise the handle could be freed and
>>> reused already when we return.
>>>
 if UMD find the same gem object for 3 times, it also need to
 explicitly free(put) that object for 3 times?
>>>
>>>
>>> Correct yes. Thinking more about this the real problem is to
>>> translate the handle into a structure in libdrm.
>>>
>>> Here we are back to the problem Marek and Michel has been working on
>>> for a while that we always need to be able to translate a handle into a 
>>> bo
>>> structure.
>>>
>>> So that needs to be solved before we can upstream the changes.
>>
>>
>> Thanks for your info.
>> It's better to fix that before upstream.
>
>
> Thinking more about this the hash currently used in libdrm is not
> adequate any more.
>
> E.g. we now need to be able to find all BOs based on their handle.
> Since the handles are dense either an r/b tree or a radix tree now sounds
> like the best approach to me.


 Not sure the exact reason that we added hash table in libdrm.
>>>
>>>
>>> The reason for that was that when a kernel function returns a handle we
>>> need to make sure that we always use the same struct amdgpu_bo for it.
>>>
>>> Otherwise you run into quite some problems with syncing etc...
>>
>>
>> Thanks for your explanation.
>>
>>>
 But it really costs much less time than calling IOCTL to find BO by
 their handles.
>>>
>>>
>>> Well we could just completely drop the kernel implementation and use an
>>> userspace implementation.
>>
>>
>> Do you mean to implement finding bo by cpu address in libdrm completely?
>
>
> Yes, exactly.
>
>> e.g. to create a tree to manage bo handle in libdrm?
>
>
> I mean when we need to create a tree to map the handle to a BO you could
> also create a tree to map the CPU pointer to the BO directly and avoid the
> IOCTL overhead completely.
>
> Christian.
>
>
>>
>> Jerry
>>
>>>
>>> And yes I agree when we need a tree anyway it would probably be faster
>>> than calling the IOCTL to find the BO.
>>>
>>> Christian.
>>>

 In this case, UMD seems not to be able to get BO handle and try to
 verify it by cpu address then.
 In another word, UMD would like to find if the memory is created as BO
 or system memory, I suppose.

 Regards,
 Jerry


>
> Christian.
>
>>
>> Regards,
>> Jerry
>
>
 ___
 amd-gfx mailing list
 amd-gfx@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add amdgpu_bo_handle_type_kms_noimport

2018-07-24 Thread Marek Olšák
Christian,

Would you please give me an Rb if the patch is OK with you? I have
spoken with Michel and he would be OK with me pushing it as long as it
gets an Rb from either you or Alex.

Thanks,
Marek

On Wed, Jul 11, 2018 at 8:47 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> ---
>  amdgpu/amdgpu.h| 7 ++-
>  amdgpu/amdgpu_bo.c | 4 
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> index 36f91058..be83b457 100644
> --- a/amdgpu/amdgpu.h
> +++ b/amdgpu/amdgpu.h
> @@ -77,21 +77,26 @@ struct drm_amdgpu_info_hw_ip;
>   *
>  */
>  enum amdgpu_bo_handle_type {
> /** GEM flink name (needs DRM authentication, used by DRI2) */
> amdgpu_bo_handle_type_gem_flink_name = 0,
>
> /** KMS handle which is used by all driver ioctls */
> amdgpu_bo_handle_type_kms = 1,
>
> /** DMA-buf fd handle */
> -   amdgpu_bo_handle_type_dma_buf_fd = 2
> +   amdgpu_bo_handle_type_dma_buf_fd = 2,
> +
> +   /** KMS handle, but re-importing as a DMABUF handle through
> +*  drmPrimeHandleToFD is forbidden. (Glamor does that)
> +*/
> +   amdgpu_bo_handle_type_kms_noimport = 3,
>  };
>
>  /** Define known types of GPU VM VA ranges */
>  enum amdgpu_gpu_va_range
>  {
> /** Allocate from "normal"/general range */
> amdgpu_gpu_va_range_general = 0
>  };
>
>  enum amdgpu_sw_info {
> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
> index 9e37b149..d29be244 100644
> --- a/amdgpu/amdgpu_bo.c
> +++ b/amdgpu/amdgpu_bo.c
> @@ -234,20 +234,22 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
> case amdgpu_bo_handle_type_gem_flink_name:
> r = amdgpu_bo_export_flink(bo);
> if (r)
> return r;
>
> *shared_handle = bo->flink_name;
> return 0;
>
> case amdgpu_bo_handle_type_kms:
> amdgpu_add_handle_to_table(bo);
> +   /* fall through */
> +   case amdgpu_bo_handle_type_kms_noimport:
> *shared_handle = bo->handle;
> return 0;
>
> case amdgpu_bo_handle_type_dma_buf_fd:
> amdgpu_add_handle_to_table(bo);
> return drmPrimeHandleToFD(bo->dev->fd, bo->handle,
>   DRM_CLOEXEC | DRM_RDWR,
>   (int*)shared_handle);
> }
> return -EINVAL;
> @@ -299,20 +301,21 @@ int amdgpu_bo_import(amdgpu_device_handle dev,
> bo = util_hash_table_get(dev->bo_flink_names,
>  (void*)(uintptr_t)shared_handle);
> break;
>
> case amdgpu_bo_handle_type_dma_buf_fd:
> bo = util_hash_table_get(dev->bo_handles,
>  (void*)(uintptr_t)shared_handle);
> break;
>
> case amdgpu_bo_handle_type_kms:
> +   case amdgpu_bo_handle_type_kms_noimport:
> /* Importing a KMS handle in not allowed. */
> pthread_mutex_unlock(&dev->bo_table_mutex);
> return -EPERM;
>
> default:
> pthread_mutex_unlock(&dev->bo_table_mutex);
> return -EINVAL;
> }
>
> if (bo) {
> @@ -368,20 +371,21 @@ int amdgpu_bo_import(amdgpu_device_handle dev,
> util_hash_table_set(dev->bo_flink_names,
> (void*)(uintptr_t)bo->flink_name, bo);
> break;
>
> case amdgpu_bo_handle_type_dma_buf_fd:
> bo->handle = shared_handle;
> bo->alloc_size = dma_buf_size;
> break;
>
> case amdgpu_bo_handle_type_kms:
> +   case amdgpu_bo_handle_type_kms_noimport:
> assert(0); /* unreachable */
> }
>
> /* Initialize it. */
> atomic_set(&bo->refcount, 1);
> bo->dev = dev;
> pthread_mutex_init(&bo->cpu_access_mutex, NULL);
>
> util_hash_table_set(dev->bo_handles, (void*)(uintptr_t)bo->handle, 
> bo);
> pthread_mutex_unlock(&dev->bo_table_mutex);
> --
> 2.17.1
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
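
For readers of the patch above, the difference between the two KMS handle types only shows up at export time: both return bo->handle, but only amdgpu_bo_handle_type_kms registers the BO so that a later drmPrimeHandleToFD()/import can find it again. A minimal usage sketch follows, assuming the patch is applied; whether the second variant is safe depends entirely on how the consumer (e.g. Glamor in the X server) uses the handle.

#include <amdgpu.h>   /* libdrm_amdgpu */

static int export_kms_handles(amdgpu_bo_handle shared_bo,
			      amdgpu_bo_handle private_bo,
			      uint32_t *shared_handle,
			      uint32_t *private_handle)
{
	int r;

	/* This handle may come back through drmPrimeHandleToFD() later,
	 * so the BO has to stay findable in the device's handle table. */
	r = amdgpu_bo_export(shared_bo, amdgpu_bo_handle_type_kms,
			     shared_handle);
	if (r)
		return r;

	/* This handle is only ever passed to driver ioctls and never
	 * re-imported, so the table insertion can be skipped. */
	return amdgpu_bo_export(private_bo, amdgpu_bo_handle_type_kms_noimport,
				private_handle);
}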


Re: [PATCH libdrm] amdgpu: add amdgpu_bo_handle_type_kms_noimport

2018-07-19 Thread Marek Olšák
On Wed, Jul 18, 2018 at 11:55 AM, Michel Dänzer  wrote:
> On 2018-07-17 08:14 PM, Marek Olšák wrote:
>> Michel, I think you are wasting your time. This change can be misused
>> as easily as any other API. It's not more dangerous than any other
>> amdgpu libdrm function.
>
> That's trivially false.
>
>> You won't achieve anything by optimizing the hash table (= losing time),
>> [...]
>
> I think you're focusing too much on your immediate desire instead of the
> big(ger) picture.
>
> E.g. I see amdgpu_bo_export getting called from surprising places (in
> Xorg), performing a hash table lookup each time. Fixing that would
> achieve something, though probably not much.

I know about the use in Xorg and this patch actually indirectly
mentions it (it mentions Glamor in the code). The flag contains
_noimport to self-document itself to mitigate incorrect usage.

>
> Anyway, adding dangerous API (keep in mind that we don't control all
> libdrm_amdgpu users, or even know how they're using it) for something
> that can also be achieved without is just a bad idea. Avoiding that is
> achievement enough.

We don't need to control other libdrm users. They can control
themselves. :) I'm totally fine with incorrect usage leading to bad
things, like any other bug. Much worse things can be done with the CS
ioctl.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add amdgpu_bo_handle_type_kms_noimport

2018-07-17 Thread Marek Olšák
Michel, I think you are wasting your time. This change can be misused
as easily as any other API. It's not more dangerous than any other
amdgpu libdrm function. You won't achieve anything by optimizing the
hash table (= losing time), and you also won't achieve anything by
NAKing this (= losing performance on the lookup). Both are lose-lose
solutions, because you'll lose and others will lose too.

Marek

On Tue, Jul 17, 2018 at 4:57 AM, Michel Dänzer  wrote:
> On 2018-07-16 08:51 PM, Marek Olšák wrote:
>> On Mon, Jul 16, 2018 at 12:05 PM, Michel Dänzer  wrote:
>>> On 2018-07-13 08:47 PM, Marek Olšák wrote:
>>>> On Fri, Jul 13, 2018 at 4:28 AM, Michel Dänzer  wrote:
>>>
>>>>> I'd rather add the handle to the hash table in amdgpu_bo_alloc,
>>>>> amdgpu_create_bo_from_user_mem and amdgpu_bo_import instead of in
>>>>> amdgpu_bo_export, making amdgpu_bo_export(bo, amdgpu_bo_handle_type_kms,
>>>>> ...) essentially free. In the unlikely (since allocating a BO from the
>>>>> kernel is expensive) case that the hash table shows up on profiles, we
>>>>> can optimize it.
>>>>
>>>> The hash table isn't very good for high BO counts. The time complexity
>>>> of a lookup is O(n).
>>>
>>> A lookup is only needed in amdgpu_bo_import. amdgpu_bo_alloc and
>>> amdgpu_create_bo_from_user_mem can just add the handle to the hash
>>> bucket directly.
>>>
>>> Do you know of, or can you imagine, any workload where amdgpu_bo_import
>>> is called often enough for this to be a concern?
>>
>> Fullscreen DRI2 or DRI3 re-imports buffers every frame.
>
> DRI3 doesn't. The X server only imports each DRI3 buffer once, after
> that it's referred to via the pixmap XID.
>
>
> With DRI2 page flipping (ignoring that basically nobody's using that
> anymore with radeonsi :), it's always the same set of buffers, so the
> lookup can be made fast as discussed in the sub-thread with Christian.
> (Also, DRI2 can only use page flipping with sync-to-vblank enabled, so
> this happens on the order of hundreds of times per second max)
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add amdgpu_bo_handle_type_kms_noimport

2018-07-16 Thread Marek Olšák
On Mon, Jul 16, 2018 at 12:05 PM, Michel Dänzer  wrote:
> On 2018-07-13 08:47 PM, Marek Olšák wrote:
>> On Fri, Jul 13, 2018 at 4:28 AM, Michel Dänzer  wrote:
>>> On 2018-07-12 07:03 PM, Marek Olšák wrote:
>>>> On Thu, Jul 12, 2018, 3:31 AM Michel Dänzer  wrote:
>>>>>
>>>>> What is the rationale for this? I.e. why do you want to not store some
>>>>> handles in the hash table?
>>>>
>>>>
>>>> Because I have the option.
>>>
>>> Seems like you're expecting this patch to be accepted without providing
>>> any real justification for it (here or in the corresponding Mesa patch).
>>> NAK from me if so.
>>
>> The real justification is implied by the patch. See: 
>> amdgpu_add_handle_to_table
>> Like I said: There is no risk of regression and it simplifies one
>> simple case trivially. We shouldn't have to even talk about it.
>
> IMO you haven't provided enough justification for adding API which is
> prone to breakage if used incorrectly.
>
> Other opinions?
>
>
>>> I'd rather add the handle to the hash table in amdgpu_bo_alloc,
>>> amdgpu_create_bo_from_user_mem and amdgpu_bo_import instead of in
>>> amdgpu_bo_export, making amdgpu_bo_export(bo, amdgpu_bo_handle_type_kms,
>>> ...) essentially free. In the unlikely (since allocating a BO from the
>>> kernel is expensive) case that the hash table shows up on profiles, we
>>> can optimize it.
>>
>> The hash table isn't very good for high BO counts. The time complexity
>> of a lookup is O(n).
>
> A lookup is only needed in amdgpu_bo_import. amdgpu_bo_alloc and
> amdgpu_create_bo_from_user_mem can just add the handle to the hash
> bucket directly.
>
> Do you know of, or can you imagine, any workload where amdgpu_bo_import
> is called often enough for this to be a concern?

Fullscreen DRI2 or DRI3 re-imports buffers every frame. It might show
up in a profiler.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] amdgpu: add amdgpu_bo_handle_type_kms_noimport

2018-07-13 Thread Marek Olšák
On Fri, Jul 13, 2018 at 4:28 AM, Michel Dänzer  wrote:
> On 2018-07-12 07:03 PM, Marek Olšák wrote:
>> On Thu, Jul 12, 2018, 3:31 AM Michel Dänzer  wrote:
>>> On 2018-07-12 02:47 AM, Marek Olšák wrote:
>>>> From: Marek Olšák 
>>>>
>>>> diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
>>>> index 9e37b149..d29be244 100644
>>>> --- a/amdgpu/amdgpu_bo.c
>>>> +++ b/amdgpu/amdgpu_bo.c
>>>> @@ -234,20 +234,22 @@ int amdgpu_bo_export(amdgpu_bo_handle bo,
>>>>   case amdgpu_bo_handle_type_gem_flink_name:
>>>>   r = amdgpu_bo_export_flink(bo);
>>>>   if (r)
>>>>   return r;
>>>>
>>>>   *shared_handle = bo->flink_name;
>>>>   return 0;
>>>>
>>>>   case amdgpu_bo_handle_type_kms:
>>>>   amdgpu_add_handle_to_table(bo);
>>>> + /* fall through */
>>>> + case amdgpu_bo_handle_type_kms_noimport:
>>>>   *shared_handle = bo->handle;
>>>>   return 0;
>>>
>>> What is the rationale for this? I.e. why do you want to not store some
>>> handles in the hash table?
>>
>>
>> Because I have the option.
>
> Seems like you're expecting this patch to be accepted without providing
> any real justification for it (here or in the corresponding Mesa patch).
> NAK from me if so.

The real justification is implied by the patch. See: amdgpu_add_handle_to_table
Like I said: There is no risk of regression and it simplifies one
simple case trivially. We shouldn't have to even talk about it.

>
>
>>> And how can code using amdgpu_bo_handle_type_kms_noimport be sure that
>>> the BO will never be re-imported via dma-buf?
>>
>> That's for the user to decide and prove when it's safe.
>
> We shouldn't even have to think about this, let's use the mental
> capacity for more useful things. :)

Mental capacity spent to write the patch: 15 seconds
Mental capacity spent for bike-shedding: Minutes? Tens of minutes?

>
> I'd rather add the handle to the hash table in amdgpu_bo_alloc,
> amdgpu_create_bo_from_user_mem and amdgpu_bo_import instead of in
> amdgpu_bo_export, making amdgpu_bo_export(bo, amdgpu_bo_handle_type_kms,
> ...) essentially free. In the unlikely (since allocating a BO from the
> kernel is expensive) case that the hash table shows up on profiles, we
> can optimize it.

The hash table isn't very good for high BO counts. The time complexity
of a lookup is O(n).

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

