Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread Marek Olšák
On Tue, Jan 30, 2018 at 10:04 PM, mikhail <mikhail.v.gavri...@gmail.com> wrote:
> On Tue, 2018-01-30 at 20:12 +0100, Marek Olšák wrote:
>> Can you record an apitrace on a driver that is not radeonsi?
>
> All traces from the five listed games were recorded on the Intel GPU (not
> radeonsi).
> I also understood why traces for some games were not recorded yesterday.
> It happened because those games are 32-bit, and apitrace needed the
> apitrace-libs-7.1-7.fc27.i686 package to work correctly, but only
> apitrace-libs-7.1-7.fc27.x86_64 was installed.
>
>> If yes, can you correctly replay the apitrace on a driver that is not
>> radeonsi?
>
> All traces were correctly replayed on the Intel GPU (not radeonsi).
>
>> If yes, can you reproduce the crash if you replay the apitrace on
>> radeonsi?
>
> All traces were also correctly replayed on the AMD Vega 56 GPU (radeonsi)
> without crashes.
>
> What does this give us?
>
> Anyway, launching the listed games on the AMD Vega 56 GPU leads to new crashes.

Thanks. It's possible that Unity contains its own version of LLVM or
its own version of some standard libraries that LLVM uses, and
radeonsi doesn't like it when games replace its dependencies.

Marek
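
A minimal sketch (editor's illustration, not from the thread) of the failure
mode Marek describes: a bundled allocator interposing malloc for the whole
process while free() still resolves to glibc. All names here are invented.

#include <stdlib.h>

/* Stand-in for a game's bundled allocator: because this definition
 * lives in the executable, it interposes glibc's malloc for every
 * library in the process, including the GL driver stack. */
static char fake_heap[1024];

void *malloc(size_t n)
{
    (void)n;
    return fake_heap;
}

int main(void)
{
    void *p = malloc(64); /* handed out by the bundled "heap" */
    free(p);              /* glibc's free() gets a pointer it never
                           * allocated: "free(): invalid pointer" and
                           * abort() -- the same free()/abort frames as
                           * in the backtraces quoted below */
    return 0;
}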


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-30 Thread Marek Olšák
On Tue, Jan 30, 2018 at 7:08 PM, mikhail  wrote:
> Launching the games with the latest LLVM and Mesa does not solve the problem. :(
>
> Current versions:
> mesa: 17.4.0-0.18.git32170d8
> llvm: 7.0.0-0.1.r322903
>
> /home/mikhail/.local/share/Steam/steamapps/common/Comedy Night/Comedy Night_Data/Mono/x86/libmono.so(+0x8a0b8) [0xebe5c0b8]
> linux-gate.so.1(__kernel_rt_sigreturn+0) [0xf7f6bde0]
> linux-gate.so.1(__kernel_vsyscall+0x9) [0xf7f6bdb9]
> /lib/libc.so.6(gsignal+0xc2) [0xf7aa3602]
> /lib/libc.so.6(abort+0x127) [0xf7aa4ea7]
> /lib/libc.so.6(+0x77b7b) [0xf7aeab7b]
> /lib/libc.so.6(+0x7f51e) [0xf7af251e]
> /lib/libc.so.6(cfree+0x65b) [0xf7af9dcb]
> /lib/libLLVM-7.0svn.so(_ZdlPv+0x18) [0xf369a408]
> /lib/libLLVM-7.0svn.so(_ZdlPvj+0x18) [0xf36a4b58]
> /usr/lib/dri/radeonsi_dri.so(+0x10fd0e) [0xf4b21d0e]
> /usr/lib/dri/radeonsi_dri.so(+0x307016) [0xf4d19016]
> /usr/lib/dri/radeonsi_dri.so(+0x308c67) [0xf4d1ac67]
> /usr/lib/dri/radeonsi_dri.so(+0x309e59) [0xf4d1be59]
> /usr/lib/dri/radeonsi_dri.so(+0x572517) [0xf4f84517]
> /usr/lib/dri/radeonsi_dri.so(+0x2e8f7c) [0xf4cfaf7c]
> /usr/lib/dri/radeonsi_dri.so(+0x2e915d) [0xf4cfb15d]
> ./Comedy Night.x86() [0x8fe2b70]
> ./Comedy Night.x86() [0x8faf255]
> ./Comedy Night.x86() [0x8f890e8]
> ./Comedy Night.x86() [0x8f8e70e]
> ./Comedy Night.x86() [0x8f826af]
> ./Comedy Night.x86() [0x8557807]
> /lib/libpthread.so.0(+0x6587) [0xf7f26587]
> /lib/libc.so.6(clone+0x66) [0xf7b6fee6]
>
>
> What else can I do to help?

Can you record an apitrace on a driver that is not radeonsi?
If yes, can you correctly replay the apitrace on a driver that is not radeonsi?
If yes, can you reproduce the crash if you replay the apitrace on radeonsi?

The possible answers to those questions are either "yes" or "no".

Thanks,
Marek


Re: LLVM 7.0 + Wayland = broken mouse control

2018-01-29 Thread Marek Olšák
On Mon, Jan 29, 2018 at 8:58 PM, mikhail <mikhail.v.gavri...@gmail.com> wrote:
>
> On Mon, 2018-01-29 at 19:18 +0100, Marek Olšák wrote:
>> Please report this issue to the gnome-shell team. gnome-shell has a
>> bug in how it handles (or ignores) 10-bits-per-channel visuals.
>
> I already reported here:
> https://gitlab.gnome.org/GNOME/mutter/issues/2
>
>> The workaround is to set the environment variable
>> "allow_rgb10_configs=false" for gnome-shell.
>
> I added the environment variable as you said:
> # echo "allow_rgb10_configs=false" > /etc/profile.d/gnome-workaround.sh
> But it does not help work around the problem.
> Maybe we made a mistake in the diagnosis?

It's possible.

Marek
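
A possible reason the workaround above did not take effect (editor's note):
the profile.d line assigns a shell variable but never exports it, so it
never reaches gnome-shell's environment. A minimal sketch of what the
option needs -- the setenv() call stands in for "export
allow_rgb10_configs=false" in the shell, and the launcher is hypothetical:

#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* The variable must be in the *environment* of the process that
     * loads Mesa (gnome-shell), not merely set in a shell. */
    setenv("allow_rgb10_configs", "false", 1);
    /* hypothetical hand-off to gnome-shell with the variable set */
    execlp("gnome-shell", "gnome-shell", (char *)NULL);
    return 1;
}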


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-29 Thread Marek Olšák
On Mon, Jan 29, 2018 at 6:51 PM, mikhail  wrote:
>
> Maybe the problem is in my LLVM and Mesa?
> mesa: 17.4.0-0.16.git41c36c4
> llvm: 7.0.0-0.1.r322132

I don't know.

>
> I couldn't update them to a much newer version because the newer version
> breaks my GNOME desktop. I wrote about it before on this mailing
> list.

Please report this issue to the gnome-shell team. gnome-shell has a
bug in how it handles (or ignores) 10-bits-per-channel visuals.

The workaround is to set the environment variable
"allow_rgb10_configs=false" for gnome-shell.

Marek


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-29 Thread Marek Olšák
Neither of your apitraces crashes on my machine.

Marek


Re: All games made with the Unity engine in the Steam catalog crash on AMD GPU, but work on Intel GPU on the same machine.

2018-01-29 Thread Marek Olšák
[-gpudriverdevsupport (Bcc)]

Let's keep this on amd-gfx.

It looks like all backtraces crash in free() or operator delete(). I
would say it's heap corruption.

Can you record an apitrace on a driver that is not radeonsi?
Can you correctly replay the apitrace on a driver that is not radeonsi?
Can you reproduce the crash if you replay the apitrace on radeonsi?

Thanks,
Marek

On Sun, Jan 28, 2018 at 5:15 PM, mikhail  wrote:
>
> Hi!
>
> Comedy Night - http://store.steampowered.com/app/665360/
> Fractal Space - http://store.steampowered.com/app/435410/
> Kerbal Space Program - http://store.steampowered.com/app/220200/
> Red Comrades 3: Return of Alaska. Reloaded - http://store.steampowered.com/app/546220/
> Youtubers Life - http://store.steampowered.com/app/428690/
>
> What unites all the listed games is that they are made with the Unity engine.
>
> My question: is the reason they crash on the AMD GPU a bug in the Unity
> engine or a bug in the AMD driver?
>
> For investigation I added all backtraces to this message.
>
> --
> Best Regards,
> Mike Gavrilov.


Re: [PATCH xf86-video-amdgpu] Fix linear check in amdgpu_glamor_share_pixmap_backing

2018-01-22 Thread Marek Olšák
Reviewed-by: Marek Olšák <marek.ol...@amd.com>

Marek

On Mon, Jan 22, 2018 at 6:36 PM, Michel Dänzer <mic...@daenzer.net> wrote:
> From: Michel Dänzer <michel.daen...@amd.com>
>
> We were incorrectly interpreting the tiling information.
>
> Reported-by: Marek Olšák <marek.ol...@amd.com>
> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
> ---
>  configure.ac        |  2 +-
>  src/amdgpu_glamor.c | 11 ++++++++++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 02e19aac5..ea41f41b3 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -71,7 +71,7 @@ XORG_DRIVER_CHECK_EXT(DPMSExtension, xextproto)
>
>  # Checks for libraries.
>  PKG_CHECK_MODULES(LIBDRM, [libdrm >= 2.4.78])
> -PKG_CHECK_MODULES(LIBDRM_AMDGPU, [libdrm_amdgpu >= 2.4.72])
> +PKG_CHECK_MODULES(LIBDRM_AMDGPU, [libdrm_amdgpu >= 2.4.76])
>  PKG_CHECK_MODULES(GBM, [gbm])
>
>  # Obtain compiler/linker options for the driver dependencies
> diff --git a/src/amdgpu_glamor.c b/src/amdgpu_glamor.c
> index 4d9788787..82146a157 100644
> --- a/src/amdgpu_glamor.c
> +++ b/src/amdgpu_glamor.c
> @@ -323,13 +323,22 @@ amdgpu_glamor_share_pixmap_backing(PixmapPtr pixmap, ScreenPtr slave,
>void **handle_p)
>  {
> ScreenPtr screen = pixmap->drawable.pScreen;
> +   AMDGPUInfoPtr info = AMDGPUPTR(xf86ScreenToScrn(screen));
> uint64_t tiling_info;
> CARD16 stride;
> CARD32 size;
> +   Bool is_linear;
> int fd;
>
> tiling_info = amdgpu_pixmap_get_tiling_info(pixmap);
> -   if (AMDGPU_TILING_GET(tiling_info, ARRAY_MODE) != 0) {
> +
> +   if (info->family >= AMDGPU_FAMILY_AI) {
> +   is_linear = AMDGPU_TILING_GET(tiling_info, SWIZZLE_MODE) == 0;
> +   } else {
> +   is_linear = AMDGPU_TILING_GET(tiling_info, ARRAY_MODE) == 1;
> +   }
> +
> +   if (!is_linear) {
> PixmapPtr linear;
>
> /* We don't want to re-allocate the screen pixmap as
> --
> 2.15.1
>
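
For readers outside the driver, a sketch of the macro semantics the fix
relies on (paraphrased from include/drm/amdgpu_drm.h, not part of the
patch). "Linear" is encoded differently per generation, which is what the
old ARRAY_MODE != 0 test got wrong:

/* paraphrased: AMDGPU_TILING_GET extracts one bitfield from the
 * packed tiling word */
#define AMDGPU_TILING_GET(value, field) \
    (((value) >> AMDGPU_TILING_##field##_SHIFT) & \
     AMDGPU_TILING_##field##_MASK)

/* GFX9+ (AMDGPU_FAMILY_AI): SWIZZLE_MODE == 0 means linear.
 * Older families: ARRAY_MODE == 1 (linear aligned) is the linear
 * layout glamor uses, so treating every nonzero ARRAY_MODE as tiled
 * misclassified those pixmaps. */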


Re: [PATCH libdrm 3/4] amdgpu: use the high VA range if possible v2

2018-01-08 Thread Marek Olšák
Actually, 0x8000 is fine.

For the series:

Reviewed-by: Marek Olšák <marek.ol...@amd.com>

Marek


Re: [PATCH libdrm 3/4] amdgpu: use the high VA range if possible v2

2018-01-08 Thread Marek Olšák
Can we put the 32-bit address space higher? E.g. high bits = 0xfff0 ?

Marek

On Sun, Jan 7, 2018 at 10:11 AM, Christian König  wrote:
> Retire the low range on Vega10; this frees up everything below
> 0x8000 for HMM.
>
> v2: keep the 32bit range working.
>
> Signed-off-by: Christian König 
> ---
>  amdgpu/amdgpu_device.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/amdgpu/amdgpu_device.c b/amdgpu/amdgpu_device.c
> index d7077184..a0d01727 100644
> --- a/amdgpu/amdgpu_device.c
> +++ b/amdgpu/amdgpu_device.c
> @@ -264,13 +264,23 @@ int amdgpu_device_initialize(int fd,
> goto cleanup;
> }
>
> -   start = dev->dev_info.virtual_address_offset;
> -   max = MIN2(dev->dev_info.virtual_address_max, 0x100000000ULL);
> +   if (dev->dev_info.high_va_offset && dev->dev_info.high_va_max) {
> +   start = dev->dev_info.high_va_offset;
> +   max = dev->dev_info.high_va_max;
> +   } else {
> +   start = dev->dev_info.virtual_address_offset;
> +   max = dev->dev_info.virtual_address_max;
> +   }
> +
> +   max = MIN2(max, (start & ~0xffffffffULL) + 0x100000000ULL);
> amdgpu_vamgr_init(&dev->vamgr_32, start, max,
>   dev->dev_info.virtual_address_alignment);
>
> -   start = MAX2(dev->dev_info.virtual_address_offset, 0x100000000ULL);
> -   max = MAX2(dev->dev_info.virtual_address_max, 0x100000000ULL);
> +   start = max;
> +   if (dev->dev_info.high_va_offset && dev->dev_info.high_va_max)
> +   max = dev->dev_info.high_va_max;
> +   else
> +   max = dev->dev_info.virtual_address_max;
> amdgpu_vamgr_init(&dev->vamgr, start, max,
>   dev->dev_info.virtual_address_alignment);
>
> --
> 2.11.0
>
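
To make the split concrete, a worked sketch (editor's example, not from the
patch; the offsets are invented and the 4 GiB window constants are taken
from the reconstructed code above):

#include <stdio.h>
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
    /* hypothetical values reported by the kernel */
    uint64_t start = 0xffff800000000000ull; /* high_va_offset */
    uint64_t max   = 0xffffffff00000000ull; /* high_va_max */

    /* the 32-bit manager gets the first 4 GiB window of the range */
    uint64_t max32 = MIN2(max, (start & ~0xffffffffull) + 0x100000000ull);

    printf("vamgr_32: [0x%llx, 0x%llx)\n",
           (unsigned long long)start, (unsigned long long)max32);
    /* the general manager takes over from there */
    printf("vamgr:    [0x%llx, 0x%llx)\n",
           (unsigned long long)max32, (unsigned long long)max);
    return 0;
}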


Re: [PATCH libdrm 1/3] amdgpu: Clean up amdgpu_parse_asic_ids error handling

2018-01-05 Thread Marek Olšák
On Fri, Dec 1, 2017 at 5:56 PM, Michel Dänzer  wrote:
> From: Michel Dänzer 
>
> * Move error message printing into amdgpu_parse_asic_ids and make it
>   return void
> * Print only "Invalid format" error message if parse_one_line returns
>   -EINVAL
> * Use strerror instead of printing the (negative) error code in hex
>
> Signed-off-by: Michel Dänzer 
> ---
>  amdgpu/amdgpu_asic_id.c  | 16 ++++++++--------
>  amdgpu/amdgpu_device.c   |  6 +++---
>  amdgpu/amdgpu_internal.h |  2 +-
>  3 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/amdgpu/amdgpu_asic_id.c b/amdgpu/amdgpu_asic_id.c
> index e8218974..eb42bbc2 100644
> --- a/amdgpu/amdgpu_asic_id.c
> +++ b/amdgpu/amdgpu_asic_id.c
> @@ -109,7 +109,7 @@ out:
> return r;
>  }
>
> -int amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table)
> +void amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table)
>  {
> struct amdgpu_asic_id *asic_id_table;
> struct amdgpu_asic_id *id;
> @@ -126,7 +126,7 @@ int amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table)
> if (!fp) {
> fprintf(stderr, "%s: %s\n", AMDGPU_ASIC_ID_TABLE,
> strerror(errno));
> -   return -EINVAL;
> +   return;
> }
>
> asic_id_table = calloc(table_max_size + 1,
> @@ -177,8 +177,6 @@ int amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table)
> line_num++;
> continue;
> }
> -   fprintf(stderr, "Invalid format: %s: line %d: %s\n",
> -   AMDGPU_ASIC_ID_TABLE, line_num, line);
> goto free;
> }
>
> @@ -201,6 +199,14 @@ int amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table)
> memset(id, 0, sizeof(struct amdgpu_asic_id));
>
>  free:
> +   if (r == -EINVAL) {
> +   fprintf(stderr, "Invalid format: %s: line %d: %s\n",
> +   AMDGPU_ASIC_ID_TABLE, line_num, line);
> +   } else if (r) {
> +   fprintf(stderr, "%s: Cannot parse ASIC IDs: %s\n",
> +   __func__, strerror(-r));

This is a good example of how strerror should not be used in graphics
drivers. It makes no sense:
"amdgpu_parse_asic_ids: Cannot parse ASIC IDs: Resource temporarily unavailable"

Is my graphics card temporarily unavailable?

Marek
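
The quoted message is likely the r == -EAGAIN path: parse_one_line returns
-EAGAIN for blank and commented lines, and strerror(EAGAIN) is the generic
libc string. A minimal sketch reproducing the output Marek objects to:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* a trailing comment line in the ID table leaves r == -EAGAIN at
     * EOF, so the error path above prints exactly this: */
    fprintf(stderr, "amdgpu_parse_asic_ids: Cannot parse ASIC IDs: %s\n",
            strerror(EAGAIN));
    return 0;
}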


Re: [PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

2017-12-12 Thread Marek Olšák
On Tue, Dec 12, 2017 at 5:36 PM, Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
> On 12.12.2017 at 15:57, Marek Olšák wrote:
>>
>> On Tue, Dec 12, 2017 at 10:01 AM, Christian König
>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>>
>>> On 11.12.2017 at 22:29, Marek Olšák wrote:
>>>>
>>>> From: Marek Olšák <marek.ol...@amd.com>
>>>>
>>>> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
>>>> ---
>>>>
>>>> Is this really correct? I have no easy way to test it.
>>>
>>>
>>> It's a step in the right direction, but I would rather vote for something
>>> else:
>>>
>>> Instead of disabling the timeout by default we only disable the GPU
>>> reset/recovery.
>>>
>>> The idea is to add a new parameter amdgpu_gpu_recovery which makes
>>> amdgpu_gpu_recover only print out an error and not touch the GPU at all
>>> (on bare metal systems).
>>>
>>> Then we finally set the amdgpu_lockup_timeout to a non zero value by
>>> default.
>>>
>>> Andrey could you take care of this when you have time?
>>
>> I don't understand this.
>>
>> Why can't we keep the previous behavior where amdgpu.lockup_timeout=0
>> disabled GPU reset? Why do we have to add another option for the same
>> thing?
>
>
> lockup_timeout=0 never disabled the GPU reset; it just disabled the timeout.

It disabled the automatic reset before we had those interrupt callbacks.

>
> You could still manually trigger a reset, and invalid commands, invalid
> register writes, and requests from the SRIOV hypervisor could also trigger one.

That's OK. Manual resets should always be allowed.

>
> And as Monk explained, GPU resets are mandatory for SRIOV; you can't disable
> them at all in this case.

What is preventing Monk from setting amdgpu.lockup_timeout > 0, which
should be the default state anyway?

Let's just say lockup_timeout=0 has undefined behavior with SRIOV.

>
> In addition to that, we probably want the error message that something timed
> out, but without touching the hardware in any way.

Yes that is a fair point.

Marek


Re: [PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

2017-12-12 Thread Marek Olšák
On Tue, Dec 12, 2017 at 4:18 AM, Liu, Monk  wrote:
> NAK, your change breaks the SRIOV logic:
>
> Without lockup_timeout set, gpu_recover() won't get called at all, unless
> your IB triggered an invalid instruction and that IRQ invoked
> amdgpu_gpu_recover(). In that case you should disable the logic in that
> IRQ instead of changing gpu_recover() itself, because for SRIOV we need
> gpu_recover() even when lockup_timeout is zero.

The default value of 0 indicates that GPU reset isn't ready to be
enabled by default. That's what it means. Once the GPU reset works,
the default should be non-zero (e.g. 1), and
amdgpu.lockup_timeout=0 should be used to disable all GPU resets in
order to be able to do scan dumps and debug GPU hangs.

Marek


Re: [PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

2017-12-12 Thread Marek Olšák
On Tue, Dec 12, 2017 at 10:01 AM, Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
> On 11.12.2017 at 22:29, Marek Olšák wrote:
>>
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
>> ---
>>
>> Is this really correct? I have no easy way to test it.
>
>
> It's a step in the right direction, but I would rather vote for something
> else:
>
> Instead of disabling the timeout by default we only disable the GPU
> reset/recovery.
>
> The idea is to add a new parameter amdgpu_gpu_recovery which makes
> amdgpu_gpu_recover only print out an error and not touch the GPU at all
> (on bare metal systems).
>
> Then we finally set the amdgpu_lockup_timeout to a non zero value by
> default.
>
> Andrey could you take care of this when you have time?

I don't understand this.

Why can't we keep the previous behavior where amdgpu.lockup_timeout=0
disabled GPU reset? Why do we have to add another option for the same
thing?

Marek


[PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

2017-12-11 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
---

Is this really correct? I have no easy way to test it.

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8d03baa..56c41cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
  *
  * Attempt to reset the GPU if it has hung (all asics).
  * Returns 0 for success or an error on failure.
  */
 int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
 {
struct drm_atomic_state *state = NULL;
uint64_t reset_flags = 0;
int i, r, resched;
 
+   /* amdgpu.lockup_timeout=0 disables GPU reset. */
+   if (amdgpu_lockup_timeout == 0)
+   return 0;
+
if (!amdgpu_check_soft_reset(adev)) {
DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
return 0;
}
 
dev_info(adev->dev, "GPU reset begin!\n");
 
	mutex_lock(&adev->lock_reset);
	atomic_inc(&adev->gpu_reset_counter);
adev->in_gpu_reset = 1;
-- 
2.7.4
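
For context, the knob under discussion is an ordinary module parameter; a
sketch of its definition, paraphrased from amdgpu_drv.c of this era (not
verbatim):

/* 0 = no timeout, which with this patch also means no automatic reset */
int amdgpu_lockup_timeout = 0;
module_param_named(lockup_timeout, amdgpu_lockup_timeout, int, 0444);
MODULE_PARM_DESC(lockup_timeout,
                 "GPU lockup timeout in ms (default 0 = disable)");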



Re: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to create syncobj as signaled initially

2017-11-28 Thread Marek Olšák
You need a freedesktop.org account:
https://www.freedesktop.org/wiki/AccountRequests/

Marek

On Tue, Nov 28, 2017 at 2:32 PM, Mao, David  wrote:
> I have never tried to commit the change before. So I guess the answer is no.
> Could you let me know how I can apply for commit rights?
>
> Thanks.
> Best Regards,
> David
>
> -Original Message-
> From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
> Sent: Tuesday, November 28, 2017 9:29 PM
> To: Mao, David ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to 
> create syncobj as signaled initially
>
> Reviewed-by: Christian König 
>
> But in general for libdrm changes I would ping Marek, Nicolai, Michel and in 
> this special case Dave Airlie because he added the patch with the missing 
> flags field.
>
> And I strongly assume you don't have commit rights, do you?
>
> Regards,
> Christian.
>
> On 28.11.2017 at 14:22, Mao, David wrote:
>> Anyone can help to review the change?
>> Thanks.
>>
>> Best Regards,
>> David
>>
>> -Original Message-
>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
>> Of David Mao
>> Sent: Tuesday, November 28, 2017 11:26 AM
>> To: amd-gfx@lists.freedesktop.org
>> Subject: [PATCH libdrm] [drm] - Adding amdgpu_cs_create_syncobj2 to
>> create syncobj as signaled initially
>>
>> Change-Id: Icf8d29bd4b50ee76936faacbbe099492cf0557cc
>> Signed-off-by: David Mao 
>> ---
>>   amdgpu/amdgpu.h    | 15 +++++++++++++++
>>   amdgpu/amdgpu_cs.c | 10 ++++++++++
>>   2 files changed, 25 insertions(+)
>>
>> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
>> index 78fbd1e..47bdb3a 100644
>> --- a/amdgpu/amdgpu.h
>> +++ b/amdgpu/amdgpu.h
>> @@ -1727,6 +1727,21 @@ const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);
>>   /**
>>*  Create kernel sync object
>>*
>> + * \param   dev - \c [in]  device handle
>> + * \param   flags   - \c [in]  flags that affect creation
>> + * \param   syncobj - \c [out] sync object handle
>> + *
>> + * \return   0 on success\n
>> + *  <0 - Negative POSIX Error code
>> + *
>> +*/
>> +int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
>> +   uint32_t  flags,
>> +   uint32_t *syncobj);
>> +
>> +/**
>> + *  Create kernel sync object
>> + *
>>* \param   dev   - \c [in]  device handle
>>* \param   syncobj   - \c [out] sync object handle
>>*
>> diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
>> index 64ad911..a9fbab9 100644
>> --- a/amdgpu/amdgpu_cs.c
>> +++ b/amdgpu/amdgpu_cs.c
>> @@ -606,6 +606,16 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
>>   return amdgpu_cs_unreference_sem(sem);
>>  }
>>
>> +int amdgpu_cs_create_syncobj2(amdgpu_device_handle dev,
>> +   uint32_t  flags,
>> +   uint32_t *handle)
>> +{
>> + if (NULL == dev)
>> + return -EINVAL;
>> +
>> + return drmSyncobjCreate(dev->fd, flags, handle);
>> +}
>> +
>>   int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
>>uint32_t *handle)
>>   {
>> --
>> 2.7.4
>>
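
A usage sketch for the new entry point (editor's example, assuming the
patch above is applied; error handling trimmed):

#include <amdgpu.h>
#include <drm.h>

/* Create a fence object that consumers can wait on immediately,
 * before any GPU work has been attached to it. */
static int create_signaled_syncobj(amdgpu_device_handle dev,
                                   uint32_t *out_handle)
{
    return amdgpu_cs_create_syncobj2(dev, DRM_SYNCOBJ_CREATE_SIGNALED,
                                     out_handle);
}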


Re: [PATCH] amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.

2017-11-03 Thread Marek Olšák
Pushed with a fixed typo and whitespace, thanks.

Marek

On Thu, Nov 2, 2017 at 3:50 PM, Andrey Grodzovsky
 wrote:
> Fixes reverted patch f03b7c9 by doing VMID reservation per
> process and not per context.
> Also updates required amdgpu libdrm version since the change
> involved interface updates in amdgpu libdrm.
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  configure.ac  |  2 +-
>  meson.build   |  2 +-
>  src/gallium/drivers/radeon/r600_pipe_common.c |  1 +
>  src/gallium/drivers/radeon/r600_pipe_common.h |  1 +
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  1 +
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 12 ++++++++++++
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h |  1 +
>  7 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 9aa02f5..0116b90 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -74,7 +74,7 @@ AC_SUBST([OPENCL_VERSION])
>  # in the first entry.
>  LIBDRM_REQUIRED=2.4.75
>  LIBDRM_RADEON_REQUIRED=2.4.71
> -LIBDRM_AMDGPU_REQUIRED=2.4.85
> +LIBDRM_AMDGPU_REQUIRED=2.4.88
>  LIBDRM_INTEL_REQUIRED=2.4.75
>  LIBDRM_NVVIEUX_REQUIRED=2.4.66
>  LIBDRM_NOUVEAU_REQUIRED=2.4.66
> diff --git a/meson.build b/meson.build
> index 6ad8c8b..3ceaec4 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -638,7 +638,7 @@ dep_libdrm_nouveau = []
>  dep_libdrm_etnaviv = []
>  dep_libdrm_freedreno = []
>  if with_amd_vk or with_gallium_radeonsi
> -  dep_libdrm_amdgpu = dependency('libdrm_amdgpu', version : '>= 2.4.86')
> +  dep_libdrm_amdgpu = dependency('libdrm_amdgpu', version : '>= 2.4.88')
>  endif
>  if with_gallium_radeonsi or with_dri_r100 or with_dri_r200
>dep_libdrm_radeon = dependency('libdrm_radeon', version : '>= 2.4.71')
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> index b77d859..3364dac 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -851,6 +851,7 @@ static const struct debug_named_value common_debug_options[] = {
> { "dpbb", DBG(DPBB), "Enable DPBB." },
> { "dfsm", DBG(DFSM), "Enable DFSM." },
> { "nooutoforder", DBG(NO_OUT_OF_ORDER), "Disable out-of-order rasterization" },
> +   { "reserve_vmid", DBG(RESERVE_VMID), "Force VMID resrvation per context." },
>
> DEBUG_NAMED_VALUE_END /* must be last */
>  };
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> index a7c91cb..94c8d4f 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -107,6 +107,7 @@ enum {
> DBG_NO_DISCARD_RANGE,
> DBG_NO_WC,
> DBG_CHECK_VM,
> +   DBG_RESERVE_VMID,
>
> /* 3D engine options: */
> DBG_SWITCH_ON_EOP,
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
> index 1c3d0f0..5fe3592 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
> @@ -162,6 +162,7 @@ static inline bool amdgpu_fence_is_syncobj(struct 
> amdgpu_fence *fence)
>  static inline void amdgpu_ctx_unref(struct amdgpu_ctx *ctx)
>  {
> if (p_atomic_dec_zero(&ctx->refcount)) {
> +
>amdgpu_cs_ctx_free(ctx->ctx);
>amdgpu_bo_free(ctx->user_fence_bo);
>FREE(ctx);
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
> index a210a27..7b261bb 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
> @@ -73,6 +73,7 @@ static bool do_winsys_init(struct amdgpu_winsys *ws, int fd)
>
> ws->check_vm = strstr(debug_get_option("R600_DEBUG", ""), "check_vm") != NULL;
> ws->debug_all_bos = debug_get_option_all_bos();
> +   ws->reserve_vmid = strstr(debug_get_option("R600_DEBUG", ""), "reserve_vmid") != NULL;
>
> return true;
>
> @@ -92,6 +93,9 @@ static void amdgpu_winsys_destroy(struct radeon_winsys *rws)
>  {
> struct amdgpu_winsys *ws = (struct amdgpu_winsys*)rws;
>
> +   if (ws->reserve_vmid)
> +  amdgpu_vm_unreserve_vmid(ws->dev, 0);
> +
> if (util_queue_is_initialized(&ws->cs_queue))
>    util_queue_destroy(&ws->cs_queue);
>
> @@ -342,6 +346,14 @@ amdgpu_winsys_create(int fd, const struct pipe_screen_config *config,
>
> util_hash_table_set(dev_tab, dev, ws);
>
> +   if (ws->reserve_vmid) {
> +  r = amdgpu_vm_reserve_vmid(dev, 0);
> +  if (r) {
> +   fprintf(stderr, "amdgpu: amdgpu_vm_reserve_vmid failed. (%i)\n", r);
> +   goto fail_cache;
> +  }
> +   }
> +
> /* We must unlock the mutex once the winsys is fully initialized, so that
>  * other threads attempting to create the winsys from the same fd will

[ANNOUNCE] libdrm 2.4.88

2017-11-03 Thread Marek Olšák

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


libdrm 2.4.88 has been released.


Andrey Grodzovsky (1):
  amdgpu: Fix wrappers for AMDGPU_VM IOCTL.

Marek Olšák (1):
  configure.ac: bump version for release

git tag: libdrm-2.4.88

https://dri.freedesktop.org/libdrm/libdrm-2.4.88.tar.bz2
MD5:  fe4d5c77f1468ee73d0bbb30d76945d7  libdrm-2.4.88.tar.bz2
SHA1: 9cac721d33eb1e65a89a764cb71a7a63ceb6d7c1  libdrm-2.4.88.tar.bz2
SHA256: b5e55dbac2124e742e639f5b8553e8b7395863bf73dab4f77e99fe2fc25572b5  
libdrm-2.4.88.tar.bz2
SHA512: 
0d9d4bcc0d9be1fb6b1ca075339b22b0f927288a4c02bbcbf95406b5c095051890f3d2e0d32e529ef9b6952ce1250afd1e0765ad3188c2bac924dda8c33afabb
  libdrm-2.4.88.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.88.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.88.tar.gz
MD5:  090c1a93d92a1549df2514de0566cd50  libdrm-2.4.88.tar.gz
SHA1: 4c187a55ce622c623491c6f873fc672a96b60a15  libdrm-2.4.88.tar.gz
SHA256: a8b458db6a73c717baee2e249d39511fb6f5c0f5f24dee2770935eddeda1a017  
libdrm-2.4.88.tar.gz
SHA512: 
126ae9bb2f3ea45a90d46792a083db44a6ca550ca71697272dfc480f536377749c8f6ae6d92464537d0720bd058dc9a23f34c8e35043472a7a279193b1512abc
  libdrm-2.4.88.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.88.tar.gz.sig

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJZ/J3ZAAoJEP3RXVrO8PKxIPwH/0kaRQ7ZBdw7NFPTRRr12Fml
EWbJN2RWqI7SlosEHAl/P2LPtvnnwVibLv6o4ZaosPLJn/uCdAU3IVYDQ4icUI65
v+ZQqKDTo08n9xZJcwaF8xWGpM5jh5c0eJs8dlytBi8pqJW+YEZg5TaY9et369hU
w5iBRYnsnO3qzE44OVKtCgUZXjGXsLCZ8kiE7T249QzfLXWZYmxwdClgZz5S4/Kq
MyidYv8k12VcCEX7egKlhXdrdHOLZEblHTk62UQRteVGYjlaw/Qy4fFmyAAOPdp4
iMabQhzSCCBHZqgkL+MG/BTLbtLfR/PHsineC++TZK+JlQ7tbskE/GVqS0ZszYo=
=CyiB
-----END PGP SIGNATURE-----


Re: [Mesa-dev] [PATCH] amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.

2017-11-01 Thread Marek Olšák
Yeah, it should be called when the winsys is created.

Marek

On Wed, Nov 1, 2017 at 9:49 AM, Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
> I'm not 100% sure that patch was correct.
>
> When is amdgpu_ctx_create() called? The VMID is reserved for the whole
> process, not just a context.
>
> Regards,
> Christian.
>
>
> On 31.10.2017 at 16:57, Marek Olšák wrote:
>>
>> I addressed the feedback and pushed the patch.
>>
>> Marek
>>
>> On Tue, Oct 31, 2017 at 4:50 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>>>
>>> On 31/10/17 04:40 PM, Andrey Grodzovsky wrote:
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
>>>
>>> [...]
>>>
>>>> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
>>>> index 8f43e93..1155492 100644
>>>> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
>>>> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
>>>> @@ -256,6 +256,14 @@ static struct radeon_winsys_ctx *amdgpu_ctx_create(struct radeon_winsys *ws)
>>>> goto error_create;
>>>>  }
>>>>
>>>> +   if (ctx->ws->reserve_vmid) {
>>>> +r = amdgpu_vm_reserve_vmid(ctx->ctx, 0);
>>>> +if (r) {
>>>> + fprintf(stderr, "amdgpu: amdgpu_cs_ctx_create failed. (%i)\n", r);
>>>
>>> This should say "amdgpu: amdgpu_vm_reserve_vmid failed. (%i)\n".
>>>
>>>
>>> --
>>> Earthling Michel Dänzer   |   http://www.amd.com
>>> Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH] amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.

2017-10-31 Thread Marek Olšák
I addressed the feedback and pushed the patch.

Marek

On Tue, Oct 31, 2017 at 4:50 PM, Michel Dänzer  wrote:
> On 31/10/17 04:40 PM, Andrey Grodzovsky wrote:
>> Signed-off-by: Andrey Grodzovsky 
>
> [...]
>
>> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
>> index 8f43e93..1155492 100644
>> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
>> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
>> @@ -256,6 +256,14 @@ static struct radeon_winsys_ctx *amdgpu_ctx_create(struct radeon_winsys *ws)
>>goto error_create;
>> }
>>
>> +   if (ctx->ws->reserve_vmid) {
>> +r = amdgpu_vm_reserve_vmid(ctx->ctx, 0);
>> +if (r) {
>> + fprintf(stderr, "amdgpu: amdgpu_cs_ctx_create failed. (%i)\n", r);
>
> This should say "amdgpu: amdgpu_vm_reserve_vmid failed. (%i)\n".
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


[ANNOUNCE] libdrm 2.4.87

2017-10-31 Thread Marek Olšák
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

libdrm 2.4.87 has been released.

Marek Olšák (2):
  amdgpu: fix 32-bit build
  configure.ac: bump version for release

git tag: libdrm-2.4.87

https://dri.freedesktop.org/libdrm/libdrm-2.4.87.tar.bz2
MD5:  b4f9063838559d08649d45fec2d1184a  libdrm-2.4.87.tar.bz2
SHA1: 26b976c8901061399b8b934854b960f5b0761f36  libdrm-2.4.87.tar.bz2
SHA256: 4c1c5293bdbfa248e029d3e6446767e17a3208387a719ec9da2d20c19849ed48  
libdrm-2.4.87.tar.bz2
SHA512: 
9ce259d38eaffbcaf82a6975ffa513307da6750bcc31a53cade5717f854151b0e7b6d4ce25c0b518cab5a542c544d5984329f216dc4cdc9369b4adae19a68e48
  libdrm-2.4.87.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.87.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.87.tar.gz
MD5:  8ec8c39c3a42db9a9a8d8a8c67290883  libdrm-2.4.87.tar.gz
SHA1: 0bd17d00d1843a1ddca98352abcdebb468dd7384  libdrm-2.4.87.tar.gz
SHA256: e813e2e8d7d9f071200317ee6b2ef6e35d4d497a57488ed3d44551e5970fc41a  
libdrm-2.4.87.tar.gz
SHA512: 
178541f488c62ca45ffb9bc04c737fa54560df7a845f0cd8b02684da5a51d80c0e96f2327e0b84ad4127501a64d960533b90e0e8c08275bbe6e536c0bd533d09
  libdrm-2.4.87.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.87.tar.gz.sig

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJZ+JAJAAoJEP3RXVrO8PKxe3kH/11MckJZYJRx5Gnwkb8o2kMx
y9omlgbpA9CI8eRZTIOD/XZ/yxRLziPDw8NLFuJwMQvYTiSbv7RY1ESqtFHv5U3y
d343Yux85TWyfKrPTc11PdHgK+qRnoLWVPH4Nr0SJLuBX1kqiNe0ws0VSvoNlFxB
7ayGsoF71W4lZIWkEi+U59InilDHEZv+C1xt1CMR1pBegBPZ5BOumEScHkuHH8Fu
v7A5q1eMtHbU1kkjuqVoZj/igYFZB3yPN11iByGHgz/N9iXXRMopt8X/AK6yFHjN
pzWd/qA/sr3PP35kU4Xo5ENmsEWSxyV86pMX38D2zekLovhPkstLfEz/wuhEC+o=
=7DH2
-----END PGP SIGNATURE-----


[ANNOUNCE] libdrm 2.4.86

2017-10-31 Thread Marek Olšák
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


libdrm 2.4.86 has been released.

Andrey Grodzovsky (2):
  amdgpu: Add wrappers for AMDGPU_VM IOCTL.
  amdgpu: Add VMID reservation per GPU context test.

Dave Airlie (2):
  drm: sync drm headers from drm-next.
  drm/syncobj: fix some whitespace issues

Marek Olšák (1):
  configure.ac: bump version for release

git tag: libdrm-2.4.86

https://dri.freedesktop.org/libdrm/libdrm-2.4.86.tar.bz2
MD5:  8dabf172c9695c24d000cbddfa849a82  libdrm-2.4.86.tar.bz2
SHA1: b2c428326843dfaa5fb88899b8dfaa43b792dc70  libdrm-2.4.86.tar.bz2
SHA256: 4b010002ce158b4b6537ddb5a0f94a628be19727a71f1ab556a276829989072b  
libdrm-2.4.86.tar.bz2
SHA512: 
7d07b66db104728d013830da96fff678580c7505f1292cc7713b99f1e34e80276bcdc40a4377d67cb7ea73708e3fba29ba4062bd68d84845943b68090b9d7a01
  libdrm-2.4.86.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.86.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.86.tar.gz
MD5:  23ffd64496823a673b290268953da034  libdrm-2.4.86.tar.gz
SHA1: f7d277ef964b767f94215bf3f464e83fcdb742bd  libdrm-2.4.86.tar.gz
SHA256: 90fca042dd5c619fff2771ab634c69010f25c582071519aa284860758fac2963  
libdrm-2.4.86.tar.gz
SHA512: 
c22f748ecdc7910c0ebac618a936af3b47f3f5f0179be823a72920a1c1d47d52e3e832e0de5d37758d723a5ab7f0b5ef39dcb3ed7158904f9ca9f70509e7ee8b
  libdrm-2.4.86.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.86.tar.gz.sig

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJZ+IdmAAoJEP3RXVrO8PKxrG8IAKRCbKenZyeQlyYVav5xTfvJ
y++EOE7HqJ3K8LJnIwiQqjY6M67B8UVXrY536GCc5laMLGV7789wAbMpLVfGYS3w
/uUpeEf1NmIehnfQHQvHUT6JxeuHamsVFJ7Svfbfpx2JJ7pFoH4mIynxGlPjjv97
BBJg6suK8orD0yL/73A2x1FXDLoKHE4PGuCTZecpnb4y/EVdatFFE7wlb0uxGRrs
ZQroSptRmn3kc/KCIhoLYPrhXvRgiNMgvyL/orsL0tL5NlMBjJHOFO+WhjHZBvBC
wGTEwJRnRoJkTklC5O3G/wXZMjQHi7ExOF+7yIB/4EjYuNeEPYpYHzJjnHfrvto=
=Kd+P
-----END PGP SIGNATURE-----


Re: [PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-10-12 Thread Marek Olšák
Sorry, this codepath is not tested by radeonsi.

Marek

On Sun, Oct 1, 2017 at 1:20 AM, Zhou, David(ChunMing)
<david1.z...@amd.com> wrote:
> Could you test and review it? I don't have an environment on hand.
>
> Regards,
> David Zhou
>
> Sent from my Smartisan Nut Pro
>
> Marek Olšák <mar...@gmail.com> wrote on 2017-09-30 at 23:56:
>
> The idea sounds good.
>
> Marek
>
> On Sat, Sep 30, 2017 at 3:55 AM, Chunming Zhou <zhou...@amd.com> wrote:
>> What I mean is like the attached; I reverted part of yours.
>>
>> Regards,
>>
>> David zhou
>>
>>
>>
>> On 2017-09-29 22:15, Marek Olšák wrote:
>>>
>>> On Fri, Sep 29, 2017 at 4:13 PM, Marek Olšák <mar...@gmail.com> wrote:
>>>>
>>>> On Fri, Sep 29, 2017 at 4:44 AM, Chunming Zhou <zhou...@amd.com> wrote:
>>>>>
>>>>>
>>>>> On 2017-09-13 04:42, Marek Olšák wrote:
>>>>>>
>>>>>> From: Marek Olšák <marek.ol...@amd.com>
>>>>>>
>>>>>> For amdgpu.
>>>>>>
>>>>>> drm_syncobj_create is renamed to drm_syncobj_create_as_handle, and new
>>>>>> helpers drm_syncobj_create and drm_syncobj_get_handle are added.
>>>>>>
>>>>>> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
>>>>>> ---
>>>>>>drivers/gpu/drm/drm_syncobj.c | 49
>>>>>> +++
>>>>>>include/drm/drm_syncobj.h |  4 
>>>>>>2 files changed, 49 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>>> index 0422b8c..0bb1741 100644
>>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>>> @@ -262,8 +262,14 @@ void drm_syncobj_free(struct kref *kref)
>>>>>>}
>>>>>>EXPORT_SYMBOL(drm_syncobj_free);
>>>>>>-static int drm_syncobj_create(struct drm_file *file_private,
>>>>>> - u32 *handle, uint32_t flags)
>>>>>
>>>>> You can add a new parameter for passing dma fence, then in patch3, you
>>>>> can
>>>>> directly use it for AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ.
>>>>>
>>>>> otherwise the set looks good to me.
>>>>
>>>> Sorry I just pushed this.
>>>
>>> Actually, you commented on a deleted line. The function already has
>>> dma_fence among the parameters.
>>>
>>> Marek
>>
>>


[ANNOUNCE] libdrm 2.4.84

2017-10-12 Thread Marek Olšák

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


libdrm 2.4.84 has been released.


Alex Deucher (1):
  tests/amdgpu: add missing header to SOURCES

Andrey Grodzovsky (1):
  amdgpu: Add deadlock detection test suit.

Anuj Phogat (1):
  intel: Change a KBL pci id to GT2 from GT1.5

Christian König (1):
  amdgpu: make userptr unit test more interesting

Dave Airlie (1):
  headers: sync syncobj ioctl defines.

Eric Engestrom (1):
  freedreno/kgsl: fix pointer-to-int cast

James Zhu (2):
  tests/amdgpu: add new uvd enc support check
  tests/amdgpu: fix uvd enc data corruption issue

Jan Vesely (1):
  amdgpu: Do not write beyond allocated memory when parsing ids

Marek Olšák (7):
  amdgpu: print error messages when amdgpu_device_initialize is failing
  include: sync drm.h and amdgpu_drm.h with airlied/drm-next
  amdgpu: add sync_file import and export functions
  drm: add drmSyncobjWait wrapper
  amdgpu: add amdgpu_cs_syncobj_wait
  amdgpu: add amdgpu_cs_fence_to_handle
  configure.ac: bump version to 2.4.84

Philipp Zabel (1):
  etnaviv: prevent deadlock in error path

Rob Herring (2):
  Android: move libraries to /vendor
  headers: sync DRM_MODE_ROTATE/REFLECT defines from kernel v4.14-rc1

git tag: libdrm-2.4.84

https://dri.freedesktop.org/libdrm/libdrm-2.4.84.tar.bz2
MD5:  35b9544bc2ad864acd1abaa1a2b99092  libdrm-2.4.84.tar.bz2
SHA1: 3a8835aaef89648757593f8de9eff95990dd  libdrm-2.4.84.tar.bz2
SHA256: 7ae9c24d91139ac9a2cdee06fe46dbe1c401a1eda1c0bd2a6d1ecf72f479e0aa  
libdrm-2.4.84.tar.bz2
SHA512: 
860ebc5fa934edee97e9e7e13aaa2f2e70a68b946f4f3893cd7f93b8296c10b3cd4ce4c23b1676eefe375286e6e2292b96e917d7976f65c61da3fa661e5e641a
  libdrm-2.4.84.tar.bz2
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.84.tar.bz2.sig

https://dri.freedesktop.org/libdrm/libdrm-2.4.84.tar.gz
MD5:  5e676f903bdb245878383334dca4cc33  libdrm-2.4.84.tar.gz
SHA1: e406522d41c2dc5f3ac9643f885a349e039ffeb6  libdrm-2.4.84.tar.gz
SHA256: ca4d3a4705be2ec289f9df7cfa871f5e02fa43d0f653622c9d9d428959143e78  
libdrm-2.4.84.tar.gz
SHA512: 
efbe352e1bbf3bb1962ad8e1c0a2774e5683db9cd0d91e42b844ddc74089d131f305bc6977c0734690c88af11f5d6777dbfd4bbaab9778fa550dc1b6f52e5cb6
  libdrm-2.4.84.tar.gz
PGP:  https://dri.freedesktop.org/libdrm/libdrm-2.4.84.tar.gz.sig

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJZ37tJAAoJEP3RXVrO8PKxmNoH/jdcdZzvpXtWB6eWC2SEXGbl
WvhoIrNf94cAz/1JvN6nzXjo5oewTTrHsHTd7XYxHcFt13jHTB5uCX8gnCotvY+d
yJesW8liHg4wx1FyL8syAufFlWtzyxK74lQj6R4DHvVeHMnUpc9+goDVZDPkHj2R
/S4u5qbLC4AJeATyDsQG07kPwhblrQpm7QLns4caQsJssrmhboM5w4IkjViHUR1j
7ju5MW1eyAAdWHdhXRgFk7lIO+nuyZXoxS6TjMlQmYichBM261PJldUjo5Zi4Aal
oH/QuDYvoFq0F5E4f/JIliUciU/w4FFSOtheWW2AztzrAeUGyTl1D413WPI4ZUQ=
=o07z
-----END PGP SIGNATURE-----


Re: [PATCH] drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2

2017-10-12 Thread Marek Olšák
On Thu, Sep 21, 2017 at 4:38 PM, Andres Rodriguez  wrote:
> Hi Christian,
>
> The reference radv patches are on the list. The basic idea is to only set
> the explicit sync flag for buffers allocated for dri usage.

Did you mean "only set the explicit sync flag for buffers NOT
allocated for dri usage"?

Marek


Re: [PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-09-30 Thread Marek Olšák
The idea sounds good.

Marek

On Sat, Sep 30, 2017 at 3:55 AM, Chunming Zhou <zhou...@amd.com> wrote:
> What I mean is like the attached; I reverted part of yours.
>
> Regards,
>
> David zhou
>
>
>
> On 2017-09-29 22:15, Marek Olšák wrote:
>>
>> On Fri, Sep 29, 2017 at 4:13 PM, Marek Olšák <mar...@gmail.com> wrote:
>>>
>>> On Fri, Sep 29, 2017 at 4:44 AM, Chunming Zhou <zhou...@amd.com> wrote:
>>>>
>>>>
>>>> On 2017-09-13 04:42, Marek Olšák wrote:
>>>>>
>>>>> From: Marek Olšák <marek.ol...@amd.com>
>>>>>
>>>>> For amdgpu.
>>>>>
>>>>> drm_syncobj_create is renamed to drm_syncobj_create_as_handle, and new
>>>>> helpers drm_syncobj_create and drm_syncobj_get_handle are added.
>>>>>
>>>>> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
>>>>> ---
>>>>>drivers/gpu/drm/drm_syncobj.c | 49
>>>>> +++
>>>>>include/drm/drm_syncobj.h |  4 
>>>>>2 files changed, 49 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>> index 0422b8c..0bb1741 100644
>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>> @@ -262,8 +262,14 @@ void drm_syncobj_free(struct kref *kref)
>>>>>}
>>>>>EXPORT_SYMBOL(drm_syncobj_free);
>>>>>-static int drm_syncobj_create(struct drm_file *file_private,
>>>>> - u32 *handle, uint32_t flags)
>>>>
>>>> You can add a new parameter for passing dma fence, then in patch3, you
>>>> can
>>>> directly use it for AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ.
>>>>
>>>> otherwise the set looks good to me.
>>>
>>> Sorry I just pushed this.
>>
>> Actually, you commented on a deleted line. The function already has
>> dma_fence among the parameters.
>>
>> Marek
>
>


Re: [PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-09-29 Thread Marek Olšák
On Fri, Sep 29, 2017 at 4:13 PM, Marek Olšák <mar...@gmail.com> wrote:
> On Fri, Sep 29, 2017 at 4:44 AM, Chunming Zhou <zhou...@amd.com> wrote:
>>
>>
>> On 2017-09-13 04:42, Marek Olšák wrote:
>>>
>>> From: Marek Olšák <marek.ol...@amd.com>
>>>
>>> For amdgpu.
>>>
>>> drm_syncobj_create is renamed to drm_syncobj_create_as_handle, and new
>>> helpers drm_syncobj_create and drm_syncobj_get_handle are added.
>>>
>>> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
>>> ---
>>>   drivers/gpu/drm/drm_syncobj.c | 49
>>> +++
>>>   include/drm/drm_syncobj.h |  4 
>>>   2 files changed, 49 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>> index 0422b8c..0bb1741 100644
>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>> @@ -262,8 +262,14 @@ void drm_syncobj_free(struct kref *kref)
>>>   }
>>>   EXPORT_SYMBOL(drm_syncobj_free);
>>>   -static int drm_syncobj_create(struct drm_file *file_private,
>>> - u32 *handle, uint32_t flags)
>>
>> You can add a new parameter for passing dma fence, then in patch3, you can
>> directly use it for AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ.
>>
>> otherwise the set looks good to me.
>
> Sorry I just pushed this.

Actually, you commented on a deleted line. The function already has
dma_fence among the parameters.

Marek
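
For reference, the signatures after the split, sketched from the merged
kernel code (this is what "already has dma_fence among the parameters"
refers to):

/* the renamed ioctl-facing path: */
static int drm_syncobj_create_as_handle(struct drm_file *file_private,
                                        u32 *handle, uint32_t flags);

/* the new in-kernel helper, which can attach a fence at creation: */
int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
                       struct dma_fence *fence);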


Re: [PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-09-29 Thread Marek Olšák
On Fri, Sep 29, 2017 at 4:44 AM, Chunming Zhou <zhou...@amd.com> wrote:
>
>
> On 2017-09-13 04:42, Marek Olšák wrote:
>>
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> For amdgpu.
>>
>> drm_syncobj_create is renamed to drm_syncobj_create_as_handle, and new
>> helpers drm_syncobj_create and drm_syncobj_get_handle are added.
>>
>> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
>> ---
>>   drivers/gpu/drm/drm_syncobj.c | 49
>> +++
>>   include/drm/drm_syncobj.h |  4 
>>   2 files changed, 49 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index 0422b8c..0bb1741 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -262,8 +262,14 @@ void drm_syncobj_free(struct kref *kref)
>>   }
>>   EXPORT_SYMBOL(drm_syncobj_free);
>>   -static int drm_syncobj_create(struct drm_file *file_private,
>> - u32 *handle, uint32_t flags)
>
> You can add a new parameter for passing dma fence, then in patch3, you can
> directly use it for AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ.
>
> otherwise the set looks good to me.

Sorry I just pushed this.

Marek


Re: [PATCH libdrm 2/4] drm: add drmSyncobjWait wrapper

2017-09-29 Thread Marek Olšák
On Fri, Sep 29, 2017 at 4:12 AM, Chunming Zhou <zhou...@amd.com> wrote:
>
>
> On 2017-09-29 06:10, Marek Olšák wrote:
>>
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> ---
>>   include/drm/drm.h | 24 
>>   xf86drm.c | 22 ++
>>   xf86drm.h |  3 +++
>>   3 files changed, 49 insertions(+)
>>
>> diff --git a/include/drm/drm.h b/include/drm/drm.h
>> index bf3674a..4da1667 100644
>> --- a/include/drm/drm.h
>> +++ b/include/drm/drm.h
>> @@ -687,38 +687,57 @@ struct drm_prime_handle {
>> /** Flags.. only applicable for handle->fd */
>> __u32 flags;
>> /** Returned dmabuf file descriptor */
>> __s32 fd;
>>   };
>> struct drm_syncobj_create {
>> __u32 handle;
>> +#define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0)
>> __u32 flags;
>>   };
>> struct drm_syncobj_destroy {
>> __u32 handle;
>> __u32 pad;
>>   };
>> #define DRM_SYNCOBJ_FD_TO_HANDLE_FLAGS_IMPORT_SYNC_FILE (1 << 0)
>>   #define DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE (1 << 0)
>
> Typo for '(1 << 1)' ?

No, this was copied from kernel headers.

>
> With that fixed, the set is Reviewed-by: Chunming Zhou <david1.z...@amd.com>

Thanks.

Marek
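
A sketch of why both flags can be (1 << 0): each belongs to a different
ioctl's flag namespace, so they are never combined in one call.
Schematically (fd, syncobj_handle and sync_file_fd assumed in scope):

struct drm_syncobj_handle args;

/* import: bring a sync_file fd into an existing syncobj */
memset(&args, 0, sizeof(args));
args.handle = syncobj_handle;
args.fd     = sync_file_fd;
args.flags  = DRM_SYNCOBJ_FD_TO_HANDLE_FLAGS_IMPORT_SYNC_FILE;
drmIoctl(fd, DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, &args);

/* export: turn a syncobj into a new sync_file fd */
memset(&args, 0, sizeof(args));
args.handle = syncobj_handle;
args.fd     = -1;
args.flags  = DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE;
drmIoctl(fd, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, &args);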


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-29 Thread Marek Olšák
On Fri, Sep 29, 2017 at 1:42 AM, Dave Airlie <airl...@gmail.com> wrote:
> On 29 September 2017 at 06:41, Marek Olšák <mar...@gmail.com> wrote:
>> Can I get Rb for this series?
>>
>
> For the series,
>
> Reviewed-by: Dave Airlie <airl...@redhat.com>
>
> Alex, please merge the two drm core precursors with patch 3.

Alex, this is for drm-next, where I can't push.

Thanks,
Marek


[PATCH libdrm 2/4] drm: add drmSyncobjWait wrapper

2017-09-28 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

---
 include/drm/drm.h | 24 ++++++++++++++++++++++++
 xf86drm.c         | 22 ++++++++++++++++++++++
 xf86drm.h |  3 +++
 3 files changed, 49 insertions(+)

diff --git a/include/drm/drm.h b/include/drm/drm.h
index bf3674a..4da1667 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -687,38 +687,57 @@ struct drm_prime_handle {
 
/** Flags.. only applicable for handle->fd */
__u32 flags;
 
/** Returned dmabuf file descriptor */
__s32 fd;
 };
 
 struct drm_syncobj_create {
__u32 handle;
+#define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0)
__u32 flags;
 };
 
 struct drm_syncobj_destroy {
__u32 handle;
__u32 pad;
 };
 
 #define DRM_SYNCOBJ_FD_TO_HANDLE_FLAGS_IMPORT_SYNC_FILE (1 << 0)
 #define DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE (1 << 0)
 struct drm_syncobj_handle {
__u32 handle;
__u32 flags;
 
__s32 fd;
__u32 pad;
 };
 
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
+struct drm_syncobj_wait {
+   __u64 handles;
+   /* absolute timeout */
+   __s64 timeout_nsec;
+   __u32 count_handles;
+   __u32 flags;
+   __u32 first_signaled; /* only valid when not waiting all */
+   __u32 pad;
+};
+
+struct drm_syncobj_array {
+   __u64 handles;
+   __u32 count_handles;
+   __u32 pad;
+};
+
 #if defined(__cplusplus)
 }
 #endif
 
 #include "drm_mode.h"
 
 #if defined(__cplusplus)
 extern "C" {
 #endif
 
@@ -827,20 +846,23 @@ extern "C" {
 #define DRM_IOCTL_MODE_OBJ_SETPROPERTY DRM_IOWR(0xBA, struct drm_mode_obj_set_property)
 #define DRM_IOCTL_MODE_CURSOR2         DRM_IOWR(0xBB, struct drm_mode_cursor2)
 #define DRM_IOCTL_MODE_ATOMIC          DRM_IOWR(0xBC, struct drm_mode_atomic)
 #define DRM_IOCTL_MODE_CREATEPROPBLOB  DRM_IOWR(0xBD, struct drm_mode_create_blob)
 #define DRM_IOCTL_MODE_DESTROYPROPBLOB DRM_IOWR(0xBE, struct drm_mode_destroy_blob)
 
 #define DRM_IOCTL_SYNCOBJ_CREATE       DRM_IOWR(0xBF, struct drm_syncobj_create)
 #define DRM_IOCTL_SYNCOBJ_DESTROY      DRM_IOWR(0xC0, struct drm_syncobj_destroy)
 #define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD DRM_IOWR(0xC1, struct drm_syncobj_handle)
 #define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE DRM_IOWR(0xC2, struct drm_syncobj_handle)
+#define DRM_IOCTL_SYNCOBJ_WAIT         DRM_IOWR(0xC3, struct drm_syncobj_wait)
+#define DRM_IOCTL_SYNCOBJ_RESET        DRM_IOWR(0xC4, struct drm_syncobj_array)
+#define DRM_IOCTL_SYNCOBJ_SIGNAL       DRM_IOWR(0xC5, struct drm_syncobj_array)
 
 /**
  * Device specific ioctls should only be in their respective headers
  * The device specific ioctl range is from 0x40 to 0x9f.
  * Generic IOCTLS restart at 0xA0.
  *
  * \sa drmCommandNone(), drmCommandRead(), drmCommandWrite(), and
  * drmCommandReadWrite().
  */
 #define DRM_COMMAND_BASE               0x40
@@ -869,20 +891,21 @@ struct drm_event {
 struct drm_event_vblank {
struct drm_event base;
__u64 user_data;
__u32 tv_sec;
__u32 tv_usec;
__u32 sequence;
__u32 crtc_id; /* 0 on older kernels that do not support this */
 };
 
 /* typedef area */
+#ifndef __KERNEL__
 typedef struct drm_clip_rect drm_clip_rect_t;
 typedef struct drm_drawable_info drm_drawable_info_t;
 typedef struct drm_tex_region drm_tex_region_t;
 typedef struct drm_hw_lock drm_hw_lock_t;
 typedef struct drm_version drm_version_t;
 typedef struct drm_unique drm_unique_t;
 typedef struct drm_list drm_list_t;
 typedef struct drm_block drm_block_t;
 typedef struct drm_control drm_control_t;
 typedef enum drm_map_type drm_map_type_t;
@@ -910,16 +933,17 @@ typedef struct drm_draw drm_draw_t;
 typedef struct drm_update_draw drm_update_draw_t;
 typedef struct drm_auth drm_auth_t;
 typedef struct drm_irq_busid drm_irq_busid_t;
 typedef enum drm_vblank_seq_type drm_vblank_seq_type_t;
 
 typedef struct drm_agp_buffer drm_agp_buffer_t;
 typedef struct drm_agp_binding drm_agp_binding_t;
 typedef struct drm_agp_info drm_agp_info_t;
 typedef struct drm_scatter_gather drm_scatter_gather_t;
 typedef struct drm_set_version drm_set_version_t;
+#endif
 
 #if defined(__cplusplus)
 }
 #endif
 
 #endif
diff --git a/xf86drm.c b/xf86drm.c
index 6ea0112..8a32717 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4218,10 +4218,32 @@ int drmSyncobjExportSyncFile(int fd, uint32_t handle, int *sync_file_fd)
 memclear(args);
 args.fd = -1;
 args.handle = handle;
 args.flags = DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE;
 ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, &args);
 if (ret)
return ret;
 *sync_file_fd = args.fd;
 return 0;
 }
+
+int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled)
+{
+   struct drm_syncobj_wait args;
+ 

[PATCH libdrm 3/4] amdgpu: add amdgpu_cs_syncobj_wait

2017-09-28 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

v2: update amdgpu-symbol-check
---
 amdgpu/amdgpu-symbol-check |  1 +
 amdgpu/amdgpu.h            | 20 ++++++++++++++++++++
 amdgpu/amdgpu_cs.c         | 12 ++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index bc9ed3f..7ecfc98 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -35,20 +35,21 @@ amdgpu_cs_destroy_semaphore
 amdgpu_cs_destroy_syncobj
 amdgpu_cs_export_syncobj
 amdgpu_cs_import_syncobj
 amdgpu_cs_query_fence_status
 amdgpu_cs_query_reset_state
 amdgpu_cs_signal_semaphore
 amdgpu_cs_submit
 amdgpu_cs_submit_raw
 amdgpu_cs_syncobj_export_sync_file
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
 amdgpu_device_deinitialize
 amdgpu_device_initialize
 amdgpu_get_marketing_name
 amdgpu_query_buffer_size_alignment
 amdgpu_query_crtc_from_id
 amdgpu_query_firmware_version
 amdgpu_query_gds_info
 amdgpu_query_gpu_info
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index b44b9b6..979acfc 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1347,20 +1347,40 @@ int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
  * \param   syncobj - \c [in] sync object handle
  *
  * \return   0 on success\n
  *  <0 - Negative POSIX Error code
  *
 */
 int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
  uint32_t syncobj);
 
 /**
+ *  Wait for one or all sync objects to signal.
+ *
+ * \param   dev - \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
+  uint32_t *handles, unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled);
+
+/**
  *  Export kernel sync object to shareable fd.
  *
  * \param   dev   - \c [in] device handle
  * \param   syncobj- \c [in] sync object handle
  * \param   shared_fd  - \c [out] shared file descriptor.
  *
  * \return   0 on success\n
  *  <0 - Negative POSIX Error code
  *
 */
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 4a05536..2cde7bf 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -608,20 +608,32 @@ int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
 
 int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
  uint32_t handle)
 {
if (NULL == dev)
return -EINVAL;
 
return drmSyncobjDestroy(dev->fd, handle);
 }
 
+int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
+  uint32_t *handles, unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjWait(dev->fd, handles, num_handles, timeout_nsec,
+ flags, first_signaled);
+}
+
 int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
 uint32_t handle,
 int *shared_fd)
 {
if (NULL == dev)
return -EINVAL;
 
return drmSyncobjHandleToFD(dev->fd, handle, shared_fd);
 }
 
-- 
2.7.4
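
A usage sketch for the new wrapper (editor's example; note the timeout is
absolute, per the header comment above):

#include <amdgpu.h>
#include <drm.h>

/* Wait until every handle in handles[] signals, or until the absolute
 * deadline in nanoseconds passes. */
static int wait_for_all(amdgpu_device_handle dev, uint32_t *handles,
                        unsigned num_handles, int64_t abs_deadline_ns)
{
    uint32_t first_signaled;
    return amdgpu_cs_syncobj_wait(dev, handles, num_handles,
                                  abs_deadline_ns,
                                  DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL,
                                  &first_signaled);
}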



[PATCH libdrm 4/4] amdgpu: add amdgpu_cs_fence_to_handle

2017-09-28 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

v2: update amdgpu-symbol-check
---
 amdgpu/amdgpu-symbol-check |  1 +
 amdgpu/amdgpu.h| 14 ++
 amdgpu/amdgpu_cs.c | 22 ++
 include/drm/amdgpu_drm.h   | 21 +
 4 files changed, 58 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 7ecfc98..d9f89ef 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -27,20 +27,21 @@ amdgpu_bo_wait_for_idle
 amdgpu_create_bo_from_user_mem
 amdgpu_cs_chunk_fence_info_to_data
 amdgpu_cs_chunk_fence_to_dep
 amdgpu_cs_create_semaphore
 amdgpu_cs_create_syncobj
 amdgpu_cs_ctx_create
 amdgpu_cs_ctx_free
 amdgpu_cs_destroy_semaphore
 amdgpu_cs_destroy_syncobj
 amdgpu_cs_export_syncobj
+amdgpu_cs_fence_to_handle
 amdgpu_cs_import_syncobj
 amdgpu_cs_query_fence_status
 amdgpu_cs_query_reset_state
 amdgpu_cs_signal_semaphore
 amdgpu_cs_submit
 amdgpu_cs_submit_raw
 amdgpu_cs_syncobj_export_sync_file
 amdgpu_cs_syncobj_import_sync_file
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 979acfc..23cde10 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1426,20 +1426,34 @@ int amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
  *
  * \return   0 on success\n
  *  <0 - Negative POSIX Error code
  *
  */
 int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
   uint32_t syncobj,
   int sync_file_fd);
 
 /**
+ * Export an amdgpu fence as a handle (syncobj or fd).
+ *
+ * \param what AMDGPU_FENCE_TO_HANDLE_GET_{SYNCOBJ, FD}
+ * \param out_handle   returned handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ */
+int amdgpu_cs_fence_to_handle(amdgpu_device_handle dev,
+ struct amdgpu_cs_fence *fence,
+ uint32_t what,
+ uint32_t *out_handle);
+
+/**
  *  Submit raw command submission to kernel
  *
  * \param   dev   - \c [in] device handle
  * \param   context- \c [in] context handle for context id
  * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
  * \param   num_chunks - \c [in] number of CS chunks to submit
  * \param   chunks - \c [in] array of CS chunks
  * \param   seq_no - \c [out] output sequence number for submission.
  *
  * \return   0 on success\n
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 2cde7bf..9577d5c 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -706,10 +706,32 @@ void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
 
 void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
  struct drm_amdgpu_cs_chunk_dep *dep)
 {
dep->ip_type = fence->ip_type;
dep->ip_instance = fence->ip_instance;
dep->ring = fence->ring;
dep->ctx_id = fence->context->id;
dep->handle = fence->fence;
 }
+
+int amdgpu_cs_fence_to_handle(amdgpu_device_handle dev,
+ struct amdgpu_cs_fence *fence,
+ uint32_t what,
+ uint32_t *out_handle)
+{
+   union drm_amdgpu_fence_to_handle fth = {0};
+   int r;
+
+   fth.in.fence.ctx_id = fence->context->id;
+   fth.in.fence.ip_type = fence->ip_type;
+   fth.in.fence.ip_instance = fence->ip_instance;
+   fth.in.fence.ring = fence->ring;
+   fth.in.fence.seq_no = fence->fence;
+   fth.in.what = what;
+
+   r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_FENCE_TO_HANDLE,
+   &fth, sizeof(fth));
+   if (r == 0)
+   *out_handle = fth.out.handle;
+   return r;
+}
diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index d9aa4a3..00f1b81 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -45,55 +45,59 @@ extern "C" {
 #define DRM_AMDGPU_CS  0x04
 #define DRM_AMDGPU_INFO0x05
 #define DRM_AMDGPU_GEM_METADATA0x06
 #define DRM_AMDGPU_GEM_WAIT_IDLE   0x07
 #define DRM_AMDGPU_GEM_VA  0x08
 #define DRM_AMDGPU_WAIT_CS 0x09
 #define DRM_AMDGPU_GEM_OP  0x10
 #define DRM_AMDGPU_GEM_USERPTR 0x11
 #define DRM_AMDGPU_WAIT_FENCES 0x12
 #define DRM_AMDGPU_VM  0x13
+#define DRM_AMDGPU_FENCE_TO_HANDLE 0x15
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
 #define DRM_IOCTL_AMDGPU_CTX           DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_CTX, union drm_amdgpu_ctx)
 #define DRM_IOCTL
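
As a usage sketch (not part of the patch; the helper name is illustrative):
converting a submission's seq_no fence into a shareable syncobj with the new
wrapper. The context handle and seq_no are assumed to come from an earlier
amdgpu_cs_submit() on GFX ring 0.

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

static int fence_to_syncobj(amdgpu_device_handle dev,
                            amdgpu_context_handle ctx, uint64_t seq_no,
                            uint32_t *syncobj)
{
    /* Describe the seq_no fence the same way the wait ioctls do. */
    struct amdgpu_cs_fence fence = {
        .context = ctx,
        .ip_type = AMDGPU_HW_IP_GFX,
        .ip_instance = 0,
        .ring = 0,
        .fence = seq_no,
    };

    return amdgpu_cs_fence_to_handle(dev, &fence,
                                     AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ,
                                     syncobj);
}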

[PATCH libdrm 1/4] amdgpu: add sync_file import and export functions

2017-09-28 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

v2: update amdgpu-symbol-check
---
 amdgpu/amdgpu-symbol-check |  2 ++
 amdgpu/amdgpu.h| 30 ++
 amdgpu/amdgpu_cs.c | 20 
 3 files changed, 52 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index c5b85b5..bc9ed3f 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -33,20 +33,22 @@ amdgpu_cs_ctx_create
 amdgpu_cs_ctx_free
 amdgpu_cs_destroy_semaphore
 amdgpu_cs_destroy_syncobj
 amdgpu_cs_export_syncobj
 amdgpu_cs_import_syncobj
 amdgpu_cs_query_fence_status
 amdgpu_cs_query_reset_state
 amdgpu_cs_signal_semaphore
 amdgpu_cs_submit
 amdgpu_cs_submit_raw
+amdgpu_cs_syncobj_export_sync_file
+amdgpu_cs_syncobj_import_sync_file
 amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
 amdgpu_device_deinitialize
 amdgpu_device_initialize
 amdgpu_get_marketing_name
 amdgpu_query_buffer_size_alignment
 amdgpu_query_crtc_from_id
 amdgpu_query_firmware_version
 amdgpu_query_gds_info
 amdgpu_query_gpu_info
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 238b1aa..b44b9b6 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1376,20 +1376,50 @@ int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
  *
  * \return   0 on success\n
  *  <0 - Negative POSIX Error code
  *
 */
 int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 int shared_fd,
 uint32_t *syncobj);
 
 /**
+ *  Export kernel sync object to a sync_file.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   sync_file_fd - \c [out] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int *sync_file_fd);
+
+/**
+ *  Import kernel sync object from a sync_file.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   sync_file_fd - \c [in] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int sync_file_fd);
+
+/**
  *  Submit raw command submission to kernel
  *
  * \param   dev   - \c [in] device handle
  * \param   context- \c [in] context handle for context id
  * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
  * \param   num_chunks - \c [in] number of CS chunks to submit
  * \param   chunks - \c [in] array of CS chunks
  * \param   seq_no - \c [out] output sequence number for submission.
  *
  * \return   0 on success\n
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index dfba875..4a05536 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -628,20 +628,40 @@ int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
 int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 int shared_fd,
 uint32_t *handle)
 {
if (NULL == dev)
return -EINVAL;
 
return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
 
+int amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int *sync_file_fd)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjExportSyncFile(dev->fd, syncobj, sync_file_fd);
+}
+
+int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int sync_file_fd)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjImportSyncFile(dev->fd, syncobj, sync_file_fd);
+}
+
 int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
 amdgpu_context_handle context,
 amdgpu_bo_list_handle bo_list_handle,
 int num_chunks,
 struct drm_amdgpu_cs_chunk *chunks,
 uint64_t *seq_no)
 {
union drm_amdgpu_cs cs = {0};
uint64_t *chunk_array;
int i, r;
-- 
2.7.4
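
A minimal usage sketch (not part of the patch; the helper name is
illustrative), assuming "src" and "dst" were created with
amdgpu_cs_create_syncobj(): export the fence payload of one sync object as a
sync_file fd and import it into another, e.g. to pass it across a process or
protocol boundary.

#include <stdint.h>
#include <unistd.h>
#include <amdgpu.h>

static int copy_syncobj_payload(amdgpu_device_handle dev,
                                uint32_t src, uint32_t dst)
{
    int fd, r;

    r = amdgpu_cs_syncobj_export_sync_file(dev, src, &fd);
    if (r)
        return r;

    r = amdgpu_cs_syncobj_import_sync_file(dev, dst, fd);
    close(fd); /* the syncobj keeps its own reference to the fence */
    return r;
}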



Re: [PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-09-28 Thread Marek Olšák
On Thu, Sep 14, 2017 at 10:01 AM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> On 14 September 2017 at 08:56, Emil Velikov <emil.l.veli...@gmail.com> wrote:
>> Hi Marek,
>>
>> On 12 September 2017 at 21:42, Marek Olšák <mar...@gmail.com> wrote:
>>
>>>  include/drm/drm_syncobj.h |  4 
>> Please sync the header as described in
>> https://cgit.freedesktop.org/mesa/drm/tree/include/drm/README#n72
>>
>> Tl;DR: cd .../linux; make headers_install; cp ... .../drm/include/drm;
>> cd .../drm; git commit -sm " $branch $sha1..."
>>
> Seems like I've replied to the wrong patch - silly me.
> This was meant for the libdrm ones - suggestion still applies though.

I actually did that, but I'm on amd-staging-drm-next.

Marek


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-28 Thread Marek Olšák
Can I get Rb for this series?

Thanks,
Marek

On Tue, Sep 12, 2017 at 10:42 PM, Marek Olšák <mar...@gmail.com> wrote:
> From: Marek Olšák <marek.ol...@amd.com>
>
> for being able to convert an amdgpu fence into one of the handles.
> Mesa will use this.
>
> Signed-off-by: Marek Olšák <marek.ol...@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 61 
> +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  1 +
>  include/uapi/drm/amdgpu_drm.h   | 16 +
>  5 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index b5c8b90..c15fa93 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1308,6 +1308,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void 
> *data,
>  int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp);
>  int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file 
> *filp);
> +int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
> +   struct drm_file *filp);
>  int amdgpu_cs_wait_ioctl(struct drm_device *dev, void *data, struct drm_file 
> *filp);
>  int amdgpu_cs_wait_fences_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 7cb8a59..6dd719c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -25,6 +25,7 @@
>   *Jerome Glisse <gli...@freedesktop.org>
>   */
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1311,6 +1312,66 @@ static struct dma_fence *amdgpu_cs_get_fence(struct 
> amdgpu_device *adev,
> return fence;
>  }
>
> +int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
> +   struct drm_file *filp)
> +{
> +   struct amdgpu_device *adev = dev->dev_private;
> +   struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +   union drm_amdgpu_fence_to_handle *info = data;
> +   struct dma_fence *fence;
> +   struct drm_syncobj *syncobj;
> +   struct sync_file *sync_file;
> +   int fd, r;
> +
> +   if (amdgpu_kms_vram_lost(adev, fpriv))
> +   return -ENODEV;
> +
> +   fence = amdgpu_cs_get_fence(adev, filp, &info->in.fence);
> +   if (IS_ERR(fence))
> +   return PTR_ERR(fence);
> +
> +   switch (info->in.what) {
> +   case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ:
> +   r = drm_syncobj_create(&syncobj, 0, fence);
> +   dma_fence_put(fence);
> +   if (r)
> +   return r;
> +   r = drm_syncobj_get_handle(filp, syncobj, &info->out.handle);
> +   drm_syncobj_put(syncobj);
> +   return r;
> +
> +   case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD:
> +   r = drm_syncobj_create(&syncobj, 0, fence);
> +   dma_fence_put(fence);
> +   if (r)
> +   return r;
> +   r = drm_syncobj_get_fd(syncobj, (int*)&info->out.handle);
> +   drm_syncobj_put(syncobj);
> +   return r;
> +
> +   case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
> +   fd = get_unused_fd_flags(O_CLOEXEC);
> +   if (fd < 0) {
> +   dma_fence_put(fence);
> +   return fd;
> +   }
> +
> +   sync_file = sync_file_create(fence);
> +   dma_fence_put(fence);
> +   if (!sync_file) {
> +   put_unused_fd(fd);
> +   return -ENOMEM;
> +   }
> +
> +   fd_install(fd, sync_file->file);
> +   info->out.handle = fd;
> +   return 0;
> +
> +   default:
> +   return -EINVAL;
> +   }
> +}
> +
>  /**
>   * amdgpu_cs_wait_all_fence - wait on all fences to signal
>   *
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d01aca6..1e38411 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -70,9 +70,10 @@
>   * - 3.18.0 - Export gpu always on cu bitmap
>   * - 3.19.0 - Add support for UVD MJPEG decode
>   * - 3.20.0 - Add support fo

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-09-28 Thread Marek Olšák
Thanks for this series. I can finally finish piglit on VI.

Marek

On Thu, Sep 28, 2017 at 4:55 PM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> Highly concurrent Piglit runs can trigger a race condition where a pending
> SDMA job on a buffer object is never executed because the corresponding
> process is killed (perhaps due to a crash). Since the job's fences were
> never signaled, the buffer object was effectively leaked. Worse, the
> buffer was stuck wherever it happened to be at the time, possibly in VRAM.
>
> The symptom was user space processes stuck in interruptible waits with
> kernel stacks like:
>
> [] dma_fence_default_wait+0x112/0x250
> [] dma_fence_wait_timeout+0x39/0xf0
> [] reservation_object_wait_timeout_rcu+0x1c2/0x300
> [] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
> [] ttm_mem_evict_first+0xba/0x1a0 [ttm]
> [] ttm_bo_mem_space+0x341/0x4c0 [ttm]
> [] ttm_bo_validate+0xd4/0x150 [ttm]
> [] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
> [] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
> [] amdgpu_bo_create+0xda/0x220 [amdgpu]
> [] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
> [] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
> [] drm_ioctl+0x1fa/0x480 [drm]
> [] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
> [] do_vfs_ioctl+0xa3/0x5f0
> [] SyS_ioctl+0x79/0x90
> [] entry_SYSCALL_64_fastpath+0x1e/0xad
> [] 0x
>
> Signed-off-by: Nicolai Hähnle 
> Acked-by: Christian König 
> ---
>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 54eb77cffd9b..32a99e980d78 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler 
> *sched,
> amd_sched_entity_is_idle(entity));
> amd_sched_rq_remove_entity(rq, entity);
> if (r) {
> struct amd_sched_job *job;
>
> /* Park the kernel for a moment to make sure it isn't 
> processing
>  * our enity.
>  */
> kthread_park(sched->thread);
> kthread_unpark(sched->thread);
> -   while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
> +   while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
> +   struct amd_sched_fence *s_fence = job->s_fence;
> +   amd_sched_fence_scheduled(s_fence);
> +   amd_sched_fence_finished(s_fence);
> +   dma_fence_put(&s_fence->finished);
> sched->ops->free_job(job);
> +   }
>
> }
> kfifo_free(&entity->job_queue);
>  }
>
>  static void amd_sched_entity_wakeup(struct dma_fence *f, struct dma_fence_cb 
> *cb)
>  {
> struct amd_sched_entity *entity =
> container_of(cb, struct amd_sched_entity, cb);
> entity->dependency = NULL;
> --
> 2.11.0
>


Re: [PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9

2017-09-18 Thread Marek Olšák
Yes, the UMD does it.

Marek

On Mon, Sep 18, 2017 at 11:18 AM, Christian König
 wrote:
> On 18.09.2017 at 08:11, Monk Liu wrote:
>>
>> Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
>> Signed-off-by: Monk Liu 
>
>
> I could be wrong, but wasn't the consensus that this should be done by the
> UMD?
>
> Marek, please comment.
>
> Christian.
>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 3306667..f201510 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -3499,6 +3499,17 @@ static void gfx_v9_0_ring_set_wptr_gfx(struct
>> amdgpu_ring *ring)
>> }
>>   }
>>   +static void gfx_v9_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
>> +{
>> +   amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
>> +   amdgpu_ring_write(ring, EVENT_TYPE(VS_PARTIAL_FLUSH) |
>> +   EVENT_INDEX(4));
>> +
>> +   amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
>> +   amdgpu_ring_write(ring, EVENT_TYPE(VGT_FLUSH) |
>> +   EVENT_INDEX(0));
>> +}
>> +
>>   static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
>>   {
>> u32 ref_and_mask, reg_mem_engine;
>> @@ -3530,6 +3541,9 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct
>> amdgpu_ring *ring)
>>   nbio_hf_reg->hdp_flush_req_offset,
>>   nbio_hf_reg->hdp_flush_done_offset,
>>   ref_and_mask, ref_and_mask, 0x20);
>> +
>> +   if (ring->funcs->type == AMDGPU_RING_TYPE_GFX)
>> +   gfx_v9_0_ring_emit_vgt_flush(ring);
>>   }
>> static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring
>> *ring)
>
>
>


Re: [PATCH] drm/amdgpu: revert tile table update for oland

2017-09-14 Thread Marek Olšák
Reviewed-by: Marek Olšák <marek.ol...@amd.com>

Marek

On Mon, Sep 11, 2017 at 5:43 PM, Jean Delvare <jdelv...@suse.de> wrote:
> Several users have complained that the tile table update broke Oland
> support. Despite several attempts to fix it, the root cause is still
> unknown at this point and no solution is available. As it is not
> acceptable to leave a known regression breaking a major functionality
> in the kernel for several releases, let's just reverse this
> optimization for now. It can be implemented again later if and only
> if the breakage is understood and fixed.
>
> As there were no complaints for Hainan so far, only the Oland part of
> the offending commit is reverted. Optimization is preserved on
> Hainan, so this commit isn't an actual revert of the original.
>
> This fixes bug #194761:
> https://bugzilla.kernel.org/show_bug.cgi?id=194761
>
> Signed-off-by: Jean Delvare <jdelv...@suse.de>
> Fixes: f8d9422ef80c ("drm/amdgpu: update tile table for oland/hainan")
> Cc: Flora Cui <flora@amd.com>
> Cc: Junwei Zhang <jerry.zh...@amd.com>
> Cc: Alex Deucher <alexander.deuc...@amd.com>
> Cc: Marek Olšák <mar...@gmail.com>
> ---
> This version of the fix is suitable for kernels v4.13 and up.
> I'm running it for some time now it works perfectly on my
> Radeon R5 240 (Dell OEM):
> 01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
> [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611]
>
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c |  189 
> +-
>  1 file changed, 188 insertions(+), 1 deletion(-)
>
> --- linux-4.13.orig/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c   2017-09-11 
> 17:33:30.103176910 +0200
> +++ linux-4.13/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c2017-09-11 
> 17:40:12.711316976 +0200
> @@ -636,7 +636,194 @@ static void gfx_v6_0_tiling_mode_table_i
> NUM_BANKS(ADDR_SURF_2_BANK);
> for (reg_offset = 0; reg_offset < num_tile_mode_states; 
> reg_offset++)
> WREG32(mmGB_TILE_MODE0 + reg_offset, 
> tilemode[reg_offset]);
> -   } else if (adev->asic_type == CHIP_OLAND || adev->asic_type == 
> CHIP_HAINAN) {
> +   } else if (adev->asic_type == CHIP_OLAND) {
> +   tilemode[0] =   MICRO_TILE_MODE(ADDR_SURF_DEPTH_MICRO_TILING) 
> |
> +   ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
> +   PIPE_CONFIG(ADDR_SURF_P4_8x16) |
> +   TILE_SPLIT(ADDR_SURF_TILE_SPLIT_64B) |
> +   NUM_BANKS(ADDR_SURF_16_BANK) |
> +   BANK_WIDTH(ADDR_SURF_BANK_WIDTH_1) |
> +   BANK_HEIGHT(ADDR_SURF_BANK_HEIGHT_4) |
> +   MACRO_TILE_ASPECT(ADDR_SURF_MACRO_ASPECT_4);
> +   tilemode[1] =   MICRO_TILE_MODE(ADDR_SURF_DEPTH_MICRO_TILING) 
> |
> +   ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
> +   PIPE_CONFIG(ADDR_SURF_P4_8x16) |
> +   TILE_SPLIT(ADDR_SURF_TILE_SPLIT_128B) |
> +   NUM_BANKS(ADDR_SURF_16_BANK) |
> +   BANK_WIDTH(ADDR_SURF_BANK_WIDTH_1) |
> +   BANK_HEIGHT(ADDR_SURF_BANK_HEIGHT_4) |
> +   MACRO_TILE_ASPECT(ADDR_SURF_MACRO_ASPECT_4);
> +   tilemode[2] =   MICRO_TILE_MODE(ADDR_SURF_DEPTH_MICRO_TILING) 
> |
> +   ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
> +   PIPE_CONFIG(ADDR_SURF_P4_8x16) |
> +   TILE_SPLIT(ADDR_SURF_TILE_SPLIT_256B) |
> +   NUM_BANKS(ADDR_SURF_16_BANK) |
> +   BANK_WIDTH(ADDR_SURF_BANK_WIDTH_1) |
> +   BANK_HEIGHT(ADDR_SURF_BANK_HEIGHT_4) |
> +   MACRO_TILE_ASPECT(ADDR_SURF_MACRO_ASPECT_4);
> +   tilemode[3] =   MICRO_TILE_MODE(ADDR_SURF_DEPTH_MICRO_TILING) 
> |
> +   ARRAY_MODE(ARRAY_2D_TILED_THIN1) |
> +   PIPE_CONFIG(ADDR_SURF_P4_8x16) |
> +   TILE_SPLIT(ADDR_SURF_TILE_SPLIT_128B) |
> +   NUM_BANKS(ADDR_SURF_16_BANK) |
> +   BANK_WIDTH(ADDR_SURF_BANK_WIDTH_1) |
> +   BANK_HEIGHT(ADDR_SURF_BANK_HEIGHT_4) |
> +   MACRO_TILE_ASPECT(ADDR_SURF_MACRO_ASPECT_4);
> +   tilemode[4] =   MICRO_TILE_MODE(ADDR_SURF_DEPTH_MICRO_TILING) 
> |
> 

Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-13 Thread Marek Olšák
On Wed, Sep 13, 2017 at 3:46 PM, Zhou, David(ChunMing)
 wrote:
> For Android using a Mesa instance, each EGL draw dequeues an Android buffer;
> after the draw, the buffer goes back to the Android buffer queue, but a
> syncfile fd needs to be appended. If getting a syncfile fd for every EGL
> draw always needs several syncobj ioctls, the I/O overhead isn't small. But
> if we directly return a syncfile from the EGL draw's CS, isn't that better?

You have a good point. I'd be OK with either approach, or even with
having both options in the kernel.

Marek


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-13 Thread Marek Olšák
On Wed, Sep 13, 2017 at 1:32 PM, Zhou, David(ChunMing)
 wrote:
> Could you describe how difficult it would be to directly use the CS syncfile
> fd in Mesa, compared with converting the CS seq to a syncfile fd via several
> syncobj ioctls?

It just simplifies things. Mesa primarily uses seq_no-based fences and
will continue to use them. We can't remove the seq_no fence code
because we have to keep Mesa compatible with older kernels.

The only possibilities are:
- Mesa gets both seq_no and sync_file from CS.
- Mesa only gets seq_no from CS.

I decided to take the simpler option. I don't know if there is a perf
difference between CS returning a sync_file and using a separate
ioctl, but it's probably insignificant since we already call 3 ioctls
per IB submission (BO list create+destroy, submit).
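
Concretely, that separate-ioctl path would look roughly like this on the
user space side (a sketch only, with an illustrative function name, assuming
the amdgpu_cs_fence_to_handle wrapper from the libdrm series and a
submission on GFX ring 0):

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

static int seqno_to_sync_file(amdgpu_device_handle dev,
                              amdgpu_context_handle ctx,
                              uint64_t seq_no, int *sync_file_fd)
{
    struct amdgpu_cs_fence fence = {
        .context = ctx,
        .ip_type = AMDGPU_HW_IP_GFX,
        .fence = seq_no,
    };
    uint32_t handle;
    int r;

    r = amdgpu_cs_fence_to_handle(dev, &fence,
                                  AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD,
                                  &handle);
    if (r == 0)
        *sync_file_fd = (int)handle; /* out.handle carries the fd */
    return r;
}

In other words, one extra ioctl per sync_file actually requested, not per
submission.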

Marek

>
> Regards,
> David Zhou
>
> Sent from my Smartisan Pro
>
> Marek Olšák wrote on 2017-09-13 at 6:11 PM:
>
> On Wed, Sep 13, 2017 at 5:03 AM, zhoucm1  wrote:
>> Hi Marek,
>>
>> You're doing the same thing as me; see my "introduce syncfile as fence
>> return" patch set, which makes things simpler: we just need to directly
>> return a syncfile fd to the UMD at CS time. The fence the UMD gets will
>> then always be a syncfile fd, the UMD doesn't need to construct
>> ip_type/ip_instance/ctx_id/ring any more, and the fd can also be passed
>> to dependencies and syncobjs.
>
> For simpler Mesa code, Mesa won't get a sync file from the CS ioctl.
>
> Marek


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-13 Thread Marek Olšák
On Wed, Sep 13, 2017 at 5:03 AM, zhoucm1  wrote:
> Hi Marek,
>
> You're doing the same thing as me; see my "introduce syncfile as fence
> return" patch set, which makes things simpler: we just need to directly
> return a syncfile fd to the UMD at CS time. The fence the UMD gets will then
> always be a syncfile fd, the UMD doesn't need to construct
> ip_type/ip_instance/ctx_id/ring any more, and the fd can also be passed to
> dependencies and syncobjs.

For simpler Mesa code, Mesa won't get a sync file from the CS ioctl.

Marek


[PATCH libdrm 3/4] amdgpu: add amdgpu_cs_syncobj_wait

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

---
 amdgpu/amdgpu.h| 20 
 amdgpu/amdgpu_cs.c | 12 
 2 files changed, 32 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index b44b9b6..979acfc 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1354,6 +1354,26 @@ int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
  uint32_t syncobj);
 
 /**
+ *  Wait for one or all sync objects to signal.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
+  uint32_t *handles, unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled);
+
+/**
  *  Export kernel sync object to shareable fd.
  *
  * \param   dev   - \c [in] device handle
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 4a05536..2cde7bf 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -615,6 +615,18 @@ int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
return drmSyncobjDestroy(dev->fd, handle);
 }
 
+int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
+  uint32_t *handles, unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjWait(dev->fd, handles, num_handles, timeout_nsec,
+ flags, first_signaled);
+}
+
 int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
 uint32_t handle,
 int *shared_fd)
-- 
2.7.4



[PATCH libdrm 4/4] amdgpu: add amdgpu_cs_fence_to_handle

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

---
 amdgpu/amdgpu.h  | 14 ++
 amdgpu/amdgpu_cs.c   | 22 ++
 include/drm/amdgpu_drm.h | 21 +
 3 files changed, 57 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 979acfc..23cde10 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1433,6 +1433,20 @@ int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
   int sync_file_fd);
 
 /**
+ * Export an amdgpu fence as a handle (syncobj or fd).
+ *
+ * \param what AMDGPU_FENCE_TO_HANDLE_GET_{SYNCOBJ, FD}
+ * \param out_handle   returned handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ */
+int amdgpu_cs_fence_to_handle(amdgpu_device_handle dev,
+ struct amdgpu_cs_fence *fence,
+ uint32_t what,
+ uint32_t *out_handle);
+
+/**
  *  Submit raw command submission to kernel
  *
  * \param   dev   - \c [in] device handle
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 2cde7bf..9577d5c 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -713,3 +713,25 @@ void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
dep->ctx_id = fence->context->id;
dep->handle = fence->fence;
 }
+
+int amdgpu_cs_fence_to_handle(amdgpu_device_handle dev,
+ struct amdgpu_cs_fence *fence,
+ uint32_t what,
+ uint32_t *out_handle)
+{
+   union drm_amdgpu_fence_to_handle fth = {0};
+   int r;
+
+   fth.in.fence.ctx_id = fence->context->id;
+   fth.in.fence.ip_type = fence->ip_type;
+   fth.in.fence.ip_instance = fence->ip_instance;
+   fth.in.fence.ring = fence->ring;
+   fth.in.fence.seq_no = fence->fence;
+   fth.in.what = what;
+
+   r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_FENCE_TO_HANDLE,
+   &fth, sizeof(fth));
+   if (r == 0)
+   *out_handle = fth.out.handle;
+   return r;
+}
diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index d9aa4a3..00f1b81 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -52,6 +52,7 @@ extern "C" {
 #define DRM_AMDGPU_GEM_USERPTR 0x11
 #define DRM_AMDGPU_WAIT_FENCES 0x12
 #define DRM_AMDGPU_VM  0x13
+#define DRM_AMDGPU_FENCE_TO_HANDLE 0x15
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE    DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP      DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -67,6 +68,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_GEM_USERPTR   DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_USERPTR, struct drm_amdgpu_gem_userptr)
 #define DRM_IOCTL_AMDGPU_WAIT_FENCES   DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_FENCES, union drm_amdgpu_wait_fences)
 #define DRM_IOCTL_AMDGPU_VM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_VM, union drm_amdgpu_vm)
+#define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
 
 #define AMDGPU_GEM_DOMAIN_CPU  0x1
 #define AMDGPU_GEM_DOMAIN_GTT  0x2
@@ -87,6 +89,8 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_SHADOW   (1 << 4)
 /* Flag that allocating the BO should use linear VRAM */
 #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS  (1 << 5)
+/* Flag that BO is always valid in this VM */
+#define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID  (1 << 6)
 
 struct drm_amdgpu_gem_create_in  {
/** the requested memory size */
@@ -513,6 +517,20 @@ struct drm_amdgpu_cs_chunk_sem {
__u32 handle;
 };
 
+#define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
+#define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD  1
+#define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD2
+
+union drm_amdgpu_fence_to_handle {
+   struct {
+   struct drm_amdgpu_fence fence;
+   __u32 what;
+   } in;
+   struct {
+   __u32 handle;
+   } out;
+};
+
 struct drm_amdgpu_cs_chunk_data {
union {
struct drm_amdgpu_cs_chunk_ib   ib_data;
@@ -764,6 +782,7 @@ struct drm_amdgpu_info_device {
__u64 max_memory_clock;
/* cu information */
__u32 cu_active_number;
+   /* NOTE: cu_ao_mask is INVALID, DON'T use it */
__u32 cu_ao_mask;
__u32 cu_bitmap[4][4];
/** Render backend pipe mask. One render backend is CB+DB. */
@@ -818,6 +837,8 @@ struct drm_amdgpu_info_device {
/* max gs wavefront per vgt*/
__u32 max_gs_waves_per_vgt;
__u32 _pad1;
+   /* always on cu bitmap */
+   __u32 cu_ao_bitmap[4][4];
 };
 
 struct drm_amdgpu_info_hw

[PATCH libdrm 2/4] drm: add drmSyncobjWait wrapper

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

---
 include/drm/drm.h | 24 
 xf86drm.c | 22 ++
 xf86drm.h |  3 +++
 3 files changed, 49 insertions(+)

diff --git a/include/drm/drm.h b/include/drm/drm.h
index bf3674a..4da1667 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -694,6 +694,7 @@ struct drm_prime_handle {
 
 struct drm_syncobj_create {
__u32 handle;
+#define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0)
__u32 flags;
 };
 
@@ -712,6 +713,24 @@ struct drm_syncobj_handle {
__u32 pad;
 };
 
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
+struct drm_syncobj_wait {
+   __u64 handles;
+   /* absolute timeout */
+   __s64 timeout_nsec;
+   __u32 count_handles;
+   __u32 flags;
+   __u32 first_signaled; /* only valid when not waiting all */
+   __u32 pad;
+};
+
+struct drm_syncobj_array {
+   __u64 handles;
+   __u32 count_handles;
+   __u32 pad;
+};
+
 #if defined(__cplusplus)
 }
 #endif
@@ -834,6 +853,9 @@ extern "C" {
 #define DRM_IOCTL_SYNCOBJ_DESTROY      DRM_IOWR(0xC0, struct drm_syncobj_destroy)
 #define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD DRM_IOWR(0xC1, struct drm_syncobj_handle)
 #define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE DRM_IOWR(0xC2, struct drm_syncobj_handle)
+#define DRM_IOCTL_SYNCOBJ_WAIT         DRM_IOWR(0xC3, struct drm_syncobj_wait)
+#define DRM_IOCTL_SYNCOBJ_RESET        DRM_IOWR(0xC4, struct drm_syncobj_array)
+#define DRM_IOCTL_SYNCOBJ_SIGNAL       DRM_IOWR(0xC5, struct drm_syncobj_array)
 
 /**
  * Device specific ioctls should only be in their respective headers
@@ -876,6 +898,7 @@ struct drm_event_vblank {
 };
 
 /* typedef area */
+#ifndef __KERNEL__
 typedef struct drm_clip_rect drm_clip_rect_t;
 typedef struct drm_drawable_info drm_drawable_info_t;
 typedef struct drm_tex_region drm_tex_region_t;
@@ -917,6 +940,7 @@ typedef struct drm_agp_binding drm_agp_binding_t;
 typedef struct drm_agp_info drm_agp_info_t;
 typedef struct drm_scatter_gather drm_scatter_gather_t;
 typedef struct drm_set_version drm_set_version_t;
+#endif
 
 #if defined(__cplusplus)
 }
diff --git a/xf86drm.c b/xf86drm.c
index 6ea0112..8a32717 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4225,3 +4225,25 @@ int drmSyncobjExportSyncFile(int fd, uint32_t handle, int *sync_file_fd)
 *sync_file_fd = args.fd;
 return 0;
 }
+
+int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled)
+{
+   struct drm_syncobj_wait args;
+   int ret;
+
+   memclear(args);
+   args.handles = (intptr_t)handles;
+   args.timeout_nsec = timeout_nsec;
+   args.count_handles = num_handles;
+   args.flags = flags;
+
+   ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &args);
+   if (ret < 0)
+   return ret;
+
+   if (first_signaled)
+   *first_signaled = args.first_signaled;
+   return ret;
+}
diff --git a/xf86drm.h b/xf86drm.h
index 2855a3e..ea650ef 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -860,6 +860,9 @@ extern int drmSyncobjFDToHandle(int fd, int obj_fd, uint32_t *handle);
 
 extern int drmSyncobjImportSyncFile(int fd, uint32_t handle, int sync_file_fd);
 extern int drmSyncobjExportSyncFile(int fd, uint32_t handle, int *sync_file_fd);
+extern int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled);
 
 #if defined(__cplusplus)
 }
-- 
2.7.4
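
To illustrate the flags (a sketch only, not part of the patch; the function
name is made up): wait for whichever of several syncobjs signals first.
Without WAIT_ALL the ioctl returns once any handle signals and reports its
index, and WAIT_FOR_SUBMIT additionally waits for fences that have not been
attached yet.

#include <stdint.h>
#include <stdio.h>
#include <xf86drm.h>

static int wait_for_any(int fd, uint32_t *handles, unsigned count,
                        int64_t abs_timeout_nsec)
{
    uint32_t first = 0;
    int r;

    r = drmSyncobjWait(fd, handles, count, abs_timeout_nsec,
                       DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, &first);
    if (r == 0)
        printf("syncobj %u signaled first\n", handles[first]);
    return r;
}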



[PATCH libdrm 1/4] amdgpu: add sync_file import and export functions

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

---
 amdgpu/amdgpu.h| 30 ++
 amdgpu/amdgpu_cs.c | 20 
 2 files changed, 50 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 238b1aa..b44b9b6 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1383,6 +1383,36 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 uint32_t *syncobj);
 
 /**
+ *  Export kernel sync object to a sync_file.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   sync_file_fd - \c [out] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int *sync_file_fd);
+
+/**
+ *  Import kernel sync object from a sync_file.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   sync_file_fd - \c [in] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int sync_file_fd);
+
+/**
  *  Submit raw command submission to kernel
  *
  * \param   dev   - \c [in] device handle
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index dfba875..4a05536 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -635,6 +635,26 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
 
+int amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int *sync_file_fd)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjExportSyncFile(dev->fd, syncobj, sync_file_fd);
+}
+
+int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  int sync_file_fd)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjImportSyncFile(dev->fd, syncobj, sync_file_fd);
+}
+
 int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
 amdgpu_context_handle context,
 amdgpu_bo_list_handle bo_list_handle,
-- 
2.7.4



[PATCH 2/3] drm/syncobj: add a new helper drm_syncobj_get_fd

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
---
 drivers/gpu/drm/drm_syncobj.c | 33 +++--
 include/drm/drm_syncobj.h |  1 +
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0bb1741..62adc7a 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -386,33 +386,38 @@ static int drm_syncobj_alloc_file(struct drm_syncobj *syncobj)
return 0;
 }
 
-static int drm_syncobj_handle_to_fd(struct drm_file *file_private,
-   u32 handle, int *p_fd)
+int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd)
 {
-   struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
int ret;
int fd;
 
-   if (!syncobj)
-   return -EINVAL;
-
fd = get_unused_fd_flags(O_CLOEXEC);
-   if (fd < 0) {
-   drm_syncobj_put(syncobj);
+   if (fd < 0)
return fd;
-   }
 
if (!syncobj->file) {
ret = drm_syncobj_alloc_file(syncobj);
-   if (ret)
-   goto out_put_fd;
+   if (ret) {
+   put_unused_fd(fd);
+   return ret;
+   }
}
fd_install(fd, syncobj->file);
-   drm_syncobj_put(syncobj);
*p_fd = fd;
return 0;
-out_put_fd:
-   put_unused_fd(fd);
+}
+EXPORT_SYMBOL(drm_syncobj_get_fd);
+
+static int drm_syncobj_handle_to_fd(struct drm_file *file_private,
+   u32 handle, int *p_fd)
+{
+   struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
+   int ret;
+
+   if (!syncobj)
+   return -EINVAL;
+
+   ret = drm_syncobj_get_fd(syncobj, p_fd);
drm_syncobj_put(syncobj);
return ret;
 }
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index e7f0035..43e2f38 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -140,5 +140,6 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
   struct dma_fence *fence);
 int drm_syncobj_get_handle(struct drm_file *file_private,
   struct drm_syncobj *syncobj, u32 *handle);
+int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd);
 
 #endif
-- 
2.7.4



[PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

For amdgpu.

drm_syncobj_create is renamed to drm_syncobj_create_as_handle, and new
helpers drm_syncobj_create and drm_syncobj_get_handle are added.

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
---
 drivers/gpu/drm/drm_syncobj.c | 49 +++
 include/drm/drm_syncobj.h |  4 
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0422b8c..0bb1741 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -262,8 +262,14 @@ void drm_syncobj_free(struct kref *kref)
 }
 EXPORT_SYMBOL(drm_syncobj_free);
 
-static int drm_syncobj_create(struct drm_file *file_private,
- u32 *handle, uint32_t flags)
+/**
+ * drm_syncobj_create - create a new syncobj
+ * @out_syncobj: returned syncobj
+ * @flags: DRM_SYNCOBJ_* flags
+ * @fence: if non-NULL, the syncobj will represent this fence
+ */
+int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
+  struct dma_fence *fence)
 {
int ret;
struct drm_syncobj *syncobj;
@@ -284,6 +290,25 @@ static int drm_syncobj_create(struct drm_file *file_private,
}
}
 
+   if (fence)
+   drm_syncobj_replace_fence(syncobj, fence);
+
+   *out_syncobj = syncobj;
+   return 0;
+}
+EXPORT_SYMBOL(drm_syncobj_create);
+
+/**
+ * drm_syncobj_get_handle - get a handle from a syncobj
+ */
+int drm_syncobj_get_handle(struct drm_file *file_private,
+  struct drm_syncobj *syncobj, u32 *handle)
+{
+   int ret;
+
+   /* take a reference to put in the idr */
+   drm_syncobj_get(syncobj);
+
idr_preload(GFP_KERNEL);
spin_lock(&file_private->syncobj_table_lock);
ret = idr_alloc(&file_private->syncobj_idr, syncobj, 1, 0, GFP_NOWAIT);
@@ -299,6 +324,22 @@ static int drm_syncobj_create(struct drm_file *file_private,
*handle = ret;
return 0;
 }
+EXPORT_SYMBOL(drm_syncobj_get_handle);
+
+static int drm_syncobj_create_as_handle(struct drm_file *file_private,
+   u32 *handle, uint32_t flags)
+{
+   int ret;
+   struct drm_syncobj *syncobj;
+
+   ret = drm_syncobj_create(&syncobj, flags, NULL);
+   if (ret)
+   return ret;
+
+   ret = drm_syncobj_get_handle(file_private, syncobj, handle);
+   drm_syncobj_put(syncobj);
+   return ret;
+}
 
 static int drm_syncobj_destroy(struct drm_file *file_private,
   u32 handle)
@@ -522,8 +563,8 @@ drm_syncobj_create_ioctl(struct drm_device *dev, void *data,
if (args->flags & ~DRM_SYNCOBJ_CREATE_SIGNALED)
return -EINVAL;
 
-   return drm_syncobj_create(file_private,
- &args->handle, args->flags);
+   return drm_syncobj_create_as_handle(file_private,
+   &args->handle, args->flags);
 }
 
 int
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index c00fee5..e7f0035 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -136,5 +136,9 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
   u32 handle,
   struct dma_fence **fence);
 void drm_syncobj_free(struct kref *kref);
+int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
+  struct dma_fence *fence);
+int drm_syncobj_get_handle(struct drm_file *file_private,
+  struct drm_syncobj *syncobj, u32 *handle);
 
 #endif
-- 
2.7.4



[PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

for being able to convert an amdgpu fence into one of the handles.
Mesa will use this.

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 61 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  1 +
 include/uapi/drm/amdgpu_drm.h   | 16 +
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b5c8b90..c15fa93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1308,6 +1308,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp);
 int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
+int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *filp);
 int amdgpu_cs_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
 int amdgpu_cs_wait_fences_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7cb8a59..6dd719c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -25,6 +25,7 @@
  *Jerome Glisse <gli...@freedesktop.org>
  */
 #include 
+#include <linux/sync_file.h>
 #include 
 #include 
 #include 
@@ -1311,6 +1312,66 @@ static struct dma_fence *amdgpu_cs_get_fence(struct amdgpu_device *adev,
return fence;
 }
 
+int amdgpu_cs_fence_to_handle_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *filp)
+{
+   struct amdgpu_device *adev = dev->dev_private;
+   struct amdgpu_fpriv *fpriv = filp->driver_priv;
+   union drm_amdgpu_fence_to_handle *info = data;
+   struct dma_fence *fence;
+   struct drm_syncobj *syncobj;
+   struct sync_file *sync_file;
+   int fd, r;
+
+   if (amdgpu_kms_vram_lost(adev, fpriv))
+   return -ENODEV;
+
+   fence = amdgpu_cs_get_fence(adev, filp, &info->in.fence);
+   if (IS_ERR(fence))
+   return PTR_ERR(fence);
+
+   switch (info->in.what) {
+   case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ:
+   r = drm_syncobj_create(&syncobj, 0, fence);
+   dma_fence_put(fence);
+   if (r)
+   return r;
+   r = drm_syncobj_get_handle(filp, syncobj, &info->out.handle);
+   drm_syncobj_put(syncobj);
+   return r;
+
+   case AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD:
+   r = drm_syncobj_create(&syncobj, 0, fence);
+   dma_fence_put(fence);
+   if (r)
+   return r;
+   r = drm_syncobj_get_fd(syncobj, (int*)&info->out.handle);
+   drm_syncobj_put(syncobj);
+   return r;
+
+   case AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD:
+   fd = get_unused_fd_flags(O_CLOEXEC);
+   if (fd < 0) {
+   dma_fence_put(fence);
+   return fd;
+   }
+
+   sync_file = sync_file_create(fence);
+   dma_fence_put(fence);
+   if (!sync_file) {
+   put_unused_fd(fd);
+   return -ENOMEM;
+   }
+
+   fd_install(fd, sync_file->file);
+   info->out.handle = fd;
+   return 0;
+
+   default:
+   return -EINVAL;
+   }
+}
+
 /**
  * amdgpu_cs_wait_all_fence - wait on all fences to signal
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d01aca6..1e38411 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -70,9 +70,10 @@
  * - 3.18.0 - Export gpu always on cu bitmap
  * - 3.19.0 - Add support for UVD MJPEG decode
  * - 3.20.0 - Add support for local BOs
+ * - 3.21.0 - Add DRM_AMDGPU_FENCE_TO_HANDLE ioctl
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   20
+#define KMS_DRIVER_MINOR   21
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d31777b..b09d315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1021,6 +1021,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
DRM_IOCTL_DEF_DRV(AMDGPU_CTX, amdgpu_ctx_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AMDGPU_VM, amdgpu_vm_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(AM

[PATCH libdrm] amdgpu: print error messages when amdgpu_device_initialize is failing

2017-09-12 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

---
 amdgpu/amdgpu_device.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/amdgpu/amdgpu_device.c b/amdgpu/amdgpu_device.c
index 9a238d9..2b31c45 100644
--- a/amdgpu/amdgpu_device.c
+++ b/amdgpu/amdgpu_device.c
@@ -184,42 +184,47 @@ int amdgpu_device_initialize(int fd,
uint32_t accel_working = 0;
uint64_t start, max;
 
*device_handle = NULL;
 
pthread_mutex_lock(&fd_mutex);
if (!fd_tab)
fd_tab = util_hash_table_create(fd_hash, fd_compare);
r = amdgpu_get_auth(fd, &flag_auth);
if (r) {
+   fprintf(stderr, "%s: amdgpu_get_auth (1) failed (%i)\n",
+   __func__, r);
pthread_mutex_unlock(&fd_mutex);
return r;
}
dev = util_hash_table_get(fd_tab, UINT_TO_PTR(fd));
if (dev) {
r = amdgpu_get_auth(dev->fd, &flag_authexist);
if (r) {
+   fprintf(stderr, "%s: amdgpu_get_auth (2) failed (%i)\n",
+   __func__, r);
pthread_mutex_unlock(&fd_mutex);
return r;
}
if ((flag_auth) && (!flag_authexist)) {
dev->flink_fd = dup(fd);
}
*major_version = dev->major_version;
*minor_version = dev->minor_version;
amdgpu_device_reference(device_handle, dev);
pthread_mutex_unlock(&fd_mutex);
return 0;
}
 
dev = calloc(1, sizeof(struct amdgpu_device));
if (!dev) {
+   fprintf(stderr, "%s: calloc failed\n", __func__);
pthread_mutex_unlock(&fd_mutex);
return -ENOMEM;
}
 
dev->fd = -1;
dev->flink_fd = -1;
 
atomic_set(&dev->refcount, 1);
 
version = drmGetVersion(fd);
@@ -241,41 +246,49 @@ int amdgpu_device_initialize(int fd,
dev->minor_version = version->version_minor;
drmFreeVersion(version);
 
dev->bo_flink_names = util_hash_table_create(handle_hash,
 handle_compare);
dev->bo_handles = util_hash_table_create(handle_hash, handle_compare);
pthread_mutex_init(&dev->bo_table_mutex, NULL);
 
/* Check if acceleration is working. */
r = amdgpu_query_info(dev, AMDGPU_INFO_ACCEL_WORKING, 4, &accel_working);
-   if (r)
+   if (r) {
+   fprintf(stderr, "%s: amdgpu_query_info(ACCEL_WORKING) failed 
(%i)\n",
+   __func__, r);
goto cleanup;
+   }
if (!accel_working) {
+   fprintf(stderr, "%s: AMDGPU_INFO_ACCEL_WORKING = 0\n", 
__func__);
r = -EBADF;
goto cleanup;
}
 
r = amdgpu_query_gpu_info_init(dev);
-   if (r)
+   if (r) {
+   fprintf(stderr, "%s: amdgpu_query_gpu_info_init failed\n", 
__func__);
goto cleanup;
+   }
 
amdgpu_vamgr_init(&dev->vamgr, dev->dev_info.virtual_address_offset,
  dev->dev_info.virtual_address_max,
  dev->dev_info.virtual_address_alignment);
 
max = MIN2(dev->dev_info.virtual_address_max, 0xffffffff);
start = amdgpu_vamgr_find_va(&dev->vamgr,
 max - dev->dev_info.virtual_address_offset,
 dev->dev_info.virtual_address_alignment, 0);
-   if (start > 0xffffffff)
+   if (start > 0xffffffff) {
+   fprintf(stderr, "%s: amdgpu_vamgr_find_va failed\n", __func__);
goto free_va; /* shouldn't get here */
+   }
 
amdgpu_vamgr_init(&dev->vamgr_32, start, max,
  dev->dev_info.virtual_address_alignment);
 
r = amdgpu_parse_asic_ids(&dev->asic_ids);
if (r) {
fprintf(stderr, "%s: Cannot parse ASIC IDs, 0x%x.",
__func__, r);
}
 
-- 
2.7.4
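
For reference, a minimal caller (a sketch; the render node path is just an
example) showing where the new messages surface: on failure the specific
reason is now printed to stderr instead of only an error code being
returned.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <amdgpu.h>

int main(void)
{
    amdgpu_device_handle dev;
    uint32_t major, minor;
    int fd = open("/dev/dri/renderD128", O_RDWR);

    if (fd < 0)
        return 1;
    if (amdgpu_device_initialize(fd, &major, &minor, &dev)) {
        close(fd); /* the failure reason was already printed */
        return 1;
    }

    printf("amdgpu DRM interface %u.%u\n", major, minor);
    amdgpu_device_deinitialize(dev);
    close(fd);
    return 0;
}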



Re: [PATCH] drm/amdgpu: fix VM sync with always valid BOs

2017-09-08 Thread Marek Olšák
The hang seems to be gone with this patch.

Marek

On Fri, Sep 8, 2017 at 2:26 PM, Christian König  wrote:
> Marek this one will most likely fix your issues with always valid BOs on
> Raven.
>
> Please give it a try when you have time.
>
> Thanks,
> Christian.
>
>
> On 08.09.2017 at 14:18, Christian König wrote:
>>
>> From: Christian König 
>>
>> All users of a VM must always wait for updates with always
>> valid BOs to be completed.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 14 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +-
>>   3 files changed, 21 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 8aa37e0..d6e66b7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -752,10 +752,6 @@ static int amdgpu_bo_vm_update_pte(struct
>> amdgpu_cs_parser *p)
>> if (r)
>> return r;
>>   - r = amdgpu_sync_fence(adev, &p->job->sync, vm->last_dir_update);
>> -   if (r)
>> -   return r;
>> -
>> r = amdgpu_vm_clear_freed(adev, vm, NULL);
>> if (r)
>> return r;
>> @@ -797,6 +793,10 @@ static int amdgpu_bo_vm_update_pte(struct
>> amdgpu_cs_parser *p)
>> if (bo_va == NULL)
>> continue;
>>   + if (bo_va->base.bo->flags &
>> +   AMDGPU_GEM_CREATE_VM_ALWAYS_VALID)
>> +   continue;
>> +
>> r = amdgpu_vm_bo_update(adev, bo_va, false);
>> if (r)
>> return r;
>> @@ -810,6 +810,12 @@ static int amdgpu_bo_vm_update_pte(struct
>> amdgpu_cs_parser *p)
>> }
>> r = amdgpu_vm_handle_moved(adev, vm, &p->job->sync);
>> +   if (r)
>> +   return r;
>> +
>> +   r = amdgpu_sync_fence(adev, &p->job->sync, vm->mandatory_update);
>> +   if (r)
>> +   return r;
>> if (amdgpu_vm_debug && p->bo_list) {
>> /* Invalidate all BOs to test for userspace bugs */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 55f1ecb..12c8a4c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1140,9 +1140,8 @@ static int amdgpu_vm_update_level(struct
>> amdgpu_device *adev,
>> goto error_free;
>> amdgpu_bo_fence(parent->base.bo, fence, true);
>> -   dma_fence_put(vm->last_dir_update);
>> -   vm->last_dir_update = dma_fence_get(fence);
>> -   dma_fence_put(fence);
>> +   dma_fence_put(vm->mandatory_update);
>> +   vm->mandatory_update = fence;
>> }
>> }
>>   @@ -1803,6 +1802,12 @@ int amdgpu_vm_bo_update(struct amdgpu_device
>> *adev,
>> trace_amdgpu_vm_bo_mapping(mapping);
>> }
>>   + if (bo_va->base.bo &&
>> +   bo_va->base.bo->tbo.resv == vm->root.base.bo->tbo.resv) {
>> +   dma_fence_put(vm->mandatory_update);
>> +   vm->mandatory_update =
>> dma_fence_get(bo_va->last_pt_update);
>> +   }
>> +
>> return 0;
>>   }
>>   @@ -2586,7 +2591,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev,
>> struct amdgpu_vm *vm,
>>  vm->use_cpu_for_update ? "CPU" : "SDMA");
>> WARN_ONCE((vm->use_cpu_for_update &
>> !amdgpu_vm_is_large_bar(adev)),
>>   "CPU update of VM recommended only for large BAR
>> system\n");
>> -   vm->last_dir_update = NULL;
>> +   vm->mandatory_update = NULL;
>> flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
>> AMDGPU_GEM_CREATE_VRAM_CLEARED;
>> @@ -2692,7 +2697,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev,
>> struct amdgpu_vm *vm)
>> }
>> amdgpu_vm_free_levels(&vm->root);
>> -   dma_fence_put(vm->last_dir_update);
>> +   dma_fence_put(vm->mandatory_update);
>> for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>> amdgpu_vm_free_reserved_vmid(adev, vm, i);
>>   }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index c1accd1..63fa2e5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -140,7 +140,7 @@ struct amdgpu_vm {
>> /* contains the page directory */
>> struct amdgpu_vm_pt root;
>> -   struct dma_fence*last_dir_update;
>> +   struct dma_fence*mandatory_update;
>> /* protecting freed */
>> spinlock_t 

Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-09-04 Thread Marek Olšák
On Fri, Sep 1, 2017 at 5:36 PM, Marek Olšák <mar...@gmail.com> wrote:
> On Thu, Jul 6, 2017 at 3:17 AM, Dave Airlie <airl...@gmail.com> wrote:
>> From: Dave Airlie <airl...@redhat.com>
>>
>> This adds kernel semaphore support to the command submission
>> interface in what should be a backwards compatible manner,
>> it adds a new command submission API.
>>
>> Signed-off-by: Dave Airlie <airl...@redhat.com>
>> ---
>>  amdgpu/amdgpu.h|  29 -
>>  amdgpu/amdgpu_cs.c | 118 
>> +
>>  2 files changed, 138 insertions(+), 9 deletions(-)
>>
>> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
>> index 1901fa8..649b66e 100644
>> --- a/amdgpu/amdgpu.h
>> +++ b/amdgpu/amdgpu.h
>> @@ -369,6 +369,16 @@ struct amdgpu_cs_request {
>> struct amdgpu_cs_fence_info fence_info;
>>  };
>>
>> +struct amdgpu_cs_request_syncobj {
>> +   /*
>> +*
>> +*/
>
> Did you mean to fill in the comment here?
>
> Also, is this interface relevant to the raw CS API?

Additionally, where can I find the kernel patch for this?

Thanks,
Marek


Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-09-01 Thread Marek Olšák
On Thu, Jul 6, 2017 at 3:17 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This adds kernel semaphore support to the command submission
> interface in what should be a backwards compatible manner,
> it adds a new command submission API.
>
> Signed-off-by: Dave Airlie 
> ---
>  amdgpu/amdgpu.h|  29 -
>  amdgpu/amdgpu_cs.c | 118 
> +
>  2 files changed, 138 insertions(+), 9 deletions(-)
>
> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> index 1901fa8..649b66e 100644
> --- a/amdgpu/amdgpu.h
> +++ b/amdgpu/amdgpu.h
> @@ -369,6 +369,16 @@ struct amdgpu_cs_request {
> struct amdgpu_cs_fence_info fence_info;
>  };
>
> +struct amdgpu_cs_request_syncobj {
> +   /*
> +*
> +*/

Did you mean to fill in the comment here?

Also, is this interface relevant to the raw CS API?

Marek

> +   uint32_t number_in_syncobj;
> +   uint32_t number_out_syncobj;
> +   uint32_t *in_syncobj;
> +   uint32_t *out_syncobj;
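
For context, filling the proposed struct would presumably look something
like this (a sketch based only on the fields quoted above; the handle
arrays hold DRM syncobj handles, and all variable names are made up):

/* Wait on two syncobjs and signal one around a single submission. */
static void fill_sync_request(struct amdgpu_cs_request_syncobj *sync,
                              uint32_t wait_handles[2],
                              uint32_t *signal_handle)
{
   sync->number_in_syncobj  = 2;
   sync->in_syncobj         = wait_handles;
   sync->number_out_syncobj = 1;
   sync->out_syncobj        = signal_handle;
}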


Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-29 Thread Marek Olšák
It might be interesting to try glmark2.

Marek

On Tue, Aug 29, 2017 at 3:59 PM, Christian König
<deathsim...@vodafone.de> wrote:
> Ok, found something that works. Xonotic in lowest resolution, lowest effects
> quality (e.g. totally CPU bound):
>
> Without per process BOs:
>
> Xonotic 0.8:
> pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
> Test 1 of 1
> Estimated Trial Run Count:3
> Estimated Time To Completion: 3 Minutes
> Started Run 1 @ 21:13:50
> Started Run 2 @ 21:14:57
> Started Run 3 @ 21:16:03  [Std. Dev: 0.94%]
>
> Test Results:
> 187.436577
> 189.514724
> 190.9605812
>
> Average: 189.30 Frames Per Second
> Minimum: 131
> Maximum: 355
>
> With per process BOs:
>
> Xonotic 0.8:
> pts/xonotic-1.4.0 [Resolution: 800 x 600 - Effects Quality: Low]
> Test 1 of 1
> Estimated Trial Run Count:3
> Estimated Time To Completion: 3 Minutes
> Started Run 1 @ 21:20:05
> Started Run 2 @ 21:21:07
> Started Run 3 @ 21:22:10  [Std. Dev: 1.49%]
>
> Test Results:
> 203.0471676
> 199.6622532
> 197.0954183
>
> Average: 199.93 Frames Per Second
> Minimum: 132
> Maximum: 349
>
> Well, that looks like some improvement: about 5.6% higher average FPS
> (199.93 vs. 189.30).
>
> Regards,
> Christian.
>
>
> On 28.08.2017 at 14:59, Zhou, David (ChunMing) wrote:
>
> I will push our Vulkan guys to test it; their BO list is very long.
>
> Sent from Smartisan Pro
>
> Christian König <deathsim...@vodafone.de> wrote on 28.08.2017 at 7:55 PM:
>
> On 28.08.2017 at 06:21, zhoucm1 wrote:
>>
>>
>> On 27.08.2017 at 18:03, Christian König wrote:
>>> On 25.08.2017 at 21:19, Christian König wrote:
>>>> On 25.08.2017 at 18:22, Marek Olšák wrote:
>>>>> On Fri, Aug 25, 2017 at 3:00 PM, Christian König
>>>>> <deathsim...@vodafone.de> wrote:
>>>>>> On 25.08.2017 at 12:32, zhoucm1 wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 25.08.2017 at 17:38, Christian König wrote:
>>>>>>>> From: Christian König <christian.koe...@amd.com>
>>>>>>>>
>>>>>>>> Add the IOCTL interface so that applications can allocate per VM
>>>>>>>> BOs.
>>>>>>>>
>>>>>>>> Still WIP since not all corner cases are tested yet, but this
>>>>>>>> reduces
>>>>>>>> average
>>>>>>>> CS overhead for 10K BOs from 21ms down to 48us.
>>>>>>> Wow, cheers, you finally get per-VM BOs into the same reservation
>>>>>>> with the PD/PTs;
>>>>>>> that indeed saves a lot of BO list work.
>>>>>>
>>>>>> Don't cheer too loud yet, that is a completely constructed test case.
>>>>>>
>>>>>> So far I wasn't able to achieve any improvements with any real
>>>>>> game on this
>>>>>> with Mesa.
>> Thinking about it more: too many BOs sharing one reservation could
>> result in the reservation lock often being busy. If evictions or
>> destroys also happen often in the meantime, that could affect VM
>> updates and CS submission as well.
>
> That's exactly the reason why I've added code to the BO destroy path to
> avoid at least some of the problems. But yeah, that's only the tip of
> the iceberg of problems with that approach.
>
>> Anyway, this is a very good start at reducing CS overhead,
>> especially since we've seen it "reduces average CS overhead for 10K
>> BOs from 21ms down to 48us".
>
> Actually, it's not that good. See, this is a completely built-up test
> case on a kernel with lockdep and KASAN enabled.
>
> In reality we usually don't have so many BOs and so far I wasn't able to
> find much of an improvement in any real world testing.
>
> Regards,
> Christian.
>
>


Re: [PATCH 9/9] drm/amdgpu: WIP add IOCTL interface for per VM BOs

2017-08-25 Thread Marek Olšák
On Fri, Aug 25, 2017 at 3:00 PM, Christian König
 wrote:
> On 25.08.2017 at 12:32, zhoucm1 wrote:
>>
>>
>>
>> On 25.08.2017 at 17:38, Christian König wrote:
>>>
>>> From: Christian König 
>>>
>>> Add the IOCTL interface so that applications can allocate per VM BOs.
>>>
>>> Still WIP since not all corner cases are tested yet, but this reduces
>>> average
>>> CS overhead for 10K BOs from 21ms down to 48us.
>>
>> Wow, cheers, you finally get per-VM BOs into the same reservation with the PD/PTs;
>> that indeed saves a lot of BO list work.
>
>
> Don't cheer too loud yet, that is a completely constructed test case.
>
> So far I wasn't able to achieve any improvements with any real game on this
> with Mesa.
>
> BTW: Marek, can you take a look with some CPU-bound tests? I can provide a
> kernel branch if necessary.

Do you have a branch that works on Raven? This patch series doesn't,
and I didn't investigate why.

Marek

>
> Regards,
> Christian.
>
>
>> Overall it looks good; I will take a detailed look at it tomorrow.
>>
>> Regards,
>> David Zhou
>>>
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  7 ++--
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c|  2 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 59
>>> ++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c |  3 +-
>>>   include/uapi/drm/amdgpu_drm.h |  2 ++
>>>   5 files changed, 51 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index b1e817c..21cab36 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -457,9 +457,10 @@ struct amdgpu_sa_bo {
>>>*/
>>>   void amdgpu_gem_force_release(struct amdgpu_device *adev);
>>>   int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long
>>> size,
>>> -int alignment, u32 initial_domain,
>>> -u64 flags, bool kernel,
>>> -struct drm_gem_object **obj);
>>> + int alignment, u32 initial_domain,
>>> + u64 flags, bool kernel,
>>> + struct reservation_object *resv,
>>> + struct drm_gem_object **obj);
>>> int amdgpu_mode_dumb_create(struct drm_file *file_priv,
>>>   struct drm_device *dev,
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>> index 0e907ea..7256f83 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>> @@ -144,7 +144,7 @@ static int amdgpufb_create_pinned_object(struct
>>> amdgpu_fbdev *rfbdev,
>>>  AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
>>>  AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
>>>  AMDGPU_GEM_CREATE_VRAM_CLEARED,
>>> -   true, &gobj);
>>> +   true, NULL, &gobj);
>>>   if (ret) {
>>>   pr_err("failed to allocate framebuffer (%d)\n", aligned_size);
>>>   return -ENOMEM;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index d028806..b8e8d67 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -44,11 +44,12 @@ void amdgpu_gem_object_free(struct drm_gem_object
>>> *gobj)
>>>   }
>>> int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned
>>> long size,
>>> -int alignment, u32 initial_domain,
>>> -u64 flags, bool kernel,
>>> -struct drm_gem_object **obj)
>>> + int alignment, u32 initial_domain,
>>> + u64 flags, bool kernel,
>>> + struct reservation_object *resv,
>>> + struct drm_gem_object **obj)
>>>   {
>>> -struct amdgpu_bo *robj;
>>> +struct amdgpu_bo *bo;
>>>   int r;
>>> *obj = NULL;
>>> @@ -59,7 +60,7 @@ int amdgpu_gem_object_create(struct amdgpu_device
>>> *adev, unsigned long size,
>>> retry:
>>>   r = amdgpu_bo_create(adev, size, alignment, kernel, initial_domain,
>>> - flags, NULL, NULL, 0, &robj);
>>> + flags, NULL, resv, 0, &bo);
>>>   if (r) {
>>>   if (r != -ERESTARTSYS) {
>>>   if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>>> @@ -71,7 +72,7 @@ int amdgpu_gem_object_create(struct amdgpu_device
>>> *adev, unsigned long size,
>>>   }
>>>   return r;
>>>   }
>>> -*obj = &robj->gem_base;
>>> +*obj = &bo->gem_base;
>>> return 0;
>>>   }
>>> @@ -136,13 +137,14 @@ void amdgpu_gem_object_close(struct drm_gem_object
>>> *obj,
>>>   struct amdgpu_vm *vm = &fpriv->vm;
>>> struct amdgpu_bo_list_entry vm_pd;
>>> -struct list_head list;
>>> +struct list_head list, duplicates;
>>>   struct 

Re: [PATCH] drm/amdgpu: Fix undue fallthroughs in golden registers initialization

2017-07-30 Thread Marek Olšák
Reviewed-by: Marek Olšák <marek.ol...@amd.com>

Marek

On Sun, Jul 30, 2017 at 10:18 AM, Jean Delvare <jdelv...@suse.de> wrote:

> As I was staring at the si_init_golden_registers code, I noticed that
> the Pitcairn initialization silently falls through to the Cape Verde
> initialization, and the Oland initialization falls through to the Hainan
> initialization. However, there is no comment stating that this is
> intentional, and the radeon driver doesn't have any such fallthrough,
> so I suspect this is not supposed to happen.
>
> Signed-off-by: Jean Delvare <jdelv...@suse.de>
> Fixes: 62a37553414a ("drm/amdgpu: add si implementation v10")
> Cc: Ken Wang <qingqing.w...@amd.com>
> Cc: Alex Deucher <alexander.deuc...@amd.com>
> Cc: "Marek Olšák" <mar...@gmail.com>
> Cc: "Christian König" <christian.koe...@amd.com>
> Cc: Flora Cui <flora@amd.com>
> ---
> If the fallthroughs are really supposed to happen, comments should be
> added that say so. Surprisingly it doesn't seem to make any
> difference on my Oland card.
>
>  drivers/gpu/drm/amd/amdgpu/si.c |2 ++
>  1 file changed, 2 insertions(+)
>
> --- linux-4.12.orig/drivers/gpu/drm/amd/amdgpu/si.c 2017-07-30
> 09:25:46.891083334 +0200
> +++ linux-4.12/drivers/gpu/drm/amd/amdgpu/si.c  2017-07-30
> 09:45:24.350188642 +0200
> @@ -1385,6 +1385,7 @@ static void si_init_golden_registers(str
> amdgpu_program_register_sequence(adev,
>  pitcairn_mgcg_cgcg_init,
>  (const
> u32)ARRAY_SIZE(pitcairn_mgcg_cgcg_init));
> +   break;
> case CHIP_VERDE:
> amdgpu_program_register_sequence(adev,
>  verde_golden_registers,
> @@ -1409,6 +1410,7 @@ static void si_init_golden_registers(str
> amdgpu_program_register_sequence(adev,
>  oland_mgcg_cgcg_init,
>  (const
> u32)ARRAY_SIZE(oland_mgcg_cgcg_init));
> +   break;
> case CHIP_HAINAN:
> amdgpu_program_register_sequence(adev,
>  hainan_golden_registers,
>
>
> --
> Jean Delvare
> SUSE L3 Support
>


Re: [Mesa-dev] [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-20 Thread Marek Olšák
On Jul 19, 2017 10:21 PM, "zhoucm1" <david1.z...@amd.com> wrote:



On 2017-07-19 23:34, Marek Olšák wrote:



On Jul 19, 2017 3:36 AM, "zhoucm1" <david1.z...@amd.com> wrote:



On 2017-07-19 04:08, Marek Olšák wrote:

> From: Marek Olšák <marek.ol...@amd.com>
>
> For lower overhead in the CS ioctl.
> Winsys allocators are not used with interprocess-sharable resources.
>
Hi Marek,

Could you explain how this approach reduces overhead in the CS ioctl?
By reusing BOs to shorten the BO list?


The kernel part of the work hasn't been done yet. The idea is that
nonsharable buffers don't have to be revalidated by TTM,

OK, maybe I can only see the whole picture of this idea once you complete
the kernel part.
Out of curiosity, why/how can nonsharable buffers avoid revalidation by TTM
without exposing something like an amdgpu_bo_make_resident API?


I think the idea is that all nonsharable buffers will be backed by the same
reservation object, so TTM can skip buffer validation if no buffer has been
moved. It's just an optimization for the current design.
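
In code terms it would look roughly like this (a sketch; the function
signature comes from the amdgpu_gem_object_create() change quoted earlier
in this series, and the way the VM's reservation object is reached is
approximate):

/* Per-VM BO: share the reservation object of the VM's page directory,
 * so a single reservation covers the whole per-process BO set. */
struct reservation_object *resv = vm->root.bo->tbo.resv; /* approximate */
struct drm_gem_object *gobj;
int r;

r = amdgpu_gem_object_create(adev, size, alignment,
                             AMDGPU_GEM_DOMAIN_VRAM, flags,
                             false, resv, &gobj);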


As mentioned in another thread, if we can expose a make_resident API, we
can remove the BO list; we could even remove the reservation operation in
the CS ioctl.
I now think our BO list is a very bad design:
first, the UMD must create a BO list for every command submission, which is
extra CPU overhead compared with the traditional way;
second, the kernel also has to iterate the list. When the BO list is very
long (OpenCL programs routinely throw several thousand BOs into it),
reservation must keep those thousands of ww_mutexes safe, and the CPU
overhead is too big.

So I strongly suggest we expose a make_resident API to user space. If we
cannot, I want to know the specific reasons, to see if we can solve them.


Yeah, I think the BO list idea is likely to die sooner or later. It made
sense for GL before bindless was a thing. Nowadays I don't see much value
in it.

MesaGL will keep tracking the BO list because it's a requirement for good
GL performance (it determines whether to flush IBs before BO
synchronization, it allows tracking fences for each BO, which are used to
determine dependencies between IBs, and that all allows async SDMA and
async compute for GL, which doesn't have separate queues).

However, we don't need any BO list at the libdrm level and lower. I think a
BO_CREATE flag that causes the buffer to be added to a kernel-side per-fd
BO list would be sufficient. How the kernel manages its BO list should be
its own implementation detail. Initially we can just move the current BO
list management into the kernel.
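
As a strawman, the userspace-visible part could be as small as one creation
flag; the flag below is made up, nothing like it exists yet:

/* Hypothetical: the BO is added to a kernel-side per-fd BO list at
 * creation time, so the CS ioctl needs no explicit BO list chunk. */
#define AMDGPU_GEM_CREATE_PER_FD_LIST (1 << 7) /* not a real flag */

request.flags |= AMDGPU_GEM_CREATE_PER_FD_LIST;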

Marek





Regards,
David Zhou

so it should remove a lot of kernel overhead and the BO list remains the
same.

Marek



Thanks,
David Zhou


> v2: It shouldn't crash anymore, but the kernel will reject the new flag.
> ---
>   src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
>   src/gallium/drivers/radeon/radeon_winsys.h  | 20 +++---
>   src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 36
> -
>   src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
>   4 files changed, 62 insertions(+), 28 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c
> b/src/gallium/drivers/radeon/r600_buffer_common.c
> index dd1c209..2747ac4 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -160,20 +160,27 @@ void r600_init_resource_fields(struct
> r600_common_screen *rscreen,
> }
> /* Tiled textures are unmappable. Always put them in VRAM. */
> if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
> res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
> res->domains = RADEON_DOMAIN_VRAM;
> res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
>  RADEON_FLAG_GTT_WC;
> }
>   + /* Only displayable single-sample textures can be shared between
> +* processes. */
> +   if (res->b.b.target == PIPE_BUFFER ||
> +   res->b.b.nr_samples >= 2 ||
> +   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
> +   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
> +
> /* If VRAM is just stolen system memory, allow both VRAM and
>  * GTT, whichever has free space. If a buffer is evicted from
>  * VRAM to GTT, it will stay there.
>  *
>  * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
>  * placements even with a low amount of stolen VRAM.
>  */
> if (!rscreen->info.has_dedicated_vram &&
> (rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
> res->domains == RADEON_D

Re: [Mesa-dev] [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-19 Thread Marek Olšák
On Jul 19, 2017 3:36 AM, "zhoucm1" <david1.z...@amd.com> wrote:



On 2017-07-19 04:08, Marek Olšák wrote:

> From: Marek Olšák <marek.ol...@amd.com>
>
> For lower overhead in the CS ioctl.
> Winsys allocators are not used with interprocess-sharable resources.
>
Hi Marek,

Could you explain how this approach reduces overhead in the CS ioctl?
By reusing BOs to shorten the BO list?


The kernel part of the work hasn't been done yet. The idea is that
nonsharable buffers don't have to be revalidated by TTM, so it should
remove a lot of kernel overhead and the BO list remains the same.

Marek



Thanks,
David Zhou


> v2: It shouldn't crash anymore, but the kernel will reject the new flag.
> ---
>   src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
>   src/gallium/drivers/radeon/radeon_winsys.h  | 20 +++---
>   src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 36
> -
>   src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
>   4 files changed, 62 insertions(+), 28 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c
> b/src/gallium/drivers/radeon/r600_buffer_common.c
> index dd1c209..2747ac4 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -160,20 +160,27 @@ void r600_init_resource_fields(struct
> r600_common_screen *rscreen,
> }
> /* Tiled textures are unmappable. Always put them in VRAM. */
> if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
> res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
> res->domains = RADEON_DOMAIN_VRAM;
> res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
>  RADEON_FLAG_GTT_WC;
> }
>   + /* Only displayable single-sample textures can be shared between
> +* processes. */
> +   if (res->b.b.target == PIPE_BUFFER ||
> +   res->b.b.nr_samples >= 2 ||
> +   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
> +   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
> +
> /* If VRAM is just stolen system memory, allow both VRAM and
>  * GTT, whichever has free space. If a buffer is evicted from
>  * VRAM to GTT, it will stay there.
>  *
>  * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
>  * placements even with a low amount of stolen VRAM.
>  */
> if (!rscreen->info.has_dedicated_vram &&
> (rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
> res->domains == RADEON_DOMAIN_VRAM) {
> diff --git a/src/gallium/drivers/radeon/radeon_winsys.h
> b/src/gallium/drivers/radeon/radeon_winsys.h
> index 351edcd..0abcb56 100644
> --- a/src/gallium/drivers/radeon/radeon_winsys.h
> +++ b/src/gallium/drivers/radeon/radeon_winsys.h
> @@ -47,20 +47,21 @@ enum radeon_bo_domain { /* bitfield */
>   RADEON_DOMAIN_GTT  = 2,
>   RADEON_DOMAIN_VRAM = 4,
>   RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
>   };
> enum radeon_bo_flag { /* bitfield */
>   RADEON_FLAG_GTT_WC =(1 << 0),
>   RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
>   RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
>   RADEON_FLAG_SPARSE =(1 << 3),
> +RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
>   };
> enum radeon_bo_usage { /* bitfield */
>   RADEON_USAGE_READ = 2,
>   RADEON_USAGE_WRITE = 4,
>   RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,
> /* The winsys ensures that the CS submission will be scheduled
> after
>* previously flushed CSs referencing this BO in a conflicting way.
>*/
> @@ -685,28 +686,33 @@ static inline enum radeon_bo_domain
> radeon_domain_from_heap(enum radeon_heap hea
>   default:
>   assert(0);
>   return (enum radeon_bo_domain)0;
>   }
>   }
> static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
>   {
>   switch (heap) {
>   case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
> -return RADEON_FLAG_GTT_WC | RADEON_FLAG_NO_CPU_ACCESS;
> +return RADEON_FLAG_GTT_WC |
> +   RADEON_FLAG_NO_CPU_ACCESS |
> +   RADEON_FLAG_NO_INTERPROCESS_SHARING;
> +
>   case RADEON_HEAP_VRAM:
>   case RADEON_HEAP_VRAM_GTT:
>   case RADEON_HEAP_GTT_WC:
> -return RADEON_FLAG_GTT_WC;
> +return RADEON_FLAG_GTT_WC |
> +   RADEON_FLAG_NO_INTERPROCESS_SHARING;
> +
>   case RADEON_HEAP_GTT:
>   

Re: [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-18 Thread Marek Olšák
On Tue, Jul 18, 2017 at 5:11 PM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 18/07/17 04:08 PM, Marek Olšák wrote:
>>
>> From: Marek Olšák <marek.ol...@amd.com>
>>
>> For lower overhead in the CS ioctl.
>> Winsys allocators are not used with interprocess-sharable resources.
>>
>> v2: It shouldn't crash anymore, but the kernel will reject the new flag.
>> ---
>>   src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
>>   src/gallium/drivers/radeon/radeon_winsys.h  | 20 +++---
>>   src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 36
>> -
>>   src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
>>   4 files changed, 62 insertions(+), 28 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c
>> b/src/gallium/drivers/radeon/r600_buffer_common.c
>> index dd1c209..2747ac4 100644
>> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
>> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
>> @@ -160,20 +160,27 @@ void r600_init_resource_fields(struct
>> r600_common_screen *rscreen,
>> }
>> /* Tiled textures are unmappable. Always put them in VRAM. */
>> if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear)
>> ||
>> res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
>> res->domains = RADEON_DOMAIN_VRAM;
>> res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
>>  RADEON_FLAG_GTT_WC;
>> }
>>   + /* Only displayable single-sample textures can be shared between
>> +* processes. */
>> +   if (res->b.b.target == PIPE_BUFFER ||
>> +   res->b.b.nr_samples >= 2 ||
>> +   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
>> +   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
>
>
> We might want to share non-displayable textures at some point for
> performance, right? Will we still be able to flag some textures as
> non-shareable in that case?

Yes, if we weren't CPU-bound in 2D acceleration on VI. As long as we
are CPU-bound, GPU optimizations probably won't make much difference.

Marek


[RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-18 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

For lower overhead in the CS ioctl.
Winsys allocators are not used with interprocess-sharable resources.

v2: It shouldn't crash anymore, but the kernel will reject the new flag.
---
 src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
 src/gallium/drivers/radeon/radeon_winsys.h  | 20 +++---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 36 -
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
 4 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index dd1c209..2747ac4 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -160,20 +160,27 @@ void r600_init_resource_fields(struct r600_common_screen 
*rscreen,
}
 
/* Tiled textures are unmappable. Always put them in VRAM. */
if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
res->domains = RADEON_DOMAIN_VRAM;
res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
 RADEON_FLAG_GTT_WC;
}
 
+   /* Only displayable single-sample textures can be shared between
+* processes. */
+   if (res->b.b.target == PIPE_BUFFER ||
+   res->b.b.nr_samples >= 2 ||
+   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
+   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
/* If VRAM is just stolen system memory, allow both VRAM and
 * GTT, whichever has free space. If a buffer is evicted from
 * VRAM to GTT, it will stay there.
 *
 * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
 * placements even with a low amount of stolen VRAM.
 */
if (!rscreen->info.has_dedicated_vram &&
(rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
res->domains == RADEON_DOMAIN_VRAM) {
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index 351edcd..0abcb56 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -47,20 +47,21 @@ enum radeon_bo_domain { /* bitfield */
 RADEON_DOMAIN_GTT  = 2,
 RADEON_DOMAIN_VRAM = 4,
 RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
 };
 
 enum radeon_bo_flag { /* bitfield */
 RADEON_FLAG_GTT_WC =(1 << 0),
 RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
 RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
 RADEON_FLAG_SPARSE =(1 << 3),
+RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
 };
 
 enum radeon_bo_usage { /* bitfield */
 RADEON_USAGE_READ = 2,
 RADEON_USAGE_WRITE = 4,
 RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,
 
 /* The winsys ensures that the CS submission will be scheduled after
  * previously flushed CSs referencing this BO in a conflicting way.
  */
@@ -685,28 +686,33 @@ static inline enum radeon_bo_domain 
radeon_domain_from_heap(enum radeon_heap hea
 default:
 assert(0);
 return (enum radeon_bo_domain)0;
 }
 }
 
 static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)
 {
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-return RADEON_FLAG_GTT_WC | RADEON_FLAG_NO_CPU_ACCESS;
+return RADEON_FLAG_GTT_WC |
+   RADEON_FLAG_NO_CPU_ACCESS |
+   RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
 case RADEON_HEAP_VRAM:
 case RADEON_HEAP_VRAM_GTT:
 case RADEON_HEAP_GTT_WC:
-return RADEON_FLAG_GTT_WC;
+return RADEON_FLAG_GTT_WC |
+   RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
 case RADEON_HEAP_GTT:
 default:
-return 0;
+return RADEON_FLAG_NO_INTERPROCESS_SHARING;
 }
 }
 
 /* The pb cache bucket is chosen to minimize pb_cache misses.
  * It must be between 0 and 3 inclusive.
  */
 static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
 {
 switch (heap) {
 case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
@@ -724,22 +730,28 @@ static inline unsigned 
radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
 
 /* Return the heap index for winsys allocators, or -1 on failure. */
 static inline int radeon_get_heap_index(enum radeon_bo_domain domain,
 enum radeon_bo_flag flags)
 {
 /* VRAM implies WC (write combining) */
 assert(!(domain & RADEON_DOMAIN_VRAM) || flags & RADEON_FLAG_GTT_WC);
 /* NO_CPU_ACCESS implies VRAM only. */
 assert(!(flags & RADEON_FLAG_NO_CPU_ACCESS) || domain == 
RADEON_DOMAIN_VRAM);
 
+/* Resources with interpro

Re: [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing

2017-07-18 Thread Marek Olšák
For comments only. There are some assertion failures.

Marek

On Tue, Jul 18, 2017 at 1:47 PM, Marek Olšák <mar...@gmail.com> wrote:
> From: Marek Olšák <marek.ol...@amd.com>
>
> for lower overhead in the CS ioctl
> ---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 7 +++
>  src/gallium/drivers/radeon/radeon_winsys.h  | 1 +
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 6 ++
>  3 files changed, 14 insertions(+)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
> b/src/gallium/drivers/radeon/r600_buffer_common.c
> index dd1c209..2747ac4 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -160,20 +160,27 @@ void r600_init_resource_fields(struct 
> r600_common_screen *rscreen,
> }
>
> /* Tiled textures are unmappable. Always put them in VRAM. */
> if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
> res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
> res->domains = RADEON_DOMAIN_VRAM;
> res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
>  RADEON_FLAG_GTT_WC;
> }
>
> +   /* Only displayable single-sample textures can be shared between
> +* processes. */
> +   if (res->b.b.target == PIPE_BUFFER ||
> +   res->b.b.nr_samples >= 2 ||
> +   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
> +   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
> +
> /* If VRAM is just stolen system memory, allow both VRAM and
>  * GTT, whichever has free space. If a buffer is evicted from
>  * VRAM to GTT, it will stay there.
>  *
>  * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
>  * placements even with a low amount of stolen VRAM.
>  */
> if (!rscreen->info.has_dedicated_vram &&
> (rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
> res->domains == RADEON_DOMAIN_VRAM) {
> diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
> b/src/gallium/drivers/radeon/radeon_winsys.h
> index 351edcd..ce2fd73 100644
> --- a/src/gallium/drivers/radeon/radeon_winsys.h
> +++ b/src/gallium/drivers/radeon/radeon_winsys.h
> @@ -47,20 +47,21 @@ enum radeon_bo_domain { /* bitfield */
>  RADEON_DOMAIN_GTT  = 2,
>  RADEON_DOMAIN_VRAM = 4,
>  RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
>  };
>
>  enum radeon_bo_flag { /* bitfield */
>  RADEON_FLAG_GTT_WC =(1 << 0),
>  RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
>  RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
>  RADEON_FLAG_SPARSE =(1 << 3),
> +RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
>  };
>
>  enum radeon_bo_usage { /* bitfield */
>  RADEON_USAGE_READ = 2,
>  RADEON_USAGE_WRITE = 4,
>  RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,
>
>  /* The winsys ensures that the CS submission will be scheduled after
>   * previously flushed CSs referencing this BO in a conflicting way.
>   */
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> index 97bbe23..f97e1bf 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> @@ -31,20 +31,24 @@
>
>  #include "amdgpu_cs.h"
>
>  #include "os/os_time.h"
>  #include "state_tracker/drm_driver.h"
>  #include 
>  #include 
>  #include 
>  #include 
>
> +#ifndef AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING
> +#define AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING (1 << 6)
> +#endif
> +
>  /* Set to 1 for verbose output showing committed sparse buffer ranges. */
>  #define DEBUG_SPARSE_COMMITS 0
>
>  struct amdgpu_sparse_backing_chunk {
> uint32_t begin, end;
>  };
>
>  static struct pb_buffer *
>  amdgpu_bo_create(struct radeon_winsys *rws,
>   uint64_t size,
> @@ -395,20 +399,22 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct 
> amdgpu_winsys *ws,
>
> if (initial_domain & RADEON_DOMAIN_VRAM)
>request.preferred_heap |= AMDGPU_GEM_DOMAIN_VRAM;
> if (initial_domain & RADEON_DOMAIN_GTT)
>request.preferred_heap |= AMDGPU_GEM_DOMAIN_GTT;
>
> if (flags & RADEON_FLAG_NO_CPU_ACCESS)
>request.flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
> if (flags & RADEON_FLAG_GTT_WC)
>request.flags |= AMDGPU_GEM_CREATE_CPU_GTT_USWC;
&

[RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing

2017-07-18 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

for lower overhead in the CS ioctl
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 7 +++
 src/gallium/drivers/radeon/radeon_winsys.h  | 1 +
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 6 ++
 3 files changed, 14 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index dd1c209..2747ac4 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -160,20 +160,27 @@ void r600_init_resource_fields(struct r600_common_screen 
*rscreen,
}
 
/* Tiled textures are unmappable. Always put them in VRAM. */
if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
res->domains = RADEON_DOMAIN_VRAM;
res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
 RADEON_FLAG_GTT_WC;
}
 
+   /* Only displayable single-sample textures can be shared between
+* processes. */
+   if (res->b.b.target == PIPE_BUFFER ||
+   res->b.b.nr_samples >= 2 ||
+   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
+   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
/* If VRAM is just stolen system memory, allow both VRAM and
 * GTT, whichever has free space. If a buffer is evicted from
 * VRAM to GTT, it will stay there.
 *
 * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
 * placements even with a low amount of stolen VRAM.
 */
if (!rscreen->info.has_dedicated_vram &&
(rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
res->domains == RADEON_DOMAIN_VRAM) {
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index 351edcd..ce2fd73 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -47,20 +47,21 @@ enum radeon_bo_domain { /* bitfield */
 RADEON_DOMAIN_GTT  = 2,
 RADEON_DOMAIN_VRAM = 4,
 RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
 };
 
 enum radeon_bo_flag { /* bitfield */
 RADEON_FLAG_GTT_WC =(1 << 0),
 RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
 RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
 RADEON_FLAG_SPARSE =(1 << 3),
+RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
 };
 
 enum radeon_bo_usage { /* bitfield */
 RADEON_USAGE_READ = 2,
 RADEON_USAGE_WRITE = 4,
 RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,
 
 /* The winsys ensures that the CS submission will be scheduled after
  * previously flushed CSs referencing this BO in a conflicting way.
  */
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 97bbe23..f97e1bf 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -31,20 +31,24 @@
 
 #include "amdgpu_cs.h"
 
 #include "os/os_time.h"
 #include "state_tracker/drm_driver.h"
 #include 
 #include 
 #include 
 #include 
 
+#ifndef AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING
+#define AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING (1 << 6)
+#endif
+
 /* Set to 1 for verbose output showing committed sparse buffer ranges. */
 #define DEBUG_SPARSE_COMMITS 0
 
 struct amdgpu_sparse_backing_chunk {
uint32_t begin, end;
 };
 
 static struct pb_buffer *
 amdgpu_bo_create(struct radeon_winsys *rws,
  uint64_t size,
@@ -395,20 +399,22 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct 
amdgpu_winsys *ws,
 
if (initial_domain & RADEON_DOMAIN_VRAM)
   request.preferred_heap |= AMDGPU_GEM_DOMAIN_VRAM;
if (initial_domain & RADEON_DOMAIN_GTT)
   request.preferred_heap |= AMDGPU_GEM_DOMAIN_GTT;
 
if (flags & RADEON_FLAG_NO_CPU_ACCESS)
   request.flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
if (flags & RADEON_FLAG_GTT_WC)
   request.flags |= AMDGPU_GEM_CREATE_CPU_GTT_USWC;
+   if (flags & RADEON_FLAG_NO_INTERPROCESS_SHARING)
+  request.flags |= AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING;
 
r = amdgpu_bo_alloc(ws->dev, &request, &buf_handle);
if (r) {
   fprintf(stderr, "amdgpu: Failed to allocate a buffer:\n");
   fprintf(stderr, "amdgpu:size  : %"PRIu64" bytes\n", size);
   fprintf(stderr, "amdgpu:alignment : %u bytes\n", alignment);
   fprintf(stderr, "amdgpu:domains   : %u\n", initial_domain);
   goto error_bo_alloc;
}
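
For reference, a winsys consumer of the new flag would look roughly like
this (a sketch; the buffer_create signature is approximate for this
Mesa version):

/* Allocate a VRAM texture that will never be exported to another
 * process, so the kernel can keep it on a cheaper validation path. */
struct pb_buffer *buf =
   ws->buffer_create(ws, size, 4096, RADEON_DOMAIN_VRAM,
                     RADEON_FLAG_GTT_WC |
                     RADEON_FLAG_NO_INTERPROCESS_SHARING);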
 
-- 
2.7.4



Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-18 Thread Marek Olšák
Hi Dave,

If you just add "get" functions for what you need from amdgpu objects,
that should be fine.
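
I mean something along these lines (a sketch; neither accessor exists in
libdrm_amdgpu today and both names are made up):

/* In libdrm_amdgpu, where the private structs are visible: */
uint32_t amdgpu_cs_ctx_get_id(amdgpu_context_handle context)
{
   return context->id;
}

uint32_t amdgpu_bo_get_kms_handle(amdgpu_bo_handle bo)
{
   return bo->handle;
}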

Marek

On Mon, Jul 17, 2017 at 11:00 PM, Dave Airlie  wrote:
> On 18 July 2017 at 03:02, Christian König  wrote:
>> Am 17.07.2017 um 05:36 schrieb Dave Airlie:

 I can take a look at it, I just won't have time until next week most
 likely.
>>>
>>> I've taken a look, and it's seemingly more complicated than I'm
>>> expecting I'd want to land in Mesa before 17.2 ships, I'd really
>>> prefer to just push the new libdrm_amdgpu api from this patch. If I
>>> have to port all the current radv code to the new API, I'll most
>>> definitely get something wrong.
>>>
>>> Adding the new API so far looks like
>>> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>>>
>>>
>>> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw=e7f85d0ca617fa41e72624780c9035df132e23c4
>>> being the API, and whether it should take a uint32_t context id or
>>> context handle left as an open question in the last patch in the
>>> series.
>>
>>
>> I would stick with the context handle, as far as I can see there isn't any
>> value in using the uint32_t for this.
>>
>> We just want to be able to send arbitrary chunks down into the kernel
>> without libdrm_amdgpu involvement and/or the associated overhead of the
>> extra loop and the semaphore handling.
>>
>> So your "amdgpu/cs: add new raw cs submission interface just taking chunks"
>> patch looks fine to me as far as I can tell.
>>
>> As far as I can see the "amdgpu: refactor semaphore handling" patch is
>> actually incorrect. We must hold the mutex while sending the CS down to the
>> kernel, or otherwise "context->last_seq" won't be accurate.
>>
>>> However to hook this into radv or radeonsi will take a bit of
>>> rewriting of a lot of code that is probably a bit more fragile than
>>> I'd like for this sort of surgery at this point.
>>
>>
>> Again, I can move over the existing Mesa stuff if you like.
>>
>>> I'd actually suspect if we do want to proceed with this type of
>>> interface, we might be better doing it all in common mesa code, and
>>> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
>>> written here is mostly already doing.
>>
>>
>> I want to stick with the other interfaces for now. No need to make it more
>> complicated than it already is.
>>
>> Only the CS stuff is the most performance critical and thing we have right
>> now.
>
> As I suspected this plan is full of traps.
>
> So with the raw cs api I posted (using amdgpu_bo_list_handle instead), I ran
> into two places the abstraction cuts me.
>
>   CC   winsys/amdgpu/radv_amdgpu_cs.lo
> winsys/amdgpu/radv_amdgpu_cs.c: In function ‘radv_amdgpu_cs_submit’:
> winsys/amdgpu/radv_amdgpu_cs.c:1173:63: error: dereferencing pointer
> to incomplete type ‘struct amdgpu_bo’
>chunk_data[i].fence_data.handle = request->fence_info.handle->handle;
>^~
> winsys/amdgpu/radv_amdgpu_cs.c:1193:31: error: dereferencing pointer
> to incomplete type ‘struct amdgpu_context’
> dep->ctx_id = info->context->id;
>
> In order to do user fence chunk I need the actual bo handle not the
> amdgpu wrapped one, we don't have an accessor method for that.
>
> In order to do the dependencies chunks, I need a context id.
>
> Now I suppose I can add chunk creation helpers to libdrm, but it does
> seems like it breaks the future proof interface if we can't access the
> details of a bunch of objects we want to pass through to the kernel
> API.
>
> Dave.


Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-17 Thread Marek Olšák
On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie  wrote:
>>
>> I can take a look at it, I just won't have time until next week most likely.
>
> I've taken a look, and it's seemingly more complicated than I'm
> expecting I'd want to land in Mesa before 17.2 ships, I'd really
> prefer to just push the new libdrm_amdgpu api from this patch. If I
> have to port all the current radv code to the new API, I'll most
> definitely get something wrong.
>
> Adding the new API so far looks like
> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>
> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw=e7f85d0ca617fa41e72624780c9035df132e23c4
> being the API, and whether it should take a uint32_t context id or
> context handle left as an open question in the last patch in the
> series.
>
> However to hook this into radv or radeonsi will take a bit of
> rewriting of a lot of code that is probably a bit more fragile than
> I'd like for this sort of surgery at this point.
>
> I'd actually suspect if we do want to proceed with this type of
> interface, we might be better doing it all in common mesa code, and
> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
> written here is mostly already doing.

Well, we plan to stop using the BO list ioctl. The interface has
bo_list_handle in it. Will we just set it to 0 when we add the chunk for
the inlined buffer list, i.e. what radeon has?

Marek


Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-11 Thread Marek Olšák
On Tue, Jul 11, 2017 at 11:20 AM, Dave Airlie  wrote:
> On 11 July 2017 at 18:36, Christian König  wrote:
>> Am 11.07.2017 um 08:49 schrieb Dave Airlie:
>>>
>>> On 7 July 2017 at 19:07, Christian König  wrote:

 Hi Dave,

 on first glance that looks rather good to me, but there is one things I
 don't really like and I strongly think Marek will absolutely agree on
 that:
 When we add a new CS function then let's get rid of all this
 abstraction!

 The new function should get an amdgpu_device_handle and a list of chunks
 to
 submit, nothing else.

 When then provide helper functions to generate the chunks out of the
 existing amdgpu_context_handle and amdgpu_bo_list_handle.

 That should be perfectly sufficient and extensible for future additions
 as
 well.
>>>
>>> Sounds tempting, but it's a bit messier than it looks once I started
>>> digging into it.
>>>
>>> The main thing I ran up against is the context sequence mutex protecting
>>> the kernel submissions per context; it would be tricky to figure out why
>>> that is required (should we be submitting from different contexts on
>>> different threads?)
>>
>>
>> The sequence lock is just to keep last_seq up to date and last_seq just
>> exists because of amdgpu_cs_signal_semaphore.
>>
>> We want to get rid of that, so you can drop support for this altogether in
>> the new IOCTL.
>>
>>> I'd prefer to land this then refactor a new interface, I do wonder if
>>> maybe Marek
>>> would prefer just doing this all in Mesa and avoiding these APIs a bit
>>> more :-)
>>>
>>> Once I get the syncobjs in I might look at internally refactoring the
>>> code a bit more,
>>> then a new API.
>>
>>
>> Actually I wanted to propose just to remove the old semaphore API, it was
>> never used by Mesa or any other open source user.
>
> radv uses it right now until we have syncobjs.
>
> So it should hang around.

Dave, this may sound outrageous, but think about it: What would happen
if we removed the old semaphore API? Users would have to get a new
RADV after getting new libdrm_amdgpu. Is that really that big of a
deal? It happens with LLVM all the time and we've managed to cope with
that just fine.

Marek


Re: [PATCH 1/3] drm/amdgpu/atom: fix atom_fw check

2017-07-10 Thread Marek Olšák
On Mon, Jul 10, 2017 at 3:49 PM, Deucher, Alexander <
alexander.deuc...@amd.com> wrote:

> > -Original Message-
> > From: Marek Olšák [mailto:mar...@gmail.com]
> > Sent: Saturday, July 08, 2017 7:08 PM
> > To: Alex Deucher
> > Cc: amd-gfx mailing list; Deucher, Alexander
> > Subject: Re: [PATCH 1/3] drm/amdgpu/atom: fix atom_fw check
> >
> > Hi Alex,
> >
> > This commit causes clock_crystal_freq to be 0 on Raven, demoting
> > OpenGL support from 4.5 to 3.2. Can I revert?
>
> The problem is, it's just reading garbage right now.  It would be better
> to just hardcode the reference clock until I can figure out where the
> reference clock is in atomfirmware.
>

OK. I'll find a way to work around it in Mesa.
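
The workaround will probably amount to a guard like this in the winsys
(a sketch only; the exact fallback value is TBD):

/* If the kernel reports a zero crystal clock, don't silently lose
 * GL 4.5; substitute a non-zero value so timestamp queries stay enabled. */
if (!info->clock_crystal_freq) {
   fprintf(stderr, "amdgpu: clock crystal frequency is 0, "
                   "timestamps will be inaccurate\n");
   info->clock_crystal_freq = 1;
}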

Marek


Re: [PATCH 1/3] drm/amdgpu/atom: fix atom_fw check

2017-07-08 Thread Marek Olšák
Hi Alex,

This commit causes clock_crystal_freq to be 0 on Raven, demoting
OpenGL support from 4.5 to 3.2. Can I revert?

Marek

On Wed, Jul 5, 2017 at 9:51 PM, Alex Deucher  wrote:
> Not all vbios images seem to set the version appropriately.
> Switch the check based on asic type instead.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 15 +--
>  1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> index 365e735..ea3a250 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
> @@ -86,19 +86,6 @@ static bool check_atom_bios(uint8_t *bios, size_t size)
> return false;
>  }
>
> -static bool is_atom_fw(uint8_t *bios)
> -{
> -   uint16_t bios_header_start = bios[0x48] | (bios[0x49] << 8);
> -   uint8_t frev = bios[bios_header_start + 2];
> -   uint8_t crev = bios[bios_header_start + 3];
> -
> -   if ((frev < 3) ||
> -   ((frev == 3) && (crev < 3)))
> -   return false;
> -
> -   return true;
> -}
> -
>  /* If you boot an IGP board with a discrete card as the primary,
>   * the IGP rom is not accessible via the rom bar as the IGP rom is
>   * part of the system bios.  On boot, the system bios puts a
> @@ -455,6 +442,6 @@ bool amdgpu_get_bios(struct amdgpu_device *adev)
> return false;
>
>  success:
> -   adev->is_atom_fw = is_atom_fw(adev->bios);
> +   adev->is_atom_fw = (adev->asic_type >= CHIP_VEGA10) ? true : false;
> return true;
>  }
> --
> 2.5.5
>


Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS

2017-07-07 Thread Marek Olšák
On Fri, Jun 30, 2017 at 8:47 AM, Christian König
 wrote:
> On 30.06.2017 at 04:24, Michel Dänzer wrote:
>>
>> On 29/06/17 07:05 PM, Daniel Vetter wrote:
>>>
>>> On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote:

 On 29/06/17 05:23 PM, Christian König wrote:
>
>> On 29.06.2017 at 04:35, Michel Dänzer wrote:
>>
>> On 29/06/17 08:26 AM, John Brooks wrote:
>>>
>>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote:
>
> Instead of the flag being set in stone at BO creation, set the flag
> when a
> page fault occurs so that it goes somewhere CPU-visible, and clear
> it when
> the BO is requested by the GPU.
>
> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT
> to
> invisible VRAM, where they may promptly generate another page
> fault. When
> BOs are constantly moved back and forth like this, it is highly
> detrimental
> to performance. Only clear the flag on CS if:
>
> - The BO wasn't page faulted for a certain amount of time
> (currently 10
> seconds), and
> - its last page fault didn't occur too soon (currently 500ms) after
> its
> last CS request, or vice versa.
>
> Setting the flag in amdgpu_fault_reserve_notify() also means that
> we can
> remove the loop to restrict lpfn to the end of visible VRAM,
> because
> amdgpu_ttm_placement_init() will do it for us.

 I'm fine with the general approach, but I'm still absolutely not
 keen about
 clearing the flag when userspace has originally specified it.
>>
>> Is there any specific concern you have about that?
>
> Yeah, quite a bunch actually. We want to use this flag for P2P buffer
> sharing in the future as well and I don't intent to add another one
> like
> CPU_ACCESS_REALLY_REQUIRED or something like this.

 Won't a BO need to be pinned while it's being shared with another
 device?
>>>
>>> That's an artifact of the current kernel implementation, I think we could
>>> do better (but for current use-cases where we share a bunch of scanouts
>>> and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this
>>> never
>>> changing.
>>
>> Surely there will need to be some kind of transaction though to let the
>> driver know when a BO starts/stops being shared with another device?
>> Either via the existing dma-buf callbacks, or something similar. We
>> can't rely on userspace setting a "CPU access" flag to make sure a BO
>> can be shared with other devices?
>
>
> Well, the flag was never intended to be used by userspace.
>
> See the history was more like we need something in the kernel to place the
> BO in CPU accessible VRAM.
>
> Then the closed source UMD came along and said hey we have the concept of
> two different heaps for visible and invisible VRAM, how does that map to
> amdgpu?

Mesa stopped using CPU_ACCESS_REQUIRED a couple of days ago.
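
For reference, the clearing heuristic John describes above boils down to
something like this (a sketch; the timestamp fields and the helper are made
up, the constants come from the quoted commit message):

#define CPU_ACCESS_IDLE_PERIOD  msecs_to_jiffies(10 * 1000) /* "10 seconds" */
#define FAULT_CS_WINDOW         msecs_to_jiffies(500)       /* "500ms" */

static bool amdgpu_bo_can_clear_cpu_access(struct amdgpu_bo *bo)
{
        unsigned long fault = bo->last_page_fault; /* hypothetical, jiffies */
        unsigned long cs    = bo->last_cs_use;     /* hypothetical, jiffies */

        /* The BO must not have page-faulted recently... */
        if (time_before(jiffies, fault + CPU_ACCESS_IDLE_PERIOD))
                return false;
        /* ...and its faults and CS use must not closely follow each other. */
        if (max(fault, cs) - min(fault, cs) < FAULT_CS_WINDOW)
                return false;
        return true;
}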

Marek


Re: Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-07-04 Thread Marek Olšák
On Tue, Jul 4, 2017 at 10:09 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 03/07/17 10:03 PM, Marek Olšák wrote:
>> On Mon, Jul 3, 2017 at 12:08 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 30/06/17 08:43 PM, Marek Olšák wrote:
>>>>
>>>> I don't know what is being talked about here anymore, but I wouldn't
>>>> like to use CPU_ACCESS_REQUIRED or CPU_ACCESS_REALLY_REQUIRED in
>>>> userspace. The reason is that userspace doesn't and can't know whether
>>>> CPU access will be required, and the frequency at which it will be
>>>> required. 3 heaps {no CPU access, no flag, CPU access required} are
>>>> too many. Userspace mostly doesn't use the "no flag" heap for VRAM. It
>>>> uses "CPU access required" for almost everything except tiled
>>>> textures, which use "no CPU access".
>>>
>>> FWIW, the difference between setting CPU_ACCESS_REQUIRED and not setting
>>> it for a BO created in VRAM will be: If it's set, the BO is initially
>>> created in CPU visible VRAM, otherwise it's most likely created in CPU
>>> invisible VRAM.
>>>
>>> If userspace knows that a BO will likely be accessed by the CPU first,
>>> setting the flag could save a move from CPU invisible to CPU visible
>>> VRAM when the CPU access happens. Conversely, if a BO will likely never
>>> be accessed by the CPU, not setting the flag may reduce pressure on CPU
>>> visible VRAM.
>>>
>>> Not sure radeonsi can make this distinction though.
>>
>> It can't.
>
> Not sure why, but anyway.
>
>
>> Either all mappable BOs set CPU_ACCESS_REQUIRED, or all mappable BOs
>> don't set it.
>
> FWIW, with kernels without John's fixes, the flag shouldn't be set for
> all such BOs, because it would prevent them from ever going to CPU
> invisible VRAM.
>
> With John's fixes, one possibility would be to set the flag based on the
> existing heuristics or some variant, and use it when a BO is newly
> created from the kernel, but to ignore the flag when reusing a BO from
> the cache (with John's fixes, whether or not the flag was set when a BO
> was created won't directly affect its state WRT the flag when it's
> reused from the cache).

Thanks. Based on your comments, I've come to the conclusion that the
userspace shouldn't set CPU_ACCESS_REQUIRED, because the only other
alternative (always setting the flag) is worse.

Marek


Re: Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-07-03 Thread Marek Olšák
On Mon, Jul 3, 2017 at 12:08 PM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 30/06/17 08:43 PM, Marek Olšák wrote:
>>
>> I don't know what is being talked about here anymore, but I wouldn't
>> like to use CPU_ACCESS_REQUIRED or CPU_ACCESS_REALLY_REQUIRED in
>> userspace. The reason is that userspace doesn't and can't know whether
>> CPU access will be required, and the frequency at which it will be
>> required. 3 heaps {no CPU access, no flag, CPU access required} are
>> too many. Userspace mostly doesn't use the "no flag" heap for VRAM. It
>> uses "CPU access required" for almost everything except tiled
>> textures, which use "no CPU access".
>
> FWIW, the difference between setting CPU_ACCESS_REQUIRED and not setting
> it for a BO created in VRAM will be: If it's set, the BO is initially
> created in CPU visible VRAM, otherwise it's most likely created in CPU
> invisible VRAM.
>
> If userspace knows that a BO will likely be accessed by the CPU first,
> setting the flag could save a move from CPU invisible to CPU visible
> VRAM when the CPU access happens. Conversely, if a BO will likely never
> be accessed by the CPU, not setting the flag may reduce pressure on CPU
> visible VRAM.
>
> Not sure radeonsi can make this distinction though.

It can't.

Either all mappable BOs set CPU_ACCESS_REQUIRED, or all mappable BOs
don't set it. Either way, there is only one combination of flags for
mappable BOs in VRAM, and therefore only one kind of behavior the
kernel can follow.

>
>
>> I've been trying to trim down the number of heaps. So far, I have:
>> - VRAM_NO_CPU_ACCESS (implies WC)
>> - VRAM (implies WC)
>> - VRAM_GTT (combined, implies WC)
>
> Is this useful? It means:
>
> * The BO may be created in VRAM, or if there's no space, in GTT.
> * Once the BO is in GTT for any reason, it will never go back to VRAM.
>
> Such BOs will tend to end up in GTT after some time, at the latest after
> suspend/resume.
>
> I think it would be better for radeonsi to choose either VRAM or GTT as
> the preferred domain, and let the kernel handle it.

Currently, radeonsi on amdgpu doesn't use VRAM_GTT with the current kernel.

I'm aware of the limited usefulness.

Marek


Re: Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-06-30 Thread Marek Olšák
On Fri, Jun 30, 2017 at 12:34 PM, Christian König
 wrote:
> On 30.06.2017 at 09:14, Michel Dänzer wrote:
>>
>> On 30/06/17 03:59 PM, Christian König wrote:
>>>
>>> Am 30.06.2017 um 08:51 schrieb Michel Dänzer:

 We can deal with that internally in the kernel, while fixing the
 existing flag for userspace.
>>>
>>> And as I said, NAK to that approach. I'm not going to add a
>>> CPU_ACCESS_REALLY_REQUIRED flag in the kernel just because Mesa has
>>> messed up its use case.
>>>
>>> We could agree on filtering that flag from userspace when BOs are
>>> created and/or map it to a CREATE_CPU_ACCESS_HINT flag.
>>
>> Then I propose the following:
>>
>> One patch:
>>
>> Convert AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED to a kernel internal flag
>> AMDGPU_GEM_CPU_ACCESS_HINT in amdgpu_gem_create_ioctl, which is
>> initially treated the same way as AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED.
>>
>> Another patch:
>>
>> Change the treatment of AMDGPU_GEM_CPU_ACCESS_HINT according to John's
>> patch 4 in the latest series, or a variation of that as discussed on IRC.
>>
>>
>> If any regressions are reported, we will be able to differentiate
>> whether they are due to the addition of the new flag itself or due to
>> the change in its handling.
>
>
>
> It just occurred to me that there is a simpler way of handling this: We just
> never clear the flag on kernel allocations.
>
> See my main concern are the in kernel users of the flag which use it as
> guarantee that the BO is CPU accessible.
>
> If we handle those specially there shouldn't be a problem clearing the flag
> for the UMD BOs.

Hi,

I don't know what is being talked about here anymore, but I wouldn't
like to use CPU_ACCESS_REQUIRED or CPU_ACCESS_REALLY_REQUIRED in
userspace. The reason is that userspace doesn't and can't know whether
CPU access will be required, and the frequency at which it will be
required. 3 heaps {no CPU access, no flag, CPU access required} are
too many. Userspace mostly doesn't use the "no flag" heap for VRAM. It
uses "CPU access required" for almost everything except tiled
textures, which use "no CPU access".

I've been trying to trim down the number of heaps. So far, I have:
- VRAM_NO_CPU_ACCESS (implies WC)
- VRAM (implies WC)
- VRAM_GTT (combined, implies WC)
- GTT_WC
- GTT

See, you can't forbid CPU access for the combined VRAM_GTT heap. It's
one of the compromises there.

The more heaps we have, the more memory can be wasted by
suballocators. It's silly to have more than 3 suballocators just for
VRAM.
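
In enum terms the trimmed-down set maps onto radeon_heap from the RFC
patches above like this (comments summarize the implied flags):

enum radeon_heap {
   RADEON_HEAP_VRAM_NO_CPU_ACCESS, /* implies GTT_WC + NO_CPU_ACCESS */
   RADEON_HEAP_VRAM,               /* implies GTT_WC */
   RADEON_HEAP_VRAM_GTT,           /* implies GTT_WC; CPU access stays allowed */
   RADEON_HEAP_GTT_WC,             /* write-combined GTT */
   RADEON_HEAP_GTT,                /* cached GTT */
};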

Marek


Re: Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-06-29 Thread Marek Olšák
Do you have any concern if we also stop using the CPU_ACCESS flag on radeon?

Thanks,
Marek

On Thu, Jun 29, 2017 at 4:51 PM, Christian König
<deathsim...@vodafone.de> wrote:
> Yeah, I was thinking something similar.
>
> See the intention behind CPU_ACCESS_REQUIRED is to always guarantee that CPU
> access is immediately possible.
>
> If you ask me that is not really useful for the UMD and was never meant to
> be used by Mesa (only the closed source UMD and some kernel internal use
> cases).
>
> I would like to keep the behavior in the kernel driver as it is, but we
> should really stop using this as a hint in Mesa.
>
> Regards,
> Christian.
>
>
> On 29.06.2017 at 16:41, Marek Olšák wrote:
>>
>> Hi,
>>
>> Given how our memory manager works and the guesswork that UMDs have to
>> do to determine whether to set the flag, I think the flag isn't
>> useful.
>>
>> I'm proposing that CPU_ACCESS_REQUIRED:
>> - will be deprecated.
>> - It will remain to be accepted by the kernel driver, but it will
>> either not have any effect, or it will serve as a hint that might or
>> might not be followed.
>> - The only flag that UMDs are expected to set with regard to CPU
>> access is NO_CPU_ACCESS.
>>
>> The main motivation is the reduction of "virtual" heaps for UMD buffer
>> suballocators and reusable buffer pools. A higher number of heaps
>> means that more memory can be wasted by UMDs.
>>
>> Opinions?
>>
>> Thanks,
>> Marek
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Deprecation of AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED

2017-06-29 Thread Marek Olšák
Hi,

Given how our memory manager works and the guesswork that UMDs have to
do to determine whether to set the flag, I think the flag isn't
useful.

I'm proposing that CPU_ACCESS_REQUIRED:
- will be deprecated.
- It will still be accepted by the kernel driver, but it will
either not have any effect, or it will serve as a hint that might or
might not be followed.
- The only flag that UMDs are expected to set with regard to CPU
access is NO_CPU_ACCESS.
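
As a sketch, the "accepted but ignored" variant could be as simple as
masking the flag off early during BO creation (illustration only, not a
patch; the helper is made up):

/* Sketch: demote CPU_ACCESS_REQUIRED from a guarantee to (at most) a
 * hint by stripping it from the userspace-provided creation flags. */
static u64 filter_umd_flags_sketch(u64 flags)
{
	/* Accept the flag from old userspace, but drop the hard requirement. */
	return flags & ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
}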

The main motivation is the reduction of "virtual" heaps for UMD buffer
suballocators and reusable buffer pools. A higher number of heaps
means that more memory can be wasted by UMDs.

Opinions?

Thanks,
Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 6/9] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS

2017-06-26 Thread Marek Olšák
On Mon, Jun 26, 2017 at 11:27 AM, Michel Dänzer  wrote:
> On 25/06/17 03:00 AM, Christian König wrote:
>> Am 23.06.2017 um 19:39 schrieb John Brooks:
>>> When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by
>>> userspace,
>>> it should only be treated as a hint to initially place a BO somewhere CPU
>>> accessible, rather than having a permanent effect on BO placement.
>>
>> And that is a clear NAK from my side.
>>
>> CPU_ACCESS_REQUIRED is a permanent limitation to where the buffer should
>> be placed.
>
> It really can't be more than a hint. The userspace driver cannot
> reliably know ahead of time whether a BO will be accessed by the CPU at
> all, let alone how often. A BO which incorrectly has this flag set
> creates artificial pressure on CPU visible VRAM.

I also think the flag is only a hint and shouldn't be taken too seriously.

Only AMDGPU_GEM_CREATE_NO_CPU_ACCESS has a strict behavior.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-23 Thread Marek Olšák
On Fri, Jun 23, 2017 at 3:45 PM, axie  wrote:
> Hi Marek,
>
> I understand you spent time on your original logic too. I really don't
> understand why you talked about pain if somebody can improve it.
>
> To reduce the pain, now I am seriously considering dropping this patch. But
> please read on before you conclude. Let us treat open source software
> development as fun.
>
> The same trick as in this patch can be found in the open source Intel GPU
> driver and in xfs.
>
> Talking about code size: you will be surprised if you really calculate it.
>
> For function amdgpu_cs_get_threshold_for_moves:
> New code:  It is 2 more loops and 2 more ifs.
> Old code:  2 spinlock inline functions. spin_lock can expand to 5 function
> calls and one if. One function call contains 7 parameters.
> spin_unlock can expand to 4 function calls.
>
> By the way, you can configure the Linux kernel to disable some spinlock
> macro expansion. But I don't think people really do that.
>
> In function amdgpu_cs_report_moved_bytes
> New code:  zero
> Old code:  2 spinlock inline functions.
>
> In Total:
>
> New code:  It is 2 more loops and 2 more ifs. Maybe there are one or two
> other tiny things.
> Old code:  4 spinlock inline functions. They are expanded to 18 function
> calls. Among them, two function calls each contain 7 parameters.
>
> Please think about it. Are you still sure that the new code makes the code
> size bigger? Now what is the next problem with the new change?

The code size means the size of source code, not binary. The fewer
lines of amdgpu code that we need to get the job done, the better.

The next issue is the risk of breaking this already hard-to-test code.

Your logic would be OK if there was a measurable benefit even with the
silliest microbenchmark you can find (and even if it were as low as
0.2% improvement). Without that, I can't accept it. Sorry.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-23 Thread Marek Olšák
On Fri, Jun 23, 2017 at 1:55 PM, Zhou, David(ChunMing)
<david1.z...@amd.com> wrote:
>
> ____
> From: Marek Olšák [mar...@gmail.com]
> Sent: Friday, June 23, 2017 6:49 PM
> To: Christian König
> Cc: Zhou, David(ChunMing); Xie, AlexBin; amd-gfx@lists.freedesktop.org; Xie, 
> AlexBin
> Subject: Re: [PATCH 1/3] drm/amdgpu: fix a typo
>
> On Fri, Jun 23, 2017 at 11:27 AM, Christian König
> <deathsim...@vodafone.de> wrote:
>> Am 23.06.2017 um 11:08 schrieb zhoucm1:
>>>
>>>
>>>
>>> On 2017年06月23日 17:01, zhoucm1 wrote:
>>>>
>>>>
>>>>
>>>> On 2017年06月23日 16:25, Christian König wrote:
>>>>>
>>>>> Am 23.06.2017 um 09:09 schrieb zhoucm1:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2017年06月23日 14:57, Christian König wrote:
>>>>>>>
>>>>>>> But giving the CS IOCTL an option for directly specifying the BOs
>>>>>>> instead of a BO list like Marek suggested would indeed save us some time
>>>>>>> here.
>>>>>>
>>>>>> interesting, I always follow how to improve our cs ioctl, since UMD
>>>>>> guys often complain our command submission is slower than windows.
>>>>>> Then how to directly specify the BOs instead of a BO list? BO handle
>>>>>> array from UMD? Could you guys describe it more clearly? Is it doable?
>>>>>
>>>>>
>>>>> Making the BO list part of the CS IOCTL wouldn't help at all for the
>>>>> closed source UMDs. To be precise, we actually came up with the BO list
>>>>> approach because of their requirement.
>>>>>
>>>>> The biggest bunch of work during CS is reserving all the buffers,
>>>>> validating them and checking their VM status.
>>>>
>>>> Totally agree. Every time I read the code there, I want to
>>>> optimize it.
>>>>
>>>>> It doesn't matter if the BOs come from the BO list or directly in the CS
>>>>> IOCTL.
>>>>>
>>>>> The key point is that CS overhead is pretty much irrelevant for the open
>>>>> source stack, since Mesa does command submission from a separate thread
>>>>> anyway.
>>>>
>>>> If it's irrelevant for the open stack, then how does the open source stack handle
>>>> "The biggest bunch of work during CS is reserving all the buffers,
>>>> validating them and checking their VM status."?
>>
>>
>> Command submission on the open stack is outsourced to a separate user space
>> thread. E.g. when an application triggers a flush the IBs created so far are
>> just put on a queue and another thread pushes them down to the kernel.
>>
>> I mean reducing the overhead of the CS IOCTL is always nice, but you
>> usually won't see any fps increase unless all CPUs are completely bound
>> to some tasks.
>>
>>>> If the open stack has a better way, I think the closed stack can
>>>> follow it; I don't know the history.
>>>
>>> Do you not use bo list at all in mesa? radv as well?
>>
>>
>> I don't think so. Mesa just wants to send the list of used BOs down to the
>> kernel with every IOCTL.
>
> The CS ioctl actually costs us some performance, but not as much as on
> closed source drivers.
>
> MesaGL always executes all CS ioctls in a separate thread (in parallel
> with the UMD) except for the last IB that's submitted by SwapBuffers.
> SwapBuffers requires that all IBs have been submitted when SwapBuffers
> returns. For example, if you have 5 IBs per frame, 4 of them are
> executed on the thread and the overhead is hidden. The last one is
> executed on the thread too, but this time the Mesa driver has to wait
> for it. For things like glxgears with only 1 IB per frame, the thread
> doesn't hide anything and Mesa always has to wait for it after
> submission, just because of SwapBuffers.
>
> Having 10 or more IBs per frame is great, because 9 are done in
> parallel and the last one is synchronous. The final CPU cost is 10x
> lower, but it's not zero.
> [DZ] Thanks Marek, this is a very useful and helpful message for
> understanding Mesa's CS behavior. I will talk to the closed-source guys
> to see if it can be used for them.
> Another thing I want to confirm with you: do you know if radv submits
> CS this way as well?
>
> For us, it's certainly useful to optimize the CS ioctl because of apps
> that submit only 1 IB per frame where multithreading has no effect or
> may even hurt performance.
>
> The most obvious inefficiency is the BO_LIST ioctl that is completely
> unnecessary and only slows us down. What we need is exactly what
> radeon does.
>
> [DZ] I don't know how radeon handles the BO list, could you describe it as well?

Inputs for the following ioctls are:

AMDGPU: BO_LIST:
- list of BOs

AMDGPU: CS
- list of IBs
- BO list handle

RADEON: CS
- one IB
- list of BOs

Ideal solution for a new amdgpu CS ioctl:
- list of IBs
- list of BOs
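
A rough sketch of what that combined input could look like (all names
are hypothetical, made up for illustration, and not actual amdgpu UAPI):

/* Hypothetical sketch of a combined CS ioctl input. */
struct sketch_amdgpu_cs_in {
	__u32 ctx_id;
	__u32 num_ibs;        /* list of IBs, as today */
	__u64 ibs_ptr;        /* user pointer to the IB descriptors */
	__u32 num_bo_handles; /* inline BO list replacing the BO_LIST ioctl */
	__u32 pad;
	__u64 bo_handles_ptr; /* user pointer to an array of GEM handles */
};

With the handles passed inline, the kernel can build the validation list
in the same ioctl and the extra round trip disappears.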

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-23 Thread Marek Olšák
On Fri, Jun 23, 2017 at 3:01 PM, Christian König
<deathsim...@vodafone.de> wrote:
> The key point here is while optimizing this is nice the much bigger pile is
> the locking done for each BO.
>
> In other words even when we optimize all the other locks involved into
> atomics or RCU, the BO reservation lock will still dominate everything.
>
> One possible solution to this would be per process resources like I
> suggested multiple times now.

Mesa can set a per-process resource flag on all resources except
displayable ones. The question is, would it help if an IB contained
1000 per-process resources and 1-2 inter-process sharable ones?

Marek

>
> Christian.
>
>
> Am 23.06.2017 um 13:37 schrieb Marek Olšák:
>>
>> I agree with you about the spinlock. You seem to be good at this.
>>
>> It's always good to do measurements to validate that a code change
>> improves something, especially when the code size and code complexity
>> has to be increased. A CPU profiler such as sysprof can show you
>> improvements on the order of 1/10000th = 0.01% if you record enough
>> samples. Sometimes you have to un-inline a function to make it visible
>> there. If you see a function that takes 0.3% of CPU time and you
>> optimize it down to 0.1% using the profiler as the measurement tool,
>> you have evidence that the improvement is there and nobody can reject
>> the idea anymore. It also proves that the code size increase is worth
>> it. It's always "added code size and loss of simplicity" vs benefit.
>> It's a transaction. You trade one for the other. You lose something to
>> get something else. OK, we know the code complexity. Now, what's the
>> benefit? Can you do some measurements? The accuracy of 1/10000th
>> should be enough for anybody.
>>
>> I know the feeling when you spend many days working on something,
>> adding 100s or 1000s of lines of code, solving many problems to get
>> there and increasing code complexity significantly, and then you do
>> the measurement and it doesn't improve anything. I know the feeling
>> very well. It sucks. The frustration comes from the investment of time
>> and getting no return on the investment. Many frustrations in life are
>> like that.
>>
>> Marek
>>
>>
>> On Fri, Jun 23, 2017 at 4:23 AM, axie <a...@amd.com> wrote:
>>>
>>> Hi Marek,
>>>
>>>
>>> So do you agree that spinlock disables CPU preemption, contrary to your
>>> original idea?
>>>
>>>
>>> If you have a new reason why this patch does not improve things, please
>>> speak out.
>>>
>>>
>>> Many patches in the GPU driver aim at improving performance and power
>>> efficiency. Do most patches submitted to AMDGPU require benchmarking
>>> first?
>>>
>>> If all developers are required to always answer your questions during
>>> code review, I am afraid that most open source community developers
>>> cannot meet that requirement and will stop working on AMDGPU.
>>>
>>>
>>> To improve performance, there are many bottlenecks to clear. When the
>>> last several bottlenecks are cleared, the performance gain will show
>>> more significantly.
>>>
>>> My past profiling experience told me that clearing a lock can improve
>>> performance for some drivers by anywhere from 0.3% to a much bigger
>>> percentage. It depends on many factors, even on the application itself.
>>>
>>>
>>> This is not the first bottleneck fixed. This is surely not the last one.
>>>
>>>
>>> Thanks,
>>>
>>> Alex Bin
>>>
>>>
>>>
>>> On 2017-06-22 07:54 PM, Marek Olšák wrote:
>>>>
>>>> That's all nice, but does it improve performance? Have you been able
>>>> to measure some performance difference with that code? Were you
>>>> targeting a specific inefficiency you had seen e.g. with a CPU
>>>> profiler?
>>>>
>>>> Marek
>>>>
>>>> On Thu, Jun 22, 2017 at 8:19 PM, axie <a...@amd.com> wrote:
>>>>>
>>>>> To clarify, local IRQ is disabled by calling raw_local_irq_save(flags);
>>>>>
>>>>> Function __lock_acquire double checks that the local IRQ is really
>>>>> disabled.
>>>>>
>>>>>
>>>>>
>>>>> On 2017-06-22 01:34 PM, axie wrote:
>>>>>>
>>>>>> Hi Marek,
>>>>>>
>>>>>> 

Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-23 Thread Marek Olšák
I agree with you about the spinlock. You seem to be good at this.

It's always good to do measurements to validate that a code change
improves something, especially when the code size and code complexity
has to be increased. A CPU profiler such as sysprof can show you
improvements on the order of 1/10000th = 0.01% if you record enough
samples. Sometimes you have to un-inline a function to make it visible
there. If you see a function that takes 0.3% of CPU time and you
optimize it down to 0.1% using the profiler as the measurement tool,
you have evidence that the improvement is there and nobody can reject
the idea anymore. It also proves that the code size increase is worth
it. It's always "added code size and loss of simplicity" vs benefit.
It's a transaction. You trade one for the other. You lose something to
get something else. OK, we know the code complexity. Now, what's the
benefit? Can you do some measurements? The accuracy of 1/10000th
should be enough for anybody.

I know the feeling when you spend many days working on something,
adding 100s or 1000s of lines of code, solving many problems to get
there and increasing code complexity significantly, and then you do
the measurement and it doesn't improve anything. I know the feeling
very well. It sucks. The frustration comes from the investment of time
and getting no return on the investment. Many frustrations in life are
like that.

Marek


On Fri, Jun 23, 2017 at 4:23 AM, axie <a...@amd.com> wrote:
> Hi Marek,
>
>
> So do you agree that spinlock disables CPU preemption, contrary to your
> original idea?
>
>
> If you have a new reason why this patch does not improve things, please speak out.
>
>
> Many patches in the GPU driver aim at improving performance and power
> efficiency. Do most patches submitted to AMDGPU require benchmarking
> first?
>
> If all developers are required to always answer your questions during code
> review, I am afraid that most open source community developers cannot meet
> that requirement and will stop working on AMDGPU.
>
>
> To improve performance, there are many bottlenecks to clear. When the last
> several bottlenecks are cleared, the performance gain will show more
> significantly.
>
> My past profiling experience told me that clearing a lock can improve
> performance for some drivers by anywhere from 0.3% to a much bigger
> percentage. It depends on many factors, even on the application itself.
>
>
> This is not the first bottleneck fixed. This is surely not the last one.
>
>
> Thanks,
>
> Alex Bin
>
>
>
> On 2017-06-22 07:54 PM, Marek Olšák wrote:
>>
>> That's all nice, but does it improve performance? Have you been able
>> to measure some performance difference with that code? Were you
>> targeting a specific inefficiency you had seen e.g. with a CPU
>> profiler?
>>
>> Marek
>>
>> On Thu, Jun 22, 2017 at 8:19 PM, axie <a...@amd.com> wrote:
>>>
>>> To clarify, local IRQ is disabled by calling raw_local_irq_save(flags);
>>>
>>> Function __lock_acquire double checks that the local IRQ is really
>>> disabled.
>>>
>>>
>>>
>>> On 2017-06-22 01:34 PM, axie wrote:
>>>>
>>>> Hi Marek,
>>>>
>>>> Spin lock and spin unlock are fast. But they are not so fast compared
>>>> with an atomic, which is a single CPU instruction on x86.
>>>>
>>>>
>>>> 1. spinlock does NOT allow preemption on the local CPU. Let us have a
>>>> look at how spin lock is implemented.
>>>>
>>>> static inline void __raw_spin_lock(raw_spinlock_t *lock)
>>>> {
>>>>  preempt_disable(); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<--This is
>>>> memory barrier operation too.
>>>>  spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>>>>  LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>>>> }
>>>>
>>>> 2. A function __lock_acquire is called by spinlock. The function is so
>>>> long that I will not attach all of it here.
>>>>
>>>> There is an atomic operation inside, plus 12 metadata updates and 14 if
>>>> statements, and it calls quite a few other functions.
>>>>
>>>> Note that it disables IRQs...
>>>>
>>>> static int __lock_acquire(struct lockdep_map *lock, unsigned int
>>>> subclass,
>>>>int trylock, int read, int check, int hardirqs_off,
>>>>struct lockdep_map *nest_lock, unsigned long ip,
>>>&

Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-23 Thread Marek Olšák
On Fri, Jun 23, 2017 at 11:27 AM, Christian König
 wrote:
> Am 23.06.2017 um 11:08 schrieb zhoucm1:
>>
>>
>>
>> On 2017年06月23日 17:01, zhoucm1 wrote:
>>>
>>>
>>>
>>> On 2017年06月23日 16:25, Christian König wrote:

 Am 23.06.2017 um 09:09 schrieb zhoucm1:
>
>
>
> On 2017年06月23日 14:57, Christian König wrote:
>>
>> But giving the CS IOCTL an option for directly specifying the BOs
>> instead of a BO list like Marek suggested would indeed save us some time
>> here.
>
> interesting, I always follow how to improve our cs ioctl, since UMD
> guys often complain our command submission is slower than windows.
> Then how to directly specify the BOs instead of a BO list? BO handle
> array from UMD? Could you guys describe it more clearly? Is it doable?


 Making the BO list part of the CS IOCTL wouldn't help at all for the
 closed source UMDs. To be precise, we actually came up with the BO list
 approach because of their requirement.

 The biggest bunch of work during CS is reserving all the buffers,
 validating them and checking their VM status.
>>>
>>> Totally agree. Every time I read the code there, I want to
>>> optimize it.
>>>
 It doesn't matter if the BOs come from the BO list or directly in the CS
 IOCTL.

 The key point is that CS overhead is pretty much irrelevant for the open
 source stack, since Mesa does command submission from a separate thread
 anyway.
>>>
>>> If it's irrelevant for the open stack, then how does the open source stack handle
>>> "The biggest bunch of work during CS is reserving all the buffers,
>>> validating them and checking their VM status."?
>
>
> Command submission on the open stack is outsourced to a separate user space
> thread. E.g. when an application triggers a flush the IBs created so far are
> just put on a queue and another thread pushes them down to the kernel.
>
> I mean reducing the overhead of the CS IOCTL is always nice, but you
> usually won't see any fps increase unless all CPUs are completely bound
> to some tasks.
>
>>> If the open stack has a better way, I think the closed stack can
>>> follow it; I don't know the history.
>>
>> Do you not use bo list at all in mesa? radv as well?
>
>
> I don't think so. Mesa just wants to send the list of used BOs down to the
> kernel with every IOCTL.

The CS ioctl actually costs us some performance, but not as much as on
closed source drivers.

MesaGL always executes all CS ioctls in a separate thread (in parallel
with the UMD) except for the last IB that's submitted by SwapBuffers.
SwapBuffers requires that all IBs have been submitted when SwapBuffers
returns. For example, if you have 5 IBs per frame, 4 of them are
executed on the thread and the overhead is hidden. The last one is
executed on the thread too, but this time the Mesa driver has to wait
for it. For things like glxgears with only 1 IB per frame, the thread
doesn't hide anything and Mesa always has to wait for it after
submission, just because of SwapBuffers.

Having 10 or more IBs per frame is great, because 9 are done in
parallel and the last one is synchronous. The final CPU cost is 10x
lower, but it's not zero.
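
The pattern boils down to something like this (a simplified sketch;
queue_push/queue_wait_idle are hypothetical stand-ins for what Mesa
actually does with its util_queue helper):

/* Sketch of the threaded submission described above. */
static void flush_ib_sketch(struct submit_queue *q, struct cs_job *job,
                            bool called_from_swapbuffers)
{
	/* A worker thread pops jobs and calls the CS ioctl. */
	queue_push(q, job);

	/* Only SwapBuffers must guarantee that everything reached the
	 * kernel, so only the frame's last IB makes the app thread wait. */
	if (called_from_swapbuffers)
		queue_wait_idle(q);
}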

For us, it's certainly useful to optimize the CS ioctl because of apps
that submit only 1 IB per frame where multithreading has no effect or
may even hurt performance.

The most obvious inefficiency is the BO_LIST ioctl that is completely
unnecessary and only slows us down. What we need is exactly what
radeon does.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-22 Thread Marek Olšák
That's all nice, but does it improve performance? Have you been able
to measure some performance difference with that code? Were you
targeting a specific inefficiency you had seen e.g. with a CPU
profiler?

Marek

On Thu, Jun 22, 2017 at 8:19 PM, axie <a...@amd.com> wrote:
> To clarify, local IRQ is disabled by calling raw_local_irq_save(flags);
>
> Function __lock_acquire double checks that the local IRQ is really disabled.
>
>
>
> On 2017-06-22 01:34 PM, axie wrote:
>>
>> Hi Marek,
>>
>> Spin lock and spin unlock are fast. But they are not so fast compared
>> with an atomic, which is a single CPU instruction on x86.
>>
>>
>> 1. spinlock does NOT allow preemption on the local CPU. Let us have a
>> look at how spin lock is implemented.
>>
>> static inline void __raw_spin_lock(raw_spinlock_t *lock)
>> {
>> preempt_disable(); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<--This is
>> memory barrier operation too.
>> spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
>> LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
>> }
>>
>> 2. A function __lock_acquire is called by spinlock. The function is so long
>> that I will not attach all of it here.
>>
>> There is an atomic operation inside, plus 12 metadata updates and 14 if
>> statements, and it calls quite a few other functions.
>>
>> Note that it disables IRQs...
>>
>> static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>   int trylock, int read, int check, int hardirqs_off,
>>   struct lockdep_map *nest_lock, unsigned long ip,
>>   int references, int pin_count)
>> {
>> struct task_struct *curr = current;
>> struct lock_class *class = NULL;
>> struct held_lock *hlock;
>> unsigned int depth;
>> int chain_head = 0;
>> int class_idx;
>> u64 chain_key;
>>
>> if (unlikely(!debug_locks))
>> return 0;
>>
>> /*
>>  * Lockdep should run with IRQs disabled, otherwise we could
>>  * get an interrupt which would want to take locks, which would
>>  * end up in lockdep and have you got a head-ache already?
>>  */
>> if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) <<<<<<<<<<<<<<
>> return 0;
>>
>> 
>>
>> 3. Another function called by spinlock in a higher level:
>>
>> void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>
>>   int trylock, int read, int check,
>>   struct lockdep_map *nest_lock, unsigned long ip)
>> {
>> unsigned long flags;
>>
>> if (unlikely(current->lockdep_recursion))
>> return;
>>
>> raw_local_irq_save(flags);
>> check_flags(flags);
>>
>> current->lockdep_recursion = 1;
>> trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock,
>> ip);
>> __lock_acquire(lock, subclass, trylock, read, check,
>>irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
>> current->lockdep_recursion = 0;
>> raw_local_irq_restore(flags);
>> }
>>
>>
>> Thanks,
>>
>> Alex Bin
>>
>>
>> On 2017-06-22 12:27 PM, Marek Olšák wrote:
>>>
>>> On Thu, Jun 22, 2017 at 5:33 PM, Xie, AlexBin <alexbin@amd.com>
>>> wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>>
>>>> In fact, the change from spinlock to atomic is quite painful. When I
>>>> started, I thought it was easy, but later I found there might be race
>>>> conditions here and there. Now I think the change looks more robust. In
>>>> the kernel source, several other drivers use the same trick.
>>>>
>>>>
>>>> On the other hand, I think the logic itself might be optimized
>>>> considering the locking. But I had spent quite some effort to maintain
>>>> the original logic.
>>>
>>> It seems quite complicated and I don't know if there is any
>>> performance benefit. Spinlocks are nice because they allow preemption.
>>>
>>> It would be more interesting to merge the CS and BO_LIST ioctls into one.
>>>
>>> Marek
>>
>>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: fix a typo

2017-06-22 Thread Marek Olšák
On Thu, Jun 22, 2017 at 5:33 PM, Xie, AlexBin  wrote:
> Hi Christian,
>
>
> In fact, the change from spinlock to atomic is quite painful. When I
> started, I thought it was easy, but later I found there might be race
> conditions here and there. Now I think the change looks more robust. In
> the kernel source, several other drivers use the same trick.
>
>
> On the other hand, I think the logic itself might be optimized considering
> the locking. But I had spent quite some effort to maintain the original logic.

It seems quite complicated and I don't know if there is any
performance benefit. Spinlocks are nice because they allow preemption.

It would be more interesting to merge the CS and BO_LIST ioctls into one.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: GART write flush error on SI w/ amdgpu

2017-06-20 Thread Marek Olšák
On Tue, Jun 20, 2017 at 1:46 PM, Christian König
<deathsim...@vodafone.de> wrote:
> Am 20.06.2017 um 12:34 schrieb Marek Olšák:
>>
>> BTW, I noticed the flush sequence in the kernel is wrong. The correct
>> flush sequence should be:
>>
>> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
>> but no fence/interrupt.
>> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
>> 3) SURFACE_SYNC (TC, K$, I$)
>> 4) Write CP_COHER_CNTL2.
>> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the
>> interrupt.
>>
>> WAIT_REG_MEM wouldn't be needed if we were able to merge
>> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
>> event.
>>
>> The main issue with the current flush sequence in radeon and amdgpu is
>> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
>> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
>> idle in userspace IBs.
>
>
> Well, not waiting for idle between IBs is an explicit requirement, because it
> is rather bad for performance to do so.
>
> David Zhou, Monk and I worked quite a lot on this to avoid both possible
> hazards and performance drops.

I guess the requirement was ignored for SI. If you don't do the TC
flush as part of the EOP event, you have to wait for idle before
SURFACE_SYNC, because SURFACE_SYNC doesn't wait for idle. It's kinda
useless to flush TC when shaders are still in flight.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: GART write flush error on SI w/ amdgpu

2017-06-20 Thread Marek Olšák
On Tue, Jun 20, 2017 at 1:49 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
> On 20.06.2017 12:34, Marek Olšák wrote:
>>
>> BTW, I noticed the flush sequence in the kernel is wrong. The correct
>> flush sequence should be:
>>
>> 1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
>> but no fence/interrupt.
>> 2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
>> 3) SURFACE_SYNC (TC, K$, I$)
>> 4) Write CP_COHER_CNTL2.
>> 5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the
>> interrupt.
>>
>> WAIT_REG_MEM wouldn't be needed if we were able to merge
>> CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
>> event.
>>
>> The main issue with the current flush sequence in radeon and amdgpu is
>> that it doesn't wait for idle before writing CP_COHER_CNTL2 and
>> SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
>> idle in userspace IBs.
>
>
> This is gfx9-only though, right?

No, I'm only talking about SI.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: GART write flush error on SI w/ amdgpu

2017-06-20 Thread Marek Olšák
BTW, I noticed the flush sequence in the kernel is wrong. The correct
flush sequence should be:

1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
but no fence/interrupt.
2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
3) SURFACE_SYNC (TC, K$, I$)
4) Write CP_COHER_CNTL2.
5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the interrupt.

WAIT_REG_MEM wouldn't be needed if we were able to merge
CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
event.

The main issue with the current flush sequence in radeon and amdgpu is
that it doesn't wait for idle before writing CP_COHER_CNTL2 and
SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
idle in userspace IBs.
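
In gfx_v6_0 ring terms the ordering would look roughly like the sketch
below; the packet payloads are elided/simplified and only the ordering is
the point, so this is not a complete or buildable fence routine:

/* Ordering sketch only. */
static void si_fence_flush_order_sketch(struct amdgpu_ring *ring)
{
	/* 1) CACHE_FLUSH_AND_INV_TS writing a dword, no interrupt */
	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
	/* ... event = CACHE_FLUSH_AND_INV_TS, data_sel = dword, no int ... */

	/* 2) Wait on that dword so the pipeline is idle before the sync */
	amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
	/* ... poll memory, function = equal, addr/ref/mask ... */

	/* 3) SURFACE_SYNC invalidating TC, K$ and I$ */
	amdgpu_ring_write(ring, PACKET3(PACKET3_SURFACE_SYNC, 3));
	/* ... TC_ACTION_ENA | SH_KCACHE_ACTION_ENA | SH_ICACHE_ACTION_ENA ... */

	/* 4) Write CP_COHER_CNTL2 with the vmid */

	/* 5) BOTTOM_OF_PIPE_TS writing the fence value, with interrupt */
	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE_EOP, 4));
	/* ... data_sel = fence + interrupt, fence addr and seq ... */
}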

Marek


On Fri, May 26, 2017 at 5:47 PM, Marek Olšák <mar...@gmail.com> wrote:
> On Tue, May 9, 2017 at 2:13 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>> Hi all,
>>
>> I'm seeing some very strange errors on Verde with CPU readback from GART,
>> and am pretty much out of ideas. Some help would be very much appreciated.
>>
>> The error manifests with the
>> GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo test on amdgpu,
>> but
>> *not* on radeon. Here's what the test does:
>>
>> 1. Upload a texture.
>> 2. Read the texture back via a shader that uses shader buffer writes to
>> write data to a buffer that is allocated in GART.
>> 3. The CPU then reads from the buffer -- and sometimes gets stale data.
>>
>> This sequence is repeated for many sub-tests. There are some sub-tests
>> where
>> the CPU reads stale data from the buffer, i.e. the shader writes simply
>> don't make it to the CPU. The tests vary superficially, e.g. the first
>> failing test is (almost?) always one where data is written in 16-bit words
>> (but there are succeeding sub-tests with 16-bit writes as well).
>>
>> The bug is *not* a timing issue. Adding even a 1sec delay (sleep(1);)
>> between the fence wait and the return of glMapBuffer does not fix the
>> problem. The data must be stuck in a cache somewhere.
>>
>> Since the test runs okay with the radeon module, I tried some changes
>> based
>> on comparing the IB submit between radeon and amdgpu, and based on
>> comparing
>> register settings via scans obtained from umr. Some of the things I've
>> tried:
>>
>> - Set HDP_MISC_CNTL.FLUSH_INVALIDATE_CACHE to 1 (both radeon and
>> amdgpu/gfx9
>> set this)
>> - Add SURFACE_SYNC packets preceded by setting CP_COHER_CNTL2 to the vmid
>> (radeon does this)
>> - Change gfx_v6_0_ring_emit_hdp_invalidate: select ME engine instead of
>> PFP
>> (which seems more logical, and is done by gfx7+), or remove the
>> corresponding WRITE_DATA entirely
>>
>> None of these changes helped.
>>
>> What *does* help is adding an artificial wait. Specifically, I'm adding a
>> sequence of
>>
>> - WRITE_DATA
>> - CACHE_FLUSH_AND_INV_TS_EVENT (BOTTOM_OF_PIPE_TS has same behavior)
>> - WAIT_REG_MEM
>>
>> as can be seen in the attached patch. This works around the problem, but
>> it
>> makes no sense:
>>
>> Adding the wait sequence *before* the SURFACE_SYNC in ring_emit_fence
>> works
>> around the problem. However(!) it does not actually cause the UMD to wait
>> any longer than before. Without this change, the UMD immediately sees a
>> signaled user fence (and never uses an ioctl to wait), and with this
>> change,
>> it *still* sees a signaled user fence.
>>
>> Also, note that the way I've hacked the change, the wait sequence is only
>> added for the user fence emit (and I'm using a modified UMD to ensure that
>> there is enough memory to be used by the added wait sequence).
>>
>> Adding the wait sequence *after* the SURFACE_SYNC *doesn't* work around
>> the
>> problem.
>>
>> So for whatever reason, the added wait sequence *before* the SURFACE_SYNC
>> encourages some part of the GPU to flush the data from wherever it's
>> stuck,
>> and that's just really bizarre. There must be something really simple I'm
>> missing, and any pointers would be appreciated.
>
> Have you tried this?
>
> diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c
> b/src/gallium/drivers/radeonsi/si_hw_context.c
> index 92c09cb..e6ac0ba 100644
> --- a/src/gallium/drivers/radeonsi/si_hw_context.c
> +++ b/src/gallium/drivers/radeonsi/si_hw_context.c
> @@ -133,7 +133,8 @@ void si_context_gfx_flush(void *context, unsigned flags,
> SI_CONTEXT_PS_PARTIAL_FLUSH;
>
> /* DRM 3.1.0 doesn't flush TC for VI correctly. */

Re: GART write flush error on SI w/ amdgpu

2017-05-26 Thread Marek Olšák
On Tue, May 9, 2017 at 2:13 PM, Nicolai Hähnle  wrote:
> Hi all,
>
> I'm seeing some very strange errors on Verde with CPU readback from GART,
> and am pretty much out of ideas. Some help would be very much appreciated.
>
> The error manifests with the
> GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo test on amdgpu,
> but
> *not* on radeon. Here's what the test does:
>
> 1. Upload a texture.
> 2. Read the texture back via a shader that uses shader buffer writes to
> write data to a buffer that is allocated in GART.
> 3. The CPU then reads from the buffer -- and sometimes gets stale data.
>
> This sequence is repeated for many sub-tests. There are some sub-tests
> where
> the CPU reads stale data from the buffer, i.e. the shader writes simply
> don't make it to the CPU. The tests vary superficially, e.g. the first
> failing test is (almost?) always one where data is written in 16-bit words
> (but there are succeeding sub-tests with 16-bit writes as well).
>
> The bug is *not* a timing issue. Adding even a 1sec delay (sleep(1);)
> between the fence wait and the return of glMapBuffer does not fix the
> problem. The data must be stuck in a cache somewhere.
>
> Since the test runs okay with the radeon module, I tried some changes
> based
> on comparing the IB submit between radeon and amdgpu, and based on
> comparing
> register settings via scans obtained from umr. Some of the things I've
> tried:
>
> - Set HDP_MISC_CNTL.FLUSH_INVALIDATE_CACHE to 1 (both radeon and
> amdgpu/gfx9
> set this)
> - Add SURFACE_SYNC packets preceded by setting CP_COHER_CNTL2 to the vmid
> (radeon does this)
> - Change gfx_v6_0_ring_emit_hdp_invalidate: select ME engine instead of
> PFP
> (which seems more logical, and is done by gfx7+), or remove the
> corresponding WRITE_DATA entirely
>
> None of these changes helped.
>
> What *does* help is adding an artificial wait. Specifically, I'm adding a
> sequence of
>
> - WRITE_DATA
> - CACHE_FLUSH_AND_INV_TS_EVENT (BOTTOM_OF_PIPE_TS has same behavior)
> - WAIT_REG_MEM
>
> as can be seen in the attached patch. This works around the problem, but
> it
> makes no sense:
>
> Adding the wait sequence *before* the SURFACE_SYNC in ring_emit_fence
> works
> around the problem. However(!) it does not actually cause the UMD to wait
> any longer than before. Without this change, the UMD immediately sees a
> signaled user fence (and never uses an ioctl to wait), and with this
> change,
> it *still* sees a signaled user fence.
>
> Also, note that the way I've hacked the change, the wait sequence is only
> added for the user fence emit (and I'm using a modified UMD to ensure that
> there is enough memory to be used by the added wait sequence).
>
> Adding the wait sequence *after* the SURFACE_SYNC *doesn't* work around
> the
> problem.
>
> So for whatever reason, the added wait sequence *before* the SURFACE_SYNC
> encourages some part of the GPU to flush the data from wherever it's
> stuck,
> and that's just really bizarre. There must be something really simple I'm
> missing, and any pointers would be appreciated.

Have you tried this?

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c
b/src/gallium/drivers/radeonsi/si_hw_context.c
index 92c09cb..e6ac0ba 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -133,7 +133,8 @@ void si_context_gfx_flush(void *context, unsigned flags,
SI_CONTEXT_PS_PARTIAL_FLUSH;

/* DRM 3.1.0 doesn't flush TC for VI correctly. */
-   if (ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1)
+   if ((ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1) ||
+   (ctx->b.chip_class == SI && ctx->b.screen->info.drm_major == 3))
ctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2 |
SI_CONTEXT_INV_VMEM_L1;

One more cache flush there shouldn't hurt.

Also, Mesa uses PFP_SYNC_ME. It shouldn't be necessary, but it's worth a
try.
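
For reference, the PFP_SYNC_ME emission is just two dwords, roughly what
radeonsi does (see PKT3_PFP_SYNC_ME in sid.h):

/* PFP_SYNC_ME makes the PFP wait for the ME, so the prefetcher can't
 * race ahead of writes done by earlier packets. */
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);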

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-25 Thread Marek Olšák
On Thu, May 25, 2017 at 5:31 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 24/05/17 08:27 PM, Christian König wrote:
>> Am 24.05.2017 um 13:03 schrieb Marek Olšák:
>>>>
>>> I think the final solution (done in fault_reserve_notify) should be:
>>> if (bo->num_cpu_page_faults++ > 20)
>>> bo->preferred_domain = GTT_WC;
>
> I agree something like that will probably be part of the solution, but I
> doubt it's quite that simple or that it's the only thing that can be
> improved.
>
>
>> I more or less agree on that, but setting preferred_domain permanently
>> to GTT_WC is what worries me a bit.
>>
>> E.g. imagine you alt+tab from a game to your browser and back and the
>> game runs way slower now because BOs are never moved back to VRAM.
>
> Right, permanently moving a BO to GTT might itself cause performance to
> drop down a cliff in some cases. It's possible that this is irrelevant
> compared to excessive buffer migration for CPU access though.
>
>
>> What we need is a global limit on the number of bytes transferred per second
>> for swap operations or something like that.
>>
>> Or maybe a timeout which says when a BO was moved (either by swapping it
>> out or by a CPU page fault) only move it back after +n jiffies or
>> something like that.
>
> I also feel like something like this will be more useful than the number
> of CPU page faults per se. But I'm curious what Marek comes up with. :)

I don't have any better idea at the moment. It looks like John Brooks
has already solved this issue based on his IRC comments.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-24 Thread Marek Olšák
On Wed, May 24, 2017 at 9:56 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 23/05/17 07:38 PM, Marek Olšák wrote:
>> On Tue, May 23, 2017 at 2:45 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 22/05/17 07:09 PM, Marek Olšák wrote:
>>>> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>>>>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>>>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <mic...@daenzer.net
>>>>>> <mailto:mic...@daenzer.net>> wrote:
>>>>>>
>>>>>> On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>>>> > Hi Michel,
>>>>>> >
>>>>>> > I've applied your series
>>>>>>
>>>>>> Thanks for testing it.
>>>>>>
>>>>>> > and it doesn't help with low Dirt Rally performance on Fiji. I see 
>>>>>> TTM
>>>>>> > buffer moves at 800MB/s and many VRAM page faults.
>>>>>>
>>>>>> Did you see this:
>>>>>>
>>>>>> >> Note that there's only little if any improvement of the average
>>>>>> framerate
>>>>>> >> reported, but the minimum framerate as seen on the HUD goes from
>>>>>> ~10 fps
>>>>>> >> to ~17.
>>>>>>
>>>>>> I.e. it mostly affects the minimum framerate and smoothness for me
>>>>>> as well.
>>>>>>
>>>>>>
>>>>>> Without the series, I get 70 average fps. With the series, I get 30
>>>>>> average fps. That might just be random bad luck. I don't know.
>>>>>
>>>>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>>>>> talking about in other threads and on IRC?
>>>>>
>>>>> I can't reproduce any slowdown with these patches, even leaving visible
>>>>> VRAM size at 256 MB.
>>>>
>>>> The random slowdowns with Dirt Rally are only caused by the pressure
>>>> on visible VRAM. This whole thread is about those random slowdowns.
>>>
>>> No, this thread is about the scenario described in the cover letter of
>>> this patch series.
>>>
>>>
>>>> If you're saying "maybe it was just one of the random slowdowns", you're
>>>> saying "maybe it was just the visible VRAM pressure". It's only
>>>> random with Dirt Rally, which makes it difficult to believe statements
>>>> such as "I can't reproduce any slowdown".
>>>
>>> I could say the same thing about you seeing random slowdowns... I've
>>> never seen that, I had to artificially limit the size of visible VRAM to
>>> 64 MB to make it significantly affect the benchmark result.
>>>
>>> How many times do you need to run the benchmark on average to hit a
>>> random slowdown? Which desktop environment and other X clients are
>>> running during the benchmark? Which tab is active in the Steam window
>>> while the benchmark runs?
>>>
>>> In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
>>> the library.
>>
>> Ubuntu Unity, Steam small mode (there are no tabs), Ultra settings in
>> Dirt Rally.
>>
>> Every single time I run the game with this series, I get 700-1000MB/s
>> of TTM BO moves. There doesn't seem to be any randomness.
>>
>> It was better without this series. (meaning it was sometimes OK, sometimes 
>> bad)
>
> Thanks for the additional details. I presume that in the bad case there
> are some BOs lying around in visible VRAM (e.g. from Unity), which
> causes some of Dirt Rally's BOs to go back and forth between GTT on CPU
> page faults and VRAM on GPU usage.
>
> This means at least patch 2 goes out the window. I'll see if I can
> salvage something out of patch 3.

I think the final solution (done in fault_reserve_notify) should be:
if (bo->num_cpu_page_faults++ > 20)
   bo->preferred_domain = GTT_WC;

Otherwise I think we'll just be going in circles and not get anywhere.
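
A minimal sketch of that idea, modelled on fault_reserve_notify (the
num_cpu_page_faults member and the threshold of 20 are assumptions, not
existing kernel code):

/* Sketch of page-fault-driven demotion. */
static void fault_demotion_sketch(struct amdgpu_bo *abo)
{
	/* After repeated CPU faults, stop pulling the BO back into
	 * visible VRAM and let it live in write-combined GTT instead. */
	if (abo->num_cpu_page_faults++ > 20)
		abo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
}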

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] winsys/amdgpu: align VA allocations to fragment size

2017-05-23 Thread Marek Olšák
On Tue, May 23, 2017 at 6:43 PM, Christian König
<deathsim...@vodafone.de> wrote:
> From: Christian König <christian.koe...@amd.com>
>
> BOs larger than the minimum fragment size should have their VA
> aligned to at least the fragment size for optimal performance.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>  src/amd/common/ac_gpu_info.c  | 1 +
>  src/amd/common/ac_gpu_info.h  | 1 +
>  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 2 ++
>  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h | 1 +
>  4 files changed, 5 insertions(+)
>
> diff --git a/src/amd/common/ac_gpu_info.c b/src/amd/common/ac_gpu_info.c
> index cf5d6e1..0b4933e 100644
> --- a/src/amd/common/ac_gpu_info.c
> +++ b/src/amd/common/ac_gpu_info.c
> @@ -281,6 +281,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
> memcpy(info->cik_macrotile_mode_array, amdinfo->gb_macro_tile_mode,
> sizeof(amdinfo->gb_macro_tile_mode));
>
> +   info->pte_fragment_size = alignment_info.size_local;
> info->gart_page_size = alignment_info.size_remote;
>
> if (info->chip_class == SI)
> diff --git a/src/amd/common/ac_gpu_info.h b/src/amd/common/ac_gpu_info.h
> index d8029ef..3785eb4 100644
> --- a/src/amd/common/ac_gpu_info.h
> +++ b/src/amd/common/ac_gpu_info.h
> @@ -45,6 +45,7 @@ struct radeon_info {
> uint32_t            pci_id;
> enum radeon_family  family;
> enum chip_class     chip_class;
> +   uint32_t            pte_fragment_size;
> uint32_t            gart_page_size;
> uint64_t            gart_size;
> uint64_t            vram_size;
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> index 6bdcce5..4017411 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
> @@ -415,6 +415,8 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct 
> amdgpu_winsys *ws,
> }
>
> va_gap_size = ws->check_vm ? MAX2(4 * alignment, 64 * 1024) : 0;
> +   if (size > ws->info.pte_fragment_size)
> +  alignment = MAX2(alignment, ws->info.pte_fragment_size);
> r = amdgpu_va_range_alloc(ws->dev, amdgpu_gpu_va_range_general,
>   size + va_gap_size, alignment, 0, &va,
>   &va_handle, 0);
> if (r)
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
> index 896a463..88975e2 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
> @@ -73,6 +73,7 @@ struct amdgpu_winsys {
>
>     struct amdgpu_gpu_info amdinfo;
> ADDR_HANDLE addrlib;
> +   struct amdgpu_buffer_size_alignmentsalignments;
>
> bool check_vm;

This hunk seems unnecessary. If you remove it, the patch is:

Reviewed-by: Marek Olšák <marek.ol...@amd.com>

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-23 Thread Marek Olšák
On Tue, May 23, 2017 at 2:45 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 22/05/17 07:09 PM, Marek Olšák wrote:
>> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <mic...@daenzer.net
>>>> <mailto:mic...@daenzer.net>> wrote:
>>>>
>>>> On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>> > Hi Michel,
>>>> >
>>>> > I've applied your series
>>>>
>>>> Thanks for testing it.
>>>>
>>>> > and it doesn't help with low Dirt Rally performance on Fiji. I see 
>>>> TTM
>>>> > buffer moves at 800MB/s and many VRAM page faults.
>>>>
>>>> Did you see this:
>>>>
>>>> >> Note that there's only little if any improvement of the average
>>>> framerate
>>>> >> reported, but the minimum framerate as seen on the HUD goes from
>>>> ~10 fps
>>>> >> to ~17.
>>>>
>>>> I.e. it mostly affects the minimum framerate and smoothness for me
>>>> as well.
>>>>
>>>>
>>>> Without the series, I get 70 average fps. With the series, I get 30
>>>> average fps. That might just be random bad luck. I don't know.
>>>
>>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>>> talking about in other threads and on IRC?
>>>
>>> I can't reproduce any slowdown with these patches, even leaving visible
>>> VRAM size at 256 MB.
>>
>> The random slowdowns with Dirt Rally are only caused by the pressure
>> on visible VRAM. This whole thread is about those random slowdowns.
>
> No, this thread is about the scenario described in the cover letter of
> this patch series.
>
>
>> If you're saying "maybe it was just one of the random slowdowns", you're
>> saying "maybe it was just the visible VRAM pressure". It's only
>> random with Dirt Rally, which makes it difficult to believe statements
>> such as "I can't reproduce any slowdown".
>
> I could say the same thing about you seeing random slowdowns... I've
> never seen that, I had to artificially limit the size of visible VRAM to
> 64 MB to make it significantly affect the benchmark result.
>
> How many times do you need to run the benchmark on average to hit a
> random slowdown? Which desktop environment and other X clients are
> running during the benchmark? Which tab is active in the Steam window
> while the benchmark runs?
>
> In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
> the library.

Ubuntu Unity, Steam small mode (there are no tabs), Ultra settings in
Dirt Rally.

Every single time I run the game with this series, I get 700-1000MB/s
of TTM BO moves. There doesn't seem to be any randomness.

It was better without this series. (meaning it was sometimes OK, sometimes bad)

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-22 Thread Marek Olšák
On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 20/05/17 06:26 PM, Marek Olšák wrote:
>> On May 20, 2017 3:26 AM, "Michel Dänzer" <mic...@daenzer.net
>> <mailto:mic...@daenzer.net>> wrote:
>>
>> On 20/05/17 01:14 AM, Marek Olšák wrote:
>> > Hi Michel,
>> >
>> > I've applied your series
>>
>> Thanks for testing it.
>>
>> > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>> > buffer moves at 800MB/s and many VRAM page faults.
>>
>> Did you see this:
>>
>> >> Note that there's only little if any improvement of the average
>> framerate
>> >> reported, but the minimum framerate as seen on the HUD goes from
>> ~10 fps
>> >> to ~17.
>>
>> I.e. it mostly affects the minimum framerate and smoothness for me
>> as well.
>>
>>
>> Without the series, I get 70 average fps. With the series, I get 30
>> average fps. That might just be random bad luck. I don't know.
>
> Hmm, yeah, maybe that was just one of the random slowdowns you've been
> talking about in other threads and on IRC?
>
> I can't reproduce any slowdown with these patches, even leaving visible
> VRAM size at 256 MB.

The random slowdowns with Dirt Rally are only caused by the pressure
on visible VRAM. This whole thread is about those random slowdowns. If
you're saying "maybe it was just one of the random slowdowns", you're
saying "maybe it was just the visible VRAM pressure". It's only
random with Dirt Rally, which makes it difficult to believe statements
such as "I can't reproduce any slowdown". It's not random with Dying
Light.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-20 Thread Marek Olšák
On May 20, 2017 3:26 AM, "Michel Dänzer" <mic...@daenzer.net> wrote:

On 20/05/17 01:14 AM, Marek Olšák wrote:
> Hi Michel,
>
> I've applied your series

Thanks for testing it.

> and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
> buffer moves at 800MB/s and many VRAM page faults.

Did you see this:

>> Note that there's only little if any improvement of the average framerate
>> reported, but the minimum framerate as seen on the HUD goes from ~10 fps
>> to ~17.

I.e. it mostly affects the minimum framerate and smoothness for me as well.


Without the series, I get 70 average fps. With the series, I get 30 average
fps. That might just be random bad luck. I don't know. In any case, 30 fps
is really bad, so I don't think the series does what you think it does.

Marek



--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-19 Thread Marek Olšák
Hi Michel,

I've applied your series and it doesn't help with low Dirt Rally
performance on Fiji. I see TTM buffer moves at 800MB/s and many VRAM
page faults.

Marek

On Thu, May 18, 2017 at 11:08 AM, Michel Dänzer  wrote:
> From: Michel Dänzer 
>
> This series was developed and tested under the following scenario:
>
> Running the PTS dirt-rally benchmark (1920x1080, Ultra) on Tonga with
> 2G, with CPU visible VRAM artificially restricted to 64 MB.
>
> Without this series, there's a lot of stutter during about the first
> minute of a benchmark run. During this time there are significant amounts
> of buffer moves (starting from about 500 MB on the HUD) and evictions,
> gradually declining until the buffer moves settle around 8 MB on the HUD.
>
> With this series, there's only slight stutter during the first seconds
> after the car launches, even though the buffer move volume is about the
> same as without the series. Buffer evictions are eliminated almost
> completely, except for a few at the beginning. Buffer moves still settle
> around 8 MB on the HUD, but with less variance than before.
>
> Note that there's only little if any improvement of the average framerate
> reported, but the minimum framerate as seen on the HUD goes from ~10 fps
> to ~17.
>
>
> Patch 1 is a cleanup that I noticed along the way.
>
> Patch 2 makes the main difference for the above scenario.
>
> Patch 3 doesn't make as much difference, I'm fine with it not landing at
> least for now.
>
> Michel Dänzer (3):
>   drm/amdgpu: Drop useless loops for placement restrictions
>   drm/amdgpu: Don't evict other BOs from VRAM for page faults
>   drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 42 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 46 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|  6 +---
>  3 files changed, 51 insertions(+), 43 deletions(-)
>
> --
> 2.11.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-19 Thread Marek Olšák
On Fri, May 19, 2017 at 5:27 PM, John Brooks <j...@fastquake.com> wrote:
> On Fri, May 19, 2017 at 05:24:36PM +0200, Marek Olšák wrote:
>> Where is your "attached" patch?
>>
>> Marek
>
> It's actually a reply to my message. Sorry if that was unclear.

That's OK, but I don't see any patch from you here.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM

2017-05-19 Thread Marek Olšák
Where is your "attached" patch?

Marek

On Fri, May 19, 2017 at 5:04 AM, John Brooks  wrote:
> I'm glad this is being worked on. However, somewhat to my surprise, this patch
> series didn't help Dying Light's BO eviction problem. For those who don't 
> know,
> that game performs very badly in certain areas, and it is correlated with
> increased TTM eviction rates. Relevant screenshots of gallium HUD and sysprof:
>
> http://www.fastquake.com/images/screen-dlgalliumhud1-20170513-171241.png
> http://www.fastquake.com/images/screen-dlsysprof-20170515-225919.png
>
> I noticed last week that adding RADEON_DOMAIN_GTT to the domains in radeonsi
> (patch: http://www.fastquake.com/files/text/radeon-gtt.txt ) greatly improved
> performance in these areas, to the tune of about a 30fps increase. Obviously,
> putting GTT in every buffer's domain is not a proper solution. But it led me
> to believe that perhaps the problem wasn't just the swapping of resident BOs,
> but the creation of new ones that only have VRAM in their domain, and they
> cause existing BOs to be evicted from visible VRAM unconditionally.
>
> The attached patch assigns GTT as the busy placement for newly created BOs 
> that
> have the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they will go to
> GTT if visible VRAM is full, instead of evicting established BOs. Since there
> is no way to know what the usage patterns of a new BO will be, we shouldn't
> evict established BOs (for which we have hypothetically had the opportunity to
> gather usage data) from visible VRAM for new, unknown BOs.
>
> With this patch I get hugely improved performance in Dying Light just like 
> with
> the Mesa patch: I observed 30-40fps where I got 14 before, and 60fps where I
> got 40 before. TTM evictions and bytes moved have dropped to zero where they
> were exceedingly high before. Buffer evictions no longer dominate the prof
> trace. Screenshots:
>
> http://www.fastquake.com/images/screen-dl-gtt_busy_only-20170518-192602.png
> http://www.fastquake.com/images/screen-dlsysprof-gttpatch-20170518-223200.png
>
> --
> John Brooks (Frogging101)
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Plan: BO move throttling for visible VRAM evictions

2017-05-18 Thread Marek Olšák
On May 18, 2017 10:17 AM, "Michel Dänzer" <mic...@daenzer.net> wrote:

On 17/05/17 09:35 PM, Marek Olšák wrote:
> On May 16, 2017 3:57 AM, "Michel Dänzer" <mic...@daenzer.net> wrote:
> On 15/05/17 07:11 PM, Marek Olšák wrote:
> > On May 15, 2017 4:29 AM, "Michel Dänzer" <mic...@daenzer.net> wrote:
> >
> > I think the next step should be to make radeonsi keep track of how much
> > VRAM it's trying to use that's expected to be accessed by the CPU, and
> > to use GTT instead when that exceeds a threshold (probably derived from
> > vram_vis_size).
> >
> > That's difficult to estimate. There are apps with 600MB of mapped VRAM
> > and don't experience any performance issues. And some apps with 300MB of
> > mapped VRAM do. It only depends on the CPU access pattern, not what
> > radeonsi sees.
>
> What I mean is keeping track of the total size of resources which have
> RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it exceeds a
> threshold, create new ones having those flags in GTT instead. Even
> though this might not be strictly necessary with amdgpu in the long run,
> it probably is for radeon anyway, and in the short term it might help
> even with amdgpu.
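
(For concreteness, the tracking Michel describes could look roughly like
this on the userspace side; the struct and identifiers below are
illustrative, not actual radeonsi code, and the 50% budget is arbitrary:)

struct si_screen_sketch {              /* hypothetical, not radeonsi's */
	uint64_t vram_vis_size;        /* CPU-visible VRAM size from GPU info */
	uint64_t cpu_accessible_vram;  /* running total we maintain */
};

/* Demote a new CPU-accessible allocation to GTT once the tracked total
 * of such resources crosses a threshold derived from vram_vis_size. */
static unsigned si_pick_initial_domain(struct si_screen_sketch *screen,
				       unsigned domain, unsigned flags,
				       uint64_t size)
{
	if ((domain & RADEON_DOMAIN_VRAM) && (flags & RADEON_FLAG_CPU_ACCESS)) {
		uint64_t budget = screen->vram_vis_size / 2; /* arbitrary */

		if (screen->cpu_accessible_vram + size > budget)
			return RADEON_DOMAIN_GTT; /* over budget: demote */

		screen->cpu_accessible_vram += size;
	}
	return domain;
}
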
>
>
> That might hurt us more than it can help.

You may be right, but I think I'll play with that idea a little anyway
to see how it goes. :)

> All mappable buffers have the CPU access flag set, but many of them are
> immutable.

You mean they're only written to once by the CPU? We shouldn't set the
RADEON_FLAG_CPU_ACCESS flag for BOs where we expect that, because it
will currently prevent them from being in the CPU invisible part of VRAM.


The only thing I can do is set the CPU access flag for persistently mapped
buffers only. We certainly want buffers to go to the invisible part of VRAM
if there is no CPU access for a certain timeframe. So maybe we shouldn't
set the flag at all. What do you think?

The truth is we have no way to know what apps intend to do with any buffers.



> The only place where this can be handled is the kernel.

Ideally, the placement of a BO should be determined based on how it's
actually being used by the GPU vs CPU. But I'm not sure how to determine
that in a useful way.


CPU page faults are the only way to determine that CPU access is happening.


> Even if it's as simple as: if (bo->numcpufaults > 10) domain = GTT_WC;
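
(Spelled out, that heuristic would live in the CPU fault path;
num_cpu_faults is a hypothetical per-BO counter and the threshold is
arbitrary -- only a sketch of the idea, not existing code:)

/* After enough CPU page faults, stop bouncing the BO through visible
 * VRAM and leave it in write-combined GTT instead. */
static int amdgpu_bo_cpu_fault_sketch(struct amdgpu_bo *abo)
{
	if (++abo->num_cpu_faults <= 10)
		return 0; /* keep the existing visible-VRAM handling */

	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
	return ttm_bo_validate(&abo->tbo, &abo->placement, false, false);
}
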

I'm skeptical about the number of CPU page faults per se being a useful
metric. It doesn't tell us much about how the BO is used even by the
CPU, let alone the GPU. But let's see where this leads you.


It tells us more than what Mesa can ever know, which is nothing.

Marek



One thing that might help would be if we could swap individual memory
nodes between visible and invisible VRAM for CPU page faults, instead of
moving/evicting whole BOs. Christian, do you think something like that
would be possible?


Another idea (to avoid issues such as the recent one with Rocket League)
was to make VRAM CPU mappings write-only, and move the BO to GTT if
there's a read fault. But not sure if this is possible at all, or how
much effort it would be.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: add an INFO query for monitoring VRAM CPU page faults

2017-05-17 Thread Marek Olšák
From: Marek Olšák <marek.ol...@amd.com>

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
 include/uapi/drm/amdgpu_drm.h  | 2 ++
 4 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index fadeb55..251e5de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1525,20 +1525,21 @@ struct amdgpu_device {
 
 	/* memory management */
 	struct amdgpu_mman		mman;
 	struct amdgpu_vram_scratch	vram_scratch;
 	struct amdgpu_wb		wb;
 	atomic64_t			vram_usage;
 	atomic64_t			vram_vis_usage;
 	atomic64_t			gtt_usage;
 	atomic64_t			num_bytes_moved;
 	atomic64_t			num_evictions;
+	atomic64_t			num_vram_cpu_page_faults;
 	atomic_t			gpu_reset_counter;
 
 	/* data for buffer migration throttling */
 	struct {
 		spinlock_t		lock;
 		s64			last_update_us;
 		s64			accum_us; /* accumulated microseconds */
 		u32			log2_max_MBps;
 	} mm_stats;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d167949..81291d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -407,20 +407,23 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
 
 		return copy_to_user(out, &fw_info,
 				    min((size_t)size, sizeof(fw_info))) ? -EFAULT : 0;
 	}
 	case AMDGPU_INFO_NUM_BYTES_MOVED:
 		ui64 = atomic64_read(&adev->num_bytes_moved);
 		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
 	case AMDGPU_INFO_NUM_EVICTIONS:
 		ui64 = atomic64_read(&adev->num_evictions);
 		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
+	case AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS:
+		ui64 = atomic64_read(&adev->num_vram_cpu_page_faults);
+		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
 	case AMDGPU_INFO_VRAM_USAGE:
 		ui64 = atomic64_read(&adev->vram_usage);
 		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
 	case AMDGPU_INFO_VIS_VRAM_USAGE:
 		ui64 = atomic64_read(&adev->vram_vis_usage);
 		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
 	case AMDGPU_INFO_GTT_USAGE:
 		ui64 = atomic64_read(&adev->gtt_usage);
 		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
 	case AMDGPU_INFO_GDS_CONFIG: {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 6bc52cc..b6da86e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -953,20 +953,21 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 	offset = bo->mem.start << PAGE_SHIFT;
 	/* TODO: figure out how to map scattered VRAM to the CPU */
 	if ((offset + size) <= adev->mc.visible_vram_size)
 		return 0;
 
 	/* Can't move a pinned BO to visible VRAM */
 	if (abo->pin_count > 0)
 		return -EINVAL;
 
 	/* hurrah the memory is not visible ! */
+	atomic64_inc(&adev->num_vram_cpu_page_faults);
 	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
 	lpfn =  adev->mc.visible_vram_size >> PAGE_SHIFT;
 	for (i = 0; i < abo->placement.num_placement; i++) {
 		/* Force into visible VRAM */
 		if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
 		    (!abo->placements[i].lpfn ||
 		     abo->placements[i].lpfn > lpfn))
 			abo->placements[i].lpfn = lpfn;
 	}
 	r = ttm_bo_validate(bo, &abo->placement, false, false);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c99fe63..4f34394 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -598,20 +598,22 @@ struct drm_amdgpu_cs_chunk_data {
 /* Subquery id: Query GPU temperature */
 #define AMDGPU_INFO_SENSOR_GPU_TEMP		0x3
 /* Subquery id: Query GPU load */
 #define AMDGPU_INFO_SENSOR_GPU_LOAD		0x4
 /* Subquery id: Query average GPU power */
 #define AMDGPU_INFO_SENSOR_GPU_AVG_POWER	0x5
 /* Subquery id: Query northbridge voltage */
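
(For completeness, userspace would read the new counter through libdrm's
existing info path; a minimal sketch, assuming headers that carry the
AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS id from this patch -- the render node
path is just an example:)

/* Query the VRAM CPU page fault counter via libdrm_amdgpu. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

int main(void)
{
	uint32_t major, minor;
	amdgpu_device_handle dev;
	uint64_t faults = 0;
	int fd = open("/dev/dri/renderD128", O_RDWR); /* example node */

	if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
		return 1;
	if (!amdgpu_query_info(dev, AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS,
			       sizeof(faults), &faults))
		printf("VRAM CPU page faults: %llu\n",
		       (unsigned long long)faults);
	amdgpu_device_deinitialize(dev);
	return 0;
}
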
   

Re: Plan: BO move throttling for visible VRAM evictions

2017-05-17 Thread Marek Olšák
On May 16, 2017 3:57 AM, "Michel Dänzer" <mic...@daenzer.net> wrote:

On 15/05/17 07:11 PM, Marek Olšák wrote:
> On May 15, 2017 4:29 AM, "Michel Dänzer" <mic...@daenzer.net> wrote:
>
> I think the next step should be to make radeonsi keep track of how much
> VRAM it's trying to use that's expected to be accessed by the CPU, and
> to use GTT instead when that exceeds a threshold (probably derived from
> vram_vis_size).
>
> That's difficult to estimate. There are apps with 600MB of mapped VRAM
> and don't experience any performance issues. And some apps with 300MB of
> mapped VRAM do. It only depends on the CPU access pattern, not what
> radeonsi sees.

What I mean is keeping track of the total size of resources which have
RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it exceeds a
threshold, create new ones having those flags in GTT instead. Even
though this might not be strictly necessary with amdgpu in the long run,
it probably is for radeon anyway, and in the short term it might help
even with amdgpu.


That might hurt us more than it can help. All mappable buffers have the CPU
access flag set, but many of them are immutable.

The only place where this can be handled is the kernel. Even if it's as
simple as: if (bo->numcpufaults > 10) domain = GTT_WC;

Marek



--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/4] drm/amdgpu: reset fpriv vram_lost_counter

2017-05-17 Thread Marek Olšák
David,

We already have a query that returns whether a device is lost. It's
called amdgpu_cs_query_reset_state. That should return whether a hang
was caused by a certain context or whether the hang happened but the
context is innocent. You can extend it to accept no context, in which
case it will return either NO_RESET (everything is OK) or
UNKNOWN_RESET (= when a hang happened but the caller didn't specify
the context).
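
(For reference, this is the existing per-context query as a UMD calls it
through libdrm today; the context-less variant suggested above would be an
extension, not current API. The wrapper function is hypothetical:)

/* Returns true if the UMD should consider this context lost. */
static bool umd_context_needs_reset(amdgpu_context_handle ctx)
{
	uint32_t state = 0, hangs = 0;

	if (amdgpu_cs_query_reset_state(ctx, &state, &hangs) != 0)
		return false; /* query failed; assume OK */

	/* AMDGPU_CTX_*_RESET values come from amdgpu_drm.h. */
	return state == AMDGPU_CTX_GUILTY_RESET ||   /* our fault */
	       state == AMDGPU_CTX_INNOCENT_RESET || /* hang, not our fault */
	       state == AMDGPU_CTX_UNKNOWN_RESET;    /* hang, cause unknown */
}
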

Marek

On Wed, May 17, 2017 at 10:56 AM, Christian König  wrote:
> On 17.05.2017 at 10:46, zhoucm1 wrote:
>
>
>
> On 2017-05-17 16:40, Christian König wrote:
>
> On 17.05.2017 at 10:01, Michel Dänzer wrote:
>
> On 17/05/17 04:13 PM, zhoucm1 wrote:
>
> On 2017-05-17 14:57, Michel Dänzer wrote:
>
> On 17/05/17 01:28 PM, zhoucm1 wrote:
>
> On 2017-05-17 11:15, Michel Dänzer wrote:
>
> On 17/05/17 12:04 PM, zhoucm1 wrote:
>
> On 2017-05-17 09:18, Michel Dänzer wrote:
>
> On 16/05/17 06:25 PM, Chunming Zhou wrote:
>
> Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
> Signed-off-by: Chunming Zhou 
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index bca1fb5..f3e7525 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
>  	case AMDGPU_VM_OP_UNRESERVE_VMID:
>  		amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm, AMDGPU_GFXHUB);
>  		break;
> +	case AMDGPU_VM_OP_RESET:
> +		fpriv->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
> +		break;
>
> How do you envision the UMDs using this? I can mostly think of them
> calling this ioctl when a context is created or destroyed. But that
> would also allow any other remaining contexts using the same DRM file
> descriptor to use all ioctls again. So, I think there needs to be a
> vram_lost_counter in struct amdgpu_ctx instead of in struct
> amdgpu_fpriv.
>
> struct amdgpu_fpriv is the proper place for vram_lost_counter, especially
> for the ioctl return value.
> If you need to reset contexts one by one, we can mark all contexts of that
> VM, and then reset them from userspace.
>
> I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
> context calls this ioctl, all other contexts using the same file
> descriptor will also be considered safe again, right?
>
> Yes, but it really depends on the userspace requirement. If you need to
> reset contexts one by one, we can mark all contexts of that VM as guilty, and
> then reset one context at a time from userspace.
>
> Still not sure what you mean by that.
>
> E.g. what do you mean by "guilty"? I thought that refers to the context
> which caused a hang. But it seems like you're using it to refer to any
> context which hasn't reacted yet to VRAM contents being lost.
>
> When VRAM is lost, we treat all contexts as needing a reset.
>
> Essentially, your patches only track VRAM contents being lost per file
> descriptor, not per context. I'm not sure (rather skeptical) that this
> is suitable for OpenGL UMDs, since state is usually tracked per context.
> Marek / Nicolai?
>
>
> Oh, yeah that's a good point.
>
> The problem with tracking it per context is that Vulkan also wants the
> ENODEV on the amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which are
> contextless.
>
> But thinking more about this, blocking those two doesn't make much sense. The
> VM content can be restored, and why should we disallow reading GPU info?
>
> I can re-paste the Vulkan APIs requiring ENODEV:
> "
>
> The Vulkan APIs listed below could return VK_ERROR_DEVICE_LOST according to
> the spec.
>
> I tried to provide a list of u/k interfaces that could be called for each vk
> API.
>
>
> Well those are the Vulkan requirements, but that doesn't necessarily mean we
> must follow that on the kernel side. Keep in mind that Vulkan can't made any
> requirements towards the kernel driver.
>
> IIRC we already have a query Vulkan can use to figure out if a GPU reset
> happened or not. So they could use that instead.
>
> Regards,
> Christian.
>
>
>
> vkCreateDevice
>
> -  amdgpu_device_initialize.
>
> -  amdgpu_query_gpu_info
>
>
>
> vkQueueSubmit
>
> -  amdgpu_cs_submit
>
>
>
> vkWaitForFences
>
> amdgpu_cs_wait_fences
>
>
>
> vkGetEventStatus
>
> vkQueueWaitIdle
>
> vkDeviceWaitIdle
>
> vkGetQueryPoolResults
>
> amdgpu_cs_query_fence_status
>
>
>
> vkQueueBindSparse
>
> amdgpu_bo_va_op
>
> amdgpu_bo_va_op_raw
>
>
>
> vkCreateSwapchainKHR
>
> vkAcquireNextImageKHR
>
> vkQueuePresentKHR
>
> Not related to the u/k interface.
>
>
>
> Besides those listed above, I think amdgpu_cs_signal_Sem/amdgpu_cs_wait_sem
> should respond to gpu reset as well."
>
>
> Christian.
>
>
>
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Plan: BO move throttling for visible VRAM evictions

2017-05-13 Thread Marek Olšák
On Mon, Apr 17, 2017 at 11:55 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 17/04/17 07:58 AM, Marek Olšák wrote:
>> On Fri, Apr 14, 2017 at 12:14 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 04/04/17 05:11 AM, Marek Olšák wrote:
>>>> On Fri, Mar 31, 2017 at 5:24 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>>>>> On 30/03/17 07:03 PM, Michel Dänzer wrote:
>>>>>> On 25/03/17 01:33 AM, Marek Olšák wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm sharing this idea here, because it's something that has been
>>>>>>> decreasing our performance a lot recently, for example:
>>>>>>> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>>>>>>
>>>>>> The attached proof-of-concept patch (on top of Christian's "CPU mapping
>>>>>> of split VRAM buffers" series, ported from radeon) results in 145.05 fps
>>>>>> on my Tonga.
>>>>>
>>>>> I get the same result without my or Christian's patches though, with
>>>>> 4.11 based DRM or amd-staging-4.9. So I guess I just can't reproduce the
>>>>> problem with this test. Are there any other tests for it?
>>>>
>>>> It's random. Sometimes the benchmark runs OK, other times it's slow.
>>>> You can easily see the difference by observing how smooth it is. The
>>>> visible VRAM evictions result in constant 100-200ms stalls but not
>>>> every frame, which feels like the frame rate is much lower than it
>>>> actually is.
>>>>
>>>> Make sure your graphics details are maxed out. The best score I can
>>>> get with my rig is 70 fps. (Fiji & Core i5 3570)
>>>
>>> I'm getting around 53-54 fps at Ultra with Tonga, both with Mesa 13.0.6
>>> and Git.
>>>
>>> Have you tried if Christian's patches for CPU access to split VRAM
>>> buffers help? I can imagine that forcing contiguous VRAM buffers for CPU
>>> access could cause lots of other BOs to be unnecessarily evicted from
>>> VRAM, if at least one of their fragments happens to be in the CPU
>>> visible part of VRAM.
>>
>> I've finally tested latest amd-staging-4.9 and I'm very pleased. For
>> the first time, the Deus Ex benchmark has almost no hiccups. I've
>> never seen it so smooth. At one point, the BO move rate increased
>> to 200MB/s, stayed there for a couple of seconds, and then it dropped
>> to 0 again. The frame rate was OK-ish, so I guess the moves didn't
>> happen all at once. I also tested DiRT Rally and I haven't been able
>> to reproduce the low FPS with the consistently-high BO move rate that
>> I saw several months ago.
>>
>> We could do some move throttling there for sure, but it's much better
>> than it ever was.
>
> That's great to hear. If you get a chance, it would be interesting if
> the attached updated patch improves things even more for you. (The patch
> I attached previously couldn't work as intended, this one at least might :)

Frogging101 on IRC noticed that we get a ton of TTM BO moves due to
visible VRAM thrashing and Michel's patch doesn't help. His kernel is
up to date with amd-staging. It looks like the only option left is my
original plan: BO move throttling for visible VRAM by redirecting
mapped buffers to GTT and not allowing them to go back to VRAM if some
counter is too high.
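
(Roughly, the plan would take this shape on the kernel side; the per-BO
counter, FAULT_LIMIT, move_budget and the helper itself are all hypothetical,
only a sketch of the idea:)

/* Once a mapped BO has faulted too often, or the global move rate is
 * over budget, redirect it to GTT and don't let it back into VRAM. */
static void amdgpu_bo_throttle_visible_vram(struct amdgpu_bo *abo)
{
	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);

	if (abo->num_cpu_faults > FAULT_LIMIT ||	/* hypothetical */
	    atomic64_read(&adev->num_bytes_moved) > adev->move_budget) {
		/* Drop VRAM from the allowed domains for good. */
		abo->allowed_domains &= ~AMDGPU_GEM_DOMAIN_VRAM;
		amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
	}
}
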

Opinions?

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Plan: BO move throttling for visible VRAM evictions

2017-04-16 Thread Marek Olšák
On Fri, Apr 14, 2017 at 12:14 PM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 04/04/17 05:11 AM, Marek Olšák wrote:
>> On Fri, Mar 31, 2017 at 5:24 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>>> On 30/03/17 07:03 PM, Michel Dänzer wrote:
>>>> On 25/03/17 01:33 AM, Marek Olšák wrote:
>>>>> Hi,
>>>>>
>>>>> I'm sharing this idea here, because it's something that has been
>>>>> decreasing our performance a lot recently, for example:
>>>>> http://openbenchmarking.org/prospect/1703011-RI-RADEONDIR06/7b7668cfc109d1c3dc27e871c8aea71ca13f23fa
>>>>
>>>> The attached proof-of-concept patch (on top of Christian's "CPU mapping
>>>> of split VRAM buffers" series, ported from radeon) results in 145.05 fps
>>>> on my Tonga.
>>>
>>> I get the same result without my or Christian's patches though, with
>>> 4.11 based DRM or amd-staging-4.9. So I guess I just can't reproduce the
>>> problem with this test. Are there any other tests for it?
>>
>> It's random. Sometimes the benchmark runs OK, other times it's slow.
>> You can easily see the difference by observing how smooth it is. The
>> visible VRAM evictions result in constant 100-200ms stalls but not
>> every frame, which feels like the frame rate is much lower than it
>> actually is.
>>
>> Make sure your graphics details are maxed out. The best score I can
>> get with my rig is 70 fps. (Fiji & Core i5 3570)
>
> I'm getting around 53-54 fps at Ultra with Tonga, both with Mesa 13.0.6
> and Git.
>
> Have you tried if Christian's patches for CPU access to split VRAM
> buffers help? I can imagine that forcing contiguous VRAM buffers for CPU
> access could cause lots of other BOs to be unnecessarily evicted from
> VRAM, if at least one of their fragments happens to be in the CPU
> visible part of VRAM.

I've finally tested latest amd-staging-4.9 and I'm very pleased. For
the first time, the Deus Ex benchmark has almost no hiccups. I've
>> never seen it so smooth. At one point, the BO move rate increased
to 200MB/s, stayed there for a couple of seconds, and then it dropped
to 0 again. The frame rate was OK-ish, so I guess the moves didn't
happen all at once. I also tested DiRT Rally and I haven't been able
to reproduce the low FPS with the consistently-high BO move rate that
I saw several months ago.

We could do some move throttling there for sure, but it's much better
than it ever was.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] drm/amdgpu: CIK support is no longer experimental

2017-04-13 Thread Marek Olšák
On Thu, Apr 13, 2017 at 6:41 PM, Nicolai Hähnle  wrote:
> On 11.04.2017 00:06, Felix Kuehling wrote:
>>
>> On 17-04-08 04:50 AM, Nicolai Hähnle wrote:
>>>
>>> On 07.04.2017 22:15, Felix Kuehling wrote:

 Change the wording of the CONFIG_DRM_AMDGPU_CIK option to indicate
 that it's no longer experimental.

 Signed-off-by: Felix Kuehling 
 ---
  drivers/gpu/drm/amd/amdgpu/Kconfig | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig
 b/drivers/gpu/drm/amd/amdgpu/Kconfig
 index f3b6df8..029e3fe 100644
 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
 +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
 @@ -9,11 +9,12 @@ config DRM_AMDGPU_CIK
  bool "Enable amdgpu support for CIK parts"
  depends on DRM_AMDGPU
  help
 -  Choose this option if you want to enable experimental support
 -  for CIK asics.
 +  Choose this option if you want to enable support for CIK asics.

 -  CIK is already supported in radeon.  CIK support in amdgpu
 -  is for experimentation and testing.
 +  If you choose No here, CIK ASICs will be supported by the
 +  radeon driver, as in previous kernel versions. Depending on
 +  your choice you will need different user mode (Mesa, X.org)
 +  drivers to support accelerated graphics on CIK.
>>>
>>>
>>> The last part is a bit misleading: while you do need different DDXes,
>>> the same Mesa driver (radeonsi) will work with both the radeon and the
>>> amdgpu kernel module for CIK. FWIW, the same is true for SI, although
>>> older versions of Mesa might stumble when run on the amdgpu kernel
>>> module.
>>
>>
>> I see. Do you know the minimum Mesa version required for SI and CIK
>> support on amdgpu respectively?
>
>
> For SI, it's Mesa 17.0.
>
> For CIK, I kind of suspect the support has "always" been there, since the
> amdgpu kernel module was originally brought up on CIK, but maybe Marek knows
> more.

Yes, CIK Mesa support should work with all amdgpu versions.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

