Re: [Mesa-dev] [PATCH] gpu/docs: Clarify what userspace means for gl

2019-04-24 Thread zhoucm1



On 2019-04-25 03:22, Eric Anholt wrote:

"Zhou, David(ChunMing)"  writes:


Will Linux be Mesa-only? I thought Linux was open.
This will impact our OpenGL/AMDVLK (MIT open source), and I am not sure about ROCm:
1. How do we deal with a uapi that OpenGL/AMDVLK needs but Mesa doesn't need?
Is it rejected?
2. If OpenGL/AMDVLK developers work on a hw feature that no Mesa developers
work on, can it not be upstreamed either?

I believe these questions are already covered by

"+Other userspace is only admissible if exposing a given feature through OpenGL or
+OpenGL ES would result in a technically unsound design, incomplete driver or
+an implementation which isn't useful in real world usage."

If OpenGL needs the interface, then you need a Mesa implementation.
It's time for you to work with the community to build that or get it
built.  Or, in AMD's case, work with the Mesa developers that you
already employ.

If OpenGL doesn't need it, but Vulkan needs it, then we don't have a
clear policy in place, and this patch doesn't change that.  I would
personally say that AMDVLK doesn't qualify given that as far as I know
there is not open review of proposed patches to the project as they're
being developed.

Do I understand correctly that, as soon as a stack is openly developed,
it will be able to drive new UAPI?


-David

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] anv: Implement VK_ANDROID_native_buffer (v5)

2017-09-18 Thread zhoucm1



On 2017-09-19 05:44, Chad Versace wrote:

This implementation is correct (afaict), but takes two shortcuts
regarding the import/export of Android sync fds.

   Shortcut 1. When Android calls vkAcquireImageANDROID to import a sync
   fd into a VkSemaphore or VkFence, the driver instead simply blocks on
   the sync fd, then puts the VkSemaphore or VkFence into the signalled
   state. Thanks to implicit sync, this produces correct behavior (with
   extra latency overhead, perhaps) despite its ugliness.

   Shortcut 2. When Android calls vkQueueSignalReleaseImageANDROID to export
   a collection of wait semaphores as a sync fd, the driver instead
   submits the semaphores to the queue, then returns sync fd -1, which
   informs the caller that no additional synchronization is needed.
   Again, thanks to implicit sync, this produces correct behavior (with
   extra batch submission overhead) despite its ugliness.

I chose to take the shortcuts instead of properly importing/exporting
the sync fds for two reasons:

   Reason 1. I've already tested this patch with dEQP and with demo
   apps. It works. I wanted to get the tested patches into the tree now,
   and polish the implementation afterwards.

   Reason 2. I want to run this on a 3.18 kernel (gasp!). In 3.18, i915
   supports neither Android's sync_fence, nor upstream's sync_file, nor
   drm_syncobj. Again, I tested these patches on Android with a 3.18
   kernel and they work.

I plan to quickly follow-up with patches that remove the shortcuts and
properly import/export the sync fds.
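
Shortcut 1 above can be illustrated with a small, self-contained sketch. It
is not the code from this patch: `toy_sync`, `toy_wait_fd` and `toy_acquire`
are hypothetical names, and the sketch assumes that the fd reports completion
by becoming readable under poll() and that the callee owns the fd and closes
it once the wait is done.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the payload of a VkSemaphore or VkFence. */
struct toy_sync {
   bool signalled;
};

/* Block until fd becomes readable, retrying when poll() is interrupted.
 * Returns 0 on success and -1 on error or timeout. */
static int
toy_wait_fd(int fd, int timeout_ms)
{
   struct pollfd pfd = { .fd = fd, .events = POLLIN };
   int ret;

   do {
      ret = poll(&pfd, 1, timeout_ms);
   } while (ret == -1 && errno == EINTR);

   if (ret <= 0)
      return -1; /* poll() failed or timed out */
   if (pfd.revents & (POLLERR | POLLNVAL))
      return -1; /* the fd itself is in an error state */
   return 0;
}

/* Shortcut 1 in miniature: instead of importing the fd into the sync
 * object, wait on it right away, close it, and mark the object signalled.
 * A negative fd is taken to mean "nothing to wait for" (assumption). */
static int
toy_acquire(int native_fence_fd, struct toy_sync *sync)
{
   if (native_fence_fd >= 0) {
      if (toy_wait_fd(native_fence_fd, -1) != 0)
         return -1;
      close(native_fence_fd);
   }
   sync->signalled = true;
   return 0;
}
```

The cost of this approach is the one the commit message names: the caller
stalls inside the acquire call instead of letting the GPU queue wait.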

Testing
===
I tested with 64-bit ARC++ on a Skylake Chromebook and a 3.18 kernel.
The following pass:

   a little spinning cube demo APK
   dEQP-VK.info.*
   dEQP-VK.api.smoke.*
   dEQP-VK.api.info.instance.*
   dEQP-VK.api.info.device.*
   dEQP-VK.api.wsi.android.*

v2:
   - Reject VkNativeBufferANDROID if the dma-buf's size is too small for
 the VkImage.
   - Stop abusing VkNativeBufferANDROID by passing it to vkAllocateMemory
 during vkCreateImage. Instead, directly import its dma-buf during
 vkCreateImage with anv_bo_cache_import(). [for jekstrand]
   - Rebase onto Tapani's VK_EXT_debug_report changes.
   - Drop `CPPFLAGS += $(top_srcdir)/include/android`. The dir does not
 exist.

v3:
   - Delete duplicate #include "anv_private.h". [per Tapani]
   - Try to fix the Android-IA build in Android.vulkan.mk by following
 Tapani's example.

v4:
   - Unset EXEC_OBJECT_ASYNC and set EXEC_OBJECT_WRITE on the imported
 gralloc buffer, just as we do for all other winsys buffers in
 anv_wsi.c. [found by Tapani]

v5:
   - Really fix the Android-IA build by ensuring that Android.vulkan.mk
     uses Mesa's vulkan.h and not Android's.  Insert -I$(MESA_TOP)/include
     before -Iframeworks/native/vulkan/include. [for Tapani]
   - In vkAcquireImageANDROID, submit signal operations to the
 VkSemaphore and VkFence. [for zhou]
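
The v2 note about rejecting a dma-buf that is too small for the VkImage can
be sketched as follows. This is only an illustration under assumptions, not
the patch's code: `toy_check_import_size` is a hypothetical helper, and it
relies on the fd supporting lseek() with SEEK_END to report its size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 0 if the object behind fd holds at least required_size bytes,
 * and -1 if it is smaller or its size cannot be queried. */
static int
toy_check_import_size(int fd, uint64_t required_size)
{
   off_t size = lseek(fd, 0, SEEK_END);

   if (size < 0) {
      fprintf(stderr, "import rejected: cannot query buffer size\n");
      return -1;
   }
   if ((uint64_t)size < required_size) {
      fprintf(stderr, "import rejected: buffer has %lld bytes, image needs %llu\n",
              (long long)size, (unsigned long long)required_size);
      return -1;
   }
   return 0;
}
```

Checking before the import means a mismatched buffer fails at image creation
rather than causing out-of-bounds GPU access later.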

Cc: Tapani Pälli 
Cc: Jason Ekstrand 
Cc: zhoucm1 
---
  src/intel/Android.vulkan.mk |   7 +-
  src/intel/Makefile.sources  |   3 +
  src/intel/Makefile.vulkan.am|   2 +
  src/intel/vulkan/anv_android.c  | 243 
  src/intel/vulkan/anv_device.c   |  12 +-
  src/intel/vulkan/anv_entrypoints_gen.py |  10 +-
  src/intel/vulkan/anv_extensions.py  |   1 +
  src/intel/vulkan/anv_image.c| 148 +--
  src/intel/vulkan/anv_private.h  |   1 +
  9 files changed, 415 insertions(+), 12 deletions(-)
  create mode 100644 src/intel/vulkan/anv_android.c

diff --git a/src/intel/Android.vulkan.mk b/src/intel/Android.vulkan.mk
index e20b32b87c..b2d7d4e46c 100644
--- a/src/intel/Android.vulkan.mk
+++ b/src/intel/Android.vulkan.mk
@@ -28,6 +28,7 @@ VK_ENTRYPOINTS_SCRIPT := $(MESA_PYTHON2) 
$(LOCAL_PATH)/vulkan/anv_entrypoints_ge
  VK_EXTENSIONS_SCRIPT := $(MESA_PYTHON2) $(LOCAL_PATH)/vulkan/anv_extensions.py
  
  VULKAN_COMMON_INCLUDES := \

+   $(MESA_TOP)/include \
$(MESA_TOP)/src/mapi \
$(MESA_TOP)/src/gallium/auxiliary \
$(MESA_TOP)/src/gallium/include \
@@ -36,7 +37,8 @@ VULKAN_COMMON_INCLUDES := \
$(MESA_TOP)/src/vulkan/util \
$(MESA_TOP)/src/intel \
$(MESA_TOP)/include/drm-uapi \
-   $(MESA_TOP)/src/intel/vulkan
+   $(MESA_TOP)/src/intel/vulkan \
+   frameworks/native/vulkan/include
  
  # libmesa_anv_entrypoints with header and dummy.c

  #
@@ -254,7 +256,8 @@ LOCAL_CFLAGS := -DLOG_TAG=\"INTEL-MESA\"
  LOCAL_LDFLAGS += -Wl,--build-id=sha1
  
  LOCAL_SRC_FILES := \

-   $(VULKAN_GEM_FILES)
+   $(VULKAN_GEM_FILES) \
+   $(VULKAN_ANDROID_FILES)
  
  LOCAL_C_INCLUDES := \

$(VULKAN_COMMON_INCLUDES) \
diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 8ca50ff622..6f2dfa91e2 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -229,6 +229,9 @@

Re: [Mesa-dev] [PATCH 20/20] anv: Implement VK_ANDROID_native_buffer (v2)

2017-09-14 Thread zhoucm1



On 2017-09-14 07:03, Chad Versace wrote:

From: Chad Versace 

This implementation is correct (afaict), but takes two shortcuts
regarding the import/export of Android sync fds.

   Shortcut 1. When Android calls vkAcquireImageANDROID to import a sync
   fd into a VkSemaphore or VkFence, the driver instead simply blocks on
   the sync fd, then puts the VkSemaphore or VkFence into the signalled
   state. Thanks to implicit sync, this produces correct behavior (with
   extra latency overhead, perhaps) despite its ugliness.

   Shortcut 2. When Android calls vkQueueSignalReleaseImageANDROID to export
   a collection of wait semaphores as a sync fd, the driver instead
   submits the semaphores to the queue, then returns sync fd -1, which
   informs the caller that no additional synchronization is needed.
   Again, thanks to implicit sync, this produces correct behavior (with
   extra batch submission overhead) despite its ugliness.

I chose to take the shortcuts instead of properly importing/exporting
the sync fds for two reasons:

   Reason 1. I've already tested this patch with dEQP and with demo
   apps. It works. I wanted to get the tested patches into the tree now,
   and polish the implementation afterwards.

   Reason 2. I want to run this on a 3.18 kernel (gasp!). In 3.18, i915
   supports neither Android's sync_fence, nor upstream's sync_file, nor
   drm_syncobj. Again, I tested these patches on Android with a 3.18
   kernel and they work.

I plan to quickly follow-up with patches that remove the shortcuts and
properly import/export the sync fds.

Non-Testing
===
I did not test at all using the Android.mk buildsystem. I probably
broke it. Please test and review that.

Testing
===
I tested with 64-bit ARC++ on a Skylake Chromebook and a 3.18 kernel.
The following pass:

   a little spinning cube demo APK
   dEQP-VK.info.*
   dEQP-VK.api.smoke.*
   dEQP-VK.api.info.instance.*
   dEQP-VK.api.info.device.*
   dEQP-VK.api.wsi.android.*

v2:
   - Reject VkNativeBufferANDROID if the dma-buf's size is too small for
 the VkImage.
   - Stop abusing VkNativeBufferANDROID by passing it to vkAllocateMemory
 during vkCreateImage. Instead, directly import its dma-buf during
 vkCreateImage with anv_bo_cache_import(). [for jekstrand]
   - Rebase onto Tapani's VK_EXT_debug_report changes.
   - Drop `CPPFLAGS += $(top_srcdir)/include/android`. The dir does not
 exist.
---
  src/intel/Makefile.sources  |   3 +
  src/intel/Makefile.vulkan.am|   2 +
  src/intel/vulkan/anv_android.c  | 245 
  src/intel/vulkan/anv_device.c   |  12 +-
  src/intel/vulkan/anv_entrypoints_gen.py |  10 +-
  src/intel/vulkan/anv_extensions.py  |   1 +
  src/intel/vulkan/anv_image.c| 141 --
  src/intel/vulkan/anv_private.h  |   1 +
  8 files changed, 405 insertions(+), 10 deletions(-)
  create mode 100644 src/intel/vulkan/anv_android.c

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 8ca50ff622b..6f2dfa91e20 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -229,6 +229,9 @@ VULKAN_FILES := \
vulkan/anv_wsi.c \
vulkan/vk_format_info.h
  
+VULKAN_ANDROID_FILES := \

+   vulkan/anv_android.c
+
  VULKAN_WSI_WAYLAND_FILES := \
vulkan/anv_wsi_wayland.c
  
diff --git a/src/intel/Makefile.vulkan.am b/src/intel/Makefile.vulkan.am

index d1b1132ed2e..e9c824f717b 100644
--- a/src/intel/Makefile.vulkan.am
+++ b/src/intel/Makefile.vulkan.am
@@ -147,8 +147,10 @@ VULKAN_LIB_DEPS = \
-lm
  
  if HAVE_PLATFORM_ANDROID

+VULKAN_CPPFLAGS += $(ANDROID_CPPFLAGS)
  VULKAN_CFLAGS += $(ANDROID_CFLAGS)
  VULKAN_LIB_DEPS += $(ANDROID_LIBS)
+VULKAN_SOURCES += $(VULKAN_ANDROID_FILES)
  endif
  
  if HAVE_PLATFORM_X11

diff --git a/src/intel/vulkan/anv_android.c b/src/intel/vulkan/anv_android.c
new file mode 100644
index 000..6b19ace4d2d
--- /dev/null
+++ b/src/intel/vulkan/anv_android.c
@@ -0,0 +1,245 @@
+/*
+ * Copyright 2017 Google
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR 

Re: [Mesa-dev] [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-20 Thread zhoucm1



On 2017-07-20 22:59, Marek Olšák wrote:



On Jul 19, 2017 10:21 PM, "zhoucm1" <david1.z...@amd.com> wrote:




On 2017-07-19 23:34, Marek Olšák wrote:



On Jul 19, 2017 3:36 AM, "zhoucm1" <david1.z...@amd.com> wrote:



On 2017-07-19 04:08, Marek Olšák wrote:

From: Marek Olšák <marek.ol...@amd.com>

For lower overhead in the CS ioctl.
Winsys allocators are not used with interprocess-sharable
resources.

Hi Marek,

Could you explain how this approach reduces overhead in the CS
ioctl? By reusing BOs to shorten the BO list?


The kernel part of the work hasn't been done yet. The idea is
that nonsharable buffers don't have to be revalidated by TTM,

OK, maybe I can only see the whole picture of this idea once you
complete the kernel part.
Out of curiosity, why/how can nonsharable buffers skip revalidation by
TTM without exposing something like an amdgpu_bo_make_resident API?


I think the idea is that all nonsharable buffers will be backed by the 
same reservation object, so TTM can skip buffer validation if no 
buffer has been moved. It's just an optimization for the current design.



As mentioned in another thread, if we can expose a make_resident
API, we can remove the bo_list, and we can even remove the
reservation operation in the CS ioctl.
Right now, I think our BO list is a very bad design.
First, the UMD must create a BO list for every command submission,
which is extra CPU overhead compared with the traditional way.
Second, the kernel also has to iterate the list. When the BO list is
too long, as with OpenCL programs, which always throw several
thousand BOs into the BO list, the reservation code must keep those
thousands of ww_mutexes safe, and the CPU overhead is too big.

So I strongly suggest we expose a make_resident API to user space.
If we cannot, I would like to know the specific reason, to see
whether we can solve it.


Yeah, I think the BO list idea is likely to die sooner or later. It 
made sense for GL before bindless was a thing. Nowadays I don't see 
much value in it.


MesaGL will keep tracking the BO list because it's a requirement for 
good GL performance (it determines whether to flush IBs before BO 
synchronization, it allows tracking fences for each BO, which are used 
to determine dependencies between IBs, and that all allows async SDMA 
and async compute for GL, which doesn't have separate queues).


However, we don't need any BO list at the libdrm level and lower. I 
think a BO_CREATE flag that causes that the buffer is added to a 
kernel-side per-fd BO list would be sufficient. How the kernel manages 
its BO list should be its own implementation detail. Initially we can 
just move the current BO list management into the kernel.
I guess this idea will make the BO list worse: it just decreases UMD
effort but increases kernel driver complexity.


First, from your and Christian's comments, we can agree that the BO
list design is not a good approach.

My proposal of exposing amdgpu_bo_make_resident is meant to replace the BO list.
If we can make all needed BOs resident, then we don't need to validate them
again in the CS ioctl, and we no longer need their reservation locks. After
the job is pushed to the scheduler, we can un-resident the BOs.
We can even do this for VM BOs, so that we don't need to check for VM updates
again, since that is already done in the VA map ioctl.
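
A toy model may help make the proposal concrete. Everything in it is
hypothetical: amdgpu_bo_make_resident is only proposed in this thread, and
`toy_bo`, `toy_bo_make_resident`, `toy_bo_unresident`, `toy_try_evict` and
`toy_all_resident` are invented for illustration. The model assumes that
residency is a per-BO reference count that blocks eviction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer object: a residency reference count and a placement bit. */
struct toy_bo {
   unsigned resident_count;
   bool in_vram;
};

/* Validate and place the BO once, then hold it in place. */
static void
toy_bo_make_resident(struct toy_bo *bo)
{
   bo->in_vram = true;
   bo->resident_count++;
}

/* Drop one residency reference, for example after the job was queued. */
static void
toy_bo_unresident(struct toy_bo *bo)
{
   assert(bo->resident_count > 0);
   bo->resident_count--;
}

/* Eviction is refused while any residency reference is held. */
static bool
toy_try_evict(struct toy_bo *bo)
{
   if (bo->resident_count > 0)
      return false;
   bo->in_vram = false;
   return true;
}

/* What submission would check instead of validating and locking every BO:
 * it only reads the residency state. */
static bool
toy_all_resident(const struct toy_bo *bos, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      if (bos[i].resident_count == 0)
         return false;
   }
   return true;
}
```

The trade-off the thread debates is visible here: submission gets cheaper,
but resident BOs cannot be evicted, so eviction policy has to cope with
memory that userspace holds in place.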


If this gets done (and eviction has been improved further), I cannot see
any obvious performance gap.


What do you think of this proposal of exposing an amdgpu_bo_make_resident
API to user space? Or is there any other idea we can discuss?


If you all agree, I can volunteer to try it with the UMD guys.

Regards,
David Zhou



Marek





Regards,
David Zhou


so it should remove a lot of kernel overhead and the BO list
remains the same.

Marek



Thanks,
David Zhou


v2: It shouldn't crash anymore, but the kernel will
reject the new flag.
---
  src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
  src/gallium/drivers/radeon/radeon_winsys.h      | 20 +++---
  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c       | 36 -
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
  4 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index dd1c209..2747ac4 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -160,20 +160,27 @@ void r600_init_resource_fields(struct r600_common_screen *rscreen,
  

Re: [Mesa-dev] [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-19 Thread zhoucm1



On 2017-07-19 23:34, Marek Olšák wrote:



On Jul 19, 2017 3:36 AM, "zhoucm1" <david1.z...@amd.com> wrote:




On 2017-07-19 04:08, Marek Olšák wrote:

From: Marek Olšák <marek.ol...@amd.com>

For lower overhead in the CS ioctl.
Winsys allocators are not used with interprocess-sharable
resources.

Hi Marek,

Could you explain how this approach reduces overhead in the CS ioctl?
By reusing BOs to shorten the BO list?


The kernel part of the work hasn't been done yet. The idea is that 
nonsharable buffers don't have to be revalidated by TTM,
OK, maybe I can only see the whole picture of this idea once you
complete the kernel part.
Out of curiosity, why/how can nonsharable buffers skip revalidation by TTM
without exposing something like an amdgpu_bo_make_resident API?


As mentioned in another thread, if we can expose a make_resident API, we
can remove the bo_list, and we can even remove the reservation operation
in the CS ioctl.

Right now, I think our BO list is a very bad design.
First, the UMD must create a BO list for every command submission, which
is extra CPU overhead compared with the traditional way.
Second, the kernel also has to iterate the list. When the BO list is too
long, as with OpenCL programs, which always throw several thousand BOs
into the BO list, the reservation code must keep those thousands of
ww_mutexes safe, and the CPU overhead is too big.


So I strongly suggest we expose a make_resident API to user space.
If we cannot, I would like to know the specific reason, to see whether we
can solve it.



Regards,
David Zhou
so it should remove a lot of kernel overhead and the BO list remains 
the same.


Marek



Thanks,
David Zhou


v2: It shouldn't crash anymore, but the kernel will reject the
new flag.
---
  src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
  src/gallium/drivers/radeon/radeon_winsys.h      | 20 +++---
  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c       | 36 -
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
  4 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index dd1c209..2747ac4 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -160,20 +160,27 @@ void r600_init_resource_fields(struct r600_common_screen *rscreen,
     }

     /* Tiled textures are unmappable. Always put them in VRAM. */
     if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
         res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
         res->domains = RADEON_DOMAIN_VRAM;
         res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
                       RADEON_FLAG_GTT_WC;
     }

+    /* Only displayable single-sample textures can be shared between
+     * processes. */
+    if (res->b.b.target == PIPE_BUFFER ||
+        res->b.b.nr_samples >= 2 ||
+        rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
+        res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
     /* If VRAM is just stolen system memory, allow both VRAM and
      * GTT, whichever has free space. If a buffer is evicted from
      * VRAM to GTT, it will stay there.
      *
      * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
      * placements even with a low amount of stolen VRAM.
      */
     if (!rscreen->info.has_dedicated_vram &&
         (rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
         res->domains == RADEON_DOMAIN_VRAM) {
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index 351edcd..0abcb56 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -47,20 +47,21 @@ enum radeon_bo_domain { /* bitfield */
     RADEON_DOMAIN_GTT  = 2,
     RADEON_DOMAIN_VRAM = 4,
     RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
 };

 enum radeon_bo_flag { /* bitfield */
     RADEON_FLAG_GTT_WC =        (1 << 0),
     RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
     RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
     RADEON_FLAG_SPARSE =        (1 << 3),
+RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 <
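
The rule added to r600_init_resource_fields above ("only displayable
single-sample textures can be shared between processes") can be tried out
in isolation with the sketch below. The types are simplified stand-ins, not
the Mesa definitions: every `toy_`/`TOY_` name is hypothetical, and the flag
value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the fields the patch inspects. */
enum toy_target { TOY_BUFFER, TOY_TEXTURE_2D };
enum toy_micro_mode { TOY_MICRO_MODE_DISPLAY, TOY_MICRO_MODE_THIN };

/* Arbitrary bit chosen for this sketch only. */
#define TOY_FLAG_NO_INTERPROCESS_SHARING (1u << 0)

struct toy_resource {
   enum toy_target target;
   unsigned nr_samples;
   enum toy_micro_mode micro_tile_mode; /* meaningful for textures only */
   unsigned flags;
};

/* Mark everything that cannot be a shared scanout-style texture as
 * non-shareable. The buffer test comes first, so the texture-only field
 * is never consulted for plain buffers (short-circuit evaluation). */
static void
toy_init_sharing_flag(struct toy_resource *res)
{
   if (res->target == TOY_BUFFER ||
       res->nr_samples >= 2 ||
       res->micro_tile_mode != TOY_MICRO_MODE_DISPLAY)
      res->flags |= TOY_FLAG_NO_INTERPROCESS_SHARING;
}

/* Convenience query used by the checks in this sketch. */
static bool
toy_is_shareable(const struct toy_resource *res)
{
   return !(res->flags & TOY_FLAG_NO_INTERPROCESS_SHARING);
}
```

Under this rule only a single-sample texture in the display micro tile mode
stays shareable. Buffers, multisampled textures and non-display tilings all
get the flag.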

Re: [Mesa-dev] [RFC PATCH] radeonsi: set a per-buffer flag that disables inter-process sharing (v2)

2017-07-19 Thread zhoucm1



On 2017-07-19 04:08, Marek Olšák wrote:

From: Marek Olšák 

For lower overhead in the CS ioctl.
Winsys allocators are not used with interprocess-sharable resources.

Hi Marek,

Could you explain how this approach reduces overhead in the CS ioctl?
By reusing BOs to shorten the BO list?


Thanks,
David Zhou


v2: It shouldn't crash anymore, but the kernel will reject the new flag.
---
  src/gallium/drivers/radeon/r600_buffer_common.c |  7 +
  src/gallium/drivers/radeon/radeon_winsys.h  | 20 +++---
  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c   | 36 -
  src/gallium/winsys/radeon/drm/radeon_drm_bo.c   | 27 +++
  4 files changed, 62 insertions(+), 28 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index dd1c209..2747ac4 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -160,20 +160,27 @@ void r600_init_resource_fields(struct r600_common_screen *rscreen,
}
  
  	/* Tiled textures are unmappable. Always put them in VRAM. */

if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
res->flags & R600_RESOURCE_FLAG_UNMAPPABLE) {
res->domains = RADEON_DOMAIN_VRAM;
res->flags |= RADEON_FLAG_NO_CPU_ACCESS |
 RADEON_FLAG_GTT_WC;
}
  
+	/* Only displayable single-sample textures can be shared between

+* processes. */
+   if (res->b.b.target == PIPE_BUFFER ||
+   res->b.b.nr_samples >= 2 ||
+   rtex->surface.micro_tile_mode != RADEON_MICRO_MODE_DISPLAY)
+   res->flags |= RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
/* If VRAM is just stolen system memory, allow both VRAM and
 * GTT, whichever has free space. If a buffer is evicted from
 * VRAM to GTT, it will stay there.
 *
 * DRM 3.6.0 has good BO move throttling, so we can allow VRAM-only
 * placements even with a low amount of stolen VRAM.
 */
if (!rscreen->info.has_dedicated_vram &&
(rscreen->info.drm_major < 3 || rscreen->info.drm_minor < 6) &&
res->domains == RADEON_DOMAIN_VRAM) {
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index 351edcd..0abcb56 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -47,20 +47,21 @@ enum radeon_bo_domain { /* bitfield */
  RADEON_DOMAIN_GTT  = 2,
  RADEON_DOMAIN_VRAM = 4,
  RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
  };
  
  enum radeon_bo_flag { /* bitfield */

  RADEON_FLAG_GTT_WC =(1 << 0),
  RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),
  RADEON_FLAG_NO_SUBALLOC =   (1 << 2),
  RADEON_FLAG_SPARSE =(1 << 3),
+RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),
  };
  
  enum radeon_bo_usage { /* bitfield */

  RADEON_USAGE_READ = 2,
  RADEON_USAGE_WRITE = 4,
  RADEON_USAGE_READWRITE = RADEON_USAGE_READ | RADEON_USAGE_WRITE,
  
  /* The winsys ensures that the CS submission will be scheduled after

   * previously flushed CSs referencing this BO in a conflicting way.
   */
@@ -685,28 +686,33 @@ static inline enum radeon_bo_domain radeon_domain_from_heap(enum radeon_heap hea
  default:
  assert(0);
  return (enum radeon_bo_domain)0;
  }
  }
  
  static inline unsigned radeon_flags_from_heap(enum radeon_heap heap)

  {
  switch (heap) {
  case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
-return RADEON_FLAG_GTT_WC | RADEON_FLAG_NO_CPU_ACCESS;
+return RADEON_FLAG_GTT_WC |
+   RADEON_FLAG_NO_CPU_ACCESS |
+   RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
  case RADEON_HEAP_VRAM:
  case RADEON_HEAP_VRAM_GTT:
  case RADEON_HEAP_GTT_WC:
-return RADEON_FLAG_GTT_WC;
+return RADEON_FLAG_GTT_WC |
+   RADEON_FLAG_NO_INTERPROCESS_SHARING;
+
  case RADEON_HEAP_GTT:
  default:
-return 0;
+return RADEON_FLAG_NO_INTERPROCESS_SHARING;
  }
  }
  
  /* The pb cache bucket is chosen to minimize pb_cache misses.

   * It must be between 0 and 3 inclusive.
   */
  static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
  {
  switch (heap) {
  case RADEON_HEAP_VRAM_NO_CPU_ACCESS:
@@ -724,22 +730,28 @@ static inline unsigned radeon_get_pb_cache_bucket_index(enum radeon_heap heap)
  
  /* Return the heap index for winsys allocators, or -1 on failure. */

  static inline int radeon_get_heap_index(enum radeon_bo_domain domain,
  enum radeon_bo_flag flags)
  {
  /* VRAM implies WC (write combining) */
  assert(!(domain & RADEON_DOMAIN_VRAM) || flags & RADEON_FLAG_GTT_WC);
  /* NO_CPU_ACCESS implies VRAM only. */
  assert(!(flags & RADEON_FLAG_NO_CPU