Re: [Mesa-dev] [PATCH 1/5] clover/memory: Copy data when creating buffers with CL_MEM_USE_HOST_PTR

2017-08-04 Thread Grigori Goronzy

On 2017-08-03 22:26, Alex Deucher wrote:


IIRC, user_ptrs require page alignment.

Alex



I didn't follow the whole discussion (sorry if I'm saying something 
redundant), but AMD's older OpenCL Optimization Guide [1] has some notes 
regarding the implementation of the USE_HOST_PTR flag.
It initially recommends 4KB (i.e. page) alignment but also supports 
arbitrary alignment (with additional overhead; I suppose it pins an 
extra page for bad alignments). It also does some optimizations to 
minimize mapping/unmapping operations, called "pre-pinning". Not sure if 
that is applicable to Mesa/Clover; aren't (GTT) buffers usually mapped 
forever?
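
For illustration, an application that wants to stay on the page-aligned path could allocate its host buffer roughly as below. This is only a sketch under assumptions: the helper names (is_aligned, round_up, alloc_page_aligned) are made up for this example and come from neither the AMD guide nor Clover, and the page size is queried with sysconf() rather than hard-coded to 4KB.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return ((uintptr_t)p & (align - 1)) == 0;
}

/* Rounds size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);
   return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: allocates a buffer that starts on a page boundary and
 * spans whole pages, so no unrelated data shares its first or last page.
 * Release with free(). Returns NULL on failure. */
static void *alloc_page_aligned(size_t size)
{
   long page = sysconf(_SC_PAGESIZE);
   void *p = NULL;

   if (page <= 0)
      return NULL;
   if (posix_memalign(&p, (size_t)page, round_up(size, (size_t)page)) != 0)
      return NULL;
   return p;
}
```

A pointer obtained this way could then be passed as host_ptr together with CL_MEM_USE_HOST_PTR; whether that actually avoids the extra overhead depends on the implementation.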


Grigori

[1] 
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide.pdf




Right now it's hard-coded to R600_MAP_BUFFER_ALIGNMENT in si_pipe.c
and r600_pipe.c which has a value of 64 (bytes, I believe).





And also change si_pipe.c:si_get_param's switch statement value to 
return:

  case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
return sscreen->b.info.gart_page_size;


I'm not sure what the correct value is here. AFAIK, EG uses 256B cache
lines, so I'd expect the value to be at least that.


Depending on how the weather works out tonight, I might be able to at
least find out what NI reports for gart page sizes and compare that to
my SI.  I haven't tried to test user pointer support on r600g yet, so
either it's working alright with the existing 64-byte alignment, or
it's broken when we allocate pointers using the actual alignments
reported by clGetDeviceInfo. If it's broken, I'll try 256B, then keep
bumping it up until it either starts working or I hit GART page size.
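
The search described above (start at 256B, keep doubling until it works or the GART page size is reached) could be sketched as follows. The probe callback is a stand-in: in a real test it would create a buffer from a user pointer with the given alignment and report success, which is not shown here. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: returns nonzero if user-pointer buffers work when the
 * host pointer is aligned to `alignment` bytes. */
typedef int (*probe_fn)(size_t alignment, void *data);

/* Doubles the alignment from `start` up to and including `max` and returns
 * the first value the probe accepts, or 0 if none of them works. */
static size_t find_working_alignment(size_t start, size_t max,
                                     probe_fn probe, void *data)
{
   assert(start != 0 && start <= max);

   for (size_t a = start; ; a *= 2) {
      if (probe(a, data))
         return a;
      if (a > max / 2)   /* next doubling would exceed max (or overflow) */
         break;
   }
   return 0;
}

/* Stand-in probe for experiments: pretends the hardware needs at least
 * *(size_t *)data bytes of alignment. */
static int fake_probe(size_t alignment, void *data)
{
   size_t required = *(const size_t *)data;
   return alignment >= required;
}
```

With a 4K GART page size the call would be find_working_alignment(256, 4096, probe, data); a return value of 0 would mean that even page alignment does not help, which would point at a different problem.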

--Aaron



Both NI and GCN should be able to use 4K pages (which is what
gart_page_size is set to), but we might want higher alignment for
better performance[0]

[0] https://lists.freedesktop.org/archives/dri-devel/2014-May/058858.html


Then I can successfully create buffers from user pointers on my SI card.


I'm a bit fuzzy on what alignment restrictions exist for SI/GCN cards,
but the winsys seems to indicate we should align things to gart page
size, which makes sense on the surface at least.

If the alignment restrictions have changed between R600 and GCN, that
might explain why what's broken for me is working for you/Grigori (on
r600).


I remember there was a buffer alignment patch from AMD recently for
SI/CI vs. VI+, but I can't find it.
It looks like a separate issue, however. If incorrect alignment makes
user_ptr fail, and the test still fails, it looks like the no-user_ptr
fallback is broken.
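
To make the two paths concrete, here is a rough sketch of the behaviour one would expect: use the host pointer directly when it meets the alignment requirement, otherwise fall back to a private copy. This is not Clover's or the winsys code; struct buffer, buffer_init and buffer_fini are invented for the example, and the copy path stands in for whatever the real no-user_ptr fallback does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum backing { BACKING_USER_PTR, BACKING_COPY };

/* Hypothetical buffer object for illustration only. */
struct buffer {
   enum backing kind;
   void *storage;   /* host_ptr itself, or a private copy of its contents */
   size_t size;
};

/* Wraps host_ptr directly if it satisfies required_align (a power of two);
 * otherwise falls back to allocating storage and copying the data.
 * Returns 0 on success, -1 if the fallback allocation fails. */
static int buffer_init(struct buffer *buf, void *host_ptr, size_t size,
                       size_t required_align)
{
   assert(buf != NULL && host_ptr != NULL && size > 0);
   assert(required_align != 0 && (required_align & (required_align - 1)) == 0);

   buf->size = size;
   if (((uintptr_t)host_ptr & (required_align - 1)) == 0) {
      buf->kind = BACKING_USER_PTR;
      buf->storage = host_ptr;
      return 0;
   }

   /* Fallback: the pointer is unsuitable, so keep a copy instead. */
   buf->storage = malloc(size);
   if (buf->storage == NULL)
      return -1;
   memcpy(buf->storage, host_ptr, size);
   buf->kind = BACKING_COPY;
   return 0;
}

/* Releases the private copy, if any; the user's memory is never freed. */
static void buffer_fini(struct buffer *buf)
{
   if (buf->kind == BACKING_COPY)
      free(buf->storage);
   buf->storage = NULL;
}
```

If buffer creation fails for a misaligned pointer even though a path like the second one exists, the fallback itself is the thing to debug.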

Jan



--Aaron

>
> Jan
>
>
> >
> > Signed-off-by: Aaron Watry 
> > CC: Francisco Jerez 
> > ---
> >  src/gallium/state_trackers/clover/core/memory.cpp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/gallium/state_trackers/clover/core/memory.cpp b/src/gallium/state_trackers/clover/core/memory.cpp
> > index b852e6896f..912d74830a 100644
> > --- a/src/gallium/state_trackers/clover/core/memory.cpp
> > +++ b/src/gallium/state_trackers/clover/core/memory.cpp
> > @@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context &ctx, cl_mem_flags flags,
> > size_t size, void *host_ptr) :
> > context(ctx), _flags(flags),
> > _size(size), _host_ptr(host_ptr) {
> > -   if (flags & CL_MEM_COPY_HOST_PTR)
> > +   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
> >data.append((char *)host_ptr, size);
> >  }
> >
>
> --
> Jan Vesely 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev



Re: [Mesa-dev] [PATCH 1/2] glx: add support for GLX_ARB_create_context_no_error

2017-08-03 Thread Grigori Goronzy

Hi,

there is also a patch on the xorg-devel list that is needed to make this 
work for Xorg, as well as a preliminary piglit test on the piglit list to 
verify the functionality.


Grigori

On 2017-08-03 20:07, Grigori Goronzy wrote:

---
 src/glx/dri2_glx.c  | 12 
 src/glx/dri3_glx.c  |  8 
 src/glx/dri_common.c| 52 -
 src/glx/dri_common.h|  5 +
 src/glx/drisw_glx.c |  3 +++
 src/glx/glxclient.h |  6 ++
 src/glx/glxextensions.c |  1 +
 src/glx/glxextensions.h |  1 +
 8 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index ae8cb11..263f864 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -262,6 +262,10 @@ dri2_create_context_attribs(struct glx_screen *base,
  , , error))
   goto error_exit;

+   if (!dri2_check_no_error(flags, shareList, major_ver, error)) {
+  goto error_exit;
+   }
+
/* Check the renderType value */
if (!validate_renderType_against_config(config_base, renderType))
goto error_exit;
@@ -1159,6 +1163,14 @@ dri2BindExtensions(struct dri2_screen *psc,
struct glx_display * priv,
  __glXEnableDirectExtension(>base,
 "GLX_ARB_create_context_robustness");

+  /* DRI2 version 3 is also required because
+   * GLX_ARB_create_context_no_error requires GLX_ARB_create_context.
+   */
+  if (psc->dri2->base.version >= 3
+  && strcmp(extensions[i]->name, __DRI2_NO_ERROR) == 0)
+ __glXEnableDirectExtension(>base,
+"GLX_ARB_create_context_no_error");
+
   /* DRI2 version 3 is also required because GLX_MESA_query_renderer
* requires GLX_ARB_create_context_profile.
*/
diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index 5091606..19667fa 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -283,6 +283,10 @@ dri3_create_context_attribs(struct glx_screen *base,
  , error))
   goto error_exit;

+   if (!dri2_check_no_error(flags, shareList, major_ver, error)) {
+  goto error_exit;
+   }
+
/* Check the renderType value */
if (!validate_renderType_against_config(config_base, render_type))
goto error_exit;
@@ -754,6 +758,10 @@ dri3_bind_extensions(struct dri3_screen *psc,
struct glx_display * priv,
  __glXEnableDirectExtension(>base,
 "GLX_ARB_create_context_robustness");

+  if (strcmp(extensions[i]->name, __DRI2_NO_ERROR) == 0)
+ __glXEnableDirectExtension(>base,
+"GLX_ARB_create_context_no_error");
+
   if (strcmp(extensions[i]->name, __DRI2_RENDERER_QUERY) == 0) {
  psc->rendererQuery = (__DRI2rendererQueryExtension *) extensions[i];
  __glXEnableDirectExtension(>base, "GLX_MESA_query_renderer");
diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 854733a..2cab207 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -468,6 +468,7 @@ dri2_convert_glx_attribs(unsigned num_attribs,
const uint32_t *attribs,
 {
unsigned i;
bool got_profile = false;
+   int no_error = 0;
uint32_t profile;

*major_ver = 1;
@@ -499,6 +500,9 @@ dri2_convert_glx_attribs(unsigned num_attribs,
const uint32_t *attribs,
   case GLX_CONTEXT_FLAGS_ARB:
 *flags = attribs[i * 2 + 1];
 break;
+  case GLX_CONTEXT_OPENGL_NO_ERROR_ARB:
+no_error = attribs[i * 2 + 1];
+break;
   case GLX_CONTEXT_PROFILE_MASK_ARB:
 profile = attribs[i * 2 + 1];
 got_profile = true;
@@ -527,6 +531,10 @@ dri2_convert_glx_attribs(unsigned num_attribs,
const uint32_t *attribs,
   }
}

+   if (no_error) {
+  *flags |= __DRI_CTX_FLAG_NO_ERROR;
+   }
+
if (!got_profile) {
   if (*major_ver > 3 || (*major_ver == 3 && *minor_ver >= 2))
 *api = __DRI_API_OPENGL_CORE;
@@ -567,7 +575,8 @@ dri2_convert_glx_attribs(unsigned num_attribs,
const uint32_t *attribs,
/* Unknown flag value.
 */
if (*flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_FORWARD_COMPATIBLE
-  | __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS)) {
+  | __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS
+  | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
@@ -592,4 +601,45 @@ dri2_convert_glx_attribs(unsigned num_attribs,
const uint32_t *attribs,
return true;
 }

+_X_HIDDEN bool
+dri2_check_no_error(uint32_t flags, struct glx_context *share_context,
+int major, unsigned *error)
+{
+   Bool noError = flags & __DRI_CTX_FLAG_NO_ERROR;
+
+   /* The KHR_no_error specs say:
+*
+*Requires OpenGL ES 2.

[Mesa-dev] [PATCH 2/2] st/glx: add support for GLX_ARB_create_context_no_error

2017-08-03 Thread Grigori Goronzy
---
 src/gallium/state_trackers/glx/xlib/glx_api.c | 55 ---
 src/gallium/state_trackers/glx/xlib/xm_api.c  |  6 ++-
 src/gallium/state_trackers/glx/xlib/xm_api.h  |  4 +-
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/src/gallium/state_trackers/glx/xlib/glx_api.c 
b/src/gallium/state_trackers/glx/xlib/glx_api.c
index c473a0f..ac50dc6 100644
--- a/src/gallium/state_trackers/glx/xlib/glx_api.c
+++ b/src/gallium/state_trackers/glx/xlib/glx_api.c
@@ -66,6 +66,7 @@
"GLX_MESA_pixmap_colormap " \
"GLX_MESA_release_buffers " \
"GLX_ARB_create_context " \
+   "GLX_ARB_create_context_no_error " \
"GLX_ARB_create_context_profile " \
"GLX_ARB_get_proc_address " \
"GLX_EXT_create_context_es_profile " \
@@ -1108,7 +1109,8 @@ static GLXContext
 create_context(Display *dpy, XMesaVisual xmvis,
XMesaContext shareCtx, Bool direct,
unsigned major, unsigned minor,
-   unsigned profileMask, unsigned contextFlags)
+   unsigned profileMask, unsigned contextFlags,
+   Bool noError)
 {
GLXContext glxCtx;
 
@@ -1125,7 +1127,8 @@ create_context(Display *dpy, XMesaVisual xmvis,
 #endif
 
glxCtx->xmesaContext = XMesaCreateContext(xmvis, shareCtx, major, minor,
- profileMask, contextFlags);
+ profileMask, contextFlags,
+ (GLboolean)noError);
if (!glxCtx->xmesaContext) {
   free(glxCtx);
   return NULL;
@@ -1158,7 +1161,8 @@ glXCreateContext( Display *dpy, XVisualInfo *visinfo,
return create_context(dpy, xmvis,
  shareCtx ? shareCtx->xmesaContext : NULL,
  direct,
- 1, 0, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0x0);
+ 1, 0, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0x0,
+ False);
 }
 
 
@@ -2194,7 +2198,8 @@ glXCreateNewContext( Display *dpy, GLXFBConfig config,
return create_context(dpy, xmvis,
  shareCtx ? shareCtx->xmesaContext : NULL,
  direct,
- 1, 0, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0x0);
+ 1, 0, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0x0,
+ False);
 }
 
 
@@ -2409,7 +2414,8 @@ glXCreateContextWithConfigSGIX(Display *dpy, 
GLXFBConfigSGIX config,
return create_context(dpy, xmvis,
  shareCtx ? shareCtx->xmesaContext : NULL,
  direct,
- 1, 0, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0x0);
+ 1, 0, GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB, 0x0,
+ False);
 }
 
 
@@ -2741,6 +2747,7 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig 
config,
int renderType = GLX_RGBA_TYPE;
unsigned i;
Bool done = False;
+   Bool noError = False;
const int contextFlagsAll = (GLX_CONTEXT_DEBUG_BIT_ARB |
 GLX_CONTEXT_FORWARD_COMPATIBLE_BIT_ARB);
GLXContext ctx;
@@ -2757,6 +2764,9 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig 
config,
   case GLX_CONTEXT_FLAGS_ARB:
  contextFlags = attrib_list[++i];
  break;
+  case GLX_CONTEXT_OPENGL_NO_ERROR_ARB:
+ noError = attrib_list[++i];
+ break;
   case GLX_CONTEXT_PROFILE_MASK_ARB:
  profileMask = attrib_list[++i];
  break;
@@ -2826,16 +2836,49 @@ glXCreateContextAttribsARB(Display *dpy, GLXFBConfig 
config,
   return NULL;
}
 
+   /* The KHR_no_error specs say:
+*
+*Requires OpenGL ES 2.0 or OpenGL 2.0.
+*/
+   if (noError && majorVersion < 2) {
+  generate_error(dpy, BadMatch, 0, X_GLXCreateContextAttribsARB, True);
+  return NULL;
+   }
+
if (renderType == GLX_COLOR_INDEX_TYPE && majorVersion >= 3) {
   generate_error(dpy, BadMatch, 0, X_GLXCreateContextAttribsARB, True);
   return NULL;
}
 
+   /* The GLX_ARB_create_context_no_error specs say:
+*
+*BadMatch is generated if the value of GLX_CONTEXT_OPENGL_NO_ERROR_ARB
+*used to create <share_context> does not match the value of
+*GLX_CONTEXT_OPENGL_NO_ERROR_ARB for the context being created.
+*/
+   if (shareCtx && shareCtx->xmesaContext->no_error != noError) {
+  generate_error(dpy, BadMatch, 0, X_GLXCreateContextAttribsARB, True);
+  return NULL;
+   }
+
+   /* The GLX_ARB_create_context_no_error specs say:
+*
+*BadMatch is generated if the GLX_CONTEXT_OPENGL_NO_ERROR_ARB is TRUE at
+*the same time as a debug or robustness context is specified.
+*
+* Robustness isn't supported by this GLX implementation yet, so doesn't
+* apply.
+*/
+   if (noError && (contextFlags & GLX_CONTEXT_DEBUG_BIT_ARB)) {
+  generate_error(dpy, BadMatch, 0, 

[Mesa-dev] [PATCH 1/2] glx: add support for GLX_ARB_create_context_no_error

2017-08-03 Thread Grigori Goronzy
---
 src/glx/dri2_glx.c  | 12 
 src/glx/dri3_glx.c  |  8 
 src/glx/dri_common.c| 52 -
 src/glx/dri_common.h|  5 +
 src/glx/drisw_glx.c |  3 +++
 src/glx/glxclient.h |  6 ++
 src/glx/glxextensions.c |  1 +
 src/glx/glxextensions.h |  1 +
 8 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index ae8cb11..263f864 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -262,6 +262,10 @@ dri2_create_context_attribs(struct glx_screen *base,
  , , error))
   goto error_exit;
 
+   if (!dri2_check_no_error(flags, shareList, major_ver, error)) {
+  goto error_exit;
+   }
+
/* Check the renderType value */
if (!validate_renderType_against_config(config_base, renderType))
goto error_exit;
@@ -1159,6 +1163,14 @@ dri2BindExtensions(struct dri2_screen *psc, struct 
glx_display * priv,
  __glXEnableDirectExtension(>base,
 "GLX_ARB_create_context_robustness");
 
+  /* DRI2 version 3 is also required because
+   * GLX_ARB_create_context_no_error requires GLX_ARB_create_context.
+   */
+  if (psc->dri2->base.version >= 3
+  && strcmp(extensions[i]->name, __DRI2_NO_ERROR) == 0)
+ __glXEnableDirectExtension(>base,
+"GLX_ARB_create_context_no_error");
+
   /* DRI2 version 3 is also required because GLX_MESA_query_renderer
* requires GLX_ARB_create_context_profile.
*/
diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index 5091606..19667fa 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -283,6 +283,10 @@ dri3_create_context_attribs(struct glx_screen *base,
  , error))
   goto error_exit;
 
+   if (!dri2_check_no_error(flags, shareList, major_ver, error)) {
+  goto error_exit;
+   }
+
/* Check the renderType value */
if (!validate_renderType_against_config(config_base, render_type))
goto error_exit;
@@ -754,6 +758,10 @@ dri3_bind_extensions(struct dri3_screen *psc, struct 
glx_display * priv,
  __glXEnableDirectExtension(>base,
 "GLX_ARB_create_context_robustness");
 
+  if (strcmp(extensions[i]->name, __DRI2_NO_ERROR) == 0)
+ __glXEnableDirectExtension(>base,
+"GLX_ARB_create_context_no_error");
+
   if (strcmp(extensions[i]->name, __DRI2_RENDERER_QUERY) == 0) {
  psc->rendererQuery = (__DRI2rendererQueryExtension *) extensions[i];
  __glXEnableDirectExtension(>base, "GLX_MESA_query_renderer");
diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index 854733a..2cab207 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -468,6 +468,7 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
 {
unsigned i;
bool got_profile = false;
+   int no_error = 0;
uint32_t profile;
 
*major_ver = 1;
@@ -499,6 +500,9 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
   case GLX_CONTEXT_FLAGS_ARB:
 *flags = attribs[i * 2 + 1];
 break;
+  case GLX_CONTEXT_OPENGL_NO_ERROR_ARB:
+no_error = attribs[i * 2 + 1];
+break;
   case GLX_CONTEXT_PROFILE_MASK_ARB:
 profile = attribs[i * 2 + 1];
 got_profile = true;
@@ -527,6 +531,10 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
   }
}
 
+   if (no_error) {
+  *flags |= __DRI_CTX_FLAG_NO_ERROR;
+   }
+
if (!got_profile) {
   if (*major_ver > 3 || (*major_ver == 3 && *minor_ver >= 2))
 *api = __DRI_API_OPENGL_CORE;
@@ -567,7 +575,8 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
/* Unknown flag value.
 */
if (*flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_FORWARD_COMPATIBLE
-  | __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS)) {
+  | __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS
+  | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
@@ -592,4 +601,45 @@ dri2_convert_glx_attribs(unsigned num_attribs, const 
uint32_t *attribs,
return true;
 }
 
+_X_HIDDEN bool
+dri2_check_no_error(uint32_t flags, struct glx_context *share_context,
+int major, unsigned *error)
+{
+   Bool noError = flags & __DRI_CTX_FLAG_NO_ERROR;
+
+   /* The KHR_no_error specs say:
+*
+*Requires OpenGL ES 2.0 or OpenGL 2.0.
+*/
+   if (major < 2) {
+  *error = __DRI_CTX_ERROR_UNKNOWN_ATTRIBUTE;
+  return false;
+   }
+
+   /* The GLX_ARB_create_context_no_error specs say:
+*
+*BadMatch is generated if the value of GLX_CONTEXT_OPENGL_NO_ERROR_ARB
+*used to create <share_context> does not match the value of
+*GLX_CONTEXT_OPENGL_NO_ERROR_ARB for the context 

Re: [Mesa-dev] [PATCH] egl: fix check for KHR_no_error vs debug/robustness

2017-07-26 Thread Grigori Goronzy

On 2017-07-19 23:51, Grigori Goronzy wrote:

The check is too aggressive and might also fail if context flags
appear after the no-error attribute in the context attribute list.

Delay the check to after attribute parsing to fix this.
---
This was found by the piglit test I just sent to the piglit ML. I promise,
next time I'll write tests before writing any code that touches public
interfaces. :)



Ping.

I'd especially like to get this into Mesa 17.2. As noted earlier, this 
patch can be tested with the new piglit test I've sent.


Grigori


 src/egl/main/eglcontext.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
index 1a8e9bd..1b03160 100644
--- a/src/egl/main/eglcontext.c
+++ b/src/egl/main/eglcontext.c
@@ -328,17 +328,6 @@ _eglParseContextAttribList(_EGLContext *ctx,
_EGLDisplay *dpy,
 break;
  }

- /* The EGL_KHR_create_context_no_error spec says:
-  *
-  *"BAD_MATCH is generated if the EGL_CONTEXT_OPENGL_NO_ERROR_KHR is TRUE at
-  *the same time as a debug or robustness context is specified."
-  */
- if (ctx->Flags & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR ||
- ctx->Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) {
-err = EGL_BAD_MATCH;
-break;
- }
-
  /* Canonicalize value to EGL_TRUE/EGL_FALSE definitions */
  ctx->NoError = !!val;
  break;
@@ -489,6 +478,16 @@ _eglParseContextAttribList(_EGLContext *ctx,
_EGLDisplay *dpy,
   break;
}

+   /* The EGL_KHR_create_context_no_error spec says:
+*
+*"BAD_MATCH is generated if the EGL_CONTEXT_OPENGL_NO_ERROR_KHR is TRUE at
+*the same time as a debug or robustness context is specified."
+*/
+   if (ctx->NoError && (ctx->Flags & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR ||
+ctx->Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR)) {
+  err = EGL_BAD_MATCH;
+   }
+
if ((ctx->Flags & ~(EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR
   | EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR
   | EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR)) != 0) {



Re: [Mesa-dev] [PATCH] dri: Make classic drivers allow __DRI_CTX_FLAG_NO_ERROR.

2017-07-20 Thread Grigori Goronzy

On 2017-07-18 20:25, Ian Romanick wrote:

On 07/14/2017 04:10 PM, Kenneth Graunke wrote:

Grigori recently added EGL_KHR_create_context_no_error support,
which causes EGL to pass a new __DRI_CTX_FLAG_NO_ERROR flag to
drivers when requesting an appropriate context mode.

driContextSetFlags() will already handle it properly for us, but the
classic drivers all have code to explicitly balk at unknown flags.  We
need to let it through or they'll fail to create a no_error context.
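
The pattern in question is a plain mask test: anything set outside the driver's allowed mask is rejected. A minimal sketch, with flag bits defined locally for illustration (they are placeholders, not the definitions from the DRI interface header), and a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; placeholders for the __DRI_CTX_FLAG_* values. */
#define CTX_FLAG_DEBUG              (1u << 0)
#define CTX_FLAG_FORWARD_COMPATIBLE (1u << 1)
#define CTX_FLAG_ROBUST_ACCESS      (1u << 2)
#define CTX_FLAG_NO_ERROR           (1u << 3)

/* Returns true if every bit set in `flags` is also set in `allowed`.
 * A driver that rejects unknown flags does the inverse of this check. */
static bool flags_supported(uint32_t flags, uint32_t allowed)
{
   return (flags & ~allowed) == 0;
}
```

With an allowed mask of only CTX_FLAG_DEBUG, a request carrying CTX_FLAG_NO_ERROR is rejected; once the mask is widened to CTX_FLAG_DEBUG | CTX_FLAG_NO_ERROR, the same request passes, which is the whole effect of the change for the drivers that previously allowed only the debug flag.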


I'm almost afraid to ask... are there tests that try to create a
no_error context?



I have now posted a test to the piglit ML, which might be useful for 
testing this patch.


Grigori


---
 src/mesa/drivers/dri/i915/intel_screen.c   | 2 +-
 src/mesa/drivers/dri/i965/brw_context.c| 5 +++--
 src/mesa/drivers/dri/nouveau/nouveau_context.c | 2 +-
 src/mesa/drivers/dri/r200/r200_context.c   | 2 +-
 src/mesa/drivers/dri/radeon/radeon_context.c   | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

Drivers other than i965 have not been tested.

diff --git a/src/mesa/drivers/dri/i915/intel_screen.c b/src/mesa/drivers/dri/i915/intel_screen.c
index 9e23552b998..1ac72e14a15 100644
--- a/src/mesa/drivers/dri/i915/intel_screen.c
+++ b/src/mesa/drivers/dri/i915/intel_screen.c
@@ -972,7 +972,7 @@ intelCreateContext(gl_api api,
__DRIscreen *sPriv = driContextPriv->driScreenPriv;
struct intel_screen *intelScreen = sPriv->driverPrivate;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index b23e811f305..bd26e2332c7 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -813,8 +813,9 @@ brwCreateContext(gl_api api,
/* Only allow the __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag if the kernel
 * provides us with context reset notifications.
 */
-   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG
-  | __DRI_CTX_FLAG_FORWARD_COMPATIBLE;
+   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG |
+__DRI_CTX_FLAG_FORWARD_COMPATIBLE |
+__DRI_CTX_FLAG_NO_ERROR;

if (screen->has_context_reset_notification)
   allowed_flags |= __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS;
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_context.c b/src/mesa/drivers/dri/nouveau/nouveau_context.c
index 6ddcadce1f0..d6f9e533848 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_context.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_context.c
@@ -63,7 +63,7 @@ nouveau_context_create(gl_api api,
struct nouveau_context *nctx;
struct gl_context *ctx;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
*error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
return false;
}
diff --git a/src/mesa/drivers/dri/r200/r200_context.c b/src/mesa/drivers/dri/r200/r200_context.c
index aaa9b9317df..5a7f33499b1 100644
--- a/src/mesa/drivers/dri/r200/r200_context.c
+++ b/src/mesa/drivers/dri/r200/r200_context.c
@@ -189,7 +189,7 @@ GLboolean r200CreateContext( gl_api api,
int i;
int tcl_mode;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c b/src/mesa/drivers/dri/radeon/radeon_context.c
index 11afe20c6a0..5ef3467ac17 100644
--- a/src/mesa/drivers/dri/radeon/radeon_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_context.c
@@ -155,7 +155,7 @@ r100CreateContext( gl_api api,
int i;
int tcl_mode, fthrottle_mode;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}





[Mesa-dev] [PATCH] egl: fix check for KHR_no_error vs debug/robustness

2017-07-19 Thread Grigori Goronzy
The check is too aggressive and might also fail if context flags
appear after the no-error attribute in the context attribute list.

Delay the check to after attribute parsing to fix this.
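
As a rough illustration of why the check has to run after parsing: the sketch below collects all attributes first and only then tests the combination, so the result no longer depends on attribute order. The attribute names, flag bits and struct are hypothetical stand-ins, not the EGL definitions or the code in eglcontext.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute names and flag bits, for illustration only. */
enum { ATTR_NONE = 0, ATTR_FLAGS = 1, ATTR_NO_ERROR = 2 };
#define FLAG_DEBUG  (1u << 0)
#define FLAG_ROBUST (1u << 1)

struct ctx_config {
   unsigned flags;
   bool no_error;
};

/* Parses an ATTR_NONE-terminated list of name/value pairs. Returns false on
 * an unknown attribute or on a no-error + debug/robustness conflict. */
static bool parse_attribs(const int *attribs, struct ctx_config *out)
{
   assert(out != NULL);
   out->flags = 0;
   out->no_error = false;

   for (size_t i = 0; attribs != NULL && attribs[i] != ATTR_NONE; i += 2) {
      int val = attribs[i + 1];

      switch (attribs[i]) {
      case ATTR_FLAGS:
         out->flags = (unsigned)val;
         break;
      case ATTR_NO_ERROR:
         out->no_error = !!val;   /* canonicalize to true/false */
         break;
      default:
         return false;
      }
   }

   /* Cross-attribute check: done once, after every attribute is known. */
   if (out->no_error && (out->flags & (FLAG_DEBUG | FLAG_ROBUST)))
      return false;
   return true;
}
```

Checking inside the ATTR_NO_ERROR case instead would miss flags that appear later in the list, and would reject a debug context even when the no-error value is false.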
---
This was found by the piglit test I just sent to the piglit ML. I promise,
next time I'll write tests before writing any code that touches public
interfaces. :)

 src/egl/main/eglcontext.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
index 1a8e9bd..1b03160 100644
--- a/src/egl/main/eglcontext.c
+++ b/src/egl/main/eglcontext.c
@@ -328,17 +328,6 @@ _eglParseContextAttribList(_EGLContext *ctx, _EGLDisplay 
*dpy,
 break;
  }
 
- /* The EGL_KHR_create_context_no_error spec says:
-  *
-  *"BAD_MATCH is generated if the EGL_CONTEXT_OPENGL_NO_ERROR_KHR is TRUE at
-  *the same time as a debug or robustness context is specified."
-  */
- if (ctx->Flags & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR ||
- ctx->Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) {
-err = EGL_BAD_MATCH;
-break;
- }
-
  /* Canonicalize value to EGL_TRUE/EGL_FALSE definitions */
  ctx->NoError = !!val;
  break;
@@ -489,6 +478,16 @@ _eglParseContextAttribList(_EGLContext *ctx, _EGLDisplay 
*dpy,
   break;
}
 
+   /* The EGL_KHR_create_context_no_error spec says:
+*
+*"BAD_MATCH is generated if the EGL_CONTEXT_OPENGL_NO_ERROR_KHR is TRUE at
+*the same time as a debug or robustness context is specified."
+*/
+   if (ctx->NoError && (ctx->Flags & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR ||
+ctx->Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR)) {
+  err = EGL_BAD_MATCH;
+   }
+
if ((ctx->Flags & ~(EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR
   | EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR
   | EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR)) != 0) {
-- 
2.7.4



Re: [Mesa-dev] [PATCH] dri: Make classic drivers allow __DRI_CTX_FLAG_NO_ERROR.

2017-07-18 Thread Grigori Goronzy

On 2017-07-18 20:25, Ian Romanick wrote:

On 07/14/2017 04:10 PM, Kenneth Graunke wrote:

Grigori recently added EGL_KHR_create_context_no_error support,
which causes EGL to pass a new __DRI_CTX_FLAG_NO_ERROR flag to
drivers when requesting an appropriate context mode.

driContextSetFlags() will already handle it properly for us, but the
classic drivers all have code to explicitly balk at unknown flags.  We
need to let it through or they'll fail to create a no_error context.


I'm almost afraid to ask... are there tests that try to create a
no_error context?



I don't think there are any yet. I have a piglit test for this which 
creates a context with EGL_KHR_create_context_no_error and verifies that 
glGetError() behaves correctly in error conditions. I've used it to test 
my no_error series, but it's super hacky. Let me finish it up and I'll 
submit it in a few days.


Grigori


---
 src/mesa/drivers/dri/i915/intel_screen.c   | 2 +-
 src/mesa/drivers/dri/i965/brw_context.c| 5 +++--
 src/mesa/drivers/dri/nouveau/nouveau_context.c | 2 +-
 src/mesa/drivers/dri/r200/r200_context.c   | 2 +-
 src/mesa/drivers/dri/radeon/radeon_context.c   | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

Drivers other than i965 have not been tested.

diff --git a/src/mesa/drivers/dri/i915/intel_screen.c b/src/mesa/drivers/dri/i915/intel_screen.c
index 9e23552b998..1ac72e14a15 100644
--- a/src/mesa/drivers/dri/i915/intel_screen.c
+++ b/src/mesa/drivers/dri/i915/intel_screen.c
@@ -972,7 +972,7 @@ intelCreateContext(gl_api api,
__DRIscreen *sPriv = driContextPriv->driScreenPriv;
struct intel_screen *intelScreen = sPriv->driverPrivate;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index b23e811f305..bd26e2332c7 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -813,8 +813,9 @@ brwCreateContext(gl_api api,
/* Only allow the __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag if the kernel
 * provides us with context reset notifications.
 */
-   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG
-  | __DRI_CTX_FLAG_FORWARD_COMPATIBLE;
+   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG |
+__DRI_CTX_FLAG_FORWARD_COMPATIBLE |
+__DRI_CTX_FLAG_NO_ERROR;

if (screen->has_context_reset_notification)
   allowed_flags |= __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS;
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_context.c b/src/mesa/drivers/dri/nouveau/nouveau_context.c
index 6ddcadce1f0..d6f9e533848 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_context.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_context.c
@@ -63,7 +63,7 @@ nouveau_context_create(gl_api api,
struct nouveau_context *nctx;
struct gl_context *ctx;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
*error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
return false;
}
diff --git a/src/mesa/drivers/dri/r200/r200_context.c b/src/mesa/drivers/dri/r200/r200_context.c
index aaa9b9317df..5a7f33499b1 100644
--- a/src/mesa/drivers/dri/r200/r200_context.c
+++ b/src/mesa/drivers/dri/r200/r200_context.c
@@ -189,7 +189,7 @@ GLboolean r200CreateContext( gl_api api,
int i;
int tcl_mode;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c b/src/mesa/drivers/dri/radeon/radeon_context.c
index 11afe20c6a0..5ef3467ac17 100644
--- a/src/mesa/drivers/dri/radeon/radeon_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_context.c
@@ -155,7 +155,7 @@ r100CreateContext( gl_api api,
int i;
int tcl_mode, fthrottle_mode;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}




Re: [Mesa-dev] [PATCH 4/4] dri: Add KHR_no_error toggle to driconf

2017-07-18 Thread Grigori Goronzy

On 2017-07-17 19:21, Emil Velikov wrote:

On 13 July 2017 at 12:09, Grigori Goronzy <g...@chown.ath.cx> wrote:

On 2017-07-12 15:15, Emil Velikov wrote:


As mentioned in earlier commit no_error should be device agnostic.
Hence removing the st/dri bits and adding a DRI_CONF_MESA_NO_ERROR()
line next to DRI_CONF_VBLANK_MODE seems like the better solution.



Hm, driconf overrides are typically set per screen and/or driver, so that
won't work. The overrides will be ignored because of screen/driver mismatch.
So I think it needs to be implemented separately for each classic driver.

I'll keep this part to the Gallium state tracker for now.


Hmm, my understanding was completely different. Have you tested my
suggestion, or is this your assumption?



Sure, I have tested this. Check where driParseConfigFiles() is used in 
the code. Different parts of the stack have completely separate driconf 
databases, which are associated with different "driver names" (in 
quotes, because it's a rather confusing description, given the usage). 
The generic DRI layer that handles vblank_mode uses "dri2" as its "driver 
name". Other parts use different "driver names", none of which are 
obvious or documented, and most of the classic Mesa drivers also have 
separate driconf databases. So I added mesa_no_error to the generic DRI 
layer, but it only had any effect when the option was added to a "dri2" 
section of the driconf XML, which makes it somewhat strange and 
impractical to use. Of course vblank_mode has the same issue. I think 
this isn't really a good design and should be addressed at some point. 
Maybe it could be a good option to move to a single global database, or 
a hierarchical database (somewhat like LDAP). There are various possible 
options.
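
To sketch what a hierarchical lookup could mean in practice: an option is first searched in the section for the specific "driver name" and, failing that, in a global section. Everything below is hypothetical (the struct, the function, and the "*" section name); it is not how driconf or xmlconfig works today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat option table: one row per (section, option) pair. */
struct option_entry {
   const char *section;   /* "driver name", or "*" for the global section */
   const char *name;
   int value;
};

/* Looks `name` up in `section` first and then in the global "*" section.
 * Returns default_value if neither section defines the option. */
static int lookup_option(const struct option_entry *db, size_t n,
                         const char *section, const char *name,
                         int default_value)
{
   assert(db != NULL || n == 0);
   const char *scopes[2] = { section, "*" };

   for (size_t s = 0; s < 2; s++) {
      for (size_t i = 0; i < n; i++) {
         if (strcmp(db[i].section, scopes[s]) == 0 &&
             strcmp(db[i].name, name) == 0)
            return db[i].value;
      }
   }
   return default_value;
}
```

With such a scheme a user could set mesa_no_error once in the global section and still override it for a single driver, without needing to know which layer reads which section.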


At this time, for practical reasons, I think it makes sense to add the 
mesa_no_error flag at the graphics driver layer only. That makes it easy 
to override this setting with the "driconf" GUI tool and there is no 
obscure "driver name" magic going on that users should not need to know 
about.


Grigori


Re: [Mesa-dev] [PATCH] dri: Make classic drivers allow __DRI_CTX_FLAG_NO_ERROR.

2017-07-14 Thread Grigori Goronzy

On 2017-07-15 01:10, Kenneth Graunke wrote:

Grigori recently added EGL_KHR_create_context_no_error support,
which causes EGL to pass a new __DRI_CTX_FLAG_NO_ERROR flag to
drivers when requesting an appropriate context mode.

driContextSetFlags() will already handle it properly for us, but the
classic drivers all have code to explicitly balk at unknown flags.  We
need to let it through or they'll fail to create a no_error context.



I can't test it, but LGTM, so:

Reviewed-by: Grigori Goronzy <g...@chown.ath.cx>
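
For readers following along, the check being relaxed is a plain bitmask test: any bit outside the allowed set rejects the context. A minimal sketch, assuming the usual bit values (0x8 for no-error as in the dri_interface.h hunk of this series, 0x1 for debug); the DEMO_/demo_ names are hypothetical and this is not the driver code itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the __DRI_CTX_FLAG_* bits; the DEMO_ names are hypothetical. */
#define DEMO_FLAG_DEBUG     0x1u
#define DEMO_FLAG_NO_ERROR  0x8u

/* Old behaviour: only the debug bit is tolerated. */
static bool demo_flags_ok_old(uint32_t flags)
{
   return (flags & ~DEMO_FLAG_DEBUG) == 0;
}

/* New behaviour: the no-error bit is let through as well. */
static bool demo_flags_ok_new(uint32_t flags)
{
   return (flags & ~(DEMO_FLAG_DEBUG | DEMO_FLAG_NO_ERROR)) == 0;
}
```

With the old mask, a request carrying only the no-error bit is reported as an unknown flag; with the new mask it passes, while any other unexpected bit is still rejected.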


---
 src/mesa/drivers/dri/i915/intel_screen.c   | 2 +-
 src/mesa/drivers/dri/i965/brw_context.c| 5 +++--
 src/mesa/drivers/dri/nouveau/nouveau_context.c | 2 +-
 src/mesa/drivers/dri/r200/r200_context.c   | 2 +-
 src/mesa/drivers/dri/radeon/radeon_context.c   | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

Drivers other than i965 have not been tested.

diff --git a/src/mesa/drivers/dri/i915/intel_screen.c
b/src/mesa/drivers/dri/i915/intel_screen.c
index 9e23552b998..1ac72e14a15 100644
--- a/src/mesa/drivers/dri/i915/intel_screen.c
+++ b/src/mesa/drivers/dri/i915/intel_screen.c
@@ -972,7 +972,7 @@ intelCreateContext(gl_api api,
__DRIscreen *sPriv = driContextPriv->driScreenPriv;
struct intel_screen *intelScreen = sPriv->driverPrivate;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
diff --git a/src/mesa/drivers/dri/i965/brw_context.c
b/src/mesa/drivers/dri/i965/brw_context.c
index b23e811f305..bd26e2332c7 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -813,8 +813,9 @@ brwCreateContext(gl_api api,
/* Only allow the __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS flag if the kernel
 * provides us with context reset notifications.
 */
-   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG
-  | __DRI_CTX_FLAG_FORWARD_COMPATIBLE;
+   uint32_t allowed_flags = __DRI_CTX_FLAG_DEBUG |
+__DRI_CTX_FLAG_FORWARD_COMPATIBLE |
+__DRI_CTX_FLAG_NO_ERROR;

if (screen->has_context_reset_notification)
   allowed_flags |= __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS;
diff --git a/src/mesa/drivers/dri/nouveau/nouveau_context.c
b/src/mesa/drivers/dri/nouveau/nouveau_context.c
index 6ddcadce1f0..d6f9e533848 100644
--- a/src/mesa/drivers/dri/nouveau/nouveau_context.c
+++ b/src/mesa/drivers/dri/nouveau/nouveau_context.c
@@ -63,7 +63,7 @@ nouveau_context_create(gl_api api,
struct nouveau_context *nctx;
struct gl_context *ctx;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
*error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
return false;
}
diff --git a/src/mesa/drivers/dri/r200/r200_context.c
b/src/mesa/drivers/dri/r200/r200_context.c
index aaa9b9317df..5a7f33499b1 100644
--- a/src/mesa/drivers/dri/r200/r200_context.c
+++ b/src/mesa/drivers/dri/r200/r200_context.c
@@ -189,7 +189,7 @@ GLboolean r200CreateContext( gl_api api,
int i;
int tcl_mode;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}
diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c
b/src/mesa/drivers/dri/radeon/radeon_context.c
index 11afe20c6a0..5ef3467ac17 100644
--- a/src/mesa/drivers/dri/radeon/radeon_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_context.c
@@ -155,7 +155,7 @@ r100CreateContext( gl_api api,
int i;
int tcl_mode, fthrottle_mode;

-   if (flags & ~__DRI_CTX_FLAG_DEBUG) {
+   if (flags & ~(__DRI_CTX_FLAG_DEBUG | __DRI_CTX_FLAG_NO_ERROR)) {
   *error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
   return false;
}



Re: [Mesa-dev] [PATCH] egl: Fix predecence problem when setting __DRI_CTX_FLAG_NO_ERROR

2017-07-14 Thread Grigori Goronzy

On 2017-07-14 23:30, Kenneth Graunke wrote:

This accidentally set __DRI_CTX_FLAG_NO_ERROR whenever any flags were
present.  It just needs extra parentheses.

Fixes: 4909519a6655 (egl: Add EGL_KHR_create_context_no_error support)


Reviewed-by: Grigori Goronzy <g...@chown.ath.cx>

Sorry for breaking so much stuff today. :)
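
For the record, the pitfall is that '|' binds tighter than '?:' in C, so the whole OR expression becomes the condition. A small sketch of both forms, with hypothetical demo_ names and the no-error bit value taken from this series:

```c
#include <stdint.h>

#define DEMO_FLAG_NO_ERROR 0x8u  /* same value as __DRI_CTX_FLAG_NO_ERROR */

/* How the unparenthesized expression parses: the OR of both operands is
 * the condition, so the original flags are dropped from the result. */
static uint32_t demo_flags_buggy(uint32_t flags, int no_error)
{
   return (flags | (uint32_t)no_error) ? DEMO_FLAG_NO_ERROR : 0;
}

/* The fixed form: the conditional is evaluated first, then OR-ed in. */
static uint32_t demo_flags_fixed(uint32_t flags, int no_error)
{
   return flags | (no_error ? DEMO_FLAG_NO_ERROR : 0);
}
```

So a context that asks only for a debug flag ends up with the no-error bit set and the debug bit lost in the buggy form, and keeps exactly its debug bit in the fixed form.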


---
 src/egl/drivers/dri2/egl_dri2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c 
b/src/egl/drivers/dri2/egl_dri2.c

index f632ebe2551..072494ed4ed 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1074,7 +1074,7 @@ dri2_fill_context_attribs(struct
dri2_egl_context *dri2_ctx,

   ctx_attribs[pos++] = __DRI_CTX_ATTRIB_FLAGS;
   ctx_attribs[pos++] = dri2_ctx->base.Flags |
-dri2_ctx->base.NoError ? __DRI_CTX_FLAG_NO_ERROR : 0;
+ (dri2_ctx->base.NoError ? __DRI_CTX_FLAG_NO_ERROR : 0);
}

if (dri2_ctx->base.ResetNotificationStrategy !=
EGL_NO_RESET_NOTIFICATION_KHR) {



[Mesa-dev] [PATCH] mesa/marshal: fix Windows build

2017-07-14 Thread Grigori Goronzy
This was broken by commit 1ad24faa.
---
 src/mesa/main/marshal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/marshal.h b/src/mesa/main/marshal.h
index f2dc842..63e0295 100644
--- a/src/mesa/main/marshal.h
+++ b/src/mesa/main/marshal.h
@@ -257,7 +257,7 @@ void GLAPIENTRY
 _mesa_marshal_ClearBufferfv(GLenum buffer, GLint drawbuffer,
 const GLfloat *value);
 
-void GLAPIENTRY
+void
 _mesa_unmarshal_ClearBufferiv(struct gl_context *ctx,
   const struct marshal_cmd_ClearBuffer *cmd);
 
@@ -265,7 +265,7 @@ void GLAPIENTRY
 _mesa_marshal_ClearBufferiv(GLenum buffer, GLint drawbuffer,
 const GLint *value);
 
-void GLAPIENTRY
+void
 _mesa_unmarshal_ClearBufferuiv(struct gl_context *ctx,
const struct marshal_cmd_ClearBuffer *cmd);
 
@@ -273,7 +273,7 @@ void GLAPIENTRY
 _mesa_marshal_ClearBufferuiv(GLenum buffer, GLint drawbuffer,
  const GLuint *value);
 
-void GLAPIENTRY
+void
 _mesa_unmarshal_ClearBufferfi(struct gl_context *ctx,
   const struct marshal_cmd_ClearBuffer *cmd);
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 1/4] dri: Add KHR_no_error DRI extension

2017-07-13 Thread Grigori Goronzy
This basic extension allows usage of the __DRI_CTX_FLAG_NO_ERROR flag.
This includes support code for classic Mesa drivers to switch on the
no-error mode if the flag is set.

v2: Move to common DRI code.
---
 include/GL/internal/dri_interface.h   | 19 +++
 src/gallium/state_trackers/dri/dri2.c |  2 ++
 src/gallium/state_trackers/dri/dri_context.c  |  3 ++-
 src/gallium/state_trackers/dri/drisw.c|  1 +
 src/mesa/drivers/dri/common/dri_util.c| 12 ++--
 src/mesa/drivers/dri/common/dri_util.h|  2 ++
 src/mesa/drivers/dri/i915/intel_screen.c  |  1 +
 src/mesa/drivers/dri/i965/intel_screen.c  |  2 ++
 src/mesa/drivers/dri/nouveau/nouveau_screen.c |  1 +
 src/mesa/drivers/dri/radeon/radeon_screen.c   |  1 +
 src/mesa/drivers/dri/swrast/swrast.c  |  1 +
 11 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 2da46f7..da60648 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1050,6 +1050,12 @@ struct __DRIdri2LoaderExtensionRec {
 #define __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS0x0004
 
 /**
+ * \requires __DRI2_NO_ERROR.
+ *
+ */
+#define __DRI_CTX_FLAG_NO_ERROR0x0008
+
+/**
  * \name Context reset strategies.
  */
 /*@{*/
@@ -1612,6 +1618,19 @@ struct __DRIrobustnessExtensionRec {
 };
 
 /**
+ * No-error context driver extension.
+ *
+ * Existence of this extension means the driver can accept the
+ * __DRI_CTX_FLAG_NO_ERROR flag.
+ */
+#define __DRI2_NO_ERROR "DRI_NoError"
+#define __DRI2_NO_ERROR_VERSION 1
+
+typedef struct __DRInoErrorExtensionRec {
+   __DRIextension base;
+} __DRInoErrorExtension;
+
+/**
  * DRI config options extension.
  *
  * This extension provides the XML string containing driver options for use by
diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 5da1c4e..6cd1582 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -2002,6 +2002,7 @@ static const __DRIextension *dri_screen_extensions[] = {
,
,
,
+   ,
NULL
 };
 
@@ -2015,6 +2016,7 @@ static const __DRIextension 
*dri_robust_screen_extensions[] = {
,
,
,
+   ,
NULL
 };
 
diff --git a/src/gallium/state_trackers/dri/dri_context.c 
b/src/gallium/state_trackers/dri/dri_context.c
index ec555e4..e25f186 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -57,7 +57,8 @@ dri_create_context(gl_api api, const struct gl_config * 
visual,
struct st_context_attribs attribs;
enum st_context_error ctx_err = 0;
unsigned allowed_flags = __DRI_CTX_FLAG_DEBUG |
-__DRI_CTX_FLAG_FORWARD_COMPATIBLE;
+__DRI_CTX_FLAG_FORWARD_COMPATIBLE |
+__DRI_CTX_FLAG_NO_ERROR;
const __DRIbackgroundCallableExtension *backgroundCallable =
   screen->sPriv->dri2.backgroundCallable;
 
diff --git a/src/gallium/state_trackers/dri/drisw.c 
b/src/gallium/state_trackers/dri/drisw.c
index 189d61c..ac40956 100644
--- a/src/gallium/state_trackers/dri/drisw.c
+++ b/src/gallium/state_trackers/dri/drisw.c
@@ -371,6 +371,7 @@ static const __DRIextension *drisw_screen_extensions[] = {
,
,
,
+   ,
NULL
 };
 
diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index f6df488..bfae020 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -403,7 +403,8 @@ driCreateContextAttribs(__DRIscreen *screen, int api,
 if (mesa_api != API_OPENGL_COMPAT
 && mesa_api != API_OPENGL_CORE
 && (flags & ~(__DRI_CTX_FLAG_DEBUG |
- __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS))) {
+ __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS |
+ __DRI_CTX_FLAG_NO_ERROR))) {
*error = __DRI_CTX_ERROR_BAD_FLAG;
return NULL;
 }
@@ -425,7 +426,8 @@ driCreateContextAttribs(__DRIscreen *screen, int api,
 
 const uint32_t allowed_flags = (__DRI_CTX_FLAG_DEBUG
 | __DRI_CTX_FLAG_FORWARD_COMPATIBLE
-| __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS);
+| __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS
+| __DRI_CTX_FLAG_NO_ERROR);
 if (flags & ~allowed_flags) {
*error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
return NULL;
@@ -467,6 +469,8 @@ driContextSetFlags(struct gl_context *ctx, uint32_t flags)
_mesa_set_debug_state_int(ctx, GL_DEBUG_OUTPUT, GL_TRUE);
 ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
 }
+if ((flags & __DRI_CTX_FLAG_NO_ERROR) != 0)
+ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR;
 }
 
 static __DRIcontext *
@@ -935,3 +939,7 @@ const 

[Mesa-dev] [PATCH v2 3/4] egl: Add EGL_KHR_create_context_no_error support

2017-07-13 Thread Grigori Goronzy
This only adds the EGL side, needs to be plumbed into Mesa frontend.

v2: Add check for extension availability.
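
As a rough illustration of the availability check: the EGL side only advertises the feature if the driver's extension list contains a matching entry. The sketch below is a simplified model with hypothetical demo_ names, not the actual dri2_extension_match machinery; only the "DRI_NoError" name and version 1 come from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for __DRIextension (name + version). */
struct demo_extension {
   const char *name;
   int version;
};

/* Walk a NULL-terminated list and return the first entry with the given
 * name and at least min_version, or NULL if the driver lacks it. */
static const struct demo_extension *
demo_find_extension(const struct demo_extension *const *list,
                    const char *name, int min_version)
{
   for (size_t i = 0; list[i] != NULL; i++) {
      if (strcmp(list[i]->name, name) == 0 &&
          list[i]->version >= min_version)
         return list[i];
   }
   return NULL;
}
```

A NULL result corresponds to leaving the display's extension bit unset, in which case the attribute is rejected at context creation.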
---
 src/egl/drivers/dri2/egl_dri2.c | 20 ++--
 src/egl/drivers/dri2/egl_dri2.h |  1 +
 src/egl/main/eglapi.c   |  1 +
 src/egl/main/eglcontext.c   | 31 +++
 src/egl/main/eglcontext.h   |  1 +
 src/egl/main/egldisplay.h   |  1 +
 6 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index cf26242..6bb94e4 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -428,6 +428,7 @@ static const struct dri2_extension_match 
swrast_core_extensions[] = {
 
 static const struct dri2_extension_match optional_core_extensions[] = {
{ __DRI2_ROBUSTNESS, 1, offsetof(struct dri2_egl_display, robustness) },
+   { __DRI2_NO_ERROR, 1, offsetof(struct dri2_egl_display, no_error) },
{ __DRI2_CONFIG_QUERY, 1, offsetof(struct dri2_egl_display, config) },
{ __DRI2_FENCE, 1, offsetof(struct dri2_egl_display, fence) },
{ __DRI2_RENDERER_QUERY, 1, offsetof(struct dri2_egl_display, 
rendererQuery) },
@@ -665,6 +666,9 @@ dri2_setup_screen(_EGLDisplay *disp)
  disp->Extensions.EXT_create_context_robustness = EGL_TRUE;
}
 
+   if (dri2_dpy->no_error)
+  disp->Extensions.KHR_create_context_no_error = EGL_TRUE;
+
if (dri2_dpy->fence) {
   disp->Extensions.KHR_fence_sync = EGL_TRUE;
   disp->Extensions.KHR_wait_sync = EGL_TRUE;
@@ -1056,7 +1060,7 @@ dri2_fill_context_attribs(struct dri2_egl_context 
*dri2_ctx,
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_MINOR_VERSION;
ctx_attribs[pos++] = dri2_ctx->base.ClientMinorVersion;
 
-   if (dri2_ctx->base.Flags != 0) {
+   if (dri2_ctx->base.Flags != 0 || dri2_ctx->base.NoError) {
   /* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it the robust-access flag.
* It may explode.  Instead, generate the required EGL error here.
@@ -1068,7 +1072,8 @@ dri2_fill_context_attribs(struct dri2_egl_context 
*dri2_ctx,
   }
 
   ctx_attribs[pos++] = __DRI_CTX_ATTRIB_FLAGS;
-  ctx_attribs[pos++] = dri2_ctx->base.Flags;
+  ctx_attribs[pos++] = dri2_ctx->base.Flags |
+dri2_ctx->base.NoError ? __DRI_CTX_FLAG_NO_ERROR : 0;
}
 
if (dri2_ctx->base.ResetNotificationStrategy != 
EGL_NO_RESET_NOTIFICATION_KHR) {
@@ -1131,6 +1136,17 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLConfig *conf,
   goto cleanup;
}
 
+   /* The EGL_KHR_create_context_no_error spec says:
+*
+*"BAD_MATCH is generated if the value of EGL_CONTEXT_OPENGL_NO_ERROR_KHR
+*used to create <share_context> does not match the value of
+*EGL_CONTEXT_OPENGL_NO_ERROR_KHR for the context being created."
+*/
+   if (share_list && share_list->NoError != dri2_ctx->base.NoError) {
+  _eglError(EGL_BAD_MATCH, "eglCreateContext");
+  goto cleanup;
+   }
+
switch (dri2_ctx->base.ClientAPI) {
case EGL_OPENGL_ES_API:
   switch (dri2_ctx->base.ClientMajorVersion) {
diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
index 4a5cf8e..5b3e93a 100644
--- a/src/egl/drivers/dri2/egl_dri2.h
+++ b/src/egl/drivers/dri2/egl_dri2.h
@@ -170,6 +170,7 @@ struct dri2_egl_display
const __DRItexBufferExtension  *tex_buffer;
const __DRIimageExtension  *image;
const __DRIrobustnessExtension *robustness;
+   const __DRInoErrorExtension*no_error;
const __DRI2configQueryExtension *config;
const __DRI2fenceExtension *fence;
const __DRI2rendererQueryExtension *rendererQuery;
diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 9b899d8..000368a 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -494,6 +494,7 @@ _eglCreateExtensionsString(_EGLDisplay *dpy)
_EGL_CHECK_EXTENSION(KHR_cl_event2);
_EGL_CHECK_EXTENSION(KHR_config_attribs);
_EGL_CHECK_EXTENSION(KHR_create_context);
+   _EGL_CHECK_EXTENSION(KHR_create_context_no_error);
_EGL_CHECK_EXTENSION(KHR_fence_sync);
_EGL_CHECK_EXTENSION(KHR_get_all_proc_addresses);
_EGL_CHECK_EXTENSION(KHR_gl_colorspace);
diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
index df8b45c..1a8e9bd 100644
--- a/src/egl/main/eglcontext.c
+++ b/src/egl/main/eglcontext.c
@@ -312,6 +312,37 @@ _eglParseContextAttribList(_EGLContext *ctx, _EGLDisplay 
*dpy,
 ctx->Flags |= EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR;
  break;
 
+  case EGL_CONTEXT_OPENGL_NO_ERROR_KHR:
+ if (dpy->Version < 14 ||
+ !dpy->Extensions.KHR_create_context_no_error) {
+err = EGL_BAD_ATTRIBUTE;
+break;
+ }
+
+ /* The KHR_no_error spec only applies against OpenGL 2.0+ and
+  * OpenGL ES 2.0+
+  */
+ if ((api != EGL_OPENGL_API && api != EGL_OPENGL_ES_API) ||
+ 

[Mesa-dev] [PATCH v2 2/4] st/mesa: add support for KHR_no_error flag

2017-07-13 Thread Grigori Goronzy
Add a new context flag and plumb it through the various layers of the
context creation code to set up dispatch tables for the no-error mode.
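
The plumbing amounts to translating one bit between three flag namespaces. A minimal sketch of that chain, assuming the bit values used in this series (0x8 for the DRI flag, 1 << 4 for the state tracker flag) and 0x8 for GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR; the DEMO_/demo_ names are hypothetical:

```c
#include <stdint.h>

#define DEMO_DRI_FLAG_NO_ERROR  0x8u       /* __DRI_CTX_FLAG_NO_ERROR */
#define DEMO_ST_FLAG_NO_ERROR   (1u << 4)  /* ST_CONTEXT_FLAG_NO_ERROR */
#define DEMO_GL_NO_ERROR_BIT    0x8u       /* GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR */

/* DRI layer -> state tracker attribs. */
static uint32_t demo_dri_to_st(uint32_t dri_flags)
{
   uint32_t st_flags = 0;
   if (dri_flags & DEMO_DRI_FLAG_NO_ERROR)
      st_flags |= DEMO_ST_FLAG_NO_ERROR;
   return st_flags;
}

/* State tracker attribs -> GL context flags. */
static uint32_t demo_st_to_gl(uint32_t st_flags)
{
   uint32_t gl_flags = 0;
   if (st_flags & DEMO_ST_FLAG_NO_ERROR)
      gl_flags |= DEMO_GL_NO_ERROR_BIT;
   return gl_flags;
}
```

Each layer tests its own bit explicitly rather than copying the word through, since the numeric values differ between the DRI and state tracker namespaces.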
---
 src/gallium/include/state_tracker/st_api.h   |  1 +
 src/gallium/state_trackers/dri/dri_context.c |  3 +++
 src/mesa/state_tracker/st_context.c  | 10 +++---
 src/mesa/state_tracker/st_context.h  |  3 ++-
 src/mesa/state_tracker/st_manager.c  |  6 +-
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/gallium/include/state_tracker/st_api.h 
b/src/gallium/include/state_tracker/st_api.h
index 222e565..29e05e9 100644
--- a/src/gallium/include/state_tracker/st_api.h
+++ b/src/gallium/include/state_tracker/st_api.h
@@ -90,6 +90,7 @@ enum st_api_feature
 #define ST_CONTEXT_FLAG_FORWARD_COMPATIBLE  (1 << 1)
 #define ST_CONTEXT_FLAG_ROBUST_ACCESS   (1 << 2)
 #define ST_CONTEXT_FLAG_RESET_NOTIFICATION_ENABLED (1 << 3)
+#define ST_CONTEXT_FLAG_NO_ERROR(1 << 4)
 
 /**
  * Reasons that context creation might fail.
diff --git a/src/gallium/state_trackers/dri/dri_context.c 
b/src/gallium/state_trackers/dri/dri_context.c
index e25f186..275c0d4 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -107,6 +107,9 @@ dri_create_context(gl_api api, const struct gl_config * 
visual,
if (notify_reset)
   attribs.flags |= ST_CONTEXT_FLAG_RESET_NOTIFICATION_ENABLED;
 
+   if (flags & __DRI_CTX_FLAG_NO_ERROR)
+  attribs.flags |= ST_CONTEXT_FLAG_NO_ERROR;
+
if (sharedContextPrivate) {
   st_share = ((struct dri_context *)sharedContextPrivate)->st;
}
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index f535139..b8677f4 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -288,7 +288,7 @@ static void st_init_driver_flags(struct st_context *st);
 
 static struct st_context *
 st_create_context_priv( struct gl_context *ctx, struct pipe_context *pipe,
-   const struct st_config_options *options)
+   const struct st_config_options *options, bool no_error)
 {
struct pipe_screen *screen = pipe->screen;
uint i;
@@ -369,6 +369,9 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
 
ctx->VertexProgram._MaintainTnlProgram = GL_TRUE;
 
+   if (no_error)
+  ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR;
+
st->has_stencil_export =
   screen->get_param(screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
st->has_shader_model3 = screen->get_param(screen, PIPE_CAP_SM3);
@@ -535,7 +538,8 @@ static void st_init_driver_flags(struct st_context *st)
 struct st_context *st_create_context(gl_api api, struct pipe_context *pipe,
  const struct gl_config *visual,
  struct st_context *share,
- const struct st_config_options *options)
+ const struct st_config_options *options,
+ bool no_error)
 {
struct gl_context *ctx;
struct gl_context *shareCtx = share ? share->ctx : NULL;
@@ -566,7 +570,7 @@ struct st_context *st_create_context(gl_api api, struct 
pipe_context *pipe,
if (debug_get_option_mesa_mvp_dp4())
   ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = 
GL_TRUE;
 
-   st = st_create_context_priv(ctx, pipe, options);
+   st = st_create_context_priv(ctx, pipe, options, no_error);
if (!st) {
   _mesa_destroy_context(ctx);
}
diff --git a/src/mesa/state_tracker/st_context.h 
b/src/mesa/state_tracker/st_context.h
index af9149e..b2ea6b5 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -390,7 +390,8 @@ extern struct st_context *
 st_create_context(gl_api api, struct pipe_context *pipe,
   const struct gl_config *visual,
   struct st_context *share,
-  const struct st_config_options *options);
+  const struct st_config_options *options,
+  bool no_error);
 
 extern void
 st_destroy_context(struct st_context *st);
diff --git a/src/mesa/state_tracker/st_manager.c 
b/src/mesa/state_tracker/st_manager.c
index 7a3205c..242262b 100644
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -654,6 +654,7 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
struct pipe_context *pipe;
struct gl_config mode;
gl_api api;
+   bool no_error = false;
unsigned ctx_flags = PIPE_CONTEXT_PREFER_THREADED;
 
if (!(stapi->profile_mask & (1 << attribs->profile)))
@@ -680,6 +681,9 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
if (attribs->flags & ST_CONTEXT_FLAG_ROBUST_ACCESS)
   ctx_flags |= PIPE_CONTEXT_ROBUST_BUFFER_ACCESS;
 
+   if (attribs->flags & ST_CONTEXT_FLAG_NO_ERROR)
+  

[Mesa-dev] [PATCH v2 4/4] st/mesa: Add KHR_no_error toggle to driconf

2017-07-13 Thread Grigori Goronzy
Allows applications to be whitelisted.

v2: Remove misguided DRI common part.
---
 src/gallium/state_trackers/dri/dri_context.c| 3 +++
 src/gallium/state_trackers/dri/dri_screen.c | 1 +
 src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 +
 3 files changed, 9 insertions(+)

diff --git a/src/gallium/state_trackers/dri/dri_context.c 
b/src/gallium/state_trackers/dri/dri_context.c
index 275c0d4..8c3797e4 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -124,6 +124,9 @@ dri_create_context(gl_api api, const struct gl_config * 
visual,
ctx->cPriv = cPriv;
ctx->sPriv = sPriv;
 
+   if (driQueryOptionb(>optionCache, "mesa_no_error"))
+  attribs.flags |= ST_CONTEXT_FLAG_NO_ERROR;
+
attribs.options = screen->options;
dri_fill_st_visual(, screen, visual);
ctx->st = stapi->create_context(stapi, >base, , _err,
diff --git a/src/gallium/state_trackers/dri/dri_screen.c 
b/src/gallium/state_trackers/dri/dri_screen.c
index 6b58830..de0840b 100644
--- a/src/gallium/state_trackers/dri/dri_screen.c
+++ b/src/gallium/state_trackers/dri/dri_screen.c
@@ -56,6 +56,7 @@ const __DRIconfigOptionsExtension gallium_config_options = {
DRI_CONF_BEGIN
   DRI_CONF_SECTION_PERFORMANCE
  DRI_CONF_MESA_GLTHREAD("false")
+ DRI_CONF_MESA_NO_ERROR("false")
  DRI_CONF_DISABLE_EXT_BUFFER_AGE("false")
  DRI_CONF_DISABLE_OML_SYNC_CONTROL("false")
   DRI_CONF_SECTION_END
diff --git a/src/mesa/drivers/dri/common/xmlpool/t_options.h 
b/src/mesa/drivers/dri/common/xmlpool/t_options.h
index 9aa1798..e308839 100644
--- a/src/mesa/drivers/dri/common/xmlpool/t_options.h
+++ b/src/mesa/drivers/dri/common/xmlpool/t_options.h
@@ -332,6 +332,11 @@ DRI_CONF_OPT_BEGIN_B(mesa_glthread, def) \
 DRI_CONF_DESC(en,gettext("Enable offloading GL driver work to a 
separate thread")) \
 DRI_CONF_OPT_END
 
+#define DRI_CONF_MESA_NO_ERROR(def) \
+DRI_CONF_OPT_BEGIN_B(mesa_no_error, def) \
+DRI_CONF_DESC(en,gettext("Disable GL driver error checking")) \
+DRI_CONF_OPT_END
+
 #define DRI_CONF_DISABLE_EXT_BUFFER_AGE(def) \
 DRI_CONF_OPT_BEGIN_B(glx_disable_ext_buffer_age, def) \
DRI_CONF_DESC(en, gettext("Disable the GLX_EXT_buffer_age extension")) \
-- 
2.7.4



Re: [Mesa-dev] [PATCH 4/4] dri: Add KHR_no_error toggle to driconf

2017-07-13 Thread Grigori Goronzy

On 2017-07-12 15:15, Emil Velikov wrote:

As mentioned in earlier commit no_error should be device agnostic.
Hence removing the st/dri bits and adding a DRI_CONF_MESA_NO_ERROR()
line next to DRI_CONF_VBLANK_MODE seems like the better solution.



Hm, driconf overrides are typically set per screen and/or driver, so 
that won't work. The overrides will be ignored because of screen/driver 
mismatch. So I think it needs to be implemented separately for each 
classic driver. I'll keep this part to the Gallium state tracker for 
now.


By the way, glthread seems to have similar issues even though it is in
theory independent of the driver.


Grigori


As always, if you think I'm off my rocker at any point don't be afraid
to let me know.

-Emil



Re: [Mesa-dev] [PATCH 3/4] st/mesa: add support for KHR_no_error flag

2017-07-12 Thread Grigori Goronzy

On 2017-07-12 15:08, Emil Velikov wrote:

On 11 July 2017 at 23:26, Grigori Goronzy <g...@chown.ath.cx> wrote:

Add a new context flag and plumb it through the various layers of the
context creation code to set up dispatch tables for the no-error mode.
---
 src/gallium/include/state_tracker/st_api.h   |  1 +
 src/gallium/state_trackers/dri/dri_context.c |  3 +++
 src/mesa/state_tracker/st_context.c  | 10 +++---
 src/mesa/state_tracker/st_context.h  |  3 ++-
 src/mesa/state_tracker/st_manager.c  |  6 +-
 5 files changed, 18 insertions(+), 5 deletions(-)


I think this should come before the enablement patch... although
looking at Issue #6 we may be fine as-is.



What's issue #6?

Grigori


Re: [Mesa-dev] KHR_no_error improvements

2017-07-12 Thread Grigori Goronzy

On 2017-07-12 15:16, Emil Velikov wrote:

On 11 July 2017 at 23:26, Grigori Goronzy <g...@chown.ath.cx> wrote:

Hi,

this series implements support for the EGL_KHR_create_context_no_error
extension and the associated plumbing through the different
layers of Mesa - EGL, DRI, Gallium state tracker, Mesa frontend. It
took me a while to figure out how everything is connected together
and still it's somewhat confusing to me, so please bear with me if
I did something stupid. :)

It's close to perfect actually. Do you have any plans on setting up GLX?




Thanks. We're already discussing the GLX variant of the extension on 
#dri-devel, so yes.


Grigori


Re: [Mesa-dev] [PATCH 1/4] egl: Add EGL_KHR_create_context_no_error support

2017-07-12 Thread Grigori Goronzy

On 2017-07-12 12:33, Eric Engestrom wrote:

+  case EGL_CONTEXT_OPENGL_NO_ERROR_KHR:
+ if (dpy->Version < 14) {
+err = EGL_BAD_ATTRIBUTE;
+break;
+ }
+
+ /* The KHR_no_error spec only applies against OpenGL 2.0+ and
+  * OpenGL ES 2.0+
+  */
+ if ((api != EGL_OPENGL_API && api != EGL_OPENGL_ES_API) ||
+ ctx->ClientMajorVersion < 2) {
+err = EGL_BAD_ATTRIBUTE;
+break;
+ }
+
+ /* The EGL_KHR_create_context_no_error spec says:
+  *
+  *"BAD_MATCH is generated if the EGL_CONTEXT_OPENGL_NO_ERROR_KHR is TRUE at
+  *the same time as a debug or robustness context is specified."
+  */
+ if (ctx->Flags & EGL_CONTEXT_OPENGL_DEBUG_BIT_KHR ||
+ ctx->Flags & EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR) {
+err = EGL_BAD_MATCH;
+break;
+ }
+
+ /* Canonicalize value to EGL_TRUE/EGL_FALSE definitions */
+ ctx->NoError = !!val;


Do we need NoError?
Wouldn't adding __DRI_CTX_FLAG_NO_ERROR to Flags be enough?



The "Flags" field is for EGL context flags. There is no EGL context flag 
for KHR_no_error. It looks like an oversight in the 
KHR_create_context_no_error specification, but we can't fix it.
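
To illustrate the split: the boolean lives next to the flags word and is validated against it when the attribute is parsed. The sketch below loosely follows the quoted hunk under simplifying assumptions; the struct, the error enum and the bit stand-ins are hypothetical, and the display version and API checks are left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_err { DEMO_OK, DEMO_BAD_ATTRIBUTE, DEMO_BAD_MATCH };

#define DEMO_DEBUG_BIT   0x1u  /* stand-in for the EGL debug context bit */
#define DEMO_ROBUST_BIT  0x4u  /* stand-in for the EGL robust access bit */

struct demo_ctx {
   int major_version;
   uint32_t flags;   /* EGL context flags; there is no no-error bit here */
   bool no_error;    /* carried separately for that reason */
};

static enum demo_err demo_set_no_error(struct demo_ctx *ctx, int val)
{
   /* Only meaningful for client API versions 2.0 and up. */
   if (ctx->major_version < 2)
      return DEMO_BAD_ATTRIBUTE;

   /* Cannot be combined with a debug or robustness context. */
   if (ctx->flags & (DEMO_DEBUG_BIT | DEMO_ROBUST_BIT))
      return DEMO_BAD_MATCH;

   /* Canonicalize any non-zero value to true. */
   ctx->no_error = !!val;
   return DEMO_OK;
}
```

The driver-facing flag is then derived from the boolean when the DRI attribute list is filled in, so the EGL flags word never has to carry a bit the EGL spec does not define.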


Grigori


Cheers,
  Eric


+ break;
+
   default:
  err = EGL_BAD_ATTRIBUTE;
  break;
diff --git a/src/egl/main/eglcontext.h b/src/egl/main/eglcontext.h
index f2fe806..0667622 100644
--- a/src/egl/main/eglcontext.h
+++ b/src/egl/main/eglcontext.h
@@ -62,6 +62,7 @@ struct _egl_context
EGLint Flags;
EGLint Profile;
EGLint ResetNotificationStrategy;
+   EGLBoolean NoError;

/* The real render buffer when a window surface is bound */
EGLint WindowRenderBuffer;
diff --git a/src/egl/main/egldisplay.h b/src/egl/main/egldisplay.h
index a13ff5b..3d5a445 100644
--- a/src/egl/main/egldisplay.h
+++ b/src/egl/main/egldisplay.h
@@ -122,6 +122,7 @@ struct _egl_extensions
EGLBoolean KHR_reusable_sync;
EGLBoolean KHR_surfaceless_context;
EGLBoolean KHR_wait_sync;
+   EGLBoolean KHR_create_context_no_error;

EGLBoolean MESA_drm_image;
EGLBoolean MESA_image_dma_buf_export;
--
2.7.4




[Mesa-dev] [PATCH 2/4] dri: Add KHR_no_error DRI extension

2017-07-11 Thread Grigori Goronzy
This basic extension allows usage of the __DRI_CTX_FLAG_NO_ERROR flag.
This includes support code for classic Mesa drivers to switch on the
no-error mode if the flag is set.
---
 include/GL/internal/dri_interface.h  | 19 +++
 src/gallium/state_trackers/dri/dri2.c|  6 ++
 src/gallium/state_trackers/dri/dri_context.c |  3 ++-
 src/mesa/drivers/dri/common/dri_util.c   |  8 ++--
 src/mesa/drivers/dri/i965/intel_screen.c |  8 +++-
 5 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 2da46f7..da60648 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1050,6 +1050,12 @@ struct __DRIdri2LoaderExtensionRec {
 #define __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS0x0004
 
 /**
+ * \requires __DRI2_NO_ERROR.
+ *
+ */
+#define __DRI_CTX_FLAG_NO_ERROR0x0008
+
+/**
  * \name Context reset strategies.
  */
 /*@{*/
@@ -1612,6 +1618,19 @@ struct __DRIrobustnessExtensionRec {
 };
 
 /**
+ * No-error context driver extension.
+ *
+ * Existence of this extension means the driver can accept the
+ * __DRI_CTX_FLAG_NO_ERROR flag.
+ */
+#define __DRI2_NO_ERROR "DRI_NoError"
+#define __DRI2_NO_ERROR_VERSION 1
+
+typedef struct __DRInoErrorExtensionRec {
+   __DRIextension base;
+} __DRInoErrorExtension;
+
+/**
  * DRI config options extension.
  *
  * This extension provides the XML string containing driver options for use by
diff --git a/src/gallium/state_trackers/dri/dri2.c 
b/src/gallium/state_trackers/dri/dri2.c
index 5da1c4e..244a6ad 100644
--- a/src/gallium/state_trackers/dri/dri2.c
+++ b/src/gallium/state_trackers/dri/dri2.c
@@ -1667,6 +1667,10 @@ static const __DRIrobustnessExtension dri2Robustness = {
.base = { __DRI2_ROBUSTNESS, 1 }
 };
 
+static const __DRInoErrorExtension driNoError = {
+   .base = { __DRI2_NO_ERROR, 1 }
+};
+
 static int
 dri2_interop_query_device_info(__DRIcontext *_ctx,
struct mesa_glinterop_device_info *out)
@@ -2002,6 +2006,7 @@ static const __DRIextension *dri_screen_extensions[] = {
,
,
,
+   ,
NULL
 };
 
@@ -2015,6 +2020,7 @@ static const __DRIextension 
*dri_robust_screen_extensions[] = {
,
,
,
+   ,
NULL
 };
 
diff --git a/src/gallium/state_trackers/dri/dri_context.c 
b/src/gallium/state_trackers/dri/dri_context.c
index ec555e4..e25f186 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -57,7 +57,8 @@ dri_create_context(gl_api api, const struct gl_config * 
visual,
struct st_context_attribs attribs;
enum st_context_error ctx_err = 0;
unsigned allowed_flags = __DRI_CTX_FLAG_DEBUG |
-__DRI_CTX_FLAG_FORWARD_COMPATIBLE;
+__DRI_CTX_FLAG_FORWARD_COMPATIBLE |
+__DRI_CTX_FLAG_NO_ERROR;
const __DRIbackgroundCallableExtension *backgroundCallable =
   screen->sPriv->dri2.backgroundCallable;
 
diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index f6df488..174356f 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -403,7 +403,8 @@ driCreateContextAttribs(__DRIscreen *screen, int api,
 if (mesa_api != API_OPENGL_COMPAT
 && mesa_api != API_OPENGL_CORE
 && (flags & ~(__DRI_CTX_FLAG_DEBUG |
- __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS))) {
+ __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS |
+ __DRI_CTX_FLAG_NO_ERROR))) {
*error = __DRI_CTX_ERROR_BAD_FLAG;
return NULL;
 }
@@ -425,7 +426,8 @@ driCreateContextAttribs(__DRIscreen *screen, int api,
 
 const uint32_t allowed_flags = (__DRI_CTX_FLAG_DEBUG
 | __DRI_CTX_FLAG_FORWARD_COMPATIBLE
-| __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS);
+| __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS
+| __DRI_CTX_FLAG_NO_ERROR);
 if (flags & ~allowed_flags) {
*error = __DRI_CTX_ERROR_UNKNOWN_FLAG;
return NULL;
@@ -467,6 +469,8 @@ driContextSetFlags(struct gl_context *ctx, uint32_t flags)
_mesa_set_debug_state_int(ctx, GL_DEBUG_OUTPUT, GL_TRUE);
 ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_DEBUG_BIT;
 }
+if ((flags & __DRI_CTX_FLAG_NO_ERROR) != 0)
+ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR;
 }
 
 static __DRIcontext *
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index c75f212..2bfe0b9 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1247,7 +1247,11 @@ static const __DRI2rendererQueryExtension 
intelRendererQueryExtension = {
 };
 
 static const 

[Mesa-dev] [PATCH 1/4] egl: Add EGL_KHR_create_context_no_error support

2017-07-11 Thread Grigori Goronzy
This only adds the EGL side, needs to be plumbed into Mesa frontend.
---
 src/egl/drivers/dri2/egl_dri2.c | 20 ++--
 src/egl/drivers/dri2/egl_dri2.h |  1 +
 src/egl/main/eglapi.c   |  1 +
 src/egl/main/eglcontext.c   | 30 ++
 src/egl/main/eglcontext.h   |  1 +
 src/egl/main/egldisplay.h   |  1 +
 6 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index cf26242..6bb94e4 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -428,6 +428,7 @@ static const struct dri2_extension_match 
swrast_core_extensions[] = {
 
 static const struct dri2_extension_match optional_core_extensions[] = {
{ __DRI2_ROBUSTNESS, 1, offsetof(struct dri2_egl_display, robustness) },
+   { __DRI2_NO_ERROR, 1, offsetof(struct dri2_egl_display, no_error) },
{ __DRI2_CONFIG_QUERY, 1, offsetof(struct dri2_egl_display, config) },
{ __DRI2_FENCE, 1, offsetof(struct dri2_egl_display, fence) },
{ __DRI2_RENDERER_QUERY, 1, offsetof(struct dri2_egl_display, 
rendererQuery) },
@@ -665,6 +666,9 @@ dri2_setup_screen(_EGLDisplay *disp)
  disp->Extensions.EXT_create_context_robustness = EGL_TRUE;
}
 
+   if (dri2_dpy->no_error)
+  disp->Extensions.KHR_create_context_no_error = EGL_TRUE;
+
if (dri2_dpy->fence) {
   disp->Extensions.KHR_fence_sync = EGL_TRUE;
   disp->Extensions.KHR_wait_sync = EGL_TRUE;
@@ -1056,7 +1060,7 @@ dri2_fill_context_attribs(struct dri2_egl_context 
*dri2_ctx,
ctx_attribs[pos++] = __DRI_CTX_ATTRIB_MINOR_VERSION;
ctx_attribs[pos++] = dri2_ctx->base.ClientMinorVersion;
 
-   if (dri2_ctx->base.Flags != 0) {
+   if (dri2_ctx->base.Flags != 0 || dri2_ctx->base.NoError) {
   /* If the implementation doesn't support the __DRI2_ROBUSTNESS
* extension, don't even try to send it the robust-access flag.
* It may explode.  Instead, generate the required EGL error here.
@@ -1068,7 +1072,8 @@ dri2_fill_context_attribs(struct dri2_egl_context 
*dri2_ctx,
   }
 
   ctx_attribs[pos++] = __DRI_CTX_ATTRIB_FLAGS;
-  ctx_attribs[pos++] = dri2_ctx->base.Flags;
+  ctx_attribs[pos++] = dri2_ctx->base.Flags |
+(dri2_ctx->base.NoError ? __DRI_CTX_FLAG_NO_ERROR : 0);
}
 
if (dri2_ctx->base.ResetNotificationStrategy != 
EGL_NO_RESET_NOTIFICATION_KHR) {
@@ -1131,6 +1136,17 @@ dri2_create_context(_EGLDriver *drv, _EGLDisplay *disp, 
_EGLConfig *conf,
   goto cleanup;
}
 
+   /* The EGL_KHR_create_context_no_error spec says:
+*
+*"BAD_MATCH is generated if the value of 
EGL_CONTEXT_OPENGL_NO_ERROR_KHR
+*used to create <share_context> does not match the value of
+*EGL_CONTEXT_OPENGL_NO_ERROR_KHR for the context being created."
+*/
+   if (share_list && share_list->NoError != dri2_ctx->base.NoError) {
+  _eglError(EGL_BAD_MATCH, "eglCreateContext");
+  goto cleanup;
+   }
+
switch (dri2_ctx->base.ClientAPI) {
case EGL_OPENGL_ES_API:
   switch (dri2_ctx->base.ClientMajorVersion) {
diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
index 4a5cf8e..5b3e93a 100644
--- a/src/egl/drivers/dri2/egl_dri2.h
+++ b/src/egl/drivers/dri2/egl_dri2.h
@@ -170,6 +170,7 @@ struct dri2_egl_display
const __DRItexBufferExtension  *tex_buffer;
const __DRIimageExtension  *image;
const __DRIrobustnessExtension *robustness;
+   const __DRInoErrorExtension*no_error;
const __DRI2configQueryExtension *config;
const __DRI2fenceExtension *fence;
const __DRI2rendererQueryExtension *rendererQuery;
diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 9b899d8..000368a 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -494,6 +494,7 @@ _eglCreateExtensionsString(_EGLDisplay *dpy)
_EGL_CHECK_EXTENSION(KHR_cl_event2);
_EGL_CHECK_EXTENSION(KHR_config_attribs);
_EGL_CHECK_EXTENSION(KHR_create_context);
+   _EGL_CHECK_EXTENSION(KHR_create_context_no_error);
_EGL_CHECK_EXTENSION(KHR_fence_sync);
_EGL_CHECK_EXTENSION(KHR_get_all_proc_addresses);
_EGL_CHECK_EXTENSION(KHR_gl_colorspace);
diff --git a/src/egl/main/eglcontext.c b/src/egl/main/eglcontext.c
index df8b45c..4244ca0 100644
--- a/src/egl/main/eglcontext.c
+++ b/src/egl/main/eglcontext.c
@@ -312,6 +312,36 @@ _eglParseContextAttribList(_EGLContext *ctx, _EGLDisplay 
*dpy,
 ctx->Flags |= EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR;
  break;
 
+  case EGL_CONTEXT_OPENGL_NO_ERROR_KHR:
+ if (dpy->Version < 14) {
+err = EGL_BAD_ATTRIBUTE;
+break;
+ }
+
+ /* The KHR_no_error spec only applies against OpenGL 2.0+ and
+  * OpenGL ES 2.0+
+  */
+ if ((api != EGL_OPENGL_API && api != EGL_OPENGL_ES_API) ||
+ ctx->ClientMajorVersion < 2) {
+err = EGL_BAD_ATTRIBUTE;
+break;
+ }
+
+ 
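A note on the flags expression in dri2_fill_context_attribs: in C, `|` binds tighter than `?:`, so an expression of the form `flags | cond ? bit : 0` groups as `(flags | cond) ? bit : 0`. The sketch below spells out both groupings with explicit parentheses. The function names and the SKETCH_ constants are hypothetical stand-ins, not Mesa definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only. */
#define SKETCH_FLAG_DEBUG    0x1u
#define SKETCH_FLAG_NO_ERROR 0x8u

/* How C groups "flags | no_error ? NO_ERROR : 0" when no parentheses
 * are written: the bitwise OR becomes the condition of the ternary. */
static uint32_t
sketch_combine_as_parsed(uint32_t flags, bool no_error)
{
   return (flags | (uint32_t)no_error) ? SKETCH_FLAG_NO_ERROR : 0;
}

/* The grouping that keeps the existing flags and only adds the
 * no-error bit when it was requested. */
static uint32_t
sketch_combine_intended(uint32_t flags, bool no_error)
{
   assert((flags & SKETCH_FLAG_NO_ERROR) == 0);
   return flags | (no_error ? SKETCH_FLAG_NO_ERROR : 0);
}
```

With only the debug flag set and no_error false, the first grouping drops the debug flag and turns on the no-error bit, while the second returns the debug flag unchanged.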

[Mesa-dev] KHR_no_error improvements

2017-07-11 Thread Grigori Goronzy
Hi,

this series implements support for the EGL_KHR_create_context_no_error
extension and the associated plumbing through the different
layers of Mesa - EGL, DRI, Gallium state tracker, Mesa frontend. It
took me a while to figure out how everything is connected together
and still it's somewhat confusing to me, so please bear with me if
I did something stupid. :)

With all the infrastructure in place, it's easy to add driconf
support for KHR_no_error, so that's done as well. Maybe games can be
whitelisted, similar to glthread, although that seems to be a slightly
controversial idea.
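
To make the layering a bit more concrete, here is a minimal sketch of how a single "no error" request could travel from the EGL attribute down to the GL context flag, with the driconf option acting as an override at the DRI level. All SKETCH_ constants and sketch_ functions are hypothetical stand-ins for illustration; the real definitions live in Mesa's DRI interface, st_api.h and the GL headers, and the real code is spread over several files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only, one per layer. */
#define SKETCH_DRI_CTX_FLAG_NO_ERROR        0x8u
#define SKETCH_ST_CONTEXT_FLAG_NO_ERROR     (1u << 4)
#define SKETCH_GL_CONTEXT_FLAG_NO_ERROR_BIT 0x8u

/* EGL layer: the boolean context attribute becomes a DRI context flag. */
static uint32_t
sketch_egl_to_dri_flags(uint32_t dri_flags, bool no_error_attrib)
{
   if (no_error_attrib)
      dri_flags |= SKETCH_DRI_CTX_FLAG_NO_ERROR;
   return dri_flags;
}

/* DRI layer: a driconf option can force the flag on for an application. */
static uint32_t
sketch_dri_apply_driconf(uint32_t dri_flags, bool mesa_no_error_option)
{
   if (mesa_no_error_option)
      dri_flags |= SKETCH_DRI_CTX_FLAG_NO_ERROR;
   return dri_flags;
}

/* Gallium DRI state tracker: DRI flag to st_api attribute flag. */
static uint32_t
sketch_dri_to_st_flags(uint32_t dri_flags)
{
   uint32_t st_flags = 0;

   if (dri_flags & SKETCH_DRI_CTX_FLAG_NO_ERROR)
      st_flags |= SKETCH_ST_CONTEXT_FLAG_NO_ERROR;
   return st_flags;
}

/* Mesa frontend: st_api flag to the GL-visible context flag. */
static uint32_t
sketch_st_to_gl_context_flags(uint32_t st_flags)
{
   uint32_t gl_flags = 0;

   if (st_flags & SKETCH_ST_CONTEXT_FLAG_NO_ERROR)
      gl_flags |= SKETCH_GL_CONTEXT_FLAG_NO_ERROR_BIT;
   return gl_flags;
}

/* Chain all layers for one context creation request. */
static uint32_t
sketch_resolve_gl_flags(bool no_error_attrib, bool mesa_no_error_option)
{
   uint32_t dri_flags = sketch_egl_to_dri_flags(0, no_error_attrib);

   dri_flags = sketch_dri_apply_driconf(dri_flags, mesa_no_error_option);
   return sketch_st_to_gl_context_flags(sketch_dri_to_st_flags(dri_flags));
}
```

Either the application attribute or the driconf option is enough to end up with the no-error bit in the context flags; with neither, the result stays zero.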

Grigori

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] st/mesa: add support for KHR_no_error flag

2017-07-11 Thread Grigori Goronzy
Add a new context flag and plumb it through the various layers of the
context creation code to set up dispatch tables for the no-error mode.
---
 src/gallium/include/state_tracker/st_api.h   |  1 +
 src/gallium/state_trackers/dri/dri_context.c |  3 +++
 src/mesa/state_tracker/st_context.c  | 10 +++---
 src/mesa/state_tracker/st_context.h  |  3 ++-
 src/mesa/state_tracker/st_manager.c  |  6 +-
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/gallium/include/state_tracker/st_api.h 
b/src/gallium/include/state_tracker/st_api.h
index 222e565..29e05e9 100644
--- a/src/gallium/include/state_tracker/st_api.h
+++ b/src/gallium/include/state_tracker/st_api.h
@@ -90,6 +90,7 @@ enum st_api_feature
 #define ST_CONTEXT_FLAG_FORWARD_COMPATIBLE  (1 << 1)
 #define ST_CONTEXT_FLAG_ROBUST_ACCESS   (1 << 2)
 #define ST_CONTEXT_FLAG_RESET_NOTIFICATION_ENABLED (1 << 3)
+#define ST_CONTEXT_FLAG_NO_ERROR(1 << 4)
 
 /**
  * Reasons that context creation might fail.
diff --git a/src/gallium/state_trackers/dri/dri_context.c 
b/src/gallium/state_trackers/dri/dri_context.c
index e25f186..275c0d4 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -107,6 +107,9 @@ dri_create_context(gl_api api, const struct gl_config * 
visual,
if (notify_reset)
   attribs.flags |= ST_CONTEXT_FLAG_RESET_NOTIFICATION_ENABLED;
 
+   if (flags & __DRI_CTX_FLAG_NO_ERROR)
+  attribs.flags |= ST_CONTEXT_FLAG_NO_ERROR;
+
if (sharedContextPrivate) {
   st_share = ((struct dri_context *)sharedContextPrivate)->st;
}
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index f535139..b8677f4 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -288,7 +288,7 @@ static void st_init_driver_flags(struct st_context *st);
 
 static struct st_context *
 st_create_context_priv( struct gl_context *ctx, struct pipe_context *pipe,
-   const struct st_config_options *options)
+   const struct st_config_options *options, bool no_error)
 {
struct pipe_screen *screen = pipe->screen;
uint i;
@@ -369,6 +369,9 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
 
ctx->VertexProgram._MaintainTnlProgram = GL_TRUE;
 
+   if (no_error)
+  ctx->Const.ContextFlags |= GL_CONTEXT_FLAG_NO_ERROR_BIT_KHR;
+
st->has_stencil_export =
   screen->get_param(screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
st->has_shader_model3 = screen->get_param(screen, PIPE_CAP_SM3);
@@ -535,7 +538,8 @@ static void st_init_driver_flags(struct st_context *st)
 struct st_context *st_create_context(gl_api api, struct pipe_context *pipe,
  const struct gl_config *visual,
  struct st_context *share,
- const struct st_config_options *options)
+ const struct st_config_options *options,
+ bool no_error)
 {
struct gl_context *ctx;
struct gl_context *shareCtx = share ? share->ctx : NULL;
@@ -566,7 +570,7 @@ struct st_context *st_create_context(gl_api api, struct 
pipe_context *pipe,
if (debug_get_option_mesa_mvp_dp4())
   ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = 
GL_TRUE;
 
-   st = st_create_context_priv(ctx, pipe, options);
+   st = st_create_context_priv(ctx, pipe, options, no_error);
if (!st) {
   _mesa_destroy_context(ctx);
}
diff --git a/src/mesa/state_tracker/st_context.h 
b/src/mesa/state_tracker/st_context.h
index af9149e..b2ea6b5 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -390,7 +390,8 @@ extern struct st_context *
 st_create_context(gl_api api, struct pipe_context *pipe,
   const struct gl_config *visual,
   struct st_context *share,
-  const struct st_config_options *options);
+  const struct st_config_options *options,
+  bool no_error);
 
 extern void
 st_destroy_context(struct st_context *st);
diff --git a/src/mesa/state_tracker/st_manager.c 
b/src/mesa/state_tracker/st_manager.c
index 7a3205c..242262b 100644
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -654,6 +654,7 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
struct pipe_context *pipe;
struct gl_config mode;
gl_api api;
+   bool no_error = false;
unsigned ctx_flags = PIPE_CONTEXT_PREFER_THREADED;
 
if (!(stapi->profile_mask & (1 << attribs->profile)))
@@ -680,6 +681,9 @@ st_api_create_context(struct st_api *stapi, struct 
st_manager *smapi,
if (attribs->flags & ST_CONTEXT_FLAG_ROBUST_ACCESS)
   ctx_flags |= PIPE_CONTEXT_ROBUST_BUFFER_ACCESS;
 
+   if (attribs->flags & ST_CONTEXT_FLAG_NO_ERROR)
+  

[Mesa-dev] [PATCH 4/4] dri: Add KHR_no_error toggle to driconf

2017-07-11 Thread Grigori Goronzy
Allows applications to be whitelisted.
---
 src/gallium/state_trackers/dri/dri_context.c| 3 +++
 src/gallium/state_trackers/dri/dri_screen.c | 1 +
 src/mesa/drivers/dri/common/dri_util.c  | 3 +++
 src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 +
 4 files changed, 12 insertions(+)

diff --git a/src/gallium/state_trackers/dri/dri_context.c 
b/src/gallium/state_trackers/dri/dri_context.c
index 275c0d4..e4f7c96 100644
--- a/src/gallium/state_trackers/dri/dri_context.c
+++ b/src/gallium/state_trackers/dri/dri_context.c
@@ -124,6 +124,9 @@ dri_create_context(gl_api api, const struct gl_config * 
visual,
ctx->cPriv = cPriv;
ctx->sPriv = sPriv;
 
+   if (driQueryOptionb(>optionCache, "mesa_no_error"))
+  attribs.flags |=  ST_CONTEXT_FLAG_NO_ERROR;
+
attribs.options = screen->options;
dri_fill_st_visual(, screen, visual);
ctx->st = stapi->create_context(stapi, >base, , _err,
diff --git a/src/gallium/state_trackers/dri/dri_screen.c 
b/src/gallium/state_trackers/dri/dri_screen.c
index 6b58830..de0840b 100644
--- a/src/gallium/state_trackers/dri/dri_screen.c
+++ b/src/gallium/state_trackers/dri/dri_screen.c
@@ -56,6 +56,7 @@ const __DRIconfigOptionsExtension gallium_config_options = {
DRI_CONF_BEGIN
   DRI_CONF_SECTION_PERFORMANCE
  DRI_CONF_MESA_GLTHREAD("false")
+ DRI_CONF_MESA_NO_ERROR("false")
  DRI_CONF_DISABLE_EXT_BUFFER_AGE("false")
  DRI_CONF_DISABLE_OML_SYNC_CONTROL("false")
   DRI_CONF_SECTION_END
diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index 174356f..cc97c2d 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -437,6 +437,9 @@ driCreateContextAttribs(__DRIscreen *screen, int api,
   major_version, minor_version, error))
return NULL;
 
+if (driQueryOptionb(>optionCache, "mesa_no_error"))
+flags |= __DRI_CTX_FLAG_NO_ERROR;
+
 context = calloc(1, sizeof *context);
 if (!context) {
*error = __DRI_CTX_ERROR_NO_MEMORY;
diff --git a/src/mesa/drivers/dri/common/xmlpool/t_options.h 
b/src/mesa/drivers/dri/common/xmlpool/t_options.h
index 9aa1798..e308839 100644
--- a/src/mesa/drivers/dri/common/xmlpool/t_options.h
+++ b/src/mesa/drivers/dri/common/xmlpool/t_options.h
@@ -332,6 +332,11 @@ DRI_CONF_OPT_BEGIN_B(mesa_glthread, def) \
 DRI_CONF_DESC(en,gettext("Enable offloading GL driver work to a 
separate thread")) \
 DRI_CONF_OPT_END
 
+#define DRI_CONF_MESA_NO_ERROR(def) \
+DRI_CONF_OPT_BEGIN_B(mesa_no_error, def) \
+DRI_CONF_DESC(en,gettext("Disable GL driver error checking")) \
+DRI_CONF_OPT_END
+
 #define DRI_CONF_DISABLE_EXT_BUFFER_AGE(def) \
 DRI_CONF_OPT_BEGIN_B(glx_disable_ext_buffer_age, def) \
DRI_CONF_DESC(en, gettext("Disable the GLX_EXT_buffer_age extension")) \
-- 
2.7.4



[Mesa-dev] [PATCH] mesa/marshal: fix glNamedBufferData with NULL data

2017-07-10 Thread Grigori Goronzy
The semantics are similar to glBufferData. Fixes a crash with VMWare
Player.

Signed-off-by: Grigori Goronzy <g...@chown.ath.cx>
---
 src/mesa/main/marshal.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
index 8db4531..b801bdc 100644
--- a/src/mesa/main/marshal.c
+++ b/src/mesa/main/marshal.c
@@ -415,6 +415,7 @@ struct marshal_cmd_NamedBufferData
GLuint name;
GLsizei size;
GLenum usage;
+   bool data_null; /* If set, no data follows for "data" */
/* Next size bytes are GLubyte data[size] */
 };
 
@@ -425,7 +426,12 @@ _mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
const GLuint name = cmd->name;
const GLsizei size = cmd->size;
const GLenum usage = cmd->usage;
-   const void *data = (const void *) (cmd + 1);
+   const void *data;
+
+   if (cmd->data_null)
+  data = NULL;
+   else
+  data = (const void *) (cmd + 1);
 
CALL_NamedBufferData(ctx->CurrentServerDispatch,
 (name, size, data, usage));
@@ -436,7 +442,7 @@ _mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr 
size,
   const GLvoid * data, GLenum usage)
 {
GET_CURRENT_CONTEXT(ctx);
-   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + size;
+   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + (data ? size 
: 0);
 
debug_print_marshal("NamedBufferData");
if (unlikely(size < 0)) {
@@ -452,8 +458,11 @@ _mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr 
size,
   cmd->name = buffer;
   cmd->size = size;
   cmd->usage = usage;
-  char *variable_data = (char *) (cmd + 1);
-  memcpy(variable_data, data, size);
+  cmd->data_null = !data;
+  if (data) {
+ char *variable_data = (char *) (cmd + 1);
+ memcpy(variable_data, data, size);
+  }
   _mesa_post_marshal_hook(ctx);
} else {
   _mesa_glthread_finish(ctx);
-- 
2.7.4
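
For readers who want the idea without the glthread machinery, the following is a minimal, self-contained sketch of the same technique: a flag in the command header records whether the caller passed NULL, and the payload is only appended (and later only read) when it did not. The struct, the flat byte queue and the sketch_ function names are hypothetical and much simpler than the real marshalling code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command header; payload bytes follow it in the queue
 * only when data_null is false. */
struct sketch_cmd_buffer_data {
   size_t size;     /* size requested by the caller */
   bool data_null;  /* if set, no payload follows the header */
};

/* Writes one command into queue. Returns the number of bytes used,
 * or 0 if the command does not fit. */
static size_t
sketch_marshal_buffer_data(unsigned char *queue, size_t queue_size,
                           const void *data, size_t size)
{
   struct sketch_cmd_buffer_data cmd;
   size_t cmd_size = sizeof(cmd) + (data ? size : 0);

   assert(queue != NULL);
   if (cmd_size > queue_size)
      return 0;

   memset(&cmd, 0, sizeof(cmd));
   cmd.size = size;
   cmd.data_null = (data == NULL);
   memcpy(queue, &cmd, sizeof(cmd));
   if (data)
      memcpy(queue + sizeof(cmd), data, size);
   return cmd_size;
}

/* Reads the command back. Returns a pointer to the payload, or NULL
 * when the caller originally passed NULL data. */
static const void *
sketch_unmarshal_buffer_data(const unsigned char *queue, size_t *size_out)
{
   struct sketch_cmd_buffer_data cmd;

   assert(queue != NULL && size_out != NULL);
   memcpy(&cmd, queue, sizeof(cmd));
   *size_out = cmd.size;
   return cmd.data_null ? NULL : (const void *)(queue + sizeof(cmd));
}
```

The requested size still travels with the command in the NULL case, since the consumer needs it to allocate storage even though there is nothing to copy.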



Re: [Mesa-dev] [PATCH] mesa/marshal: add custom marshalling for glNamedBuffer(Sub)Data

2017-07-09 Thread Grigori Goronzy

On 2017-06-26 15:51, Marc Dietrich wrote:

On Monday, 26 June 2017, 15:35:15 CEST, Grigori Goronzy wrote:

On 2017-06-26 15:11, Marc Dietrich wrote:
> unfortunately, this change broke vmware/vmplayer here (bisected). Windows
> guest on linux host. Sig 11 in SVGA driver. All good if mesa_glthread=false.

Can you provide instructions on how to reproduce this problem? A
backtrace might help, too.


well, this is all proprietary software, so the backtrace doesn't really
tell us anything.


I don't really get it, by the way. Isn't the SVGA driver for Linux
guests?


I think the windows driver is named the same. Here is a paste of vmware.log:


https://pastebin.com/X3CS7rCP

I also have core dump, maybe only useful for VMWARE staff...



Hey Marc,

does the attached patch fix the crash?

Grigori





Best regards
Grigori

>> > Best regards
>> > Grigori
>> >
>> >> [1]
>> >> https://lists.freedesktop.org/archives/mesa-dev/2017-June/160329.html
>> >>
>> >> On 25/06/17 02:59, Grigori Goronzy wrote:
>> >>> These entry points are used by Alien Isolation and caused
>> >>> synchronization with glthread. The async marshalling implementation
>> >>> is similar to glBuffer(Sub)Data.
>> >>>
>> >>> Results in an approximately 6x drop in glthread synchronizations and a
>> >>> ~30% FPS jump in Alien Isolation (Medium preset, Athlon 860K, RX 480).
>> >>>
>> >>> This does not care about the EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD special
>> >>> case like the Buffer(Sub)Data marshalling functions.
>> >>> ---
>> >>> I'm not a fan of the code duplication and I'll try to address that in
>> >>> further changes to glthread/marshalling, but the improvement is so
>> >>> noticeable that I'd like to share it. Alien Isolation is now playable on
>> >>> my system while it wasn't before.
>> >>>
>> >>>   src/mapi/glapi/gen/ARB_direct_state_access.xml |   4 +-
>> >>>   src/mesa/main/marshal.c| 108
>> >>>
>> >>> +
>> >>>
>> >>>   src/mesa/main/marshal.h|  18 +
>> >>>   3 files changed, 128 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml
>> >>> b/src/mapi/glapi/gen/ARB_direct_state_access.xml
>> >>> index cb24d79..d3d2246 100644
>> >>> --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml
>> >>> +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml
>> >>> @@ -61,14 +61,14 @@
>> >>>
>> >>> 
>> >>>
>> >>>  
>> >>>
>> >>>   -   
>> >>>
>> >>> +   
>> >>>
>> >>> 
>> >>> 
>> >>> 
>> >>> 
>> >>>
>> >>>  
>> >>>
>> >>>   -   
>> >>>
>> >>> +   > >>> marshal="custom">
>> >>>
>> >>> 
>> >>> 
>> >>> 
>> >>>
>> >>> diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
>> >>> index 4840f32..1fddf8e 100644
>> >>> --- a/src/mesa/main/marshal.c
>> >>> +++ b/src/mesa/main/marshal.c
>> >>> @@ -408,6 +408,114 @@ _mesa_marshal_BufferSubData(GLenum target,
>> >>> GLintptr offset, GLsizeiptr size,
>> >>>
>> >>>  }
>> >>>
>> >>>   }
>> >>>   +/* NamedBufferData: marshalled asynchronously */
>> >>>
>> >>> +struct marshal_cmd_NamedBufferData
>> >>> +{
>> >>> +   struct marshal_cmd_base cmd_base;
>> >>> +   GLuint name;
>> >>> +   GLsizei size;
>> >>> +   GLenum usage;
>> >>> +   /* Next size bytes are GLubyte data[size] */
>> >>> +};
>> >>> +
>> >>> +void
>> >>> +_mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
>> >>> +const struct
>> >>> marshal_cmd_NamedBufferData *cmd)
>> >>> +{
>> >>> +   const GLuint name = cmd->name;
&g

Re: [Mesa-dev] [PATCH 1/2] mesa/marshal: extract ClearBuffer helpers

2017-07-09 Thread Grigori Goronzy

On 2017-07-09 18:52, Matt Turner wrote:

+static inline size_t buffer_to_size(GLenum buffer)
+{
+   switch (buffer) {
+   case GL_COLOR:
+  return 4;
+   case GL_DEPTH_STENCIL:
+  return 2;
+   case GL_STENCIL:
+   case GL_DEPTH:
+  return 1;
+   default:
+  return 0;
+   }
+}
+
+static inline bool clear_buffer_add_command(struct gl_context *ctx, 
uint16_t id,


Please don't use 'inline'. The compiler is capable of making this
decision for itself, based on the data it has available.


Well, it's just a hint. If the compiler believes inlining is not 
beneficial, it does not have to do it. The GL frontend and no_error code 
use inline quite a bit, so I figured it's acceptable in this place, but 
I can remove it as well.


Grigori




[Mesa-dev] [PATCH 2/2] mesa/marshal: add marshalling for glClearBuffer*

2017-07-09 Thread Grigori Goronzy
Add async marshalling/unmarshalling for all glClearBuffer variants.
These entry points are commonly used in general and Alien Isolation
specifically uses glClearBufferiv. Slightly reduces the number of
thread synchronizations with glthread in that game.
---
 src/mapi/glapi/gen/GL3x.xml |   6 +-
 src/mesa/main/marshal.c | 133 +++-
 src/mesa/main/marshal.h |  29 +-
 3 files changed, 163 insertions(+), 5 deletions(-)

diff --git a/src/mapi/glapi/gen/GL3x.xml b/src/mapi/glapi/gen/GL3x.xml
index 24490da..7c86e8f 100644
--- a/src/mapi/glapi/gen/GL3x.xml
+++ b/src/mapi/glapi/gen/GL3x.xml
@@ -117,13 +117,13 @@
 
   
 
-  
+  
 
 
 
   
 
-  
+  
 
 
 
@@ -135,7 +135,7 @@
 
   
 
-  
+  
 
 
 
diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
index 1edc580..5fc733f 100644
--- a/src/mesa/main/marshal.c
+++ b/src/mesa/main/marshal.c
@@ -516,7 +516,7 @@ _mesa_marshal_NamedBufferSubData(GLuint buffer, GLintptr 
offset,
}
 }
 
-/* ClearBufferfv: marshalled asynchronously */
+/* ClearBuffer* (all variants): marshalled asynchronously */
 struct marshal_cmd_ClearBuffer
 {
struct marshal_cmd_base cmd_base;
@@ -537,6 +537,46 @@ _mesa_unmarshal_ClearBufferfv(struct gl_context *ctx,
   (buffer, drawbuffer, value));
 }
 
+void
+_mesa_unmarshal_ClearBufferiv(struct gl_context *ctx,
+  const struct marshal_cmd_ClearBuffer *cmd)
+{
+   const GLenum buffer = cmd->buffer;
+   const GLint drawbuffer = cmd->drawbuffer;
+   const char *variable_data = (const char *) (cmd + 1);
+   const GLint *value = (const GLint *) variable_data;
+
+   CALL_ClearBufferiv(ctx->CurrentServerDispatch,
+  (buffer, drawbuffer, value));
+}
+
+void
+_mesa_unmarshal_ClearBufferuiv(struct gl_context *ctx,
+   const struct marshal_cmd_ClearBuffer *cmd)
+{
+   const GLenum buffer = cmd->buffer;
+   const GLint drawbuffer = cmd->drawbuffer;
+   const char *variable_data = (const char *) (cmd + 1);
+   const GLuint *value = (const GLuint *) variable_data;
+
+   CALL_ClearBufferuiv(ctx->CurrentServerDispatch,
+   (buffer, drawbuffer, value));
+}
+
+void
+_mesa_unmarshal_ClearBufferfi(struct gl_context *ctx,
+  const struct marshal_cmd_ClearBuffer *cmd)
+{
+   const GLenum buffer = cmd->buffer;
+   const GLint drawbuffer = cmd->drawbuffer;
+   const char *variable_data = (const char *) (cmd + 1);
+   const GLfloat *depth = (const GLfloat *) variable_data;
+   const GLint *stencil = (const GLint *) (variable_data + 4);
+
+   CALL_ClearBufferfi(ctx->CurrentServerDispatch,
+  (buffer, drawbuffer, *depth, *stencil));
+}
+
 static inline size_t buffer_to_size(GLenum buffer)
 {
switch (buffer) {
@@ -607,3 +647,94 @@ _mesa_marshal_ClearBufferfv(GLenum buffer, GLint 
drawbuffer,
  (buffer, drawbuffer, value));
}
 }
+
+void GLAPIENTRY
+_mesa_marshal_ClearBufferiv(GLenum buffer, GLint drawbuffer,
+const GLint *value)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   debug_print_marshal("ClearBufferiv");
+
+   if (!(buffer == GL_STENCIL || buffer == GL_COLOR)) {
+  _mesa_glthread_finish(ctx);
+
+  /* Page 498 of the PDF, section '17.4.3.1 Clearing Individual Buffers'
+   * of the OpenGL 4.5 spec states:
+   *
+   *"An INVALID_ENUM error is generated by ClearBufferiv and
+   * ClearNamedFramebufferiv if buffer is not COLOR or STENCIL."
+   */
+  _mesa_error(ctx, GL_INVALID_ENUM, "glClearBufferiv(buffer=%s)",
+  _mesa_enum_to_string(buffer));
+   }
+
+   size_t size = buffer_to_size(buffer);
+   if (!clear_buffer_add_command(ctx, DISPATCH_CMD_ClearBufferiv, buffer,
+ drawbuffer, (GLuint *)value, size)) {
+  debug_print_sync("ClearBufferiv");
+  _mesa_glthread_finish(ctx);
+  CALL_ClearBufferiv(ctx->CurrentServerDispatch,
+ (buffer, drawbuffer, value));
+   }
+}
+
+void GLAPIENTRY
+_mesa_marshal_ClearBufferuiv(GLenum buffer, GLint drawbuffer,
+ const GLuint *value)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   debug_print_marshal("ClearBufferuiv");
+
+   if (buffer != GL_COLOR) {
+  _mesa_glthread_finish(ctx);
+
+  /* Page 498 of the PDF, section '17.4.3.1 Clearing Individual Buffers'
+   * of the OpenGL 4.5 spec states:
+   *
+   *"An INVALID_ENUM error is generated by ClearBufferuiv and
+   * ClearNamedFramebufferuiv if buffer is not COLOR."
+   */
+  _mesa_error(ctx, GL_INVALID_ENUM, "glClearBufferuiv(buffer=%s)",
+  _mesa_enum_to_string(buffer));
+   }
+
+   if (!clear_buffer_add_command(ctx, DISPATCH_CMD_ClearBufferuiv, buffer,
+ drawbuffer, (GLuint *)value, 4)) {
+  debug_print_sync("ClearBufferuiv");
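
The ClearBufferfi unmarshal function above reads the depth value at the start of the variable data and the stencil value 4 bytes later. The marshal side for that entry point is cut off in this archive, so the following is only a sketch of a payload layout that would match, under the assumption that both values are 4 bytes wide. The sketch_ names are hypothetical, and memcpy is used instead of pointer casts to sidestep alignment and aliasing questions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack depth followed by stencil into 8 bytes of trailing payload:
 * depth at byte offset 0, stencil at byte offset 4. */
static void
sketch_pack_depth_stencil(unsigned char payload[8],
                          float depth, int32_t stencil)
{
   assert(sizeof(float) == 4);
   memcpy(payload, &depth, 4);
   memcpy(payload + 4, &stencil, 4);
}

/* Read the two values back from the same offsets. */
static void
sketch_unpack_depth_stencil(const unsigned char payload[8],
                            float *depth, int32_t *stencil)
{
   assert(depth != NULL && stencil != NULL);
   memcpy(depth, payload, 4);
   memcpy(stencil, payload + 4, 4);
}
```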
+  

[Mesa-dev] [PATCH 1/2] mesa/marshal: extract ClearBuffer helpers

2017-07-09 Thread Grigori Goronzy
Extract clear buffer helper functions in preparation for adding
marshal/unmarshal functions for the various glClearBuffer variants.
---
 src/mesa/main/marshal.c | 74 +++--
 src/mesa/main/marshal.h |  5 ++--
 2 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
index 8db4531..1edc580 100644
--- a/src/mesa/main/marshal.c
+++ b/src/mesa/main/marshal.c
@@ -517,7 +517,7 @@ _mesa_marshal_NamedBufferSubData(GLuint buffer, GLintptr 
offset,
 }
 
 /* ClearBufferfv: marshalled asynchronously */
-struct marshal_cmd_ClearBufferfv
+struct marshal_cmd_ClearBuffer
 {
struct marshal_cmd_base cmd_base;
GLenum buffer;
@@ -526,7 +526,7 @@ struct marshal_cmd_ClearBufferfv
 
 void
 _mesa_unmarshal_ClearBufferfv(struct gl_context *ctx,
-  const struct marshal_cmd_ClearBufferfv *cmd)
+  const struct marshal_cmd_ClearBuffer *cmd)
 {
const GLenum buffer = cmd->buffer;
const GLint drawbuffer = cmd->drawbuffer;
@@ -537,6 +537,47 @@ _mesa_unmarshal_ClearBufferfv(struct gl_context *ctx,
   (buffer, drawbuffer, value));
 }
 
+static inline size_t buffer_to_size(GLenum buffer)
+{
+   switch (buffer) {
+   case GL_COLOR:
+  return 4;
+   case GL_DEPTH_STENCIL:
+  return 2;
+   case GL_STENCIL:
+   case GL_DEPTH:
+  return 1;
+   default:
+  return 0;
+   }
+}
+
+static inline bool clear_buffer_add_command(struct gl_context *ctx, uint16_t 
id,
+GLenum buffer, GLint drawbuffer,
+const GLuint *value, size_t size)
+{
+   size_t cmd_size = sizeof(struct marshal_cmd_ClearBuffer) + size * sizeof(GLuint);
+   if (cmd_size <= MARSHAL_MAX_CMD_SIZE) {
+  struct marshal_cmd_ClearBuffer *cmd =
+ _mesa_glthread_allocate_command(ctx, id,
+ cmd_size);
+  cmd->buffer = buffer;
+  cmd->drawbuffer = drawbuffer;
+  GLuint *variable_data = (GLuint *) (cmd + 1);
+  if (size == 4)
+ COPY_4V(variable_data,  value);
+  else if (size == 2)
+ COPY_2V(variable_data, value);
+  else
+ *variable_data = *value;
+
+  _mesa_post_marshal_hook(ctx);
+  return true;
+   }
+
+   return false;
+}
+
 void GLAPIENTRY
 _mesa_marshal_ClearBufferfv(GLenum buffer, GLint drawbuffer,
 const GLfloat *value)
@@ -544,15 +585,7 @@ _mesa_marshal_ClearBufferfv(GLenum buffer, GLint 
drawbuffer,
GET_CURRENT_CONTEXT(ctx);
debug_print_marshal("ClearBufferfv");
 
-   size_t size;
-   switch (buffer) {
-   case GL_DEPTH:
-  size = sizeof(GLfloat);
-  break;
-   case GL_COLOR:
-  size = sizeof(GLfloat) * 4;
-  break;
-   default:
+   if (!(buffer == GL_DEPTH || buffer == GL_COLOR)) {
   _mesa_glthread_finish(ctx);
 
   /* Page 498 of the PDF, section '17.4.3.1 Clearing Individual Buffers'
@@ -563,24 +596,11 @@ _mesa_marshal_ClearBufferfv(GLenum buffer, GLint 
drawbuffer,
*/
   _mesa_error(ctx, GL_INVALID_ENUM, "glClearBufferfv(buffer=%s)",
   _mesa_enum_to_string(buffer));
-  return;
}
 
-   size_t cmd_size = sizeof(struct marshal_cmd_ClearBufferfv) + size;
-   if (cmd_size <= MARSHAL_MAX_CMD_SIZE) {
-  struct marshal_cmd_ClearBufferfv *cmd =
- _mesa_glthread_allocate_command(ctx, DISPATCH_CMD_ClearBufferfv,
- cmd_size);
-  cmd->buffer = buffer;
-  cmd->drawbuffer = drawbuffer;
-  GLfloat *variable_data = (GLfloat *) (cmd + 1);
-  if (buffer == GL_COLOR)
- COPY_4V(variable_data, value);
-  else
- *variable_data = *value;
-
-  _mesa_post_marshal_hook(ctx);
-   } else {
+   size_t size = buffer_to_size(buffer);
+   if (!clear_buffer_add_command(ctx, DISPATCH_CMD_ClearBufferfv, buffer,
+ drawbuffer, (GLuint *)value, size)) {
   debug_print_sync("ClearBufferfv");
   _mesa_glthread_finish(ctx);
   CALL_ClearBufferfv(ctx->CurrentServerDispatch,
diff --git a/src/mesa/main/marshal.h b/src/mesa/main/marshal.h
index 999c75e..1567e7b 100644
--- a/src/mesa/main/marshal.h
+++ b/src/mesa/main/marshal.h
@@ -182,7 +182,8 @@ struct marshal_cmd_BufferData;
 struct marshal_cmd_BufferSubData;
 struct marshal_cmd_NamedBufferData;
 struct marshal_cmd_NamedBufferSubData;
-struct marshal_cmd_ClearBufferfv;
+struct marshal_cmd_ClearBuffer;
+#define marshal_cmd_ClearBufferfv marshal_cmd_ClearBuffer
 
 void
 _mesa_unmarshal_Enable(struct gl_context *ctx,
@@ -247,7 +248,7 @@ _mesa_marshal_NamedBufferSubData(GLuint buffer, GLintptr 
offset, GLsizeiptr size
 
 void
 _mesa_unmarshal_ClearBufferfv(struct gl_context *ctx,
-  const struct marshal_cmd_ClearBufferfv *cmd);
+  const struct marshal_cmd_ClearBuffer *cmd);
 
 void GLAPIENTRY
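
One detail of the helpers worth spelling out: buffer_to_size returns a component count (4, 2 or 1), while the command allocation works in bytes. Below is a minimal sketch of the conversion, assuming 4-byte components as with GLuint, GLint and GLfloat. The enum and the sketch_ functions are hypothetical stand-ins and do not use the real GL enums or the glthread allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for GL_COLOR, GL_DEPTH, GL_STENCIL and GL_DEPTH_STENCIL. */
enum sketch_buffer {
   SKETCH_COLOR,
   SKETCH_DEPTH,
   SKETCH_STENCIL,
   SKETCH_DEPTH_STENCIL,
   SKETCH_INVALID
};

/* Number of 4-byte components the clear value carries. */
static size_t
sketch_component_count(enum sketch_buffer buffer)
{
   switch (buffer) {
   case SKETCH_COLOR:
      return 4;
   case SKETCH_DEPTH_STENCIL:
      return 2;
   case SKETCH_STENCIL:
   case SKETCH_DEPTH:
      return 1;
   default:
      return 0;
   }
}

/* Payload size in bytes for the clear value. */
static size_t
sketch_payload_bytes(enum sketch_buffer buffer)
{
   return sketch_component_count(buffer) * sizeof(uint32_t);
}

/* Total command size: fixed header plus variable payload. */
static size_t
sketch_cmd_size(size_t header_bytes, enum sketch_buffer buffer)
{
   size_t payload = sketch_payload_bytes(buffer);

   assert(header_bytes <= SIZE_MAX - payload);
   return header_bytes + payload;
}
```

For a color clear with a 12-byte header this gives 28 bytes, which is what a four-component copy into the trailing data needs.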
 

Re: [Mesa-dev] [PATCH] glthread: get rid of unmarshal dispatch enum/table

2017-07-07 Thread Grigori Goronzy

On 2017-07-01 18:46, Marek Olšák wrote:

Instead of passing the function pointer through the queue, passing
just a call ID (uint16_t) is preferable.

If the switch statement is an issue, doing a function pointer lookup
from a static array should be OK.



OK, then let's drop this patch. gcc turns the switch/case block into an 
efficient jump table with the ID method, so an array for function lookup 
instead of that doesn't improve anything.
I didn't see any measurable benefit of the function pointer method 
either.
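
For reference, the two ID-based variants discussed here can be sketched as follows: a switch over the call ID, and a lookup in a static array of function pointers indexed by the same ID. This is a toy model with hypothetical names and a plain int as the "context", not the generated glthread code; it only illustrates that both variants dispatch on the same 16-bit ID stored in the command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical call IDs. */
enum sketch_cmd_id {
   SKETCH_CMD_ADD,
   SKETCH_CMD_DOUBLE,
   SKETCH_CMD_COUNT
};

/* Hypothetical command: small header with the ID, then one argument. */
struct sketch_cmd {
   uint16_t cmd_id;
   uint16_t cmd_size;
   int32_t arg;
};

static void
sketch_unmarshal_add(int *state, const struct sketch_cmd *cmd)
{
   *state += cmd->arg;
}

static void
sketch_unmarshal_double(int *state, const struct sketch_cmd *cmd)
{
   (void)cmd;
   *state *= 2;
}

/* Variant 1: switch on the ID. Compilers typically turn a dense
 * switch like this into a jump table. */
static void
sketch_dispatch_switch(int *state, const struct sketch_cmd *cmd)
{
   switch (cmd->cmd_id) {
   case SKETCH_CMD_ADD:
      sketch_unmarshal_add(state, cmd);
      break;
   case SKETCH_CMD_DOUBLE:
      sketch_unmarshal_double(state, cmd);
      break;
   default:
      assert(0 && "unknown command id");
      break;
   }
}

/* Variant 2: static array of function pointers indexed by the ID. */
typedef void (*sketch_unmarshal_func)(int *, const struct sketch_cmd *);

static const sketch_unmarshal_func sketch_unmarshal_table[SKETCH_CMD_COUNT] = {
   [SKETCH_CMD_ADD] = sketch_unmarshal_add,
   [SKETCH_CMD_DOUBLE] = sketch_unmarshal_double,
};

static void
sketch_dispatch_table(int *state, const struct sketch_cmd *cmd)
{
   assert(cmd->cmd_id < SKETCH_CMD_COUNT);
   sketch_unmarshal_table[cmd->cmd_id](state, cmd);
}
```

In both variants the queue only carries the small ID, so the command header stays the same size.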


Best regards
Grigori



On Fri, Jun 30, 2017 at 7:14 PM, Grigori Goronzy <g...@chown.ath.cx> wrote:

On 2017-06-30 15:27, Nicolai Hähnle wrote:


On 30.06.2017 02:29, Grigori Goronzy wrote:


Use function pointers to identify the unmarshalling function, which
is simpler and gets rid of a lot of generated code.

This removes an indirection and possibly results in a slight speedup
as well.



The fact that it blows up cmd_base from 4 bytes to 16 bytes might
result in a slowdown. Marek's recent changes clearly indicated that
looking at memory behavior matters quite a bit for glthread. So I'm
inclined to say No on this unless you can demonstrate a consistent
speedup.



That's indeed a notable difference. I suspect it isn't so much the byte size
of the marshalled commands that affects throughput, but the number of
commands per batch and their associated costs when unmarshalling, so the
larger size of cmd_base might not matter much (perhaps with adjusted max
batch size). In any case, I'll try to get hold of some numbers.

Best regards
Grigori



Cheers,
Nicolai



---
  src/mapi/glapi/gen/Makefile.am |  4 --
  src/mapi/glapi/gen/gl_marshal.py   | 36 ++--
  src/mapi/glapi/gen/gl_marshal_h.py | 86
--
  src/mesa/Android.gen.mk|  7 
  src/mesa/Makefile.sources  |  1 -
  src/mesa/SConscript|  8 
  src/mesa/main/.gitignore   |  1 -
  src/mesa/main/glthread.c   |  9 +++-
  src/mesa/main/glthread.h   |  2 -
  src/mesa/main/marshal.c| 19 -
  src/mesa/main/marshal.h| 14 +++
  11 files changed, 26 insertions(+), 161 deletions(-)
  delete mode 100644 src/mapi/glapi/gen/gl_marshal_h.py

diff --git a/src/mapi/glapi/gen/Makefile.am
b/src/mapi/glapi/gen/Makefile.am
index bd04519..62007a4 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -76,7 +76,6 @@ EXTRA_DIST= \
gl_genexec.py \
gl_gentable.py \
gl_marshal.py \
-   gl_marshal_h.py \
gl_procs.py \
gl_SPARC_asm.py \
gl_table.py \
@@ -297,9 +296,6 @@ $(MESA_DIR)/main/api_exec.c: gl_genexec.py 
apiexec.py

$(COMMON)
  $(MESA_DIR)/main/marshal_generated.c: gl_marshal.py marshal_XML.py
$(COMMON)
$(PYTHON_GEN) $(srcdir)/gl_marshal.py -f
$(srcdir)/gl_and_es_API.xml > $@
  -$(MESA_DIR)/main/marshal_generated.h: gl_marshal_h.py 
marshal_XML.py

$(COMMON)
-   $(PYTHON_GEN) $(srcdir)/gl_marshal_h.py -f
$(srcdir)/gl_and_es_API.xml > $@
-
  $(MESA_DIR)/main/dispatch.h: gl_table.py $(COMMON)
$(PYTHON_GEN) $(srcdir)/gl_table.py -f
$(srcdir)/gl_and_es_API.xml -m remap_table > $@
  diff --git a/src/mapi/glapi/gen/gl_marshal.py
b/src/mapi/glapi/gen/gl_marshal.py
index efa4d9e..e71ede3 100644
--- a/src/mapi/glapi/gen/gl_marshal.py
+++ b/src/mapi/glapi/gen/gl_marshal.py
@@ -34,7 +34,6 @@ header = """
  #include "dispatch.h"
  #include "glthread.h"
  #include "marshal.h"
-#include "marshal_generated.h"
  """
@@ -106,7 +105,7 @@ class PrintCode(gl_XML.gl_print_base):
def print_async_dispatch(self, func):
  out('cmd = _mesa_glthread_allocate_command(ctx, '
-'DISPATCH_CMD_{0}, cmd_size);'.format(func.name))
+'(unmarshal_func)_mesa_unmarshal_{0},
cmd_size);'.format(func.name))
  for p in func.fixed_params:
  if p.count:
  out('memcpy(cmd->{0}, {0}, {1});'.format(
@@ -166,7 +165,7 @@ class PrintCode(gl_XML.gl_print_base):
  out('};')
def print_async_unmarshal(self, func):
-out('static inline void')
+out('static void')
  out(('_mesa_unmarshal_{0}(struct gl_context *ctx, '
   'const struct marshal_cmd_{0} 
*cmd)').format(func.name))

  out('{')
@@ -205,6 +204,7 @@ class PrintCode(gl_XML.gl_print_base):
  else:
  out('variable_data +=
{0};'.format(p.size_string(False)))
  +
out('debug_print_unmarshal("{0}");'.format(func.name))

  self.print_sync_call(func)
  out('}')
  @@ -276,35 +276,6 @@ class PrintCode(gl_XML.gl_print_base):
  out('')
  out('')
  -def print_unmarshal_dispatch_cmd(self, api):
-out('size_t')
-out('_mesa_unmarshal_dispatch_cmd(struct gl_context *ctx, '
-  

Re: [Mesa-dev] [PATCH] glthread: get rid of unmarshal dispatch enum/table

2017-06-30 Thread Grigori Goronzy

On 2017-06-30 15:27, Nicolai Hähnle wrote:

On 30.06.2017 02:29, Grigori Goronzy wrote:

Use function pointers to identify the unmarshalling function, which
is simpler and gets rid of a lot of generated code.

This removes an indirection and possibly results in a slight speedup
as well.


The fact that it blows up cmd_base from 4 bytes to 16 bytes might
result in a slowdown. Marek's recent changes clearly indicated that
looking at memory behavior matters quite a bit for glthread. So I'm
inclined to say No on this unless you can demonstrate a consistent
speedup.



That's indeed a notable difference. I suspect it isn't so much the byte 
size of the marshalled commands that affects throughput, but the number 
of commands per batch and their associated costs when unmarshalling, so 
the larger size of cmd_base might not matter much (perhaps with adjusted 
max batch size). In any case, I'll try to get hold of some numbers.
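
To put rough numbers on the size question, here is a minimal sketch. It does not use the real Mesa definitions: both header layouts are assumptions modelled on the description in this thread (a small ID plus size on one side, a function pointer plus size on the other), and `cmds_per_batch` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

struct gl_context;   /* opaque here; only needed for the pointer type */

/* Hypothetical ID-based header: 2 + 2 = 4 bytes. */
struct cmd_base_id {
   uint16_t cmd_id;
   uint16_t cmd_size;
};

/* Hypothetical pointer-based header. On a typical 64-bit ABI the pointer
 * takes 8 bytes and the struct is padded up to a multiple of 8. */
typedef void (*unmarshal_func)(struct gl_context *ctx, const void *cmd);

struct cmd_base_ptr {
   unmarshal_func func;
   uint16_t cmd_size;
};

/* Number of header-only commands that fit into a batch of batch_bytes. */
static size_t
cmds_per_batch(size_t batch_bytes, size_t header_bytes)
{
   return batch_bytes / header_bytes;
}
```

Comparing `cmds_per_batch(n, sizeof(struct cmd_base_id))` with `cmds_per_batch(n, sizeof(struct cmd_base_ptr))` for a given batch size `n` gives the worst case, since real commands also carry parameters and the header is only part of their size.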


Best regards
Grigori


Cheers,
Nicolai



---
  src/mapi/glapi/gen/Makefile.am |  4 --
  src/mapi/glapi/gen/gl_marshal.py   | 36 ++--
  src/mapi/glapi/gen/gl_marshal_h.py | 86 --
  src/mesa/Android.gen.mk|  7 
  src/mesa/Makefile.sources  |  1 -
  src/mesa/SConscript|  8 
  src/mesa/main/.gitignore   |  1 -
  src/mesa/main/glthread.c   |  9 +++-
  src/mesa/main/glthread.h   |  2 -
  src/mesa/main/marshal.c| 19 -
  src/mesa/main/marshal.h| 14 +++
  11 files changed, 26 insertions(+), 161 deletions(-)
  delete mode 100644 src/mapi/glapi/gen/gl_marshal_h.py

diff --git a/src/mapi/glapi/gen/Makefile.am 
b/src/mapi/glapi/gen/Makefile.am

index bd04519..62007a4 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -76,7 +76,6 @@ EXTRA_DIST= \
gl_genexec.py \
gl_gentable.py \
gl_marshal.py \
-   gl_marshal_h.py \
gl_procs.py \
gl_SPARC_asm.py \
gl_table.py \
@@ -297,9 +296,6 @@ $(MESA_DIR)/main/api_exec.c: gl_genexec.py 
apiexec.py $(COMMON)
  $(MESA_DIR)/main/marshal_generated.c: gl_marshal.py marshal_XML.py 
$(COMMON)
  	$(PYTHON_GEN) $(srcdir)/gl_marshal.py -f 
$(srcdir)/gl_and_es_API.xml > $@
  -$(MESA_DIR)/main/marshal_generated.h: gl_marshal_h.py 
marshal_XML.py $(COMMON)
-	$(PYTHON_GEN) $(srcdir)/gl_marshal_h.py -f 
$(srcdir)/gl_and_es_API.xml > $@

-
  $(MESA_DIR)/main/dispatch.h: gl_table.py $(COMMON)
  	$(PYTHON_GEN) $(srcdir)/gl_table.py -f $(srcdir)/gl_and_es_API.xml 
-m remap_table > $@
  diff --git a/src/mapi/glapi/gen/gl_marshal.py 
b/src/mapi/glapi/gen/gl_marshal.py

index efa4d9e..e71ede3 100644
--- a/src/mapi/glapi/gen/gl_marshal.py
+++ b/src/mapi/glapi/gen/gl_marshal.py
@@ -34,7 +34,6 @@ header = """
  #include "dispatch.h"
  #include "glthread.h"
  #include "marshal.h"
-#include "marshal_generated.h"
  """
@@ -106,7 +105,7 @@ class PrintCode(gl_XML.gl_print_base):
def print_async_dispatch(self, func):
  out('cmd = _mesa_glthread_allocate_command(ctx, '
-'DISPATCH_CMD_{0}, cmd_size);'.format(func.name))
+'(unmarshal_func)_mesa_unmarshal_{0}, 
cmd_size);'.format(func.name))

  for p in func.fixed_params:
  if p.count:
  out('memcpy(cmd->{0}, {0}, {1});'.format(
@@ -166,7 +165,7 @@ class PrintCode(gl_XML.gl_print_base):
  out('};')
def print_async_unmarshal(self, func):
-out('static inline void')
+out('static void')
  out(('_mesa_unmarshal_{0}(struct gl_context *ctx, '
   'const struct marshal_cmd_{0} 
*cmd)').format(func.name))

  out('{')
@@ -205,6 +204,7 @@ class PrintCode(gl_XML.gl_print_base):
  else:
  out('variable_data += 
{0};'.format(p.size_string(False)))

  +out('debug_print_unmarshal("{0}");'.format(func.name))
  self.print_sync_call(func)
  out('}')
  @@ -276,35 +276,6 @@ class PrintCode(gl_XML.gl_print_base):
  out('')
  out('')
  -def print_unmarshal_dispatch_cmd(self, api):
-out('size_t')
-out('_mesa_unmarshal_dispatch_cmd(struct gl_context *ctx, '
-'const void *cmd)')
-out('{')
-with indent():
-out('const struct marshal_cmd_base *cmd_base = cmd;')
-out('switch (cmd_base->cmd_id) {')
-for func in api.functionIterateAll():
-flavor = func.marshal_flavor()
-if flavor in ('skip', 'sync'):
-continue
-out('case DISPATCH_CMD_{0}:'.format(func.name))
-with indent():
-
out('debug_print_unmarshal("{0}");'.format(func.name))
-out(('_mesa_unmarshal_{0}(ctx, (const struct 
marshal_cmd_{0} *)'

- 

[Mesa-dev] [PATCH] glthread: get rid of unmarshal dispatch enum/table

2017-06-29 Thread Grigori Goronzy
Use function pointers to identify the unmarshalling function, which
is simpler and gets rid of a lot of generated code.

This removes an indirection and possibly results in a slight speedup
as well.
---
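For readers skimming the diff, the idea can be sketched as follows. This is an illustration only, not the code in the patch: `struct ctx`, `struct cmd_base`, `unmarshal_noop` and `run_batch` are hypothetical stand-ins. Each command header stores the function that unmarshals it, so executing a batch needs no switch over command IDs.

```c
#include <assert.h>
#include <stddef.h>

struct ctx { int calls; };   /* hypothetical stand-in for gl_context */

typedef void (*unmarshal_func)(struct ctx *ctx, const void *cmd);

struct cmd_base {
   unmarshal_func func;   /* identifies the command and executes it */
   size_t cmd_size;       /* total size of this command in bytes */
};

/* Trivial command used for illustration: just counts invocations. */
static void
unmarshal_noop(struct ctx *ctx, const void *cmd)
{
   (void) cmd;
   ctx->calls++;
}

/* Walk a batch and call each command through its stored pointer.
 * buf must be suitably aligned for struct cmd_base.
 * Returns the number of commands executed. */
static size_t
run_batch(struct ctx *ctx, const unsigned char *buf, size_t used)
{
   size_t pos = 0, count = 0;

   while (pos < used) {
      const struct cmd_base *cmd = (const struct cmd_base *) (buf + pos);

      assert(cmd->cmd_size >= sizeof(*cmd));
      cmd->func(ctx, cmd);
      pos += cmd->cmd_size;
      count++;
   }
   return count;
}
```

The trade-off is that the header grows by the size of a pointer, and the unmarshal functions can no longer be inlined into one dispatcher, which is presumably why the patch changes them from `static inline` to `static`.
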
 src/mapi/glapi/gen/Makefile.am |  4 --
 src/mapi/glapi/gen/gl_marshal.py   | 36 ++--
 src/mapi/glapi/gen/gl_marshal_h.py | 86 --
 src/mesa/Android.gen.mk|  7 
 src/mesa/Makefile.sources  |  1 -
 src/mesa/SConscript|  8 
 src/mesa/main/.gitignore   |  1 -
 src/mesa/main/glthread.c   |  9 +++-
 src/mesa/main/glthread.h   |  2 -
 src/mesa/main/marshal.c| 19 -
 src/mesa/main/marshal.h| 14 +++
 11 files changed, 26 insertions(+), 161 deletions(-)
 delete mode 100644 src/mapi/glapi/gen/gl_marshal_h.py

diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index bd04519..62007a4 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -76,7 +76,6 @@ EXTRA_DIST= \
gl_genexec.py \
gl_gentable.py \
gl_marshal.py \
-   gl_marshal_h.py \
gl_procs.py \
gl_SPARC_asm.py \
gl_table.py \
@@ -297,9 +296,6 @@ $(MESA_DIR)/main/api_exec.c: gl_genexec.py apiexec.py 
$(COMMON)
 $(MESA_DIR)/main/marshal_generated.c: gl_marshal.py marshal_XML.py $(COMMON)
$(PYTHON_GEN) $(srcdir)/gl_marshal.py -f $(srcdir)/gl_and_es_API.xml > 
$@
 
-$(MESA_DIR)/main/marshal_generated.h: gl_marshal_h.py marshal_XML.py $(COMMON)
-   $(PYTHON_GEN) $(srcdir)/gl_marshal_h.py -f $(srcdir)/gl_and_es_API.xml 
> $@
-
 $(MESA_DIR)/main/dispatch.h: gl_table.py $(COMMON)
$(PYTHON_GEN) $(srcdir)/gl_table.py -f $(srcdir)/gl_and_es_API.xml -m 
remap_table > $@
 
diff --git a/src/mapi/glapi/gen/gl_marshal.py b/src/mapi/glapi/gen/gl_marshal.py
index efa4d9e..e71ede3 100644
--- a/src/mapi/glapi/gen/gl_marshal.py
+++ b/src/mapi/glapi/gen/gl_marshal.py
@@ -34,7 +34,6 @@ header = """
 #include "dispatch.h"
 #include "glthread.h"
 #include "marshal.h"
-#include "marshal_generated.h"
 """
 
 
@@ -106,7 +105,7 @@ class PrintCode(gl_XML.gl_print_base):
 
 def print_async_dispatch(self, func):
 out('cmd = _mesa_glthread_allocate_command(ctx, '
-'DISPATCH_CMD_{0}, cmd_size);'.format(func.name))
+'(unmarshal_func)_mesa_unmarshal_{0}, 
cmd_size);'.format(func.name))
 for p in func.fixed_params:
 if p.count:
 out('memcpy(cmd->{0}, {0}, {1});'.format(
@@ -166,7 +165,7 @@ class PrintCode(gl_XML.gl_print_base):
 out('};')
 
 def print_async_unmarshal(self, func):
-out('static inline void')
+out('static void')
 out(('_mesa_unmarshal_{0}(struct gl_context *ctx, '
  'const struct marshal_cmd_{0} *cmd)').format(func.name))
 out('{')
@@ -205,6 +204,7 @@ class PrintCode(gl_XML.gl_print_base):
 else:
 out('variable_data += 
{0};'.format(p.size_string(False)))
 
+out('debug_print_unmarshal("{0}");'.format(func.name))
 self.print_sync_call(func)
 out('}')
 
@@ -276,35 +276,6 @@ class PrintCode(gl_XML.gl_print_base):
 out('')
 out('')
 
-def print_unmarshal_dispatch_cmd(self, api):
-out('size_t')
-out('_mesa_unmarshal_dispatch_cmd(struct gl_context *ctx, '
-'const void *cmd)')
-out('{')
-with indent():
-out('const struct marshal_cmd_base *cmd_base = cmd;')
-out('switch (cmd_base->cmd_id) {')
-for func in api.functionIterateAll():
-flavor = func.marshal_flavor()
-if flavor in ('skip', 'sync'):
-continue
-out('case DISPATCH_CMD_{0}:'.format(func.name))
-with indent():
-out('debug_print_unmarshal("{0}");'.format(func.name))
-out(('_mesa_unmarshal_{0}(ctx, (const struct 
marshal_cmd_{0} *)'
- ' cmd);').format(func.name))
-out('break;')
-out('default:')
-with indent():
-out('assert(!"Unrecognized command ID");')
-out('break;')
-out('}')
-out('')
-out('return cmd_base->cmd_size;')
-out('}')
-out('')
-out('')
-
 def print_create_marshal_table(self, api):
 out('struct _glapi_table *')
 out('_mesa_create_marshal_table(const struct gl_context *ctx)')
@@ -338,7 +309,6 @@ class PrintCode(gl_XML.gl_print_base):
 async_funcs.append(func)
 elif flavor == 'sync':
 self.print_sync_body(func)
-self.print_unmarshal_dispatch_cmd(api)
 self.print_create_marshal_table(api)
 
 
diff --git a/src/mapi/glapi/gen/gl_marshal_h.py 
b/src/mapi/glapi/gen/gl_marshal_h.py
deleted file mode 100644

Re: [Mesa-dev] [PATCH] mesa/marshal: add custom marshalling for glNamedBuffer(Sub)Data

2017-06-26 Thread Grigori Goronzy

On 2017-06-26 15:11, Marc Dietrich wrote:


Unfortunately, this change broke vmware/vmplayer here (bisected): Windows
guest on a Linux host, signal 11 in the SVGA driver. All good if
mesa_glthread=false.




Can you provide instructions on how to reproduce this problem? A backtrace 
might help, too.


I don't really get it, by the way. Isn't the SVGA driver for Linux 
guests?


Best regards
Grigori





> Best regards
> Grigori
>
>> [1]
>> https://lists.freedesktop.org/archives/mesa-dev/2017-June/160329.html
>>
>> On 25/06/17 02:59, Grigori Goronzy wrote:
>>> These entry points are used by Alien Isolation and caused
>>> synchronization with glthread. The async marshalling implementation
>>> is similar to glBuffer(Sub)Data.
>>>
>>> Results in an approximately 6x drop in glthread synchronizations and a
>>> ~30% FPS jump in Alien Isolation (Medium preset, Athlon 860K, RX 480).
>>>
>>> This does not care about the EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD special
>>> case like the Buffer(Sub)Data marshalling functions.
>>> ---
>>> I'm not a fan of the code duplication and I'll try to address that in
>>> further changes to glthread/marshalling, but the improvement is so
>>> noticeable that I'd like to share it. Alien Isolation is now
>>> playable on
>>> my system while it wasn't before.
>>>
>>>   src/mapi/glapi/gen/ARB_direct_state_access.xml |   4 +-
>>>   src/mesa/main/marshal.c| 108
>>>
>>> +
>>>
>>>   src/mesa/main/marshal.h|  18 +
>>>   3 files changed, 128 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml
>>> b/src/mapi/glapi/gen/ARB_direct_state_access.xml
>>> index cb24d79..d3d2246 100644
>>> --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml
>>> +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml
>>> @@ -61,14 +61,14 @@
>>>
>>> 
>>>
>>>  
>>>
>>>   -   
>>>
>>> +   
>>>
>>> 
>>> 
>>> 
>>> 
>>>
>>>  
>>>
>>>   -   
>>>
>>> +   >> marshal="custom">
>>>
>>> 
>>> 
>>> 
>>>
>>> diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
>>> index 4840f32..1fddf8e 100644
>>> --- a/src/mesa/main/marshal.c
>>> +++ b/src/mesa/main/marshal.c
>>> @@ -408,6 +408,114 @@ _mesa_marshal_BufferSubData(GLenum target,
>>> GLintptr offset, GLsizeiptr size,
>>>
>>>  }
>>>
>>>   }
>>>   +/* NamedBufferData: marshalled asynchronously */
>>>
>>> +struct marshal_cmd_NamedBufferData
>>> +{
>>> +   struct marshal_cmd_base cmd_base;
>>> +   GLuint name;
>>> +   GLsizei size;
>>> +   GLenum usage;
>>> +   /* Next size bytes are GLubyte data[size] */
>>> +};
>>> +
>>> +void
>>> +_mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
>>> +const struct
>>> marshal_cmd_NamedBufferData *cmd)
>>> +{
>>> +   const GLuint name = cmd->name;
>>> +   const GLsizei size = cmd->size;
>>> +   const GLenum usage = cmd->usage;
>>> +   const void *data = (const void *) (cmd + 1);
>>> +
>>> +   CALL_NamedBufferData(ctx->CurrentServerDispatch,
>>> +  (name, size, data, usage));
>>> +}
>>> +
>>> +void GLAPIENTRY
>>> +_mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr size,
>>> +  const GLvoid * data, GLenum usage)
>>> +{
>>> +   GET_CURRENT_CONTEXT(ctx);
>>> +   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) +
>>> size;
>>> +
>>> +   debug_print_marshal("NamedBufferData");
>>> +   if (unlikely(size < 0)) {
>>> +  _mesa_glthread_finish(ctx);
>>> +  _mesa_error(ctx, GL_INVALID_VALUE, "NamedBufferData(size < 0)");
>>> +  return;
>>> +   }
>>> +
>>> +   if (buffer > 0 && cmd_size <= MARSHAL_MAX_CMD_SIZE) {
>>> +  struct marshal_cmd_NamedBufferData *cmd =
>>> + _mesa_glthread_allocate_command(ctx,
>>> DISPATCH_CMD_NamedBufferData,
>>> +  

Re: [Mesa-dev] [PATCH] radeonsi: enable LLVM sisched for Unigine Superposition

2017-06-25 Thread Grigori Goronzy

On 2017-06-22 17:10, Marek Olšák wrote:

From: Marek Olšák 

+2.3% better score on Fiji. It might be better without HBM.


Is this really useful? Superposition is a benchmark. It would make more 
sense if this also targeted some actual games.
Optimizations specific to only benchmarks are considered "cheating" 
sometimes.


Best regards
Grigori


---
 src/gallium/drivers/radeonsi/si_pipe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
b/src/gallium/drivers/radeonsi/si_pipe.c
index ff787ad..4088849 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -935,20 +935,27 @@ struct pipe_screen
*radeonsi_screen_create(struct radeon_winsys *ws,

si_init_screen_state_functions(sscreen);

if (!r600_common_screen_init(&sscreen->b, ws, flags) ||
!si_init_gs_info(sscreen) ||
!si_init_shader_cache(sscreen)) {
FREE(sscreen);
return NULL;
}

+   /* Enable sisched where it helps. */
+   char process[128];
+   if (os_get_process_name(process, sizeof(process)) &&
+   /* Unigine Superposition */
+   !strcmp(process, "superposition"))
+   sscreen->b.debug_flags |= DBG_SI_SCHED;
+
 	/* Only enable as many threads as we have target machines, but at most
 * the number of CPUs - 1 if there is more than one.
 */
num_threads = sysconf(_SC_NPROCESSORS_ONLN);
num_threads = MAX2(1, num_threads - 1);
num_compiler_threads = MIN2(num_threads, ARRAY_SIZE(sscreen->tm));
num_compiler_threads_lowprio =
MIN2(num_threads, ARRAY_SIZE(sscreen->tm_low_priority));

if (!util_queue_init(&sscreen->shader_compiler_queue, "si_shader",

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa/marshal: add custom marshalling for glNamedBuffer(Sub)Data

2017-06-25 Thread Grigori Goronzy

On 2017-06-25 02:37, Timothy Arceri wrote:

Please try the series from Marek which reduces the batch size [1], the
reduced size helps reduce the impact of syncs. MARSHAL_MAX_CMD_SIZE is
also greatly reduced to help reduce thrashing the cache so it's
possible this patch won't be as effective anymore. However you might
not even need it.



Sorry, I forgot to mention, the 30% improvement measured is with this 
patch on top of Marek's series compared to just Marek's series. That 
series alone is improving glthread with Alien Isolation as well, but I 
didn't measure exactly how much. It wouldn't surprise me if it is in the 
40-50% region with both, though.


Best regards
Grigori

[1] 
https://lists.freedesktop.org/archives/mesa-dev/2017-June/160329.html


On 25/06/17 02:59, Grigori Goronzy wrote:

These entry points are used by Alien Isolation and caused
synchronization with glthread. The async marshalling implementation
is similar to glBuffer(Sub)Data.

Results in an approximately 6x drop in glthread synchronizations and a
~30% FPS jump in Alien Isolation (Medium preset, Athlon 860K, RX 480).

This does not care about the EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD special
case like the Buffer(Sub)Data marshalling functions.
---
I'm not a fan of the code duplication and I'll try to address that in
further changes to glthread/marshalling, but the improvement is so
noticeable that I'd like to share it. Alien Isolation is now playable on
my system while it wasn't before.

  src/mapi/glapi/gen/ARB_direct_state_access.xml |   4 +-
  src/mesa/main/marshal.c| 108 +
  src/mesa/main/marshal.h|  18 +
  3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml 
b/src/mapi/glapi/gen/ARB_direct_state_access.xml

index cb24d79..d3d2246 100644
--- a/src/mapi/glapi/gen/ARB_direct_state_access.xml
+++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml
@@ -61,14 +61,14 @@

 
  -   
+   




 
  -   
+   marshal="custom">




diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
index 4840f32..1fddf8e 100644
--- a/src/mesa/main/marshal.c
+++ b/src/mesa/main/marshal.c
@@ -408,6 +408,114 @@ _mesa_marshal_BufferSubData(GLenum target, 
GLintptr offset, GLsizeiptr size,

 }
  }
  +/* NamedBufferData: marshalled asynchronously */
+struct marshal_cmd_NamedBufferData
+{
+   struct marshal_cmd_base cmd_base;
+   GLuint name;
+   GLsizei size;
+   GLenum usage;
+   /* Next size bytes are GLubyte data[size] */
+};
+
+void
+_mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
+const struct 
marshal_cmd_NamedBufferData *cmd)

+{
+   const GLuint name = cmd->name;
+   const GLsizei size = cmd->size;
+   const GLenum usage = cmd->usage;
+   const void *data = (const void *) (cmd + 1);
+
+   CALL_NamedBufferData(ctx->CurrentServerDispatch,
+  (name, size, data, usage));
+}
+
+void GLAPIENTRY
+_mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr size,
+  const GLvoid * data, GLenum usage)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + 
size;

+
+   debug_print_marshal("NamedBufferData");
+   if (unlikely(size < 0)) {
+  _mesa_glthread_finish(ctx);
+  _mesa_error(ctx, GL_INVALID_VALUE, "NamedBufferData(size < 
0)");

+  return;
+   }
+
+   if (buffer > 0 && cmd_size <= MARSHAL_MAX_CMD_SIZE) {
+  struct marshal_cmd_NamedBufferData *cmd =
+ _mesa_glthread_allocate_command(ctx, 
DISPATCH_CMD_NamedBufferData,

+ cmd_size);
+  cmd->name = buffer;
+  cmd->size = size;
+  cmd->usage = usage;
+  char *variable_data = (char *) (cmd + 1);
+  memcpy(variable_data, data, size);
+  _mesa_post_marshal_hook(ctx);
+   } else {
+  _mesa_glthread_finish(ctx);
+  CALL_NamedBufferData(ctx->CurrentServerDispatch,
+ (buffer, size, data, usage));
+   }
+}
+
+/* NamedBufferSubData: marshalled asynchronously */
+struct marshal_cmd_NamedBufferSubData
+{
+   struct marshal_cmd_base cmd_base;
+   GLuint name;
+   GLintptr offset;
+   GLsizei size;
+   /* Next size bytes are GLubyte data[size] */
+};
+
+void
+_mesa_unmarshal_NamedBufferSubData(struct gl_context *ctx,
+   const struct 
marshal_cmd_NamedBufferSubData *cmd)

+{
+   const GLuint name = cmd->name;
+   const GLintptr offset = cmd->offset;
+   const GLsizei size = cmd->size;
+   const void *data = (const void *) (cmd + 1);
+
+   CALL_NamedBufferSubData(ctx->CurrentServerDispatch,
+  (name, offset, size, data));
+}
+
+void GLAPIENTRY
+_mesa_marshal_NamedBufferSubData(GLuint buffer, GLintptr offset,
+  

[Mesa-dev] [PATCH] mesa/marshal: add custom marshalling for glNamedBuffer(Sub)Data

2017-06-24 Thread Grigori Goronzy
These entry points are used by Alien Isolation and caused
synchronization with glthread. The async marshalling implementation
is similar to glBuffer(Sub)Data.

Results in an approximately 6x drop in glthread synchronizations and a
~30% FPS jump in Alien Isolation (Medium preset, Athlon 860K, RX 480).

This does not care about the EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD special
case like the Buffer(Sub)Data marshalling functions.
---
I'm not a fan of the code duplication and I'll try to address that in
further changes to glthread/marshalling, but the improvement is so
noticeable that I'd like to share it. Alien Isolation is now playable on
my system while it wasn't before.
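
The layout used by the marshalling code below, a fixed-size command struct followed directly by the payload bytes, can be sketched in isolation. This is a simplified illustration under stated assumptions: `struct cmd_buffer_data`, `marshal_buffer_data` and `unmarshal_payload` are hypothetical names, and the destination is a plain caller-supplied buffer rather than a glthread batch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed part of a command; the payload follows it directly. */
struct cmd_buffer_data {
   size_t cmd_size;   /* header plus payload, in bytes */
   unsigned name;
   size_t size;       /* payload size in bytes */
};

/* Copy the header fields and the payload into dst, which must be aligned
 * for struct cmd_buffer_data. Returns the number of bytes written, or 0 if
 * the command does not fit (the caller would then fall back to a
 * synchronous call). */
static size_t
marshal_buffer_data(void *dst, size_t dst_size, unsigned name,
                    const void *data, size_t size)
{
   struct cmd_buffer_data *cmd = dst;

   if (size > dst_size || sizeof(*cmd) > dst_size - size)
      return 0;

   cmd->cmd_size = sizeof(*cmd) + size;
   cmd->name = name;
   cmd->size = size;
   if (size)
      memcpy(cmd + 1, data, size);   /* payload starts right after header */
   return cmd->cmd_size;
}

/* Return a pointer to the payload of a marshalled command. */
static const void *
unmarshal_payload(const struct cmd_buffer_data *cmd)
{
   assert(cmd->cmd_size == sizeof(*cmd) + cmd->size);
   return cmd + 1;
}
```

The `cmd + 1` expression is the same trick the patch uses: pointer arithmetic on the struct type steps over exactly one header, so no explicit offset needs to be stored.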

 src/mapi/glapi/gen/ARB_direct_state_access.xml |   4 +-
 src/mesa/main/marshal.c| 108 +
 src/mesa/main/marshal.h|  18 +
 3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml 
b/src/mapi/glapi/gen/ARB_direct_state_access.xml
index cb24d79..d3d2246 100644
--- a/src/mapi/glapi/gen/ARB_direct_state_access.xml
+++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml
@@ -61,14 +61,14 @@
   

 
-   
+   
   
   
   
   

 
-   
+   
   
   
   
diff --git a/src/mesa/main/marshal.c b/src/mesa/main/marshal.c
index 4840f32..1fddf8e 100644
--- a/src/mesa/main/marshal.c
+++ b/src/mesa/main/marshal.c
@@ -408,6 +408,114 @@ _mesa_marshal_BufferSubData(GLenum target, GLintptr 
offset, GLsizeiptr size,
}
 }
 
+/* NamedBufferData: marshalled asynchronously */
+struct marshal_cmd_NamedBufferData
+{
+   struct marshal_cmd_base cmd_base;
+   GLuint name;
+   GLsizei size;
+   GLenum usage;
+   /* Next size bytes are GLubyte data[size] */
+};
+
+void
+_mesa_unmarshal_NamedBufferData(struct gl_context *ctx,
+const struct marshal_cmd_NamedBufferData *cmd)
+{
+   const GLuint name = cmd->name;
+   const GLsizei size = cmd->size;
+   const GLenum usage = cmd->usage;
+   const void *data = (const void *) (cmd + 1);
+
+   CALL_NamedBufferData(ctx->CurrentServerDispatch,
+  (name, size, data, usage));
+}
+
+void GLAPIENTRY
+_mesa_marshal_NamedBufferData(GLuint buffer, GLsizeiptr size,
+  const GLvoid * data, GLenum usage)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferData) + size;
+
+   debug_print_marshal("NamedBufferData");
+   if (unlikely(size < 0)) {
+  _mesa_glthread_finish(ctx);
+  _mesa_error(ctx, GL_INVALID_VALUE, "NamedBufferData(size < 0)");
+  return;
+   }
+
+   if (buffer > 0 && cmd_size <= MARSHAL_MAX_CMD_SIZE) {
+  struct marshal_cmd_NamedBufferData *cmd =
+ _mesa_glthread_allocate_command(ctx, DISPATCH_CMD_NamedBufferData,
+ cmd_size);
+  cmd->name = buffer;
+  cmd->size = size;
+  cmd->usage = usage;
+  char *variable_data = (char *) (cmd + 1);
+  memcpy(variable_data, data, size);
+  _mesa_post_marshal_hook(ctx);
+   } else {
+  _mesa_glthread_finish(ctx);
+  CALL_NamedBufferData(ctx->CurrentServerDispatch,
+ (buffer, size, data, usage));
+   }
+}
+
+/* NamedBufferSubData: marshalled asynchronously */
+struct marshal_cmd_NamedBufferSubData
+{
+   struct marshal_cmd_base cmd_base;
+   GLuint name;
+   GLintptr offset;
+   GLsizei size;
+   /* Next size bytes are GLubyte data[size] */
+};
+
+void
+_mesa_unmarshal_NamedBufferSubData(struct gl_context *ctx,
+   const struct marshal_cmd_NamedBufferSubData 
*cmd)
+{
+   const GLuint name = cmd->name;
+   const GLintptr offset = cmd->offset;
+   const GLsizei size = cmd->size;
+   const void *data = (const void *) (cmd + 1);
+
+   CALL_NamedBufferSubData(ctx->CurrentServerDispatch,
+  (name, offset, size, data));
+}
+
+void GLAPIENTRY
+_mesa_marshal_NamedBufferSubData(GLuint buffer, GLintptr offset,
+ GLsizeiptr size, const GLvoid * data)
+{
+   GET_CURRENT_CONTEXT(ctx);
+   size_t cmd_size = sizeof(struct marshal_cmd_NamedBufferSubData) + size;
+
+   debug_print_marshal("NamedBufferSubData");
+   if (unlikely(size < 0)) {
+  _mesa_glthread_finish(ctx);
+  _mesa_error(ctx, GL_INVALID_VALUE, "NamedBufferSubData(size < 0)");
+  return;
+   }
+
+   if (buffer > 0 && cmd_size <= MARSHAL_MAX_CMD_SIZE) {
+  struct marshal_cmd_NamedBufferSubData *cmd =
+ _mesa_glthread_allocate_command(ctx, DISPATCH_CMD_NamedBufferSubData,
+ cmd_size);
+  cmd->name = buffer;
+  cmd->offset = offset;
+  cmd->size = size;
+  char *variable_data = (char *) (cmd + 1);
+  memcpy(variable_data, data, size);
+  _mesa_post_marshal_hook(ctx);
+   } else {
+  _mesa_glthread_finish(ctx);
+  CALL_NamedBufferSubData(ctx->CurrentServerDispatch,
+  

Re: [Mesa-dev] [PATCH] radeonsi: don't emit partial flushes at the end of IBs (v2)

2017-06-23 Thread Grigori Goronzy

On 2017-06-23 13:48, Andy Furniss wrote:

Marek Olšák wrote:

From: Marek Olšák 

The kernel sort of does the same thing with fences.

v2: do emit partial flushes on SI


Bugzilla seems to be down currently so replying here.

On R9 285 with current agd5f 4.13-wip kernel I get some slight
artifacts on Unigine Valley since this.



I also see corruption, sometimes even basic X rendering gets corrupted.

I filed a bug:

https://bugs.freedesktop.org/show_bug.cgi?id=101565

Best regards
Grigori


Re: [Mesa-dev] [PATCH 2/4] util/disk_cache: compress individual cache entries

2017-03-02 Thread Grigori Goronzy

On 2017-03-02 10:08, Timothy Arceri wrote:

On 02/03/17 18:45, Tobias Droste wrote:

Hi Timothy,

if you plan to support multiple compression algorithms, shouldn't "struct
cache_entry_file_data" contain some info about what compression algorithm
was used to compress the data? Or is this already there and I missed it?


I don't plan to support more than one. I'm just saying it's a
possibility for the future, depending on further analysis and
requirements from different hardware. But right now I would just like
to land zlib support so we have a baseline to work from.



As outlined on IRC, can you reconsider the compression level for the 
time being, though?


The cache currently affects shader loading time quite noticeably on 
first hit, i.e. when the cache is cold. Cache I/O and compression are 
currently done synchronously, so this happens even without compression on 
a system with a fast SSD. Here's a summary of the numbers I gathered from 
the initial loading screens of DE:MD on an Athlon X4 860K:


Configuration                        Loading time
No cache                             215 sec
Cold cache, zlib BEST_COMPRESSION    285 sec
Warm cache, zlib BEST_COMPRESSION     33 sec
Cold cache, zlib BEST_SPEED          264 sec
Warm cache, zlib BEST_SPEED           33 sec
Cold cache, no compression           266 sec
Warm cache, no compression            34 sec

The total cache size for that game is 48 MiB with BEST_COMPRESSION, 56 
MiB with BEST_SPEED and 170 MiB with no compression.


What I conclude from these numbers is that

a) the cache works really well! A warmed cache cuts down loading time by 
a factor of about 6.5 (215 sec down to 33 sec),
b) compression does a good job of reducing the cache size (which 
probably helps with traditional spinning disk HDDs, and when disk space 
is limited),
c) decompression always seems Fast Enough On My Computer™.

However, I also notice that BEST_COMPRESSION doesn't really affect the 
total size of the cache files much compared to BEST_SPEED, while it does 
noticeably increase the cold cache shader loading time. So in the end, 
BEST_SPEED might be a better compromise, particularly for systems with a 
slow CPU.
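
The trade-off can be worked out directly from the figures above. The snippet below is only a small arithmetic sketch; the constants are the measurements quoted in this mail, while `percent_over` and `print_tradeoff` are hypothetical helper names.

```c
#include <stdio.h>

/* Figures quoted in this mail (seconds and MiB). */
static const double no_cache_sec = 215.0;
static const double cold_best_compression_sec = 285.0;
static const double cold_best_speed_sec = 264.0;
static const double size_best_compression_mib = 48.0;
static const double size_best_speed_mib = 56.0;

/* Percentage by which value exceeds base. */
static double
percent_over(double value, double base)
{
   return (value - base) / base * 100.0;
}

/* Print the cold-cache overhead of each level relative to running without
 * a cache, and how much larger the BEST_SPEED cache is. */
static void
print_tradeoff(void)
{
   printf("cold overhead, BEST_COMPRESSION: %.1f%%\n",
          percent_over(cold_best_compression_sec, no_cache_sec));
   printf("cold overhead, BEST_SPEED:       %.1f%%\n",
          percent_over(cold_best_speed_sec, no_cache_sec));
   printf("extra size with BEST_SPEED:      %.1f%%\n",
          percent_over(size_best_speed_mib, size_best_compression_mib));
}
```

With these inputs the cold-cache penalty drops from roughly a third to under a quarter when switching to BEST_SPEED, at the cost of a cache that is about one sixth larger.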


Apart from that, consider the series
Reviewed-by: Grigori Goronzy <g...@chown.ath.cx>

Best regards
Grigori



Am Donnerstag, 2. März 2017, 03:20:05 CET schrieb Matt Turner:
On Wed, Mar 1, 2017 at 2:19 PM, Timothy Arceri <tarc...@itsqueeze.com> wrote:
IMO we should go with zlib and people can provide future patches with
justifications/stats for using a different library over zlib just like we
do for any other performance based patch.


Yes, agreed. "Which compression should we use?" is one of the easiest
bikesheds to paint.


Re: [Mesa-dev] Mesa 12.1.0 release plan (Was Re: Next Mesa release, anyone?)

2016-10-19 Thread Grigori Goronzy

On 2016-10-04 12:32, Emil Velikov wrote:

On 2 October 2016 at 14:17, Axel Davy  wrote:
I'd prefer Oct 14 myself, because we have a lot of patches for nine, and
they deserve more cleaning and testing, but if it's Oct 7, we'll try to be
on time.


14th it is. As mentioned before: _don't_ wait for the last week to get
things merged. Once you're reasonably happy just send the new work for
review and commit it.
Same applies for bugfixes :-)



What happened to these plans? It is October 19th already. Nine fixes 
have trickled into Mesa and radv was merged as well. What's the holdup?


Grigori


[Mesa-dev] [PATCH 1/3] radv: add missing unreachable

2016-10-11 Thread Grigori Goronzy
---
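As background, a helper of this kind is commonly built from an assertion plus a compiler hint. The sketch below assumes that shape and is not copied from Mesa's headers; `enum desc_type`, `desc_alignment` and the value 32 are hypothetical, and `__builtin_unreachable()` is a GCC/Clang builtin.

```c
#include <assert.h>

/* Assumed shape of an unreachable() helper: fail loudly in debug builds and
 * tell the compiler (GCC/Clang) that control never reaches this point. */
#define unreachable(str)         \
   do {                          \
      assert(!str);              \
      __builtin_unreachable();   \
   } while (0)

enum desc_type { DESC_SAMPLER, DESC_BUFFER };   /* hypothetical */

/* Every valid enum value assigns alignment; the default case documents that
 * nothing else is expected, which also silences "may be used
 * uninitialized" warnings. */
static unsigned
desc_alignment(enum desc_type type)
{
   unsigned alignment;

   switch (type) {
   case DESC_SAMPLER:
      alignment = 16;
      break;
   case DESC_BUFFER:
      alignment = 32;
      break;
   default:
      unreachable("unknown descriptor type");
      break;
   }
   return alignment;
}
```
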
 src/amd/vulkan/radv_descriptor_set.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/radv_descriptor_set.c 
b/src/amd/vulkan/radv_descriptor_set.c
index d1d2b1f..ba8a002 100644
--- a/src/amd/vulkan/radv_descriptor_set.c
+++ b/src/amd/vulkan/radv_descriptor_set.c
@@ -113,6 +113,7 @@ VkResult radv_CreateDescriptorSetLayout(
alignment = 16;
break;
default:
+   unreachable("unknown descriptor type\n");
break;
}
 
-- 
2.7.4



[Mesa-dev] [PATCH 3/3] radv: fix strict aliasing violation

2016-10-11 Thread Grigori Goronzy
---
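For context, the problem and two ways around it can be sketched outside the driver. Reading a `uint32_t` through a cast `unsigned char` pointer violates the strict aliasing rule; going through a union member, as this patch does, or copying the bytes with `memcpy` avoids that. `struct entry` and both function names below are hypothetical, and the anonymous union requires C11 or a GNU extension.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced version of the cache entry: the hash is visible both
 * as bytes and as dwords. */
struct entry {
   union {
      unsigned char sha1[20];
      uint32_t sha1_dw[5];
   };
};

/* Read the first dword through the union member. Type punning through a
 * union is permitted in C99 and later. */
static uint32_t
start_from_union(const struct entry *e)
{
   return e->sha1_dw[0];
}

/* Alternative that needs no union: copy the bytes into a uint32_t. */
static uint32_t
start_from_memcpy(const unsigned char sha1[20])
{
   uint32_t value;

   memcpy(&value, sha1, sizeof(value));
   return value;
}
```

Both variants yield the same value on a given machine. The union keeps the call site as short as the original cast.
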
 src/amd/vulkan/radv_pipeline_cache.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline_cache.c 
b/src/amd/vulkan/radv_pipeline_cache.c
index 032a7e4..85a2b6d 100644
--- a/src/amd/vulkan/radv_pipeline_cache.c
+++ b/src/amd/vulkan/radv_pipeline_cache.c
@@ -28,7 +28,10 @@
 #include "ac_nir_to_llvm.h"
 
 struct cache_entry {
-   unsigned char sha1[20];
+   union {
+   unsigned char sha1[20];
+   uint32_t sha1_dw[5];
+   };
uint32_t code_size;
struct ac_shader_variant_info variant_info;
struct ac_shader_config config;
@@ -185,7 +188,7 @@ radv_pipeline_cache_set_entry(struct radv_pipeline_cache 
*cache,
  struct cache_entry *entry)
 {
const uint32_t mask = cache->table_size - 1;
-   const uint32_t start = (*(uint32_t *) entry->sha1);
+   const uint32_t start = entry->sha1_dw[0];
 
/* We'll always be able to insert when we get here. */
assert(cache->kernel_count < cache->table_size / 2);
-- 
2.7.4



[Mesa-dev] [PATCH 2/3] radv: fix uninitialized variables

2016-10-11 Thread Grigori Goronzy
This gets rid of "may be used uninitialized" compiler warnings.
---
 src/amd/vulkan/radv_formats.c  | 2 +-
 src/amd/vulkan/radv_pipeline.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
index 90c140c..76d5fa1 100644
--- a/src/amd/vulkan/radv_formats.c
+++ b/src/amd/vulkan/radv_formats.c
@@ -804,7 +804,7 @@ bool radv_format_pack_clear_color(VkFormat format,
  uint32_t clear_vals[2],
  VkClearColorValue *value)
 {
-   uint8_t r, g, b, a;
+   uint8_t r = 0, g = 0, b = 0, a = 0;
const struct vk_format_description *desc = 
vk_format_description(format);
 
if (vk_format_get_component_bits(format, VK_FORMAT_COLORSPACE_RGB, 0) 
<= 8) {
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 89300e5..eb64b69 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -367,7 +367,7 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
struct radv_shader_variant *variant;
nir_shader *nir;
void *code = NULL;
-   unsigned code_size;
+   unsigned code_size = 0;
 
if (module->nir)
_mesa_sha1_compute(module->nir->info.name,
-- 
2.7.4



Re: [Mesa-dev] [PATCH 1/2] vl: add a bicubic interpolation filter(v4)

2016-06-28 Thread Grigori Goronzy

On 2016-06-28 11:25, Nayan Deshmukh wrote:

This is a shader based bicubic interpolater which uses cubic
Hermite spline algorithm.

v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: initialize offsets while initializing shaders
use a constant buffer to send dst_size to frag shader
small changes to reduce calculation in shader

Signed-off-by: Nayan Deshmukh 
---
 src/gallium/auxiliary/Makefile.sources   |   2 +
 src/gallium/auxiliary/vl/vl_bicubic_filter.c | 465 
+++

 src/gallium/auxiliary/vl/vl_bicubic_filter.h |  63 
 3 files changed, 530 insertions(+)
 create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.c
 create mode 100644 src/gallium/auxiliary/vl/vl_bicubic_filter.h

diff --git a/src/gallium/auxiliary/Makefile.sources
b/src/gallium/auxiliary/Makefile.sources
index ab58358..e0311bf 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -317,6 +317,8 @@ NIR_SOURCES := \
nir/tgsi_to_nir.h

 VL_SOURCES := \
+   vl/vl_bicubic_filter.c \
+   vl/vl_bicubic_filter.h \
vl/vl_compositor.c \
vl/vl_compositor.h \
vl/vl_csc.c \
diff --git a/src/gallium/auxiliary/vl/vl_bicubic_filter.c
b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
new file mode 100644
index 000..396e76d
--- /dev/null
+++ b/src/gallium/auxiliary/vl/vl_bicubic_filter.c
@@ -0,0 +1,465 @@
+/**
+ *
+ * Copyright 2016 Nayan Deshmukh.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person 
obtaining a

+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject 
to

+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including 
the
+ * next paragraph) shall be included in all copies or substantial 
portions

+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS

+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
NON-INFRINGEMENT.

+ * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF 
CONTRACT,

+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ 
**/

+
+#include 
+
+#include "pipe/p_context.h"
+
+#include "tgsi/tgsi_ureg.h"
+
+#include "util/u_draw.h"
+#include "util/u_memory.h"
+#include "util/u_math.h"
+#include "util/u_rect.h"
+
+#include "vl_types.h"
+#include "vl_vertex_buffers.h"
+#include "vl_bicubic_filter.h"
+
+enum VS_OUTPUT
+{
+   VS_O_VPOS = 0,
+   VS_O_VTEX = 0
+};
+
+static void *
+create_vert_shader(struct vl_bicubic_filter *filter)
+{
+   struct ureg_program *shader;
+   struct ureg_src i_vpos;
+   struct ureg_dst o_vpos, o_vtex;
+
+   shader = ureg_create(PIPE_SHADER_VERTEX);
+   if (!shader)
+  return NULL;
+
+   i_vpos = ureg_DECL_vs_input(shader, 0);
+   o_vpos = ureg_DECL_output(shader, TGSI_SEMANTIC_POSITION, 
VS_O_VPOS);
+   o_vtex = ureg_DECL_output(shader, TGSI_SEMANTIC_GENERIC, 
VS_O_VTEX);

+
+   ureg_MOV(shader, o_vpos, i_vpos);
+   ureg_MOV(shader, o_vtex, i_vpos);
+
+   ureg_END(shader);
+
+   return ureg_create_shader_and_destroy(shader, filter->pipe);
+}
+
+static void
+create_frag_shader_cubic_interpolater(struct ureg_program *shader,
struct ureg_src tex_a,
+  struct ureg_src tex_b, struct
ureg_src tex_c,
+  struct ureg_src tex_d, struct 
ureg_src t,

+  struct ureg_dst o_fragment)
+{
+   struct ureg_dst temp[11];
+   struct ureg_dst t_2;
+   unsigned i;
+
+   for(i = 0; i < 11; ++i)
+   temp[i] = ureg_DECL_temporary(shader);
+   t_2 = ureg_DECL_temporary(shader);
+
+   /*
+* |temp[0]|   |  0  2  0  0 |  |tex_a|
+* |temp[1]| = | -1  0  1  0 |* |tex_b|
+* |temp[2]|   |  2 -5  4 -1 |  |tex_c|
+* |temp[3]|   | -1  3 -3  1 |  |tex_d|
+*/
+   ureg_MUL(shader, temp[0], tex_b, ureg_imm1f(shader, 2.0f));
+
+   ureg_MUL(shader, temp[1], tex_a, ureg_imm1f(shader, -1.0f));
+   ureg_MAD(shader, temp[1], tex_c, ureg_imm1f(shader, 1.0f),
+ureg_src(temp[1]));
+
+   ureg_MUL(shader, temp[2], tex_a, ureg_imm1f(shader, 2.0f));
+   ureg_MAD(shader, temp[2], tex_b, ureg_imm1f(shader, -5.0f),
+ureg_src(temp[2]));
+   ureg_MAD(shader, temp[2], tex_c, ureg_imm1f(shader, 4.0f),
+ 
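
For reference, the coefficient matrix in the comment above looks like the Catmull-Rom form of a cubic Hermite spline. Below is a minimal CPU-side sketch in plain C, assuming the usual 0.5 normalization factor (the quoted patch is cut off before the point where the shader would apply it). `cubic_interp` is a hypothetical helper and not part of the patch.

```c
#include <assert.h>

/* Hypothetical helper: evaluates the cubic through four samples a, b, c, d
 * at t in [0, 1]; the result interpolates between b (t = 0) and c (t = 1). */
static float cubic_interp(float a, float b, float c, float d, float t)
{
   assert(t >= 0.0f && t <= 1.0f);

   /* One coefficient per row of the matrix in the patch comment. */
   float k0 = 2.0f * b;
   float k1 = -a + c;
   float k2 = 2.0f * a - 5.0f * b + 4.0f * c - d;
   float k3 = -a + 3.0f * b - 3.0f * c + d;

   /* Horner evaluation; the 0.5 factor is an assumption (Catmull-Rom). */
   return 0.5f * (k0 + t * (k1 + t * (k2 + t * k3)));
}
```

With that normalization the curve passes through tex_b at t = 0 and tex_c at t = 1, and it reproduces linear data exactly.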

Re: [Mesa-dev] [PATCH] radeon/uvd: fix the H264 level for Tonga

2016-05-30 Thread Grigori Goronzy

On 2016-05-27 15:16, Emil Velikov wrote:

The odd thing is that VLC checks (or used to check) that information
before feeding the video to the decoder, while other implementations
(like the original one in mplayer done by the Nvidia devs) do not (or
did not) bother.



Many files either have an incorrect level set or they subtly exceed the 
restrictions of a common level, e.g. 4.1. So strict checks will make 
many files non-playable with VDPAU despite actually working fine without 
checks. That's why some players don't check the level.


That said, I wonder if the level 4.1 returned on UVD 3/4 for H.264 high 
profile is actually correct. These UVD variants can easily play 1080p60, 
right? Even UVD 2 can do that, I think. Level 4.1 only allows up to 
1080p30. So wouldn't level 4.2 be the right choice? Or are there some 
restrictions on the HW decoder that make it formally not capable of 
level 4.2?
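
To put numbers on this, here is a rough sketch of the level check in C. It assumes the MaxMBPS and MaxFS limits as I recall them from Table A-1 of the H.264 spec, and 1080p coded as 1920x1088 (120x68 macroblocks). `level_fits` and the struct are made-up names, not anything from the driver.

```c
#include <assert.h>
#include <stdbool.h>

struct h264_level_limits {
   unsigned max_mbps; /* macroblocks per second (MaxMBPS) */
   unsigned max_fs;   /* macroblocks per frame (MaxFS) */
};

/* Limits assumed from Table A-1; please double-check against the spec. */
static const struct h264_level_limits level_4_1 = { 245760, 8192 };
static const struct h264_level_limits level_4_2 = { 522240, 8704 };

static bool level_fits(const struct h264_level_limits *lvl,
                       unsigned width, unsigned height, unsigned fps)
{
   assert(width > 0 && height > 0 && fps > 0);

   /* Frame size in 16x16 macroblocks, rounded up. */
   unsigned mbs = ((width + 15) / 16) * ((height + 15) / 16);

   return mbs <= lvl->max_fs && mbs * fps <= lvl->max_mbps;
}
```

A 1080p frame is 8160 macroblocks, so 30 fps needs 244800 MB/s (fits 4.1) and 60 fps needs 489600 MB/s (only fits 4.2).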


Grigori


Re: [Mesa-dev] [PATCH 1/2] winsys/amdgpu: adjust IB size based on buffer wait time

2016-04-20 Thread Grigori Goronzy

On 2016-04-20 02:20, Nicolai Hähnle wrote:

This is just a slight massaging of the patch you sent previously. What
happened to the discussion we had about how to do this properly?



This already provides good value as-is and it is (IMHO) reasonably 
clean, so why not include it for the time being? Marek seemed to like 
the general concept as well. I agree that basing IB size on GPU idleness 
is a great idea and I'll look into that, either as an alternative or as 
an addition to this.


Just a random question: we can count on up to date gfx fence sequence 
numbers being available without any calls into the kernel, right? The 
winsys code makes that conditional and calls into the kernel when no 
fence pointer is available.


Grigori



On 19.04.2016 18:13, Grigori Goronzy wrote:

Small IBs help to reduce stalls for workloads that require a lot of
synchronization. On the other hand, if there is no notable
synchronization, we can use a large IB size to slightly improve
performance in some cases.

This introduces tuning of the IB size based on feedback on the average
buffer wait time. The average wait time is tracked with exponential
smoothing.
---
  src/gallium/winsys/amdgpu/drm/amdgpu_bo.c |  6 +-
  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 17 -
  src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  2 ++
  src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h |  9 +
  4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c

index 036301e..4b8554d 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -195,8 +195,10 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
 return NULL;
  }
   }
+ amdgpu_winsys_update_buffer_wait_avg(bo->ws, 0);
} else {
   uint64_t time = os_time_get_nano();
+ uint64_t duration;

   if (!(usage & PIPE_TRANSFER_WRITE)) {
  /* Mapping for read.
@@ -221,7 +223,9 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
 RADEON_USAGE_READWRITE);
   }

- bo->ws->buffer_wait_time += os_time_get_nano() - time;
+ duration = os_time_get_nano() - time;
+ bo->ws->buffer_wait_time += duration;
+ amdgpu_winsys_update_buffer_wait_avg(bo->ws, duration);
}
 }

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c

index 69902c4..b9a7c5b 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -201,10 +201,7 @@ amdgpu_ctx_query_reset_status(struct 
radeon_winsys_ctx *rwctx)
  static bool amdgpu_get_new_ib(struct radeon_winsys *ws, struct 
amdgpu_ib *ib,
struct amdgpu_cs_ib_info *info, 
unsigned ib_type)

  {
-   /* Small IBs are better than big IBs, because the GPU goes idle 
quicker

-* and there is less waiting for buffers and fences. Proof:
-*   
http://www.phoronix.com/scan.php?page=article&item=mesa-111-si&num=1

-*/
+   struct amdgpu_winsys *aws = (struct amdgpu_winsys *)ws;
 unsigned buffer_size, ib_size;

 switch (ib_type) {
@@ -216,8 +213,18 @@ static bool amdgpu_get_new_ib(struct 
radeon_winsys *ws, struct amdgpu_ib *ib,

ib_size = 128 * 1024 * 4;
break;
 case IB_MAIN:
+  /* Small IBs are often better than big IBs, because the GPU 
goes idle
+   * quicker and there is less waiting for buffers and fences. 
Proof:
+   *   
http://www.phoronix.com/scan.php?page=article&item=mesa-111-si&num=1
+   * Tune IB size depending on average buffer waiting time, which 
is an

+   * indicator for the amount of synchronization going on. Some
+   * applications don't cause notable synchronization, so we can 
use

+   * large IB size for slightly improved throughput.
+   */
buffer_size = 128 * 1024 * 4;
-  ib_size = 20 * 1024 * 4;
+  ib_size = 32 * 1024 * 4;
+  if (aws->buffer_wait_time_avg > IB_SIZE_WAIT_THRESHOLD_NS)
+ ib_size = 10 * 1024 * 4;
 }

 ib->base.cdw = 0;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h

index 4ed830b..98e58a2 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
@@ -35,6 +35,8 @@
  #include "amdgpu_bo.h"
  #include "util/u_memory.h"

+#define IB_SIZE_WAIT_THRESHOLD_NS   1
+
  struct amdgpu_ctx {
 struct amdgpu_winsys *ws;
 amdgpu_context_handle ctx;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h

index 91b9be4..3bd63b6 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
@@ -54,6 +54,7 @@ struct amdgpu_winsys {
 uint64_t allocated_vram;
 uin

[Mesa-dev] [PATCH 2/2] winsys/amdgpu: clean up and fix switch statement

2016-04-19 Thread Grigori Goronzy
Add missing break, add default case. Additionally initialize variables
to avoid compiler warnings.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index b9a7c5b..d978a0d 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -202,12 +202,13 @@ static bool amdgpu_get_new_ib(struct radeon_winsys *ws, 
struct amdgpu_ib *ib,
   struct amdgpu_cs_ib_info *info, unsigned ib_type)
 {
struct amdgpu_winsys *aws = (struct amdgpu_winsys *)ws;
-   unsigned buffer_size, ib_size;
+   unsigned buffer_size = 0, ib_size = 0;
 
switch (ib_type) {
case IB_CONST_PREAMBLE:
   buffer_size = 4 * 1024 * 4;
   ib_size = 1024 * 4;
+  break;
case IB_CONST:
   buffer_size = 512 * 1024 * 4;
   ib_size = 128 * 1024 * 4;
@@ -225,6 +226,9 @@ static bool amdgpu_get_new_ib(struct radeon_winsys *ws, 
struct amdgpu_ib *ib,
   ib_size = 32 * 1024 * 4;
   if (aws->buffer_wait_time_avg > IB_SIZE_WAIT_THRESHOLD_NS)
  ib_size = 10 * 1024 * 4;
+  break;
+   default:
+  assert(!"unreachable");
}
 
ib->base.cdw = 0;
-- 
1.9.1
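
To illustrate what the missing break did, here is a standalone sketch. The enum definition and helper names are hypothetical; only the case labels and sizes are taken from the patch.

```c
#include <assert.h>

/* Hypothetical enum; the real values live in the winsys headers. */
enum ib_type { IB_CONST_PREAMBLE, IB_CONST, IB_MAIN };

/* Mirrors the pre-patch control flow. */
static unsigned ib_size_buggy(enum ib_type type)
{
   unsigned ib_size = 0;

   switch (type) {
   case IB_CONST_PREAMBLE:
      ib_size = 1024 * 4;
      /* No break here in the pre-patch code. */
      /* fall through */
   case IB_CONST:
      ib_size = 128 * 1024 * 4;
      break;
   case IB_MAIN:
      ib_size = 32 * 1024 * 4;
      break;
   }
   return ib_size;
}

/* Same switch with the break and a default case added. */
static unsigned ib_size_fixed(enum ib_type type)
{
   unsigned ib_size = 0;

   switch (type) {
   case IB_CONST_PREAMBLE:
      ib_size = 1024 * 4;
      break;
   case IB_CONST:
      ib_size = 128 * 1024 * 4;
      break;
   case IB_MAIN:
      ib_size = 32 * 1024 * 4;
      break;
   default:
      assert(!"unreachable");
   }
   return ib_size;
}
```

Without the break, the preamble IB silently ends up with the IB_CONST size.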



[Mesa-dev] [PATCH 1/2] winsys/amdgpu: adjust IB size based on buffer wait time

2016-04-19 Thread Grigori Goronzy
Small IBs help to reduce stalls for workloads that require a lot of
synchronization. On the other hand, if there is no notable
synchronization, we can use a large IB size to slightly improve
performance in some cases.

This introduces tuning of the IB size based on feedback on the average
buffer wait time. The average wait time is tracked with exponential
smoothing.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c |  6 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 17 -
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  2 ++
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h |  9 +
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 036301e..4b8554d 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -195,8 +195,10 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
return NULL;
 }
  }
+ amdgpu_winsys_update_buffer_wait_avg(bo->ws, 0);
   } else {
  uint64_t time = os_time_get_nano();
+ uint64_t duration;
 
  if (!(usage & PIPE_TRANSFER_WRITE)) {
 /* Mapping for read.
@@ -221,7 +223,9 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
RADEON_USAGE_READWRITE);
  }
 
- bo->ws->buffer_wait_time += os_time_get_nano() - time;
+ duration = os_time_get_nano() - time;
+ bo->ws->buffer_wait_time += duration;
+ amdgpu_winsys_update_buffer_wait_avg(bo->ws, duration);
   }
}
 
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 69902c4..b9a7c5b 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -201,10 +201,7 @@ amdgpu_ctx_query_reset_status(struct radeon_winsys_ctx 
*rwctx)
 static bool amdgpu_get_new_ib(struct radeon_winsys *ws, struct amdgpu_ib *ib,
   struct amdgpu_cs_ib_info *info, unsigned ib_type)
 {
-   /* Small IBs are better than big IBs, because the GPU goes idle quicker
-* and there is less waiting for buffers and fences. Proof:
-*   http://www.phoronix.com/scan.php?page=article&item=mesa-111-si&num=1
-*/
+   struct amdgpu_winsys *aws = (struct amdgpu_winsys *)ws;
unsigned buffer_size, ib_size;
 
switch (ib_type) {
@@ -216,8 +213,18 @@ static bool amdgpu_get_new_ib(struct radeon_winsys *ws, 
struct amdgpu_ib *ib,
   ib_size = 128 * 1024 * 4;
   break;
case IB_MAIN:
+  /* Small IBs are often better than big IBs, because the GPU goes idle
+   * quicker and there is less waiting for buffers and fences. Proof:
+   *   http://www.phoronix.com/scan.php?page=article&item=mesa-111-si&num=1
+   * Tune IB size depending on average buffer waiting time, which is an
+   * indicator for the amount of synchronization going on. Some
+   * applications don't cause notable synchronization, so we can use
+   * large IB size for slightly improved throughput.
+   */
   buffer_size = 128 * 1024 * 4;
-  ib_size = 20 * 1024 * 4;
+  ib_size = 32 * 1024 * 4;
+  if (aws->buffer_wait_time_avg > IB_SIZE_WAIT_THRESHOLD_NS)
+ ib_size = 10 * 1024 * 4;
}
 
ib->base.cdw = 0;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
index 4ed830b..98e58a2 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
@@ -35,6 +35,8 @@
 #include "amdgpu_bo.h"
 #include "util/u_memory.h"
 
+#define IB_SIZE_WAIT_THRESHOLD_NS   1
+
 struct amdgpu_ctx {
struct amdgpu_winsys *ws;
amdgpu_context_handle ctx;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
index 91b9be4..3bd63b6 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
@@ -54,6 +54,7 @@ struct amdgpu_winsys {
uint64_t allocated_vram;
uint64_t allocated_gtt;
uint64_t buffer_wait_time; /* time spent in buffer_wait in ns */
+   uint64_t buffer_wait_time_avg;
uint64_t num_cs_flushes;
unsigned gart_page_size;
 
@@ -76,6 +77,14 @@ amdgpu_winsys(struct radeon_winsys *base)
return (struct amdgpu_winsys*)base;
 }
 
+static inline
+void amdgpu_winsys_update_buffer_wait_avg(struct amdgpu_winsys *ws,
+  uint64_t wait)
+{
+   /* Exponential smoothing with alpha = 0.25 */
+   ws->buffer_wait_time_avg = (3 * ws->buffer_wait_time_avg + wait) / 4;
+}
+
 void amdgpu_surface_init_functions(struct amdgpu_winsys *ws);
 ADDR_HANDLE amdgpu_addr_create(struct amdgpu_winsys *ws);
 
-- 
1.9.1
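
The smoothing and threshold logic from this patch can be tried out in isolation. Below is a standalone sketch, assuming a threshold of 10000 ns (the 1E4 used in the first posting of the patch). `struct wait_tracker`, `update_wait_avg` and `pick_ib_size` are made-up names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WAIT_THRESHOLD_NS 10000 /* assumption, see above */

struct wait_tracker {
   uint64_t avg_ns; /* smoothed buffer wait time */
};

/* Exponential smoothing with alpha = 0.25, in integer arithmetic. */
static void update_wait_avg(struct wait_tracker *t, uint64_t wait_ns)
{
   t->avg_ns = (3 * t->avg_ns + wait_ns) / 4;
}

/* Returns the IB size in bytes: small when waits are frequent. */
static unsigned pick_ib_size(const struct wait_tracker *t)
{
   if (t->avg_ns > WAIT_THRESHOLD_NS)
      return 10 * 1024 * 4;
   return 32 * 1024 * 4;
}
```

With these numbers a single 100 us wait pushes the average to 25 us, and it takes four wait-free maps for it to decay back under the threshold.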



Re: [Mesa-dev] [RFC] dynamic IB size tuning for radeonsi

2016-04-17 Thread Grigori Goronzy

Interesting, and thanks for poking at this issue. I've been thinking
about tuning IB sizes as well. I'd like for us to get this right, so I
wonder: What's your theory for _why_ your change helps?



See below. I think you discovered it yourself.


I'll be honest with you: Right now, I think your approach contains too
much unexplained "magic". What's the theory that explains using buffer
wait averages in this way?



I agree that there is too much magic; e.g. the cutoff buffer-wait-time 
for small IBs is quite magical and can't be explained well.



My theory for why your change helps is about CPU/GPU parallelism. When
we wait for buffer idle, this most likely means the GPU becomes
idle.[1] If you use a large IB to start the GPU up again, you'll wait
a longer time before the GPU starts doing work again. Basically, in
ASCII art:

GPU idle
GPU =+..+=
  |  |  |
CPU ==+..+==+=
   buffer
wait

By reducing the size of the IB, the picture changes like this:

 GPU idle
GPU =+..+=
  |  |  |
CPU ==+..+==+=
   buffer
   wait

It takes a shorter amount of CPU time before the GPU gets new work,
the GPU is utilized more fully and the program runs faster.



Yes, that is the basic idea. :)
When it is likely that we need to synchronize work with the GPU later 
on, it pays off to queue work sooner to keep the GPU busy most of the 
time, and that is enforced by smaller IBs.



If this explanation is correct and all there is to it, then it suggests
the logic for when IBs should be shorter. Basically, we should use
short IBs when the GPU is idle.[2]



Right, but the problem is to cheaply and reliably determine idleness.


There are a bunch of different options. A simple one that comes
closest to what your patch does - without actually querying for GPU
idle - is to just make the first IB after each buffer wait a small
one. The length of the buffer wait doesn't seem important because what
we need to address is the fact that the GPU is idle. That's a boolean
matter.



Let me give that a try, sounds like a good idea. Particularly, we could 
use *really* small IBs without affecting general performance in this 
case, at least in theory.
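
Just to make sure we are talking about the same thing, here is a sketch of that idea. Everything in it is hypothetical (names and sizes alike); it only shows a boolean flag that is set on a buffer wait and consumed by the next IB allocation.

```c
#include <assert.h>
#include <stdbool.h>

struct ib_tuner {
   bool waited; /* set when a buffer wait happened since the last IB */
};

static void note_buffer_wait(struct ib_tuner *t)
{
   t->waited = true;
}

/* Returns the size in bytes for the next IB and clears the flag. */
static unsigned next_ib_size(struct ib_tuner *t)
{
   if (t->waited) {
      t->waited = false;
      return 4 * 1024 * 4; /* made-up "really small" IB */
   }
   return 32 * 1024 * 4;   /* made-up regular IB */
}
```

Only the first IB after a wait is small, so workloads without synchronization never see the small size.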


For the moment, the slightly "magic" way with buffer-wait-time still 
leads to consistent improvements (I did not see any regressions, 
either). So I'll try to describe the magic somewhat in an upcoming patch 
and hope that's alright for inclusion.


Grigori

PS: "about to become idle" is probably hard to measure, so the small IB 
approach may have some merit even if we can easily check idleness.



Re: [Mesa-dev] [PATCH 1/4] gallium/radeon: add clear_texture function

2016-04-16 Thread Grigori Goronzy

On 2016-04-15 20:30, Jakob Sinclair wrote:

In other places in radeonsi that require reinterpretation (e.g.
si_blit.c), the surface template is modified instead of changing the
surface after creation. I'm not sure if r600/radeonsi like it if the
format is changed late like here. Seems to be cleaner and clearer to
change the template anyway.



Thanks for the information. Just want to make sure I got the surface
template solution right:

First you create a surface object which you will then copy information
over to. Then you can edit things like the format without worrying
about something going horribly wrong.
Then you can use that object to create a new surface which the old
surface can now point to.


No, you just declare a regular struct pipe_surface and set it up by 
setting its fields. This is called the template. The template is then 
used for the surface creation call to describe the desired properties of 
the surface object. It can be a bit confusing because the same type is 
used for surfaces and surface templates, I guess.
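
Roughly like the sketch below. It uses tiny stand-in types so that it compiles on its own; the real struct pipe_surface has more fields and creation goes through the context's create_surface hook, so treat every name here as a placeholder.

```c
#include <assert.h>
#include <string.h>

/* Stand-in types, not the real Gallium definitions. */
enum fmt { FMT_RGBA8_UNORM, FMT_R32_UINT };

struct resource {
   enum fmt format;
};

struct surface {
   struct resource *texture;
   enum fmt format;
   unsigned level;
};

/* Stand-in for the creation hook: builds a surface from the template. */
static struct surface create_surface(struct resource *res,
                                     const struct surface *tmpl)
{
   struct surface surf = *tmpl;
   surf.texture = res;
   return surf;
}

static struct surface create_reinterpreted(struct resource *res,
                                           enum fmt view_format)
{
   struct surface tmpl;

   /* The template is just a zeroed struct with the wanted fields set. */
   memset(&tmpl, 0, sizeof(tmpl));
   tmpl.format = view_format; /* chosen before creation, not patched later */
   tmpl.level = 0;

   return create_surface(res, &tmpl);
}
```

The resource keeps its own format; only the surface created from the template sees the reinterpreted one.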


Refer to si_blit.c:si_resource_copy_region to see how it's done.

Grigori


[Mesa-dev] [RFC] dynamic IB size tuning for radeonsi

2016-04-15 Thread Grigori Goronzy
Hi,

apps that cause a lot of synchronization benefit from small IB
sizes. The current IB size is a bit on the large side for this class
of apps. On the other hand, if there isn't much synchronization going
on, increasing the IB size can slightly improve performance, too.

Here's a quick hack that tunes the IB size based on feedback from
buffer_wait_time. What do you think? I see good results with Unigine
Heaven (no synchronization, benefits from larger IB size), Metro Last
Light (lots of synchronization, benefits from small IBs) as well as
OpenArena and Xonotic (same).

Note: this patch applies on top of Bas' constant engine patchset.

Grigori



[Mesa-dev] [PATCH] amdgpu/winsys: adjust IB size based on buffer wait time

2016-04-15 Thread Grigori Goronzy
Small IBs help to reduce stalls for workloads that require a lot of
synchronization. On the other hand, if there is no notable
synchronization, we can use a large IB size to slightly improve
performance in some cases.

This introduces tuning of the IB size based on feedback on the average
buffer wait time. The average wait time is tracked with exponential
smoothing.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 2 ++
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 8 ++--
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h | 1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index 036301e..1e441e5 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -195,6 +195,7 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
return NULL;
 }
  }
+ bo->ws->buffer_wait_time_avg = (3 * bo->ws->buffer_wait_time_avg) / 4;
   } else {
  uint64_t time = os_time_get_nano();
 
@@ -222,6 +223,7 @@ static void *amdgpu_bo_map(struct pb_buffer *buf,
  }
 
  bo->ws->buffer_wait_time += os_time_get_nano() - time;
+ bo->ws->buffer_wait_time_avg = (3 * bo->ws->buffer_wait_time_avg + 
os_time_get_nano() - time) / 4;
   }
}
 
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 3ea0f3d..a9af0ce 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -201,12 +201,16 @@ amdgpu_ctx_query_reset_status(struct radeon_winsys_ctx 
*rwctx)
 static bool amdgpu_get_new_ib(struct radeon_winsys *ws, struct amdgpu_ib *ib,
   struct amdgpu_cs_ib_info *info, unsigned ib_type)
 {
+   unsigned buffer_size = 128 * 1024 * 4;
+   unsigned ib_size = 32 * 1024 * 4;
+
/* Small IBs are better than big IBs, because the GPU goes idle quicker
 * and there is less waiting for buffers and fences. Proof:
 *   http://www.phoronix.com/scan.php?page=article&item=mesa-111-si&num=1
 */
-   unsigned buffer_size = 128 * 1024 * 4;
-   unsigned ib_size = 20 * 1024 * 4;
+   uint64_t avg = ((struct amdgpu_winsys *)ws)->buffer_wait_time_avg;
+   if (avg > 1E4)
+   ib_size = 10 * 1024 * 4;
 
if (ib_type == IB_CONST) {
   buffer_size = 512 * 1024 * 4;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
index 91b9be4..56be13e 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.h
@@ -54,6 +54,7 @@ struct amdgpu_winsys {
uint64_t allocated_vram;
uint64_t allocated_gtt;
uint64_t buffer_wait_time; /* time spent in buffer_wait in ns */
+   uint64_t buffer_wait_time_avg;
uint64_t num_cs_flushes;
unsigned gart_page_size;
 
-- 
1.9.1



Re: [Mesa-dev] [PATCH 1/4] gallium/radeon: add clear_texture function

2016-04-15 Thread Grigori Goronzy

On 2016-04-15 18:38, Ilia Mirkin wrote:

+   } else {
+   union pipe_color_union color;
+   switch (util_format_get_blocksizebits(res->format)) {
+   case 128:
+   sf->format = PIPE_FORMAT_R32G32B32A32_UINT;


Just as an FYI... this is safe on nouveau because I control all the
internals and know that this is safe to do. Please verify that it's
similarly safe to change the surface format after creation on radeon -
it might not be. (Esp for compressed textures...)



In other places in radeonsi that require reinterpretation (e.g. 
si_blit.c), the surface template is modified instead of changing the 
surface after creation. I'm not sure if r600/radeonsi like it if the 
format is changed late like here. Seems to be cleaner and clearer to 
change the template anyway.


Grigori


Re: [Mesa-dev] [PATCH] radeonsi: fix mask checking when emitting scissors and viewports

2016-04-11 Thread Grigori Goronzy

On 2016-04-08 11:00, Marek Olšák wrote:

From: Marek Olšák <marek.ol...@amd.com>

---
 src/gallium/drivers/radeonsi/si_state.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c
b/src/gallium/drivers/radeonsi/si_state.c
index 8087d23..3894e1d 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -912,8 +912,10 @@ static void si_emit_scissors(struct si_context
*sctx, struct r600_atom *atom)
bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable;

/* The simple case: Only 1 viewport is active. */
-   if (mask & 1 &&
-   !si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!si_get_vs_info(sctx)->writes_viewport_index) {
+   if (!(mask & 1))
+       return;
+


Reviewed-by: Grigori Goronzy <g...@chown.ath.cx>

I also noticed this when I tried to implement the guard band feature. 
E.g. OpenArena will often needlessly emit scissors for all 16 viewports 
without this.


Grigori


Re: [Mesa-dev] [PATCH 0/5] R600, GCN: Guard Band support

2016-04-11 Thread Grigori Goronzy

On 2016-04-11 00:34, Marek Olšák wrote:


This patch series adds Guard Band support into r600g and radeonsi.

It first implements the Guard Band in radeonsi, then it moves all
radeonsi scissor & viewport code into gallium/radeon, and then r600g
is switched to it and its original scissor & viewport code is deleted.



Thanks for implementing this properly.

Reviewed-by: Grigori Goronzy <g...@chown.ath.cx>

Grigori


[Mesa-dev] [PATCH v2] radeonsi: use guard band clipping

2016-04-06 Thread Grigori Goronzy
With the previous changes to handling of viewport clipping, it is
almost trivial to add proper support for guard band clipping.  Select a
suitable integer clipping value to keep inside the rasterizer's guard
band range of [-32768, 32767] and program the hardware to use guard
band clipping.

Guard band clipping speeds up rasterization for primitives that are
partially off-screen.  This change in particular results in small
framerate improvements in a wide range of games.

v2: the rasterizer doesn't clamp coordinates, so use coordinates
before viewport clamping to determine guardband clipping.
---

So, how about this? I don't like the introduction of the new struct,
but this seems to be cleaner than v1 anyway.
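
To illustrate the v2 note: the bounding box is accumulated from the unclamped viewport extents, and only the scissor sent to the hardware is clamped. Below is a standalone sketch with made-up type and function names (`struct box`, `box_union`, `box_clamp`), using the 16384 limit from the patch.

```c
#include <assert.h>

struct box {
   int minx, miny, maxx, maxy;
};

static int clampi(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Grows 'out' so that it also covers 'in' (unclamped coordinates). */
static void box_union(struct box *out, const struct box *in)
{
   if (in->minx < out->minx) out->minx = in->minx;
   if (in->miny < out->miny) out->miny = in->miny;
   if (in->maxx > out->maxx) out->maxx = in->maxx;
   if (in->maxy > out->maxy) out->maxy = in->maxy;
}

/* Clamps to the range the scissor registers accept. */
static struct box box_clamp(const struct box *in)
{
   struct box out;

   assert(in->minx <= in->maxx && in->miny <= in->maxy);
   out.minx = clampi(in->minx, 0, 16384);
   out.miny = clampi(in->miny, 0, 16384);
   out.maxx = clampi(in->maxx, 0, 16384);
   out.maxy = clampi(in->maxy, 0, 16384);
   return out;
}
```

The union keeps values like -500 or 20000 so the guard band can be sized from the real extents, while the emitted scissor stays within 0..16384.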

 src/gallium/drivers/radeonsi/si_state.c | 83 +++--
 src/gallium/drivers/radeonsi/si_state.h |  7 +++
 2 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 0bb41e5..a9a58e8 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -838,39 +838,53 @@ static void si_set_scissor_states(struct pipe_context 
*ctx,
si_mark_atom_dirty(sctx, &sctx->scissors.atom);
 }
 
+static void si_mix_scissor(struct si_viewport_scissor *out,
+  struct si_viewport_scissor *mix)
+{
+   out->minx = MIN2(out->minx, mix->minx);
+   out->miny = MIN2(out->miny, mix->miny);
+   out->maxx = MAX2(out->maxx, mix->maxx);
+   out->maxy = MAX2(out->maxy, mix->maxy);
+}
+
 static void si_get_scissor_from_viewport(struct pipe_viewport_state *vp,
-struct pipe_scissor_state *scissor)
+struct si_viewport_scissor *scissor)
 {
int tmp;
 
/* Convert (-1, -1) and (1, 1) from clip space into window space. */
-   int minx = (int)(-vp->scale[0] + vp->translate[0]);
-   int miny = (int)(-vp->scale[1] + vp->translate[1]);
-   int maxx = (int)(vp->scale[0] + vp->translate[0]);
-   int maxy = (int)(vp->scale[1] + vp->translate[1]);
+   scissor->minx = (int)(-vp->scale[0] + vp->translate[0]);
+   scissor->miny = (int)(-vp->scale[1] + vp->translate[1]);
+   scissor->maxx = (int)(vp->scale[0] + vp->translate[0]);
+   scissor->maxy = (int)(vp->scale[1] + vp->translate[1]);
 
/* r600_draw_rectangle sets this. Disable the scissor. */
-   if (minx == -1 && miny == -1 && maxx == 1 && maxy == 1) {
-   minx = miny = 0;
-   maxx = maxy = 16384;
+   if (scissor->minx == -1 && scissor->miny == -1 &&
+   scissor->maxx == 1 && scissor->maxy == 1) {
+   scissor->minx = scissor->miny = 0;
+   scissor->maxx = scissor->maxy = 16384;
}
 
/* Handle inverted viewports. */
-   if (minx > maxx) {
-   tmp =  minx;
-   minx =  maxx;
-   maxx = tmp;
+   if (scissor->minx > scissor->maxx) {
+   tmp = scissor->minx;
+   scissor->minx = scissor->maxx;
+   scissor->maxx = tmp;
}
-   if (miny > maxy) {
-   tmp =  miny;
-   miny =  maxy;
-   maxy = tmp;
+   if (scissor->miny > scissor->maxy) {
+   tmp = scissor->miny;
+   scissor->miny = scissor->maxy;
+   scissor->maxy = tmp;
}
+}
 
-   scissor->minx = CLAMP(minx, 0, 16384);
-   scissor->miny = CLAMP(miny, 0, 16384);
-   scissor->maxx = CLAMP(maxx, 0, 16384);
-   scissor->maxy = CLAMP(maxy, 0, 16384);
+static void si_clamp_scissor(struct pipe_scissor_state *out,
+struct si_viewport_scissor *scissor)
+{
+   out->minx = CLAMP(scissor->minx, 0, 16384);
+   out->miny = CLAMP(scissor->miny, 0, 16384);
+   out->maxx = CLAMP(scissor->maxx, 0, 16384);
+   out->maxy = CLAMP(scissor->maxy, 0, 16384);
 }
 
 static void si_clip_scissor(struct pipe_scissor_state *out,
@@ -884,14 +898,18 @@ static void si_clip_scissor(struct pipe_scissor_state 
*out,
 
 static void si_emit_one_scissor(struct radeon_winsys_cs *cs,
struct pipe_viewport_state *vp,
-   struct pipe_scissor_state *scissor)
+   struct pipe_scissor_state *scissor,
+   struct si_viewport_scissor *mix)
 {
+   struct si_viewport_scissor vp_scissor;
struct pipe_scissor_state final;
 
/* Since the guard band disables clipping, we have to clip per-pixel
 * using a scissor.
 */
-   si_get_scissor_from_viewport(vp, &final);
+   si_get_scissor_from_viewport(vp, &vp_scissor);
+   si_mix_scissor(mix, &vp_scissor);
+   si_clamp_scissor(&final, &vp_scissor);
 
if (scissor)
si_clip_scissor(&final, scissor);
@@ -903,19 +921,35 @@ static void si_emit_one_scissor(struct radeon_winsys_cs 

[Mesa-dev] [PATCH 2/2] radeonsi: use guard band clipping

2016-04-06 Thread Grigori Goronzy
With the previous changes to handling of viewport clipping, it is
almost trivial to add proper support for guard band clipping.  Select a
suitable integer clipping value to keep inside the rasterizer's guard
band range of [-32768, 32767] and program the hardware to use guard
band clipping.

Guard band clipping speeds up rasterization for primitives that are
partially off-screen.  This change in particular results in small
framerate improvements in a wide range of games.
---
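The guard band factor described above can be sketched in isolation. This is a minimal sketch and not the driver code: `guardband_scale` and `GUARD_BAND_LIMIT` are hypothetical names, and it assumes the factor is the ratio of the rasterizer's positive range to the scissor extent, as the `32768 / width` expression in the diff below suggests. The sketch divides in floating point; the expression as posted divides two integers, so the ratio is truncated before it is converted to float.

```c
/* Hypothetical constant: upper end of the guard band range quoted in
 * the commit message, taken from the diff. */
#define GUARD_BAND_LIMIT 32768.0f

/* Ratio of the guard band limit to the scissor extent (width or height).
 * An empty extent falls back to 1.0, i.e. no guard band. */
static float guardband_scale(int extent)
{
    if (extent == 0)
        return 1.0f;
    return GUARD_BAND_LIMIT / (float)extent;
}
```

With a 3000 pixel wide scissor this gives roughly 10.9, where integer division would give 10.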
 src/gallium/drivers/radeonsi/si_state.c | 34 ++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 0bb41e5..013fcf1 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -882,9 +882,19 @@ static void si_clip_scissor(struct pipe_scissor_state *out,
out->maxy = MIN2(out->maxy, clip->maxy);
 }
 
+static void si_mix_scissor(struct pipe_scissor_state *out,
+  struct pipe_scissor_state *clip)
+{
+   out->minx = MIN2(out->minx, clip->minx);
+   out->miny = MIN2(out->miny, clip->miny);
+   out->maxx = MAX2(out->maxx, clip->maxx);
+   out->maxy = MAX2(out->maxy, clip->maxy);
+}
+
 static void si_emit_one_scissor(struct radeon_winsys_cs *cs,
struct pipe_viewport_state *vp,
-   struct pipe_scissor_state *scissor)
+   struct pipe_scissor_state *scissor,
+   struct pipe_scissor_state *mix)
 {
struct pipe_scissor_state final;
 
@@ -895,6 +905,7 @@ static void si_emit_one_scissor(struct radeon_winsys_cs *cs,
 
    if (scissor)
        si_clip_scissor(&final, scissor);
+   si_mix_scissor(mix, &final);
 
radeon_emit(cs, S_028250_TL_X(final.minx) |
S_028250_TL_Y(final.miny) |
@@ -903,19 +914,35 @@ static void si_emit_one_scissor(struct radeon_winsys_cs 
*cs,
S_028254_BR_Y(final.maxy));
 }
 
+static void si_emit_guardband(struct si_context *sctx,
+ struct pipe_scissor_state *scissor)
+{
+   struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
+
+   int width = scissor->maxx - scissor->minx;
+   int height = scissor->maxy - scissor->miny;
+   float guardband_x = width ? (32768 / width) : 1.0;
+   float guardband_y = height ? (32768 / height) : 1.0;
+
+   radeon_set_context_reg(cs, R_028BF0_PA_CL_GB_HORZ_CLIP_ADJ, fui(guardband_x));
+   radeon_set_context_reg(cs, R_028BE8_PA_CL_GB_VERT_CLIP_ADJ, fui(guardband_y));
+}
+
 static void si_emit_scissors(struct si_context *sctx, struct r600_atom *atom)
 {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
struct pipe_scissor_state *states = sctx->scissors.states;
unsigned mask = sctx->scissors.dirty_mask;
bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable;
+   struct pipe_scissor_state max_clip = {0};
 
/* The simple case: Only 1 viewport is active. */
if (mask & 1 &&
!si_get_vs_info(sctx)->writes_viewport_index) {
    radeon_set_context_reg_seq(cs, R_028250_PA_SC_VPORT_SCISSOR_0_TL, 2);
    si_emit_one_scissor(cs, &sctx->viewports.states[0],
-   scissor_enable ? &states[0] : NULL);
+   scissor_enable ? &states[0] : NULL, &max_clip);
+   si_emit_guardband(sctx, &max_clip);
sctx->scissors.dirty_mask &= ~1; /* clear one bit */
return;
}
@@ -929,9 +956,10 @@ static void si_emit_scissors(struct si_context *sctx, 
struct r600_atom *atom)
   start * 4 * 2, count * 2);
for (i = start; i < start+count; i++) {
    si_emit_one_scissor(cs, &sctx->viewports.states[i],
-   scissor_enable ? &states[i] : NULL);
+   scissor_enable ? &states[i] : NULL, &max_clip);
}
}
+   si_emit_guardband(sctx, &max_clip);
sctx->scissors.dirty_mask = 0;
 }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] radeonsi: do per-pixel clipping based on viewport states

2016-04-06 Thread Grigori Goronzy
From: Marek Olšák 

In other words, vport scissors are derived from viewport states.
If the scissor test is enabled, the intersection of both is used.

The guard band will disable clipping, so we have to clip per-pixel.

v2: fix check for r600_draw_rectangle and other overflow conditions.
(Grigori)
---
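The derivation of a vport scissor from a viewport state can be sketched outside the driver as follows. This is a simplified sketch under stated assumptions: `struct rect`, `scissor_from_viewport` and `intersect` are hypothetical stand-ins for the Gallium structures and the functions in the diff below, the viewport is passed as plain scale/translate arrays, and the special case for r600_draw_rectangle is left out.

```c
#define MAX_SCISSOR 16384

struct rect { int minx, miny, maxx, maxy; };

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Map the clip-space corners (-1,-1) and (1,1) into window space,
 * reorder inverted viewports, and clamp to the scissor range. */
static struct rect scissor_from_viewport(const float scale[2],
                                         const float translate[2])
{
    struct rect r;
    int tmp;

    r.minx = (int)(-scale[0] + translate[0]);
    r.miny = (int)(-scale[1] + translate[1]);
    r.maxx = (int)(scale[0] + translate[0]);
    r.maxy = (int)(scale[1] + translate[1]);

    if (r.minx > r.maxx) { tmp = r.minx; r.minx = r.maxx; r.maxx = tmp; }
    if (r.miny > r.maxy) { tmp = r.miny; r.miny = r.maxy; r.maxy = tmp; }

    r.minx = clamp_int(r.minx, 0, MAX_SCISSOR);
    r.miny = clamp_int(r.miny, 0, MAX_SCISSOR);
    r.maxx = clamp_int(r.maxx, 0, MAX_SCISSOR);
    r.maxy = clamp_int(r.maxy, 0, MAX_SCISSOR);
    return r;
}

/* Intersection of the viewport-derived scissor with the API scissor,
 * used only when the scissor test is enabled. */
static struct rect intersect(struct rect a, struct rect b)
{
    struct rect r;
    r.minx = a.minx > b.minx ? a.minx : b.minx;
    r.miny = a.miny > b.miny ? a.miny : b.miny;
    r.maxx = a.maxx < b.maxx ? a.maxx : b.maxx;
    r.maxy = a.maxy < b.maxy ? a.maxy : b.maxy;
    return r;
}
```

An 800x600 viewport with a flipped Y axis (scale 400, -300 and translate 400, 300) yields the rectangle (0, 0) to (800, 600) after the inverted Y bounds are swapped.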
 src/gallium/drivers/radeonsi/si_state.c | 94 +
 src/gallium/drivers/radeonsi/si_state.h |  1 +
 2 files changed, 84 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 10d691a..0bb41e5 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -830,25 +830,92 @@ static void si_set_scissor_states(struct pipe_context 
*ctx,
for (i = 0; i < num_scissors; i++)
sctx->scissors.states[start_slot + i] = state[i];
 
+   if (!sctx->queued.named.rasterizer ||
+   !sctx->queued.named.rasterizer->scissor_enable)
+   return;
+
sctx->scissors.dirty_mask |= ((1 << num_scissors) - 1) << start_slot;
    si_mark_atom_dirty(sctx, &sctx->scissors.atom);
 }
 
+static void si_get_scissor_from_viewport(struct pipe_viewport_state *vp,
+struct pipe_scissor_state *scissor)
+{
+   int tmp;
+
+   /* Convert (-1, -1) and (1, 1) from clip space into window space. */
+   int minx = (int)(-vp->scale[0] + vp->translate[0]);
+   int miny = (int)(-vp->scale[1] + vp->translate[1]);
+   int maxx = (int)(vp->scale[0] + vp->translate[0]);
+   int maxy = (int)(vp->scale[1] + vp->translate[1]);
+
+   /* r600_draw_rectangle sets this. Disable the scissor. */
+   if (minx == -1 && miny == -1 && maxx == 1 && maxy == 1) {
+   minx = miny = 0;
+   maxx = maxy = 16384;
+   }
+
+   /* Handle inverted viewports. */
+   if (minx > maxx) {
+   tmp =  minx;
+   minx =  maxx;
+   maxx = tmp;
+   }
+   if (miny > maxy) {
+   tmp =  miny;
+   miny =  maxy;
+   maxy = tmp;
+   }
+
+   scissor->minx = CLAMP(minx, 0, 16384);
+   scissor->miny = CLAMP(miny, 0, 16384);
+   scissor->maxx = CLAMP(maxx, 0, 16384);
+   scissor->maxy = CLAMP(maxy, 0, 16384);
+}
+
+static void si_clip_scissor(struct pipe_scissor_state *out,
+   struct pipe_scissor_state *clip)
+{
+   out->minx = MAX2(out->minx, clip->minx);
+   out->miny = MAX2(out->miny, clip->miny);
+   out->maxx = MIN2(out->maxx, clip->maxx);
+   out->maxy = MIN2(out->maxy, clip->maxy);
+}
+
+static void si_emit_one_scissor(struct radeon_winsys_cs *cs,
+   struct pipe_viewport_state *vp,
+   struct pipe_scissor_state *scissor)
+{
+   struct pipe_scissor_state final;
+
+   /* Since the guard band disables clipping, we have to clip per-pixel
+* using a scissor.
+*/
+   si_get_scissor_from_viewport(vp, &final);
+
+   if (scissor)
+   si_clip_scissor(&final, scissor);
+
+   radeon_emit(cs, S_028250_TL_X(final.minx) |
+   S_028250_TL_Y(final.miny) |
+   S_028250_WINDOW_OFFSET_DISABLE(1));
+   radeon_emit(cs, S_028254_BR_X(final.maxx) |
+   S_028254_BR_Y(final.maxy));
+}
+
 static void si_emit_scissors(struct si_context *sctx, struct r600_atom *atom)
 {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
struct pipe_scissor_state *states = sctx->scissors.states;
unsigned mask = sctx->scissors.dirty_mask;
+   bool scissor_enable = sctx->queued.named.rasterizer->scissor_enable;
 
/* The simple case: Only 1 viewport is active. */
if (mask & 1 &&
!si_get_vs_info(sctx)->writes_viewport_index) {
radeon_set_context_reg_seq(cs, 
R_028250_PA_SC_VPORT_SCISSOR_0_TL, 2);
-   radeon_emit(cs, S_028250_TL_X(states[0].minx) |
-   S_028250_TL_Y(states[0].miny) |
-   S_028250_WINDOW_OFFSET_DISABLE(1));
-   radeon_emit(cs, S_028254_BR_X(states[0].maxx) |
-   S_028254_BR_Y(states[0].maxy));
+   si_emit_one_scissor(cs, &sctx->viewports.states[0],
+   scissor_enable ? &states[0] : NULL);
sctx->scissors.dirty_mask &= ~1; /* clear one bit */
return;
}
@@ -861,11 +928,8 @@ static void si_emit_scissors(struct si_context *sctx, 
struct r600_atom *atom)
radeon_set_context_reg_seq(cs, 
R_028250_PA_SC_VPORT_SCISSOR_0_TL +
   start * 4 * 2, count * 2);
for (i = start; i < start+count; i++) {
-   radeon_emit(cs, S_028250_TL_X(states[i].minx) |
-   

Re: [Mesa-dev] [PATCH 2/2] radeonsi: use re-Z

2016-02-24 Thread Grigori Goronzy

On 2016-02-23 17:45, Marek Olšák wrote:

From: Marek Olšák 

This can increase perf for shaders that kill pixels (kill, alpha-test,
alpha-to-coverage).
---
 src/gallium/drivers/radeonsi/si_shader.h|  1 +
 src/gallium/drivers/radeonsi/si_state.c |  6 +++---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 16 +---
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.h
b/src/gallium/drivers/radeonsi/si_shader.h
index ff5c24d..637d264 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -365,6 +365,7 @@ struct si_shader {
    struct r600_resource *scratch_bo;
    union si_shader_key key;
    bool is_binary_shared;
+   unsigned z_order;

/* The following data is all that's needed for binary shaders. */
struct radeon_shader_binary binary;
diff --git a/src/gallium/drivers/radeonsi/si_state.c
b/src/gallium/drivers/radeonsi/si_state.c
index 2dfdbeb..b23b17a 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1339,10 +1339,10 @@ static void si_emit_db_render_state(struct
si_context *sctx, struct r600_atom *s
sctx->ps_db_shader_control;

/* Bug workaround for smoothing (overrasterization) on SI. */
-   if (sctx->b.chip_class == SI && sctx->smoothing_enabled)
+   if (sctx->b.chip_class == SI && sctx->smoothing_enabled) {
+   db_shader_control &= C_02880C_Z_ORDER;
db_shader_control |= S_02880C_Z_ORDER(V_02880C_LATE_Z);
-   else
-   db_shader_control |= 
S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z);
+   }

 	/* Disable the gl_SampleMask fragment shader output if MSAA is 
disabled. */
 	if (sctx->framebuffer.nr_samples <= 1 || (rs && 
!rs->multisample_enable))

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index a6753a7..c220185 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -789,6 +789,13 @@ static void si_shader_ps(struct si_shader *shader)
   S_00B02C_EXTRA_LDS_SIZE(shader->config.lds_size) |
   S_00B02C_USER_SGPR(num_user_sgprs) |
 		   S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 
0));

+
+   /* Prefer RE_Z if the shader is complex enough. */
+   if (info->num_memory_instructions >= 2 ||
+   shader->binary.code_size > 100*4)
+   shader->z_order = V_02880C_EARLY_Z_THEN_RE_Z;
+   else
+   shader->z_order = V_02880C_EARLY_Z_THEN_LATE_Z;
 }



Are these thresholds for switching to re-Z based on measurements, 
feedback by the HW team or are they just a shot in the dark?
Either way, the magic numbers don't look particularly nice. Maybe 
preprocessor constants should be introduced for them?



 static void si_shader_init_pm4_state(struct si_shader *shader)
@@ -1985,15 +1992,18 @@ bool si_update_shaders(struct si_context *sctx)
si_update_vgt_shader_config(sctx);

if (sctx->ps_shader.cso) {
-   unsigned db_shader_control =
-   sctx->ps_shader.cso->db_shader_control |
-			S_02880C_KILL_ENABLE(si_get_alpha_test_func(sctx) != 
PIPE_FUNC_ALWAYS);

+   unsigned db_shader_control;

    r = si_shader_select(ctx, &sctx->ps_shader);
if (r)
return false;
si_pm4_bind_state(sctx, ps, sctx->ps_shader.current->pm4);

+   db_shader_control =
+   sctx->ps_shader.cso->db_shader_control |
+			S_02880C_KILL_ENABLE(si_get_alpha_test_func(sctx) != 
PIPE_FUNC_ALWAYS) |

+   S_02880C_Z_ORDER(sctx->ps_shader.current->z_order);
+
 		if (si_pm4_state_changed(sctx, ps) || si_pm4_state_changed(sctx, vs) 
||

sctx->sprite_coord_enable != rs->sprite_coord_enable ||
sctx->flatshade != rs->flatshade) {



Re: [Mesa-dev] [PATCH 2/2] radeonsi: use re-Z

2016-02-24 Thread Grigori Goronzy

On 2016-02-24 12:47, Marek Olšák wrote:
On Wed, Feb 24, 2016 at 12:22 PM, Grigori Goronzy <g...@chown.ath.cx> 
wrote:

S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
+
+   /* Prefer RE_Z if the shader is complex enough. */
+   if (info->num_memory_instructions >= 2 ||
+   shader->binary.code_size > 100*4)
+   shader->z_order = V_02880C_EARLY_Z_THEN_RE_Z;
+   else
+   shader->z_order = V_02880C_EARLY_Z_THEN_LATE_Z;
 }



Are these thresholds for switching to re-Z based on measurements, feedback
by the HW team or are they just a shot in the dark?
Either way, the magic numbers don't look particularly nice. Maybe
preprocessor constants should be introduced for them?


They are not so magic. The meaning is 2 memory instructions or
instruction count between 50 and 100. They are based on my estimates
and expectations.



Of course the semantics are easy to understand, but the reasoning behind 
the constants' values is not clear.
I think it's always useful to assign names to such tunables and to put 
them into a central place (e.g. at the top of a file, together with 
other #defines).
It's just a suggestion, though; I don't want to bikeshed this any 
further. :)
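To illustrate the suggestion, the two thresholds could be named roughly like this. It is only a sketch: the macro names, the enum and `choose_z_order` are hypothetical and do not exist in the driver, and the values are simply the ones from the patch (2 memory instructions, 100 dwords of code).

```c
/* Hypothetical names for the two tunables; values taken from the patch. */
#define RE_Z_MEMORY_INSTRUCTION_THRESHOLD 2
#define RE_Z_CODE_SIZE_THRESHOLD_BYTES (100 * 4)

/* Stand-ins for V_02880C_EARLY_Z_THEN_LATE_Z and V_02880C_EARLY_Z_THEN_RE_Z. */
enum z_order_choice {
    Z_EARLY_THEN_LATE,
    Z_EARLY_THEN_RE_Z
};

/* Same condition as in the patch, with the magic numbers named. */
static enum z_order_choice choose_z_order(unsigned num_memory_instructions,
                                          unsigned code_size_bytes)
{
    if (num_memory_instructions >= RE_Z_MEMORY_INSTRUCTION_THRESHOLD ||
        code_size_bytes > RE_Z_CODE_SIZE_THRESHOLD_BYTES)
        return Z_EARLY_THEN_RE_Z;
    return Z_EARLY_THEN_LATE;
}
```

Keeping both values next to each other at the top of the file would also give a single place to document where they come from once measurements exist.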


Grigori


No benchmarking has been done, but there is a potential to gain some
performance with shaders killing pixels.

Marek



Re: [Mesa-dev] [PATCH 2/2] radeon/uvd: fix VC-1 simple/main profile decode

2015-09-23 Thread Grigori Goronzy
Hi,

On 23.09.2015 10:11, Christian König wrote:
> From: Boyuan Zhang 
> 
> Signed-off-by: Boyuan Zhang 
> Reviewed-by: Christian König 
> ---

Thanks, nice to see this finally getting fixed, and it was a pretty
simple thing after all... well, not quite yet apparently. Sometimes
playback works correctly, sometimes it doesn't and glitches around, on
my CIK (Bonaire) GPU. It seems random, but isn't as bad and
hang-inducing compared to before this was disabled in Mesa. Maybe some
state isn't being set all the time?

I tested these two samples:

> http://samples.ffmpeg.org/asf-wmv/asf_with_chapters.wmv
> http://samples.ffmpeg.org/V-codecs/WVC1/Test_1440x576_WVC1_6Mbps.wmv

I used "mpv --hwdec=vdpau --vo=vdpau" to test this.

Best regards
Grigori

>  src/gallium/drivers/radeon/radeon_uvd.c   | 6 ++
>  src/gallium/drivers/radeon/radeon_video.c | 5 +
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
> b/src/gallium/drivers/radeon/radeon_uvd.c
> index 81f3f45..9edb511 100644
> --- a/src/gallium/drivers/radeon/radeon_uvd.c
> +++ b/src/gallium/drivers/radeon/radeon_uvd.c
> @@ -940,6 +940,12 @@ static void ruvd_end_frame(struct pipe_video_codec 
> *decoder,
>   dec->msg->body.decode.width_in_samples = dec->base.width;
>   dec->msg->body.decode.height_in_samples = dec->base.height;
>  
> + if ((picture->profile == PIPE_VIDEO_PROFILE_VC1_SIMPLE)
> +|| (picture->profile == PIPE_VIDEO_PROFILE_VC1_MAIN)) {
> + dec->msg->body.decode.width_in_samples = 
> align(dec->msg->body.decode.width_in_samples, 16) / 16;
> + dec->msg->body.decode.height_in_samples = 
> align(dec->msg->body.decode.height_in_samples, 16) / 16;
> + }
> +
>   dec->msg->body.decode.dpb_size = dec->dpb.res->buf->size;
>   dec->msg->body.decode.bsd_size = bs_size;
>   dec->msg->body.decode.db_pitch = dec->base.width;
> diff --git a/src/gallium/drivers/radeon/radeon_video.c 
> b/src/gallium/drivers/radeon/radeon_video.c
> index 3a1834b..e6cfdf6 100644
> --- a/src/gallium/drivers/radeon/radeon_video.c
> +++ b/src/gallium/drivers/radeon/radeon_video.c
> @@ -259,11 +259,8 @@ int rvid_get_video_param(struct pipe_screen *screen,
>   case PIPE_VIDEO_FORMAT_MPEG12:
>   case PIPE_VIDEO_FORMAT_MPEG4:
>   case PIPE_VIDEO_FORMAT_MPEG4_AVC:
> - return entrypoint != PIPE_VIDEO_ENTRYPOINT_ENCODE;
>   case PIPE_VIDEO_FORMAT_VC1:
> - /* FIXME: VC-1 simple/main profile is broken */
> - return profile == PIPE_VIDEO_PROFILE_VC1_ADVANCED &&
> -entrypoint != PIPE_VIDEO_ENTRYPOINT_ENCODE;
> + return entrypoint != PIPE_VIDEO_ENTRYPOINT_ENCODE;
>   case PIPE_VIDEO_FORMAT_HEVC:
>   /* Carrizo only supports HEVC Main */
>   return rscreen->family >= CHIP_CARRIZO &&
> 






Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations

2015-06-25 Thread Grigori Goronzy

On 2015-06-09 22:52, Francisco Jerez wrote:

+
+   if (blocking)
+  hev().wait();
+


hard_event::wait() may fail, so this should probably be done before the
ret_object() call to avoid leaks.


Alright... C++ exceptions are a minefield. :)


Is there any reason you didn't make
the same change in clEnqueueReadBuffer() and clEnqueueWriteBuffer()?



Must be an oversight. I think I did that, or at least I intended to do 
so.



Same comment as above.  Also note that this is being more strict than
the spec requires (which I believe is what Tom was referring to).  From
the CL 1.2 spec:

| If blocking_write is CL_TRUE, the OpenCL implementation copies the data
| referred to by ptr and enqueues the write operation in the
| command-queue. The memory pointed to by ptr can be reused by the
| application after the clEnqueueWriteBufferRect call returns.

The spec is giving you no guarantee that the write to the actual memory
object will be complete by the time the clEnqueueWriteBufferRect call
returns -- Only that your data will have been buffered somewhere and the
memory pointed to by the argument can be reused immediately by the
application.  The reason why I was reluctant to make this change last
time it came up was that it's likely to hurt performance unnecessarily
because the wait() call blocks until *all* previous commands in the same
queue have completed execution, even though in the most common case the
copy is performed synchronously using soft_copy_op(), so the wait() call
is redundant even for blocking copies.



OK, maybe we could drop the wait completely for all of the write 
calls.



The case with blocking reads is similar, the copy is handled
synchronously using soft_copy_op() when no user events are present in
the list of dependencies, so calling wait() on the event is unnecessary
to guarantee that the execution of the read has completed, and will
cause a pipe_context flush and wait until the most recent fence is
signalled.



I think it's reasonable to expect that the event is ready for profile 
queries after a blocking read has finished. That was the initial 
motivation for this patch. Other implementations behave like that. I 
didn't expect wait() to completely flush everything. Won't that cause a 
lot of needless flushing with event wait lists?



Ideally we would have a weaker variant of event::wait()
(e.g. wait_signalled()) that doesn't flush and just waits for the
associated action call-back to have been executed without giving any
guarantees about the corresponding GPU command.  The event interface
doesn't expose such a functionality right now, I'm attaching two
(completely untested) patches implementing it, you should be able to use
them as starting point to fix blocking transfers.



Thanks, I'll look into that later when I get some free time.

Grigori


Re: [Mesa-dev] [PATCH 2/2] clover: implement CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

2015-06-25 Thread Grigori Goronzy

On 2015-05-28 13:04, Grigori Goronzy wrote:

Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-ops.

It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.
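The effect of an unaligned work-group size can be shown with a little arithmetic. This is a minimal sketch: `round_up_to_multiple` and `idle_work_items` are hypothetical helpers, and it assumes the hardware always executes whole subgroups, so a partially filled last subgroup still occupies all of its lanes.

```c
#include <stddef.h>

/* Round n up to the next multiple of `multiple` (n unchanged if multiple is 0). */
static size_t round_up_to_multiple(size_t n, size_t multiple)
{
    if (multiple == 0)
        return n;
    return (n + multiple - 1) / multiple * multiple;
}

/* Lanes that execute nothing useful when the work-group size is not a
 * multiple of the subgroup size. */
static size_t idle_work_items(size_t work_group_size, size_t subgroup_size)
{
    return round_up_to_multiple(work_group_size, subgroup_size) - work_group_size;
}
```

A work-group of 100 items on a device with 64-wide subgroups occupies two subgroups and leaves 28 lanes idle; a work-group of 128 leaves none.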


Ping?

This is rather simple, but I'd like an Rb, if possible. That also goes 
for the Gallium support patch.


Grigori


---
 src/gallium/state_trackers/clover/api/kernel.cpp  | 2 +-
 src/gallium/state_trackers/clover/core/device.cpp | 5 +
 src/gallium/state_trackers/clover/core/device.hpp | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp
b/src/gallium/state_trackers/clover/api/kernel.cpp
index 05cc392..857a152 100644
--- a/src/gallium/state_trackers/clover/api/kernel.cpp
+++ b/src/gallium/state_trackers/clover/api/kernel.cpp
@@ -169,7 +169,7 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern,
cl_device_id d_dev,
   break;

case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:
-  buf.as_scalar<size_t>() = 1;
+  buf.as_scalar<size_t>() = dev.subgroup_size();
   break;

case CL_KERNEL_PRIVATE_MEM_SIZE:
diff --git a/src/gallium/state_trackers/clover/core/device.cpp
b/src/gallium/state_trackers/clover/core/device.cpp
index 42b45b7..c42d1d2 100644
--- a/src/gallium/state_trackers/clover/core/device.cpp
+++ b/src/gallium/state_trackers/clover/core/device.cpp
@@ -185,6 +185,11 @@ device::max_block_size() const {
return { v.begin(), v.end() };
 }

+cl_uint
+device::subgroup_size() const {
+   return get_compute_param<uint32_t>(pipe, PIPE_COMPUTE_CAP_SUBGROUP_SIZE)[0];
+}
+
 std::string
 device::device_name() const {
    return pipe->get_name(pipe);
diff --git a/src/gallium/state_trackers/clover/core/device.hpp
b/src/gallium/state_trackers/clover/core/device.hpp
index de5fc6b..2857847 100644
--- a/src/gallium/state_trackers/clover/core/device.hpp
+++ b/src/gallium/state_trackers/clover/core/device.hpp
@@ -67,6 +67,7 @@ namespace clover {
   bool has_doubles() const;

       std::vector<size_t> max_block_size() const;
+  cl_uint subgroup_size() const;
   std::string device_name() const;
   std::string vendor_name() const;
   enum pipe_shader_ir ir_format() const;



Re: [Mesa-dev] [PATCH 1/2] gallium: add PIPE_COMPUTE_CAP_SUBGROUP_SIZE

2015-06-04 Thread Grigori Goronzy
On 28.05.2015 13:04, Grigori Goronzy wrote:
 We need this to implement OpenCL's
 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
 ---

Ping?

  src/gallium/docs/source/screen.rst |  2 ++
  src/gallium/drivers/ilo/ilo_screen.c   |  8 
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  4 
  src/gallium/drivers/radeon/r600_pipe_common.c  |  6 ++
  src/gallium/drivers/radeon/r600_pipe_common.h  | 20 
  src/gallium/include/pipe/p_defines.h   |  3 ++-
  6 files changed, 42 insertions(+), 1 deletion(-)
 
 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index 416ef2d..32c1e87 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -382,6 +382,8 @@ pipe_screen::get_compute_param.
Value type: ``uint32_t``
  * ``PIPE_COMPUTE_CAP_IMAGES_SUPPORTED``: Whether images are supported
non-zero means yes, zero means no. Value type: ``uint32_t``
 +* ``PIPE_COMPUTE_CAP_SUBGROUP_SIZE``: The size of a basic execution unit in
 +  threads. Also known as wavefront size, warp size or SIMD width.
  
  .. _pipe_bind:
  
 diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
 b/src/gallium/drivers/ilo/ilo_screen.c
 index b0fed73..f2a18b2 100644
 --- a/src/gallium/drivers/ilo/ilo_screen.c
 +++ b/src/gallium/drivers/ilo/ilo_screen.c
 @@ -195,6 +195,7 @@ ilo_get_compute_param(struct pipe_screen *screen,
uint32_t max_clock_frequency;
uint32_t max_compute_units;
uint32_t images_supported;
 +  uint32_t subgroup_size;
 } val;
 const void *ptr;
 int size;
 @@ -286,6 +287,13 @@ ilo_get_compute_param(struct pipe_screen *screen,
       ptr = &val.images_supported;
size = sizeof(val.images_supported);
break;
 +   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
 +  /* best case is SIMD32 */
 +  val.subgroup_size = 32;
 +
 +  ptr = &val.subgroup_size;
 +  size = sizeof(val.subgroup_size);
 +  break;
 default:
ptr = NULL;
size = 0;
 diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
 b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 index 1ca997a..f6bef83 100644
 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
 @@ -340,6 +340,7 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
enum pipe_compute_cap param, void *data)
  {
 uint64_t *data64 = (uint64_t *)data;
 +   uint32_t *data32 = (uint32_t *)data;
     const uint16_t obj_class = nvc0_screen(pscreen)->compute->oclass;
  
 switch (param) {
 @@ -371,6 +372,9 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
 case PIPE_COMPUTE_CAP_MAX_INPUT_SIZE: /* c[], arbitrary limit */
data64[0] = 4096;
return 8;
 +   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
 +  data32[0] = 32;
 +  return 4;
 default:
return 0;
 }
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
 b/src/gallium/drivers/radeon/r600_pipe_common.c
 index 42e681d..5494cb3 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.c
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
 @@ -637,6 +637,12 @@ static int r600_get_compute_param(struct pipe_screen 
 *screen,
   return sizeof(uint32_t);
   case PIPE_COMPUTE_CAP_MAX_PRIVATE_SIZE:
   break; /* unused */
 + case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
 + if (ret) {
 + uint32_t *subgroup_size = ret;
 + *subgroup_size = r600_wavefront_size(rscreen->family);
 + }
 + return sizeof(uint32_t);
   }
  
  fprintf(stderr, "unknown PIPE_COMPUTE_CAP %d\n", param);
 diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
 b/src/gallium/drivers/radeon/r600_pipe_common.h
 index 6ce81d3..51fd016 100644
 --- a/src/gallium/drivers/radeon/r600_pipe_common.h
 +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
 @@ -570,6 +570,26 @@ static inline unsigned r600_tex_aniso_filter(unsigned 
 filter)
/* else */return 4;
  }
  
 +static inline unsigned r600_wavefront_size(enum radeon_family family)
 +{
 + switch (family) {
 + case CHIP_RV610:
 + case CHIP_RS780:
 + case CHIP_RV620:
 + case CHIP_RS880:
 + return 16;
 + case CHIP_RV630:
 + case CHIP_RV635:
 + case CHIP_RV730:
 + case CHIP_RV710:
 + case CHIP_PALM:
 + case CHIP_CEDAR:
 + return 32;
 + default:
 + return 64;
 + }
 +}
 +
  #define COMPUTE_DBG(rscreen, fmt, args...) \
   do { \
   if ((rscreen->b.debug_flags & DBG_COMPUTE)) fprintf(stderr, fmt, ##args); \
 diff --git a/src/gallium/include/pipe/p_defines.h 
 b/src/gallium/include/pipe/p_defines.h
 index 8fabf5e..b50ae2b 100644
 --- a/src/gallium/include/pipe/p_defines.h
 +++ b/src/gallium/include/pipe/p_defines.h
 @@ -699,7 +699,8 @@ enum pipe_compute_cap

Re: [Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations

2015-06-04 Thread Grigori Goronzy
On 28.05.2015 10:10, Grigori Goronzy wrote:
 Wrap MapBuffer and MapImage as hard_event actions, like other
 operations. This enables correct profiling. Also make sure to wait
 for events to finish when blocking is requested by the caller.
 ---

Ping?

  src/gallium/state_trackers/clover/api/transfer.cpp | 50 
 --
  1 file changed, 46 insertions(+), 4 deletions(-)
 
 diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp 
 b/src/gallium/state_trackers/clover/api/transfer.cpp
 index fdb9405..4986f53 100644
 --- a/src/gallium/state_trackers/clover/api/transfer.cpp
 +++ b/src/gallium/state_trackers/clover/api/transfer.cpp
 @@ -270,6 +270,18 @@ namespace {
 src_obj->resource(q), src_orig);
       };
    }
 +
 +   ///
 +   /// Resource mapping
 +   ///
 +   template<typename T>
 +   std::function<void (event &)>
 +   map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool blocking,
 +   const vector_t &origin, const vector_t &region, void **map) {
 +  return [=, &q](event &) {
 + *map = obj->resource(q).add_map(q, flags, blocking, origin, region);
 +  };
 +   }
  }
  
  CLOVER_API cl_int
 @@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem 
 d_mem, cl_bool blocking,
 region));
  
 ret_object(rd_ev, hev);
 +
 +   if (blocking)
 +  hev().wait();
 +
 return CL_SUCCESS;
  
  } catch (error e) {
 @@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem 
 d_mem, cl_bool blocking,
 region));
  
 ret_object(rd_ev, hev);
 +
 +   if (blocking)
 +  hev().wait();
 +
 return CL_SUCCESS;
  
  } catch (error e) {
 @@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, 
 cl_bool blocking,
 region));
  
 ret_object(rd_ev, hev);
 +
 +   if (blocking)
 +  hev().wait();
 +
 return CL_SUCCESS;
  
  } catch (error e) {
 @@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, 
 cl_bool blocking,
 region));
  
 ret_object(rd_ev, hev);
 +
 +   if (blocking)
 +  hev().wait();
 +
 return CL_SUCCESS;
  
  } catch (error e) {
 @@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, 
 cl_bool blocking,
 validate_object(q, mem, obj_origin, obj_pitch, region);
 validate_map_flags(mem, flags);
  
 -   void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
 +   void *map = nullptr;
 +   auto hev = create<hard_event>(
 +  q, CL_COMMAND_MAP_BUFFER, deps,
 +  map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));
  
 -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
 +   ret_object(rd_ev, hev);
 ret_error(r_errcode, CL_SUCCESS);
 +
 +   if (blocking)
 +  hev().wait();
 +
 return map;
  
  } catch (error e) {
 @@ -693,10 +728,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, 
 cl_bool blocking,
 validate_object(q, img, origin, region);
 validate_map_flags(img, flags);
  
 -   void *map = img.resource(q).add_map(q, flags, blocking, origin, region);
 +   void *map = nullptr;
 +   auto hev = create<hard_event>(
 +  q, CL_COMMAND_MAP_IMAGE, deps,
 +  map_resource_op(q, &img, flags, blocking, origin, region, &map));
  
 -   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps));
 +   ret_object(rd_ev, hev);
 ret_error(r_errcode, CL_SUCCESS);
 +
 +   if (blocking)
 +  hev().wait();
 +
 return map;
  
  } catch (error e) {
 






[Mesa-dev] [PATCH 1/2] clover: fix event handling of buffer operations

2015-05-28 Thread Grigori Goronzy
Wrap MapBuffer and MapImage as hard_event actions, like other
operations. This enables correct profiling. Also make sure to wait
for events to finish when blocking is requested by the caller.
---
 src/gallium/state_trackers/clover/api/transfer.cpp | 50 --
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp 
b/src/gallium/state_trackers/clover/api/transfer.cpp
index fdb9405..4986f53 100644
--- a/src/gallium/state_trackers/clover/api/transfer.cpp
+++ b/src/gallium/state_trackers/clover/api/transfer.cpp
@@ -270,6 +270,18 @@ namespace {
src_obj->resource(q), src_orig);
   };
}
+
+   ///
+   /// Resource mapping
+   ///
+   template<typename T>
+   std::function<void (event &)>
+   map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool blocking,
+   const vector_t &origin, const vector_t &region, void **map) {
+  return [=, &q](event &) {
+ *map = obj->resource(q).add_map(q, flags, blocking, origin, region);
+  };
+   }
 }
 
 CLOVER_API cl_int
@@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem 
d_mem, cl_bool blocking,
region));
 
ret_object(rd_ev, hev);
+
+   if (blocking)
+  hev().wait();
+
return CL_SUCCESS;
 
 } catch (error e) {
@@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem 
d_mem, cl_bool blocking,
region));
 
ret_object(rd_ev, hev);
+
+   if (blocking)
+  hev().wait();
+
return CL_SUCCESS;
 
 } catch (error e) {
@@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, 
cl_bool blocking,
region));
 
ret_object(rd_ev, hev);
+
+   if (blocking)
+  hev().wait();
+
return CL_SUCCESS;
 
 } catch (error e) {
@@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, 
cl_bool blocking,
region));
 
ret_object(rd_ev, hev);
+
+   if (blocking)
+  hev().wait();
+
return CL_SUCCESS;
 
 } catch (error e) {
@@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, 
cl_bool blocking,
validate_object(q, mem, obj_origin, obj_pitch, region);
validate_map_flags(mem, flags);
 
-   void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, region);
+   void *map = nullptr;
+   auto hev = create<hard_event>(
+      q, CL_COMMAND_MAP_BUFFER, deps,
+      map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));
 
-   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps));
+   ret_object(rd_ev, hev);
ret_error(r_errcode, CL_SUCCESS);
+
+   if (blocking)
+  hev().wait();
+
return map;
 
 } catch (error e) {
@@ -693,10 +728,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, 
cl_bool blocking,
validate_object(q, img, origin, region);
validate_map_flags(img, flags);
 
-   void *map = img.resource(q).add_map(q, flags, blocking, origin, region);
+   void *map = nullptr;
+   auto hev = create<hard_event>(
+      q, CL_COMMAND_MAP_IMAGE, deps,
+      map_resource_op(q, &img, flags, blocking, origin, region, &map));
 
-   ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps));
+   ret_object(rd_ev, hev);
ret_error(r_errcode, CL_SUCCESS);
+
+   if (blocking)
+  hev().wait();
+
return map;
 
 } catch (error e) {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
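
For readers skimming the archive, the idea behind the MapBuffer/MapImage patch above can be sketched roughly as follows. This is only an illustration under stated assumptions: deferred_event and map_op are hypothetical stand-ins and not clover's hard_event or map_resource_op, and the "mapping" is just a pointer to host storage. It shows the mapping being deferred into a closure that writes through an out-pointer, with the caller waiting when a blocking map was requested.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for an event that owns an action and runs it
// once, the first time somebody waits on it.
class deferred_event {
public:
   explicit deferred_event(std::function<void ()> action) :
      action_(std::move(action)), done_(false) {}

   void wait() {
      if (!done_) {
         action_();
         done_ = true;
      }
   }

   bool done() const { return done_; }

private:
   std::function<void ()> action_;
   bool done_;
};

// Hypothetical mapping operation: the closure stores the "mapped"
// address through an out-pointer when the event's action runs.
std::function<void ()>
map_op(char *storage, void **map) {
   return [=]() { *map = storage; };
}

// Blocking variant: wait for the event before handing out the pointer.
void *
blocking_map(char *storage) {
   void *map = nullptr;
   deferred_event ev(map_op(storage, &map));
   ev.wait();
   return map;
}
```

Note that in this sketch the out-pointer has to stay valid until the action has run, which is why the blocking variant waits before returning.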


[Mesa-dev] [PATCH 2/2] clover: check clEnqueueMap* for map errors

2015-05-28 Thread Grigori Goronzy
Mapping can fail, and this should be handled. Return the proper error
code and abort the associated event in this case.
---
 src/gallium/state_trackers/clover/api/transfer.cpp | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp 
b/src/gallium/state_trackers/clover/api/transfer.cpp
index 4986f53..275059c 100644
--- a/src/gallium/state_trackers/clover/api/transfer.cpp
+++ b/src/gallium/state_trackers/clover/api/transfer.cpp
@@ -699,11 +699,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, 
cl_bool blocking,
   map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map));
 
ret_object(rd_ev, hev);
-   ret_error(r_errcode, CL_SUCCESS);
 
if (blocking)
   hev().wait();
 
+   if (!map) {
+  hev().abort(CL_MAP_FAILURE);
+  ret_error(r_errcode, CL_MAP_FAILURE);
+  return NULL;
+   }
+
+   ret_error(r_errcode, CL_SUCCESS);
return map;
 
 } catch (error e) {
@@ -734,11 +740,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, 
cl_bool blocking,
   map_resource_op(q, &img, flags, blocking, origin, region, &map));
 
ret_object(rd_ev, hev);
-   ret_error(r_errcode, CL_SUCCESS);
 
if (blocking)
   hev().wait();
 
+   if (!map) {
+  hev().abort(CL_MAP_FAILURE);
+  ret_error(r_errcode, CL_MAP_FAILURE);
+  return NULL;
+   }
+
+   ret_error(r_errcode, CL_SUCCESS);
return map;
 
 } catch (error e) {
-- 
1.9.1
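
The error path added by this patch boils down to a small pattern, sketched below. The names are hypothetical: set_error stands in for clover's ret_error helper, and the two constants merely mirror CL_SUCCESS and CL_MAP_FAILURE so that the sketch compiles without the OpenCL headers. Aborting the event is left out.

```cpp
#include <cassert>

// Stand-ins for CL_SUCCESS and CL_MAP_FAILURE (assumed values, used
// here only so the sketch does not need the OpenCL headers).
constexpr int SKETCH_SUCCESS = 0;
constexpr int SKETCH_MAP_FAILURE = -12;

// The error code out-parameter is optional, so guard against NULL.
inline void
set_error(int *errcode, int value) {
   if (errcode)
      *errcode = value;
}

// Report a failed mapping as an error and return NULL, otherwise
// report success and hand the mapping back to the caller.
void *
finish_map(void *map, int *errcode) {
   if (!map) {
      set_error(errcode, SKETCH_MAP_FAILURE);
      return nullptr;
   }

   set_error(errcode, SKETCH_SUCCESS);
   return map;
}
```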



[Mesa-dev] [PATCH 1/2] gallium: add PIPE_COMPUTE_CAP_SUBGROUP_SIZE

2015-05-28 Thread Grigori Goronzy
We need this to implement OpenCL's
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
---
 src/gallium/docs/source/screen.rst |  2 ++
 src/gallium/drivers/ilo/ilo_screen.c   |  8 
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |  4 
 src/gallium/drivers/radeon/r600_pipe_common.c  |  6 ++
 src/gallium/drivers/radeon/r600_pipe_common.h  | 20 
 src/gallium/include/pipe/p_defines.h   |  3 ++-
 6 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index 416ef2d..32c1e87 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -382,6 +382,8 @@ pipe_screen::get_compute_param.
   Value type: ``uint32_t``
 * ``PIPE_COMPUTE_CAP_IMAGES_SUPPORTED``: Whether images are supported
   non-zero means yes, zero means no. Value type: ``uint32_t``
+* ``PIPE_COMPUTE_CAP_SUBGROUP_SIZE``: The size of a basic execution unit in
+  threads. Also known as wavefront size, warp size or SIMD width.
 
 .. _pipe_bind:
 
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index b0fed73..f2a18b2 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -195,6 +195,7 @@ ilo_get_compute_param(struct pipe_screen *screen,
   uint32_t max_clock_frequency;
   uint32_t max_compute_units;
   uint32_t images_supported;
+  uint32_t subgroup_size;
} val;
const void *ptr;
int size;
@@ -286,6 +287,13 @@ ilo_get_compute_param(struct pipe_screen *screen,
   ptr = &val.images_supported;
   size = sizeof(val.images_supported);
   break;
+   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
+  /* best case is SIMD32 */
+  val.subgroup_size = 32;
+
+  ptr = &val.subgroup_size;
+  size = sizeof(val.subgroup_size);
+  break;
default:
   ptr = NULL;
   size = 0;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 1ca997a..f6bef83 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -340,6 +340,7 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
   enum pipe_compute_cap param, void *data)
 {
uint64_t *data64 = (uint64_t *)data;
+   uint32_t *data32 = (uint32_t *)data;
const uint16_t obj_class = nvc0_screen(pscreen)->compute->oclass;
 
switch (param) {
@@ -371,6 +372,9 @@ nvc0_screen_get_compute_param(struct pipe_screen *pscreen,
case PIPE_COMPUTE_CAP_MAX_INPUT_SIZE: /* c[], arbitrary limit */
   data64[0] = 4096;
   return 8;
+   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
+  data32[0] = 32;
+  return 4;
default:
   return 0;
}
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 42e681d..5494cb3 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -637,6 +637,12 @@ static int r600_get_compute_param(struct pipe_screen 
*screen,
return sizeof(uint32_t);
case PIPE_COMPUTE_CAP_MAX_PRIVATE_SIZE:
break; /* unused */
+   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
+   if (ret) {
+   uint32_t *subgroup_size = ret;
+   *subgroup_size = r600_wavefront_size(rscreen->family);
+   }
+   return sizeof(uint32_t);
}
 
 fprintf(stderr, "unknown PIPE_COMPUTE_CAP %d\n", param);
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index 6ce81d3..51fd016 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -570,6 +570,26 @@ static inline unsigned r600_tex_aniso_filter(unsigned 
filter)
 /* else */return 4;
 }
 
+static inline unsigned r600_wavefront_size(enum radeon_family family)
+{
+   switch (family) {
+   case CHIP_RV610:
+   case CHIP_RS780:
+   case CHIP_RV620:
+   case CHIP_RS880:
+   return 16;
+   case CHIP_RV630:
+   case CHIP_RV635:
+   case CHIP_RV730:
+   case CHIP_RV710:
+   case CHIP_PALM:
+   case CHIP_CEDAR:
+   return 32;
+   default:
+   return 64;
+   }
+}
+
 #define COMPUTE_DBG(rscreen, fmt, args...) \
do { \
if ((rscreen->b.debug_flags & DBG_COMPUTE)) fprintf(stderr, 
fmt, ##args); \
diff --git a/src/gallium/include/pipe/p_defines.h 
b/src/gallium/include/pipe/p_defines.h
index 8fabf5e..b50ae2b 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -699,7 +699,8 @@ enum pipe_compute_cap
PIPE_COMPUTE_CAP_MAX_MEM_ALLOC_SIZE,
PIPE_COMPUTE_CAP_MAX_CLOCK_FREQUENCY,
PIPE_COMPUTE_CAP_MAX_COMPUTE_UNITS,
-   

[Mesa-dev] [PATCH 2/2] clover: implement CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

2015-05-28 Thread Grigori Goronzy
Work-group size should always be a multiple of the subgroup size; this is
a basic requirement, otherwise some work-items will be no-ops.

It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.
---
 src/gallium/state_trackers/clover/api/kernel.cpp  | 2 +-
 src/gallium/state_trackers/clover/core/device.cpp | 5 +
 src/gallium/state_trackers/clover/core/device.hpp | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/kernel.cpp 
b/src/gallium/state_trackers/clover/api/kernel.cpp
index 05cc392..857a152 100644
--- a/src/gallium/state_trackers/clover/api/kernel.cpp
+++ b/src/gallium/state_trackers/clover/api/kernel.cpp
@@ -169,7 +169,7 @@ clGetKernelWorkGroupInfo(cl_kernel d_kern, cl_device_id 
d_dev,
   break;
 
case CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE:
-  buf.as_scalar<size_t>() = 1;
+  buf.as_scalar<size_t>() = dev.subgroup_size();
   break;
 
case CL_KERNEL_PRIVATE_MEM_SIZE:
diff --git a/src/gallium/state_trackers/clover/core/device.cpp 
b/src/gallium/state_trackers/clover/core/device.cpp
index 42b45b7..c42d1d2 100644
--- a/src/gallium/state_trackers/clover/core/device.cpp
+++ b/src/gallium/state_trackers/clover/core/device.cpp
@@ -185,6 +185,11 @@ device::max_block_size() const {
return { v.begin(), v.end() };
 }
 
+cl_uint
+device::subgroup_size() const {
+   return get_compute_param<uint32_t>(pipe, PIPE_COMPUTE_CAP_SUBGROUP_SIZE)[0];
+}
+
 std::string
 device::device_name() const {
return pipe->get_name(pipe);
diff --git a/src/gallium/state_trackers/clover/core/device.hpp 
b/src/gallium/state_trackers/clover/core/device.hpp
index de5fc6b..2857847 100644
--- a/src/gallium/state_trackers/clover/core/device.hpp
+++ b/src/gallium/state_trackers/clover/core/device.hpp
@@ -67,6 +67,7 @@ namespace clover {
   bool has_doubles() const;
 
   std::vector<size_t> max_block_size() const;
+  cl_uint subgroup_size() const;
   std::string device_name() const;
   std::string vendor_name() const;
   enum pipe_shader_ir ir_format() const;
-- 
1.9.1
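
As a rough illustration of why the multiple matters: an application would typically round its work-group size up to the value reported by CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. The helpers below are hypothetical and not part of clover; they only show the arithmetic, including how many lanes of the last subgroup would sit idle without the rounding.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of m (m must be non-zero).
constexpr std::size_t
round_up(std::size_t n, std::size_t m) {
   return (n + m - 1) / m * m;
}

// Number of lanes in the last subgroup that have no work-item to run
// when a work-group of the given size is launched as-is.
constexpr std::size_t
idle_lanes(std::size_t work_group_size, std::size_t subgroup_size) {
   return round_up(work_group_size, subgroup_size) - work_group_size;
}
```

With a subgroup size of 64, a work-group of 100 items occupies two subgroups and leaves 28 lanes unused, while a work-group of 128 items leaves none.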



Re: [Mesa-dev] [PATCH 2/2] radeonsi: Add CIK SDMA support

2015-05-26 Thread Grigori Goronzy
On 26.05.2015 09:28, Michel Dänzer wrote:
 From: Michel Dänzer michel.daen...@amd.com
 
 Based on the corresponding SI support. Same as that, this is currently
 only enabled for one-dimensional buffer copies due to issues with
 multi-dimensional SDMA copies.


What a pity, so CIK has exactly the same issues as SI? We should really
try to figure out what's wrong with tiled DMA copies.

Anyway,

Reviewed-by: Grigori Goronzy g...@chown.ath.cx

 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  src/gallium/drivers/radeonsi/Makefile.sources |   1 +
  src/gallium/drivers/radeonsi/cik_sdma.c   | 364 
 ++
  src/gallium/drivers/radeonsi/si_dma.c |  20 --
  src/gallium/drivers/radeonsi/si_pipe.h|   9 +
  src/gallium/drivers/radeonsi/si_state.c   |  22 +-
  src/gallium/drivers/radeonsi/si_state.h   |   1 +
  src/gallium/drivers/radeonsi/sid.h|  31 +++
  7 files changed, 427 insertions(+), 21 deletions(-)
  create mode 100644 src/gallium/drivers/radeonsi/cik_sdma.c
 
 diff --git a/src/gallium/drivers/radeonsi/Makefile.sources 
 b/src/gallium/drivers/radeonsi/Makefile.sources
 index 774dc22..2876c0a 100644
 --- a/src/gallium/drivers/radeonsi/Makefile.sources
 +++ b/src/gallium/drivers/radeonsi/Makefile.sources
 @@ -1,4 +1,5 @@
  C_SOURCES := \
 + cik_sdma.c \
   si_blit.c \
   si_commands.c \
   si_compute.c \
 diff --git a/src/gallium/drivers/radeonsi/cik_sdma.c 
 b/src/gallium/drivers/radeonsi/cik_sdma.c
 new file mode 100644
 index 000..3c0103a
 --- /dev/null
 +++ b/src/gallium/drivers/radeonsi/cik_sdma.c
 @@ -0,0 +1,364 @@
 +/*
 + * Copyright 2010 Jerome Glisse gli...@freedesktop.org
 + * Copyright 2014 Advanced Micro Devices, Inc.
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * on the rights to use, copy, modify, merge, publish, distribute, sub
 + * license, and/or sell copies of the Software, and to permit persons to whom
 + * the Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
 + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
 + * USE OR OTHER DEALINGS IN THE SOFTWARE.
 + *
 + * Authors:
 + *  Jerome Glisse
 + */
 +
 +#include "sid.h"
 +#include "si_pipe.h"
 +#include "../radeon/r600_cs.h"
 +
 +#include "util/u_format.h"
 +
 +static uint32_t cik_micro_tile_mode(struct si_screen *sscreen, unsigned 
 tile_mode)
 +{
 + if (sscreen->b.info.si_tile_mode_array_valid) {
 + uint32_t gb_tile_mode = 
 sscreen->b.info.si_tile_mode_array[tile_mode];
 +
 + return G_009910_MICRO_TILE_MODE_NEW(gb_tile_mode);
 + }
 +
 + /* The kernel cannot return the tile mode array. Guess? */
 + return V_009910_ADDR_SURF_THIN_MICRO_TILING;
 +}
 +
 +static void cik_sdma_do_copy_buffer(struct si_context *ctx,
 + struct pipe_resource *dst,
 + struct pipe_resource *src,
 + uint64_t dst_offset,
 + uint64_t src_offset,
 + uint64_t size)
 +{
 + struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
 + unsigned i, ncopy, csize;
 + struct r600_resource *rdst = (struct r600_resource*)dst;
 + struct r600_resource *rsrc = (struct r600_resource*)src;
 +
 + dst_offset += r600_resource(dst)->gpu_address;
 + src_offset += r600_resource(src)->gpu_address;
 +
 + ncopy = (size + CIK_SDMA_COPY_MAX_SIZE - 1) / CIK_SDMA_COPY_MAX_SIZE;
 + r600_need_dma_space(&ctx->b, ncopy * 7);
 +
 + r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rsrc, 
 RADEON_USAGE_READ,
 +   RADEON_PRIO_MIN);
 + r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rdst, 
 RADEON_USAGE_WRITE,
 +   RADEON_PRIO_MIN);
 +
 + for (i = 0; i < ncopy; i++) {
 + csize = size < CIK_SDMA_COPY_MAX_SIZE ? size : 
 CIK_SDMA_COPY_MAX_SIZE;
 + cs->buf[cs->cdw++] = CIK_SDMA_PACKET(CIK_SDMA_OPCODE_COPY,
 +  
 CIK_SDMA_COPY_SUB_OPCODE_LINEAR,
 +  0);
 + cs->buf[cs->cdw++] = csize;
 + cs->buf[cs->cdw++] = 0; /* src/dst endian swap
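
The quoted patch is cut off here in the archive. The chunking scheme visible in the loop above (a ceiling division for the packet count, then at most a fixed number of bytes per packet) can be sketched on its own as follows. This is only an illustration: split_copy and the chunk struct are hypothetical, and ASSUMED_MAX_CHUNK is a placeholder, since the real value of CIK_SDMA_COPY_MAX_SIZE is not shown in this message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder limit; the driver's CIK_SDMA_COPY_MAX_SIZE is not
// visible in this message, so this value is an assumption.
constexpr uint64_t ASSUMED_MAX_CHUNK = 0x1fffff;

// One linear copy packet: source address, destination address, bytes.
struct chunk {
   uint64_t src, dst, size;
};

// Split a linear copy into packets of at most max_chunk bytes each.
std::vector<chunk>
split_copy(uint64_t dst, uint64_t src, uint64_t size,
           uint64_t max_chunk = ASSUMED_MAX_CHUNK) {
   std::vector<chunk> out;
   // Ceiling division, as in the ncopy computation of the patch.
   const uint64_t ncopy = (size + max_chunk - 1) / max_chunk;

   for (uint64_t i = 0; i < ncopy; i++) {
      const uint64_t csize = std::min(size, max_chunk);
      out.push_back({ src, dst, csize });
      src += csize;
      dst += csize;
      size -= csize;
   }

   return out;
}
```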

Re: [Mesa-dev] [PATCH 2/2] clover: try userptr for CL_MEM_USE_HOST_PTR

2015-05-23 Thread Grigori Goronzy
On 23.05.2015 15:53, Francisco Jerez wrote:
 diff --git a/src/gallium/state_trackers/clover/core/resource.cpp 
 b/src/gallium/state_trackers/clover/core/resource.cpp
 index 8ed4c42..8e51b3c 100644
 --- a/src/gallium/state_trackers/clover/core/resource.cpp
 +++ b/src/gallium/state_trackers/clover/core/resource.cpp
 @@ -118,6 +118,8 @@ root_resource::root_resource(clover::device dev, 
 memory_obj obj,
   command_queue q, const std::string data) :
 resource(dev, obj) {
 pipe_resource info {};
 +   bool user_ptr_support = (bool)dev.pipe->get_param(dev.pipe,
 
 Let's make this variable const, and remove the cast to bool, it
 shouldn't be necessary.  With that fixed:
 Reviewed-by: Francisco Jerez curroje...@riseup.net


Thanks for the review. Pushed with minor changes.

Grigori





[Mesa-dev] [PATCH 1/2] clover: implement CL_MEM_ALLOC_HOST_PTR

2015-05-19 Thread Grigori Goronzy
This flag is typically used to request pinned host memory, to avoid
any copies between GPU and CPU.

This improves throughput with an older OpenCL app which I unfortunately
can't publish due to its licensing.
---
 src/gallium/state_trackers/clover/core/resource.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/state_trackers/clover/core/resource.cpp 
b/src/gallium/state_trackers/clover/core/resource.cpp
index bcf87e1..8ed4c42 100644
--- a/src/gallium/state_trackers/clover/core/resource.cpp
+++ b/src/gallium/state_trackers/clover/core/resource.cpp
@@ -137,6 +137,10 @@ root_resource::root_resource(clover::device dev, 
memory_obj obj,
 PIPE_BIND_TRANSFER_READ |
 PIPE_BIND_TRANSFER_WRITE);
 
+   if (obj.flags() & CL_MEM_ALLOC_HOST_PTR) {
+  info.usage = PIPE_USAGE_STAGING;
+   }
+
pipe = dev.pipe->resource_create(dev.pipe, &info);
if (!pipe)
   throw error(CL_OUT_OF_RESOURCES);
-- 
1.9.1



[Mesa-dev] [PATCH 2/2] clover: try userptr for CL_MEM_USE_HOST_PTR

2015-05-19 Thread Grigori Goronzy
According to spec, CL_MEM_USE_HOST_PTR should directly use host memory,
if possible. This is just what userptr is for, so use it.

In case the memory cannot be mapped, a fallback similar to
CL_MEM_COPY_HOST_PTR is used.
---
 src/gallium/state_trackers/clover/core/memory.cpp   |  2 +-
 src/gallium/state_trackers/clover/core/resource.cpp | 17 ++---
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/memory.cpp 
b/src/gallium/state_trackers/clover/core/memory.cpp
index 905ebc0..055336a 100644
--- a/src/gallium/state_trackers/clover/core/memory.cpp
+++ b/src/gallium/state_trackers/clover/core/memory.cpp
@@ -30,7 +30,7 @@ memory_obj::memory_obj(clover::context ctx, cl_mem_flags 
flags,
size_t size, void *host_ptr) :
context(ctx), _flags(flags),
_size(size), _host_ptr(host_ptr) {
-   if (flags & (CL_MEM_COPY_HOST_PTR | CL_MEM_USE_HOST_PTR))
+   if (flags & CL_MEM_COPY_HOST_PTR)
   data.append((char *)host_ptr, size);
 }
 
diff --git a/src/gallium/state_trackers/clover/core/resource.cpp 
b/src/gallium/state_trackers/clover/core/resource.cpp
index 8ed4c42..8e51b3c 100644
--- a/src/gallium/state_trackers/clover/core/resource.cpp
+++ b/src/gallium/state_trackers/clover/core/resource.cpp
@@ -118,6 +118,8 @@ root_resource::root_resource(clover::device dev, 
memory_obj obj,
  command_queue &q, const std::string &data) :
resource(dev, obj) {
pipe_resource info {};
+   bool user_ptr_support = (bool)dev.pipe->get_param(dev.pipe,
+ PIPE_CAP_RESOURCE_FROM_USER_MEMORY);
 
if (image *img = dynamic_cast<image *>(&obj)) {
   info.format = translate_format(img->format());
@@ -137,7 +139,15 @@ root_resource::root_resource(clover::device dev, 
memory_obj obj,
 PIPE_BIND_TRANSFER_READ |
 PIPE_BIND_TRANSFER_WRITE);
 
-   if (obj.flags() & CL_MEM_ALLOC_HOST_PTR) {
+   if (obj.flags() & CL_MEM_USE_HOST_PTR && user_ptr_support) {
+      // Page alignment is normally required for this, just try, hope for the
+      // best and fall back if it fails.
+      pipe = dev.pipe->resource_from_user_memory(dev.pipe, &info, 
obj.host_ptr());
+      if (pipe)
+ return;
+   }
+
+   if (obj.flags() & (CL_MEM_ALLOC_HOST_PTR | CL_MEM_USE_HOST_PTR)) {
   info.usage = PIPE_USAGE_STAGING;
}
 
@@ -145,12 +155,13 @@ root_resource::root_resource(clover::device dev, 
memory_obj obj,
if (!pipe)
   throw error(CL_OUT_OF_RESOURCES);
 
-   if (!data.empty()) {
+   if (obj.flags() & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) {
+  const void *data_ptr = !data.empty() ? data.data() : obj.host_ptr();
   box rect { {{ 0, 0, 0 }}, {{ info.width0, info.height0, info.depth0 }} };
   unsigned cpp = util_format_get_blocksize(info.format);
 
   q.pipe->transfer_inline_write(q.pipe, pipe, 0, PIPE_TRANSFER_WRITE,
-rect, data.data(), cpp * info.width0,
+rect, data_ptr, cpp * info.width0,
 cpp * info.width0 * info.height0);
}
 }
-- 
1.9.1
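
The decision flow of this patch can be summarised in a small sketch. Everything here is hypothetical naming for illustration (request, plan, choose, ASSUMED_PAGE_SIZE), and one simplification is made deliberately: the sketch predicts userptr success from page alignment, whereas the patch simply calls resource_from_user_memory and falls back if it returns NULL.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for the alignment check (4 KiB).
constexpr std::uintptr_t ASSUMED_PAGE_SIZE = 4096;

// Simplified view of the cl_mem flags that matter here.
struct request {
   bool use_host_ptr;
   bool alloc_host_ptr;
   bool copy_host_ptr;
   const void *host_ptr;
};

// What the resource constructor ends up doing.
struct plan {
   bool wrap_user_memory;   // use the host allocation directly
   bool staging;            // allocate with staging usage
   bool upload;             // copy the host data into the new resource
};

inline bool
page_aligned(const void *p) {
   return reinterpret_cast<std::uintptr_t>(p) % ASSUMED_PAGE_SIZE == 0;
}

plan
choose(const request &r, bool userptr_supported) {
   // First choice: wrap the application's memory, no copy needed.
   if (r.use_host_ptr && userptr_supported && page_aligned(r.host_ptr))
      return { true, false, false };

   // Fallback: a regular resource, staging for host-visible requests,
   // plus an initial upload when host data was supplied.
   return { false,
            r.alloc_host_ptr || r.use_host_ptr,
            r.use_host_ptr || r.copy_host_ptr };
}
```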



Re: [Mesa-dev] [PATCH] Revert "radeon/llvm: enable unsafe math for graphics shaders"

2015-02-18 Thread Grigori Goronzy
Hi,

AFAIR not enabling this makes LLVM generate really slow code in some
common cases. Maybe this is just a bug in LLVM/R600 triggered by unsafe
FP math optimization or some optimization is too eager. Other drivers do
fine with these types of optimization.

What's the impact on performance with unsafe FP math disabled at this time?

Best regards
Grigori
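
As general background on why unsafe FP math can change rendering results (this is not a diagnosis of the bug discussed in this thread): such optimizations may reassociate floating-point expressions, and floating-point addition is not associative. A minimal sketch, with hypothetical helper names and volatile temporaries to keep the compiler from reordering the sums itself:

```cpp
#include <cassert>

// (a + b) + c, with the intermediate forced through a float.
float
sum_left(float a, float b, float c) {
   volatile float t = a + b;
   return t + c;
}

// a + (b + c), with the intermediate forced through a float.
float
sum_right(float a, float b, float c) {
   volatile float t = b + c;
   return a + t;
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, the first form yields 1.0f, while the second yields 0.0f because 1.0f is absorbed when added to -1e20f.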

On 17.02.2015 09:15, Michel Dänzer wrote:
 From: Michel Dänzer michel.daen...@amd.com
 
 This reverts commit 0e9cdedd2e3943bdb7f3543a3508b883b167e427.
 
 It caused the grass to disappear in The Talos Principle.
 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89069
 Signed-off-by: Michel Dänzer mic...@daenzer.net
 ---
  src/gallium/drivers/radeon/radeon_llvm_emit.c | 4 
  1 file changed, 4 deletions(-)
 
 diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
 b/src/gallium/drivers/radeon/radeon_llvm_emit.c
 index 0f9dbab..624077c 100644
 --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
 +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
 @@ -80,10 +80,6 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
   sprintf(Str, "%1d", llvm_type);
  
   LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
 -
 - if (type != TGSI_PROCESSOR_COMPUTE) {
 - LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
 - }
  }
  
  static void init_r600_target()
 






Re: [Mesa-dev] [PATCH] Revert "radeon/llvm: enable unsafe math for graphics shaders"

2015-02-18 Thread Grigori Goronzy

On 2015-02-18 09:13, Michel Dänzer wrote:

On 18.02.2015 16:52, Grigori Goronzy wrote:

Hi,

AFAIR not enabling this makes LLVM generate really slow code in some
common cases. Maybe this is just a bug in LLVM/R600 triggered by 
unsafe
FP math optimization or some optimization is too eager. Other drivers 
do

fine with these types of optimization.


It can be enabled again after fixing the problem exposed by The Talos
Principle.



Right. I just want to avoid the situation that this workaround gets 
committed and then forgotten about. I guess it's up to me then. :)





On 17.02.2015 09:15, Michel Dänzer wrote:

From: Michel Dänzer michel.daen...@amd.com

This reverts commit 0e9cdedd2e3943bdb7f3543a3508b883b167e427.

It caused the grass to disappear in The Talos Principle.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89069
Signed-off-by: Michel Dänzer mic...@daenzer.net
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
b/src/gallium/drivers/radeon/radeon_llvm_emit.c

index 0f9dbab..624077c 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -80,10 +80,6 @@ void radeon_llvm_shader_type(LLVMValueRef F, 
unsigned type)

sprintf(Str, "%1d", llvm_type);

LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
-
-   if (type != TGSI_PROCESSOR_COMPUTE) {
-   LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
-   }
 }

 static void init_r600_target()




Re: [Mesa-dev] [PATCH] radeonsi: Disable asynchronous DMA except for PIPE_BUFFER

2014-11-14 Thread Grigori Goronzy
Reviewed-by: Grigori Goronzy g...@chown.ath.cx

I've been using a similar patch to fix stability issues on my machine
for quite a while. Still, it's a pity we have to go that far to get
everything stable again.

On 13.11.2014 07:52, Michel Dänzer wrote:
 From: Michel Dänzer michel.daen...@amd.com
 
 Using the asynchronous DMA engine for multi-dimensional operations seems
 to cause random GPU lockups for various people. While the root cause for
 this might need to be fixed in the kernel, let's disable it for now.
 
 Before re-enabling this, please make sure you can hit all newly enabled
 paths in your testing, preferably with both piglit and real world apps,
 and get in touch with people on the bug reports below for stability
 testing.
 
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
 Cc: mesa-sta...@lists.freedesktop.org
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  src/gallium/drivers/radeonsi/si_dma.c | 3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
 b/src/gallium/drivers/radeonsi/si_dma.c
 index b1bd5e7..1d3b524 100644
 --- a/src/gallium/drivers/radeonsi/si_dma.c
 +++ b/src/gallium/drivers/radeonsi/si_dma.c
 @@ -250,6 +250,9 @@ void si_dma_copy(struct pipe_context *ctx,
   return;
   }
  
 + /* XXX: The paths below cause lockups for some */
 + goto fallback;
 +
   if (src->format != dst->format || src_box->depth > 1 ||
   rdst->dirty_level_mask != 0 ||
   rdst->cmask.size || rdst->fmask.size ||
 






Re: [Mesa-dev] [PATCH 3/4] radeonsi: Catch more cases that can't be handled by si_dma_copy_buffer/tile

2014-10-01 Thread Grigori Goronzy
On 30.09.2014 05:58, Michel Dänzer wrote:
 diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
 b/src/gallium/drivers/radeonsi/si_dma.c
 index ff64722..643ce3f 100644
 --- a/src/gallium/drivers/radeonsi/si_dma.c
 +++ b/src/gallium/drivers/radeonsi/si_dma.c
 @@ -251,7 +251,9 @@ void si_dma_copy(struct pipe_context *ctx,
   }
  
   if (src->format != dst->format || src_box->depth > 1 ||
 - rdst->dirty_level_mask != 0) {
 + rdst->dirty_level_mask != 0 ||
 + rdst->cmask.size || rdst->fmask.size ||
 + rsrc->cmask.size || rsrc->fmask.size) {
   goto fallback;
   }

Does the existence of the cmask alone really matter? We shouldn't copy
from or to fast cleared surfaces, but this change will disable DMA
copies even if the fast clear has been eliminated. Isn't that handled
elsewhere already?

   }
   /* the x test here are currently useless (because we don't support 
 partial blit)
* but keep them around so we don't forget about those
*/
 - if ((src_pitch % 8) || (src_box->x % 8) || (dst_x % 8) || (src_box->y % 
 8) || (dst_y % 8)) {
 + if ((src_pitch % 8) || (src_box->x % 8) || (dst_x % 8) ||
 + (src_box->y % 8) || (dst_y % 8) || (src_box->height % 8)) {
   goto fallback;
   }

That will only allow DMA copies for heights that are a multiple of 8. That
isn't a requirement of the DMA engines AFAICT, size is always specified in
DWs anyway.

Grigori






Re: [Mesa-dev] [PATCH] radeonsi: Simplify si_dma_copy_tile function

2014-09-10 Thread Grigori Goronzy
LGTM, but I have some comments below.

Grigori

On 10.09.2014 10:54, Michel Dänzer wrote:
 From: Michel Dänzer michel.daen...@amd.com
 
 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
 
 This might help for investigating DMA related bugs.
 
  src/gallium/drivers/radeonsi/si_dma.c | 103 
 ++
  1 file changed, 41 insertions(+), 62 deletions(-)
 
 diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
 b/src/gallium/drivers/radeonsi/si_dma.c
 index f6e2a78..c067cd9 100644
 --- a/src/gallium/drivers/radeonsi/si_dma.c
 +++ b/src/gallium/drivers/radeonsi/si_dma.c
 @@ -130,8 +130,11 @@ static void si_dma_copy_tile(struct si_context *ctx,
   struct si_screen *sscreen = ctx->screen;
   struct r600_texture *rsrc = (struct r600_texture*)src;
   struct r600_texture *rdst = (struct r600_texture*)dst;
 + struct r600_texture *rlinear, *rtiled;
 + unsigned linear_lvl, tiled_lvl;
   unsigned array_mode, lbpp, pitch_tile_max, slice_tile_max, size;
 - unsigned ncopy, height, cheight, detile, i, x, y, z, src_mode, dst_mode;
 + unsigned ncopy, height, cheight, detile, i, src_mode, dst_mode;
 + unsigned linear_x, linear_y, linear_z,  tiled_x, tiled_y, tiled_z;
   unsigned sub_cmd, bank_h, bank_w, mt_aspect, nbanks, tile_split, mt;
   uint64_t base, addr;
   unsigned pipe_config, tile_mode_index;
 @@ -143,68 +146,44 @@ static void si_dma_copy_tile(struct si_context *ctx,
   dst_mode = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? 
 RADEON_SURF_MODE_LINEAR : dst_mode;
   assert(dst_mode != src_mode);
  
 - y = 0;
   sub_cmd = SI_DMA_COPY_TILED;
   lbpp = util_logbase2(bpp);
   pitch_tile_max = ((pitch / bpp) / 8) - 1;
  
 - if (dst_mode == RADEON_SURF_MODE_LINEAR) {
 - /* T2L */
 - array_mode = si_array_mode(src_mode);
 - slice_tile_max = (rsrc->surface.level[src_level].nblk_x * 
 rsrc->surface.level[src_level].nblk_y) / (8*8);
 - slice_tile_max = slice_tile_max ? slice_tile_max - 1 : 0;
 - /* linear height must be the same as the slice tile max height, 
 it's ok even
 -  * if the linear destination/source have smaller heigh as the 
 size of the
 -  * dma packet will be using the copy_height which is always 
 smaller or equal
 -  * to the linear height
 -  */
 - height = rsrc->surface.level[src_level].npix_y;
 - detile = 1;
 - x = src_x;
 - y = src_y;
 - z = src_z;
 - base = rsrc->surface.level[src_level].offset;
 - addr = rdst->surface.level[dst_level].offset;
 - addr += rdst->surface.level[dst_level].slice_size * dst_z;
 - addr += dst_y * pitch + dst_x * bpp;
 - bank_h = cik_bank_wh(rsrc->surface.bankh);
 - bank_w = cik_bank_wh(rsrc->surface.bankw);
 - mt_aspect = cik_macro_tile_aspect(rsrc->surface.mtilea);
 - tile_split = cik_tile_split(rsrc->surface.tile_split);
 - tile_mode_index = si_tile_mode_index(rsrc, src_level,
 -  
 util_format_has_stencil(util_format_description(src->format)));
 - nbanks = si_num_banks(sscreen, rsrc);
 - base += rsrc->resource.gpu_address;
 - addr += rdst->resource.gpu_address;
 - } else {
 - /* L2T */
 - array_mode = si_array_mode(dst_mode);
 - slice_tile_max = (rdst->surface.level[dst_level].nblk_x * 
 rdst->surface.level[dst_level].nblk_y) / (8*8);
 - slice_tile_max = slice_tile_max ? slice_tile_max - 1 : 0;
 - /* linear height must be the same as the slice tile max height, 
 it's ok even
 -  * if the linear destination/source have smaller heigh as the 
 size of the
 -  * dma packet will be using the copy_height which is always 
 smaller or equal
 -  * to the linear height
 -  */
 - height = rdst->surface.level[dst_level].npix_y;
 - detile = 0;
 - x = dst_x;
 - y = dst_y;
 - z = dst_z;
 - base = rdst->surface.level[dst_level].offset;
 - addr = rsrc->surface.level[src_level].offset;
 - addr += rsrc->surface.level[src_level].slice_size * src_z;
 - addr += src_y * pitch + src_x * bpp;
 - bank_h = cik_bank_wh(rdst->surface.bankh);
 - bank_w = cik_bank_wh(rdst->surface.bankw);
 - mt_aspect = cik_macro_tile_aspect(rdst->surface.mtilea);
 - tile_split = cik_tile_split(rdst->surface.tile_split);
 - tile_mode_index = si_tile_mode_index(rdst, dst_level,
 -  
 util_format_has_stencil(util_format_description(dst->format)));
 - nbanks = si_num_banks(sscreen, rdst);
 - base += rdst->resource.gpu_address;
 - addr += 

Re: [Mesa-dev] [PATCH] r600g, radeonsi: add debug option which forces DMA for copy_region and blit

2014-09-08 Thread Grigori Goronzy
On 08.09.2014 14:50, Axel Davy wrote:
 Hi,
 
 When reading si_dma.c code, it looks like the requested width of the
 copy is ignored except for PIPE_BUFFER.
 Perhaps that explains the bugs observed ?


It isn't ignored. Partial DMA copies (i.e. operations that do not copy
whole lines) are simply not supported right now, and will fall back to
resource_copy_region. In fact, it's even stricter: the pitches of source
and destination have to match.

Grigori
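
To make the conditions described above concrete, here is a loose sketch of such an eligibility check. It is a paraphrase and not the code in si_dma_copy: the struct and its field names are hypothetical, reading "whole lines" as "both x offsets are zero" is an assumption, and the width comparison reflects the check mentioned in the follow-up message of this thread.

```cpp
#include <cassert>

// Hypothetical description of a 2D copy request.
struct copy_request {
   unsigned src_pitch, dst_pitch;   // bytes per line
   unsigned src_x, dst_x;           // horizontal start offsets
   unsigned src_w, dst_w;           // widths of the source/destination levels
};

// True if the copy can take the DMA path; otherwise the caller would
// fall back to the regular resource_copy_region path.
bool
can_use_dma(const copy_request &r) {
   return r.src_pitch == r.dst_pitch &&   // pitches must match
          r.src_x == 0 && r.dst_x == 0 && // whole lines only
          r.src_w == r.dst_w;             // same width on both sides
}
```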

 Axel Davy
 
 On 06/09/2014 19:14, Andy Furniss wrote :
 Marek Olšák wrote:
 From: Marek Olšák marek.ol...@amd.com

 ---

 Turn this on, run piglit, and pray for mercy.
 It might be interesting to see if it makes 3D apps any faster. Or
 piglit.

 Well it's not piglit and I haven't benchmarked anything yet, but I get
 a couple of faults at the start of Unigine Valley.

 [20635.429686] radeon :01:00.0: GPU fault detected: 146 0x07bd3d14
 [20635.429690] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
   0x683D
 [20635.429691] radeon :01:00.0:
 VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1D03D014
 [20635.429693] VM fault (0x04, vmid 14) at page 26685, write from DMA1
 (61)
 [20635.429708] radeon :01:00.0: GPU fault detected: 146 0x07bd3d14
 [20635.429709] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
   0x
 [20635.429710] radeon :01:00.0:
 VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1C03D00C
 [20635.429711] VM fault (0x0c, vmid 14) at page 0, read from DMA1 (61)

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 


Re: [Mesa-dev] [PATCH] r600g, radeonsi: add debug option which forces DMA for copy_region and blit

2014-09-08 Thread Grigori Goronzy
On 08.09.2014 21:07, Axel Davy wrote:
 On 08/09/2014 20:21, Grigori Goronzy wrote :
 On 08.09.2014 14:50, Axel Davy wrote:
 Hi,

 When reading si_dma.c code, it looks like the requested width of the
 copy is ignored except for PIPE_BUFFER.
 Perhaps that explains the bugs observed ?

 It isn't ignored. Partial DMA copies (i.e. operations that do not copy
 whole lines) are simply not supported right now, and will fall back to
 resource_copy_region. In fact, it's even stricter: the pitches of source
 and destination have to match.

 Grigori
 My point is I don't see a check to verify the width of the copy equals
 the width of the buffers,
 even if I see the pitch test.


There's a check for that (src_w != dst_w) in si_dma_copy.

Grigori

 Axel


Re: [Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-08-29 Thread Grigori Goronzy
On 29.08.2014 10:19, Christian König wrote:
 
 That sounds like something doesn't work correctly.
 
 The resources are created with the subsampled formats R8G8_R8B8 or
 G8R8_B8R8, but since this can't be accessed by the CB we need to use
 R8G8B8A8 as surface format for writing to them.
 
 If that doesn't work the whole blitter and transfer code won't work
 either and we obviously have a problem. Can you figure out what's the
 real cause of this? Maybe we use the wrong tiling mode with such surfaces?


I don't really know how tiling works, but it looks like formats disagree
in how width is specified. Even though 4:2:2 subsampled formats are
logically organized in 32 bit blocks of 2x1 luma pixels with one chroma
value, the width needs to be specified in single pixels, not 2x1 blocks
of pixels. This is similar to compressed texture formats. If you now try
to reinterpret the surface as plain RGBA, the width is incorrect.

The blitting code (resource_copy_region) also reinterprets subsampled
formats as RGBA, but it adjusts the width (uses number of blocks as
width) [1].
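
As an illustration of that width adjustment, here is a small sketch
that converts a horizontal range from pixel units to block units.
The box type and both function names are hypothetical (Mesa's real
helper is util_format_get_nblocksx); rounding up to whole blocks is
an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical 1D box, for illustration only. */
struct box1d {
	unsigned x;     /* left edge */
	unsigned width; /* extent */
};

/* Number of blocks needed to cover `pixels` pixels, rounding up. */
static unsigned pixels_to_blocks(unsigned pixels, unsigned block_width)
{
	assert(block_width > 0);
	return (pixels + block_width - 1) / block_width;
}

/* Convert a range given in pixels into the same range in blocks,
 * which is the unit needed once a 2x1-block format is reinterpreted
 * as a plain 32-bit format. */
static struct box1d box_to_blocks(struct box1d px, unsigned block_width)
{
	struct box1d blk;

	blk.x = pixels_to_blocks(px.x, block_width);
	blk.width = pixels_to_blocks(px.width, block_width);
	return blk;
}
```

For a 4:2:2 format (block width 2), a 1920 pixel wide range starting
at pixel 64 becomes a 960 block wide range starting at block 32.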

By the way, on my system, the 4:2:2 video buffers are by default mapped
directly, so broken blitting won't have any effect. If I force usage of
a staging texture, the dma_copy and resource_copy_region paths work
correctly, though.

 As for that 4:2:2 doesn't work, AFAICT it absolutely does, but there
 is no linear interpolation for chroma, so quality isn't ideal. This
 seems to be a hardware restriction, unfortunately.
 
 Interesting, from the docs I thought that linear interpolation should
 also work for G and B components but I might remember that incorrectly.


Maybe some special incantation is required to enable chroma filtering? I
don't have any detailed docs and couldn't find anything in the register
docs. :) However, there's a GL extension that exposes subsampled, packed
4:2:2 formats, GL_APPLE_rgb_422. The spec explicitly states that chroma
filtering is undefined. My guess is that the extension was modeled after
existing hardware and was written that way because some hardware like
Radeon lacks good chroma filtering.

 It might be easier to handle YUYV and UYVY as planar formats anyway
 and only do the transformation while we upload the data to them.
 

A planar format might also be useful for deinterlacing 4:2:2, so that
would be a useful addition.

Grigori

[1]
http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/radeonsi/si_blit.c#n559





Re: [Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-08-29 Thread Grigori Goronzy
On 29.08.2014 12:31, Andy Furniss wrote:
 As for that 4:2:2 doesn't work, AFAICT it absolutely does, but
 there is no linear interpolation for chroma, so quality isn't ideal.
 This seems to be a hardware restriction, unfortunately.
 
 Hmm, we may have to disagree on the definition of working here :-)
 
 Of course I don't understand the hardware - but are you saying that
 because the chroma needs horizontal interpolation you have to have
 vertical interpolation too?


No, I just observed that there is no linear interpolation horizontally.
I.e. the exact same chroma value is used for two pixels of the same block.
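
To spell out what that means, here is a rough sketch of the observed
behaviour next to what a linear filter would do. It only models the
arithmetic; it says nothing about how the hardware is implemented.
The type and function names are made up, and the byte order U, Y0,
V, Y1 is the usual UYVY layout.

```c
#include <assert.h>
#include <stdint.h>

struct yuv {
	uint8_t y, u, v;
};

/* Unpack one UYVY block (bytes U, Y0, V, Y1) into two pixels.
 * Both pixels get the same chroma, matching the observation above. */
static void unpack_uyvy_block(const uint8_t in[4], struct yuv out[2])
{
	assert(in && out);

	out[0].y = in[1];
	out[1].y = in[3];
	out[0].u = out[1].u = in[0];
	out[0].v = out[1].v = in[2];
}

/* What horizontal linear filtering would produce halfway between the
 * chroma samples of two neighbouring blocks (rounded to nearest). */
static uint8_t lerp_chroma(uint8_t a, uint8_t b)
{
	return (uint8_t)((a + b + 1) / 2);
}
```

With filtering, the second pixel of each block would use something
like lerp_chroma(this_block, next_block) instead of a plain copy.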

 
 Tests were 1:1 vertical scale.
 
 With weave shader a test pattern rgbymc 422 chroma looked much like it
 would if I had used ffmpeg to convert it to 420 (ie none of those at all
 rendered). On real video it bled between fields, so my use case of
 feeding it to a TV to deinterlace would fail - luma was OK.


Hmm - where can I find this test pattern?

I just made a simple test pattern myself: http://i.imgur.com/hsLFBxy.png
I displayed it in mpv with the following command line:
 mpv --vo=vdpau --vf=format=uyvy422 422-chroma.png

I clearly see separate chroma for each line, but chroma cositing seems
to be a bit off, i.e. chroma is shifted downwards by about half a pixel.
That doesn't happen with forced progressive mode. So this is most
probably it.

Grigori





Re: [Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-08-28 Thread Grigori Goronzy
On 04.07.2014 01:24, Andy Furniss wrote:
 Maybe not 1/frame but anyway the first couple of a run have numbers
 rather than s
 
 [27977.386795] radeon :01:00.0: GPU fault detected: 146 0x0c035014
 [27977.386800] radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR
   0x15E0
 [27977.386802] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
 0x03050014
 [27977.386804] VM fault (0x04, vmid 1) at page 5600, write from CB (80)
 [27977.386841] radeon :01:00.0: GPU fault detected: 146 0x0c03e014
 [27977.386842] radeon :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR
   0x15E0
 [27977.386844] radeon :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
 0x030E0014
 [27977.386846] VM fault (0x04, vmid 1) at page 5600, write from CB (224)
 

OK, so I finally have some time to look into this.

These faults appear to be caused by the surface format mangling that vl
applies to subsampled formats, which results in an incorrect framebuffer
setup, so that rendering causes writes beyond the size of the texture
BO. Typically the only rendering operation on video buffers is clearing
them and that's why you only see these errors a couple of times after
starting a video, one for each video surface that is created. I'm not
yet sure what's the best way to solve this issue. The format mangling
would need to also change the surface width appropriately. Or maybe it's
for the best to make rendering support optional and never try to render
to subsampled surfaces. Any thoughts about this, Christian?
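
A back-of-the-envelope sketch of why the writes run past the buffer,
assuming a linear surface without any padding (real surfaces are
aligned, so the numbers are only indicative). The function names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by a packed 4:2:2 surface: every 2x1 pixel block
 * takes 4 bytes. */
static size_t size_422(unsigned width, unsigned height)
{
	assert(width > 0 && height > 0);
	return (size_t)((width + 1) / 2) * 4 * height;
}

/* Bytes touched when the same dimensions are rendered as RGBA8,
 * i.e. 4 bytes per pixel. */
static size_t size_rgba8(unsigned width, unsigned height)
{
	assert(width > 0 && height > 0);
	return (size_t)width * 4 * height;
}
```

For a 720x576 surface, size_422() gives 829440 bytes while
size_rgba8() gives 1658880 bytes, so a clear with the unadjusted
width would touch twice the allocated size. Halving the width first,
size_rgba8(360, 576), matches the allocation again.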

As for that 4:2:2 doesn't work, AFAICT it absolutely does, but there
is no linear interpolation for chroma, so quality isn't ideal. This
seems to be a hardware restriction, unfortunately.

Grigori





[Mesa-dev] [PATCH] radeonsi: implement BPTC texture support

2014-08-12 Thread Grigori Goronzy
Passes all piglit tests.

v2: rebased
---
 src/gallium/drivers/radeonsi/si_state.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 6e9a60a..4f7adea 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1102,6 +1102,22 @@ static uint32_t si_translate_texformat(struct 
pipe_screen *screen,
}
}
 
+	if (desc->layout == UTIL_FORMAT_LAYOUT_BPTC) {
+		if (!enable_s3tc)
+			goto out_unknown;
+
+		switch (format) {
+		case PIPE_FORMAT_BPTC_RGBA_UNORM:
+		case PIPE_FORMAT_BPTC_SRGBA:
+			return V_008F14_IMG_DATA_FORMAT_BC7;
+		case PIPE_FORMAT_BPTC_RGB_FLOAT:
+		case PIPE_FORMAT_BPTC_RGB_UFLOAT:
+			return V_008F14_IMG_DATA_FORMAT_BC6;
+		default:
+			goto out_unknown;
+		}
+	}
+
 	if (desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED) {
 		switch (format) {
 		case PIPE_FORMAT_R8G8_B8G8_UNORM:
@@ -2467,12 +2483,16 @@ static struct pipe_sampler_view 
*si_create_sampler_view(struct pipe_context *ctx
case PIPE_FORMAT_DXT1_SRGBA:
case PIPE_FORMAT_DXT3_SRGBA:
case PIPE_FORMAT_DXT5_SRGBA:
+   case PIPE_FORMAT_BPTC_SRGBA:
num_format = 
V_008F14_IMG_NUM_FORMAT_SRGB;
break;
case PIPE_FORMAT_RGTC1_SNORM:
case PIPE_FORMAT_LATC1_SNORM:
case PIPE_FORMAT_RGTC2_SNORM:
case PIPE_FORMAT_LATC2_SNORM:
+   /* implies float, so use SNORM/UNORM to 
determine
+  whether data is signed or not */
+   case PIPE_FORMAT_BPTC_RGB_FLOAT:
num_format = 
V_008F14_IMG_NUM_FORMAT_SNORM;
break;
default:
-- 
1.9.1



[Mesa-dev] [PATCH] radeonsi: implement BPTC texture support

2014-07-23 Thread Grigori Goronzy
Passes corrected piglit test and should also handle signed vs unsigned
float correctly.
---
 src/gallium/drivers/radeonsi/si_state.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 3dec536..6b64e7c 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1102,6 +1102,22 @@ static uint32_t si_translate_texformat(struct 
pipe_screen *screen,
}
}
 
+	if (desc->layout == UTIL_FORMAT_LAYOUT_BPTC) {
+		if (!enable_s3tc)
+			goto out_unknown;
+
+		switch (format) {
+		case PIPE_FORMAT_BPTC_RGBA_UNORM:
+		case PIPE_FORMAT_BPTC_SRGBA_UNORM:
+			return V_008F14_IMG_DATA_FORMAT_BC7;
+		case PIPE_FORMAT_BPTC_RGB_FLOAT:
+		case PIPE_FORMAT_BPTC_RGB_UFLOAT:
+			return V_008F14_IMG_DATA_FORMAT_BC6;
+		default:
+			goto out_unknown;
+		}
+	}
+
 	if (desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED) {
 		switch (format) {
 		case PIPE_FORMAT_R8G8_B8G8_UNORM:
@@ -2467,12 +2483,16 @@ static struct pipe_sampler_view 
*si_create_sampler_view(struct pipe_context *ctx
case PIPE_FORMAT_DXT1_SRGBA:
case PIPE_FORMAT_DXT3_SRGBA:
case PIPE_FORMAT_DXT5_SRGBA:
+   case PIPE_FORMAT_BPTC_SRGBA_UNORM:
num_format = 
V_008F14_IMG_NUM_FORMAT_SRGB;
break;
case PIPE_FORMAT_RGTC1_SNORM:
case PIPE_FORMAT_LATC1_SNORM:
case PIPE_FORMAT_RGTC2_SNORM:
case PIPE_FORMAT_LATC2_SNORM:
+   /* implies float, so use SNORM/UNORM to 
determine
+  whether data is signed or not */
+   case PIPE_FORMAT_BPTC_RGB_FLOAT:
num_format = 
V_008F14_IMG_NUM_FORMAT_SNORM;
break;
default:
-- 
1.8.3.2
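
For readers who want the mapping at a glance, here is a standalone
sketch of what the two hunks do together. The enums and the function
are stand-ins invented for this example, not the real PIPE_FORMAT_*
or V_008F14_* definitions. That the remaining formats end up with
UNORM is an assumption about the default branch, which the diff
context does not show.

```c
#include <assert.h>

/* Stand-ins for the gallium formats handled by the patch. */
enum bptc_format {
	BPTC_RGBA_UNORM,
	BPTC_SRGBA_UNORM,
	BPTC_RGB_FLOAT,
	BPTC_RGB_UFLOAT
};

/* Stand-ins for the hardware data and numeric formats. */
enum data_format { DATA_FORMAT_BC6, DATA_FORMAT_BC7 };
enum num_format { NUM_FORMAT_UNORM, NUM_FORMAT_SNORM, NUM_FORMAT_SRGB };

struct tex_format {
	enum data_format data;
	enum num_format num;
};

/* BC7 covers the RGBA formats, BC6 the float formats. For BC6 the
 * data format already implies float, so the numeric format is only
 * used to tell signed (SNORM) from unsigned (UNORM) data. */
static struct tex_format translate_bptc(enum bptc_format format)
{
	struct tex_format out = { DATA_FORMAT_BC7, NUM_FORMAT_UNORM };

	switch (format) {
	case BPTC_RGBA_UNORM:
		break;
	case BPTC_SRGBA_UNORM:
		out.num = NUM_FORMAT_SRGB;
		break;
	case BPTC_RGB_FLOAT:
		out.data = DATA_FORMAT_BC6;
		out.num = NUM_FORMAT_SNORM;
		break;
	case BPTC_RGB_UFLOAT:
		out.data = DATA_FORMAT_BC6;
		break;
	default:
		assert(!"not a BPTC format");
	}
	return out;
}
```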



Re: [Mesa-dev] [PATCH 1/2] radeon/llvm: enable unsafe math for graphics shaders

2014-07-21 Thread Grigori Goronzy
On 17.07.2014 21:24, Tom Stellard wrote:
 On Thu, Jul 17, 2014 at 06:44:25PM +0200, Grigori Goronzy wrote:
 Accuracy of some operations was recently improved in the R600 backend,
 at the cost of slower code. This is required for compute shaders,
 but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
 generate faster but possibly less accurate code.

 Piglit didn't indicate any regressions.
 
 Both patches are:
 Reviewed-by: Tom Stellard thomas.stell...@amd.com


Can you please commit the patches for me? My account request is still
pending.

Grigori

 ---
  src/gallium/drivers/radeon/radeon_llvm_emit.c | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
 b/src/gallium/drivers/radeon/radeon_llvm_emit.c
 index 1b17dd4..171ccaa 100644
 --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
 +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
 @@ -26,6 +26,7 @@
  #include "radeon_llvm_emit.h"
  #include "radeon_elf_util.h"
  #include "util/u_memory.h"
 +#include "pipe/p_shader_tokens.h"
  
  #include <llvm-c/Target.h>
  #include <llvm-c/TargetMachine.h>
 @@ -50,6 +51,10 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
   sprintf(Str, "%1d", type);
  
   LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
 +
 +  if (type != TGSI_PROCESSOR_COMPUTE) {
 +    LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
 +  }
  }
  
  static void init_r600_target() {
 -- 
 1.8.3.2



Re: [Mesa-dev] [PATCH 4/5] r600g, radeonsi: Use write-combined persistent GTT mappings

2014-07-18 Thread Grigori Goronzy
On 18.07.2014 13:45, Marek Olšák wrote:
 If the requirements of GL_MAP_COHERENT_BIT are satisfied, then the
 patch is okay.


Apart from correctness, I still wonder how this will affect performance,
most notably CPU reads. This change unconditionally uses write-combined,
uncached memory for MAP_COHERENT buffers. Unless I am missing something,
CPU reads will be slow, even if the buffer storage flags indicate that
the buffer will be read by the CPU. Maybe it's a good idea to avoid
write combined memory if the buffer storage flags include MAP_READ_BIT?
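
Roughly what I have in mind, as a standalone sketch. The flag names
below are placeholders invented for this example, not the actual
gallium or winsys flags, and the real decision would live in
r600_init_resource.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder buffer storage flags, for illustration only. */
#define STORAGE_MAP_READ       (1u << 0)
#define STORAGE_MAP_WRITE      (1u << 1)
#define STORAGE_MAP_PERSISTENT (1u << 2)
#define STORAGE_MAP_COHERENT   (1u << 3)

/* Decide whether a GTT buffer should be mapped write-combined.
 * Write-combined memory is uncached, so it is avoided whenever the
 * application announced that it will read the buffer from the CPU. */
static bool want_write_combined(unsigned storage_flags)
{
	/* The caller is expected to pass at least one mapping flag. */
	assert(storage_flags != 0);

	if (storage_flags & STORAGE_MAP_READ)
		return false; /* CPU reads: keep the mapping cached */
	return true;
}
```

A write-only persistent buffer would then get write-combined memory,
while the same buffer with the read bit set would stay cached.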

Grigori

 Marek
 
 
 On Fri, Jul 18, 2014 at 5:19 AM, Michel Dänzer mic...@daenzer.net wrote:
 On 17.07.2014 21:00, Marek Olšák wrote:
 On Thu, Jul 17, 2014 at 12:01 PM, Michel Dänzer mic...@daenzer.net wrote:
 From: Michel Dänzer michel.daen...@amd.com

 This is hopefully safe: The kernel makes sure writes to these mappings
 finish before the GPU might start reading from them, and the GPU caches
 are invalidated at the start of a command stream.

 The resource flags actually tell you what you can do. If the COHERENT
 flag is set, the mapping must be cached.

 Why is that required? As I explain above, we should satisfy the
 requirements of the ARB_buffer_storage extension AFAICT.


 As pointed out by you and Grigori in other posts, I should probably just
 drop the special treatment of persistent mappings though, so the
 placement and flags are derived from the buffer usage.


 --
 Earthling Michel Dänzer|  http://www.amd.com
 Libre software enthusiast  |Mesa and X developer


Re: [Mesa-dev] [PATCH 4/5] r600g, radeonsi: Use write-combined persistent GTT mappings

2014-07-17 Thread Grigori Goronzy
On 17.07.2014 12:01, Michel Dänzer wrote:
 From: Michel Dänzer michel.daen...@amd.com
 
 This is hopefully safe: The kernel makes sure writes to these mappings
 finish before the GPU might start reading from them, and the GPU caches
 are invalidated at the start of a command stream.


Aren't CPU reads from write-combined GTT memory extraordinarily slow,
because they're uncached? And don't you need the right access patterns
to make write combining perform well?

Grigori

 Signed-off-by: Michel Dänzer michel.daen...@amd.com
 ---
  src/gallium/drivers/radeon/r600_buffer_common.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
 b/src/gallium/drivers/radeon/r600_buffer_common.c
 index 40917f0..c8a0723 100644
 --- a/src/gallium/drivers/radeon/r600_buffer_common.c
 +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
 @@ -131,7 +131,7 @@ bool r600_init_resource(struct r600_common_screen 
 *rscreen,
   res->b.b.flags & (PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
 PIPE_RESOURCE_FLAG_MAP_COHERENT)) {
   res->domains = RADEON_DOMAIN_GTT;
 - flags = 0;
 + flags = RADEON_FLAG_GTT_WC;
   }
  
   /* Tiled textures are unmappable. Always put them in VRAM. */
 






[Mesa-dev] [PATCH 1/2] radeon/llvm: enable unsafe math for graphics shaders

2014-07-17 Thread Grigori Goronzy
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.

Piglit didn't indicate any regressions.
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 1b17dd4..171ccaa 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -26,6 +26,7 @@
 #include "radeon_llvm_emit.h"
 #include "radeon_elf_util.h"
 #include "util/u_memory.h"
+#include "pipe/p_shader_tokens.h"
 
 #include <llvm-c/Target.h>
 #include <llvm-c/TargetMachine.h>
@@ -50,6 +51,10 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
   sprintf(Str, "%1d", type);
 
   LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
+
+  if (type != TGSI_PROCESSOR_COMPUTE) {
+    LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
+  }
 }
 
 static void init_r600_target() {
-- 
1.8.3.2



[Mesa-dev] [PATCH 2/2] radeon/llvm: fix formatting

2014-07-17 Thread Grigori Goronzy
Use K&R style and the same indent as most other code. No functional
change intended.
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 171ccaa..53694b7 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -47,17 +47,18 @@
  */
 void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
 {
-  char Str[2];
-  sprintf(Str, "%1d", type);
+	char Str[2];
+	sprintf(Str, "%1d", type);
 
-  LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
+	LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
 
-  if (type != TGSI_PROCESSOR_COMPUTE) {
-    LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
-  }
+	if (type != TGSI_PROCESSOR_COMPUTE) {
+		LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
+	}
 }
 
-static void init_r600_target() {
+static void init_r600_target()
+{
static unsigned initialized = 0;
if (!initialized) {
LLVMInitializeR600TargetInfo();
@@ -68,7 +69,8 @@ static void init_r600_target() {
}
 }
 
-static LLVMTargetRef get_r600_target() {
+static LLVMTargetRef get_r600_target()
+{
LLVMTargetRef target = NULL;
 
for (target = LLVMGetFirstTarget(); target;
@@ -87,7 +89,8 @@ static LLVMTargetRef get_r600_target() {
 
 #if HAVE_LLVM >= 0x0305
 
-static void radeonDiagnosticHandler(LLVMDiagnosticInfoRef di, void *context) {
+static void radeonDiagnosticHandler(LLVMDiagnosticInfoRef di, void *context)
+{
if (LLVMGetDiagInfoSeverity(di) == LLVMDSError) {
unsigned int *diagnosticflag = (unsigned int *)context;
char *diaginfo_message = LLVMGetDiagInfoDescription(di);
@@ -106,7 +109,8 @@ static void radeonDiagnosticHandler(LLVMDiagnosticInfoRef 
di, void *context) {
  * @returns 0 for success, 1 for failure
  */
 unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_shader_binary 
*binary,
- const char * gpu_family, unsigned 
dump) {
+ const char *gpu_family, unsigned dump)
+{
 
LLVMTargetRef target;
LLVMTargetMachineRef tm;
-- 
1.8.3.2



Re: [Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-07-02 Thread Grigori Goronzy
On 02.07.2014 22:18, Andy Furniss wrote:
 
 Before I knew how to get field sync to use my TVs deinterlacer I had to
 modify mesa so that I could use the vdpau de-interlacer(s), when I did
 this I noticed that 422 didn't work and looked the same as it does now
 this has gone in with my si.


Are you trying to use the temporal deinterlacer with 4:2:2? Sorry, that
isn't implemented yet! The temporal deinterlacer pass always outputs 4:2:0.

Best regards
Grigori









Re: [Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-06-18 Thread Grigori Goronzy
On 18.06.2014 13:11, Christian König wrote:
 @Grigori: Should I push it or did you got your account in the meantime?


No account yet. I wonder what's going on. Please push.

Best regards
Grigori

 Christian.
 
 Am 17.06.2014 22:26, schrieb Marek Olšák:
 This looks good to me.

 Reviewed-by: Marek Olšák marek.ol...@amd.com

 Marek

 On Wed, Jun 4, 2014 at 6:54 PM, Grigori Goronzy g...@chown.ath.cx
 wrote:
 This makes 4:2:2 video surfaces work in VDPAU.
 ---
   src/gallium/drivers/radeon/r600_texture.c |  5 +-
   src/gallium/drivers/radeonsi/si_blit.c| 91
 ++-
   src/gallium/drivers/radeonsi/si_state.c   | 15 +
   3 files changed, 71 insertions(+), 40 deletions(-)

 diff --git a/src/gallium/drivers/radeon/r600_texture.c
 b/src/gallium/drivers/radeon/r600_texture.c
 index 3a37465..a20b0c8 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -737,9 +737,8 @@ static unsigned r600_choose_tiling(struct
 r600_common_screen *rscreen,
   * Compressed textures must always be tiled. */
  if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
  !util_format_is_compressed(templ->format)) {
 -   /* Tiling doesn't work with the 422 (SUBSAMPLED)
 formats on R600-Cayman. */
 -   if (rscreen->chip_class <= CAYMAN &&
 -   desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED)
 +   /* Tiling doesn't work with the 422 (SUBSAMPLED)
 formats on R600+. */
 +   if (desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED)
  return RADEON_SURF_MODE_LINEAR_ALIGNED;

  /* Cursors are linear on SI.
 diff --git a/src/gallium/drivers/radeonsi/si_blit.c
 b/src/gallium/drivers/radeonsi/si_blit.c
 index e02615f..8c3e136 100644
 --- a/src/gallium/drivers/radeonsi/si_blit.c
 +++ b/src/gallium/drivers/radeonsi/si_blit.c
 @@ -548,46 +548,63 @@ static void si_resource_copy_region(struct
 pipe_context *ctx,
  dstx = util_format_get_nblocksx(orig_info[1].format,
 dstx);
  dsty = util_format_get_nblocksy(orig_info[1].format,
 dsty);
  } else if (!util_blitter_is_copy_supported(sctx->blitter,
 dst, src)) {
 -   unsigned blocksize =
 util_format_get_blocksize(src->format);
 -
 -   switch (blocksize) {
 -   case 1:
 -   si_change_format(src, src_level, orig_info[0],
 -PIPE_FORMAT_R8_UNORM);
 -   si_change_format(dst, dst_level, orig_info[1],
 -PIPE_FORMAT_R8_UNORM);
 -   break;
 -   case 2:
 -   si_change_format(src, src_level, orig_info[0],
 -PIPE_FORMAT_R8G8_UNORM);
 -   si_change_format(dst, dst_level, orig_info[1],
 -PIPE_FORMAT_R8G8_UNORM);
 -   break;
 -   case 4:
 -   si_change_format(src, src_level, orig_info[0],
 -PIPE_FORMAT_R8G8B8A8_UNORM);
 -   si_change_format(dst, dst_level, orig_info[1],
 -PIPE_FORMAT_R8G8B8A8_UNORM);
 -   break;
 -   case 8:
 -   si_change_format(src, src_level, orig_info[0],
 -PIPE_FORMAT_R16G16B16A16_UINT);
 -   si_change_format(dst, dst_level, orig_info[1],
 -PIPE_FORMAT_R16G16B16A16_UINT);
 -   break;
 -   case 16:
 +   if (util_format_is_subsampled_422(src->format)) {
 +   /* XXX untested */
  si_change_format(src, src_level, orig_info[0],
 -PIPE_FORMAT_R32G32B32A32_UINT);
 +PIPE_FORMAT_R8G8B8A8_UINT);
  si_change_format(dst, dst_level, orig_info[1],
 -PIPE_FORMAT_R32G32B32A32_UINT);
 -   break;
 -   default:
 -   fprintf(stderr, "Unhandled format %s with blocksize %u\n",
 -   util_format_short_name(src->format), blocksize);
 -   assert(0);
 +PIPE_FORMAT_R8G8B8A8_UINT);
 +
 +   sbox = *src_box;
 +   sbox.x =
 util_format_get_nblocksx(orig_info[0].format, src_box->x);
 +   sbox.width =
 util_format_get_nblocksx(orig_info[0].format, src_box->width);
 +   src_box = &sbox;
 +   dstx =
 util_format_get_nblocksx(orig_info[1].format, dstx);
 +
 +   restore_orig[0] = TRUE;
 +   restore_orig[1

Re: [Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-06-17 Thread Grigori Goronzy
Ping? I'm not sure if this is completely correct, but this code path is
only exercised by VDPAU and it seems to work fine on SI.

Grigori

On 04.06.2014 18:54, Grigori Goronzy wrote:
 This makes 4:2:2 video surfaces work in VDPAU.
 ---
  src/gallium/drivers/radeon/r600_texture.c |  5 +-
  src/gallium/drivers/radeonsi/si_blit.c| 91 
 ++-
  src/gallium/drivers/radeonsi/si_state.c   | 15 +
  3 files changed, 71 insertions(+), 40 deletions(-)
 
 diff --git a/src/gallium/drivers/radeon/r600_texture.c 
 b/src/gallium/drivers/radeon/r600_texture.c
 index 3a37465..a20b0c8 100644
 --- a/src/gallium/drivers/radeon/r600_texture.c
 +++ b/src/gallium/drivers/radeon/r600_texture.c
 @@ -737,9 +737,8 @@ static unsigned r600_choose_tiling(struct 
 r600_common_screen *rscreen,
* Compressed textures must always be tiled. */
   if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
   !util_format_is_compressed(templ->format)) {
 - /* Tiling doesn't work with the 422 (SUBSAMPLED) formats on 
 R600-Cayman. */
 - if (rscreen->chip_class <= CAYMAN &&
 - desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED)
 + /* Tiling doesn't work with the 422 (SUBSAMPLED) formats on 
 R600+. */
 + if (desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED)
   return RADEON_SURF_MODE_LINEAR_ALIGNED;
  
   /* Cursors are linear on SI.
 diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
 b/src/gallium/drivers/radeonsi/si_blit.c
 index e02615f..8c3e136 100644
 --- a/src/gallium/drivers/radeonsi/si_blit.c
 +++ b/src/gallium/drivers/radeonsi/si_blit.c
 @@ -548,46 +548,63 @@ static void si_resource_copy_region(struct pipe_context 
 *ctx,
   dstx = util_format_get_nblocksx(orig_info[1].format, dstx);
   dsty = util_format_get_nblocksy(orig_info[1].format, dsty);
   } else if (!util_blitter_is_copy_supported(sctx->blitter, dst, src)) {
 - unsigned blocksize = util_format_get_blocksize(src->format);
 -
 - switch (blocksize) {
 - case 1:
 - si_change_format(src, src_level, orig_info[0],
 -  PIPE_FORMAT_R8_UNORM);
 - si_change_format(dst, dst_level, orig_info[1],
 -  PIPE_FORMAT_R8_UNORM);
 - break;
 - case 2:
 - si_change_format(src, src_level, orig_info[0],
 -  PIPE_FORMAT_R8G8_UNORM);
 - si_change_format(dst, dst_level, orig_info[1],
 -  PIPE_FORMAT_R8G8_UNORM);
 - break;
 - case 4:
 - si_change_format(src, src_level, orig_info[0],
 -  PIPE_FORMAT_R8G8B8A8_UNORM);
 - si_change_format(dst, dst_level, orig_info[1],
 -  PIPE_FORMAT_R8G8B8A8_UNORM);
 - break;
 - case 8:
 - si_change_format(src, src_level, orig_info[0],
 -  PIPE_FORMAT_R16G16B16A16_UINT);
 - si_change_format(dst, dst_level, orig_info[1],
 -  PIPE_FORMAT_R16G16B16A16_UINT);
 - break;
 - case 16:
 + if (util_format_is_subsampled_422(src->format)) {
 + /* XXX untested */
   si_change_format(src, src_level, orig_info[0],
 -  PIPE_FORMAT_R32G32B32A32_UINT);
 +  PIPE_FORMAT_R8G8B8A8_UINT);
   si_change_format(dst, dst_level, orig_info[1],
 -  PIPE_FORMAT_R32G32B32A32_UINT);
 - break;
 - default:
 - fprintf(stderr, "Unhandled format %s with blocksize %u\n",
 - util_format_short_name(src->format), blocksize);
 - assert(0);
 +  PIPE_FORMAT_R8G8B8A8_UINT);
 +
 + sbox = *src_box;
 + sbox.x = util_format_get_nblocksx(orig_info[0].format, 
 src_box->x);
 + sbox.width = 
 util_format_get_nblocksx(orig_info[0].format, src_box->width);
 + src_box = &sbox;
 + dstx = util_format_get_nblocksx(orig_info[1].format, 
 dstx);
 +
 + restore_orig[0] = TRUE;
 + restore_orig[1] = TRUE;
 + } else {
 + unsigned blocksize = 
 util_format_get_blocksize(src->format);
 +
 + switch (blocksize) {
 + case 1:
 + si_change_format(src, src_level, orig_info[0],
 + PIPE_FORMAT_R8_UNORM

[Mesa-dev] [PATCH 1/3] util/u_format: move utility function from r600g

2014-06-04 Thread Grigori Goronzy
We need this for radeonsi, and it might be useful for other drivers,
too.
---
 src/gallium/auxiliary/util/u_format.c | 11 +++
 src/gallium/auxiliary/util/u_format.h |  3 +++
 src/gallium/drivers/r600/r600_blit.c  | 12 +---
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format.c 
b/src/gallium/auxiliary/util/u_format.c
index 056f82f..a53ed6f 100644
--- a/src/gallium/auxiliary/util/u_format.c
+++ b/src/gallium/auxiliary/util/u_format.c
@@ -187,6 +187,17 @@ util_format_is_intensity(enum pipe_format format)
return FALSE;
 }
 
+boolean
+util_format_is_subsampled_422(enum pipe_format format)
+{
+   const struct util_format_description *desc =
+  util_format_description(format);
+
+   return desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED &&
+          desc->block.width == 2 &&
+          desc->block.height == 1 &&
+          desc->block.bits == 32;
+}
 
 boolean
 util_format_is_supported(enum pipe_format format, unsigned bind)
diff --git a/src/gallium/auxiliary/util/u_format.h 
b/src/gallium/auxiliary/util/u_format.h
index 1dd5d52..2e2bf02 100644
--- a/src/gallium/auxiliary/util/u_format.h
+++ b/src/gallium/auxiliary/util/u_format.h
@@ -664,6 +664,9 @@ boolean
 util_format_is_intensity(enum pipe_format format);
 
 boolean
+util_format_is_subsampled_422(enum pipe_format format);
+
+boolean
 util_format_is_pure_integer(enum pipe_format format);
 
 boolean
diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index 3269c47..962be60 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -563,16 +563,6 @@ static void r600_clear_buffer(struct pipe_context *ctx, 
struct pipe_resource *ds
}
 }
 
-static bool util_format_is_subsampled_2x1_32bpp(enum pipe_format format)
-{
-   const struct util_format_description *desc = 
util_format_description(format);
-
-   return desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED &&
-          desc->block.width == 2 &&
-          desc->block.height == 1 &&
-          desc->block.bits == 32;
-}
-
 static void r600_resource_copy_region(struct pipe_context *ctx,
  struct pipe_resource *dst,
  unsigned dst_level,
@@ -647,7 +637,7 @@ static void r600_resource_copy_region(struct pipe_context 
*ctx,
 
src_force_level = src_level;
 } else if (!util_blitter_is_copy_supported(rctx->blitter, dst, src)) {
-   if (util_format_is_subsampled_2x1_32bpp(src->format)) {
+   if (util_format_is_subsampled_422(src->format)) {
 
src_templ.format = PIPE_FORMAT_R8G8B8A8_UINT;
dst_templ.format = PIPE_FORMAT_R8G8B8A8_UINT;
-- 
1.8.3.2



[Mesa-dev] [PATCH 2/3] radeonsi: add sampling of 4:2:2 subsampled textures

2014-06-04 Thread Grigori Goronzy
This makes 4:2:2 video surfaces work in VDPAU.
---
 src/gallium/drivers/radeon/r600_texture.c |  5 +-
 src/gallium/drivers/radeonsi/si_blit.c| 91 ++-
 src/gallium/drivers/radeonsi/si_state.c   | 15 +
 3 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index 3a37465..a20b0c8 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -737,9 +737,8 @@ static unsigned r600_choose_tiling(struct 
r600_common_screen *rscreen,
 * Compressed textures must always be tiled. */
if (!(templ->flags & R600_RESOURCE_FLAG_FORCE_TILING) &&
!util_format_is_compressed(templ->format)) {
-   /* Tiling doesn't work with the 422 (SUBSAMPLED) formats on 
R600-Cayman. */
-   if (rscreen->chip_class <= CAYMAN &&
-   desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED)
+   /* Tiling doesn't work with the 422 (SUBSAMPLED) formats on 
R600+. */
+   if (desc->layout == UTIL_FORMAT_LAYOUT_SUBSAMPLED)
return RADEON_SURF_MODE_LINEAR_ALIGNED;
 
/* Cursors are linear on SI.
diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
b/src/gallium/drivers/radeonsi/si_blit.c
index e02615f..8c3e136 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -548,46 +548,63 @@ static void si_resource_copy_region(struct pipe_context 
*ctx,
dstx = util_format_get_nblocksx(orig_info[1].format, dstx);
dsty = util_format_get_nblocksy(orig_info[1].format, dsty);
} else if (!util_blitter_is_copy_supported(sctx->blitter, dst, src)) {
-   unsigned blocksize = util_format_get_blocksize(src->format);
-
-   switch (blocksize) {
-   case 1:
-   si_change_format(src, src_level, &orig_info[0],
-PIPE_FORMAT_R8_UNORM);
-   si_change_format(dst, dst_level, &orig_info[1],
-PIPE_FORMAT_R8_UNORM);
-   break;
-   case 2:
-   si_change_format(src, src_level, &orig_info[0],
-PIPE_FORMAT_R8G8_UNORM);
-   si_change_format(dst, dst_level, &orig_info[1],
-PIPE_FORMAT_R8G8_UNORM);
-   break;
-   case 4:
-   si_change_format(src, src_level, &orig_info[0],
-PIPE_FORMAT_R8G8B8A8_UNORM);
-   si_change_format(dst, dst_level, &orig_info[1],
-PIPE_FORMAT_R8G8B8A8_UNORM);
-   break;
-   case 8:
-   si_change_format(src, src_level, &orig_info[0],
-PIPE_FORMAT_R16G16B16A16_UINT);
-   si_change_format(dst, dst_level, &orig_info[1],
-PIPE_FORMAT_R16G16B16A16_UINT);
-   break;
-   case 16:
+   if (util_format_is_subsampled_422(src->format)) {
+   /* XXX untested */
si_change_format(src, src_level, &orig_info[0],
-PIPE_FORMAT_R32G32B32A32_UINT);
+PIPE_FORMAT_R8G8B8A8_UINT);
si_change_format(dst, dst_level, &orig_info[1],
-PIPE_FORMAT_R32G32B32A32_UINT);
-   break;
-   default:
-   fprintf(stderr, "Unhandled format %s with blocksize %u\n",
-   util_format_short_name(src->format), blocksize);
-   assert(0);
+PIPE_FORMAT_R8G8B8A8_UINT);
+
+   sbox = *src_box;
+   sbox.x = util_format_get_nblocksx(orig_info[0].format, 
src_box->x);
+   sbox.width = 
util_format_get_nblocksx(orig_info[0].format, src_box->width);
+   src_box = &sbox;
+   dstx = util_format_get_nblocksx(orig_info[1].format, 
dstx);
+
+   restore_orig[0] = TRUE;
+   restore_orig[1] = TRUE;
+   } else {
+   unsigned blocksize = 
util_format_get_blocksize(src->format);
+
+   switch (blocksize) {
+   case 1:
+   si_change_format(src, src_level, &orig_info[0],
+   PIPE_FORMAT_R8_UNORM);
+   si_change_format(dst, dst_level, &orig_info[1],
+   PIPE_FORMAT_R8_UNORM);
+ 

[Mesa-dev] [PATCH 3/3] radeon/uvd: disable VC-1 simple/main on UVD 2.x

2014-06-04 Thread Grigori Goronzy
It's about as broken as on later UVD revisions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66452
Cc: 10.1 10.2 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/radeon/radeon_video.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_video.c 
b/src/gallium/drivers/radeon/radeon_video.c
index 63bd805..eae533e 100644
--- a/src/gallium/drivers/radeon/radeon_video.c
+++ b/src/gallium/drivers/radeon/radeon_video.c
@@ -242,7 +242,10 @@ int rvid_get_video_param(struct pipe_screen *screen,
switch (param) {
case PIPE_VIDEO_CAP_SUPPORTED:
/* no support for MPEG4 */
-   return codec != PIPE_VIDEO_FORMAT_MPEG4;
+   return codec != PIPE_VIDEO_FORMAT_MPEG4 &&
+  /* FIXME: VC-1 simple/main profile is broken */
+  profile != PIPE_VIDEO_PROFILE_VC1_SIMPLE &&
+  profile != PIPE_VIDEO_PROFILE_VC1_MAIN;
case PIPE_VIDEO_CAP_PREFERS_INTERLACED:
case PIPE_VIDEO_CAP_SUPPORTS_INTERLACED:
/* and MPEG2 only with shaders */
-- 
1.8.3.2


Re: [Mesa-dev] The way r600g handles shaders that use more than available GPRs

2014-04-20 Thread Grigori Goronzy

On 20.04.2014 03:02, Marek Olšák wrote:

It looks like the check is not needed with SB, because SB performs
register allocation. What happens if you comment out the conditional
which fails?



SB takes the machine code generated by the classic compiler as input, 
so the check is still needed. The best solution for this problem would 
be to integrate Vadim's tgsi-to-sb branch, which goes directly from TGSI 
to SB's internal representation, without the classic compiler as a 
middle man.


As far as I know, even with SB there is no spilling implemented, but it 
should only be a problem with really crazy shaders. SB optimizes 
register usage quite well.


Grigori


Marek

On Sun, Apr 20, 2014 at 1:30 AM, Marcello Maggioni haya...@gmail.com wrote:

Hello,

I realized while playing Diablo III on my machine that some shaders seem to
run out of available GPRs using r600g on my Macbook Pro with a HD6750m.
I looked for whether the driver tries to do something to handle this case,
but I couldn't find any part of the code that has to do with spilling.

There are some register-related passes in SB, but none seems to be related
to possible spilling (anyway, the failure I get is at r600_shader.c:2148
inside r600_shader_from_tgsi(), which makes shader compilation fail
altogether, skipping SB).

How does r600g handle out-of-register situations? If it doesn't, are there
plans to add this?

Cheers,
Marcello


Re: [Mesa-dev] [RFC] r600g/radeonsi: Use caching buffer manager for textures as well

2014-04-10 Thread Grigori Goronzy

On 10.04.2014 11:23, Michel Dänzer wrote:

From: Michel Dänzer michel.daen...@amd.com

---

This is just an RFC; if other developers approve of this approach, I can
make a more extensive patch removing the use_reusable_pool parameters.

The x11perf numbers below compare ShmGet/PutImage before and after this
change with glamor from keithp's glamor-server xserver branch on a Kaveri
APU. The change also reduces the total runtime of the gtkperf tests from
about 3.5s to about 3s.



I have done some similar experiments. I also see a noticeable speedup 
with glamor. AFAIR Marek had some objections because texture buffers can 
be quite large and too many and too big buffers might end up congesting 
the cache for a long time.


Maybe it makes sense to only use caching for texture BOs smaller than a 
certain threshold? I have such a change in my Mesa tree and it seems to 
work well. I cache all texture BOs <= 512KB.
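
To make the idea concrete, here is a minimal sketch of such a size cut-off. It assumes the threshold is "at or below 512KB" as in the figure quoted above; the function and macro names are hypothetical and do not exist in Mesa, and the result would simply be fed to whatever decides between the reusable pool and a fresh allocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cut-off, taken from the 512KB figure mentioned in the mail. */
#define SKETCH_TEX_CACHE_MAX_SIZE ((size_t)512 * 1024)

/* Decide whether a texture BO of the given size should come from the
 * caching (reusable) buffer pool. Small BOs are reused; larger ones
 * bypass the cache so they cannot congest it for a long time. */
static bool sketch_use_reusable_pool(size_t bo_size)
{
   return bo_size <= SKETCH_TEX_CACHE_MAX_SIZE;
}
```

The trade-off is that large uploads still pay the allocation cost, while the many small, short-lived textures typical of glamor workloads get the benefit of reuse.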


Best regards
Grigori


1: x11perf-baseline.txt
2: x11perf-caching.txt

      1          2              Operation
-------   --------   --------   --------------------------
 9070.0    48500.0   (  5.35)   ShmPutImage 10x10 square
 5670.0    27700.0   (  4.89)   ShmPutImage 100x100 square
  758.0     2350.0   (  3.10)   ShmPutImage 500x500 square
 3600.0     5360.0   (  1.49)   ShmGetImage 10x10 square
 2960.0     5720.0   (  1.93)   ShmGetImage 100x100 square
  346.0     1140.0   (  3.29)   ShmGetImage 500x500 square

  src/gallium/drivers/radeon/r600_texture.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c
index e39b9ec..293eeaa 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -633,7 +633,7 @@ r600_texture_create_object(struct pipe_screen *screen,
/* Now create the backing buffer. */
if (!buf) {
if (!r600_init_resource(rscreen, resource, rtex->size,
-   rtex->surface.bo_alignment, FALSE)) {
+   rtex->surface.bo_alignment, TRUE)) {
FREE(rtex);
return NULL;
}



[Mesa-dev] [PATCH 1/2] st/vdpau: fix possible NULL dereference

2014-03-02 Thread Grigori Goronzy
---
 src/gallium/state_trackers/vdpau/mixer.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/vdpau/mixer.c 
b/src/gallium/state_trackers/vdpau/mixer.c
index 996fd8e..e6bfb8c 100644
--- a/src/gallium/state_trackers/vdpau/mixer.c
+++ b/src/gallium/state_trackers/vdpau/mixer.c
@@ -242,16 +242,16 @@ VdpStatus vlVdpVideoMixerRender(VdpVideoMixer mixer,
compositor = vmixer->device->compositor;
 
surf = vlGetDataHTAB(video_surface_current);
-   video_buffer = surf->video_buffer;
if (!surf)
   return VDP_STATUS_INVALID_HANDLE;
+   video_buffer = surf->video_buffer;
 
if (surf->device != vmixer->device)
   return VDP_STATUS_HANDLE_DEVICE_MISMATCH;
 
-   if (vmixer->video_width > surf->video_buffer->width ||
-   vmixer->video_height > surf->video_buffer->height ||
-   vmixer->chroma_format != surf->video_buffer->chroma_format)
+   if (vmixer->video_width > video_buffer->width ||
+   vmixer->video_height > video_buffer->height ||
+   vmixer->chroma_format != video_buffer->chroma_format)
   return VDP_STATUS_INVALID_SIZE;
 
if (layer_count > vmixer->max_layers)
-- 
1.8.3.2
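
The ordering issue fixed above can be shown in isolation. This is a minimal sketch with hypothetical `sketch_*` types standing in for the state tracker's surface and video buffer structs; the size comparison direction is an assumption, since the comparison operators did not survive in the archived patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the state-tracker types. */
struct sketch_video_buffer { unsigned width, height; };
struct sketch_surface { struct sketch_video_buffer *video_buffer; };

enum sketch_status {
   SKETCH_OK,
   SKETCH_INVALID_HANDLE,
   SKETCH_INVALID_SIZE
};

static enum sketch_status
sketch_check_surface(const struct sketch_surface *surf,
                     unsigned mixer_width, unsigned mixer_height)
{
   const struct sketch_video_buffer *video_buffer;

   /* Validate the handle lookup result before touching any member. */
   if (!surf)
      return SKETCH_INVALID_HANDLE;
   video_buffer = surf->video_buffer;

   /* Assumed direction: a mixer larger than the surface is rejected. */
   if (mixer_width > video_buffer->width ||
       mixer_height > video_buffer->height)
      return SKETCH_INVALID_SIZE;

   return SKETCH_OK;
}
```

Reading `surf->video_buffer` only after the NULL test is the whole point; caching it in a local also shortens the later comparisons.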


[Mesa-dev] [PATCH 2/2] NV_vdpau_interop: fix IsSurfaceNV return type

2014-03-02 Thread Grigori Goronzy
The spec incorrectly used void as return type, when it should have
been GLboolean. This has now been fixed. According to Nvidia, their
implementation always used GLboolean.
---
 include/GL/glext.h  | 2 +-
 src/mapi/glapi/gen/NV_vdpau_interop.xml | 1 +
 src/mesa/main/vdpau.c   | 9 +
 src/mesa/main/vdpau.h   | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/GL/glext.h b/include/GL/glext.h
index 7d6033e..62bae4c 100644
--- a/include/GL/glext.h
+++ b/include/GL/glext.h
@@ -9658,7 +9658,7 @@ GLAPI void APIENTRY glVDPAUInitNV (const void *vdpDevice, 
const void *getProcAdd
 GLAPI void APIENTRY glVDPAUFiniNV (void);
 GLAPI GLvdpauSurfaceNV APIENTRY glVDPAURegisterVideoSurfaceNV (const void 
*vdpSurface, GLenum target, GLsizei numTextureNames, const GLuint 
*textureNames);
 GLAPI GLvdpauSurfaceNV APIENTRY glVDPAURegisterOutputSurfaceNV (const void 
*vdpSurface, GLenum target, GLsizei numTextureNames, const GLuint 
*textureNames);
-GLAPI void APIENTRY glVDPAUIsSurfaceNV (GLvdpauSurfaceNV surface);
+GLAPI GLboolean APIENTRY glVDPAUIsSurfaceNV (GLvdpauSurfaceNV surface);
 GLAPI void APIENTRY glVDPAUUnregisterSurfaceNV (GLvdpauSurfaceNV surface);
 GLAPI void APIENTRY glVDPAUGetSurfaceivNV (GLvdpauSurfaceNV surface, GLenum 
pname, GLsizei bufSize, GLsizei *length, GLint *values);
 GLAPI void APIENTRY glVDPAUSurfaceAccessNV (GLvdpauSurfaceNV surface, GLenum 
access);
diff --git a/src/mapi/glapi/gen/NV_vdpau_interop.xml 
b/src/mapi/glapi/gen/NV_vdpau_interop.xml
index cf5f0ed..0b19e1a 100644
--- a/src/mapi/glapi/gen/NV_vdpau_interop.xml
+++ b/src/mapi/glapi/gen/NV_vdpau_interop.xml
@@ -29,6 +29,7 @@
 </function>
 
 <function name="VDPAUIsSurfaceNV" offset="assign">
+<return type="GLboolean"/>
<param name="surface" type="GLintptr"/>
 </function>
 
diff --git a/src/mesa/main/vdpau.c b/src/mesa/main/vdpau.c
index 3597576..c2cf206 100644
--- a/src/mesa/main/vdpau.c
+++ b/src/mesa/main/vdpau.c
@@ -205,7 +205,7 @@ _mesa_VDPAURegisterOutputSurfaceNV(const GLvoid 
*vdpSurface, GLenum target,
numTextureNames, textureNames);
 }
 
-void GLAPIENTRY
+GLboolean GLAPIENTRY
 _mesa_VDPAUIsSurfaceNV(GLintptr surface)
 {
struct vdp_surface *surf = (struct vdp_surface *)surface;
@@ -213,13 +213,14 @@ _mesa_VDPAUIsSurfaceNV(GLintptr surface)
 
if (!ctx->vdpDevice || !ctx->vdpGetProcAddress || !ctx->vdpSurfaces) {
   _mesa_error(ctx, GL_INVALID_OPERATION, "VDPAUIsSurfaceNV");
-  return;
+  return false;
}
 
if (!_mesa_set_search(ctx->vdpSurfaces, _mesa_hash_pointer(surf), surf)) {
-  _mesa_error(ctx, GL_INVALID_VALUE, "VDPAUIsSurfaceNV");
-  return;
+  return false;
}
+
+   return true;
 }
 
 void GLAPIENTRY
diff --git a/src/mesa/main/vdpau.h b/src/mesa/main/vdpau.h
index f32d6da..627609c 100644
--- a/src/mesa/main/vdpau.h
+++ b/src/mesa/main/vdpau.h
@@ -50,7 +50,7 @@ _mesa_VDPAURegisterOutputSurfaceNV(const GLvoid *vdpSurface, 
GLenum target,
GLsizei numTextureNames,
const GLuint *textureNames);
 
-extern void GLAPIENTRY
+extern GLboolean GLAPIENTRY
 _mesa_VDPAUIsSurfaceNV(GLintptr surface);
 
 extern void GLAPIENTRY
-- 
1.8.3.2
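
As a rough illustration of the behaviour after this change, the query becomes a plain membership test that reports its answer through the return value. The sketch below uses a hypothetical context struct and a linear search instead of Mesa's set API, and it leaves out the GL error that the real function still raises when VDPAU interop has not been initialized.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-context VDPAU interop state. */
struct sketch_vdpau_ctx {
   const void **surfaces;   /* registered surface handles */
   size_t count;
   bool initialized;        /* set once interop has been initialized */
};

/* Returns true only if interop is initialized and the handle is known. */
static bool sketch_is_surface(const struct sketch_vdpau_ctx *ctx,
                              const void *surf)
{
   size_t i;

   if (!ctx->initialized)
      return false;   /* the real code also records GL_INVALID_OPERATION */

   for (i = 0; i < ctx->count; i++) {
      if (ctx->surfaces[i] == surf)
         return true;
   }

   /* An unknown handle is not an error for an "is" query. */
   return false;
}
```

Callers can then branch on the result directly instead of polling the GL error state.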

