[Mesa-dev] [PATCH] doc: break down ARB_direct_state_access in GL3.txt
A student was wondering what was going on + I started working on it too. CC: Laura Ekstrand la...@jlekstrand.net Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- docs/GL3.txt | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 648f5ac..2dc1404 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -191,7 +191,16 @@ GL 4.5, GLSL 4.50: GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, llvmpipe, softpipe) GL_ARB_cull_distance not started GL_ARB_derivative_controlDONE (i965, nv50, nvc0, r600) - GL_ARB_direct_state_access started (Laura Ekstrand) + GL_ARB_direct_state_access started + - Transform Feedback object started (Martin Peres) + - Buffer object started (Laura Ekstrand) + - Framebuffer object started (Laura Ekstrand) + - Renderbuffer objectnot started + - Texture object started (Laura Ekstrand) + - Vertex array objectnot started + - Sampler object not started + - Program Pipeline objectnot started + - Query object started (Martin Peres) GL_ARB_get_texture_sub_image started (Brian Paul) GL_ARB_shader_texture_image_samples not started GL_ARB_texture_barrier DONE (nv50, nvc0, r300, r600, radeonsi) -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Add new fast mtx_t mutex type for basic use cases
On Thu, Jan 29, 2015 at 6:36 AM, Emil Velikov emil.l.veli...@gmail.com wrote: On 28/01/15 05:08, Kristian Høgsberg wrote: While modern pthread mutexes are very fast, they still incur a call to an external DSO and overhead of the generality and features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, and the idea here is that we can inline the atomic operation and make the fast case just two intructions. Mutexes are subtle and finicky to implement, so we carefully copy the implementation from Ulrich Dreppers well-written and well-reviewed paper: Futexes Are Tricky http://www.akkadia.org/drepper/futex.pdf We implement mutex3, which gives us a mutex that has no syscalls on uncontended lock or unlock. Further, the uncontended case boils down to a cmpxchg and an untaken branch and the uncontended unlock is just a locked decr and an untaken branch. We use __builtin_expect() to indicate that contention is unlikely so that gcc will put the contention code out of the main code flow. A fast mutex only supports lock/unlock, can't be recursive or used with condition variables. We keep the pthread mutex implementation around as full_mtx_t for the few places where we use condition variables or recursive locking. For platforms or compilers where futex and atomics aren't available, mtx_t falls back to the pthread mutex. The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound applications. Most CPU bound cases are helped and some of our internal bind_buffer_object heavy benchmarks gain up to 10%. Hi Kristian, Can I humbly ask that you split this into two patches - one that introduces the new functions/struct and another one that uses them ? This way it'll be easier if/when things go crazy. Also the patch seems to wonder between posix and win32 + typedef full_mtx_t mtx_t; and + typedef mtx_t fast_mtx_t; Looks like a left over from the should I rename XX variables to fast* or just one to full* moment :) Yeah, that's how it progressed :) At first I called it fast_mtx_t and planned on replacing simple uses of mtx_t one by one. Jordan suggested that it'd be easier to make the regular mutex fast and then rename the couple of places that use more feature than we provide. I just botched the windows side when I did that. However, given that the patch now reimplements mtx_t, there's really no good way to split up introducing the fast mutex and making use of it in multiple patches. I guess we can add full_mtx_t in a separate patch, use it in a second patch and then finally reimplement mtx_t as a fast mutex. How does that sound? Kristian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/skl: Force a BINDING_TABLE_POINTER_* command after push constant command
According to the SkyLake bspec the 3DSTATE_CONSTANT_* commands only take effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_* command. This patch just makes it set the BRW_NEW_SURFACES state when uploading the push constants to ensure the binding tables will be updated. This fixes the fbo-blending-formats Piglit test and possibly others. Reviewed-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/gen7_vs_state.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c index 404dd20..c55d6ca 100644 --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c @@ -60,6 +60,13 @@ gen7_upload_constant_state(struct brw_context *brw, } ADVANCE_BATCH(); + + /* On SkyLake+ the new constants don't take effect until the next + * corresponding 3DSTATE_BINDING_TABLE_POINTER_* command is parsed so we + * need to ensure that is sent + */ + if (brw-gen = 9) + brw-state.dirty.brw |= BRW_NEW_SURFACES; } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Add a better PRM citation for the IMS dimension mangling.
Paul originally had to reverse engineer these formulas based on the description about how the sampler works. The description here is not the easiest to follow - especially given that it's from the Sandybridge era, when the hardware only did 4x multisampling. Jordan and I recently found another part of the documentation where they simply state that IMS dimensions must be adjusted by a set of formulas. Quoting this section provides an easy to follow explanation for the code, including 2x/4x/8x/16x. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 26e137d..9e79f6e 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -290,7 +290,28 @@ intel_miptree_create_layout(struct brw_context *brw, /* Adjust width/height/depth for MSAA */ mt-msaa_layout = compute_msaa_layout(brw, format, mt-target); if (mt-msaa_layout == INTEL_MSAA_LAYOUT_IMS) { - /* In the Sandy Bridge PRM, volume 4, part 1, page 31, it says: + /* From the Ivybridge PRM, Volume 1, Part 1, page 108: + * If the surface is multisampled and it is a depth or stencil + * surface or Multisampled Surface StorageFormat in SURFACE_STATE is + * MSFMT_DEPTH_STENCIL, WL and HL must be adjusted as follows before + * proceeding: + * + * ++ + * | Num Multisamples |W_l = |H_l = | + * ++ + * | 2| ceiling(W_l / 2) * 4 | H_l (no adjustment) | + * | 4| ceiling(W_l / 2) * 4 | ceiling(H_l / 2) * 4 | + * | 8| ceiling(W_l / 2) * 8 | ceiling(H_l / 2) * 4 | + * |16| ceiling(W_l / 2) * 8 | ceiling(H_l / 2) * 8 | + * ++ + * + * + * Note that MSFMT_DEPTH_STENCIL just means the IMS (interleaved) + * format rather than UMS/CMS (array slices). The Sandybridge PRM, + * Volume 1, Part 1, Page 111 has the same formula for 4x MSAA. + * + * Another more complicated explanation for these adjustments comes + * from the Sandybridge PRM, volume 4, part 1, page 31: * * Any of the other messages (sample*, LOD, load4) used with a * (4x) multisampled surface will in-effect sample a surface with -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 2/3] i965/fs: Use NIR's scalarizing abilities and stop handling vectors
On Wednesday, January 28, 2015 04:27:23 PM Jason Ekstrand wrote: On Wed, Jan 28, 2015 at 4:02 PM, Matt Turner matts...@gmail.com wrote: On Tue, Jan 27, 2015 at 5:31 PM, Jason Ekstrand ja...@jlekstrand.net wrote: diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index de0d780..217fe09 100644 @@ -689,9 +689,9 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) struct brw_reg acc = retype(brw_acc_reg(dispatch_width), result.type); - emit_percomp(MUL(acc, op[0], op[1]), instr-dest.write_mask); - emit_percomp(MACH(reg_null_d, op[0], op[1]), instr-dest.write_mask); - emit_percomp(MOV(result, fs_reg(acc)), instr-dest.write_mask); Bah. How the hell did the old code pass piglit? I think channel expressions saved it. + emit(MUL(acc, op[0], op[1])); + emit(MACH(reg_null_d, op[0], op[1])); + emit(MOV(result, fs_reg(acc))); break; } @@ -773,72 +767,38 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) case nir_op_ball_fequal3: case nir_op_ball_iequal3: case nir_op_ball_fequal4: - case nir_op_ball_iequal4: { - unsigned num_components = nir_op_infos[instr-op].input_sizes[0]; - fs_reg temp = vgrf(num_components); - emit_percomp(CMP(temp, op[0], op[1], BRW_CONDITIONAL_Z), - (1 num_components) - 1); - emit_reduction(BRW_OPCODE_AND, result, temp, num_components); - break; - } - + case nir_op_ball_iequal4: We can save it for later, but it might be interesting to let the fs backend get the vector comparisons directly, if that's possible. See https://bugs.freedesktop.org/show_bug.cgi?id=77456 We can. We just have to add a flag to turn that off in the nir_lower_alu_to_scalar pass. @@ -867,83 +827,67 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) unreachable(not reached: should be handled by ldexp_to_arith()); case nir_op_fsqrt: - emit_math_percomp(SHADER_OPCODE_SQRT, result, op[0], -instr-dest.write_mask, instr-dest.saturate); + emit_math(SHADER_OPCODE_SQRT, result, op[0]) + -saturate = instr-dest.saturate; I don't really like the -saturate = ... as a single statement. I know I'm guilty of using it in a unit test recently. Yeah, I'm not sure what to do there. I really don't like having to assign it to a fs_inst variable and then do the saturate. I guess we could. Or we could add something to emit_math. Thoughts? In the brw_eu_emit.c code I worked around this by creating a brw_last_inst macro that always refers to the most recently emitted instruction. Still not 100% crazy about it, but it's been pretty handy. *shrug*. It's an idea, anyway. I'm not too opposed to just creating fs_inst * temporaries. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] main: Added entry point for glCreateTransformFeedbacks
On Thursday 29 January 2015, Martin Peres wrote: v2: Review from Laura Ekstrand - generate the name of the gl method once - shorten some lines to stay in the 78 chars limit Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 +++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 63 -- src/mesa/main/transformfeedback.h | 6 +++ 4 files changed, 62 insertions(+), 15 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 2fe1638..15b00c2 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -7,6 +7,13 @@ enum name=QUERY_TARGETvalue=0x82EA/ enum name=TEXTURE_BINDING value=0x82EB/ + !-- Transform Feedback object functions -- + + function name=CreateTransformFeedbacks offset=assign + param name=n type=GLsizei / + param name=ids type=GLuint * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index ee4db45..5e7f7ae 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -955,6 +955,7 @@ const struct function gl_core_functions_possible[] = { { glClipControl, 45, -1 }, /* GL_ARB_direct_state_access */ + { glCreateTransformFeedbacks, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index a737463..b0fca48 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -825,25 +825,24 @@ _mesa_lookup_transform_feedback_object(struct gl_context *ctx, GLuint name) _mesa_HashLookup(ctx-TransformFeedback.Objects, name); } - -/** - * Create new transform feedback objects. Transform feedback objects - * encapsulate the state related to transform feedback to allow quickly - * switching state (and drawing the results, below). - * Part of GL_ARB_transform_feedback2. - */ -void GLAPIENTRY -_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +void +_mesa_create_transform_feedbacks(struct gl_context *ctx, GLsizei n, + GLuint *ids, bool dsa) { GLuint first; - GET_CURRENT_CONTEXT(ctx); + const char* gl_mthd_name; I would prefer it if you used a simpler name for this variable, such as func. + if (dsa) + gl_mthd_name = glCreateTransformFeedbacks; + else + gl_mthd_name = glGenTransformFeedbacks; if (n 0) { - _mesa_error(ctx, GL_INVALID_VALUE, glGenTransformFeedbacks(n 0)); + _mesa_error(ctx, GL_INVALID_VALUE, %s(n 0), gl_mthd_name); return; } - if (!names) + if (!ids) return; /* we don't need contiguous IDs, but this might be faster */ @@ -854,18 +853,52 @@ _mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) struct gl_transform_feedback_object *obj = ctx-Driver.NewTransformFeedback(ctx, first + i); if (!obj) { -_mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); +_mesa_error(ctx, GL_OUT_OF_MEMORY, %s, gl_mthd_name); return; } - names[i] = first + i; + ids[i] = first + i; _mesa_HashInsert(ctx-TransformFeedback.Objects, first + i, obj); You have to set EverBound to true when dsa is true. Otherwise glIsTransformFeedback() will return false when it's passed an ID that was created by glCreateTransformFeedbacks(). } } else { - _mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); + _mesa_error(ctx, GL_OUT_OF_MEMORY, %s, gl_mthd_name); } } +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + * switching state (and drawing the results, below). + * Part of GL_ARB_transform_feedback2. + */ +void GLAPIENTRY +_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +{ + GET_CURRENT_CONTEXT(ctx); + + /* GenTransformFeedbacks should just reserve the object names that a +* subsequent call to BindTransformFeedback should actively create. For +* the sake of simplicity, we reserve the names and create the objects +* straight away. +*/ + _mesa_create_transform_feedbacks(ctx, n, names, false); +} + +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + * switching state (and drawing the results, below). + * Part of GL_ARB_direct_state_access. + */
[Mesa-dev] [PATCH 5/6] main: Added entry point for glGetTransformFeedbacki_v
v2: Review from Laura Ekstrand - use the transform feedback object lookup wrapper Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 +++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 29 ++ src/mesa/main/transformfeedback.h | 4 4 files changed, 41 insertions(+) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 1ac4386..935e088 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -34,6 +34,13 @@ param name=param type=GLint * / /function + function name=GetTransformFeedbacki_v offset=assign + param name=xfb type=GLuint / + param name=pname type=GLenum / + param name=index type=GLuint / + param name=param type=GLint * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 888d34a..4b92699 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -959,6 +959,7 @@ const struct function gl_core_functions_possible[] = { { glTransformFeedbackBufferBase, 45, -1 }, { glTransformFeedbackBufferRange, 45, -1 }, { glGetTransformFeedbackiv, 45, -1 }, + { glGetTransformFeedbacki_v, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index ae3e1c0..1ffdc16 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -1220,3 +1220,32 @@ _mesa_GetTransformFeedbackiv(GLuint xfb, GLenum pname, GLint *param) glGetTransformFeedbackiv(pname=%i), pname); } } + +extern void GLAPIENTRY +_mesa_GetTransformFeedbacki_v(GLuint xfb, GLenum pname, GLuint index, + GLint *param) +{ + struct gl_transform_feedback_object *obj; + GET_CURRENT_CONTEXT(ctx); + + obj = _mesa_lookup_transform_feedback_object_err(ctx, xfb, + glGetTransformFeedbacki_v); + if(!obj) { + return; + } + + if (index = ctx-Const.MaxTransformFeedbackBuffers) { + _mesa_error(ctx, GL_INVALID_VALUE, + glGetTransformFeedbacki_v(index=%i), index); + return; + } + + switch(pname) { + case GL_TRANSFORM_FEEDBACK_BUFFER_BINDING: + *param = obj-BufferNames[index]; + break; + default: + _mesa_error(ctx, GL_INVALID_ENUM, + glGetTransformFeedbacki_v(pname=%i), pname); + } +} diff --git a/src/mesa/main/transformfeedback.h b/src/mesa/main/transformfeedback.h index 4bc7c0c..2c31b22 100644 --- a/src/mesa/main/transformfeedback.h +++ b/src/mesa/main/transformfeedback.h @@ -169,4 +169,8 @@ _mesa_TransformFeedbackBufferRange(GLuint xfb, GLuint index, GLuint buffer, extern void GLAPIENTRY _mesa_GetTransformFeedbackiv(GLuint xfb, GLenum pname, GLint *param); +extern void GLAPIENTRY +_mesa_GetTransformFeedbacki_v(GLuint xfb, GLenum pname, GLuint index, + GLint *param); + #endif /* TRANSFORM_FEEDBACK_H */ -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] main: Added entry point for glGetTransformFeedbackiv
v2: Review from Laura Ekstrand - use the transform feedback object lookup wrapper Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 6 ++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 25 + src/mesa/main/transformfeedback.h | 3 +++ 4 files changed, 35 insertions(+) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index b3c090f..1ac4386 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -28,6 +28,12 @@ param name=size type=GLsizeiptr / /function + function name=GetTransformFeedbackiv offset=assign + param name=xfb type=GLuint / + param name=pname type=GLenum / + param name=param type=GLint * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 3b34f7b..888d34a 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -958,6 +958,7 @@ const struct function gl_core_functions_possible[] = { { glCreateTransformFeedbacks, 45, -1 }, { glTransformFeedbackBufferBase, 45, -1 }, { glTransformFeedbackBufferRange, 45, -1 }, + { glGetTransformFeedbackiv, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index 7564b6f..ae3e1c0 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -1195,3 +1195,28 @@ _mesa_ResumeTransformFeedback(void) assert(ctx-Driver.ResumeTransformFeedback); ctx-Driver.ResumeTransformFeedback(ctx, obj); } + +extern void GLAPIENTRY +_mesa_GetTransformFeedbackiv(GLuint xfb, GLenum pname, GLint *param) +{ +struct gl_transform_feedback_object *obj; +GET_CURRENT_CONTEXT(ctx); + +obj = _mesa_lookup_transform_feedback_object_err(ctx, xfb, + glGetTransformFeedbackiv); +if(!obj) { + return; +} + +switch(pname) { +case GL_TRANSFORM_FEEDBACK_PAUSED: + *param = obj-Paused; + break; +case GL_TRANSFORM_FEEDBACK_ACTIVE: + *param = obj-Active; + break; +default: + _mesa_error(ctx, GL_INVALID_ENUM, + glGetTransformFeedbackiv(pname=%i), pname); +} +} diff --git a/src/mesa/main/transformfeedback.h b/src/mesa/main/transformfeedback.h index 1178169..4bc7c0c 100644 --- a/src/mesa/main/transformfeedback.h +++ b/src/mesa/main/transformfeedback.h @@ -166,4 +166,7 @@ extern void GLAPIENTRY _mesa_TransformFeedbackBufferRange(GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size); +extern void GLAPIENTRY +_mesa_GetTransformFeedbackiv(GLuint xfb, GLenum pname, GLint *param); + #endif /* TRANSFORM_FEEDBACK_H */ -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] main: Added entry point for glTransformFeedbackBufferBase
v2: Review from Laura Ekstrand - give more helpful error messages - factor the lookup code for the xfb and objBuf - replace some already-existing tabs with spaces - add comments to explain the cases where xfb == 0 or buffer == 0 - fix the condition for binding the transform buffer or not Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 6 ++ src/mesa/main/bufferobj.c | 9 +- src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 136 +++-- src/mesa/main/transformfeedback.h | 19 +++- 5 files changed, 135 insertions(+), 36 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 15b00c2..35d6906 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -14,6 +14,12 @@ param name=ids type=GLuint * / /function + function name=TransformFeedbackBufferBase offset=assign + param name=xfb type=GLuint / + param name=index type=GLuint / + param name=buffer type=GLuint / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 0c1ce98..86532ea 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -3546,8 +3546,9 @@ _mesa_BindBufferRange(GLenum target, GLuint index, switch (target) { case GL_TRANSFORM_FEEDBACK_BUFFER: - _mesa_bind_buffer_range_transform_feedback(ctx, index, bufObj, -offset, size); + _mesa_bind_buffer_range_transform_feedback(ctx, + ctx-TransformFeedback.CurrentObject, + index, bufObj, offset, size); return; case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); @@ -3611,7 +3612,9 @@ _mesa_BindBufferBase(GLenum target, GLuint index, GLuint buffer) switch (target) { case GL_TRANSFORM_FEEDBACK_BUFFER: - _mesa_bind_buffer_base_transform_feedback(ctx, index, bufObj); + _mesa_bind_buffer_base_transform_feedback(ctx, + ctx-TransformFeedback.CurrentObject, +index, bufObj, false); return; case GL_UNIFORM_BUFFER: bind_buffer_base_uniform_buffer(ctx, index, bufObj); diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 5e7f7ae..166401a 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -956,6 +956,7 @@ const struct function gl_core_functions_possible[] = { /* GL_ARB_direct_state_access */ { glCreateTransformFeedbacks, 45, -1 }, + { glTransformFeedbackBufferBase, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index b0fca48..6583974 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -514,22 +514,24 @@ _mesa_EndTransformFeedback(void) * Helper used by BindBufferRange() and BindBufferBase(). */ static void -bind_buffer_range(struct gl_context *ctx, GLuint index, +bind_buffer_range(struct gl_context *ctx, + struct gl_transform_feedback_object *obj, + GLuint index, struct gl_buffer_object *bufObj, - GLintptr offset, GLsizeiptr size) + GLintptr offset, GLsizeiptr size, + bool dsa) { - struct gl_transform_feedback_object *obj = - ctx-TransformFeedback.CurrentObject; - /* Note: no need to FLUSH_VERTICES or flag NewTransformFeedback, because * transform feedback buffers can't be changed while transform feedback is * active. */ - /* The general binding point */ - _mesa_reference_buffer_object(ctx, - ctx-TransformFeedback.CurrentBuffer, - bufObj); + if (!dsa) { + /* The general binding point */ + _mesa_reference_buffer_object(ctx, +ctx-TransformFeedback.CurrentBuffer, +bufObj); + } /* The per-attribute binding point */ _mesa_set_transform_feedback_binding(ctx, obj, index, bufObj, offset, size); @@ -543,15 +545,12 @@ bind_buffer_range(struct gl_context *ctx, GLuint index, */ void _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, - GLuint index, - struct gl_buffer_object *bufObj, -
[Mesa-dev] [PATCH 0/6] Add 6 transform-feedback-related entry points for direct_state_access
The corresponding piglit tests have been submited to the piglit ML. Warning: those tests will only work when forcing mesa to expose the DSA extension. Martin Peres (6): main: Added entry point for glCreateTransformFeedbacks main: Added entry point for glTransformFeedbackBufferBase main: Added entry point for glTransformFeedbackBufferRange main: Added entry point for glGetTransformFeedbackiv main: Added entry point for glGetTransformFeedbacki_v main: Added entry point for glGetTransformFeedbacki64_v src/mapi/glapi/gen/ARB_direct_state_access.xml | 41 +++ src/mesa/main/bufferobj.c | 10 +- src/mesa/main/tests/dispatch_sanity.cpp| 6 + src/mesa/main/transformfeedback.c | 357 + src/mesa/main/transformfeedback.h | 42 ++- 5 files changed, 399 insertions(+), 57 deletions(-) -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] main: Added entry point for glCreateTransformFeedbacks
v2: Review from Laura Ekstrand - generate the name of the gl method once - shorten some lines to stay in the 78 chars limit Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 +++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 63 -- src/mesa/main/transformfeedback.h | 6 +++ 4 files changed, 62 insertions(+), 15 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 2fe1638..15b00c2 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -7,6 +7,13 @@ enum name=QUERY_TARGETvalue=0x82EA/ enum name=TEXTURE_BINDING value=0x82EB/ + !-- Transform Feedback object functions -- + + function name=CreateTransformFeedbacks offset=assign + param name=n type=GLsizei / + param name=ids type=GLuint * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index ee4db45..5e7f7ae 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -955,6 +955,7 @@ const struct function gl_core_functions_possible[] = { { glClipControl, 45, -1 }, /* GL_ARB_direct_state_access */ + { glCreateTransformFeedbacks, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index a737463..b0fca48 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -825,25 +825,24 @@ _mesa_lookup_transform_feedback_object(struct gl_context *ctx, GLuint name) _mesa_HashLookup(ctx-TransformFeedback.Objects, name); } - -/** - * Create new transform feedback objects. Transform feedback objects - * encapsulate the state related to transform feedback to allow quickly - * switching state (and drawing the results, below). - * Part of GL_ARB_transform_feedback2. - */ -void GLAPIENTRY -_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +void +_mesa_create_transform_feedbacks(struct gl_context *ctx, GLsizei n, + GLuint *ids, bool dsa) { GLuint first; - GET_CURRENT_CONTEXT(ctx); + const char* gl_mthd_name; + + if (dsa) + gl_mthd_name = glCreateTransformFeedbacks; + else + gl_mthd_name = glGenTransformFeedbacks; if (n 0) { - _mesa_error(ctx, GL_INVALID_VALUE, glGenTransformFeedbacks(n 0)); + _mesa_error(ctx, GL_INVALID_VALUE, %s(n 0), gl_mthd_name); return; } - if (!names) + if (!ids) return; /* we don't need contiguous IDs, but this might be faster */ @@ -854,18 +853,52 @@ _mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) struct gl_transform_feedback_object *obj = ctx-Driver.NewTransformFeedback(ctx, first + i); if (!obj) { -_mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); +_mesa_error(ctx, GL_OUT_OF_MEMORY, %s, gl_mthd_name); return; } - names[i] = first + i; + ids[i] = first + i; _mesa_HashInsert(ctx-TransformFeedback.Objects, first + i, obj); } } else { - _mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); + _mesa_error(ctx, GL_OUT_OF_MEMORY, %s, gl_mthd_name); } } +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + * switching state (and drawing the results, below). + * Part of GL_ARB_transform_feedback2. + */ +void GLAPIENTRY +_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +{ + GET_CURRENT_CONTEXT(ctx); + + /* GenTransformFeedbacks should just reserve the object names that a +* subsequent call to BindTransformFeedback should actively create. For +* the sake of simplicity, we reserve the names and create the objects +* straight away. +*/ + + _mesa_create_transform_feedbacks(ctx, n, names, false); +} + +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + * switching state (and drawing the results, below). + * Part of GL_ARB_direct_state_access. + */ +void GLAPIENTRY +_mesa_CreateTransformFeedbacks(GLsizei n, GLuint *names) +{ + GET_CURRENT_CONTEXT(ctx); + + _mesa_create_transform_feedbacks(ctx, n, names, true); +} + /** * Is the given ID a transform feedback object? diff --git a/src/mesa/main/transformfeedback.h b/src/mesa/main/transformfeedback.h index 87f4080..d598533 100644 --- a/src/mesa/main/transformfeedback.h +++ b/src/mesa/main/transformfeedback.h @@ -99,9
[Mesa-dev] [PATCH 3/6] main: Added entry point for glTransformFeedbackBufferRange
v2: review from Laura Ekstrand - use the refactored code to lookup the objects - improve some error messages - factor out the gl method name computation - better handle the spec differences between the DSA and non-DSA cases - quote the spec a little more Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 8 +++ src/mesa/main/bufferobj.c | 3 +- src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 94 +- src/mesa/main/transformfeedback.h | 6 +- 5 files changed, 94 insertions(+), 18 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 35d6906..b3c090f 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -20,6 +20,14 @@ param name=buffer type=GLuint / /function + function name=TransformFeedbackBufferRange offset=assign + param name=xfb type=GLuint / + param name=index type=GLuint / + param name=buffer type=GLuint / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 86532ea..7558e17 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -3548,7 +3548,8 @@ _mesa_BindBufferRange(GLenum target, GLuint index, case GL_TRANSFORM_FEEDBACK_BUFFER: _mesa_bind_buffer_range_transform_feedback(ctx, ctx-TransformFeedback.CurrentObject, - index, bufObj, offset, size); + index, bufObj, offset, size, + false); return; case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 166401a..3b34f7b 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -957,6 +957,7 @@ const struct function gl_core_functions_possible[] = { /* GL_ARB_direct_state_access */ { glCreateTransformFeedbacks, 45, -1 }, { glTransformFeedbackBufferBase, 45, -1 }, + { glTransformFeedbackBufferRange, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index 6583974..7564b6f 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -541,7 +541,8 @@ bind_buffer_range(struct gl_context *ctx, /** * Specify a buffer object to receive transform feedback results. Plus, * specify the starting offset to place the results, and max size. - * Called from the glBindBufferRange() function. + * Called from the glBindBufferRange() and glTransformFeedbackBufferRange + * functions. */ void _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, @@ -549,35 +550,72 @@ _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, GLuint index, struct gl_buffer_object *bufObj, GLintptr offset, - GLsizeiptr size) + GLsizeiptr size, + bool dsa) { + const char *gl_methd_name; + if (dsa) + gl_methd_name = glTransformFeedbackBufferRange; + else + gl_methd_name = glBindBufferRange; + + if (obj-Active) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glBindBufferRange(transform feedback active)); + _mesa_error(ctx, GL_INVALID_OPERATION, %s(transform feedback active), + gl_methd_name); return; } if (index = ctx-Const.MaxTransformFeedbackBuffers) { - _mesa_error(ctx, GL_INVALID_VALUE, glBindBufferRange(index=%d - out of bounds), index); + /* An INVALID_VALUE error is generated if index is greater than or equal + * to the number of binding points for transform feedback, as described + * in section 6.7.1. + */ + _mesa_error(ctx, GL_INVALID_VALUE, %s(index=%d out of bounds), + gl_methd_name, index); return; } if (size 0x3) { - /* must a multiple of four */ - _mesa_error(ctx, GL_INVALID_VALUE, glBindBufferRange(size=%d), - (int) size); + /* must a be multiple of four */ + _mesa_error(ctx, GL_INVALID_VALUE, %s(size=%d must be a multiple of + four),
[Mesa-dev] [PATCH 6/6] main: Added entry point for glGetTransformFeedbacki64_v
v2: Review from Laura Ekstrand - use the transform feedback object lookup wrapper Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 ++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 32 ++ src/mesa/main/transformfeedback.h | 4 4 files changed, 44 insertions(+) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 935e088..340dbba 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -41,6 +41,13 @@ param name=param type=GLint * / /function + function name=GetTransformFeedbacki64_v offset=assign + param name=xfb type=GLuint / + param name=pname type=GLenum / + param name=index type=GLuint / + param name=param type=GLint64 * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 4b92699..f485262 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -960,6 +960,7 @@ const struct function gl_core_functions_possible[] = { { glTransformFeedbackBufferRange, 45, -1 }, { glGetTransformFeedbackiv, 45, -1 }, { glGetTransformFeedbacki_v, 45, -1 }, + { glGetTransformFeedbacki64_v, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index 1ffdc16..cf5608d 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -1249,3 +1249,35 @@ _mesa_GetTransformFeedbacki_v(GLuint xfb, GLenum pname, GLuint index, glGetTransformFeedbacki_v(pname=%i), pname); } } + +extern void GLAPIENTRY +_mesa_GetTransformFeedbacki64_v(GLuint xfb, GLenum pname, GLuint index, + GLint64 *param) +{ + struct gl_transform_feedback_object *obj; + GET_CURRENT_CONTEXT(ctx); + + obj = _mesa_lookup_transform_feedback_object_err(ctx, xfb, + glGetTransformFeedbacki64_v); + if(!obj) { + return; + } + + if (index = ctx-Const.MaxTransformFeedbackBuffers) { + _mesa_error(ctx, GL_INVALID_VALUE, + glGetTransformFeedbacki64_v(index=%i), index); + return; + } + + switch(pname) { + case GL_TRANSFORM_FEEDBACK_BUFFER_START: + *param = obj-Offset[index]; + break; + case GL_TRANSFORM_FEEDBACK_BUFFER_SIZE: + *param = obj-RequestedSize[index]; + break; + default: + _mesa_error(ctx, GL_INVALID_ENUM, + glGetTransformFeedbacki64_v(pname=%i), pname); + } +} diff --git a/src/mesa/main/transformfeedback.h b/src/mesa/main/transformfeedback.h index 2c31b22..1e7ac7f 100644 --- a/src/mesa/main/transformfeedback.h +++ b/src/mesa/main/transformfeedback.h @@ -173,4 +173,8 @@ extern void GLAPIENTRY _mesa_GetTransformFeedbacki_v(GLuint xfb, GLenum pname, GLuint index, GLint *param); +extern void GLAPIENTRY +_mesa_GetTransformFeedbacki64_v(GLuint xfb, GLenum pname, GLuint index, + GLint64 *param); + #endif /* TRANSFORM_FEEDBACK_H */ -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] doc: break down ARB_direct_state_access in GL3.txt
Good idea. Reviewed-by: Laura Ekstrand la...@jlekstrand.net On Thu, Jan 29, 2015 at 6:54 AM, Martin Peres martin.pe...@linux.intel.com wrote: A student was wondering what was going on + I started working on it too. CC: Laura Ekstrand la...@jlekstrand.net Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- docs/GL3.txt | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 648f5ac..2dc1404 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -191,7 +191,16 @@ GL 4.5, GLSL 4.50: GL_ARB_conditional_render_inverted DONE (i965, nv50, nvc0, llvmpipe, softpipe) GL_ARB_cull_distance not started GL_ARB_derivative_controlDONE (i965, nv50, nvc0, r600) - GL_ARB_direct_state_access started (Laura Ekstrand) + GL_ARB_direct_state_access started + - Transform Feedback object started (Martin Peres) + - Buffer object started (Laura Ekstrand) + - Framebuffer object started (Laura Ekstrand) + - Renderbuffer objectnot started + - Texture object started (Laura Ekstrand) + - Vertex array objectnot started + - Sampler object not started + - Program Pipeline objectnot started + - Query object started (Martin Peres) GL_ARB_get_texture_sub_image started (Brian Paul) GL_ARB_shader_texture_image_samples not started GL_ARB_texture_barrier DONE (nv50, nvc0, r300, r600, radeonsi) -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] meta/blit: Write depth only when asked for
On Thu, Jan 29, 2015 at 6:21 AM, Topi Pohjolainen topi.pohjolai...@intel.com wrote: diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index ec7564a..45035b1 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -293,13 +293,14 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx, void main()\n {\n gl_FragColor = %s(texSampler, %s);\n - gl_FragDepth = gl_FragColor.x;\n +%s }\n, fs_preprocess, shader-type, fs_input, -shader-func, shader-texcoords); +shader-func, shader-texcoords, +do_depth ? gl_FragDepth = gl_FragColor.x;\n : ); _mesa_meta_compile_and_link_program(ctx, vs_source, fs_source, - ralloc_asprintf(mem_ctx, %s blit, + ralloc_asprintf(mem_ctx,%s blit, Unintended change. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] main: Added entry point for glTransformFeedbackBufferBase
On Thu, Jan 29, 2015 at 4:55 AM, Martin Peres martin.pe...@linux.intel.com wrote: v2: Review from Laura Ekstrand - give more helpful error messages - factor the lookup code for the xfb and objBuf - replace some already-existing tabs with spaces - add comments to explain the cases where xfb == 0 or buffer == 0 - fix the condition for binding the transform buffer or not Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 6 ++ src/mesa/main/bufferobj.c | 9 +- src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 136 +++-- src/mesa/main/transformfeedback.h | 19 +++- 5 files changed, 135 insertions(+), 36 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 15b00c2..35d6906 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -14,6 +14,12 @@ param name=ids type=GLuint * / /function + function name=TransformFeedbackBufferBase offset=assign + param name=xfb type=GLuint / + param name=index type=GLuint / + param name=buffer type=GLuint / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 0c1ce98..86532ea 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -3546,8 +3546,9 @@ _mesa_BindBufferRange(GLenum target, GLuint index, switch (target) { case GL_TRANSFORM_FEEDBACK_BUFFER: - _mesa_bind_buffer_range_transform_feedback(ctx, index, bufObj, -offset, size); + _mesa_bind_buffer_range_transform_feedback(ctx, + ctx-TransformFeedback.CurrentObject, + index, bufObj, offset, size); return; case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); @@ -3611,7 +3612,9 @@ _mesa_BindBufferBase(GLenum target, GLuint index, GLuint buffer) switch (target) { case GL_TRANSFORM_FEEDBACK_BUFFER: - _mesa_bind_buffer_base_transform_feedback(ctx, index, bufObj); + _mesa_bind_buffer_base_transform_feedback(ctx, + ctx-TransformFeedback.CurrentObject, +index, bufObj, false); return; case GL_UNIFORM_BUFFER: bind_buffer_base_uniform_buffer(ctx, index, bufObj); diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 5e7f7ae..166401a 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -956,6 +956,7 @@ const struct function gl_core_functions_possible[] = { /* GL_ARB_direct_state_access */ { glCreateTransformFeedbacks, 45, -1 }, + { glTransformFeedbackBufferBase, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index b0fca48..6583974 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -514,22 +514,24 @@ _mesa_EndTransformFeedback(void) * Helper used by BindBufferRange() and BindBufferBase(). */ static void -bind_buffer_range(struct gl_context *ctx, GLuint index, +bind_buffer_range(struct gl_context *ctx, + struct gl_transform_feedback_object *obj, + GLuint index, struct gl_buffer_object *bufObj, - GLintptr offset, GLsizeiptr size) + GLintptr offset, GLsizeiptr size, + bool dsa) { - struct gl_transform_feedback_object *obj = - ctx-TransformFeedback.CurrentObject; - /* Note: no need to FLUSH_VERTICES or flag NewTransformFeedback, because * transform feedback buffers can't be changed while transform feedback is * active. */ - /* The general binding point */ - _mesa_reference_buffer_object(ctx, - ctx-TransformFeedback.CurrentBuffer, - bufObj); + if (!dsa) { + /* The general binding point */ + _mesa_reference_buffer_object(ctx, +ctx-TransformFeedback.CurrentBuffer, +bufObj); + } /* The per-attribute binding point */ _mesa_set_transform_feedback_binding(ctx, obj, index, bufObj, offset, size); @@ -543,15 +545,12 @@ bind_buffer_range(struct gl_context *ctx, GLuint index, */ void _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, - GLuint index, -
Re: [Mesa-dev] [PATCH 3/6] main: Added entry point for glTransformFeedbackBufferRange
On Thu, Jan 29, 2015 at 4:55 AM, Martin Peres martin.pe...@linux.intel.com wrote: v2: review from Laura Ekstrand - use the refactored code to lookup the objects - improve some error messages - factor out the gl method name computation - better handle the spec differences between the DSA and non-DSA cases - quote the spec a little more Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 8 +++ src/mesa/main/bufferobj.c | 3 +- src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 94 +- src/mesa/main/transformfeedback.h | 6 +- 5 files changed, 94 insertions(+), 18 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 35d6906..b3c090f 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -20,6 +20,14 @@ param name=buffer type=GLuint / /function + function name=TransformFeedbackBufferRange offset=assign + param name=xfb type=GLuint / + param name=index type=GLuint / + param name=buffer type=GLuint / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 86532ea..7558e17 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -3548,7 +3548,8 @@ _mesa_BindBufferRange(GLenum target, GLuint index, case GL_TRANSFORM_FEEDBACK_BUFFER: _mesa_bind_buffer_range_transform_feedback(ctx, ctx-TransformFeedback.CurrentObject, - index, bufObj, offset, size); + index, bufObj, offset, size, + false); return; case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 166401a..3b34f7b 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -957,6 +957,7 @@ const struct function gl_core_functions_possible[] = { /* GL_ARB_direct_state_access */ { glCreateTransformFeedbacks, 45, -1 }, { glTransformFeedbackBufferBase, 45, -1 }, + { glTransformFeedbackBufferRange, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index 6583974..7564b6f 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -541,7 +541,8 @@ bind_buffer_range(struct gl_context *ctx, /** * Specify a buffer object to receive transform feedback results. Plus, * specify the starting offset to place the results, and max size. - * Called from the glBindBufferRange() function. + * Called from the glBindBufferRange() and glTransformFeedbackBufferRange + * functions. */ void _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, @@ -549,35 +550,72 @@ _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, GLuint index, struct gl_buffer_object *bufObj, GLintptr offset, - GLsizeiptr size) + GLsizeiptr size, + bool dsa) { + const char *gl_methd_name; + if (dsa) + gl_methd_name = glTransformFeedbackBufferRange; + else + gl_methd_name = glBindBufferRange; + + if (obj-Active) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glBindBufferRange(transform feedback active)); + _mesa_error(ctx, GL_INVALID_OPERATION, %s(transform feedback active), + gl_methd_name); return; } if (index = ctx-Const.MaxTransformFeedbackBuffers) { - _mesa_error(ctx, GL_INVALID_VALUE, glBindBufferRange(index=%d - out of bounds), index); + /* An INVALID_VALUE error is generated if index is greater than or equal + * to the number of binding points for transform feedback, as described + * in section 6.7.1. + */ + _mesa_error(ctx, GL_INVALID_VALUE, %s(index=%d out of bounds), + gl_methd_name, index); return; } if (size 0x3) { - /* must a multiple of four */ - _mesa_error(ctx, GL_INVALID_VALUE, glBindBufferRange(size=%d), - (int) size); +
[Mesa-dev] [PATCH 2/5] meta/blit: Write depth only when asked for
Implementing an idea from Ken, on i965 the shader program for 2D blits becomes significantly simpler. Before: pln(8) g61Fg40,1,0Fg28,8,1F { align1 1Q compacted }; pln(8) g71Fg4.40,1,0F g28,8,1F { align1 1Q compacted }; send(8) g21UW g68,8,1F sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q }; mov(8) g1231F g28,8,1F{ align1 1Q compacted }; mov(8) g1241F g38,8,1F{ align1 1Q compacted }; mov(8) g1251F g48,8,1F{ align1 1Q compacted }; mov(8) g1261F g58,8,1F{ align1 1Q compacted }; mov(8) g1271F g28,8,1F{ align1 1Q compacted }; nop ; sendc(8) nullg1238,8,1F render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT }; After: pln(8) g61F g40,1,0Fg28,8,1F { align1 1Q compacted }; pln(8) g71F g4.40,1,0F g28,8,1F { align1 1Q compacted }; send(8) g1241UW g68,8,1F sampler (1, 0, 0, 1) mlen 2 rlen 4{ align1 1Q }; sendc(8) nullg1248,8,1F render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT }; Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com CC: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/common/meta.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index ec7564a..45035b1 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -293,13 +293,14 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx, void main()\n {\n gl_FragColor = %s(texSampler, %s);\n - gl_FragDepth = gl_FragColor.x;\n +%s }\n, fs_preprocess, shader-type, fs_input, -shader-func, shader-texcoords); +shader-func, shader-texcoords, +do_depth ? gl_FragDepth = gl_FragColor.x;\n : ); _mesa_meta_compile_and_link_program(ctx, vs_source, fs_source, - ralloc_asprintf(mem_ctx, %s blit, + ralloc_asprintf(mem_ctx,%s blit, shader-type), shader-shader_prog); ralloc_free(mem_ctx); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/5] meta/blit: Add plumbing for shaders without depth
Currently all blit programs are unconditionally compiled with gl_FragDepth. Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/common/meta.c | 3 ++- src/mesa/drivers/common/meta.h | 1 + src/mesa/drivers/common/meta_blit.c| 2 +- src/mesa/drivers/common/meta_generate_mipmap.c | 2 +- 4 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index 23c05eb..ec7564a 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -243,6 +243,7 @@ _mesa_meta_compile_and_link_program(struct gl_context *ctx, void _mesa_meta_setup_blit_shader(struct gl_context *ctx, GLenum target, + bool do_depth, struct blit_shader_table *table) { char *vs_source, *fs_source; @@ -3035,7 +3036,7 @@ decompress_texture_image(struct gl_context *ctx, _mesa_meta_setup_vertex_objects(decompress-VAO, decompress-VBO, true, 2, 4, 0); - _mesa_meta_setup_blit_shader(ctx, target, decompress-shaders); + _mesa_meta_setup_blit_shader(ctx, target, true, decompress-shaders); } else { _mesa_meta_setup_ff_tnl_for_blit(decompress-VAO, decompress-VBO, 3); } diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h index 20d30f7..de3dc6a 100644 --- a/src/mesa/drivers/common/meta.h +++ b/src/mesa/drivers/common/meta.h @@ -632,6 +632,7 @@ _mesa_meta_setup_copypix_texture(struct gl_context *ctx, void _mesa_meta_setup_blit_shader(struct gl_context *ctx, GLenum target, + bool do_depth, struct blit_shader_table *table); void diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index 079b8e1..02e90e0 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -531,7 +531,7 @@ setup_glsl_blit_framebuffer(struct gl_context *ctx, } else if (is_target_multisample) { setup_glsl_msaa_blit_shader(ctx, blit, src_rb, target); } else { - _mesa_meta_setup_blit_shader(ctx, target, blit-shaders); + _mesa_meta_setup_blit_shader(ctx, target, true, blit-shaders); } } diff --git a/src/mesa/drivers/common/meta_generate_mipmap.c b/src/mesa/drivers/common/meta_generate_mipmap.c index eedbb8e..c38b46b 100644 --- a/src/mesa/drivers/common/meta_generate_mipmap.c +++ b/src/mesa/drivers/common/meta_generate_mipmap.c @@ -193,7 +193,7 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum target, if (use_glsl_version) { _mesa_meta_setup_vertex_objects(mipmap-VAO, mipmap-VBO, true, 2, 4, 0); - _mesa_meta_setup_blit_shader(ctx, target, mipmap-shaders); + _mesa_meta_setup_blit_shader(ctx, target, true, mipmap-shaders); } else { _mesa_meta_setup_ff_tnl_for_blit(mipmap-VAO, mipmap-VBO, 3); _mesa_set_enable(ctx, target, GL_TRUE); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Add new fast mtx_t mutex type for basic use cases
On 28/01/15 05:08, Kristian Høgsberg wrote: While modern pthread mutexes are very fast, they still incur a call to an external DSO and overhead of the generality and features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, and the idea here is that we can inline the atomic operation and make the fast case just two intructions. Mutexes are subtle and finicky to implement, so we carefully copy the implementation from Ulrich Dreppers well-written and well-reviewed paper: Futexes Are Tricky http://www.akkadia.org/drepper/futex.pdf We implement mutex3, which gives us a mutex that has no syscalls on uncontended lock or unlock. Further, the uncontended case boils down to a cmpxchg and an untaken branch and the uncontended unlock is just a locked decr and an untaken branch. We use __builtin_expect() to indicate that contention is unlikely so that gcc will put the contention code out of the main code flow. A fast mutex only supports lock/unlock, can't be recursive or used with condition variables. We keep the pthread mutex implementation around as full_mtx_t for the few places where we use condition variables or recursive locking. For platforms or compilers where futex and atomics aren't available, mtx_t falls back to the pthread mutex. The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound applications. Most CPU bound cases are helped and some of our internal bind_buffer_object heavy benchmarks gain up to 10%. Hi Kristian, Can I humbly ask that you split this into two patches - one that introduces the new functions/struct and another one that uses them ? This way it'll be easier if/when things go crazy. Also the patch seems to wonder between posix and win32 + typedef full_mtx_t mtx_t; and + typedef mtx_t fast_mtx_t; Looks like a left over from the should I rename XX variables to fast* or just one to full* moment :) Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] meta/blit: Write gl_FragDepth only when really needed
Currently meta blit programs unconditionally write both color and depth values. This series introduces two flavours - with and without gl_FragDepth - allowing simpler programs for color only blits. Ken had observed that i965 behaves badly emitting five unnecessary moves when the depth is included in the shader. While this didn't improve any performance benches I tried, it doesn't seem to introduce any regressions either in the benches or with piglit. Topi Pohjolainen (5): meta/blit: Add plumbing for shaders without depth meta/blit: Write depth only when asked for meta/blit: Compile programs with and without depth meta: Don't write depth when generating miptrees meta: Don't write depth when decompressing tex-images src/mesa/drivers/common/meta.c | 10 ++ src/mesa/drivers/common/meta.h | 4 +++- src/mesa/drivers/common/meta_blit.c| 13 + src/mesa/drivers/common/meta_generate_mipmap.c | 2 +- 4 files changed, 19 insertions(+), 10 deletions(-) -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/5] meta/blit: Compile programs with and without depth
When color buffers alone are concerned the depth is not needed. No regression on BDW where meta blit is used instead of blorp. I also disabled blorp temporarily for fbo-blits on IVB and saw no regressions there either. I also compared several graphics benchmarks on BDW and saw neither regressions or improvements. Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/common/meta.h | 3 ++- src/mesa/drivers/common/meta_blit.c | 13 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h index de3dc6a..3b9e05e 100644 --- a/src/mesa/drivers/common/meta.h +++ b/src/mesa/drivers/common/meta.h @@ -298,7 +298,8 @@ struct blit_state { GLuint VAO; GLuint VBO; - struct blit_shader_table shaders; + struct blit_shader_table shaders_with_depth; + struct blit_shader_table shaders_without_depth; GLuint msaa_shaders[BLIT_MSAA_SHADER_COUNT]; struct temp_texture depthTex; bool no_ctsi_fallback; diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index 02e90e0..4212d94 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -510,7 +510,8 @@ setup_glsl_blit_framebuffer(struct gl_context *ctx, struct blit_state *blit, struct gl_renderbuffer *src_rb, GLenum target, GLenum filter, -bool is_scaled_blit) +bool is_scaled_blit, +bool do_depth) { unsigned texcoord_size; bool is_target_multisample = target == GL_TEXTURE_2D_MULTISAMPLE || @@ -531,7 +532,9 @@ setup_glsl_blit_framebuffer(struct gl_context *ctx, } else if (is_target_multisample) { setup_glsl_msaa_blit_shader(ctx, blit, src_rb, target); } else { - _mesa_meta_setup_blit_shader(ctx, target, true, blit-shaders); + _mesa_meta_setup_blit_shader(ctx, target, do_depth, + do_depth ? blit-shaders_with_depth +: blit-shaders_without_depth); } } @@ -642,7 +645,8 @@ blitframebuffer_texture(struct gl_context *ctx, scaled_blit = dstW != srcW || dstH != srcH; if (glsl_version) { - setup_glsl_blit_framebuffer(ctx, blit, rb, target, filter, scaled_blit); + setup_glsl_blit_framebuffer(ctx, blit, rb, target, filter, scaled_blit, + do_depth); } else { _mesa_meta_setup_ff_tnl_for_blit(ctx-Meta-Blit.VAO, @@ -962,7 +966,8 @@ _mesa_meta_glsl_blit_cleanup(struct blit_state *blit) blit-VBO = 0; } - _mesa_meta_blit_shader_table_cleanup(blit-shaders); + _mesa_meta_blit_shader_table_cleanup(blit-shaders_with_depth); + _mesa_meta_blit_shader_table_cleanup(blit-shaders_without_depth); _mesa_DeleteTextures(1, blit-depthTex.TexObj); blit-depthTex.TexObj = 0; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] meta: Don't write depth when decompressing tex-images
Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/common/meta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index 45035b1..6ddec73 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -3037,7 +3037,7 @@ decompress_texture_image(struct gl_context *ctx, _mesa_meta_setup_vertex_objects(decompress-VAO, decompress-VBO, true, 2, 4, 0); - _mesa_meta_setup_blit_shader(ctx, target, true, decompress-shaders); + _mesa_meta_setup_blit_shader(ctx, target, false, decompress-shaders); } else { _mesa_meta_setup_ff_tnl_for_blit(decompress-VAO, decompress-VBO, 3); } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] meta: Don't write depth when generating miptrees
Signed-off-by: Topi Pohjolainen topi.pohjolai...@intel.com --- src/mesa/drivers/common/meta_generate_mipmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/common/meta_generate_mipmap.c b/src/mesa/drivers/common/meta_generate_mipmap.c index c38b46b..c1b6d3c 100644 --- a/src/mesa/drivers/common/meta_generate_mipmap.c +++ b/src/mesa/drivers/common/meta_generate_mipmap.c @@ -193,7 +193,7 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum target, if (use_glsl_version) { _mesa_meta_setup_vertex_objects(mipmap-VAO, mipmap-VBO, true, 2, 4, 0); - _mesa_meta_setup_blit_shader(ctx, target, true, mipmap-shaders); + _mesa_meta_setup_blit_shader(ctx, target, false, mipmap-shaders); } else { _mesa_meta_setup_ff_tnl_for_blit(mipmap-VAO, mipmap-VBO, 3); _mesa_set_enable(ctx, target, GL_TRUE); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] main: Added entry point for glCreateTransformFeedbacks
On 29/01/15 15:37, Fredrik Höglund wrote: On Thursday 29 January 2015, Martin Peres wrote: v2: Review from Laura Ekstrand - generate the name of the gl method once - shorten some lines to stay in the 78 chars limit Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 +++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 63 -- src/mesa/main/transformfeedback.h | 6 +++ 4 files changed, 62 insertions(+), 15 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 2fe1638..15b00c2 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -7,6 +7,13 @@ enum name=QUERY_TARGETvalue=0x82EA/ enum name=TEXTURE_BINDING value=0x82EB/ + !-- Transform Feedback object functions -- + + function name=CreateTransformFeedbacks offset=assign + param name=n type=GLsizei / + param name=ids type=GLuint * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index ee4db45..5e7f7ae 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -955,6 +955,7 @@ const struct function gl_core_functions_possible[] = { { glClipControl, 45, -1 }, /* GL_ARB_direct_state_access */ + { glCreateTransformFeedbacks, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index a737463..b0fca48 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -825,25 +825,24 @@ _mesa_lookup_transform_feedback_object(struct gl_context *ctx, GLuint name) _mesa_HashLookup(ctx-TransformFeedback.Objects, name); } - -/** - * Create new transform feedback objects. Transform feedback objects - * encapsulate the state related to transform feedback to allow quickly - * switching state (and drawing the results, below). - * Part of GL_ARB_transform_feedback2. - */ -void GLAPIENTRY -_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +void +_mesa_create_transform_feedbacks(struct gl_context *ctx, GLsizei n, + GLuint *ids, bool dsa) { GLuint first; - GET_CURRENT_CONTEXT(ctx); + const char* gl_mthd_name; I would prefer it if you used a simpler name for this variable, such as func. Func seems like a good name. Thanks. + if (dsa) + gl_mthd_name = glCreateTransformFeedbacks; + else + gl_mthd_name = glGenTransformFeedbacks; if (n 0) { - _mesa_error(ctx, GL_INVALID_VALUE, glGenTransformFeedbacks(n 0)); + _mesa_error(ctx, GL_INVALID_VALUE, %s(n 0), gl_mthd_name); return; } - if (!names) + if (!ids) return; /* we don't need contiguous IDs, but this might be faster */ @@ -854,18 +853,52 @@ _mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) struct gl_transform_feedback_object *obj = ctx-Driver.NewTransformFeedback(ctx, first + i); if (!obj) { -_mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); +_mesa_error(ctx, GL_OUT_OF_MEMORY, %s, gl_mthd_name); return; } - names[i] = first + i; + ids[i] = first + i; _mesa_HashInsert(ctx-TransformFeedback.Objects, first + i, obj); You have to set EverBound to true when dsa is true. Otherwise glIsTransformFeedback() will return false when it's passed an ID that was created by glCreateTransformFeedbacks(). Good catch! I'm going to modify my piglit test to check for this behaviour. Thanks. } } else { - _mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); + _mesa_error(ctx, GL_OUT_OF_MEMORY, %s, gl_mthd_name); } } +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + * switching state (and drawing the results, below). + * Part of GL_ARB_transform_feedback2. + */ +void GLAPIENTRY +_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +{ + GET_CURRENT_CONTEXT(ctx); + + /* GenTransformFeedbacks should just reserve the object names that a +* subsequent call to BindTransformFeedback should actively create. For +* the sake of simplicity, we reserve the names and create the objects +* straight away. +*/ + _mesa_create_transform_feedbacks(ctx, n, names, false); +} + +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + *
[Mesa-dev] [PATCH] dir-locals.el: Don't set variables for non-programming modes
This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. --- .dir-locals.el | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.dir-locals.el b/.dir-locals.el index 04a1a2c..d95eb48 100644 --- a/.dir-locals.el +++ b/.dir-locals.el @@ -1,4 +1,4 @@ -((nil +((prog-mode (indent-tabs-mode . nil) (tab-width . 8) (c-basic-offset . 3) -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 2/3] i965/fs: Use NIR's scalarizing abilities and stop handling vectors
Matt wanted shader-db numbers and here they are: total instructions in shared programs: 5998287 - 5974070 (-0.40%) instructions in affected programs: 732041 - 707824 (-3.31%) helped:3135 HURT: 193 GAINED:18 LOST: 0 On Thu, Jan 29, 2015 at 3:41 AM, Kenneth Graunke kenn...@whitecape.org wrote: On Wednesday, January 28, 2015 04:27:23 PM Jason Ekstrand wrote: On Wed, Jan 28, 2015 at 4:02 PM, Matt Turner matts...@gmail.com wrote: On Tue, Jan 27, 2015 at 5:31 PM, Jason Ekstrand ja...@jlekstrand.net wrote: diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index de0d780..217fe09 100644 @@ -689,9 +689,9 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) struct brw_reg acc = retype(brw_acc_reg(dispatch_width), result.type); - emit_percomp(MUL(acc, op[0], op[1]), instr-dest.write_mask); - emit_percomp(MACH(reg_null_d, op[0], op[1]), instr-dest.write_mask); - emit_percomp(MOV(result, fs_reg(acc)), instr-dest.write_mask); Bah. How the hell did the old code pass piglit? I think channel expressions saved it. + emit(MUL(acc, op[0], op[1])); + emit(MACH(reg_null_d, op[0], op[1])); + emit(MOV(result, fs_reg(acc))); break; } @@ -773,72 +767,38 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) case nir_op_ball_fequal3: case nir_op_ball_iequal3: case nir_op_ball_fequal4: - case nir_op_ball_iequal4: { - unsigned num_components = nir_op_infos[instr-op].input_sizes[0]; - fs_reg temp = vgrf(num_components); - emit_percomp(CMP(temp, op[0], op[1], BRW_CONDITIONAL_Z), - (1 num_components) - 1); - emit_reduction(BRW_OPCODE_AND, result, temp, num_components); - break; - } - + case nir_op_ball_iequal4: We can save it for later, but it might be interesting to let the fs backend get the vector comparisons directly, if that's possible. See https://bugs.freedesktop.org/show_bug.cgi?id=77456 We can. We just have to add a flag to turn that off in the nir_lower_alu_to_scalar pass. @@ -867,83 +827,67 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) unreachable(not reached: should be handled by ldexp_to_arith()); case nir_op_fsqrt: - emit_math_percomp(SHADER_OPCODE_SQRT, result, op[0], -instr-dest.write_mask, instr-dest.saturate); + emit_math(SHADER_OPCODE_SQRT, result, op[0]) + -saturate = instr-dest.saturate; I don't really like the -saturate = ... as a single statement. I know I'm guilty of using it in a unit test recently. Yeah, I'm not sure what to do there. I really don't like having to assign it to a fs_inst variable and then do the saturate. I guess we could. Or we could add something to emit_math. Thoughts? In the brw_eu_emit.c code I worked around this by creating a brw_last_inst macro that always refers to the most recently emitted instruction. Still not 100% crazy about it, but it's been pretty handy. *shrug*. It's an idea, anyway. I'm not too opposed to just creating fs_inst * temporaries. --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] meta/blit: Write gl_FragDepth only when really needed
I haven't worked in meta a lot, but I didn't see anything that didn't look correct. The series is Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] meta/blit: Write gl_FragDepth only when really needed
On Thursday, January 29, 2015 04:20:59 PM Topi Pohjolainen wrote: Currently meta blit programs unconditionally write both color and depth values. This series introduces two flavours - with and without gl_FragDepth - allowing simpler programs for color only blits. Ken had observed that i965 behaves badly emitting five unnecessary moves when the depth is included in the shader. While this didn't improve any performance benches I tried, it doesn't seem to introduce any regressions either in the benches or with piglit. Topi Pohjolainen (5): meta/blit: Add plumbing for shaders without depth meta/blit: Write depth only when asked for meta/blit: Compile programs with and without depth meta: Don't write depth when generating miptrees meta: Don't write depth when decompressing tex-images Series is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org This should also help with future Cherryview optimizations. Thanks! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium/docs: fix docs wrt ARL/ARR/FLR
From: Roland Scheidegger srol...@vmware.com since the address reg holds integer values, ARL/ARR do an implicit float-to-int conversion, so clarify that. Thus it is also incorrect to say that FLR really does the same as ARL. --- src/gallium/docs/source/tgsi.rst | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ff322e8..84b0ed6 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -48,13 +48,13 @@ used. .. math:: - dst.x = \lfloor src.x\rfloor + dst.x = (int) \lfloor src.x\rfloor - dst.y = \lfloor src.y\rfloor + dst.y = (int) \lfloor src.y\rfloor - dst.z = \lfloor src.z\rfloor + dst.z = (int) \lfloor src.z\rfloor - dst.w = \lfloor src.w\rfloor + dst.w = (int) \lfloor src.w\rfloor .. opcode:: MOV - Move @@ -313,8 +313,6 @@ This instruction replicates its result. .. opcode:: FLR - Floor -This is identical to :opcode:`ARL`. - .. math:: dst.x = \lfloor src.x\rfloor @@ -637,13 +635,13 @@ This instruction replicates its result. .. math:: - dst.x = round(src.x) + dst.x = (int) round(src.x) - dst.y = round(src.y) + dst.y = (int) round(src.y) - dst.z = round(src.z) + dst.z = (int) round(src.z) - dst.w = round(src.w) + dst.w = (int) round(src.w) .. opcode:: SSG - Set Sign -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/docs: fix docs wrt ARL/ARR/FLR
On Thu, Jan 29, 2015 at 2:40 PM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com since the address reg holds integer values, ARL/ARR do an implicit float-to-int conversion, so clarify that. Thus it is also incorrect to say that FLR really does the same as ARL. Reviewed-by: Alex Deucher alexander.deuc...@amd.com --- src/gallium/docs/source/tgsi.rst | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ff322e8..84b0ed6 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -48,13 +48,13 @@ used. .. math:: - dst.x = \lfloor src.x\rfloor + dst.x = (int) \lfloor src.x\rfloor - dst.y = \lfloor src.y\rfloor + dst.y = (int) \lfloor src.y\rfloor - dst.z = \lfloor src.z\rfloor + dst.z = (int) \lfloor src.z\rfloor - dst.w = \lfloor src.w\rfloor + dst.w = (int) \lfloor src.w\rfloor .. opcode:: MOV - Move @@ -313,8 +313,6 @@ This instruction replicates its result. .. opcode:: FLR - Floor -This is identical to :opcode:`ARL`. - .. math:: dst.x = \lfloor src.x\rfloor @@ -637,13 +635,13 @@ This instruction replicates its result. .. math:: - dst.x = round(src.x) + dst.x = (int) round(src.x) - dst.y = round(src.y) + dst.y = (int) round(src.y) - dst.z = round(src.z) + dst.z = (int) round(src.z) - dst.w = round(src.w) + dst.w = (int) round(src.w) .. opcode:: SSG - Set Sign -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/docs: fix docs wrt ARL/ARR/FLR
On 29/01/15 19:40, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com since the address reg holds integer values, ARL/ARR do an implicit float-to-int conversion, so clarify that. Thus it is also incorrect to say that FLR really does the same as ARL. --- src/gallium/docs/source/tgsi.rst | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index ff322e8..84b0ed6 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -48,13 +48,13 @@ used. .. math:: - dst.x = \lfloor src.x\rfloor + dst.x = (int) \lfloor src.x\rfloor - dst.y = \lfloor src.y\rfloor + dst.y = (int) \lfloor src.y\rfloor - dst.z = \lfloor src.z\rfloor + dst.z = (int) \lfloor src.z\rfloor - dst.w = \lfloor src.w\rfloor + dst.w = (int) \lfloor src.w\rfloor .. opcode:: MOV - Move @@ -313,8 +313,6 @@ This instruction replicates its result. .. opcode:: FLR - Floor -This is identical to :opcode:`ARL`. - .. math:: dst.x = \lfloor src.x\rfloor @@ -637,13 +635,13 @@ This instruction replicates its result. .. math:: - dst.x = round(src.x) + dst.x = (int) round(src.x) - dst.y = round(src.y) + dst.y = (int) round(src.y) - dst.z = round(src.z) + dst.z = (int) round(src.z) - dst.w = round(src.w) + dst.w = (int) round(src.w) .. opcode:: SSG - Set Sign Looks good. Jose ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] killing off the address reg in tgsi
Hi, for Gallium Nine we use ARR and UARL. Do all cards supporting integers support indexing with an integer register ? If yes we could remove our UARL usage. As for ARR, if all cards are ok with normal registers, we could use a normal one too, as long as we can get the same rounding behaviour (either to a integer directly if card supports it, or to float). Axel Davy On 29/01/2015 22:20, Roland Scheidegger wrote : Hi, the address reg in tgsi is quite a nuisance. glsl-to-tgsi code assumes that indirections can only be done through the address reg and has quite some extra code to deal with this. Even though hardware and apis which worked like that are definitely old by now. Thus, I'm proposing the address reg be nuked. I am however not quite sure what the implications for drivers are, other than I'm certain llvmpipe can handle that already. For that reason, I suspect at least initially a new cap bit would be required so glsl-to-tgsi would skip the extra code. I tend to think longer term it would be great if it could be nuked completely, I am however not sure if that is easily done with drivers for old hw (such as r300) - I guess if necessary we could keep operations such as ARL (or even ARR though clearly not UARL!) and just define them to be usable with temp regs. Opinions? Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] killing off the address reg in tgsi
On Thu, Jan 29, 2015 at 4:20 PM, Roland Scheidegger srol...@vmware.com wrote: Hi, the address reg in tgsi is quite a nuisance. glsl-to-tgsi code assumes that indirections can only be done through the address reg and has quite some extra code to deal with this. Even though hardware and apis which worked like that are definitely old by now. Thus, I'm proposing the address reg be nuked. I am however not quite sure what the implications for drivers are, other than I'm certain llvmpipe can handle that already. For that reason, I suspect at least initially a new cap bit would be required so glsl-to-tgsi would skip the extra code. I tend to think longer term it would be great if it could be nuked completely, I am however not sure if that is easily done with drivers for old hw (such as r300) - I guess if necessary we could keep operations such as ARL (or even ARR though clearly not UARL!) and just define them to be usable with temp regs. Opinions? Not sure about other hw (although I would assume radeon is similar).. but adreno has real actual address register, which need to be used for relative addressing of the register (or const) file. Otherwise we'd need to do something like arrays in memory and load/store instructions. Possibly w/ more smarts in the compiler, we could infer address register usage. It would require some smarts that we don't have yet to eliminate redundant writes to the address register generated for each instruction with indirect reference. (Although eventually I'd like to use tgsi-nir.. and haven't yet looked at how nir handles indirects so maybe eventually I need to implement those smarts in my compiler anyway.) BR, -R Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] nir/algebraic: Fail to compile of a variable is used in a replace but not the search
--- src/glsl/nir/nir_algebraic.py | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/glsl/nir/nir_algebraic.py b/src/glsl/nir/nir_algebraic.py index f9b246d..75436f4 100644 --- a/src/glsl/nir/nir_algebraic.py +++ b/src/glsl/nir/nir_algebraic.py @@ -34,13 +34,18 @@ class VarSet(object): def __init__(self): self.names = {} self.ids = itertools.count() + self.immutable = False; def __getitem__(self, name): if name not in self.names: + assert not self.immutable, Unknown replacement variable: + name self.names[name] = self.ids.next() return self.names[name] + def lock(self): + self.immutable = True + class Value(object): @staticmethod def create(val, name_base, varset): @@ -138,6 +143,8 @@ class SearchAndReplace(object): else: self.search = Expression(search, search{0}.format(self.id), varset) + varset.lock() + if isinstance(replace, Value): self.replace = replace else: -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] nir/search: Allow for matching variables based on types
This allows you to match on an unknown value but only if it is of a given type. 90% of the uses of this are for matching only booleans, but adding the generality of arbitrary types is no more complex. nir_algebraic.py doesn't handle this yet but that's ok because the C language will ensure that the default type on all variables is void. --- src/glsl/nir/nir_search.c | 11 +++ src/glsl/nir/nir_search.h | 12 2 files changed, 23 insertions(+) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index ec89817..35eb7e2 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -82,6 +82,17 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src, instr-src[src].src.ssa-parent_instr-type != nir_instr_type_load_const) return false; + if (var-type != nir_type_void) { +if (instr-src[src].src.ssa-parent_instr-type != nir_instr_type_alu) + return false; + +nir_alu_instr *src_alu = + nir_instr_as_alu(instr-src[src].src.ssa-parent_instr); + +if (nir_op_infos[src_alu-op].output_type != var-type) + return false; + } + state-variables_seen |= (1 var-variable); state-variables[var-variable].src = instr-src[src].src; state-variables[var-variable].abs = false; diff --git a/src/glsl/nir/nir_search.h b/src/glsl/nir/nir_search.h index 18aa28d..7d47792 100644 --- a/src/glsl/nir/nir_search.h +++ b/src/glsl/nir/nir_search.h @@ -54,6 +54,18 @@ typedef struct { * given variable is only allowed to match constant values. */ bool is_constant; + + /** Indicates that the given variable must have a certain type +* +* This is only allowed in search expressions and indicates that the +* given variable is only allowed to match values that come from an ALU +* instruction with the given output type. A type of nir_type_void +* means it can match any type. +* +* Note: A variable that is both constant and has a non-void type will +* never match anything. +*/ + nir_alu_type type; } nir_search_variable; typedef struct { -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] nir/opt_algebraic: Add some constant bcsel reductions
total instructions in shared programs: 8026469 - 8026343 (-0.00%) instructions in affected programs: 28541 - 28415 (-0.44%) helped:101 HURT: 76 --- src/glsl/nir/nir_opt_algebraic.py | 26 ++ 1 file changed, 26 insertions(+) diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index e3b309c..4e1beb4 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -122,9 +122,35 @@ optimizations = [ (('ieq', 'a@bool', 0), ('inot', 'a')), (('bcsel', 'a@bool', True, False), 'a'), (('bcsel', 'a@bool', False, True), ('inot', 'a')), + (('bcsel', True, b, c), b), + (('bcsel', False, b, c), c), + # The result of this should be hit by constant propagation and, in the + # next round of opt_algebraic, get picked up by one of the above two. + (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)), # This one may not be exact (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), ] +# Add optimizations to handle the case where the result of a ternary is +# compared to a constant. This way we can take things like +# +# (a ? 0 : 1) 0 +# +# and turn it into +# +# a ? (0 0) : (1 0) +# +# which constant folding will eat for lunch. The resulting ternary will +# further get cleaned up by the boolean reductions above ane we will be +# left with just the original variable a. +for op in ['flt', 'fge', 'feq', 'fne', + 'ilt', 'ige', 'ieq', 'ine', 'ult', 'uge']: + optimizations += [ + ((op, ('bcsel', 'a', '#b', '#c'), '#d'), + ('bcsel', 'a', (op, 'b', 'd'), (op, 'c', 'd'))), + ((op, '#d', ('bcsel', a, '#b', '#c')), + ('bcsel', 'a', (op, 'd', 'b'), (op, 'd', 'c'))), + ] + print nir_algebraic.AlgebraicPass(nir_opt_algebraic, optimizations).render() -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] nir/opt_algebraic: Add some boolean simplifications
total instructions in shared programs: 8026509 - 8026469 (-0.00%) instructions in affected programs: 6338 - 6298 (-0.63%) helped:11 --- src/glsl/nir/nir_opt_algebraic.py | 5 + 1 file changed, 5 insertions(+) diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 1dea42a..e3b309c 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -117,6 +117,11 @@ optimizations = [ (('frcp', ('frcp', a)), a), (('frcp', ('fsqrt', a)), ('frsq', a)), (('frcp', ('frsq', a)), ('fsqrt', a)), + # Boolean simplifications + (('ine', 'a@bool', 0), 'a'), + (('ieq', 'a@bool', 0), ('inot', 'a')), + (('bcsel', 'a@bool', True, False), 'a'), + (('bcsel', 'a@bool', False, True), ('inot', 'a')), # This one may not be exact (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/7] nir/algebraic: Support specifying variable as constant or by type
--- src/glsl/nir/nir_algebraic.py | 20 +--- src/glsl/nir/nir_opt_algebraic.py | 12 +--- 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/nir_algebraic.py b/src/glsl/nir/nir_algebraic.py index 75436f4..6e7973d 100644 --- a/src/glsl/nir/nir_algebraic.py +++ b/src/glsl/nir/nir_algebraic.py @@ -28,6 +28,7 @@ import itertools import struct import sys import mako.template +import re # Represents a set of variables, each with a unique id class VarSet(object): @@ -65,6 +66,8 @@ static const ${val.c_type} ${val.name} = { { ${hex(val)} /* ${val.value} */ }, % elif isinstance(val, Variable): ${val.index}, /* ${val.var_name} */ + ${'true' if val.is_constant else 'false'}, + nir_type_${'void' if val.required_type is None else val.required_type}, % elif isinstance(val, Expression): nir_op_${val.opcode}, { ${', '.join(src.c_ptr for src in val.sources)} }, @@ -111,12 +114,23 @@ class Constant(Value): else: assert False +_var_name_re = re.compile(r(?Pconst#)?(?Pname\w+)(?:@(?Ptype\w+))?) + class Variable(Value): def __init__(self, val, name, varset): Value.__init__(self, name, variable) - self.var_name = val - self.index = varset[val] - self.name = name + + m = _var_name_re.match(val) + assert m and m.group('name') is not None + + self.var_name = m.group('name') + self.is_constant = m.group('const') is not None + self.required_type = m.group('type') + + if self.required_type is not None: + assert self.required_type in ('float', 'bool', 'int', 'unsigned') + + self.index = varset[self.var_name] class Expression(Value): def __init__(self, expr, name_base, varset): diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 9c62b28..1dea42a 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -36,9 +36,15 @@ d = 'd' # and replace is either an expression or a value. An expression is # defined as a tuple of the form (op, src0, src1, src2, src3) # where each source is either an expression or a value. A value can be -# either a numeric constant or a string representing a variable name. For -# constants, you have to be careful to make sure that it is the right type -# because python is unaware of the source and destination types of the +# either a numeric constant or a string representing a variable name. +# +# Variable names are specified as [#]name[@type] where # inicates that +# the given variable will only match constants and thpe type indicates that +# the given variable will only match values from ALU instructions with the +# given output type. +# +# For constants, you have to be careful to make sure that it is the right +# type because python is unaware of the source and destination types of the # opcodes. optimizations = [ -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] nir: Add a void type
This allows us to indicate a concept of an invalid type. --- src/glsl/nir/nir.h | 1 + 1 file changed, 1 insertion(+) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 980fdd0..f2050de 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -576,6 +576,7 @@ void nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src, void *mem_ctx); typedef enum { + nir_type_void = 0, /* Not a valid type */ nir_type_float, nir_type_int, nir_type_unsigned, -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/7] nir/search: Add support for matching unknown constants
There are some algebraic transformations that we want to do but only if certain things are constants. For instance, we may want to replace a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant. While this generates more instructions, some of it will get constant folded. nir_algebraic.py doesn't handle this yet, but that's ok because the C language will make sure that false is the default for now. --- src/glsl/nir/nir_search.c | 6 ++ src/glsl/nir/nir_search.h | 7 +++ 2 files changed, 13 insertions(+) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 18e0330..ec89817 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -78,6 +78,10 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src, return true; } else { + if (var-is_constant + instr-src[src].src.ssa-parent_instr-type != nir_instr_type_load_const) +return false; + state-variables_seen |= (1 var-variable); state-variables[var-variable].src = instr-src[src].src; state-variables[var-variable].abs = false; @@ -236,6 +240,8 @@ construct_value(const nir_search_value *value, nir_alu_type type, nir_alu_src val; nir_alu_src_copy(val, state-variables[var-variable], mem_ctx); + assert(!var-is_constant); + return val; } diff --git a/src/glsl/nir/nir_search.h b/src/glsl/nir/nir_search.h index 8ec58b0..18aa28d 100644 --- a/src/glsl/nir/nir_search.h +++ b/src/glsl/nir/nir_search.h @@ -47,6 +47,13 @@ typedef struct { /** The variable index; Must be less than NIR_SEARCH_MAX_VARIABLES */ unsigned variable; + + /** Indicates that the given variable must be a constant +* +* This is only alloed in search expressions and indicates that the +* given variable is only allowed to match constant values. +*/ + bool is_constant; } nir_search_variable; typedef struct { -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] main: make the intel driver obey drirc's force_glsl_version
Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mesa/drivers/dri/i915/intel_screen.c | 1 + src/mesa/drivers/dri/i965/brw_context.c | 3 +++ src/mesa/drivers/dri/i965/intel_screen.c | 1 + 3 files changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i915/intel_screen.c b/src/mesa/drivers/dri/i915/intel_screen.c index 748eee7..59ebabd 100644 --- a/src/mesa/drivers/dri/i915/intel_screen.c +++ b/src/mesa/drivers/dri/i915/intel_screen.c @@ -71,6 +71,7 @@ DRI_CONF_BEGIN DRI_CONF_ALWAYS_FLUSH_BATCH(false) DRI_CONF_ALWAYS_FLUSH_CACHE(false) DRI_CONF_DISABLE_THROTTLING(false) + DRI_CONF_FORCE_GLSL_VERSION(0) DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false) DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false) DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED(false) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 6fa54a3..09a5dc1 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -641,6 +641,9 @@ brw_process_driconf_options(struct brw_context *brw) brw-precompile = driQueryOptionb(brw-optionCache, shader_precompile); + ctx-Const.ForceGLSLVersion = + driQueryOptioni(options, force_glsl_version); + ctx-Const.ForceGLSLExtensionsWarn = driQueryOptionb(options, force_glsl_extensions_warn); diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 013222f..7bd8b44 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -77,6 +77,7 @@ DRI_CONF_BEGIN DRI_CONF_ALWAYS_FLUSH_BATCH(false) DRI_CONF_ALWAYS_FLUSH_CACHE(false) DRI_CONF_DISABLE_THROTTLING(false) + DRI_CONF_FORCE_GLSL_VERSION(0) DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false) DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false) DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED(false) -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] mesa: add a workaround for unigine Tropics
While trying to understand a GLSL pass, curro and I tried running Unigine Tropics and the GLSL compilers would not compile the shaders. The reason is due to the fact that the shaders do not specify the needed GLSL version but also use in the same shader keywords that could never co-exist because one got deleted when the other one was added (for instance, gl_TexCoord and gl_InstanceID). The current solution was to use the force_glsl_extensions_warn workaround but it broke when GL_ARB_gpu_shader5 got introduced as this workaround also enabled this extension which reserved the name sample which is then used by most of Unigine Tropics' shaders. To fix this, the easiest solution seem to introduce a workaround to disable GL_ARB_gpu_shader5 in the GLSL compiler. This is what this patch does along with modifying drirc to enable this workaround on tropics. This patch has been tested on Haswell. It should also work on Gallium-based drivers but I did not test it. I would like to thank curro for helping me understand the whole issue and directed me to the right place in the code. Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/gallium/include/state_tracker/st_api.h | 1 + src/gallium/state_trackers/dri/dri_screen.c | 3 +++ src/glsl/glsl_parser_extras.cpp | 3 +++ src/mesa/drivers/dri/common/drirc | 1 + src/mesa/drivers/dri/common/xmlpool/t_options.h | 4 src/mesa/drivers/dri/i915/intel_screen.c| 1 + src/mesa/drivers/dri/i965/brw_context.c | 3 +++ src/mesa/drivers/dri/i965/intel_screen.c| 1 + src/mesa/main/mtypes.h | 7 +++ src/mesa/state_tracker/st_extensions.c | 3 +++ 10 files changed, 27 insertions(+) diff --git a/src/gallium/include/state_tracker/st_api.h b/src/gallium/include/state_tracker/st_api.h index 86fdc69..01bc98d 100644 --- a/src/gallium/include/state_tracker/st_api.h +++ b/src/gallium/include/state_tracker/st_api.h @@ -246,6 +246,7 @@ struct st_config_options unsigned force_glsl_version; boolean force_s3tc_enable; boolean allow_glsl_extension_directive_midshader; + boolean disable_glsl_extension_gpu_shader5; }; /** diff --git a/src/gallium/state_trackers/dri/dri_screen.c b/src/gallium/state_trackers/dri/dri_screen.c index 9cdebf8..091aecf 100644 --- a/src/gallium/state_trackers/dri/dri_screen.c +++ b/src/gallium/state_trackers/dri/dri_screen.c @@ -70,6 +70,7 @@ const __DRIconfigOptionsExtension gallium_config_options = { DRI_CONF_DISABLE_SHADER_BIT_ENCODING(false) DRI_CONF_FORCE_GLSL_VERSION(0) DRI_CONF_ALLOW_GLSL_EXTENSION_DIRECTIVE_MIDSHADER(false) + DRI_CONF_DISABLE_GLSL_EXTENSION_GPU_SHADER5(false) DRI_CONF_SECTION_END DRI_CONF_SECTION_MISCELLANEOUS @@ -98,6 +99,8 @@ dri_fill_st_options(struct st_config_options *options, driQueryOptionb(optionCache, force_s3tc_enable); options-allow_glsl_extension_directive_midshader = driQueryOptionb(optionCache, allow_glsl_extension_directive_midshader); + options-disable_glsl_extension_gpu_shader5 = + driQueryOptionb(optionCache, disable_glsl_extension_gpu_shader5); } static const __DRIconfig ** diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index ccdf031..fff04b8 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -217,6 +217,9 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx, sizeof(this-atomic_counter_offsets)); this-allow_extension_directive_midshader = ctx-Const.AllowGLSLExtensionDirectiveMidShader; + + if (ctx-Const.DisableGLSLExtensionGpuShader5) + this-ARB_gpu_shader5_enable = false; } /** diff --git a/src/mesa/drivers/dri/common/drirc b/src/mesa/drivers/dri/common/drirc index cecd6a9..3199cfc 100644 --- a/src/mesa/drivers/dri/common/drirc +++ b/src/mesa/drivers/dri/common/drirc @@ -41,6 +41,7 @@ TODO: document the other workarounds. application name=Unigine Tropics executable=Tropics option name=force_glsl_extensions_warn value=true / +option name=disable_glsl_extension_gpu_shader5 value=true / option name=disable_blend_func_extended value=true / /application diff --git a/src/mesa/drivers/dri/common/xmlpool/t_options.h b/src/mesa/drivers/dri/common/xmlpool/t_options.h index 4e5a721..d1869d6 100644 --- a/src/mesa/drivers/dri/common/xmlpool/t_options.h +++ b/src/mesa/drivers/dri/common/xmlpool/t_options.h @@ -110,6 +110,10 @@ DRI_CONF_OPT_BEGIN_B(allow_glsl_extension_directive_midshader, def) \ DRI_CONF_DESC(en,gettext(Allow GLSL #extension directives in the middle of shaders)) \ DRI_CONF_OPT_END +#define DRI_CONF_DISABLE_GLSL_EXTENSION_GPU_SHADER5(def) \ +DRI_CONF_OPT_BEGIN_B(disable_glsl_extension_gpu_shader5, def) \ +DRI_CONF_DESC(en,gettext(Disable GLSL support for the GL_ARB_gpu_shader5)) \ +DRI_CONF_OPT_END
Re: [Mesa-dev] [PATCH 6/7] nir/opt_algebraic: Add some boolean simplifications
On Thu, Jan 29, 2015 at 12:50 PM, Jason Ekstrand ja...@jlekstrand.net wrote: total instructions in shared programs: 8026509 - 8026469 (-0.00%) instructions in affected programs: 6338 - 6298 (-0.63%) helped:11 Ignore those numbers. they're bogus. Here are the real ones: total instructions in shared programs: 5998321 - 5998287 (-0.00%) instructions in affected programs: 4520 - 4486 (-0.75%) helped:8 --- src/glsl/nir/nir_opt_algebraic.py | 5 + 1 file changed, 5 insertions(+) diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 1dea42a..e3b309c 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -117,6 +117,11 @@ optimizations = [ (('frcp', ('frcp', a)), a), (('frcp', ('fsqrt', a)), ('frsq', a)), (('frcp', ('frsq', a)), ('fsqrt', a)), + # Boolean simplifications + (('ine', 'a@bool', 0), 'a'), + (('ieq', 'a@bool', 0), ('inot', 'a')), + (('bcsel', 'a@bool', True, False), 'a'), + (('bcsel', 'a@bool', False, True), ('inot', 'a')), # This one may not be exact (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] mesa: add a workaround for unigine Tropics
On Thursday, January 29, 2015 11:14:26 PM Martin Peres wrote: While trying to understand a GLSL pass, curro and I tried running Unigine Tropics and the GLSL compilers would not compile the shaders. The reason is due to the fact that the shaders do not specify the needed GLSL version but also use in the same shader keywords that could never co-exist because one got deleted when the other one was added (for instance, gl_TexCoord and gl_InstanceID). The current solution was to use the force_glsl_extensions_warn workaround but it broke when GL_ARB_gpu_shader5 got introduced as this workaround also enabled this extension which reserved the name sample which is then used by most of Unigine Tropics' shaders. To fix this, the easiest solution seem to introduce a workaround to disable GL_ARB_gpu_shader5 in the GLSL compiler. This is what this patch does along with modifying drirc to enable this workaround on tropics. This patch has been tested on Haswell. It should also work on Gallium-based drivers but I did not test it. I would like to thank curro for helping me understand the whole issue and directed me to the right place in the code. Signed-off-by: Martin Peres martin.pe...@linux.intel.com Hi Martin! Thanks for fixing this. One small comment... [snip] diff --git a/src/mesa/drivers/dri/i915/intel_screen.c b/src/mesa/drivers/dri/i915/intel_screen.c index 00d8580..748eee7 100644 --- a/src/mesa/drivers/dri/i915/intel_screen.c +++ b/src/mesa/drivers/dri/i915/intel_screen.c @@ -74,6 +74,7 @@ DRI_CONF_BEGIN DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false) DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false) DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED(false) + DRI_CONF_DISABLE_GLSL_EXTENSION_GPU_SHADER5(false) DRI_CONF_OPT_BEGIN_B(shader_precompile, true) DRI_CONF_DESC(en, Perform code generation at shader link time.) You can probably drop the i915 change - it doesn't support ARB_gpu_shader5 anyway. With that removed, you can add: Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82897 Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] main: make the intel driver obey drirc's force_glsl_version
On Thursday, January 29, 2015 11:14:27 PM Martin Peres wrote: Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mesa/drivers/dri/i915/intel_screen.c | 1 + src/mesa/drivers/dri/i965/brw_context.c | 3 +++ src/mesa/drivers/dri/i965/intel_screen.c | 1 + 3 files changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i915/intel_screen.c b/src/mesa/drivers/dri/i915/intel_screen.c index 748eee7..59ebabd 100644 --- a/src/mesa/drivers/dri/i915/intel_screen.c +++ b/src/mesa/drivers/dri/i915/intel_screen.c @@ -71,6 +71,7 @@ DRI_CONF_BEGIN DRI_CONF_ALWAYS_FLUSH_BATCH(false) DRI_CONF_ALWAYS_FLUSH_CACHE(false) DRI_CONF_DISABLE_THROTTLING(false) + DRI_CONF_FORCE_GLSL_VERSION(0) DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false) DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false) DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED(false) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 6fa54a3..09a5dc1 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -641,6 +641,9 @@ brw_process_driconf_options(struct brw_context *brw) brw-precompile = driQueryOptionb(brw-optionCache, shader_precompile); + ctx-Const.ForceGLSLVersion = + driQueryOptioni(options, force_glsl_version); + ctx-Const.ForceGLSLExtensionsWarn = driQueryOptionb(options, force_glsl_extensions_warn); diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index 013222f..7bd8b44 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -77,6 +77,7 @@ DRI_CONF_BEGIN DRI_CONF_ALWAYS_FLUSH_BATCH(false) DRI_CONF_ALWAYS_FLUSH_CACHE(false) DRI_CONF_DISABLE_THROTTLING(false) + DRI_CONF_FORCE_GLSL_VERSION(0) DRI_CONF_FORCE_GLSL_EXTENSIONS_WARN(false) DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS(false) DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED(false) I've been meaning to do this for a while :) Thanks! Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 7/7] nir/opt_algebraic: Add some constant bcsel reductions
On Thu, Jan 29, 2015 at 12:50 PM, Jason Ekstrand ja...@jlekstrand.net wrote: total instructions in shared programs: 8026469 - 8026343 (-0.00%) instructions in affected programs: 28541 - 28415 (-0.44%) helped:101 HURT: 76 These numbers are bogus too. The real ones look good for NIR but not for FS. I'm looking into that. --Jason --- src/glsl/nir/nir_opt_algebraic.py | 26 ++ 1 file changed, 26 insertions(+) diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index e3b309c..4e1beb4 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -122,9 +122,35 @@ optimizations = [ (('ieq', 'a@bool', 0), ('inot', 'a')), (('bcsel', 'a@bool', True, False), 'a'), (('bcsel', 'a@bool', False, True), ('inot', 'a')), + (('bcsel', True, b, c), b), + (('bcsel', False, b, c), c), + # The result of this should be hit by constant propagation and, in the + # next round of opt_algebraic, get picked up by one of the above two. + (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)), # This one may not be exact (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), ] +# Add optimizations to handle the case where the result of a ternary is +# compared to a constant. This way we can take things like +# +# (a ? 0 : 1) 0 +# +# and turn it into +# +# a ? (0 0) : (1 0) +# +# which constant folding will eat for lunch. The resulting ternary will +# further get cleaned up by the boolean reductions above ane we will be +# left with just the original variable a. +for op in ['flt', 'fge', 'feq', 'fne', + 'ilt', 'ige', 'ieq', 'ine', 'ult', 'uge']: + optimizations += [ + ((op, ('bcsel', 'a', '#b', '#c'), '#d'), + ('bcsel', 'a', (op, 'b', 'd'), (op, 'c', 'd'))), + ((op, '#d', ('bcsel', a, '#b', '#c')), + ('bcsel', 'a', (op, 'd', 'b'), (op, 'd', 'c'))), + ] + print nir_algebraic.AlgebraicPass(nir_opt_algebraic, optimizations).render() -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Add new fast mtx_t mutex type for basic use cases
On 29/01/15 17:14, Kristian Høgsberg wrote: On Thu, Jan 29, 2015 at 6:36 AM, Emil Velikov emil.l.veli...@gmail.com wrote: On 28/01/15 05:08, Kristian Høgsberg wrote: While modern pthread mutexes are very fast, they still incur a call to an external DSO and overhead of the generality and features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, and the idea here is that we can inline the atomic operation and make the fast case just two intructions. Mutexes are subtle and finicky to implement, so we carefully copy the implementation from Ulrich Dreppers well-written and well-reviewed paper: Futexes Are Tricky http://www.akkadia.org/drepper/futex.pdf We implement mutex3, which gives us a mutex that has no syscalls on uncontended lock or unlock. Further, the uncontended case boils down to a cmpxchg and an untaken branch and the uncontended unlock is just a locked decr and an untaken branch. We use __builtin_expect() to indicate that contention is unlikely so that gcc will put the contention code out of the main code flow. A fast mutex only supports lock/unlock, can't be recursive or used with condition variables. We keep the pthread mutex implementation around as full_mtx_t for the few places where we use condition variables or recursive locking. For platforms or compilers where futex and atomics aren't available, mtx_t falls back to the pthread mutex. The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound applications. Most CPU bound cases are helped and some of our internal bind_buffer_object heavy benchmarks gain up to 10%. Hi Kristian, Can I humbly ask that you split this into two patches - one that introduces the new functions/struct and another one that uses them ? This way it'll be easier if/when things go crazy. Also the patch seems to wonder between posix and win32 + typedef full_mtx_t mtx_t; and + typedef mtx_t fast_mtx_t; Looks like a left over from the should I rename XX variables to fast* or just one to full* moment :) Yeah, that's how it progressed :) At first I called it fast_mtx_t and planned on replacing simple uses of mtx_t one by one. Jordan suggested that it'd be easier to make the regular mutex fast and then rename the couple of places that use more feature than we provide. I just botched the windows side when I did that. However, given that the patch now reimplements mtx_t, there's really no good way to split up introducing the fast mutex and making use of it in multiple patches. I guess we can add full_mtx_t in a separate patch, use it in a second patch and then finally reimplement mtx_t as a fast mutex. How does that sound? That sounds great. Thank you. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 7/7] nir/opt_algebraic: Add some constant bcsel reductions
total instructions in shared programs: 5998190 - 5997603 (-0.01%) instructions in affected programs: 54276 - 53689 (-1.08%) helped:293 --- src/glsl/nir/nir_opt_algebraic.py | 30 -- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index e3b309c..3223119 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -120,11 +120,37 @@ optimizations = [ # Boolean simplifications (('ine', 'a@bool', 0), 'a'), (('ieq', 'a@bool', 0), ('inot', 'a')), - (('bcsel', 'a@bool', True, False), 'a'), - (('bcsel', 'a@bool', False, True), ('inot', 'a')), + (('bcsel', a, True, False), ('ine', a, 0)), + (('bcsel', a, False, True), ('ieq', a, 0)), + (('bcsel', True, b, c), b), + (('bcsel', False, b, c), c), + # The result of this should be hit by constant propagation and, in the + # next round of opt_algebraic, get picked up by one of the above two. + (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)), # This one may not be exact (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), ] +# Add optimizations to handle the case where the result of a ternary is +# compared to a constant. This way we can take things like +# +# (a ? 0 : 1) 0 +# +# and turn it into +# +# a ? (0 0) : (1 0) +# +# which constant folding will eat for lunch. The resulting ternary will +# further get cleaned up by the boolean reductions above ane we will be +# left with just the original variable a. +for op in ['flt', 'fge', 'feq', 'fne', + 'ilt', 'ige', 'ieq', 'ine', 'ult', 'uge']: + optimizations += [ + ((op, ('bcsel', 'a', '#b', '#c'), '#d'), + ('bcsel', 'a', (op, 'b', 'd'), (op, 'c', 'd'))), + ((op, '#d', ('bcsel', a, '#b', '#c')), + ('bcsel', 'a', (op, 'd', 'b'), (op, 'd', 'c'))), + ] + print nir_algebraic.AlgebraicPass(nir_opt_algebraic, optimizations).render() -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Force a BINDING_TABLE_POINTER_* command after push constant command
On Thursday, January 29, 2015 05:21:03 PM Neil Roberts wrote: According to the SkyLake bspec the 3DSTATE_CONSTANT_* commands only take effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_* command. This patch just makes it set the BRW_NEW_SURFACES state when uploading the push constants to ensure the binding tables will be updated. This fixes the fbo-blending-formats Piglit test and possibly others. Reviewed-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/gen7_vs_state.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c b/src/mesa/drivers/dri/i965/gen7_vs_state.c index 404dd20..c55d6ca 100644 --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c @@ -60,6 +60,13 @@ gen7_upload_constant_state(struct brw_context *brw, } ADVANCE_BATCH(); + + /* On SkyLake+ the new constants don't take effect until the next + * corresponding 3DSTATE_BINDING_TABLE_POINTER_* command is parsed so we + * need to ensure that is sent + */ + if (brw-gen = 9) + brw-state.dirty.brw |= BRW_NEW_SURFACES; } Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] killing off the address reg in tgsi
Roland Scheidegger srol...@vmware.com writes: Hi, the address reg in tgsi is quite a nuisance. glsl-to-tgsi code assumes that indirections can only be done through the address reg and has quite some extra code to deal with this. Even though hardware and apis which worked like that are definitely old by now. Thus, I'm proposing the address reg be nuked. I am however not quite sure what the implications for drivers are, other than I'm certain llvmpipe can handle that already. For that reason, I suspect at least initially a new cap bit would be required so glsl-to-tgsi would skip the extra code. I tend to think longer term it would be great if it could be nuked completely, I am however not sure if that is easily done with drivers for old hw (such as r300) - I guess if necessary we could keep operations such as ARL (or even ARR though clearly not UARL!) and just define them to be usable with temp regs. Yes, please, let's kill ARL. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add variants of some of the comparison simplifications.
On Wed, Jan 28, 2015 at 11:20 PM, Eric Anholt e...@anholt.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I've got a patch I cooked up today that generalizes this a bit. I haven't sent it yet because it lacks shader-db statistics. That said, I'm fine with merging this in the mean time. Not sure what generalization you're working on, but something that let me have any sort of bool as an expression node, and that and(flt(a, b), flt(c, d)) was recognized as a bool would be nice for some of these patterns. I was wondering if band/bor operations that were only defined on bool arguments would be a good idea (even though they'd codegen just like normal iand/ior), but then I got to thinking that this sounds like just a specific case of generally wanting possible-value categorization for SSA nodes. Yeah, that's what it does. It's an extension to the current system to allow you to specify that a variable must match a certain type. I just sent the series. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] gallium/docs: add some freedreno compiler docs
From: Rob Clark robcl...@freedesktop.org Enable the 'sphinx.ext.graphviz' extension, and add in a section for driver specific docs, with freedreno compiler docs beneath. The goal is for more complete compiler docs, and hopefully some docs about other parts of the driver (such as how tiling works, etc). Note that there is also a Distribution - Drivers section. Although that appears to be simply just a list of drivers. Not sure if that should move under the 'Drivers' section or left alone. I did add a one-line section for freedreno in the existing Distribution - Drivers section. Also, I'm not sure if there is a good way to link back to .rst source files that live under the src/gallium/drivers/foo directory, or if that is even desired. At least adding this under the existing gallium docs, versus my initial approach which setup a new sphinx build docs tree under the freedreno driver directory, results in a much smaller patch due to not having an extra Makefile, conf.py, etc. --- src/gallium/docs/source/conf.py| 2 +- src/gallium/docs/source/distro.rst | 5 + src/gallium/docs/source/drivers.rst| 9 + src/gallium/docs/source/drivers/freedreno.rst | 9 + .../docs/source/drivers/freedreno/ir3-notes.rst| 386 + src/gallium/docs/source/index.rst | 1 + 6 files changed, 411 insertions(+), 1 deletion(-) create mode 100644 src/gallium/docs/source/drivers.rst create mode 100644 src/gallium/docs/source/drivers/freedreno.rst create mode 100644 src/gallium/docs/source/drivers/freedreno/ir3-notes.rst diff --git a/src/gallium/docs/source/conf.py b/src/gallium/docs/source/conf.py index 1288666..5e8173d 100644 --- a/src/gallium/docs/source/conf.py +++ b/src/gallium/docs/source/conf.py @@ -22,7 +22,7 @@ sys.path.append(os.path.abspath('exts')) # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. -extensions = ['sphinx.ext.pngmath', 'formatting'] +extensions = ['sphinx.ext.pngmath', 'sphinx.ext.graphviz', 'formatting'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] diff --git a/src/gallium/docs/source/distro.rst b/src/gallium/docs/source/distro.rst index d69c186..a841b34 100644 --- a/src/gallium/docs/source/distro.rst +++ b/src/gallium/docs/source/distro.rst @@ -60,6 +60,11 @@ AMD radeonsi Driver for the AMD Southern Islands family of GPUs. +freedreno +^ + +Driver for Qualcomm Adreno a2xx, a3xx, and a4xx series of GPUs. + .. _softpipe: Softpipe diff --git a/src/gallium/docs/source/drivers.rst b/src/gallium/docs/source/drivers.rst new file mode 100644 index 000..469197c --- /dev/null +++ b/src/gallium/docs/source/drivers.rst @@ -0,0 +1,9 @@ +Drivers +=== + +Driver specific documentation. + +.. toctree:: + :glob: + + drivers/* diff --git a/src/gallium/docs/source/drivers/freedreno.rst b/src/gallium/docs/source/drivers/freedreno.rst new file mode 100644 index 000..723ffdd --- /dev/null +++ b/src/gallium/docs/source/drivers/freedreno.rst @@ -0,0 +1,9 @@ +Freedreno += + +Freedreno driver specific docs. + +.. toctree:: + :glob: + + freedreno/* diff --git a/src/gallium/docs/source/drivers/freedreno/ir3-notes.rst b/src/gallium/docs/source/drivers/freedreno/ir3-notes.rst new file mode 100644 index 000..dd28700 --- /dev/null +++ b/src/gallium/docs/source/drivers/freedreno/ir3-notes.rst @@ -0,0 +1,386 @@ +IR3 NOTES += + +Some notes about ir3, the compiler and machine-specific IR for the shader ISA introduced with adreno a3xx. The same shader ISA is present, with some small differences, with adreno a4xx. + +Compared to the previous generation a2xx ISA (ir2), the a3xx ISA is a simple scalar instruction set. However, the compiler is responsible, in most cases, to schedule the instructions. The hardware does not try to hide the shader core pipeline stages. For a common example, a common (cat2) ALU instruction takes four cycles, so a subsequent cat2 instruction which uses the result must have three intervening instructions (or nops). When operating on vec4's, typically the corresponding scalar instructions for operating on the remaining three components could typically fit. Although that results in a lot of edge cases where things fall over, like: + +:: + + ADD TEMP[0], TEMP[1], TEMP[2] + MUL TEMP[0], TEMP[1], TEMP[0].wzyx + +Here, the second instruction needs the output of the first group of scalar instructions in the wrong order, resulting in not enough instruction spots between the ``add r0.w, r1.w, r2.w`` and ``mul r0.x, r1.x, r0.w``. Which is why the original/old compiler which merely translated nearly literally from TGSI to ir3, had a strong tendency to fall over. + +So the current compiler instead, in the frontend, generates a directed-acyclic-graph of instructions and basic
[Mesa-dev] [PATCH v5 2/3] i965/fs: Use NIR's scalarizing abilities and stop handling vectors
Now that we can scalarize with NIR, there's no need for all this code anymore. Let's get rid of it and just do scalar operations. v2: run copy prop before lowering phi nodes v3: Get rid of the emit(...)-saturate = foo pattern total instructions in shared programs: 5998321 - 5974070 (-0.40%) instructions in affected programs: 732075 - 707824 (-3.31%) helped:3137 HURT: 191 GAINED:18 LOST: 0 --- src/mesa/drivers/dri/i965/brw_fs.h | 15 - src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 476 ++- 2 files changed, 153 insertions(+), 338 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 84e0b9e..25197cd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -592,21 +592,6 @@ public: fs_reg get_nir_alu_src(nir_alu_instr *instr, unsigned src); fs_reg get_nir_dest(nir_dest dest); void emit_percomp(fs_inst *inst, unsigned wr_mask); - void emit_percomp(enum opcode op, fs_reg dest, fs_reg src0, - unsigned wr_mask, bool saturate = false, - enum brw_predicate predicate = BRW_PREDICATE_NONE, - enum brw_conditional_mod mod = BRW_CONDITIONAL_NONE); - void emit_percomp(enum opcode op, fs_reg dest, fs_reg src0, fs_reg src1, - unsigned wr_mask, bool saturate = false, - enum brw_predicate predicate = BRW_PREDICATE_NONE, - enum brw_conditional_mod mod = BRW_CONDITIONAL_NONE); - void emit_math_percomp(enum opcode op, fs_reg dest, fs_reg src0, - unsigned wr_mask, bool saturate = false); - void emit_math_percomp(enum opcode op, fs_reg dest, fs_reg src0, - fs_reg src1, unsigned wr_mask, - bool saturate = false); - void emit_reduction(enum opcode op, fs_reg dest, fs_reg src, - unsigned num_components); int setup_color_payload(fs_reg *dst, fs_reg color, unsigned components); void emit_alpha_test(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index de0d780..37ccd24 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -36,6 +36,10 @@ nir_optimize(nir_shader *nir) nir_validate_shader(nir); progress |= nir_copy_prop(nir); nir_validate_shader(nir); + nir_lower_phis_to_scalar(nir); + nir_validate_shader(nir); + progress |= nir_copy_prop(nir); + nir_validate_shader(nir); progress |= nir_opt_dce(nir); nir_validate_shader(nir); progress |= nir_opt_cse(nir); @@ -85,6 +89,9 @@ fs_visitor::emit_nir_code() nir_split_var_copies(nir); nir_validate_shader(nir); + nir_lower_alu_to_scalar(nir); + nir_validate_shader(nir); + nir_optimize(nir); /* Lower a bunch of stuff */ @@ -532,6 +539,7 @@ void fs_visitor::nir_emit_alu(nir_alu_instr *instr) { struct brw_wm_prog_key *fs_key = (struct brw_wm_prog_key *) this-key; + fs_inst *inst; fs_reg op[3]; fs_reg result = get_nir_dest(instr-dest.dest); @@ -540,20 +548,30 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) for (unsigned i = 0; i nir_op_infos[instr-op].num_inputs; i++) op[i] = get_nir_alu_src(instr, i); + if (nir_op_infos[instr-op].output_size == 0) { + /* We've already scalarized, so we know that we only have one + * channel. The only question is which channel. + */ + assert(_mesa_bitcount(instr-dest.write_mask) == 1); + unsigned off = ffs(instr-dest.write_mask) - 1; + result = offset(result, off); + + for (unsigned i = 0; i nir_op_infos[instr-op].num_inputs; i++) + op[i] = offset(op[i], off); + } + switch (instr-op) { case nir_op_fmov: case nir_op_i2f: - case nir_op_u2f: { - fs_inst *inst = MOV(result, op[0]); + case nir_op_u2f: + inst = emit(MOV(result, op[0])); inst-saturate = instr-dest.saturate; - emit_percomp(inst, instr-dest.write_mask); - } break; case nir_op_imov: case nir_op_f2i: case nir_op_f2u: - emit_percomp(MOV(result, op[0]), instr-dest.write_mask); + emit(MOV(result, op[0])); break; case nir_op_fsign: { @@ -562,55 +580,46 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) * Predicated OR ORs 1.0 (0x3f80) with the sign bit if val is not * zero. */ - emit_percomp(CMP(reg_null_f, op[0], fs_reg(0.0f), BRW_CONDITIONAL_NZ), - instr-dest.write_mask); + emit(CMP(reg_null_f, op[0], fs_reg(0.0f), BRW_CONDITIONAL_NZ)); fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UD); op[0].type = BRW_REGISTER_TYPE_UD; result.type = BRW_REGISTER_TYPE_UD; -
[Mesa-dev] [PATCH v5 1/3] nir: Add a pass to lower vector phi nodes to scalar phi nodes
v2 Jason Ekstrand jason.ekstr...@intel.com: - Add better comments - Use nir_ssa_dest_init and nir_src_for_ssa more places - Fix some void * casts v3 Jason Ekstrand jason.ekstr...@intel.com: - Rework the way we determine whether or not to sccalarize a phi node to make the recursion non-bogus - Treat load_const instructions as scalarizable v4 Jason Ekstrand jason.ekstr...@intel.com: - Allow uniform and input loads to be scalarizable --- src/glsl/Makefile.sources | 1 + src/glsl/nir/nir.h | 2 + src/glsl/nir/nir_lower_phis_to_scalar.c | 278 3 files changed, 281 insertions(+) create mode 100644 src/glsl/nir/nir_lower_phis_to_scalar.c diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index face22e..bf6b70b 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -31,6 +31,7 @@ NIR_FILES = \ nir/nir_lower_global_vars_to_local.c \ nir/nir_lower_locals_to_regs.c \ nir/nir_lower_io.c \ + nir/nir_lower_phis_to_scalar.c \ nir/nir_lower_samplers.cpp \ nir/nir_lower_system_values.c \ nir/nir_lower_to_source_mods.c \ diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 980fdd0..4f58eee 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1526,6 +1526,8 @@ void nir_remove_dead_variables(nir_shader *shader); void nir_lower_vec_to_movs(nir_shader *shader); void nir_lower_alu_to_scalar(nir_shader *shader); +void nir_lower_phis_to_scalar(nir_shader *shader); + void nir_lower_samplers(nir_shader *shader, struct gl_shader_program *shader_program, struct gl_program *prog); diff --git a/src/glsl/nir/nir_lower_phis_to_scalar.c b/src/glsl/nir/nir_lower_phis_to_scalar.c new file mode 100644 index 000..bf65f5a --- /dev/null +++ b/src/glsl/nir/nir_lower_phis_to_scalar.c @@ -0,0 +1,278 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Jason Ekstrand (ja...@jlekstrand.net) + * + */ + +#include nir.h + +/* + * Implements a pass that lowers vector phi nodes to scalar phi nodes when + * we don't think it will hurt anything. + */ + +struct lower_phis_to_scalar_state { + void *mem_ctx; + void *dead_ctx; + + /* Hash table marking which phi nodes are scalarizable. The key is +* pointers to phi instructions and the entry is either NULL for not +* scalarizable or non-null for scalarizable. +*/ + struct hash_table *phi_table; +}; + +static bool +should_lower_phi(nir_phi_instr *phi, struct lower_phis_to_scalar_state *state); + +static bool +is_phi_src_scalarizable(nir_phi_src *src, +struct lower_phis_to_scalar_state *state) +{ + /* Don't know what to do with non-ssa sources */ + if (!src-src.is_ssa) + return false; + + nir_instr *src_instr = src-src.ssa-parent_instr; + switch (src_instr-type) { + case nir_instr_type_alu: { + nir_alu_instr *src_alu = nir_instr_as_alu(src_instr); + + /* ALU operations with output_size == 0 should be scalarized. We + * will also see a bunch of vecN operations from scalarizing ALU + * operations and, since they can easily be copy-propagated, they + * are ok too. + */ + return nir_op_infos[src_alu-op].output_size == 0 || + src_alu-op != nir_op_vec2 || + src_alu-op != nir_op_vec3 || + src_alu-op != nir_op_vec4; + } + + case nir_instr_type_phi: + /* A phi is scalarizable if we're going to lower it */ + return should_lower_phi(nir_instr_as_phi(src_instr), state); + + case nir_instr_type_load_const: + /* These are trivially scalarizable */ + return true; + + case nir_instr_type_intrinsic: { + nir_intrinsic_instr *src_intrin = nir_instr_as_intrinsic(src_instr); + + switch
[Mesa-dev] [PATCH v5 3/3] i965/fs_nir: Get rid of get_alu_src
Originally, get_alu_src was supposed to handle resolving swizzles and things like that. However, now that basically every instruction we have only takes scalar sources, we don't really need it anymore. The only case where it's still marginally useful is for the mov and vecN operations that are left over from SSA form. We can handle those cases as a special case easily enough. As a side-effect, we don't need the vec_to_movs pass anymore. --- src/mesa/drivers/dri/i965/brw_fs.h | 1 - src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 153 +++ 2 files changed, 95 insertions(+), 59 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 25197cd..b95e2c0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -589,7 +589,6 @@ public: void nir_emit_texture(nir_tex_instr *instr); void nir_emit_jump(nir_jump_instr *instr); fs_reg get_nir_src(nir_src src); - fs_reg get_nir_alu_src(nir_alu_instr *instr, unsigned src); fs_reg get_nir_dest(nir_dest dest); void emit_percomp(fs_inst *inst, unsigned wr_mask); diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 37ccd24..fbb1622 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -140,8 +140,6 @@ fs_visitor::emit_nir_code() nir_convert_from_ssa(nir); nir_validate_shader(nir); - nir_lower_vec_to_movs(nir); - nir_validate_shader(nir); /* emit the arrays used for inputs and outputs - load/store intrinsics will * be converted to reads/writes of these arrays @@ -419,6 +417,7 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl) void fs_visitor::nir_emit_cf_list(exec_list *list) { + exec_list_validate(list); foreach_list_typed(nir_cf_node, node, node, list) { switch (node-type) { case nir_cf_node_if: @@ -541,34 +540,117 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) struct brw_wm_prog_key *fs_key = (struct brw_wm_prog_key *) this-key; fs_inst *inst; - fs_reg op[3]; fs_reg result = get_nir_dest(instr-dest.dest); result.type = brw_type_for_nir_type(nir_op_infos[instr-op].output_type); - for (unsigned i = 0; i nir_op_infos[instr-op].num_inputs; i++) - op[i] = get_nir_alu_src(instr, i); + fs_reg op[4]; + for (unsigned i = 0; i nir_op_infos[instr-op].num_inputs; i++) { + op[i] = get_nir_src(instr-src[i].src); + op[i].type = brw_type_for_nir_type(nir_op_infos[instr-op].input_types[i]); + op[i].abs = instr-src[i].abs; + op[i].negate = instr-src[i].negate; + } + + /* We get a bunch of mov's out of the from_ssa pass and they may still +* be vectorized. We'll handle them as a special-case. We'll also +* handle vecN here because it's basically the same thing. +*/ + bool need_extra_copy = false; + switch (instr-op) { + case nir_op_vec4: + if (!instr-src[3].src.is_ssa + instr-dest.dest.reg.reg == instr-src[3].src.reg.reg) + need_extra_copy = true; + /* fall through */ + case nir_op_vec3: + if (!instr-src[2].src.is_ssa + instr-dest.dest.reg.reg == instr-src[2].src.reg.reg) + need_extra_copy = true; + /* fall through */ + case nir_op_vec2: + if (!instr-src[1].src.is_ssa + instr-dest.dest.reg.reg == instr-src[1].src.reg.reg) + need_extra_copy = true; + /* fall through */ + case nir_op_imov: + case nir_op_fmov: { + if (!instr-src[0].src.is_ssa + instr-dest.dest.reg.reg == instr-src[0].src.reg.reg) + need_extra_copy = true; + + fs_reg temp; + if (need_extra_copy) { + temp = retype(vgrf(4), result.type); + } else { + temp = result; + } + + if (instr-op == nir_op_imov || instr-op == nir_op_fmov) { + for (unsigned i = 0; i 4; i++) { +if (!(instr-dest.write_mask (1 i))) + continue; + +inst = emit(MOV(offset(temp, i), +offset(op[0], instr-src[0].swizzle[i]))); +inst-saturate = instr-dest.saturate; + } + } else { + for (unsigned i = 0; i 4; i++) { +if (!(instr-dest.write_mask (1 i))) + continue; + +inst = emit(MOV(offset(temp, i), +offset(op[i], instr-src[i].swizzle[0]))); +inst-saturate = instr-dest.saturate; + } + } + + /* In this case the source and destination registers were the same, + * so we need to insert an extra set of moves in order to deal with + * any swizzling. + */ + if (need_extra_copy) { + for (unsigned i = 0; i 4; i++) { +if (!(instr-dest.write_mask (1 i))) + continue; + +emit(MOV(offset(result, i), offset(temp, i))); + } + } + return; + } + default: + break; + }
Re: [Mesa-dev] [PATCH 2/2] egl/dri: Defer display destruction until the display is not in use.
On 01/28/2015 01:45 PM, jim.br...@linux.intel.com wrote: From: Jim Bride jim.br...@intel.com In situations where a DRI2 driver is in use and a call to eglTerminate() is followed by a call to eglMakeCurrent() the mandatory call to glFlush() inside of dri2_make_current() would cause a segmentation fault because the DRI2 driver was torn down and glFlush() expects certain data structures associated through the driver to be present. In order to accommodate this the tear-down of the driver itself has been moved from dri2_terminate() into a new function dri2_cleanup_display(). Finally, both dri2_terminate() and dri2_make_current() call dri2_cleanup_display() if there are no resources associated with the display. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69622 Signed-off-by: Jim Bride jim.br...@intel.com --- src/egl/drivers/dri2/egl_dri2.c | 84 + 1 file changed, 68 insertions(+), 16 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 9d5a1cf..bea5ccc 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -665,20 +665,14 @@ dri2_initialize(_EGLDriver *drv, _EGLDisplay *disp) } /** - * Called via eglTerminate(), drv-API.Terminate(). + * Cleanup display elements that are not in use. */ -static EGLBoolean -dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp) +static void +dri2_cleanup_display(_EGLDisplay *disp) { struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp); unsigned i; - /* Release any resources that are not currently bound to -* the display. We do not want to release bound resources -* because doing so could interfere with proper cleanup via -* a call to eglMakeCurrent() to unbind said resources. -*/ - _eglReleaseDisplayResources(drv, disp, EGL_TRUE); _eglCleanupDisplay(disp); if (dri2_dpy-own_dri_screen) @@ -728,7 +722,26 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp) } free(dri2_dpy); disp-DriverData = NULL; +} + +/** + * Called via eglTerminate(), drv-API.Terminate(). + */ +static EGLBoolean +dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp) +{ + /* Release any resources that are not currently bound to +* the display. We do not want to release bound resources +* because doing so could interfere with proper cleanup via +* a call to eglMakeCurrent() to unbind said resources. +*/ + _eglReleaseDisplayResources(drv, disp, EGL_TRUE); + /* If the display has not resources (and all are tied to contexts) then ^^^ no +* go ahead and free up the display itself. +*/ + if (!disp-ResourceLists[_EGL_RESOURCE_CONTEXT]) + dri2_cleanup_display(disp); return EGL_TRUE; } @@ -992,14 +1005,13 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf, __DRIdrawable *ddraw, *rdraw; __DRIcontext *cctx; + /* flush before context switch */ + dri2_drv-glFlush(); + /* make new bindings */ if (!_eglBindContext(ctx, dsurf, rsurf, old_ctx, old_dsurf, old_rsurf)) return EGL_FALSE; - /* flush before context switch */ - if (old_ctx dri2_drv-glFlush) - dri2_drv-glFlush(); - ddraw = (dri2_dsurf) ? dri2_dsurf-dri_drawable : NULL; rdraw = (dri2_rsurf) ? dri2_rsurf-dri_drawable : NULL; cctx = (dri2_ctx) ? dri2_ctx-dri_context : NULL; @@ -1011,13 +1023,53 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf, if ((cctx == NULL ddraw == NULL rdraw == NULL) || dri2_dpy-core-bindContext(cctx, ddraw, rdraw)) { - if (old_dsurf) + _EGLDisplay *old_dpy = NULL, *ref = NULL; + + if (old_dsurf) { + ref = old_dsurf-Resource.Display; + old_dpy = (ref != disp) ? ref : NULL; + drv-API.DestroySurface(drv, disp, old_dsurf); - if (old_rsurf) + } + + if (old_rsurf) { + if (!old_dpy) { +ref = old_rsurf-Resource.Display; +old_dpy = (ref != disp) ? ref : NULL; + } Nit pick. Ignore if you wish. I think the second-level if statements are much easier to read if rewritten with the pattern below: if (old_rsurf) { old_dpy = old_rsurf-Resource.Display; drv-API.DestroySurface(drv, disp, old_rsurf); } It eliminates a nested 'if' statement, eliminates the weird `ref` variable, and eliminates the ternary 'if' operator. And it is functionally equivalent to the patch as-is because, if any of the old objects exist, the old display is the same for all of them. To make the simplification though, you would have to modify the 'if' condition farther below: - if (old_dpy !old_dpy-Initialized) + if (old_dpy old_dpy != disp !old_dpy-Initialized) drv-API.DestroySurface(drv, disp, old_rsurf); - if (old_ctx) + } + + if
Re: [Mesa-dev] [PATCH 1/7] nir: Add a void type
On Thu, Jan 29, 2015 at 4:13 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Thursday, January 29, 2015 12:50:16 PM Jason Ekstrand wrote: This allows us to indicate a concept of an invalid type. --- src/glsl/nir/nir.h | 1 + 1 file changed, 1 insertion(+) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 980fdd0..f2050de 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -576,6 +576,7 @@ void nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src, void *mem_ctx); typedef enum { + nir_type_void = 0, /* Not a valid type */ nir_type_float, nir_type_int, nir_type_unsigned, I think it would be clearer to call this nir_type_invalid, since void is a real type in source languages. Sure. I don't care too much. Changed locally. --Jason --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 7/7] nir/opt_algebraic: Add some constant bcsel reductions
On Thursday, January 29, 2015 02:34:18 PM Jason Ekstrand wrote: total instructions in shared programs: 5998190 - 5997603 (-0.01%) instructions in affected programs: 54276 - 53689 (-1.08%) helped:293 --- src/glsl/nir/nir_opt_algebraic.py | 30 -- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index e3b309c..3223119 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -120,11 +120,37 @@ optimizations = [ # Boolean simplifications (('ine', 'a@bool', 0), 'a'), (('ieq', 'a@bool', 0), ('inot', 'a')), - (('bcsel', 'a@bool', True, False), 'a'), - (('bcsel', 'a@bool', False, True), ('inot', 'a')), + (('bcsel', a, True, False), ('ine', a, 0)), + (('bcsel', a, False, True), ('ieq', a, 0)), + (('bcsel', True, b, c), b), + (('bcsel', False, b, c), c), + # The result of this should be hit by constant propagation and, in the + # next round of opt_algebraic, get picked up by one of the above two. + (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)), # This one may not be exact (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), ] +# Add optimizations to handle the case where the result of a ternary is +# compared to a constant. This way we can take things like +# +# (a ? 0 : 1) 0 +# +# and turn it into +# +# a ? (0 0) : (1 0) +# +# which constant folding will eat for lunch. The resulting ternary will +# further get cleaned up by the boolean reductions above ane we will be typo (and) ^^^ Series is: Reviewed-by: Kenneth Graunke kenn...@whitecape.org +# left with just the original variable a. +for op in ['flt', 'fge', 'feq', 'fne', + 'ilt', 'ige', 'ieq', 'ine', 'ult', 'uge']: + optimizations += [ + ((op, ('bcsel', 'a', '#b', '#c'), '#d'), + ('bcsel', 'a', (op, 'b', 'd'), (op, 'c', 'd'))), + ((op, '#d', ('bcsel', a, '#b', '#c')), + ('bcsel', 'a', (op, 'd', 'b'), (op, 'd', 'c'))), + ] + print nir_algebraic.AlgebraicPass(nir_opt_algebraic, optimizations).render() signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] killing off the address reg in tgsi
On Thu, Jan 29, 2015 at 5:31 PM, Rob Clark robdcl...@gmail.com wrote: On Thu, Jan 29, 2015 at 4:20 PM, Roland Scheidegger srol...@vmware.com wrote: Hi, the address reg in tgsi is quite a nuisance. glsl-to-tgsi code assumes that indirections can only be done through the address reg and has quite some extra code to deal with this. Even though hardware and apis which worked like that are definitely old by now. Thus, I'm proposing the address reg be nuked. I am however not quite sure what the implications for drivers are, other than I'm certain llvmpipe can handle that already. For that reason, I suspect at least initially a new cap bit would be required so glsl-to-tgsi would skip the extra code. I tend to think longer term it would be great if it could be nuked completely, I am however not sure if that is easily done with drivers for old hw (such as r300) - I guess if necessary we could keep operations such as ARL (or even ARR though clearly not UARL!) and just define them to be usable with temp regs. Opinions? Not sure about other hw (although I would assume radeon is similar).. but adreno has real actual address register, which need to be used for relative addressing of the register (or const) file. Otherwise we'd need to do something like arrays in memory and load/store instructions. Possibly w/ more smarts in the compiler, we could infer address register usage. It would require some smarts that we don't have yet to eliminate redundant writes to the address register generated for each instruction with indirect reference. actually, I think I'm lying.. since it will end up SSA form (in either tgsi-ir3 or tgsi-nir-ir3), we should have a single address register value, so maybe it isn't such a big problem.. BR, -R (Although eventually I'd like to use tgsi-nir.. and haven't yet looked at how nir handles indirects so maybe eventually I need to implement those smarts in my compiler anyway.) BR, -R Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] nir/algebraic: Support specifying variable as constant or by type
On Thursday, January 29, 2015 12:50:20 PM Jason Ekstrand wrote: --- src/glsl/nir/nir_algebraic.py | 20 +--- src/glsl/nir/nir_opt_algebraic.py | 12 +--- 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/nir_algebraic.py b/src/glsl/nir/nir_algebraic.py index 75436f4..6e7973d 100644 --- a/src/glsl/nir/nir_algebraic.py +++ b/src/glsl/nir/nir_algebraic.py @@ -28,6 +28,7 @@ import itertools import struct import sys import mako.template +import re # Represents a set of variables, each with a unique id class VarSet(object): @@ -65,6 +66,8 @@ static const ${val.c_type} ${val.name} = { { ${hex(val)} /* ${val.value} */ }, % elif isinstance(val, Variable): ${val.index}, /* ${val.var_name} */ + ${'true' if val.is_constant else 'false'}, + nir_type_${'void' if val.required_type is None else val.required_type}, nir_type_${val.required_type or 'void'} % elif isinstance(val, Expression): nir_op_${val.opcode}, { ${', '.join(src.c_ptr for src in val.sources)} }, @@ -111,12 +114,23 @@ class Constant(Value): else: assert False +_var_name_re = re.compile(r(?Pconst#)?(?Pname\w+)(?:@(?Ptype\w+))?) + class Variable(Value): def __init__(self, val, name, varset): Value.__init__(self, name, variable) - self.var_name = val - self.index = varset[val] - self.name = name + + m = _var_name_re.match(val) + assert m and m.group('name') is not None + + self.var_name = m.group('name') + self.is_constant = m.group('const') is not None + self.required_type = m.group('type') + + if self.required_type is not None: + assert self.required_type in ('float', 'bool', 'int', 'unsigned') + + self.index = varset[self.var_name] class Expression(Value): def __init__(self, expr, name_base, varset): diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 9c62b28..1dea42a 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -36,9 +36,15 @@ d = 'd' # and replace is either an expression or a value. An expression is # defined as a tuple of the form (op, src0, src1, src2, src3) # where each source is either an expression or a value. A value can be -# either a numeric constant or a string representing a variable name. For -# constants, you have to be careful to make sure that it is the right type -# because python is unaware of the source and destination types of the +# either a numeric constant or a string representing a variable name. +# +# Variable names are specified as [#]name[@type] where # inicates that +# the given variable will only match constants and thpe type indicates that typo +# the given variable will only match values from ALU instructions with the +# given output type. +# +# For constants, you have to be careful to make sure that it is the right +# type because python is unaware of the source and destination types of the # opcodes. optimizations = [ signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] killing off the address reg in tgsi
On 01/29/2015 02:20 PM, Roland Scheidegger wrote: Hi, the address reg in tgsi is quite a nuisance. glsl-to-tgsi code assumes that indirections can only be done through the address reg and has quite some extra code to deal with this. Even though hardware and apis which worked like that are definitely old by now. Thus, I'm proposing the address reg be nuked. I am however not quite sure what the implications for drivers are, other than I'm certain llvmpipe can handle that already. For that reason, I suspect at least initially a new cap bit would be required so glsl-to-tgsi would skip the extra code. I tend to think longer term it would be great if it could be nuked completely, I am however not sure if that is easily done with drivers for old hw (such as r300) - I guess if necessary we could keep operations such as ARL (or even ARR though clearly not UARL!) and just define them to be usable with temp regs. Opinions? Sounds good, but as you know, the vmware/svga driver uses the address register. Currently, there's some handy code in st_glsl_to_tgsi.cpp which looks for instructions where more than one operand uses indirect addressing and reduces it down to one indirection per instruction. For example, if we had something like this: uniform vec4 a[10], b[10], c[10]; x = a[i] * b[j] + c[k]; We get TGSI code something like this: ARL ADDR.x, i; MOV TEMP[0], CONST[ADDR.x]; ARL ADDR.x, j; MOV TEMP[1], CONST[ADDR.x]; ARL ADDR.x, k; MAD TEMP[2], TEMP[0], TEMP[1], CONST[ADDR.x]; (and yes, I can see the desire to get rid of that entirely and emit something like: MAD TEMP[2], CONST[TEMP_i], CONST[TEMP_j], CONST[TEMP_k]; ) If we added a new cap along the lines of PIPE_CAP_MAX_INDIRECTIONS_PER_INSTRUCTION we could keep that logic in the state tracker. This would make life easier in the driver(s). Otherwise, if we discard ADDR, ARL and don't have the new cap, the driver(s) will have to count the indirections in each instruction and load some of those operands into temp registers. It's do-able, but more work. It would take me a while to implement that in our driver. Of course, some of it could go into a TGSI lowering utility. In any case, the driver would still have to detect patterns like this: OPCODE DST, SRC[TEMP], ... and convert it to ARL ADDR.x, TEMP; OPCODE DST, SRC[ADDR.x], ... But that's not too hard. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] killing off the address reg in tgsi
On Thu, Jan 29, 2015 at 6:06 PM, Brian Paul bri...@vmware.com wrote: On 01/29/2015 02:20 PM, Roland Scheidegger wrote: Hi, the address reg in tgsi is quite a nuisance. glsl-to-tgsi code assumes that indirections can only be done through the address reg and has quite some extra code to deal with this. Even though hardware and apis which worked like that are definitely old by now. Thus, I'm proposing the address reg be nuked. I am however not quite sure what the implications for drivers are, other than I'm certain llvmpipe can handle that already. For that reason, I suspect at least initially a new cap bit would be required so glsl-to-tgsi would skip the extra code. I tend to think longer term it would be great if it could be nuked completely, I am however not sure if that is easily done with drivers for old hw (such as r300) - I guess if necessary we could keep operations such as ARL (or even ARR though clearly not UARL!) and just define them to be usable with temp regs. Opinions? Sounds good, but as you know, the vmware/svga driver uses the address register. Currently, there's some handy code in st_glsl_to_tgsi.cpp which looks for instructions where more than one operand uses indirect addressing and reduces it down to one indirection per instruction. For example, if we had something like this: uniform vec4 a[10], b[10], c[10]; x = a[i] * b[j] + c[k]; We get TGSI code something like this: ARL ADDR.x, i; MOV TEMP[0], CONST[ADDR.x]; ARL ADDR.x, j; MOV TEMP[1], CONST[ADDR.x]; ARL ADDR.x, k; MAD TEMP[2], TEMP[0], TEMP[1], CONST[ADDR.x]; (and yes, I can see the desire to get rid of that entirely and emit something like: MAD TEMP[2], CONST[TEMP_i], CONST[TEMP_j], CONST[TEMP_k]; ) If we added a new cap along the lines of PIPE_CAP_MAX_INDIRECTIONS_PER_INSTRUCTION we could keep that logic in the state tracker. This would make life easier in the driver(s). that would make it easier for me at least.. :-) otherwise I end up lowering the second form back into the first BR, -R Otherwise, if we discard ADDR, ARL and don't have the new cap, the driver(s) will have to count the indirections in each instruction and load some of those operands into temp registers. It's do-able, but more work. It would take me a while to implement that in our driver. Of course, some of it could go into a TGSI lowering utility. In any case, the driver would still have to detect patterns like this: OPCODE DST, SRC[TEMP], ... and convert it to ARL ADDR.x, TEMP; OPCODE DST, SRC[ADDR.x], ... But that's not too hard. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] GL: Update glext.h to fix ARB_dsa function prototypes.
It might be good to say in the commit message update glext.h to match revision X from Khronos and then put the comment about DSA functions as a second line in the commit message. --Jason On Fri, Jan 23, 2015 at 2:23 PM, Laura Ekstrand la...@jlekstrand.net wrote: I checked, and all of the currently implemented DSA functions build with this file. Laura On Fri, Jan 23, 2015 at 2:20 PM, Laura Ekstrand la...@jlekstrand.net wrote: --- include/GL/glext.h | 48 ++-- 1 file changed, 26 insertions(+), 22 deletions(-) diff --git a/include/GL/glext.h b/include/GL/glext.h index d3cfbb5..0ca89ca 100644 --- a/include/GL/glext.h +++ b/include/GL/glext.h @@ -33,7 +33,7 @@ extern C { ** used to make the header, and the header can be found at ** http://www.opengl.org/registry/ ** -** Khronos $Revision: 28986 $ on $Date: 2014-11-18 18:43:15 -0800 (Tue, 18 Nov 2014) $ +** Khronos $Revision: 29537 $ on $Date: 2015-01-22 02:32:35 -0800 (Thu, 22 Jan 2015) $ */ #if defined(_WIN32) !defined(APIENTRY) !defined(__CYGWIN__) !defined(__SCITECH_SNAP__) @@ -53,7 +53,7 @@ extern C { #define GLAPI extern #endif -#define GL_GLEXT_VERSION 20141118 +#define GL_GLEXT_VERSION 20150122 /* Generated C header for: * API: gl @@ -2607,25 +2607,25 @@ GLAPI void APIENTRY glBindVertexBuffers (GLuint first, GLsizei count, const GLui typedef void (APIENTRYP PFNGLCLIPCONTROLPROC) (GLenum origin, GLenum depth); typedef void (APIENTRYP PFNGLCREATETRANSFORMFEEDBACKSPROC) (GLsizei n, GLuint *ids); typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKBUFFERBASEPROC) (GLuint xfb, GLuint index, GLuint buffer); -typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKBUFFERRANGEPROC) (GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizei size); +typedef void (APIENTRYP PFNGLTRANSFORMFEEDBACKBUFFERRANGEPROC) (GLuint xfb, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size); typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKIVPROC) (GLuint xfb, GLenum pname, GLint *param); typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKI_VPROC) (GLuint xfb, GLenum pname, GLuint index, GLint *param); typedef void (APIENTRYP PFNGLGETTRANSFORMFEEDBACKI64_VPROC) (GLuint xfb, GLenum pname, GLuint index, GLint64 *param); typedef void (APIENTRYP PFNGLCREATEBUFFERSPROC) (GLsizei n, GLuint *buffers); -typedef void (APIENTRYP PFNGLNAMEDBUFFERSTORAGEPROC) (GLuint buffer, GLsizei size, const void *data, GLbitfield flags); -typedef void (APIENTRYP PFNGLNAMEDBUFFERDATAPROC) (GLuint buffer, GLsizei size, const void *data, GLenum usage); -typedef void (APIENTRYP PFNGLNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLintptr offset, GLsizei size, const void *data); -typedef void (APIENTRYP PFNGLCOPYNAMEDBUFFERSUBDATAPROC) (GLuint readBuffer, GLuint writeBuffer, GLintptr readOffset, GLintptr writeOffset, GLsizei size); +typedef void (APIENTRYP PFNGLNAMEDBUFFERSTORAGEPROC) (GLuint buffer, GLsizeiptr size, const void *data, GLbitfield flags); +typedef void (APIENTRYP PFNGLNAMEDBUFFERDATAPROC) (GLuint buffer, GLsizeiptr size, const void *data, GLenum usage); +typedef void (APIENTRYP PFNGLNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLintptr offset, GLsizeiptr size, const void *data); +typedef void (APIENTRYP PFNGLCOPYNAMEDBUFFERSUBDATAPROC) (GLuint readBuffer, GLuint writeBuffer, GLintptr readOffset, GLintptr writeOffset, GLsizeiptr size); typedef void (APIENTRYP PFNGLCLEARNAMEDBUFFERDATAPROC) (GLuint buffer, GLenum internalformat, GLenum format, GLenum type, const void *data); -typedef void (APIENTRYP PFNGLCLEARNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLenum internalformat, GLintptr offset, GLsizei size, GLenum format, GLenum type, const void *data); +typedef void (APIENTRYP PFNGLCLEARNAMEDBUFFERSUBDATAPROC) (GLuint buffer, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, const void *data); typedef void *(APIENTRYP PFNGLMAPNAMEDBUFFERPROC) (GLuint buffer, GLenum access); -typedef void *(APIENTRYP PFNGLMAPNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizei length, GLbitfield access); +typedef void *(APIENTRYP PFNGLMAPNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizeiptr length, GLbitfield access); typedef GLboolean (APIENTRYP PFNGLUNMAPNAMEDBUFFERPROC) (GLuint buffer); -typedef void (APIENTRYP PFNGLFLUSHMAPPEDNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizei length); +typedef void (APIENTRYP PFNGLFLUSHMAPPEDNAMEDBUFFERRANGEPROC) (GLuint buffer, GLintptr offset, GLsizeiptr length); typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPARAMETERIVPROC) (GLuint buffer, GLenum pname, GLint *params); typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPARAMETERI64VPROC) (GLuint buffer, GLenum pname, GLint64 *params); typedef void (APIENTRYP PFNGLGETNAMEDBUFFERPOINTERVPROC) (GLuint buffer, GLenum pname, void **params); -typedef void (APIENTRYP PFNGLGETNAMEDBUFFERSUBDATAPROC) (GLuint
Re: [Mesa-dev] Mesa 10.5.0 release plan
On 29/01/15 04:05, Michel Dänzer wrote: On 29.01.2015 00:37, Emil Velikov wrote: February 20th 2015 - Release candidate 3 February 20th 2015 - Release candidate 4 I assume you meant February 27th for rc4. :) Sigh... will I be able to write a statement without ever making silly typos :) That's correct Michel - I meant the 27th. Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] nir/algebraic: Support specifying variable as constant or by type
On Thursday, January 29, 2015 17:00:08 Kenneth Graunke wrote: On Thursday, January 29, 2015 12:50:20 PM Jason Ekstrand wrote: --- src/glsl/nir/nir_algebraic.py | 20 +--- src/glsl/nir/nir_opt_algebraic.py | 12 +--- 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/nir_algebraic.py b/src/glsl/nir/nir_algebraic.py index 75436f4..6e7973d 100644 --- a/src/glsl/nir/nir_algebraic.py +++ b/src/glsl/nir/nir_algebraic.py @@ -28,6 +28,7 @@ import itertools import struct import sys import mako.template +import re # Represents a set of variables, each with a unique id class VarSet(object): @@ -65,6 +66,8 @@ static const ${val.c_type} ${val.name} = { { ${hex(val)} /* ${val.value} */ }, % elif isinstance(val, Variable): ${val.index}, /* ${val.var_name} */ + ${'true' if val.is_constant else 'false'}, + nir_type_${'void' if val.required_type is None else val.required_type}, nir_type_${val.required_type or 'void'} This is equivalent except in a few cases, namely 'void' will be picked over any falsy value, not just None. Reasonable falsy values might be 0 or '' (empty string). % elif isinstance(val, Expression): nir_op_${val.opcode}, { ${', '.join(src.c_ptr for src in val.sources)} }, @@ -111,12 +114,23 @@ class Constant(Value): else: assert False +_var_name_re = re.compile(r(?Pconst#)?(?Pname\w+)(?:@(?Ptype\w+))?) + class Variable(Value): def __init__(self, val, name, varset): Value.__init__(self, name, variable) - self.var_name = val - self.index = varset[val] - self.name = name + + m = _var_name_re.match(val) + assert m and m.group('name') is not None Can m.group('name') actually be None? I don't think it can. + + self.var_name = m.group('name') + self.is_constant = m.group('const') is not None + self.required_type = m.group('type') + + if self.required_type is not None: + assert self.required_type in ('float', 'bool', 'int', 'unsigned') + + self.index = varset[self.var_name] class Expression(Value): def __init__(self, expr, name_base, varset): diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 9c62b28..1dea42a 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -36,9 +36,15 @@ d = 'd' # and replace is either an expression or a value. An expression is # defined as a tuple of the form (op, src0, src1, src2, src3) # where each source is either an expression or a value. A value can be -# either a numeric constant or a string representing a variable name. For -# constants, you have to be careful to make sure that it is the right type -# because python is unaware of the source and destination types of the +# either a numeric constant or a string representing a variable name. +# +# Variable names are specified as [#]name[@type] where # inicates that +# the given variable will only match constants and thpe type indicates that typo +# the given variable will only match values from ALU instructions with the +# given output type. +# +# For constants, you have to be careful to make sure that it is the right +# type because python is unaware of the source and destination types of the # opcodes. optimizations = [ signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] nir/algebraic: Support specifying variable as constant or by type
On Thu, Jan 29, 2015 at 5:00 PM, Kenneth Graunke kenn...@whitecape.org wrote: On Thursday, January 29, 2015 12:50:20 PM Jason Ekstrand wrote: --- src/glsl/nir/nir_algebraic.py | 20 +--- src/glsl/nir/nir_opt_algebraic.py | 12 +--- 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/nir_algebraic.py b/src/glsl/nir/nir_algebraic.py index 75436f4..6e7973d 100644 --- a/src/glsl/nir/nir_algebraic.py +++ b/src/glsl/nir/nir_algebraic.py @@ -28,6 +28,7 @@ import itertools import struct import sys import mako.template +import re # Represents a set of variables, each with a unique id class VarSet(object): @@ -65,6 +66,8 @@ static const ${val.c_type} ${val.name} = { { ${hex(val)} /* ${val.value} */ }, % elif isinstance(val, Variable): ${val.index}, /* ${val.var_name} */ + ${'true' if val.is_constant else 'false'}, + nir_type_${'void' if val.required_type is None else val.required_type}, nir_type_${val.required_type or 'void'} done. % elif isinstance(val, Expression): nir_op_${val.opcode}, { ${', '.join(src.c_ptr for src in val.sources)} }, @@ -111,12 +114,23 @@ class Constant(Value): else: assert False +_var_name_re = re.compile(r(?Pconst#)?(?Pname\w+)(?:@(?Ptype\w+))?) + class Variable(Value): def __init__(self, val, name, varset): Value.__init__(self, name, variable) - self.var_name = val - self.index = varset[val] - self.name = name + + m = _var_name_re.match(val) + assert m and m.group('name') is not None + + self.var_name = m.group('name') + self.is_constant = m.group('const') is not None + self.required_type = m.group('type') + + if self.required_type is not None: + assert self.required_type in ('float', 'bool', 'int', 'unsigned') + + self.index = varset[self.var_name] class Expression(Value): def __init__(self, expr, name_base, varset): diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 9c62b28..1dea42a 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -36,9 +36,15 @@ d = 'd' # and replace is either an expression or a value. An expression is # defined as a tuple of the form (op, src0, src1, src2, src3) # where each source is either an expression or a value. A value can be -# either a numeric constant or a string representing a variable name. For -# constants, you have to be careful to make sure that it is the right type -# because python is unaware of the source and destination types of the +# either a numeric constant or a string representing a variable name. +# +# Variable names are specified as [#]name[@type] where # inicates that +# the given variable will only match constants and thpe type indicates that typo fixed. +# the given variable will only match values from ALU instructions with the +# given output type. +# +# For constants, you have to be careful to make sure that it is the right +# type because python is unaware of the source and destination types of the # opcodes. optimizations = [ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Add a better PRM citation for the IMS dimension mangling.
On 01/29/2015 12:00 AM, Kenneth Graunke wrote: Paul originally had to reverse engineer these formulas based on the description about how the sampler works. The description here is not the easiest to follow - especially given that it's from the Sandybridge era, when the hardware only did 4x multisampling. Jordan and I recently found another part of the documentation where they simply state that IMS dimensions must be adjusted by a set of formulas. Quoting this section provides an easy to follow explanation for the code, including 2x/4x/8x/16x. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Reviewed-by: Chad Versace chad.vers...@intel.com signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/7] nir: Add a void type
On Thursday, January 29, 2015 12:50:16 PM Jason Ekstrand wrote: This allows us to indicate a concept of an invalid type. --- src/glsl/nir/nir.h | 1 + 1 file changed, 1 insertion(+) diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 980fdd0..f2050de 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -576,6 +576,7 @@ void nir_alu_dest_copy(nir_alu_dest *dest, const nir_alu_dest *src, void *mem_ctx); typedef enum { + nir_type_void = 0, /* Not a valid type */ nir_type_float, nir_type_int, nir_type_unsigned, I think it would be clearer to call this nir_type_invalid, since void is a real type in source languages. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] egl: Add unbound_only parameter to _eglReleaseDisplayResources()
On 01/28/2015 01:45 PM, jim.br...@linux.intel.com wrote: From: Jim Bride jim.br...@intel.com Add a EGLBoolean parameter unbound_only to _eglReleaseDisplayResources() to allow that function to skip releasing contexts or surfaces that are currently bound to a display. Setting unbound_only to EGL_FALSE preserves the original behavior of this function. Signed-off-by: Jim Bride jim.br...@intel.com --- src/egl/drivers/dri2/egl_dri2.c | 7 ++- src/egl/main/egldisplay.c | 17 + src/egl/main/egldisplay.h | 3 ++- src/gallium/state_trackers/egl/common/egl_g3d.c | 9 - 4 files changed, 29 insertions(+), 7 deletions(-) Reviewed-by: Chad Versace chad.vers...@intel.com I'm waiting on fixes to patch 2 before committing patch 1, because they don't make sense to commit in independently. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 0/2] Fast mutex v2
Here's take two of the fast mutex path. This version has a check for linux/futex.h and it split into two patches. First patch introduces full_mtx_t and updates callers. Second patch adds the inline mtx_t. Kristian Høgsberg (2): mesa: Split full featured mutex into its own type mesa: Add new fast mtx_t mutex type for basic use cases configure.ac | 3 + include/c11/threads_posix.h | 149 --- include/c11/threads_win32.h | 48 --- src/gallium/auxiliary/os/os_thread.h | 10 +-- src/mesa/main/mtypes.h | 2 +- src/mesa/main/shared.c | 4 +- src/mesa/main/texobj.c | 4 +- src/mesa/main/texobj.h | 4 +- 8 files changed, 191 insertions(+), 33 deletions(-) -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 1/2] mesa: Split full featured mutex into its own type
Most of the mtx_t uses in mesa only use lock/unlock, but a couple of places we use a condition variable and a recursive mutex. This patch introduces a new full_mtx_t for the cases where we need a more featureful mutex and changes mtx_t to only support lock/unlock. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- include/c11/threads_posix.h | 51 include/c11/threads_win32.h | 48 ++--- src/gallium/auxiliary/os/os_thread.h | 10 +++ src/mesa/main/mtypes.h | 2 +- src/mesa/main/shared.c | 4 +-- src/mesa/main/texobj.c | 4 +-- src/mesa/main/texobj.h | 4 +-- 7 files changed, 91 insertions(+), 32 deletions(-) diff --git a/include/c11/threads_posix.h b/include/c11/threads_posix.h index f9c165d..5119773 100644 --- a/include/c11/threads_posix.h +++ b/include/c11/threads_posix.h @@ -59,13 +59,14 @@ Configuration macro: #endif // FIXME: temporary non-standard hack to ease transition +#define _FULL_MTX_INITIALIZER_NP PTHREAD_MUTEX_INITIALIZER #define _MTX_INITIALIZER_NP PTHREAD_MUTEX_INITIALIZER /* types */ typedef pthread_cond_t cnd_t; typedef pthread_t thrd_t; typedef pthread_key_t tss_t; -typedef pthread_mutex_t mtx_t; +typedef pthread_mutex_t full_mtx_t; typedef pthread_once_t once_flag; @@ -135,7 +136,7 @@ cnd_signal(cnd_t *cond) // 7.25.3.5 static inline int -cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt) +cnd_timedwait(cnd_t *cond, full_mtx_t *mtx, const xtime *xt) { struct timespec abs_time; int rt; @@ -148,7 +149,7 @@ cnd_timedwait(cnd_t *cond, mtx_t *mtx, const xtime *xt) // 7.25.3.6 static inline int -cnd_wait(cnd_t *cond, mtx_t *mtx) +cnd_wait(cnd_t *cond, full_mtx_t *mtx) { if (!cond || !mtx) return thrd_error; pthread_cond_wait(cond, mtx); @@ -159,7 +160,7 @@ cnd_wait(cnd_t *cond, mtx_t *mtx) /* 7.25.4 Mutex functions */ // 7.25.4.1 static inline void -mtx_destroy(mtx_t *mtx) +full_mtx_destroy(full_mtx_t *mtx) { assert(mtx); pthread_mutex_destroy(mtx); @@ -167,7 +168,7 @@ mtx_destroy(mtx_t *mtx) // 7.25.4.2 static inline int -mtx_init(mtx_t *mtx, int type) +full_mtx_init(full_mtx_t *mtx, int type) { pthread_mutexattr_t attr; if (!mtx) return thrd_error; @@ -191,7 +192,7 @@ mtx_init(mtx_t *mtx, int type) // 7.25.4.3 static inline int -mtx_lock(mtx_t *mtx) +full_mtx_lock(full_mtx_t *mtx) { if (!mtx) return thrd_error; pthread_mutex_lock(mtx); @@ -199,14 +200,14 @@ mtx_lock(mtx_t *mtx) } static inline int -mtx_trylock(mtx_t *mtx); +full_mtx_trylock(full_mtx_t *mtx); static inline void thrd_yield(void); // 7.25.4.4 static inline int -mtx_timedlock(mtx_t *mtx, const xtime *xt) +full_mtx_timedlock(full_mtx_t *mtx, const xtime *xt) { if (!mtx || !xt) return thrd_error; { @@ -222,7 +223,7 @@ mtx_timedlock(mtx_t *mtx, const xtime *xt) #else time_t expire = time(NULL); expire += xt-sec; -while (mtx_trylock(mtx) != thrd_success) { +while (full_mtx_trylock(mtx) != thrd_success) { time_t now = time(NULL); if (expire now) return thrd_busy; @@ -236,7 +237,7 @@ mtx_timedlock(mtx_t *mtx, const xtime *xt) // 7.25.4.5 static inline int -mtx_trylock(mtx_t *mtx) +full_mtx_trylock(full_mtx_t *mtx) { if (!mtx) return thrd_error; return (pthread_mutex_trylock(mtx) == 0) ? thrd_success : thrd_busy; @@ -244,13 +245,41 @@ mtx_trylock(mtx_t *mtx) // 7.25.4.6 static inline int -mtx_unlock(mtx_t *mtx) +full_mtx_unlock(full_mtx_t *mtx) { if (!mtx) return thrd_error; pthread_mutex_unlock(mtx); return thrd_success; } +typedef full_mtx_t mtx_t; + +static inline void +mtx_init(mtx_t *mtx, int type) +{ + assert(type == mtx_plain); + + full_mtx_init(mtx, mtx_plain); +} + +static inline void +mtx_destroy(mtx_t *mtx) +{ + full_mtx_destroy(mtx); +} + +static inline void +mtx_lock(mtx_t *mtx) +{ + full_mtx_lock(mtx); +} + +static inline void +mtx_unlock(mtx_t *mtx) +{ + full_mtx_unlock(mtx); +} + /*--- 7.25.5 Thread functions ---*/ // 7.25.5.1 diff --git a/include/c11/threads_win32.h b/include/c11/threads_win32.h index d017c31..6968ce4 100644 --- a/include/c11/threads_win32.h +++ b/include/c11/threads_win32.h @@ -85,6 +85,7 @@ Configuration macro: #define TSS_DTOR_ITERATIONS 1 // FIXME: temporary non-standard hack to ease transition +#define _FULL_MTX_INITIALIZER_NP {(PCRITICAL_SECTION_DEBUG)-1, -1, 0, 0, 0, 0} #define _MTX_INITIALIZER_NP {(PCRITICAL_SECTION_DEBUG)-1, -1, 0, 0, 0, 0} /* types */ @@ -206,7 +207,7 @@ static void impl_cond_do_signal(cnd_t *cond, int broadcast) ReleaseSemaphore(cond-sem_queue, nsignal, NULL); } -static int
[Mesa-dev] [PATCH v2 2/2] mesa: Add new fast mtx_t mutex type for basic use cases
While modern pthread mutexes are very fast, they still incur a call to an external DSO and overhead of the generality and features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, and the idea here is that we can inline the atomic operation and make the fast case just two intructions. Mutexes are subtle and finicky to implement, so we carefully copy the implementation from Ulrich Dreppers well-written and well-reviewed paper: Futexes Are Tricky http://www.akkadia.org/drepper/futex.pdf We implement mutex3, which gives us a mutex that has no syscalls on uncontended lock or unlock. Further, the uncontended case boils down to a cmpxchg and an untaken branch and the uncontended unlock is just a locked decr and an untaken branch. We use __builtin_expect() to indicate that contention is unlikely so that gcc will put the contention code out of the main code flow. A fast mutex only supports lock/unlock, can't be recursive or used with condition variables. We keep the pthread mutex implementation around as full_mtx_t for the few places where we use condition variables or recursive locking. For platforms or compilers where futex and atomics aren't available, mtx_t falls back to the pthread mutex. The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound applications. Most CPU bound cases are helped and some of our internal bind_buffer_object heavy benchmarks gain up to 10%. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- configure.ac| 3 ++ include/c11/threads_posix.h | 98 - 2 files changed, 100 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index c2d775e..c4b26ee 100644 --- a/configure.ac +++ b/configure.ac @@ -639,6 +639,9 @@ dnl to -pthread, which causes problems if we need -lpthread to appear in dnl pkgconfig files. test -z $PTHREAD_LIBS PTHREAD_LIBS=-lpthread +dnl Check for futex for fast inline mtx_t. +AC_CHECK_HEADER([linux/futex.h], [DEFINES=$DEFINES -DHAVE_FUTEX]) + dnl SELinux awareness. AC_ARG_ENABLE([selinux], [AS_HELP_STRING([--enable-selinux], diff --git a/include/c11/threads_posix.h b/include/c11/threads_posix.h index 5119773..a48b15f 100644 --- a/include/c11/threads_posix.h +++ b/include/c11/threads_posix.h @@ -60,7 +60,6 @@ Configuration macro: // FIXME: temporary non-standard hack to ease transition #define _FULL_MTX_INITIALIZER_NP PTHREAD_MUTEX_INITIALIZER -#define _MTX_INITIALIZER_NP PTHREAD_MUTEX_INITIALIZER /* types */ typedef pthread_cond_t cnd_t; @@ -252,8 +251,104 @@ full_mtx_unlock(full_mtx_t *mtx) return thrd_success; } +#if defined(__GNUC__) defined(HAVE_FUTEX) + +/* mtx_t - Fast, simple mutex + * + * While modern pthread mutexes are very fast (implemented using futex), they + * still incur a call to an external DSO and overhead of the generality and + * features of pthread mutexes. Most mutexes in mesa only needs lock/unlock, + * and the idea here is that we can inline the atomic operation and make the + * fast case just two intructions. Mutexes are subtle and finicky to + * implement, so we carefully copy the implementation from Ulrich Dreppers + * well-written and well-reviewed paper: + * + * Futexes Are Tricky + * http://www.akkadia.org/drepper/futex.pdf + * + * We implement mutex3, which gives us a mutex that has no syscalls on + * uncontended lock or unlock. Further, the uncontended case boils down to a + * locked cmpxchg and an untaken branch, the uncontended unlock is just a + * locked decr and an untaken branch. We use __builtin_expect() to indicate + * that contention is unlikely so that gcc will put the contention code out of + * the main code flow. + * + * A fast mutex only supports lock/unlock, can't be recursive or used with + * condition variables. + */ + +typedef struct { + uint32_t val; +} mtx_t; + +#define _MTX_INITIALIZER_NP { 0 } + +#include stdint.h +#include values.h +#include linux/futex.h +#include sys/time.h +#include sys/syscall.h + +static inline long sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3) +{ + return syscall(SYS_futex, addr1, op, val1, timeout, addr2, val3); +} + +static inline int futex_wake(uint32_t *addr) { + return sys_futex(addr, FUTEX_WAKE, 1, NULL, NULL, 0); +} + +static inline int futex_wait(uint32_t *addr, int32_t value) { + return sys_futex(addr, FUTEX_WAIT, value, NULL, NULL, 0); +} + +static inline void +mtx_init(mtx_t *mtx, int type) +{ + assert(type == mtx_plain); + + mtx-val = 0; +} + +static inline void +mtx_destroy(mtx_t *mtx) +{ +} + +static inline void +mtx_lock(mtx_t *mtx) +{ + uint32_t c; + + c = __sync_val_compare_and_swap(mtx-val, 0, 1); + if (__builtin_expect(c != 0, 0)) { + if (c != 2) + c = __sync_lock_test_and_set(mtx-val, 2); + while (c != 0) { + futex_wait(mtx-val, 2); + c =
Re: [Mesa-dev] [PATCH 1/2] egl: Add unbound_only parameter to _eglReleaseDisplayResources()
On Thu, Jan 29, 2015 at 6:03 PM, Chad Versace chad.vers...@intel.com wrote: On 01/28/2015 01:45 PM, jim.br...@linux.intel.com wrote: From: Jim Bride jim.br...@intel.com Add a EGLBoolean parameter unbound_only to _eglReleaseDisplayResources() to allow that function to skip releasing contexts or surfaces that are currently bound to a display. Setting unbound_only to EGL_FALSE preserves the original behavior of this function. Signed-off-by: Jim Bride jim.br...@intel.com --- src/egl/drivers/dri2/egl_dri2.c | 7 ++- src/egl/main/egldisplay.c | 17 + src/egl/main/egldisplay.h | 3 ++- src/gallium/state_trackers/egl/common/egl_g3d.c | 9 - 4 files changed, 29 insertions(+), 7 deletions(-) Reviewed-by: Chad Versace chad.vers...@intel.com I'm waiting on fixes to patch 2 before committing patch 1, because they don't make sense to commit in independently. It might be nice as to wait until the patches actually show up on the mailing list as well. I suspect Jim isn't subscribed so his patches are stuck in the moderation queue, but you got them since they were sent directly to you. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Pick ast_conditional branch regardless of op1/2 being constant.
If the ?: operator's condition is a constant value, and both branches were pure expressions, we can just make the resulting value one or the other. Previously, we only did this if op[1] and op[2] were also constant values - but there's no actual reason for that restriction. No changes in shader-db, probably because we usually optimize this later anyway. But it does make us generate less stupid code up front. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/glsl/ast_to_hir.cpp | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 1ba29f7..4d28069 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1597,13 +1597,11 @@ ast_expression::do_hir(exec_list *instructions, } ir_constant *cond_val = op[0]-constant_expression_value(); - ir_constant *then_val = op[1]-constant_expression_value(); - ir_constant *else_val = op[2]-constant_expression_value(); if (then_instructions.is_empty() else_instructions.is_empty() - (cond_val != NULL) (then_val != NULL) (else_val != NULL)) { - result = (cond_val-value.b[0]) ? then_val : else_val; + cond_val != NULL) { + result = cond_val-value.b[0] ? op[1] : op[2]; } else { ir_variable *const tmp = new(ctx) ir_variable(type, conditional_tmp, ir_var_temporary); -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/21] main: Added entry points for ClearNamedBuffer[Sub]Data.
On Wed, Jan 21, 2015 at 5:40 PM, Laura Ekstrand la...@jlekstrand.net wrote: --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 18 +++ src/mesa/main/bufferobj.c | 156 ++--- src/mesa/main/bufferobj.h | 34 -- src/mesa/main/tests/dispatch_sanity.cpp| 2 + src/mesa/state_tracker/st_cb_bufferobjects.c | 4 +- 5 files changed, 135 insertions(+), 79 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 8c80dfb..4b29e00 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -43,6 +43,24 @@ param name=size type=GLsizeiptr / /function + function name=ClearNamedBufferData offset=assign + param name=buffer type=GLuint / + param name=internalformat type=GLenum / + param name=format type=GLenum / + param name=type type=GLenum / + param name=data type=const GLvoid * / + /function + + function name=ClearNamedBufferSubData offset=assign + param name=buffer type=GLuint / + param name=internalformat type=GLenum / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + param name=format type=GLenum / + param name=type type=GLenum / + param name=data type=const GLvoid * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 31ae006..1ec1681 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -660,11 +660,11 @@ _mesa_buffer_get_subdata( struct gl_context *ctx, GLintptrARB offset, * dd_function_table::ClearBufferSubData. */ void -_mesa_buffer_clear_subdata(struct gl_context *ctx, - GLintptr offset, GLsizeiptr size, - const GLvoid *clearValue, - GLsizeiptr clearValueSize, - struct gl_buffer_object *bufObj) +_mesa_ClearBufferSubData_sw(struct gl_context *ctx, +GLintptr offset, GLsizeiptr size, +const GLvoid *clearValue, +GLsizeiptr clearValueSize, +struct gl_buffer_object *bufObj) { GLsizeiptr i; GLubyte *dest; @@ -1113,7 +1113,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-UnmapBuffer = _mesa_buffer_unmap; /* GL_ARB_clear_buffer_object */ - driver-ClearBufferSubData = _mesa_buffer_clear_subdata; + driver-ClearBufferSubData = _mesa_ClearBufferSubData_sw; /* GL_ARB_map_buffer_range */ driver-MapBufferRange = _mesa_buffer_map_range; @@ -1666,57 +1666,93 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); } - -void GLAPIENTRY -_mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, - GLenum type, const GLvoid* data) +/** + * \param subdata true if caller is *SubData, false if *Data + */ +void +_mesa_clear_buffer_sub_data(struct gl_context *ctx, +struct gl_buffer_object *bufObj, +GLenum internalformat, +GLintptr offset, GLsizeiptr size, +GLenum format, GLenum type, +const GLvoid *data, +const char *func, bool subdata) Doesn't matter much, but it might be nice if the refactor that pulls the guts out of ClearBufferData and ClearBufferSubData into one function were it's own commit and the DSA stuff were added later. It wasn't obvious what was going on in my initial scan-through. Not a big deal though. { - GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object* bufObj; mesa_format mesaFormat; GLubyte clearValue[MAX_PIXEL_BYTES]; GLsizeiptr clearValueSize; - bufObj = get_buffer(ctx, glClearBufferData, target, GL_INVALID_VALUE); - if (!bufObj) { - return; - } - - if (_mesa_check_disallowed_mapping(bufObj)) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glClearBufferData(buffer currently mapped)); + /* This checks for disallowed mappings. */ + if (!buffer_object_subdata_range_good(ctx, bufObj, offset, size, + subdata, func)) { return; } mesaFormat = validate_clear_buffer_format(ctx, internalformat, - format, type, - glClearBufferData); + format, type, func); + if (mesaFormat == MESA_FORMAT_NONE) { return; } clearValueSize =
Re: [Mesa-dev] [PATCH] dir-locals.el: Don't set variables for non-programming modes
On 30.01.2015 03:52, Neil Roberts wrote: This limits the style changes to modes inherited from prog-mode. The main reason to do this is to avoid setting fill-column for people using Emacs to edit commit messages because 78 characters is too many to make it wrap properly in git log. Note that makefile-mode also inherits from prog-mode so the fill column should continue to apply there. Sounds good, but can you please make the same change to the other .dir-locals.el files in the tree as well? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] nir/register: Add a parent_instr field
This adds a parent_instr field similar to the one for ssa_def. The difference here is that the parent_instr field on a nir_register can be NULL if the register does not have a unique definition or if that definition does not dominate all its uses. We set this field in the out-of-SSA pass so that backends can get SSA-like information even after they have gone out of SSA. --- Note: This has not been compile-tested src/glsl/nir/nir.c | 1 + src/glsl/nir/nir.h | 11 ++- src/glsl/nir/nir_from_ssa.c | 7 +++ 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index 25cdac0..b3b825b 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -61,6 +61,7 @@ reg_create(void *mem_ctx, struct exec_list *list) { nir_register *reg = ralloc(mem_ctx, nir_register); + reg-parent_instr = NULL; reg-uses = _mesa_set_create(mem_ctx, _mesa_hash_pointer, _mesa_key_pointer_equal); reg-defs = _mesa_set_create(mem_ctx, _mesa_hash_pointer, diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 980fdd0..2c7ffdc 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -64,6 +64,7 @@ name(const in_type *parent) \ struct nir_function_overload; struct nir_function; struct nir_shader; +struct nir_instr; /** @@ -384,6 +385,14 @@ typedef struct { */ bool is_packed; + /** +* If this pointer is non-NULL then this register has exactly one +* definition and that definition dominates all of its uses. This is +* set by the out-of-SSA pass so that backends can get SSA-like +* information even once they have gone out of SSA. +*/ + struct nir_instr *parent_instr; + /** set of nir_instr's where this register is used (read from) */ struct set *uses; @@ -406,7 +415,7 @@ typedef enum { nir_instr_type_parallel_copy, } nir_instr_type; -typedef struct { +typedef struct nir_instr { struct exec_node node; nir_instr_type type; struct nir_block *block; diff --git a/src/glsl/nir/nir_from_ssa.c b/src/glsl/nir/nir_from_ssa.c index 2e7add3..f0105fb 100644 --- a/src/glsl/nir/nir_from_ssa.c +++ b/src/glsl/nir/nir_from_ssa.c @@ -513,6 +513,13 @@ get_register_for_ssa_def(nir_ssa_def *def, struct from_ssa_state *state) reg-num_components = def-num_components; reg-num_array_elems = 0; + /* This register comes from an SSA definition that was not part of a + * phi-web. Therefore, we know it has a single unique definition + * that dominates all of its uses. Therefore, we can copy the + * parent_instr from the SSA def safely. + */ + reg-parent_instr = def-parent_instr; + _mesa_hash_table_insert(state-ssa_table, def, reg); return reg; } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 14/21] main: Add entry points for MapNamedBuffer[Range].
Again, I think this would be more readable if the refactor was in its own commit with a little more explanation what's going on. Then add DSA. On Wed, Jan 21, 2015 at 5:40 PM, Laura Ekstrand la...@jlekstrand.net wrote: --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 14 ++ src/mesa/main/bufferobj.c | 327 - src/mesa/main/bufferobj.h | 20 +- src/mesa/main/tests/dispatch_sanity.cpp| 2 + 4 files changed, 189 insertions(+), 174 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 4b29e00..7727f0d 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -61,6 +61,20 @@ param name=data type=const GLvoid * / /function + function name=MapNamedBuffer offset=assign + return type=GLvoid * / + param name=buffer type=GLuint / + param name=access type=GLenum / + /function + + function name=MapNamedBufferRange offset=assign + return type=GLvoid * / + param name=buffer type=GLuint / + param name=offset type=GLintptr / + param name=length type=GLsizeiptr / + param name=access type=GLbitfield / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 1ec1681..16c0faa 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -701,7 +701,7 @@ _mesa_ClearBufferSubData_sw(struct gl_context *ctx, * Called via glMapBufferRange(). */ static void * -_mesa_buffer_map_range( struct gl_context *ctx, GLintptr offset, +_mesa_MapBufferRange_sw(struct gl_context *ctx, GLintptr offset, GLsizeiptr length, GLbitfield access, struct gl_buffer_object *bufObj, gl_map_buffer_index index) @@ -1116,7 +1116,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-ClearBufferSubData = _mesa_ClearBufferSubData_sw; /* GL_ARB_map_buffer_range */ - driver-MapBufferRange = _mesa_buffer_map_range; + driver-MapBufferRange = _mesa_MapBufferRange_sw; driver-FlushMappedBufferRange = _mesa_buffer_flush_mapped_range; /* GL_ARB_copy_buffer */ @@ -1794,116 +1794,6 @@ _mesa_ClearNamedBufferSubData(GLuint buffer, GLenum internalformat, } -void * GLAPIENTRY -_mesa_MapBuffer(GLenum target, GLenum access) -{ - GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object * bufObj; - GLbitfield accessFlags; - void *map; - bool valid_access; - - ASSERT_OUTSIDE_BEGIN_END_WITH_RETVAL(ctx, NULL); - - switch (access) { - case GL_READ_ONLY_ARB: - accessFlags = GL_MAP_READ_BIT; - valid_access = _mesa_is_desktop_gl(ctx); - break; - case GL_WRITE_ONLY_ARB: - accessFlags = GL_MAP_WRITE_BIT; - valid_access = true; - break; - case GL_READ_WRITE_ARB: - accessFlags = GL_MAP_READ_BIT | GL_MAP_WRITE_BIT; - valid_access = _mesa_is_desktop_gl(ctx); - break; - default: - valid_access = false; - break; - } - - if (!valid_access) { - _mesa_error(ctx, GL_INVALID_ENUM, glMapBufferARB(access)); - return NULL; - } - - bufObj = get_buffer(ctx, glMapBufferARB, target, GL_INVALID_OPERATION); - if (!bufObj) - return NULL; - - if (accessFlags GL_MAP_READ_BIT - !(bufObj-StorageFlags GL_MAP_READ_BIT)) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glMapBuffer(invalid read flag)); - return NULL; - } - - if (accessFlags GL_MAP_WRITE_BIT - !(bufObj-StorageFlags GL_MAP_WRITE_BIT)) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glMapBuffer(invalid write flag)); - return NULL; - } - - if (_mesa_bufferobj_mapped(bufObj, MAP_USER)) { - _mesa_error(ctx, GL_INVALID_OPERATION, glMapBufferARB(already mapped)); - return NULL; - } - - if (!bufObj-Size) { - _mesa_error(ctx, GL_OUT_OF_MEMORY, - glMapBuffer(buffer size = 0)); - return NULL; - } - - ASSERT(ctx-Driver.MapBufferRange); - map = ctx-Driver.MapBufferRange(ctx, 0, bufObj-Size, accessFlags, bufObj, -MAP_USER); - if (!map) { - _mesa_error(ctx, GL_OUT_OF_MEMORY, glMapBufferARB(map failed)); - return NULL; - } - else { - /* The driver callback should have set these fields. - * This is important because other modules (like VBO) might call - * the driver function directly. - */ - ASSERT(bufObj-Mappings[MAP_USER].Pointer == map); - ASSERT(bufObj-Mappings[MAP_USER].Length == bufObj-Size); - ASSERT(bufObj-Mappings[MAP_USER].Offset == 0); - bufObj-Mappings[MAP_USER].AccessFlags = accessFlags;
Re: [Mesa-dev] [PATCH] glsl: Pick ast_conditional branch regardless of op1/2 being constant.
Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: fix display list 8-byte alignment issue
Ping. I'll handle the cherry-pick to 10.4 since a small code change is needed. -Brian On 01/27/2015 08:06 PM, Brian Paul wrote: The _mesa_dlist_alloc() function is only guaranteed to return a pointer with 4-byte alignment. On 64-bit systems which don't support unaligned loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code. The solution is to add a new _mesa_dlist_alloc_aligned() function which will return a pointer to an 8-byte aligned address on 64-bit systems. This is accomplished by inserting a 4-byte NOP instruction in the display list when needed. The only place this actually matters is the VBO code where we need to allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte aligned (just as if it were malloc'd). The gears demo and others hit this bug. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662 Cc: 10.4 mesa-sta...@lists.freedesktop.org --- src/mesa/main/dlist.c | 72 + src/mesa/main/dlist.h | 3 ++ src/mesa/vbo/vbo_save_api.c | 5 +++- 3 files changed, 73 insertions(+), 7 deletions(-) diff --git a/src/mesa/main/dlist.c b/src/mesa/main/dlist.c index 138d360..dc6070b 100644 --- a/src/mesa/main/dlist.c +++ b/src/mesa/main/dlist.c @@ -487,6 +487,7 @@ typedef enum /* The following three are meta instructions */ OPCODE_ERROR,/* raise compiled-in error */ OPCODE_CONTINUE, + OPCODE_NOP, /* No-op (used for 8-byte alignment */ OPCODE_END_OF_LIST, OPCODE_EXT_0 } OpCode; @@ -1012,13 +1013,16 @@ memdup(const void *src, GLsizei bytes) * Allocate space for a display list instruction (opcode + payload space). * \param opcode the instruction opcode (OPCODE_* value) * \param bytes instruction payload size (not counting opcode) - * \return pointer to allocated memory (the opcode space) + * \param align8 does the payload need to be 8-byte aligned? + *This is only relevant in 64-bit environments. + * \return pointer to allocated memory (the payload will be at pointer+1) */ static Node * -dlist_alloc(struct gl_context *ctx, OpCode opcode, GLuint bytes) +dlist_alloc(struct gl_context *ctx, OpCode opcode, GLuint bytes, bool align8) { const GLuint numNodes = 1 + (bytes + sizeof(Node) - 1) / sizeof(Node); const GLuint contNodes = 1 + POINTER_DWORDS; /* size of continue info */ + GLuint nopNode; Node *n; if (opcode OPCODE_EXT_0) { @@ -1032,7 +1036,19 @@ dlist_alloc(struct gl_context *ctx, OpCode opcode, GLuint bytes) } } - if (ctx-ListState.CurrentPos + numNodes + contNodes BLOCK_SIZE) { + if (sizeof(void *) == 8 align8 ctx-ListState.CurrentPos % 2 == 0) { + /* The opcode would get placed at node[0] and the payload would start + * at node[1]. But the payload needs to be at an even offset (8-byte + * multiple). + */ + nopNode = 1; + } + else { + nopNode = 0; + } + + if (ctx-ListState.CurrentPos + nopNode + numNodes + contNodes +BLOCK_SIZE) { /* This block is full. Allocate a new block and chain to it */ Node *newblock; n = ctx-ListState.CurrentBlock + ctx-ListState.CurrentPos; @@ -1042,13 +1058,34 @@ dlist_alloc(struct gl_context *ctx, OpCode opcode, GLuint bytes) _mesa_error(ctx, GL_OUT_OF_MEMORY, Building display list); return NULL; } + + /* a fresh block should be 8-byte aligned on 64-bit systems */ + assert(((GLintptr) newblock) % sizeof(void *) == 0); + save_pointer(n[1], newblock); ctx-ListState.CurrentBlock = newblock; ctx-ListState.CurrentPos = 0; + + /* Display list nodes are always 4 bytes. If we need 8-byte alignment + * we have to insert a NOP so that the payload of the real opcode lands + * on an even location: + * node[0] = OPCODE_NOP + * node[1] = OPCODE_x; + * node[2] = start of payload + */ + nopNode = sizeof(void *) == 8 align8; } n = ctx-ListState.CurrentBlock + ctx-ListState.CurrentPos; - ctx-ListState.CurrentPos += numNodes; + if (nopNode) { + assert(ctx-ListState.CurrentPos % 2 == 0); /* even value */ + n[0].opcode = OPCODE_NOP; + n++; + /* The real opcode will now be at an odd location and the payload + * will be at an even location. + */ + } + ctx-ListState.CurrentPos += nopNode + numNodes; n[0].opcode = opcode; @@ -1069,7 +1106,22 @@ dlist_alloc(struct gl_context *ctx, OpCode opcode, GLuint bytes) void * _mesa_dlist_alloc(struct gl_context *ctx, GLuint opcode, GLuint bytes) { - Node *n = dlist_alloc(ctx, (OpCode) opcode, bytes); + Node *n = dlist_alloc(ctx, (OpCode) opcode, bytes, false); + if (n) + return n + 1; /* return pointer to payload area, after opcode */ + else + return NULL; +} + + +/** + * Same as _mesa_dlist_alloc(), but return a pointer which is 8-byte + *