Re: [Mesa-dev] [PATCH v2 2/3] etnaviv: Update includes from rnndb
On Sat, Apr 15, 2017 at 07:49:53AM +0200, Wladimir J. van der Laan wrote:
> On Fri, Apr 14, 2017 at 11:57:21PM +0200, Christian Gmeiner wrote:
> > > +#define INST_OPCODE_IMADLOSAT0 0x004e
> > > +#define INST_OPCODE_IMADLOSAT0 0x004f
> >
> > INST_OPCODE_IMADLOSAT0 got redefined...
>
> Second one should be IMADLOSAT1. Strange, I fixed this but apparently it
> didn't make it to the patch, messed up with git again :(
>
> https://github.com/etnaviv/etna_viv/blob/master/src/etnaviv/isa.xml.h#L119

I now understand what went wrong: apparently I fixed another instance (IMUL
not IMAD) but not this one. Strange, hadn't seen a warning for this.

Thanks for the fix,

Regards,
Wladimir
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/3] etnaviv: Update includes from rnndb
On Fri, Apr 14, 2017 at 11:57:21PM +0200, Christian Gmeiner wrote:
> > +#define INST_OPCODE_IMADLOSAT0 0x004e
> > +#define INST_OPCODE_IMADLOSAT0 0x004f
>
> INST_OPCODE_IMADLOSAT0 got redefined...

Second one should be IMADLOSAT1. Strange, I fixed this but apparently it
didn't make it to the patch, messed up with git again :(

https://github.com/etnaviv/etna_viv/blob/master/src/etnaviv/isa.xml.h#L119

Regards,
Wladimir
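For reference, a minimal sketch of what the corrected pair could look like. The two opcode values and the IMADLOSAT1 name come from the exchange above; the `#error` guard and the static assertion are illustrative assumptions, not part of the patch. Since the header is generated from rnndb, the actual fix presumably belongs in the rnndb source rather than in the header itself.

```c
#include <assert.h>

/* Values as quoted in the review; the second name is the one Wladimir gives. */
#define INST_OPCODE_IMADLOSAT0 0x004e
#define INST_OPCODE_IMADLOSAT1 0x004f

/* Illustrative guard (an assumption, not in the patch): fail the build if the
 * second opcode name is missing, which is the effect of the duplicated name. */
#ifndef INST_OPCODE_IMADLOSAT1
#error "INST_OPCODE_IMADLOSAT1 is not defined"
#endif

/* C11 compile-time check that the two names do not share one value. */
_Static_assert(INST_OPCODE_IMADLOSAT0 != INST_OPCODE_IMADLOSAT1,
               "IMADLOSAT0 and IMADLOSAT1 must be distinct opcodes");
```

A check like this only helps if something in the tree actually references IMADLOSAT1; it is a belt-and-braces idea, not a replacement for fixing the generator input.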
Re: [Mesa-dev] [PATCH 04/12] i965/cnl: Implement new pipe control workaround
On Fri, Apr 14, 2017 at 8:35 PM, Anuj Phogat wrote:
> From: Ben Widawsky
>
> GEN10 requires flushing all previous pipe controls before issuing a render
> target cache flush. The docs seem to fairly explicitly say this is gen10 only.
>
> v2: Rebased on
> commit 04f74d66293222d5e1905cfb930bfa083e30463c
> Author: Francisco Jerez
> Date: Thu Jun 30 19:39:24 2016 -0700
>
>     i965: Emit SNB write cache flush W/A from brw_emit_pipe_control_flush.
>
> Cc: Francisco Jerez
> Signed-off-by: Ben Widawsky
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index b8f7406..b921fe7 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -128,6 +128,24 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
>        brw_emit_pipe_control_flush(brw, 0);
>     }
>
> +   if (brw->gen == 10) {

Should this only be if flags & PIPE_CONTROL_RENDER_TARGET_FLUSH ?

> +      /* Hardware workaround: CNL
> +       *
> +       * "Before sending a PIPE_CONTROL command with bit 12 set, SW
> +       * must issue another PIPE_CONTROL with Render Target Cache
> +       * Flush Enable (bit 12) = 0 and Pipe Control Flush Enable (bit
> +       * 7) = 1."
> +       */
> +      BEGIN_BATCH(6);
> +      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
> +      OUT_BATCH(PIPE_CONTROL_FLUSH_ENABLE);

Based on the comment above, shouldn't this also be |
PIPE_CONTROL_RENDER_TARGET_FLUSH?

Also, this tends to be done as a brw_emit_pipe_control_flush(brw, fooflags)
call above for gen9, makes sense to do the same thing here, no?

> +      OUT_BATCH(0);
> +      OUT_BATCH(0);
> +      OUT_BATCH(0);
> +      OUT_BATCH(0);
> +      ADVANCE_BATCH();
> +   }
> +
>     BEGIN_BATCH(6);
>     OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
>     OUT_BATCH(flags);
> --
> 2.9.3
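To make the first and last review suggestions concrete, here is a rough sketch of the workaround gated on the render target flush bit and expressed as a recursive flush call. Everything prefixed `sketch_` is a hypothetical stand-in (the real code uses `struct brw_context` and the BEGIN_BATCH/OUT_BATCH macros), and the two bit positions are taken from the workaround text quoted in the patch, not from brw_defines.h. Note that the quoted text asks for bit 12 = 0 in the preliminary PIPE_CONTROL, so the sketch emits FLUSH_ENABLE alone.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the quoted workaround text (assumed here). */
#define PIPE_CONTROL_RENDER_TARGET_FLUSH (1u << 12)
#define PIPE_CONTROL_FLUSH_ENABLE        (1u << 7)

/* Hypothetical stand-in for struct brw_context: records emitted flags. */
struct sketch_context {
   int gen;
   uint32_t emitted[8];    /* flags of each PIPE_CONTROL, in emission order */
   unsigned num_emitted;
};

/* Hypothetical stand-in for the BEGIN_BATCH/OUT_BATCH/ADVANCE_BATCH sequence. */
static void
sketch_emit_raw_pipe_control(struct sketch_context *ctx, uint32_t flags)
{
   assert(ctx->num_emitted < 8);
   ctx->emitted[ctx->num_emitted++] = flags;
}

static void
sketch_emit_pipe_control_flush(struct sketch_context *ctx, uint32_t flags)
{
   /* Only pay for the extra PIPE_CONTROL when bit 12 is about to be set.
    * The recursive call cannot loop: its flags do not contain bit 12. */
   if (ctx->gen == 10 && (flags & PIPE_CONTROL_RENDER_TARGET_FLUSH))
      sketch_emit_pipe_control_flush(ctx, PIPE_CONTROL_FLUSH_ENABLE);

   sketch_emit_raw_pipe_control(ctx, flags);
}
```

With this shape, a gen10 render target flush produces two PIPE_CONTROLs and every other flush produces one, which seems to be what the reviewer is asking about.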
Re: [Mesa-dev] [PATCH] mesa: print target string in glBindTexture() error message
Reviewed-by: Timothy Arceri

On 15/04/17 04:42, Brian Paul wrote:
> ---
>  src/mesa/main/texobj.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
> index ad644ca..00feb97 100644
> --- a/src/mesa/main/texobj.c
> +++ b/src/mesa/main/texobj.c
> @@ -1663,7 +1663,8 @@ _mesa_BindTexture( GLenum target, GLuint texName )
>     targetIndex = _mesa_tex_target_to_index(ctx, target);
>     if (targetIndex < 0) {
> -      _mesa_error(ctx, GL_INVALID_ENUM, "glBindTexture(target)");
> +      _mesa_error(ctx, GL_INVALID_ENUM, "glBindTexture(target = %s)",
> +                  _mesa_enum_to_string(target));
>        return;
>     }
>     assert(targetIndex < NUM_TEXTURE_TARGETS);
[Mesa-dev] [PATCH 06/12] i965/cnl: Modify thread count shift for VS
From: Ben Widawsky

Signed-off-by: Ben Widawsky
---
 src/mesa/drivers/dri/i965/brw_defines.h   | 1 +
 src/mesa/drivers/dri/i965/gen8_vs_state.c | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 08106c0..688ff61 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -607,6 +607,7 @@ enum brw_wrap_mode {
 /* DW5 */
 # define GEN6_VS_MAX_THREADS_SHIFT 25
 # define HSW_VS_MAX_THREADS_SHIFT 23
+# define GEN10_VS_MAX_THREADS_SHIFT 22
 # define GEN6_VS_STATISTICS_ENABLE (1 << 10)
 # define GEN6_VS_CACHE_DISABLE (1 << 1)
 # define GEN6_VS_ENABLE (1 << 0)
diff --git a/src/mesa/drivers/dri/i965/gen8_vs_state.c b/src/mesa/drivers/dri/i965/gen8_vs_state.c
index 7b66da4..c4ad9cd 100644
--- a/src/mesa/drivers/dri/i965/gen8_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_vs_state.c
@@ -75,7 +75,11 @@ upload_vs_state(struct brw_context *brw)
    uint32_t simd8_enable =
       vue_prog_data->dispatch_mode == DISPATCH_MODE_SIMD8 ?
       GEN8_VS_SIMD8_ENABLE : 0;
-   OUT_BATCH(((devinfo->max_vs_threads - 1) << HSW_VS_MAX_THREADS_SHIFT) |
+
+   uint32_t threads = (devinfo->max_vs_threads - 1);
+   threads <<= brw->gen >= 10 ? GEN10_VS_MAX_THREADS_SHIFT :
+                                HSW_VS_MAX_THREADS_SHIFT;
+   OUT_BATCH(threads |
              GEN6_VS_STATISTICS_ENABLE |
              simd8_enable |
              GEN6_VS_ENABLE);
--
2.9.3
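As a worked illustration of the shift selection in this patch, the following sketch isolates the computation in a standalone helper. The helper name is hypothetical and not part of the patch; the two shift values are the ones from the brw_defines.h hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values as defined in the patch's brw_defines.h hunk. */
#define HSW_VS_MAX_THREADS_SHIFT   23
#define GEN10_VS_MAX_THREADS_SHIFT 22

/* Hypothetical helper (not in the patch): returns the "maximum number of
 * threads" bits of 3DSTATE_VS DW for a gen8+ part, mirroring the patch. */
static uint32_t
sketch_vs_max_threads_bits(int gen, unsigned max_vs_threads)
{
   assert(max_vs_threads >= 1);

   /* The field holds the thread count minus one. */
   uint32_t threads = max_vs_threads - 1;

   /* Gen10 moves the field one bit lower than the HSW..gen9 position. */
   threads <<= gen >= 10 ? GEN10_VS_MAX_THREADS_SHIFT
                         : HSW_VS_MAX_THREADS_SHIFT;
   return threads;
}
```

With the thread counts that appear elsewhere in this series (336 for gen9, 728 for gen10), both results still fit in 32 bits: 335 needs 9 bits above bit 23 and 727 needs 10 bits above bit 22.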
[Mesa-dev] [PATCH 11/12] i965/cnl: Properly handle l3 configuration
From: Ben Widawsky

V2: Squash the changes in one patch and rebased on master (Anuj).

Signed-off-by: Ben Widawsky
Signed-off-by: Anuj Phogat
---
 src/intel/common/gen_l3_config.c | 43 ++--
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/src/intel/common/gen_l3_config.c b/src/intel/common/gen_l3_config.c
index 4fe3503..f3e8793 100644
--- a/src/intel/common/gen_l3_config.c
+++ b/src/intel/common/gen_l3_config.c
@@ -102,6 +102,26 @@ static const struct gen_l3_config chv_l3_configs[] = {
 };

 /**
+ * On CNL, RO clients are merged and shared with read/write space. As a result
+ * we have fewer allocation parameters. Also, programming does not require any
+ * back scaling. Programming simply works in 2k increments and is scaled by the
+ * hardware.
+ */
+static const struct gen_l3_config cnl_l3_configs[] = {
+   /* SLM URB Rest  DC  RO */
+   {{  0, 64,  64,   0,  0 }},
+   {{  0, 64,   0,  16, 48 }},
+   {{  0, 48,   0,  16, 64 }},
+   {{  0, 32,   0,   0, 96 }},
+   {{  0, 32,  96,   0,  0 }},
+   {{  0, 32,   0,  16, 80 }},
+   {{ 32, 16,  80,   0,  0 }},
+   {{ 32, 16,   0,  64, 16 }},
+   {{ 32,  0,  96,   0,  0 }},
+   {{ 0 }}
+};
+
+/**
  * Return a zero-terminated array of validated L3 configurations for the
  * specified device.
  */
@@ -116,9 +136,11 @@ get_l3_configs(const struct gen_device_info *devinfo)
       return (devinfo->is_cherryview ? chv_l3_configs : bdw_l3_configs);

    case 9:
-   case 10:
       return chv_l3_configs;

+   case 10:
+      return cnl_l3_configs;
+
    default:
       unreachable("Not implemented");
    }
@@ -258,13 +280,19 @@ get_l3_way_size(const struct gen_device_info *devinfo)
    if (devinfo->is_baytrail)
       return 2;

-   else if (devinfo->gt == 1 ||
-            devinfo->is_cherryview ||
-            devinfo->is_broxton)
+   /* Way size is actually 6 * num_slices, because it's 2k per bank, and
+    * normally 3 banks per slice. However, on CNL+ this information isn't
+    * needed to setup the URB/l3 configuration. We fudge the answer here
+    * and then use the scaling to fix it up later.
+    */
+   if (devinfo->gen >= 10)
+      return 2 * devinfo->l3_banks;
+
+   /* XXX: Cherryview and Broxton are always gt1 */
+   if (devinfo->gt == 1)
       return 4;

-   else
-      return 8 * devinfo->num_slices;
+   return 8 * devinfo->num_slices;
 }

 /**
@@ -274,6 +302,9 @@ get_l3_way_size(const struct gen_device_info *devinfo)
 static unsigned
 get_urb_size_scale(const struct gen_device_info *devinfo)
 {
+   if (devinfo->gen == 10)
+      return devinfo->l3_banks;
+
    return (devinfo->gen >= 8 ? devinfo->num_slices : 1);
 }

--
2.9.3
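To see what the reworked way-size and URB-scale logic evaluates to, here is a small standalone sketch of the two functions as they read after this patch. `struct sketch_device_info` is a hypothetical cut-down stand-in for `struct gen_device_info` that carries only the fields these two functions touch; the function bodies follow the hunks above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct gen_device_info, for illustration only. */
struct sketch_device_info {
   int gen;
   int gt;
   bool is_baytrail;
   unsigned num_slices;
   unsigned l3_banks;
};

/* Mirrors get_l3_way_size() as it reads after the patch. */
static unsigned
sketch_l3_way_size(const struct sketch_device_info *devinfo)
{
   assert(devinfo);

   if (devinfo->is_baytrail)
      return 2;

   /* CNL+: 2k per bank, scaled by the bank count rather than by slices. */
   if (devinfo->gen >= 10)
      return 2 * devinfo->l3_banks;

   if (devinfo->gt == 1)
      return 4;

   return 8 * devinfo->num_slices;
}

/* Mirrors get_urb_size_scale() as it reads after the patch. */
static unsigned
sketch_urb_size_scale(const struct sketch_device_info *devinfo)
{
   assert(devinfo);

   if (devinfo->gen == 10)
      return devinfo->l3_banks;

   return devinfo->gen >= 8 ? devinfo->num_slices : 1;
}
```

For a part with 2 slices and 6 L3 banks (the values PATCH 08/12 uses for cnl_5x8), the way size comes out as 12, which agrees with the "6 * num_slices" remark in the patch comment.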
[Mesa-dev] [PATCH 10/12] i965/cnl: Update memory barrier assert
Signed-off-by: Anuj Phogat
---
 src/mesa/drivers/dri/i965/brw_program.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index e1f9896..ab719ad 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -292,7 +292,7 @@ brw_memory_barrier(struct gl_context *ctx, GLbitfield barriers)
    unsigned bits = (PIPE_CONTROL_DATA_CACHE_FLUSH |
                     PIPE_CONTROL_NO_WRITE |
                     PIPE_CONTROL_CS_STALL);
-   assert(brw->gen >= 7 && brw->gen <= 9);
+   assert(brw->gen >= 7 && brw->gen <= 10);

    if (barriers & (GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT |
                    GL_ELEMENT_ARRAY_BARRIER_BIT |
--
2.9.3
[Mesa-dev] [PATCH 12/12] i965/cnl: Add CNL MOCS defines
Signed-off-by: Anuj Phogat
---
 src/mesa/drivers/dri/i965/brw_blorp.c            | 7 ++-
 src/mesa/drivers/dri/i965/brw_defines.h          | 8
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 2 ++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 8a6cc66..eae925f 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -94,12 +94,17 @@ brw_blorp_init(struct brw_context *brw)
       brw->blorp.exec = gen8_blorp_exec;
       break;
    case 9:
-   case 10:
       brw->blorp.mocs.tex = SKL_MOCS_WB;
       brw->blorp.mocs.rb = SKL_MOCS_PTE;
       brw->blorp.mocs.vb = SKL_MOCS_WB;
       brw->blorp.exec = gen9_blorp_exec;
       break;
+   case 10:
+      brw->blorp.mocs.tex = CNL_MOCS_WB;
+      brw->blorp.mocs.rb = CNL_MOCS_PTE;
+      brw->blorp.mocs.vb = CNL_MOCS_WB;
+      brw->blorp.exec = gen9_blorp_exec;
+      break;
    default:
       unreachable("Invalid gen");
    }
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 688ff61..afa13b4 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1408,6 +1408,14 @@ enum brw_pixel_shader_coverage_mask_mode {
 /* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */
 #define SKL_MOCS_PTE (1 << 1)

+/* CannonLake: MOCS is now an index into an array of 62 different caching
+ * configurations programmed by the kernel.
+ */
+/* TC=LLC/eLLC, LeCC=WB, LRUM=3, L3CC=WB */
+#define CNL_MOCS_WB (2 << 1)
+/* TC=LLC/eLLC, LeCC=PTE, LRUM=3, L3CC=WB */
+#define CNL_MOCS_PTE (1 << 1)
+
 #define MEDIA_VFE_STATE 0x7000
 /* GEN7 DW2, GEN8+ DW3 */
 # define MEDIA_VFE_STATE_MAX_THREADS_SHIFT 16
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 1d4953e..68942f7 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -64,12 +64,14 @@ uint32_t tex_mocs[] = {
    [7] = GEN7_MOCS_L3,
    [8] = BDW_MOCS_WB,
    [9] = SKL_MOCS_WB,
+   [10] = CNL_MOCS_WB,
 };

 uint32_t rb_mocs[] = {
    [7] = GEN7_MOCS_L3,
    [8] = BDW_MOCS_PTE,
    [9] = SKL_MOCS_PTE,
+   [10] = CNL_MOCS_PTE,
 };

 static void
--
2.9.3
[Mesa-dev] [PATCH 09/12] i965/cnl: URB {VS, GS, HS, DS} sizes cannot be a multiple of 3
v1: By Ben Widawsky
v2: Add the restriction for GS, HS and DS and make sure the allocated
    sizes are not a multiple of 3.

Signed-off-by: Anuj Phogat
Cc: Ben Widawsky
---
 src/mesa/drivers/dri/i965/gen7_urb.c | 12
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c b/src/mesa/drivers/dri/i965/gen7_urb.c
index 028161d..dc6826a 100644
--- a/src/mesa/drivers/dri/i965/gen7_urb.c
+++ b/src/mesa/drivers/dri/i965/gen7_urb.c
@@ -194,6 +194,17 @@ gen7_upload_urb(struct brw_context *brw, unsigned vs_size,
       entry_size[i] = prog_data[i] ? prog_data[i]->urb_entry_size : 1;
    }

+   /* For Cannonlake:
+    * Software shall not program an allocation size that specifies a size
+    * that is a multiple of 3 64B (512-bit) cachelines.
+    */
+   if (brw->gen == 10) {
+      for (int i = MESA_SHADER_VERTEX; i <= MESA_SHADER_GEOMETRY; i++) {
+         if (entry_size[i] % 3 == 0)
+            entry_size[i]++;
+      }
+   }
+
    /* If we're just switching between programs with the same URB requirements,
     * skip the rest of the logic.
     */
@@ -224,6 +235,7 @@ gen7_upload_urb(struct brw_context *brw, unsigned vs_size,
    BEGIN_BATCH(8);
    for (int i = MESA_SHADER_VERTEX; i <= MESA_SHADER_GEOMETRY; i++) {
+      assert(brw->gen != 10 || entry_size[i] % 3);
       OUT_BATCH((_3DSTATE_URB_VS + i) << 16 | (2 - 2));
       OUT_BATCH(entries[i] |
                 ((entry_size[i] - 1) << GEN7_URB_ENTRY_SIZE_SHIFT) |
--
2.9.3
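The size adjustment in this patch can be tried in isolation with a sketch like the one below. The helper name is hypothetical and not part of the patch; it applies the same rule as the loop above to a single entry size, measured in 64B units, and repeats the patch's assertion as a post-condition.

```c
#include <assert.h>

/* Hypothetical helper (not in the patch): bump a URB entry size, in 64B
 * units, so that it is never a multiple of 3 on gen10. */
static unsigned
sketch_fix_urb_entry_size(int gen, unsigned entry_size)
{
   if (gen == 10 && entry_size % 3 == 0)
      entry_size++;

   /* Same condition the patch asserts just before emitting 3DSTATE_URB_*. */
   assert(gen != 10 || entry_size % 3 != 0);
   return entry_size;
}
```

Incrementing by one is always enough, since a multiple of 3 plus one cannot itself be a multiple of 3. The cost is one extra 64B unit per entry for the affected sizes.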
[Mesa-dev] [PATCH 05/12] i965/cnl: Implement depth count workaround
From: Ben Widawsky

Signed-off-by: Ben Widawsky
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 8
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 5c3ecba..d0d0589 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -111,6 +111,14 @@ brw_write_depth_count(struct brw_context *brw, drm_intel_bo *query_bo, int idx)
    if (brw->gen == 9 && brw->gt == 4)
       flags |= PIPE_CONTROL_CS_STALL;

+   if (brw->gen >= 10) {
+      /* "Driver must program PIPE_CONTROL with only Depth Stall Enable bit set
+       * prior to programming a PIPE_CONTROL with Write PS Depth Count Post sync
+       * operation."
+       */
+      brw_emit_pipe_control_flush(brw, PIPE_CONTROL_DEPTH_STALL);
+   }
+
    brw_emit_pipe_control_write(brw, flags, query_bo,
                                idx * sizeof(uint64_t), 0, 0);
--
2.9.3
[Mesa-dev] [PATCH 07/12] i965/cnl: Restore lossless compression for sRGB formats
From: Ben Widawsky

This support was removed on gen9 (it worked before then) and was brought back
for gen10.

Signed-off-by: Ben Widawsky
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 467ada5..c8014b9 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -207,7 +207,7 @@ intel_miptree_supports_non_msrt_fast_clear(struct brw_context *brw,
    if (!brw->format_supported_as_render_target[mt->format])
       return false;

-   if (brw->gen >= 9) {
+   if (brw->gen == 9) {
       mesa_format linear_format = _mesa_get_srgb_format_linear(mt->format);
       const uint32_t brw_format = brw_isl_format_for_mesa_format(linear_format);
       return isl_format_supports_ccs_e(&brw->screen->devinfo, brw_format);
--
2.9.3
[Mesa-dev] [PATCH 08/12] i965/cnl: Add a preliminary device for CNL
From: Ben Widawsky

Since we've implemented all the known quirks for supporting gen10 with none of
the new features (i.e. functions like Skylake), it should be safe to actually
enable the device.

v2: rebased on top of master and updated pci ids (Anuj)

Signed-off-by: Ben Widawsky
Signed-off-by: Anuj Phogat
---
 include/pci_ids/i965_pci_ids.h              | 12 ++
 src/intel/common/gen_device_info.c          | 59 +
 src/intel/common/gen_device_info.h          |  1 +
 src/intel/common/gen_l3_config.c            |  1 +
 src/intel/compiler/brw_compiler.h           |  2 +-
 src/intel/compiler/brw_eu.c                 |  2 +
 src/intel/compiler/brw_eu_compact.c         |  1 +
 src/intel/isl/isl.c                         |  2 +
 src/intel/vulkan/anv_cmd_buffer.c           |  1 +
 src/intel/vulkan/anv_device.c               |  1 +
 src/intel/vulkan/anv_entrypoints_gen.py     |  1 +
 src/mesa/drivers/dri/i965/brw_blorp.c       |  1 +
 src/mesa/drivers/dri/i965/brw_draw_upload.c |  1 +
 src/mesa/drivers/dri/i965/brw_formatquery.c |  1 +
 src/mesa/drivers/dri/i965/intel_screen.c    |  1 +
 15 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 17504f5..b296359 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -165,3 +165,15 @@ CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3)")
 CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
 CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
 CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")
+CHIPSET(0x5A49, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
+CHIPSET(0x5A4A, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
+CHIPSET(0x5A41, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
+CHIPSET(0x5A42, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
+CHIPSET(0x5A44, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
+CHIPSET(0x5A59, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
+CHIPSET(0x5A5A, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
+CHIPSET(0x5A5C, cnl_4x8, "Intel(R) HD Graphics (Cannonlake 4x8 GT1.5)")
+CHIPSET(0x5A50, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
+CHIPSET(0x5A51, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
+CHIPSET(0x5A52, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
+CHIPSET(0x5A54, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
diff --git a/src/intel/common/gen_device_info.c b/src/intel/common/gen_device_info.c
index 47aed9d..43d6f08 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -555,6 +555,65 @@ static const struct gen_device_info gen_device_info_glk_2x6 = {
    GEN9_LP_FEATURES_2X6
 };

+#define GEN10_HW_INFO                        \
+   .gen = 10,                                \
+   .max_vs_threads = 728,                    \
+   .max_gs_threads = 432,                    \
+   .max_tcs_threads = 432,                   \
+   .max_tes_threads = 624,                   \
+   .max_wm_threads = 64 * 12,                \
+   .max_cs_threads = 56,                     \
+   .urb = {                                  \
+      .size = 256,                           \
+      .min_entries = {                       \
+         [MESA_SHADER_VERTEX]    = 64,       \
+         [MESA_SHADER_TESS_EVAL] = 34,       \
+      },                                     \
+      .max_entries = {                       \
+         [MESA_SHADER_VERTEX]    = 3936,     \
+         [MESA_SHADER_TESS_CTRL] = 896,      \
+         [MESA_SHADER_TESS_EVAL] = 2064,     \
+         [MESA_SHADER_GEOMETRY]  = 832,      \
+      },                                     \
+   }
+
+#define GEN10_FEATURES(_gt, _slices, _l3)    \
+   GEN8_FEATURES,                            \
+   GEN10_HW_INFO,                            \
+   .gt = _gt, .num_slices = _slices, .l3_banks = _l3
+
+static const struct gen_device_info gen_device_info_cnl_2x8 = {
+   /* GT0.5 */
+   GEN10_FEATURES(1, 1, 2)
+};
+
+static const struct gen_device_info gen_device_info_cnl_3x8 = {
+   /* GT1 */
+   GEN10_FEATURES(1, 1, 3)
+};
+
+static const struct gen_device_info gen_device_info_cnl_4x8 = {
+   /* GT 1.5 */
+   GEN10_FEATURES(1, 2, 6)
+};
+
+static const struct gen_device_info gen_device_info_cnl_5x8 = {
+   /* GT2 */
+   GEN10_FEATURES(2, 2, 6)
+};
+
+static const struct gen_device_info gen_device_info_cnl_gt1 = {
+   GEN10_FEATURES(1, 1, 3)
+};
+
+static const struct gen_device_info gen_device_info_cnl_gt2 = {
+   GEN10_FEATURES(2, 2, 6)
+};
+
+static const struct gen_device_info gen_device_info_cnl_gt3 = {
+   GEN10_FEATURES(3, 4, 12)
+};
+
 bool
[Mesa-dev] [PATCH 03/12] i965/cnl: Update the script generating genX_bits.h
Signed-off-by: Anuj Phogat
---
 src/intel/genxml/gen_bits_header.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/intel/genxml/gen_bits_header.py b/src/intel/genxml/gen_bits_header.py
index 808e6cf..77cd966 100644
--- a/src/intel/genxml/gen_bits_header.py
+++ b/src/intel/genxml/gen_bits_header.py
@@ -84,6 +84,7 @@ static inline uint32_t ATTRIBUTE_PURE
 ${field.token_name}(const struct gen_device_info *devinfo)
 {
    switch (devinfo->gen) {
+   case 10: return ${field.bits(10)};
    case 9: return ${field.bits(9)};
    case 8: return ${field.bits(8)};
    case 7:
@@ -151,8 +152,7 @@ class Gen(object):
     def __init__(self, z):
         # Convert potential "major.minor" string
         z = float(z)
-        if z < 10:
-            z *= 10
+        z *= 10
         self.tenx = int(z)

     def __lt__(self, other):
--
2.9.3
[Mesa-dev] [PATCH 00/12] Add Cannonlake support
This series adds preliminary support for Cannonlake. We still end up using
gen9 paths in many cases. My upcoming patches will change that by creating new
functions and headers for gen10.

You can also find this series at:
https://github.com/aphogat/mesa.git
branch: reviews

Anuj Phogat (4):
  i965/cnl: Update the script generating genX_bits.h
  i965/cnl: URB {VS, GS, HS, DS} sizes cannot be a multiple of 3
  i965/cnl: Update memory barrier assert
  i965/cnl: Add CNL MOCS defines

Ben Widawsky (7):
  i965: Make feature macros gen8 based
  i965/cnl: Implement new pipe control workaround
  i965/cnl: Implement depth count workaround
  i965/cnl: Modify thread count shift for VS
  i965/cnl: Restore lossless compression for sRGB formats
  i965/cnl: Add a preliminary device for CNL
  i965/cnl: Properly handle l3 configuration

Jason Ekstrand (1):
  i965/cnl: Add gen10.xml

 include/pci_ids/i965_pci_ids.h                   |   12 +
 src/intel/Makefile.sources                       |    3 +-
 src/intel/common/gen_device_info.c               |   72 +-
 src/intel/common/gen_device_info.h               |    1 +
 src/intel/common/gen_l3_config.c                 |   42 +-
 src/intel/compiler/brw_compiler.h                |    2 +-
 src/intel/compiler/brw_eu.c                      |    2 +
 src/intel/compiler/brw_eu_compact.c              |    1 +
 src/intel/genxml/gen10.xml                       | 3557 ++
 src/intel/genxml/gen_bits_header.py              |    4 +-
 src/intel/isl/isl.c                              |    2 +
 src/intel/vulkan/anv_cmd_buffer.c                |    1 +
 src/intel/vulkan/anv_device.c                    |    1 +
 src/intel/vulkan/anv_entrypoints_gen.py          |    1 +
 src/mesa/drivers/dri/i965/brw_blorp.c            |    6 +
 src/mesa/drivers/dri/i965/brw_defines.h          |    9 +
 src/mesa/drivers/dri/i965/brw_draw_upload.c      |    1 +
 src/mesa/drivers/dri/i965/brw_formatquery.c      |    1 +
 src/mesa/drivers/dri/i965/brw_pipe_control.c     |   18 +
 src/mesa/drivers/dri/i965/brw_program.c          |    2 +-
 src/mesa/drivers/dri/i965/brw_queryobj.c         |    8 +
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |    2 +
 src/mesa/drivers/dri/i965/gen7_urb.c             |   12 +
 src/mesa/drivers/dri/i965/gen8_vs_state.c        |    6 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c    |    2 +-
 src/mesa/drivers/dri/i965/intel_screen.c         |    1 +
 26 files changed, 3749 insertions(+), 20 deletions(-)
 create mode 100644 src/intel/genxml/gen10.xml

--
2.9.3
[Mesa-dev] [PATCH 04/12] i965/cnl: Implement new pipe control workaround
From: Ben Widawsky

GEN10 requires flushing all previous pipe controls before issuing a render
target cache flush. The docs seem to fairly explicitly say this is gen10 only.

v2: Rebased on
commit 04f74d66293222d5e1905cfb930bfa083e30463c
Author: Francisco Jerez
Date: Thu Jun 30 19:39:24 2016 -0700

    i965: Emit SNB write cache flush W/A from brw_emit_pipe_control_flush.

Cc: Francisco Jerez
Signed-off-by: Ben Widawsky
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index b8f7406..b921fe7 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -128,6 +128,24 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
       brw_emit_pipe_control_flush(brw, 0);
    }

+   if (brw->gen == 10) {
+      /* Hardware workaround: CNL
+       *
+       * "Before sending a PIPE_CONTROL command with bit 12 set, SW
+       * must issue another PIPE_CONTROL with Render Target Cache
+       * Flush Enable (bit 12) = 0 and Pipe Control Flush Enable (bit
+       * 7) = 1."
+       */
+      BEGIN_BATCH(6);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
+      OUT_BATCH(PIPE_CONTROL_FLUSH_ENABLE);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   }
+
    BEGIN_BATCH(6);
    OUT_BATCH(_3DSTATE_PIPE_CONTROL | (6 - 2));
    OUT_BATCH(flags);
--
2.9.3
[Mesa-dev] [PATCH 01/12] i965: Make feature macros gen8 based
From: Ben Widawsky

All the "features" of the hardware are similar starting with GEN8, so remove
as much of the GEN9 uniqueness as possible. This makes implementing future gen
platforms a bit easier.

Signed-off-by: Ben Widawsky
Reviewed-by: Anuj Phogat
---
 src/intel/common/gen_device_info.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/intel/common/gen_device_info.c b/src/intel/common/gen_device_info.c
index 209b293..47aed9d 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -378,15 +378,8 @@ static const struct gen_device_info gen_device_info_chv = {
    }
 };

-#define GEN9_FEATURES                               \
+#define GEN9_HW_INFO                                \
    .gen = 9,                                        \
-   .has_hiz_and_separate_stencil = true,            \
-   .has_resource_streamer = true,                   \
-   .must_use_separate_stencil = true,               \
-   .has_llc = true,                                 \
-   .has_pln = true,                                 \
-   .supports_simd16_3src = true,                    \
-   .has_surface_tile_offset = true,                 \
    .max_vs_threads = 336,                           \
    .max_gs_threads = 336,                           \
    .max_tcs_threads = 336,                          \
@@ -454,6 +447,10 @@ static const struct gen_device_info gen_device_info_chv = {
    },                                               \
 }

+#define GEN9_FEATURES                               \
+   GEN8_FEATURES,                                   \
+   GEN9_HW_INFO
+
 static const struct gen_device_info gen_device_info_skl_gt1 = {
    GEN9_FEATURES, .gt = 1,
    .num_slices = 1,
--
2.9.3
Re: [Mesa-dev] [PATCH] swr: Fix swr osmesa build
Thanks Emil, I will attempt to un-meh a bit at checkin.

George

> On Apr 14, 2017, at 5:52 PM, Emil Velikov wrote:
>
> Commit summary is a bit meh, but regardless.
>
> Reviewed-by: Emil Velikov
>
> -Emil
Re: [Mesa-dev] [PATCH] docs: Document interaction Fixes tag and stable branches.
On 14 April 2017 at 22:43, Bas Nieuwenhuizen wrote:
> For the next time I forget.
>
> CC: Emil Velikov
> Signed-off-by: Bas Nieuwenhuizen
> ---
>  docs/submittingpatches.html | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/docs/submittingpatches.html b/docs/submittingpatches.html
> index 5310b1d8c17..4b025647039 100644
> --- a/docs/submittingpatches.html
> +++ b/docs/submittingpatches.html
> @@ -266,6 +266,11 @@ Note: by removing the tag [as the commit is pushed] the patch is
>  Thus, drop the line only if you want to cancel the nomination.
>
>
> +Alternatively, if one uses the "Fixes" tag as desribed in the "Patch formatting"

s/desribed/described/

> +section, it nominates a commit for all active stable branches that include the
> +commit that is referred to. If the "CC" tag is also present the "Fixes" tag will
> +be used to determine which active stable branches the commit applies to.
> +

Please drop the second sentence, since it does not bring much (it even
confuses the hell out of me).

Reviewed-by: Emil Velikov

Thanks
Emil
Re: [Mesa-dev] [PATCH v3] etnaviv: native fence fd support
2017-04-12 12:31 GMT+02:00 Philipp Zabel:
> This adds native fence fd support to etnaviv, similarly to commit
> 0b98e84e9ba0 ("freedreno: native fence fd"), enabled for kernel
> driver version 1.1 or later.
>
> Signed-off-by: Philipp Zabel
> Reviewed-By: Wladimir J. van der Laan

Reviewed-by: Christian Gmeiner

--
Christian Gmeiner, MSc

https://www.youtube.com/user/AloryOFFICIAL
https://soundcloud.com/christian-gmeiner
Re: [Mesa-dev] [PATCH] r600g: update dirty_level_mask after the 1-st draw after FB change
Tested-by: Dieter Nützel

On Turks XT (6670)

Dieter

On 13.04.2017 22:56, Constantine Kharlamov wrote:
> Ported from radeonsi. Testing with Kane shows ≈1k skipped updates per frame on
> average. No piglit changes with tests/gpu.py, gbm mode.
>
> Signed-off-by: Constantine Kharlamov
> ---
>  src/gallium/drivers/r600/evergreen_state.c   | 1 +
>  src/gallium/drivers/r600/r600_pipe.h         | 1 +
>  src/gallium/drivers/r600/r600_state.c        | 1 +
>  src/gallium/drivers/r600/r600_state_common.c | 41
>  4 files changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index 5697da4af9..19ad504097 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -1550,6 +1550,7 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx,
>  	r600_mark_atom_dirty(rctx, &rctx->framebuffer.atom);
>
>  	r600_set_sample_locations_constant_buffer(rctx);
> +	rctx->framebuffer.do_update_surf_dirtiness = true;
>  }
>
>  static void evergreen_set_min_samples(struct pipe_context *ctx, unsigned min_samples)
> diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
> index 7f1ecc278b..e1715e8628 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -189,6 +189,7 @@ struct r600_framebuffer {
>  	bool cb0_is_integer;
>  	bool is_msaa_resolve;
>  	bool dual_src_blend;
> +	bool do_update_surf_dirtiness;
>  };
>
>  struct r600_sample_mask {
> diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
> index 06100abc4a..fc93eb02ad 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -1209,6 +1209,7 @@ static void r600_set_framebuffer_state(struct pipe_context *ctx,
>  	r600_mark_atom_dirty(rctx, &rctx->framebuffer.atom);
>
>  	r600_set_sample_locations_constant_buffer(rctx);
> +	rctx->framebuffer.do_update_surf_dirtiness = true;
>  }
>
>  static uint32_t sample_locs_2x[] = {
> diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
> index 5be49dcdfe..7b52be36cd 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -99,6 +99,7 @@ static void r600_texture_barrier(struct pipe_context *ctx, unsigned flags)
>  		       R600_CONTEXT_FLUSH_AND_INV_CB |
>  		       R600_CONTEXT_FLUSH_AND_INV |
>  		       R600_CONTEXT_WAIT_3D_IDLE;
> +	rctx->framebuffer.do_update_surf_dirtiness = true;
>  }
>
>  static unsigned r600_conv_pipe_prim(unsigned prim)
> @@ -1732,6 +1733,7 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
>  	if (unlikely(dirty_tex_counter != rctx->b.last_dirty_tex_counter)) {
>  		rctx->b.last_dirty_tex_counter = dirty_tex_counter;
>  		r600_mark_atom_dirty(rctx, &rctx->framebuffer.atom);
> +		rctx->framebuffer.do_update_surf_dirtiness = true;
>  	}
>
>  	if (!r600_update_derived_state(rctx)) {
> @@ -2034,29 +2036,32 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
>  		radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_SQ_NON_EVENT));
>  	}
>
> -	/* Set the depth buffer as dirty. */
> -	if (rctx->framebuffer.state.zsbuf) {
> -		struct pipe_surface *surf = rctx->framebuffer.state.zsbuf;
> -		struct r600_texture *rtex = (struct r600_texture *)surf->texture;
> +	if (rctx->framebuffer.do_update_surf_dirtiness) {
> +		/* Set the depth buffer as dirty. */
> +		if (rctx->framebuffer.state.zsbuf) {
> +			struct pipe_surface *surf = rctx->framebuffer.state.zsbuf;
> +			struct r600_texture *rtex = (struct r600_texture *)surf->texture;
>
> -		rtex->dirty_level_mask |= 1 << surf->u.tex.level;
> +			rtex->dirty_level_mask |= 1 << surf->u.tex.level;
>
> -		if (rtex->surface.flags & RADEON_SURF_SBUFFER)
> -			rtex->stencil_dirty_level_mask |= 1 << surf->u.tex.level;
> -	}
> -	if (rctx->framebuffer.compressed_cb_mask) {
> -		struct pipe_surface *surf;
> -		struct r600_texture *rtex;
> -		unsigned mask = rctx->framebuffer.compressed_cb_mask;
> +			if (rtex->surface.flags & RADEON_SURF_SBUFFER)
> +				rtex->stencil_dirty_level_mask |= 1 << surf->u.tex.level;
> +		}
> +		if (rctx->framebuffer.compressed_cb_mask) {
> +			struct pipe_surface *surf;
> +			struct r600_texture *rtex;
> +			unsigned mask = rctx->framebuffer.compressed_cb_mask;
>
> -		do {
> -			unsigned i = u_bit_scan(&mask);
> -			surf =
Re: [Mesa-dev] [PATCH] swr: Fix swr osmesa build
Commit summary is a bit meh, but regardless.

Reviewed-by: Emil Velikov

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 0/3] asynchronous pbo transfer with glthread
On 14.04.2017 07:53, gregory hainaut wrote:

On Fri, 14 Apr 2017 05:20:38 +0200 Dieter Nützel wrote:

On 14.04.2017 02:06, Dieter Nützel wrote:
> Hello Gregory,
>
> have you tested this with Mesa-demos/tests/pbo 'b' (benchmark)?
> It results in crazy numbers and does not 'return' (one core stays @ 100%).

This is related to 'mesa_glthread=true'.
If I disable (unset) it, all is fine after the 'b' benchmark and 'pbo' exits with ESC as expected.
Crazy numbers stay, maybe counter overrun due to BIG numbers? ;-)

Hope that helps.

Dieter

Hello Dieter,

I tested the demo. There is a pseudo unrelated bug on the exit of the application.

Mesa 17.1.0-devel implementation error: In _mesa_DeleteHashTable, found non-freed data

I will add a call to _mesa_HashDeleteAll to fix it, i.e.
_mesa_HashDeleteAll(shared->ShadowBufferObjects, dummy_cb, ctx);

Now let's go back to the test behavior. The benchmark will send 4s of asynchronous PBO transfer commands and then sync gl_thread, which means the application thread will be blocked until all PBO transfers are done. Gl_thread is faster at dispatching commands, so you will need to wait longer before the thread goes back to real life.

On my side, I need to wait around 45 seconds for 6 million commands.

Result: 6,440,627 reads (gl thread on + PBO patches)
Result: 274,960 reads (gl thread off)

In your case, "Result: 77,444,412 reads", I hope you're patient. I think you must wait at least 10 minutes.

Now, I was patient...

Tried 2 times, but after ~20 minutes I killed it the first time and attached gdb to it during the second run.

0x7fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7fbda686e9a6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x7fbda5359453 in ?? () from /usr/local/lib/dri/r600_dri.so
#2  0x7fbda53661f4 in ?? () from /usr/local/lib/dri/r600_dri.so
#3  0x00401e18 in ?? ()
#4  0x004028c7 in ?? ()
#5  0x7fbda9925781 in fghRedrawWindow () from /usr/lib64/libglut.so.3
#6  0x7fbda9925c08 in ?? () from /usr/lib64/libglut.so.3
#7  0x7fbda9926cf9 in fgEnumWindows () from /usr/lib64/libglut.so.3
#8  0x7fbda9925ce4 in glutMainLoopEvent () from /usr/lib64/libglut.so.3
#9  0x7fbda9925d85 in glutMainLoop () from /usr/lib64/libglut.so.3
#10 0x004019fc in ?? ()
#11 0x7fbda957e541 in __libc_start_main () from /lib64/libc.so.6
#12 0x00401afa in ?? ()

Should I do more, or is it not worth it?

Dieter

> mesa-demos/tests> ./pbo
> ATTENTION: default value of option mesa_glthread overridden by
> environment.
> GL_VERSION = 4.1 Mesa 17.1.0-devel (git-7c8fe31e1c)
> GL_RENDERER = Gallium 0.4 on AMD TURKS (DRM 2.49.0 /
> 4.11.0-rc6-1.g5a51416-default, LLVM 5.0.0)
> Loaded 194 by 188 image
> Converting RGB image to RGBA
> Benchmarking...
> Result: 7712 reads in 4.00 seconds = -383971576.00
> pixels/sec
>
> top - 02:04:42 up 10:05, 4 users, load average: 1,03, 0,77, 0,71
> Tasks: 265 total, 1 running, 264 sleeping, 0 stopped, 0 zombie
> %Cpu0 :  1,3 us, 0,3 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu1 :  1,3 us, 0,3 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu2 :  1,7 us, 0,0 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu3 :  2,3 us, 0,3 sy, 0,0 ni, 97,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu4 :  1,7 us, 0,3 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu5 : 98,3 us, 1,7 sy, 0,0 ni,  0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu6 :  2,0 us, 0,3 sy, 0,0 ni, 97,7 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> %Cpu7 :  1,7 us, 0,0 sy, 0,0 ni, 98,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> KiB Mem : 24680300 total, 8155356 free, 5751864 used, 10773080 buff/cache
> KiB Swap: 0 total, 0 free, 0 used. 18437888 avail Mem
>
>   PID USER   PR NI    VIRT    RES    SHR S  %CPU  %MEM   TIME+ COMMAND
> 19380 dieter 20  0 3259764 2,911g  22472 S 100,3 12,37 2:28.48 pbo
> 27937 dieter 20  0 4029572 570236 166116 S 5,980 2,310 9:45.53 konqueror
> 13432 dieter 20  0 1922820 269892 129152 S 5,648 1,094 4:33.80 Web Content
>
> Other than that:
>
> For the series:
>
> Tested-by: Dieter Nützel
> r600g, Turks XT (6670)
>
> Dieter
>
> On 13.04.2017 19:32, Gregory Hainaut wrote:
>> Hello,
>>
>> Please find a new version to handle invalid buffer handles.
>>
>> Allow to handle this kind of case:
>>    genBuffer();
>>    BindBuffer(pbo)
>>    DeleteBuffer(pbo);
>>    BindBuffer(rand_pbo)
>>    TexSubImage2D(user_memory_pointer); // Data transfer will be synchronous
>>
>> There are various subtleties in handling a multi-threaded shared context. In
>> order to keep the code sane, I've considered a buffer invalid when it is
>> deleted by a context even if it is still bound to other contexts. It will
>> force a synchronous transfer which
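The negative pixels/sec figure in the benchmark output above is consistent with a 32-bit counter wrapping around. The following is a minimal sketch, assuming the demo multiplies the read count by the image size in a 32-bit signed integer; that assumption has not been checked against the pbo demo source, and the helper names are hypothetical. With the 77,444,412 reads quoted earlier in the thread and the 194 by 188 image, the wrapped product divided by 4.00 seconds reproduces -383971576.

```c
#include <stdint.h>

/* Hypothetical helper: total pixels read, computed in 64 bits so that the
 * product cannot overflow for the values discussed in the thread. */
static int64_t pixels_exact(int64_t reads, int64_t width, int64_t height)
{
    return reads * width * height;
}

/* Hypothetical helper: the same product reduced to what a wrapped 32-bit
 * signed integer would hold. The reduction uses explicit modular arithmetic
 * so the sketch itself never relies on undefined signed overflow.
 * Assumes non-negative inputs. */
static int64_t pixels_wrapped32(int64_t reads, int64_t width, int64_t height)
{
    const int64_t two32 = INT64_C(1) << 32;
    int64_t low = pixels_exact(reads, width, height) % two32;

    return low >= two32 / 2 ? low - two32 : low;
}
```

If this is indeed the cause, accumulating the pixel count in a 64-bit or floating-point type would avoid the wraparound; it would not change the long wait for the glthread sync described in the thread.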
[Mesa-dev] [PATCH] nir: Add GLSL_TYPE_[U]INT64 to some switch statements
Cc: mesa-sta...@lists.freedesktop.org
---
 src/compiler/nir/nir.c                  | 2 ++
 src/compiler/nir/nir_split_var_copies.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 43fa60f..0abf9b6 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -699,7 +699,9 @@ deref_foreach_leaf_build_recur(nir_deref_var *deref, nir_deref *tail,
    assert(tail->child == NULL);
    switch (glsl_get_base_type(tail->type)) {
    case GLSL_TYPE_UINT:
+   case GLSL_TYPE_UINT64:
    case GLSL_TYPE_INT:
+   case GLSL_TYPE_INT64:
    case GLSL_TYPE_FLOAT:
    case GLSL_TYPE_DOUBLE:
    case GLSL_TYPE_BOOL:
diff --git a/src/compiler/nir/nir_split_var_copies.c b/src/compiler/nir/nir_split_var_copies.c
index 58c7873..15a185e 100644
--- a/src/compiler/nir/nir_split_var_copies.c
+++ b/src/compiler/nir/nir_split_var_copies.c
@@ -147,7 +147,9 @@ split_var_copy_instr(nir_intrinsic_instr *old_copy,
       break;
 
    case GLSL_TYPE_UINT:
+   case GLSL_TYPE_UINT64:
    case GLSL_TYPE_INT:
+   case GLSL_TYPE_INT64:
    case GLSL_TYPE_FLOAT:
    case GLSL_TYPE_DOUBLE:
    case GLSL_TYPE_BOOL:
--
2.5.0.400.gff86faf
Re: [Mesa-dev] [PATCH 2/2] etnaviv: resolve tile status when flushing resource
2017-04-12 16:13 GMT+02:00 Lucas Stach:
> From: Philipp Zabel
>
> When passing render buffers from EGL clients to a wayland compositor,
> the resource tile status must be resolved because otherwise the tile
> status is lost in the transfer and cleared parts of the buffer will
> contain old contents.
>
> The same applies when sampling directly from a renderable resource.
>
> lst: Add seqno tracking, to skip flush when not needed.
>
> Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable")
> Signed-off-by: Philipp Zabel
> Signed-off-by: Lucas Stach

Reviewed-by: Christian Gmeiner

--
Christian Gmeiner, MSc

https://www.youtube.com/user/AloryOFFICIAL
https://soundcloud.com/christian-gmeiner
Re: [Mesa-dev] [PATCH 1/2] etnaviv: stop repeatedly resolving an unchanged resource into its scanout prime buffer
2017-04-12 16:13 GMT+02:00 Lucas Stach:
> From: Philipp Zabel
>
> Before resolving a resource into its scanout prime buffer, check that
> the prime resource is actually older. If it is not, the resolve is an
> expensive no-op, and we better skip it.
>
> Signed-off-by: Philipp Zabel

Reviewed-by: Christian Gmeiner

--
Christian Gmeiner, MSc

https://www.youtube.com/user/AloryOFFICIAL
https://soundcloud.com/christian-gmeiner
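The "skip the resolve unless the prime buffer is older" idea from the two etnaviv patches above can be sketched with a per-resource sequence number. This is only an illustration under stated assumptions: `res_sketch`, `res_older` and `scanout_needs_resolve` are hypothetical names, not the driver's actual structures or functions, and the patches themselves are not quoted in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource that records when it was last
 * written, as a monotonically increasing sequence number. */
struct res_sketch {
    uint32_t seqno;
};

/* True when a was last written before b. The subtraction is unsigned, so
 * the result stays meaningful across counter wraparound as long as the two
 * values are less than 2^31 apart. The cast to int32_t is
 * implementation-defined for large values; mainstream compilers wrap. */
static bool res_older(const struct res_sketch *a, const struct res_sketch *b)
{
    return (int32_t)(a->seqno - b->seqno) < 0;
}

/* Resolve only when the scanout (prime) copy is stale with respect to the
 * render resource; otherwise the resolve would be an expensive no-op. */
static bool scanout_needs_resolve(const struct res_sketch *prime,
                                  const struct res_sketch *render)
{
    return res_older(prime, render);
}
```

A caller would bump the render resource's sequence number on every write and copy it to the prime resource after a successful resolve, so a second flush without intervening rendering finds equal numbers and does nothing.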
Re: [Mesa-dev] [PATCH v2 2/3] etnaviv: Update includes from rnndb
> +#define INST_OPCODE_IMADLOSAT0 0x004e
> +#define INST_OPCODE_IMADLOSAT0 0x004f

INST_OPCODE_IMADLOSAT0 got redefined...

greets
--
Christian Gmeiner, MSc

https://www.youtube.com/user/AloryOFFICIAL
https://soundcloud.com/christian-gmeiner
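For reference, the redefinition pointed out above matters because C does not allow an object-like macro to be redefined with a different replacement list; compilers are required to diagnose it. A minimal sketch of the corrected pair follows, using the IMADLOSAT1 name given in the follow-up to this review; the `_Static_assert` is an addition for illustration and is not part of the generated header.

```c
/* Corrected pair: the second opcode gets its own name, as stated in the
 * follow-up to this review. Values are the ones from the quoted patch. */
#define INST_OPCODE_IMADLOSAT0 0x004e
#define INST_OPCODE_IMADLOSAT1 0x004f

/* Illustrative compile-time check (not in the original header): two
 * distinct opcode names must not share a value. */
_Static_assert(INST_OPCODE_IMADLOSAT0 != INST_OPCODE_IMADLOSAT1,
               "opcode values must be distinct");
```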
[Mesa-dev] [PATCH] swr: Fix swr osmesa build
---
 src/gallium/targets/osmesa/SConscript | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/targets/osmesa/SConscript b/src/gallium/targets/osmesa/SConscript
index 47937a2..7be1b48 100644
--- a/src/gallium/targets/osmesa/SConscript
+++ b/src/gallium/targets/osmesa/SConscript
@@ -31,7 +31,7 @@ if env['llvm']:
     env.Prepend(LIBS = [llvmpipe])
 
 if env['swr']:
-    env.Append(CPPDEFINES = 'HAVE_SWR')
+    env.Append(CPPDEFINES = 'GALLIUM_SWR')
     env.Prepend(LIBS = [swr])
 
 if env['platform'] == 'windows':
--
2.7.4
[Mesa-dev] [PATCH] docs: Document interaction Fixes tag and stable branches.
For the next time I forget.

CC: Emil Velikov
Signed-off-by: Bas Nieuwenhuizen
---
 docs/submittingpatches.html | 5 +
 1 file changed, 5 insertions(+)

diff --git a/docs/submittingpatches.html b/docs/submittingpatches.html
index 5310b1d8c17..4b025647039 100644
--- a/docs/submittingpatches.html
+++ b/docs/submittingpatches.html
@@ -266,6 +266,11 @@ Note: by removing the tag [as the commit is pushed] the patch is
 Thus, drop the line only if you want to cancel the nomination.
 
+Alternatively, if one uses the "Fixes" tag as desribed in the "Patch formatting"
+section, it nominates a commit for all active stable branches that include the
+commit that is referred to. If the "CC" tag is also present the "Fixes" tag will
+be used to determine which active stable branches the commit applies to.
+
 Criteria for accepting patches to the stable branch
 
 Mesa has a designated release manager for each stable branch, and the release
--
2.12.2
Re: [Mesa-dev] [PATCH v3 4/4] vc4: Only build the NEON code on arm32.
On 14 April 2017 at 19:21, Eric Anholt wrote:
> Emil Velikov writes:
>
>> On 14 April 2017 at 18:47, Eric Anholt wrote:
>>> NEON is sufficiently different on arm64 that we can't just reuse this
>>> code. Disable it on arm64 for now.
>>>
>>> Signed-off-by: Eric Anholt
>>> ---
>>>  src/gallium/drivers/vc4/vc4_tiling_lt.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt.c b/src/gallium/drivers/vc4/vc4_tiling_lt.c
>>> index c9cbc65e2dbc..7de67b652daa 100644
>>> --- a/src/gallium/drivers/vc4/vc4_tiling_lt.c
>>> +++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c
>>> @@ -61,7 +61,7 @@ static void
>>>  vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
>>>  {
>>>          uint32_t gpu_stride = vc4_utile_stride(cpp);
>>> -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH)
>>> +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7
>>>          if (gpu_stride == 8) {
>>>                  __asm__ volatile (
>>>                          /* Load from the GPU in one shot, no interleave, to
>>> @@ -118,7 +118,7 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
>>>  {
>>>          uint32_t gpu_stride = vc4_utile_stride(cpp);
>>>
>>> -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH)
>>> +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7
>>
>> This patch should be before 4/4, or it will cause intermittent breakage.
>
> I don't think there is any new breakage. We've been setting
> VC4_BUILD_NEON already.

From a quick skim it seemed that only the Android build is busted,
rather than everywhere.
I think you're right. Thanks for the correction.

Emil
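The guard discussed in this thread can be tried in isolation. The sketch below wraps the same preprocessor condition in a small function; the function name is hypothetical and the body only reports which path would be compiled, it does not contain the NEON assembly from the patch.

```c
/* Hypothetical probe: returns 1 when the 32-bit NEON assembly path from the
 * patch would be compiled in, 0 when the generic C fallback would be used.
 * __ARM_ARCH is predefined by ARM compilers (8 or higher on arm64), and
 * VC4_BUILD_NEON is expected to come from the build system. */
static int vc4_neon_asm_enabled(void)
{
#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7
    return 1;
#else
    return 0;
#endif
}
```

Built without `-DVC4_BUILD_NEON`, or for arm64 or a non-ARM target, the function returns 0, which is the behaviour the patch is after: the 32-bit inline assembly is never handed to an assembler that cannot accept it.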
Re: [Mesa-dev] [PATCH v2 3/3] etnaviv: SINGLE_BUFFER support on GC3000
2017-04-14 9:44 GMT+02:00 Wladimir J. van der Laan:
> This patch adds support for the SINGLE_BUFFER feature on GC3000
> GPUs, which allows rendering to a single buffer using multiple pixel
> pipes.
>
> This feature is always used when it is available, which means that
> multi-tiled formats are no longer being used in that case, and all
> buffers will be normal (super)tiled. This mimics the behavior of the
> blob on GC3000.
>
> - Because the same format can be used to render to and texture from,
>   this avoids an extra resolve pass when rendering to texture.
>
> - i.MX6qp includes a PRE which can scan-out directly from tiled formats,
>   avoiding untiling overhead.
>
> Signed-off-by: Wladimir J. van der Laan

Series is:
Reviewed-by: Christian Gmeiner

greets
--
Christian Gmeiner, MSc

https://www.youtube.com/user/AloryOFFICIAL
https://soundcloud.com/christian-gmeiner
Re: [Mesa-dev] [PATCH] radv: enable timestampComputeAndGraphics
Reviewed-by: Bas Nieuwenhuizen

On Fri, Apr 14, 2017 at 11:24 PM, Grazvydas Ignotas wrote:
> Commit bfee9866 "radv: Use RELEASE_MEM packet for MEC timestamp query."
> added WriteTimestamp handling for compute queues but forgot to flip
> the flag.
>
> Tested with DOOM (by me) and CTS (by Bas), but without verification
> that these tests actually use timestamps on compute queues.
>
> Signed-off-by: Grazvydas Ignotas
> ---
>  src/amd/vulkan/radv_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 12040a0..dd401f4 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -650,11 +650,11 @@ void radv_GetPhysicalDeviceProperties(
>                 .sampledImageIntegerSampleCounts = VK_SAMPLE_COUNT_1_BIT,
>                 .sampledImageDepthSampleCounts = sample_counts,
>                 .sampledImageStencilSampleCounts = sample_counts,
>                 .storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
>                 .maxSampleMaskWords = 1,
> -               .timestampComputeAndGraphics = false,
> +               .timestampComputeAndGraphics = true,
>                 .timestampPeriod = 100.0 / pdevice->rad_info.clock_crystal_freq,
>                 .maxClipDistances = 8,
>                 .maxCullDistances = 8,
>                 .maxCombinedClipAndCullDistances = 8,
>                 .discreteQueuePriorities = 1,
> --
> 2.7.4
[Mesa-dev] [PATCH] radv: enable timestampComputeAndGraphics
Commit bfee9866 "radv: Use RELEASE_MEM packet for MEC timestamp query."
added WriteTimestamp handling for compute queues but forgot to flip
the flag.

Tested with DOOM (by me) and CTS (by Bas), but without verification
that these tests actually use timestamps on compute queues.

Signed-off-by: Grazvydas Ignotas
---
 src/amd/vulkan/radv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 12040a0..dd401f4 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -650,11 +650,11 @@ void radv_GetPhysicalDeviceProperties(
                .sampledImageIntegerSampleCounts = VK_SAMPLE_COUNT_1_BIT,
                .sampledImageDepthSampleCounts = sample_counts,
                .sampledImageStencilSampleCounts = sample_counts,
                .storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
                .maxSampleMaskWords = 1,
-               .timestampComputeAndGraphics = false,
+               .timestampComputeAndGraphics = true,
                .timestampPeriod = 100.0 / pdevice->rad_info.clock_crystal_freq,
                .maxClipDistances = 8,
                .maxCullDistances = 8,
                .maxCombinedClipAndCullDistances = 8,
                .discreteQueuePriorities = 1,
--
2.7.4
Re: [Mesa-dev] [PATCH 6/7] radeonsi: remove local variable 'mod' from si_compile_tgsi_shader
On 14.04.2017 17:08, Marek Olšák wrote:

From: Marek Olšák

---
 src/gallium/drivers/radeonsi/si_shader.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 6242ec1..704c67e 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -7481,21 +7481,20 @@ static void si_build_wrapper_function(struct si_shader_context *ctx,
 }
 
 int si_compile_tgsi_shader(struct si_screen *sscreen,
                            LLVMTargetMachineRef tm,
                            struct si_shader *shader,
                            bool is_monolithic,
                            struct pipe_debug_callback *debug)
 {
         struct si_shader_selector *sel = shader->selector;
         struct si_shader_context ctx;
-        LLVMModuleRef mod;
         int r = -1;
 
         /* Dump TGSI code before doing TGSI->LLVM conversion in case the
          * conversion fails. */
         if (r600_can_dump_shader(&sscreen->b, sel->info.processor) &&
             !(sscreen->b.debug_flags & DBG_NO_TGSI)) {
                 tgsi_dump(sel->tokens, 0);
                 si_dump_streamout(&sel->so);
         }
 
@@ -7592,40 +7591,38 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
                         parts[0] = ctx.main_fn;
                 }
 
                 si_get_ps_epilog_key(shader, &epilog_key);
                 si_build_ps_epilog_function(&ctx, &epilog_key);
                 parts[need_prolog ? 2 : 1] = ctx.main_fn;
 
                 si_build_wrapper_function(&ctx, parts, need_prolog ? 3 : 2, need_prolog ? 1 : 0);
         }
 
-        mod = ctx.gallivm.module;
-
         /* Dump LLVM IR before any optimization passes */
         if (sscreen->b.debug_flags & DBG_PREOPT_IR &&
             r600_can_dump_shader(&sscreen->b, ctx.type))
-                ac_dump_module(mod);
+                LLVMDumpModule(ctx.gallivm.module);

Are you sure this works? Wasn't there some issue with different LLVM versions not having the function?

Or wait... I think the function was briefly removed in trunk and then added again, so it's probably fine.

The series is

Reviewed-by: Nicolai Hähnle

 
         si_llvm_finalize_module(&ctx,
                                 r600_extra_shader_checks(&sscreen->b, ctx.type));
 
         /* Post-optimization transformations and analysis. */
         si_eliminate_const_vs_outputs(&ctx);
 
         if ((debug && debug->debug_message) ||
             r600_can_dump_shader(&sscreen->b, ctx.type))
                 si_count_scratch_private_memory(&ctx);
 
         /* Compile to bytecode. */
         r = si_compile_llvm(sscreen, &shader->binary, &shader->config, tm,
-                            mod, debug, ctx.type, "TGSI shader");
+                            ctx.gallivm.module, debug, ctx.type, "TGSI shader");
         si_llvm_dispose(&ctx);
         if (r) {
                 fprintf(stderr, "LLVM failed to compile shader\n");
                 return r;
         }
 
         /* Validate SGPR and VGPR usage for compute to detect compiler bugs.
          * LLVM 3.9svn has this bug.
          */
         if (sel->type == PIPE_SHADER_COMPUTE) {

--
Learn how the world really is,
but never forget how it should be.
Re: [Mesa-dev] [PATCH 3/3] anv/blorp: Properly handle VK_ATTACHMENT_UNUSED
On Wed, Apr 12, 2017 at 2:54 PM, Nanley Chery wrote:

> On Tue, Apr 11, 2017 at 07:54:23AM -0700, Jason Ekstrand wrote:
> > The Vulkan driver was originally written under the assumption that
> > VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
> > However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
> > anywhere in the subpass description. The blorp-based clear and resolve
> > code has a bunch of places where we walk lists of attachments and we
> > weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should
> > fix all of them.
> >
> > Cc: "13.0 17.0"
>
> I think specifying the specific stable branch in quotes has been
> deprecated according to
> https://www.mesa3d.org/submittingpatches.html#nominations
>
> > ---
> >  src/intel/vulkan/anv_blorp.c | 30 +-
> >  1 file changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> > index 72a468a..d27132a 100644
> > --- a/src/intel/vulkan/anv_blorp.c
> > +++ b/src/intel/vulkan/anv_blorp.c
> > @@ -1148,6 +1148,9 @@ anv_cmd_buffer_flush_attachments(struct anv_cmd_buffer *cmd_buffer,
> >
> >     for (uint32_t i = 0; i < subpass->color_count; ++i) {
> >        uint32_t att = subpass->color_attachments[i].attachment;
> > +      if (att == VK_ATTACHMENT_UNUSED)
> > +         continue;
> > +
> >        assert(att < pass->attachment_count);
> >        if (attachment_needs_flush(cmd_buffer, &pass->attachments[att], stage)) {
> >           cmd_buffer->state.pending_pipe_bits |=
> > @@ -1175,14 +1178,19 @@ subpass_needs_clear(const struct anv_cmd_buffer *cmd_buffer)
> >
> >     for (uint32_t i = 0; i < cmd_state->subpass->color_count; ++i) {
> >        uint32_t a = cmd_state->subpass->color_attachments[i].attachment;
> > +      if (a == VK_ATTACHMENT_UNUSED)
> > +         continue;
> > +
> > +      assert(a < cmd_state->pass->attachment_count);
> >        if (cmd_state->attachments[a].pending_clear_aspects) {
> >           return true;
> >        }
> >     }
> >
> > -   if (ds != VK_ATTACHMENT_UNUSED &&
> > -       cmd_state->attachments[ds].pending_clear_aspects) {
> > -      return true;
> > +   if (ds != VK_ATTACHMENT_UNUSED) {
> > +      assert(ds < cmd_state->pass->attachment_count);
> > +      if (cmd_state->attachments[ds].pending_clear_aspects)
> > +         return true;
>
> I'll refer to this hunk below.
>
> >     }
> >
> >     return false;
> > @@ -1214,6 +1222,10 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer *cmd_buffer)
> >     struct anv_framebuffer *fb = cmd_buffer->state.framebuffer;
> >     for (uint32_t i = 0; i < cmd_state->subpass->color_count; ++i) {
> >        const uint32_t a = cmd_state->subpass->color_attachments[i].attachment;
> > +      if (a == VK_ATTACHMENT_UNUSED)
> > +         continue;
> > +
> > +      assert(a < cmd_state->pass->attachment_count);
> >        struct anv_attachment_state *att_state = &cmd_state->attachments[a];
> >
> >        if (!att_state->pending_clear_aspects)
> > @@ -1273,6 +1285,7 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer *cmd_buffer)
> >     }
> >
> >     const uint32_t ds = cmd_state->subpass->depth_stencil_attachment.attachment;
> > +   assert(ds == VK_ATTACHMENT_UNUSED || ds < cmd_state->pass->attachment_count);
> >
>
> I wonder why this assertion differs from the one two hunks up.
>

Not a good reason, but I prefer the simpler assert above. Trying to do that here, however, would have meant an extra, unneeded level of control-flow nesting. The one above is just an "if (...) return true;" so adding nesting wasn't a big deal.

> Nevertheless, this series is
> Reviewed-by: Nanley Chery
>
> >     if (ds != VK_ATTACHMENT_UNUSED &&
> >         cmd_state->attachments[ds].pending_clear_aspects) {
> > @@ -1578,8 +1591,12 @@ anv_cmd_buffer_resolve_subpass(struct anv_cmd_buffer *cmd_buffer)
> >     blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
> >
> >     for (uint32_t i = 0; i < subpass->color_count; ++i) {
> > -      ccs_resolve_attachment(cmd_buffer, &batch,
> > -                             subpass->color_attachments[i].attachment);
> > +      const uint32_t att = subpass->color_attachments[i].attachment;
> > +      if (att == VK_ATTACHMENT_UNUSED)
> > +         continue;
> > +
> > +      assert(att < cmd_buffer->state.pass->attachment_count);
> > +      ccs_resolve_attachment(cmd_buffer, &batch, att);
> >     }
> >
> >     anv_cmd_buffer_flush_attachments(cmd_buffer, SUBPASS_STAGE_DRAW);
> > @@ -1592,6 +1609,9 @@ anv_cmd_buffer_resolve_subpass(struct anv_cmd_buffer *cmd_buffer)
> >        if (dst_att == VK_ATTACHMENT_UNUSED)
> >           continue;
> >
> > +      assert(src_att < cmd_buffer->state.pass->attachment_count);
> > +      assert(dst_att < cmd_buffer->state.pass->attachment_count);
> > +
> >        if
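The pattern the patch applies throughout (skip unused slots, then bounds-check, then index) can be shown without any driver types. This is a minimal sketch under stated assumptions: `ATTACHMENT_UNUSED` is a local stand-in with the same value as Vulkan's VK_ATTACHMENT_UNUSED (`~0U`), and the function name and flat-array layout are hypothetical simplifications of `subpass_needs_clear`, not the anv code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in; VK_ATTACHMENT_UNUSED is defined as (~0U) in vulkan_core.h. */
#define ATTACHMENT_UNUSED (~0U)

/* Hypothetical, simplified version of the subpass_needs_clear() walk:
 * color_atts holds attachment indices for one subpass, and
 * pending_clear_aspects holds per-attachment state for the render pass. */
static bool any_pending_clear(const uint32_t *color_atts, uint32_t color_count,
                              const uint32_t *pending_clear_aspects,
                              uint32_t attachment_count)
{
    for (uint32_t i = 0; i < color_count; ++i) {
        uint32_t a = color_atts[i];

        /* An unused slot must be skipped before it is used as an index. */
        if (a == ATTACHMENT_UNUSED)
            continue;

        assert(a < attachment_count);
        if (pending_clear_aspects[a])
            return true;
    }
    return false;
}
```

Without the `continue`, an unused slot would index the state array with 0xffffffff, which is the out-of-bounds access the added checks and asserts in the patch guard against.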
Re: [Mesa-dev] [PATCH v2] swr: update gallium driver docs
Reviewed-by: Bruce Cherniak

> On Apr 14, 2017, at 2:03 PM, Tim Rowley wrote:
>
> v2: add back scons section, mention additional built swr libraries
> ---
>  src/gallium/docs/source/drivers/openswr.rst       |  2 +-
>  src/gallium/docs/source/drivers/openswr/usage.rst | 16 +++-
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/docs/source/drivers/openswr.rst b/src/gallium/docs/source/drivers/openswr.rst
> index 84aa51f..e254d7b 100644
> --- a/src/gallium/docs/source/drivers/openswr.rst
> +++ b/src/gallium/docs/source/drivers/openswr.rst
> @@ -7,7 +7,7 @@ geometry heavy workloads there is a considerable speedup over llvmpipe,
>  which is to be expected as the geometry frontend of llvmpipe is single
>  threaded.
>
> -This rasterizer is x86 specific and requires AVX or AVX2. The driver
> +This rasterizer is x86 specific and requires AVX or above. The driver
>  fits into the gallium framework, and reuses gallivm for doing the TGSI
>  to vectorized llvm-IR conversion of the shader kernels.
>
> diff --git a/src/gallium/docs/source/drivers/openswr/usage.rst b/src/gallium/docs/source/drivers/openswr/usage.rst
> index e55b421..61c30c2 100644
> --- a/src/gallium/docs/source/drivers/openswr/usage.rst
> +++ b/src/gallium/docs/source/drivers/openswr/usage.rst
> @@ -4,8 +4,9 @@ Usage
>  Requirements
>
>
> -* An x86 processor with AVX or AVX2
> -* LLVM version 3.6 or later
> +* An x86 processor with AVX or above
> +* LLVM version 3.9 or later
> +* C++14 capable compiler
>
>  Building
>
> @@ -18,13 +19,18 @@ configure time, for example: ::
>  Using
>  ^
>
> -On Linux, building will create a drop-in alternative for libGL.so into::
> +On Linux, building with autotools will create a drop-in alternative
> +for libGL.so into::
>
>    lib/gallium/libGL.so
> +  lib/gallium/libswrAVX.so
> +  lib/gallium/libswrAVX2.so
>
> -or::
> +Alternatively, building with SCons will produce::
>
> -  build/foo/gallium/targets/libgl-xlib/libGL.so
> +  build/linux-x86_64/gallium/targets/libgl-xlib/libGL.so
> +  build/linux-x86_64/gallium/drivers/swr/libswrAVX.so
> +  build/linux-x86_64/gallium/drivers/swr/libswrAVX2.so
>
>  To use it set the LD_LIBRARY_PATH environment variable accordingly.
>
> --
> 2.7.4
Re: [Mesa-dev] [PATCH] swr: Enable MSAA in OpenSWR software renderer
Reviewed-by: George Kyriazis

With the assumption that there are additional changes forthcoming.

On Apr 13, 2017, at 5:40 PM, Bruce Cherniak wrote:

This patch enables multisample antialiasing in the OpenSWR software renderer.

MSAA is a proof-of-concept/work-in-progress with bug fixes and performance on the way. We wanted to get the changes out now to allow several customers to begin experimenting with MSAA in a software renderer. So as not to impact current customers, MSAA is turned off by default - previous functionality and performance remain intact. It is easily enabled via environment variables, as described below.

It has only been tested with the glx-lib winsys. The intention is to enable other state-trackers, both Windows and Linux, and more fully support FBOs.

There are 2 environment variables that affect behavior:

* SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed for MSAA... Beware, results will vary. This is mainly for testing.

* SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of samples (1,2,4,8,16), or 0 to disable MSAA altogether. (The default is currently 0.)
---
 src/gallium/drivers/swr/swr_context.cpp | 90 +-
 src/gallium/drivers/swr/swr_context.h   | 3 +
 src/gallium/drivers/swr/swr_resource.h  | 4 +
 src/gallium/drivers/swr/swr_screen.cpp  | 159 +---
 src/gallium/drivers/swr/swr_screen.h    | 8 ++
 src/gallium/drivers/swr/swr_state.cpp   | 74 +--
 6 files changed, 313 insertions(+), 25 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_context.cpp b/src/gallium/drivers/swr/swr_context.cpp
index 6f46d66..aa5cca8 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -267,20 +267,104 @@ swr_resource_copy(struct pipe_context *pipe,
 }
 
+/* XXX: This resolve is incomplete and suboptimal. It will be removed once the
+ * pipelined resolve blit works. */
+void
+swr_do_msaa_resolve(struct pipe_resource *src_resource,
+                    struct pipe_resource *dst_resource)
+{
+   /* This is a pretty dumb inline resolve. It only supports 8-bit formats
+    * (ex RGBA8/BGRA8) - which are most common display formats anyway.
+    */
+
+   /* quick check for 8-bit and number of components */
+   uint8_t bits_per_component =
+      util_format_get_component_bits(src_resource->format,
+                                     UTIL_FORMAT_COLORSPACE_RGB, 0);
+
+   /* Unsupported resolve format */
+   assert(src_resource->format == dst_resource->format);
+   assert(bits_per_component == 8);
+   if ((src_resource->format != dst_resource->format) ||
+       (bits_per_component != 8)) {
+      return;
+   }
+
+   uint8_t src_num_comps = util_format_get_nr_components(src_resource->format);
+
+   SWR_SURFACE_STATE *src_surface = &swr_resource(src_resource)->swr;
+   SWR_SURFACE_STATE *dst_surface = &swr_resource(dst_resource)->swr;
+
+   uint32_t *src, *dst, offset;
+   uint32_t num_samples = src_surface->numSamples;
+   float recip_num_samples = 1.0f / num_samples;
+   for (uint32_t y = 0; y < src_surface->height; y++) {
+      for (uint32_t x = 0; x < src_surface->width; x++) {
+         float r = 0.0f;
+         float g = 0.0f;
+         float b = 0.0f;
+         float a = 0.0f;
+         for (uint32_t sampleNum = 0; sampleNum < num_samples; sampleNum++) {
+            offset = ComputeSurfaceOffset(x, y, 0, 0, sampleNum, 0, src_surface);
+            src = (uint32_t *) src_surface->pBaseAddress + offset/src_num_comps;
+            const uint32_t sample = *src;
+            r += (float)((sample >> 24) & 0xff) / 255.0f * recip_num_samples;
+            g += (float)((sample >> 16) & 0xff) / 255.0f * recip_num_samples;
+            b += (float)((sample >> 8) & 0xff) / 255.0f * recip_num_samples;
+            a += (float)((sample ) & 0xff) / 255.0f * recip_num_samples;
+         }
+         uint32_t result = 0;
+         result = ((uint8_t)(r * 255.0f) & 0xff) << 24;
+         result |= ((uint8_t)(g * 255.0f) & 0xff) << 16;
+         result |= ((uint8_t)(b * 255.0f) & 0xff) << 8;
+         result |= ((uint8_t)(a * 255.0f) & 0xff);
+         offset = ComputeSurfaceOffset(x, y, 0, 0, 0, 0, src_surface);
+         dst = (uint32_t *) dst_surface->pBaseAddress + offset/src_num_comps;
+         *dst = result;
+      }
+   }
+}
+
+
 static void
 swr_blit(struct pipe_context *pipe, const struct pipe_blit_info *blit_info)
 {
    struct swr_context *ctx = swr_context(pipe);
+   /* Make a copy of the const blit_info, so we can modify it */
    struct pipe_blit_info info = *blit_info;
 
-   if (blit_info->render_condition_enable && !swr_check_render_cond(pipe))
+   if (info.render_condition_enable && !swr_check_render_cond(pipe))
       return;
 
    if (info.src.resource->nr_samples > 1 && info.dst.resource->nr_samples <= 1 &&
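The resolve in the quoted patch is a box filter: each destination pixel is the per-channel average of its samples. The sketch below shows that averaging for one packed 8-bit-per-channel pixel. It is an illustrative variant rather than the patch's code: it uses integer arithmetic with round-to-nearest instead of the patch's float accumulation and truncation, it leaves out surface addressing entirely, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: average num_samples packed 4x8-bit samples into one
 * packed pixel, channel by channel. num_samples must be at least 1.
 * Integer round-to-nearest is used here, which differs from the float
 * accumulate-and-truncate arithmetic in the quoted patch. */
static uint32_t resolve_rgba8(const uint32_t *samples, uint32_t num_samples)
{
    uint32_t result = 0;

    for (unsigned shift = 0; shift < 32; shift += 8) {
        uint32_t sum = 0;

        /* Sum one 8-bit channel over all samples of this pixel. */
        for (uint32_t s = 0; s < num_samples; s++)
            sum += (samples[s] >> shift) & 0xff;

        /* Average with rounding; the result always fits in 8 bits. */
        uint32_t avg = (sum + num_samples / 2) / num_samples;
        result |= avg << shift;
    }
    return result;
}
```

A full resolve would call this once per (x, y) with the samples gathered for that pixel; the channel order within the 32-bit word does not matter because every channel is treated the same way.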
Re: [Mesa-dev] [PATCH 3/3] winsys/amdgpu: init buffer_indices_hashlist with memset()
For the series:

Reviewed-by: Marek Olšák

Marek

On Fri, Apr 14, 2017 at 6:32 PM, Samuel Pitoiset wrote:
> Signed-off-by: Samuel Pitoiset
> ---
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> index f068d8ea7a..8a277d08e1 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> @@ -695,8 +695,6 @@ static void amdgpu_ib_finalize(struct amdgpu_ib *ib)
>  static bool amdgpu_init_cs_context(struct amdgpu_cs_context *cs,
>                                     enum ring_type ring_type)
>  {
> -   int i;
> -
>     switch (ring_type) {
>     case RING_DMA:
>        cs->request.ip_type = AMDGPU_HW_IP_DMA;
> @@ -720,9 +718,7 @@ static bool amdgpu_init_cs_context(struct amdgpu_cs_context *cs,
>        break;
>     }
>
> -   for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
> -      cs->buffer_indices_hashlist[i] = -1;
> -   }
> +   memset(cs->buffer_indices_hashlist, -1, sizeof(cs->buffer_indices_hashlist));
>     cs->last_added_bo = NULL;
>
>     cs->request.number_of_ibs = 1;
> @@ -757,9 +753,7 @@ static void amdgpu_cs_context_cleanup(struct amdgpu_cs_context *cs)
>     cs->num_sparse_buffers = 0;
>     amdgpu_fence_reference(&cs->fence, NULL);
>
> -   for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
> -      cs->buffer_indices_hashlist[i] = -1;
> -   }
> +   memset(cs->buffer_indices_hashlist, -1, sizeof(cs->buffer_indices_hashlist));
>     cs->last_added_bo = NULL;
>  }
>
> --
> 2.12.2
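A short note on why the memset() replacement in this patch is equivalent to the loop it removes: memset() writes the same byte to every position, and on two's complement machines an int whose bytes are all 0xff is -1. The sketch below demonstrates this with a hypothetical stand-in struct; the names and the array size are illustrative and are not taken from amdgpu_cs.c.

```c
#include <string.h>

/* Hypothetical stand-in for the command-stream context; only the hash list
 * matters here, and the size of 16 is arbitrary. */
struct cs_sketch {
    int buffer_indices_hashlist[16];
};

/* Mark every slot as empty (-1). sizeof is applied to the array itself, not
 * to a pointer, so it covers all elements. This only works because -1 is
 * the all-ones byte pattern; memset cannot set ints to, say, 1 or -2. */
static void cs_reset_hashlist(struct cs_sketch *cs)
{
    memset(cs->buffer_indices_hashlist, -1,
           sizeof(cs->buffer_indices_hashlist));
}

/* Returns 1 when every slot reads back as -1, 0 otherwise. */
static int cs_hashlist_is_empty(const struct cs_sketch *cs)
{
    const size_t n = sizeof(cs->buffer_indices_hashlist) /
                     sizeof(cs->buffer_indices_hashlist[0]);

    for (size_t i = 0; i < n; i++) {
        if (cs->buffer_indices_hashlist[i] != -1)
            return 0;
    }
    return 1;
}
```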
Re: [Mesa-dev] [PATCH 5/5] radeonsi: enable ARB_shader_viewport_layer_array
For the series:

Reviewed-by: Marek Olšák

Marek

On Thu, Apr 13, 2017 at 10:30 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle
>
> ---
>  docs/features.txt                      | 2 +-
>  docs/relnotes/17.1.0.html              | 1 +
>  src/gallium/drivers/radeonsi/si_pipe.c | 2 +-
>  3 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/docs/features.txt b/docs/features.txt
> index a2d7785..7ca5fd3 100644
> --- a/docs/features.txt
> +++ b/docs/features.txt
> @@ -290,21 +290,21 @@ Khronos, ARB, and OES extensions that are not part of
> any OpenGL or OpenGL ES ve
>    GL_ARB_post_depth_coverage                            DONE (i965)
>    GL_ARB_robustness_isolation                           not started
>    GL_ARB_sample_locations                               not started
>    GL_ARB_seamless_cubemap_per_texture                   DONE (i965, nvc0,
> radeonsi, r600, softpipe, swr)
>    GL_ARB_shader_atomic_counter_ops                      DONE (i965/gen7+,
> nvc0, radeonsi, softpipe)
>    GL_ARB_shader_ballot                                  DONE (nvc0, radeonsi)
>    GL_ARB_shader_clock                                   DONE (i965/gen7+,
> nv50, nvc0, radeonsi)
>    GL_ARB_shader_draw_parameters                         DONE (i965, nvc0,
> radeonsi)
>    GL_ARB_shader_group_vote                              DONE (nvc0, radeonsi)
>    GL_ARB_shader_stencil_export                          DONE (i965/gen9+,
> radeonsi, softpipe, llvmpipe, swr)
> -  GL_ARB_shader_viewport_layer_array                    DONE (i965/gen6+)
> +  GL_ARB_shader_viewport_layer_array                    DONE (i965/gen6+,
> radeonsi)
>    GL_ARB_sparse_buffer                                  DONE (radeonsi/CIK+)
>    GL_ARB_sparse_texture                                 not started
>    GL_ARB_sparse_texture2                                not started
>    GL_ARB_sparse_texture_clamp                           not started
>    GL_ARB_texture_filter_minmax                          not started
>    GL_ARB_transform_feedback_overflow_query              DONE (i965/gen6+)
>    GL_KHR_blend_equation_advanced_coherent               DONE (i965/gen9+)
>    GL_KHR_no_error                                       not started
>    GL_KHR_texture_compression_astc_hdr                   DONE (core only)
>    GL_KHR_texture_compression_astc_sliced_3d             not started
> diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html
> index 8f237ed..82086d5 100644
> --- a/docs/relnotes/17.1.0.html
> +++ b/docs/relnotes/17.1.0.html
> @@ -41,20 +41,21 @@ TBD.
>
>
> Note: some of the new features are only available with certain drivers.
>
>
>
> GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe,
> llvmpipe
> GL_ARB_shader_ballot on nvc0, radeonsi
> GL_ARB_shader_clock on nv50, nvc0, radeonsi
> GL_ARB_shader_group_vote on radeonsi
> +GL_ARB_shader_viewport_layer_array on radeonsi
> GL_ARB_sparse_buffer on radeonsi/CIK+
> GL_ARB_transform_feedback2 on i965/gen6
> GL_ARB_transform_feedback_overflow_query on i965/gen6+
> GL_NV_fill_rectangle on nvc0
> Geometry shaders enabled on swr
>
>
> Bug fixes
>
>
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
> b/src/gallium/drivers/radeonsi/si_pipe.c
> index 2955249..f0e24c2 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.c
> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
> @@ -414,20 +414,21 @@ static int si_get_param(struct pipe_screen* pscreen,
> enum pipe_cap param)
>     case PIPE_CAP_STRING_MARKER:
>     case PIPE_CAP_CLEAR_TEXTURE:
>     case PIPE_CAP_CULL_DISTANCE:
>     case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
>     case PIPE_CAP_TGSI_CAN_READ_OUTPUTS:
>     case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY:
>     case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
>     case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
>     case PIPE_CAP_DOUBLES:
>     case PIPE_CAP_TGSI_TEX_TXF_LZ:
> +   case PIPE_CAP_TGSI_TES_LAYER_VIEWPORT:
>        return 1;
>
>     case PIPE_CAP_INT64:
>     case PIPE_CAP_INT64_DIVMOD:
>     case PIPE_CAP_TGSI_CLOCK:
>        return HAVE_LLVM >= 0x0309;
>
>     case PIPE_CAP_TGSI_VOTE:
>        return HAVE_LLVM >= 0x0400;
>
> @@ -499,21 +500,20 @@ static int si_get_param(struct pipe_screen* pscreen,
> enum pipe_cap param)
>     case PIPE_CAP_FAKE_SW_MSAA:
>     case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
>     case PIPE_CAP_VERTEXID_NOBASE:
>     case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
>     case PIPE_CAP_MAX_WINDOW_RECTANGLES:
>     case PIPE_CAP_NATIVE_FENCE_FD:
>     case PIPE_CAP_TGSI_FS_FBFETCH:
>     case PIPE_CAP_TGSI_MUL_ZERO_WINS:
>     case PIPE_CAP_UMA:
>     case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE:
> -   case PIPE_CAP_TGSI_TES_LAYER_VIEWPORT:
>        return 0;
>
>     case PIPE_CAP_QUERY_BUFFER_OBJECT:
>
Re: [Mesa-dev] [PATCH] anv/cmd_buffer: Disable CCS on BDW input attachments
Reviewed-by: Jason Ekstrand

On Fri, Apr 14, 2017 at 12:18 PM, Nanley Chery wrote:
> The description under RENDER_SURFACE_STATE::RedClearColor says,
>
>    For Sampling Engine Multisampled Surfaces and Render Targets:
>     Specifies the clear value for the red channel.
>    For Other Surfaces:
>     This field is ignored.
>
> This means that the sampler on BDW doesn't support CCS.
>
> Cc: Samuel Iglesias Gonsálvez
> Cc: Jordan Justen
> Cc: Jason Ekstrand
> Cc:
> Signed-off-by: Nanley Chery
> ---
>  src/intel/vulkan/anv_blorp.c       | 11 ---
>  src/intel/vulkan/genX_cmd_buffer.c | 32 +---
>  2 files changed, 13 insertions(+), 30 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 4904ee3a5f..8a3c4deed3 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1381,7 +1381,6 @@ ccs_resolve_attachment(struct anv_cmd_buffer
> *cmd_buffer,
>      * still hot in the cache.
>      */
>     bool found_draw = false;
> -   bool self_dep = false;
>     enum anv_subpass_usage usage = 0;
>     for (uint32_t s = subpass_idx + 1; s < pass->subpass_count; s++) {
>        usage |= pass->attachments[att].subpass_usage[s];
> @@ -1391,8 +1390,6 @@ ccs_resolve_attachment(struct anv_cmd_buffer
> *cmd_buffer,
>            * wait to resolve until then.
>            */
>           found_draw = true;
> -         if (pass->attachments[att].subpass_usage[s] &
> ANV_SUBPASS_USAGE_INPUT)
> -            self_dep = true;
>           break;
>        }
>     }
> @@ -1451,14 +1448,6 @@ ccs_resolve_attachment(struct anv_cmd_buffer
> *cmd_buffer,
>            *    binding this surface to Sampler."
>            */
>           resolve_op = BLORP_FAST_CLEAR_OP_RESOLVE_PARTIAL;
> -      } else if (cmd_buffer->device->info.gen == 8 && self_dep &&
> -                 att_state->input_aux_usage == ISL_AUX_USAGE_CCS_D) {
> -         /* On Broadwell we still need to do resolves when there is a
> -          * self-dependency because HW could not see fast-clears and works
> -          * on the render cache as if there was regular non-fast-clear
> surface.
> -          * To avoid any inconsistency, we force the resolve.
> -          */
> -         resolve_op = BLORP_FAST_CLEAR_OP_RESOLVE_FULL;
>        }
>     }
>
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index b78b13d88e..2e0108d3f5 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -291,27 +291,21 @@ color_attachment_compute_aux_usage(struct
> anv_device *device,
>        att_state->input_aux_usage = ISL_AUX_USAGE_CCS_E;
>     } else if (att_state->fast_clear) {
>        att_state->aux_usage = ISL_AUX_USAGE_CCS_D;
> -      if (GEN_GEN >= 9 &&
> -          !isl_format_supports_ccs_e(&device->info, iview->isl.format)) {
> -         /* From the Sky Lake PRM, RENDER_SURFACE_STATE::
> AuxiliarySurfaceMode:
> -          *
> -          *    "If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
> -          *    setting is only allowed if Surface Format supported for
> Fast
> -          *    Clear. In addition, if the surface is bound to the sampling
> -          *    engine, Surface Format must be supported for Render Target
> -          *    Compression for surfaces bound to the sampling engine."
> -          *
> -          * In other words, we can't sample from a fast-cleared image if
> it
> -          * doesn't also support color compression.
> -          */
> -         att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
> -      } else if (GEN_GEN >= 8) {
> -         /* Broadwell/Skylake can sample from fast-cleared images */
> +      /* From the Sky Lake PRM, RENDER_SURFACE_STATE::
> AuxiliarySurfaceMode:
> +       *
> +       *    "If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
> +       *    setting is only allowed if Surface Format supported for Fast
> +       *    Clear. In addition, if the surface is bound to the sampling
> +       *    engine, Surface Format must be supported for Render Target
> +       *    Compression for surfaces bound to the sampling engine."
> +       *
> +       * In other words, we can only sample from a fast-cleared image if
> it
> +       * also supports color compression.
> +       */
> +      if (isl_format_supports_ccs_e(&device->info, iview->isl.format))
>           att_state->input_aux_usage = ISL_AUX_USAGE_CCS_D;
> -      } else {
> -         /* Ivy Bridge and Haswell cannot */
> +      else
>           att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
> -      }
>     } else {
>        att_state->aux_usage = ISL_AUX_USAGE_NONE;
>        att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
> --
> 2.12.2
Re: [Mesa-dev] [PATCH 2/2] radeonsi: cope with missing disassembly
For the series:

Reviewed-by: Marek Olšák

Marek

On Thu, Apr 13, 2017 at 8:23 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle
>
> For robustness and testing purposes.
> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index 78c7495..c52ffd9 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -106,21 +106,22 @@ static uint32_t *read_chunk(uint32_t *ptr, void **data,
> unsigned *size)
>
>  /**
>   * Return the shader binary in a buffer. The first 4 bytes contain its size
>   * as integer.
>   */
>  static void *si_get_shader_binary(struct si_shader *shader)
>  {
>     /* There is always a size of data followed by the data itself. */
>     unsigned relocs_size = shader->binary.reloc_count *
>                            sizeof(shader->binary.relocs[0]);
> -   unsigned disasm_size = strlen(shader->binary.disasm_string) + 1;
> +   unsigned disasm_size = shader->binary.disasm_string ?
> +                          strlen(shader->binary.disasm_string) + 1 : 0;
>     unsigned llvm_ir_size = shader->binary.llvm_ir_string ?
>                             strlen(shader->binary.llvm_ir_string) + 1 : 0;
>     unsigned size =
>        4 + /* total size */
>        4 + /* CRC32 of the data below */
>        align(sizeof(shader->config), 4) +
>        align(sizeof(shader->info), 4) +
>        4 + align(shader->binary.code_size, 4) +
>        4 + align(shader->binary.rodata_size, 4) +
>        4 + align(relocs_size, 4) +
> --
> 2.9.3
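Editorial note: the fix above guards strlen() against a NULL disassembly string, in the same way the neighbouring llvm_ir_size line already does. A minimal standalone sketch of that pattern is shown here; the helper name optional_string_size is hypothetical and does not exist in radeonsi.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to store an optional C string including its terminating
 * NUL, or 0 when the string is absent. Calling strlen(NULL) is undefined
 * behaviour, so the pointer is checked first. */
static size_t optional_string_size(const char *s)
{
   return s ? strlen(s) + 1 : 0;
}
```

With this shape an absent string contributes zero bytes to the serialised blob, while an empty but present string still contributes one byte for its terminator.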
[Mesa-dev] [PATCH] anv/cmd_buffer: Disable CCS on BDW input attachments
The description under RENDER_SURFACE_STATE::RedClearColor says,

   For Sampling Engine Multisampled Surfaces and Render Targets:
    Specifies the clear value for the red channel.
   For Other Surfaces:
    This field is ignored.

This means that the sampler on BDW doesn't support CCS.

Cc: Samuel Iglesias Gonsálvez
Cc: Jordan Justen
Cc: Jason Ekstrand
Cc:
Signed-off-by: Nanley Chery
---
 src/intel/vulkan/anv_blorp.c       | 11 ---
 src/intel/vulkan/genX_cmd_buffer.c | 32 +---
 2 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 4904ee3a5f..8a3c4deed3 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1381,7 +1381,6 @@ ccs_resolve_attachment(struct anv_cmd_buffer *cmd_buffer,
     * still hot in the cache.
     */
    bool found_draw = false;
-   bool self_dep = false;
    enum anv_subpass_usage usage = 0;
    for (uint32_t s = subpass_idx + 1; s < pass->subpass_count; s++) {
       usage |= pass->attachments[att].subpass_usage[s];
@@ -1391,8 +1390,6 @@ ccs_resolve_attachment(struct anv_cmd_buffer *cmd_buffer,
           * wait to resolve until then.
           */
          found_draw = true;
-         if (pass->attachments[att].subpass_usage[s] & ANV_SUBPASS_USAGE_INPUT)
-            self_dep = true;
          break;
       }
    }
@@ -1451,14 +1448,6 @@ ccs_resolve_attachment(struct anv_cmd_buffer *cmd_buffer,
           *    binding this surface to Sampler."
           */
          resolve_op = BLORP_FAST_CLEAR_OP_RESOLVE_PARTIAL;
-      } else if (cmd_buffer->device->info.gen == 8 && self_dep &&
-                 att_state->input_aux_usage == ISL_AUX_USAGE_CCS_D) {
-         /* On Broadwell we still need to do resolves when there is a
-          * self-dependency because HW could not see fast-clears and works
-          * on the render cache as if there was regular non-fast-clear surface.
-          * To avoid any inconsistency, we force the resolve.
-          */
-         resolve_op = BLORP_FAST_CLEAR_OP_RESOLVE_FULL;
       }
    }

diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index b78b13d88e..2e0108d3f5 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -291,27 +291,21 @@ color_attachment_compute_aux_usage(struct anv_device *device,
       att_state->input_aux_usage = ISL_AUX_USAGE_CCS_E;
    } else if (att_state->fast_clear) {
       att_state->aux_usage = ISL_AUX_USAGE_CCS_D;
-      if (GEN_GEN >= 9 &&
-          !isl_format_supports_ccs_e(&device->info, iview->isl.format)) {
-         /* From the Sky Lake PRM, RENDER_SURFACE_STATE::AuxiliarySurfaceMode:
-          *
-          *    "If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
-          *    setting is only allowed if Surface Format supported for Fast
-          *    Clear. In addition, if the surface is bound to the sampling
-          *    engine, Surface Format must be supported for Render Target
-          *    Compression for surfaces bound to the sampling engine."
-          *
-          * In other words, we can't sample from a fast-cleared image if it
-          * doesn't also support color compression.
-          */
-         att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
-      } else if (GEN_GEN >= 8) {
-         /* Broadwell/Skylake can sample from fast-cleared images */
+      /* From the Sky Lake PRM, RENDER_SURFACE_STATE::AuxiliarySurfaceMode:
+       *
+       *    "If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
+       *    setting is only allowed if Surface Format supported for Fast
+       *    Clear. In addition, if the surface is bound to the sampling
+       *    engine, Surface Format must be supported for Render Target
+       *    Compression for surfaces bound to the sampling engine."
+       *
+       * In other words, we can only sample from a fast-cleared image if it
+       * also supports color compression.
+       */
+      if (isl_format_supports_ccs_e(&device->info, iview->isl.format))
          att_state->input_aux_usage = ISL_AUX_USAGE_CCS_D;
-      } else {
-         /* Ivy Bridge and Haswell cannot */
+      else
          att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
-      }
    } else {
       att_state->aux_usage = ISL_AUX_USAGE_NONE;
       att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
-- 
2.12.2
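Editorial note: after this patch the fast-clear branch no longer looks at the hardware generation; the sampler-side aux usage depends only on whether the format supports CCS_E. The sketch below restates that decision in isolation. The enum and function names are stand-ins invented for illustration, not the ISL or anv API, and the format check is reduced to a boolean the caller supplies.

```c
#include <stdbool.h>

/* Stand-ins for the ISL_AUX_USAGE_* values named in the patch. */
enum aux_usage {
   AUX_USAGE_NONE,
   AUX_USAGE_CCS_D,
   AUX_USAGE_CCS_E
};

/* Input-attachment aux usage for a fast-cleared attachment, following the
 * post-patch branch: sample through CCS_D only when the format also
 * supports CCS_E, otherwise sample without an auxiliary surface. */
static enum aux_usage
fast_clear_input_aux_usage(bool format_supports_ccs_e)
{
   return format_supports_ccs_e ? AUX_USAGE_CCS_D : AUX_USAGE_NONE;
}
```

How this disables CCS for Broadwell input attachments specifically is not spelled out in the mail; it presumably relies on the format query never reporting CCS_E support on that generation, which is an assumption here rather than something the patch text states.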
Re: [Mesa-dev] [PATCH 1/3] radv: add private push descriptors for meta
Reviewed-by: Bas Nieuwenhuizen

for the series.

On Fri, Apr 14, 2017 at 12:26 AM, Fredrik Höglund wrote:
> This allows meta to use push descriptors without disturbing user
> push descriptors.
>
> radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
> in that partial updates are not supported; all descriptors used in
> subsequent draw commands must be pushed at the same time.
>
> Signed-off-by: Fredrik Höglund
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 33 +
>  src/amd/vulkan/radv_private.h    |  8
>  2 files changed, 41 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c
> b/src/amd/vulkan/radv_cmd_buffer.c
> index f03e3dff34..31d04e535d 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -1981,6 +1981,39 @@ static bool radv_init_push_descriptor_set(struct
> radv_cmd_buffer *cmd_buffer,
>         return true;
>  }
>
> +void radv_meta_push_descriptor_set(
> +       struct radv_cmd_buffer*     cmd_buffer,
> +       VkPipelineBindPoint         pipelineBindPoint,
> +       VkPipelineLayout            _layout,
> +       uint32_t                    set,
> +       uint32_t                    descriptorWriteCount,
> +       const VkWriteDescriptorSet* pDescriptorWrites)
> +{
> +       RADV_FROM_HANDLE(radv_pipeline_layout, layout, _layout);
> +       struct radv_descriptor_set *push_set =
> &cmd_buffer->meta_push_descriptors;
> +       unsigned bo_offset;
> +
> +       assert(layout->set[set].layout->flags &
> VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR);
> +
> +       push_set->size = layout->set[set].layout->size;
> +       push_set->layout = layout->set[set].layout;
> +
> +       if (!radv_cmd_buffer_upload_alloc(cmd_buffer, push_set->size, 32,
> +                                         &bo_offset,
> +                                         (void**) &push_set->mapped_ptr))
> +               return;
> +
> +       push_set->va =
> cmd_buffer->device->ws->buffer_get_va(cmd_buffer->upload.upload_bo);
> +       push_set->va += bo_offset;
> +
> +       radv_update_descriptor_sets(cmd_buffer->device, cmd_buffer,
> +                                   radv_descriptor_set_to_handle(push_set),
> +                                   descriptorWriteCount, pDescriptorWrites,
> 0, NULL);
> +
> +       cmd_buffer->state.descriptors[set] = push_set;
> +       cmd_buffer->state.descriptors_dirty |= (1 << set);
> +}
> +
>  void radv_CmdPushDescriptorSetKHR(
>         VkCommandBuffer commandBuffer,
>         VkPipelineBindPoint pipelineBindPoint,
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 00190e7eee..a64336856f 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -787,6 +787,7 @@ struct radv_cmd_buffer {
>         uint32_t dynamic_buffers[4 * MAX_DYNAMIC_BUFFERS];
>         VkShaderStageFlags push_constant_stages;
>         struct radv_push_descriptor_set push_descriptors;
> +       struct radv_descriptor_set meta_push_descriptors;
>
>         struct radv_cmd_buffer_upload upload;
>
> @@ -1410,6 +1411,13 @@ radv_update_descriptor_set_with_template(struct
> radv_device *device,
>                                           VkDescriptorUpdateTemplateKHR
> descriptorUpdateTemplate,
>                                           const void *pData);
>
> +void radv_meta_push_descriptor_set(struct radv_cmd_buffer *cmd_buffer,
> +                                   VkPipelineBindPoint pipelineBindPoint,
> +                                   VkPipelineLayout _layout,
> +                                   uint32_t set,
> +                                   uint32_t descriptorWriteCount,
> +                                   const VkWriteDescriptorSet
> *pDescriptorWrites);
> +
>  void radv_initialise_cmask(struct radv_cmd_buffer *cmd_buffer,
>                             struct radv_image *image, uint32_t value);
>  void radv_initialize_dcc(struct radv_cmd_buffer *cmd_buffer,
> --
> 2.11.0
[Mesa-dev] [PATCH v2] swr: update gallium driver docs
v2: add back scons section, mention additional built swr libraries --- src/gallium/docs/source/drivers/openswr.rst | 2 +- src/gallium/docs/source/drivers/openswr/usage.rst | 16 +++- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/src/gallium/docs/source/drivers/openswr.rst b/src/gallium/docs/source/drivers/openswr.rst index 84aa51f..e254d7b 100644 --- a/src/gallium/docs/source/drivers/openswr.rst +++ b/src/gallium/docs/source/drivers/openswr.rst @@ -7,7 +7,7 @@ geometry heavy workloads there is a considerable speedup over llvmpipe, which is to be expected as the geometry frontend of llvmpipe is single threaded. -This rasterizer is x86 specific and requires AVX or AVX2. The driver +This rasterizer is x86 specific and requires AVX or above. The driver fits into the gallium framework, and reuses gallivm for doing the TGSI to vectorized llvm-IR conversion of the shader kernels. diff --git a/src/gallium/docs/source/drivers/openswr/usage.rst b/src/gallium/docs/source/drivers/openswr/usage.rst index e55b421..61c30c2 100644 --- a/src/gallium/docs/source/drivers/openswr/usage.rst +++ b/src/gallium/docs/source/drivers/openswr/usage.rst @@ -4,8 +4,9 @@ Usage Requirements -* An x86 processor with AVX or AVX2 -* LLVM version 3.6 or later +* An x86 processor with AVX or above +* LLVM version 3.9 or later +* C++14 capable compiler Building @@ -18,13 +19,18 @@ configure time, for example: :: Using ^ -On Linux, building will create a drop-in alternative for libGL.so into:: +On Linux, building with autotools will create a drop-in alternative +for libGL.so into:: lib/gallium/libGL.so + lib/gallium/libswrAVX.so + lib/gallium/libswrAVX2.so -or:: +Alternatively, building with SCons will produce:: - build/foo/gallium/targets/libgl-xlib/libGL.so + build/linux-x86_64/gallium/targets/libgl-xlib/libGL.so + build/linux-x86_64/gallium/drivers/swr/libswrAVX.so + build/linux-x86_64/gallium/drivers/swr/libswrAVX2.so To use it set the LD_LIBRARY_PATH environment variable 
accordingly. -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swr: Add polygon stipple support
On Fri, Apr 14, 2017 at 2:52 PM, Kyriazis, George wrote:
> > > +   /* work around the fact that poly stipple also affects lines */
> > > +   /* and points, since we rasterize them as triangles, too */
> > > +   /* Has to be before fragment shader, since it sets SWR_NEW_FS */
> > > +   if (p_draw_info) {
> > > +      bool new_prim_is_poly = (u_reduced_prim(p_draw_info->mode) ==
> > > PIPE_PRIM_TRIANGLES);
> >
> > What about glPolygonMode and what about geometry shaders that take in
> > e.g. points and put out triangles? Perhaps you need to pass in a "is
> > this *really* a triangle" parameter to the shader generated by the
> > rasterizer.
> >
> >
> > Actually the GS thing won't happen since polygon stippling is a
> > compat-only feature and we don't support GS in compat profiles. You do
> > need to check that the polymode == FILL here though.
>
> Well, currently we don’t have a working polygon mode. Once we implement it,
> then we’ll look at stipple at that time.

Ah, indeed you don't. I thought you at least handled it when front == back,
but I was mistaken.

Cheers,

  -ilia
Re: [Mesa-dev] [PATCH] swr: Add polygon stipple support
On Apr 14, 2017, at 11:35 AM, Ilia Mirkin wrote:

On Fri, Apr 14, 2017 at 11:18 AM, Ilia Mirkin wrote:

On Thu, Apr 13, 2017 at 4:30 PM, George Kyriazis wrote:

Add polygon stipple functionality to the fragment shader.

Explicitly turn off polygon stipple for lines and points, since we
do them using tris.
---
 src/gallium/drivers/swr/swr_context.h  |  4 ++-
 src/gallium/drivers/swr/swr_shader.cpp | 56 ++
 src/gallium/drivers/swr/swr_shader.h   |  1 +
 src/gallium/drivers/swr/swr_state.cpp  | 27 ++--
 src/gallium/drivers/swr/swr_state.h    |  5 +++
 5 files changed, 84 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_context.h b/src/gallium/drivers/swr/swr_context.h
index be65a20..9d80c70 100644
--- a/src/gallium/drivers/swr/swr_context.h
+++ b/src/gallium/drivers/swr/swr_context.h
@@ -98,6 +98,8 @@ struct swr_draw_context {

    float userClipPlanes[PIPE_MAX_CLIP_PLANES][4];

+   uint32_t polyStipple[32];
+
    SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
    void *pStats;
 };
@@ -127,7 +129,7 @@ struct swr_context {
    struct pipe_constant_buffer
       constants[PIPE_SHADER_TYPES][PIPE_MAX_CONSTANT_BUFFERS];
    struct pipe_framebuffer_state framebuffer;
-   struct pipe_poly_stipple poly_stipple;
+   struct swr_poly_stipple poly_stipple;
    struct pipe_scissor_state scissor;
    SWR_RECT swr_scissor;
    struct pipe_sampler_view *
diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp
index 6fc0596..d8f5512 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -165,6 +165,9 @@ swr_generate_fs_key(struct swr_jit_fs_key &key,
           sizeof(key.vs_output_semantic_idx));

    swr_generate_sampler_key(swr_fs->info, ctx, PIPE_SHADER_FRAGMENT, key);
+
+   key.poly_stipple_enable = ctx->rasterizer->poly_stipple_enable &&
+      ctx->poly_stipple.prim_is_poly;
 }

 void
@@ -1099,17 +1102,58 @@ BuilderSWR::CompileFS(struct swr_context *ctx, swr_jit_fs_key &key)
    memset(&system_values, 0, sizeof(system_values));

    struct lp_build_mask_context mask;
+   bool uses_mask = false;

-   if (swr_fs->info.base.uses_kill) {
-      Value *mask_val = LOAD(pPS, {0, SWR_PS_CONTEXT_activeMask}, "activeMask");
+   if (swr_fs->info.base.uses_kill ||
+       key.poly_stipple_enable) {
+      Value *vActiveMask = NULL;
+      if (swr_fs->info.base.uses_kill) {
+         vActiveMask = LOAD(pPS, {0, SWR_PS_CONTEXT_activeMask}, "activeMask");
+      }
+      if (key.poly_stipple_enable) {
+         // first get fragment xy coords and clip to stipple bounds
+         Value *vXf = LOAD(pPS, {0, SWR_PS_CONTEXT_vX, PixelPositions_UL});
+         Value *vYf = LOAD(pPS, {0, SWR_PS_CONTEXT_vY, PixelPositions_UL});
+         Value *vXu = FP_TO_UI(vXf, mSimdInt32Ty);
+         Value *vYu = FP_TO_UI(vYf, mSimdInt32Ty);
+
+         // stipple pattern is 32x32, which means that one line of stipple
+         // is stored in one word:
+         // vXstipple is bit offset inside 32-bit stipple word
+         // vYstipple is word index in stipple array
+         Value *vXstipple = AND(vXu, VIMMED1(0x1f)); // & (32-1)
+         Value *vYstipple = AND(vYu, VIMMED1(0x1f)); // & (32-1)
+
+         // grab stipple pattern base address
+         Value *stipplePtr = GEP(hPrivateData, {0, swr_draw_context_polyStipple, 0});
+         stipplePtr = BITCAST(stipplePtr, mInt8PtrTy);
+
+         // perform a gather to grab stipple words for each lane
+         Value *vStipple = GATHERDD(VUNDEF_I(), stipplePtr, vYstipple,
+                                    VIMMED1(0x), C((char)4));
+
+         // create a mask with one bit corresponding to the x stipple
+         // and AND it with the pattern, to see if we have a bit
+         Value *vBitMask = LSHR(VIMMED1(0x8000), vXstipple);
+         Value *vStippleMask = AND(vStipple, vBitMask);
+         vStippleMask = ICMP_NE(vStippleMask, VIMMED1(0));
+         vStippleMask = VMASK(vStippleMask);
+
+         if (swr_fs->info.base.uses_kill) {
+            vActiveMask = AND(vActiveMask, vStippleMask);
+         } else {
+            vActiveMask = vStippleMask;
+         }
+      }
       lp_build_mask_begin(
-         &mask, gallivm, lp_type_float_vec(32, 32 * 8), wrap(mask_val));
+         &mask, gallivm, lp_type_float_vec(32, 32 * 8), wrap(vActiveMask));
+      uses_mask = true;
    }

    lp_build_tgsi_soa(gallivm,
                      swr_fs->pipe.tokens,
                      lp_type_float_vec(32, 32 * 8),
-                     swr_fs->info.base.uses_kill ? &mask : NULL, // mask
+                     uses_mask ? &mask : NULL, // mask
                      wrap(consts_ptr),
                      wrap(const_sizes_ptr),
                      &system_values,
@@ -1172,13 +1216,13 @@ BuilderSWR::CompileFS(struct swr_context *ctx,
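Editorial note: the JIT code in the patch above computes, per SIMD lane, whether a fragment survives the 32x32 stipple pattern. A scalar C sketch of the same test may make the indexing easier to follow. It assumes one 32-bit word per pattern row and that the single-bit mask starts at the most significant bit (0x80000000) before being shifted right by the x offset; the archived text of the patch does not show that constant in full, so treat it as an assumption. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar version of the per-lane stipple test: row (y mod 32) selects
 * the word, column (x mod 32) selects the bit, counted from the most
 * significant bit downwards. A set bit means the fragment is kept. */
static bool stipple_pixel_visible(const uint32_t pattern[32],
                                  unsigned x, unsigned y)
{
   uint32_t word = pattern[y & 0x1f];                  /* word index in array */
   uint32_t bit = UINT32_C(0x80000000) >> (x & 0x1f);  /* bit inside the word */

   return (word & bit) != 0;
}
```

Because both coordinates are masked with 0x1f, the pattern tiles across the window: pixel (63, 37) maps to the same stipple bit as pixel (31, 5).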
[Mesa-dev] [PATCH] mesa: print target string in glBindTexture() error message
---
 src/mesa/main/texobj.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
index ad644ca..00feb97 100644
--- a/src/mesa/main/texobj.c
+++ b/src/mesa/main/texobj.c
@@ -1663,7 +1663,8 @@ _mesa_BindTexture( GLenum target, GLuint texName )

    targetIndex = _mesa_tex_target_to_index(ctx, target);
    if (targetIndex < 0) {
-      _mesa_error(ctx, GL_INVALID_ENUM, "glBindTexture(target)");
+      _mesa_error(ctx, GL_INVALID_ENUM, "glBindTexture(target = %s)",
+                  _mesa_enum_to_string(target));
       return;
    }
    assert(targetIndex < NUM_TEXTURE_TARGETS);
-- 
1.9.1
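Editorial note: the point of the change is that the error text now names the offending enum instead of the literal word "target". The standalone sketch below shows the same formatting idea without depending on Mesa; target_name() is a hypothetical stand-in for _mesa_enum_to_string() that knows only two values, and its fallback string is an invention for this example rather than Mesa's behaviour.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an enum-to-string lookup; covers only two
 * texture targets, using their standard GL enum values. */
static const char *target_name(unsigned target)
{
   switch (target) {
   case 0x0DE1: return "GL_TEXTURE_2D";
   case 0x8513: return "GL_TEXTURE_CUBE_MAP";
   default:     return "<unknown>";
   }
}

/* Format the error text into buf, in the shape the patch introduces.
 * Returns the value of snprintf(), i.e. the untruncated length. */
static int format_bind_error(char *buf, size_t len, unsigned target)
{
   return snprintf(buf, len, "glBindTexture(target = %s)",
                   target_name(target));
}
```

A caller that passes a bogus target therefore sees which value was rejected, which is usually enough to locate the faulty call in an application trace.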
[Mesa-dev] Mesa 17.0.4 release candidate
Hello list,

The candidate for the Mesa 17.0.4 is now available. Currently we have:
 - 28 queued
 - 1 nominated (outstanding)
 - and 0 rejected patch(es)

The current queue includes extra PCI IDs and a runtime warning fix for
radeonsi, while r600 has improved error handling in OOM conditions.

There is a GBM flush fix for VMWGFX and other drivers that queue DMA
operations on the mapping context. A performance regression in freedreno has
been resolved.

For nouveau and i965 we have various fixes, of which the correct GL version
is now reported on i965 devices.

Haiku build issues have been addressed. Last but not least, Mesa no longer
prints a harmless warning on platform devices.

Take a look at section "Mesa stable queue" for more information.

Testing reports/general approval

Any testing reports (or general approval of the state of the branch) will be
greatly appreciated.

The plan is to have 17.0.4 this Sunday (16th of April), around or shortly
after 19:00 GMT.

If you have any questions or suggestions - be that about the current patch
queue or otherwise, please go ahead.

Trivial merge conflicts
-----------------------

commit 5094311078e23a3a9f62b143f2451d3b91691134
Author: Craig Stout

    anv/cmd_buffer: fix host memory leak

    (cherry picked from commit 1da7a11de8113932871487efaeb2674a3d1c644a)

commit 04df217ac07847e7f020a180ac2951ed17209645
Author: Jason Ekstrand

    i965/blorp: Align vertex buffers to 64B

    (cherry picked from commit f938354362655a378d474c5f79c52cea9852ab91)

commit e7f872f7b8a897e188cf7b0462867c8f0b5d9397
Author: Kenneth Graunke

    i965: Set screen->cmd_parser_version to 0 if we can't write registers.

    (cherry picked from commit 31693a13f8fbc52d4f19f1e8800a4edabeecbe19)

commit a8e217d057a25584949f57093684fe9b4978dbf0
Author: Kenneth Graunke

    i965: Set kernel features before computing max GL version.

    (cherry picked from commit 02ccd8f52cffcc25e5fefdd0f900cf04230395f4)

commit 1b2bcb6826ff8855e96117c9523821336a3be88a
Author: Julien Isorce

    winsys/radeon: check null return from radeon_cs_create_fence in cs_flush

    (cherry picked from commit d08c0930af8aaef5bdf80df618bb906e0b349830)

Cheers,
Emil


Mesa stable queue
-----------------

Nominated (1)
=============

Boyan Ding (1):
      d941ef3 nvc0/ir: Properly handle a "split form" of predicate destination

Queued (28)
===========

Alex Deucher (1):
      radeonsi: add new polaris10 pci id

Alex Smith (1):
      radv: Invalidate L2 for TRANSFER_WRITE barriers

Craig Stout (1):
      anv/cmd_buffer: fix host memory leak

Emil Velikov (2):
      Revert "cherry-ignore: add the Flush after unmap in gbm/dri fix"
      Revert "freedreno: fix memory leak"

Fabio Estevam (1):
      loader: Move non-error message to debug level

Ilia Mirkin (4):
      nvc0/ir: fix LSB/BFE/BFI implementations
      nvc0/ir: fix overwriting of offset register with interpolateAtOffset
      nvc0: increase texture buffer object alignment to 256 for pre-GM107
      nouveau: when mapping a persistent buffer, synchronize on former xfers

Jason Ekstrand (5):
      i965/fs: Always provide a default LOD of 0 for TXS and TXL
      anv/pipeline: Properly handle unset gl_Layer and gl_ViewportIndex
      anv/blorp: Align vertex buffers to 64B
      i965/blorp: Align vertex buffers to 64B
      i965/blorp: Bump the batch space estimate

Jerome Duval (2):
      haiku: build fixes around debug defines
      haiku/winsys: fix dt prototype args

Julien Isorce (4):
      winsys/radeon: check null in radeon_cs_create_fence
      winsys/radeon: check null return from radeon_cs_create_fence in cs_flush
      radeon: initialize hole variable before calling container_of
      radeon_drm_bo: explicitly check return value of drmCommandWriteRead

Kenneth Graunke (4):
      i965: Document the sad story of the kernel command parser.
      i965: Set screen->cmd_parser_version to 0 if we can't write registers.
      i965: Skip register write detection when possible.
      i965: Set kernel features before computing max GL version.

Ken, the former three seem like an implicit requirement for the GL version
fix.

Marek Olšák (1):
      targets: export radeon winsys_create functions to silence LLVM warning

Michal Srb (1):
      st: Add cubeMapFace parameter to st_finalize_texture.

Thomas Hellstrom (1):
      gbm/dri: Flush after unmap
Squashed with
      gbm/dri: Check dri extension version before flush after unmap

Rejected (0)
============
Re: [Mesa-dev] [PATCH V3 2/2] glsl: don't run the GLSL pre-processor when we are skipping compilation
Timothy Arceri writes:

> Improves Deus Ex start-up times with a warm cache from ~30 seconds to
> ~22 seconds.
>
> Also fixes the leaking of state.

The commit message could use some more context:

"This moves the hashing of shader source for the cache lookup to before
the preprocessor. In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits."

With something like that,

Reviewed-by: Eric Anholt
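Editorial note: the ordering described in the suggested commit message (hash the raw source first, consult the cache, and only preprocess on a miss) can be sketched in a few lines. Everything below is a toy under stated assumptions: the cache is a small fixed array, the hash is 64-bit FNV-1a rather than whatever Mesa's shader cache actually uses, and a counter stands in for the expensive preprocess-and-compile step. None of the names come from the GLSL compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16 /* toy capacity, for illustration only */

struct toy_cache {
   uint64_t keys[CACHE_SLOTS];
   size_t count;
};

struct compile_stats {
   unsigned preprocess_runs; /* how often the expensive path was taken */
};

/* 64-bit FNV-1a over a NUL-terminated string; a stand-in hash. */
static uint64_t fnv1a(const char *s)
{
   uint64_t h = UINT64_C(0xcbf29ce484222325);

   for (; *s; s++) {
      h ^= (unsigned char)*s;
      h *= UINT64_C(0x100000001b3);
   }
   return h;
}

static bool cache_has(const struct toy_cache *c, uint64_t key)
{
   for (size_t i = 0; i < c->count; i++)
      if (c->keys[i] == key)
         return true;
   return false;
}

static void cache_put(struct toy_cache *c, uint64_t key)
{
   if (c->count < CACHE_SLOTS) /* silently drop entries when full */
      c->keys[c->count++] = key;
}

/* Hash the un-preprocessed source, and skip the expensive work on a hit. */
static void compile_shader(struct toy_cache *c, struct compile_stats *st,
                           const char *source)
{
   uint64_t key = fnv1a(source);

   if (cache_has(c, key))
      return;               /* cache hit: no preprocessing, no compile */

   st->preprocess_runs++;   /* stand-in for preprocess + compile */
   cache_put(c, key);
}
```

Compiling the same source twice bumps the counter only once, which is the saving the patch is after; two sources that differ only in ways the preprocessor would erase still miss the cache, which is the trade-off the quoted text accepts.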
Re: [Mesa-dev] [PATCH v2] gbm: add support for loading third-party backend (v2)
Emil Velikov writes:

> On 14 April 2017 at 10:38, Yu, Qiang wrote:
>>
>> Hi Emil,
>>
>>> What happened with the idea of reusing your existing amdgpu_dri.so ?
>>> As mentioned before the DRI loader (libgbm) <> DRI driver (foo_dri.so)
>>> interface is stable, so things should just work.
>> Sorry for the late reply. I've asked our amdgpu_dri.so team for this, they
>> seems have no interest and resource for implementing this interface.
>> So the only option left for me is to reuse our current gbm_amdgpu.so
>> and upstream libgbm.so changes if possible.
>>
> Quick look through `strings amdgpu_dri.so' shows that you guys are
> missing the DRI2_FENCE and DRI2_INTEROP extensions.
> Both of which are fairly trivial to implement and it will be the better
> option.
>
> Doing so will give you:
> - acknowledgement to the good work done by Marek (your colleague from
> the other end of the org chart)
> - less binaries to manage - remove gbm_amdgpu.so
> - less code to manage - remove many of the libEGL and libgbm patches
> that you have on top of Mesa.
>
> The proposed GBM interface end up broken rather often since:
> - there's no open-source users that people test
> - we have no tests to catch regressions :-\
>
> TL;DR; You really want to implement the missing functionality in
> amdgpu_dri.so - its more robust and it will reduce the code you have
> to maintain.

I agree with Emil here. Building ABI-stable interfaces is hard and
error-prone, and the DRI interface and the GL interface are where we do
that already. We shouldn't introduce another ABI at the GBM backend
level.
Re: [Mesa-dev] [PATCH] swr: add linux to scons build
On Apr 14, 2017, at 12:44 PM, Emil Velikov wrote:

> On 13 April 2017 at 20:17, George Kyriazis wrote:
>> Make swr compile for both linux and windows.
>> ---
>>  src/gallium/drivers/swr/SConscript        | 7 +--
>>  src/gallium/targets/libgl-xlib/SConscript | 2 +-
>>  2 files changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/gallium/drivers/swr/SConscript b/src/gallium/drivers/swr/SConscript
>> index eca5dba..5e3784b 100644
>> --- a/src/gallium/drivers/swr/SConscript
>> +++ b/src/gallium/drivers/swr/SConscript
>> @@ -17,11 +17,6 @@ if env['LLVM_VERSION'] < distutils.version.LooseVersion('3.9'):
>>      env['swr'] = False
>>      Return()
>>
>> -if env['platform'] != 'windows':
>> -    print "warning: swr scons build only supports windows: not building swr"
>> -    env['swr'] = False
>> -    Return()
>> -
>>  env.MSVC2013Compat()
>>
>>  env = env.Clone()
>> @@ -205,7 +200,7 @@ envavx2.Append(CPPDEFINES = ['KNOB_ARCH=KNOB_ARCH_AVX2'])
>>  if env['platform'] == 'windows':
>>      envavx2.Append(CCFLAGS = ['/arch:AVX2'])
>>  else:
>> -    envavx2.Append(CCFLAGS = ['-mavx2'])
>> +    envavx2.Append(CCFLAGS = ['-mavx2', '-mfma', '-mbmi2', '-mf16c'])
>>
>>  swrAVX2 = envavx2.SharedLibrary(
>>      target = 'swrAVX2',
>> diff --git a/src/gallium/targets/libgl-xlib/SConscript b/src/gallium/targets/libgl-xlib/SConscript
>> index d01bb3c..a81ac79 100644
>> --- a/src/gallium/targets/libgl-xlib/SConscript
>> +++ b/src/gallium/targets/libgl-xlib/SConscript
>> @@ -49,7 +49,7 @@ if env['llvm']:
>>      env.Prepend(LIBS = [llvmpipe])
>>
>>  if env['swr']:
>> -    env.Append(CPPDEFINES = 'HAVE_SWR')
>> +    env.Append(CPPDEFINES = 'GALLIUM_SWR')
>
> Seems like we want the same fix in src/gallium/targets/osmesa/SConscript.
> Please squash that alongside a small note in docs/relnotes/17.1.0.html

Checkin is already submitted, so I’ll make a follow-up commit with those changes.

> With the above
> Reviewed-by: Emil Velikov
>
> As a follow-up commit can we have
> $sed -i s/HAVE_/GALLIUM_ src/gallium/targets/libgl-xlib/* && git commit -asm “…”

Yes, I want to fix this, too, and I was planning on doing it in a later commit.

Thanks,

George

> Thanks
> Emil
Re: [Mesa-dev] [PATCH v3 4/4] vc4: Only build the NEON code on arm32.
Emil Velikov writes:

> On 14 April 2017 at 18:47, Eric Anholt wrote:
>> NEON is sufficiently different on arm64 that we can't just reuse this
>> code. Disable it on arm64 for now.
>>
>> Signed-off-by: Eric Anholt
>> ---
>>  src/gallium/drivers/vc4/vc4_tiling_lt.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt.c b/src/gallium/drivers/vc4/vc4_tiling_lt.c
>> index c9cbc65e2dbc..7de67b652daa 100644
>> --- a/src/gallium/drivers/vc4/vc4_tiling_lt.c
>> +++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c
>> @@ -61,7 +61,7 @@ static void
>>  vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
>>  {
>>          uint32_t gpu_stride = vc4_utile_stride(cpp);
>> -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH)
>> +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7
>>          if (gpu_stride == 8) {
>>                  __asm__ volatile (
>>                          /* Load from the GPU in one shot, no interleave, to
>> @@ -118,7 +118,7 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
>>  {
>>          uint32_t gpu_stride = vc4_utile_stride(cpp);
>>
>> -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH)
>> +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7
>
> This patch should be before 4/4, or it will cause intermittent breakage.

I don't think there is any new breakage. We've been setting
VC4_BUILD_NEON already.
Re: [Mesa-dev] [PATCH v3 4/4] vc4: Only build the NEON code on arm32.
On 14 April 2017 at 18:47, Eric Anholt wrote:
> NEON is sufficiently different on arm64 that we can't just reuse this
> code. Disable it on arm64 for now.
>
> Signed-off-by: Eric Anholt
> ---
>  src/gallium/drivers/vc4/vc4_tiling_lt.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt.c b/src/gallium/drivers/vc4/vc4_tiling_lt.c
> index c9cbc65e2dbc..7de67b652daa 100644
> --- a/src/gallium/drivers/vc4/vc4_tiling_lt.c
> +++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c
> @@ -61,7 +61,7 @@ static void
>  vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp)
>  {
>          uint32_t gpu_stride = vc4_utile_stride(cpp);
> -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH)
> +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7
>          if (gpu_stride == 8) {
>                  __asm__ volatile (
>                          /* Load from the GPU in one shot, no interleave, to
> @@ -118,7 +118,7 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp)
>  {
>          uint32_t gpu_stride = vc4_utile_stride(cpp);
>
> -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH)
> +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7

This patch should be before 4/4, or it will cause intermittent breakage.

-Emil
Re: [Mesa-dev] [PATCH] radv: remove irrelevant comment
Reviewed-by: Bas Nieuwenhuizen

On Fri, Apr 14, 2017 at 7:17 PM, Grazvydas Ignotas wrote:
> A leftover from anv.
>
> Signed-off-by: Grazvydas Ignotas
> ---
>  src/amd/vulkan/radv_device.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 5f14394..7857e8f 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -660,11 +660,11 @@ void radv_GetPhysicalDeviceProperties(
>                  .driverVersion = radv_get_driver_version(),
>                  .vendorID = 0x1002,
>                  .deviceID = pdevice->rad_info.pci_id,
>                  .deviceType = VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU,
>                  .limits = limits,
> -                .sparseProperties = {0}, /* Broadwell doesn't do sparse. */
> +                .sparseProperties = {0},
>          };
>
>          strcpy(pProperties->deviceName, pdevice->name);
>          memcpy(pProperties->pipelineCacheUUID, pdevice->uuid, VK_UUID_SIZE);
>  }
> --
> 2.7.4
Re: [Mesa-dev] [PATCH] radv: report timestampPeriod correctly
For some reason I thought it did it in 10 KHz.

Reviewed-by: Bas Nieuwenhuizen

On Fri, Apr 14, 2017 at 7:17 PM, Grazvydas Ignotas wrote:
> The kernel returns frequency in kHz, so to convert to nanosecond
> interval that Vulkan uses the dividend should be 1000000.0 and not
> 100000.0.
>
> This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.
>
> Signed-off-by: Grazvydas Ignotas
> Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
> ---
>  src/amd/vulkan/radv_device.c        | 2 +-
>  src/amd/vulkan/radv_radeon_winsys.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 7857e8f..796cc70 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -637,11 +637,11 @@ void radv_GetPhysicalDeviceProperties(
>                  .sampledImageDepthSampleCounts = sample_counts,
>                  .sampledImageStencilSampleCounts = sample_counts,
>                  .storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
>                  .maxSampleMaskWords = 1,
>                  .timestampComputeAndGraphics = false,
> -                .timestampPeriod = 100000.0 / pdevice->rad_info.clock_crystal_freq,
> +                .timestampPeriod = 1000000.0 / pdevice->rad_info.clock_crystal_freq,
>                  .maxClipDistances = 8,
>                  .maxCullDistances = 8,
>                  .maxCombinedClipAndCullDistances = 8,
>                  .discreteQueuePriorities = 1,
>                  .pointSizeRange = { 0.125, 255.875 },
> diff --git a/src/amd/vulkan/radv_radeon_winsys.h b/src/amd/vulkan/radv_radeon_winsys.h
> index 9f2430f..f6bab74 100644
> --- a/src/amd/vulkan/radv_radeon_winsys.h
> +++ b/src/amd/vulkan/radv_radeon_winsys.h
> @@ -93,11 +93,11 @@ struct radeon_info {
>          bool                        has_uvd;
>          uint32_t                    sdma_rings;
>          uint32_t                    compute_rings;
>          uint32_t                    vce_fw_version;
>          uint32_t                    vce_harvest_config;
> -        uint32_t                    clock_crystal_freq;
> +        uint32_t                    clock_crystal_freq; /* in kHz */
>
>          /* Kernel info. */
>          uint32_t                    drm_major; /* version */
>          uint32_t                    drm_minor;
>          uint32_t                    drm_patchlevel;
> --
> 2.7.4
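The unit conversion behind this fix can be checked with a small sketch. It assumes, as the patch states, that the crystal frequency is reported in kHz; the helper name timestamp_period_ns is hypothetical and is not part of radv.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not radv code: nanoseconds per timestamp tick for
 * a crystal frequency given in kHz.
 *
 * f kHz is f * 1e3 ticks per second, so one tick lasts
 * 1e9 ns / (f * 1e3) = 1e6 / f nanoseconds. */
static double timestamp_period_ns(uint32_t crystal_freq_khz)
{
    assert(crystal_freq_khz != 0); /* avoid dividing by zero */
    return 1000000.0 / (double)crystal_freq_khz;
}
```

For example, a 27000 kHz crystal (an illustrative figure, not taken from the thread) would give roughly 37 ns per tick. Had the frequency been in units of 10 kHz, the dividend would be ten times smaller.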
Re: [Mesa-dev] [PATCH v2] gbm: add support for loading third-party backend (v2)
On 14 April 2017 at 10:38, Yu, Qiang wrote:
>
> Hi Emil,
>
>> What happened with the idea of reusing your existing amdgpu_dri.so ?
>> As mentioned before the DRI loader (libgbm) <> DRI driver (foo_dri.so)
>> interface is stable, so things should just work.
> Sorry for the late reply. I've asked our amdgpu_dri.so team for this, they
> seems have no interest and resource for implementing this interface.
> So the only option left for me is to reuse our current gbm_amdgpu.so
> and upstream libgbm.so changes if possible.
>

A quick look through `strings amdgpu_dri.so' shows that you guys are
missing the DRI2_FENCE and DRI2_INTEROP extensions. Both are fairly
trivial to implement, and implementing them will be the better option.

Doing so will give you:
 - acknowledgement of the good work done by Marek (your colleague from
   the other end of the org chart)
 - fewer binaries to manage - remove gbm_amdgpu.so
 - less code to manage - remove many of the libEGL and libgbm patches
   that you have on top of Mesa.

The proposed GBM interface will end up broken rather often since:
 - there are no open-source users that people test
 - we have no tests to catch regressions :-\

TL;DR: You really want to implement the missing functionality in
amdgpu_dri.so - it's more robust and it will reduce the code you have
to maintain.

Regards,
Emil
Re: [Mesa-dev] [PATCH] swr: update gallium driver docs
On 13 April 2017 at 19:41, Tim Rowleywrote: > --- > src/gallium/docs/source/drivers/openswr.rst | 2 +- > src/gallium/docs/source/drivers/openswr/usage.rst | 9 +++-- > 2 files changed, 4 insertions(+), 7 deletions(-) > > diff --git a/src/gallium/docs/source/drivers/openswr.rst > b/src/gallium/docs/source/drivers/openswr.rst > index 84aa51f..e254d7b 100644 > --- a/src/gallium/docs/source/drivers/openswr.rst > +++ b/src/gallium/docs/source/drivers/openswr.rst > @@ -7,7 +7,7 @@ geometry heavy workloads there is a considerable speedup over > llvmpipe, > which is to be expected as the geometry frontend of llvmpipe is single > threaded. > > -This rasterizer is x86 specific and requires AVX or AVX2. The driver > +This rasterizer is x86 specific and requires AVX or above. The driver > fits into the gallium framework, and reuses gallivm for doing the TGSI > to vectorized llvm-IR conversion of the shader kernels. > > diff --git a/src/gallium/docs/source/drivers/openswr/usage.rst > b/src/gallium/docs/source/drivers/openswr/usage.rst > index e55b421..d2a664e 100644 > --- a/src/gallium/docs/source/drivers/openswr/usage.rst > +++ b/src/gallium/docs/source/drivers/openswr/usage.rst > @@ -4,8 +4,9 @@ Usage > Requirements > > > -* An x86 processor with AVX or AVX2 > -* LLVM version 3.6 or later > +* An x86 processor with AVX or above > +* LLVM version 3.9 or later > +* C++14 capable compiler > > Building > > @@ -22,10 +23,6 @@ On Linux, building will create a drop-in alternative for > libGL.so into:: > There is a hunk just outside of the diff that wants a s/building will/building with autotools will/ >lib/gallium/libGL.so > > -or:: s/or/Alternatively, building with SCons will produce/ > - > - build/foo/gallium/targets/libgl-xlib/libGL.so > - and then keep this line. Considering George wired everything, one might as well have it documented ;-) Either way not my call, so feel free to ignore. 
-Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 1/4] gallium: Enable ARM NEON CPU detection.
I wrote this code with reference to pixman, though I've only decided to cover Linux (what I'm testing) and Android (seems obvious enough). Linux has getauxval() as a cleaner interface to the /proc entry, but it's more glibc-specific and I didn't want to add detection for that. This will be used to enable NEON at runtime on ARMv6 builds of vc4. v2: Actually initialize the temp vars in the Android path (noticed by daniels) v3: Actually pull in the cpufeatures library (change by robher). Use O_CLOEXEC. Break out of the loop when we find our feature. Only do NEON detection, until someone actually wants VFP features. --- src/gallium/auxiliary/Android.mk | 2 ++ src/gallium/auxiliary/util/u_cpu_detect.c | 43 +++ src/gallium/auxiliary/util/u_cpu_detect.h | 1 + 3 files changed, 46 insertions(+) diff --git a/src/gallium/auxiliary/Android.mk b/src/gallium/auxiliary/Android.mk index e8628e43744a..4f6f71bbf6a9 100644 --- a/src/gallium/auxiliary/Android.mk +++ b/src/gallium/auxiliary/Android.mk @@ -48,6 +48,8 @@ endif LOCAL_MODULE := libmesa_gallium LOCAL_STATIC_LIBRARIES += libmesa_nir +LOCAL_WHOLE_STATIC_LIBRARIES += cpufeatures + # generate sources LOCAL_MODULE_CLASS := STATIC_LIBRARIES intermediates := $(call local-generated-sources-dir) diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c index 845fc6b34d5c..76115bf8d55d 100644 --- a/src/gallium/auxiliary/util/u_cpu_detect.c +++ b/src/gallium/auxiliary/util/u_cpu_detect.c @@ -59,12 +59,18 @@ #if defined(PIPE_OS_LINUX) #include +#include +#include #endif #ifdef PIPE_OS_UNIX #include #endif +#if defined(PIPE_OS_ANDROID) +#include +#endif + #if defined(PIPE_OS_WINDOWS) #include #if defined(PIPE_CC_MSVC) @@ -294,6 +300,38 @@ PIPE_ALIGN_STACK static inline boolean sse2_has_daz(void) #endif /* X86 or X86_64 */ +#if defined(PIPE_ARCH_ARM) +static void +check_os_arm_support(void) +{ +#if defined(PIPE_OS_ANDROID) + AndroidCpuFamily cpu_family = android_getCpuFamily(); + uint64_t 
cpu_features = android_getCpuFeatures(); + + if (cpu_family == ANDROID_CPU_FAMILY_ARM) { + if (cpu_features & ANDROID_CPU_ARM_FEATURE_NEON) + util_cpu_caps.has_neon = 1; + } +#elif defined(PIPE_OS_LINUX) +Elf32_auxv_t aux; +int fd; + +fd = open("/proc/self/auxv", O_RDONLY | O_CLOEXEC); +if (fd >= 0) { + while (read(fd, , sizeof(Elf32_auxv_t)) == sizeof(Elf32_auxv_t)) { + if (aux.a_type == AT_HWCAP) { + uint32_t hwcap = aux.a_un.a_val; + + util_cpu_caps.has_neon = (hwcap >> 12) & 1; + break; + } + } + close (fd); +} +#endif /* PIPE_OS_LINUX */ +} +#endif /* PIPE_ARCH_ARM */ + void util_cpu_detect(void) { @@ -443,6 +481,10 @@ util_cpu_detect(void) } #endif /* PIPE_ARCH_X86 || PIPE_ARCH_X86_64 */ +#if defined(PIPE_ARCH_ARM) + check_os_arm_support(); +#endif + #if defined(PIPE_ARCH_PPC) check_os_altivec_support(); #endif /* PIPE_ARCH_PPC */ @@ -471,6 +513,7 @@ util_cpu_detect(void) debug_printf("util_cpu_caps.has_3dnow_ext = %u\n", util_cpu_caps.has_3dnow_ext); debug_printf("util_cpu_caps.has_xop = %u\n", util_cpu_caps.has_xop); debug_printf("util_cpu_caps.has_altivec = %u\n", util_cpu_caps.has_altivec); + debug_printf("util_cpu_caps.has_neon = %u\n", util_cpu_caps.has_neon); debug_printf("util_cpu_caps.has_daz = %u\n", util_cpu_caps.has_daz); debug_printf("util_cpu_caps.has_avx512f = %u\n", util_cpu_caps.has_avx512f); debug_printf("util_cpu_caps.has_avx512dq = %u\n", util_cpu_caps.has_avx512dq); diff --git a/src/gallium/auxiliary/util/u_cpu_detect.h b/src/gallium/auxiliary/util/u_cpu_detect.h index 3bd7294f0759..4a34ac4d9a63 100644 --- a/src/gallium/auxiliary/util/u_cpu_detect.h +++ b/src/gallium/auxiliary/util/u_cpu_detect.h @@ -72,6 +72,7 @@ struct util_cpu_caps { unsigned has_xop:1; unsigned has_altivec:1; unsigned has_daz:1; + unsigned has_neon:1; unsigned has_avx512f:1; unsigned has_avx512dq:1; -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
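The Linux path of the patch above can be sketched as a standalone helper. This is a minimal illustration rather than the Mesa code: the names auxv_entry, read_hwcap and hwcap_has_neon are hypothetical, and the entry layout assumes the native-word-size pairs that the kernel writes to /proc/self/auxv. The bit-12 test mirrors the `(hwcap >> 12) & 1` check in the patch and is only meaningful on 32-bit ARM.

```c
#define _GNU_SOURCE   /* for O_CLOEXEC on older toolchains */
#include <assert.h>
#include <elf.h>      /* AT_HWCAP */
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical mirror of one auxiliary-vector entry at native word size. */
struct auxv_entry {
    unsigned long type;
    unsigned long value;
};

/* Scan /proc/self/auxv for AT_HWCAP. Returns true and stores the value in
 * *hwcap when the entry is found, false otherwise. */
static bool read_hwcap(unsigned long *hwcap)
{
    struct auxv_entry aux;
    bool found = false;

    assert(hwcap != NULL);

    int fd = open("/proc/self/auxv", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return false;

    while (read(fd, &aux, sizeof(aux)) == (ssize_t)sizeof(aux)) {
        if (aux.type == AT_HWCAP) {
            *hwcap = aux.value;
            found = true;
            break; /* stop at the first match, as the v3 patch does */
        }
    }
    close(fd);
    return found;
}

/* On 32-bit ARM Linux, bit 12 of AT_HWCAP reports NEON. */
static bool hwcap_has_neon(unsigned long hwcap)
{
    return (hwcap >> 12) & 1;
}
```

A caller would combine the two, for instance `unsigned long h; bool neon = read_hwcap(&h) && hwcap_has_neon(h);`, treating a failed read as "no NEON".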
[Mesa-dev] [PATCH v3 4/4] vc4: Only build the NEON code on arm32.
NEON is sufficiently different on arm64 that we can't just reuse this code. Disable it on arm64 for now. Signed-off-by: Eric Anholt--- src/gallium/drivers/vc4/vc4_tiling_lt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt.c b/src/gallium/drivers/vc4/vc4_tiling_lt.c index c9cbc65e2dbc..7de67b652daa 100644 --- a/src/gallium/drivers/vc4/vc4_tiling_lt.c +++ b/src/gallium/drivers/vc4/vc4_tiling_lt.c @@ -61,7 +61,7 @@ static void vc4_load_utile(void *cpu, void *gpu, uint32_t cpu_stride, uint32_t cpp) { uint32_t gpu_stride = vc4_utile_stride(cpp); -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7 if (gpu_stride == 8) { __asm__ volatile ( /* Load from the GPU in one shot, no interleave, to @@ -118,7 +118,7 @@ vc4_store_utile(void *gpu, void *cpu, uint32_t cpu_stride, uint32_t cpp) { uint32_t gpu_stride = vc4_utile_stride(cpp); -#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) +#if defined(VC4_BUILD_NEON) && defined(__ARM_ARCH) && __ARM_ARCH <= 7 if (gpu_stride == 8) { __asm__ volatile ( /* Load each 8-byte line from cpu-side source, -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 3/4] vc4: Use runtime CPU detection for whether NEON is available.
This will allow Raspbian's ARMv6 builds to take advantage of the new NEON code, and could prevent problems if vc4 ends up getting used on a v7 CPU without NEON. --- src/gallium/drivers/vc4/vc4_screen.c | 3 +++ src/gallium/drivers/vc4/vc4_tiling.h | 25 + 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/src/gallium/drivers/vc4/vc4_screen.c b/src/gallium/drivers/vc4/vc4_screen.c index 9030c4baf4bb..514af808b916 100644 --- a/src/gallium/drivers/vc4/vc4_screen.c +++ b/src/gallium/drivers/vc4/vc4_screen.c @@ -27,6 +27,7 @@ #include "pipe/p_screen.h" #include "pipe/p_state.h" +#include "util/u_cpu_detect.h" #include "util/u_debug.h" #include "util/u_memory.h" #include "util/u_format.h" @@ -627,6 +628,8 @@ vc4_screen_create(int fd) if (!vc4_get_chip_info(screen)) goto fail; +util_cpu_detect(); + slab_create_parent(>transfer_pool, sizeof(struct vc4_transfer), 16); vc4_fence_init(screen); diff --git a/src/gallium/drivers/vc4/vc4_tiling.h b/src/gallium/drivers/vc4/vc4_tiling.h index ba1ad6fb3f7d..31317db7a949 100644 --- a/src/gallium/drivers/vc4/vc4_tiling.h +++ b/src/gallium/drivers/vc4/vc4_tiling.h @@ -27,6 +27,7 @@ #include #include #include "util/macros.h" +#include "util/u_cpu_detect.h" /** Return the width in pixels of a 64-byte microtile. */ static inline uint32_t @@ -83,23 +84,18 @@ void vc4_store_tiled_image(void *dst, uint32_t dst_stride, uint8_t tiling_format, int cpp, const struct pipe_box *box); -/* If we're building for ARMv7 (Pi 2+), assume it has NEON. For Raspbian we - * should extend this to have some runtime detection of being built for ARMv6 - * on a Pi 2+. 
- */ -#if defined(__ARM_ARCH) && __ARM_ARCH == 7 -#define NEON_SUFFIX(x) x ## _neon -#else -#define NEON_SUFFIX(x) x ## _base -#endif - static inline void vc4_load_lt_image(void *dst, uint32_t dst_stride, void *src, uint32_t src_stride, int cpp, const struct pipe_box *box) { -NEON_SUFFIX(vc4_load_lt_image)(dst, dst_stride, src, src_stride, +if (util_cpu_caps.has_neon) { +vc4_load_lt_image_neon(dst, dst_stride, src, src_stride, + cpp, box); +} else { +vc4_load_lt_image_base(dst, dst_stride, src, src_stride, cpp, box); +} } static inline void @@ -107,8 +103,13 @@ vc4_store_lt_image(void *dst, uint32_t dst_stride, void *src, uint32_t src_stride, int cpp, const struct pipe_box *box) { -NEON_SUFFIX(vc4_store_lt_image)(dst, dst_stride, src, src_stride, +if (util_cpu_caps.has_neon) { +vc4_store_lt_image_neon(dst, dst_stride, src, src_stride, +cpp, box); +} else { +vc4_store_lt_image_base(dst, dst_stride, src, src_stride, cpp, box); +} } #undef NEON_SUFFIX -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
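The shape of this change, replacing a compile-time suffix macro with a runtime branch on a detected capability, can be sketched in isolation. Everything below is a stand-in: cpu_caps plays the role of util_cpu_caps, and the two copy routines are placeholders that return a tag so the chosen path is observable. It is not the vc4 tiling code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for util_cpu_caps; only the NEON bit is modelled. */
struct cpu_caps {
    unsigned has_neon:1;
};
static struct cpu_caps cpu_caps;

/* Placeholder "generic" implementation. */
static const char *copy_rows_base(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
    return "base";
}

/* Placeholder "NEON" implementation; a real one would use NEON assembly. */
static const char *copy_rows_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
    return "neon";
}

/* Runtime dispatch: both variants are always built, and the branch picks
 * one based on what was detected at startup. */
static const char *copy_rows(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (cpu_caps.has_neon)
        return copy_rows_neon(dst, src, n);
    return copy_rows_base(dst, src, n);
}
```

The cost is one predictable branch per call, in exchange for a single binary that works on CPUs with and without the feature.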
[Mesa-dev] [PATCH v3 2/4] vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.
Android.mk was setting the flag across the entire driver, so we didn't have non-NEON versions getting built. This was going to be a problem with the next commit, when I start auto-detecting NEON support and use the non-NEON version when appropriate. --- src/gallium/drivers/vc4/Android.mk | 2 -- src/gallium/drivers/vc4/Makefile.am | 6 -- src/gallium/drivers/vc4/Makefile.sources | 1 + src/gallium/drivers/vc4/vc4_tiling_lt_neon.c | 30 4 files changed, 31 insertions(+), 8 deletions(-) create mode 100644 src/gallium/drivers/vc4/vc4_tiling_lt_neon.c diff --git a/src/gallium/drivers/vc4/Android.mk b/src/gallium/drivers/vc4/Android.mk index fdc06744e5ab..de9d5e3f5b3c 100644 --- a/src/gallium/drivers/vc4/Android.mk +++ b/src/gallium/drivers/vc4/Android.mk @@ -25,8 +25,6 @@ include $(LOCAL_PATH)/Makefile.sources include $(CLEAR_VARS) -LOCAL_CFLAGS_arm := -DVC4_BUILD_NEON - LOCAL_SRC_FILES := \ $(C_SOURCES) diff --git a/src/gallium/drivers/vc4/Makefile.am b/src/gallium/drivers/vc4/Makefile.am index b361a0c588a8..0ed49b128b2d 100644 --- a/src/gallium/drivers/vc4/Makefile.am +++ b/src/gallium/drivers/vc4/Makefile.am @@ -41,10 +41,4 @@ libvc4_la_SOURCES = $(C_SOURCES) libvc4_la_LIBADD = $(SIM_LIB) $(VC4_LIBS) libvc4_la_LDFLAGS = $(SIM_LDFLAGS) -noinst_LTLIBRARIES += libvc4_neon.la -libvc4_la_LIBADD += libvc4_neon.la - -libvc4_neon_la_SOURCES = vc4_tiling_lt.c -libvc4_neon_la_CFLAGS = $(AM_CFLAGS) -DVC4_BUILD_NEON - EXTRA_DIST = kernel/README diff --git a/src/gallium/drivers/vc4/Makefile.sources b/src/gallium/drivers/vc4/Makefile.sources index 10de34361260..442d7a561782 100644 --- a/src/gallium/drivers/vc4/Makefile.sources +++ b/src/gallium/drivers/vc4/Makefile.sources @@ -56,6 +56,7 @@ C_SOURCES := \ vc4_state.c \ vc4_tiling.c \ vc4_tiling_lt.c \ + vc4_tiling_lt_neon.c \ vc4_tiling.h \ vc4_uniforms.c \ $() diff --git a/src/gallium/drivers/vc4/vc4_tiling_lt_neon.c b/src/gallium/drivers/vc4/vc4_tiling_lt_neon.c new file mode 100644 index ..7ba66ae4cdf4 --- /dev/null +++ 
b/src/gallium/drivers/vc4/vc4_tiling_lt_neon.c @@ -0,0 +1,30 @@ +/* + * Copyright © 2017 Broadcom + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +/* Wrapper file for building vc4_tiling_lt.c with the "build NEON assembly if + * possible" flag set, since Android.mk doesn't have a way to set CFLAGS for a + * single file. + */ + +#define VC4_BUILD_NEON +#include "vc4_tiling_lt.c" -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] swr: add linux to scons build
On 13 April 2017 at 20:17, George Kyriaziswrote: > Make swr compile for both linux and windows. > --- > src/gallium/drivers/swr/SConscript| 7 +-- > src/gallium/targets/libgl-xlib/SConscript | 2 +- > 2 files changed, 2 insertions(+), 7 deletions(-) > > diff --git a/src/gallium/drivers/swr/SConscript > b/src/gallium/drivers/swr/SConscript > index eca5dba..5e3784b 100644 > --- a/src/gallium/drivers/swr/SConscript > +++ b/src/gallium/drivers/swr/SConscript > @@ -17,11 +17,6 @@ if env['LLVM_VERSION'] < > distutils.version.LooseVersion('3.9'): > env['swr'] = False > Return() > > -if env['platform'] != 'windows': > -print "warning: swr scons build only supports windows: not building swr" > -env['swr'] = False > -Return() > - > env.MSVC2013Compat() > > env = env.Clone() > @@ -205,7 +200,7 @@ envavx2.Append(CPPDEFINES = ['KNOB_ARCH=KNOB_ARCH_AVX2']) > if env['platform'] == 'windows': > envavx2.Append(CCFLAGS = ['/arch:AVX2']) > else: > -envavx2.Append(CCFLAGS = ['-mavx2']) > +envavx2.Append(CCFLAGS = ['-mavx2', '-mfma', '-mbmi2', '-mf16c']) > > swrAVX2 = envavx2.SharedLibrary( > target = 'swrAVX2', > diff --git a/src/gallium/targets/libgl-xlib/SConscript > b/src/gallium/targets/libgl-xlib/SConscript > index d01bb3c..a81ac79 100644 > --- a/src/gallium/targets/libgl-xlib/SConscript > +++ b/src/gallium/targets/libgl-xlib/SConscript > @@ -49,7 +49,7 @@ if env['llvm']: > env.Prepend(LIBS = [llvmpipe]) > > if env['swr']: > -env.Append(CPPDEFINES = 'HAVE_SWR') > +env.Append(CPPDEFINES = 'GALLIUM_SWR') Seems like we want the same fix in src/gallium/targets/osmesa/SConscript. Please squash that alongside a small note in docs/relnotes/17.1.0.html With the above Reviewed-by: Emil Velikov As a follow-up commit can we have $sed -i s/HAVE_/GALLIUM_ src/gallium/targets/libgl-xlib/* && git commit -asm "..." Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC 21/21] anv: Use DRM sync objects for external semaphores when available
--- src/intel/vulkan/anv_batch_chain.c | 69 src/intel/vulkan/anv_device.c | 2 + src/intel/vulkan/anv_private.h | 8 src/intel/vulkan/anv_queue.c | 93 -- 4 files changed, 148 insertions(+), 24 deletions(-) diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c index ec37c81..0f118c8 100644 --- a/src/intel/vulkan/anv_batch_chain.c +++ b/src/intel/vulkan/anv_batch_chain.c @@ -953,6 +953,19 @@ anv_cmd_buffer_add_secondary(struct anv_cmd_buffer *primary, >surface_relocs, 0); } +struct drm_i915_gem_exec_fence { + /** +* User's handle for a dma-fence to wait on or signal. +*/ + __u32 handle; + +#define I915_EXEC_FENCE_WAIT(1<<0) +#define I915_EXEC_FENCE_SIGNAL (1<<1) + __u32 flags; +}; + +#define I915_EXEC_FENCE_ARRAY (1<<19) + struct anv_execbuf { struct drm_i915_gem_execbuffer2 execbuf; @@ -962,6 +975,10 @@ struct anv_execbuf { /* Allocated length of the 'objects' and 'bos' arrays */ uint32_t array_length; + + uint32_t fence_count; + uint32_t fence_array_length; + struct drm_i915_gem_exec_fence * fences; }; static void @@ -976,6 +993,7 @@ anv_execbuf_finish(struct anv_execbuf *exec, { vk_free(alloc, exec->objects); vk_free(alloc, exec->bos); + vk_free(alloc, exec->fences); } static VkResult @@ -1061,6 +1079,35 @@ anv_execbuf_add_bo(struct anv_execbuf *exec, return VK_SUCCESS; } +static VkResult +anv_execbuf_add_syncobj(struct anv_execbuf *exec, +uint32_t handle, +uint32_t flags, +const VkAllocationCallbacks *alloc) +{ + if (exec->fence_count >= exec->fence_array_length) { + uint32_t new_len = MAX2(exec->fence_array_length * 2, 64); + + struct drm_i915_gem_exec_fence *new_fences = + vk_realloc(alloc, exec->fences, new_len * sizeof(*new_fences), +8, VK_SYSTEM_ALLOCATION_SCOPE_COMMAND); + if (new_fences == NULL) + return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY); + + exec->fences = new_fences; + exec->fence_array_length = new_len; + } + + exec->fences[exec->fence_count] = (struct drm_i915_gem_exec_fence) { + .handle = handle, + .flags = flags, + }; 
+ + exec->fence_count++; + + return VK_SUCCESS; +} + static void anv_cmd_buffer_process_relocs(struct anv_cmd_buffer *cmd_buffer, struct anv_reloc_list *list) @@ -1447,6 +1494,14 @@ anv_cmd_buffer_execbuf(struct anv_device *device, impl->fd = -1; break; + case ANV_SEMAPHORE_TYPE_DRM_SYNCOBJ: + result = anv_execbuf_add_syncobj(, impl->syncobj, + I915_EXEC_FENCE_WAIT, + >alloc); + if (result != VK_SUCCESS) +return result; + break; + default: break; } @@ -1481,6 +1536,14 @@ anv_cmd_buffer_execbuf(struct anv_device *device, need_out_fence = true; break; + case ANV_SEMAPHORE_TYPE_DRM_SYNCOBJ: + result = anv_execbuf_add_syncobj(, impl->syncobj, + I915_EXEC_FENCE_SIGNAL, + >alloc); + if (result != VK_SUCCESS) +return result; + break; + default: break; } @@ -1494,6 +1557,12 @@ anv_cmd_buffer_execbuf(struct anv_device *device, setup_empty_execbuf(, device); } + if (execbuf.fence_count > 0) { + execbuf.execbuf.flags |= I915_EXEC_FENCE_ARRAY; + execbuf.execbuf.num_cliprects = execbuf.fence_count; + execbuf.execbuf.cliprects_ptr = (uintptr_t) execbuf.fences; + } + if (in_fence != -1) { execbuf.execbuf.flags |= I915_EXEC_FENCE_IN; execbuf.execbuf.rsvd2 |= (uint32_t)in_fence; diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index f853905..13d01d1 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -233,6 +233,8 @@ anv_physical_device_init(struct anv_physical_device *device, device->has_exec_async = anv_gem_get_param(fd, I915_PARAM_HAS_EXEC_ASYNC); device->has_exec_fence = anv_gem_get_param(fd, I915_PARAM_HAS_EXEC_FENCE); + device->has_syncobj = + anv_gem_get_param(fd, 47 /* I915_PARAM_HAS_EXEC_FENCE_ARRAY */); bool swizzled = anv_gem_get_bit6_swizzle(fd, I915_TILING_X); diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index d1406ab..0731e89 100644 --- a/src/intel/vulkan/anv_private.h +++ b/src/intel/vulkan/anv_private.h @@ -648,6 +648,7 @@ struct anv_physical_device { int cmd_parser_version; bool
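The array growth in anv_execbuf_add_syncobj above follows a common doubling pattern, sketched here outside the driver. The names fence_entry, fence_list and fence_list_add are hypothetical, and plain realloc stands in for vk_realloc with its allocation callbacks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of a (handle, flags) fence record. */
struct fence_entry {
    uint32_t handle;
    uint32_t flags;
};

struct fence_list {
    struct fence_entry *fences;
    uint32_t count;
    uint32_t capacity;
};

/* Append one entry, doubling the capacity (minimum 64) when full.
 * Returns 0 on success and -1 if the allocation fails; on failure the
 * existing array is left untouched and still owned by the list. */
static int fence_list_add(struct fence_list *list,
                          uint32_t handle, uint32_t flags)
{
    assert(list != NULL);

    if (list->count >= list->capacity) {
        uint32_t new_cap = list->capacity * 2;
        if (new_cap < 64)
            new_cap = 64;

        struct fence_entry *new_fences =
            realloc(list->fences, new_cap * sizeof(*new_fences));
        if (new_fences == NULL)
            return -1;

        list->fences = new_fences;
        list->capacity = new_cap;
    }

    list->fences[list->count++] = (struct fence_entry) {
        .handle = handle,
        .flags = flags,
    };
    return 0;
}
```

Starting from a zero-initialized list, the first append allocates 64 slots and the 65th grows the array to 128. The caller frees list->fences when done, much as anv_execbuf_finish frees exec->fences.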
[Mesa-dev] [RFC 20/21] anv/gem: Add a drm syncobj support
---
 src/intel/vulkan/anv_gem.c       | 79 ++++++++++++++++++++++++++++++++++++++++
 src/intel/vulkan/anv_gem_stubs.c | 24 ++++++++++++
 src/intel/vulkan/anv_private.h   |  4 ++
 3 files changed, 107 insertions(+)

diff --git a/src/intel/vulkan/anv_gem.c b/src/intel/vulkan/anv_gem.c
index e331fbb..6db15ba 100644
--- a/src/intel/vulkan/anv_gem.c
+++ b/src/intel/vulkan/anv_gem.c
@@ -449,3 +449,82 @@ anv_gem_sync_file_merge(struct anv_device *device, int fd1, int fd2)
 
    return args.fence;
 }
+
+#define DRM_IOCTL_SYNCOBJ_CREATE DRM_IOWR(0xBF, struct drm_syncobj_create_info)
+#define DRM_IOCTL_SYNCOBJ_DESTROY DRM_IOWR(0xC0, struct drm_syncobj_destroy)
+#define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD DRM_IOWR(0xC1, struct drm_syncobj_handle)
+#define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE DRM_IOWR(0xC2, struct drm_syncobj_handle)
+#define DRM_IOCTL_SYNCOBJ_INFO DRM_IOWR(0xC3, struct drm_syncobj_create_info)
+
+struct drm_syncobj_create_info {
+   __u32 handle;
+   __u32 type;
+   __u32 flags;
+   __u32 pad;
+};
+
+struct drm_syncobj_destroy {
+   __u32 handle;
+   __u32 pad;
+};
+
+struct drm_syncobj_handle {
+   __u32 handle;
+   /** Flags.. only applicable for handle->fd */
+   __u32 flags;
+
+   __s32 fd;
+};
+
+uint32_t
+anv_gem_syncobj_create(struct anv_device *device)
+{
+   struct drm_syncobj_create_info args = {
+      .type = 1 /* SYNC_FILE_TYPE_SEMAPHORE */,
+      .flags = 0,
+   };
+
+   int ret = anv_ioctl(device->fd, DRM_IOCTL_SYNCOBJ_CREATE, &args);
+   if (ret)
+      return 0;
+
+   return args.handle;
+}
+
+void
+anv_gem_syncobj_close(struct anv_device *device, uint32_t handle)
+{
+   struct drm_syncobj_destroy args = {
+      .handle = handle,
+   };
+
+   anv_ioctl(device->fd, DRM_IOCTL_SYNCOBJ_DESTROY, &args);
+}
+
+int
+anv_gem_syncobj_handle_to_fd(struct anv_device *device, uint32_t handle)
+{
+   struct drm_syncobj_handle args = {
+      .handle = handle,
+   };
+
+   int ret = anv_ioctl(device->fd, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, &args);
+   if (ret)
+      return -1;
+
+   return args.fd;
+}
+
+uint32_t
+anv_gem_syncobj_fd_to_handle(struct anv_device *device, int fd)
+{
+   struct drm_syncobj_handle args = {
+      .fd = fd,
+   };
+
+   int ret = anv_ioctl(device->fd, DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, &args);
+   if (ret)
+      return 0;
+
+   return args.handle;
+}
diff --git a/src/intel/vulkan/anv_gem_stubs.c b/src/intel/vulkan/anv_gem_stubs.c
index d93009f..e3998b9 100644
--- a/src/intel/vulkan/anv_gem_stubs.c
+++ b/src/intel/vulkan/anv_gem_stubs.c
@@ -187,3 +187,27 @@ anv_gem_fd_to_handle(struct anv_device *device, int fd)
 {
    unreachable("Unused");
 }
+
+uint32_t
+anv_gem_syncobj_create(struct anv_device *device)
+{
+   unreachable("Unused");
+}
+
+void
+anv_gem_syncobj_close(struct anv_device *device, uint32_t handle)
+{
+   unreachable("Unused");
+}
+
+int
+anv_gem_syncobj_handle_to_fd(struct anv_device *device, uint32_t handle)
+{
+   unreachable("Unused");
+}
+
+uint32_t
+anv_gem_syncobj_fd_to_handle(struct anv_device *device, int fd)
+{
+   unreachable("Unused");
+}
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index b99c93c..d1406ab 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -802,6 +802,10 @@ int anv_gem_set_domain(struct anv_device *device, uint32_t gem_handle,
 int anv_gem_sync_file_merge(struct anv_device *device, int fd1, int fd2);
 int anv_gem_set_context_param(struct anv_device *device,
                               uint64_t param, uint64_t value);
+uint32_t anv_gem_syncobj_create(struct anv_device *device);
+void anv_gem_syncobj_close(struct anv_device *device, uint32_t handle);
+int anv_gem_syncobj_handle_to_fd(struct anv_device *device, uint32_t handle);
+uint32_t anv_gem_syncobj_fd_to_handle(struct anv_device *device, int fd);
 
 VkResult anv_bo_init_new(struct anv_bo *bo,
                          struct anv_device *device, uint64_t size);
-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 17/21] anv/gem: Use EXECBUFFER2_WR when the FENCE_OUT flag is set
---
 src/intel/vulkan/anv_gem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_gem.c b/src/intel/vulkan/anv_gem.c
index 185086f..1392bf4 100644
--- a/src/intel/vulkan/anv_gem.c
+++ b/src/intel/vulkan/anv_gem.c
@@ -185,7 +185,10 @@ int
 anv_gem_execbuffer(struct anv_device *device,
                    struct drm_i915_gem_execbuffer2 *execbuf)
 {
-   return anv_ioctl(device->fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf);
+   if (execbuf->flags & I915_EXEC_FENCE_OUT)
+      return anv_ioctl(device->fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, execbuf);
+   else
+      return anv_ioctl(device->fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf);
 }
 
 int
-- 
2.5.0.400.gff86faf
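As a reading aid for the patch above, here is a minimal, self-contained
sketch of the request selection it performs. The read-write (_WR) variant
is needed when an out-fence is requested because the kernel then has to
write the fence fd back into the execbuffer struct. All names and values
below (the sketch_ prefix, the flag bit, the enum) are hypothetical
stand-ins, not the real i915 uAPI definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for I915_EXEC_FENCE_OUT; the bit position is
 * illustrative only and is not taken from the kernel headers. */
#define SKETCH_EXEC_FENCE_OUT (1u << 17)

/* Hypothetical stand-ins for the two ioctl request numbers. */
enum sketch_request {
   SKETCH_EXECBUFFER2,     /* write-only: kernel does not copy the struct back */
   SKETCH_EXECBUFFER2_WR,  /* read-write: kernel can return data in the struct */
};

/* Pick the read-write request only when the caller asked for an out-fence,
 * mirroring the if/else added to anv_gem_execbuffer(). */
static enum sketch_request
sketch_pick_execbuf_request(uint64_t flags)
{
   return (flags & SKETCH_EXEC_FENCE_OUT) ? SKETCH_EXECBUFFER2_WR
                                           : SKETCH_EXECBUFFER2;
}
```

Keeping the plain request for the common case means submissions that do
not ask for a fence behave exactly as before.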
[Mesa-dev] [PATCH 16/21] anv: Implement VK_KHX_external_semaphore_fd
This implementation allocates a 4k BO for each semaphore that can be exported using OPAQUE_FD and uses the kernel's already-existing synchronization mechanism on BOs. --- src/intel/vulkan/anv_batch_chain.c | 53 ++-- src/intel/vulkan/anv_device.c | 4 + src/intel/vulkan/anv_entrypoints_gen.py | 1 + src/intel/vulkan/anv_private.h | 16 +++- src/intel/vulkan/anv_queue.c| 141 ++-- 5 files changed, 199 insertions(+), 16 deletions(-) diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c index 136f273..0529f22 100644 --- a/src/intel/vulkan/anv_batch_chain.c +++ b/src/intel/vulkan/anv_batch_chain.c @@ -982,6 +982,7 @@ static VkResult anv_execbuf_add_bo(struct anv_execbuf *exec, struct anv_bo *bo, struct anv_reloc_list *relocs, + uint32_t extra_flags, const VkAllocationCallbacks *alloc) { struct drm_i915_gem_exec_object2 *obj = NULL; @@ -1036,7 +1037,7 @@ anv_execbuf_add_bo(struct anv_execbuf *exec, obj->relocs_ptr = 0; obj->alignment = 0; obj->offset = bo->offset; - obj->flags = bo->flags; + obj->flags = bo->flags | extra_flags; obj->rsvd1 = 0; obj->rsvd2 = 0; } @@ -1052,7 +1053,8 @@ anv_execbuf_add_bo(struct anv_execbuf *exec, for (size_t i = 0; i < relocs->num_relocs; i++) { /* A quick sanity check on relocations */ assert(relocs->relocs[i].offset < bo->size); - anv_execbuf_add_bo(exec, relocs->reloc_bos[i], NULL, alloc); + anv_execbuf_add_bo(exec, relocs->reloc_bos[i], NULL, +extra_flags, alloc); } } @@ -1261,7 +1263,7 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf, adjust_relocations_from_state_pool(ss_pool, _buffer->surface_relocs, cmd_buffer->last_ss_pool_center); VkResult result = - anv_execbuf_add_bo(execbuf, _pool->bo, _buffer->surface_relocs, + anv_execbuf_add_bo(execbuf, _pool->bo, _buffer->surface_relocs, 0, _buffer->device->alloc); if (result != VK_SUCCESS) return result; @@ -1274,7 +1276,7 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf, adjust_relocations_to_state_pool(ss_pool, &(*bbo)->bo, 
&(*bbo)->relocs, cmd_buffer->last_ss_pool_center); - result = anv_execbuf_add_bo(execbuf, &(*bbo)->bo, &(*bbo)->relocs, + result = anv_execbuf_add_bo(execbuf, &(*bbo)->bo, &(*bbo)->relocs, 0, _buffer->device->alloc); if (result != VK_SUCCESS) return result; @@ -1387,12 +1389,51 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf, VkResult anv_cmd_buffer_execbuf(struct anv_device *device, - struct anv_cmd_buffer *cmd_buffer) + struct anv_cmd_buffer *cmd_buffer, + const VkSemaphore *in_semaphores, + uint32_t num_in_semaphores, + const VkSemaphore *out_semaphores, + uint32_t num_out_semaphores) { struct anv_execbuf execbuf; anv_execbuf_init(); - VkResult result = setup_execbuf_for_cmd_buffer(, cmd_buffer); + VkResult result = VK_SUCCESS; + for (uint32_t i = 0; i < num_in_semaphores; i++) { + ANV_FROM_HANDLE(anv_semaphore, semaphore, in_semaphores[i]); + assert(semaphore->temporary.type == ANV_SEMAPHORE_TYPE_NONE); + struct anv_semaphore_impl *impl = >permanent; + + switch (impl->type) { + case ANV_SEMAPHORE_TYPE_BO: + result = anv_execbuf_add_bo(, impl->bo, NULL, + 0, >alloc); + if (result != VK_SUCCESS) +return result; + break; + default: + break; + } + } + + for (uint32_t i = 0; i < num_out_semaphores; i++) { + ANV_FROM_HANDLE(anv_semaphore, semaphore, out_semaphores[i]); + assert(semaphore->temporary.type == ANV_SEMAPHORE_TYPE_NONE); + struct anv_semaphore_impl *impl = >permanent; + + switch (impl->type) { + case ANV_SEMAPHORE_TYPE_BO: + result = anv_execbuf_add_bo(, impl->bo, NULL, + EXEC_OBJECT_WRITE, >alloc); + if (result != VK_SUCCESS) +return result; + break; + default: + break; + } + } + + result = setup_execbuf_for_cmd_buffer(, cmd_buffer); if (result != VK_SUCCESS) return result; diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index b85cd40..f6e77ab 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -378,6 +378,10 @@ static const VkExtensionProperties device_extensions[] = { .extensionName = 
VK_KHX_EXTERNAL_SEMAPHORE_EXTENSION_NAME, .specVersion = 1, }, + { +
[Mesa-dev] [PATCH 18/21] anv: Implement support for exporting semaphores as FENCE_FD
--- src/intel/vulkan/anv_batch_chain.c | 96 -- src/intel/vulkan/anv_device.c | 25 ++ src/intel/vulkan/anv_gem.c | 36 ++ src/intel/vulkan/anv_private.h | 24 +++--- src/intel/vulkan/anv_queue.c | 73 +++-- 5 files changed, 240 insertions(+), 14 deletions(-) diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c index 0529f22..ec37c81 100644 --- a/src/intel/vulkan/anv_batch_chain.c +++ b/src/intel/vulkan/anv_batch_chain.c @@ -1387,6 +1387,23 @@ setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf, return VK_SUCCESS; } +static void +setup_empty_execbuf(struct anv_execbuf *execbuf, struct anv_device *device) +{ + anv_execbuf_add_bo(execbuf, >trivial_batch_bo, NULL, 0, + >alloc); + + execbuf->execbuf = (struct drm_i915_gem_execbuffer2) { + .buffers_ptr = (uintptr_t) execbuf->objects, + .buffer_count = execbuf->bo_count, + .batch_start_offset = 0, + .batch_len = 8, /* GEN8_MI_BATCH_BUFFER_END and NOOP */ + .flags = I915_EXEC_HANDLE_LUT | I915_EXEC_RENDER, + .rsvd1 = device->context_id, + .rsvd2 = 0, + }; +} + VkResult anv_cmd_buffer_execbuf(struct anv_device *device, struct anv_cmd_buffer *cmd_buffer, @@ -1398,11 +1415,13 @@ anv_cmd_buffer_execbuf(struct anv_device *device, struct anv_execbuf execbuf; anv_execbuf_init(); + int in_fence = -1; VkResult result = VK_SUCCESS; for (uint32_t i = 0; i < num_in_semaphores; i++) { ANV_FROM_HANDLE(anv_semaphore, semaphore, in_semaphores[i]); - assert(semaphore->temporary.type == ANV_SEMAPHORE_TYPE_NONE); - struct anv_semaphore_impl *impl = >permanent; + struct anv_semaphore_impl *impl = + semaphore->temporary.type != ANV_SEMAPHORE_TYPE_NONE ? 
+ >temporary : >permanent; switch (impl->type) { case ANV_SEMAPHORE_TYPE_BO: @@ -1411,13 +1430,42 @@ anv_cmd_buffer_execbuf(struct anv_device *device, if (result != VK_SUCCESS) return result; break; + + case ANV_SEMAPHORE_TYPE_SYNC_FILE: + if (in_fence == -1) { +in_fence = impl->fd; + } else { +int merge = anv_gem_sync_file_merge(device, in_fence, impl->fd); +if (merge == -1) + return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX); + +close(impl->fd); +close(in_fence); +in_fence = merge; + } + + impl->fd = -1; + break; + default: break; } + + /* Waiting on a semaphore with temporary state implicitly resets it back + * to the permanent state. + */ + if (semaphore->temporary.type != ANV_SEMAPHORE_TYPE_NONE) { + assert(semaphore->temporary.type == ANV_SEMAPHORE_TYPE_SYNC_FILE); + semaphore->temporary.type = ANV_SEMAPHORE_TYPE_NONE; + } } + bool need_out_fence = false; for (uint32_t i = 0; i < num_out_semaphores; i++) { ANV_FROM_HANDLE(anv_semaphore, semaphore, out_semaphores[i]); + /* Out fences can't have temporary state because that would imply + * that we imported a sync file and are trying to signal it. 
+ */ assert(semaphore->temporary.type == ANV_SEMAPHORE_TYPE_NONE); struct anv_semaphore_impl *impl = >permanent; @@ -1428,17 +1476,55 @@ anv_cmd_buffer_execbuf(struct anv_device *device, if (result != VK_SUCCESS) return result; break; + + case ANV_SEMAPHORE_TYPE_SYNC_FILE: + need_out_fence = true; + break; + default: break; } } - result = setup_execbuf_for_cmd_buffer(, cmd_buffer); - if (result != VK_SUCCESS) - return result; + if (cmd_buffer) { + result = setup_execbuf_for_cmd_buffer(, cmd_buffer); + if (result != VK_SUCCESS) + return result; + } else { + setup_empty_execbuf(, device); + } + + if (in_fence != -1) { + execbuf.execbuf.flags |= I915_EXEC_FENCE_IN; + execbuf.execbuf.rsvd2 |= (uint32_t)in_fence; + } + + if (need_out_fence) + execbuf.execbuf.flags |= I915_EXEC_FENCE_OUT; result = anv_device_execbuf(device, , execbuf.bos); + /* Execbuf does not consume the in_fence. It's our job to close it. */ + close(in_fence); + + if (result == VK_SUCCESS && need_out_fence) { + int out_fence = execbuf.execbuf.rsvd2 >> 32; + for (uint32_t i = 0; i < num_out_semaphores; i++) { + ANV_FROM_HANDLE(anv_semaphore, semaphore, out_semaphores[i]); + /* Out fences can't have temporary state because that would imply + * that we imported a sync file and are trying to signal it. + */ + assert(semaphore->temporary.type == ANV_SEMAPHORE_TYPE_NONE); + struct anv_semaphore_impl *impl = >permanent; + + if (impl->type ==
[Mesa-dev] [HACK 19/21] anv: Set context priorities based on queue priorities
This patch will never be committed because Vulkan queue priorities are
supposed to be local to the device and not cross process boundaries.
---
 src/intel/vulkan/anv_device.c    | 12 ++++++++++++
 src/intel/vulkan/anv_gem.c       | 13 +++++++++++++
 src/intel/vulkan/anv_gem_stubs.c |  7 +++++++
 src/intel/vulkan/anv_private.h   |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 2885bb6..f853905 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1068,6 +1068,18 @@ VkResult anv_CreateDevice(
       goto fail_fd;
    }
 
+   if (pCreateInfo->pQueueCreateInfos &&
+       pCreateInfo->pQueueCreateInfos->pQueuePriorities) {
+      float priority = *pCreateInfo->pQueueCreateInfos->pQueuePriorities;
+      int kernel_priority = 1023 * priority - 1023;
+      int ret = anv_gem_set_context_param(device, 6, kernel_priority);
+      if (ret == -1) {
+         result = vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
+                            "Setting I915_CONTEXT_PARAM_PRIORITY failed: %m");
+         goto fail_fd;
+      }
+   }
+
    device->info = physical_device->info;
    device->isl_dev = physical_device->isl_dev;
 
diff --git a/src/intel/vulkan/anv_gem.c b/src/intel/vulkan/anv_gem.c
index ffdc5a1..e331fbb 100644
--- a/src/intel/vulkan/anv_gem.c
+++ b/src/intel/vulkan/anv_gem.c
@@ -231,6 +231,19 @@ anv_gem_get_param(int fd, uint32_t param)
    return 0;
 }
 
+int
+anv_gem_set_context_param(struct anv_device *device,
+                          uint64_t param, uint64_t value)
+{
+   struct drm_i915_gem_context_param args = {
+      .ctx_id = device->context_id,
+      .param = param,
+      .value = value,
+   };
+
+   return anv_ioctl(device->fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &args);
+}
+
 bool
 anv_gem_get_bit6_swizzle(int fd, uint32_t tiling)
 {
diff --git a/src/intel/vulkan/anv_gem_stubs.c b/src/intel/vulkan/anv_gem_stubs.c
index a63e96d..d93009f 100644
--- a/src/intel/vulkan/anv_gem_stubs.c
+++ b/src/intel/vulkan/anv_gem_stubs.c
@@ -126,6 +126,13 @@ anv_gem_get_param(int fd, uint32_t param)
    unreachable("Unused");
 }
 
+int
+anv_gem_set_context_param(struct anv_device *device,
+                          uint64_t param, uint64_t value)
+{
+   unreachable("Unused");
+}
+
 bool
 anv_gem_get_bit6_swizzle(int fd, uint32_t tiling)
 {
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index a083a07..b99c93c 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -800,6 +800,8 @@ int anv_gem_set_caching(struct anv_device *device, uint32_t gem_handle, uint32_t
 int anv_gem_set_domain(struct anv_device *device, uint32_t gem_handle,
                        uint32_t read_domains, uint32_t write_domain);
 int anv_gem_sync_file_merge(struct anv_device *device, int fd1, int fd2);
+int anv_gem_set_context_param(struct anv_device *device,
+                              uint64_t param, uint64_t value);
 
 VkResult anv_bo_init_new(struct anv_bo *bo,
                          struct anv_device *device, uint64_t size);
-- 
2.5.0.400.gff86faf
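For readers checking the arithmetic in the hack above: Vulkan queue
priorities are floats in [0.0, 1.0], and the expression
1023 * priority - 1023 maps that range onto [-1023, 0], so the highest
Vulkan priority corresponds to a kernel priority of 0 and everything else
is lower. A small sketch of just that mapping follows; the function name
is hypothetical, and only the expression itself comes from the patch.

```c
/* Hypothetical helper wrapping the expression used in anv_CreateDevice().
 * The float result is truncated toward zero when converted to int, exactly
 * as in the implicit conversion in the patch. */
static int
sketch_queue_priority_to_kernel(float priority)
{
   return (int)(1023 * priority - 1023);
}
```

With this mapping a priority of 1.0 gives 0, a priority of 0.0 gives
-1023, and 0.5 gives -511 (the intermediate value -511.5 truncates toward
zero).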
[Mesa-dev] [PATCH 15/21] anv: Pull the guts of cmd_buffer_execbuf into a helper
--- src/intel/vulkan/anv_batch_chain.c | 59 ++ 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c index 3e9fa4c..136f273 100644 --- a/src/intel/vulkan/anv_batch_chain.c +++ b/src/intel/vulkan/anv_batch_chain.c @@ -1250,22 +1250,19 @@ relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer, return true; } -VkResult -anv_cmd_buffer_execbuf(struct anv_device *device, - struct anv_cmd_buffer *cmd_buffer) +static VkResult +setup_execbuf_for_cmd_buffer(struct anv_execbuf *execbuf, + struct anv_cmd_buffer *cmd_buffer) { struct anv_batch *batch = _buffer->batch; struct anv_block_pool *ss_pool = _buffer->device->surface_state_block_pool; - struct anv_execbuf execbuf; - anv_execbuf_init(); - adjust_relocations_from_state_pool(ss_pool, _buffer->surface_relocs, cmd_buffer->last_ss_pool_center); VkResult result = - anv_execbuf_add_bo(, _pool->bo, _buffer->surface_relocs, - >alloc); + anv_execbuf_add_bo(execbuf, _pool->bo, _buffer->surface_relocs, + _buffer->device->alloc); if (result != VK_SUCCESS) return result; @@ -1277,8 +1274,8 @@ anv_cmd_buffer_execbuf(struct anv_device *device, adjust_relocations_to_state_pool(ss_pool, &(*bbo)->bo, &(*bbo)->relocs, cmd_buffer->last_ss_pool_center); - result = anv_execbuf_add_bo(, &(*bbo)->bo, &(*bbo)->relocs, - >alloc); + result = anv_execbuf_add_bo(execbuf, &(*bbo)->bo, &(*bbo)->relocs, + _buffer->device->alloc); if (result != VK_SUCCESS) return result; } @@ -1297,19 +1294,19 @@ anv_cmd_buffer_execbuf(struct anv_device *device, * corresponding to the first batch_bo in the chain with the last * element in the list. 
*/ - if (first_batch_bo->bo.index != execbuf.bo_count - 1) { + if (first_batch_bo->bo.index != execbuf->bo_count - 1) { uint32_t idx = first_batch_bo->bo.index; - uint32_t last_idx = execbuf.bo_count - 1; + uint32_t last_idx = execbuf->bo_count - 1; - struct drm_i915_gem_exec_object2 tmp_obj = execbuf.objects[idx]; - assert(execbuf.bos[idx] == _batch_bo->bo); + struct drm_i915_gem_exec_object2 tmp_obj = execbuf->objects[idx]; + assert(execbuf->bos[idx] == _batch_bo->bo); - execbuf.objects[idx] = execbuf.objects[last_idx]; - execbuf.bos[idx] = execbuf.bos[last_idx]; - execbuf.bos[idx]->index = idx; + execbuf->objects[idx] = execbuf->objects[last_idx]; + execbuf->bos[idx] = execbuf->bos[last_idx]; + execbuf->bos[idx]->index = idx; - execbuf.objects[last_idx] = tmp_obj; - execbuf.bos[last_idx] = _batch_bo->bo; + execbuf->objects[last_idx] = tmp_obj; + execbuf->bos[last_idx] = _batch_bo->bo; first_batch_bo->bo.index = last_idx; } @@ -1330,9 +1327,9 @@ anv_cmd_buffer_execbuf(struct anv_device *device, } } - execbuf.execbuf = (struct drm_i915_gem_execbuffer2) { - .buffers_ptr = (uintptr_t) execbuf.objects, - .buffer_count = execbuf.bo_count, + execbuf->execbuf = (struct drm_i915_gem_execbuffer2) { + .buffers_ptr = (uintptr_t) execbuf->objects, + .buffer_count = execbuf->bo_count, .batch_start_offset = 0, .batch_len = batch->next - batch->start, .cliprects_ptr = 0, @@ -1345,7 +1342,7 @@ anv_cmd_buffer_execbuf(struct anv_device *device, .rsvd2 = 0, }; - if (relocate_cmd_buffer(cmd_buffer, )) { + if (relocate_cmd_buffer(cmd_buffer, execbuf)) { /* If we were able to successfully relocate everything, tell the kernel * that it can skip doing relocations. The requirement for using * NO_RELOC is: @@ -1370,7 +1367,7 @@ anv_cmd_buffer_execbuf(struct anv_device *device, * the RENDER_SURFACE_STATE matches presumed_offset, so it should be * safe for the kernel to relocate them as needed. 
*/ - execbuf.execbuf.flags |= I915_EXEC_NO_RELOC; + execbuf->execbuf.flags |= I915_EXEC_NO_RELOC; } else { /* In the case where we fall back to doing kernel relocations, we need * to ensure that the relocation list is valid. All relocations on the @@ -1385,6 +1382,20 @@ anv_cmd_buffer_execbuf(struct anv_device *device, cmd_buffer->surface_relocs.relocs[i].presumed_offset = -1; } + return VK_SUCCESS; +} + +VkResult +anv_cmd_buffer_execbuf(struct anv_device *device, + struct anv_cmd_buffer *cmd_buffer) +{ + struct anv_execbuf execbuf; + anv_execbuf_init(); + + VkResult result = setup_execbuf_for_cmd_buffer(, cmd_buffer); + if (result != VK_SUCCESS) + return result; + result = anv_device_execbuf(device, , execbuf.bos);
[Mesa-dev] [PATCH 14/21] anv: Implement VK_KHX_external_semaphore
---
 src/intel/vulkan/anv_device.c           | 4 ++++
 src/intel/vulkan/anv_entrypoints_gen.py | 1 +
 src/intel/vulkan/anv_queue.c            | 8 ++++++++
 3 files changed, 13 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 41e0fb3..b85cd40 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -374,6 +374,10 @@ static const VkExtensionProperties device_extensions[] = {
       .extensionName = VK_KHX_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
       .specVersion = 1,
    },
+   {
+      .extensionName = VK_KHX_EXTERNAL_SEMAPHORE_EXTENSION_NAME,
+      .specVersion = 1,
+   },
 };
 
 static void *
diff --git a/src/intel/vulkan/anv_entrypoints_gen.py b/src/intel/vulkan/anv_entrypoints_gen.py
index 5ad0f26..cfa9d68 100644
--- a/src/intel/vulkan/anv_entrypoints_gen.py
+++ b/src/intel/vulkan/anv_entrypoints_gen.py
@@ -48,6 +48,7 @@ SUPPORTED_EXTENSIONS = [
     'VK_KHX_external_memory',
     'VK_KHX_external_memory_capabilities',
     'VK_KHX_external_memory_fd',
+    'VK_KHX_external_semaphore',
     'VK_KHX_external_semaphore_capabilities',
 ]
 
diff --git a/src/intel/vulkan/anv_queue.c b/src/intel/vulkan/anv_queue.c
index 906eb25..64c5900 100644
--- a/src/intel/vulkan/anv_queue.c
+++ b/src/intel/vulkan/anv_queue.c
@@ -508,6 +508,14 @@ VkResult anv_CreateSemaphore(
    if (semaphore == NULL)
       return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
 
+   const VkExportSemaphoreCreateInfoKHX *export =
+      vk_find_struct_const(pCreateInfo->pNext, EXPORT_SEMAPHORE_CREATE_INFO_KHX);
+   VkExternalSemaphoreHandleTypeFlagsKHX handleTypes =
+      export ? export->handleTypes : 0;
+
+   /* External semaphores are not yet supported */
+   assert(handleTypes == 0);
+
    /* The DRM execbuffer ioctl always execute in-oder, even between
     * different rings. As such, a dummy no-op semaphore is a perfectly
     * valid implementation.
-- 
2.5.0.400.gff86faf
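The vk_find_struct_const() call in the patch above looks up an extension
struct in the pNext chain of the create info. The following is only a
sketch of that general idea, assuming a simplified chain header; the
sketch_ types, the enum values and the helper names are hypothetical and
are not Mesa's or Vulkan's actual definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the sType/pNext header that every
 * chained Vulkan struct starts with. */
struct sketch_base_in {
   int sType;
   const struct sketch_base_in *pNext;
};

enum { SKETCH_TYPE_CREATE_INFO = 1, SKETCH_TYPE_EXPORT_INFO = 2 };

/* Hypothetical extension struct; the header is embedded as the first
 * member so a pointer to the header is also a pointer to the struct. */
struct sketch_export_info {
   struct sketch_base_in base;
   unsigned handle_types;
};

/* Walk the chain and return the first struct with a matching sType,
 * or NULL if the chain does not contain one. */
static const void *
sketch_find_struct(const void *start, int sType)
{
   for (const struct sketch_base_in *s = start; s != NULL; s = s->pNext) {
      if (s->sType == sType)
         return s;
   }
   return NULL;
}

/* Same shape as the patch: a missing struct is treated as "no handle
 * types requested". */
static unsigned
sketch_export_handle_types(const void *pNext)
{
   const struct sketch_export_info *export_info =
      sketch_find_struct(pNext, SKETCH_TYPE_EXPORT_INFO);
   return export_info ? export_info->handle_types : 0;
}
```

Treating an absent struct as zero is what lets the patch keep asserting
handleTypes == 0 until real export support lands.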
[Mesa-dev] [PATCH 12/21] anv: Add a real semaphore struct
It's just a dummy for now, but we'll flesh it out as needed for external
semaphores.
---
 src/intel/vulkan/anv_private.h | 28 ++++++++++++++++++++++++++++
 src/intel/vulkan/anv_queue.c   | 32 ++++++++++++++++++++++++++------
 2 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 898f0cf..5cbb0c5 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1706,6 +1706,33 @@ struct anv_event {
    struct anv_state state;
 };
 
+enum anv_semaphore_type {
+   ANV_SEMAPHORE_TYPE_NONE = 0,
+   ANV_SEMAPHORE_TYPE_DUMMY
+};
+
+struct anv_semaphore_impl {
+   enum anv_semaphore_type type;
+};
+
+struct anv_semaphore {
+   /* Permanent semaphore state. Every semaphore has some form of permanent
+    * state (type != ANV_SEMAPHORE_TYPE_NONE). This may be a BO to fence on
+    * (for cross-process semaphores0 or it could just be a dummy for use
+    * internally.
+    */
+   struct anv_semaphore_impl permanent;
+
+   /* Temporary semaphore state. A semaphore *may* have temporary state.
+    * That state is added to the semaphore by an import operation and is reset
+    * back to ANV_SEMAPHORE_TYPE_NONE when the semaphore is waited on. A
+    * semaphore with temporary state cannot be signaled because the semaphore
+    * must already be signaled before the temporary state can be exported from
+    * the semaphore in the other process and imported here.
+    */
+   struct anv_semaphore_impl temporary;
+};
+
 struct anv_shader_module {
    unsigned char sha1[20];
    uint32_t size;
@@ -2314,6 +2341,7 @@ ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_pipeline_layout, VkPipelineLayout)
 ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_query_pool, VkQueryPool)
 ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_render_pass, VkRenderPass)
 ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_sampler, VkSampler)
+ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_semaphore, VkSemaphore)
 ANV_DEFINE_NONDISP_HANDLE_CASTS(anv_shader_module, VkShaderModule)
 
 /* Gen-specific function declarations */
diff --git a/src/intel/vulkan/anv_queue.c b/src/intel/vulkan/anv_queue.c
index 5a22ff7..f6ff41f 100644
--- a/src/intel/vulkan/anv_queue.c
+++ b/src/intel/vulkan/anv_queue.c
@@ -493,23 +493,43 @@ done:
 // Queue semaphore functions
 
 VkResult anv_CreateSemaphore(
-    VkDevice                                    device,
+    VkDevice                                    _device,
     const VkSemaphoreCreateInfo*                pCreateInfo,
     const VkAllocationCallbacks*                pAllocator,
     VkSemaphore*                                pSemaphore)
 {
-   /* The DRM execbuffer ioctl always execute in-oder, even between different
-    * rings. As such, there's nothing to do for the user space semaphore.
+   ANV_FROM_HANDLE(anv_device, device, _device);
+   struct anv_semaphore *semaphore;
+
+   assert(pCreateInfo->sType == VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO);
+
+   semaphore = vk_alloc2(&device->alloc, pAllocator, sizeof(*semaphore), 8,
+                         VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+   if (semaphore == NULL)
+      return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   /* The DRM execbuffer ioctl always execute in-oder, even between
+    * different rings. As such, a dummy no-op semaphore is a perfectly
+    * valid implementation.
     */
+   semaphore->permanent.type = ANV_SEMAPHORE_TYPE_DUMMY;
+   semaphore->temporary.type = ANV_SEMAPHORE_TYPE_NONE;
 
-   *pSemaphore = (VkSemaphore)1;
+   *pSemaphore = anv_semaphore_to_handle(semaphore);
 
    return VK_SUCCESS;
 }
 
 void anv_DestroySemaphore(
-    VkDevice                                    device,
-    VkSemaphore                                 semaphore,
+    VkDevice                                    _device,
+    VkSemaphore                                 _semaphore,
     const VkAllocationCallbacks*                pAllocator)
 {
+   ANV_FROM_HANDLE(anv_device, device, _device);
+   ANV_FROM_HANDLE(anv_semaphore, semaphore, _semaphore);
+
+   if (semaphore == NULL)
+      return;
+
+   vk_free2(&device->alloc, pAllocator, semaphore);
 }
-- 
2.5.0.400.gff86faf
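The struct comments in this patch describe two pieces of state per
semaphore: a permanent one that always exists, and an optional temporary
one that is reset to NONE once the semaphore is waited on. A minimal
sketch of how a wait could consume that state is below. It assumes that
temporary state, when present, takes precedence over permanent state; the
sketch_ names and both helpers are hypothetical and not part of this
patch.

```c
#include <stddef.h>

/* Hypothetical mirror of the enum and structs added by the patch. */
enum sketch_semaphore_type {
   SKETCH_SEMAPHORE_TYPE_NONE = 0,
   SKETCH_SEMAPHORE_TYPE_DUMMY,
};

struct sketch_semaphore_impl {
   enum sketch_semaphore_type type;
};

struct sketch_semaphore {
   struct sketch_semaphore_impl permanent;  /* type is never NONE */
   struct sketch_semaphore_impl temporary;  /* NONE unless imported */
};

/* Assumption: imported (temporary) state wins over permanent state. */
static struct sketch_semaphore_impl *
sketch_semaphore_active_impl(struct sketch_semaphore *sem)
{
   return sem->temporary.type != SKETCH_SEMAPHORE_TYPE_NONE ?
          &sem->temporary : &sem->permanent;
}

/* Waiting uses whichever state is active and then drops any temporary
 * state, so the semaphore falls back to its permanent state. Returns the
 * type that was waited on. */
static enum sketch_semaphore_type
sketch_semaphore_wait(struct sketch_semaphore *sem)
{
   enum sketch_semaphore_type waited = sketch_semaphore_active_impl(sem)->type;
   sem->temporary.type = SKETCH_SEMAPHORE_TYPE_NONE;
   return waited;
}
```

Splitting the state this way keeps the permanent payload untouched by an
import, which is why only the temporary slot needs resetting after a wait.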
[Mesa-dev] [PATCH 09/21] anv: Use the BO cache for DeviceMemory allocations
Reviewed-by: Chad Versace--- src/intel/vulkan/anv_device.c | 27 --- src/intel/vulkan/anv_image.c | 2 +- src/intel/vulkan/anv_intel.c | 15 ++- src/intel/vulkan/anv_private.h | 4 +++- src/intel/vulkan/anv_wsi.c | 8 5 files changed, 30 insertions(+), 26 deletions(-) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index a7ae6ce..eaf93b5 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -1124,10 +1124,14 @@ VkResult anv_CreateDevice( anv_bo_pool_init(>batch_bo_pool, device); + result = anv_bo_cache_init(>bo_cache); + if (result != VK_SUCCESS) + goto fail_batch_bo_pool; + result = anv_block_pool_init(>dynamic_state_block_pool, device, 16384); if (result != VK_SUCCESS) - goto fail_batch_bo_pool; + goto fail_bo_cache; anv_state_pool_init(>dynamic_state_pool, >dynamic_state_block_pool); @@ -1199,6 +1203,8 @@ VkResult anv_CreateDevice( fail_dynamic_state_pool: anv_state_pool_finish(>dynamic_state_pool); anv_block_pool_finish(>dynamic_state_block_pool); + fail_bo_cache: + anv_bo_cache_finish(>bo_cache); fail_batch_bo_pool: anv_bo_pool_finish(>batch_bo_pool); pthread_cond_destroy(>queue_submit); @@ -1246,6 +1252,8 @@ void anv_DestroyDevice( anv_state_pool_finish(>dynamic_state_pool); anv_block_pool_finish(>dynamic_state_block_pool); + anv_bo_cache_finish(>bo_cache); + anv_bo_pool_finish(>batch_bo_pool); pthread_cond_destroy(>queue_submit); @@ -1613,7 +1621,8 @@ VkResult anv_AllocateMemory( /* The kernel is going to give us whole pages anyway */ uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096); - result = anv_bo_init_new(>bo, device, alloc_size); + result = anv_bo_cache_alloc(device, >bo_cache, + alloc_size, >bo); if (result != VK_SUCCESS) goto fail; @@ -1646,11 +1655,7 @@ void anv_FreeMemory( if (mem->map) anv_UnmapMemory(_device, _mem); - if (mem->bo.map) - anv_gem_munmap(mem->bo.map, mem->bo.size); - - if (mem->bo.gem_handle != 0) - anv_gem_close(device, mem->bo.gem_handle); + 
anv_bo_cache_release(device, >bo_cache, mem->bo); vk_free2(>alloc, pAllocator, mem); } @@ -1672,7 +1677,7 @@ VkResult anv_MapMemory( } if (size == VK_WHOLE_SIZE) - size = mem->bo.size - offset; + size = mem->bo->size - offset; /* From the Vulkan spec version 1.0.32 docs for MapMemory: * @@ -1682,7 +1687,7 @@ VkResult anv_MapMemory( *equal to the size of the memory minus offset */ assert(size > 0); - assert(offset + size <= mem->bo.size); + assert(offset + size <= mem->bo->size); /* FIXME: Is this supposed to be thread safe? Since vkUnmapMemory() only * takes a VkDeviceMemory pointer, it seems like only one map of the memory @@ -1702,7 +1707,7 @@ VkResult anv_MapMemory( /* Let's map whole pages */ map_size = align_u64(map_size, 4096); - void *map = anv_gem_mmap(device, mem->bo.gem_handle, + void *map = anv_gem_mmap(device, mem->bo->gem_handle, map_offset, map_size, gem_flags); if (map == MAP_FAILED) return vk_error(VK_ERROR_MEMORY_MAP_FAILED); @@ -1854,7 +1859,7 @@ VkResult anv_BindBufferMemory( ANV_FROM_HANDLE(anv_buffer, buffer, _buffer); if (mem) { - buffer->bo = >bo; + buffer->bo = mem->bo; buffer->offset = memoryOffset; } else { buffer->bo = NULL; diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c index cf34dbe..4874f2f 100644 --- a/src/intel/vulkan/anv_image.c +++ b/src/intel/vulkan/anv_image.c @@ -341,7 +341,7 @@ VkResult anv_BindImageMemory( return VK_SUCCESS; } - image->bo = >bo; + image->bo = mem->bo; image->offset = memoryOffset; if (image->aux_surface.isl.size > 0) { diff --git a/src/intel/vulkan/anv_intel.c b/src/intel/vulkan/anv_intel.c index eda474e..991a935 100644 --- a/src/intel/vulkan/anv_intel.c +++ b/src/intel/vulkan/anv_intel.c @@ -49,18 +49,15 @@ VkResult anv_CreateDmaBufImageINTEL( if (mem == NULL) return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY); - uint32_t gem_handle = anv_gem_fd_to_handle(device, pCreateInfo->fd); - if (!gem_handle) { - result = vk_error(VK_ERROR_OUT_OF_DEVICE_MEMORY); - goto fail; - } - uint64_t size = 
(uint64_t)pCreateInfo->strideInBytes * pCreateInfo->extent.height; - anv_bo_init(>bo, gem_handle, size); + result = anv_bo_cache_import(device, >bo_cache, +pCreateInfo->fd, size, >bo); + if (result != VK_SUCCESS) + goto fail; if (device->instance->physicalDevice.supports_48bit_addresses) - mem->bo.flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS; + mem->bo->flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
[Mesa-dev] [PATCH 13/21] anv: Implement VK_KHX_external_semaphore_capabilities
This just stubs things out. Real external semaphore support will come with
VK_KHX_external_semaphore_fd.
---
 src/intel/vulkan/anv_device.c           |  4 ++++
 src/intel/vulkan/anv_entrypoints_gen.py |  1 +
 src/intel/vulkan/anv_queue.c            | 13 +++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 98b1868..41e0fb3 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -331,6 +331,10 @@ static const VkExtensionProperties global_extensions[] = {
       .extensionName = VK_KHX_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
       .specVersion = 1,
    },
+   {
+      .extensionName = VK_KHX_EXTERNAL_SEMAPHORE_CAPABILITIES_EXTENSION_NAME,
+      .specVersion = 1,
+   },
 };
 
 static const VkExtensionProperties device_extensions[] = {
diff --git a/src/intel/vulkan/anv_entrypoints_gen.py b/src/intel/vulkan/anv_entrypoints_gen.py
index b4395c0..5ad0f26 100644
--- a/src/intel/vulkan/anv_entrypoints_gen.py
+++ b/src/intel/vulkan/anv_entrypoints_gen.py
@@ -48,6 +48,7 @@ SUPPORTED_EXTENSIONS = [
     'VK_KHX_external_memory',
     'VK_KHX_external_memory_capabilities',
     'VK_KHX_external_memory_fd',
+    'VK_KHX_external_semaphore_capabilities',
 ]
 
 # We generate a static hash table for entry point lookup
diff --git a/src/intel/vulkan/anv_queue.c b/src/intel/vulkan/anv_queue.c
index f6ff41f..906eb25 100644
--- a/src/intel/vulkan/anv_queue.c
+++ b/src/intel/vulkan/anv_queue.c
@@ -533,3 +533,16 @@ void anv_DestroySemaphore(
 
    vk_free2(&device->alloc, pAllocator, semaphore);
 }
+
+void anv_GetPhysicalDeviceExternalSemaphorePropertiesKHX(
+    VkPhysicalDevice                            physicalDevice,
+    const VkPhysicalDeviceExternalSemaphoreInfoKHX* pExternalSemaphoreInfo,
+    VkExternalSemaphorePropertiesKHX*           pExternalSemaphoreProperties)
+{
+   switch (pExternalSemaphoreInfo->handleType) {
+   default:
+      pExternalSemaphoreProperties->exportFromImportedHandleTypes = 0;
+      pExternalSemaphoreProperties->compatibleHandleTypes = 0;
+      pExternalSemaphoreProperties->externalSemaphoreFeatures = 0;
+   }
+}
-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 10/21] anv: Implement VK_KHX_external_memory_fd
This commit just exposes the memory handle type. There's nothing interesting
we need to do here for images. So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.

v2 (chadv):
  - Rebase.
  - Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
    handleType == 0.
  - Move handleType-independency comments out of handleType-switch, in
    vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in
    future dma_buf patches.

Co-authored-with: Chad Versace
Reviewed-by: Chad Versace
---
 src/intel/vulkan/anv_device.c           | 71 -
 src/intel/vulkan/anv_entrypoints_gen.py |  1 +
 src/intel/vulkan/anv_formats.c          | 59 +++
 3 files changed, 113 insertions(+), 18 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index eaf93b5..e891912 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -366,6 +366,10 @@ static const VkExtensionProperties device_extensions[] = {
       .extensionName = VK_KHX_EXTERNAL_MEMORY_EXTENSION_NAME,
       .specVersion = 1,
    },
+   {
+      .extensionName = VK_KHX_EXTERNAL_MEMORY_FD_EXTENSION_NAME,
+      .specVersion = 1,
+   },
 };
 
 static void *
@@ -1600,7 +1604,7 @@ VkResult anv_AllocateMemory(
 {
    ANV_FROM_HANDLE(anv_device, device, _device);
    struct anv_device_memory *mem;
-   VkResult result;
+   VkResult result = VK_SUCCESS;
 
    assert(pAllocateInfo->sType == VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO);
 
@@ -1618,19 +1622,36 @@ VkResult anv_AllocateMemory(
    if (mem == NULL)
       return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
 
-   /* The kernel is going to give us whole pages anyway */
-   uint64_t alloc_size = align_u64(pAllocateInfo->allocationSize, 4096);
-
-   result = anv_bo_cache_alloc(device, &device->bo_cache,
-                               alloc_size, &mem->bo);
-   if (result != VK_SUCCESS)
-      goto fail;
-
    mem->type_index = pAllocateInfo->memoryTypeIndex;
-
    mem->map = NULL;
    mem->map_size = 0;
 
+   const VkImportMemoryFdInfoKHX *fd_info =
+      vk_find_struct_const(pAllocateInfo->pNext, IMPORT_MEMORY_FD_INFO_KHX);
+
+   /* The Vulkan spec permits handleType to be 0, in which case the struct is
+    * ignored.
+    */
+   if (fd_info && fd_info->handleType) {
+      /* At the moment, we only support the OPAQUE_FD memory type which is
+       * just a GEM buffer.
+       */
+      assert(fd_info->handleType ==
+             VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
+
+      result = anv_bo_cache_import(device, &device->bo_cache,
+                                   fd_info->fd, pAllocateInfo->allocationSize,
+                                   &mem->bo);
+      if (result != VK_SUCCESS)
+         goto fail;
+   } else {
+      result = anv_bo_cache_alloc(device, &device->bo_cache,
+                                  pAllocateInfo->allocationSize,
+                                  &mem->bo);
+      if (result != VK_SUCCESS)
+         goto fail;
+   }
+
    *pMem = anv_device_memory_to_handle(mem);
 
    return VK_SUCCESS;
@@ -1641,6 +1662,36 @@ VkResult anv_AllocateMemory(
    return result;
 }
 
+VkResult anv_GetMemoryFdKHX(
+    VkDevice                                    device_h,
+    VkDeviceMemory                              memory_h,
+    VkExternalMemoryHandleTypeFlagBitsKHX       handleType,
+    int*                                        pFd)
+{
+   ANV_FROM_HANDLE(anv_device, dev, device_h);
+   ANV_FROM_HANDLE(anv_device_memory, mem, memory_h);
+
+   /* We support only one handle type. */
+   assert(handleType == VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT_KHX);
+
+   return anv_bo_cache_export(dev, &dev->bo_cache, mem->bo, pFd);
+}
+
+VkResult anv_GetMemoryFdPropertiesKHX(
+    VkDevice                                    device_h,
+    VkExternalMemoryHandleTypeFlagBitsKHX       handleType,
+    int                                         fd,
+    VkMemoryFdPropertiesKHX*                    pMemoryFdProperties)
+{
+   /* The valid usage section for this function says:
+    *
+    *    "handleType must not be one of the handle types defined as opaque."
+    *
+    * Since we only handle opaque handles for now, there are no FD properties.
+    */
+   return VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX;
+}
+
 void anv_FreeMemory(
     VkDevice                                    _device,
     VkDeviceMemory                              _mem,
diff --git a/src/intel/vulkan/anv_entrypoints_gen.py b/src/intel/vulkan/anv_entrypoints_gen.py
index 400b567..b4395c0 100644
--- a/src/intel/vulkan/anv_entrypoints_gen.py
+++ b/src/intel/vulkan/anv_entrypoints_gen.py
@@ -47,6 +47,7 @@ SUPPORTED_EXTENSIONS = [
     'VK_KHR_xlib_surface',
     'VK_KHX_external_memory',
     'VK_KHX_external_memory_capabilities',
+    'VK_KHX_external_memory_fd',
 ]
 
 # We generate a static
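The allocation path in this patch decides between import and fresh allocation by looking for an import struct in the pNext chain and ignoring it when handleType is 0. A minimal sketch of that decision follows. It does not use the Vulkan headers: every sketch_* name is a hypothetical stand-in, and only the "find struct in chain, treat handleType 0 as absent" logic is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Vulkan's chained input structs. */
enum sketch_stype { SKETCH_STYPE_ALLOC = 1, SKETCH_STYPE_IMPORT_FD = 2 };

struct sketch_base {
   enum sketch_stype sType;
   const void *pNext;
};

struct sketch_import_fd {
   enum sketch_stype sType;
   const void *pNext;
   uint32_t handleType; /* 0 means "ignore this struct" */
   int fd;
};

/* Walk the pNext chain and return the first struct of the wanted type,
 * or NULL if the chain does not contain one. */
static const void *
sketch_find_struct(const void *chain, enum sketch_stype wanted)
{
   for (const struct sketch_base *s = chain; s != NULL; s = s->pNext) {
      if (s->sType == wanted)
         return s;
   }
   return NULL;
}

/* Returns 1 when the allocation should import from an fd and 0 when it
 * should fall back to a fresh allocation. */
static int
sketch_should_import(const void *chain)
{
   const struct sketch_import_fd *info =
      sketch_find_struct(chain, SKETCH_STYPE_IMPORT_FD);
   return info != NULL && info->handleType != 0;
}
```

A caller would pass the head of the chain; a chain with no import struct, or one whose handleType is 0, takes the plain allocation path.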
[Mesa-dev] [PATCH 08/21] anv/allocator: Add a BO cache
This cache allows us to easily ensure that we have a unique anv_bo for each
gem handle.  We'll need this in order to support multiple-import of memory
objects and semaphores.

v2 (Jason Ekstrand):
 - Reject BO imports if the size doesn't match the prime fd size as
   reported by lseek().
---
 src/intel/vulkan/anv_allocator.c                   | 257 +
 src/intel/vulkan/anv_private.h                     |  21 ++
 .../drivers/dri/i965/brw_nir_trig_workarounds.c    | 191 +++
 3 files changed, 469 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_nir_trig_workarounds.c

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 697309f..4ab5f60 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -34,6 +34,8 @@
 
 #include "anv_private.h"
 
+#include "util/hash_table.h"
+
 #ifdef HAVE_VALGRIND
 #define VG_NOACCESS_READ(__ptr) ({ \
    VALGRIND_MAKE_MEM_DEFINED((__ptr), sizeof(*(__ptr))); \
@@ -1004,3 +1006,258 @@ anv_scratch_pool_alloc(struct anv_device *device, struct anv_scratch_pool *pool,
 
    return &bo->bo;
 }
+
+struct anv_cached_bo {
+   struct anv_bo bo;
+
+   uint32_t refcount;
+};
+
+VkResult
+anv_bo_cache_init(struct anv_bo_cache *cache)
+{
+   cache->bo_map = _mesa_hash_table_create(NULL, _mesa_hash_pointer,
+                                           _mesa_key_pointer_equal);
+   if (!cache->bo_map)
+      return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   if (pthread_mutex_init(&cache->mutex, NULL)) {
+      _mesa_hash_table_destroy(cache->bo_map, NULL);
+      return vk_errorf(VK_ERROR_OUT_OF_HOST_MEMORY,
+                       "pthread_mutex_inti failed: %m");
+   }
+
+   return VK_SUCCESS;
+}
+
+void
+anv_bo_cache_finish(struct anv_bo_cache *cache)
+{
+   _mesa_hash_table_destroy(cache->bo_map, NULL);
+   pthread_mutex_destroy(&cache->mutex);
+}
+
+static struct anv_cached_bo *
+anv_bo_cache_lookup_locked(struct anv_bo_cache *cache, uint32_t gem_handle)
+{
+   struct hash_entry *entry =
+      _mesa_hash_table_search(cache->bo_map,
+                              (const void *)(uintptr_t)gem_handle);
+   if (!entry)
+      return NULL;
+
+   struct anv_cached_bo *bo = (struct anv_cached_bo *)entry->data;
+   assert(bo->bo.gem_handle == gem_handle);
+
+   return bo;
+}
+
+static struct anv_bo *
+anv_bo_cache_lookup(struct anv_bo_cache *cache, uint32_t gem_handle)
+{
+   pthread_mutex_lock(&cache->mutex);
+
+   struct anv_cached_bo *bo = anv_bo_cache_lookup_locked(cache, gem_handle);
+
+   pthread_mutex_unlock(&cache->mutex);
+
+   return &bo->bo;
+}
+
+VkResult
+anv_bo_cache_alloc(struct anv_device *device,
+                   struct anv_bo_cache *cache,
+                   uint64_t size, struct anv_bo **bo_out)
+{
+   struct anv_cached_bo *bo =
+      vk_alloc(&device->alloc, sizeof(struct anv_cached_bo), 8,
+               VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
+   if (!bo)
+      return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
+
+   bo->refcount = 1;
+
+   /* The kernel is going to give us whole pages anyway */
+   size = align_u64(size, 4096);
+
+   VkResult result = anv_bo_init_new(&bo->bo, device, size);
+   if (result != VK_SUCCESS) {
+      vk_free(&device->alloc, bo);
+      return result;
+   }
+
+   assert(bo->bo.gem_handle);
+
+   pthread_mutex_lock(&cache->mutex);
+
+   _mesa_hash_table_insert(cache->bo_map,
+                           (void *)(uintptr_t)bo->bo.gem_handle, bo);
+
+   pthread_mutex_unlock(&cache->mutex);
+
+   *bo_out = &bo->bo;
+
+   return VK_SUCCESS;
+}
+
+VkResult
+anv_bo_cache_import(struct anv_device *device,
+                    struct anv_bo_cache *cache,
+                    int fd, uint64_t size, struct anv_bo **bo_out)
+{
+   pthread_mutex_lock(&cache->mutex);
+
+   /* The kernel is going to give us whole pages anyway */
+   size = align_u64(size, 4096);
+
+   uint32_t gem_handle = anv_gem_fd_to_handle(device, fd);
+   if (!gem_handle) {
+      pthread_mutex_unlock(&cache->mutex);
+      return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
+   }
+
+   struct anv_cached_bo *bo = anv_bo_cache_lookup_locked(cache, gem_handle);
+   if (bo) {
+      if (bo->bo.size != size) {
+         pthread_mutex_unlock(&cache->mutex);
+         return vk_error(VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX);
+      }
+      __sync_fetch_and_add(&bo->refcount, 1);
+   } else {
+      /* For security purposes, we reject BO imports where the size does not
+       * match exactly.  This prevents a malicious client from passing a
+       * buffer to a trusted client, lying about the size, and telling the
+       * trusted client to try and texture from an image that goes
+       * out-of-bounds.  This sort of thing could lead to GPU hangs or worse
+       * in the trusted client.  The trusted client can protect itself against
+       * this sort of attack but only if it can trust the buffer size.
+       */
+      off_t import_size = lseek(fd, 0, SEEK_END);
+      if (import_size == (off_t)-1 || import_size != size) {
+         anv_gem_close(device,
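The size check in the import path can be looked at in isolation. The sketch below assumes only POSIX lseek(); sketch_align_u64 and sketch_check_import_size are hypothetical helper names, not functions from the driver. It rounds the claimed size up to a 4096-byte page, as the patch does, and rejects the fd when the size reported by seeking to the end does not match exactly.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Round v up to a multiple of a, where a is a power of two. */
static uint64_t
sketch_align_u64(uint64_t v, uint64_t a)
{
   return (v + a - 1) & ~(a - 1);
}

/* Returns 0 when the size of the object behind fd equals the page-aligned
 * claimed size, and -1 when lseek() fails or the sizes differ. */
static int
sketch_check_import_size(int fd, uint64_t claimed_size)
{
   uint64_t size = sketch_align_u64(claimed_size, 4096);

   off_t actual = lseek(fd, 0, SEEK_END);
   if (actual == (off_t)-1 || (uint64_t)actual != size)
      return -1;

   return 0;
}
```

The exact-match comparison is the point of the design: accepting a smaller or larger claimed size would let the exporter mislead the importer about how much memory sits behind the handle.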
[Mesa-dev] [PATCH 11/21] anv: Move queues, events, and semaphores to their own file
Things are about to get more complicated, especially as far as semaphores
are concerned.

Reviewed-by: Chad Versace
---
 src/intel/Makefile.sources    |   1 +
 src/intel/vulkan/anv_device.c | 484 ---
 src/intel/vulkan/anv_queue.c  | 515 ++
 3 files changed, 516 insertions(+), 484 deletions(-)
 create mode 100644 src/intel/vulkan/anv_queue.c

diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index d7bc09e..c64a5f2 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -202,6 +202,7 @@ VULKAN_FILES := \
 	vulkan/anv_pipeline.c \
 	vulkan/anv_pipeline_cache.c \
 	vulkan/anv_private.h \
+	vulkan/anv_queue.c \
 	vulkan/anv_util.c \
 	vulkan/anv_wsi.c \
 	vulkan/vk_format_info.h
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index e891912..98b1868 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -981,62 +981,6 @@ anv_device_init_border_colors(struct anv_device *device)
                                     border_colors);
 }
 
-VkResult
-anv_device_submit_simple_batch(struct anv_device *device,
-                               struct anv_batch *batch)
-{
-   struct drm_i915_gem_execbuffer2 execbuf;
-   struct drm_i915_gem_exec_object2 exec2_objects[1];
-   struct anv_bo bo, *exec_bos[1];
-   VkResult result = VK_SUCCESS;
-   uint32_t size;
-
-   /* Kernel driver requires 8 byte aligned batch length */
-   size = align_u32(batch->next - batch->start, 8);
-   result = anv_bo_pool_alloc(&device->batch_bo_pool, &bo, size);
-   if (result != VK_SUCCESS)
-      return result;
-
-   memcpy(bo.map, batch->start, size);
-   if (!device->info.has_llc)
-      anv_flush_range(bo.map, size);
-
-   exec_bos[0] = &bo;
-   exec2_objects[0].handle = bo.gem_handle;
-   exec2_objects[0].relocation_count = 0;
-   exec2_objects[0].relocs_ptr = 0;
-   exec2_objects[0].alignment = 0;
-   exec2_objects[0].offset = bo.offset;
-   exec2_objects[0].flags = 0;
-   exec2_objects[0].rsvd1 = 0;
-   exec2_objects[0].rsvd2 = 0;
-
-   execbuf.buffers_ptr = (uintptr_t) exec2_objects;
-   execbuf.buffer_count = 1;
-   execbuf.batch_start_offset = 0;
-   execbuf.batch_len = size;
-   execbuf.cliprects_ptr = 0;
-   execbuf.num_cliprects = 0;
-   execbuf.DR1 = 0;
-   execbuf.DR4 = 0;
-
-   execbuf.flags =
-      I915_EXEC_HANDLE_LUT | I915_EXEC_NO_RELOC | I915_EXEC_RENDER;
-   execbuf.rsvd1 = device->context_id;
-   execbuf.rsvd2 = 0;
-
-   result = anv_device_execbuf(device, &execbuf, exec_bos);
-   if (result != VK_SUCCESS)
-      goto fail;
-
-   result = anv_device_wait(device, &bo, INT64_MAX);
-
- fail:
-   anv_bo_pool_free(&device->batch_bo_pool, &bo);
-
-   return result;
-}
-
 VkResult anv_CreateDevice(
     VkPhysicalDevice                            physicalDevice,
     const VkDeviceCreateInfo*                   pCreateInfo,
@@ -1350,26 +1294,6 @@ void anv_GetDeviceQueue(
 }
 
 VkResult
-anv_device_execbuf(struct anv_device *device,
-                   struct drm_i915_gem_execbuffer2 *execbuf,
-                   struct anv_bo **execbuf_bos)
-{
-   int ret = anv_gem_execbuffer(device, execbuf);
-   if (ret != 0) {
-      /* We don't know the real error. */
-      device->lost = true;
-      return vk_errorf(VK_ERROR_DEVICE_LOST, "execbuf2 failed: %m");
-   }
-
-   struct drm_i915_gem_exec_object2 *objects =
-      (void *)(uintptr_t)execbuf->buffers_ptr;
-   for (uint32_t k = 0; k < execbuf->buffer_count; k++)
-      execbuf_bos[k]->offset = objects[k].offset;
-
-   return VK_SUCCESS;
-}
-
-VkResult
 anv_device_query_status(struct anv_device *device)
 {
    /* This isn't likely as most of the callers of this function already check
@@ -1446,119 +1370,6 @@ anv_device_wait(struct anv_device *device, struct anv_bo *bo,
    return anv_device_query_status(device);
 }
 
-VkResult anv_QueueSubmit(
-    VkQueue                                     _queue,
-    uint32_t                                    submitCount,
-    const VkSubmitInfo*                         pSubmits,
-    VkFence                                     _fence)
-{
-   ANV_FROM_HANDLE(anv_queue, queue, _queue);
-   ANV_FROM_HANDLE(anv_fence, fence, _fence);
-   struct anv_device *device = queue->device;
-
-   /* Query for device status prior to submitting.  Technically, we don't need
-    * to do this.  However, if we have a client that's submitting piles of
-    * garbage, we would rather break as early as possible to keep the GPU
-    * hanging contained.  If we don't check here, we'll either be waiting for
-    * the kernel to kick us or we'll have to wait until the client waits on a
-    * fence before we actually know whether or not we've hung.
-    */
-   VkResult result = anv_device_query_status(device);
-   if (result != VK_SUCCESS)
-      return result;
-
-   /* We lock around QueueSubmit for three main reasons:
-    *
-    *  1) When a block pool is
[Mesa-dev] [PATCH 07/21] anv: Implement VK_KHX_external_memory
This is the trivial implementation that just exposes the extension string
but exposes zero external handle types.

Reviewed-by: Chad Versace
---
 src/intel/vulkan/anv_device.c           | 4 ++++
 src/intel/vulkan/anv_entrypoints_gen.py | 1 +
 2 files changed, 5 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index d8de707..a7ae6ce 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -362,6 +362,10 @@ static const VkExtensionProperties device_extensions[] = {
       .extensionName = VK_KHR_INCREMENTAL_PRESENT_EXTENSION_NAME,
       .specVersion = 1,
    },
+   {
+      .extensionName = VK_KHX_EXTERNAL_MEMORY_EXTENSION_NAME,
+      .specVersion = 1,
+   },
 };
 
 static void *
diff --git a/src/intel/vulkan/anv_entrypoints_gen.py b/src/intel/vulkan/anv_entrypoints_gen.py
index 245d6d0..400b567 100644
--- a/src/intel/vulkan/anv_entrypoints_gen.py
+++ b/src/intel/vulkan/anv_entrypoints_gen.py
@@ -45,6 +45,7 @@ SUPPORTED_EXTENSIONS = [
     'VK_KHR_wayland_surface',
     'VK_KHR_xcb_surface',
     'VK_KHR_xlib_surface',
+    'VK_KHX_external_memory',
     'VK_KHX_external_memory_capabilities',
 ]
 
-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 06/21] anv: Implement VK_KHX_external_memory_capabilities
From: Chad Versace

This is a complete but trivial implementation. It's trivial because we
support no external memory capabilities yet.  Most of the real work in this
commit is in reworking the UUIDs advertised by the driver.

v2 (chadv):
  - Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
    Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
    input structs, not the chain of output structs.
  - In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
    input chain and the output chain separately. Reduces diff in future
    dma_buf patches.

Co-authored-with: Jason Ekstrand
Reviewed-by: Chad Versace
Reviewed-by: Jason Ekstrand
---
 src/intel/vulkan/anv_device.c           | 52 ---
 src/intel/vulkan/anv_entrypoints_gen.py |  1 +
 src/intel/vulkan/anv_formats.c          | 75 +
 src/intel/vulkan/anv_private.h          |  2 +
 4 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 0a67414..d8de707 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -116,6 +116,9 @@ anv_physical_device_init_uuids(struct anv_physical_device *device)
    uint8_t sha1[20];
    STATIC_ASSERT(VK_UUID_SIZE <= sizeof(sha1));
 
+   /* The pipeline cache UUID is used for determining when a pipeline cache is
+    * invalid.  It needs both a driver build and the PCI ID of the device.
+    */
    _mesa_sha1_init(&sha1_ctx);
    _mesa_sha1_update(&sha1_ctx, build_id_data(note), build_id_len);
    _mesa_sha1_update(&sha1_ctx, &device->chipset_id,
@@ -123,6 +126,27 @@ anv_physical_device_init_uuids(struct anv_physical_device *device)
    _mesa_sha1_final(&sha1_ctx, sha1);
    memcpy(device->pipeline_cache_uuid, sha1, VK_UUID_SIZE);
 
+   /* The driver UUID is used for determining sharability of images and memory
+    * between two Vulkan instances in separate processes.  People who want to
+    * share memory need to also check the device UUID (below) so all this
+    * needs to be is the build-id.
+    */
+   memcpy(device->driver_uuid, build_id_data(note), VK_UUID_SIZE);
+
+   /* The device UUID uniquely identifies the given device within the machine.
+    * Since we never have more than one device, this doesn't need to be a real
+    * UUID.  However, on the off-chance that someone tries to use this to
+    * cache pre-tiled images or something of the like, we use the PCI ID and
+    * some bits of ISL info to ensure that this is safe.
+    */
+   _mesa_sha1_init(&sha1_ctx);
+   _mesa_sha1_update(&sha1_ctx, &device->chipset_id,
+                     sizeof(device->chipset_id));
+   _mesa_sha1_update(&sha1_ctx, &device->isl_dev.has_bit6_swizzling,
+                     sizeof(device->isl_dev.has_bit6_swizzling));
+   _mesa_sha1_final(&sha1_ctx, sha1);
+   memcpy(device->device_uuid, sha1, VK_UUID_SIZE);
+
    return VK_SUCCESS;
 }
 
@@ -209,10 +233,6 @@ anv_physical_device_init(struct anv_physical_device *device,
 
    device->has_exec_async = anv_gem_get_param(fd, I915_PARAM_HAS_EXEC_ASYNC);
 
-   result = anv_physical_device_init_uuids(device);
-   if (result != VK_SUCCESS)
-      goto fail;
-
    bool swizzled = anv_gem_get_bit6_swizzle(fd, I915_TILING_X);
 
    /* GENs prior to 8 do not support EU/Subslice info */
@@ -252,14 +272,18 @@ anv_physical_device_init(struct anv_physical_device *device,
    device->compiler->shader_debug_log = compiler_debug_log;
    device->compiler->shader_perf_log = compiler_perf_log;
 
+   isl_device_init(&device->isl_dev, &device->info, swizzled);
+
+   result = anv_physical_device_init_uuids(device);
+   if (result != VK_SUCCESS)
+      goto fail;
+
    result = anv_init_wsi(device);
    if (result != VK_SUCCESS) {
       ralloc_free(device->compiler);
       goto fail;
    }
 
-   isl_device_init(&device->isl_dev, &device->info, swizzled);
-
    device->local_fd = fd;
    return VK_SUCCESS;
 
@@ -303,6 +327,10 @@ static const VkExtensionProperties global_extensions[] = {
       .extensionName = VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME,
       .specVersion = 1,
    },
+   {
+      .extensionName = VK_KHX_EXTERNAL_MEMORY_CAPABILITIES_EXTENSION_NAME,
+      .specVersion = 1,
+   },
 };
 
 static const VkExtensionProperties device_extensions[] = {
@@ -729,6 +757,8 @@ void anv_GetPhysicalDeviceProperties2KHR(
     VkPhysicalDevice                            physicalDevice,
     VkPhysicalDeviceProperties2KHR*             pProperties)
 {
+   ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
+
    anv_GetPhysicalDeviceProperties(physicalDevice, &pProperties->properties);
 
    vk_foreach_struct(ext, pProperties->pNext) {
@@ -741,6 +771,16 @@ void anv_GetPhysicalDeviceProperties2KHR(
          break;
       }
 
+      case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES_KHX: {
+         VkPhysicalDeviceIDPropertiesKHX *id_props =
+            (VkPhysicalDeviceIDPropertiesKHX
[Mesa-dev] [PATCH 04/21] anv: Refactor device_get_cache_uuid into physical_device_init_uuids
Reviewed-by: Chad Versace
---
 src/intel/vulkan/anv_device.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 079b0c5..ad10531 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -97,16 +97,20 @@ anv_compute_heap_size(int fd, uint64_t *heap_size)
    return VK_SUCCESS;
 }
 
-static bool
-anv_device_get_cache_uuid(void *uuid, uint16_t pci_id)
+static VkResult
+anv_physical_device_init_uuids(struct anv_physical_device *device)
 {
    const struct build_id_note *note = build_id_find_nhdr("libvulkan_intel.so");
-   if (!note)
-      return false;
+   if (!note) {
+      return vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
+                       "Failed to find build-id");
+   }
 
    unsigned build_id_len = build_id_length(note);
-   if (build_id_len < 20) /* It should be a SHA-1 */
-      return false;
+   if (build_id_len < 20) {
+      return vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
+                       "build-id too short. It needs to be a SHA");
+   }
 
    struct mesa_sha1 sha1_ctx;
    uint8_t sha1[20];
@@ -114,11 +118,12 @@ anv_device_get_cache_uuid(void *uuid, uint16_t pci_id)
 
    _mesa_sha1_init(&sha1_ctx);
    _mesa_sha1_update(&sha1_ctx, build_id_data(note), build_id_len);
-   _mesa_sha1_update(&sha1_ctx, &pci_id, sizeof(pci_id));
+   _mesa_sha1_update(&sha1_ctx, &device->chipset_id,
+                     sizeof(device->chipset_id));
    _mesa_sha1_final(&sha1_ctx, sha1);
+   memcpy(device->uuid, sha1, VK_UUID_SIZE);
 
-   memcpy(uuid, sha1, VK_UUID_SIZE);
-   return true;
+   return VK_SUCCESS;
 }
 
 static VkResult
@@ -204,11 +209,10 @@ anv_physical_device_init(struct anv_physical_device *device,
 
    device->has_exec_async = anv_gem_get_param(fd, I915_PARAM_HAS_EXEC_ASYNC);
 
-   if (!anv_device_get_cache_uuid(device->uuid, device->chipset_id)) {
-      result = vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
-                         "cannot generate UUID");
+   result = anv_physical_device_init_uuids(device);
+   if (result != VK_SUCCESS)
       goto fail;
-   }
+
    bool swizzled = anv_gem_get_bit6_swizzle(fd, I915_TILING_X);
 
    /* GENs prior to 8 do not support EU/Subslice info */
-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 01/21] anv: Add the pci_id into the shader cache UUID
This prevents a user from using a cache created on one hardware generation
on a different one.  Of course, with Intel hardware, this requires moving
their drive from one machine to another but it's still possible and we
should prevent it.

Reviewed-by: Chad Versace
---
 src/intel/vulkan/anv_device.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 35ef4c4..7a25ee9 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -34,6 +34,7 @@
 #include "util/strtod.h"
 #include "util/debug.h"
 #include "util/build_id.h"
+#include "util/mesa-sha1.h"
 #include "util/vk_util.h"
 
 #include "genxml/gen7_pack.h"
@@ -97,17 +98,26 @@ anv_compute_heap_size(int fd, uint64_t *heap_size)
 }
 
 static bool
-anv_device_get_cache_uuid(void *uuid)
+anv_device_get_cache_uuid(void *uuid, uint16_t pci_id)
 {
    const struct build_id_note *note = build_id_find_nhdr("libvulkan_intel.so");
    if (!note)
       return false;
 
-   unsigned len = build_id_length(note);
-   if (len < VK_UUID_SIZE)
+   unsigned build_id_len = build_id_length(note);
+   if (build_id_len < 20) /* It should be a SHA-1 */
       return false;
 
-   memcpy(uuid, build_id_data(note), VK_UUID_SIZE);
+   struct mesa_sha1 sha1_ctx;
+   uint8_t sha1[20];
+   STATIC_ASSERT(VK_UUID_SIZE <= sizeof(sha1));
+
+   _mesa_sha1_init(&sha1_ctx);
+   _mesa_sha1_update(&sha1_ctx, build_id_data(note), build_id_len);
+   _mesa_sha1_update(&sha1_ctx, &pci_id, sizeof(pci_id));
+   _mesa_sha1_final(&sha1_ctx, sha1);
+
+   memcpy(uuid, sha1, VK_UUID_SIZE);
    return true;
 }
 
@@ -192,7 +202,7 @@ anv_physical_device_init(struct anv_physical_device *device,
    if (result != VK_SUCCESS)
       goto fail;
 
-   if (!anv_device_get_cache_uuid(device->uuid)) {
+   if (!anv_device_get_cache_uuid(device->uuid, device->chipset_id)) {
       result = vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
                          "cannot generate UUID");
       goto fail;
-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 05/21] anv/physical_device: Rename uuid to pipeline_cache_uuid
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.

Reviewed-by: Chad Versace
---
 src/intel/vulkan/anv_device.c         | 5 +++--
 src/intel/vulkan/anv_pipeline_cache.c | 4 ++--
 src/intel/vulkan/anv_private.h        | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index ad10531..0a67414 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -121,7 +121,7 @@ anv_physical_device_init_uuids(struct anv_physical_device *device)
    _mesa_sha1_update(&sha1_ctx, &device->chipset_id,
                      sizeof(device->chipset_id));
    _mesa_sha1_final(&sha1_ctx, sha1);
-   memcpy(device->uuid, sha1, VK_UUID_SIZE);
+   memcpy(device->pipeline_cache_uuid, sha1, VK_UUID_SIZE);
 
    return VK_SUCCESS;
 }
@@ -721,7 +721,8 @@ void anv_GetPhysicalDeviceProperties(
    };
 
    strcpy(pProperties->deviceName, pdevice->name);
-   memcpy(pProperties->pipelineCacheUUID, pdevice->uuid, VK_UUID_SIZE);
+   memcpy(pProperties->pipelineCacheUUID,
+          pdevice->pipeline_cache_uuid, VK_UUID_SIZE);
 }
 
 void anv_GetPhysicalDeviceProperties2KHR(
diff --git a/src/intel/vulkan/anv_pipeline_cache.c b/src/intel/vulkan/anv_pipeline_cache.c
index cdd8215..3cfe3ec 100644
--- a/src/intel/vulkan/anv_pipeline_cache.c
+++ b/src/intel/vulkan/anv_pipeline_cache.c
@@ -351,7 +351,7 @@ anv_pipeline_cache_load(struct anv_pipeline_cache *cache,
       return;
    if (header.device_id != device->chipset_id)
       return;
-   if (memcmp(header.uuid, pdevice->uuid, VK_UUID_SIZE) != 0)
+   if (memcmp(header.uuid, pdevice->pipeline_cache_uuid, VK_UUID_SIZE) != 0)
       return;
 
    const void *end = data + size;
@@ -498,7 +498,7 @@ VkResult anv_GetPipelineCacheData(
    header->header_version = VK_PIPELINE_CACHE_HEADER_VERSION_ONE;
    header->vendor_id = 0x8086;
    header->device_id = device->chipset_id;
-   memcpy(header->uuid, pdevice->uuid, VK_UUID_SIZE);
+   memcpy(header->uuid, pdevice->pipeline_cache_uuid, VK_UUID_SIZE);
    p += align_u32(header->header_size, 8);
 
    uint32_t *count = p;
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 1f12a59..2fb0019 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -630,7 +630,7 @@ struct anv_physical_device {
     uint32_t                                    eu_total;
     uint32_t                                    subslice_total;
 
-    uint8_t                                     uuid[VK_UUID_SIZE];
+    uint8_t                                     pipeline_cache_uuid[VK_UUID_SIZE];
 
     struct wsi_device                           wsi_device;
     int                                         local_fd;
-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 00/21] anv: Add support for VK_KHX_external*
This patch series adds support for a bunch of the VK_KHX_external
extensions.  This is mostly a re-send but there are a few bugfixes tucked in
here and there are also some new patches.  Changes of note:

 1) It's been freshly rebased on master

 2) The BO cache has undergone quite a few bugfixes.

 3) We're now setting EXEC_OBJECT_ASYNC on almost everything.

 4) Patches have been added to implement external semaphores using DRM sync
    objects as created by Dave Airlie.

The only non-new patch that has undergone extensive changes (beyond just
fixing rebase issues) is the BO cache patch.

The last two patches in this series are marked RFC because they add support
for using the new DRM sync object API from Dave Airlie.  I think I'm
relatively happy with the kernel API but would like to give the kernel
people a chance to chip in before we commit to it.  Hopefully, we can get
the sync object API and its semantics nailed down soon.

The series has also undergone significantly better testing.  I have written
a new crucible test (that I will push later today after cleaning it up a
bit) which seems to do a pretty good job of testing this stuff.  It then
took me a while to get the crucible test to fail because the kernel
currently works on a first-come-first-served model so zero synchronization
is needed in order to get the proper Vulkan behavior.  Thanks to Chris'
kernel series to add support for context priorities and the patch labeled
"HACK" in this series, I was able to force one of the two contexts in my
test to run at significantly lower priority and things actually started
executing out of sync.  Once I finally had a test that failed, I was able to
prove that the patches work. :-)

In order to test it properly, you will need my drm-syncobj3 kernel branch
which contains patches from Chris Wilson, Dave Airlie, and myself.  It can
be found here:

https://cgit.freedesktop.org/~jekstrand/linux/log/?h=drm-syncobj3

This series can be found here:

https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/anv-external

I now consider this stuff to be in good enough shape to merge.  I intend to
do so as soon as it is reviewed and the 17.1 branch point is past.

Cc: Chad Versace
Cc: Dave Airlie
Cc: Chris Wilson
Cc: Daniel Vetter

Chad Versace (1):
  anv: Implement VK_KHX_external_memory_capabilities

Jason Ekstrand (20):
  anv: Add the pci_id into the shader cache UUID
  anv/cmd_buffer: Use the device allocator for QueueSubmit
  anv: Set EXEC_OBJECT_ASYNC when available
  anv: Refactor device_get_cache_uuid into physical_device_init_uuids
  anv/physical_device: Rename uuid to pipeline_cache_uuid
  anv: Implement VK_KHX_external_memory
  anv/allocator: Add a BO cache
  anv: Use the BO cache for DeviceMemory allocations
  anv: Implement VK_KHX_external_memory_fd
  anv: Move queues, events, and semaphores to their own file
  anv: Add a real semaphore struct
  anv: Implement VK_KHX_external_semaphore_capabilities
  anv: Implement VK_KHX_external_semaphore
  anv: Pull the guts of cmd_buffer_execbuf into a helper
  anv: Implement VK_KHX_external_semaphore_fd
  anv/gem: Use EXECBUFFER2_WR when the FENCE_OUT flag is set
  anv: Implement support for exporting semaphores as FENCE_FD
  HACK/anv: Set context priorities based on queue priorities
  anv/gem: Add a drm syncobj support
  anv: Use DRM sync objects for external semaphores when available

 src/intel/Makefile.sources              |   1 +
 src/intel/vulkan/anv_allocator.c        | 271 +++
 src/intel/vulkan/anv_batch_chain.c      | 263 +--
 src/intel/vulkan/anv_device.c           | 713
 src/intel/vulkan/anv_entrypoints_gen.py |   6 +
 src/intel/vulkan/anv_formats.c          | 118 -
 src/intel/vulkan/anv_gem.c              | 133 +-
 src/intel/vulkan/anv_gem_stubs.c        |  31 ++
 src/intel/vulkan/anv_image.c            |   2 +-
 src/intel/vulkan/anv_intel.c            |  15 +-
 src/intel/vulkan/anv_pipeline_cache.c   |   4 +-
 src/intel/vulkan/anv_private.h          |  98 +++-
 src/intel/vulkan/anv_queue.c            | 793
 src/intel/vulkan/anv_wsi.c              |   7 +-
 14 files changed, 1888 insertions(+), 567 deletions(-)
 create mode 100644 src/intel/vulkan/anv_queue.c

-- 
2.5.0.400.gff86faf
[Mesa-dev] [PATCH 03/21] anv: Set EXEC_OBJECT_ASYNC when available
---
 src/intel/vulkan/anv_allocator.c | 3 +++
 src/intel/vulkan/anv_device.c    | 5 +
 src/intel/vulkan/anv_private.h   | 1 +
 src/intel/vulkan/anv_wsi.c       | 1 +
 4 files changed, 10 insertions(+)

diff --git a/src/intel/vulkan/anv_allocator.c b/src/intel/vulkan/anv_allocator.c
index 784191e..697309f 100644
--- a/src/intel/vulkan/anv_allocator.c
+++ b/src/intel/vulkan/anv_allocator.c
@@ -504,6 +504,9 @@ anv_block_pool_grow(struct anv_block_pool *pool, struct anv_block_state *state)
    anv_bo_init(&pool->bo, gem_handle, size);
    pool->bo.map = map;
 
+   if (pool->device->instance->physicalDevice.has_exec_async)
+      pool->bo.flags |= EXEC_OBJECT_ASYNC;
+
 done:
    pthread_mutex_unlock(&pool->device->mutex);
 
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 7a25ee9..079b0c5 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -202,6 +202,8 @@ anv_physical_device_init(struct anv_physical_device *device,
    if (result != VK_SUCCESS)
       goto fail;
 
+   device->has_exec_async = anv_gem_get_param(fd, I915_PARAM_HAS_EXEC_ASYNC);
+
    if (!anv_device_get_cache_uuid(device->uuid, device->chipset_id)) {
       result = vk_errorf(VK_ERROR_INITIALIZATION_FAILED,
                          "cannot generate UUID");
@@ -1527,6 +1529,9 @@ anv_bo_init_new(struct anv_bo *bo, struct anv_device *device, uint64_t size)
    if (device->instance->physicalDevice.supports_48bit_addresses)
       bo->flags |= EXEC_OBJECT_SUPPORTS_48B_ADDRESS;
 
+   if (device->instance->physicalDevice.has_exec_async)
+      bo->flags |= EXEC_OBJECT_ASYNC;
+
    return VK_SUCCESS;
 }
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 7d07900..1f12a59 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -625,6 +625,7 @@ struct anv_physical_device {
     struct brw_compiler *                       compiler;
     struct isl_device                           isl_dev;
     int                                         cmd_parser_version;
+    bool                                        has_exec_async;
 
     uint32_t                                    eu_total;
     uint32_t                                    subslice_total;
diff --git a/src/intel/vulkan/anv_wsi.c b/src/intel/vulkan/anv_wsi.c
index ba66ea6..a024561 100644
--- a/src/intel/vulkan/anv_wsi.c
+++ b/src/intel/vulkan/anv_wsi.c
@@ -208,6 +208,7 @@ x11_anv_wsi_image_create(VkDevice device_h,
     * know we're writing to them and synchronize uses on other rings (eg if
     * the display server uses the blitter ring).
     */
+   memory->bo.flags &= ~EXEC_OBJECT_ASYNC;
    memory->bo.flags |= EXEC_OBJECT_WRITE;
 
    anv_BindImageMemory(device_h, image_h, memory_h, 0);
-- 
2.5.0.400.gff86faf
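The flag handling in this patch comes down to two small bit operations: buffers get the async flag by default when the kernel supports it, and window-system buffers clear it again and set the write flag so the kernel keeps synchronizing them implicitly. A rough sketch of just that logic follows. The SKETCH_* bit positions and function names are illustrative assumptions; real code takes EXEC_OBJECT_ASYNC and EXEC_OBJECT_WRITE from the kernel's i915 uapi header.

```c
#include <stdint.h>

/* Illustrative bit values only; the real definitions live in the kernel's
 * i915 uapi header. */
#define SKETCH_EXEC_OBJECT_WRITE (1u << 2)
#define SKETCH_EXEC_OBJECT_ASYNC (1u << 6)

/* Default flags for a freshly created buffer: opt out of implicit
 * synchronization whenever the kernel advertises support for it. */
static uint64_t
sketch_default_bo_flags(int has_exec_async)
{
   uint64_t flags = 0;
   if (has_exec_async)
      flags |= SKETCH_EXEC_OBJECT_ASYNC;
   return flags;
}

/* Flags for a buffer shared with the display server: drop the async bit
 * and mark the buffer as written so other users are synchronized. */
static uint64_t
sketch_wsi_bo_flags(uint64_t flags)
{
   flags &= ~(uint64_t)SKETCH_EXEC_OBJECT_ASYNC;
   flags |= SKETCH_EXEC_OBJECT_WRITE;
   return flags;
}
```

Clearing before setting keeps the WSI path correct regardless of whether the default path set the async bit.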
[Mesa-dev] [PATCH 02/21] anv/cmd_buffer: Use the device allocator for QueueSubmit
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.

Cc: "17.0"
---
 src/intel/vulkan/anv_batch_chain.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_batch_chain.c b/src/intel/vulkan/anv_batch_chain.c
index 5f0528f..3e9fa4c 100644
--- a/src/intel/vulkan/anv_batch_chain.c
+++ b/src/intel/vulkan/anv_batch_chain.c
@@ -1265,7 +1265,7 @@ anv_cmd_buffer_execbuf(struct anv_device *device,
                                       cmd_buffer->last_ss_pool_center);
    VkResult result =
       anv_execbuf_add_bo(&execbuf, &ss_pool->bo, &cmd_buffer->surface_relocs,
-                         &cmd_buffer->pool->alloc);
+                         &device->alloc);
    if (result != VK_SUCCESS)
       return result;
 
@@ -1278,7 +1278,7 @@ anv_cmd_buffer_execbuf(struct anv_device *device,
                                          cmd_buffer->last_ss_pool_center);
 
       result = anv_execbuf_add_bo(&execbuf, &(*bbo)->bo, &(*bbo)->relocs,
-                                  &cmd_buffer->pool->alloc);
+                                  &device->alloc);
       if (result != VK_SUCCESS)
          return result;
    }
@@ -1387,7 +1387,7 @@ anv_cmd_buffer_execbuf(struct anv_device *device,
 
    result = anv_device_execbuf(device, &execbuf.execbuf, execbuf.bos);
 
-   anv_execbuf_finish(&execbuf, &cmd_buffer->pool->alloc);
+   anv_execbuf_finish(&execbuf, &device->alloc);
 
    return result;
 }
-- 
2.5.0.400.gff86faf
Re: [Mesa-dev] [PATCH kmscube 6/6] common: Give cmdline parameter for forcing modifiers
On 13 April 2017 at 19:22, Ben Widawsky wrote:
> ---
>  common.c  | 13 -
>  common.h  | 11 ++-
>  kmscube.c | 14 +++---
>  3 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/common.c b/common.c
> index e63bb39..eaaa9a4 100644
> --- a/common.c
> +++ b/common.c
> @@ -31,9 +31,6 @@
>
>  static struct gbm gbm;
>
> -#ifndef DRM_FORMAT_MOD_LINEAR
> -#define DRM_FORMAT_MOD_LINEAR 0
> -#endif
>  static int
>  get_modifiers(uint64_t **mods)
>  {
> @@ -43,7 +40,7 @@ get_modifiers(uint64_t **mods)
>         return 1;
>  }
>
> -const struct gbm * init_gbm(int drm_fd, int w, int h)
> +const struct gbm * init_gbm(int drm_fd, int w, int h, uint64_t modifier)
>  {
>         gbm.dev = gbm_create_device(drm_fd);
>
> @@ -57,7 +54,13 @@ const struct gbm * init_gbm(int drm_fd, int w, int h)
>         }
>  #else
>         uint64_t *mods;
> -       int count = get_modifiers(&mods);
> +       int count;
> +       if (modifier != DRM_FORMAT_MOD_INVALID) {
> +               count = 1;
> +               mods = &modifier;
> +       } else {
> +               count = get_modifiers(&mods);
> +       }
>         gbm.surface = gbm_surface_create_with_modifiers(gbm.dev, w, h,
>                         GBM_FORMAT_XRGB, mods, count);
>  #endif
> diff --git a/common.h b/common.h
> index f3d9d32..03634cc 100644
> --- a/common.h
> +++ b/common.h
> @@ -36,6 +36,14 @@
>  #include "config.h"
>  #endif
>
> +#ifndef DRM_FORMAT_MOD_LINEAR
> +#define DRM_FORMAT_MOD_LINEAR 0
> +#endif
> +
> +#ifndef DRM_FORMAT_MOD_INVALID
> +#define DRM_FORMAT_MOD_INVALID ((((__u64)0) << 56) | ((1ULL << 56) - 1))
> +#endif
> +
>  #ifndef EGL_KHR_platform_gbm
>  #define EGL_KHR_platform_gbm 1
>  #define EGL_PLATFORM_GBM_KHR              0x31D7
> @@ -57,9 +65,10 @@ struct gbm {
>         struct gbm_device *dev;
>         struct gbm_surface *surface;
>         int width, height;
> +       uint64_t forced_modifier;

Seems unused. Drop for now?

With my trivial suggestions the series is
Reviewed-by: Emil Velikov

-Emil
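For readers unfamiliar with the modifier encoding behind the DRM_FORMAT_MOD_INVALID fallback define quoted above: the top 8 bits of a modifier carry a vendor id and the low 56 bits a vendor-specific value, and the "invalid" modifier is vendor 0 with all 56 value bits set. The sketch below spells that out together with the "forced modifier or fallback list" selection from the patch. All SKETCH_* and sketch_* names are hypothetical; real code should use the definitions from drm_fourcc.h.

```c
#include <stdint.h>

/* Hypothetical names mirroring the modifier layout: the top 8 bits carry a
 * vendor id and the low 56 bits a vendor-specific value. */
#define SKETCH_MOD_VENDOR_NONE 0ULL
#define SKETCH_MOD_RESERVED    ((1ULL << 56) - 1)
#define SKETCH_MOD_CODE(vendor, val) \
   ((((uint64_t)(vendor)) << 56) | ((uint64_t)(val) & 0x00ffffffffffffffULL))

#define SKETCH_MOD_LINEAR  SKETCH_MOD_CODE(SKETCH_MOD_VENDOR_NONE, 0)
#define SKETCH_MOD_INVALID \
   SKETCH_MOD_CODE(SKETCH_MOD_VENDOR_NONE, SKETCH_MOD_RESERVED)

/* Pick the modifier list to hand to the allocator: a single forced modifier
 * when the caller supplied one, otherwise the fallback list. Returns the
 * number of entries and stores the list in *mods_out. */
static int
sketch_pick_modifiers(const uint64_t *forced,
                      const uint64_t *fallback, int fallback_count,
                      const uint64_t **mods_out)
{
   if (*forced != SKETCH_MOD_INVALID) {
      *mods_out = forced;
      return 1;
   }

   *mods_out = fallback;
   return fallback_count;
}
```

Using the invalid value as the "nothing forced" sentinel keeps 0 available, which matters because 0 is itself a meaningful modifier (linear).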
Re: [Mesa-dev] [PATCH kmscube 5/6] common: Use libdrm AddFB with modifiers
On 13 April 2017 at 19:22, Ben Widawskywrote: > Note: nothing happens here yet since LINEAR == 0. Suggestion for the subject common: use drmModeAddFB2* API over the legacy drmModeAddFB one > --- > configure.ac | 2 +- > drm-common.c | 37 + > 2 files changed, 34 insertions(+), 5 deletions(-) > > diff --git a/configure.ac b/configure.ac > index 33167e4..f564ef3 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -35,7 +35,7 @@ AC_PROG_CC > m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])]) > > # Obtain compiler/linker options for depedencies > -PKG_CHECK_MODULES(DRM, libdrm) > +PKG_CHECK_MODULES(DRM, [libdrm >= 2.4.71]) > PKG_CHECK_MODULES(GBM, gbm >= 13.0) > PKG_CHECK_MODULES(EGL, egl) > PKG_CHECK_MODULES(GLES2, glesv2) > diff --git a/drm-common.c b/drm-common.c > index b69ed70..eb460df 100644 > --- a/drm-common.c > +++ b/drm-common.c > @@ -46,7 +46,7 @@ struct drm_fb * drm_fb_get_from_bo(struct gbm_bo *bo) > { > int drm_fd = gbm_device_get_fd(gbm_bo_get_device(bo)); > struct drm_fb *fb = gbm_bo_get_user_data(bo); > - uint32_t width, height, stride, handle; > + uint32_t width, height, strides[4]={0}, handles[4] = {0}, offsets[4] > = {0}, flags = 0; Nit: Add spaces around = for strides[]. > int ret; > > if (fb) > @@ -57,10 +57,39 @@ struct drm_fb * drm_fb_get_from_bo(struct gbm_bo *bo) > > width = gbm_bo_get_width(bo); > height = gbm_bo_get_height(bo); > - stride = gbm_bo_get_stride(bo); > - handle = gbm_bo_get_handle(bo).u32; > > - ret = drmModeAddFB(drm_fd, width, height, 24, 32, stride, handle, > >fb_id); > +#ifndef HAVE_GBM_MODIFIERS > + strides[0] = gbm_bo_get_stride(bo); > + handles[0] = gbm_bo_get_handle(bo).u32; These two should go in the fallback path. 
> + ret = -1; > +#else > + uint64_t modifiers[4] = {0}; > + modifiers[0] = gbm_bo_get_modifier(bo); > + const int num_planes = gbm_bo_get_plane_count(bo); > + for (int i = 0; i < num_planes; i++) { > + strides[i] = gbm_bo_get_stride_for_plane(bo, i); > + handles[i] = gbm_bo_get_handle(bo).u32; > + offsets[i] = gbm_bo_get_offset(bo, i); > + modifiers[i] = modifiers[0]; > + } > + > + if (modifiers[0]) { > + flags = DRM_MODE_FB_MODIFIERS; > + printf("Using modifier %lx\n", modifiers[0]); > + } > + > + ret = drmModeAddFB2WithModifiers(drm_fd, width, height, > + DRM_FORMAT_XRGB, handles, strides, offsets, > + modifiers, >fb_id, flags); > +#endif > + if (ret) { > + if (flags) > + fprintf(stderr, "Modifiers failed!\n"); > + flags = 0; Drop this line or use it in drmModeAddFB2? Here we'd want to correctly initialise all of strides[] handles[], since they may contain the 'wrong' values from above. it's a bit pedantic I admit, but should make the code easier to read and will prevent explosions in [buggy] kernel modules. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: remove irrelevant comment
A leftover from anv.

Signed-off-by: Grazvydas Ignotas
---
 src/amd/vulkan/radv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 5f14394..7857e8f 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -660,11 +660,11 @@ void radv_GetPhysicalDeviceProperties(
 		.driverVersion = radv_get_driver_version(),
 		.vendorID = 0x1002,
 		.deviceID = pdevice->rad_info.pci_id,
 		.deviceType = VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU,
 		.limits = limits,
-		.sparseProperties = {0}, /* Broadwell doesn't do sparse. */
+		.sparseProperties = {0},
 	};

 	strcpy(pProperties->deviceName, pdevice->name);
 	memcpy(pProperties->pipelineCacheUUID, pdevice->uuid, VK_UUID_SIZE);
 }
-- 
2.7.4
[Mesa-dev] [PATCH] radv: report timestampPeriod correctly
The kernel returns the frequency in kHz, so to convert to the nanosecond
interval that Vulkan uses, the dividend should be 1000000.0 and not 100000.0.

This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.

Signed-off-by: Grazvydas Ignotas
Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
---
 src/amd/vulkan/radv_device.c        | 2 +-
 src/amd/vulkan/radv_radeon_winsys.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 7857e8f..796cc70 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -637,11 +637,11 @@ void radv_GetPhysicalDeviceProperties(
 		.sampledImageDepthSampleCounts = sample_counts,
 		.sampledImageStencilSampleCounts = sample_counts,
 		.storageImageSampleCounts = VK_SAMPLE_COUNT_1_BIT,
 		.maxSampleMaskWords = 1,
 		.timestampComputeAndGraphics = false,
-		.timestampPeriod = 100000.0 / pdevice->rad_info.clock_crystal_freq,
+		.timestampPeriod = 1000000.0 / pdevice->rad_info.clock_crystal_freq,
 		.maxClipDistances = 8,
 		.maxCullDistances = 8,
 		.maxCombinedClipAndCullDistances = 8,
 		.discreteQueuePriorities = 1,
 		.pointSizeRange = { 0.125, 255.875 },
diff --git a/src/amd/vulkan/radv_radeon_winsys.h b/src/amd/vulkan/radv_radeon_winsys.h
index 9f2430f..f6bab74 100644
--- a/src/amd/vulkan/radv_radeon_winsys.h
+++ b/src/amd/vulkan/radv_radeon_winsys.h
@@ -93,11 +93,11 @@ struct radeon_info {
 	bool has_uvd;
 	uint32_t sdma_rings;
 	uint32_t compute_rings;
 	uint32_t vce_fw_version;
 	uint32_t vce_harvest_config;
-	uint32_t clock_crystal_freq;
+	uint32_t clock_crystal_freq; /* in kHz */

 	/* Kernel info. */
 	uint32_t drm_major; /* version */
 	uint32_t drm_minor;
 	uint32_t drm_patchlevel;
-- 
2.7.4
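The unit conversion behind this fix can be checked with a small standalone sketch. It assumes only what the commit message states: a crystal frequency reported in kHz and a period expressed in nanoseconds per tick. The helper name timestamp_period_ns is made up for illustration and is not part of radv.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not radv code): nanoseconds per timestamp tick
 * for a clock frequency given in kHz.
 *
 *   period_ns = 1e9 ns/s / (freq_khz * 1e3 Hz) = 1e6 / freq_khz
 */
static float timestamp_period_ns(uint32_t freq_khz)
{
    assert(freq_khz != 0); /* a zero frequency has no meaningful period */
    return 1000000.0f / (float)freq_khz;
}
```

For example, a 100000 kHz (100 MHz) clock gives 10 ns per tick. A dividend that is too small by a factor of ten would report 1 ns instead, which would make every measured GPU interval look ten times shorter than it is.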
Re: [Mesa-dev] [PATCH kmscube 4/6] common: Use the create with modifiers interface
On 13 April 2017 at 19:22, Ben Widawsky wrote:
> ---
>  common.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/common.c b/common.c
> index 4bf3c5a..e63bb39 100644
> --- a/common.c
> +++ b/common.c
> @@ -31,10 +31,23 @@
>
>  static struct gbm gbm;
>
> +#ifndef DRM_FORMAT_MOD_LINEAR
> +#define DRM_FORMAT_MOD_LINEAR 0
> +#endif
> +static int
> +get_modifiers(uint64_t **mods)
> +{
> +   /* Assumed LINEAR is supported everywhere */
> +   static uint64_t modifiers[] = {DRM_FORMAT_MOD_LINEAR};
> +   *mods = modifiers;
> +   return 1;
> +}
> +
>  const struct gbm * init_gbm(int drm_fd, int w, int h)
>  {
>     gbm.dev = gbm_create_device(drm_fd);
>
> +#ifndef HAVE_GBM_MODIFIERS
>     gbm.surface = gbm_surface_create(gbm.dev, w, h,
>           GBM_FORMAT_XRGB8888,
>           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
> @@ -42,6 +55,12 @@ const struct gbm * init_gbm(int drm_fd, int w, int h)
>        printf("failed to create gbm surface\n");
>        return NULL;
>     }
> +#else
> +   uint64_t *mods;
> +   int count = get_modifiers(&mods);
> +   gbm.surface = gbm_surface_create_with_modifiers(gbm.dev, w, h,
> +         GBM_FORMAT_XRGB8888, mods, count);
> +#endif
>

Since gbm_surface_create_with_modifiers() can fail, we want to have some
error handling. Move the existing check after the ifndef/else block?

-Emil
Re: [Mesa-dev] [PATCH kmscube 3/6] common: include config.h
Hi Ben,

On 13 April 2017 at 19:22, Ben Widawsky wrote:
> ---
>  common.h | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/common.h b/common.h
> index 2eceac7..f3d9d32 100644
> --- a/common.h
> +++ b/common.h
> @@ -32,6 +32,10 @@
>  #include
>  #include
>
> +#ifdef HAVE_CONFIG_H
> + #include "config.h"
> +#endif
> +

There's no config.h so you don't need this patch.

-Emil
Re: [Mesa-dev] [PATCH v2] configure.ac: add --enable-sanitize option
On 13 April 2017 at 17:14, Nicolai Hähnle wrote:
> From: Nicolai Hähnle
>
> Enable code sanitizers by adding -fsanitize=$foo flags for the compiler
> and linker.
>
> In addition, this also disables checking for undefined symbols: running
> the address sanitizer requires additional symbols which should be provided
> by a preloaded libasan.so (preloaded for hooking into malloc & friends
> globally), and the undefined symbols check gets tripped up by that.
>
> Running the tests works normally via `make check`, but shows additional
> failures with the address sanitizer due to memory leaks that seem to be
> mostly leaks in the tests themselves. I believe those failures should
> really be fixed. In the meantime, you can set
>
>    export ASAN_OPTIONS=detect_leaks=0
>
> to only check for more serious error types.
>
> v2:
> - fail reasonably when an unsupported sanitize flag is given (Eric Engestrom)
>
> Reviewed-by: Bartosz Tomczyk (v1)
> Reviewed-by: Eric Engestrom
> --
> Eric, did you ever figure out what went wrong with LLVM? I'm compiling
> with a fairly recent LLVM trunk here and it works fine, and so apparently
> did you. FWIW, I'm using gcc 6.2.
>
> Emil, as you can see I tried `make check`, and it works without the
> preload because all the tests are standalone libraries.
>

Thought we had some tests that use shared libs. Glad to hear that
everything works as expected. Thanks for double-checking!

Reviewed-by: Emil Velikov

-Emil

P.S. Feel free to add a note to docs/relnotes/17.1.0.html or I'll add one
later today.
Re: [Mesa-dev] [PATCH] nir: Destination component count of shader_clock intrinsic is 2
+Matt, +Ken On Wed, Apr 12, 2017 at 6:09 PM, Boyan Dingwrote: > 2017-04-13 2:25 GMT+08:00 Jason Ekstrand : > > On Wed, Apr 12, 2017 at 6:14 AM, Boyan Ding > wrote: > >> > >> This fixes the following error when using ARB_shader_clock on i965: > >> vec1 32 ssa_0 = intrinsic shader_clock () () () > >> intrinsic store_var (ssa_0) (clock_retval) (3) /* wrmask=xy */ > >> error: src->ssa->num_components == num_components > (nir/nir_validate.c:204) > >> > >> Cc: mesa-sta...@lists.freedesktop.org > >> Signed-off-by: Boyan Ding > >> --- > >> src/compiler/glsl/glsl_to_nir.cpp | 3 ++- > >> src/compiler/nir/nir_intrinsics.h | 2 +- > >> 2 files changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/src/compiler/glsl/glsl_to_nir.cpp > >> b/src/compiler/glsl/glsl_to_nir.cpp > >> index f0557f985b..870d457681 100644 > >> --- a/src/compiler/glsl/glsl_to_nir.cpp > >> +++ b/src/compiler/glsl/glsl_to_nir.cpp > >> @@ -930,7 +930,8 @@ nir_visitor::visit(ir_call *ir) > >> nir_builder_instr_insert(, >instr); > >> break; > >>case nir_intrinsic_shader_clock: > >> - nir_ssa_dest_init(>instr, >dest, 1, 32, NULL); > >> + nir_ssa_dest_init(>instr, >dest, 2, 32, NULL); > >> + instr->num_components = 2; > > > > > > This made me go look at the spec, and things get a bit more subtle... In > > particular, ARB_shader_clock specifies two builtin functions: > > > > uvec2 clock2x32ARB(void); > > uint64_t clockARB(void); > > > > Where the second one only exists if you support int64. On gen8+, we do > > support int64... > > > > My feeling is that the correct way to implement this is to make the NIR > > intrinsic return a 64bit value and wrap it in a nir_unpack_64_2x32 if the > > client asks for the 2x32 version. If that's too much refactoring for > you, > > then this patch is probably sufficient to solve the issue today. > > > > I agree with you. I'm not very familiar with nir internals, and was > just copying TGSI's handling here. 
There will be more intrinsics with > 64bit results, for example, ballot, which radv guys might be > interested in. > > I won't mind if someone comes up with a better solution and replaces > mine. But just as you said above, it solves the issue today. It's up > to you to decide. > > Cheers, > Boyan Ding > > >> nir_builder_instr_insert(, >instr); > >> break; > >>case nir_intrinsic_store_ssbo: { > >> diff --git a/src/compiler/nir/nir_intrinsics.h > >> b/src/compiler/nir/nir_intrinsics.h > >> index 105c56f759..3a519a73dd 100644 > >> --- a/src/compiler/nir/nir_intrinsics.h > >> +++ b/src/compiler/nir/nir_intrinsics.h > >> @@ -91,7 +91,7 @@ BARRIER(memory_barrier) > >> * The latter can be used as code motion barrier, which is currently > not > >> * feasible with NIR. > >> */ > >> -INTRINSIC(shader_clock, 0, ARR(0), true, 1, 0, 0, xx, xx, xx, > >> NIR_INTRINSIC_CAN_ELIMINATE) > >> +INTRINSIC(shader_clock, 0, ARR(0), true, 2, 0, 0, xx, xx, xx, > >> NIR_INTRINSIC_CAN_ELIMINATE) > >> > >> /* > >> * Memory barrier with semantics analogous to the compute shader > >> -- > >> 2.12.0 > >> > >> ___ > >> mesa-dev mailing list > >> mesa-dev@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: replace _mesa_index_buffer::type with index_size
On Fri, Apr 14, 2017 at 12:45 PM, Marek Olšák wrote:
> On Fri, Apr 14, 2017 at 5:12 PM, Ilia Mirkin wrote:
>> On Fri, Apr 14, 2017 at 11:06 AM, Marek Olšák wrote:
>>> diff --git a/src/mesa/vbo/vbo.h b/src/mesa/vbo/vbo.h
>>> index d62ab4e..79f7538 100644
>>> --- a/src/mesa/vbo/vbo.h
>>> +++ b/src/mesa/vbo/vbo.h
>>
>> Should also be possible to remove vbo_sizeof_ib_type from here right?
>
> vbo_sizeof_ib_type is used to get index_size at the beginning of
> indexed draw calls. However, it's not used in other places anymore.

Erm right. Duh. My r-b still stands.
Re: [Mesa-dev] [PATCH 1/7] gallium: fold u_trim_pipe_prim call from st/mesa to drivers
On Fri, Apr 14, 2017 at 12:42 PM, Marek Olšák wrote:
> On Fri, Apr 14, 2017 at 5:45 PM, Ilia Mirkin wrote:
>> On Fri, Apr 14, 2017 at 11:07 AM, Marek Olšák wrote:
>>> diff --git a/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>>> b/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>>> index bc9b9a1..295c394 100644
>>> --- a/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>>> +++ b/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>>> @@ -543,20 +543,23 @@ nv30_draw_elements(struct nv30_context *nv30, bool shorten,
>>>     }
>>>  }
>>>
>>>  static void
>>>  nv30_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
>>>  {
>>>     struct nv30_context *nv30 = nv30_context(pipe);
>>>     struct nouveau_pushbuf *push = nv30->base.pushbuf;
>>>     int i;
>>>
>>> +   if (!u_trim_pipe_prim(info->mode, (unsigned*)&info->count))
>>> +      return;
>>> +
>>
>> Should this also have a !info->primitive_restart? It's supported on
>> nv4x (covered by this driver).
>
> In that case, I wonder if u_trim_pipe_prim is required with this
> driver. It might be better to just remove that call.

Based on a quick look, this seems to exist to prevent short draws and
to trim the count to the nearest prim size, i.e. for when you try to draw
tris with a vertex count where count % 3 != 0, or lines where
count % 2 != 0.

I'm not 100% sure that the NV30 HW handles those correctly, but it
probably does. I can double-check tonight, as I have one plugged in
these days.

Cheers,

  -ilia
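To make the "trim the count to the nearest prim size" idea concrete, here is a minimal sketch. It is not the Mesa u_trim_pipe_prim implementation: trim_vertex_count and its (min, incr) parameters are hypothetical, chosen so that independent triangles are (3, 3), independent lines are (2, 2) and a triangle strip is (3, 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not the Mesa function.
 *
 * min:  fewest vertices that form one primitive
 * incr: vertices consumed by each additional primitive
 *
 * Returns false (and zeroes *count) when the draw is too short to produce
 * any primitive; otherwise drops the trailing vertices that cannot form a
 * complete primitive and returns true.
 */
static bool trim_vertex_count(unsigned min, unsigned incr, unsigned *count)
{
    assert(incr != 0);

    if (*count < min) {
        *count = 0;
        return false;
    }

    *count -= (*count - min) % incr;
    return true;
}
```

With (3, 3), a draw of 7 vertices is trimmed to 6 and a draw of 2 vertices is rejected outright. Whether a given GPU needs this done in software is exactly the open question in the thread.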
Re: [Mesa-dev] [PATCH] nir: Destination component count of shader_clock intrinsic is 2
On Wed, Apr 12, 2017 at 6:14 AM, Boyan Dingwrote: > This fixes the following error when using ARB_shader_clock on i965: > vec1 32 ssa_0 = intrinsic shader_clock () () () > intrinsic store_var (ssa_0) (clock_retval) (3) /* wrmask=xy */ > error: src->ssa->num_components == num_components (nir/nir_validate.c:204) > > Cc: mesa-sta...@lists.freedesktop.org > Signed-off-by: Boyan Ding > --- > src/compiler/glsl/glsl_to_nir.cpp | 3 ++- > src/compiler/nir/nir_intrinsics.h | 2 +- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/glsl/glsl_to_nir.cpp > b/src/compiler/glsl/glsl_to_nir.cpp > index f0557f985b..870d457681 100644 > --- a/src/compiler/glsl/glsl_to_nir.cpp > +++ b/src/compiler/glsl/glsl_to_nir.cpp > @@ -930,7 +930,8 @@ nir_visitor::visit(ir_call *ir) > nir_builder_instr_insert(, >instr); > break; >case nir_intrinsic_shader_clock: > - nir_ssa_dest_init(>instr, >dest, 1, 32, NULL); > + nir_ssa_dest_init(>instr, >dest, 2, 32, NULL); > + instr->num_components = 2; > This isn't needed for things that have an explicit number of components. You can drop it. Other than that, Reviewed-by: Jason Ekstrand We can figure out hte int64 interactions later. > nir_builder_instr_insert(, >instr); > break; >case nir_intrinsic_store_ssbo: { > diff --git a/src/compiler/nir/nir_intrinsics.h b/src/compiler/nir/nir_ > intrinsics.h > index 105c56f759..3a519a73dd 100644 > --- a/src/compiler/nir/nir_intrinsics.h > +++ b/src/compiler/nir/nir_intrinsics.h > @@ -91,7 +91,7 @@ BARRIER(memory_barrier) > * The latter can be used as code motion barrier, which is currently not > * feasible with NIR. 
> */ > -INTRINSIC(shader_clock, 0, ARR(0), true, 1, 0, 0, xx, xx, xx, > NIR_INTRINSIC_CAN_ELIMINATE) > +INTRINSIC(shader_clock, 0, ARR(0), true, 2, 0, 0, xx, xx, xx, > NIR_INTRINSIC_CAN_ELIMINATE) > > /* > * Memory barrier with semantics analogous to the compute shader > -- > 2.12.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: replace _mesa_index_buffer::type with index_size
On Fri, Apr 14, 2017 at 5:12 PM, Ilia Mirkin wrote:
> On Fri, Apr 14, 2017 at 11:06 AM, Marek Olšák wrote:
>> diff --git a/src/mesa/vbo/vbo.h b/src/mesa/vbo/vbo.h
>> index d62ab4e..79f7538 100644
>> --- a/src/mesa/vbo/vbo.h
>> +++ b/src/mesa/vbo/vbo.h
>
> Should also be possible to remove vbo_sizeof_ib_type from here right?

vbo_sizeof_ib_type is used to get index_size at the beginning of
indexed draw calls. However, it's not used in other places anymore.

Marek
Re: [Mesa-dev] [PATCH 1/7] gallium: fold u_trim_pipe_prim call from st/mesa to drivers
On Fri, Apr 14, 2017 at 5:45 PM, Ilia Mirkin wrote:
> On Fri, Apr 14, 2017 at 11:07 AM, Marek Olšák wrote:
>> diff --git a/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>> b/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>> index bc9b9a1..295c394 100644
>> --- a/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>> +++ b/src/gallium/drivers/nouveau/nv30/nv30_vbo.c
>> @@ -543,20 +543,23 @@ nv30_draw_elements(struct nv30_context *nv30, bool shorten,
>>     }
>>  }
>>
>>  static void
>>  nv30_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
>>  {
>>     struct nv30_context *nv30 = nv30_context(pipe);
>>     struct nouveau_pushbuf *push = nv30->base.pushbuf;
>>     int i;
>>
>> +   if (!u_trim_pipe_prim(info->mode, (unsigned*)&info->count))
>> +      return;
>> +
>
> Should this also have a !info->primitive_restart? It's supported on
> nv4x (covered by this driver).

In that case, I wonder if u_trim_pipe_prim is required with this
driver. It might be better to just remove that call.

Marek
Re: [Mesa-dev] [PATCH 7/7] st/mesa: use one big translation table in st_pipe_vertex_format
Thanks. I'm amending this: diff --git a/src/mesa/state_tracker/st_atom_array.c b/src/mesa/state_tracker/st_atom_array.c index 6cfbd24..436ea45 100644 --- a/src/mesa/state_tracker/st_atom_array.c +++ b/src/mesa/state_tracker/st_atom_array.c @@ -47,8 +47,9 @@ #include "main/bufferobj.h" #include "main/glformats.h" -static uint16_t vertex_formats[][4][4] = { - { +/* vertex_formats[gltype - GL_BYTE][integer*2 + normalized][size - 1] */ +static const uint16_t vertex_formats[][4][4] = { + { /* GL_BYTE */ { PIPE_FORMAT_R8_SSCALED, PIPE_FORMAT_R8G8_SSCALED, @@ -68,7 +69,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R8G8B8A8_SINT }, }, - { + { /* GL_UNSIGNED_BYTE */ { PIPE_FORMAT_R8_USCALED, PIPE_FORMAT_R8G8_USCALED, @@ -88,7 +89,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R8G8B8A8_UINT }, }, - { + { /* GL_SHORT */ { PIPE_FORMAT_R16_SSCALED, PIPE_FORMAT_R16G16_SSCALED, @@ -108,7 +109,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R16G16B16A16_SINT }, }, - { + { /* GL_UNSIGNED_SHORT */ { PIPE_FORMAT_R16_USCALED, PIPE_FORMAT_R16G16_USCALED, @@ -128,7 +129,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R16G16B16A16_UINT }, }, - { + { /* GL_INT */ { PIPE_FORMAT_R32_SSCALED, PIPE_FORMAT_R32G32_SSCALED, @@ -148,7 +149,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R32G32B32A32_SINT }, }, - { + { /* GL_UNSIGNED_INT */ { PIPE_FORMAT_R32_USCALED, PIPE_FORMAT_R32G32_USCALED, @@ -168,7 +169,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R32G32B32A32_UINT }, }, - { + { /* GL_FLOAT */ { PIPE_FORMAT_R32_FLOAT, PIPE_FORMAT_R32G32_FLOAT, @@ -185,7 +186,7 @@ static uint16_t vertex_formats[][4][4] = { {{0}}, /* GL_2_BYTES */ {{0}}, /* GL_3_BYTES */ {{0}}, /* GL_4_BYTES */ - { + { /* GL_DOUBLE */ { PIPE_FORMAT_R64_FLOAT, PIPE_FORMAT_R64G64_FLOAT, @@ -199,7 +200,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R64G64B64A64_FLOAT }, }, - { + { /* GL_HALF_FLOAT */ { PIPE_FORMAT_R16_FLOAT, 
PIPE_FORMAT_R16G16_FLOAT, @@ -213,7 +214,7 @@ static uint16_t vertex_formats[][4][4] = { PIPE_FORMAT_R16G16B16A16_FLOAT }, }, - { + { /* GL_FIXED */ { PIPE_FORMAT_R32_FIXED, Marek On Fri, Apr 14, 2017 at 5:41 PM, Brian Paulwrote: > On 04/14/2017 09:07 AM, Marek Olšák wrote: >> >> From: Marek Olšák >> >> for lower overhead. >> --- >> src/mesa/state_tracker/st_atom_array.c | 469 >> - >> 1 file changed, 227 insertions(+), 242 deletions(-) >> >> diff --git a/src/mesa/state_tracker/st_atom_array.c >> b/src/mesa/state_tracker/st_atom_array.c >> index 221b2c7..6cfbd24 100644 >> --- a/src/mesa/state_tracker/st_atom_array.c >> +++ b/src/mesa/state_tracker/st_atom_array.c >> @@ -40,284 +40,269 @@ >> #include "st_atom.h" >> #include "st_cb_bufferobjects.h" >> #include "st_draw.h" >> #include "st_program.h" >> >> #include "cso_cache/cso_context.h" >> #include "util/u_math.h" >> #include "main/bufferobj.h" >> #include "main/glformats.h" >> >> - >> -static GLuint double_types[4] = { >> - PIPE_FORMAT_R64_FLOAT, >> - PIPE_FORMAT_R64G64_FLOAT, >> - PIPE_FORMAT_R64G64B64_FLOAT, >> - PIPE_FORMAT_R64G64B64A64_FLOAT >> -}; >> - >> -static GLuint float_types[4] = { >> - PIPE_FORMAT_R32_FLOAT, >> - PIPE_FORMAT_R32G32_FLOAT, >> - PIPE_FORMAT_R32G32B32_FLOAT, >> - PIPE_FORMAT_R32G32B32A32_FLOAT >> -}; >> - >> -static GLuint half_float_types[4] = { >> - PIPE_FORMAT_R16_FLOAT, >> - PIPE_FORMAT_R16G16_FLOAT, >> - PIPE_FORMAT_R16G16B16_FLOAT, >> - PIPE_FORMAT_R16G16B16A16_FLOAT >> -}; >> - >> -static GLuint uint_types_norm[4] = { >> - PIPE_FORMAT_R32_UNORM, >> - PIPE_FORMAT_R32G32_UNORM, >> - PIPE_FORMAT_R32G32B32_UNORM, >> - PIPE_FORMAT_R32G32B32A32_UNORM >> -}; >> - >> -static GLuint uint_types_scale[4] = { >> - PIPE_FORMAT_R32_USCALED, >> - PIPE_FORMAT_R32G32_USCALED, >> - PIPE_FORMAT_R32G32B32_USCALED, >> - PIPE_FORMAT_R32G32B32A32_USCALED >> -}; >> - >> -static GLuint uint_types_int[4] = { >> - PIPE_FORMAT_R32_UINT, >> - PIPE_FORMAT_R32G32_UINT, >> - PIPE_FORMAT_R32G32B32_UINT, >> - 
PIPE_FORMAT_R32G32B32A32_UINT >> -}; >> - >> -static GLuint int_types_norm[4] = { >> - PIPE_FORMAT_R32_SNORM, >> - PIPE_FORMAT_R32G32_SNORM, >> - PIPE_FORMAT_R32G32B32_SNORM, >> - PIPE_FORMAT_R32G32B32A32_SNORM >> -}; >> - >> -static GLuint int_types_scale[4] = { >> - PIPE_FORMAT_R32_SSCALED, >> - PIPE_FORMAT_R32G32_SSCALED, >> - PIPE_FORMAT_R32G32B32_SSCALED, >> -
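The indexing scheme named in the amended comment, vertex_formats[gltype - GL_BYTE][integer*2 + normalized][size - 1], can be sketched on a cut-down table. Everything below is illustrative: the FMT_* ids stand in for PIPE_FORMAT_* values, only two types and two sizes are filled in, and lookup_format is a hypothetical name. The only values taken from elsewhere are 0x1400 and 0x1401, the GL_BYTE and GL_UNSIGNED_BYTE enum values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type codes; 0x1400 / 0x1401 are the GL_BYTE / GL_UNSIGNED_BYTE values. */
enum { TYPE_BYTE = 0x1400, TYPE_UNSIGNED_BYTE = 0x1401 };

/* Hypothetical format ids standing in for PIPE_FORMAT_*; 0 means "none". */
enum sketch_format {
    FMT_NONE = 0,
    FMT_R8_SSCALED, FMT_R8G8_SSCALED,
    FMT_R8_SNORM,   FMT_R8G8_SNORM,
    FMT_R8_SINT,    FMT_R8G8_SINT,
    FMT_R8_USCALED, FMT_R8G8_USCALED,
    FMT_R8_UNORM,   FMT_R8G8_UNORM,
    FMT_R8_UINT,    FMT_R8G8_UINT
};

/* formats[type - TYPE_BYTE][integer*2 + normalized][size - 1]
 * Row 0 = scaled, row 1 = normalized, row 2 = pure integer.
 * Row 3 (integer and normalized together) is left zero-initialized. */
static const uint16_t formats[2][4][2] = {
    { /* TYPE_BYTE */
        { FMT_R8_SSCALED, FMT_R8G8_SSCALED },
        { FMT_R8_SNORM,   FMT_R8G8_SNORM },
        { FMT_R8_SINT,    FMT_R8G8_SINT },
    },
    { /* TYPE_UNSIGNED_BYTE */
        { FMT_R8_USCALED, FMT_R8G8_USCALED },
        { FMT_R8_UNORM,   FMT_R8G8_UNORM },
        { FMT_R8_UINT,    FMT_R8G8_UINT },
    },
};

/* One array lookup replaces a chain of per-type switch cases. */
static uint16_t lookup_format(unsigned type, unsigned size,
                              bool normalized, bool integer)
{
    assert(type >= TYPE_BYTE && type <= TYPE_UNSIGNED_BYTE);
    assert(size >= 1 && size <= 2);
    return formats[type - TYPE_BYTE][integer * 2 + normalized][size - 1];
}
```

The design point is that the three inputs map directly onto array indices, so the translation costs one load with no branching; unsupported combinations simply read back as zero.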
Re: [Mesa-dev] [PATCH] swr: Add polygon stipple support
On Fri, Apr 14, 2017 at 11:18 AM, Ilia Mirkinwrote: > On Thu, Apr 13, 2017 at 4:30 PM, George Kyriazis > wrote: >> Add polygon stipple functionality to the fragment shader. >> >> Explicitly turn off polygon stipple for lines and points, since we >> do them using tris. >> --- >> src/gallium/drivers/swr/swr_context.h | 4 ++- >> src/gallium/drivers/swr/swr_shader.cpp | 56 >> ++ >> src/gallium/drivers/swr/swr_shader.h | 1 + >> src/gallium/drivers/swr/swr_state.cpp | 27 ++-- >> src/gallium/drivers/swr/swr_state.h| 5 +++ >> 5 files changed, 84 insertions(+), 9 deletions(-) >> >> diff --git a/src/gallium/drivers/swr/swr_context.h >> b/src/gallium/drivers/swr/swr_context.h >> index be65a20..9d80c70 100644 >> --- a/src/gallium/drivers/swr/swr_context.h >> +++ b/src/gallium/drivers/swr/swr_context.h >> @@ -98,6 +98,8 @@ struct swr_draw_context { >> >> float userClipPlanes[PIPE_MAX_CLIP_PLANES][4]; >> >> + uint32_t polyStipple[32]; >> + >> SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS]; >> void *pStats; >> }; >> @@ -127,7 +129,7 @@ struct swr_context { >> struct pipe_constant_buffer >>constants[PIPE_SHADER_TYPES][PIPE_MAX_CONSTANT_BUFFERS]; >> struct pipe_framebuffer_state framebuffer; >> - struct pipe_poly_stipple poly_stipple; >> + struct swr_poly_stipple poly_stipple; >> struct pipe_scissor_state scissor; >> SWR_RECT swr_scissor; >> struct pipe_sampler_view * >> diff --git a/src/gallium/drivers/swr/swr_shader.cpp >> b/src/gallium/drivers/swr/swr_shader.cpp >> index 6fc0596..d8f5512 100644 >> --- a/src/gallium/drivers/swr/swr_shader.cpp >> +++ b/src/gallium/drivers/swr/swr_shader.cpp >> @@ -165,6 +165,9 @@ swr_generate_fs_key(struct swr_jit_fs_key , >>sizeof(key.vs_output_semantic_idx)); >> >> swr_generate_sampler_key(swr_fs->info, ctx, PIPE_SHADER_FRAGMENT, key); >> + >> + key.poly_stipple_enable = ctx->rasterizer->poly_stipple_enable && >> + ctx->poly_stipple.prim_is_poly; >> } >> >> void >> @@ -1099,17 +1102,58 @@ BuilderSWR::CompileFS(struct swr_context *ctx, >> 
swr_jit_fs_key ) >> memset(_values, 0, sizeof(system_values)); >> >> struct lp_build_mask_context mask; >> + bool uses_mask = false; >> >> - if (swr_fs->info.base.uses_kill) { >> - Value *mask_val = LOAD(pPS, {0, SWR_PS_CONTEXT_activeMask}, >> "activeMask"); >> + if (swr_fs->info.base.uses_kill || >> + key.poly_stipple_enable) { >> + Value *vActiveMask = NULL; >> + if (swr_fs->info.base.uses_kill) { >> + vActiveMask = LOAD(pPS, {0, SWR_PS_CONTEXT_activeMask}, >> "activeMask"); >> + } >> + if (key.poly_stipple_enable) { >> + // first get fragment xy coords and clip to stipple bounds >> + Value *vXf = LOAD(pPS, {0, SWR_PS_CONTEXT_vX, PixelPositions_UL}); >> + Value *vYf = LOAD(pPS, {0, SWR_PS_CONTEXT_vY, PixelPositions_UL}); >> + Value *vXu = FP_TO_UI(vXf, mSimdInt32Ty); >> + Value *vYu = FP_TO_UI(vYf, mSimdInt32Ty); >> + >> + // stipple pattern is 32x32, which means that one line of stipple >> + // is stored in one word: >> + // vXstipple is bit offset inside 32-bit stipple word >> + // vYstipple is word index is stipple array >> + Value *vXstipple = AND(vXu, VIMMED1(0x1f)); // & (32-1) >> + Value *vYstipple = AND(vYu, VIMMED1(0x1f)); // & (32-1) >> + >> + // grab stipple pattern base address >> + Value *stipplePtr = GEP(hPrivateData, {0, >> swr_draw_context_polyStipple, 0}); >> + stipplePtr = BITCAST(stipplePtr, mInt8PtrTy); >> + >> + // peform a gather to grab stipple words for each lane >> + Value *vStipple = GATHERDD(VUNDEF_I(), stipplePtr, vYstipple, >> +VIMMED1(0x), C((char)4)); >> + >> + // create a mask with one bit corresponding to the x stipple >> + // and AND it with the pattern, to see if we have a bit >> + Value *vBitMask = LSHR(VIMMED1(0x8000), vXstipple); >> + Value *vStippleMask = AND(vStipple, vBitMask); >> + vStippleMask = ICMP_NE(vStippleMask, VIMMED1(0)); >> + vStippleMask = VMASK(vStippleMask); >> + >> + if (swr_fs->info.base.uses_kill) { >> +vActiveMask = AND(vActiveMask, vStippleMask); >> + } else { >> +vActiveMask = vStippleMask; >> + } >> + 
} >>lp_build_mask_begin( >> - , gallivm, lp_type_float_vec(32, 32 * 8), wrap(mask_val)); >> + , gallivm, lp_type_float_vec(32, 32 * 8), wrap(vActiveMask)); >> + uses_mask = true; >> } >> >> lp_build_tgsi_soa(gallivm, >> swr_fs->pipe.tokens, >> lp_type_float_vec(32, 32 * 8), >> - swr_fs->info.base.uses_kill ? : NULL, // mask >> + uses_mask ? : NULL, //
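The per-fragment stipple test described in the patch comments (a 32x32 pattern, one 32-bit word per row, a single-bit mask shifted right by the x offset) can be written as a scalar sketch. This is not the SWR JIT code, which builds the same logic as SIMD IR; stipple_pixel_visible is a hypothetical name, and the assumption that the most significant bit corresponds to the leftmost pixel is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of a 32x32 polygon stipple test (hypothetical helper).
 *
 * pattern: 32 words, one per stipple row
 * x, y:    window coordinates of the fragment
 *
 * Assumption: bit 31 of each word is the leftmost pixel of the row.
 */
static bool stipple_pixel_visible(const uint32_t pattern[32],
                                  unsigned x, unsigned y)
{
    assert(pattern != NULL);

    uint32_t row = pattern[y & 31u];                  /* y wraps every 32 rows */
    uint32_t bit = UINT32_C(0x80000000) >> (x & 31u); /* x selects one bit     */

    return (row & bit) != 0;
}
```

Because both coordinates are masked with 31, the pattern tiles across the whole window. A fragment whose bit is clear would be dropped, which is why the patch folds the result into the same active mask that kill instructions use.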
[Mesa-dev] [PATCH 3/3] winsys/amdgpu: init buffer_indices_hashlist with memset()
Signed-off-by: Samuel Pitoiset
---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index f068d8ea7a..8a277d08e1 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -695,8 +695,6 @@ static void amdgpu_ib_finalize(struct amdgpu_ib *ib)
 static bool amdgpu_init_cs_context(struct amdgpu_cs_context *cs,
                                    enum ring_type ring_type)
 {
-   int i;
-
    switch (ring_type) {
    case RING_DMA:
       cs->request.ip_type = AMDGPU_HW_IP_DMA;
@@ -720,9 +718,7 @@ static bool amdgpu_init_cs_context(struct amdgpu_cs_context *cs,
       break;
    }

-   for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
-      cs->buffer_indices_hashlist[i] = -1;
-   }
+   memset(cs->buffer_indices_hashlist, -1, sizeof(cs->buffer_indices_hashlist));
    cs->last_added_bo = NULL;

    cs->request.number_of_ibs = 1;
@@ -757,9 +753,7 @@ static void amdgpu_cs_context_cleanup(struct amdgpu_cs_context *cs)
    cs->num_sparse_buffers = 0;
    amdgpu_fence_reference(&cs->fence, NULL);

-   for (i = 0; i < ARRAY_SIZE(cs->buffer_indices_hashlist); i++) {
-      cs->buffer_indices_hashlist[i] = -1;
-   }
+   memset(cs->buffer_indices_hashlist, -1, sizeof(cs->buffer_indices_hashlist));
    cs->last_added_bo = NULL;
 }

-- 
2.12.2
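Replacing the loop with memset() works because memset writes the same byte value (here 0xFF) into every byte, and an integer whose bytes are all 0xFF reads back as -1 on two's complement targets. A small sketch of that equivalence follows; struct cs_sketch, its 16-entry array and the int element type are assumptions for illustration, not the real amdgpu_cs_context layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real context struct; size and element type are assumed. */
struct cs_sketch {
    int buffer_indices_hashlist[16];
};

/* Mark every hash slot as empty (-1) with a single memset.
 *
 * sizeof on the array member gives the size of the whole array in bytes,
 * so no element count is needed. This would not work through a plain
 * pointer, where sizeof would give the pointer size instead.
 */
static void reset_hashlist(struct cs_sketch *cs)
{
    assert(cs != NULL);
    memset(cs->buffer_indices_hashlist, -1,
           sizeof(cs->buffer_indices_hashlist));
}
```

The trick only applies to fill values whose bytes are all identical, such as 0 and -1. Filling an int array with, say, 1 this way would produce 0x01010101 in each element rather than 1.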