Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Bas Nieuwenhuizen writes:

> Reviewed-by: Bas Nieuwenhuizen

Thanks to you, Jason and Lionel for reviewing the code and helping improve it.

-- 
-keith
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/15] A bunch of shared code and RadeonSI changes
GREAT work Marek!

Best speed-up in months, at least on Polaris 20.

I am just back from vacation with an injured right ankle, so I had no time to test before the commit. But the 'glmark2' numbers are better than before all the Spectre shit (~8-9%?!).

As we say in German: 'Da geht noch was...' ("There is still more to come...") ;-)

Greetings,
Dieter

===
    glmark2 2017.07
===
    OpenGL Information
    GL_VENDOR:     X.Org
    GL_RENDERER:   Radeon RX 580 Series (POLARIS10, DRM 3.26.0, 4.18.14-1.gce1c446-default, LLVM 8.0.0)
    GL_VERSION:    4.5 (Compatibility Profile) Mesa 18.3.0-devel (git-58a51d0a67)
===
[build] use-vbo=false: FPS: 3382 FrameTime: 0.296 ms
[build] use-vbo=true: FPS: 11679 FrameTime: 0.086 ms
[texture] texture-filter=nearest: FPS: 11607 FrameTime: 0.086 ms
[texture] texture-filter=linear: FPS: 11572 FrameTime: 0.086 ms
[texture] texture-filter=mipmap: FPS: 11676 FrameTime: 0.086 ms
[shading] shading=gouraud: FPS: 12207 FrameTime: 0.082 ms
[shading] shading=blinn-phong-inf: FPS: 11892 FrameTime: 0.084 ms
[shading] shading=phong: FPS: 12073 FrameTime: 0.083 ms
[shading] shading=cel: FPS: 11763 FrameTime: 0.085 ms
[bump] bump-render=high-poly: FPS: 11252 FrameTime: 0.089 ms
[bump] bump-render=normals: FPS: 11366 FrameTime: 0.088 ms
[bump] bump-render=height: FPS: 11226 FrameTime: 0.089 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 12171 FrameTime: 0.082 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 11314 FrameTime: 0.088 ms
[pulsar] light=false:quads=5:texture=false: FPS: 10452 FrameTime: 0.096 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 5506 FrameTime: 0.182 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 5864 FrameTime: 0.171 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 812 FrameTime: 1.232 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1128 FrameTime: 0.887 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 893 FrameTime: 1.120 ms
[ideas] speed=duration: FPS: 2999 FrameTime: 0.333 ms
[jellyfish] <default>: FPS: 9422 FrameTime: 0.106 ms
[terrain] <default>: FPS: 1787 FrameTime: 0.560 ms
[shadow] <default>: FPS: 8930 FrameTime: 0.112 ms
[refract] <default>: FPS: 3418 FrameTime: 0.293 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 11901 FrameTime: 0.084 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 11567 FrameTime: 0.086 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 11614 FrameTime: 0.086 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 11611 FrameTime: 0.086 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 11643 FrameTime: 0.086 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 11933 FrameTime: 0.084 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 11964 FrameTime: 0.084 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 11714 FrameTime: 0.085 ms
===
                                  glmark2 Score: 9101
===

Before, even with DRM 3.27.0 (amd-staging-drm-next), I had
glmark2 Score: 8361

On 03.10.2018 00:35, Marek Olšák wrote:
> Hi,
>
> Interesting bits:
> - CP DMA support for GDS (unused but there is a test)
> - switch back to DX sample positions
> - center the viewport in the scanline area for maximizing the guardband
> - optimal PA_SU_PRIM_FILTER_CNTL
> - higher subpixel precision for 4K and lower resolutions (for more
>   precise rendering of T-junctions in geometry)
>
> Please review.
>
> Thanks,
> Marek
Re: [Mesa-dev] [PATCH 2/2] radv: use nir_shrink_vec_array_vars()
Reviewed-by: Bas Nieuwenhuizen

for the series.

I wonder what the perf diff is for tessellation. See e.g. https://github.com/doitsujin/dxvk/issues/645 for a game where tessellation is hitting us hard.

On Thu, Oct 18, 2018 at 1:28 AM Timothy Arceri wrote:
>
> Totals from affected shaders:
> SGPRS: 1096 -> 1096 (0.00 %)
> VGPRS: 1192 -> 1056 (-11.41 %)
> Spilled SGPRs: 0 -> 0 (0.00 %)
> Spilled VGPRs: 0 -> 0 (0.00 %)
> Private memory VGPRs: 0 -> 0 (0.00 %)
> Scratch size: 0 -> 0 (0.00 %) dwords per thread
> Code Size: 100940 -> 94384 (-6.49 %) bytes
> LDS: 0 -> 0 (0.00 %) blocks
> Max Waves: 100 -> 112 (12.00 %)
> Wait states: 0 -> 0 (0.00 %)
>
> All affected shaders are from Batman Arkham City.
> ---
>  src/amd/vulkan/radv_shader.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 13858b6130f..15c9de1e020 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -127,6 +127,7 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively,
>  		progress = false;
>
>  		NIR_PASS(progress, shader, nir_split_array_vars, nir_var_local);
> +		NIR_PASS(progress, shader, nir_shrink_vec_array_vars, nir_var_local);
>
>  		NIR_PASS_V(shader, nir_lower_vars_to_ssa);
>  		NIR_PASS_V(shader, nir_lower_pack);
> --
> 2.17.1
[Mesa-dev] [PATCH 2/2] radv: use nir_shrink_vec_array_vars()
Totals from affected shaders:
SGPRS: 1096 -> 1096 (0.00 %)
VGPRS: 1192 -> 1056 (-11.41 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 100940 -> 94384 (-6.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 100 -> 112 (12.00 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.
---
 src/amd/vulkan/radv_shader.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 13858b6130f..15c9de1e020 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -127,6 +127,7 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively,
 		progress = false;

 		NIR_PASS(progress, shader, nir_split_array_vars, nir_var_local);
+		NIR_PASS(progress, shader, nir_shrink_vec_array_vars, nir_var_local);

 		NIR_PASS_V(shader, nir_lower_vars_to_ssa);
 		NIR_PASS_V(shader, nir_lower_pack);
-- 
2.17.1
[Mesa-dev] [PATCH 1/2] radv: use nir_split_array_vars()
We call this pass inside the opt loop in case another pass turns an
array with indirect access into one with only direct access.

Totals from affected shaders:
SGPRS: 512 -> 496 (-3.12 %)
VGPRS: 456 -> 452 (-0.88 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 40040 -> 39664 (-0.94 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 41 -> 43 (4.88 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.
---
 src/amd/vulkan/radv_shader.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 52aa83d4a5a..13858b6130f 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -126,6 +126,8 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively,
 	do {
 		progress = false;

+		NIR_PASS(progress, shader, nir_split_array_vars, nir_var_local);
+
 		NIR_PASS_V(shader, nir_lower_vars_to_ssa);
 		NIR_PASS_V(shader, nir_lower_pack);

-- 
2.17.1
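
The point about running the pass inside the loop can be illustrated with a small fixed-point driver. This is only a sketch in plain C: `toy_shader`, `pass_split_arrays` and `pass_fold_indices` are hypothetical stand-ins and not NIR APIs. It just shows that a pass placed in the loop gets another chance once a different pass has created new opportunities for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy "shader" state; not a NIR structure. */
struct toy_shader {
	int indirect_arrays; /* arrays still accessed with a non-constant index */
	int direct_arrays;   /* arrays only accessed with constant indices */
	int split_arrays;    /* arrays already split into separate elements */
};

/* Stand-in for an array-splitting pass: it can only split arrays whose
 * accesses are all direct. */
static bool pass_split_arrays(struct toy_shader *s)
{
	if (s->direct_arrays == 0)
		return false;
	s->split_arrays += s->direct_arrays;
	s->direct_arrays = 0;
	return true;
}

/* Stand-in for some other pass that turns one indirect access into a
 * direct one per run (e.g. once an index becomes a known constant). */
static bool pass_fold_indices(struct toy_shader *s)
{
	if (s->indirect_arrays == 0)
		return false;
	s->indirect_arrays--;
	s->direct_arrays++;
	return true;
}

/* Run both passes until neither reports progress; returns the number of
 * loop iterations that were needed. */
static int toy_optimize(struct toy_shader *s)
{
	int iterations = 0;
	bool progress;

	assert(s != NULL);
	do {
		progress = false;
		/* Splitting runs first, so arrays made direct by the later
		 * pass are only picked up on the next iteration. */
		progress |= pass_split_arrays(s);
		progress |= pass_fold_indices(s);
		iterations++;
	} while (progress);

	return iterations;
}
```

If the splitting stand-in were called once before the loop instead, the arrays that only become direct later would never be split.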
Re: [Mesa-dev] [PATCH] radv: use nir_opt_find_array_copies()
For full effect, you want to also enable shrink_vec_var_arrays and split_array_vars

On Wed, Oct 17, 2018 at 6:00 PM Timothy Arceri wrote:

> Totals from affected shaders:
> SGPRS: 1112 -> 1112 (0.00 %)
> VGPRS: 1492 -> 1196 (-19.84 %)
> Spilled SGPRs: 0 -> 0 (0.00 %)
> Spilled VGPRs: 0 -> 0 (0.00 %)
> Private memory VGPRs: 0 -> 0 (0.00 %)
> Scratch size: 0 -> 0 (0.00 %) dwords per thread
> Code Size: 112172 -> 101316 (-9.68 %) bytes
> LDS: 0 -> 0 (0.00 %) blocks
> Max Waves: 93 -> 98 (5.38 %)
> Wait states: 0 -> 0 (0.00 %)
>
> All affected shaders are from "Batman: Arkham City" over DXVK.
>
> The pass detects that the temporary array created by DXVK for
> storing TCS inputs is a copy of the input arrays and allows
> us to avoid copying all of the input data and then indirecting
> on it with if-ladders, instead we just do indirect indexing.
Re: [Mesa-dev] [PATCH] radv: use nir_opt_find_array_copies()
and split_struct_vars while you're at it

On Wed, Oct 17, 2018 at 6:15 PM Jason Ekstrand wrote:

> For full effect, you want to also enable shrink_vec_var_arrays and
> split_array_vars
>
> On Wed, Oct 17, 2018 at 6:00 PM Timothy Arceri wrote:
>
>> Totals from affected shaders:
>> SGPRS: 1112 -> 1112 (0.00 %)
>> VGPRS: 1492 -> 1196 (-19.84 %)
>> Spilled SGPRs: 0 -> 0 (0.00 %)
>> Spilled VGPRs: 0 -> 0 (0.00 %)
>> Private memory VGPRs: 0 -> 0 (0.00 %)
>> Scratch size: 0 -> 0 (0.00 %) dwords per thread
>> Code Size: 112172 -> 101316 (-9.68 %) bytes
>> LDS: 0 -> 0 (0.00 %) blocks
>> Max Waves: 93 -> 98 (5.38 %)
>> Wait states: 0 -> 0 (0.00 %)
>>
>> All affected shaders are from "Batman: Arkham City" over DXVK.
>>
>> The pass detects that the temporary array created by DXVK for
>> storing TCS inputs is a copy of the input arrays and allows
>> us to avoid copying all of the input data and then indirecting
>> on it with if-ladders, instead we just do indirect indexing.
Re: [Mesa-dev] [PATCH] radv: use nir_opt_find_array_copies()
Reviewed-by: Bas Nieuwenhuizen

On Thu, Oct 18, 2018 at 1:00 AM Timothy Arceri wrote:
>
> Totals from affected shaders:
> SGPRS: 1112 -> 1112 (0.00 %)
> VGPRS: 1492 -> 1196 (-19.84 %)
> Spilled SGPRs: 0 -> 0 (0.00 %)
> Spilled VGPRs: 0 -> 0 (0.00 %)
> Private memory VGPRs: 0 -> 0 (0.00 %)
> Scratch size: 0 -> 0 (0.00 %) dwords per thread
> Code Size: 112172 -> 101316 (-9.68 %) bytes
> LDS: 0 -> 0 (0.00 %) blocks
> Max Waves: 93 -> 98 (5.38 %)
> Wait states: 0 -> 0 (0.00 %)
>
> All affected shaders are from "Batman: Arkham City" over DXVK.
>
> The pass detects that the temporary array created by DXVK for
> storing TCS inputs is a copy of the input arrays and allows
> us to avoid copying all of the input data and then indirecting
> on it with if-ladders, instead we just do indirect indexing.
Re: [Mesa-dev] [PATCH] radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
On 18/10/18 9:51 am, Bas Nieuwenhuizen wrote:
> On Thu, Oct 18, 2018 at 12:04 AM Timothy Arceri wrote:
>>
>> Totals from affected shaders:
>> SGPRS: 2856 -> 2856 (0.00 %)
>> VGPRS: 3236 -> 3248 (0.37 %)
>> Spilled SGPRs: 0 -> 0 (0.00 %)
>> Spilled VGPRs: 0 -> 0 (0.00 %)
>> Private memory VGPRs: 0 -> 0 (0.00 %)
>> Scratch size: 0 -> 0 (0.00 %) dwords per thread
>> Code Size: 236560 -> 233548 (-1.27 %) bytes
>> LDS: 0 -> 0 (0.00 %) blocks
>> Max Waves: 277 -> 283 (2.17 %)
>> Wait states: 0 -> 0 (0.00 %)
>
> Interesting that both max waves and VGPRs increased.

Yeah, it was just one of those changes where the new NIR increased VGPR use in a larger number of shaders than the number of shaders where VGPR use dropped enough to bump the max waves.

However, as I tried to indicate below, the increase in VGPRs is something that could likely be improved on the LLVM side; the NIR itself looks much better.

> Reviewed-by: Bas Nieuwenhuizen
>
>> Even in the cases where we have increased VGPR use it appears
>> the NIR is improved significantly.
>> ---
>>  src/amd/vulkan/radv_shader.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
>> index 3e3eb96a531..3b3422c8da6 100644
>> --- a/src/amd/vulkan/radv_shader.c
>> +++ b/src/amd/vulkan/radv_shader.c
>> @@ -127,6 +127,10 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively)
>>
>>  		NIR_PASS_V(shader, nir_lower_vars_to_ssa);
>>  		NIR_PASS_V(shader, nir_lower_pack);
>> +
>> +		NIR_PASS(progress, shader, nir_opt_copy_prop_vars);
>> +		NIR_PASS(progress, shader, nir_opt_dead_write_vars);
>> +
>>  		NIR_PASS_V(shader, nir_lower_alu_to_scalar);
>>  		NIR_PASS_V(shader, nir_lower_phis_to_scalar);
>>
>> --
>> 2.17.1
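
How both totals can rise at once is easier to see with numbers. The following is a rough sketch, assuming a GCN-style occupancy rule (a budget of 256 VGPRs and at most 10 waves per SIMD) and ignoring SGPR, LDS and allocation-granularity limits. The per-shader register counts used with it are made up for illustration and are not taken from the run above; all `toy_` names are hypothetical.

```c
#include <assert.h>

/* Assumed GCN-style limits; real hardware also considers SGPRs, LDS and
 * allocation granularity, which this sketch ignores. */
#define TOY_VGPR_BUDGET 256
#define TOY_MAX_WAVES   10

/* Waves that fit for a shader using the given number of VGPRs. */
static unsigned toy_max_waves(unsigned vgprs)
{
	unsigned waves;

	assert(vgprs > 0);
	waves = TOY_VGPR_BUDGET / vgprs;
	return waves < TOY_MAX_WAVES ? waves : TOY_MAX_WAVES;
}

struct toy_totals {
	unsigned vgprs;
	unsigned waves;
};

/* Sum VGPRs and max waves over a set of shaders, like the report does. */
static struct toy_totals toy_sum(const unsigned *vgprs, unsigned count)
{
	struct toy_totals t = { 0, 0 };

	for (unsigned i = 0; i < count; i++) {
		t.vgprs += vgprs[i];
		t.waves += toy_max_waves(vgprs[i]);
	}
	return t;
}
```

With three hypothetical shaders going from {36, 20, 8} to {32, 24, 12} VGPRs, two shaders get worse and one gets better: the VGPR total rises from 64 to 68, yet the wave total also rises from 27 to 28, because only the first shader was near an occupancy boundary.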
[Mesa-dev] [PATCH] radv: use nir_opt_find_array_copies()
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from "Batman: Arkham City" over DXVK.

The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.
---
 src/amd/vulkan/radv_pipeline.c | 6 +++---
 src/amd/vulkan/radv_shader.c | 22 ++
 src/amd/vulkan/radv_shader.h | 3 ++-
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index e1d665d0ac7..8d15a048bbf 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1808,13 +1808,13 @@ radv_link_shaders(struct radv_pipeline *pipeline, nir_shader **shaders)
 				ac_lower_indirect_derefs(ordered_shaders[i],
 				                         pipeline->device->physical_device->rad_info.chip_class);
 			}
-			radv_optimize_nir(ordered_shaders[i], false);
+			radv_optimize_nir(ordered_shaders[i], false, false);

 			if (nir_lower_global_vars_to_local(ordered_shaders[i - 1])) {
 				ac_lower_indirect_derefs(ordered_shaders[i - 1],
 				                         pipeline->device->physical_device->rad_info.chip_class);
 			}
-			radv_optimize_nir(ordered_shaders[i - 1], false);
+			radv_optimize_nir(ordered_shaders[i - 1], false, false);
 		}
 	}
 }
@@ -2073,7 +2073,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline,

 			if (!(flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT)) {
 				nir_lower_io_to_scalar_early(nir[i], mask);
-				radv_optimize_nir(nir[i], false);
+				radv_optimize_nir(nir[i], false, false);
 			}
 		}
 	}
diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 3b3422c8da6..52aa83d4a5a 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -118,7 +118,8 @@ void radv_DestroyShaderModule(
 }

 void
-radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively)
+radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively,
+                  bool allow_copies)
 {
 	bool progress;

@@ -128,6 +129,15 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively)
 		NIR_PASS_V(shader, nir_lower_vars_to_ssa);
 		NIR_PASS_V(shader, nir_lower_pack);

+		if (allow_copies) {
+			/* Only run this pass in the first call to
+			 * radv_optimize_nir. Later calls assume that we've
+			 * lowered away any copy_deref instructions and we
+			 * don't want to introduce any more.
+			 */
+			NIR_PASS(progress, shader, nir_opt_find_array_copies);
+		}
+
 		NIR_PASS(progress, shader, nir_opt_copy_prop_vars);
 		NIR_PASS(progress, shader, nir_opt_dead_write_vars);

@@ -306,7 +316,6 @@ radv_shader_compile_to_nir(struct radv_device *device,
 	}

 	nir_split_var_copies(nir);
-	nir_lower_var_copies(nir);

 	nir_lower_global_vars_to_local(nir);
 	nir_remove_dead_variables(nir, nir_var_local);
@@ -323,7 +332,12 @@ radv_shader_compile_to_nir(struct radv_device *device,
 	nir_lower_load_const_to_scalar(nir);

 	if (!(flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT))
-		radv_optimize_nir(nir, false);
+		radv_optimize_nir(nir, false, true);
+
+	/* We call nir_lower_var_copies() after the first radv_optimize_nir()
+	 * to remove any copies introduced by nir_opt_find_array_copies().
+	 */
+	nir_lower_var_copies(nir);

 	/* Indirect lowering must be called after the radv_optimize_nir() loop
 	 * has been called at least once. Otherwise indirect lowering can
@@ -331,7 +345,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
 	 * considered too large for unrolling.
 	 */
 	ac_lower_indirect_derefs(nir, device->physical_device->rad_info.chip_class);
-	radv_optimize_nir(nir, flags & VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT);
+	radv_optimize_nir(nir, flags & VK_PIPELINE_CREATE_DISABLE
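
The difference between the two code shapes mentioned in the commit message can be sketched in plain C. This is not shader code or NIR, and the `toy_` names are hypothetical; it only contrasts "copy everything into a temporary and pick an element with an if-ladder" with "index the original array directly".

```c
#include <assert.h>

#define TOY_NUM_INPUTS 4

/* "Before": copy all inputs into a temporary, then select one element
 * with an if-ladder, which is roughly what indirect access on a local
 * array ends up as when it has to be lowered. */
static float toy_select_if_ladder(const float inputs[TOY_NUM_INPUTS],
                                  unsigned idx)
{
	float tmp[TOY_NUM_INPUTS];

	assert(idx < TOY_NUM_INPUTS);
	for (unsigned i = 0; i < TOY_NUM_INPUTS; i++)
		tmp[i] = inputs[i];

	if (idx == 0)
		return tmp[0];
	else if (idx == 1)
		return tmp[1];
	else if (idx == 2)
		return tmp[2];
	else
		return tmp[3];
}

/* "After": once the temporary is known to be a plain copy of the
 * inputs, the input array can be indexed directly. */
static float toy_select_indirect(const float inputs[TOY_NUM_INPUTS],
                                 unsigned idx)
{
	assert(idx < TOY_NUM_INPUTS);
	return inputs[idx];
}
```

Both functions return the same value for every valid index; the second one simply avoids the copy and the chain of comparisons.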
Re: [Mesa-dev] [PATCH] radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
On Thu, Oct 18, 2018 at 12:04 AM Timothy Arceri wrote:
>
> Totals from affected shaders:
> SGPRS: 2856 -> 2856 (0.00 %)
> VGPRS: 3236 -> 3248 (0.37 %)
> Spilled SGPRs: 0 -> 0 (0.00 %)
> Spilled VGPRs: 0 -> 0 (0.00 %)
> Private memory VGPRs: 0 -> 0 (0.00 %)
> Scratch size: 0 -> 0 (0.00 %) dwords per thread
> Code Size: 236560 -> 233548 (-1.27 %) bytes
> LDS: 0 -> 0 (0.00 %) blocks
> Max Waves: 277 -> 283 (2.17 %)
> Wait states: 0 -> 0 (0.00 %)

Interesting that both max waves and VGPRs increased.

Reviewed-by: Bas Nieuwenhuizen

>
> Even in the cases where we have increased VGPR use it appears
> the NIR is improved significantly.
> ---
>  src/amd/vulkan/radv_shader.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 3e3eb96a531..3b3422c8da6 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -127,6 +127,10 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively)
>
>  		NIR_PASS_V(shader, nir_lower_vars_to_ssa);
>  		NIR_PASS_V(shader, nir_lower_pack);
> +
> +		NIR_PASS(progress, shader, nir_opt_copy_prop_vars);
> +		NIR_PASS(progress, shader, nir_opt_dead_write_vars);
> +
>  		NIR_PASS_V(shader, nir_lower_alu_to_scalar);
>  		NIR_PASS_V(shader, nir_lower_phis_to_scalar);
>
> --
> 2.17.1
Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Reviewed-by: Bas Nieuwenhuizen

On Wed, Oct 17, 2018 at 6:49 PM Keith Packard wrote:
>
> Offers three clocks, device, clock monotonic and clock monotonic
> raw. Could use some kernel support to reduce the deviation between
> clock values.
>
> v2:
>         Ensure deviation is at least as big as the GPU time interval.
>
> v3:
>         Set device->lost when returning DEVICE_LOST.
>         Use MAX2 and DIV_ROUND_UP instead of open coding these.
>         Delete spurious TIMESTAMP in radv version.
>
>         Suggested-by: Jason Ekstrand
>         Suggested-by: Lionel Landwerlin
>
> v4:
>         Add anv_gem_reg_read to anv_gem_stubs.c
>
>         Suggested-by: Jason Ekstrand
>
> v5:
>         Adjust maxDeviation computation to max(sampled_clock_period) +
>         sample_interval.
>
>         Suggested-by: Bas Nieuwenhuizen
>         Suggested-by: Jason Ekstrand
>
> Signed-off-by: Keith Packard
> ---
>  src/amd/vulkan/radv_device.c | 119 +++
>  src/amd/vulkan/radv_extensions.py | 1 +
>  src/intel/vulkan/anv_device.c | 127 +
>  src/intel/vulkan/anv_extensions.py | 1 +
>  src/intel/vulkan/anv_gem.c | 13 +++
>  src/intel/vulkan/anv_gem_stubs.c | 7 ++
>  src/intel/vulkan/anv_private.h | 2 +
>  7 files changed, 270 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 174922780fc..4a705a724ef 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -4955,3 +4955,122 @@ radv_GetDeviceGroupPeerMemoryFeatures(
>  	                       VK_PEER_MEMORY_FEATURE_GENERIC_SRC_BIT |
>  	                       VK_PEER_MEMORY_FEATURE_GENERIC_DST_BIT;
>  }
> +
> +static const VkTimeDomainEXT radv_time_domains[] = {
> +	VK_TIME_DOMAIN_DEVICE_EXT,
> +	VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT,
> +	VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT,
> +};
> +
> +VkResult radv_GetPhysicalDeviceCalibrateableTimeDomainsEXT(
> +	VkPhysicalDevice                             physicalDevice,
> +	uint32_t                                     *pTimeDomainCount,
> +	VkTimeDomainEXT                              *pTimeDomains)
> +{
> +	int d;
> +	VK_OUTARRAY_MAKE(out, pTimeDomains, pTimeDomainCount);
> +
> +	for (d = 0; d < ARRAY_SIZE(radv_time_domains); d++) {
> +		vk_outarray_append(&out, i) {
> +			*i = radv_time_domains[d];
> +		}
> +	}
> +
> +	return vk_outarray_status(&out);
> +}
> +
> +static uint64_t
> +radv_clock_gettime(clockid_t clock_id)
> +{
> +	struct timespec current;
> +	int ret;
> +
> +	ret = clock_gettime(clock_id, &current);
> +	if (ret < 0 && clock_id == CLOCK_MONOTONIC_RAW)
> +		ret = clock_gettime(CLOCK_MONOTONIC, &current);
> +	if (ret < 0)
> +		return 0;
> +
> +	return (uint64_t) current.tv_sec * 1000000000ULL + current.tv_nsec;
> +}
> +
> +VkResult radv_GetCalibratedTimestampsEXT(
> +	VkDevice                                     _device,
> +	uint32_t                                     timestampCount,
> +	const VkCalibratedTimestampInfoEXT           *pTimestampInfos,
> +	uint64_t                                     *pTimestamps,
> +	uint64_t                                     *pMaxDeviation)
> +{
> +	RADV_FROM_HANDLE(radv_device, device, _device);
> +	uint32_t clock_crystal_freq = device->physical_device->rad_info.clock_crystal_freq;
> +	int d;
> +	uint64_t begin, end;
> +	uint64_t max_clock_period = 0;
> +
> +	begin = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
> +
> +	for (d = 0; d < timestampCount; d++) {
> +		switch (pTimestampInfos[d].timeDomain) {
> +		case VK_TIME_DOMAIN_DEVICE_EXT:
> +			pTimestamps[d] = device->ws->query_value(device->ws,
> +			                                         RADEON_TIMESTAMP);
> +			uint64_t device_period = DIV_ROUND_UP(1000000, clock_crystal_freq);
> +			max_clock_period = MAX2(max_clock_period, device_period);
> +			break;
> +		case VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT:
> +			pTimestamps[d] = radv_clock_gettime(CLOCK_MONOTONIC);
> +			max_clock_period = MAX2(max_clock_period, 1);
> +			break;
> +
> +		case VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT:
> +			pTimestamps[d] = begin;
> +			break;
> +		default:
> +			pTimestamps[d] = 0;
> +			break;
> +		}
> +	}
> +
> +	end = radv_clock_gettime(CLOCK_MONOTONIC_RAW);
> +
> +	/*
> +	 * The maximum deviation is the sum of the interval over which we
> +	 * perform the sampling and the maximum perio
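
The v5 note ("max(sampled_clock_period) + sample_interval") can be written out as a few small helpers. This is a minimal sketch and not the driver code: the `toy_` names are hypothetical, and it assumes all values are in nanoseconds and that the crystal frequency is given in kHz.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/* Convert seconds + nanoseconds (as in struct timespec) to nanoseconds. */
static uint64_t toy_timespec_to_ns(uint64_t sec, uint64_t nsec)
{
	return sec * TOY_NSEC_PER_SEC + nsec;
}

/* Period of one device clock tick in ns, rounded up.
 * Assumption: the frequency is in kHz, so 1e6 / freq gives ns. */
static uint64_t toy_device_period_ns(uint32_t crystal_freq_khz)
{
	assert(crystal_freq_khz > 0);
	return (1000000ULL + crystal_freq_khz - 1) / crystal_freq_khz;
}

/* Maximum deviation: the time spent sampling all clocks plus the
 * coarsest period among the sampled clocks. */
static uint64_t toy_max_deviation(uint64_t begin_ns, uint64_t end_ns,
                                  uint64_t max_clock_period_ns)
{
	assert(end_ns >= begin_ns);
	return (end_ns - begin_ns) + max_clock_period_ns;
}
```

For example, a hypothetical 27000 kHz crystal gives a 38 ns period after rounding up, so sampling that took 250 ns would report a maximum deviation of 288 ns.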
[Mesa-dev] [PATCH] radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
Totals from affected shaders: SGPRS: 2856 -> 2856 (0.00 %) VGPRS: 3236 -> 3248 (0.37 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 236560 -> 233548 (-1.27 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 277 -> 283 (2.17 %) Wait states: 0 -> 0 (0.00 %) Even in the cases where we have increased VGPR use, it appears the NIR is improved significantly. --- src/amd/vulkan/radv_shader.c | 4 1 file changed, 4 insertions(+) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index 3e3eb96a531..3b3422c8da6 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -127,6 +127,10 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively) NIR_PASS_V(shader, nir_lower_vars_to_ssa); NIR_PASS_V(shader, nir_lower_pack); + + NIR_PASS(progress, shader, nir_opt_copy_prop_vars); + NIR_PASS(progress, shader, nir_opt_dead_write_vars); + NIR_PASS_V(shader, nir_lower_alu_to_scalar); NIR_PASS_V(shader, nir_lower_phis_to_scalar); -- 2.17.1
[Mesa-dev] [PATCH v2 0/1] swr: Fix for LLVM 5 to 6 API change
This patch fixes a compile error resulting from a function whose API changed between LLVM versions 5 and 6. I sent this to mesa-dev, but it's primarily a fix for the stable branch, as it affects releases with LLVM 5-based codegen. v2: included mesa-stable cc Alok Hota (1): swr/rast: ignore CreateElementUnorderedAtomicMemCpy .../drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- 2.17.1
[Mesa-dev] [PATCH v2 1/1] swr/rast: ignore CreateElementUnorderedAtomicMemCpy
This function's API changed between LLVM 5 and 6. Compile errors occur when building with LLVM 6+ if LLVM 5 was used for a dist tarball CC: --- .../drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py index d34e88d1bc..485403ae1e 100644 --- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py +++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py @@ -161,7 +161,8 @@ def parse_ir_builder(input_file): func_name == 'CreateAlignmentAssumptionHelper' or func_name == 'CreateGEP' or func_name == 'CreateLoad' or -func_name == 'CreateMaskedLoad'): +func_name == 'CreateMaskedLoad' or +func_name == 'CreateElementUnorderedAtomicMemCpy'): ignore = True # Convert CamelCase to CAMEL_CASE -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Q: to which software renderers should we contribute to help virgl conformance testing
On 17.10.18 at 19:15, Gert Wollny wrote: > Dear all, > > we are looking into doing a CI for virglrenderer that also runs a > subset of the GLES dEQP, and in order to be able to run this also in > gitlab.fd.o we were looking into the available gallium software > renderers. Initial tests by just running the dEQP-GLES2 were quite > successful in the sense that the execution time is not too long (a full > run on the GL and GLES host with llvmpipe takes about 10 min [1]). > > Now to extend on that work the focus is turning to which software > renderer has the most features, the least failing tests, and is > actively developed. > > Simply looking at the commit stats it seems that the development of > softpipe and llvmpipe is mostly stalled, swr, on the other hand has seen > quite some development, but mostly regarding performance, and given the > FAQ [2] the focus is on a very specific application space and not so > much on getting more features in. I wouldn't quite say llvmpipe is stalled, although it's true that there weren't all that many changes (in particular as far as new features are concerned). > > When checking for conformance of virglrenderer we need a host driver > that is conformant itself, and we are willing to contribute here, but > it seems to make most sense to focus this work on just one driver. To > make a sensible choice there are some open questions: > > Are there plans to get swr and/or llvmpipe to support gles 3.1, or > carry any of the drivers even further, maybe GLES 3.2 and desktop 4.x? At a quick glance, for gles 3.1 llvmpipe would be missing mostly compute shaders and shader images / ssbo, so definitely some work. GL 4 would add tessellation as well (at least I think these are the big parts missing). Unfortunately I don't have time to work on this, but it would be nice to have indeed. Well, volunteers welcome, no special hw nor docs needed :-). (Although softpipe is easier to work with, but it's just not all that interesting.) 
> > > Is there any specific interest to fix all failures that occur when > running gles dEQP? In this bug report [3] Roland pointed out that > "there is no goal as such to pass dEQP, although patches are welcome", > any opinion for the other drivers? (for swr beyond what is written in > the FAQ). I think it wouldn't really be all that much work to get dEQP passing - since llvmpipe is built to honor dx10 rules, which are typically more strict than GL. But some things specific to GL fail. So IMHO if you want a non-hw driver to pass dEQP, llvmpipe is probably still your best bet (but of course, softpipe is generally easier to fix). Can't really comment on swr there. Roland > > As pointed out in the FAQ, swr is very Intel specific, are there plans > not laid out in the FAQ to support other, non-x86 hardware? > > many thanks > Gert
[Mesa-dev] [PATCH 2/2] vulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHR
This got missed during 1.1 enabling because it was defined as an interaction between device groups and WSI and it wasn't obvious it was in the delta. The idea behind it is that it's supposed to provide a hint to the application in a multi-GPU setup to indicate which regions of the screen are being scanned out by which GPU so a multi-device split-screen rendering application can render each part of the screen on the GPU that will be presenting it and avoid extra bus traffic between GPUs. On a single-GPU setup or one which doesn't support this present mode, we need to do something. We choose to return the window size (or a max-size rect) if the compositor, X server, or crtc is associated with the given physical device and zero rectangles otherwise. --- src/amd/vulkan/radv_wsi.c | 14 +++ src/intel/vulkan/anv_wsi.c | 14 +++ src/vulkan/wsi/wsi_common.c | 14 +++ src/vulkan/wsi/wsi_common.h | 7 src/vulkan/wsi/wsi_common_display.c | 41 +++ src/vulkan/wsi/wsi_common_private.h | 5 +++ src/vulkan/wsi/wsi_common_wayland.c | 21 ++ src/vulkan/wsi/wsi_common_x11.c | 61 + 8 files changed, 177 insertions(+) diff --git a/src/amd/vulkan/radv_wsi.c b/src/amd/vulkan/radv_wsi.c index 8b165ea3916..43103a4ef85 100644 --- a/src/amd/vulkan/radv_wsi.c +++ b/src/amd/vulkan/radv_wsi.c @@ -284,3 +284,17 @@ VkResult radv_GetDeviceGroupSurfacePresentModesKHR( return VK_SUCCESS; } + +VkResult radv_GetPhysicalDevicePresentRectanglesKHR( + VkPhysicalDevicephysicalDevice, + VkSurfaceKHRsurface, + uint32_t* pRectCount, + VkRect2D* pRects) +{ + RADV_FROM_HANDLE(radv_physical_device, device, physicalDevice); + + return wsi_common_get_present_rectangles(&device->wsi_device, +device->local_fd, +surface, +pRectCount, pRects); +} diff --git a/src/intel/vulkan/anv_wsi.c b/src/intel/vulkan/anv_wsi.c index 1c9a54804e8..5d672c211c4 100644 --- a/src/intel/vulkan/anv_wsi.c +++ b/src/intel/vulkan/anv_wsi.c @@ -293,3 +293,17 @@ VkResult anv_GetDeviceGroupSurfacePresentModesKHR( return VK_SUCCESS; } + +VkResult 
anv_GetPhysicalDevicePresentRectanglesKHR( +VkPhysicalDevicephysicalDevice, +VkSurfaceKHRsurface, +uint32_t* pRectCount, +VkRect2D* pRects) +{ + ANV_FROM_HANDLE(anv_physical_device, device, physicalDevice); + + return wsi_common_get_present_rectangles(&device->wsi_device, +device->local_fd, +surface, +pRectCount, pRects); +} diff --git a/src/vulkan/wsi/wsi_common.c b/src/vulkan/wsi/wsi_common.c index 1e3c4e0028b..ad4b8c9075e 100644 --- a/src/vulkan/wsi/wsi_common.c +++ b/src/vulkan/wsi/wsi_common.c @@ -803,6 +803,20 @@ wsi_common_get_surface_present_modes(struct wsi_device *wsi_device, pPresentModes); } +VkResult +wsi_common_get_present_rectangles(struct wsi_device *wsi_device, + int local_fd, + VkSurfaceKHR _surface, + uint32_t* pRectCount, + VkRect2D* pRects) +{ + ICD_FROM_HANDLE(VkIcdSurfaceBase, surface, _surface); + struct wsi_interface *iface = wsi_device->wsi[surface->platform]; + + return iface->get_present_rectangles(surface, wsi_device, local_fd, +pRectCount, pRects); +} + VkResult wsi_common_create_swapchain(struct wsi_device *wsi, VkDevice device, diff --git a/src/vulkan/wsi/wsi_common.h b/src/vulkan/wsi/wsi_common.h index 424330de566..5b69c573d9e 100644 --- a/src/vulkan/wsi/wsi_common.h +++ b/src/vulkan/wsi/wsi_common.h @@ -199,6 +199,13 @@ wsi_common_get_surface_present_modes(struct wsi_device *wsi_device, uint32_t *pPresentModeCount, VkPresentModeKHR *pPresentModes); +VkResult +wsi_common_get_present_rectangles(struct wsi_device *wsi, + int local_fd, + VkSurfaceKHR surface, + uint32_t* pRectCount, + VkRect2D* pRects); + VkResult wsi_common_get_surface_capabilities2ext( struct wsi_device *wsi_device, diff --git a/src/vulkan/wsi/wsi_common_display.c b/src/vulkan/wsi/wsi_common_display.c index c004060a205..2315717ef8e 100644 --- a/src/vulkan/wsi/wsi_common_display.c +++
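The single-GPU behaviour described in the commit message (one window-sized rectangle if the surface belongs to this physical device, zero rectangles otherwise) can be sketched on its own. This is a standalone illustration, not the wsi_common code: `rect2d`, `result_t` and `get_present_rectangles` are hypothetical stand-ins for the Vulkan types and entry point, and the count/array handling assumes the usual Vulkan two-call enumeration convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for VkRect2D and VkResult. */
typedef struct { int32_t x, y; uint32_t width, height; } rect2d;
typedef enum { RESULT_SUCCESS = 0, RESULT_INCOMPLETE = 1 } result_t;

/*
 * Report one rectangle covering the whole window when the surface is
 * driven by this device, and zero rectangles otherwise.
 * If rects is NULL, only the number of available rectangles is written.
 */
static result_t get_present_rectangles(int surface_on_this_device,
                                       uint32_t win_w, uint32_t win_h,
                                       uint32_t *count, rect2d *rects)
{
    uint32_t available = surface_on_this_device ? 1u : 0u;
    uint32_t written;

    assert(count != NULL);

    if (rects == NULL) {
        *count = available;
        return RESULT_SUCCESS;
    }

    /* Never write more entries than the caller's array can hold. */
    written = *count < available ? *count : available;
    if (written == 1)
        rects[0] = (rect2d){ 0, 0, win_w, win_h };

    *count = written;
    return written < available ? RESULT_INCOMPLETE : RESULT_SUCCESS;
}
```

An application would call it once with a NULL array to learn the count, then again with storage for that many rectangles.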
[Mesa-dev] [PATCH 1/2] vulkan/wsi: Store the instance allocator in wsi_device
We already have wsi_device and we know the instance allocator at wsi_device_init time so there's no need to pass it into the physical device queries. This also fixes a memory allocation domain bug that can occur if CreateSwapchain gets called prior to any queries (not likely) in which case the cached connection gets allocated off the device instead of the instance. --- src/amd/vulkan/radv_wsi.c | 1 - src/amd/vulkan/radv_wsi_x11.c | 2 -- src/intel/vulkan/anv_wsi.c | 1 - src/intel/vulkan/anv_wsi_x11.c | 2 -- src/vulkan/wsi/wsi_common.c | 4 ++-- src/vulkan/wsi/wsi_common.h | 4 +++- src/vulkan/wsi/wsi_common_display.c | 1 - src/vulkan/wsi/wsi_common_private.h | 1 - src/vulkan/wsi/wsi_common_wayland.c | 1 - src/vulkan/wsi/wsi_common_x11.c | 25 +++-- src/vulkan/wsi/wsi_common_x11.h | 1 - 11 files changed, 16 insertions(+), 27 deletions(-) diff --git a/src/amd/vulkan/radv_wsi.c b/src/amd/vulkan/radv_wsi.c index 6479bea070b..8b165ea3916 100644 --- a/src/amd/vulkan/radv_wsi.c +++ b/src/amd/vulkan/radv_wsi.c @@ -75,7 +75,6 @@ VkResult radv_GetPhysicalDeviceSurfaceSupportKHR( device->local_fd, queueFamilyIndex, surface, - &device->instance->alloc, pSupported); } diff --git a/src/amd/vulkan/radv_wsi_x11.c b/src/amd/vulkan/radv_wsi_x11.c index c65ac938772..9ef02ccc435 100644 --- a/src/amd/vulkan/radv_wsi_x11.c +++ b/src/amd/vulkan/radv_wsi_x11.c @@ -44,7 +44,6 @@ VkBool32 radv_GetPhysicalDeviceXcbPresentationSupportKHR( return wsi_get_physical_device_xcb_presentation_support( &device->wsi_device, - &device->instance->alloc, queueFamilyIndex, device->local_fd, true, connection, visual_id); @@ -60,7 +59,6 @@ VkBool32 radv_GetPhysicalDeviceXlibPresentationSupportKHR( return wsi_get_physical_device_xcb_presentation_support( &device->wsi_device, - &device->instance->alloc, queueFamilyIndex, device->local_fd, true, XGetXCBConnection(dpy), visualID); diff --git a/src/intel/vulkan/anv_wsi.c b/src/intel/vulkan/anv_wsi.c index 5ed1d711689..1c9a54804e8 100644 --- 
a/src/intel/vulkan/anv_wsi.c +++ b/src/intel/vulkan/anv_wsi.c @@ -92,7 +92,6 @@ VkResult anv_GetPhysicalDeviceSurfaceSupportKHR( device->local_fd, queueFamilyIndex, surface, - &device->instance->alloc, pSupported); } diff --git a/src/intel/vulkan/anv_wsi_x11.c b/src/intel/vulkan/anv_wsi_x11.c index 2feb5f13376..45c43f6f17f 100644 --- a/src/intel/vulkan/anv_wsi_x11.c +++ b/src/intel/vulkan/anv_wsi_x11.c @@ -40,7 +40,6 @@ VkBool32 anv_GetPhysicalDeviceXcbPresentationSupportKHR( return wsi_get_physical_device_xcb_presentation_support( &device->wsi_device, - &device->instance->alloc, queueFamilyIndex, device->local_fd, false, connection, visual_id); @@ -56,7 +55,6 @@ VkBool32 anv_GetPhysicalDeviceXlibPresentationSupportKHR( return wsi_get_physical_device_xcb_presentation_support( &device->wsi_device, - &device->instance->alloc, queueFamilyIndex, device->local_fd, false, XGetXCBConnection(dpy), visualID); diff --git a/src/vulkan/wsi/wsi_common.c b/src/vulkan/wsi/wsi_common.c index 3416fef3076..1e3c4e0028b 100644 --- a/src/vulkan/wsi/wsi_common.c +++ b/src/vulkan/wsi/wsi_common.c @@ -39,6 +39,7 @@ wsi_device_init(struct wsi_device *wsi, memset(wsi, 0, sizeof(*wsi)); + wsi->instance_alloc = *alloc; wsi->pdevice = pdevice; #define WSI_GET_CB(func) \ @@ -677,13 +678,12 @@ wsi_common_get_surface_support(struct wsi_device *wsi_device, int local_fd, uint32_t queueFamilyIndex, VkSurfaceKHR _surface, - const VkAllocationCallbacks *alloc, VkBool32* pSupported) { ICD_FROM_HANDLE(VkIcdSurfaceBase, surface, _surface); struct wsi_interface *iface = wsi_device->wsi[surface->platform]; - return iface->get_support(surface, wsi_device, alloc, + return iface->get_support(surface, wsi_device, queueFamilyIndex, local_fd, pSupported); } diff --git a/src/vulkan/wsi/wsi_common.h b/src/vulkan/wsi/wsi_common.h index 14f65097bb3..424330de566 100644 --- a/src/vulkan/wsi/wsi_common.h +++ b/src/vulkan/wsi/wsi_common.h @@ -90,6 +90,9 @@ struct wsi_interface; #define VK_ICD_WSI_PLATFORM_MAX 
(VK_ICD_WSI_PLATFORM_DISPLAY + 1) struct wsi_device { + /* Allocator for the instance */ + VkAllocationCallb
Re: [Mesa-dev] [PATCH 1/1] swr/rast: ignore CreateElementUnorderedAtomicMemCpy
Reviewed-by: Bruce Cherniak > On Oct 17, 2018, at 1:51 PM, Alok Hota wrote: > > This function's API changed between LLVM 5 and 6. Compile errors occur > when building with LLVM 6+ if LLVM 5 was used for a dist tarball > --- > .../drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py > b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py > index d34e88d1bc..485403ae1e 100644 > --- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py > +++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py > @@ -161,7 +161,8 @@ def parse_ir_builder(input_file): > func_name == 'CreateAlignmentAssumptionHelper' or > func_name == 'CreateGEP' or > func_name == 'CreateLoad' or > -func_name == 'CreateMaskedLoad'): > +func_name == 'CreateMaskedLoad' or > +func_name == 'CreateElementUnorderedAtomicMemCpy'): > ignore = True > > # Convert CamelCase to CAMEL_CASE > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor
https://bugs.freedesktop.org/show_bug.cgi?id=108355 Hadrien Nilsson changed: What|Removed |Added Component|Drivers/Gallium/softpipe|Drivers/Gallium/swr --- Comment #7 from Hadrien Nilsson --- amdgpu.dc=0 had no effect, but using xf86-video-amdgpu 18.1.0 indeed fixed the problem :) Thank you Michel. Hopefully that new version will be released somehow for my Linux distribution. I still do not know if the mouse cursor is displayed as intended (there is a shadow which seems to use additive blending instead of alpha blending) but at least there are no more artifacts. I guess I should change the Bugzilla Product to "xorg"?
[Mesa-dev] [Bug 107765] [regression] Batman Arkham City crashes with DXVK under wine
https://bugs.freedesktop.org/show_bug.cgi?id=107765 --- Comment #13 from farmboy0+freedesk...@googlemail.com --- Can you tell me what settings you use? Do you use a 64 bit prefix?
[Mesa-dev] [PATCH 15/15] radeonsi: enable vcn jpeg decode for raven
From: Boyuan Zhang Enable vcn jpeg decode for raven. Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/drivers/radeonsi/si_get.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/radeonsi/si_get.c b/src/gallium/drivers/radeonsi/si_get.c index a87cb3cbc8..9b995bbcbf 100644 --- a/src/gallium/drivers/radeonsi/si_get.c +++ b/src/gallium/drivers/radeonsi/si_get.c @@ -628,6 +628,8 @@ static int si_get_video_param(struct pipe_screen *screen, return profile == PIPE_VIDEO_PROFILE_HEVC_MAIN; return false; case PIPE_VIDEO_FORMAT_JPEG: + if (sscreen->info.family == CHIP_RAVEN) + return true; if (sscreen->info.family < CHIP_CARRIZO || sscreen->info.family >= CHIP_VEGA10) return false; if (!(sscreen->info.drm_major == 3 && sscreen->info.drm_minor >= 19)) { -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/15] amd/common: add vcn jpeg ip info query
From: Boyuan Zhang Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/amd/common/ac_gpu_info.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/src/amd/common/ac_gpu_info.c b/src/amd/common/ac_gpu_info.c index 766ad83547..8c50738c3f 100644 --- a/src/amd/common/ac_gpu_info.c +++ b/src/amd/common/ac_gpu_info.c @@ -99,7 +99,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev, struct drm_amdgpu_info_device device_info = {}; struct amdgpu_buffer_size_alignments alignment_info = {}; struct drm_amdgpu_info_hw_ip dma = {}, compute = {}, uvd = {}; - struct drm_amdgpu_info_hw_ip uvd_enc = {}, vce = {}, vcn_dec = {}; + struct drm_amdgpu_info_hw_ip uvd_enc = {}, vce = {}, vcn_dec = {}, vcn_jpeg = {}; struct drm_amdgpu_info_hw_ip vcn_enc = {}, gfx = {}; struct amdgpu_gds_resource_info gds = {}; uint32_t vce_version = 0, vce_feature = 0, uvd_version = 0, uvd_feature = 0; @@ -186,6 +186,14 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev, } } + if (info->drm_major == 3 && info->drm_minor >= 17) { + r = amdgpu_query_hw_ip_info(dev, AMDGPU_HW_IP_VCN_JPEG, 0, &vcn_jpeg); + if (r) { + fprintf(stderr, "amdgpu: amdgpu_query_hw_ip_info(vcn_jpeg) failed.\n"); + return false; + } + } + r = amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_ME, 0, 0, &info->me_fw_version, &info->me_fw_feature); @@ -340,7 +348,8 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev, info->max_se = amdinfo->num_shader_engines; info->max_sh_per_se = amdinfo->num_shader_arrays_per_engine; info->has_hw_decode = - (uvd.available_rings != 0) || (vcn_dec.available_rings != 0); + (uvd.available_rings != 0) || (vcn_dec.available_rings != 0) || + (vcn_jpeg.available_rings != 0); info->uvd_fw_version = uvd.available_rings ? 
uvd_version : 0; info->vce_fw_version = @@ -439,6 +448,7 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev, ib_align = MAX2(ib_align, vce.ib_start_alignment); ib_align = MAX2(ib_align, vcn_dec.ib_start_alignment); ib_align = MAX2(ib_align, vcn_enc.ib_start_alignment); + ib_align = MAX2(ib_align, vcn_jpeg.ib_start_alignment); assert(ib_align); info->ib_start_alignment = ib_align; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/15] winsys/amdgpu: add vcn jpeg cs support
From: Boyuan Zhang Add vcn jpeg cs support, align cs by no-op. Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 12 1 file changed, 12 insertions(+) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c index c0f8b442b1..5986810d4e 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c @@ -845,6 +845,10 @@ static bool amdgpu_init_cs_context(struct amdgpu_winsys *ws, cs->ib[IB_MAIN].ip_type = AMDGPU_HW_IP_VCN_ENC; break; + case RING_VCN_JPEG: + cs->ib[IB_MAIN].ip_type = AMDGPU_HW_IP_VCN_JPEG; + break; + case RING_COMPUTE: case RING_GFX: cs->ib[IB_MAIN].ip_type = ring_type == RING_GFX ? AMDGPU_HW_IP_GFX : @@ -1589,6 +1593,14 @@ static int amdgpu_cs_flush(struct radeon_cmdbuf *rcs, while (rcs->current.cdw & 15) radeon_emit(rcs, 0x8000); /* type2 nop packet */ break; + case RING_VCN_JPEG: + if (rcs->current.cdw % 2) + assert(0); + while (rcs->current.cdw & 15) { + radeon_emit(rcs, 0x6000); /* nop packet */ + radeon_emit(rcs, 0x); + } + break; case RING_VCN_DEC: while (rcs->current.cdw & 15) radeon_emit(rcs, 0x81ff); /* nop packet */ -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
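The alignment step in the RING_VCN_JPEG case (pad the command stream to a multiple of 16 dwords using two-dword no-op packets) can be looked at in isolation. A minimal sketch, assuming a hypothetical `cmdbuf` struct in place of `radeon_cmdbuf`; the two NOP dword values are placeholders chosen for the example, not the real packet encoding.

```c
#include <assert.h>
#include <stdint.h>

#define CMDBUF_MAX 64u          /* hypothetical capacity, in dwords */
#define NOP_DW0    0xAAAA0000u  /* placeholder, not a real opcode */
#define NOP_DW1    0x00000000u  /* placeholder payload */

typedef struct {
    uint32_t buf[CMDBUF_MAX];
    uint32_t cdw;               /* number of dwords emitted so far */
} cmdbuf;

static void emit(cmdbuf *cs, uint32_t value)
{
    assert(cs->cdw < CMDBUF_MAX);
    cs->buf[cs->cdw++] = value;
}

/* Pad to a multiple of 16 dwords using two-dword NOP packets. */
static void pad_to_16_dwords(cmdbuf *cs)
{
    /* Each packet is two dwords, so the count must already be even. */
    assert(cs->cdw % 2 == 0);
    while (cs->cdw & 15) {
        emit(cs, NOP_DW0);
        emit(cs, NOP_DW1);
    }
}
```

Because padding is added in pairs, an odd dword count could never reach a multiple of 16, which is presumably why the patch asserts on `cdw % 2` before entering the loop.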
[Mesa-dev] [PATCH 12/15] radeon/vcn: implement jpeg target buffer cmd
From: Boyuan Zhang Implement jpeg target buffer cmd by programming registers directly, since there is no firmware for VCN Jpeg decode. Signed-off-by: Boyuan Zhang Acked-by: Leo Liu --- .../drivers/radeon/radeon_vcn_dec_jpeg.c | 73 ++- 1 file changed, 72 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c b/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c index 0d96acfcd2..afa2015b09 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c +++ b/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c @@ -116,7 +116,78 @@ static void send_cmd_target(struct radeon_decoder *dec, struct pb_buffer* buf, uint32_t off, enum radeon_bo_usage usage, enum radeon_bo_domain domain) { - /* TODO */ + uint64_t addr; + + set_reg_jpeg(dec, mmUVD_JPEG_PITCH, COND0, TYPE0, (dec->jpg.dt_pitch >> 4)); + set_reg_jpeg(dec, mmUVD_JPEG_UV_PITCH, COND0, TYPE0, ((dec->jpg.dt_uv_pitch * 2) >> 4)); + + set_reg_jpeg(dec, mmUVD_JPEG_TILING_CTRL, COND0, TYPE0, 0); + set_reg_jpeg(dec, mmUVD_JPEG_UV_TILING_CTRL, COND0, TYPE0, 0); + + dec->ws->cs_add_buffer(dec->cs, buf, usage | RADEON_USAGE_SYNCHRONIZED, + domain, 0); + addr = dec->ws->buffer_get_virtual_address(buf); + addr = addr + off; + + // set UVD_LMI_JPEG_WRITE_64BIT_BAR_LOW/HIGH based on target buffer address + set_reg_jpeg(dec, mmUVD_LMI_JPEG_WRITE_64BIT_BAR_HIGH, COND0, TYPE0, (addr >> 32)); + set_reg_jpeg(dec, mmUVD_LMI_JPEG_WRITE_64BIT_BAR_LOW, COND0, TYPE0, addr); + + // set output buffer data address + set_reg_jpeg(dec, mmUVD_JPEG_INDEX, COND0, TYPE0, 0); + set_reg_jpeg(dec, mmUVD_JPEG_DATA, COND0, TYPE0, dec->jpg.dt_luma_top_offset); + set_reg_jpeg(dec, mmUVD_JPEG_INDEX, COND0, TYPE0, 1); + set_reg_jpeg(dec, mmUVD_JPEG_DATA, COND0, TYPE0, dec->jpg.dt_chroma_top_offset); + set_reg_jpeg(dec, mmUVD_JPEG_TIER_CNTL2, COND0, TYPE3, 0); + + // set output buffer read pointer + set_reg_jpeg(dec, mmUVD_JPEG_OUTBUF_RPTR, COND0, TYPE0, 0); + + // enable error interrupts + set_reg_jpeg(dec, mmUVD_JPEG_INT_EN, 
COND0, TYPE0, 0xFFFE); + + // start engine command + set_reg_jpeg(dec, mmUVD_JPEG_CNTL, COND0, TYPE0, 0x6); + + // wait for job completion, wait for job JBSI fetch done + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C3); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, (dec->jpg.bsd_size >> 2)); + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C2); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, 0x01400200); + set_reg_jpeg(dec, mmUVD_JPEG_RB_RPTR, COND0, TYPE3, 0x); + + // wait for job jpeg outbuf idle + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C3); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, 0x); + set_reg_jpeg(dec, mmUVD_JPEG_OUTBUF_WPTR, COND0, TYPE3, 0x0001); + + // stop engine + set_reg_jpeg(dec, mmUVD_JPEG_CNTL, COND0, TYPE0, 0x4); + + // asserting jpeg lmi drop + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x0005); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, (1 << 23 | 1 << 0)); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE1, 0); + + // asserting jpeg reset + set_reg_jpeg(dec, mmUVD_JPEG_CNTL, COND0, TYPE0, 1); + + // ensure reset is asserted in sclk domain + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C3); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, (1 << 9)); + set_reg_jpeg(dec, mmUVD_SOFT_RESET, COND0, TYPE3, (1 << 9)); + + // de-assert jpeg reset + set_reg_jpeg(dec, mmUVD_JPEG_CNTL, COND0, TYPE0, 0); + + // ensure reset is de-asserted in sclk domain + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C3); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, (0 << 9)); + set_reg_jpeg(dec, mmUVD_SOFT_RESET, COND0, TYPE3, (1 << 9)); + + // de-asserting jpeg lmi drop + set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x0005); + set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, 0); } /** -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
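The target buffer address in this patch is written as two 32-bit register values after the offset has been added. A small sketch of that split, using hypothetical helper names (`target_address`, `addr_hi`, `addr_lo`) rather than anything from the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Buffer virtual address plus byte offset, guarding against wrap-around. */
static uint64_t target_address(uint64_t base, uint32_t off)
{
    assert(base <= UINT64_MAX - off);
    return base + off;
}

/* Upper 32 bits, as would go into a *_BAR_HIGH style register. */
static uint32_t addr_hi(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}

/* Lower 32 bits, as would go into a *_BAR_LOW style register. */
static uint32_t addr_lo(uint64_t addr)
{
    return (uint32_t)(addr & 0xFFFFFFFFu);
}
```

The patch passes `addr` directly for the low half and relies on implicit truncation to 32 bits; the explicit mask here only makes that truncation visible.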
[Mesa-dev] [PATCH 09/15] st/va: get mjpeg slice header
From: Boyuan Zhang Move the previous get_mjpeg_slice_header function and eoi from "radeon/vcn" to "st/va". Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/state_trackers/va/picture.c | 13 +- src/gallium/state_trackers/va/picture_mjpeg.c | 142 ++ src/gallium/state_trackers/va/va_private.h| 11 ++ 3 files changed, 164 insertions(+), 2 deletions(-) diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index e2cdb2b40c..04d2da0afe 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -259,11 +259,12 @@ handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf) { enum pipe_video_format format; unsigned num_buffers = 0; - void * const *buffers[2]; - unsigned sizes[2]; + void * const *buffers[3]; + unsigned sizes[3]; static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 }; static const uint8_t start_code_h265[] = { 0x00, 0x00, 0x01 }; static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d }; + static const uint8_t eoi_jpeg[] = { 0xff, 0xd9 }; format = u_reduce_video_profile(context->templat.profile); switch (format) { @@ -301,6 +302,9 @@ handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf) sizes[num_buffers++] = context->mpeg4.start_code_size; break; case PIPE_VIDEO_FORMAT_JPEG: + vlVaGetJpegSliceHeader(context); + buffers[num_buffers] = (void *)context->mjpeg.slice_header; + sizes[num_buffers++] = context->mjpeg.slice_header_size; break; case PIPE_VIDEO_FORMAT_VP9: vlVaDecoderVP9BitstreamHeader(context, buf); @@ -313,6 +317,11 @@ handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf) sizes[num_buffers] = buf->size; ++num_buffers; + if (format == PIPE_VIDEO_FORMAT_JPEG) { + buffers[num_buffers] = (void *const)&eoi_jpeg; + sizes[num_buffers++] = sizeof(eoi_jpeg); + } + if (context->needs_begin_frame) { context->decoder->begin_frame(context->decoder, context->target, &context->desc.base); diff --git 
a/src/gallium/state_trackers/va/picture_mjpeg.c b/src/gallium/state_trackers/va/picture_mjpeg.c index 396b743442..defb0b546d 100644 --- a/src/gallium/state_trackers/va/picture_mjpeg.c +++ b/src/gallium/state_trackers/va/picture_mjpeg.c @@ -114,3 +114,145 @@ void vlVaHandleSliceParameterBufferMJPEG(vlVaContext *context, vlVaBuffer *buf) context->desc.mjpeg.slice_parameter.restart_interval = mjpeg->restart_interval; context->desc.mjpeg.slice_parameter.num_mcus = mjpeg->num_mcus; } + +void vlVaGetJpegSliceHeader(vlVaContext *context) +{ + int size = 0, saved_size, len_pos, i; + uint16_t *bs; + uint8_t *p = context->mjpeg.slice_header; + + /* SOI */ + p[size++] = 0xff; + p[size++] = 0xd8; + + /* DQT */ + p[size++] = 0xff; + p[size++] = 0xdb; + + len_pos = size++; + size++; + + for (i = 0; i < 4; ++i) { + if (context->desc.mjpeg.quantization_table.load_quantiser_table[i] == 0) + continue; + + p[size++] = i; + memcpy((p + size), &context->desc.mjpeg.quantization_table.quantiser_table[i], 64); + size += 64; + } + + bs = (uint16_t*)&p[len_pos]; + *bs = util_bswap16(size - 4); + + saved_size = size; + + /* DHT */ + p[size++] = 0xff; + p[size++] = 0xc4; + + len_pos = size++; + size++; + + for (i = 0; i < 2; ++i) { + int num = 0, j; + + if (context->desc.mjpeg.huffman_table.load_huffman_table[i] == 0) + continue; + + p[size++] = 0x00 | i; + memcpy((p + size), &context->desc.mjpeg.huffman_table.table[i].num_dc_codes, 16); + size += 16; + for (j = 0; j < 16; ++j) + num += context->desc.mjpeg.huffman_table.table[i].num_dc_codes[j]; + assert(num <= 12); + memcpy((p + size), &context->desc.mjpeg.huffman_table.table[i].dc_values, num); + size += num; + } + + for (i = 0; i < 2; ++i) { + int num = 0, j; + + if (context->desc.mjpeg.huffman_table.load_huffman_table[i] == 0) + continue; + + p[size++] = 0x10 | i; + memcpy((p + size), &context->desc.mjpeg.huffman_table.table[i].num_ac_codes, 16); + size += 16; + for (j = 0; j < 16; ++j) + num += 
context->desc.mjpeg.huffman_table.table[i].num_ac_codes[j]; + assert(num <= 162); + memcpy((p + size), &context->desc.mjpeg.huffman_table.table[i].ac_values, num); + size += num; + } + + bs = (uint16_t*)&p[len_pos]; + *bs = util_bswap16(size - saved_size - 2); + + saved_size = size; + + /* DRI */ + if (context->desc.mjpeg.slice_parameter.restart_interval) { + p[size++] = 0xff; + p[size++] = 0xdd; + p[size++] = 0x00; + p[size++] = 0x04; + bs = (uint16_t*)&p[size++]; + *bs = util_bswap16(context->desc.mjpeg.slice_parameter.restart_interval); + saved_size = ++size; + } + + /* SOF */ + p[size++] = 0xff; + p[size+
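JPEG marker segments store their 16-bit length and parameter fields big-endian, which is what the `util_bswap16` calls in this patch take care of on little-endian hosts. As a sketch under that reading, the DRI segment can also be written byte by byte so that host byte order does not matter. `put_be16` and `write_dri` are hypothetical helpers for illustration, not functions from st/va.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value big-endian, independent of host byte order. */
static size_t put_be16(uint8_t *p, size_t pos, uint16_t v)
{
    p[pos++] = (uint8_t)(v >> 8);
    p[pos++] = (uint8_t)(v & 0xff);
    return pos;
}

/*
 * Append a DRI (define restart interval) segment at offset size:
 * marker 0xff 0xdd, segment length 4, then the 16-bit interval.
 * Returns the new size; cap is the capacity of p in bytes.
 */
static size_t write_dri(uint8_t *p, size_t size, size_t cap,
                        uint16_t restart_interval)
{
    assert(size <= cap && cap - size >= 6);
    p[size++] = 0xff;
    p[size++] = 0xdd;
    size = put_be16(p, size, 4);
    size = put_be16(p, size, restart_interval);
    return size;
}
```

Writing the bytes individually also avoids storing through a `uint16_t *` at an arbitrary byte offset, at the cost of a couple of extra stores.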
[Mesa-dev] [PATCH 08/15] radeon/vcn: add jpeg decode implementation
From: Boyuan Zhang Add a new file to handle VCN Jpeg decode specific functions. Use Jpeg specific cmd sending function in end_frame call. Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/drivers/radeon/radeon_vcn_dec.c | 21 ++-- src/gallium/drivers/radeon/radeon_vcn_dec.h | 4 + .../drivers/radeon/radeon_vcn_dec_jpeg.c | 99 +++ src/gallium/drivers/radeonsi/Makefile.sources | 1 + src/gallium/drivers/radeonsi/meson.build | 1 + 5 files changed, 119 insertions(+), 7 deletions(-) create mode 100644 src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c b/src/gallium/drivers/radeon/radeon_vcn_dec.c index 30a98c2786..75ef4a5d40 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_dec.c +++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c @@ -1247,6 +1247,10 @@ static unsigned calc_dpb_size(struct radeon_decoder *dec) dpb_size *= (3 / 2); break; + case PIPE_VIDEO_FORMAT_JPEG: + dpb_size = 0; + break; + default: // something is missing here assert(0); @@ -1547,14 +1551,14 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context, } dpb_size = calc_dpb_size(dec); - - if (!si_vid_create_buffer(dec->screen, &dec->dpb, dpb_size, PIPE_USAGE_DEFAULT)) { - RVID_ERR("Can't allocated dpb.\n"); - goto error; + if (dpb_size) { + if (!si_vid_create_buffer(dec->screen, &dec->dpb, dpb_size, PIPE_USAGE_DEFAULT)) { + RVID_ERR("Can't allocated dpb.\n"); + goto error; + } + si_vid_clear_buffer(context, &dec->dpb); } - si_vid_clear_buffer(context, &dec->dpb); - if (dec->stream_type == RDECODE_CODEC_H264_PERF) { unsigned ctx_size = calc_ctx_size_h264_perf(dec); if (!si_vid_create_buffer(dec->screen, &dec->ctx, ctx_size, PIPE_USAGE_DEFAULT)) { @@ -1581,7 +1585,10 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context, next_buffer(dec); - dec->send_cmd = send_cmd_dec; + if (stream_type == RDECODE_CODEC_JPEG) + dec->send_cmd = send_cmd_jpeg; + else + dec->send_cmd = send_cmd_dec; return 
&dec->base; diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.h b/src/gallium/drivers/radeon/radeon_vcn_dec.h index 37c0503377..a6a726f46d 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_dec.h +++ b/src/gallium/drivers/radeon/radeon_vcn_dec.h @@ -768,6 +768,10 @@ void send_cmd_dec(struct radeon_decoder *dec, struct pipe_video_buffer *target, struct pipe_picture_desc *picture); +void send_cmd_jpeg(struct radeon_decoder *dec, + struct pipe_video_buffer *target, + struct pipe_picture_desc *picture); + struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context, const struct pipe_video_codec *templat); diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c b/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c new file mode 100644 index 00..7c078a0964 --- /dev/null +++ b/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c @@ -0,0 +1,99 @@ +/** + * + * Copyright 2018 Advanced Micro Devices, Inc. + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 
+ * IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + * + **/ + +#include +#include + +#include "pipe/p_video_codec.h" + +#include "util/u_memory.h" +#include "util/u_video.h" + +#include "radeonsi/si_pipe.h" +#include "radeon_video.h" +#include "radeon_vcn_dec.h" + +static struct p
[Mesa-dev] [PATCH 07/15] radeon/vcn: separate send cmd call from end frame
From: Boyuan Zhang

Use function pointer for sending cmd in end_frame call. By doing this, we
can assign different cmd sending logics for Jpeg decode later.

Signed-off-by: Boyuan Zhang
Reviewed-by: Leo Liu
---
 src/gallium/drivers/radeon/radeon_vcn_dec.c | 29 +++--
 src/gallium/drivers/radeon/radeon_vcn_dec.h |  7 +
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c b/src/gallium/drivers/radeon/radeon_vcn_dec.c
index 26ea1f82ff..30a98c2786 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c
@@ -1368,21 +1368,15 @@ static void radeon_dec_decode_bitstream(struct pipe_video_codec *decoder,
 }
 
 /**
- * end decoding of the current frame
+ * send cmd for vcn dec
  */
-static void radeon_dec_end_frame(struct pipe_video_codec *decoder,
+void send_cmd_dec(struct radeon_decoder *dec,
 			   struct pipe_video_buffer *target,
 			   struct pipe_picture_desc *picture)
 {
-	struct radeon_decoder *dec = (struct radeon_decoder*)decoder;
 	struct pb_buffer *dt;
 	struct rvid_buffer *msg_fb_it_probs_buf, *bs_buf;
 
-	assert(decoder);
-
-	if (!dec->bs_ptr)
-		return;
-
 	msg_fb_it_probs_buf = &dec->msg_fb_it_probs_buffers[dec->cur_buffer];
 	bs_buf = &dec->bs_buffers[dec->cur_buffer];
 
@@ -1412,6 +1406,23 @@ static void radeon_dec_end_frame(struct pipe_video_codec *decoder,
 	send_cmd(dec, RDECODE_CMD_PROB_TBL_BUFFER, msg_fb_it_probs_buf->res->buf,
 		 FB_BUFFER_OFFSET + FB_BUFFER_SIZE, RADEON_USAGE_READ, RADEON_DOMAIN_GTT);
 	set_reg(dec, RDECODE_ENGINE_CNTL, 1);
+}
+
+/**
+ * end decoding of the current frame
+ */
+static void radeon_dec_end_frame(struct pipe_video_codec *decoder,
+			   struct pipe_video_buffer *target,
+			   struct pipe_picture_desc *picture)
+{
+	struct radeon_decoder *dec = (struct radeon_decoder*)decoder;
+
+	assert(decoder);
+
+	if (!dec->bs_ptr)
+		return;
+
+	dec->send_cmd(dec, target, picture);
 
 	flush(dec, PIPE_FLUSH_ASYNC);
 	next_buffer(dec);
@@ -1570,6 +1581,8 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context,
 
 	next_buffer(dec);
 
+	dec->send_cmd = send_cmd_dec;
+
 	return &dec->base;
 
 error:
diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.h b/src/gallium/drivers/radeon/radeon_vcn_dec.h
index 2bcc1bb542..37c0503377 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec.h
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.h
@@ -759,8 +759,15 @@ struct radeon_decoder {
 	bool				show_frame;
 	unsigned			ref_idx;
 	struct jpeg_params		jpg;
+	void (*send_cmd)(struct radeon_decoder *dec,
+			 struct pipe_video_buffer *target,
+			 struct pipe_picture_desc *picture);
 };
 
+void send_cmd_dec(struct radeon_decoder *dec,
+		  struct pipe_video_buffer *target,
+		  struct pipe_picture_desc *picture);
+
 struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context,
 					     const struct pipe_video_codec *templat);
 
-- 
2.17.1
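Editorial sketch, not part of the patch: the change above routes end_frame through a per-decoder function pointer so that a later patch in the series can install a JPEG-specific sender. The toy below shows only that dispatch pattern. Every toy_* name and the PATH_* constants are hypothetical stand-ins, not Mesa symbols, and the real decoder struct carries far more state.

```c
#include <assert.h>
#include <stddef.h>

enum { PATH_NONE = 0, PATH_DEC = 1, PATH_JPEG = 2 };

/* Hypothetical stand-in for struct radeon_decoder; only the hook is modeled. */
struct toy_decoder {
    int has_bitstream;   /* plays the role of the dec->bs_ptr check */
    int last_path;       /* records which sender ran, for inspection */
    void (*send_cmd)(struct toy_decoder *dec);
};

/* Two interchangeable senders with the same signature. */
static void toy_send_cmd_dec(struct toy_decoder *dec)  { dec->last_path = PATH_DEC; }
static void toy_send_cmd_jpeg(struct toy_decoder *dec) { dec->last_path = PATH_JPEG; }

/* The sender is chosen once, at creation time. */
static void toy_decoder_init(struct toy_decoder *dec, int is_jpeg)
{
    dec->has_bitstream = 0;
    dec->last_path = PATH_NONE;
    dec->send_cmd = is_jpeg ? toy_send_cmd_jpeg : toy_send_cmd_dec;
}

/* end_frame keeps the common checks and delegates the codec-specific part.
 * Returns 1 if a command was sent, 0 if there was nothing to submit. */
static int toy_end_frame(struct toy_decoder *dec)
{
    assert(dec != NULL);
    if (!dec->has_bitstream)
        return 0;
    dec->send_cmd(dec);
    return 1;
}
```

The point of the split is that the early-return check and the flush stay in one place, while only the command emission varies per codec.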
[Mesa-dev] [PATCH 11/15] radeon/vcn: implement jpeg bitstream buffer cmd
From: Boyuan Zhang

Implement jpeg bitstream buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.

Signed-off-by: Boyuan Zhang
Acked-by: Leo Liu
---
 .../drivers/radeon/radeon_vcn_dec_jpeg.c | 46 ++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c b/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c
index 7c078a0964..0d96acfcd2 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec_jpeg.c
@@ -59,12 +59,56 @@ static struct pb_buffer *radeon_jpeg_get_decode_param(struct radeon_decoder *dec
 	return luma->buffer.buf;
 }
 
+/* add a new set register command to the IB */
+static void set_reg_jpeg(struct radeon_decoder *dec, unsigned reg,
+			 unsigned cond, unsigned type, uint32_t val)
+{
+	radeon_emit(dec->cs, RDECODE_PKTJ(SOC15_REG_ADDR(reg), cond, type));
+	radeon_emit(dec->cs, val);
+}
+
 /* send a bitstream buffer command */
 static void send_cmd_bitstream(struct radeon_decoder *dec,
 			       struct pb_buffer* buf, uint32_t off,
 			       enum radeon_bo_usage usage, enum radeon_bo_domain domain)
 {
-	/* TODO */
+	uint64_t addr;
+
+	// jpeg soft reset
+	set_reg_jpeg(dec, mmUVD_JPEG_CNTL, COND0, TYPE0, 1);
+
+	// ensuring the Reset is asserted in SCLK domain
+	set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C2);
+	set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, 0x01400200);
+	set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C3);
+	set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, (1 << 9));
+	set_reg_jpeg(dec, mmUVD_SOFT_RESET, COND0, TYPE3, (1 << 9));
+
+	// wait mem
+	set_reg_jpeg(dec, mmUVD_JPEG_CNTL, COND0, TYPE0, 0);
+
+	// ensuring the Reset is de-asserted in SCLK domain
+	set_reg_jpeg(dec, mmUVD_CTX_INDEX, COND0, TYPE0, 0x01C3);
+	set_reg_jpeg(dec, mmUVD_CTX_DATA, COND0, TYPE0, (0 << 9));
+	set_reg_jpeg(dec, mmUVD_SOFT_RESET, COND0, TYPE3, (1 << 9));
+
+	dec->ws->cs_add_buffer(dec->cs, buf, usage | RADEON_USAGE_SYNCHRONIZED,
+			       domain, 0);
+	addr = dec->ws->buffer_get_virtual_address(buf);
+	addr = addr + off;
+
+	// set UVD_LMI_JPEG_READ_64BIT_BAR_LOW/HIGH based on bitstream buffer address
+	set_reg_jpeg(dec, mmUVD_LMI_JPEG_READ_64BIT_BAR_HIGH, COND0, TYPE0, (addr >> 32));
+	set_reg_jpeg(dec, mmUVD_LMI_JPEG_READ_64BIT_BAR_LOW, COND0, TYPE0, addr);
+
+	// set jpeg_rb_base
+	set_reg_jpeg(dec, mmUVD_JPEG_RB_BASE, COND0, TYPE0, 0);
+
+	// set jpeg_rb_size
+	set_reg_jpeg(dec, mmUVD_JPEG_RB_SIZE, COND0, TYPE0, 0xFFF0);
+
+	// set jpeg_rb_wptr
+	set_reg_jpeg(dec, mmUVD_JPEG_RB_WPTR, COND0, TYPE0, (dec->jpg.bsd_size >> 2));
 }
 
 /* send a target buffer command */
-- 
2.17.1
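Editorial sketch, not part of the patch: the bitstream command above programs a 64-bit GPU address as two 32-bit register values and derives the write pointer from the bitstream size. The helper below makes that arithmetic explicit. The toy_* names are hypothetical, and reading the shift by 2 as a bytes-to-dwords conversion is an assumption; the patch only shows the shift itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three values written to registers. */
struct toy_jpeg_regs {
    uint32_t bar_high;  /* upper 32 bits of the bitstream address */
    uint32_t bar_low;   /* lower 32 bits of the bitstream address */
    uint32_t rb_wptr;   /* bsd_size >> 2, presumably a dword count */
};

static struct toy_jpeg_regs toy_bitstream_regs(uint64_t base, uint32_t off,
                                               uint32_t bsd_size)
{
    struct toy_jpeg_regs r;
    uint64_t addr = base + off;

    /* The patch passes the 64-bit addr to a uint32_t parameter and relies on
     * implicit truncation for the low half; the cast here makes it explicit. */
    r.bar_high = (uint32_t)(addr >> 32);
    r.bar_low = (uint32_t)(addr & 0xFFFFFFFFu);
    r.rb_wptr = bsd_size >> 2;

    /* Sanity check: the two halves must reassemble to the full address. */
    assert((((uint64_t)r.bar_high << 32) | r.bar_low) == addr);
    return r;
}
```

For a buffer at 0x123456000 with offset 0x100 and a 4096-byte bitstream, this yields high = 0x1, low = 0x23456100 and a write pointer of 1024.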
[Mesa-dev] [PATCH 10/15] radeon/uvd: remove get mjpeg slice header
From: Boyuan Zhang Move the previous get_mjpeg_slice_heaeder function and eoi from "radeon/vcn" to "st/va". Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/drivers/radeon/radeon_uvd.c | 157 1 file changed, 157 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c index a7ef4252ee..0f3b43de81 100644 --- a/src/gallium/drivers/radeon/radeon_uvd.c +++ b/src/gallium/drivers/radeon/radeon_uvd.c @@ -964,149 +964,6 @@ static struct ruvd_mpeg4 get_mpeg4_msg(struct ruvd_decoder *dec, return result; } -static void get_mjpeg_slice_header(struct ruvd_decoder *dec, struct pipe_mjpeg_picture_desc *pic) -{ - int size = 0, saved_size, len_pos, i; - uint16_t *bs; - uint8_t *buf = dec->bs_ptr; - - /* SOI */ - buf[size++] = 0xff; - buf[size++] = 0xd8; - - /* DQT */ - buf[size++] = 0xff; - buf[size++] = 0xdb; - - len_pos = size++; - size++; - - for (i = 0; i < 4; ++i) { - if (pic->quantization_table.load_quantiser_table[i] == 0) - continue; - - buf[size++] = i; - memcpy((buf + size), &pic->quantization_table.quantiser_table[i], 64); - size += 64; - } - - bs = (uint16_t*)&buf[len_pos]; - *bs = util_bswap16(size - 4); - - saved_size = size; - - /* DHT */ - buf[size++] = 0xff; - buf[size++] = 0xc4; - - len_pos = size++; - size++; - - for (i = 0; i < 2; ++i) { - int num = 0, j; - - if (pic->huffman_table.load_huffman_table[i] == 0) - continue; - - buf[size++] = 0x00 | i; - memcpy((buf + size), &pic->huffman_table.table[i].num_dc_codes, 16); - size += 16; - for (j = 0; j < 16; ++j) - num += pic->huffman_table.table[i].num_dc_codes[j]; - assert(num <= 12); - memcpy((buf + size), &pic->huffman_table.table[i].dc_values, num); - size += num; - } - - for (i = 0; i < 2; ++i) { - int num = 0, j; - - if (pic->huffman_table.load_huffman_table[i] == 0) - continue; - - buf[size++] = 0x10 | i; - memcpy((buf + size), &pic->huffman_table.table[i].num_ac_codes, 16); - size += 16; - for (j = 0; j < 16; ++j) - num += 
pic->huffman_table.table[i].num_ac_codes[j]; - assert(num <= 162); - memcpy((buf + size), &pic->huffman_table.table[i].ac_values, num); - size += num; - } - - bs = (uint16_t*)&buf[len_pos]; - *bs = util_bswap16(size - saved_size - 2); - - saved_size = size; - - /* DRI */ - if (pic->slice_parameter.restart_interval) { - buf[size++] = 0xff; - buf[size++] = 0xdd; - buf[size++] = 0x00; - buf[size++] = 0x04; - bs = (uint16_t*)&buf[size++]; - *bs = util_bswap16(pic->slice_parameter.restart_interval); - saved_size = ++size; - } - - /* SOF */ - buf[size++] = 0xff; - buf[size++] = 0xc0; - - len_pos = size++; - size++; - - buf[size++] = 0x08; - - bs = (uint16_t*)&buf[size++]; - *bs = util_bswap16(pic->picture_parameter.picture_height); - size++; - - bs = (uint16_t*)&buf[size++]; - *bs = util_bswap16(pic->picture_parameter.picture_width); - size++; - - buf[size++] = pic->picture_parameter.num_components; - - for (i = 0; i < pic->picture_parameter.num_components; ++i) { - buf[size++] = pic->picture_parameter.components[i].component_id; - buf[size++] = pic->picture_parameter.components[i].h_sampling_factor << 4 | - pic->picture_parameter.components[i].v_sampling_factor; - buf[size++] = pic->picture_parameter.components[i].quantiser_table_selector; - } - - bs = (uint16_t*)&buf[len_pos]; - *bs = util_bswap16(size - saved_size - 2); - - saved_size = size; - - /* SOS */ - buf[size++] = 0xff; - buf[size++] = 0xda; - - len_pos = size++; - size++; - - buf[size++] = pic->slice_parameter.num_components; - - for (i = 0; i < pic->slice_parameter.num_components; ++i) { - buf[size++] = pic->slice_parameter.components[i].component_selector; - buf[size++] = pic->slice_parameter.components[i].dc_table_selector << 4 | - pic->slice_parameter.components[i].ac_table_selector; - } - - buf[size++] = 0x00; - buf[size++] = 0x3f; - buf[size++] = 0x00; - - bs = (uint16_t*)&buf[len_pos]; - *bs = util_bswap16(size - saved_size - 2); - - dec->bs_ptr += size; -
[Mesa-dev] [PATCH 06/15] radeon/vcn: create cs based on ring type
From: Boyuan Zhang

Add RING_VCN_JPEG for VCN Jpeg decode, and keep RING_VCN_DEC for other
codecs.

Signed-off-by: Boyuan Zhang
Reviewed-by: Leo Liu
---
 src/gallium/drivers/radeon/radeon_vcn_dec.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c b/src/gallium/drivers/radeon/radeon_vcn_dec.c
index fbfef6d273..26ea1f82ff 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c
@@ -1433,7 +1433,7 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context,
 	struct si_context *sctx = (struct si_context*)context;
 	struct radeon_winsys *ws = sctx->ws;
 	unsigned width = templ->width, height = templ->height;
-	unsigned dpb_size, bs_buf_size, stream_type = 0;
+	unsigned dpb_size, bs_buf_size, stream_type = 0, ring = RING_VCN_DEC;
 	struct radeon_decoder *dec;
 	int r, i;
 
@@ -1462,6 +1462,10 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context,
 	case PIPE_VIDEO_FORMAT_VP9:
 		stream_type = RDECODE_CODEC_VP9;
 		break;
+	case PIPE_VIDEO_FORMAT_JPEG:
+		stream_type = RDECODE_CODEC_JPEG;
+		ring = RING_VCN_JPEG;
+		break;
 	default:
 		assert(0);
 		break;
@@ -1488,7 +1492,7 @@ struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context,
 	dec->stream_handle = si_vid_alloc_stream_handle();
 	dec->screen = context->screen;
 	dec->ws = ws;
-	dec->cs = ws->cs_create(sctx->ctx, RING_VCN_DEC, NULL, NULL);
+	dec->cs = ws->cs_create(sctx->ctx, ring, NULL, NULL);
 	if (!dec->cs) {
 		RVID_ERR("Can't get command submission context.\n");
 		goto error;
-- 
2.17.1
[Mesa-dev] [PATCH 05/15] radeon/winsys: add vcn jpeg ring type
From: Boyuan Zhang

Add a new ring type for vcn jpeg.

Signed-off-by: Boyuan Zhang
Reviewed-by: Leo Liu
---
 src/gallium/drivers/radeon/radeon_winsys.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h
index bb732ab314..c6800808cb 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -87,6 +87,7 @@ enum ring_type {
     RING_UVD_ENC,
     RING_VCN_DEC,
     RING_VCN_ENC,
+    RING_VCN_JPEG,
     RING_LAST,
 };
 
-- 
2.17.1
[Mesa-dev] [PATCH 04/15] radeon/vcn: add vcn jpeg decode interface
From: Boyuan Zhang Add VCN Jpeg decode interfaces and register defines. Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/drivers/radeon/radeon_vcn_dec.h | 90 + 1 file changed, 90 insertions(+) diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.h b/src/gallium/drivers/radeon/radeon_vcn_dec.h index c6c2a933cc..2bcc1bb542 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_dec.h +++ b/src/gallium/drivers/radeon/radeon_vcn_dec.h @@ -43,6 +43,15 @@ #define RDECODE_PKT2() (RDECODE_PKT_TYPE_S(2)) +#define RDECODE_PKT_REG_J(x) ((unsigned)(x) & 0x3) +#define RDECODE_PKT_RES_J(x) (((unsigned)(x) & 0x3F) << 18) +#define RDECODE_PKT_COND_J(x) (((unsigned)(x) & 0xF) << 24) +#define RDECODE_PKT_TYPE_J(x) (((unsigned)(x) & 0xF) << 28) +#define RDECODE_PKTJ(reg, cond, type) (RDECODE_PKT_REG_J(reg) | \ + RDECODE_PKT_RES_J(0) | \ + RDECODE_PKT_COND_J(cond) | \ + RDECODE_PKT_TYPE_J(type)) + #define RDECODE_CMD_MSG_BUFFER 0x #define RDECODE_CMD_DPB_BUFFER 0x0001 #define RDECODE_CMD_DECODING_TARGET_BUFFER 0x0002 @@ -62,6 +71,7 @@ #define RDECODE_CODEC_MPEG2_VLD0x0003 #define RDECODE_CODEC_MPEG40x0004 #define RDECODE_CODEC_H264_PERF0x0007 +#define RDECODE_CODEC_JPEG 0x0008 #define RDECODE_CODEC_H265 0x0010 #define RDECODE_CODEC_VP9 0x0011 @@ -112,6 +122,77 @@ #define RDECODE_VP9_PROBS_DATA_SIZE2304 +#define mmUVD_JPEG_CNTL0x0200 +#define mmUVD_JPEG_CNTL_BASE_IDX 1 +#define mmUVD_JPEG_RB_BASE 0x0201 +#define mmUVD_JPEG_RB_BASE_BASE_IDX1 +#define mmUVD_JPEG_RB_WPTR 0x0202 +#define mmUVD_JPEG_RB_WPTR_BASE_IDX1 +#define mmUVD_JPEG_RB_RPTR 0x0203 +#define mmUVD_JPEG_RB_RPTR_BASE_IDX1 +#define mmUVD_JPEG_RB_SIZE 0x0204 +#define mmUVD_JPEG_RB_SIZE_BASE_IDX1 +#define mmUVD_JPEG_TIER_CNTL2 0x021a +#define mmUVD_JPEG_TIER_CNTL2_BASE_IDX 1 +#define mmUVD_JPEG_UV_TILING_CTRL 0x021c +#define mmUVD_JPEG_UV_TILING_CTRL_BASE_IDX 1 +#define mmUVD_JPEG_TILING_CTRL 0x021e +#define mmUVD_JPEG_TILING_CTRL_BASE_IDX1 +#define mmUVD_JPEG_OUTBUF_RPTR 0x0220 +#define 
mmUVD_JPEG_OUTBUF_RPTR_BASE_IDX1 +#define mmUVD_JPEG_OUTBUF_WPTR 0x0221 +#define mmUVD_JPEG_OUTBUF_WPTR_BASE_IDX1 +#define mmUVD_JPEG_PITCH 0x0222 +#define mmUVD_JPEG_PITCH_BASE_IDX 1 +#define mmUVD_JPEG_INT_EN 0x0229 +#define mmUVD_JPEG_INT_EN_BASE_IDX 1 +#define mmUVD_JPEG_UV_PITCH0x022b +#define mmUVD_JPEG_UV_PITCH_BASE_IDX 1 +#define mmUVD_JPEG_INDEX 0x023e +#define mmUVD_JPEG_INDEX_BASE_IDX 1 +#define mmUVD_JPEG_DATA0x023f +#define mmUVD_JPEG_DATA_BASE_IDX 1 +#define mmUVD_LMI_JPEG_WRITE_64BIT_BAR_HIGH0x0438 +#define mmUVD_LMI_JPEG_WRITE_64BIT_BAR_HIGH_BASE_IDX 1 +#define mmUVD_LMI_JPEG_WRITE_64BIT_BAR_LOW 0x0439 +#define mmUVD_LMI_JPEG_WRITE_64BIT_BAR_LOW_BASE_IDX1 +#define mmUVD_LMI_JPEG_READ_64BIT_BAR_HIGH 0x045a +#define mmUVD_LMI_JPEG_READ_64BIT_BAR_HIGH_BASE_IDX1 +#define mmUVD_LMI_JPEG_READ_64BIT_BAR_LOW 0x045b +#define mmUVD_LMI_JPEG_READ_64BIT_BAR_LOW_BASE_IDX 1 +#define mmUVD_CTX_INDEX0x0528 +#define mmUVD_CTX_INDEX_BASE_IDX 1 +#define mmUVD_CTX_DATA 0x0529 +#define mmUVD_CTX_DATA_BASE_IDX1 +#define mmUVD_SOFT_RESET 0x05a0 +#define mmUVD_SOFT_RESET_BASE_IDX 1 + +#define UVD_BASE_INST0_SEG00x7800 +#define UVD_BASE_INST0_SEG10x7E00 +#define UVD_BASE_INST0_SEG20 +#define UVD_BASE_INST0_SEG30 +#define UVD_BASE_INST0_SEG40 + +#define SOC15_REG_ADDR(reg)(UVD_BASE_INST0_SEG1 + reg) + +#define COND0 0 +#define COND1 1 +#define COND2 2 +
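Editorial sketch, not part of the patch: the RDECODE_PKTJ macro above packs a register address, a condition and a type into one 32-bit packet header, and SOC15_REG_ADDR adds the segment 1 base (0x7E00) to a register offset. The function below mirrors that packing under stated assumptions. The register field is treated as 18 bits wide, an assumption based on the reserved field starting at bit 18; the mask as it appears in this archive is shorter and should be checked against the actual header. The toy_* names are hypothetical, and the type value 3 used in the example is likewise assumed, since the TYPE definitions are cut off in this copy.

```c
#include <assert.h>
#include <stdint.h>

/* Segment 1 base as given by UVD_BASE_INST0_SEG1 in the patch. */
static uint32_t toy_soc15_reg_addr(uint32_t reg)
{
    return 0x7E00u + reg;
}

/* Pack a JPEG ring packet header.
 * Assumed layout: reg in bits 0..17, reserved (zero) in bits 18..23,
 * cond in bits 24..27, type in bits 28..31. */
static uint32_t toy_pktj(uint32_t reg, uint32_t cond, uint32_t type)
{
    assert(reg < (1u << 18));   /* must fit the assumed register field */
    assert(cond < 16u && type < 16u);
    return (reg & 0x3FFFFu) | ((cond & 0xFu) << 24) | ((type & 0xFu) << 28);
}
```

With these assumptions, mmUVD_JPEG_CNTL (0x0200) maps to address 0x8000, and a header for mmUVD_SOFT_RESET (0x05a0) with condition 0 and type 3 comes out as 0x300083A0.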
[Mesa-dev] [PATCH 03/15] radeon/vcn: move radeon decoder define to header file
From: Boyuan Zhang Move radeon_decoder definition from "radeon_vcn_dec.c" to "radeon_vcn_dec.h", so that it can be included by other files later. Signed-off-by: Boyuan Zhang Reviewed-by: Leo Liu --- src/gallium/drivers/radeon/radeon_vcn_dec.c | 31 src/gallium/drivers/radeon/radeon_vcn_dec.h | 32 + 2 files changed, 32 insertions(+), 31 deletions(-) diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c b/src/gallium/drivers/radeon/radeon_vcn_dec.c index c2e22048ce..fbfef6d273 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_dec.c +++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c @@ -51,42 +51,11 @@ #define RDECODE_GPCOM_VCPU_DATA1 0x20714 #define RDECODE_ENGINE_CNTL0x20718 -#define NUM_BUFFERS4 #define NUM_MPEG2_REFS 6 #define NUM_H264_REFS 17 #define NUM_VC1_REFS 5 #define NUM_VP9_REFS 8 -struct radeon_decoder { - struct pipe_video_codec base; - - unsignedstream_handle; - unsignedstream_type; - unsignedframe_number; - - struct pipe_screen *screen; - struct radeon_winsys*ws; - struct radeon_cmdbuf*cs; - - void*msg; - uint32_t*fb; - uint8_t *it; - uint8_t *probs; - void*bs_ptr; - - struct rvid_buffer msg_fb_it_probs_buffers[NUM_BUFFERS]; - struct rvid_buffer bs_buffers[NUM_BUFFERS]; - struct rvid_buffer dpb; - struct rvid_buffer ctx; - struct rvid_buffer sessionctx; - - unsignedbs_size; - unsignedcur_buffer; - void*render_pic_list[16]; - boolshow_frame; - unsignedref_idx; -}; - static rvcn_dec_message_avc_t get_h264_msg(struct radeon_decoder *dec, struct pipe_h264_picture_desc *pic) { diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.h b/src/gallium/drivers/radeon/radeon_vcn_dec.h index 7a07ad0637..c6c2a933cc 100644 --- a/src/gallium/drivers/radeon/radeon_vcn_dec.h +++ b/src/gallium/drivers/radeon/radeon_vcn_dec.h @@ -108,6 +108,8 @@ #define RDECODE_SPS_INFO_H264_EXTENSION_SUPPORT_FLAG_SHIFT 7 +#define NUM_BUFFERS4 + #define RDECODE_VP9_PROBS_DATA_SIZE2304 /* VP9 Frame header flags */ @@ -639,6 +641,36 @@ typedef struct rvcn_dec_vp9_probs_segment_s { }; } 
rvcn_dec_vp9_probs_segment_t; +struct radeon_decoder { + struct pipe_video_codec base; + + unsignedstream_handle; + unsignedstream_type; + unsignedframe_number; + + struct pipe_screen *screen; + struct radeon_winsys*ws; + struct radeon_cmdbuf*cs; + + void*msg; + uint32_t*fb; + uint8_t *it; + uint8_t *probs; + void*bs_ptr; + + struct rvid_buffer msg_fb_it_probs_buffers[NUM_BUFFERS]; + struct rvid_buffer bs_buffers[NUM_BUFFERS]; + struct rvid_buffer dpb; + struct rvid_buffer ctx; + struct rvid_buffer sessionctx; + + unsignedbs_size; + unsignedcur_buffer; + void*render_pic_list[16]; + boolshow_frame; + unsignedref_idx; +}; + struct pipe_video_codec *radeon_create_decoder(struct pipe_context *context, const struct pipe_video_codec *templat); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/15] meson: update required amdgpu version to 2.4.95
From: Boyuan Zhang

VCN jpeg requires new hw ip

Signed-off-by: Boyuan Zhang
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 002ce35a60..35e3e934a3 100644
--- a/meson.build
+++ b/meson.build
@@ -1108,7 +1108,7 @@ dep_libdrm_etnaviv = null_dep
 dep_libdrm_freedreno = null_dep
 dep_libdrm_intel = null_dep
 
-_drm_amdgpu_ver = '2.4.93'
+_drm_amdgpu_ver = '2.4.95'
 _drm_radeon_ver = '2.4.71'
 _drm_nouveau_ver = '2.4.66'
 _drm_etnaviv_ver = '2.4.89'
-- 
2.17.1
[Mesa-dev] [PATCH 01/15] configure.ac: update libdrm amdgpu version to 2.4.95
From: Boyuan Zhang

VCN jpeg requires new hw ip

Signed-off-by: Boyuan Zhang
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 520948b051..5fd7d8510d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -74,7 +74,7 @@ AC_SUBST([OPENCL_VERSION])
 # in the first entry.
 LIBDRM_REQUIRED=2.4.75
 LIBDRM_RADEON_REQUIRED=2.4.71
-LIBDRM_AMDGPU_REQUIRED=2.4.93
+LIBDRM_AMDGPU_REQUIRED=2.4.95
 LIBDRM_INTEL_REQUIRED=2.4.75
 LIBDRM_NVVIEUX_REQUIRED=2.4.66
 LIBDRM_NOUVEAU_REQUIRED=2.4.66
-- 
2.17.1
[Mesa-dev] [PATCH 1/1] swr/rast: ignore CreateElementUnorderedAtomicMemCpy
This function's API changed between LLVM 5 and 6. Compile errors occur
when building with LLVM 6+ if LLVM 5 was used for a dist tarball
---
 .../drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
index d34e88d1bc..485403ae1e 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
@@ -161,7 +161,8 @@ def parse_ir_builder(input_file):
                     func_name == 'CreateAlignmentAssumptionHelper' or
                     func_name == 'CreateGEP' or
                     func_name == 'CreateLoad' or
-                    func_name == 'CreateMaskedLoad'):
+                    func_name == 'CreateMaskedLoad' or
+                    func_name == 'CreateElementUnorderedAtomicMemCpy'):
                     ignore = True
 
                 # Convert CamelCase to CAMEL_CASE
-- 
2.17.1
[Mesa-dev] [PATCH 0/1] swr: Fix for LLVM 5 to 6 API change
This is primarily a fix for the stable branch as it is still packaged with
LLVM 5 libs. This fixes a compile error if a user tries to build with
LLVM 6+ from an 18.2.x release tarball

Alok Hota (1):
  swr/rast: ignore CreateElementUnorderedAtomicMemCpy

 .../drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.17.1
[Mesa-dev] [PATCH 3/3] i965/nir: use vectorization for non-scalar stages
From: Connor Abbott

Shader-db results on Haswell:

total instructions in shared programs: 2180337 -> 2154080 (-1.20%)
instructions in affected programs: 959766 -> 933509 (-2.74%)
helped: 5653
HURT: 2560

total cycles in shared programs: 12339326 -> 12307102 (-0.26%)
cycles in affected programs: 6102794 -> 6070570 (-0.53%)
helped: 3838
HURT: 4868

Most of the hurt programs seem to be because we generate extra MOV's due
to vectorizing things. For example, in
shaders/non-free/steam/anomaly-2/158.shader_test, this:

add(8) g116<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.xyyyF { align16 NoDDClr 1Q };
add(8) g117<1>.xyF g12<4,4,1>.xyyyF g1.4<0,4,1>.zwwwF { align16 NoDDClr 1Q };
add(8) g116<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.xxxyF { align16 NoDDChk 1Q };
add(8) g117<1>.zwF g12<4,4,1>.xxxyF -g1.4<0,4,1>.zzzwF { align16 NoDDChk 1Q };

Turns into this:

add(8) g13<1>F g12<4,4,1>.xyxyF g1.4<0,4,1>F { align16 1Q };
add(8) g14<1>F g12<4,4,1>.xyxyF -g1.4<0,4,1>F { align16 1Q };
mov(8) g116<1>.xyD g13<4,4,1>.xyyyD { align16 NoDDClr 1Q };
mov(8) g117<1>.xyD g13<4,4,1>.zwwwD { align16 NoDDClr 1Q };
mov(8) g116<1>.zwD g14<4,4,1>.xxxyD { align16 NoDDChk 1Q };
mov(8) g117<1>.zwD g14<4,4,1>.zzzwD { align16 NoDDChk 1Q };

So we eliminated two add's, but then had to introduce four mov's to
transpose the result. Some of the hurt is because vectorization is a bit
over-aggressive and we vectorize something when we should have left it as
a scalar and CSEd it. Unfortunately, this is all really tricky to do as it
involves the interactions between many different components.
---
 src/intel/compiler/brw_nir.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 297845b89b7..564fd004a94 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -568,6 +568,12 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler *compiler,
       OPT(nir_copy_prop);
       OPT(nir_opt_dce);
       OPT(nir_opt_cse);
+
+      if (!is_scalar) {
+         OPT(nir_opt_vectorize);
+         OPT(nir_copy_prop);
+      }
+
       OPT(nir_opt_peephole_select, 0);
       OPT(nir_opt_intrinsics);
       OPT(nir_opt_algebraic);
-- 
2.19.1
[Mesa-dev] [PATCH 1/3] intel/peephole_ffma: Fix swizzle propagation
The num_components value passed into get_mul_for_src is used to only
compose the parts of the swizzle that we know will be used so we don't
compose invalid swizzle components. However, we had a bug where we passed
the number of components of the add all the way through. For the given
source, we need the number of components read from that source. In the
case where we have a narrow add, say 2 components, that is sourced from a
chain of wider instructions, we may not compose all the swizzles. All we
really need to do is pass through the right number of components at each
level.

Fixes: 2231cf0ba3a "nir: Fix output swizzle in get_mul_for_src"
---
 src/intel/compiler/brw_nir_opt_peephole_ffma.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_nir_opt_peephole_ffma.c b/src/intel/compiler/brw_nir_opt_peephole_ffma.c
index cc225e1847b..7271bdbca43 100644
--- a/src/intel/compiler/brw_nir_opt_peephole_ffma.c
+++ b/src/intel/compiler/brw_nir_opt_peephole_ffma.c
@@ -68,7 +68,7 @@ are_all_uses_fadd(nir_ssa_def *def)
 }
 
 static nir_alu_instr *
-get_mul_for_src(nir_alu_src *src, int num_components,
+get_mul_for_src(nir_alu_src *src, unsigned num_components,
                 uint8_t swizzle[4], bool *negate, bool *abs)
 {
    uint8_t swizzle_tmp[4];
@@ -93,16 +93,19 @@ get_mul_for_src(nir_alu_src *src, int num_components,
    switch (alu->op) {
    case nir_op_imov:
    case nir_op_fmov:
-      alu = get_mul_for_src(&alu->src[0], num_components, swizzle, negate, abs);
+      alu = get_mul_for_src(&alu->src[0], alu->dest.dest.ssa.num_components,
+                            swizzle, negate, abs);
       break;
 
    case nir_op_fneg:
-      alu = get_mul_for_src(&alu->src[0], num_components, swizzle, negate, abs);
+      alu = get_mul_for_src(&alu->src[0], alu->dest.dest.ssa.num_components,
+                            swizzle, negate, abs);
       *negate = !*negate;
       break;
 
    case nir_op_fabs:
-      alu = get_mul_for_src(&alu->src[0], num_components, swizzle, negate, abs);
+      alu = get_mul_for_src(&alu->src[0], alu->dest.dest.ssa.num_components,
+                            swizzle, negate, abs);
       *negate = false;
       *abs = true;
       break;
-- 
2.19.1
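Editorial sketch, not part of the patch: the commit message above describes a narrow add reading through a chain of wider instructions, where composing too few swizzle components at an intermediate level leaves stale entries. The toy below models a 2-component add that reads a 4-component mov, which in turn reads a mul. All names are hypothetical, the data structures are not NIR's, and the composition rule (new[i] = old[src_swizzle[i]] for the first n components) is an assumption about how such a helper combines swizzles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One composition step: for the first n components, look through
 * src_swizzle into the swizzle accumulated so far. Entries at index n
 * and above are left untouched. */
static void toy_compose(uint8_t swizzle[4], const uint8_t src_swizzle[4],
                        unsigned n)
{
    uint8_t tmp[4];

    assert(n <= 4);
    memcpy(tmp, swizzle, sizeof(tmp));
    for (unsigned i = 0; i < n; i++)
        tmp[i] = swizzle[src_swizzle[i]];
    memcpy(swizzle, tmp, sizeof(tmp));
}

/* Walk add -> mov -> mul and report which mul components the add reads.
 * mov_width is the component count used when composing at the mov level:
 * 4 is the mov's own width, 2 reproduces passing the add's width through. */
static void toy_walk(uint8_t out[4], unsigned mov_width)
{
    static const uint8_t mov_src[4] = {3, 2, 1, 0}; /* mov reads mul.wzyx */
    static const uint8_t add_src[4] = {2, 3, 0, 0}; /* add reads mov.zw */
    uint8_t swizzle[4] = {0, 1, 2, 3};              /* identity at the mul */

    toy_compose(swizzle, mov_src, mov_width);
    toy_compose(swizzle, add_src, 2);
    memcpy(out, swizzle, 4);
}
```

In this model the add's first two components should resolve to mul.y and mul.x (indices 1 and 0), because mov.z is mul.y and mov.w is mul.x. Composing with the mov's full width gives exactly that; composing with the add's width of 2 leaves indices 2 and 3 uncomposed and yields 2 and 3 instead.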
[Mesa-dev] [PATCH 2/3] nir: add a vectorization pass
From: Connor Abbott This effectively does the opposite of nir_lower_alus_to_scalar, trying to combine per-component ALU operations with the same sources but different swizzles into one larger ALU operation. It uses a similar model as CSE, where we do a depth-first approach and keep around a hash set of instructions to be combined, but there are a few major differences: 1. For now, we only support entirely per-component ALU operations. 2. Since it's not always guaranteed that we'll be able to combine equivalent instructions, we keep a stack of equivalent instructions around, trying to combine new instructions with instructions on the stack. The pass isn't comprehensive by far; it can't handle operations where some of the sources are per-component and others aren't, and it can't handle phi nodes. But it should handle the more common cases, and it should be reasonably efficient. --- src/compiler/Makefile.sources| 1 + src/compiler/nir/meson.build | 1 + src/compiler/nir/nir.h | 2 + src/compiler/nir/nir_opt_vectorize.c | 454 +++ 4 files changed, 458 insertions(+) create mode 100644 src/compiler/nir/nir_opt_vectorize.c diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources index b65bb9b80b9..e231f4a9ab1 100644 --- a/src/compiler/Makefile.sources +++ b/src/compiler/Makefile.sources @@ -289,6 +289,7 @@ NIR_FILES = \ nir/nir_opt_shrink_load.c \ nir/nir_opt_trivial_continues.c \ nir/nir_opt_undef.c \ + nir/nir_opt_vectorize.c \ nir/nir_phi_builder.c \ nir/nir_phi_builder.h \ nir/nir_print.c \ diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build index d8f65640004..865d11bb278 100644 --- a/src/compiler/nir/meson.build +++ b/src/compiler/nir/meson.build @@ -173,6 +173,7 @@ files_libnir = files( 'nir_opt_shrink_load.c', 'nir_opt_trivial_continues.c', 'nir_opt_undef.c', + 'nir_opt_vectorize.c', 'nir_phi_builder.c', 'nir_phi_builder.h', 'nir_print.c', diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 5b871812d46..f33e2d3b726 
100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -3088,6 +3088,8 @@ bool nir_opt_trivial_continues(nir_shader *shader); bool nir_opt_undef(nir_shader *shader); +bool nir_opt_vectorize(nir_shader *shader); + bool nir_opt_conditional_discard(nir_shader *shader); void nir_sweep(nir_shader *shader); diff --git a/src/compiler/nir/nir_opt_vectorize.c b/src/compiler/nir/nir_opt_vectorize.c new file mode 100644 index 000..7e22726a3ef --- /dev/null +++ b/src/compiler/nir/nir_opt_vectorize.c @@ -0,0 +1,454 @@ +/* + * Copyright © 2015 Connor Abbott + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + */ + +#include "nir.h" +#include "nir_vla.h" +#include "nir_builder.h" +#include "util/u_dynarray.h" + +#define HASH(hash, data) _mesa_fnv32_1a_accumulate((hash), (data)) + +static uint32_t +hash_src(uint32_t hash, const nir_src *src) +{ + assert(src->is_ssa); + + return HASH(hash, src->ssa); +} + +static uint32_t +hash_alu_src(uint32_t hash, const nir_alu_src *src) +{ + assert(!src->abs && !src->negate); + + /* intentionally don't hash swizzle */ + + return hash_src(hash, &src->src); +} + +static uint32_t +hash_alu(uint32_t hash, const nir_alu_instr *instr) +{ + hash = HASH(hash, instr->op); + + hash = HASH(hash, instr->dest.dest.ssa.bit_size); + + for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) + hash = hash_alu_src(hash, &instr->src[i]); + + return hash; +} + +static uint32_t +hash_instr(const nir_instr *instr) +{ + uint32_t hash = _mesa_fnv32_1a_offset_bias; + + switch (instr->type) { + case nir_instr_type_alu: + return hash_alu(hash, nir_instr_as_alu(instr)); + default: + unreachable("bad instruction type"); + } +} + +static bool +srcs_equal(const nir_src *s
Re: [Mesa-dev] [PATCH] freedreno: Fix emacs modeline
Eric Engestrom writes: > That's absolutely fair :) > > I wanted to ack your patch earlier, since fixing it is good regardless, > but freedreno isn't my area so I didn't feel comfortable doing so; > I changed my mind in the mean time though, so here you go :P > Acked-by: Eric Engestrom > > You have push access, right? Yes, I have push access. But actually Rob already pushed my other patch to just remove it in the meantime, so there’s no need to do anything. Thanks anyway. Regards, - Neil
Re: [Mesa-dev] [PATCH] freedreno: Fix emacs modeline
On Wednesday, 2018-10-17 17:25:22 +0200, Neil Roberts wrote: > Eric Engestrom writes: > > > You might want to remove these instead, and use the .editorconfig [1] > > already present at src/gallium/drivers/freedreno/.editorconfig This is > > much easier to maintain than per-files settings ;) > > Either fixing it or removing it is fine by me. I now notice there is a > .dir-locals.el file that should make it work anyway. (apparently I was > the last person to touch it too!) It has a typo which makes it fail to > set indent-tabs-mode though. I can make everything work locally either > way, I just wanted to get rid of the annoying warning whenever you open > a file. That's absolutely fair :) I wanted to ack your patch earlier, since fixing it is good regardless, but freedreno isn't my area so I didn't feel comfortable doing so; I changed my mind in the mean time though, so here you go :P Acked-by: Eric Engestrom You have push access, right? > > - Neil
Re: [Mesa-dev] [PATCH 2/2] radeonsi: clamp point size to the limit
Tested-by: Jakob Bornecrantz On Wed, Oct 17, 2018 at 5:29 PM Marek Olšák wrote: > > From: Marek Olšák > > This fixes dEQP-GLES2.functional.rasterization.limits.points. > Broken by: ea039f789d9b54e1bd1d644b6a29863ca3500314 > --- > src/gallium/drivers/radeonsi/si_get.c | 5 +++-- > src/gallium/drivers/radeonsi/si_pipe.h | 1 + > src/gallium/drivers/radeonsi/si_state.c | 2 +- > 3 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_get.c > b/src/gallium/drivers/radeonsi/si_get.c > index ac302b8a946..804276b3eda 100644 > --- a/src/gallium/drivers/radeonsi/si_get.c > +++ b/src/gallium/drivers/radeonsi/si_get.c > @@ -326,25 +326,26 @@ static int si_get_param(struct pipe_screen *pscreen, > enum pipe_cap param) > default: > return u_pipe_screen_get_param_defaults(pscreen, param); > } > } > > static float si_get_paramf(struct pipe_screen* pscreen, enum pipe_capf param) > { > switch (param) { > case PIPE_CAPF_MAX_LINE_WIDTH: > case PIPE_CAPF_MAX_LINE_WIDTH_AA: > - case PIPE_CAPF_MAX_POINT_WIDTH: > - case PIPE_CAPF_MAX_POINT_WIDTH_AA: > /* This depends on the quant mode, though the precise > interactions > * are unknown. 
*/ > return 2048; > + case PIPE_CAPF_MAX_POINT_WIDTH: > + case PIPE_CAPF_MAX_POINT_WIDTH_AA: > + return SI_MAX_POINT_SIZE; > case PIPE_CAPF_MAX_TEXTURE_ANISOTROPY: > return 16.0f; > case PIPE_CAPF_MAX_TEXTURE_LOD_BIAS: > return 16.0f; > case PIPE_CAPF_MIN_CONSERVATIVE_RASTER_DILATE: > case PIPE_CAPF_MAX_CONSERVATIVE_RASTER_DILATE: > case PIPE_CAPF_CONSERVATIVE_RASTER_DILATE_GRANULARITY: > return 0.0f; > } > return 0.0f; > diff --git a/src/gallium/drivers/radeonsi/si_pipe.h > b/src/gallium/drivers/radeonsi/si_pipe.h > index 6edc06cece7..dc95afb7421 100644 > --- a/src/gallium/drivers/radeonsi/si_pipe.h > +++ b/src/gallium/drivers/radeonsi/si_pipe.h > @@ -41,20 +41,21 @@ > #define ATI_VENDOR_ID 0x1002 > > #define SI_NOT_QUERY 0x > > /* The base vertex and primitive restart can be any number, but we must pick > * one which will mean "unknown" for the purpose of state tracking and > * the number shouldn't be a commonly-used one. */ > #define SI_BASE_VERTEX_UNKNOWN INT_MIN > #define SI_RESTART_INDEX_UNKNOWN INT_MIN > #define SI_NUM_SMOOTH_AA_SAMPLES 8 > +#define SI_MAX_POINT_SIZE 2048 > #define SI_GS_PER_ES 128 > /* Alignment for optimal CP DMA performance. */ > #define SI_CPDMA_ALIGNMENT 32 > > /* Tunables for compute-based clear_buffer and copy_buffer: */ > #define SI_COMPUTE_CLEAR_DW_PER_THREAD 4 > #define SI_COMPUTE_COPY_DW_PER_THREAD 4 > #define SI_COMPUTE_DST_CACHE_POLICYL2_STREAM > > /* Pipeline & streamout query controls. 
*/ > diff --git a/src/gallium/drivers/radeonsi/si_state.c > b/src/gallium/drivers/radeonsi/si_state.c > index 8b2e6e57f45..176ec749148 100644 > --- a/src/gallium/drivers/radeonsi/si_state.c > +++ b/src/gallium/drivers/radeonsi/si_state.c > @@ -891,21 +891,21 @@ static void *si_create_rs_state(struct pipe_context > *ctx, > S_0286D4_PNT_SPRITE_OVRD_Z(V_0286D4_SPI_PNT_SPRITE_SEL_0) | > S_0286D4_PNT_SPRITE_OVRD_W(V_0286D4_SPI_PNT_SPRITE_SEL_1) | > S_0286D4_PNT_SPRITE_TOP_1(state->sprite_coord_mode != > PIPE_SPRITE_COORD_UPPER_LEFT)); > > /* point size 12.4 fixed point */ > tmp = (unsigned)(state->point_size * 8.0); > si_pm4_set_reg(pm4, R_028A00_PA_SU_POINT_SIZE, S_028A00_HEIGHT(tmp) | > S_028A00_WIDTH(tmp)); > > if (state->point_size_per_vertex) { > psize_min = util_get_min_point_size(state); > - psize_max = 8192; > + psize_max = SI_MAX_POINT_SIZE; > } else { > /* Force the point size to be as if the vertex output was > disabled. */ > psize_min = state->point_size; > psize_max = state->point_size; > } > rs->max_point_size = psize_max; > > /* Divide by two, because 0.5 = 1 pixel. */ > si_pm4_set_reg(pm4, R_028A04_PA_SU_POINT_MINMAX, > S_028A04_MIN_SIZE(si_pack_float_12p4(psize_min/2)) | > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
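The point-size arithmetic in the quoted hunk can be made concrete with a small sketch. A 12.4 fixed-point value stores the size in sixteenths, and the register takes the half-size ("0.5 = 1 pixel"), which is why the patch multiplies by 8.0 in one place and divides by two before packing in another. The names `pack_12p4`, `clamp_point_size`, `point_size_to_reg` and `MAX_POINT_SIZE` are hypothetical stand-ins, not the driver's own helpers, and the saturation behaviour at the range ends is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's SI_MAX_POINT_SIZE limit. */
#define MAX_POINT_SIZE 2048.0f

/* Clamp a requested point size to [min_size, MAX_POINT_SIZE]. */
static float clamp_point_size(float size, float min_size)
{
    assert(min_size <= MAX_POINT_SIZE);
    if (size < min_size)
        return min_size;
    if (size > MAX_POINT_SIZE)
        return MAX_POINT_SIZE;
    return size;
}

/* Pack a float into unsigned 12.4 fixed point: 12 integer bits and
 * 4 fractional bits, so one unit is 1/16. Out-of-range inputs
 * saturate (an assumption made for this sketch). */
static uint16_t pack_12p4(float x)
{
    if (x <= 0.0f)
        return 0;
    if (x >= 4096.0f)
        return 0xffff;
    return (uint16_t)(x * 16.0f);
}

/* The register encodes the half-size, so divide by two before packing. */
static uint16_t point_size_to_reg(float size, float min_size)
{
    return pack_12p4(clamp_point_size(size, min_size) / 2.0f);
}
```

With a limit of 2048, a request for 8192 is clamped first and then encodes as 2048 / 2 * 16 = 16384, which still fits the 16-bit field.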
Re: [Mesa-dev] [PATCH 1/2] radeonsi: fix a VGT hang with primitive restart on Polaris10 and later
Tested-by: Jakob Bornecrantz On Wed, Oct 17, 2018 at 5:29 PM Marek Olšák wrote: > > From: Marek Olšák > > Cc: 18.1 18.2 > --- > src/gallium/drivers/radeonsi/si_state_draw.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c > b/src/gallium/drivers/radeonsi/si_state_draw.c > index 83eb646b791..612ca910cb9 100644 > --- a/src/gallium/drivers/radeonsi/si_state_draw.c > +++ b/src/gallium/drivers/radeonsi/si_state_draw.c > @@ -376,21 +376,21 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen, > } > > if (sscreen->info.chip_class >= CIK) { > /* WD_SWITCH_ON_EOP has no effect on GPUs with less than > * 4 shader engines. Set 1 to pass the assertion below. > * The other cases are hardware requirements. > * > * Polaris supports primitive restart with WD_SWITCH_ON_EOP=0 > * for points, line strips, and tri strips. > */ > - if (sscreen->info.max_se < 4 || > + if (sscreen->info.max_se <= 2 || > key->u.prim == PIPE_PRIM_POLYGON || > key->u.prim == PIPE_PRIM_LINE_LOOP || > key->u.prim == PIPE_PRIM_TRIANGLE_FAN || > key->u.prim == PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY || > (key->u.primitive_restart && > (sscreen->info.family < CHIP_POLARIS10 || > (key->u.prim != PIPE_PRIM_POINTS && >key->u.prim != PIPE_PRIM_LINE_STRIP && >key->u.prim != PIPE_PRIM_TRIANGLE_STRIP))) || > key->u.count_from_stream_output) > @@ -407,35 +407,41 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen, > * instances are smaller than a primgroup. > * Assume indirect draws always use small instances. > * This is needed for good VS wave utilization. > */ > if (sscreen->info.chip_class <= VI && > sscreen->info.max_se == 4 && > key->u.multi_instances_smaller_than_primgroup) > wd_switch_on_eop = true; > > /* Required on CIK and later. */ > - if (sscreen->info.max_se > 2 && !wd_switch_on_eop) > + if (sscreen->info.max_se == 4 && !wd_switch_on_eop) > ia_switch_on_eoi = true; > > /* Required by Hawaii and, for some special cases, by VI. 
*/ > if (ia_switch_on_eoi && > (sscreen->info.family == CHIP_HAWAII || > (sscreen->info.chip_class == VI && > (key->u.uses_gs || max_primgroup_in_wave != 2)))) > partial_vs_wave = true; > > /* Instancing bug on Bonaire. */ > if (sscreen->info.family == CHIP_BONAIRE && ia_switch_on_eoi > && > key->u.uses_instancing) > partial_vs_wave = true; > > + /* This only applies to Polaris10 and later 4 SE chips. > +* wd_switch_on_eop is already true on all other chips. > +*/ > + if (!wd_switch_on_eop && key->u.primitive_restart) > + partial_vs_wave = true; > + > /* If the WD switch is false, the IA switch must be false > too. */ > assert(wd_switch_on_eop || !ia_switch_on_eop); > } > > /* If SWITCH_ON_EOI is set, PARTIAL_ES_WAVE must be set too. */ > if (sscreen->info.chip_class <= VI && ia_switch_on_eoi) > partial_es_wave = true; > > return S_028AA8_SWITCH_ON_EOP(ia_switch_on_eop) | > S_028AA8_SWITCH_ON_EOI(ia_switch_on_eoi) | > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Jason Ekstrand writes: > I like it When the comments are longer than the code, you know you're done? -- -keith
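For readers following the thread, the v5 rule for maxDeviation (the largest period of any sampled clock, plus the interval over which the samples were taken) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `div_round_up`, `clock_period_ns` and `max_deviation_ns` are hypothetical names, all times are taken to be in nanoseconds, and the device clock frequency is assumed to be given in kHz.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division, in the spirit of Mesa's DIV_ROUND_UP. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* Period in nanoseconds of a clock whose frequency is given in kHz
 * (assumed unit): 1 kHz corresponds to 1,000,000 ns per tick. */
static uint64_t clock_period_ns(uint64_t freq_khz)
{
    return div_round_up(1000000, freq_khz);
}

/* Maximum deviation = sampling interval + largest period among the
 * clocks that were sampled between begin_ns and end_ns. */
static uint64_t max_deviation_ns(uint64_t begin_ns, uint64_t end_ns,
                                 const uint64_t *periods_ns, unsigned count)
{
    uint64_t max_period = 0;

    assert(end_ns >= begin_ns);
    for (unsigned i = 0; i < count; i++) {
        if (periods_ns[i] > max_period)
            max_period = periods_ns[i];
    }
    return (end_ns - begin_ns) + max_period;
}
```

Rounding the period up rather than down keeps the reported bound conservative: a 27 MHz clock has a period just over 37 ns, and the sketch reports 38.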
Re: [Mesa-dev] [PATCH 2/3] mesa/st: enable EXT_sRGB_write_control for drivers that support it
On Wednesday, 17.10.2018, at 12:56 -0400, Ilia Mirkin wrote: > On Wed, Oct 17, 2018 at 12:39 PM Gert Wollny > wrote: > > > > From: Gert Wollny > > > > With this patch the extension EXT_sRGB_write_control is enabled for > > gallium drivers that support sRGB formats as render targets. > > > > Tested (and pass) on r600(evergreen) and softpipe: > > > > dEQP- > > GLES31.functional.fbo.srgb_write_control.framebuffer_srgb_enabled* > > > > with "MESA_GLES_VERSION_OVERRIDE=3.2" (the tests needlessly check > > for this) > > > > Signed-off-by: Gert Wollny > > --- > > src/mesa/state_tracker/st_manager.c | 17 + > > 1 file changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/src/mesa/state_tracker/st_manager.c > > b/src/mesa/state_tracker/st_manager.c > > index ceb48dd490..562b12a1ef 100644 > > --- a/src/mesa/state_tracker/st_manager.c > > +++ b/src/mesa/state_tracker/st_manager.c > > @@ -457,14 +457,12 @@ st_framebuffer_create(struct st_context *st, > > * format such that util_format_srgb(visual->color_format) can > > be supported > > * by the pipe driver. We still need to advertise the > > capability here. > > * > > -* For GLES, however, sRGB framebuffer write is controlled only > > by the > > -* capability of the framebuffer. There is > > GL_EXT_sRGB_write_control to > > -* give applications the control back, but sRGB write is still > > enabled by > > -* default. To avoid unexpected results, we should not > > advertise the > > -* capability. This could change when we add support for > > -* EGL_KHR_gl_colorspace. > > +* For GLES, however, sRGB framebuffer write is initially only > > controlled > > +* by the capability of the framebuffer, but with > > GL_EXT_sRGB_write_control > > +* control is given back to the applications. Similar to > > desktop GL > > +* support for this extension depends EXT_framebuffer_sRGB. 
> > */ > > - if (_mesa_is_desktop_gl(st->ctx)) { > > + { > >struct pipe_screen *screen = st->pipe->screen; > >const enum pipe_format srgb_format = > > util_format_srgb(stfbi->visual->color_format); > > @@ -475,8 +473,11 @@ st_framebuffer_create(struct st_context *st, > >PIPE_TEXTURE_2D, stfbi- > > >visual->samples, > >stfbi->visual->samples, > >(PIPE_BIND_DISPLAY_TARGET | > > - PIPE_BIND_RENDER_TARGET))) > > + PIPE_BIND_RENDER_TARGET))) > > { > > mode.sRGBCapable = GL_TRUE; > > + /* Exposing this as extension is only needed on GLES */ > > + st->ctx->Extensions.EXT_sRGB_write_control = > > !_mesa_is_desktop_gl(st->ctx); > > Having weird dependencies in extension enables creates a lot of > confusion. I'd just flip it to true. My reasoning here was that this is a GLES-only extension, but I now see that this is actually done via the extension table. Thanks for all the pointers. Gert > > > + } > > } > > > > _mesa_initialize_window_framebuffer(&stfb->Base, &mode); > > -- > > 2.18.1 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Q: to which software renderers should we contribute to help virgl conformance testing
Dear all, we are looking into doing a CI for virglrenderer that also runs a subset of the GLES dEQP, and in order to be able to run this also in gitlab.fd.o we were looking into the available gallium software renderers. Initial tests by just running the dEQP-GLES2 were quite successful in the sense that the execution time is not too long (a full run on the GL and GLES host with llvmpipe takes about 10 min [1]). Now to extend on that work the focus is turning to which software renderer has the most features, the fewest failing tests, and is actively developed. Simply looking at the commit stats it seems that the development of softpipe and llvmpipe is mostly stalled; swr, on the other hand, has seen quite some development, but mostly regarding performance, and given the FAQ [2] the focus is on a very specific application space and not so much on getting more features in. When checking for conformance of virglrenderer we need a host driver that is conformant itself, and we are willing to contribute here, but it seems to make most sense to focus this work on just one driver. To make a sensible choice there are some open questions: Are there plans to get swr and/or llvmpipe to support GLES 3.1, or carry any of the drivers even further, maybe GLES 3.2 and desktop 4.x? Is there any specific interest to fix all failures that occur when running GLES dEQP? In this bug report [3] Roland pointed out that "there is no goal as such to pass dEQP, although patches are welcome", any opinion for the other drivers? (for swr beyond what is written in the FAQ). As pointed out in the FAQ, swr is very Intel specific, are there plans not laid out in the FAQ to support other, non-x86 hardware? 
Many thanks, Gert [1] https://gitlab.freedesktop.org/gerddie/virglrenderer/pipelines [2] https://gallium.readthedocs.io/en/latest/drivers/openswr/faq.html#what-s-the-conformance [3] https://bugs.freedesktop.org/show_bug.cgi?id=94957
Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
I like it Reviewed-by: Jason Ekstrand On Wed, Oct 17, 2018 at 11:49 AM Keith Packard wrote: > Offers three clocks, device, clock monotonic and clock monotonic > raw. Could use some kernel support to reduce the deviation between > clock values. > > v2: > Ensure deviation is at least as big as the GPU time interval. > > v3: > Set device->lost when returning DEVICE_LOST. > Use MAX2 and DIV_ROUND_UP instead of open coding these. > Delete spurious TIMESTAMP in radv version. > > Suggested-by: Jason Ekstrand > Suggested-by: Lionel Landwerlin > > v4: > Add anv_gem_reg_read to anv_gem_stubs.c > > Suggested-by: Jason Ekstrand > > v5: > Adjust maxDeviation computation to max(sampled_clock_period) + > sample_interval. > > Suggested-by: Bas Nieuwenhuizen > Suggested-by: Jason Ekstrand > > Signed-off-by: Keith Packard > --- > src/amd/vulkan/radv_device.c | 119 +++ > src/amd/vulkan/radv_extensions.py | 1 + > src/intel/vulkan/anv_device.c | 127 + > src/intel/vulkan/anv_extensions.py | 1 + > src/intel/vulkan/anv_gem.c | 13 +++ > src/intel/vulkan/anv_gem_stubs.c | 7 ++ > src/intel/vulkan/anv_private.h | 2 + > 7 files changed, 270 insertions(+) > > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c > index 174922780fc..4a705a724ef 100644 > --- a/src/amd/vulkan/radv_device.c > +++ b/src/amd/vulkan/radv_device.c > @@ -4955,3 +4955,122 @@ radv_GetDeviceGroupPeerMemoryFeatures( >VK_PEER_MEMORY_FEATURE_GENERIC_SRC_BIT | >VK_PEER_MEMORY_FEATURE_GENERIC_DST_BIT; > } > + > +static const VkTimeDomainEXT radv_time_domains[] = { > + VK_TIME_DOMAIN_DEVICE_EXT, > + VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT, > + VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT, > +}; > + > +VkResult radv_GetPhysicalDeviceCalibrateableTimeDomainsEXT( > + VkPhysicalDevice physicalDevice, > + uint32_t *pTimeDomainCount, > + VkTimeDomainEXT *pTimeDomains) > +{ > + int d; > + VK_OUTARRAY_MAKE(out, pTimeDomains, pTimeDomainCount); > + > + for (d = 0; d < ARRAY_SIZE(radv_time_domains); d++) { > + 
vk_outarray_append(&out, i) { > + *i = radv_time_domains[d]; > + } > + } > + > + return vk_outarray_status(&out); > +} > + > +static uint64_t > +radv_clock_gettime(clockid_t clock_id) > +{ > + struct timespec current; > + int ret; > + > + ret = clock_gettime(clock_id, &current); > + if (ret < 0 && clock_id == CLOCK_MONOTONIC_RAW) > + ret = clock_gettime(CLOCK_MONOTONIC, &current); > + if (ret < 0) > + return 0; > + > + return (uint64_t) current.tv_sec * 1000000000ULL + current.tv_nsec; > +} > + > +VkResult radv_GetCalibratedTimestampsEXT( > + VkDevice _device, > + uint32_t timestampCount, > + const VkCalibratedTimestampInfoEXT *pTimestampInfos, > + uint64_t *pTimestamps, > + uint64_t *pMaxDeviation) > +{ > + RADV_FROM_HANDLE(radv_device, device, _device); > + uint32_t clock_crystal_freq = > device->physical_device->rad_info.clock_crystal_freq; > + int d; > + uint64_t begin, end; > +uint64_t max_clock_period = 0; > + > + begin = radv_clock_gettime(CLOCK_MONOTONIC_RAW); > + > + for (d = 0; d < timestampCount; d++) { > + switch (pTimestampInfos[d].timeDomain) { > + case VK_TIME_DOMAIN_DEVICE_EXT: > + pTimestamps[d] = > device->ws->query_value(device->ws, > + > RADEON_TIMESTAMP); > +uint64_t device_period = DIV_ROUND_UP(1000000, > clock_crystal_freq); > +max_clock_period = MAX2(max_clock_period, > device_period); > + break; > + case VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT: > + pTimestamps[d] = > radv_clock_gettime(CLOCK_MONOTONIC); > +max_clock_period = MAX2(max_clock_period, 1); > + break; > + > + case VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT: > + pTimestamps[d] = begin; > + break; > + default: > + pTimestamps[d] = 0; > + break; > + } > + } > + > + end = radv_clock_gettime(CLOCK_MONOTONIC_RAW); > + > +/* > + * The maximum deviation is the sum of the interval over which we > + * perform the sampling and the maximum period of any sampled > + * clock. That's because t
Re: [Mesa-dev] [PATCH 1/2] freedreno: Fix the Emacs indentation configuration file
> On Wed, Oct 17, 2018 at 12:45 PM Neil Roberts wrote: > >> I wonder if you have something else in your setup that is setting it? Ilia Mirkin writes: > Perhaps. It's the default, right? It is the default but the toplevel .dir-locals.el sets it to nil. These lower-level files are trying to override it back to the default. > These might have a common source... although, HAH! IT WASN'T ME! > Michel in 8d0a1a6bc05a set it to true, I probably copied, and am so > used to emacs errors that I didn't even notice. Indents worked, so I > was happy. :) > Yes, fixing these all is probably a good move. I don't think there are > a lot of emacs users in mesa. Lucky for them :) Regards, - Neil
Re: [Mesa-dev] [PATCH 1/3] mesa/core: Add support for EXT_sRGB_write_control
On Wed, Oct 17, 2018 at 12:49 PM Ilia Mirkin wrote: > On Wed, Oct 17, 2018 at 12:38 PM Gert Wollny wrote: > > diff --git a/src/mesa/main/extensions_table.h > > b/src/mesa/main/extensions_table.h > > index 09bf923bd0..1185156f23 100644 > > --- a/src/mesa/main/extensions_table.h > > +++ b/src/mesa/main/extensions_table.h > > @@ -265,6 +265,7 @@ EXT(EXT_shader_integer_mix , > > EXT_shader_integer_mix > > EXT(EXT_shader_io_blocks, dummy_true > > , x , x , x , 31, 2014) > > EXT(EXT_shader_samples_identical, EXT_shader_samples_identical > > , GLL, GLC, x , 31, 2015) > > EXT(EXT_shadow_funcs, ARB_shadow > > , GLL, x , x , x , 2002) > > +EXT(EXT_sRGB_write_control , EXT_sRGB_write_control > > , GLL, x , x , 30, 2013) > > I think you want an "x" instead of "GLL" -- it's an ES-only ext. Also > I'd list "ES2" as the minimum. A driver that doesn't expose ES 3.0 or > EXT_sRGB just shouldn't set this enable to true. Oh, and an additional observation, since we don't expose EXT_sRGB at all in mesa, the 30 is warranted here. But when we do, we should drop this to ES2 and then ensure that the relevant drivers don't do anything silly. Cheers, -ilia
Re: [Mesa-dev] [PATCH 2/3] mesa/st: enable EXT_sRGB_write_control for drivers that support it
On Wed, Oct 17, 2018 at 12:39 PM Gert Wollny wrote: > > From: Gert Wollny > > With this patch the extension EXT_sRGB_write_control is enabled for > gallium drivers that support sRGB formats as render targets. > > Tested (and pass) on r600(evergreen) and softpipe: > > dEQP-GLES31.functional.fbo.srgb_write_control.framebuffer_srgb_enabled* > > with "MESA_GLES_VERSION_OVERRIDE=3.2" (the tests needlessly check for this) > > Signed-off-by: Gert Wollny > --- > src/mesa/state_tracker/st_manager.c | 17 + > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/src/mesa/state_tracker/st_manager.c > b/src/mesa/state_tracker/st_manager.c > index ceb48dd490..562b12a1ef 100644 > --- a/src/mesa/state_tracker/st_manager.c > +++ b/src/mesa/state_tracker/st_manager.c > @@ -457,14 +457,12 @@ st_framebuffer_create(struct st_context *st, > * format such that util_format_srgb(visual->color_format) can be > supported > * by the pipe driver. We still need to advertise the capability here. > * > -* For GLES, however, sRGB framebuffer write is controlled only by the > -* capability of the framebuffer. There is GL_EXT_sRGB_write_control to > -* give applications the control back, but sRGB write is still enabled by > -* default. To avoid unexpected results, we should not advertise the > -* capability. This could change when we add support for > -* EGL_KHR_gl_colorspace. > +* For GLES, however, sRGB framebuffer write is initially only controlled > +* by the capability of the framebuffer, but with > GL_EXT_sRGB_write_control > +* control is given back to the applications. Similar to desktop GL > +* support for this extension depends EXT_framebuffer_sRGB. 
> */ > - if (_mesa_is_desktop_gl(st->ctx)) { > + { >struct pipe_screen *screen = st->pipe->screen; >const enum pipe_format srgb_format = > util_format_srgb(stfbi->visual->color_format); > @@ -475,8 +473,11 @@ st_framebuffer_create(struct st_context *st, >PIPE_TEXTURE_2D, > stfbi->visual->samples, >stfbi->visual->samples, >(PIPE_BIND_DISPLAY_TARGET | > - PIPE_BIND_RENDER_TARGET))) > + PIPE_BIND_RENDER_TARGET))) { > mode.sRGBCapable = GL_TRUE; > + /* Exposing this as extension is only needed on GLES */ > + st->ctx->Extensions.EXT_sRGB_write_control = > !_mesa_is_desktop_gl(st->ctx); Having weird dependencies in extension enables creates a lot of confusion. I'd just flip it to true. > + } > } > > _mesa_initialize_window_framebuffer(&stfb->Base, &mode); > -- > 2.18.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] Fix setting indent-tabs-mode in the Emacs .dir-locals.el files
Reviewed-by: Ilia Mirkin On Wed, Oct 17, 2018 at 12:51 PM Neil Roberts wrote: > > Some of the .dir-locals.el had the wrong name for the truthy value so > it wasn’t setting indent-tabs-mode. > --- > src/gallium/drivers/freedreno/.dir-locals.el | 2 +- > src/gallium/drivers/r600/.dir-locals.el | 2 +- > src/gallium/drivers/radeon/.dir-locals.el| 2 +- > src/gallium/drivers/radeonsi/.dir-locals.el | 2 +- > src/mesa/drivers/dri/nouveau/.dir-locals.el | 2 +- > 5 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/src/gallium/drivers/freedreno/.dir-locals.el > b/src/gallium/drivers/freedreno/.dir-locals.el > index aa20d495465..b0e90fcbd53 100644 > --- a/src/gallium/drivers/freedreno/.dir-locals.el > +++ b/src/gallium/drivers/freedreno/.dir-locals.el > @@ -1,5 +1,5 @@ > ((prog-mode > - (indent-tabs-mode . true) > + (indent-tabs-mode . t) >(tab-width . 4) >(c-basic-offset . 4) >(c-file-style . "k&r") > diff --git a/src/gallium/drivers/r600/.dir-locals.el > b/src/gallium/drivers/r600/.dir-locals.el > index 4e35c129e70..15cd68edb0a 100644 > --- a/src/gallium/drivers/r600/.dir-locals.el > +++ b/src/gallium/drivers/r600/.dir-locals.el > @@ -1,5 +1,5 @@ > ((prog-mode > - (indent-tabs-mode . true) > + (indent-tabs-mode . t) >(tab-width . 8) >(c-basic-offset . 8) >(c-file-style . "stroustrup") > diff --git a/src/gallium/drivers/radeon/.dir-locals.el > b/src/gallium/drivers/radeon/.dir-locals.el > index 4e35c129e70..15cd68edb0a 100644 > --- a/src/gallium/drivers/radeon/.dir-locals.el > +++ b/src/gallium/drivers/radeon/.dir-locals.el > @@ -1,5 +1,5 @@ > ((prog-mode > - (indent-tabs-mode . true) > + (indent-tabs-mode . t) >(tab-width . 8) >(c-basic-offset . 8) >(c-file-style . 
"stroustrup") > diff --git a/src/gallium/drivers/radeonsi/.dir-locals.el > b/src/gallium/drivers/radeonsi/.dir-locals.el > index 4e35c129e70..15cd68edb0a 100644 > --- a/src/gallium/drivers/radeonsi/.dir-locals.el > +++ b/src/gallium/drivers/radeonsi/.dir-locals.el > @@ -1,5 +1,5 @@ > ((prog-mode > - (indent-tabs-mode . true) > + (indent-tabs-mode . t) >(tab-width . 8) >(c-basic-offset . 8) >(c-file-style . "stroustrup") > diff --git a/src/mesa/drivers/dri/nouveau/.dir-locals.el > b/src/mesa/drivers/dri/nouveau/.dir-locals.el > index 774f023ae6f..9b3ddf52461 100644 > --- a/src/mesa/drivers/dri/nouveau/.dir-locals.el > +++ b/src/mesa/drivers/dri/nouveau/.dir-locals.el > @@ -1,5 +1,5 @@ > ((prog-mode > - (indent-tabs-mode . true) > + (indent-tabs-mode . t) >(tab-width . 8) >(c-basic-offset . 8) >(c-file-style . "stroustrup") > -- > 2.17.1 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] freedreno: Fix the Emacs indentation configuration file
On Wed, Oct 17, 2018 at 12:45 PM Neil Roberts wrote: > > Ilia Mirkin writes: > > > Are you sure? It works fine for me... I'm not against fixing it to be > > "t", but the current contents definitely worked fine for me. (As I > > recall, I may be the one who checked this file in.) > > Yes, I’m sure. If you type “true” and then do C-x C-e to evaluate it > then Emacs gives a void-variable error. If I leave it as “true” in the > file then it does indeed indent without tabs. Also if I do C-h v it says > the value is nil, whereas if I change the .dir-local.el to “t” then the > indentation works properly and the variable help says its value comes > from the .dir-locals.el. I wonder if you have something else in your > setup that is setting it? Perhaps. It's the default, right? > > I notice that there are some other files with the same problem. It might > be worth fixing them all in one patch. > > $ git grep 'indent-tabs-mode *\. *true' > src/gallium/drivers/freedreno/.dir-locals.el: (indent-tabs-mode . true) > src/gallium/drivers/r600/.dir-locals.el: (indent-tabs-mode . true) > src/gallium/drivers/radeon/.dir-locals.el: (indent-tabs-mode . true) > src/gallium/drivers/radeonsi/.dir-locals.el: (indent-tabs-mode . true) > src/mesa/drivers/dri/nouveau/.dir-locals.el: (indent-tabs-mode . true) These might have a common source... although, HAH! IT WASN'T ME! Michel in 8d0a1a6bc05a set it to true, I probably copied, and am so used to emacs errors that I didn't even notice. Indents worked, so I was happy. Yes, fixing these all is probably a good move. I don't think there are a lot of emacs users in mesa. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] Fix setting indent-tabs-mode in the Emacs .dir-locals.el files
Some of the .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode. --- src/gallium/drivers/freedreno/.dir-locals.el | 2 +- src/gallium/drivers/r600/.dir-locals.el | 2 +- src/gallium/drivers/radeon/.dir-locals.el| 2 +- src/gallium/drivers/radeonsi/.dir-locals.el | 2 +- src/mesa/drivers/dri/nouveau/.dir-locals.el | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/freedreno/.dir-locals.el b/src/gallium/drivers/freedreno/.dir-locals.el index aa20d495465..b0e90fcbd53 100644 --- a/src/gallium/drivers/freedreno/.dir-locals.el +++ b/src/gallium/drivers/freedreno/.dir-locals.el @@ -1,5 +1,5 @@ ((prog-mode - (indent-tabs-mode . true) + (indent-tabs-mode . t) (tab-width . 4) (c-basic-offset . 4) (c-file-style . "k&r") diff --git a/src/gallium/drivers/r600/.dir-locals.el b/src/gallium/drivers/r600/.dir-locals.el index 4e35c129e70..15cd68edb0a 100644 --- a/src/gallium/drivers/r600/.dir-locals.el +++ b/src/gallium/drivers/r600/.dir-locals.el @@ -1,5 +1,5 @@ ((prog-mode - (indent-tabs-mode . true) + (indent-tabs-mode . t) (tab-width . 8) (c-basic-offset . 8) (c-file-style . "stroustrup") diff --git a/src/gallium/drivers/radeon/.dir-locals.el b/src/gallium/drivers/radeon/.dir-locals.el index 4e35c129e70..15cd68edb0a 100644 --- a/src/gallium/drivers/radeon/.dir-locals.el +++ b/src/gallium/drivers/radeon/.dir-locals.el @@ -1,5 +1,5 @@ ((prog-mode - (indent-tabs-mode . true) + (indent-tabs-mode . t) (tab-width . 8) (c-basic-offset . 8) (c-file-style . "stroustrup") diff --git a/src/gallium/drivers/radeonsi/.dir-locals.el b/src/gallium/drivers/radeonsi/.dir-locals.el index 4e35c129e70..15cd68edb0a 100644 --- a/src/gallium/drivers/radeonsi/.dir-locals.el +++ b/src/gallium/drivers/radeonsi/.dir-locals.el @@ -1,5 +1,5 @@ ((prog-mode - (indent-tabs-mode . true) + (indent-tabs-mode . t) (tab-width . 8) (c-basic-offset . 8) (c-file-style . 
"stroustrup") diff --git a/src/mesa/drivers/dri/nouveau/.dir-locals.el b/src/mesa/drivers/dri/nouveau/.dir-locals.el index 774f023ae6f..9b3ddf52461 100644 --- a/src/mesa/drivers/dri/nouveau/.dir-locals.el +++ b/src/mesa/drivers/dri/nouveau/.dir-locals.el @@ -1,5 +1,5 @@ ((prog-mode - (indent-tabs-mode . true) + (indent-tabs-mode . t) (tab-width . 8) (c-basic-offset . 8) (c-file-style . "stroustrup") -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] mesa/core: Add support for EXT_sRGB_write_control
On Wed, Oct 17, 2018 at 12:38 PM Gert Wollny wrote: > > From: Gert Wollny > > This GLES extension gives the applications the control over deciding whether > the conversion from linear space to sRGB is necessary by enabling or > disabling this conversion at framebuffer write or blending time just > like it is possible for desktop GL. > > Signed-off-by: Gert Wollny > --- > src/mesa/main/enable.c | 4 ++-- > src/mesa/main/extensions_table.h | 1 + > src/mesa/main/get.c | 6 ++ > src/mesa/main/get_hash_params.py | 1 + > src/mesa/main/mtypes.h | 1 + > 5 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c > index bd3e493da5..06c5a0eb68 100644 > --- a/src/mesa/main/enable.c > +++ b/src/mesa/main/enable.c > @@ -1125,7 +1125,7 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, > GLboolean state) > >/* GL3.0 - GL_framebuffer_sRGB */ >case GL_FRAMEBUFFER_SRGB_EXT: > - if (!_mesa_is_desktop_gl(ctx)) > + if (!_mesa_is_desktop_gl(ctx) && > !ctx->Extensions.EXT_sRGB_write_control) > goto invalid_enum_error; > CHECK_EXTENSION(EXT_framebuffer_sRGB, cap); > _mesa_set_framebuffer_srgb(ctx, state); > @@ -1765,7 +1765,7 @@ _mesa_IsEnabled( GLenum cap ) > >/* GL3.0 - GL_framebuffer_sRGB */ >case GL_FRAMEBUFFER_SRGB_EXT: > - if (!_mesa_is_desktop_gl(ctx)) > + if (!_mesa_is_desktop_gl(ctx) && > !ctx->Extensions.EXT_sRGB_write_control) > goto invalid_enum_error; > CHECK_EXTENSION(EXT_framebuffer_sRGB); > return ctx->Color.sRGBEnabled; > diff --git a/src/mesa/main/extensions_table.h > b/src/mesa/main/extensions_table.h > index 09bf923bd0..1185156f23 100644 > --- a/src/mesa/main/extensions_table.h > +++ b/src/mesa/main/extensions_table.h > @@ -265,6 +265,7 @@ EXT(EXT_shader_integer_mix , > EXT_shader_integer_mix > EXT(EXT_shader_io_blocks, dummy_true > , x , x , x , 31, 2014) > EXT(EXT_shader_samples_identical, EXT_shader_samples_identical > , GLL, GLC, x , 31, 2015) > EXT(EXT_shadow_funcs, ARB_shadow > , GLL, x , x , x , 2002) > 
+EXT(EXT_sRGB_write_control , EXT_sRGB_write_control > , GLL, x , x , 30, 2013)

I think you want an "x" instead of "GLL" -- it's an ES-only ext. Also I'd list "ES2" as the minimum. A driver that doesn't expose ES 3.0 or EXT_sRGB just shouldn't set this enable to true.

> EXT(EXT_stencil_two_side, EXT_stencil_two_side > , GLL, x , x , x , 2001) > EXT(EXT_stencil_wrap, dummy_true > , GLL, x , x , x , 2002) > EXT(EXT_subtexture , dummy_true > , GLL, x , x , x , 1995) > diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c > index 1b1679e8bf..fd9d3885f5 100644 > --- a/src/mesa/main/get.c > +++ b/src/mesa/main/get.c > @@ -394,6 +394,12 @@ static const int extra_ARB_compute_shader_es31[] = { > EXTRA_END > }; > > +static const int extra_EXT_sRGB_write_control_es30[] = { > + EXT(EXT_sRGB_write_control), > + EXTRA_API_ES3, > + EXTRA_END > +};

These get OR'd, I believe, which is not what you want. Just leave the EXT() in, leave the EXTRA_API out.

> + > static const int extra_ARB_shader_storage_buffer_object_es31[] = { > EXT(ARB_shader_storage_buffer_object), > EXTRA_API_ES31, > diff --git a/src/mesa/main/get_hash_params.py > b/src/mesa/main/get_hash_params.py > index 1840db6ebb..822fab8151 100644 > --- a/src/mesa/main/get_hash_params.py > +++ b/src/mesa/main/get_hash_params.py > @@ -262,6 +262,7 @@ descriptor=[ > # Enums in GLES2, GLES3 > { "apis": ["GLES2", "GLES3"], "params": [ >[ "GPU_DISJOINT_EXT", "LOC_CUSTOM, TYPE_INT, 0, > extra_EXT_disjoint_timer_query" ], > + [ "FRAMEBUFFER_SRGB_EXT", "CONTEXT_BOOL(Color.sRGBEnabled), > extra_EXT_sRGB_write_control_es30" ], > ]}, > > { "apis": ["GL", "GL_CORE", "GLES2"], "params": [ > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index 9ed49b7ff2..31cf62fdb6 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -4253,6 +4253,7 @@ struct gl_extensions > GLboolean EXT_semaphore_fd; > GLboolean EXT_shader_integer_mix; > GLboolean EXT_shader_samples_identical; > + GLboolean EXT_sRGB_write_control; >
GLboolean EXT_stencil_two_side; > GLboolean EXT_texture_array; > GLboolean EXT_texture_compression_latc; > -- > 2.18.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedes
[Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Offers three clocks, device, clock monotonic and clock monotonic raw. Could use some kernel support to reduce the deviation between clock values. v2: Ensure deviation is at least as big as the GPU time interval. v3: Set device->lost when returning DEVICE_LOST. Use MAX2 and DIV_ROUND_UP instead of open coding these. Delete spurious TIMESTAMP in radv version. Suggested-by: Jason Ekstrand Suggested-by: Lionel Landwerlin v4: Add anv_gem_reg_read to anv_gem_stubs.c Suggested-by: Jason Ekstrand v5: Adjust maxDeviation computation to max(sampled_clock_period) + sample_interval. Suggested-by: Bas Nieuwenhuizen Suggested-by: Jason Ekstrand Signed-off-by: Keith Packard --- src/amd/vulkan/radv_device.c | 119 +++ src/amd/vulkan/radv_extensions.py | 1 + src/intel/vulkan/anv_device.c | 127 + src/intel/vulkan/anv_extensions.py | 1 + src/intel/vulkan/anv_gem.c | 13 +++ src/intel/vulkan/anv_gem_stubs.c | 7 ++ src/intel/vulkan/anv_private.h | 2 + 7 files changed, 270 insertions(+) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 174922780fc..4a705a724ef 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -4955,3 +4955,122 @@ radv_GetDeviceGroupPeerMemoryFeatures( VK_PEER_MEMORY_FEATURE_GENERIC_SRC_BIT | VK_PEER_MEMORY_FEATURE_GENERIC_DST_BIT; } + +static const VkTimeDomainEXT radv_time_domains[] = { + VK_TIME_DOMAIN_DEVICE_EXT, + VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT, + VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT, +}; + +VkResult radv_GetPhysicalDeviceCalibrateableTimeDomainsEXT( + VkPhysicalDevice physicalDevice, + uint32_t *pTimeDomainCount, + VkTimeDomainEXT *pTimeDomains) +{ + int d; + VK_OUTARRAY_MAKE(out, pTimeDomains, pTimeDomainCount); + + for (d = 0; d < ARRAY_SIZE(radv_time_domains); d++) { + vk_outarray_append(&out, i) { + *i = radv_time_domains[d]; + } + } + + return vk_outarray_status(&out); +} + +static uint64_t +radv_clock_gettime(clockid_t clock_id) +{ + struct timespec current; + int ret; + + ret = 
clock_gettime(clock_id, &current); + if (ret < 0 && clock_id == CLOCK_MONOTONIC_RAW) + ret = clock_gettime(CLOCK_MONOTONIC, &current); + if (ret < 0) + return 0; + + return (uint64_t) current.tv_sec * 1000000000ULL + current.tv_nsec; +} + +VkResult radv_GetCalibratedTimestampsEXT( + VkDevice _device, + uint32_t timestampCount, + const VkCalibratedTimestampInfoEXT *pTimestampInfos, + uint64_t *pTimestamps, + uint64_t *pMaxDeviation) +{ + RADV_FROM_HANDLE(radv_device, device, _device); + uint32_t clock_crystal_freq = device->physical_device->rad_info.clock_crystal_freq; + int d; + uint64_t begin, end; +uint64_t max_clock_period = 0; + + begin = radv_clock_gettime(CLOCK_MONOTONIC_RAW); + + for (d = 0; d < timestampCount; d++) { + switch (pTimestampInfos[d].timeDomain) { + case VK_TIME_DOMAIN_DEVICE_EXT: + pTimestamps[d] = device->ws->query_value(device->ws, + RADEON_TIMESTAMP); +uint64_t device_period = DIV_ROUND_UP(1000000, clock_crystal_freq); +max_clock_period = MAX2(max_clock_period, device_period); + break; + case VK_TIME_DOMAIN_CLOCK_MONOTONIC_EXT: + pTimestamps[d] = radv_clock_gettime(CLOCK_MONOTONIC); +max_clock_period = MAX2(max_clock_period, 1); + break; + + case VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT: + pTimestamps[d] = begin; + break; + default: + pTimestamps[d] = 0; + break; + } + } + + end = radv_clock_gettime(CLOCK_MONOTONIC_RAW); + +/* + * The maximum deviation is the sum of the interval over which we + * perform the sampling and the maximum period of any sampled + * clock. That's because the maximum skew between any two sampled + * clock edges is when the sampled clock with the largest period is + * sampled at the end of that period but right at the beginning of the + * sampling interval and some other clock is sampled right at the + * begin
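For readers following the v5 note (maxDeviation = max(sampled_clock_period) + sample_interval), here is a minimal standalone sketch of that arithmetic. It is not the radv or anv code: clock_ns and max_deviation_ns are hypothetical helper names made up for this illustration, and the periods passed in are whatever the caller believes each sampled clock's tick period to be, in nanoseconds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a POSIX clock in nanoseconds, 0 on failure. */
static uint64_t clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) < 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: deviation bound for one sampling pass.
 * begin_ns and end_ns bracket the pass; periods_ns holds the tick
 * period of every clock that was sampled between them. */
static uint64_t max_deviation_ns(uint64_t begin_ns, uint64_t end_ns,
                                 const uint64_t *periods_ns, unsigned count)
{
    uint64_t max_period = 0;

    for (unsigned i = 0; i < count; i++) {
        if (periods_ns[i] > max_period)
            max_period = periods_ns[i];
    }
    /* sampling interval plus the coarsest sampled clock period */
    return (end_ns - begin_ns) + max_period;
}
```

A caller would read clock_ns() once before and once after sampling all requested domains, then pass those two values together with the per-clock periods. The bound grows with both the time spent sampling and the coarsest clock involved.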
Re: [Mesa-dev] [PATCH 1/2] freedreno: Fix the Emacs indentation configuration file
Ilia Mirkin writes:

> Are you sure? It works fine for me... I'm not against fixing it to be
> "t", but the current contents definitely worked fine for me. (As I
> recall, I may be the one who checked this file in.)

Yes, I’m sure. If you type “true” and then do C-x C-e to evaluate it then Emacs gives a void-variable error. If I leave it as “true” in the file then it does indeed indent without tabs. Also if I do C-h v it says the value is nil, whereas if I change the .dir-locals.el to “t” then the indentation works properly and the variable help says its value comes from the .dir-locals.el.

I wonder if you have something else in your setup that is setting it?

I notice that there are some other files with the same problem. It might be worth fixing them all in one patch.

$ git grep 'indent-tabs-mode *\. *true'
src/gallium/drivers/freedreno/.dir-locals.el: (indent-tabs-mode . true)
src/gallium/drivers/r600/.dir-locals.el: (indent-tabs-mode . true)
src/gallium/drivers/radeon/.dir-locals.el: (indent-tabs-mode . true)
src/gallium/drivers/radeonsi/.dir-locals.el: (indent-tabs-mode . true)
src/mesa/drivers/dri/nouveau/.dir-locals.el: (indent-tabs-mode . true)

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] radeonsi: use compute shaders for clear_buffer & copy_buffer
Can you test the attached patch? Marek On Wed, Oct 17, 2018 at 9:31 AM Michel Dänzer wrote: > On 2018-10-07 9:05 a.m., Marek Olšák wrote: > > From: Marek Olšák > > > > Fast color clears should be much faster. Also, fast color clears on > > evicted buffers should be 200x faster on GFX8 and older. > > Nice! Unfortunately, this broke clover with radeonsi. Everything using > OpenCL seems to hang, see e.g. the attached backtraces from clinfo. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer > From f0978b2afae808edf4ac281b14cd371305a5164b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= Date: Wed, 17 Oct 2018 12:41:38 -0400 Subject: [PATCH] radeonsi: fix a deadlock due to partially-initialized context on CI --- src/gallium/drivers/radeonsi/si_pipe.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 59e41c53300..06740bd0f5c 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -575,12 +575,6 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, &sctx->null_const_buf); si_set_rw_buffer(sctx, SI_PS_CONST_SAMPLE_POSITIONS, &sctx->null_const_buf); - - /* Clear the NULL constant buffer, because loads should return zeros. */ - uint32_t clear_value = 0; - si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0, -sctx->null_const_buf.buffer->width0, -&clear_value, 4, SI_COHERENCY_SHADER); } uint64_t max_threads_per_block; @@ -625,6 +619,14 @@ static struct pipe_context *si_create_context(struct pipe_screen *screen, /* this must be last */ si_begin_new_gfx_cs(sctx); + + if (sctx->chip_class == CIK) { + /* Clear the NULL constant buffer, because loads should return zeros. 
*/ + uint32_t clear_value = 0; + si_clear_buffer(sctx, sctx->null_const_buf.buffer, 0, +sctx->null_const_buf.buffer->width0, +&clear_value, 4, SI_COHERENCY_SHADER); + } return &sctx->b; fail: fprintf(stderr, "radeonsi: Failed to create a context.\n"); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] mesa/core: Add support for EXT_sRGB_write_control
From: Gert Wollny This GLES extension gives the applications the control over deciding whether the conversion from linear space to sRGB is necessary by enabling or disabling this conversion at framebuffer write or blending time just like it is possible for desktop GL. Signed-off-by: Gert Wollny --- src/mesa/main/enable.c | 4 ++-- src/mesa/main/extensions_table.h | 1 + src/mesa/main/get.c | 6 ++ src/mesa/main/get_hash_params.py | 1 + src/mesa/main/mtypes.h | 1 + 5 files changed, 11 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c index bd3e493da5..06c5a0eb68 100644 --- a/src/mesa/main/enable.c +++ b/src/mesa/main/enable.c @@ -1125,7 +1125,7 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, GLboolean state) /* GL3.0 - GL_framebuffer_sRGB */ case GL_FRAMEBUFFER_SRGB_EXT: - if (!_mesa_is_desktop_gl(ctx)) + if (!_mesa_is_desktop_gl(ctx) && !ctx->Extensions.EXT_sRGB_write_control) goto invalid_enum_error; CHECK_EXTENSION(EXT_framebuffer_sRGB, cap); _mesa_set_framebuffer_srgb(ctx, state); @@ -1765,7 +1765,7 @@ _mesa_IsEnabled( GLenum cap ) /* GL3.0 - GL_framebuffer_sRGB */ case GL_FRAMEBUFFER_SRGB_EXT: - if (!_mesa_is_desktop_gl(ctx)) + if (!_mesa_is_desktop_gl(ctx) && !ctx->Extensions.EXT_sRGB_write_control) goto invalid_enum_error; CHECK_EXTENSION(EXT_framebuffer_sRGB); return ctx->Color.sRGBEnabled; diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h index 09bf923bd0..1185156f23 100644 --- a/src/mesa/main/extensions_table.h +++ b/src/mesa/main/extensions_table.h @@ -265,6 +265,7 @@ EXT(EXT_shader_integer_mix , EXT_shader_integer_mix EXT(EXT_shader_io_blocks, dummy_true , x , x , x , 31, 2014) EXT(EXT_shader_samples_identical, EXT_shader_samples_identical , GLL, GLC, x , 31, 2015) EXT(EXT_shadow_funcs, ARB_shadow , GLL, x , x , x , 2002) +EXT(EXT_sRGB_write_control , EXT_sRGB_write_control , GLL, x , x , 30, 2013) EXT(EXT_stencil_two_side, EXT_stencil_two_side , GLL, x , x , x , 2001) 
EXT(EXT_stencil_wrap, dummy_true , GLL, x , x , x , 2002) EXT(EXT_subtexture , dummy_true , GLL, x , x , x , 1995) diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 1b1679e8bf..fd9d3885f5 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -394,6 +394,12 @@ static const int extra_ARB_compute_shader_es31[] = { EXTRA_END }; +static const int extra_EXT_sRGB_write_control_es30[] = { + EXT(EXT_sRGB_write_control), + EXTRA_API_ES3, + EXTRA_END +}; + static const int extra_ARB_shader_storage_buffer_object_es31[] = { EXT(ARB_shader_storage_buffer_object), EXTRA_API_ES31, diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py index 1840db6ebb..822fab8151 100644 --- a/src/mesa/main/get_hash_params.py +++ b/src/mesa/main/get_hash_params.py @@ -262,6 +262,7 @@ descriptor=[ # Enums in GLES2, GLES3 { "apis": ["GLES2", "GLES3"], "params": [ [ "GPU_DISJOINT_EXT", "LOC_CUSTOM, TYPE_INT, 0, extra_EXT_disjoint_timer_query" ], + [ "FRAMEBUFFER_SRGB_EXT", "CONTEXT_BOOL(Color.sRGBEnabled), extra_EXT_sRGB_write_control_es30" ], ]}, { "apis": ["GL", "GL_CORE", "GLES2"], "params": [ diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 9ed49b7ff2..31cf62fdb6 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -4253,6 +4253,7 @@ struct gl_extensions GLboolean EXT_semaphore_fd; GLboolean EXT_shader_integer_mix; GLboolean EXT_shader_samples_identical; + GLboolean EXT_sRGB_write_control; GLboolean EXT_stencil_two_side; GLboolean EXT_texture_array; GLboolean EXT_texture_compression_latc; -- 2.18.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] intel/i965: Enable extension EXT_sRGB_write_control
From: Gert Wollny Enables and passes on i965: dEQP-GLES31.functional.fbo.srgb_write_control.framebuffer_srgb_enabled* Signed-off-by: Gert Wollny --- src/mesa/drivers/dri/i965/intel_extensions.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index d7e02efb54..ca921de8e8 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -76,6 +76,7 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.ARB_shading_language_packing = true; ctx->Extensions.ARB_shadow = true; ctx->Extensions.ARB_sync = true; + ctx->Extensions.EXT_sRGB_write_control = true; ctx->Extensions.ARB_texture_border_clamp = true; ctx->Extensions.ARB_texture_compression_rgtc = true; ctx->Extensions.ARB_texture_cube_map = true; -- 2.18.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] mesa/st: enable EXT_sRGB_write_control for drivers that support it
From: Gert Wollny With this patch the extension EXT_sRGB_write_control is enabled for gallium drivers that support sRGB formats as render targets. Tested (and pass) on r600(evergreen) and softpipe: dEQP-GLES31.functional.fbo.srgb_write_control.framebuffer_srgb_enabled* with "MESA_GLES_VERSION_OVERRIDE=3.2" (the tests needlessly check for this) Signed-off-by: Gert Wollny --- src/mesa/state_tracker/st_manager.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/src/mesa/state_tracker/st_manager.c b/src/mesa/state_tracker/st_manager.c index ceb48dd490..562b12a1ef 100644 --- a/src/mesa/state_tracker/st_manager.c +++ b/src/mesa/state_tracker/st_manager.c @@ -457,14 +457,12 @@ st_framebuffer_create(struct st_context *st, * format such that util_format_srgb(visual->color_format) can be supported * by the pipe driver. We still need to advertise the capability here. * -* For GLES, however, sRGB framebuffer write is controlled only by the -* capability of the framebuffer. There is GL_EXT_sRGB_write_control to -* give applications the control back, but sRGB write is still enabled by -* default. To avoid unexpected results, we should not advertise the -* capability. This could change when we add support for -* EGL_KHR_gl_colorspace. +* For GLES, however, sRGB framebuffer write is initially only controlled +* by the capability of the framebuffer, but with GL_EXT_sRGB_write_control +* control is given back to the applications. Similar to desktop GL +* support for this extension depends EXT_framebuffer_sRGB. 
*/ - if (_mesa_is_desktop_gl(st->ctx)) { + { struct pipe_screen *screen = st->pipe->screen; const enum pipe_format srgb_format = util_format_srgb(stfbi->visual->color_format); @@ -475,8 +473,11 @@ st_framebuffer_create(struct st_context *st, PIPE_TEXTURE_2D, stfbi->visual->samples, stfbi->visual->samples, (PIPE_BIND_DISPLAY_TARGET | - PIPE_BIND_RENDER_TARGET))) + PIPE_BIND_RENDER_TARGET))) { mode.sRGBCapable = GL_TRUE; + /* Exposing this as extension is only needed on GLES */ + st->ctx->Extensions.EXT_sRGB_write_control = !_mesa_is_desktop_gl(st->ctx); + } } _mesa_initialize_window_framebuffer(&stfb->Base, &mode); -- 2.18.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] Add and enable extension EXT_sRGB_write_control
From: Gert Wollny

Dear all,

this series adds the basic plumbing for EXT_sRGB_write_control and enables it for some drivers. Since this is the first time I have added an extension, I'd ask reviewers to take a particularly close look at the first patch.

One thing I have left out for that reason is enabling this extension for GLES 2.0 + EXT_sRGB, because I was not sure how to deal with the different dependencies in the tables in src/mesa/main/get_hash_params.py and src/mesa/main/extensions_table.h. If someone can point me in the right direction there, I'll happily add this.

Many thanks for any review,
Gert

Gert Wollny (3):
  mesa/core: Add support for EXT_sRGB_write_control
  mesa/st: enable EXT_sRGB_write_control for drivers that support it
  i965: Enable extension EXT_sRGB_write_control

 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 src/mesa/main/enable.c                       | 4 ++--
 src/mesa/main/extensions_table.h             | 1 +
 src/mesa/main/get.c                          | 6 ++
 src/mesa/main/get_hash_params.py             | 1 +
 src/mesa/main/mtypes.h                       | 1 +
 src/mesa/state_tracker/st_manager.c          | 17 +
 7 files changed, 21 insertions(+), 10 deletions(-)

--
2.18.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] freedreno: Fix the Emacs indentation configuration file
Are you sure? It works fine for me... I'm not against fixing it to be "t", but the current contents definitely worked fine for me. (As I recall, I may be the one who checked this file in.) On Wed, Oct 17, 2018 at 11:38 AM Neil Roberts wrote: > > The .dir-locals.el had the wrong name for the truthy value so it > wasn’t setting indent-tabs-mode. > --- > src/gallium/drivers/freedreno/.dir-locals.el | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/freedreno/.dir-locals.el > b/src/gallium/drivers/freedreno/.dir-locals.el > index aa20d495465..b0e90fcbd53 100644 > --- a/src/gallium/drivers/freedreno/.dir-locals.el > +++ b/src/gallium/drivers/freedreno/.dir-locals.el > @@ -1,5 +1,5 @@ > ((prog-mode > - (indent-tabs-mode . true) > + (indent-tabs-mode . t) >(tab-width . 4) >(c-basic-offset . 4) >(c-file-style . "k&r") > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radeonsi: fix a VGT hang with primitive restart on Polaris10 and later
From: Marek Olšák Cc: 18.1 18.2 --- src/gallium/drivers/radeonsi/si_state_draw.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index 83eb646b791..612ca910cb9 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -376,21 +376,21 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen, } if (sscreen->info.chip_class >= CIK) { /* WD_SWITCH_ON_EOP has no effect on GPUs with less than * 4 shader engines. Set 1 to pass the assertion below. * The other cases are hardware requirements. * * Polaris supports primitive restart with WD_SWITCH_ON_EOP=0 * for points, line strips, and tri strips. */ - if (sscreen->info.max_se < 4 || + if (sscreen->info.max_se <= 2 || key->u.prim == PIPE_PRIM_POLYGON || key->u.prim == PIPE_PRIM_LINE_LOOP || key->u.prim == PIPE_PRIM_TRIANGLE_FAN || key->u.prim == PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY || (key->u.primitive_restart && (sscreen->info.family < CHIP_POLARIS10 || (key->u.prim != PIPE_PRIM_POINTS && key->u.prim != PIPE_PRIM_LINE_STRIP && key->u.prim != PIPE_PRIM_TRIANGLE_STRIP))) || key->u.count_from_stream_output) @@ -407,35 +407,41 @@ si_get_init_multi_vgt_param(struct si_screen *sscreen, * instances are smaller than a primgroup. * Assume indirect draws always use small instances. * This is needed for good VS wave utilization. */ if (sscreen->info.chip_class <= VI && sscreen->info.max_se == 4 && key->u.multi_instances_smaller_than_primgroup) wd_switch_on_eop = true; /* Required on CIK and later. */ - if (sscreen->info.max_se > 2 && !wd_switch_on_eop) + if (sscreen->info.max_se == 4 && !wd_switch_on_eop) ia_switch_on_eoi = true; /* Required by Hawaii and, for some special cases, by VI. 
*/ if (ia_switch_on_eoi && (sscreen->info.family == CHIP_HAWAII || (sscreen->info.chip_class == VI && (key->u.uses_gs || max_primgroup_in_wave != 2)))) partial_vs_wave = true; /* Instancing bug on Bonaire. */ if (sscreen->info.family == CHIP_BONAIRE && ia_switch_on_eoi && key->u.uses_instancing) partial_vs_wave = true; + /* This only applies to Polaris10 and later 4 SE chips. +* wd_switch_on_eop is already true on all other chips. +*/ + if (!wd_switch_on_eop && key->u.primitive_restart) + partial_vs_wave = true; + /* If the WD switch is false, the IA switch must be false too. */ assert(wd_switch_on_eop || !ia_switch_on_eop); } /* If SWITCH_ON_EOI is set, PARTIAL_ES_WAVE must be set too. */ if (sscreen->info.chip_class <= VI && ia_switch_on_eoi) partial_es_wave = true; return S_028AA8_SWITCH_ON_EOP(ia_switch_on_eop) | S_028AA8_SWITCH_ON_EOI(ia_switch_on_eoi) | -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radeonsi: clamp point size to the limit
From: Marek Olšák This fixes dEQP-GLES2.functional.rasterization.limits.points. Broken by: ea039f789d9b54e1bd1d644b6a29863ca3500314 --- src/gallium/drivers/radeonsi/si_get.c | 5 +++-- src/gallium/drivers/radeonsi/si_pipe.h | 1 + src/gallium/drivers/radeonsi/si_state.c | 2 +- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_get.c b/src/gallium/drivers/radeonsi/si_get.c index ac302b8a946..804276b3eda 100644 --- a/src/gallium/drivers/radeonsi/si_get.c +++ b/src/gallium/drivers/radeonsi/si_get.c @@ -326,25 +326,26 @@ static int si_get_param(struct pipe_screen *pscreen, enum pipe_cap param) default: return u_pipe_screen_get_param_defaults(pscreen, param); } } static float si_get_paramf(struct pipe_screen* pscreen, enum pipe_capf param) { switch (param) { case PIPE_CAPF_MAX_LINE_WIDTH: case PIPE_CAPF_MAX_LINE_WIDTH_AA: - case PIPE_CAPF_MAX_POINT_WIDTH: - case PIPE_CAPF_MAX_POINT_WIDTH_AA: /* This depends on the quant mode, though the precise interactions * are unknown. */ return 2048; + case PIPE_CAPF_MAX_POINT_WIDTH: + case PIPE_CAPF_MAX_POINT_WIDTH_AA: + return SI_MAX_POINT_SIZE; case PIPE_CAPF_MAX_TEXTURE_ANISOTROPY: return 16.0f; case PIPE_CAPF_MAX_TEXTURE_LOD_BIAS: return 16.0f; case PIPE_CAPF_MIN_CONSERVATIVE_RASTER_DILATE: case PIPE_CAPF_MAX_CONSERVATIVE_RASTER_DILATE: case PIPE_CAPF_CONSERVATIVE_RASTER_DILATE_GRANULARITY: return 0.0f; } return 0.0f; diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index 6edc06cece7..dc95afb7421 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -41,20 +41,21 @@ #define ATI_VENDOR_ID 0x1002 #define SI_NOT_QUERY 0xffffffff /* The base vertex and primitive restart can be any number, but we must pick * one which will mean "unknown" for the purpose of state tracking and * the number shouldn't be a commonly-used one.
*/ #define SI_BASE_VERTEX_UNKNOWN INT_MIN #define SI_RESTART_INDEX_UNKNOWN INT_MIN #define SI_NUM_SMOOTH_AA_SAMPLES 8 +#define SI_MAX_POINT_SIZE 2048 #define SI_GS_PER_ES 128 /* Alignment for optimal CP DMA performance. */ #define SI_CPDMA_ALIGNMENT 32 /* Tunables for compute-based clear_buffer and copy_buffer: */ #define SI_COMPUTE_CLEAR_DW_PER_THREAD 4 #define SI_COMPUTE_COPY_DW_PER_THREAD 4 #define SI_COMPUTE_DST_CACHE_POLICY L2_STREAM /* Pipeline & streamout query controls. */ diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 8b2e6e57f45..176ec749148 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -891,21 +891,21 @@ static void *si_create_rs_state(struct pipe_context *ctx, S_0286D4_PNT_SPRITE_OVRD_Z(V_0286D4_SPI_PNT_SPRITE_SEL_0) | S_0286D4_PNT_SPRITE_OVRD_W(V_0286D4_SPI_PNT_SPRITE_SEL_1) | S_0286D4_PNT_SPRITE_TOP_1(state->sprite_coord_mode != PIPE_SPRITE_COORD_UPPER_LEFT)); /* point size 12.4 fixed point */ tmp = (unsigned)(state->point_size * 8.0); si_pm4_set_reg(pm4, R_028A00_PA_SU_POINT_SIZE, S_028A00_HEIGHT(tmp) | S_028A00_WIDTH(tmp)); if (state->point_size_per_vertex) { psize_min = util_get_min_point_size(state); - psize_max = 8192; + psize_max = SI_MAX_POINT_SIZE; } else { /* Force the point size to be as if the vertex output was disabled. */ psize_min = state->point_size; psize_max = state->point_size; } rs->max_point_size = psize_max; /* Divide by two, because 0.5 = 1 pixel. */ si_pm4_set_reg(pm4, R_028A04_PA_SU_POINT_MINMAX, S_028A04_MIN_SIZE(si_pack_float_12p4(psize_min/2)) | -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
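The two register comments in this patch ("point size 12.4 fixed point" and "Divide by two, because 0.5 = 1 pixel") can be illustrated with a small standalone sketch. pack_12p4 and point_half_size_12p4 are hypothetical stand-ins written for this note, not the driver's si_pack_float_12p4; the sketch assumes a 16-bit field with 12 integer bits and 4 fractional bits that saturates at the top of its range.

```c
#include <stdint.h>

/* Hypothetical: convert a float to unsigned 12.4 fixed point
 * (12 integer bits, 4 fractional bits). Negative inputs clamp to 0,
 * values that need more than 12 integer bits saturate at 0xffff. */
static uint16_t pack_12p4(float x)
{
    if (x <= 0.0f)
        return 0;
    if (x >= 4096.0f)
        return 0xffff;
    return (uint16_t)(x * 16.0f);
}

/* Hypothetical: the field stores half the point size, so a point
 * size of s pixels is encoded as s / 2 in 12.4, which is the same
 * as s * 8 when nothing saturates. */
static uint16_t point_half_size_12p4(float size)
{
    return pack_12p4(size / 2.0f);
}
```

Under these assumptions a 2048-pixel point encodes as 1024.0 (0x4000) and fits comfortably, while 8192 would need 4096.0 and no longer fits in 12 integer bits. Whether that is the exact mechanism behind the dEQP failure is not stated in the patch, so treat this only as an illustration of the encoding.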
Re: [Mesa-dev] [PATCH] intel/tools: Remove hardcoded PADDING_SIZE from sanitizer
On Wed, Oct 17, 2018 at 06:08:34PM +0300, Danylo Piliaiev wrote: > Signed-off-by: Danylo Piliaiev > --- > src/intel/tools/intel_sanitize_gpu.c | 38 +++- > 1 file changed, 20 insertions(+), 18 deletions(-) > > diff --git a/src/intel/tools/intel_sanitize_gpu.c > b/src/intel/tools/intel_sanitize_gpu.c > index 9b49b0bbf2..36c4725a2f 100644 > --- a/src/intel/tools/intel_sanitize_gpu.c > +++ b/src/intel/tools/intel_sanitize_gpu.c > @@ -51,14 +51,6 @@ static int (*libc_fcntl)(int fd, int cmd, int param); > > #define DRM_MAJOR 226 > > -/* TODO: we want to make sure that the padding forces > - * the BO to take another page on the (PP)GTT; 4KB > - * may or may not be the page size for the BO. Indeed, > - * depending on GPU, kernel version and GEM size, the > - * page size can be one of 4KB, 64KB or 2M. > - */ > -#define PADDING_SIZE 4096 > - > struct refcnt_hash_table { > struct hash_table *t; > int refcnt; > @@ -80,6 +72,8 @@ pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; > > static struct hash_table *fds_to_bo_sizes = NULL; > > +static long padding_size = 0; > + > static inline struct hash_table* > bo_size_table(int fd) > { > @@ -166,7 +160,7 @@ padding_is_good(int fd, uint32_t handle) > struct drm_i915_gem_mmap mmap_arg = { >.handle = handle, >.offset = bo_size(fd, handle), > - .size = PADDING_SIZE, > + .size = padding_size, >.flags = 0, > }; > > @@ -189,17 +183,17 @@ padding_is_good(int fd, uint32_t handle) > * if the bo is not cache coherent we likely need to > * invalidate the cache lines to get it. 
> */ > - gen_invalidate_range(mapped, PADDING_SIZE); > + gen_invalidate_range(mapped, padding_size); > > expected_value = handle & 0xFF; > - for (uint32_t i = 0; i < PADDING_SIZE; ++i) { > + for (uint32_t i = 0; i < padding_size; ++i) { >if (expected_value != mapped[i]) { > - munmap(mapped, PADDING_SIZE); > + munmap(mapped, padding_size); > return false; >} >expected_value = next_noise_value(expected_value); > } > - munmap(mapped, PADDING_SIZE); > + munmap(mapped, padding_size); > > return true; > } > @@ -207,9 +201,9 @@ padding_is_good(int fd, uint32_t handle) > static int > create_with_padding(int fd, struct drm_i915_gem_create *create) > { > - create->size += PADDING_SIZE; > + create->size += padding_size; > int ret = libc_ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, create); > - create->size -= PADDING_SIZE; > + create->size -= padding_size; > > if (ret != 0) >return ret; > @@ -218,7 +212,7 @@ create_with_padding(int fd, struct drm_i915_gem_create > *create) > struct drm_i915_gem_mmap mmap_arg = { >.handle = create->handle, >.offset = create->size, > - .size = PADDING_SIZE, > + .size = padding_size, >.flags = 0, > }; > > @@ -228,8 +222,8 @@ create_with_padding(int fd, struct drm_i915_gem_create > *create) > > noise_values = (uint8_t*) (uintptr_t) mmap_arg.addr_ptr; > fill_noise_buffer(noise_values, create->handle & 0xFF, > - PADDING_SIZE); > - munmap(noise_values, PADDING_SIZE); > + padding_size); > + munmap(noise_values, padding_size); > > _mesa_hash_table_insert(bo_size_table(fd), > (void*)(uintptr_t)create->handle, > (void*)(uintptr_t)create->size); > @@ -427,4 +421,12 @@ init(void) > libc_close = dlsym(RTLD_NEXT, "close"); > libc_fcntl = dlsym(RTLD_NEXT, "fcntl"); > libc_ioctl = dlsym(RTLD_NEXT, "ioctl"); > + > + /* We want to make sure that the padding forces > +* the BO to take another page on the (PP)GTT. > +*/ > + padding_size = sysconf(_SC_PAGESIZE); I don't think this is the page size we want. 
This is the page size of CPU/system memory, which might be different from what the GPU is using to map pages. For instance, even if we are using 64K pages for GPU mapping, I think this call would still return 4K. Though I'm not sure if there's an interface to ask the kernel which page size we are using for the GPU...

> + if (padding_size == -1) { > + unreachable("Bad page size"); > + } > } > -- > 2.18.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
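The concern raised in this review is that sysconf(_SC_PAGESIZE) reports the CPU page size, which may be smaller than the page size used for the GPU mapping. One possible way to hedge against that, sketched here under stated assumptions, is to take the larger of the CPU page size and a GPU page size supplied by the caller. choose_padding_size and its gpu_page_size parameter are hypothetical; the thread does not identify a kernel interface for querying the GPU page size, so how the caller obtains that value is left open.

```c
#include <unistd.h>

/* Hypothetical helper: pick a padding size that covers at least one
 * CPU page and at least one GPU page. gpu_page_size is whatever the
 * caller believes the GPU mapping granularity to be; pass 0 if it is
 * unknown. Returns -1 if the CPU page size cannot be determined. */
static long choose_padding_size(long gpu_page_size)
{
    long cpu_page_size = sysconf(_SC_PAGESIZE);

    if (cpu_page_size <= 0)
        return -1;
    return gpu_page_size > cpu_page_size ? gpu_page_size : cpu_page_size;
}
```

With an unknown GPU page size this degrades to the behaviour of the patch under review, and with a known larger one (for example 64K) the padding grows to match. It does not solve the open question of where that number should come from.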
[Mesa-dev] [PATCH 2/2] freedreno: Remove the Emacs mode lines
These are not necessary because the corresponding settings are set via the .dir-locals.el file anyway. Most of them were missing a ‘:’ after “tab-width” which was making Emacs display an annoying warning whenever you open the file. This patch was made with: sed -ri '/-\*- mode:/,/^$/d' \ $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \ -exec grep -l -- '-\*- mode:' {} \+) --- src/gallium/drivers/freedreno/a2xx/fd2_blend.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_blend.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_compiler.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_context.c| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_context.h| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_draw.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_gmem.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_program.c| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_program.h| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_screen.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_screen.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_texture.c| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_texture.h| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_util.c | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_util.h | 2 -- src/gallium/drivers/freedreno/a2xx/fd2_zsa.c| 2 -- src/gallium/drivers/freedreno/a2xx/fd2_zsa.h| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_blend.c | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_blend.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_context.c| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_context.h| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_draw.c | 2 -- 
src/gallium/drivers/freedreno/a3xx/fd3_draw.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_emit.c | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_emit.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_gmem.c | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_gmem.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_program.c| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_program.h| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_query.c | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_query.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_rasterizer.c | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_rasterizer.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_screen.c | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_screen.h | 2 -- src/gallium/drivers/freedreno/a3xx/fd3_texture.c| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_texture.h| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_zsa.c| 2 -- src/gallium/drivers/freedreno/a3xx/fd3_zsa.h| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_blend.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_blend.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_context.c| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_context.h| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_draw.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_draw.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_emit.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_emit.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_format.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_format.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_gmem.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_gmem.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_program.c| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_program.h| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_query.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_query.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_rasterizer.c | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_rasterizer.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_screen.c | 2 -- 
src/gallium/drivers/freedreno/a4xx/fd4_screen.h | 2 -- src/gallium/drivers/freedreno/a4xx/fd4_texture.c| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_texture.h| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_zsa.c| 2 -- src/gallium/drivers/freedreno/a4xx/fd4_zsa.h| 2 -- src/gallium/drivers/freedreno/freedreno_context.c | 2 -- src/gallium/drivers/freedreno/freedreno_context.h
[Mesa-dev] [PATCH 1/2] freedreno: Fix the Emacs indentation configuration file
The .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode.
---
 src/gallium/drivers/freedreno/.dir-locals.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/.dir-locals.el b/src/gallium/drivers/freedreno/.dir-locals.el
index aa20d495465..b0e90fcbd53 100644
--- a/src/gallium/drivers/freedreno/.dir-locals.el
+++ b/src/gallium/drivers/freedreno/.dir-locals.el
@@ -1,5 +1,5 @@
 ((prog-mode
-  (indent-tabs-mode . true)
+  (indent-tabs-mode . t)
   (tab-width . 4)
   (c-basic-offset . 4)
   (c-file-style . "k&r")
-- 
2.17.1
Re: [Mesa-dev] [PATCH] vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v4]
On Wed, Oct 17, 2018 at 12:14 AM Keith Packard wrote:
> Jason Ekstrand writes:
>
> > Doing all of the CPU sampling on one side or the other of the GPU sampling
> > would probably reduce our window.
>
> True, although as I said, it's taking several µs to get through the
> loop, and the gpu clock tick is far smaller than that, so even adding
> the two values together to make it fit the current implementation won't
> make the deviation that much larger.
>
> > This leaves us with a delta of I + max(P(M), P(R), P(G)). In
> > particular, any two real-number valued times are, instantaneously,
> > within that interval.
>
> That, at least, would be easy to compute, and scale nicely if we added
> more clocks in the future.
>
> > Personally, I'm completely content to have the delta just be the first
> > one: a bound on the difference between any two real-valued times. At this
> > point, I can guarantee you that far more thought has been put into this
> > mesa-dev discussion than was put into the spec and I think we're rapidly
> > getting to the point of diminishing returns. :-)
>
> It seems likely. How about we do the above computation for the current
> code and leave it at that?

Sounds like a plan. Note that I should be computed as

I = end - start + monotonic_raw_tick_ns

to ensure we get a big enough interval. Given that monotonic_raw_tick_ns is likely 1, this doesn't expand things much.

I think a comment is likely also in order. Probably not containing the entire e-mail thread but maybe some of my reasoning above?

--Jason
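The computation agreed on in this message could be sketched roughly as follows. This is an illustration only, not the code from the patch: the function names `sampling_interval_ns` and `max_deviation_ns` are hypothetical, and all values are assumed to be in nanoseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: I = end - start + monotonic_raw_tick_ns, where
 * start and end are the raw monotonic samples taken before and after
 * the other clocks were read. */
static uint64_t sampling_interval_ns(uint64_t start, uint64_t end,
                                     uint64_t monotonic_raw_tick_ns)
{
   return end - start + monotonic_raw_tick_ns;
}

/* Hypothetical helper: the bound I + max(P(M), P(R), P(G)), written for
 * any number of clock periods so that it scales if more clocks are added. */
static uint64_t max_deviation_ns(uint64_t interval_ns,
                                 const uint64_t *periods_ns, size_t count)
{
   uint64_t max_period = 0;
   for (size_t i = 0; i < count; ++i) {
      if (periods_ns[i] > max_period)
         max_period = periods_ns[i];
   }
   return interval_ns + max_period;
}
```

Adding the tick length to the interval is what keeps the bound conservative: the true start and end instants may each lie anywhere within one tick of the sampled values.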
Re: [Mesa-dev] [PATCH] freedreno: Fix emacs modeline
Eric Engestrom writes: > You might want to remove these instead, and use the .editorconfig [1] > already present at src/gallium/drivers/freedreno/.editorconfig This is > much easier to maintain than per-files settings ;) Either fixing it or removing it is fine by me. I now notice there is a .dir-locals.el file that should make it work anyway. (apparently I was the last person to touch it too!) It has a typo which makes it fail to set indent-tabs-mode though. I can make everything work locally either way, I just wanted to get rid of the annoying warning whenever you open a file. - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/7] EGLDevice, take 2.1
On Wed, 3 Oct 2018 at 15:08, Emil Velikov wrote: > > Hi all, > > This re-spin of the series includes: > - correct flipped asserts > - cosmetic wording/comment fixes > - drop EGL_EXT_platform_device patches (swrast is broken) > - add the EGL_MESA_device_software spec patch > > At this point we should be pretty much set, so any formal Ack/Rb will > be appreciated. > > Thanks > Emil > > Cc: Adam Jackson > Cc: Eric Engestrom > Cc: Mathias Fröhlich > > Adam Jackson (1): > specs: Add EGL_MESA_device_software > > Emil Velikov (6): > egl: add base EGL_EXT_device_base implementation > egl: add EGL_MESA_device_software support > egl: add EGL_EXT_device_drm support > egl: set the EGLDevice when creating a display > egl: enable EGL_EXT_device_{base,enumeration,query} Humble ping? -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] intel/tools: Remove hardcoded PADDING_SIZE from sanitizer
Signed-off-by: Danylo Piliaiev --- src/intel/tools/intel_sanitize_gpu.c | 38 +++- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/src/intel/tools/intel_sanitize_gpu.c b/src/intel/tools/intel_sanitize_gpu.c index 9b49b0bbf2..36c4725a2f 100644 --- a/src/intel/tools/intel_sanitize_gpu.c +++ b/src/intel/tools/intel_sanitize_gpu.c @@ -51,14 +51,6 @@ static int (*libc_fcntl)(int fd, int cmd, int param); #define DRM_MAJOR 226 -/* TODO: we want to make sure that the padding forces - * the BO to take another page on the (PP)GTT; 4KB - * may or may not be the page size for the BO. Indeed, - * depending on GPU, kernel version and GEM size, the - * page size can be one of 4KB, 64KB or 2M. - */ -#define PADDING_SIZE 4096 - struct refcnt_hash_table { struct hash_table *t; int refcnt; @@ -80,6 +72,8 @@ pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; static struct hash_table *fds_to_bo_sizes = NULL; +static long padding_size = 0; + static inline struct hash_table* bo_size_table(int fd) { @@ -166,7 +160,7 @@ padding_is_good(int fd, uint32_t handle) struct drm_i915_gem_mmap mmap_arg = { .handle = handle, .offset = bo_size(fd, handle), - .size = PADDING_SIZE, + .size = padding_size, .flags = 0, }; @@ -189,17 +183,17 @@ padding_is_good(int fd, uint32_t handle) * if the bo is not cache coherent we likely need to * invalidate the cache lines to get it. 
*/ - gen_invalidate_range(mapped, PADDING_SIZE); + gen_invalidate_range(mapped, padding_size); expected_value = handle & 0xFF; - for (uint32_t i = 0; i < PADDING_SIZE; ++i) { + for (uint32_t i = 0; i < padding_size; ++i) { if (expected_value != mapped[i]) { - munmap(mapped, PADDING_SIZE); + munmap(mapped, padding_size); return false; } expected_value = next_noise_value(expected_value); } - munmap(mapped, PADDING_SIZE); + munmap(mapped, padding_size); return true; } @@ -207,9 +201,9 @@ padding_is_good(int fd, uint32_t handle) static int create_with_padding(int fd, struct drm_i915_gem_create *create) { - create->size += PADDING_SIZE; + create->size += padding_size; int ret = libc_ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, create); - create->size -= PADDING_SIZE; + create->size -= padding_size; if (ret != 0) return ret; @@ -218,7 +212,7 @@ create_with_padding(int fd, struct drm_i915_gem_create *create) struct drm_i915_gem_mmap mmap_arg = { .handle = create->handle, .offset = create->size, - .size = PADDING_SIZE, + .size = padding_size, .flags = 0, }; @@ -228,8 +222,8 @@ create_with_padding(int fd, struct drm_i915_gem_create *create) noise_values = (uint8_t*) (uintptr_t) mmap_arg.addr_ptr; fill_noise_buffer(noise_values, create->handle & 0xFF, - PADDING_SIZE); - munmap(noise_values, PADDING_SIZE); + padding_size); + munmap(noise_values, padding_size); _mesa_hash_table_insert(bo_size_table(fd), (void*)(uintptr_t)create->handle, (void*)(uintptr_t)create->size); @@ -427,4 +421,12 @@ init(void) libc_close = dlsym(RTLD_NEXT, "close"); libc_fcntl = dlsym(RTLD_NEXT, "fcntl"); libc_ioctl = dlsym(RTLD_NEXT, "ioctl"); + + /* We want to make sure that the padding forces +* the BO to take another page on the (PP)GTT. +*/ + padding_size = sysconf(_SC_PAGESIZE); + if (padding_size == -1) { + unreachable("Bad page size"); + } } -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
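For readers unfamiliar with the sanitizer, the padding check that this patch resizes works like a canary: the extra bytes after each BO are filled with a deterministic noise sequence seeded from the handle, and later compared against the same sequence. The sketch below shows the idea in isolation. The names `fill_noise_buffer` and `next_noise_value` are taken from the diff, but their bodies here are assumptions, and `noise_is_intact` is a hypothetical stand-in for the comparison loop in `padding_is_good`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed noise generator: any cheap deterministic byte sequence works.
 * The real next_noise_value() may use different constants. */
static uint8_t next_noise_value(uint8_t prev)
{
   return (uint8_t)(prev * 103u + 227u);
}

/* Fill the padding with the sequence that starts at `start`. */
static void fill_noise_buffer(uint8_t *dst, uint8_t start, size_t length)
{
   for (size_t i = 0; i < length; ++i) {
      dst[i] = start;
      start = next_noise_value(start);
   }
}

/* Hypothetical check: true if the padding still holds the sequence,
 * false if any byte was overwritten (i.e. something wrote past the BO). */
static bool noise_is_intact(const uint8_t *buf, uint8_t start, size_t length)
{
   uint8_t expected = start;
   for (size_t i = 0; i < length; ++i) {
      if (buf[i] != expected)
         return false;
      expected = next_noise_value(expected);
   }
   return true;
}
```

Because the check walks the whole padding region, its cost grows with `padding_size`, which is worth keeping in mind if the padding is ever raised to cover larger GPU page sizes.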
Re: [Mesa-dev] [PATCH] freedreno: Fix emacs modeline
On Wednesday, 2018-10-17 15:48:41 +0200, Neil Roberts wrote: > The modeline was missing a ‘:’ after the tab-width and Emacs was > complaining every time you open a file. This patch was made with: > > sed -ri '1 s/; tab-width ([0-9])/; tab-width: \1/' \ > $(find -name \*.\[ch\] -exec grep -l -- '-\*- mode:' {} \+) > --- [snip] > > diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_blend.c > b/src/gallium/drivers/freedreno/a2xx/fd2_blend.c > index 4e991794f07..48bd395b594 100644 > --- a/src/gallium/drivers/freedreno/a2xx/fd2_blend.c > +++ b/src/gallium/drivers/freedreno/a2xx/fd2_blend.c > @@ -1,4 +1,4 @@ > -/* -*- mode: C; c-file-style: "k&r"; tab-width 4; indent-tabs-mode: t; -*- */ > +/* -*- mode: C; c-file-style: "k&r"; tab-width: 4; indent-tabs-mode: t; -*- > */ You might want to remove these instead, and use the .editorconfig [1] already present at src/gallium/drivers/freedreno/.editorconfig This is much easier to maintain than per-files settings ;) The website [1] has a link for a plugin for Emacs since it appears to lack native support, but if you're ok with installing a plugin, this should be a good solution for you :) [1] https://editorconfig.org ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] freedreno: Fix emacs modeline
The modeline was missing a ‘:’ after the tab-width and Emacs was complaining every time you open a file. This patch was made with: sed -ri '1 s/; tab-width ([0-9])/; tab-width: \1/' \ $(find -name \*.\[ch\] -exec grep -l -- '-\*- mode:' {} \+) --- src/gallium/drivers/freedreno/a2xx/fd2_blend.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_blend.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_compiler.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_compiler.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_context.c| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_context.h| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_draw.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_emit.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_emit.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_gmem.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_program.c| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_program.h| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_rasterizer.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_screen.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_screen.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_texture.c| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_texture.h| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_util.c | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_util.h | 2 +- src/gallium/drivers/freedreno/a2xx/fd2_zsa.c| 2 +- src/gallium/drivers/freedreno/a2xx/fd2_zsa.h| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_blend.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_blend.h | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_context.c| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_context.h| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_draw.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_draw.h | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_emit.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_emit.h | 2 +- 
src/gallium/drivers/freedreno/a3xx/fd3_gmem.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_gmem.h | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_program.c| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_program.h| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_query.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_query.h | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_rasterizer.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_rasterizer.h | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_screen.c | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_screen.h | 2 +- src/gallium/drivers/freedreno/a3xx/fd3_texture.c| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_texture.h| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_zsa.c| 2 +- src/gallium/drivers/freedreno/a3xx/fd3_zsa.h| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_blend.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_blend.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_context.c| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_context.h| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_draw.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_draw.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_emit.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_emit.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_format.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_format.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_gmem.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_gmem.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_program.c| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_program.h| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_query.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_query.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_rasterizer.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_rasterizer.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_screen.c | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_screen.h | 2 +- src/gallium/drivers/freedreno/a4xx/fd4_texture.c| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_texture.h| 2 +- 
src/gallium/drivers/freedreno/a4xx/fd4_zsa.c| 2 +- src/gallium/drivers/freedreno/a4xx/fd4_zsa.h| 2 +- src/gallium/drivers/freedreno/freedreno_context.c | 2 +- src/gallium/drivers/freedreno/freedreno_context.h | 2 +- src/gallium/drivers/freedreno/freedreno_draw.c | 2 +- src/gallium/drivers/freedreno/freedreno_draw.h | 2 +- src/galli
Re: [Mesa-dev] [PATCH 2/4] radeonsi: use compute shaders for clear_buffer & copy_buffer
On 2018-10-07 9:05 a.m., Marek Olšák wrote: > From: Marek Olšák > > Fast color clears should be much faster. Also, fast color clears on > evicted buffers should be 200x faster on GFX8 and older. Nice! Unfortunately, this broke clover with radeonsi. Everything using OpenCL seems to hang, see e.g. the attached backtraces from clinfo. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer (gdb) info threads Id Target Id Frame * 1Thread 0x7f63ecdb2740 (LWP 24202) "clinfo" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 2Thread 0x7f63e62bc700 (LWP 24203) "clinfo:rcs0" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915203af0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 3Thread 0x7f63e5934700 (LWP 24204) "clinfo:disk$0" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915204768) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 4Thread 0x7f63e510a700 (LWP 24205) "clinfo:cs0" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915214aa0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 5Thread 0x7f63d7fff700 (LWP 24206) "clinfo:disk$0" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e9152185a8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 6Thread 0x7f63d77fe700 (LWP 24207) "clinfo:sh0" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 7Thread 0x7f63d6ffd700 (LWP 24208) "clinfo:sh1" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 8Thread 0x7f63c700 (LWP 24209) "clinfo:sh2" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 9Thread 0x7f63d67fc700 (LWP 24210) "clinfo:sh3" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 10 Thread 0x7f63d5ffb700 (LWP 24211) 
"clinfo:sh4" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 11 Thread 0x7f63d57fa700 (LWP 24212) "clinfo:sh5" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 12 Thread 0x7f63d4ff9700 (LWP 24213) "clinfo:sh6" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 13 Thread 0x7f63cf7fe700 (LWP 24214) "clinfo:sh7" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 14 Thread 0x7f63ceffd700 (LWP 24215) "clinfo:sh8" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 15 Thread 0x7f63ce7fc700 (LWP 24216) "clinfo:sh9" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 16 Thread 0x7f63cdffb700 (LWP 24217) "clinfo:sh10" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 17 Thread 0x7f63cd7fa700 (LWP 24218) "clinfo:sh11" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915217d00) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 18 Thread 0x7f63ccff9700 (LWP 24219) "clinfo:shlo0" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915218280) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 19 Thread 0x7f639bfff700 (LWP 24220) "clinfo:shlo1" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915218280) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 20 Thread 0x7f639b7fe700 (LWP 24221) "clinfo:shlo2" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, 
futex_word=0x55e915218280) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 21 Thread 0x7f639affd700 (LWP 24222) "clinfo:shlo3" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915218280) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 22 Thread 0x7f639a7fc700 (LWP 24223) "clinfo:shlo4" 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915218280) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 (gdb) thread apply all bt Thread 22 (Thread 0x7f639a7fc700 (LWP 24223)): #0 0x7f63e7e36e6c in futex_wait_cancelable (private=, expected=0, futex_word=0x55e915218280) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 #1 __pthread_cond_wait_common (abstime=0x0
[Mesa-dev] [PATCH 2/2] freedreno: allocate batches from the cache in launch_grid
fd_launch_grid needs to allocate its batch from the batch cache so that the batch gets a valid index and resource dependency tracking works correctly. In addition, this fixes an assertion on debug builds since commit 1a40faa8 landed.
---
 src/gallium/drivers/freedreno/freedreno_draw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/freedreno/freedreno_draw.c b/src/gallium/drivers/freedreno/freedreno_draw.c
index e130895aac..fe026a5fd8 100644
--- a/src/gallium/drivers/freedreno/freedreno_draw.c
+++ b/src/gallium/drivers/freedreno/freedreno_draw.c
@@ -459,7 +459,7 @@ fd_launch_grid(struct pipe_context *pctx, const struct pipe_grid_info *info)
 	struct fd_batch *batch, *save_batch = NULL;
 	unsigned i;
 
-	batch = fd_batch_create(ctx, true);
+	batch = fd_bc_alloc_batch(&ctx->screen->batch_cache, ctx, true);
 	fd_batch_reference(&save_batch, ctx->batch);
 	fd_batch_reference(&ctx->batch, batch);
 
@@ -506,6 +506,7 @@ fd_launch_grid(struct pipe_context *pctx, const struct pipe_grid_info *info)
 
 	fd_batch_reference(&ctx->batch, save_batch);
 	fd_batch_reference(&save_batch, NULL);
+	fd_batch_reference(&batch, NULL);
 }
 
 void
-- 
2.17.1
[Mesa-dev] [PATCH 1/2] freedreno: adds nondraw param to fd_bc_alloc_batch
Add a nondraw parameter to fd_bc_alloc_batch so that callers can specify it when creating a batch. Batches should preferably be created through fd_bc_alloc_batch rather than directly through fd_batch_create.
---
 src/gallium/drivers/freedreno/a6xx/fd6_blitter.c      | 2 +-
 src/gallium/drivers/freedreno/freedreno_batch_cache.c | 6 +++---
 src/gallium/drivers/freedreno/freedreno_batch_cache.h | 2 +-
 src/gallium/drivers/freedreno/freedreno_context.c     | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a6xx/fd6_blitter.c b/src/gallium/drivers/freedreno/a6xx/fd6_blitter.c
index bd37005d50..c962fe7997 100644
--- a/src/gallium/drivers/freedreno/a6xx/fd6_blitter.c
+++ b/src/gallium/drivers/freedreno/a6xx/fd6_blitter.c
@@ -486,7 +486,7 @@ fd6_blit(struct pipe_context *pctx, const struct pipe_blit_info *info)
 		return;
 	}
 
-	batch = fd_bc_alloc_batch(&ctx->screen->batch_cache, ctx);
+	batch = fd_bc_alloc_batch(&ctx->screen->batch_cache, ctx, true);
 
 	fd6_emit_restore(batch, batch->draw);
 	fd6_emit_lrz_flush(batch->draw);
diff --git a/src/gallium/drivers/freedreno/freedreno_batch_cache.c b/src/gallium/drivers/freedreno/freedreno_batch_cache.c
index 9d046f205b..a8b32d9bd0 100644
--- a/src/gallium/drivers/freedreno/freedreno_batch_cache.c
+++ b/src/gallium/drivers/freedreno/freedreno_batch_cache.c
@@ -270,7 +270,7 @@ fd_bc_invalidate_resource(struct fd_resource *rsc, bool destroy)
 }
 
 struct fd_batch *
-fd_bc_alloc_batch(struct fd_batch_cache *cache, struct fd_context *ctx)
+fd_bc_alloc_batch(struct fd_batch_cache *cache, struct fd_context *ctx, bool nondraw)
 {
 	struct fd_batch *batch;
 	uint32_t idx;
@@ -333,7 +333,7 @@ fd_bc_alloc_batch(struct fd_batch_cache *cache, struct fd_context *ctx)
 
 	idx--; /* bit zero returns 1 for ffs() */
 
-	batch = fd_batch_create(ctx, false);
+	batch = fd_batch_create(ctx, nondraw);
 	if (!batch)
 		goto out;
 
@@ -365,7 +365,7 @@ batch_from_key(struct fd_batch_cache *cache, struct key *key,
 		return batch;
 	}
 
-	batch = fd_bc_alloc_batch(cache, ctx);
+	batch = fd_bc_alloc_batch(cache, ctx, false);
 #ifdef DEBUG
 	DBG("%p: hash=0x%08x, %ux%u, %u layers, %u samples", batch, hash,
 			key->width, key->height, key->layers, key->samples);
diff --git a/src/gallium/drivers/freedreno/freedreno_batch_cache.h b/src/gallium/drivers/freedreno/freedreno_batch_cache.h
index 348418e187..0f2c40ba8d 100644
--- a/src/gallium/drivers/freedreno/freedreno_batch_cache.h
+++ b/src/gallium/drivers/freedreno/freedreno_batch_cache.h
@@ -68,7 +68,7 @@ void fd_bc_flush_deferred(struct fd_batch_cache *cache, struct fd_context *ctx);
 void fd_bc_invalidate_context(struct fd_context *ctx);
 void fd_bc_invalidate_batch(struct fd_batch *batch, bool destroy);
 void fd_bc_invalidate_resource(struct fd_resource *rsc, bool destroy);
-struct fd_batch * fd_bc_alloc_batch(struct fd_batch_cache *cache, struct fd_context *ctx);
+struct fd_batch * fd_bc_alloc_batch(struct fd_batch_cache *cache, struct fd_context *ctx, bool nondraw);
 
 struct fd_batch * fd_batch_from_fb(struct fd_batch_cache *cache,
 		struct fd_context *ctx, const struct pipe_framebuffer_state *pfb);
diff --git a/src/gallium/drivers/freedreno/freedreno_context.c b/src/gallium/drivers/freedreno/freedreno_context.c
index 55e978073a..c540d6d143 100644
--- a/src/gallium/drivers/freedreno/freedreno_context.c
+++ b/src/gallium/drivers/freedreno/freedreno_context.c
@@ -316,7 +316,7 @@ fd_context_init(struct fd_context *ctx, struct pipe_screen *pscreen,
 	pctx->const_uploader = pctx->stream_uploader;
 
 	if (!ctx->screen->reorder)
-		ctx->batch = fd_bc_alloc_batch(&screen->batch_cache, ctx);
+		ctx->batch = fd_bc_alloc_batch(&screen->batch_cache, ctx, false);
 
 	slab_create_child(&ctx->transfer_pool, &screen->transfer_pool);
 
-- 
2.17.1
Re: [Mesa-dev] [RFC] Allow fd.o to join forces with X.Org
On Wed, Oct 17, 2018 at 2:05 PM Daniel Stone wrote: > > On Tue, 16 Oct 2018 at 08:17, Peter Hutterer wrote: > > On Mon, Oct 15, 2018 at 10:49:24AM -0400, Harry Wentland wrote: > > > + \item Support free and open source projects through the > > > freedesktop.org > > > + infrastructure. For projects outside the scope of item (\ref{1}) > > > support > > > + extends to project hosting only. > > > + > > > > Yes to the idea but given that the remaining 11 pages cover all the legalese > > for xorg I think we need to add at least a section of what "project hosting" > > means. Even if it's just a "includes but is not limited to blah". And some > > addition to 4.1 Powers is needed to spell out what the BoD can do in regards > > to fdo. > > Yeah, I think it makes sense. Some things we do: > - provide hosted network services for collaborative development, > testing, and discussion, of open-source projects > - administer, improve, and extend this suite of services as necessary > - assist open-source projects in their use of these services > - purchase, lease, or subscribe to, computing and networking > infrastructure allowing these services to be run I fully agree that we should document all this. I don't think the bylaws are the right place though, much better to put that into policies that the board approves and which can be adapted as needed. Imo bylaws should cover the high-level mission and procedural details, as our "constitution", with the really high acceptance criteria of 2/3rd of all members approving any changes. Some of the early discussions tried to spell out a lot of the fd.o policies in bylaw changes, but then we realized it's all there already. All the details are much better served in policies enacted by the board, like we do with everything else. As an example, let's look at XDC. Definitely one of the biggest things the foundation does, with handling finances, travel sponsoring grants, papers committee, and acquiring lots of sponsors. 
None of this is spelled out in the bylaws, it's all in policies that the board deliberates and approves. I think this same approach will also work well for fd.o. And if members are unhappy with what the board does, they can fix in the next election by throwing out the unwanted directors. Thanks, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 100960] Special block from Minecraft mod rendered out of place
https://bugs.freedesktop.org/show_bug.cgi?id=100960

--- Comment #17 from Sergii Romantsov ---
Hello, Fabian.

Unfortunately, it is unlikely that anyone will be very interested in this fix for Mesa. The reason is that the issue is actually in the game: the specification doesn't say exactly how to handle this case, so at the moment the behavior is implementation-dependent.

My suggestion is to report the issue to the owner of the Minecraft mod. That is probably the fastest way to get it fixed: instead of calling 'glRotated(angle = 180, x = 0, y = 0, z = 0)', the application should call 'glRotated(angle = 180, x = 1, y = 0, z = 0)'. Please also provide a link to that report here.

The proposed patches are still relevant and make Mesa's behavior compatible with Nvidia and Windows. Still, Mesa's current behavior can also be treated as 'correct'.
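To make the problem with the all-zero axis concrete: a rotation about (x, y, z) needs a direction, and implementations typically normalize the axis by dividing by its length, which is zero for (0, 0, 0). An application can guard against this before calling glRotated. The sketch below is illustrative only; `rotation_axis_is_valid` is a hypothetical helper, not a GL or Mesa function.

```c
#include <stdbool.h>

/* Hypothetical application-side guard: a rotation axis is usable only
 * if it has non-zero length, because normalizing it divides by that
 * length. */
static bool rotation_axis_is_valid(double x, double y, double z)
{
   return (x * x + y * y + z * z) > 0.0;
}
```

With this check, the call from the mod, with axis (0, 0, 0), would be rejected, while the suggested replacement with axis (1, 0, 0) would pass.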
Re: [Mesa-dev] [RFC] Allow fd.o to join forces with X.Org
On Tue, 16 Oct 2018 at 08:17, Peter Hutterer wrote: > On Mon, Oct 15, 2018 at 10:49:24AM -0400, Harry Wentland wrote: > > + \item Support free and open source projects through the > > freedesktop.org > > + infrastructure. For projects outside the scope of item (\ref{1}) > > support > > + extends to project hosting only. > > + > > Yes to the idea but given that the remaining 11 pages cover all the legalese > for xorg I think we need to add at least a section of what "project hosting" > means. Even if it's just a "includes but is not limited to blah". And some > addition to 4.1 Powers is needed to spell out what the BoD can do in regards > to fdo. Yeah, I think it makes sense. Some things we do: - provide hosted network services for collaborative development, testing, and discussion, of open-source projects - administer, improve, and extend this suite of services as necessary - assist open-source projects in their use of these services - purchase, lease, or subscribe to, computing and networking infrastructure allowing these services to be run Cheers, Daniel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] anv: Implement VK_EXT_conditional_rendering for gen 7.5+
Conditional rendering affects the following functions:
- vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect
- vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR
- vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase
- vkCmdClearAttachments

To reduce reads from memory, the result of the condition is calculated and stored in a designated register, MI_ALU_REG15.

In the current implementation, the affected functions expect MI_PREDICATE_RESULT to be set before they are called, so any code that changes the predicate should restore it with restore_conditional_render_predicate. An alternative is to restore MI_PREDICATE_RESULT at the beginning of every affected function.

Signed-off-by: Danylo Piliaiev --- src/intel/vulkan/anv_blorp.c | 7 +- src/intel/vulkan/anv_device.c | 12 ++ src/intel/vulkan/anv_extensions.py | 1 + src/intel/vulkan/anv_private.h | 2 + src/intel/vulkan/genX_cmd_buffer.c | 192 - 5 files changed, 209 insertions(+), 5 deletions(-) diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c index 478b8e7a3d..157875d16f 100644 --- a/src/intel/vulkan/anv_blorp.c +++ b/src/intel/vulkan/anv_blorp.c @@ -1144,8 +1144,11 @@ void anv_CmdClearAttachments( * trash our depth and stencil buffers. 
*/ struct blorp_batch batch; - blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, -BLORP_BATCH_NO_EMIT_DEPTH_STENCIL); + enum blorp_batch_flags flags = BLORP_BATCH_NO_EMIT_DEPTH_STENCIL; + if (cmd_buffer->state.conditional_render_enabled) { + flags |= BLORP_BATCH_PREDICATE_ENABLE; + } + blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, flags); for (uint32_t a = 0; a < attachmentCount; ++a) { if (pAttachments[a].aspectMask & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV) { diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index a2551452eb..930a192c25 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -957,6 +957,18 @@ void anv_GetPhysicalDeviceFeatures2( break; } + case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CONDITIONAL_RENDERING_FEATURES_EXT: { + VkPhysicalDeviceConditionalRenderingFeaturesEXT *features = +(VkPhysicalDeviceConditionalRenderingFeaturesEXT*)ext; + ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice); + + features->conditionalRendering = pdevice->info.gen >= 8 || + pdevice->info.is_haswell; + features->inheritedConditionalRendering = pdevice->info.gen >= 8 || + pdevice->info.is_haswell; + break; + } + default: anv_debug_ignored_stype(ext->sType); break; diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py index c13ce531ee..2ef7a52d01 100644 --- a/src/intel/vulkan/anv_extensions.py +++ b/src/intel/vulkan/anv_extensions.py @@ -127,6 +127,7 @@ EXTENSIONS = [ Extension('VK_EXT_vertex_attribute_divisor', 3, True), Extension('VK_EXT_post_depth_coverage', 1, 'device->info.gen >= 9'), Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen >= 9'), +Extension('VK_EXT_conditional_rendering', 1, 'device->info.gen >= 8 || device->info.is_haswell'), ] class VkVersion: diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index 599b903f25..108da51a59 100644 --- a/src/intel/vulkan/anv_private.h +++ 
b/src/intel/vulkan/anv_private.h @@ -2032,6 +2032,8 @@ struct anv_cmd_state { */ bool hiz_enabled; + bool conditional_render_enabled; + /** * Array length is anv_cmd_state::pass::attachment_count. Array content is * valid only when recording a render pass instance. diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index f07a6aa7c9..87abc443b6 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -479,8 +479,9 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer, 0, 0, 1, hiz_op); } -#define MI_PREDICATE_SRC0 0x2400 -#define MI_PREDICATE_SRC1 0x2408 +#define MI_PREDICATE_SRC0 0x2400 +#define MI_PREDICATE_SRC1 0x2408 +#define MI_PREDICATE_RESULT 0x2418 static void set_image_compressed_bit(struct anv_cmd_buffer *cmd_buffer, @@ -545,6 +546,14 @@ mi_alu(uint32_t opcode, uint32_t operand1, uint32_t operand2) #define CS_GPR(n) (0x2600 + (n) * 8) +#if GEN_GEN >= 8 || GEN_IS_HASWELL +static void +restore_conditional_render_predicate(struct anv_cmd_buffer *cmd_buffer) +{ + emit_lrr(&cmd_buffer->batch, MI_PREDICATE_RESULT, CS_GPR(MI_ALU_REG15)); +} +#endif + /* This is only really practical on haswell and above because it requires * MI math in orde
[Mesa-dev] [PATCH 0/3] anv: Implement VK_KHR_draw_indirect_count and VK_EXT_conditional_rendering
This series implements the VK_KHR_draw_indirect_count and VK_EXT_conditional_rendering extensions. They are implemented together because they are closely intertwined.

There are already tests in VK_CTS for VK_KHR_draw_indirect_count, and I made a pull request with tests for VK_EXT_conditional_rendering (https://github.com/KhronosGroup/VK-GL-CTS/pull/131).

VK_KHR_draw_indirect_count is implemented for gen7+. VK_EXT_conditional_rendering is implemented for gen7.5+ because it requires MI_MATH to be implemented correctly.

Since some of the tests aren't in VK-GL-CTS master, I'm not sure how to test the implementation of VK_EXT_conditional_rendering with my tests on CI. Could anyone help me with this? Also, the one thing I'm uncertain of is described in the last patch.

Many thanks to Jason Ekstrand for the help with the extensions.

Danylo Piliaiev (3): anv: Implement VK_KHR_draw_indirect_count for gen 7.5+ anv: Implement VK_KHR_draw_indirect_count for gen 7 anv: Implement VK_EXT_conditional_rendering for gen 7.5+ src/intel/vulkan/anv_blorp.c | 7 +- src/intel/vulkan/anv_device.c | 12 + src/intel/vulkan/anv_extensions.py | 2 + src/intel/vulkan/anv_private.h | 2 + src/intel/vulkan/genX_cmd_buffer.c | 355 - 5 files changed, 373 insertions(+), 5 deletions(-) -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] anv: Implement VK_KHR_draw_indirect_count for gen 7.5+
Signed-off-by: Danylo Piliaiev --- src/intel/vulkan/anv_extensions.py | 1 + src/intel/vulkan/genX_cmd_buffer.c | 155 + 2 files changed, 156 insertions(+) diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py index d4915c9501..7f44da6648 100644 --- a/src/intel/vulkan/anv_extensions.py +++ b/src/intel/vulkan/anv_extensions.py @@ -113,6 +113,7 @@ EXTENSIONS = [ Extension('VK_KHR_xlib_surface', 6, 'VK_USE_PLATFORM_XLIB_KHR'), Extension('VK_KHR_multiview', 1, True), Extension('VK_KHR_display', 23, 'VK_USE_PLATFORM_DISPLAY_KHR'), +Extension('VK_KHR_draw_indirect_count', 1, 'device->info.gen >= 8 || device->info.is_haswell'), Extension('VK_EXT_acquire_xlib_display', 1, 'VK_USE_PLATFORM_XLIB_XRANDR_EXT'), Extension('VK_EXT_debug_report', 8, True), Extension('VK_EXT_direct_mode_display', 1, 'VK_USE_PLATFORM_DISPLAY_KHR'), diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index 43a02f2256..d7b94efd19 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -2982,6 +2982,161 @@ void genX(CmdDrawIndexedIndirect)( } } +#if GEN_IS_HASWELL || GEN_GEN >= 8 +static void +emit_draw_count_predicate(struct anv_cmd_buffer *cmd_buffer, + struct anv_address count_address, + uint32_t draw_index) +{ + /* Upload the current draw count from the draw parameters buffer to +* MI_PREDICATE_SRC0. +*/ + emit_lrr(&cmd_buffer->batch, MI_PREDICATE_SRC0, CS_GPR(MI_ALU_REG14)); + + /* Upload the index of the current primitive to MI_PREDICATE_SRC1. 
*/ + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1, draw_index); + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1 + 4, 0); + + if (draw_index == 0) { + anv_batch_emit(&cmd_buffer->batch, GENX(MI_PREDICATE), mip) { + mip.LoadOperation= LOAD_LOADINV; + mip.CombineOperation = COMBINE_SET; + mip.CompareOperation = COMPARE_SRCS_EQUAL; + } + } else { + /* While draw_index < draw_count the predicate's result will be +* (draw_index == draw_count) ^ TRUE = TRUE +* When draw_index == draw_count the result is +* (TRUE) ^ TRUE = FALSE +* After this all results will be: +* (FALSE) ^ FALSE = FALSE +*/ + anv_batch_emit(&cmd_buffer->batch, GENX(MI_PREDICATE), mip) { + mip.LoadOperation= LOAD_LOAD; + mip.CombineOperation = COMBINE_XOR; + mip.CompareOperation = COMPARE_SRCS_EQUAL; + } + } +} + +void genX(CmdDrawIndirectCountKHR)( +VkCommandBuffer commandBuffer, +VkBuffer_buffer, +VkDeviceSizeoffset, +VkBuffer_countBuffer, +VkDeviceSizecountBufferOffset, +uint32_tmaxDrawCount, +uint32_tstride) +{ + ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer); + ANV_FROM_HANDLE(anv_buffer, buffer, _buffer); + ANV_FROM_HANDLE(anv_buffer, count_buffer, _countBuffer); + struct anv_cmd_state *cmd_state = &cmd_buffer->state; + struct anv_pipeline *pipeline = cmd_state->gfx.base.pipeline; + const struct brw_vs_prog_data *vs_prog_data = get_vs_prog_data(pipeline); + + if (anv_batch_has_error(&cmd_buffer->batch)) + return; + + genX(cmd_buffer_flush_state)(cmd_buffer); + + struct anv_address count_address = + anv_address_add(count_buffer->address, countBufferOffset); + + /* Needed to ensure the memory is coherent for the MI_LOAD_REGISTER_MEM +* command when loading the values into the predicate source registers. 
+*/ + anv_batch_emit(&cmd_buffer->batch, GENX(PIPE_CONTROL), pc) { + pc.PipeControlFlushEnable = true; + } + + emit_lrm(&cmd_buffer->batch, CS_GPR(MI_ALU_REG14), count_address); + emit_lri(&cmd_buffer->batch, CS_GPR(MI_ALU_REG14) + 4, 0); + + for (uint32_t i = 0; i < maxDrawCount; i++) { + struct anv_address draw = anv_address_add(buffer->address, offset); + + emit_draw_count_predicate(cmd_buffer, count_address, i); + + if (vs_prog_data->uses_firstvertex || + vs_prog_data->uses_baseinstance) + emit_base_vertex_instance_bo(cmd_buffer, anv_address_add(draw, 8)); + if (vs_prog_data->uses_drawid) + emit_draw_index(cmd_buffer, i); + + load_indirect_parameters(cmd_buffer, draw, false); + + anv_batch_emit(&cmd_buffer->batch, GENX(3DPRIMITIVE), prim) { + prim.IndirectParameterEnable = true; + prim.PredicateEnable = true; + prim.VertexAccessType = SEQUENTIAL; + prim.PrimitiveTopologyType= pi
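The XOR trick described in the emit_draw_count_predicate comment in this patch can be checked with a small software model. This is only a sketch of the logic as the comment describes it, not driver code: next_draw_predicate and count_enabled_draws are hypothetical names, and the real work is done by MI_PREDICATE on the GPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the predicate update for one draw (hypothetical helper).
 * Draw 0 uses LOAD_LOADINV + COMBINE_SET: predicate = !(count == index).
 * Later draws use LOAD_LOAD + COMBINE_XOR: predicate = (count == index) ^ prev. */
bool next_draw_predicate(bool prev, uint32_t draw_index, uint32_t draw_count)
{
   bool equal = (draw_index == draw_count);

   if (draw_index == 0)
      return !equal;

   /* XOR of two booleans. */
   return equal != prev;
}

/* Counts how many of the max_draw_count emitted draws would actually run. */
uint32_t count_enabled_draws(uint32_t draw_count, uint32_t max_draw_count)
{
   bool pred = false;
   uint32_t enabled = 0;

   for (uint32_t i = 0; i < max_draw_count; i++) {
      pred = next_draw_predicate(pred, i, draw_count);
      if (pred)
         enabled++;
   }
   return enabled;
}
```

In this model the predicate stays TRUE for indices below the count, flips to FALSE once the index reaches the count, and stays FALSE afterwards, so exactly min(draw_count, max_draw_count) draws are enabled.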
[Mesa-dev] [PATCH 2/3] anv: Implement VK_KHR_draw_indirect_count for gen 7
Without MI_MATH we are forced to load MI_PREDICATE_SRC0 from memory on every predicate emission. Signed-off-by: Danylo Piliaiev --- src/intel/vulkan/anv_extensions.py | 2 +- src/intel/vulkan/genX_cmd_buffer.c | 12 ++-- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py index 7f44da6648..c13ce531ee 100644 --- a/src/intel/vulkan/anv_extensions.py +++ b/src/intel/vulkan/anv_extensions.py @@ -113,7 +113,7 @@ EXTENSIONS = [ Extension('VK_KHR_xlib_surface', 6, 'VK_USE_PLATFORM_XLIB_KHR'), Extension('VK_KHR_multiview', 1, True), Extension('VK_KHR_display', 23, 'VK_USE_PLATFORM_DISPLAY_KHR'), -Extension('VK_KHR_draw_indirect_count', 1, 'device->info.gen >= 8 || device->info.is_haswell'), +Extension('VK_KHR_draw_indirect_count', 1, True), Extension('VK_EXT_acquire_xlib_display', 1, 'VK_USE_PLATFORM_XLIB_XRANDR_EXT'), Extension('VK_EXT_debug_report', 8, True), Extension('VK_EXT_direct_mode_display', 1, 'VK_USE_PLATFORM_DISPLAY_KHR'), diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index d7b94efd19..f07a6aa7c9 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -2982,7 +2982,6 @@ void genX(CmdDrawIndexedIndirect)( } } -#if GEN_IS_HASWELL || GEN_GEN >= 8 static void emit_draw_count_predicate(struct anv_cmd_buffer *cmd_buffer, struct anv_address count_address, @@ -2991,7 +2990,13 @@ emit_draw_count_predicate(struct anv_cmd_buffer *cmd_buffer, /* Upload the current draw count from the draw parameters buffer to * MI_PREDICATE_SRC0. */ +#if GEN_GEN >= 8 || GEN_IS_HASWELL emit_lrr(&cmd_buffer->batch, MI_PREDICATE_SRC0, CS_GPR(MI_ALU_REG14)); +#else + emit_lrm(&cmd_buffer->batch, MI_PREDICATE_SRC0, count_address); + /* Zero the top 32-bits of MI_PREDICATE_SRC0 */ + emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC0 + 4, 0); +#endif /* Upload the index of the current primitive to MI_PREDICATE_SRC1. 
*/ emit_lri(&cmd_buffer->batch, MI_PREDICATE_SRC1, draw_index); @@ -3050,8 +3055,10 @@ void genX(CmdDrawIndirectCountKHR)( pc.PipeControlFlushEnable = true; } +#if GEN_GEN >= 8 || GEN_IS_HASWELL emit_lrm(&cmd_buffer->batch, CS_GPR(MI_ALU_REG14), count_address); emit_lri(&cmd_buffer->batch, CS_GPR(MI_ALU_REG14) + 4, 0); +#endif for (uint32_t i = 0; i < maxDrawCount; i++) { struct anv_address draw = anv_address_add(buffer->address, offset); @@ -3108,8 +3115,10 @@ void genX(CmdDrawIndexedIndirectCountKHR)( pc.PipeControlFlushEnable = true; } +#if GEN_GEN >= 8 || GEN_IS_HASWELL emit_lrm(&cmd_buffer->batch, CS_GPR(MI_ALU_REG14), count_address); emit_lri(&cmd_buffer->batch, CS_GPR(MI_ALU_REG14) + 4, 0); +#endif for (uint32_t i = 0; i < maxDrawCount; i++) { struct anv_address draw = anv_address_add(buffer->address, offset); @@ -3135,7 +3144,6 @@ void genX(CmdDrawIndexedIndirectCountKHR)( offset += stride; } } -#endif static VkResult flush_compute_descriptor_set(struct anv_cmd_buffer *cmd_buffer) -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: Add support for VK_KHR_driver_properties.
This patch never landed in git, is that intentional? On Mon, 1 Oct 2018 at 17:46, Jason Ekstrand wrote: > On Sun, Sep 30, 2018 at 1:04 PM Bas Nieuwenhuizen > wrote: > >> --- >> src/amd/vulkan/radv_device.c | 27 +++ >> src/amd/vulkan/radv_extensions.py | 1 + >> 2 files changed, 28 insertions(+) >> >> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c >> index f7752eac83b..fe7e7f7f6ac 100644 >> --- a/src/amd/vulkan/radv_device.c >> +++ b/src/amd/vulkan/radv_device.c >> @@ -1196,6 +1196,33 @@ void radv_GetPhysicalDeviceProperties2( >> >> properties->conservativeRasterizationPostDepthCoverage = VK_FALSE; >> break; >> } >> + case >> VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES_KHR: { >> + VkPhysicalDeviceDriverPropertiesKHR *driver_props >> = >> + (VkPhysicalDeviceDriverPropertiesKHR *) >> ext; >> + >> + driver_props->driverID = >> VK_DRIVER_ID_MESA_RADV_KHR; >> + memset(driver_props->driverName, 0, >> VK_MAX_DRIVER_NAME_SIZE_KHR); >> + strcpy(driver_props->driverName, "radv"); >> + >> + memset(driver_props->driverInfo, 0, >> VK_MAX_DRIVER_INFO_SIZE_KHR); >> + snprintf(driver_props->driverInfo, >> VK_MAX_DRIVER_INFO_SIZE_KHR, >> + "Mesa " PACKAGE_VERSION >> +#ifdef MESA_GIT_SHA1 >> + " ("MESA_GIT_SHA1")" >> +#endif >> + " (LLVM %i.%i.%i)", >> > > I think %d is more customary, but I don't care. 
Assuming you actually > pass 1.1.0.2, > > Reviewed-by: Jason Ekstrand > > >> +(HAVE_LLVM >> 8) & 0xff, HAVE_LLVM & >> 0xff, >> +MESA_LLVM_VERSION_PATCH); >> + >> + driver_props->conformanceVersion = >> (VkConformanceVersionKHR) { >> + .major = 1, >> + .minor = 1, >> + .subminor = 0, >> + .patch = 2, >> + }; >> + break; >> + } >> + >> default: >> break; >> } >> diff --git a/src/amd/vulkan/radv_extensions.py >> b/src/amd/vulkan/radv_extensions.py >> index 584926df390..8df5da76ed5 100644 >> --- a/src/amd/vulkan/radv_extensions.py >> +++ b/src/amd/vulkan/radv_extensions.py >> @@ -59,6 +59,7 @@ EXTENSIONS = [ >> Extension('VK_KHR_device_group', 1, True), >> Extension('VK_KHR_device_group_creation', 1, True), >> Extension('VK_KHR_draw_indirect_count', 1, True), >> +Extension('VK_KHR_driver_properties', 1, True), >> Extension('VK_KHR_external_fence',1, >> 'device->rad_info.has_syncobj_wait_for_submit'), >> Extension('VK_KHR_external_fence_capabilities', 1, True), >> Extension('VK_KHR_external_fence_fd', 1, >> 'device->rad_info.has_syncobj_wait_for_submit'), >> -- >> 2.19.0 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
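The version arguments in the quoted snprintf call imply that HAVE_LLVM packs the major version in bits 8-15 and the minor version in the low byte. A minimal sketch of that decoding, assuming this packing (the helper names llvm_major and llvm_minor and the sample values in the usage note are illustrative, not taken from the patch):

```c
/* Decode a HAVE_LLVM-style packed version, assuming (major << 8) | minor.
 * Both helpers are hypothetical and only mirror the shifts in the patch. */
unsigned llvm_major(unsigned have_llvm)
{
   return (have_llvm >> 8) & 0xff;
}

unsigned llvm_minor(unsigned have_llvm)
{
   return have_llvm & 0xff;
}
```

Under that assumption, a value of 0x0700 decodes to major 7 and minor 0, which together with the patch level would fill the three LLVM format fields.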
Re: [Mesa-dev] [PATCH] nir: remove some redundant bcsel instructions
On 17/10/18 8:49 pm, Bas Nieuwenhuizen wrote:
> On Wed, Oct 17, 2018 at 5:49 AM Timothy Arceri wrote:
> >
> > For example:
> >
> >    vec1 32 ssa_386 = feq ssa_333.x, ssa_6
> >    vec1 32 ssa_387 = feq ssa_333.x, ssa_2
> >    vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324
> >    vec1 32 ssa_396 = bcsel ssa_386, ssa_324, ssa_391
> >
> > Can be simplified to:
> >
> >    vec1 32 ssa_386 = feq ssa_333.x, ssa_6
> >    vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324
> >
> > There are a bunch of these in Rise of the Tomb Raider's Vulkan
> > shaders. There are also a handful of shaders helped in shader-db
> > but the changes there are smaller.
> >
> > For RADV:
> >
> > Totals from affected shaders:
> > SGPRS: 11184 -> 11168 (-0.14 %)
> > VGPRS: 11484 -> 11484 (0.00 %)
> > Spilled SGPRs: 1119 -> 1116 (-0.27 %)
> > Spilled VGPRs: 0 -> 0 (0.00 %)
> > Private memory VGPRs: 0 -> 0 (0.00 %)
> > Scratch size: 0 -> 0 (0.00 %) dwords per thread
> > Code Size: 1210856 -> 1210372 (-0.04 %) bytes
> > LDS: 0 -> 0 (0.00 %) blocks
> > Max Waves: 360 -> 360 (0.00 %)
> > Wait states: 0 -> 0 (0.00 %)
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 4
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
> > index cc747250ba5..7530710cbe0 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -34,6 +34,7 @@ a = 'a'
> >  b = 'b'
> >  c = 'c'
> >  d = 'd'
> > +e = 'e'
> >
> >  # Written in the form (, ) where is an expression
> >  # and is either an expression or a value. An expression is
> > @@ -525,6 +526,9 @@ optimizations = [
> >     # The result of this should be hit by constant propagation and, in the
> >     # next round of opt_algebraic, get picked up by one of the above two.
> >     (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)),
> > +   # Remove redundant bcsel
> > +   (('bcsel', ('ieq', '#a', b), c, ('bcsel', ('ieq', '#d', b), e, c)), ('bcsel', ('ieq', d, b), e, c)),
>
> I think this only works if the value of a is not equal to the value of
> d? if a is equal to d, then the expression on the left is always c,
> while the expression on the right is e sometimes?

Hmm. I thought the search/matching code was smart enough to handle this, but looking at it, it seems I was wrong. I'll take a look tomorrow to see how hard it would be to handle this safely.

> > +   (('bcsel', ('feq', '#a', b), c, ('bcsel', ('feq', '#d', b), e, c)), ('bcsel', ('feq', d, b), e, c)),
> >
> >     (('bcsel', a, b, b), b),
> >     (('fcsel', a, b, b), b),
> > --
> > 2.17.1
> >
> > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
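The concern raised in the review can be reproduced with plain integers. The sketch below models bcsel as a ternary select and compares the pattern before and after the proposed rewrite; bcsel_before and bcsel_after are hypothetical names used only for this illustration, and the model covers only the 'ieq' variant.

```c
#include <stdint.h>

/* bcsel(cond, t, f) modeled as a ternary select. */
static int32_t bcsel(int cond, int32_t t, int32_t f)
{
   return cond ? t : f;
}

/* Left-hand side of the proposed rule:
 * bcsel(ieq(a, b), c, bcsel(ieq(d, b), e, c)) */
int32_t bcsel_before(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e)
{
   return bcsel(a == b, c, bcsel(d == b, e, c));
}

/* Right-hand side of the proposed rule: bcsel(ieq(d, b), e, c) */
int32_t bcsel_after(int32_t b, int32_t c, int32_t d, int32_t e)
{
   return bcsel(d == b, e, c);
}
```

With distinct constants, for example a = 1 and d = 2, both sides agree for every b. With a = d = 1 and b = 1, the original expression yields c while the rewritten one yields e, which is the mismatch pointed out above.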
[Mesa-dev] [PATCH] intel/compiler: fix node interference of simd16 instructions
SIMD16 instructions need additional interferences to prevent source/destination hazards when the source and destination registers are off by one register. We already have code to handle this, but it only ran for SIMD16 dispatches, and we can also have SIMD16 instructions in a SIMD8 dispatch. One example is pull constant loads since commit b56fa830c6095, but there are more cases. This fixes a number of CTS test failures found in work-in-progress tests that were hitting this situation for 16-wide pull constants in a SIMD8 program. --- src/intel/compiler/brw_fs_reg_allocate.cpp | 36 ++ 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp b/src/intel/compiler/brw_fs_reg_allocate.cpp index 42ccb28de6..f72826bc41 100644 --- a/src/intel/compiler/brw_fs_reg_allocate.cpp +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp @@ -632,26 +632,24 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all) } } - if (dispatch_width > 8) { - /* In 16-wide dispatch we have an issue where a compressed - * instruction is actually two instructions executed simultaneiously. - * It's actually ok to have the source and destination registers be - * the same. In this case, each instruction over-writes its own - * source and there's no problem. The real problem here is if the - * source and destination registers are off by one. Then you can end - * up in a scenario where the first instruction over-writes the - * source of the second instruction. Since the compiler doesn't know - * about this level of granularity, we simply make the source and - * destination interfere. - */ - foreach_block_and_inst(block, fs_inst, inst, cfg) { - if (inst->dst.file != VGRF) -continue; + /* In 16-wide instructions we have an issue where a compressed +* instruction is actually two instructions executed simultaneiously. +* It's actually ok to have the source and destination registers be +* the same. 
In this case, each instruction over-writes its own +* source and there's no problem. The real problem here is if the +* source and destination registers are off by one. Then you can end +* up in a scenario where the first instruction over-writes the +* source of the second instruction. Since the compiler doesn't know +* about this level of granularity, we simply make the source and +* destination interfere. +*/ + foreach_block_and_inst(block, fs_inst, inst, cfg) { + if (inst->exec_size < 16 || inst->dst.file != VGRF) + continue; - for (int i = 0; i < inst->sources; ++i) { -if (inst->src[i].file == VGRF) { - ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr); -} + for (int i = 0; i < inst->sources; ++i) { + if (inst->src[i].file == VGRF) { +ra_add_node_interference(g, inst->dst.nr, inst->src[i].nr); } } } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
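The hazard described in the patch comment can be illustrated with a toy register file. This is a deliberately simplified software model, assuming a compressed 16-wide instruction behaves like two 8-wide halves where the first half writes before the second half reads; all names here (exec_compressed_add1, compressed_add1_is_correct, LANES, NUM_REGS) are hypothetical and do not correspond to the real compiler or hardware.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8
#define NUM_REGS 8

/* Model of a compressed "dst = src + 1": two 8-wide halves, run in order.
 * Half 0 touches registers dst/src, half 1 touches dst + 1/src + 1. */
void exec_compressed_add1(uint32_t regs[NUM_REGS][LANES], int dst, int src)
{
   assert(dst >= 0 && dst + 1 < NUM_REGS);
   assert(src >= 0 && src + 1 < NUM_REGS);

   for (int half = 0; half < 2; half++)
      for (int lane = 0; lane < LANES; lane++)
         regs[dst + half][lane] = regs[src + half][lane] + 1;
}

/* Returns 1 if the two-half execution matches a true 16-wide result
 * computed from the original source values, 0 otherwise. */
int compressed_add1_is_correct(int dst, int src)
{
   uint32_t regs[NUM_REGS][LANES];
   uint32_t expected[2][LANES];

   for (int r = 0; r < NUM_REGS; r++)
      for (int lane = 0; lane < LANES; lane++)
         regs[r][lane] = (uint32_t)(r * 100 + lane);

   /* Snapshot what each half should produce before anything is written. */
   for (int half = 0; half < 2; half++)
      for (int lane = 0; lane < LANES; lane++)
         expected[half][lane] = regs[src + half][lane] + 1;

   exec_compressed_add1(regs, dst, src);

   for (int half = 0; half < 2; half++)
      for (int lane = 0; lane < LANES; lane++)
         if (regs[dst + half][lane] != expected[half][lane])
            return 0;
   return 1;
}
```

In this model, dst == src and fully disjoint registers both give correct results, while dst == src + 1 does not, because the first half overwrites the register the second half still needs to read. Making the source and destination nodes interfere keeps the allocator from choosing such an overlapping assignment.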
Re: [Mesa-dev] [PATCH] nir: remove some redundant bcsel instructions
On Wed, Oct 17, 2018 at 5:49 AM Timothy Arceri wrote: > > For example: > >vec1 32 ssa_386 = feq ssa_333.x, ssa_6 >vec1 32 ssa_387 = feq ssa_333.x, ssa_2 >vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324 >vec1 32 ssa_396 = bcsel ssa_386, ssa_324, ssa_391 > > Can be simplified to: > >vec1 32 ssa_386 = feq ssa_333.x, ssa_6 >vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324 > > There are a bunch of these in Rise of The Tomb Raiders Vulkan > shaders. There are also a hadful of shaders helped in shader-db > but the changes there are smaller. > > For RADV: > > Totals from affected shaders: > SGPRS: 11184 -> 11168 (-0.14 %) > VGPRS: 11484 -> 11484 (0.00 %) > Spilled SGPRs: 1119 -> 1116 (-0.27 %) > Spilled VGPRs: 0 -> 0 (0.00 %) > Private memory VGPRs: 0 -> 0 (0.00 %) > Scratch size: 0 -> 0 (0.00 %) dwords per thread > Code Size: 1210856 -> 1210372 (-0.04 %) bytes > LDS: 0 -> 0 (0.00 %) blocks > Max Waves: 360 -> 360 (0.00 %) > Wait states: 0 -> 0 (0.00 %) > --- > src/compiler/nir/nir_opt_algebraic.py | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index cc747250ba5..7530710cbe0 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -34,6 +34,7 @@ a = 'a' > b = 'b' > c = 'c' > d = 'd' > +e = 'e' > > # Written in the form (, ) where is an expression > # and is either an expression or a value. An expression is > @@ -525,6 +526,9 @@ optimizations = [ > # The result of this should be hit by constant propagation and, in the > # next round of opt_algebraic, get picked up by one of the above two. > (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)), > + # Remove redundant bcsel > + (('bcsel', ('ieq', '#a', b), c, ('bcsel', ('ieq', '#d', b), e, c)), > ('bcsel', ('ieq', d, b), e, c)), I think this only works if the value of a is not equal to the value of d? 
if a is equal to d, then the expression on the left is always c, while the expression on the right is e sometimes? > + (('bcsel', ('feq', '#a', b), c, ('bcsel', ('feq', '#d', b), e, c)), > ('bcsel', ('feq', d, b), e, c)), > > (('bcsel', a, b, b), b), > (('fcsel', a, b, b), b), > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 104302] Wolfenstein 2 (2017) under wine graphical artifacting on RADV
https://bugs.freedesktop.org/show_bug.cgi?id=104302 --- Comment #22 from Samuel Pitoiset --- Patch available here https://reviews.llvm.org/D53359 -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108105] [DXVK] Dauntless Helmets rendering incorrectly on Vega, works in AMDVLK
https://bugs.freedesktop.org/show_bug.cgi?id=108105 --- Comment #13 from Samuel Pitoiset --- Patch available here https://reviews.llvm.org/D53359 -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: remove some redundant bcsel instructions
Reviewed-by: Iago Toral Quiroga On Wed, 2018-10-17 at 14:49 +1100, Timothy Arceri wrote: > For example: > >vec1 32 ssa_386 = feq ssa_333.x, ssa_6 >vec1 32 ssa_387 = feq ssa_333.x, ssa_2 >vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324 >vec1 32 ssa_396 = bcsel ssa_386, ssa_324, ssa_391 > > Can be simplified to: > >vec1 32 ssa_386 = feq ssa_333.x, ssa_6 >vec1 32 ssa_391 = bcsel ssa_387, ssa_388, ssa_324 > > There are a bunch of these in Rise of The Tomb Raiders Vulkan > shaders. There are also a hadful of shaders helped in shader-db > but the changes there are smaller. > > For RADV: > > Totals from affected shaders: > SGPRS: 11184 -> 11168 (-0.14 %) > VGPRS: 11484 -> 11484 (0.00 %) > Spilled SGPRs: 1119 -> 1116 (-0.27 %) > Spilled VGPRs: 0 -> 0 (0.00 %) > Private memory VGPRs: 0 -> 0 (0.00 %) > Scratch size: 0 -> 0 (0.00 %) dwords per thread > Code Size: 1210856 -> 1210372 (-0.04 %) bytes > LDS: 0 -> 0 (0.00 %) blocks > Max Waves: 360 -> 360 (0.00 %) > Wait states: 0 -> 0 (0.00 %) > --- > src/compiler/nir/nir_opt_algebraic.py | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index cc747250ba5..7530710cbe0 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -34,6 +34,7 @@ a = 'a' > b = 'b' > c = 'c' > d = 'd' > +e = 'e' > > # Written in the form (, ) where is an > expression > # and is either an expression or a value. An expression > is > @@ -525,6 +526,9 @@ optimizations = [ > # The result of this should be hit by constant propagation and, > in the > # next round of opt_algebraic, get picked up by one of the above > two. 
> (('bcsel', '#a', b, c), ('bcsel', ('ine', 'a', 0), b, c)), > + # Remove redundant bcsel > + (('bcsel', ('ieq', '#a', b), c, ('bcsel', ('ieq', '#d', b), e, > c)), ('bcsel', ('ieq', d, b), e, c)), > + (('bcsel', ('feq', '#a', b), c, ('bcsel', ('feq', '#d', b), e, > c)), ('bcsel', ('feq', d, b), e, c)), > > (('bcsel', a, b, b), b), > (('fcsel', a, b, b), b), ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor
https://bugs.freedesktop.org/show_bug.cgi?id=108355 --- Comment #6 from Michel Dänzer --- Does it still happen with xf86-video-amdgpu 18.1.0? Does amdgpu.dc=0 on the kernel command line avoid the problem? -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108355] Civilization VI - Artifacts in mouse cursor
https://bugs.freedesktop.org/show_bug.cgi?id=108355 Michel Dänzer changed: What|Removed |Added Attachment #142059|text/x-log |text/plain mime type|| -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev