Re: [Mesa-dev] [PATCH 1/4] nv50: add target->hasDualIssueing()
On Sat, Aug 13, 2016 at 10:43 AM, Tobias Klausmannwrote: > > > > On 13.08.2016 12:02, Karol Herbst wrote: >> >> Signed-off-by: Karol Herbst >> --- >> src/gallium/drivers/nouveau/codegen/nv50_ir_target.h| 1 + >> src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp | 7 ++- >> src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.h | 1 + >> 3 files changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h >> index 4a701f7..485ca16 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target.h >> @@ -222,6 +222,7 @@ public: >>const Value *) const = 0; >>// whether @insn can be issued together with @next (order matters) >> + virtual bool hasDualIssueing() const { return false; } >> virtual bool canDualIssue(const Instruction *insn, >>const Instruction *next) const { return >> false; } >> virtual int getLatency(const Instruction *) const { return 1; } >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp >> index 04ac288..faf2121 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp >> @@ -605,12 +605,17 @@ int TargetNVC0::getThroughput(const Instruction *i) >> const >> } >> } >> +bool TargetNVC0::hasDualIssueing() const The correct spelling is "issuing". English can be so silly at times... 
>> +{ >> + return getChipset() >= 0xe4; >> +} >> + >> bool TargetNVC0::canDualIssue(const Instruction *a, const Instruction *b) >> const >> { >> const OpClass clA = operationClass[a->op]; >> const OpClass clB = operationClass[b->op]; >> - if (getChipset() >= 0xe4) { >> + if (hasDualIssueing()) { >> // not texturing >> // not if the 2nd instruction isn't necessarily executed >> if (clA == OPCLASS_TEXTURE || clA == OPCLASS_FLOW) >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.h >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.h >> index 7d11cd9..3d55da7 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.h >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.h >> @@ -57,6 +57,7 @@ public: >> virtual bool isPostMultiplySupported(operation, float, int& e) const; >> virtual bool mayPredicate(const Instruction *, const Value *) const; >> + virtual bool hasDualIssueing() const; >> virtual bool canDualIssue(const Instruction *, const Instruction *) >> const; >> virtual int getLatency(const Instruction *) const; >> virtual int getThroughput(const Instruction *) const; > > > Reviewed-by: Tobias Klausmann > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/6] nir: Turn -(b2f(a) + b2f(b) >= 0 into !(a || b).
> > > > For now, this patch is > > > > Reviewed-by: Ian Romanick
I had a hard time parsing the title, "Turn -(b2f(a) + b2f(b) >= 0 into !(a || b)", until I saw the replacement instructions. You're missing a ')' on the commit line. :)
Re: [Mesa-dev] [PATCH] mesa: Make TexSubImage check negative dimensions sooner.
Sorry, didn't CC mesa-dev, trying again... On Wed, Jun 8, 2016 at 4:11 PM, Kenneth Graunke wrote: > Two dEQP tests expect INVALID_VALUE errors for negative width/height > parameters, but get INVALID_OPERATION because they haven't actually > created a destination image. This is arguably not a bug in Mesa, as > there's no specified ordering of error conditions. > > However, it's also really easy to make the tests pass, and there's > no real harm in doing these checks earlier. > > Fixes: > dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_width_height > dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.texsubimage3d_neg_width_height > > Signed-off-by: Kenneth Graunke > --- > src/mesa/main/teximage.c | 68 > ++-- > 1 file changed, 49 insertions(+), 19 deletions(-) > > diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c > index 58b7f27..d4f8278 100644 > --- a/src/mesa/main/teximage.c > +++ b/src/mesa/main/teximage.c > @@ -1102,6 +1102,32 @@ _mesa_legal_texture_dimensions(struct gl_context *ctx, > GLenum target, > } > } > > +static bool > +error_check_subtexture_negative_dimensions(struct gl_context *ctx, > + GLuint dims, > + GLsizei subWidth, > + GLsizei subHeight, > + GLsizei subDepth, > + const char *func) > +{ > + /* Check size */ > + if (subWidth < 0) { > + _mesa_error(ctx, GL_INVALID_VALUE, "%s(width=%d)", func, subWidth); > + return true; > + } > + > + if (dims > 1 && subHeight < 0) { > + _mesa_error(ctx, GL_INVALID_VALUE, "%s(height=%d)", func, subHeight); > + return true; > + } > + > + if (dims > 2 && subDepth < 0) { > + _mesa_error(ctx, GL_INVALID_VALUE, "%s(depth=%d)", func, subDepth); > + return true; > + } > + What do you think of a structure like: switch(dims) { case 3: if(subDepth < 0) { ... } /* fall through */ case 2: if(subHeight < 0) { ... } /* fall through */ default: if(subWidth < 0) { ... } } return true; I think this would reduce the overall number of expressions to check. 
If you just want to check whether any are < 0, you can OR the sign bits: int result = 0; switch(dims) { case 3: result |= subDepth & (1 << 31); case 2: result |= subHeight & (1 << 31); default: result |= subWidth & (1 << 31); } return (bool)(result>>31); ...then later call that function to generate a more detailed error message about specifically which dimension was negative. > + return false; > +} > > /** > * Do error checking of xoffset, yoffset, zoffset, width, height and depth > @@ -1119,25 +1145,6 @@ error_check_subtexture_dimensions(struct gl_context > *ctx, GLuint dims, > const GLenum target = destImage->TexObject->Target; > GLuint bw, bh, bd; > > - /* Check size */ > - if (subWidth < 0) { > - _mesa_error(ctx, GL_INVALID_VALUE, > - "%s(width=%d)", func, subWidth); > - return GL_TRUE; > - } > - > - if (dims > 1 && subHeight < 0) { > - _mesa_error(ctx, GL_INVALID_VALUE, > - "%s(height=%d)", func, subHeight); > - return GL_TRUE; > - } > - > - if (dims > 2 && subDepth < 0) { > - _mesa_error(ctx, GL_INVALID_VALUE, > - "%s(depth=%d)", func, subDepth); > - return GL_TRUE; > - } > - > /* check xoffset and width */ > if (xoffset < - (GLint) destImage->Border) { >_mesa_error(ctx, GL_INVALID_VALUE, "%s(xoffset)", func); > @@ -2104,6 +2111,12 @@ texsubimage_error_check(struct gl_context *ctx, GLuint > dimensions, >return GL_TRUE; > } > > + if (error_check_subtexture_negative_dimensions(ctx, dimensions, > + width, height, depth, > + callerName)) { > + return GL_TRUE; > + } > + > texImage = _mesa_select_tex_image(texObj, target, level); > if (!texImage) { >/* non-existant texture level */ > @@ -2140,6 +2153,12 @@ texsubimage_error_check(struct gl_context *ctx, GLuint > dimensions, >return GL_TRUE; > } > > + if (error_check_subtexture_negative_dimensions(ctx, dimensions, > + width, height, depth, > + callerName)) { > + return GL_TRUE; > + } > + > if (error_check_subtexture_dimensions(ctx, dimensions, > texImage, xoffset, yoffset, zoffset, > width, height, depth, 
callerName)) { > @@ -2497,6 +2516,11 @@ copytexsubimage_error_check(struct gl_context *ctx,
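The sign-bit idea from the reply above can be sketched as follows. This is a minimal sketch, not Mesa code: `any_negative_dimension` is a hypothetical helper, fixed-width `int32_t` stands in for `GLsizei`, and the OR is done in unsigned arithmetic because `1 << 31` does not fit in a 32-bit signed `int`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of Mesa): returns true if any dimension
 * that is relevant for `dims` (1, 2 or 3) is negative. */
static bool
any_negative_dimension(unsigned dims, int32_t width, int32_t height,
                       int32_t depth)
{
   uint32_t bits = (uint32_t)width;

   assert(dims >= 1 && dims <= 3);

   if (dims > 1)
      bits |= (uint32_t)height;
   if (dims > 2)
      bits |= (uint32_t)depth;

   /* Bit 31 of the OR is set iff at least one operand was negative. */
   return (bits >> 31) != 0;
}
```

A caller would still need the per-dimension checks afterwards to report which value was negative, as the reply suggests.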
Re: [Mesa-dev] Patchwork review process (efficiency) questions
> I will point out a couple notes/observations: > > Kernel (drm/dri-devel), xorg, and other related projects use the same > process, and a lot of us do (or at least at some point have) been > active in 2 or more of these. > > Also, I have seen/used some other processes (gerrit, github pulls, > etc).. and IMO on those projects the review process ended up being a > lot more rubber-stamping and less thorough review of the changes. > There is some value in not making things too "push-button".. What are people's opinions on patchwork? I'm a regular reader but not a contributor. I find the interface appealing, and overall it is not too difficult to see recently submitted patches. Is it slower (workflow-wise) or less convenient to use than email? Or are there certain use-cases that just don't work? -- Patrick
Re: [Mesa-dev] Discussion: C++11 std::future in Mesa
> > > No. Shader compilation can only be asynchronous if it's far enough > from a draw call and the app doesn't query its status. If it's next to > a draw call, multithreading is useless. Completely useless. > I don't know a lot about the shader compilation/linking process, so I'm just asking this for my own benefit. I read that the optimizations take a long time. Is it possible to create a sort of -O0 version of the shader while the real version is generated by some thread pool? Or would there be some shaders that would just fail to run unless optimization took place (and the developers count on that)? > We need to get below 33 ms for all shaders needed to be compiled to > render a frame. If there are 10 VS and 10 PS, one shader must be > compiled within 1.65 ms on average. I don't see where your random > guess meets that goal. > > Marek
Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: don't unconditionally use __builtin_bswap32
On Mon, Apr 18, 2016 at 9:31 PM, Jonathan Gray wrote: > Use the defines Mesa configure sets to indicate presence of the bswap32 > builtins. This lets i965 work on OpenBSD again after the changes that > were made in 0a5d8d9af42fd77fce1492d55f958da97816961a. > > Signed-off-by: Jonathan Gray > --- > src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > index a549854..c888e46 100644 > --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > @@ -64,6 +64,19 @@ ror(uint32_t n, uint32_t d) > return (n >> d) | (n << (32 - d)); > } > > +static inline uint32_t > +bswap32(uint32_t n) > +{ > +#if defined(HAVE___BUILTIN_BSWAP32) > + return __builtin_bswap32(n); > +#else > + return (n >> 24) | > + ((n >> 8) & 0x0000ff00) | > + ((n << 8) & 0x00ff0000) | > + (n << 24); > +#endif > +} > If I recall, GCC recognizes an open-coded byte-swapping function and will replace it with the BSWAP instruction. I'm about 99% sure it is not necessary to use __builtin_bswap32() to get the benefits of BSWAP. While I understand that you're trying to fix the use of __builtin_bswap32(), I don't think it is really necessary to continue to use it in your wrapper function. I'm not sure about -O0 though... anyway, maybe it isn't worth looking too hard into, but you might be able to drop some of the ugly #if defined() stuff. > + > /** > * Copy RGBA to BGRA - swap R and B. 
> */ > @@ -76,7 +89,7 @@ rgba8_copy(void *dst, const void *src, size_t bytes) > assert(bytes % 4 == 0); > > while (bytes >= 4) { > - *d = ror(__builtin_bswap32(*s), 8); > + *d = ror(bswap32(*s), 8); >d += 1; >s += 1; >bytes -= 4; > -- > 2.8.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
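For reference, the open-coded swap discussed above can be written without any builtin. This is a minimal sketch under stated assumptions: `bswap32_portable`, `ror32` and `rgba_to_bgra` are illustrative names rather than the patch's functions, and whether a given compiler turns the shifts into a single BSWAP instruction is something to confirm in the generated assembly, not something this code guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Portable byte swap with the masks written out in full. */
static inline uint32_t
bswap32_portable(uint32_t n)
{
   return (n >> 24) |
          ((n >> 8) & 0x0000ff00u) |
          ((n << 8) & 0x00ff0000u) |
          (n << 24);
}

/* Rotate right, same shape as the ror() helper in the quoted patch.
 * d must be in 1..31 so that neither shift uses the full width. */
static inline uint32_t
ror32(uint32_t n, uint32_t d)
{
   assert(d > 0 && d < 32);
   return (n >> d) | (n << (32 - d));
}

/* RGBA -> BGRA for one 32-bit pixel, as rgba8_copy() does per word. */
static inline uint32_t
rgba_to_bgra(uint32_t rgba)
{
   return ror32(bswap32_portable(rgba), 8);
}
```

On a little-endian machine the word 0x44332211 holds the bytes R=0x11, G=0x22, B=0x33, A=0x44 in memory order; after the swap and rotate it becomes 0x44112233, which is B, G, R, A in memory order.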
Re: [Mesa-dev] [PATCH] nir: Use double-precision pow() when bit_size is 64, powf() otherwise
On Mon, Mar 28, 2016 at 1:58 PM, Patrick Baggett <baggett.patr...@gmail.com> wrote: >> What are the rules in C when you compare a double >> variable with a single constant? >> >> void foo(double d) >> { >> /* Does d get converted to single, or does 0.0f get converted to >> * double? >> */ >> if (d == 0.0f) >> printf("zero\n"); >> } > > The 0.0f is converted to a double. One site [1] has a likely looking > reference. :) Sadly, I don't know how to check the C spec directly (I > think that it is not free). > > [1] https://www.eskimo.com/~scs/cclass/int/sx4cb.html Never mind, the spec is available; I found the link via Wikipedia. From 6.3.1.8 "Usual arithmetic conversions", paragraph 1: "Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double." So yes, 100% sure that it is promoted to a double.
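The conversion rule quoted above can be observed with a value that a float cannot represent. This is a small sketch, and `is_zero` and `check_promotion` are illustrative names only: if `d` were narrowed to float, 1e-300 would round to zero and the comparison would succeed; because `0.0f` is widened to double instead, it does not.

```c
#include <assert.h>
#include <stdbool.h>

static bool
is_zero(double d)
{
   /* 0.0f is converted to double here; d is not narrowed to float. */
   return d == 0.0f;
}

static void
check_promotion(void)
{
   /* 1e-300 is far below the smallest float but is a normal double. */
   assert(!is_zero(1e-300));
   assert(is_zero(0.0));
   assert(is_zero(-0.0));
}
```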
Re: [Mesa-dev] [PATCH] nir: Use double-precision pow() when bit_size is 64, powf() otherwise
> What are the rules in C when you compare a double > variable with a single constant? > > void foo(double d) > { > /* Does d get converted to single, or does 0.0f get converted to > * double? > */ > if (d == 0.0f) > printf("zero\n"); > } The 0.0f is converted to a double. One site [1] has a likely looking reference. :) Sadly, I don't know how to check the C spec directly (I think that it is not free). [1] https://www.eskimo.com/~scs/cclass/int/sx4cb.html
Re: [Mesa-dev] [PATCH 08/10] nir: Simplify 0 < fabs(a)
On Fri, Mar 11, 2016 at 10:21 AM, Ian Romanick <i...@freedesktop.org> wrote: > On 03/10/2016 01:24 PM, Patrick Baggett wrote: >> On Thu, Mar 10, 2016 at 3:08 PM, Patrick Baggett >> <baggett.patr...@gmail.com> wrote: >>> On Thu, Mar 10, 2016 at 12:25 PM, Ian Romanick <i...@freedesktop.org> wrote: >>>> From: Ian Romanick <ian.d.roman...@intel.com> >>>> >>>> Sandy Bridge / Ivy Bridge / Haswell >>>> total instructions in shared programs: 8462180 -> 8462174 (-0.00%) >>>> instructions in affected programs: 564 -> 558 (-1.06%) >>>> helped: 6 >>>> HURT: 0 >>>> >>>> total cycles in shared programs: 117542462 -> 117542276 (-0.00%) >>>> cycles in affected programs: 9768 -> 9582 (-1.90%) >>>> helped: 12 >>>> HURT: 0 >>>> >>>> Broadwell / Skylake >>>> total instructions in shared programs: 8980833 -> 8980826 (-0.00%) >>>> instructions in affected programs: 626 -> 619 (-1.12%) >>>> helped: 7 >>>> HURT: 0 >>>> >>>> total cycles in shared programs: 70077900 -> 70077714 (-0.00%) >>>> cycles in affected programs: 9378 -> 9192 (-1.98%) >>>> helped: 12 >>>> HURT: 0 >>>> >>>> G45 and Ironlake showed no change. >>>> >>>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> >>>> --- >>>> src/compiler/nir/nir_opt_algebraic.py | 5 + >>>> 1 file changed, 5 insertions(+) >>>> >>>> diff --git a/src/compiler/nir/nir_opt_algebraic.py >>>> b/src/compiler/nir/nir_opt_algebraic.py >>>> index 4db8f84..1442ce8 100644 >>>> --- a/src/compiler/nir/nir_opt_algebraic.py >>>> +++ b/src/compiler/nir/nir_opt_algebraic.py >>>> @@ -108,6 +108,11 @@ optimizations = [ >>>> # inot(a) >>>> (('fge', 0.0, ('b2f', a)), ('inot', a)), >>>> >>>> + # 0.0 < fabs(a) >>>> + # 0.0 != fabs(a) because fabs(a) must be >= 0 >>> I think this is wrong. Because >= 0.0 can mean that fabs(a) == 0.0 for >>> some a, you can't say then fabs(a) != 0.0. 
>>> >>> Then, the counter-example is when a = 0.0 >>> >>> 1) 0.0 != fabs(0.0) >>> 2) 0.0 != 0.0 >>> >> Rather, I mean the comment is wrong, but the conclusion that: >> 0 < fabs(a) <-> a != 0.0 >> is correct. You can just build a truth table or just observe that when >> a == 0, 0 < 0 is false, and >> when a != 0.0, fabs(a) will be > 0, so 0 < fabs(a) will be always true. > > How about if I change it to > ># 0.0 != fabs(a) Since fabs(a) >= 0, 0 <= fabs(a) must be true > > I think it's trivial to see how to get from "0 < fabs(a)" to "0 != > fabs(a)" based on that. Yeah, I think what gave me a pause when I read was "0.0 != fabs(a)", because that's not a general mathematical truth unless qualified by "a != 0.0". I don't have any particularly strong feelings about the wording. I personally didn't reason about it using (in)equalities at all. My logic was mostly based on domain analysis of the expression: let p(a) := 0 < fabs(a) p(0) <-> false p(a) <-> true, for any other value of a therefore p(a) <-> true when a != 0.0 therefore p(a) <-> a != 0 It's up to you. > >>>> + # 0.0 != a >>>> + (('flt', 0.0, ('fabs', a)), ('fne', a, 0.0)), >>>> + >>>> (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)), >>>> (('bcsel', ('flt', a, b), a, b), ('fmin', a, b)), >>>> (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)), >>>> -- >>>> 2.5.0 >>>> >>>> ___ >>>> mesa-dev mailing list >>>> mesa-dev@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
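The domain argument above (false at zero, true for every other value) can be spot-checked in plain C. This is only a sketch of the reasoning, not of NIR itself: the two helpers are hypothetical stand-ins for `flt(0.0, fabs(a))` and `fne(a, 0.0)`, and the sample list is arbitrary.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Left-hand side of the rule: flt(0.0, fabs(a)). */
static bool
zero_less_than_abs(float a)
{
   return 0.0f < fabsf(a);
}

/* Right-hand side of the rule: fne(a, 0.0). */
static bool
not_equal_zero(float a)
{
   return a != 0.0f;
}

/* Checks that both sides agree on a handful of non-NaN inputs. */
static void
check_non_nan_samples(void)
{
   const float samples[] = {
      0.0f, -0.0f, 1.0f, -1.0f, FLT_MIN, -FLT_MIN, FLT_MAX, -FLT_MAX,
      INFINITY, -INFINITY,
   };

   for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
      assert(zero_less_than_abs(samples[i]) == not_equal_zero(samples[i]));
}
```

One caveat that the thread does not discuss: in C with IEEE 754 semantics the two sides differ for a NaN input, since `0.0f < fabsf(NAN)` is false while `NAN != 0.0f` is true. Whether that matters for the NIR rule depends on how NIR defines these comparisons for NaN.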
Re: [Mesa-dev] [PATCH 08/10] nir: Simplify 0 < fabs(a)
On Thu, Mar 10, 2016 at 3:08 PM, Patrick Baggett <baggett.patr...@gmail.com> wrote: > On Thu, Mar 10, 2016 at 12:25 PM, Ian Romanick <i...@freedesktop.org> wrote: >> From: Ian Romanick <ian.d.roman...@intel.com> >> >> Sandy Bridge / Ivy Bridge / Haswell >> total instructions in shared programs: 8462180 -> 8462174 (-0.00%) >> instructions in affected programs: 564 -> 558 (-1.06%) >> helped: 6 >> HURT: 0 >> >> total cycles in shared programs: 117542462 -> 117542276 (-0.00%) >> cycles in affected programs: 9768 -> 9582 (-1.90%) >> helped: 12 >> HURT: 0 >> >> Broadwell / Skylake >> total instructions in shared programs: 8980833 -> 8980826 (-0.00%) >> instructions in affected programs: 626 -> 619 (-1.12%) >> helped: 7 >> HURT: 0 >> >> total cycles in shared programs: 70077900 -> 70077714 (-0.00%) >> cycles in affected programs: 9378 -> 9192 (-1.98%) >> helped: 12 >> HURT: 0 >> >> G45 and Ironlake showed no change. >> >> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> >> --- >> src/compiler/nir/nir_opt_algebraic.py | 5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py >> b/src/compiler/nir/nir_opt_algebraic.py >> index 4db8f84..1442ce8 100644 >> --- a/src/compiler/nir/nir_opt_algebraic.py >> +++ b/src/compiler/nir/nir_opt_algebraic.py >> @@ -108,6 +108,11 @@ optimizations = [ >> # inot(a) >> (('fge', 0.0, ('b2f', a)), ('inot', a)), >> >> + # 0.0 < fabs(a) >> + # 0.0 != fabs(a) because fabs(a) must be >= 0 > I think this is wrong. Because >= 0.0 can mean that fabs(a) == 0.0 for > some a, you can't say then fabs(a) != 0.0. > > Then, the counter-example is when a = 0.0 > > 1) 0.0 != fabs(0.0) > 2) 0.0 != 0.0 > Rather, I mean the comment is wrong, but the conclusion that: 0 < fabs(a) <-> a != 0.0 is correct. You can just build a truth table or just observe that when a == 0, 0 < 0 is false, and when a != 0.0, fabs(a) will be > 0, so 0 < fabs(a) will be always true. 
>> + # 0.0 != a > > > > >> + (('flt', 0.0, ('fabs', a)), ('fne', a, 0.0)), >> + >> (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)), >> (('bcsel', ('flt', a, b), a, b), ('fmin', a, b)), >> (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)), >> -- >> 2.5.0 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/10] nir: Simplify 0 < fabs(a)
On Thu, Mar 10, 2016 at 12:25 PM, Ian Romanickwrote: > From: Ian Romanick > > Sandy Bridge / Ivy Bridge / Haswell > total instructions in shared programs: 8462180 -> 8462174 (-0.00%) > instructions in affected programs: 564 -> 558 (-1.06%) > helped: 6 > HURT: 0 > > total cycles in shared programs: 117542462 -> 117542276 (-0.00%) > cycles in affected programs: 9768 -> 9582 (-1.90%) > helped: 12 > HURT: 0 > > Broadwell / Skylake > total instructions in shared programs: 8980833 -> 8980826 (-0.00%) > instructions in affected programs: 626 -> 619 (-1.12%) > helped: 7 > HURT: 0 > > total cycles in shared programs: 70077900 -> 70077714 (-0.00%) > cycles in affected programs: 9378 -> 9192 (-1.98%) > helped: 12 > HURT: 0 > > G45 and Ironlake showed no change. > > Signed-off-by: Ian Romanick > --- > src/compiler/nir/nir_opt_algebraic.py | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index 4db8f84..1442ce8 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -108,6 +108,11 @@ optimizations = [ > # inot(a) > (('fge', 0.0, ('b2f', a)), ('inot', a)), > > + # 0.0 < fabs(a) > + # 0.0 != fabs(a) because fabs(a) must be >= 0 I think this is wrong. Because >= 0.0 can mean that fabs(a) == 0.0 for some a, you can't say then fabs(a) != 0.0. Then, the counter-example is when a = 0.0 1) 0.0 != fabs(0.0) 2) 0.0 != 0.0 > + # 0.0 != a > + (('flt', 0.0, ('fabs', a)), ('fne', a, 0.0)), > + > (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)), > (('bcsel', ('flt', a, b), a, b), ('fmin', a, b)), > (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)), > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/10] glsl: fix new gcc6 warnings
On Wed, Feb 17, 2016 at 3:35 PM, Rob Clarkwrote: > src/compiler/glsl/lower_discard_flow.cpp:79:1: warning: ‘ir_visitor_status > {anonymous}::lower_discard_flow_visitor::visit_enter(ir_loop_jump*)’ defined > but not used [-Wunused-function] > lower_discard_flow_visitor::visit_enter(ir_loop_jump *ir) > ^~ > > The base class method that was intended to be overridden was > 'visit(ir_loop_jump *ir)', not visit_entire(). > Has there been a discussion about using the "override" keyword (C++11)? It sounds like it could catch bugs like this, and if hidden behind a #define, act as a no-op when C++11 is not supported. Although obviously the new gcc6 warning is effectively doing much the same thing... > Signed-off-by: Rob Clark > --- > src/compiler/glsl/lower_discard_flow.cpp | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/glsl/lower_discard_flow.cpp > b/src/compiler/glsl/lower_discard_flow.cpp > index 9d0a56b..9e3a7c0 100644 > --- a/src/compiler/glsl/lower_discard_flow.cpp > +++ b/src/compiler/glsl/lower_discard_flow.cpp > @@ -62,8 +62,8 @@ public: > { > } > > + ir_visitor_status visit(ir_loop_jump *ir); > ir_visitor_status visit_enter(ir_discard *ir); > - ir_visitor_status visit_enter(ir_loop_jump *ir); > ir_visitor_status visit_enter(ir_loop *ir); > ir_visitor_status visit_enter(ir_function_signature *ir); > > @@ -76,7 +76,7 @@ public: > } /* anonymous namespace */ > > ir_visitor_status > -lower_discard_flow_visitor::visit_enter(ir_loop_jump *ir) > +lower_discard_flow_visitor::visit(ir_loop_jump *ir) > { > if (ir->mode != ir_loop_jump::jump_continue) >return visit_continue; > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Bug 27512] Illegal instruction _mesa_x86_64_transform_points4_general
Given that there is a _mesa_3dnow_transform_points4_2d in the x86-64 asm (using MMX/3DNow! is deprecated in x86-64), it appears that this code was copy-pasted. I wrote a quick patch to change prefetch[w] to prefetcht1, which is more or less the equivalent in SSE. However, I'm not actually sure those prefetches really benefit the code since they appear to be monotonic addresses and hinting only 16 bytes ahead (a cache line is almost always at least 32 bytes) -- maybe that sort of testing is for another day.
Re: [Mesa-dev] [PATCH 3/3] i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
On Fri, Oct 23, 2015 at 10:55 AM, Eduardo Lima Mitev wrote: > When both fadd and fmul instructions have at least one operand that is a > constant and it is only used once, the total number of instructions can > be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because > the constants will be propagated as immediate operands of fmul and fadd. > > This patch detects these situations and prevents fusing fmul+fadd into > ffma. > > Shader-db results on i965 Haswell: > > total instructions in shared programs: 6235835 -> 6225895 (-0.16%) > instructions in affected programs: 1124094 -> 1114154 (-0.88%) > total loops in shared programs:1979 -> 1979 (0.00%) > helped:7612 > HURT: 843 > GAINED:4 > LOST: 0 > --- > .../drivers/dri/i965/brw_nir_opt_peephole_ffma.c | 31 > ++ > 1 file changed, 31 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_nir_opt_peephole_ffma.c > b/src/mesa/drivers/dri/i965/brw_nir_opt_peephole_ffma.c > index a8448e7..c7fc15a 100644 > --- a/src/mesa/drivers/dri/i965/brw_nir_opt_peephole_ffma.c > +++ b/src/mesa/drivers/dri/i965/brw_nir_opt_peephole_ffma.c > @@ -133,6 +133,28 @@ get_mul_for_src(nir_alu_src *src, int num_components, > return alu; > } > > +/** > + * Given a list of (at least two) nir_alu_src's, tells if any of them is a > + * constant value and is used only once. > + */ > +static bool > +any_alu_src_is_a_constant(nir_alu_src srcs[]) > +{ > + for (unsigned i = 0; i < 2; i++) { > + if (srcs[i].src.ssa->parent_instr->type == > nir_instr_type_load_const) { > + nir_load_const_instr *load_const = > +nir_instr_as_load_const (srcs[i].src.ssa->parent_instr); > + > + if (list_is_single(&load_const->def.uses) && > + list_empty(&load_const->def.if_uses)) { > +return true; > + } > + } > + } > + > + return false; > +} > + > The comment above this function reads "Given a list of (at least two) nir_alu_src's...", but the function checks exactly two. Was it your intention to support lists with size > 2? 
> static bool > brw_nir_opt_peephole_ffma_block(nir_block *block, void *void_state) > { > @@ -183,6 +205,15 @@ brw_nir_opt_peephole_ffma_block(nir_block *block, > void *void_state) >mul_src[0] = mul->src[0].src.ssa; >mul_src[1] = mul->src[1].src.ssa; > > + /* If any of the operands of the fmul and any of the fadd is a > constant, > + * we bypass because it will be more efficient as the constants > will be > + * propagated as operands, potentially saving two load_const > instructions. > + */ > + if (any_alu_src_is_a_constant(mul->src) && > + any_alu_src_is_a_constant(add->src)) { > + continue; > + } > + >if (abs) { > for (unsigned i = 0; i < 2; i++) { > nir_alu_instr *abs = nir_alu_instr_create(state->mem_ctx, > -- > 2.5.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/os: add os_wait_until_zero
On Fri, Jun 26, 2015 at 11:40 AM, Marek Olšák mar...@gmail.com wrote:
If p_atomic_read is fine, then this patch is fine too. So you're telling that this should work: while (p_atomic_read(var)); I wouldn't be concerned about a memory barrier. This is only 1 int, so it should make its way into the shared cache eventually.
Yes, it does make it to the shared cache, but the assumption is that the compiler will actually generate code to check the memory location more than once. I've personally been bitten by this assumption - it's a bad one. Ilia is right. If you have a variable that doesn't appear to be modified at all, but you, the programmer, know it will be modified by another thread, you're asking for an infinite loop. The only guarantee you get is that if this code ran in isolation on a single thread, it will do what you told it to. Consider even a trivial transformation: while(1) { if(var == 0) break; } The compiler can optimize this to a single statement: if(var != 0) infinite_loop(); ...because it produces the same results as the above code when run in isolation. However, if 'var' is volatile, it cannot assume that the value will remain the same and cannot apply this optimization. What's more fun is that debug mode tends to not apply these sorts of optimizations, so your code hangs in release builds, and when you check the memory location, you can see that it has been updated. Commence tearing hair out. Then you look at the assembly and hit your head on the desk. Or something like that. ;) Patrick
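The point about re-reading the variable can be sketched in a single file. To stay self-contained, this sketch swaps the second thread for a POSIX timer signal, which is the case where `volatile sig_atomic_t` is the standard-sanctioned tool; all names here are illustrative and none come from Mesa. For the multi-threaded case in the patch, `volatile` only forces the reload and provides no ordering or atomicity guarantees, so an atomic read (for example C11 `<stdatomic.h>`) would be the more portable choice.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

/* Modified asynchronously by the signal handler. `volatile` forces the
 * compiler to reload it on every loop iteration. */
static volatile sig_atomic_t pending = 1;

static void
on_alarm(int sig)
{
   (void)sig;
   pending = 0;
}

/* Spins until the handler clears the flag. Without `volatile`, the
 * compiler may hoist the load out of the loop and spin forever. */
static void
wait_until_zero(void)
{
   while (pending != 0)
      ;
}

static void
run_demo(void)
{
   /* One-shot timer: deliver SIGALRM once, after 10 ms. */
   struct itimerval timer = {
      .it_interval = { .tv_sec = 0, .tv_usec = 0 },
      .it_value = { .tv_sec = 0, .tv_usec = 10000 },
   };
   int rc;

   signal(SIGALRM, on_alarm);
   rc = setitimer(ITIMER_REAL, &timer, NULL);
   assert(rc == 0);

   wait_until_zero();
}
```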
Re: [Mesa-dev] [PATCH] util: Unbreak usage of assert()/debug_assert() inside expressions.
On Fri, Dec 12, 2014 at 10:17 AM, Roland Scheidegger srol...@vmware.com wrote: On 12.12.2014 at 15:09, Jose Fonseca wrote: From: José Fonseca jfons...@vmware.com f0ba7d897d1c22202531acb70f134f2edc30557d made debug_assert()/assert() unsafe for expressions, but only now with u_atomic.h started to rely on them for Windows this became an issue. This fixes non-debug builds with MSVC. --- src/gallium/auxiliary/util/u_debug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/util/u_debug.h b/src/gallium/auxiliary/util/u_debug.h index badd5e2..4c22fdf 100644 --- a/src/gallium/auxiliary/util/u_debug.h +++ b/src/gallium/auxiliary/util/u_debug.h @@ -185,7 +185,7 @@ void _debug_assert_fail(const char *expr, #ifdef DEBUG #define debug_assert(expr) ((expr) ? (void)0 : _debug_assert_fail(#expr, __FILE__, __LINE__, __FUNCTION__)) #else -#define debug_assert(expr) do { } while (0 && (expr)) +#define debug_assert(expr) (void)(0 && (expr)) #endif Just for my own education, can someone explain what the need is for `debug_assert()` to have any expansion of `expr` at all? Rather, what breaks with something like: #define debug_assert(expr) ((void)0)
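One commonly cited reason for keeping `expr` in the release expansion, offered here as a sketch and not as the patch authors' stated motivation, is that `0 && (expr)` keeps the expression parsed and type-checked and keeps its variables "used", while short-circuit evaluation guarantees it never runs. The macro and function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical release-mode variants, for comparison only. */
#define ASSERT_KEEP_EXPR(expr) ((void)(0 && (expr)))
#define ASSERT_DROP_EXPR(expr) ((void)0)

static int calls;

static int
bump(void)
{
   return ++calls;
}

static int
clamp_index(int index, int count)
{
   int in_range = (index >= 0 && index < count);

   /* `in_range` is still referenced, so it does not become an unused
    * variable in release builds, as it would with ASSERT_DROP_EXPR. */
   ASSERT_KEEP_EXPR(in_range);

   /* Unlike a do/while statement, this form also works inside an
    * expression, here as the left operand of the comma operator. */
   return (ASSERT_KEEP_EXPR(count > 0),
           index < 0 ? 0 : (index >= count ? count - 1 : index));
}

static void
check_not_evaluated(void)
{
   /* bump() is compiled and type-checked but never called. */
   ASSERT_KEEP_EXPR(bump());
   assert(calls == 0);
}
```

With the `((void)0)` variant nothing breaks functionally, but a typo inside an assertion would only surface in debug builds, and variables used solely in assertions can trigger unused-variable warnings in release builds.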
Re: [Mesa-dev] [PATCH 09/29] mesa: Add _mesa_swap2_copy and _mesa_swap4_copy
The restrict keyword is a C99 thing and I don't think it's supported in MSVC so that would be a problem. If it won't build with MSVC then it's a non-starter. If MSVC can handle restrict, then I don't know that I care much either way about 2 functions or 4
MSVC uses __restrict, which functions identically -- but if there doesn't already exist a #define around this MSVC-ism, then I guess it may be more work than Iago was really signing up for. But it does exist.
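A wrapper of the kind mentioned above could look like the following. This is a minimal sketch: `SKETCH_RESTRICT`, `swap2_copy` and `swap2_inplace` are made-up names for illustration, not Mesa's actual macro or functions, and the compiler checks are an assumption about a reasonable layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper macro; the name is not taken from Mesa. */
#if defined(_MSC_VER)
#  define SKETCH_RESTRICT __restrict
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define SKETCH_RESTRICT restrict
#elif defined(__GNUC__)
#  define SKETCH_RESTRICT __restrict__
#else
#  define SKETCH_RESTRICT
#endif

/* Copying variant: dst and src are promised not to overlap. */
static void
swap2_copy(uint16_t *SKETCH_RESTRICT dst,
           const uint16_t *SKETCH_RESTRICT src, size_t n)
{
   assert(n == 0 || dst != src);
   for (size_t i = 0; i < n; i++)
      dst[i] = (uint16_t)((src[i] >> 8) | ((src[i] << 8) & 0xff00));
}

/* In-place variant: cannot be written as swap2_copy(p, p, n), because
 * that would break the no-overlap promise made by restrict. */
static void
swap2_inplace(uint16_t *p, size_t n)
{
   for (size_t i = 0; i < n; i++)
      p[i] = (uint16_t)((p[i] >> 8) | ((p[i] << 8) & 0xff00));
}
```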
Re: [Mesa-dev] [PATCH 09/29] mesa: Add _mesa_swap2_copy and _mesa_swap4_copy
On Tue, Nov 18, 2014 at 3:23 AM, Iago Toral Quiroga ito...@igalia.com wrote: We have _mesa_swap{2,4} but these do in-place byte-swapping only. The new functions receive an extra parameter so we can swap bytes on a source input array and store the results in a (possibly different) destination array. If this is being split into an in-place and different pointers version, I think using the restrict keyword would be useful here to improve the performance. Then, the in-place one cannot be implemented as copy(p,p,n), but the code isn't overly complicated. This is useful to implement byte-swapping in pixel uploads, since in this case we need to swap bytes on the src data which is owned by the application so we can't do an in-place byte swap. --- src/mesa/main/image.c | 25 + src/mesa/main/image.h | 10 -- 2 files changed, 25 insertions(+), 10 deletions(-) diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c index 4ea5f04..9ad97c5 100644 --- a/src/mesa/main/image.c +++ b/src/mesa/main/image.c @@ -41,36 +41,45 @@ /** - * Flip the order of the 2 bytes in each word in the given array. + * Flip the order of the 2 bytes in each word in the given array (src) and + * store the result in another array (dst). For in-place byte-swapping this + * function can be called with the same array for src and dst. * - * \param p array. + * \param dst the array where byte-swapped data will be stored. + * \param src the array with the source data we want to byte-swap. * \param n number of words. */ void -_mesa_swap2( GLushort *p, GLuint n ) +_mesa_swap2_copy( GLushort *dst, GLushort *src, GLuint n ) { GLuint i; for (i = 0; i n; i++) { - p[i] = (p[i] 8) | ((p[i] 8) 0xff00); + dst[i] = (src[i] 8) | ((src[i] 8) 0xff00); } } /* - * Flip the order of the 4 bytes in each word in the given array. + * Flip the order of the 4 bytes in each word in the given array (src) and + * store the result in another array (dst). 
For in-place byte-swapping this + * function can be called with the same array for src and dst. + * + * \param dst the array where byte-swapped data will be stored. + * \param src the array with the source data we want to byte-swap. + * \param n number of words. */ void -_mesa_swap4( GLuint *p, GLuint n ) +_mesa_swap4_copy( GLuint *dst, GLuint *src, GLuint n ) { GLuint i, a, b; for (i = 0; i n; i++) { - b = p[i]; + b = src[i]; a = (b 24) | ((b 8) 0xff00) | ((b 8) 0xff) | ((b 24) 0xff00); - p[i] = a; + dst[i] = a; } } diff --git a/src/mesa/main/image.h b/src/mesa/main/image.h index abd84bf..79c6e68 100644 --- a/src/mesa/main/image.h +++ b/src/mesa/main/image.h @@ -33,10 +33,16 @@ struct gl_context; struct gl_pixelstore_attrib; extern void -_mesa_swap2( GLushort *p, GLuint n ); +_mesa_swap2_copy( GLushort *dst, GLushort *src, GLuint n ); extern void -_mesa_swap4( GLuint *p, GLuint n ); +_mesa_swap4_copy( GLuint *dst, GLuint *src, GLuint n ); + +static inline void +_mesa_swap2( GLushort *p, GLuint n ) { _mesa_swap2_copy(p, p, n); } + +static inline void +_mesa_swap4( GLuint *p, GLuint n ) { _mesa_swap4_copy(p, p, n); } extern GLintptr _mesa_image_offset( GLuint dimensions, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
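A minimal sketch of the in-place / separate-pointers split suggested in the review, under stated assumptions: `SWAP_RESTRICT` is a hypothetical macro (not an existing Mesa define) and the function names are illustrative only. Once the copy variant is marked restrict, the in-place variant can no longer be written as copy(p, p, n), so it gets its own loop:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portability macro, an assumption for this sketch:
 * MSVC, GCC and Clang accept __restrict; otherwise use C99 restrict. */
#if defined(_MSC_VER) || defined(__GNUC__)
#  define SWAP_RESTRICT __restrict
#else
#  define SWAP_RESTRICT restrict
#endif

/* Swap the 2 bytes of each 16-bit word; dst and src must not overlap. */
static void
swap2_copy(uint16_t *SWAP_RESTRICT dst, const uint16_t *SWAP_RESTRICT src,
           size_t n)
{
   size_t i;
   for (i = 0; i < n; i++)
      dst[i] = (uint16_t)((src[i] >> 8) | ((src[i] << 8) & 0xff00));
}

/* In-place variant: a separate loop, because calling swap2_copy(p, p, n)
 * would break the no-overlap promise made by restrict. */
static void
swap2_inplace(uint16_t *p, size_t n)
{
   size_t i;
   for (i = 0; i < n; i++)
      p[i] = (uint16_t)((p[i] >> 8) | ((p[i] << 8) & 0xff00));
}

/* Reverse the 4 bytes of each 32-bit word; dst and src must not overlap. */
static void
swap4_copy(uint32_t *SWAP_RESTRICT dst, const uint32_t *SWAP_RESTRICT src,
           size_t n)
{
   size_t i;
   for (i = 0; i < n; i++) {
      uint32_t b = src[i];
      dst[i] = (b >> 24) | ((b >> 8) & 0xff00) |
               ((b << 8) & 0xff0000) | (b << 24);
   }
}
```

Whether the extra pair of functions is worth it depends on how much the compiler actually gains from the no-alias promise, which this sketch does not measure.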
Re: [Mesa-dev] [PATCH v5] gallium/auxiliary: add inc and dec alternative with return (v3)
On Mon, Nov 17, 2014 at 12:20 PM, Axel Davy axel.d...@ens.fr wrote: From: Christoph Bumiller christoph.bumil...@speed.at At this moment we use only zero or positive values. v2: Implement it for also for Solaris, MSVC assembly and enable for other combinations. v3: Replace MSVC assembly by assert + warning during compilation Signed-off-by: David Heidelberg da...@ixit.cz --- src/gallium/auxiliary/util/u_atomic.h | 72 +++ 1 file changed, 72 insertions(+) diff --git a/src/gallium/auxiliary/util/u_atomic.h b/src/gallium/auxiliary/util/u_atomic.h index 2f2b42b..9279073 100644 --- a/src/gallium/auxiliary/util/u_atomic.h +++ b/src/gallium/auxiliary/util/u_atomic.h @@ -69,6 +69,18 @@ p_atomic_dec(int32_t *v) } static INLINE int32_t +p_atomic_inc_return(int32_t *v) +{ + return __sync_add_and_fetch(v, 1); +} + +static INLINE int32_t +p_atomic_dec_return(int32_t *v) +{ + return __sync_sub_and_fetch(v, 1); +} + +static INLINE int32_t p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) { return __sync_val_compare_and_swap(v, old, _new); @@ -116,6 +128,18 @@ p_atomic_dec(int32_t *v) } static INLINE int32_t +p_atomic_inc_return(int32_t *v) +{ + return __sync_add_and_fetch(v, 1); +} + +static INLINE int32_t +p_atomic_dec_return(int32_t *v) +{ + return __sync_sub_and_fetch(v, 1); +} + +static INLINE int32_t p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) { return __sync_val_compare_and_swap(v, old, _new); @@ -161,6 +185,18 @@ p_atomic_dec(int32_t *v) } static INLINE int32_t +p_atomic_inc_return(int32_t *v) +{ + return __sync_add_and_fetch(v, 1); +} + +static INLINE int32_t +p_atomic_dec_return(int32_t *v) +{ + return __sync_sub_and_fetch(v, 1); +} + +static INLINE int32_t p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) { return __sync_val_compare_and_swap(v, old, _new); @@ -186,6 +222,8 @@ p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) #define p_atomic_dec_zero(_v) ((boolean) --(*(_v))) #define p_atomic_inc(_v) ((void) (*(_v))++) #define p_atomic_dec(_v) 
((void) (*(_v))--) +#define p_atomic_inc_return(_v) ((*(_v))++) +#define p_atomic_dec_return(_v) ((*(_v))--) #define p_atomic_cmpxchg(_v, old, _new) (*(_v) == old ? *(_v) = (_new) : *(_v)) #endif @@ -197,6 +235,8 @@ p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) #define PIPE_ATOMIC MSVC x86 assembly +#include assert.h + #ifdef __cplusplus extern C { #endif @@ -236,6 +276,24 @@ p_atomic_dec(int32_t *v) } } +#pragma message ( Warning: p_atomic_dec_return and p_atomic_inc_return unimplemented for PIPE_ATOMIC_ASM_MSVC_X86 ) + +static INLINE int32_t +p_atomic_inc_return(int32_t *v) +{ + (void) v; + assert(0); + return 0; +} Why isn't _InterlockedIncrement() used here? It is used for the void functions. If you read the definition of _InterlockedIncrement() it returns the new value -- isn't that what is needed? + +static INLINE int32_t +p_atomic_dec_return(int32_t *v) +{ + (void) v; + assert(0); + return 0; +} Similarly here. + static INLINE int32_t p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) { @@ -288,6 +346,12 @@ p_atomic_inc(int32_t *v) _InterlockedIncrement((long *)v); } +static INLINE int32_t +p_atomic_inc_return(int32_t *v) +{ + return _InterlockedIncrement((long *)v); +} + static INLINE void p_atomic_dec(int32_t *v) { @@ -295,6 +359,12 @@ p_atomic_dec(int32_t *v) } static INLINE int32_t +p_atomic_dec_return(int32_t *v) +{ + return _InterlockedDecrement((long *)v); +} + +static INLINE int32_t p_atomic_cmpxchg(int32_t *v, int32_t old, int32_t _new) { return _InterlockedCompareExchange((long *)v, _new, old); @@ -329,6 +399,8 @@ p_atomic_dec_zero(int32_t *v) #define p_atomic_inc(_v) atomic_inc_32((uint32_t *) _v) #define p_atomic_dec(_v) atomic_dec_32((uint32_t *) _v) +#define p_atomic_inc_return(_v) atomic_inc_32_nv((uint32_t *) _v) +#define p_atomic_dec_return(_v) atomic_dec_32_nv((uint32_t *) _v) #define p_atomic_cmpxchg(_v, _old, _new) \ atomic_cas_32( (uint32_t *) _v, (uint32_t) _old, (uint32_t) _new) -- 2.1.0 ___ mesa-dev mailing list 
mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
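One detail from the patch above that a small example makes easier to see: `__sync_add_and_fetch` and `_InterlockedIncrement` both return the value after the update, whereas a post-increment expression such as `(*(_v))++` evaluates to the value before it. A minimal sketch, assuming GCC or Clang `__sync` builtins; the `demo_` names are hypothetical and not part of u_atomic.h:

```c
#include <stdint.h>

/* Atomic increment returning the NEW value (GCC/Clang builtin). */
static inline int32_t
demo_atomic_inc_return(int32_t *v)
{
   return __sync_add_and_fetch(v, 1);
}

/* Atomic decrement returning the NEW value (GCC/Clang builtin). */
static inline int32_t
demo_atomic_dec_return(int32_t *v)
{
   return __sync_sub_and_fetch(v, 1);
}

/* Non-atomic fallbacks with the same return convention: pre-increment
 * yields the new value, while (*(v))++ would yield the old one. */
#define demo_plain_inc_return(v) (++(*(v)))
#define demo_plain_dec_return(v) (--(*(v)))
```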
Re: [Mesa-dev] [PATCH v5] gallium/auxiliary: add inc and dec alternative with return (v3)
Looking at u_atomic.h there is a section that uses PIPE_ATOMIC_ASM_MSVC_X86 and has explicit assembly, and there's a section that uses PIPE_ATOMIC_MSVC_INTRINSIC and has intrinsics. No clue whatsoever what the difference between them is, but presumably it doesn't exist solely for the purpose of annoying developers... I can't think of a good reason; I would be interested in knowing why. Last time I checked, MSVC is terrible at optimizing around __asm{} blocks and if I recall, only x86 (i.e. 32-bit) supports inline assembly. This is a bit off-topic though... -ilia
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On Tue, Nov 4, 2014 at 6:05 AM, Juha-Pekka Heikkila juhapekka.heikk...@gmail.com wrote: Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com --- src/mesa/Makefile.am | 8 +++ src/mesa/main/x86/sse2_clamping.c | 103 ++ src/mesa/main/x86/sse2_clamping.h | 49 ++ 3 files changed, 160 insertions(+) create mode 100644 src/mesa/main/x86/sse2_clamping.c create mode 100644 src/mesa/main/x86/sse2_clamping.h diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index e71bccb..5d3c6f5 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -111,6 +111,10 @@ if SSE41_SUPPORTED ARCH_LIBS += libmesa_sse41.la endif +if SSE2_SUPPORTED +ARCH_LIBS += libmesa_sse2.la +endif + MESA_ASM_FILES_FOR_ARCH = if HAVE_X86_ASM @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \ main/streaming-load-memcpy.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 +libmesa_sse2_la_SOURCES = \ + main/x86/sse2_clamping.c +libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2 + pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = gl.pc diff --git a/src/mesa/main/x86/sse2_clamping.c b/src/mesa/main/x86/sse2_clamping.c new file mode 100644 index 000..7df1c85 --- /dev/null +++ b/src/mesa/main/x86/sse2_clamping.c @@ -0,0 +1,103 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. 
+ * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Juha-Pekka Heikkila juhapekka.heikk...@gmail.com + * + */ + +#ifdef __SSE2__ +#include main/macros.h +#include main/x86/sse2_clamping.h +#include emmintrin.h + +/** + * Clamp four float values to [min,max] + */ +static inline void +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min, + const float max) +{ + __m128 operand, minval, maxval; + + operand = _mm_loadu_ps(src); + minval = _mm_set1_ps(min); + maxval = _mm_set1_ps(max); + operand = _mm_max_ps(operand, minval); + operand = _mm_min_ps(operand, maxval); + _mm_storeu_ps(result, operand); +} + + +/* Clamp n amount float rgba pixels to [min,max] using SSE2 Conceptually, _mesa_streaming_clamp_float_rgba() is clamping a contiguous array of floats to some min/max value. The fact that they are pixels is somewhat incidental when looking at it from a stream perspective. It looks like the code is more or less just operating on n*4 floats. Given that, a more efficient implementation would check alignment and then use aligned loads and streaming stores. It doesn't really matter if you straddle pixel boundaries as long as each float is operated on. I'm not sure how much effort you want to put into this though. 
:) + */ +void +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4], + GLfloat rgba_dst[][4], const GLfloat min, + const GLfloat max) +{ + int i; + + for (i = 0; i n; i++) { + _mesa_clamp_float_rgba(rgba_src[i], rgba_dst[i], min, max); + } +} + + +/* Clamp n amount float rgba pixels to [min,max] using SSE2 and apply + * scaling and mapping to components. + * + * this replace handling of [RGBA] channels: + * rgba_temp[RCOMP] = CLAMP(rgba[i][RCOMP], 0.0F, 1.0F); + * rgba[i][RCOMP] = rMap[F_TO_I(rgba_temp[RCOMP] * scale[RCOMP])]; + */ +void +_mesa_clamp_float_rgba_scale_and_map(const GLuint n, GLfloat rgba_src[][4], + GLfloat rgba_dst[][4], const GLfloat min, + const GLfloat max, + const GLfloat scale[4], + const GLfloat* rMap, const GLfloat* gMap, + const GLfloat* bMap, const GLfloat* aMap) +{ + int i; + GLfloat
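The reviewer's point that the clamp is really operating on n*4 contiguous floats can be sketched as below. This is an illustration under assumptions, not the patch's code: `clamp_floats` is a hypothetical name, it uses unaligned loads and ordinary stores rather than the aligned loads and streaming stores the review suggests, and it falls back to scalar code when SSE2 is not available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#  include <emmintrin.h>
#endif

/* Clamp `count` floats from src into dst, treating the data as a flat
 * array; pixel boundaries are irrelevant as long as every float is visited. */
static void
clamp_floats(const float *src, float *dst, size_t count, float min, float max)
{
   size_t i = 0;
#if defined(__SSE2__)
   const __m128 minval = _mm_set1_ps(min);
   const __m128 maxval = _mm_set1_ps(max);

   /* Four floats per iteration, unaligned access. */
   for (; i + 4 <= count; i += 4) {
      __m128 v = _mm_loadu_ps(src + i);
      v = _mm_max_ps(v, minval);
      v = _mm_min_ps(v, maxval);
      _mm_storeu_ps(dst + i, v);
   }
#endif
   /* Scalar tail, and the whole array when SSE2 is unavailable. */
   for (; i < count; i++) {
      float f = src[i];
      if (f < min)
         f = min;
      if (f > max)
         f = max;
      dst[i] = f;
   }
}
```

For n RGBA pixels stored as `GLfloat rgba[][4]`, a caller would pass a pointer to the first component and a count of n * 4.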
Re: [Mesa-dev] [PATCH 11/11] glsl: Optimize X / X == 1
Would this conform to the GLSL spec if X had a runtime value of 0? Seems unsafe to replace X / X with 1 without a runtime test... maybe the GLSL spec allows such optimizations. On Thu, Aug 7, 2014 at 3:51 PM, thomashellan...@gmail.com wrote: From: Thomas Helland thomashellan...@gmail.com Shows no changes for shader-db. Signed-off-by: Thomas Helland thomashelland90 at gmail.com --- src/glsl/opt_algebraic.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 21bf332..a49752d 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -513,6 +513,8 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } if (is_vec_one(op_const[1])) return ir->operands[0]; + if(ir->operands[0]->equals(ir->operands[1])) + return new(mem_ctx) ir_constant(1.0f, 1); break; case ir_binop_dot: -- 2.0.3
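To make the concern concrete: under IEEE 754 arithmetic, 0.0f / 0.0f is NaN rather than 1, so folding X / X to 1 changes the result for that input. A small C sketch (not GLSL; how a given GPU or the GLSL spec treats division by zero is a separate question that this example does not settle), with a hypothetical helper name:

```c
#include <math.h>
#include <stdbool.h>

/* Returns true when x / x evaluates to NaN instead of 1.0f.
 * The volatile copies keep the compiler from folding x / x itself. */
static bool
self_division_is_nan(float x)
{
   volatile float num = x;
   volatile float den = x;
   float q = num / den;
   return isnan(q);
}
```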
Re: [Mesa-dev] [PATCH 1/3] util: Add util_memcpy_cpu_to_le32() v2
On Fri, Jul 18, 2014 at 2:10 PM, Tom Stellard thomas.stell...@amd.com wrote: v2: - Preserve word boundaries. --- src/gallium/auxiliary/util/u_math.h | 17 + 1 file changed, 17 insertions(+) diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h index b9ed197..5de181a 100644 --- a/src/gallium/auxiliary/util/u_math.h +++ b/src/gallium/auxiliary/util/u_math.h @@ -812,6 +812,23 @@ util_bswap16(uint16_t n) (n << 8); } +static INLINE void* +util_memcpy_cpu_to_le32(void *dest, void *src, size_t n) I don't know where Mesa is with C99 standards, but if you are utilizing C99 keywords, I think restrict would help here to show that the two pointers do not overlap. I'm not sure if you have to mark 'd' and 's' as restrict to get the benefit if they are initialized by a typecast, but it probably wouldn't be a bad idea. This may be a no-go with C++ however. +{ +#ifdef PIPE_ARCH_BIG_ENDIAN + size_t i, e; + assert(n % 4 == 0); + + for (i = 0, e = n / 4; i < e; i++) { + uint32_t *d = (uint32_t*)dest; + uint32_t *s = (uint32_t*)src; + d[i] = util_bswap32(s[i]); + } + return dest; +#else + return memcpy(dest, src, n); +#endif +} /** * Clamp X to [MIN, MAX]. -- 1.8.1.5
Re: [Mesa-dev] [PATCH 1/2] util: Add util_memcpy_cpu_to_le()
On Tue, Jul 15, 2014 at 11:19 AM, Tom Stellard thomas.stell...@amd.com wrote: --- src/gallium/auxiliary/util/u_math.h | 22 ++ src/gallium/drivers/radeonsi/si_shader.c | 8 +--- 2 files changed, 23 insertions(+), 7 deletions(-) diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h index b9ed197..cd3cf04 100644 --- a/src/gallium/auxiliary/util/u_math.h +++ b/src/gallium/auxiliary/util/u_math.h @@ -812,6 +812,28 @@ util_bswap16(uint16_t n) (n << 8); } +static INLINE void* +util_memcpy_cpu_to_le(void *dest, void *src, size_t n) +{ +#ifdef PIPE_ARCH_BIG_ENDIAN + size_t i, e; + for (i = 0, e = n % 8; i < e; i++) { + char *d = (char*)dest; + char *s = (char*)src; + d[i] = s[e - i - 1]; + } + dest += i; + n -= i; + for (i = 0, e = n / 8; i < e; i++) { + uint64_t *d = (uint64_t*)dest; + uint64_t *s = (uint64_t*)src; + d[i] = util_bswap64(s[e - i - 1]); + } Doesn't this reverse all of the bytes (as if it were a list) without preserving word boundaries? e.g. |a, b, c, d | e, f, g, h | i, j, k, l | m, n, o, p | -> |p, o, n, m | l, k, j, i | h, g, f, e | d, c, b, a | The old code did something like this, didn't it?: |a, b, c, d | e, f, g, h | i, j, k, l | m, n, o, p | -> |d, c, b, a | h, g, f, e | l, k, j, i | p, o, n, m | I don't know which is correct, but it does seem like a behavior change. Or am I misreading the code? + return dest; +#else + return memcpy(dest, src, n); +#endif +} /** * Clamp X to [MIN, MAX].
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index f0650f4..6f0504b 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -2559,13 +2559,7 @@ int si_compile_llvm(struct si_context *sctx, struct si_pipe_shader *shader, } ptr = (uint32_t*)sctx-b.ws-buffer_map(shader-bo-cs_buf, sctx-b.rings.gfx.cs, PIPE_TRANSFER_WRITE); - if (SI_BIG_ENDIAN) { - for (i = 0; i binary.code_size / 4; ++i) { - ptr[i] = util_cpu_to_le32((*(uint32_t*)(binary.code + i*4))); - } - } else { - memcpy(ptr, binary.code, binary.code_size); - } + util_memcpy_cpu_to_le(ptr, binary.code, binary.code_size); sctx-b.ws-buffer_unmap(shader-bo-cs_buf); free(binary.code); -- 1.8.1.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
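The two behaviours contrasted in the review can be sketched side by side. Both functions are hypothetical illustrations (the names are not Mesa's): one reverses the whole buffer as a list of bytes, the other swaps bytes within each 32-bit word and keeps the words in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse every byte of the buffer:
 *   a b c d | e f g h  ->  h g f e | d c b a */
static void
reverse_all_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
   size_t i;
   for (i = 0; i < n; i++)
      dst[i] = src[n - i - 1];
}

/* Swap bytes inside each 32-bit word, preserving word order:
 *   a b c d | e f g h  ->  d c b a | h g f e */
static void
swap_bytes_per_word32(uint8_t *dst, const uint8_t *src, size_t n)
{
   size_t i;
   assert(n % 4 == 0);   /* whole 32-bit words only */
   for (i = 0; i < n; i += 4) {
      dst[i + 0] = src[i + 3];
      dst[i + 1] = src[i + 2];
      dst[i + 2] = src[i + 1];
      dst[i + 3] = src[i + 0];
   }
}
```

For a stream of 32-bit shader dwords, only the second form keeps each dword at its original position in the buffer.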
Re: [Mesa-dev] [PATCH 7/9] glsl: Make foreach macros usable from C by adding struct keyword.
Yep, no new warnings. I tried a little test program % cat t.cpp class asdf { int x; }; void f() { asdf a; struct asdf b; class asdf c; } C++ never ceases to amaze. and I can't make it generate warnings (other than unused variables) regardless of whether I define asdf as a class or a struct.
Re: [Mesa-dev] [PATCH 11/21] glsl: Store ir_variable::ir_type in 8 bits instead of 32
On Wed, May 28, 2014 at 2:17 PM, Ian Romanick i...@freedesktop.org wrote: On 05/27/2014 08:28 PM, Matt Turner wrote: On Tue, May 27, 2014 at 7:49 PM, Ian Romanick i...@freedesktop.org wrote: From: Ian Romanick ian.d.roman...@intel.com No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 64-bit. No change in the peak ir_variable memory usage in a trimmed apitrace of dota2 on 32-bit. Signed-off-by: Ian Romanick ian.d.roman...@intel.com --- src/glsl/ir.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 7faee74..bc02f6e 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -92,12 +92,13 @@ enum ir_node_type { */ class ir_instruction : public exec_node { private: - enum ir_node_type ir_type; + uint8_t ir_type; public: inline enum ir_node_type get_ir_type() const { - return this-ir_type; + STATIC_ASSERT(ir_type_max 256); + return (enum ir_node_type) this-ir_type; } /** -- 1.8.1.4 Instead of doing this, you can mark the enum type with the PACKED attribute. I did this in a similar change in i965 already. See http://lists.freedesktop.org/archives/mesa-dev/2014-February/054643.html This way we still get enum type checking and warnings out of switch statements and such. Hmm... that would mean that patch 10 wouldn't strictly be necessary. The disadvantage is that the next patch would need (right?) some changes for MSVC, especially on 32-bit. I think it would need to be #if sizeof(ir_node_type) sizeof(void *) I don't think the preprocessor can evaluate sizeof(). # define PADDING_BYTES (sizeof(void *) - sizeof(ir_node_type)) #else # define PADDING_BYTES sizeof(void *) # if (__GNUC__ = 3) #error GCC did us wrong. # endif #endif uint8_t padding[PADDING_BYTES]; Seems a little sketchy, but might still be better... hmm... 
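On the remark that the preprocessor cannot evaluate sizeof(): `#if` is handled before types exist, so size checks have to be expressed in the language proper. A minimal sketch, assuming a C11 compiler for `_Static_assert`; the enum and struct names are made up for illustration, and the one-byte field mirrors the `uint8_t ir_type` idea from the patch:

```c
#include <stdint.h>

enum demo_node_type {
   demo_type_a,
   demo_type_b,
   demo_type_max
};

/* Checked by the compiler, not the preprocessor. */
_Static_assert(demo_type_max < 256, "enum values must fit in 8 bits");

struct demo_node {
   uint8_t type;   /* enum value stored in 8 bits instead of sizeof(int) */
};

_Static_assert(sizeof(((struct demo_node *)0)->type) == 1,
               "type field must be one byte");

/* Convert the stored byte back to the enum type for callers. */
static enum demo_node_type
demo_get_type(const struct demo_node *n)
{
   return (enum demo_node_type) n->type;
}
```

An array dimension such as `uint8_t padding[sizeof(void *) - 1]` is likewise fine, because array sizes are ordinary constant expressions rather than preprocessor ones.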
Re: [Mesa-dev] [PATCH 7/9] egl: Don't attempt to redefine stdint.h types with VS 2013.
On Fri, May 2, 2014 at 10:11 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com Just include stdint.h. --- src/egl/main/eglcompiler.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/egl/main/eglcompiler.h b/src/egl/main/eglcompiler.h index 53dab54..5ea83d6 100644 --- a/src/egl/main/eglcompiler.h +++ b/src/egl/main/eglcompiler.h @@ -37,7 +37,8 @@ /** * Get standard integer types */ -#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L) +#if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L) || \ +(defined(_MSC_VER) && _MSC_VER >= 1600) VS 2010 is where the support for stdint.h begins. This can be verified by a quick Google search. # include <stdint.h> #elif defined(_MSC_VER) typedef __int8 int8_t; -- 1.9.1
Re: [Mesa-dev] [PATCH 4/8] radeonsi: Use util_cpu_to_le32() instead of bswap32() on big-endian systems
FWIW, memcpy() vs a for() loop has different semantics with respect to address alignment. I don't know how much it will matter, but last time I was reading assembly output, copying int[] via for() loop didn't produce a codepath for 16-byte aligned addresses (allowing for SSE streaming) while memcpy() has a lot of such logic. This won't matter much unless you have lots to copy, and of course, compiler optimizations can change, so maybe this situation has changed. Patrick On Thu, Feb 20, 2014 at 8:11 PM, Michel Dänzer mic...@daenzer.net wrote: On Thu, 2014-02-20 at 10:21 -0800, Tom Stellard wrote: diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c index 54270cd..9b04e6b 100644 --- a/src/gallium/drivers/radeonsi/si_shader.c +++ b/src/gallium/drivers/radeonsi/si_shader.c @@ -2335,7 +2335,7 @@ int si_compile_llvm(struct si_context *sctx, struct si_pipe_shader *shader, ptr = (uint32_t*)sctx->b.ws->buffer_map(shader->bo->cs_buf, sctx->b.rings.gfx.cs, PIPE_TRANSFER_WRITE); if (0 /*SI_BIG_ENDIAN*/) { for (i = 0; i < binary.code_size / 4; ++i) { - ptr[i] = util_bswap32(*(uint32_t*)(binary.code + i*4)); + ptr[i] = util_cpu_to_le32((*(uint32_t*)(binary.code + i*4))); } } else { memcpy(ptr, binary.code, binary.code_size); We could get rid of the separate *_ENDIAN paths using util_cpu_to_le*(). Either way, the non-clover patches are Reviewed-by: Michel Dänzer michel.daen...@amd.com -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [Mesa-dev] Request for support of GL_AMD_pinned_memory and GL_ARB_buffer_storage extensions
My understanding is that this is like having MAP_UNSYNCHRONIZED on at all times, even when it isn't mapped, because it is always mapped (into memory). Is that correct, Jose? Patrick On Wed, Feb 5, 2014 at 11:53 AM, Grigori Goronzy g...@chown.ath.cx wrote: On 05.02.2014 18:08, Jose Fonseca wrote: I honestly hope that GL_AMD_pinned_memory doesn't become popular. It would have been alright if it wasn't for this bit in http://www.opengl.org/registry/specs/AMD/pinned_memory.txt which says: 2) Can the application still use the buffer using the CPU address? RESOLVED: YES. However, this access would be completely non synchronized to the OpenGL pipeline, unless explicit synchronization is being used (for example, through glFinish or by using sync objects). And I'm imagining apps which are streaming vertex data doing precisely just that... I don't understand your concern, this is exactly the same behavior GL_MAP_UNSYNCHRONIZED_BIT has, and apps are supposedly using that properly. How does apitrace handle it? Grigori
[Mesa-dev] Testing optimizer
Hi all, Is there a way to see the machine code that is generated by the GLSL compiler for all GPU instruction sets? For example, I would like to know whether the optimizer handles certain (equivalent) constructs, so that I can avoid the ones it does not. I know there is a lot about optimization on GPUs that I don't know, but I'd still like to get some ballpark estimates. For example, I'm curious whether: //let p1, p2, p3 be vec2 uniforms vec4(p1, 0, 0) + vec4(p2, 0, 0) + vec4(p3, 0, 1) produces the same machine code as: vec4(p1+p2+p3, 0, 1); for all architectures supported by Mesa.
Re: [Mesa-dev] [PATCH 2/2] radeonsi: pad IBs to a multiple of 8 DWs
Any reason for this complicated logic, instead of simply: while (cs->cdw & 0x7) cs->buf[cs->cdw++] = 0x8000; Ah, that is eloquently terse; I'm going to have to remember that. Patrick Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer
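The loop quoted above pads a command stream until its dword count is a multiple of 8 by testing the low three bits of the count. A self-contained sketch of the same idiom; the struct, the function name and the filler value `DEMO_PAD_DWORD` are placeholders for illustration, not the radeonsi definitions:

```c
#include <stdint.h>

#define DEMO_PAD_DWORD 0xdeadbeefu   /* placeholder filler, not a real packet */

struct demo_cs {
   uint32_t buf[64];
   unsigned cdw;      /* number of dwords currently used in buf */
};

/* Append filler dwords until cdw is a multiple of 8, i.e. until
 * (cdw & 0x7) == 0. The caller must leave room for up to 7 dwords. */
static void
demo_pad_to_8_dwords(struct demo_cs *cs)
{
   while (cs->cdw & 0x7)
      cs->buf[cs->cdw++] = DEMO_PAD_DWORD;
}
```

The mask test works because 8 is a power of two; for a non-power-of-two alignment the condition would need a modulo instead.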
Re: [Mesa-dev] Another Take on the S3TC issue
I've been hanging on this list for a while, and this isn't the first time this has been suggested. The general thing that is repeated is basically this: if you make an API (e.g. OpenGL) that supports S3TC without a license, you're in trouble, even if it is a passthrough to the hardware, which also required a license to produce in the first place. I think the assumption most people make is that if the hardware vendor paid a license to implement S3TC in an ASIC, then surely simply passing through data is OK. After all, it is being done without any knowledge of the algorithm, etc. From a common sense standpoint, I would agree. However, the note in the S3TC extension itself[1] mentions explicitly to be wary of such assumptions in the IP Status section, and notes that *a license for one API is not a license for another*. This implies that for an API to make use of S3TC, it requires a license, which Mesa in general, does not have, while a hardware vendor might. All of this is theoretical as far as I've read; I don't think anyone has legally challenged this for open source drivers and posted the results on this mailing list -- mostly have stayed away from it with a prejudice. I think the patent was granted in 1999, so at least in the USA, hopefully we don't have too many more years of this garbage. Patrick [1] http://www.opengl.org/registry/specs/EXT/texture_compression_s3tc.txt On Tue, Aug 13, 2013 at 1:53 PM, Uwe Schmidt simon.schm...@cs-systemberatung.de wrote: Hi, I have read about the issue of implementing the S3TC Extension in Mesa: http://dri.freedesktop.org/wiki/S3TC/ As I understood, the problem is, that encoding and decoding S3TC in software is covered by patents, while passing S3TC compressed data to the GPU is still ok. AS NOW: If force_s3tc_enable is enabled in Mesa3D, uploading a S3TC encoded texture works if format==internalFormat is true. If format!=internalFormat is true, it would fail (as i know). 
SO MY PROPOSAL: If 'format' is one of the S3TC types, and format!=internalFormat is true, then set internalFormat:=format. Else, if 'internalFormat' is one of the S3TC types, but the 'format' isn't, set internalFormat:=format (or any other format, Mesa3D can encode).
Re: [Mesa-dev] Another Take on the S3TC issue
Erm... I'm wondering... why does the S3TC issue come up out of its grave every few months and haunt the list (and your nerves)? I think it is because the issue looks deceptively simple. Hardware is hardware, right? ASICs do the decompression, not software. Surely blindly copying bits from one device to another *can't* be patent infringement. Surely, right? :\ Patrick
Re: [Mesa-dev] [PATCH] r300g: add program name check for BSD
On Wed, Jun 26, 2013 at 2:11 AM, Jonathan Gray j...@jsg.id.au wrote: program_invocation_short_name is glibc specific. Provide an alternative using getprogname(), which can be found on *BSD and OS X. Signed-off-by: Jonathan Gray j...@jsg.id.au --- src/gallium/drivers/r300/r300_chipset.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git src/gallium/drivers/r300/r300_chipset.c src/gallium/drivers/r300/r300_chipset.c index 11061ed..7f51ccb 100644 --- src/gallium/drivers/r300/r300_chipset.c +++ src/gallium/drivers/r300/r300_chipset.c @@ -30,6 +30,14 @@ #include <stdio.h> #include <errno.h> +#undef GET_PROGRAM_NAME +#ifdef __GLIBC__ +# define GET_PROGRAM_NAME() program_invocation_short_name I think you are missing parentheses on the end of program_invocation_short_name. +#else /* *BSD and OS X */ +# include <stdlib.h> +# define GET_PROGRAM_NAME() getprogname() +#endif + /* r300_chipset: A file all to itself for deducing the various properties of * Radeons. */ @@ -49,7 +57,7 @@ static void r300_apply_hyperz_blacklist(struct r300_capabilities* caps) int i; for (i = 0; i < Elements(list); i++) { -if (strcmp(list[i], program_invocation_short_name) == 0) { +if (strcmp(list[i], GET_PROGRAM_NAME()) == 0) { caps->zmask_ram = 0; caps->hiz_ram = 0; break; -- 1.8.3.1
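For reference, glibc exposes `program_invocation_short_name` as a `char *` variable, while `getprogname()` on the BSDs and OS X is a function, so a wrapper macro expands to a bare identifier in one case and to a call in the other. A hedged sketch follows; the `DEMO_`/`demo_` names, the explicit `extern` declaration and the "unknown" fallback are my own additions for illustration and are not part of the patch:

```c
#include <stdlib.h>

#if defined(__GLIBC__)
/* glibc declares this variable in <errno.h> when _GNU_SOURCE is defined;
 * it is repeated here so the sketch does not depend on that macro. */
extern char *program_invocation_short_name;
#  define DEMO_GET_PROGRAM_NAME() program_invocation_short_name
#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || \
      defined(__OpenBSD__) || defined(__DragonFly__)
#  define DEMO_GET_PROGRAM_NAME() getprogname()
#else
/* Assumed fallback for platforms that offer neither facility. */
#  define DEMO_GET_PROGRAM_NAME() "unknown"
#endif

/* Return the short name of the running program, or a placeholder. */
static const char *
demo_program_name(void)
{
   return DEMO_GET_PROGRAM_NAME();
}
```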
Re: [Mesa-dev] forking shared intel directory?
On Fri, Jun 21, 2013 at 1:29 PM, Eric Anholt e...@anholt.net wrote: Long ago, when porting FBO and memory management support to i965, I merged a bunch of code between the i915 and i965 drivers and put it in the intel directory. I think it served us well for a long time, as both drivers got improvements from shared work on that code. But since then, we've talked several times about splitting things back apart (since we break i915 much more often than we improve it), so I spent yesterday and today looking at what the impact would be. I'm not a developer, but I like to keep up with the drivers that I have hardware for. Please take my opinions with a grain of salt. When you say you break i915 more than you improve it, do you mean to say that it is difficult to improve !i915 without breaking i915 and therefore to improve development speed, it should be forked OR that i915 doesn't receive enough testing / have maintainers who can resolve the issues and so it burdens other developers to fix i915 and hence slows development? The reason I ask is because if it is #2, then it sounds like you should be looking for someone to volunteer as the official i915 maintainer [and if none, then fork], but if it is #1, then maintainer or not, it will slow down your efforts.
LOC counts (wc -l):
            intel/   i915/   i965/   total
master:     14751    13458   61109   89318
fork-i915:      0    24322   74978   99300
We duplicate ~1 lines of code, but i915 drops ~4000 lines of code from its build and i965 drops ~1000.
context size:
            i915     i965
master:     99512    101456
fork-i915:  99384    100824
There's a bunch of cleanup I haven't done in the branch, like moving brw_vtbl.c contents to sensible places, or nuking the intel vs brw split that doesn't make any sense any more. I'm ambivalent about the change. If the code growth from splitting was 7000 lines or so, I'd be happy, but this feels pretty big. On the other hand, the cleanups feel good to me. I don't know how other developers feel. There's a branch up at fork-i915 of my tree.
If people are excited about doing this and I get a bunch of acks for the two "copy the code to my directory" commits, I'll do those two then start sending out the non-copying changes for review. If people don't like it, I won't be hurt.
Re: [Mesa-dev] forking shared intel directory?
On Fri, Jun 21, 2013 at 3:53 PM, Kenneth Graunke kenn...@whitecape.org wrote: On 06/21/2013 01:25 PM, Patrick Baggett wrote: I'm not a developer, but I like to keep up with the drivers that I have hardware for. Please take my opinions with a grain of salt. When you say you break i915 more than you improve it, do you mean to say that it is difficult to improve !i915 without breaking i915 and therefore to improve development speed, it should be forked OR that i915 doesn't receive enough testing / have maintainers who can resolve the issues and so it burdens other developers to fix i915 and hence slows development? The reason I ask is because if it is #2, then it sounds like you should be looking for someone to volunteer as the official i915 maintainer [and if none, then fork], but if it is #1, then maintainer or not, it will slow down your efforts. Mostly the former...i915c already supports everything the hardware can do, while we're continually adding new features to i965+ (well, mostly gen6+). Things like HiZ, fast color clears, and ETC texture compression support affect the common miptree code, but they do nothing for i915 class hardware...there's only a potential downside of accidental breakage. The latter is true as well. Unfortunately, community work is hampered by the fact that Intel hasn't released public documentation for i915 class hardware. From time to time we've tried to find and motivate the right people to make that happen, but it hasn't yet. Most people in the community are also more interested in working on the i915g driver. Ah, thanks for the explanation, though I guess it doesn't do a whole, whole lot to answer Eric's question. On a side note: I was interested in the i915g driver, but I couldn't find any documentation for it other than some architectural information about the GPU's pipeline. I'm glad I wasn't just lacking the Google-fu. :\
Re: [Mesa-dev] forking shared intel directory?
> The latter is true as well. Unfortunately, community work is hampered by
> the fact that Intel hasn't released public documentation for i915 class
> hardware. From time to time we've tried to find and motivate the right
> people to make that happen, but it hasn't yet. Most people in the
> community are also more interested in working on the i915g driver.

Ah, thanks for the explanation, though I guess it doesn't do a whole, whole lot to answer Eric's question. That is to say, hearing that there isn't just a lack of maintainer or just lack of ease for new development doesn't make either option seem better to me, but you all know what's best here. Thanks for the info!
Re: [Mesa-dev] [PATCH 4/5] radeonsi/compute: Pass kernel arguments in a buffer
The only difference I could see is that in the old code you passed cb->buffer (which maybe points to a value?) directly into u_upload_data(), whereas in the new code you pass cb->buffer as the parameter rbuffer to r600_upload_const_buffer(), but inside that function you do *rbuffer = NULL before you start, which effectively erases any previous pointer. So if *rbuffer was examined by u_upload_data(), it may be different. I don't know if that matters, though.

Patrick

On Fri, May 24, 2013 at 1:07 PM, Tom Stellard t...@stellard.net wrote:
> From: Tom Stellard thomas.stell...@amd.com
>
> ---
>  src/gallium/drivers/radeonsi/r600_buffer.c | 31 +
>  src/gallium/drivers/radeonsi/radeonsi_compute.c | 26 ++---
>  src/gallium/drivers/radeonsi/si_state.c | 29 +++
>  3 files changed, 51 insertions(+), 35 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/r600_buffer.c b/src/gallium/drivers/radeonsi/r600_buffer.c
> index cdf9988..87763c3 100644
> --- a/src/gallium/drivers/radeonsi/r600_buffer.c
> +++ b/src/gallium/drivers/radeonsi/r600_buffer.c
> @@ -25,6 +25,8 @@
>   * Corbin Simpson mostawesomed...@gmail.com
>   */
>
> +#include <byteswap.h>
> +
>  #include "pipe/p_screen.h"
>  #include "util/u_format.h"
>  #include "util/u_math.h"
> @@ -168,3 +170,32 @@ void r600_upload_index_buffer(struct r600_context *rctx,
>  	u_upload_data(rctx->uploader, 0, count * ib->index_size,
>  		      ib->user_buffer, &ib->offset, &ib->buffer);
>  }
> +
> +void r600_upload_const_buffer(struct r600_context *rctx, struct si_resource **rbuffer,
> +			const uint8_t *ptr, unsigned size,
> +			uint32_t *const_offset)
> +{
> +	*rbuffer = NULL;
> +
> +	if (R600_BIG_ENDIAN) {
> +		uint32_t *tmpPtr;
> +		unsigned i;
> +
> +		if (!(tmpPtr = malloc(size))) {
> +			R600_ERR("Failed to allocate BE swap buffer.\n");
> +			return;
> +		}
> +
> +		for (i = 0; i < size / 4; ++i) {
> +			tmpPtr[i] = bswap_32(((uint32_t *)ptr)[i]);
> +		}
> +
> +		u_upload_data(rctx->uploader, 0, size, tmpPtr, const_offset,
> +				(struct pipe_resource**)rbuffer);
> +
> +		free(tmpPtr);
> +	} else {
> +		u_upload_data(rctx->uploader, 0, size, ptr, const_offset,
> +				(struct pipe_resource**)rbuffer);
> +	}
> +}
> diff --git a/src/gallium/drivers/radeonsi/radeonsi_compute.c b/src/gallium/drivers/radeonsi/radeonsi_compute.c
> index 3fb6eb1..035076d 100644
> --- a/src/gallium/drivers/radeonsi/radeonsi_compute.c
> +++ b/src/gallium/drivers/radeonsi/radeonsi_compute.c
> @@ -91,8 +91,11 @@ static void radeonsi_launch_grid(
>  	struct r600_context *rctx = (struct r600_context*)ctx;
>  	struct si_pipe_compute *program = rctx->cs_shader_state.program;
>  	struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
> +	struct si_resource *input_buffer;
> +	uint32_t input_offset = 0;
> +	uint64_t input_va;
>  	uint64_t shader_va;
> -	unsigned arg_user_sgpr_count;
> +	unsigned arg_user_sgpr_count = 2;
>  	unsigned i;
>  	struct si_pipe_shader *shader = &program->kernels[pc];
> @@ -109,21 +112,16 @@ static void radeonsi_launch_grid(
>  	si_pm4_inval_shader_cache(pm4);
>  	si_cmd_surface_sync(pm4, pm4->cp_coher_cntl);
>
> -	arg_user_sgpr_count = program->input_size / 4;
> -	if (program->input_size % 4 != 0) {
> -		arg_user_sgpr_count++;
> -	}
> +	/* Upload the input data */
> +	r600_upload_const_buffer(rctx, &input_buffer, input,
> +					program->input_size, &input_offset);
> +	input_va = r600_resource_va(ctx->screen, (struct pipe_resource*)input_buffer);
> +	input_va += input_offset;
>
> -	/* XXX: We should store arguments in memory if we run out of user sgprs.
> -	 */
> -	assert(arg_user_sgpr_count < 16);
> +	si_pm4_add_bo(pm4, input_buffer, RADEON_USAGE_READ);
>
> -	for (i = 0; i < arg_user_sgpr_count; i++) {
> -		uint32_t *args = (uint32_t*)input;
> -		si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0 +
> -					(i * 4),
> -					args[i]);
> -	}
> +	si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0, input_va);
> +	si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0 + 4, S_008F04_BASE_ADDRESS_HI (input_va >> 32) | S_008F04_STRIDE(0));
>
>  	si_pm4_set_reg(pm4, R_00B810_COMPUTE_START_X, 0);
>  	si_pm4_set_reg(pm4, R_00B814_COMPUTE_START_Y, 0);
> diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
> index dec535c..1e94f7e 100644
> --- a/src/gallium/drivers/radeonsi/si_state.c
> +++ b/src/gallium/drivers/radeonsi/si_state.c
> @@ -24,8 +24,6 @@
>   *
Re: [Mesa-dev] No configs available with xlib based egl
Perhaps 16-bit color isn't supported? Maybe try other color bits or set R/G/B individually and see what happens. Also, the Mesa source tree includes an eglinfo tool that can probably tell you a whole lot more.

Patrick

On Tue, May 7, 2013 at 7:56 AM, Divick Kishore divick.kish...@gmail.com wrote:
> Hi,
> I have compiled mesa with the following options:
>
> .././configure --prefix=~/lib/mesa/swrast/ --build=x86_64-linux-gnu --with-gallium-drivers= --with-driver=xlib --enable-egl --enable-gles1 --enable-gles2 --with-egl-platforms=x11 CFLAGS="-Wall -g -O2" CXXFLAGS="-Wall -g -O2"
>
> but when I run a sample app with the following egl config, it returns 0 configs.
>
> EGLint attr[] = { // some attributes to set up our egl-interface
>    EGL_BUFFER_SIZE, 16,
>    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
>    EGL_NONE
> };
> EGLConfig ecfg;
> EGLint num_config;
> if ( !eglChooseConfig( egl_display, attr, &ecfg, 1, &num_config ) ) {
>    cerr << "Failed to choose config (eglError: " << eglGetError() << ")" << endl;
>    return 1;
> }
>
> The code above prints 'Failed to choose config'. While the same code works fine when I compile with:
>
> ../../configure --prefix=~/lib/mesa/dri --build=x86_64-linux-gnu --with-driver=dri --with-dri-drivers=swrast --with-dri-driverdir=~/lib/mesa/dri/ --with-dri-searchpath='~/lib/mesa/dri' --enable-glx-tls --enable-xa --enable-driglx-direct --with-egl-platforms=x11 --enable-gallium-llvm=yes --with-gallium-drivers=swrast --enable-gles1 --enable-gles2 --enable-gallium-egl --disable-glu CFLAGS="-Wall -g -O2" CXXFLAGS="-Wall -g -O2"
>
> Could someone please suggest what could be causing this?
>
> Thanks & Regards,
> Divick
Re: [Mesa-dev] glxgears performance higher with software renderer compared to h/w drivers
I don't think glxgears is the best benchmark for what is a typical OpenGL load (if there is a typical). The 60 FPS with your hardware driver sounds suspiciously like the refresh rate of your screen; perhaps it is synchronized with the vertical retrace? Since I'm assuming you want to find the fastest driver, why not try a free and open source game like openarena to give you a better idea of how they actually perform.

Patrick

On Mon, May 6, 2013 at 9:33 AM, Divick Kishore divick.kish...@gmail.com wrote:
> Hi,
> I am trying to build s/w only mesa driver. It seems that the performance of software only renderer (compiled with --with-driver=xlib) is higher than that of h/w drivers. Could someone please help me understand what is causing this or if it is expected?
>
> I see that dri based s/w renderer is also slower than xlib/swrast driver. So how does dri based s/w rendering work and why is it slower than xlib/swrast driver?
>
> I presume that --with-driver=xlib builds s/w only renderer. Please correct me if I am wrong.
>
> ./configure -build=x86_64-linux-gnu --with-driver=dri --with-dri-drivers="i915 swrast" --with-dri-driverdir=/home/divick/work/mesa/mesa-8.0.5/build/dri/x86_64-linux-gnu/ --with-dri-searchpath='/home/divick/work/mesa/mesa-8.0.5/build/dri/x86_64-linux-gnu/' --enable-glx-tls --enable-shared-glapi --enable-texture-float --enable-xa --enable-driglx-direct --with-egl-platforms="x11 drm" --enable-gallium-llvm --with-gallium-drivers="swrast i915" --enable-gles1 --enable-gles2 --enable-openvg --enable-gallium-egl --disable-glu CFLAGS="-Wall -g -O2" CXXFLAGS="-Wall -g -O2"
>
> with LIBGL_ALWAYS_SOFTWARE=1 glxgears reports:
> GL_RENDERER = Software Rasterizer
> GL_VERSION = 2.1 Mesa 8.0.5
> GL_VENDOR = Mesa Project
> fps: ~ 490 fps
>
> Without LIBGL_ALWAYS_SOFTWARE set:
> GL_RENDERER = Mesa DRI Intel(R) Sandybridge Mobile
> GL_VERSION = 3.0 Mesa 8.0.5
> GL_VENDOR = Tungsten Graphics, Inc
> fps: ~ 60
>
> When compiled with configure options
> --build=x86_64-linux-gnu --disable-egl --with-gallium-drivers= --with-driver=xlib --disable-egl CFLAGS="-Wall -g -O2" CXXFLAGS="-Wall -g -O2"
> glxgears reports:
> GL_RENDERER = Mesa X11
> GL_VERSION = 2.1 Mesa 8.0.5
> GL_VENDOR = Brian Paul
> fps: ~1600
>
> With drivers installed on system and with LIBGL_ALWAYS_SOFTWARE=1:
> GL_RENDERER = Gallium 0.4 on llvmpipe (LLVM 0x209)
> GL_VERSION = 2.1 Mesa 8.0.5
> GL_VENDOR = VMware, Inc.
> fps: ~ 1130
>
> Thanks & Regards,
> Divick
Re: [Mesa-dev] [PATCH 03/17] swrast: Factor out texture slice counting.
On Mon, Apr 22, 2013 at 11:14 AM, Eric Anholt e...@anholt.net wrote:
> This function is going to get used a lot more in upcoming patches.
> ---
>  src/mesa/swrast/s_texture.c | 16
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/swrast/s_texture.c b/src/mesa/swrast/s_texture.c
> index 51048be..36a90dd 100644
> --- a/src/mesa/swrast/s_texture.c
> +++ b/src/mesa/swrast/s_texture.c
> @@ -58,6 +58,14 @@ _swrast_delete_texture_image(struct gl_context *ctx,
>     _mesa_delete_texture_image(ctx, texImage);
>  }
>
> +static unsigned int
> +texture_slices(struct gl_texture_image *texImage)
> +{
> +   if (texImage->TexObject->Target == GL_TEXTURE_1D_ARRAY)
> +      return texImage->Height;
> +   else
> +      return texImage->Depth;
> +}

I think you can const-qualify 'texImage'.

>  /**
>   * Called via ctx->Driver.AllocTextureImageBuffer()
> @@ -83,11 +91,11 @@ _swrast_alloc_texture_image_buffer(struct gl_context *ctx,
>      * We allocate the array for 1D/2D textures too in order to avoid special-
>      * case code in the texstore routines.
>      */
> -   swImg->ImageOffsets = malloc(texImage->Depth * sizeof(GLuint));
> +   swImg->ImageOffsets = malloc(texture_slices(texImage) * sizeof(GLuint));
>     if (!swImg->ImageOffsets)
>        return GL_FALSE;
>
> -   for (i = 0; i < texImage->Depth; i++) {
> +   for (i = 0; i < texture_slices(texImage); i++) {
>        swImg->ImageOffsets[i] = i * texImage->Width * texImage->Height;
>     }
>
> @@ -209,20 +217,20 @@ _swrast_map_teximage(struct gl_context *ctx,
>
>     map = swImage->Buffer;
>
> +   assert(slice < texture_slices(texImage));
> +
>     if (texImage->TexObject->Target == GL_TEXTURE_3D ||
>         texImage->TexObject->Target == GL_TEXTURE_2D_ARRAY) {
>        GLuint sliceSize = _mesa_format_image_size(texImage->TexFormat,
>                                                   texImage->Width,
>                                                   texImage->Height,
>                                                   1);
> -      assert(slice < texImage->Depth);
>        map += slice * sliceSize;
>     } else if (texImage->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
>        GLuint sliceSize = _mesa_format_image_size(texImage->TexFormat,
>                                                   texImage->Width,
>                                                   1,
>                                                   1);
> -      assert(slice < texImage->Height);
>        map += slice * sliceSize;
>     }
> --
> 1.7.10.4
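The const-qualification suggested in the review could look roughly like the sketch below. The two structs are cut-down, hypothetical stand-ins for Mesa's texture types (only the fields the helper reads), and the enum value is copied in just so the snippet compiles on its own; the real patch would keep using struct gl_texture_image.

```c
/* Hypothetical stand-ins for the Mesa types; only the fields used here. */
#define GL_TEXTURE_1D_ARRAY 0x8C18

struct tex_object {
   unsigned Target;
};

struct tex_image {
   struct tex_object *TexObject;
   unsigned Width, Height, Depth;
};

/* The helper only reads from the image, so the pointer can be const. */
static unsigned int
texture_slices(const struct tex_image *texImage)
{
   if (texImage->TexObject->Target == GL_TEXTURE_1D_ARRAY)
      return texImage->Height;   /* 1D arrays keep the layer count in Height */
   else
      return texImage->Depth;
}
```

Since every call site in the patch only passes the image through, adding const should not require any other change.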
Re: [Mesa-dev] [PATCH 2/3] One definition of C99 inline/__func__ to rule them all.
On Tue, Mar 12, 2013 at 3:39 PM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com We were in four already... --- include/c99_compat.h | 105 + src/egl/main/eglcompiler.h| 44 ++ src/gallium/include/pipe/p_compiler.h | 74 ++- src/mapi/mapi/u_compiler.h| 26 ++-- src/mesa/main/compiler.h | 56 ++ 5 files changed, 125 insertions(+), 180 deletions(-) create mode 100644 include/c99_compat.h diff --git a/include/c99_compat.h b/include/c99_compat.h new file mode 100644 index 000..39f958f --- /dev/null +++ b/include/c99_compat.h @@ -0,0 +1,105 @@ +/** + * + * Copyright 2007-2013 VMware, Inc. + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * Software), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. + * IN NO EVENT SHALL VMWARE AND/OR ITS SUPPLIERS BE LIABLE FOR + * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
> + *
> + **/
> +
> +#ifndef _C99_COMPAT_H_
> +#define _C99_COMPAT_H_
> +
> +
> +/*
> + * C99 inline keyword
> + */
> +#ifndef inline
> +#  ifdef __cplusplus
> +     /* C++ supports inline keyword */
> +#  elif defined(__GNUC__)
> +#    define inline __inline__
> +#  elif defined(_MSC_VER)
> +#    define inline __inline
> +#  elif defined(__ICL)
> +#    define inline __inline
> +#  elif defined(__INTEL_COMPILER)
> +     /* Intel compiler supports inline keyword */
> +#  elif defined(__WATCOMC__) && (__WATCOMC__ >= 1100)
> +#    define inline __inline
> +#  elif defined(__SUNPRO_C) && defined(__C99FEATURES__)

Solaris Studio supports __inline and __inline__

> +     /* C99 supports inline keyword */
> +#  elif (__STDC_VERSION__ >= 199901L)
> +     /* C99 supports inline keyword */
> +#  else
> +#    define inline
> +#  endif
> +#endif

The order of the checks will not work as expected. Intel's compiler will define __GNUC__, and so will clang. The check for __GNUC__ has to be the last one.

> +
> +
> +/*
> + * C99 restrict keyword
> + *
> + * See also:
> + * - http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html
> + */
> +#ifndef restrict
> +#  if (__STDC_VERSION__ >= 199901L)
> +     /* C99 */
> +#  elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
> +     /* C99 */

Solaris Studio supports _Restrict when not in C99 mode as well.

#define restrict _Restrict

> +#  elif defined(__GNUC__)
> +#    define restrict __restrict__
> +#  elif defined(_MSC_VER)
> +#    define restrict __restrict
> +#  else
> +#    define restrict /* */
> +#  endif
> +#endif
> +
> +
> +/*
> + * C99 __func__ macro
> + */
> +#ifndef __func__
> +#  if (__STDC_VERSION__ >= 199901L)
> +     /* C99 */
> +#  elif defined(__SUNPRO_C) && defined(__C99FEATURES__)
> +     /* C99 */

Solaris Studio supports __FUNCTION__ when not in C99 mode.
+# elif defined(__GNUC__) +#if __GNUC__ = 2 +# define __func__ __FUNCTION__ +#else +# define __func__ unknown +#endif +# elif defined(_MSC_VER) +#if _MSC_VER = 1300 +# define __func__ __FUNCTION__ +#else +# define __func__ unknown +#endif +# else +#define __func__ unknown +# endif +#endif + + +#endif /* _C99_COMPAT_H_ */ diff --git a/src/egl/main/eglcompiler.h b/src/egl/main/eglcompiler.h index 9823693..2499172 100644 --- a/src/egl/main/eglcompiler.h +++ b/src/egl/main/eglcompiler.h @@ -31,6 +31,9 @@ #define EGLCOMPILER_INCLUDED +#include c99_compat.h /* inline, __func__, etc. */ + + /** * Get standard integer types */ @@ -62,30 +65,7 @@ #endif -/** - * Function inlining - */ -#ifndef inline -# ifdef __cplusplus - /* C++ supports inline keyword */ -# elif defined(__GNUC__) -#define inline __inline__ -# elif defined(_MSC_VER) -#define inline __inline -# elif defined(__ICL) -#define inline __inline -# elif defined(__INTEL_COMPILER) -
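The ordering concern raised in the review of c99_compat.h can be illustrated with a small sketch: compilers that also define __GNUC__ (the Intel compiler and clang are the ones named in the thread) are tested first, and the plain __GNUC__ test comes after them. COMPILER_NAME and compiler_name() are hypothetical names used only for this illustration, not part of the patch.

```c
/* Sketch: test compilers that masquerade as GCC before testing __GNUC__
 * itself, so that each branch is reached by the compiler it is meant for. */
#if defined(__INTEL_COMPILER)
#  define COMPILER_NAME "icc"
#elif defined(__clang__)
#  define COMPILER_NAME "clang"
#elif defined(__GNUC__)
#  define COMPILER_NAME "gcc"
#elif defined(_MSC_VER)
#  define COMPILER_NAME "msvc"
#else
#  define COMPILER_NAME "unknown"
#endif

/* Returns the name selected by the preprocessor chain above. */
static const char *
compiler_name(void)
{
   return COMPILER_NAME;
}
```

For the inline keyword specifically the ordering may be harmless, since those compilers accept __inline__ as well, but the same chain shape matters as soon as a branch relies on behaviour that only real GCC has.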
Re: [Mesa-dev] [PATCH 2/2] mesa: Speedup the xrgb -> argb special case in fast_read_rgba_pixels_memcpy
On Mon, Mar 11, 2013 at 9:56 AM, Jose Fonseca jfons...@vmware.com wrote:
> I'm surprised this is faster.
>
> In particular, for big things we'll be touching memory twice.
>
> Did you measure the speed up?
>
> Jose

I'm sorry to be dull, but is there an SSE2 implementation of this somewhere for x86 / x64 CPUs?

Patrick

> - Original Message -
>> ---
>>  src/mesa/main/readpix.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
>> index 349b0bc..0f5c84c 100644
>> --- a/src/mesa/main/readpix.c
>> +++ b/src/mesa/main/readpix.c
>> @@ -285,11 +285,12 @@ fast_read_rgba_pixels_memcpy( struct gl_context *ctx,
>>        }
>>     } else if (copy_xrgb) {
>>        /* convert xrgb -> argb */
>> +      int alphaOffset = texelBytes - 1;
>>        for (j = 0; j < height; j++) {
>> -         GLuint *dst4 = (GLuint *) dst, *map4 = (GLuint *) map;
>> +         memcpy(dst, map, width * texelBytes);
>>           int i;
>>           for (i = 0; i < width; i++) {
>> -            dst4[i] = map4[i] | 0xff000000; /* set A=0xff */
>> +            dst[i * texelBytes + alphaOffset] = 0xff; /* set A=0xff */
>>           }
>>           dst += dstStride;
>>           map += stride;
>> --
>> 1.8.1.5
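To make the "touching memory twice" point concrete, here is a stand-alone sketch of the two variants being compared for one row of texels. The function names are hypothetical, and the sketch assumes 4-byte texels with alpha in the last byte in memory (which matches the OR with 0xff000000 only on little-endian hosts).

```c
#include <stdint.h>
#include <string.h>

/* Old approach: a single pass that ORs the alpha bits into each texel. */
static void
fill_alpha_or(uint32_t *dst, const uint32_t *src, int width)
{
   int i;
   for (i = 0; i < width; i++)
      dst[i] = src[i] | 0xff000000u;
}

/* Patched approach: memcpy the row, then make a second pass over dst
 * that overwrites one byte per texel. */
static void
fill_alpha_bytes(uint8_t *dst, const uint8_t *src, int width, int texelBytes)
{
   const int alphaOffset = texelBytes - 1;
   int i;
   memcpy(dst, src, (size_t) width * texelBytes);
   for (i = 0; i < width; i++)
      dst[i * texelBytes + alphaOffset] = 0xff;
}
```

Which one wins likely depends on row size and on how well the compiler vectorizes the first loop, so the question about measurements seems like the right one to settle it.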
Re: [Mesa-dev] [PATCH] meta: Allocate texture before initializing texture coordinates
On Fri, Feb 22, 2013 at 2:23 PM, Ian Romanick i...@freedesktop.org wrote:
> On 02/15/2013 11:20 AM, Anuj Phogat wrote:
>> tex->Sright and tex->Ttop are initialized during texture allocation.
>> This fixes depth buffer blitting failures in khronos conformance tests
>> when run on desktop GL 3.0.
>>
>> Note: This is a candidate for stable branches.
>>
>> Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
>
> Reviewed-by: Ian Romanick ian.d.roman...@intel.com
>
> I think there is a lot of room for other improvements in this code.
> Like... why are we doing glReadPixels into malloc memory, then handing
> that same pointer to glTexImage2D. We should (at least for desktop and
> GLES3) use a PBO.
>
>> ---
>>  src/mesa/drivers/common/meta.c | 17 -
>>  1 files changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
>> index 4e32b50..29a209e 100644
>> --- a/src/mesa/drivers/common/meta.c
>> +++ b/src/mesa/drivers/common/meta.c
>> @@ -1910,6 +1910,14 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
>>        GLuint *tmp = malloc(srcW * srcH * sizeof(GLuint));
>>
>>        if (tmp) {
>> +
>> +         newTex = alloc_texture(depthTex, srcW, srcH, GL_DEPTH_COMPONENT);

Are out of memory conditions handled in alloc_texture?

>> +         _mesa_ReadPixels(srcX, srcY, srcW, srcH, GL_DEPTH_COMPONENT,
>> +                          GL_UNSIGNED_INT, tmp);
>> +         setup_drawpix_texture(ctx, depthTex, newTex, GL_DEPTH_COMPONENT,
>> +                               srcW, srcH, GL_DEPTH_COMPONENT,
>> +                               GL_UNSIGNED_INT, tmp);
>> +
>>           /* texcoords (after texture allocation!) */
>>           {
>>              verts[0].s = 0.0F;
>> @@ -1928,15 +1936,6 @@ _mesa_meta_BlitFramebuffer(struct gl_context *ctx,
>>           if (!blit->DepthFP)
>>              init_blit_depth_pixels(ctx);
>>
>> -         /* maybe change tex format here */
>> -         newTex = alloc_texture(depthTex, srcW, srcH, GL_DEPTH_COMPONENT);
>> -
>> -         _mesa_ReadPixels(srcX, srcY, srcW, srcH,
>> -                          GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, tmp);
>> -
>> -         setup_drawpix_texture(ctx, depthTex, newTex, GL_DEPTH_COMPONENT, srcW, srcH,
>> -                               GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, tmp);
>> -
>>           _mesa_BindProgramARB(GL_FRAGMENT_PROGRAM_ARB, blit->DepthFP);
>>           _mesa_set_enable(ctx, GL_FRAGMENT_PROGRAM_ARB, GL_TRUE);
>>           _mesa_ColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
Re: [Mesa-dev] use of likely() / unlikely() macros
On Thu, Jan 17, 2013 at 10:37 AM, Brian Paul bri...@vmware.com wrote:
> In compiler.h we define the likely(), unlikely() macros which wrap GCC's
> __builtin_expect(). But we only use them in a handful of places.
>
> It seems to me that an obvious place to possibly use these would be for GL
> error testing. For example, in glDrawArrays():
>
>    if (unlikely(count <= 0)) {
>       _mesa_error();
>    }
>
> Plus, in some of the glBegin/End per-vertex calls such as
> glVertexAttrib3fARB() where we error test the index parameter.
>
> I guess the key question is how much might we gain from this. I don't
> really have a good feel for the value at this level. In a tight inner
> loop, sure, but the GL error checking is pretty high-level code.

This is basically a micro-optimization, to be honest. Not that micro-optimization is bad, but while it should improve performance, it would take a lot for that to show up on profiles. In the case of error checking at the start of a function, you might be lucky to save a few cycles -- virtually unnoticeable.

> I haven't found much on the web about performance gains from
> __builtin_expect(). Anyone?

I read a few hearsay posts, but this one comes with actual numbers: http://blog.man7.org/2012/10/how-much-do-builtinexpect-likely-and.html

Long story short: if you're wrong, slower; if you're right, marginal improvement. Its use is for changing the ordering of jumps from gcc's default of assuming linear execution. For example, code like this:
---
if(A == NULL) //not likely
    return ERR_NULL;
if(B >= MAX) //not likely
    return ERR_MAX;
if(C < MIN) //not likely
    return ERR_MIN;
doStuff();
---
generates jumps around the return statements, so in the normal case you're making a jump, which can mean you have a delay and possibly refetch instructions. If you didn't jump, then the CPU would already have the "then" part loaded in the icache. The optimal ordering then is:
---
if(A != NULL) {
    if(B < MAX) {
        if(C >= MIN) {
            doStuff();
        }
        else return ERR_MIN;
    }
    else return ERR_MAX;
}
else return ERR_NULL;
---
In the common case then, the code does not branch, but executes a linear stream of instructions.

On modern x86 CPUs, this matters very little, except for maybe a few in-order CPUs (maybe Intel Atom?). You're probably a lot more likely to get some improvements from non-x86 where branch prediction is weaker or unavailable and/or the CPU is in-order. ARM and older SPARC CPUs come to mind. Also, some architectures allow you to encode a branch prediction hint inside of the branch itself, e.g. IA64's br.call.sptk.many (Branch / Call / Static Predict Taken / Many Times), which gcc can take advantage of.

Still, overall, this is well within the realm of micro-optimization.

Patrick
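For reference, the error-check pattern from the example above can be written with the hint macros as in the sketch below. The macro definitions follow the usual __builtin_expect() idiom; the error codes, limits, and check_args() are hypothetical and exist only to make the snippet self-contained.

```c
#include <stddef.h>

/* Usual idiom: hint the expected truth value to GCC-compatible compilers,
 * fall back to a plain expression elsewhere. */
#if defined(__GNUC__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

/* Hypothetical error codes and limits for the illustration. */
enum { CHECK_OK = 0, ERR_NULL = -1, ERR_MAX = -2, ERR_MIN = -3 };
#define MIN_VAL 0
#define MAX_VAL 100

/* Each error test is marked unlikely, so the compiler is free to lay out
 * the success path as the fall-through sequence. */
static int
check_args(const void *a, int b, int c)
{
   if (unlikely(a == NULL))
      return ERR_NULL;
   if (unlikely(b >= MAX_VAL))
      return ERR_MAX;
   if (unlikely(c < MIN_VAL))
      return ERR_MIN;
   return CHECK_OK;
}
```

The source stays in the readable early-return form; whether the generated code actually changes is up to the compiler and optimization level.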
Re: [Mesa-dev] GL 3.1 on Radeon HD 4670?
DOH. I'm sorry, I read that Mesa supported GL 3.1 and somehow I generalized that to all drivers. Thanks for that TODO list. I guess I need to start reading about the R700 architecture...

Patrick

On Wed, Oct 31, 2012 at 1:28 PM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Wed, Oct 31, 2012 at 1:11 PM, Patrick Baggett baggett.patr...@gmail.com wrote:
>> Hi all,
>>
>> I've got a really weird duck of a system: an Itanium2 system running Linux
>> 3.7.0-rc3 with the newest libdrm and mesa git from yesterday. I configured
>> it with --enable-texture-float and the radeon DRI driver. When I use
>> glxinfo, I see that it is Mesa 9.1-devel but only OpenGL 3.0. Is that
>> because my version of glxinfo doesn't create the appropriate context? Is
>> there an updated version of glxinfo that does? Or a flag that I should
>> pass to only consider core contexts?
>
> The open source r600g driver only supports GL 3.0 at the moment. See this
> document to see what's still missing:
> http://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt
>
> Alex
Re: [Mesa-dev] R600 tiling halves the frame rate
Is your screen refresh rate 70 Hz? Because if so, that means that it's syncing to the vblank on Mesa, and not doing so on the proprietary one.

Patrick

On Mon, Oct 29, 2012 at 8:24 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
> On 10/28/2012 12:56 PM, Tzvetan Mikov wrote:
>> On 10/28/2012 04:26 AM, Marek Olšák wrote:
>>
>> No, there is no X11 at all. I am running my tests on a very bare system
>> with EGL only, hoping to minimize the test surface and isolate any
>> interferences. I will try it though (it will also enable me to compare
>> against the proprietary drivers as a baseline, I guess).
>
> This is not directly related to tiling, but I installed the proprietary
> drivers on the same hardware, and I am getting about 3X the performance.
> (From 70 FPS to 225 FPS in 1920x1200 on a HD6460).
>
> Is it known what the main reason is for such a dramatic performance
> difference between the Mesa R600 driver and proprietary driver? This is a
> very simple test app rendering two textured rectangles on screen, so I am
> guessing the difference must be due to something fundamental.
>
> regards,
> Tzvetan
Re: [Mesa-dev] Mesa (master): Use signbit() in IS_NEGATIVE and DIFFERENT_SIGNS
Concurrency::precise_math::signbit(), and only as of VS 2012 runtimes. This is an awfully high bar for such a simple function.

On Mon, Sep 24, 2012 at 1:43 PM, Matt Turner matts...@gmail.com wrote:
> On Mon, Sep 24, 2012 at 11:02 AM, Brian Paul bri...@vmware.com wrote:
>> On 09/24/2012 10:49 AM, Matt Turner wrote:
>>> Module: Mesa
>>> Branch: master
>>> Commit: 0f3ba405eada72e1ab4371948315b28608903927
>>> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=0f3ba405eada72e1ab4371948315b28608903927
>>>
>>> Author: Matt Turner matts...@gmail.com
>>> Date: Fri Sep 14 16:04:40 2012 -0700
>>>
>>> Use signbit() in IS_NEGATIVE and DIFFERENT_SIGNS
>>>
>>> signbit() appears to be available everywhere (even MSVC according to
>>> MSDN), so let's use it instead of open-coding some messy and confusing
>>> bit twiddling macros.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54805
>>> Reviewed-by: Paul Berry stereotype...@gmail.com
>>> Suggested-by: Ian Romanick ian.d.roman...@intel.com
>>>
>>> ---
>>>  configure.ac | 7 +++
>>>  src/mesa/main/macros.h | 21 ++---
>>>  2 files changed, 9 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/configure.ac b/configure.ac
>>> index 4193496..cb65467 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -499,6 +499,13 @@ AC_SUBST([DLOPEN_LIBS])
>>>  dnl See if posix_memalign is available
>>>  AC_CHECK_FUNC([posix_memalign], [DEFINES="$DEFINES -DHAVE_POSIX_MEMALIGN"])
>>>
>>> +dnl signbit() is a macro in glibc's math.h, so AC_CHECK_FUNC fails. To handle
>>> +dnl this, use AC_CHECK_DECLS and fallback to AC_CHECK_FUNC in case it fails.
>>> +AC_CHECK_DECLS([signbit],[],
>>> +               AC_CHECK_FUNC([signbit],[],
>>> +                             AC_MSG_ERROR([could not find signbit()])),
>>> +               [#include <math.h>])
>>> +
>>>  dnl SELinux awareness.
>>>  AC_ARG_ENABLE([selinux],
>>>      [AS_HELP_STRING([--enable-selinux],
>>> diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h
>>> index 04d59d7..7b7fd1b 100644
>>> --- a/src/mesa/main/macros.h
>>> +++ b/src/mesa/main/macros.h
>>> @@ -693,31 +693,14 @@ NORMALIZE_3FV(GLfloat v[3])
>>>  static inline GLboolean
>>>  IS_NEGATIVE(float x)
>>>  {
>>> -#if defined(USE_IEEE)
>>> -   fi_type fi;
>>> -   fi.f = x;
>>> -   return fi.i < 0;
>>> -#else
>>> -   return x < 0.0F;
>>> -#endif
>>> +   return signbit(x) != 0;
>>>  }
>>>
>>> -
>>>  /** Test two floats have opposite signs */
>>>  static inline GLboolean
>>>  DIFFERENT_SIGNS(GLfloat x, GLfloat y)
>>>  {
>>> -#if defined(USE_IEEE)
>>> -   fi_type xfi, yfi;
>>> -   xfi.f = x;
>>> -   yfi.f = y;
>>> -   return !!((xfi.i ^ yfi.i) & (1u << 31));
>>> -#else
>>> -   /* Could just use (x*y<0) except for the flatshading requirements.
>>> -    * Maybe there's a better way?
>>> -    */
>>> -   return ((x) * (y) <= 0.0F && (x) - (y) != 0.0F);
>>> -#endif
>>> +   return signbit(x) != signbit(y);
>>>  }
>>
>> Looks like we don't have signbit() on Windows. We build with scons there
>> so the autoconf check isn't applicable.
>>
>> I'll post a patch in a bit.
>>
>> -Brian
>
> MSDN claims that Windows does have signbit():
> http://msdn.microsoft.com/en-us/library/hh308342.aspx
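A small sketch of why the commit prefers signbit() over a plain comparison: the two disagree for negative zero. The helper names are hypothetical. Note also that C99 only promises that signbit() returns a nonzero value for negative inputs, so this sketch normalizes each result before comparing, which is slightly more defensive than the signbit(x) != signbit(y) form in the patch.

```c
#include <math.h>

/* Comparison-based test: treats -0.0f as non-negative. */
static int
is_negative_compare(float x)
{
   return x < 0.0f;
}

/* Sign-bit test: reports -0.0f as negative. */
static int
is_negative_signbit(float x)
{
   return signbit(x) != 0;
}

/* Normalize each signbit() result to 0/1 before comparing, since the
 * standard only guarantees "nonzero" for negative values. */
static int
different_signs(float x, float y)
{
   return (signbit(x) != 0) != (signbit(y) != 0);
}
```

On toolchains where signbit() is missing from the C headers (the MSVC case discussed in the thread), a fallback would still be needed; this sketch does not attempt one.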
Re: [Mesa-dev] [PATCH] mesa: loosen small matrix determinant check
On Mon, Jul 30, 2012 at 4:31 AM, Pekka Paalanen ppaala...@gmail.com wrote:
> On Tue, 24 Jul 2012 11:31:59 -0600 Brian Paul bri...@vmware.com wrote:
>> When computing a matrix inverse, if the determinant is too small we could
>> hit a divide by zero. There's a check to prevent this (we basically give
>> up on computing the inverse and return the identity matrix.) This patch
>> loosens this test to fix a lighting bug reported by Lars Henning Wendt.
>>
>> NOTE: This is a candidate for the 8.0 branch.
>> ---
>>  src/mesa/math/m_matrix.c | 2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/src/mesa/math/m_matrix.c b/src/mesa/math/m_matrix.c
>> index 02aedba..ef377ee 100644
>> --- a/src/mesa/math/m_matrix.c
>> +++ b/src/mesa/math/m_matrix.c
>> @@ -513,7 +513,7 @@ static GLboolean invert_matrix_3d_general( GLmatrix *mat )
>>
>>     det = pos + neg;
>>
>> -   if (det*det < 1e-25)
>> +   if (det < 1e-25)
>>        return GL_FALSE;
>>
>>     det = 1.0F / det;
>
> Hi,
>
> just a fly-by question; doesn't that break if determinant is negative?
> I.e. reflection transformations.

Yeah, I think you need a fabsf() there.
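A minimal sketch of the fabsf() variant suggested above, pulled out into a hypothetical helper (safe_inverse_det() is not a Mesa function). The 1e-25 threshold is the one from the patch; the helper rejects only determinants whose magnitude is tiny, so negative determinants from reflections are still inverted.

```c
#include <math.h>

/* Returns 0 and leaves *inv_det untouched when the determinant is too
 * close to zero to invert safely; otherwise stores 1/det and returns 1. */
static int
safe_inverse_det(float det, float *inv_det)
{
   if (fabsf(det) < 1e-25f)
      return 0;
   *inv_det = 1.0f / det;
   return 1;
}
```

With the plain `det < 1e-25` test, a determinant of -2.0 would be rejected as singular; with the magnitude test it yields -0.5 as expected.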
Re: [Mesa-dev] IROUND() issue
On Fri, May 18, 2012 at 11:28 AM, Brian Paul bri...@vmware.com wrote:
> On 05/18/2012 10:11 AM, Jose Fonseca wrote:
>> - Original Message -
>>> A while back I noticed that the piglit roundmode-pixelstore and
>>> roundmode-getinteger tests pass on my 64-bit Fedora system but fail on a
>>> 32-bit Ubuntu system. Both glGetIntegerv() and glPixelStoref() use the
>>> IROUND() function to convert floats to ints.
>>>
>>> The implementation of IROUND() that uses the x86 fistp instruction is
>>> protected with:
>>>
>>> #if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
>>>
>>> but that evaluates to 0 on x86-64 (neither USE_X86_ASM nor __i386__ are
>>> defined) so we use the C fallback:
>>>
>>> #define IROUND(f) ((int) (((f) >= 0.0F) ? ((f) + 0.5F) : ((f) - 0.5F)))
>>>
>>> The C version of IROUND() does what we want for the piglit tests but not
>>> the x86 version. I think the default x86 rounding mode is FE_UPWARD so
>>> that explains the failures.
>>>
>>> So I think I'd like to do the following:
>>>
>>> 1. Enable the x86 fistp-based functions in imports.h for x86-64.
>>
>> It's illegal/inefficient to use x87 on x86-64. We should use the
>> appropriate SSE intrinsic instead. The instruction is cvtss2si.

Even if you use SSE here, you depend on the rounding mode in the MXCSR register, which means you'll have to set that, because some applications change this mode to use a faster or more precise rounding mode. It's the parallel problem that you have with fistp.

>>> 2. Rename IROUND() to IROUND_FAST() and define it as float->int
>>> conversion by whatever method is fastest.
>>>
>>> 3. Define IROUND() as round to nearest int. For the x86 fistp
>>> implementation this would involve setting/restoring the rounding mode.

If I recall, it is generally run with some rounding mode other than truncate by default, so usually float -> int conversions that involve truncation (C cast) require changing the rounding mode *to truncation*. This was such a problem that in SSE3 there is fisttp, which is FP integer store with truncation. I guess though if the default rounding mode causes problems, there isn't much that can be done but change it each time.

Patrick
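The difference between the C fallback quoted in the thread and a plain cast can be seen in a short sketch. The IROUND macro is the fallback from the original message; iround_nearest() and itrunc() are hypothetical wrappers added here for illustration. Neither depends on the FPU rounding mode, which is the property the fistp and cvtss2si paths lack.

```c
/* C fallback quoted in the thread: add or subtract 0.5 and truncate,
 * i.e. round to nearest with halfway cases going away from zero. */
#define IROUND(f) ((int) (((f) >= 0.0F) ? ((f) + 0.5F) : ((f) - 0.5F)))

/* Round-to-nearest via the fallback macro. */
static int
iround_nearest(float f)
{
   return IROUND(f);
}

/* A C cast always truncates toward zero, regardless of rounding mode. */
static int
itrunc(float f)
{
   return (int) f;
}
```

An instruction-based implementation would have to reproduce exactly these results under every rounding mode an application might have selected, which is why setting and restoring the mode comes up in point 3.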
Re: [Mesa-dev] Four questions about DRI1 drivers
Now I'm curious. Is it the case that every DRI1 driver *could be* a DRI2 driver with enough effort? Not talking about emulating hardware features.

Patrick

On Thu, Mar 1, 2012 at 1:46 PM, Dave Airlie airl...@gmail.com wrote:
> On Thu, Mar 1, 2012 at 7:25 PM, Connor Behan connor.be...@gmail.com wrote:
>> On 01/03/12 01:36 AM, Dave Airlie wrote:
>>> You can still build r128_dri.so from Mesa 7.11 and it will work with
>>> later Mesa libGLs fine. You just can't build it from Mesa 8.0 source
>>> anymore.
>>
>> Really? Even if no one updates r128 to stay compatible with new libGLs and
>> no one updating libGL gives a second thought as to whether that update
>> will break r128? I thought the whole point of removing DRI1 drivers is
>> that most of you are too pressured to keep that promise. If the plan
>> really is to update libGL carefully so that DRI1 drivers will always work
>> with it, then it seems like their removal does nothing but save a few MB
>> of space on the git server.
>
> That's the plan; some distros have to keep shipping older drivers, but
> also want to ship newer drivers.
>
> The libGL-driver interface is a lot more standard than the internal
> mesa-driver interfaces, and they are not the same thing.
>
> Removing the drivers allowed major simplification of the mesa internal
> interfaces, not the GL-driver interface.
>
> It doesn't save any space on the git server since git holds all the
> history ever.
>
> Dave.
Re: [Mesa-dev] [PATCH] gallium/i965g: hide that utterly broken driver better
On Mon, Nov 28, 2011 at 3:32 PM, Daniel Vetter daniel.vet...@ffwll.ch wrote:

And warn loudly in case people want to use it. Too many testers report gpu hangs on irc and we rootcause this ...

Signed-Off-by: Daniel Vetter daniel.vet...@ffwll.ch
---
 configure.ac |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/configure.ac b/configure.ac
index 8885a6d..4dee3ad 100644
--- a/configure.ac
+++ b/configure.ac
@@ -658,7 +658,7 @@ GALLIUM_DRIVERS_DEFAULT=r300,r600,swrast
 AC_ARG_WITH([gallium-drivers],
     [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
         [comma delimited Gallium drivers list, e.g.
-        i915,i965,nouveau,r300,r600,svga,swrast
+        i915,nouveau,r300,r600,svga,swrast
         @<:@default=r300,r600,swrast@:>@])],
     [with_gallium_drivers=$withval],
     [with_gallium_drivers=$GALLIUM_DRIVERS_DEFAULT])
@@ -2007,10 +2007,17 @@ if echo $SRC_DIRS | grep 'gallium' >/dev/null 2>&1; then
     echo Winsys dirs: $GALLIUM_WINSYS_DIRS
     echo Driver dirs: $GALLIUM_DRIVERS_DIRS
     echo Trackers dirs: $GALLIUM_STATE_TRACKERS_DIRS
+    if echo $GALLIUM_DRIVERS_DIRS | grep i965 >/dev/null 2>&1; then
+        echo
+        echo WARNING: enabling i965 gallium driver
+        echo the i965g driver is currently utterly broken, only for adventurours developers

I think the word is adventurous.

+        echo
+    fi
 else
     echo Gallium: no
 fi
 
+
 dnl Libraries
 echo
 echo Shared libs: $enable_shared
-- 
1.7.7.1
Re: [Mesa-dev] [PATCH] mesa: re-implement unpacking of DEPTH_COMPONENT32F
On Tue, Nov 22, 2011 at 2:07 PM, Marek Olšák mar...@gmail.com wrote:

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43122
---
 src/mesa/main/format_unpack.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/src/mesa/main/format_unpack.c b/src/mesa/main/format_unpack.c
index 6e2ce7a..52f224a 100644
--- a/src/mesa/main/format_unpack.c
+++ b/src/mesa/main/format_unpack.c
@@ -1751,6 +1751,13 @@ unpack_float_z_Z32(GLuint n, const void *src, GLfloat *dst)
 }
 
 static void
+unpack_float_z_Z32F(GLuint n, const void *src, GLfloat *dst)
+{
+   const GLfloat *s = ((const GLfloat *) src);
+   memcpy(dst, s, n * sizeof(float));
+}

Why bother typecasting here in a separate variable 's'?

+
+static void
 unpack_float_z_Z32X24S8(GLuint n, const void *src, GLfloat *dst)
 {
    const GLfloat *s = ((const GLfloat *) src);
@@ -1783,6 +1790,9 @@ _mesa_unpack_float_z_row(gl_format format, GLuint n,
    case MESA_FORMAT_Z32:
       unpack = unpack_float_z_Z32;
       break;
+   case MESA_FORMAT_Z32_FLOAT:
+      unpack = unpack_float_z_Z32F;
+      break;
    case MESA_FORMAT_Z32_FLOAT_X24S8:
       unpack = unpack_float_z_Z32X24S8;
       break;
-- 
1.7.5.4
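To illustrate the review comment, here is a sketch of the same helper without the intermediate pointer. memcpy() takes a const void * source, so the cast and the extra variable are not needed. The typedefs below are stand-ins so the snippet compiles without the GL headers; this is not necessarily what was finally committed.

```c
#include <string.h>

/* Stand-ins for the GL typedefs, only so the sketch is self-contained. */
typedef unsigned int GLuint;
typedef float GLfloat;

/* Z32F rows are already floats, so unpacking is a straight copy.
 * src converts implicitly to memcpy's const void * parameter. */
static void
unpack_float_z_Z32F(GLuint n, const void *src, GLfloat *dst)
{
   memcpy(dst, src, n * sizeof(GLfloat));
}
```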
Re: [Mesa-dev] Mesa (master): st/xorg: fix build without LLVM
Well, the trivial answer is that Win32 uses some C/C++ runtime provided by Microsoft, usually something like MSVCR90.DLL (v9.0) etc. Solaris uses libC.so, for example. As far as I know, only systems where the GNU C/C++ compiler is the main system compiler (and generally therefore the GNU C++ runtime) use anything named libstdc++. So I'd expect Free/Net/OpenBSD + Linux use that naming and probably not much else. On other commercial UNIXes, if it does exist, it is just for compatibility with C++ programs compiled using g++.

Patrick

2011/10/13 Marcin Slusarz marcin.slus...@gmail.com

On Thu, Oct 13, 2011 at 07:54:32PM +0200, Michel Dänzer wrote:

On Thu, 2011-10-13 at 10:03 -0700, Marcin Ślusarz wrote:

Module: Mesa
Branch: master
Commit: 349e4db99e938f8ee8826b0d27e490c66a1e8356
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=349e4db99e938f8ee8826b0d27e490c66a1e8356
Author: Marcin Slusarz marcin.slus...@gmail.com
Date: Thu Oct 13 18:44:40 2011 +0200

st/xorg: fix build without LLVM
---
 src/gallium/targets/Makefile.xorg |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/gallium/targets/Makefile.xorg b/src/gallium/targets/Makefile.xorg
index 9269375..c96eded 100644
--- a/src/gallium/targets/Makefile.xorg
+++ b/src/gallium/targets/Makefile.xorg
@@ -33,6 +33,8 @@ LD = $(CXX)
 LDFLAGS += $(LLVM_LDFLAGS)
 USE_CXX=1
 DRIVER_LINKS += $(LLVM_LIBS) -lm -ldl
+else
+LDFLAGS += -lstdc++
 endif

This is wrong. Use g++ for linking libstdc++, gcc [...] -lstdc++ doesn't work everywhere.

It wasn't my invention - I mimicked other targets (with partial exception of dri). Why doesn't gcc -lstdc++ work everywhere?

---
From: Marcin Slusarz marcin.slus...@gmail.com
Subject: [PATCH] gallium/targets: use g++ for linking

As pointed out by Michel Dänzer, gcc -lstdc++ doesn't work everywhere, because ...

Use g++ for linking and remove redundant LDFLAGS += -lstdc++.
--- src/gallium/targets/Makefile.dri |2 -- src/gallium/targets/Makefile.va|4 +--- src/gallium/targets/Makefile.vdpau |4 +--- src/gallium/targets/Makefile.xorg |5 + src/gallium/targets/Makefile.xvmc |4 +--- 5 files changed, 4 insertions(+), 15 deletions(-) diff --git a/src/gallium/targets/Makefile.dri b/src/gallium/targets/Makefile.dri index 857ebfe..a26b3ee 100644 --- a/src/gallium/targets/Makefile.dri +++ b/src/gallium/targets/Makefile.dri @@ -4,8 +4,6 @@ ifeq ($(MESA_LLVM),1) LDFLAGS += $(LLVM_LDFLAGS) DRIVER_EXTRAS = $(LLVM_LIBS) -else -LDFLAGS += -lstdc++ endif MESA_MODULES = \ diff --git a/src/gallium/targets/Makefile.va b/src/gallium/targets/Makefile.va index 7ced430..b6ee595 100644 --- a/src/gallium/targets/Makefile.va +++ b/src/gallium/targets/Makefile.va @@ -17,8 +17,6 @@ STATE_TRACKER_LIB = $(TOP)/src/gallium/state_trackers/va/libvatracker.a ifeq ($(MESA_LLVM),1) LDFLAGS += $(LLVM_LDFLAGS) DRIVER_EXTRAS = $(LLVM_LIBS) -else -LDFLAGS += -lstdc++ endif # XXX: Hack, VA public funcs aren't exported @@ -39,7 +37,7 @@ OBJECTS = $(C_SOURCES:.c=.o) \ default: depend symlinks $(TOP)/$(LIB_DIR)/gallium/$(LIBNAME) $(TOP)/$(LIB_DIR)/gallium/$(LIBNAME): $(OBJECTS) $(PIPE_DRIVERS) $(STATE_TRACKER_LIB) $(TOP)/$(LIB_DIR)/gallium Makefile - $(MKLIB) -o $(LIBBASENAME) -linker '$(CC)' -ldflags '$(LDFLAGS)' \ + $(MKLIB) -o $(LIBBASENAME) -linker '$(CXX)' -ldflags '$(LDFLAGS)' \ -major $(VA_MAJOR) -minor $(VA_MINOR) $(MKLIB_OPTIONS) \ -install $(TOP)/$(LIB_DIR)/gallium \ $(OBJECTS) $(STATE_TRACKER_LIB) $(PIPE_DRIVERS) $(LIBS) $(DRIVER_EXTRAS) diff --git a/src/gallium/targets/Makefile.vdpau b/src/gallium/targets/Makefile.vdpau index c634915..f6b89ad 100644 --- a/src/gallium/targets/Makefile.vdpau +++ b/src/gallium/targets/Makefile.vdpau @@ -17,8 +17,6 @@ STATE_TRACKER_LIB = $(TOP)/src/gallium/state_trackers/vdpau/libvdpautracker.a ifeq ($(MESA_LLVM),1) LDFLAGS += $(LLVM_LDFLAGS) DRIVER_EXTRAS = $(LLVM_LIBS) -else -LDFLAGS += -lstdc++ endif # XXX: Hack, VDPAU public funcs 
aren't exported if we link to libvdpautracker.a :( @@ -39,7 +37,7 @@ OBJECTS = $(C_SOURCES:.c=.o) \ default: depend symlinks $(TOP)/$(LIB_DIR)/gallium/$(LIBNAME) $(TOP)/$(LIB_DIR)/gallium/$(LIBNAME): $(OBJECTS) $(PIPE_DRIVERS) $(STATE_TRACKER_LIB) $(TOP)/$(LIB_DIR)/gallium Makefile - $(MKLIB) -o $(LIBBASENAME) -linker '$(CC)' -ldflags '$(LDFLAGS)' \ + $(MKLIB) -o $(LIBBASENAME) -linker '$(CXX)' -ldflags '$(LDFLAGS)' \ -major $(VDPAU_MAJOR) -minor $(VDPAU_MINOR) $(MKLIB_OPTIONS) \ -install $(TOP)/$(LIB_DIR)/gallium \ $(OBJECTS) $(STATE_TRACKER_LIB) $(PIPE_DRIVERS) $(LIBS) $(DRIVER_EXTRAS) diff --git a/src/gallium/targets/Makefile.xorg b/src/gallium/targets/Makefile.xorg index c96eded..0538b2b 100644 --- a/src/gallium/targets/Makefile.xorg +++
Re: [Mesa-dev] DEATH to old drivers!
My Voodoo3 3500 AGP just wept.

On Wed, Aug 24, 2011 at 4:36 PM, Eric Anholt e...@anholt.net wrote:

On Wed, 24 Aug 2011 12:11:32 -0700, Ian Romanick i...@freedesktop.org wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'd like to propose giving the ax to a bunch of old, unmaintained drivers. I've been doing a bunch of refactoring and reworking of core Mesa code, and these drivers have been causing me problems for a number of reasons.

Acked!
Re: [Mesa-dev] GPL'd vl_mpeg12_bitstream.c
Why not ask the original author to relicense?

2011/8/12 Marek Olšák mar...@gmail.com

2011/8/12 Christian König deathsim...@vodafone.de:

On Friday, 12.08.2011, at 10:49 -0400, Younes Manton wrote:

Sorry, by incompatible I didn't mean that you couldn't use them together, but that one is more restrictive than the other. Like the discussion you quoted states, if you combine MIT and GPL you have to satisfy both of them, which means you have to satisfy the GPL. I personally don't care that much, but unfortunately with the way gallium is built it affects more than just VDPAU. Every driver in lib/gallium includes that code, including swrast_dri (softpipe), r600_dri, etc, and libGL loads those drivers. If you build with the swrast config instead of DRI I believe gallium libGL statically links with softpipe, so basically my understanding is that anyone linking with gallium libGL (both swrast and DRI configs) has to satisfy the GPL now.

Ah crap, you're right. I'd forgotten that the GPL has a problem even when code is just linked in, as opposed to being used.

Maybe someone else who is more familiar with these sorts of things can comment and confirm that this is accurate and whether or not it's a problem.

I already asked around in my AMD team, and the general answer was: Oh fuck I've no idea, please don't give me a headache. I could ask around a bit more, but I don't think we'll get a definitive answer before xmas.

As a short term solution we could compile that code conditionally, and only enable it when the VDPAU state tracker is enabled. But as the long term solution the code just needs a rewrite; besides having a license problem, it is just not very optimal. The original code is something like a decade old, and is using a whole bunch of quirks which are not useful by today's standards (not including the sign in mv tables for example). ffmpeg's/libav's implementation for example is something like half the size and even faster, but uses more memory for table lookups. But that code is also dual licensed under the GPL/LGPL. Using LGPL code instead could also be a solution, because very important parts of Mesa (the GLSL parser for example) are already licensed under that, but I'm not an expert on that either.

Even though the GLSL parser is licensed under LGPL (because Bison is), there is a special exception that we may license it under whatever license we want if we don't make software that does exactly what Bison does. So the whole GLSL compiler is actually licensed under the MIT license. There was one LGPL dependency (talloc), but Intel has paid special attention to get rid of that. My recollection is nobody wanted LGPL or GPL code in Mesa.

Marek
Re: [Mesa-dev] rationale for GLubyte pointers for strings?
SGI invented OpenGL and offered it first on their IRIX platform. SGI's MIPSpro compiler has the char datatype as unsigned by default, so the compiler would likely complain if assigning a GLbyte pointer to an [unsigned] character pointer. Thus, something like

char* ext = glGetString(GL_VENDOR);

doesn't require a cast on IRIX, while the same code would require a cast using other compilers due to the aforementioned problem.

Patrick

On Tue, Jul 19, 2011 at 1:44 PM, Allen Akin a...@arden.org wrote:

On Tue, Jul 19, 2011 at 12:20:54PM -0600, tom fogal wrote:
| glGetString and gluErrorString, plus maybe some other functions, return
| GLubyte pointers instead of simply character pointers...
| What's the rationale here?

I agree, it's odd. I don't remember the rationale, but my best guess is that it papered over some compatibility issue with another language binding (probably Fortran). I suppose there's a very slight possibility that it sprang from a compatibility issue with Cray.

Allen
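For reference, a small sketch of what the cast looks like on compilers that do warn. fake_get_string() is a made-up stand-in for glGetString() so the snippet builds without a GL context, and the GLubyte typedef mirrors the usual unsigned char definition. Whether plain char is signed can be checked with CHAR_MIN from limits.h.

```c
#include <limits.h>
#include <string.h>

typedef unsigned char GLubyte;   /* stand-in for the GL typedef */

/* Hypothetical replacement for glGetString(GL_VENDOR). */
static const GLubyte *fake_get_string(void)
{
    static const GLubyte vendor[] = "Example Vendor";
    return vendor;
}

/* Returns 1 when plain char is unsigned on this compiler, else 0. */
static int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* The explicit cast keeps strict compilers quiet; the bytes are unchanged. */
static size_t vendor_length(void)
{
    const char *ext = (const char *) fake_get_string();
    return strlen(ext);
}
```

Note that in ISO C, char, signed char and unsigned char are three distinct types, so a strictly conforming compiler may diagnose the uncast assignment even where plain char happens to be unsigned.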
Re: [Mesa-dev] is it possible to dynamic load OSMesa?
If libOSMesa.so is a separate library, then isn't libGL.so too? You're calling glGetIntegerv() from libGL.so but not from libOSMesa.so -- try doing dlsym(glGetIntegerv) and removing libGL.so from the link line.

Patrick

On Fri, Jul 15, 2011 at 2:41 PM, Paul Gotzel paul.got...@gmail.com wrote:

Hello,

I've downloaded the latest 7.10.3 and I need to be able to dynamically load OSMesa. Is this possible? I've tried to use dlopen and dlsym to load the functions and all the OSMesa calls return success but when I make the gl calls I get:

GL User Error: glGetIntegerv called without a rendering context
GL User Error: glGetIntegerv called without a rendering context
GL User Error: glGetIntegerv called without a rendering context

Any help would be appreciated.

Thanks,
Paul

My sample program is as follows. I compile it with the same flags as the rest of the demo programs without linking to OSMesa.

static void *
loadOSMesa()
{
   return dlopen("libOSMesa.so", RTLD_DEEPBIND | RTLD_NOW | RTLD_GLOBAL);
}

static OSMesaContext
dynOSMesaCreateContext()
{
   typedef OSMesaContext (*CreateContextProto)( GLenum , GLint , GLint , GLint , OSMesaContext );
   static void *createPfunc = NULL;
   CreateContextProto createContext;

   if (createPfunc == NULL) {
      void *handle = loadOSMesa();
      if (handle) {
         createPfunc = dlsym(handle, "OSMesaCreateContextExt");
      }
   }

   if (createPfunc) {
      createContext = (CreateContextProto)(createPfunc);
      return (*createContext)(GL_RGBA, 16, 0, 0, NULL);
   }
   return 0;
}

static GLboolean
dynOSMesaMakeCurrent(OSMesaContext cid, void * win, GLenum type, GLsizei w, GLsizei h)
{
   typedef GLboolean (*MakeCurrentProto)(OSMesaContext, void *, GLenum, GLsizei, GLsizei);
   static void *currentPfunc = NULL;
   MakeCurrentProto makeCurrent;

   if (currentPfunc == NULL) {
      void *handle = loadOSMesa();
      if (handle) {
         currentPfunc = dlsym(handle, "OSMesaMakeCurrent");
      }
   }

   if (currentPfunc) {
      makeCurrent = (MakeCurrentProto)(currentPfunc);
      return (*makeCurrent)(cid, win, type, w, h);
   }
   return GL_FALSE;
}

int
main(int argc, char *argv[])
{
   OSMesaContext ctx;
   void *buffer;

   ctx = dynOSMesaCreateContext();
   if (!ctx) {
      printf("OSMesaCreateContext failed!\n");
      return 0;
   }

   int Width = 100;
   int Height = 100;

   /* Allocate the image buffer */
   buffer = malloc( Width * Height * 4 * sizeof(GLubyte) );
   if (!buffer) {
      printf("Alloc image buffer failed!\n");
      return 0;
   }

   /* Bind the buffer to the context and make it current */
   if (!dynOSMesaMakeCurrent( ctx, buffer, GL_UNSIGNED_BYTE, Width, Height )) {
      printf("OSMesaMakeCurrent failed!\n");
      return 0;
   }

   {
      int z, s, a;
      glGetIntegerv(GL_DEPTH_BITS, &z);
      glGetIntegerv(GL_STENCIL_BITS, &s);
      glGetIntegerv(GL_ACCUM_RED_BITS, &a);
      printf("Depth=%d Stencil=%d Accum=%d\n", z, s, a);
   }

   return 0;
}
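A rough sketch of the suggestion at the top of this message: resolve glGetIntegerv through dlsym() on the OSMesa library instead of letting the linker bind it to libGL.so. The helper names load_symbol and load_get_integerv are invented here, the prototype uses plain C types as stand-ins for GLenum and GLint, and it is an assumption (not verified against 7.10.3) that libOSMesa.so exports the gl* entry points.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Stand-in for void glGetIntegerv(GLenum pname, GLint *params). */
typedef void (*GetIntegervProto)(unsigned int pname, int *params);

/* Hypothetical helper: open a library and look up one symbol in it.
 * Returns NULL if either step fails. Each call takes another reference
 * on the library, as the quoted sample program also does. */
static void *load_symbol(const char *library, const char *name)
{
    void *handle = dlopen(library, RTLD_NOW | RTLD_GLOBAL);

    return handle ? dlsym(handle, name) : NULL;
}

/* Fetch glGetIntegerv from the given OSMesa library, so that GL calls
 * reach the same library that owns the current rendering context. */
static GetIntegervProto load_get_integerv(const char *osmesa_path)
{
    /* POSIX requires this object-to-function pointer cast to work. */
    return (GetIntegervProto) load_symbol(osmesa_path, "glGetIntegerv");
}
```

With something like this, the sample's main() would call the returned pointer, for example get_integerv(GL_DEPTH_BITS, &z), and libGL.so would come off the link line. On older glibc versions the program needs -ldl.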
Re: [Mesa-dev] [PATCH v2 2/2] xorg/nouveau: blacklist all pre NV30 cards
Wasn't nouveau targeted to provide HW acceleration for old cards like the TNT2, or has that idea been killed?

Patrick

On Sun, Jun 5, 2011 at 2:06 PM, Marcin Slusarz marcin.slus...@gmail.com wrote:

On Tue, May 17, 2011 at 12:20:14AM +0200, Marcin Slusarz wrote:

Bail out early in probe, so another driver can take control of the card. Doing it in screen_create would be too late.
---
 src/gallium/targets/xorg-nouveau/nouveau_xorg.c | 44 ++-
 1 files changed, 35 insertions(+), 9 deletions(-)

ping
Re: [Mesa-dev] [PATCH] mesa: silence some compilation warnings.
I would be wary of assuming you can typecast long -> pointer, or pointer -> long. On 64-bit Windows, sizeof(int) == sizeof(long) == 4 but sizeof(void*) == 8. On 64-bit Linux (gcc), sizeof(int) == 4, sizeof(long) == sizeof(void*) == 8. It would be better to use stdint.h with uintptr_t -- it was designed to solve exactly this problem. If you insist on using long, why not use long long (C99), which is 64 bits on both platforms?

On Thu, May 12, 2011 at 3:49 AM, zhigang gong zhigang.g...@gmail.com wrote:

glu.h: typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning: function declaration isn't a prototype.

egl: When converting a (void *) to an int type, it's better to convert to long first, otherwise in a 64-bit environment it causes a compilation warning.
---
 include/GL/glu.h                    |    2 +-
 src/egl/drivers/dri2/egl_dri2.c     |    4 ++--
 src/egl/drivers/dri2/platform_drm.c |    4 ++--
 src/egl/drivers/dri2/platform_x11.c |    2 +-
 src/egl/main/eglapi.c               |    2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/GL/glu.h b/include/GL/glu.h
index cd967ac..ba2228d 100644
--- a/include/GL/glu.h
+++ b/include/GL/glu.h
@@ -284,7 +284,7 @@ typedef GLUtesselator GLUtriangulatorObj;
 #define GLU_TESS_MAX_COORD 1.0e150
 
 /* Internal convenience typedefs */
-typedef void (GLAPIENTRYP _GLUfuncptr)();
+typedef void (GLAPIENTRYP _GLUfuncptr)(void);
 
 GLAPI void GLAPIENTRY gluBeginCurve (GLUnurbs* nurb);
 GLAPI void GLAPIENTRY gluBeginPolygon (GLUtesselator* tess);
diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index afab679..f5f5ac3 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -835,7 +835,7 @@ dri2_create_image_khr_renderbuffer(_EGLDisplay *disp, _EGLContext *ctx,
    struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
    struct dri2_egl_context *dri2_ctx = dri2_egl_context(ctx);
    struct dri2_egl_image *dri2_img;
-   GLuint renderbuffer = (GLuint) buffer;
+   GLuint renderbuffer = (unsigned long) buffer;
 
    if (renderbuffer == 0) {
       _eglError(EGL_BAD_PARAMETER, "dri2_create_image_khr");
@@ -870,7 +870,7 @@ dri2_create_image_mesa_drm_buffer(_EGLDisplay *disp, _EGLContext *ctx,
 
    (void) ctx;
 
-   name = (EGLint) buffer;
+   name = (unsigned long) buffer;
 
    err = _eglParseImageAttribList(attrs, disp, attr_list);
    if (err != EGL_SUCCESS)
diff --git a/src/egl/drivers/dri2/platform_drm.c b/src/egl/drivers/dri2/platform_drm.c
index 68912e3..cea8418 100644
--- a/src/egl/drivers/dri2/platform_drm.c
+++ b/src/egl/drivers/dri2/platform_drm.c
@@ -596,7 +596,7 @@ dri2_get_device_name(int fd)
       goto out;
    }
 
-   device_name = udev_device_get_devnode(device);
+   device_name = (char*)udev_device_get_devnode(device);
    if (!device_name)
       goto out;
    device_name = strdup(device_name);
@@ -690,7 +690,7 @@ dri2_initialize_drm(_EGLDriver *drv, _EGLDisplay *disp)
    memset(dri2_dpy, 0, sizeof *dri2_dpy);
 
    disp->DriverData = (void *) dri2_dpy;
-   dri2_dpy->fd = (int) disp->PlatformDisplay;
+   dri2_dpy->fd = (long) disp->PlatformDisplay;
 
    dri2_dpy->driver_name = dri2_get_driver_for_fd(dri2_dpy->fd);
    if (dri2_dpy->driver_name == NULL)
diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c
index 5d4ac6a..90136f4 100644
--- a/src/egl/drivers/dri2/platform_x11.c
+++ b/src/egl/drivers/dri2/platform_x11.c
@@ -784,7 +784,7 @@ dri2_create_image_khr_pixmap(_EGLDisplay *disp, _EGLContext *ctx,
 
    (void) ctx;
 
-   drawable = (xcb_drawable_t) buffer;
+   drawable = (xcb_drawable_t) (long)buffer;
    xcb_dri2_create_drawable (dri2_dpy->conn, drawable);
    attachments[0] = XCB_DRI2_ATTACHMENT_BUFFER_FRONT_LEFT;
    buffers_cookie =
diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
index 336ec23..9063752 100644
--- a/src/egl/main/eglapi.c
+++ b/src/egl/main/eglapi.c
@@ -1168,7 +1168,7 @@ eglQueryModeStringMESA(EGLDisplay dpy, EGLModeMESA mode)
 
 EGLDisplay EGLAPIENTRY
 eglGetDRMDisplayMESA(int fd)
 {
-   _EGLDisplay *dpy = _eglFindDisplay(_EGL_PLATFORM_DRM, (void *) fd);
+   _EGLDisplay *dpy = _eglFindDisplay(_EGL_PLATFORM_DRM, (void *) (long)fd);
    return _eglGetDisplayHandle(dpy);
 }
-- 
1.7.3.1
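A small sketch of the stdint.h approach suggested at the top of this message, as an alternative to going through long. The helper names are invented for illustration and are not part of the EGL code. The intermediate cast through intptr_t or uintptr_t is what silences the size-mismatch warning on both LP64 (Linux) and LLP64 (64-bit Windows) targets.

```c
#include <stdint.h>

/* Hypothetical helpers: stash a small integer (such as an fd) in a
 * pointer-sized handle and get it back. */
static void *int_to_handle(int fd)
{
    return (void *) (intptr_t) fd;
}

static int handle_to_int(void *handle)
{
    return (int) (intptr_t) handle;
}

/* Same idea for an unsigned 32-bit name such as a renderbuffer id. */
static uint32_t handle_to_u32(void *handle)
{
    return (uint32_t) (uintptr_t) handle;
}
```

Strictly speaking, C only guarantees the pointer -> intptr_t -> pointer round trip. The integer -> pointer -> integer direction used here is implementation-defined, though it behaves as expected on the mainstream platforms discussed in the thread.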
[Mesa-dev] Thanks To All!
I just wanted to say thanks! to everyone who has been taking part in Mesa3D. I have an R500-based card and it is good to know that it still functions on Linux even after ATI/AMD decided it was too old to support. Not only that, it still receives improvements from Mesa. I even hear whispers that those cards might function on Power architecture systems, and I can't help but find myself impressed. Good job to you all and keep up the good work.

Patrick Baggett
Re: [Mesa-dev] Naked DXTn support via ARB_texture_compression?
Offhand, anyone know when these patents expire?

Patrick
Re: [Mesa-dev] Truncated extensions string
I feel like there is some kind of underlying lesson that we, OpenGL app programmers, should be getting out of this...

What about a pseudo-database of app -> extension list rather than by year? Surely Quake3 doesn't make use of more than 10 extensions. I'd imagine the same holds true for other old games as well. A simple strings on their binary could figure that out...

On Fri, Mar 11, 2011 at 2:14 PM, Kenneth Graunke kenn...@whitecape.org wrote:

On Friday, March 11, 2011 10:46:31 AM José Fonseca wrote:

On Fri, 2011-03-11 at 09:04 -0800, Eric Anholt wrote:

On Fri, 11 Mar 2011 10:33:13 +, José Fonseca jfons...@vmware.com wrote:

The problem from http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg12493.html is back, and now a bit worse -- it causes the Quake3 arena demo to crash (at least the windows version). The full version works fine. I'm not sure what other applications are hit by this. See the above thread for more background.

There are two major approaches:

1) sort extensions chronologically instead of alphabetically. See attached patch for that - for those who prefer to see extensions sorted alphabetically in glxinfo, we could modify glxinfo to sort them before displaying

2) detect broken applications (i.e., by process name), and only sort extension strings chronologically then

Personally I think that varying behavior based on process name is an ugly and brittle hack, so I'd prefer 1), but above all I just want to put this behind me, so whatever works is also fine by me.

If this is just a hack for one broken application, and we think that building in a workaround for this particular broken application is important (I don't), I still prefer an obvious hack for that broken application like feeding it a tiny extension string that it cares about, instead of reordering the extension list.

There are many versions of Quake3 out there, some fixed, others not, and others enhanced. This means a tiny string would prevent any Quake3 application from finding newer extensions. So I think that if we go for the application name detection then we should present the whole extension string sorted chronologically, instead of giving a tiny string.

Jose

I agree with José - it's not one broken application, it's a number of old, sometimes closed-source games that we can't change.

I'm not sure how changing the sorting solves the problem, anyway - the amount of data returned would still overflow the buffer, possibly wreaking havoc. I'd rather avoid that.

Ian and I talked about this a year ago, and the solution I believe we came up with was to use a driconf option or environment variable: If MESA_MAX_EXTENSION_YEAR=2006, then glGetString would only return extensions created in 2006 or earlier. The rationale is that if a game came out in 2006, it won't know about any extensions from 2007 anyway, so advertising them is useless. The fixed-size buffer is also almost certainly large enough to handle this cut-down list of extensions.

This should be trivial to do now that you already have the years for each extension...just store them in the table, rather than in comments, and check before listing an extension.

A driconf option is nice because it allows this to be overridden in .drirc on a per-app basis, rather than having to set an environment variable. It might be a bit more work though.

--Kenneth
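A minimal sketch of the year-filter idea Kenneth describes, assuming the years live in a table next to the names. The table contents, the struct and the function name are illustrative only and are not Mesa's actual extension table. The years shown are the approximate approval years of those extensions.

```c
#include <string.h>

/* Hypothetical table entry: extension name plus the year it appeared. */
struct ext_entry {
    const char *name;
    int year;
};

static const struct ext_entry ext_table[] = {
    { "GL_ARB_multitexture",         1998 },
    { "GL_ARB_vertex_buffer_object", 2003 },
    { "GL_ARB_framebuffer_object",   2008 },
};

/* Write the space-separated names of all extensions from max_year or
 * earlier into buf, never writing more than size bytes. Returns the
 * number of characters written, not counting the terminating NUL. */
static size_t build_extension_string(char *buf, size_t size, int max_year)
{
    size_t used = 0;
    size_t i;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(ext_table) / sizeof(ext_table[0]); i++) {
        const size_t len = strlen(ext_table[i].name);

        if (ext_table[i].year > max_year)
            continue;                 /* too new for this application */
        if (used + len + 2 > size)
            break;                    /* name + space + NUL won't fit */

        memcpy(buf + used, ext_table[i].name, len);
        used += len;
        buf[used++] = ' ';
        buf[used] = '\0';
    }
    return used;
}
```

In a real driver the max_year value would come from the driconf option or the MESA_MAX_EXTENSION_YEAR environment variable proposed in the thread. Here it is simply a parameter.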
Re: [Mesa-dev] [PATCH] os: add spinlocks
UP = Uniprocessor system, (S)MP = (Symmetric) multiprocessor system.

On Wed, Dec 15, 2010 at 2:23 AM, Marek Olšák mar...@gmail.com wrote:

On Tue, Dec 14, 2010 at 8:10 PM, Thomas Hellstrom thellst...@vmware.com wrote:

Hmm, for the uninformed, where do we need to use spinlocks in gallium and how do we avoid using them on an UP system?

I plan to use spinlocks to guard very simple code like the macro remove_from_list, which might be, under some circumstances, called too often. Entering and leaving a mutex is quite visible in callgrind.

What does UP stand for?

Marek
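For illustration, a sketch of the kind of use Marek describes: a spinlock around a list removal. This is not the patch under review. It assumes a C11 compiler with stdatomic.h, and all type and function names below are invented for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical minimal spinlock built on a C11 atomic_flag. */
typedef struct {
    atomic_flag locked;
} sketch_spinlock;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void sketch_spin_lock(sketch_spinlock *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        /* spin */
    }
}

static void sketch_spin_unlock(sketch_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Circular doubly linked list, similar in spirit to a simple list macro. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Insert n right after head (not locked, used for setup only). */
static void list_add(struct list_node *head, struct list_node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n while holding the lock; the critical section is a few stores. */
static void locked_remove(sketch_spinlock *l, struct list_node *n)
{
    sketch_spin_lock(l);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
    sketch_spin_unlock(l);
}
```

This also hints at the UP question: on a uniprocessor, a thread that finds the lock held can only burn the rest of its timeslice spinning, since the holder cannot run at the same time, which is why spinning is usually avoided there in favour of yielding or a regular mutex.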