from:"Keith Whitwell"

Re: [Mesa-dev] [RFC]Improves st_finalize_texture cycles consumption

2012-01-08 Thread Keith Whitwell

I don't have the code handy (and haven't looked at it in a while), but wonder 
if finer-grained tracking of dirtiness would help?  Or more generally trying to 
preserve more computed results across state changes?

Keith

- Original Message -
> Hi,
> 
> I did some profiling with perf under nexuiz and found that
> st_finalize_texture
> function was one of the most cycle consumming. (~1,50% whereas
> darkplaces took ~30%)
> 
> I rewrite some part of this function to make it a bit faster ; with
> these 2 patches,
> st_finalize_texture consumption went down to ~1%, so a 40-50% boost.
> This does however not translate to more fps to Nexuiz : if there is
> any improvement,
> it is not noticeable (too much noise in measurements). On the other
> hand, the function
> has become less readable. I had to manually unroll loops and use
> intermediate values
> (gcc does not do it automaticaly, using default parameters).
> Of course I think that we should make less call to this function to
> see a true gain,
> but this would require more work.
> 
> Regards,
> Vincent
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] RFC: remove ctx->Driver.TextureMemCpy() hook

2011-12-02 Thread Keith Whitwell

On Fri, 2011-12-02 at 08:14 -0700, Brian Paul wrote:
> This hook was added many years ago to allow using an alternative 
> implementation of memcpy() for glTexImage() that was faster under some 
> circumstances.
> 
> The code is still present in the state tracker in st_cb_texture.c
> 
> The hook is only used in texstore.c in the memcpy_texture() helper. 
> It's not used for glCompressedTex[Sub]Image nor a few other places 
> where it could have been used.
> 
> The non-gallium drivers just set ctx->Driver.TextureMemCpy = memcpy so 
> it's really not utilized there.
> 
> If we think that using regular memcpy() everywhere is OK, I'd like to 
> remove this hook.  I haven't done any investigation into whether the 
> assembly __memcpy() function in st_cb_teximage.c is really any faster 
> nowadays.  But if there really is a benefit to this function, we could 
> use it in more places.
> 
> Any comments?

That was a very long time ago.  I'd be surprised if the problem
persists.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/6] gallium: remove PIPE_CAP_GLSL and enable GLSL unconditionally

2011-11-18 Thread Keith Whitwell



- Original Message -
> On 11/18/2011 11:27 AM, Marek Olšák wrote:
> > Only i965g does not enable GLSL, but that driver has been
> > unmaintained and
> > bitrotting for quite a while anyway.
> 
> It doesn't even do GLSL?  I'm pretty shocked, I figured it at least
> did
> that.  Is it even worth keeping around in the tree?  Seems like it's
> just creating extra work for you guys, having to update it for
> Gallium
> changes...when ultimately, nobody's using it.
>

I agree -- this was never finished & isn't likely to be either.

Keith
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] TGSI declarations missing type info

2011-11-14 Thread Keith Whitwell

On Mon, 2011-11-14 at 09:42 +, Keith Whitwell wrote:
> On Sun, 2011-11-13 at 14:43 -0600, Bryan Cain wrote:
> > On 11/13/2011 09:06 AM, Dave Airlie wrote:
> > > Hi guys,
> > >
> > > Just been looking at llvmpipe integer support and it seems like we
> > > lose some information about the type of data stored into temporaries,
> > >
> > > after st_glsl_to_cpp we no longer know what type the temporaries are,
> > > and llvm would really like to know and I can't see any reason that
> > > TGSI doesn't contain the info. Having untyped temp decls means we'd
> > > have to allocate some sort of "union" via aliases I guess in llvmpipe
> > > for all temps so we can store int/float in them.
> > >
> > > I've attached a run of glsl-vs-loop from llvmpipe with integer opcodes
> > > forced on. (llvmpipe-int-test branch of my repo).
> > >
> > > Dave.
> > 
> > If you do add types to TGSI registers, it's worth noting that the
> > internal IR used by glsl_to_tgsi (glsl_to_tgsi_instruction) already the
> > types of all src and dst registers, and it's only lost when converting
> > that to TGSI.  However, it was only intended to be good enough to
> > determine whether to emit an integer or float instruction, so there
> > might be some mistakes remaining somewhere that would need to be corrected.
> > 
> 
> I'd certainly support the idea of adding type information to TGSI.  It
> would mean that any SM4-to-TGSI translator would have to do type
> inference, but afaik SM4 is the only place where that would have to
> happen -- all other potential sources of TGSI either have type
> information (like IR as noted above), or are pretty much float-only
> (like SM3).  

Note I'm mainly posting this because I previously held the opposite view
fairly strongly & wanted to make sure that nobody feels they need to
keep on considering that...

Keith 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] TGSI declarations missing type info

2011-11-14 Thread Keith Whitwell

On Sun, 2011-11-13 at 14:43 -0600, Bryan Cain wrote:
> On 11/13/2011 09:06 AM, Dave Airlie wrote:
> > Hi guys,
> >
> > Just been looking at llvmpipe integer support and it seems like we
> > lose some information about the type of data stored into temporaries,
> >
> > after st_glsl_to_cpp we no longer know what type the temporaries are,
> > and llvm would really like to know and I can't see any reason that
> > TGSI doesn't contain the info. Having untyped temp decls means we'd
> > have to allocate some sort of "union" via aliases I guess in llvmpipe
> > for all temps so we can store int/float in them.
> >
> > I've attached a run of glsl-vs-loop from llvmpipe with integer opcodes
> > forced on. (llvmpipe-int-test branch of my repo).
> >
> > Dave.
> 
> If you do add types to TGSI registers, it's worth noting that the
> internal IR used by glsl_to_tgsi (glsl_to_tgsi_instruction) already the
> types of all src and dst registers, and it's only lost when converting
> that to TGSI.  However, it was only intended to be good enough to
> determine whether to emit an integer or float instruction, so there
> might be some mistakes remaining somewhere that would need to be corrected.
> 

I'd certainly support the idea of adding type information to TGSI.  It
would mean that any SM4-to-TGSI translator would have to do type
inference, but afaik SM4 is the only place where that would have to
happen -- all other potential sources of TGSI either have type
information (like IR as noted above), or are pretty much float-only
(like SM3).  

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] RFC: Remove tgsi-sse2.

2011-11-08 Thread Keith Whitwell

On Tue, 2011-11-08 at 07:47 -0800, Jose Fonseca wrote:
> tgsi_exec is simple; llvm is fast; and tgsi_sse2 ends up being neither. So 
> really serves no purpose and is currently broken.
> 

Sounds good to me!

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: fix a crash in non-SSE path

2011-10-30 Thread Keith Whitwell

Looks good to me.

Keith

On Sun, 2011-10-30 at 20:05 +0800, Chia-I Wu wrote:
> From: Chia-I Wu 
> 
> It is a typo went unnoticed.
> ---
>  src/gallium/drivers/llvmpipe/lp_rast_tri.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_rast_tri.c 
> b/src/gallium/drivers/llvmpipe/lp_rast_tri.c
> index 3adfbaa..71d0ddf 100644
> --- a/src/gallium/drivers/llvmpipe/lp_rast_tri.c
> +++ b/src/gallium/drivers/llvmpipe/lp_rast_tri.c
> @@ -129,7 +129,7 @@ lp_rast_triangle_4_16(struct lp_rasterizer_task *task,
> union lp_rast_cmd_arg arg2;
> arg2.triangle.tri = arg.triangle.tri;
> arg2.triangle.plane_mask = (1<<4)-1;
> -   lp_rast_triangle_3(task, arg2);
> +   lp_rast_triangle_4(task, arg2);
>  }
>  
>  void


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] libgallium.so and miscelaneous buildsystem patches

2011-10-05 Thread Keith Whitwell

On Wed, 2011-10-05 at 20:14 +1100, Christopher James Halse Rogers wrote:
> On Wed, 2011-10-05 at 09:24 +0200, Joakim Sindholt wrote:
> > On Tue, 2011-10-04 at 17:58 +0200, Fabio wrote:
> > > Can the patches at
> > > http://lists.freedesktop.org/archives/mesa-dev/2011-August/011099.html
> > > be considered for merging?
> > > 
> > > Sharing libgallium should save some MB of installed space.
> > 
> > And be an ABI nightmare for distributions
> 
> No; it's a private library.  Distributions will happily ship a
> libgallium built from exactly the same source that the DRI drivers are
> built from.  Indeed, that's what currently happens for those
> distributions with ship with --enable-shared-dricore, and what happens
> in Ubuntu, where we've got this patch series applied in our never-ending
> quest to cram a fully-featured linux system on a 700MB CD.
> 
> Saving 20-odd megabytes is really useful there :)

An alternative would be to build all the drivers into a single library
for maximal sharing.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/7] intel: Remove the pbo zero-copy code.

2011-09-21 Thread Keith Whitwell

I'm suprised that fragile code lasted as long as it did...

Looks good to me.

Keith

On Wed, 2011-09-21 at 10:15 -0700, Eric Anholt wrote:
> There were notes about the possibility of slowdowns due to zcopy from
> a PBO due to thrashing around of the region.  Slowdowns are even more
> likely now that textures are generally tiled, which a zcopy wouldn't
> get.  Additionally, there were no checks on the buffer size to ensure
> that the hardware-required rounding was present, which could result in
> GPU hangs on large zcopy PBOs.
> ---
>  src/mesa/drivers/dri/intel/intel_buffer_objects.c |   45 
>  src/mesa/drivers/dri/intel/intel_buffer_objects.h |   12 --
>  src/mesa/drivers/dri/intel/intel_regions.c|  119 
> -
>  src/mesa/drivers/dri/intel/intel_regions.h|   11 --
>  src/mesa/drivers/dri/intel/intel_tex_image.c  |   60 ---
>  5 files changed, 0 insertions(+), 247 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/intel/intel_buffer_objects.c 
> b/src/mesa/drivers/dri/intel/intel_buffer_objects.c
> index d35a50e..4df2d76 100644
> --- a/src/mesa/drivers/dri/intel/intel_buffer_objects.c
> +++ b/src/mesa/drivers/dri/intel/intel_buffer_objects.c
> @@ -79,30 +79,6 @@ intel_bufferobj_alloc(struct gl_context * ctx, GLuint 
> name, GLenum target)
> return &obj->Base;
>  }
>  
> -/* Break the COW tie to the region.  The region gets to keep the data.
> - */
> -void
> -intel_bufferobj_release_region(struct intel_buffer_object *intel_obj)
> -{
> -   assert(intel_obj->region->buffer == intel_obj->buffer);
> -   intel_obj->region->pbo = NULL;
> -   intel_obj->region = NULL;
> -
> -   release_buffer(intel_obj);
> -}
> -
> -/* Break the COW tie to the region.  Both the pbo and the region end
> - * up with a copy of the data.
> - */
> -void
> -intel_bufferobj_cow(struct intel_context *intel,
> -struct intel_buffer_object *intel_obj)
> -{
> -   assert(intel_obj->region);
> -   intel_region_cow(intel, intel_obj->region);
> -}
> -
> -
>  /**
>   * Deallocate/free a vertex/pixel buffer object.
>   * Called via glDeleteBuffersARB().
> @@ -122,9 +98,6 @@ intel_bufferobj_free(struct gl_context * ctx, struct 
> gl_buffer_object *obj)
>intel_bufferobj_unmap(ctx, obj);
>  
> free(intel_obj->sys_buffer);
> -   if (intel_obj->region) {
> -  intel_bufferobj_release_region(intel_obj);
> -   }
>  
> drm_intel_bo_unreference(intel_obj->buffer);
> free(intel_obj);
> @@ -160,9 +133,6 @@ intel_bufferobj_data(struct gl_context * ctx,
>  
> assert(!obj->Pointer); /* Mesa should have unmapped it */
>  
> -   if (intel_obj->region)
> -  intel_bufferobj_release_region(intel_obj);
> -
> if (intel_obj->buffer != NULL)
>release_buffer(intel_obj);
>  
> @@ -219,9 +189,6 @@ intel_bufferobj_subdata(struct gl_context * ctx,
>  
> assert(intel_obj);
>  
> -   if (intel_obj->region)
> -  intel_bufferobj_cow(intel, intel_obj);
> -
> /* If we have a single copy in system memory, update that */
> if (intel_obj->sys_buffer) {
>if (intel_obj->source)
> @@ -347,9 +314,6 @@ intel_bufferobj_map_range(struct gl_context * ctx,
>intel_obj->sys_buffer = NULL;
> }
>  
> -   if (intel_obj->region)
> -  intel_bufferobj_cow(intel, intel_obj);
> -
> /* If the mapping is synchronized with other GL operations, flush
>  * the batchbuffer so that GEM knows about the buffer access for later
>  * syncing.
> @@ -510,15 +474,6 @@ intel_bufferobj_buffer(struct intel_context *intel,
> struct intel_buffer_object *intel_obj,
>  GLuint flag)
>  {
> -   if (intel_obj->region) {
> -  if (flag == INTEL_WRITE_PART)
> - intel_bufferobj_cow(intel, intel_obj);
> -  else if (flag == INTEL_WRITE_FULL) {
> - intel_bufferobj_release_region(intel_obj);
> -  intel_bufferobj_alloc_buffer(intel, intel_obj);
> -  }
> -   }
> -
> if (intel_obj->source)
>release_buffer(intel_obj);
>  
> diff --git a/src/mesa/drivers/dri/intel/intel_buffer_objects.h 
> b/src/mesa/drivers/dri/intel/intel_buffer_objects.h
> index d75cdbf..b174e93 100644
> --- a/src/mesa/drivers/dri/intel/intel_buffer_objects.h
> +++ b/src/mesa/drivers/dri/intel/intel_buffer_objects.h
> @@ -31,7 +31,6 @@
>  #include "main/mtypes.h"
>  
>  struct intel_context;
> -struct intel_region;
>  struct gl_buffer_object;
>  
> 
> @@ -47,10 +46,6 @@ struct intel_buffer_object
> /** System memory buffer data, if not using a BO to store the data. */
> void *sys_buffer;
>  
> -   struct intel_region *region; /* Is there a zero-copy texture
> -   associated with this (pixel)
> -   buffer object? */
> -
> drm_intel_bo *range_map_bo;
> void *range_map_buffer;
> unsigned int range_map_offset;
> @@ -102,11 +97,4 @@ intel_buffer_object(struct gl_buffer_object *obj)
> return (struct intel_buffer_object *) obj;
>  }

Re: [Mesa-dev] Building with -fno-builtin-memcmp for improved performance

2011-09-20 Thread Keith Whitwell

On Tue, 2011-09-20 at 16:35 +0200, Roland Scheidegger wrote:
> Am 20.09.2011 16:15, schrieb Keith Whitwell:
> > On Tue, 2011-09-20 at 16:02 +0200, Roland Scheidegger wrote:
> >> Am 20.09.2011 12:35, schrieb Keith Whitwell:
> >>> On Tue, 2011-09-20 at 10:59 +0200, Fabio wrote:
> >>>> There was a discussion some months ago about using -fno-builtin-memcmp 
> >>>> for 
> >>>> improving memcmp performance:
> >>>> http://lists.freedesktop.org/archives/mesa-dev/2011-June/009078.html
> >>>>
> >>>> Since then, was it properly addressed in mesa or the flag is still 
> >>>> recommended? If so, what about adding it in configure.ac?
> >>>
> >>> I've been meaning to follow up on this too.  I don't know the answer,
> >>> but pinging Roland in case he does.
> >>
> >> I guess it is still recommended.
> >> Ideally this is really something which should be fixed in gcc - the
> >> compiler has all the knowledge about fixed alignment and size (if any)
> >> (and more importantly knows if only a binary answer is needed which
> >> makes this much easier) and doesn't need to do any function call.
> >> If you enable that flag and some platform just has the same primitive
> >> repz cmpsb sequence in the system library it will just get even slower,
> >> though I guess chances of that happening are slim (with the possible
> >> exception of windows).
> >> I think in most cases it won't make much difference, so nobody cared to
> >> implement that change. It is most likely still a good idea unless gcc
> >> addressed that in the meantime...
> > 
> > Hmm, it seemed like it made a big difference in the earlier
> > discussion...
> Yes for llvmpipe and one app at least.
> But that struct being compared there is most likely the biggest (by far)
> anywhere (at least which is compared in a regular fashion).
> 
> > I should take a look at reducing the size of the struct (as mentioned
> > before), but surely there's some way to pull in a better memcmp??
> 
> Well, apart from using -fno-builtin-memcmp we could build our own
> memcmpxx, though the version I did there (returning binary only result
> and assuming 32bit alignment/size allowing gcc to optimize it) was still
> slower for large sizes than -fno-builtin-memcmp. Of course we could
> optimize it more (e.g. for 64bit aligned/sized things, or using
> hand-coded sse2 versions using 128bit at-a-time comparisons) but then it
> gets more complicated, so I wasn't sure it was worth it.
> 
> For reference here are the earlier numbers (ipers with llvmpipe):
> original ipers: 12.1 fps
> optimized struct compare: 16.8 fps
> -fno-builtin-memcmp: 18.1 fps
> 
> And this was the function I used for getting the numbers:
> 
> static INLINE int util_cmp_struct(const void *src1, const void *src2,
> unsigned count)
> {
>   /* hmm pointer casting is evil */
>   const uint32_t *src1_ptr = (uint32_t *)src1;
>   const uint32_t *src2_ptr = (uint32_t *)src2;
>   unsigned i;
>   assert(count % 4 == 0);
>   for (i = 0; i < count/4; i++) {
> if (*src1_ptr != *src2_ptr) {
>   return 1;
> }
> src1_ptr++;
> src2_ptr++;
>   }
>   return 0;
> }

OK, maybe the first thing to do is fix the compared struct, then let's
see if there's anything significant left for a better memcmp to extract.

I can find some time to do that in the next few days.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Building with -fno-builtin-memcmp for improved performance

2011-09-20 Thread Keith Whitwell

On Tue, 2011-09-20 at 16:02 +0200, Roland Scheidegger wrote:
> Am 20.09.2011 12:35, schrieb Keith Whitwell:
> > On Tue, 2011-09-20 at 10:59 +0200, Fabio wrote:
> >> There was a discussion some months ago about using -fno-builtin-memcmp for 
> >> improving memcmp performance:
> >> http://lists.freedesktop.org/archives/mesa-dev/2011-June/009078.html
> >>
> >> Since then, was it properly addressed in mesa or the flag is still 
> >> recommended? If so, what about adding it in configure.ac?
> > 
> > I've been meaning to follow up on this too.  I don't know the answer,
> > but pinging Roland in case he does.
> 
> I guess it is still recommended.
> Ideally this is really something which should be fixed in gcc - the
> compiler has all the knowledge about fixed alignment and size (if any)
> (and more importantly knows if only a binary answer is needed which
> makes this much easier) and doesn't need to do any function call.
> If you enable that flag and some platform just has the same primitive
> repz cmpsb sequence in the system library it will just get even slower,
> though I guess chances of that happening are slim (with the possible
> exception of windows).
> I think in most cases it won't make much difference, so nobody cared to
> implement that change. It is most likely still a good idea unless gcc
> addressed that in the meantime...

Hmm, it seemed like it made a big difference in the earlier
discussion...

I should take a look at reducing the size of the struct (as mentioned
before), but surely there's some way to pull in a better memcmp??

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Building with -fno-builtin-memcmp for improved performance

2011-09-20 Thread Keith Whitwell

On Tue, 2011-09-20 at 10:59 +0200, Fabio wrote:
> There was a discussion some months ago about using -fno-builtin-memcmp for 
> improving memcmp performance:
> http://lists.freedesktop.org/archives/mesa-dev/2011-June/009078.html
> 
> Since then, was it properly addressed in mesa or the flag is still 
> recommended? If so, what about adding it in configure.ac?

I've been meaning to follow up on this too.  I don't know the answer,
but pinging Roland in case he does.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] state_trackers/dri/sw: Implement texture_from_pixmap.

2011-08-31 Thread Keith Whitwell

On Wed, 2011-08-31 at 04:55 -0700, Jose Fonseca wrote:
> I haven't tested but the whole patch series looks good AFAICT.
> 
> I'm really happy to see this work completed, as it was excluding the 
> llvmpipe/softpipe from a very big class of apps. Thanks for taking the 
> initiative!

Likewise!  Thanks for taking the time to figure this stuff out.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/6] tgsi: add TXQ support.

2011-08-25 Thread Keith Whitwell

On Thu, 2011-08-25 at 15:00 +0100, Dave Airlie wrote:
> On Thu, Aug 25, 2011 at 2:43 PM, Keith Whitwell  wrote:
> > On Thu, 2011-08-25 at 07:28 -0600, Brian Paul wrote:
> >> How would the TXQ instruction be implemented for a hardware driver?
> >>
> >> Is there really a HW GPU instruction that returns the size of a texture?
> >
> > Yes, that's correct.
> >
> >> Otherwise, this seems like something we could implement in the state
> >> tracker by putting the texture size into a constant buffer slot.  Then
> >> we'd have it for all drivers.
> >
> > I think that's a good fallback for hardware that's missing this
> > capability, but DX10+ hardware should be expected to have it.
> 
> I can't see us caring really, its part of GLSL1.30 which pretty muhc
> means GL3.0, which pretty much means DX10.

Sounds fair enough...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/6] tgsi: add TXQ support.

2011-08-25 Thread Keith Whitwell

On Thu, 2011-08-25 at 07:28 -0600, Brian Paul wrote:
> How would the TXQ instruction be implemented for a hardware driver?
> 
> Is there really a HW GPU instruction that returns the size of a texture?

Yes, that's correct.

> Otherwise, this seems like something we could implement in the state 
> tracker by putting the texture size into a constant buffer slot.  Then 
> we'd have it for all drivers.

I think that's a good fallback for hardware that's missing this
capability, but DX10+ hardware should be expected to have it.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] DEATH to old drivers!

2011-08-25 Thread Keith Whitwell

On Wed, 2011-08-24 at 20:46 -0400, Kristian Høgsberg wrote:
> On Wed, Aug 24, 2011 at 3:11 PM, Ian Romanick  wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > I'd like to propose giving the ax to a bunch of old, unmaintained
> > drivers.  I've been doing a bunch of refactoring and reworking of core
> > Mesa code, and these drivers have been causing me problems for a number
> > of reasons.
> >
> > 1. The hardware is so old that it doesn't support a lot of features that
> > have been common for 12+ years.
> >
> > 2. The drivers are so unmaintained that even hacking in new features
> > with dummy implementations is painful.
> >
> > 3. The drivers are so buggy that many piglit tests hang the GPU.  I
> > tried doing a piglit run on a Rage128 Pro that I have, but I gave up
> > after having to blacklist 15 tests.
> >
> > It also seems that at least some distros (e.g., Fedora) have stopped
> > shipping non-DRI2 drivers.  If nobody is shipping it, nobody is using it.
> >
> > My specific proposal is:
> >
> >  - Remove all DRI1 drivers: i810, mach64, mga, r128, savage, sis, tdfx,
> > and unichrome.
> >
> >  - Remove all unmaintained Windows drivers: gldirect, icd.
> >
> >  - Remove beos.
> >
> >  - Remove fbdev (this is swrast on raw fbdev).
> >
> > Opinions?
> 
> I wasn't going to chime in with another "me too", but just make it
> clear that there's a pretty strong concensus, here we go: yes please!
> And I've done a good deal of work in the DRI interface area and the
> maintenence burden is real, no matter what the back seat drivers say.

I will though:  Me too!

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 08/12] mesa: Fix incorrect access parameter passed to MapBuffer

2011-08-22 Thread Keith Whitwell

Your analysis sounds reasonable to me, Ian.  Looks good.

Keith

On Mon, 2011-08-22 at 00:33 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> The code previously passed GL_DYNAMIC_DRAW for the access parameter.
> By inspection, I believe that all drivers would treat this as
> GL_READ_WRITE because it's not GL_READ_ONLY and it's not
> GL_WRITE_ONLY.  However, my guess is that this code actually wants to
> use GL_WRITE_ONLY.
> 
> Cc: Eric Anholt 
> Cc: Keith Whitwell 
> ---
>  src/mesa/drivers/dri/i965/brw_draw_upload.c |4 +---
>  src/mesa/main/api_arrayelt.c|4 +---
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
> b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> index 66c42aa..3b95244 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
> @@ -689,9 +689,7 @@ static void brw_prepare_indices(struct brw_context *brw)
> * rebase it into a temporary.
> */
> if ((get_size(index_buffer->type) - 1) & offset) {
> -   GLubyte *map = ctx->Driver.MapBuffer(ctx,
> -GL_DYNAMIC_DRAW_ARB,
> -bufferobj);
> +   GLubyte *map = ctx->Driver.MapBuffer(ctx, GL_READ_WRITE, 
> bufferobj);
> map += offset;
>  
>  intel_upload_data(&brw->intel, map, ib_size, ib_type_size,
> diff --git a/src/mesa/main/api_arrayelt.c b/src/mesa/main/api_arrayelt.c
> index 6400c8f..b897a33 100644
> --- a/src/mesa/main/api_arrayelt.c
> +++ b/src/mesa/main/api_arrayelt.c
> @@ -1602,9 +1602,7 @@ void _ae_map_vbos( struct gl_context *ctx )
>_ae_update_state(ctx);
>  
> for (i = 0; i < actx->nr_vbos; i++)
> -  ctx->Driver.MapBuffer(ctx,
> - GL_DYNAMIC_DRAW_ARB,
> - actx->vbo[i]);
> +  ctx->Driver.MapBuffer(ctx, GL_READ_WRITE, actx->vbo[i]);
>  
> if (actx->nr_vbos)
>actx->mapped_vbos = GL_TRUE;


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: fix incorrect loop over instruction src regs

2011-08-17 Thread Keith Whitwell

On Wed, 2011-08-17 at 09:36 -0500, Bryan Cain wrote:
> The usual commit message prefix for changes to glsl_to_tgsi is
> "glsl_to_tgsi", not "st/mesa".
> 
> On 08/16/2011 05:33 PM, Brian Paul wrote:
> > The array of src regs is of size 3, not 4.
> > ---
> >  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> > b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> > index aef23e7..7b90c81 100644
> > --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> > +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> > @@ -3443,7 +3443,7 @@ 
> > glsl_to_tgsi_visitor::eliminate_dead_code_advanced(void)
> >   /* Continuing the block, clear any channels from the write array 
> > that
> >* are read by this instruction.
> >*/
> > - for (int i = 0; i < 4; i++) {
> > + for (unsigned i = 0; i < Elements(inst->src); i++) {
> 
> Why not just use 3 here?

Elements(inst->src) is self-documenting.  

3 is just a number and to figure out if it was the correct number you'd
have to go and look at the header file to see if it matched the value
there.

Both should generate the same compiled code.

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] swrast: initial multi-threaded span rendering

2011-08-10 Thread Keith Whitwell

I'm not sure it makes a lot of sense to be optimizing swrast at this
stage.  Take a look at llvmpipe and perhaps consider improving the
multithreading already in place in that rasterizer, which is far better
optimized than swrast already.

Keith

On Wed, 2011-08-10 at 08:07 +, Andreas Fänger wrote:
> Optional parallel rendering of spans using OpenMP.
> Initial implementation for aa triangles. A new option for scons is
> also provided to activate the openmp support (off by default).
> ---
>  common.py  |1 +
>  scons/gallium.py   |   12 +++
>  src/mesa/swrast/s_aatritemp.h  |   68 ++-
>  src/mesa/swrast/s_context.c|   26 ---
>  src/mesa/swrast/s_texcombine.c |4 ++
>  src/mesa/tnl/t_pipeline.c  |   12 +++
>  6 files changed, 87 insertions(+), 36 deletions(-)
> 
> diff --git a/common.py b/common.py
> index 8657030..cfee1b5 100644
> --- a/common.py
> +++ b/common.py
> @@ -88,6 +88,7 @@ def AddOptions(opts):
>   opts.Add('toolchain', 'compiler toolchain', default_toolchain)
>   opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support', 
> 'no'))
>   opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))
> + opts.Add(BoolOption('openmp', 'EXPERIMENTAL: compile with openmp 
> (swrast)', 'no'))
>   opts.Add(BoolOption('debug', 'DEPRECATED: debug build', 'yes'))
>   opts.Add(BoolOption('profile', 'DEPRECATED: profile build', 'no'))
>   opts.Add(BoolOption('quiet', 'DEPRECATED: profile build', 'yes'))
> diff --git a/scons/gallium.py b/scons/gallium.py
> index 8cd3bc7..7135251 100755
> --- a/scons/gallium.py
> +++ b/scons/gallium.py
> @@ -596,6 +596,18 @@ def generate(env):
>  libs += ['m', 'pthread', 'dl']
>  env.Append(LIBS = libs)
>  
> +# OpenMP
> +if env['openmp']:
> +if env['msvc']:
> +env.Append(CCFLAGS = ['/openmp'])
> +# When building openmp release VS2008 link.exe crashes with 
> LNK1103 error.
> +# Workaround: overwrite PDB flags with empty value as it isn't 
> required anyways
> +if env['build'] == 'release':
> +env['PDB'] = ''
> +if env['gcc']:
> +env.Append(CCFLAGS = ['-fopenmp'])
> +env.Append(LIBS = ['gomp'])
> +
>  # Load tools
>  env.Tool('lex')
>  env.Tool('yacc')
> diff --git a/src/mesa/swrast/s_aatritemp.h b/src/mesa/swrast/s_aatritemp.h
> index 91d4f7a..005d12c 100644
> --- a/src/mesa/swrast/s_aatritemp.h
> +++ b/src/mesa/swrast/s_aatritemp.h
> @@ -181,13 +181,18 @@
>const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
>const GLfloat dxdy = majDx / majDy;
>const GLfloat xAdj = dxdy < 0.0F ? -dxdy : 0.0F;
> -  GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
>GLint iy;
> -  for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
> +  #pragma omp parallel for schedule(dynamic) private(iy) 
> firstprivate(span)
> +  for (iy = iyMin; iy < iyMax; iy++) {
> + GLfloat x = pMin[0] - (yMin - iy) * dxdy;
>   GLint ix, startX = (GLint) (x - xAdj);
>   GLuint count;
>   GLfloat coverage = 0.0F;
>  
> +#ifdef _OPENMP
> + /* each thread needs to use a different (global) SpanArrays 
> variable */
> + span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
> +#endif
>   /* skip over fragments with zero coverage */
>   while (startX < MAX_WIDTH) {
>  coverage = compute_coveragef(pMin, pMid, pMax, startX, iy);
> @@ -228,13 +233,12 @@
>  coverage = compute_coveragef(pMin, pMid, pMax, ix, iy);
>   }
>   
> - if (ix <= startX)
> -continue;
> - 
> - span.x = startX;
> - span.y = iy;
> - span.end = (GLuint) ix - (GLuint) startX;
> - _swrast_write_rgba_span(ctx, &span);
> + if (ix > startX) {
> +span.x = startX;
> +span.y = iy;
> +span.end = (GLuint) ix - (GLuint) startX;
> +_swrast_write_rgba_span(ctx, &span);
> + }
>}
> }
> else {
> @@ -244,13 +248,18 @@
>const GLfloat *pMax = vMax->attrib[FRAG_ATTRIB_WPOS];
>const GLfloat dxdy = majDx / majDy;
>const GLfloat xAdj = dxdy > 0 ? dxdy : 0.0F;
> -  GLfloat x = pMin[0] - (yMin - iyMin) * dxdy;
>GLint iy;
> -  for (iy = iyMin; iy < iyMax; iy++, x += dxdy) {
> +  #pragma omp parallel for schedule(dynamic) private(iy) 
> firstprivate(span)
> +  for (iy = iyMin; iy < iyMax; iy++) {
> + GLfloat x = pMin[0] - (yMin - iy) * dxdy;
>   GLint ix, left, startX = (GLint) (x + xAdj);
>   GLuint count, n;
>   GLfloat coverage = 0.0F;
>   
> +#ifdef _OPENMP
> + /* each thread needs to use a different (global) SpanArrays 
> variable */
> + span.array = SWRAST_CONTEXT(ctx)->SpanArrays + omp_get_thread_num();
> +#endif
>

Re: [Mesa-dev] About merging pipe-video to master

2011-07-12 Thread Keith Whitwell

On Tue, 2011-07-12 at 11:13 -0400, Younes Manton wrote:
> 2011/7/12 Keith Whitwell :
> > I'm a bit unsure about what's the best approach here, though at this
> > stage I'm happy with your approach and don't think it needs to be
> > changed before any merge.
> >
> > But speaking in general terms, individual planes map well onto 8-bit
> > single-component texture images (L8 or similar) and any hardware
> > requirements (pitch, memory pool, etc) for the individual plane could be
> > specified with a PIPE_BIND_VIDEO_BUFFER flag.
> >
> > However, it's also easy to imagine hardware having special requirements
> > about the positioning of the planes relative to one another, similar to
> > how mipmaps must be layed out in hardware-specific ways.
> >
> > If we did decide to get rid of video_buffers and integrate the concept
> > with pipe_resources, it seems like there would need to be a way to
> > specify this at resource creation - either a planar YUV format, or some
> > other marking on the resource.
> >
> > I don't have easy answers for that, and in the meantime I don't think
> > it's important enough to hold up pipe-video, which is looking now like a
> > good step forward.
> >
> > Keith
> 
> 
> I've considered that. The problem that brings up is what happens when
> you need to hand that planar surface over to the 3D context as a
> texture to be sampled from for color conversion. From the state
> tracker's POV you've just handed over a single texture with
> corresponding vertex attribs, texcoords, shaders, etc, but in reality
> your 3D engine will have to treat each plane as a separate texture.
> You could special-case planar textures and internally create extra
> state objs and fix up the shader, but the extra complexity buys you
> nothing except a "nicer looking" planar texture representation in the
> interface and is ugly in itself.
> 
> Anyhow, Christian, your changes look alright to me. Again, I don't
> think this interface has to be perfect now to be acceptable.

Agreed.

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] About merging pipe-video to master

2011-07-12 Thread Keith Whitwell

On Mon, 2011-07-11 at 18:24 +0200, Christian König wrote:
> Hi guys,
> 
> as the subject already indicates: I'm about to merge pipe-video to
> master and just wanted to ask if anybody has still any objections?
> 
> After following Jose and Younes discussion on mesa-dev about how to
> design such an abstraction layer I took another round of cleaning up the
> interface and moved some more parts into the state tracker.
> 
> So the interface between the state tracker and drivers only consist of
> the following now:
> 
> 1. two additional functions for the screen object: get_video_param and
> is_video_format_supported. get_video_param gets a parameter for a
> specified codec (like max width/height of decoding target, which could
> be smaller than texture max width/height), and is_video_format_supported
> which checks if a texture format is supported as a decoding target for a
> codec.
> 
> 2. create_video_decoder function in the pipe_context object, which
> creates a decoder object for a given codec. The decoder object in turn
> includes everything needed to decode a video stream of that codec and
> uses pipe_video_decode_buffer objects to hold the input data of a single
> frame of that video codec.
> 
> 3. create_video_buffer function in the pipe_context object, which
> creates a video_buffer object to store a decoded video frame. This
> video_buffer object is then used for both rendering to the screen with
> normal pipe_context functionality and also as the input for reference
> frames back to the decoder.
> 
> The pipe_video_buffer object is there because I think hardware decoders
> need some special memory layout of the different planes of a yuv buffer.
> There is a standard implementation that just uses normal textures as the
> different planes for yuv buffer, which can be used by a driver when
> there is no need for a special memory layout or when the driver just
> uses shader based decoding.
> 
> The other option would be adding a PIPE_BIND_VIDEO_BUFFER flag to the
> resource creation functions, but I don't want to repeat functionality in
> the different drivers and as far as I can see the current resource
> functions (samplers/surfaces) can't be used to create a surface for just
> one plane/component of a yuv buffer and we could still clean that up to
> use the standard resource functions if the need arise.

I'm a bit unsure about what's the best approach here, though at this
stage I'm happy with your approach and don't think it needs to be
changed before any merge.

But speaking in general terms, individual planes map well onto 8-bit
single-component texture images (L8 or similar) and any hardware
requirements (pitch, memory pool, etc) for the individual plane could be
specified with a PIPE_BIND_VIDEO_BUFFER flag.

However, it's also easy to imagine hardware having special requirements
about the positioning of the planes relative to one another, similar to
how mipmaps must be layed out in hardware-specific ways.

If we did decide to get rid of video_buffers and integrate the concept
with pipe_resources, it seems like there would need to be a way to
specify this at resource creation - either a planar YUV format, or some
other marking on the resource.

I don't have easy answers for that, and in the meantime I don't think
it's important enough to hold up pipe-video, which is looking now like a
good step forward.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 11/13] gallium/util: implement pack functions for Z32F and Z32F_S8X24

2011-07-01 Thread Keith Whitwell

On Fri, 2011-07-01 at 14:42 +0200, Marek Olšák wrote:
> On Fri, Jul 1, 2011 at 10:49 AM, Keith Whitwell  wrote:
> > On Fri, 2011-07-01 at 02:29 +0200, Marek Olšák wrote:
> >> The suffix of 64 means it returns uint64_t.
> >
> > It might be slightly clearer to call these functions util_pack64_{xxx}
> > -- currently it reads as if it is packing 64-bit source data.
> 
> Yeah, that's nicer. Here's the diff I am going to squash with the
> patches 11 and 12.

Looks great!

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] Gallium: fix buffer overflow

2011-07-01 Thread Keith Whitwell

This looks good to me -- Jose?

Keith

On Thu, 2011-06-30 at 03:33 +0100, Micael Dias wrote:
> ---
>  src/gallium/auxiliary/draw/draw_llvm.c |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
> b/src/gallium/auxiliary/draw/draw_llvm.c
> index 56c26f5..19134f3 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.c
> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
> @@ -1163,6 +1163,7 @@ draw_llvm_generate(struct draw_llvm *llvm, struct 
> draw_llvm_variant *variant)
> struct lp_build_loop_state lp_loop;
> const int max_vertices = 4;
> LLVMValueRef outputs[PIPE_MAX_SHADER_OUTPUTS][NUM_CHANNELS];
> +   LLVMValueRef fetch_max;
> void *code;
> struct lp_build_sampler_soa *sampler = 0;
> LLVMValueRef ret, ret_ptr;
> @@ -1234,6 +1235,10 @@ draw_llvm_generate(struct draw_llvm *llvm, struct 
> draw_llvm_variant *variant)
>draw_llvm_variant_key_samplers(&variant->key),
>context_ptr);
>  
> +   fetch_max = LLVMBuildSub(builder, count,
> +lp_build_const_int32(gallivm, 1),
> +"fetch_max");
> +
>  #if DEBUG_STORE
> lp_build_printf(builder, "start = %d, end = %d, step = %d\n",
> start, end, step);
> @@ -1257,6 +1262,13 @@ draw_llvm_generate(struct draw_llvm *llvm, struct 
> draw_llvm_variant *variant)
>  builder,
>  lp_loop.counter,
>  lp_build_const_int32(gallivm, i), "");
> + LLVMValueRef fetch_ptr;
> +
> + /* make sure we're not out of bounds which can happen
> +  * if fetch_count % 4 != 0, because on the last iteration
> +  * a few of the 4 vertex fetches will be out of bounds */
> + true_index = lp_build_min(&bld, true_index, fetch_max);
> + 
>   for (j = 0; j < draw->pt.nr_vertex_elements; ++j) {
>  struct pipe_vertex_element *velem = &draw->pt.vertex_element[j];
>  LLVMValueRef vb_index = lp_build_const_int32(gallivm, 
> velem->vertex_buffer_index);


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 11/13] gallium/util: implement pack functions for Z32F and Z32F_S8X24

2011-07-01 Thread Keith Whitwell

On Fri, 2011-07-01 at 02:29 +0200, Marek Olšák wrote:
> The suffix of 64 means it returns uint64_t.

It might be slightly clearer to call these functions util_pack64_{xxx}
-- currently it reads as if it is packing 64-bit source data.

Keith

> ---
>  src/gallium/auxiliary/util/u_pack_color.h |   64 
> +
>  1 files changed, 64 insertions(+), 0 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_pack_color.h 
> b/src/gallium/auxiliary/util/u_pack_color.h
> index 5378f2d..d2dfba5 100644
> --- a/src/gallium/auxiliary/util/u_pack_color.h
> +++ b/src/gallium/auxiliary/util/u_pack_color.h
> @@ -458,6 +458,19 @@ util_pack_mask_z(enum pipe_format format, uint32_t z)
> }
>  }
>  
> +
> +static INLINE uint64_t
> +util_pack_mask_z64(enum pipe_format format, uint32_t z)
> +{
> +   switch (format) {
> +   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
> +  return z;
> +   default:
> +  return util_pack_mask_z(format, z);
> +   }
> +}
> +
> +
>  static INLINE uint32_t
>  util_pack_mask_z_stencil(enum pipe_format format, uint32_t z, uint8_t s)
>  {
> @@ -481,6 +494,21 @@ util_pack_mask_z_stencil(enum pipe_format format, 
> uint32_t z, uint8_t s)
>  }
>  
> 
> +static INLINE uint64_t
> +util_pack_mask_z_stencil64(enum pipe_format format, uint32_t z, uint8_t s)
> +{
> +   uint64_t packed;
> +
> +   switch (format) {
> +   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
> +  packed = util_pack_mask_z64(format, z);
> +  packed |= (uint64_t)s << 32ull;
> +  return packed;
> +   default:
> +  return util_pack_mask_z_stencil(format, z, s);
> +   }
> +}
> +
>  
>  /**
>   * Note: it's assumed that z is in [0,1]
> @@ -525,6 +553,24 @@ util_pack_z(enum pipe_format format, double z)
>return 0;
> }
>  }
> +
> +
> +static INLINE uint64_t
> +util_pack_z64(enum pipe_format format, double z)
> +{
> +   union fi fui;
> +
> +   if (z == 0)
> +  return 0;
> +
> +   switch (format) {
> +   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
> +  fui.f = (float)z;
> +  return fui.ui;
> +   default:
> +  return util_pack_z(format, z);
> +   }
> +}
>   
>  
>  /**
> @@ -554,6 +600,24 @@ util_pack_z_stencil(enum pipe_format format, double z, 
> uint8_t s)
>  }
>  
> 
> +static INLINE uint64_t
> +util_pack_z_stencil64(enum pipe_format format, double z, uint8_t s)
> +{
> +   uint64_t packed;
> +
> +   switch (format) {
> +   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
> +  packed = util_pack_z64(format, z);
> +  packed |= (uint64_t)s << 32ull;
> +  break;
> +   default:
> +  return util_pack_z_stencil(format, z, s);
> +   }
> +
> +   return packed;
> +}
> +
> +
>  /**
>   * Pack 4 ubytes into a 4-byte word
>   */


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell

On Thu, 2011-06-30 at 17:53 +0200, Roland Scheidegger wrote:
> Am 30.06.2011 16:14, schrieb Adam Jackson:
> > On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
> >> Ok in fact there's a gcc bug about memcmp:
> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> >> In short gcc's memcmp builtin is totally lame and loses to glibc's
> >> memcmp (including call overhead, no knowledge about alignment etc.) even
> >> when comparing only very few bytes (and loses BIG time for lots of bytes
> >> to compare). Oops. Well at least if the strings are the same (I'd guess
> >> if the first byte is different it's hard to beat the gcc builtin...).
> >> So this is really a gcc bug. The bug is quite old though with no fix in
> >> sight apparently so might need to think about some workaround (but just
> >> not doing the comparison doesn't look like the right idea, since
> >> apparently it would be faster with the comparison if gcc's memcmp got
> >> fixed).
> > 
> > How do things fare if you build with -fno-builtin-memcmp?
> 
> This is even faster:
> original ipers: 12.1 fps
> ajax patch: 15.5 fps
> optimized struct compare: 16.8 fps
> -fno-builtin-memcmp: 18.1 fps
> 
> Looks like we have a winner :-) I guess glibc optimizes the hell out of
> it (in contrast to the other results, this affected all memcmp though I
> don't know if any others benefited from that on average).
> As noted by Keith though the struct we compare is really large (over 4k)
> so trimming the size might be a good idea anyway (of course the 4k size
> also meant any call overhead and non-optimal code due to glibc not
> knowing alignment beforehand and usage of return value is completely
> insignificant).
> A 50% improvement from disabling a compiler optimization, lol.

We probably what this everywhere throughout Mesa & Gallium...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell

On Thu, 2011-06-30 at 03:27 -0700, Jose Fonseca wrote:
> 
> - Original Message -
> > On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
> > > Ok in fact there's a gcc bug about memcmp:
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> > > In short gcc's memcmp builtin is totally lame and loses to glibc's
> > > memcmp (including call overhead, no knowledge about alignment etc.)
> > > even
> > > when comparing only very few bytes (and loses BIG time for lots of
> > > bytes
> > > to compare). Oops. Well at least if the strings are the same (I'd
> > > guess
> > > if the first byte is different it's hard to beat the gcc
> > > builtin...).
> > > So this is really a gcc bug. The bug is quite old though with no
> > > fix in
> > > sight apparently so might need to think about some workaround (but
> > > just
> > > not doing the comparison doesn't look like the right idea, since
> > > apparently it would be faster with the comparison if gcc's memcmp
> > > got
> > > fixed).
> > 
> > Looking at the struct again (it's been a while), it seems like it
> > could
> > be rearranged to be variable-sized and on average significantly
> > smaller:
> > 
> > struct lp_rast_state {
> >struct lp_jit_context jit_context;
> >struct lp_fragment_shader_variant *variant;
> > };
> > 
> > struct lp_jit_context {
> >const float *constants;
> >float alpha_ref_value;
> >uint32_t stencil_ref_front, stencil_ref_back;
> >uint8_t *blend_color;
> >struct lp_jit_texture textures[PIPE_MAX_SAMPLERS];
> > };
> > 
> > If we moved the jit_context part behind "variant", and then hopefully
> > note that most of those lp_jit_texture structs are not in use.  That
> > would save time on the memcmp *and* space in the binned data.
> 
> Yeah, sounds a good idea.
> 
> But there's some subtletly to computing the number of textures: it
>  can't be just the NULL textures, because they may be reffered by the
>  JIT code, which has no NULL checks and  relies on the state setup to
>  provide storage for all textures, or dummy memory if one is not bound.

So it's a property of the variant, right?  We should just store that
information when we generate the llvm variant.

> I think a better idea would be:
> - split the texture/sampler state
> - to make the lp_jit_context::textures an array of pointers, and put the 
> struct lp_jit_texture in the pipe_texture object themselves
> - to make the lp_jit_context::samplers an array of pointers, and put the 
> struct lp_jit_sampler in the pipe_sampler_state CSO

I like this too - it's somewhat more involved of course.

In fact the two are orthogonal -- the struct below can still be shrunk
significantly by knowing how many samplers & textures the variant refers
to.  Interleaving them or packing them would reduce the bytes to be
compared.

Alternatively there could be just a pointer in jit_context to
textures/samplers binned elsewhere.

> struct lp_jit_context {
> struct lp_jit_texture *textures[PIPE_MAX_SAMPLERS];
> struct lp_jit_sampler *samplers[PIPE_MAX_SAMPLERS];
> };

The jit context above seems to have lost some of its fields...

The next step might be to split the context into four parts: textures,
samplers, constants, "other", and have jit_context just be a set of
pointers into the binned data:

struct lp_jit_context {
 struct lp_jit_texture **textures;
 struct lp_jit_sampler **samplers;
 const float *constants;
 const struct lp_jit_other *other;   
};

struct lp_jit_other {
   float alpha_ref_value;
   uint32_t stencil_ref_front;
   uint32_t stencil_ref_back;
   uint8_t *blend_color;
};

> struct lp_jit_texture
> {
>uint32_t width;
>uint32_t height;
>uint32_t depth;
>uint32_t first_level;
>uint32_t last_level;
>uint32_t row_stride[LP_MAX_TEXTURE_LEVELS];
>uint32_t img_stride[LP_MAX_TEXTURE_LEVELS];
>const void *data[LP_MAX_TEXTURE_LEVELS];
>/* sampler state, actually */
>float min_lod;
>float max_lod;
>float lod_bias;
>float border_color[4];
> };
> 
> struct lp_jit_sampler
> {
>float min_lod;
>float max_lod;
>float lod_bias;
>float border_color[4];
> };
> 
> 
> Jose


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell

On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
> Ok in fact there's a gcc bug about memcmp:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
> In short gcc's memcmp builtin is totally lame and loses to glibc's
> memcmp (including call overhead, no knowledge about alignment etc.) even
> when comparing only very few bytes (and loses BIG time for lots of bytes
> to compare). Oops. Well at least if the strings are the same (I'd guess
> if the first byte is different it's hard to beat the gcc builtin...).
> So this is really a gcc bug. The bug is quite old though with no fix in
> sight apparently so might need to think about some workaround (but just
> not doing the comparison doesn't look like the right idea, since
> apparently it would be faster with the comparison if gcc's memcmp got
> fixed).

Looking at the struct again (it's been a while), it seems like it could
be rearranged to be variable-sized and on average significantly smaller:

struct lp_rast_state {
   struct lp_jit_context jit_context;
   struct lp_fragment_shader_variant *variant;
};

struct lp_jit_context {
   const float *constants;
   float alpha_ref_value;
   uint32_t stencil_ref_front, stencil_ref_back;
   uint8_t *blend_color;
   struct lp_jit_texture textures[PIPE_MAX_SAMPLERS];
};

If we moved the jit_context part behind "variant", and then hopefully
note that most of those lp_jit_texture structs are not in use.  That
would save time on the memcmp *and* space in the binned data.

It's weird this wasn't showing up in past profiling.

Kieth


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell

On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote:
> Okay, so maybe I'm failing to recognize the exact situation here, but
> wouldn't it be possible to mark the FS state with a serial number and
> just compare those? Or are these FS states not CSO-cached?

No, the struct being compared is poorly named & collides with a CSO
entity.  It's really all the state which the compiled fragment shader
will reference when it is later invoked.  It's all packed into a single
struct because it's easier to pass a single parameter to llvm-compiled
shaders and add/change that parameter, but it is somewhat non-orthogonal
and we end up generating too many of them.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Keith Whitwell

On Wed, 2011-06-29 at 13:19 -0400, Adam Jackson wrote:
> Perversely, do this by eliminating the comparison between stored and
> current fs state.  On ipers, a perf trace showed try_update_scene_state
> using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp.
> Taking that out takes try_update_scene_state down to 6.5% of the
> profile; more importantly, ipers goes from 10 to 14fps and gears goes
> from 790 to 830fps.

Some of the motivation for that memcpy is about keeping the memory usage
of the binned scene from exploding and forcing unnecessary flushes on
more complex apps.

I wonder if there is a way to improve the dirty flag handling to avoid
ending up in that memcpy so often?


Note that freeglut is probably dominating your gears numbers by trying
to reinitialize your SpaceBall device (I don't have one either) on every
swapbuffers.

http://lists.freedesktop.org/archives/mesa-dev/2011-February/005599.html


Keith


> Signed-off-by: Adam Jackson 
> ---
>  src/gallium/drivers/llvmpipe/lp_setup.c |   61 
> ++-
>  1 files changed, 27 insertions(+), 34 deletions(-)
> 
> diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
> b/src/gallium/drivers/llvmpipe/lp_setup.c
> index cbe06e5..9118db5 100644
> --- a/src/gallium/drivers/llvmpipe/lp_setup.c
> +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
> @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup )
>setup->dirty |= LP_SETUP_NEW_FS;
> }
>  
> -
> if (setup->dirty & LP_SETUP_NEW_FS) {
> -  if (!setup->fs.stored ||
> -  memcmp(setup->fs.stored,
> - &setup->fs.current,
> - sizeof setup->fs.current) != 0)
> -  {
> - struct lp_rast_state *stored;
> - uint i;
> - 
> - /* The fs state that's been stored in the scene is different from
> -  * the new, current state.  So allocate a new lp_rast_state object
> -  * and append it to the bin's setup data buffer.
> -  */
> - stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof 
> *stored);
> - if (!stored) {
> -assert(!new_scene);
> -return FALSE;
> - }
> +  struct lp_rast_state *stored;
> +  uint i;
> +  
> +  /* The fs state that's been stored in the scene is different from
> +   * the new, current state.  So allocate a new lp_rast_state object
> +   * and append it to the bin's setup data buffer.
> +   */
> +  stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof 
> *stored);
> +  if (!stored) {
> + assert(!new_scene);
> + return FALSE;
> +  }
>  
> - memcpy(stored,
> -&setup->fs.current,
> -sizeof setup->fs.current);
> - setup->fs.stored = stored;
> - 
> - /* The scene now references the textures in the rasterization
> -  * state record.  Note that now.
> -  */
> - for (i = 0; i < Elements(setup->fs.current_tex); i++) {
> -if (setup->fs.current_tex[i]) {
> -   if (!lp_scene_add_resource_reference(scene,
> -setup->fs.current_tex[i],
> -new_scene)) {
> -  assert(!new_scene);
> -  return FALSE;
> -   }
> +  memcpy(stored,
> + &setup->fs.current,
> + sizeof setup->fs.current);
> +  setup->fs.stored = stored;
> +  
> +  /* The scene now references the textures in the rasterization
> +   * state record.  Note that now.
> +   */
> +  for (i = 0; i < Elements(setup->fs.current_tex); i++) {
> + if (setup->fs.current_tex[i]) {
> +if (!lp_scene_add_resource_reference(scene,
> + setup->fs.current_tex[i],
> + new_scene)) {
> +   assert(!new_scene);
> +   return FALSE;
>  }
>   }
>}


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp

2011-06-27 Thread Keith Whitwell

On Mon, 2011-06-27 at 15:32 +0200, Marek Olšák wrote:
> On Mon, Jun 27, 2011 at 2:38 PM, Roland Scheidegger
>  wrote:
> > Am 25.06.2011 00:22, schrieb Vadim Girlin:
> >> On 06/24/2011 11:38 PM, Jerome Glisse wrote:
> >>> On Fri, Jun 24, 2011 at 12:29 PM, Vadim
> Girlin
> >>> wrote:
>  Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440
> 
>  Signed-off-by: Vadim Girlin
> >>>
> >>> As discussed previously, there is better to handle this. I think
> best
> >>> solution is to always add the instruction and to conditionally
> execute
> >>> them thanks to the boolean constant. If this reveal to have a too
> big
> >>> impact on shader, other solution i see is adding a cf block with
> those
> >>> instructions and to enable or disable that block (cf_nop) and
> reupload
> >>> shader that would avoid a rebuild.
> >>
> >> I know its not optimal to do a full rebuild, but rebuild is needed
> only
> >> when the application will use the same shader in different clamping
> >> states. It won't be a problem if the application doesn't change
> clamping
> >> state or if it changes the state but uses each shader in one state
> only.
> >> So assuming that typical app will not use one shader in both
> states, it
> >> shouldn't be a problem. Is this assumption wrong? I'm not really
> sure
> >> because I have no much experience in this. But if it's wrong then
> it's
> >> probably better for performance to build and cache both versions.
> > I tend to think you're right apps probably don't want to use the
> same
> > shader both with and without clamping.
> 
> It still can be changed by st/mesa or by u_blitter and u_blit for
> various reasons. IIRC, the OpenGL default is TRUE if the current
> framebuffer is fixed-point including texture_snorm and FALSE
> otherwise, so changing the framebuffer may change the clamp color
> state. Besides that, the u_blitter and u_blit operations always
> disable the clamping, so if a framebuffer is fixed-point and thus
> clamp color state is TRUE (if not changed by an app), the driver may
> receive state changes that turn the clamping on, off, on, off,... with
> the blit operations turning it off and everything else turning it on.
> The state might be changing pretty much all the time and doing full
> shader rebuilds repeatedly may turn some apps into a slideshow.

I haven't looked at the code, maybe this is irrelevant for some reason,
but the alternative to doing rebuilds when this type of state changes is
to permit >1 compiled version of the shader to exist, parameterized in
different ways.  That way the ping-pong scenario you describe results in
swapping between shaders (which should be cheap), rather than
rebuilding.

> Therefore we must ensure that a fragment shader is set/built as late
> as possible, i.e. in draw_vbo. Each shader variant should be compiled
> once at most and stored for later use. create_fs_state and
> bind_fs_state should not do anything except copying the parameters. 

Actually it sounds like you're describing the same idea here...

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] A gallium XA state tracker

2011-06-15 Thread Keith Whitwell

On Wed, 2011-06-15 at 11:29 +0200, Thomas Hellstrom wrote:
> Hi!
> 
> I just pushed an initial commit of an X Acceleration state tracker to 
> the xa_branch.
> 
> The idea is that in the long run it will be replacing the Xorg state 
> tracker, which can then move back to a modular xf86-video-modesetting. 
> It will also be responsible for the acceleration part of an updated 
> vmwgfx X driver
> 
>  From the README:
> 
> 8<--
> The XA state tracker is intended as a versioned interface to gallium for
> xorg driver writers. Initially it's mostly based on Zack Rusin's
> composite / video work for the Xorg state tracker.
> 
> The motivation behind this state tracker is that the Xorg state tracker has
> a number of interfaces to work with:
> 
> 1) The Xorg sdk (versioned)
> 2) Gallium3D (not versioned)
> 3) KMS modesetting (versioned)
> 4) Driver-private (hopefully versioned)
> 
> Since Gallium3D is versioned, the Xorg state tracker needs to be compiled

Hi Thomas!  Is there a missing "not" before versioned in the above
sentence?

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/6] Overhaul of Gallium configure options

2011-06-14 Thread Keith Whitwell

On Tue, 2011-06-14 at 18:25 +0200, Marek Olšák wrote:
> Hi,
> 
> This series reworks some of our configure options to make Gallium easier to 
> configure.
> 
> First, there is a new option --with-gallium-drivers=DIRS, which replaces the 
> current heap of options --enable-gallium-DRIVER. --disable-gallium is removed 
> as well, instead, --with-gallium-drivers= without parameters should be used 
> to disable Gallium.
> 
> --enable-gallium-egl is removed. having --enable-egl and 
> --with-gallium-drivers=somedriver is sufficient.
> 
> --with-state-trackers is removed as well. The list of state trackers is 
> automatically deduced from the --enable-API options (the vega,egl state 
> trackers) and --with-driver=dri|xlib (the dri,glx state trackers). Some state 
> trackers lack an enable flag now, so these two have been added to make the 
> list complete: --enable-xorg and --enable-d3d1x.
> 
> In order to be able to "git bisect run" through this change, you can specify 
> both the old and new options at the same time. Those that are unsupported are 
> ignored.
> 
> Other than that, I am enabling r600g by default and removing r300g and r600g 
> from scons. I am not a fan of having multiple build systems and most people 
> prefer autoconf anyway. It's not like anybody needs to build those drivers on 
> Windows.

I did use r600g + scons for the little bit of work I did there, and if I
went back to it, it would continue to be with scons...

Is there a significant cost to you having it there? 

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: improved is_interleaved_arrays() checking

2011-06-14 Thread Keith Whitwell

On Tue, 2011-06-14 at 09:39 -0600, Brian Paul wrote:
> Good question.  I was thinking that the interleaved vs. 
> non-interleaved paths could probably be merged with a little work.  I 
> don't remember the original reason for doing things as they are.

I think it enabled an easier upload path within the driver/state-tracker
-- memcpy a single range to a single VBO, rather than gathering.

Now that the upload is potentially code-generated, that may no longer
matter as much.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] softpipe: Anisotropic filtering extension

2011-06-06 Thread Keith Whitwell

Andreas,

This looks very interesting.  Ultimately llvmpipe would want to have aniso as 
well, but performance would be much more important there.  Do you have a 
feeling for what shortcuts the hardware implementations are taking?

Keith

- Original Message -
From: "Andreas Faenger" 
To: mesa-dev@lists.freedesktop.org
Cc: "a faenger" 
Sent: Monday, 6 June, 2011 8:13:15 AM
Subject: [Mesa-dev] [PATCH] softpipe: Anisotropic filtering extension

Hi,

as requested by Paul, I've converted the patch which provides anisotropic 
filtering for swrast to softpipe. The rendering results of both version are 
almost identical and are much better compared to typical HW rendering, e.g. 
NVIDIA which produces a lot more aliasing.

Andreas

Andreas Faenger (1):
  softpipe: Anisotropic filtering extension.

 src/gallium/drivers/softpipe/sp_screen.c |4 +-
 src/gallium/drivers/softpipe/sp_tex_sample.c |  331 ++
 2 files changed, 333 insertions(+), 2 deletions(-)

-- 
1.7.4.msysgit.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] GLSL IR int-to-float pass

2011-05-25 Thread Keith Whitwell

On Wed, 2011-05-25 at 09:32 -0400, Jerome Glisse wrote:
> On Tue, May 24, 2011 at 8:09 PM, Bryan Cain  wrote:
> > Hi,
> >
> > In the past few days, I've been working on native integer support in my
> > GLSL to TGSI translator.  Something that's come to my attention is that
> > supporting Gallium targets with and without integer support using a
> > single GLSL IR backend will more or less require a GLSL IR pass to
> > convert int, uint, and possibly bool variables and operations to floats.
> >
> > Currently, this is done directly in the backend, in both ir_to_mesa and
> > st_glsl_to_tgsi.  However, the mod_to_fract and div_to_mul_rcp lowering
> > passes for GLSL IR need to know whether to lower integer modulus and
> > division operations to their corresponding float operations.  (They both
> > do this in Mesa master without asking the backend, but that will be easy
> > to change later.)  So a GLSL IR pass will be needed to do the type lowering.
> >
> > Such a pass would also have the advantage of less duplicated
> > functionality between backends, since ir_to_mesa could also take
> > advantage of the pass to eliminate some code.
> >
> > I'm more than willing to try writing such a pass myself if no one else
> > is interested in doing it, but I figure I should make sure there are no
> > objections before starting on it.
> >
> > Bryan
> 
> TGSI needs to grow type support (int, uint and possibly int8,16,32..)

Or go away entirely...

I'm not trying to impose a direction on this, but it seems like the GLSL
IR->TGSI converter (once running) could be pushed down into the
individual drivers and GLSL IR or a close cousin of it could become the
gallium-level interface.  Then individual drivers could be modified to
consume IR directly.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/2] Misc i965 fixes / clean-ups

2011-04-11 Thread Keith Whitwell

On Mon, 2011-04-11 at 10:30 -0700, Ian Romanick wrote:
> The first patch "fixes" an issue that Ken and I discovered last week
> with the ROUND_DOWN_TO macro in the i965 driver.  The best fix is
> probably to pull this macro up into higher-level Mesa code.  I'd like
> some review that changing this macro won't break existing code.

Presumably that was me -- unfortunately I can't remember & it looks like
a bug so I'd say the fix makes sense.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] draw-robustness: Test robustness for out-of-bounds vertex fetches.

2011-03-31 Thread Keith Whitwell

Looks good.

Keith

On Thu, 2011-03-31 at 14:46 +0100, jfons...@vmware.com wrote:
> From: José Fonseca 
> 
> Not added to the standard test lists given that ARB_vertex_buffer_object
> allows program termination out-of-bounds vertex buffer object fetches
> occur.
> ---
>  tests/general/CMakeLists.gl.txt |1 +
>  tests/general/draw-robustness.c |  201 
> +++
>  2 files changed, 202 insertions(+), 0 deletions(-)
>  create mode 100644 tests/general/draw-robustness.c
> 
> diff --git a/tests/general/CMakeLists.gl.txt b/tests/general/CMakeLists.gl.txt
> index bbe6507..d373e35 100644
> --- a/tests/general/CMakeLists.gl.txt
> +++ b/tests/general/CMakeLists.gl.txt
> @@ -36,6 +36,7 @@ ENDIF (UNIX)
>  add_executable (draw-elements-vs-inputs draw-elements-vs-inputs.c)
>  add_executable (draw-instanced draw-instanced.c)
>  add_executable (draw-instanced-divisor draw-instanced-divisor.c)
> +add_executable (draw-robustness draw-robustness.c)
>  add_executable (draw-vertices draw-vertices.c)
>  add_executable (draw-vertices-half-float draw-vertices-half-float.c)
>  add_executable (fog-modes fog-modes.c)
> diff --git a/tests/general/draw-robustness.c b/tests/general/draw-robustness.c
> new file mode 100644
> index 000..a13f568
> --- /dev/null
> +++ b/tests/general/draw-robustness.c
> @@ -0,0 +1,201 @@
> +/*
> + * Copyright (C) 2011 VMware, Inc.
> + * Copyright (C) 2010 Marek Olšák 
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Authors:
> + *   Jose Fonseca 
> + *   Based on code from Marek Olšák 
> + */
> +
> +/* Test whether out-of-bounds vertex buffer object cause termination.
> + *
> + * Note that the original ARB_vertex_buffer_object extension explicitly 
> states
> + * program termination is allowed when out-of-bounds vertex buffer object
> + * fetches occur.  The ARB_robustness extension does provides an enbale to
> + * guarantee that out-of-bounds buffer object accesses by the GPU will have
> + * deterministic behavior and preclude application instability or termination
> + * due to an incorrect buffer access.  But regardless of ARB_robustness
> + * extension support it is a good idea not to crash.  For example,  viewperf
> + * doesn't properly detect NV_primitive_restart and emits 0x indices
> + * which can result in crashes.
> + *
> + * TODO:
> + * - test out-of-bound index buffer object access
> + * - test more vertex/element formats
> + * - test non-aligned offsets
> + * - provide a command line option to actually enable ARB_robustness
> + */
> +
> +#include "piglit-util.h"
> +
> +int piglit_width = 320, piglit_height = 320;
> +int piglit_window_mode = GLUT_RGB;
> +
> +void piglit_init(int argc, char **argv)
> +{
> +piglit_ortho_projection(piglit_width, piglit_height, GL_FALSE);
> +
> +if (!GLEW_VERSION_1_5) {
> +printf("Requires OpenGL 1.5\n");
> +piglit_report_result(PIGLIT_SKIP);
> +}
> +
> +glShadeModel(GL_FLAT);
> +glClearColor(0.2, 0.2, 0.2, 1.0);
> +}
> +
> +static void
> +random_vertices(GLsizei offset, GLsizei stride, GLsizei count)
> +{
> +GLsizei size = offset + (count - 1)*stride + 2 * sizeof(GLfloat);
> +GLubyte *vertices;
> +GLsizei i;
> +
> +assert(offset % sizeof(GLfloat) == 0);
> +assert(stride % sizeof(GLfloat) == 0);
> +
> +vertices = malloc(size);
> +assert(vertices);
> +if (!vertices) {
> +return;
> +}
> +
> +for (i = 0; i < count; ++i) {
> +GLfloat *vertex = (GLfloat *)(vertices + offset + i*stride);
> +vertex[0] = (rand() % 1000) * .001;
> +vertex[1] = (rand() % 1000) * .001;
> +}
> +
> +glBufferData(GL_ARRAY_BUFFER, size, vertices, GL_STATIC_DRAW);
> +assert(glGetError() == GL_NO_ERROR);
> +
> +free(vertices);
> +}
> +
> +static void
> +random_ushort_indices(GLsizei offset, GLsizei count, GLuint min_index,

Re: [Mesa-dev] [PATCH] draw: Prevent out-of-bounds vertex buffer access.

2011-03-31 Thread Keith Whitwell

Looks good to me, Jose.

Keith

On Thu, 2011-03-31 at 14:45 +0100, jfons...@vmware.com wrote:
> From: José Fonseca 
> 
> Based on some code and ideas from Keith Whitwell.
> ---
>  src/gallium/auxiliary/Makefile |1 +
>  src/gallium/auxiliary/SConscript   |1 +
>  src/gallium/auxiliary/draw/draw_private.h  |8 ++
>  src/gallium/auxiliary/draw/draw_pt.c   |   11 +++
>  src/gallium/auxiliary/draw/draw_pt_fetch.c |2 +-
>  src/gallium/auxiliary/draw/draw_pt_fetch_emit.c|2 +-
>  .../auxiliary/draw/draw_pt_fetch_shade_emit.c  |2 +-
>  src/gallium/auxiliary/draw/draw_pt_vsplit.c|7 ++-
>  src/gallium/auxiliary/draw/draw_pt_vsplit_tmp.h|   12 ++-
>  src/gallium/auxiliary/util/u_draw.c|   94 
> 
>  src/gallium/auxiliary/util/u_draw.h|   19 
>  11 files changed, 152 insertions(+), 7 deletions(-)
>  create mode 100644 src/gallium/auxiliary/util/u_draw.c
> 
> diff --git a/src/gallium/auxiliary/Makefile b/src/gallium/auxiliary/Makefile
> index c765404..2be4509 100644
> --- a/src/gallium/auxiliary/Makefile
> +++ b/src/gallium/auxiliary/Makefile
> @@ -107,6 +107,7 @@ C_SOURCES = \
>   util/u_caps.c \
>   util/u_cpu_detect.c \
>   util/u_dl.c \
> + util/u_draw.c \
>   util/u_draw_quad.c \
>   util/u_format.c \
>   util/u_format_other.c \
> diff --git a/src/gallium/auxiliary/SConscript 
> b/src/gallium/auxiliary/SConscript
> index 8e422b2..96ca566 100644
> --- a/src/gallium/auxiliary/SConscript
> +++ b/src/gallium/auxiliary/SConscript
> @@ -154,6 +154,7 @@ source = [
>  'util/u_dump_defines.c',
>  'util/u_dump_state.c',
>  'util/u_dl.c',
> +'util/u_draw.c',
>  'util/u_draw_quad.c',
>  'util/u_format.c',
>  'util/u_format_other.c',
> diff --git a/src/gallium/auxiliary/draw/draw_private.h 
> b/src/gallium/auxiliary/draw/draw_private.h
> index db2e3c5..b7d693f 100644
> --- a/src/gallium/auxiliary/draw/draw_private.h
> +++ b/src/gallium/auxiliary/draw/draw_private.h
> @@ -146,6 +146,14 @@ struct draw_context
>struct pipe_vertex_buffer vertex_buffer[PIPE_MAX_ATTRIBS];
>unsigned nr_vertex_buffers;
>  
> +  /*
> +   * This is the largest legal index value for the current set of
> +   * bound vertex buffers.  Regardless of any other consideration,
> +   * all vertex lookups need to be clamped to 0..max_index to
> +   * prevent out-of-bound access.
> +   */
> +  unsigned max_index;
> +
>struct pipe_vertex_element vertex_element[PIPE_MAX_ATTRIBS];
>unsigned nr_vertex_elements;
>  
> diff --git a/src/gallium/auxiliary/draw/draw_pt.c 
> b/src/gallium/auxiliary/draw/draw_pt.c
> index c3d7e87..e0eda67 100644
> --- a/src/gallium/auxiliary/draw/draw_pt.c
> +++ b/src/gallium/auxiliary/draw/draw_pt.c
> @@ -470,6 +470,17 @@ draw_vbo(struct draw_context *draw,
> if (0)
>draw_print_arrays(draw, info->mode, info->start, MIN2(info->count, 
> 20));
>  
> +   draw->pt.max_index = util_draw_max_index(draw->pt.vertex_buffer,
> +draw->pt.nr_vertex_buffers,
> +draw->pt.vertex_element,
> +draw->pt.nr_vertex_elements,
> +info);
> +
> +   /*
> +* TODO: We could use draw->pt.max_index to further narrow
> +* the min_index/max_index hints given by the state tracker.
> +*/
> +
> for (instance = 0; instance < info->instance_count; instance++) {
>draw->instance_id = instance + info->start_instance;
>  
> diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch.c 
> b/src/gallium/auxiliary/draw/draw_pt_fetch.c
> index 4fa3b26..5589a82 100644
> --- a/src/gallium/auxiliary/draw/draw_pt_fetch.c
> +++ b/src/gallium/auxiliary/draw/draw_pt_fetch.c
> @@ -139,7 +139,7 @@ void draw_pt_fetch_run( struct pt_fetch *fetch,
>   ((char *)draw->pt.user.vbuffer[i] + 
>draw->pt.vertex_buffer[i].buffer_offset),
>   draw->pt.vertex_buffer[i].stride,
> - draw->pt.user.max_index);
> + draw->pt.max_index);
> }
>  
> translate->run_elts( translate,
> diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_emit.c 
> b/src/gallium/auxiliary/draw/draw_pt_fetch_emit.c
> index 5104310..0ab11d0 100644
> --- a/src/gallium/auxiliary/draw/draw_pt_f

Re: [Mesa-dev] [PATCH] draw: implement vertex color clamping

2011-03-30 Thread Keith Whitwell


> On Wed, Mar 30, 2011 at 2:45 PM, Keith Whitwell 
> wrote:
> 
> > > diff --git a/src/gallium/auxiliary/draw/draw_llvm.h
> > b/src/gallium/auxiliary/draw/draw_llvm.h
> > > index e8623e7..643a9ef 100644
> > > --- a/src/gallium/auxiliary/draw/draw_llvm.h
> > > +++ b/src/gallium/auxiliary/draw/draw_llvm.h
> > > @@ -162,6 +162,7 @@ struct draw_llvm_variant_key
> > >  {
> > > unsigned nr_vertex_elements:8;
> > > unsigned nr_samplers:8;
> > > +   unsigned clamp_vertex_color:8;
> > > unsigned clip_xy:1;
> > > unsigned clip_z:1;
> > > unsigned clip_user:1;
> >
> > Why are there 8 bits for this?
> >
> > I'd suggest 1 bit is sufficient, and that you should take one bit
> from
> > "pad" to make space for it.
> >
> 
> It seems to be a typo. I have now fixed it, this is the updated part
> of the
> patch:
> 
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.h
> b/src/gallium/auxiliary/draw/draw_llv
> index e8623e7..873a272 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.h
> +++ b/src/gallium/auxiliary/draw/draw_llvm.h
> @@ -162,6 +162,7 @@ struct draw_llvm_variant_key
>  {
> unsigned nr_vertex_elements:8;
> unsigned nr_samplers:8;
> +   unsigned clamp_vertex_color:1;
> unsigned clip_xy:1;
> unsigned clip_z:1;
> unsigned clip_user:1;
> @@ -169,7 +170,7 @@ struct draw_llvm_variant_key
> unsigned bypass_viewport:1;
> unsigned need_edgeflags:1;
> unsigned nr_planes:4;
> -   unsigned pad:6;
> +   unsigned pad:5;
> 
> /* Variable number of vertex elements:
>  */
> 
> 
> 
> > Otherwise, it looks good to me.
> >
> 
> OK.

Thanks Marek, looks great.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] draw: implement vertex color clamping

2011-03-30 Thread Keith Whitwell

On Wed, 2011-03-30 at 14:33 +0200, Marek Olšák wrote:
> From: Luca Barbieri 
> 
> Disclaimer:
>   I will not push this code if this patch does not get any attention,
>   because I cannot say if it is 100% correct (the code is not mine).
>   However last time I checked, it passed all the related tests.
>   Also, the SSE and PPC paths are disabled with this code.
>   -Marek
> 
> Squashed commit of the following:
> 
> commit 737c0c6b7d591ac0fc969a7590e1691eeef0ce5e
> Author: Luca Barbieri 
> Date:   Fri Aug 27 02:13:57 2010 +0200
> 
> draw: disable SSE and PPC paths (use LLVM instead)
> 
> These paths don't support vertex clamping, and are anyway
> obsoleted by LLVM.
> 
> If you want to re-enable them, add vertex clamping and test that it
> works with the ARB_color_buffer_float piglit tests.
> 
> commit fed3486a7ca0683b403913604a26ee49a3ef48c7
> Author: Luca Barbieri 
> Date:   Thu Aug 26 18:27:38 2010 +0200
> 
> draw_llvm: respect vertex color clamp
> 
> commit ef0efe9f3d1d0f9b40ebab78940491d2154277a9
> Author: Luca Barbieri 
> Date:   Thu Aug 26 18:26:43 2010 +0200
> 
> draw: respect vertex clamping in interpreter path
> ---
>  src/gallium/auxiliary/draw/draw_llvm.c|   35 ++--
>  src/gallium/auxiliary/draw/draw_llvm.h|1 +
>  src/gallium/auxiliary/draw/draw_vs.c  |7 +
>  src/gallium/auxiliary/draw/draw_vs_exec.c |   22 ++
>  4 files changed, 57 insertions(+), 8 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
> b/src/gallium/auxiliary/draw/draw_llvm.c
> index a5217c1..27c5f3b 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.c
> +++ b/src/gallium/auxiliary/draw/draw_llvm.c
> @@ -438,7 +438,8 @@ generate_vs(struct draw_llvm *llvm,
>  const LLVMValueRef (*inputs)[NUM_CHANNELS],
>  LLVMValueRef system_values_array,
>  LLVMValueRef context_ptr,
> -struct lp_build_sampler_soa *draw_sampler)
> +struct lp_build_sampler_soa *draw_sampler,
> +boolean clamp_vertex_color)
>  {
> const struct tgsi_token *tokens = 
> llvm->draw->vs.vertex_shader->state.tokens;
> struct lp_type vs_type;
> @@ -474,6 +475,30 @@ generate_vs(struct draw_llvm *llvm,
>   outputs,
>   sampler,
>   &llvm->draw->vs.vertex_shader->info);
> +
> +   if(clamp_vertex_color)
> +   {
> +  LLVMValueRef out;
> +  unsigned chan, attrib;
> +  struct lp_build_context bld;
> +  struct tgsi_shader_info* info = &llvm->draw->vs.vertex_shader->info;
> +  lp_build_context_init(&bld, llvm->gallivm, vs_type);
> +
> +  for (attrib = 0; attrib < info->num_outputs; ++attrib) {
> + for(chan = 0; chan < NUM_CHANNELS; ++chan) {
> +if(outputs[attrib][chan]) {
> +   switch (info->output_semantic_name[attrib]) {
> +   case TGSI_SEMANTIC_COLOR:
> +   case TGSI_SEMANTIC_BCOLOR:
> +  out = LLVMBuildLoad(builder, outputs[attrib][chan], "");
> +  out = lp_build_clamp(&bld, out, bld.zero, bld.one);
> +  LLVMBuildStore(builder, out, outputs[attrib][chan]);
> +  break;
> +   }
> +}
> + }
> +  }
> +   }
>  }
>  
>  #if DEBUG_STORE
> @@ -1235,7 +1260,8 @@ draw_llvm_generate(struct draw_llvm *llvm, struct 
> draw_llvm_variant *variant)
>ptr_aos,
>system_values_array,
>context_ptr,
> -  sampler);
> +  sampler,
> +  variant->key.clamp_vertex_color);
>  
>/* store original positions in clip before further manipulation */
>store_clip(gallivm, io, outputs);
> @@ -1446,7 +1472,8 @@ draw_llvm_generate_elts(struct draw_llvm *llvm, struct 
> draw_llvm_variant *varian
>ptr_aos,
>system_values_array,
>context_ptr,
> -  sampler);
> +  sampler,
> +  variant->key.clamp_vertex_color);
>  
>/* store original positions in clip before further manipulation */
>store_clip(gallivm, io, outputs);
> @@ -1524,6 +1551,8 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, char 
> *store)
>  
> key = (struct draw_llvm_variant_key *)store;
>  
> +   key->clamp_vertex_color = llvm->draw->rasterizer->clamp_vertex_color; /**/
> +
> /* Presumably all variants of the shader should have the same
>  * number of vertex elements - ie the number of shader inputs.
>  */
> diff --git a/src/gallium/auxiliary/draw/draw_llvm.h 
> b/src/gallium/auxiliary/draw/draw_llvm.h
> index e8623e7..643a9ef 100644
> --- a/src/gallium/auxiliary/draw/draw_llvm.h
> +++ b/src/gallium/auxiliary/draw/draw_llvm.h
> @@ -162,6 +162,7 @@ struct draw_llvm_variant_key
>  {
> unsigned nr_vertex_elements:8;
> unsigned nr_samplers:8;
> +   unsigned clamp_verte

Re: [Mesa-dev] [RFC] Moving from macro to inline for list manipulation

2011-03-29 Thread Keith Whitwell

On Mon, 2011-03-28 at 17:54 -0400, Jerome Glisse wrote:
> Hi,
> 
> One short coming of macro has keep entertaining me until i figure out
> what was wrong, here is
> a simple scenario :
> 
> Macro can lead to hard to debug list bugs. For instance consider
> the following :
> LIST_ADD(item, list->prev)
> 3 instruction of the macro became :
> (list->prev)->next->prev = item
> which is equivalent to :
> list->prev = item
> Thus list prev field changes and next instruction in the macro
> (list->prev)->next = item
> became :
> item->next = item
> And you endup with list corruption, other case lead to similar
> list corruption. Inline function are not affected by this short
> coming
> 
> Thus i propose to switch list manipulation from macro to inline
> function, attached patch does exactly that. If there is no objection
> in next couple of week i will merge it. (to avoid mass renaming it
> keeps the macro that just wrap the functions it also add a bunch of
> new list walking helper)

Hmm, another good thing to do would be to eliminate the u_simple_list.h
code which is what I initially read this as being about.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] Moving from macro to inline for list manipulation

2011-03-29 Thread Keith Whitwell

On Mon, 2011-03-28 at 17:54 -0400, Jerome Glisse wrote:
> Hi,
> 
> One short coming of macro has keep entertaining me until i figure out
> what was wrong, here is
> a simple scenario :
> 
> Macro can lead to hard to debug list bugs. For instance consider
> the following :
> LIST_ADD(item, list->prev)
> 3 instruction of the macro became :
> (list->prev)->next->prev = item
> which is equivalent to :
> list->prev = item
> Thus list prev field changes and next instruction in the macro
> (list->prev)->next = item
> became :
> item->next = item
> And you endup with list corruption, other case lead to similar
> list corruption. Inline function are not affected by this short
> coming
> 
> Thus i propose to switch list manipulation from macro to inline
> function, attached patch does exactly that. If there is no objection
> in next couple of week i will merge it. (to avoid mass renaming it
> keeps the macro that just wrap the functions it also add a bunch of
> new list walking helper)

I'm ok with this.  Those macros were supposed to be quick & dirty, but
they've been around much longer than I ever expected...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: In update_samplers(), clear all samplers at once.

2011-03-23 Thread Keith Whitwell

On Mon, 2011-03-21 at 19:55 +0100, Tilman Sauerbeck wrote:
> Keith Whitwell [2011-03-21 18:43]:
> > On Mon, 2011-03-21 at 19:28 +0100, Tilman Sauerbeck wrote:
> > > Signed-off-by: Tilman Sauerbeck 
> > > ---
> > > 
> > > update_samplers() showed up in a profile of Heroes of Newerth;
> > > this patch pushes it down the profile by ~3%.
> > > 
> > > Does this seem plausible?
> > > 
> > >  src/mesa/state_tracker/st_atom_sampler.c |5 +++--
> > >  1 files changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/mesa/state_tracker/st_atom_sampler.c 
> > > b/src/mesa/state_tracker/st_atom_sampler.c
> > > index 474cbd5..4374ac1 100644
> > > --- a/src/mesa/state_tracker/st_atom_sampler.c
> > > +++ b/src/mesa/state_tracker/st_atom_sampler.c
> > > @@ -129,12 +129,13 @@ update_samplers(struct st_context *st)
> > >  
> > > st->state.num_samplers = 0;
> > >  
> > > +   memset(st->state.samplers, 0, st->ctx->Const.MaxTextureImageUnits *
> > > +  sizeof(struct pipe_sampler_state));
> > > +
> > 
> > At a glance, could the memset be moved up another couple of lines and
> > changed to:
> > 
> >  memset(st->state.samplers, 0, st->state.num_samplers * sizeof(struct 
> > pipe_sampler_state));
> 
> I wondered about this, too. Consider the case where the sampler state is
> undefined when update_samplers() is called though -- then we might end
> up with partially uninitialized samplers, no?

In theory as long as the array started off zeroed, it should just work &
be a decent win.  Adding an assert or two might help catch cases where
we are wrong about this.

If you don't do that, I wonder if there is really a gain here -- it
might be that the original memset was inlined (because it has a known
size) and the new version just converts that to a non-inlined memset
with an unknown size?

As it stands, I don't see why this generates such a big improvement.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: In update_samplers(), clear all samplers at once.

2011-03-21 Thread Keith Whitwell

On Mon, 2011-03-21 at 19:28 +0100, Tilman Sauerbeck wrote:
> Signed-off-by: Tilman Sauerbeck 
> ---
> 
> update_samplers() showed up in a profile of Heroes of Newerth;
> this patch pushes it down the profile by ~3%.
> 
> Does this seem plausible?
> 
>  src/mesa/state_tracker/st_atom_sampler.c |5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_atom_sampler.c 
> b/src/mesa/state_tracker/st_atom_sampler.c
> index 474cbd5..4374ac1 100644
> --- a/src/mesa/state_tracker/st_atom_sampler.c
> +++ b/src/mesa/state_tracker/st_atom_sampler.c
> @@ -129,12 +129,13 @@ update_samplers(struct st_context *st)
>  
> st->state.num_samplers = 0;
>  
> +   memset(st->state.samplers, 0, st->ctx->Const.MaxTextureImageUnits *
> +  sizeof(struct pipe_sampler_state));
> +

At a glance, could the memset be moved up another couple of lines and
changed to:

 memset(st->state.samplers, 0, st->state.num_samplers * sizeof(struct 
pipe_sampler_state));


Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/6] gallium: implement clamping controls (ARB_color_buffer_float)

2011-03-21 Thread Keith Whitwell

On Mon, 2011-03-21 at 16:23 +0100, Christoph Bumiller wrote:
> On 03/21/2011 02:12 AM, Marek Olšák wrote:
> 
> > diff --git a/src/gallium/include/pipe/p_state.h 
> > b/src/gallium/include/pipe/p_state.h
> > index cf6c5b5..f6ad456 100644
> > --- a/src/gallium/include/pipe/p_state.h
> > +++ b/src/gallium/include/pipe/p_state.h
> > @@ -81,6 +81,8 @@ struct pipe_rasterizer_state
> >  {
> > unsigned flatshade:1;
> > unsigned light_twoside:1;
> > +   unsigned clamp_vertex_color:1;
> > +   unsigned clamp_fragment_color:1;
> > unsigned front_ccw:1;
> > unsigned cull_face:2;  /**< PIPE_FACE_x */
> > unsigned fill_front:2; /**< PIPE_POLYGON_MODE_x */
> 
> Hadn't you put clamp_fragment_color in the blend state initially ?
> It seems like a more logical place to me.

Indeed you're right.  Fragment color clamping takes place in the part of
the pipeline governed by the blend CSO.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/6] gallium: implement clamping controls (ARB_color_buffer_float)

2011-03-21 Thread Keith Whitwell

On Mon, 2011-03-21 at 02:12 +0100, Marek Olšák wrote:
> diff --git a/src/gallium/include/pipe/p_state.h
> b/src/gallium/include/pipe/p_state.h
> index cf6c5b5..f6ad456 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -81,6 +81,8 @@ struct pipe_rasterizer_state
>  {
> unsigned flatshade:1;
> unsigned light_twoside:1;
> +   unsigned clamp_vertex_color:1;
> +   unsigned clamp_fragment_color:1;
> unsigned front_ccw:1;
> unsigned cull_face:2;  /**< PIPE_FACE_x */
> unsigned fill_front:2; /**< PIPE_POLYGON_MODE_x */ 

Don't know if this affects the overall packing of the struct.  Have you
been able to check?

Otherwise the interface changes look good to me.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/2] mesa/gallium: add NV_texture_barrier

2011-03-21 Thread Keith Whitwell

On Sat, 2011-03-12 at 00:48 +0100, Marek Olšák wrote:
> On Fri, Mar 11, 2011 at 2:56 PM, Keith Whitwell  wrote:

> >
> > So my suggestion would be to name this something else, perhaps taking
> > language from the NV extension.
> >
> 
> Alright.
> 
> There are two patches attached in this email. The former is my attempt at
> display list support that I missed. The latter changes the gallium entry
> point to:
> 
> void (*texture_barrier)(struct pipe_context *);
> 
> Please review.
> 
> Best regards
> Marek 

Looks great Marek.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Gallium interface little cleanup

2011-03-11 Thread Keith Whitwell


> 
> I have done some of the changes in the gallium interface we discussed
> in the
> thread called "7 questions...".
> 
> There are 4 patches in total:
> 
> 1) gallium: kill is_resource_referenced
> 
> The function is_resource_referenced is removed. Considering that only
> st/xorg used it, I don't think this can cause any regressions in
> hardware
> drivers. However softpipe and llvmpipe use it internally, so I have
> kept it
> there and added driver-specific flags instead:
> 
> #define LP_UNREFERENCED 0
> #define LP_REFERENCED_FOR_READ (1 << 0)
> #define LP_REFERENCED_FOR_WRITE (1 << 1)
> 
> The same for softpipe (SP_*).
> 
> 
> 2) gallium: cleanup fence_signalled and fence_finish
> 
> This removes the "flags" parameter from both the functions and changes
> the
> return type to boolean (TRUE=success).
> 
> 
> 3) gallium: remove the geom_flags param from is_format_supported
> 
> This was unused anyway.
> 
> 
> 4) gallium: remove flags from the flush function
> 
> The drivers have been changed so that they behave as if all of the
> flags
> were set (if used at all). This is already implicit in most hardware
> drivers
> and required for multiple contexts anyway (besides maybe SWAPBUFFERS
> and
> FRAME, the exact meaning of which is undefined). Some state trackers
> were
> also abusing the PIPE_FLUSH_RENDER_CACHE flag to decide whether
> flush_frontbuffer should be called. New flag ST_FLUSH_FRONT has been
> added
> to st_api.h as a replacement, since the PIPE_FLUSH_* flags no longer
> exist.
> 
> 
> The patches are here (they are too large to be posted on ML):
>   http://cgit.freedesktop.org/~mareko/mesa/log/?h=gallium-cleanup
> 
> Please review.


It all looks & sounds good to me.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/2] mesa/gallium: add NV_texture_barrier

2011-03-11 Thread Keith Whitwell

On Fri, 2011-03-11 at 06:05 +0100, Marek Olšák wrote:
> Hi,
> 
> these 2 patches add GL_NV_texture_barrier to Mesa and Gallium,
> respectively. The extension can be used for programmable
> blending, where the same texture can be bound as both a sampler
> and renderbuffer. The same feature exists in Direct10 and
> the entry point is:
> 
> VOID APIENTRY ResourceReadAfterWriteHazard(
>   __in  D3D10DDI_HDEVICE hDevice,
>   __in  D3D10DDI_HRESOURCE hResource
> )
> 
> I have chosen the same name for Gallium:
> 
> void (*resource_read_after_write_hazard)(struct pipe_context *,
>  struct pipe_resource *)
> 
> The function is documented in the second patch.
> There is a new piglit test too, called blending-in-shader.
> I only have working r300g support, but I may add softpipe
> if needed.

I support this goal, but think you've probably chosen the wrong name for
the function.

The call you're introducing is a method for the application to cooperate
with the driver to get meaningful results when a single resource is
bound for read & write at the same time.

In DX10-land, ResourceReadAfterWriteHazard() is something generated
internally by the runtime when a resource which was previously bound to
write is being rebound to read from, ie. something like:

   if (ctx->is_resource_referenced(ctx, resource))
   ctx->flush(ctx);

From their docs: "The ResourceReadAfterWriteHazard function informs the
user-mode display driver that the specified resource was used as an
output from the graphics processing unit (GPU) and that the resource
will be used as an input to the GPU." 

And: "The Microsoft Direct3D runtime calls ResourceReadAfterWriteHazard
immediately before the specified resource is bound as an input to the
GPU."

Binding a single subresource for reading and writing at the same time is
(to my knowledge) not permitted in DX10.

So my suggestion would be to name this something else, perhaps taking
language from the NV extension.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/3] Removing unnecessary flushes in Gallium

2011-03-10 Thread Keith Whitwell

On Thu, 2011-03-10 at 20:25 +0100, Marek Olšák wrote:
> Hi,
> 
> I have reviewed where we call flush() and why and some of them
> seem unnecessary to me. Those flushes may slightly decrease
> performance, depending on each driver, and may hide driver bugs.
> 
> glFlush doesn't have to be called in OpenGL so often, and
> I think state trackers should follow suit.
> 
> The worst example of this is st/vega. I guess those flushes are
> there for debugging only.
> 
> Please review.

This all makes sense to me Marek.  I think some of those flushes predate
even is_resource_referenced() which explains their defensive nature.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] 7 questions and proposals about changes in the Gallium interface

2011-03-07 Thread Keith Whitwell

On Mon, 2011-03-07 at 18:52 +0100, Marek Olšák wrote:
> > 6) Pixel buffer objects
> > >
> > > It woud be nice to have hardware-accelerated PBO copy in Gallium.
> > > Would
> > > resource_copy_region be a good candidate for this, where one of
> the
> > > arguments would be PIPE_BUFFER and the other one would be
> > > PIPE_TEXTURE_*, or
> > > am I missing something?
> >
> > Do you have a more concrete proposal?
> >
> > For me, it's always been a bit difficult to pin down what a PBO
> really
> > should map to either in hardware or gallium's abstraction of
> hardware.
> > Sometimes it's a long lived entity (eg. pipe_buffer), other times
> it's
> > just a transient object used for uploads - dma memory from a pool or
> a
> > pipe_transfer in gallium.
> >
> > But if there's a sensible addition to gallium that improves the
> > situation, I'm all for it.
> >
> 
> A PBO to me is just a way to copy data between a buffer and a texture.
> Two
> cases may arise:
> - The copy can be implemented using util_copy_rect (basically it's a
> memcpy
> with strides), in this case, resource_copy_region could be used to let
> drivers do the copy in hardware.
> - The copy cannot be implemented using util_copy_rect, in this case,
> the
> state tracker would use the current software implementation of PBOs.
> 
> Again, another two cases may arise inside resource_copy_region:
> - If the strides are well aligned, the copy can done in hardware.
> (what is
> "well aligned" is hardware-specific)
> - If they are not, a util function similar to
> util_resource_copy_region
> could be used as a fallback.
> 
> But then resource_copy_region needs to know the stride for the buffer
> being
> copied. New entrypoint maybe? 

I don't have a problem with that if it's helpful.  Maybe code something
up to get a better idea of what works & what doesn't and post a patch?
Unless anyone else has objections?

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC][PATCH] Add usage for resources that have a short lifes cycle

2011-03-07 Thread Keith Whitwell

Isn't this PIPE_USAGE_STREAM ?

Keith

On Sat, 2011-03-05 at 17:54 +0100, Jakob Bornecrantz wrote:
> Hi all
> 
> Short and simple patch series attached.
> 
> Some drivers can treat one shot resources differently then resources
> that are expected to be used several times. Add a usage flag to allow
> the state tracker to mark such resources.
> 
> The motivation behind this is to identify the glBitmap cache textures
> so that the i915g driver can make them not tiled and upload the texture
> data with pwrites, which is faster then mapping them via the GTT which is
> needed when they tiled.
> 
> Comments please?
> 
> Cheers Jakob.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] 7 questions and proposals about changes in the Gallium interface

2011-03-07 Thread Keith Whitwell

On Mon, 2011-03-07 at 20:47 +1000, Dave Airlie wrote:
> On Mon, Mar 7, 2011 at 3:42 AM, Marek Olšák  wrote:
> > Hi,
> >
> > I have several questions about Gallium. Some of them are about undocumented
> > stuff, others are just little things from the top of my head. Please
> > consider these as things I may do when time allows.
> >
> >
> > 1) Flush flags
> >
> > Which PIPE_FLUSH_* flag is used to flush the command stream? There doesn't
> > seem to be one and we need it for glFlush.
> >
> > What is PIPE_FLUSH_FRAME for? To my knowledge, Gallium doesn't know
> > "frames". What a driver should do when it gets that flag? The same for
> > PIPE_FLUSH_SWAPBUFFERS.
> 
> I thought that was used from the X.org state tracker but I could be wrong.

Hmm, this may be true, but drivers/svga doesn't look at this flag in any
significant way.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] 7 questions and proposals about changes in the Gallium interface

2011-03-07 Thread Keith Whitwell

On Sun, 2011-03-06 at 18:42 +0100, Marek Olšák wrote:
> 
> Hi,
> 
> I have several questions about Gallium. Some of them are about
> undocumented
> stuff, others are just little things from the top of my head. Please
> consider these as things I may do when time allows.
> 
> 
> 1) Flush flags
> 
> Which PIPE_FLUSH_* flag is used to flush the command stream? There
> doesn't
> seem to be one and we need it for glFlush.
> 
> What is PIPE_FLUSH_FRAME for? To my knowledge, Gallium doesn't know
> "frames". What a driver should do when it gets that flag? The same for
> PIPE_FLUSH_SWAPBUFFERS.
> 
> I propose replacing the current flags with:
> - PIPE_FLUSH_COMMANDS // flush the command stream
> - PIPE_FLUSH_FRAMEBUFFER_CACHE // flush the write cache of the
> currently-set
> framebuffer
> - PIPE_FLUSH_TEXTURE_CACHE // invalidate the read cache of the
> currently-set
> textures

I'm not sure if the flags add any value & have been the source of
bugs/confusion in the past.

How about just removing the parameter?

> 
> 2) is_resource_referenced
> 
> Now that the transfer functions are in pipe_context, is this hook
> really
> necessary?

Good question.  I'd like to see those functions go away as they are
round-trips baked into the interface which is a pain if you try and
remote this stuff.

I guess you'd still need to catch the write-after-read case within a
single context and turn that into a flush.

I think others (Jose in particular) should comment, but I suspect that
individual drivers could now do this internally & not need to expose the
interface in gallium.

> 3) fence_signalled and fence_finish
> 
> Both of these functions take a driver-specific "flags" parameter
> (according
> to p_screen.h) and return an integer (probably driver-specific too),
> where
> zero means success. Could we either:
> - specify a valid set of arguments for "flags" and the return values,
> - or remove the "flags" parameters and change the return types to
> boolean?

I'd prefer the latter.

> 
> 4) geom_flags in is_format_supported
> 
> Not only are these unused by any driver, they also are redundant.
> - PIPE_TEXTURE_GEOM_NON_SQUARE was obsoleted by PIPE_TEXTURE_RECT,
> which can
> be directly passed to is_format_supported.
> - PIPE_TEXTURE_GEOM_NON_POWER_OF_TWO was obsoleted by
> PIPE_CAP_NPOT_TEXTURES.
> 
> Could the geom flags be removed?

This is OK with me too.

> 5) Block compression formats naming
> 
> Would anyone object to cleaning up the names of compression formats?
> 
> There are (or will be) these formats: DXTn, RGTCn, LATCn, BPTCx. They
> have
> many things in common:
> - All of them have 4x4 pixel blocks.
> - One block is either 64 bits of 128 bits large.
> - RGTC and LATC are equal except for swizzling.
> - RGTC and LATC are based on DXTn encoding.
> 
> I propose to copy the more consistent D3D11 naming and use the form
> PIPE_FORMAT_encoding_swizzle_type for all of them:
> 
> PIPE_FORMAT_BC1_RGB_UNORM // DXT1 = BC1
> PIPE_FORMAT_BC1_RGB_SRGB
> PIPE_FORMAT_BC1_RGBA_UNORM
> PIPE_FORMAT_BC1_RGBA_SRGB
> PIPE_FORMAT_BC2_RGBA_UNORM // DXT3 = BC2
> PIPE_FORMAT_BC2_RGBA_SRGB
> PIPE_FORMAT_BC3_RGBA_UNORM // DXT5 = BC3
> PIPE_FORMAT_BC3_RGBA_SRGB
> PIPE_FORMAT_BC4_R_UNORM // RGTC1 = BC4
> PIPE_FORMAT_BC4_R_SNORM
> PIPE_FORMAT_BC4_L_UNORM // LATC1 = BC4
> PIPE_FORMAT_BC4_L_SNORM
> PIPE_FORMAT_BC5_RG_UNORM // RGTC2 = D3D/3DC = BC5
> PIPE_FORMAT_BC5_RG_SNORM
> PIPE_FORMAT_BC5_LA_UNORM // LATC2 = GL/3DC = BC5
> PIPE_FORMAT_BC5_LA_SNORM
> PIPE_FORMAT_BC6_RGB_FLOAT // BPTC (BC6H)
> PIPE_FORMAT_BC6_RGB_UFLOAT
> PIPE_FORMAT_BC7_RGBA_UNORM // BPTC
> PIPE_FORMAT_BC7_RGBA_SRGB
> 
> The layout for all of them would be UTIL_FORMAT_LAYOUT_BC.
> 
> UFLOAT is a float without the sign bit. I guess UFLOAT should be used
> for
> R11G11B10_FLOAT and R9G9B9E5_FLOAT too.

Sounds good again, though this is more Jose's domain than mine.

> 6) Pixel buffer objects
> 
> It woud be nice to have hardware-accelerated PBO copy in Gallium.
> Would
> resource_copy_region be a good candidate for this, where one of the
> arguments would be PIPE_BUFFER and the other one would be
> PIPE_TEXTURE_*, or
> am I missing something?

Do you have a more concrete proposal?

For me, it's always been a bit difficult to pin down what a PBO really
should map to either in hardware or gallium's abstraction of hardware.
Sometimes it's a long lived entity (eg. pipe_buffer), other times it's
just a transient object used for uploads - dma memory from a pool or a
pipe_transfer in gallium.

But if there's a sensible addition to gallium that improves the
situation, I'm all for it.


> 7) Stippling and smoothing
> 
> Would anyone be against removing these two from the Gallium interface
> and
> fully implementing them in st/mesa? It's not like any of the radeon
> people
> wants to implement them in the hardware drivers, and the Draw module
> is not
> an option, because we still would like to have hardware-accelerated
> vertex
> shaders even with stippling and smoothing.

I don't think this

Re: [Mesa-dev] Mesa (master): tgsi: Disable SSE2 code generation.

2011-03-04 Thread Keith Whitwell

On Fri, 2011-03-04 at 07:02 -0800, Jose Fonseca wrote:
> Module: Mesa
> Branch: master
> Commit: 6838c9ce74f16c765474c0d2b4ae1469dd4a64d5
> URL:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=6838c9ce74f16c765474c0d2b4ae1469dd4a64d5
> 
> Author: José Fonseca 
> Date:   Fri Mar  4 14:54:24 2011 +
> 
> tgsi: Disable SSE2 code generation.
> 
> It's broken now that tgsi_exec_machine::Inputs/Ouputs are pointers.
> 
> Temporary if anybody still cares about tgsi_sse2.c. Permanent otherwise.

Oh, permanent please...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Project ideas [was Re: Gallium3D and Glide]

2011-03-02 Thread Keith Whitwell

On Tue, 2011-03-01 at 12:32 -0800, Ian Romanick wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 03/01/2011 05:36 AM, Keith Whitwell wrote:
> 
> > Create an automated bug-finder for gallium drivers.  
> > 
> > Step one: create a "split and compare" gallium driver (perhaps based on
> > failover) which runs the same set of commands on two different gallium
> > drivers (eg softpipe and r600g).  At each frame, compare the two images
> > and see if there are differences.
> 
> For non-trivial rendering, that had better be an awful fuzzy "compare."

The compare would have to have some fuzziness built into it, agreed.
There's probably a little research right there in finding a good
measure, but at worst you downsample and compute PSNR or SSIM of the two
images and find a threshold by trial and error.

Note that you're only going to bother doing this when you know there's a
bug, or for regression testing.  In both cases it should be trivial to
find an appropriate threshold, and this can be done case-by-case.

The motivation is to isolate bugs which result in gross misrenderings,
like incorrect texturing, malformed objects, black bars across the
screen, and so forth - in other words the threshold can be quite low.

>  All of the 3D rendering specs allow a large amount of variation in
> rasterization rules and precision rules.  

> Have you ever looked at OpenGL's line drawing rules?  Line locations can vary 
> by +/- a pixel in
> any direction.

Yes, in great detail. Hui & I took a close look at it while implementing
llvmpipe's half-plane based line rasterization.

The +/- 1 pixel business is there, agreed, but the preferred
implementation (bresenham/diamond exit rule) is clear.  The diamond-exit
rule is fairly well specified and modern hardware will likely follow it
exactly as it's the requirement for DX10 line rasterization as well.

Hardware isn't quite as diverse as it might have been historically
thanks to DX10 which specifies rasterization precisely and tests it by
image comparison against the reference rasterizer in exhaustive (or at
least exhausting) detail.

Llvmpipe is pixel-exact against nVidia's hardware for triangle, quad and
point rasterization.  In OpenGL line rasterization I believe nVidia have
a bug where they forget that GL's idea of "top" is different to DX's and
hence get the diamond logic slightly wrong.  I suspect if GL tightened
up its spec & had a proper test suite for rasterization they'd notice
that & fix it quickly.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Project ideas [was Re: Gallium3D and Glide]

2011-03-01 Thread Keith Whitwell

On Tue, 2011-03-01 at 13:18 +0100, Blaž Tomažič wrote:
> On Mon, 2011-02-28 at 16:39 -0800, Brian Paul wrote: 
> > On Mon, Feb 28, 2011 at 4:13 PM, Blaž Tomažič  
> > wrote:
> > > Hi Mesa developers,
> > >
> > > I am really interested in Gallium3D and I'm thinking about a project for
> > > my diploma (I think this is the same as bachelor's degree) on a computer
> > > university. I'm thinking about writing a Glide state tracker for
> > > Gallium3D. I know that Glide hasn't been used for a decade, but I think
> > > it would be relatively "easy" to implement an old 3D API and a great way
> > > to learn a part of Gallium3D on the way.
> > >
> > > Some old games (only Glide: Need for Speed 2; Glide and OpenGL: Unreal,
> > > Quake 2) used Glide API for rendering and Gallium3D could therefore
> > > accelerate them on newer hardware and more importantly, render them with
> > > nicer graphics instead of software rendering as is with some glide only
> > > games. I don't know if there were any Linux games, but Gallium3D works
> > > on Windows too if I'm not mistaken.
> > >
> > > So I have a few question for you:
> > >  - How hard and how much work would be involved in implementing such an
> > > API? (Any help on implementing it would be welcomed and appreciated of
> > > course)
> > >  - Are Glide to OpenGL wrappers a better solution because of changing
> > > nature of Gallium3D interface? (Personally I think they are, but I would
> > > like to work directly on Gallium)
> > >  - Do you think this project would fit well in Gallium3D or do you have
> > > any other/better proposals for a project including Gallium3D?
> > 
> > I don't mean to discourage you, but I this probably wouldn't be a very
> > good project.
> > 
> > The Glide API and 3dfx hardware is lacking in a number of areas,
> > particularly shaders.  Gallium was intended for newer hardware so it
> > wouldn't be a good fit for this older technology.
> > 
> > I'm sure there are other projects you could do related to
> > OpenGL/gallium if that's where your interest lies.  Other people on
> > this list might have some ideas.  I could probably come up with a few
> > otherwise.  What do you think?
> > 
> > -Brian
> 
> I know that Glide is an old technology and probably I should test this
> games if they even run on newer OSs. But still, Gallium supports OpenGL
> 1.x functionality and Glide is a subset of it. Fixed-pipe  functionality
> is easily implemented with shaders so I think that it could be
> implemented.
> I agree that some other project using current technology would be much
> better. Even if Glide would be implemented for Gallium it would be of
> almost no use and there would be almost no purpose maintaining it.
> Nevertheless, I thought it would be a great way to learn Gallium and in
> the process contribute something small to the community. And then, when
> I would know more about Gallium, contribute something more useful maybe.
> 
> Then Glide is not a good idea but I don't have any other. I'm open for
> any suggestions you and other list members might have then.

Create an automated bug-finder for gallium drivers.  

Step one: create a "split and compare" gallium driver (perhaps based on
failover) which runs the same set of commands on two different gallium
drivers (eg softpipe and r600g).  At each frame, compare the two images
and see if there are differences.

Step two: as above, but save the command stream each frame so that you
can bisect into it to find the actual draw call which caused the
failure.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: fix computing the lowest address for interleaved attribs

2011-02-23 Thread Keith Whitwell

Looks good Marek.

Keith

On Wed, 2011-02-23 at 07:44 +0100, Marek Olšák wrote:
> From: Wiktor Janas 
> 
> Ptr can be very well NULL, so when there are two arrays, with one having
> offset 0 (and thus NULL Ptr), and the other having a non-zero offset,
> the non-zero value is taken as minimum (because of !low_addr ? start ...).
> On 32-bit systems, this somehow works. On 64-bit systems, it leads to crashes.
> ---
>  src/mesa/state_tracker/st_draw.c |9 ++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_draw.c 
> b/src/mesa/state_tracker/st_draw.c
> index 11ebd06..6530a06 100644
> --- a/src/mesa/state_tracker/st_draw.c
> +++ b/src/mesa/state_tracker/st_draw.c
> @@ -315,10 +315,13 @@ setup_interleaved_attribs(struct gl_context *ctx,
> const GLubyte *low_addr = NULL;
>  
> /* Find the lowest address. */
> -   for (attr = 0; attr < vpv->num_inputs; attr++) {
> -  const GLubyte *start = arrays[vp->index_to_input[attr]]->Ptr;
> +   if(vpv->num_inputs) {
> +  low_addr = arrays[vp->index_to_input[0]]->Ptr;
>  
> -  low_addr = !low_addr ? start : MIN2(low_addr, start);
> +  for (attr = 1; attr < vpv->num_inputs; attr++) {
> + const GLubyte *start = arrays[vp->index_to_input[attr]]->Ptr;
> + low_addr = MIN2(low_addr, start);
> +  }
> }
>  
> for (attr = 0; attr < vpv->num_inputs; attr++) {


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: fix crash when using both user and vbo buffers with the same stride

2011-02-22 Thread Keith Whitwell

Looks good to me.

Keith

On Sun, 2011-02-20 at 18:14 +0100, Marek Olšák wrote:
> If two buffers had the same stride where one buffer is a user one and
> the other is a vbo, it was considered to be one interleaved buffer,
> resulting in incorrect rendering and crashes.
> 
> This patch makes sure that the interleaved buffer is either user or vbo,
> not both.
> ---
>  src/mesa/state_tracker/st_draw.c |8 
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_draw.c 
> b/src/mesa/state_tracker/st_draw.c
> index 5475e87..11ebd06 100644
> --- a/src/mesa/state_tracker/st_draw.c
> +++ b/src/mesa/state_tracker/st_draw.c
> @@ -249,6 +249,7 @@ is_interleaved_arrays(const struct st_vertex_program *vp,
> const struct gl_buffer_object *firstBufObj = NULL;
> GLint firstStride = -1;
> const GLubyte *client_addr = NULL;
> +   GLboolean user_memory;
>  
> for (attr = 0; attr < vpv->num_inputs; attr++) {
>const GLuint mesaAttr = vp->index_to_input[attr];
> @@ -257,6 +258,7 @@ is_interleaved_arrays(const struct st_vertex_program *vp,
>  
>if (firstStride < 0) {
>   firstStride = stride;
> + user_memory = !bufObj || !bufObj->Name;
>}
>else if (firstStride != stride) {
>   return GL_FALSE;
> @@ -266,6 +268,9 @@ is_interleaved_arrays(const struct st_vertex_program *vp,
>   /* Try to detect if the client-space arrays are
>* "close" to each other.
>*/
> + if (!user_memory) {
> +return GL_FALSE;
> + }
>   if (!client_addr) {
>  client_addr = arrays[mesaAttr]->Ptr;
>   }
> @@ -275,6 +280,9 @@ is_interleaved_arrays(const struct st_vertex_program *vp,
>   }
>}
>else if (!firstBufObj) {
> + if (user_memory) {
> +return GL_FALSE;
> + }
>   firstBufObj = bufObj;
>}
>else if (bufObj != firstBufObj) {


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: fix crash when DrawBuffer->_ColorDrawBuffers[0] is NULL

2011-02-22 Thread Keith Whitwell

Looks good Marek.

Keith

On Sun, 2011-02-20 at 16:52 +0100, Marek Olšák wrote:
> This fixes the game Tiny and Big.
> ---
>  src/mesa/state_tracker/st_cb_clear.c |   16 ++--
>  1 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_cb_clear.c 
> b/src/mesa/state_tracker/st_cb_clear.c
> index d81e554..0e0c432 100644
> --- a/src/mesa/state_tracker/st_cb_clear.c
> +++ b/src/mesa/state_tracker/st_cb_clear.c
> @@ -300,9 +300,11 @@ clear_with_quad(struct gl_context *ctx,
> cso_set_fragment_shader_handle(st->cso_context, st->clear.fs);
> cso_set_vertex_shader_handle(st->cso_context, st->clear.vs);
>  
> -   st_translate_color(ctx->Color.ClearColor,
> -  ctx->DrawBuffer->_ColorDrawBuffers[0]->_BaseFormat,
> -  clearColor);
> +   if (ctx->DrawBuffer->_ColorDrawBuffers[0]) {
> +  st_translate_color(ctx->Color.ClearColor,
> + ctx->DrawBuffer->_ColorDrawBuffers[0]->_BaseFormat,
> + clearColor);
> +   }
>  
> /* draw quad matching scissor rect */
> draw_quad(st, x0, y0, x1, y1, (GLfloat) ctx->Depth.Clear, clearColor);
> @@ -555,9 +557,11 @@ st_Clear(struct gl_context *ctx, GLbitfield mask)
> ctx->DrawBuffer->Visual.stencilBits == 0))
>   clear_buffers |= PIPE_CLEAR_DEPTHSTENCIL;
>  
> -  st_translate_color(ctx->Color.ClearColor,
> - ctx->DrawBuffer->_ColorDrawBuffers[0]->_BaseFormat,
> - clearColor);
> +  if (ctx->DrawBuffer->_ColorDrawBuffers[0]) {
> + st_translate_color(ctx->Color.ClearColor,
> +
> ctx->DrawBuffer->_ColorDrawBuffers[0]->_BaseFormat,
> +clearColor);
> +  }
>  
>st->pipe->clear(st->pipe, clear_buffers, ctx->Color.ClearColor,
>ctx->Depth.Clear, ctx->Stencil.Clear);


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] FreeGLUT performance - pointless X server roundtrip

2011-02-22 Thread Keith Whitwell

I've always wondered why freeGLUT gives such poor numbers for gears and
similar high-framerate demos relative to the original.

It looks like one reason is the code added for Spaceball handling which
tries to initialize the Spaceball device every frame, even if it
previously failed.  The trouble being that this introduces an
unnecessary X server round-trip each frame:

#0  0x768bf870 in _XReply () from /usr/lib/libX11.so.6
#1  0x768a1c84 in XInternAtom () from /usr/lib/libX11.so.6
#2  0x77bb4c1f in fgInitialiseSpaceball () from /usr/lib/libglut.so.3
#3  0x77bb5015 in fgSpaceballSetWindow () from /usr/lib/libglut.so.3
#4  0x77bbb5db in fgSetWindow () from /usr/lib/libglut.so.3
#5  0x77bb5df1 in ?? () from /usr/lib/libglut.so.3
#6  0x77bb99f9 in fgEnumWindows () from /usr/lib/libglut.so.3
#7  0x77bb62d2 in glutMainLoopEvent () from /usr/lib/libglut.so.3
#8  0x77bb6c67 in glutMainLoop () from /usr/lib/libglut.so.3
#9  0x004032e2 in main (argc=1, argv=0x7fffe078)

Basically the spaceball code doesn't remember that it failed last time &
keeps trying to initialize itself.  This code went in at the end of
2009.

I have a patch, but can't seem to get freeglut to build to test it
(autogen.sh fails on demos/Error directory???) so hoping one of the glut
maintainers can look at this.

FWIW, I get:

nvidia, freeglut, gears:   2600fps
nvidia, origglut, gears:   5000fps
llvmpipe, freeglut, gears:  499fps
llvmpipe, origglut, gears: 1300fps

It also shows up in much less framerate-bound apps:

llvmpipe, freeglut, tunnel: 56fps
llvmpipe, origglut, tunnel: 68fps

Patch attached, but consider untested.

Keith
Index: freeglut/freeglut/src/freeglut_spaceball.c
===
--- freeglut/freeglut/src/freeglut_spaceball.c	(revision 878)
+++ freeglut/freeglut/src/freeglut_spaceball.c	(working copy)
@@ -52,12 +52,16 @@
 static SFG_Window *spnav_win;
 #endif
 
-static int sball_initialized;
+static enum { 
+   sb_uninited,
+   sb_inited,
+   sb_not_present
+} sball_initialized = sb_uninited;
 
 
 void fgInitialiseSpaceball(void)
 {
-if(sball_initialized) {
+if (sball_initialized != sb_uninited) {
 return;
 }
 
@@ -70,12 +74,17 @@
 
 w = fgStructure.CurrentWindow->Window.Handle;
 if(spnav_x11_open(fgDisplay.Display, w) == -1) {
-return;
+ fgWarning("fgInitialiseSpaceball failed\n");
+ sball_initialized = sb_not_present;
+ return;
 }
+
+sball_initialized = sb_inited;
+return;
 }
 #endif
 
-sball_initialized = 1;
+sball_initialized = sb_not_present;
 }
 
 void fgSpaceballClose(void)
@@ -87,14 +96,14 @@
 
 int fgHasSpaceball(void)
 {
-if(!sball_initialized) {
+if(sball_initialized == sb_uninited) {
 fgInitialiseSpaceball();
-if(!sball_initialized) {
-fgWarning("fgInitialiseSpaceball failed\n");
-return 0;
-}
 }
 
+if(sball_initialized == sb_not_present) {
+return 0;
+}
+
 #if TARGET_HOST_POSIX_X11
 /* XXX this function should somehow query the driver if there's a device
  * plugged in, as opposed to just checking if there's a driver to talk to.
@@ -107,12 +116,8 @@
 
 int fgSpaceballNumButtons(void)
 {
-if(!sball_initialized) {
-fgInitialiseSpaceball();
-if(!sball_initialized) {
-fgWarning("fgInitialiseSpaceball failed\n");
-return 0;
-}
+if(!fgHasSpaceball()) {
+return 0;
 }
 
 #if TARGET_HOST_POSIX_X11
@@ -124,11 +129,8 @@
 
 void fgSpaceballSetWindow(SFG_Window *window)
 {
-if(!sball_initialized) {
-fgInitialiseSpaceball();
-if(!sball_initialized) {
-return;
-}
+if(!fgHasSpaceball()) {
+return;
 }
 
 #if TARGET_HOST_POSIX_X11
@@ -150,7 +152,7 @@
 fgSpaceballSetWindow(fgStructure.CurrentWindow);
 }
 
-if(!sball_initialized) {
+if(!fgHasSpaceball()) {
 return 0;
 }
 
@@ -161,11 +163,8 @@
 {
 spnav_event sev;
 
-if(!sball_initialized) {
-fgInitialiseSpaceball();
-if(!sball_initialized) {
-return;
-}
+if(!fgHasSpaceball()) {
+   return;
 }
 
 if(spnav_x11_event(xev, &sev)) {
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] configure.ac: Disable intel gallium drivers by default

2011-02-16 Thread Keith Whitwell

I certainly have no objection for i965...  Dave and Jakob probably need
to comment also.

Keith

On Wed, 2011-02-16 at 13:35 -0500, Kristian Høgsberg wrote:
> They're not maintained and gets in the way when loading EGL drivers.
> The doc string even says it's disabled by default.
> ---
> 
> I think it makes sense to disable the intel gallium drivers as they're not
> maintained and not in use anywhere.  If nothing else, they're certainly
> experimental, and as such it makes sense to require an opt-in before
> compiling them.
> 
> Jakob didn't like the wholesale disabling of gallium for intel, but preferred
> a per-state tracker option (to disable gallium egl from loading intel
> gallium drivers, for example).  I just don't think it makes sense to build
> the gallium intel drivers by default if they're not supposed to be
> used anywhere.
> 
> Kristian
> 
>  configure.ac |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index 46d3516..4a3f06e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1705,7 +1705,7 @@ AC_ARG_ENABLE([gallium-i915],
>  [AS_HELP_STRING([--enable-gallium-i915],
>  [build gallium i915 @<:@default=disabled@:>@])],
>  [enable_gallium_i915="$enableval"],
> -[enable_gallium_i915=auto])
> +[enable_gallium_i915=no])
>  if test "x$enable_gallium_i915" = xyes; then
>  GALLIUM_WINSYS_DIRS="$GALLIUM_WINSYS_DIRS i915/sw"
>  GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS i915"
> @@ -1722,7 +1722,7 @@ AC_ARG_ENABLE([gallium-i965],
>  [AS_HELP_STRING([--enable-gallium-i965],
>  [build gallium i965 @<:@default=disabled@:>@])],
>  [enable_gallium_i965="$enableval"],
> -[enable_gallium_i965=auto])
> +[enable_gallium_i965=no])
>  if test "x$enable_gallium_i965" = xyes; then
>  GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS i965"
>  gallium_check_st "i965/drm" "dri-i965" "xorg-i965"


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/6] Mesa/Gallium vertex array state optimizations

2011-02-14 Thread Keith Whitwell

On Sun, 2011-02-13 at 22:04 +0100, Marek Olšák wrote:
> Keith,
> 
> Yes, they will. If vertex buffers are not re-set in st_draw_vbo,
> redefine_user_buffer is called for each user buffer which is set and that
> tells a driver which buffer ranges need to be re-uploaded. This can be found
> in the last hunk of the last patch, specifically:

OK, thanks for clarifying Marek.  I think the patches look great.

> @@ -646,6 +664,26 @@ st_draw_vbo(struct gl_context *ctx,
>  #endif
>}
> 
> +   /* Notify the driver that the content of user buffers may have been
> +* changed. */
> +   if (!new_array && st->num_user_vbs) {
> +  for (i = 0; i < st->num_user_vbs; i++) {
> + if (st->user_vb[i]) {
> +unsigned stride = st->user_vb_stride[i];
> +
> +if (stride) {
> +   pipe->redefine_user_buffer(pipe, st->user_vb[i],
> +  min_index * stride,
> +  (max_index + 1 - min_index) *
> stride);
> +} else {
> +   /* stride == 0 */
> +   pipe->redefine_user_buffer(pipe, st->user_vb[i],
> +  0, st->user_vb[i]->width0);
> +}
> + }
> +  }
> +   }
> +
>setup_index_buffer(ctx, ib, &ibuffer);
>pipe->set_index_buffer(pipe, &ibuffer);
> 
> What remains to implement is using this information in drivers to re-upload
> the buffer ranges marked with redefine_user_buffer. r300g, r600g, some
> nouveau drivers, and anything which uses Draw do not need this information,
> so they are safe. I think the only driver which needs special handling is
> svga, but I don't know that driver so well to be able to do it.

I know svga is getting a bit of attention at the moment, so this might
be something they may want to pick up.

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/6] Mesa/Gallium vertex array state optimizations

2011-02-13 Thread Keith Whitwell

Marek, 


These patches look good, but have you covered the case where the application is 
changing the contents of vertex arrays without rebinding/notifying GL in any 
way? 


eg. an app could do: 


memcpy(varray, foo, ...); 
glDrawArrays(...); 
memcpy(varray, bar, ...); 
glDrawArrays(...); 


with these changes will drivers still notice the difference? 


Keith 

- Original Message -
From: "Marek Olšák"  
To: mesa-dev@lists.freedesktop.org 
Sent: Saturday, February 12, 2011 7:05:27 PM 
Subject: [Mesa-dev] [PATCH 0/6] Mesa/Gallium vertex array state optimizations 

Hi, 

this patch series optimizes vertex array state changes in Mesa/Gallium. The 
problem with the vbo module and st/mesa is that it re-binds vertex arrays every 
draw operation instead of only when they get changed by the application, and 
this series aims to address that issue. 

Some new issues arose during the implemention though: 

1) The VBO module didn't notify the underlying driver when it was changing 
buffer offsets and other vertex array properties. This is fixed in the 1st 
patch. 

2) If we do not re-bind vertex arrays every draw operation, we must assure that 
the state is preserved after operations like glBlitFramebuffer. This is 
resolved in the 3rd patch using cso_cache. 

3) Unfortunately, user buffers must be mutable in order to prevent re-binding 
vertex buffers because we have no way to know how large they are. Instead, a 
new context hook has been added to Gallium called 'redefine_user_buffer', which 
notifies a driver that a subrange of a user buffer should be reuploaded, and 
also redefines its size. 

I've only tested softpipe and r300g and there were no regressions. r600g should 
also work and Christopher told me his Nouveau drivers should be ready for this 
series too. 

Please review. 

Marek Olšák (6): 
vbo: notify a driver that we change buffer offsets, strides, etc. 
vbo: bind arrays only when necessary 
gallium: always save and restore vertex buffers using cso_cache 
gallium: remove pipe_vertex_buffer::max_index 
st/mesa: set vertex arrays state only when necessary 
gallium: notify drivers about possible changes in user buffer contents 

Best regards 
Marek 

___ 
mesa-dev mailing list 
mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource

2011-02-08 Thread Keith Whitwell

On Tue, 2011-02-08 at 22:51 +0100, Marek Olšák wrote:
> 
> void redefine_user_buffer(
> struct pipe_context*,
> struct pipe_resource*,
> unsigned offset,
> unsigned size);
> 
> and new width0 would implicitly be offset+size;
> 

I think this is a great place to start, if you're happy with it too.

Lets see how well it works for you in implementation & make further
adjustments later if necessary.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource

2011-02-08 Thread Keith Whitwell

On Tue, 2011-02-08 at 22:29 +0100, Marek Olšák wrote:
> 
> 
> Keith,
> 
> redefine_user_buffer() would be a good compromise and I believe the
> performance hit wouldn't be so noticable. It would also allow partial
> uploads of constants in a user buffer, which is something we'd like to
> have
> too.
> 
> Currently, st/mesa in st_draw_vbo is doing:
> - N calls to resource_destroy
> - some unnecessary computations
> - N calls to user_buffer_create
> - 1 call to set_vertex_buffers
> - 1 call to set_vertex_elements_state
> 
> If we can replace this by N calls to redefine_user_buffer, then I am
> all for
> it, provided neither _NEW_ARRAY nor _NEW_PROGRAM is dirty of course.
> 
> Can the function look, let's say, like this?
> 
> void redefine_user_buffer(struct pipe_context *, struct pipe_resource
> *,
> const struct pipe_box *);
> 

That looks good, but wouldn't you also want to be able to change the
size of the userbuffer?  I can see you might be able to implicitly grow
the buffer this way (when box->x +  box->width > resource->width0), but
that's fairly obtuse and it doesn't permit shrinking.

So perhaps as above with a "unsigned new_width0" parameter?

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource

2011-02-07 Thread Keith Whitwell

Marek, 

I'm fine with keeping user buffers -- it's only a vague hope they'll fade away 
over time, and I'm comfortable with keeping them as long as their behaviour is 
well understood. 

The really important thing for me is to preserve traceability. That is to say 
it should be possible to observe what happens over the interface and infer 
directly from that when something important happens. In this case, that would 
mean having a way to notice that the contents and/or size of a userbuffer 
changed. 

That could be as simple as a notification call that this has happened, for 
example "redefine_user_buffer()". On your current drivers that call would be a 
noop -- hopefully that's not going to be a noticiable performance hit? Then in 
some tracing or recording module, that call could be used to log the contents 
of the userbuffer to a file, or in some future indirect-rendering equivalent, 
the new contents could be transmitted across the wire, etc. 

This would mean that userbuffers continue to have a known size, etc, and would 
require the state-tracker to issue the redefine call as necessary. You're in a 
better position than me to comment on the performance impact of this. 

If you're saying this isn't workable from a performance POV, then as you 
suggest we'll have to find a way to push the logic for identifying when 
userbuffers changed down into those components (tracing, remoting, etc) which 
care about it. 

Keith 

- Original Message -
From: "Marek Olšák"  
To: "Keith Whitwell"  
Cc: mesa-dev@lists.freedesktop.org 
Sent: Sunday, 6 February, 2011 12:01:01 PM 
Subject: Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource 


Hi Keith, 

1) Recreating user buffers is very expensive, even though it's only the CALLOC 
overhead. Draw-call-heavy apps simply suffer hard there. It's one of the things 
the gallium-varrays-optim branch tries to optimize, i.e. make the user buffer 
content mutable. I can't see another way out. 


2) The map/unmap overhead is partially hidden by the fact that: 
- r300g doesn't unmap buffers when asked to, it defers unmapping until the 
command stream is flushed. This optimization has resulted in about 70% frame 
rate increase in Nexuiz. The overhead there is now mainly when locking and 
unlocking a mutex and doing some checks. 
- r600g keeps all buffers mapped all the time, even textures. The only 
disadvantage is it consumes address space. This is a result of desperation we 
have with draw-call-heavy apps. (Do you remember that I wanted to add 
spinlocks? Frankly, that was another desperate move.) 

But it's not enough. We must prevent any unnecessary calls to 
transfer_map/unmap. If keeping the upload buffer mapped a little longer results 
in 4% perfomance increase, then I want it. I have measured the real increase 
from this in Torcs and it's simply worth it. The problem with inline transfers 
is it's like map/unmap, so it wouldn't improve anything. 


3) Not sure if you noticed, but constants are now set via user buffers as well. 
IIRC, Radeon and Nouveau people welcomed this change. The thing is every driver 
uses a different approach to uploading constants and all it needs is a direct 
pointer to gl_program_parameter_list::ParameterValues to do the best job. 
Previously, drivers stored constants in malloc'd memory, which was basically 
just a temporary copy of ParameterValues. Eliminating that copy was the main 
motivation for using user buffers for constants. r300g copies the constants to 
the command stream directly, whereas r600g uses u_upload_mgr, and I guess other 
drivers do something entirely different. As you can see, we can't get rid of 
user buffers while keeping all drivers on the fast path. But I'd be ok with a 
new set_constant_buffer(data?) function which takes a pointer to constants 
instead of a resource. With that, we could remove the overhead of 
user_buffer_create for constants. The original set_constant_buffer function can 
be reserved for ARB_uniform_buffer_object, but shouldn't ideally be used for 
anything else. 


I fully understand that you want a robust interface. I would totally agree with 
you if I didn't spend months profiling Mesa. I'd like to have the same except 
that I also want it to be performance-oriented. I am afraid it will be very 
hard to have that and the robustness at the same time. I and other driver devs 
really want to compete with proprietary drivers in terms of performance. 


On Tue, Feb 1, 2011 at 6:55 PM, Keith Whitwell < kei...@vmware.com > wrote: 


So the optimization we're really talking about here is saving the 
map/unmap overhead on the upload buffer? 

And if the state tracker could do the uploads without incurring the 
map/unmap overhead, would that be sufficient for you to feel comfortable 
moving this functionality up a level? 


Becaus

Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource

2011-02-01 Thread Keith Whitwell

On Mon, 2011-01-31 at 10:46 -0800, Marek Olšák wrote:
> Hi Keith,
> 
> From my point of view, user buffers are just pointers passed through
> the Gallium interface and are well-defined from what I can see. They
> might be owned by the application (e.g. set via glVertexPointer etc.),
> therefore using the transfer functions on user buffers is invalid per
> se. Moreover, the application may change the content of user buffers
> any time,

Up until now we've always worked as if user buffers were not mutable
either by the application or the driver.  This means that userbuffers
behave very much like normal buffers which have some initial data but no
transfer mechanism.  

One upshot of this is that the driver can safely promote userbuffers to
true buffers in a one-off operation. A second upshot is that userbuffers
which may change will need to be deleted & recreated by the
state-tracker.

This is more expressive than a situation where the driver has to always
assume that userbuffers may have changed between arbitrary draw calls --
if the buffer is being reused, the driver knows it has not changed.

>  meaning that drivers should convert the user buffers to real buffers
> in the draw_vbo function, then draw vertices, and then forget the real
> buffers, keeping the user buffers bound for the next draw operation.
> Drivers should not upload user buffers anywhere else, because the
> application may change the contents between glDraw* calls. If they are
> bound as vertex buffers, we don't need to know their size and
> sometimes we even can't (again, glVertexPointer etc.). Instead, we can
> use pipe_draw_info::min_index and max_index and upload only that
> range. This has proved to be a great optimization and it's how r300g
> and r600g work now.

In what sense is this different to having the state-tracker destroy and
recreate the userbuffer around each draw call?  Is it just the overhead
of the CALLOC call to create the pipe_resource struct?  Otherwise the
behaviour inside draw_vbo() looks identical - the same number of
userbuffers get uploaded.  In both cases the min/max_index values can be
used to minimize the uploaded region.

> In theory, doing user buffer uploads at the state tracker side using
> inline transfers might work and should remove some burden from
> drivers. 

This would be an alternate approach -- the state-tracker could itself
figure out min/max_index, and upload that data into a real hardware
buffer -- basically the same task that the driver is doing in both
examples above.

> In practice, inline transfers may have a very large overhead compared
> to how things work now. In transfer_inline_write, you usually want to
> map the buffer, do a memcpy, and unmap it. The map/unmap overhead can
> be really significant. There are applications that use glDrawElements
> to draw one triangle at a time, and they draw hundreds of triangles
> with user buffers in this way (yes, applications really do this). We
> can't afford doing any more work than is absolutely necessary. When
> you get 1 or more draw_vbo calls per second, everything matters.
> 
> Currently, the radeon drivers have one upload buffer for vertices and
> it stays mapped until the command stream is flushed. When they get a
> user buffer, they do one memcpy and that's all. They don't touch
> winsys unless the upload buffer is full.

So the optimization we're really talking about here is saving the
map/unmap overhead on the upload buffer?

And if the state tracker could do the uploads without incurring the
map/unmap overhead, would that be sufficient for you to feel comfortable
moving this functionality up a level?

> 
> 
> And user-buffers tend not to stay user-buffers - they can be promoted
> to
> regular buffers behind the scenes by the driver.  Would that be
> reflected in this interface somehow?
> 
> I don't think it's needed. The pipe_resource fields can stay immutable
> and drivers can internally replace vertex buffers with their private
> pipe_resources. The state trackers don't need to know about it.
> 
> 
> 
> From the point of view of recording, replaying, debugging, remoting,
> etc. at the gallium boundary, it's preferable if all actions are
> interposable - ie. all actions are mediated by a function call of some
> sort into the gallium interface.  Giving a component a direct memory
> access into buffer contents would tend to defeat that and make
> record/replay of that action difficult.
> 
> Indeed, record/replay would be difficult but not impossbie. FWIW I
> think the interface shouldn't be specifically designed for
> record/replay. Instead, record/replay should be made work with
> whatever interface there is.

Well, yes, but there are some really powerful things you can do with an
interface like gallium if it is possible to fully interpose at this
level.  I'm sure you can come up with examples around remote rendering,
debugging, etc - it's important enough to want to preserve the ability
to interpose and serialize.

> Is it possible

Re: [Mesa-dev] Gallium proposal: add a user pointer in pipe_resource

2011-01-31 Thread Keith Whitwell

On Sat, 2011-01-29 at 15:12 -0800, Marek Olšák wrote:
> 
> 
> Hi,
> 
> I am proposing to add a pointer to a user buffer in pipe_resource.
> There are two reasons for this:
> 
> 1) I would like to have a way to query outside of a driver whether a
> buffer is a user buffer. Simply comparing the pointer with NULL would
> do the trick.
> 
> 2) I would like to efficiently obtain a pointer to a user buffer
> outside of a driver without going through the sequence of functions
> get_transfer, transfer_map, transfer_unmap, and transfer_destroy.
> 
> This will allow to move more driver-specific code to auxiliary/util.
> 
> 

Marek,

I have to say my preference would have been to see user buffers fade
away in favour of things like inline transfers.  That said you're much
more active than I am in looking at this right now, so I don't want to
get in the way of your progress.

I guess my biggest problem with user buffers is how poorly defined their
semantics are.  For instance, what does it really mean to get write
transfer into a userbuffer?   Will you be updating the original
application-owned memory?

And user-buffers tend not to stay user-buffers - they can be promoted to
regular buffers behind the scenes by the driver.  Would that be
reflected in this interface somehow?

From the point of view of recording, replaying, debugging, remoting,
etc. at the gallium boundary, it's preferable if all actions are
interposable - ie. all actions are mediated by a function call of some
sort into the gallium interface.  Giving a component a direct memory
access into buffer contents would tend to defeat that and make
record/replay of that action difficult.

Is it possible to get a description of what you're doing at a slightly
higher level to try and understand if there's a solution without these
drawbacks?

Keith 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH Resend] mesa: Optionally build a dricore support library.

2010-12-24 Thread Keith Whitwell

On Fri, Dec 24, 2010 at 4:12 AM, Christopher James Halse Rogers
 wrote:
> On Tue, 2010-12-21 at 08:58 +0000, Keith Whitwell wrote:
>> This promotes a private interface to a public one, right?  If so that
>> isn't really doing us any favours as next people will complain when that
>> newly public interface varies between releases.
>
> Not really; the new libraries are private (contained within
> $DRI_INSTALL_DIR, so /usr/lib/dri by default) and unversioned.  This is
> not significantly different to, say, the shared objects in /usr/lib/egl
> which have come and gone without complaint.
>
> This patch does *not* expose any additional interfaces in the public
> libGL, GLES, etc libraries.  Where objects need to be built with default
> visibility, they're built twice; once with -fvisibility=hidden for the
> code destined for the public libraries, once without for the shared,
> private libraries.
>
>>
>> If you want to save disk space by sharing components, what about an
>> alternate approach -- investigate methods for building all the DRI
>> drivers into a single binary?  That would keep the internal interface
>> private & possibly share more space than this approach.
>>
> It would indeed save a bit more space, and also apply more easily to the
> gallium drivers.  It'd require a much larger patch though - we'd need to
> change the libGL←→dri driver interface and patch X to keep up, right?
>
> If that's the direction you'd prefer to go, I could look at doing that.
> I think it'd be substantially more invasive, though, and more difficult
> to make optional.

I don't think my concerns are sufficient to hold this up -- if others
aren't concerned then guess I'm ok with this as an optional mechanism
for environments where version skew is unlikely, such as live cds.

Keith
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] gallium: add fragment shader property for color writes to all buffers.

2010-12-23 Thread Keith Whitwell

Dave,

This all looks good to me (modulo the glitch Tilman pointed out).

Keith

On Thu, 2010-12-23 at 00:43 -0800, Dave Airlie wrote:
> For GL fragColor semantics we need to tell the pipe drivers that the fragment
> shader color result is to be replicated to all bound color buffers, this
> adds the basic TGSI + documentation.
> 
> Signed-off-by: Dave Airlie 
> ---
>  src/gallium/auxiliary/tgsi/tgsi_text.c |3 +++
>  src/gallium/auxiliary/tgsi/tgsi_ureg.c |   16 +++-
>  src/gallium/auxiliary/tgsi/tgsi_ureg.h |4 
>  src/gallium/docs/source/tgsi.rst   |5 +
>  src/gallium/include/pipe/p_shader_tokens.h |3 ++-
>  5 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c 
> b/src/gallium/auxiliary/tgsi/tgsi_text.c
> index 9a38c37..d868b5b 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_text.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
> @@ -1265,6 +1265,7 @@ static const char *property_names[] =
> "GS_MAX_OUTPUT_VERTICES",
> "FS_COORD_ORIGIN",
> "FS_COORD_PIXEL_CENTER"
> +   "FS_COLOR0_WRITE_ALL_CBUFS"
>  };
>  
>  static const char *primitive_names[] =
> @@ -1398,6 +1399,8 @@ static boolean parse_property( struct translate_ctx 
> *ctx )
>   return FALSE;
>}
>break;
> +   case TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS:
> +  break;
> default:
>if (!parse_uint(&ctx->cur, &values[0] )) {
>   report_error( ctx, "Expected unsigned integer as property!" );
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c 
> b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> index 7d13a17..02de12d 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> @@ -148,6 +148,7 @@ struct ureg_program
> unsigned property_gs_max_vertices;
> unsigned char property_fs_coord_origin; /* = TGSI_FS_COORD_ORIGIN_* */
> unsigned char property_fs_coord_pixel_center; /* = 
> TGSI_FS_COORD_PIXEL_CENTER_* */
> +   unsigned char property_fs_color0_writes_all_cbufs; /* = 
> TGSI_FS_COLOR0_WRITES_ALL_CBUFS * */
>  
> unsigned nr_addrs;
> unsigned nr_preds;
> @@ -284,7 +285,12 @@ ureg_property_fs_coord_pixel_center(struct ureg_program 
> *ureg,
> ureg->property_fs_coord_pixel_center = fs_coord_pixel_center;
>  }
>  
> -
> +void
> +ureg_property_fs_color0_writes_all_cbufs(struct ureg_program *ureg,
> +unsigned fs_color0_writes_all_cbufs)
> +{
> +   ureg->property_fs_color0_writes_all_cbufs = fs_color0_writes_all_cbufs;
> +}
>  
>  struct ureg_src
>  ureg_DECL_fs_input_cyl_centroid(struct ureg_program *ureg,
> @@ -1278,6 +1284,14 @@ static void emit_decls( struct ureg_program *ureg )
>  ureg->property_fs_coord_pixel_center);
> }
>  
> +   if (ureg->property_fs_color0_writes_all_cbufs) {
> +  assert(ureg->processor == TGSI_PROCESSOR_FRAGMENT);
> +
> +  emit_property(ureg,
> +TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS,
> +ureg->property_fs_color0_writes_all_cbufs);
> +   }
> +
> if (ureg->processor == TGSI_PROCESSOR_VERTEX) {
>for (i = 0; i < UREG_MAX_INPUT; i++) {
>   if (ureg->vs_inputs[i/32] & (1 << (i%32))) {
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h 
> b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> index acc4632..807128a 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
> @@ -153,6 +153,10 @@ void
>  ureg_property_fs_coord_pixel_center(struct ureg_program *ureg,
>  unsigned fs_coord_pixel_center);
>  
> +void
> +ureg_property_fs_color0_writes_all_cbufs(struct ureg_program *ureg,
> +unsigned fs_color0_writes_all_cbufs);
> +
>  /***
>   * Build shader declarations:
>   */
> diff --git a/src/gallium/docs/source/tgsi.rst 
> b/src/gallium/docs/source/tgsi.rst
> index 7eb6bd0..d986e66 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -1516,6 +1516,11 @@ GL_ARB_fragment_coord_conventions extension.
>  DirectX 9 uses INTEGER.
>  DirectX 10 uses HALF_INTEGER.
>  
> +FS_COLOR0_WRITES_ALL_CBUFS
> +""
> +Specifies that writes to the fragment shader color 0 are replicated to all
> +bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
> +fragData is directed to a single color buffer, but fragColor is broadcast.
>  
> 
>  Texture Sampling and Texture Formats
> diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
> b/src/gallium/include/pipe/p_shader_tokens.h
> index ba433b2..0a9e141 100644
> --- a/src/gallium/include/pipe/p_shader_tokens.h
> +++ b/src/gallium/include/pipe/p_shader_tokens.h
> @@ -177,7 +177,8 @@ union tgsi_immediate_data
>  #define TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES 2
>  #define TGSI_PROPERTY_FS_COORD_ORIGIN3
>  #define TGSI_PROPE

Re: [Mesa-dev] [PATCH 01/12] st/mesa: use DXT SRGB formats for COMPRESSED_SRGB

2010-12-22 Thread Keith Whitwell

Marek,

This series looks good to me.

Keith

On Tue, 2010-12-21 at 19:00 -0800, Marek Olšák wrote:
> And also check if the formats are supported to return something meaningful
> if compression cannot be used.
> ---
>  src/mesa/state_tracker/st_format.c |   20 
>  1 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/state_tracker/st_format.c 
> b/src/mesa/state_tracker/st_format.c
> index 955d821..531fa94 100644
> --- a/src/mesa/state_tracker/st_format.c
> +++ b/src/mesa/state_tracker/st_format.c
> @@ -717,18 +717,30 @@ st_choose_format(struct pipe_screen *screen, GLenum 
> internalFormat,
>  
> case GL_SRGB_EXT:
> case GL_SRGB8_EXT:
> -   case GL_COMPRESSED_SRGB_EXT:
> -   case GL_COMPRESSED_SRGB_ALPHA_EXT:
> case GL_SRGB_ALPHA_EXT:
> case GL_SRGB8_ALPHA8_EXT:
>return default_srgba_format( screen, target, sample_count, bindings,
> geom_flags );
> +
> +   case GL_COMPRESSED_SRGB_EXT:
> case GL_COMPRESSED_SRGB_S3TC_DXT1_EXT:
> -  return PIPE_FORMAT_DXT1_SRGB;
> +  if (screen->is_format_supported(screen, PIPE_FORMAT_DXT1_SRGB, target,
> +  sample_count, bindings, geom_flags))
> + return PIPE_FORMAT_DXT1_SRGB;
> +  return default_srgba_format( screen, target, sample_count, bindings,
> +   geom_flags );
> +
> case GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT:
>return PIPE_FORMAT_DXT1_SRGBA;
> +
> +   case GL_COMPRESSED_SRGB_ALPHA_EXT:
> case GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT3_EXT:
> -  return PIPE_FORMAT_DXT3_SRGBA;
> +  if (screen->is_format_supported(screen, PIPE_FORMAT_DXT3_SRGBA, target,
> +  sample_count, bindings, geom_flags))
> + return PIPE_FORMAT_DXT3_SRGBA;
> +  return default_srgba_format( screen, target, sample_count, bindings,
> +   geom_flags );
> +
> case GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT5_EXT:
>return PIPE_FORMAT_DXT5_SRGBA;
>  


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH Resend] mesa: Optionally build a dricore support library.

2010-12-21 Thread Keith Whitwell

This promotes a private interface to a public one, right?  If so that
isn't really doing us any favours as next people will complain when that
newly public interface varies between releases.

If you want to save disk space by sharing components, what about an
alternate approach -- investigate methods for building all the DRI
drivers into a single binary?  That would keep the internal interface
private & possibly share more space than this approach.

Keith

On Mon, 2010-12-20 at 20:34 -0800, Christopher James Halse Rogers wrote:
> This an adds --enable-shared-dricore option to configure.  When enabled,
> DRI modules will link against a shared copy of the common mesa routines
> rather than statically linking these.
> 
> This saves about 30MB on disc with a full complement of classic DRI
> drivers.
> ---
> 
> Resending as it seems to have been ignored the first time.
> We've applied this in Ubuntu as we are (as always) scrabbling for
> CD space on the LiveCDs, but Fedora had a similar patch in the dim
> distant past.
> 
> This seems to be something that distros generally will be interested
> in.
> 
>  configs/autoconf.in|8 -
>  configs/default|3 ++
>  configs/freebsd-dri|4 ++-
>  configs/linux-dri  |4 ++-
>  configs/linux-dri-xcb  |4 ++-
>  configs/linux-egl  |4 ++-
>  configs/linux-indirect |3 +-
>  configure.ac   |   32 +-
>  src/glsl/Makefile  |   20 ++-
>  src/mesa/Makefile  |   57 +++
>  src/mesa/drivers/dri/Makefile.template |   12 +++
>  src/mesa/drivers/osmesa/Makefile   |2 +-
>  src/mesa/x86/read_rgba_span_x86.S  |8 
>  13 files changed, 136 insertions(+), 25 deletions(-)
> 
> diff --git a/configs/autoconf.in b/configs/autoconf.in
> index e2d70c6..37a137d 100644
> --- a/configs/autoconf.in
> +++ b/configs/autoconf.in
> @@ -33,6 +33,8 @@ LLVM_LDFLAGS = @LLVM_LDFLAGS@
>  LLVM_LIBS = @LLVM_LIBS@
>  GLW_CFLAGS = @GLW_CFLAGS@
>  GLUT_CFLAGS = @GLUT_CFLAGS@
> +DRI_CFLAGS = @DRI_CFLAGS@
> +DRI_CXXFLAGS = @DRI_CXXFLAGS@
> 
>  TALLOC_LIBS = @TALLOC_LIBS@
>  TALLOC_CFLAGS = @TALLOC_CFLAGS@
> @@ -103,7 +105,10 @@ GALLIUM_AUXILIARIES = 
> $(TOP)/src/gallium/auxiliary/libgallium.a
>  GALLIUM_DRIVERS = $(foreach 
> DIR,$(GALLIUM_DRIVERS_DIRS),$(TOP)/src/gallium/drivers/$(DIR)/lib$(DIR).a)
> 
>  # Driver specific build vars
> -DRI_DIRS = @DRI_DIRS@
> +DRI_DIRS = @DRI_DIRS@
> +DRICORE_GLSL_LIBS = @DRICORE_GLSL_LIBS@
> +DRICORE_LIBS = @DRICORE_LIBS@
> +DRICORE_LIB_DEPS = @DRICORE_LIB_DEPS@
>  EGL_PLATFORMS = @EGL_PLATFORMS@
>  EGL_CLIENT_APIS = @EGL_CLIENT_APIS@
> 
> @@ -131,6 +136,7 @@ GLESv2_LIB_DEPS = $(EXTRA_LIB_PATH) @GLESv2_LIB_DEPS@
>  VG_LIB_DEPS = $(EXTRA_LIB_PATH) @VG_LIB_DEPS@
> 
>  # DRI dependencies
> +MESA_MODULES = @MESA_MODULES@
>  DRI_LIB_DEPS = $(EXTRA_LIB_PATH) @DRI_LIB_DEPS@
>  LIBDRM_CFLAGS = @LIBDRM_CFLAGS@
>  LIBDRM_LIB = @LIBDRM_LIBS@
> diff --git a/configs/default b/configs/default
> index 0301345..1feeb97 100644
> --- a/configs/default
> +++ b/configs/default
> @@ -85,6 +85,9 @@ VG_LIB_GLOB = $(VG_LIB_NAME)*
>  TALLOC_LIBS = `pkg-config --libs talloc`
>  TALLOC_CFLAGS = `pkg-config --cflags talloc`
> 
> +DRI_CFLAGS = $(CFLAGS)
> +DRI_CXXFLAGS = $(CXXFLAGS)
> +
>  # Optional assembly language optimization files for libGL
>  MESA_ASM_SOURCES =
> 
> diff --git a/configs/freebsd-dri b/configs/freebsd-dri
> index a4aa82e..23cf58a 100644
> --- a/configs/freebsd-dri
> +++ b/configs/freebsd-dri
> @@ -30,9 +30,11 @@ ASM_SOURCES =
>  MESA_ASM_SOURCES =
> 
>  # Library/program dependencies
> +MESA_MODULES  = $(TOP)/src/mesa/libmesa.a
> +
>  LIBDRM_CFLAGS = `pkg-config --cflags libdrm`
>  LIBDRM_LIB = `pkg-config --libs libdrm`
> -DRI_LIB_DEPS = -L/usr/local/lib -lm -pthread -lexpat $(LIBDRM_LIB)
> +DRI_LIB_DEPS = $(MESA_MODULES) -L/usr/local/lib -lm -pthread -lexpat 
> $(LIBDRM_LIB)
>  GL_LIB_DEPS = -L/usr/local/lib -lX11 -lXext -lXxf86vm -lXdamage -lXfixes \
> -lm -pthread $(LIBDRM_LIB)
> 
> diff --git a/configs/linux-dri b/configs/linux-dri
> index 64fc407..caf0406 100644
> --- a/configs/linux-dri
> +++ b/configs/linux-dri
> @@ -43,9 +43,11 @@ MESA_ASM_SOURCES =
>  # Library/program dependencies
>  EXTRA_LIB_PATH=-L/usr/X11R6/lib
> 
> +MESA_MODULES  = $(TOP)/src/mesa/libmesa.a
> +
>  LIBDRM_CFLAGS = $(shell pkg-config --cflags libdrm)
>  LIBDRM_LIB = $(shell pkg-config --libs libdrm)
> -DRI_LIB_DEPS  = $(EXTRA_LIB_PATH) -lm -lpthread -lexpat -ldl -ltalloc 
> $(LIBDRM_LIB)
> +DRI_LIB_DEPS  = $(MESA_MODULES) $(EXTRA_LIB_PATH) -lm -lpthread -lexpat -ldl 
> -ltalloc $(LIBDRM_LIB)
>  GL_LIB_DEPS   = $(EXTRA_LIB_PATH) -lX11 -lXext -lXxf86vm -lXdamage -lXfixes \
> -lm -lpthread -ldl $(LIBDRM_LIB)
> 
> diff --git a/configs/linux-dri-xcb b/configs/linux-dri-xcb
> index 8092a04..75180

Re: [Mesa-dev] [PATCH] gallium: remove unused 'buf' parameter in pipe_buffer_unmap

2010-12-20 Thread Keith Whitwell

Looks good, Marek.

Keith

On Sun, 2010-12-19 at 04:02 -0800, Marek Olšák wrote:
> ---
>  src/gallium/auxiliary/util/u_index_modify.c  |   12 ++--
>  src/gallium/auxiliary/util/u_inlines.h   |3 +--
>  src/gallium/auxiliary/util/u_upload_mgr.c|4 ++--
>  src/gallium/drivers/nv50/nv50_shader_state.c |2 +-
>  src/gallium/drivers/nv50/nv50_vbo.c  |2 +-
>  src/gallium/drivers/r300/r300_render.c   |   11 +--
>  src/gallium/drivers/r300/r300_render_translate.c |5 ++---
>  src/gallium/drivers/r600/r600_translate.c|5 ++---
>  src/gallium/drivers/svga/svga_draw_arrays.c  |4 ++--
>  src/gallium/drivers/svga/svga_draw_elements.c|8 
>  src/gallium/drivers/svga/svga_state_constants.c  |2 +-
>  src/gallium/drivers/svga/svga_state_vs.c |4 +---
>  src/gallium/drivers/svga/svga_swtnl_backend.c|2 +-
>  src/gallium/drivers/svga/svga_swtnl_draw.c   |9 +++--
>  src/mesa/state_tracker/st_cb_bufferobjects.c |6 +++---
>  src/mesa/state_tracker/st_cb_drawtex.c   |2 +-
>  src/mesa/state_tracker/st_draw_feedback.c|8 +++-
>  17 files changed, 39 insertions(+), 50 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_index_modify.c 
> b/src/gallium/auxiliary/util/u_index_modify.c
> index 65b079e..3822f60 100644
> --- a/src/gallium/auxiliary/util/u_index_modify.c
> +++ b/src/gallium/auxiliary/util/u_index_modify.c
> @@ -52,8 +52,8 @@ void util_shorten_ubyte_elts(struct pipe_context *context,
>  out_map++;
>  }
> 
> -pipe_buffer_unmap(context, *elts, src_transfer);
> -pipe_buffer_unmap(context, new_elts, dst_transfer);
> +pipe_buffer_unmap(context, src_transfer);
> +pipe_buffer_unmap(context, dst_transfer);
> 
>  *elts = new_elts;
>  }
> @@ -86,8 +86,8 @@ void util_rebuild_ushort_elts(struct pipe_context *context,
>  out_map++;
>  }
> 
> -pipe_buffer_unmap(context, *elts, in_transfer);
> -pipe_buffer_unmap(context, new_elts, out_transfer);
> +pipe_buffer_unmap(context, in_transfer);
> +pipe_buffer_unmap(context, out_transfer);
> 
>  *elts = new_elts;
>  }
> @@ -120,8 +120,8 @@ void util_rebuild_uint_elts(struct pipe_context *context,
>  out_map++;
>  }
> 
> -pipe_buffer_unmap(context, *elts, in_transfer);
> -pipe_buffer_unmap(context, new_elts, out_transfer);
> +pipe_buffer_unmap(context, in_transfer);
> +pipe_buffer_unmap(context, out_transfer);
> 
>  *elts = new_elts;
>  }
> diff --git a/src/gallium/auxiliary/util/u_inlines.h 
> b/src/gallium/auxiliary/util/u_inlines.h
> index e55aafe..9184b6a 100644
> --- a/src/gallium/auxiliary/util/u_inlines.h
> +++ b/src/gallium/auxiliary/util/u_inlines.h
> @@ -242,7 +242,6 @@ pipe_buffer_map(struct pipe_context *pipe,
> 
>  static INLINE void
>  pipe_buffer_unmap(struct pipe_context *pipe,
> -  struct pipe_resource *buf,
>struct pipe_transfer *transfer)
>  {
> if (transfer) {
> @@ -341,7 +340,7 @@ pipe_buffer_read(struct pipe_context *pipe,
> if (map)
>memcpy(data, map + offset, size);
> 
> -   pipe_buffer_unmap(pipe, buf, src_transfer);
> +   pipe_buffer_unmap(pipe, src_transfer);
>  }
> 
>  static INLINE struct pipe_transfer *
> diff --git a/src/gallium/auxiliary/util/u_upload_mgr.c 
> b/src/gallium/auxiliary/util/u_upload_mgr.c
> index af229e6..4daa55d 100644
> --- a/src/gallium/auxiliary/util/u_upload_mgr.c
> +++ b/src/gallium/auxiliary/util/u_upload_mgr.c
> @@ -108,7 +108,7 @@ my_buffer_write(struct pipe_context *pipe,
> 
> memcpy(map + offset, data, size);
> pipe_buffer_flush_mapped_range(pipe, transfer, offset, dirty_size);
> -   pipe_buffer_unmap(pipe, buf, transfer);
> +   pipe_buffer_unmap(pipe, transfer);
> 
> return PIPE_OK;
>  }
> @@ -243,7 +243,7 @@ enum pipe_error u_upload_buffer( struct u_upload_mgr 
> *upload,
> 
>  done:
> if (map)
> -  pipe_buffer_unmap( upload->pipe, inbuf, transfer );
> +  pipe_buffer_unmap( upload->pipe, transfer );
> 
> return ret;
>  }
> diff --git a/src/gallium/drivers/nv50/nv50_shader_state.c 
> b/src/gallium/drivers/nv50/nv50_shader_state.c
> index 306aa81..1c1b66d 100644
> --- a/src/gallium/drivers/nv50/nv50_shader_state.c
> +++ b/src/gallium/drivers/nv50/nv50_shader_state.c
> @@ -71,7 +71,7 @@ nv50_transfer_constbuf(struct nv50_context *nv50,
>map += nr;
> }
> 
> -   pipe_buffer_unmap(pipe, buf, transfer);
> +   pipe_buffer_unmap(pipe, transfer);
>  }
> 
>  static void
> diff --git a/src/gallium/drivers/nv50/nv50_vbo.c 
> b/src/gallium/drivers/nv50/nv50_vbo.c
> index d41a59d..53f319a 100644
> --- a/src/gallium/drivers/nv50/nv50_vbo.c
> +++ b/src/gallium/drivers/nv50/nv50_vbo.c
> @@ -284,7 +284,7 @@ nv50_draw_elements_inline(struct pipe_context *pipe,
> nzi = TRUE;
> }
> 
> -   pipe_buffer_unmap(pipe, indexBuffer, transfer);
> +   pipe_buffer_unmap

Re: [Mesa-dev] [Mesa3d-dev] ARB draw buffers + texenv program

2010-12-17 Thread Keith Whitwell

On Fri, 2010-12-17 at 00:49 -0800, Dave Airlie wrote:
> On Tue, Apr 27, 2010 at 7:10 AM, Dave Airlie  wrote:
>  buffers.
> >>> But you'll have more shader instructions for writing to all these
> >>> outputs right? I think that could still make a difference, though it
> >>> might be more theoretical rather than in practice.
> >>>
> >>
> >> Right, I was talking specifically about memory bandwidth. But it would
> >> take a few extra instructions as well.  I think the major impact will
> >> be the additional memory bandwidth rather than the added instructions.
> >>
> >
> >
> > I also thought about this a bit more, and I'm guessing GPUs use this
> > to optimise cache usage. If they know they are rendering the same data
> > to all 1..n render targets they only need to store one copy in the dst
> > cache.
> 
> Okay I'm finally getting back to resurrecting this,
> 
> http://people.freedesktop.org/~airlied/scratch/0001-gallium-r300g-fix-frag-color-writing-attempt-2.patch
> 
> Basically I've added a TGSI property that states whether single writes
> are meant to go to single out or all of them.
> 
> I've still got to fixup softpipe and there's lots of debugging in
> there, but with that r300g passes the fbo-drawbuffers-fragcolor test.

Dave,

I think this is subtle enough that it needs some documentation.  

I trawled the revived thread to figure out the detail of what's here
(having forgotten all about it), and it looks fine but I think this
really needs a couple of good paragraphs in the documentation and a
from-first-principles description in the patch.

Also, WRITE_ALL is a bit ambiguous -- could it be something like
"BROADCAST_COLOR0_WRITES" or maybe "WRITE_ALL_CBUFS"?

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] os: add spinlocks

2010-12-15 Thread Keith Whitwell

On Wed, 2010-12-15 at 09:19 -0800, Kristian Høgsberg wrote:
> On Wed, Dec 15, 2010 at 10:10 AM, Thomas Hellstrom
>  wrote:
> ...
> > Given this, I would advise strongly against building spinlocks into any code
> > that might be run on a uni-processor system.  Particularly gallium utility
> > code.
> > If we want to get rid of unnecessary locking overhead we should probably fix
> > the code up to avoid taking the locks when not strictly needed.
> 
> Another option is to rethink/refactor the code in question to just
> take the locks less.  Use per thread (context) state instead where
> possible and batch updates to global state so you can take the lock
> and do a bunch of stuff.  For example, if you're putting many items
> back on a global free list, just put them back on a local free list
> one by one, and then take the lock and then merge the thread local
> free list into the global list (should be a constant time operation)
> eventually.
> 
> I know it's easier said than done, but if locking is showing up on the
> profile, I think "use less locking" is a better fix than  "use faster
> locking".

I think one thing that's going on here is we've made it too easy to
share objects between threads/contexts, or perhaps too hard to create
context-private objects.

Before making more guesses though I'd be interested to get more
information about what these locks are protecting & what the
circumstances are.

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] os: add spinlocks

2010-12-14 Thread Keith Whitwell

Looks good to me.

Keith

On Tue, 2010-12-14 at 05:15 -0800, Marek Olšák wrote:
> ---
>  src/gallium/auxiliary/os/os_thread.h |   51 
> ++
>  1 files changed, 51 insertions(+), 0 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/os/os_thread.h 
> b/src/gallium/auxiliary/os/os_thread.h
> index a084310..6c25b33 100644
> --- a/src/gallium/auxiliary/os/os_thread.h
> +++ b/src/gallium/auxiliary/os/os_thread.h
> @@ -92,6 +92,23 @@ typedef pthread_mutex_t pipe_mutex;
> (void) pthread_mutex_unlock(&(mutex))
>  
> 
> +/* pipe_spinlock
> + */
> +typedef pthread_spinlock_t pipe_spinlock;
> +
> +#define pipe_spin_init(spinlock) \
> +   (void) pthread_spin_init(&(spinlock), 0)
> +
> +#define pipe_spin_destroy(spinlock) \
> +   (void) pthread_spin_destroy(&(spinlock))
> +
> +#define pipe_spin_lock(spinlock) \
> +   (void) pthread_spin_lock(&(spinlock))
> +
> +#define pipe_spin_unlock(spinlock) \
> +   (void) pthread_spin_unlock(&(spinlock))
> +
> +
>  /* pipe_condvar
>   */
>  typedef pthread_cond_t pipe_condvar;
> @@ -167,6 +184,24 @@ typedef CRITICAL_SECTION pipe_mutex;
>  #define pipe_mutex_unlock(mutex) \
> LeaveCriticalSection(&mutex)
>  
> +
> +/* pipe_spinlock (fake implemention for windows using mutex)
> + */
> +typedef pipe_mutex pipe_spinlock;
> +
> +#define pipe_spin_init(spinlock) \
> +   pipe_mutex_init(spinlock)
> +
> +#define pipe_spin_destroy(spinlock) \
> +   pipe_mutex_destroy(spinlock)
> +
> +#define pipe_spin_lock(spinlock) \
> +   pipe_mutex_lock(spinlock)
> +
> +#define pipe_spin_unlock(spinlock) \
> +   pipe_mutex_unlock(spinlock)
> +
> +
>  /* TODO: Need a macro to declare "I don't care about WinXP compatibilty" */
>  #if 0 && defined (_WIN32_WINNT) && (_WIN32_WINNT >= 0x0600)
>  /* CONDITION_VARIABLE is only available on newer versions of Windows
> @@ -272,6 +307,22 @@ typedef unsigned pipe_mutex;
>  #define pipe_mutex_unlock(mutex) \
> (void) mutex
>  
> +
> +typedef unsigned pipe_spinlock;
> +
> +#define pipe_spin_init(spinlock) \
> +   (void) spinlock
> +
> +#define pipe_spin_destroy(spinlock) \
> +   (void) spinlock
> +
> +#define pipe_spin_lock(spinlock) \
> +   (void) spinlock
> +
> +#define pipe_spin_unlock(spinlock) \
> +   (void) spinlock
> +
> +
>  typedef int64_t pipe_condvar;
>  
>  #define pipe_static_condvar(condvar) \


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Mesa (master): tnl: Initialize gl_program_machine memory in run_vp.

2010-12-13 Thread Keith Whitwell

On Mon, 2010-12-13 at 07:09 -0800, Brian Paul wrote:
> On 12/10/2010 03:27 PM, Vinson Lee wrote:
> > Module: Mesa
> > Branch: master
> > Commit: ef3f7e61b314236cbb7ed2cf24d34c6f90d9cfca
> > URL:
> > http://cgit.freedesktop.org/mesa/mesa/commit/?id=ef3f7e61b314236cbb7ed2cf24d34c6f90d9cfca
> >
> > Author: Vinson Lee
> > Date:   Fri Dec 10 14:24:05 2010 -0800
> >
> > tnl: Initialize gl_program_machine memory in run_vp.
> >
> > Fixes piglit valgrind glsl-array-bounds-04 failure (FDO bug 29946).
> >
> > NOTE:
> > This is a candidate for the 7.10 branch.
> > This is a candidate for the 7.9 branch.
> >
> > ---
> >
> >   src/mesa/tnl/t_vb_program.c |2 +-
> >   1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/src/mesa/tnl/t_vb_program.c b/src/mesa/tnl/t_vb_program.c
> > index 76f8fde..7e7c59a 100644
> > --- a/src/mesa/tnl/t_vb_program.c
> > +++ b/src/mesa/tnl/t_vb_program.c
> > @@ -311,7 +311,7 @@ run_vp( struct gl_context *ctx, struct 
> > tnl_pipeline_stage *stage )
> >  struct vp_stage_data *store = VP_STAGE_DATA(stage);
> >  struct vertex_buffer *VB =&tnl->vb;
> >  struct gl_vertex_program *program = ctx->VertexProgram._Current;
> > -   struct gl_program_machine machine;
> > +   struct gl_program_machine machine = { 0 };
> >  GLuint outputs[VERT_RESULT_MAX], numOutputs;
> >  GLuint i, j;
> 
> I think there's a better fix.  The above will initialize the whole 
> object to zeros for every function call (and be a performance hit).  I 
> think we really only need to do it once to avoid the valgrind warning.
> 
> I've got a new patch that I'll commit.

Also, this idiom of partially initializing structures with {0} seems to
cause gcc to squawk about all the fields which weren't included in the
initializer.

MSVC apparently doesn't complain, and it is a convenient idiom providing
the compiler likes it.

On balance, I like getting those warnings from gcc as they can spot real
bugs elsewhere, so I'd prefer not to have code which causes us to want
to turn the warning off...

Keith

 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] tgsi: fix rbug compile error

2010-12-13 Thread Keith Whitwell

On Sat, 2010-12-11 at 04:22 -0800, Jose Fonseca wrote:
> Looks good to me FWIW.
> 
> Usually one uses a union for avoid breaking strict-aliasing rules, but your 
> memcpy approach should produce just as good code with less typing.
> 
> The only proper fix here would be to make struct tgsi_token an union of all 
> possible token types.

Having such a union would have been useful in other cases too...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Mesa releases in early January?

2010-12-01 Thread Keith Whitwell

On Tue, 2010-11-30 at 18:07 -0800, Roland Scheidegger wrote:
> Am 30.11.2010 21:23, schrieb Ian Romanick:
> > It seems that new development in master has slowed a bit, so how does
> > a 7.10 release on January 7th sound?  If we're going to do that,
> > we'll want to make the 7.10 branch on, say, December 8th.  That's
> > roughly a week from now.
> Don't say development has slowed if I'm just about to merge
> gallium-array-textures which likely will cause havoc :-).
> Though maybe a week is enough to clean up the mess mostly ;-).
> 

Is there any harm in creating the 7.10 branch a little earlier, prior to
Roland's merge?

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [RFC] st/vega: Clean up and OpenVG 1.1

2010-11-30 Thread Keith Whitwell

On Tue, 2010-11-30 at 01:51 -0800, Chia-I Wu wrote:
> Hi list,
> 
> I have spent the weekend adding OpenVG 1.1 support to Vega state
> tracker.  The new features added include mask layer support, text
> support, and a new color transformation stage.  The work can be found
> at
> 
>   http://cgit.freedesktop.org/~olv/mesa/log/?h=vega-1.1
> 
> vega-1.1 branch is based on another clean-up branch
> 
>   http://cgit.freedesktop.org/~olv/mesa/log/?h=vega-polish
> 
> Vega employs a renderer to submit its rendering commands to the pipe
> context.  vega-polish branch mainly introduces "states" to renderer.
> I feel vega-1.1 is more self-explanatory so I will focus more on
> vega-polish.
> 
> The idea of renderer states is that, functions like vgClear or vgMask
> submit rendering commands to the pipe context, but they do not want to
> go through the standard OpenVG pipeline.  Instead of having them
> handle the pipe states save/restore, renderer states allows them to
> switch the renderer to the specific state they need.  For example,
> vgClear can then be implemented by
> 
>   renderer_clear_begin(renderer);
>   renderer_clear(renderer, x, y, w, h, color); // clear a rectangle
>   renderer_clear_end(renderer);
> 
> vgClear does not need to know which pipe states renderer_clear_begin
> sets or which pipe states renderer_clear_end restores.  Nor should it
> care.
> 
> After the addition of renderer states, vega-polish goes on to refactor
> the code in a way that finally the renderer no longer depends on
> OpenVG states and all pipe context state manipulation is done by the
> renderer.  This makes the renderer an abraction of the pipe_context
> for Vega:
> 
>App -> OpenVG -> renderer -> pipe_context
> 
> The benefits of the changes are that code that handles OpenVG
> functions and code that sets pipe_context is no longer mixed.  Since
> the renderer is an opaque object, its implementation can be changed:
> Use a different pipe_context function for an OpenVG function or the
> frequency of pipe state changes may be minimized.

There's a lot here, but it looks like a good cleanup & really sorts out
the flow of control in that module.

Thanks for giving vega some love...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] RFC: gallium-array-textures changes

2010-11-29 Thread Keith Whitwell

On Wed, 2010-11-24 at 18:28 -0800, Roland Scheidegger wrote:
>  From: 
> Roland Scheidegger
> 
>To: 
> mesa-dev@lists.freedesktop.org
> ,
> Keith Whitwell 
>   Subject: 
> RFC: gallium-array-textures changes
>  Date: 
> Wed, 24 Nov 2010 18:28:11 -0800
> (25/11/10 02:28:11)
> 
> 
> Hi,
> 
> gallium currently lacks array textures, and the gallium-array-textures
> branch is trying to fix this (I've attached just the interface changes
> here as the branch got ugly over time - guess will need a squash
> merge).
> 
> In short there's a new array_size field in pipe_resource - note this
> is
> meant to be mutually exclusive with the depth0 field (that is only one
> of them can be larger than one), since 3d arrays don't exist and the
> rest of the interface (surfaces/sampler views/transfers) are
> restricted
> to 3 dimensions in total.
> Also, pipe_subresource is eliminated - this is a concept which maps
> well
> to dx10 but not much else. For example, in OpenGL you can have
> transfers
> (by using TexImage3D for 2d arrays) which cover more than one array
> slice. In that sense, d3d10 distinguishes between array slices and the
> depth slices of a 3d texture, whereas OGL does not. So, at the gallium
> interface level, transfers / copy region etc. are allowed to span
> several array slices (or cube faces which is basically the same), just
> like it is for depth slices (this uses the "layer" term for meaning
> either depth, face, or array member). This also means some of the
> functions which had both face and zslice arguments now instead use a
> single layer argument.
> Aside from these changes, this also finally cleans up the surface
> interface. Creating and destroying surfaces is now handled much the
> same
> as sampler views, i.e. per context functions. And more, previously
> surfaces (due to historical reasons) were used for other things, but
> now
> they are really meant to be used as render (or depth_stencil) targets
> only (much the same as sampler views are used to bind textures (or now
> buffers too at least in the interface though nothing implements this
> yet)). pipe_surface still has some fields in there which should go
> away
> (width/height) but that was too intrusive for now (was hard enough to
> get rid of offset...).
> Surfaces and sampler views now also can span several layers (well for
> cube and 3d textures this was always implicit in the sampler views
> before), which is needed for dx10.
> 
> I suspect though I've broken a fair number of things (nvfx driver,
> python and d3d1x state trackers being the most likely), everything
> compiles for me but someone familiar with these pieces taking a look
> before any merge would be appreciated.
> 
> Any questions?
> 
> Roland
> 


Roland,

This all looks good to me, thanks for keeping on with this change.  I
agree it would be good for interested folks to check this over before it
merges, though I think that there should be a time-limit to that --
let's propose merging this on Wednesday if there are no further
comments.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] u_blitter cpu/gpu stall

2010-11-23 Thread Keith Whitwell

On Mon, 2010-11-22 at 20:06 -0800, Dave Airlie wrote:
> Hi Marek,
> 
> So I was looking at some perf traces from r600g, and I see a stall on
> the blitter quad vbuf, every clear will cause the CPU to block on the
> mapping of the vbuf to upload the new coords. On r300g I can see this
> not mattering as the immediate upload path takes care of things,
> however I think we should probably do something like the attached and
> fire and forget the vertex buffer.
> 
> Dave.

Dave,

I don't have a particular objection to this change in and of itself, but
more broadly I feel it's a bandaid on one particular instance of a usage
pattern which r600g doesn't cope with well.

Ultimately, I think it will be necessary to introduce a pipelined
transfer path for r600g buffers, to avoid stalling on these types of
updates.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Path to optimize (moving from create/bind/delete paradgim to set only ?)

2010-11-17 Thread Keith Whitwell

On Tue, 2010-11-16 at 13:40 -0800, Roland Scheidegger wrote:
> On 16.11.2010 22:15, Jerome Glisse wrote:
> > On Tue, Nov 16, 2010 at 3:27 PM, Roland Scheidegger  
> > wrote:
> >> On 16.11.2010 20:59, Jerome Glisse wrote:
> >>> On Tue, Nov 16, 2010 at 2:38 PM, Roland Scheidegger  
> >>> wrote:
>  On 16.11.2010 20:21, Jerome Glisse wrote:
> > Hi,
> >
> > So i looked a bit more at what path we should try to optimize in the
> > mesa/gallium/pipe infrastructure. Here are some number gathers from
> > games :
> > drawcall / ps constant   vs constant ps samplervs sampler
> > doom31.45 1.39   9.24  
> > 9.86
> > nexuiz 6.27 5.98   6.84 
> >  7.30
> > openarena  2805.64 1.38   1.51  1.54
> >
> > (value of 1 mean there is a call of this function for every draw call,
> > while value of 10 means there is a call to this function every 10 draw
> > call, average)
> >
> > Note that openarena ps constant number is understable as it's fixed GL
> > pipeline which is in use here and the pixel shader constant doesn't
> > need much change in those case.
> >
> > So i think clear trend is that there is a lot of constant upload and
> > sampler changing (allmost at each draw call for some games) Thus i
> > think we want to make sure that we have real fast path for uploading
> > constant or changing sampler. I think those path should be change and
> > should avoid using some of the gallium infrastructure. For shader
> > constant i think best solution is to provide the ptr to program
> > constant buffer directly to the pipe driver and let the driver choose
> > how it wants to upload constant to the GPU (GPU have different
> > capabilities, some can stream constant buffer inside their command
> > stream, other can just keep around a pool of buffer into which they
> > can memcpy, ...) As there is no common denominator i don't think we
> > should go through the pipe buffer allocation and providing a new pipe
> > buffer each time.
> >
> > Optimizing this for r600g allow ~7% increase in games (when draw is
> > nop) ~5% (when not submitting to gpu) ~3% when no part of the driver
> > is commented. r600g have others bottleneck that tends to minimize the
> > gain we can get from such optimization. Patch at
> > http://people.freedesktop.org/~glisse/gallium_const_path/
> >
> > For sampler i don't think we want to create persistant object, we are
> > spending precious time building, hashing, searching for similar
> > sampler each time in the gallium code, i think best would be to think
> > state as use once and forget. That said we can provide helper function
> > to pipe driver that wants to be cache sampler (but even for virtual hw
> > i don't think this makes sense). I haven't yet implemented a fast path
> > for sampler to see how much we can win from that but i will report
> > back once i do.
> >
> > So a more fundamental question here is should we move away from
> > persistant state and consider all states (except shader and texture)
> > as being too much volatile so that caching any of them doesn't make
> > sense from performance point of view. That would mean change lot of
> > create/bind/delete interface to simply set interface for the pipe
> > driver. This could be seen as a simplification. Anyway i think we
> > should really consider moving more toward set than create/bind/delete
> > (i loved a lot the create/bind/delete paradigm but it doesn't seems to
> > be the one you want with GL, at least from number i gather with some
> > games).
>  Why do you think it's faster to create and use a new state rather than
>  search in the hash cache and reuse this? I was under the impression
>  (this being a dx10 paradigm) even hw is quite optimized for this (that
>  is, you just keep all the state objects on the hw somewhere and switch
>  between them). Also, what functions did you really see? If things work
>  as expected, it should be mostly bind, not create/delete.
>  Now it is certainly possible a driver doesn't make good use of this
>  (i.e. it really does all the time consuming stuff on bind), but this is
>  outside the scope of the infrastructure.
>  It is possible hashing is insufficient (could for instance cause too
>  many collisions hence need to recreate state object) but the principle
>  mechanism looks quite sound to me.
> 
>  Roland
> 
> >>> The create/bin & reuse paradgim is likely good for a directx like api
> >>> where api put incentive on application to create  and manage
> >>> efficiently the states it wants to use. But GL, which is i believe the
> >>> API we should focus on, is a completely di

Re: [Mesa-dev] RFC: gallium: Remove redundant sw and debug target helpers

2010-11-16 Thread Keith Whitwell

On Wed, 2010-11-10 at 16:04 -0800, Jakob Bornecrantz wrote:
> Hi all
> 
> We have a bunch of redundant target helpers to wrap screens with debug 
> drivers and for creating the various software drivers. This series removes 
> all but the inline one, I picked it since it gives more flexibility for 
> targets and maybe more importantly is the one that is used in 20 places vs 3 
> for the other one.
> 
> Comments please.
> 
> Cheers Jakob.

This looks goods to me Jakob.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] r600g: Lower the minimum stride from 512 to 256 bytes to fix bug #31578.

2010-11-16 Thread Keith Whitwell

On Mon, Nov 15, 2010 at 9:46 PM, Alex Deucher  wrote:
> On Mon, Nov 15, 2010 at 4:41 PM, Tilman Sauerbeck  
> wrote:
>> piglit/fbo-readpixels still passes for me.
>>
>> Signed-off-by: Tilman Sauerbeck 
>> ---
>>
>> Please review. And someone please tell me where those 512 and 256 bytes
>> are coming from :)
>
> The alignment depends on the type of tiling in use (linear, 1d, 2d).
> See this drm patch for more info:
> http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=commitdiff;h=fba4312e223f1187efc8c083daed70e57fa9c9d3
> The info needed can be queried via the tiling info ioctl.

I found the documentation on this pretty hard to follow, but the
kernel code seems to make sense.

Keith
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] gl_FragCoord / FBOs vs mesa/st

2010-11-15 Thread Keith Whitwell

On Mon, 2010-11-15 at 01:32 -0800, Keith Whitwell wrote:
> On Mon, 2010-11-15 at 01:28 -0800, Keith Whitwell wrote:
> > On Sun, 2010-11-14 at 20:18 -0800, Dave Airlie wrote:
> > > Eric just checked in a test into piglit that tests that the
> > > gl_FragCoord works the right way up for FBOs,
> > > 
> > > Now all the gallium drivers fail this currently and fixing it creates
> > > an ugly linkage between the currently bound buffer and the fragment
> > > shader, since if you swap from an FBO to rendering to the front
> > > buffer, you need recompile the fragment shader to emit a proper wpos
> > > manipulation. Just wondering if anyone sees a nicer way to do this,
> > > than caching frag shaders with some sort of key in the state tracker,
> > > (which is pretty much what 965 has done.).
> > 
> > I guess the other possibility would be to have a couple of constants in
> > the constant buffer which get factored into the fragcood calculation in
> > such a way as to effect a flip based on their value, eg:
> > 
> >fc' = fc * const[0].x + const[0].y
> > 
> > where const[0] is either
> > -> {1, 0} for non-flipped
> > -> {-1, fb_height} for flipped
> 
> Another question is how to tell the pipe driver which of these to use --
> probably we want an explicit flag in one of the state atoms
> (rasterizer?) to select between the two possibilities?

...hmm need more coffee.  Of course you'd do this at the state tracker
level (as you detailed) & the pipe drivers wouldn't need to think about
it...

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] gl_FragCoord / FBOs vs mesa/st

2010-11-15 Thread Keith Whitwell

On Mon, 2010-11-15 at 01:28 -0800, Keith Whitwell wrote:
> On Sun, 2010-11-14 at 20:18 -0800, Dave Airlie wrote:
> > Eric just checked in a test into piglit that tests that the
> > gl_FragCoord works the right way up for FBOs,
> > 
> > Now all the gallium drivers fail this currently and fixing it creates
> > an ugly linkage between the currently bound buffer and the fragment
> > shader, since if you swap from an FBO to rendering to the front
> > buffer, you need recompile the fragment shader to emit a proper wpos
> > manipulation. Just wondering if anyone sees a nicer way to do this,
> > than caching frag shaders with some sort of key in the state tracker,
> > (which is pretty much what 965 has done.).
> 
> I guess the other possibility would be to have a couple of constants in
> the constant buffer which get factored into the fragcood calculation in
> such a way as to effect a flip based on their value, eg:
> 
>fc' = fc * const[0].x + const[0].y
> 
> where const[0] is either
> -> {1, 0} for non-flipped
> -> {-1, fb_height} for flipped

Another question is how to tell the pipe driver which of these to use --
probably we want an explicit flag in one of the state atoms
(rasterizer?) to select between the two possibilities?

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] gl_FragCoord / FBOs vs mesa/st

2010-11-15 Thread Keith Whitwell

On Sun, 2010-11-14 at 20:18 -0800, Dave Airlie wrote:
> Eric just checked in a test into piglit that tests that the
> gl_FragCoord works the right way up for FBOs,
> 
> Now all the gallium drivers fail this currently and fixing it creates
> an ugly linkage between the currently bound buffer and the fragment
> shader, since if you swap from an FBO to rendering to the front
> buffer, you need recompile the fragment shader to emit a proper wpos
> manipulation. Just wondering if anyone sees a nicer way to do this,
> than caching frag shaders with some sort of key in the state tracker,
> (which is pretty much what 965 has done.).

I guess the other possibility would be to have a couple of constants in
the constant buffer which get factored into the fragcood calculation in
such a way as to effect a flip based on their value, eg:

   fc' = fc * const[0].x + const[0].y

where const[0] is either
-> {1, 0} for non-flipped
-> {-1, fb_height} for flipped

Keith


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] r600g/mesa/gallium performance, whois to blame ?

2010-11-15 Thread Keith Whitwell

On Fri, 2010-11-12 at 20:32 -0800, Jerome Glisse wrote:

> 
> I think r600c is just a bit too naive and so it end up being very
> expensive to change any states with it. But i haven't took a closer
> look. I don't think we should look too much at relative cost of
> changing state. I think fglrx optimized the function call cost just
> enough so that it didn't impact performances, while nvidia did go nuts
> and over optimized function call overhead. Thus i think target should
> be more about making sure core mesa + gallium with noop pipe driver
> should be able to keep up at 500t draw call/sec when states change
> occur (of course this could vary depending on which states change) and
> not 173t call/sec.

I think the idea of installing a noop pipe driver & using that to
optimize everything else in the stack is a good one.

It's true that the per-statechange overhead hasn't really had a lot of
attention in the Mesa statetracker, and in particular there is a lot of
work done every draw call to re-assemble all of the vertex buffers and
vertex elements.

A lot of that could be short-circuited with a little effort.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Status update of XvMC on R600

2010-11-12 Thread Keith Whitwell

On Thu, 2010-11-11 at 14:59 -0800, Jerome Glisse wrote:
> 2010/11/11 Keith Whitwell :
> > There is still more to do there.  Currently r600g treats buffer and texture 
> > uploads separately, and I've only attempted to improve texture uploads.  
> > Buffer is just as important however.
> >
> > The change needed is likely to be one of two:
> > a) Allow newly created vertex buffers to be in the GTT domain, where they 
> > can be mapped cached.
> > b) Provide a staging resource upload path (with the staging buffer in GTT 
> > domain).
> >
> > The latter will catch more cases and doesn't suffer from waits for the 
> > engine to go idle when accessing an in-use buffer.  The former is probably 
> > fastest for the cases where it works.
> >
> > Right now staged texture uploads use a 3d blit to copy from the staging 
> > resource to the final destination.  That probably won't work (directly at 
> > least) for buffer uploads as buffer dimensions (eg 64k by 1) mean they 
> > usually can't be bound as render targets.  So we need to jump through some 
> > hoops to get a hardware upload path in the absence of a DMA engine or 
> > 1d-blit.
> >
> > Keith
> 
> I am not sure on how gallium texture upload was ever supposed to be or
> done, but from memory management point of view the idea i had was to
> create all bo in GTT and let migrate them to VRAM once they are use,
> eliminating any need for staging buffer. So it would be allocate bo,
> memcpy to bo the content of the texture, use bo and set it as vram bo
> so kernel migrate it to vram, that way you take advantage of kernel bo
> move which should be faster than any blit helped move.

That works great for normal/static textures that are written at most
once by the CPU and from then on always used by the GPU, and is
basically the (a) path, above.

The purpose of an intermediate/staging/dma-based upload path is to cope
with textures/buffers/etc which receive incremental updates from the CPU
at concurrently with being rendered from by the GPU.  

This is actually pretty common for VBOs, where a lot of applications
come up with schemes for incrementally updating a small number of large
VBOs (I think ETQW did this for instance), but also any application
using TexSubImage, etc, is effectively doing this.

Doing these updates with DMAs means we don't have to wait for buffer
idle before the update, which seems to be the most obvious current
bottleneck in r600g for a lot of apps.

> Anyway this was my initial thinking when doing the code.

It's definitely the most efficient path for static textures, but for
dynamically-updated resources, and for readbacks, having a GPU-mediated
copy seems to be a win.

Keith

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Status update of XvMC on R600

2010-11-11 Thread Keith Whitwell

There is still more to do there.  Currently r600g treats buffer and texture 
uploads separately, and I've only attempted to improve texture uploads.  Buffer 
is just as important however.

The change needed is likely to be one of two:
a) Allow newly created vertex buffers to be in the GTT domain, where they can 
be mapped cached.
b) Provide a staging resource upload path (with the staging buffer in GTT 
domain).

The latter will catch more cases and doesn't suffer from waits for the engine 
to go idle when accessing an in-use buffer.  The former is probably fastest for 
the cases where it works.

Right now staged texture uploads use a 3d blit to copy from the staging 
resource to the final destination.  That probably won't work (directly at 
least) for buffer uploads as buffer dimensions (eg 64k by 1) mean they usually 
can't be bound as render targets.  So we need to jump through some hoops to get 
a hardware upload path in the absence of a DMA engine or 1d-blit.

Keith

From: mesa-dev-bounces+keithw=vmware@lists.freedesktop.org 
[mesa-dev-bounces+keithw=vmware@lists.freedesktop.org] On Behalf Of Alex 
Deucher [alexdeuc...@gmail.com]
Sent: Thursday, November 11, 2010 3:25 PM
To: Christian König
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Status update of XvMC on R600

2010/11/11 Christian König :
> Am Mittwoch, den 10.11.2010, 15:30 -0500 schrieb Younes Manton:
>> In the meantime, I suggest you check if your vertex buffers are in
>> sytem memory (preferably at least WC-ed if not cached); I don't recall
>> spending that much time in gen_block_verts in Nouveau.
>
> Looks like your suspicions about the vertex buffer not being in system
> memory were right. Even if I move every single calculation from
> gen_block_verts into the vertex shader the cpu time spend in this
> function doesn't goes below ~35%.
>
> I also doesn't understand why it's only gen_block_verts and not
> gen_macroblock_verts. Probably because most blocks are only intra
> blocks, but we will see if this changes when I manage to implement
> proper buffer usage modes into r600g.
>

FWIW, Keithw committed a bunch of r600g usage fixes last week.

Alex

> Christian.
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] graw: Export graw_save_surface_to_file().

2010-11-04 Thread Keith Whitwell

Michal - it looks like this will mean that these tests now always try to create 
& populate a "result.bmp" file?  Would it be possible to guard this behaviour 
with some sort of option, eg an environment var?

Keith

From: mesa-dev-bounces+keithw=vmware@lists.freedesktop.org 
[mesa-dev-bounces+keithw=vmware@lists.freedesktop.org] On Behalf Of Michal 
Krol [mic...@vmware.com]
Sent: Thursday, November 04, 2010 4:51 PM
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] [PATCH] graw: Export graw_save_surface_to_file().

>From c5a4c9d6f146077bd59759f985e103d9696cc9b2 Mon Sep 17 00:00:00 2001
From: Michal Krol 
Date: Thu, 4 Nov 2010 17:51:14 +0100
Subject: [PATCH] graw: Export graw_save_surface_to_file().

Allows applications to dump surfaces to file without
referencing gallium/auxiliary entry points statically.
---
 src/gallium/include/state_tracker/graw.h  |4 
 src/gallium/targets/graw-null/graw_util.c |   10 ++
 src/gallium/tests/graw/clear.c|   14 +-
 src/gallium/tests/graw/fs-test.c  |   13 +
 src/gallium/tests/graw/gs-test.c  |   13 +
 src/gallium/tests/graw/quad-tex.c |   13 +
 src/gallium/tests/graw/shader-leak.c  |1 -
 src/gallium/tests/graw/tri-gs.c   |1 -
 src/gallium/tests/graw/tri-instanced.c|   13 +
 src/gallium/tests/graw/tri.c  |   13 +
 src/gallium/tests/graw/vs-test.c  |   13 +
 11 files changed, 21 insertions(+), 87 deletions(-)

diff --git a/src/gallium/include/state_tracker/graw.h 
b/src/gallium/include/state_tracker/graw.h
index 6a99b23..51b8399 100644
--- a/src/gallium/include/state_tracker/graw.h
+++ b/src/gallium/include/state_tracker/graw.h
@@ -71,4 +71,8 @@ PUBLIC void *graw_parse_vertex_shader( struct pipe_context 
*pipe,
 PUBLIC void *graw_parse_fragment_shader( struct pipe_context *pipe,
  const char *text );

+PUBLIC void graw_save_surface_to_file(struct pipe_context *pipe,
+  struct pipe_surface *surface,
+  const char *filename);
+
 #endif
diff --git a/src/gallium/targets/graw-null/graw_util.c 
b/src/gallium/targets/graw-null/graw_util.c
index 531757f..41f65fd 100644
--- a/src/gallium/targets/graw-null/graw_util.c
+++ b/src/gallium/targets/graw-null/graw_util.c
@@ -3,6 +3,7 @@
 #include "pipe/p_context.h"
 #include "pipe/p_state.h"
 #include "tgsi/tgsi_text.h"
+#include "util/u_debug.h"
 #include "util/u_memory.h"
 #include "state_tracker/graw.h"

@@ -51,3 +52,12 @@ graw_parse_fragment_shader(struct pipe_context *pipe,
return pipe->create_fs_state(pipe, &state);
 }

+PUBLIC void
+graw_save_surface_to_file(struct pipe_context *pipe,
+  struct pipe_surface *surface,
+  const char *filename)
+{
+   /* XXX: Make that working in release builds.
+*/
+   debug_dump_surface_bmp(pipe, filename, surface);
+}
diff --git a/src/gallium/tests/graw/clear.c b/src/gallium/tests/graw/clear.c
index ce52a93..2b5cee2 100644
--- a/src/gallium/tests/graw/clear.c
+++ b/src/gallium/tests/graw/clear.c
@@ -8,8 +8,6 @@
 #include "pipe/p_state.h"
 #include "pipe/p_defines.h"

-#include "util/u_debug.h"   /* debug_dump_surface_bmp() */
-
 enum pipe_format formats[] = {
PIPE_FORMAT_R8G8B8A8_UNORM,
PIPE_FORMAT_B8G8R8A8_UNORM,
@@ -31,17 +29,7 @@ static void draw( void )
ctx->clear(ctx, PIPE_CLEAR_COLOR, clear_color, 0, 0);
ctx->flush(ctx, PIPE_FLUSH_RENDER_CACHE, NULL);

-#if 0
-   /* At the moment, libgraw leaks out/makes available some of the
-* symbols from gallium/auxiliary, including these debug helpers.
-* Will eventually want to bless some of these paths, and lock the
-* others down so they aren't accessible from test programs.
-*
-* This currently just happens to work on debug builds - a release
-* build will probably fail to link here:
-*/
-   debug_dump_surface_bmp(ctx, "result.bmp", surf);
-#endif
+   graw_save_surface_to_file(ctx, surf, "result.bmp");

screen->flush_frontbuffer(screen, surf, window);
 }
diff --git a/src/gallium/tests/graw/fs-test.c b/src/gallium/tests/graw/fs-test.c
index 53fbb74..333ecc9 100644
--- a/src/gallium/tests/graw/fs-test.c
+++ b/src/gallium/tests/graw/fs-test.c
@@ -10,7 +10,6 @@
 #include "pipe/p_defines.h"
 #include   /* for fread(), etc */

-#include "util/u_debug.h"   /* debug_dump_surface_bmp() */
 #include "util/u_inlines.h"
 #include "util/u_memory.h"  /* Offset() */
 #include "util/u_draw_quad.h"
@@ -279,17 +278,7 @@ static void draw( void )
util_draw_arrays(ctx, PIPE_PRIM_TRIANGLES, 0, 3);
ctx->flush(ctx, PIPE_FLUSH_RENDER_CACHE, NULL);

-#if 0
-   /* At the moment, libgraw leaks out/makes available some of the
-* symbols from gallium/auxiliary, including these debug helpers.
-* Will ev

Re: [Mesa-dev] [PATCH] r300g: Do not use buf param before checking for NULL.

2010-11-04 Thread Keith Whitwell

Looks good, committed.  Thanks for fixing this.

Keith

On Wed, Nov 3, 2010 at 9:14 PM, Guillermo S. Romero
 wrote:
> Commit 8dfafbf0861fe3d2542332658dd5493851053c78 forgot to update r300g.
> There is a buf == NULL check, but buf is used before for var init.
>
> Tested-by: Guillermo S. Romero 
> ---
>  src/gallium/drivers/r300/r300_state.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/gallium/drivers/r300/r300_state.c 
> b/src/gallium/drivers/r300/r300_state.c
> index f2479a9..f513f87 100644
> --- a/src/gallium/drivers/r300/r300_state.c
> +++ b/src/gallium/drivers/r300/r300_state.c
> @@ -1789,7 +1789,7 @@ static void r300_set_constant_buffer(struct 
> pipe_context *pipe,
>  {
>     struct r300_context* r300 = r300_context(pipe);
>     struct r300_constant_buffer *cbuf;
> -    uint32_t *mapped = r300_buffer(buf)->user_buffer;
> +    uint32_t *mapped;
>
>     switch (shader) {
>         case PIPE_SHADER_VERTEX:
> --
> 1.7.2.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] st/mesa: Reset the constant buffers before destroying the pipe context.

2010-11-03 Thread Keith Whitwell

Looks good Tillman.

Keith

On Tue, Nov 2, 2010 at 9:17 PM, Tilman Sauerbeck  wrote:
> Signed-off-by: Tilman Sauerbeck 
> ---
>
> v2: Also call into the pipe driver to make it release its reference.
>
>  src/mesa/state_tracker/st_context.c |    5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_context.c 
> b/src/mesa/state_tracker/st_context.c
> index b5ea6d0..d0dcdd4 100644
> --- a/src/mesa/state_tracker/st_context.c
> +++ b/src/mesa/state_tracker/st_context.c
> @@ -238,6 +238,11 @@ void st_destroy_context( struct st_context *st )
>
>    pipe->set_index_buffer(pipe, NULL);
>
> +   for (i = 0; i < PIPE_SHADER_TYPES; i++) {
> +      pipe->set_constant_buffer(pipe, PIPE_SHADER_VERTEX, 0, NULL);
> +      pipe_resource_reference(&st->state.constants[PIPE_SHADER_VERTEX], 
> NULL);
> +   }
> +
>    _mesa_delete_program_cache(st->ctx, st->pixel_xfer.cache);
>
>    _vbo_DestroyContext(st->ctx);
> --
> 1.7.3.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/5] r600g: set hardware pixel centers according to gl_rasterization_rules

2010-11-02 Thread Keith Whitwell

On Tue, Nov 2, 2010 at 7:54 PM, Alex Deucher  wrote:
> On Tue, Nov 2, 2010 at 3:40 PM, Keith Whitwell  wrote:
>> These were previously being left in the default (D3D) mode.  This mean
>> that triangles were drawn slightly incorrectly, but also because this
>> state is relied on by the u_blitter code, all blits were half a pixel
>> off.
>
> Looks good.  Evergreen (evergreen_state.c) should be updated similarly.

I've got a few patches for evergreen but haven't set up hardware to
test them yet...  I can probably give it a shot tomorrow though.

Keith
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/5] r600g: set hardware pixel centers according to gl_rasterization_rules

2010-11-02 Thread Keith Whitwell

These were previously being left in the default (D3D) mode.  This mean
that triangles were drawn slightly incorrectly, but also because this
state is relied on by the u_blitter code, all blits were half a pixel
off.
---
 src/gallium/drivers/r600/r600_state.c |5 +
 src/gallium/drivers/r600/r600d.h  |4 
 src/gallium/winsys/r600/drm/r600_hw_context.c |1 +
 src/gallium/winsys/r600/drm/r600d.h   |1 +
 4 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index ccd7421..17e64b1 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -475,6 +475,11 @@ static void *r600_create_rs_state(struct pipe_context *ctx,
r600_pipe_state_add_reg(rstate, R_028A0C_PA_SC_LINE_STIPPLE, 
0x0005, 0x, NULL);
r600_pipe_state_add_reg(rstate, R_028A48_PA_SC_MPASS_PS_CNTL, 
0x, 0x, NULL);
r600_pipe_state_add_reg(rstate, R_028C00_PA_SC_LINE_CNTL, 0x0400, 
0x, NULL);
+
+   r600_pipe_state_add_reg(rstate, R_028C08_PA_SU_VTX_CNTL,
+   
S_028C08_PIX_CENTER_HALF(state->gl_rasterization_rules),
+   0x, NULL);
+
r600_pipe_state_add_reg(rstate, R_028C0C_PA_CL_GB_VERT_CLIP_ADJ, 
0x3F80, 0x, NULL);
r600_pipe_state_add_reg(rstate, R_028C10_PA_CL_GB_VERT_DISC_ADJ, 
0x3F80, 0x, NULL);
r600_pipe_state_add_reg(rstate, R_028C14_PA_CL_GB_HORZ_CLIP_ADJ, 
0x3F80, 0x, NULL);
diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index a3cb5b8..ae19bfb 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -2100,6 +2100,10 @@
 #define   G_028C00_LAST_PIXEL(x)   (((x) >> 10) & 0x1)
 #define   C_028C00_LAST_PIXEL  0xFBFF
 #define R_028C04_PA_SC_AA_CONFIG 0x028C04
+#define R_028C08_PA_SU_VTX_CNTL  0x028C08
+#define   S_028C08_PIX_CENTER_HALF(x)  (((x) & 0x1) << 0)
+#define   G_028C08_PIX_CENTER_HALF(x)  (((x) >> 0) & 0x1)
+#define   C_028C08_PIX_CENTER_HALF 0xFFFE
 #define R_028C1C_PA_SC_AA_SAMPLE_LOCS_MCTX   0x028C1C
 #define R_028C48_PA_SC_AA_MASK   0x028C48
 #define R_028810_PA_CL_CLIP_CNTL 0x028810
diff --git a/src/gallium/winsys/r600/drm/r600_hw_context.c 
b/src/gallium/winsys/r600/drm/r600_hw_context.c
index effb228..c33f81e 100644
--- a/src/gallium/winsys/r600/drm/r600_hw_context.c
+++ b/src/gallium/winsys/r600/drm/r600_hw_context.c
@@ -384,6 +384,7 @@ static const struct r600_reg r600_context_reg_list[] = {
{PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028A0C_PA_SC_LINE_STIPPLE, 0, 0, 0},
{PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028A48_PA_SC_MPASS_PS_CNTL, 0, 0, 0},
{PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028C00_PA_SC_LINE_CNTL, 0, 0, 0},
+   {PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028C08_PA_SU_VTX_CNTL, 0, 0, 0},
{PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028C0C_PA_CL_GB_VERT_CLIP_ADJ, 0, 0, 0},
{PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028C10_PA_CL_GB_VERT_DISC_ADJ, 0, 0, 0},
{PKT3_SET_CONTEXT_REG, R600_CONTEXT_REG_OFFSET, 
R_028C14_PA_CL_GB_HORZ_CLIP_ADJ, 0, 0, 0},
diff --git a/src/gallium/winsys/r600/drm/r600d.h 
b/src/gallium/winsys/r600/drm/r600d.h
index d91f773..5ca7456 100644
--- a/src/gallium/winsys/r600/drm/r600d.h
+++ b/src/gallium/winsys/r600/drm/r600d.h
@@ -795,6 +795,7 @@
 #define R_028A48_PA_SC_MPASS_PS_CNTL 0x028A48
 #define R_028C00_PA_SC_LINE_CNTL 0x028C00
 #define R_028C04_PA_SC_AA_CONFIG 0x028C04
+#define R_028C08_PA_SU_VTX_CNTL  0x028C08
 #define R_028C1C_PA_SC_AA_SAMPLE_LOCS_MCTX   0x028C1C
 #define R_028C48_PA_SC_AA_MASK   0x028C48
 #define R_028810_PA_CL_CLIP_CNTL 0x028810
-- 
1.7.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/5] r600g: remove unused flink, domain fields from r600_resource

2010-11-02 Thread Keith Whitwell

These were being set but not used anywhere.
---
 src/gallium/drivers/r600/r600_buffer.c   |   27 ---
 src/gallium/drivers/r600/r600_resource.h |5 -
 src/gallium/drivers/r600/r600_texture.c  |1 -
 3 files changed, 0 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_buffer.c 
b/src/gallium/drivers/r600/r600_buffer.c
index 3c45d78..ed97b6e 100644
--- a/src/gallium/drivers/r600/r600_buffer.c
+++ b/src/gallium/drivers/r600/r600_buffer.c
@@ -38,32 +38,6 @@
 
 extern struct u_resource_vtbl r600_buffer_vtbl;
 
-u32 r600_domain_from_usage(unsigned usage)
-{
-   u32 domain = RADEON_GEM_DOMAIN_GTT;
-
-   if (usage & PIPE_BIND_RENDER_TARGET) {
-   domain |= RADEON_GEM_DOMAIN_VRAM;
-   }
-   if (usage & PIPE_BIND_DEPTH_STENCIL) {
-   domain |= RADEON_GEM_DOMAIN_VRAM;
-   }
-   if (usage & PIPE_BIND_SAMPLER_VIEW) {
-   domain |= RADEON_GEM_DOMAIN_VRAM;
-   }
-   /* also need BIND_BLIT_SOURCE/DESTINATION ? */
-   if (usage & PIPE_BIND_VERTEX_BUFFER) {
-   domain |= RADEON_GEM_DOMAIN_GTT;
-   }
-   if (usage & PIPE_BIND_INDEX_BUFFER) {
-   domain |= RADEON_GEM_DOMAIN_GTT;
-   }
-   if (usage & PIPE_BIND_CONSTANT_BUFFER) {
-   domain |= RADEON_GEM_DOMAIN_VRAM;
-   }
-
-   return domain;
-}
 
 struct pipe_resource *r600_buffer_create(struct pipe_screen *screen,
 const struct pipe_resource *templ)
@@ -85,7 +59,6 @@ struct pipe_resource *r600_buffer_create(struct pipe_screen 
*screen,
rbuffer->r.base.b.screen = screen;
rbuffer->r.base.vtbl = &r600_buffer_vtbl;
rbuffer->r.size = rbuffer->r.base.b.width0;
-   rbuffer->r.domain = r600_domain_from_usage(rbuffer->r.base.b.bind);
bo = r600_bo((struct radeon*)screen->winsys, rbuffer->r.base.b.width0, 
alignment, rbuffer->r.base.b.bind, rbuffer->r.base.b.usage);
if (bo == NULL) {
FREE(rbuffer);
diff --git a/src/gallium/drivers/r600/r600_resource.h 
b/src/gallium/drivers/r600/r600_resource.h
index d24d5a1..7a2d1f4 100644
--- a/src/gallium/drivers/r600/r600_resource.h
+++ b/src/gallium/drivers/r600/r600_resource.h
@@ -45,8 +45,6 @@ struct r600_transfer {
 struct r600_resource {
struct u_resource   base;
struct r600_bo  *bo;
-   u32 domain;
-   u32 flink;
u32 size;
 };
 
@@ -68,9 +66,6 @@ struct r600_resource_texture {
 
 void r600_init_screen_resource_functions(struct pipe_screen *screen);
 
-/* r600_buffer */
-u32 r600_domain_from_usage(unsigned usage);
-
 /* r600_texture */
 struct pipe_resource *r600_texture_create(struct pipe_screen *screen,
const struct pipe_resource *templ);
diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 8fbe4a0..c92f634 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -284,7 +284,6 @@ r600_texture_create_object(struct pipe_screen *screen,
pipe_reference_init(&resource->base.b.reference, 1);
resource->base.b.screen = screen;
resource->bo = bo;
-   resource->domain = r600_domain_from_usage(resource->base.b.bind);
rtex->pitch_override = pitch_in_bytes_override;
 
if (array_mode)
-- 
1.7.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] r600g: use a buffer in GTT as intermediate on texture up and downloads

2010-11-02 Thread Keith Whitwell

Generalize the existing tiled_buffer path in texture transfers for use
in some non-tiled up and downloads.

Use a staging buffer, which the winsys will restrict to GTT memory.

GTT buffers have the major advantage when they are mapped, they are
cachable, which is a very nice property for downloads, usually the CPU
will want to do look at the data it downloaded.
---
 src/gallium/drivers/r600/r600_resource.h |2 +-
 src/gallium/drivers/r600/r600_texture.c  |   85 ++
 2 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_resource.h 
b/src/gallium/drivers/r600/r600_resource.h
index d152285..d24d5a1 100644
--- a/src/gallium/drivers/r600/r600_resource.h
+++ b/src/gallium/drivers/r600/r600_resource.h
@@ -35,7 +35,7 @@ struct r600_transfer {
/* Buffer transfer. */
struct pipe_transfer*buffer_transfer;
unsignedoffset;
-   struct pipe_resource*linear_texture;
+   struct pipe_resource*staging_texture;
 };
 
 /* This gets further specialized into either buffer or texture
diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 9a52cfa..8fbe4a0 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -40,8 +40,8 @@
 
 extern struct u_resource_vtbl r600_texture_vtbl;
 
-/* Copy from a tiled texture to a detiled one. */
-static void r600_copy_from_tiled_texture(struct pipe_context *ctx, struct 
r600_transfer *rtransfer)
+/* Copy from a full GPU texture to a transfer's staging one. */
+static void r600_copy_to_staging_texture(struct pipe_context *ctx, struct 
r600_transfer *rtransfer)
 {
struct pipe_transfer *transfer = (struct pipe_transfer*)rtransfer;
struct pipe_resource *texture = transfer->resource;
@@ -49,15 +49,15 @@ static void r600_copy_from_tiled_texture(struct 
pipe_context *ctx, struct r600_t
 
subdst.face = 0;
subdst.level = 0;
-   ctx->resource_copy_region(ctx, rtransfer->linear_texture,
+   ctx->resource_copy_region(ctx, rtransfer->staging_texture,
subdst, 0, 0, 0, texture, transfer->sr,
transfer->box.x, transfer->box.y, 
transfer->box.z,
transfer->box.width, transfer->box.height);
 }
 
 
-/* Copy from a detiled texture to a tiled one. */
-static void r600_copy_into_tiled_texture(struct pipe_context *ctx, struct 
r600_transfer *rtransfer)
+/* Copy from a transfer's staging texture to a full GPU one. */
+static void r600_copy_from_staging_texture(struct pipe_context *ctx, struct 
r600_transfer *rtransfer)
 {
struct pipe_transfer *transfer = (struct pipe_transfer*)rtransfer;
struct pipe_resource *texture = transfer->resource;
@@ -67,7 +67,7 @@ static void r600_copy_into_tiled_texture(struct pipe_context 
*ctx, struct r600_t
subsrc.level = 0;
ctx->resource_copy_region(ctx, texture, transfer->sr,
  transfer->box.x, transfer->box.y, 
transfer->box.z,
- rtransfer->linear_texture, subsrc,
+ rtransfer->staging_texture, subsrc,
  0, 0, 0,
  transfer->box.width, transfer->box.height);
 
@@ -435,10 +435,20 @@ int r600_texture_depth_flush(struct pipe_context *ctx,
}
 
 out:
+   /* XXX: only do this if the depth texture has actually changed:
+*/
r600_blit_uncompress_depth_ptr(ctx, rtex);
return 0;
 }
 
+/* Needs adjustment for pixelformat:
+ */
+static INLINE unsigned u_box_volume( const struct pipe_box *box )
+{
+return box->width * box->depth * box->height;
+};
+
+
 struct pipe_transfer* r600_texture_get_transfer(struct pipe_context *ctx,
struct pipe_resource *texture,
struct pipe_subresource sr,
@@ -449,6 +459,35 @@ struct pipe_transfer* r600_texture_get_transfer(struct 
pipe_context *ctx,
struct pipe_resource resource;
struct r600_transfer *trans;
int r;
+   boolean use_staging_texture = FALSE;
+   boolean discard = FALSE;
+
+   if (!(usage & PIPE_TRANSFER_READ) && (usage & PIPE_TRANSFER_DISCARD))
+   discard = TRUE;
+
+   /* We cannot map a tiled texture directly because the data is
+* in a different order, therefore we do detiling using a blit.
+*
+* Also, use a temporary in GTT memory for read transfers, as
+* the CPU is much happier reading out of cached system memory
+* than uncached VRAM.
+*/
+   if (rtex->tiled)
+   use_staging_texture = TRUE;
+
+if (usage & PIPE_TRANSFER_READ &&
+u_box_volume(box) > 1024)
+use_staging_texture = TRUE;
+
+/* XXX

[Mesa-dev] [PATCH 2/5] r600g: propogate resource usage flags to winsys, use to choose bo domains

2010-11-02 Thread Keith Whitwell

This opens the question of what interface the winsys layer should
really have for talking about these concepts.

For now I'm using the existing gallium resource usage concept, but
there is no reason not use terms closer to what the hardware
understands - eg. the domains themselves.
---
 src/gallium/drivers/r600/r600.h   |3 ++-
 src/gallium/drivers/r600/r600_buffer.c|7 ---
 src/gallium/drivers/r600/r600_shader.c|2 +-
 src/gallium/drivers/r600/r600_texture.c   |2 +-
 src/gallium/winsys/r600/drm/r600_bo.c |   24 +---
 src/gallium/winsys/r600/drm/r600_hw_context.c |   13 +
 src/gallium/winsys/r600/drm/r600_priv.h   |1 +
 7 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
index 62d9832..5ec607b 100644
--- a/src/gallium/drivers/r600/r600.h
+++ b/src/gallium/drivers/r600/r600.h
@@ -112,7 +112,8 @@ struct r600_tiling_info *r600_get_tiling_info(struct radeon 
*radeon);
 /* r600_bo.c */
 struct r600_bo;
 struct r600_bo *r600_bo(struct radeon *radeon,
- unsigned size, unsigned alignment, unsigned 
usage);
+unsigned size, unsigned alignment,
+unsigned binding, unsigned usage);
 struct r600_bo *r600_bo_handle(struct radeon *radeon,
   unsigned handle, unsigned *array_mode);
 void *r600_bo_map(struct radeon *radeon, struct r600_bo *bo, unsigned usage, 
void *ctx);
diff --git a/src/gallium/drivers/r600/r600_buffer.c 
b/src/gallium/drivers/r600/r600_buffer.c
index 455aa2e..3c45d78 100644
--- a/src/gallium/drivers/r600/r600_buffer.c
+++ b/src/gallium/drivers/r600/r600_buffer.c
@@ -86,7 +86,7 @@ struct pipe_resource *r600_buffer_create(struct pipe_screen 
*screen,
rbuffer->r.base.vtbl = &r600_buffer_vtbl;
rbuffer->r.size = rbuffer->r.base.b.width0;
rbuffer->r.domain = r600_domain_from_usage(rbuffer->r.base.b.bind);
-   bo = r600_bo((struct radeon*)screen->winsys, rbuffer->r.base.b.width0, 
alignment, rbuffer->r.base.b.bind);
+   bo = r600_bo((struct radeon*)screen->winsys, rbuffer->r.base.b.width0, 
alignment, rbuffer->r.base.b.bind, rbuffer->r.base.b.usage);
if (bo == NULL) {
FREE(rbuffer);
return NULL;
@@ -156,8 +156,9 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*pipe,
r600_bo_reference((struct radeon*)pipe->winsys, 
&rbuffer->r.bo, NULL);
rbuffer->num_ranges = 0;
rbuffer->r.bo = r600_bo((struct 
radeon*)pipe->winsys,
-
rbuffer->r.base.b.width0, 0,
-
rbuffer->r.base.b.bind);
+
rbuffer->r.base.b.width0, 0,
+rbuffer->r.base.b.bind,
+
rbuffer->r.base.b.usage);
break;
}
}
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 4106587..1a0b35d 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -218,7 +218,7 @@ static int r600_pipe_shader(struct pipe_context *ctx, 
struct r600_pipe_shader *s
 
/* copy new shader */
if (shader->bo == NULL) {
-   shader->bo = r600_bo(rctx->radeon, rshader->bc.ndw * 4, 4096, 
0);
+   shader->bo = r600_bo(rctx->radeon, rshader->bc.ndw * 4, 4096, 
0, 0);
if (shader->bo == NULL) {
return -ENOMEM;
}
diff --git a/src/gallium/drivers/r600/r600_texture.c 
b/src/gallium/drivers/r600/r600_texture.c
index 7222b43..9a52cfa 100644
--- a/src/gallium/drivers/r600/r600_texture.c
+++ b/src/gallium/drivers/r600/r600_texture.c
@@ -294,7 +294,7 @@ r600_texture_create_object(struct pipe_screen *screen,
resource->size = rtex->size;
 
if (!resource->bo) {
-   resource->bo = r600_bo(radeon, rtex->size, 4096, 0);
+   resource->bo = r600_bo(radeon, rtex->size, 4096, base->bind, 
base->usage);
if (!resource->bo) {
FREE(rtex);
return NULL;
diff --git a/src/gallium/winsys/r600/drm/r600_bo.c 
b/src/gallium/winsys/r600/drm/r600_bo.c
index 7d54ff1..9b9aec5 100644
--- a/src/gallium/winsys/r600/drm/r600_bo.c
+++ b/src/gallium/winsys/r600/drm/r600_bo.c
@@ -29,23 +29,37 @@
 #include "radeon_drm.h"
 #include "r600_priv.h"
 #include "r600d.h"
+#include "radeon_drm.h"
 
 struct r600_bo *r600_bo(struct radeon *radeon,
- unsigned size, unsigned alignment, unsigned 
usage)
+   uns

1 2 3 >

1 - 100 of 248 matches

Mail list logo