from:"Niels Ole Salscheider"

Re: [Mesa-dev] [PATCH] st/clover: Define __OPENCL_VERSION__ on the device side

2016-09-10 Thread Niels Ole Salscheider

On Wednesday, 31 August 2016, 15:53:05 CEST, Serge Martin wrote:
> On Wednesday 31 August 2016 12:39:23 Vedran Miletić wrote:
> > On 08/28/2016 04:42 PM, Niels Ole Salscheider wrote:
> > > This is required by the OpenCL standard.
> > > 
> > > Signed-off-by: Niels Ole Salscheider 
> > 
> > Reviewed-by: Vedran Miletić 
> > 
> > Good catch. Are we missing more defines from [1]?
> 
> I think __IMAGE_SUPPORT__ and __EMBEDDED_PROFILE__ should be managed by
> Clover too, but neither of them would ever be defined with our current
> feature level, so this is OK.
> 
> I think __ENDIAN_LITTLE__ is missing.
> 
> Anyway, adding some piglit tests would be nice :)

I have posted a patch with a piglit test. Can somebody push this for me?

> Serge
> 
> > Regards,
> > Vedran
> > 
> > [1]
> > https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/preprocessorDirectives.html
> >
> > > ---
> > > 
> > >  src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > index 5490d72..b5e8b52 100644
> > > --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> > > @@ -153,6 +153,9 @@ namespace {
> > >    // Add libclc include
> > >    c.getPreprocessorOpts().Includes.push_back("clc/clc.h");
> > >  
> > > +  // Add definition for the OpenCL version
> > > +  c.getPreprocessorOpts().addMacroDef("__OPENCL_VERSION__=110");
> > > +
> > >    // clc.h requires that this macro be defined:
> > >    c.getPreprocessorOpts().addMacroDef("cl_clang_storage_class_specifiers");
> > >    c.getPreprocessorOpts().addRemappedFile(


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] st/clover: Define __OPENCL_VERSION__ on the device side

2016-08-28 Thread Niels Ole Salscheider

This is required by the OpenCL standard.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index 5490d72..b5e8b52 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -153,6 +153,9 @@ namespace {
   // Add libclc include
   c.getPreprocessorOpts().Includes.push_back("clc/clc.h");
 
+  // Add definition for the OpenCL version
+  c.getPreprocessorOpts().addMacroDef("__OPENCL_VERSION__=110");
+
   // clc.h requires that this macro be defined:
   c.getPreprocessorOpts().addMacroDef("cl_clang_storage_class_specifiers");
   c.getPreprocessorOpts().addRemappedFile(
-- 
2.9.3
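
As a rough illustration of why kernels need this macro: device code typically guards version-dependent features with a preprocessor test on `__OPENCL_VERSION__`. The sketch below shows the same `#if` logic in plain C++ rather than OpenCL C. The `#define` is only a stand-in for the definition the patch injects through `addMacroDef`, and `have_opencl_1_1` is a hypothetical helper name that is not part of clover.

```cpp
#include <cassert>

// Stand-in for the definition the patch adds with
// addMacroDef("__OPENCL_VERSION__=110"). In real kernel code the compiler
// provides this macro; it is defined here only so the sketch is self-contained.
#ifndef __OPENCL_VERSION__
#define __OPENCL_VERSION__ 110
#endif

// Hypothetical helper: reports whether OpenCL 1.1 features may be used.
// Without the macro definition, the #if below would silently evaluate an
// undefined identifier as 0 and take the fallback branch.
constexpr bool have_opencl_1_1() {
#if __OPENCL_VERSION__ >= 110
    return true;
#else
    return false;
#endif
}
```

With the macro set to 110, `have_opencl_1_1()` evaluates to `true`; a kernel written against OpenCL 1.0 only would take the `#else` branch instead.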


[Mesa-dev] [PATCH] radeonsi: Do not suspend timer queries

2013-08-28 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/r600.h|  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 28 ++
 src/gallium/drivers/radeonsi/r600_query.c  |  7 +--
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 +-
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  4 ++--
 src/gallium/drivers/radeonsi/si_state_draw.c   |  2 +-
 6 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h b/src/gallium/drivers/radeonsi/r600.h
index ce0468d..ac3b2f1 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct si_resource *fence,
 			   unsigned offset, unsigned value);
 
 void r600_context_draw_opaque_count(struct r600_context *ctx, struct r600_so_target *t);
+bool si_is_timer_query(unsigned type);
 bool si_query_needs_begin(unsigned type);
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in);
 
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 59b2d70..f050b3b 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -110,6 +110,13 @@ err:
 	return;
 }
 
+bool si_is_timer_query(unsigned type)
+{
+	return type == PIPE_QUERY_TIME_ELAPSED ||
+	       type == PIPE_QUERY_TIMESTAMP ||
+	       type == PIPE_QUERY_TIMESTAMP_DISJOINT;
+}
+
 bool si_query_needs_begin(unsigned type)
 {
 	return type != PIPE_QUERY_TIMESTAMP;
@@ -139,7 +146,7 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
 	}
 
 	/* Count in queries_suspend. */
-	num_dw += ctx->num_cs_dw_queries_suspend;
+	num_dw += ctx->num_cs_dw_nontimer_queries_suspend;
 
 	/* Count in streamout_end at the end of CS. */
 	num_dw += ctx->num_cs_dw_streamout_end;
@@ -211,7 +218,7 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
 		return;
 
 	/* suspend queries */
-	if (ctx->num_cs_dw_queries_suspend) {
+	if (ctx->num_cs_dw_nontimer_queries_suspend) {
 		r600_context_queries_suspend(ctx);
 		queries_suspended = true;
 	}
@@ -506,7 +513,9 @@ void r600_query_begin(struct r600_context *ctx, struct r600_query *query)
 	cs->buf[cs->cdw++] = PKT3(PKT3_NOP, 0, 0);
 	cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, RADEON_USAGE_WRITE);
 
-	ctx->num_cs_dw_queries_suspend += query->num_cs_dw;
+	if (!si_is_timer_query(query->type)) {
+		ctx->num_cs_dw_nontimer_queries_suspend += query->num_cs_dw;
+	}
 }
 
 void r600_query_end(struct r600_context *ctx, struct r600_query *query)
@@ -565,7 +574,10 @@ void r600_query_end(struct r600_context *ctx, struct r600_query *query)
 	cs->buf[cs->cdw++] = r600_context_bo_reloc(ctx, query->buffer, RADEON_USAGE_WRITE);
 
 	query->results_end = (query->results_end + query->result_size) % query->buffer->b.b.width0;
-	ctx->num_cs_dw_queries_suspend -= query->num_cs_dw;
+
+	if (si_query_needs_begin(query->type) && !si_is_timer_query(query->type)) {
+		ctx->num_cs_dw_nontimer_queries_suspend -= query->num_cs_dw;
+	}
 }
 
 void r600_query_predication(struct r600_context *ctx, struct r600_query *query, int operation,
@@ -712,19 +724,19 @@ void r600_context_queries_suspend(struct r600_context *ctx)
 {
 	struct r600_query *query;
 
-	LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
+	LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
 		r600_query_end(ctx, query);
 	}
-	assert(ctx->num_cs_dw_queries_suspend == 0);
+	assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
 }
 
 void r600_context_queries_resume(struct r600_context *ctx)
 {
 	struct r600_query *query;
 
-	assert(ctx->num_cs_dw_queries_suspend == 0);
+	assert(ctx->num_cs_dw_nontimer_queries_suspend == 0);
 
-	LIST_FOR_EACH_ENTRY(query, &ctx->active_query_list, list) {
+	LIST_FOR_EACH_ENTRY(query, &ctx->active_nontimer_query_list, list) {
 		r600_query_begin(ctx, query);
 	}
 }
diff --git a/src/gallium/drivers/radeonsi/r600_query.c b/src/gallium/drivers/radeonsi/r600_query.c
index 927577c..aa51e74 100644
--- a/src/gallium/drivers/radeonsi/r600_query.c
+++ b/src/gallium/drivers/radeonsi/r600_query.c
@@ -50,7 +50,10 @@ static void r600_begin_query(struct pipe_context *ctx, struct pipe_query *query)
 	memset(&rquery->result, 0, sizeof(rquery->result));
 	rquery->results_start = rquery->results_end;
r600_quer

[Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-09-06 Thread Niels Ole Salscheider

The OpenCL spec says:
"Any blocking commands queued in a command-queue and clReleaseCommandQueue
perform an implicit flush of the command-queue. These blocking commands are
[...] or clWaitForEvents."

Flushing the queue unconditionally also helps to actually clear the
queued_events list of the queue object.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/core/event.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/event.cpp b/src/gallium/state_trackers/clover/core/event.cpp
index cbb97bf..8b5acd0 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -153,8 +153,7 @@ void
 hard_event::wait() const {
    pipe_screen *screen = queue()->dev.pipe;
 
-   if (status() == CL_QUEUED)
-      queue()->flush();
+   queue()->flush();
 
    if (!__fence ||
        !screen->fence_finish(screen, __fence, PIPE_TIMEOUT_INFINITE))
-- 
1.8.4


Re: [Mesa-dev] [PATCH 3/6] st/clover: Unreference fences as early as possible

2013-09-06 Thread Niels Ole Salscheider

While unreferencing fences as early as possible is not a bad idea, this patch 
hides the underlying problem. That is, events are never deleted from the 
queued_events list of the queue object if their fences are signalled before 
the queue is flushed.
I will send a patch that fixes the problem shortly.

Ole

[Mesa-dev] [PATCH 1/2] st/clover: Clear the complete queue

2013-09-06 Thread Niels Ole Salscheider

Events that are already signalled can be removed from the queue, too.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/core/queue.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/queue.cpp b/src/gallium/state_trackers/clover/core/queue.cpp
index 0b1c494..500a636 100644
--- a/src/gallium/state_trackers/clover/core/queue.cpp
+++ b/src/gallium/state_trackers/clover/core/queue.cpp
@@ -56,7 +56,7 @@ _cl_command_queue::flush() {
       pipe->flush(pipe, &fence, 0);
       std::for_each(first, last, [&](event_ptr &ev) { ev->fence(fence); });
       screen->fence_reference(screen, &fence, NULL);
-      queued_events.erase(first, last);
+      queued_events.clear();
    }
 }
 
-- 
1.8.4


Re: [Mesa-dev] [PATCH 1/2] st/clover: Clear the complete queue

2013-09-26 Thread Niels Ole Salscheider

Ping

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Niels Ole Salscheider

> I don't think this is right, with this patch we remove *all* events from
> the command queue, signalled or not, every time the command queue is
> flushed.

You are right, I got the logic wrong here (see also 
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044363.html).
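
The difference between the two calls can be sketched with a small stand-alone model. This is not clover code: `Event` is a hypothetical stand-in for the event objects, and the sketch assumes that `first` and `last` delimit the leading run of already signalled events (how they are computed is not shown in the patch).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an event object; only the signalled flag matters.
struct Event {
    bool signalled;
};

// Models queued_events.erase(first, last): only the leading run of signalled
// events is removed. Assumption: last is the first event not yet signalled.
std::size_t flush_erase_prefix(std::vector<Event> &queued_events) {
    auto first = queued_events.begin();
    auto last = std::find_if(queued_events.begin(), queued_events.end(),
                             [](const Event &ev) { return !ev.signalled; });
    queued_events.erase(first, last);
    return queued_events.size();
}

// Models queued_events.clear(): every event is dropped, signalled or not.
std::size_t flush_clear_all(std::vector<Event> &queued_events) {
    queued_events.clear();
    return queued_events.size();
}
```

For a queue holding signalled, signalled, unsignalled, signalled events, the first variant leaves two events behind (including one that is already signalled), while the second leaves none, which is the behaviour objected to above.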

The problem is that I have an application that causes a leak of event objects. 
That is, some events are never deleted from the queue. I will have to debug 
this further, but I am somewhat busy right now since I have just relocated.

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-03 Thread Niels Ole Salscheider

> Do you have any example of a real world application that relies on this?
> Or at least some reasonable use case?

The problem is that the queue is only cleared of already signalled events 
when we flush it. And we might never do this if the user only calls 
clWaitForEvents once the corresponding event has already been signalled.

I am fine with not flushing the queue, but we should at least make sure that 
signalled events are freed early enough.

Re: [Mesa-dev] [PATCH 2/2] st/clover: Always flush the queue when waiting on an hard_event

2013-10-06 Thread Niels Ole Salscheider

On Thursday, 3 October 2013, 11:08:26, Francisco Jerez wrote:
> Niels Ole Salscheider  writes:
> >> Do you have any example of a real world application that relies on this?
> >> Or at least some reasonable use case?
> > 
> > The problem is that the queue is only cleared from already signalled
> > events
> > when we flush it. And we might not do this if the user only calls
> > clWaitForEvents once the corresponding event has already been signalled.
> > 
> > I am fine with not flushing the queue, but we should at least make sure
> > that signalled events are freed early enough.

> So your application doesn't call clFlush() explicitly nor any blocking
> call on that specific event and it stalls forever polling an event with
> clGetEventInfo() that never gets flushed to the GPU?  Is that the
> problem you've seen?  Is it an open source application?

Unfortunately, the application is not open source and I am not allowed to give 
the code to someone else, even though I have access to it.

The application calls clFinish and clWaitForEvents, but not clFlush. I think 
the problem is that the kernels might already have finished execution when the 
application calls these functions. Because of that the queue is not flushed and 
thus not cleared.
However, I cannot reproduce it right now.

Regards,

Ole

Re: [Mesa-dev] [PATCH] clover: Calculate the optimal work group size when local_size is NULL

2013-10-29 Thread Niels Ole Salscheider

Hi Tom,

this has been on my todo list for quite a while.

Your patch looks good to me, but in my experience a block with approximately 
the same size for each dimension gives slightly better performance in many 
cases when compared to one where one dimension is significantly larger.
Maybe you could initialise the size for each dimension to 1 and multiply them 
by 2 in a round-robin fashion as long as feasible.
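
A minimal sketch of this round-robin idea, under stated assumptions: `pick_block_size`, `max_dim` and `max_threads` are hypothetical names, not part of clover or of Tom's patch, and the sketch only doubles a dimension while the block still divides the grid evenly, since OpenCL 1.x requires the global size to be a multiple of the local size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: start at {1, 1, 1} and double one dimension at a time,
// round-robin, for as long as the result is still feasible.
std::array<std::size_t, 3>
pick_block_size(const std::array<std::size_t, 3> &grid,
                const std::array<std::size_t, 3> &max_dim,
                std::size_t max_threads) {
    std::array<std::size_t, 3> block = {1, 1, 1};
    std::size_t total = 1;   // number of work items in the block so far
    bool grew = true;

    while (grew) {
        grew = false;
        for (std::size_t d = 0; d < 3; ++d) {
            const std::size_t next = block[d] * 2;
            // Feasible means: within the device limits and still an exact
            // divisor of the global size in this dimension.
            if (total * 2 <= max_threads && next <= max_dim[d] &&
                grid[d] % next == 0) {
                block[d] = next;
                total *= 2;
                grew = true;
            }
        }
    }
    return block;
}
```

For a 64x64x1 grid with a limit of 256 work items per group this yields a 16x16x1 block rather than 256x1x1, which matches the "approximately the same size for each dimension" shape described above.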

Regards,

Ole

Re: [Mesa-dev] [PATCH] pipe-loader: Fix out of source build

2013-04-04 Thread Niels Ole Salscheider

On Sunday, 24 February 2013, 15:02:33, Matt Turner wrote:
> On Sun, Feb 24, 2013 at 2:00 PM, Niels Ole Salscheider
> 
>  wrote:
> > Signed-off-by: Niels Ole Salscheider 
> > ---
> > 
> >  src/gallium/targets/opencl/Makefile.am | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am
> > index c5c3003..709112f 100644
> > --- a/src/gallium/targets/opencl/Makefile.am
> > +++ b/src/gallium/targets/opencl/Makefile.am
> > @@ -32,11 +32,11 @@ libOpenCL_la_SOURCES =
> > 
> >  # Force usage of a C++ linker
> >  nodist_EXTRA_libOpenCL_la_SOURCES = dummy.cpp
> > 
> > -PIPE_SRC_DIR = $(top_srcdir)/src/gallium/targets/pipe-loader
> > +PIPE_BUILD_DIR = $(top_builddir)/src/gallium/targets/pipe-loader
> > 
> >  # Provide compatibility with scripts for the old Mesa build system for
> >  # a while by putting a link to the driver into /lib of the build tree.
> >  all-local: libOpenCL.la
> > 
> > -   @$(MAKE) -C $(PIPE_SRC_DIR)
> > +   @$(MAKE) -C $(PIPE_BUILD_DIR)
> > 
> > $(MKDIR_P) $(top_builddir)/$(LIB_DIR)
> > ln -f .libs/libOpenCL.so* $(top_builddir)/$(LIB_DIR)/
> > 
> > --
> > 1.8.1.3
> 
> I think I've fixed this in a different way (that doesn't involve
> calling $(MAKE)) in this branch:
> http://cgit.freedesktop.org/~mattst88/mesa/log/?h=make-dist

Do you intend to merge this branch in the foreseeable future?

[Mesa-dev] [PATCH] clover: Fix linkage of libOpenCL

2013-04-04 Thread Niels Ole Salscheider

Clover needs the irreader component of llvm
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 81d4a3f..bfba1b3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1649,7 +1649,7 @@ if test "x$enable_gallium_llvm" = xyes; then
 fi
 
 if test "x$enable_opencl" = xyes; then
-LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo irreader linker instrumentation"
 fi
LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
LLVM_BINDIR=`$LLVM_CONFIG --bindir`
-- 
1.8.2


[Mesa-dev] [PATCH] clover: Fix linkage of libOpenCL

2013-04-04 Thread Niels Ole Salscheider

Clover needs the irreader component of llvm

v2: Check for irreader component
irreader is only available with LLVM 3.3 >= 177971

Signed-off-by: Niels Ole Salscheider 
---
 configure.ac | 4 
 1 file changed, 4 insertions(+)

diff --git a/configure.ac b/configure.ac
index 81d4a3f..fea5868 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1650,6 +1650,10 @@ if test "x$enable_gallium_llvm" = xyes; then
 
 if test "x$enable_opencl" = xyes; then
 LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+# LLVM 3.3 >= 177971 requires IRReader
+if $LLVM_CONFIG --components | grep -q '\<irreader\>'; then
+LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
+fi
 fi
LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
LLVM_BINDIR=`$LLVM_CONFIG --bindir`
-- 
1.8.2


Re: [Mesa-dev] [PATCH] gallium/opencl: Fix out-of-tree build

2013-04-09 Thread Niels Ole Salscheider

On Tuesday, 9 April 2013, 11:17:39, Michel Dänzer wrote:
> From: Michel Dänzer 
> 
> 
> Signed-off-by: Michel Dänzer 
> ---
>  src/gallium/targets/opencl/Makefile.am | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am
> index 389eecc..810f9bb 100644
> --- a/src/gallium/targets/opencl/Makefile.am
> +++ b/src/gallium/targets/opencl/Makefile.am
> @@ -32,11 +32,11 @@ libOpenCL_la_SOURCES =
>  # Force usage of a C++ linker
>  nodist_EXTRA_libOpenCL_la_SOURCES = dummy.cpp
>  
> -PIPE_SRC_DIR = $(top_srcdir)/src/gallium/targets/pipe-loader
> +PIPE_BUILD_DIR = $(top_builddir)/src/gallium/targets/pipe-loader
>  
>  # Provide compatibility with scripts for the old Mesa build system for
>  # a while by putting a link to the driver into /lib of the build tree.
>  all-local: libOpenCL.la
> -   @$(MAKE) -C $(PIPE_SRC_DIR)
> +   @$(MAKE) -C $(PIPE_BUILD_DIR)
> $(MKDIR_P) $(top_builddir)/$(LIB_DIR)
> ln -f .libs/libOpenCL.so* $(top_builddir)/$(LIB_DIR)/
> -- 
> 1.8.2

I sent that patch to the list on 24.02.2013, but Matt Turner said that he has 
a better solution that does not involve calling make...

[Mesa-dev] [PATCH] r600g: fixup for MSAA texture support checking

2013-05-15 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 4e5af70..4d74db0 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -305,7 +305,7 @@ int r600_compute_shader_create(struct pipe_context * ctx,
 
 	shader_ctx.bc = bytecode;
 	r600_bytecode_init(shader_ctx.bc, r600_ctx->chip_class, r600_ctx->family,
-			   r600_ctx->screen->msaa_texture_support);
+			   r600_ctx->screen->has_compressed_msaa_texturing);
 	shader_ctx.bc->type = TGSI_PROCESSOR_COMPUTE;
 	shader_ctx.bc->isa = r600_ctx->isa;
 	r600_llvm_compile(mod, r600_ctx->family,
-- 
1.8.2.3


Re: [Mesa-dev] [PATCH 5/5] radeonsi/compute: Upload work group, work item size in input buffer

2013-05-27 Thread Niels Ole Salscheider

On Friday, 24 May 2013, 14:07:29, Tom Stellard wrote:
> From: Tom Stellard 
> 
> ---
>  src/gallium/drivers/radeonsi/radeonsi_compute.c | 38 ++---
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/radeonsi_compute.c b/src/gallium/drivers/radeonsi/radeonsi_compute.c
> index 035076d..3abf50b 100644
> --- a/src/gallium/drivers/radeonsi/radeonsi_compute.c
> +++ b/src/gallium/drivers/radeonsi/radeonsi_compute.c
> @@ -91,9 +91,12 @@ static void radeonsi_launch_grid(
>   struct r600_context *rctx = (struct r600_context*)ctx;
>   struct si_pipe_compute *program = rctx->cs_shader_state.program;
>   struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
> - struct si_resource *input_buffer;
> - uint32_t input_offset = 0;
> - uint64_t input_va;
> + struct si_resource *kernel_args_buffer;

You should initialize this pointer to 0.

> + unsigned kernel_args_size;
> + unsigned num_work_size_bytes = 36;
> + uint32_t kernel_args_offset = 0;
> + uint32_t *kernel_args;
> + uint64_t kernel_args_va;
>   uint64_t shader_va;
>   unsigned arg_user_sgpr_count = 2;
>   unsigned i;
> @@ -112,16 +115,29 @@ static void radeonsi_launch_grid(
>   si_pm4_inval_shader_cache(pm4);
>   si_cmd_surface_sync(pm4, pm4->cp_coher_cntl);
> 
> - /* Upload the input data */
> - r600_upload_const_buffer(rctx, &input_buffer, input,
> - program->input_size, &input_offset);
> - input_va = r600_resource_va(ctx->screen, (struct pipe_resource*)input_buffer);
> - input_va += input_offset;
> + /* Upload the kernel arguments */
> 
> - si_pm4_add_bo(pm4, input_buffer, RADEON_USAGE_READ);
> + /* The extra num_work_size_bytes are for work group / work item size information */
> + kernel_args_size = program->input_size + num_work_size_bytes;
> + kernel_args = MALLOC(kernel_args_size);
> + for (i = 0; i < 3; i++) {
> + kernel_args[i] = grid_layout[i];
> + kernel_args[i + 3] = grid_layout[i] * block_layout[i];
> + kernel_args[i + 6] = block_layout[i];
> + }
> +
> + memcpy(kernel_args + (num_work_size_bytes / 4), input, program->input_size);
> +
> + r600_upload_const_buffer(rctx, &kernel_args_buffer, kernel_args,
> + kernel_args_size, &kernel_args_offset);
> + kernel_args_va = r600_resource_va(ctx->screen,
> + (struct pipe_resource*)kernel_args_buffer);
> + kernel_args_va += kernel_args_offset;
> +
> + si_pm4_add_bo(pm4, kernel_args_buffer, RADEON_USAGE_READ);
> 
> - si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0, input_va);
> - si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0 + 4, S_008F04_BASE_ADDRESS_HI (input_va >> 32) | S_008F04_STRIDE(0));
> + si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0, kernel_args_va);
> + si_pm4_set_reg(pm4, R_00B900_COMPUTE_USER_DATA_0 + 4, S_008F04_BASE_ADDRESS_HI (kernel_args_va >> 32) | S_008F04_STRIDE(0));
> 
>   si_pm4_set_reg(pm4, R_00B810_COMPUTE_START_X, 0);
>   si_pm4_set_reg(pm4, R_00B814_COMPUTE_START_Y, 0);
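
The buffer layout built by the quoted patch can be sketched in isolation. This is a stand-alone model, not driver code: `pack_kernel_args` is a hypothetical name, and unlike the patch (which allocates exactly `input_size + 36` bytes with `MALLOC`) the sketch rounds the user arguments up to whole dwords so that it can return a `std::vector<std::uint32_t>`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper modelling the layout from the quoted patch:
//   dwords 0..2  number of work groups per dimension (grid_layout)
//   dwords 3..5  global size per dimension (grid_layout * block_layout)
//   dwords 6..8  work group size per dimension (block_layout)
//   dwords 9..   the kernel arguments supplied by the user
std::vector<std::uint32_t>
pack_kernel_args(const std::array<std::uint32_t, 3> &grid_layout,
                 const std::array<std::uint32_t, 3> &block_layout,
                 const void *input, std::size_t input_size) {
    constexpr std::size_t num_work_size_dwords = 9;   // 36 bytes in the patch

    // Assumption of this sketch: pad the user arguments to whole dwords.
    std::vector<std::uint32_t> args(
        num_work_size_dwords + (input_size + 3) / 4, 0);

    for (std::size_t i = 0; i < 3; ++i) {
        args[i] = grid_layout[i];
        args[i + 3] = grid_layout[i] * block_layout[i];
        args[i + 6] = block_layout[i];
    }

    // Copy the user arguments right behind the nine size dwords.
    if (input != nullptr && input_size != 0)
        std::memcpy(args.data() + num_work_size_dwords, input, input_size);

    return args;
}
```

A vector owns its storage, so this variant also sidesteps the question of who frees the temporary buffer after the upload.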
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] st/clover: Allow double precision operations

2013-07-02 Thread Niels Ole Salscheider

Pass "cl_khr_fp64" preprocessor definition to clang

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index dae61f7..bc85b61 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -175,6 +175,7 @@ namespace {
 
   // clc.h requires that this macro be defined:
   c.getPreprocessorOpts().addMacroDef("cl_clang_storage_class_specifiers");
+  c.getPreprocessorOpts().addMacroDef("cl_khr_fp64");
 
   c.getLangOpts().NoBuiltin = true;
   c.getTargetOpts().Triple = triple;
-- 
1.7.11.7


[Mesa-dev] R600/SI: Initial double precision support for Radeon SI

2013-07-02 Thread Niels Ole Salscheider

Hi,

the attached patches add initial support for double precision operations on 
Southern Islands cards.

Some expressions containing multiple double precision kernel arguments cause 
llvm to run until all memory is used - but I do not (yet) know why.
It works fine as long as I pass pointers to double values.

Regards,

Ole

From 4224b314cf2d97cdf2ac99564d6155fa04fbb971 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Sat, 1 Jun 2013 16:48:56 +0200
Subject: [PATCH 1/6] R600/SI: Add initial double precision support for SI

---
 lib/Target/R600/AMDGPUISelLowering.cpp |  6 ++
 lib/Target/R600/SIISelLowering.cpp |  1 +
 lib/Target/R600/SIInstructions.td  | 30 +-
 test/CodeGen/R600/fadd64.ll| 13 +
 test/CodeGen/R600/fdiv64.ll| 14 ++
 test/CodeGen/R600/fmul64.ll| 13 +
 test/CodeGen/R600/load64.ll| 20 
 7 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 test/CodeGen/R600/fadd64.ll
 create mode 100644 test/CodeGen/R600/fdiv64.ll
 create mode 100644 test/CodeGen/R600/fmul64.ll
 create mode 100644 test/CodeGen/R600/load64.ll

diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
index 4019a1f..5f3d496 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -60,12 +60,18 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::STORE, MVT::v4f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::STORE, MVT::f64, Promote);
+  AddPromotedToType(ISD::STORE, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::LOAD, MVT::f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);
 
   setOperationAction(ISD::LOAD, MVT::v4f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::LOAD, MVT::f64, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::MUL, MVT::i64, Expand);
 
   setOperationAction(ISD::UDIV, MVT::i32, Expand);
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 9d4cfef..0d17a12 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -45,6 +45,7 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
   addRegisterClass(MVT::v2i32, &AMDGPU::VReg_64RegClass);
   addRegisterClass(MVT::v2f32, &AMDGPU::VReg_64RegClass);
+  addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);
 
   addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass);
   addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass);
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..b956387 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -660,7 +660,9 @@ defm V_RSQ_LEGACY_F32 : VOP1_32 <
   [(set f32:$dst, (int_AMDGPU_rsq f32:$src0))]
 >;
 defm V_RSQ_F32 : VOP1_32 <0x002e, "V_RSQ_F32", []>;
-defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64", []>;
+defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64",
+  [(set f64:$dst, (fdiv FP_ONE, f64:$src0))]
+>;
 defm V_RCP_CLAMP_F64 : VOP1_64 <0x0030, "V_RCP_CLAMP_F64", []>;
 defm V_RSQ_F64 : VOP1_64 <0x0031, "V_RSQ_F64", []>;
 defm V_RSQ_CLAMP_F64 : VOP1_64 <0x0032, "V_RSQ_CLAMP_F64", []>;
@@ -996,10 +998,25 @@ def V_LSHR_B64 : VOP3_64_Shift <0x0162, "V_LSHR_B64",
 >;
 def V_ASHR_I64 : VOP3_64_Shift <0x0163, "V_ASHR_I64", []>;
 
+let isCommutable = 1 in {
+
 def V_ADD_F64 : VOP3_64 <0x0164, "V_ADD_F64", []>;
 def V_MUL_F64 : VOP3_64 <0x0165, "V_MUL_F64", []>;
 def V_MIN_F64 : VOP3_64 <0x0166, "V_MIN_F64", []>;
 def V_MAX_F64 : VOP3_64 <0x0167, "V_MAX_F64", []>;
+
+} // isCommutable = 1
+
+def : Pat <
+  (fadd f64:$src0, f64:$src1),
+  (V_ADD_F64 $src0, $src1, (i64 0))
+>;
+
+def : Pat < 
+  (fmul f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, $src1, (i64 0))
+>;
+
 def V_LDEXP_F64 : VOP3_64 <0x0168, "V_LDEXP_F64", []>;
 
 let isCommutable = 1 in {
@@ -1417,6 +1434,10 @@ def : BitConvert ;
 def : BitConvert ;
 def : BitConvert ;
 
+def : BitConvert ;
+
+def : BitConvert ;
+
 /** === **/
 /** Src & Dst modifiers **/
 /** === **/
@@ -1505,6 +1526,11 @@ def : Pat<
   (V_MUL_F32_e32 $src0, (V_RCP_F32_e32 $src1))
 >;
 
+def : Pat<
+  (fdiv f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, (V_RCP_F64_e32 $src1), (i64 0))
+>;
+
 def : Pat <
   (fcos f32:$src0),
   (V_COS_F32_e32 (V_MUL_F32_e32 $src0, (V_MOV_B32_e32 CONST.TWO_PI_INV)))
@@

Re: [Mesa-dev] R600/SI: Initial double precision support for Radeon SI

2013-07-09 Thread Niels Ole Salscheider

Hi Tom,

> All these patches look good to me, but #2 and #6 should have a test case
> with them.  If you resubmit these patches with test cases, I will push the
> entire series.

I have attached an updated patchset. I have added a test case to patch #2 and 
#6. I have also replaced the scalar move in patch #2 by a vector move since 
there is probably no point in having a floating point value in a scalar 
register.

Kind regards,

Ole

From 4224b314cf2d97cdf2ac99564d6155fa04fbb971 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Sat, 1 Jun 2013 16:48:56 +0200
Subject: [PATCH 1/6] R600/SI: Add initial double precision support for SI

---
 lib/Target/R600/AMDGPUISelLowering.cpp |  6 ++
 lib/Target/R600/SIISelLowering.cpp |  1 +
 lib/Target/R600/SIInstructions.td  | 30 +-
 test/CodeGen/R600/fadd64.ll| 13 +
 test/CodeGen/R600/fdiv64.ll| 14 ++
 test/CodeGen/R600/fmul64.ll| 13 +
 test/CodeGen/R600/load64.ll| 20 
 7 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 test/CodeGen/R600/fadd64.ll
 create mode 100644 test/CodeGen/R600/fdiv64.ll
 create mode 100644 test/CodeGen/R600/fmul64.ll
 create mode 100644 test/CodeGen/R600/load64.ll

diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
index 4019a1f..5f3d496 100644
--- a/lib/Target/R600/AMDGPUISelLowering.cpp
+++ b/lib/Target/R600/AMDGPUISelLowering.cpp
@@ -60,12 +60,18 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::STORE, MVT::v4f32, Promote);
   AddPromotedToType(ISD::STORE, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::STORE, MVT::f64, Promote);
+  AddPromotedToType(ISD::STORE, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::LOAD, MVT::f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);
 
   setOperationAction(ISD::LOAD, MVT::v4f32, Promote);
   AddPromotedToType(ISD::LOAD, MVT::v4f32, MVT::v4i32);
 
+  setOperationAction(ISD::LOAD, MVT::f64, Promote);
+  AddPromotedToType(ISD::LOAD, MVT::f64, MVT::i64);
+
   setOperationAction(ISD::MUL, MVT::i64, Expand);
 
   setOperationAction(ISD::UDIV, MVT::i32, Expand);
diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index 9d4cfef..0d17a12 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -45,6 +45,7 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
   addRegisterClass(MVT::v2i32, &AMDGPU::VReg_64RegClass);
   addRegisterClass(MVT::v2f32, &AMDGPU::VReg_64RegClass);
+  addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);
 
   addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass);
   addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass);
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..b956387 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -660,7 +660,9 @@ defm V_RSQ_LEGACY_F32 : VOP1_32 <
   [(set f32:$dst, (int_AMDGPU_rsq f32:$src0))]
 >;
 defm V_RSQ_F32 : VOP1_32 <0x002e, "V_RSQ_F32", []>;
-defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64", []>;
+defm V_RCP_F64 : VOP1_64 <0x002f, "V_RCP_F64",
+  [(set f64:$dst, (fdiv FP_ONE, f64:$src0))]
+>;
 defm V_RCP_CLAMP_F64 : VOP1_64 <0x0030, "V_RCP_CLAMP_F64", []>;
 defm V_RSQ_F64 : VOP1_64 <0x0031, "V_RSQ_F64", []>;
 defm V_RSQ_CLAMP_F64 : VOP1_64 <0x0032, "V_RSQ_CLAMP_F64", []>;
@@ -996,10 +998,25 @@ def V_LSHR_B64 : VOP3_64_Shift <0x0162, "V_LSHR_B64",
 >;
 def V_ASHR_I64 : VOP3_64_Shift <0x0163, "V_ASHR_I64", []>;
 
+let isCommutable = 1 in {
+
 def V_ADD_F64 : VOP3_64 <0x0164, "V_ADD_F64", []>;
 def V_MUL_F64 : VOP3_64 <0x0165, "V_MUL_F64", []>;
 def V_MIN_F64 : VOP3_64 <0x0166, "V_MIN_F64", []>;
 def V_MAX_F64 : VOP3_64 <0x0167, "V_MAX_F64", []>;
+
+} // isCommutable = 1
+
+def : Pat <
+  (fadd f64:$src0, f64:$src1),
+  (V_ADD_F64 $src0, $src1, (i64 0))
+>;
+
+def : Pat < 
+  (fmul f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, $src1, (i64 0))
+>;
+
 def V_LDEXP_F64 : VOP3_64 <0x0168, "V_LDEXP_F64", []>;
 
 let isCommutable = 1 in {
@@ -1417,6 +1434,10 @@ def : BitConvert ;
 def : BitConvert ;
 def : BitConvert ;
 
+def : BitConvert ;
+
+def : BitConvert ;
+
 /** === **/
 /** Src & Dst modifiers **/
 /** === **/
@@ -1505,6 +1526,11 @@ def : Pat<
   (V_MUL_F32_e32 $src0, (V_RCP_F32_e32 $src1))
 >;
 
+def : Pat<
+  (fdiv f64:$src0, f64:$src1),
+  (V_MUL_F64 $src0, (V_RCP_F64_e3

[Mesa-dev] [PATCH] clover: Fix linkage of libOpenCL

2013-08-07 Thread Niels Ole Salscheider

Clover needs the option component of llvm.

Signed-off-by: Niels Ole Salscheider 
---
 configure.ac | 4 
 1 file changed, 4 insertions(+)

diff --git a/configure.ac b/configure.ac
index 62d06e0..0dcd2a5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1617,6 +1617,10 @@ if test "x$enable_gallium_llvm" = xyes; then
 if $LLVM_CONFIG --components | grep -qw 'irreader'; then
 LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
 fi
+# LLVM 3.4 requires Option
+if $LLVM_CONFIG --components | grep -qw 'option'; then
+LLVM_COMPONENTS="${LLVM_COMPONENTS} option"
+fi
 fi
 DEFINES="${DEFINES} -DHAVE_LLVM=0x0$LLVM_VERSION_INT"
 MESA_LLVM=1
-- 
1.7.11.7

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] R600/SI: Implement sint<->fp64 conversions

2013-08-07 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIInstrInfo.td| 6 ++
 lib/Target/R600/SIInstructions.td | 8 ++--
 test/CodeGen/R600/fp64_to_sint.ll | 9 +
 test/CodeGen/R600/sint_to_fp64.ll | 9 +
 4 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 test/CodeGen/R600/fp64_to_sint.ll
 create mode 100644 test/CodeGen/R600/sint_to_fp64.ll

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 52af79c..302fa24 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -184,6 +184,12 @@ multiclass VOP1_32  op, string opName, list pattern>
 multiclass VOP1_64  op, string opName, list pattern>
   : VOP1_Helper ;
 
+multiclass VOP1_32_64  op, string opName, list pattern>
+  : VOP1_Helper ;
+
+multiclass VOP1_64_32  op, string opName, list pattern>
+  : VOP1_Helper ;
+
 multiclass VOP2_Helper  op, RegisterClass vrc, RegisterClass arc,
 string opName, list pattern, string revOp> {
   def _e32 : VOP2 <
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 500d15e..efe7a3e 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -603,8 +603,12 @@ defm V_MOV_B32 : VOP1_32 <0x0001, "V_MOV_B32", []>;
 } // End neverHasSideEffects = 1, isMoveImm = 1
 
 defm V_READFIRSTLANE_B32 : VOP1_32 <0x0002, "V_READFIRSTLANE_B32", []>;
-//defm V_CVT_I32_F64 : VOP1_32 <0x0003, "V_CVT_I32_F64", []>;
-//defm V_CVT_F64_I32 : VOP1_64 <0x0004, "V_CVT_F64_I32", []>;
+defm V_CVT_I32_F64 : VOP1_32_64 <0x0003, "V_CVT_I32_F64",
+  [(set i32:$dst, (fp_to_sint f64:$src0))]
+>;
+defm V_CVT_F64_I32 : VOP1_64_32 <0x0004, "V_CVT_F64_I32",
+  [(set f64:$dst, (sint_to_fp i32:$src0))]
+>;
 defm V_CVT_F32_I32 : VOP1_32 <0x0005, "V_CVT_F32_I32",
   [(set f32:$dst, (sint_to_fp i32:$src0))]
 >;
diff --git a/test/CodeGen/R600/fp64_to_sint.ll b/test/CodeGen/R600/fp64_to_sint.ll
new file mode 100644
index 0000000..42f9f34
--- /dev/null
+++ b/test/CodeGen/R600/fp64_to_sint.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @fp64_to_sint
+; CHECK: V_CVT_I32_F64_e32
+define void @fp64_to_sint(i32 addrspace(1)* %out, double %in) {
+  %result = fptosi double %in to i32
+  store i32 %result, i32 addrspace(1)* %out
+  ret void
+}
diff --git a/test/CodeGen/R600/sint_to_fp64.ll b/test/CodeGen/R600/sint_to_fp64.ll
new file mode 100644
index 0000000..37f67c9
--- /dev/null
+++ b/test/CodeGen/R600/sint_to_fp64.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @sint_to_fp64
+; CHECK: V_CVT_F64_I32_e32
+define void @sint_to_fp64(double addrspace(1)* %out, i32 %in) {
+  %result = sitofp i32 %in to double
+  store double %result, double addrspace(1)* %out
+  ret void
+}
-- 
1.7.11.7


[Mesa-dev] [PATCH 2/2] R600/SI: Implement fp32<->fp64 conversions

2013-08-07 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIISelLowering.cpp | 3 +++
 lib/Target/R600/SIInstructions.td  | 8 ++--
 test/CodeGen/R600/fpext.ll | 9 +
 test/CodeGen/R600/fptrunc.ll   | 9 +
 4 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 test/CodeGen/R600/fpext.ll
 create mode 100644 test/CodeGen/R600/fptrunc.ll

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index c64027f..b714fc1 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -85,6 +85,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
 
   setLoadExtAction(ISD::SEXTLOAD, MVT::i32, Expand);
 
+  setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand);
+  setTruncStoreAction(MVT::f64, MVT::f32, Expand);
+
   setOperationAction(ISD::GlobalAddress, MVT::i64, Custom);
 
   setTargetDAGCombine(ISD::SELECT_CC);
diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index efe7a3e..dc41885 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -625,8 +625,12 @@ defm V_MOV_FED_B32 : VOP1_32 <0x0009, "V_MOV_FED_B32", []>;
 //defm V_CVT_RPI_I32_F32 : VOP1_32 <0x000c, "V_CVT_RPI_I32_F32", []>;
 //defm V_CVT_FLR_I32_F32 : VOP1_32 <0x000d, "V_CVT_FLR_I32_F32", []>;
 //defm V_CVT_OFF_F32_I4 : VOP1_32 <0x000e, "V_CVT_OFF_F32_I4", []>;
-//defm V_CVT_F32_F64 : VOP1_32 <0x000f, "V_CVT_F32_F64", []>;
-//defm V_CVT_F64_F32 : VOP1_64 <0x0010, "V_CVT_F64_F32", []>;
+defm V_CVT_F32_F64 : VOP1_32_64 <0x000f, "V_CVT_F32_F64",
+  [(set f32:$dst, (fround f64:$src0))]
+>;
+defm V_CVT_F64_F32 : VOP1_64_32 <0x0010, "V_CVT_F64_F32",
+  [(set f64:$dst, (fextend f32:$src0))]
+>;
 //defm V_CVT_F32_UBYTE0 : VOP1_32 <0x0011, "V_CVT_F32_UBYTE0", []>;
 //defm V_CVT_F32_UBYTE1 : VOP1_32 <0x0012, "V_CVT_F32_UBYTE1", []>;
 //defm V_CVT_F32_UBYTE2 : VOP1_32 <0x0013, "V_CVT_F32_UBYTE2", []>;
diff --git a/test/CodeGen/R600/fpext.ll b/test/CodeGen/R600/fpext.ll
new file mode 100644
index 0000000..e02c19c
--- /dev/null
+++ b/test/CodeGen/R600/fpext.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @fpext
+; CHECK: V_CVT_F64_F32_e32
+define void @fpext(double addrspace(1)* %out, float %in) {
+  %result = fpext float %in to double
+  store double %result, double addrspace(1)* %out
+  ret void
+}
diff --git a/test/CodeGen/R600/fptrunc.ll b/test/CodeGen/R600/fptrunc.ll
new file mode 100644
index 0000000..2a10f63
--- /dev/null
+++ b/test/CodeGen/R600/fptrunc.ll
@@ -0,0 +1,9 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=CHECK
+
+; CHECK: @fptrunc
+; CHECK: V_CVT_F32_F64_e32
+define void @fptrunc(float addrspace(1)* %out, double %in) {
+  %result = fptrunc double %in to float
+  store float %result, float addrspace(1)* %out
+  ret void
+}
-- 
1.7.11.7


[Mesa-dev] [PATCH 3/6] st/clover: Unreference fences as early as possible

2013-08-09 Thread Niels Ole Salscheider

This makes sure that there are not too many concurrent fences.

Also, simplify status handling by keeping track of the current state.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/core/event.cpp | 29 +++-
 src/gallium/state_trackers/clover/core/event.hpp | 12 +-
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/src/gallium/state_trackers/clover/core/event.cpp b/src/gallium/state_trackers/clover/core/event.cpp
index cbb97bf..13e8130 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -28,7 +28,7 @@ using namespace clover;
 _cl_event::_cl_event(clover::context &ctx,
  std::vector<clover::event *> deps,
  action action_ok, action action_fail) :
-   ctx(ctx), __status(0), wait_count(1),
+   ctx(ctx), __status(CL_QUEUED), wait_count(1),
action_ok(action_ok), action_fail(action_fail) {
for (auto ev : deps)
   ev->chain(this);
@@ -114,6 +114,7 @@ hard_event::trigger() {
  pipe->end_query(pipe, __query_end);
  __ts_submit = screen->get_timestamp(screen);
   }
+  __status = CL_SUBMITTED;
 
   while (!__chain.empty()) {
  __chain.back()->trigger();
@@ -123,20 +124,21 @@ hard_event::trigger() {
 }
 
 cl_int
-hard_event::status() const {
+hard_event::status() {
pipe_screen *screen = queue()->dev.pipe;
 
-   if (__status < 0)
+   if (__status != CL_SUBMITTED)
   return __status;
 
-   else if (!__fence)
-  return CL_QUEUED;
-
-   else if (!screen->fence_signalled(screen, __fence))
+   else if (__fence && !screen->fence_signalled(screen, __fence))
   return CL_SUBMITTED;
 
-   else
+   else {
+  if (__fence)
+ screen->fence_reference(screen, &__fence, NULL);
+  __status = CL_COMPLETE;
   return CL_COMPLETE;
+   }
 }
 
 cl_command_queue
@@ -150,15 +152,20 @@ hard_event::command() const {
 }
 
 void
-hard_event::wait() const {
+hard_event::wait() {
pipe_screen *screen = queue()->dev.pipe;
 
if (status() == CL_QUEUED)
   queue()->flush();
 
+   if (status() == CL_COMPLETE)
+  return;
+
if (!__fence ||
!screen->fence_finish(screen, __fence, PIPE_TIMEOUT_INFINITE))
   throw error(CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST);
+   screen->fence_reference(screen, &__fence, NULL);
+   __status = CL_COMPLETE;
 }
 
 cl_ulong
@@ -231,7 +238,7 @@ soft_event::trigger() {
 }
 
 cl_int
-soft_event::status() const {
+soft_event::status() {
if (__status < 0)
   return __status;
 
@@ -256,7 +263,7 @@ soft_event::command() const {
 }
 
 void
-soft_event::wait() const {
+soft_event::wait() {
for (auto ev : deps)
   ev->wait();
 
diff --git a/src/gallium/state_trackers/clover/core/event.hpp b/src/gallium/state_trackers/clover/core/event.hpp
index de92de0..611b233 100644
--- a/src/gallium/state_trackers/clover/core/event.hpp
+++ b/src/gallium/state_trackers/clover/core/event.hpp
@@ -61,10 +61,10 @@ public:
void abort(cl_int status);
bool signalled() const;
 
-   virtual cl_int status() const = 0;
+   virtual cl_int status() = 0;
virtual cl_command_queue queue() const = 0;
virtual cl_command_type command() const = 0;
-   virtual void wait() const = 0;
+   virtual void wait() = 0;
 
clover::context &ctx;
 
@@ -101,10 +101,10 @@ namespace clover {
 
   virtual void trigger();
 
-  virtual cl_int status() const;
+  virtual cl_int status();
   virtual cl_command_queue queue() const;
   virtual cl_command_type command() const;
-  virtual void wait() const;
+  virtual void wait();
 
   cl_ulong ts_queued() const;
   cl_ulong ts_submit() const;
@@ -138,10 +138,10 @@ namespace clover {
 
   virtual void trigger();
 
-  virtual cl_int status() const;
+  virtual cl_int status();
   virtual cl_command_queue queue() const;
   virtual cl_command_type command() const;
-  virtual void wait() const;
+  virtual void wait();
};
 }
 
-- 
1.7.11.7
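
The status handling described in this patch (cache the current state, and drop the fence reference as soon as completion is observed) can be sketched in isolation roughly as follows. This is only an illustrative sketch: `mock_fence`, `tracked_event` and the `status_t` enumerators are hypothetical stand-ins, not the Gallium or clover types, and the fence is modelled as a plain pointer instead of a reference-counted handle.

```cpp
#include <cassert>

// Hypothetical stand-in for a driver fence; not the Gallium fence handle.
struct mock_fence {
   bool signalled = false;
};

// Hypothetical states mirroring the queued -> submitted -> complete order.
enum status_t { COMPLETE, SUBMITTED, QUEUED };

class tracked_event {
public:
   explicit tracked_event(mock_fence *f) : status_(QUEUED), fence_(f) {}

   // Called once the command has been handed to the driver.
   void submit() { status_ = SUBMITTED; }

   // Not const: observing completion updates the cached state and
   // releases the fence, so later calls never touch the fence again.
   status_t status() {
      if (status_ != SUBMITTED)
         return status_;

      if (fence_ && !fence_->signalled)
         return SUBMITTED;

      fence_ = nullptr;      // drop the reference as early as possible
      status_ = COMPLETE;
      return status_;
   }

   bool holds_fence() const { return fence_ != nullptr; }

private:
   status_t status_;
   mock_fence *fence_;
};
```

Because the state is cached, only events that are still in the submitted state keep a fence alive, which is what bounds the number of concurrent fences in this sketch.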


[Mesa-dev] [PATCH 5/6] radeonsi: copy r600_get_timestamp

2013-08-09 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 3ba8232..7ae5598 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -779,6 +779,14 @@ static int r600_init_tiling(struct r600_screen *rscreen)
return evergreen_interpret_tiling(rscreen, tiling_config);
 }
 
+static uint64_t r600_get_timestamp(struct pipe_screen *screen)
+{
+   struct r600_screen *rscreen = (struct r600_screen*)screen;
+
+   return 1000000 * rscreen->ws->query_value(rscreen->ws, RADEON_TIMESTAMP) /
+   rscreen->info.r600_clock_crystal_freq;
+}
+
 static unsigned radeon_family_from_device(unsigned device)
 {
switch (device) {
@@ -830,6 +838,7 @@ struct pipe_screen *radeonsi_screen_create(struct radeon_winsys *ws)
rscreen->screen.get_shader_param = r600_get_shader_param;
rscreen->screen.get_paramf = r600_get_paramf;
rscreen->screen.get_compute_param = r600_get_compute_param;
+   rscreen->screen.get_timestamp = r600_get_timestamp;
rscreen->screen.is_format_supported = si_is_format_supported;
rscreen->screen.context_create = r600_create_context;
rscreen->screen.fence_reference = r600_fence_reference;
-- 
1.7.11.7


[Mesa-dev] [PATCH 4/6] radeonsi: Implement PIPE_QUERY_TIMESTAMP

2013-08-09 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/r600.h|  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 31 ++
 src/gallium/drivers/radeonsi/r600_query.c  | 14 +++-
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 +-
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h b/src/gallium/drivers/radeonsi/r600.h
index 8f35cc2..ce0468d 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -102,6 +102,7 @@ void si_context_emit_fence(struct r600_context *ctx, struct si_resource *fence,
 unsigned offset, unsigned value);
 
 void r600_context_draw_opaque_count(struct r600_context *ctx, struct r600_so_target *t);
+bool si_query_needs_begin(unsigned type);
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in);
 
 int si_context_init(struct r600_context *ctx);
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 25c972b..7de3745 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -110,6 +110,11 @@ err:
return;
 }
 
+bool si_query_needs_begin(unsigned type)
+{
+   return type != PIPE_QUERY_TIMESTAMP;
+}
+
 /* initialize */
 void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
boolean count_draw_in)
@@ -340,6 +345,12 @@ static boolean r600_query_result(struct r600_context *ctx, struct r600_query *qu
 results_base = (results_base + 16) % query->buffer->b.b.width0;
}
break;
+   case PIPE_QUERY_TIMESTAMP:
+   {
+   uint32_t *current_result = (uint32_t*)map;
+   query->result.u64 = (uint64_t)current_result[0] | (uint64_t)current_result[1] << 32;
+   break;
+   }
case PIPE_QUERY_TIME_ELAPSED:
while (results_base != query->results_end) {
query->result.u64 +=
@@ -485,6 +496,19 @@ void r600_query_end(struct r600_context *ctx, struct r600_query *query)
 {
struct radeon_winsys_cs *cs = ctx->cs;
uint64_t va;
+   unsigned new_results_end;
+
+   /* The queries which need begin already called this in begin_query. */
+   if (!si_query_needs_begin(query->type)) {
+   si_need_cs_space(ctx, query->num_cs_dw, TRUE);
+
+   new_results_end = (query->results_end + query->result_size) % query->buffer->b.b.width0;
+
+   /* collect current results if query buffer is full */
+   if (new_results_end == query->results_start) {
+   r600_query_result(ctx, query, TRUE);
+   }
+   }
 
va = r600_resource_va(&ctx->screen->screen, (void*)query->buffer);
/* emit end query */
@@ -508,6 +532,8 @@ void r600_query_end(struct r600_context *ctx, struct r600_query *query)
break;
case PIPE_QUERY_TIME_ELAPSED:
va += query->results_end + query->result_size/2;
+   /* fall through */
+   case PIPE_QUERY_TIMESTAMP:
cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE_EOP, 4, 0);
 cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
cs->buf[cs->cdw++] = va;
@@ -585,6 +611,10 @@ struct r600_query *r600_context_query_create(struct r600_context *ctx, unsigned
query->result_size = 16 * ctx->max_db;
query->num_cs_dw = 6;
break;
+   case PIPE_QUERY_TIMESTAMP:
+   query->result_size = 8;
+   query->num_cs_dw = 8;
+   break;
case PIPE_QUERY_TIME_ELAPSED:
query->result_size = 16;
query->num_cs_dw = 8;
@@ -648,6 +678,7 @@ boolean r600_context_query_result(struct r600_context *ctx,
case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
*result_b = query->result.b;
break;
+   case PIPE_QUERY_TIMESTAMP:
case PIPE_QUERY_TIME_ELAPSED:
 *result_u64 = (1000000 * query->result.u64) / ctx->screen->info.r600_clock_crystal_freq;
break;
diff --git a/src/gallium/drivers/radeonsi/r600_query.c 
b/src/gallium/drivers/radeonsi/r600_query.c
index 0162cce..927577c 100644
--- a/src/gallium/drivers/radeonsi/r600_query.c
+++ b/src/gallium/drivers/radeonsi/r600_query.c
@@ -42,6 +42,11 @@ static void r600_begin_query(struct pipe_context *ctx, struct pipe_query *query)
struct r600_context *rctx = (struct r600_context *)ctx;
struct r600_query *rquery = (struct r600_query *)query;
 
+   if (!si_query_needs_begin(rquery->type)) {
+   ass

[Mesa-dev] [PATCH 2/6] st/clover: Add event to deps even if it has been triggered

2013-08-09 Thread Niels Ole Salscheider

The command is submitted once the event has been triggered, but it might not
have completed yet. Therefore, we have to add it to deps in order to wait on it.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/core/event.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/event.cpp b/src/gallium/state_trackers/clover/core/event.cpp
index de21f0c..cbb97bf 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -58,8 +58,8 @@ _cl_event::chain(clover::event *ev) {
if (wait_count) {
   ev->wait_count++;
   __chain.push_back(ev);
-  ev->deps.push_back(this);
}
+   ev->deps.push_back(this);
 }
 
 hard_event::hard_event(clover::command_queue &q, cl_command_type command,
-- 
1.7.11.7
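
The effect of moving the `deps.push_back` call out of the `if` can be sketched with a stripped-down event type. This is a hedged illustration only: `sketch_event` and its members are hypothetical names that mimic the shape of the patched `chain()` and are not the clover classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, minimal event type used only to illustrate the ordering.
struct sketch_event {
   int wait_count = 1;                  // 0 means "already triggered"
   std::vector<sketch_event *> chain_;  // events to trigger after this one
   std::vector<sketch_event *> deps;    // events this one has to wait on

   // Mirrors the patched logic: trigger chaining is only needed while
   // this event is still pending, but the dependency is always recorded.
   void chain(sketch_event *ev) {
      if (wait_count) {
         ev->wait_count++;
         chain_.push_back(ev);
      }
      ev->deps.push_back(this);
   }

   // Simplified trigger that only marks the event as triggered.
   void trigger() {
      if (wait_count)
         --wait_count;
   }
};
```

With this shape, an event chained to an already triggered dependency is not delayed at trigger time, yet a later wait on it can still walk `deps` and wait for the dependency to complete.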


[Mesa-dev] [PATCH 6/6] radeonsi: Handle additional PIPE_COMPUTE_CAP_*

2013-08-09 Thread Niels Ole Salscheider

This patch adds support for:
PIPE_COMPUTE_CAP_MAX_INPUT_SIZE
PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE

Return the values reported by the closed source driver for now.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 7ae5598..47f5191 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -602,7 +602,20 @@ static int r600_get_compute_param(struct pipe_screen *screen,
 *max_global_size = 2000000000;
}
return sizeof(uint64_t);
-
+   case PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE:
+   if (ret) {
+   uint64_t *max_local_size = ret;
+   /* Value reported by the closed source driver. */
+   *max_local_size = 32768;
+   }
+   return sizeof(uint64_t);
+   case PIPE_COMPUTE_CAP_MAX_INPUT_SIZE:
+   if (ret) {
+   uint64_t *max_input_size = ret;
+   /* Value reported by the closed source driver. */
+   *max_input_size = 1024;
+   }
+   return sizeof(uint64_t);
case PIPE_COMPUTE_CAP_MAX_MEM_ALLOC_SIZE:
if (ret) {
uint64_t max_global_size;
-- 
1.7.11.7
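
The calling convention visible in the hunk above (write the value only when `ret` is non-NULL, and always return the size of the result) can be sketched as a standalone function. The names `sketch_cap` and `sketch_get_compute_param` are hypothetical and exist only for this illustration; the two constants are the ones the patch itself returns.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability identifiers; not the Gallium PIPE_COMPUTE_CAP_* enum.
enum sketch_cap { CAP_MAX_LOCAL_SIZE, CAP_MAX_INPUT_SIZE };

// Returns the size in bytes of the value for `cap`. If `ret` is non-null,
// the value itself is also written there, so a caller can first ask for
// the size and then pass a buffer that is large enough.
int sketch_get_compute_param(sketch_cap cap, void *ret) {
   switch (cap) {
   case CAP_MAX_LOCAL_SIZE:
      if (ret)
         *static_cast<std::uint64_t *>(ret) = 32768;
      return sizeof(std::uint64_t);
   case CAP_MAX_INPUT_SIZE:
      if (ret)
         *static_cast<std::uint64_t *>(ret) = 1024;
      return sizeof(std::uint64_t);
   }
   return 0;   // unknown capability in this sketch
}
```

A caller would typically invoke it twice: once with a null pointer to learn the size, then again with storage of that size.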


[Mesa-dev] [PATCH 1/6] st/clover: Profiling support

2013-08-09 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/clover/api/event.cpp  |  26 -
 src/gallium/state_trackers/clover/core/event.cpp | 116 ---
 src/gallium/state_trackers/clover/core/event.hpp |  18 +++-
 3 files changed, 142 insertions(+), 18 deletions(-)

diff --git a/src/gallium/state_trackers/clover/api/event.cpp b/src/gallium/state_trackers/clover/api/event.cpp
index 39a647b..ea1576c 100644
--- a/src/gallium/state_trackers/clover/api/event.cpp
+++ b/src/gallium/state_trackers/clover/api/event.cpp
@@ -217,7 +217,31 @@ clEnqueueWaitForEvents(cl_command_queue q, cl_uint num_evs,
 PUBLIC cl_int
 clGetEventProfilingInfo(cl_event ev, cl_profiling_info param,
 size_t size, void *buf, size_t *size_ret) {
-   return CL_PROFILING_INFO_NOT_AVAILABLE;
+   hard_event *hev = dynamic_cast<hard_event *>(ev);
+   soft_event *sev = dynamic_cast<soft_event *>(ev);
+
+   if (!hev && !sev)
+  return CL_INVALID_EVENT;
+   if (!hev || !(hev->queue()->props() & CL_QUEUE_PROFILING_ENABLE) ||
+   hev->status() != CL_COMPLETE)
+  return CL_PROFILING_INFO_NOT_AVAILABLE;
+
+   switch (param) {
+   case CL_PROFILING_COMMAND_QUEUED:
+  return scalar_property(buf, size, size_ret, hev->ts_queued());
+
+   case CL_PROFILING_COMMAND_SUBMIT:
+  return scalar_property(buf, size, size_ret, hev->ts_submit());
+
+   case CL_PROFILING_COMMAND_START:
+  return scalar_property(buf, size, size_ret, hev->ts_start());
+
+   case CL_PROFILING_COMMAND_END:
+  return scalar_property(buf, size, size_ret, hev->ts_end());
+
+   default:
+  return CL_INVALID_VALUE;
+   }
 }
 
 PUBLIC cl_int
diff --git a/src/gallium/state_trackers/clover/core/event.cpp b/src/gallium/state_trackers/clover/core/event.cpp
index 93d3b58..de21f0c 100644
--- a/src/gallium/state_trackers/clover/core/event.cpp
+++ b/src/gallium/state_trackers/clover/core/event.cpp
@@ -38,18 +38,6 @@ _cl_event::~_cl_event() {
 }
 
 void
-_cl_event::trigger() {
-   if (!--wait_count) {
-  action_ok(*this);
-
-  while (!__chain.empty()) {
- __chain.back()->trigger();
- __chain.pop_back();
-  }
-   }
-}
-
-void
 _cl_event::abort(cl_int status) {
__status = status;
action_fail(*this);
@@ -77,14 +65,61 @@ _cl_event::chain(clover::event *ev) {
 hard_event::hard_event(clover::command_queue &q, cl_command_type command,
std::vector<clover::event *> deps, action action) :
_cl_event(q.ctx, deps, action, [](event &ev){}),
-   __queue(q), __command(command), __fence(NULL) {
+   __queue(q), __command(command), __fence(NULL),
+   __query_start(NULL), __query_end(NULL) {
q.sequence(this);
+
+   if(q.props() & CL_QUEUE_PROFILING_ENABLE) {
+  pipe_screen *screen = q.dev.pipe;
+  __ts_queued = screen->get_timestamp(screen);
+   }
+
trigger();
 }
 
 hard_event::~hard_event() {
pipe_screen *screen = queue()->dev.pipe;
+   pipe_context *pipe = queue()->pipe;
screen->fence_reference(screen, &__fence, NULL);
+
+   if(__query_start) {
+  pipe->destroy_query(pipe, __query_start);
+  __query_start = 0;
+   }
+
+   if(__query_end) {
+  pipe->destroy_query(pipe, __query_end);
+  __query_end = 0;
+   }
+}
+
+void
+hard_event::trigger() {
+   if (!--wait_count) {
+   /* XXX: Currently, a timestamp query gives wrong results for memory
+* transfers. This is, because we use memcpy instead of the DMA engines. */
+
+  if(queue()->props() & CL_QUEUE_PROFILING_ENABLE) {
+ pipe_context *pipe = queue()->pipe;
+ __query_start = pipe->create_query(pipe, PIPE_QUERY_TIMESTAMP);
+ pipe->end_query(queue()->pipe, __query_start);
+  }
+
+  action_ok(*this);
+
+  if(queue()->props() & CL_QUEUE_PROFILING_ENABLE) {
+ pipe_context *pipe = queue()->pipe;
+ pipe_screen *screen = queue()->dev.pipe;
+ __query_end = pipe->create_query(pipe, PIPE_QUERY_TIMESTAMP);
+ pipe->end_query(pipe, __query_end);
+ __ts_submit = screen->get_timestamp(screen);
+  }
+
+  while (!__chain.empty()) {
+ __chain.back()->trigger();
+ __chain.pop_back();
+  }
+   }
 }
 
 cl_int
@@ -126,6 +161,49 @@ hard_event::wait() const {
   throw error(CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST);
 }
 
+cl_ulong
+hard_event::ts_queued() const {
+   return __ts_queued;
+}
+
+cl_ulong
+hard_event::ts_submit() const {
+   return __ts_submit;
+}
+
+cl_ulong
+hard_event::ts_start() {
+   get_query_results();
+   return __ts_start;
+}
+
+cl_ulong
+hard_event::ts_end() {
+   get_query_results();
+   return __ts_end;
+}
+
+void
+hard_event::get_query_results() {
+   pipe_context *pipe = queue()->pipe;
+
+   if(__query_start) {
+  pipe_query_result result;
+  pipe->get_query_result(pipe, __query_start, true, &result);
+  __ts_start = result.u64;
+  pipe->destroy_q

[Mesa-dev] [PATCH 1/2] R600/SI: Add FMA pattern

2013-08-09 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIInstructions.td |  8 ++--
 test/CodeGen/R600/fma.ll  | 31 +++
 2 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 test/CodeGen/R600/fma.ll

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index dc41885..dc14609 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1007,8 +1007,12 @@ def V_BFE_U32 : VOP3_32 <0x0148, "V_BFE_U32", []>;
 def V_BFE_I32 : VOP3_32 <0x0149, "V_BFE_I32", []>;
 def V_BFI_B32 : VOP3_32 <0x014a, "V_BFI_B32", []>;
 defm : BFIPatterns ;
-def V_FMA_F32 : VOP3_32 <0x014b, "V_FMA_F32", []>;
-def V_FMA_F64 : VOP3_64 <0x014c, "V_FMA_F64", []>;
+def V_FMA_F32 : VOP3_32 <0x014b, "V_FMA_F32",
+  [(set f32:$dst, (fma f32:$src0, f32:$src1, f32:$src2))]
+>;
+def V_FMA_F64 : VOP3_64 <0x014c, "V_FMA_F64",
+  [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
+>;
 //def V_LERP_U8 : VOP3_U8 <0x014d, "V_LERP_U8", []>;
 def V_ALIGNBIT_B32 : VOP3_32 <0x014e, "V_ALIGNBIT_B32", []>;
 def : ROTRPattern ;
diff --git a/test/CodeGen/R600/fma.ll b/test/CodeGen/R600/fma.ll
new file mode 100644
index 0000000..afef970
--- /dev/null
+++ b/test/CodeGen/R600/fma.ll
@@ -0,0 +1,31 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
+
+; CHECK: @fma_f32
+; CHECK: V_FMA_F32 {{VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @fma_f32(float addrspace(1)* %out, float addrspace(1)* %in1,
+ float addrspace(1)* %in2, float addrspace(1)* %in3) {
+   %r0 = load float addrspace(1)* %in1
+   %r1 = load float addrspace(1)* %in2
+   %r2 = load float addrspace(1)* %in3
+   %r3 = tail call float @llvm.fma.f32(float %r0, float %r1, float %r2)
+   store float %r3, float addrspace(1)* %out
+   ret void
+}
+
+declare float @llvm.fma.f32(float, float, float)
+
+; CHECK: @fma_f64
+; CHECK: V_FMA_F64 {{VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+}}
+
+define void @fma_f64(double addrspace(1)* %out, double addrspace(1)* %in1,
+ double addrspace(1)* %in2, double addrspace(1)* %in3) {
+   %r0 = load double addrspace(1)* %in1
+   %r1 = load double addrspace(1)* %in2
+   %r2 = load double addrspace(1)* %in3
+   %r3 = tail call double @llvm.fma.f64(double %r0, double %r1, double %r2)
+   store double %r3, double addrspace(1)* %out
+   ret void
+}
+
+declare double @llvm.fma.f64(double, double, double)
-- 
1.7.11.7


[Mesa-dev] [PATCH 2/2] R600/SI: FMA is faster than fmul and fadd for f64

2013-08-09 Thread Niels Ole Salscheider

---
 lib/Target/R600/SIISelLowering.cpp | 18 ++
 lib/Target/R600/SIISelLowering.h   |  1 +
 test/CodeGen/R600/fmuladd.ll   | 31 +++
 3 files changed, 50 insertions(+)
 create mode 100644 test/CodeGen/R600/fmuladd.ll

diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
index b714fc1..a76e6ee 100644
--- a/lib/Target/R600/SIISelLowering.cpp
+++ b/lib/Target/R600/SIISelLowering.cpp
@@ -338,6 +338,24 @@ MVT SITargetLowering::getScalarShiftAmountTy(EVT VT) const {
   return MVT::i32;
 }
 
+bool SITargetLowering::isFMAFasterThanFMulAndFAdd(EVT VT) const {
+  VT = VT.getScalarType();
+
+  if (!VT.isSimple())
+return false;
+
+  switch (VT.getSimpleVT().SimpleTy) {
+  case MVT::f32:
+return false; /* There is V_MAD_F32 for f32 */
+  case MVT::f64:
+return true;
+  default:
+break;
+  }
+
+  return false;
+}
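+
 //===----------------------------------------------------------------------===//
 // Custom DAG Lowering Operations
 //===----------------------------------------------------------------------===//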
diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
index b4202c4..effbf1f 100644
--- a/lib/Target/R600/SIISelLowering.h
+++ b/lib/Target/R600/SIISelLowering.h
@@ -55,6 +55,7 @@ public:
   MachineBasicBlock * BB) const;
   virtual EVT getSetCCResultType(LLVMContext &Context, EVT VT) const;
   virtual MVT getScalarShiftAmountTy(EVT VT) const;
+  virtual bool isFMAFasterThanFMulAndFAdd(EVT VT) const;
   virtual SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const;
   virtual SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const;
   virtual SDNode *PostISelFolding(MachineSDNode *N, SelectionDAG &DAG) const;
diff --git a/test/CodeGen/R600/fmuladd.ll b/test/CodeGen/R600/fmuladd.ll
new file mode 100644
index 0000000..ac379f4
--- /dev/null
+++ b/test/CodeGen/R600/fmuladd.ll
@@ -0,0 +1,31 @@
+; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
+
+; CHECK: @fmuladd_f32
+; CHECK: V_MAD_F32 {{VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
+
+define void @fmuladd_f32(float addrspace(1)* %out, float addrspace(1)* %in1,
+ float addrspace(1)* %in2, float addrspace(1)* %in3) {
+   %r0 = load float addrspace(1)* %in1
+   %r1 = load float addrspace(1)* %in2
+   %r2 = load float addrspace(1)* %in3
+   %r3 = tail call float @llvm.fmuladd.f32(float %r0, float %r1, float %r2)
+   store float %r3, float addrspace(1)* %out
+   ret void
+}
+
+declare float @llvm.fmuladd.f32(float, float, float)
+
+; CHECK: @fmuladd_f64
+; CHECK: V_FMA_F64 {{VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+, VGPR[0-9]+_VGPR[0-9]+}}
+
+define void @fmuladd_f64(double addrspace(1)* %out, double addrspace(1)* %in1,
+ double addrspace(1)* %in2, double addrspace(1)* %in3) {
+   %r0 = load double addrspace(1)* %in1
+   %r1 = load double addrspace(1)* %in2
+   %r2 = load double addrspace(1)* %in3
+   %r3 = tail call double @llvm.fmuladd.f64(double %r0, double %r1, double %r2)
+   store double %r3, double addrspace(1)* %out
+   ret void
+}
+
+declare double @llvm.fmuladd.f64(double, double, double)
-- 
1.7.11.7


[Mesa-dev] [PATCH] pipe-loader: Fix out of source build

2013-02-24 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/targets/opencl/Makefile.am | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/targets/opencl/Makefile.am b/src/gallium/targets/opencl/Makefile.am
index c5c3003..709112f 100644
--- a/src/gallium/targets/opencl/Makefile.am
+++ b/src/gallium/targets/opencl/Makefile.am
@@ -32,11 +32,11 @@ libOpenCL_la_SOURCES =
 # Force usage of a C++ linker
 nodist_EXTRA_libOpenCL_la_SOURCES = dummy.cpp
 
-PIPE_SRC_DIR = $(top_srcdir)/src/gallium/targets/pipe-loader
+PIPE_BUILD_DIR = $(top_builddir)/src/gallium/targets/pipe-loader
 
 # Provide compatibility with scripts for the old Mesa build system for
 # a while by putting a link to the driver into /lib of the build tree.
 all-local: libOpenCL.la
-   @$(MAKE) -C $(PIPE_SRC_DIR)
+   @$(MAKE) -C $(PIPE_BUILD_DIR)
    $(MKDIR_P) $(top_builddir)/$(LIB_DIR)
    ln -f .libs/libOpenCL.so* $(top_builddir)/$(LIB_DIR)/
-- 
1.8.1.3

[Mesa-dev] [PATCH] st/mesa: index can be negative in the PROGRAM_CONSTANT case

2012-08-12 Thread Niels Ole Salscheider

---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 39717b6..9f58312 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -4028,7 +4028,7 @@ dst_register(struct st_translate *t,
 static struct ureg_src
 src_register(struct st_translate *t,
  gl_register_file file,
- GLuint index)
+ GLint index)
 {
switch(file) {
case PROGRAM_UNDEFINED:
-- 
1.7.11.4

Re: [Mesa-dev] [PATCH v2] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-28 Thread Niels Ole Salscheider

On Sunday 28 September 2014, 17:44:53, Bruno Jimenez wrote:
> Hi,
> 
> Sorry for not answering until now, but I have had some personal issues
> (changing university, moving to another city...)
> 
> As you said, this is used from clover's resource::copy, which is used by
> clEnqueueCopyBuffer if I remember correctly (and understand correctly
> clover) If it doesn't regress any piglit test then it has my R-b :)
> 
> Thanks a lot!
> Bruno

Hi,

no problem, I have been a bit busy with my thesis anyway (I have to hand it in 
on Tuesday)...

You are right, it is used from clEnqueueCopyBuffer - and it does not regress 
any piglit tests for me.
Can someone with write access please push this?

Thanks,

Ole

> On Mon, 2014-09-08 at 20:10 +0200, Niels Ole Salscheider wrote:
> > v2: Do not demote items that are already in the pool
> > 
> > Signed-off-by: Niels Ole Salscheider 
> > ---
> > 
> >  src/gallium/drivers/r600/evergreen_compute.h |  1 +
> >  src/gallium/drivers/r600/r600_blit.c         | 59
> >  2 files changed, 43 insertions(+), 17 deletions(-)
> > 
> > diff --git a/src/gallium/drivers/r600/evergreen_compute.h b/src/gallium/drivers/r600/evergreen_compute.h
> > index 4fb53a1..e4d3a38 100644
> > --- a/src/gallium/drivers/r600/evergreen_compute.h
> > +++ b/src/gallium/drivers/r600/evergreen_compute.h
> > @@ -45,6 +45,7 @@ void evergreen_init_atom_start_compute_cs(struct r600_context *rctx);
> >  void evergreen_init_compute_state_functions(struct r600_context *rctx);
> >  void evergreen_emit_cs_shader(struct r600_context *rctx, struct r600_atom * atom);
> > +struct r600_resource* r600_compute_buffer_alloc_vram(struct r600_screen *screen, unsigned size);
> >  struct pipe_resource *r600_compute_global_buffer_create(struct pipe_screen *screen, const struct pipe_resource *templ);
> >  void r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct pipe_resource *res);
> >  void *r600_compute_global_transfer_map(
> > 
> > diff --git a/src/gallium/drivers/r600/r600_blit.c b/src/gallium/drivers/r600/r600_blit.c
> > index f766e37..b334a75 100644
> > --- a/src/gallium/drivers/r600/r600_blit.c
> > +++ b/src/gallium/drivers/r600/r600_blit.c
> > @@ -21,6 +21,8 @@
> > 
> >   * USE OR OTHER DEALINGS IN THE SOFTWARE.
> >   */
> >  
> >  #include "r600_pipe.h"
> > 
> > +#include "compute_memory_pool.h"
> > +#include "evergreen_compute.h"
> > 
> >  #include "util/u_surface.h"
> >  #include "util/u_format.h"
> >  #include "evergreend.h"
> > 
> > @@ -514,29 +516,52 @@ static void r600_copy_buffer(struct pipe_context *ctx, struct pipe_resource *dst
> >   * into a single global resource (r600_screen::global_pool).  The means
> >   * they don't have their own cs_buf handle, so they cannot be passed
> >   * to r600_copy_buffer() and must be handled separately.
> > 
> > - *
> > - * XXX: It should be possible to implement this function using
> > - * r600_copy_buffer() by passing the memory_pool resource as both src
> > - * and dst and updating dstx and src_box to point to the correct offsets.
> > - * This would likely perform better than the current implementation.
> > 
> >   */
> >  
> >  static void r600_copy_global_buffer(struct pipe_context *ctx,
> >  
> > struct pipe_resource *dst, unsigned
> > dstx, struct pipe_resource *src,
> > const struct pipe_box *src_box)
> >  
> >  {
> > 
> > -   struct pipe_box dst_box; struct pipe_transfer *src_pxfer,
> > -   *dst_pxfer;
> > -
> > -   u_box_1d(dstx, src_box->width, &dst_box);
> > -   void *src_ptr = ctx->transfer_map(ctx, src, 0, PIPE_TRANSFER_READ,
> > - src_box, &src_pxfer);
> > -   void *dst_ptr = ctx->transfer_map(ctx, dst, 0, PIPE_TRANSFER_WRITE,
> > - &dst_box, &dst_pxfer);
> > -   memcpy(dst_ptr, src_ptr, src_box->width);
> > -
> > -   ctx->transfer_unmap(ctx, src_pxfer);
> > -   ctx->transfer_unmap(ctx, dst_pxfer);
> > +   struct r600_context *rctx = (struct r600_context*)ctx;
> > +   struct compute_memory_pool *pool = rctx->screen->global_pool;
> > +   struct pipe_box new_src_box = *src_box;
> > +
> > +   if (src->bind & PIPE_BIND_GLOBAL) {
> > +   str

[Mesa-dev] [PATCH] egl/gallium: Set defines for supported APIs when using automake

2014-06-10 Thread Niels Ole Salscheider

This fixes automake builds which are broken since
b52a530ce2aada1967bc8fefa83ab53e6a737dae.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/state_trackers/egl/Makefile.am | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/state_trackers/egl/Makefile.am 
b/src/gallium/state_trackers/egl/Makefile.am
index b7dcdab..828bf13 100644
--- a/src/gallium/state_trackers/egl/Makefile.am
+++ b/src/gallium/state_trackers/egl/Makefile.am
@@ -88,3 +88,23 @@ AM_CPPFLAGS += \
-I$(top_srcdir)/src/gallium/winsys/sw \
-DHAVE_NULL_BACKEND
 endif
+
+if HAVE_OPENGL
+AM_CPPFLAGS += \
+   -DFEATURE_GL=1
+endif
+
+if HAVE_OPENGL_ES1
+AM_CPPFLAGS += \
+   -DFEATURE_ES1=1
+endif
+
+if HAVE_OPENGL_ES2
+AM_CPPFLAGS += \
+   -DFEATURE_ES2=1
+endif
+
+if HAVE_OPENVG
+AM_CPPFLAGS += \
+   -DFEATURE_VG=1
+endif
-- 
2.0.0

Re: [Mesa-dev] [PATCH] egl/gallium: Set defines for supported APIs when using automake

2014-06-10 Thread Niels Ole Salscheider

On Tuesday 10 June 2014, 16:18:56, Emil Velikov wrote:
> On 10/06/14 15:17, Niels Ole Salscheider wrote:
> > This fixes automake builds which are broken since
> > b52a530ce2aada1967bc8fefa83ab53e6a737dae.
> 
> Not sure what I was smoking with the above mentioned patch.
> Seem like I've completely forgotten about automake :\
> 
> Niels can you please drop the FEATURE* defines from
> src/gallium/targets/egl-static/Makefile.am

I think they are still necessary since src/gallium/targets/egl-static/egl_st.c 
contains these flags, too... Or am I missing something?
I have seen that you removed them in b52a530ce2aada1967bc8fefa83ab53e6a737dae 
for the other build systems...
 
> With that fixed
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79885
> Reviewed-by: Emil Velikov 
> 
> Thanks
> Emil
> 
> > Signed-off-by: Niels Ole Salscheider 
> > ---
> > 
> >  src/gallium/state_trackers/egl/Makefile.am | 20 
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/src/gallium/state_trackers/egl/Makefile.am b/src/gallium/state_trackers/egl/Makefile.am
> > index b7dcdab..828bf13 100644
> > --- a/src/gallium/state_trackers/egl/Makefile.am
> > +++ b/src/gallium/state_trackers/egl/Makefile.am
> > @@ -88,3 +88,23 @@ AM_CPPFLAGS += \
> > 
> > -I$(top_srcdir)/src/gallium/winsys/sw \
> > -DHAVE_NULL_BACKEND
> >  
> >  endif
> > 
> > +
> > +if HAVE_OPENGL
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_GL=1
> > +endif
> > +
> > +if HAVE_OPENGL_ES1
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_ES1=1
> > +endif
> > +
> > +if HAVE_OPENGL_ES2
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_ES2=1
> > +endif
> > +
> > +if HAVE_OPENVG
> > +AM_CPPFLAGS += \
> > +   -DFEATURE_VG=1
> > +endif

[Mesa-dev] [PATCH v2] egl/gallium: Set defines for supported APIs when using automake

2014-06-11 Thread Niels Ole Salscheider

This fixes automake builds which are broken since
b52a530ce2aada1967bc8fefa83ab53e6a737dae.

v2: This patch also adds the FEATURE_* defines back to targets/egl-static for
Android and Scons that have been removed in the mentioned commit.

Signed-off-by: Niels Ole Salscheider 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79885
---
 src/gallium/state_trackers/egl/Makefile.am | 20 
 src/gallium/targets/egl-static/Android.mk  |  2 ++
 src/gallium/targets/egl-static/SConscript  |  6 ++
 3 files changed, 28 insertions(+)

diff --git a/src/gallium/state_trackers/egl/Makefile.am 
b/src/gallium/state_trackers/egl/Makefile.am
index b7dcdab..828bf13 100644
--- a/src/gallium/state_trackers/egl/Makefile.am
+++ b/src/gallium/state_trackers/egl/Makefile.am
@@ -88,3 +88,23 @@ AM_CPPFLAGS += \
-I$(top_srcdir)/src/gallium/winsys/sw \
-DHAVE_NULL_BACKEND
 endif
+
+if HAVE_OPENGL
+AM_CPPFLAGS += \
+   -DFEATURE_GL=1
+endif
+
+if HAVE_OPENGL_ES1
+AM_CPPFLAGS += \
+   -DFEATURE_ES1=1
+endif
+
+if HAVE_OPENGL_ES2
+AM_CPPFLAGS += \
+   -DFEATURE_ES2=1
+endif
+
+if HAVE_OPENVG
+AM_CPPFLAGS += \
+   -DFEATURE_VG=1
+endif
diff --git a/src/gallium/targets/egl-static/Android.mk 
b/src/gallium/targets/egl-static/Android.mk
index 01408a7..37244b5 100644
--- a/src/gallium/targets/egl-static/Android.mk
+++ b/src/gallium/targets/egl-static/Android.mk
@@ -31,6 +31,8 @@ LOCAL_SRC_FILES := \
egl_st.c
 
 LOCAL_CFLAGS := \
+   -DFEATURE_ES1=1 \
+   -DFEATURE_ES2=1 \
-D_EGL_MAIN=_eglBuiltInDriverGALLIUM
 
 LOCAL_C_INCLUDES := \
diff --git a/src/gallium/targets/egl-static/SConscript 
b/src/gallium/targets/egl-static/SConscript
index 7d8d4d2..afb5c11 100644
--- a/src/gallium/targets/egl-static/SConscript
+++ b/src/gallium/targets/egl-static/SConscript
@@ -63,6 +63,11 @@ if env['platform'] == 'windows':
 
 # OpenGL ES and OpenGL
 if env['gles']:
+env.Append(CPPDEFINES = [
+'FEATURE_GL=1',
+'FEATURE_ES1=1',
+'FEATURE_ES2=1'
+])
 env.Prepend(LIBPATH = [shared_glapi.dir])
 # manually add LIBPREFIX on windows
 glapi_name = 'glapi' if env['platform'] != 'windows' else 'libglapi'
@@ -70,6 +75,7 @@ if env['gles']:
 
 # OpenVG
 if True:
+env.Append(CPPDEFINES = ['FEATURE_VG=1'])
 env.Prepend(LIBPATH = [openvg.dir])
 # manually add LIBPREFIX on windows
 openvg_name = 'OpenVG' if env['platform'] != 'windows' else 'libOpenVG'
-- 
2.0.0

Re: [Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result

2014-07-16 Thread Niels Ole Salscheider

On Wednesday 16 July 2014, 16:49:08, Tom Stellard wrote:
> Also change the wait parameter from false to true.
> ---
> 
> I'm really not sure what is correct here, but this patch fixes event
> profiling on SI.

I think you should call end_query in the constructor, right after the call to 
create_query. You want the corresponding packet to be emitted as soon as the 
query is created, not when you are interested in the result (i.e. when the 
corresponding event has occurred).
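
To illustrate the ordering, here is a toy model in plain C. It is not the
clover or gallium code: fake_gpu_clock and every toy_* name are made up for
this sketch, and the "packet" is just a latch of the current clock value.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the GPU clock; it advances as work is processed. */
static uint64_t fake_gpu_clock;

struct toy_query {
   uint64_t timestamp; /* value latched when the timestamp packet runs */
   int emitted;        /* nonzero once the packet has been emitted */
};

/* Emit the timestamp packet: latch the current clock value. */
void toy_end_query(struct toy_query *q)
{
   q->timestamp = fake_gpu_clock;
   q->emitted = 1;
}

/* Create the query and emit the packet right away. */
struct toy_query toy_query_create(void)
{
   struct toy_query q = { 0, 0 };
   toy_end_query(&q);
   return q;
}

/* Reading the result later only returns what was latched. */
uint64_t toy_query_result(const struct toy_query *q)
{
   assert(q->emitted);
   return q->timestamp;
}

/* Other work submitted to the queue moves the clock forward. */
void toy_submit_work(uint64_t ticks)
{
   fake_gpu_clock += ticks;
}
```

In this model, a query created after 100 ticks of work still reports 100 if
it is read after more work has been submitted. If the packet were emitted at
read time instead, the result would include the later work.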

>  src/gallium/state_trackers/clover/core/timestamp.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp b/src/gallium/state_trackers/clover/core/timestamp.cpp
> index 481c4f9..a6edaf6 100644
> --- a/src/gallium/state_trackers/clover/core/timestamp.cpp
> +++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
> @@ -47,7 +47,8 @@ cl_ulong
>  timestamp::query::operator()() const {
> pipe_query_result result;
> 
> -   if (!q().pipe->get_query_result(q().pipe, _query, false, &result))
> +   q().pipe->end_query(q().pipe, _query);
> +   if (!q().pipe->get_query_result(q().pipe, _query, true, &result))
>throw error(CL_PROFILING_INFO_NOT_AVAILABLE);
> 
> return result.u64;
> --
> 1.8.1.5
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result v2

2014-07-16 Thread Niels Ole Salscheider

Reviewed-by: Niels Ole Salscheider 

On Wednesday 16 July 2014, 17:37:48, Tom Stellard wrote:
> v2:
>   - Move the end_query() call into the timestamp constructor.
>   - Still pass false as the wait parameter to get_query_result().
> ---
>  src/gallium/state_trackers/clover/core/timestamp.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp b/src/gallium/state_trackers/clover/core/timestamp.cpp
> index 481c4f9..3fd341f 100644
> --- a/src/gallium/state_trackers/clover/core/timestamp.cpp
> +++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
> @@ -30,6 +30,7 @@ using namespace clover;
>  timestamp::query::query(command_queue &q) :
> q(q),
> _query(q.pipe->create_query(q.pipe, PIPE_QUERY_TIMESTAMP, 0)) {
> +   q.pipe->end_query(q.pipe, _query);
>  }
> 
>  timestamp::query::query(query &&other) :
> --
> 1.8.1.5

[Mesa-dev] [PATCH 2/2] radeonsi: Implement DMA blit

2014-03-17 Thread Niels Ole Salscheider

This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.

I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/Makefile.sources |   1 +
 src/gallium/drivers/radeonsi/si_dma.c | 352 ++
 src/gallium/drivers/radeonsi/si_pipe.h|   9 +
 src/gallium/drivers/radeonsi/si_state.c   |  25 +-
 src/gallium/drivers/radeonsi/si_state.h   |   7 +
 src/gallium/drivers/radeonsi/sid.h|  20 ++
 6 files changed, 394 insertions(+), 20 deletions(-)
 create mode 100644 src/gallium/drivers/radeonsi/si_dma.c

diff --git a/src/gallium/drivers/radeonsi/Makefile.sources 
b/src/gallium/drivers/radeonsi/Makefile.sources
index 11b3319..6a24cde 100644
--- a/src/gallium/drivers/radeonsi/Makefile.sources
+++ b/src/gallium/drivers/radeonsi/Makefile.sources
@@ -3,6 +3,7 @@ C_SOURCES := \
si_commands.c \
si_compute.c \
si_descriptors.c \
+   si_dma.c \
si_hw_context.c \
si_pipe.c \
si_pm4.c \
diff --git a/src/gallium/drivers/radeonsi/si_dma.c 
b/src/gallium/drivers/radeonsi/si_dma.c
new file mode 100644
index 000..61078eb
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/si_dma.c
@@ -0,0 +1,352 @@
+/*
+ * Copyright 2010 Jerome Glisse 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * on the rights to use, copy, modify, merge, publish, distribute, sub
+ * license, and/or sell copies of the Software, and to permit persons to whom
+ * the Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *  Jerome Glisse
+ */
+
+#include "sid.h"
+#include "si_pipe.h"
+#include "../radeon/r600_cs.h"
+
+#include "util/u_format.h"
+
+static unsigned si_array_mode(unsigned mode)
+{
+   switch (mode) {
+   case RADEON_SURF_MODE_LINEAR_ALIGNED:
+   return V_009910_ARRAY_LINEAR_ALIGNED;
+   case RADEON_SURF_MODE_1D:
+   return V_009910_ARRAY_1D_TILED_THIN1;
+   case RADEON_SURF_MODE_2D:
+   return V_009910_ARRAY_2D_TILED_THIN1;
+   default:
+   case RADEON_SURF_MODE_LINEAR:
+   return V_009910_ARRAY_LINEAR_GENERAL;
+   }
+}
+
+static uint32_t si_num_banks(uint32_t nbanks)
+{
+   switch (nbanks) {
+   case 2:
+   return V_009910_ADDR_SURF_2_BANK;
+   case 4:
+   return V_009910_ADDR_SURF_4_BANK;
+   case 8:
+   default:
+   return V_009910_ADDR_SURF_8_BANK;
+   case 16:
+   return V_009910_ADDR_SURF_16_BANK;
+   }
+}
+
+static uint32_t si_micro_tile_mode(struct si_screen *sscreen, unsigned 
tile_mode)
+{
+   if (sscreen->b.info.si_tile_mode_array_valid) {
+   uint32_t gb_tile_mode = 
sscreen->b.info.si_tile_mode_array[tile_mode];
+
+   return G_009910_MICRO_TILE_MODE(gb_tile_mode);
+   }
+
+   /* The kernel cannot return the tile mode array. Guess? */
+   return V_009910_ADDR_SURF_THIN_MICRO_TILING;
+}
+
+static void si_dma_copy_buffer(struct si_context *ctx,
+   struct pipe_resource *dst,
+   struct pipe_resource *src,
+   uint64_t dst_offset,
+   uint64_t src_offset,
+   uint64_t size)
+{
+   struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
+   unsigned i, ncopy, csize, max_csize, sub_cmd, shift;
+   struct r600_resource *rdst = (struct r600_resource*)dst;
+   struct r600_resource *rsrc = (struct r600_resource*)src;
+
+   /* Mark the buffer range of destination as valid (initialized),
+* so that transfer_map knows it should wait for the GPU when mapping
+* that range. */
+   util_range_add(&rdst->valid_buffer_range, dst_offset,
+

[Mesa-dev] [PATCH 1/2] radeon: Move r600_need_dma_space to common code

2014-03-17 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/evergreen_hw_context.c |  2 +-
 src/gallium/drivers/r600/evergreen_state.c  |  2 +-
 src/gallium/drivers/r600/r600_hw_context.c  | 12 +---
 src/gallium/drivers/r600/r600_pipe.h|  1 -
 src/gallium/drivers/r600/r600_state.c   |  2 +-
 src/gallium/drivers/radeon/r600_pipe_common.c   | 10 ++
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index 083b697..a433876 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -62,7 +62,7 @@ void evergreen_dma_copy(struct r600_context *rctx,
}
ncopy = (size / 0x000f) + !!(size % 0x000f);
 
-   r600_need_dma_space(rctx, ncopy * 5);
+   r600_need_dma_space(&rctx->b, ncopy * 5);
for (i = 0; i < ncopy; i++) {
csize = size < 0x000f ? size : 0x000f;
/* emit reloc before writting cs so that cs is always in 
consistent state */
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 05cc3ef..b929f17 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3295,7 +3295,7 @@ static void evergreen_dma_copy_tile(struct r600_context 
*rctx,
 
size = (copy_height * pitch) >> 2;
ncopy = (size / 0x000f) + !!(size % 0x000f);
-   r600_need_dma_space(rctx, ncopy * 9);
+   r600_need_dma_space(&rctx->b, ncopy * 9);
 
for (i = 0; i < ncopy; i++) {
cheight = copy_height;
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 3a3b3d5..75723be 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -440,16 +440,6 @@ void r600_cp_dma_copy_buffer(struct r600_context *rctx,
 R600_CONTEXT_INV_TEX_CACHE;
 }
 
-void r600_need_dma_space(struct r600_context *ctx, unsigned num_dw)
-{
-   /* The number of dwords we already used in the DMA so far. */
-   num_dw += ctx->b.rings.dma.cs->cdw;
-   /* Flush if there's not enough space. */
-   if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
-   ctx->b.rings.dma.flush(ctx, RADEON_FLUSH_ASYNC);
-   }
-}
-
 void r600_dma_copy(struct r600_context *rctx,
struct pipe_resource *dst,
struct pipe_resource *src,
@@ -475,7 +465,7 @@ void r600_dma_copy(struct r600_context *rctx,
shift = 2;
ncopy = (size / 0x) + !!(size % 0x);
 
-   r600_need_dma_space(rctx, ncopy * 5);
+   r600_need_dma_space(&rctx->b, ncopy * 5);
for (i = 0; i < ncopy; i++) {
csize = size < 0x ? size : 0x;
/* emit reloc before writting cs so that cs is always in 
consistent state */
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index a3827e3..0472eaa 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -586,7 +586,6 @@ void r600_context_flush(struct r600_context *ctx, unsigned 
flags);
 void r600_begin_new_cs(struct r600_context *ctx);
 void r600_flush_emit(struct r600_context *ctx);
 void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean 
count_draw_in);
-void r600_need_dma_space(struct r600_context *ctx, unsigned num_dw);
 void r600_cp_dma_copy_buffer(struct r600_context *rctx,
 struct pipe_resource *dst, uint64_t dst_offset,
 struct pipe_resource *src, uint64_t src_offset,
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 39e38f4..6c8222b 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2856,7 +2856,7 @@ static boolean r600_dma_copy_tile(struct r600_context 
*rctx,
 */
cheight = ((0x << 2) / pitch) & 0xfff8;
ncopy = (copy_height / cheight) + !!(copy_height % cheight);
-   r600_need_dma_space(rctx, ncopy * 7);
+   r600_need_dma_space(&rctx->b, ncopy * 7);
 
for (i = 0; i < ncopy; i++) {
cheight = cheight > copy_height ? copy_height : cheight;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 05ada1c..35901c8 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -727,3 +727,13 @@ void r600_screen_clear_buffer(struct r600_common_screen 
*rscreen, struct pipe_re
rscreen->aux_context->flush(rscreen->aux_context, NULL, 0);

Re: [Mesa-dev] [PATCH 2/2] radeonsi: Implement DMA blit

2014-03-17 Thread Niels Ole Salscheider

I have sent an updated version of the patch to the mailing list.
I hope that the copyright header of si_dma.c is right - I copied it from 
si_hw_context.c...

Ole

On Monday 17 March 2014, 02:33:35, Marek Olšák wrote:
> Thanks for doing this! I have some comments...
> 
> 1) As of SI, the maximum supported size for dword-aligned L2L, L2T,
> and T2L copies is 0x8. The maximum supported size for byte-aligned
> L2L copies is 0xfffe0. I'd like to have proper definitions for this,
> e.g. SI_DMA_COPY_MAX_SIZE and SI_DMA_COPY_MAX_SIZE_DW. All occurrences
> of 0x000f should be replaced appropriately.
> 
> Now the cosmetic stuff.
> 
> 2) This is quite a lot of code, so I'd like all of this to be in a
> separate file, e.g. si_dma.c.
> 
> 3) r600/si_need_cs_space could be moved to drivers/radeon.
> 
> 4) All calls to r600_context_bo_reloc could be moved out of the loops,
> because SI supports virtual memory and therefore it's not required to
> call the function before every packet. See also my explanation in
> patch "winsys/radeon: only add duplicate relocations for DMA if VM
> isn't supported".
> 
> 5) Flushing the gfx CS is not required, because r600_context_bo_reloc
> flushes it for you.
> 
> Please see also my latest DMA patches for r600g.
> 
> Thanks.
> 
> Marek
> 
> On Thu, Mar 13, 2014 at 8:45 AM, Niels Ole Salscheider wrote:
> > This code is a slightly modified version of evergreen_dma_blit (and
> > evergreen_dma_copy as well as evergreen_dma_copy_tile).
> > It would be nice to share some of the code in the long term.
> > 
> > I have reused some "cik"-prefixed functions that also return the right
> > value for SI. I am not sure if they should be renamed.
> > 
> > Signed-off-by: Niels Ole Salscheider 
> > ---
> > 
> >  src/gallium/drivers/radeonsi/si_hw_context.c |  65 +++
> >  src/gallium/drivers/radeonsi/si_pipe.h       |   7 +
> >  src/gallium/drivers/radeonsi/si_state.c      | 265 ++-
> >  src/gallium/drivers/radeonsi/sid.h           |  15 ++
> >  4 files changed, 346 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c b/src/gallium/drivers/radeonsi/si_hw_context.c
> > index d9fba01..76583a3 100644
> > --- a/src/gallium/drivers/radeonsi/si_hw_context.c
> > +++ b/src/gallium/drivers/radeonsi/si_hw_context.c
> > @@ -25,6 +25,8 @@
> > 
> >   */
> >  
> >  #include "si_pipe.h"
> > 
> > +#include "sid.h"
> > +#include "../radeon/r600_cs.h"
> > 
> >  /* initialize */
> >  void si_need_cs_space(struct si_context *ctx, unsigned num_dw,
> > 
> > @@ -186,6 +188,69 @@ void si_begin_new_cs(struct si_context *ctx)
> > 
> > ctx->b.initial_gfx_cs_size = ctx->b.rings.gfx.cs->cdw;
> >  
> >  }
> > 
> > +void si_need_dma_space(struct si_context *ctx, unsigned num_dw)
> > +{
> > +   /* The number of dwords we already used in the DMA so far. */
> > +   num_dw += ctx->b.rings.dma.cs->cdw;
> > +   /* Flush if there's not enough space. */
> > +   if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
> > +   ctx->b.rings.dma.flush(ctx, RADEON_FLUSH_ASYNC);
> > +   }
> > +}
> > +
> > +void si_dma_copy(struct si_context *ctx,
> > +struct pipe_resource *dst,
> > +struct pipe_resource *src,
> > +uint64_t dst_offset,
> > +uint64_t src_offset,
> > +uint64_t size)
> > +{
> > +   struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
> > +   unsigned i, ncopy, csize, sub_cmd, shift;
> > +   struct r600_resource *rdst = (struct r600_resource*)dst;
> > +   struct r600_resource *rsrc = (struct r600_resource*)src;
> > +
> > +   /* Mark the buffer range of destination as valid (initialized),
> > +* so that transfer_map knows it should wait for the GPU when mapping
> > +* that range. */
> > +   util_range_add(&rdst->valid_buffer_range, dst_offset,
> > +  dst_offset + size);
> > +
> > +   /* make sure that the dma ring is only one active */
> > +   ctx->b.rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC);
> > +   dst_offset += r600_resource_va(&ctx->screen->b.b, dst);
> > +   src_offset += r600_resource_va(&ctx->screen->b.b, src);
> > +
> > +   /* see if we use dword or byte copy */
> > +   if (!(dst_o

Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-08-09 Thread Niels Ole Salscheider

On Tuesday 04 March 2014, 02:08:58, Marek Olšák wrote:
> Could you please do this without changing u_upload_mgr? You can still
> use u_upload_alloc to allocate buffer memory in the driver and the map
> buffer read/write flags are not important with persistent coherent
> buffer mappings anyway.

Since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 we allocate CPU -> GPU 
streaming buffers (i.e. those with PIPE_USAGE_STREAM) in VRAM.
We should therefore set buffer.usage to PIPE_USAGE_STAGING in 
u_upload_alloc_buffer when we use u_upload_mgr for downloads; otherwise we 
won't get any performance improvement.
Would it be OK to change u_upload_mgr now, or do you have a better proposal?
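
A toy sketch of the placement decision I have in mind. The toy_* names are
made up, and the mapping of staging buffers to GTT (system memory that the
CPU can read quickly) is my assumption about the driver policy, not a quote
of the actual code:

```c
#include <assert.h>

enum toy_usage  { TOY_USAGE_STREAM, TOY_USAGE_STAGING };
enum toy_domain { TOY_DOMAIN_VRAM, TOY_DOMAIN_GTT };

/* Assumed policy: streaming buffers go to VRAM (as described above),
 * staging buffers go to GTT. */
enum toy_domain toy_placement(enum toy_usage usage)
{
   switch (usage) {
   case TOY_USAGE_STREAM:
      return TOY_DOMAIN_VRAM;
   case TOY_USAGE_STAGING:
      return TOY_DOMAIN_GTT;
   }
   assert(!"unknown usage");
   return TOY_DOMAIN_GTT;
}

/* A buffer the CPU will read back should be a staging buffer, so that
 * the CPU does not end up reading from VRAM again. */
enum toy_usage toy_usage_for_transfer(int cpu_reads_back)
{
   return cpu_reads_back ? TOY_USAGE_STAGING : TOY_USAGE_STREAM;
}
```

With the usage fixed to "stream", a download would be copied from VRAM to
another VRAM buffer and the slow CPU read would remain.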

Ole

> Marek
> 
> On Mon, Mar 3, 2014 at 9:29 PM, Niels Ole Salscheider wrote:
> > Using the DMA engine for buffer downloads vastly improves performance.
> > This is because reads from VRAM by the CPU are slow because of the high
> > latency of the PCIe bus.
> > 
> > The first patch allows u_upload_mgr to be used for downloads, too. The
> > second patch then uses u_upload_mgr in the radeon driver for downloads.
> > I considered to rename u_upload_mgr to u_transfer_mgr since it might be
> > confusing that an "upload manager" can be used for downloads. But then
> > again we also have "transfers" so that u_transfer_mgr might also be
> > confusing. Thus, I decided not to rename it for now.
> > 
> > Without these patches, the buffer_bandwidth benchmark from uCLbench gives
> > me:
> > 
> > ./buffer_bandwidth --size=2000 --iterations=100
> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> > 
> > 1/1 direct 2000 Bytes   759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)
> > 
> > With these patches, the read performance is much better:
> > 
> > ./buffer_bandwidth --size=2000 --iterations=100
> > # device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
> > 
> > 1/1 direct 2000 Bytes   759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)
> > 
> > Judging by these numbers, it might even make sense to use the DMA engine
> > for larger buffer downloads...
> > 
> > Niels Ole Salscheider (2):
> >   util/u_upload_mgr: Allow to also use it for downloads
> >   radeon: Use transfer manager for buffer downloads
> >  
> >  src/gallium/auxiliary/hud/hud_context.c |  3 +-
> >  src/gallium/auxiliary/util/u_blitter.c  |  3 +-
> >  src/gallium/auxiliary/util/u_upload_mgr.c   | 49 +++-
> >  src/gallium/auxiliary/util/u_upload_mgr.h   | 13 -
> >  src/gallium/auxiliary/util/u_vbuf.c |  3 +-
> >  src/gallium/auxiliary/vl/vl_compositor.c|  3 +-
> >  src/gallium/drivers/ilo/ilo_context.c   |  3 +-
> >  src/gallium/drivers/r300/r300_context.c |  3 +-
> >  src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
> >  src/gallium/drivers/radeon/r600_pipe_common.c   | 14 -
> >  src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
> >  src/mesa/state_tracker/st_context.c |  9 ++-
> >  12 files changed, 136 insertions(+), 46 deletions(-)
> > 
> > --
> > 1.9.0
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] gallium/radeon: Set gpu_address to 0 if r600_virtual_address is false

2014-08-10 Thread Niels Ole Salscheider

Without this patch I get the following during DMA transfers:
[drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
radeon :01:00.0: CP DMA dst buffer too small (21475829792 4096)

This is a fixup for e878e154cdfd4dbb5474f776e0a6d86fcb983098.
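
A toy model of the change, for illustration only (toy_resource and
toy_init_resource are made-up names, not the driver's types). The point is
that the field must be written on both paths, because the resource may be
initialized again later and would otherwise keep a stale value:

```c
#include <assert.h>
#include <stdint.h>

struct toy_resource {
   uint64_t gpu_address;
};

/* (Re)initialize the address. Without the else branch, a resource that
 * is reused on a device without virtual addressing would keep whatever
 * value the field held before. */
void toy_init_resource(struct toy_resource *res,
                       int has_virtual_address, uint64_t va)
{
   assert(res);
   if (has_virtual_address)
      res->gpu_address = va;
   else
      res->gpu_address = 0;
}
```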

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index a580685..22bc97e 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -171,6 +171,8 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
 
if (rscreen->info.r600_virtual_address)
res->gpu_address = 
rscreen->ws->buffer_get_virtual_address(res->cs_buf);
+   else
+   res->gpu_address = 0;
 
pb_reference(&old_buf, NULL);
 
-- 
2.0.4

[Mesa-dev] [PATCH] gallium/radeon: Do not use u_upload_mgr for buffer downloads

2014-08-14 Thread Niels Ole Salscheider

Instead create a staging texture with pipe_buffer_create and
PIPE_USAGE_STAGING.

u_upload_mgr sets the usage of its staging buffer to PIPE_USAGE_STREAM.
But since 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 CPU -> GPU streaming buffers
are created in VRAM. Therefore the staging texture (in VRAM) does not offer any
performance improvements for buffer downloads.
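
The size and offset arithmetic used for the staging buffer can be sketched
as follows. This is a standalone illustration: the struct and function names
are made up, and the alignment is passed in as a parameter instead of using
R600_MAP_BUFFER_ALIGNMENT, whose value is not assumed here.

```c
#include <assert.h>

struct toy_staging_layout {
   unsigned size;        /* bytes to allocate for the staging buffer */
   unsigned data_offset; /* where the requested range starts inside it */
};

/* Keep the misalignment of box_x inside the staging buffer, so the
 * copied range has the same offset relative to an alignment boundary
 * as it has in the source buffer. */
struct toy_staging_layout
toy_compute_staging_layout(unsigned box_x, unsigned box_width,
                           unsigned alignment)
{
   struct toy_staging_layout l;

   assert(alignment > 0);
   l.data_offset = box_x % alignment;
   l.size = box_width + l.data_offset;
   return l;
}
```

For example, with box_x = 100, box_width = 256 and an alignment of 64, the
data starts at offset 36 and the buffer is 292 bytes long.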

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index 22bc97e..ee05776 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -303,26 +303,22 @@ static void *r600_buffer_transfer_map(struct pipe_context *ctx,
 		 !(usage & PIPE_TRANSFER_WRITE) &&
 		 rbuffer->domains == RADEON_DOMAIN_VRAM &&
 		 r600_can_dma_copy_buffer(rctx, 0, box->x, box->width)) {
-		unsigned offset;
-		struct r600_resource *staging = NULL;
-
-		u_upload_alloc(rctx->uploader, 0,
-			       box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT),
-			       &offset, (struct pipe_resource**)&staging, (void**)&data);
+		struct r600_resource *staging;
 
+		staging = (struct r600_resource*) pipe_buffer_create(
+			ctx->screen, PIPE_BIND_TRANSFER_READ, PIPE_USAGE_STAGING,
+			box->width + (box->x % R600_MAP_BUFFER_ALIGNMENT));
 		if (staging) {
-			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
-
 			/* Copy the VRAM buffer to the staging buffer. */
 			rctx->dma_copy(ctx, &staging->b.b, 0,
-				       offset + box->x % R600_MAP_BUFFER_ALIGNMENT,
+				       box->x % R600_MAP_BUFFER_ALIGNMENT,
 				       0, 0, resource, level, box);
 
-			/* Just do the synchronization. The buffer is mapped already. */
-			r600_buffer_map_sync_with_rings(rctx, staging, PIPE_TRANSFER_READ);
+			data = r600_buffer_map_sync_with_rings(rctx, staging, PIPE_TRANSFER_READ);
+			data += box->x % R600_MAP_BUFFER_ALIGNMENT;
 
 			return r600_buffer_get_transfer(ctx, resource, level, usage, box,
-							ptransfer, data, staging, offset);
+							ptransfer, data, staging, 0);
 		}
 	}
 
-- 
2.0.4
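
Editorial note: the offset arithmetic in the patch above can be sketched in
isolation. This is only an illustration, not driver code. The helper names
are hypothetical, and MAP_BUFFER_ALIGNMENT is a stand-in for
R600_MAP_BUFFER_ALIGNMENT whose value (64) is assumed here. The staging
buffer is over-allocated by box->x modulo the alignment, the DMA copy lands
at that same offset, and the pointer returned to the caller is advanced by
it, so the mapped data keeps the alignment of the source range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for R600_MAP_BUFFER_ALIGNMENT; the value 64 is an
 * assumption made for this sketch. */
#define MAP_BUFFER_ALIGNMENT 64u

/* Offset inside the staging buffer at which the downloaded data starts. */
static unsigned staging_offset(unsigned x)
{
	return x % MAP_BUFFER_ALIGNMENT;
}

/* Bytes to allocate for a download of `width` bytes starting at byte `x`. */
static size_t staging_size(unsigned x, unsigned width)
{
	return (size_t)width + staging_offset(x);
}

/* Pointer handed back to the caller, given the CPU mapping of the
 * staging buffer. */
static unsigned char *staging_data(unsigned char *map, unsigned x)
{
	assert(map != NULL);
	return map + staging_offset(x);
}
```

For example, a 256-byte read starting at byte 100 would need a 292-byte
staging buffer under this assumption, with the data starting 36 bytes in.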

[Mesa-dev] [PATCH] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-07 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/evergreen_compute.c | 27 ---
 src/gallium/drivers/r600/evergreen_compute.h |  1 +
 src/gallium/drivers/r600/r600_blit.c | 40 
 3 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index 38b78c7..b495868 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -953,6 +953,22 @@ void r600_compute_global_buffer_destroy(
free(res);
 }
 
+void r600_compute_global_demote_or_alloc(
+   struct compute_memory_pool *pool,
+   struct compute_memory_item *item,
+   struct pipe_context *ctx)
+{
+   if (is_item_in_pool(item)) {
+   compute_memory_demote_item(pool, item, ctx);
+   } else {
+   if (item->real_buffer == NULL) {
+   item->real_buffer = (struct r600_resource*)
+   
r600_compute_buffer_alloc_vram(pool->screen, item->size_in_dw * 4);
+   }
+   }
+
+}
+
 void *r600_compute_global_transfer_map(
struct pipe_context *ctx_,
struct pipe_resource *resource,
@@ -970,16 +986,7 @@ void *r600_compute_global_transfer_map(
struct pipe_resource *dst = NULL;
unsigned offset = box->x;
 
-   if (is_item_in_pool(item)) {
-   compute_memory_demote_item(pool, item, ctx_);
-   }
-   else {
-   if (item->real_buffer == NULL) {
-   item->real_buffer = (struct r600_resource*)
-   
r600_compute_buffer_alloc_vram(pool->screen, item->size_in_dw * 4);
-   }
-   }
-
+   r600_compute_global_demote_or_alloc(pool, item, ctx_);
dst = (struct pipe_resource*)item->real_buffer;
 
if (usage & PIPE_TRANSFER_READ)
diff --git a/src/gallium/drivers/r600/evergreen_compute.h 
b/src/gallium/drivers/r600/evergreen_compute.h
index 4fb53a1..39bb854 100644
--- a/src/gallium/drivers/r600/evergreen_compute.h
+++ b/src/gallium/drivers/r600/evergreen_compute.h
@@ -47,6 +47,7 @@ void evergreen_emit_cs_shader(struct r600_context *rctx, 
struct r600_atom * atom
 
 struct pipe_resource *r600_compute_global_buffer_create(struct pipe_screen 
*screen, const struct pipe_resource *templ);
 void r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct 
pipe_resource *res);
+void r600_compute_global_demote_or_alloc(struct compute_memory_pool *pool, 
struct compute_memory_item *item, struct pipe_context *ctx);
 void *r600_compute_global_transfer_map(
struct pipe_context *ctx_,
struct pipe_resource *resource,
diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index f766e37..f6471cb 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -21,6 +21,8 @@
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 #include "r600_pipe.h"
+#include "compute_memory_pool.h"
+#include "evergreen_compute.h"
 #include "util/u_surface.h"
 #include "util/u_format.h"
 #include "evergreend.h"
@@ -514,29 +516,33 @@ static void r600_copy_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst
  * into a single global resource (r600_screen::global_pool).  The means
  * they don't have their own cs_buf handle, so they cannot be passed
  * to r600_copy_buffer() and must be handled separately.
- *
- * XXX: It should be possible to implement this function using
- * r600_copy_buffer() by passing the memory_pool resource as both src
- * and dst and updating dstx and src_box to point to the correct offsets.
- * This would likely perform better than the current implementation.
  */
 static void r600_copy_global_buffer(struct pipe_context *ctx,
struct pipe_resource *dst, unsigned
dstx, struct pipe_resource *src,
const struct pipe_box *src_box)
 {
-   struct pipe_box dst_box; struct pipe_transfer *src_pxfer,
-   *dst_pxfer;
-
-   u_box_1d(dstx, src_box->width, &dst_box);
-   void *src_ptr = ctx->transfer_map(ctx, src, 0, PIPE_TRANSFER_READ,
- src_box, &src_pxfer);
-   void *dst_ptr = ctx->transfer_map(ctx, dst, 0, PIPE_TRANSFER_WRITE,
- &dst_box, &dst_pxfer);
-   memcpy(dst_ptr, src_ptr, src_box->width);
-
-   ctx->transfer_unmap(ctx, src_pxfer);
-   ctx->transfer_unmap(ctx, dst_pxfer);
+   struct r600_context *rctx = (struct r600_context*)ctx;
+   struct compute_memory_pool *pool = rctx->screen->global_pool;
+
+   if (src->bind & PIPE_BIND_GLOBAL) {
+   struc

Re: [Mesa-dev] [PATCH] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-08 Thread Niels Ole Salscheider

On Monday 08 September 2014, 15:19:15, Bruno Jimenez wrote:
> Hi,
> 
> I'm not sure if this will work. Imagine this case:
> 
> We  have an item in the pool, and we want to use
> r600_resource_copy_region with it, for example because we want to demote
> it. This will call r600_copy_global_buffer, and with your patch it will
> call r600_compute_global_demote_or_alloc, which will again call
> compute_memory_demote_item causing an infinite cycle.

I think this will not be a problem because neither the pool bo nor the 
"real_buffer" will have the PIPE_BIND_GLOBAL flag. Therefore, 
r600_compute_global_demote_or_alloc will not be called again.

> Also, why are you reassigning src and dst in r600_copy_global_buffer?

For r600, resources with PIPE_BIND_GLOBAL are not real resources but only 
correspond to items in the compute pool. There they can either have the 
"real_buffer" bo when they should be mapped or be part of the pool bo. 
Therefore the pipe_resources have to be reassigned accordingly.

I am however not sure if it is really necessary to demote the item from the 
pool before copying data to it. Otherwise it would be possible to directly 
access the pool bo if the item is already in it.
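
Editorial note: the mapping described above, including the variant that
accesses the pool bo directly instead of demoting the item, can be sketched
as follows. The types and the resolve_global name are simplified,
hypothetical stand-ins for the driver structures, and allocation of a
missing real_buffer is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the driver's types. */
struct buffer { int id; };

struct pool_item {
	bool in_pool;               /* item currently lives inside the pool bo */
	unsigned start_in_dw;       /* offset inside the pool, in dwords */
	struct buffer *real_buffer; /* standalone buffer when not in the pool */
};

struct resolved {
	struct buffer *bo;          /* buffer the copy should really target */
	unsigned offset;            /* byte offset inside that buffer */
};

/* Translate (global item, byte offset) into a real buffer and offset. */
static struct resolved resolve_global(const struct pool_item *item,
				      struct buffer *pool_bo, unsigned offset)
{
	struct resolved r;

	assert(item != NULL);
	if (item->in_pool) {
		/* Pool offsets are kept in dwords, copies work in bytes. */
		r.bo = pool_bo;
		r.offset = offset + 4u * item->start_in_dw;
	} else {
		r.bo = item->real_buffer;
		r.offset = offset;
	}
	return r;
}
```

Because neither the pool bo nor real_buffer carries PIPE_BIND_GLOBAL, a copy
issued on the resolved buffers does not go through this translation again.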

> - Bruno
> 
> On Sun, 2014-09-07 at 18:32 +0200, Niels Ole Salscheider wrote:
> > Signed-off-by: Niels Ole Salscheider 
> > ---
> > 
> >  src/gallium/drivers/r600/evergreen_compute.c | 27 ---
> >  src/gallium/drivers/r600/evergreen_compute.h |  1 +
> >  src/gallium/drivers/r600/r600_blit.c | 40
> >   3 files changed, 41 insertions(+), 27
> >  deletions(-)
> > 
> > diff --git a/src/gallium/drivers/r600/evergreen_compute.c
> > b/src/gallium/drivers/r600/evergreen_compute.c index 38b78c7..b495868
> > 100644
> > --- a/src/gallium/drivers/r600/evergreen_compute.c
> > +++ b/src/gallium/drivers/r600/evergreen_compute.c
> > @@ -953,6 +953,22 @@ void r600_compute_global_buffer_destroy(
> > 
> > free(res);
> >  
> >  }
> > 
> > +void r600_compute_global_demote_or_alloc(
> > +   struct compute_memory_pool *pool,
> > +   struct compute_memory_item *item,
> > +   struct pipe_context *ctx)
> > +{
> > +   if (is_item_in_pool(item)) {
> > +   compute_memory_demote_item(pool, item, ctx);
> > +   } else {
> > +   if (item->real_buffer == NULL) {
> > +   item->real_buffer = (struct r600_resource*)
> > +   
> > r600_compute_buffer_alloc_vram(pool->screen, item-
>size_in_dw * 4);
> > +   }
> > +   }
> > +
> > +}
> > +
> > 
> >  void *r600_compute_global_transfer_map(
> >  
> > struct pipe_context *ctx_,
> > struct pipe_resource *resource,
> > 
> > @@ -970,16 +986,7 @@ void *r600_compute_global_transfer_map(
> > 
> > struct pipe_resource *dst = NULL;
> > unsigned offset = box->x;
> > 
> > -   if (is_item_in_pool(item)) {
> > -   compute_memory_demote_item(pool, item, ctx_);
> > -   }
> > -   else {
> > -   if (item->real_buffer == NULL) {
> > -   item->real_buffer = (struct r600_resource*)
> > -   
> > r600_compute_buffer_alloc_vram(pool->screen, item-
>size_in_dw * 4);
> > -   }
> > -   }
> > -
> > +   r600_compute_global_demote_or_alloc(pool, item, ctx_);
> > 
> > dst = (struct pipe_resource*)item->real_buffer;
> > 
> > if (usage & PIPE_TRANSFER_READ)
> > 
> > diff --git a/src/gallium/drivers/r600/evergreen_compute.h
> > b/src/gallium/drivers/r600/evergreen_compute.h index 4fb53a1..39bb854
> > 100644
> > --- a/src/gallium/drivers/r600/evergreen_compute.h
> > +++ b/src/gallium/drivers/r600/evergreen_compute.h
> > @@ -47,6 +47,7 @@ void evergreen_emit_cs_shader(struct r600_context *rctx,
> > struct r600_atom * atom> 
> >  struct pipe_resource *r600_compute_global_buffer_create(struct
> >  pipe_screen *screen, const struct pipe_resource *templ); void
> >  r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct
> >  pipe_resource *res);> 
> > +void r600_compute_global_demote_or_alloc(struct compute_memory_pool
> > *pool, struct compute_memory_item *item, struct pipe_context *ctx);> 
> >  void *r600_compute_global_transfer_map(
> >  
> > struct pipe_context *ctx_,
> > struct pipe_resource *resource,
> > 
> > diff --git a/src/gallium/drivers/r600/r600_blit.c
> > b/src/gallium/drivers/r600/r6

[Mesa-dev] [PATCH v2] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-08 Thread Niels Ole Salscheider

v2: Do not demote items that are already in the pool

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/evergreen_compute.h |  1 +
 src/gallium/drivers/r600/r600_blit.c | 59 
 2 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.h 
b/src/gallium/drivers/r600/evergreen_compute.h
index 4fb53a1..e4d3a38 100644
--- a/src/gallium/drivers/r600/evergreen_compute.h
+++ b/src/gallium/drivers/r600/evergreen_compute.h
@@ -45,6 +45,7 @@ void evergreen_init_atom_start_compute_cs(struct r600_context 
*rctx);
 void evergreen_init_compute_state_functions(struct r600_context *rctx);
 void evergreen_emit_cs_shader(struct r600_context *rctx, struct r600_atom * 
atom);
 
+struct r600_resource* r600_compute_buffer_alloc_vram(struct r600_screen 
*screen, unsigned size);
 struct pipe_resource *r600_compute_global_buffer_create(struct pipe_screen 
*screen, const struct pipe_resource *templ);
 void r600_compute_global_buffer_destroy(struct pipe_screen *screen, struct 
pipe_resource *res);
 void *r600_compute_global_transfer_map(
diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index f766e37..b334a75 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -21,6 +21,8 @@
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 #include "r600_pipe.h"
+#include "compute_memory_pool.h"
+#include "evergreen_compute.h"
 #include "util/u_surface.h"
 #include "util/u_format.h"
 #include "evergreend.h"
@@ -514,29 +516,52 @@ static void r600_copy_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst
  * into a single global resource (r600_screen::global_pool).  The means
  * they don't have their own cs_buf handle, so they cannot be passed
  * to r600_copy_buffer() and must be handled separately.
- *
- * XXX: It should be possible to implement this function using
- * r600_copy_buffer() by passing the memory_pool resource as both src
- * and dst and updating dstx and src_box to point to the correct offsets.
- * This would likely perform better than the current implementation.
  */
 static void r600_copy_global_buffer(struct pipe_context *ctx,
struct pipe_resource *dst, unsigned
dstx, struct pipe_resource *src,
const struct pipe_box *src_box)
 {
-   struct pipe_box dst_box; struct pipe_transfer *src_pxfer,
-   *dst_pxfer;
-
-   u_box_1d(dstx, src_box->width, &dst_box);
-   void *src_ptr = ctx->transfer_map(ctx, src, 0, PIPE_TRANSFER_READ,
- src_box, &src_pxfer);
-   void *dst_ptr = ctx->transfer_map(ctx, dst, 0, PIPE_TRANSFER_WRITE,
- &dst_box, &dst_pxfer);
-   memcpy(dst_ptr, src_ptr, src_box->width);
-
-   ctx->transfer_unmap(ctx, src_pxfer);
-   ctx->transfer_unmap(ctx, dst_pxfer);
+   struct r600_context *rctx = (struct r600_context*)ctx;
+   struct compute_memory_pool *pool = rctx->screen->global_pool;
+   struct pipe_box new_src_box = *src_box;
+
+   if (src->bind & PIPE_BIND_GLOBAL) {
+   struct r600_resource_global *rsrc =
+   (struct r600_resource_global *)src;
+   struct compute_memory_item *item = rsrc->chunk;
+
+   if (is_item_in_pool(item)) {
+   new_src_box.x += 4 * item->start_in_dw;
+   src = (struct pipe_resource *)pool->bo;
+   } else {
+   if (item->real_buffer == NULL) {
+   item->real_buffer = (struct r600_resource*)
+   
r600_compute_buffer_alloc_vram(pool->screen,
+  
item->size_in_dw * 4);
+   }
+   src = (struct pipe_resource*)item->real_buffer;
+   }
+   }
+   if (dst->bind & PIPE_BIND_GLOBAL) {
+   struct r600_resource_global *rdst =
+   (struct r600_resource_global *)dst;
+   struct compute_memory_item *item = rdst->chunk;
+
+   if (is_item_in_pool(item)) {
+   dstx += 4 * item->start_in_dw;
+   dst = (struct pipe_resource *)pool->bo;
+   } else {
+   if (item->real_buffer == NULL) {
+   item->real_buffer = (struct r600_resource*)
+   
r600_compute_buffer_alloc_vram(pool->screen,
+  
item->size_in_dw * 4);
+   }
+   dst = (str

Re: [Mesa-dev] [PATCH] r600: Use DMA transfers in r600_copy_global_buffer

2014-09-09 Thread Niels Ole Salscheider

On Tuesday 09 September 2014, 11:40:49, Bruno Jimenez wrote:
> On Mon, 2014-09-08 at 18:30 +0200, Niels Ole Salscheider wrote:
> > On Monday 08 September 2014, 15:19:15, Bruno Jimenez wrote:
> > > Hi,
> > > 
> > > I'm not sure if this will work. Imagine this case:
> > > 
> > > We  have an item in the pool, and we want to use
> > > r600_resource_copy_region with it, for example because we want to demote
> > > it. This will call r600_copy_global_buffer, and with your patch it will
> > > call r600_compute_global_demote_or_alloc, which will again call
> > > compute_memory_demote_item causing an infinite cycle.
> > 
> > I think this will not be a problem because neither the pool bo nor the
> > "real_buffer" will have the PIPE_BIND_GLOBAL flag. Therefore,
> > r600_compute_global_demote_or_alloc will not be called again.
> 
> Hi,
> 
> You are completely right, for a moment I thought that the resources
> associated with the items also had the PIPE_BIND_GLOBAL flag.
> 
> Then I think that this code isn't truly necessary, as every call to
> resource_copy_region related with compute items is done to the
> r600_resources directly without touching the global resources.
> 
> > > Also, why are you reassigning src and dst in r600_copy_global_buffer?
> > 
> > For r600, resources with PIPE_BIND_GLOBAL are not real resources but only
> > correspond to items in the compute pool. There they can either have the
> > "real_buffer" bo when they should be mapped or be part of the pool bo.
> > Therefore the pipe_resources have to be reassigned accordingly.
> 
> You are right again. I'm not thinking clearly lately, sorry.
> 
> > I am however not sure if it is really necessary to demote the item from
> > the
> > pool before copying data to it. Otherwise it would be possible to directly
> > access the pool bo if the item is already in it.
> 
> I hope that it isn't necessary to demote the items for this. But, as I
> have said, resource_copy_region isn't called with r600_resource_globals
> (as far as I know)

Yes, I sent an updated patch to the list yesterday that does not demote 
the item.

This code is used, though. resource_copy_region is called from clover's 
resource::copy with global compute resources as arguments.

> Hopefully, I haven't said any other dumb thing.
> 
> Thanks!
> Bruno
> 
> > > - Bruno
> > > 
> > > On Sun, 2014-09-07 at 18:32 +0200, Niels Ole Salscheider wrote:
> > > > Signed-off-by: Niels Ole Salscheider 
> > > > ---
> > > > 
> > > >  src/gallium/drivers/r600/evergreen_compute.c | 27 ---
> > > >  src/gallium/drivers/r600/evergreen_compute.h |  1 +
> > > >  src/gallium/drivers/r600/r600_blit.c | 40
> > > >   3 files changed, 41 insertions(+), 27
> > > >  deletions(-)
> > > > 
> > > > diff --git a/src/gallium/drivers/r600/evergreen_compute.c
> > > > b/src/gallium/drivers/r600/evergreen_compute.c index 38b78c7..b495868
> > > > 100644
> > > > --- a/src/gallium/drivers/r600/evergreen_compute.c
> > > > +++ b/src/gallium/drivers/r600/evergreen_compute.c
> > > > @@ -953,6 +953,22 @@ void r600_compute_global_buffer_destroy(
> > > > 
> > > > free(res);
> > > >  
> > > >  }
> > > > 
> > > > +void r600_compute_global_demote_or_alloc(
> > > > +   struct compute_memory_pool *pool,
> > > > +   struct compute_memory_item *item,
> > > > +   struct pipe_context *ctx)
> > > > +{
> > > > +   if (is_item_in_pool(item)) {
> > > > +   compute_memory_demote_item(pool, item, ctx);
> > > > +   } else {
> > > > +   if (item->real_buffer == NULL) {
> > > > +   item->real_buffer = (struct r600_resource*)
> > > > +   
> > > > r600_compute_buffer_alloc_vram(pool->screen, item-
> > >
> > >size_in_dw * 4);
> > >
> > > > +   }
> > > > +   }
> > > > +
> > > > +}
> > > > +
> > > > 
> > > >  void *r600_compute_global_transfer_map(
> > > >  
> > > > struct pipe_context *ctx_,
> > > > struct pipe_resource *resource,
> > > > 
> > > > @@ -970,16 +986,7 @@ void *r600_compute_global_tra

Re: [Mesa-dev] [PATCH 1/4] radeonsi/compute: directly emit CONTEXT_CONTROL

2014-09-22 Thread Niels Ole Salscheider

On Monday 22 September 2014, 12:16:13, Alex Deucher wrote:
> On Sat, Sep 20, 2014 at 6:11 AM, Marek Olšák  wrote:
> > From: Marek Olšák 
> 
> Looks good.  Tom should probably take a look as well.  As a further
> improvement, it would be nice to be able to use the compute rings for
> compute rather than gfx, but I'm not sure how much additional effort
> it would take to clean that up.

This is completely untested, but now that we can detect compute contexts, 
something like the attached patches might be sufficient...

> Reviewed-by: Alex Deucher 
> 
> > ---
> > 
> >  src/gallium/drivers/radeonsi/si_compute.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/gallium/drivers/radeonsi/si_compute.c
> > b/src/gallium/drivers/radeonsi/si_compute.c index 4b2662d..3ad9182 100644
> > --- a/src/gallium/drivers/radeonsi/si_compute.c
> > +++ b/src/gallium/drivers/radeonsi/si_compute.c
> > @@ -168,6 +168,7 @@ static void si_launch_grid(
> > 
> > uint32_t pc, const void *input)
> >  
> >  {
> >  
> > struct si_context *sctx = (struct si_context*)ctx;
> > 
> > +   struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
> > 
> > struct si_compute *program = sctx->cs_shader_state.program;
> > struct si_pm4_state *pm4 = CALLOC_STRUCT(si_pm4_state);
> > struct r600_resource *input_buffer = program->input_buffer;
> > 
> > @@ -184,8 +185,11 @@ static void si_launch_grid(
> > 
> > unsigned lds_blocks;
> > unsigned num_waves_for_scratch;
> > 
> > +   radeon_emit(cs, PKT3(PKT3_CONTEXT_CONTROL, 1, 0) |
> > PKT3_SHADER_TYPE_S(1)); +   radeon_emit(cs, 0x8000);
> > +   radeon_emit(cs, 0x8000);
> > +
> > 
> > pm4->compute_pkt = true;
> > 
> > -   si_cmd_context_control(pm4);
> > 
> > si_pm4_cmd_begin(pm4, PKT3_EVENT_WRITE);
> > si_pm4_cmd_add(pm4, EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH) |
> > 
> > --
> > 1.9.1
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
From 9714d3ee55ee0ddb0bcf63934b552df641b866a2 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Mon, 22 Sep 2014 19:41:20 +0200
Subject: [PATCH 1/2] radeon: submit compute packets to the compute ring

They have been submitted to the gfx ring since
764502b481e2288cb5e751de739253fdee886e3e.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeon/r600_pipe_common.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index ae203b6..0d9ce17 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -174,6 +174,9 @@ static void r600_flush_from_st(struct pipe_context *ctx,
 	if (flags & PIPE_FLUSH_END_OF_FRAME)
 		rflags |= RADEON_FLUSH_END_OF_FRAME;
 
+	if (rctx->flags & R600_CONTEXT_FLAG_COMPUTE)
+		rflags |= RADEON_FLUSH_COMPUTE;
+
 	if (rctx->rings.dma.cs) {
 		rctx->rings.dma.flush(rctx, rflags, NULL);
 	}
-- 
2.1.0

From e578f9c067de68e9401f798a78c1ed785ceb1137 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Mon, 22 Sep 2014 19:57:52 +0200
Subject: [PATCH 2/2] r600: set R600_CONTEXT_FLAG_COMPUTE in compute_emit_cs

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/evergreen_compute.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 38b78c7..03118e1 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -420,7 +420,9 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout,
 	 */
 	r600_emit_command_buffer(cs, &ctx->start_compute_cs_cmd);
 
-	ctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | R600_CONTEXT_FLUSH_AND_INV;
+	ctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE |
+	  R600_CONTEXT_FLUSH_AND_INV |
+	  R600_CONTEXT_FLAG_COMPUTE;
 	r600_flush_emit(ctx);
 
 	/* Emit colorbuffers. */
@@ -485,7 +487,8 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout,
 	 */
 	ctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
 		  R600_CONTEXT_INV_VERTEX_CACHE |
-	  R600_CONTEXT_INV_TEX_CACHE;
+	  R600_CONTEXT_INV_TEX_CACHE |
+	  R600_CONTEXT_FLAG_COMPUTE;
 	r600_flush_emit(ctx);
 	ctx->b.flags = 0;
 
-- 
2.1.0

[Mesa-dev] [PATCH] winsys/radeon: remove superfluous distinction of cases

2013-12-18 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index acb12b2..d8ad297 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -482,22 +482,12 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs *rcs, unsigned flags, ui
         /* pad DMA ring to 8 DWs to meet CP fetch alignment requirements
          * r6xx, requires at least 4 dw alignment to avoid a hw bug.
          */
-        if (flags & RADEON_FLUSH_COMPUTE) {
-            if (cs->ws->info.chip_class <= SI) {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
-            } else {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
-            }
+        if (cs->ws->info.chip_class <= SI) {
+            while (rcs->cdw & 7)
+                OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
         } else {
-            if (cs->ws->info.chip_class <= SI) {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0x8000); /* type2 nop packet */
-            } else {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
-            }
+            while (rcs->cdw & 7)
+                OUT_CS(&cs->base, 0x1000); /* type3 nop packet */
         }
         break;
     case RING_UVD:
-- 
1.8.5.1
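
Editorial note: both branches that remain after this patch run the same
padding loop and differ only in the nop packet they emit. A minimal sketch
of that loop follows, with the nop value passed in as a parameter. The
function name pad_to_8dw and the plain array standing in for the command
stream are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append `nop` dwords until the dword count is a multiple of 8.
 * `buf` must have room for up to 7 additional dwords.
 * Returns the new dword count. */
static unsigned pad_to_8dw(uint32_t *buf, unsigned cdw, uint32_t nop)
{
	assert(buf != NULL);
	while (cdw & 7)
		buf[cdw++] = nop;
	return cdw;
}
```

A stream that is already a multiple of 8 dwords long is left untouched.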

[Mesa-dev] [PATCH] r600: compute memory pool size is given in dw

2014-03-03 Thread Niels Ole Salscheider

Multiply the dw value by 4 in order to map the complete buffer.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/compute_memory_pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index 90d5358..2f0d4c8 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -449,7 +449,7 @@ void compute_memory_transfer(
 
if (device_to_host) {
map = pipe->transfer_map(pipe, gart, 0, PIPE_TRANSFER_READ,
-   &(struct pipe_box) { .width = aligned_size,
+   &(struct pipe_box) { .width = aligned_size * 4,
.height = 1, .depth = 1 }, &xfer);
 assert(xfer);
assert(map);
@@ -457,7 +457,7 @@ void compute_memory_transfer(
pipe->transfer_unmap(pipe, xfer);
} else {
map = pipe->transfer_map(pipe, gart, 0, PIPE_TRANSFER_WRITE,
-   &(struct pipe_box) { .width = aligned_size,
+   &(struct pipe_box) { .width = aligned_size * 4,
.height = 1, .depth = 1 }, &xfer);
assert(xfer);
assert(map);
-- 
1.9.0
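
Editorial note: the fix above is a unit conversion. The pool keeps sizes in
dwords (4 bytes each), while the width of a pipe_box for a buffer is given
in bytes. A minimal sketch of the conversion, with a hypothetical helper
name and an overflow check that is not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a size in dwords to a size in bytes. */
static size_t dw_to_bytes(size_t size_in_dw)
{
	/* Guard against overflow of the multiplication. */
	assert(size_in_dw <= SIZE_MAX / 4);
	return size_in_dw * 4;
}
```

Mapping only aligned_size bytes would cover just the first quarter of a
buffer that is aligned_size dwords long.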

[Mesa-dev] [PATCH 1/2] util/u_upload_mgr: Allow to also use it for downloads

2014-03-03 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/auxiliary/hud/hud_context.c   |  3 +-
 src/gallium/auxiliary/util/u_blitter.c|  3 +-
 src/gallium/auxiliary/util/u_upload_mgr.c | 49 +++
 src/gallium/auxiliary/util/u_upload_mgr.h | 13 +--
 src/gallium/auxiliary/util/u_vbuf.c   |  3 +-
 src/gallium/auxiliary/vl/vl_compositor.c  |  3 +-
 src/gallium/drivers/ilo/ilo_context.c |  3 +-
 src/gallium/drivers/r300/r300_context.c   |  3 +-
 src/gallium/drivers/radeon/r600_pipe_common.c |  3 +-
 src/mesa/state_tracker/st_context.c   |  9 +++--
 10 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/src/gallium/auxiliary/hud/hud_context.c 
b/src/gallium/auxiliary/hud/hud_context.c
index 465013c..567ec99 100644
--- a/src/gallium/auxiliary/hud/hud_context.c
+++ b/src/gallium/auxiliary/hud/hud_context.c
@@ -938,7 +938,8 @@ hud_create(struct pipe_context *pipe, struct cso_context 
*cso)
hud->pipe = pipe;
hud->cso = cso;
hud->uploader = u_upload_create(pipe, 256 * 1024, 16,
-   PIPE_BIND_VERTEX_BUFFER);
+   PIPE_BIND_VERTEX_BUFFER,
+   UPLOAD_MGR_UPLOAD);
 
/* font */
if (!util_font_create(pipe, UTIL_FONT_FIXED_8X13, &hud->font)) {
diff --git a/src/gallium/auxiliary/util/u_blitter.c 
b/src/gallium/auxiliary/util/u_blitter.c
index 95e7fb6..fb606ee 100644
--- a/src/gallium/auxiliary/util/u_blitter.c
+++ b/src/gallium/auxiliary/util/u_blitter.c
@@ -333,7 +333,8 @@ struct blitter_context *util_blitter_create(struct 
pipe_context *pipe)
for (i = 0; i < 4; i++)
   ctx->vertices[i][0][3] = 1; /*v.w*/
 
-   ctx->upload = u_upload_create(pipe, 65536, 4, PIPE_BIND_VERTEX_BUFFER);
+   ctx->upload = u_upload_create(pipe, 65536, 4, PIPE_BIND_VERTEX_BUFFER,
+ UPLOAD_MGR_UPLOAD);
 
return &ctx->base;
 }
diff --git a/src/gallium/auxiliary/util/u_upload_mgr.c 
b/src/gallium/auxiliary/util/u_upload_mgr.c
index 744ea2e..3205cd1 100644
--- a/src/gallium/auxiliary/util/u_upload_mgr.c
+++ b/src/gallium/auxiliary/util/u_upload_mgr.c
@@ -41,11 +41,14 @@
 struct u_upload_mgr {
struct pipe_context *pipe;
 
-   unsigned default_size;  /* Minimum size of the upload buffer, in bytes. */
-   unsigned alignment; /* Alignment of each sub-allocation. */
-   unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
-   unsigned map_flags; /* Bitmask of PIPE_TRANSFER_* flags. */
-   boolean map_persistent; /* If persistent mappings are supported. */
+   unsigned default_size;  /* Minimum size of the upload buffer,
+* in bytes. */
+   unsigned alignment; /* Alignment of each sub-allocation. */
+   unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
+   unsigned map_flags; /* Bitmask of PIPE_TRANSFER_* flags. */
+   boolean map_persistent; /* If persistent mappings are supported. */
+   enum u_upload_mgr_usage usage;  /* Usage of the upload manager
+* (for uploads or downloads) */
 
struct pipe_resource *buffer;   /* Upload buffer. */
struct pipe_transfer *transfer; /* Transfer object for the upload buffer. */
@@ -58,7 +61,8 @@ struct u_upload_mgr {
 struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
   unsigned default_size,
   unsigned alignment,
-  unsigned bind )
+  unsigned bind,
+  enum u_upload_mgr_usage usage )
 {
struct u_upload_mgr *upload = CALLOC_STRUCT( u_upload_mgr );
if (!upload)
@@ -68,20 +72,29 @@ struct u_upload_mgr *u_upload_create( struct pipe_context 
*pipe,
upload->default_size = default_size;
upload->alignment = alignment;
upload->bind = bind;
+   upload->usage = usage;
 
upload->map_persistent =
   pipe->screen->get_param(pipe->screen,
   PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT);
 
if (upload->map_persistent) {
-  upload->map_flags = PIPE_TRANSFER_WRITE |
-  PIPE_TRANSFER_PERSISTENT |
+  upload->map_flags = PIPE_TRANSFER_PERSISTENT |
   PIPE_TRANSFER_COHERENT;
+  if (usage == UPLOAD_MGR_UPLOAD) {
+  upload->map_flags |= PIPE_TRANSFER_WRITE;
+  } else {
+  upload->map_flags |= PIPE_TRANSFER_READ;
+  }
}
else {
-  upload->map_flags = PIPE_TRANSFER_WRITE |
-  PIPE_TRANSFER_UNSYNCHRONIZED |
-  PIPE_TRANSFER_FLUSH_EXPLICIT;
+  if (usage == UPLOAD_MGR_UPLOAD) {
+  upload->map_flags = PIPE_TRANSFER_WRITE |
+  PIPE_TRANSFER_UNSYNCHRONI

[Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-03-03 Thread Niels Ole Salscheider

Using the DMA engine for buffer downloads vastly improves performance, because
CPU reads from VRAM are slow due to the high latency of the PCIe bus.

The first patch allows u_upload_mgr to be used for downloads, too. The second
patch then uses u_upload_mgr in the radeon driver for downloads.
I considered renaming u_upload_mgr to u_transfer_mgr, since it might be
confusing that an "upload manager" can be used for downloads. But we also
have "transfers", so u_transfer_mgr might be confusing as well. Thus, I
decided not to rename it for now.

Without these patches, the buffer_bandwidth benchmark from uCLbench gives me:

./buffer_bandwidth --size=2000 --iterations=100
# device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
1/1 direct 2000 Bytes   759.29 MB/s(HD) 17.13 MB/s(DD) 14.61 MB/s(DH)

With these patches, the read performance is much better:

./buffer_bandwidth --size=2000 --iterations=100
# device 0: AMD BARTS // type gpu (192 MB global memory, 64 KB constant memory, 32 KB local memory)
1/1 direct 2000 Bytes   759.90 MB/s(HD) 613.49 MB/s(DD) 1841.07 MB/s(DH)

Judging by these numbers, it might even make sense to use the DMA engine for
larger buffer downloads...
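
Editorial note: the size of the improvement can be read off the two runs
above by dividing the new throughput by the old one. The helper below is
only a sketch of that arithmetic and its name is hypothetical. With the
quoted figures, the DH column (presumably device to host) improves roughly
126 times and the DD column (presumably device to device) roughly 36 times,
while HD is essentially unchanged.

```c
#include <assert.h>

/* Ratio of two throughput figures given in the same unit (here MB/s). */
static double speedup(double before_mb_s, double after_mb_s)
{
	assert(before_mb_s > 0.0);
	return after_mb_s / before_mb_s;
}
```

For instance, speedup(14.61, 1841.07) gives the DH ratio and
speedup(17.13, 613.49) the DD ratio.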

Niels Ole Salscheider (2):
  util/u_upload_mgr: Allow to also use it for downloads
  radeon: Use transfer manager for buffer downloads

 src/gallium/auxiliary/hud/hud_context.c |  3 +-
 src/gallium/auxiliary/util/u_blitter.c  |  3 +-
 src/gallium/auxiliary/util/u_upload_mgr.c   | 49 +++-
 src/gallium/auxiliary/util/u_upload_mgr.h   | 13 -
 src/gallium/auxiliary/util/u_vbuf.c |  3 +-
 src/gallium/auxiliary/vl/vl_compositor.c|  3 +-
 src/gallium/drivers/ilo/ilo_context.c   |  3 +-
 src/gallium/drivers/r300/r300_context.c |  3 +-
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 src/gallium/drivers/radeon/r600_pipe_common.c   | 14 -
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 src/mesa/state_tracker/st_context.c |  9 ++-
 12 files changed, 136 insertions(+), 46 deletions(-)

-- 
1.9.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] radeon: Use transfer manager for buffer downloads

2014-03-03 Thread Niels Ole Salscheider

Using DMA for reads is much faster.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 src/gallium/drivers/radeon/r600_pipe_common.c   | 11 
 src/gallium/drivers/radeon/r600_pipe_common.h   |  1 +
 3 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index 340ebb2..c910107 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -260,6 +260,46 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
/* At this point, the buffer is always idle (we checked it 
above). */
usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
}
+   /* Using DMA for larger reads is much faster */
+   else if ((usage & PIPE_TRANSFER_READ) &&
+!(usage & PIPE_TRANSFER_WRITE) &&
+(rbuffer->domains == RADEON_DOMAIN_VRAM) &&
+(rscreen->has_cp_dma ||
+ (rscreen->has_streamout &&
+  /* The buffer range must be aligned to 4 with streamout. */
+  box->x % 4 == 0 && box->width % 4 == 0))) {
+   unsigned offset;
+   struct r600_resource *staging = NULL;
+
+   u_upload_alloc(rctx->downloader, 0,
+  box->width + (box->x % 
R600_MAP_BUFFER_ALIGNMENT),
+  &offset, (struct pipe_resource**)&staging, 
(void**)&data);
+
+   if (staging) {
+   data += box->x % R600_MAP_BUFFER_ALIGNMENT;
+
+   /* Copy the staging buffer into the original one. */
+   if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 
0,
+box->x % 
R600_MAP_BUFFER_ALIGNMENT,
+0, 0, resource, level, box)) {
+   rctx->rings.gfx.flush(rctx, 0);
+   if (rctx->rings.dma.cs)
+   rctx->rings.dma.flush(rctx, 0);
+
+   /* Wait for any offloaded CS flush to complete
+* to avoid busy-waiting in the winsys. */
+   rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
+   if (rctx->rings.dma.cs)
+   
rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
+
+   rctx->ws->buffer_wait(staging->buf, 
RADEON_USAGE_READWRITE);
+   return r600_buffer_get_transfer(ctx, resource, 
level, usage, box,
+   ptransfer, 
data, staging, offset);
+   } else {
+   pipe_resource_reference((struct 
pipe_resource**)&staging, NULL);
+   }
+   }
+   }
 
data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
if (!data) {
@@ -279,24 +319,26 @@ static void r600_buffer_transfer_unmap(struct 
pipe_context *ctx,
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
if (rtransfer->staging) {
-   struct pipe_resource *dst, *src;
-   unsigned soffset, doffset, size;
-   struct pipe_box box;
-
-   dst = transfer->resource;
-   src = &rtransfer->staging->b.b;
-   size = transfer->box.width;
-   doffset = transfer->box.x;
-   soffset = rtransfer->offset + transfer->box.x % 
R600_MAP_BUFFER_ALIGNMENT;
-
-   u_box_1d(soffset, size, &box);
-
-   /* Copy the staging buffer into the original one. */
-   if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
-   rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
-   /* DONE. */
-   } else {
-   ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, 
src, 0, &box);
+   if (rtransfer->transfer.usage & PIPE_TRANSFER_WRITE) {
+   struct pipe_resource *dst, *src;
+   unsigned soffset, doffset, size;
+   struct pipe_box box;
+
+   dst = transfer->resource;
+   src = &rtransfer->staging->b.b;
+   size = transfer->box.width;
+   doffset = transfer->box.x;
+   soffset = rtransfer->offset + transfer->box.x % 
R600_MAP_BUFFER_ALIGNMENT;
+
+   u_box_1d(soffs

[Mesa-dev] [PATCH v2] radeon: Use upload manager for buffer downloads

2014-03-04 Thread Niels Ole Salscheider

Using DMA for reads is much faster.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 78 +++--
 1 file changed, 60 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index 340ebb2..ed3a08c 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -260,6 +260,46 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
/* At this point, the buffer is always idle (we checked it 
above). */
usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
}
+   /* Using DMA for larger reads is much faster */
+   else if ((usage & PIPE_TRANSFER_READ) &&
+!(usage & PIPE_TRANSFER_WRITE) &&
+(rbuffer->domains == RADEON_DOMAIN_VRAM) &&
+(rscreen->has_cp_dma ||
+ (rscreen->has_streamout &&
+  /* The buffer range must be aligned to 4 with streamout. */
+  box->x % 4 == 0 && box->width % 4 == 0))) {
+   unsigned offset;
+   struct r600_resource *staging = NULL;
+
+   u_upload_alloc(rctx->uploader, 0,
+  box->width + (box->x % 
R600_MAP_BUFFER_ALIGNMENT),
+  &offset, (struct pipe_resource**)&staging, 
(void**)&data);
+
+   if (staging) {
+   data += box->x % R600_MAP_BUFFER_ALIGNMENT;
+
+   /* Copy the staging buffer into the original one. */
+   if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 
0,
+box->x % 
R600_MAP_BUFFER_ALIGNMENT,
+0, 0, resource, level, box)) {
+   rctx->rings.gfx.flush(rctx, 0);
+   if (rctx->rings.dma.cs)
+   rctx->rings.dma.flush(rctx, 0);
+
+   /* Wait for any offloaded CS flush to complete
+* to avoid busy-waiting in the winsys. */
+   rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
+   if (rctx->rings.dma.cs)
+   
rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
+
+   rctx->ws->buffer_wait(staging->buf, 
RADEON_USAGE_READ);
+   return r600_buffer_get_transfer(ctx, resource, 
level, usage, box,
+   ptransfer, 
data, staging, offset);
+   } else {
+   pipe_resource_reference((struct 
pipe_resource**)&staging, NULL);
+   }
+   }
+   }
 
data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
if (!data) {
@@ -279,24 +319,26 @@ static void r600_buffer_transfer_unmap(struct 
pipe_context *ctx,
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
if (rtransfer->staging) {
-   struct pipe_resource *dst, *src;
-   unsigned soffset, doffset, size;
-   struct pipe_box box;
-
-   dst = transfer->resource;
-   src = &rtransfer->staging->b.b;
-   size = transfer->box.width;
-   doffset = transfer->box.x;
-   soffset = rtransfer->offset + transfer->box.x % 
R600_MAP_BUFFER_ALIGNMENT;
-
-   u_box_1d(soffset, size, &box);
-
-   /* Copy the staging buffer into the original one. */
-   if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
-   rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
-   /* DONE. */
-   } else {
-   ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, 
src, 0, &box);
+   if (rtransfer->transfer.usage & PIPE_TRANSFER_WRITE) {
+   struct pipe_resource *dst, *src;
+   unsigned soffset, doffset, size;
+   struct pipe_box box;
+
+   dst = transfer->resource;
+   src = &rtransfer->staging->b.b;
+   size = transfer->box.width;
+   doffset = transfer->box.x;
+   soffset = rtransfer->offset + transfer->box.x % 
R600_MAP_BUFFER_ALIGNMENT;
+
+   u_box_1d(soffset, size, &box);
+
+   /* Copy the staging buffer into the original one.

Re: [Mesa-dev] [PATCH 0/2] radeon: Use the DMA engine for buffer downloads

2014-03-04 Thread Niels Ole Salscheider

> Could you please do this without changing u_upload_mgr? You can still
> use u_upload_alloc to allocate buffer memory in the driver and the map
> buffer read/write flags are not important with persistent coherent
> buffer mappings anyway.

I have sent an updated patch to the list.

Ole

[Mesa-dev] [PATCH v3] radeon: Use upload manager for buffer downloads

2014-03-05 Thread Niels Ole Salscheider

Using DMA for reads is much faster.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 74 +++--
 1 file changed, 56 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index 340ebb2..90ca8cb 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -260,6 +260,42 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
/* At this point, the buffer is always idle (we checked it 
above). */
usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
}
+   /* Using DMA for larger reads is much faster */
+   else if ((usage & PIPE_TRANSFER_READ) &&
+!(usage & PIPE_TRANSFER_WRITE) &&
+(rbuffer->domains == RADEON_DOMAIN_VRAM)) {
+   unsigned offset;
+   struct r600_resource *staging = NULL;
+
+   u_upload_alloc(rctx->uploader, 0,
+  box->width + (box->x % 
R600_MAP_BUFFER_ALIGNMENT),
+  &offset, (struct pipe_resource**)&staging, 
(void**)&data);
+
+   if (staging) {
+   data += box->x % R600_MAP_BUFFER_ALIGNMENT;
+
+   /* Copy the staging buffer into the original one. */
+   if (rctx->dma_copy(ctx, (struct pipe_resource*)staging, 
0,
+box->x % 
R600_MAP_BUFFER_ALIGNMENT,
+0, 0, resource, level, box)) {
+   rctx->rings.gfx.flush(rctx, 0);
+   if (rctx->rings.dma.cs)
+   rctx->rings.dma.flush(rctx, 0);
+
+   /* Wait for any offloaded CS flush to complete
+* to avoid busy-waiting in the winsys. */
+   rctx->ws->cs_sync_flush(rctx->rings.gfx.cs);
+   if (rctx->rings.dma.cs)
+   
rctx->ws->cs_sync_flush(rctx->rings.dma.cs);
+
+   rctx->ws->buffer_wait(staging->buf, 
RADEON_USAGE_WRITE);
+   return r600_buffer_get_transfer(ctx, resource, 
level, usage, box,
+   ptransfer, 
data, staging, offset);
+   } else {
+   pipe_resource_reference((struct 
pipe_resource**)&staging, NULL);
+   }
+   }
+   }
 
data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
if (!data) {
@@ -279,24 +315,26 @@ static void r600_buffer_transfer_unmap(struct 
pipe_context *ctx,
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
if (rtransfer->staging) {
-   struct pipe_resource *dst, *src;
-   unsigned soffset, doffset, size;
-   struct pipe_box box;
-
-   dst = transfer->resource;
-   src = &rtransfer->staging->b.b;
-   size = transfer->box.width;
-   doffset = transfer->box.x;
-   soffset = rtransfer->offset + transfer->box.x % 
R600_MAP_BUFFER_ALIGNMENT;
-
-   u_box_1d(soffset, size, &box);
-
-   /* Copy the staging buffer into the original one. */
-   if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
-   rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, &box)) {
-   /* DONE. */
-   } else {
-   ctx->resource_copy_region(ctx, dst, 0, doffset, 0, 0, 
src, 0, &box);
+   if (rtransfer->transfer.usage & PIPE_TRANSFER_WRITE) {
+   struct pipe_resource *dst, *src;
+   unsigned soffset, doffset, size;
+   struct pipe_box box;
+
+   dst = transfer->resource;
+   src = &rtransfer->staging->b.b;
+   size = transfer->box.width;
+   doffset = transfer->box.x;
+   soffset = rtransfer->offset + transfer->box.x % 
R600_MAP_BUFFER_ALIGNMENT;
+
+   u_box_1d(soffset, size, &box);
+
+   /* Copy the staging buffer into the original one. */
+   if (!(size % 4) && !(doffset % 4) && !(soffset % 4) &&
+   rctx->dma_copy(ctx, dst, 0, doffset, 0, 0, src, 0, 
&box)) {
+   /* DONE. */
+   } els

Re: [Mesa-dev] [PATCH v2] radeon: Use upload manager for buffer downloads

2014-03-05 Thread Niels Ole Salscheider

On Tuesday 04 March 2014, 23:43:01, Marek Olšák wrote:
> You check for streamout and CP DMA support, but you don't use
> resource_copy_region if DMA is not supported. The CP DMA and
> streamout-based buffer copying is only used by resource_copy_region.

Oh, right. I initially used resource_copy_region as a fallback and forgot to 
remove these checks. I have sent an updated patch to the list.

> The last parameter of buffer_wait should be RADEON_USAGE_WRITE (you're
> waiting for the last write to the staging buffer), but that parameter
> is not used by the winsys yet.
> 
> Other than those two, the patch looks good.
> 
> CP DMA != async DMA (dma_copy). CP DMA is actually a feature of the
> graphics ring.
> 
> Marek
> 
> On Tue, Mar 4, 2014 at 6:23 PM, Niels Ole Salscheider
> 
>  wrote:
> > Using DMA for reads is much faster.
> > 
> > Signed-off-by: Niels Ole Salscheider 
> > ---
> > 
> >  src/gallium/drivers/radeon/r600_buffer_common.c | 78
> >  +++-- 1 file changed, 60 insertions(+), 18
> >  deletions(-)
> > 
> > diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c
> > b/src/gallium/drivers/radeon/r600_buffer_common.c index 340ebb2..ed3a08c
> > 100644
> > --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> > +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> > @@ -260,6 +260,46 @@ static void *r600_buffer_transfer_map(struct
> > pipe_context *ctx,> 
> > /* At this point, the buffer is always idle (we checked it
> > above). */
> > usage |= PIPE_TRANSFER_UNSYNCHRONIZED;
> > 
> > }
> > 
> > +   /* Using DMA for larger reads is much faster */
> > +   else if ((usage & PIPE_TRANSFER_READ) &&
> > +!(usage & PIPE_TRANSFER_WRITE) &&
> > +(rbuffer->domains == RADEON_DOMAIN_VRAM) &&
> > +(rscreen->has_cp_dma ||
> > + (rscreen->has_streamout &&
> > +  /* The buffer range must be aligned to 4 with
> > streamout. */ +  box->x % 4 == 0 && box->width % 4 ==
> > 0))) {
> > +   unsigned offset;
> > +   struct r600_resource *staging = NULL;
> > +
> > +   u_upload_alloc(rctx->uploader, 0,
> > +  box->width + (box->x %
> > R600_MAP_BUFFER_ALIGNMENT), +  &offset,
> > (struct pipe_resource**)&staging, (void**)&data); +
> > +   if (staging) {
> > +   data += box->x % R600_MAP_BUFFER_ALIGNMENT;
> > +
> > +   /* Copy the staging buffer into the original one.
> > */ +   if (rctx->dma_copy(ctx, (struct
> > pipe_resource*)staging, 0, + 
> >   box->x % R600_MAP_BUFFER_ALIGNMENT, +  
> >  0, 0, resource, level, box)) { +
> >   rctx->rings.gfx.flush(rctx, 0);
> > +   if (rctx->rings.dma.cs)
> > +   rctx->rings.dma.flush(rctx, 0);
> > +
> > +   /* Wait for any offloaded CS flush to
> > complete +* to avoid busy-waiting in the
> > winsys. */ +  
> > rctx->ws->cs_sync_flush(rctx->rings.gfx.cs); +   
> >if (rctx->rings.dma.cs)
> > +  
> > rctx->ws->cs_sync_flush(rctx->rings.dma.cs); +
> > +   rctx->ws->buffer_wait(staging->buf,
> > RADEON_USAGE_READ); +   return
> > r600_buffer_get_transfer(ctx, resource, level, usage, box, + 
> >  ptransfer, data,
> > staging, offset); +   } else {
> > +   pipe_resource_reference((struct
> > pipe_resource**)&staging, NULL); +   }
> > +   }
> > +   }
> > 
> > data = r600_buffer_map_sync_with_rings(rctx, rbuffer, usage);
> > if (!data) {
> > 
> > @@ -279,24 +319,26 @@ static void r600_buffer_transfer_unmap(struct
> > pipe_context *ctx,> 
> > struct r600_resource *rbuffer = r600_resource(transfer->resource);
> > 
> > if (rtransfer->stag

Re: [Mesa-dev] [PATCH 1/3] r600g, radeonsi: use a fallback in dma_copy instead of failing

2014-03-09 Thread Niels Ole Salscheider

On Sunday 09 March 2014, 02:24:51, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/gallium/drivers/r600/evergreen_state.c  | 37 +---
>  src/gallium/drivers/r600/r600_state.c   | 41 ++---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 58
> +++-- src/gallium/drivers/radeon/r600_pipe_common.h   |
> 17 
>  src/gallium/drivers/radeon/r600_texture.c   | 18 +++-
>  src/gallium/drivers/radeonsi/si_state.c | 19 
>  6 files changed, 97 insertions(+), 93 deletions(-)
> 
> diff --git a/src/gallium/drivers/r600/evergreen_state.c
> b/src/gallium/drivers/r600/evergreen_state.c index dca7c58..5e57f8d 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -3329,13 +3329,13 @@ static void evergreen_dma_copy_tile(struct
> r600_context *rctx, }
>  }
> 
> -static boolean evergreen_dma_blit(struct pipe_context *ctx,
> -   struct pipe_resource *dst,
> -   unsigned dst_level,
> -   unsigned dst_x, unsigned dst_y, unsigned 
> dst_z,
> -   struct pipe_resource *src,
> -   unsigned src_level,
> -   const struct pipe_box *src_box)
> +static void evergreen_dma_blit(struct pipe_context *ctx,
> +struct pipe_resource *dst,
> +unsigned dst_level,
> +unsigned dst_x, unsigned dst_y, unsigned dst_z,
> +struct pipe_resource *src,
> +unsigned src_level,
> +const struct pipe_box *src_box)
>  {
>   struct r600_context *rctx = (struct r600_context *)ctx;
>   struct r600_texture *rsrc = (struct r600_texture*)src;
> @@ -3345,19 +3345,22 @@ static boolean evergreen_dma_blit(struct
> pipe_context *ctx, unsigned src_x, src_y;
> 
>   if (rctx->b.rings.dma.cs == NULL) {
> - return FALSE;
> + goto fallback;
>   }
> 
>   if (dst->target == PIPE_BUFFER && src->target == PIPE_BUFFER) {
> + if (dst_x % 4 || src_box->x % 4 || src_box->width % 4)
> + goto fallback;

Why do we need this? I think that the async DMA engine can handle byte-aligned
copies. It is streamout that needs x and width to be dword-aligned, isn't it?

> +
>   evergreen_dma_copy(rctx, dst, src, dst_x, src_box->x, src_box-
>width);
> - return TRUE;
> + return;
>   }
> 
>   if (src->format != dst->format) {
> - return FALSE;
> + goto fallback;
>   }
>   if (rdst->dirty_level_mask != 0) {
> - return FALSE;
> + goto fallback;
>   }
>   if (rsrc->dirty_level_mask) {
>   ctx->flush_resource(ctx, src);
> @@ -3383,13 +3386,13 @@ static boolean evergreen_dma_blit(struct
> pipe_context *ctx,
> 
>   if (src_pitch != dst_pitch || src_box->x || dst_x || src_w != dst_w) {
>   /* FIXME evergreen can do partial blit */
> - return FALSE;
> + goto fallback;
>   }
>   /* the x test here are currently useless (because we don't support 
partial
> blit) * but keep them around so we don't forget about those
>*/
>   if ((src_pitch & 0x7) || (src_box->x & 0x7) || (dst_x & 0x7) ||
> (src_box->y & 0x7) || (dst_y & 0x7)) { -  return FALSE;
> + goto fallback;
>   }
> 
>   /* 128 bpp surfaces require non_disp_tiling for both
> @@ -3400,7 +3403,7 @@ static boolean evergreen_dma_blit(struct pipe_context
> *ctx, if ((rctx->b.chip_class == CAYMAN) &&
>   (src_mode != dst_mode) &&
>   (util_format_get_blocksize(src->format) >= 16)) {
> - return FALSE;
> + goto fallback;
>   }
> 
>   if (src_mode == dst_mode) {
> @@ -3423,7 +3426,11 @@ static boolean evergreen_dma_blit(struct pipe_context
> *ctx, src, src_level, src_x, src_y, src_box->z,
>   copy_height, dst_pitch, bpp);
>   }
> - return TRUE;
> + return;
> +
> +fallback:
> + ctx->resource_copy_region(ctx, dst, dst_level, dst_x, dst_y, dst_z,
> +   src, src_level, src_box);
>  }
> 
>  void evergreen_init_state_functions(struct r600_context *rctx)
> diff --git a/src/gallium/drivers/r600/r600_state.c
> b/src/gallium/drivers/r600/r600_state.c index 6d89e6c..a0e6d2d 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -2883,13 +2883,13 @@ static boolean r600_dma_copy_tile(struct
> r600_context *rctx, return TRUE;
>  }
> 
> -static boolean r600_dma_blit(struct pipe_context *ctx,
> -  struct pipe_resource *dst,
> -  unsigned dst_level,
> -  unsigned dst_x, unsigned dst

Re: [Mesa-dev] [PATCH 1/3] r600g, radeonsi: use a fallback in dma_copy instead of failing

2014-03-09 Thread Niels Ole Salscheider

You are right, r600-r700 require dword alignment, while linear copies can be
byte-aligned on EG+.
Apart from that, patches 1 and 2 look good to me...
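
To spell out the rule as I understand it, here is a minimal sketch. The enum
and function names are hypothetical and do not exist in the driver; only the
alignment rule itself is taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generation tag for this sketch; not the driver's chip_class. */
enum dma_gen {
    GEN_R600_R700,      /* async DMA needs dword-aligned linear copies */
    GEN_EVERGREEN_PLUS  /* async DMA can do byte-aligned linear copies */
};

static bool is_dword_aligned(uint64_t v)
{
    return (v & 0x3) == 0;
}

/* Returns true if a linear buffer copy satisfies the alignment rule
 * for the given generation; otherwise a fallback path would be needed. */
bool linear_dma_copy_ok(enum dma_gen gen, uint64_t dst_x,
                        uint64_t src_x, uint64_t width)
{
    if (gen == GEN_EVERGREEN_PLUS)
        return true;

    return is_dword_aligned(dst_x) &&
           is_dword_aligned(src_x) &&
           is_dword_aligned(width);
}
```

With this rule, the "% 4" checks would only be needed in the r600 path and
not in the evergreen one.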

Ole

[Mesa-dev] [PATCH 1/2] radeonsi: Add DMA ring

2014-03-13 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/si_hw_context.c |  3 +++
 src/gallium/drivers/radeonsi/si_pipe.c   | 22 ++
 2 files changed, 25 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c 
b/src/gallium/drivers/radeonsi/si_hw_context.c
index c952c8d..d9fba01 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -123,6 +123,9 @@ void si_context_flush(struct si_context *ctx, unsigned 
flags)
 #endif
 
/* Flush the CS. */
+   if (ctx->b.rings.dma.cs) {
+   ctx->b.ws->cs_flush(ctx->b.rings.dma.cs, flags, 0);
+   }
ctx->b.ws->cs_flush(ctx->b.rings.gfx.cs, flags, 0);
 
 #if SI_TRACE_CS
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 827e9fe..21cbedf 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -74,6 +74,24 @@ static void si_flush_from_winsys(void *ctx, unsigned flags)
si_flush((struct pipe_context*)ctx, NULL, flags);
 }
 
+static void si_flush_dma_from_st(void *ctx, unsigned flags)
+{
+   struct si_context *sctx = (struct si_context *)ctx;
+   struct radeon_winsys_cs *cs = sctx->b.rings.dma.cs;
+
+   if (!cs->cdw) {
+   return;
+   }
+
+   sctx->b.ws->cs_flush(cs, flags, 0);
+}
+
+static void si_flush_dma_from_winsys(void *ctx, unsigned flags)
+{
+   struct si_context *sctx = (struct si_context *)ctx;
+   sctx->b.rings.dma.flush(sctx, flags);
+}
+
 static void si_destroy_context(struct pipe_context *context)
 {
struct si_context *sctx = (struct si_context *)context;
@@ -163,6 +181,10 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen, void *
 
sctx->b.ws->cs_set_flush_callback(sctx->b.rings.gfx.cs, 
si_flush_from_winsys, sctx);
 
+   sctx->b.rings.dma.cs = sctx->b.ws->cs_create(sctx->b.ws, RING_DMA, 
NULL);
+   sctx->b.rings.dma.flush = si_flush_dma_from_st;
+   sctx->b.ws->cs_set_flush_callback(sctx->b.rings.dma.cs, 
si_flush_dma_from_winsys, sctx);
+
sctx->blitter = util_blitter_create(&sctx->b.b);
if (sctx->blitter == NULL)
goto fail;
-- 
1.9.0


[Mesa-dev] [PATCH 2/2] radeonsi: Implement DMA blit

2014-03-13 Thread Niels Ole Salscheider

This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.

I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.
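
As in the evergreen code, si_dma_copy picks a dword copy when both offsets and
the size are 4-byte aligned and a byte copy otherwise, then splits the
transfer into packets. A minimal standalone sketch of that bookkeeping follows;
the struct and function names are hypothetical, and the per-packet limit is
passed in as a parameter rather than taken from the hardware definition.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of the driver. */
struct dma_copy_plan {
    unsigned shift;        /* 2 for dword copies, 0 for byte copies */
    uint64_t units;        /* size counted in dwords or in bytes */
    uint64_t num_packets;  /* number of copy packets to emit */
};

struct dma_copy_plan plan_dma_copy(uint64_t dst_offset, uint64_t src_offset,
                                   uint64_t size, uint64_t max_units)
{
    struct dma_copy_plan p;

    /* Use a dword copy only if everything is 4-byte aligned. */
    if (!(dst_offset & 0x3) && !(src_offset & 0x3) && !(size & 0x3)) {
        p.shift = 2;
        p.units = size >> 2;
    } else {
        p.shift = 0;
        p.units = size;
    }

    /* Round up: one extra packet for a partial final chunk. */
    p.num_packets = p.units / max_units + !!(p.units % max_units);
    return p;
}
```

Each packet then advances both offsets by the chunk size shifted left by
`shift`, so that the offsets stay in bytes in both modes.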

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/si_hw_context.c |  65 +++
 src/gallium/drivers/radeonsi/si_pipe.h   |   7 +
 src/gallium/drivers/radeonsi/si_state.c  | 265 ++-
 src/gallium/drivers/radeonsi/sid.h   |  15 ++
 4 files changed, 346 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c 
b/src/gallium/drivers/radeonsi/si_hw_context.c
index d9fba01..76583a3 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -25,6 +25,8 @@
  */
 
 #include "si_pipe.h"
+#include "sid.h"
+#include "../radeon/r600_cs.h"
 
 /* initialize */
 void si_need_cs_space(struct si_context *ctx, unsigned num_dw,
@@ -186,6 +188,69 @@ void si_begin_new_cs(struct si_context *ctx)
ctx->b.initial_gfx_cs_size = ctx->b.rings.gfx.cs->cdw;
 }
 
+void si_need_dma_space(struct si_context *ctx, unsigned num_dw)
+{
+   /* The number of dwords we already used in the DMA so far. */
+   num_dw += ctx->b.rings.dma.cs->cdw;
+   /* Flush if there's not enough space. */
+   if (num_dw > RADEON_MAX_CMDBUF_DWORDS) {
+   ctx->b.rings.dma.flush(ctx, RADEON_FLUSH_ASYNC);
+   }
+}
+
+void si_dma_copy(struct si_context *ctx,
+struct pipe_resource *dst,
+struct pipe_resource *src,
+uint64_t dst_offset,
+uint64_t src_offset,
+uint64_t size)
+{
+   struct radeon_winsys_cs *cs = ctx->b.rings.dma.cs;
+   unsigned i, ncopy, csize, sub_cmd, shift;
+   struct r600_resource *rdst = (struct r600_resource*)dst;
+   struct r600_resource *rsrc = (struct r600_resource*)src;
+
+   /* Mark the buffer range of destination as valid (initialized),
+* so that transfer_map knows it should wait for the GPU when mapping
+* that range. */
+   util_range_add(&rdst->valid_buffer_range, dst_offset,
+  dst_offset + size);
+
+   /* make sure that the dma ring is only one active */
+   ctx->b.rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC);
+   dst_offset += r600_resource_va(&ctx->screen->b.b, dst);
+   src_offset += r600_resource_va(&ctx->screen->b.b, src);
+
+   /* see if we use dword or byte copy */
+   if (!(dst_offset & 0x3) && !(src_offset & 0x3) && !(size & 0x3)) {
+   size >>= 2;
+   sub_cmd = 0x00;
+   shift = 2;
+   } else {
+   sub_cmd = 0x40;
+   shift = 0;
+   }
+   ncopy = (size / 0x000fffff) + !!(size % 0x000fffff);
+
+   si_need_dma_space(ctx, ncopy * 5);
+   for (i = 0; i < ncopy; i++) {
+   csize = size < 0x000fffff ? size : 0x000fffff;
+   /* emit reloc before writting cs so that cs is always in 
consistent state */
+   r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rsrc, 
RADEON_USAGE_READ,
+ RADEON_PRIO_MIN);
+   r600_context_bo_reloc(&ctx->b, &ctx->b.rings.dma, rdst, 
RADEON_USAGE_WRITE,
+ RADEON_PRIO_MIN);
+   cs->buf[cs->cdw++] = SI_DMA_PACKET(SI_DMA_PACKET_COPY, sub_cmd, 
csize);
+   cs->buf[cs->cdw++] = dst_offset & 0xffffffff;
+   cs->buf[cs->cdw++] = src_offset & 0xffffffff;
+   cs->buf[cs->cdw++] = (dst_offset >> 32UL) & 0xff;
+   cs->buf[cs->cdw++] = (src_offset >> 32UL) & 0xff;
+   dst_offset += csize << shift;
+   src_offset += csize << shift;
+   size -= csize;
+   }
+}
+
 #if SI_TRACE_CS
 void si_trace_emit(struct si_context *sctx)
 {
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 47dc8e7..45def1e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -171,6 +171,13 @@ void si_decompress_color_textures(struct si_context *sctx,
 void si_context_flush(struct si_context *ctx, unsigned flags);
 void si_begin_new_cs(struct si_context *ctx);
 void si_need_cs_space(struct si_context *ctx, unsigned num_dw, boolean 
count_draw_in);
+void si_need_dma_space(struct si_context *ctx, unsigned num_dw);
+void si_dma_copy(struct si_context *ctx,
+struct pipe_resource *dst,
+struct pipe_resource *src,
+uint64_t dst_offset,
+

[Mesa-dev] [PATCH 1/2] radeon: Move DMA ring creation to common code

2014-03-13 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/r600/r600_pipe.c  | 30 -
 src/gallium/drivers/r600/r600_pipe.h  |  1 -
 src/gallium/drivers/radeon/r600_pipe_common.c | 32 +++
 src/gallium/drivers/radeon/r600_pipe_common.h |  2 ++
 4 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 88fbdd8..982e18d 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -48,7 +48,6 @@ static const struct debug_named_value r600_debug_options[] = {
{ "nollvm", DBG_NO_LLVM, "Disable the LLVM shader compiler" },
 #endif
{ "nocpdma", DBG_NO_CP_DMA, "Disable CP DMA" },
-   { "nodma", DBG_NO_ASYNC_DMA, "Disable asynchronous DMA" },
 
/* shader backend */
{ "nosb", DBG_NO_SB, "Disable sb backend for graphics shaders" },
@@ -121,20 +120,6 @@ static void r600_flush_gfx_ring(void *ctx, unsigned flags)
r600_flush((struct pipe_context*)ctx, flags);
 }
 
-static void r600_flush_dma_ring(void *ctx, unsigned flags)
-{
-   struct r600_context *rctx = (struct r600_context *)ctx;
-   struct radeon_winsys_cs *cs = rctx->b.rings.dma.cs;
-
-   if (!cs->cdw) {
-   return;
-   }
-
-   rctx->b.rings.dma.flushing = true;
-   rctx->b.ws->cs_flush(cs, flags, 0);
-   rctx->b.rings.dma.flushing = false;
-}
-
 static void r600_flush_from_winsys(void *ctx, unsigned flags)
 {
struct r600_context *rctx = (struct r600_context *)ctx;
@@ -142,13 +127,6 @@ static void r600_flush_from_winsys(void *ctx, unsigned 
flags)
rctx->b.rings.gfx.flush(rctx, flags);
 }
 
-static void r600_flush_dma_from_winsys(void *ctx, unsigned flags)
-{
-   struct r600_context *rctx = (struct r600_context *)ctx;
-
-   rctx->b.rings.dma.flush(rctx, flags);
-}
-
 static void r600_destroy_context(struct pipe_context *context)
 {
struct r600_context *rctx = (struct r600_context *)context;
@@ -269,14 +247,6 @@ static struct pipe_context *r600_create_context(struct 
pipe_screen *screen, void
rctx->b.ws->cs_set_flush_callback(rctx->b.rings.gfx.cs, 
r600_flush_from_winsys, rctx);
rctx->b.rings.gfx.flushing = false;
 
-   rctx->b.rings.dma.cs = NULL;
-   if (rscreen->b.info.r600_has_dma && !(rscreen->b.debug_flags & 
DBG_NO_ASYNC_DMA)) {
-   rctx->b.rings.dma.cs = rctx->b.ws->cs_create(rctx->b.ws, 
RING_DMA, NULL);
-   rctx->b.rings.dma.flush = r600_flush_dma_ring;
-   rctx->b.ws->cs_set_flush_callback(rctx->b.rings.dma.cs, 
r600_flush_dma_from_winsys, rctx);
-   rctx->b.rings.dma.flushing = false;
-   }
-
rctx->allocator_fetch_shader = u_suballocator_create(&rctx->b.b, 64 * 
1024, 256,
 0, 
PIPE_USAGE_DEFAULT, FALSE);
if (!rctx->allocator_fetch_shader)
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 6d627e5..a3827e3 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -197,7 +197,6 @@ struct r600_gs_rings_state {
 /* features */
 #define DBG_NO_LLVM(1 << 17)
 #define DBG_NO_CP_DMA  (1 << 18)
-#define DBG_NO_ASYNC_DMA   (1 << 19)
 /* shader backend */
 #define DBG_NO_SB  (1 << 21)
 #define DBG_SB_CS  (1 << 22)
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 3aa718d..2e39aaf 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -43,6 +43,27 @@ static void r600_memory_barrier(struct pipe_context *ctx, 
unsigned flags)
 {
 }
 
+static void r600_flush_dma_ring(void *ctx, unsigned flags)
+{
+   struct r600_common_context *rctx = (struct r600_common_context *)ctx;
+   struct radeon_winsys_cs *cs = rctx->rings.dma.cs;
+
+   if (!cs->cdw) {
+   return;
+   }
+
+   rctx->rings.dma.flushing = true;
+   rctx->ws->cs_flush(cs, flags, 0);
+   rctx->rings.dma.flushing = false;
+}
+
+static void r600_flush_dma_from_winsys(void *ctx, unsigned flags)
+{
+   struct r600_common_context *rctx = (struct r600_common_context *)ctx;
+
+   rctx->rings.dma.flush(rctx, flags);
+}
+
 bool r600_common_context_init(struct r600_common_context *rctx,
  struct r600_common_screen *rscreen)
 {
@@ -77,6 +98,14 @@ bool r600_common_context_init(struct r600_common_context 
*rctx,
if (!rctx->uploader)
return false;
 
+   rctx->rings.dma.cs = NULL;

[Mesa-dev] [PATCH 2/2] radeonsi: flush the dma ring in si_flush_from_st

2014-03-13 Thread Niels Ole Salscheider

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/drivers/radeonsi/si_pipe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 827e9fe..401bf6a 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -65,6 +65,13 @@ static void si_flush_from_st(struct pipe_context *ctx,
 struct pipe_fence_handle **fence,
 unsigned flags)
 {
+   struct si_context *sctx = (struct si_context *)ctx;
+
+   if (sctx->b.rings.dma.cs) {
+   sctx->b.rings.dma.flush(sctx,
+   flags & PIPE_FLUSH_END_OF_FRAME ? 
RADEON_FLUSH_END_OF_FRAME : 0);
+   }
+
si_flush(ctx, fence,
 flags & PIPE_FLUSH_END_OF_FRAME ? RADEON_FLUSH_END_OF_FRAME : 
0);
 }
-- 
1.9.0


[Mesa-dev] [PATCH] dri2GetGlxDrawableFromXDrawableId may return NULL

2010-07-28 Thread Niels Ole Salscheider

Hello,

I have written a small patch that fixes a crash in KWin (see
attachment).

Since dri2GetGlxDrawableFromXDrawableId may return NULL we
should only dereference the returned pointer if it is not NULL.
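
The pattern is the usual early return on a failed lookup. A minimal standalone
sketch, in which the types and the lookup function are hypothetical stand-ins
and not the real GLX code:

```c
#include <stddef.h>

/* Hypothetical stand-in for a drawable; not the real __GLXDRIdrawable. */
struct drawable {
    int invalidated;
};

static struct drawable known_drawable = { 0 };

/* Hypothetical lookup that, like dri2GetGlxDrawableFromXDrawableId,
 * may return NULL when no drawable is registered for the given ID. */
struct drawable *lookup_drawable(unsigned id)
{
    return id == 1 ? &known_drawable : NULL;
}

/* Returns 1 if the drawable was invalidated, 0 if there was none. */
int invalidate_drawable(unsigned id)
{
    struct drawable *d = lookup_drawable(id);

    /* Bail out before touching any member of a missing drawable. */
    if (d == NULL)
        return 0;

    d->invalidated = 1;
    return 1;
}
```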

Kind regards

Ole
From 1e98cf89bed85a87b3851815ed64619e402747eb Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Wed, 28 Jul 2010 09:15:22 +0200
Subject: [PATCH] dri2GetGlxDrawableFromXDrawableId may return NULL

---
 src/glx/dri2_glx.c |   37 +
 1 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c
index 49c7ce7..fb791ec 100644
--- a/src/glx/dri2_glx.c
+++ b/src/glx/dri2_glx.c
@@ -632,6 +632,10 @@ dri2InvalidateBuffers(Display *dpy, XID drawable)
 {
__GLXDRIdrawable *pdraw =
   dri2GetGlxDrawableFromXDrawableId(dpy, drawable);
+  
+   if (pdraw == NULL)
+   return;
+
struct dri2_screen *psc = (struct dri2_screen *) pdraw->psc;
struct dri2_drawable *pdp = (struct dri2_drawable *) pdraw;
 
@@ -653,27 +657,28 @@ dri2_bind_tex_image(Display * dpy,
struct dri2_drawable *pdraw = (struct dri2_drawable *) base;
struct dri2_display *pdp =
   (struct dri2_display *) dpyPriv->dri2Display;
-   struct dri2_screen *psc = (struct dri2_screen *) base->psc;
 
-   if (pdraw != NULL) {
+   if (pdraw == NULL)
+   return;
+
+   struct dri2_screen *psc = (struct dri2_screen *) base->psc;
 
 #if __DRI2_FLUSH_VERSION >= 3
-  if (!pdp->invalidateAvailable && psc->f)
-	 psc->f->invalidate(pdraw->driDrawable);
+   if (!pdp->invalidateAvailable && psc->f)
+   psc->f->invalidate(pdraw->driDrawable);
 #endif
 
-  if (psc->texBuffer->base.version >= 2 &&
-	  psc->texBuffer->setTexBuffer2 != NULL) {
-	 (*psc->texBuffer->setTexBuffer2) (pcp->driContext,
-	   pdraw->base.textureTarget,
-	   pdraw->base.textureFormat,
-	   pdraw->driDrawable);
-  }
-  else {
-	 (*psc->texBuffer->setTexBuffer) (pcp->driContext,
-	  pdraw->base.textureTarget,
-	  pdraw->driDrawable);
-  }
+   if (psc->texBuffer->base.version >= 2 &&
+   psc->texBuffer->setTexBuffer2 != NULL) {
+   (*psc->texBuffer->setTexBuffer2) (pcp->driContext,
+	 pdraw->base.textureTarget,
+	 pdraw->base.textureFormat,
+	 pdraw->driDrawable);
+   }
+   else {
+   (*psc->texBuffer->setTexBuffer) (pcp->driContext,
+	pdraw->base.textureTarget,
+	pdraw->driDrawable);
}
 }
 
-- 
1.7.2




Re: [Mesa-dev] [PATCH] dri2GetGlxDrawableFromXDrawableId may return NULL

2010-07-28 Thread Niels Ole Salscheider

Hello,

there's another place where the returned pointer is not checked in
src/glx/dri2.c (see attachment). This might fix bug 29148.

Kind regards

Ole

On Wednesday 28 July 2010 09:46:42 Niels Ole Salscheider wrote:
> Hello,
> 
> I have written a small patch that fixes a crash in KWin (see
> attachment).
> 
> Since dri2GetGlxDrawableFromXDrawableId may return NULL we
> should only dereference the returned pointer if it is not NULL.
> 
> Kind regards
> 
> Ole

From fe7b4347c14525abdf1b150400df3e209d1d91ad Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Wed, 28 Jul 2010 10:29:14 +0200
Subject: [PATCH] dri2GetGlxDrawableFromXDrawableId may return NULL

---
 src/glx/dri2.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/glx/dri2.c b/src/glx/dri2.c
index d53431c..771bf6c 100644
--- a/src/glx/dri2.c
+++ b/src/glx/dri2.c
@@ -103,6 +103,8 @@ DRI2WireToEvent(Display *dpy, XEvent *event, xEvent *wire)
 
   /* Ignore swap events if we're not looking for them */
   pdraw = dri2GetGlxDrawableFromXDrawableId(dpy, awire->drawable);
+  if (pdraw == NULL)
+ return False;
   if (!(pdraw->eventMask & GLX_BUFFER_SWAP_COMPLETE_INTEL_MASK))
 	 return False;
 
-- 
1.7.2




Re: [Mesa-dev] [PATCH] dri2GetGlxDrawableFromXDrawableId may return NULL

2010-07-28 Thread Niels Ole Salscheider

And finally some further functions I missed...

All three patches are
Signed-off-by: Niels Ole Salscheider

Kind regards

Ole

On Wednesday 28 July 2010 10:32:06 Niels Ole Salscheider wrote:
> Hello,
> 
> there's another place where the returned pointer is not checked in
> src/glx/dri2.c (see attachment). This might fix bug 29148.
> 
> Kind regards
> 
> Ole
From 323bfe5094f87fdd836566eb5dd9e523299baf03 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Wed, 28 Jul 2010 16:39:01 +0200
Subject: [PATCH] GetGLXDRIDrawable may return NULL

---
 src/glx/glxcmds.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/src/glx/glxcmds.c b/src/glx/glxcmds.c
index 0782b1d..2dad90a 100644
--- a/src/glx/glxcmds.c
+++ b/src/glx/glxcmds.c
@@ -1962,6 +1962,9 @@ __glXSwapIntervalSGI(int interval)
if (gc->driContext && psc->driScreen && psc->driScreen->setSwapInterval) {
   __GLXDRIdrawable *pdraw =
 	 GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable);
+  if (pdraw == NULL)
+ return False;
+
   psc->driScreen->setSwapInterval(pdraw, interval);
   return 0;
}
@@ -2008,6 +2011,9 @@ __glXSwapIntervalMESA(unsigned int interval)
   if (psc->driScreen && psc->driScreen->setSwapInterval) {
  __GLXDRIdrawable *pdraw =
 	GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable);
+	 if (pdraw == NULL)
+	return False;
+
 	 return psc->driScreen->setSwapInterval(pdraw, interval);
   }
}
@@ -2030,6 +2036,9 @@ __glXGetSwapIntervalMESA(void)
   if (psc->driScreen && psc->driScreen->getSwapInterval) {
  __GLXDRIdrawable *pdraw =
 	GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable);
+	 if (pdraw == NULL)
+	return False;
+
 	 return psc->driScreen->getSwapInterval(pdraw);
   }
}
@@ -2064,6 +2073,8 @@ __glXGetVideoSyncSGI(unsigned int *count)
psc = GetGLXScreenConfigs(gc->currentDpy, gc->screen);
 #ifdef GLX_DIRECT_RENDERING
pdraw = GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable);
+   if (pdraw == NULL)
+  return False;
 #endif
 
/* FIXME: Looking at the GLX_SGI_video_sync spec in the extension registry,
@@ -2106,6 +2117,8 @@ __glXWaitVideoSyncSGI(int divisor, int remainder, unsigned int *count)
psc = GetGLXScreenConfigs( gc->currentDpy, gc->screen);
 #ifdef GLX_DIRECT_RENDERING
pdraw = GetGLXDRIDrawable(gc->currentDpy, gc->currentDrawable);
+   if (pdraw == NULL)
+  return False;
 #endif
 
 #ifdef GLX_DIRECT_RENDERING
-- 
1.7.2




[Mesa-dev] [PATCH] check if glx_displays->dpy == dpy and update glx_display

2010-07-29 Thread Niels Ole Salscheider

Hello,

In src/glx/glxext.c, line 146ff, dpy should be removed from the
glx_displays list. However, the code does not handle the case where dpy is
the first item in glx_displays, or where glx_displays is empty.

The attached patch should fix this.
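
For illustration only, here is a minimal sketch of removing an entry from a singly linked list in a way that covers both cases named above (entry at the head, empty list). It is not the code from the attached patch: struct display_node, its dpy_id field and unlink_display() are hypothetical stand-ins, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct glx_display; only the fields needed here. */
struct display_node {
   int dpy_id;                  /* stands in for the Display * key */
   struct display_node *next;
};

/*
 * Unlink the node whose key matches dpy_id and return it, or return NULL if
 * the list is empty or holds no such node.  "link" always points at the
 * pointer that references the current node, so the head needs no special case.
 */
struct display_node *
unlink_display(struct display_node **head, int dpy_id)
{
   struct display_node **link;

   assert(head != NULL);

   for (link = head; *link != NULL; link = &(*link)->next) {
      if ((*link)->dpy_id == dpy_id) {
         struct display_node *found = *link;

         *link = found->next;   /* same statement for head, middle and tail */
         found->next = NULL;
         return found;
      }
   }
   return NULL;
}
```

Because the loop rewrites whichever pointer currently references the matching node, an empty list simply falls through the loop and a match at the head updates the list pointer itself.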

Signed-off-by: Niels Ole Salscheider

Kind regards

Ole
From fad9756434ad461408e40c96af4465edfb984aa2 Mon Sep 17 00:00:00 2001
From: Niels Ole Salscheider 
Date: Thu, 29 Jul 2010 18:19:14 +0200
Subject: [PATCH] check if glx_displays->dpy == dpy and update glx_display

---
 src/glx/glxext.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/glx/glxext.c b/src/glx/glxext.c
index c227215..0c16cb1 100644
--- a/src/glx/glxext.c
+++ b/src/glx/glxext.c
@@ -239,15 +239,26 @@ static void
 static int
 __glXCloseDisplay(Display * dpy, XExtCodes * codes)
 {
- struct glx_display *priv, **prev;
+ struct glx_display *priv, *prev;
  struct glx_context *gc;
 
_XLockMutex(_Xglobal_lock);
-   prev = &glx_displays;
-   for (priv = glx_displays; priv; prev = &priv->next, priv = priv->next) {
-  if (priv->dpy == dpy) {
-	 (*prev)->next = priv->next;
-	 break;
+   
+   if (glx_displays == NULL)
+  return;
+   if (glx_displays->dpy == dpy)
+   {
+  glx_displays = glx_displays->next;
+  priv = glx_displays;
+   }
+   else
+   {
+  prev = glx_displays;
+  for (priv = glx_displays; priv; prev = priv, priv = priv->next) {
+ if (priv->dpy == dpy) {
+   prev->next = priv->next;
+   break;
+ }
   }
}
_XUnlockMutex(_Xglobal_lock);
-- 
1.7.2




Re: [Mesa-dev] [PATCH] dri2GetGlxDrawableFromXDrawableId may return NULL

2010-07-30 Thread Niels Ole Salscheider

Hello,

> > Since dri2GetGlxDrawableFromXDrawableId may return NULL we
> > should only dereference the returned pointer if it is not NULL.
> 
> It shouldn't return NULL... when does this happen? During shutdown?
> We only get the DRI2 events for drawables we've created a DRI2
> drawable for, which means there should be a __GLXDRIdrawable in the
> hash.

It happens (randomly?) when I try to activate or change kwin's (4.4.95) 
desktop effects.

This is the first backtrace I posted on IRC:

Application: KWin (kwin), signal: Segmentation fault
[KCrash Handler]
#6  0x7fba52383d49 in dri2InvalidateBuffers () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1
#7  0x7fba52383ead in dri2SwapBuffers () from //usr/lib64/opengl/xorg-x11/lib/libGL.so.1
#8  0x7fba56f78c85 in KWin::SceneOpenGL::flushBuffer(int, QRegion) () from /usr/lib/libkdeinit4_kwin.so
#9  0x7fba56f79c06 in KWin::SceneOpenGL::paint(QRegion, QList) () from /usr/lib/libkdeinit4_kwin.so
#10 0x7fba56f6469c in KWin::Workspace::performCompositing() () from /usr/lib/libkdeinit4_kwin.so
#11 0x7fba56ee87c2 in KWin::Workspace::qt_metacall(QMetaObject::Call, int, void**) () from /usr/lib/libkdeinit4_kwin.so
#12 0x7fba53ae7006 in QMetaObject::activate(QObject*, QMetaObject const*, int, void**) () from /usr/lib64/qt4/libQtCore.so.4
#13 0x7fba53ae38b6 in QObject::event(QEvent*) () from /usr/lib64/qt4/libQtCore.so.4
#14 0x7fba546fcc0c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib64/qt4/libQtGui.so.4
#15 0x7fba5470320b in QApplication::notify(QObject*, QEvent*) () from /usr/lib64/qt4/libQtGui.so.4
#16 0x7fba5566aee8 in KApplication::notify(QObject*, QEvent*) () from /usr/lib/libkdeui.so.5
#17 0x7fba53ad3d6b in QCoreApplication::notifyInternal(QObject*, QEvent*) () from /usr/lib64/qt4/libQtCore.so.4
#18 0x7fba53b008ba in QTimerInfoList::activateTimers() () from /usr/lib64/qt4/libQtCore.so.4
#19 0x7fba53b00acb in QEventDispatcherUNIX::processEvents(QFlags) () from /usr/lib64/qt4/libQtCore.so.4
#20 0x7fba547ac595 in QEventDispatcherX11::processEvents(QFlags) () from /usr/lib64/qt4/libQtGui.so.4
#21 0x7fba53ad2692 in QEventLoop::processEvents(QFlags) () from /usr/lib64/qt4/libQtCore.so.4
#22 0x7fba53ad2a5d in QEventLoop::exec(QFlags) () from /usr/lib64/qt4/libQtCore.so.4
#23 0x7fba53ad769b in QCoreApplication::exec() () from /usr/lib64/qt4/libQtCore.so.4
#24 0x7fba56f0341b in kdemain () from /usr/lib/libkdeinit4_kwin.so
#25 0x7fba50a22bbd in __libc_start_main () from /lib/libc.so.6
#26 0x00400969 in _start ()

dri2_glx.c, line 508ff says the following:

/* Old servers don't send invalidate events */
if (!pdp->invalidateAvailable)
   dri2InvalidateBuffers(dpyPriv->dpy, pdraw->drawable);

So dri2InvalidateBuffers should only be called if the server is "old" - I am 
using xorg-server 1.8.2, is that "old"? Maybe the server sends invalidate 
events and dri2InvalidateBuffers is called so that it cannot find the drawable 
anymore?

Unfortunately I am not at home until Wednesday so that I cannot debug it 
further until then.

> As for the bind_tex_image, does KWin create a glx drawable and
> then use the X window XID for tfp?  These things shouldn't happen so I
> don't just want to slap a NULL check on them without understanding how
> it happens.

I have to admit that I do not know much about mesa's or kwin's internals. I 
only noticed that there are some NULL pointer checks for GetGLXDRIDrawable and 
thought that this was the way to go.
In dri2_bind_tex_image there is already a NULL-pointer check in line 662 (I do 
not know why it is there and unfortunately I do not have a backtrace at the 
moment). But this check is useless since the returned drawable is already 
dereferenced in line 660.
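
To illustrate the ordering problem described above, here is a small sketch under simplified assumptions. struct screen, struct drawable and screen_id_of() are hypothetical stand-ins, not the real GLX types; the only point is that the NULL test has to come before the first read through the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the GLX structures. */
struct screen { int id; };
struct drawable { struct screen *psc; };

/* Returns the screen id, or -1 when no drawable was found. */
int
screen_id_of(const struct drawable *pdraw)
{
   /* Check first: nothing has been read through pdraw yet. */
   if (pdraw == NULL)
      return -1;

   /* Only now is it safe to follow the pointer. */
   assert(pdraw->psc != NULL);
   return pdraw->psc->id;
}
```

If the read of pdraw->psc were moved above the check, a NULL pdraw would already have been dereferenced before the test runs, so the test could no longer prevent the crash.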

Kind regards

Ole



[Mesa-dev] Commit 683ef52e19576f6e1263bc7d25fc9475c519eade / Assertion `type != WRITE_COLOR' failed.

2010-08-16 Thread Niels Ole Salscheider

Hello,

when I try to run any OpenGL program like glxgears I get the following (radeon
x700, gallium, mesa git):
r300_state_derived.c:214:r300_rs_col_write: Assertion `type != WRITE_COLOR'
failed.

This assertion was added by Marek Olšák in commit
683ef52e19576f6e1263bc7d25fc9475c519eade.

Unfortunately I cannot find any place where r300_rs_col_write is called with
type == WRITE_COLOR (which is what the assertion rejects) except for line 419ff
in r300_state_derived.c. But for my card, r300->screen->caps.is_r500 should be
false.

Do you have any ideas?

Regards,

Ole


[Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-15 Thread Niels Ole Salscheider

Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85189.

Signed-off-by: Niels Ole Salscheider 
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c72fe92..1761c32 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1732,7 +1732,7 @@ if test "x$enable_gallium_llvm" = xyes; then
 fi
 
 if test "x$enable_opencl" = xyes; then
-LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
 # LLVM 3.3 >= 177971 requires IRReader
 if $LLVM_CONFIG --components | grep -qw 'irreader'; then
 LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
-- 
2.2.1


Re: [Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-22 Thread Niels Ole Salscheider

On Thursday 22 January 2015, 13:46:14, Jan Vesely wrote:
> On Thu, 2015-01-22 at 16:45 +, Emil Velikov wrote:
> > On 15/01/15 21:38, Tom Stellard wrote:
> > > On Thu, Jan 15, 2015 at 07:25:56PM +0100, Niels Ole Salscheider wrote:
> > >> Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all
> > >> targets in clover. This fixes bug 85189.
> > >> 
> > >> Signed-off-by: Niels Ole Salscheider 
> > > 
> > > Reviewed-by: Tom Stellard 
> > 
> > Hi Niels,
> > 
> > Can you confirm if this is needed for the 10.4 branch ? The commit
> > mentioned got in the 10.4 devel cycle.
> > 
> > Also the bug mentioned
> > (https://bugs.freedesktop.org/show_bug.cgi?id=85189) seems to have
> > alternative fix which is already in master. I take that this fix is
> > required when building with static llvm ?
> 
> the patch looks like it fixes
> https://bugs.freedesktop.org/show_bug.cgi?id=85380
> instead of 85189

Yes, Jan is right. This patch fixes bug 85380 instead of 85189 - this was 
probably a copy&paste error.

This patch is relevant for the 10.4 branch, too, since commit 
8e7df519bd8556591794b2de08a833a67e34d526 is in it.

Ole

> jan
> 
> > Thanks
> > Emil
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-24 Thread Niels Ole Salscheider

Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85380.

v2: Mention correct bug in commit message

Signed-off-by: Niels Ole Salscheider 
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 1cce517..2b7f576 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1902,7 +1902,7 @@ if test "x$enable_gallium_llvm" = xyes; then
 fi
 
 if test "x$enable_opencl" = xyes; then
-LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
+LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
 # LLVM 3.3 >= 177971 requires IRReader
 if $LLVM_CONFIG --components | grep -qw 'irreader'; then
 LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
-- 
2.2.1


Re: [Mesa-dev] [PATCH] configure: Link against all LLVM targets when building clover

2015-01-24 Thread Niels Ole Salscheider

On Saturday 24 January 2015, 18:24:16, Jan Vesely wrote:
> On Sat, 2015-01-24 at 22:49 +0100, Niels Ole Salscheider wrote:
> > Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets
> > in clover. This fixes bug 85380.
> > 
> > v2: Mention correct bug in commit message
> > 
> > Signed-off-by: Niels Ole Salscheider 
> 
> I thought you already had Tom's rb.
> you can add mine as well
> Reviewed-by: Jan Vesely 

Ok, thanks. But I do not have write access to mesa - would you mind pushing it
for me?

> 
> > ---
> > 
> >  configure.ac | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/configure.ac b/configure.ac
> > index 1cce517..2b7f576 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -1902,7 +1902,7 @@ if test "x$enable_gallium_llvm" = xyes; then
> > 
> >  fi
> >  
> >  if test "x$enable_opencl" = xyes; then
> > 
> > -LLVM_COMPONENTS="${LLVM_COMPONENTS} ipo linker instrumentation"
> > +LLVM_COMPONENTS="${LLVM_COMPONENTS} all-targets ipo linker instrumentation"
> >  # LLVM 3.3 >= 177971 requires IRReader
> >  if $LLVM_CONFIG --components | grep -qw 'irreader'; then
> >  
> >  LLVM_COMPONENTS="${LLVM_COMPONENTS} irreader"
> 
> --
> Jan Vesely 


[Mesa-dev] [PATCH] winsys/radeon: Do not deinit the pb cache if it was not initialized

2016-01-29 Thread Niels Ole Salscheider

This fixes a crash in pb_cache_release_all_buffers.

Signed-off-by: Niels Ole Salscheider 
---
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
index 8a1ed3a..4823bf3 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c
@@ -742,7 +742,7 @@ radeon_drm_winsys_create(int fd, radeon_screen_create_t screen_create)
 ws->fd = dup(fd);
 
 if (!do_winsys_init(ws))
-goto fail;
+goto fail1;
 
 pb_cache_init(&ws->bo_cache, 50, 2.0f, 0,
   MIN2(ws->info.vram_size, ws->info.gart_size),
@@ -812,8 +812,9 @@ radeon_drm_winsys_create(int fd, radeon_screen_create_t screen_create)
 return &ws->base;
 
 fail:
-pipe_mutex_unlock(fd_tab_mutex);
 pb_cache_deinit(&ws->bo_cache);
+fail1:
+pipe_mutex_unlock(fd_tab_mutex);
 if (ws->surf_man)
 radeon_surface_manager_free(ws->surf_man);
 if (ws->fd >= 0)
-- 
2.7.0
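
As a rough illustration of the idea behind this fix (tear down only what was actually set up), here is a self-contained sketch of staged error labels. Everything in it is hypothetical: struct cache, struct winsys, cache_init(), cache_deinit(), the counter and winsys_create() merely stand in for the real winsys code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resources standing in for the winsys and its buffer cache. */
struct cache { bool initialized; };
struct winsys { struct cache cache; };

int cache_deinit_calls;   /* counts teardown calls, for illustration only */

void
cache_init(struct cache *c)
{
   c->initialized = true;
}

void
cache_deinit(struct cache *c)
{
   /* Tearing down a cache that was never set up is the bug to avoid. */
   assert(c->initialized);
   c->initialized = false;
   cache_deinit_calls++;
}

struct winsys *
winsys_create(bool fail_early, bool fail_late)
{
   struct winsys *ws = calloc(1, sizeof(*ws));

   if (ws == NULL)
      return NULL;

   if (fail_early)
      goto fail_no_cache;   /* cache not set up yet: skip its teardown */

   cache_init(&ws->cache);

   if (fail_late)
      goto fail;            /* cache exists: tear it down as well */

   return ws;

fail:
   cache_deinit(&ws->cache);
fail_no_cache:
   free(ws);
   return NULL;
}
```

The labels are ordered so that each one undoes one more step than the label below it; a failure jumps to the label matching how far initialisation got.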

