On Mon, Oct 26, 2009 at 5:32 PM, Alex Deucher alexdeuc...@gmail.com wrote:
On Mon, Oct 26, 2009 at 4:55 AM, Stephan Schmid stephan_2...@gmx.de wrote:
This implements GL_ARB_occlusion_query for RV610.
Currently it results in a huge performance gain in games that take
advantage of ARB_oq, such as sauerbraten (cube2).
issues:
- So far this has been tested only on RV610. I figured out that the RV610
writes a single uint64_t value when the zpass write event is triggered.
The specs aren't too clear about what exactly is written.
It might be that there are multiple zpass counters on chip and that
r6xx/r7xx chips write one uint64_t per counter (just as the r300 does).
In that case the RV610 would write only one value because it's one of the
smallest chips in the family, so it has only one counter.
If my assumption is true, it would be necessary to use n*sizeof(uint64_t)
as the offset in r600_emit_query_finish (n = number of counters/values
written) and to take the additional values into account in
radeonQueryGetResult when computing the result of the query.
It would be interesting to know what the other r6xx/r7xx chips write on
the zpass-write event, so that they can be supported as well.
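To make the hypothesis above concrete, here is a minimal C sketch of how the offset stride and the result computation would look if each zpass write event produced one uint64_t per counter. The function names and the buffer layout (begin snapshot followed directly by the end snapshot) are assumptions for illustration only; they are not taken from the patch or from the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: each zpass write event stores one uint64_t per counter,
 * so the buffer offset must advance by n * sizeof(uint64_t) per event. */
static size_t zpass_event_stride(unsigned num_counters)
{
    return (size_t)num_counters * sizeof(uint64_t);
}

/* Sum (end - begin) over all counters for one begin/end pair.
 * `values` points at the begin snapshot; the end snapshot is assumed
 * to follow it immediately in memory. */
static uint64_t zpass_pair_result(const uint64_t *values, unsigned num_counters)
{
    uint64_t total = 0;

    assert(values != NULL);
    for (unsigned i = 0; i < num_counters; i++) {
        uint64_t begin = values[i];
        uint64_t end = values[num_counters + i];
        total += end - begin;
    }
    return total;
}
```

With num_counters = 1 this reduces to the single-value behaviour observed on RV610.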
Stephan,
Nice work! The zpass stuff is per DB just like the older chips, so
you'll need to allocate enough memory to support two qwords for each
DB. The number of DBs depends on the asic. We'll probably need a drm
query similar to what we do for r300. I'm working on a cleaned up
version of your mesa patch and a drm patch to return the number of RBs
like we do for r300.
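As a rough illustration of the sizing rule described above (two qwords per DB for each begin/end pair), the following C sketch computes how many pairs fit into a query buffer. The 4096-byte buffer size and both function names are assumptions made for this example; the driver's own constant is RADEON_QUERY_PAGE_SIZE, and the DB count would come from the proposed drm query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the query buffer is a single 4 KiB page. */
#define QUERY_PAGE_SIZE_ASSUMED 4096u

/* One begin qword and one end qword for every DB. */
static size_t oq_bytes_per_pair(unsigned num_db)
{
    return (size_t)num_db * 2u * sizeof(uint64_t);
}

/* Number of begin/end pairs that fit before the buffer is exhausted. */
static size_t oq_max_pairs(unsigned num_db)
{
    assert(num_db > 0);
    return QUERY_PAGE_SIZE_ASSUMED / oq_bytes_per_pair(num_db);
}
```

Under these assumptions, an asic with four DBs needs 64 bytes per begin/end pair.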
After testing, it seems r6xx aggregates the zpass results from all DB
blocks into 1 qword. R7xx seems to work differently. I'm following
up internally. The attached patch works properly on all the r6xx cards I
have and on certain r7xx cards. The tri-query and glean oq tests pass.
Alex
From 888e8fd56788bcf4fc42b9d630ad1c4a01d8c9b1 Mon Sep 17 00:00:00 2001
From: Alex Deucher alexdeuc...@gmail.com
Date: Tue, 27 Oct 2009 03:50:58 -0400
Subject: [PATCH] r600: add occlusion query support
Based on initial patch from Stephan Schmid stephan_2...@gmx.de.
Basic idea is to dump the zpass count before and after and subtract
to get the total number of visible fragments. R6xx appears to
aggregate the results of all DB blocks into a single qword and works
properly on all cards I've tested on. R7xx seems to work differently
and needs follow up.
Signed-off-by: Alex Deucher alexdeuc...@gmail.com
---
src/mesa/drivers/dri/r600/r600_context.c | 28 --
src/mesa/drivers/dri/r600/r700_chip.c | 50 +
src/mesa/drivers/dri/r600/r700_state.c | 1 +
src/mesa/drivers/dri/radeon/radeon_queryobj.c | 28 +++---
4 files changed, 97 insertions(+), 10 deletions(-)
diff --git a/src/mesa/drivers/dri/r600/r600_context.c b/src/mesa/drivers/dri/r600/r600_context.c
index c1bf76d..6fe2926 100644
--- a/src/mesa/drivers/dri/r600/r600_context.c
+++ b/src/mesa/drivers/dri/r600/r600_context.c
@@ -64,6 +64,7 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#include "r600_cmdbuf.h"
#include "r600_emit.h"
#include "radeon_bocs_wrapper.h"
+#include "radeon_queryobj.h"
#include "r700_state.h"
#include "r700_ioctl.h"
@@ -73,11 +74,8 @@ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#include "utils.h"
#include "xmlpool.h" /* for symbolic values of enum-type options */
-/* hw_tcl_on derives from future_hw_tcl_on when its safe to change it. */
-int future_hw_tcl_on = 1;
-int hw_tcl_on = 1;
-
#define need_GL_VERSION_2_0
+#define need_GL_ARB_occlusion_query
#define need_GL_ARB_point_parameters
#define need_GL_ARB_vertex_program
#define need_GL_EXT_blend_equation_separate
@@ -98,6 +96,7 @@ static const struct dri_extension card_extensions[] = {
/* *INDENT-OFF* */
{"GL_ARB_depth_texture", NULL},
{"GL_ARB_fragment_program", NULL},
+ {"GL_ARB_occlusion_query", GL_ARB_occlusion_query_functions},
{"GL_ARB_multitexture", NULL},
{"GL_ARB_point_parameters", GL_ARB_point_parameters_functions},
{"GL_ARB_shadow", NULL},
@@ -204,6 +203,25 @@ static void r600_fallback(GLcontext *ctx, GLuint bit, GLboolean mode)
context->radeon.Fallback &= ~bit;
}
+static void r600_emit_query_finish(radeonContextPtr radeon)
+{
+ context_t *context = (context_t *) radeon;
+ BATCH_LOCALS(&context->radeon);
+
+ struct radeon_query_object *query = radeon->query.current;
+
+ BEGIN_BATCH_NO_AUTOSTATE(4 + 2);
+ R600_OUT_BATCH(CP_PACKET3(R600_IT_EVENT_WRITE, 2));
+ R600_OUT_BATCH(ZPASS_DONE);
+ R600_OUT_BATCH(query->curr_offset); /* hw writes qwords */
+ R600_OUT_BATCH(0x);
+ R600_OUT_BATCH_RELOC(VGT_EVENT_INITIATOR, query->bo, 0, 0, RADEON_GEM_DOMAIN_GTT, 0);
+ END_BATCH();
+ query->curr_offset += 8 * sizeof(uint64_t);
+ assert(query->curr_offset < RADEON_QUERY_PAGE_SIZE);
+ query->emitted_begin = GL_FALSE;
+}
+
static void r600_init_vtbl(radeonContextPtr radeon)
{
radeon->vtbl.get_lock = r600_get_lock;
@@