You need to focus on the specific uint64 instruction in reg_insn_selection.cpp, and analyze the src/dst register carefully,
Two possible root causes there: 1. Wrongly set a register as a source but you will touch it in that instruction. 2. Incorrectly set the register type, for example, you want to use it as 64 bit but you only set it as DWORD. And that's all I can tell now. -----Original Message----- From: beignet-bounces+zhigang.gong=linux.intel....@lists.freedesktop.org [mailto:beignet-bounces+zhigang.gong=linux.intel....@lists.freedesktop.org] On Behalf Of Xing, Homer Sent: Thursday, September 26, 2013 1:34 PM To: Zhigang Gong; Yang, Rong R Cc: beignet@lists.freedesktop.org Subject: Re: [Beignet] [PATCH] Remove global offset need divide by local size restriction. I can take this bug. But currently I don't have enough information. Zhigang, do you know more details about how to find dependency bug in code path? -----Original Message----- From: beignet-bounces+homer.xing=intel....@lists.freedesktop.org [mailto:beignet-bounces+homer.xing=intel....@lists.freedesktop.org] On Behalf Of Zhigang Gong Sent: Thursday, September 26, 2013 10:43 AM To: Yang, Rong R Cc: beignet@lists.freedesktop.org Subject: Re: [Beignet] [PATCH] Remove global offset need divide by local size restriction. Hi guys, I just pushed this patch, and it causes a regression as below: compiler_abs_diff_long16: compiler_abs_diff_long16() [FAILED] Error: !memcmp(buf_data[2], cpu_diff, sizeof(T) * n) at file /home/gongzg/git/fdo/beignet/utests/compiler_abs_diff.cpp, function compiler_abs_diff_with_type, line 177 I took a quick look at it and found the root cause of this regression is not in this patch. As when I disable the OCL_PRE_ALLOC_INSN, it works fine. For me, the most possible root cause is there is a dependency bug within the abs_diff_long code path. I will take vacation from tomorrow and will back to work at 8/10. If anybody in the list can take this bug, I will appreciate. Thanks - Zhigang On Mon, Sep 23, 2013 at 02:04:08PM +0800, Yang Rong wrote: > Set to global offset to 0 in walker, and add global offset when get_global_id. > > Signed-off-by: Yang Rong <rong.r.y...@intel.com> > --- > backend/src/ocl_stdlib.tmpl.h | 2 +- > src/cl_api.c | 7 ------- > src/intel/intel_gpgpu.c | 6 +++--- > 3 files changed, 4 insertions(+), 11 deletions(-) > > diff --git a/backend/src/ocl_stdlib.tmpl.h > b/backend/src/ocl_stdlib.tmpl.h index 9b76ba1..4921ee4 100644 > --- a/backend/src/ocl_stdlib.tmpl.h > +++ b/backend/src/ocl_stdlib.tmpl.h > @@ -588,7 +588,7 @@ DECL_PUBLIC_WORK_ITEM_FN(get_num_groups, 1) #undef > DECL_PUBLIC_WORK_ITEM_FN > > INLINE uint get_global_id(uint dim) { > - return get_local_id(dim) + get_local_size(dim) * get_group_id(dim); > + return get_local_id(dim) + get_local_size(dim) * get_group_id(dim) > + + get_global_offset(dim); > } > > > ////////////////////////////////////////////////////////////////////// > /////// diff --git a/src/cl_api.c b/src/cl_api.c index > b60160b..c19b80a 100644 > --- a/src/cl_api.c > +++ b/src/cl_api.c > @@ -2264,19 +2264,12 @@ clEnqueueNDRangeKernel(cl_command_queue command_queue, > goto error; > } > > - /* Check offset values. We add a non standard restriction. The offsets must > - * also be evenly divided by the local sizes > - */ > if (global_work_offset != NULL) > for (i = 0; i < work_dim; ++i) { > if (UNLIKELY(~0LL - global_work_offset[i] > global_work_size[i])) { > err = CL_INVALID_GLOBAL_OFFSET; > goto error; > } > - if (UNLIKELY(local_work_size != NULL && global_work_offset[i] % local_work_size[i])) { > - err = CL_INVALID_GLOBAL_OFFSET; > - goto error; > - } > } > > /* Local sizes must be non-null and divide global sizes */ diff > --git a/src/intel/intel_gpgpu.c b/src/intel/intel_gpgpu.c index > 7b82b76..44f44ef 100644 > --- a/src/intel/intel_gpgpu.c > +++ b/src/intel/intel_gpgpu.c > @@ -886,11 +886,11 @@ intel_gpgpu_walker(intel_gpgpu_t *gpgpu, > OUT_BATCH(gpgpu->batch, (1 << 30) | (thread_n-1)); /* SIMD16 | thread max */ > else > OUT_BATCH(gpgpu->batch, (0 << 30) | (thread_n-1)); /* SIMD8 | > thread max */ > - OUT_BATCH(gpgpu->batch, global_wk_off[0]); > + OUT_BATCH(gpgpu->batch, 0); > OUT_BATCH(gpgpu->batch, global_wk_dim[0]); > - OUT_BATCH(gpgpu->batch, global_wk_off[1]); > + OUT_BATCH(gpgpu->batch, 0); > OUT_BATCH(gpgpu->batch, global_wk_dim[1]); > - OUT_BATCH(gpgpu->batch, global_wk_off[2]); > + OUT_BATCH(gpgpu->batch, 0); > OUT_BATCH(gpgpu->batch, global_wk_dim[2]); > OUT_BATCH(gpgpu->batch, right_mask); > OUT_BATCH(gpgpu->batch, ~0x0); /* we always set height as 1, so set bottom mask as all 1*/ > -- > 1.8.1.2 > > _______________________________________________ > Beignet mailing list > Beignet@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/beignet _______________________________________________ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet _______________________________________________ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet _______________________________________________ Beignet mailing list Beignet@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/beignet