Re: [Beignet] Anything from master that should be in (Debian) 1.3?

2018-08-20 Thread Yang, Rong R
We have not fully tested these two patches either, but they do not appear to
cause regressions.
Patch 7e181af2 is an improvement.
Patch b70d65ba checks the driver name, which is hard-coded in the Linux kernel's
i915 DRM driver.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Sunday, July 22, 2018 10:00 PM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] Anything from master that should be in (Debian) 1.3?
> 
> I plan to update Debian's beignet package soon, with some of the patches I
> recently sent.
> 
> Is there anything else, e.g. from the master branch, that I should include?
> 
> In particular, these two look reasonable and build in 1.3, but I don't have
> the hardware to check whether they are useful:
> 7e181af2 Enable Coffee Lake support
> b70d65ba Ensure that DRM device uses the i915 driver
> 
> (Debian doesn't need "Fix enabling of fp64 extension" because it doesn't
> enable fp64, and already has the preceding three commits.)
> 
> And yes, we probably should package compute-runtime...)
> 
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Allow creating out-of-order queues with clCreateCommandQueue

2018-08-20 Thread Yang, Rong R
Thanks for your patches; they look good to me, and I have pushed them.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Sunday, July 22, 2018 3:26 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Allow creating out-of-order queues with
> clCreateCommandQueue
> 
> clCreateCommandQueueWithProperties can already create them, but that's a
> 2.0 function.
> 
> Signed-off-by: Rebecca N. Palmer 
> ---
> yes, this currently gives you out-of-order if you ask for in-order, but says
> "can't do that" if you ask for out-of-order...
> 
> --- a/src/cl_api_command_queue.c
> +++ b/src/cl_api_command_queue.c
> @@ -27,35 +27,11 @@ clCreateCommandQueue(cl_context context,
>   cl_command_queue_properties properties,
>   cl_int *errcode_ret)  {
> -  cl_command_queue queue = NULL;
> -  cl_int err = CL_SUCCESS;
> -
> -  do {
> -if (!CL_OBJECT_IS_CONTEXT(context)) {
> -  err = CL_INVALID_CONTEXT;
> -  break;
> -}
> -
> -err = cl_devices_list_include_check(context->device_num, 
> context->devices,
> 1, );
> -if (err)
> -  break;
> -
> -if (properties & ~(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
> CL_QUEUE_PROFILING_ENABLE)) {
> -  err = CL_INVALID_VALUE;
> -  break;
> -}
> -
> -if (properties & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE) { /*not
> supported now.*/
> -  err = CL_INVALID_QUEUE_PROPERTIES;
> -  break;
> -}
> -
> -queue = cl_create_command_queue(context, device, properties, 0, );
> -  } while (0);
> -
> -  if (errcode_ret)
> -*errcode_ret = err;
> -  return queue;
> +  cl_queue_properties props[3];
> +  props[0] = CL_QUEUE_PROPERTIES;
> +  props[1] = properties;
> +  props[2] = 0;
> +  return clCreateCommandQueueWithProperties(context, device, props,
> + errcode_ret);
>  }
> 
>  /* 2.0 new API for create command queue. */
> 
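For readers following the diff: the new body forwards the 1.x bitfield to the
2.0 entry point by wrapping it in a zero-terminated key/value list. A minimal
sketch of just that wrapping step follows. It assumes stand-in typedefs and
constants so that it builds without the OpenCL headers; queue_bitfield_t,
queue_property_t, the QUEUE_* names and wrap_legacy_properties() are all
hypothetical, and the numeric values are what I believe CL/cl.h defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the OpenCL types, so the sketch builds without CL/cl.h. */
typedef uint64_t queue_bitfield_t;  /* role of cl_command_queue_properties */
typedef uint64_t queue_property_t;  /* role of cl_queue_properties */

/* Assumed values; check them against your own CL/cl.h. */
#define QUEUE_PROPERTIES_KEY 0x1093u   /* CL_QUEUE_PROPERTIES */
#define QUEUE_OUT_OF_ORDER   (1u << 0) /* CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE */
#define QUEUE_PROFILING      (1u << 1) /* CL_QUEUE_PROFILING_ENABLE */

/* Turn a legacy bitfield into the { key, value, 0 } list that the
 * "WithProperties" style of call expects. */
static void wrap_legacy_properties(queue_bitfield_t bits,
                                   queue_property_t out[3])
{
    assert(out != NULL);
    out[0] = QUEUE_PROPERTIES_KEY; /* which property follows */
    out[1] = bits;                 /* the unchanged 1.x bitfield */
    out[2] = 0;                    /* list terminator */
}
```

The point of the design is that validation of the bitfield then lives in one
place, the 2.0 function, instead of being duplicated in the 1.x wrapper.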


Re: [Beignet] warning, curbe size exceed limitation.

2018-02-05 Thread Yang, Rong R
Curbe is a hardware resource on GEN GPUs, used to pass a kernel's constant
data, such as kernel arguments, to the hardware.
__gen_gpgpu_get_curbe_size was added by commit 43138b8d; it returns the curbe
size used by this kernel.
The hardware limits the curbe size. If a kernel uses more curbe space than the
hardware allows, beignet prints this warning and truncates the used curbe size.
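
A rough illustration of the clamp-and-warn behaviour described above. This is
not Beignet's code: clamp_curbe_size() and its parameters are hypothetical, and
the real limit depends on the GPU generation. For what it is worth, Intel's GPU
documentation expands CURBE as "Constant URB Entry", as far as I know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return the curbe size that will actually be used.
 * If the kernel asks for more than the hardware limit, warn and truncate. */
static size_t clamp_curbe_size(size_t used, size_t hw_limit)
{
    assert(hw_limit > 0);
    if (used > hw_limit) {
        fprintf(stderr, "warning, curbe size exceed limitation.\n");
        return hw_limit; /* truncated to what the hardware can hold */
    }
    return used;         /* fits, nothing to do */
}
```

A kernel that fits on Ivy Bridge can therefore still trigger the warning on a
part with a smaller limit, which matches the report in this thread.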

Thanks,
Yang Rong
From: Vaughan, Thomas E [mailto:tevau...@ball.com]
Sent: Wednesday, January 31, 2018 11:59 PM
To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
Subject: RE: warning, curbe size exceed limitation.

Thanks for replying!

I'm not writing any OpenCL code directly.  I'm using the ArrayFire C++ library, 
which JIT generates OpenCL and compiles it when the process runs.  That is, I 
have no OpenCL source code, and I don't write any C or C++ code that interacts 
with kernels directly or even knows what they are.

My unit tests pass, though, and so it seems that the calculations are being 
done correctly.

I wonder how ArrayFire decides on sizes of chunks of work, and I wonder how 
much of a performance hit I'm taking on Bay Trail because of this.

Can you explain what "curbe" stands for and what's really going on in 
__gen_gpgpu_get_curbe_size()?

--
Thomas E. Vaughan
720 201 7058 (cell)

From: Yang, Rong R [mailto:rong.r.y...@intel.com]
Sent: January 31, 2018 02:08
To: Vaughan, Thomas E <tevau...@ball.com>; beignet@lists.freedesktop.org
Subject: RE: warning, curbe size exceed limitation.

Do you pass this array as a kernel argument?
If the argument's size exceeds the size of the argument buffer, beignet prints
this warning.
IVB's argument buffer is also larger than Bay Trail's.

Thanks,
Yang Rong
From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of 
Vaughan, Thomas E
Sent: Wednesday, January 31, 2018 2:04 AM
To: beignet@lists.freedesktop.org
Subject: [Beignet] warning, curbe size exceed limitation.

Thanks to Rebecca Palmer and to Mark Thompson for replying to my last plea for 
help over the weekend.

Now that I have started to get things working with the Atom GPU on Debian, I 
have noticed an odd message when I run some ArrayFire code that was tested on 
Ivy Bridge.

My application prints out "warning, curbe size exceed limitation" in various 
places.

For example, at one point I try to tile a 7x176 array into a 7x176x7 array, and 
that message is issued twice.

Does anyone have a clue about what's going on?

--
Thomas E. Vaughan
720 201 7058 (cell)

This message and any enclosures are intended only for the addressee. Please
notify the sender by email if you are not the intended recipient. If you are
not the intended recipient, you may not use, copy, disclose, or distribute this
message or its contents or enclosures to any other person and any such actions
may be unlawful. Ball reserves the right to monitor and review all messages
and enclosures sent to or from this email address.


Re: [Beignet] [PATCH 1/3] Ensure that DRM device uses the i915 driver

2018-02-04 Thread Yang, Rong R
The patchset LGTM, pushed, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Mark Thompson
> Sent: Thursday, February 1, 2018 3:58 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH 1/3] Ensure that DRM device uses the i915 driver
> 
> This avoids calling random ioctl()s and returning nonsensical errors for
> unsupported devices.  In particular, loading is much cleaner on setups where
> the driver needs to iterate over multiple devices to find the correct one
> because the Intel graphics device is not the first DRM device.
> 
> Signed-off-by: Mark Thompson 
> ---
>  src/intel/intel_driver.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/src/intel/intel_driver.c b/src/intel/intel_driver.c
> index 45719785..10fe3cc8 100644
> --- a/src/intel/intel_driver.c
> +++ b/src/intel/intel_driver.c
> @@ -312,6 +312,26 @@ return ret;
>  }
>  #endif
> 
> +static int
> +intel_driver_check_device(int dev_fd)
> +{
> +  // Ensure that this is actually an i915 DRM device.
> +  drmVersion *version;
> +  int ret;
> +  version = drmGetVersion(dev_fd);
> +  if (!version) {
> +fprintf(stderr, "drmGetVersion(%d) failed: %s\n", dev_fd, 
> strerror(errno));
> +close(dev_fd);
> +return 0;
> +  }
> +  ret = !strcmp(version->name, "i915");
> +  drmFreeVersion(version);
> +  // Don't print an error here if this device is using a different driver,
> +  // because we might be iterating over multiple devices looking for a
> +  // compatible one.
> +  return ret;
> +}
> +
>  LOCAL int
>  intel_driver_init_master(intel_driver_t *driver, const char* dev_name)
>  {
> @@ -326,6 +346,11 @@ if (dev_fd == -1) {
>return 0;
>  }
> 
> +if (!intel_driver_check_device(dev_fd)) {
> +  close(dev_fd);
> +  return 0;
> +}
> +
>  // Check that we're authenticated
>  memset(, 0, sizeof(drm_client_t));
>  ret = ioctl(dev_fd, DRM_IOCTL_GET_CLIENT, );
> @@ -356,6 +381,11 @@ dev_fd = open(dev_name, O_RDWR);
>  if (dev_fd == -1)
>return 0;
> 
> +if (!intel_driver_check_device(dev_fd)) {
> +  close(dev_fd);
> +  return 0;
> +}
> +
>  ret = intel_driver_init(driver, dev_fd);
>  driver->need_close = 1;
> 
> --
> 2.11.0
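
One thing worth noting when reading the patch: intel_driver_check_device()
closes dev_fd itself when drmGetVersion() fails, and both call sites close
dev_fd again whenever the check returns 0, so that failure path appears to
close the descriptor twice. A minimal sketch of the name test as a pure helper
that never touches the descriptor, so ownership stays with the caller. The
function name driver_name_is_i915() is hypothetical, and the driver name is
passed in as a plain string so the sketch does not depend on libdrm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if the DRM driver name is exactly "i915", else 0.
 * A NULL name stands for "the version query failed" and is treated as
 * "not compatible". The caller keeps ownership of its file descriptor. */
static int driver_name_is_i915(const char *name)
{
    if (name == NULL)
        return 0;
    return strcmp(name, "i915") == 0;
}
```

With this shape the caller does a single close(dev_fd) on any 0 result,
whether the query failed or the device simply uses a different driver.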


Re: [Beignet] warning, curbe size exceed limitation.

2018-01-31 Thread Yang, Rong R
Do you pass this array as a kernel argument?
If the argument's size exceeds the size of the argument buffer, beignet prints
this warning.
IVB's argument buffer is also larger than Bay Trail's.

Thanks,
Yang Rong
From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of 
Vaughan, Thomas E
Sent: Wednesday, January 31, 2018 2:04 AM
To: beignet@lists.freedesktop.org
Subject: [Beignet] warning, curbe size exceed limitation.

Thanks to Rebecca Palmer and to Mark Thompson for replying to my last plea for 
help over the weekend.

Now that I have started to get things working with the Atom GPU on Debian, I 
have noticed an odd message when I run some ArrayFire code that was tested on 
Ivy Bridge.

My application prints out "warning, curbe size exceed limitation" in various 
places.

For example, at one point I try to tile a 7x176 array into a 7x176x7 array, and 
that message is issued twice.

Does anyone have a clue about what's going on?

--
Thomas E. Vaughan
720 201 7058 (cell)



Re: [Beignet] [PATCH 2/2] Enable Coffee Lake support

2018-01-31 Thread Yang, Rong R
One inline comment.
Could you also add the other Coffee Lake PCI IDs?
0x3E90, 12EUs, gt1
0x3E93, 12EUs, gt1
0x3E99, 12EUs, gt1
0x3EA1, 12EUs, gt1
0x3EA4, 12EUs, gt1

0x3E91, 24EUs, gt2 
0x3E92, 24EUs, gt2 
0x3E94, 24EUs, gt2 
0x3E96, 24EUs, gt2 
0x3E9A, 24EUs, gt2 
0x3E9B, 24EUs, gt2 
0x3EA0, 24EUs, gt2 
0x3EA3, 24EUs, gt2 
0x3EA9, 24EUs, gt2 

0x3EA2, 48EUs, gt3
0x3EA5, 48EUs, gt3
0x3EA6, 48EUs, gt3
0x3EA7, 48EUs, gt3
0x3EA8, 48EUs, gt3
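
To make the request concrete, here is a small lookup built only from the IDs
listed above. It is a sketch, not the form the patch uses (the patch defines
PCI_CHIP_* and IS_* macros); struct cfl_id, cfl_lookup(), cfl_gt_level() and
cfl_eu_count() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One row per Coffee Lake PCI ID from the list in this mail. */
struct cfl_id {
    unsigned int devid; /* PCI device ID */
    int eus;            /* execution units */
    int gt;             /* GT level: 1, 2 or 3 */
};

static const struct cfl_id cfl_ids[] = {
    {0x3E90, 12, 1}, {0x3E93, 12, 1}, {0x3E99, 12, 1},
    {0x3EA1, 12, 1}, {0x3EA4, 12, 1},
    {0x3E91, 24, 2}, {0x3E92, 24, 2}, {0x3E94, 24, 2},
    {0x3E96, 24, 2}, {0x3E9A, 24, 2}, {0x3E9B, 24, 2},
    {0x3EA0, 24, 2}, {0x3EA3, 24, 2}, {0x3EA9, 24, 2},
    {0x3EA2, 48, 3}, {0x3EA5, 48, 3}, {0x3EA6, 48, 3},
    {0x3EA7, 48, 3}, {0x3EA8, 48, 3},
};

/* Linear search; returns NULL for an ID that is not in the list. */
static const struct cfl_id *cfl_lookup(unsigned int devid)
{
    size_t i;
    for (i = 0; i < sizeof(cfl_ids) / sizeof(cfl_ids[0]); ++i) {
        if (cfl_ids[i].devid == devid)
            return &cfl_ids[i];
    }
    return NULL;
}

/* GT level (1..3), or 0 if the ID is unknown. */
static int cfl_gt_level(unsigned int devid)
{
    const struct cfl_id *e = cfl_lookup(devid);
    return e ? e->gt : 0;
}

/* EU count, or 0 if the ID is unknown. */
static int cfl_eu_count(unsigned int devid)
{
    const struct cfl_id *e = cfl_lookup(devid);
    return e ? e->eus : 0;
}
```

0x3E92 is the ID the patch already handles (the 8700 it was tested on), and it
comes out as a 24 EU GT2 part here as well.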

Thanks,
Yang Rong

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Mark Thompson
> Sent: Wednesday, January 24, 2018 6:56 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH 2/2] Enable Coffee Lake support
> 
> We don't need to do much here because the graphics core is the same as Kaby
> Lake.
> ---
> Tested on an 8700.  All behaviour is identical to Kaby Lake, so we can reuse
> most things after adding the PCI ID and device structure.
> 
> There will be more PCI IDs, but I've only added the one I know the meaning of
> and can test.
> 
> 
>  backend/src/backend/gen_program.cpp |  5 +
>  src/cl_device_data.h|  9 -
>  src/cl_device_id.c  | 29 +++--
>  3 files changed, 40 insertions(+), 3 deletions(-)
> 
> diff --git a/backend/src/backend/gen_program.cpp
> b/backend/src/backend/gen_program.cpp
> index e06ed40c..274c99c7 100644
> --- a/backend/src/backend/gen_program.cpp
> +++ b/backend/src/backend/gen_program.cpp
> @@ -209,6 +209,8 @@ namespace gbe {
>ctx = GBE_NEW(BxtContext, unit, name, deviceID, relaxMath);
>  } else if (IS_KABYLAKE(deviceID)) {
>ctx = GBE_NEW(KblContext, unit, name, deviceID, relaxMath);
> +} else if (IS_COFFEELAKE(deviceID)) {
> +  ctx = GBE_NEW(KblContext, unit, name, deviceID, relaxMath);
>  } else if (IS_GEMINILAKE(deviceID)) {
>ctx = GBE_NEW(GlkContext, unit, name, deviceID, relaxMath);
>  }
> @@ -328,6 +330,7 @@ namespace gbe {
>    (IS_SKYLAKE(deviceID) && MATCH_SKL_HEADER(binary)) || \
>    (IS_BROXTON(deviceID) && MATCH_BXT_HEADER(binary)) || \
>    (IS_KABYLAKE(deviceID) && MATCH_KBL_HEADER(binary)) || \
> +  (IS_COFFEELAKE(deviceID) && MATCH_KBL_HEADER(binary)) || \
>    (IS_GEMINILAKE(deviceID) && MATCH_GLK_HEADER(binary)) \
>    )
> 
> @@ -436,6 +439,8 @@ namespace gbe {
>  FILL_BXT_HEADER(*binary);
>}else if(IS_KABYLAKE(prog->deviceID)){
>  FILL_KBL_HEADER(*binary);
> +  }else if(IS_COFFEELAKE(prog->deviceID)){
> +FILL_KBL_HEADER(*binary);
>}else if(IS_GEMINILAKE(prog->deviceID)){
>  FILL_GLK_HEADER(*binary);
>}else {
> diff --git a/src/cl_device_data.h b/src/cl_device_data.h
> index 123b6192..db5272da 100644
> --- a/src/cl_device_data.h
> +++ b/src/cl_device_data.h
> @@ -372,7 +372,14 @@
>(devid == PCI_CHIP_GLK_3x6 ||   \
> devid == PCI_CHIP_GLK_2x6)
> 
> -#define IS_GEN9(devid) (IS_SKYLAKE(devid) || IS_BROXTON(devid) || IS_KABYLAKE(devid) || IS_GEMINILAKE(devid))
> +#define PCI_CHIP_COFFEELAKE_S_GT2   0x3E92
> +
> +#define IS_CFL_GT2(devid)  \
> +  (devid == PCI_CHIP_COFFEELAKE_S_GT2)
> +
> +#define IS_COFFEELAKE(devid) (IS_CFL_GT2(devid))
> +
> +#define IS_GEN9(devid) (IS_SKYLAKE(devid) || IS_BROXTON(devid) || IS_KABYLAKE(devid) || IS_GEMINILAKE(devid) || IS_COFFEELAKE(devid))
> 
>  #define MAX_OCLVERSION(devid) (IS_GEN9(devid) ? 200 : 120)
> 
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c
> index 5e284193..d3180258 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -274,6 +274,16 @@ static struct _cl_device_id intel_glk12eu_device = {
>  #include "cl_gen9_device.h"
>  };
> 
> +static struct _cl_device_id intel_cfl_gt2_device = {
> +  .max_compute_unit = 24,
> +  .max_thread_per_unit = 7,
> +  .sub_slice_count = 3,
> +  .max_work_item_sizes = {512, 512, 512},
> +  .max_work_group_size = 256,
> +  .max_clock_frequency = 1000,
> +#include "cl_gen9_device.h"
> +};
> +
>  LOCAL cl_device_id
>  cl_get_gt_device(cl_device_type device_type)
>  {
> @@ -785,6 +795,19 @@ glk12eu_break:
>cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id);
>break;
> 
> +case PCI_CHIP_COFFEELAKE_S_GT2:
> +  DECL_INFO_STRING(cfl_gt2_break, intel_cfl_gt2_device, name,
> +"Intel(R) UHD Graphics Coffee Lake Desktop GT2");
> +cfl_gt2_break:
> +  intel_cfl_gt2_device.device_id = device_id;
> +  intel_cfl_gt2_device.platform = cl_get_platform_default();
> +  ret = &intel_cfl_gt2_device;
> +#ifdef ENABLE_FP64
> +  cl_intel_platform_enable_extension(ret, cl_khr_fp64_ext_id);
> +#endif
> +  cl_intel_platform_get_default_extension(ret);

Must call cl_intel_platform_enable_extension after 

Re: [Beignet] [PATCH 1/2] Ensure that DRM device uses the i915 driver

2018-01-31 Thread Yang, Rong R
This patch LGTM, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Mark Thompson
> Sent: Wednesday, January 24, 2018 6:52 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH 1/2] Ensure that DRM device uses the i915 driver
> 
> This avoids calling random ioctl()s and returning nonsensical errors for
> unsupported devices.  In particular, loading is much cleaner on setups where
> the driver needs to iterate over multiple devices to find the correct one
> because the Intel graphics device is not the first DRM device.
> ---
> Fixes this sort of spam from every OpenCL-using application:
> 
> $ clinfo
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> Number of platforms   1
>   Platform Name   Intel Gen OCL Driver
>   Platform Vendor Intel
>   Platform VersionOpenCL 2.0 beignet 1.4 
> (git-d1b99a1d)
>   Platform ProfileFULL_PROFILE
>   Platform Extensions 
> cl_khr_global_int32_base_atomics
> cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
> cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store
> cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images
> cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups
> cl_intel_subgroups_short cl_intel_media_block_io cl_intel_planar_yuv
> cl_khr_gl_sharing
>   Platform Extensions function suffix Intel
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> 
>   Platform Name   Intel Gen OCL Driver
> Number of devices 1
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB
> available aperture size.
> May lead to reduced performance or incorrect rendering.
> get chip id failed: -1 [2]
> param: 4, val: 0
> ...
> 
> 
>  src/intel/intel_driver.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/src/intel/intel_driver.c b/src/intel/intel_driver.c
> index 45719785..10fe3cc8 100644
> --- a/src/intel/intel_driver.c
> +++ b/src/intel/intel_driver.c
> @@ -312,6 +312,26 @@ return ret;
>  }
>  #endif
> 
> +static int
> +intel_driver_check_device(int dev_fd)
> +{
> +  // Ensure that this is actually an i915 DRM device.
> +  drmVersion *version;
> +  int ret;
> +  version = drmGetVersion(dev_fd);
> +  if (!version) {
> +fprintf(stderr, "drmGetVersion(%d) failed: %s\n", dev_fd, 
> strerror(errno));
> +close(dev_fd);
> +return 0;
> +  }
> +  ret = !strcmp(version->name, "i915");
> +  drmFreeVersion(version);
> +  // Don't print an error here if this device is using a different driver,
> +  // because we might be iterating over multiple devices looking for a
> +  // compatible one.
> +  return ret;
> +}
> +
>  LOCAL int
>  intel_driver_init_master(intel_driver_t *driver, const char* dev_name)
>  {
> @@ -326,6 +346,11 @@ if (dev_fd == -1) {
>return 0;
>  }
> 
> +if (!intel_driver_check_device(dev_fd)) {
> +  close(dev_fd);
> +  return 0;
> +}
> +
>  // Check that we're authenticated
>  memset(, 0, sizeof(drm_client_t));
>  ret = ioctl(dev_fd, DRM_IOCTL_GET_CLIENT, );
> @@ -356,6 +381,11 @@ dev_fd = open(dev_name, O_RDWR);
>  if (dev_fd == -1)
>return 0;
> 
> +if (!intel_driver_check_device(dev_fd)) {
> +  close(dev_fd);
> +  return 0;
> +}
> +
>  ret = intel_driver_init(driver, dev_fd);
>  driver->need_close = 1;
> 
> --
> 2.11.0
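
The commit message talks about iterating over several DRM devices until a
compatible one turns up. A minimal sketch of that selection loop, assuming the
driver names have already been collected into an array (in the patch they come
from drmGetVersion() on each opened device). find_first_i915() is a
hypothetical name, and NULL entries stand for devices whose version query
failed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first device whose driver name is
 * exactly "i915", or -1 if none matches. Non-matching devices are skipped
 * silently, which is what keeps the log free of per-device error spam. */
static int find_first_i915(const char *const names[], size_t count)
{
    size_t i;
    assert(names != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        if (names[i] != NULL && strcmp(names[i], "i915") == 0)
            return (int)i;
    }
    return -1;
}
```

On a machine where the Intel GPU is the second DRM device, the first entry is
skipped without any i915-specific ioctl being issued against it.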

Re: [Beignet] Is "X server found. dri2 connection failed!" normal on Wayland?

2018-01-10 Thread Yang, Rong R
Yes, this warning is legacy, I will apply your patch, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Tuesday, January 9, 2018 5:12 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] Is "X server found. dri2 connection failed!" normal on
> Wayland?
> 
> ...and if it is, should we stop printing a warning for it to avoid
> pointlessly scaring users?
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882486
> 
> When I get this warning, everything still seems to work (e.g. the test suite
> passes).  Some other reports have both it and another error (e.g.
> https://bugzilla.redhat.com/show_bug.cgi?id=1478536 ,
> https://bugzilla.redhat.com/show_bug.cgi?id=1460400 ), but these may be
> unrelated bugs (e.g. the second is
> https://bugs.freedesktop.org/show_bug.cgi?id=101485 ).
> 
> Signed-off-by: Rebecca N. Palmer 
> 
> --- a/src/intel/intel_driver.c
> +++ b/src/intel/intel_driver.c
> @@ -235,8 +235,6 @@ if(intel->x11_display) {
>   intel_driver_init_shared(intel, intel->dri_ctx);
>   Xfree(driver_name);
> }
> -  else
> -fprintf(stderr, "X server found. dri2 connection failed! \n");
>   }
>   #endif
> 
> 


Re: [Beignet] [PATCH] Docs: OCL_STRICT_CONFORMANCE is default-on since 1.1

2017-11-01 Thread Yang, Rong R
LGTM, pushed, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Saturday, October 28, 2017 3:24 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Docs: OCL_STRICT_CONFORMANCE is default-on
> since 1.1
> 
> Signed-off-by: Rebecca N. Palmer 
> 
> --- a/docs/Beignet/Backend.mdwn
> +++ b/docs/Beignet/Backend.mdwn
> @@ -37,9 +37,7 @@ Environment variables are used all over
>precision math instructions compliant with OpenCL Spec. So we provide a
>software version to meet the high precision requirement. Obviously the
>software version's performance is not as good as native version supported 
> by
> -  GEN hardware. What's more, most graphics application don't need this high
> -  precision, so we choose 0 as the default value. So OpenCL apps do not 
> suffer
> -  the performance penalty for using high precision math functions.
> +  GEN hardware.
> 
>  - `OCL_SIMD_WIDTH` `(8 or 16)`. Select the number of lanes per hardware
> thread,
>Normally, you don't need to set it, we will select suitable simd width for
> 
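
For context on what "default-on" means for a `(0 or 1)` variable like this: if
the variable is unset, the default applies, and only an explicit 0 turns the
feature off. A minimal sketch under that assumption. parse_flag() and
strict_conformance_enabled() are hypothetical helpers; this is not how Beignet
itself reads its environment variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: interpret a 0/1 style setting.
 * Unset or empty means "use the default"; "0" means off; anything else on. */
static int parse_flag(const char *value, int default_value)
{
    assert(default_value == 0 || default_value == 1);
    if (value == NULL || value[0] == '\0')
        return default_value;
    return strcmp(value, "0") != 0;
}

/* Default-on, per the subject line of this patch (since 1.1). */
static int strict_conformance_enabled(void)
{
    return parse_flag(getenv("OCL_STRICT_CONFORMANCE"), 1);
}
```

Under this reading, running with OCL_STRICT_CONFORMANCE=0 is how one would opt
back into the faster native math path that the documentation describes.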


Re: [Beignet] [PATCH] Docs: Fix grammar

2017-11-01 Thread Yang, Rong R
LGTM, pushed, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Saturday, October 28, 2017 3:29 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Docs: Fix grammar
> 
> Signed-off-by: Rebecca N. Palmer 
> 
> --- a/docs/Beignet/Backend.mdwn
> +++ b/docs/Beignet/Backend.mdwn
> @@ -9,10 +9,10 @@ Status
>  --
> 
>  After two years development, beignet is mature now. It now supports all the
> -OpenCL 1.2 mandatory features. Beignet get almost 100% pass rate with both
> -OpenCV 3.0 test suite and the piglit opencl test suite. There are some
> -performance tuning related items remained, see [[here|Backend/TODO]] for a
> -(incomplete) lists of things to do.
> +OpenCL 1.2 mandatory features. Beignet gets almost 100% pass rate with
> +both the OpenCV 3.0 test suite and the piglit opencl test suite. There
> +are some performance tuning related items remained, see
> +[[here|Backend/TODO]] for an
> +(incomplete) list of things to do.
> 
>  Interface with the run-time
>  ---
> @@ -61,7 +61,7 @@ Environment variables are used all over
>  - `OCL_OUTPUT_REG_ALLOC` `(0 or 1)`. Output Gen register allocations,
> including
>virtual register to physical register mapping, live ranges.
> 
> -- `OCL_OUTPUT_BUILD_LOG` `(0 or 1)`. Output error messages if there is any
> +- `OCL_OUTPUT_BUILD_LOG` `(0 or 1)`. Output error messages if there are
> +any
>during CL kernel compiling and linking.
> 
>  - `OCL_OUTPUT_CFG` `(0 or 1)`. Output control flow graph in .dot file.
> @@ -70,22 +70,22 @@ Environment variables are used all over
>but without instructions in each BasicBlock.
> 
>  - `OCL_PRE_ALLOC_INSN_SCHEDULE` `(0 or 1)`. The instruction scheduler in
> -  beignet are currently splitted into two passes: before and after register
> -  allocation. The pre-alloc scheduler tend to decrease register pressure.
> +  beignet is currently split into two passes: before and after register
> + allocation. The pre-alloc scheduler tends to decrease register pressure.
>This variable is used to disable/enable pre-alloc scheduler. This pass is
>disabled now for some bugs.
> 
>  - `OCL_POST_ALLOC_INSN_SCHEDULE` `(0 or 1)`. Disable/enable post-alloc
> -  instruction scheduler. The post-alloc scheduler tend to reduce instruction
> +  instruction scheduler. The post-alloc scheduler tends to reduce
> + instruction
>latency. By default, this is enabled now.
> 
> -- `OCL_SIMD16_SPILL_THRESHOLD` `(0 to 256)`. Tune how much registers can
> be
> -  spilled under SIMD16. Default value is 16. We find spill too much register
> -  under SIMD16 is not as good as fall back to SIMD8 mode. So we set the
> +- `OCL_SIMD16_SPILL_THRESHOLD` `(0 to 256)`. Tune how many registers
> +can be
> +  spilled under SIMD16. Default value is 16. We find spilling too many
> +registers
> +  under SIMD16 is not as good as falling back to SIMD8 mode. So we set
> +the
>variable to control spilled register number under SIMD16.
> 
>  - `OCL_USE_PCH` `(0 or 1)`. The default value is 1. If it is enabled, we use
> -  a pre compiled header file which include all basic ocl headers. This would
> +  a pre compiled header file which includes all basic ocl headers. This
> + would
>reduce the compile time.
> 
>  Implementation details
> 


Re: [Beignet] [PATCH] GBE: Remove TBAA.

2017-10-25 Thread Yang, Rong R
LGTM, pushed, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Wednesday, October 18, 2017 3:36 PM
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] GBE: Remove TBAA.
> 
> Please skip this patch. I have sent another patch to solve the problem.
> 
> Ruiling
> > -----Original Message-----
> > From: Song, Ruiling
> > Sent: Wednesday, October 18, 2017 11:03 AM
> > To: beignet@lists.freedesktop.org
> > Cc: Song, Ruiling 
> > Subject: [PATCH] GBE: Remove TBAA.
> >
> > At the time we expand llvm.memcpy. we introduce some load/store that
> > break the TBAA. This issue comes out in llvm5.0. So we remove the TBAA
> > from the compilation passes.
> >
> > Signed-off-by: Ruiling Song 
> > ---
> >  backend/src/llvm/llvm_to_gen.cpp | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/backend/src/llvm/llvm_to_gen.cpp
> > b/backend/src/llvm/llvm_to_gen.cpp
> > index 8546f73..f679c58 100644
> > --- a/backend/src/llvm/llvm_to_gen.cpp
> > +++ b/backend/src/llvm/llvm_to_gen.cpp
> > @@ -87,7 +87,6 @@ namespace gbe
> >  FPM.add(new TargetLibraryInfo(*libraryInfo));
> >  #endif
> >  #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
> > -FPM.add(createTypeBasedAAWrapperPass());
> >  FPM.add(createBasicAAWrapperPass());
> >  #else
> >  FPM.add(createTypeBasedAliasAnalysisPass());
> > @@ -129,7 +128,6 @@ namespace gbe
> >  MPM.add(new TargetLibraryInfo(*libraryInfo));
> >  #endif
> >  #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
> > -MPM.add(createTypeBasedAAWrapperPass());
> >  MPM.add(createBasicAAWrapperPass());
> >  #else
> >  MPM.add(createTypeBasedAliasAnalysisPass());
> > --
> > 2.4.1
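
A note on the preprocessor guard in the context lines: the expression
LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR folds the version into one
integer, so ">= 38" reads as "LLVM 3.8 or later". The arithmetic, as a small
sketch with hypothetical function names (llvm_version_code() and
uses_aa_wrapper_passes() are not part of Beignet or LLVM):

```c
#include <assert.h>

/* Same encoding as the guard in the diff: 3.8 -> 38, 5.0 -> 50. */
static int llvm_version_code(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    return major * 10 + minor;
}

/* Mirrors the "#if ... >= 38" branch that selects the *WrapperPass
 * constructors instead of the older alias-analysis pass names. */
static int uses_aa_wrapper_passes(int major, int minor)
{
    return llvm_version_code(major, minor) >= 38;
}
```

One caveat of this encoding is that it only orders versions correctly while
the minor number stays below 10, which held for the LLVM releases discussed
in this thread.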
> 


Re: [Beignet] [PATCH] metainfo: escape ampersand

2017-10-08 Thread Yang, Rong R
LGTM, pushed, thanks.

> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Igor Gnatenko
> Sent: Wednesday, October 4, 2017 4:13 PM
> To: beignet@lists.freedesktop.org
> Cc: Igor Gnatenko 
> Subject: [Beignet] [PATCH] metainfo: escape ampersand
> 
> com.intel.beignet.metainfo.xml: failed to parse com.intel.beignet.metainfo.xml:
> Error on line 147: Entity did not end with a semicolon; most likely you used an
> ampersand character without intending to start an entity - escape ampersand as
> &amp;
> 
> Signed-off-by: Igor Gnatenko 
> ---
>  com.intel.beignet.metainfo.xml.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/com.intel.beignet.metainfo.xml.in
> b/com.intel.beignet.metainfo.xml.in
> index 65a2fad9..d391d0da 100644
> --- a/com.intel.beignet.metainfo.xml.in
> +++ b/com.intel.beignet.metainfo.xml.in
> @@ -10,7 +10,7 @@
>  MIT
>  LGPL-2.1+
>   <url type="homepage">https://www.freedesktop.org/wiki/Software/Beignet/</url>
> -  <url type="bugtracker">https://bugs.freedesktop.org/buglist.cgi?product=Beignet&component=Beignet&resolution=---</url>
> +  <url type="bugtracker">https://bugs.freedesktop.org/buglist.cgi?product=Beignet&amp;component=Beignet&amp;resolution=---</url>
>  Intel
>  
>  
> --
> 2.14.2
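
The fix amounts to writing every literal '&' in the URL as the XML entity
"&amp;". A minimal sketch of that escaping step, for anyone generating such
files programmatically. escape_ampersands() is a hypothetical helper; it is
deliberately naive and would double-escape input that already contains
entities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, replacing each '&' with "&amp;".
 * Returns 0 on success, -1 if dst (dst_size bytes, including the
 * terminating NUL) is too small. */
static int escape_ampersands(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        size_t len = (*src == '&') ? 5 : 1; /* strlen("&amp;") == 5 */
        if (used + len + 1 > dst_size)      /* keep room for the NUL */
            return -1;
        if (*src == '&')
            memcpy(dst + used, "&amp;", len);
        else
            dst[used] = *src;
        used += len;
    }
    if (used + 1 > dst_size)                /* empty src, zero-size dst */
        return -1;
    dst[used] = '\0';
    return 0;
}
```

Applied to a query string such as "product=Beignet&component=Beignet", this
yields text that an XML parser accepts as attribute or element content.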
> 


Re: [Beignet] beignet and LLVM 4/5

2017-10-08 Thread Yang, Rong R
It is in release testing now.
If no issues turn up, I will release a new version, 1.3.2, that includes LLVM 4
and 5 support.

> -----Original Message-----
> From: Rebecca N. Palmer [mailto:rebecca_pal...@zoho.com]
> Sent: Wednesday, October 4, 2017 3:10 PM
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] beignet and LLVM 4/5
> 
> Such patches were pushed to the 1.3 branch 12 days ago, and appear to work,
> but I haven't yet checked for the bug Fedora had.
> 
> If (as is likely) Debian beignet gets asked to switch soon, should I use
> current 1.3 git?  and LLVM 4 or 5?
> 
> On 10/09/17 22:26, Rebecca N. Palmer wrote:
> > (I'm asking for Debian - Fedora is already using LLVM 4 + git beignet)
> >
> > On 08/09/17 07:57, Yang, Rong R wrote:
> >>  LLVM 5. 0 has been released, we are planning to release a minor
> >> release 1.3.2 to support LLVM 4.0 and LLVM 5.0 after beignet's
> >> LLVM5.0 patches are ready.
> >
> > Roughly how long do you expect this to take?

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [Patch V2 2/2] GBE: enable llvm5.0 support.

2017-09-21 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Song, Ruiling
> Sent: Wednesday, September 20, 2017 4:44 PM
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [Patch V2 2/2] GBE: enable llvm5.0 support.
> 
> This version looks good!
> 
> Ruiling
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Yang Rong
> Sent: Wednesday, September 20, 2017 4:18 PM
> To: beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: [Beignet] [Patch V2 2/2] GBE: enable llvm5.0 support.
> 
> 1. getOrInsertFunction without nullptr.
> 2. handle f16 rounding.
> 3. remove llvm value dump.
> 4. handle AddrSpaceCastInst when parsing block info.
> 
> V2: use stripPointerCasts instead of BitCast and AddrSpaceCast.
> Signed-off-by: Yang Rong <rong.r.y...@intel.com>
> ---
>  backend/src/llvm/PromoteIntegers.cpp |  5 
>  backend/src/llvm/llvm_barrier_nodup.cpp  |  4 +++
> backend/src/llvm/llvm_device_enqueue.cpp | 42 +
> ---
>  backend/src/llvm/llvm_gen_backend.cpp| 41 +++
> 
>  backend/src/llvm/llvm_profiling.cpp  | 20 ---
>  backend/src/llvm/llvm_sampler_fix.cpp|  8 ++
>  6 files changed, 87 insertions(+), 33 deletions(-)
> 
> diff --git a/backend/src/llvm/PromoteIntegers.cpp
> b/backend/src/llvm/PromoteIntegers.cpp
> index a500311..d433771 100644
> --- a/backend/src/llvm/PromoteIntegers.cpp
> +++ b/backend/src/llvm/PromoteIntegers.cpp
> @@ -605,8 +605,13 @@ static void convertInstruction(Instruction *Inst,
> ConversionState ) {
>  for (SwitchInst::CaseIt I = Switch->case_begin(),
>   E = Switch->case_end();
>   I != E; ++I) {
> +#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50
> +  NewInst->addCase(cast(convertConstant(I->getCaseValue())),
> +   I->getCaseSuccessor());
> +#else
>NewInst->addCase(cast(convertConstant(I.getCaseValue())),
> I.getCaseSuccessor());
> +#endif
>  }
>  Switch->eraseFromParent();
>} else {
> diff --git a/backend/src/llvm/llvm_barrier_nodup.cpp
> b/backend/src/llvm/llvm_barrier_nodup.cpp
> index a7d0d1a..b8ffdf4 100644
> --- a/backend/src/llvm/llvm_barrier_nodup.cpp
> +++ b/backend/src/llvm/llvm_barrier_nodup.cpp
> @@ -74,7 +74,11 @@ namespace gbe {
>if (F.hasFnAttribute(Attribute::NoDuplicate)) {
>  auto attrs = F.getAttributes();
>  F.setAttributes(attrs.removeAttribute(M.getContext(),
> +#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50
> +AttributeList::FunctionIndex,
> +#else
>  AttributeSet::FunctionIndex,
> +#endif
>  Attribute::NoDuplicate));
>  changed = true;
>}
> diff --git a/backend/src/llvm/llvm_device_enqueue.cpp
> b/backend/src/llvm/llvm_device_enqueue.cpp
> index 9a0fb46..58aa681 100644
> --- a/backend/src/llvm/llvm_device_enqueue.cpp
> +++ b/backend/src/llvm/llvm_device_enqueue.cpp
> @@ -29,6 +29,7 @@ namespace gbe {
>  BitCastInst* bt = dyn_cast(I);
>  if (bt == NULL)
>return NULL;
> +//bt->dump();
> 
>  Type* type = bt->getOperand(0)->getType();
>  if(!type->isPointerTy())
> @@ -112,7 +113,8 @@ namespace gbe {
>  ValueToValueMapTy VMap;
>  for (Function::arg_iterator I = Fn->arg_begin(), E = Fn->arg_end(); I != 
> E; ++I) {
>PointerType *ty = dyn_cast(I->getType());
> -  if(ty && ty->getAddressSpace() == 0) //Foce set the address space to 
> global
> +  //Foce set the address space to global
> +  if(ty && (ty->getAddressSpace() == 0 || ty->getAddressSpace() ==
> + 4))
>  ty = PointerType::get(ty->getPointerElementType(), 1);
>ParamTys.push_back(ty);
>  }
> @@ -252,12 +254,13 @@ namespace gbe {
>  if(gep == NULL)
>continue;
> 
> -BitCastInst* fnPointer = 
> dyn_cast(gep->getOperand(0));
> -if(fnPointer == NULL)
> +Value *fnPointer = gep->getOperand(0)->stripPointerCasts();
> +
> +if(fnPointer == gep->getOperand(0))
>continue;
> 
> -if(BitCastInst* bt = 
> dyn_cast(fnPointer->getOperand(0))) {
> -  std::string fnName = blocks[bt->getOperand(0)];
> +if(blocks.find(fnPointer) != blocks.end()) {
> +   
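
The version guard used throughout the patch above, `LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 50`, folds the LLVM major and minor numbers into a single integer. A minimal sketch of that encoding follows; the helper names `llvmVersionCode` and `usesLlvm50Path` are hypothetical and exist only for illustration, since the patch itself uses just the preprocessor expression.

```cpp
// Hypothetical helper mirroring the preprocessor expression
// LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR from the patch.
constexpr int llvmVersionCode(int major, int minor) {
  return major * 10 + minor;
}

// True when the ">= 50" branch (the LLVM 5.0 code path) would be selected.
constexpr bool usesLlvm50Path(int major, int minor) {
  return llvmVersionCode(major, minor) >= 50;
}

// Compile-time checks of the encoding.
static_assert(llvmVersionCode(3, 9) == 39, "3.9 encodes as 39");
static_assert(llvmVersionCode(4, 0) == 40, "4.0 encodes as 40");
static_assert(llvmVersionCode(5, 0) == 50, "5.0 encodes as 50");
```

Note that this encoding only orders versions correctly while the minor number stays below 10; a hypothetical 3.10 would encode to the same value as 4.0.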

Re: [Beignet] [PATCH] libocl: Add shuffle and shuffle2 builtins for half type

2017-09-21 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Wednesday, September 20, 2017 4:44 PM
> To: Jan Vesely ; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] libocl: Add shuffle and shuffle2 builtins for 
> half
> type
> 
> LGTM
> 
> Ruiling
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Jan
> Vesely
> Sent: Tuesday, September 19, 2017 12:17 PM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] libocl: Add shuffle and shuffle2 builtins for half 
> type
> 
> Signed-off-by: Jan Vesely 
> ---
> Fixes shuffle and shuffle2(mostly) piglit tests for half type
> 
>  backend/src/libocl/include/ocl_misc.h | 2 ++
>  backend/src/libocl/src/ocl_misc.cl| 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/backend/src/libocl/include/ocl_misc.h
> b/backend/src/libocl/include/ocl_misc.h
> index cb9e5bdd..a6a29e39 100644
> --- a/backend/src/libocl/include/ocl_misc.h
> +++ b/backend/src/libocl/include/ocl_misc.h
> @@ -52,6 +52,7 @@ DEF(char)
>  DEF(uchar)
>  DEF(short)
>  DEF(ushort)
> +DEF(half)
>  DEF(int)
>  DEF(uint)
>  DEF(float)
> @@ -117,6 +118,7 @@ DEF(char)
>  DEF(uchar)
>  DEF(short)
>  DEF(ushort)
> +DEF(half)
>  DEF(int)
>  DEF(uint)
>  DEF(float)
> diff --git a/backend/src/libocl/src/ocl_misc.cl
> b/backend/src/libocl/src/ocl_misc.cl
> index ce139a6c..d8e09aed 100644
> --- a/backend/src/libocl/src/ocl_misc.cl
> +++ b/backend/src/libocl/src/ocl_misc.cl
> @@ -87,6 +87,7 @@ DEF(char)
>  DEF(uchar)
>  DEF(short)
>  DEF(ushort)
> +DEF(half)
>  DEF(int)
>  DEF(uint)
>  DEF(float)
> @@ -202,6 +203,7 @@ DEF(char)
>  DEF(uchar)
>  DEF(short)
>  DEF(ushort)
> +DEF(half)
>  DEF(int)
>  DEF(uint)
>  DEF(float)
> --
> 2.13.5
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] backend: use simd-1 for scalar dst in indirectMov.

2017-09-21 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Ruiling Song
> Sent: Wednesday, September 13, 2017 2:07 PM
> To: beignet@lists.freedesktop.org
> Cc: Song, Ruiling 
> Subject: [Beignet] [PATCH] backend: use simd-1 for scalar dst in indirectMov.
> 
> This fix a failure introduced by load-store optimization on IVB.
> the test case is: builtin_kernel_block_motion_estimate_intel
> 
> Signed-off-by: Ruiling Song 
> ---
>  backend/src/backend/gen_context.cpp | 38 +++
> --
>  1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/backend/src/backend/gen_context.cpp
> b/backend/src/backend/gen_context.cpp
> index 0b171ff..6fc3159 100644
> --- a/backend/src/backend/gen_context.cpp
> +++ b/backend/src/backend/gen_context.cpp
> @@ -1949,23 +1949,33 @@ namespace gbe
>  indirect_src = GenRegister::indirect(dst.type, 0, GEN_WIDTH_1,
>   GEN_VERTICAL_STRIDE_ONE_DIMENSIONAL,
> GEN_HORIZONTAL_STRIDE_0);
> 
> -p->push();
> -  p->curr.execWidth = 8;
> -  p->curr.quarterControl = GEN_COMPRESSION_Q1;
> -  p->MOV(a0, tmp);
> -  p->MOV(dst, indirect_src);
> -p->pop();
> -
> -if (simdWidth == 16) {
> +if (sel->isScalarReg(dst.reg())) {
> +  p->push();
> +p->curr.execWidth = 1;
> +p->curr.predicate = GEN_PREDICATE_NONE;
> +p->curr.noMask = 1;
> +p->MOV(a0, tmp);
> +p->MOV(dst, indirect_src);
> +  p->pop();
> +} else {
>p->push();
>  p->curr.execWidth = 8;
> -p->curr.quarterControl = GEN_COMPRESSION_Q2;
> -
> -const GenRegister nextDst = GenRegister::Qn(dst, 1);
> -const GenRegister nextOffset = GenRegister::Qn(tmp, 1);
> -p->MOV(a0, nextOffset);
> -p->MOV(nextDst, indirect_src);
> +p->curr.quarterControl = GEN_COMPRESSION_Q1;
> +p->MOV(a0, tmp);
> +p->MOV(dst, indirect_src);
>p->pop();
> +
> +  if (simdWidth == 16) {
> +p->push();
> +  p->curr.execWidth = 8;
> +  p->curr.quarterControl = GEN_COMPRESSION_Q2;
> +
> +  const GenRegister nextDst = GenRegister::Qn(dst, 1);
> +  const GenRegister nextOffset = GenRegister::Qn(tmp, 1);
> +  p->MOV(a0, nextOffset);
> +  p->MOV(nextDst, indirect_src);
> +p->pop();
> +  }
>  }
>}
> 
> --
> 2.4.1
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] libocl: Consider only bottom ilogb(2m-1)+1 bits

2017-09-21 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Wednesday, September 20, 2017 4:44 PM
> To: Jan Vesely ; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] libocl: Consider only bottom ilogb(2m-1)+1 bits
> 
> LGTM
> 
> Ruiling
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Jan
> Vesely
> Sent: Tuesday, September 19, 2017 1:13 PM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] libocl: Consider only bottom ilogb(2m-1)+1 bits
> 
> Signed-off-by: Jan Vesely 
> ---
> Fixes remaining shuffle2 piglit tests on Skylake
> backend/src/libocl/src/ocl_misc.cl | 60 +++---
>  1 file changed, 30 insertions(+), 30 deletions(-)
> 
> diff --git a/backend/src/libocl/src/ocl_misc.cl
> b/backend/src/libocl/src/ocl_misc.cl
> index d8e09aed..f104f4ff 100644
> --- a/backend/src/libocl/src/ocl_misc.cl
> +++ b/backend/src/libocl/src/ocl_misc.cl
> @@ -109,8 +109,8 @@ DEF(double)
>  #define DEC2X(TYPE, MASKTYPE) \
>OVERLOADABLE TYPE##2 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##2
> mask) { \
>  TYPE##2 z; \
> -z.s0 = mask.s0 < 16 ? ((TYPE *))[mask.s0] : ((TYPE *))[mask.s0 & 
> 15]; \
> -z.s1 = mask.s1 < 16 ? ((TYPE *))[mask.s1] : ((TYPE *))[mask.s1 & 
> 15]; \
> +z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *))[mask.s0 & 31] : ((TYPE
> *))[mask.s0 & 15]; \
> +z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *))[mask.s1 & 31] : ((TYPE
> + *))[mask.s1 & 15]; \
>  return z; \
>}
> 
> @@ -122,10 +122,10 @@ DEF(double)
>  #define DEC4X(TYPE, MASKTYPE) \
>OVERLOADABLE TYPE##4 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##4
> mask) { \
>  TYPE##4 z; \
> -z.s0 = mask.s0 < 16 ? ((TYPE *))[mask.s0] : ((TYPE *))[mask.s0 & 
> 15]; \
> -z.s1 = mask.s1 < 16 ? ((TYPE *))[mask.s1] : ((TYPE *))[mask.s1 & 
> 15]; \
> -z.s2 = mask.s2 < 16 ? ((TYPE *))[mask.s2] : ((TYPE *))[mask.s2 & 
> 15]; \
> -z.s3 = mask.s3 < 16 ? ((TYPE *))[mask.s3] : ((TYPE *))[mask.s3 & 
> 15]; \
> +z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *))[mask.s0 & 31] : ((TYPE
> *))[mask.s0 & 15]; \
> +z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *))[mask.s1 & 31] : ((TYPE
> *))[mask.s1 & 15]; \
> +z.s2 = (mask.s2 & 31) < 16 ? ((TYPE *))[mask.s2 & 31] : ((TYPE
> *))[mask.s2 & 15]; \
> +z.s3 = (mask.s3 & 31) < 16 ? ((TYPE *))[mask.s3 & 31] : ((TYPE
> + *))[mask.s3 & 15]; \
>  return z; \
>}
> 
> @@ -137,14 +137,14 @@ DEF(double)
>  #define DEC8X(TYPE, MASKTYPE) \
>OVERLOADABLE TYPE##8 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##8
> mask) { \
>  TYPE##8 z; \
> -z.s0 = mask.s0 < 16 ? ((TYPE *))[mask.s0] : ((TYPE *))[mask.s0 & 
> 15]; \
> -z.s1 = mask.s1 < 16 ? ((TYPE *))[mask.s1] : ((TYPE *))[mask.s1 & 
> 15]; \
> -z.s2 = mask.s2 < 16 ? ((TYPE *))[mask.s2] : ((TYPE *))[mask.s2 & 
> 15]; \
> -z.s3 = mask.s3 < 16 ? ((TYPE *))[mask.s3] : ((TYPE *))[mask.s3 & 
> 15]; \
> -z.s4 = mask.s4 < 16 ? ((TYPE *))[mask.s4] : ((TYPE *))[mask.s4 & 
> 15]; \
> -z.s5 = mask.s5 < 16 ? ((TYPE *))[mask.s5] : ((TYPE *))[mask.s5 & 
> 15]; \
> -z.s6 = mask.s6 < 16 ? ((TYPE *))[mask.s6] : ((TYPE *))[mask.s6 & 
> 15]; \
> -z.s7 = mask.s7 < 16 ? ((TYPE *))[mask.s7] : ((TYPE *))[mask.s7 & 
> 15]; \
> +z.s0 = (mask.s0 & 31) < 16 ? ((TYPE *))[mask.s0 & 31] : ((TYPE
> *))[mask.s0 & 15]; \
> +z.s1 = (mask.s1 & 31) < 16 ? ((TYPE *))[mask.s1 & 31] : ((TYPE
> *))[mask.s1 & 15]; \
> +z.s2 = (mask.s2 & 31) < 16 ? ((TYPE *))[mask.s2 & 31] : ((TYPE
> *))[mask.s2 & 15]; \
> +z.s3 = (mask.s3 & 31) < 16 ? ((TYPE *))[mask.s3 & 31] : ((TYPE
> *))[mask.s3 & 15]; \
> +z.s4 = (mask.s4 & 31) < 16 ? ((TYPE *))[mask.s4 & 31] : ((TYPE
> *))[mask.s4 & 15]; \
> +z.s5 = (mask.s5 & 31) < 16 ? ((TYPE *))[mask.s5 & 31] : ((TYPE
> *))[mask.s5 & 15]; \
> +z.s6 = (mask.s6 & 31) < 16 ? ((TYPE *))[mask.s6 & 31] : ((TYPE
> *))[mask.s6 & 15]; \
> +z.s7 = (mask.s7 & 31) < 16 ? ((TYPE *))[mask.s7 & 31] : ((TYPE
> + *))[mask.s7 & 15]; \
>  return z; \
>}
> 
> @@ -156,22 +156,22 @@ DEF(double)
>  #define DEC16X(TYPE, MASKTYPE) \
>OVERLOADABLE TYPE##16 shuffle2(TYPE##16 x, TYPE##16 y, MASKTYPE##16
> mask) { \
>  TYPE##16 z; \
> -z.s0 = mask.s0 < 16 ? ((TYPE *))[mask.s0] : ((TYPE *))[mask.s0 & 
> 15]; \
> -z.s1 = mask.s1 < 16 ? ((TYPE *))[mask.s1] : ((TYPE *))[mask.s1 & 
> 15]; \
> -z.s2 = mask.s2 < 16 ? ((TYPE *))[mask.s2] : ((TYPE *))[mask.s2 & 
> 15]; \
> -z.s3 = mask.s3 < 16 ? ((TYPE *))[mask.s3] : ((TYPE *))[mask.s3 & 
> 15]; \
> -z.s4 = mask.s4 < 16 ? ((TYPE *))[mask.s4] : ((TYPE *))[mask.s4 & 
> 15]; \
> -z.s5 = mask.s5 < 16 ? ((TYPE *))[mask.s5] : ((TYPE *))[mask.s5 & 
> 15]; \
> -z.s6 = mask.s6 < 16 ? ((TYPE *))[mask.s6] : ((TYPE *))[mask.s6 & 
> 15]; \
> -z.s7 = mask.s7 < 16 ? ((TYPE *))[mask.s7] : ((TYPE *))[mask.s7 & 
> 15]; \
> -   
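
The subject line of the patch above refers to the rule that only the bottom ilogb(2m-1)+1 bits of each mask element are considered, where m is the number of elements in an input vector. For the 16-element inputs in the quoted macros that is ilogb(31)+1 = 5 bits, hence the `& 31`. A plain C++ sketch of the per-element selection, assuming a hypothetical helper `shuffle2Select` that stands in for one `z.sN = ...` line of the macro:

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for one "z.sN = ..." line of the patched macro:
// picks one element from two 16-element inputs x and y.
template <typename T>
T shuffle2Select(const std::array<T, 16>& x,
                 const std::array<T, 16>& y,
                 std::uint32_t mask) {
  // Only the low 5 bits of the mask are significant for 16-element inputs.
  const std::uint32_t idx = mask & 31u;
  // Indices 0..15 select from x, indices 16..31 select from y.
  return idx < 16u ? x[idx] : y[idx & 15u];
}
```

With the pre-patch comparison (`mask < 16` on the unmasked value), a mask of 35 would have selected from `y`; after masking it becomes index 3 and selects from `x`.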

Re: [Beignet] beignet and LLVM 4

2017-09-08 Thread Yang, Rong R
Hi Rebecca,

LLVM 5.0 has been released, and we are planning a minor release, 1.3.2, to
support LLVM 4.0 and LLVM 5.0 once beignet's LLVM 5.0 patches are ready.
Is it urgent to apply the 4.0 support the way Fedora did? If not, I suggest
you wait for the 1.3.2 release.

Thanks,
Yang Rong

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Saturday, September 2, 2017 10:10 PM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] beignet and LLVM 4
> 
> Debian plans to remove LLVM 3.8 and 3.9, retaining 4.0 and 5.0 [0].
> 
> beignet 1.3.1 (currently in Debian) doesn't build with LLVM 4.0+, but beignet 
> git
> master does.
> 
> Fedora tried applying the 4.0-support patches to beignet 1.3.1, but found that
> this made several applications crash, and are now using a git snapshot [1-2].
> 
> Is such a snapshot also likely to be the best solution for Debian beignet, 
> and if so,
> should it be 'same commit as Fedora' (36f6a8b), 'current master', or some 
> other
> point?
> 
> [0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=873403 Anyone can
> comment there, but be aware that Debian does _not_ spam-protect email
> addresses.
> [1] https://koji.fedoraproject.org/koji/buildinfo?buildID=943802
> [2]
> https://bugzilla.redhat.com/buglist.cgi?component=beignet=Fedora
> , several towards the end.
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] Status of cl_khr_gl_sharing

2017-08-07 Thread Yang, Rong R
The application is responsible for synchronizing access to shared objects, so I
think the only thing clEnqueueAcquire/ReleaseGLObjects need to do is set the
status of the event parameter correctly.
I will implement them, and remove the other NOT_IMPLEMENTED functions.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Friday, August 4, 2017 2:46 PM
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] Status of cl_khr_gl_sharing
> 
> The *missing* functions aren't likely to be a problem: as far as I can tell,
> http://sources.debian.net/src/forge/0.9.2-
> 2/examples/opencl/cl_helpers.h/?hl=72#L72
> is the only place in Debian that tries to use one of them, and it correctly 
> handles
> its absence.
> 
> I'm more worried about the potential consequences (if any) of
> clEnqueueAcquire/ReleaseGLObjects not actually doing anything.
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] Status of cl_khr_gl_sharing

2017-08-03 Thread Yang, Rong R
Commit a892148 re-implemented this extension, using eglExportDMABUFImageMESA
and glGetTexLevelParameteriv to get the shared buffer's information.
It now requires libEGL.so and libGL.so version >= 13.0.

However, the extension is only partially implemented: only
clCreateFromGLTexture2D is supported.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Simon Richter
> Sent: Thursday, August 3, 2017 12:28 AM
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] Status of cl_khr_gl_sharing
> 
> Hi,
> 
> On 01.08.2017 23:39, Rebecca N. Palmer wrote:
> 
> > As beignet's Debian maintainer, I am considering whether to enable
> > CL-GL sharing in our package, given its incomplete state.
> 
> I haven't looked at it since the initial package, because back then it 
> required the
> Mesa source code in order to get offsets into an internal structure.
> 
> If that has changed, I'd be totally in favour of enabling it, because we're 
> early in
> the buster cycle, so this is the time to get "new"
> features in.
> 
>Simon

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] Kaby lake perf worse than Skylake?

2017-07-20 Thread Yang, Rong R


> -Original Message-
> From: Michael Gratton [mailto:m...@vee.net]
> Sent: Thursday, July 20, 2017 12:16
> To: Yang, Rong R <rong.r.y...@intel.com>
> Cc: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] Kaby lake perf worse than Skylake?
> 
> Hi,
> 
> On Thu, Jul 20, 2017 at 1:30 PM, Yang, Rong R <rong.r.y...@intel.com>
> wrote:
> > Is the quantity of infrared data they process same? Is it affected by
> > fps? Has you try other method, such as the profiling time. Actually,
> > in beignet, the Kabylake is almost same as Skylake.
> 
> The sensor operates at a fixed frame rate (30 fps), so the data processed was
> the same in both cases.
> 
> I haven't tried profiling the process yet, but may look into that. Is there 
> any
> tools available for profiling OpenCL code?

You could use the OpenCL API clGetEventProfilingInfo to get profiling
information for the OpenCL enqueue functions.

> 
> I expected the Kaby Lake performance would be about the same or better
> than Skylake, which is why I was surprised that it seems to perform not as
> well for this task.

To compare Kaby Lake and Skylake, you should pin both to the same frequency by
setting /sys/kernel/debug/dri/0/i915_max_freq and
/sys/kernel/debug/dri/0/i915_min_freq to the same value.

> 
> //Mike
> 
> --
> ⊨ Michael Gratton, Percept Wrangler.
> ⚙ <http://mjog.vee.net/>
> 

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] Kaby lake perf worse than Skylake?

2017-07-19 Thread Yang, Rong R
Hi,

Is the quantity of infrared data they process the same? Is it affected by fps?
Have you tried other methods, such as measuring the profiling time?
Actually, in beignet, Kaby Lake is almost the same as Skylake.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Michael Gratton
> Sent: Thursday, July 20, 2017 8:49
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] Kaby lake perf worse than Skylake?
> 
> Hi all,
> 
> I'm currently evaluating Beignet OpenCL performance on Skylake and Kaby
> Lake GPUs against Nvidia CUDA, and surprisingly found that for our specific
> task that the GPU performance when executing the same OpenCL algorithms
> was substantially worse on a Kaby lake HD630 than it was on a Skylake HD530.
> Is this to be expected?
> 
> The setup I am using is as follows:
>  - Ubuntu Xenial 16.04, kernel 4.8
>  - Beignet 1.3.0 compiled using the Zesty debian packaging on Xenial
>  - Intel i5-6500 (HD530) vs i7-7700 (HD630)
> 
> The OpenCL task is processing infrared data from Kinect 2 sensors, using the
> Protonect tool from the libfreenect2 library with two different OpenCL-based
> algorithms: "default" and "kde" (the latter is more computationally 
> intensive).
> 
> Measuring GPU utilisation using the "render busy" metric from
> intel_gpu_top, this is what I found:
> 
> HD530 utilisation:
>  - default: ~25%
>  - kde: ~52%
> 
> HD630 utilisation:
>  - default: ~35%
>  - kde: ~63%
> 
> I.e. the Kaby lake HD630 utilisation is 1.2-1.4 times higher than the Skylake
> HD530, however I would have expected the exact opposite.
> 
> Upgrading the kernel to 4.10 produced only a minor improvement for the
> HD630. Would upgrading to Beignet 1.3.1 or a newer kernel than 4.10 help
> further at all? Any advice on how to substantially improve this kaby lake
> performance would be appreciated.
> 
> Cheers,
> //Mike
> 
> --
> ⊨ Michael Gratton, Percept Wrangler.
> ⚙ 
> 
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Fix GCC6 build bug

2017-07-19 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Tuesday, July 11, 2017 10:02
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH] Fix GCC6 build bug
> 
> From: Pan Xiuli 
> 
> GCC6 refine the c headers and need to add the needed function header, like
> the abs in math.h.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/backend/gen_insn_selection_optimize.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> b/backend/src/backend/gen_insn_selection_optimize.cpp
> index 2ab2a7f..71333a4 100644
> --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  namespace gbe
>  {
> --
> 2.7.4
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] backend: refine global immediate optimization

2017-07-19 Thread Yang, Rong R
I forgot to push this patch; it is pushed now, thanks.

> -Original Message-
> From: Song, Ruiling
> Sent: Thursday, July 20, 2017 9:13
> To: Wang, Rander <rander.w...@intel.com>; beig...@freedesktop.org
> Cc: Wang, Rander <rander.w...@intel.com>; Yichao Yu
> <yyc1...@gmail.com>; Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [PATCH] backend: refine global immediate
> optimization
> 
> Hi Yang Rong,
> 
> Please help merge this patch. These lines of code cause build error. And in
> fact they are not needed.
> 
> Hi Yichao,
> 
> Sorry for this. In fact these two lines of code should be removed.
> 
> Thanks!
> Ruiling
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander.wang
> > Sent: Friday, June 30, 2017 4:29 PM
> > To: beig...@freedesktop.org
> > Cc: Wang, Rander <rander.w...@intel.com>
> > Subject: [Beignet] [PATCH] backend: refine global immediate
> > optimization
> >
> > for ABS(UD) = UD on Gen, so delete it,
> > or it make compilation failed on some platform
> >
> > Signed-off-by: rander.wang <rander.w...@intel.com>
> > ---
> >  backend/src/backend/gen_insn_selection_optimize.cpp | 4 
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> > b/backend/src/backend/gen_insn_selection_optimize.cpp
> > index eb93a20..08e4ccf 100644
> > --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> > +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> > @@ -424,14 +424,10 @@ namespace gbe
> >  else if(src0.type == GEN_TYPE_UD || src1.type == GEN_TYPE_UD)
> >  {
> >unsigned int s0 = src0.value.ud;
> > -  if (src0.absolute)
> > -s0 = abs(s0);
> >if (src0.negation)
> >  s0 = -s0;
> >
> >unsigned int s1 = src1.value.ud;
> > -  if (src1.absolute)
> > -s1 = abs(s1);
> >if (src1.negation)
> >  s1 = -s1;
> >
> > --
> > 2.7.4
> >
> > ___
> > Beignet mailing list
> > Beignet@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH V5] backend: refine load store optimization

2017-07-17 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Ruiling Song
> Sent: Thursday, July 13, 2017 10:45
> To: beignet@lists.freedesktop.org
> Cc: Song, Ruiling ; Wang, Rander
> 
> Subject: [Beignet] [PATCH V5] backend: refine load store optimization
> 
> this fix basic test in conformance tests failed for vec8 of char because of
> overflow. And it fix many test items failed in opencv because of offset error
> 
> (1) modify the size of searchInsnArray to 32, it is the max size for char
>     And add check for overflow if too many insn
> (2) Make sure the start insn is the first insn of searched array
>     because if it is not the first, the offset maybe invalid. And
>     it is complex to modify offset without error
> V2: refine search index, using J not I
> V3: remove (2), now add offset to the pointer of start
> pass OpenCV, conformance basic and compiler tests, utests
> V4: check pointer type, if 64bit, modify it by 64, or 32
> V5: refine findSafeInstruction() and variable naming in
> findConsecutiveAccess().
> 
> Signed-off-by: rander.wang 
> Signed-off-by: Ruiling Song 
> ---
>  backend/src/llvm/llvm_loadstore_optimization.cpp | 125
> ---
>  1 file changed, 88 insertions(+), 37 deletions(-)
> 
> diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> b/backend/src/llvm/llvm_loadstore_optimization.cpp
> index c91c1a0..bb8dc5f 100644
> --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> @@ -68,13 +68,14 @@ namespace gbe {
>  bool optimizeLoadStore(BasicBlock );
> 
>  bool isLoadStoreCompatible(Value *A, Value *B, int *dist, int*
> elementSize, int maxVecSize);
> -void mergeLoad(BasicBlock , SmallVector
> );
> -void mergeStore(BasicBlock , SmallVector
> );
> +void mergeLoad(BasicBlock , SmallVector
> , Instruction *start,int offset);
> +void mergeStore(BasicBlock , SmallVector
> , Instruction *start,int offset);
>  bool findConsecutiveAccess(BasicBlock ,
>SmallVector ,
>const BasicBlock::iterator ,
>unsigned maxVecSize,
> -  bool isLoad);
> +  bool isLoad,
> +  int *addrOffset);
>  #if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
>  virtual StringRef getPassName() const
>  #else
> @@ -143,7 +144,10 @@ namespace gbe {
>  return (abs(-offset) < sz*maxVecSize);
>}
> 
> -  void GenLoadStoreOptimization::mergeLoad(BasicBlock ,
> SmallVector ) {
> +  void GenLoadStoreOptimization::mergeLoad(BasicBlock ,
> +SmallVector 
> ,
> +Instruction *start,
> +int offset) {
>  IRBuilder<> Builder();
> 
>  unsigned size = merged.size();
> @@ -151,14 +155,27 @@ namespace gbe {
>  for(unsigned i = 0; i < size; i++) {
>values.push_back(merged[i]);
>  }
> -LoadInst *ld = cast(merged[0]);
> +LoadInst *ld = cast(start);
>  unsigned align = ld->getAlignment();
>  unsigned addrSpace = ld->getPointerAddressSpace();
>  // insert before first load
>  Builder.SetInsertPoint(ld);
> +
> +//modify offset
> +Value *newPtr = ld->getPointerOperand();
> +if(offset != 0)
> +{
> +  Type *ptype = ld->getPointerOperand()->getType();
> +  unsigned typeSize = TD->getPointerTypeSize(ptype);
> +  ptype = (typeSize == 4) ? Builder.getInt32Ty():Builder.getInt64Ty();
> +  Value *StartAddr = Builder.CreatePtrToInt(ld->getPointerOperand(),
> ptype);
> +  Value *offsetVal = ConstantInt::get(ptype, offset);
> +  Value *newAddr = Builder.CreateAdd(StartAddr, offsetVal);
> +  newPtr = Builder.CreateIntToPtr(newAddr, ld->getPointerOperand()-
> >getType());
> +}
> +
>  VectorType *vecTy = VectorType::get(ld->getType(), size);
> -Value *vecPtr = Builder.CreateBitCast(ld->getPointerOperand(),
> -PointerType::get(vecTy, addrSpace));
> +Value *vecPtr = Builder.CreateBitCast(newPtr,
> + PointerType::get(vecTy, addrSpace));
>  LoadInst *vecValue = Builder.CreateLoad(vecPtr);
>  vecValue->setAlignment(align);
> 
> @@ -196,8 +213,8 @@ namespace gbe {
>  SmallVector ,
>  const BasicBlock::iterator ,
>  unsigned maxVecSize,
> -bool isLoad) {
> -
> +bool isLoad,
> +
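
In the (truncated) patch above, `mergeLoad` adjusts the start pointer by converting it to an integer, adding the offset, and converting back (`CreatePtrToInt`, `CreateAdd`, `CreateIntToPtr`). The same idea in plain C++, as a rough sketch only: the helper name `offsetPointer` is hypothetical, and `std::uintptr_t` stands in for the 32-bit or 64-bit integer type the patch picks from the pointer size.

```cpp
#include <cstdint>

// Hypothetical illustration of the ptrtoint / add / inttoptr sequence:
// shift a base pointer by a signed byte offset.
inline const char* offsetPointer(const char* base, std::intptr_t byteOffset) {
  if (byteOffset == 0) {
    return base;  // the patch also skips the arithmetic when offset is 0
  }
  // Pointer -> integer.
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(base);
  // Integer add; a negative offset wraps correctly in unsigned arithmetic.
  addr += static_cast<std::uintptr_t>(byteOffset);
  // Integer -> pointer.
  return reinterpret_cast<const char*>(addr);
}
```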

Re: [Beignet] [PATCH] backend: refine math log function

2017-07-03 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Monday, June 19, 2017 13:21
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: refine math log function
> 
>   remove a few unnecessary codes , and get 20% improvement
>   at worse case. If X is a NAN, there are some if-return
>   codes to return NAN. Now change it to add(x - x) which
>   get the same NAN
> 
>   pass the conformance tests and utests
> 
> Signed-off-by: rander.wang 
> ---
>  backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 50 +
> 
>  1 file changed, 10 insertions(+), 40 deletions(-)
> 
> diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> index b4764ee..2c0a702 100644
> --- a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> @@ -201,38 +201,19 @@ OVERLOADABLE float inline
> __gen_ocl_internal_log_valid(float x) {
>k += (i>>23);
>f = x - 1.0f;
>fsq = f * f;
> -
> -  if((0x007f & (15 + ix)) < 16) { /* |f| < 2**-20 */
> -  R = fsq * (0.5f - 0.3f * f);
> -  return k * ln2_hi + k * ln2_lo + f - R;
> -  }
> -
> -  s = f / (2.0f + f);
> +  s = mad(-2.0f, 1.0f / (2.0f + f), 1.0f);
>z = s * s;
> -  i = ix - (0x6147a << 3);
>w = z * z;
> -  j = (0x6b851 << 3) - ix;
> -  t1= w * mad(w, Lg4, Lg2);
> -  t2= z * mad(w, Lg3, Lg1);
> -  i |= j;
> -  R = t2 + t1;
> -  partial = (i > 0) ? -mad(s, 0.5f * fsq, -0.5f * fsq) : (s * f);
> -
> -  return mad(s, R, f) - partial + k * ln2_hi + k * ln2_lo;;
> +  t1 = w * mad(w, Lg4, Lg2);
> +  R = mad(z, mad(w, Lg3, Lg1), t1);
> +  w = 0.5f * fsq;
> +  partial = -mad(s, w, -w);
> +  return mad(k, ln2_lo, mad(k, ln2_hi, mad(s, R, f) - partial));
>  }
> 
>  OVERLOADABLE float __gen_ocl_internal_log(float x)  {
> -  union { unsigned int i; float f; } u;
> -  u.f = x;
> -  int ix = u.i;
> -
> -  if (ix < 0 )
> - return NAN;  /* log(-#) = NaN */
> -  if (ix >= 0x7f80)
> -return NAN;
> -
> -  return __gen_ocl_internal_log_valid(x);
> +  return __gen_ocl_internal_log_valid(x) + (x - x);
>  }
> 
>  OVERLOADABLE float __gen_ocl_internal_log10(float x)
> @@ -244,12 +225,10 @@ OVERLOADABLE float __gen_ocl_internal_log10(float x)
>log10_2lo  =  7.9034151668e-07; /* 0x355427db */
> 
>float y, z;
> -  int i, k, hx;
> +  int i, k;
> +  unsigned int hx;
> 
>u.f = x; hx = u.i;
> -
> -  if (hx<0)
> -return NAN; /* log(-#) = NaN */
>if (hx >= 0x7f80)
>  return NAN;
> 
> @@ -267,17 +246,8 @@ OVERLOADABLE float __gen_ocl_internal_log2(float
> x)  {
>const float zero   =  0.0,
>invln2 = 0x1.715476p+0f;
> -  int ix;
> -
> -  union { float f; int i; } u;
> -  u.f = x; ix = u.i;
> -
> -  if (ix < 0)
> - return NAN;/** log(-#) = NaN */
> -  if (ix >= 0x7f80)
> - return NAN;
> 
> -  return invln2 * __gen_ocl_internal_log_valid(x);
> +  return invln2 * __gen_ocl_internal_log_valid(x) + (x - x);
>  }
> 
> 
> --
> 2.7.4
> 
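
The "+ (x - x)" term added to the log functions in the patch above can be sketched in isolation. This is a minimal illustration in plain C, not the Beignet code: core_valid_only() is a hypothetical placeholder for a routine such as __gen_ocl_internal_log_valid(), and the sketch assumes strict IEEE 754 arithmetic (a fast-math style compiler option may fold x - x to zero).

```c
#include <math.h>

/* Returns 0.0f for every finite x and NaN when x is +-inf or NaN,
 * so adding it to a result flags those inputs without a branch. */
static float nan_if_not_finite(float x)
{
    return x - x;
}

/* Hypothetical placeholder for a core routine that is only meaningful
 * for finite input (stands in for the *_log_valid helper). */
static float core_valid_only(float x)
{
    return x * 2.0f;
}

/* Branch-free wrapper in the style of the patch: the extra term leaves
 * finite results untouched and turns inf/NaN input into NaN. */
static float wrapped(float x)
{
    return core_valid_only(x) + nan_if_not_finite(x);
}
```

Negative finite inputs are not caught by this term, since x - x is zero for them; they have to produce NaN inside the core routine itself.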


Re: [Beignet] [PATCH] backend: refine pow function

2017-07-03 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Thursday, June 22, 2017 17:41
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: refine pow function
> 
>   This now takes 40% less time than before:
>   (1) group the many branches that deal with corner cases into one branch.
>   (2) use HW exp2 and log2 to replace some instructions.
> 
>   Passes the conformance tests and utest.
> 
> Signed-off-by: rander.wang 
> ---
>  backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 294 ---
> -
>  1 file changed, 148 insertions(+), 146 deletions(-)
> 
> diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> index 2c0a702..6026629 100644
> --- a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> @@ -2352,7 +2352,8 @@ OVERLOADABLE float
> __gen_ocl_internal_pow(float x, float y) {
>float z,ax,z_h,z_l,p_h,p_l;
>float y1,t1,t2,r,s,sn,t,u,v,w;
>int i,j,k,yisint,n;
> -  int hx,hy,ix,iy,is;
> +  int hy,ix,iy,is;
> +  unsigned int hx;
>float bp,dp_h,dp_l,
>zero=  0.0,
>one=  1.0,
> @@ -2382,17 +2383,17 @@ OVERLOADABLE float
> __gen_ocl_internal_pow(float x, float y) {
>float retVal = 0.0f;
>bool bRet = false;
> 
> -  GEN_OCL_GET_FLOAT_WORD(hx,x);
> +  hx = as_uint(x);
>GEN_OCL_GET_FLOAT_WORD(hy,y);
>ax   = __gen_ocl_fabs(x);
>ix = as_int(ax);  iy = as_int(fabs(y));
> 
> -  if(iy < 0x0080 || hx==0x3f80)
> +  if(iy < 0x0080)
>{
>   bRet = true;
>   retVal = one;
>}
> -  else if (ix > 0x7f80 || iy > 0x7f80)
> +  else if (iy > 0x7f80)
>{
> bRet = true;
> retVal = NAN;
> @@ -2403,120 +2404,152 @@ OVERLOADABLE float
> __gen_ocl_internal_pow(float x, float y) {
>   * yisint = 1... y is an odd int
>   * yisint = 2... y is an even int
>   */
> -  yisint  = 0;
> -  if(hx<0) {
> -k = (iy>>23)-0x7f;/* exponent */
> -j = iy>>(23-k);
> -yisint = (iy>=0x3f80 && (j<<(23-k))==iy)? 2-(j&1):yisint;
> -yisint = (iy>=0x4b80) ? 2:yisint;
> -  }
> -
> -/* special value of x */
> -  if(ix==0x7f80||ix==0||ix==0x3f80){
> -z = ax;  /*x is +-0,+-inf,+-1*/
> +  sn = one; /* s (sign of result -ve**odd) = -1 else = 1 */
> 
> -z = (hy < 0)? one/z:z;
> -z = ((hx<0) && (((ix-0x3f80)|yisint)==0))? NAN:z;
> -z = ((hx<0) && (yisint==1))? -z:z;
> +  if(hx >= 0x7f80)
> +  {
> +yisint  = 0;
> +n = (hx>>31)-1;
> 
> -retVal = (bRet)? retVal:z;
> -bRet = true;
> -  }
> +if (!retVal && ix > 0x7f80)
> +{
> +  bRet = true;
> +  retVal = NAN;
> +}
> 
> -  n = ((uint)hx>>31)-1;
> +if(hx >= 0x8000) {
> +  k = (iy>>23)-0x7f;  /* exponent */
> +  j = iy>>(23-k);
> +  yisint = (iy>=0x3f80 && (j<<(23-k))==iy)? 2-(j&1):yisint;
> +  yisint = (iy>=0x4b80) ? 2:yisint;
> +}
> 
> -  /* (x<0)**(non-int) is NaN */
> -  if(!bRet && (n|yisint)==0)
> -  {
> - bRet= true;
> - retVal = NAN;
> -  }
> +  /* special value of x */
> +if(ix==0x7f80||ix==0||ix==0x3f80){
> +  z = ax; /*x is +-0,+-inf,+-1*/
> +  z = (hy < 0)? one/z:z;
> +  z = (((ix-0x3f80)|yisint)==0)? NAN:z;
> +  z = (yisint==1)? -z:z;
> +  retVal = (bRet)? retVal:z;
> +  bRet = true;
> +}
> 
> -  sn = one; /* s (sign of result -ve**odd) = -1 else = 1 */
> -  if((n|(yisint-1))==0) sn = -one;/* (-ve)**(odd int) */
> +/* (x<0)**(non-int) is NaN */
> +if(!bRet && (n|yisint)==0)
> +{
> +   bRet= true;
> +   retVal = NAN;
> +}
> 
> -  /* |y| is huge */
> -  if(iy>0x4d00)
> -  { /* if |y| > 2**27 */
> - /* over/underflow if x is not close to one */
> - /* special value of y */
> - float b1 = (hy>=0)? y: zero;
> - float b2 = (hy<0)?-y: zero;
> - b1 = (ix > 0x3f80)? b1:b2;
> - retVal = (iy==0x7f80 && !bRet)? b1:retVal;
> - bRet = (iy==0x7f80 && !bRet)? true: bRet;
> +if((n|(yisint-1))==0) sn = -one;/* (-ve)**(odd int) */  }
> 
> +/* special value of x */
> +  if((ix&0x7f) == 0) {
> +if(hx == 0x3f80)
> +{
> +  retVal = one;
> +  bRet = true;
> +}
> 
> - b1 = (hy>0)? sn*huge*huge:0;
> - retVal = (ix>0x3f87 && !bRet)? b1:retVal;
> - bRet = (ix>0x3f87 && !bRet)? true:bRet;
> +if(ix==0x7f80||ix==0) {
> +  z = ax;/*x is +0,+inf*/
> +  z = (hy < 0)? one/z:z;
> +  retVal = (bRet)? retVal:z;
> +  bRet = true;
> +}
> +  }
> 
> - /* now |1-x| is tiny <= 2**-20, suffice to compute
> -   log(x) by x-x^2/2+x^3/3-x^4/4 */
> - t = ax-1;   /* t has 20 trailing zeros */
> - w 

Re: [Beignet] [PATCH] Runtime: refine max group size for SKL & KBL

2017-07-03 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Friday, June 23, 2017 11:02
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] Runtime: refine max group size for SKL & KBL
> 
>   Change the max work group size to 256, which is a reasonable
>   size for Gen9. According to performance tests, 256 gives a
>   good improvement in OpenCV with no regressions.
> 
> Signed-off-by: rander.wang 
> ---
>  src/cl_device_id.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c index 6cba2b5..5ea13a9
> 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -149,7 +149,7 @@ static struct _cl_device_id intel_skl_gt1_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 2,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -159,7 +159,7 @@ static struct _cl_device_id intel_skl_gt2_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 3,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -169,7 +169,7 @@ static struct _cl_device_id intel_skl_gt3_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 6,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -179,7 +179,7 @@ static struct _cl_device_id intel_skl_gt4_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 9,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -209,7 +209,7 @@ static struct _cl_device_id intel_kbl_gt1_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 2,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -219,7 +219,7 @@ static struct _cl_device_id intel_kbl_gt15_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 3,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -229,7 +229,7 @@ static struct _cl_device_id intel_kbl_gt2_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 3,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -239,7 +239,7 @@ static struct _cl_device_id intel_kbl_gt3_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 6,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> @@ -249,7 +249,7 @@ static struct _cl_device_id intel_kbl_gt4_device = {
>.max_thread_per_unit = 7,
>.sub_slice_count = 9,
>.max_work_item_sizes = {512, 512, 512},
> -  .max_work_group_size = 512,
> +  .max_work_group_size = 256,
>.max_clock_frequency = 1000,
>  #include "cl_gen9_device.h"
>  };
> --
> 2.7.4
> 
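
An application that hard-coded a local size of 512 would need to adapt once the reported maximum drops to 256. The helper below is a hypothetical sketch of that adaptation in plain C, not part of Beignet: it assumes max_wg was obtained from clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE, and that the global size must be a multiple of the local size, as OpenCL 1.x requires.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest local size that is <= both the preferred size and
 * the device maximum and that evenly divides the global size.
 * pick_local_size is a hypothetical helper name. */
static size_t pick_local_size(size_t global, size_t preferred, size_t max_wg)
{
    size_t local;

    assert(preferred > 0 && max_wg > 0);
    local = preferred < max_wg ? preferred : max_wg;
    while (local > 1 && global % local != 0)
        local--;
    return local;
}
```

The linear search is fine for a one-dimensional range of this size; a real application would apply the same idea per dimension.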


Re: [Beignet] [PATCH V4] backend: add global immediate optimization

2017-07-03 Thread Yang, Rong R
GEN supports mixed-type instructions, for example mixing UD and UW, as in UD * UW.
How about handling U/UD in one if branch?

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Wang, Rander
> Sent: Monday, July 3, 2017 9:33
> To: inte...@intelfx.name; beignet@lists.freedesktop.org
> Cc: Song, Ruiling 
> Subject: Re: [Beignet] [PATCH V4] backend: add global immediate
> optimization
> 
> For D + UD, D is treated as UD by the HW.
> 
> -Original Message-
> From: Ivan Shapovalov [mailto:inte...@intelfx.name]
> Sent: Saturday, July 1, 2017 2:26 AM
> To: Wang, Rander ; beignet@lists.freedesktop.org
> Cc: Song, Ruiling 
> Subject: Re: [Beignet] [PATCH V4] backend: add global immediate
> optimization
> 
> On 2017-06-30 at 15:36 +0300, Ivan Shapovalov wrote:
> > On 2017-06-30 at 01:46 +, Wang, Rander wrote:
> > > Hi,
> > >
> > >   The abs of UD has to be applied if it is encoded in the instruction,
> > > whether or not it makes sense.
> > > I have discussed this with my colleague and refined it.
> > > First we inspected the HW behavior of ABS(UD) and -(UD) and found that
> > > ABS(UD) = UD,
> > > -(UD) = the result of -(UD) on CPU.
> > >
> > >   So the abs calculation can be removed, and this lets the code
> > > compile.
> > >
> > > Rander
> >
> > Hi,
> >
> > OK, but what about reading from .value.ud if the corresponding .type
> > is not GEN_TYPE_UD? Is this a concern? Which operand type combinations
> > are possible?
> >
> 
> I mean, due to an || in the conditional it looks like it is possible for
> either of the operands to not be a GEN_TYPE_D. Suppose the first operand is
> a signed dword (GEN_TYPE_D) that holds a negative value and has the ABS
> flag. In this case the new code will yield a significantly wrong result.
> Is this possible?
> 
> --
> Ivan Shapovalov / intelfx /
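
The concern raised in this thread can be made concrete with a small model. This is only a sketch in plain C of the question being asked, not of the hardware or of the patch: struct imm and both resolve functions are hypothetical, and IMM_TYPE_D / IMM_TYPE_UD merely stand in for GEN_TYPE_D / GEN_TYPE_UD. It shows how far apart the two readings are when a negative signed dword carries the ABS flag.

```c
#include <stdint.h>

enum imm_type { IMM_TYPE_D, IMM_TYPE_UD };   /* hypothetical stand-ins */

struct imm {
    enum imm_type type;
    int abs_flag;                            /* ABS source modifier */
    union { int32_t d; uint32_t ud; } value;
};

/* Applies ABS according to the operand type: a negative D is replaced by
 * its magnitude, a UD is returned unchanged (ABS(UD) = UD). */
static uint32_t resolve_abs_typed(struct imm r)
{
    if (r.abs_flag && r.type == IMM_TYPE_D && r.value.d < 0)
        return (uint32_t)0 - r.value.ud;     /* two's complement magnitude */
    return r.value.ud;
}

/* Drops the ABS calculation and always reads .value.ud. */
static uint32_t resolve_abs_as_ud(struct imm r)
{
    return r.value.ud;
}
```

For a D operand holding -5 with ABS set, the first function yields 5 and the second yields 0xFFFFFFFB; for UD operands the two agree.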


Re: [Beignet] [PATCH] backend: refine fdiv to rcp at some cases

2017-07-03 Thread Yang, Rong R
One comment. Thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Monday, June 19, 2017 13:34
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: refine fdiv to rcp at some cases
> 
>   When src0 of fdiv is an immediate value that is exactly a
>   power of 2, like 2.0f, 4.0f, 1.0/8.0f,
>   fdiv %0, imm, %1 can be converted to
>   rcp %0, %1
>   mul %0, %0, imm.
> 
>   fdiv costs 8 cycles and rcp costs 4 cycles, so this saves at least
>   3 cycles.
> 
>   Passes the conformance test and utests.
> 
> Signed-off-by: rander.wang 
> ---
>  backend/src/backend/gen_insn_selection.cpp | 29
> +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/backend/src/backend/gen_insn_selection.cpp
> b/backend/src/backend/gen_insn_selection.cpp
> index 7498f38..572f6a8 100644
> --- a/backend/src/backend/gen_insn_selection.cpp
> +++ b/backend/src/backend/gen_insn_selection.cpp
> @@ -3279,6 +3279,35 @@ extern bool OCL_DEBUGINFO; // first defined by
> calling BVAR in program.cpp
>  sel.MATH(dst, function, src0, src1);
>} else if(type == TYPE_FLOAT) {
>  GBE_ASSERT(op != OP_REM);
> +SelectionDAG *child0 = dag.child[0];
> +if (child0 && child0->insn.getOpcode() == OP_LOADI) {
> +  const auto &loadimm = cast<ir::LoadImmInstruction>(child0->insn);
> +  const Immediate imm = loadimm.getImmediate();
> +  float immVal = imm.getFloatValue();
> +  int* dwPtr = (int*)&immVal;
> +
> +  //if immedia is a exactly pow of 2, it can be converted to RCP
> +  if((*dwPtr & 0x7F) == 0) {
> +if(immVal == -1.0f)
> +{
> +  GenRegister tmp = src1;
> +  tmp.negation = 1;
This is wrong when src1.negation is already 1. You could use GenRegister::negate() directly.
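
The difference between the two approaches can be sketched with a one-bit model. This is a hypothetical illustration in plain C, not the GenRegister class: it assumes, as the review comment implies, that negating a register means flipping its negation bit rather than setting it.

```c
/* Hypothetical model of a register that carries only the negation
 * source modifier. */
struct reg {
    unsigned negation : 1;
};

/* What the reviewed hunk does: force the bit to 1.
 * If the operand was already negated, the result is still "negated",
 * so -(-x) is wrongly treated as -x. */
static struct reg negate_by_assign(struct reg r)
{
    r.negation = 1;
    return r;
}

/* Flip the bit instead, so that -(-x) becomes x again. */
static struct reg negate_by_toggle(struct reg r)
{
    r.negation ^= 1;
    return r;
}
```

Both functions agree when the input bit is 0; they differ only for an operand that already carries a negation.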


Re: [Beignet] [PATCH] backend: improve add zero pattern

2017-07-03 Thread Yang, Rong R
After removing the negation check, the function name doNegAddOptimization is
no longer suitable.
Can you also change the function name and the comment?

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Friday, June 23, 2017 15:37
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: improve add zero pattern
> 
>   Remove the negation check for adding zero.
>   This optimization can also be applied when there is no negation.
> 
> Signed-off-by: rander.wang 
> ---
>  backend/src/backend/gen_insn_selection_optimize.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> b/backend/src/backend/gen_insn_selection_optimize.cpp
> index 1020b7f..3b76817 100644
> --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> @@ -307,8 +307,8 @@ namespace gbe
>  if (insn.opcode == SEL_OP_ADD) {
>GenRegister src0 = insn.src(0);
>GenRegister src1 = insn.src(1);
> -  if ((src0.negation && src1.file == GEN_IMMEDIATE_VALUE &&
> src1.value.f == 0.0f) ||
> -  (src1.negation && src0.file == GEN_IMMEDIATE_VALUE && src0.value.f
> == 0.0f))
> +  if ((src1.file == GEN_IMMEDIATE_VALUE && src1.value.f == 0.0f) ||
> +  (src0.file == GEN_IMMEDIATE_VALUE && src0.value.f == 0.0f))
>  addToReplaceInfoMap(insn);
>  }
>}
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] GBE: clean llvm module's clone and release.

2017-06-22 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Pan, Xiuli
> Sent: Thursday, June 22, 2017 14:54
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [PATCH] GBE: clean llvm module's clone and release.
> 
> LGTM.
> 
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Yang Rong
> Sent: Thursday, June 22, 2017 14:04
> To: beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: [Beignet] [PATCH] GBE: clean llvm module's clone and release.
> 
> There are some changes:
> 1. Clone the module before calling LLVMLinkModules2, and remove the other
> clones of it.
> 2. Don't delete the module in the function llvmToGen.
> 3. Add a function programNewFromLLVMFile so that genProgramNewFromLLVM
> and buildFromLLVMModule only handle an llvm module. Actually,
> programNewFromLLVMFile is only used by clCreateProgramWithLLVMIntel,
> and I think it is useless; maybe we could delete it altogether.
> 
> Signed-off-by: Yang Rong <rong.r.y...@intel.com>
> ---
>  backend/src/backend/gen_program.cpp|  5 +-
>  backend/src/backend/program.cpp| 84 +--
> ---
>  backend/src/backend/program.h  | 10 +++-
>  backend/src/backend/program.hpp|  4 +-
>  backend/src/llvm/llvm_bitcode_link.cpp |  3 +-
>  backend/src/llvm/llvm_to_gen.cpp   | 19 +---
>  backend/src/llvm/llvm_to_gen.hpp   |  2 +-
>  src/cl_gbe_loader.cpp  |  5 ++
>  src/cl_gbe_loader.h|  1 +
>  src/cl_program.c   |  2 +-
>  10 files changed, 77 insertions(+), 58 deletions(-)
> 
> diff --git a/backend/src/backend/gen_program.cpp
> b/backend/src/backend/gen_program.cpp
> index cfb23fe..bb1d22f 100644
> --- a/backend/src/backend/gen_program.cpp
> +++ b/backend/src/backend/gen_program.cpp
> @@ -455,7 +455,6 @@ namespace gbe {
>}
> 
>static gbe_program genProgramNewFromLLVM(uint32_t deviceID,
> -   const char *fileName,
> const void* module,
> const void* llvm_ctx,
> const char* asm_file_name, @@ 
> -475,7 +474,7 @@
> namespace gbe {  #ifdef GBE_COMPILER_AVAILABLE
>  std::string error;
>  // Try to compile the program
> -if (program->buildFromLLVMFile(fileName, module, error, optLevel) ==
> false) {
> +if (program->buildFromLLVMModule(module, error, optLevel) == false)
> + {
>if (err != NULL && errSize != NULL && stringSize > 0u) {
>  const size_t msgSize = std::min(error.size(), stringSize-1u);
>  std::memcpy(err, error.c_str(), msgSize); @@ -598,7 +597,7 @@
> namespace gbe {
>  acquireLLVMContextLock();
>  llvm::Module* module = (llvm::Module*)p->module;
> 
> -if (p->buildFromLLVMFile(NULL, module, error, optLevel) == false) {
> +if (p->buildFromLLVMModule(module, error, optLevel) == false) {
>if (err != NULL && errSize != NULL && stringSize > 0u) {
>  const size_t msgSize = std::min(error.size(), stringSize-1u);
>  std::memcpy(err, error.c_str(), msgSize); diff --git
> a/backend/src/backend/program.cpp b/backend/src/backend/program.cpp
> index 724058c..740c5c2 100644
> --- a/backend/src/backend/program.cpp
> +++ b/backend/src/backend/program.cpp
> @@ -40,6 +40,7 @@
>  #include "llvm/Support/ManagedStatic.h"
>  #include "llvm/Transforms/Utils/Cloning.h"
>  #include "llvm/IR/LLVMContext.h"
> +#include "llvm/IRReader/IRReader.h"
>  #endif
> 
>  #include 
> @@ -113,32 +114,17 @@ namespace gbe {
>IVAR(OCL_PROFILING_LOG, 0, 0, 1); // Int for different profiling types.
>BVAR(OCL_OUTPUT_BUILD_LOG, false);
> 
> -  bool Program::buildFromLLVMFile(const char *fileName,
> - const void* module,
> - std::string ,
> - int optLevel) {
> +  bool Program::buildFromLLVMModule(const void* module,
> +  std::string ,
> +  int optLevel) {
>  ir::Unit *unit = new ir::Unit();
> -llvm::Module * cloned_module = NULL;
>  bool ret = false;
> -if(module){
> -#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 38
> -  cloned_module = llvm::CloneModule((llvm::Module*)module).release();
> -#else
> -  cloned_module = llvm::Cl

Re: [Beignet] [PATCH V4] backend: add global immediate optimization

2017-06-22 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Thursday, June 22, 2017 14:30
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH V4] backend: add global immediate
> optimization
> 
> LGTM
> 
> Ruiling
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander.wang
> > Sent: Wednesday, June 14, 2017 1:56 PM
> > To: beig...@freedesktop.org
> > Cc: Wang, Rander 
> > Subject: [Beignet] [PATCH V4] backend: add global immediate
> > optimization
> >
> > There are some global immediates in the global variable list of LLVM.
> > These immediates can be folded into the instructions that use them. For
> > the compiler_global_immediate_optimized test in utest, there are two
> > global immediates:
> > L0:
> > MOV(1)  %42<0>:UD   :   0x0:UD
> > MOV(1)  %43<0>:UD   :   0x30:UD
> >
> > used by:
> > ADD(16) %49<1>:D:   %42<0,1,0>:D
> > %48<8,8,1>:D
> > ADD(16) %54<1>:D:   %43<0,1,0>:D
> > %53<8,8,1>:D
> >
> > it can be
> > ADD(16) %49<1>:D:   %48<8,8,1>:D   0x0:UD
> > ADD(16) %54<1>:D:   %53<8,8,1>:D   0x30:UD
> >
> > Then the MOV can be removed. After this optimization, ADD 0 can be
> > changed to MOV, and then local copy propagation can be done.
> >
> > V2: (1) add environment variable to enable/disable the optimization
> > (2) refine the architecture of imm optimization, inherit from global
> > optimizer not local block optimizer
> >
> > V3: merge with latest master driver
> >
> > V4: (1) fix some type errors
> > (2) remove the unneeded UD/D check
> > (3) refine the immediate calculation for UD/D
> >
> > Signed-off-by: rander.wang 
> > ---
> >  .../src/backend/gen_insn_selection_optimize.cpp| 367
> > +++--
> >  1 file changed, 342 insertions(+), 25 deletions(-)
> >
> > diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> > b/backend/src/backend/gen_insn_selection_optimize.cpp
> > index 07547ec..eb93a20 100644
> > --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> > +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> > @@ -40,6 +40,33 @@ namespace gbe
> >  return elements;
> >}
> >
> > +  class ReplaceInfo
> > +  {
> > +  public:
> > +ReplaceInfo(SelectionInstruction ,
> > +const GenRegister ,
> > +const GenRegister ) : insn(insn),
> > + intermedia(intermedia),
> > replacement(replacement)
> > +{
> > +  assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD);
> > +  assert(&(insn.dst(0)) == );
> > +  this->elements = CalculateElements(intermedia, insn.state.execWidth);
> > +  replacementOverwritten = false;
> > +}
> > +~ReplaceInfo()
> > +{
> > +  this->toBeReplaceds.clear();
> > +}
> > +
> > +SelectionInstruction 
> > +const GenRegister 
> > +uint32_t elements;
> > +const GenRegister 
> > +set toBeReplaceds;
> > +set toBeReplacedInsns;
> > +bool replacementOverwritten;
> > +GBE_CLASS(ReplaceInfo);
> > +  };
> > +
> >class SelOptimizer
> >{
> >public:
> > @@ -66,32 +93,7 @@ namespace gbe
> >
> >private:
> >  // local copy propagation
> > -class ReplaceInfo
> > -{
> > -public:
> > -  ReplaceInfo(SelectionInstruction& insn,
> > -  const GenRegister& intermedia,
> > -  const GenRegister& replacement) :
> > -  insn(insn), intermedia(intermedia), 
> > replacement(replacement)
> > -  {
> > -assert(insn.opcode == SEL_OP_MOV || insn.opcode == SEL_OP_ADD);
> > -assert(&(insn.dst(0)) == );
> > -this->elements = CalculateElements(intermedia,
> insn.state.execWidth);
> > -replacementOverwritten = false;
> > -  }
> > -  ~ReplaceInfo()
> > -  {
> > -this->toBeReplaceds.clear();
> > -  }
> >
> > -  SelectionInstruction& insn;
> > -  const GenRegister& intermedia;
> > -  uint32_t elements;
> > -  const GenRegister& replacement;
> > -  set toBeReplaceds;
> > -  bool replacementOverwritten;
> > -  GBE_CLASS(ReplaceInfo);
> > -};
> >  typedef map ReplaceInfoMap;
> >  ReplaceInfoMap replaceInfoMap;
> >  void doLocalCopyPropagation();
> > @@ -298,13 +300,328 @@ namespace gbe
> >  virtual void run();
> >};
> >
> > +  class SelGlobalImmMovOpt : public SelGlobalOptimizer  {
> > +  public:
> > +SelGlobalImmMovOpt(const GenContext& ctx, uint32_t 

Re: [Beignet] [PATCH V2] backend: refine load/store merging algorithm

2017-06-22 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Thursday, June 22, 2017 14:29
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH V2] backend: refine load/store merging
> algorithm
> 
> LGTM
> 
> Ruiling
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander.wang
> > Sent: Friday, June 16, 2017 9:50 AM
> > To: beig...@freedesktop.org
> > Cc: Wang, Rander 
> > Subject: [Beignet] [PATCH V2] backend: refine load/store merging
> > algorithm
> >
> > The merge currently works for the sequence load(0), load(1), load(2),
> > but not for load(2), load(0), load(1), because it compares only the
> > last merged load with the new one, not all of the loads.
> >
> > For the sequence load(0), load(1), load(2): load(0) is the start,
> > and load(1) is found to be its successor with no gap, so it is put
> > into a merge FIFO. The start then moves to the top of the FIFO,
> > load(1), and is compared with load(2), so load(2) can be merged too.
> >
> > For load(2), load(0), load(1): load(2) can't be merged with load(0)
> > because there is a gap between them, so load(0) is skipped and the
> > search moves on to load(1). This load(1) can be merged, but the
> > search never goes back to merge load(0).
> >
> > Now change the algorithm:
> > (1) Find all loads around the start that may be merged, by their
> > distance to the start. The distance depends on the data type; for
> > 32-bit data the distance is 4. Put them in a list.
> >
> > (2) Sort the list by the distance from the start.
> >
> > (3) Search for the contiguous sequence that includes the start, and
> > merge it.
> >
> > V2: (1) refine the sort and compare algorithm: first find all the IO
> >     at a small offset from the start, then call std::sort.
> > (2) check the number of candidate IOs for the sake of performance,
> >     since in most cases there is no chance to merge IO.
> >
> > Signed-off-by: rander.wang 
> > ---
> >  backend/src/llvm/llvm_loadstore_optimization.cpp | 87
> > +---
> >  1 file changed, 78 insertions(+), 9 deletions(-)
> >
> > diff --git a/backend/src/llvm/llvm_loadstore_optimization.cpp
> > b/backend/src/llvm/llvm_loadstore_optimization.cpp
> > index 5aa38be..c91c1a0 100644
> > --- a/backend/src/llvm/llvm_loadstore_optimization.cpp
> > +++ b/backend/src/llvm/llvm_loadstore_optimization.cpp
> > @@ -67,7 +67,7 @@ namespace gbe {
> >  bool isSimpleLoadStore(Value *I);
> >  bool optimizeLoadStore(BasicBlock );
> >
> > -bool isLoadStoreCompatible(Value *A, Value *B);
> > +bool isLoadStoreCompatible(Value *A, Value *B, int *dist, int*
> elementSize,
> > int maxVecSize);
> >  void mergeLoad(BasicBlock , SmallVector
> );
> >  void mergeStore(BasicBlock , SmallVector
> );
> >  bool findConsecutiveAccess(BasicBlock ,
> > @@ -109,7 +109,7 @@ namespace gbe {
> >  return NULL;
> >}
> >
> > -  bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A,
> > Value *B) {
> > +  bool GenLoadStoreOptimization::isLoadStoreCompatible(Value *A,
> > + Value *B,
> > int *dist, int* elementSize, int maxVecSize) {
> >  Value *ptrA = getPointerOperand(A);
> >  Value *ptrB = getPointerOperand(B);
> >  unsigned ASA = getAddressSpace(A); @@ -136,7 +136,11 @@
> namespace
> > gbe {
> >  // The Instructions are connsecutive if the size of the first 
> > load/store is
> >  // the same as the offset.
> >  int64_t sz = TD->getTypeStoreSize(Ty);
> > -return ((-offset) == sz);
> > +*dist = -offset;
> > +*elementSize = sz;
> > +
> > +//a insn with small distance from the search load/store is a candidate
> one
> > +return (abs(-offset) < sz*maxVecSize);
> >}
> >
> >void GenLoadStoreOptimization::mergeLoad(BasicBlock ,
> > SmallVector ) { @@ -163,6 +167,25 @@
> > namespace gbe {
> >values[i]->replaceAllUsesWith(S);
> >  }
> >}
> > +
> > +  class mergedInfo{
> > +public:
> > +Instruction* mInsn;
> > +int mOffset;
> > +
> > +void init(Instruction* insn, int offset)
> > +{
> > +  mInsn = insn;
> > +  mOffset = offset;
> > +}
> > +  };
> > +
> > +  struct offsetSorter {
> > +bool operator()(mergedInfo* m0, mergedInfo* m1) const {
> > +return m0->mOffset < m1->mOffset;
> > +}
> > +  };
> > +
> >// When searching for consecutive memory access, we do it in a small
> window,
> >// if the window is too large, it would take up too much compiling time.
> >// An Important rule we have followed is don't try to change load/store
> order.
> > @@ -177,7 +200,6 @@ namespace gbe {
> >
> >   
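
Steps (2) and (3) of the algorithm described in the commit message (sort the candidates by distance, then look for the contiguous sequence that contains the start) can be sketched as follows. This is a minimal illustration in plain C, not the LLVM pass: struct candidate is a hypothetical mirror of the patch's mergedInfo, offsets are byte distances relative to the start access (so the start itself has offset 0), and qsort stands in for std::sort.

```c
#include <stdlib.h>

struct candidate {
    int id;      /* which load/store this is (illustrative only) */
    int offset;  /* byte distance from the start access; start = 0 */
};

static int by_offset(const void *a, const void *b)
{
    const struct candidate *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sorts c[0..n) by offset, then finds the run of entries spaced exactly
 * elem_size bytes apart that contains the entry with offset 0.
 * Stores the index of the first entry of the run in *first and returns
 * the run length, or 0 if no entry has offset 0. */
static int find_mergeable_run(struct candidate *c, int n, int elem_size,
                              int *first)
{
    int i, start = -1, lo, hi;

    qsort(c, (size_t)n, sizeof *c, by_offset);
    for (i = 0; i < n; i++) {
        if (c[i].offset == 0) {
            start = i;
            break;
        }
    }
    if (start < 0)
        return 0;

    lo = hi = start;
    while (lo > 0 && c[lo].offset - c[lo - 1].offset == elem_size)
        lo--;
    while (hi + 1 < n && c[hi + 1].offset - c[hi].offset == elem_size)
        hi++;

    *first = lo;
    return hi - lo + 1;
}
```

With the problem sequence load(2), load(0), load(1) and 32-bit elements, the offsets relative to load(2) are 0, -8 and -4; after sorting, all three form one run, so the order in which the loads appeared no longer matters.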

Re: [Beignet] [PATCH v2] Add missed kernel names into built-in kernel list.

2017-06-22 Thread Yang, Rong R
"__cl_cpy_region_unalign_same_offset;" should be renamed to
"__cl_copy_region_unalign_same_offset;", and "__cl_copy_image_3d_to_2d;"
is duplicated.
I have fixed both and pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, June 22, 2017 13:52
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH v2] Add missed kernel names into built-in kernel
> list.
> 
> From: Yan Wang 
> 
> Signed-off-by: Yan Wang 
> ---
>  src/cl_gt_device.h | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/src/cl_gt_device.h b/src/cl_gt_device.h index f6cb5f8..ff23b32
> 100644
> --- a/src/cl_gt_device.h
> +++ b/src/cl_gt_device.h
> @@ -115,16 +115,33 @@ DECL_INFO_STRING(built_in_kernels,
> "__cl_copy_region_align4;"
> "__cl_cpy_region_unalign_same_offset;"
> "__cl_copy_region_unalign_dst_offset;"
> "__cl_copy_region_unalign_src_offset;"
> +   "__cl_copy_region_unalign_same_offset;"
> "__cl_copy_buffer_rect;"
> +   "__cl_copy_buffer_rect_align4;"
> "__cl_copy_image_1d_to_1d;"
> "__cl_copy_image_2d_to_2d;"
> "__cl_copy_image_3d_to_2d;"
> "__cl_copy_image_2d_to_3d;"
> "__cl_copy_image_3d_to_3d;"
> +   "__cl_copy_image_3d_to_2d;"
> "__cl_copy_image_2d_to_buffer;"
> +   "__cl_copy_image_2d_to_buffer_align4;"
> +   "__cl_copy_image_2d_to_buffer_align16;"
> "__cl_copy_image_3d_to_buffer;"
> +   "__cl_copy_image_3d_to_buffer_align4;"
> +   "__cl_copy_image_3d_to_buffer_align16;"
> "__cl_copy_buffer_to_image_2d;"
> +   "__cl_copy_buffer_to_image_2d_align4;"
> +   "__cl_copy_buffer_to_image_2d_align16;"
> "__cl_copy_buffer_to_image_3d;"
> +   "__cl_copy_buffer_to_image_3d_align4;"
> +   "__cl_copy_buffer_to_image_3d_align16;"
> +   "__cl_copy_image_1d_array_to_1d_array;"
> +   "__cl_copy_image_2d_array_to_2d_array;"
> +   "__cl_copy_image_2d_array_to_2d;"
> +   "__cl_copy_image_2d_array_to_3d;"
> +   "__cl_copy_image_2d_to_2d_array;"
> +   "__cl_copy_image_3d_to_2d_array;"
> "__cl_fill_region_unalign;"
> "__cl_fill_region_align2;"
> "__cl_fill_region_align4;"
> --
> 2.7.4
> 
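
A duplicated entry like the one caught in this review is easy to miss by eye in a long semicolon-terminated string. The following is a hypothetical checker in plain C, not part of Beignet, that assumes the list format used by built_in_kernels above: each name is followed by a ';'.

```c
#include <string.h>

/* Counts the entries of a ';'-terminated name list that repeat an
 * earlier entry. An unterminated tail is ignored. */
static int count_duplicate_names(const char *list)
{
    int dups = 0;
    const char *a = list;

    while (*a) {
        const char *a_end = strchr(a, ';');
        const char *b;
        size_t a_len;

        if (!a_end)
            break;
        a_len = (size_t)(a_end - a);

        /* Compare against every entry that starts before this one;
         * each of those is guaranteed to be ';'-terminated. */
        for (b = list; b < a; ) {
            const char *b_end = strchr(b, ';');
            if ((size_t)(b_end - b) == a_len && memcmp(a, b, a_len) == 0) {
                dups++;
                break;
            }
            b = b_end + 1;
        }
        a = a_end + 1;
    }
    return dups;
}
```

The quadratic scan is adequate for a list of a few dozen kernel names.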


Re: [Beignet] [PATCH] Backend: Fix double free of the cloned_module

2017-06-21 Thread Yang, Rong R
Because llvmToGen accepts the filename argument, it needs to create and
delete the module itself.
I think the module should not be deleted in llvmToGen; the caller should
decide whether to delete it.
I will send another patch to refine it.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Thursday, June 15, 2017 16:46
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH] Backend: Fix double free of the cloned_module
> 
> From: Pan Xiuli 
> 
> The llvmToGen function deletes the module, so we only need to
> delete the cloned_module when the first llvmToGen succeeds.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/backend/program.cpp | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/backend/src/backend/program.cpp
> b/backend/src/backend/program.cpp index 724058c..8fb33c4 100644
> --- a/backend/src/backend/program.cpp
> +++ b/backend/src/backend/program.cpp
> @@ -154,7 +154,12 @@ namespace gbe {
>  //suppose file exists and llvmToGen will not return false.
>  llvmToGen(*unit, fileName, module, 0, strictMath,
> OCL_PROFILING_LOG, error);
>}
> +} else {
> +  if(cloned_module){
> +delete (llvm::Module*) cloned_module;
> +  }
>  }
> +
>  if(unit->getValid()){
>std::string error2;
>if (this->buildFromUnit(*unit, error2)){ @@ -163,9 +168,6 @@
> namespace gbe {
>error = error + error2;
>  }
>  delete unit;
> -if(cloned_module){
> -  delete (llvm::Module*) cloned_module;
> -}
>  return ret;
>}
> 
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] Runtime: Add missing SKL deivce ID

2017-06-21 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Wednesday, June 14, 2017 11:41
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH] Runtime: Add missing SKL deivce ID
> 
> From: Pan Xiuli 
> 
> It seems we missed some newly added device ID for SKL.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  src/cl_device_data.h | 6 +-
>  src/cl_device_id.c   | 4 
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/cl_device_data.h b/src/cl_device_data.h index
> c3d6c45..123b619 100644
> --- a/src/cl_device_data.h
> +++ b/src/cl_device_data.h
> @@ -247,7 +247,9 @@
>  /* SKL */
>  #define PCI_CHIP_SKYLAKE_ULT_GT1 0x1906   /* Intel(R) Skylake ULT - GT1
> */
>  #define PCI_CHIP_SKYLAKE_ULT_GT2 0x1916   /* Intel(R) Skylake ULT - GT2
> */
> -#define PCI_CHIP_SKYLAKE_ULT_GT3 0x1926   /* Intel(R) Skylake ULT - GT3
> */
> +#define PCI_CHIP_SKYLAKE_ULT_GT3 0x1923   /* Intel(R) Skylake
> ULT - GT3 */
> +#define PCI_CHIP_SKYLAKE_ULT_GT3E1   0x1926   /* Intel(R) Skylake
> ULT - GT3E */
> +#define PCI_CHIP_SKYLAKE_ULT_GT3E2   0x1927   /* Intel(R) Skylake
> ULT - GT3E */
>  #define PCI_CHIP_SKYLAKE_ULT_GT2F0x1921   /* Intel(R) Skylake
> ULT - GT2F */
>  #define PCI_CHIP_SKYLAKE_ULX_GT1 0x190E   /* Intel(R) Skylake ULX - GT1
> */
>  #define PCI_CHIP_SKYLAKE_ULX_GT2 0x191E   /* Intel(R) Skylake ULX - GT2
> */
> @@ -284,6 +286,8 @@
> 
>  #define IS_SKL_GT3(devid)   \
>(devid == PCI_CHIP_SKYLAKE_ULT_GT3 ||   \
> +   devid == PCI_CHIP_SKYLAKE_ULT_GT3E1 ||   \
> +   devid == PCI_CHIP_SKYLAKE_ULT_GT3E2 ||   \
> devid == PCI_CHIP_SKYLAKE_HALO_GT3 || \
> devid == PCI_CHIP_SKYLAKE_SRV_GT3 || \
> devid == PCI_CHIP_SKYLAKE_MEDIA_SRV_GT3) diff --git
> a/src/cl_device_id.c b/src/cl_device_id.c index 76549a4..b9a60bb 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -605,6 +605,10 @@ skl_gt2_break:
> 
>  case PCI_CHIP_SKYLAKE_ULT_GT3:
>DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R)
> HD Graphics Skylake ULT GT3");
> +case PCI_CHIP_SKYLAKE_ULT_GT3E1:
> +  DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R)
> HD Graphics Skylake ULT GT3E");
> +case PCI_CHIP_SKYLAKE_ULT_GT3E2:
> +  DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name,
> + "Intel(R) HD Graphics Skylake ULT GT3E");
>  case PCI_CHIP_SKYLAKE_HALO_GT3:
>DECL_INFO_STRING(skl_gt3_break, intel_skl_gt3_device, name, "Intel(R)
> HD Graphics Skylake Halo GT3");
>  case PCI_CHIP_SKYLAKE_SRV_GT3:
> --
> 2.7.4
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Add aligned copy kernels into built-in kernel list.

2017-06-21 Thread Yang, Rong R
Some kernels are still missing; can you add them in the same patch?

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Wednesday, June 21, 2017 11:26
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH] Add aligned copy kernels into built-in kernel list.
> 
> From: Yan Wang 
> 
> Signed-off-by: Yan Wang 
> ---
>  src/cl_gt_device.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/cl_gt_device.h b/src/cl_gt_device.h index f6cb5f8..8008606
> 100644
> --- a/src/cl_gt_device.h
> +++ b/src/cl_gt_device.h
> @@ -122,9 +122,17 @@ DECL_INFO_STRING(built_in_kernels,
> "__cl_copy_region_align4;"
> "__cl_copy_image_2d_to_3d;"
> "__cl_copy_image_3d_to_3d;"
> "__cl_copy_image_2d_to_buffer;"
> +   "__cl_copy_image_2d_to_buffer_align4;"
> +   "__cl_copy_image_2d_to_buffer_align16;"
> "__cl_copy_image_3d_to_buffer;"
> +   "__cl_copy_image_3d_to_buffer_align4;"
> +   "__cl_copy_image_3d_to_buffer_align16;"
> "__cl_copy_buffer_to_image_2d;"
> +   "__cl_copy_buffer_to_image_2d_align4;"
> +   "__cl_copy_buffer_to_image_2d_align16;"
> "__cl_copy_buffer_to_image_3d;"
> +   "__cl_copy_buffer_to_image_3d_align4;"
> +   "__cl_copy_buffer_to_image_3d_align16;"
> "__cl_fill_region_unalign;"
> "__cl_fill_region_align2;"
> "__cl_fill_region_align4;"
> --
> 2.7.4
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Fix context leak with internal kernels

2017-06-16 Thread Yang, Rong R
I rebased it onto master and also incremented internal_ctx_refs when
ctx->image_queue is not NULL.
The patch LGTM, pushed, thanks.
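
A minimal sketch of the reference accounting in this patch, assuming a
simplified model (`Context`, `createInternalProgram`, `retainContext` and
`releaseContext` are hypothetical names, and the decrement is assumed to
return the previous count, as the patch's comparison suggests):

```cpp
#include <cstddef>

// Hypothetical, simplified context: every internal program holds one
// reference back to the context that created it.
struct Context {
  int refCount = 1;              // starts with the user's reference
  bool internalProgram[4] = {};  // which internal programs exist
  bool destroyed = false;
};

void retainContext(Context &ctx) { ++ctx.refCount; }

// Creating an internal program takes a reference on the context.
void createInternalProgram(Context &ctx, std::size_t i) {
  ctx.internalProgram[i] = true;
  ++ctx.refCount;
}

// One for the reference being released now, plus one per internal program.
int internalRefs(const Context &ctx) {
  int refs = 1;
  for (bool present : ctx.internalProgram)
    if (present) ++refs;
  return refs;
}

void releaseContext(Context &ctx) {
  const int threshold = internalRefs(ctx);
  const int previous = ctx.refCount--;  // value before the decrement
  if (previous > threshold)
    return;                             // an external user still holds a ref
  // Only internal references remain: drop them and destroy the context.
  for (bool &present : ctx.internalProgram) {
    if (present) {
      present = false;
      --ctx.refCount;
    }
  }
  ctx.destroyed = true;
}
```

Comparing against a fixed 1 instead of `threshold` would never reach the
teardown branch once an internal program exists, which is the leak the patch
describes.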

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Patrick Beaulieu
> Sent: Friday, June 16, 2017 7:15
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Fix context leak with internal kernels
> 
> Account for internal program ctx references in cl_context_delete
> 
> Signed-off-by: Patrick Beaulieu 
> ---
>  src/cl_context.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/src/cl_context.c b/src/cl_context.c index 1ba23024..89362365
> 100644
> --- a/src/cl_context.c
> +++ b/src/cl_context.c
> @@ -358,10 +358,25 @@ cl_context_delete(cl_context ctx)
>if (UNLIKELY(ctx == NULL))
>  return;
> 
> +  int internal_ctx_refs = 1;
> +  // determine how many ctx refs are held by internal_prgs and
> + built_in_prgs  for (i = CL_INTERNAL_KERNEL_MIN; i <
> CL_INTERNAL_KERNEL_MAX; i++) {
> +if (ctx->internal_kernels[i] && ctx->internal_prgs[i])
> +  ++internal_ctx_refs;
> +  }
> +
> +  if (ctx->built_in_prgs)
> + ++internal_ctx_refs;
> +
>/* We are not done yet */
> -  if (CL_OBJECT_DEC_REF(ctx) > 1)
> +  if (CL_OBJECT_DEC_REF(ctx) > internal_ctx_refs)
>  return;
> 
> +  // create a temporary extra ref here so cl_program_delete doesn't  //
> + attempt a recursive full cl_context_delete when cleaning up  // our
> + internal programs  CL_OBJECT_INC_REF(ctx);
> +
>/* delete the internal programs. */
>for (i = CL_INTERNAL_KERNEL_MIN; i < CL_INTERNAL_KERNEL_MAX; i++) {
>  if (ctx->internal_kernels[i]) {
> @@ -382,6 +397,8 @@ cl_context_delete(cl_context ctx)
>cl_program_delete(ctx->built_in_prgs);
>ctx->built_in_prgs = NULL;
> 
> +  CL_OBJECT_DEC_REF(ctx);
> +
>cl_free(ctx->prop_user);
>cl_free(ctx->devices);
>cl_driver_delete(ctx->drv);
> --
> 2.11.0
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH V2] backend: refine the local copy propagation.

2017-06-16 Thread Yang, Rong R
Fixed the build error at "case SEL_OP_RHADD" manually; pushed, thanks.


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Thursday, June 15, 2017 9:48
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH V2] backend: refine the local copy propagation.
> 
>   src modifier is not supported by some instructions.
>   so return false when it exists. This fix piglit %
>   scalar-arithmetic-int failed
> 
>   V2: (1)add hadd rhadd
>   (2)confirmed math functions support midifer except IDIV/Mod
> 
> Signed-off-by: rander.wang 
> ---
>  .../src/backend/gen_insn_selection_optimize.cpp| 34
> ++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> b/backend/src/backend/gen_insn_selection_optimize.cpp
> index 07547ec..c35ee25 100644
> --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> @@ -189,6 +189,40 @@ namespace gbe
>  if (insn.opcode == SEL_OP_BSWAP) //should remove once bswap issue is
> fixed
>return false;
> 
> +//the src modifier is not supported by the following instructions
> +if(info->replacement.negation || info->replacement.absolute)
> +{
> +  switch(insn.opcode)
> +  {
> +case SEL_OP_MATH:
> +{
> +  switch(insn.extra.function)
> +  {
> +case GEN_MATH_FUNCTION_INT_DIV_QUOTIENT:
> +case GEN_MATH_FUNCTION_INT_DIV_REMAINDER:
> +case
> GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER:
> +  return false;
> +default:
> +  break;
> +  }
> +
> +  break;
> +}
> +case SEL_OP_CBIT:
> +case SEL_OP_FBH:
> +case SEL_OP_FBL:
> +case SEL_OP_BRC:
> +case SEL_OP_BRD:
> +case SEL_OP_BFREV:
> +case SEL_OP_LZD:
> +case SEL_OP_HADD:
> +case SEL_OP_RHADD
> +  return false;
> +default:
> +  break;
> +  }
> +}
> +
>  if (insn.isWrite() || insn.isRead()) //register in selection vector
>return false;
> 
> --
> 2.7.4
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH V2 2/3] Runtime: Add new API enums for cl_intel_required_subgroup_size extension

2017-06-15 Thread Yang, Rong R
The patchset LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Thursday, June 15, 2017 16:45
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH V2 2/3] Runtime: Add new API enums for
> cl_intel_required_subgroup_size extension
> 
> From: Pan Xiuli 
> 
> Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add
> CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add
> CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for
> clGetKernelSubGroupInfo.
> We only have this extension for LLVM 40+ for frontend support.
> V2: Add opencl-c define
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/libocl/include/ocl.h |  4 
>  include/CL/cl_intel.h|  6 ++
>  src/cl_device_id.c   | 27 +++
>  src/cl_device_id.h   |  2 ++
>  src/cl_extensions.c  |  8 
>  src/cl_extensions.h  |  1 +
>  src/cl_gt_device.h   |  2 ++
>  7 files changed, 50 insertions(+)
> 
> diff --git a/backend/src/libocl/include/ocl.h
> b/backend/src/libocl/include/ocl.h
> index dded574..5819f8c 100644
> --- a/backend/src/libocl/include/ocl.h
> +++ b/backend/src/libocl/include/ocl.h
> @@ -126,6 +126,10 @@
>  #define cl_intel_planar_yuv
>  #define cl_intel_media_block_io
> 
> +#if __clang_major__*10 + __clang_minor__ > 40 #define
> +cl_intel_required_subgroup_size #endif
> +
>  #pragma OPENCL EXTENSION cl_khr_fp64 : disable  #pragma OPENCL
> EXTENSION cl_khr_fp16 : disable  #endif diff --git a/include/CL/cl_intel.h
> b/include/CL/cl_intel.h index 47bae46..3cb8515 100644
> --- a/include/CL/cl_intel.h
> +++ b/include/CL/cl_intel.h
> @@ -197,6 +197,12 @@ typedef CL_API_ENTRY cl_int
> void* /*param_value*/,
> size_t*
> /*param_value_size_ret*/ );  #endif
> +
> +/* cl_intel_required_subgroup_size extension*/
> +#define CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108
> +#define CL_KERNEL_SPILL_MEM_SIZE_INTEL  0x4109
> +#define CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL  0x410A
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c index 6cba2b5..76549a4
> 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -1377,6 +1377,10 @@ cl_get_device_info(cl_device_id device,
>src_ptr = device->driver_version;
>src_size = device->driver_version_sz;
>break;
> +case CL_DEVICE_SUB_GROUP_SIZES_INTEL:
> +  src_ptr = device->sub_group_sizes;
> +  src_size = device->sub_group_sizes_sz;
> +  break;
> 
>  default:
>return CL_INVALID_VALUE;
> @@ -1520,6 +1524,7 @@ cl_get_kernel_workgroup_info(cl_kernel kernel,
>  DECL_FIELD(COMPILE_WORK_GROUP_SIZE, kernel->compile_wg_sz)
>  DECL_FIELD(PRIVATE_MEM_SIZE, kernel->stack_size)
>  case CL_KERNEL_GLOBAL_WORK_SIZE:
> +{
>dimension = cl_check_builtin_kernel_dimension(kernel, device);
>if ( !dimension ) return CL_INVALID_VALUE;
>if (param_value_size_ret != NULL) @@ -1537,6 +1542,18 @@
> cl_get_kernel_workgroup_info(cl_kernel kernel,
>  return CL_SUCCESS;
>}
>return CL_SUCCESS;
> +}
> +case CL_KERNEL_SPILL_MEM_SIZE_INTEL:
> +{
> +  if (param_value && param_value_size < sizeof(cl_ulong))
> +return CL_INVALID_VALUE;
> +  if (param_value_size_ret != NULL)
> +*param_value_size_ret = sizeof(cl_ulong);
> +  if (param_value)
> +*(cl_ulong*)param_value =
> (cl_ulong)interp_kernel_get_scratch_size(kernel->opaque);
> +  return CL_SUCCESS;
> +}
> +
>  default:
>return CL_INVALID_VALUE;
>};
> @@ -1620,6 +1637,16 @@ cl_get_kernel_subgroup_info(cl_kernel kernel,
>}
>break;
>  }
> +case CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL:
> +{
> +  if (param_value && param_value_size < sizeof(size_t))
> +return CL_INVALID_VALUE;
> +  if (param_value_size_ret != NULL)
> +*param_value_size_ret = sizeof(size_t);
> +  if (param_value)
> +*(size_t*)param_value = interp_kernel_get_simd_width(kernel-
> >opaque);
> +  return CL_SUCCESS;
> +}
>  default:
>return CL_INVALID_VALUE;
>};
> diff --git a/src/cl_device_id.h b/src/cl_device_id.h index 6b8f2eb..93bd2f1
> 100644
> --- a/src/cl_device_id.h
> +++ b/src/cl_device_id.h
> @@ -136,6 +136,8 @@ struct _cl_device_id {
>uint32_t atomic_test_result;
>cl_uint image_pitch_alignment;
>cl_uint image_base_address_alignment;
> +  size_t sub_group_sizes[2];
> +  size_t sub_group_sizes_sz;
> 
>//inited as NULL, created only when cmrt kernel is used
>void* cmrt_device;  //realtype: CmDevice* diff --git a/src/cl_extensions.c
> b/src/cl_extensions.c index 

Re: [Beignet] [PATCH] do constant folding for kernel struct args

2017-06-15 Thread Yang, Rong R
OK, it only handles a very special case, and must be called before
postPhiCopyOptimization.
I will push it, thanks.

> -Original Message-
> From: Guo, Yejun
> Sent: Tuesday, June 13, 2017 20:51
> To: Yang, Rong R <rong.r.y...@intel.com>; Wang, Rander
> <rander.w...@intel.com>; Pan, Xiuli <xiuli@intel.com>;
> beignet@lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> In current implementation, only loadi and add are considered.
> 
> In the example, since %22 is dst of MOV, it will not be recorded. It is
> recorded and so impacts the IR only if %22 is dst of ADD.
> 
> -Original Message-
> From: Yang, Rong R
> Sent: Tuesday, June 13, 2017 4:59 PM
> To: Guo, Yejun; Wang, Rander; Pan, Xiuli; beignet@lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> foldFunctionStructArgConstOffset is called before the
> lowerFunctionArguments.
> If foldFunctionStructArgConstOffset is wrong, the INDIRECT_MOV generated
> in lowerFunctionArguments also wrong.
> 
> I afraid the following ir:
> 
> BB2:
> LOADI %30, 4
> Add %20, %10, %30//%10 is a struct argument
> MOV %22, %20   //phi-mov
> 
> BB3:
> LOADI %31, 8
> Add %21, %11, %31//%11 is another struct argument
> MOV %22, %21   //phi-mov
> 
> BB4:
> LOADI %32, 4
> Add %33, %22, %32
> 
> Will be converted to:
> LOADI %42, 8
> Add %33, %10, %42
> 
> If so, the lowerFunctionArguments will wrong.
> 
> > -Original Message-
> > From: Guo, Yejun
> > Sent: Tuesday, June 13, 2017 16:39
> > To: Yang, Rong R <rong.r.y...@intel.com>; Wang, Rander
> > <rander.w...@intel.com>; Pan, Xiuli <xiuli@intel.com>;
> > beignet@lists.freedesktop.org
> > Subject: RE: [Beignet] [PATCH] do constant folding for kernel struct
> > args
> >
> > I just tried such kernel, and the generated GEN IR is INDIRECT_MOV, it
> > has nothing to do with this patch.
> >
> > Thanks
> > Yejun
> >
> > -Original Message-

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] backend: refine the local copy propagation.

2017-06-14 Thread Yang, Rong R
Addc also can't use a src modifier.
Every instruction selection opcode that uses addc, such as hadd, also needs
to return false.
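
A minimal sketch of the check being discussed, assuming a made-up opcode list
(`Opcode`, `Replacement`, `acceptsSrcModifier` and `canPropagate` are
hypothetical and do not mirror the real SEL_OP_* enum): copy propagation is
refused only when the replacement carries a negation or absolute modifier and
the consuming opcode cannot encode one.

```cpp
// Hypothetical opcode list for illustration only.
enum class Opcode { Add, Mul, Addc, Hadd, Rhadd, Lzd, Cbit };

// Modifiers attached to the register that would replace the original source.
struct Replacement {
  bool negation = false;
  bool absolute = false;
};

// Opcodes assumed here not to accept a source modifier; Hadd and Rhadd are
// listed because the thread says they are built on addc.
bool acceptsSrcModifier(Opcode op) {
  switch (op) {
    case Opcode::Addc:
    case Opcode::Hadd:
    case Opcode::Rhadd:
    case Opcode::Lzd:
    case Opcode::Cbit:
      return false;
    default:
      return true;
  }
}

// Propagation is always fine without modifiers; with them, it depends on
// the consuming opcode.
bool canPropagate(Opcode op, const Replacement &r) {
  if (!r.negation && !r.absolute)
    return true;
  return acceptsSrcModifier(op);
}
```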

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Tuesday, June 13, 2017 15:08
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: refine the local copy propagation.
> 
>   src modifier is not supported by some instructions.
>   so return false when it exists. This fix piglit %
>   failed
> 
> Signed-off-by: rander.wang 
> ---
>  .../src/backend/gen_insn_selection_optimize.cpp| 32
> ++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> b/backend/src/backend/gen_insn_selection_optimize.cpp
> index 07547ec..730db5e 100644
> --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> @@ -189,6 +189,38 @@ namespace gbe
>  if (insn.opcode == SEL_OP_BSWAP) //should remove once bswap issue is
> fixed
>return false;
> 
> +//the src modifier is not supported by the following instructions
> +if(info->replacement.negation || info->replacement.absolute)
> +{
> +  switch(insn.opcode)
> +  {
> +case SEL_OP_MATH:
> +{
> +  switch(insn.extra.function)
> +  {
> +case GEN_MATH_FUNCTION_INT_DIV_QUOTIENT:
> +case GEN_MATH_FUNCTION_INT_DIV_REMAINDER:
> +case
> GEN_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER:
> +  return false;
> +default:
> +  break;
> +  }
> +
> +  break;
> +}
> +case SEL_OP_CBIT:
> +case SEL_OP_FBH:
> +case SEL_OP_FBL:
> +case SEL_OP_BRC:
> +case SEL_OP_BRD:
> +case SEL_OP_BFREV:
> +case SEL_OP_LZD:
> +  return false;
> +default:
> +  break;
> +  }
> +}
> +
>  if (insn.isWrite() || insn.isRead()) //register in selection vector
>return false;
> 
> --
> 2.7.4
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH 2/2] Use aligned16 and aligne4 kernel to copy for large 3D image with TILE_Y.

2017-06-14 Thread Yang, Rong R
LGTM, except for some formatting. I have run git clang-format manually and
pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Tuesday, June 13, 2017 16:32
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH 2/2] Use aligned16 and aligne4 kernel to copy for
> large 3D image with TILE_Y.
> 
> From: Yan Wang 
> 
> It is similar with 2D image for avoiding extended image width truncated.
> 
> Signed-off-by: Yan Wang 
> ---
>  src/CMakeLists.txt |  2 +
>  src/cl_context.h   |  4 ++
>  src/cl_mem.c   | 46 
> +++---
>  .../cl_internal_copy_buffer_to_image_3d_align16.cl | 19
> +  .../cl_internal_copy_buffer_to_image_3d_align4.cl  | 19
> +  .../cl_internal_copy_image_3d_to_buffer_align16.cl | 20
> ++  .../cl_internal_copy_image_3d_to_buffer_align4.cl  | 20
> ++
>  7 files changed, 125 insertions(+), 5 deletions(-)  create mode 100644
> src/kernels/cl_internal_copy_buffer_to_image_3d_align16.cl
>  create mode 100644
> src/kernels/cl_internal_copy_buffer_to_image_3d_align4.cl
>  create mode 100644
> src/kernels/cl_internal_copy_image_3d_to_buffer_align16.cl
>  create mode 100644
> src/kernels/cl_internal_copy_image_3d_to_buffer_align4.cl
> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 87ad48b..ecb98b9
> 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -54,6 +54,8 @@ cl_internal_copy_image_2d_array_to_3d
> cl_internal_copy_image_3d_to_2d_array
>  cl_internal_copy_image_2d_to_buffer
> cl_internal_copy_image_2d_to_buffer_align16
> cl_internal_copy_image_3d_to_buffer
>  cl_internal_copy_buffer_to_image_2d
> cl_internal_copy_buffer_to_image_2d_align16
> cl_internal_copy_buffer_to_image_3d
>  cl_internal_copy_buffer_to_image_2d_align4
> cl_internal_copy_image_2d_to_buffer_align4
> +cl_internal_copy_buffer_to_image_3d_align4
> +cl_internal_copy_image_3d_to_buffer_align4
> +cl_internal_copy_buffer_to_image_3d_align16
> +cl_internal_copy_image_3d_to_buffer_align16
>  cl_internal_fill_buf_align8 cl_internal_fill_buf_align4
>  cl_internal_fill_buf_align2 cl_internal_fill_buf_unalign
>  cl_internal_fill_buf_align128 cl_internal_fill_image_1d diff --git
> a/src/cl_context.h b/src/cl_context.h index 75bf895..b3a79bc 100644
> --- a/src/cl_context.h
> +++ b/src/cl_context.h
> @@ -64,10 +64,14 @@ enum _cl_internal_ker_type {
>CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16,
>CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN4,
>CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER,   //copy image 3d tobuffer
> +  CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER_ALIGN16,
> +  CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER_ALIGN4,
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D,   //copy buffer to image 2d
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN16,
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN4,
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D,   //copy buffer to image 3d
> +  CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D_ALIGN16,
> +  CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D_ALIGN4,
>CL_ENQUEUE_FILL_BUFFER_UNALIGN,  //fill buffer with 1 aligne pattern,
> pattern size=1
>CL_ENQUEUE_FILL_BUFFER_ALIGN2,   //fill buffer with 2 aligne pattern,
> pattern size=2
>CL_ENQUEUE_FILL_BUFFER_ALIGN4,   //fill buffer with 4 aligne pattern,
> pattern size=4
> diff --git a/src/cl_mem.c b/src/cl_mem.c index b6dce3f..307db50 100644
> --- a/src/cl_mem.c
> +++ b/src/cl_mem.c
> @@ -2162,13 +2162,13 @@ get_align_size_for_copy_kernel(struct
> _cl_mem_image* image, const size_t origin0
>  const size_t offset, cl_image_format *fmt) {
>size_t align_size = 0;
> 
> -  if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image->w *
> image->bpp) % ALIGN16 == 0) &&
> +  if(((image->w * image->bpp) % ALIGN16 == 0) &&
>((origin0 * image->bpp) % ALIGN16 == 0) && (region0 % ALIGN16 == 0)
> && (offset % ALIGN16 == 0)){
>  fmt->image_channel_order = CL_RGBA;
>  fmt->image_channel_data_type = CL_UNSIGNED_INT32;
>  align_size = ALIGN16;
>}
> -  else if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image-
> >w * image->bpp) % ALIGN4 == 0) &&
> +  else if(((image->w * image->bpp) % ALIGN4 == 0) &&
>((origin0 * image->bpp) % ALIGN4 == 0) && (region0 % ALIGN4 == 0) &&
> (offset % ALIGN4 == 0)){
>  fmt->image_channel_order = CL_R;
>  fmt->image_channel_data_type = CL_UNSIGNED_INT32; @@ -2247,11
> +2247,29 @@ cl_mem_copy_image_to_buffer(cl_command_queue queue,
> cl_event event, struct _cl_m
>cl_internal_copy_image_2d_to_buffer_str,
> (size_t)cl_internal_copy_image_2d_to_buffer_str_size, NULL);
>  }
>}else if(image->image_type == CL_MEM_OBJECT_IMAGE3D) {
> -extern char cl_internal_copy_image_3d_to_buffer_str[];
> -extern size_t 

Re: [Beignet] [PATCH 1/3] Backend: Add intel_reqd_sub_group_size support

2017-06-14 Thread Yang, Rong R
The spec only requires intel_reqd_sub_group_size to be one of the values
returned by CL_DEVICE_SUB_GROUP_SIZES_INTEL.
If an application uses a hard-coded sub_group_size, such as 32, the program
will exit in GBE; I don't think that makes sense.
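
A minimal sketch of the alternative, assuming hypothetical names
(`kSupportedSubGroupSizes`, `BuildStatus`, `checkRequiredSubGroupSize`) and
the sizes 8 and 16 mentioned in this thread: an unsupported required size is
reported as a build failure instead of hitting an assertion.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Assumed list of sizes the device would report; 8 and 16 follow the thread.
static const std::size_t kSupportedSubGroupSizes[] = {8, 16};

enum class BuildStatus { Ok, UnsupportedSubGroupSize };

// requested == 0 means the kernel carries no required sub-group size.
BuildStatus checkRequiredSubGroupSize(std::size_t requested) {
  if (requested == 0)
    return BuildStatus::Ok;
  const bool supported =
      std::find(std::begin(kSupportedSubGroupSizes),
                std::end(kSupportedSubGroupSizes),
                requested) != std::end(kSupportedSubGroupSizes);
  // Report the problem to the caller rather than aborting the process.
  return supported ? BuildStatus::Ok : BuildStatus::UnsupportedSubGroupSize;
}
```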

> -Original Message-
> From: Pan, Xiuli
> Sent: Tuesday, June 13, 2017 17:04
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH 1/3] Backend: Add intel_reqd_sub_group_size
> support
> 
> The spec has required the subgroup size to be 8 or 16, and I think we may
> need to fail the build in some other place.
> 
> -----Original Message-
> From: Yang, Rong R
> Sent: Tuesday, June 13, 2017 16:44
> To: Pan, Xiuli <xiuli@intel.com>; beignet@lists.freedesktop.org
> Cc: Pan, Xiuli <xiuli@intel.com>
> Subject: RE: [Beignet] [PATCH 1/3] Backend: Add intel_reqd_sub_group_size
> support
> 
> 
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of Xiuli Pan
> > Sent: Monday, June 5, 2017 16:28
> > To: beignet@lists.freedesktop.org
> > Cc: Pan, Xiuli <xiuli@intel.com>
> > Subject: [Beignet] [PATCH 1/3] Backend: Add intel_reqd_sub_group_size
> > support
> >
> > From: Pan Xiuli <xiuli@intel.com>
> >
> > If we get intel_reqd_sub_group_size attribute from frontend then set
> > it to backend.
> >
> > Signed-off-by: Pan Xiuli <xiuli@intel.com>
> > ---
> >  backend/src/backend/context.cpp   |  6 +-
> >  backend/src/backend/gen_program.cpp   | 28 --
> --
> > 
> >  backend/src/llvm/llvm_gen_backend.cpp | 20 
> >  3 files changed, 41 insertions(+), 13 deletions(-)
> >
> > diff --git a/backend/src/backend/context.cpp
> > b/backend/src/backend/context.cpp index e9ddd17..c9500c8 100644
> > --- a/backend/src/backend/context.cpp
> > +++ b/backend/src/backend/context.cpp
> > @@ -340,7 +340,6 @@ namespace gbe
> >
> > ///
> >// Generic Context (shared by the simulator and the HW context)
> >
> > //
> > /
> > -  IVAR(OCL_SIMD_WIDTH, 8, 15, 16);
> >
> >Context::Context(const ir::Unit , const std::string ) :
> >  unit(unit), fn(*unit.getFunction(name)), name(name),
> > liveness(NULL), dag(NULL), useDWLabel(false) @@ -361,10 +360,7 @@
> namespace gbe
> >}
> >
> >void Context::startNewCG(uint32_t simdWidth) {
> > -if (simdWidth == 0 || OCL_SIMD_WIDTH != 15)
> > -  this->simdWidth = nextHighestPowerOf2(OCL_SIMD_WIDTH);
> > -else
> > -  this->simdWidth = simdWidth;
> > +this->simdWidth = simdWidth;
> >  GBE_SAFE_DELETE(this->registerAllocator);
> >  GBE_SAFE_DELETE(this->scratchAllocator);
> >  GBE_ASSERT(dag != NULL && liveness != NULL); diff --git
> > a/backend/src/backend/gen_program.cpp
> > b/backend/src/backend/gen_program.cpp
> > index c1827b1..26b646a 100644
> > --- a/backend/src/backend/gen_program.cpp
> > +++ b/backend/src/backend/gen_program.cpp
> > @@ -59,6 +59,7 @@
> >  #include   #endif
> >
> > +#include "sys/cvar.hpp"
> >  #include 
> >  #include 
> >  #include 
> > @@ -138,17 +139,24 @@ namespace gbe {
> >}
> >
> >/*! We must avoid spilling at all cost with Gen */
> > -  static const struct CodeGenStrategy {
> > +  struct CodeGenStrategy {
> >  uint32_t simdWidth;
> >  uint32_t reservedSpillRegs;
> >  bool limitRegisterPressure;
> > -  } codeGenStrategy[] = {
> > +  };
> > +  static const struct CodeGenStrategy codeGenStrategyDefault[] = {
> >  {16, 0, false},
> >  {8, 0, false},
> >  {8, 8, false},
> >  {8, 16, false},
> >};
> > +  static const struct CodeGenStrategy codeGenStrategySimd16[] = {
> > +{16, 0, false},
> > +{16, 8, false},
> > +{16, 16, false},
> > +  };
> >
> > +  IVAR(OCL_SIMD_WIDTH, 8, 15, 16);
> >Kernel *GenProgram::compileKernel(const ir::Unit , const
> > std::string ,
> >  bool relaxMath, int profiling) {
> > #ifdef GBE_COMPILER_AVAILABLE @@ -156,19 +164,23 @@ namespace
> gbe {
> >  // when the function already provides the simd width we need to use
> (i.e.
> >  // non zero)
> >  c

Re: [Beignet] [PATCH] do constant folding for kernel struct args

2017-06-13 Thread Yang, Rong R
foldFunctionStructArgConstOffset is called before lowerFunctionArguments.
If foldFunctionStructArgConstOffset is wrong, the INDIRECT_MOV generated in
lowerFunctionArguments will also be wrong.

I am afraid that the following IR:

BB2:
LOADI %30, 4
Add %20, %10, %30   //%10 is a struct argument
MOV %22, %20   //phi-mov

BB3:
LOADI %31, 8
Add %21, %11, %31   //%11 is another struct argument
MOV %22, %21   //phi-mov

BB4:
LOADI %32, 4
Add %33, %22, %32

will be converted to:
LOADI %42, 8
Add %33, %10, %42

If so, lowerFunctionArguments will produce wrong results.
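
The concern can be illustrated with a minimal sketch over a toy IR (`Op`,
`Insn`, `Folded` and `foldStructArgOffsets` are hypothetical names, not
Beignet's real classes). It assumes a register is recorded as "struct
argument + constant" only when it is the destination of an ADD and has a
single definition, so a phi-mov target such as %22 is never folded.

```cpp
#include <map>
#include <set>
#include <vector>

enum class Op { LoadI, Add, Mov };

// Toy instruction: dst = op(src0, src1), or dst = imm for LoadI.
struct Insn {
  Op op;
  int dst;
  int src0;
  int src1;
  long imm;
};

// A register known to equal "struct argument + constant offset".
struct Folded {
  int arg;
  long offset;
};

std::map<int, Folded> foldStructArgOffsets(const std::vector<Insn> &insns,
                                           const std::set<int> &structArgs) {
  // Count definitions; a register written more than once (phi-mov) is unsafe.
  std::map<int, int> defCount;
  for (const Insn &i : insns)
    ++defCount[i.dst];

  std::map<int, long> constants;
  std::map<int, Folded> folded;
  for (const Insn &i : insns) {
    if (defCount[i.dst] != 1)
      continue;
    if (i.op == Op::LoadI) {
      constants[i.dst] = i.imm;
    } else if (i.op == Op::Add) {
      auto c = constants.find(i.src1);
      if (c == constants.end())
        continue;
      auto base = folded.find(i.src0);
      if (structArgs.count(i.src0))
        folded[i.dst] = {i.src0, c->second};
      else if (base != folded.end())
        folded[i.dst] = {base->second.arg, base->second.offset + c->second};
    }
  }
  return folded;
}
```

Run on the IR above, this records %20 and %21 but leaves %22 and %33 alone,
so the two struct arguments are never merged into one.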

> -Original Message-
> From: Guo, Yejun
> Sent: Tuesday, June 13, 2017 16:39
> To: Yang, Rong R <rong.r.y...@intel.com>; Wang, Rander
> <rander.w...@intel.com>; Pan, Xiuli <xiuli@intel.com>;
> beignet@lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> I just tried such kernel, and the generated GEN IR is INDIRECT_MOV, it has
> nothing to do with this patch.
> 
> Thanks
> Yejun
> 
> -Original Message-

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH 1/3] Backend: Add intel_reqd_sub_group_size support

2017-06-13 Thread Yang, Rong R


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Monday, June 5, 2017 16:28
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH 1/3] Backend: Add intel_reqd_sub_group_size
> support
> 
> From: Pan Xiuli 
> 
> If we get intel_reqd_sub_group_size attribute from frontend then set it to
> backend.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/backend/context.cpp   |  6 +-
>  backend/src/backend/gen_program.cpp   | 28 
> 
>  backend/src/llvm/llvm_gen_backend.cpp | 20 
>  3 files changed, 41 insertions(+), 13 deletions(-)
> 
> diff --git a/backend/src/backend/context.cpp
> b/backend/src/backend/context.cpp index e9ddd17..c9500c8 100644
> --- a/backend/src/backend/context.cpp
> +++ b/backend/src/backend/context.cpp
> @@ -340,7 +340,6 @@ namespace gbe
>///
>// Generic Context (shared by the simulator and the HW context)
>///
> -  IVAR(OCL_SIMD_WIDTH, 8, 15, 16);
> 
>Context::Context(const ir::Unit , const std::string ) :
>  unit(unit), fn(*unit.getFunction(name)), name(name), liveness(NULL),
> dag(NULL), useDWLabel(false) @@ -361,10 +360,7 @@ namespace gbe
>}
> 
>void Context::startNewCG(uint32_t simdWidth) {
> -if (simdWidth == 0 || OCL_SIMD_WIDTH != 15)
> -  this->simdWidth = nextHighestPowerOf2(OCL_SIMD_WIDTH);
> -else
> -  this->simdWidth = simdWidth;
> +this->simdWidth = simdWidth;
>  GBE_SAFE_DELETE(this->registerAllocator);
>  GBE_SAFE_DELETE(this->scratchAllocator);
>  GBE_ASSERT(dag != NULL && liveness != NULL); diff --git
> a/backend/src/backend/gen_program.cpp
> b/backend/src/backend/gen_program.cpp
> index c1827b1..26b646a 100644
> --- a/backend/src/backend/gen_program.cpp
> +++ b/backend/src/backend/gen_program.cpp
> @@ -59,6 +59,7 @@
>  #include   #endif
> 
> +#include "sys/cvar.hpp"
>  #include 
>  #include 
>  #include 
> @@ -138,17 +139,24 @@ namespace gbe {
>}
> 
>/*! We must avoid spilling at all cost with Gen */
> -  static const struct CodeGenStrategy {
> +  struct CodeGenStrategy {
>  uint32_t simdWidth;
>  uint32_t reservedSpillRegs;
>  bool limitRegisterPressure;
> -  } codeGenStrategy[] = {
> +  };
> +  static const struct CodeGenStrategy codeGenStrategyDefault[] = {
>  {16, 0, false},
>  {8, 0, false},
>  {8, 8, false},
>  {8, 16, false},
>};
> +  static const struct CodeGenStrategy codeGenStrategySimd16[] = {
> +{16, 0, false},
> +{16, 8, false},
> +{16, 16, false},
> +  };
> 
> +  IVAR(OCL_SIMD_WIDTH, 8, 15, 16);
>Kernel *GenProgram::compileKernel(const ir::Unit , const std::string
> ,
>  bool relaxMath, int profiling) {  #ifdef
> GBE_COMPILER_AVAILABLE @@ -156,19 +164,23 @@ namespace gbe {
>  // when the function already provides the simd width we need to use (i.e.
>  // non zero)
>  const ir::Function *fn = unit.getFunction(name);
> +const struct CodeGenStrategy* codeGenStrategy =
> + codeGenStrategyDefault;
>  if(fn == NULL)
>GBE_ASSERT(0);
> -uint32_t codeGenNum = sizeof(codeGenStrategy) /
> sizeof(codeGenStrategy[0]);
> +uint32_t codeGenNum = sizeof(codeGenStrategyDefault) /
> + sizeof(codeGenStrategyDefault[0]);
>  uint32_t codeGen = 0;
>  GenContext *ctx = NULL;
> -if (fn->getSimdWidth() == 8) {
> +if ( fn->getSimdWidth() != 0 && OCL_SIMD_WIDTH != 15) {
> +  GBE_ASSERTM(0, "unsupported SIMD width!");
> +}else if (fn->getSimdWidth() == 8 || OCL_SIMD_WIDTH == 8) {
>codeGen = 1;
> -} else if (fn->getSimdWidth() == 16) {
> -  codeGenNum = 1;
> -} else if (fn->getSimdWidth() == 0) {
> +} else if (fn->getSimdWidth() == 16 || OCL_SIMD_WIDTH == 16){
> +  codeGenStrategy = codeGenStrategySimd16;
> +  codeGenNum = 3;
Don't hard-code the codeGenStrategySimd16 array size here.
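
One possible shape, as a minimal sketch: `arrayCount`, `StrategyTable` and
`selectStrategies` are hypothetical helpers, and the table contents are copied
from the patch. The count is derived from the array that was actually
selected, so it cannot drift when an entry is added or removed.

```cpp
#include <cstddef>
#include <cstdint>

struct CodeGenStrategy {
  std::uint32_t simdWidth;
  std::uint32_t reservedSpillRegs;
  bool limitRegisterPressure;
};

static const CodeGenStrategy codeGenStrategyDefault[] = {
    {16, 0, false}, {8, 0, false}, {8, 8, false}, {8, 16, false}};
static const CodeGenStrategy codeGenStrategySimd16[] = {
    {16, 0, false}, {16, 8, false}, {16, 16, false}};

// Number of elements in a built-in array, computed at compile time.
template <typename T, std::size_t N>
constexpr std::size_t arrayCount(const T (&)[N]) {
  return N;
}

// A table pointer that always travels together with its element count.
struct StrategyTable {
  const CodeGenStrategy *entries;
  std::size_t count;
};

StrategyTable selectStrategies(std::uint32_t simdWidth) {
  if (simdWidth == 16)
    return {codeGenStrategySimd16, arrayCount(codeGenStrategySimd16)};
  return {codeGenStrategyDefault, arrayCount(codeGenStrategyDefault)};
}
```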

> +} else if (fn->getSimdWidth() == 0 && OCL_SIMD_WIDTH == 15) {
>codeGen = 0;
>  } else
> -  GBE_ASSERT(0);
> +  GBE_ASSERTM(0, "unsupported SIMD width!");
>  Kernel *kernel = NULL;
> 
>  // Stop when compilation is successful diff --git
> a/backend/src/llvm/llvm_gen_backend.cpp
> b/backend/src/llvm/llvm_gen_backend.cpp
> index c8e29c5..b2d9b1b 100644
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -2125,6 +2125,7 @@ namespace gbe
>  // Loop over the kernel metadatas to set the required work group size.
>  size_t reqd_wg_sz[3] = {0, 0, 0};
>  size_t hint_wg_sz[3] = {0, 0, 0};
> +size_t reqd_sg_sz = 0;
>  ir::FunctionArgument::InfoFromLLVM llvmInfo;
>  MDNode *addrSpaceNode = NULL;
>  MDNode 

Re: [Beignet] [PATCH 2/3] Runtime: Add new API enums for cl_intel_required_subgroup_size extension

2017-06-13 Thread Yang, Rong R
You also need to add the extension define to the backend/src/libocl/include/ocl.h file.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Monday, June 5, 2017 16:28
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH 2/3] Runtime: Add new API enums for
> cl_intel_required_subgroup_size extension
> 
> From: Pan Xiuli 
> 
> Add CL_DEVICE_SUB_GROUP_SIZES_INTEL for clGetDeviceInfo, add
> CL_KERNEL_SPILL_MEM_SIZE_INTEL for clGetKernelWorkGroupInfo and add
> CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL for
> clGetKernelSubGroupInfo.
> We only have this extension for LLVM 40+ for frontend support.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  include/CL/cl_intel.h |  6 ++
>  src/cl_device_id.c| 27 +++
>  src/cl_device_id.h|  2 ++
>  src/cl_extensions.c   |  8 
>  src/cl_extensions.h   |  1 +
>  src/cl_gt_device.h|  2 ++
>  6 files changed, 46 insertions(+)
> 
> diff --git a/include/CL/cl_intel.h b/include/CL/cl_intel.h index
> 47bae46..3cb8515 100644
> --- a/include/CL/cl_intel.h
> +++ b/include/CL/cl_intel.h
> @@ -197,6 +197,12 @@ typedef CL_API_ENTRY cl_int
> void* /*param_value*/,
> size_t*
> /*param_value_size_ret*/ );  #endif
> +
> +/* cl_intel_required_subgroup_size extension*/
> +#define CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108
> +#define CL_KERNEL_SPILL_MEM_SIZE_INTEL  0x4109
> +#define CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL  0x410A
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c index 50ed0d9..99258f7
> 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -1335,6 +1335,10 @@ cl_get_device_info(cl_device_id device,
>src_ptr = device->driver_version;
>src_size = device->driver_version_sz;
>break;
> +case CL_DEVICE_SUB_GROUP_SIZES_INTEL:
> +  src_ptr = device->sub_group_sizes;
> +  src_size = device->sub_group_sizes_sz;
> +  break;
> 
>  default:
>return CL_INVALID_VALUE;
> @@ -1477,6 +1481,7 @@ cl_get_kernel_workgroup_info(cl_kernel kernel,
>  DECL_FIELD(COMPILE_WORK_GROUP_SIZE, kernel->compile_wg_sz)
>  DECL_FIELD(PRIVATE_MEM_SIZE, kernel->stack_size)
>  case CL_KERNEL_GLOBAL_WORK_SIZE:
> +{
>dimension = cl_check_builtin_kernel_dimension(kernel, device);
>if ( !dimension ) return CL_INVALID_VALUE;
>if (param_value_size_ret != NULL) @@ -1494,6 +1499,18 @@
> cl_get_kernel_workgroup_info(cl_kernel kernel,
>  return CL_SUCCESS;
>}
>return CL_SUCCESS;
> +}
> +case CL_KERNEL_SPILL_MEM_SIZE_INTEL:
> +{
> +  if (param_value && param_value_size < sizeof(cl_ulong))
> +return CL_INVALID_VALUE;
> +  if (param_value_size_ret != NULL)
> +*param_value_size_ret = sizeof(cl_ulong);
> +  if (param_value)
> +*(cl_ulong*)param_value =
> (cl_ulong)interp_kernel_get_scratch_size(kernel->opaque);
> +  return CL_SUCCESS;
> +}
> +
>  default:
>return CL_INVALID_VALUE;
>};
> @@ -1577,6 +1594,16 @@ cl_get_kernel_subgroup_info(cl_kernel kernel,
>}
>break;
>  }
> +case CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL:
> +{
> +  if (param_value && param_value_size < sizeof(size_t))
> +return CL_INVALID_VALUE;
> +  if (param_value_size_ret != NULL)
> +*param_value_size_ret = sizeof(size_t);
> +  if (param_value)
> +*(size_t*)param_value = interp_kernel_get_simd_width(kernel-
> >opaque);
> +  return CL_SUCCESS;
> +}
>  default:
>return CL_INVALID_VALUE;
>};
> diff --git a/src/cl_device_id.h b/src/cl_device_id.h index 6b8f2eb..93bd2f1
> 100644
> --- a/src/cl_device_id.h
> +++ b/src/cl_device_id.h
> @@ -136,6 +136,8 @@ struct _cl_device_id {
>uint32_t atomic_test_result;
>cl_uint image_pitch_alignment;
>cl_uint image_base_address_alignment;
> +  size_t sub_group_sizes[2];
> +  size_t sub_group_sizes_sz;
> 
>//inited as NULL, created only when cmrt kernel is used
>void* cmrt_device;  //realtype: CmDevice* diff --git a/src/cl_extensions.c
> b/src/cl_extensions.c index d49d202..56099ad 100644
> --- a/src/cl_extensions.c
> +++ b/src/cl_extensions.c
> @@ -69,8 +69,16 @@ check_intel_extension(cl_extensions_t *extensions)  {
>int id;
>for(id = INTEL_EXT_START_ID; id <= INTEL_EXT_END_ID; id++)
> +  {
>  if(id != EXT_ID(intel_motion_estimation))
>extensions->extensions[id].base.ext_enabled = 1;
> +if(id == EXT_ID(intel_required_subgroup_size))
> +#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR > 40
> +  extensions->extensions[id].base.ext_enabled = 1; #else
> +  extensions->extensions[id].base.ext_enabled = 0; #endif
> +  }
>  }
> 
>  void

Re: [Beignet] [PATCH 3/3] Utset: Add test case for cl_intel_required_subgroup_size extension

2017-06-13 Thread Yang, Rong R


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Monday, June 5, 2017 16:28
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH 3/3] Utset: Add test case for
> cl_intel_required_subgroup_size extension
> 
> From: Pan Xiuli 
> 
> Check the subgroup sizes the device supports, and use
> intel_reqd_sub_group_size to build kernels at each of these sizes. Then check
> whether each kernel spills.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  kernels/compiler_reqd_sub_group_size.cl |  5 
>  utests/CMakeLists.txt   |  1 +
>  utests/compiler_reqd_sub_group_size.cpp | 45
> +
>  utests/utest_helper.cpp | 20 +++
>  utests/utest_helper.hpp |  3 +++
>  5 files changed, 74 insertions(+)
>  create mode 100644 kernels/compiler_reqd_sub_group_size.cl
>  create mode 100644 utests/compiler_reqd_sub_group_size.cpp
> 
> diff --git a/kernels/compiler_reqd_sub_group_size.cl
> b/kernels/compiler_reqd_sub_group_size.cl
> new file mode 100644
> index 000..0ce70e9
> --- /dev/null
> +++ b/kernels/compiler_reqd_sub_group_size.cl
> @@ -0,0 +1,5 @@
> +__attribute__((intel_reqd_sub_group_size(SIMD_SIZE)))
> +__kernel void compiler_reqd_sub_group_size(global int* src) {
> +
> +}
> diff --git a/utests/CMakeLists.txt b/utests/CMakeLists.txt index
> b7ef742..07df957 100644
> --- a/utests/CMakeLists.txt
> +++ b/utests/CMakeLists.txt
> @@ -286,6 +286,7 @@ set (utests_sources
>compiler_sub_group_shuffle_down.cpp
>compiler_sub_group_shuffle_up.cpp
>compiler_sub_group_shuffle_xor.cpp
> +  compiler_reqd_sub_group_size.cpp
>builtin_global_linear_id.cpp
>builtin_local_linear_id.cpp
>multi_queue_events.cpp
> diff --git a/utests/compiler_reqd_sub_group_size.cpp
> b/utests/compiler_reqd_sub_group_size.cpp
> new file mode 100644
> index 000..5e2545b
> --- /dev/null
> +++ b/utests/compiler_reqd_sub_group_size.cpp
> @@ -0,0 +1,45 @@
> +#include "utest_helper.hpp"
> +#include
> +#include
> +#include
> +
> +using namespace std;
> +
> +void compiler_reqd_sub_group_size(void)
> +{
> +  if (!cl_check_reqd_subgroup())
> +return;
> +
> +  size_t param_value_size;
> +  OCL_CALL(clGetDeviceInfo, device, CL_DEVICE_SUB_GROUP_SIZES_INTEL,
> +   0, NULL, _value_size);
> +
> +  size_t* param_value = new size_t[param_value_size];
Forgot to free it?
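
One way to avoid the manual free entirely is to let a container own the buffer. The sketch below is only an illustration: querySubGroupSizes is a hypothetical stand-in for a size-then-data query in the style of clGetDeviceInfo, and the two sizes it reports are assumed values. Note that the size such a query returns is in bytes, so it has to be divided by sizeof(size_t) to get an element count:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a size-then-data query. It reports the number
// of BYTES needed and, when a buffer is supplied, copies the values out.
bool querySubGroupSizes(std::size_t bufBytes, std::size_t* out,
                        std::size_t* bytesNeeded) {
  static const std::size_t sizes[] = {8, 16};  // assumed values
  if (bytesNeeded)
    *bytesNeeded = sizeof(sizes);
  if (out) {
    if (bufBytes < sizeof(sizes))
      return false;
    for (std::size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); ++i)
      out[i] = sizes[i];
  }
  return true;
}

// The vector releases its storage on every return path, so there is
// nothing left to forget.
std::vector<std::size_t> getSubGroupSizes() {
  std::size_t bytes = 0;
  if (!querySubGroupSizes(0, nullptr, &bytes))
    return {};
  // Convert the byte count into an element count before allocating.
  std::vector<std::size_t> sizes(bytes / sizeof(std::size_t));
  if (!querySubGroupSizes(bytes, sizes.data(), nullptr))
    return {};
  return sizes;
}
```

If the test keeps the raw new[], it needs a matching delete[] on every exit path instead.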



Re: [Beignet] [PATCH v5 7/7] Optimize clEnqueueWriteImageByKernel and clEnqueuReadImageByKernel.

2017-06-13 Thread Yang, Rong R
The patchset LGTM, pushed, thanks.
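
For readers of the archive, the size computation this patch switches to can be sketched as below. The function name is hypothetical; the formula is the one in the diff (region[0] * region[1] * region[2] * bpp), where bpp is bytes per pixel:

```cpp
#include <cstddef>

// Bytes needed to hold only the requested region of an image, rather than
// the whole image allocation.
std::size_t regionBytes(const std::size_t region[3], std::size_t bytesPerPixel) {
  return region[0] * region[1] * region[2] * bytesPerPixel;
}
```

For a 4 x 2 x 1 region of a 4-byte-per-pixel image this gives 32 bytes, regardless of how large the image itself is.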

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Tuesday, June 13, 2017 15:46
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH v5 7/7] Optimize clEnqueueWriteImageByKernel
> and clEnqueuReadImageByKernel.
> 
> From: Yan Wang 
> 
> 1. Only copy the data by origin and region defined.
> 2. Add clFinish to guarantee the kernel copying is finished when blocking
> writing.
> 
> Signed-off-by: Yan Wang 
> ---
>  src/cl_api_mem.c | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/src/cl_api_mem.c b/src/cl_api_mem.c index 00567b9..1daf403
> 100644
> --- a/src/cl_api_mem.c
> +++ b/src/cl_api_mem.c
> @@ -1857,23 +1857,28 @@
> clEnqueueReadImageByKernel(cl_command_queue command_queue,
>if (image->tmp_ker_buf)
>  clReleaseMemObject(image->tmp_ker_buf);
> 
> -  image->tmp_ker_buf = clCreateBuffer(command_queue->ctx,
> CL_MEM_ALLOC_HOST_PTR,
> -mem->size, NULL, );
> +  size_t buf_size = region[0] * region[1] * region[2] * image->bpp;
> + image->tmp_ker_buf = clCreateBuffer(command_queue->ctx,
> CL_MEM_USE_HOST_PTR,
> +buf_size, ptr, );
>if (image->tmp_ker_buf == NULL || err != CL_SUCCESS) {
>  image->tmp_ker_buf = NULL;
>  return err;
>}
> 
> +  cl_event e;
>err = clEnqueueCopyImageToBuffer(command_queue, mem, image-
> >tmp_ker_buf, origin,
> -region, 0, 0, NULL, NULL);
> +region, 0, num_events_in_wait_list, event_wait_list, );
>if (err != CL_SUCCESS) {
>  clReleaseMemObject(image->tmp_ker_buf);
> +clReleaseEvent(e);
>  image->tmp_ker_buf = NULL;
>  return err;
>}
> 
> -  return clEnqueueReadBuffer(command_queue, image->tmp_ker_buf,
> blocking_read, 0,
> -mem->size, ptr, num_events_in_wait_list, event_wait_list, event);
> +  err = clEnqueueReadBuffer(command_queue, image->tmp_ker_buf,
> blocking_read, 0,
> +buf_size, ptr, 1, , event);
> +  clReleaseEvent(e);
> +  return err;
>  }
> 
>  cl_int
> @@ -2064,14 +2069,20 @@
> clEnqueueWriteImageByKernel(cl_command_queue command_queue,
>if (image->tmp_ker_buf)
>  clReleaseMemObject(image->tmp_ker_buf);
> 
> -  image->tmp_ker_buf = clCreateBuffer(command_queue->ctx,
> CL_MEM_USE_HOST_PTR, mem->size, (void*)ptr, );
> +  size_t buf_size = region[0] * region[1] * region[2] * image->bpp;
> + image->tmp_ker_buf = clCreateBuffer(command_queue->ctx,
> + CL_MEM_USE_HOST_PTR, buf_size, (void*)ptr, );
>if (image->tmp_ker_buf == NULL || err != CL_SUCCESS) {
>  image->tmp_ker_buf = NULL;
>  return err;
>}
> 
> -  return clEnqueueCopyBufferToImage(command_queue, image-
> >tmp_ker_buf, mem, 0, origin, region,
> +  err = clEnqueueCopyBufferToImage(command_queue, image-
> >tmp_ker_buf,
> + mem, 0, origin, region,
>  num_events_in_wait_list, event_wait_list, event);
> +
> +  if (blocking_write)
> +err = clFinish(command_queue);
> +
> +  return err;
>  }
> 
>  cl_int
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] do constant folding for kernel struct args

2017-06-13 Thread Yang, Rong R
Have you considered the case where the value can come from either of two arguments? For example:

Struct  s1{
int i,
   float4 f4;
}

Struct  s2{
int i;
short s;
   float4 f4;
}

__kernel void k(s1, s2, __global float *dst)
{
int gid = get_global_id(0);
float4 *p;
   if (gid % 2) {
  p = 
   } else {
  P = 
   }
dst[gid] = *p.s1;
}
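
To make the concern concrete, here is a heavily simplified sketch of the folding rule I would expect. It is not the code from the patch: the Def, Kind, Folded, and foldToStructArg names are all hypothetical, and the "IR" is just a map from register number to its single definition. Folding is only allowed when the chain of constant adds leads back to exactly one struct argument; a value selected from two candidates has no unique base and is left alone:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, simplified model of an SSA definition.
enum class Kind { StructArg, AddImm, Select };

struct Def {
  Kind kind;
  int src0;          // AddImm: base register; Select: first candidate
  int src1;          // Select: second candidate; unused otherwise
  std::int64_t imm;  // AddImm: constant added to src0
};

struct Folded {
  int base;             // the struct argument register
  std::int64_t offset;  // sum of all constant offsets on the chain
};

// Walk back through constant adds. Succeeds only if the chain ends at a
// single struct argument; fails on a select or an unknown definition.
bool foldToStructArg(const std::map<int, Def>& defs, int reg, Folded& out) {
  std::int64_t total = 0;
  // The step bound guards against malformed (cyclic) input.
  for (std::size_t step = 0; step <= defs.size(); ++step) {
    std::map<int, Def>::const_iterator it = defs.find(reg);
    if (it == defs.end())
      return false;
    const Def& d = it->second;
    if (d.kind == Kind::StructArg) {
      out.base = reg;
      out.offset = total;
      return true;
    }
    if (d.kind == Kind::Select)
      return false;  // two possible bases: do not fold
    total += d.imm;  // Kind::AddImm
    reg = d.src0;
  }
  return false;
}
```

With the registers from the commit message, %1116 = (%41 + 72) + 8 folds to %41 + 80, while an address built on top of a select between two struct arguments is rejected.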

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Guo, Yejun
> Sent: Thursday, June 8, 2017 21:08
> To: Wang, Rander ; Pan, Xiuli
> ; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> Yes, the constant folding for kernel struct arg is a must here.
> 
> As for general constant folding and propagation, I have no position on
> whether sel IR or GEN IR is the better place for it.
> 
> -Original Message-
> From: Wang, Rander
> Sent: Thursday, June 08, 2017 1:14 PM
> To: Pan, Xiuli; Guo, Yejun; beignet@lists.freedesktop.org
> Cc: Guo, Yejun
> Subject: RE: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> Yes, so I may be able to give some advice
> 
> -Original Message-
> From: Pan, Xiuli
> Sent: Thursday, June 8, 2017 1:09 PM
> To: Guo, Yejun ; beignet@lists.freedesktop.org
> Cc: Guo, Yejun ; Wang, Rander
> 
> Subject: RE: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> Rander seems to have a similar optimization for immediate values at the sel IR level.
> If your case needs the optimization done at the GEN IR level, then Rander's
> patch may no longer be needed.
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Guo, Yejun
> Sent: Thursday, June 8, 2017 12:41
> To: beignet@lists.freedesktop.org
> Cc: Guo, Yejun 
> Subject: [Beignet] [PATCH] do constant folding for kernel struct args
> 
> In the following GEN IR, %41 is a kernel argument (a struct). The first LOAD
> becomes a mov, and the second LOAD becomes an indirect mov (see
> lowerFunctionArguments). This hurts performance, and even affects the
> correctness of register liveness for the indirect mov.
> 
> LOADI.uint64 %1114 72
> ADD.int64 %78 %41 %1114
> LOAD.int64.private.aligned {%79} %78 bti:255
> LOADI.int64 %1115 8
> ADD.int64 %1116 %78 %1115
> LOAD.int64.private.aligned {%80} %1116 bti:255
> 
> This function folds the constants 72 and 8 together, so the second LOAD
> becomes a direct mov.
> The resulting GEN IR looks like:
> LOADI.int64 %1115 80
> ADD.int64 %1116 %41 %1115
> ---
>  backend/src/CMakeLists.txt |   2 +
>  backend/src/ir/constopt.cpp| 144
> +
>  backend/src/ir/constopt.hpp|  54 
>  backend/src/ir/context.cpp |   5 ++
>  backend/src/ir/instruction.cpp |   7 ++
>  backend/src/ir/instruction.hpp |   1 +
>  6 files changed, 213 insertions(+)
>  create mode 100644 backend/src/ir/constopt.cpp  create mode 100644
> backend/src/ir/constopt.hpp
> 
> diff --git a/backend/src/CMakeLists.txt b/backend/src/CMakeLists.txt index
> c9ff833..74d7bab 100644
> --- a/backend/src/CMakeLists.txt
> +++ b/backend/src/CMakeLists.txt
> @@ -73,6 +73,8 @@ set (GBE_SRC
>  ir/value.hpp
>  ir/lowering.cpp
>  ir/lowering.hpp
> +ir/constopt.cpp
> +ir/constopt.hpp
>  ir/profiling.cpp
>  ir/profiling.hpp
>  ir/printf.cpp
> diff --git a/backend/src/ir/constopt.cpp b/backend/src/ir/constopt.cpp new
> file mode 100644 index 000..24878b8
> --- /dev/null
> +++ b/backend/src/ir/constopt.cpp
> @@ -0,0 +1,144 @@
> +/*
> + * Copyright © 2017 Intel Corporation
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library. If not, see
> .
> + *
> + * Author: Guo Yejun   */
> +
> +#include 
> +#include "ir/context.hpp"
> +#include "ir/value.hpp"
> +#include "ir/constopt.hpp"
> +#include "sys/set.hpp"
> +
> +namespace gbe {
> +namespace ir {
> +
> +  class FunctionStructArgConstOffsetFolder : public Context  {
> +  public:
> +/*! Build the helper structure */
> +FunctionStructArgConstOffsetFolder(Unit ) : Context(unit) {
> +  records.clear();
> +  loadImms.clear();
> +}
> +/*! Free everything we needed */
> +virtual ~FunctionStructArgConstOffsetFolder() {
> +  for (size_t i = 

Re: [Beignet] [PATCH] Runtime: Fix a mssing llvm version marco for LLVM40+

2017-06-09 Thread Yang, Rong R
LGTM, pushed, thanks.
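
For the archive, a small sketch of why the old check missed LLVM 4.0 and the new one does not. The function names are hypothetical; the arithmetic is the one used in the patch (major * 10 + minor):

```cpp
// Encode a version the same way the patched macro expression does.
constexpr int llvmVersionCode(int major, int minor) {
  return major * 10 + minor;
}

// Old-style check: true only for 3.x releases from 3.5 onwards.
constexpr bool oldSpirCheck(int major, int minor) {
  return major == 3 && minor >= 5;
}

// New-style check: true for 3.5 and anything that encodes higher.
constexpr bool newSpirCheck(int major, int minor) {
  return llvmVersionCode(major, minor) >= 35;
}
```

The encoding is only unambiguous while the minor version stays below 10, since 3.10 and 4.0 would both encode as 40.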

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Pan, Xiuli
> Sent: Monday, June 5, 2017 16:25
> To: Pan, Xiuli ; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Runtime: Fix a mssing llvm version marco for
> LLVM40+
> 
> Ping for review.
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Friday, May 5, 2017 14:32
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH] Runtime: Fix a mssing llvm version marco for
> LLVM40+
> 
> From: Pan Xiuli 
> 
> Found a missing macro that needs to change to support LLVM 4.0+.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  src/cl_extensions.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/cl_extensions.c b/src/cl_extensions.c index a3c71ca..d49d202
> 100644
> --- a/src/cl_extensions.c
> +++ b/src/cl_extensions.c
> @@ -42,7 +42,7 @@ void check_opt1_extension(cl_extensions_t
> *extensions)
>{
>  if (id == EXT_ID(khr_icd))
>extensions->extensions[id].base.ext_enabled = 1; -#if
> LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 5
> +#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 35
>  if (id == EXT_ID(khr_spir))
>extensions->extensions[id].base.ext_enabled = 1;  #endif
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] backend: refine exp function with float input

2017-06-09 Thread Yang, Rong R
Pushed, thanks.
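
For anyone reading this later, the structure of the new path can be sketched on the host side as below. This is not the code from the patch: it uses a plain degree-6 Taylor polynomial in double precision instead of the fdlibm-style coefficients, the name simpleExp is hypothetical, and it assumes x is finite and well inside the float exp range (the patch handles overflow, underflow, and NaN separately). It only illustrates the argument reduction exp(x) = 2^k * exp(r) with the rounding direction chosen by a select rather than a branch:

```cpp
#include <cmath>

// Sketch only: assumes |x| is small enough that exp(x) fits in a float.
float simpleExp(float x) {
  const double ln2 = 0.6931471805599453;
  // Select-style rounding: add +0.5 or -0.5, then truncate toward zero.
  const double half = (x < 0.0f) ? -0.5 : 0.5;
  const int k = static_cast<int>(x / ln2 + half);  // k = round(x / ln2)
  const double r = x - k * ln2;                    // |r| <= ln2 / 2
  // exp(r) by a degree-6 Taylor polynomial in Horner form.
  const double p =
      1.0 + r * (1.0 + r * (1.0 / 2 + r * (1.0 / 6 + r * (1.0 / 24 +
            r * (1.0 / 120 + r / 720.0)))));
  // Scale by 2^k, which is what adding k to the exponent field achieves.
  return static_cast<float>(std::ldexp(p, k));
}
```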

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Pan, Xiuli
> Sent: Wednesday, June 7, 2017 15:52
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH] backend: refine exp function with float input
> 
> LGTM.
> Thanks
> 
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Wednesday, June 7, 2017 15:47
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: refine exp function with float input
> 
>   Remove some corner-case checks, because these paths cannot be
>   reached, and convert branch code to selects. These changes
>   improve performance by 20%, and OCL_ExpFixture_Exp in OpenCV
> now matches the performance of the other Gen driver.
> 
> Signed-off-by: rander.wang 
> ---
>  backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 60
> -
>  1 file changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> index 6b942db..6286ea6 100644
> --- a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> @@ -1418,6 +1418,62 @@ OVERLOADABLE float
> __gen_ocl_internal_exp(float x) {
>}
>  }
> 
> +OVERLOADABLE float __gen_ocl_internal_simple_exp(float x) {
> +  float o_threshold = 8.8721679688e+01,  /* 0x42b17180 */
> +  u_threshold = -1.0397208405e+02,  /* 0xc2cff1b5 */
> +  twom100 = 7.8886090522e-31, /* 2**-100=0x0d80 */
> +  ivln2 = 1.4426950216e+00, /* 0x3fb8aa3b =1/ln2 */
> +  one = 1.0,
> +  P1 = 1.667163e-01, /* 0x3e2b */
> +  P2 = -2.778450e-03; /* 0xbb360b61 */
> +  float y,hi=0.0,lo=0.0,c,t;
> +  int k=0;
> +  unsigned hx;
> +  float ln2HI_0 = 6.9313812256e-01; /* 0x3f317180 */
> +  float ln2HI_1 = -6.9313812256e-01; /* 0xbf317180 */
> +  float ln2LO_0 = 9.0580006145e-06;  /* 0x3717f7d1 */
> +  float ln2LO_1 = -9.0580006145e-06; /* 0xb717f7d1 */
> +  float half_0 = 0.5;
> +  float half_1 = -0.5;
> +  float retVal = -1.0f;
> +  hx = as_uint(fabs(x));
> +
> +  /* filter out non-finite argument */
> +  /* if |x|>=88.721... */
> +  if(hx >= 0x42b17218)
> +  {
> +float tmp = (x > 0)? x:0.0;
> +retVal = (x > 0)? INFINITY:retVal; /* overflow */
> +retVal = (hx>0x7f80)? NAN:retVal;
> +retVal = (hx==0x7f80)? tmp:retVal;
> +retVal = (x < u_threshold)? 0.0:retVal; /* underflow */
> +
> +if(retVal != -1.0f)
> +  return retVal;
> +  }
> +
> +  /* argument reduction */
> +  float tmp = (x < 0) ? half_1 : half_0;  k  = mad(ivln2, x, tmp);  t =
> + k;  hi = mad(t, -ln2HI_0, x); /* t*ln2HI is exact here */  lo =
> + t*ln2LO_0;  x  = hi - lo;
> +
> +  /* x is now in primary range */
> +  t  = x*x;
> +  c  = mad(t, mad(t, -P2, -P1), x);
> +  y = one-((lo + (x*c)/(c-2.0f))-hi);
> +
> +  unsigned hy;
> +  GEN_OCL_GET_FLOAT_WORD(hy,y);
> +
> +  float factor = (k >= -125)? 1.0f:twom100;
> +  k = (k >= -125)? k:k+100;
> +  GEN_OCL_SET_FLOAT_WORD(y,hy+(k<<23)); /* add k to y's exponent */
> +  return y*factor;
> +}
> +
>  /* erf,erfc from glibc s_erff.c -- float version of s_erf.c.
>   * Conversion to float by Ian Lance Taylor, Cygnus Support, i...@cygnus.com.
>   */
> @@ -3105,10 +3161,10 @@ OVERLOADABLE float exp(float x) {
>  return __gen_ocl_internal_fastpath_exp(x);
> 
>/* Use native instruction when it has enough precision */
> -  if (x > -0x1.6p1 && x < 0x1.6p1)
> +  if (fabs(x) < 0x1.6p1)
>  return __gen_ocl_internal_fastpath_exp(x);
> 
> -  return  __gen_ocl_internal_exp(x);
> +  return  __gen_ocl_internal_simple_exp(x);
>  }
> 
>  OVERLOADABLE float exp2(float x) {
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] backend: refine hypot function

2017-06-09 Thread Yang, Rong R
Pushed, thanks.
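
For the archive, the scaling idea behind this hypot can be sketched with the standard library as follows. This is not the patched code, which replaces frexp and ldexp with hand-written bit manipulation; scaledHypot is a hypothetical name and the sketch only shows why normalizing by the larger operand's exponent avoids spurious overflow:

```cpp
#include <cmath>

// Sketch only: normalize by the exponent of the larger operand so that
// an*an + bn*bn cannot overflow, then scale the result back.
float scaledHypot(float x, float y) {
  if (std::isinf(x) || std::isinf(y))
    return INFINITY;  // an infinite input wins, even over NaN
  if (std::isnan(x) || std::isnan(y))
    return NAN;
  const float a = std::fmax(std::fabs(x), std::fabs(y));
  const float b = std::fmin(std::fabs(x), std::fabs(y));
  if (a == 0.0f)
    return 0.0f;
  int e = 0;
  const float an = std::frexp(a, &e);   // a = an * 2^e, 0.5 <= an < 1
  const float bn = std::ldexp(b, -e);   // 0 <= bn <= an
  const float cn = std::sqrt(an * an + bn * bn);
  return std::ldexp(cn, e);
}
```

For inputs around 1e30 the direct form sqrt(x*x + y*y) would square past the float range, while the normalized form stays finite.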

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Pan, Xiuli
> Sent: Wednesday, June 7, 2017 15:53
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH] backend: refine hypot function
> 
> LGTM.
> I only tested correctness; performance needs to be rechecked.
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander.wang
> Sent: Thursday, May 18, 2017 16:18
> To: beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: [Beignet] [PATCH] backend: refine hypot function
> 
> The OpenCV test OCL_Magnitude is slow on Beignet because
> of hypot. Refine hypot: change the algorithm and remove
> unnecessary code, for a 30% improvement.
> 
> Signed-off-by: rander.wang 
> ---
>  backend/src/libocl/tmpl/ocl_math_common.tmpl.cl | 75
> -
>  1 file changed, 61 insertions(+), 14 deletions(-)
> 
> diff --git a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> index 6b942db..ab03cb4 100644
> --- a/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_math_common.tmpl.cl
> @@ -2894,12 +2894,36 @@ float __gen_ocl_internal_pown(float x, int y) {
>return as_float((a & (0x807Fu)) | (u & 0x8000u) | 0x3F00);
> float __gen_ocl_internal_frexp(float x, int *exp) { BODY; }
> 
> +float __fast_scalbnf (float x, int n){
> +  /* copy from fdlibm */
> +  float two25 = 3.355443200e+07,  /* 0x4c00 */
> +  twom25 = 2.9802322388e-08,  /* 0x3300 */
> +  huge = 1.0e+30,
> +  tiny = 1.0e-30;
> +  int k,ix,t,tmp;
> +  float retVal;
> +
> +  GEN_OCL_GET_FLOAT_WORD(ix,x);
> +  k = (ix&0x7f80)>>23; /* extract exponent */  t = k;  k = k+n; tmp
> + = (ix&0x807f);  x = as_float(tmp |(k << 23));  retVal = (k > 0)?
> + x:0.0f;  retVal = (k > 0xfe)? INFINITY:retVal;  retVal = (k <= -25)?
> + 0.0f:retVal;  x = as_float(tmp | ((k + 25) << 23));  retVal = ((k > 0)
> + && (k <= 25)) ? x*twom25:retVal;  retVal = (t == 0)?
> + 0.0f:retVal;
> +
> +  return retVal;
> +}
> +
>  OVERLOADABLE float hypot(float x, float y) {
>if (__ocl_math_fastpath_flag)
>  return __gen_ocl_internal_fastpath_hypot(x, y);
> 
> -  //return __gen_ocl_sqrt(x*x + y*y);
> -  float a,b,an,bn,cn;
> +  float a,b,an,bn,cn, retVal;
>int e;
>if (isfinite (x) && isfinite (y)){  /* Determine absolute values.  */
>x = __gen_ocl_fabs (x);
> @@ -2907,19 +2931,42 @@ OVERLOADABLE float hypot(float x, float y) {
>/* Find the bigger and the smaller one.  */
>a = max(x,y);
>b = min(x,y);
> -  /* Now 0 <= b <= a.  */
> -  /* Write a = an * 2^e, b = bn * 2^e with 0 <= bn <= an < 1.  */
> -  an = __gen_ocl_internal_frexp (a, );
> -  bn = ldexp (b, - e);
> -  /* Through the normalization, no unneeded overflow or underflow will
> occur here.  */
> -  cn = __gen_ocl_sqrt (an * an + bn * bn);
> -  return ldexp (cn, e);
> -  }else{
> -if (isinf (x) || isinf (y))  /* x or y is infinite.  Return +Infinity.  
> */
> -  return INFINITY;
> -else/* x or y is NaN.  Return NaN.  */
> -  return x + y;
> +
> +   bool skip = false;
> + uint u = as_uint(a);
> + uint x = u;
> + if (x == 0) {
> +   e = 0;
> +   an = x;
> +  skip = true;
> + }
> +
> + if (x >= 0x80) {
> +   e = (x >> 23) - 126;
> +   an = as_float((u & (0x807Fu)) | 0x3F00);
> + skip = true;
> + }
> +
> +   if(!skip)
> +{
> +  int msbOne = clz(x);
> +  x <<= (msbOne -8);
> +  e = -117 -msbOne;
> +  an = as_float((x & (0x807Fu)) | 0x3F00);
> +}
> +
> + bn = __fast_scalbnf (b, - e);
> + /* Through the normalization, no unneeded overflow or underflow
> will occur here.  */
> + cn = __gen_ocl_sqrt (mad(an, an,  bn * bn));
> + retVal = __fast_scalbnf (cn, e);
>}
> +  else
> +  {
> +retVal = NAN; /* x or y is NaN.  Return NaN.  */
> +retVal = (isinf (x) || isinf (y)) ?  INFINITY:retVal; /* x or y is
> + infinite.  Return +Infinity.  */  }
> +
> +  return retVal;
>  }
> 
>  OVERLOADABLE float powr(float x, float y) {
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] keep GEN IR as SSA style

2017-06-09 Thread Yang, Rong R
LGTM, pushed, thanks.
Please add a Signed-off-by line next time.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Guo, Yejun
> Sent: Wednesday, June 7, 2017 15:44
> To: beignet@lists.freedesktop.org
> Cc: Guo, Yejun 
> Subject: [Beignet] [PATCH] keep GEN IR as SSA style
> 
> ---
>  backend/src/llvm/llvm_gen_backend.cpp | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/backend/src/llvm/llvm_gen_backend.cpp
> b/backend/src/llvm/llvm_gen_backend.cpp
> index 831666e..31b8bf2 100644
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -2984,10 +2984,12 @@ namespace gbe
>this->newRegister(const_cast());
>ir::Register reg =
> regTranslator.getScalar(const_cast(), 0);
>ir::Constant  = unit.getConstantSet().getConstant(v.getName());
> -  ctx.LOADI(getType(ctx, v.getType()), reg,
> ctx.newIntegerImmediate(con.getOffset(), getType(ctx, v.getType(;
>if (!legacyMode) {
> -ctx.ADD(getType(ctx, v.getType()), reg, 
> ir::ocl::constant_addrspace,
> reg);
> -  }
> +ir::Register regload = ctx.reg(getFamily(getType(ctx, 
> v.getType(;
> +ctx.LOADI(getType(ctx, v.getType()), regload,
> ctx.newIntegerImmediate(con.getOffset(), getType(ctx, v.getType(;
> +ctx.ADD(getType(ctx, v.getType()), reg, 
> ir::ocl::constant_addrspace,
> regload);
> +  } else
> +ctx.LOADI(getType(ctx, v.getType()), reg,
> ctx.newIntegerImmediate(con.getOffset(), getType(ctx, v.getType(;
>  }
>} else if(addrSpace == ir::MEM_PRIVATE) {
>this->newRegister(const_cast());
> --
> 2.7.4
> 


Re: [Beignet] [PATCH v2 2/2] Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.

2017-05-25 Thread Yang, Rong R
The patchset LGTM, pushed, thanks.

BTW: we should also support align2 later.
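
For the archive, the width arithmetic from the commit message can be checked with a small sketch. The names are hypothetical; the limit (16 x 1024 = 16384, from a 14-bit width field) and the 8191-pixel CL_RGBA/CL_UNORM_INT8 example are taken from the patch description:

```cpp
#include <cstddef>

// Maximum surface width, as stated in the patch description.
constexpr std::size_t kMaxSurfaceWidth = 16 * 1024;

// Width of a row when the image is reinterpreted with texels of texelBytes.
constexpr std::size_t texelsPerRow(std::size_t widthPx,
                                   std::size_t bytesPerPixel,
                                   std::size_t texelBytes) {
  return widthPx * bytesPerPixel / texelBytes;
}

constexpr bool fitsSurfaceWidth(std::size_t widthPx,
                                std::size_t bytesPerPixel,
                                std::size_t texelBytes) {
  return texelsPerRow(widthPx, bytesPerPixel, texelBytes) <= kMaxSurfaceWidth;
}
```

Reinterpreting the 8191-pixel row as 1-byte texels gives 32764, which does not fit; with 4-byte texels the row stays at 8191 and fits.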

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, May 25, 2017 15:10
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH v2 2/2] Fix bug of clEnqueueCopyBufferToImage
> and clEnqueueCopyImageToBuffer.
> 
> From: Yan Wang 
> 
> The "imagedim_non_pow_2" cases of the basic module of the conformance
> suite show a regression after the previous patch enabled TILE_Y mode
> for large images.
> The bug comes from the non-align16 kernels of
> clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.
> They force the CL_RGBA/CL_UNORM_INT8/8191x8192 image of the conformance
> test to a CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying.
> This makes the width 8191 x 4 = 32764, which exceeds the maximum
> width (16 x 1024 = 16384) of the GEN surface state structure, whose
> width field has only 14 bits.
> So use an align4 copy kernel to avoid this bug.
> 
> Signed-off-by: Yan Wang 
> ---
>  src/CMakeLists.txt |  1 +
>  src/cl_context.h   |  2 +
>  src/cl_mem.c   | 78 
> ++
>  .../cl_internal_copy_buffer_to_image_2d_align4.cl  | 18
> +  .../cl_internal_copy_image_2d_to_buffer_align4.cl  | 18 +
>  5 files changed, 89 insertions(+), 28 deletions(-)  create mode 100644
> src/kernels/cl_internal_copy_buffer_to_image_2d_align4.cl
>  create mode 100644
> src/kernels/cl_internal_copy_image_2d_to_buffer_align4.cl
> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 77a1c87..6433566
> 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -53,6 +53,7 @@ cl_internal_copy_image_2d_array_to_2d_array
> cl_internal_copy_image_2d_array_to_2
>  cl_internal_copy_image_2d_array_to_3d
> cl_internal_copy_image_3d_to_2d_array
>  cl_internal_copy_image_2d_to_buffer
> cl_internal_copy_image_2d_to_buffer_align16
> cl_internal_copy_image_3d_to_buffer
>  cl_internal_copy_buffer_to_image_2d
> cl_internal_copy_buffer_to_image_2d_align16
> cl_internal_copy_buffer_to_image_3d
> +cl_internal_copy_buffer_to_image_2d_align4
> +cl_internal_copy_image_2d_to_buffer_align4
>  cl_internal_fill_buf_align8 cl_internal_fill_buf_align4
>  cl_internal_fill_buf_align2 cl_internal_fill_buf_unalign
>  cl_internal_fill_buf_align128 cl_internal_fill_image_1d diff --git
> a/src/cl_context.h b/src/cl_context.h index 8ba499f..75bf895 100644
> --- a/src/cl_context.h
> +++ b/src/cl_context.h
> @@ -62,9 +62,11 @@ enum _cl_internal_ker_type {
>CL_ENQUEUE_COPY_IMAGE_3D_TO_2D_ARRAY,   //copy image 3d to
> image 2d array
>CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER,   //copy image 2d to buffer
>CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16,
> +  CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN4,
>CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER,   //copy image 3d tobuffer
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D,   //copy buffer to image 2d
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN16,
> +  CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN4,
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D,   //copy buffer to image 3d
>CL_ENQUEUE_FILL_BUFFER_UNALIGN,  //fill buffer with 1 aligne pattern,
> pattern size=1
>CL_ENQUEUE_FILL_BUFFER_ALIGN2,   //fill buffer with 2 aligne pattern,
> pattern size=2
> diff --git a/src/cl_mem.c b/src/cl_mem.c index 0c49c3d..a8543c9 100644
> --- a/src/cl_mem.c
> +++ b/src/cl_mem.c
> @@ -2146,6 +2146,36 @@ fail:
>return ret;
>  }
> 
> +#define ALIGN16 16
> +#define ALIGN4 4
> +#define ALIGN1 1
> +
> +static size_t
> +get_align_size_for_copy_kernel(struct _cl_mem_image* image, const
> size_t origin0, const size_t region0,
> +const size_t offset, cl_image_format *fmt)
> +{
> +  size_t align_size = 0;
> +
> +  if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image->w *
> image->bpp) % ALIGN16 == 0) &&
> +  ((origin0 * image->bpp) % ALIGN16 == 0) && (region0 % ALIGN16 == 0)
> && (offset % ALIGN16 == 0)){
> +fmt->image_channel_order = CL_RGBA;
> +fmt->image_channel_data_type = CL_UNSIGNED_INT32;
> +align_size = ALIGN16;
> +  }
> +  else if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image-
> >w * image->bpp) % ALIGN4 == 0) &&
> +  ((origin0 * image->bpp) % ALIGN4 == 0) && (region0 % ALIGN4 == 0) &&
> (offset % ALIGN4 == 0)){
> +fmt->image_channel_order = CL_R;
> +fmt->image_channel_data_type = CL_UNSIGNED_INT32;
> +align_size = ALIGN4;
> +  }
> +  else{
> +fmt->image_channel_order = CL_R;
> +fmt->image_channel_data_type = CL_UNSIGNED_INT8;
> +align_size = ALIGN1;
> +  }
> +
> +  return align_size;
> +}
> +
>  LOCAL cl_int
>  cl_mem_copy_image_to_buffer(cl_command_queue queue, cl_event
> event, struct _cl_mem_image* image, cl_mem buffer,
>   const size_t *src_origin, const size_t 

Re: [Beignet] [PATCH] build: fix cmake code generation dependencies.

2017-05-25 Thread Yang, Rong R
LGTM, thanks, will push it later.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Ismo Puustinen
> Sent: Tuesday, May 23, 2017 16:27
> To: beignet@lists.freedesktop.org
> Cc: Puustinen, Ismo 
> Subject: [Beignet] [PATCH] build: fix cmake code generation dependencies.
> 
> There is a race condition between building .bc and header files and
> generating code from .cl targets. Fix the race by adding the dependency to
> generated files.
> 
> Signed-off-by: Ismo Puustinen 
> ---
>  src/CMakeLists.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 77a1c87..f87a637
> 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -18,13 +18,13 @@ foreach (KF ${KERNEL_FILES})
>OUTPUT ${output_file}
>COMMAND rm -rf ${output_file}
>COMMAND ${GBE_BIN_GENERATER} -s -o${output_file} -t${GEN_PCI_ID}
> ${input_file}
> -  DEPENDS ${input_file} ${GBE_BIN_FILE})
> +  DEPENDS ${input_file} ${GBE_BIN_FILE} beignet_bitcode)
>else(GEN_PCI_ID)
>  add_custom_command(
>OUTPUT ${output_file}
>COMMAND rm -rf ${output_file}
>COMMAND ${GBE_BIN_GENERATER} -s -o${output_file} ${input_file}
> -  DEPENDS ${input_file} ${GBE_BIN_FILE})
> +  DEPENDS ${input_file} ${GBE_BIN_FILE} beignet_bitcode)
>endif(GEN_PCI_ID)
>  endforeach (KF)
>  endmacro (MakeKernelBinStr)
> --
> 2.9.4
> 


Re: [Beignet] [PATCH 2/2] Fix bug of clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.

2017-05-24 Thread Yang, Rong R
Can you write a function to decide the alignment? Then it could be reused by 
cl_mem_copy_buffer_to_image and cl_mem_copy_image_to_buffer.
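
A minimal sketch of what such a shared helper could look like, assuming plain integer parameters instead of Beignet's struct _cl_mem_image. The name pick_copy_align and its parameters are hypothetical, region_bytes is assumed to already be in bytes, and the 2D image type check is left out for brevity:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone helper: return the widest copy granularity
 * (16, 4 or 1 bytes) that the row width, the x origin, the region width
 * and the buffer offset all satisfy. */
static size_t pick_copy_align(size_t width_px, size_t bpp, size_t origin_x_px,
                              size_t region_bytes, size_t buffer_offset)
{
  static const size_t candidates[] = {16, 4};
  size_t i;

  assert(bpp > 0);
  for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
    size_t a = candidates[i];
    if ((width_px * bpp) % a == 0 && (origin_x_px * bpp) % a == 0 &&
        region_bytes % a == 0 && buffer_offset % a == 0)
      return a; /* widest alignment that every quantity satisfies */
  }
  return 1; /* fall back to the byte-wise copy kernel */
}
```

Both copy directions could then call the same helper and pick the kernel and image format from the returned value; for the 8191-pixel-wide RGBA8 image described in the patch, the row is 32764 bytes, so the helper falls through from 16 to 4.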

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, May 25, 2017 10:57
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH 2/2] Fix bug of clEnqueueCopyBufferToImage and
> clEnqueueCopyImageToBuffer.
> 
> From: Yan Wang 
> 
> The "imagedim_non_pow_2" cases of the basic module of the conformance tests
> regress after the previous patch enabled TILE_Y mode for large images.
> The bug comes from the non-align16 kernel of
> clEnqueueCopyBufferToImage and clEnqueueCopyImageToBuffer.
> It forces the CL_RGBA/CL_UNORM_INT8/8191x8192 image of the conformance
> test into a CL_R/CL_UNSIGNED_INT8/32764x8192 image for copying.
> The width becomes 8191 x 4 = 32764, which exceeds the maximum
> width (16 x 1024 = 16384) of the GEN surface state structure, which has only 14
> bits for it.
> So use an align4 copy kernel to avoid this bug.
> 
> Signed-off-by: Yan Wang 
> ---
>  src/CMakeLists.txt |  1 +
>  src/cl_context.h   |  2 ++
>  src/cl_mem.c   | 32 ++
>  .../cl_internal_copy_buffer_to_image_2d_align4.cl  | 18
>  .../cl_internal_copy_image_2d_to_buffer_align4.cl  | 18
>  5 files changed, 71 insertions(+)
>  create mode 100644
> src/kernels/cl_internal_copy_buffer_to_image_2d_align4.cl
>  create mode 100644
> src/kernels/cl_internal_copy_image_2d_to_buffer_align4.cl
> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 77a1c87..6433566
> 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -53,6 +53,7 @@ cl_internal_copy_image_2d_array_to_2d_array
> cl_internal_copy_image_2d_array_to_2
>  cl_internal_copy_image_2d_array_to_3d
> cl_internal_copy_image_3d_to_2d_array
>  cl_internal_copy_image_2d_to_buffer
> cl_internal_copy_image_2d_to_buffer_align16
> cl_internal_copy_image_3d_to_buffer
>  cl_internal_copy_buffer_to_image_2d
> cl_internal_copy_buffer_to_image_2d_align16
> cl_internal_copy_buffer_to_image_3d
> +cl_internal_copy_buffer_to_image_2d_align4
> +cl_internal_copy_image_2d_to_buffer_align4
>  cl_internal_fill_buf_align8 cl_internal_fill_buf_align4
>  cl_internal_fill_buf_align2 cl_internal_fill_buf_unalign
>  cl_internal_fill_buf_align128 cl_internal_fill_image_1d diff --git
> a/src/cl_context.h b/src/cl_context.h index 8ba499f..75bf895 100644
> --- a/src/cl_context.h
> +++ b/src/cl_context.h
> @@ -62,9 +62,11 @@ enum _cl_internal_ker_type {
>CL_ENQUEUE_COPY_IMAGE_3D_TO_2D_ARRAY,   //copy image 3d to
> image 2d array
>CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER,   //copy image 2d to buffer
>CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN16,
> +  CL_ENQUEUE_COPY_IMAGE_2D_TO_BUFFER_ALIGN4,
>CL_ENQUEUE_COPY_IMAGE_3D_TO_BUFFER,   //copy image 3d tobuffer
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D,   //copy buffer to image 2d
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN16,
> +  CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_2D_ALIGN4,
>CL_ENQUEUE_COPY_BUFFER_TO_IMAGE_3D,   //copy buffer to image 3d
>CL_ENQUEUE_FILL_BUFFER_UNALIGN,  //fill buffer with 1 aligne pattern,
> pattern size=1
>CL_ENQUEUE_FILL_BUFFER_ALIGN2,   //fill buffer with 2 aligne pattern,
> pattern size=2
> diff --git a/src/cl_mem.c b/src/cl_mem.c index 0c49c3d..3b9a3be 100644
> --- a/src/cl_mem.c
> +++ b/src/cl_mem.c
> @@ -2159,6 +2159,7 @@
> cl_mem_copy_image_to_buffer(cl_command_queue queue, cl_event
> event, struct _cl_m
>size_t origin0, region0;
>size_t kn_dst_offset;
>int align16 = 0;
> +  int align4 = 0;
>size_t align_size = 1;
>size_t w_saved;
> 
> @@ -2183,6 +2184,13 @@
> cl_mem_copy_image_to_buffer(cl_command_queue queue, cl_event
> event, struct _cl_m
>  align16 = 1;
>  align_size = 16;
>}
> +  else if((image->image_type == CL_MEM_OBJECT_IMAGE2D) && ((image-
> >w * image->bpp) % 4 == 0) &&
> +  ((src_origin[0] * bpp) % 4 == 0) && (region0 % 4 == 0) && (dst_offset 
> % 4
> == 0)){
> +fmt.image_channel_order = CL_R;
> +fmt.image_channel_data_type = CL_UNSIGNED_INT32;
> +align4 = 1;
> +align_size = 4;
> +  }
>else{
>  fmt.image_channel_order = CL_R;
>  fmt.image_channel_data_type = CL_UNSIGNED_INT8; @@ -2206,6
> +2214,14 @@ cl_mem_copy_image_to_buffer(cl_command_queue queue,
> cl_event event, struct _cl_m
>  cl_internal_copy_image_2d_to_buffer_align16_str,
>  (size_t)cl_internal_copy_image_2d_to_buffer_align16_str_size,
> NULL);
>  }
> +else if(align4){
> +  extern char cl_internal_copy_image_2d_to_buffer_align4_str[];
> +  extern size_t
> + cl_internal_copy_image_2d_to_buffer_align4_str_size;
> +
> +  ker = 

Re: [Beignet] [PATCH V2] Backend: Fix performance regression with sampler refine fro LLVM40

2017-05-18 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Wednesday, May 17, 2017 17:02
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH V2] Backend: Fix performance regression with
> sampler refine fro LLVM40
> 
> From: Pan Xiuli 
> 
> After the refine we can no longer tell whether a sampler is initialized with
> a constant. The compiler optimization for constant samplers then breaks, and
> the SAMPLE instruction to use has to be chosen at runtime.
> Fix the sampler refine for LLVM40 so that the constant check works again.
> V2: Fix a typo in the type of function __gen_ocl_sampler_to_int.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/libocl/src/ocl_image.cl   |  9 
>  backend/src/llvm/llvm_sampler_fix.cpp | 41
> +++
>  2 files changed, 41 insertions(+), 9 deletions(-)
> 
> diff --git a/backend/src/libocl/src/ocl_image.cl
> b/backend/src/libocl/src/ocl_image.cl
> index e66aa15..2febfda 100644
> --- a/backend/src/libocl/src/ocl_image.cl
> +++ b/backend/src/libocl/src/ocl_image.cl
> @@ -295,18 +295,17 @@ GEN_VALIDATE_ARRAY_INDEX(int, read_write
> image1d_buffer_t)  // The work around is to use a LD message instead of
> normal sample message.
> 
> //
> /
> 
> -bool __gen_ocl_sampler_need_fix(int);
> -bool __gen_ocl_sampler_need_rounding_fix(int);
> -int __gen_ocl_sampler_to_int(sampler_t);
> +bool __gen_ocl_sampler_need_fix(sampler_t);
> +bool __gen_ocl_sampler_need_rounding_fix(sampler_t);
> 
>  bool __gen_sampler_need_fix(const sampler_t sampler)  {
> -  return
> __gen_ocl_sampler_need_fix(__gen_ocl_sampler_to_int(sampler));
> +  return __gen_ocl_sampler_need_fix(sampler);
>  }
> 
>  bool __gen_sampler_need_rounding_fix(const sampler_t sampler)  {
> -  return
> __gen_ocl_sampler_need_rounding_fix(__gen_ocl_sampler_to_int(sample
> r));
> +  return __gen_ocl_sampler_need_rounding_fix(sampler);
>  }
> 
>  INLINE_OVERLOADABLE float __gen_fixup_float_coord(float tmpCoord) diff
> --git a/backend/src/llvm/llvm_sampler_fix.cpp
> b/backend/src/llvm/llvm_sampler_fix.cpp
> index 2e8bcf9..c249755 100644
> --- a/backend/src/llvm/llvm_sampler_fix.cpp
> +++ b/backend/src/llvm/llvm_sampler_fix.cpp
> @@ -55,9 +55,17 @@ namespace gbe {
>  //  ((sampler & __CLK_FILTER_MASK) == CLK_FILTER_NEAREST));
>  bool needFix = true;
>  Value *needFixVal;
> +#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
> +CallInst *init = dyn_cast<CallInst>(I->getOperand(0));
> +if (init && init->getCalledValue()->getName().compare("__translate_sampler_initializer"))
> +{
> +  const ConstantInt *ci = dyn_cast<ConstantInt>(init->getOperand(0));
> +  uint32_t samplerInt = ci->getZExtValue();
> +#else
>  if (dyn_cast<ConstantInt>(I->getOperand(0))) {
>const ConstantInt *ci = dyn_cast<ConstantInt>(I->getOperand(0));
>uint32_t samplerInt = ci->getZExtValue();
> +#endif
>needFix = ((samplerInt & __CLK_ADDRESS_MASK) ==
> CLK_ADDRESS_CLAMP &&
>   (samplerInt & __CLK_FILTER_MASK) == CLK_FILTER_NEAREST);
>needFixVal = ConstantInt::get(boolTy, needFix); @@ -65,14 +73,24 @@
> namespace gbe {
>IRBuilder<> Builder(I->getParent());
> 
>Builder.SetInsertPoint(I);
> +
>Value *addressMask = ConstantInt::get(i32Ty, __CLK_ADDRESS_MASK);
> -  Value *addressMode = Builder.CreateAnd(I->getOperand(0),
> addressMask);
>Value *clampInt =  ConstantInt::get(i32Ty, CLK_ADDRESS_CLAMP);
> -  Value *isClampMode = Builder.CreateICmpEQ(addressMode,
> clampInt);
>Value *filterMask = ConstantInt::get(i32Ty, __CLK_FILTER_MASK);
> -  Value *filterMode = Builder.CreateAnd(I->getOperand(0), 
> filterMask);
>Value *nearestInt = ConstantInt::get(i32Ty, CLK_FILTER_NEAREST);
> +
> +#if LLVM_VERSION_MAJOR * 10 + LLVM_VERSION_MINOR >= 40
> +  Module *M = I->getParent()->getParent()->getParent();
> +  Value* samplerCvt = M->getOrInsertFunction("__gen_ocl_sampler_to_int", i32Ty, I->getOperand(0)->getType(), nullptr);
> +  Value *samplerVal = Builder.CreateCall(samplerCvt, {I->getOperand(0)});
> +#else
> +  Value *samplerVal = I->getOperand(0);
> +#endif
> +  Value *addressMode = Builder.CreateAnd(samplerVal, addressMask);
> +  Value *isClampMode = Builder.CreateICmpEQ(addressMode,
> clampInt);
> +  Value *filterMode = Builder.CreateAnd(samplerVal,
> +filterMask);
>Value *isNearestMode = Builder.CreateICmpEQ(filterMode,
> nearestInt);
> +
>needFixVal = Builder.CreateAnd(isClampMode, isNearestMode);
>  }
> 
> @@ -83,16 +101,31 @@ namespace gbe {
>  //  return ((sampler & CLK_NORMALIZED_COORDS_TRUE) == 0);
>  

Re: [Beignet] [PATCH v3 8/8] Implement TILE_Y large image in clEnqueueWriteImage.

2017-05-18 Thread Yang, Rong R
The patchset LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Tuesday, May 16, 2017 19:04
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH v3 8/8] Implement TILE_Y large image in
> clEnqueueWriteImage.
> 
> From: Yan Wang 
> 
> It will fail to copy data from host ptr to TILE_Y large image by memcpy.
> Use clEnqueueCopyBufferToImage to do this on GPU side.
> 
> Signed-off-by: Yan Wang 
> ---
>  src/cl_api_mem.c | 46
> ++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/src/cl_api_mem.c b/src/cl_api_mem.c index 91525b1..7b58236
> 100644
> --- a/src/cl_api_mem.c
> +++ b/src/cl_api_mem.c
> @@ -1954,6 +1954,47 @@ clEnqueueReadImage(cl_command_queue
> command_queue,
>return err;
>  }
> 
> +static cl_int
> +clEnqueueWriteImageByKernel(cl_command_queue command_queue,
> +cl_mem mem,
> +cl_bool blocking_write,
> +const size_t *porigin,
> +const size_t *pregion,
> +size_t row_pitch,
> +size_t slice_pitch,
> +const void *ptr,
> +cl_uint num_events_in_wait_list,
> +const cl_event *event_wait_list,
> +cl_event *event)
> +{
> +  cl_int err = CL_SUCCESS;
> +  struct _cl_mem_image *image = NULL;
> +  size_t region[3];
> +  size_t origin[3];
> +
> +  image = cl_mem_image(mem);
> +
> +  err = check_image_region(image, pregion, region);
> +  if (err != CL_SUCCESS)
> +return err;
> +
> +  err = check_image_origin(image, porigin, origin);
> +  if (err != CL_SUCCESS)
> +return err;
> +
> +  if (image->tmp_ker_buf)
> +clReleaseMemObject(image->tmp_ker_buf);
> +
> +  image->tmp_ker_buf = clCreateBuffer(command_queue->ctx, CL_MEM_USE_HOST_PTR, mem->size, (void*)ptr, &err);
> +  if (image->tmp_ker_buf == NULL || err != CL_SUCCESS) {
> +image->tmp_ker_buf = NULL;
> +return err;
> +  }
> +
> +  return clEnqueueCopyBufferToImage(command_queue, image->tmp_ker_buf, mem, 0, origin, region,
> +num_events_in_wait_list, event_wait_list, event);
> +}
> +
>  cl_int
>  clEnqueueWriteImage(cl_command_queue command_queue,
>  cl_mem mem,
> @@ -2039,6 +2080,11 @@ clEnqueueWriteImage(cl_command_queue
> command_queue,
>break;
>  }
> 
> +if (image->is_ker_copy) {
> +  return clEnqueueWriteImageByKernel(command_queue, mem,
> blocking_write, origin,
> +region, row_pitch, slice_pitch, ptr, num_events_in_wait_list,
> event_wait_list, event);
> +}
> +
>  err = cl_event_check_waitlist(num_events_in_wait_list, event_wait_list,
>event, command_queue->ctx);
>  if (err != CL_SUCCESS) {
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] Backend: Fix llvm40 assert about literal structs

2017-05-18 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Guo, Yejun
> Sent: Wednesday, May 17, 2017 20:57
> To: Pan, Xiuli ; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Backend: Fix llvm40 assert about literal 
> structs
> 
> Looks fine to me, thanks.
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Pan, Xiuli
> Sent: Wednesday, May 17, 2017 3:15 PM
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Backend: Fix llvm40 assert about literal 
> structs
> 
> Ping for review.
> With a debug build of LLVM, this causes an assert in the device enqueue cases.
> 
> -Original Message-
> From: Pan, Xiuli
> Sent: Tuesday, April 25, 2017 13:27
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [PATCH] Backend: Fix llvm40 assert about literal structs
> 
> From: Pan Xiuli 
> 
> In llvm literal structs have no name, so check it first.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/llvm/llvm_gen_backend.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/backend/src/llvm/llvm_gen_backend.cpp
> b/backend/src/llvm/llvm_gen_backend.cpp
> index 9954021..831666e 100644
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -362,7 +362,8 @@ namespace gbe
>  Type *eltTy = dyn_cast(type)->getElementType();
>  if (eltTy->isStructTy()) {
>StructType *strTy = dyn_cast<StructType>(eltTy);
> -  if (strTy->getName().data() && strstr(strTy->getName().data(),
> "sampler"))
> +  if (!strTy->isLiteral() && strTy->getName().data() &&
> +  strstr(strTy->getName().data(), "sampler"))
>  type = Type::getInt32Ty(value->getContext());
>  }
>}
> --
> 2.7.4
> 


Re: [Beignet] [PATCH 3/3] GLK: add geminilake runtime support.

2017-05-15 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Pan, Xiuli
> Sent: Monday, May 15, 2017 15:05
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [PATCH 3/3] GLK: add geminilake runtime support.
> 
> LGTM.
> Thanks
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Yang Rong
> Sent: Monday, May 15, 2017 16:33
> To: beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: [Beignet] [PATCH 3/3] GLK: add geminilake runtime support.
> 
> Geminilake is almost the same as bxt, except for the intel_gpgpu_read_ts_reg function.
> 
> Signed-off-by: Yang Rong <rong.r.y...@intel.com>
> ---
>  src/cl_device_data.h|  6 +++---
>  src/cl_device_id.c  | 47
> +--
>  src/intel/intel_gpgpu.c |  2 ++
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/src/cl_device_data.h b/src/cl_device_data.h index
> 39d3e2d..c3d6c45 100644
> --- a/src/cl_device_data.h
> +++ b/src/cl_device_data.h
> @@ -361,11 +361,11 @@
> 
>  #define IS_KABYLAKE(devid) (IS_KBL_GT1(devid) || IS_KBL_GT15(devid) ||
> IS_KBL_GT2(devid) || IS_KBL_GT3(devid) || IS_KBL_GT4(devid))
> 
> -#define PCI_CHIP_GLK 0x3184
> +#define PCI_CHIP_GLK_3x6 0x3184
>  #define PCI_CHIP_GLK_2x6 0x3185
> 
> -#define IS_GEMINILAKE(devid)  \
> -  (devid == PCI_CHIP_GLK ||   \
> +#define IS_GEMINILAKE(devid)  \
> +  (devid == PCI_CHIP_GLK_3x6 ||   \
> devid == PCI_CHIP_GLK_2x6)
> 
>  #define IS_GEN9(devid) (IS_SKYLAKE(devid) || IS_BROXTON(devid) ||
> IS_KABYLAKE(devid) || IS_GEMINILAKE(devid))
> diff --git a/src/cl_device_id.c b/src/cl_device_id.c index 50ed0d9..6cba2b5
> 100644
> --- a/src/cl_device_id.c
> +++ b/src/cl_device_id.c
> @@ -254,6 +254,26 @@ static struct _cl_device_id intel_kbl_gt4_device =
> {  #include "cl_gen9_device.h"
>  };
> 
> +static struct _cl_device_id intel_glk18eu_device = {
> +  .max_compute_unit = 18,
> +  .max_thread_per_unit = 6,
> +  .sub_slice_count = 3,
> +  .max_work_item_sizes = {512, 512, 512},
> +  .max_work_group_size = 512,
> +  .max_clock_frequency = 1000,
> +#include "cl_gen9_device.h"
> +};
> +
> +static struct _cl_device_id intel_glk12eu_device = {
> +  .max_compute_unit = 12,
> +  .max_thread_per_unit = 6,
> +  .sub_slice_count = 2,
> +  .max_work_item_sizes = {512, 512, 512},
> +  .max_work_group_size = 512,
> +  .max_clock_frequency = 1000,
> +#include "cl_gen9_device.h"
> +};
> +
>  LOCAL cl_device_id
>  cl_get_gt_device(cl_device_type device_type)
>  {
> @@ -737,6 +757,26 @@ kbl_gt4_break:
>cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id);
>break;
> 
> +case PCI_CHIP_GLK_3x6:
> +  DECL_INFO_STRING(glk18eu_break, intel_bxt18eu_device, name,
> +"Intel(R) HD Graphics Geminilake(3x6)");
> +glk18eu_break:
> +  intel_glk18eu_device.device_id = device_id;
> +  intel_glk18eu_device.platform = cl_get_platform_default();
> +  ret = &intel_glk18eu_device;
> +  cl_intel_platform_get_default_extension(ret);
> +  cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id);
> +  break;
> +
> +case PCI_CHIP_GLK_2x6:
> +  DECL_INFO_STRING(glk12eu_break, intel_bxt12eu_device, name,
> +"Intel(R) HD Graphics Geminilake(2x6)");
> +glk12eu_break:
> +  intel_glk12eu_device.device_id = device_id;
> +  intel_glk12eu_device.platform = cl_get_platform_default();
> +  ret = &intel_glk12eu_device;
> +  cl_intel_platform_get_default_extension(ret);
> +  cl_intel_platform_enable_extension(ret, cl_khr_fp16_ext_id);
> +  break;
> +
>  case PCI_CHIP_SANDYBRIDGE_BRIDGE:
>  case PCI_CHIP_SANDYBRIDGE_GT1:
>  case PCI_CHIP_SANDYBRIDGE_GT2:
> @@ -942,7 +982,9 @@ LOCAL cl_bool is_gen_device(cl_device_id device) {
>   device == &intel_kbl_gt15_device ||
>   device == &intel_kbl_gt2_device ||
>   device == &intel_kbl_gt3_device ||
> - device == &intel_kbl_gt4_device;
> + device == &intel_kbl_gt4_device ||
> + device == &intel_glk18eu_device ||
> + device == &intel_glk12eu_device;
>  }
> 
>  LOCAL cl_int
> @@ -1365,7 +1407,8 @@ cl_device_get_version(cl_device_id device, cl_int
> *ver)
>  || device == &intel_skl_gt3_device || device == &intel_skl_gt4_device
>  || device == &intel_bxt18eu_device || device ==
> &intel_bxt12eu_device || device == &intel_kbl_gt1_device
>  || device == &intel_kbl_gt2_device || device == &intel_kbl_gt3_device
> -|| device == &intel_kbl_gt4_device || dev

Re: [Beignet] How widespread is "Exec event...error...-5" (#98647 / #100639)?

2017-04-27 Thread Yang, Rong R
As far as we know, this issue was introduced by commit 
https://cgit.freedesktop.org/beignet/commit/?id=ff57cee0519db1287053c7c05a2cb4e9700d3334.

To clarify, this commit is not only for OpenCL 2.0; OpenCL 1.2 also needs it for 
NULL pointer checks in OpenCL kernels.
Consider this corner case:
__kernel void test(__global char* src, __global char* dst)
{
  if (src == NULL)
    return;
  ..
}
Because address 0 is legal in GEN's address space, src's GPU address may be 
NULL even though src has been set with clSetKernelArg.
The most natural solution is to not allocate address 0 in the drm 
kernel driver, but unfortunately they did not accept that.
So we try to occupy address 0 with a fake bo using 
drm_intel_bo_set_softpin_offset.
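
The idea can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use libdrm, and toy_gpu_space, toy_alloc and toy_reserve_null_page are hypothetical names for a simple bump allocator, not Beignet or kernel code. It shows why the first buffer can land at address 0, and how pinning a placeholder there keeps real buffers away from it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Toy bump allocator over a GPU-like address space where offset 0 is valid. */
struct toy_gpu_space {
  uint64_t next_free;
};

/* Hand out page-aligned offsets; the very first allocation lands at 0. */
static uint64_t toy_alloc(struct toy_gpu_space *s, size_t size)
{
  uint64_t offset = s->next_free;
  uint64_t pages;

  assert(size > 0);
  pages = ((uint64_t)size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
  s->next_free += pages * TOY_PAGE_SIZE;
  return offset;
}

/* Pin a placeholder at offset 0 so that no later buffer can receive it. */
static void toy_reserve_null_page(struct toy_gpu_space *s)
{
  if (s->next_free == 0)
    (void)toy_alloc(s, TOY_PAGE_SIZE);
}
```

Without the reservation, the first buffer gets offset 0 and a kernel-side NULL check would reject a perfectly valid argument; with it, every real buffer starts at 4096 or above.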

The fix commit 
https://cgit.freedesktop.org/beignet/commit/?id=8b04f0be372da8eabdc93d6ae1b81a3c83cba284
 is just a workaround.
It only ensures that taking address 0 causes no error when the device is created; it cannot 
ensure that later executions are error-free. But we have not found such a case.

Hi, Rebecca,

 Have you found cases that disabling HAS_BO_SET_SOFTPIN fixes but that still fail with 
commit 
https://cgit.freedesktop.org/beignet/commit/?id=8b04f0be372da8eabdc93d6ae1b81a3c83cba284?

Thanks,
Yang Rong
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Wednesday, April 26, 2017 6:42
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] How widespread is "Exec event...error...-5" (#98647 /
> #100639)?
> 
> Debian 9 (stretch)/Ubuntu 17.04 (zesty) have beignet 1.3.0, libdrm
> 2.4.74/2.4.76 and Linux 4.9/4.10.
> 
> On some hardware (possibly all of Ivy Bridge and Haswell??), this does not
> work at all: attempting to run anything fails with
> 
> drm_intel_gem_bo_context_exec() failed: Device or resource busy
> Beignet: "Exec event 0x error, type is 4592, error staus is -5"
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=98647
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=860805
> 
> As the package maintainer, I'd like to fix this.  I am aware of two fixes, 
> either
> of which works for me, but
> https://bugs.freedesktop.org/show_bug.cgi?id=100639 reports that neither
> of them is perfect:
> 
>   - The fix used in 1.4:
> https://cgit.freedesktop.org/beignet/commit/?id=8b04f0be372da8eabdc93d
> 6ae1b81a3c83cba284
> 
> - Disable HAS_BO_SET_SOFTPIN: fixes more (but still not everything), but
> also disables some functionality (OpenCL 2.0).  This is probably why the bug
> only appears in recent Linux versions, and hence was missed when I tested
> the packages in a chroot on Linux 3.16: softpin was only introduced in Linux
> 4.5.
> 
> Has anyone other than its reporter seen #100639 (i.e. this error persisting
> after applying the 1.4 fix, particularly when using multiple OpenCL kernels
> such as in clFFT)?
> 
> Any other suggestions?
> 


Re: [Beignet] Release or patch for LLVM 4.0

2017-04-13 Thread Yang, Rong R
The LLVM 4.0 patchset has been merged to master.
We could cherry-pick these patches (from 8570e5 to 919dec) onto the 
Release_1.3 branch; there are no conflicts.


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Bruno Pagani
> Sent: Tuesday, April 4, 2017 9:04
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] Release or patch for LLVM 4.0
> 
> Hi there,
> 
> LLVM 4.0 has been released recently, and its pushing in ArchLinux is currently
> blocked by Beignet amongst others (because it doesn’t build with this
> version). I’ve seen there is a patchset available at
> https://lists.freedesktop.org/archives/beignet/2017-March/008691.html,
> but I couldn’t apply it cleanly on top of latest (1.3.1) release and I 
> currently
> lack time to investigate it.
> 
> Could a release for LLVM 4.0 support happen or a patch applying on top of
> 1.3.1 be provided?
> 
> Thanks,
> Bruno



Re: [Beignet] Random error with very low prabability in Haswell platform

2017-04-13 Thread Yang, Rong R
How long does your benchmark run? Does the Linux kernel reset the GPU? You could 
run `dmesg` to get this information.

From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Song, 
Ruiling
Sent: Tuesday, April 11, 2017 9:22
To: Gao, Sanshan ; beignet@lists.freedesktop.org
Subject: Re: [Beignet] Random error with very low prabability in Haswell 
platform

Do you mean the ECC (Error Correcting Codes) on Intel GPU by “hardware 
mistakes”?
Intel GPUs have had one-bit ECC support for the L3 cache since Broadwell. For details, 
you can look at:
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol07-3d_media_gpgpu_3.pdf
I am not sure whether your problem is caused by the lack of ECC for the L3 cache on 
Haswell.
But I think it may help if you can find a Broadwell machine to do some 
testing.

Thanks!
Ruiling
From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of Gao, 
Sanshan
Sent: Friday, April 7, 2017 4:47 PM
To: beignet@lists.freedesktop.org
Subject: [Beignet] Random error with very low prabability in Haswell platform

Hi, all,

I'm using Intel Iris Pro Graphics 5200 for general-purpose computing, namely RSA 
decryption with OpenCL. However, I found that the calculated result is 
wrong with very low probability in my benchmark. In my experiments, this 
probability is about "1%". When I write a cipher message that was not 
decrypted correctly in the benchmark out to a file and decrypt it individually, the 
result is correct.

Has anyone else met a similar situation? I guess there may be a problem 
with this integrated GPGPU (i.e. a fault in the hardware 
platform rather than in the software implementation). I remember hearing such a 
conjecture before, but I ignored it because I had not met such an error.

--
Platform: Intel Iris Pro Graphics 5200, OpenCL, Beignet
Ground truth: result computed by the OpenSSL library
--

Thanks,
Sanshan




Re: [Beignet] [PATCH newRT] Wrap all memory allocate functions.

2017-03-30 Thread Yang, Rong R
Actually, you are implementing a hash table with insert/delete operations here; 
does Linux already provide such APIs?
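
One candidate is POSIX <search.h>: hsearch() offers a hash table but has no delete operation, while tsearch()/tfind()/tdelete() provide insert, lookup and delete on a binary search tree. A minimal sketch of tracking live pointers with the tree functions follows; track_alloc, track_free and the globals are hypothetical names, this is not the code from the patch, and it has no locking, whereas the patch guards its log with a mutex:

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker: the set of live pointers kept in a POSIX search tree. */
static void *live_allocs = NULL;
static size_t live_count = 0;

/* Order keys by their address value; the keys are the tracked pointers. */
static int cmp_ptr(const void *a, const void *b)
{
  uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
  return (x > y) - (x < y);
}

/* Record a pointer; returns 0 on success, -1 if the tree could not grow. */
static int track_alloc(void *p)
{
  assert(p != NULL);
  if (tfind(p, &live_allocs, cmp_ptr) != NULL)
    return 0; /* already tracked */
  if (tsearch(p, &live_allocs, cmp_ptr) == NULL)
    return -1; /* out of memory */
  live_count++;
  return 0;
}

/* Forget a pointer; returns -1 if it was never tracked (e.g. a double free). */
static int track_free(void *p)
{
  if (tfind(p, &live_allocs, cmp_ptr) == NULL)
    return -1;
  tdelete(p, &live_allocs, cmp_ptr);
  live_count--;
  return 0;
}
```

Whatever remains in the tree at exit would be the leaked allocations; storing file and line per entry, as the patch does, would need a small struct as the key instead of the bare pointer.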

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> junyan...@inbox.com
> Sent: Thursday, March 23, 2017 15:46
> To: beignet@lists.freedesktop.org
> Cc: He, Junyan 
> Subject: [Beignet] [PATCH newRT] Wrap all memory allocate functions.
> 
> From: Junyan He 
> 
> We modify all memory allocation functions in the cl_alloc file to make it
> easy to debug every memory leak point.
> 
> Signed-off-by: Junyan He 
> ---
>  src/cl_accelerator_intel.c |   4 +-
>  src/cl_alloc.c | 197 ++--
> -
>  src/cl_alloc.h |  43 +++--
>  src/cl_api.c   |   3 +-
>  src/cl_api_context.c   |   4 +-
>  src/cl_api_kernel.c|  12 +--
>  src/cl_command_queue.c |  12 +--
>  src/cl_command_queue_enqueue.c |   6 +-
>  src/cl_command_queue_gen7.c|   2 +-
>  src/cl_context.c   |  14 +--
>  src/cl_device_enqueue.c|   2 +-
>  src/cl_enqueue.c   |   6 +-
>  src/cl_event.c |  20 ++---
>  src/cl_kernel.c|  30 +++
>  src/cl_mem.c   |  28 +++---
>  src/cl_program.c   |  54 +--
>  src/cl_sampler.c   |   4 +-
>  src/cl_utils.h |   3 -
>  src/gen/cl_command_queue_gen.c |  12 +--
>  src/gen/cl_kernel_gen.c|  28 +++---
>  src/gen/cl_program_gen.c   |  12 +--
>  src/intel/intel_batchbuffer.c  |   4 +-
>  src/intel/intel_driver.c   |   8 +-
>  src/intel/intel_gpgpu.c|  18 ++--
>  src/x11/dricommon.c|   6 +-
>  25 files changed, 342 insertions(+), 190 deletions(-)
> 
> diff --git a/src/cl_accelerator_intel.c b/src/cl_accelerator_intel.c
> index ae08184..62700b2 100644
> --- a/src/cl_accelerator_intel.c
> +++ b/src/cl_accelerator_intel.c
> @@ -18,7 +18,7 @@ cl_accelerator_intel_new(cl_context ctx,
>cl_int err = CL_SUCCESS;
> 
>/* Allocate and inialize the structure itself */
> -  TRY_ALLOC(accel, CALLOC(struct _cl_accelerator_intel));
> +  TRY_ALLOC(accel, CL_CALLOC(1, sizeof(struct _cl_accelerator_intel)));
>CL_OBJECT_INIT_BASE(accel, CL_OBJECT_ACCELERATOR_INTEL_MAGIC);
> 
>if (accel_type != CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL) {
> @@ -81,5 +81,5 @@ cl_accelerator_intel_delete(cl_accelerator_intel accel)
> 
>cl_context_delete(accel->ctx);
>CL_OBJECT_DESTROY_BASE(accel);
> -  cl_free(accel);
> +  CL_FREE(accel);
>  }
> diff --git a/src/cl_alloc.c b/src/cl_alloc.c
> index e532569..b9ac853 100644
> --- a/src/cl_alloc.c
> +++ b/src/cl_alloc.c
> @@ -1,4 +1,4 @@
> -/*
> +/*
>   * Copyright © 2012 Intel Corporation
>   *
>   * This library is free software; you can redistribute it and/or
> @@ -14,75 +14,204 @@
>   * You should have received a copy of the GNU Lesser General Public
>   * License along with this library. If not, see 
> .
>   *
> - * Author: Benjamin Segovia 
>   */
> -
>  #include "cl_alloc.h"
>  #include "cl_utils.h"
> -
> +#include "cl_device_id.h"
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +
> +#ifdef CL_ALLOC_DEBUG
> +
> +static pthread_mutex_t cl_alloc_log_lock;
> +#define MAX_ALLOC_LOG_NUM 1024 * 1024
> +static unsigned int cl_alloc_log_num;
> +
> +typedef struct _cl_alloc_log_item {
> +  void *ptr;
> +  size_t size;
> +  char *file;
> +  int line;
> +} _cl_alloc_log_item;
> +typedef struct _cl_alloc_log_item *cl_alloc_log_item;
> +
> +#define ALLOC_LOG_BUCKET_SZ 128
> +static cl_alloc_log_item *cl_alloc_log_map[ALLOC_LOG_BUCKET_SZ];
> +static int cl_alloc_log_map_size[ALLOC_LOG_BUCKET_SZ];
> +
> +LOCAL void cl_alloc_debug_init(void)
> +{
> +  static int inited = 0;
> +  int i;
> +  if (inited)
> +return;
> +
> +  pthread_mutex_init(&cl_alloc_log_lock, NULL);
> +
> +  for (i = 0; i < ALLOC_LOG_BUCKET_SZ; i++) {
> +cl_alloc_log_map_size[i] = 128;
> +cl_alloc_log_map[i] = malloc(cl_alloc_log_map_size[i] *
> sizeof(cl_alloc_log_item));
> +memset(cl_alloc_log_map[i], 0, cl_alloc_log_map_size[i] *
> sizeof(cl_alloc_log_item));
> +  }
> +  cl_alloc_log_num = 0;
> 
> -static volatile int32_t cl_alloc_n = 0;
> +  atexit(cl_alloc_report_unfreed);
> +  inited = 1;
> +}
> 
> -LOCAL void*
> -cl_malloc(size_t sz)
> +static void insert_alloc_log_item(void *ptr, size_t sz, char *file, int line)
>  {
> -  void * p = NULL;
> -  atomic_inc(&cl_alloc_n);
> -  p = malloc(sz);
> +  cl_long slot;
> +  int i;
> +
> +  if (cl_alloc_log_num > MAX_ALLOC_LOG_NUM) {
> +// To many alloc without free. We consider already leaks a lot.
> +cl_alloc_report_unfreed();
> +assert(0);
> +  }
> +
> +  slot = (cl_long)ptr;
> +  slot = (slot >> 5) & 0x07f;
> +  assert(slot < ALLOC_LOG_BUCKET_SZ);
> +
> +  cl_alloc_log_item it = 

Re: [Beignet] Limit get_program_global_data() calls to OpenCL 2.0

2017-03-23 Thread Yang, Rong R
The workaround LGTM, pushed, thanks.

BTW, please add the Signed-off-by line with `git format-patch -s` next 
time.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Jan Beich
> Sent: Thursday, March 16, 2017 18:13
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] Limit get_program_global_data() calls to OpenCL 2.0
> 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217635
> ---
>  src/cl_program.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/cl_program.c b/src/cl_program.c index 363aed5d..bb96d98f
> 100644
> --- a/src/cl_program.c
> +++ b/src/cl_program.c
> @@ -675,7 +675,8 @@ cl_program_build(cl_program p, const char *options)
>  memcpy(p->bin + copyed, interp_kernel_get_code(opaque), sz);
>  copyed += sz;
>}
> -  if ((err = get_program_global_data(p)) != CL_SUCCESS)
> +  uint32_t ocl_version =
> + interp_kernel_get_ocl_version(interp_program_get_kernel(p->opaque,
> + 0));  if (ocl_version >= 200 && (err = get_program_global_data(p)) !=
> + CL_SUCCESS)
>  goto error;
> 
>p->is_built = 1;
> @@ -784,7 +785,8 @@ cl_program_link(cl_contextcontext,
>  copyed += sz;
>}
> 
> -  if ((err = get_program_global_data(p)) != CL_SUCCESS)
> +  uint32_t ocl_version =
> + interp_kernel_get_ocl_version(interp_program_get_kernel(p->opaque,
> + 0));  if (ocl_version >= 200 && (err = get_program_global_data(p)) !=
> + CL_SUCCESS)
>  goto error;
> 
>  done:
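
A minimal C++ sketch of the version gate in the patch above, for readers who
want to see the short-circuit in isolation. The helper name
loadGlobalDataIfNeeded and its callback parameter are hypothetical stand-ins
for get_program_global_data(p); only the `ocl_version >= 200` condition is
taken from the patch.

```cpp
#include <cstdint>

// CL_SUCCESS is 0 in the OpenCL headers; redefined here so the sketch
// builds without them.
const int CL_SUCCESS = 0;

// Hypothetical helper: run the global-data step only for OpenCL 2.0
// kernels, and report success without calling it otherwise.
int loadGlobalDataIfNeeded(uint32_t oclVersion, int (*getGlobalData)()) {
  if (oclVersion >= 200)
    return getGlobalData();   // 2.0 path: propagate the callee's status
  return CL_SUCCESS;          // 1.x path: the call is skipped entirely
}
```

With a 1.2 kernel (version 120) the callback is never invoked, which is the
behaviour the workaround relies on.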


Re: [Beignet] [Patch V2 2/3] fix regression on pre-BDW platform.

2017-03-23 Thread Yang, Rong R
The patchset LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> xionghu@intel.com
> Sent: Monday, March 20, 2017 22:38
> To: beignet@lists.freedesktop.org
> Cc: Luo, Xionghu 
> Subject: [Beignet] [Patch V2 2/3] fix regression on pre-BDW platform.
> 
> From: Luo Xionghu 
> 
> ivb/hsw will split the 32X32 into two simd8 instructions, and noMask
> instructions are introduced there; the if-opt pass shouldn't change the
> predicate state for noMask instructions.
> 
> v2: fix typo.
> Signed-off-by: Luo Xionghu 
> ---
>  backend/src/backend/gen_insn_selection_if_opt.cpp | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/backend/src/backend/gen_insn_selection_if_opt.cpp
> b/backend/src/backend/gen_insn_selection_if_opt.cpp
> index a99b465..eff42b9 100644
> --- a/backend/src/backend/gen_insn_selection_if_opt.cpp
> +++ b/backend/src/backend/gen_insn_selection_if_opt.cpp
> @@ -80,9 +80,13 @@ namespace gbe
>optimized = true;
>  } else {
>if (if_find) {
> -insn.state.predicate = GEN_PREDICATE_NORMAL;
> -insn.state.flag = 0;
> -insn.state.subFlag = 1;
> +if (insn.state.noMask == 1)
> +  insn.state.predicate = GEN_PREDICATE_NONE;
> +else {
> +  insn.state.predicate = GEN_PREDICATE_NORMAL;
> +  insn.state.flag = 0;
> +  insn.state.subFlag = 1;
> +}
>}
>++iter;
>  }
> --
> 2.5.0
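
The predicate choice made by the patch above can be sketched in isolation as
follows. InsnState and the two enum values are simplified, hypothetical
stand-ins for the backend's instruction state; the branch structure mirrors
the added lines.

```cpp
// Hypothetical stand-ins for the backend's predicate constants.
enum Predicate { GEN_PREDICATE_NONE, GEN_PREDICATE_NORMAL };

// Only the fields touched by the patch are modelled.
struct InsnState {
  int noMask;
  Predicate predicate;
  int flag;
  int subFlag;
};

// noMask instructions stay unpredicated; everything else is predicated
// on flag 0, subFlag 1, as before the patch.
void applyIfPredicate(InsnState &state) {
  if (state.noMask == 1) {
    state.predicate = GEN_PREDICATE_NONE;
  } else {
    state.predicate = GEN_PREDICATE_NORMAL;
    state.flag = 0;
    state.subFlag = 1;
  }
}
```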


Re: [Beignet] [PATCHv2] Properly check return value from __cxa_demangle

2017-03-23 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Jan Beich
> Sent: Friday, March 17, 2017 22:16
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCHv2] Properly check return value from
> __cxa_demangle
> 
> FreeBSD uses libcxxrt (via libc++) instead of GNU libiberty (via
> libstdc++) for __cxa_demangle(). When *output_buffer* and *length*
> both are NULL it doesn't modify *status* on success. Rather than rely on
> maybe uninitialized variable check the function doesn't return NULL.
> 
> Fixes:https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213732
> Reviewed-by:  Pan Xiuli 
> ---
>  backend/src/llvm/llvm_gen_backend.hpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/backend/src/llvm/llvm_gen_backend.hpp
> b/backend/src/llvm/llvm_gen_backend.hpp
> index 1ab77c9d..ae486c5e 100644
> --- a/backend/src/llvm/llvm_gen_backend.hpp
> +++ b/backend/src/llvm/llvm_gen_backend.hpp
> @@ -82,9 +82,9 @@ namespace gbe
>auto it = map.find(symbol);
> 
>if (it == map.end()) {
> -int status;
> +int status = 0; /* set for libcxxrt */
>  char *realName = abi::__cxa_demangle(symbol.c_str(), NULL, NULL,
> &status);
> -if (status == 0) {
> +if (realName) {
>std::string realFnName(realName), stripName;
>stripName = realFnName.substr(0, realFnName.find("("));
>it = map.find(stripName);
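
A self-contained sketch of the same check, assuming a GCC- or Clang-style
C++ runtime that provides <cxxabi.h>. The function name strippedName is
hypothetical; the point is that the returned pointer, not status, decides
whether demangling succeeded.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle
#include <cstdlib>    // std::free
#include <string>

// Returns the demangled name with its argument list stripped, or the
// input unchanged when demangling fails.
std::string strippedName(const std::string &symbol) {
  int status = 0;  // initialised in case the runtime leaves it untouched
  char *realName =
      abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
  if (realName == nullptr)   // test the pointer rather than status
    return symbol;
  std::string full(realName);
  std::free(realName);       // the buffer is malloc()ed by the runtime
  return full.substr(0, full.find("("));
}
```

Unlike the quoted hunk, the sketch also frees the returned buffer, since the
caller owns it when no output buffer is passed in.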


Re: [Beignet] [PATCH V2] CMAKE: Refine builtin kernel bin generator

2017-03-07 Thread Yang, Rong R
Pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Thursday, March 2, 2017 11:34
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH V2] CMAKE: Refine builtin kernel bin generator
> 
> From: Pan Xiuli 
> 
> Move the generated builtin str and bin files into the Cmake build directory to
> avoid chaos when changing LLVM.
> V2: Fix a bug that the builtin.cl was not written into build dir.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  src/CMakeLists.txt | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index f3c4632..77a1c87
> 100644
> --- a/src/CMakeLists.txt
> +++ b/src/CMakeLists.txt
> @@ -7,10 +7,10 @@ include_directories(${CMAKE_CURRENT_SOURCE_DIR}
>  ${OPENGL_INCLUDE_DIRS}
>  ${EGL_INCLUDE_DIRS})
> 
> -macro (MakeKernelBinStr KERNEL_PATH KERNEL_FILES)
> +macro (MakeKernelBinStr KERNEL_DIST KERNEL_SOURCE KERNEL_FILES)
>  foreach (KF ${KERNEL_FILES})
> -  set (input_file ${KERNEL_PATH}/${KF}.cl)
> -  set (output_file ${KERNEL_PATH}/${KF}_str.c)
> +  set (input_file ${KERNEL_SOURCE}/${KF}.cl)  set (output_file
> + ${KERNEL_DIST}/${KF}_str.c)
>list (APPEND KERNEL_STR_FILES ${output_file})
>list (GET GBE_BIN_GENERATER -1 GBE_BIN_FILE)
>if(GEN_PCI_ID)
> @@ -34,7 +34,7 @@ macro (MakeBuiltInKernelStr KERNEL_PATH
> KERNEL_FILES)
>set (file_content)
>file (REMOVE ${output_file})
>foreach (KF ${KERNEL_NAMES})
> -set (input_file ${KERNEL_PATH}/${KF}.cl)
> +set (input_file ${CMAKE_CURRENT_SOURCE_DIR}/kernels/${KF}.cl)
>  file(READ ${input_file} file_content )
>  STRING(REGEX REPLACE ";" ";" file_content "${file_content}")
>  file(APPEND ${output_file} ${file_content}) @@ -60,9 +60,9 @@
> cl_internal_fill_image_1d_array cl_internal_fill_image_2d
> cl_internal_fill_image_2d_array cl_internal_fill_image_3d
>  cl_internal_block_motion_estimate_intel)
>  set (BUILT_IN_NAME  cl_internal_built_in_kernel) -MakeBuiltInKernelStr
> ("${CMAKE_CURRENT_SOURCE_DIR}/kernels/" "${KERNEL_NAMES}") -
> MakeKernelBinStr ("${CMAKE_CURRENT_SOURCE_DIR}/kernels/"
> "${KERNEL_NAMES}") -MakeKernelBinStr
> ("${CMAKE_CURRENT_SOURCE_DIR}/kernels/" "${BUILT_IN_NAME}")
> +MakeBuiltInKernelStr ("${CMAKE_CURRENT_BINARY_DIR}/kernels/"
> +"${KERNEL_NAMES}") MakeKernelBinStr
> +("${CMAKE_CURRENT_BINARY_DIR}/kernels/"
> +"${CMAKE_CURRENT_SOURCE_DIR}/kernels/" "${KERNEL_NAMES}")
> +MakeKernelBinStr ("${CMAKE_CURRENT_BINARY_DIR}/kernels/"
> +"${CMAKE_CURRENT_BINARY_DIR}/kernels/" "${BUILT_IN_NAME}")
> 
>  set(OPENCL_SRC
>  ${KERNEL_STR_FILES}
> --
> 2.7.4


Re: [Beignet] [PATCH 1/4] Backend: Add missing Unaligned OWord Block Read disasm

2017-03-07 Thread Yang, Rong R
This patch LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Tuesday, March 7, 2017 12:35
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH 1/4] Backend: Add missing Unaligned OWord Block
> Read disasm
> 
> From: Pan Xiuli 
> 
> The Unaligned OWord Block Read disasm is currently missing; handle it
> together with OWord Block Read.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/backend/gen/gen_mesa_disasm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/backend/src/backend/gen/gen_mesa_disasm.c
> b/backend/src/backend/gen/gen_mesa_disasm.c
> index 56fda89..8a2afe5 100644
> --- a/backend/src/backend/gen/gen_mesa_disasm.c
> +++ b/backend/src/backend/gen/gen_mesa_disasm.c
> @@ -1546,7 +1546,7 @@ int gen_disasm (FILE *file, const void *inst, uint32_t
> deviceID, uint32_t compac
> 
> data_port_data_cache_byte_scattered_simd_mode[BYTE_RW_SIMD_MOD
> E(inst)],
> data_port_data_cache_category[UNTYPED_RW_CATEGORY(inst)],
> 
> data_port_data_cache_msg_type[UNTYPED_RW_MSG_TYPE(inst)]);
> -else if(UNTYPED_RW_MSG_TYPE(inst) == 0 ||
> UNTYPED_RW_MSG_TYPE(inst) == 8)
> +else if(UNTYPED_RW_MSG_TYPE(inst) == 0 ||
> UNTYPED_RW_MSG_TYPE(inst) == 1 || UNTYPED_RW_MSG_TYPE(inst) == 8)
>format(file, " (bti: %d, data size: %s, %s, %s)",
> UNTYPED_RW_BTI(inst),
> 
> data_port_data_cache_block_size[OWORD_RW_BLOCK_SIZE(inst)],
> --
> 2.7.4


Re: [Beignet] [PATCH 7/7] Backend: for BDW and after, According to BSpec no need to split CMP when src is DW DF

2017-03-07 Thread Yang, Rong R
There is a build error:
error: no ‘bool gbe::Gen8Encoder::needToSplitCmpBySrcType(gbe::GenEncoder*, 
gbe::GenRegister, gbe::GenRegister)’ member function declared in class 
‘gbe::Gen8Encoder’
 bool Gen8Encoder::needToSplitCmpBySrcType(GenEncoder *p, GenRegister src0, 
GenRegister src1) {

Adding the needToSplitCmpBySrcType declaration to the Gen8Encoder header file
fixes it.

I have added it manually and pushed, thanks.
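
For reference, a reduced sketch of what the missing declaration looks like.
GenRegister and the GEN_TYPE_* values are simplified, hypothetical stand-ins
(the real ones live in the backend headers), and the final `return false` in
the base class is an assumption, since the quoted hunk does not show the end
of that function.

```cpp
// Stand-in for gbe::GenRegister; only the field used below is modelled.
struct GenRegister { int type; };

// Illustrative values only; the names follow the quoted patch.
enum { GEN_TYPE_UD = 0, GEN_TYPE_D = 1, GEN_TYPE_W = 3, GEN_TYPE_F = 7 };

class GenEncoder {
public:
  virtual ~GenEncoder() {}
  virtual bool needToSplitCmpBySrcType(GenEncoder *p, GenRegister src0,
                                       GenRegister src1);
};

class Gen8Encoder : public GenEncoder {
public:
  // Without this in-class declaration, the out-of-class definition
  // below is rejected with the error quoted above.
  virtual bool needToSplitCmpBySrcType(GenEncoder *p, GenRegister src0,
                                       GenRegister src1);
};

// Base rule from the patch: split CMP when a source is D, UD or F.
bool GenEncoder::needToSplitCmpBySrcType(GenEncoder *, GenRegister src0,
                                         GenRegister src1) {
  if (src0.type == GEN_TYPE_D || src0.type == GEN_TYPE_UD ||
      src0.type == GEN_TYPE_F)
    return true;
  if (src1.type == GEN_TYPE_D || src1.type == GEN_TYPE_UD ||
      src1.type == GEN_TYPE_F)
    return true;
  return false;  // assumed fall-through, not shown in the hunk
}

// BDW and later: no split needed for these source types.
bool Gen8Encoder::needToSplitCmpBySrcType(GenEncoder *, GenRegister,
                                          GenRegister) {
  return false;
}
```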

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Tuesday, March 7, 2017 13:54
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH 7/7] Backend: for BDW and after, According to
> BSpec no need to split CMP when src is DW DF
> 
> LGTM
> 
> Ruiling
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander
> > Sent: Tuesday, March 7, 2017 10:25 AM
> > To: beig...@freedesktop.org
> > Cc: Wang, Rander 
> > Subject: [Beignet] [PATCH 7/7] Backend: for BDW and after, According
> > to BSpec no need to split CMP when src is DW DF
> >
> > Signed-off-by: rander 
> > ---
> >  backend/src/backend/gen8_encoder.cpp | 5 +
> > backend/src/backend/gen_encoder.cpp  | 4 
> > backend/src/backend/gen_encoder.hpp  | 1 +
> >  3 files changed, 10 insertions(+)
> >
> > diff --git a/backend/src/backend/gen8_encoder.cpp
> > b/backend/src/backend/gen8_encoder.cpp
> > index a33fbac..cbee83f 100644
> > --- a/backend/src/backend/gen8_encoder.cpp
> > +++ b/backend/src/backend/gen8_encoder.cpp
> > @@ -883,4 +883,9 @@ namespace gbe
> > msg_length,
> > response_length);
> > }
> > +
> > +/* for BDW and after, no need to split CMP when src is DW*/
> > +bool Gen8Encoder::needToSplitCmpBySrcType(GenEncoder *p,
> > + GenRegister
> > src0, GenRegister src1) {
> > +  return false;
> > +}
> >  } /* End of the name space. */
> > diff --git a/backend/src/backend/gen_encoder.cpp
> > b/backend/src/backend/gen_encoder.cpp
> > index 03ce0e2..217a2d8 100644
> > --- a/backend/src/backend/gen_encoder.cpp
> > +++ b/backend/src/backend/gen_encoder.cpp
> > @@ -192,6 +192,10 @@ namespace gbe
> >  if (isSrcDstDiffSpan(dst, src0) == true) return true;
> >  if (isSrcDstDiffSpan(dst, src1) == true) return true;
> >
> > +return p->needToSplitCmpBySrcType(p, src0, src1);  }
> > +
> > +  bool GenEncoder::needToSplitCmpBySrcType(GenEncoder *p,
> GenRegister
> > src0, GenRegister src1) {
> >  if (src0.type == GEN_TYPE_D || src0.type == GEN_TYPE_UD ||
> > src0.type ==
> > GEN_TYPE_F)
> >return true;
> >  if (src1.type == GEN_TYPE_D || src1.type == GEN_TYPE_UD ||
> > src1.type ==
> > GEN_TYPE_F)
> > diff --git a/backend/src/backend/gen_encoder.hpp
> > b/backend/src/backend/gen_encoder.hpp
> > index 3e45c81..040b94a 100644
> > --- a/backend/src/backend/gen_encoder.hpp
> > +++ b/backend/src/backend/gen_encoder.hpp
> > @@ -162,6 +162,7 @@ namespace gbe
> >  void BRD(GenRegister src);
> >  /*! Compare instructions */
> >  void CMP(uint32_t conditional, GenRegister src0, GenRegister
> > src1, GenRegister dst = GenRegister::null());
> > +virtual bool needToSplitCmpBySrcType(GenEncoder *p, GenRegister
> > + src0,
> > GenRegister src1);
> >  /*! Select with embedded compare (like sel.le ...) */
> >  void SEL_CMP(uint32_t conditional, GenRegister dst, GenRegister
> > src0, GenRegister src1);
> >  /*! EOT is used to finish GPGPU threads */
> > --
> > 2.7.4


Re: [Beignet] [PATCH 2/2] Backend: refine the geometry function

2017-03-07 Thread Yang, Rong R
Pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Tuesday, March 7, 2017 13:54
> To: Wang, Rander ; beignet@lists.freedesktop.org
> Cc: Wang, Rander 
> Subject: Re: [Beignet] [PATCH 2/2] Backend: refine the geometry function
> 
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of rander
> > Sent: Monday, March 6, 2017 10:20 AM
> > To: beignet@lists.freedesktop.org
> > Cc: Wang, Rander 
> > Subject: [Beignet] [PATCH 2/2] Backend: refine the geometry function
> >
> > Signed-off-by: rander 
> > ---
> >  backend/src/libocl/src/ocl_geometric.cl | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/backend/src/libocl/src/ocl_geometric.cl
> > b/backend/src/libocl/src/ocl_geometric.cl
> > index af39ed3..1f66daa 100644
> > --- a/backend/src/libocl/src/ocl_geometric.cl
> > +++ b/backend/src/libocl/src/ocl_geometric.cl
> > @@ -100,10 +100,10 @@ OVERLOADABLE float fast_length(float x) { return
> > __gen_ocl_fabs(x); }  OVERLOADABLE float fast_length(float2 x) {
> > return sqrt(dot(x,x)); }  OVERLOADABLE float fast_length(float3 x) {
> > return sqrt(dot(x,x)); }  OVERLOADABLE float fast_length(float4 x) {
> > return sqrt(dot(x,x)); } -OVERLOADABLE float fast_distance(float x,
> > float y) { return length(x-y); } -OVERLOADABLE float
> > fast_distance(float2 x, float2 y) { return length(x-y); }
> > -OVERLOADABLE float fast_distance(float3 x, float3 y) { return
> > length(x-y); } -OVERLOADABLE float fast_distance(float4 x, float4 y) {
> > return length(x-y); }
> > +OVERLOADABLE float fast_distance(float x, float y) { return
> > +fast_length(x-y); } OVERLOADABLE float fast_distance(float2 x, float2
> > +y) { return fast_length(x-y); } OVERLOADABLE float
> > +fast_distance(float3 x, float3 y) { return fast_length(x-y); }
> > +OVERLOADABLE float fast_distance(float4 x, float4 y) { return
> > +fast_length(x-y); }
> >  OVERLOADABLE float fast_normalize(float x) { return x > 0 ? 1.f : (x < 0 ? 
> > -
> 1.f :
> > 0.f); }
> >  OVERLOADABLE float2 fast_normalize(float2 x) { return x *
> > rsqrt(dot(x, x)); }  OVERLOADABLE float3 fast_normalize(float3 x) {
> > return x * rsqrt(dot(x, x)); }
> > --
> > 2.7.4
> 
> The patch looks good.
> 
> - Ruiling


Re: [Beignet] [PATCH RFC] Add AppStream metadata

2017-02-28 Thread Yang, Rong R
metadata_license is just the license for this XML, right?
If so, MIT is OK with me. What is your opinion?
I have no problem with the id section.

open(os.path.join(source_directory,"src/cl_device_data.h"),"r",encoding='utf-8')
raises an error in Python 2.x:
TypeError: 'encoding' is an invalid keyword argument for this function
Adding `from io import open` fixes it.

The other parts LGTM.
Thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Monday, January 16, 2017 7:18
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH RFC] Add AppStream metadata
> 
> AppStream is a standard for software metadata, including what hardware a
> driver supports:
> https://www.freedesktop.org/software/appstream/docs/chap-
> Metadata.html
> 
> Signed-off-by: Rebecca N. Palmer 
> ---
> Before this is pushed, metadata_license needs to be filled in:
> AppStream prefer a permissive license (such as CC0-1.0 or MIT) for metadata
> to allow it to easily be combined into a distribution-wide file, and I hereby
> agree to this for the contents of this patch, but as the supported hardware
> list is extracted from the source (to make sure it is kept up to date), this
> might also need your agreement.
> 
>  recommends the reverseddomainname.softwarename format simply
> as a convenient way to ensure it is unique: it need not be related to .  
> If
> you prefer to use one of your other domains (e.g.
> org.01.beignet) you need to change all the places it appears in this patch,
> including the filename.
> 
> diff --git a/CMakeLists.txt b/CMakeLists.txt index a24ccb9..aa9a32d 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -37,6 +37,7 @@ INCLUDE (GNUInstallDirs OPTIONAL)  # support old
> CMake without GNUInstallDirs  if (NOT CMAKE_INSTALL_FULL_LIBDIR)
>set (CMAKE_INSTALL_FULL_LIBDIR "${CMAKE_INSTALL_PREFIX}/lib")
> +  set (CMAKE_INSTALL_FULL_DATADIR "${CMAKE_INSTALL_PREFIX}/share")
>set (BEIGNET_LIBRARY_ARCHITECTURE "")  else (NOT
> CMAKE_INSTALL_FULL_LIBDIR)
>set (BEIGNET_LIBRARY_ARCHITECTURE
> "${CMAKE_LIBRARY_ARCHITECTURE}") @@ -317,6 +318,10 @@
> IF(BUILD_EXAMPLES)
>  ADD_SUBDIRECTORY(examples)
>  ENDIF(BUILD_EXAMPLES)
> 
> +add_custom_target(metainfo ALL
> +  COMMAND ${PYTHON_EXECUTABLE}
> +${CMAKE_CURRENT_SOURCE_DIR}/update_metainfo_xml.py
> +"${LIBCL_DRIVER_VERSION_MAJOR}.${LIBCL_DRIVER_VERSION_MINOR}.${
> LIBCL_DR
> +IVER_VERSION_PATCH}" ${CMAKE_CURRENT_BINARY_DIR}) install (FILES
> +${CMAKE_CURRENT_BINARY_DIR}/com.intel.beignet.metainfo.xml
> DESTINATION
> +${CMAKE_INSTALL_FULL_DATADIR}/metainfo)
> +
>  SET(CPACK_SET_DESTDIR ON)
>  SET(CPACK_PACKAGE_VERSION_MAJOR
> "${LIBCL_DRIVER_VERSION_MAJOR}")
> SET(CPACK_PACKAGE_VERSION_MINOR
> "${LIBCL_DRIVER_VERSION_MINOR}") diff --git
> a/com.intel.beignet.metainfo.xml.in b/com.intel.beignet.metainfo.xml.in
> new file mode 100644
> index 000..66b74d0
> --- /dev/null
> +++ b/com.intel.beignet.metainfo.xml.in
> @@ -0,0 +1,18 @@
> +
> +
> +com.intel.beignet
> +Beignet
> +OpenCL (GPU compute) driver for Intel GPUs
> +This allows using Intel integrated GPUs for general
> +computation, speeding up some applications. 
> +@modalias_list@  
> +LGPL-2.1+
> + +type="homepage">https://www.freedesktop.org/wiki/Software/Beignet/
>  +>  +type="bugtracker">https://bugs.freedesktop.org/buglist.cgi?product=Beig
> +net=Beignet=---
> +Intel
> +
> +  
> diff --git a/update_metainfo_xml.py b/update_metainfo_xml.py new file
> mode 100755 index 000..7d5278c
> --- /dev/null
> +++ b/update_metainfo_xml.py
> @@ -0,0 +1,31 @@
> +#!/usr/bin/python
> +
> +import re
> +import sys
> +import os.path
> +
> +if len(sys.argv) != 3:
> +raise TypeError("requires version_string and output_directory")
> +version_string = sys.argv[1] output_directory = sys.argv[2]
> +source_directory = os.path.dirname(sys.argv[0]) source_file =
> +open(os.path.join(source_directory,"src/cl_device_data.h"),"r",encoding
> += 'utf-8') device_ids = [] supported = False # first few devices in the
> +file aren't supported for line in source_file:
> +device_id = re.match(r"#define\s+PCI_CHIP_([A-Za-z0-9_]+)\s+0x([0-9A-
> Fa-f]+)",line)
> +if device_id is None:
> +continue
> +if "IVYBRIDGE" in device_id.group(1):
> +supported = True # start of supported devices
> +if supported:
> +device_ids.append(device_id.group(2).upper())
> +source_file.close()
> +modalias_list_string =
> +"\n".join("pci:v8086d{}*".format(device_i
> d
> +) for device_id in sorted(device_ids)) metadata_file_in =
> +open(os.path.join(source_directory,"com.intel.beignet.metainfo.xml.in")
> +,"r",encoding = 'utf-8') metadata_string = metadata_file_in.read()
> +metadata_file_in.close()
> +metadata_string =
> +metadata_string.replace("@modalias_list@",modalias_list_string).replace
> +("@version@",version_string) metadata_file_out =
> 

Re: [Beignet] [PATCH] Backend: Fix a selection ir optimization bug

2017-02-27 Thread Yang, Rong R
LGTM, will push it later, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Monday, February 27, 2017 11:16
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH] Backend: Fix a selection ir optimization bug
> 
> From: Pan Xiuli 
> 
> We used to check for unpacked instructions, but we will also ignore some
> patterns like:
> 
> MOV %1, %2.1
> MUL %4, %3, %1
> ==>
> MUL %4, %3, %2.1
> 
> Add more checks to keep this kind of optimization.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/backend/gen_insn_selection_optimize.cpp | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/backend/src/backend/gen_insn_selection_optimize.cpp
> b/backend/src/backend/gen_insn_selection_optimize.cpp
> index 512a5bd..d2e0fb9 100644
> --- a/backend/src/backend/gen_insn_selection_optimize.cpp
> +++ b/backend/src/backend/gen_insn_selection_optimize.cpp
> @@ -162,7 +162,10 @@ namespace gbe
>  assert(insn.opcode == SEL_OP_MOV);
>  const GenRegister& src = insn.src(0);
>  const GenRegister& dst = insn.dst(0);
> -if (src.type != dst.type || src.file != dst.file || src.hstride != 
> dst.hstride)
> +if (src.type != dst.type || src.file != dst.file)
> +  return;
> +
> +if (src.hstride != GEN_HORIZONTAL_STRIDE_0 && src.hstride !=
> + dst.hstride )
>return;
> 
>  if (liveout.find(dst.reg()) != liveout.end())
> --
> 2.7.4
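
The relaxed condition can be sketched on its own as below. Reg, the stride
constants and the function name are hypothetical simplifications; the real
function goes on to further checks (such as the liveout test visible in the
hunk) that are not modelled here.

```cpp
// Illustrative stride encodings; the names follow the quoted patch.
enum { GEN_HORIZONTAL_STRIDE_0 = 0, GEN_HORIZONTAL_STRIDE_1 = 1,
       GEN_HORIZONTAL_STRIDE_2 = 2 };

// Only the register fields compared by the patch are modelled.
struct Reg { int type; int file; int hstride; };

// True when a MOV passes the first two checks of the patched code:
// same type and file, and either a stride-0 (scalar) source or
// matching horizontal strides.
bool passesMovChecks(const Reg &src, const Reg &dst) {
  if (src.type != dst.type || src.file != dst.file)
    return false;
  if (src.hstride != GEN_HORIZONTAL_STRIDE_0 && src.hstride != dst.hstride)
    return false;
  return true;
}
```

Before the patch, a stride-0 source feeding a stride-1 destination was
rejected; with the relaxed test it is accepted, which appears to be the
pattern the commit message wants to keep optimizing.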


Re: [Beignet] [PATCH] MAD compact instrcution could not support "absolute" attribute.

2017-02-23 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Pan, Xiuli
> Sent: Thursday, February 23, 2017 16:34
> To: yan.w...@linux.intel.com; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] MAD compact instrcution could not support
> "absolute" attribute.
> 
> LGTM.
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, February 23, 2017 4:20 PM
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang 
> Subject: [Beignet] [PATCH] MAD compact instrcution could not support
> "absolute" attribute.
> 
> From: Yan Wang 
> 
> If the absolute attribute of any source of a MAD instruction is 1, don't
> use the compact instruction.
> 
> Signed-off-by: Yan Wang 
> ---
>  backend/src/backend/gen_insn_compact.cpp | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/backend/src/backend/gen_insn_compact.cpp
> b/backend/src/backend/gen_insn_compact.cpp
> index 62fcb61..22305f7 100644
> --- a/backend/src/backend/gen_insn_compact.cpp
> +++ b/backend/src/backend/gen_insn_compact.cpp
> @@ -804,6 +804,8 @@ namespace gbe {
>  if( control_index == -1) return false;
>  if( src0.negation + src1.negation + src2.negation > 1)
>return false;
> +if( src0.absolute + src1.absolute + src2.absolute > 0)
> +  return false;
> 
>  GenCompactInstruction *insn = p->nextCompact(opcode);
>  insn->src3Insn.bits1.control_index = control_index;
> --
> 2.7.4
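
A reduced sketch of the two operand checks after this patch. Src3 and
canCompactMad are hypothetical names, and the control_index lookup that
precedes these checks in the real code is left out.

```cpp
// Only the source modifiers examined by the patch are modelled.
struct Src3 { int negation; int absolute; };

// A three-source instruction is compactable only if at most one source
// is negated and no source uses the absolute-value modifier.
bool canCompactMad(const Src3 &s0, const Src3 &s1, const Src3 &s2) {
  if (s0.negation + s1.negation + s2.negation > 1)
    return false;
  if (s0.absolute + s1.absolute + s2.absolute > 0)
    return false;   // check added by the patch
  return true;
}
```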


Re: [Beignet] [PATCH] Enable support for two-component 16-bit planes

2017-02-13 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Mark Thompson
> Sent: Saturday, February 11, 2017 21:50
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Enable support for two-component 16-bit planes
> 
> This is needed to support the chroma plane of P010 surfaces being mapped
> from VAAPI.
> 
> Signed-off-by: Mark Thompson 
> ---
> On 01/01/17 13:46, Mark Thompson wrote:
> > ...
> 
> Ping.
> 
> 
>  src/cl_image.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/cl_image.c b/src/cl_image.c index d0593044..5ff459a0 100644
> --- a/src/cl_image.c
> +++ b/src/cl_image.c
> @@ -144,7 +144,9 @@ cl_image_get_intel_format(const cl_image_format
> *fmt)
>  case CL_RG:
>switch (type) {
>  case CL_UNORM_INT8: return I965_SURFACEFORMAT_R8G8_UNORM;
> +case CL_UNORM_INT16:return
> I965_SURFACEFORMAT_R16G16_UNORM;
>  case CL_UNSIGNED_INT8:  return I965_SURFACEFORMAT_R8G8_UINT;
> +case CL_UNSIGNED_INT16: return
> I965_SURFACEFORMAT_R16G16_UINT;
>  default: return INTEL_UNSUPPORTED_FORMAT;
>};
>  #if 0
> --
> 2.11.0


Re: [Beignet] [PATCH V5] Enable OpenCL 2.0 only where supported

2017-02-13 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Monday, February 13, 2017 9:52
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH V5] Enable OpenCL 2.0 only where supported
> 
> From: Pan Xiuli 
> 
> This allows a single beignet binary to both offer 2.0 where available, and
> still work on older hardware.
> V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec,
> and also likely to be faster).
> V3: Only enable OpenCL 2.0 when llvm version is 39.
> V4: Only enable OpenCL 2.0 on x64 host.
> V5: Always return 32 as address bits.
> 
> Contributor: Rebecca N. Palmer 
> Signed-off-by: Pan Xiuli 
> ---
>  CMakeLists.txt  | 46 
> -
>  backend/src/backend/program.cpp | 19 +
>  src/cl_device_data.h|  2 ++
>  src/cl_gen9_device.h|  2 ++
>  src/cl_gt_device.h  | 14 -
>  src/cl_platform_id.c|  2 +-
>  src/cl_platform_id.h|  6 --
>  7 files changed, 60 insertions(+), 31 deletions(-)
> 
> diff --git a/CMakeLists.txt b/CMakeLists.txt index 3246567..70fc10e 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -234,20 +234,18 @@ IF (EXPERIMENTAL_DOUBLE)
>ADD_DEFINITIONS(-DENABLE_FP64)
>  ENDIF(EXPERIMENTAL_DOUBLE)
> 
> -OPTION(ENABLE_OPENCL_20 "Enable opencl 2.0 support" OFF) -IF
> (ENABLE_OPENCL_20)
> -  Find_Program(LSPCI lspci)
> -  IF (NOT LSPCI)
> -MESSAGE(FATAL_ERROR "Looking for lspci - not found")
> -  ENDIF (NOT LSPCI)
> -  EXECUTE_PROCESS(COMMAND
> "${CMAKE_CURRENT_SOURCE_DIR}/GetGenID.sh"
> -  RESULT_VARIABLE SUPPORT_OCL20_DEVICE
> -  OUTPUT_VARIABLE PCI_ID_NOT_USED)
> -
> -  IF (NOT SUPPORT_OCL20_DEVICE EQUAL 1)
> -MESSAGE(FATAL_ERROR "Only SKL and newer devices support OpenCL
> 2.0 now, your device don't support.")
> -  ENDIF (NOT SUPPORT_OCL20_DEVICE EQUAL 1)
> +SET(CAN_OPENCL_20 ON)
> +IF (CMAKE_SIZEOF_VOID_P EQUAL 4)
> +  SET(CAN_OPENCL_20 OFF)
> +ENDIF (CMAKE_SIZEOF_VOID_P EQUAL 4)
> +IF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN)
> +  SET(CAN_OPENCL_20 OFF)
> +ENDIF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) IF
> (LLVM_VERSION_NODOT
> +VERSION_LESS 39)
> +  SET(CAN_OPENCL_20 OFF)
> +ENDIF (LLVM_VERSION_NODOT VERSION_LESS 39)
> 
> +IF (ENABLE_OPENCL_20)
>IF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN)
>  MESSAGE(FATAL_ERROR "Please update libdrm to version 2.4.66 or later
> to enable OpenCL 2.0.")
>ENDIF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) @@ -256,9 +254,29 @@
> IF (ENABLE_OPENCL_20)
>  MESSAGE(FATAL_ERROR "Please update LLVM to version 3.9 or later to
> enable OpenCL 2.0.")
>ENDIF (LLVM_VERSION_NODOT VERSION_LESS 39)
> 
> -  ADD_DEFINITIONS(-DENABLE_OPENCL_20)
> +  IF (CMAKE_SIZEOF_VOID_P EQUAL 4)
> +MESSAGE(FATAL_ERROR "Please use x64 host to enable OpenCL 2.0.")
> + ENDIF (CMAKE_SIZEOF_VOID_P EQUAL 4)
>  ENDIF(ENABLE_OPENCL_20)
> 
> +IF (DEFINED ENABLE_OPENCL_20)
> +  IF (ENABLE_OPENCL_20 AND CAN_OPENCL_20)
> +SET(CAN_OPENCL_20 ON)
> +  ELSE(ENABLE_OPENCL_20 AND CAN_OPENCL_20)
> +SET(CAN_OPENCL_20 OFF)
> +  ENDIF (ENABLE_OPENCL_20 AND CAN_OPENCL_20) ENDIF (DEFINED
> +ENABLE_OPENCL_20)
> +
> +OPTION(ENABLE_OPENCL_20 "Enable opencl 2.0 support"
> ${CAN_OPENCL_20})
> +
> +IF (CAN_OPENCL_20)
> +  SET (ENABLE_OPENCL_20 ON)
> +  MESSAGE(STATUS "Building with OpenCL 2.0.")
> +  ADD_DEFINITIONS(-DENABLE_OPENCL_20)
> +ELSE (CAN_OPENCL_20)
> +  MESSAGE(STATUS "Building with OpenCL 1.2.")
> +ENDIF(CAN_OPENCL_20)
> +
>  set (LIBCL_DRIVER_VERSION_MAJOR 1)
>  set (LIBCL_DRIVER_VERSION_MINOR 4)
>  if (ENABLE_OPENCL_20)
> diff --git a/backend/src/backend/program.cpp
> b/backend/src/backend/program.cpp index 85d0aa9..09c79d8 100644
> --- a/backend/src/backend/program.cpp
> +++ b/backend/src/backend/program.cpp
> @@ -31,6 +31,7 @@
>  #include "ir/value.hpp"
>  #include "ir/unit.hpp"
>  #include "ir/printf.hpp"
> +#include "src/cl_device_data.h"
> 
>  #ifdef GBE_COMPILER_AVAILABLE
>  #include "llvm/llvm_to_gen.hpp"
> @@ -855,6 +856,7 @@ namespace gbe {
>   size_t *errSize,
>   uint32_t )
>{
> +uint32_t maxoclVersion = oclVersion;
>  std::string pchFileName;
>  bool findPCH = false;
>  #if defined(__ANDROID__)
> @@ -1022,15 +1024,9 @@ EXTEND_QUOTE:
>  }
> 
>  if (useDefaultCLCVersion) {
> -#ifdef ENABLE_OPENCL_20
> -  clOpt.push_back("-D__OPENCL_C_VERSION__=200");
> -  clOpt.push_back("-cl-std=CL2.0");
> -  oclVersion = 200;
> -#else
>clOpt.push_back("-D__OPENCL_C_VERSION__=120");
>clOpt.push_back("-cl-std=CL1.2");
>oclVersion = 120;
> -#endif
>  }
>  //for clCompilerProgram usage.
>  if(temp_header_path){
> @@ -1061,7 +1057,12 @@ 

Re: [Beignet] [PATCH V4] Enable OpenCL 2.0 only where supported

2017-02-12 Thread Yang, Rong R


> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Saturday, February 11, 2017 7:02
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH V4] Enable OpenCL 2.0 only where supported
> 
> Yang, Rong R wrote:
> > Because use -cl-std=CL1.2 by default when OpenCL 2.0 enabled, I prefer to
> always report address_bits = 32 now.
> > OpenCL spec consider only one address bits in one device, but when GEN9
> now support both 32 bits and 64 bits address, so there is no way to comply
> with spec.
> 
> As previously noted
> (https://lists.freedesktop.org/archives/beignet/2017-January/008517.html),
> the spec actually says *default* address space size
> (https://www.khronos.org/registry/OpenCL/specs/opencl-2.0.pdf page 64),
> so I agree it should be 32, and device->address_bits
> (src/cl_get_gt_device.h:45) is where this is set.
> 
> > An other issue is that beignet OpenCL 2.0 don't support i386 system now,
> maybe we also need set CAN_OPENCL_20 to off in i386 system.
> 
> What happens if you try - explicit error or mystery crash?  Does a 2.0-capable
> beignet+hardware work if you only actually use 1.2?
> (Debian's 2.0-enabled beignet does work in an i386 chroot, but that's on my
> non-2.0-capable hardware, and only major bugs are allowed to be fixed
> during freeze.)

OpenCL 2.0's SVM requires the GPU address width to match the host address width.
So on an i386 system, one known issue is that the GPU must use 32-bit addresses;
otherwise SVM applications such as a linked list may crash.
It is more complex in an i386 chroot, because the kernel's DRM driver will patch
the GPU address to 64 bits, while the user-space application expects 32-bit
addresses.

So we decided to disable OpenCL 2.0 temporarily on i386 systems, just like the
LLVM 3.9 version check; Xiuli has sent a new version of the patch. Do you have
any suggestions?

If only 1.2 is used, I think there is no issue. We have run the full i386 test
on non-2.0-capable hardware.
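
A minimal sketch of the width check described above, in plain C. The helper name svm_address_width_ok is hypothetical and is not Beignet code; its argument stands for the value a device reports as CL_DEVICE_ADDRESS_BITS.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Beignet): SVM shares raw pointers between
 * host and GPU, so the device address width has to equal the host pointer
 * width. device_address_bits stands for the CL_DEVICE_ADDRESS_BITS value. */
static bool svm_address_width_ok(unsigned device_address_bits)
{
    const unsigned host_bits = (unsigned)(sizeof(void *) * CHAR_BIT);
    return device_address_bits == host_bits;
}
```

On an i386 host the check passes only for 32, so a device that ends up with 64-bit addresses would be rejected before any SVM pointer is shared.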

> 
> > We also need to update the README after this patch is merged.
> 
> https://sources.debian.net/src/beignet/1.3.0-1/debian/patches/opencl2-runtime-detection.patch/#L351
> is what I used, though you'll want to remove the (Debian-specific)
> jessie-backports reference.
> 
Thanks.


> Signed-off-by: Rebecca N. Palmer <rebecca_pal...@zoho.com>
> 


Re: [Beignet] [PATCH v2 2/2] Add document of using cl_khr_gl_sharing to do gl buffer sharing.

2017-02-10 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Chuanbo Weng
> Sent: Friday, February 10, 2017 15:48
> To: beignet@lists.freedesktop.org
> Cc: Weng, Chuanbo 
> Subject: [Beignet] [PATCH v2 2/2] Add document of using cl_khr_gl_sharing
> to do gl buffer sharing.
> 
> v2:
>   1. Change description of cl_khr_gl_sharing in README.md
>   2. Add display hint in gl-buffer-sharing-howto.mdwn
> 
> Signed-off-by: Chuanbo Weng 
> ---
>  docs/Beignet.mdwn   |  7 ++-
>  docs/howto/gl-buffer-sharing-howto.mdwn | 82
> +
>  2 files changed, 85 insertions(+), 4 deletions(-)  create mode 100644
> docs/howto/gl-buffer-sharing-howto.mdwn
> 
> diff --git a/docs/Beignet.mdwn b/docs/Beignet.mdwn index
> 5c62b4c..709d7c8 100644
> --- a/docs/Beignet.mdwn
> +++ b/docs/Beignet.mdwn
> @@ -222,10 +222,8 @@ Known Issues
>This loses some precision but gains performance.
> 
>  * cl\_khr\_gl\_sharing.
> -  This extension highly depends on mesa support. It seems that mesa would
> not provide
> -  such type of extensions, we may have to hack with mesa source code to
> support this
> -  extension. This feature used to work with a previous mesa git version. But
> now, it's
> -  simply broken.
> +  This extension is partially implemented(the most commonly used part),
> + and we will implement  other parts based on requirement.
> 
>  Project repository
>  --
> @@ -283,6 +281,7 @@ Documents for OpenCL application developers
>  - [[Kernel Optimization Guide|Beignet/optimization-guide]]
>  - [[Libva Buffer Sharing|Beignet/howto/libva-buffer-sharing-howto]]
>  - [[V4l2 Buffer Sharing|Beignet/howto/v4l2-buffer-sharing-howto]]
> +- [[OpenGL Buffer Sharing|Beignet/howto/gl-buffer-sharing-howto]]
>  - [[Video Motion Estimation|Beignet/howto/video-motion-estimation-
> howto]]
>  - [[Stand Alone Unit Test|Beignet/howto/stand-alone-utest-howto]]
>  - [[Android build|Beignet/android-build-howto]]
> diff --git a/docs/howto/gl-buffer-sharing-howto.mdwn b/docs/howto/gl-
> buffer-sharing-howto.mdwn
> new file mode 100644
> index 000..6b3a751
> --- /dev/null
> +++ b/docs/howto/gl-buffer-sharing-howto.mdwn
> @@ -0,0 +1,82 @@
> +GL Buffer Sharing HowTo
> +=
> +
> +Beignet now support cl_khr_gl_sharing partially(the most commonly used
> +part), which is an offcial extension of Khronos OpenCL. With this
> +extension, Beignet can create memory object from OpenGL/OpenGL ES
> +buffer, texture or renderbuffer object with zero-copy. Currently, we
> +just support create memory object from GL buffer object or 2d texture(the
> most common target type). We will support creating from other GL target
> type if necessary.
> +
> +Prerequisite
> +
> +
> +Mesa GL library and Mesa EGL libray are required. Both version should
> +be greater or equal than 13.0.0.
> +
> +Steps
> +-
> +
> +A typical procedure of using cl_khr_gl_sharing is as below:
> +
> +- Basic egl routine(eglGetDisplay, eglInitialize, eglCreateContext...).
> +
> +- Create GL 2d texture in normal OpenGL way.
> +
> +- Check whether cl_khr_gl_sharing is supported by Beignet (Whether
> +cl_khr_gl_sharing is present
> +  in CL_DEVICE_EXTENSIONS string).
> +
> +- Create cl context with following cl_context_properties:
> +cl_context_properties *props=new cl_context_properties[7];
> +int i = 0;
> +props[i++] = CL_CONTEXT_PLATFORM;
> +props[i++] = (cl_context_properties)platform; //Valid OpenCL handle
> +props[i++] = CL_EGL_DISPLAY_KHR;  //We only support
> CL_EGL_DISPLAY_KHR now
> +props[i++] = (cl_context_properties)eglGetCurrentDisplay(); //EGLDisplay
> handle of the display
> +props[i++] = CL_GL_CONTEXT_KHR; //We only support
> CL_GL_CONTEXT_KHR now
> +props[i++] = (cl_context_properties)eglGetCurrentContext();
> //EGLContext created by above EGLDisplay
> +props[i++] = 0;
> +
> +- Create cl image object from GL 2d texture by calling
> clCreateFromGLTexture.
> +
> +- Ensure any pending GL operations which access this GL 2d texture have
> completed by glFinish.
> +
> +- Acquire cl image object by calling clEnqueueAcquireGLObjects.
> +
> +- Access this cl image object as an usual cl image object.
> +
> +- Relase cl image object by calling clEnqueueReleaseGLObjects.
> +
> +- Ensure any pending OpenCL operations which access this cl image object
> have completed by clFinish.
> +
> +- Do other operation on GL 2d texture.
> +
> +Sample code
> +---
> +
> +We have developed an example showing how to utilize cl_khr_gl_sharing
> +in examples/gl_buffer_sharing directory. A cl image object is created
> +from a gl 2d texutre and processed by OpenCL kernel, then is shown on
> screen.
> +
> +Steps to build and run this example:
> +
> +- Install mesa gl and egl library(version >= 13.0.0). X11 is also required.
> +
> +- Add option -DBUILD_EXAMPLES=ON to 

Re: [Beignet] [PATCH] API: Fix device type bugs

2017-02-09 Thread Yang, Rong R
Sorry, if multiple device types are supported later, we can't check the type in
the API function; we should check the device type when getting the device.
I will send a patch to make the cl_get_gt_device function return the device when
the type is CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_DEFAULT.

> -Original Message-
> From: Yang, Rong R
> Sent: Friday, February 10, 2017 15:40
> To: 'Xiuli Pan' <xiuli@intel.com>; beignet@lists.freedesktop.org
> Cc: giuseppe.bilo...@gmail.com; Pan, Xiuli <xiuli@intel.com>
> Subject: RE: [Beignet] [PATCH] API: Fix device type bugs
> 
> There are two related error returns: CL_INVALID_DEVICE_TYPE, for an invalid
> value, and CL_DEVICE_NOT_FOUND, for no matching device.
> The original check for CL_INVALID_DEVICE_TYPE is right, but it misses the
> CL_DEVICE_NOT_FOUND check.
> Giuseppe's patch will return CL_DEVICE_NOT_FOUND, but I think it is clearer
> to check it in the API function.
> I will send a new version.
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of Xiuli Pan
> > Sent: Friday, February 10, 2017 15:04
> > To: beignet@lists.freedesktop.org
> > Cc: giuseppe.bilo...@gmail.com; Pan, Xiuli <xiuli@intel.com>
> > Subject: [Beignet] [PATCH] API: Fix device type bugs
> >
> > From: Pan Xiuli <xiuli@intel.com>
> >
> > Beignet only support GPU now, we should return
> CL_INVALID_DEVICE_TYPE
> > for CPU and ACCELERATOR.
> > Contributor: Giuseppe Bilotta <giuseppe.bilo...@gmail.com>
> >
> > Signed-off-by: Pan Xiuli <xiuli@intel.com>
> > ---
> >  src/cl_api_context.c   | 3 +--
> >  src/cl_api_device_id.c | 4 +---
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/cl_api_context.c b/src/cl_api_context.c index
> > e8184b1..85d6480 100644
> > --- a/src/cl_api_context.c
> > +++ b/src/cl_api_context.c
> > @@ -71,8 +71,7 @@ clCreateContextFromType(const
> cl_context_properties
> > *properties,
> >cl_int err = CL_SUCCESS;
> >cl_device_id *devices = NULL;
> >cl_uint num_devices = 0;
> > -  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> > CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_ACCELERATOR |
> > -CL_DEVICE_TYPE_DEFAULT | 
> > CL_DEVICE_TYPE_CUSTOM;
> > +  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> > + CL_DEVICE_TYPE_DEFAULT | CL_DEVICE_TYPE_CUSTOM;
> >
> >do {
> >  /* Assure parameters correctness */ diff --git
> > a/src/cl_api_device_id.c b/src/cl_api_device_id.c index
> > 4ffef78..84e0882 100644
> > --- a/src/cl_api_device_id.c
> > +++ b/src/cl_api_device_id.c
> > @@ -26,9 +26,7 @@ clGetDeviceIDs(cl_platform_id platform,
> > cl_device_id *devices,
> > cl_uint *num_devices)
> >  {
> > -  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> > CL_DEVICE_TYPE_CPU |
> > -CL_DEVICE_TYPE_ACCELERATOR |
> > CL_DEVICE_TYPE_DEFAULT |
> > -CL_DEVICE_TYPE_CUSTOM;
> > +  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> > + CL_DEVICE_TYPE_DEFAULT | CL_DEVICE_TYPE_CUSTOM;
> >
> >/* Check parameter consistency */
> >if (UNLIKELY(devices == NULL && num_devices == NULL))
> > --
> > 2.7.4
> >


Re: [Beignet] [PATCH] API: Fix device type bugs

2017-02-09 Thread Yang, Rong R
There are two related error returns: CL_INVALID_DEVICE_TYPE, for an invalid
value, and CL_DEVICE_NOT_FOUND, for no matching device.
The original check for CL_INVALID_DEVICE_TYPE is right, but it misses the
CL_DEVICE_NOT_FOUND check.
Giuseppe's patch will return CL_DEVICE_NOT_FOUND, but I think it is clearer to
check it in the API function.
I will send a new version.
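
A rough sketch of how the two error returns could be told apart for a GPU-only driver. Everything prefixed SK_ or sk_ is hypothetical; the numeric values are local copies of the corresponding OpenCL constants so that the snippet compiles without the CL headers. Treating a zero bitfield as invalid is an assumption of this sketch.

```c
#include <stdint.h>

/* Local copies of the OpenCL values (hypothetical SK_ names). */
#define SK_SUCCESS              0
#define SK_DEVICE_NOT_FOUND    (-1)
#define SK_INVALID_DEVICE_TYPE (-31)

#define SK_TYPE_DEFAULT     (1u << 0)
#define SK_TYPE_CPU         (1u << 1)
#define SK_TYPE_GPU         (1u << 2)
#define SK_TYPE_ACCELERATOR (1u << 3)
#define SK_TYPE_CUSTOM      (1u << 4)
#define SK_TYPE_ALL         0xFFFFFFFFu

/* Hypothetical check for a driver that only exposes a GPU device. */
static int sk_check_device_type(uint64_t type)
{
    const uint64_t known = SK_TYPE_DEFAULT | SK_TYPE_CPU | SK_TYPE_GPU |
                           SK_TYPE_ACCELERATOR | SK_TYPE_CUSTOM;

    /* Not a valid value at all: unknown bits set, or (assumed) no bits set. */
    if (type != SK_TYPE_ALL && (type == 0 || (type & ~known) != 0))
        return SK_INVALID_DEVICE_TYPE;

    /* A valid value that includes the GPU or the default device matches. */
    if (type & (SK_TYPE_GPU | SK_TYPE_DEFAULT))
        return SK_SUCCESS;

    /* A valid value (for example CPU only), but no such device exists. */
    return SK_DEVICE_NOT_FOUND;
}
```

With this split, a request for CL_DEVICE_TYPE_CPU alone is a valid query that finds nothing, rather than an invalid argument.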

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Friday, February 10, 2017 15:04
> To: beignet@lists.freedesktop.org
> Cc: giuseppe.bilo...@gmail.com; Pan, Xiuli 
> Subject: [Beignet] [PATCH] API: Fix device type bugs
> 
> From: Pan Xiuli 
> 
> Beignet only support GPU now, we should return CL_INVALID_DEVICE_TYPE
> for CPU and ACCELERATOR.
> Contributor: Giuseppe Bilotta 
> 
> Signed-off-by: Pan Xiuli 
> ---
>  src/cl_api_context.c   | 3 +--
>  src/cl_api_device_id.c | 4 +---
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/src/cl_api_context.c b/src/cl_api_context.c index
> e8184b1..85d6480 100644
> --- a/src/cl_api_context.c
> +++ b/src/cl_api_context.c
> @@ -71,8 +71,7 @@ clCreateContextFromType(const cl_context_properties
> *properties,
>cl_int err = CL_SUCCESS;
>cl_device_id *devices = NULL;
>cl_uint num_devices = 0;
> -  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> CL_DEVICE_TYPE_CPU | CL_DEVICE_TYPE_ACCELERATOR |
> -CL_DEVICE_TYPE_DEFAULT | 
> CL_DEVICE_TYPE_CUSTOM;
> +  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> + CL_DEVICE_TYPE_DEFAULT | CL_DEVICE_TYPE_CUSTOM;
> 
>do {
>  /* Assure parameters correctness */ diff --git a/src/cl_api_device_id.c
> b/src/cl_api_device_id.c index 4ffef78..84e0882 100644
> --- a/src/cl_api_device_id.c
> +++ b/src/cl_api_device_id.c
> @@ -26,9 +26,7 @@ clGetDeviceIDs(cl_platform_id platform,
> cl_device_id *devices,
> cl_uint *num_devices)
>  {
> -  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> CL_DEVICE_TYPE_CPU |
> -CL_DEVICE_TYPE_ACCELERATOR |
> CL_DEVICE_TYPE_DEFAULT |
> -CL_DEVICE_TYPE_CUSTOM;
> +  const cl_device_type valid_type = CL_DEVICE_TYPE_GPU |
> + CL_DEVICE_TYPE_DEFAULT | CL_DEVICE_TYPE_CUSTOM;
> 
>/* Check parameter consistency */
>if (UNLIKELY(devices == NULL && num_devices == NULL))
> --
> 2.7.4
> 


Re: [Beignet] [PATCH] Free context devices on context release

2017-02-09 Thread Yang, Rong R
LGTM, thanks, pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Giuseppe Bilotta
> Sent: Friday, February 10, 2017 6:13
> To: Beignet ML 
> Cc: Giuseppe Bilotta 
> Subject: [Beignet] [PATCH] Free context devices on context release
> 
> The context owns the array of devices passed to cl_context_new, so it's its
> duty to free it.
> 
> Signed-off-by: Giuseppe Bilotta 
> ---
>  src/cl_context.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/cl_context.c b/src/cl_context.c index cbe2e017..1ba23024
> 100644
> --- a/src/cl_context.c
> +++ b/src/cl_context.c
> @@ -383,6 +383,7 @@ cl_context_delete(cl_context ctx)
>ctx->built_in_prgs = NULL;
> 
>cl_free(ctx->prop_user);
> +  cl_free(ctx->devices);
>cl_driver_delete(ctx->drv);
>CL_OBJECT_DESTROY_BASE(ctx);
>cl_free(ctx);
> --
> 2.11.1.658.g6a0cb3eb68
> 


Re: [Beignet] [PATCH] Fix obvious copy-paste

2017-02-09 Thread Yang, Rong R
Yes, always NULL. LGTM, thanks, pushed.
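
For illustration only, a stand-alone sketch of the corrected cleanup pattern. The struct demo_context and the function demo_context_clear are hypothetical stand-ins, not the real cl_context code; plain malloc'ed ints play the role of kernels.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_SLOTS 4

/* Hypothetical stand-in for the two kernel arrays held by a context. */
struct demo_context {
    int *internal_kernels[DEMO_SLOTS];
    int *built_in_kernels[DEMO_SLOTS];
};

/* Releases every non-NULL slot and returns how many were released. */
static size_t demo_context_clear(struct demo_context *ctx)
{
    size_t released = 0;
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (ctx->internal_kernels[i]) {
            free(ctx->internal_kernels[i]);
            ctx->internal_kernels[i] = NULL;
            released++;
        }
        /* Testing internal_kernels[i] again here would always be false,
         * because it was reset just above; the guard has to look at the
         * same array that the body releases. */
        if (ctx->built_in_kernels[i]) {
            free(ctx->built_in_kernels[i]);
            ctx->built_in_kernels[i] = NULL;
            released++;
        }
    }
    return released;
}
```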

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Giuseppe Bilotta
> Sent: Friday, February 10, 2017 6:13
> To: Beignet ML 
> Cc: Giuseppe Bilotta 
> Subject: [Beignet] [PATCH] Fix obvious copy-paste
> 
> The conditional was equal to the one before, and would never be hit
> because internal kernels were reset after release. Instead, since the body is
> resetting built-in kernels, it appears obvious that the conditional should be
> on the existence of built-in kernels.
> 
> Signed-off-by: Giuseppe Bilotta 
> ---
>  src/cl_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/cl_context.c b/src/cl_context.c index 3f2e7578..cbe2e017
> 100644
> --- a/src/cl_context.c
> +++ b/src/cl_context.c
> @@ -373,7 +373,7 @@ cl_context_delete(cl_context ctx)
>ctx->internal_prgs[i] = NULL;
>  }
> 
> -if (ctx->internal_kernels[i]) {
> +if (ctx->built_in_kernels[i]) {
>cl_kernel_delete(ctx->built_in_kernels[i]);
>ctx->built_in_kernels[i] = NULL;
>  }
> --
> 2.11.1.658.g6a0cb3eb68
> 


Re: [Beignet] [PATCH 5/5] Enable OpenCL 2.0 only where supported

2017-02-08 Thread Yang, Rong R
Because -cl-std=CL1.2 is used by default when OpenCL 2.0 is enabled, I prefer to
always report address_bits = 32 for now.
The OpenCL spec assumes only one address width per device, but GEN9 now
supports both 32-bit and 64-bit addresses, so there is no way to comply with the
spec.
I think we could change both GEN9's OpenCL 1.2 and OpenCL 2.0 addresses to 64
bits once there is no obvious performance drop.

Another issue is that Beignet's OpenCL 2.0 doesn't support i386 systems now;
maybe we also need to set CAN_OPENCL_20 to OFF on i386 systems.

We also need to update the README after this patch is merged.

The other parts of the patchset LGTM.
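
As a sketch of the "default to 1.2 unless -cl-std asks otherwise" behaviour discussed here: the function sk_pick_cl_c_version is hypothetical and is not the code in program.cpp. Returning 0 when the requested version exceeds what the device supports is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the OpenCL C version to compile with
 * (110, 120 or 200), or 0 if the request cannot be satisfied.
 * max_version stands for the highest version the device supports. */
static uint32_t sk_pick_cl_c_version(const char *build_options,
                                     uint32_t max_version)
{
    static const char flag[] = "-cl-std=CL";
    uint32_t requested = 120; /* default when -cl-std is absent */
    const char *found = build_options ? strstr(build_options, flag) : NULL;

    if (found != NULL) {
        const char *value = found + (sizeof(flag) - 1);
        if (strncmp(value, "2.0", 3) == 0)
            requested = 200;
        else if (strncmp(value, "1.2", 3) == 0)
            requested = 120;
        else if (strncmp(value, "1.1", 3) == 0)
            requested = 110;
        else
            return 0; /* unrecognized -cl-std value */
    }
    /* Assumption: a request above the device maximum is rejected. */
    return requested <= max_version ? requested : 0;
}
```

A 2.0-capable device then still compiles as 1.2 unless the application passes -cl-std=CL2.0 explicitly.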

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Tuesday, January 24, 2017 16:48
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH 5/5] Enable OpenCL 2.0 only where supported
> 
> From: Pan Xiuli 
> 
> This allows a single beignet binary to both offer 2.0 where available, and 
> still
> work on older hardware.
> V2: Default to 1.2 when -cl-std is not set (required by the OpenCL spec,
> and also likely to be faster).
> V3: Only enable OpenCL 2.0 when llvm version is 39.
> 
> Contributor: Rebecca N. Palmer 
> Signed-off-by: Pan Xiuli 
> ---
>  CMakeLists.txt  | 39 +--
>  backend/src/backend/program.cpp | 19 ++-
>  src/cl_device_data.h|  2 ++
>  src/cl_gen9_device.h|  2 ++
>  src/cl_gt_device.h  | 12 ++--
>  src/cl_platform_id.c|  2 +-
>  src/cl_platform_id.h|  6 --
>  7 files changed, 54 insertions(+), 28 deletions(-)
> 
> diff --git a/CMakeLists.txt b/CMakeLists.txt index 59abc45..75af35e 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -231,20 +231,15 @@ IF (EXPERIMENTAL_DOUBLE)
>ADD_DEFINITIONS(-DENABLE_FP64)
>  ENDIF(EXPERIMENTAL_DOUBLE)
> 
> -OPTION(ENABLE_OPENCL_20 "Enable opencl 2.0 support" OFF) -IF
> (ENABLE_OPENCL_20)
> -  Find_Program(LSPCI lspci)
> -  IF (NOT LSPCI)
> -MESSAGE(FATAL_ERROR "Looking for lspci - not found")
> -  ENDIF (NOT LSPCI)
> -  EXECUTE_PROCESS(COMMAND
> "${CMAKE_CURRENT_SOURCE_DIR}/GetGenID.sh"
> -  RESULT_VARIABLE SUPPORT_OCL20_DEVICE
> -  OUTPUT_VARIABLE PCI_ID_NOT_USED)
> -
> -  IF (NOT SUPPORT_OCL20_DEVICE EQUAL 1)
> -MESSAGE(FATAL_ERROR "Only SKL and newer devices support OpenCL
> 2.0 now, your device don't support.")
> -  ENDIF (NOT SUPPORT_OCL20_DEVICE EQUAL 1)
> +SET(CAN_OPENCL_20 ON)
> +IF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN)
> +  SET(CAN_OPENCL_20 OFF)
> +ENDIF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) IF
> (LLVM_VERSION_NODOT
> +VERSION_LESS 39)
> +  SET(CAN_OPENCL_20 OFF)
> +ENDIF (LLVM_VERSION_NODOT VERSION_LESS 39)
> 
> +IF (ENABLE_OPENCL_20)
>IF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN)
>  MESSAGE(FATAL_ERROR "Please update libdrm to version 2.4.66 or later
> to enable OpenCL 2.0.")
>ENDIF (NOT HAVE_DRM_INTEL_BO_SET_SOFTPIN) @@ -252,9 +247,25 @@
> IF (ENABLE_OPENCL_20)
>IF (LLVM_VERSION_NODOT VERSION_LESS 39)
>  MESSAGE(FATAL_ERROR "Please update LLVM to version 3.9 or later to
> enable OpenCL 2.0.")
>ENDIF (LLVM_VERSION_NODOT VERSION_LESS 39)
> +ENDIF(ENABLE_OPENCL_20)
> 
> +IF (DEFINED ENABLE_OPENCL_20)
> +  IF (ENABLE_OPENCL_20 AND CAN_OPENCL_20)
> +SET(CAN_OPENCL_20 ON)
> +  ELSE(ENABLE_OPENCL_20 AND CAN_OPENCL_20)
> +SET(CAN_OPENCL_20 OFF)
> +  ENDIF (ENABLE_OPENCL_20 AND CAN_OPENCL_20) ENDIF (DEFINED
> +ENABLE_OPENCL_20)
> +
> +OPTION(ENABLE_OPENCL_20 "Enable opencl 2.0 support"
> ${CAN_OPENCL_20})
> +
> +IF (CAN_OPENCL_20)
> +  SET (ENABLE_OPENCL_20 ON)
> +  MESSAGE(STATUS "Building with OpenCL 2.0.")
>ADD_DEFINITIONS(-DENABLE_OPENCL_20)
> -ENDIF(ENABLE_OPENCL_20)
> +ELSE (CAN_OPENCL_20)
> +  MESSAGE(STATUS "Building with OpenCL 1.2.")
> +ENDIF(CAN_OPENCL_20)
> 
>  set (LIBCL_DRIVER_VERSION_MAJOR 1)
>  set (LIBCL_DRIVER_VERSION_MINOR 4)
> diff --git a/backend/src/backend/program.cpp
> b/backend/src/backend/program.cpp index 85d0aa9..09c79d8 100644
> --- a/backend/src/backend/program.cpp
> +++ b/backend/src/backend/program.cpp
> @@ -31,6 +31,7 @@
>  #include "ir/value.hpp"
>  #include "ir/unit.hpp"
>  #include "ir/printf.hpp"
> +#include "src/cl_device_data.h"
> 
>  #ifdef GBE_COMPILER_AVAILABLE
>  #include "llvm/llvm_to_gen.hpp"
> @@ -855,6 +856,7 @@ namespace gbe {
>   size_t *errSize,
>   uint32_t )
>{
> +uint32_t maxoclVersion = oclVersion;
>  std::string pchFileName;
>  bool findPCH = false;
>  #if defined(__ANDROID__)
> @@ -1022,15 +1024,9 @@ EXTEND_QUOTE:
>  }
> 
>  if (useDefaultCLCVersion) {
> -#ifdef ENABLE_OPENCL_20
> -  clOpt.push_back("-D__OPENCL_C_VERSION__=200");
> -  

Re: [Beignet] [PATCH] Typo in error message

2017-02-07 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> He Junyan
> Sent: Friday, February 3, 2017 21:26
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Typo in error message
> 
> Thanks for fixing it.
> 
> On Mon, Jan 30, 2017 at 03:18:09PM +0100, Giuseppe Bilotta wrote:
> > Date: Mon, 30 Jan 2017 15:18:09 +0100
> > From: Giuseppe Bilotta 
> > To: Beignet ML 
> > Cc: Giuseppe Bilotta 
> > Subject: [Beignet] [PATCH] Typo in error message
> > X-Mailer: git-send-email 2.11.0.745.g0978fb64a4
> >
> > ---
> >  src/cl_event.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/cl_event.c b/src/cl_event.c index 3e1dc224..a2b16be4
> > 100644
> > --- a/src/cl_event.c
> > +++ b/src/cl_event.c
> > @@ -579,7 +579,7 @@ cl_event_exec(cl_event event, cl_int
> > exec_to_status, cl_bool ignore_depends)
> >
> >  if (ret != CL_SUCCESS) {
> >assert(ret < 0);
> > -  DEBUGP(DL_WARNING, "Exec event %p error, type is %d, error staus
> is %d",
> > +  DEBUGP(DL_WARNING, "Exec event %p error, type is %d, error
> > + status is %d",
> >   event, event->event_type, ret);
> >ret = cl_event_set_status(event, ret);
> >assert(ret == CL_SUCCESS);
> > --
> > 2.11.0.745.g0978fb64a4
> >


Re: [Beignet] [PATCH] Add a CMake option for toggling OCL ICD Loader compatibility

2017-02-07 Thread Yang, Rong R
LGTM, pushed, thanks.

Following Simon's suggestion, I think it makes sense to ask users to explicitly
disable ICD if there is no ICD header; I will send a patch for it.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Marek Szuba
> Sent: Wednesday, January 25, 2017 21:04
> To: beignet@lists.freedesktop.org
> Cc: Marek Szuba 
> Subject: [Beignet] [PATCH] Add a CMake option for toggling OCL ICD Loader
> compatibility
> 
> The new option allows anyone wishing to do so to explicitly disable OCL ICD
> Loader support in Beignet, regardless of the presence or absence of OCL ICD
> header files. This is particularly useful for people building Beignet packages
> for distributions, as it avoids creating an implicit dependency on the state 
> of
> the build host. The new option defaults to ON so the default behaviour of
> CMake configuration remains unchanged.
> 
> See also: https://bugs.freedesktop.org/show_bug.cgi?id=98885
> ---
>  CMakeLists.txt | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/CMakeLists.txt b/CMakeLists.txt index 59abc45..3246567 100644
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -212,6 +212,8 @@ IF(ENABLE_GL_SHARING)
>ENDIF(EGL_FOUND)
>  ENDIF(ENABLE_GL_SHARING)
> 
> +OPTION(OCLICD_COMPAT "OCL ICD compatibility mode" ON)
> +IF(OCLICD_COMPAT)
>  Find_Package(OCLIcd)
>  IF(OCLIcd_FOUND)
>MESSAGE(STATUS "Looking for OCL ICD header file - found") @@ -223,6
> +225,7 @@ IF(OCLIcd_FOUND)
>  ELSE(OCLIcd_FOUND)
>MESSAGE(STATUS "Looking for OCL ICD header file - not found")
>  ENDIF(OCLIcd_FOUND)
> +ENDIF(OCLICD_COMPAT)
> 
>  Find_Package(PythonInterp)
> 
> --
> 2.10.2
> 


Re: [Beignet] [PATCH] GBE: use shift for PowerOf2 size when lowering GEP.

2017-02-06 Thread Yang, Rong R
This patch looks good to me.
As discussed, we had better add a similar optimization in GEN selection, because
GEP lowering covers only part of the power-of-2 multiplies. I will add it later.
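
The idea can be sketched in plain C, outside of LLVM. The names sk_is_power_of_two, sk_log2 and sk_scale_index are hypothetical; they only mirror the roles of isPowerOf<2> and logi2 in the patch and are not the backend's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x has exactly one bit set. */
static bool sk_is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Integer base-2 logarithm, rounded down; expects x >= 1. */
static unsigned sk_log2(uint64_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Byte offset of element `index` for elements of `elem_size` bytes:
 * a shift when the size is a power of two, a multiply otherwise. */
static uint64_t sk_scale_index(uint64_t index, uint64_t elem_size)
{
    if (elem_size == 1)
        return index;
    if (sk_is_power_of_two(elem_size))
        return index << sk_log2(elem_size);
    return index * elem_size;
}
```

Both branches give the same offset; the shift form matters because a 64-bit multiply expands into several instructions on the hardware.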

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Ruiling Song
> Sent: Monday, January 23, 2017 13:32
> To: beignet@lists.freedesktop.org
> Cc: Song, Ruiling 
> Subject: [Beignet] [PATCH] GBE: use shift for PowerOf2 size when lowering
> GEP.
> 
> For 64bit address, the multiply would expand to several instructions.
> As for most time, the size is PowerOf 2. So we can use left-shift to do this.
> 
> Signed-off-by: Ruiling Song 
> ---
>  backend/src/llvm/llvm_passes.cpp | 19 +--
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/backend/src/llvm/llvm_passes.cpp
> b/backend/src/llvm/llvm_passes.cpp
> index 367a2c3..c5f3ffe 100644
> --- a/backend/src/llvm/llvm_passes.cpp
> +++ b/backend/src/llvm/llvm_passes.cpp
> @@ -276,8 +276,6 @@ namespace gbe
>  uint32_t align = getAlignmentByte(unit, elementType);
>  size += getPadding(size, align);
> 
> -Constant* newConstSize =
> -  ConstantInt::get(IntegerType::get(GEPInst->getContext(), ptrSize),
> size);
> 
>  Value *operand = GEPInst->getOperand(op);
> 
> @@ -308,13 +306,22 @@ namespace gbe
>}
>  }
>  #endif
> -Value* tmpMul = operand;
> +Value* tmpOffset = operand;
>  if (size != 1) {
> -  tmpMul = BinaryOperator::Create(Instruction::Mul, newConstSize,
> operand,
> - "", GEPInst);
> +  if (isPowerOf<2>(size)) {
> +Constant* shiftAmnt =
> +  ConstantInt::get(IntegerType::get(GEPInst->getContext(), 
> ptrSize),
> logi2(size));
> +tmpOffset = BinaryOperator::Create(Instruction::Shl, operand,
> shiftAmnt,
> +   "", GEPInst);
> +  } else{
> +Constant* sizeConst =
> +  ConstantInt::get(IntegerType::get(GEPInst->getContext(), 
> ptrSize),
> size);
> +tmpOffset = BinaryOperator::Create(Instruction::Mul, sizeConst,
> operand,
> +   "", GEPInst);
> +  }
>  }
>  currentAddrInst =
> -  BinaryOperator::Create(Instruction::Add, currentAddrInst, tmpMul,
> +  BinaryOperator::Create(Instruction::Add, currentAddrInst,
> + tmpOffset,
>"", GEPInst);
>}
> 
> --
> 2.4.1
> 


Re: [Beignet] [PATCH] Fix typo

2017-02-06 Thread Yang, Rong R
Pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Pan, Xiuli
> Sent: Monday, January 23, 2017 10:38
> To: Rebecca N. Palmer ;
> beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Fix typo
> 
> LGTM, thanks.
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Monday, January 23, 2017 1:09 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Fix typo
> 
> Signed-off-by: Rebecca N. Palmer 
> 
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -308,7 +308,7 @@ namespace gbe
>  if(StrTy)
>return getTypeByteSize(unit,StrTy);
>}
> -  GBE_ASSERTM(false, "Unspported type name");
> +  GBE_ASSERTM(false, "Unsupported type name");
>return 0;
>}
>  #undef TYPESIZEVEC
> 


Re: [Beignet] [PATCH] Make CL-GL sharing available via ICD

2017-02-06 Thread Yang, Rong R
Pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Weng, Chuanbo
> Sent: Monday, January 23, 2017 19:44
> To: Rebecca N. Palmer ;
> beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Make CL-GL sharing available via ICD
> 
> This patch LGTM, and I did a basic test on this patch. Thanks for your patch.
> 
> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Monday, January 23, 2017 7:11 AM
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Make CL-GL sharing available via ICD
> 
> Signed-off-by: Rebecca N. Palmer 
> ---
> (Warning: has not been tested)
> 
> diff --git a/src/cl_khr_icd.c b/src/cl_khr_icd.c index 7b3600c..e4daf79 100644
> --- a/src/cl_khr_icd.c
> +++ b/src/cl_khr_icd.c
> @@ -18,10 +18,14 @@
> 
>  #include "cl_platform_id.h"
>  #include "CL/cl_intel.h" // for clGetKernelSubGroupInfoKHR
> -/* The interop functions are not implemented in Beignet */ -#define
> CL_GL_INTEROP(x) NULL
> -/* OpenCL 1.2 is not implemented in Beignet */ -#define CL_1_2_NOTYET(x)
> NULL
> +/* The interop functions are only available if sharing is enabled */
> +#ifdef HAS_GL_EGL #define CL_GL_INTEROP(x) x #else #define
> +CL_GL_INTEROP(x) (void *) NULL #endif
> +/* These are not yet implemented in Beignet */ #define CL_NOTYET(x)
> +(void *) NULL
> 
>  /** Return platform list through ICD interface
>   * This code is used only if a client is linked directly against the library 
> @@ -
> 114,13 +118,13 @@ struct _cl_icd_dispatch const cl_khr_icd_dispatch = {
>clGetExtensionFunctionAddress,
>CL_GL_INTEROP(clCreateFromGLBuffer),
>CL_GL_INTEROP(clCreateFromGLTexture2D),
> -  CL_GL_INTEROP(clCreateFromGLTexture3D),
> -  CL_GL_INTEROP(clCreateFromGLRenderbuffer),
> -  CL_GL_INTEROP(clGetGLObjectInfo),
> -  CL_GL_INTEROP(clGetGLTextureInfo),
> +  CL_NOTYET(clCreateFromGLTexture3D),
> +  CL_NOTYET(clCreateFromGLRenderbuffer),
> +  CL_NOTYET(clGetGLObjectInfo),
> +  CL_NOTYET(clGetGLTextureInfo),
>CL_GL_INTEROP(clEnqueueAcquireGLObjects),
>CL_GL_INTEROP(clEnqueueReleaseGLObjects),
> -  CL_GL_INTEROP(clGetGLContextInfoKHR),
> +  CL_NOTYET(clGetGLContextInfoKHR),
>(void *) NULL,
>(void *) NULL,
>(void *) NULL,
> @@ -135,9 +139,9 @@ struct _cl_icd_dispatch const cl_khr_icd_dispatch = {
>clEnqueueReadBufferRect,
>clEnqueueWriteBufferRect,
>clEnqueueCopyBufferRect,
> -  CL_1_2_NOTYET(clCreateSubDevicesEXT),
> -  CL_1_2_NOTYET(clRetainDeviceEXT),
> -  CL_1_2_NOTYET(clReleaseDeviceEXT),
> +  CL_NOTYET(clCreateSubDevicesEXT),
> +  CL_NOTYET(clRetainDeviceEXT),
> +  CL_NOTYET(clReleaseDeviceEXT),
>  #ifdef CL_VERSION_1_2
>(void *) NULL,
>clCreateSubDevices,
> 


[Beignet] [ANNOUNCE] Beignet 1.3.0

2017-01-20 Thread Yang, Rong R
 get_enqueued_local_size and get_local_size
  Runtime: Add support for non uniform group size
  Backend: Clang now support static, fix now
  libocl: Refine return type of workitem built-in functions
  Backend: Chang scan limit for GVN pass
  Runtime: Add support for queue size and fix error handling
  Backend: Add RegisterFamily for ir
  Backend: Initialize the extra value for selection instruction
  Backend: Fix GenRegister::offset sub reg offset
  Backend: Refine flag usage in instrction selection
  Backend: Add kernel name for sel ir output
  Backend: Refine instruction ID for sel ir
  Backend: Refine selection IR output
  Backend: Refine block read/write instruction selection
  Backend: Fix some A64 block read/write bug
  CMake: Add OCL20 env for utest
  Backend: Fix sel ir subnr usage
  Backend: Fix header address of oword block read/write
  GBE: Fix memdep-block-scan-limit caused bug on LLVM3.8
  GBE: Fix getTypesize bug with LLVM3.9
Rebecca N. Palmer (10):
  Allow building tests with Python 3 (no string.atoi)
  Utest: test pow, not powr, on negative x
  Docs: Spelling and grammar fixes
  Utests: use clGetExtensionFunctionAddressForPlatform
  Utests: Don't end an all-tests run when one test fails
  Utests: respect existing C/CXXFLAGS
  Fix build failure with CMRT enabled
  Utests: Allow testing cl_intel_accelerator via ICD
  Add clGetKernelSubGroupInfoKHR to _cl_icd_dispatch table
  Fail, don't assert, if unable to create context
Ruiling Song (25):
  GBE: add untyped A64 stateless message
  GBE: add byte scatter a64 message
  GBE: Add 64bit data stateless messages
  GBE: new Load/Store Instruction Selection pattern
  OCL20/GBE: Fix 64bit pointer issue in Load store instruction selection.
  ocl20/runtime: take the first 64KB page table entries.
  ocl20/GBE: support generic load/store
  utest: add generic pointer test
  GBE: Implement new constant solution for ocl2
  GBE: Implement to_local/private/global() function
  libocl: add get_fence() builtin.
  GBE: Fix type mismatch bug.
  GBE: Fix SEL.bool issue.
  GBE: add ocl 2.0 work_group_barrier support.
  GBE: Fix bug when unspill a long type value from scratch.
  GBE: don't try to erase a llvm:Constant.
  GBE: the dst grf should use same width as source register
  GBE: retype double register to long type when do spilling.
  runtime: prog->global_data may get 64bit address
  GBE: imm64 should not be in src1 per hardware spec.
  GBE: handle ConstantExpr in program-scope variable handling.
  GBE: Refine program scope variable logic.
  GBE: Fix destination grf register type for cmp instruction.
  runtime: handle PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE
  GBE: Fix another Sel.bool issue.
Yan Wang (4):
  Fix bug: Initialize bti of LoadInstuctionPattern::shootByteGatherMsg().
  Fix getting bitwidth of PointerType of LLVM.
  Restore jump threading pass for reducing compiling time when run the large and complex kernel like Luxmark.
  Avoid possible invalid pointer by vector interator.
Yang Rong (36):
  Docs: update readme.
  Bump version to 1.3.
  Docs: update a readme typo.
  GBE: fix uninitialized build warning.
  GBE: fix half immediate negate assert.
  GBE: Fix assert when get metadata llvm.loop.unroll.enable.
  GBE: Fix a logical insn with flag bug.
  NEWS: Update Release 1.2.1.
  OCL20/GBE: Change the pointer relative op's type.
  OCL20: Add svm support.
  OCL20: Add OpenCL2.0 apis to icd.
  OCL20: add svm enqueue apis and svm's sub buffer support.
  OCL20: add gbe_kernel_get_ocl_version for getting kernel's version in runtime.
  libocl: change prototype of vload/vstore to match ocl2.0 spec.
  add opencl builtin atomic functions implementation.
  utest: add atomic opencl-2.0 case to test api.
  OCL20: Fix svm bugs
  OCL20: Implement clSetKernelExecInfo api
  Libocl: change prototype of math built-in for OCL2.0 spec
  OCL20: fix a unpack long assert.
  Runtime: Fix vme fail.
  Refine clSetMemObjectDestructorCallback API.
  GBE: reorder the LLVM pass to reduce the compilation time.
  GEB/Runtime: eliminate release build warnings.
  utest: suspend deprecated-declarations warning.
  Add the NULL pointer check.
  GBE: correct the llvm.loop.unroll.enable meta.
  Runtime: add the head file to avoid implicit declaration of function 'cl_devices_list_include_check' warning.
  Runtime: fix a profiling fail.
  utest: fix i386 system long ctz fail.
  GBE: fix long work group fail.
  Runtime: Fix a event bug.
  GBE: if PointerFamily is FAMILY_QWORD, chv and bxt need special handle.
  GBE: fix legacy read64 mix pointer bug.
  GBE: fix a mix analyze bug.
  Add some pointer access check.
Yang, Rong R 

Re: [Beignet] [PATCH] Doc: add OpenCL 2.0 section to readme.

2017-01-20 Thread Yang, Rong R
Thanks, refined and pushed.

> -Original Message-
> From: Song, Ruiling
> Sent: Friday, January 20, 2017 16:33
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [PATCH] Doc: add OpenCL 2.0 section to readme.
> 
> 
> Basically looks good, But some typos.
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of Yang Rong
> > Sent: Friday, January 20, 2017 5:34 PM
> > To: beignet@lists.freedesktop.org
> > Cc: Yang, Rong R <rong.r.y...@intel.com>
> > Subject: [Beignet] [PATCH] Doc: add OpenCL 2.0 section to readme.
> >
> > Signed-off-by: Yang Rong <rong.r.y...@intel.com>
> > ---
> >  docs/Beignet.mdwn | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/docs/Beignet.mdwn b/docs/Beignet.mdwn index
> > 64d33dc..b0fea71 100644
> > --- a/docs/Beignet.mdwn
> > +++ b/docs/Beignet.mdwn
> > @@ -158,6 +158,18 @@ Supported Targets
> >   * 6th Generation Intel Core Processors "Skylake" and "Kabylake".
> >   * 5th Generation Intel Atom Processors "Broxten" or "Apollolake".
> >
> > +OpenCL 2.0
> > +--
> > +From release v1.3.0, beignet support OpenCL 2.0. By default, OpenCL
> > +2.0
> > support is disabled, you can enable it when cmake with option
> > +-DENABLE_OPENCL_20=1. Please remember that to enable OpenCL 2.0,
> > +there
> > are some dependencies. First, OpenCL 2.0 only support the targets
> > +from Skylake, include Skylake, Kabylake and Apollolake. Then, only
> > +clang
>   
>   "clang only" is more proper
> here? I am not sure:)
> > support all OpenCL 2.0 feature from 3.9. So to enable OpenCL 2.0,
> > +you must update LLVM/clang to 3.9 or later. And also requeired libdrm
> > +at least
>   
>  "requires"
> > 2.4.66.
> > +After enable OpenCL 2.0, beignet complies with OpenCL 2.0 spec, but
> > +some
> > OpenCL 2.0 features is simulated by software, there is no performance
>  "are"
> > +gain, such as pipe and device queues, especially device queues.
> > +If you build beignet with enable OpenCL 2.0 and your kernel don't use
> > +the
>  "with OpenCL 2.0 enabled"
> > OpenCL 2.0 features, please pass a build option -cl-std=CL1.2 for
> > +performance, the OpenCL 2.0 uses more registers and has lots of long
>   "has lots of int64 
> operations" may be more clear
> > operations, which may hurt performance, and beignet will continue to
> > improve
> > +OpenCL 2.0 performance.
> > +
> >  Known Issues
> >  
> >
> > --
> > 2.1.4
> >
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Fix two bugs about command queue destroy.

2017-01-11 Thread Yang, Rong R
LGTM, pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> junyan...@inbox.com
> Sent: Wednesday, January 11, 2017 16:36
> To: beignet@lists.freedesktop.org
> Cc: He, Junyan 
> Subject: [Beignet] [PATCH] Fix two bugs about command queue destroy.
> 
> From: Junyan He 
> 
> 1. Call finish before we destroy the command queue.
>    We should make sure all the commands in the queue have
>    finished before we really destroy the command_queue.
>    Otherwise the event status may be wrong. We leave the queue's
>    lifetime to the user and do not take a reference on the queue
>    when creating an event.
> 2. Loosen the assert condition when notifying the queue.
>    There are cases where the queue's ref count is 0 but we
>    still need to notify.
> 
> Signed-off-by: Junyan He 
> ---
>  src/cl_command_queue.c | 3 +++
>  src/cl_command_queue_enqueue.c | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/cl_command_queue.c b/src/cl_command_queue.c index
> aa371d0..b855ff6 100644
> --- a/src/cl_command_queue.c
> +++ b/src/cl_command_queue.c
> @@ -81,6 +81,9 @@ cl_command_queue_delete(cl_command_queue
> queue)
>if (CL_OBJECT_DEC_REF(queue) > 1)
>  return;
> 
> +  /* Before we destroy the queue, we should make sure all
> + the commands in the queue are finished. */
> + cl_command_queue_wait_finish(queue);
>cl_context_remove_queue(queue->ctx, queue);
> 
>cl_command_queue_destroy_enqueue(queue);
> diff --git a/src/cl_command_queue_enqueue.c
> b/src/cl_command_queue_enqueue.c index 91fabf9..44a0761 100644
> --- a/src/cl_command_queue_enqueue.c
> +++ b/src/cl_command_queue_enqueue.c
> @@ -122,7 +122,7 @@ cl_command_queue_notify(cl_command_queue
> queue)
>  return;
>}
> 
> -  assert(CL_OBJECT_IS_COMMAND_QUEUE(queue));
> +  assert(queue && (((cl_base_object)queue)->magic ==
> + CL_OBJECT_COMMAND_QUEUE_MAGIC));
>CL_OBJECT_LOCK(queue);
>queue->worker.cookie++;
>CL_OBJECT_NOTIFY_COND(queue);
> --
> 2.7.4
> 
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet
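
The "finish before destroy" rule from the patch above can be sketched in isolation. This is only an illustration under stated assumptions: ToyQueue, toy_queue_wait_finish and toy_queue_delete are hypothetical names, not Beignet's structures or functions, and the real runtime uses a worker thread rather than a simple deque.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical stand-in for a command queue; not Beignet's real structure.
struct ToyQueue {
  int ref = 1;                                // references held on the queue
  std::deque<std::function<void()>> pending;  // commands not yet executed
  bool destroyed = false;
};

// Run every pending command, playing the role that
// cl_command_queue_wait_finish plays in the patch.
void toy_queue_wait_finish(ToyQueue &q) {
  while (!q.pending.empty()) {
    q.pending.front()();
    q.pending.pop_front();
  }
}

// Drop one reference; on the last one, drain the queue before teardown.
void toy_queue_delete(ToyQueue &q) {
  if (--q.ref > 0)
    return;                  // other owners remain, nothing to destroy yet
  toy_queue_wait_finish(q);  // no command may outlive the queue
  q.destroyed = true;
}
```

Draining first means that any event attached to a pending command reaches a final status before the queue goes away, which is the failure the commit message describes.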


Re: [Beignet] [PATCH] GBE: Fix getTypesize bug with LLVM3.9

2017-01-08 Thread Yang, Rong R
LLVM 3.9 has changed the pipes' metadata. LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Xiuli Pan
> Sent: Monday, January 9, 2017 15:00
> To: beignet@lists.freedesktop.org
> Cc: Pan, Xiuli 
> Subject: [Beignet] [PATCH] GBE: Fix getTypesize bug with LLVM3.9
> 
> From: Pan Xiuli 
> 
> We check the size of some types, but some of the type names changed in
> LLVM 3.9. Change the checks to match the new type names.
> 
> Signed-off-by: Pan Xiuli 
> ---
>  backend/src/llvm/llvm_gen_backend.cpp | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/backend/src/llvm/llvm_gen_backend.cpp
> b/backend/src/llvm/llvm_gen_backend.cpp
> index 467b1de..064515b 100644
> --- a/backend/src/llvm/llvm_gen_backend.cpp
> +++ b/backend/src/llvm/llvm_gen_backend.cpp
> @@ -281,7 +281,7 @@ namespace gbe
>  return CPV;
>}
> 
> -#define TYPESIZE(TYPE,VECT,SZ) else if(name ==
> std::string(#TYPE).append(#VECT)) return VECT*SZ;
> +#define TYPESIZE(TYPE,VECT,SZ) else if( name ==
> +std::string(#TYPE).append(" __attribute__((ext_vector_type("#VECT")))")
> +) return VECT*SZ;
>  #define TYPESIZEVEC(TYPE,SZ)\
>else if(name == #TYPE) return SZ;\
>TYPESIZE(TYPE,2,SZ)\
> @@ -293,21 +293,22 @@ namespace gbe
>static uint32_t getTypeSize(Module* M, const ir::Unit &unit, std::string& name) {
>if(name == "size_t") return sizeof(size_t);
>TYPESIZEVEC(char,1)
> -  TYPESIZEVEC(uchar,1)
> +  TYPESIZEVEC(unsigned char,1)
>TYPESIZEVEC(short,2)
> -  TYPESIZEVEC(ushort,2)
> +  TYPESIZEVEC(unsigned short,2)
>TYPESIZEVEC(half,2)
>TYPESIZEVEC(int,4)
> -  TYPESIZEVEC(uint,4)
> +  TYPESIZEVEC(unsigned int,4)
>TYPESIZEVEC(float,4)
>TYPESIZEVEC(double,8)
>TYPESIZEVEC(long,8)
> -  TYPESIZEVEC(ulong,8)
> +  TYPESIZEVEC(unsigned long,8)
>else{
>  StructType *StrTy = M->getTypeByName("struct."+name);
>  if(StrTy)
>return getTypeByteSize(unit,StrTy);
>}
> +  GBE_ASSERTM(false, "Unspported type name");
>return 0;
>}
>  #undef TYPESIZEVEC
> --
> 2.7.4
> 
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet
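
The macro change in the patch above relies on preprocessor stringification to build the longer type names that the patch matches for LLVM 3.9. A reduced sketch follows; VEC_NAME and toy_type_size are hypothetical names introduced here, only two vector widths are covered, and unknown names return 0 instead of asserting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build the LLVM 3.9-style vector type name for a base type and width,
// e.g. "float __attribute__((ext_vector_type(4)))".
#define VEC_NAME(TYPE, VECT) \
  (std::string(#TYPE) + " __attribute__((ext_vector_type(" #VECT ")))")

// Match one vector width; SZ is the size of the scalar type in bytes.
#define TYPESIZE(TYPE, VECT, SZ) \
  if (name == VEC_NAME(TYPE, VECT)) return VECT * SZ;

// Match the scalar name, then the vector widths covered by this sketch.
#define TYPESIZEVEC(TYPE, SZ)   \
  if (name == #TYPE) return SZ; \
  TYPESIZE(TYPE, 2, SZ)         \
  TYPESIZE(TYPE, 4, SZ)

// Hypothetical helper: returns 0 for names it does not recognise.
uint32_t toy_type_size(const std::string &name) {
  TYPESIZEVEC(char, 1)
  TYPESIZEVEC(unsigned int, 4)
  TYPESIZEVEC(float, 4)
  return 0;
}

#undef TYPESIZEVEC
#undef TYPESIZE
#undef VEC_NAME
```

Because #TYPE stringifies the whole argument, a two-word type such as "unsigned int" works without extra handling, which is why the patch could switch from uint to unsigned int by changing only the macro arguments.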


Re: [Beignet] [PATCH] Fail, don't assert, if unable to create context

2017-01-08 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Rebecca N. Palmer
> Sent: Monday, January 9, 2017 5:33
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Fail, don't assert, if unable to create context
> 
> As the "do we have any usable devices?" check uses this, it needs to not
> crash even when we don't.
> 
> Signed-off-by: Rebecca N. Palmer 
> ---
> The user who reported a crash here ( https://bugs.debian.org/848792 ) was
> using unsupported hardware, but I don't know whether this is the reason
> they can't create a context.
> 
> diff --git a/src/intel/intel_driver.c b/src/intel/intel_driver.c index
> a8d554c..b8a1b52 100644
> --- a/src/intel/intel_driver.c
> +++ b/src/intel/intel_driver.c
> @@ -134,11 +134,12 @@ intel_driver_aub_dump(driver);  return 1;  }
> 
> -static void
> +static int
>  intel_driver_context_init(intel_driver_t *driver)  {  driver->ctx =
> drm_intel_gem_context_create(driver->bufmgr);
> -assert(driver->ctx);
> +if (!driver->ctx)
> +  return 0;
>  driver->null_bo = NULL;
>  #ifdef HAS_BO_SET_SOFTPIN
>  drm_intel_bo *bo = dri_bo_alloc(driver->bufmgr, "null_bo", 64*1024, 4096);
> @@ -148,6 +149,7 @@ drm_intel_bo_set_softpin_offset(bo, 0);
> drm_intel_bo_disable_reuse(bo);  driver->null_bo = bo;  #endif
> +return 1;
>  }
> 
>  static void
> @@ -168,7 +170,7 @@ driver->locked = 0;
>  pthread_mutex_init(&driver->ctxmutex, NULL);
> 
>  if (!intel_driver_memman_init(driver)) return 0; -
> intel_driver_context_init(driver);
> +if (!intel_driver_context_init(driver)) return 0;
> 
>  #if EMULATE_GEN
>  driver->gen_ver = EMULATE_GEN;
> 
> ___
> Beignet mailing list
> Beignet@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/beignet
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet
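
The pattern in the patch above, returning a status instead of asserting, can be shown with a small self-contained sketch. All names here (ToyDriver, toy_context_create, toy_context_init, toy_driver_init) are hypothetical, and the hardware_supported flag merely simulates a context-creation call that can fail.

```cpp
#include <cassert>

// Hypothetical driver state; ctx stays null when no context can be created.
struct ToyDriver {
  void *ctx = nullptr;
  bool usable = false;
};

// Stand-in for the context-creation call; returns nullptr on failure.
void *toy_context_create(bool hardware_supported) {
  static int dummy_context;
  return hardware_supported ? &dummy_context : nullptr;
}

// Report failure through the return value rather than asserting.
int toy_context_init(ToyDriver &driver, bool hardware_supported) {
  driver.ctx = toy_context_create(hardware_supported);
  if (!driver.ctx)
    return 0;
  return 1;
}

// The caller propagates the failure, so a "do we have any usable
// devices?" probe gets an answer instead of a crash.
int toy_driver_init(ToyDriver &driver, bool hardware_supported) {
  if (!toy_context_init(driver, hardware_supported))
    return 0;
  driver.usable = true;
  return 1;
}
```

The design point is that a device probe must be able to answer "no" on unsupported hardware; an assert turns that answer into an abort in debug builds and into undefined behaviour later in release builds.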


Re: [Beignet] [Patch V2 1/2] GBE: fix legacy read64 mix pointer bug.

2017-01-08 Thread Yang, Rong R
Pushed, thanks.

> -Original Message-
> From: Song, Ruiling
> Sent: Monday, January 9, 2017 13:21
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [Patch V2 1/2] GBE: fix legacy read64 mix pointer bug.
> 
> LG
> 
> - Ruiling
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Runtime: return CL_INVALID_EVENT_WAIT_LIST if not event in the wait list.

2016-12-28 Thread Yang, Rong R
Pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> He Junyan
> Sent: Wednesday, December 28, 2016 17:56
> To: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Runtime: return
> CL_INVALID_EVENT_WAIT_LIST if not event in the wait list.
> 
> Thanks for catching that bug.
> 
> On Wed, Dec 28, 2016 at 06:47:01PM +0800, Yang Rong wrote:
> > Date: Wed, 28 Dec 2016 18:47:01 +0800
> > From: Yang Rong 
> > To: beignet@lists.freedesktop.org
> > Cc: Meng Mengmeng , Yang Rong
> > 
> > Subject: [Beignet] [PATCH] Runtime: return
> CL_INVALID_EVENT_WAIT_LIST
> > if  not event in the wait list.
> > X-Mailer: git-send-email 2.1.4
> >
> > From: Meng Mengmeng 
> >
> > Signed-off-by: Yang Rong 
> > ---
> >  src/cl_event.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/cl_event.c b/src/cl_event.c index 8173578..644a21f
> > 100644
> > --- a/src/cl_event.c
> > +++ b/src/cl_event.c
> > @@ -546,7 +546,7 @@ cl_event_check_waitlist(cl_uint
> num_events_in_wait_list, const cl_event *event_w
> >  /* check the event and context */
> >  for (i = 0; i < num_events_in_wait_list; i++) {
> >if (!CL_OBJECT_IS_EVENT(event_wait_list[i])) {
> > -err = CL_INVALID_EVENT;
> > +err = CL_INVALID_EVENT_WAIT_LIST;
> >  break;
> >}
> >
> > --
> > 2.1.4
> >
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet
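
The one-line fix above changes which error a bad wait list produces. A minimal sketch of such a check follows, written without the OpenCL headers: the TOY_ constants are local stand-ins whose values mirror CL_SUCCESS and CL_INVALID_EVENT_WAIT_LIST in CL/cl.h, while ToyEvent, its magic number and toy_check_waitlist are hypothetical and do not reproduce Beignet's cl_event_check_waitlist.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-ins so the sketch builds without the OpenCL headers.
constexpr int TOY_SUCCESS = 0;
constexpr int TOY_INVALID_EVENT_WAIT_LIST = -57;

// Hypothetical event object tagged with a magic number, in the spirit of
// the CL_OBJECT_IS_EVENT check in the patch.
constexpr unsigned long TOY_EVENT_MAGIC = 0x45564e54UL;
struct ToyEvent {
  unsigned long magic;
};

bool toy_is_event(const ToyEvent *e) {
  return e != nullptr && e->magic == TOY_EVENT_MAGIC;
}

// Validate a wait list: the count and the pointer must agree, and every
// entry must be a valid event; any violation is a wait-list error.
int toy_check_waitlist(std::size_t num_events,
                       const ToyEvent *const *wait_list) {
  if ((num_events == 0) != (wait_list == nullptr))
    return TOY_INVALID_EVENT_WAIT_LIST;
  for (std::size_t i = 0; i < num_events; ++i) {
    if (!toy_is_event(wait_list[i]))
      return TOY_INVALID_EVENT_WAIT_LIST;
  }
  return TOY_SUCCESS;
}
```

An invalid object inside a wait list is reported as a wait-list error rather than as a plain invalid-event error, which is the distinction the patch restores.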


Re: [Beignet] [PATCH] Avoid possible invalid pointer by vector interator.

2016-12-28 Thread Yang, Rong R
Pushed.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Wednesday, December 28, 2016 16:33
> To: yan.w...@linux.intel.com; beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] Avoid possible invalid pointer by vector
> interator.
> 
> Looks good. Thanks for the fix.
> 
> - Ruiling
> 
> > -Original Message-
> > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf
> > Of yan.w...@linux.intel.com
> > Sent: Wednesday, December 28, 2016 4:28 PM
> > To: beignet@lists.freedesktop.org
> > Cc: Yan Wang 
> > Subject: [Beignet] [PATCH] Avoid possible invalid pointer by vector
> interator.
> >
> > From: Yan Wang 
> >
> > findPointerEscape() may push more elements into the "revisit" vector
> > container, which can invalidate the iterator used by the enclosing
> > loop and leave it pointing at freed memory.
> > When compiling a huge kernel like Blender's, this causes random
> > segmentation faults.
> > Indexing with the [] operator is safer.
> >
> > Signed-off-by: Yan Wang 
> > ---
> >  backend/src/llvm/llvm_gen_backend.cpp | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/backend/src/llvm/llvm_gen_backend.cpp
> > b/backend/src/llvm/llvm_gen_backend.cpp
> > index 8c7a230..e3543ae 100644
> > --- a/backend/src/llvm/llvm_gen_backend.cpp
> > +++ b/backend/src/llvm/llvm_gen_backend.cpp
> > @@ -1437,8 +1437,8 @@ namespace gbe
> >}
> >  }
> >  // storing/loading pointer would introduce revisit
> > -for (std::vector::iterator iter = revisit.begin(); iter !=
> revisit.end();
> > ++iter) {
> > -  findPointerEscape(*iter, mixedPtr, true, revisit);
> > +for (size_t i = 0; i < revisit.size(); ++i) {
> > +  findPointerEscape(revisit[i], mixedPtr, true, revisit);
> >  }
> >
> >  // the second pass starts from mixed pointer
> > --
> > 2.7.4
> >
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet
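
The iterator problem fixed above is a general std::vector hazard: push_back may reallocate the storage and invalidate every outstanding iterator. A small sketch of the index-based loop the patch switches to, using hypothetical names (toy_visit, toy_process_all) and plain ints in place of LLVM values:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical visitor: processing a value may append follow-up work,
// the way findPointerEscape() can push onto the "revisit" vector.
void toy_visit(int value, std::vector<int> &worklist) {
  if (value > 0)
    worklist.push_back(value - 1);
}

// Walk a worklist that grows while it is being processed.
std::size_t toy_process_all(std::vector<int> &worklist) {
  std::size_t visited = 0;
  // size() is re-read on every pass and elements are reached by index,
  // so a reallocation inside toy_visit cannot leave a dangling iterator.
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    toy_visit(worklist[i], worklist);  // element is copied before the call
    ++visited;
  }
  return visited;
}
```

Starting from a worklist holding only 3, the loop visits 3, 2, 1 and 0, including the entries appended during the walk. Note that the element is passed by value; passing a reference into the vector would reintroduce the same dangling-pointer risk.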


Re: [Beignet] [PATCH 10/19] OCL20: enable -cl-std=CL2.0.

2016-12-28 Thread Yang, Rong R
The first 10 patches are pushed, thanks.

> -Original Message-
> From: Song, Ruiling
> Sent: Monday, December 19, 2016 14:47
> To: Yang, Rong R <rong.r.y...@intel.com>; beignet@lists.freedesktop.org;
> Pan, Xiuli <xiuli....@intel.com>
> Cc: Yang, Rong R <rong.r.y...@intel.com>
> Subject: RE: [Beignet] [PATCH 10/19] OCL20: enable -cl-std=CL2.0.
> 
> The first 10 patches in the patchset LGTM.
> As the last 9 patches are related to device_enqueue. I am not sure Xiuli
> would have any comments?
> 
> Thanks!
> Ruiling

___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] Refine list related functions.

2016-12-28 Thread Yang, Rong R
LGTM, pushed, thanks.

> -Original Message-
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> junyan...@inbox.com
> Sent: Wednesday, December 21, 2016 19:05
> To: beignet@lists.freedesktop.org
> Subject: [Beignet] [PATCH] Refine list related functions.
> 
> From: Junyan He 
> 
> Make the list-related functions clearer and more readable.
> 
> Signed-off-by: Junyan He 
> ---
>  src/cl_base_object.c   |   6 +--
>  src/cl_base_object.h   |   2 +-
>  src/cl_command_queue_enqueue.c |  22 -
>  src/cl_context.c   |  24 +-
>  src/cl_event.c |  23 +
>  src/cl_event.h |   2 +-
>  src/cl_mem.c   |  10 ++--
>  src/cl_mem.h   |   4 +-
>  src/cl_utils.c |  51 
>  src/cl_utils.h | 104 
> -
>  10 files changed, 137 insertions(+), 111 deletions(-)
> 
> diff --git a/src/cl_base_object.c b/src/cl_base_object.c index
> 00c4b35..af537cc 100644
> --- a/src/cl_base_object.c
> +++ b/src/cl_base_object.c
> @@ -29,7 +29,7 @@ cl_object_init_base(cl_base_object obj, cl_ulong magic)
>pthread_mutex_init(&obj->mutex, NULL);
>pthread_cond_init(&obj->cond, NULL);
>obj->owner = invalid_thread_id;
> -  list_init(&obj->node);
> +  list_node_init(&obj->node);
>  }
> 
>  LOCAL void
> @@ -54,9 +54,9 @@ cl_object_destroy_base(cl_base_object obj)
>  assert(0);
>}
> 
> -  if (!list_empty(&obj->node)) {
> +  if (!list_node_out_of_list(&obj->node)) {
>  DEBUGP(DL_ERROR, "CL object %p, call destroy while still belong to some
> object %p",
> -   obj, obj->node.prev);
> +   obj, obj->node.p);
>  assert(0);
>}
> 
> diff --git a/src/cl_base_object.h b/src/cl_base_object.h index
> 4e643df..d064a82 100644
> --- a/src/cl_base_object.h
> +++ b/src/cl_base_object.h
> @@ -47,7 +47,7 @@ typedef struct _cl_base_object {
>DEFINE_ICD(dispatch);  /* Dispatch function table for icd */
>cl_ulong magic;/* Magic number for each CL object */
>atomic_t ref;  /* Reference for each CL object */
> -  list_head node;/* CL object node belong to some container */
> +  list_node node;/* CL object node belong to some container */
>pthread_mutex_t mutex; /* THe mutex to protect this object MT safe */
>pthread_cond_t cond;   /* Condition to wait for getting the object */
>pthread_t owner;   /* The thread which own this object */
> diff --git a/src/cl_command_queue_enqueue.c
> b/src/cl_command_queue_enqueue.c index 32545b3..8bf05a2 100644
> --- a/src/cl_command_queue_enqueue.c
> +++ b/src/cl_command_queue_enqueue.c
> @@ -29,8 +29,8 @@ worker_thread_function(void *Arg)
>cl_command_queue queue = worker->queue;
>cl_event e;
>cl_uint cookie = -1;
> -  list_head *pos;
> -  list_head *n;
> +  list_node *pos;
> +  list_node *n;
>list_head ready_list;
>cl_int exec_status;
> 
> @@ -63,8 +63,8 @@ worker_thread_function(void *Arg)
>  {
>e = list_entry(pos, _cl_event, enqueue_node);
>if (cl_event_is_ready(e) <= CL_COMPLETE) {
> -list_del(&e->enqueue_node);
> -list_add_tail(&e->enqueue_node, &ready_list);
> +list_node_del(&e->enqueue_node);
> +list_add_tail(&ready_list, &e->enqueue_node);
>}
>  }
> 
> @@ -105,7 +105,7 @@ worker_thread_function(void *Arg)
>  list_for_each_safe(pos, n, _list)
>  {
>e = list_entry(pos, _cl_event, enqueue_node);
> -  list_del(&e->enqueue_node);
> +  list_node_del(&e->enqueue_node);
>cl_event_delete(e);
>  }
> 
> @@ -138,8 +138,8 @@
> cl_command_queue_enqueue_event(cl_command_queue queue, cl_event
> event)
>assert(CL_OBJECT_IS_COMMAND_QUEUE(queue));
>CL_OBJECT_LOCK(queue);
>assert(queue->worker.quit == CL_FALSE);
> -  assert(list_empty(&event->enqueue_node));
> -  list_add_tail(&event->enqueue_node, &queue->worker.enqueued_events);
> +  assert(list_node_out_of_list(&event->enqueue_node));
> +  list_add_tail(&queue->worker.enqueued_events, &event->enqueue_node);
>queue->worker.cookie++;
>CL_OBJECT_NOTIFY_COND(queue);
>CL_OBJECT_UNLOCK(queue);
> @@ -167,8 +167,8 @@ LOCAL void
>  cl_command_queue_destroy_enqueue(cl_command_queue queue)  {
>cl_command_queue_enqueue_worker worker = >worker;
> -  list_head *pos;
> -  list_head *n;
> +  list_node *pos;
> +  list_node *n;
>cl_event e;
> 
>assert(worker->queue == queue);
> @@ -190,7 +190,7 @@
> cl_command_queue_destroy_enqueue(cl_command_queue queue)
>  list_for_each_safe(pos, n, >enqueued_events)
>  {
>e = list_entry(pos, _cl_event, enqueue_node);
> -  list_del(&e->enqueue_node);
> +  list_node_del(&e->enqueue_node);
>cl_event_set_status(e, -1); // Give waiters a chance to wakeup.
>cl_event_delete(e);
>  }
> @@ -202,7 +202,7 @@ LOCAL cl_event *
>  cl_command_queue_record_in_queue_events(cl_command_queue queue,
> cl_uint *list_num)  {
>int 
