Re: [Beignet] [PATCH] utests: add utest to double version of log, log2, log10

2017-03-28 Thread Song, Ruiling


> -----Original Message-----
> From: Wang, Rander
> Sent: Wednesday, March 29, 2017 1:12 PM
> To: Song, Ruiling ; beig...@freedesktop.org
> Subject: RE: [Beignet] [PATCH] utests: add utest to double version of log, log2, log10
> 
> As far as I know, Ivy Bridge supports double. Is there anything wrong?
IVB does support many double operations, but division on double is not
supported until GEN8.
What's more, if I remember correctly, double add/mul on IVB and HSW has very
low precision, far below what is required.
So we can only support double from GEN8 onward.

Thanks!
Ruiling
> 
> -----Original Message-----
> From: Song, Ruiling
> Sent: Wednesday, March 29, 2017 11:06 AM
> To: Wang, Rander ; beig...@freedesktop.org
> Cc: Wang, Rander 
> Subject: RE: [Beignet] [PATCH] utests: add utest to double version of log, log2, log10
> 
> > +static void builtin_double_logx(void) {
> > +  // Setup kernel and buffers
> > +  int k, i, index_cur;
> > +  unsigned long gpu_data[max_function * count_input] = {0};
> > +  float diff;
> > +  char log[256] = {0};
> > +
> > +  OCL_CREATE_KERNEL("builtin_double_logx");
> > +
> 
> I just thought of a problem: you need to check whether the cl_khr_fp64
> extension is supported.
> Older generation hardware cannot support double, so this utest will
> certainly fail there.
> So I think you need to add this extension check to all the double test
> cases.
> 
> Thanks!
> Ruiling
___
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


Re: [Beignet] [PATCH] utests: add utest to double version of log, log2, log10

2017-03-28 Thread Wang, Rander
As far as I know, Ivy Bridge supports double. Is there anything wrong?

-----Original Message-----
From: Song, Ruiling 
Sent: Wednesday, March 29, 2017 11:06 AM
To: Wang, Rander ; beig...@freedesktop.org
Cc: Wang, Rander 
Subject: RE: [Beignet] [PATCH] utests: add utest to double version of log, log2, log10

> +static void builtin_double_logx(void) {
> +  // Setup kernel and buffers
> +  int k, i, index_cur;
> +  unsigned long gpu_data[max_function * count_input] = {0};
> +  float diff;
> +  char log[256] = {0};
> +
> +  OCL_CREATE_KERNEL("builtin_double_logx");
> +

I just thought of a problem: you need to check whether the cl_khr_fp64
extension is supported.
Older generation hardware cannot support double, so this utest will
certainly fail there.
So I think you need to add this extension check to all the double test
cases.

Thanks!
Ruiling


Re: [Beignet] [PATCH] utests: add utest to double version of log, log2, log10

2017-03-28 Thread Song, Ruiling
> +static void builtin_double_logx(void)
> +{
> +  // Setup kernel and buffers
> +  int k, i, index_cur;
> +  unsigned long gpu_data[max_function * count_input] = {0};
> +  float diff;
> +  char log[256] = {0};
> +
> +  OCL_CREATE_KERNEL("builtin_double_logx");
> +

I just thought of a problem: you need to check whether the cl_khr_fp64
extension is supported.
Older generation hardware cannot support double, so this utest will
certainly fail there.
So I think you need to add this extension check to all the double test
cases.

Thanks!
Ruiling
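
For illustration only: the extension list a device reports through
clGetDeviceInfo() with CL_DEVICE_EXTENSIONS is a space-separated string, so
the check asked for above reduces to a whole-token search in that string.
The sketch below assumes the string has already been fetched; has_extension()
is a made-up helper name, not something taken from the Beignet utest tree,
and how a test actually gets skipped is left to the utest framework.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when `name` appears as a whole
 * space-separated token in `ext_list`. A bare strstr() is not enough,
 * because it would also match a longer extension name that merely
 * contains `name` as a substring. */
static bool has_extension(const char *ext_list, const char *name)
{
  size_t len = strlen(name);
  const char *p = ext_list;

  if (len == 0)
    return false;

  while ((p = strstr(p, name)) != NULL) {
    bool starts_token = (p == ext_list) || (p[-1] == ' ');
    bool ends_token = (p[len] == '\0') || (p[len] == ' ');
    if (starts_token && ends_token)
      return true;
    p += len; /* keep searching after this partial match */
  }
  return false;
}
```

A double test could then return early when
has_extension(ext_string, "cl_khr_fp64") is false, where ext_string is
whatever the device reported.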


[Beignet] [PATCH newRT] Move pci id for gen to gen dir.

2017-03-28 Thread junyan . he
From: Junyan He 

Move the logic for recognizing the gen device's pci id to gen
dir and rename it to gen_device_pci_id.h.

Signed-off-by: Junyan He 
---
 src/gen/gen_device_pci_id.h | 365 
 1 file changed, 365 insertions(+)
 create mode 100644 src/gen/gen_device_pci_id.h

diff --git a/src/gen/gen_device_pci_id.h b/src/gen/gen_device_pci_id.h
new file mode 100644
index 000..ac2c803
--- /dev/null
+++ b/src/gen/gen_device_pci_id.h
@@ -0,0 +1,365 @@
+/* 
+ * Copyright © 2012 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see .
+ *
+ * Author: Benjamin Segovia 
+ */
+
+#ifndef __GEN_DEVICE_PCI_ID_H__
+#define __GEN_DEVICE_PCI_ID_H__
+
+#define INVALID_CHIP_ID -1 //returned by intel_get_device_id if no device found
+#define INTEL_VENDOR_ID 0x8086 // Vendor ID for Intel
+
+#define PCI_CHIP_GM45_GM 0x2A42
+#define PCI_CHIP_IGD_E_G 0x2E02
+#define PCI_CHIP_Q45_G 0x2E12
+#define PCI_CHIP_G45_G 0x2E22
+#define PCI_CHIP_G41_G 0x2E32
+
+#define PCI_CHIP_IGDNG_D_G 0x0042
+#define PCI_CHIP_IGDNG_M_G 0x0046
+
+#define IS_G45(devid) (devid == PCI_CHIP_IGD_E_G || \
+   devid == PCI_CHIP_Q45_G ||   \
+   devid == PCI_CHIP_G45_G ||   \
+   devid == PCI_CHIP_G41_G)
+#define IS_GM45(devid) (devid == PCI_CHIP_GM45_GM)
+#define IS_G4X(devid) (IS_G45(devid) || IS_GM45(devid))
+
+#define IS_IGDNG_D(devid) (devid == PCI_CHIP_IGDNG_D_G)
+#define IS_IGDNG_M(devid) (devid == PCI_CHIP_IGDNG_M_G)
+#define IS_IGDNG(devid) (IS_IGDNG_D(devid) || IS_IGDNG_M(devid))
+
+#ifndef PCI_CHIP_SANDYBRIDGE_BRIDGE
+#define PCI_CHIP_SANDYBRIDGE_BRIDGE 0x0100 /* Desktop */
+#define PCI_CHIP_SANDYBRIDGE_GT1 0x0102
+#define PCI_CHIP_SANDYBRIDGE_GT2 0x0112
+#define PCI_CHIP_SANDYBRIDGE_GT2_PLUS 0x0122
+#define PCI_CHIP_SANDYBRIDGE_BRIDGE_M 0x0104 /* Mobile */
+#define PCI_CHIP_SANDYBRIDGE_M_GT1 0x0106
+#define PCI_CHIP_SANDYBRIDGE_M_GT2 0x0116
+#define PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS 0x0126
+#define PCI_CHIP_SANDYBRIDGE_BRIDGE_S 0x0108 /* Server */
+#define PCI_CHIP_SANDYBRIDGE_S_GT 0x010A
+#endif
+
+#define IS_GEN6(devid) \
+  (devid == PCI_CHIP_SANDYBRIDGE_GT1 ||\
+   devid == PCI_CHIP_SANDYBRIDGE_GT2 ||\
+   devid == PCI_CHIP_SANDYBRIDGE_GT2_PLUS ||   \
+   devid == PCI_CHIP_SANDYBRIDGE_M_GT1 ||  \
+   devid == PCI_CHIP_SANDYBRIDGE_M_GT2 ||  \
+   devid == PCI_CHIP_SANDYBRIDGE_M_GT2_PLUS || \
+   devid == PCI_CHIP_SANDYBRIDGE_S_GT)
+
+#define PCI_CHIP_IVYBRIDGE_GT1 0x0152 /* Desktop */
+#define PCI_CHIP_IVYBRIDGE_GT2 0x0162
+#define PCI_CHIP_IVYBRIDGE_M_GT1 0x0156 /* Mobile */
+#define PCI_CHIP_IVYBRIDGE_M_GT2 0x0166
+#define PCI_CHIP_IVYBRIDGE_S_GT1 0x015a /* Server */
+#define PCI_CHIP_IVYBRIDGE_S_GT2 0x016a
+
+#define PCI_CHIP_BAYTRAIL_T 0x0F31
+
+#define IS_IVB_GT1(devid)   \
+  (devid == PCI_CHIP_IVYBRIDGE_GT1 ||   \
+   devid == PCI_CHIP_IVYBRIDGE_M_GT1 || \
+   devid == PCI_CHIP_IVYBRIDGE_S_GT1)
+
+#define IS_IVB_GT2(devid)   \
+  (devid == PCI_CHIP_IVYBRIDGE_GT2 ||   \
+   devid == PCI_CHIP_IVYBRIDGE_M_GT2 || \
+   devid == PCI_CHIP_IVYBRIDGE_S_GT2)
+
+#define IS_BAYTRAIL_T(devid) \
+  (devid == PCI_CHIP_BAYTRAIL_T)
+
+#define IS_IVYBRIDGE(devid) (IS_IVB_GT1(devid) || IS_IVB_GT2(devid) || IS_BAYTRAIL_T(devid))
+#define IS_GEN7(devid) IS_IVYBRIDGE(devid)
+
+#define PCI_CHIP_HASWELL_D1 0x0402 /* GT1 desktop */
+#define PCI_CHIP_HASWELL_D2 0x0412 /* GT2 desktop */
+#define PCI_CHIP_HASWELL_D3 0x0422 /* GT3 desktop */
+#define PCI_CHIP_HASWELL_S1 0x040a /* GT1 server */
+#define PCI_CHIP_HASWELL_S2 0x041a /* GT2 server */
+#define PCI_CHIP_HASWELL_S3 0x042a /* GT3 server */
+#define PCI_CHIP_HASWELL_M1 0x0406 /* GT1 mobile */
+#define PCI_CHIP_HASWELL_M2 0x0416 /* GT2 mobile */
+#define PCI_CHIP_HASWELL_M3 0x0426 /* GT3 mobile */
+#define PCI_CHIP_HASWELL_B1 0x040B /* Haswell GT1 */
+#define PCI_CHIP_HASWELL_B2 0x041B /* Haswell GT2 */
+#define PCI_CHIP_HASWELL_B3 0x042B /* Haswell GT3 */
+#define PCI_CHIP_HASWELL_E1 0x040E /* Haswell GT1 */
+#define PCI_CHIP_HASWELL_E2 0x041E /* Haswell GT2 */
+#define PCI_CHIP_HASWELL_E3 0x042E /* Haswell GT3 */
+
+/* Software Development Vehicle devices. */
+#define PCI_CHIP_HASWELL_SDV_D1 0x0C02 /* SDV GT1 desktop */
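
As a rough sketch of how the IS_* tests in this header are meant to be used,
the fragment below maps a PCI device ID to a generation number. Only a few
IDs are copied from the patch; gen_from_devid() and the *_SKETCH macro names
are invented for this example. One deliberate difference from the patch: the
macro argument is parenthesized, so an expression such as `id & 0xffff` is
compared as a whole rather than being split by operator precedence.

```c
/* A few IDs copied from the patch; the rest are omitted for brevity. */
#define PCI_CHIP_SANDYBRIDGE_GT1 0x0102
#define PCI_CHIP_IVYBRIDGE_GT1   0x0152
#define PCI_CHIP_IVYBRIDGE_GT2   0x0162
#define PCI_CHIP_BAYTRAIL_T      0x0F31

/* Same style of test as in the patch, but with (devid) parenthesized. */
#define IS_GEN6_SKETCH(devid) ((devid) == PCI_CHIP_SANDYBRIDGE_GT1)
#define IS_GEN7_SKETCH(devid)           \
  ((devid) == PCI_CHIP_IVYBRIDGE_GT1 || \
   (devid) == PCI_CHIP_IVYBRIDGE_GT2 || \
   (devid) == PCI_CHIP_BAYTRAIL_T)

/* Hypothetical helper: returns the generation for a known device ID,
 * or -1 when the ID is not in this abbreviated list. */
static int gen_from_devid(int devid)
{
  if (IS_GEN6_SKETCH(devid))
    return 6;
  if (IS_GEN7_SKETCH(devid))
    return 7;
  return -1;
}
```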

[Beignet] [PATCH newRT] Add cl_gen_device_common.h file.

2017-03-28 Thread junyan . he
From: Junyan He 

This file will implement all gen device common fields.

Signed-off-by: Junyan He 
---
 src/gen/cl_gen_device_common.h | 118 +
 1 file changed, 118 insertions(+)
 create mode 100644 src/gen/cl_gen_device_common.h

diff --git a/src/gen/cl_gen_device_common.h b/src/gen/cl_gen_device_common.h
new file mode 100644
index 000..ca774e3
--- /dev/null
+++ b/src/gen/cl_gen_device_common.h
@@ -0,0 +1,118 @@
+/* 
+ * Copyright © 2012 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see .
+ *
+ * Author: Benjamin Segovia 
+ */
+
+/* Common fields for both all GT devices (IVB / SNB) */
+.device_type = CL_DEVICE_TYPE_GPU,
+.device_id=0,/* == device_id (set when requested) */
+.vendor_id = INTEL_VENDOR_ID,
+.max_work_item_dimensions = 3,
+.max_1d_global_work_sizes = {1024 * 1024 * 256, 1, 1},
+.max_2d_global_work_sizes = {8192, 8192, 1},
+.max_3d_global_work_sizes = {8192, 8192, 2048},
+.preferred_vector_width_char = 16,
+.preferred_vector_width_short = 8,
+.preferred_vector_width_int = 4,
+.preferred_vector_width_long = 2,
+.preferred_vector_width_float = 4,
+.preferred_vector_width_double = 0,
+.preferred_vector_width_half = 0,
+.native_vector_width_char = 8,
+.native_vector_width_short = 8,
+.native_vector_width_int = 4,
+.native_vector_width_long = 2,
+.native_vector_width_float = 4,
+.native_vector_width_double = 2,
+.native_vector_width_half = 8,
+#ifdef ENABLE_OPENCL_20
+.address_bits = 64,
+#else
+.address_bits = 32,
+#endif
+.svm_capabilities = CL_DEVICE_SVM_COARSE_GRAIN_BUFFER,
+.preferred_platform_atomic_alignment = 0,
+.preferred_global_atomic_alignment = 0,
+.preferred_local_atomic_alignment = 0,
+.image_support = CL_TRUE,
+.max_read_image_args = BTI_MAX_READ_IMAGE_ARGS,
+.max_write_image_args = BTI_MAX_WRITE_IMAGE_ARGS,
+.max_read_write_image_args = BTI_MAX_WRITE_IMAGE_ARGS,
+.image_max_array_size = 2048,
+.image2d_max_width = 8192,
+.image2d_max_height = 8192,
+.image3d_max_width = 8192,
+.image3d_max_height = 8192,
+.image3d_max_depth = 2048,
+.image_mem_size = 65536,
+.max_samplers = 16,
+.mem_base_addr_align = sizeof(cl_long) * 16 * 8,
+.min_data_type_align_size = sizeof(cl_long) * 16,
+.max_pipe_args = 16,
+.pipe_max_active_reservations = 1,
+.pipe_max_packet_siz = 1024,
+.double_fp_config = 0,
+.global_mem_cache_type = CL_READ_WRITE_CACHE,
+.max_constant_buffer_size = 128 * 1024 * 1024,
+.max_constant_args = 8,
+.max_global_variable_size = 64 * 1024,
+.global_variable_preferred_total_size = 64 * 1024,
+.error_correction_support = CL_FALSE,
+#ifdef HAS_USERPTR
+.host_unified_memory = CL_TRUE,
+#else
+.host_unified_memory = CL_FALSE,
+#endif
+.profiling_timer_resolution = 80, /* ns */
+.endian_little = CL_TRUE,
+.available = CL_TRUE,
+.compiler_available = CL_TRUE,
+.linker_available = CL_TRUE,
+.execution_capabilities = CL_EXEC_KERNEL | CL_EXEC_NATIVE_KERNEL,
+.queue_properties = CL_QUEUE_PROFILING_ENABLE,
+.queue_on_host_properties = CL_QUEUE_PROFILING_ENABLE,
+.queue_on_device_properties = CL_QUEUE_PROFILING_ENABLE | CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE,
+.queue_on_device_preferred_size = 16 * 1024,
+.queue_on_device_max_size = 256 * 1024,
+.max_on_device_queues = 1,
+.max_on_device_events = 1024,
+.platform = NULL, /* == intel_platform (set when requested) */
+/* IEEE 754, XXX does IVB support CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT? */
+.single_fp_config = CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST , /* IEEE 754. */
+.half_fp_config = CL_FP_INF_NAN | CL_FP_ROUND_TO_NEAREST ,
+.printf_buffer_size = 1 * 1024 * 1024,
+.interop_user_sync = CL_TRUE,
+
+#define DECL_INFO_STRING(FIELD, STRING) \
+.FIELD = STRING,\
+.JOIN(FIELD,_sz) = sizeof(STRING),
+DECL_INFO_STRING(name, "Intel HD Graphics Family")
+DECL_INFO_STRING(vendor, "Intel")
+DECL_INFO_STRING(version, LIBCL_VERSION_STRING)
+DECL_INFO_STRING(profile, "FULL_PROFILE")
+DECL_INFO_STRING(opencl_c_version, LIBCL_C_VERSION_STRING)
+DECL_INFO_STRING(extensions, "")
+DECL_INFO_STRING(driver_version, LIBCL_DRIVER_VERSION_STRING)
+DECL_INFO_STRING(spir_versions, "1.2")
+#undef DECL_INFO_STRING
+.parent_device = NULL,
+.partition_max_sub_device = 1,
+.partition_property = {0},
+.affinity_domain = 0,
+.partition_type = {0},
+.image_pitch_alignment = 1,
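
The DECL_INFO_STRING macro above fills two designated initializers at once:
the string field and its matching *_sz field. A minimal, self-contained
sketch of that pattern follows. JOIN is not defined in this patch, so the
two-level token-paste version here is an assumption, and the struct
device_info_sketch type is invented for the example.

```c
#include <stddef.h>

/* Assumed definition: the real JOIN lives elsewhere in the tree. */
#define JOIN_(a, b) a##b
#define JOIN(a, b) JOIN_(a, b)

/* Hypothetical cut-down device struct with string/size field pairs. */
struct device_info_sketch {
  const char *name;
  size_t name_sz;
  const char *vendor;
  size_t vendor_sz;
};

/* Expands to ".name = STRING, .name_sz = sizeof(STRING)," and so on. */
#define DECL_INFO_STRING(FIELD, STRING) \
  .FIELD = STRING,                      \
  .JOIN(FIELD, _sz) = sizeof(STRING),

static const struct device_info_sketch sketch_device = {
  DECL_INFO_STRING(name, "Intel HD Graphics Family")
  DECL_INFO_STRING(vendor, "Intel")
};
#undef DECL_INFO_STRING
```

Because sizeof is applied to the string literal, each *_sz value includes
the terminating NUL byte.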

[Beignet] [PATCH 1/6 newRT] Refine intel_driver file.

2017-03-28 Thread junyan . he
From: Junyan He 

Delete some verbose logic and make create/delete the only API
for creating and destroying the intel_driver.

Signed-off-by: Junyan He 
---
 src/gen/intel_driver.c | 734 -
 src/gen/intel_driver.h |  24 +-
 2 files changed, 370 insertions(+), 388 deletions(-)

diff --git a/src/gen/intel_driver.c b/src/gen/intel_driver.c
index bce1894..5161bee 100644
--- a/src/gen/intel_driver.c
+++ b/src/gen/intel_driver.c
@@ -80,33 +80,95 @@
 #include "cl_device_id.h"
 #include "cl_platform_id.h"
 
+/* DRI Context */
 static void
+intel_driver_close(intel_driver_t *intel)
+{
+  //Due to the drm change about the test usrptr, we need to destroy the bufmgr
+  //before the driver is closed, otherwise the test usrptr will not be freed.
+  if (intel->bufmgr)
+drm_intel_bufmgr_destroy(intel->bufmgr);
+
+#ifdef HAS_X11
+  if (intel->dri_ctx)
+dri_state_release(intel->dri_ctx);
+  if (intel->x11_display)
+XCloseDisplay(intel->x11_display);
+#endif
+
+  if (intel->need_close) {
+close(intel->fd);
+intel->need_close = 0;
+  }
+  intel->dri_ctx = NULL;
+  intel->x11_display = NULL;
+  intel->fd = -1;
+}
+
+static void
+intel_driver_context_destroy(intel_driver_t *driver)
+{
+  if (driver->null_bo)
+drm_intel_bo_unreference(driver->null_bo);
+  if (driver->ctx)
+drm_intel_gem_context_destroy(driver->ctx);
+  driver->ctx = NULL;
+}
+
+static int
+intel_driver_terminate(intel_driver_t *driver)
+{
+  pthread_mutex_destroy(&driver->ctxmutex);
+
+  if (driver->need_close) {
+close(driver->fd);
+driver->need_close = 0;
+  }
+
+  driver->fd = -1;
+  return 1;
+}
+
+LOCAL void
 intel_driver_delete(intel_driver_t *driver)
 {
   if (driver == NULL)
 return;
 
+  intel_driver_context_destroy(driver);
+  intel_driver_close(driver);
+  intel_driver_terminate(driver);
+
   CL_FREE(driver);
 }
 
 static intel_driver_t *
 intel_driver_new(void)
 {
-  intel_driver_t *driver = NULL;
+  intel_driver_t *driver = CL_CALLOC(1, sizeof(intel_driver_t));
+  if (driver == NULL)
+return NULL;
 
-  TRY_ALLOC_NO_ERR(driver, CL_CALLOC(1, sizeof(intel_driver_t)));
   driver->fd = -1;
-
-exit:
   return driver;
-error:
-  intel_driver_delete(driver);
-  driver = NULL;
-  goto exit;
 }
 
-/* just used for maximum relocation number in drm_intel */
-#define BATCH_SIZE 0x4000
+static void
+intel_driver_context_init(intel_driver_t *driver)
+{
+  driver->ctx = drm_intel_gem_context_create(driver->bufmgr);
+  assert(driver->ctx);
+  driver->null_bo = NULL;
+
+#ifdef HAS_BO_SET_SOFTPIN
+  drm_intel_bo *bo = dri_bo_alloc(driver->bufmgr, "null_bo", 64 * 1024, 4096);
+  drm_intel_bo_set_softpin_offset(bo, 0);
+  // don't reuse it, that would make two bo trying to bind to same address,
+  // which is un-reasonable.
+  drm_intel_bo_disable_reuse(bo);
+  driver->null_bo = bo;
+#endif
+}
 
 /* set OCL_DUMP_AUB=1 to get aub file */
 static void
@@ -117,18 +179,20 @@ intel_driver_aub_dump(intel_driver_t *driver)
   if (!val)
 return;
   if (atoi(val) != 0) {
-drm_intel_bufmgr_gem_set_aub_filename(driver->bufmgr,
-  "beignet.aub");
+drm_intel_bufmgr_gem_set_aub_filename(driver->bufmgr, "beignet.aub");
 drm_intel_bufmgr_gem_set_aub_dump(driver->bufmgr, 1);
   }
 }
 
+/* just used for maximum relocation number in drm_intel */
+#define BATCH_SIZE 0x4000
 static int
 intel_driver_memman_init(intel_driver_t *driver)
 {
   driver->bufmgr = drm_intel_bufmgr_gem_init(driver->fd, BATCH_SIZE);
   if (!driver->bufmgr)
 return 0;
+
   drm_intel_bufmgr_gem_enable_reuse(driver->bufmgr);
   driver->device_id = drm_intel_bufmgr_gem_get_devid(driver->bufmgr);
   intel_driver_aub_dump(driver);
@@ -136,34 +200,6 @@ intel_driver_memman_init(intel_driver_t *driver)
 }
 
 static int
-intel_driver_context_init(intel_driver_t *driver)
-{
-  driver->ctx = drm_intel_gem_context_create(driver->bufmgr);
-  if (!driver->ctx)
-return 0;
-  driver->null_bo = NULL;
-#ifdef HAS_BO_SET_SOFTPIN
-  drm_intel_bo *bo = dri_bo_alloc(driver->bufmgr, "null_bo", 64 * 1024, 4096);
-  drm_intel_bo_set_softpin_offset(bo, 0);
-  // don't reuse it, that would make two bo trying to bind to same address,
-  // which is un-reasonable.
-  drm_intel_bo_disable_reuse(bo);
-  driver->null_bo = bo;
-#endif
-  return 1;
-}
-
-static void
-intel_driver_context_destroy(intel_driver_t *driver)
-{
-  if (driver->null_bo)
-drm_intel_bo_unreference(driver->null_bo);
-  if (driver->ctx)
-drm_intel_gem_context_destroy(driver->ctx);
-  driver->ctx = NULL;
-}
-
-static int
 intel_driver_init(intel_driver_t *driver, int dev_fd)
 {
   driver->fd = dev_fd;
@@ -172,8 +208,7 @@ intel_driver_init(intel_driver_t *driver, int dev_fd)
 
   if (!intel_driver_memman_init(driver))
 return 0;
-  if (!intel_driver_context_init(driver))
-return 0;
+  
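
In the hunk above, intel_driver_delete() runs both intel_driver_close() and
intel_driver_terminate(), and each of them closes the fd when need_close is
set. The need_close flag is what keeps the second call harmless. The sketch
below shows just that guard in isolation; struct driver_sketch and
sketch_close_fd() are made-up names, and a counter stands in for the real
close() call.

```c
/* Hypothetical stand-in for the fd-related part of intel_driver_t. */
struct driver_sketch {
  int fd;
  int need_close;
  int close_calls; /* counts how often the fd would really be closed */
};

/* Closes the fd at most once, however many times it is called. */
static void sketch_close_fd(struct driver_sketch *d)
{
  if (d->need_close) {
    d->close_calls++; /* stands in for close(d->fd) */
    d->need_close = 0;
  }
  d->fd = -1;
}
```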

[Beignet] [PATCH 6/6 newRT] Add cl_context_gen file.

2017-03-28 Thread junyan . he
From: Junyan He 

This file will implement all the logic specific to GEN.

Signed-off-by: Junyan He 
---
 src/gen/cl_context_gen.c | 195 +++
 src/gen/cl_gen.h |  55 +
 2 files changed, 250 insertions(+)
 create mode 100644 src/gen/cl_context_gen.c

diff --git a/src/gen/cl_context_gen.c b/src/gen/cl_context_gen.c
new file mode 100644
index 000..7bc4fc0
--- /dev/null
+++ b/src/gen/cl_context_gen.c
@@ -0,0 +1,195 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see .
+ *
+ */
+
+#include "cl_gen.h"
+
+#define DECL_INTERNAL_KERN(NAME)  \
+  extern char cl_internal_##NAME##_str[]; \
+  extern size_t cl_internal_##NAME##_str_size;
+
+DECL_INTERNAL_KERN(block_motion_estimate_intel)
+DECL_INTERNAL_KERN(copy_buf_align16)
+DECL_INTERNAL_KERN(copy_buf_align4)
+DECL_INTERNAL_KERN(copy_buffer_to_image_2d_align16)
+DECL_INTERNAL_KERN(copy_buffer_to_image_2d)
+DECL_INTERNAL_KERN(copy_buffer_to_image_3d)
+DECL_INTERNAL_KERN(copy_buf_rect_align4)
+DECL_INTERNAL_KERN(copy_buf_rect)
+DECL_INTERNAL_KERN(copy_buf_unalign_dst_offset)
+DECL_INTERNAL_KERN(copy_buf_unalign_same_offset)
+DECL_INTERNAL_KERN(copy_buf_unalign_src_offset)
+DECL_INTERNAL_KERN(copy_image_1d_array_to_1d_array)
+DECL_INTERNAL_KERN(copy_image_1d_to_1d)
+DECL_INTERNAL_KERN(copy_image_2d_array_to_2d_array)
+DECL_INTERNAL_KERN(copy_image_2d_array_to_2d)
+DECL_INTERNAL_KERN(copy_image_2d_array_to_3d)
+DECL_INTERNAL_KERN(copy_image_2d_to_2d_array)
+DECL_INTERNAL_KERN(copy_image_2d_to_2d)
+DECL_INTERNAL_KERN(copy_image_2d_to_3d)
+DECL_INTERNAL_KERN(copy_image_2d_to_buffer_align16)
+DECL_INTERNAL_KERN(copy_image_2d_to_buffer)
+DECL_INTERNAL_KERN(copy_image_3d_to_2d_array)
+DECL_INTERNAL_KERN(copy_image_3d_to_2d)
+DECL_INTERNAL_KERN(copy_image_3d_to_3d)
+DECL_INTERNAL_KERN(copy_image_3d_to_buffer)
+DECL_INTERNAL_KERN(fill_buf_align128)
+DECL_INTERNAL_KERN(fill_buf_align2)
+DECL_INTERNAL_KERN(fill_buf_align4)
+DECL_INTERNAL_KERN(fill_buf_align8)
+DECL_INTERNAL_KERN(fill_buf_unalign)
+DECL_INTERNAL_KERN(fill_image_1d_array)
+DECL_INTERNAL_KERN(fill_image_1d)
+DECL_INTERNAL_KERN(fill_image_2d_array)
+DECL_INTERNAL_KERN(fill_image_2d)
+DECL_INTERNAL_KERN(fill_image_3d)
+
+#define REF_INTERNAL_KERN(NAME) (cl_internal_##NAME##_str), &(cl_internal_##NAME##_str_size)
+
+static struct {
+  cl_int index;
+  void *program_binary;
+  size_t *size;
+  char *kernel_name;
+} gen_internals_kernels[] = {
+  {CL_ENQUEUE_COPY_BUFFER_ALIGN4, REF_INTERNAL_KERN(copy_buf_align4), "__cl_copy_region_align4"},
+  {CL_ENQUEUE_COPY_BUFFER_ALIGN16, REF_INTERNAL_KERN(copy_buf_align16), "__cl_copy_region_align16"},
+  {CL_ENQUEUE_COPY_BUFFER_UNALIGN_SAME_OFFSET, REF_INTERNAL_KERN(copy_buf_unalign_same_offset), "__cl_copy_region_unalign_same_offset"},
+  {CL_ENQUEUE_COPY_BUFFER_UNALIGN_DST_OFFSET, REF_INTERNAL_KERN(copy_buf_unalign_dst_offset), "__cl_copy_region_unalign_dst_offset"},
+  {CL_ENQUEUE_COPY_BUFFER_UNALIGN_SRC_OFFSET, REF_INTERNAL_KERN(copy_buf_unalign_src_offset), "__cl_copy_region_unalign_src_offset"},
+  {CL_ENQUEUE_COPY_BUFFER_RECT, REF_INTERNAL_KERN(copy_buf_rect), "__cl_copy_buffer_rect"},
+  {CL_ENQUEUE_COPY_BUFFER_RECT_ALIGN4, REF_INTERNAL_KERN(copy_buf_rect_align4), "__cl_copy_buffer_rect_align4"},
+  {CL_ENQUEUE_COPY_IMAGE_1D_TO_1D, REF_INTERNAL_KERN(copy_image_1d_to_1d), "__cl_copy_image_1d_to_1d"},
+  {CL_ENQUEUE_COPY_IMAGE_2D_TO_2D, REF_INTERNAL_KERN(copy_image_2d_to_2d), "__cl_copy_image_2d_to_2d"},
+  {CL_ENQUEUE_COPY_IMAGE_3D_TO_2D, REF_INTERNAL_KERN(copy_image_3d_to_2d), "__cl_copy_image_3d_to_2d"},
+  {CL_ENQUEUE_COPY_IMAGE_2D_TO_3D, REF_INTERNAL_KERN(copy_image_2d_to_3d), "__cl_copy_image_2d_to_3d"},
+  {CL_ENQUEUE_COPY_IMAGE_3D_TO_3D, REF_INTERNAL_KERN(copy_image_3d_to_3d), "__cl_copy_image_3d_to_3d"},
+  {CL_ENQUEUE_COPY_IMAGE_2D_TO_2D_ARRAY, REF_INTERNAL_KERN(copy_image_2d_to_2d_array), "__cl_copy_image_2d_to_2d_array"},
+  {CL_ENQUEUE_COPY_IMAGE_1D_ARRAY_TO_1D_ARRAY, REF_INTERNAL_KERN(copy_image_1d_array_to_1d_array), "__cl_copy_image_1d_array_to_1d_array"},
+  {CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_2D_ARRAY, REF_INTERNAL_KERN(copy_image_2d_array_to_2d_array), "__cl_copy_image_2d_array_to_2d_array"},
+  {CL_ENQUEUE_COPY_IMAGE_2D_ARRAY_TO_2D, 
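
The table above pairs each internal enqueue index with an embedded program
binary, a pointer to its size, and a kernel name. The patch is cut off before
it shows how the table is consumed, so the lookup below is only a guess at
one plausible use. All sketch_* names, the enum values, and the dummy
"binaries" are invented for the example; only the macro shape and the table
layout follow the patch.

```c
#include <stddef.h>

/* Hypothetical indices standing in for the CL_ENQUEUE_* values. */
enum { SKETCH_COPY_BUFFER_ALIGN4, SKETCH_COPY_BUFFER_ALIGN16 };

/* Dummy blobs standing in for the embedded kernel binaries. */
static char sketch_copy_buf_align4_str[] = "binary-a";
static size_t sketch_copy_buf_align4_str_size = sizeof(sketch_copy_buf_align4_str);
static char sketch_copy_buf_align16_str[] = "binary-bb";
static size_t sketch_copy_buf_align16_str_size = sizeof(sketch_copy_buf_align16_str);

/* Same shape as REF_INTERNAL_KERN: expands to "blob, &blob_size". */
#define REF_SKETCH_KERN(NAME) (sketch_##NAME##_str), &(sketch_##NAME##_str_size)

struct sketch_kernel_entry {
  int index;
  void *program_binary;
  size_t *size;
  const char *kernel_name;
};

static struct sketch_kernel_entry sketch_kernels[] = {
  {SKETCH_COPY_BUFFER_ALIGN4, REF_SKETCH_KERN(copy_buf_align4), "__cl_copy_region_align4"},
  {SKETCH_COPY_BUFFER_ALIGN16, REF_SKETCH_KERN(copy_buf_align16), "__cl_copy_region_align16"},
};

/* Linear search by index; returns NULL when no entry matches. */
static const struct sketch_kernel_entry *sketch_find_kernel(int index)
{
  size_t i;
  for (i = 0; i < sizeof(sketch_kernels) / sizeof(sketch_kernels[0]); i++) {
    if (sketch_kernels[i].index == index)
      return &sketch_kernels[i];
  }
  return NULL;
}
```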

[Beignet] [PATCH 3/6 newRT] Fix two bugs in gen kernel.

2017-03-28 Thread junyan . he
From: Junyan He 

Signed-off-by: Junyan He 
---
 src/gen/cl_kernel_gen.c  | 2 +-
 src/gen/cl_program_gen.c | 6 --
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gen/cl_kernel_gen.c b/src/gen/cl_kernel_gen.c
index 78ce6b8..4e85c1d 100644
--- a/src/gen/cl_kernel_gen.c
+++ b/src/gen/cl_kernel_gen.c
@@ -301,7 +301,7 @@ cl_program_gen_get_kernel_func_cl_info(cl_device_id device, cl_kernel kernel)
   }
 
   arg_type_qualifier = 0;
-  if (strstr(arg_type_qual_str, "const") && (kernel->args[i].arg_type == ArgTypePointer))
+  if (strstr(arg_type_qual_str, "const"))
 arg_type_qualifier = arg_type_qualifier | CL_KERNEL_ARG_TYPE_CONST;
   if (strstr(arg_type_qual_str, "volatile"))
 arg_type_qualifier = arg_type_qualifier | CL_KERNEL_ARG_TYPE_VOLATILE;
diff --git a/src/gen/cl_program_gen.c b/src/gen/cl_program_gen.c
index 561c7e0..3c0b796 100644
--- a/src/gen/cl_program_gen.c
+++ b/src/gen/cl_program_gen.c
@@ -19,13 +19,14 @@
 #include "cl_gen.h"
 
 struct binary_type_header_info {
-  unsigned char header[4];
+  unsigned char header[7];
   cl_uint size;
   cl_uint type;
 };
 
 static struct binary_type_header_info binary_type_header[4] = {
   {{'B', 'C', 0xC0, 0xDE}, 4, CL_PROGRAM_BINARY_TYPE_COMPILED_OBJECT},
+  {{'L', 'I', 'B', 'B', 'C', 0xC0, 0xDE}, 7, CL_PROGRAM_BINARY_TYPE_LIBRARY},
   {{0x7f, 'E', 'L', 'F'}, 4, CL_PROGRAM_BINARY_TYPE_EXECUTABLE}};
 
 static cl_int
@@ -270,6 +271,7 @@ cl_program_load_binary_gen_elf(cl_device_id device, cl_program prog)
strlen(p_sym_entry->st_name + elf->strtab_data->d_buf) + 1);
 j++;
   }
+  assert(j == pd->kernel_num);
 
   return CL_SUCCESS;
 }
@@ -286,7 +288,7 @@ cl_program_load_binary_gen(cl_device_id device, cl_program prog)
   assert(pd->binary != NULL);
 
   //need at least bytes to check the binary type.
-  if (pd->binary_sz < 5)
+  if (pd->binary_sz < 7)
 return CL_INVALID_PROGRAM_EXECUTABLE;
 
   if (pd->binary_type == CL_PROGRAM_BINARY_TYPE_NONE) { // Need to recognize it first
-- 
2.7.4
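
The cl_program_gen.c hunks widen the magic-number field to 7 bytes so that
the "LIBBC" library prefix fits next to the 4-byte bitcode and ELF magics.
Below is a small self-contained sketch of that kind of header matching. The
sketch_* names and the enum are placeholders for the CL_PROGRAM_BINARY_TYPE_*
values, and the per-entry length check is a variation: the patch instead
rejects any binary shorter than 7 bytes up front.

```c
#include <stddef.h>
#include <string.h>

enum sketch_binary_type {
  SKETCH_NONE,
  SKETCH_COMPILED_OBJECT,
  SKETCH_LIBRARY,
  SKETCH_EXECUTABLE
};

struct sketch_header {
  unsigned char header[7]; /* longest magic is 7 bytes */
  size_t size;             /* number of significant bytes in header */
  enum sketch_binary_type type;
};

static const struct sketch_header sketch_headers[] = {
  {{'B', 'C', 0xC0, 0xDE}, 4, SKETCH_COMPILED_OBJECT},
  {{'L', 'I', 'B', 'B', 'C', 0xC0, 0xDE}, 7, SKETCH_LIBRARY},
  {{0x7f, 'E', 'L', 'F'}, 4, SKETCH_EXECUTABLE},
};

/* Returns the type whose magic prefixes `bin`, or SKETCH_NONE. */
static enum sketch_binary_type
sketch_detect(const unsigned char *bin, size_t bin_sz)
{
  size_t i;
  for (i = 0; i < sizeof(sketch_headers) / sizeof(sketch_headers[0]); i++) {
    const struct sketch_header *h = &sketch_headers[i];
    if (bin_sz >= h->size && memcmp(bin, h->header, h->size) == 0)
      return h->type;
  }
  return SKETCH_NONE;
}
```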



[Beignet] [PATCH 4/6 newRT] Add cl_device_id_gen file in gen dir.

2017-03-28 Thread junyan . he
From: Junyan He 

This file will implement all device_id related logic. After
initialization, it will create a static GEN device for later use.

Signed-off-by: Junyan He 
---
 src/gen/cl_device_id_gen.c | 974 +
 src/gen/cl_gen.h   |   6 +
 src/gen/cl_gen75_device.h  |  30 ++
 src/gen/cl_gen7_device.h   |  34 ++
 src/gen/cl_gen8_device.h   |  30 ++
 src/gen/cl_gen9_device.h   |  30 ++
 6 files changed, 1104 insertions(+)
 create mode 100644 src/gen/cl_device_id_gen.c
 create mode 100644 src/gen/cl_gen75_device.h
 create mode 100644 src/gen/cl_gen7_device.h
 create mode 100644 src/gen/cl_gen8_device.h
 create mode 100644 src/gen/cl_gen9_device.h

diff --git a/src/gen/cl_device_id_gen.c b/src/gen/cl_device_id_gen.c
new file mode 100644
index 000..35e9025
--- /dev/null
+++ b/src/gen/cl_device_id_gen.c
@@ -0,0 +1,974 @@
+/* 
+ * Copyright © 2012 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see .
+ *
+ */
+
+#include "cl_gen.h"
+#include "gen_device_pci_id.h"
+#include 
+
+extern cl_int cl_compiler_unload_gen(cl_device_id device);
+
+static _cl_device_api __gen_device_api = {
+  .compiler_unload = cl_compiler_unload_gen,
+  .context_new = cl_context_new_gen,
+  .context_create = cl_context_create_gen,
+  .context_delete = cl_context_delete_gen,
+  .program_new = cl_program_new_gen,
+  .program_load_binary = cl_program_load_binary_gen,
+  .program_delete = cl_program_delete_gen,
+  .get_program_info = cl_program_get_info_gen,
+  .kernel_new = cl_kernel_new_gen,
+  .kernel_delete = cl_kernel_delete_gen,
+  .kernel_create = cl_kernel_create_gen,
+  .get_kernel_info = cl_kernel_get_info_gen,
+  .ND_range_kernel = cl_command_queue_ND_range_gen_wrap,
+  .mem_copy = cl_mem_copy_gen,
+  .mem_fill = cl_mem_fill_gen,
+  .mem_copy_rect = cl_mem_copy_buffer_rect_gen,
+  .image_fill = cl_image_fill_gen,
+  .image_copy = cl_image_copy_gen,
+  .copy_image_to_buffer = cl_mem_copy_image_to_buffer_gen,
+  .copy_buffer_to_image = cl_mem_copy_buffer_to_image_gen,
+};
+
+/* HW parameters */
+#define BTI_MAX_READ_IMAGE_ARGS 128
+#define BTI_MAX_WRITE_IMAGE_ARGS 8
+
+static struct _cl_device_id intel_ivb_gt2_device = {
+  .max_compute_unit = 16,
+  .max_thread_per_unit = 8,
+  .sub_slice_count = 2,
+  .max_work_item_sizes = {512, 512, 512},
+  .max_work_group_size = 512,
+  .max_clock_frequency = 1000,
+#include "cl_gen7_device.h"
+};
+
+static struct _cl_device_id intel_ivb_gt1_device = {
+  .max_compute_unit = 6,
+  .max_thread_per_unit = 6,
+  .sub_slice_count = 1,
+  .max_work_item_sizes = {256, 256, 256},
+  .max_work_group_size = 256,
+  .max_clock_frequency = 1000,
+#include "cl_gen7_device.h"
+};
+
+static struct _cl_device_id intel_baytrail_t_device = {
+  .max_compute_unit = 4,
+  .max_thread_per_unit = 8,
+  .sub_slice_count = 1,
+  .max_work_item_sizes = {256, 256, 256},
+  .max_work_group_size = 256,
+  .max_clock_frequency = 1000,
+#include "cl_gen7_device.h"
+};
+
+/* XXX we clone IVB for HSW now */
+static struct _cl_device_id intel_hsw_gt1_device = {
+  .max_compute_unit = 10,
+  .max_thread_per_unit = 7,
+  .sub_slice_count = 1,
+  .max_work_item_sizes = {512, 512, 512},
+  .max_work_group_size = 512,
+  .max_clock_frequency = 1000,
+#include "cl_gen75_device.h"
+};
+
+static struct _cl_device_id intel_hsw_gt2_device = {
+  .max_compute_unit = 20,
+  .max_thread_per_unit = 7,
+  .sub_slice_count = 2,
+  .max_work_item_sizes = {512, 512, 512},
+  .max_work_group_size = 512,
+  .max_clock_frequency = 1000,
+#include "cl_gen75_device.h"
+};
+
+static struct _cl_device_id intel_hsw_gt3_device = {
+  .max_compute_unit = 40,
+  .max_thread_per_unit = 7,
+  .sub_slice_count = 4,
+  .max_work_item_sizes = {512, 512, 512},
+  .max_work_group_size = 512,
+  .max_clock_frequency = 1000,
+#include "cl_gen75_device.h"
+};
+
+/* XXX we clone IVB for HSW now */
+static struct _cl_device_id intel_brw_gt1_device = {
+  .max_compute_unit = 12,
+  .max_thread_per_unit = 7,
+  .sub_slice_count = 2,
+  .max_work_item_sizes = {512, 512, 512},
+  .max_work_group_size = 512,
+  .max_clock_frequency = 1000,
+#include "cl_gen8_device.h"
+};
+
+static struct _cl_device_id intel_brw_gt2_device = {
+  .max_compute_unit = 24,
+  .max_thread_per_unit = 7,
+  .sub_slice_count = 3,
+  .max_work_item_sizes = {512, 512, 

[Beignet] [PATCH 2/6 newRT] Move X11 files to gen dir.

2017-03-28 Thread junyan . he
From: Junyan He 

Signed-off-by: Junyan He 
---
 src/CMakeLists.txt  |   4 +-
 src/gen/x11/dricommon.c | 333 
 src/gen/x11/dricommon.h |  99 +
 src/gen/x11/va_dri2.c   | 327 +++
 src/gen/x11/va_dri2.h   |  89 
 src/gen/x11/va_dri2str.h| 211 
 src/gen/x11/va_dri2tokens.h |  66 +
 src/x11/dricommon.c | 333 
 src/x11/dricommon.h |  99 -
 src/x11/va_dri2.c   | 327 ---
 src/x11/va_dri2.h   |  89 
 src/x11/va_dri2str.h| 211 
 src/x11/va_dri2tokens.h |  66 -
 13 files changed, 1127 insertions(+), 1127 deletions(-)
 create mode 100644 src/gen/x11/dricommon.c
 create mode 100644 src/gen/x11/dricommon.h
 create mode 100644 src/gen/x11/va_dri2.c
 create mode 100644 src/gen/x11/va_dri2.h
 create mode 100644 src/gen/x11/va_dri2str.h
 create mode 100644 src/gen/x11/va_dri2tokens.h
 delete mode 100644 src/x11/dricommon.c
 delete mode 100644 src/x11/dricommon.h
 delete mode 100644 src/x11/va_dri2.c
 delete mode 100644 src/x11/va_dri2.h
 delete mode 100644 src/x11/va_dri2str.h
 delete mode 100644 src/x11/va_dri2tokens.h

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 709dc10..91a772f 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -111,8 +111,8 @@ if (X11_FOUND)
   set(CMAKE_C_FLAGS "-DHAS_X11 ${CMAKE_C_FLAGS}")
   set(OPENCL_SRC
   ${OPENCL_SRC}
-  x11/dricommon.c
-  x11/va_dri2.c)
+  gen/x11/dricommon.c
+  gen/x11/va_dri2.c)
 endif (X11_FOUND)
 
 if (CMRT_FOUND)
diff --git a/src/gen/x11/dricommon.c b/src/gen/x11/dricommon.c
new file mode 100644
index 000..92623d9
--- /dev/null
+++ b/src/gen/x11/dricommon.c
@@ -0,0 +1,333 @@
+/* 
+ * Copyright © 2012 Intel Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library. If not, see .
+ *
+ * Author: Benjamin Segovia 
+ * Note: the code is taken from libva code base
+ */
+
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL PRECISION INSIGHT AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include "va_dri2.h"
+#include "va_dri2tokens.h"
+#include "dricommon.h"
+#include "cl_utils.h"
+#include "cl_alloc.h"
+
+#include 
+#include 
+#include 
+#include 
+
+#define LOCAL __attribute__ ((visibility ("internal")))
+
+LOCAL dri_drawable_t*
+dri_state_do_drawable_hash(dri_state_t *state, XID drawable)
+{
+  int index = drawable % DRAWABLE_HASH_SZ;
+  struct dri_drawable *dri_drawable = state->drawable_hash[index];
+
+  while (dri_drawable) {
+if (dri_drawable->x_drawable == drawable)
+  return dri_drawable;
+dri_drawable = dri_drawable->next;
+  }
+
+  dri_drawable = dri_state_create_drawable(state, drawable);
+  if(dri_drawable == NULL)
+return NULL;
+
+  dri_drawable->x_drawable = drawable;
+  dri_drawable->next = state->drawable_hash[index];
+  state->drawable_hash[index] = dri_drawable;
+
+  return dri_drawable;
+}
+
+LOCAL void
+dri_state_free_drawable_hash(dri_state_t *state)
+{
+  int i;
+  struct dri_drawable *dri_drawable, *prev;
+
+  for (i = 0; i <