[madlib] branch master updated: updated DL preprocessor docs for bytea (#445)

2019-10-01 Thread fmcquillan
This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
 new 63f40e7  updated DL preprocessor docs for bytea (#445)
63f40e7 is described below

commit 63f40e70f8dbb6c9ed2b1b91c847fd3819b1a627
Author: Frank McQuillan 
AuthorDate: Tue Oct 1 13:52:40 2019 -0700

updated DL preprocessor docs for bytea (#445)

* updated DL preprocessor docs for bytea

* address review comments
---
 .../deep_learning/input_data_preprocessor.sql_in   | 210 ++---
 1 file changed, 98 insertions(+), 112 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.sql_in b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.sql_in
index a3f4281..8d70431 100644
--- a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.sql_in
+++ b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.sql_in
@@ -18,7 +18,7 @@
  * under the License.
  *
  * @file input_preprocessor_dl.sql_in
- * @brief TODO
+ * @brief Utilities to prepare input image data for use by deep learning modules.
  * @date December 2018
  *
  */
@@ -86,9 +86,10 @@ training_preprocessor_dl(source_table,
   TEXT.  Name of the output table from the training preprocessor which
   will be used as input to algorithms that support mini-batching.
   Note that the arrays packed into the output table are shuffled
-  and normalized (by dividing each element in the independent variable array
-  by the optional 'normalizing_const' parameter), so they will not match
-  up in an obvious way with the rows in the source table.
+  and normalized, by dividing each element in the independent variable array
+  by the optional 'normalizing_const' parameter. For performance reasons,
+  packed arrays are converted to PostgreSQL bytea format, which is a
+  variable-length binary string.
 
   In the case a validation data set is used (see
   later on this page), this output table is also used
@@ -158,11 +159,15 @@ validation_preprocessor_dl(source_table,
 
   output_table
   TEXT.  Name of the output table from the validation
-  preprocessor which will be used as input to algorithms that support mini-batching.  The arrays packed into the output table are
+  preprocessor which will be used as input to algorithms that support mini-batching.
+  The arrays packed into the output table are
   normalized using the same normalizing constant from the
   training preprocessor as specified in
   the 'training_preprocessor_table' parameter described below.
   Validation data is not shuffled.
+  For performance reasons,
+  packed arrays are converted to PostgreSQL bytea format, which is a
+  variable-length binary string.
   
 
   dependent_varname
@@ -209,25 +214,43 @@ validation_preprocessor_dl(source_table,
 validation_preprocessor_dl() contain the following columns:
 
   
-buffer_id
-INTEGER. Unique id for each row in the packed table.
+independent_var
+BYTEA. Packed array of independent variables in PostgreSQL bytea format.
+Arrays of independent variables packed into the output table are
+normalized by dividing each element in the independent variable array by the
+optional 'normalizing_const' parameter.  Training data is shuffled, but
+validation data is not.
 
   
   
 dependent_var
-ANYARRAY[]. Packed array of dependent variables.
+BYTEA. Packed array of dependent variables in PostgreSQL bytea format.
 The dependent variable is always one-hot encoded as an
-INTEGER[] array. For now, we are assuming that
+integer array. For now, we are assuming that
 input_preprocessor_dl() will be used
 only for classification problems using deep learning. So
 the dependent variable is one-hot encoded, unless it's already a
 numeric array in which case we assume it's already one-hot
-encoded and just cast it to an INTEGER[] array.
+encoded and just cast it to an integer array.
 
   
   
-independent_var
-REAL[]. Packed array of independent variables.
+independent_var_shape
+INTEGER[]. Shape of the independent variable array after preprocessing.
+The first element is the number of images packed per row, and subsequent
+elements will depend on how the image is described (e.g., channels first or last).
+
+  
+  
+dependent_var_shape
+INTEGER[]. Shape of the dependent variable array after preprocessing.
+The first element is the number of images packed per row, and the second
+element is the number of class values.
+
+  
+  
+buffer_id
+INTEGER. Unique id for each row in the packed table.
   

[madlib] branch master updated: DL: Update jenkins to install tensorflow 1.14.

2019-10-01 Thread nkak

nkak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
 new 9edd745  DL: Update jenkins to install tensorflow 1.14.
9edd745 is described below

commit 9edd74582008413ca4405e52c4f06c64efc7664c
Author: Nikhil Kak 
AuthorDate: Mon Sep 30 17:32:52 2019 -0700

DL: Update jenkins to install tensorflow 1.14.

The latest version of tensorflow (2.0 as of this commit) is not compatible 
with the version of keras that we install i.e. 2.2.4.

This is the error we get
```
AttributeError: 'module' object has no attribute 'get_default_graph'
```

This commit installs a compatible version of tensorflow i.e. 1.14.
---
 tool/docker/base/Dockerfile_postgres_10_Jenkins | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tool/docker/base/Dockerfile_postgres_10_Jenkins b/tool/docker/base/Dockerfile_postgres_10_Jenkins
index bd63312..a0d882d 100644
--- a/tool/docker/base/Dockerfile_postgres_10_Jenkins
+++ b/tool/docker/base/Dockerfile_postgres_10_Jenkins
@@ -33,7 +33,7 @@ RUN apt-get update && apt-get install -y  wget \
build-essential \
cmake
 
-RUN pip install tensorflow keras==2.2.4
+RUN pip install tensorflow==1.14 keras==2.2.4
 
 ## To build an image from this docker file, from madlib folder, run:
 # docker build -t madlib/postgres_10:jenkins -f tool/docker/base/Dockerfile_postgres_10_Jenkins .
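
The version constraint from the commit message above (keras 2.2.4 works with tensorflow 1.14 but not with 2.0) could also be checked before anything is imported. A minimal sketch, assuming only plain version strings; `major_minor` and `tensorflow_supported` are hypothetical helpers and not part of MADlib, tensorflow, or keras:

```python
def major_minor(version):
    """Split a version string such as '1.14.0' into (major, minor) integers."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def tensorflow_supported(tf_version, max_major=1):
    """Return True when the tensorflow major version is at most max_major.

    max_major=1 reflects the commit message: tensorflow 2.0 breaks with
    keras 2.2.4, while 1.14 is compatible.
    """
    return major_minor(tf_version)[0] <= max_major


ok_pinned = tensorflow_supported("1.14.0")
ok_latest = tensorflow_supported("2.0.0")
```

Pinning the version in the Dockerfile, as the commit does, remains the simpler fix; a check like this would only make the failure clearer than the `AttributeError` quoted in the commit message.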



[madlib] branch master updated (dae72a0 -> c416cd7)

2019-10-01 Thread nkak

nkak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git.


from dae72a0  update user docs for auto-k and per-point silh and generally reorganize
 add c416cd7  DL: Improve performance for madlib_keras_predict()

No new revisions were added by this update.

Summary of changes:
 .../deep_learning/madlib_keras_predict.py_in   | 25 --
 1 file changed, 23 insertions(+), 2 deletions(-)