fmcquillan99 commented on pull request #525:
URL: https://github.com/apache/madlib/pull/525#issuecomment-731350733
(1)
initial tests for functionality - keras_fit()
```
DROP TABLE IF EXISTS cifar_10_model, cifar_10_model_summary;
SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',   -- source table
                               'cifar_10_model',                      -- model output table
                               'model_arch_library',                  -- model arch table
                               1,                                     -- model arch id
                               $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy'] $$,  -- compile_params
                               $$ batch_size=32, epochs=3 $$,         -- fit_params
                               3,                                     -- num_iterations
                               NULL,                                  -- use GPUs
                               'cifar_10_test_data_packed_allseg',    -- validation dataset
                               1                                      -- metrics compute frequency
                              );
```
This produces the following warning:
```
WARNING: This version of tensorflow does not support XLA auto-cluster JIT
optimization. HINT: upgrading tensorflow may improve performance. (seg0
slice1 10.128.0.41:40000 pid=6270)
CONTEXT: PL/Python function "fit_transition"
WARNING: This version of tensorflow does not support XLA auto-cluster JIT
optimization. HINT: upgrading tensorflow may improve performance. (seg1
slice1 10.128.0.41:40001 pid=6271)
CONTEXT: PL/Python function "fit_transition"
```
What does the user need to do to enable XLA? I am currently on TF 1.13.1.
Otherwise this ran fine, and warm start also seemed to work.
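For anyone reproducing the warm start check mentioned above, a sketch is below. It assumes the optional `warm_start` argument of `madlib_keras_fit` and re-uses the existing `cifar_10_model` output table (no DROP beforehand), since warm start continues training from the weights already stored there:
```
-- Sketch of a warm start re-run (assumes the optional warm_start parameter).
-- Note: do NOT drop cifar_10_model first; warm start continues from the
-- weights already stored in the existing output table.
SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',   -- source table
                               'cifar_10_model',                      -- existing model output table
                               'model_arch_library',                  -- model arch table
                               1,                                     -- model arch id
                               $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy'] $$,  -- compile_params
                               $$ batch_size=32, epochs=3 $$,         -- fit_params
                               3,                                     -- num_iterations
                               NULL,                                  -- use GPUs
                               'cifar_10_test_data_packed_allseg',    -- validation dataset
                               1,                                     -- metrics compute frequency
                               TRUE                                   -- warm_start
                              );
```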
(2)
initial tests for functionality - keras_fit_multiple_model()
first I started with single segment:
```
SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id
FROM cifar_10_train_data_packed ORDER BY __dist_key__;
__dist_key__ | independent_var_shape | dependent_var_shape | buffer_id
--------------+-----------------------+---------------------+-----------
1 | {16667,32,32,3} | {16667,10} | 0
1 | {16666,32,32,3} | {16666,10} | 2
1 | {16667,32,32,3} | {16667,10} | 1
(3 rows)
SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id
FROM cifar_10_test_data_packed ORDER BY __dist_key__;
__dist_key__ | independent_var_shape | dependent_var_shape | buffer_id
--------------+-----------------------+---------------------+-----------
1 | {10000,32,32,3} | {10000,10} | 0
(1 row)
```
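As a side note, a quick way to confirm which segment each buffer actually landed on (assuming a Greenplum-style cluster, using the standard `gp_segment_id` system column) is:
```
-- Confirm buffer placement per segment (gp_segment_id is a Greenplum system column)
SELECT gp_segment_id, __dist_key__, buffer_id
FROM cifar_10_train_data_packed
ORDER BY gp_segment_id, buffer_id;
```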
run multi fit:
```
DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',  -- source_table
                                              'cifar10_multi_model',         -- model_output_table
                                              'mst_table',                   -- model_selection_table
                                              3,                             -- num_iterations
                                              NULL,                          -- use gpus
                                              'cifar_10_test_data_packed',   -- validation dataset
                                              1,                             -- metrics compute frequency
                                              NULL,                          -- warm_start
                                              'me',                          -- name
                                              'this is a test run'           -- description
                                             );
```
This produces the following error:
```
ERROR: plpy.Error: madlib_keras_fit_multiple_model error: No GPUs
configured on hosts. (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit_multiple_model", line 23, in <module>
fit_obj = madlib_keras_fit_multiple_model.FitMultipleModel(**globals())
PL/Python function "madlib_keras_fit_multiple_model", line 147, in __init__
PL/Python function "madlib_keras_fit_multiple_model", line 295, in
get_accessible_gpus_for_seg
PL/Python function "madlib_keras_fit_multiple_model"
```
So it looks like `use_gpus = NULL` now defaults to `TRUE`, but it should
default to `FALSE`, i.e., CPUs, as it did before.
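As a workaround until the default is fixed, passing `FALSE` explicitly for `use_gpus` instead of relying on `NULL` should avoid the "No GPUs configured on hosts" error. A sketch of the same call with only that one change:
```
-- Same call as above, but with use_gpus spelled out as FALSE rather than NULL
SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',  -- source_table
                                              'cifar10_multi_model',         -- model_output_table
                                              'mst_table',                   -- model_selection_table
                                              3,                             -- num_iterations
                                              FALSE,                         -- use gpus (explicit, not NULL)
                                              'cifar_10_test_data_packed',   -- validation dataset
                                              1,                             -- metrics compute frequency
                                              NULL,                          -- warm_start
                                              'me',                          -- name
                                              'this is a test run'           -- description
                                             );
```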
(3)
initial tests for functionality - keras_fit_multiple_model()
next I used 2 segments:
```
SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id
FROM cifar_10_train_data_packed_allseg ORDER BY __dist_key__;
__dist_key__ | independent_var_shape | dependent_var_shape | buffer_id
--------------+-----------------------+---------------------+-----------
0 | {12500,32,32,3} | {12500,10} | 3
0 | {12500,32,32,3} | {12500,10} | 1
1 | {12500,32,32,3} | {12500,10} | 2
1 | {12500,32,32,3} | {12500,10} | 0
(4 rows)
SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id
FROM cifar_10_test_data_packed_allseg ORDER BY __dist_key__;
__dist_key__ | independent_var_shape | dependent_var_shape | buffer_id
--------------+-----------------------+---------------------+-----------
0 | {5000,32,32,3} | {5000,10} | 1
1 | {5000,32,32,3} | {5000,10} | 0
(2 rows)
```
run multi fit:
```
DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed_allseg',  -- source_table
                                              'cifar10_multi_model',                -- model_output_table
                                              'mst_table',                          -- model_selection_table
                                              3,                                    -- num_iterations
                                              NULL,                                 -- use gpus
                                              'cifar_10_test_data_packed_allseg',   -- validation dataset
                                              1,                                    -- metrics compute frequency
                                              NULL,                                 -- warm_start
                                              'me',                                 -- name
                                              'this is a test run'                  -- description
                                             );
```
which produced this error:
```
ERROR: plpy.SPIError: PRIMARY KEY and DISTRIBUTED BY definitions
incompatible
HINT: When there is both a PRIMARY KEY, and a DISTRIBUTED BY clause, the
DISTRIBUTED BY clause must be equal to or a left-subset of the PRIMARY KEY
CONTEXT: Traceback (most recent call last):
PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
fit_obj.fit_multiple_model()
PL/Python function "madlib_keras_fit_multiple_model", line 241, in
fit_multiple_model
PL/Python function "madlib_keras_fit_multiple_model", line 509, in
init_model_output_tbl
PL/Python function "madlib_keras_fit_multiple_model"
```
I have also seen this error when running on a single segment.
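For context on the HINT: in Greenplum, a table that declares a PRIMARY KEY must use a distribution key that is equal to, or a left-subset of, the primary key columns. A hypothetical illustration (the column names here are made up, not necessarily the ones the module actually uses):
```
-- Fails with "PRIMARY KEY and DISTRIBUTED BY definitions incompatible":
-- the distribution key is not part of the primary key
CREATE TABLE bad_example (
    mst_key      INTEGER PRIMARY KEY,
    __dist_key__ INTEGER
) DISTRIBUTED BY (__dist_key__);

-- Works: the distribution key is a left-subset of the primary key
CREATE TABLE good_example (
    __dist_key__ INTEGER,
    mst_key      INTEGER,
    PRIMARY KEY (__dist_key__, mst_key)
) DISTRIBUTED BY (__dist_key__);
```
Based on the traceback, the failing statement is presumably the CREATE TABLE issued in `init_model_output_tbl`, so its PRIMARY KEY may not start with the distribution key.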