[GitHub] [madlib] fmcquillan99 commented on pull request #525: DL: Model Hopper Refactor

GitBox Fri, 20 Nov 2020 10:57:45 -0800


fmcquillan99 commented on pull request #525:
URL: https://github.com/apache/madlib/pull/525#issuecomment-731350733



   (1)
   initial tests for functionality - keras_fit()
   
   ```
   DROP TABLE IF EXISTS cifar_10_model, cifar_10_model_summary;
   SELECT madlib.madlib_keras_fit('cifar_10_train_data_packed_allseg',    -- source table
                                  'cifar_10_model',                       -- model output table
                                  'model_arch_library',                   -- model arch table
                                   1,                                     -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='rmsprop(lr=0.0001, decay=1e-6)', metrics=['accuracy']$$,  -- compile_params
                                   $$ batch_size=32, epochs=3 $$,         -- fit_params
                                   3,                                     -- num_iterations
                                   NULL,                                  -- use GPUs
                                   'cifar_10_test_data_packed_allseg',    -- validation dataset
                                   1                                      -- metrics compute frequency
                                 );
   ```
   produces warning:
   
   ```
   WARNING:  This version of tensorflow does not support XLA auto-cluster JIT optimization.  HINT:  upgrading tensorflow may improve performance.  (seg0 slice1 10.128.0.41:40000 pid=6270)
   CONTEXT:  PL/Python function "fit_transition"
   WARNING:  This version of tensorflow does not support XLA auto-cluster JIT optimization.  HINT:  upgrading tensorflow may improve performance.  (seg1 slice1 10.128.0.41:40001 pid=6271)
   CONTEXT:  PL/Python function "fit_transition"
   ```
   
   What does a user need to do to enable XLA?  I am currently on TF 1.13.1.
   
   Otherwise this ran, and warm start also seemed to work.
   
   
   (2)
   initial tests for functionality - keras_fit_multiple_model()
   
   first I started with a single segment:
   ```
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_train_data_packed ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               1 | {16667,32,32,3}       | {16667,10}          |         0
               1 | {16666,32,32,3}       | {16666,10}          |         2
               1 | {16667,32,32,3}       | {16667,10}          |         1
   (3 rows)
   
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_test_data_packed ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               1 | {10000,32,32,3}       | {10000,10}          |         0
   (1 row)
   ```
   
   run multi fit:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed',    -- source_table
                                                 'cifar10_multi_model',           -- model_output_table
                                                 'mst_table',                     -- model_selection_table
                                                  3,                              -- num_iterations
                                                  NULL,                           -- use gpus
                                                 'cifar_10_test_data_packed',     -- validation dataset
                                                  1,                              -- metrics compute frequency
                                                  NULL,                           -- warm_start
                                                  'me',
                                                  'this is a test run'
                                                );
   ```
   
   produces error:
   ```
   ERROR:  plpy.Error: madlib_keras_fit_multiple_model error: No GPUs configured on hosts. (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 23, in <module>
       fit_obj = madlib_keras_fit_multiple_model.FitMultipleModel(**globals())
     PL/Python function "madlib_keras_fit_multiple_model", line 147, in __init__
     PL/Python function "madlib_keras_fit_multiple_model", line 295, in get_accessible_gpus_for_seg
   PL/Python function "madlib_keras_fit_multiple_model"
   ```
   
   so it looks like `use gpus=NULL` now defaults to `TRUE`, but it should default to `FALSE` (i.e., CPUs), as it used to.
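
To make the expected behavior concrete, here is a minimal sketch of the defaulting I would expect for this argument. The helper name `resolve_use_gpus` is hypothetical and is not taken from the MADlib source; the only assumption is that a SQL `NULL` reaches PL/Python as `None`.

```python
def resolve_use_gpus(use_gpus=None):
    """Hypothetical helper: map the SQL 'use gpus' argument to a boolean.

    A SQL NULL arrives in PL/Python as None and should mean "use CPUs".
    """
    if use_gpus is None:
        return False  # NULL falls back to CPUs, the previous behavior
    return bool(use_gpus)


print(resolve_use_gpus(None))
print(resolve_use_gpus(True))
```

With a default like this, the call above would skip the GPU lookup entirely when `NULL` is passed, so the "No GPUs configured on hosts" check would not be reached on a CPU-only cluster.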
   
   
   (3)
   initial tests for functionality - keras_fit_multiple_model()
   
   next I used 2 segments:
   ```
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_train_data_packed_allseg ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               0 | {12500,32,32,3}       | {12500,10}          |         3
               0 | {12500,32,32,3}       | {12500,10}          |         1
               1 | {12500,32,32,3}       | {12500,10}          |         2
               1 | {12500,32,32,3}       | {12500,10}          |         0
   (4 rows)
   
   SELECT __dist_key__, independent_var_shape, dependent_var_shape, buffer_id FROM cifar_10_test_data_packed_allseg ORDER BY __dist_key__;
    __dist_key__ | independent_var_shape | dependent_var_shape | buffer_id 
   --------------+-----------------------+---------------------+-----------
               0 | {5000,32,32,3}        | {5000,10}           |         1
               1 | {5000,32,32,3}        | {5000,10}           |         0
   (2 rows)
   ```
   
   run multi fit:
   ```
   DROP TABLE IF EXISTS cifar10_multi_model, cifar10_multi_model_summary, cifar10_multi_model_info;
   SELECT madlib.madlib_keras_fit_multiple_model('cifar_10_train_data_packed_allseg',    -- source_table
                                                 'cifar10_multi_model',                  -- model_output_table
                                                 'mst_table',                            -- model_selection_table
                                                  3,                                     -- num_iterations
                                                  NULL,                                  -- use gpus
                                                 'cifar_10_test_data_packed_allseg',     -- validation dataset
                                                  1,                                     -- metrics compute frequency
                                                  NULL,                                  -- warm_start
                                                  'me',
                                                  'this is a test run'
                                                );
   ```
   
   which produced error:
   ```
   ERROR:  plpy.SPIError: PRIMARY KEY and DISTRIBUTED BY definitions incompatible
   HINT:  When there is both a PRIMARY KEY, and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be equal to or a left-subset of the PRIMARY KEY
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
       fit_obj.fit_multiple_model()
     PL/Python function "madlib_keras_fit_multiple_model", line 241, in fit_multiple_model
     PL/Python function "madlib_keras_fit_multiple_model", line 509, in init_model_output_tbl
   PL/Python function "madlib_keras_fit_multiple_model"
   ```
   
   I have also seen this error when running on a single segment.
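
For reference, the rule in the HINT can be sketched as a small check. This reads "left-subset" as "leading prefix of the PRIMARY KEY column list", which is my interpretation of the message rather than something confirmed from the database source. The function name and the column names in the usage lines are illustrative only and do not reflect what `init_model_output_tbl` actually creates.

```python
def is_left_subset(distributed_by, primary_key):
    """Return True if distributed_by equals, or is a leading prefix of, primary_key.

    Both arguments are ordered sequences of column names.
    """
    dist = list(distributed_by)
    pk = list(primary_key)
    # A non-empty distribution key must match the first len(dist) PK columns.
    return 0 < len(dist) <= len(pk) and pk[:len(dist)] == dist


# Illustrative column names only.
print(is_left_subset(["key_a"], ["key_a", "key_b"]))  # compatible
print(is_left_subset(["key_b"], ["key_a", "key_b"]))  # would raise the error above
```

Under that reading, the output table definition needs either a PRIMARY KEY that starts with the distribution column(s) or a DISTRIBUTED BY clause that matches the leading PRIMARY KEY column(s).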


