Frank McQuillan created MADLIB-1364:
---------------------------------------

             Summary: Misc message and other items for 1.16 release
                 Key: MADLIB-1364
                 URL: https://issues.apache.org/jira/browse/MADLIB-1364
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Deep Learning
            Reporter: Frank McQuillan
             Fix For: v1.16



(1)
input shape checking

We added input shape checking, which is a good idea in principle, but it seems
to be too restrictive. For example, for the MNIST data set, the Keras input shape is:
{code}
x_train_lt5.shape
(30596, 28, 28)
{code}

In MADlib, after preprocessing, we get:
{code}
id | 2238
x  | {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,196,195,12,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,79,159,44,0,0,0,0,39,253,218,10,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,179,0,0,0,0,149,253,169,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,53,0,0,0,12,222,253,123,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,8,226,253,16,0,0,0,25,253,253,56,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,50,253,253,16,0,0,0,41,253,218,7,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,139,253,217,8,0,0,0,126,253,193,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,213,253,114,0,0,0,10,226,253,130,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,39,250,253,223,10,0,0,17,253,253,54,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,173,253,253,253,169,137,83,120,253,221,2,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,52,238,254,254,254,254,254,255,254,254,192,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,115,253,228,84,73,97,154,238,253,253,138,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,40,146,45,0,0,0,0,9,253,250,73,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,132,253,186,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,243,253,102,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,196,254,238,7,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,245,254,186,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,166,251,79,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
y  | 4
{code}
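The mismatch is easier to see once the leading sample count is dropped: Keras reports the full array shape (30596, 28, 28), while the model architecture declares a per-sample shape [28, 28, 1]. The plain-Python sketch below uses hypothetical helper names (per_sample_shape, shapes_compatible) and is not MADlib's validation code. It only illustrates that the two shapes differ by a trailing channel dimension that is 1 (or unknown), which a strict element-by-element comparison rejects.

{code:python}
def per_sample_shape(array_shape):
    """Drop the leading sample-count dimension from a full array shape."""
    return tuple(array_shape[1:])


def _strip_channel(shape):
    """Drop a trailing dimension that is 1 or unknown (None), e.g. a channel axis."""
    shape = list(shape)
    if shape and shape[-1] in (1, None):
        shape = shape[:-1]
    return shape


def shapes_compatible(model_shape, data_shape):
    """Compare shapes while ignoring a trailing singleton/unknown channel.

    Any remaining None entry is treated as 'unknown' and matches any size.
    """
    a, b = _strip_channel(model_shape), _strip_channel(data_shape)
    return len(a) == len(b) and all(
        x is None or y is None or x == y for x, y in zip(a, b))


keras_shape = (30596, 28, 28)                 # x_train_lt5.shape from the report
print(per_sample_shape(keras_shape))          # (28, 28)
print([28, 28, 1] == [28, 28, None])          # False: a strict comparison fails
print(shapes_compatible([28, 28, 1], [28, 28, None]))                 # True
print(shapes_compatible([28, 28, 1], per_sample_shape(keras_shape)))  # True
{code}

A relaxed comparison like this is only one possible alternative; the suggestion at the end of this item is simpler and leaves the decision entirely to Keras.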

A validation error is thrown when we run fit():

{code}
InternalError: (psycopg2.InternalError) plpy.Error: model_keras error: Input shape [28, 28, 1] in the model architecture does not match the input shape [28, 28, None] of column independent_var in table train_lt5_packed. (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "madlib_keras_fit", line 21, in <module>
    madlib_keras.fit(**globals())
  PL/Python function "madlib_keras_fit", line 42, in wrapper
  PL/Python function "madlib_keras_fit", line 102, in fit
  PL/Python function "madlib_keras_fit", line 300, in validate_input_shapes
  PL/Python function "madlib_keras_fit", line 86, in _validate_input_shapes
PL/Python function "madlib_keras_fit"
 [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5_packed',           -- validation table
                                1                     -- metrics_compute_frequency
                              );"]
{code}

This check is too restrictive. I suggest we turn MADlib input shape validation off
for the time being and let the backend fail (or not) according to its own rules.
This applies to fit(), evaluate() and predict().
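Turning the check off for fit(), evaluate() and predict() is simplest if all three share one validation helper guarded by a single switch. The sketch below assumes such a structure. The function name validate_input_shapes is borrowed from the traceback above, but its body and the VALIDATE_INPUT_SHAPES switch are hypothetical, and ShapeMismatchError stands in for plpy.Error.

{code:python}
# Hypothetical module-level switch: off means "defer to the Keras backend".
VALIDATE_INPUT_SHAPES = False


class ShapeMismatchError(ValueError):
    """Stand-in for the plpy.Error raised inside PL/Python."""


def validate_input_shapes(model_shape, data_shape, table, column):
    """Strict shape check that becomes a no-op while validation is switched off."""
    if not VALIDATE_INPUT_SHAPES:
        return  # let Keras/TensorFlow fail (or not) on its own terms
    if list(model_shape) != list(data_shape):
        raise ShapeMismatchError(
            "Input shape {0} in the model architecture does not match the input "
            "shape {1} of column {2} in table {3}.".format(
                list(model_shape), list(data_shape), column, table))


# fit(), evaluate() and predict() would all call this one helper,
# so a single switch covers all three entry points.
validate_input_shapes([28, 28, 1], [28, 28, None],
                      "train_lt5_packed", "independent_var")  # no error raised
{code}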


(2)
confusing error message if the source table was not preprocessed

{code}
SELECT madlib.madlib_keras_fit('train_lt5',           -- source table (NOT PREPROCESSED)
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5_packed',           -- validation table
                                1                     -- metrics_compute_frequency
                              );

InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist (plpython.c:5038)
{code}

A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist. Please ensure that the source table you specify has been preprocessed by the image preprocessor. (plpython.c:5038)
{code}
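The error above suggests that fit() locates the preprocessor's summary table by appending _summary to the source table name. A check that attaches the suggested hint might look like the sketch below. The function names, the injected table_exists callable (inside MADlib this would be a catalog lookup) and the example table set are hypothetical, and InputValidationError stands in for plpy.Error.

{code:python}
class InputValidationError(Exception):
    """Stand-in for plpy.Error in this sketch."""


def summary_table_name(source_table):
    # Inferred from the error text: fit() looks for '<source_table>_summary'.
    return source_table + "_summary"


def check_source_preprocessed(source_table, table_exists):
    """Raise a descriptive error if the source table has no summary table.

    table_exists is a hypothetical callable: name -> bool.
    """
    summary = summary_table_name(source_table)
    if not table_exists(summary):
        raise InputValidationError(
            "madlib_keras_fit error: Input table '{0}' does not exist. "
            "Please ensure that the source table you specify has been "
            "preprocessed by the image preprocessor.".format(summary))


# Hypothetical catalog contents for the example.
existing = {"train_lt5", "train_lt5_packed", "train_lt5_packed_summary"}
check_source_preprocessed("train_lt5_packed", existing.__contains__)  # passes
try:
    check_source_preprocessed("train_lt5", existing.__contains__)
except InputValidationError as e:
    print(e)
{code}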


(3)
confusing error message if the validation table was not preprocessed

{code}
SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table (YES PREPROCESSED)
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5',           -- validation table (NOT PREPROCESSED)
                                1                     -- metrics_compute_frequency
                              );

InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "madlib_keras_fit", line 21, in <module>
    madlib_keras.fit(**globals())
  PL/Python function "madlib_keras_fit", line 42, in wrapper
  PL/Python function "madlib_keras_fit", line 71, in fit
  PL/Python function "madlib_keras_fit", line 233, in __init__
  PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
  PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
  PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
  PL/Python function "madlib_keras_fit", line 96, in _assert
PL/Python function "madlib_keras_fit"
 [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5',           -- validation table
                                1                     -- metrics_compute_frequency
                              );"]
{code}

A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). Please ensure that this table has been preprocessed by the image preprocessor. (plpython.c:5038)
{code}
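A similar hint can be attached where the validation table's columns are checked. In this sketch the column lists are passed in directly (inside MADlib they would come from a catalog query), check_validation_table is a hypothetical name, and InputValidationError stands in for plpy.Error. The unpreprocessed table is assumed to have the same id, x, y columns as the sample row in (1); the packed table's column list is likewise illustrative.

{code:python}
class InputValidationError(Exception):
    """Stand-in for plpy.Error in this sketch."""


def check_validation_table(table, columns, independent_varname="independent_var"):
    """Fail with a preprocessing hint if the expected packed column is missing."""
    if independent_varname not in columns:
        raise InputValidationError(
            "madlib_keras_fit: invalid independent_varname ('{0}') for table ({1}). "
            "Please ensure that this table has been preprocessed by the image "
            "preprocessor.".format(independent_varname, table))


# Illustrative column lists, not read from a real database.
check_validation_table("test_lt5_packed", ["independent_var", "dependent_var"])  # passes
try:
    check_validation_table("test_lt5", ["id", "x", "y"])
except InputValidationError as e:
    print(e)
{code}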





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
