[ 
https://issues.apache.org/jira/browse/MADLIB-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1364:
------------------------------------
    Priority: Minor  (was: Major)

> Misc message and other items for 1.16 release
> ---------------------------------------------
>
>                 Key: MADLIB-1364
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1364
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> (1)
> input shape checking
> We added input shape checking, which is a good idea in principle, but it seems 
> to be too restrictive. For example, for the MNIST data set, the Keras input shape is:
> {code}
> x_train_lt5.shape
> (30596, 28, 28)
> {code}
> In MADlib, after preprocessing, we get:
> {code}
> id | 2238
> x  | 
> {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,196,195,12,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,79,159,44,0,0,0,0,39,253,218,10,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,179,0,0,0,0,149,253,169,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,53,0,0,0,12,222,253,123,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,8,226,253,16,0,0,0,25,253,253,56,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,50,253,253,16,0,0,0,41,253,218,7,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,139,253,217,8,0,0,0,126,253,193,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,213,253,114,0,0,0,10,226,253,130,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,39,250,253,223,10,0,0,17,253,253,54,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,173,253,253,253,169,137,83,120,253,221,2,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,52,238,254,254,254,254,254,255,254,254,192,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,115,253,228,84,73,97,154,238,253,253,138,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,40,146,45,0,0,0,0,9,253,250,73,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,132,253,186,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,243,253,102,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,196,254,238,7,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,245,254,186,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,166,251,79,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
> y  | 4
> {code}
> A validation error gets thrown when we run fit():
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: model_keras error: Input 
> shape [28, 28, 1] in the model architecture does not match the input shape 
> [28, 28, None] of column independent_var in table train_lt5_packed. 
> (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 102, in fit
>   PL/Python function "madlib_keras_fit", line 300, in validate_input_shapes
>   PL/Python function "madlib_keras_fit", line 86, in _validate_input_shapes
> PL/Python function "madlib_keras_fit"
>  [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- 
> source table\n                               'mnist_model',         -- model 
> output table\n                               'model_arch_library',  -- model 
> arch table\n                                1,                    -- model 
> arch id\n                                $$ loss='categorical_crossentropy', 
> optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params\n            
>                     $$ batch_size=batch_size, epochs=1 $$,  -- fit_params\n   
>                              5,                    -- num_iterations\n        
>                         0,                    -- gpus_per_host\n              
>                   'test_lt5_packed',           -- validation table\n          
>                       1                     -- metrics_compute_frequency\n    
>                           );"]
> {code}
> This check is too restrictive. I suggest we turn MADlib input shape validation 
> off for the time being and let the back end fail or not according to its 
> rules. This applies to fit, evaluate and predict.
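
As an alternative to switching validation off entirely, the check could be relaxed. Below is a minimal sketch, assuming the mismatch above comes only from the missing channel dimension: the 28x28 array has no third axis, so it is reported as None, while the model architecture declares 1. The function name `shapes_compatible` is hypothetical and is not part of MADlib.

```python
def shapes_compatible(model_shape, data_shape):
    """Hypothetical lenient check: ignore trailing None or singleton dims."""
    def normalize(shape):
        dims = list(shape)
        # Drop trailing dimensions that are missing (None) or of size 1,
        # so [28, 28, 1] and [28, 28, None] both reduce to [28, 28].
        while dims and dims[-1] in (None, 1):
            dims.pop()
        return dims

    return normalize(model_shape) == normalize(data_shape)


# Shapes taken from the error message in this issue.
print(shapes_compatible([28, 28, 1], [28, 28, None]))
# A real mismatch in height and width is still rejected.
print(shapes_compatible([28, 28, 1], [32, 32, None]))
```

With this rule the MNIST case passes, while a table holding images of a different height or width would still be flagged before reaching the back end.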
> (2)
> confusing error message if the source table was not preprocessed
> {code}
> SELECT madlib.madlib_keras_fit('train_lt5',           -- source table (NOT PREPROCESSED)
>                                'mnist_model',         -- model output table
>                                'model_arch_library',  -- model arch table
>                                 1,                    -- model arch id
>                                 $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
>                                 $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
>                                 5,                    -- num_iterations
>                                 0,                    -- gpus_per_host
>                                 'test_lt5_packed',           -- validation table
>                                 1                     -- metrics_compute_frequency
>                               );
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: 
> Input table 'train_lt5_summary' does not exist (plpython.c:5038)
> {code}
> A better message would be:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: 
> Input table 'train_lt5_summary' does not exist.  Please ensure that the 
> source table you specify has been preprocessed by the image preprocessor. 
> (plpython.c:5038)
> {code}
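
The proposed wording could be produced by a small helper. This is a minimal sketch, assuming the summary table name is the source table name with a `_summary` suffix, as the error above suggests. The function name `missing_summary_message` is hypothetical and does not come from the MADlib code base.

```python
def missing_summary_message(module_name, source_table):
    """Build the error text for a source table that has no summary table."""
    # Assumption: the preprocessor writes '<source_table>_summary'.
    summary_table = source_table + "_summary"
    return (
        "{0} error: Input table '{1}' does not exist. "
        "Please ensure that the source table you specify has been "
        "preprocessed by the image preprocessor."
    ).format(module_name, summary_table)


print(missing_summary_message("madlib_keras_fit", "train_lt5"))
```

Keeping the original sentence first and appending the hint means anything that already matches on the existing text keeps working.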
> (3)
> confusing error message if the validation table was not preprocessed
> {code}
> SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- source table (YES PREPROCESSED)
>                                'mnist_model',         -- model output table
>                                'model_arch_library',  -- model arch table
>                                 1,                    -- model arch id
>                                 $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
>                                 $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
>                                 5,                    -- num_iterations
>                                 0,                    -- gpus_per_host
>                                 'test_lt5',           -- validation table (NOT PREPROCESSED)
>                                 1                     -- metrics_compute_frequency
>                               );
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid 
> independent_varname ('independent_var') for table (test_lt5). 
> (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 71, in fit
>   PL/Python function "madlib_keras_fit", line 233, in __init__
>   PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
>   PL/Python function "madlib_keras_fit", line 288, in 
> _validate_validation_table
>   PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
>   PL/Python function "madlib_keras_fit", line 96, in _assert
> PL/Python function "madlib_keras_fit"
>  [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed',           -- 
> source table\n                               'mnist_model',         -- model 
> output table\n                               'model_arch_library',  -- model 
> arch table\n                                1,                    -- model 
> arch id\n                                $$ loss='categorical_crossentropy', 
> optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params\n            
>                     $$ batch_size=batch_size, epochs=1 $$,  -- fit_params\n   
>                              5,                    -- num_iterations\n        
>                         0,                    -- gpus_per_host\n              
>                   'test_lt5',           -- validation table\n                 
>                1                     -- metrics_compute_frequency\n           
>                    );"]
> {code}
> A better message would be:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid 
> independent_varname ('independent_var') for table (test_lt5). Please ensure 
> that this table has been preprocessed by the image preprocessor.  
> (plpython.c:5038)
> {code}
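
For the validation table, the hint could be attached where the column check fails. This is a minimal sketch, assuming the caller already has the list of column names for the table. The function `validation_table_error` and its parameters are hypothetical and are not MADlib's actual validation code.

```python
def validation_table_error(module_name, table, column, table_columns):
    """Return an error message if `column` is missing from the table, else None."""
    if column in table_columns:
        return None
    return (
        "{0}: invalid independent_varname ('{1}') for table ({2}). "
        "Please ensure that this table has been preprocessed by the "
        "image preprocessor."
    ).format(module_name, column, table)


# An unpreprocessed table like test_lt5 would expose raw columns such as x and y.
print(validation_table_error(
    "madlib_keras_fit", "test_lt5", "independent_var", ["id", "x", "y"]))
```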



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
