[ https://issues.apache.org/jira/browse/MADLIB-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1364:
------------------------------------
    Priority: Minor  (was: Major)

> Misc message and other items for 1.16 release
> ---------------------------------------------
>
>                 Key: MADLIB-1364
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1364
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> (1)
> input shape checking
> We added input shape checking, which is a good idea in principle, but it seems
> to be too restrictive. For example, for the MNIST data set, the Keras input
> shape is:
> {code}
> x_train_lt5.shape
> (30596, 28, 28)
> {code}
> In MADlib after preprocessing we get:
> {code}
> id | 2238
> x  | {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,196,195,12,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,79,159,44,0,0,0,0,39,253,218,10,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,179,0,0,0,0,149,253,169,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,53,0,0,0,12,222,253,123,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,8,226,253,16,0,0,0,25,253,253,56,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,50,253,253,16,0,0,0,41,253,218,7,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,139,253,217,8,0,0,0,126,253,193,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,213,253,114,0,0,0,10,226,253,130,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,39,250,253,223,10,0,0,17,253,253,54,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,173,253,253,253,169,137,83,120,253,221,2,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,52,238,254,254,254,254,254,255,254,254,192,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,115,253,228,84,73,97,154,238,253,253,138,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,40,146,45,0,0,0,0,9,253,250,73,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,132,253,186,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,243,253,102,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,196,254,238,7,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,245,254,186,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,166,251,79,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
> y  | 4
> {code}
> A validation error gets thrown when we run fit():
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: model_keras error: Input
> shape [28, 28, 1] in the model architecture does not match the input shape
> [28, 28, None] of column independent_var in table train_lt5_packed.
> (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 102, in fit
>   PL/Python function "madlib_keras_fit", line 300, in validate_input_shapes
>   PL/Python function "madlib_keras_fit", line 86, in _validate_input_shapes
> PL/Python function "madlib_keras_fit"
> [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n
>  'mnist_model', -- model output table\n
>  'model_arch_library', -- model arch table\n
>  1, -- model arch id\n
>  $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n
>  $$ batch_size=batch_size, epochs=1 $$, -- fit_params\n
>  5, -- num_iterations\n
>  0, -- gpus_per_host\n
>  'test_lt5_packed', -- validation table\n
>  1 -- metrics_compute_frequency\n
>  );"]
> {code}
> which is too restrictive. I suggest we turn MADlib input shape validation
> off for the time being and let the back end fail or not according to its
> rules.
> This applies to fit, evaluate and predict.
> (2)
> confusing error message if forgot to preprocess source table
> {code}
> SELECT madlib.madlib_keras_fit('train_lt5',          -- source table (NOT PREPROCESSED)
>                                'mnist_model',        -- model output table
>                                'model_arch_library', -- model arch table
>                                1,                    -- model arch id
>                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params
>                                $$ batch_size=batch_size, epochs=1 $$, -- fit_params
>                                5,                    -- num_iterations
>                                0,                    -- gpus_per_host
>                                'test_lt5_packed',    -- validation table
>                                1                     -- metrics_compute_frequency
>                               );
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error:
> Input table 'train_lt5_summary' does not exist (plpython.c:5038)
> {code}
> A better message would be:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error:
> Input table 'train_lt5_summary' does not exist. Please ensure that the
> source table you specify has been preprocessed by the image preprocessor.
> (plpython.c:5038)
> {code}
> (3)
> confusing error message if forgot to preprocess validation table
> {code}
> SELECT madlib.madlib_keras_fit('train_lt5_packed',   -- source table (YES PREPROCESSED)
>                                'mnist_model',        -- model output table
>                                'model_arch_library', -- model arch table
>                                1,                    -- model arch id
>                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params
>                                $$ batch_size=batch_size, epochs=1 $$, -- fit_params
>                                5,                    -- num_iterations
>                                0,                    -- gpus_per_host
>                                'test_lt5',           -- validation table (NOT PREPROCESSED)
>                                1                     -- metrics_compute_frequency
>                               );
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid
> independent_varname ('independent_var') for table (test_lt5).
> (plpython.c:5038)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit", line 21, in <module>
>     madlib_keras.fit(**globals())
>   PL/Python function "madlib_keras_fit", line 42, in wrapper
>   PL/Python function "madlib_keras_fit", line 71, in fit
>   PL/Python function "madlib_keras_fit", line 233, in __init__
>   PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
>   PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
>   PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
>   PL/Python function "madlib_keras_fit", line 96, in _assert
> PL/Python function "madlib_keras_fit"
> [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n
>  'mnist_model', -- model output table\n
>  'model_arch_library', -- model arch table\n
>  1, -- model arch id\n
>  $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n
>  $$ batch_size=batch_size, epochs=1 $$, -- fit_params\n
>  5, -- num_iterations\n
>  0, -- gpus_per_host\n
>  'test_lt5', -- validation table\n
>  1 -- metrics_compute_frequency\n
>  );"]
> {code}
> A better message would be:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid
> independent_varname ('independent_var') for table (test_lt5). Please ensure
> that this table has been preprocessed by the image preprocessor.
> (plpython.c:5038)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
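
Note on item (1): the issue proposes switching the input shape validation off. A possible alternative, shown here only as a sketch, is to relax the comparison so that a missing (None) dimension in the table's shape is tolerated where the model architecture expects a size of 1, which is the [28, 28, 1] versus [28, 28, None] case from the error above. The function name shapes_compatible and the tolerance rule are assumptions made for illustration; this is not the MADlib implementation, and it says nothing about whether the Keras back end would then accept the data.

```python
def shapes_compatible(model_shape, data_shape):
    """Return True if data_shape passes a relaxed comparison with model_shape.

    Hypothetical rule (an assumption, not MADlib's actual check): shapes must
    have the same number of dimensions and agree position by position, except
    that a None entry in data_shape is tolerated where the model expects 1,
    e.g. grayscale images stored without an explicit channel axis.
    """
    if len(model_shape) != len(data_shape):
        return False
    for expected, actual in zip(model_shape, data_shape):
        if actual is None:
            # Unknown dimension in the table: only tolerate it for size 1.
            if expected != 1:
                return False
        elif expected != actual:
            return False
    return True


# The shapes reported in the validation error for item (1).
print(shapes_compatible([28, 28, 1], [28, 28, None]))
# A three-channel model should still be rejected for the same table.
print(shapes_compatible([28, 28, 3], [28, 28, None]))
```

A check of this kind would still catch genuine mismatches (different image size, wrong channel count) while letting the MNIST example through to the back end, which then fails or not according to its own rules.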