[ https://issues.apache.org/jira/browse/MADLIB-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1364:
------------------------------------
    Description:
(1)
Confusing error message if the user forgot to preprocess the source table:
{code}
SELECT madlib.madlib_keras_fit('train_lt5',           -- source table (NOT PREPROCESSED)
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5_packed',    -- validation table
                                1                     -- metrics_compute_frequency
                              );

InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist (plpython.c:5038)
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist. Please ensure that the source table you specify has been preprocessed by the image preprocessor. (plpython.c:5038)
{code}

(2)
Confusing error message if the user forgot to preprocess the validation table:
{code}
SELECT madlib.madlib_keras_fit('train_lt5_packed',    -- source table (YES PREPROCESSED)
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5',           -- validation table (NOT PREPROCESSED)
                                1                     -- metrics_compute_frequency
                              );

InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "madlib_keras_fit", line 21, in <module>
    madlib_keras.fit(**globals())
  PL/Python function "madlib_keras_fit", line 42, in wrapper
  PL/Python function "madlib_keras_fit", line 71, in fit
  PL/Python function "madlib_keras_fit", line 233, in __init__
  PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
  PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
  PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
  PL/Python function "madlib_keras_fit", line 96, in _assert
PL/Python function "madlib_keras_fit"
 [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n 'mnist_model', -- model output table\n 'model_arch_library', -- model arch table\n 1, -- model arch id\n $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n $$ batch_size=batch_size, epochs=1 $$, -- fit_params\n 5, -- num_iterations\n 0, -- gpus_per_host\n 'test_lt5', -- validation table\n 1 -- metrics_compute_frequency\n );"]
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). Please ensure that this table has been preprocessed by the image preprocessor. (plpython.c:5038)
{code}

  was:
(1)
Input shape checking

We added input shape checking, which is a good idea in principle, but it seems to be too restrictive.
E.g., for the MNIST data set, the Keras input shape is:
{code}
x_train_lt5.shape
(30596, 28, 28)
{code}
In MADlib before preprocessing we get:
{code}
id | 2238
x  | {{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,196,195,12,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,79,159,44,0,0,0,0,39,253,218,10,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,179,0,0,0,0,149,253,169,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,221,253,53,0,0,0,12,222,253,123,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,8,226,253,16,0,0,0,25,253,253,56,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,50,253,253,16,0,0,0,41,253,218,7,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,139,253,217,8,0,0,0,126,253,193,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,213,253,114,0,0,0,10,226,253,130,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,39,250,253,223,10,0,0,17,253,253,54,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,173,253,253,253,169,137,83,120,253,221,2,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,52,238,254,254,254,254,254,255,254,254,192,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,115,253,228,84,73,97,154,238,253,253,138,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,40,146,45,0,0,0,0,9,253,250,73,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,253,228,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,132,253,186,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,243,253,102,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,196,254,238,7,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,245,254,186,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,166,251,79,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}
y  | 4
{code}
A validation error is thrown when we run fit():
{code}
InternalError: (psycopg2.InternalError) plpy.Error: model_keras error: Input shape [28, 28, 1] in the model architecture does not match the input shape [28, 28, None] of column independent_var in table train_lt5_packed. (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "madlib_keras_fit", line 21, in <module>
    madlib_keras.fit(**globals())
  PL/Python function "madlib_keras_fit", line 42, in wrapper
  PL/Python function "madlib_keras_fit", line 102, in fit
  PL/Python function "madlib_keras_fit", line 300, in validate_input_shapes
  PL/Python function "madlib_keras_fit", line 86, in _validate_input_shapes
PL/Python function "madlib_keras_fit"
 [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n 'mnist_model', -- model output table\n 'model_arch_library', -- model arch table\n 1, -- model arch id\n $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n $$ batch_size=batch_size, epochs=1 $$, -- fit_params\n 5, -- num_iterations\n 0, -- gpus_per_host\n 'test_lt5_packed', -- validation table\n 1 -- metrics_compute_frequency\n );"]
{code}
This check is too restrictive. I suggest we turn off MADlib input shape validation for the time being and let the back end fail or not according to its own rules. This applies to fit, evaluate and predict.

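To make the mismatch concrete, here is a minimal Python sketch of how a strict comparison could produce the error above. It assumes, without confirmation from the MADlib source, that the table-side shape is read from the array's dimensions and padded with None to three entries. The names `table_input_shape` and `shapes_match_strict` are hypothetical and are not MADlib functions.

```python
def table_input_shape(sample, ndims=3):
    """Return the shape of a nested-list image, padded with None to ndims entries.

    Hypothetical helper: the padding rule is an assumption made for illustration.
    """
    shape = []
    level = sample
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]
    return shape + [None] * (ndims - len(shape))


def shapes_match_strict(model_shape, table_shape):
    """Strict element-wise comparison, the kind of check that rejects MNIST here."""
    return list(model_shape) == list(table_shape)


# A 28x28 grayscale image stored as a 2-D array, like column x in train_lt5.
image = [[0] * 28 for _ in range(28)]
table_shape = table_input_shape(image)
model_shape = [28, 28, 1]  # input shape declared in the model architecture

print(table_shape, shapes_match_strict(model_shape, table_shape))
# prints: [28, 28, None] False
```

Under these assumptions, a 2-D image can never equal a model shape that declares an explicit channel dimension, even though the back end may accept the data. That is the behavior the suggestion above would avoid by leaving the decision to the back end.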
(2)
Confusing error message if the user forgot to preprocess the source table:
{code}
SELECT madlib.madlib_keras_fit('train_lt5',           -- source table (NOT PREPROCESSED)
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5_packed',    -- validation table
                                1                     -- metrics_compute_frequency
                              );

InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist (plpython.c:5038)
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit error: Input table 'train_lt5_summary' does not exist. Please ensure that the source table you specify has been preprocessed by the image preprocessor. (plpython.c:5038)
{code}

(3)
Confusing error message if the user forgot to preprocess the validation table:
{code}
SELECT madlib.madlib_keras_fit('train_lt5_packed',    -- source table (YES PREPROCESSED)
                               'mnist_model',         -- model output table
                               'model_arch_library',  -- model arch table
                                1,                    -- model arch id
                                $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$,  -- compile_params
                                $$ batch_size=batch_size, epochs=1 $$,  -- fit_params
                                5,                    -- num_iterations
                                0,                    -- gpus_per_host
                                'test_lt5',           -- validation table (NOT PREPROCESSED)
                                1                     -- metrics_compute_frequency
                              );

InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "madlib_keras_fit", line 21, in <module>
    madlib_keras.fit(**globals())
  PL/Python function "madlib_keras_fit", line 42, in wrapper
  PL/Python function "madlib_keras_fit", line 71, in fit
  PL/Python function "madlib_keras_fit", line 233, in __init__
  PL/Python function "madlib_keras_fit", line 274, in _validate_input_args
  PL/Python function "madlib_keras_fit", line 288, in _validate_validation_table
  PL/Python function "madlib_keras_fit", line 242, in _validate_input_table
  PL/Python function "madlib_keras_fit", line 96, in _assert
PL/Python function "madlib_keras_fit"
 [SQL: "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n 'mnist_model', -- model output table\n 'model_arch_library', -- model arch table\n 1, -- model arch id\n $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n $$ batch_size=batch_size, epochs=1 $$, -- fit_params\n 5, -- num_iterations\n 0, -- gpus_per_host\n 'test_lt5', -- validation table\n 1 -- metrics_compute_frequency\n );"]
{code}
A better message would be:
{code}
InternalError: (psycopg2.InternalError) plpy.Error: madlib_keras_fit: invalid independent_varname ('independent_var') for table (test_lt5). Please ensure that this table has been preprocessed by the image preprocessor.
(plpython.c:5038)
{code}

> Misc message and other items for 1.16 release
> ---------------------------------------------
>
>                 Key: MADLIB-1364
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1364
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Frank McQuillan
>            Assignee: Nikhil
>            Priority: Minor
>             Fix For: v1.16
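
The two improved messages requested in this issue could be built by small helpers along the following lines. This is only a sketch: the function names, the `existing_tables` argument, and the use of `ValueError` are assumptions made so the example runs outside the database. Inside MADlib the check would presumably query the catalog and raise through `plpy.error` instead. The message text itself is taken from the issue description.

```python
PREPROCESS_HINT_SOURCE = (
    "Please ensure that the source table you specify has been "
    "preprocessed by the image preprocessor."
)
PREPROCESS_HINT_TABLE = (
    "Please ensure that this table has been preprocessed by the "
    "image preprocessor."
)


def missing_summary_message(module_name, source_table):
    """Message for a source table whose '<name>_summary' table is missing."""
    return "{0} error: Input table '{1}_summary' does not exist. {2}".format(
        module_name, source_table, PREPROCESS_HINT_SOURCE)


def invalid_varname_message(module_name, varname, table):
    """Message for a table that lacks the expected packed column."""
    return "{0}: invalid independent_varname ('{1}') for table ({2}). {3}".format(
        module_name, varname, table, PREPROCESS_HINT_TABLE)


def check_source_table(module_name, source_table, existing_tables):
    """Raise if the summary table is absent (existing_tables stands in for a catalog lookup)."""
    if source_table + "_summary" not in existing_tables:
        raise ValueError(missing_summary_message(module_name, source_table))


# Only the packed table has a summary table, so the raw table is rejected.
tables = {"train_lt5", "train_lt5_packed", "train_lt5_packed_summary"}
try:
    check_source_table("madlib_keras_fit", "train_lt5", tables)
except ValueError as err:
    print(err)
```

Keeping the hint text in one place means the same wording is appended wherever a missing preprocessing step is the likely cause.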