fmcquillan99 commented on issue #395: DL: madlib_keras_evaluate() function
URL: https://github.com/apache/madlib/pull/395#issuecomment-495748933
 
 
   
   (1)
   happy path
   
   ```
   DROP TABLE IF EXISTS iris_validate;
   
   SELECT madlib.madlib_keras_evaluate('iris_model',    -- model
                                      'iris_test_packed',     -- test table
                                      'class_text',    -- dependent var
                                      'attributes',    -- independent var
                                      'iris_validate'  -- output table
                                      );
   
   WARNING:  column "attributes" does not exist
   CONTEXT:  PL/Python function "madlib_keras_evaluate"
   
   ERROR:  plpy.Error: madlib_keras_evaluate error: invalid independent_varname ('attributes') for test table (iris_test_packed). (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_evaluate", line 21, in <module>
       return madlib_keras.evaluate(**globals())
     PL/Python function "madlib_keras_evaluate", line 524, in evaluate
     PL/Python function "madlib_keras_evaluate", line 95, in __init__
     PL/Python function "madlib_keras_evaluate", line 103, in _validate_input_args
     PL/Python function "madlib_keras_evaluate", line 127, in _validate_test_tbl_cols
     PL/Python function "madlib_keras_evaluate", line 96, in _assert
   PL/Python function "madlib_keras_evaluate"
   ```
   
   As with `fit()`, you should be able to use the original column names from the validation table and not have to use `dependent_var` and `independent_var`:
   
   ```
   DROP TABLE IF EXISTS iris_validate;
   
   SELECT madlib.madlib_keras_evaluate('iris_model',    -- model
                                      'iris_test_packed',     -- test table
                                      'dependent_var',    -- XXX 
                                      'independent_var',    -- XXX
                                      'iris_validate'  -- output table
                                      );
   NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'loss' as the Greenplum Database data distribution key for this table.
   HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
   CONTEXT:  SQL statement "
           CREATE TABLE iris_validate AS
           SELECT $1 as loss, $2 as metric"
   PL/Python function "madlib_keras_evaluate"
    madlib_keras_evaluate 
   -----------------------
    
   (1 row)
   
   Time: 2009.115 ms
   ```
   
   We also need to suppress the verbose output (the NOTICE, HINT, and CONTEXT lines above).
   
   
   (2)
   pass in table that has not been mini-batched
   
   ```
   DROP TABLE IF EXISTS iris_validate;
   
   SELECT madlib.madlib_keras_evaluate('iris_model',    -- model
                                      'iris_test',     -- test table
                                      'dependent_var',    -- XXX 
                                      'independent_var',    -- XXX
                                      'iris_validate'  -- output table
                                      );
   ERROR:  plpy.SPIError: function array_length(character varying, integer) does not exist
   LINE 1:  SELECT gp_segment_id, SUM(ARRAY_LENGTH(class_text, 1)) AS i...
                                      ^
   HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
   QUERY:   SELECT gp_segment_id, SUM(ARRAY_LENGTH(class_text, 1)) AS images_per_seg
                   FROM iris_test
                   GROUP BY gp_segment_id
               
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_evaluate", line 21, in <module>
       return madlib_keras.evaluate(**globals())
     PL/Python function "madlib_keras_evaluate", line 541, in evaluate
     PL/Python function "madlib_keras_evaluate", line 384, in get_images_per_seg
   PL/Python function "madlib_keras_evaluate"
   
   ```
   
   This will be a common mistake. I'm not sure whether the message above is affected by (1), but can we provide a better message that says you need to mini-batch first? When I did the same with `fit()` I got:
   
   ```
   DROP TABLE IF EXISTS iris_model, iris_model_summary;
   
   SELECT madlib.madlib_keras_fit('iris_train',   -- source table
                                  'iris_model',          -- model output table
                                  'model_arch_library',  -- model arch table
                                   1,                    -- model arch id
                                   $$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$,  -- compile_params
                                   $$ batch_size=5, epochs=3 $$,  -- fit_params
                                   3                    -- num_iterations
                                 );
   ERROR:  plpy.Error: madlib_keras_fit error: Input table 'iris_train_summary' does not exist (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "madlib_keras_fit", line 21, in <module>
       madlib_keras.fit(**globals())
     PL/Python function "madlib_keras_fit", line 92, in fit
     PL/Python function "madlib_keras_fit", line 192, in __init__
     PL/Python function "madlib_keras_fit", line 223, in _validate_input_args
     PL/Python function "madlib_keras_fit", line 670, in input_tbl_valid
   PL/Python function "madlib_keras_fit"
   ```
   
   which I guess is better, but it still does not say explicitly that the table needs to be mini-batched first.
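
To make the request in (2) concrete, here is a rough sketch of the kind of up-front check I have in mind. It is not the code in this PR, and `check_minibatched` and its arguments are names I made up for illustration. It assumes the caller has already looked up the column types of the test table (for example from `information_schema.columns`, where array columns are reported with `data_type` = `'ARRAY'`) and that a mini-batched table stores its variables as arrays, which is what the `ARRAY_LENGTH` call in the failing query suggests.

```python
def check_minibatched(table_name, column_types, required_columns):
    """Raise a descriptive error if a required column is missing or not an array.

    Hypothetical helper, not part of MADlib.
    column_types: dict of column name -> type name, e.g. taken from
                  information_schema.columns ('ARRAY' for array columns).
    required_columns: the dependent/independent column names to validate.
    """
    for col in required_columns:
        if col not in column_types:
            raise ValueError(
                "madlib_keras_evaluate error: column '{0}' does not exist "
                "in test table '{1}'.".format(col, table_name))
        if column_types[col] != 'ARRAY':
            raise ValueError(
                "madlib_keras_evaluate error: column '{0}' in test table "
                "'{1}' has type '{2}', not an array. The test table must be "
                "mini-batched first; run the mini-batch preprocessor on it "
                "and pass the packed output table.".format(
                    col, table_name, column_types[col]))


# A raw table like the one in (2): class_text is character varying.
try:
    check_minibatched('iris_test',
                      {'class_text': 'character varying'},
                      ['class_text'])
except ValueError as err:
    raw_table_message = str(err)

# A packed table (assumed here to hold array columns) passes silently.
check_minibatched('iris_test_packed',
                  {'dependent_var': 'ARRAY', 'independent_var': 'ARRAY'},
                  ['dependent_var', 'independent_var'])
```

Running a check like this before `get_images_per_seg` would let the user see "mini-batch first" instead of the `array_length(character varying, integer)` SPI error.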
   
   
   
   
