fmcquillan99 edited a comment on pull request #506: URL: https://github.com/apache/madlib/pull/506#issuecomment-665993038
errors and issues

(1)
```
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',   -- model architecture table
    'mst_table',            -- model selection table output
    ARRAY[1,2],             -- model ids from model architecture table
    $$ { 'lr': [1.0, 2.0, 'linear'] } $$,        -- compile_param_grid
    $$ { 'batch_size': [8], 'epochs': [1] } $$,  -- fit_param_grid
    'random',               -- search_type ('grid' or 'random', default 'grid')
    5,                      -- num_configs (number of sampled parameters, default 10) [to limit testing]
    NULL,                   -- random_state
    NULL                    -- object table (Default=None)
);
```
produces
```
InternalError: (psycopg2.errors.InternalError_) TypeError: cannot concatenate 'str' and 'float' objects (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "generate_model_selection_configs", line 21, in <module>
    mst_loader = madlib_keras_model_selection.MstSearch(**globals())
  PL/Python function "generate_model_selection_configs", line 42, in wrapper
  PL/Python function "generate_model_selection_configs", line 287, in __init__
  PL/Python function "generate_model_selection_configs", line 426, in find_random_combinations
  PL/Python function "generate_model_selection_configs", line 490, in generate_row_string
PL/Python function "generate_model_selection_configs"
[SQL: SELECT madlib.generate_model_selection_configs( 'model_arch_library', 'mst_table', ARRAY[1,2], $$ { 'loss': ['categorical_crossentropy'], 'lr': [0.0001, 0.1, 'linear'] } $$, $$ { 'batch_size': [8], 'epochs': [1] } $$, 'random', 5, NULL, NULL );]
(Background on this error at: http://sqlalche.me/e/2j85)
```
Likewise
```
DROP TABLE IF EXISTS mst_table, mst_table_summary;
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',   -- model architecture table
    'mst_table',            -- model selection table output
    ARRAY[1,2],             -- model ids from model architecture table
    $$ { 'lr': [1.0, 2.0, 'log'], } $$,          -- compile_param_grid
    $$ { 'batch_size': [8], 'epochs': [1] } $$,  -- fit_param_grid
    'random',               -- search_type ('grid' or 'random', default 'grid')
    1,                      -- num_configs (number of sampled parameters, default 10) [to limit testing]
    NULL,                   -- random_state
    NULL                    -- object table (Default=None)
);
SELECT * FROM mst_table ORDER BY mst_key;
```
produces
```
InternalError: (psycopg2.errors.InternalError_) TypeError: cannot concatenate 'str' and 'numpy.float64' objects (plpython.c:5038)
CONTEXT:  Traceback (most recent call last):
  PL/Python function "generate_model_selection_configs", line 21, in <module>
    mst_loader = madlib_keras_model_selection.MstSearch(**globals())
  PL/Python function "generate_model_selection_configs", line 42, in wrapper
  PL/Python function "generate_model_selection_configs", line 287, in __init__
  PL/Python function "generate_model_selection_configs", line 426, in find_random_combinations
  PL/Python function "generate_model_selection_configs", line 490, in generate_row_string
PL/Python function "generate_model_selection_configs"
[SQL: SELECT madlib.generate_model_selection_configs( 'model_arch_library', 'mst_table', ARRAY[1,2], $$ { 'lr': [1.0, 2.0, 'log'], } $$, $$ { 'batch_size': [8], 'epochs': [1] } $$, 'random', 1, NULL, NULL );]
(Background on this error at: http://sqlalche.me/e/2j85)
```

(2) For `search_type` = 'grid' or 'random', the user should be able to enter just part of the string, e.g., 'rand' for random or 'g' for grid. There is a MADlib function that supports this.

(3) Change the name of the function from `generate_model_selection_configs` to `generate_model_configs`.

(4) Remove exclamation marks (!) and random capitalization from error messages. Suggested messages:
- "DL: 'num_configs' and 'random_state' must be NULL for grid search"
- "DL: Cannot search from a distribution with grid search"
- "DL: 'num_configs' cannot be NULL for random search"
- "DL: 'search_type' must be either 'grid' or 'random'"
- "DL: Please choose a valid distribution type ('linear' or 'log')"
- "DL: {0} should be of the format [lower_bound, upper_bound, distribution_type]"

(5) In addition to `linear` sampling and `log` sampling, we should add another type called `log_near_one`:
```
config_dict[cp] = 1.0 - np.power(10, np.random.uniform(np.log10(1.0 - param_values[1]),
                                                       np.log10(1.0 - param_values[0])))
```
This type of sampling is useful for exponentially-weighted-average-type params like momentum, which are very sensitive to changes near 1. It has the effect of producing more values near 1 than regular log sampling. For example, momentum values in the range [0.9000, 0.9005] average the previous ~10 values no matter where you are in the range (no difference), but momentum values in the range [0.9990, 0.9995] average the previous ~1000 values at the left end and ~2000 values at the right end (big difference), so you want to generate more samples nearer the right end to get better coverage.
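As a sanity check on the proposed distribution, here is a self-contained pure-Python sketch that mirrors the NumPy expression above (the function name `sample_log_near_one` and the stdlib `random`/`math` implementation are mine, not MADlib code):

```python
import math
import random

def sample_log_near_one(lower, upper, size, seed=None):
    """Sample from [lower, upper] with density concentrated near 1,
    by drawing the exponent of the distance (1 - x) uniformly."""
    rng = random.Random(seed)
    lo_exp = math.log10(1.0 - upper)  # closest to 1 -> most negative exponent
    hi_exp = math.log10(1.0 - lower)
    return [1.0 - 10.0 ** rng.uniform(lo_exp, hi_exp) for _ in range(size)]

samples = sample_log_near_one(0.9, 0.999, size=10000, seed=42)
```

With lower=0.9 and upper=0.999, about half of the draws land above 0.99 (the midpoint of the exponent range), whereas plain linear sampling would put only about 9% of draws there.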
(6)
```
DROP TABLE IF EXISTS mst_table, mst_table_summary;
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',   -- model architecture table
    'mst_table',            -- model selection table output
    ARRAY[1],               -- model ids from model architecture table
    $$ { 'loss': ['categorical_crossentropy'],
         'optimizer': ['Adam'],
         'lr': [0.9, 0.95, 'log'],
         'metrics': ['accuracy'] } $$,           -- compile_param_grid
    $$ { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
         'epochs': [1, 2, 3, 5, 10, 12] } $$,    -- fit_param_grid
    'random',               -- search_type
    5,                      -- num_configs
    NULL,                   -- random_state
    NULL                    -- object table (Default=None)
);
SELECT * FROM mst_table ORDER BY mst_key;
```
followed by
```
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',   -- model architecture table
    'mst_table',            -- model selection table output
    ARRAY[1],               -- model ids from model architecture table
    $$ { 'loss': ['categorical_crossentropy'],
         'optimizer': ['SGD'],
         'metrics': ['accuracy'] } $$,           -- compile_param_grid
    $$ { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
         'epochs': [1, 2, 3, 5, 10, 12] } $$,    -- fit_param_grid
    'random',               -- search_type
    5,                      -- num_configs
    NULL,                   -- random_state
    NULL                    -- object table (Default=None)
);
SELECT * FROM mst_table ORDER BY mst_key;
```
produces
```
IntegrityError: (psycopg2.errors.UniqueViolation) plpy.SPIError: duplicate key value violates unique constraint "mst_table_model_id_key"  (seg0 10.128.0.41:40000 pid=22297)
DETAIL:  Key (model_id, compile_params, fit_params)=(1, optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy', epochs=12,batch_size=32) already exists.
CONTEXT:  Traceback (most recent call last):
  PL/Python function "generate_model_selection_configs", line 22, in <module>
    mst_loader.load()
  PL/Python function "generate_model_selection_configs", line 313, in load
  PL/Python function "generate_model_selection_configs", line 566, in insert_into_mst_table
PL/Python function "generate_model_selection_configs"
[SQL: SELECT madlib.generate_model_selection_configs( 'model_arch_library', 'mst_table', ARRAY[1], $$ { 'loss': ['categorical_crossentropy'], 'optimizer': ['SGD'], 'metrics': ['accuracy'] } $$, $$ { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096], 'epochs': [1, 2, 3, 5, 10, 12] } $$, 'random', 5, NULL, NULL );]
(Background on this error at: http://sqlalche.me/e/gkpj)
```
But it only produced the error every second time I did this, i.e., the first pass would work and then the second pass would throw the error.
When it does pass, it produces
```
 mst_key | model_id |                                        compile_params                                        |        fit_params
---------+----------+----------------------------------------------------------------------------------------------+--------------------------
       1 |        1 | optimizer='Adam(lr=0.9063214445649174)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=10,batch_size=256
       2 |        1 | optimizer='Adam(lr=0.9367722192055232)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=5,batch_size=256
       3 |        1 | optimizer='Adam(lr=0.9212048311857509)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=2,batch_size=32
       4 |        1 | optimizer='Adam(lr=0.9193149125403647)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=3,batch_size=256
       5 |        1 | optimizer='Adam(lr=0.9326284661833211)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=2,batch_size=256
       6 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                        | epochs=10,batch_size=256
       7 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                        | epochs=5,batch_size=8
       8 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                        | epochs=2,batch_size=1024
       9 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                        | epochs=3,batch_size=32
      10 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                        | epochs=12,batch_size=8
(10 rows)
```
Is `optimizer='SGD()'...` correct, or should it be `optimizer='SGD'...`?

(7) Not all sub-params apply to all params.
For example, for optimizer, `lr` and `decay` might only apply to certain optimizer types and not others:
```
optimizer='SGD'
optimizer='rmsprop(lr=0.0001, decay=1e-6)'
optimizer='adam(lr=0.0001)'
```
In the previous method we accounted for that by doing:
```
SELECT madlib.load_model_selection_table(
    'model_arch_library',   -- model architecture table
    'mst_table',            -- model selection table output
    ARRAY[1,2],             -- model ids from model architecture table
    ARRAY[                  -- compile params
        $$loss='categorical_crossentropy',optimizer='rmsprop(lr=0.0001, decay=1e-6)',metrics=['accuracy']$$,
        $$loss='categorical_crossentropy',optimizer='rmsprop(lr=0.001, decay=1e-6)',metrics=['accuracy']$$,
        $$loss='categorical_crossentropy',optimizer='adam(lr=0.0001)',metrics=['accuracy']$$,
        $$loss='categorical_crossentropy',optimizer='adam(lr=0.001)',metrics=['accuracy']$$
    ],
    ARRAY[                  -- fit params
        $$batch_size=64,epochs=5$$,
        $$batch_size=128,epochs=5$$
    ]
);
```
but how do we do this in the new method `generate_model_configs`? You could call it multiple times and incrementally build up the `mst_table`, but when autoML methods call this function we need to support it in a one-shot manner.
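One way to support this in a single call would be to let `compile_param_grid` carry a nested list of optimizer-specific dictionaries. As a feasibility sketch only (the key name `my_list`, the function names, and the sampling helper are all hypothetical, not MADlib code), random search over such a structure could look like:

```python
import math
import random

def _sample_value(rng, values):
    # [lower, upper, 'log'] draws log-uniformly from the range;
    # any other list is treated as a set of discrete choices.
    # (A full version would also handle 'linear' and 'log_near_one'.)
    if len(values) == 3 and values[2] == 'log':
        lo, hi = math.log10(values[0]), math.log10(values[1])
        return 10.0 ** rng.uniform(lo, hi)
    return rng.choice(values)

def sample_compile_config(grid, seed=None):
    """Draw one compile-param config from a grid that may contain a
    nested list of optimizer-specific dicts under the 'my_list' key."""
    rng = random.Random(seed)
    config = {}
    for name, values in grid.items():
        if name == 'my_list':
            group = rng.choice(values)  # pick one optimizer group
            for sub_name, sub_values in group.items():
                config[sub_name] = _sample_value(rng, sub_values)
        else:
            # (A real implementation would keep list-valued params
            # such as metrics as lists; this sketch flattens them.)
            config[name] = _sample_value(rng, values)
    return config

grid = {
    'loss': ['categorical_crossentropy'],
    'my_list': [
        {'optimizer': ['SGD', 'Adagrad']},
        {'optimizer': ['rmsprop'], 'lr': [0.9, 0.95, 'log'], 'decay': [1e-6, 1e-4, 'log']},
        {'optimizer': ['Adam'], 'lr': [0.99, 0.995, 'log']},
    ],
    'metrics': ['accuracy'],
}
config = sample_compile_config(grid, seed=0)
```

Each call returns one flat config dict; sub-params such as `lr` appear only when the chosen optimizer group defines them, which is exactly the per-optimizer behavior described above.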
I would suggest nested dictionaries like:
```
SELECT madlib.generate_model_configs(
    'model_arch_library',   -- model architecture table
    'mst_table',            -- model selection table output
    ARRAY[1],               -- model ids from model architecture table
    $$
    {
        'loss': ['categorical_crossentropy'],
        'my_list': [
            {'optimizer': ['SGD', 'Adagrad']},
            {'optimizer': ['rmsprop'], 'lr': [0.9, 0.95, 'log'], 'decay': [1e-6, 1e-4, 'log']},
            {'optimizer': ['Adam'], 'lr': [0.99, 0.995, 'log']}
        ],
        'metrics': ['accuracy']
    }
    $$,                     -- compile_param_grid
    $$ { 'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
         'epochs': [1, 2, 3, 5, 10, 12] } $$,    -- fit_param_grid
    'random',               -- search_type
    5,                      -- num_configs
    NULL,                   -- random_state
    NULL                    -- object table (Default=None)
);
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org