fmcquillan99 edited a comment on pull request #506:
URL: https://github.com/apache/madlib/pull/506#issuecomment-665993038
errors and issues
(1)
```
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1,2],            -- model ids from model architecture table
    $$
    {
        'lr': [1.0, 2.0, 'linear']
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8],
        'epochs': [1]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type ('grid' or 'random', default 'grid')
    5,                     -- num_configs (number of sampled parameters, default 10) [to limit testing]
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);
```
produces
```
InternalError: (psycopg2.errors.InternalError_) TypeError: cannot concatenate 'str' and 'float' objects (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "generate_model_selection_configs", line 21, in <module>
    mst_loader = madlib_keras_model_selection.MstSearch(**globals())
PL/Python function "generate_model_selection_configs", line 42, in wrapper
PL/Python function "generate_model_selection_configs", line 287, in __init__
PL/Python function "generate_model_selection_configs", line 426, in find_random_combinations
PL/Python function "generate_model_selection_configs", line 490, in generate_row_string
PL/Python function "generate_model_selection_configs"
[SQL: SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1,2],            -- model ids from model architecture table
    $$
    {
        'loss': ['categorical_crossentropy'],
        'lr': [0.0001, 0.1, 'linear']
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8],
        'epochs': [1]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type ('grid' or 'random', default 'grid')
    5,                     -- num_configs (number of sampled parameters, default 10) [to limit testing]
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);]
(Background on this error at: http://sqlalche.me/e/2j85)
```
Likewise
```
DROP TABLE IF EXISTS mst_table, mst_table_summary;
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1,2],            -- model ids from model architecture table
    $$
    {
        'lr': [1.0, 2.0, 'log'],
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8],
        'epochs': [1]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type ('grid' or 'random', default 'grid')
    1,                     -- num_configs (number of sampled parameters, default 10) [to limit testing]
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);
SELECT * FROM mst_table ORDER BY mst_key;
```
produces
```
InternalError: (psycopg2.errors.InternalError_) TypeError: cannot concatenate 'str' and 'numpy.float64' objects (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "generate_model_selection_configs", line 21, in <module>
    mst_loader = madlib_keras_model_selection.MstSearch(**globals())
PL/Python function "generate_model_selection_configs", line 42, in wrapper
PL/Python function "generate_model_selection_configs", line 287, in __init__
PL/Python function "generate_model_selection_configs", line 426, in find_random_combinations
PL/Python function "generate_model_selection_configs", line 490, in generate_row_string
PL/Python function "generate_model_selection_configs"
[SQL: SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1,2],            -- model ids from model architecture table
    $$
    {
        'lr': [1.0, 2.0, 'log'],
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8],
        'epochs': [1]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type ('grid' or 'random', default 'grid')
    1,                     -- num_configs (number of sampled parameters, default 10) [to limit testing]
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);]
(Background on this error at: http://sqlalche.me/e/2j85)
```
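Based on the traceback message ("cannot concatenate 'str' and 'float' objects"), my guess is that `generate_row_string` joins a sampled numeric value directly into a Python 2 string; casting with `str()` would fix it. The function body below is only an illustrative sketch, not MADlib's actual code:

```python
# Hypothetical sketch of the suspected bug site. The name generate_row_string
# comes from the traceback; the body is a guess. Without the str(...) cast,
# 'lr' + '=' + 1.3742 raises exactly this TypeError under Python 2.
def generate_row_string(config):
    parts = []
    for key in sorted(config):
        parts.append(key + '=' + str(config[key]))  # str(...) is the fix
    return ','.join(parts)

print(generate_row_string({'lr': 1.3742}))  # -> lr=1.3742
```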
(2)
For search_type = 'grid' or 'random', the user should be able to enter a prefix of
the string, e.g., 'rand' for 'random' or 'g' for 'grid'. There is a MADlib
function that supports this.
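The intended behavior can be sketched as follows (illustrative only; the existing MADlib helper should be used rather than this function):

```python
# Sketch of prefix matching for search_type. MADlib already has a helper
# for this, so this standalone function is only illustrative.
def expand_search_type(user_input, choices=('grid', 'random')):
    matches = [c for c in choices if c.startswith(user_input.strip().lower())]
    if len(matches) != 1:  # ambiguous or no match
        raise ValueError("DL: 'search_type' must be either 'grid' or 'random'")
    return matches[0]

print(expand_search_type('rand'))  # -> random
print(expand_search_type('g'))     # -> grid
```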
(3)
Change the name of the function from `generate_model_selection_configs`
to `generate_model_configs`.
(4)
Remove exclamation marks (!) and inconsistent capitalization from error
messages. Suggested messages:
"DL: 'num_configs' and 'random_state' must be NULL for grid search"
"DL: Cannot search from a distribution with grid search"
"DL: 'num_configs' cannot be NULL for random search"
"DL: 'search_type' must be either 'grid' or 'random'"
"DL: Please choose a valid distribution type ('linear' or 'log')"
"DL: {0} should be of the format [lower_bound, upper_bound, distribution_type]"
(5)
In addition to `linear` sampling and `log` sampling, we should add another
type called `log_near_one`:
```
config_dict[cp] = 1.0 - np.power(10, np.random.uniform(
    np.log10(1.0 - param_values[1]),
    np.log10(1.0 - param_values[0])))
```
This type of sampling is useful for exponentially weighted average type
params like momentum, which are very sensitive to changes near 1. It has the
effect of producing more values near 1 than regular log sampling.
For example, momentum values in the range [0.9000, 0.9005] average roughly the
previous 10 values no matter where you are in the range (no difference),
but momentum values in the range [0.9990, 0.9995] average the previous 1000
values at the left end and the previous 2000 values at the right end (a big
difference), so you want to generate more samples nearer the right end to get
better coverage.
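Wrapping the snippet above into a standalone function (the function name and signature here are my own, not MADlib's), the idea is to sample uniformly in log10(1 - x) space so draws cluster near the upper end of the range:

```python
import numpy as np

# Sketch of the proposed 'log_near_one' sampling: uniform in log10(1 - x)
# space, which concentrates draws near the upper bound.
def sample_log_near_one(lower, upper, size=1, rng=None):
    rng = rng if rng is not None else np.random.RandomState(0)
    return 1.0 - np.power(10, rng.uniform(np.log10(1.0 - upper),
                                          np.log10(1.0 - lower),
                                          size=size))

samples = sample_log_near_one(0.9990, 0.9995, size=10000)
# All draws stay in [0.9990, 0.9995], with more than half of them in the
# upper half of the range.
```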
(6)
```
DROP TABLE IF EXISTS mst_table, mst_table_summary;
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1],              -- model ids from model architecture table
    $$
    {
        'loss': ['categorical_crossentropy'],
        'optimizer': ['Adam'],
        'lr': [0.9, 0.95, 'log'],
        'metrics': ['accuracy']
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
        'epochs': [1, 2, 3, 5, 10, 12]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type
    5,                     -- num_configs
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);
SELECT * FROM mst_table ORDER BY mst_key;
```
followed by
```
SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1],              -- model ids from model architecture table
    $$
    {
        'loss': ['categorical_crossentropy'],
        'optimizer': ['SGD'],
        'metrics': ['accuracy']
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
        'epochs': [1, 2, 3, 5, 10, 12]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type
    5,                     -- num_configs
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);
SELECT * FROM mst_table ORDER BY mst_key;
```
produces
```
IntegrityError: (psycopg2.errors.UniqueViolation) plpy.SPIError: duplicate key value violates unique constraint "mst_table_model_id_key" (seg0 10.128.0.41:40000 pid=22297)
DETAIL: Key (model_id, compile_params, fit_params)=(1, optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy', epochs=12,batch_size=32) already exists.
CONTEXT: Traceback (most recent call last):
PL/Python function "generate_model_selection_configs", line 22, in <module>
    mst_loader.load()
PL/Python function "generate_model_selection_configs", line 313, in load
PL/Python function "generate_model_selection_configs", line 566, in insert_into_mst_table
PL/Python function "generate_model_selection_configs"
[SQL: SELECT madlib.generate_model_selection_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1],              -- model ids from model architecture table
    $$
    {
        'loss': ['categorical_crossentropy'],
        'optimizer': ['SGD'],
        'metrics': ['accuracy']
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
        'epochs': [1, 2, 3, 5, 10, 12]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type
    5,                     -- num_configs
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);]
(Background on this error at: http://sqlalche.me/e/gkpj)
```
But it only produced the error every second time I did this, i.e., the first
pass would work and the second pass would throw the error.
When it does pass, it produces
```
 mst_key | model_id | compile_params                                                                               | fit_params
---------+----------+----------------------------------------------------------------------------------------------+--------------------------
       1 |        1 | optimizer='Adam(lr=0.9063214445649174)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=10,batch_size=256
       2 |        1 | optimizer='Adam(lr=0.9367722192055232)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=5,batch_size=256
       3 |        1 | optimizer='Adam(lr=0.9212048311857509)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=2,batch_size=32
       4 |        1 | optimizer='Adam(lr=0.9193149125403647)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=3,batch_size=256
       5 |        1 | optimizer='Adam(lr=0.9326284661833211)',metrics=['accuracy'],loss='categorical_crossentropy' | epochs=2,batch_size=256
       6 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=10,batch_size=256
       7 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=5,batch_size=8
       8 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=2,batch_size=1024
       9 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=3,batch_size=32
      10 |        1 | optimizer='SGD()',metrics=['accuracy'],loss='categorical_crossentropy'                       | epochs=12,batch_size=8
(10 rows)
```
is `optimizer='SGD()'...` correct or should it be `optimizer='SGD'...` ?
(7)
Not all sub-params apply to all params. For example, for optimizer, `lr`
and `decay` might only apply to certain optimizer types and not others:
```
optimizer='SGD'
optimizer='rmsprop(lr=0.0001, decay=1e-6)'
optimizer='adam(lr=0.0001)'
```
In the previous method we accounted for that by doing:
```
SELECT madlib.load_model_selection_table(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1,2],            -- model ids from model architecture table
    ARRAY[                 -- compile params
        $$loss='categorical_crossentropy',optimizer='rmsprop(lr=0.0001, decay=1e-6)',metrics=['accuracy']$$,
        $$loss='categorical_crossentropy',optimizer='rmsprop(lr=0.001, decay=1e-6)',metrics=['accuracy']$$,
        $$loss='categorical_crossentropy',optimizer='adam(lr=0.0001)',metrics=['accuracy']$$,
        $$loss='categorical_crossentropy',optimizer='adam(lr=0.001)',metrics=['accuracy']$$
    ],
    ARRAY[                 -- fit params
        $$batch_size=64,epochs=5$$,
        $$batch_size=128,epochs=5$$
    ]
);
```
but how do we do this in the new method `generate_model_configs`? You could
call it multiple times and incrementally build up the `mst_table`, but autoML
methods that call this function need to do it in a one-shot manner. I would
suggest nested dictionaries like:
```
SELECT madlib.generate_model_configs(
    'model_arch_library',  -- model architecture table
    'mst_table',           -- model selection table output
    ARRAY[1],              -- model ids from model architecture table
    $$
    {
        'loss': ['categorical_crossentropy'],
        'my_list': [
            {'optimizer': ['SGD', 'Adagrad']},
            {'optimizer': ['rmsprop'], 'lr': [0.9, 0.95, 'log'], 'decay': [1e-6, 1e-4, 'log']},
            {'optimizer': ['Adam'], 'lr': [0.99, 0.995, 'log']}
        ],
        'metrics': ['accuracy']
    }
    $$,                    -- compile_param_grid
    $$
    {
        'batch_size': [8, 32, 64, 128, 256, 1024, 4096],
        'epochs': [1, 2, 3, 5, 10, 12]
    }
    $$,                    -- fit_param_grid
    'random',              -- search_type
    5,                     -- num_configs
    NULL,                  -- random_state
    NULL                   -- object table (Default=None)
);
```
So I think we should support both single dictionary and nested dictionary
syntax.
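To make the proposal concrete, here is a sketch (my own, not MADlib code) of how the nested-dictionary syntax could be expanded for the grid case; distribution specs such as `[0.9, 0.95, 'log']` would additionally need a sampling step on top of this:

```python
from itertools import product

# Sketch of expanding the proposed nested-dictionary compile_param_grid for
# grid search ('my_list' is the placeholder key from the example above).
# Each sub-dict is merged with the common params and contributes its own
# Cartesian sub-grid; the sub-grids are then unioned.
def expand_nested_grid(grid, list_key='my_list'):
    common = {k: v for k, v in grid.items() if k != list_key}
    configs = []
    for sub in grid.get(list_key, [{}]):
        merged = dict(common, **sub)
        keys = sorted(merged)
        for values in product(*(merged[k] for k in keys)):
            configs.append(dict(zip(keys, values)))
    return configs

grid = {'metrics': [['accuracy']],
        'my_list': [{'optimizer': ['SGD', 'Adagrad']},
                    {'optimizer': ['Adam'], 'lr': [0.99, 0.995]}]}
print(len(expand_nested_grid(grid)))  # -> 4 configs
```

Note that `lr` only appears in the configs generated from the sub-dict that declares it, which is exactly the "not all sub-params apply to all params" behavior wanted here.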
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]