orhankislal commented on a change in pull request #506: URL: https://github.com/apache/madlib/pull/506#discussion_r457713770
########## File path: src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in ##########

@@ -203,3 +212,365 @@ class MstLoader():
                                    object_table_name=ModelSelectionSchema.OBJECT_TABLE,
                                    **locals())
         plpy.execute(insert_summary_query)
+
+@MinWarning("warning")
+class MstSearch():
+    """
+    The utility class for generating model selection configs and loading into a MST table with model parameters.
+
+    Currently takes string representations of python dictionaries for compile and fit params.
+    Generates configs with a chosen search algorithm
+
+    Attributes:
+        model_arch_table (str): The name of model architecture table.
+        model_selection_table (str): The name of the output mst table.
+        model_id_list (list): The input list of model id choices.
+        compile_params_grid (string repr of python dict): The input of compile params choices.
+        fit_params_grid (string repr of python dict): The input of fit params choices.
+        search_type (str, default 'grid'): Hyperparameter search strategy, 'grid' or 'random'.
+
+        Only for 'random' search type (defaults None):
+        num_configs (int): Number of configs to generate.
+        random_state (int): Seed for result reproducibility.
+
+        object_table (str, default None): The name of the object table, for custom (metric) functions.
+
+    """
+
+    def __init__(self,
+                 model_arch_table,
+                 model_selection_table,
+                 model_id_list,
+                 compile_params_grid,
+                 fit_params_grid,
+                 search_type='grid',
+                 num_configs=None,
+                 random_state=None,
+                 object_table=None,
+                 **kwargs):
+
+        self.model_arch_table = model_arch_table
+        self.model_selection_table = model_selection_table
+        self.model_selection_summary_table = add_postfix(
+            model_selection_table, "_summary")
+        self.model_id_list = sorted(list(set(model_id_list)))
+
+        MstLoaderInputValidator(
+            model_arch_table=self.model_arch_table,
+            model_selection_table=self.model_selection_table,
+            model_selection_summary_table=self.model_selection_summary_table,
+            model_id_list=self.model_id_list,
+            compile_params_list=compile_params_grid,
+            fit_params_list=fit_params_grid,
+            object_table=object_table,
+            module_name='generate_model_selection_configs'
+        )
+
+        self.search_type = search_type
+        self.num_configs = num_configs
+        self.random_state = random_state
+        self.object_table = object_table
+
+        compile_params_grid = compile_params_grid.replace('\n', '').replace(' ', '')
+        fit_params_grid = fit_params_grid.replace('\n', '').replace(' ', '')
+        self.validate_inputs(compile_params_grid, fit_params_grid)
+
+        # extracting python dict
+        self.compile_params_dict = literal_eval(compile_params_grid)
+        self.fit_params_dict = literal_eval(fit_params_grid)
+
+        self.msts = []
+
+        if self.search_type == 'grid':
+            self.find_grid_combinations()
+        elif self.search_type == 'random':  # else should also suffice as random search is established.
+            self.find_random_combinations()
+
+        compile_params_lst, fit_params_lst = [], []
+        for i in self.msts:
+            compile_params_lst.append(i[ModelSelectionSchema.COMPILE_PARAMS])
+            fit_params_lst.append(i[ModelSelectionSchema.FIT_PARAMS])
+        self._validate_params_and_object_table(compile_params_lst, fit_params_lst)
+
+    def load(self):
+        """The entry point for loading the model selection table.
+        """
+        # All of the side effects happen in this function.
+        if not table_exists(self.model_selection_table):
+            self.create_mst_table()
+            self.create_mst_summary_table()
+        self.insert_into_mst_table()
+
+    def validate_inputs(self, compile_params_grid, fit_params_grid):
+        """
+        Ensures validity of inputs related to grid and random search.
+
+        :param compile_params_grid: The input string repr of compile params choices.
+        :param fit_params_grid: The input string repr of fit params choices.
+        """
+
+        # TODO: add additional cases for validating params (and test it)
+
+        if self.search_type == 'grid':
+            _assert(self.num_configs is None and self.random_state is None,
+                    "'num_configs' and 'random_state' have to be NULL for Grid Search")
+            for distribution_type in ['linear', 'log']:
+                _assert(distribution_type not in compile_params_grid and distribution_type not in fit_params_grid,
+                        "Cannot search from a distribution with Grid Search!")
+        elif self.search_type == 'random':
+            _assert(self.num_configs is not None, "'num_configs' cannot be NULL for Random Search")
+        else:
+            plpy.error("'search_type' has to be either 'grid' or 'random' !")
+
+    def _validate_params_and_object_table(self, compile_params_lst, fit_params_lst):
+        if not fit_params_lst:
+            plpy.error("fit_params_list cannot be NULL")
+        for fit_params in fit_params_lst:
+            try:
+                res = parse_and_validate_fit_params(fit_params)
+            except Exception as e:
+                plpy.error(
+                    """Fit param check failed for: {0} \n
+                    {1}
+                    """.format(fit_params, str(e)))
+        if not compile_params_lst:
+            plpy.error( "compile_params_list cannot be NULL")
+        custom_fn_name = []
+        ## Initialize builtin loss/metrics functions
+        builtin_losses = dir(losses)
+        builtin_metrics = dir(metrics)
+        # Default metrics, since it is not part of the builtin metrics list
+        builtin_metrics.append('accuracy')
+        if self.object_table is not None:
+            res = plpy.execute("SELECT {0} from {1}".format(CustomFunctionSchema.FN_NAME,
+                                                            self.object_table))
+            for r in res:
+                custom_fn_name.append(r[CustomFunctionSchema.FN_NAME])
+        for compile_params in compile_params_lst:
+            try:
+                _, _, res = parse_and_validate_compile_params(compile_params)
+                # Validating if loss/metrics function called in compile_params
+                # is either defined in object table or is a built_in keras
+                # loss/metrics function
+                error_suffix = "but input object table missing!"
+                if self.object_table is not None:
+                    error_suffix = "is not defined in object table '{0}'!".format(self.object_table)
+
+                _assert(res['loss'] in custom_fn_name or res['loss'] in builtin_losses,
+                        "custom function '{0}' used in compile params " \
+                        "{1}".format(res['loss'], error_suffix))
+                if 'metrics' in res:
+                    _assert((len(set(res['metrics']).intersection(custom_fn_name)) > 0
+                             or len(set(res['metrics']).intersection(builtin_metrics)) > 0),
+                            "custom function '{0}' used in compile params " \
+                            "{1}".format(res['metrics'], error_suffix))
+
+            except Exception as e:
+                plpy.error(
+                    """Compile param check failed for: {0} \n
+                    {1}
+                    """.format(compile_params, str(e)))
+
+    def find_grid_combinations(self):
+        """
+        Finds combinations using grid search.
+        """
+        combined_dict = dict(self.compile_params_dict, **self.fit_params_dict)
+        combined_dict[ModelSelectionSchema.MODEL_ID] = self.model_id_list
+        keys, values = zip(*combined_dict.items())
+        all_configs_params = [dict(zip(keys, v)) for v in itertools_product(*values)]
+
+        # to separate the compile and fit configs
+        for config in all_configs_params:
+            combination = {}
+            compile_configs, fit_configs = {}, {}
+            for k in config:
+                if k == ModelSelectionSchema.MODEL_ID:
+                    combination[ModelSelectionSchema.MODEL_ID] = config[k]
+                elif k in self.compile_params_dict:
+                    compile_configs[k] = config[k]
+                elif k in self.fit_params_dict:
+                    fit_configs[k] = config[k]
+                else:
+                    plpy.error("{0} is an unidentified key".format(k))
+
+            combination[ModelSelectionSchema.COMPILE_PARAMS] = self.generate_row_string(compile_configs)
+            combination[ModelSelectionSchema.FIT_PARAMS] = self.generate_row_string(fit_configs)
+            self.msts.append(combination)
+
+    def find_random_combinations(self):
+        """
+        Finds combinations using random search.
+        """
+        if self.random_state:
+            seed_changes = 0
+        else:
+            seed_changes = None
+
+        for _ in range(self.num_configs):
+            combination = {}
+            if self.random_state:
+                np.random.seed(self.random_state+seed_changes)
+                seed_changes += 1
+            combination[ModelSelectionSchema.MODEL_ID] = np.random.choice(self.model_id_list)
+            compile_d = {}
+            compile_d, seed_changes = self.generate_param_config(self.compile_params_dict, compile_d, seed_changes)
+            combination[ModelSelectionSchema.COMPILE_PARAMS] = self.generate_row_string(compile_d)
+            fit_d = {}
+            fit_d, seed_changes = self.generate_param_config(self.fit_params_dict, fit_d, seed_changes)
+            combination[ModelSelectionSchema.FIT_PARAMS] = self.generate_row_string(fit_d)
+            self.msts.append(combination)
+
+    def generate_param_config(self, params_dict, config_dict, seed_changes):
+        """
+        Generating a parameter configuration for random search.
+        :param params_dict: Dictionary of params choices.
+        :param config_dict: Dictionary to store param config.
+        :param seed_changes: Changes in seed for random sampling
+        reproducibility.
+        :return: config_dict, seed_changes.
+        """
+        for cp in params_dict:
+            if self.random_state:
+                np.random.seed(self.random_state+seed_changes)
+                seed_changes += 1
+
+            param_values = params_dict[cp]
+
+            # sampling from a distribution
+            if param_values[-1] in ['linear', 'log']:
+                _assert(len(param_values) == 3,
+                        "{0} should have exactly 3 elements if picking from a distribution".format(cp))
+                _assert(param_values[1] > param_values[0],
+                        "{0} should be of the format [lower_bound, uppper_bound, distribution_type]".format(cp))
+                if param_values[-1] == 'linear':
+                    config_dict[cp] = np.random.uniform(param_values[0], param_values[1])
+                elif param_values[-1] == 'log':
+                    config_dict[cp] = np.power(10, np.random.uniform(np.log10(param_values[0]),
+                                                                     np.log10(param_values[1])))
+                else:
+                    plpy.error("Choose a valid distribution type! ('linear' or 'log')")
+            # random sampling
+            else:
+                config_dict[cp] = np.random.choice(params_dict[cp])
+
+        return config_dict, seed_changes
+
+    def generate_row_string(self, configs_dict):
+        """
+        Generate row strings for MST table.
+        :param configs_dict: Dictionary of params config.
+        :return: string to insert as a row in MST table.
+        """
+        result_row_string = ""
+
+        if 'optimizer' in configs_dict and 'lr' in configs_dict:
+            if configs_dict['optimizer'].lower() == 'sgd':
+                optimizer_value = "SGD"
+            elif configs_dict['optimizer'].lower() == 'rmsprop':
+                optimizer_value = "RMSprop"
+            else:
+                optimizer_value = configs_dict['optimizer'].capitalize()
+            result_row_string += "optimizer" + "=" + "'" + str(optimizer_value) \
+                                 + "(" + "lr=" + str(configs_dict['lr']) + ")" + "',"
+        elif 'optimizer' in configs_dict:
+            # lr will be set to its default value during mdoel training
+            result_row_string += "optimizer" + "=" + "'" + str(configs_dict['optimizer']) \
+                                 + "()" + "',"
+        elif 'lr' in configs_dict:
+            # default optimizer value in Keras is SGD (unless changed in a future release).
+            result_row_string += "optimizer" + "=" + "'" + "SGD" \
+                                 + "(" + "lr=" + configs_dict['lr'] + ")" + "',"
+
+        for c in configs_dict:
+            if c == 'optimizer' or c == 'lr':
+                continue
+            elif c == 'metrics':
+                if callable(configs_dict[c]):
+                    result_row_string += str(c) + "=" + "[" + str(configs_dict[c]) + "],"
+                else:
+                    result_row_string += str(c) + "=" + "['" + str(configs_dict[c]) + "'],"
+            else:
+                if type(configs_dict[c]) == str or type(configs_dict[c]) == np.string_:
+                    result_row_string += str(c) + "=" + "'" + str(configs_dict[c]) + "',"
+                else:
+                    # ints, floats, none type, booleans
+                    result_row_string += str(c) + "=" + str(configs_dict[c]) + ","
+
+        return result_row_string[:-1]  # to exclude the last comma
+
+    def create_mst_table(self):
+        """Initialize the output mst table, if it doesn't exist (for incremental loading).
+        """
+
+        create_query = """
+                        CREATE TABLE {self.model_selection_table} (
+                            {mst_key} SERIAL,
+                            {model_id} INTEGER,
+                            {compile_params} VARCHAR,
+                            {fit_params} VARCHAR,
+                            unique ({model_id}, {compile_params}, {fit_params})
+                        );
+                       """.format(self=self,
+                                  mst_key=ModelSelectionSchema.MST_KEY,
+                                  model_id=ModelSelectionSchema.MODEL_ID,
+                                  compile_params=ModelSelectionSchema.COMPILE_PARAMS,
+                                  fit_params=ModelSelectionSchema.FIT_PARAMS)
+        with MinWarning('warning'):
+            plpy.execute(create_query)
+
+    def create_mst_summary_table(self):
+        """Initialize the output mst table.
+        """
+        create_query = """
+                        CREATE TABLE {self.model_selection_summary_table} (
+                            {model_arch_table} VARCHAR,
+                            {object_table} VARCHAR
+                        );
+                       """.format(self=self,
+                                  model_arch_table=ModelSelectionSchema.MODEL_ARCH_TABLE,
+                                  object_table=ModelSelectionSchema.OBJECT_TABLE)
+        with MinWarning('warning'):
+            plpy.execute(create_query)
+
+    def insert_into_mst_table(self):
+        """Insert every thing in self.msts into the mst table.
+        """
+        for mst in self.msts:
+            model_id = mst[ModelSelectionSchema.MODEL_ID]
+            compile_params = mst[ModelSelectionSchema.COMPILE_PARAMS]
+            fit_params = mst[ModelSelectionSchema.FIT_PARAMS]
+            insert_query = """
+                            INSERT INTO
+                                {self.model_selection_table}(
+                                    {model_id_col},
+                                    {compile_params_col},
+                                    {fit_params_col}
+                                )
+                            VALUES (
+                                {model_id},
+                                $${compile_params}$$,
+                                $${fit_params}$$
+                            )
+                           """.format(model_id_col=ModelSelectionSchema.MODEL_ID,
+                                      compile_params_col=ModelSelectionSchema.COMPILE_PARAMS,
+                                      fit_params_col=ModelSelectionSchema.FIT_PARAMS,
+                                      **locals())
+            plpy.execute(insert_query)
+        if self.object_table is None:
+            object_table = 'NULL::VARCHAR'
+        else:
+            object_table = '$${0}$$'.format(self.object_table)
+        insert_summary_query = """
+                                INSERT INTO
+                                    {self.model_selection_summary_table}(
+                                        {model_arch_table_name},
+                                        {object_table_name}
+                                    )
+                                VALUES (
+                                    $${self.model_arch_table}$$,
+                                    {object_table}
+                                )
+                               """.format(model_arch_table_name=ModelSelectionSchema.MODEL_ARCH_TABLE,
+                                          object_table_name=ModelSelectionSchema.OBJECT_TABLE,
+                                          **locals())
+        plpy.execute(insert_summary_query)

Review comment: New line
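A minimal standalone sketch of the grid path in the hunk above: the example param grids and the hand-rolled row formatting here are illustrative assumptions, and only the `itertools.product` expansion mirrors what `find_grid_combinations()` does.

```python
# Minimal standalone sketch of what the grid path produces.
# The param grids below are made up; only the expansion logic mirrors
# find_grid_combinations() in the hunk above.
from itertools import product

compile_params_grid = {'optimizer': ['SGD', 'Adam'], 'lr': [0.01, 0.001], 'loss': ['mse']}
fit_params_grid = {'batch_size': [32, 64], 'epochs': [1]}
model_id_list = [1, 2]

combined = dict(compile_params_grid, **fit_params_grid)
combined['model_id'] = model_id_list

keys, values = zip(*combined.items())
configs = [dict(zip(keys, v)) for v in product(*values)]
print(len(configs))  # 2 * 2 * 1 * 2 * 1 * 2 = 16 configs

# Roughly how generate_row_string() renders one config into the MST row
# strings (optimizer and lr are folded together); the formatting here is
# a simplified stand-in, not the module's exact code.
cfg = configs[0]
compile_str = "optimizer='{0}(lr={1})',loss='{2}'".format(cfg['optimizer'], cfg['lr'], cfg['loss'])
fit_str = "batch_size={0},epochs={1}".format(cfg['batch_size'], cfg['epochs'])
print(compile_str)  # e.g. optimizer='SGD(lr=0.01)',loss='mse'
print(fit_str)      # e.g. batch_size=32,epochs=1
```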
########## File path: src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in ##########

@@ -18,11 +18,20 @@
 import plpy
 from collections import OrderedDict
+import numpy as np
+from itertools import product as itertools_product
+from ast import literal_eval
 from madlib_keras_validator import MstLoaderInputValidator
 from utilities.control import MinWarning
-from utilities.utilities import add_postfix
+from utilities.utilities import add_postfix, extract_keyvalue_params, _assert
+from utilities.validate_args import table_exists
 from madlib_keras_wrapper import convert_string_of_args_to_dict
 from keras_model_arch_table import ModelArchSchema
+from madlib_keras_wrapper import parse_and_validate_fit_params
+from madlib_keras_wrapper import parse_and_validate_compile_params
+import keras.losses as losses
+import keras.metrics as metrics
+from madlib_keras_custom_function import CustomFunctionSchema

Review comment: We should sort these imports in some way. Maybe the external imports (plpy, keras etc) first and then the madlib ones would work.
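One possible grouping along the lines suggested above: standard library first, then external packages, then MADlib modules. This is a sketch of the ordering only; the import list itself is unchanged from the hunk.

```python
# Possible grouping: standard library, then third-party/external, then MADlib modules.
# Same imports as in the hunk above, only reordered.
from ast import literal_eval
from collections import OrderedDict
from itertools import product as itertools_product

import numpy as np
import plpy
import keras.losses as losses
import keras.metrics as metrics

from keras_model_arch_table import ModelArchSchema
from madlib_keras_custom_function import CustomFunctionSchema
from madlib_keras_validator import MstLoaderInputValidator
from madlib_keras_wrapper import convert_string_of_args_to_dict
from madlib_keras_wrapper import parse_and_validate_compile_params
from madlib_keras_wrapper import parse_and_validate_fit_params
from utilities.control import MinWarning
from utilities.utilities import add_postfix, extract_keyvalue_params, _assert
from utilities.validate_args import table_exists
```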
########## File path: src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in ##########

@@ -203,3 +212,365 @@ class MstLoader():
+        if self.search_type == 'grid':
+            _assert(self.num_configs is None and self.random_state is None,
+                    "'num_configs' and 'random_state' have to be NULL for Grid Search")

Review comment: In general, MADlib error messages start with the module's name. Just putting `DL:` should be good enough
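A standalone sketch of the convention being asked for, with a module prefix folded into the error text. The `module_error` and `validate_search_type` helpers below are illustrative stand-ins rather than MADlib's `_assert`/`plpy.error`; the message wording and the `DL` prefix come from the hunk and the comment above.

```python
# Sketch of the suggested convention: user-facing errors carry a module prefix.
MODULE_PREFIX = "DL"

def module_error(msg):
    # Illustrative stand-in for plpy.error / _assert with a prefixed message.
    raise ValueError("{0}: {1}".format(MODULE_PREFIX, msg))

def validate_search_type(search_type, num_configs, random_state):
    if search_type == 'grid':
        if num_configs is not None or random_state is not None:
            module_error("'num_configs' and 'random_state' have to be NULL for Grid Search")
    elif search_type == 'random':
        if num_configs is None:
            module_error("'num_configs' cannot be NULL for Random Search")
    else:
        module_error("'search_type' has to be either 'grid' or 'random'")

validate_search_type('grid', None, None)   # passes silently
# validate_search_type('grid', 5, None)    # ValueError: DL: 'num_configs' and 'random_state' ...
```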
########## File path: src/ports/postgres/modules/deep_learning/madlib_keras_model_selection.py_in ##########

@@ -203,3 +212,365 @@ class MstLoader():
+    def find_random_combinations(self):
+        """
+        Finds combinations using random search.
+        """
+        if self.random_state:

Review comment: This if clause can be combined into a single line: `seed_changes = 0 if self.random_state else None`
I don't think you even need this, the random_state is checked again in any case. If you set it to 0, it will never change (if `self.random_state` is false)
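A runnable sketch of the simplification suggested above: start `seed_changes` at 0 unconditionally and only read or bump it when a `random_state` is supplied. The function and names below are illustrative stand-ins, not the module's code; only the seeding pattern mirrors `find_random_combinations()`.

```python
# Sketch of the suggestion: seed_changes starts at 0 and is only used when
# random_state is set, so the if/else initialization above is not needed.
import numpy as np

def pick_configs(choices, num_configs, random_state=None):
    seed_changes = 0  # replaces: seed_changes = 0 if random_state else None
    picked = []
    for _ in range(num_configs):
        if random_state is not None:
            np.random.seed(random_state + seed_changes)
            seed_changes += 1
        picked.append(np.random.choice(choices))
    return picked

print(pick_configs([1, 2, 3], 4, random_state=42))  # reproducible across runs
print(pick_configs([1, 2, 3], 4))                   # unseeded
```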
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org