reductionista commented on a change in pull request #526:
URL: https://github.com/apache/madlib/pull/526#discussion_r554205538
##########
File path: src/ports/postgres/modules/deep_learning/madlib_keras_automl.py_in
##########
@@ -172,45 +174,38 @@ class KerasAutoML(object):
random_state = 'NULL' if self.random_state is None else '$MAD${0}$MAD$'.format(self.random_state)
validation_table = 'NULL' if self.validation_table is None else '$MAD${0}$MAD$'.format(self.validation_table)
- create_query = plpy.prepare("""
+ create_query = """
CREATE TABLE {self.model_summary_table} AS
SELECT
$MAD${self.source_table}$MAD$::TEXT AS source_table,
{validation_table}::TEXT AS validation_table,
$MAD${self.model_output_table}$MAD$::TEXT AS model,
$MAD${self.model_info_table}$MAD$::TEXT AS model_info,
- (SELECT dependent_varname FROM {model_training.model_summary_table})
- AS dependent_varname,
- (SELECT independent_varname FROM {model_training.model_summary_table})
- AS independent_varname,
+ (SELECT dependent_varname FROM {a.MODEL_SUMMARY_TABLE})
Review comment:
My intention with this PR was to avoid, as much as possible, having AutoML
depend on the names and meanings of internal class variables in FitMultiple.
In the case of the model output table, there were several different table-name
variables inside this class, and they changed during the refactor. So the main
motivation was to avoid having to go through AutoML and update every reference
to those variables again if they change in the future.
The name of the model output table is chosen by AutoML rather than by
FitMultiple, so there was a strong case for having AutoML manage that one. For
the summary table, I agree the case is weaker, since its name is not entirely
chosen by AutoML: FitMultiple appends the _summary suffix to whatever name
AutoML chooses.
I considered leaving that one as-is, but decided it is better to have AutoML
add the _summary suffix itself, for a couple of reasons. First, if we change
anything in FitMultiple in the future, it is much more likely to be the class
variable name that changes than the convention of appending '_summary', since
that convention is part of the user-facing API: the user specifies the output
table name and, as described in the docs, should expect _summary to be
appended to it. AutoML uses FitMultiple much as a user would, so it will break
only if we break backward compatibility for users, not if we refactor the code
inside FitMultiple.
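A minimal Python sketch of the argument above. `FitMultipleStub`, `summary_table_for` and `build_summary_query` are hypothetical names used only for illustration, not MADlib code; the only thing carried over from the comment is the documented rule that the summary table is the output table name with `_summary` appended.

```python
class FitMultipleStub(object):
    """Hypothetical stand-in for FitMultiple (not MADlib code).

    Only its user-facing contract matters here: given an output table name,
    it reports the tables it would create, '<output>' and '<output>_summary'.
    """

    def __init__(self, model_output_table):
        # Internal attribute names are implementation details; a refactor
        # could rename these without changing the user-facing behaviour.
        self._output = model_output_table
        self._renamed_internal_summary = model_output_table + "_summary"

    def tables_created(self):
        return {self._output, self._renamed_internal_summary}


def summary_table_for(output_table):
    """Derive the summary table name from the documented '_summary' suffix."""
    return output_table + "_summary"


def build_summary_query(output_table):
    # AutoML chose output_table itself, so it can compute the summary table
    # name without reading any attribute of the FitMultiple object.
    return "SELECT dependent_varname FROM {0}".format(
        summary_table_for(output_table))


fit = FitMultipleStub("automl_output")
# Holds no matter what the stub's internal attribute happens to be called.
assert summary_table_for("automl_output") in fit.tables_created()
print(build_summary_query("automl_output"))
```

Because `summary_table_for` depends only on the documented suffix, renaming the stub's internal attribute leaves it untouched; it would only need to change if the user-facing naming convention changed.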