[ https://issues.apache.org/jira/browse/MADLIB-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834257#comment-16834257 ]
Himanshu Pandey commented on MADLIB-1322:
-----------------------------------------
[~fmcquillan],
How can I reproduce this? I tried the following steps but have not been able to
reproduce it yet:
{code}
postgres=# SELECT madlib.minibatch_preprocessor('iris_data', -- Source table
postgres(# 'iris_data_packed', -- Output table
postgres(# 'class', -- Dependent variable
postgres(# 'attributes', NULL, NULL, TRUE);
minibatch_preprocessor
------------------------
(1 row)
postgres=#
postgres=#
postgres=#
postgres=#
postgres=# DROP TABLE IF EXISTS mlp_model, mlp_model_summary, mlp_model_standardization;
DROP TABLE
postgres=# -- Set seed so results are reproducible
postgres=# SELECT setseed(0);
setseed
---------
(1 row)
postgres=# SELECT madlib.mlp_classification(
postgres(# 'iris_data_packed', -- Output table from mini-batch preprocessor
postgres(# 'mlp_model', -- Destination table
postgres(# 'independent_varname', -- Hardcode to this, from table iris_data_packed
postgres(# 'dependent_varname', -- Hardcode to this, from table iris_data_packed
postgres(# ARRAY[5], -- Number of units per layer
postgres(# 'learning_rate_init=0.1,
postgres'# n_iterations=500,
postgres'# tolerance=0', -- Optimizer params
postgres(# 'tanh', -- Activation function
postgres(# NULL, -- Default weight (1)
postgres(# FALSE, -- No warm start
postgres(# FALSE -- Not verbose
postgres(# );
mlp_classification
--------------------
(1 row)
{code}
> MLP with minibatch fails for integer dependent variable
> -------------------------------------------------------
>
> Key: MADLIB-1322
> URL: https://issues.apache.org/jira/browse/MADLIB-1322
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Neural Networks
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.16
>
>
> (1)
> If I have an integer dependent variable and I mini-batch:
> {code}
> select madlib.minibatch_preprocessor(
> 'classification_train', -- input table
> 'mini_batch_packed_train', -- output table
> 'response', -- response INTEGER
> 'feature_vector', -- indep vars
> NULL, -- grouping
> NULL, -- buffer size (or size of the mini-batch)
> TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
> );
> {code}
> Then the table looks like:
> {code}
> madlib=# \d+ mini_batch_packed_train_summary
> Table "public.mini_batch_packed_train_summary"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------------------------+-----------+-----------+----------+--------------+-------------
> source_table | text | | extended | |
> output_table | text | | extended | |
> dependent_varname | text | | extended | |
> independent_varname | text | | extended | |
> dependent_vartype | text | | extended | |
> buffer_size | integer | | plain | |
> class_values | integer[] | | extended | |
> num_rows_processed | integer | | plain | |
> num_missing_rows_skipped | integer | | plain | |
> grouping_cols | text | | extended | |
> {code}
> Then MLP classification fails with:
> {code}
> InternalError: (psycopg2.InternalError) TypeError: must be string, not int
> CONTEXT: Traceback (most recent call last):
> PL/Python function "mlp_classification", line 33, in <module>
> grouping_col)
> PL/Python function "mlp_classification", line 42, in wrapper
> PL/Python function "mlp_classification", line 147, in mlp
> PL/Python function "mlp_classification", line 74, in quote_literal
> {code}
> (2)
> If I cast to text explicitly:
> {code}
> select madlib.minibatch_preprocessor(
> 'classification_train', -- input table
> 'mini_batch_packed_train', -- output table
> 'response::TEXT', -- response
> 'feature_vector', -- indep vars
> NULL, -- grouping
> NULL, -- buffer size (or size of the mini-batch)
> TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
> );
> {code}
> The table looks like:
> {code}
> madlib=# \d+ mini_batch_packed_train_summary
> Table "public.mini_batch_packed_train_summary"
> Column | Type | Modifiers | Storage | Stats target | Description
> --------------------------+---------+-----------+----------+--------------+-------------
> source_table | text | | extended | |
> output_table | text | | extended | |
> dependent_varname | text | | extended | |
> independent_varname | text | | extended | |
> dependent_vartype | text | | extended | |
> buffer_size | integer | | plain | |
> class_values | text[] | | extended | |
> num_rows_processed | integer | | plain | |
> num_missing_rows_skipped | integer | | plain | |
> grouping_cols | text | | extended | |
> {code}
> And MLP training works OK.
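The difference between cases (1) and (2) above appears to come down to the element type of class_values (integer[] versus text[]) reaching quote_literal. The following is only a minimal Python sketch of that behavior, not the MADlib code: the quote_literal shown here is a hypothetical stand-in that, like the one in the traceback, accepts strings only, and quote_class_values is an illustrative helper that does not exist in MADlib.

```python
def quote_literal(value):
    """Hypothetical stand-in for a strict SQL literal quoter (accepts str only)."""
    if not isinstance(value, str):
        raise TypeError("must be string, not %s" % type(value).__name__)
    # Double any embedded single quotes, then wrap the value in single quotes.
    return "'" + value.replace("'", "''") + "'"


def quote_class_values(class_values):
    """Illustrative helper: convert each class label to text before quoting."""
    return [quote_literal(str(v)) for v in class_values]


int_classes = [0, 1, 2]          # like class_values integer[] in case (1)
text_classes = ["0", "1", "2"]   # like class_values text[] in case (2)

# Case (1): integer labels passed straight to the quoter raise TypeError.
try:
    [quote_literal(v) for v in int_classes]
    failed = False
except TypeError as exc:
    failed = True
    error_message = str(exc)

# Case (2): text labels are quoted without error.
quoted_text = [quote_literal(v) for v in text_classes]

# Converting integer labels to text first gives the same result as case (2).
quoted_ints = quote_class_values(int_classes)
```

If the actual failure follows this pattern, converting the class values to text before quoting would be one possible direction for a fix; that is an assumption drawn from the traceback, not something confirmed in this thread.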