[ 
https://issues.apache.org/jira/browse/MADLIB-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834257#comment-16834257
 ] 

Himanshu Pandey commented on MADLIB-1322:
-----------------------------------------

[~fmcquillan], 

How can I reproduce this? I tried the following steps but have not been able 
to reproduce it yet: 

 

{code}
postgres=#  SELECT madlib.minibatch_preprocessor('iris_data',         -- Source table
postgres(#                                      'iris_data_packed',  -- Output table
postgres(#                                      'class',        -- Dependent variable
postgres(#                                      'attributes', NULL, NULL, TRUE);
 minibatch_preprocessor 
------------------------
 
(1 row)

postgres=# 
postgres=# 
postgres=# 
postgres=# 
postgres=# DROP TABLE IF EXISTS mlp_model, mlp_model_summary, mlp_model_standardization;
DROP TABLE
postgres=# -- Set seed so results are reproducible
postgres=# SELECT setseed(0);
 setseed 
---------
 
(1 row)

postgres=# SELECT madlib.mlp_classification(
postgres(#     'iris_data_packed',      -- Output table from mini-batch preprocessor
postgres(#     'mlp_model',             -- Destination table
postgres(#     'independent_varname',   -- Hardcode to this, from table iris_data_packed
postgres(#     'dependent_varname',     -- Hardcode to this, from table iris_data_packed
postgres(#     ARRAY[5],                -- Number of units per layer
postgres(#     'learning_rate_init=0.1,
postgres'#     n_iterations=500,
postgres'#     tolerance=0',            -- Optimizer params
postgres(#     'tanh',                  -- Activation function
postgres(#     NULL,                    -- Default weight (1)
postgres(#     FALSE,                   -- No warm start
postgres(#     FALSE                    -- Not verbose
postgres(# );
 mlp_classification 
--------------------
 
(1 row)
{code}


> MLP with minibatch fails for integer dependent variable
> -------------------------------------------------------
>
>                 Key: MADLIB-1322
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1322
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Neural Networks
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> (1)
> If I have an integer dependent variable and I mini-batch:
> {code}
> select madlib.minibatch_preprocessor(
> 'classification_train', -- input table
> 'mini_batch_packed_train', -- output table
> 'response', -- response INTEGER
> 'feature_vector',  -- indep vars
> NULL, -- grouping
> NULL, -- buffer size (or size of the mini-batch)
> TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
> );
> {code}
> Then the table looks like:
> {code}
> madlib=# \d+ mini_batch_packed_train_summary
>              Table "public.mini_batch_packed_train_summary"
>           Column          |   Type    | Modifiers | Storage  | Stats target | Description 
> --------------------------+-----------+-----------+----------+--------------+-------------
>  source_table             | text      |           | extended |              | 
>  output_table             | text      |           | extended |              | 
>  dependent_varname        | text      |           | extended |              | 
>  independent_varname      | text      |           | extended |              | 
>  dependent_vartype        | text      |           | extended |              | 
>  buffer_size              | integer   |           | plain    |              | 
>  class_values             | integer[] |           | extended |              | 
>  num_rows_processed       | integer   |           | plain    |              | 
>  num_missing_rows_skipped | integer   |           | plain    |              | 
>  grouping_cols            | text      |           | extended |              | 
> {code}
> Then MLP classification fails with:
> {code}
> InternalError: (psycopg2.InternalError) TypeError: must be string, not int
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "mlp_classification", line 33, in <module>
>     grouping_col)
>   PL/Python function "mlp_classification", line 42, in wrapper
>   PL/Python function "mlp_classification", line 147, in mlp
>   PL/Python function "mlp_classification", line 74, in quote_literal
> {code}
> (2)
> If I cast to text explicitly:
> {code}
> select madlib.minibatch_preprocessor(
> 'classification_train', -- input table
> 'mini_batch_packed_train', -- output table
> 'response::TEXT', -- response
> 'feature_vector',  -- indep vars
> NULL, -- grouping
> NULL, -- buffer size (or size of the mini-batch)
> TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
> );
> {code}
> The table looks like:
> {code}
> madlib=# \d+ mini_batch_packed_train_summary
>             Table "public.mini_batch_packed_train_summary"
>           Column          |  Type   | Modifiers | Storage  | Stats target | Description 
> --------------------------+---------+-----------+----------+--------------+-------------
>  source_table             | text    |           | extended |              | 
>  output_table             | text    |           | extended |              | 
>  dependent_varname        | text    |           | extended |              | 
>  independent_varname      | text    |           | extended |              | 
>  dependent_vartype        | text    |           | extended |              | 
>  buffer_size              | integer |           | plain    |              | 
>  class_values             | text[]  |           | extended |              | 
>  num_rows_processed       | integer |           | plain    |              | 
>  num_missing_rows_skipped | integer |           | plain    |              | 
>  grouping_cols            | text    |           | extended |              | 
> {code}
> And MLP training works OK.
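
One possible reading of the traceback in the description, given that it ends in quote_literal and that class_values is integer[] in the failing case but text[] in the working one, is that integer class labels reach a helper that only accepts strings. The Python sketch below only illustrates that failure mode and the effect of converting labels to text first, analogous to the response::TEXT cast. Both quote_literal and quote_class_values here are hypothetical stand-ins written for this illustration; they are not MADlib's implementation, and the actual root cause may differ.

```python
def quote_literal(value):
    # Hypothetical stand-in for a string-only quoting helper; NOT MADlib's code.
    # It rejects non-string input, which is the kind of failure the traceback reports.
    if not isinstance(value, str):
        raise TypeError("must be string, not %s" % type(value).__name__)
    # Double any embedded single quotes, then wrap the value in single quotes.
    return "'" + value.replace("'", "''") + "'"


def quote_class_values(class_values):
    # Assumed workaround: convert each class label to text before quoting,
    # mirroring what the explicit response::TEXT cast achieves upstream.
    return [quote_literal(str(v)) for v in class_values]


if __name__ == "__main__":
    # Integer labels are quoted successfully once converted to text.
    print(quote_class_values([0, 1, 2]))
    # Passing an integer directly triggers the TypeError.
    try:
        quote_literal(0)
    except TypeError as exc:
        print("TypeError:", exc)
```

If this reading is right, a fix could convert class_values to text before quoting, so that users would not need the explicit cast.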



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
