[ 
https://issues.apache.org/jira/browse/MADLIB-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1322:
------------------------------------
    Priority: Minor  (was: Major)

> MLP with minibatch fails for integer dependent variable
> -------------------------------------------------------
>
>                 Key: MADLIB-1322
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1322
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Neural Networks
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> (1)
> If I have an integer dependent variable and I mini-batch:
> {code}
> select madlib.minibatch_preprocessor(
> 'classification_train', -- input table
> 'mini_batch_packed_train', -- output table
> 'response', -- response INTEGER
> 'feature_vector',  -- indep vars
> NULL, -- grouping
> NULL, -- buffer size (or size of the mini-batch)
> TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
> );
> {code}
> Then the table looks like:
> {code}
> madlib=# \d+ mini_batch_packed_train_summary
>              Table "public.mini_batch_packed_train_summary"
>           Column          |   Type    | Modifiers | Storage  | Stats target | Description 
> --------------------------+-----------+-----------+----------+--------------+-------------
>  source_table             | text      |           | extended |              | 
>  output_table             | text      |           | extended |              | 
>  dependent_varname        | text      |           | extended |              | 
>  independent_varname      | text      |           | extended |              | 
>  dependent_vartype        | text      |           | extended |              | 
>  buffer_size              | integer   |           | plain    |              | 
>  class_values             | integer[] |           | extended |              | 
>  num_rows_processed       | integer   |           | plain    |              | 
>  num_missing_rows_skipped | integer   |           | plain    |              | 
>  grouping_cols            | text      |           | extended |              | 
> {code}
> Then MLP classification fails with:
> {code}
> InternalError: (psycopg2.InternalError) TypeError: must be string, not int
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "mlp_classification", line 33, in <module>
>     grouping_col)
>   PL/Python function "mlp_classification", line 42, in wrapper
>   PL/Python function "mlp_classification", line 147, in mlp
>   PL/Python function "mlp_classification", line 74, in quote_literal
> {code}
> (2)
> If I cast to text explicitly:
> {code}
> select madlib.minibatch_preprocessor(
> 'classification_train', -- input table
> 'mini_batch_packed_train', -- output table
> 'response::TEXT', -- response
> 'feature_vector',  -- indep vars
> NULL, -- grouping
> NULL, -- buffer size (or size of the mini-batch)
> TRUE -- Encode scalar int dependent variable (if response is integer instead of boolean or char)
> );
> {code}
> The table looks like:
> {code}
> madlib=# \d+ mini_batch_packed_train_summary
>             Table "public.mini_batch_packed_train_summary"
>           Column          |  Type   | Modifiers | Storage  | Stats target | Description 
> --------------------------+---------+-----------+----------+--------------+-------------
>  source_table             | text    |           | extended |              | 
>  output_table             | text    |           | extended |              | 
>  dependent_varname        | text    |           | extended |              | 
>  independent_varname      | text    |           | extended |              | 
>  dependent_vartype        | text    |           | extended |              | 
>  buffer_size              | integer |           | plain    |              | 
>  class_values             | text[]  |           | extended |              | 
>  num_rows_processed       | integer |           | plain    |              | 
>  num_missing_rows_skipped | integer |           | plain    |              | 
>  grouping_cols            | text    |           | extended |              | 
> {code}
> And MLP training works OK.
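
The traceback in (1) suggests that an integer class value reaches a quoting helper that accepts only strings, while the text[] class values in (2) pass through without trouble. The following is a minimal Python sketch of that failure mode, not MADlib's actual code: `quote_literal_sketch` and `quote_class_values` are hypothetical names introduced here for illustration, and the assumption that casting each class value to text before quoting is a suitable fix has not been checked against the MADlib source.

```python
# Hypothetical stand-in for a string-only SQL quoting helper.
# This is an illustration, not the function used inside MADlib.
def quote_literal_sketch(value):
    """Quote a value as a SQL string literal; accepts strings only."""
    if not isinstance(value, str):
        raise TypeError("must be string, not %s" % type(value).__name__)
    # Double any embedded single quotes, then wrap in single quotes.
    return "'" + value.replace("'", "''") + "'"


# Hypothetical helper: cast every class value to text before quoting it.
def quote_class_values(class_values):
    return [quote_literal_sketch(str(v)) for v in class_values]


# Case (1): class_values is integer[], so quoting an element directly fails.
int_classes = [0, 1, 2]
try:
    quote_literal_sketch(int_classes[0])
    failed = False
    error_message = ""
except TypeError as exc:
    failed = True
    error_message = str(exc)

# Counterpart of case (2): casting to text first lets quoting succeed.
quoted = quote_class_values(int_classes)
```

Under these assumptions, the direct call raises a TypeError for the integer input, whereas the cast-then-quote path returns one quoted literal per class value, which mirrors the `response::TEXT` workaround in (2).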



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
