[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412215#comment-16412215
 ] 

Nandish Jayaram commented on MADLIB-1222:
-----------------------------------------

Example use case, and handling it:

1) The user encodes the dependent variable with whatever tool they want and 
puts the result in the column `color`. Maybe they do this to anonymize, or 
maybe the data is already in that format:
{code}
blue [1,0,0]
red [0,1,0]
green [0,0,1]
{code}
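
For illustration, step 1 could be done outside MADlib with something like the sketch below. The mapping is copied from the example above; the names `COLOR_TO_ONE_HOT` and `encode` are hypothetical and not part of any MADlib API.

```python
# Hypothetical user-side mapping, copied from the example above.
# MADlib never sees this dictionary; it only sees the encoded arrays.
COLOR_TO_ONE_HOT = {
    "blue": [1, 0, 0],
    "red": [0, 1, 0],
    "green": [0, 0, 1],
}


def encode(colors):
    """Replace each label with its one-hot array."""
    return [COLOR_TO_ONE_HOT[c] for c in colors]


# These arrays are what would be stored in the `color` column.
encoded = encode(["red", "green", "blue"])
```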

---------------

start MADlib

2) runs mini-batch preprocess (if planning to use mini-batch)

3) runs MLP classification train (IGD or mini-batch)

4a) runs MLP predict (response):
{code}
actual predicted
[0,1,0] [1,0,0]
[0,0,1] [0,0,1] 
[1,0,0] [1,0,0]
etc.
{code}

4b) runs MLP predict (prob):
{code}
actual estimated_prob
[0,1,0] [0.85, 0.10, 0.05]
[0,0,1] [0.0 , 0.1 , 0.9]
[1,0,0] [0.75, 0.20, 0.05]
etc.
{code}
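
The two outputs in 4a and 4b look consistent with taking the largest estimated probability as the predicted class. A minimal sketch of that relationship, assuming argmax is how the response would be derived (the helper `to_response` is hypothetical, not a MADlib function):

```python
def to_response(probs):
    """Return a one-hot array marking the most probable class.

    Assumption: ties are broken in favor of the first class.
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1 if i == best else 0 for i in range(len(probs))]


# Estimated probabilities from the 4b example.
estimated = [
    [0.85, 0.10, 0.05],
    [0.0, 0.1, 0.9],
    [0.75, 0.20, 0.05],
]
responses = [to_response(p) for p in estimated]
```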

end MADlib

--------------

5) User maps back to red, blue, green since they know the mapping but MADlib 
doesn't.
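
Step 5 stays entirely on the user's side. A sketch of the reverse lookup, using the same blue/red/green mapping as in step 1 (the names `ONE_HOT_TO_COLOR` and `decode` are hypothetical):

```python
# Hypothetical reverse mapping kept by the user; MADlib does not know it.
# Tuples are used as keys because lists are not hashable.
ONE_HOT_TO_COLOR = {
    (1, 0, 0): "blue",
    (0, 1, 0): "red",
    (0, 0, 1): "green",
}


def decode(rows):
    """Map predicted one-hot arrays back to their original labels."""
    return [ONE_HOT_TO_COLOR[tuple(row)] for row in rows]


# Predicted arrays from the 4a example.
predicted = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
labels = decode(predicted)
```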

> Support already encoded arrays for dependent var in MLP classification
> ----------------------------------------------------------------------
>
>                 Key: MADLIB-1222
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1222
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Neural Networks
>            Reporter: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.14
>
>
> MLP classification currently supports only scalar dependent variables. If a 
> user has already one-hot encoded the categorical variable, the dependent 
> variable will be an array and hence unusable with mlp_classification. This 
> feature request is to allow one-hot encoded arrays as the dependent variable 
> in MLP classification.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
