[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412215#comment-16412215 ]
Nandish Jayaram commented on MADLIB-1222:
-----------------------------------------
Example use case and how it would be handled:
1) The user encodes the dependent variable with whatever tool they want and
puts the result in the column `color`. Maybe they do this to anonymize the
data, or maybe the data is already in that format:
{code}
blue [1,0,0]
red [0,1,0]
green [0,0,1]
{code}
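Step 1 happens entirely outside MADlib. A minimal sketch of such an encoding, assuming the raw labels are Python strings and the user fixes the class order themselves (here blue, red, green, matching the table above). `one_hot_encode` is a hypothetical helper for illustration, not part of MADlib:

```python
def one_hot_encode(labels, classes):
    """Map each label to a one-hot list, using the user's own class order.

    Hypothetical helper: MADlib never sees `classes`, only the arrays.
    """
    index = {c: i for i, c in enumerate(classes)}
    encoded = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1  # raises KeyError for a label not in `classes`
        encoded.append(row)
    return encoded


classes = ["blue", "red", "green"]  # the mapping only the user knows
print(one_hot_encode(["red", "green", "blue"], classes))
# [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The resulting arrays are what would go into the `color` column; the `classes` list stays with the user.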
---------------
start MADlib
2) runs the mini-batch preprocessor (if planning to use mini-batch)
3) runs MLP classification training (IGD or mini-batch)
4a) runs MLP predict (response):
{code}
actual predicted
[0,1,0] [1,0,0]
[0,0,1] [0,0,1]
[1,0,0] [1,0,0]
etc.
{code}
4b) runs MLP predict (prob):
{code}
actual estimated_prob
[0,1,0] [0.85, 0.10, 0.05]
[0,0,1] [0.00, 0.10, 0.90]
[1,0,0] [0.75, 0.20, 0.05]
etc.
{code}
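In the sample rows, each predicted array in 4a has its 1 at the position of the largest estimated probability in 4b. A sketch of that relationship, assuming (not confirmed here) that the response output is simply the argmax of the probabilities re-emitted as a one-hot array; `probs_to_one_hot` is a hypothetical name:

```python
def probs_to_one_hot(probs):
    """Return a one-hot list with a 1 at the index of the largest probability.

    Assumption for illustration: ties go to the lowest index, because max()
    returns the first maximal element it encounters.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [1 if i == best else 0 for i in range(len(probs))]


# The estimated_prob column from the example above.
estimated_prob = [
    [0.85, 0.10, 0.05],
    [0.00, 0.10, 0.90],
    [0.75, 0.20, 0.05],
]
predicted = [probs_to_one_hot(p) for p in estimated_prob]
print(predicted)
# [[1, 0, 0], [0, 0, 1], [1, 0, 0]]
```

Under this assumption the two predict modes carry the same class information, and only the probability output keeps the model's confidence.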
end MADlib
--------------
5) The user maps the arrays back to blue, red and green, since they know the
mapping but MADlib doesn't.
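Step 5 is the inverse of step 1 and again happens outside MADlib. A minimal sketch, assuming the predicted arrays have been fetched into Python lists and the user still holds the same class order used for encoding; `one_hot_decode` is a hypothetical helper:

```python
def one_hot_decode(rows, classes):
    """Map one-hot lists back to class names using the user's class order.

    Hypothetical helper: rejects rows that are not exactly one 1 plus zeros
    of the expected length.
    """
    expected = [0] * (len(classes) - 1) + [1]
    decoded = []
    for row in rows:
        if sorted(row) != expected:
            raise ValueError(f"not a one-hot row: {row}")
        decoded.append(classes[row.index(1)])
    return decoded


classes = ["blue", "red", "green"]  # same order used when encoding
predicted = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]  # the 4a sample output
print(one_hot_decode(predicted, classes))
# ['blue', 'green', 'blue']
```

For the probability output, the same `classes` list gives the label for each position, e.g. `dict(zip(classes, [0.85, 0.10, 0.05]))`.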
> Support already encoded arrays for dependent var in MLP classification
> ----------------------------------------------------------------------
>
> Key: MADLIB-1222
> URL: https://issues.apache.org/jira/browse/MADLIB-1222
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Neural Networks
> Reporter: Nandish Jayaram
> Priority: Major
> Fix For: v1.14
>
>
> MLP currently supports only scalar dependent variables for
> classification. If a user has already one-hot encoded a categorical
> dependent variable, it will be an array and hence unusable with
> mlp_classification. This feature request is to allow the use of one-hot
> encoded arrays for dependent variables in MLP classification.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)