[ 
https://issues.apache.org/jira/browse/MADLIB-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan closed MADLIB-1314.
-----------------------------------
    Resolution: Fixed

> Add optional num_classes param for minibatch preprocessor for DL
> ----------------------------------------------------------------
>
>                 Key: MADLIB-1314
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1314
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Deep Learning, Module: Utilities
>            Reporter: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.16
>
>
> The current `minibatch_preprocessor_dl` module scans the input table to 
> find the number of distinct categories (class values) of the dependent 
> variable, and uses that number as the size of the one-hot encoded array. This 
> can cause the madlib_keras fit function to fail if the `num_classes` 
> defined in the model architecture differs from (for example, is greater than) 
> the size of the one-hot encoded array.
> This could be a fairly common scenario. For example, 
> say the original dataset is places 350, but we decide to sample a subset. That 
> subset may not contain all 350 classes (assume it has only 10 classes), 
> but the model we have already defined is for places 350 (so `num_classes` there 
> is 350, and the final layer has that many units). 
> Without this feature, we would have to change the model architecture to work 
> with the sampled dataset. With it, we create a one-hot encoded vector of size 
> 350 even though only 10 class values are found in the input dataset.
> Acceptance:
> 1. Add an optional `num_classes` param of type integer.
> 2. The one-hot encoded array must be of size `num_classes` if specified; 
> otherwise use the number of distinct class values.
> 3. Fail if `num_classes` is less than the number of distinct class values 
> found in the dataset.
> 4. The `class_values` column in the summary table must have `NULL` as the 
> entry for class values that do not exist in the input table.
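
The acceptance criteria above can be illustrated with a minimal Python sketch. This is not the MADlib implementation: the function name `one_hot_encode` and its return shape are hypothetical, and placing the `None` (SQL `NULL`) entries after the observed class values is an assumption made only for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def one_hot_encode(
    labels: Sequence,
    num_classes: Optional[int] = None,
) -> Tuple[List[List[int]], List]:
    """Hypothetical helper: one-hot encode labels into arrays of a fixed width."""
    # Distinct class values actually present in the input, in sorted order.
    found = sorted(set(labels))

    # Use num_classes as the array size if given, else the distinct count.
    width = len(found) if num_classes is None else num_classes

    # Fail if num_classes is smaller than the number of distinct values found.
    if width < len(found):
        raise ValueError(
            "num_classes (%d) is less than the number of distinct class "
            "values found (%d)" % (width, len(found))
        )

    # Map each observed class value to its position in the encoded array.
    index = {value: i for i, value in enumerate(found)}
    encoded = []
    for label in labels:
        row = [0] * width
        row[index[label]] = 1
        encoded.append(row)

    # Pad with None for classes that do not appear in the input
    # (assumed here to come after the observed values).
    class_values = found + [None] * (width - len(found))
    return encoded, class_values
```

Called as `one_hot_encode(["cat", "dog", "cat"], num_classes=5)`, the sketch returns arrays of width 5 even though only two class values appear, so a model whose final layer has 5 units would not need to change.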



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
