njayaram2 opened a new pull request #361: Minibatch Preprocessor DL: Add 
optional num_classes param.
URL: https://github.com/apache/madlib/pull/361
 
 
   The current `minibatch_preprocessor_dl()` module looks at the input
   table to find the number of distinct categories (class values) for the
   dependent variable, and uses that number as the size of the
   one-hot-encoded array. This can cause the madlib_keras fit function to
   fail if the `num_classes` defined in the model architecture differs
   from (for example, is greater than) the size of the one-hot-encoded
   array.
   
   This commit adds two features:
   1) A new optional parameter, `num_classes`, to
   `minibatch_preprocessor_dl()` that determines the length of the
   one-hot-encoded vector for the dependent variable. If the parameter is
   NULL, the length equals the number of distinct class values found in
   the dataset; otherwise `num_classes` must be greater than or equal to
   the number of distinct class values.
   The `class_values` column in the summary table contains the array of
   class values associated with the one-hot-encoded vector. It holds NULL
   in the positions for which no class value was found in the dataset.
   2) NULL is now supported as a valid class value for the dependent
   variable.
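
The behaviour described above can be illustrated with a small Python sketch. This is not the MADlib implementation: the helper names `build_class_values` and `one_hot` are hypothetical, and the ordering (a real NULL class first, padding NULLs last) is an assumption made only for this illustration.

```python
from typing import Hashable, List, Optional, Sequence


def build_class_values(
    dep_values: Sequence[Optional[Hashable]],
    num_classes: Optional[int] = None,
) -> List[Optional[Hashable]]:
    """Return the class_values array that labels each one-hot position.

    Hypothetical helper. None stands in for SQL NULL.
    """
    distinct = set(dep_values)
    non_null = sorted(v for v in distinct if v is not None)
    # Assumption: a NULL that occurs in the data is a real class, placed first.
    class_values: List[Optional[Hashable]] = (
        [None] if None in distinct else []
    ) + non_null

    if num_classes is None:
        # Default: vector length equals the number of distinct class values.
        return class_values
    if num_classes < len(class_values):
        raise ValueError(
            "num_classes must be >= the number of distinct class values"
        )
    # Pad with None for classes that have no representation in the data.
    return class_values + [None] * (num_classes - len(class_values))


def one_hot(value: Optional[Hashable], class_values: Sequence) -> List[int]:
    """One-hot encode `value`; list.index returns the first matching slot."""
    vec = [0] * len(class_values)
    vec[class_values.index(value)] = 1
    return vec


padded = build_class_values(["cat", "dog", "cat"], num_classes=4)
with_null = build_class_values(["a", None, "b"])
```

In this sketch, `padded` has length 4 even though only two classes occur in the data, so the encoded vectors would match an architecture declared with `num_classes` of 4. In `with_null`, NULL occupies its own position in the vector.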
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
