Frank McQuillan created MADLIB-1300:
---------------------------------------
Summary: Clarify dep and indep var column names in output table for deep learning minibatch preprocessor
Key: MADLIB-1300
URL: https://issues.apache.org/jira/browse/MADLIB-1300
Project: Apache MADlib
Issue Type: Improvement
Components: Module: Utilities
Reporter: Frank McQuillan
Fix For: v1.16
Follow-on to this commit:
Minibatch Preprocessor for Deep learning
https://github.com/apache/madlib/commit/8de32ede33c48d2f4a440f0f639c94a277a359c1
The output table produced by the deep learning mini-batch preprocessor contains the following columns:
{code}
...
dependent_varname FLOAT8[]. Packed array of dependent variables. If the
dependent variable in the source table is categorical, the preprocessor will
one-hot encode it.
independent_varname FLOAT8[]. Packed array of independent variables.
...
{code}
This is misleading because these columns contain values, not names, so we should rename them to:
{code}
...
dependent_var
independent_var
...
{code}
The output summary table contains the following columns:
{code}
dependent_varname Dependent variable from the source table.
independent_varname Independent variable from the source table.
{code}
This is OK since the columns actually do contain names.
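To make the values-versus-names distinction concrete, here is a minimal Python sketch of the proposed naming. It is not MADlib's implementation (the real preprocessor runs inside the database). The helpers `one_hot`, `pack_minibatch` and `summary_row`, the source columns `species` and `features`, and the class ordering used for one-hot encoding are all hypothetical, chosen only for illustration.

```python
# Hypothetical illustration of the proposed naming convention; this is NOT
# MADlib's implementation of the deep learning minibatch preprocessor.

def one_hot(value, classes):
    """Return a one-hot vector (FLOAT8-style floats) for a categorical value."""
    return [1.0 if value == c else 0.0 for c in classes]

def pack_minibatch(rows, dep_col, indep_col, classes):
    """Pack one mini-batch of source rows into a single output-table row.

    The keys use the proposed names, because they hold packed VALUES.
    """
    return {
        "dependent_var": [one_hot(r[dep_col], classes) for r in rows],
        "independent_var": [[float(x) for x in r[indep_col]] for r in rows],
    }

def summary_row(dep_col, indep_col):
    """Summary-table row: these columns hold source column NAMES."""
    return {
        "dependent_varname": dep_col,
        "independent_varname": indep_col,
    }

# Hypothetical source table with a categorical dependent variable.
source = [
    {"species": "cat", "features": [1, 2]},
    {"species": "dog", "features": [3, 4]},
]
classes = ["cat", "dog"]  # assumed ordering of the one-hot encoding

packed = pack_minibatch(source, "species", "features", classes)
summary = summary_row("species", "features")
print(packed)
print(summary)
```

With the proposed names, a reader can tell at a glance that `dependent_var` and `independent_var` hold data, while `dependent_varname` and `independent_varname` in the summary table hold the names of the source columns.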
There is a related 2.0 story for the regular mini-batch preprocessor
(http://madlib.apache.org/docs/latest/group__grp__minibatch__preprocessing.html)
in JIRA: https://issues.apache.org/jira/browse/MADLIB-1294
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)