[ https://issues.apache.org/jira/browse/MADLIB-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan resolved MADLIB-1300.
-------------------------------------
    Resolution: Fixed

> Clarify dep and indep var column names in output table for deep learning minibatch preprocessor
> -----------------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1300
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1300
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.16
>
>
> Follow on to this commit:
> Minibatch Preprocessor for Deep learning
> https://github.com/apache/madlib/commit/8de32ede33c48d2f4a440f0f639c94a277a359c1
>
> The output table produced by the deep mini-batch preprocessor contains the
> following columns:
> {code}
> ...
> dependent_varname     FLOAT8[]. Packed array of dependent variables. If the
>                       dependent variable in the source table is categorical,
>                       the preprocessor will one-hot encode it.
> independent_varname   FLOAT8[]. Packed array of independent variables.
> ...
> {code}
> This is misleading because these columns contain values, not names, so we
> should rename these columns to:
> {code}
> ...
> dependent_var
> independent_var
> ...
> {code}
> The output summary table contains the following columns:
> {code}
> dependent_varname     Dependent variable from the source table.
> independent_varname   Independent variable from the source table.
> {code}
> This is OK since these columns actually do contain names.
>
> There is a related 2.0 story for the regular mini-batch preprocessor
> http://madlib.apache.org/docs/latest/group__grp__minibatch__preprocessing.html
> in JIRA https://issues.apache.org/jira/browse/MADLIB-1294, which we don't want
> to do in 1.16 since it would break semantic versioning.
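To make the naming distinction concrete, here is a minimal Python sketch (not MADlib's implementation) of what one packed output-table row and the summary row might hold after the rename. The helpers `one_hot` and `pack_buffer`, the class list, the source column names `species` and `attributes`, and the toy rows are all hypothetical; nested Python lists stand in for the packed FLOAT8[] arrays, and the dependent variable is assumed to be categorical so that it gets one-hot encoded.

```python
from typing import List, Sequence, Tuple


def one_hot(label: str, classes: Sequence[str]) -> List[float]:
    """One-hot encode a categorical label against a fixed, ordered class list."""
    return [1.0 if c == label else 0.0 for c in classes]


def pack_buffer(rows: Sequence[Tuple[str, Sequence[float]]],
                classes: Sequence[str]) -> dict:
    """Pack several source rows into one output-table row (hypothetical sketch).

    The keys mirror the renamed output columns: they hold packed *values*
    (one inner list per source row), not column names.
    """
    return {
        "dependent_var": [one_hot(label, classes) for label, _ in rows],
        "independent_var": [[float(x) for x in feats] for _, feats in rows],
    }


# The summary table, by contrast, records the *names* of the source columns,
# so keeping the "_varname" suffix there is accurate.
summary = {
    "dependent_varname": "species",       # hypothetical source column name
    "independent_varname": "attributes",  # hypothetical source column name
}

classes = ["cat", "dog"]
source_rows = [("dog", [1, 2, 3]), ("cat", [4, 5, 6])]
buffer_row = pack_buffer(source_rows, classes)

print(buffer_row["dependent_var"])    # [[0.0, 1.0], [1.0, 0.0]]
print(buffer_row["independent_var"])  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(summary["dependent_varname"])   # species
```

Reserving the `_varname` suffix for the summary table, where the columns really store names, is exactly the distinction the ticket asks for.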