GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/263
Bugfix/mlp minibatch grouping
If the mini-batch preprocessor is run with grouping, the standardization in
the output table is computed per group. MLP must therefore be run with the
same grouping; otherwise the data used for training would differ from the
original data, making the training invalid.
This commit ensures that MLP training proceeds only if the grouping column
input is the same as the one used during preprocessing.
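The check described above can be sketched roughly as follows. This is an
illustrative sketch only, not the code in this pull request: the function
names normalize_grouping and validate_grouping are hypothetical, and it
assumes both grouping inputs are available as comma-separated strings (or
None when no grouping was used). The comparison here is order-sensitive
and case-sensitive, which is also an assumption.

```python
def normalize_grouping(grouping_col):
    """Turn a comma-separated grouping string into a tuple of column names.

    None or an empty string means "no grouping" and yields an empty tuple.
    (Hypothetical helper; not part of the MADlib API.)
    """
    if grouping_col is None:
        return ()
    return tuple(c.strip() for c in grouping_col.split(",") if c.strip())


def validate_grouping(mlp_grouping_col, preprocessor_grouping_col):
    """Raise ValueError unless MLP uses the same grouping as the preprocessor.

    (Hypothetical helper; the real validation may be implemented differently.)
    """
    requested = normalize_grouping(mlp_grouping_col)
    preprocessed = normalize_grouping(preprocessor_grouping_col)
    if requested != preprocessed:
        raise ValueError(
            "MLP grouping columns {0} do not match the grouping columns {1} "
            "used during mini-batch preprocessing".format(
                list(requested), list(preprocessed)))


if __name__ == "__main__":
    # Same grouping on both sides: training may proceed.
    validate_grouping("grp", "grp")
    # Grouping used in preprocessing but omitted for MLP: rejected.
    try:
        validate_grouping(None, "grp")
    except ValueError as err:
        print("rejected:", err)
```

Failing early like this avoids training on data whose per-group
standardization does not correspond to the grouping MLP was asked to use.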
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib bugfix/mlp_minibatch_grouping
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/263.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #263
----
commit d62e21eb5229f466c8674d6131490aa80a583523
Author: Rahul Iyer <riyer@...>
Date: 2018-04-17T13:30:32Z
MLP: Ensure grouping_col is same as preprocessed
commit 3d179accd212dd8693db0b874700edc777f1e117
Author: Rahul Iyer <riyer@...>
Date: 2018-04-17T16:48:39Z
MLP: Grouping input should always be same as preprocessor
----