GitHub user orhankislal commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/142#discussion_r123856874
--- Diff: src/modules/recursive_partitioning/DT_impl.hpp ---
@@ -486,8 +485,18 @@ DecisionTree<Container>::expand(const Accumulator &state,
Index stats_i = static_cast<Index>(state.stats_lookup(i));
assert(stats_i >= 0);
- // 1. Set the prediction for current node from stats of all rows
- predictions.row(current) = state.node_stats.row(stats_i);
+ if (statCount(predictions.row(current)) !=
+ statCount(state.node_stats.row(stats_i))){
+ // Predictions for each node is set by its parent using stats
+ // recorded while training parent node. These stats do not include
+ // rows that had a NULL value for the primary split feature.
+ // The NULL count is included in the 'node_stats' while training
+ // current node. Further, presence of NULL rows indicate that
+ // stats used for deciding 'children_wont_split' are inaccurate.
+ // Hence avoid using the flag to decide termination.
+ predictions.row(current) = state.node_stats.row(stats_i);
+ children_wont_split = false;
+ }
--- End diff --
No need to exchange anything. After reading through the
`updatePrimarySplit` function, I understand why the call is needed even
though its boolean return value doesn't affect the variable.
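
For anyone following the thread, here is a minimal sketch of the check the
diff adds, as I read it. It is not the MADlib code: `StatsRow`, this
simplified `statCount`, and `refreshPrediction` are hypothetical stand-ins
(the real code works on matrix rows and has its own `statCount`). The idea
is that a prediction inherited from the parent is overwritten only when its
row count disagrees with the node's own stats, which is taken as a sign
that NULL rows were left out, and the termination flag is reset in that
case.

```cpp
#include <vector>

// Hypothetical stand-in for one row of per-node statistics:
// one (weighted) row count per response class.
using StatsRow = std::vector<double>;

// Simplified, assumed version of statCount: total rows summarised by a row.
double statCount(const StatsRow &row) {
    double total = 0.0;
    for (double v : row) {
        total += v;
    }
    return total;
}

// Mirrors the condition in the diff. If the prediction set by the parent
// covers a different number of rows than the node's own stats, replace it
// and clear children_wont_split, since that flag was derived from the
// incomplete counts. Returns true when the prediction was replaced.
bool refreshPrediction(StatsRow &prediction,
                       const StatsRow &node_stats,
                       bool &children_wont_split) {
    if (statCount(prediction) != statCount(node_stats)) {
        prediction = node_stats;
        children_wont_split = false;
        return true;
    }
    return false;
}
```

When the counts already agree, both the prediction and the flag are left
untouched, so the parent's decision still stands.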