Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/142#discussion_r123618828

    --- Diff: src/modules/recursive_partitioning/DT_impl.hpp ---
    @@ -486,8 +485,18 @@ DecisionTree<Container>::expand(const Accumulator &state,
                 Index stats_i = static_cast<Index>(state.stats_lookup(i));
                 assert(stats_i >= 0);

    -            // 1. Set the prediction for current node from stats of all rows
    -            predictions.row(current) = state.node_stats.row(stats_i);
    +            if (statCount(predictions.row(current)) !=
    +                    statCount(state.node_stats.row(stats_i))){
    +                // Predictions for each node is set by its parent using stats
    +                // recorded while training parent node. These stats do not include
    +                // rows that had a NULL value for the primary split feature.
    +                // The NULL count is included in the 'node_stats' while training
    +                // current node. Further, presence of NULL rows indicate that
    +                // stats used for deciding 'children_wont_split' are inaccurate.
    +                // Hence avoid using the flag to decide termination.
    +                predictions.row(current) = state.node_stats.row(stats_i);
    +                children_wont_split = false;
    +            }
    --- End diff --

    Let me try to rephrase my question: Assume the if check at lines 490-491 is `true`. `children_wont_split` will be set to `false`. This variable is used with an `&` operator at line 568, which means the second boolean operand is irrelevant; the result will always be `false`. In this case, do we need the two double for loops at lines 516-547? The compiler might compute the backward slice of this instruction to find its optimal location, but I don't think we should rely on that.
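To illustrate the reviewer's point, here is a minimal C++ sketch (not MADlib code; the function and parameter names are hypothetical) showing why forcing the flag to `false` makes the later `&` expression, and any work done solely to compute its second operand, dead:

```cpp
// Hypothetical sketch of the control flow under discussion. Once
// children_wont_split is cleared, `children_wont_split & x` is false
// for every x, so computing x (e.g. via the double for loops at
// lines 516-547) contributes nothing to the result.
bool should_terminate(bool stats_mismatch, bool loops_result) {
    bool children_wont_split = true;
    if (stats_mismatch) {
        // Mirrors the diff: NULL rows make the recorded stats
        // unreliable, so the flag is cleared.
        children_wont_split = false;
    }
    // Mirrors the `&` at line 568: unlike `&&`, the bitwise `&` still
    // evaluates its second operand, but the value cannot change the
    // outcome when the flag is false.
    return children_wont_split & loops_result;
}
```

Note that because `&` (unlike `&&`) does not short-circuit, the second operand is evaluated regardless, so skipping the loops explicitly, rather than hoping the compiler's dead-code analysis elides them, is the safer choice.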