Github user smurching commented on the issue:

https://github.com/apache/spark/pull/19433

Made a few updates, here's a quick summary/what I'd propose moving forward:

Right now:
* Shared row indices for all (categorical & continuous) features are stored & updated in `TrainingInfo`
* `LocalDecisionTree.computeBestSplits` computes best splits/sufficient stats for a single feature at a time
* A utility method (`LocalDecisionTreeUtils.updateArrayForSplit`) is used to sort both feature values and shared row indices

When we add support for raw continuous feature values:
* Add a subclass of `FeatureColumn` (e.g. `ContinuousFeatureColumn`) that stores and sorts its own array of row indices, pass these row indices to methods requiring them.

I also renamed `FeatureVector` to `FeatureColumn` since the former seemed like it'd confuse developers (`FeatureVector` sounds like a single data point)
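
To make the proposal for raw continuous features concrete, here is a rough sketch of what a `ContinuousFeatureColumn` that owns its row indices might look like. This is only an illustration, not the code in the PR: the `FeatureColumn` members, the constructor shape, the `sortRange` helper, and the companion `apply` are all assumptions made up for this example.

```scala
// Hypothetical sketch only; names and signatures below are assumptions,
// not the actual classes in the PR.

// One feature's values across all rows (column-major storage).
sealed trait FeatureColumn {
  def featureIndex: Int
  def values: Array[Double]
}

// A continuous column that keeps its own row indices instead of relying on
// the shared indices held in TrainingInfo.
final class ContinuousFeatureColumn(
    val featureIndex: Int,
    val values: Array[Double],
    val rowIndices: Array[Int]) extends FeatureColumn {

  require(values.length == rowIndices.length,
    "values and rowIndices must have the same length")

  // Sort values in [start, end) ascending, moving each row index together
  // with its value so that rowIndices(i) still identifies the row of values(i).
  def sortRange(start: Int, end: Int): Unit = {
    val order = (start until end).sortWith((a, b) => values(a) < values(b)).toArray
    val sortedValues = order.map(i => values(i))
    val sortedRows = order.map(i => rowIndices(i))
    Array.copy(sortedValues, 0, values, start, sortedValues.length)
    Array.copy(sortedRows, 0, rowIndices, start, sortedRows.length)
  }
}

object ContinuousFeatureColumn {
  // Build a column from raw values; row i starts at position i, then the
  // whole column is sorted by value.
  def apply(featureIndex: Int, rawValues: Array[Double]): ContinuousFeatureColumn = {
    val col = new ContinuousFeatureColumn(
      featureIndex, rawValues.clone(), Array.range(0, rawValues.length))
    col.sortRange(0, rawValues.length)
    col
  }
}
```

With this shape, `ContinuousFeatureColumn(0, Array(3.0, 1.0, 2.0))` ends up with `values` of `1.0, 2.0, 3.0` and `rowIndices` of `1, 2, 0`, and those row indices can be handed to any method that needs to look up labels or weights for the sorted rows.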