Github user smurching commented on the issue:

    https://github.com/apache/spark/pull/19433
  
    Made a few updates. Here’s a quick summary of where things stand and what 
I’d propose moving forward:
    
    Right now:
    * Shared row indices for all (categorical & continuous) features are stored 
& updated in `TrainingInfo`
    * `LocalDecisionTree.computeBestSplits` computes best splits/sufficient 
stats for a single feature at a time
    * A utility method (`LocalDecisionTreeUtils.updateArrayForSplit`) is used 
to sort both feature values and shared row indices
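
A minimal sketch of the general idea behind keeping feature values and shared row indices in step when a node is split. This is not the actual body of `LocalDecisionTreeUtils.updateArrayForSplit`; the object `SplitSketch`, the method `partitionRange`, and its parameters are hypothetical names used only for illustration. It assumes each node owns a contiguous sub-range of the arrays and that a boolean mask says which rows go to the left child:

```scala
// Hypothetical helper for illustration only; not the PR's implementation.
object SplitSketch {

  /**
   * Stably partitions the sub-range [from, until) of `arr` in place:
   * elements flagged true in `goesLeft` come first, the rest follow,
   * each group keeping its original relative order.
   * `goesLeft(i)` describes the element at position `from + i`.
   * Returns the number of elements sent left.
   */
  def partitionRange(
      arr: Array[Int],
      from: Int,
      until: Int,
      goesLeft: Array[Boolean]): Int = {
    val n = until - from
    require(goesLeft.length == n, "mask must cover exactly the sub-range")
    val tmp = new Array[Int](n)
    var out = 0
    var i = 0
    // First pass: copy the elements that go to the left child.
    while (i < n) {
      if (goesLeft(i)) { tmp(out) = arr(from + i); out += 1 }
      i += 1
    }
    val numLeft = out
    // Second pass: copy the elements that go to the right child.
    i = 0
    while (i < n) {
      if (!goesLeft(i)) { tmp(out) = arr(from + i); out += 1 }
      i += 1
    }
    System.arraycopy(tmp, 0, arr, from, n)
    numLeft
  }
}
```

Applying the same mask to the feature-value array and to the row-index array keeps the two aligned, so position `k` in both arrays still refers to the same training row after the split.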
    
    When we add support for raw continuous feature values:
    * Add a subclass of `FeatureColumn` (e.g. `ContinuousFeatureColumn`) that 
stores and sorts its own array of row indices, and pass these row indices to 
the methods that require them.
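
A rough sketch of what a `ContinuousFeatureColumn` along these lines might look like. Only the names `FeatureColumn` and `ContinuousFeatureColumn` come from the proposal above; the trait shape, the fields, and the `ColumnSketch.labelsInValueOrder` consumer are assumptions made for illustration and are not taken from the PR:

```scala
// Hypothetical shape of the base type; the real FeatureColumn may differ.
trait FeatureColumn {
  def featureIndex: Int
}

/** Holds raw continuous values together with its own row indices. */
final class ContinuousFeatureColumn(
    val featureIndex: Int,
    rawValues: Array[Double]) extends FeatureColumn {

  // Row indices owned by this column, ordered by the value they point at.
  val rowIndices: Array[Int] =
    rawValues.indices.toArray.sortWith((a, b) => rawValues(a) < rawValues(b))

  // Feature values laid out in the same order as rowIndices.
  val sortedValues: Array[Double] = rowIndices.map(i => rawValues(i))
}

object ColumnSketch {
  // Hypothetical consumer: it is handed the column's own row indices
  // instead of reading a shared index array.
  def labelsInValueOrder(
      rowIndices: Array[Int],
      labels: Array[Double]): Array[Double] =
    rowIndices.map(i => labels(i))
}
```

With this layout, each continuous column can be sorted independently, and any method that needs labels or statistics in that column's order receives `rowIndices` explicitly.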
    
    I also renamed `FeatureVector` to `FeatureColumn`, since the former seemed 
likely to confuse developers (`FeatureVector` sounds like a single data point).

