[GitHub] spark issue #19433: [SPARK-3162] [MLlib] Add local tree training for decisio...

smurching Thu, 26 Oct 2017 19:06:59 -0700

Github user smurching commented on the issue:

    https://github.com/apache/spark/pull/19433
  
    Made a few updates, here's a quick summary/what I'd propose moving 
forward:
    
    Right now:
    * Shared row indices for all (categorical & continuous) features are stored 
& updated in `TrainingInfo`
    * `LocalDecisionTree.computeBestSplits` computes best splits/sufficient 
stats for a single feature at a time
    * A utility method (`LocalDecisionTreeUtils.updateArrayForSplit`) is used 
to sort both feature values and shared row indices
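
    As an illustration of the last point, here is a minimal sketch of sorting a range of feature values while keeping a shared row-index array aligned with it. It is not the code in this PR: `CoSortSketch` and `sortRange` are hypothetical names, and binned (`Int`) feature values are an assumption.

```scala
// Hypothetical sketch, not the actual LocalDecisionTreeUtils implementation.
object CoSortSketch {
  /**
   * Sorts values(from until to) in ascending order and applies the same
   * permutation to rowIndices(from until to), so that position i in both
   * arrays still refers to the same training row.
   * Assumes binned (Int) feature values.
   */
  def sortRange(values: Array[Int], rowIndices: Array[Int], from: Int, to: Int): Unit = {
    // Positions in [from, to), ordered by the feature value stored at each position.
    val order = (from until to).sortBy(i => values(i)).toArray
    val sortedValues = order.map(values(_))
    val sortedRows = order.map(rowIndices(_))
    // Write the permuted slices back in place; elements outside the range are untouched.
    Array.copy(sortedValues, 0, values, from, to - from)
    Array.copy(sortedRows, 0, rowIndices, from, to - from)
  }
}
```

    Sorting only a `[from, to)` range reflects the assumption that each tree node owns a contiguous slice of the arrays.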
    
    When we add support for raw continuous feature values:
    * Add a subclass of `FeatureColumn` (e.g. `ContinuousFeatureColumn`) that 
stores and sorts its own array of row indices, and pass these row indices to 
the methods that require them.
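
    A rough sketch of what such a subclass could look like is below. Only the names `FeatureColumn` and `ContinuousFeatureColumn` come from this comment. The fields, the constructor, and the `labelsInColumnOrder` helper are assumptions made for illustration.

```scala
// Hypothetical sketch; the real FeatureColumn in the PR may look quite different.
trait FeatureColumn {
  def featureIndex: Int
}

final class ContinuousFeatureColumn(
    val featureIndex: Int,
    rawValues: Array[Double]) extends FeatureColumn {

  // Row indices owned by this column, ordered by ascending raw feature value.
  val rowIndices: Array[Int] =
    rawValues.indices.toArray.sortWith { (a, b) =>
      java.lang.Double.compare(rawValues(a), rawValues(b)) < 0
    }

  // Feature values in the same (sorted) order as rowIndices.
  val values: Array[Double] = rowIndices.map(rawValues(_))
}

object ContinuousFeatureColumnSketch {
  // Example of a method that needs row indices: it receives them from the
  // column itself rather than from shared state.
  def labelsInColumnOrder(col: ContinuousFeatureColumn, labels: Array[Double]): Array[Double] =
    col.rowIndices.map(labels(_))
}
```

    With this layout each continuous column can be sorted independently, at the cost of one extra `Array[Int]` per column.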
    
    I also renamed `FeatureVector` to `FeatureColumn` since the former seemed 
like it'd confuse developers (`FeatureVector` sounds like a single data point)



---

