[GitHub] spark pull request: MLI-1 Decision Trees

mengxr Mon, 10 Mar 2014 12:35:23 -0700

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/79#issuecomment-37224328
  
    @manishamde Thanks for updating the code style and adding more docs! I made 
a first pass over the code.
    
    For the code style, we do not have a good style checker for Scala; @rxin 
can tell you more about style checking. However, Spark's code style is easy to 
pick up through code review, so please make your code consistent with it in 
the next update. Please see my comments for some examples and update similar 
code in other places.
    
    For the implementation, I have the following suggestions:
    
    1. Regression or Classification is checked in many places. It would be nice 
to create a DecisionTree base class and make RegressionTree and 
ClassificationTree two subclasses of it.
    
    2. For loops are used in some performance-critical code. These should be 
replaced by "while" loops, which are much faster than "for" loops in Scala.
    
    3. Several nested methods are used in findBestSplits. It would feel safer 
to have some unit tests for them.
    
    4. The threshold for classification is set at 0.5. This should be 
configurable.
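
    To illustrate suggestions 1 and 4 together, here is a minimal sketch. All 
names and signatures in it (TreeSketch, predictLeaf, meanLabel, threshold) are 
hypothetical and not taken from the PR, and it assumes a leaf predicts from the 
mean label with a ">=" comparison. The regression/classification difference 
moves into the subclasses, and the threshold becomes a constructor parameter 
that defaults to 0.5.

```scala
// Hypothetical sketch: names and signatures are illustrative, not the PR's API.
object TreeSketch {

  // Shared logic would live in the base class; subclasses supply only what differs.
  abstract class DecisionTree {
    // Turn the mean label observed at a leaf into a prediction (assumed design).
    def predictLeaf(meanLabel: Double): Double
  }

  // Regression: predict the mean label directly.
  class RegressionTree extends DecisionTree {
    override def predictLeaf(meanLabel: Double): Double = meanLabel
  }

  // Binary classification: compare the mean label against a configurable
  // threshold. The default of 0.5 matches the value that is currently fixed.
  class ClassificationTree(val threshold: Double = 0.5) extends DecisionTree {
    require(threshold >= 0.0 && threshold <= 1.0, "threshold must be in [0, 1]")

    override def predictLeaf(meanLabel: Double): Double =
      if (meanLabel >= threshold) 1.0 else 0.0
  }
}
```

    With something like this, callers would pick the subclass once, for 
example new TreeSketch.ClassificationTree(0.8), instead of checking the 
algorithm type in many places.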
    
    I will try to make a second pass focusing on the algorithm later today. In 
the meantime, would you please fix the remaining code style problems and 
replace the for loops? Thanks!
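
    As a rough illustration of the for-to-while rewrite, here is a sketch on a 
made-up example (summing an array; LoopSketch, sumFor, and sumWhile are 
hypothetical names, not code from the PR). A "for" over a Range desugars to a 
foreach call that takes a closure, while the "while" version is a plain index 
loop. How large the gap is depends on the Scala version and the JIT, so it is 
worth measuring in the actual hot paths.

```scala
// Hypothetical example: the functions below are illustrative, not from the PR.
object LoopSketch {

  // "for" version: desugars to (0 until n).foreach(i => ...), which goes
  // through a closure for every element.
  def sumFor(values: Array[Double]): Double = {
    var sum = 0.0
    for (i <- 0 until values.length) {
      sum += values(i)
    }
    sum
  }

  // "while" version: same result, written as a plain index loop with no closure.
  def sumWhile(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      sum += values(i)
      i += 1
    }
    sum
  }
}
```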


