GitHub user srowen commented on the pull request:

    https://github.com/apache/spark/pull/79#issuecomment-36719653
  
    I'm looking forward to this. I have one question, based only on the 
description and not on reading the code: why only binary classification? RDF is 
inherently amenable to multi-class; you're just storing a distribution over N 
classes and computing entropy over N classes rather than 2. 
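
    To illustrate the point about entropy, here is a minimal sketch (not code from the pull request; the function name `entropy` and the count-list representation are assumptions made for illustration). The same function covers the binary case and the N-class case, since it only depends on the per-class counts stored at a node.

```python
import math

def entropy(class_counts):
    """Shannon entropy (in bits) of a list of per-class example counts.

    Hypothetical helper: works for any number of classes, so the binary
    case is just a list of length 2.
    """
    total = sum(class_counts)
    if total == 0:
        return 0.0  # an empty node carries no uncertainty
    h = 0.0
    for count in class_counts:
        if count > 0:  # skip empty classes; 0 * log(0) is taken as 0
            p = count / total
            h -= p * math.log2(p)
    return h

binary_h = entropy([5, 5])        # two equally likely classes
multi_h = entropy([5, 5, 5, 5])   # four equally likely classes, same code path
```

    Only the length of the count list changes between the two calls; the impurity computation itself is unchanged.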
    
    Also, does this support evaluating feature importance? Even the simplistic 
way done by the likes of scikit-learn, where you just evaluate which features 
touch the most examples as they pass through the trees.
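
    A rough sketch of the simplistic counting idea described above, assuming a hypothetical dict-based tree layout (this is neither the pull request's data structure nor scikit-learn's API): each example is routed through every tree, and a feature is credited once each time an example passes through a node that splits on it.

```python
from collections import Counter

# Hypothetical tree layout, for illustration only:
#   leaf:          {"label": ...}
#   internal node: {"feature": index, "threshold": t, "left": node, "right": node}

def feature_touches(forest, examples):
    """Count how many examples pass through splits on each feature."""
    counts = Counter()
    for tree in forest:
        for x in examples:
            node = tree
            while "feature" in node:  # descend until a leaf is reached
                f = node["feature"]
                counts[f] += 1
                node = node["left"] if x[f] <= node["threshold"] else node["right"]
    return counts

# A one-tree forest: the root splits on feature 0, its right child on feature 1.
tree = {
    "feature": 0, "threshold": 0.5,
    "left": {"label": "a"},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"label": "b"},
        "right": {"label": "c"},
    },
}
examples = [[0.2, 0.9], [0.8, 0.1], [0.9, 0.7]]
touches = feature_touches([tree], examples)
```

    Here all three examples hit the split on feature 0, and the two that go right also hit the split on feature 1, so feature 0 ranks higher under this measure.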
    
    Those are, to me, two key features for RDF. (And actual classification of new 
examples, but I take that for granted.)


