Github user srowen commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36719653

I'm looking forward to this. I have one question, based only on the description and not on reading the code: why only binary classification? RDF is inherently amenable to multi-class; you're just storing a distribution over N classes and computing entropy over N classes rather than 2.

Also, does this support evaluating feature importance, even in the simplistic way done by the likes of scikit-learn, where you just evaluate which features touch the most examples as they pass through the trees?

Those are to me two key features for RDF. (And actual classification of new examples, but I take that for granted.)
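
To make the two points above concrete, here is a minimal Python sketch, not the code in the pull request. It assumes each tree node keeps a list of per-class example counts, and that split nodes can be listed as (feature index, number of examples reaching the node) pairs. The function names `entropy`, `information_gain`, and `feature_touch_counts` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(class_counts):
    """Shannon entropy in bits of a class-count list with any number of classes."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    # Empty classes contribute nothing, so skip them to avoid log2(0).
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, left_counts, right_counts):
    """Entropy reduction of a binary split; the class count N is arbitrary."""
    n = sum(parent_counts)
    n_left, n_right = sum(left_counts), sum(right_counts)
    weighted = (n_left / n) * entropy(left_counts) + (n_right / n) * entropy(right_counts)
    return entropy(parent_counts) - weighted


def feature_touch_counts(split_nodes):
    """Sum, per feature, the examples reaching nodes that split on that feature.

    split_nodes is an assumed representation: (feature_index, num_examples) pairs
    collected from every split node of every tree in the forest.
    """
    counts = Counter()
    for feature, num_examples in split_nodes:
        counts[feature] += num_examples
    return counts
```

Nothing in `entropy` or `information_gain` depends on there being exactly two classes; the binary case is just a count list of length 2. The counting heuristic in `feature_touch_counts` is only the simple variant described in the comment, and other importance measures exist.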