Re: MLLib : Decision Tree with minimum points per node
Hi Justin,

I am glad to know that the trees are working well for you. The trees will support a minimum number of samples per node in a future release. Thanks for the feedback.

-Manish

On Fri, Jun 13, 2014 at 8:55 PM, Justin Yip yipjus...@gmail.com wrote:
> [...]
Re: MLLib : Decision Tree with minimum points per node
Hi Justin,

I have created a JIRA ticket to keep track of your request: https://issues.apache.org/jira/browse/SPARK-2207

Thanks.

-Manish

On Thu, Jun 19, 2014 at 2:35 PM, Manish Amde manish...@gmail.com wrote:
> [...]
MLLib : Decision Tree with minimum points per node
Hello,

I have been playing around with MLlib's decision tree library. It is working great, thanks.

I have a question regarding overfitting. It appears to me that the current implementation doesn't allow the user to specify the minimum number of samples per node. As a result, some nodes contain only a few samples, which can lead to overfitting.

Is there a workaround, or any other way to prevent overfitting? Or will the decision tree support a min-samples-per-node setting in a future release?

Thanks.
Justin
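To illustrate what a minimum-samples-per-node limit does, here is a minimal sketch in plain Python. It is not MLlib's code: it is a small Gini-impurity classification tree in which a split is accepted only if both children keep at least `min_samples_per_node` training samples. The names `build_tree`, `best_split`, `leaf_sizes`, `min_samples_per_node` and `max_depth` are hypothetical, chosen for illustration, and the impurity measure and exhaustive threshold search are assumptions rather than a description of how MLlib grows its trees.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    prediction: int                    # majority label of the samples at this node
    n_samples: int                     # number of training samples reaching this node
    feature: Optional[int] = None      # split feature index (None for a leaf)
    threshold: Optional[float] = None  # samples with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_samples_per_node):
    """Return (feature, threshold) of the lowest-impurity split whose children
    both keep at least min_samples_per_node samples, or None if none qualifies."""
    best = None
    best_score = gini(y)  # a split must strictly reduce impurity
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # Hypothetical min-samples rule: never create a child smaller than the limit.
            if len(left) < min_samples_per_node or len(right) < min_samples_per_node:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, min_samples_per_node, depth=0) -> Node:
    node = Node(prediction=Counter(y).most_common(1)[0][0], n_samples=len(y))
    if depth >= max_depth:
        return node
    split = best_split(X, y, min_samples_per_node)
    if split is None:
        return node  # no admissible split: this node stays a leaf
    f, t = split
    node.feature, node.threshold = f, t
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    node.left = build_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                           max_depth, min_samples_per_node, depth + 1)
    node.right = build_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                            max_depth, min_samples_per_node, depth + 1)
    return node


def leaf_sizes(node: Node) -> List[int]:
    """Sample counts of the leaves, left to right."""
    if node.left is None:
        return [node.n_samples]
    return leaf_sizes(node.left) + leaf_sizes(node.right)


# Toy data: the classes interleave at x=4 (label 1) and x=5 (label 0), mimicking label noise.
X = [[float(i)] for i in range(10)]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
loose = build_tree(X, y, max_depth=10, min_samples_per_node=1)
strict = build_tree(X, y, max_depth=10, min_samples_per_node=3)
print(min(leaf_sizes(loose)), min(leaf_sizes(strict)))  # → 1 3
```

On this toy data the unconstrained tree isolates the two interleaved points in leaves of one sample each, while the constrained tree stops at leaves of at least three samples. The trade-off is that the leaf covering samples 4 to 6 predicts its majority label and misclassifies one training point, which is the intended effect of not fitting noise.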