Re: MLLib : Decision Tree with minimum points per node

2014-06-19 Thread Manish Amde
Hi Justin,

I am glad to know that trees are working well for you.

The trees will support minimum samples per node in a future release. Thanks
for the feedback.

-Manish



Re: MLLib : Decision Tree with minimum points per node

2014-06-19 Thread Manish Amde
Hi Justin,

I have created a JIRA ticket to keep track of your request. Thanks.
https://issues.apache.org/jira/browse/SPARK-2207

-Manish




MLLib : Decision Tree with minimum points per node

2014-06-13 Thread Justin Yip
Hello,

I have been playing around with MLlib's decision tree library. It is
working great, thanks.

I have a question regarding overfitting. It appears to me that the current
implementation doesn't allow users to specify the minimum number of samples
per node. As a result, some nodes contain very few samples, which can lead
to overfitting.
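A minimum-samples-per-node rule is usually enforced while choosing splits: any candidate split that would leave fewer than the threshold on either side is discarded. The following is a minimal single-feature sketch using Gini impurity; `gini`, `best_split`, and the `min_samples_per_node` parameter are hypothetical names for illustration, and this is not how MLlib's implementation is structured.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys, min_samples_per_node=1):
    """Return (threshold, weighted_impurity) for the best split on one
    continuous feature, or None if no split leaves at least
    min_samples_per_node points on each side (hypothetical helper)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(1, n):
        # Only split between distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # The minimum-samples constraint: reject splits that create a tiny child.
        if len(left) < min_samples_per_node or len(right) < min_samples_per_node:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        # Impurity of the two children, weighted by their sizes.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best
```

With `xs = [1, 2, 3, 4, 5, 6]` and `ys = [0, 0, 0, 0, 0, 1]`, the unconstrained search isolates the single positive point at threshold 5.5. With `min_samples_per_node=2` the best allowed split moves to 4.5 (weighted Gini 1/6), trading a perfectly pure but tiny leaf for a larger, less pure one.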

Is there a workaround or any other way to prevent overfitting? Or will the
decision tree support min-samples-per-node in a future release?
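Until such an option exists, one possible workaround is post-pruning: after training, collapse any split that produced a child with fewer than the desired number of samples. This sketch assumes you can attach a training-sample count to each node (for example by re-scoring the training set through the tree); the `Node` class and the `prune_min_samples` and `predict` functions are hypothetical and are not MLlib's tree model API. Limiting the maximum tree depth is a coarser alternative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; not MLlib's Node class."""
    count: int                          # training samples that reached this node
    prediction: float                   # majority class at this node
    threshold: Optional[float] = None   # split threshold on a single feature
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def prune_min_samples(node: Node, min_samples_per_node: int) -> Node:
    """Bottom-up post-pruning (mutates the tree in place): if either child
    of a split has fewer than min_samples_per_node samples, drop the
    subtree so the node becomes a leaf using its own prediction."""
    if node.is_leaf():
        return node
    node.left = prune_min_samples(node.left, min_samples_per_node)
    node.right = prune_min_samples(node.right, min_samples_per_node)
    if node.left.count < min_samples_per_node or node.right.count < min_samples_per_node:
        node.left = node.right = None
        node.threshold = None
    return node


def predict(node: Node, x: float) -> float:
    """Walk down the tree: go left when x <= threshold, else right."""
    while not node.is_leaf():
        node = node.left if x <= node.threshold else node.right
    return node.prediction
```

For example, a root split at 5.0 whose left child (60 samples) is split again into leaves of 58 and 2 samples would, with `min_samples_per_node=5`, have that 2-sample leaf removed: `predict` on `x = 3.0` then returns the left child's majority class instead of the tiny leaf's.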

Thanks.

Justin