Re: MLBase status

2014-08-27 Thread Ameet Talwalkar
Hi Sameer, MLbase started out as a set of three ML components on top of Spark. The lowest level, MLlib, is now a rapidly growing component within Spark and is maintained by the Spark community. The two higher-level components (MLI and MLOpt) are experimental components that serve as testbeds for

Re: Gradient Boosting Decision Trees

2014-07-16 Thread Ameet Talwalkar
Hi Pedro, Yes, although they will probably not be included in the next release (since the code freeze is ~2 weeks away), GBM (and other ensembles of decision trees) are currently under active development. We're hoping they'll make it into the subsequent release. -Ameet On Wed, Jul 16, 2014 at

Re: KMeans code is rubbish

2014-07-11 Thread Ameet Talwalkar
Hi Wanda, As Sean mentioned, K-means is not guaranteed to find an optimal answer, even for seemingly simple toy examples. A common heuristic to deal with this issue is to run K-means multiple times and choose the best answer. You can do this by changing the runs parameter from the default value
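
The restart heuristic described above can be sketched in plain Python. This is only an illustration of the idea, not MLlib's implementation: the function names (`sq_dist`, `kmeans_once`, `best_of_runs`) and the seeding scheme are assumptions made for this example, and "best" is taken to mean the lowest within-cluster sum of squared distances.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialization.

    Returns (centers, cost), where cost is the sum of squared distances
    from each point to its nearest center.
    """
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged to a (possibly local) optimum
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def best_of_runs(points, k, runs, seed=0):
    """Run K-means `runs` times and keep the lowest-cost result."""
    rng = random.Random(seed)
    results = (kmeans_once(points, k, rng) for _ in range(runs))
    return min(results, key=lambda r: r[1])
```

Because each run starts from a different random initialization, the lowest-cost result over several runs is never worse than the first run alone, although it is still not guaranteed to be the global optimum.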

Re: MLlib feature request

2014-07-11 Thread Ameet Talwalkar
Hi Joseph, Thanks for your email. Many users are requesting this functionality. While it would be a stretch for it to appear in Spark 1.1, various people (including Manish Amde and folks at the AMPLab, Databricks and Alpine Labs) are actively working on developing ensembles of decision trees