subject:"Slow Performance with Apache Spark Gradient Boosted Tree training runs"

Re: Slow Performance with Apache Spark Gradient Boosted Tree training runs

2015-09-22 Thread Yashwanth Kumar

nce tuning:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Slow-Performance-with-Apache-Spar

Slow Performance with Apache Spark Gradient Boosted Tree training runs

2015-09-21 Thread vkutsenko

mediate input size or avoid shuffling of data between stages. In my case I'm basically using an "out-of-the-box" algorithm, which is written by ML experts and *should* already be well tuned in this regard. My own code that outputs the GBT model to S3 should take a trivial amount of time to r