Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Bryan Cutler
k You very much. It is great help, I will try spark-sklearn. >> Prem >> From: Yanbo Liang <yblia...@gmail.com> >> Date: Tuesday, September 5, 2017 at 10:40 AM >> To: Patrick Mc

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Yanbo Liang
> From: Yanbo Liang <yblia...@gmail.com> > Date: Tuesday, September 5, 2017 at 10:40 AM > To: Patrick McCarthy <pmccar...@dstillery.com> > Cc: "Timsina, Prem" <prem.tims...@mssm.edu>, "user@spark.apache.org"

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Timsina, Prem
mber 5, 2017 at 10:40 AM To: Patrick McCarthy <pmccar...@dstillery.com> Cc: "Timsina, Prem" <prem.tims...@mssm.edu>, "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm Hi

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Yanbo Liang
Hi Prem, How large is your dataset? Can it fit on a single node? If not, Spark MLlib provides cross-validation (CrossValidator), which can run multiple machine learning algorithms in parallel on a distributed dataset and do parameter search. FYI: https://spark.apache.org/docs/latest/ml-tuning.html#cross-validation
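
For the single-node case asked about above, the overall pattern (several algorithms evaluated concurrently, each with k-fold cross-validation) can be sketched in plain Python. This is only an illustration, not Spark code and not the spark-sklearn API: the two "algorithms" are toy stand-ins, the data is synthetic, and every function name here (k_fold_indices, cross_validate, evaluate_all) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_class(train_X, train_y):
    """Toy stand-in for a real learner: always predict the most common label."""
    label = max(set(train_y), key=train_y.count)
    return lambda x: label


def nearest_centroid(train_X, train_y):
    """Toy stand-in: predict the label whose mean feature vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def cross_validate(fit, X, y, k=10):
    """Return the mean held-out accuracy of `fit` over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def evaluate_all(algorithms, X, y, k=10):
    """Cross-validate every algorithm concurrently; return {name: accuracy}."""
    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        futures = {
            name: pool.submit(cross_validate, fit, X, y, k)
            for name, fit in algorithms.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Synthetic, well-separated two-class data (made up for illustration).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

results = evaluate_all(
    {"majority": majority_class, "centroid": nearest_centroid}, X, y
)
print(results)
```

Threads are used here only to keep the sketch short; in CPython they will not speed up CPU-bound pure-Python training, so a process pool or a cluster-backed tool would be the realistic choice. The per-algorithm scores in the returned dictionary are the kind of output that a second stage could consume.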

Re: Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-05 Thread Patrick McCarthy
You might benefit from watching this JIRA issue: https://issues.apache.org/jira/browse/SPARK-19071 On Sun, Sep 3, 2017 at 5:50 PM, Timsina, Prem wrote: > Is there a way to parallelize multiple ML algorithms in Spark. My use case is something like this: > A) Run

Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-03 Thread Timsina, Prem
Is there a way to parallelize multiple ML algorithms in Spark? My use case is something like this: A) Run multiple machine learning algorithms (Naive Bayes, ANN, Random Forest, etc.) in parallel. 1) Validate each algorithm using 10-fold cross-validation B) Feed the output of step A) in second

Apache Spark: Parallelization of Multiple Machine Learning ALgorithm

2017-09-03 Thread prtimsina
Is there a way to parallelize multiple ML algorithms in Spark? My use case is something like this: A) Run multiple machine learning algorithms (Naive Bayes, ANN, Random Forest, etc.) in parallel. 1) Validate each algorithm using 10-fold cross-validation B) Feed the output of step A) in second