Thank you very much. It is a great help; I will try spark-sklearn.
>>
>> Prem
From: Yanbo Liang <yblia...@gmail.com>
Date: Tuesday, September 5, 2017 at 10:40 AM
To: Patrick McCarthy <pmccar...@dstillery.com>
Cc: "Timsina, Prem" <prem.tims...@mssm.edu>, "user@spark.apache.org"
<user@spark.apache.org>
Subject: Re: Apache Spark: Parallelization of Multiple Machine Learning
Algorithm
Hi Prem,
How large is your dataset? Can it fit on a single node?
If not, Spark MLlib provides CrossValidator, which can run multiple machine
learning algorithms in parallel on a distributed dataset and perform parameter
search. FYI:
https://spark.apache.org/docs/latest/ml-tuning.html#cross-validation
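For readers who want to see the pattern Prem describes (several algorithms, each scored with 10-fold cross-validation, evaluated side by side), here is a minimal local sketch in plain Python. It is not Spark code and not spark-sklearn's API: the two toy learners (`majority_class`, `nearest_centroid`) and every helper name are hypothetical stand-ins for Naive Bayes, Random Forest and so on, the data is made up, and folds are assigned round-robin rather than randomly. In Spark itself the corresponding class is `CrossValidator` in `pyspark.ml.tuning` (with `numFolds`, and, from Spark 2.3, a `parallelism` parameter for fitting candidate models concurrently).

```python
# Local, stdlib-only sketch: evaluate several algorithms in parallel, each with
# k-fold cross-validation. NOT Spark's API; the toy learners are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Predictor = Callable[[Point], int]
FitFn = Callable[[Sequence[Point], Sequence[int]], Predictor]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Assign indices 0..n-1 to k folds round-robin (no shuffling, for determinism)."""
    if n < k:
        raise ValueError("need at least k examples for k folds")
    return [list(range(i, n, k)) for i in range(k)]


def cross_val_accuracy(fit: FitFn, X: Sequence[Point], y: Sequence[int], k: int) -> float:
    """Train on k-1 folds, score accuracy on the held-out fold, average over all k."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)


def majority_class(X: Sequence[Point], y: Sequence[int]) -> Predictor:
    """Baseline learner: always predict the most frequent training label."""
    label = max(set(y), key=list(y).count)
    return lambda x: label


def nearest_centroid(X: Sequence[Point], y: Sequence[int]) -> Predictor:
    """Predict the label whose training-set mean point is closest (squared Euclidean)."""
    centroids = {
        lab: tuple(mean(col) for col in zip(*[p for p, l in zip(X, y) if l == lab]))
        for lab in set(y)
    }

    def predict(x: Point) -> int:
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def evaluate_in_parallel(models: Dict[str, FitFn], X, y, k: int = 10, workers: int = 4) -> Dict[str, float]:
    """Run cross-validation for each model concurrently; return mean accuracy per model."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(cross_val_accuracy, fit, X, y, k) for name, fit in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Toy, made-up data: two well-separated clusters with alternating labels.
    y = [i % 2 for i in range(20)]
    X = [(5.0 * label + 0.1 * (i % 3), 5.0 * label) for i, label in enumerate(y)]
    print(evaluate_in_parallel({"majority": majority_class, "centroid": nearest_centroid}, X, y, k=10))
```

Threads keep the sketch short; because of CPython's GIL, CPU-bound pure-Python learners would need a `ProcessPoolExecutor` (or a cluster, as discussed in this thread) to get a real speedup.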
You might benefit from watching this JIRA issue -
https://issues.apache.org/jira/browse/SPARK-19071
On Sun, Sep 3, 2017 at 5:50 PM, Timsina, Prem wrote:
> Is there a way to parallelize multiple ML algorithms in Spark? My use case
> is something like this:
>
> A) Run multiple machine learning algorithms (Naive Bayes, ANN, Random
> Forest, etc.) in parallel.
> 1) Validate each algorithm using 10-fold cross-validation.
> B) Feed the output of step A) into a second