Hi,
It depends on the problem that you work on.
Just as python and R, Mllib focuses on machine learning and SparkR will
focus on statistics, if SparkR follow the way of R.
For instance, If you want to use glm to analyse data:
1. if you are interested only in parameters of model, and use this
SparkR and MLlib are becoming more integrated (we recently added R formula
support) but the integration is still quite small. If you learn R and
SparkR, you will not be able to leverage most of the distributed algorithms
in MLlib (e.g. all the algorithms you cited). However, you could use the
I am starting off with classification models, Logistic,RandomForest.
Basically wanted to learn Machine learning.
Since I have a java background I started off with MLib, but later heard R
works as well ( with scaling issues - only).
So, with SparkR was wondering the scaling issue would be resolved
I was wondering when one should go for MLib or SparkR. What is the criteria
or what should be considered before choosing either of the solutions for
data analysis?
or What is the advantages of Spark MLib over Spark R or advantages of
SparkR over MLib?
...@gmail.commailto:mylogi...@gmail.com
Date: Wednesday, August 5, 2015 at 2:24 PM
To: user@spark.apache.orgmailto:user@spark.apache.org
user@spark.apache.orgmailto:user@spark.apache.org
Subject: Spark MLib v/s SparkR
I was wondering when one should go for MLib or SparkR. What is the criteria or
what should
A few points to consider:
a) SparkR gives the union of R_in_a_single_machine and the
distributed_computing_of_Spark:
b) It also gives the ability to wrangle with data in R, that is in the
Spark eco system
c) Coming to MLlib, the question is MLlib and R (not MLlib or R) -
depending on the scale,
What machine learning algorithms are you interested in exploring or using?
Start from there or better yet the problem you are trying to solve, and
then the selection may be evident.
On Wednesday, August 5, 2015, praveen S mylogi...@gmail.com wrote:
I was wondering when one should go for MLib