SparkR and MLlib are becoming more integrated (we recently added R formula
support), but the integration is still quite limited. If you learn R and
SparkR, you will not be able to leverage most of the distributed algorithms
in MLlib (e.g. all the algorithms you cited). You could use the equivalent
R implementations instead (e.g. glm for logistic regression), but be aware
that these run in a single R process and will not scale to the large
datasets Spark is designed to handle.

On Thu, Aug 6, 2015 at 8:06 PM, praveen S <mylogi...@gmail.com> wrote:

> I am starting off with classification models: logistic regression and
> random forest. Basically, I want to learn machine learning.
> Since I have a Java background I started off with MLlib, but later heard
> that R works as well (only with scaling issues).
>
> So I was wondering whether SparkR would resolve the scaling issue, hence
> my question: why not go with R and SparkR alone (keeping aside my
> inclination towards Java)?
>
> On Thu, Aug 6, 2015 at 12:28 AM, Charles Earl <charles.ce...@gmail.com>
> wrote:
>
>> What machine learning algorithms are you interested in exploring or
>> using? Start from there, or better yet from the problem you are trying to
>> solve, and then the selection may be evident.
>>
>>
>> On Wednesday, August 5, 2015, praveen S <mylogi...@gmail.com> wrote:
>>
>>> I was wondering when one should go for MLlib or SparkR. What are the
>>> criteria, or what should be considered, before choosing either of the
>>> solutions for data analysis?
>>> Or, what are the advantages of Spark MLlib over SparkR, or the advantages
>>> of SparkR over MLlib?
>>>
>>
>>
>> --
>> - Charles
>>
>
>
