To: user user@spark.apache.org
Sent: Saturday, November 22, 2014 4:53 PM
Subject: Re: Spark or MR, Scala or Java?
Adding to already interesting answers:
- Is there any case where MR is better than Spark? I don't know in which cases
I should use Spark rather than MR. When is MR faster than Spark?
sanjay
--
*From:* Krishna Sankar ksanka...@gmail.com
*To:* Sean Owen so...@cloudera.com
*Cc:* Guillermo Ortiz konstt2...@gmail.com; user user@spark.apache.org
*Sent:* Saturday, November 22, 2014 4:53 PM
*Subject:* Re: Spark or MR, Scala or Java?
Adding
On Sun, Nov 23, 2014 at 1:03 PM, Ashish Rangole arang...@gmail.com wrote:
Java or Scala: I knew Java already, yet I learnt Scala when I came across
Spark. As others have said, you can get started with a little bit of Scala
and learn more as you progress. Once you have started using Scala for a
Good point.
On the positive side, whether we choose the most efficient mechanism in
Scala might not be as important, as the Spark framework mediates the
distributed computation. Even if there is some declarative part in Spark,
we can still choose an inefficient computation path that is not
A very timely article
http://rahulkavale.github.io/blog/2014/11/16/scrap-your-map-reduce/
Cheers
k/
P.S: Now reply to ALL.
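The "inefficient computation path" caveat above has a classic Spark instance: `groupByKey` ships every raw value across the shuffle, while `reduceByKey` pre-aggregates on each partition first. A minimal sketch of that pre-aggregation idea in plain Java (no Spark dependency; `localCombine` and `merge` are illustrative names, not Spark API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CombineDemo {
    // "Map-side combine": each partition aggregates its own records locally.
    static Map<String, Long> localCombine(List<String> partition) {
        return partition.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    // "Shuffle + reduce": only the small per-partition maps are merged,
    // not every raw record -- this is what reduceByKey buys over groupByKey.
    static Map<String, Long> merge(List<Map<String, Long>> perPartition) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> m : perPartition) {
            m.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("spark", "mr", "spark"),
                List.of("mr", "scala"));
        Map<String, Long> counts = merge(partitions.stream()
                .map(CombineDemo::localCombine)
                .collect(Collectors.toList()));
        System.out.println(counts); // counts: spark=2, mr=2, scala=1
    }
}
```

The framework mediates the distribution either way, but which of the two shapes you pick decides how much data crosses the network, which is the kind of choice the efficiency caveat is about.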
On Sun, Nov 23, 2014 at 7:16 PM, Krishna Sankar ksanka...@gmail.com wrote:
Good point. On the positive side, whether we choose the most efficient
mechanism in Scala might
Subject: Re: Spark or MR, Scala or Java?
This being a very broad topic, a discussion can quickly get subjective. I'll
try not to deviate from my experiences and observations to keep this thread
useful to those looking for answers.
I have used Hadoop MR (with Hive, MR Java APIs
Spark can do MapReduce and more, and do it faster.
One area where using MR would make sense is if you're using something (maybe
like Mahout) that doesn't understand Spark yet (Mahout may be Spark compatible
now...just pulled that name out of thin air!).
You *can* use Spark from Java, but you'd have a
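The API-shape difference the thread keeps circling is visible even without a cluster: here is word count written map/reduce-style in plain Java 8 streams (a sketch only; no Spark or Hadoop dependency, and `count` is just an illustrative name):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCount {
    // Map step: split the text into words; reduce step: count per key.
    static Map<String, Long> count(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(),
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("Spark or MR Spark or Scala"));
    }
}
```

The equivalent classic MapReduce job needs a `Mapper` class, a `Reducer` class, and driver boilerplate; that gap, rather than raw capability, is most of the verbosity being discussed.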
Just to add some more: there are various scenarios where traditional
Hadoop makes more sense than Spark. For example, if you have a long-running
processing job in which you do not want to use too many of the cluster's
resources. Another example could be that you want to run a distributed
MapReduce is simpler and narrower, which also means it is generally lighter
weight, with less to know and configure, and runs more predictably. If you
have a job that is truly just a few maps, with maybe one reduce, MR will
likely be more efficient. Until recently its shuffle has been more
Adding to already interesting answers:
- Is there any case where MR is better than Spark? I don't know in which cases
I should use Spark rather than MR. When is MR faster than Spark?
- Many. MR would be better (am not saying faster ;o)) for:
  - Very large datasets
  - Multistage
Thanks Sean.
Adding user@spark.apache.org again.
On Sat, Nov 22, 2014 at 9:35 PM, Sean Owen so...@cloudera.com wrote:
On Sun, Nov 23, 2014 at 2:20 AM, Soumya Simanta
soumya.sima...@gmail.com wrote:
Is the MapReduce API simpler, or the implementation? Almost every Spark
presentation has a