Re: Spark or MR, Scala or Java?

2014-11-23 Thread Sanjay Subramanian
; user user@spark.apache.org Sent: Saturday, November 22, 2014 4:53 PM Subject: Re: Spark or MR, Scala or Java? Adding to already interesting answers: - Is there any case where MR is better than Spark? I don't know in which cases I should use Spark versus MR. When is MR faster than Spark

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Ashish Rangole
? sanjay -- *From:* Krishna Sankar ksanka...@gmail.com *To:* Sean Owen so...@cloudera.com *Cc:* Guillermo Ortiz konstt2...@gmail.com; user user@spark.apache.org *Sent:* Saturday, November 22, 2014 4:53 PM *Subject:* Re: Spark or MR, Scala or Java? Adding

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Ognen Duzlevski
On Sun, Nov 23, 2014 at 1:03 PM, Ashish Rangole arang...@gmail.com wrote: Java or Scala: I knew Java already, yet I learnt Scala when I came across Spark. As others have said, you can get started with a little bit of Scala and learn more as you progress. Once you have started using Scala for a

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Krishna Sankar
Good point. On the positive side, whether we choose the most efficient mechanism in Scala might not be as important, as the Spark framework mediates the distributed computation. Even if there is some declarative part in Spark, we can still choose an inefficient computation path that is not

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Krishna Sankar
A very timely article http://rahulkavale.github.io/blog/2014/11/16/scrap-your-map-reduce/ Cheers k/ P.S: Now reply to ALL. On Sun, Nov 23, 2014 at 7:16 PM, Krishna Sankar ksanka...@gmail.com wrote: Good point. On the positive side, whether we choose the most efficient mechanism in Scala might

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Sanjay Subramanian
:03 AM Subject: Re: Spark or MR, Scala or Java? This being a very broad topic, a discussion can quickly get subjective. I'll try not to deviate from my experiences and observations to keep this thread useful to those looking for answers. I have used Hadoop MR (with Hive, MR Java apis

RE: Spark or MR, Scala or Java?

2014-11-22 Thread Ashic Mahtab
Spark can do Map Reduce and more, and faster. One area where using MR would make sense is if you're using something (maybe like Mahout) that doesn't understand Spark yet (Mahout may be Spark compatible now...just pulled that name out of thin air!). You *can* use Spark from Java, but you'd have a

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Denny Lee
Just to add some more stuff - there are various scenarios where traditional Hadoop makes more sense than Spark. For example, if you have a long running processing job in which you do not want to utilize too many resources of the cluster. Another example could be that you want to run a distributed

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Sean Owen
MapReduce is simpler and narrower, which also means it is generally lighter weight, with less to know and configure, and runs more predictably. If you have a job that is truly just a few maps, with maybe one reduce, MR will likely be more efficient. Until recently its shuffle has been more

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Krishna Sankar
Adding to already interesting answers: - Is there any case where MR is better than Spark? I don't know in which cases I should use Spark versus MR. When is MR faster than Spark? - Many. MR would be better (am not saying faster ;o)) for - Very large dataset, - Multistage

Re: Spark or MR, Scala or Java?

2014-11-22 Thread Soumya Simanta
Thanks Sean. Adding user@spark.apache.org again. On Sat, Nov 22, 2014 at 9:35 PM, Sean Owen so...@cloudera.com wrote: On Sun, Nov 23, 2014 at 2:20 AM, Soumya Simanta soumya.sima...@gmail.com wrote: Is the MapReduce API simpler or the implementation? Almost every Spark presentation has a