Re: Comparative study

2014-07-07 Thread Nabeel Memon
For a Scala API on MapReduce (the Hadoop engine), there's a library called
Scalding. It's built on top of Cascading. If you have a huge dataset, or
if you have any other reason to run your job on the MapReduce engine, you
can try Scalding.

However, Spark vs Impala doesn't make sense to me. It should really have been
Shark vs Impala. Both are SQL query engines: Shark is built on top of Spark,
while Impala runs on Hadoop (querying data in HDFS with its own execution
engine rather than MapReduce).


On Mon, Jul 7, 2014 at 4:06 PM, santosh.viswanat...@accenture.com wrote:

  Thanks Daniel for sharing this info.



 Regards,
 Santosh Karthikeyan



 *From:* Daniel Siegmann [mailto:daniel.siegm...@velos.io]
 *Sent:* Tuesday, July 08, 2014 1:10 AM
 *To:* user@spark.apache.org
 *Subject:* Re: Comparative study



 From a development perspective, I vastly prefer Spark to MapReduce. The
 MapReduce API is very constrained; Spark's API feels much more natural to
 me. Testing and local development are also very easy - creating a local
 Spark context is trivial and it reads local files. For your unit tests you
 can just have them create a local context and execute your flow with some
 test data. Even better, you can do ad-hoc work in the Spark shell and if
 you want that in your production code it will look exactly the same.
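
To make the comparison concrete, here is a toy, in-memory sketch of the map / shuffle / reduce structure that a MapReduce job imposes, run against a few lines of test data. It is plain Python and uses neither Hadoop nor Spark; every function name in it is made up for illustration. In Spark, the same word count would typically be a short chain of transformations on a local context.

```python
from collections import defaultdict


def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: collapse the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical test data standing in for a small local input file.
    test_data = ["spark and hadoop", "Spark and Shark"]
    print(sorted(word_count(test_data).items()))
    # prints [('and', 2), ('hadoop', 1), ('shark', 1), ('spark', 2)]
```

Because the flow is an ordinary function over a list of lines, a unit test can call it directly with a handful of records, which is the same idea as running a flow on a local context with test data.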

 Unfortunately, the picture isn't so rosy when it gets to production. In my
 experience, Spark simply doesn't scale to the volumes that MapReduce will
 handle. Not with a Standalone cluster anyway - maybe Mesos or YARN would be
 better, but I haven't had the opportunity to try them. I find jobs tend to
 just hang forever for no apparent reason on large data sets (but smaller
 than what I push through MapReduce).

 I am hopeful the situation will improve - Spark is developing quickly -
 but if you have large amounts of data you should proceed with caution.

 Keep in mind there are some frameworks for Hadoop which can hide the ugly
 MapReduce with something very similar in form to Spark's API; e.g. Apache
 Crunch. So you might consider those as well.

 (Note: the above is with Spark 1.0.0.)





 On Mon, Jul 7, 2014 at 11:07 AM, santosh.viswanat...@accenture.com
 wrote:

 Hello Experts,



 I am doing a comparative study of the following:



 Spark vs Impala

 Spark vs MapReduce. Is it worth migrating from an existing MR implementation
 to Spark?





 Please share your thoughts and expertise.





 Thanks,
 Santosh


 --

 Daniel Siegmann, Software Developer
 Velos

 Accelerating Machine Learning


 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
 E: daniel.siegm...@velos.io W: www.velos.io





Re: AmpCamp exercise in a local environment

2014-04-23 Thread Nabeel Memon
Thanks a lot Arpit. It's really helpful.


On Fri, Apr 18, 2014 at 4:24 AM, Arpit Tak arpit.sparku...@gmail.com wrote:

 Download the Cloudera VM from here:


 https://drive.google.com/file/d/0B7zn-Mmft-XcdTZPLXltUjJyeUE/edit?usp=sharing

 Regards,
 Arpit Tak


 On Fri, Apr 18, 2014 at 1:20 PM, Arpit Tak arpit.sparku...@gmail.com wrote:

 Hi Nabeel,

 I have a Cloudera VM with both Spark and Shark installed on it.
 You can download it and play around with it. I have also put some sample
 data and some tables in HDFS.

 You can try out those examples. Instructions on how to use it are in the
 docs.


 https://drive.google.com/file/d/0B0Q4Le4DZj5iSndIcFBfQlcxM1NlV3RNN3YzU1dOT1ZjZHJJ/edit?usp=sharing

 For the AmpCamp exercises, though, you only need EC2 to get the wiki data
 onto your HDFS. So I have uploaded a file (50 MB). Just download it and put
 it on HDFS, and you can work through these exercises.


 https://drive.google.com/a/mobipulse.in/uc?id=0B0Q4Le4DZj5iNUdSZXpFTUJEU0E&export=download

 You will love it...

 Regards,
 Arpit Tak


 On Tue, Apr 15, 2014 at 4:28 AM, Nabeel Memon nm3...@gmail.com wrote:

 Hi. I found the AmpCamp exercises to be a nice way to get started with Spark.
 However, they require Amazon EC2 access. Has anyone put together a VM or
 Docker scripts that set up the same environment locally for working through
 those labs?

 It would be really helpful. Thanks.






AmpCamp exercise in a local environment

2014-04-14 Thread Nabeel Memon
Hi. I found the AmpCamp exercises to be a nice way to get started with Spark.
However, they require Amazon EC2 access. Has anyone put together a VM or
Docker scripts that set up the same environment locally for working through
those labs?

It would be really helpful. Thanks.