Re: Spark replacing Hadoop

2016-04-14 Thread Mich Talebzadeh
One can see from the responses that the Big Data landscape is getting very crowded with tools and there are dozens of alternatives on offer. However, as usual, the laws of selection will gravitate towards solutions that are scalable, reliable and, more importantly, cost-effective. To this end any

Re: Spark replacing Hadoop

2016-04-14 Thread Peyman Mohajerian
Cloud adds another dimension: the fact that in the cloud compute and storage are decoupled (S3/EMR or Blob/HDInsight) means that in the cloud Hadoop ends up being more of a compute engine, and a lot of the governance and security features are irrelevant or less important because the data at rest sits outside Hadoop.
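
For illustration, a minimal Spark (Scala) sketch of that decoupling: the data is read straight from object storage rather than HDFS. The bucket and path are placeholders, and the s3a connector is assumed to be on the classpath.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("S3ReadExample")
    val sc = new SparkContext(conf)

    // Data at rest lives in object storage, not HDFS; Spark only needs the
    // s3a:// connector (credentials via IAM roles or core-site.xml) to read it.
    val logs = sc.textFile("s3a://my-example-bucket/logs/2016/04/*.gz")
    println(logs.count())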

Re: Spark replacing Hadoop

2016-04-14 Thread Cody Koeninger
I've been using spark for years and have (thankfully) been able to avoid needing HDFS, aside from one contract where it was already in use. At this point, many of the people I know would consider Kafka to be more important than HDFS. On Thu, Apr 14, 2016 at 3:11 PM, Jörn Franke
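
As a rough sketch of the Kafka-centric pattern described above, using the Spark 1.6-era direct stream API; the broker address and topic name are placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaWithoutHdfs")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Read directly from Kafka; no HDFS anywhere in the pipeline.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // Count the (key, value) messages arriving in each batch.
    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()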

Re: Spark replacing Hadoop

2016-04-14 Thread Jörn Franke
I do not think so. Hadoop provides an ecosystem in which you can deploy different engines, such as MR, HBase, Tez, Spark, Flink, TitanDB, Hive, Solr... I also observe that commercial analytical tools use one or more of these engines to execute their code in a distributed fashion. You need this

Re: Spark replacing Hadoop

2016-04-14 Thread Sean Owen
Depends indeed on what you mean by "Hadoop". The core Hadoop project is MapReduce, YARN and HDFS. MapReduce is still in use as a workhorse but superseded by engines like Spark (or perhaps Flink). (Tez maps loosely to Spark Core really, and is not really a MapReduce replacement.) "Hadoop" can
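
To make the "superseded" point concrete, here is the canonical MapReduce example (word count) as a short Spark RDD program; a sketch that assumes the input already sits on HDFS at a placeholder path:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    // What needs a Mapper, a Reducer and a driver class in classic MapReduce
    // collapses to a few RDD transformations in Spark.
    sc.textFile("hdfs:///data/input/*.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/output/wordcount")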

Re: Spark replacing Hadoop

2016-04-14 Thread Arunkumar Chandrasekar
Hello, I would stand on the side of Spark. Spark provides numerous add-ons, like Spark SQL and Spark MLlib, that would be hard to set up with MapReduce. Thank You. > On Apr 15, 2016, at 1:16 AM, Ashok Kumar wrote: > > Hello, > > Well, sounds like
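
A minimal sketch of those add-ons side by side, using Spark 1.6-era APIs; the file path and the "userId" and "duration" columns are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext(new SparkConf().setAppName("SqlAndMllib"))
    val sqlContext = new SQLContext(sc)

    // Spark SQL: query structured data with plain SQL, no MapReduce job to write.
    val df = sqlContext.read.json("hdfs:///data/events.json")
    df.registerTempTable("events")
    sqlContext.sql("SELECT userId, COUNT(*) AS hits FROM events GROUP BY userId").show()

    // MLlib: cluster a numeric column in the same application.
    val features = df.select(df("duration").cast("double")).rdd
      .map(r => Vectors.dense(r.getDouble(0)))
    val model = KMeans.train(features, 3, 20)   // k = 3, 20 iterations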

Re: Spark replacing Hadoop

2016-04-14 Thread Ashok Kumar
Hello, Well, sounds like Andy is implying that Spark can replace Hadoop whereas Mich still believes that HDFS is a keeper? Thanks. On Thursday, 14 April 2016, 20:40, David Newberger wrote:

Re: Spark replacing Hadoop

2016-04-14 Thread Felipe Gustavo
Hi Ashok, In my opinion, we should look at Hadoop as a general-purpose framework that supports multiple models, and we should look at Spark as an alternative to Hadoop MapReduce rather than a replacement for the Hadoop ecosystem (for instance, Spark is not replacing ZooKeeper, HDFS, etc.). Regards On

RE: Spark replacing Hadoop

2016-04-14 Thread David Newberger
Can we assume your question is “Will Spark replace Hadoop MapReduce?” or do you literally mean replacing the whole of Hadoop? David From: Ashok Kumar [mailto:ashok34...@yahoo.com.INVALID] Sent: Thursday, April 14, 2016 2:13 PM To: User Subject: Spark replacing Hadoop Hi, I hear that some

Re: Spark replacing Hadoop

2016-04-14 Thread Mich Talebzadeh
Hi, My two cents here. Hadoop, as I understand it, has two components, namely HDFS (Hadoop Distributed File System) and MapReduce. Whatever we use, I still think we need to store data on HDFS (excluding standalone stores like MongoDB etc.). Now moving to MapReduce as the execution engine that is replaced by
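
A small Scala sketch of that division of labour, keeping HDFS as the storage layer while Spark takes over from MapReduce as the execution engine; the paths are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("HdfsStorageSparkEngine"))
    val sqlContext = new SQLContext(sc)

    // The data stays on HDFS exactly as it would for a MapReduce job...
    val trades = sqlContext.read.parquet("hdfs:///warehouse/trades")

    // ...only the execution engine changes: the aggregation runs on Spark, not MR.
    trades.groupBy("symbol").count()
      .write.parquet("hdfs:///warehouse/trade_counts")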

Re: Spark replacing Hadoop

2016-04-14 Thread Andy Davidson
Hi Ashok In general, if I were starting a new project and had not invested heavily in Hadoop (i.e. had a large staff that was trained on Hadoop, had a lot of existing projects implemented on Hadoop, ...) I would probably start using Spark. It's faster and easier to use. Your mileage may vary Andy