Re: Spark vs Tez
I did a performance benchmark during my summer internship (I am currently a grad student). I can't reveal much about the specific project, but Spark was still faster than roughly the 4th-5th iteration of the same query/dataset on Tez. By "iteration" I mean re-running the query so as to exploit the hot-container (container reuse) property of Apache Tez; see the latest Tez release and some Hortonworks tutorials on their website. The only real obstacle to Spark adoption is the steep learning curve of Scala, and understanding the API properly.

Thanks

On Fri, Oct 17, 2014 at 11:06 AM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote:

Does anybody have any performance figures on how Spark stacks up against Tez? If you don't have figures, does anybody have an opinion? Spark seems so popular but I'm not really seeing why.

B.
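The benchmark's point about repeated runs can be sketched conceptually. This is a plain-Python simulation of Spark-style caching, not actual Spark code; the class and function names are illustrative only:

```python
# Plain-Python sketch of why repeated queries favor an in-memory cache:
# the first run pays the full load cost, later runs reuse the cached data.

load_count = 0

def load_dataset():
    """Stand-in for an expensive scan of the dataset (e.g. from HDFS)."""
    global load_count
    load_count += 1
    return list(range(1_000_000))

class CachedRDD:
    """Toy analogue of calling .cache() on a Spark RDD."""
    def __init__(self, loader):
        self._loader = loader
        self._data = None

    def collect(self):
        if self._data is None:      # first access: materialize the data
            self._data = self._loader()
        return self._data           # later accesses: served from memory

rdd = CachedRDD(load_dataset)
for _ in range(5):                  # five "iterations" of the same query
    total = sum(rdd.collect())

print(load_count)                   # the expensive load happened only once
```

Tez's container reuse saves JVM startup cost between runs, whereas Spark additionally keeps the dataset itself resident in memory, which is where the per-iteration gap comes from.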
Re: The future of MapReduce
Spark https://spark.apache.org/ is also getting a lot of attention with its in-memory computation and caching features. Performance-wise it is touted as better than Mahout, because machine learning involves iterative computations and Spark can cache intermediate results in memory for faster processing.

On Tue, Jul 1, 2014 at 11:07 AM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote:

From your answer, it sounds like you need to be able to do both.

From: Marco Shaw marco.s...@gmail.com
Sent: Tuesday, July 01, 2014 10:24 AM
To: user user@hadoop.apache.org
Subject: Re: The future of MapReduce

It depends... It seems most are evolving from needing lots of data crunched to needing lots of data crunched right now. Most are looking for real-time fraud detection or recommendations, for example, which MapReduce is not ideal for.

Marco

On Tue, Jul 1, 2014 at 12:00 PM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote:

"The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce." Does this mean that learning MapReduce is a waste of time? Is Storm the future, or are both technologies necessary?

B.
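The iterative-workload point is the crux: each pass of a typical ML algorithm re-scans the same data, so keeping it in memory pays off on every pass. A minimal plain-Python sketch of such an iterative computation (illustrative only, not Mahout or Spark code; the dataset and learning rate are made up):

```python
# Gradient descent for a one-parameter linear model y = w * x.
# Every iteration scans the full dataset. If that dataset lived on disk
# (the MapReduce/Mahout model), each pass would pay the I/O cost again,
# which is exactly the cost Spark's in-memory caching avoids.

data = [(x, 3.0 * x) for x in range(1, 11)]   # points on the line y = 3x
w = 0.0
lr = 0.005

for _ in range(200):                           # 200 full passes over the data
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))                             # converges toward 3.0
```

With on-disk MapReduce, each of those 200 passes is a separate job reading its input from HDFS; with the data cached in memory, only the first pass touches storage.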
Re: HDFS data transfer!
I would guess about 2-3 hours. It took me some 2 days to load a 160 GB file.

Secura

On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar sugandha@gmail.com wrote:

Hello! If I try to transfer a 5 GB VDI file from a remote host (not part of the Hadoop cluster) into HDFS, and then get it back, how much time is it supposed to take? No MapReduce involved; simply writing files into and out of HDFS through simple Java code (using the API).

--
Regards!
Sugandha
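A rough answer is just size divided by effective throughput; the real unknowns are network bandwidth and HDFS write overhead (pipelined replication to multiple datanodes costs extra). A back-of-envelope sketch in Python, where the throughput figures are assumptions, not measurements:

```python
def transfer_time_seconds(size_gib, throughput_mib_per_s):
    """Naive estimate: ignores replication overhead, seeks, and contention."""
    size_mib = size_gib * 1024
    return size_mib / throughput_mib_per_s

# 5 GiB over a ~1 MiB/s effective link
# (roughly what "160 GB in 2 days" works out to)
slow = transfer_time_seconds(5, 1.0)      # 5120 s, i.e. about 1.4 hours
# the same file assuming a gigabit LAN at ~100 MiB/s effective
fast = transfer_time_seconds(5, 100.0)    # about 51 s

print(round(slow / 3600, 2), round(fast, 1))
```

So the spread between "minutes" and "hours" is almost entirely the network path, not HDFS itself.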
Indexing on top of Hadoop
Hi,

I have a huge LDIF file, on the order of GBs, spanning some million user records. I am running the example Grep job on that file. The search results have not really been up to expectations, because it is a basic per-line, brute-force scan. I was thinking of building some indexes inside HDFS for that file, so that search performance could improve. What could I try in order to achieve this?

Secura
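One common approach is to build an inverted index over the file in a single MapReduce pass, mapping each term to the record offsets that contain it, so later searches become index lookups instead of full scans. A plain-Python sketch of the map/reduce logic (the tokenization and sample records are illustrative; a real job would run these as mapper and reducer scripts, e.g. via Hadoop Streaming):

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit (term, line_offset) pairs for each input line."""
    for offset, line in enumerate(lines):
        for term in line.lower().split():
            yield term, offset

def reduce_phase(pairs):
    """Reducer: group offsets by term -> the inverted index."""
    index = defaultdict(set)
    for term, offset in pairs:
        index[term].add(offset)
    return index

# Toy stand-in for a few LDIF-like records
# (real input would be the GB-scale file on HDFS).
records = [
    "dn: uid=alice,ou=people",
    "dn: uid=bob,ou=people",
    "mail: alice@example.com",
]

index = reduce_phase(map_phase(records))
print(sorted(index["dn:"]))   # offsets of lines containing the term "dn:"
```

The index itself can be written back to HDFS (e.g. as MapFiles or SequenceFiles keyed by term), so a query reads only the postings for its terms rather than grepping every line.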