Re: Spark vs Tez

2014-10-17 Thread kartik saxena
I did a performance benchmark during my summer internship. I am currently
a grad student. I can't reveal much about the specific project, but Spark was
still faster than roughly the 4th-5th iteration of Tez on the same query/dataset.
By "iteration" I mean utilizing the hot-container property of Apache Tez.
See the latest release of Tez and some Hortonworks tutorials on their website.

The only problem with Spark adoption is the steep learning curve of Scala,
and understanding the API properly.

Thanks

On Fri, Oct 17, 2014 at 11:06 AM, Adaryl Bob Wakefield, MBA 
adaryl.wakefi...@hotmail.com wrote:

   Does anybody have any performance figures on how Spark stacks up
 against Tez? If you don’t have figures, does anybody have an opinion? Spark
 seems so popular but I’m not really seeing why.
 B.



Re: The future of MapReduce

2014-07-01 Thread kartik saxena
Spark (https://spark.apache.org/) is also getting a lot of attention with its
in-memory computation and caching features. Performance-wise it is being
touted as better than Mahout, because machine learning involves iterative
computations and Spark can cache those intermediate results in memory for
faster processing.
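To make the "iterative computation" point concrete, here is a minimal pure-Python sketch (not actual Spark code; the dataset and names are made up) of why caching matters: without it, every iteration pays the load cost again, which is roughly what MapReduce-style pipelines do, while caching pays it once, as with Spark's `rdd.cache()`.

```python
# Stand-in for an expensive dataset load (hypothetical, not real Spark).
load_count = 0

def load_dataset():
    global load_count
    load_count += 1          # track how many times we pay the load cost
    return list(range(1000))

def sum_of_squares_uncached(iterations):
    """MapReduce-style: each pass re-reads the input from storage."""
    return sum(sum(x * x for x in load_dataset()) for _ in range(iterations))

def sum_of_squares_cached(iterations):
    """Spark-style: materialize the data once (cf. rdd.cache()), reuse it."""
    data = load_dataset()
    return sum(sum(x * x for x in data) for _ in range(iterations))

sum_of_squares_uncached(5)   # pays the load cost 5 times
print(load_count)            # → 5
load_count = 0
sum_of_squares_cached(5)     # pays it once
print(load_count)            # → 1
```

Both versions compute the same result; only the number of loads differs, which is the gap caching closes for iterative ML workloads.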


On Tue, Jul 1, 2014 at 11:07 AM, Adaryl Bob Wakefield, MBA 
adaryl.wakefi...@hotmail.com wrote:

   From your answer, it sounds like you need to be able to do both.

  *From:* Marco Shaw marco.s...@gmail.com
 *Sent:* Tuesday, July 01, 2014 10:24 AM
 *To:* user user@hadoop.apache.org
 *Subject:* Re: The future of MapReduce

  It depends...  It seems most are evolving from needing lots of data
 crunched, to lots of data crunched right now.  Most are looking for
 *real-time* fraud detection or recommendations, for example, which
 MapReduce is not ideal for.

 Marco


 On Tue, Jul 1, 2014 at 12:00 PM, Adaryl Bob Wakefield, MBA 
 adaryl.wakefi...@hotmail.com wrote:

   “The Mahout community decided to move its codebase onto modern data
 processing systems that offer a richer programming model and more efficient
 execution than Hadoop MapReduce.”

 Does this mean that learning MapReduce is a waste of time? Is Storm the
 future or are both technologies necessary?

 B.





Re: HDFS data transfer!

2009-06-10 Thread kartik saxena
I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB
file.
Secura

On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
sugandha@gmail.com wrote:

 Hello!

 If I try to transfer a 5GB VDI file from a remote host(not a part of hadoop
 cluster) into HDFS, and get it back, how much time is it supposed to take?

 No map-reduce involved. Simply writing files in and out of HDFS through a
 simple piece of Java code (usage of the APIs).

 --
 Regards!
 Sugandha
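Not from the thread itself, but the honest answer is "it depends on sustained throughput between the remote host and the cluster." A back-of-envelope estimator (the 10 MB/s figure below is an assumed example rate, not a measurement):

```python
def transfer_time_hours(size_gb, throughput_mb_per_s):
    """Rough time to move size_gb of data at a sustained rate in MB/s."""
    seconds = (size_gb * 1024) / throughput_mb_per_s
    return seconds / 3600

# A 5 GB file at an assumed sustained 10 MB/s:
print(round(transfer_time_hours(5, 10), 2))  # → 0.14 (under ten minutes)
```

At 1 MB/s the same transfer takes about 1.4 hours, which is why measured link throughput, not file size alone, decides whether "2-3 hours" is realistic.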



Indexing on top of Hadoop

2009-06-10 Thread kartik saxena
Hi,

I have a huge LDIF file, on the order of GBs, spanning some million user
records. I am running the example Grep job on that file. The search results
have not really been up to expectations, because it is a basic per-line,
brute-force scan.

I was thinking of building some indexes inside HDFS for that file, so that
the search results could improve. What could I try in order to achieve this?


Secura
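To sketch the idea behind indexing (this is an illustrative in-memory inverted index over lines; the record contents are made up, and a real solution would persist the index rather than hold it in memory):

```python
from collections import defaultdict

def build_index(lines):
    """Map each whitespace token to the set of line numbers containing it."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in line.lower().split():
            index[token].add(lineno)
    return index

def lookup(index, lines, token):
    """Return matching lines via the index, without a full brute-force scan."""
    return [lines[i] for i in sorted(index.get(token.lower(), ()))]

# Hypothetical LDIF-style records:
records = [
    "dn: uid=alice,ou=people",
    "mail: alice@example.com",
    "dn: uid=bob,ou=people",
]
idx = build_index(records)
print(lookup(idx, records, "mail:"))  # → ['mail: alice@example.com']
```

The point of the index is that a lookup touches only the lines listed for a token, instead of re-reading every record the way a per-line Grep job does.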