It would be great if we could take a look at what you are doing in the UDF vs.
the Mapper.

A 100x slowdown does not make sense for the same job/logic. It is either the
Mapper code, or maybe the cluster was busy at the time you scheduled the
MapReduce job?

Thanks,
Prashant

On Tue, Feb 28, 2012 at 4:11 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> I am comparing the runtime of similar logic. The logic is exactly the same,
> but surprisingly the MapReduce job that I submit is 100x slower. For Pig I
> use a UDF, and for Hadoop I use a mapper only, with the same logic as in
> Pig. Even the splits on the admin page are the same. Not sure why it's so
> slow. I am submitting the job like this:
>
> java -classpath
>
> .:analytics.jar:/hadoop-0.20.2-cdh3u3/lib/*:/root/.mohit/hadoop-0.20.2-cdh3u3/*:common.jar
> com.services.dp.analytics.hadoop.mapred.FormMLProcessor
>
> /examples/testfile40.seq,/examples/testfile41.seq,/examples/testfile42.seq,/examples/testfile43.seq,/examples/testfile44.seq,/examples/testfile45.seq,/examples/testfile46.seq,/examples/testfile47.seq,/examples/testfile48.seq,/examples/testfile49.seq
> /examples/output1/
>
> How should I go about finding the root cause of why it's so slow? Any
> suggestions would be really appreciated.
>
>
>
> One thing I noticed is that on the admin page, in the map task list, I
> see the status as "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728" but
> for Pig the status is blank.
>
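
A note on the status string quoted above: it appears to be the textual form
of the task's input split, that is <path>:<start>+<length>, so this task
would be reading 134217728 bytes (128 MiB) starting at offset 0 of
testfile40.seq. The sketch below only illustrates how to read that string.
SplitStatus is a hypothetical helper written for this note, not a Hadoop
class, and the path:start+length layout is an assumption based on the one
example in this thread.

```java
// Hypothetical helper (not part of Hadoop): breaks a task status string of
// the assumed form "<path>:<start>+<length>" into its three parts.
final class SplitStatus {
    final String path;
    final long start;
    final long length;

    SplitStatus(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    static SplitStatus parse(String status) {
        // Use the LAST ':' and '+', because the path itself contains ':'
        // (scheme separator and port number).
        int colon = status.lastIndexOf(':');
        int plus = status.lastIndexOf('+');
        if (colon < 0 || plus < colon) {
            throw new IllegalArgumentException("not a split status: " + status);
        }
        String path = status.substring(0, colon);
        long start = Long.parseLong(status.substring(colon + 1, plus));
        long length = Long.parseLong(status.substring(plus + 1));
        return new SplitStatus(path, start, length);
    }

    // Split length in whole mebibytes (integer division).
    long lengthMiB() {
        return length / (1024L * 1024L);
    }

    public static void main(String[] args) {
        SplitStatus s = parse(
            "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728");
        System.out.println(s.path + " start=" + s.start
            + " length=" + s.length + " (" + s.lengthMiB() + " MiB)");
    }
}
```

If the splits listed for the Pig job cover the same files with the same
offsets and lengths, the two jobs are reading the same input, and the
difference is more likely to be in the per-record work than in how the input
is divided.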
