It would be great if we could take a look at what you are doing in the UDF vs. the Mapper.
A 100x slowdown does not make sense for the same job/logic. It's either the Mapper code, or maybe the cluster was busy at the time you scheduled the MapReduce job?

Thanks,
Prashant

On Tue, Feb 28, 2012 at 4:11 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> I am comparing runtime of similar logic. The entire logic is exactly same
> but surprisingly map reduce job that I submit is 100x slow. For pig I use
> udf and for hadoop I use mapper only and the logic same as pig. Even the
> splits on the admin page are same. Not sure why it's so slow. I am
> submitting job like:
>
> java -classpath
> .:analytics.jar:/hadoop-0.20.2-cdh3u3/lib/*:/root/.mohit/hadoop-0.20.2-cdh3u3/*:common.jar
> com.services.dp.analytics.hadoop.mapred.FormMLProcessor
> /examples/testfile40.seq,/examples/testfile41.seq,/examples/testfile42.seq,/examples/testfile43.seq,/examples/testfile44.seq,/examples/testfile45.seq,/examples/testfile46.seq,/examples/testfile47.seq,/examples/testfile48.seq,/examples/testfile49.seq
> /examples/output1/
>
> How should I go about looking the root cause of why it's so slow? Any
> suggestions would be really appreciated.
>
> One of the things I noticed is that on the admin page of map task list I
> see status as "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728" but
> for pig the status is blank.