I am comparing the runtime of the same logic in Pig and in a plain
MapReduce job. The logic is exactly the same, but surprisingly the
MapReduce job that I submit is 100x slower. For Pig I use a UDF, and for
Hadoop I use a mapper only, with the same logic as in Pig. Even the
splits shown on the admin page are the same. I am not sure why it is so
slow. I am submitting the job like this:

java -classpath \
  .:analytics.jar:/hadoop-0.20.2-cdh3u3/lib/*:/root/.mohit/hadoop-0.20.2-cdh3u3/*:common.jar \
  com.services.dp.analytics.hadoop.mapred.FormMLProcessor \
  /examples/testfile40.seq,/examples/testfile41.seq,/examples/testfile42.seq,/examples/testfile43.seq,/examples/testfile44.seq,/examples/testfile45.seq,/examples/testfile46.seq,/examples/testfile47.seq,/examples/testfile48.seq,/examples/testfile49.seq \
  /examples/output1/

How should I go about finding the root cause of why it is so slow? Any
suggestions would be really appreciated.



One thing I noticed is that on the admin page, in the map task list, the
status for my job shows as
"hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728", but for Pig the
status is blank.
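
For reference, here is a minimal sketch of how that status string can be
read, assuming it follows a "path:start+length" layout (the file, the byte
offset where the split starts, and the split length in bytes). That layout
is my assumption based on the string itself, and SplitStatus is a
hypothetical helper written for this mail, not a Hadoop class.

```java
// Hypothetical helper (not part of Hadoop): decodes a map task status of the
// assumed form "<path>:<start>+<length>", e.g.
// "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728".
public class SplitStatus {
    public final String path;   // input file the split belongs to
    public final long start;    // byte offset where the split begins
    public final long length;   // split length in bytes

    public SplitStatus(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    public static SplitStatus parse(String status) {
        // Use the last ':' and last '+' so the "hdfs://host:port" part of
        // the path is not mistaken for the separator.
        int colon = status.lastIndexOf(':');
        int plus = status.lastIndexOf('+');
        if (colon < 0 || plus < colon) {
            throw new IllegalArgumentException("Unexpected status: " + status);
        }
        String path = status.substring(0, colon);
        long start = Long.parseLong(status.substring(colon + 1, plus));
        long length = Long.parseLong(status.substring(plus + 1));
        return new SplitStatus(path, start, length);
    }

    // Split length expressed in MiB (1 MiB = 1024 * 1024 bytes).
    public double lengthMiB() {
        return length / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        SplitStatus s = parse(
            "hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728");
        System.out.println(s.path + " start=" + s.start
            + " length=" + s.length + " bytes (" + s.lengthMiB() + " MiB)");
    }
}
```

Read this way, the status above would describe a split that starts at
offset 0 of testfile40.seq and covers 134217728 bytes, which is 128 MiB.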
