Hello experts, I am recently testing a set of logs that I converted to avro format in hadoop. I am notice really really slow performance when compare to raw logs. The map logs showing below seems to indicate setting up JVM took the longest time. I am wondering if there is anything I can tweak in my
2011-03-02 08:41:21,887 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2011-03-02 08:43:53,684 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100 2011-03-02 08:43:53,737 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720 2011-03-02 08:43:53,737 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680 2011-03-02 08:44:05,057 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true 2011-03-02 08:44:05,057 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 11272192; bufvoid = 99614720 2011-03-02 08:44:05,057 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680 2011-03-02 08:44:05,703 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library 2011-03-02 08:44:05,704 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 2011-03-02 08:44:05,706 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor 2011-03-02 08:44:08,103 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0 2011-03-02 08:44:16,281 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true 2011-03-02 08:44:16,281 INFO org.apache.hadoop.mapred.MapTask: bufstart = 11272192; bufend = 22544341; bufvoid = 99614720 2011-03-02 08:44:16,281 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680 2011-03-02 08:44:18,921 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1 2011-03-02 08:44:27,029 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true 2011-03-02 08:44:27,029 INFO org.apache.hadoop.mapred.MapTask: bufstart = 22544341; bufend = 33816490; bufvoid = 99614720 2011-03-02 08:44:27,030 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680 2011-03-02 08:44:29,454 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2 2011-03-02 08:44:41,450 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true 2011-03-02 08:44:41,450 INFO org.apache.hadoop.mapred.MapTask: bufstart = 33816490; bufend = 45088639; bufvoid = 99614720 2011-03-02 08:44:41,450 INFO org.apache.hadoop.mapred.MapTask: kvstart = 131070; kvend = 65533; length = 327680 2011-03-02 08:44:43,740 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3 2011-03-02 08:44:51,010 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true 2011-03-02 08:44:51,010 INFO org.apache.hadoop.mapred.MapTask: bufstart = 45088639; bufend = 56360831; bufvoid = 99614720 2011-03-02 08:44:51,010 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65533; kvend = 327677; length = 327680 2011-03-02 08:44:53,066 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4 2011-03-02 08:44:53,419 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output 2011-03-02 08:44:54,079 INFO org.apache.hadoop.mapred.MapTask: Finished spill 5 2011-03-02 08:44:54,085 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:54,091 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2011-03-02 08:44:54,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2011-03-02 08:44:54,093 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2011-03-02 08:44:54,094 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2011-03-02 08:44:54,095 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2011-03-02 08:44:54,096 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor 2011-03-02 08:44:54,096 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 476686 bytes 2011-03-02 08:44:54,519 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:54,523 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477643 bytes 2011-03-02 08:44:54,882 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:54,886 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 474730 bytes 2011-03-02 08:44:55,241 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:55,245 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 478581 bytes 2011-03-02 08:44:55,647 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:55,651 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 479353 bytes 2011-03-02 08:44:56,011 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:56,015 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 474061 bytes 2011-03-02 08:44:56,367 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:56,371 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 476289 bytes 2011-03-02 08:44:56,725 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:56,729 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477047 bytes 2011-03-02 08:44:57,082 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:57,086 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 474144 bytes 2011-03-02 08:44:57,437 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:57,441 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 479785 bytes 2011-03-02 08:44:57,819 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:57,823 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 478787 bytes 2011-03-02 08:44:58,177 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:58,182 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 474947 bytes 2011-03-02 08:44:58,531 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:58,535 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 476642 bytes 2011-03-02 08:44:58,890 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:58,894 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 474380 bytes 2011-03-02 08:44:59,248 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:59,252 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 474939 bytes 2011-03-02 08:44:59,610 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:59,614 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 476284 bytes 2011-03-02 08:44:59,964 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:44:59,968 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477268 bytes 2011-03-02 08:45:00,321 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:00,325 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477738 bytes 2011-03-02 08:45:00,680 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:00,684 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477485 bytes 2011-03-02 08:45:01,038 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:01,042 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477476 bytes 2011-03-02 08:45:01,395 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:01,399 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 475864 bytes 2011-03-02 08:45:01,749 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:01,753 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 476185 bytes 2011-03-02 08:45:02,105 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:02,109 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 477883 bytes 2011-03-02 08:45:02,463 INFO org.apache.hadoop.mapred.Merger: Merging 6 sorted segments 2011-03-02 08:45:02,467 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 478753 bytes 2011-03-02 08:45:02,830 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201102071809_6255_m_000008_0 is done. And is in the process of commiting 2011-03-02 08:45:02,833 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201102071809_6255_m_000008_0' done. When I ran the job with raw logs I see 2011-03-02 10:40:56,113 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId= 2011-03-02 10:41:01,514 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100 2011-03-02 10:41:01,579 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720 2011-03-02 10:41:01,579 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680 2011-03-02 10:42:01,446 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true 2011-03-02 10:42:01,446 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 11272192; bufvoid = 99614720 notice the io.sort.mb in avro format takes more than 2 minutes while the raw version is almost instant. So what is going on there?
