Re: Problems with MR Job running really slowly

Florin P Sun, 06 Nov 2011 23:59:36 -0800

Hello!
   The advice I gave was based on my own experience. It helped me solve my
issues when I was sending a lot of data to the reducers.
  I can offer just three more suggestions, and I hope that the real experts
from Hadoop and Cloudera can help you. I'm particularly interested in this
subject too.
1. Set speculative task execution to false
(mapred.reduce.tasks.speculative.execution=false).
2. Check that you have enough space on the HDD (Hadoop uses some temporary
folders to write data; please check the documentation about them).
3. Check whether your processing algorithm has memory leaks. Run the algorithm
in a separate program and profile it with JProfiler or another profiling tool.
It seems to me that you have an OutOfMemoryError.
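
For point 2, a quick free-space check can be scripted. The following is a
minimal sketch in Python and is not part of Hadoop. The function name
`free_gib` and the example path are hypothetical; substitute the directories
your cluster actually uses for temporary data (for example, those configured
in hadoop.tmp.dir or mapred.local.dir).

```python
import shutil

def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30

if __name__ == "__main__":
    # Hypothetical list: replace with the directories your TaskTrackers write to.
    for directory in ["."]:
        print(f"{directory}: {free_gib(directory):.1f} GiB free")
```

Running it on every node before a large job shows whether the map-side spill
files and the reducers' copied map output are likely to fit.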


Hope this helps.

Good luck,
  Florin


--- On Mon, 11/7/11, Steve Lewis <lordjoe2...@gmail.com> wrote:

From: Steve Lewis <lordjoe2...@gmail.com>
Subject: Re: Problems with MR Job running really slowly
To: mapreduce-user@hadoop.apache.org
Date: Monday, November 7, 2011, 12:13 AM

1) I am varying both the number of mappers and reducers, trying to determine
three things:
   a) Which options the mappers and reducers need in order to
      - not have mappers or reducers killed with "GC overhead limit exceeded"
      - minimize execution time for the cluster
   I use a custom splitter and can adjust the block size to get anywhere from
   1 mapper to hundreds of mappers.
   For an 8-node cluster I am trying 8, 16 and 24 reducers.

2) I have been playing with io.sort.factor, using 100 for now.
   io.sort.mb is 400; with much higher values the job will not run.

3) I set the child JVM opts (mapred.child.java.opts) to -Xmx3000m (getting
   similar results to using -Xmx1300m).

4) My mapper reads a single file about 1 GB in size. Each item the splitter
delivers (about 1 KB) generates tens of thousands of key-value pairs (<100
bytes per value). I can do all the work of generating the output (but not the
shuffle and sort) in about an hour on one box, but my job is running for many
hours without completing. I also got a lot of the exception below, after
seeing this in other tasks:
Lost task tracker:
tracker_glados4.systemsbiology.net:localhost.localdomain/127.0.0.1:32790
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.write(BZip2Codec.java:200)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:263)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:243)
        at org.apache.hadoop.mapred.IFile$Writer.close(IFile.java:126)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1242)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)

The same job runs well with a smaller data set. Most of the reason for moving
to Hadoop is to allow solutions to scale, and I am very concerned at how badly
my larger cases are doing. The documentation does not say much about tuning
parameters for my larger jobs without running into swap hell or
"GC overhead limit exceeded".
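
The trade-off between "GC overhead limit exceeded" and swapping can be
estimated with simple arithmetic. The sketch below is only a
back-of-the-envelope calculation in Python. It assumes that the map-side sort
buffer (io.sort.mb) is allocated inside the child JVM heap, and that every
task slot may run a child JVM at its full -Xmx at the same time. The slot
counts are the defaults of 2 map and 2 reduce slots quoted further down in
this thread, and the function names are made up for illustration.

```python
def heap_left_for_task_mb(xmx_mb, io_sort_mb):
    """Heap remaining for the mapper's own objects once the sort buffer is taken."""
    if io_sort_mb >= xmx_mb:
        raise ValueError("io.sort.mb must be smaller than the child heap (-Xmx)")
    return xmx_mb - io_sort_mb

def worst_case_heap_per_node_mb(xmx_mb, map_slots=2, reduce_slots=2):
    """Upper bound on the heap all child JVMs on one node can claim at once."""
    return xmx_mb * (map_slots + reduce_slots)

if __name__ == "__main__":
    for xmx in (1300, 3000):
        print(f"-Xmx{xmx}m, io.sort.mb=400: "
              f"{heap_left_for_task_mb(xmx, 400)} MB left per map task, "
              f"up to {worst_case_heap_per_node_mb(xmx)} MB of heap per node")
```

If the per-node figure exceeds the physical memory of a node, the node is
likely to swap. If the per-task figure is too small for the objects the mapper
holds, the JVM is likely to spend most of its time in garbage collection.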
On Sun, Nov 6, 2011 at 7:11 AM, Florin P <florinp...@yahoo.com> wrote:

Hello!



  How many reducers are you using?

  Regarding the performance parameters, first you can increase the value of the
io.sort.mb parameter.

  It seems that you are sending a large amount of data to the reducers. With a
larger value for this parameter, the framework is forced to spill map output
to the HDD less often, and that spilling could be one reason the process is
slow.
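
To see why a larger io.sort.mb can help, here is a rough estimate of how many
spill files a single map task produces. This is a simplified sketch in Python,
not the exact accounting Hadoop does: it assumes a spill is triggered when the
buffer is 80% full (the default io.sort.spill.percent of 0.80), and it ignores
per-record bookkeeping space and compression. The function name is
hypothetical. The 5,517,423,780 bytes are the "Map output bytes" counter from
the original message quoted below, and 100 is the default io.sort.mb.

```python
import math

def estimated_spills(map_output_bytes, io_sort_mb, spill_percent=0.80):
    """Rough number of spill files one map task writes for the given output size."""
    spill_threshold_bytes = io_sort_mb * 1024 * 1024 * spill_percent
    return math.ceil(map_output_bytes / spill_threshold_bytes)

if __name__ == "__main__":
    for sort_mb in (100, 400):
        spills = estimated_spills(5_517_423_780, sort_mb)
        print(f"io.sort.mb={sort_mb}: about {spills} spills")
```

Fewer spills mean fewer files to merge at the end of the map task, where at
most io.sort.factor files are merged at a time.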


  If you are using one reducer, then all of the data is sent over HTTP to that
reducer. That is another thing you have to think about.

  Just out of curiosity, also try increasing dfs.block.size to 128 MB. It seems
that you are using the default 64 MB. You'll get fewer map tasks.

  Also, depending on the configuration of your machines and how many CPU cores
you have, you can increase the values of

mapred.tasktracker.{map|reduce}.tasks.maximum: the maximum number of
map/reduce tasks that are run simultaneously on a given TaskTracker,
individually. It defaults to 2 (2 maps and 2 reduces), but vary it depending
on your hardware.


 You can have a look at 
http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html.

  A good book for understanding the tuning parameters is Hadoop: The Definitive
Guide by Tom White.



  Hope that the above helps.

  Regards,

  Florin









--- On Thu, 11/3/11, Steve Lewis <lordjoe2...@gmail.com> wrote:



From: Steve Lewis <lordjoe2...@gmail.com>

Subject: Problems with MR Job running really slowly

To: "mapreduce-user" <mapreduce-user@hadoop.apache.org>

Date: Thursday, November 3, 2011, 11:07 PM



I have a job which takes an XML file: the splitter breaks the file into tags,
and the mapper parses each tag and sends the data to the reducer. I am using a
custom splitter which reads the file looking for start and end tags.





When I run the code in the splitter and the mapper (generating separate tags
and parsing them), I can read a file of about 500 MB containing 12,000 tags
on my local system in 23 seconds.





When I read a file on HDFS on a local cluster, I can read and parse the file in
38 seconds.

When I run the same code on an eight-node cluster I get 7 map tasks. The
mappers are taking 190 seconds to handle 100 tags, of which 200 milliseconds
is parsing and almost all of the rest of the time is in context.write. A
mapper handling 1600 tags takes about 3 hours. It is true that one tag will be
sent to about 300 keys, but still, 3 hours to write 1.5 million records and
5 GB seems way excessive. These are the statistics for a map task:
FileSystemCounters
  FILE_BYTES_READ          816,935,457
  HDFS_BYTES_READ          439,554,860
  FILE_BYTES_WRITTEN     1,667,745,197

Performance
  TotalScoredScans               1,660

Map-Reduce Framework
  Combine output records             0
  Map input records              6,134
  Spilled Records            1,690,063
  Map output bytes       5,517,423,780
  Combine input records              0
  Map output records           571,475
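
A few ratios derived from the counters above help put them in perspective.
This is a small Python sketch of the arithmetic only. The variable names are
made up, and the 3-hour figure is the approximate run time quoted above for a
mapper handling 1600 tags, which may not be exactly the task these counters
came from.

```python
# Counter values copied from the map task statistics above.
map_output_bytes = 5_517_423_780
map_output_records = 571_475
spilled_records = 1_690_063
task_seconds = 3 * 3600  # "about 3 hours"; an approximation, not a measured value

avg_record_bytes = map_output_bytes / map_output_records
spills_per_record = spilled_records / map_output_records
output_mib_per_second = map_output_bytes / task_seconds / 2**20

print(f"average map output record: {avg_record_bytes:,.0f} bytes")
print(f"spilled records per output record: {spills_per_record:.2f}")
print(f"map output rate: {output_mib_per_second:.2f} MiB/s")
```

This works out to roughly 9.7 KB per output record, with each record counted
as spilled about three times on average, and a sustained output rate of about
half a MiB per second.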

Does anyone want to offer suggestions on how to tune the job better?



--

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com














-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
