RE: Mapper OutOfMemoryError Revisited !!

2008-04-11 Thread Devaraj Das
Which Hadoop version are you on?

> -----Original Message-----
> From: bhupesh bansal [mailto:[EMAIL PROTECTED] 
> Sent: Friday, April 11, 2008 11:21 PM
> To: [EMAIL PROTECTED]
> Subject: Mapper OutOfMemoryError Revisited !!
> 
> 
> Hi Guys, I need to restart the discussion around 
> http://www.nabble.com/Mapper-Out-of-Memory-td14200563.html
> 
>  I saw the same OOM error in my map-reduce job in the map phase. 
> 
> 1. I tried changing mapred.child.java.opts (bumped to 600M) 
> 2. io.sort.mb was kept at 100MB. 
> 
> I see the same errors still. 
> 
> I checked, with debugging, the size of "keyValBuffer" in collect(); 
> it is always less than io.sort.mb and is spilled to disk properly.
> 
> I tried changing the map.task number to a very high number so 
> that the input is split into smaller chunks. It helped for a 
> while, as the map job got further (56% instead of 5%), but I 
> still see the problem.
> 
>  I tried bumping mapred.child.java.opts to 1000M and still got 
> the same error.
> 
> I also tried using the -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED] 
> value in opts to get the GC log, but didn't get any log.
> 
>  I tried using 'jmap -histo <pid>' to see the heap information, but 
> it didn't give me any meaningful or obvious problem point.
> 
> What are the other possible memory hogs during the map phase? 
> Is the input file chunk kept fully in memory?
> 
> Application: 
> 
> My map-reduce job runs with about 2 GB of input. In the mapper 
> phase I read each line and output 5-500 (key, value) pairs, so the 
> intermediate data is blown up considerably. Could that be a 
> problem?
> 
> The error file is attached: 
> http://www.nabble.com/file/p16628181/error.txt



Mapper OutOfMemoryError Revisited !!

2008-04-11 Thread bhupesh bansal

Hi Guys, I need to restart the discussion around 
http://www.nabble.com/Mapper-Out-of-Memory-td14200563.html

 I saw the same OOM error in my map-reduce job in the map phase. 

1. I tried changing mapred.child.java.opts (bumped to 600M) 
2. io.sort.mb was kept at 100MB. 

I see the same errors still. 
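
For context, here is a minimal sketch of how those two knobs are typically applied with the old JobConf API of that Hadoop era. The class name is invented; the values match the settings described above.

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical driver fragment: bump the per-task child JVM heap and keep
    // the collect-side sort buffer at 100 MB, as described in this thread.
    public class JobSettingsSketch {
        public static JobConf configure() {
            JobConf conf = new JobConf(JobSettingsSketch.class);
            conf.set("mapred.child.java.opts", "-Xmx600m"); // heap of each child JVM
            conf.setInt("io.sort.mb", 100);                 // in-memory sort buffer (MB)
            return conf;
        }
    }

Note that the io.sort.mb buffer is allocated inside the child JVM heap set by mapred.child.java.opts, so the two have to be sized together.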

I checked, with debugging, the size of "keyValBuffer" in collect(); it is always
less than io.sort.mb and is spilled to disk properly.
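
To make that point concrete, here is a simplified, illustrative version of the buffer-then-spill pattern in the collect path. This is a sketch for discussion, not the actual MapTask/MapOutputBuffer source.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Illustration only: records are serialized into an in-memory buffer, and
    // the buffer is spilled to disk once it crosses the io.sort.mb threshold.
    // The check runs after a record has been written, so each record must still
    // fit in memory while it is being serialized.
    class CollectSketch {
        private static final int MAX_BUFFER = 100 * 1024 * 1024;  // ~ io.sort.mb = 100 MB
        private final ByteArrayOutputStream keyValBuffer = new ByteArrayOutputStream();
        private final DataOutputStream out = new DataOutputStream(keyValBuffer);

        void collect(Writable key, Writable value) throws IOException {
            key.write(out);
            value.write(out);
            if (keyValBuffer.size() >= MAX_BUFFER) {
                spillToDisk();          // sort + write a spill file in the real code
                keyValBuffer.reset();
            }
        }

        private void spillToDisk() { /* elided in this sketch */ }
    }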

I tried changing the map.task number to a very high number so that the input
is split into smaller chunks. It helped for a while, as the map job got further
(56% instead of 5%), but I still see the problem.
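
Continuing the hypothetical JobConf sketch above, raising the map-task count is usually just a hint; the InputFormat still decides the actual number of splits. The value here is invented.

    conf.setNumMapTasks(2000);   // hint only: the InputFormat computes the real split count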

I tried bumping mapred.child.java.opts to 1000M and still got the same error.

I also tried using the -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED] value in
opts to get the GC log, but didn't get any log.
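
For reference, GC logging for the child JVMs is normally switched on through the same property, along these lines. The log path is illustrative; @taskid@ is the per-task placeholder Hadoop substitutes in this property, and the file ends up on the node that ran the task, not on the submitting machine.

    // Hypothetical: heap size plus GC logging for every map/reduce child JVM.
    conf.set("mapred.child.java.opts",
             "-Xmx600m -verbose:gc -XX:+PrintGCDetails -Xloggc:/tmp/@taskid@.gc");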

I tried using 'jmap -histo <pid>' to see the heap information, but it didn't
give me any meaningful or obvious problem point.

What are the other possible memory hogs during the map phase? Is the input
file chunk kept fully in memory?

Application: 

My map-reduce job runs with about 2 GB of input. In the mapper phase I read
each line and output 5-500 (key, value) pairs, so the intermediate data is
blown up considerably. Could that be a problem?
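
To picture the shape of the job being described, here is a hypothetical mapper with that kind of fan-out, written against the old Hadoop API. The class, field, and key choices are invented; this is not the poster's actual SearchClickJoinMapper.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical fan-out mapper: one input line in, 5-500 (key, value) pairs out.
    public class FanOutMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String[] fields = line.toString().split("\t");
            // Emit one pair per field; nothing is retained across map() calls,
            // so the fan-out alone should not grow the heap between records.
            for (String field : fields) {
                output.collect(new Text(field), line);
            }
        }
    }

The fan-out mostly inflates the intermediate data on disk and in the sort buffer, so by itself it should not exhaust the heap unless individual records are very large.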

The error file is attached: 
http://www.nabble.com/file/p16628181/error.txt 
-- 
View this message in context: 
http://www.nabble.com/Mapper-OutOfMemoryError-Revisited-%21%21-tp16628181p16628181.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Mapper OutOfMemoryError Revisited !!

2008-04-11 Thread bhupesh bansal

Hi Guys, 

I need to restart the discussion around 
http://www.nabble.com/Mapper-Out-of-Memory-td14200563.html

I saw the same OOM error in my map-reduce job in the map phase.

1. I tried changing mapred.child.java.opts (bumped to 600M)
2. io.sort.mb was kept at 100MB.
I see the same errors still.

I checked, with debugging, the size of "keyValBuffer" in collect(); it is always
less than io.sort.mb and is spilled to disk properly. 

I tried changing the map.task number to a very high number so that the input
is split into smaller chunks. It helped for a while, as the map job got further
(56% instead of 5%), but I still see the problem. 

I tried bumping mapred.child.java.opts to 1000M and still got the same error. 

I also tried using the -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED] value in
opts to get the GC log, but didn't get any log.

I tried using 'jmap -histo <pid>' to see the heap information, but it didn't
give me any meaningful or obvious problem point. 


What are the other possible memory hogs during the map phase? Is the input
file chunk kept fully in memory? 


task_200804110926_0004_m_000239_0: java.lang.OutOfMemoryError: Java heap space
task_200804110926_0004_m_000239_0:  at java.util.Arrays.copyOf(Arrays.java:2786)
task_200804110926_0004_m_000239_0:  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
task_200804110926_0004_m_000239_0:  at java.io.DataOutputStream.write(DataOutputStream.java:90)
task_200804110926_0004_m_000239_0:  at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
task_200804110926_0004_m_000239_0:  at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
task_200804110926_0004_m_000239_0:  at com.linkedin.Hadoop.DataObjects.SearchTrackingJoinValue.write(SearchTrackingJoinValue.java:117)
task_200804110926_0004_m_000239_0:  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:350)
task_200804110926_0004_m_000239_0:  at com.linkedin.Hadoop.Mapper.SearchClickJoinMapper.readSearchJoinResultsObject(SearchClickJoinMapper.java:131)
task_200804110926_0004_m_000239_0:  at com.linkedin.Hadoop.Mapper.SearchClickJoinMapper.map(SearchClickJoinMapper.java:54)
task_200804110926_0004_m_000239_0:  at com.linkedin.Hadoop.Mapper.SearchClickJoinMapper.map(SearchClickJoinMapper.java:31)
task_200804110926_0004_m_000239_0:  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
task_200804110926_0004_m_000239_0:  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
task_200804110926_0004_m_000239_0:  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1804)
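
The frames above show the failure occurring while a custom Writable, SearchTrackingJoinValue, serializes itself via writeUTF into the map output buffer during collect(). A hypothetical value class of that shape, purely for illustration (field names invented, not the poster's actual code), would look like:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Hypothetical stand-in for a value class like SearchTrackingJoinValue:
    // write() pushes UTF-encoded string fields into the map output stream,
    // which is the call path visible in the stack trace above.
    public class TrackingJoinValueSketch implements Writable {
        private String query;    // invented field
        private String results;  // invented field

        public void write(DataOutput out) throws IOException {
            out.writeUTF(query);
            out.writeUTF(results);
        }

        public void readFields(DataInput in) throws IOException {
            query = in.readUTF();
            results = in.readUTF();
        }
    }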


-- 
View this message in context: 
http://www.nabble.com/Mapper-OutOfMemoryError-Revisited-%21%21-tp16628173p16628173.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.