Re: OutOfMemory error processing large amounts of gz files

2009-03-02 Thread Runping Qi
Your job tracker out-of-memory problem may be related to
https://issues.apache.org/jira/browse/HADOOP-4766

Runping


On Mon, Mar 2, 2009 at 4:29 PM, bzheng  wrote:

>
> Thanks for all the info.  Upon further investigation, we are dealing with
> two
> separate issues:
>
> 1.  problem processing a lot of gz files
>
> We have tried the hadoop.native.lib setting and it makes little difference.
> However, this is not that big a deal, since we can use multiple jobs, each
> processing a small chunk of the files, instead of one big job processing all
> the files.
>
> 2.  jobtracker out of memory
>
> By increasing the amount of memory for the jobtracker, we can delay the
> inevitable.  Since the jobtracker's memory usage keeps going up as we run
> more jobs, we will need to restart the cluster once this error happens.  We
> are currently using 0.18.3 and are holding off changing to a different
> version because we don't want to lose the existing files on HDFS.
>
>
> bzheng wrote:
> >
> > I have about 24k gz files (about 550GB total) on HDFS and have a really
> > simple Java program to convert them into sequence files.  If the script's
> > setInputPaths takes a Path[] of all 24k files, it will get an OutOfMemory
> > error at about 35% map complete.  If I make the script process 2k files
> > per job and run 12 jobs consecutively, then it goes through all files
> > fine.  The cluster I'm using has about 67 nodes.  Each node has 16GB
> > memory, max 7 map, and max 2 reduce.
> >
> > The map task is really simple: it takes LongWritable as key and Text as
> > value, generates a Text newKey, and calls output.collect(newKey, value).
> > It doesn't have any code that can possibly leak memory.
> >
> > There's no stack trace for the vast majority of the OutOfMemory error,
> > there's just a single line in the log like this:
> >
> > 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
> > java.lang.OutOfMemoryError: Java heap space
> >
> > I can't find the stack trace right now, but in the rare cases where there
> > is one, the OutOfMemory error originates from a Hadoop config array copy
> > operation.  There's no special config for the script.
> >
>
> --
> View this message in context:
> http://www.nabble.com/OutOfMemory-error-processing-large-amounts-of-gz-files-tp22193552p22300192.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: OutOfMemory error processing large amounts of gz files

2009-03-02 Thread bzheng

Thanks for all the info.  Upon further investigation, we are dealing with two
separate issues:

1.  problem processing a lot of gz files 

We have tried the hadoop.native.lib setting and it makes little difference.
However, this is not that big a deal, since we can use multiple jobs, each
processing a small chunk of the files, instead of one big job processing all
the files.
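
(For reference, a rough sketch of that chunking approach, written against the
old org.apache.hadoop.mapred API from 0.18.  ChunkedConvert and the omitted
mapper/output settings are placeholders for the real conversion job, not code
from this thread.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChunkedConvert {
    private static final int CHUNK = 2000;   // roughly 2k gz files per job, as above

    public static void run(Path[] allFiles, Path outBase) throws Exception {
        for (int start = 0; start < allFiles.length; start += CHUNK) {
            int end = Math.min(start + CHUNK, allFiles.length);
            Path[] chunk = new Path[end - start];
            System.arraycopy(allFiles, start, chunk, 0, chunk.length);

            JobConf conf = new JobConf(ChunkedConvert.class);
            // ... mapper, output key/value classes and SequenceFile output format here ...
            FileInputFormat.setInputPaths(conf, chunk);
            FileOutputFormat.setOutputPath(conf, new Path(outBase, "chunk-" + (start / CHUNK)));
            JobClient.runJob(conf);   // each chunk runs as its own job, one after another
        }
    }
}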

2.  jobtracker out of memory

By increasing the amount of memory for the jobtracker, we can delay the
inevitable.  Since the jobtracker's memory usage keeps going up as we run
more jobs, we will need to restart the cluster once this error happens.  We
are currently using 0.18.3 and are holding off changing to a different
version because we don't want to lose the existing files on HDFS.


bzheng wrote:
> 
> I have about 24k gz files (about 550GB total) on HDFS and have a really
> simple Java program to convert them into sequence files.  If the script's
> setInputPaths takes a Path[] of all 24k files, it will get an OutOfMemory
> error at about 35% map complete.  If I make the script process 2k files
> per job and run 12 jobs consecutively, then it goes through all files
> fine.  The cluster I'm using has about 67 nodes.  Each node has 16GB
> memory, max 7 map, and max 2 reduce.
> 
> The map task is really simple: it takes LongWritable as key and Text as
> value, generates a Text newKey, and calls output.collect(newKey, value).
> It doesn't have any code that can possibly leak memory.
> 
> There's no stack trace for the vast majority of the OutOfMemory error,
> there's just a single line in the log like this:
> 
> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
> java.lang.OutOfMemoryError: Java heap space
> 
> I can't find the stack trace right now, but in the rare cases where there
> is one, the OutOfMemory error originates from a Hadoop config array copy
> operation.  There's no special config for the script.
> 

-- 
View this message in context: 
http://www.nabble.com/OutOfMemory-error-processing-large-amounts-of-gz-files-tp22193552p22300192.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: OutOfMemory error processing large amounts of gz files

2009-02-26 Thread bzheng



Arun C Murthy-2 wrote:
> 
> 
> On Feb 24, 2009, at 4:03 PM, bzheng wrote:
>>
> 
>> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
>> java.lang.OutOfMemoryError: Java heap space
>>
> 
> That tells you that your TaskTracker is running out of memory, not
> your reduce tasks.
> 
> I think you are hitting http://issues.apache.org/jira/browse/HADOOP-4906.
> 
> What version of hadoop are you running?
> 
> Arun
> 
> 
> 

I'm using 0.18.2.  We figured that gz may not be the root problem: when we
ran a big job not involving any gz files, after about 1.5 hours we got the
same out-of-memory problem.  One interesting thing, though: if we do use gz
files, the out-of-memory issue occurs within a few minutes.
-- 
View this message in context: 
http://www.nabble.com/OutOfMemory-error-processing-large-amounts-of-gz-files-tp22193552p22231249.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: OutOfMemory error processing large amounts of gz files

2009-02-26 Thread Arun C Murthy


On Feb 24, 2009, at 4:03 PM, bzheng wrote:

2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
java.lang.OutOfMemoryError: Java heap space



That tells you that your TaskTracker is running out of memory, not
your reduce tasks.


I think you are hitting http://issues.apache.org/jira/browse/HADOOP-4906.


What version of hadoop are you running?

Arun



Re: OutOfMemory error processing large amounts of gz files

2009-02-25 Thread bzheng

Thanks for the suggestions.  I tried the hadoop.native.lib setting (both in
the job config and in hadoop-site.xml + restart) and the problem is still
there.

I finally got the exception with some stack trace, and here it is:

2009-02-25 12:24:18,312 INFO org.apache.hadoop.mapred.TaskTracker:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at
com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1170)
at
com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1120)
at
com.sun.org.apache.xerces.internal.dom.DeferredTextImpl.synchronizeData(DeferredTextImpl.java:93)
at
com.sun.org.apache.xerces.internal.dom.CharacterDataImpl.getData(CharacterDataImpl.java:160)
at
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:928)
at
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:851)
at
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:819)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:278)
at
org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:446)
at
org.apache.hadoop.mapred.JobConf.getKeepFailedTaskFiles(JobConf.java:308)
at
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.setJobConf(TaskTracker.java:1497)
at
org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:727)
at
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:721)
at
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1297)
at
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:937)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1334)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2343)



Tom White-3 wrote:
> 
> Do you experience the problem with and without native compression? Set
> hadoop.native.lib to false to disable native compression.
> 
> Cheers,
> Tom
> 
> On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr  wrote:
>> If you're doing a lot of gzip compression/decompression, you *might* be
>> hitting this 6+-year-old Sun JVM bug:
>>
>> "Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not
>> called promptly enough"
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189
>>
>> A workaround is listed in the issue: ensuring you call close() or end()
>> on
>> the Deflater; something similar might apply to Inflater.
>>
>> (This is one of those fun JVM situations where having more heap space may
>> make OOMEs more likely: less heap memory pressure leaves more un-GCd or
>> un-finalized heap objects around, each of which is holding a bit of
>> native
>> memory.)
>>
>> - Gordon @ IA
>>
>> bzheng wrote:
>>>
>>> I have about 24k gz files (about 550GB total) on HDFS and have a really
>>> simple Java program to convert them into sequence files.  If the script's
>>> setInputPaths takes a Path[] of all 24k files, it will get an OutOfMemory
>>> error at about 35% map complete.  If I make the script process 2k files
>>> per job and run 12 jobs consecutively, then it goes through all files
>>> fine.  The cluster I'm using has about 67 nodes.  Each node has 16GB
>>> memory, max 7 map, and max 2 reduce.
>>>
>>> The map task is really simple: it takes LongWritable as key and Text as
>>> value, generates a Text newKey, and calls output.collect(newKey, value).
>>> It doesn't have any code that can possibly leak memory.
>>>
>>> There's no stack trace for the vast majority of the OutOfMemory error,
>>> there's just a single line in the log like this:
>>>
>>> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> I can't find the stack trace right now, but in the rare cases where there
>>> is one, the OutOfMemory error originates from a Hadoop config array copy
>>> operation.  There's no special config for the script.
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/OutOfMemory-error-processing-large-amounts-of-gz-files-tp22193552p22214505.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: OutOfMemory error processing large amounts of gz files

2009-02-25 Thread Tom White
Do you experience the problem with and without native compression? Set
hadoop.native.lib to false to disable native compression.
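
(For reference, a minimal sketch of setting that per job through the old
JobConf API; the property can equally go into hadoop-site.xml.  The class name
below is just a placeholder.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class DisableNativeZlib {
    // Returns a job conf with native compression turned off; the rest of the
    // job setup is omitted.
    public static JobConf newJobConf() {
        JobConf conf = new JobConf(new Configuration(), DisableNativeZlib.class);
        conf.setBoolean("hadoop.native.lib", false);  // fall back to the pure-Java zlib codec
        return conf;
    }
}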

Cheers,
Tom

On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr  wrote:
> If you're doing a lot of gzip compression/decompression, you *might* be
> hitting this 6+-year-old Sun JVM bug:
>
> "Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not
> called promptly enough"
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189
>
> A workaround is listed in the issue: ensuring you call close() or end() on
> the Deflater; something similar might apply to Inflater.
>
> (This is one of those fun JVM situations where having more heap space may
> make OOMEs more likely: less heap memory pressure leaves more un-GCd or
> un-finalized heap objects around, each of which is holding a bit of native
> memory.)
>
> - Gordon @ IA
>
> bzheng wrote:
>>
>> I have about 24k gz files (about 550GB total) on HDFS and have a really
>> simple Java program to convert them into sequence files.  If the script's
>> setInputPaths takes a Path[] of all 24k files, it will get an OutOfMemory
>> error at about 35% map complete.  If I make the script process 2k files
>> per job and run 12 jobs consecutively, then it goes through all files
>> fine.  The cluster I'm using has about 67 nodes.  Each node has 16GB
>> memory, max 7 map, and max 2 reduce.
>>
>> The map task is really simple: it takes LongWritable as key and Text as
>> value, generates a Text newKey, and calls output.collect(newKey, value).
>> It doesn't have any code that can possibly leak memory.
>>
>> There's no stack trace for the vast majority of the OutOfMemory error,
>> there's just a single line in the log like this:
>>
>> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
>> java.lang.OutOfMemoryError: Java heap space
>>
>> I can't find the stack trace right now, but in the rare cases where there
>> is one, the OutOfMemory error originates from a Hadoop config array copy
>> operation.  There's no special config for the script.
>


Re: OutOfMemory error processing large amounts of gz files

2009-02-24 Thread Gordon Mohr
If you're doing a lot of gzip compression/decompression, you *might* be 
hitting this 6+-year-old Sun JVM bug:


"Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not 
called promptly enough"

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189

A workaround is listed in the issue: ensuring you call close() or end() 
on the Deflater; something similar might apply to Inflater.


(This is one of those fun JVM situations where having more heap space 
may make OOMEs more likely: less heap memory pressure leaves more un-GCd 
or un-finalized heap objects around, each of which is holding a bit of 
native memory.)
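
(For what it's worth, a minimal sketch of that workaround in plain Java,
outside Hadoop's own codec code: the Deflater is released explicitly instead
of waiting for its finalizer.)

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class DeflateOnce {
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();  // frees the native zlib memory immediately
        }
    }
}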


- Gordon @ IA

bzheng wrote:

I have about 24k gz files (about 550GB total) on HDFS and have a really
simple Java program to convert them into sequence files.  If the script's
setInputPaths takes a Path[] of all 24k files, it will get an OutOfMemory
error at about 35% map complete.  If I make the script process 2k files
per job and run 12 jobs consecutively, then it goes through all files
fine.  The cluster I'm using has about 67 nodes.  Each node has 16GB
memory, max 7 map, and max 2 reduce.

The map task is really simple: it takes LongWritable as key and Text as
value, generates a Text newKey, and calls output.collect(newKey, value).
It doesn't have any code that can possibly leak memory.


There's no stack trace for the vast majority of the OutOfMemory error,
there's just a single line in the log like this:

2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
java.lang.OutOfMemoryError: Java heap space

I can't find the stack trace right now, but in the rare cases where there
is one, the OutOfMemory error originates from a Hadoop config array copy
operation.  There's no special config for the script.


Re: OutofMemory Error, inspite of large amounts provided

2008-12-28 Thread Amareshwari Sriramadasu

Saptarshi Guha wrote:

Caught it in action.
Running  ps -e -o 'vsz pid ruser args' |sort -nr|head -5
on a machine where the map task was running
04812 16962 sguha /home/godhuli/custom/jdk1.6.0_11/jre/bin/java
-Djava.library.path=/home/godhuli/custom/hadoop/bin/../lib/native/Linux-amd64-64:/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work
-Xmx200m 
-Djava.io.tmpdir=/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work/tmp
-classpath /attempt_200812282102_0003_m_00_0/work
-Dhadoop.log.dir=/home/godhuli/custom/hadoop/bin/../logs
-Dhadoop.root.logger=INFO,TLA
-Dhadoop.tasklog.taskid=attempt_200812282102_0003_m_00_0
-Dhadoop.tasklog.totalLogFileSize=0 org.apache.hadoop.mapred.Child
127.0.0.1 40443 attempt_200812282102_0003_m_00_0 1525207782

Also, the reducer only used 540MB.  I noticed -Xmx200m was passed; how
do I change it?
Regards
Saptarshi


You can set the configuration property mapred.child.java.opts to -Xmx540m.
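
(A sketch of doing that in the job setup code; the property can also be set in
hadoop-site.xml.  The class name below is just a placeholder.)

import org.apache.hadoop.mapred.JobConf;

public class ChildHeapExample {
    public static JobConf newJobConf() {
        JobConf conf = new JobConf(ChildHeapExample.class);
        // Raise the per-task child JVM heap; this overrides the -Xmx200m seen above.
        conf.set("mapred.child.java.opts", "-Xmx540m");
        return conf;
    }
}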

Thanks
Amareshwari

On Sun, Dec 28, 2008 at 10:19 PM, Saptarshi Guha
 wrote:

On Sun, Dec 28, 2008 at 4:33 PM, Brian Bockelman  wrote:


Hey Saptarshi,

Watch the running child process while using "ps", "top", or Ganglia
monitoring.  Does the map task actually use 16GB of memory, or is the memory
not getting set properly?

Brian

I haven't figured out how to run Ganglia; also, the children
quit before I can see their memory usage.  The trackers all use
16GB (from the ps command).  However, I noticed some use only 512MB
(when I managed to catch them in time).

Regards


Re: OutofMemory Error, inspite of large amounts provided

2008-12-28 Thread Saptarshi Guha
Caught it in action.
Running  ps -e -o 'vsz pid ruser args' |sort -nr|head -5
on a machine where the map task was running
04812 16962 sguha /home/godhuli/custom/jdk1.6.0_11/jre/bin/java
-Djava.library.path=/home/godhuli/custom/hadoop/bin/../lib/native/Linux-amd64-64:/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work
-Xmx200m 
-Djava.io.tmpdir=/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work/tmp
-classpath /attempt_200812282102_0003_m_00_0/work
-Dhadoop.log.dir=/home/godhuli/custom/hadoop/bin/../logs
-Dhadoop.root.logger=INFO,TLA
-Dhadoop.tasklog.taskid=attempt_200812282102_0003_m_00_0
-Dhadoop.tasklog.totalLogFileSize=0 org.apache.hadoop.mapred.Child
127.0.0.1 40443 attempt_200812282102_0003_m_00_0 1525207782

Also, the reducer only used 540MB.  I noticed -Xmx200m was passed; how
do I change it?
Regards
Saptarshi

On Sun, Dec 28, 2008 at 10:19 PM, Saptarshi Guha
 wrote:
> On Sun, Dec 28, 2008 at 4:33 PM, Brian Bockelman  wrote:
>> Hey Saptarshi,
>>
>> Watch the running child process while using "ps", "top", or Ganglia
>> monitoring.  Does the map task actually use 16GB of memory, or is the memory
>> not getting set properly?
>>
>> Brian
>
> I haven't figured out how to run Ganglia; also, the children
> quit before I can see their memory usage.  The trackers all use
> 16GB (from the ps command).  However, I noticed some use only 512MB
> (when I managed to catch them in time).
>
> Regards
>



-- 
Saptarshi Guha - saptarshi.g...@gmail.com


Re: OutofMemory Error, inspite of large amounts provided

2008-12-28 Thread Saptarshi Guha
On Sun, Dec 28, 2008 at 4:33 PM, Brian Bockelman  wrote:
> Hey Saptarshi,
>
> Watch the running child process while using "ps", "top", or Ganglia
> monitoring.  Does the map task actually use 16GB of memory, or is the memory
> not getting set properly?
>
> Brian

I haven't figured out how to run Ganglia; also, the children
quit before I can see their memory usage.  The trackers all use
16GB (from the ps command).  However, I noticed some use only 512MB
(when I managed to catch them in time).

Regards


Re: OutofMemory Error, inspite of large amounts provided

2008-12-28 Thread Brian Bockelman

Hey Saptarshi,

Watch the running child process while using "ps", "top", or Ganglia  
monitoring.  Does the map task actually use 16GB of memory, or is the  
memory not getting set properly?


Brian

On Dec 28, 2008, at 3:00 PM, Saptarshi Guha wrote:


Hello,
I have work machines with 32GB and allocated 16GB to the heap size
==hadoop-env.sh==
export HADOOP_HEAPSIZE=16384

==hadoop-site.xml==
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx16384m</value>
</property>

The same code runs when not being run through Hadoop, but it fails
when in a Maptask.
Are there other places where I can specify the memory to the maptasks?

Regards
Saptarshi

--
Saptarshi Guha - saptarshi.g...@gmail.com




Re: OutOfMemory Error

2008-09-19 Thread Edward J. Yoon
Great experience!

/Edward

On Fri, Sep 19, 2008 at 2:50 PM, Palleti, Pallavi
<[EMAIL PROTECTED]> wrote:
> Yeah. That was the problem. And Hama can surely be useful for large-scale
> matrix operations.
>
> But for this problem, I have modified the code to just pass the ID
> information and read the vector information only when it is needed. In this
> case, it was needed only in the reducer phase. This way, it avoided the
> out-of-memory problem and is also faster now.
>
> Thanks
> Pallavi
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Edward J. Yoon
> Sent: Friday, September 19, 2008 10:35 AM
> To: core-user@hadoop.apache.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: OutOfMemory Error
>
>> The key is of the form "ID :DenseVector Representation in mahout with
>
> I guess the vector size is too large, so it'll need a distributed vector
> architecture (or 2D partitioning strategies) for large-scale matrix
> operations. The Hama team is investigating these problem areas, so it will
> be improved if Hama can be used for Mahout in the future.
>
> /Edward
>
> On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>>
>> Hadoop Version - 17.1
>> io.sort.factor =10
>> The key is of the form "ID :DenseVector Representation in mahout with
>> dimensionality size = 160k"
>> For example: C1:[,0.0011, 3.002, .. 1.001,]
>> So, the typical size of a mapper output key can be 160K*6 (assuming a
>> double represented as a string takes 5 bytes) + 5 (bytes for C1:[]) + the
>> size required to record that the object is of type Text.
>>
>> Thanks
>> Pallavi
>>
>>
>>
>> Devaraj Das wrote:
>>>
>>>
>>>
>>>
>>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>>I am getting outofmemory error as shown below when I ran map-red on
>>>> huge
>>>> amount of data.:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at
>>>> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>> at
>>>> org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>> at
>>>> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(Sequence
>>>> File.java:3002)
>>>> at
>>>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:28
>>>> 02)
>>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>> at
>>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>> at
>>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124
>>>> The above error comes almost at the end of map job. I have set the heap
>>>> size
>>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>>> to
>>>> avoid this error?
>>> What is the typical size of your key? What is the value of io.sort.factor?
>>> Hadoop version?
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


RE: OutOfMemory Error

2008-09-18 Thread Palleti, Pallavi
Yeah. That was the problem. And Hama can surely be useful for large-scale
matrix operations.

But for this problem, I have modified the code to just pass the ID information
and read the vector information only when it is needed. In this case, it was
needed only in the reducer phase. This way, it avoided the out-of-memory
problem and is also faster now.
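
(A hypothetical sketch of that map side, using the old mapred API.  The real
record layout and the way the reducer re-reads the vectors are not described
in this thread, so the parsing below is purely an illustration.)

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class IdOnlyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        // Input lines are assumed to look like "C1:[0.0011, 3.002, ...]".
        // Emit only the ID, so the 160k-dimension vector never goes through
        // the map-output sort buffers; the reducer fetches the vector itself.
        String id = line.toString().split(":", 2)[0];
        out.collect(new Text(id), new Text(id));
    }
}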

Thanks
Pallavi
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Edward J. Yoon
Sent: Friday, September 19, 2008 10:35 AM
To: core-user@hadoop.apache.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: OutOfMemory Error

> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it'll need a distributed vector
architecture (or 2D partitioning strategies) for large-scale matrix
operations. The Hama team is investigating these problem areas, so it will
be improved if Hama can be used for Mahout in the future.

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor =10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.0011, 3.002, .. 1.001,]
> So, the typical size of a mapper output key can be 160K*6 (assuming a
> double represented as a string takes 5 bytes) + 5 (bytes for C1:[]) + the
> size required to record that the object is of type Text.
>
> Thanks
> Pallavi
>
>
>
> Devaraj Das wrote:
>>
>>
>>
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Hi all,
>>>
>>>I am getting outofmemory error as shown below when I ran map-red on
>>> huge
>>> amount of data.:
>>> java.lang.OutOfMemoryError: Java heap space
>>> at
>>> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>> at
>>> org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>> at
>>> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(Sequence
>>> File.java:3002)
>>> at
>>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:28
>>> 02)
>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>> at
>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>> at
>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124
>>> The above error comes almost at the end of map job. I have set the heap
>>> size
>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>> to
>>> avoid this error?
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


Re: OutOfMemory Error

2008-09-18 Thread Edward J. Yoon
> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it'll need a distributed vector
architecture (or 2D partitioning strategies) for large-scale matrix
operations. The Hama team is investigating these problem areas, so it will
be improved if Hama can be used for Mahout in the future.

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>
> Hadoop Version - 17.1
> io.sort.factor =10
> The key is of the form "ID :DenseVector Representation in mahout with
> dimensionality size = 160k"
> For example: C1:[,0.0011, 3.002, .. 1.001,]
> So, the typical size of a mapper output key can be 160K*6 (assuming a
> double represented as a string takes 5 bytes) + 5 (bytes for C1:[]) + the
> size required to record that the object is of type Text.
>
> Thanks
> Pallavi
>
>
>
> Devaraj Das wrote:
>>
>>
>>
>>
>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Hi all,
>>>
>>>I am getting outofmemory error as shown below when I ran map-red on
>>> huge
>>> amount of data.:
>>> java.lang.OutOfMemoryError: Java heap space
>>> at
>>> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>> at
>>> org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>> at
>>> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(Sequence
>>> File.java:3002)
>>> at
>>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:28
>>> 02)
>>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>> at
>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>> at
>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124
>>> The above error comes almost at the end of map job. I have set the heap
>>> size
>>> to 1GB. Still the problem is persisting.  Can someone please help me how
>>> to
>>> avoid this error?
>> What is the typical size of your key? What is the value of io.sort.factor?
>> Hadoop version?
>>
>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org


Re: OutOfMemory Error

2008-09-17 Thread Pallavi Palleti

Hadoop Version - 17.1
io.sort.factor =10
The key is of the form "ID :DenseVector Representation in mahout with
dimensionality size = 160k"
For example: C1:[,0.0011, 3.002, .. 1.001,]
So, the typical size of a mapper output key can be 160K*6 (assuming a
double represented as a string takes 5 bytes) + 5 (bytes for C1:[]) + the
size required to record that the object is of type Text.
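(In rough numbers that is 160,000 * 6 = 960,000 bytes, so close to 1 MB for a
single key even before the Text overhead.)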

Thanks
Pallavi



Devaraj Das wrote:
> 
> 
> 
> 
> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
> 
>> 
>> Hi all,
>> 
>>I am getting outofmemory error as shown below when I ran map-red on
>> huge
>> amount of data.: 
>> java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>> at
>> org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>> at
>> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(Sequence
>> File.java:3002)
>> at
>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:28
>> 02)
>> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>> at
>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>> at
>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124
>> The above error comes almost at the end of map job. I have set the heap
>> size
>> to 1GB. Still the problem is persisting.  Can someone please help me how
>> to
>> avoid this error?
> What is the typical size of your key? What is the value of io.sort.factor?
> Hadoop version?
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: OutOfMemory Error

2008-09-17 Thread Devaraj Das



On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:

> 
> Hi all,
> 
> I am getting an out-of-memory error, as shown below, when I run map-red on a
> huge amount of data:
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
> at
> org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
> at
> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(Sequence
> File.java:3002)
> at
> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:28
> 02)
> at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124
> The above error comes almost at the end of the map job. I have set the heap
> size to 1GB and still the problem persists.  Can someone please help me
> figure out how to avoid this error?
What is the typical size of your key? What is the value of io.sort.factor?
Hadoop version?




RE: OutOfMemory Error

2008-09-17 Thread Leon Mergen
Hello,

What version of Hadoop are you using ?

Regards,

Leon Mergen

> -Original Message-
> From: Pallavi Palleti [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 17, 2008 2:36 PM
> To: core-user@hadoop.apache.org
> Subject: OutOfMemory Error
>
>
> Hi all,
>
> I am getting an out-of-memory error, as shown below, when I run map-red on
> a huge amount of data:
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.jav
> a:52)
> at
> org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
> at
> org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1
> 974)
> at
> org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(S
> equenceFile.java:3002)
> at
> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.
> java:2802)
> at
> org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.jav
> a:1040)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698
> )
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
> at
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124
> The above error comes almost at the end of the map job. I have set the heap
> size to 1GB and still the problem persists.  Can someone please help me
> figure out how to avoid this error?
> --
> View this message in context: http://www.nabble.com/OutOfMemory-Error-
> tp19531174p19531174.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.