Re: OutOfMemory error processing large amounts of gz files
Your jobtracker out-of-memory problem may be related to https://issues.apache.org/jira/browse/HADOOP-4766

Runping

On Mon, Mar 2, 2009 at 4:29 PM, bzheng wrote:
> Thanks for all the info. Upon further investigation, we are dealing with two separate issues:
>
> 1. Problem processing a lot of gz files.
>
> We have tried the hadoop.native.lib setting and it makes little difference. However, this is not that big a deal, since we can use multiple jobs, each processing a small chunk of the files, instead of one big job processing all the files.
>
> 2. Jobtracker out of memory.
>
> By increasing the amount of memory for the jobtracker, we can delay the inevitable. Since the jobtracker's memory usage keeps going up as we run more jobs, we will need to restart the cluster once this error happens. We are currently using 0.18.3 and are holding off moving to a different version because we don't want to lose the existing files on HDFS.

--
View this message in context: http://www.nabble.com/OutOfMemory-error-processing-large-amounts-of-gz-files-tp22193552p22300192.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: OutOfMemory error processing large amounts of gz files
Thanks for all the info. Upon further investigation, we are dealing with two separate issues:

1. Problem processing a lot of gz files.

We have tried the hadoop.native.lib setting and it makes little difference. However, this is not that big a deal, since we can use multiple jobs, each processing a small chunk of the files, instead of one big job processing all the files.

2. Jobtracker out of memory.

By increasing the amount of memory for the jobtracker, we can delay the inevitable. Since the jobtracker's memory usage keeps going up as we run more jobs, we will need to restart the cluster once this error happens. We are currently using 0.18.3 and are holding off moving to a different version because we don't want to lose the existing files on HDFS.

bzheng wrote:
> I have about 24k gz files (about 550GB total) on HDFS and a really simple Java program to convert them into sequence files. If the program's setInputPaths takes a Path[] of all 24k files, it gets an OutOfMemory error at about 35% map complete. If I make the program process 2k files per job and run 12 jobs consecutively, it goes through all the files fine. The cluster I'm using has about 67 nodes. Each node has 16GB memory, max 7 map, and max 2 reduce.
>
> The map task is really simple: it takes LongWritable as key and Text as value, generates a Text newKey, and calls output.collect(newKey, value). It doesn't have any code that could possibly leak memory.
>
> There's no stack trace for the vast majority of the OutOfMemory errors; there's just a single line in the log like this:
>
> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
> java.lang.OutOfMemoryError: Java heap space
>
> I can't find the stack trace right now, but rarely the OutOfMemory error originates from some Hadoop config array copy operation. There's no special config for the program.
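The batching workaround described above — submitting several jobs over fixed-size chunks of the input instead of one job over all 24k paths — can be sketched in plain Java. The chunking itself needs nothing from Hadoop; the job-submission step is only indicated in comments, and the path names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching workaround: rather than passing all 24k paths to a
// single job via setInputPaths, split the list into chunks and run one job
// per chunk. The actual JobConf/submit calls are omitted.
public class InputBatcher {
    // Split `paths` into consecutive chunks of at most `chunkSize` entries.
    static List<List<String>> chunk(List<String> paths, int chunkSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += chunkSize) {
            chunks.add(paths.subList(i, Math.min(i + chunkSize, paths.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 24000; i++) paths.add("/data/part-" + i + ".gz"); // hypothetical paths
        List<List<String>> batches = chunk(paths, 2000);
        System.out.println(batches.size() + " jobs of up to 2000 files each");
        // For each batch you would then build a JobConf, call
        // FileInputFormat.setInputPaths with batch.toArray(new Path[0]),
        // and submit the job, waiting for each to finish in turn.
    }
}
```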
Re: OutOfMemory error processing large amounts of gz files
Arun C Murthy-2 wrote:
> On Feb 24, 2009, at 4:03 PM, bzheng wrote:
>> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
>> java.lang.OutOfMemoryError: Java heap space
>
> That tells you that your TaskTracker is running out of memory, not your reduce tasks.
>
> I think you are hitting http://issues.apache.org/jira/browse/HADOOP-4906.
>
> What version of hadoop are you running?
>
> Arun

I'm using 0.18.2. We figured that gz may not be the root problem when we ran a big job not involving any gz files; after about 1.5 hours, we got the same out-of-memory problem. One interesting thing, though: if we do use gz files, the out-of-memory issue occurs within a few minutes.
Re: OutOfMemory error processing large amounts of gz files
On Feb 24, 2009, at 4:03 PM, bzheng wrote:
> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
> java.lang.OutOfMemoryError: Java heap space

That tells you that your TaskTracker is running out of memory, not your reduce tasks.

I think you are hitting http://issues.apache.org/jira/browse/HADOOP-4906.

What version of hadoop are you running?

Arun
Re: OutOfMemory error processing large amounts of gz files
Thanks for the suggestions. I tried the hadoop.native.lib setting (both in the job config and in hadoop-site.xml plus a restart) and the problem is still there. I finally got the exception with some stack trace, and here it is:

2009-02-25 12:24:18,312 INFO org.apache.hadoop.mapred.TaskTracker:
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOfRange(Arrays.java:3209)
	at java.lang.String.<init>(String.java:216)
	at java.lang.StringBuffer.toString(StringBuffer.java:585)
	at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1170)
	at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1120)
	at com.sun.org.apache.xerces.internal.dom.DeferredTextImpl.synchronizeData(DeferredTextImpl.java:93)
	at com.sun.org.apache.xerces.internal.dom.CharacterDataImpl.getData(CharacterDataImpl.java:160)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:928)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:851)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:819)
	at org.apache.hadoop.conf.Configuration.get(Configuration.java:278)
	at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:446)
	at org.apache.hadoop.mapred.JobConf.getKeepFailedTaskFiles(JobConf.java:308)
	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.setJobConf(TaskTracker.java:1497)
	at org.apache.hadoop.mapred.TaskTracker.launchTaskForJob(TaskTracker.java:727)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:721)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1297)
	at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:937)
	at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1334)
	at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2343)

Tom White-3 wrote:
> Do you experience the problem with and without native compression? Set hadoop.native.lib to false to disable native compression.
>
> Cheers,
> Tom
>
> On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr wrote:
>> If you're doing a lot of gzip compression/decompression, you *might* be hitting this 6+-year-old Sun JVM bug:
>>
>> "Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not called promptly enough"
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189
>>
>> A workaround is listed in the issue: ensuring you call close() or end() on the Deflater; something similar might apply to Inflater.
>>
>> (This is one of those fun JVM situations where having more heap space may make OOMEs more likely: less heap memory pressure leaves more un-GCd or un-finalized heap objects around, each of which is holding a bit of native memory.)
>>
>> - Gordon @ IA
Re: OutOfMemory error processing large amounts of gz files
Do you experience the problem with and without native compression? Set hadoop.native.lib to false to disable native compression.

Cheers,
Tom

On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr wrote:
> If you're doing a lot of gzip compression/decompression, you *might* be hitting this 6+-year-old Sun JVM bug:
>
> "Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not called promptly enough"
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189
>
> A workaround is listed in the issue: ensuring you call close() or end() on the Deflater; something similar might apply to Inflater.
>
> (This is one of those fun JVM situations where having more heap space may make OOMEs more likely: less heap memory pressure leaves more un-GCd or un-finalized heap objects around, each of which is holding a bit of native memory.)
>
> - Gordon @ IA
Re: OutOfMemory error processing large amounts of gz files
If you're doing a lot of gzip compression/decompression, you *might* be hitting this 6+-year-old Sun JVM bug:

"Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not called promptly enough"
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189

A workaround is listed in the issue: ensuring you call close() or end() on the Deflater; something similar might apply to Inflater.

(This is one of those fun JVM situations where having more heap space may make OOMEs more likely: less heap memory pressure leaves more un-GCd or un-finalized heap objects around, each of which is holding a bit of native memory.)

- Gordon @ IA

bzheng wrote:
> I have about 24k gz files (about 550GB total) on HDFS and a really simple Java program to convert them into sequence files. If the program's setInputPaths takes a Path[] of all 24k files, it gets an OutOfMemory error at about 35% map complete. If I make the program process 2k files per job and run 12 jobs consecutively, it goes through all the files fine. The cluster I'm using has about 67 nodes. Each node has 16GB memory, max 7 map, and max 2 reduce.
>
> The map task is really simple: it takes LongWritable as key and Text as value, generates a Text newKey, and calls output.collect(newKey, value). It doesn't have any code that could possibly leak memory.
>
> There's no stack trace for the vast majority of the OutOfMemory errors; there's just a single line in the log like this:
>
> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
> java.lang.OutOfMemoryError: Java heap space
>
> I can't find the stack trace right now, but rarely the OutOfMemory error originates from some Hadoop config array copy operation. There's no special config for the program.
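The workaround Gordon points to — releasing the native zlib memory explicitly with end() instead of waiting on finalizers — can be sketched with the plain java.util.zip classes. This is a minimal round-trip example, not Hadoop's codec code:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of the workaround from the JVM bug report above: call end() on
// Deflater/Inflater in a finally block so each instance's native memory is
// freed immediately, rather than whenever its finalizer eventually runs.
public class ZlibRoundTrip {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[input.length + 64]; // ample for small inputs
            int n = deflater.deflate(buf);
            byte[] out = new byte[n];
            System.arraycopy(buf, 0, out, 0, n);
            return out;
        } finally {
            deflater.end(); // frees native zlib memory right away
        }
    }

    static byte[] decompress(byte[] compressed, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            inflater.inflate(out);
            return out;
        } finally {
            inflater.end(); // same treatment for the Inflater
        }
    }
}
```

The same discipline applies to GZIPOutputStream/GZIPInputStream: closing the stream ends the underlying Deflater/Inflater.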
Re: OutofMemory Error, inspite of large amounts provided
Saptarshi Guha wrote:
> Caught it in action. Running
>
> ps -e -o 'vsz pid ruser args' | sort -nr | head -5
>
> on a machine where the map task was running:
>
> 04812 16962 sguha /home/godhuli/custom/jdk1.6.0_11/jre/bin/java -Djava.library.path=/home/godhuli/custom/hadoop/bin/../lib/native/Linux-amd64-64:/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work -Xmx200m -Djava.io.tmpdir=/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work/tmp -classpath /attempt_200812282102_0003_m_00_0/work -Dhadoop.log.dir=/home/godhuli/custom/hadoop/bin/../logs -Dhadoop.root.logger=INFO,TLA -Dhadoop.tasklog.taskid=attempt_200812282102_0003_m_00_0 -Dhadoop.tasklog.totalLogFileSize=0 org.apache.hadoop.mapred.Child 127.0.0.1 40443 attempt_200812282102_0003_m_00_0 1525207782
>
> Also, the reducer only used 540mb. I notice -Xmx200m was passed; how do I change it?
>
> Regards
> Saptarshi

You can set the configuration property mapred.child.java.opts to -Xmx540m.

Thanks
Amareshwari

On Sun, Dec 28, 2008 at 10:19 PM, Saptarshi Guha wrote:
> On Sun, Dec 28, 2008 at 4:33 PM, Brian Bockelman wrote:
>> Hey Saptarshi,
>>
>> Watch the running child process while using "ps", "top", or Ganglia monitoring. Does the map task actually use 16GB of memory, or is the memory not getting set properly?
>>
>> Brian
>
> I haven't figured out how to run Ganglia; also, the children quit before I can see their memory usage. The trackers all use 16GB (from the ps command). However, I noticed some use only 512MB (when I managed to catch them in time).
>
> Regards
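Amareshwari's suggestion would look roughly like this in hadoop-site.xml (the property can also be set on the per-job JobConf; the 540m figure follows the reducer usage reported above):

```xml
<!-- Give each forked child task a 540 MB heap instead of the
     default child JVM options (-Xmx200m, as seen in the ps output above). -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx540m</value>
</property>
```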
Re: OutofMemory Error, inspite of large amounts provided
Caught it in action. Running

ps -e -o 'vsz pid ruser args' | sort -nr | head -5

on a machine where the map task was running:

04812 16962 sguha /home/godhuli/custom/jdk1.6.0_11/jre/bin/java -Djava.library.path=/home/godhuli/custom/hadoop/bin/../lib/native/Linux-amd64-64:/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work -Xmx200m -Djava.io.tmpdir=/home/godhuli/custom/hdfs/mapred/local/taskTracker/jobcache/job_200812282102_0003/attempt_200812282102_0003_m_00_0/work/tmp -classpath /attempt_200812282102_0003_m_00_0/work -Dhadoop.log.dir=/home/godhuli/custom/hadoop/bin/../logs -Dhadoop.root.logger=INFO,TLA -Dhadoop.tasklog.taskid=attempt_200812282102_0003_m_00_0 -Dhadoop.tasklog.totalLogFileSize=0 org.apache.hadoop.mapred.Child 127.0.0.1 40443 attempt_200812282102_0003_m_00_0 1525207782

Also, the reducer only used 540mb. I notice -Xmx200m was passed; how do I change it?

Regards
Saptarshi

On Sun, Dec 28, 2008 at 10:19 PM, Saptarshi Guha wrote:
> On Sun, Dec 28, 2008 at 4:33 PM, Brian Bockelman wrote:
>> Hey Saptarshi,
>>
>> Watch the running child process while using "ps", "top", or Ganglia monitoring. Does the map task actually use 16GB of memory, or is the memory not getting set properly?
>>
>> Brian
>
> I haven't figured out how to run Ganglia; also, the children quit before I can see their memory usage. The trackers all use 16GB (from the ps command). However, I noticed some use only 512MB (when I managed to catch them in time).
>
> Regards

--
Saptarshi Guha - saptarshi.g...@gmail.com
Re: OutofMemory Error, inspite of large amounts provided
On Sun, Dec 28, 2008 at 4:33 PM, Brian Bockelman wrote:
> Hey Saptarshi,
>
> Watch the running child process while using "ps", "top", or Ganglia monitoring. Does the map task actually use 16GB of memory, or is the memory not getting set properly?
>
> Brian

I haven't figured out how to run Ganglia; also, the children quit before I can see their memory usage. The trackers all use 16GB (from the ps command). However, I noticed some use only 512MB (when I managed to catch them in time).

Regards
Re: OutofMemory Error, inspite of large amounts provided
Hey Saptarshi,

Watch the running child process while using "ps", "top", or Ganglia monitoring. Does the map task actually use 16GB of memory, or is the memory not getting set properly?

Brian

On Dec 28, 2008, at 3:00 PM, Saptarshi Guha wrote:
> Hello,
>
> I have work machines with 32GB and allocated 16GB to the heap size:
>
> == hadoop-env.sh ==
> export HADOOP_HEAPSIZE=16384
>
> == hadoop-site.xml ==
> mapred.child.java.opts
> -Xmx16384m
>
> The same code runs when not being run through Hadoop, but it fails when in a map task. Are there other places where I can specify the memory for the map tasks?
>
> Regards
> Saptarshi
>
> --
> Saptarshi Guha - saptarshi.g...@gmail.com
Re: OutOfMemory Error
Great experience!

/Edward

On Fri, Sep 19, 2008 at 2:50 PM, Palleti, Pallavi <[EMAIL PROTECTED]> wrote:
> Yeah. That was the problem. And Hama can surely be useful for large-scale matrix operations.
>
> But for this problem, I have modified the code to just pass the ID information and read the vector information only when it is needed. In this case, it was needed only in the reducer phase. This way, it avoids the out-of-memory error and is also faster now.
>
> Thanks
> Pallavi

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
RE: OutOfMemory Error
Yeah. That was the problem. And Hama can surely be useful for large-scale matrix operations.

But for this problem, I have modified the code to just pass the ID information and read the vector information only when it is needed. In this case, it was needed only in the reducer phase. This way, it avoids the out-of-memory error and is also faster now.

Thanks
Pallavi

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Edward J. Yoon
Sent: Friday, September 19, 2008 10:35 AM
To: core-user@hadoop.apache.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: OutOfMemory Error

> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it will need a distributed vector architecture (or 2D partitioning strategies) for large-scale matrix operations. The Hama team investigates these problem areas, so this may improve if Hama can be used for Mahout in the future.

/Edward
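The fix Pallavi describes — shuffling only the small ID as the key and resolving the dense vector by ID where it is actually needed — can be sketched without any Hadoop or Mahout types. The VectorStore interface here is a hypothetical stand-in for wherever the vectors are persisted (a side file on HDFS, for instance):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "pass only the ID" fix: the mapper emits just the ID
// (a few bytes) instead of "C1:[0.0011, 3.002, ...]" with 160k components,
// and the reduce side looks the vector up on demand.
public class LazyVectorLookup {
    interface VectorStore {
        double[] load(String id); // hypothetical: fetch one vector on demand
    }

    // What the mapper now emits as its output key: the bare ID.
    static String mapOutputKey(String id) {
        return id;
    }

    // Reduce-side work that needs the vector resolves it only here.
    static double sum(String id, VectorStore store) {
        double total = 0.0;
        for (double x : store.load(id)) total += x;
        return total;
    }

    public static void main(String[] args) {
        Map<String, double[]> backing = new HashMap<>();
        backing.put("C1", new double[] {0.0011, 3.002, 1.001});
        VectorStore store = backing::get; // in-memory stand-in for the real store
        System.out.println(sum(mapOutputKey("C1"), store));
    }
}
```

Besides avoiding the OOM, this also shrinks the sort/shuffle volume by roughly the vector size per record, which matches the speedup Pallavi reports.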
Re: OutOfMemory Error
> The key is of the form "ID :DenseVector Representation in mahout with

I guess the vector size is too large, so it will need a distributed vector architecture (or 2D partitioning strategies) for large-scale matrix operations. The Hama team investigates these problem areas, so this may improve if Hama can be used for Mahout in the future.

/Edward

On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
> Hadoop Version - 17.1
> io.sort.factor = 10
> The key is of the form "ID :DenseVector Representation in mahout with dimensionality size = 160k"
> For example: C1:[,0.0011, 3.002, .. 1.001,]
> So, the typical size of the key of the mapper output can be 160K*6 (assuming a double in string form is represented in 5 bytes) + 5 (bytes for C1:[]) + the size required to store that the object is of type Text
>
> Thanks
> Pallavi

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
Re: OutOfMemory Error
Hadoop Version - 17.1
io.sort.factor = 10
The key is of the form "ID :DenseVector Representation in mahout with dimensionality size = 160k"
For example: C1:[,0.0011, 3.002, .. 1.001,]
So, the typical size of the key of the mapper output can be 160K*6 (assuming a double in string form is represented in 5 bytes) + 5 (bytes for C1:[]) + the size required to store that the object is of type Text

Thanks
Pallavi

Devaraj Das wrote:
> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> I am getting an out-of-memory error as shown below when I ran map-red on a huge amount of data:
>>
>> java.lang.OutOfMemoryError: Java heap space
>> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>> 	at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>> 	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>> 	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>
>> The above error comes almost at the end of the map job. I have set the heap size to 1GB. Still the problem persists. Can someone please help me figure out how to avoid this error?
>
> What is the typical size of your key? What is the value of io.sort.factor? Hadoop version?

--
View this message in context: http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
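The back-of-the-envelope key size above works out to nearly a megabyte per map output key, which by itself explains the pressure during the sort/merge phase. A quick check of that arithmetic (the 6-bytes-per-component and 5-byte framing figures come from the message above; Text's own length header adds a few more bytes):

```java
// Quick check of the key-size estimate above: a 160k-component dense vector
// serialized as text at ~6 bytes per component, plus ~5 bytes of "C1:[]"
// framing, is close to 1 MB per map output key.
public class KeySizeEstimate {
    static long estimateBytes(int components, int bytesPerComponent, int framingBytes) {
        return (long) components * bytesPerComponent + framingBytes;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(160_000, 6, 5);
        System.out.println(bytes + " bytes, ~" + (bytes / 1024) + " KB per key");
    }
}
```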
Re: OutOfMemory Error
On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am getting an out-of-memory error as shown below when I ran map-red on a huge amount of data:
>
> java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
> 	at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>
> The above error comes almost at the end of the map job. I have set the heap size to 1GB. Still the problem persists. Can someone please help me figure out how to avoid this error?

What is the typical size of your key? What is the value of io.sort.factor? Hadoop version?
RE: OutOfMemory Error
Hello,

What version of Hadoop are you using?

Regards,

Leon Mergen

-----Original Message-----
From: Pallavi Palleti [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 17, 2008 2:36 PM
To: core-user@hadoop.apache.org
Subject: OutOfMemory Error

> Hi all,
>
> I am getting an out-of-memory error as shown below when I ran map-red on a huge amount of data:
>
> java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
> 	at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
> 	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>
> The above error comes almost at the end of the map job. I have set the heap size to 1GB. Still the problem persists. Can someone please help me figure out how to avoid this error?