Great experience! /Edward
On Fri, Sep 19, 2008 at 2:50 PM, Palleti, Pallavi <[EMAIL PROTECTED]> wrote:
> Yeah, that was the problem. And Hama can surely be useful for large-scale
> matrix operations.
>
> But for this problem, I have modified the code to pass only the ID
> information and to read the vector information only when it is needed. In
> this case, it was needed only in the reduce phase. This avoids the
> out-of-memory error, and the job is also faster now.
>
> Thanks
> Pallavi
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Edward J. Yoon
> Sent: Friday, September 19, 2008 10:35 AM
> To: core-user@hadoop.apache.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: OutOfMemory Error
>
>> The key is of the form "ID:DenseVector representation in Mahout with
>
> I guess the vector size is too large, so large-scale matrix operations
> will need a distributed vector architecture (or 2D partitioning
> strategies). The Hama team is investigating these problem areas, so this
> should improve if Hama can be used for Mahout in the future.
>
> /Edward
>
> On Thu, Sep 18, 2008 at 12:28 PM, Pallavi Palleti <[EMAIL PROTECTED]> wrote:
>>
>> Hadoop version: 0.17.1
>> io.sort.factor = 10
>> The key is of the form "ID:DenseVector representation in Mahout with
>> dimensionality = 160k".
>> For example: C1:[0.00111111, 3.002, ..., 1.001, ...]
>> So the typical size of a mapper output key can be 160K * 6 (assuming each
>> double in string form takes about 5 bytes plus a separator) + 5 (bytes
>> for "C1:[]") + the overhead of storing the object as a Text.
>>
>> Thanks
>> Pallavi
>>
>> Devaraj Das wrote:
>>>
>>> On 9/17/08 6:06 PM, "Pallavi Palleti" <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am getting an out-of-memory error, shown below, when I run map-reduce
>>>> on a huge amount of data:
>>>>
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
>>>>   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>>>>   at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
>>>>   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
>>>>   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
>>>>   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2511)
>>>>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1040)
>>>>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:220)
>>>>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>>>
>>>> The above error comes almost at the end of the map job. I have set the
>>>> heap size to 1 GB, but the problem persists. Can someone please help me
>>>> avoid this error?
>>>
>>> What is the typical size of your key? What is the value of
>>> io.sort.factor? Hadoop version?
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/OutOfMemory-Error-tp19531174p19545298.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
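
A note on the numbers in this thread: a 160,000-dimensional vector serialized
as text at roughly 6 bytes per component comes to about 160,000 * 6 = 960,000
bytes, i.e. close to 1 MB per key. The failing frames (SequenceFile$Sorter's
merge reading raw keys into a growing DataOutputBuffer) are in the map-side
merge, which can hold raw keys from multiple spill segments at once
(io.sort.factor = 10 here), so megabyte-scale keys can exhaust even a 1 GB
heap. The per-task heap is set via mapred.child.java.opts (e.g. -Xmx1024m),
but as the thread shows, shrinking the keys is the real fix.

The thread does not include Pallavi's actual code, so the following is only a
minimal sketch of the pattern she describes, written against the Hadoop
0.17-era mapred API: the map output carries just the short ID, and the full
vector is read from a side file on HDFS only in the reduce phase. The class
names, the Text value types, the vector.side.file property, and the
/data/vectors.seq layout are illustrative assumptions, not code from the
thread.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Map side (one file per public class in practice). Input records are
    // assumed to be (id, serialized vector) pairs, e.g. from a SequenceFile.
    // Only the small ID goes into the map output, so the map-side sort and
    // merge buffers never see the ~1 MB vector strings.
    public class IdOnlyMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {
      private static final Text EMPTY = new Text();

      public void map(Text id, Text vectorTextUnused,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        output.collect(id, EMPTY);
      }
    }

    // Reduce side: the full vector is fetched from a side file on HDFS only
    // now, when it is actually needed.
    public class VectorLookupReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
      private FileSystem fs;
      private Path vectorFile;

      public void configure(JobConf job) {
        try {
          fs = FileSystem.get(job);
          // Assumed layout: a SequenceFile of (Text id, Text vector) pairs.
          vectorFile =
              new Path(job.get("vector.side.file", "/data/vectors.seq"));
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      // Scan the side file for the vector belonging to this ID.
      private Text lookupVector(Text id) throws IOException {
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, vectorFile, fs.getConf());
        try {
          Text k = new Text();
          Text v = new Text();
          while (reader.next(k, v)) {
            if (k.equals(id)) {
              return new Text(v);   // copy: the reader reuses its buffers
            }
          }
        } finally {
          reader.close();
        }
        return null;
      }

      public void reduce(Text id, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        Text vector = lookupVector(id);
        if (vector != null) {
          output.collect(id, vector);
        }
      }
    }

The linear rescan in lookupVector keeps the sketch short; with real data one
would use an indexed format such as a MapFile, or batch the lookups, instead
of re-reading the side file for every key.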