________________________________
From: Jason Venner <jason.had...@gmail.com>
To: common-user@hadoop.apache.org
Sent: Tue, October 27, 2009 3:54:00 PM
Subject: Re: Problem to create sequence file for

If your string is up to 300 MB, you will probably need 1.3+ GB to write it:

1 copy in the String: 600 MB if your file is all ASCII (Java stores string characters as two-byte values)
1 copy in the byte array as UTF-8, a 1x to 3x expansion: say 600 MB
1 copy in the on-the-wire format: say 700 MB
possibly 1 copy in a transit buffer on the way to the remote file system: say 720 MB

That adds up to 1.9 GB to 2.6 GB. Hopefully there are not more copies made ;)
Try setting your heap to 3 or 5 GB with a 64-bit JVM.

On Tue, Oct 27, 2009 at 9:25 AM, bhushan_mahale <bhushan_mah...@persistent.co.in> wrote:
> Hi Jason,
>
> Thanks for the reply.
> The string is the entire content of the input text file.
> It could be as long as ~300 MB.
> I tried increasing the JVM heap, but unfortunately it was giving the same error.
>
> The other option I am considering is to split the input files first.
>
> - Bhushan
>
> -----Original Message-----
> From: Jason Venner [mailto:jason.had...@gmail.com]
> Sent: Tuesday, October 27, 2009 7:19 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Problem to create sequence file for
>
> How large is the string that is being written?
> Does it contain the entire contents of your file?
> You may simply need to increase the heap size for your JVM.
>
> On Tue, Oct 27, 2009 at 3:43 AM, bhushan_mahale <bhushan_mah...@persistent.co.in> wrote:
>
> > Hi,
> >
> > I have written a program to create sequence files from given text files.
> > The program takes the following input parameters:
> >
> > 1. Local source directory - contains all the input text files
> > 2. Destination HDFS URI - location on HDFS where the sequence file will be copied
> >
> > The key for a sequence record is the file name.
> > The value for a sequence record is the content of the text file.
> >
> > The program runs fine for a large number of input text files.
> > But if the size of a single input text file is > 100 MB, it throws the following exception:
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> >         at java.lang.String.toCharArray(String.java:2726)
> >         at org.apache.hadoop.io.Text.encode(Text.java:388)
> >         at org.apache.hadoop.io.Text.set(Text.java:178)
> >         at org.apache.hadoop.io.Text.<init>(Text.java:81)
> >         at SequenceFileCreator.create(SequenceFileCreator.java:106)
> >         at SequenceFileCreator.processFile(SequenceFileCreator.java:168)
> >
> > I am using "org.apache.hadoop.io.SequenceFile.Writer" for creating the sequence file. The Text class is used for the keyclass and valclass.
> >
> > I tried increasing the max memory for the program, but it throws the same error.
> >
> > Can you provide your suggestions?
> >
> > Thanks,
> > - Bhushan
> >
> > DISCLAIMER
> > ==========
> > This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
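Jason's copy count can be checked with a quick back-of-the-envelope sketch. The per-copy sizes below are his estimates from the reply above, not measured values:

```java
// Back-of-the-envelope heap estimate for writing one ~300 MB ASCII file
// as a Hadoop Text value, using the per-copy sizes from Jason's reply.
public class HeapEstimate {
    public static void main(String[] args) {
        long mb = 1L << 20;
        long stringCopy = 600 * mb;  // 300M chars held as UTF-16: 2 bytes per char
        long utf8Copy   = 600 * mb;  // Text.encode's byte[] ("1 to x3 expansion, say 600MB")
        long wireCopy   = 700 * mb;  // on-the-wire format ("say 700MB")
        long transitBuf = 720 * mb;  // transit buffer to the remote file system ("say 720MB")

        long total = stringCopy + utf8Copy + wireCopy + transitBuf;
        System.out.println("~" + total / mb + " MB of transient heap");  // ~2620 MB
    }
}
```

At roughly 2.6 GB of transient copies for a 300 MB input, the default heap has no chance, which is why the thread suggests an `-Xmx` of 3 to 5 GB on a 64-bit JVM.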
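The stack trace dies in String.toCharArray, so one way to shrink the copy count is to never build the String at all: read the file straight into a byte[] and hand it to Text.set(byte[], int, int), which accepts UTF-8 bytes directly. A minimal stdlib-only sketch; the Hadoop wiring in the comment is an assumption, since the original SequenceFileCreator code was not posted:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawBytesRead {
    public static void main(String[] args) throws IOException {
        // Stand-in for one of the input text files (hypothetical content).
        Path input = Files.createTempFile("seqdemo", ".txt");
        Files.write(input, "hello sequence file".getBytes(StandardCharsets.UTF_8));

        // One byte[] copy of the file, instead of a String (2 bytes/char)
        // plus the toCharArray copy plus the UTF-8 re-encode in Text.encode.
        byte[] raw = Files.readAllBytes(input);

        // With Hadoop on the classpath, the bytes can go into the value
        // without re-encoding (sketch only; "writer" and the key are assumed):
        //   Text value = new Text();
        //   value.set(raw, 0, raw.length);      // bypasses String.toCharArray
        //   writer.append(new Text(input.getFileName().toString()), value);

        System.out.println(new String(raw, StandardCharsets.UTF_8));
        Files.delete(input);
    }
}
```

This halves or better the transient footprint for ASCII input, though a single record still has to fit in memory; for truly huge files, splitting the input first, as Bhushan suggests, remains the safer route.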