Re: Huge text file for Hadoop Mapreduce
You can get the Wikipedia data dump from its website; it's pretty big.

Regards,
Stanley Shi

On Tue, Jul 8, 2014 at 1:35 PM, Du Lam wrote:

> Configuration conf = getConf();
> conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 1000);
>
> // You can set this to some small value (in bytes) to ensure your file will
> // split across multiple mappers, provided the format is not an unsplittable
> // one like .snappy.
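If you want the dump to land straight in HDFS, a rough sketch like the one below should do it; the URL and the target path are only examples, so check dumps.wikimedia.org for the current file names before using it.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FetchDumpToHdfs {
  public static void main(String[] args) throws Exception {
    // Example dump URL -- check dumps.wikimedia.org for the file you actually want.
    String url = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2";
    // Example HDFS target path -- adjust to your own directory layout.
    Path target = new Path("/user/hadoop/enwiki-latest-pages-articles.xml.bz2");

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try (InputStream in = new URL(url).openStream();
         OutputStream out = fs.create(target)) {
      // Stream the download directly into HDFS without landing it on local disk first.
      IOUtils.copyBytes(in, out, conf, false);
    }
  }
}

For a plain word-count test it may be simplest to decompress the dump to plain text first.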
Re: Huge text file for Hadoop Mapreduce
Configuration conf = getConf();
conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 1000);

// You can set this to some small value (in bytes) to ensure your file will
// split across multiple mappers, provided the format is not an unsplittable
// one like .snappy.

On Tue, Jul 8, 2014 at 7:32 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefi...@hotmail.com> wrote:

> http://www.cs.cmu.edu/~./enron/
>
> Not sure of the uncompressed size, but pretty sure it's over a gig.
>
> B.
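Coming back to the split setting above: a minimal, self-contained word-count driver sketch with it wired in might look like this (the class names, the job name, and the 1000-byte cap are just examples):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (word, 1) for every token in the line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum the counts for each word.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // Cap the split size (in bytes) so even a small input is divided across several mappers.
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 1000);

    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
  }
}

FileInputFormat.setMaxInputSplitSize(job, 1000) sets the same property through the API, if you prefer that over the raw key.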
Re: Huge text file for Hadoop Mapreduce
http://www.cs.cmu.edu/~./enron/

Not sure of the uncompressed size, but pretty sure it's over a gig.

B.

From: navaz
Sent: Monday, July 07, 2014 6:22 PM
To: user@hadoop.apache.org
Subject: Huge text file for Hadoop Mapreduce

Hi,

I am running the basic word count MapReduce code. I downloaded a file, Gettysburg.txt, which is 1,486 bytes. I have 3 datanodes and the replication factor is set to 3. The data is copied onto all 3 datanodes, but only one map task runs; all the other nodes are idle. I think this is because I have only one block of data, so only a single task is launched. I would like to download a bigger file, say 1 GB, to test the network shuffle performance. Could you please suggest where I can download a huge text file?

Thanks & Regards

Abdul Navaz
Huge text file for Hadoop Mapreduce
Hi,

I am running the basic word count MapReduce code. I downloaded a file, Gettysburg.txt, which is 1,486 bytes. I have 3 datanodes and the replication factor is set to 3. The data is copied onto all 3 datanodes, but only one map task runs; all the other nodes are idle. I think this is because I have only one block of data, so only a single task is launched. I would like to download a bigger file, say 1 GB, to test the network shuffle performance. Could you please suggest where I can download a huge text file?

Thanks & Regards

Abdul Navaz
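P.S. If there is no good download out there, I suppose I could also just generate a big file myself with something like this rough sketch (the file name and target size are arbitrary) and push it to HDFS with hdfs dfs -put:

import java.io.BufferedWriter;
import java.io.FileWriter;

public class MakeBigTextFile {
  public static void main(String[] args) throws Exception {
    long targetBytes = 1024L * 1024L * 1024L;  // roughly 1 GB, adjust as needed
    String line = "Four score and seven years ago our fathers brought forth on this continent a new nation\n";

    long written = 0;
    try (BufferedWriter out = new BufferedWriter(new FileWriter("big.txt"))) {
      // Repeat the same line until the file reaches the target size.
      while (written < targetBytes) {
        out.write(line);
        written += line.length();
      }
    }
  }
}

With the default HDFS block size, a file that big should split into many blocks and therefore many map tasks.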