Re: dfs.block.size
You can use FileSystem.getFileStatus(Path p); calling getBlockSize() on the FileStatus it returns gives you the block size of that specific file.

On Tue, Feb 28, 2012 at 2:50 AM, Kai Voigt wrote:
> "hadoop fsck -blocks" is something that I think of quickly.
>
> http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has
> more details
Re: dfs.block.size
"hadoop fsck <path> -files -blocks" is the first thing that comes to mind; it lists the blocks of every file under the given path.

http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more details.

Kai Voigt

On 28.02.2012 at 02:30, Mohit Anchlia wrote:
> How do I verify the block size of a given file? Is there a command?
Re: dfs.block.size
How do I verify the block size of a given file? Is there a command?

On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria wrote:
> dfs.block.size can be set per job.
>
> mapred.tasktracker.map.tasks.maximum is per tasktracker.
>
> -Joey
Re: dfs.block.size
dfs.block.size can be set per job. It is read on the client side when a file is created, so a value set in the job's Configuration applies to the files that job writes.

mapred.tasktracker.map.tasks.maximum is per tasktracker. Each tasktracker reads it from its own configuration when it starts, so a job cannot override it.

-Joey

On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia wrote:
> Can someone please suggest if parameters like dfs.block.size,
> mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can
> these be set per client job configuration?

--
Joseph Echeverria
Cloudera, Inc.
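As a small illustration of the per-job setting: dfs.block.size is given in bytes, so a size quoted in MB has to be converted first. The sketch below only builds the property string. The helper name `block_size_option` is made up for this example, and it assumes the job's driver accepts Hadoop's generic `-D property=value` option (for instance because it runs through ToolRunner).

```python
MB = 1024 * 1024


def block_size_option(block_mb: int) -> str:
    """Build a '-D dfs.block.size=<bytes>' argument from a size in MB.

    Hypothetical helper for illustration; it only formats a string and
    does not talk to a cluster.
    """
    if block_mb <= 0:
        raise ValueError("block size must be positive")
    return f"-D dfs.block.size={block_mb * MB}"


if __name__ == "__main__":
    # The argument one might pass on the command line of a single job.
    print(block_size_option(256))
```

The same byte value could equally be set in the job's Configuration object; either way it affects only files written by that client, not files already in HDFS.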
Re: dfs.block.size
Can someone please tell me whether parameters like dfs.block.size and mapred.tasktracker.map.tasks.maximum are cluster-wide settings only, or can they also be set in a client job's configuration?

On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia wrote:
> If I want to change the block size then can I use Configuration in
> mapreduce job and set it when writing to the sequence file or does it need
> to be cluster wide setting in .xml files?
>
> Also, is there a way to check the block size of a given file?
dfs.block.size
If I want to change the block size, can I set it through the Configuration in my MapReduce job when writing the sequence file, or does it need to be a cluster-wide setting in the .xml files? Also, is there a way to check the block size of a given file?
Re: Question about dfs.block.size setting
Hi Harsh,

Thanks for your comments. I found the line "Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures." quite useful. But I'm still confused about why increasing the block size for large jobs would improve performance. In my own test, sorting 2 TB of data on a 30-node cluster, increasing the block size from 64 MB to 256 MB reduced performance instead of improving it. Could anybody tell me why this happened?

Any comments on this? Thanks.

Best Regards,
Carp

2010/7/22 Harsh J
> This article has a few good lines that should clear that doubt of yours:
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
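To put rough numbers on the test described in this message, the sketch below counts map tasks and scheduling "waves" for 2 TB of input on 30 nodes at both block sizes. It assumes one map task per block, and the figure of 8 map slots per node is purely hypothetical (the thread does not state it); the function names are made up for this example.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def map_task_count(input_bytes: int, block_bytes: int) -> int:
    """Approximate map task count, assuming one input split per block."""
    return ceil_div(input_bytes, block_bytes)


def task_waves(tasks: int, nodes: int, slots_per_node: int) -> int:
    """Rounds needed to run all tasks when every map slot is kept busy."""
    return ceil_div(tasks, nodes * slots_per_node)


if __name__ == "__main__":
    input_size = 2 * TB
    nodes = 30
    slots_per_node = 8  # hypothetical; not stated in the thread
    for block_mb in (64, 256):
        tasks = map_task_count(input_size, block_mb * MB)
        waves = task_waves(tasks, nodes, slots_per_node)
        print(f"{block_mb} MB blocks: {tasks} map tasks, {waves} waves")
```

Under these assumptions the larger block size cuts the task count by a factor of four, which lowers per-task framework overhead but also leaves the scheduler fewer, longer tasks to balance across the cluster. That is the trade-off the quoted wiki sentence describes; the sketch does not by itself explain the slowdown observed in the test.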
Re: Question about dfs.block.size setting
This article has a few good lines that should clear up that doubt of yours:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

On Thu, Jul 22, 2010 at 9:17 AM, Yu Li wrote:
> Hi all,
>
> There're lots of materials from internet suggest to set dfs.block.size
> larger, e.g. from 64M to 256M, when the job is large. And they said the
> performance would improve. But I'm not clear why increasing the block size
> will improve it. I know that increasing the block size will reduce the map
> task number for the same input, but why would fewer map tasks improve
> overall performance?

--
Harsh J
www.harshj.com
Question about dfs.block.size setting
Hi all,

Lots of material on the internet suggests setting dfs.block.size larger, e.g. from 64 MB to 256 MB, when the job is large, and says that performance will improve. But I'm not clear why increasing the block size would help. I know that a larger block size reduces the number of map tasks for the same input, but why would fewer map tasks improve overall performance?

Any comments would be highly valued, and thanks in advance.

Best Regards,
Carp
Questions on dfs.block.size
Hi Experts:

Is there any way to make a changed dfs.block.size take effect on files that were written before the change? And would that even be meaningful?

If I run job A, does it copy its input files into HDFS again when those files are already in HDFS? If it does not, then applying a new dfs.block.size to old files would be meaningful: if I find that job A runs slowly and the subsequent job B uses the same input files as job A, I could change dfs.block.size so that the number of map tasks for job B increases, which may make the job run faster. If the input files always have to be copied every time a job runs (even when they are already in HDFS), then making dfs.block.size take effect on old files would not be meaningful.

Thanks!
Stan Lee
Re: dfs.block.size change not taking effect?
Block size may not be the only answer. Look into the way the namenode distributes the blocks across your datanodes, and check whether the client datanode is creating a bottleneck.

zeevik wrote:
> I am changing the default dfs.block.size from 64MB to 256MB (or any other
> value) in hadoop-site.xml file and restarting the cluster to make sure
> changes are applied. Now the issue is that when I am trying to put a file
> on the hdfs (hadoop fs -put) it seems like the block size is always 64MB
> (browsing the filesystem via the http interface). Hadoop version is 0.19.1
> on a 6 node cluster.
>
> 1. Why the new block size is not reflected when I am creating/loading a
> new file into the hdfs?
> 2. How can I see current parameters and their values on Hadoop to make
> sure the change in hadoop-site.xml file took effect at the restart?
>
> I am trying to load a large file into HDFS and it seems slow (1.5min for
> 1GB), that's why I am trying to increase the block size.
>
> Thanks,
> Zeev

--
View this message in context: http://www.nabble.com/dfs.block.size-change-not-taking-affect--tp24654181p24654233.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
dfs.block.size change not taking effect?
New member here, hello everyone!

I am changing the default dfs.block.size from 64 MB to 256 MB (or any other value) in the hadoop-site.xml file and restarting the cluster to make sure the change is applied. The issue is that when I put a file on HDFS (hadoop fs -put), the block size always seems to be 64 MB (as seen when browsing the filesystem via the HTTP interface). The Hadoop version is 0.19.1 on a 6-node cluster.

1. Why is the new block size not reflected when I create/load a new file into HDFS?
2. How can I see the current parameters and their values in Hadoop, to make sure the change in hadoop-site.xml took effect at the restart?

I am trying to load a large file into HDFS and it seems slow (1.5 min for 1 GB), which is why I am trying to increase the block size.

Thanks,
Zeev
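For reference, the figures in this message can be turned into a baseline: the sketch below computes the observed upload rate and the number of blocks a 1 GB file occupies at each block size. It is plain arithmetic on the numbers quoted above, treating 1 GB as 1024 MB; the function names are made up for this example.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds


def block_count(file_mb: int, block_mb: int) -> int:
    """Number of HDFS blocks a file occupies (last block may be partial)."""
    return (file_mb + block_mb - 1) // block_mb


if __name__ == "__main__":
    file_mb = 1024   # 1 GB file, as in the message
    elapsed_s = 90   # 1.5 minutes
    rate = throughput_mb_per_s(file_mb, elapsed_s)
    print(f"upload rate: {rate:.1f} MB/s")
    for block_mb in (64, 256):
        print(f"{block_mb} MB blocks: {block_count(file_mb, block_mb)} blocks")
```

A larger block size changes how many blocks the file is split into, not how many bytes have to be transferred, so the rate computed here is mainly useful as a number to compare against after the setting has been changed.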