Re: dfs.block.size
You can use FileSystem.getFileStatus(Path p), which gives you the block size specific to a file.

On Tue, Feb 28, 2012 at 2:50 AM, Kai Voigt k...@123.org wrote:
> hadoop fsck filename -blocks is something that I think of quickly.
> http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more details.
> Kai

(earlier quoted messages trimmed; see the full messages below in this thread)

--
Join me at http://hadoopworkshop.eventbrite.com/
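The suggestion above can be sketched as a small client program. This is a minimal sketch, assuming the hadoop-common/hadoop-hdfs jars are on the classpath and a cluster is reachable; the path and class name are made up for illustration:

```java
// Sketch: reading the per-file block size via the HDFS client API.
// Assumes Hadoop jars on the classpath and a running cluster; the
// path below is a hypothetical example.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up *-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path("/user/mohit/data.seq"));
        // getBlockSize() returns the block size this particular file was
        // written with, which may differ from the cluster-wide default.
        System.out.println("Block size: " + status.getBlockSize() + " bytes");
    }
}
```

Note that the value is per file: a file keeps the block size it was written with, regardless of later changes to the cluster default.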
Re: dfs.block.size
Can someone please suggest if parameters like dfs.block.size and mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or can these be set per client job configuration?

On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> If I want to change the block size, can I use Configuration in the MapReduce job and set it when writing to the sequence file, or does it need to be a cluster-wide setting in the .xml files? Also, is there a way to check the block size of a given file?
Re: dfs.block.size
dfs.block.size can be set per job. mapred.tasktracker.map.tasks.maximum is per tasktracker.

-Joey

On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
> Can someone please suggest if parameters like dfs.block.size, mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can these be set per client job configuration?

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
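Joey's distinction can be sketched in job-setup code. This is a sketch, not a complete job; it assumes the era-appropriate (pre-2.x) property names used in this thread:

```java
// Sketch: dfs.block.size is a client-side setting, so a job can override
// it for the files its tasks write. mapred.tasktracker.map.tasks.maximum,
// by contrast, is read by each TaskTracker daemon at startup, so setting
// it in a job configuration has no effect.
import org.apache.hadoop.conf.Configuration;

public class PerJobBlockSize {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Per-job override: files written by this job use 256 MB blocks.
        conf.setLong("dfs.block.size", 256L * 1024 * 1024); // value in bytes
        // ... pass conf to the JobConf / job submission as usual ...
        System.out.println("dfs.block.size = " + conf.getLong("dfs.block.size", -1));
    }
}
```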
Re: dfs.block.size
How do I verify the block size of a given file? Is there a command?

On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote:
> dfs.block.size can be set per job. mapred.tasktracker.map.tasks.maximum is per tasktracker.
> -Joey
Re: dfs.block.size
hadoop fsck filename -blocks is something that I think of quickly.
http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more details.

Kai

On 28.02.2012 at 02:30, Mohit Anchlia wrote:
> How do I verify the block size of a given file? Is there a command?

--
Kai Voigt
k...@123.org
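For reference, the invocation Kai mentions looks like this (the path is a hypothetical example; -files adds per-file detail to the block listing):

```shell
hadoop fsck /user/mohit/data.seq -files -blocks
```

The output lists each block of the file along with its length, from which the block size the file was written with can be read off.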
dfs.block.size
If I want to change the block size, can I use Configuration in the MapReduce job and set it when writing to the sequence file, or does it need to be a cluster-wide setting in the .xml files? Also, is there a way to check the block size of a given file?
Re: Question about dfs.block.size setting
Hi Harsh,

Thanks for your comments. I found "Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures" quite useful. But I'm still confused why increasing the block size for large jobs would improve performance. And according to the result of my test, while sorting 2TB of data on a 30-node cluster, increasing the block size from 64M to 256M degraded performance instead of improving it. Could anybody tell me why this happened? Any comments on this? Thanks.

Best Regards,
Carp

2010/7/22 Harsh J qwertyman...@gmail.com
> This article has a few good lines that should clear that doubt of yours:
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
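One way to make the trade-off in Carp's test concrete: the number of map tasks is roughly the input size divided by the block size (one map per HDFS block, under default split behavior). A small self-contained calculation, no Hadoop needed:

```java
public class MapCountEstimate {
    // Rough estimate: one map task per HDFS block of input (ceiling division).
    static long mapTasks(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long twoTB = 2L * 1024 * 1024 * 1024 * 1024;
        long mb = 1024L * 1024;
        // 2 TB at 64 MB blocks vs 256 MB blocks:
        System.out.println("64 MB blocks:  " + mapTasks(twoTB, 64 * mb) + " map tasks");
        System.out.println("256 MB blocks: " + mapTasks(twoTB, 256 * mb) + " map tasks");
    }
}
```

So going from 64M to 256M cuts the map count from 32768 to 8192. Fewer maps means less per-task startup overhead, but also fewer units for load balancing across the 30 nodes and a larger amount of work re-run when a task fails, which is one plausible reason the larger block size hurt this particular sort.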
Question about dfs.block.size setting
Hi all,

There are lots of materials on the internet suggesting to set dfs.block.size larger, e.g. from 64M to 256M, when the job is large, and they say the performance will improve. But I'm not clear why increasing the block size will improve performance. I know that increasing the block size will reduce the number of map tasks for the same input, but why would fewer map tasks improve overall performance? Any comments would be highly valued, and thanks in advance.

Best Regards,
Carp
Re: Question about dfs.block.size setting
This article has a few good lines that should clear that doubt of yours:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

On Thu, Jul 22, 2010 at 9:17 AM, Yu Li car...@gmail.com wrote:
> There're lots of materials from internet suggest to set dfs.block.size larger, e.g. from 64M to 256M, when the job is large. And they said the performance would improve. But I'm not clear why increse the block size will improve.

--
Harsh J
www.harshj.com
dfs.block.size change not taking effect?
.. New member here, hello everyone! ..

I am changing the default dfs.block.size from 64MB to 256MB (or any other value) in the hadoop-site.xml file and restarting the cluster to make sure the changes are applied. Now the issue is that when I try to put a file on HDFS (hadoop fs -put), it seems like the block size is always 64MB (browsing the filesystem via the http interface). Hadoop version is 0.19.1 on a 6-node cluster.

1. Why is the new block size not reflected when I create/load a new file into HDFS?
2. How can I see the current parameters and their values on Hadoop, to make sure the change in the hadoop-site.xml file took effect at the restart?

I am trying to load a large file into HDFS and it seems slow (1.5 min for 1GB); that's why I am trying to increase the block size.

Thanks,
Zeev

--
View this message in context: http://www.nabble.com/dfs.block.size-change-not-taking-affect--tp24654181p24654181.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
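A likely explanation, consistent with the earlier replies in this archive: dfs.block.size is a client-side setting, read by the process that writes the file. So the hadoop-site.xml that matters is the one on the classpath of the machine running `hadoop fs -put`, not (only) the one on the datanodes. A sketch of the relevant config fragment, with the value in bytes:

```xml
<!-- hadoop-site.xml on the *client* machine performing the -put -->
<property>
  <name>dfs.block.size</name>
  <value>268435456</value> <!-- 256 MB, specified in bytes -->
</property>
```

If the client-side file already looks right, check for another copy of hadoop-site.xml earlier on the classpath shadowing it, and remember that files already in HDFS keep the block size they were written with.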