Re: dfs.block.size

2012-02-28 Thread madhu phatak
You can use FileSystem.getFileStatus(Path p), which gives you the block size
specific to a file.
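
Something along these lines (an untested sketch; the class name and the path
argument are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSize {
  public static void main(String[] args) throws Exception {
    // Picks up the *-site.xml files on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // getFileStatus() returns the metadata stored for the file,
    // including the block size it was written with.
    FileStatus status = fs.getFileStatus(new Path(args[0]));
    System.out.println(status.getPath() + " block size: "
        + status.getBlockSize() + " bytes");
  }
}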

On Tue, Feb 28, 2012 at 2:50 AM, Kai Voigt k...@123.org wrote:

 hadoop fsck filename -blocks is the first thing that comes to mind.

 http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has
 more details

 Kai

 On 28.02.2012 at 02:30, Mohit Anchlia wrote:

  How do I verify the block size of a given file? Is there a command?
 
  On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com
 wrote:
 
  dfs.block.size can be set per job.
 
  mapred.tasktracker.map.tasks.maximum is per tasktracker.
 
  -Joey
 
  On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com
 
  wrote:
  Can someone please suggest whether parameters like dfs.block.size and
  mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or
  can these be set per client job configuration?
 
  On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com
  wrote:
 
  If I want to change the block size, can I use Configuration in the
  MapReduce job and set it when writing to the sequence file, or does it
  need to be a cluster-wide setting in the .xml files?
 
  Also, is there a way to check the block size of a given file?
 
 
 
 
  --
  Joseph Echeverria
  Cloudera, Inc.
  443.305.9434
 

 --
 Kai Voigt
 k...@123.org







-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: dfs.block.size

2012-02-27 Thread Mohit Anchlia
Can someone please suggest whether parameters like dfs.block.size and
mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or can
these be set per client job configuration?

On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 If I want to change the block size, can I use Configuration in the MapReduce
 job and set it when writing to the sequence file, or does it need to be a
 cluster-wide setting in the .xml files?

 Also, is there a way to check the block size of a given file?



Re: dfs.block.size

2012-02-27 Thread Joey Echeverria
dfs.block.size can be set per job.

mapred.tasktracker.map.tasks.maximum is per tasktracker.
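
A minimal sketch of the per-job/per-client override (untested; the output path
and key/value types are placeholders, and in later Hadoop versions the key is
renamed dfs.blocksize). Because the block size is applied on the client side,
setting it on the Configuration/JobConf used to write the file is enough,
whereas mapred.tasktracker.map.tasks.maximum is read by each TaskTracker when
the daemon starts:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class WriteWithLargeBlocks {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask HDFS for 256 MB blocks on any file created through this conf.
    conf.setLong("dfs.block.size", 256L * 1024 * 1024);
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[0]), LongWritable.class, Text.class);
    writer.append(new LongWritable(1L), new Text("hello"));
    writer.close();
  }
}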

-Joey

On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 Can someone please suggest whether parameters like dfs.block.size and
 mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or can
 these be set per client job configuration?

 On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 If I want to change the block size, can I use Configuration in the MapReduce
 job and set it when writing to the sequence file, or does it need to be a
 cluster-wide setting in the .xml files?

 Also, is there a way to check the block size of a given file?




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: dfs.block.size

2012-02-27 Thread Mohit Anchlia
How do I verify the block size of a given file? Is there a command?

On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote:

 dfs.block.size can be set per job.

 mapred.tasktracker.map.tasks.maximum is per tasktracker.

 -Joey

 On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
  Can someone please suggest whether parameters like dfs.block.size and
  mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or
  can these be set per client job configuration?
 
  On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
 
  If I want to change the block size, can I use Configuration in the
  MapReduce job and set it when writing to the sequence file, or does it
  need to be a cluster-wide setting in the .xml files?
 
  Also, is there a way to check the block size of a given file?
 



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434



Re: dfs.block.size

2012-02-27 Thread Kai Voigt
hadoop fsck filename -blocks is the first thing that comes to mind.

http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more 
details
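
For example (the path here is made up), the long form that also prints where
each block lives would be something like:

hadoop fsck /user/mohit/data.seq -files -blocks -locations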

Kai

On 28.02.2012 at 02:30, Mohit Anchlia wrote:

 How do I verify the block size of a given file? Is there a command?
 
 On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote:
 
 dfs.block.size can be set per job.
 
 mapred.tasktracker.map.tasks.maximum is per tasktracker.
 
 -Joey
 
 On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
 Can someone please suggest whether parameters like dfs.block.size and
 mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or
 can these be set per client job configuration?
 
 On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
 
 If I want to change the block size, can I use Configuration in the
 MapReduce job and set it when writing to the sequence file, or does it
 need to be a cluster-wide setting in the .xml files?
 
 Also, is there a way to check the block size of a given file?
 
 
 
 
 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434
 

-- 
Kai Voigt
k...@123.org






dfs.block.size

2012-02-25 Thread Mohit Anchlia
If I want to change the block size, can I use Configuration in the MapReduce
job and set it when writing to the sequence file, or does it need to be a
cluster-wide setting in the .xml files?

Also, is there a way to check the block size of a given file?


Re: Question about dfs.block.size setting

2010-07-22 Thread Yu Li
Hi Harsh,

Thanks for your comments. I found "Increasing the number of tasks increases
the framework overhead, but increases load balancing and lowers the cost of
failures." quite useful. But I'm still confused about why increasing the block
size for large jobs should improve performance. And according to the results of
my test, while sorting 2 TB of data on a 30-node cluster, increasing the block
size from 64M to 256M degraded performance instead of improving it. Could
anybody tell me why this happened?

Any comments on this? Thanks.

Best Regards,
Carp

2010/7/22 Harsh J qwertyman...@gmail.com

 This article has a few good lines that should clear that doubt of yours:
 http://wiki.apache.org/hadoop/HowManyMapsAndReduces

 On Thu, Jul 22, 2010 at 9:17 AM, Yu Li car...@gmail.com wrote:
  Hi all,
 
  There are lots of materials on the internet suggesting setting dfs.block.size
  larger, e.g. from 64M to 256M, when the job is large. And they say the
  performance would improve. But I'm not clear on why increasing the block size
  would improve it. I know that increasing the block size will reduce the
  number of map tasks for the same input, but why would fewer map tasks improve
  overall performance?
 
  Any comments would be highly valued, and thanks in advance.
 
  Best Regards,
  Carp
 



 --
 Harsh J
 www.harshj.com



Question about dfs.block.size setting

2010-07-21 Thread Yu Li
Hi all,

There are lots of materials on the internet suggesting setting dfs.block.size
larger, e.g. from 64M to 256M, when the job is large. And they say the
performance would improve. But I'm not clear on why increasing the block size
would improve it. I know that increasing the block size will reduce the number
of map tasks for the same input, but why would fewer map tasks improve overall
performance?

Any comments would be highly valued, and thanks in advance.

Best Regards,
Carp


Re: Question about dfs.block.size setting

2010-07-21 Thread Harsh J
This article has a few good lines that should clear that doubt of yours:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
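
As a back-of-the-envelope illustration of the trade-off described there
(numbers picked only for concreteness): a 2 TB input at a 64 MB block size
splits into roughly 2,097,152 MB / 64 MB = 32,768 map tasks, while 256 MB
blocks give about 8,192. Fewer tasks mean less per-task startup and scheduling
overhead, but also coarser load balancing and more work lost whenever a task
fails.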

On Thu, Jul 22, 2010 at 9:17 AM, Yu Li car...@gmail.com wrote:
 Hi all,

 There are lots of materials on the internet suggesting setting dfs.block.size
 larger, e.g. from 64M to 256M, when the job is large. And they say the
 performance would improve. But I'm not clear on why increasing the block size
 would improve it. I know that increasing the block size will reduce the number
 of map tasks for the same input, but why would fewer map tasks improve overall
 performance?

 Any comments would be highly valued, and thanks in advance.

 Best Regards,
 Carp




-- 
Harsh J
www.harshj.com


dfs.block.size change not taking effect?

2009-07-24 Thread zeevik


.. New member here, hello everyone! ..

I am changing the default dfs.block.size from 64MB to 256MB (or any other
value) in the hadoop-site.xml file and restarting the cluster to make sure the
changes are applied. Now the issue is that when I try to put a file onto
HDFS (hadoop fs -put), it seems like the block size is always 64MB
(browsing the filesystem via the HTTP interface). The Hadoop version is 0.19.1
on a 6-node cluster.

1. Why is the new block size not reflected when I create/load a new file into
HDFS?
2. How can I see the current parameters and their values in Hadoop, to make
sure the change in the hadoop-site.xml file took effect after the restart?

I am trying to load a large file into HDFS and it seems slow (1.5 min for
1 GB); that's why I am trying to increase the block size.
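
A side note worth checking (not verified on 0.19.1 specifically):
dfs.block.size is read by the client that writes the file, so the
hadoop-site.xml on the machine running the -put is the one that matters, and
the value can also be overridden per command via the generic -D option. With a
made-up file name, something like:

hadoop fs -D dfs.block.size=268435456 -put bigfile.dat /user/zeev/bigfile.dat
hadoop fsck /user/zeev/bigfile.dat -files -blocks

The first command asks for 256 MB blocks (value in bytes) just for that upload;
the second shows what block size HDFS actually recorded for the file.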

Thanks,
Zeev