How to see block information on NameNode?

2011-05-21 Thread praveenesh kumar
Hey!

I have a question.
If I copy a file onto HDFS, it gets split into blocks, and the
NameNode keeps all of that metadata. How can I see that info?
I copied a 5 GB file onto HDFS, but I only see the whole file on the
NameNode; it does not appear to be split into blocks.
How can I tell whether my file is being split into blocks, and which
DataNode is keeping which block?

Thanks,
Praveenesh


Re: How to see block information on NameNode?

2011-05-21 Thread Harsh J
One way: open the file in the NameNode web UI; the bottom of the
page lists the locations of each block, block by block.
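(For reference, the NameNode web UI of that era is typically served on
port 50070, e.g. http://namenode-host:50070, with a "Browse the
filesystem" link leading to the file listing; the host name here is a
placeholder.)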

On Sat, May 21, 2011 at 11:46 AM, praveenesh kumar praveen...@gmail.com wrote:
 Hey!

 I have a question.
 If I copy a file onto HDFS, it gets split into blocks, and the
 NameNode keeps all of that metadata. How can I see that info?
 I copied a 5 GB file onto HDFS, but I only see the whole file on the
 NameNode; it does not appear to be split into blocks.
 How can I tell whether my file is being split into blocks, and which
 DataNode is keeping which block?

 Thanks,
 Praveenesh




-- 
Harsh J


Re: Using df instead of du to calculate datanode space

2011-05-21 Thread Edward Capriolo
Good job. I brought this up in another thread, but was told it was not a
problem. Good thing I'm not crazy.

On Sat, May 21, 2011 at 12:42 AM, Joe Stein
charmal...@allthingshadoop.com wrote:

 I came up with a nice little hack to trick Hadoop into calculating disk
 usage with df instead of du:


 http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/

 I am running this in production, works like a charm and already
 seeing benefit, woot!

 I hope it works well for others too.

 /*
 Joe Stein
 http://www.twitter.com/allthingshadoop
 */
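
(Background: the DataNode periodically runs du on each data directory to
measure used space, which is what causes the wait I/O on large disks; the
hack above answers that query from df instead.)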



Re: Hadoop and WikiLeaks

2011-05-21 Thread highpointe
 Does this copy text bother anyone else? Sure, winning any award is great,
 but does Hadoop want to be associated with innovation like WikiLeaks?

[Only] through the free distribution of information, the guaranteed integrity 
of said information and an aggressive system of checks and balances can man 
truly be free and hold the winning card. 

So...  YES. Hadoop should be considered an innovation that promotes the free 
flow of information and serves as a statistical whistle-blower. 

Take off your damn aluminum hat. If it doesn't work for you, it will work 
against you. 

On May 19, 2011, at 8:54 AM, James Seigel ja...@tynt.com wrote:

 Does this copy text bother anyone else? Sure, winning any award is great,
 but does Hadoop want to be associated with innovation like WikiLeaks?


Re: Using df instead of du to calculate datanode space

2011-05-21 Thread Niels Basjes
Hi,

Although I like the thought of doing things smarter, I'm never, ever
going to change core Unix/Linux applications for the sake of one
specific application. Linux handles scripts and binaries completely
differently with regard to security. So how do you know for sure (I
mean 100% sure, not just 99.9% sure) that you haven't broken
any other functionality needed to keep your system sane?

Why don't you file a feature request, so this needless disk I/O can
be fixed in the Hadoop code base itself (instead of by breaking the
underlying OS)?

Niels

2011/5/21 Edward Capriolo edlinuxg...@gmail.com:
 Good job. I brought this up in another thread, but was told it was not a
 problem. Good thing I'm not crazy.

 On Sat, May 21, 2011 at 12:42 AM, Joe Stein
 charmal...@allthingshadoop.com wrote:

  I came up with a nice little hack to trick Hadoop into calculating disk
  usage with df instead of du:


 http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/

 I am running this in production, works like a charm and already
 seeing benefit, woot!

 I hope it works well for others too.

 /*
 Joe Stein
 http://www.twitter.com/allthingshadoop
 */





-- 
Met vriendelijke groeten,

Niels Basjes


Re: How to see block information on NameNode?

2011-05-21 Thread Bharath Mundlapudi
Another way is to execute this command:
hadoop fsck <file path> -files -blocks -locations
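
For example, with a hypothetical path:

hadoop fsck /user/praveenesh/bigfile -files -blocks -locations

This prints the file's blocks and, for each block, the DataNodes holding
its replicas.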

-Bharath



From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org
Sent: Saturday, May 21, 2011 6:45 AM
Subject: Re: How to see block information on NameNode?

One way: open the file in the NameNode web UI; the bottom of the
page lists the locations of each block, block by block.

On Sat, May 21, 2011 at 11:46 AM, praveenesh kumar praveen...@gmail.com wrote:
 Hey!

 I have a question.
 If I copy a file onto HDFS, it gets split into blocks, and the
 NameNode keeps all of that metadata. How can I see that info?
 I copied a 5 GB file onto HDFS, but I only see the whole file on the
 NameNode; it does not appear to be split into blocks.
 How can I tell whether my file is being split into blocks, and which
 DataNode is keeping which block?

 Thanks,
 Praveenesh




-- 
Harsh J

get name of file in mapper output directory

2011-05-21 Thread Mark question
Hi,

  I'm running a map-only job, and at the end of each map task (i.e. in the
close() method) I want to open the file that the current map task has
written through its OutputCollector.

  I know job.getWorkingDirectory() gives me the parent path of the file
written, but how do I get the full path or the file name (i.e. part-0 or
part-1)?

Thanks,
Mark
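
A minimal sketch of one way to do this with the old mapred API (untested;
the part-NNNNN naming and the mapred.task.partition property reflect the
default output naming, which your job may override):

import java.io.IOException;
import java.text.NumberFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PartNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf conf;

  @Override
  public void configure(JobConf job) {
    this.conf = job; // keep the conf so close() can rebuild the file name
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(new Text("key"), value);
  }

  @Override
  public void close() throws IOException {
    // The task's partition number determines the part-file suffix.
    int partition = conf.getInt("mapred.task.partition", 0);
    NumberFormat nf = NumberFormat.getInstance();
    nf.setGroupingUsed(false);
    nf.setMinimumIntegerDigits(5);
    String name = "part-" + nf.format(partition);
    // Resolve the name against this task's own output directory.
    Path outFile = FileOutputFormat.getTaskOutputPath(conf, name);
    // Caveat: records may not be flushed until the output is closed, so
    // reading outFile here can see partial data.
    // FileSystem fs = outFile.getFileSystem(conf); ... read outFile ...
  }
}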


Sorting ...

2011-05-21 Thread Mark question
I'm trying to sort SequenceFiles using the Hadoop example TeraSort, but
after a couple of minutes the output comes out empty.

HDFS has the following SequenceFiles:
-rw-r--r--   1 Hadoop supergroup  196113760 2011-05-21 12:16
/user/Hadoop/out/part-0
-rw-r--r--   1 Hadoop supergroup  250935096 2011-05-21 12:16
/user/Hadoop/out/part-1
-rw-r--r--   1 Hadoop supergroup  262943648 2011-05-21 12:17
/user/Hadoop/out/part-2
-rw-r--r--   1 Hadoop supergroup  114888492 2011-05-21 12:17
/user/Hadoop/out/part-3

After running: hadoop jar hadoop-mapred-examples-0.21.0.jar terasort out sorted
the error is:
11/05/21 18:13:12 INFO mapreduce.Job:  map 74% reduce 20%
11/05/21 18:13:14 INFO mapreduce.Job: Task Id :
attempt_201105202144_0039_m_09_0, Status : FAILED
java.io.EOFException: read past eof

I'm trying to find out what input format TeraSort expects, but it isn't
specified.

Thanks for any thoughts,
Mark
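
For reference: the example TeraSort is normally run against data produced
by the companion TeraGen tool rather than arbitrary SequenceFiles, along
these lines (the row count and paths are placeholders):

hadoop jar hadoop-mapred-examples-0.21.0.jar teragen 10000000 tera-in
hadoop jar hadoop-mapred-examples-0.21.0.jar terasort tera-in tera-out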


Re: current line number as key?

2011-05-21 Thread Mark question
What if you run a MapReduce job to generate a SequenceFile from your text
file, where the key is the line number and the value is the whole line? For
the second job, the splits are then done record-wise, so each mapper gets a
split/block of [lineNumber, line] records.

~Cheers,
Mark

On Wed, May 18, 2011 at 12:18 PM, Robert Evans ev...@yahoo-inc.com wrote:

 You are correct that there is no easy and efficient way to do this.

 You could create a new InputFormat that derives from FileInputFormat that
 makes it so the files do not split, and then have a RecordReader that keeps
 track of line numbers.  But then each file is read by only one mapper.

 Alternatively, you could assume that the split is going to be done
 deterministically and do two passes: one where you count the number of lines
 in each partition, and a second that then assigns line numbers based on the
 output of the first. But that requires two map passes.

 --Bobby Evans
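
 A minimal sketch of the first approach (new mapreduce API; untested, and
 LineNumberInputFormat is an assumed name; only the stock Hadoop classes it
 builds on are real):

 import java.io.IOException;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.InputSplit;
 import org.apache.hadoop.mapreduce.JobContext;
 import org.apache.hadoop.mapreduce.RecordReader;
 import org.apache.hadoop.mapreduce.TaskAttemptContext;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

 public class LineNumberInputFormat extends FileInputFormat<LongWritable, Text> {

   @Override
   protected boolean isSplitable(JobContext context, Path file) {
     return false; // one mapper per file, so the line count is global per file
   }

   @Override
   public RecordReader<LongWritable, Text> createRecordReader(
       InputSplit split, TaskAttemptContext context) {
     // Wrap the stock line reader, replacing its byte-offset key
     // with a running line number.
     return new RecordReader<LongWritable, Text>() {
       private final LineRecordReader lines = new LineRecordReader();
       private final LongWritable key = new LongWritable();
       private long lineNo = 0;

       public void initialize(InputSplit s, TaskAttemptContext c)
           throws IOException, InterruptedException {
         lines.initialize(s, c);
       }

       public boolean nextKeyValue() throws IOException, InterruptedException {
         if (!lines.nextKeyValue()) return false;
         key.set(++lineNo); // 1-based line number instead of the byte offset
         return true;
       }

       public LongWritable getCurrentKey() { return key; }

       public Text getCurrentValue() throws IOException, InterruptedException {
         return lines.getCurrentValue();
       }

       public float getProgress() throws IOException, InterruptedException {
         return lines.getProgress();
       }

       public void close() throws IOException { lines.close(); }
     };
   }
 }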


 On 5/18/11 1:53 PM, Alexandra Anghelescu axanghele...@gmail.com wrote:

 Hi,

 It is hard to pick up certain lines of a text file - globally, I mean.
 Remember that the file is split according to its size (byte boundaries), not
 lines, so it is possible to keep track of the lines inside a split, but
 globally for the whole file, assuming it is split among map tasks... I
 don't think it is possible. I am new to Hadoop, but that is my take on it.

 Alexandra

 On Wed, May 18, 2011 at 2:41 PM, bnonymous libei.t...@gmail.com wrote:

 
  Hello,
 
  I'm trying to pick up certain lines of a text file (say the 1st and 110th
  line of a file with 10^10 lines). I need an InputFormat that gives the
  Mapper the line number as the key.
 
  I tried to implement RecordReader, but I can't get line information from
  InputSplit.
 
  Any solution to this???
 
  Thanks in advance!!!
  --
  View this message in context:
 
 http://old.nabble.com/current-line-number-as-key--tp31649694p31649694.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 




Re: Configuring jvm metrics in hadoop-0.20.203.0

2011-05-21 Thread Luke Lu
On Fri, May 20, 2011 at 9:02 AM, Matyas Markovics
markovics.mat...@gmail.com wrote:

 Hi,
 I am trying to get JVM metrics from the new version of Hadoop.
 I have read the migration instructions and came up with the following
 content for hadoop-metrics2.properties:

 *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
 jvm.sink.file.period=2
 jvm.sink.file.filename=/home/ec2-user/jvmmetrics.log

The (documented) syntax is
[lowercased-service].sink.[sink-name].[option], so for the jobtracker it
would be jobtracker.sink.file...

This will get all metrics from all the contexts (unlike metrics1, where
you're required to configure each context). If you want to restrict
the sink to only jvm metrics, do this:

jobtracker.sink.jvmfile.class=${*.sink.file.class}
jobtracker.sink.jvmfile.context=jvm
jobtracker.sink.jvmfile.filename=/path/to/jobtracker-jvm-metrics.out

 Any help would be appreciated even if you have a different approach to
 get memory usage from reducers.

reducetask.sink.file.filename=/path/to/reducetask-metrics.out
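
Putting that together, a minimal hadoop-metrics2.properties for reduce-task
JVM (memory) metrics might look like this (the period and path are
placeholders):

*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
reducetask.sink.file.period=2
reducetask.sink.file.context=jvm
reducetask.sink.file.filename=/path/to/reducetask-jvm-metrics.out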

__Luke