How to see block information on NameNode?
Hey! I have a question. If I copy a file onto the HDFS file system, it gets split into blocks, and the NameNode keeps all of this meta info. How can I see that info? I copied a 5 GB file onto HDFS from the NameNode, but I only see the whole file there; it does not seem to get split into blocks. How can I see whether my file is getting split into blocks, and which DataNode is keeping which block? Thanks, Praveenesh
Re: How to see block information on NameNode?
One way: opening the file in the NameNode web UI and looking at the bottom of the page will show you all of its block locations, block by block. -- Harsh J
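(For reference: the web UI Harsh refers to is served by the NameNode's embedded HTTP server; on a stock install of this vintage it is usually reachable at http://<namenode-host>:50070/ under "Browse the filesystem", though the exact port depends on your dfs.http.address setting.)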
Re: Using df instead of du to calculate datanode space
Good job. I brought this up in another thread, but was told it was not a problem. Good thing I'm not crazy. On Sat, May 21, 2011 at 12:42 AM, Joe Stein charmal...@allthingshadoop.com wrote: I came up with a nice little hack to trick Hadoop into calculating disk usage with df instead of du: http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/ I am running this in production; it works like a charm and I'm already seeing the benefit, woot! I hope it works well for others too. /* Joe Stein http://www.twitter.com/allthingshadoop */
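(The gist of the trick: when a DataNode data directory owns a whole partition, the used-space number du computes by walking every block file is already sitting in the filesystem's own accounting, so something along the lines of

df -k /data/dfs | tail -1 | awk '{print $3}'

returns the same used-kilobytes figure in constant time that du -sk /data/dfs spends stat()ing the whole tree to produce. The mount point here is hypothetical; the actual wrapper script is in the linked post.)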
Re: Hadoop and WikiLeaks
The copy text in question reads: "[Only] through the free distribution of information, the guaranteed integrity of said information and an aggressive system of checks and balances can man truly be free and hold the winning card." So... YES. Hadoop should be considered an innovation that promotes the free flow of information and a statistical whistle blower. Take off your damn aluminum hat. If it doesn't work for you, it will work against you. On May 19, 2011, at 8:54 AM, James Seigel ja...@tynt.com wrote: Does this copy text bother anyone else? Sure, winning any award is great, but does Hadoop want to be associated with innovation like WikiLeaks?
Re: Using df instead of du to calculate datanode space
Hi, although I like the thought of doing things smarter, I'm never, ever going to change core Unix/Linux applications for the sake of a specific application. Linux handles scripts and binaries completely differently with regard to security. So how do you know for sure (I mean 100% sure, not just 99.9% sure) that you haven't broken any other functionality needed to keep your system sane? Why don't you file a feature request so this needless disk I/O can be fixed as part of the base code of Hadoop (instead of breaking the underlying OS)? -- With kind regards, Niels Basjes
Re: How to see block information on NameNode?
Another way is executing this command: hadoop fsck <file path> -files -blocks -locations -Bharath
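For example, against a hypothetical path, the full invocation would be

hadoop fsck /user/praveenesh/bigfile.dat -files -blocks -locations

where -files prints the status of the file itself, -blocks lists each of its blocks, and -locations adds the DataNodes holding each replica. For a 5 GB file with the common 64 MB block size, you should see roughly 80 blocks listed.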
get name of file in mapper output directory
Hi, I'm running a job with maps only, and I want, at the end of each map (i.e., in the close() function), to open the file that the current map task has written through its OutputCollector. I know job.getWorkingDirectory() would give me the parent path of the file written, but how do I get the full path, or the name (i.e., part-0 or part-1)? Thanks, Mark
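A sketch of one possible answer, assuming the old mapred API: each task is told its partition number, and the old API derives the part-NNNNN file name from it; while the task is still running, the file lives under the task's temporary work directory, which FileOutputFormat.getTaskOutputPath resolves. Treat this as an illustration, not a tested recipe:

import java.io.IOException;
import java.text.NumberFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class TaskOutputName {

  // Old-API naming: partition 3 -> "part-00003".
  // (Jobs on the new API name map outputs part-m-00000 instead.)
  static String partName(JobConf job) {
    NumberFormat nf = NumberFormat.getInstance();
    nf.setMinimumIntegerDigits(5);
    nf.setGroupingUsed(false);
    return "part-" + nf.format(job.getInt("mapred.task.partition", 0));
  }

  // In-progress location of this task's output file; the framework
  // promotes it to the job output directory only when the task commits.
  static Path currentOutputFile(JobConf job) throws IOException {
    return FileOutputFormat.getTaskOutputPath(job, partName(job));
  }
}

One caveat: at the point close() runs, the framework may not yet have flushed or closed its own RecordWriter for that file, so reading it there can see incomplete data.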
Sorting ...
I'm trying to sort SequenceFiles using the Hadoop example TeraSort, but after a couple of minutes the output is empty. HDFS has the following SequenceFiles:

-rw-r--r-- 1 Hadoop supergroup 196113760 2011-05-21 12:16 /user/Hadoop/out/part-0
-rw-r--r-- 1 Hadoop supergroup 250935096 2011-05-21 12:16 /user/Hadoop/out/part-1
-rw-r--r-- 1 Hadoop supergroup 262943648 2011-05-21 12:17 /user/Hadoop/out/part-2
-rw-r--r-- 1 Hadoop supergroup 114888492 2011-05-21 12:17 /user/Hadoop/out/part-3

After running

hadoop jar hadoop-mapred-examples-0.21.0.jar terasort out sorted

the error is:

11/05/21 18:13:12 INFO mapreduce.Job: map 74% reduce 20%
11/05/21 18:13:14 INFO mapreduce.Job: Task Id : attempt_201105202144_0039_m_09_0, Status : FAILED
java.io.EOFException: read past eof

I'm trying to find out what the input format for TeraSort is, but it is not specified. Thanks for any thoughts, Mark
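(A likely explanation, though the thread never got an answer: the example TeraSort is not a general-purpose sorter. It reads its input with its own TeraInputFormat, which expects the flat fixed-width records written by the companion TeraGen tool, so pointing it at SequenceFiles would plausibly fail with exactly this kind of EOFException. A quick sanity check, with an arbitrary row count:

hadoop jar hadoop-mapred-examples-0.21.0.jar teragen 1000000 tera-in
hadoop jar hadoop-mapred-examples-0.21.0.jar terasort tera-in tera-out

Sorting your own SequenceFiles instead would mean writing a job that uses SequenceFileInputFormat and relies on the shuffle's sorting, rather than reusing TeraSort.)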
Re: current line number as key?
What if you run a MapReduce program to generate a SequenceFile from your text file, where the key is the line number and the value is the whole line? Then, for the second job, the splits are done record-wise, so each mapper will get a split/block of [lineNumber, line] records. ~Cheers, Mark

On Wed, May 18, 2011 at 12:18 PM, Robert Evans ev...@yahoo-inc.com wrote: You are correct that there is no easy and efficient way to do this. You could create a new InputFormat that derives from FileInputFormat and makes files non-splittable, and then have a RecordReader that keeps track of line numbers; but then each file is read by only one mapper. Alternatively, you could assume that the split is going to be done deterministically and do two passes: one that counts the number of lines in each partition, and a second that assigns line numbers based on the output of the first. But that requires two map passes. --Bobby Evans

On 5/18/11 1:53 PM, Alexandra Anghelescu axanghele...@gmail.com wrote: Hi, it is hard to pick out certain lines of a text file globally, I mean. Remember that the file is split according to its size (byte boundaries), not lines, so it is possible to keep track of the lines inside a split, but globally, for the whole file, assuming it is split among map tasks, I don't think it is possible. I am new to Hadoop, but that is my take on it. Alexandra

On Wed, May 18, 2011 at 2:41 PM, bnonymous libei.t...@gmail.com wrote: Hello, I'm trying to pick up certain lines of a text file (say the 1st and 110th lines of a file with 10^10 lines). I need an InputFormat that gives the Mapper the line number as the key. I tried to implement a RecordReader, but I can't get line information from the InputSplit. Any solution to this? Thanks in advance!
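To make Bobby's first option concrete, here is a rough sketch against the old mapred API: a FileInputFormat that refuses to split files, plus a RecordReader that replaces LineRecordReader's byte-offset key with a running line count. Untested, and the numbering is per file, with exactly the one-mapper-per-file trade-off he cautions about:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class LineNumberInputFormat extends FileInputFormat<LongWritable, Text> {

  // One mapper per file, so the counter really is the global line
  // number within that file.
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final LineRecordReader lines = new LineRecordReader(job, (FileSplit) split);
    return new RecordReader<LongWritable, Text>() {
      private final LongWritable offset = new LongWritable(); // discarded
      private long lineNumber = 0;

      public boolean next(LongWritable key, Text value) throws IOException {
        if (!lines.next(offset, value)) {
          return false;
        }
        key.set(++lineNumber); // 1-based line number instead of byte offset
        return true;
      }

      public LongWritable createKey() { return new LongWritable(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return lines.getPos(); }
      public float getProgress() throws IOException { return lines.getProgress(); }
      public void close() throws IOException { lines.close(); }
    };
  }
}

With this in place, bnonymous's job can simply compare the key against the wanted line numbers in the mapper and emit only matching lines.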
Re: Configuring jvm metrics in hadoop-0.20.203.0
On Fri, May 20, 2011 at 9:02 AM, Matyas Markovics markovics.mat...@gmail.com wrote:

Hi, I am trying to get JVM metrics from the new version of Hadoop. I have read the migration instructions and came up with the following content for hadoop-metrics2.properties:

*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
jvm.sink.file.period=2
jvm.sink.file.filename=/home/ec2-user/jvmmetrics.log

The (documented) syntax is [lowercased-service].sink.[sink-name].[option], so for the jobtracker it would be jobtracker.sink.file... This will get all metrics from all the contexts (unlike metrics1, where you are required to configure each context). If you want to restrict the sink to only jvm metrics, do this:

jobtracker.sink.jvmfile.class=${*.sink.file.class}
jobtracker.sink.jvmfile.context=jvm
jobtracker.sink.jvmfile.filename=/path/to/namenode-jvm-metrics.out

Any help would be appreciated, even if you have a different approach to get memory usage from reducers.

reducetask.sink.file.filename=/path/to/reducetask-metrics.out

--Luke
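Putting Luke's pointers together, a minimal hadoop-metrics2.properties that captures only JVM metrics from reduce tasks might look like this (the two-second period and the file paths are placeholder choices, not requirements):

*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
reducetask.sink.jvmfile.class=org.apache.hadoop.metrics2.sink.FileSink
reducetask.sink.jvmfile.context=jvm
reducetask.sink.jvmfile.period=2
reducetask.sink.jvmfile.filename=/path/to/reducetask-jvm-metrics.out

Restricting the sink to the jvm context keeps the file down to heap and GC counters, which covers the memory-usage-from-reducers question Matyas asked.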