Re: dfs.block.size change not taking effect?
Block size may not be the only answer. Look into the way the namenode distributes the blocks across your datanodes, and check whether the client datanode is creating a bottleneck.

zeevik wrote:
> .. New member here, hello everyone! ..
>
> I am changing the default dfs.block.size from 64MB to 256MB (or any other value) in the hadoop-site.xml file and restarting the cluster to make sure the changes are applied. Now the issue is that when I try to put a file on HDFS (hadoop fs -put) it seems like the block size is always 64MB (browsing the filesystem via the HTTP interface). The Hadoop version is 0.19.1 on a 6-node cluster.
>
> 1. Why is the new block size not reflected when I am creating/loading a new file into HDFS?
> 2. How can I see the current parameters and their values on Hadoop to make sure the change in the hadoop-site.xml file took effect at the restart?
>
> I am trying to load a large file into HDFS and it seems slow (1.5 min for 1 GB); that's why I am trying to increase the block size.
>
> Thanks,
> Zeev

-- View this message in context: http://www.nabble.com/dfs.block.size-change-not-taking-affect--tp24654181p24654233.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
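A likely explanation for the symptom itself: dfs.block.size is a client-side setting. The block size of a new file is chosen by the HDFS client at create time, so the changed hadoop-site.xml has to be on the classpath of the machine where "hadoop fs -put" runs, not just on the namenode and datanodes, and files created before the change keep reporting 64MB regardless. A minimal sketch of forcing the block size from client code; the class name, path, and values are illustrative, not from the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizePut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Set the block size in the client configuration; this is what file creation consults.
    conf.setLong("dfs.block.size", 256L * 1024 * 1024);
    FileSystem fs = FileSystem.get(conf);

    // Or pass the block size explicitly when creating the file.
    FSDataOutputStream out = fs.create(new Path("/user/zeev/bigfile"), true,
        conf.getInt("io.file.buffer.size", 4096),   // write buffer size
        (short) 3,                                   // replication factor
        256L * 1024 * 1024);                         // block size in bytes
    out.close();
  }
}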
dfs.block.size change not taking effect?
.. New member here, hello everyone! ..

I am changing the default dfs.block.size from 64MB to 256MB (or any other value) in the hadoop-site.xml file and restarting the cluster to make sure the changes are applied. Now the issue is that when I try to put a file on HDFS (hadoop fs -put) it seems like the block size is always 64MB (browsing the filesystem via the HTTP interface). The Hadoop version is 0.19.1 on a 6-node cluster.

1. Why is the new block size not reflected when I am creating/loading a new file into HDFS?
2. How can I see the current parameters and their values on Hadoop to make sure the change in the hadoop-site.xml file took effect at the restart?

I am trying to load a large file into HDFS and it seems slow (1.5 min for 1 GB); that's why I am trying to increase the block size.

Thanks,
Zeev

-- View this message in context: http://www.nabble.com/dfs.block.size-change-not-taking-affect--tp24654181p24654181.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Breaking up maps into separate files?
Hi, in the Reducer, I take each map and break its value into three pieces: binary piece, text piece, and a descriptor. I want to collect the binary pieces all in one output zip file, the text pieces in another output zip file, and the descriptors in an output text file. How do I use MultipleTextOutputFormat to accomplish this? Thank you, Mark
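One common pattern, sketched below with the old mapred API, is to have the reducer prefix each output key with a tag and subclass MultipleTextOutputFormat so that generateFileNameForKeyValue routes each record to a file by tag. The class and tag names here are illustrative, and note that MultipleTextOutputFormat writes text, so it covers the descriptor and text pieces; producing actual zip files needs a different approach, such as the ZipOutputStream trick discussed later in this digest.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Routes each record to a subdirectory of the job output based on a tag
// the reducer prepended to the key ("bin:", "txt:", or anything else).
public class ThreeWayOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    String k = key.toString();
    if (k.startsWith("bin:")) {
      return "binary/" + name;       // binary pieces (Base64 or similar, since this format writes text)
    } else if (k.startsWith("txt:")) {
      return "text/" + name;         // text pieces
    }
    return "descriptors/" + name;    // descriptor lines
  }
}

The format is then registered in job setup with conf.setOutputFormat(ThreeWayOutputFormat.class).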
Re: Use text vs binary output for speed?
Maybe it was slow for me because I was writing from file system to HDFS, but now that I am using Amazon's MR, it will be OK. Thank you, Mark On Fri, Jul 24, 2009 at 3:19 PM, Owen O'Malley wrote: > On Jul 24, 2009, at 1:15 PM, Mark Kerzner wrote: > > SequenceFileOutputFormat write is known to be slow. >> > > It isn't known to be slow. > > Would it help if I >> convert the binary content to text, with something like Base64 encoding, >> and >> use that? >> > > It would be much slower. > > -- Owen >
Re: Use text vs binary output for speed?
On Jul 24, 2009, at 1:15 PM, Mark Kerzner wrote:
> SequenceFileOutputFormat write is known to be slow.

It isn't known to be slow.

> Would it help if I convert the binary content to text, with something like Base64 encoding, and use that?

It would be much slower.

-- Owen
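For what it's worth, keeping the binary payload as a BytesWritable value in a SequenceFile avoids the size inflation and extra CPU of Base64, and block compression can be enabled if the data compresses well. A minimal old-API job-setup sketch; whether compression actually helps depends on the data:

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

// Inside job setup:
JobConf conf = new JobConf();
conf.setOutputFormat(SequenceFileOutputFormat.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(BytesWritable.class);   // binary stays binary, no Base64 round trip
SequenceFileOutputFormat.setCompressOutput(conf, true);
SequenceFileOutputFormat.setOutputCompressionType(conf, SequenceFile.CompressionType.BLOCK);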
Re: Hadoop TeraSort Generator
On Jul 24, 2009, at 11:39 AM, Jim Twensky wrote: Hi, I'm doing some benchmarks on my cluster including the TeraSort benchmark to test a couple of hardware characteristics. When I was playing with Hadoop's generator, I found out that the keys generated by Hadoop's TeraGen implementation are not the same as the official generator located here: http://www.ordinal.com/try.cgi/gensort.tar.gz We haven't checked in the 2009 teragen yet. The one that is checked in is the 2008 version. -- Owen
Use text vs binary output for speed?
Hi, SequenceFileOutputFormat write is known to be slow. Would it help if I convert the binary content to text, with something like Base64 encoding, and use that? Thank you, Mark
Re: DiskChecker$DiskErrorException in TT logs
On Fri, Jul 24, 2009 at 11:00 AM, Aaron Kimball wrote:
> Amandeep,
>
> Does the job fail after that happens? Are there any WARN or ERROR lines in the log nearby, or any exceptions?

These occur in the JT logs:

2009-07-24 00:19:21,598 WARN org.apache.hadoop.mapred.JobInProgress: Running cache for maps missing!! Job details are missing.
2009-07-24 00:19:21,598 WARN org.apache.hadoop.mapred.JobInProgress: Non-running cache for maps missing!! Job details are missing.

There aren't any WARN or ERROR messages in the TT logs. The exact message I'm getting is:

2009-07-24 10:50:32,519 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200907221738_0103/attempt_200907221738_0103_m_15_0/output/file.out in any of the configured local directories

> Three possibilities I can think of:
>
> You may have configured Hadoop to run under /tmp, and tmpwatch or another cleanup utility like that decided to throw away a bunch of files in the temp space while your job was running. In this case, you should consider moving hadoop.tmp.dir and mapred.local.dir out from under the default /tmp.

hadoop.tmp.dir is /hadoop/tmp/${user.name} and mapred.local.dir is ${hadoop.tmp.dir}/mapred/local. It's creating the jobcache directory in mapred.local.dir/local/taskTracker/ when the job is run...

> You might be out of disk space?

Plenty available.

> mapred.local.dir or hadoop.tmp.dir might be set to paths that Hadoop doesn't have the privileges to write to?

Nope.. They are set to paths that the user running hadoop can write to. My hdfs goes to a directory: /hadoop/hdfs Temp goes to: /hadoop/tmp

I deleted the /hadoop/tmp directory and ran the jobs again. It created the directory again on its own. Can't figure out what's wrong. Any other pointers?

> - A
>
> On Thu, Jul 23, 2009 at 2:06 AM, Amandeep Khurana wrote:
> > Hi
> >
> > I get these messages in the TT log while running a job:
> >
> > 2009-07-23 02:03:59,091 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200907221738_0020/attempt_200907221738_0020_r_00_0/output/file.out in any of the configured local directories
> >
> > Whats the problem?
> >
> > Amandeep
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
RE: Remote access to cluster using user as hadoop
Thanks Steve. I will try MiniMRCluster. It will be very helpful. Regards Pallavi -Original Message- From: Steve Loughran [mailto:ste...@apache.org] Sent: Friday, July 24, 2009 7:22 PM To: common-user@hadoop.apache.org Subject: Re: Remote access to cluster using user as hadoop Pallavi Palleti wrote: > Hi all, > > I tried to trackdown to the place where I can add some conditions for not allowing any remote user with username as hadoop(root user) (other than some specific hostnames or ipaddresses). I could see the call path as FsShell -> DistributedFileSystem ->DFSClient - ClientProtocol. As there is no way to debug the code via eclipse (when I ran thru eclipse it points to LocalFileSystem), I followed naive way of debugging by adding print commands. After DFSClient, I couldn't figure out which Class is getting called. From the code, I could see only NameNode extended ClientProtocol, so I was pretty sure that NameNode methods are getting called, but I coudln't see my debug print statements in the logs when I added some print statements in the namenode. Can some one help me what is the flow when a call from Remote machine with same root user name(hadoop) is made? > > I tried for mkdir command which essentially calls mkdirs() method in DFSClient and there by ClientProtocol mkdirs() method. -client side, there are a couple of places where there is an exec("whoami") to determine the username. -server side, everything goes through the namenode. You should put your stuff there, if you want to defend against people using their own versions of the hadoop libraries. -No need to add print statements to trace flow, just set your log4j settings to log at DEBUG to see lots of stuff. -you can bring up a MiniMRCluster() in a single VM, which is how most of the unit tests run. This will let you debug both ends of the DFS conversation within the IDE.
Re: Multiple jobs on Hadoop
On Fri, Jul 24, 2009 at 3:05 PM, Ravi Phulari wrote:
> You can submit multiple MR jobs on same cluster .
> It's better to submit all jobs either from external machine which can be used as a gateway to upload data to HDFS and submit MR jobs or from machine where NN/JT is running .
> There is no problem running MR jobs from other nodes in cluster ( datanode/tt ) but the best practice is to use external machine if available or NN/JT .
>
> - Ravi
>
> On 7/24/09 11:40 AM, "Hrishikesh Agashe" wrote:
>
> Hi,
>
> If I have one cluster with around 20 machines, can I submit different MR jobs from different machines in cluster? Are there any precautions to be taken? (I want to start Nutch crawl as one job and Katta indexing as another job)
>
> --Hrishi
>
> Ravi
> --

Ravi, 10 is my favorite number, so I submit my jobs from datanode10. Question: why do you suggest NN/JT? I feel that a user has the ability to do more collateral damage to a cluster when they have any access to the NN/TT/JT. My fear is someone writing a bash script with an infinite loop that ties up the CPU, or accidentally filling up the disk in some way. To answer the original question: technically, any system with proper network access, the hadoop jars, and a hadoop conf can interact with the cluster.
Re: Multiple jobs on Hadoop
You can submit multiple MR jobs on same cluster . It's better to submit all jobs either from external machine which can be used as a gateway to upload data to HDFS and submit MR jobs or from machine where NN/JT is running . There is no problem running MR jobs from other nodes in cluster ( datanode/tt ) but the best practice is to use external machine if available or NN/JT . - Ravi On 7/24/09 11:40 AM, "Hrishikesh Agashe" wrote: Hi, If I have one cluster with around 20 machines, can I submit different MR jobs from different machines in cluster? Are there any precautions to be taken? (I want to start Nutch crawl as one job and Katta indexing as another job) --Hrishi Ravi --
Multiple jobs on Hadoop
Hi, If I have one cluster with around 20 machines, can I submit different MR jobs from different machines in cluster? Are there any precautions to be taken? (I want to start Nutch crawl as one job and Katta indexing as another job) --Hrishi
Hadoop TeraSort Generator
Hi, I'm doing some benchmarks on my cluster including the TeraSort benchmark to test a couple of hardware characteristics. When I was playing with Hadoop's generator, I found out that the keys generated by Hadoop's TeraGen implementation are not the same as the official generator located here: http://www.ordinal.com/try.cgi/gensort.tar.gz

Here are the first 5 keys generated by Hadoop (note that some binary keys are negative and so not printable as a char):

.t^#\|v$2\ 7...@~?'WdUF w[o||:N&H, ^Eu)

I was wondering if Hadoop's generator is based on the official generator exactly, or is this just a similar implementation producing different results? Could I be displaying the results incorrectly? Here is how I display them:

private void printKey(Text key) {
  byte[] keyBytes = key.getBytes();
  for (int i = 0; i < 10; i++)
    System.out.print((char) keyBytes[i]);
}

Thanks, Jim
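Casting the raw bytes to char will mangle bytes with the high bit set (they come out negative) and makes unprintable keys hard to compare by eye. A possible alternative, printing the first ten key bytes as hex so they can be diffed against gensort's output; the method name is illustrative:

private void printKeyHex(Text key) {
  byte[] keyBytes = key.getBytes();   // the backing array may be longer than getLength()
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < 10; i++) {
    sb.append(String.format("%02x ", keyBytes[i] & 0xff));   // mask to avoid sign extension
  }
  System.out.println(sb.toString().trim());
}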
RE: Questions on How the Namenode Assign Blocks to Datanodes
Thanks for the tip, I am reading the code now, thanks a lot! Boyu Zhang -Original Message- From: Hairong Kuang [mailto:hair...@yahoo-inc.com] Sent: Friday, July 24, 2009 1:55 PM To: common-user@hadoop.apache.org Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes org.apache.hadoop.hdfs.server.namenode.ReplicationTargetChooser contains the block placement policy. ReplicationTargetChooser#chooseLocalNode picks the first location. Hairong On 7/24/09 10:26 AM, "Boyu Zhang" wrote: > Thank you for the reply. Do you by any chance remember where did you read > this? Thanks a lot! > > Boyu > > > > > -Original Message- > From: Hairong Kuang [mailto:hair...@yahoo-inc.com] > Sent: Friday, July 24, 2009 12:54 PM > To: common-user@hadoop.apache.org > Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes > >> you are running on (to save bandwidth). Otherwise, another machine will >> somehow be picked (I forget where and how). > > Another machine will be randomly chosen. > > Hairong > >
Re: A few questions about Hadoop and hard-drive failure handling.
On Fri, Jul 24, 2009 at 6:48 AM, Steve Loughran wrote: > Ryan Smith wrote: > >> > but you dont want to be the one trying to write something just after >> your >> production cluster lost its namenode data. >> >> Steve, >> >> I wasnt planning on trying to solve something like this in production. I >> would assume everyone here is a professional and wouldn't even think of >> something like this, but then again maybe not. I was asking here so i >> knew >> the limitations before i started prototyping failure recovery logic. >> >> -Ryan >> > > > That's good to know. Just worrying, that's all > > the common failure mode people tend to hit is that their editLog, the list > of pending operations, gets truncated when the NN runs out of disk space. > When the NN comes back up, it tries to replay this, but the file is > truncated and the replay fails. Which means the NN doesnt come back up. > > 1. Secondary namenodes help here. > > 2. We really do need Hadoop to recover from this more gracefully, perhaps > by not crashing at this point, and instead halting when the replay finishes. > You will lose some data, but dont end up having to manually edit the binary > edit log to get to the same state. Code and tests would be valued > +1! > > -steve >
Re: DiskChecker$DiskErrorException in TT logs
Amandeep, Does the job fail after that happens? Are there any WARN or ERROR lines in the log nearby, or any exceptions? Three possibilities I can think of: You may have configured Hadoop to run under /tmp, and tmpwatch or another cleanup utility like that decided to throw away a bunch of files in the temp space while your job was running. In this case, you should consider moving hadoop.tmp.dir and mapred.local.dir out from under the default /tmp. You might be out of disk space? mapred.local.dir or hadoop.tmp.dir might be set to paths that Hadoop doesn't have the privileges to write to? - A On Thu, Jul 23, 2009 at 2:06 AM, Amandeep Khurana wrote: > Hi > > I get these messages in the TT log while running a job: > > 2009-07-23 02:03:59,091 INFO org.apache.hadoop.mapred.TaskTracker: > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find > > taskTracker/jobcache/job_200907221738_0020/attempt_200907221738_0020_r_00_0/output/file.out > in any of the configured local directories > > Whats the problem? > > Amandeep > > > > Amandeep Khurana > Computer Science Graduate Student > University of California, Santa Cruz >
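For reference, moving those directories off /tmp is a hadoop-site.xml change on every node running a tasktracker, roughly along these lines (the paths are examples only), followed by a restart of the daemons:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/tmp/${user.name}</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/hadoop/mapred/local</value>
</property>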
Re: Questions on How the Namenode Assign Blocks to Datanodes
org.apache.hadoop.hdfs.server.namenode.ReplicationTargetChooser contains the block placement policy. ReplicationTargetChooser#chooseLocalNode picks the first location. Hairong On 7/24/09 10:26 AM, "Boyu Zhang" wrote: > Thank you for the reply. Do you by any chance remember where did you read > this? Thanks a lot! > > Boyu > > > > > -Original Message- > From: Hairong Kuang [mailto:hair...@yahoo-inc.com] > Sent: Friday, July 24, 2009 12:54 PM > To: common-user@hadoop.apache.org > Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes > >> you are running on (to save bandwidth). Otherwise, another machine will >> somehow be picked (I forget where and how). > > Another machine will be randomly chosen. > > Hairong > >
Re: Links on Hadoop web page broken.
Almost certainly yes, but the links on this page: http://hadoop.apache.org/hdfs/releases.html do not link there. Thanks, Robert

tim robertson wrote:
> I think this is the URL you seek
> http://hadoop.apache.org/common/docs/r0.20.0/changes.html
>
> On Fri, Jul 24, 2009 at 6:31 PM, Robert Engel wrote:
>> Hello,
>>
>> not sure if this has been reported before, but it seems that all links on the Hadoop web page linking to:
>>
>> http://hadoop.apache.org/hdfs/docs/
>>
>> ( or below ) are broken. The content of the directory is empty. I wanted to download release notes to Hadoop 0.20.0
>>
>> http://hadoop.apache.org/hdfs/docs/r0.20.0/changes.html
>>
>> Robert
RE: Questions on How the Namenode Assign Blocks to Datanodes
Thank you for the reply. Do you by any chance remember where did you read this? Thanks a lot! Boyu -Original Message- From: Hairong Kuang [mailto:hair...@yahoo-inc.com] Sent: Friday, July 24, 2009 12:54 PM To: common-user@hadoop.apache.org Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes > you are running on (to save bandwidth). Otherwise, another machine will > somehow be picked (I forget where and how). Another machine will be randomly chosen. Hairong
Re: Questions on How the Namenode Assign Blocks to Datanodes
> you are running on (to save bandwidth). Otherwise, another machine will > somehow be picked (I forget where and how). Another machine will be randomly chosen. Hairong
Re: Links on Hadoop web page broken.
I think this is the URL you seek http://hadoop.apache.org/common/docs/r0.20.0/changes.html On Fri, Jul 24, 2009 at 6:31 PM, Robert Engel wrote: > Hello, > > not sure if this has been reported before, but it seems that all > links on the Hadoop web page linking to: > > http://hadoop.apache.org/hdfs/docs/ > > ( or below ) are broken. The content of the directory is empty. I wanted > to download release notes to Hadoop 0.20.0 > > http://hadoop.apache.org/hdfs/docs/r0.20.0/changes.html > > Robert >
Links on Hadoop web page broken.
Hello, not sure if this has been reported before, but it seems that all links on the Hadoop web page linking to: http://hadoop.apache.org/hdfs/docs/ ( or below ) are broken. The content of the directory is empty. I wanted to download release notes to Hadoop 0.20.0 http://hadoop.apache.org/hdfs/docs/r0.20.0/changes.html Robert
Cannot find IProgressMonitor
I know this is probably a pretty stupid question, but is no one else having this problem with hadoop-0.20.0? "The project was not built since its build path is incomplete. Cannot find the class file for org.eclipse.core.runtime.IProgressMonitor. Fix the build path then try building this project hadoop-0.20.0" I do not find this class in the hadoop/libs jars. I would appreciate any help, thanks newbie
Re: Questions on How the Namenode Assign Blocks to Datanodes
Boyu Zhang wrote: Dear Steve, Thank you for your reply. I did worried about my email got lost, but I will wait for an answer longer next time, thank you for reminding me : ) I understand that if you have data replica = 3, the namenode will assign the blocks that way. However, I still have a question, if the data replica = 1, I just use it for testing to see how HDFS works, what is the policy to decide which datanode gets which block? Thank you so much! If you are running your code on a datanode, it will be on the machine you are running on (to save bandwidth). Otherwise, another machine will somehow be picked (I forget where and how). Hadoop tries to keep the data balanced across machines, to stop one having all the data, others having less. I don't know whether it goes on percentage of disk space free or total amount of data. You'd have to rummage in the source to work out. Like I said, there's been discussion on improving the layout algorithms, to support plugins with different policies.
RE: Questions on How the Namenode Assign Blocks to Datanodes
Dear Steve, Thank you for your reply. I was worried that my email got lost, but I will wait longer for an answer next time; thank you for reminding me : ) I understand that if you have data replica = 3, the namenode will assign the blocks that way. However, I still have a question: if the data replica = 1 (I just use it for testing, to see how HDFS works), what is the policy to decide which datanode gets which block? Thank you so much! Boyu Zhang Ph. D. Student Computer and Information Sciences Department University of Delaware (210) 274-2104 bzh...@udel.edu http://www.eecis.udel.edu/~bzhang -Original Message- From: Steve Loughran [mailto:ste...@apache.org] Sent: Friday, July 24, 2009 7:09 AM To: common-user@hadoop.apache.org Subject: Re: Questions on How the Namenode Assign Blocks to Datanodes 1. dont panic if nobody replies to your message in an hour and resend. Hadoop developers/users are in many different timezones, and people often only look at this at odd times in the day. Its best to wait 24 hours before worrying if your email got lost 2. The namenode decides, usually two blocks to one rack, another block to a different rack. This is to save on datacentre backbone bandwidth, but isolate you from the loss of an entire rack (not so unusual once your rack is on shared DC power/PSUs). 3. There has been discussion on having plug-in policy here, but it would need to work with the load balancer, the code that balances blocks across machines in the background.
Re: Output of a Reducer as a zip file?
I used to write zip files in my reducer; it was very, very fast, and pulling the files out of HDFS was also very fast. In part this is because each reducer might need to write 26k individual files; by writing them as a zip file there was only 1 HDFS file. The job ran about 15x faster that way. I don't have the code handy any more, but it was something on the order of

ZipOutputStream zos = new ZipOutputStream(fs.create("Output.zip"));

where fs is a FileSystem object.

On Thu, Jul 23, 2009 at 8:48 PM, Mark Kerzner wrote:
> Thank you, MultipleOutputFormat is sufficient.
> Mark
>
> On Thu, Jul 23, 2009 at 12:24 AM, Amogh Vasekar wrote:
> > Does MultipleOutputFormat suffice?
> >
> > Cheers!
> > Amogh
> >
> > -Original Message-
> > From: Mark Kerzner [mailto:markkerz...@gmail.com]
> > Sent: Thursday, July 23, 2009 6:24 AM
> > To: core-u...@hadoop.apache.org
> > Subject: Output of a Reducer as a zip file?
> >
> > Hi, my output consists of a number of binary files, corresponding text files, and one descriptor file. Is there a way for my reducer to produce a zip of all binary files, another zip of all text ones, and a separate text descriptor? If not, how close to this can I get? For example, I could code the binary and the text into one text line of an output file, but then I would need some additional processing.
> >
> > Thank you,
> > Mark

-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
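A rough reconstruction of that approach; this is not the original code, and the class and names are made up for illustration. The reducer opens one zip per reduce task, adds an entry per small file, and closes the stream in close() so the zip's central directory gets written:

import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// One HDFS file then holds what would otherwise be thousands of tiny files.
public class ZipWriter {
  private final ZipOutputStream zos;

  public ZipWriter(Configuration conf, Path zipPath) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    zos = new ZipOutputStream(fs.create(zipPath));   // one zip file in HDFS per reduce task
  }

  public void addEntry(String name, byte[] contents) throws IOException {
    zos.putNextEntry(new ZipEntry(name));            // one entry per small file
    zos.write(contents);
    zos.closeEntry();
  }

  public void close() throws IOException {
    zos.close();                                     // call from the reducer's close()
  }
}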
Re: Remote access to cluster using user as hadoop
Pallavi Palleti wrote: Hi all, I tried to trackdown to the place where I can add some conditions for not allowing any remote user with username as hadoop(root user) (other than some specific hostnames or ipaddresses). I could see the call path as FsShell -> DistributedFileSystem ->DFSClient - ClientProtocol. As there is no way to debug the code via eclipse (when I ran thru eclipse it points to LocalFileSystem), I followed naive way of debugging by adding print commands. After DFSClient, I couldn't figure out which Class is getting called. From the code, I could see only NameNode extended ClientProtocol, so I was pretty sure that NameNode methods are getting called, but I coudln't see my debug print statements in the logs when I added some print statements in the namenode. Can some one help me what is the flow when a call from Remote machine with same root user name(hadoop) is made? I tried for mkdir command which essentially calls mkdirs() method in DFSClient and there by ClientProtocol mkdirs() method. -client side, there are a couple of places where there is an exec("whoami") to determine the username. -server side, everything goes through the namenode. You should put your stuff there, if you want to defend against people using their own versions of the hadoop libraries. -No need to add print statements to trace flow, just set your log4j settings to log at DEBUG to see lots of stuff. -you can bring up a MiniMRCluster() in a single VM, which is how most of the unit tests run. This will let you debug both ends of the DFS conversation within the IDE.
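MiniDFSCluster and MiniMRCluster live in the Hadoop test jar; a rough sketch of bringing both up in one JVM so the whole client/namenode conversation can be stepped through in the IDE. Constructor arguments and package names shift a little between releases, so treat this as approximate:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;   // org.apache.hadoop.dfs in 0.18.x
import org.apache.hadoop.mapred.MiniMRCluster;

public class MiniClusterDebug {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster dfs = new MiniDFSCluster(conf, 2, true, null);   // 2 datanodes, format the image
    MiniMRCluster mr = new MiniMRCluster(2, dfs.getFileSystem().getUri().toString(), 1);
    try {
      // drive FsShell / DFSClient calls against dfs.getFileSystem() here,
      // with breakpoints set in the NameNode code -- everything runs in this one JVM
    } finally {
      mr.shutdown();
      dfs.shutdown();
    }
  }
}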
Re: A few questions about Hadoop and hard-drive failure handling.
Ryan Smith wrote: > but you dont want to be the one trying to write something just after your production cluster lost its namenode data. Steve, I wasnt planning on trying to solve something like this in production. I would assume everyone here is a professional and wouldn't even think of something like this, but then again maybe not. I was asking here so i knew the limitations before i started prototyping failure recovery logic. -Ryan That's good to know. Just worrying, that's all the common failure mode people tend to hit is that their editLog, the list of pending operations, gets truncated when the NN runs out of disk space. When the NN comes back up, it tries to replay this, but the file is truncated and the replay fails. Which means the NN doesnt come back up. 1. Secondary namenodes help here. 2. We really do need Hadoop to recover from this more gracefully, perhaps by not crashing at this point, and instead halting when the replay finishes. You will lose some data, but dont end up having to manually edit the binary edit log to get to the same state. Code and tests would be valued -steve
Re: Remote access to cluster using user as hadoop
I guess, I forgot to restart namenode after changes. It is working fine now. Apologies for the spam. Thanks Pallavi - Original Message - From: "Pallavi Palleti" To: common-user@hadoop.apache.org Sent: Friday, July 24, 2009 6:45:02 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: Remote access to cluster using user as hadoop Hi all, I tried to trackdown to the place where I can add some conditions for not allowing any remote user with username as hadoop(root user) (other than some specific hostnames or ipaddresses). I could see the call path as FsShell -> DistributedFileSystem ->DFSClient - ClientProtocol. As there is no way to debug the code via eclipse (when I ran thru eclipse it points to LocalFileSystem), I followed naive way of debugging by adding print commands. After DFSClient, I couldn't figure out which Class is getting called. From the code, I could see only NameNode extended ClientProtocol, so I was pretty sure that NameNode methods are getting called, but I coudln't see my debug print statements in the logs when I added some print statements in the namenode. Can some one help me what is the flow when a call from Remote machine with same root user name(hadoop) is made? I tried for mkdir command which essentially calls mkdirs() method in DFSClient and there by ClientProtocol mkdirs() method. Thanks Pallavi - Original Message - From: "Ted Dunning" To: common-user@hadoop.apache.org Sent: Friday, July 24, 2009 6:22:12 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: Remote access to cluster using user as hadoop Interesting approach. My guess is that this would indeed protect the datanodes from accidental "attack" by stopping access before they are involved. You might also consider just changing the name of the magic hadoop user to something that is more unlikely. The name "hadoop" is not far off what somebody might come up with as a user name for experimenting or running scheduled jobs. On Thu, Jul 23, 2009 at 3:28 PM, Ian Holsman wrote: > I was thinking of alternatives similar to creating a proxy nameserver that > non-privileged users can attach to that forwards those to the "real" > nameserver or just hacking the nameserver so that it switches "hadoop" to > "hadoop_remote" for sessions from untrusted IP's. > > not being familiar with the code, I am presuming that there is a point > where the code determines the userID. can anyone point me to that bit? > I just want to hack it to downgrade superusers, and it doesn't have to be > too clean or work for every edge case. it's more to stop accidental > problems. > -- Ted Dunning, CTO DeepDyve
Re: Remote access to cluster using user as hadoop
Hi all, I tried to trackdown to the place where I can add some conditions for not allowing any remote user with username as hadoop(root user) (other than some specific hostnames or ipaddresses). I could see the call path as FsShell -> DistributedFileSystem ->DFSClient - ClientProtocol. As there is no way to debug the code via eclipse (when I ran thru eclipse it points to LocalFileSystem), I followed naive way of debugging by adding print commands. After DFSClient, I couldn't figure out which Class is getting called. From the code, I could see only NameNode extended ClientProtocol, so I was pretty sure that NameNode methods are getting called, but I coudln't see my debug print statements in the logs when I added some print statements in the namenode. Can some one help me what is the flow when a call from Remote machine with same root user name(hadoop) is made? I tried for mkdir command which essentially calls mkdirs() method in DFSClient and there by ClientProtocol mkdirs() method. Thanks Pallavi - Original Message - From: "Ted Dunning" To: common-user@hadoop.apache.org Sent: Friday, July 24, 2009 6:22:12 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: Remote access to cluster using user as hadoop Interesting approach. My guess is that this would indeed protect the datanodes from accidental "attack" by stopping access before they are involved. You might also consider just changing the name of the magic hadoop user to something that is more unlikely. The name "hadoop" is not far off what somebody might come up with as a user name for experimenting or running scheduled jobs. On Thu, Jul 23, 2009 at 3:28 PM, Ian Holsman wrote: > I was thinking of alternatives similar to creating a proxy nameserver that > non-privileged users can attach to that forwards those to the "real" > nameserver or just hacking the nameserver so that it switches "hadoop" to > "hadoop_remote" for sessions from untrusted IP's. > > not being familiar with the code, I am presuming that there is a point > where the code determines the userID. can anyone point me to that bit? > I just want to hack it to downgrade superusers, and it doesn't have to be > too clean or work for every edge case. it's more to stop accidental > problems. > -- Ted Dunning, CTO DeepDyve
Re: Issue with HDFS Client when datanode is temporarily unavailable
Could some one let me know what would be the reason for failure. If the stream can be closed with out any issue when datanodes are available, it reduces most of the complexity that need to be done at my end. As the stream is failing to close even when datanodes are available, I have to maintain a kind of checkpointing to resume from where the data has failed to copy back to HDFS which will add an overhead for a solution which is near real time. Thanks Pallavi - Original Message - From: "Pallavi Palleti" To: common-user@hadoop.apache.org Sent: Wednesday, July 22, 2009 5:06:49 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: RE: Issue with HDFS Client when datanode is temporarily unavailable Hi all, In simple terms, Why is any output stream that failed to close when the datanodes weren't available fails when I try to close the same again when the datanodes are available? Could someone kindly help me to tackle this situation? Thanks Pallavi -Original Message- From: Palleti, Pallavi [mailto:pallavi.pall...@corp.aol.com] Sent: Tuesday, July 21, 2009 10:21 PM To: common-user@hadoop.apache.org Subject: Issue with HDFS Client when datanode is temporarily unavailable Hi all, We are facing issues with an external application when it tries to write data into HDFS using FSDataOutputStream. We are using hadoop-0.18.2 version. The code works perfectly fine as long as the data nodes are doing well. If the data nodes are unavailable due to some reason (No space left etc, which is temporary due to map red jobs running on the machine), the code fails. I tried to fix the issue by catching the error and waiting for some time before retrying again. During this, I came to know that the actual writes are not happening when we specify out.write() (Even the same case with out.write() followed by out.flush()), but it happens when we actually specify out.close(). During this time, if the datanodes are unavailable, the DFSClient internally tries multiple times before actually throwing exception. Below are the sequence of exceptions that I am seeing. 09/07/21 19:33:25 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 09/07/21 19:33:25 INFO dfs.DFSClient: Abandoning block blk_2612177980121914843_134112 09/07/21 19:33:31 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 09/07/21 19:33:31 INFO dfs.DFSClient: Abandoning block blk_-3499389777806382640_134112 09/07/21 19:33:37 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 09/07/21 19:33:37 INFO dfs.DFSClient: Abandoning block blk_1835125657840860999_134112 09/07/21 19:33:43 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused 09/07/21 19:33:43 INFO dfs.DFSClient: Abandoning block blk_-3979824251735502509_134112[4 times attempt done by DFSClient before throwing exception during which datanode is unavailable] 09/07/21 19:33:49 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block. 
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DF SClient.java:2357) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.ja va:1743) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClie nt.java:1920) 09/07/21 19:33:49 WARN dfs.DFSClient: Error Recovery for block blk_-3979824251735502509_134112 bad datanode[0] 09/07/21 19:33:49 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFS Client.java:2151) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.ja va:1743) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClie nt.java:1897) 09/07/21 19:33:49 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =0 [done by me manually during which datanode is available] 09/07/21 19:33:54 ERROR logwriter.LogWriterToHDFSV2: Failed while creating file for data:some dummy line [21/Jul/2009:17:15:18 somethinghere] with other dummy info :to HDFS java.io.IOException: Could not get block locations. Aborting... at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFS Client.java:2151) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.ja va:1743) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClie nt.java:1897) 09/07/21 19:33:54 INFO logwriter.LogWriterToHDFSV2: Retrying again...number of Attempts =1 [done by me manually during which datanode is available] 09/07/21 19:33:59 ERROR logwriter.LogWriterToHDFSV2: Failed
Re: A few questions about Hadoop and hard-drive failure handling.
> but you dont want to be the one trying to write something just after your production cluster lost its namenode data. Steve, I wasnt planning on trying to solve something like this in production. I would assume everyone here is a professional and wouldn't even think of something like this, but then again maybe not. I was asking here so i knew the limitations before i started prototyping failure recovery logic. -Ryan On Fri, Jul 24, 2009 at 7:05 AM, Steve Loughran wrote: > Ryan Smith wrote: > >> Todd, excellent info, thank you. I use Ganglia, I will set up nagios >> though, good idea. Just one clarification on Question 1. What if I >> actually lose all my master data dirs, and have no back up on the >> secondary >> name node, are the data blocks on all the slaves lost in that situation? >> I >> think GoogleFS serializes the data blocks so they can be reasssembled >> based >> on the headers in the data blocks in that scenario. Just curious if >> Hadoop >> has anything as far as that goes. >> >> -Ryan >> > > don't do that. > > If you want to help with writing HDFS recovery code, I am sure everyone > will welcome it. but you dont want to be the one trying to write something > just after your production cluster lost its namenode data. >
Re: How Does Hadoop stores Meta Data information
On Fri, Jul 24, 2009 at 4:20 PM, ashish pareek wrote: > Hi Everybody, > > From last few weeks I am trying to find out what data > structure does Hadoop uses to store files and directories Meta-data > information. If any one can point out the exact source file where this > source code is, it will help me a lot. > > Thank you in advance. > > You probably want to take a look at org.apache.hadoop.hdfs.server.namenode package (specifically NameNode.java, FSNamesystem.java might be interesting). Essentially it maintains a bunch of tables (maps from filename -> blocklist, block -> datanodes, etc). Some of them are serialized to disk (ie FSImage) while others are just in-memory and are re-constructed after every restart. -- Harish Mallipeddi http://blog.poundbang.in
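Those filename-to-blocks and block-to-datanodes tables aren't exposed directly, but a client can observe their effect through the FileSystem API. A small sketch; the class name is illustrative and the method signatures vary a little between releases:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path(args[0]));          // path to any HDFS file
    BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (BlockLocation b : blocks) {
      // one entry per block: its offset and length in the file, plus the datanodes holding replicas
      System.out.println("offset=" + b.getOffset() + " len=" + b.getLength()
          + " hosts=" + Arrays.toString(b.getHosts()));
    }
  }
}

Running "hadoop fsck <path> -files -blocks -locations" prints similar information straight from the namenode.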
Re: Questions on How the Namenode Assign Blocks to Datanodes
Boyu Zhang wrote: Dear All, I have a question in my mind about HDFS and I cannot find the answer from the documents on the apache website. I have a cluster of 4 machines, one is the namenode and the other 3 are datanodes. When I put 6 files, each 430 MB, to HDFS, the 6 files are split into 42 blocks(64MB each). But what polices are used to assign these blocks to datanode? In my case, machine1 got 14 blocks, machine2 got 12 blocks and machine3 got 16 blocks. Could anyone one help me with it? Or is there any documentation I can read to help me clarify this? 1. dont panic if nobody replies to your message in an hour and resend. Hadoop developers/users are in many different timezones, and people often only look at this at odd times in the day. Its best to wait 24 hours before worrying if your email got lost 2. The namenode decides, usually two blocks to one rack, another block to a different rack. This is to save on datacentre backbone bandwidth, but isolate you from the loss of an entire rack (not so unusual once your rack is on shared DC power/PSUs). 3. There has been discussion on having plug-in policy here, but it would need to work with the load balancer, the code that balances blocks across machines in the background.
Re: A few questions about Hadoop and hard-drive failure handling.
Ryan Smith wrote: Todd, excellent info, thank you. I use Ganglia, I will set up nagios though, good idea. Just one clarification on Question 1. What if I actually lose all my master data dirs, and have no back up on the secondary name node, are the data blocks on all the slaves lost in that situation? I think GoogleFS serializes the data blocks so they can be reasssembled based on the headers in the data blocks in that scenario. Just curious if Hadoop has anything as far as that goes. -Ryan don't do that. If you want to help with writing HDFS recovery code, I am sure everyone will welcome it. but you dont want to be the one trying to write something just after your production cluster lost its namenode data.
How Does Hadoop stores Meta Data information
Hi Everybody, For the last few weeks I have been trying to find out what data structure Hadoop uses to store file and directory metadata information. If anyone can point out the exact source file where this code is, it will help me a lot. Thank you in advance. Regards, Ashish.