RE: Balancer exiting immediately despite having work to do.
James, http://pastebin.com/mYBRKDew

Tomorrow I'll run the balancer again and grab a copy of the namenode logs as well. Didn't think of that today.

-Landy

-----Original Message-----
From: jameswarr...@gmail.com [mailto:jameswarr...@gmail.com] On Behalf Of James Warren
Sent: Wednesday, January 04, 2012 7:49 PM
To: common-user@hadoop.apache.org
Subject: Re: Balancer exiting immediately despite having work to do.

Hi Landy -

Attachments are stripped from e-mails sent to the mailing list. Could you publish your logs on pastebin and forward the URL?

cheers,
-James

On Wed, Jan 4, 2012 at 10:03 AM, Bible, Landy wrote:
> Hi all,
>
> I'm running Hadoop 0.20.2. The balancer has suddenly stopped working.
> I'm attempting to balance the cluster with a threshold of 1, using the
> following command:
>
> ./hadoop balancer -threshold 1
>
> This has been working fine, but suddenly it isn't. It skips through 5
> iterations without actually doing any work:
>
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
> Jan 4, 2012 11:47:56 AM  0           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:56 AM  1           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:56 AM  2           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:57 AM  3           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:57 AM  4           0 KB                 1.87 GB             6.68 GB
> No block has been moved for 5 iterations. Exiting...
> Balancing took 524.0 milliseconds
>
> I've attached the full log, but I can't see any errors indicating why
> it is failing. Any ideas? I'd really like to get balancing working again.
> My use case isn't the norm, and it is important that the cluster stay
> as close to completely balanced as possible.
>
> --
> Landy Bible
> Simulation and Computer Specialist
> School of Nursing - Collins College of Business
> The University of Tulsa
Balancer exiting immediately despite having work to do.
Hi all,

I'm running Hadoop 0.20.2. The balancer has suddenly stopped working. I'm attempting to balance the cluster with a threshold of 1, using the following command:

./hadoop balancer -threshold 1

This has been working fine, but suddenly it isn't. It skips through 5 iterations without actually doing any work:

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Jan 4, 2012 11:47:56 AM  0           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:56 AM  1           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:56 AM  2           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:57 AM  3           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:57 AM  4           0 KB                 1.87 GB             6.68 GB
No block has been moved for 5 iterations. Exiting...
Balancing took 524.0 milliseconds

I've attached the full log, but I can't see any errors indicating why it is failing. Any ideas? I'd really like to get balancing working again. My use case isn't the norm, and it is important that the cluster stay as close to completely balanced as possible.

--
Landy Bible
Simulation and Computer Specialist
School of Nursing - Collins College of Business
The University of Tulsa
HDFS api - change case of username?
Hey all,

I've run into a problem where I need to change the user I'm running HDFS commands as. I've got clients uploading data from Windows boxes as a specific user; in HDFS, the owner shows up as domain\user. Now I need to get the data from a Linux box which is tied to AD with the likewise-open package. There, usernames are shown as DOMAIN\user, which is causing me to get permission denied errors when I try to read the files. Is it possible to make HDFS ignore case, or for me to convince the API to always pass the username in lower case?

Thanks,
--
Landy Bible
Simulation and Computer Specialist
School of Nursing - Collins College of Business
The University of Tulsa
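[A sketch of one workaround, assuming HDFS itself can't be told to ignore case: standardize on one case and fix up the owner of existing files with `hadoop fs -chown`. The helper below lowercases the AD domain prefix and emits the chown command; the example owner and path are hypothetical, and you may want the opposite direction (uppercasing) depending on which client you standardize on.]

```python
# Sketch, not the poster's actual fix: normalize the 'DOMAIN' part
# of a 'DOMAIN\user' HDFS owner string to lower case, and build the
# matching "hadoop fs -chown" command. Owner/path below are made up.

def normalize_owner(owner):
    """Lowercase the domain prefix of a 'DOMAIN\\user' owner string."""
    if "\\" in owner:
        domain, user = owner.split("\\", 1)
        return domain.lower() + "\\" + user
    return owner

def chown_command(owner, path):
    """Build a hadoop fs -chown invocation for one file."""
    return "hadoop fs -chown '%s' '%s'" % (normalize_owner(owner), path)

print(chown_command("TULSA\\jdoe", "/backups/desktop01/data.tar"))
```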
RE: Hadoop on windows with bat and ant scripts
On 06/13/2011 07:52 AM, Loughran, Steve wrote:
>> On 06/10/2011 03:23 PM, Bible, Landy wrote:
>> I'm currently running HDFS on Windows 7 desktops. I had to create a
>> hadoop.bat that provided the same functionality as the shell scripts, and
>> some Java Service Wrapper configs to run the DataNodes and NameNode as
>> Windows services. Once I get my system more functional I plan to do a
>> write-up about how I did it, but it wasn't too difficult. I'd also like to
>> see Hadoop become less platform dependent.
> why? Do you plan to bring up a real Windows server datacenter to test it on?

Not a datacenter, but a large-ish cluster of desktops, yes.

> Whether you like it or not, all the big Hadoop clusters run on Linux

I realize that; I use Linux wherever possible, much to the annoyance of my Windows-only co-workers. However, for my current project I'm using all the Windows 7 and Vista desktops at my site as a storage cluster. The first idea was to run Hadoop on Linux in a VM in the background on each desktop, but that seemed like overkill. The point here is to use the resources we have but aren't using, rather than buy new resources. Academia is funny like that.

>> So far, I've been unable to make MapReduce work correctly. The services
>> run, but things don't work; however, I suspect that this is due to DNS not
>> working correctly in my environment.
> yes, that's part of the "anywhere" you have to fix. Edit the host tables so
> that DNS and reverse DNS appear to work. That's
> c:\windows\system32\drivers\etc\hosts, unless on a win64 box it moves.

Why does Hadoop even care about DNS? Every node checks in with the NameNode and JobTracker, so they know where they are; why not just go pure IP-based and forget DNS? Managing the hosts file is a pain... even when you automate it, it just seems unneeded.
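[Steve's suggestion amounts to giving every node a consistent name-to-address mapping on every machine. A minimal sketch of what the hosts file on each Windows node might contain; the addresses and hostnames here are made up for illustration, not from the thread:]

```
# c:\windows\system32\drivers\etc\hosts  (hypothetical entries)
10.0.0.10   namenode.example.local    namenode
10.0.0.11   datanode01.example.local  datanode01
10.0.0.12   datanode02.example.local  datanode02
```

[The same entries would need to appear on every node in the cluster so that forward and reverse lookups agree everywhere.]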
RE: Question about DFS Reserved Space
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, June 09, 2011 12:14 PM
To: common-user@hadoop.apache.org
Subject: Re: Question about DFS Reserved Space

> Landy,
>
>> On Thu, Jun 9, 2011 at 10:05 PM, Bible, Landy wrote:
>> Hi all,
>>
>> I'm planning a rather non-standard HDFS cluster. The machines will be
>> doing more than just DFS, and each machine will have varying local storage
>> utilization outside of DFS. If I use the "dfs.datanode.du.reserved"
>> property and reserve 10 GB, does that mean DFS will use (total disk size -
>> 10 GB), or that it will always leave 10 GB free? Basically, is disk
>> usage outside DFS (OS + other data) taken into account?
>
> The latter (it will leave 10 GB free). The whole disk is taken into account
> during the space computation. So yes, even external data may influence it.
>
>> As usage outside of DFS grows I'd like DFS to back off the disk and migrate
>> blocks to other nodes. If this isn't the current behavior, I could create a
>> script to look at disk usage every few hours and modify the reserved
>> property dynamically. If the property is changed on a single datanode and
>> it is restarted, will the datanode then start moving blocks away?
>
> Why would you need to modify the reserve values once set to a comfortable
> value? The DN monitors the disk space by itself, so you don't have to.

Great! Problem solved. I assumed that the datanode was smart enough, but I wanted to be sure.

> The DN will also not move away blocks if the reserved limit is violated (due
> to you increasing it, say). However, it will begin to refuse any writes
> happening to it. You may need to run the Balancer in order to move blocks
> around and balance the DNs, though.

Running the balancer from time to time is easy enough. I'm guessing that if the limit is violated, the balancer would take care of moving the offending blocks off the datanode.
>> My other option is to just set the reserved amount very high on every node,
>> but that will lead to a lot of wasted space, as many nodes won't have a very
>> large storage demand outside of DFS.
>
> How about keeping one disk dedicated for all other intents outside of the
> DFS's grasp?

Normally I would, but as I mentioned, this isn't a normal cluster. I'm actually running the datanodes on Windows 7 desktops, which of course only have a single disk. I'm planning to use HDFS to store backups of user data from the desktops (encrypted before uploading to the cluster, of course). The idea is to use the vast amount of wasted disk space on our desktops as archival storage. We won't be running any MR jobs, just storing data.

-Landy
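["Running the balancer from time to time" could be automated with a simple scheduled job on any machine with the Hadoop client installed. A hypothetical sketch; the install path, schedule, and threshold are assumptions, not anything from this thread:]

```
# Hypothetical crontab entry: rebalance nightly at 2 AM.
0 2 * * * /opt/hadoop/bin/hadoop balancer -threshold 5 >> /var/log/hadoop-balancer.log 2>&1
```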
RE: Hadoop on windows with bat and ant scripts
Hi Raja,

I'm currently running HDFS on Windows 7 desktops. I had to create a hadoop.bat that provided the same functionality as the shell scripts, and some Java Service Wrapper configs to run the DataNodes and NameNode as Windows services. Once I get my system more functional I plan to do a write-up about how I did it, but it wasn't too difficult. I'd also like to see Hadoop become less platform dependent. Java is supposed to be Write Once, Run Anywhere, but a lot of Java projects seem to forget that.

So far, I've been unable to make MapReduce work correctly. The services run, but things don't work; however, I suspect that this is due to DNS not working correctly in my environment.

-Landy

-----Original Message-----
From: Raja Nagendra Kumar [mailto:nagendra.r...@tejasoft.com]
Sent: Friday, June 10, 2011 12:38 AM
To: core-u...@hadoop.apache.org
Subject: Hadoop on windows with bat and ant scripts

Hi,

I see Hadoop would need Unix (or Windows with Cygwin) to run. It would be much nicer if Hadoop got away from the shell scripts, through appropriate Ant scripts or with a Java admin-console kind of model. Then it becomes lighter for development. Are there any known plans, or am I missing something? :)

Regards,
Raja Nagendra Kumar,
C.T.O
www.tejasoft.com

--
View this message in context: http://old.nabble.com/Hadoop-on-windows-with-bat-and-ant-scripts-tp31815353p31815353.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
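[For readers curious what "Java Service Wrapper configs" for a DataNode service might look like: a hypothetical wrapper.conf sketch using the Tanuki Java Service Wrapper's standard properties. The paths, jar names, and heap size are assumptions, not the poster's actual config.]

```
# Hypothetical wrapper.conf for running the HDFS DataNode as a
# Windows service via Java Service Wrapper (paths are examples).
wrapper.java.command=java
wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperSimpleApp
wrapper.app.parameter.1=org.apache.hadoop.hdfs.server.datanode.DataNode
wrapper.java.classpath.1=C:\hadoop\hadoop-core-0.20.2.jar
wrapper.java.classpath.2=C:\hadoop\lib\*.jar
wrapper.java.classpath.3=C:\hadoop\conf
wrapper.java.additional.1=-Xmx512m
wrapper.ntservice.name=hadoop-datanode
wrapper.ntservice.displayname=Hadoop DataNode
```

[A second config with `org.apache.hadoop.hdfs.server.namenode.NameNode` as the application parameter would cover the NameNode service.]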
Question about DFS Reserved Space
Hi all,

I'm planning a rather non-standard HDFS cluster. The machines will be doing more than just DFS, and each machine will have varying local storage utilization outside of DFS. If I use the "dfs.datanode.du.reserved" property and reserve 10 GB, does that mean DFS will use (total disk size - 10 GB), or that it will always leave 10 GB free? Basically, is disk usage outside DFS (OS + other data) taken into account?

As usage outside of DFS grows I'd like DFS to back off the disk and migrate blocks to other nodes. If this isn't the current behavior, I could create a script to look at disk usage every few hours and modify the reserved property dynamically. If the property is changed on a single datanode and it is restarted, will the datanode then start moving blocks away?

My other option is to just set the reserved amount very high on every node, but that would lead to a lot of wasted space, as many nodes won't have a very large storage demand outside of DFS.

Any comments or suggestions would be welcomed.

Thanks,
--
Landy Bible
Simulation and Computer Specialist
School of Nursing - Collins College of Business
The University of Tulsa
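[For reference, the property in question goes in hdfs-site.xml on each datanode and takes a value in bytes. A sketch of the 10 GB reservation from the question (10 GB = 10 x 1024^3 bytes); check your version's hdfs-default.xml for the exact per-volume semantics:]

```xml
<!-- hdfs-site.xml: reserve 10 GB (in bytes) for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
```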