RE: Balancer exiting immediately despite having work to do.

2012-01-04 Thread Bible, Landy
James,

http://pastebin.com/mYBRKDew

Tomorrow I'll run the balancer again and grab a copy of the namenode logs as 
well.  Didn't think of that today.

-Landy

-Original Message-
From: jameswarr...@gmail.com [mailto:jameswarr...@gmail.com] On Behalf Of James 
Warren
Sent: Wednesday, January 04, 2012 7:49 PM
To: common-user@hadoop.apache.org
Subject: Re: Balancer exiting immediately despite having work to do.

Hi Landy -

Attachments are stripped from e-mails sent to the mailing list.  Could you 
publish your logs on pastebin and forward the url?

cheers,
-James

On Wed, Jan 4, 2012 at 10:03 AM, Bible, Landy wrote:

> Hi all,
>
> I'm running Hadoop 0.20.2.  The balancer has suddenly stopped working.
> I'm attempting to balance the cluster with a threshold of 1, using the 
> following command:
>
> ./hadoop balancer -threshold 1
>
> This has been working fine, but suddenly it isn't.  It skips through 5 
> iterations without actually doing any work:
>
> Time Stamp                Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
> Jan 4, 2012 11:47:56 AM   0           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:56 AM   1           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:56 AM   2           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:57 AM   3           0 KB                 1.87 GB             6.68 GB
> Jan 4, 2012 11:47:57 AM   4           0 KB                 1.87 GB             6.68 GB
>
> No block has been moved for 5 iterations. Exiting...
>
> Balancing took 524.0 milliseconds
>
> I've attached the full log, but I can't see any errors indicating why 
> it is failing.  Any ideas?  I'd really like to get balancing working again.
> My use case isn't the norm, and it is important that the cluster stay 
> as close to completely balanced as possible.
>
> --
>
> Landy Bible
>
> Simulation and Computer Specialist
>
> School of Nursing - Collins College of Business
>
> The University of Tulsa
>


Balancer exiting immediately despite having work to do.

2012-01-04 Thread Bible, Landy
Hi all,

I'm running Hadoop 0.20.2.  The balancer has suddenly stopped working.  I'm 
attempting to balance the cluster with a threshold of 1, using the following 
command:

./hadoop balancer -threshold 1

This has been working fine, but suddenly it isn't.  It skips through 5 
iterations without actually doing any work:

Time Stamp                Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Jan 4, 2012 11:47:56 AM   0           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:56 AM   1           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:56 AM   2           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:57 AM   3           0 KB                 1.87 GB             6.68 GB
Jan 4, 2012 11:47:57 AM   4           0 KB                 1.87 GB             6.68 GB
No block has been moved for 5 iterations. Exiting...
Balancing took 524.0 milliseconds
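For context on what is being decided in each of those iterations (a rough sketch of the balancer's documented behavior, not the actual Hadoop code): a datanode counts as balanced when its utilization is within the threshold percentage of the cluster-wide average, and blocks only move from over-utilized to under-utilized nodes.

```python
def utilization(used, capacity):
    """Percent of a datanode's capacity occupied by DFS blocks."""
    return 100.0 * used / capacity

def classify(nodes, threshold):
    """nodes: list of (dfs_used_bytes, capacity_bytes) per datanode.
    Returns (over, under): indices of nodes more than `threshold`
    percentage points above / below the cluster average utilization."""
    avg = 100.0 * sum(u for u, _ in nodes) / sum(c for _, c in nodes)
    over = [i for i, (u, c) in enumerate(nodes) if utilization(u, c) > avg + threshold]
    under = [i for i, (u, c) in enumerate(nodes) if utilization(u, c) < avg - threshold]
    return over, under

GB = 1 << 30
# Two 100 GB nodes at 60% and 20% full; the average is 40%, so with
# -threshold 1 the first is over-utilized and the second under-utilized.
print(classify([(60 * GB, 100 * GB), (20 * GB, 100 * GB)], threshold=1))  # -> ([0], [1])
```

A threshold of 1 requires every node to land within one percentage point of the average, so the output above (1.87 GB still to move, 0 KB moved, five times in under a second) suggests the balancer found imbalanced nodes but could not schedule any transfers between them.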

I've attached the full log, but I can't see any errors indicating why it is 
failing.  Any ideas?  I'd really like to get balancing working again.  My use 
case isn't the norm, and it is important that the cluster stay as close to 
completely balanced as possible.

--
Landy Bible

Simulation and Computer Specialist
School of Nursing - Collins College of Business
The University of Tulsa



HDFS api - change case of username?

2011-12-16 Thread Bible, Landy
Hey all,

I've run into a problem where I need to change the user I'm running the 
HDFS commands as.

I've got clients uploading data from Windows boxes as a specific user; in HDFS, 
the owner shows up as domain\user.  Now I need to get at the data from a Linux 
box that is tied to AD with the likewise-open package.  When running there, 
usernames are shown as DOMAIN\user, so I get permission denied errors when I 
try to read the files.  Is it possible to make HDFS ignore case, or to 
convince the API to always pass the username in lower case?
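One client-side workaround is to fold owners to a single case before they ever reach HDFS. This is only a sketch of the string handling (the hook into the HDFS client is left out, and I'm not sure 0.20 exposes a clean one for it):

```python
def normalize_owner(name):
    """Fold 'DOMAIN\\User', 'domain\\user', and plain 'user' to one
    lowercase canonical form, so owners recorded from Windows and from
    a likewise-open Linux box compare equal."""
    if "\\" in name:
        domain, user = name.split("\\", 1)
        return domain.lower() + "\\" + user.lower()
    return name.lower()

print(normalize_owner("DOMAIN\\user"))  # -> domain\user
```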

Thanks,

--
Landy Bible

Simulation and Computer Specialist
School of Nursing - Collins College of Business
The University of Tulsa



RE: Hadoop on windows with bat and ant scripts

2011-06-13 Thread Bible, Landy
On 06/13/2011 07:52 AM, Loughran, Steve wrote:

>>On 06/10/2011 03:23 PM, Bible, Landy wrote:
>> I'm currently running HDFS on Windows 7 desktops.  I had to create a 
>> hadoop.bat that provided the same functionality of the shell scripts, and 
>> some Java Service Wrapper configs to run the DataNodes and NameNode as 
>> windows services.  Once I get my system more functional I plan to do a write 
>> up about how I did it, but it wasn't too difficult.  I'd also like to see 
>> Hadoop become less platform dependent.

>why? Do you plan to bring up a real Windows server datacenter to test it on?

Not a datacenter, but a large-ish cluster of desktops, yes.

>Whether you like it or not, all the big Hadoop clusters run on Linux

I realize that; I use Linux wherever possible, much to the annoyance of my 
Windows-only co-workers.  However, for my current project, I'm using all the 
Windows 7 and Vista desktops at my site as a storage cluster.  The first idea 
was to run Hadoop on Linux in a VM in the background on each desktop, but that 
seemed like overkill.  The point here is to use the resources we have but 
aren't using, rather than buy new resources.  Academia is funny like that.

>>   So far, I've been unable to make MapReduce work correctly.  The services 
>> run, but things don't work, however I suspect that this is due to DNS not 
>> working correctly in my environment.

>yes, that's part of the anywhere you have to fix. Edit the host tables so that 
>DNS and reverse DNS appears to work. That's 
>c:\windows\system32\drivers\etc\hosts, unless on a win64 box it moves.

Why does Hadoop even care about DNS?  Every node checks in with the NameNode 
and JobTracker, so they already know where the nodes are; why not just go 
purely IP-based and forget DNS?  Managing the hosts file is a pain... even 
when you automate it, it just seems unneeded.
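For anyone following along, the hosts entries Steve describes look like this (hostnames and addresses here are made up for illustration; the important part is that every node maps the same names to the same addresses, so forward and reverse lookup agree):

```
# C:\Windows\System32\drivers\etc\hosts -- same format as /etc/hosts
10.0.0.10   namenode.example.local    namenode
10.0.0.21   datanode01.example.local  datanode01
10.0.0.22   datanode02.example.local  datanode02
```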





RE: Question about DFS Reserved Space

2011-06-10 Thread Bible, Landy
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: Thursday, June 09, 2011 12:14 PM
To: common-user@hadoop.apache.org
Subject: Re: Question about DFS Reserved Space

>Landy,

>>On Thu, Jun 9, 2011 at 10:05 PM, Bible, Landy  wrote:
>> Hi all,
>>
>> I'm planning a rather non-standard HDFS cluster.   The machines will be 
>> doing more than just DFS, and each machine will have varying local storage 
>> utilization outside of DFS.  If I use the "dfs.datanode.du.reserved" 
>> property and reserve 10 GB, does that mean DFS will use (total disk size - 
>> 10 GB), or that it will always leave 10 GB free?  Basically, is the disk 
>> usage outside DFS (OS + other data) taken into account?

>The latter (it will always leave 10 GB free). The whole disk is taken into 
>account when computing space, so yes, external data may influence it.
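In other words (a back-of-the-envelope sketch of what Harsh describes, not the actual DataNode code), the reserve is carved out of the whole disk, so growth in non-DFS data eats into what DFS may still use:

```python
def dfs_available(capacity, non_dfs_used, dfs_used, reserved):
    """Bytes the datanode will still accept block writes into, with
    dfs.datanode.du.reserved carved out of the whole disk."""
    return max(0, capacity - non_dfs_used - dfs_used - reserved)

GB = 1 << 30
# 100 GB disk, 30 GB of OS + other data, 20 GB of blocks, 10 GB reserved:
print(dfs_available(100 * GB, 30 * GB, 20 * GB, 10 * GB) // GB)  # -> 40
# If the other data later grows to 75 GB, DFS sees nothing left and the
# DN starts refusing writes (it does not evict existing blocks):
print(dfs_available(100 * GB, 75 * GB, 20 * GB, 10 * GB) // GB)  # -> 0
```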

>> As usage outside of DFS grows I'd like DFS to back off the disk, and migrate 
>> blocks to other nodes.  If this isn't the current behavior, I could create a 
>> script to look at disk usage every few hours and modify the reserved 
>> property dynamically.  If the property is changed on a single datanode and 
>> it is restarted, will the datanode then start moving blocks away?

>Why would you need to modify the reserve values once set to a comfortable 
>value? The DN monitors the disk space by itself, so you don't have to.

Great!  Problem solved.  I assumed that the datanode was smart enough, but I 
wanted to be sure.

>The DN will also not move blocks away if the reserved limit is violated (due 
>to you increasing it, say). However, it will begin to refuse any writes 
>happening to it. You may need to run the Balancer to move blocks around and 
>balance the DNs, though.

Running the balancer from time to time is easy enough.  I'm guessing that if 
the limit is violated, the balancer would take care of moving the offending 
blocks off the datanode.

>> My other option is to just set the reserved amount very high on every node, 
>> but that will lead to a lot of wasted space as many nodes won't have a very 
>> large storage demand outside of DFS.

>How about keeping one disk dedicated for all other intents outside of the 
>DFS's grasp?

Normally I would, but as I mentioned, this isn't a normal cluster.  I'm 
actually running the datanodes on Windows 7 desktops, which of course only have 
a single disk.  I'm planning to use HDFS to store backups of user data from the 
desktops (encrypted before uploading to the cluster, of course).  The idea is 
to use the vast amount of wasted disk space on our desktops as archival 
storage.  We won't be running any MR jobs, just storing data.

-Landy


RE: Hadoop on windows with bat and ant scripts

2011-06-10 Thread Bible, Landy
Hi Raja,

I'm currently running HDFS on Windows 7 desktops.  I had to create a hadoop.bat 
that provides the same functionality as the shell scripts, and some Java 
Service Wrapper configs to run the DataNodes and NameNode as Windows services.  
Once I get my system more functional I plan to do a write-up about how I did 
it, but it wasn't too difficult.  I'd also like to see Hadoop become less 
platform dependent.  Java is supposed to be Write Once, Run Anywhere, but a 
lot of Java projects seem to forget that.
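For the record, the wrapper config for the DataNode service looks roughly like this. Property names are from memory of the Tanuki Java Service Wrapper, and the paths and service name are placeholders, so treat it as a sketch to check against the wrapper docs:

```
# wrapper.conf (sketch) -- run the DataNode as a Windows service
wrapper.java.command=java
wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperSimpleApp
wrapper.app.parameter.1=org.apache.hadoop.hdfs.server.datanode.DataNode
wrapper.java.classpath.1=C:\hadoop\hadoop-0.20.2-core.jar
wrapper.java.classpath.2=C:\hadoop\lib\*.jar
wrapper.java.classpath.3=C:\hadoop\conf
wrapper.ntservice.name=hadoop-datanode
```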

So far, I've been unable to make MapReduce work correctly.  The services run, 
but things don't work; I suspect this is due to DNS not working correctly in 
my environment.

-Landy

-Original Message-
From: Raja Nagendra Kumar [mailto:nagendra.r...@tejasoft.com] 
Sent: Friday, June 10, 2011 12:38 AM
To: core-u...@hadoop.apache.org
Subject: Hadoop on windows with bat and ant scripts


Hi,

I see Hadoop needs Unix (or, on Windows, Cygwin) to run.
It would be much nicer if Hadoop got away from the shell scripts, through 
appropriate Ant scripts or a Java admin-console kind of model.  Then it 
becomes lighter for development.

Are there any known plans, or am I missing something? :)

Regards,
Raja Nagendra Kumar,
C.T.O
www.tejasoft.com
--
View this message in context: 
http://old.nabble.com/Hadoop-on-windows-with-bat-and-ant-scripts-tp31815353p31815353.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Question about DFS Reserved Space

2011-06-09 Thread Bible, Landy
Hi all,

I'm planning a rather non-standard HDFS cluster.   The machines will be doing 
more than just DFS, and each machine will have varying local storage 
utilization outside of DFS.  If I use the "dfs.datanode.du.reserved" property 
and reserve 10 GB, does that mean DFS will use (total disk size - 10 GB), or 
that it will always leave 10 GB free?  Basically, is the disk usage outside DFS 
(OS + other data) taken into account?

As usage outside of DFS grows I'd like DFS to back off the disk, and migrate 
blocks to other nodes.  If this isn't the current behavior, I could create a 
script to look at disk usage every few hours and modify the reserved property 
dynamically.  If the property is changed on a single datanode and it is 
restarted, will the datanode then start moving blocks away?

My other option is to just set the reserved amount very high on every node, but 
that will lead to a lot of wasted space as many nodes won't have a very large 
storage demand outside of DFS.

Any comments or suggestions would be welcomed.

Thanks,
--
Landy Bible

Simulation and Computer Specialist
School of Nursing - Collins College of Business
The University of Tulsa