Re: Hadoop 3.0 doesn't detect the correct conf folder

2018-01-05 Thread Allen Wittenauer
On 2017-12-21 00:25, Jeff Zhang  wrote: 
> I tried the hadoop 3.0, and can start dfs properly, but when I start yarn,
> it fails with the following error
> ERROR: Cannot find configuration directory
> "/Users/jzhang/Java/lib/hadoop-3.0.0/conf"
> 
> Actually, this is not the correct conf folder. It should be
> /Users/jzhang/Java/lib/hadoop-3.0.0/etc/hadoop.  hdfs could detect that
> correctly, but seems yarn couldn't, might be something wrong in yarn
> starting script.


This kind of weirdness is indicative of having YARN_CONF_DIR defined 
somewhere.
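
For example, something along these lines usually turns it up (paths taken from 
your message; adjust as needed):

    env | grep CONF_DIR
    grep -l YARN_CONF_DIR ~/.bashrc ~/.bash_profile \
        /Users/jzhang/Java/lib/hadoop-3.0.0/etc/hadoop/*-env.sh 2>/dev/null
    unset YARN_CONF_DIR   # or remove the stale definition, then re-run start-yarn.sh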



Re: v2.8.0: Setting PID dir EnvVars: libexec/thing-config.sh or etc/hadoop/thing-env.sh ?

2017-06-24 Thread Allen Wittenauer


On 2017-06-14 18:50 (-0700), Kevin Buckley wrote: 

> So,
> 
> is there a "to be preferred" file choice, between those two,
> within which to set certain classes of EnvVars ?

2.x releases require a significant amount of duplication of environment 
variable settings. The YARN subsystem does not read from hadoop-env.sh at all. 
Because of this limitation, it means you'll need to configure HADOOP_PID_DIR in 
hadoop-env.sh, YARN_PID_DIR in yarn-env.sh, and HADOOP_MAPRED_PID_DIR in 
mapred-env.sh. Same is true for LOG_DIR, _OPTS, and a few others.

FWIW, this is one of the areas that got cleaned up in the massive shell script 
rewrite in 3.x. For example, setting HADOOP_PID_DIR in hadoop-env.sh will 
propagate to all of the other services.
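
In other words, something like this (run from the top of the install tree; the 
path is just an example):

    # 2.x -- one line in each of the three env files
    echo 'export HADOOP_PID_DIR=/var/run/hadoop'        >> etc/hadoop/hadoop-env.sh
    echo 'export YARN_PID_DIR=/var/run/hadoop'          >> etc/hadoop/yarn-env.sh
    echo 'export HADOOP_MAPRED_PID_DIR=/var/run/hadoop' >> etc/hadoop/mapred-env.sh

    # 3.x -- hadoop-env.sh alone is enough
    echo 'export HADOOP_PID_DIR=/var/run/hadoop'        >> etc/hadoop/hadoop-env.sh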



Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Allen Wittenauer


On 2016-07-30 20:12 (-0700), Shady Xu  wrote: 
> Thanks Andrew, I know about the disk failure risk and that it's one of the
> reasons why we should use JBOD. But JBOD provides worse performance than
> RAID 0. 

It's not about failure: it's about speed.  RAID0 performance will drop like a 
rock if any one disk in the set is slow. When all the drives are performing at 
peak, yes, it's definitely faster.  But over time, drive speed will decline 
(sometimes to half speed or less!) usually prior to a failure. This failure may 
take a while, so in the mean time your cluster is getting slower ... and slower 
... and slower ...

As a result, JBOD will be significantly faster over the _lifetime_ of the disks 
vs. a comparison made _today_.




Re: Supervisely, RAID0 provides best io performance whereas no RAID the worst

2016-08-01 Thread Allen Wittenauer


On 2016-07-30 20:12 (-0700), Shady Xu  wrote: 
> Thanks Andrew, I know about the disk failure risk and that it's one of the
> reasons why we should use JBOD. But JBOD provides worse performance than
> RAID 0.




> And take into account the fact that HDFS does have other
> replications and it will make one more replication on another DataNode when
> disk failure happens. So why should we sacrifice performance to prevent
> data loss which can naturally be avoided by HDFS?
> 




Re: set reduced block size for a specific file

2011-08-27 Thread Allen Wittenauer

On Aug 27, 2011, at 12:42 PM, Ted Dunning wrote:

> There is no way to do this for standard Apache Hadoop.

Sure there is.

You can build a custom conf dir and point it to that.  You *always* 
have that option for client settable options as a work around for lack of 
features/bugs.

1. Copy $HADOOP_CONF_DIR or $HADOOP_HOME/conf to a dir 
2. modify the hdfs-site.xml to have your new block size
3. Run the following:

HADOOP_CONF_DIR=mycustomconf hadoop dfs  -put file dir

Convenient?  No.  Doable? Definitely.
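
Roughly (64MB and the names here are just placeholders; newer releases spell 
the property dfs.blocksize rather than dfs.block.size):

    cp -r ${HADOOP_CONF_DIR:-$HADOOP_HOME/conf} mycustomconf
    # edit mycustomconf/hdfs-site.xml and set dfs.block.size to e.g. 67108864
    HADOOP_CONF_DIR=mycustomconf hadoop dfs -put file dir

Depending on the release, passing it as a generic option (hadoop fs -D 
dfs.block.size=67108864 -put file dir) may also work, but I haven't checked 
that on every version.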




Re: hp sl servers with hadoop?

2011-08-24 Thread Allen Wittenauer

On Aug 24, 2011, at 7:42 AM, Koert Kuipers wrote:

 Does anyone have any experience using HP Proliant SL hardware for hadoop?

Yes.  We've got ~600 of them.


 We
 are currently using DL 160 and DL 180 servers, and the SL hardware seems to
 fit the bill for our new servers in many ways. However the shared power on
 chassis is not something i am fully comfortable with yet.

The biggest problems that our hardware team report are:

a) lack of hot swappable drives means a lot of manual work to 
replace bad disks
b) the cabling is on the front, which is pretty much opposite 
of how the rest of our gear works

Our people do tend to bring down both nodes when working on them, but I 
think that's more of a safety precaution than a requirement.  Those events are 
relatively short-lived, so they're usually not a big deal WRT block 
replication.

Re: Simple Permissions Question

2011-08-24 Thread Allen Wittenauer

On Aug 24, 2011, at 5:25 PM, Time Less wrote:

 I'm at a loss here. Check out my groups and the directory permissions:
 
 [tellis@laxhadoop1-012 ~] :) groups
 tellis hdfs supergroup
 [tellis@laxhadoop1-012 ~] :) hadoop fs -mkdir /tmp/hive-tellis
 mkdir: org.apache.hadoop.security.AccessControlException: Permission denied:
 user=tellis, access=WRITE, inode=/tmp:hdfs:supergroup:drwxrwxr-x

...

 [hdfs@laxhadoop1-012 ~/tmp] :) hadoop fs -chmod 777 /tmp
 
 Then tellis has access to mkdir in /tmp.

This tells me that you likely aren't getting your group information 
sent to Hadoop.

 I'm really confused. What am I missing?

What version of Hadoop?  Is this a secure version with security on?  If 
you hop on the NameNode, what are your group settings?  Certain versions only 
honor group information from the namenode.  This is by design to prevent users 
lying to Hadoop about their identity.
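
For example, compare (namenode-host here is whatever box runs your NN):

    id tellis                     # what the client box thinks
    ssh namenode-host id tellis   # what the permission check will actually use

If hdfs/supergroup only show up on the client side, that's your answer.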

Also, if I help, do I get some free stuff in LoL? :) 



Re: Hadoop cluster optimization

2011-08-22 Thread Allen Wittenauer

On Aug 22, 2011, at 3:00 AM, אבי ווקנין wrote:
 I assumed that the 1.7GB RAM will be the bottleneck in my environment that's
 why I am trying to change it now.
 
 I shut down the 4 datanodes with 1.7GB RAM (Amazon EC2 small instance) and
 replaced them with
 
 2 datanodes with 7.5GB RAM (Amazon EC2 large instance).

This should allow you to bump up the memory and/or increase the task 
count.

 Is it OK that the datanodes are 64 bit while the namenode is still 32 bit?

I've run several instances where the NN was 64-bit and the DNs were 
32-bit.  I can't think of a reason the reverse wouldn't work.  The only thing 
that is really going to matter is if they are the same CPU architecture. 
(which, if you are running on EC2, will likely always be the case).   

 Based on the new hardware I'm using, Are there any suggestions regarding the
 Hadoop configuration parameters?

It really depends upon how much memory you need per task.  That's why 
task spill rate is important... :)

 One more thing, you asked: Are your tasks spilling?
 
 How can I check if my tasks spilling ?

Check the task logs.  

If you aren't spilling, then you'll likely want to match task 
count = core count - 1 unless mem is exhausted first. (i.e., tasks*mem should be <= 
avail mem).



Re: upgrade from Hadoop 0.21.0 to a newer version??

2011-08-22 Thread Allen Wittenauer

On Aug 21, 2011, at 11:00 PM, steven zhuang wrote:

 thanks Allen, I really wish there wasn't such a version 0.21.0.  :)

It is tricky (lots of config work), but you could always run the two 
versions in parallel on the same gear, distcp from 0.21 to 0.20.203, then 
shutdown the 0.21 instance.



Re: Hadoop cluster optimization

2011-08-21 Thread Allen Wittenauer

On Aug 21, 2011, at 7:17 PM, Michel Segel wrote:

 Avi,
 First why 32 bit OS?
 You have a 64 bit processor that has 4 cores hyper threaded looks like 8cpus.

With only 1.7gb of mem, there likely isn't much of a reason to use a 
64-bit OS.  The machines (as you point out) are already tight on memory.  
64-bit is only going to make it worse.

 
 1.7 GB memory
 1 Intel(R) Xeon(R) CPU E5507 @ 2.27GHz
 Ubuntu Server 10.10 , 32-bit platform
 Cloudera CDH3 Manual Hadoop Installation
 (for the ones who are familiar with Amazon Web Services, I am talking about
 Small EC2 Instances/Servers)
 
 Total job run time is +-15 minutes (+-50 files/blocks/mapTasks of up to 250
 MB and 10 reduce tasks).
 
 Based on the above information, does anyone can recommend on a best practice
 configuration??

How many spindles?  Are your tasks spilling?


 Do you thinks that when dealing with such a small cluster, and when
 processing such a small amount of data,
 is it even possible to optimize jobs so they would run much faster? 

Most of the time, performance issues are with the algorithm, not Hadoop.



Re: upgrade from Hadoop 0.21.0 to a newer version??

2011-08-21 Thread Allen Wittenauer

On Aug 19, 2011, at 12:39 AM, steven zhuang wrote:

  I updated my hadoop cluster from 0.20.2 to higher version
 0.21.0 because of MAPREDUCE-1286, and now I have problem running a Hbase on
 it.
  I saw the 0.21.0 version is marked as unstable, unsupported, does
 not include security.
My question is that can I upgrade the cluster from 0.21.0 to a newer
 version say 0.20.203 or future version 0.22?
  

You won't be able to side-grade to 0.20.203/204.  You should be able 
to upgrade to 0.22, 0.23, etc.  Just be sure to process your edits file *prior* 
to upgrading.  Otherwise it might explode.



Re: Question regarding Capacity Scheduler

2011-08-21 Thread Allen Wittenauer

On Aug 17, 2011, at 10:53 AM, Matt Davies wrote:

 Hello,
 
 I'm playing around with the Capacity Scheduler (coming from the Fair
 Scheduler), and it appears that a queue with jobs submitted by the same user
 are treated as FIFO.  So, for example, if I submit job1 and job2 to the
 low queue as matt job1 will run to completion before job2 starts.
 
 Is there a way to share the resources in the queue between the 2 jobs when
 they both come from the same user?

No.  You'll need a different queue.




Re: Hadoop Streaming 0.20.2 and how to specify number of reducers per node -- is it possible?

2011-08-21 Thread Allen Wittenauer

On Aug 17, 2011, at 12:36 AM, Steven Hafran wrote:
 
 
 after reviewing the hadoop docs, i've tried setting the following properties
 when starting my streaming job; however, they don't seem to have any impact.
 -jobconf mapred.tasktracker.reduce.tasks.maximum=1

tasktracker is the hint:  that's a server side setting.

You're looking for the mapred.reduce.tasks settings.
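
i.e. something like (streaming jar path varies by release; -jobconf still works 
on 0.20, but -D is the less-deprecated spelling):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -D mapred.reduce.tasks=4 \
        -input in -output out -mapper yourmapper -reducer yourreducer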



Re: Why hadoop should be built on JAVA?

2011-08-21 Thread Allen Wittenauer

On Aug 15, 2011, at 9:00 PM, Chris Song wrote:

 Why hadoop should be built in JAVA?

http://www.quora.com/Why-was-Hadoop-written-in-Java


 How will it be if HADOOP is implemented in C or Phython?


http://www.quora.com/Would-Hadoop-be-different-if-it-were-coded-in-C-C++-instead-of-Java-How



Re: Hadoop cluster network requirement

2011-07-31 Thread Allen Wittenauer

On Jul 31, 2011, at 12:08 PM, jonathan.hw...@accenture.com wrote:

 I was asked by our IT folks if we can put hadoop name nodes storage using a 
 shared disk storage unit.  

What do you mean by shared disk storage unit?  There are lots of 
products out there that would claim this, so actual deployment semantics are 
important.

 Does anyone have experience of how much IO throughput is required on the name 
 nodes?

IO throughput is completely dependent upon how many changes 
are being applied to the file system and frequency of edits log merging.  In 
the majority of cases it is not much.  What tends to happen where the storage 
is shared (such as a NAS) is that the *other* traffic blocks the writes for too 
long because it is overloaded and the NN declares it dead.

  What are the latency/data throughput requirements between the master and 
 data nodes - can this tolerate network routing?

If you mean different data centers, then no.  If you mean same data 
center, but with routers in between, then probably yes, but you add several 
more failure points, so this isn't recommended. 

 Did anyone published any throughput requirement for the best network setup 
 recommendation?

Not that I know of.  It is very much dependent upon the actual workload 
being performed.  But I wouldn't deploy anything slower than a 1:4 overcommit 
(uplink-to-host) on the DN side for anything real/significant.

 This message is for the designated recipient only and may contain privileged, 
 proprietary, or otherwise private information. If you have received it in 
 error, please notify the sender immediately and delete the original. Any 
 other use of the email by you is prohibited.

Lawyers are funny people.  I wonder how much they got paid for this one.

Re: Moving Files to Distributed Cache in MapReduce

2011-07-31 Thread Allen Wittenauer

We really need to build a working example to the wiki and add a link from the 
FAQ page.  Any volunteers?

On Jul 29, 2011, at 7:49 PM, Michael Segel wrote:

 
 Here's the meat of my post earlier...
 Sample code on putting a file on the cache:
 DistributedCache.addCacheFile(new URI(path + MyFileName), conf);
 
 Sample code in pulling data off the cache:
 private Path[] localFiles =
     DistributedCache.getLocalCacheFiles(context.getConfiguration());
 boolean exitProcess = false;
 int i = 0;
 while (!exitProcess) {
     fileName = localFiles[i].getName();
     if (fileName.equalsIgnoreCase("model.txt")) {
         // Build your input file reader on localFiles[i].toString()
         exitProcess = true;
     }
     i++;
 }
 
 
 Note that this is SAMPLE code. I didn't trap the exit condition if the file 
 isn't there and you go beyond the size of the array localFiles[].
 Also I set exitProcess to false because it's easier to read this as "do this loop 
 until the condition exitProcess is true."
 
 When you build your file reader you need the full path, not just the file 
 name. The path will vary when the job runs.
 
 HTH
 
 -Mike
 
 
 From: michael_se...@hotmail.com
 To: common-user@hadoop.apache.org
 Subject: RE: Moving Files to Distributed Cache in MapReduce
 Date: Fri, 29 Jul 2011 21:43:37 -0500
 
 
 I could have sworn that I gave an example earlier this week on how to push 
 and pull stuff from distributed cache.
 
 
 Date: Fri, 29 Jul 2011 14:51:26 -0700
 Subject: Re: Moving Files to Distributed Cache in MapReduce
 From: rogc...@ucdavis.edu
 To: common-user@hadoop.apache.org
 
 jobConf is deprecated in 0.20.2 I believe; you're supposed to be using
 Configuration for that
 
 On Fri, Jul 29, 2011 at 1:59 PM, Mohit Anchlia 
 mohitanch...@gmail.com wrote:
 
 Is this what you are looking for?
 
 http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
 
 search for jobConf
 
 On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen rogc...@ucdavis.edu wrote:
 Thanks for the response! However, I'm having an issue with this line
 
 Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
 
 because conf has private access in org.apache.hadoop.configured
 
 On Fri, Jul 29, 2011 at 11:18 AM, Mapred Learn mapred.le...@gmail.com
 wrote:
 
 I hope my previous reply helps...
 
 On Fri, Jul 29, 2011 at 11:11 AM, Roger Chen rogc...@ucdavis.edu
 wrote:
 
 After moving it to the distributed cache, how would I call it within
 my
 MapReduce program?
 
 On Fri, Jul 29, 2011 at 11:09 AM, Mapred Learn 
 mapred.le...@gmail.com
 wrote:
 
 Did you try using -files option in your hadoop jar command as:
 
 /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of
 file to be added to distributed cache> <input dir> <output dir>
 
 
 On Fri, Jul 29, 2011 at 11:05 AM, Roger Chen rogc...@ucdavis.edu
 wrote:
 
 Slight modification: I now know how to add files to the
 distributed
 file
 cache, which can be done via this command placed in the main or
 run
 class:
 
   DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);
 
 However I am still having trouble locating the file in the
 distributed
 cache. *How do I call the file path of thefile.dat in the
 distributed
 cache
 as a string?* I am using Hadoop 0.20.2
 
 
 On Fri, Jul 29, 2011 at 10:26 AM, Roger Chen rogc...@ucdavis.edu
 
 wrote:
 
 Hi all,
 
 Does anybody have examples of how one moves files from the local
 filestructure/HDFS to the distributed cache in MapReduce? A
 Google
 search
 turned up examples in Pig but not MR.
 
 --
 Roger Chen
 UC Davis Genome Center
 
 
 
 
 --
 Roger Chen
 UC Davis Genome Center
 
 
 
 
 
 --
 Roger Chen
 UC Davis Genome Center
 
 
 
 
 
 --
 Roger Chen
 UC Davis Genome Center
 
 
 
 
 
 -- 
 Roger Chen
 UC Davis Genome Center

 



Re: Hadoop cluster network requirement

2011-07-31 Thread Allen Wittenauer

On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:

 Thanks, I'm independently doing some digging into Hadoop networking
 requirements and 
 had a couple of quick follow-ups. Could I have some specific info on why
 different data centers 
 cannot be supported for master node and data node comms?
 Also, what 
 may be the benefits/use cases for such a scenario?

Most people who try to put the NN and DNs in different data centers are 
trying to achieve disaster recovery:  one file system in multiple locations.  
That isn't the way HDFS is designed and it will end in tears. There are 
multiple problems:

1) no guarantee that one block replica will be in each data center (thereby 
defeating the whole purpose!)
2) assuming one can work out problem 1, during a network break, the NN will 
lose contact with one half of the DNs, causing a massive network replication 
storm
3) if one is using MR on top of this HDFS, the shuffle will likely kill the 
network in between (making MR performance pretty dreadful) and cause delays 
for the DN heartbeats
4) I don't even want to think about rebalancing.

... and I'm sure a lot of other problems I'm forgetting at the moment.  
So don't do it.

If you want disaster recovery, set up two completely separate HDFSes 
and run everything in parallel.

Re: TaskTrackers behind NAT

2011-07-18 Thread Allen Wittenauer

On Jul 18, 2011, at 12:53 PM, Ben Clay wrote:

 I'd like to spread Hadoop across two physical clusters, one which is
 publicly accessible and the other which is behind a NAT. The NAT'd machines
 will only run TaskTrackers, not HDFS, and not Reducers either (configured
 with 0 Reduce slots).  The master node will run in the publicly-available
 cluster.

Off the top, I doubt it will work : MR is bi-directional, across many 
random ports.  So I would suspect there is going to be a lot of hackiness in 
the network config to make this work.

 1. Port 50060 needs to be opened for all NAT'd machines, since Reduce tasks
 fetch intermediate data from http://<tasktracker>:50060/mapOutput,
 correct ?  I'm getting Too many fetch-failures with no open ports, so I
 assume the Reduce tasks need to pull the intermediate data instead of Map
 tasks pushing it.

Correct. Reduce tasks pull.

 2. Although the NAT'd machines have unique IPs and reach the outside, the
 DHCP is not assigning them hostnames.  Therefore, when they join the
 JobTracker I get
 tracker_localhost.localdomain:localhost.localdomain/127.0.0.1 on the
 machine list page.  Is there some way to force Hadoop to refer to them via
 IP instead of hostname, since I don't have control over the DHCP? I could
 manually assign a hostname via /etc/hosts on each NAT'd machine, but these
 are actually VMs and I will have many of them receiving semi-random IPs,
 making this an ugly administrative task.


Short answer: no.

Long answer: no, fix your DHCP and/or do the /etc/hosts hack.



Re: Which release to use?

2011-07-18 Thread Allen Wittenauer

On Jul 18, 2011, at 5:01 PM, Rita wrote:

 I made the big mistake by using the latest version, 0.21.0 and found bunch
 of bugs so I got pissed off at hdfs. Then, after reading this thread it
 seems I should of used 0.20.x .
 
 I really wish we can fix this on the website, stating 0.21.0 as unstable.


It is stated in a few places on the website that 0.21 isn't stable:

http://hadoop.apache.org/common/releases.html#23+August%2C+2010%3A+release+0.21.0+available

It has not undergone testing at scale and should not be considered stable or 
suitable for production.

... and ...

http://hadoop.apache.org/common/releases.html#Download

0.21.X - unstable, unsupported, does not include security

and it isn't in the stable directory on the apache download mirrors.




Re: Which release to use?

2011-07-18 Thread Allen Wittenauer

On Jul 18, 2011, at 6:02 PM, Rita wrote:

 I am a dimwit.


We are conditioned by marketing that a higher number is always better.  
Experience tells us that this is not necessarily true.




Re: Lack of data locality in Hadoop-0.20.2

2011-07-12 Thread Allen Wittenauer

On Jul 12, 2011, at 10:27 AM, Virajith Jalaparti wrote:

 I agree that the scheduler has lesser leeway when the replication factor is
 1. However, I would still expect the number of data-local tasks to be more
 than 10% even when the replication factor is 1.

How did you load your data?

Did you load it from outside the grid or from one of the datanodes?  If 
you loaded from one of the datanodes, you'll basically have no real locality, 
especially with a rep factor of 1.




Re: How to query a slave node for monitoring information

2011-07-12 Thread Allen Wittenauer
On Jul 12, 2011, at 3:02 PM, samdispmail-tru...@yahoo.com wrote:
 I am new to Hadoop, and I apologies if this was answered before, or if this 
 is 
 not the right list for my question.

common-user@ would likely have been better, but I'm too lazy to forward 
you there today. :)

 
 I am trying to do the following:
 1- Read monitoring information from slave nodes in hadoop
 2- Process the data to detect nodes failure (node crash, problems in requests 
 ... etc) and decide if I need to restart the whole machine.
 3- Restart the machine running the slave facing problems


At scale, one doesn't monitor individual nodes for up/down.  Verifying 
the up/down of a given node will drive you insane and is pretty much a waste of 
time unless the grid itself is under-configured to the point that *every* 
*node* *counts*.  (If that is the case, then there are bigger issues afoot...)

Instead, one should monitor the namenode and jobtracker and alert based 
on a percentage of availability.  This can be done in a variety of ways, 
depending upon which version of Hadoop is in play.  For 0.20.2, a simple screen 
scrape is good enough.  I recommend warn on 10%, alert on 20%, panic on 30%.

 My question is for step 1- collecting monitoring information.
 I have checked Hadoop monitoring features. But currently you can forward the 
 motioning data to files, or to Ganglia.


Do you want monitoring information or metrics information?  Ganglia is 
purely a metrics tool.  Metrics are a different animal.  While it is possible 
to alert on them, in most cases they aren't particularly useful in a monitoring 
context other than up/down.




Re: Cluster Tuning

2011-07-11 Thread Allen Wittenauer

On Jul 11, 2011, at 9:28 AM, Juan P. wrote:
 
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>

Single core machines with 600MB of RAM.

2x400m = 800m just for the heap of the map and reduce phases, 
not counting the other memory that the jvm will need.  io buffer sizes aren't 
adjusted downward either, so you're likely looking at a swapping + spills = 
death scenario.  slowstart set to 1 is going to be pretty much required.
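
One possible set of per-job overrides for boxes that small (0.20-era property 
names, illustrative values; assumes the job goes through ToolRunner so -D is 
honored):

    hadoop jar yourjob.jar YourJob \
        -D mapred.child.java.opts=-Xmx200m \
        -D io.sort.mb=50 \
        -D mapred.reduce.slowstart.completed.maps=1.0 \
        in out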

Re: Job Priority Hadoop 0.20.203

2011-07-06 Thread Allen Wittenauer

On Jul 6, 2011, at 5:22 AM, Nitin Khandelwal wrote:

 Hi,
 
 I am using Hadoop 0.20.203 with the new API ( mapreduce package) . I want to
 use Jobpriority, but unfortunately there is no option to set that in Job (
 the option is there in 0.21.0). Can somebody plz tell me is there is a
 walkaround to set job priority?


 Job priority is slowly (read: unofficially) on its way to getting deprecated, 
if one takes the fact that cap sched now completely ignores it in 203.  I, too, 
am sad about this.

Re: Dead data nodes during job excution and failed tasks.

2011-06-30 Thread Allen Wittenauer

On Jun 30, 2011, at 10:01 AM, David Ginzburg wrote:

 
 Hi,
 I am running a certain job which constantly cause dead data nodes (who come 
 back later, spontaneously ).

Check your memory usage during the job run.  Chances are good the 
DataNode is getting swapped out.

Re: How does Hadoop manage memory?

2011-06-30 Thread Allen Wittenauer

On Jun 28, 2011, at 1:43 PM, Peter Wolf wrote:

 Hello all,
 
 I am looking for the right thing to read...
 
 I am writing a MapReduce Speech Recognition application.  I want to run many 
 Speech Recognizers in parallel.
 
 Speech Recognizers not only use a large amount of processor, they also use a 
 large amount of memory.  Also, in my application, they are often idle much of 
 the time waiting for data.  So optimizing what runs when is non-trivial.
 
 I am trying to better understand how Hadoop manages resources.  Does it 
 automatically figure out the right number of mappers to instantiate?

The number of mappers correlates to the number of InputSplits, which is 
based upon the InputFormat.  In most cases, this is equivalent to the number of 
blocks.  So if a file is composed of 3 blocks, it will generate 3 mappers.  
Again, depending upon the InputFormat, the size of these splits may be 
manipulated via job settings.


  How?  What happens when other people are sharing the cluster?  What resource 
 management is the responsibility of application developers?

Realistically, *all* resource management is the responsibility of the 
operations and development teams.  The only real resource protection/allocation 
system that Hadoop provides is task slots and, if enabled, some memory 
protection in the form of "don't go over this much."  On multi-tenant 
systems, a "good neighbor" view of the world should be adopted.

 For example, let's say each Speech Recognizer uses 500 MB, and I have 
 1,000,000 files to process.  What would happen if I made 1,000,000 mappers, 
 each with 1 Speech Recognizer?  

At 1m mappers, the JobTracker would likely explode under the weight 
first unless the Heap size was raised significantly.  Each value that you see 
on the JT page--including those for each task--is kept in main memory.  


 Is it only non-optimal because of setup time, or would the system try to 
 allocate 500GB of memory and explode?

If you have 1m map slots, yes, it would allocate .5TB of mem spread 
across the nodes.



Re: tasktracker maximum map tasks for a certain job

2011-06-22 Thread Allen Wittenauer

On Jun 21, 2011, at 9:52 AM, Jonathan Zukerman wrote:

 Hi,
 
 Is there a way to set the maximum map tasks for all tasktrackers in my
 cluster for a certain job?
 Most of my tasktrackers are configured to handle 4 maps concurrently, and
 most of my jobs don't care where does the map function run. But small part
 of my jobs requires that no two map functions will run from the same IP at
 the same time, i.e., the maximum number of map tasks for each tasktracker
 should be 1 for these jobs.


http://wiki.apache.org/hadoop/FAQ#How_do_I_limit_.28or_increase.29_the_number_of_concurrent_tasks_a_job_may_have_running_total_at_a_time.3F


Re: controlling no. of mapper tasks

2011-06-22 Thread Allen Wittenauer

On Jun 20, 2011, at 12:24 PM, praveen.pe...@nokia.com wrote:

 Hi there,
 I know client can send mapred.reduce.tasks to specify no. of reduce tasks 
 and hadoop honours it but mapred.map.tasks is not honoured by Hadoop. Is 
 there any way to control number of map tasks? What I noticed is that Hadoop 
 is choosing too many mappers and there is an extra overhead being added due 
 to this. For example, when I have only 10 map tasks, my job finishes faster 
 than when Hadoop chooses 191 map tasks. I have 5 slave cluster and 10 tasks 
 can run in parallel. I want to set both map and reduce tasks to be 10 for max 
 efficiency.


http://wiki.apache.org/hadoop/FAQ#How_do_I_limit_.28or_increase.29_the_number_of_concurrent_tasks_a_job_may_have_running_total_at_a_time.3F




Re: Large startup time in remote MapReduce job

2011-06-22 Thread Allen Wittenauer

On Jun 21, 2011, at 2:02 PM, Harsh J wrote:
 
 If your jar does not contain code changes that need to get transmitted
 every time, you can consider placing them on the JT/TT classpaths
 
... which means you get to bounce your system every time you change 
 code.
 
 Its ugly, but if the jar filename remains the same there shouldn't
 need to be any bouncing. Doable if there's no activity at replacement
 point of time?

I have yet to find a jar file that never ever changes.

It may take a year+, but it will change.  The corollary is that you'll 
need to change it at the worst possible time and in an incompatible way such 
that older code will break and need to be upgraded.  So don't do it. :)



Re: Large startup time in remote MapReduce job

2011-06-22 Thread Allen Wittenauer

On Jun 22, 2011, at 10:08 AM, Allen Wittenauer wrote:

 
 On Jun 21, 2011, at 2:02 PM, Harsh J wrote:
 
 If your jar does not contain code changes that need to get transmitted
 every time, you can consider placing them on the JT/TT classpaths
 
   ... which means you get to bounce your system every time you change 
 code.
 
 Its ugly, but if the jar filename remains the same there shouldn't
 need to be any bouncing. Doable if there's no activity at replacement
 point of time?
 
   I have yet to find a jar file that never ever changes.


(I suppose the exception to this rule is all the java code at places 
like NASA, JPL, etc involved with the space program.  But that's totally 
cheating!)




Re: Upgrading namenode/secondary node hardware

2011-06-20 Thread Allen Wittenauer

On Jun 15, 2011, at 3:18 AM, Steve Loughran wrote:
 yes, put it in the same place on your HA storage and you may not even need to 
 reconfigure it. If you didn't shut down the filesystem cleanly, you'll need 
 to replay the edit logs.


As a sidenote...

Lots of weird incompatibilities have snuck into the code with the 
editslog between versions.  You REALLY REALLY REALLY don't want to let editslog 
get processed by a different version.

Shutdown the NN, DNs, etc.  Then start the NN up to let it process the 
editslog.  Then shutdown the NN again with a clean editslog and do the upgrade.
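
Roughly, assuming the stock scripts (keep running the *old* version until the 
last step):

    stop-dfs.sh         # clean shutdown of NN and DNs
    start-dfs.sh        # old-version NN replays the editslog into the fsimage
    stop-dfs.sh         # shut down again, now with a (nearly) empty editslog
    # install the new version, then:
    start-dfs.sh -upgrade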

Re: NameNode heapsize

2011-06-13 Thread Allen Wittenauer

On Jun 13, 2011, at 5:52 AM, Steve Loughran wrote:
 
 Unless your cluster is bigger than Facebooks, you have too many small files
 
 


+1 

(I'm actually sort of surprised the NN is still standing with only 24mb.  The 
gc logs would be interesting to look at.)

I'd also likely increase the block size, distcp files to the new block size, 
and then replace the old files with the new files.




Re: Regarding reading image files in hadoop

2011-06-13 Thread Allen Wittenauer

On Jun 12, 2011, at 11:15 PM, Nitesh Kaushik wrote:

 Dear Sir/Madam,
 
I am Nitesh kaushik, working with institution dealing in
 satellite images. I am trying to read bulk images
using Hadoop but unfortunately that is not working, i am not
 getting any clue how to work with images in hadoop
Hadoop is working great with text files and producing good
 results.
 
 Please tell me how can i deal with image data in hadoop.


The typical problems with using images are:

a) If you are using streaming, you'll likely need to encode the 
file or upgrade to a version of hadoop that supports binary streams over 
streaming.

b) Either need a custom InputFormat or set the split size to 
match the file size.
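
For (b) with streaming, one blunt way to keep each image in a single map task 
is to push the minimum split size above your largest file (0.20-era property 
name; the value is just a big number):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -D mapred.min.split.size=10737418240 \
        -input images -output out -mapper yourmapper -reducer NONE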




Re: Block size in HDFS

2011-06-13 Thread Allen Wittenauer

FYI, I've added this to the FAQ since it comes up every so often.


Re: concurrent job execution

2011-06-06 Thread Allen Wittenauer

On Jun 3, 2011, at 1:11 AM, Felix Sprick wrote:

 Hi,
 
 We are running MapReduce on Hbase tables and are trying to implement a
 scenario with MapReduce where tasks are submitted from a GUI application.
 This means that several users (currently 5-10) may use the system in
 parallel. 

In addition to moving from the FIFO/default scheduler, make sure your 
GUI application actually submits the jobs as different users.   If they are the 
same user, most (all?) schedulers will still FIFO the jobs.

 (-Dhadoop.job.ugi in 0.20.2 and lower, doAs() in 0.20.203 and higher).

Re: Changing dfs.block.size

2011-06-06 Thread Allen Wittenauer

On Jun 6, 2011, at 12:09 PM, J. Ryan Earl wrote:

 Hello,
 
 So I have a question about changing dfs.block.size in
 $HADOOP_HOME/conf/hdfs-site.xml.  I understand that when files are created,
 blocksizes can be modified from default.  What happens if you modify the
 blocksize of an existing HDFS site?  Do newly created files get the default
 blocksize and old files remain the same?

Yes.


  Is there a way to change the
 blocksize of existing files; I'm assuming you could write MapReduce job to
 do it, but any build in facilities?

You can use distcp to copy the files back onto the same fs in a new 
location.  The new files should be in the new block size.  Now you can move the 
new files where the old files used to live.
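
i.e. something along the lines of (paths are illustrative; don't pass -pb to 
distcp here, or it will preserve the old block size):

    hadoop distcp /data/old /data/old.newblocks
    hadoop fs -mv /data/old /data/old.bak          # keep until you've verified
    hadoop fs -mv /data/old.newblocks /data/old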

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Allen Wittenauer

On May 27, 2011, at 7:26 AM, DAN wrote:
 You see you have 2 Solaris servers for now, and dfs.replication is setted 
 as 3.
 These don't match.


That doesn't matter.  HDFS will basically flag any files written with a 
warning that they are under-replicated.

The problem is that the datanode processes aren't running and/or aren't 
communicating to the namenode. That's what the "java.io.IOException: File 
/tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 
nodes, instead of 1" means.

It should also be pointed out that writing to /tmp (the default) is a 
bad idea.  This should get changed.

Also, since you are running Solaris, check the FAQ for some settings 
you'll need in order to make Hadoop's broken username detection work 
properly, amongst other things.

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Allen Wittenauer

On May 27, 2011, at 1:18 PM, Xu, Richard wrote:

 Hi Allen,
 
 Thanks a lot for your response.
 
 I agree with you that it does not matter with replication settings.
 
 What really bothered me is same environment, same configures, hadoop 0.20.203 
 takes us 3 mins, why 0.20.2 took 3 days.
 
 Can you pls. shed more light on how to make Hadoop's broken username 
 detection to work properly?

It's in the FAQ so that I don't have to do that.

http://wiki.apache.org/hadoop/FAQ


Also, check your logs.  All your logs.  Not just the namenode log.

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Allen Wittenauer

On May 17, 2011, at 1:01 PM, Mark question wrote:

 Hi
 
  I need to use hadoop-tool-kit for monitoring. So I followed
 http://code.google.com/p/hadoop-toolkit/source/checkout
 
 and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2

Looking at the code, be aware this is going to give incorrect 
results/suggestions for certain stats it generates when multiple jobs are 
running.  

It also seems to lack the "algorithm should be rewritten" and "the data 
was loaded incorrectly" suggestions, which are usually the proper answers for 
perf problems 80% of the time.

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Allen Wittenauer

On May 17, 2011, at 3:11 PM, Mark question wrote:

 So what other memory consumption tools do you suggest? I don't want to do it
 manually and dump statistics into file because IO will affect performance
 too.

We watch memory with Ganglia.  We also tune our systems such that a 
task will only take X amount.  In other words, given an 8gb RAM:

1gb for the OS
1gb for the TT and DN
6gb for all tasks

if we assume each task will take max 1gb, then we end up with 3 maps 
and 3 reducers.  

Keep in mind that the mem consumed is more than just JVM heap size.
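
A quick-and-dirty way to see what a task really costs while a job is running 
(Linux ps; RSS is in KB and includes the non-heap bits):

    ps -o pid,rss,args -C java | grep attempt_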

Re: Rapid growth in Non DFS Used disk space

2011-05-13 Thread Allen Wittenauer

On May 13, 2011, at 10:48 AM, Todd Lipcon wrote:
 
 
 2) Any ideas on what is driving the growth in Non DFS Used space?   I
 looked for things like growing log files on the datanodes but didn't find
 anything.
 
 
 Logs are one possible culprit. Another is to look for old files that might
 be orphaned in your mapred.local.dir - there have been bugs in the past
 where we've leaked files. If you shut down the TaskTrackers, you can safely
 delete everything from within mapred.local.dirs.

Part of our S.O.P. during Hadoop bounces is to wipe mapred.local out.  
The TT doesn't properly clean up after itself.

Re: Suggestions for swapping issue

2011-05-11 Thread Allen Wittenauer

On May 11, 2011, at 11:11 AM, Adi wrote:

 By our calculations hadoop should not exceed 70% of memory.
 Allocated per node - 48 map slots (24 GB) ,  12 reduce slots (6 GB), 1 GB
 each for DataNode/TaskTracker and one JobTracker Totalling 33/34 GB
 allocation.

It sounds like you are only taking into consideration the heap size.  
There is more memory allocated than just the heap...

 The queues are capped at using only 90% of capacity allocated so generally
 10% of slots are always kept free.

But that doesn't translate into how free the nodes are, which you've 
discovered.  Individual nodes should be configured based on the assumption that 
*all* slots will be used.

 The cluster was running total 33 mappers and 1 reducer so around 8-9 mappers
 per node with 3 GB max limit and they were utilizing around 2GB each.
 Top was showing 100% memory utilized. Which our sys admin says is ok as the
 memory is used for file caching by linux if the processes are not using it.

Well, yes and no.  What is the breakdown of that 100%?  Is there any 
actually allocated to buffer cache or is it all user space?

 No swapping on 3 nodes.
 Then node4 just started swapping after the number of processes shot up
 unexpectedly. The main mystery are these excess number of processes on the
 node which went down. 36 as opposed to expected 11. The other 3 nodes were
 successfully executing the mappers without any memory/swap issues.

Likely speculative execution or something else.  But again: don't build 
machines with the assumption that only x% of the slots will get used.  There is 
no guarantee in the system that says that free slots will be balanced across 
all nodes... esp when you take into consideration node failure.

 
 -Adi
 
 On Wed, May 11, 2011 at 1:40 PM, Michel Segel 
 michael_se...@hotmail.com wrote:
 
 You have to do the math...
 If you have 2gb per mapper, and run 10 mappers per node... That means 20gb
 of memory.
 Then you have TT and DN running which also take memory...
 
 What did you set as the number of mappers/reducers per node?
 
 What do you see in ganglia or when you run top?
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 
 On May 11, 2011, at 12:31 PM, Adi adi.pan...@gmail.com wrote:
 
 Hello Hadoop Gurus,
 We are running a 4-node cluster. We just upgraded the RAM to 48 GB. We
 have
 allocated around 33-34 GB per node for hadoop processes. Leaving the rest
 of
 the 14-15 GB memory for OS and as buffer. There are no other processes
 running on these nodes.
 Most of the lighter jobs run successfully but one big job is
 de-stabilizing
 the cluster. One node starts swapping and runs out of swap space and goes
 offline. We tracked the processes on that node and noticed that it ends
 up
 with more than expected hadoop-java processes.
 The other 3 nodes were running 10 or 11 processes and this node ends up
 with
 36. After killing the job we find these processes still show up and we
 have
 to kill them manually.
 We have tried reducing the swappiness to 6 but saw the same results. It
 also
 looks like hadoop stays well within the memory limits allocated and still
 starts swapping.
 
 Some other suggestions we have seen are:
 1) Increase swap size. Current size is 6 GB. The most quoted size is
 'tons
 of swap' but note sure how much it translates to in numbers. Should it be
 16
 or 24 GB
 2) Increase overcommit ratio. Not sure if this helps as a few blog
 comments
 mentioned it didn't help
 
 Any other hadoop or linux config suggestions are welcome.
 
 Thanks.
 
 -Adi
 



Re: configuration and FileSystem

2011-05-10 Thread Allen Wittenauer

On May 10, 2011, at 9:57 AM, Gang Luo wrote:
 I was confused by the configuration and file system in hadoop. when we create 
 a 
 FileSystem object and read/write something through it, are we writing to or 
 reading from HDFS?

Typically, yes.

 Could it be local file system?

Yes.


 If yes, what determines which 
 file system it is? Configuration object we used to create the FileSystem 
 object?

Yes.

 When I write mapreduce program which extends Configured implements Tool, I 
 can 
 get the right configuration by calling getConf() and use the FileSystem 
 object 
 to communicate with HDFS. What if I want to read/write HDFS in a separate 
 Utility class? Where does the configuration come from?

You need to supply it.




Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Allen Wittenauer

On May 10, 2011, at 6:14 AM, Marcos Ortiz wrote:
 My prefered filesystem is ZFS, It's a shame that Linux support is very 
 inmature yet. For that reason, I changed my PostgreSQL hosts to FreeBSD-8.0 
 to use
 ZFS like filesystem and it's really rocks.
 
 Had anyone tested a Hadoop cluster with this filesystem?
 On Solaris or FreeBSD?

HDFS capacity numbers go really wonky on pooled storage systems like 
ZFS.  Other than that, performance is more than acceptable vs. ext4.  [Sorry, I 
don't have my benchmark numbers handy.]

Re: socket timeouts, dropped packages

2011-04-14 Thread Allen Wittenauer

On Apr 13, 2011, at 1:06 PM, Ferdy Galema wrote:
 
 @Allen/Arun
 My bad, I was not aware that cloudera releases could not be discussed here at 
 all. I was thinking that even though cloudera releases are somewhat 
 different, issues that are probably generic could still discussed here. 
 (Surely I would use the cloudera lists when I'm pretty sure it's absolutely 
 specific to cloudera).

Unfortunately, all of the various forks of the Apache releases (regardless of 
where they come from) have diverged enough that the issues are rarely generic 
anymore, outside of those answered on the FAQ. :(



Re: namenode format error

2011-04-13 Thread Allen Wittenauer

On Apr 13, 2011, at 12:38 PM, Jeffrey Wang wrote:
 It's just in my home directory, which is an NFS mount. I moved it off NFS and 
 it seems to work fine. Is there some reason it doesn't work with NFS?


Locking on NFS--regardless of application--is a dice roll, especially when 
client/server are different OSes and platforms.  Heck even when they are the 
same, it might not work.  For example, on some older Linux kernels, flock() on 
NFS only created a lock locally and didn't tell the NFS server, allowing 
another client to claim a lock!  (IIRC, fcntl did work...)

Best bet is to make sure rpc.lockd (or whatever daemon provides NFS locking) is 
running/working and to use the highest compatible version of NFS.  NFSv4, in 
particular, has file locking built into the protocol rather than as an 
'external' service.

Re: HDFS Compatiblity

2011-04-05 Thread Allen Wittenauer

On Apr 5, 2011, at 5:22 AM, Matthew John wrote:
 Can HDFS run over a RAW DISK which is mounted over a mount point with
 no FIle System ? Or does it interact only with POSIX compliant File
 sytem ?


It needs a POSIX file system.



Re: Including Additional Jars

2011-04-04 Thread Allen Wittenauer

On Apr 4, 2011, at 8:06 AM, Shuja Rehman wrote:

 Hi All
 
 I have created a map reduce job and to run on it on the cluster, i have
 bundled all jars(hadoop, hbase etc) into single jar which increases the size
 of overall file. During the development process, i need to copy again and
 again this complete file which is very time consuming so is there any way
 that i just copy the program jar only and do not need to copy the lib files
 again and again. i am using net beans to develop the program.
 
 kindly let me know how to solve this issue?

This was in the FAQ, but in a non-obvious place.  I've updated it to be 
more visible (hopefully):

http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F
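
i.e. keep hadoop/hbase out of the job jar and ship them per-job instead, along 
the lines of (class name and paths are made up; this assumes your driver goes 
through ToolRunner/GenericOptionsParser so -libjars is honored):

    hadoop jar myjob.jar com.example.MyJob \
        -libjars /path/to/hbase.jar,/path/to/zookeeper.jar \
        in out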

Re: Question on Streaming with RVM. Perhaps environment settings related.

2011-04-02 Thread Allen Wittenauer

On Apr 1, 2011, at 9:47 PM, Guang-Nan Cheng wrote:
 
 The problem is MapRed process seems can't load RVM. I added
 /etc/profile.d/rvm.sh in hadoop-env.sh. But the script still fails due
 to the same error.

Add this to the .bashrc:

[ -x /etc/profile ] && . /etc/profile

and make sure /etc/profile is executable.


Re: Problem in Job/TasK scheduling

2011-04-02 Thread Allen Wittenauer

On Apr 1, 2011, at 3:03 AM, Nitin Khandelwal wrote:

 Hi,
 
 
 I am right now stuck on the issue of division of tasks among slaves for a
 job. Currently, as far as I know, hadoop does not allow us to fix/determine
 in advance how many tasks of a job would run on each slave. I am trying to
 design a system, where I want a slave to execute only one task of one job
 type ,
 however at the same time , be able to accept/execute task of jobs of other
 types. Is this kind of scheduling possible in hadoop.

This is in the FAQ:

http://wiki.apache.org/hadoop/FAQ#How_do_I_limit_.28or_increase.29_the_number_of_concurrent_tasks_running_on_a_node.3F



Re: distributed cache exceeding local.cache.size

2011-03-31 Thread Allen Wittenauer

On Mar 31, 2011, at 11:45 AM, Travis Crawford wrote:

 Is anyone familiar with how the distributed cache deals when datasets
 larger than the total cache size are referenced? I've disabled the job
 that caused this situation but am wondering if I can configure things
 more defensively.

I've started building specific file systems on drives to store the map 
reduce spill space.  It seems to be the only reliable way to prevent MR from 
going nuts.  Sure, some jobs may fail, but that seems to be a better strategy 
than the alternative.




Re: Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread Allen Wittenauer

On Mar 31, 2011, at 7:43 AM, XiaoboGu wrote:

 I have trouble browsing the file system vi namenode web interface, namenode 
 saying in log file that the -G option is invalid to get the groups for the 
 user.
 


I don't but I suspect you'll need to enable one of the POSIX 
personalities before launching the namenode.  In particular, this means putting 
/usr/xpg4/bin or /usr/xpg6/bin in the PATH prior to the SysV /usr/bin entry.
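
i.e. something like this near the top of conf/hadoop-env.sh (assuming the xpg4 
tools are installed):

    export PATH=/usr/xpg4/bin:$PATH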




Re: changing node's rack

2011-03-26 Thread Allen Wittenauer

On Mar 26, 2011, at 3:50 PM, Ted Dunning wrote:

 I think that the namenode remembers the rack.  Restarting the datanode
 doesn't make it forget.

Correct.

https://issues.apache.org/jira/browse/HDFS-870



Re: A way to monitor HDFS for a file to come live, and then kick off a job?

2011-03-25 Thread Allen Wittenauer

On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote:

 I am not sure if this is the right listserv, forgive me if it is not.

A better choice would likely be hdfs-user@, since this is really about 
watching files in HDFS.


 My
 goal is this: monitor HDFS until a file is create, and then kick off a job.
 Ideally I'd want to do this continuously, but the file would be create
 hourly (with some sort of variance). I guess I could make a script that
 would ping the server every 5 minutes or something, but I was wondering if
 there might be a more elegant way?

Two ways off the top of my head:

1) Read/watch the edits stream

2) Read/watch the HDFS audit log

Given the latter is text built by log4j, that should be relatively 
simple to implement.
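
A rough sketch of the audit-log route (log location, field layout, and the job 
invocation are all illustrative and depend on your log4j setup):

    tail -F /var/log/hadoop/hdfs-audit.log | while read line; do
        case "$line" in
            *cmd=create*src=/data/hourly/*) hadoop jar yourjob.jar YourJob in out ;;
        esac
    done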

There was a JIRA asking for this functionality to be built in recently, btw.

Re: Datanode won't start with bad disk

2011-03-25 Thread Allen Wittenauer

On Mar 24, 2011, at 10:47 AM, Adam Phelps wrote:

 For reference, this is running hadoop 0.20.2 from the CDH3B4 distribution.

Given that this isn't a standard Apache release, you'll likely be 
better served by asking Cloudera.



Re: CDH and Hadoop

2011-03-24 Thread Allen Wittenauer

On Mar 23, 2011, at 7:29 AM, Rita wrote:

 I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
 instead of the standard Hadoop distribution.
 
 What do most people use? Is CDH free? do they provide the tars or does it
 provide source code and I simply compile? Can I have some data nodes as CDH
 and the rest as regular Hadoop?

 I think most of the larger sites are running some form of modified 
Apache release, in some cases having migrated off of a CDH release.  At 
LinkedIn, we've been using the Apache 0.20.2 release with 2 patches related to 
the capacity scheduler for over a year now.  

In our case, I never deployed CDH, other than a test setup.  I opted 
not to use CDH in the CDH2 and CDH3 beta time frame due to some patches that I 
felt were not of a high quality as well as the potential for vendor lock-in.  
But I haven't looked at it in probably a year.

Re: kill-task syntax?

2011-03-23 Thread Allen Wittenauer

On Mar 23, 2011, at 2:02 PM, Keith Wiley wrote:

 hadoop job -kill-task and -fail-task don't work for me.  I get this kind of 
 error:
 
 Exception in thread main java.lang.IllegalArgumentException: TaskAttemptId 
 string : task_201103101623_12995_r_00 is not properly formed
   at 
 org.apache.hadoop.mapreduce.TaskAttemptID.forName(TaskAttemptID.java:170)
   at 
 org.apache.hadoop.mapred.TaskAttemptID.forName(TaskAttemptID.java:108)
   at org.apache.hadoop.mapred.JobClient.run(JobClient.java:1787)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
   at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1956)
 
 How is that not a properly formed task id?  I copied it straight off the job 
 tracker?  Killing a job generally works for me, although much to my 
 dismay the job often doesn't actually die, which is why I'm not attempting to 
 kill the underlying tasks directly.

It actually wants the attemptid, not the task id.  :(

So attempt_201103101623_12995_r_00_0 would have worked.

There should be a JIRA filed

Re: keeping an active hdfs cluster balanced

2011-03-17 Thread Allen Wittenauer

On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:

 Parts of this may end up on the hbase list, but I thought I'd start here. My 
 basic problem is:
 
 My cluster is getting full enough that having one data node go down does put 
 a bit of pressure on the system (when balanced, every DN is more than half 
 full).

Usually around the ~80% full mark is when HDFS starts getting a bit 
wonky on super active grids. Your best bet is to either delete some data/store 
the data more efficiently, add more nodes, or upgrade the storage capacity of 
the nodes you have.  The balancer is only going to save you for so long until 
the whole thing tips over.

 Anybody here have any idea how badly running the balancer on a heavily active 
 system messes things up? (for hdfs/hbase - if anyone knows).

I don't run HBase, but at Y! we used to run the balancer pretty much 
every day, even on super active grids.  It 'mostly works' until you get to the 
point of no return, which it sounds like you are heading for...

 Any ideas? Or do I just need better hardware? Not sure if that's an option, 
 though..

Depending upon how your systems are configured, something else to look 
at is how much space is getting eaten by logs, mapreduce spill space, etc.  A 
good daemon bounce might free up some stale handles as well.

Re: hadoop fs -rmr /*?

2011-03-16 Thread Allen Wittenauer

On Mar 16, 2011, at 10:35 AM, W.P. McNeill wrote:

 On HDFS, anyone can run hadoop fs -rmr /* and delete everything.

In addition to what everyone else has said, I'm fairly certain that 
-rmr / is specifically safeguarded against.  But /* might have slipped through 
the cracks.

 What are examples of systems people working with high-value HDFS data put in
 place so that they can sleep at night?

I set in place crontabs where we randomly delete the entire file system 
to remind folks that HDFS is still immature.

:D

OK, not really.

In reality, we have basically a policy that everyone signs off on 
before getting an account where they understand that Hadoop should not be 
considered 'primary storage', is not a data warehouse, is not backed up, and 
could disappear at any moment.  But we also make sure that the base (ETL'd) 
data lives on multiple grids.  Any other data should be reproducible from that 
base data.  



Re: Changing a Namenode server

2011-03-16 Thread Allen Wittenauer

On Mar 16, 2011, at 7:17 AM, Lior Schachter wrote:

 Hi,
 
 We have been using hadoop/HDSF for some while and we want to upgrade the
 namenode server to a new (and stronger) machine.
 
 Can you please advise on the correct way to do this while minimizing system
 downtime  and without losing any data.



It is pretty much the same procedure as if you had permanently lost the NN 
hardware, more or less:

1) Build new NN machine
2) Shutdown old NN
3) Copy fsimage and edits to new box
4) Swap IPs/move A record/etc
5) Mount backup area
6) Bring up NN
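
For step 3, roughly (run on the new box with the old NN already shut down; the 
path is just an example -- copy whatever dfs.name.dir points at):

    scp -r oldnn:/hadoop/name/ /hadoop/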

Re: Will blocks of an unclosed file get lost when HDFS client (or the HDFS cluster) crashes?

2011-03-14 Thread Allen Wittenauer
No.

If a close hasn't been committed to the file, the associated 
blocks/files disappear in both client crash and namenode crash scenarios.  


On Mar 13, 2011, at 10:09 PM, Sean Bigdatafun wrote:

 I meant an HDFS chunk (the size of 64MB), and I meant the version of
 0.20.2 without append patch.
 
 I think even without the append patch, the previous 64MB blocks (in my
 example, the first 5 blocks) should be safe. Isn't it?
 
 
 On 3/13/11, Ted Dunning tdunn...@maprtech.com wrote:
 What do you mean by block?  An HDFS chunk?  Or a flushed write?
 
 The answer depends a bit on which version of HDFS / Hadoop you are using.
 With the append branches, things happen a lot more like what you expect.
 Without that version, it is difficult to say what will happen.
 
 Also, there are very few guarantees about what happens if the namenode
 crashes.  There are some provisions for recovery, but none of them really
 have any sort of transactional guarantees.  This means that there may be
 some unspecified time before the writes that you have done are actually
 persisted in a recoverable way.
 
 On Sun, Mar 13, 2011 at 9:52 AM, Sean Bigdatafun
 sean.bigdata...@gmail.comwrote:
 
 Let's say an HDFS client starts writing a file A (which is 10 blocks
 long) and 5 blocks have been writen to datanodes.
 
 At this time, if the HDFS client crashes (apparently without a close
 op), will we see 5 valid blocks for file A?
 
 Similary, at this time if the HDFS cluster crashes, will we see 5
 valid blocks for file A?
 
 (I guess both answers are yes, but I'd have some confirmation :-)
 --
 --Sean
 
 
 
 
 -- 
 --Sean



Re: TaskTracker not starting on all nodes

2011-03-12 Thread Allen Wittenauer

(Removing common-dev, because this isn't a dev question)

On Feb 26, 2011, at 7:25 AM, bikash sharma wrote:

 Hi,
 I have a 10 nodes Hadoop cluster, where I am running some benchmarks for
 experiments.
 Surprisingly, when I initialize the Hadoop cluster
 (hadoop/bin/start-mapred.sh), in many instances, only some nodes have
 TaskTracker process up (seen using jps), while other nodes do not have
 TaskTrackers. Could anyone please explain?

Check your logs.




Re: Problem running a Hadoop program with external libraries

2011-03-11 Thread Allen Wittenauer

On Mar 8, 2011, at 1:21 PM, Ratner, Alan S (IS) wrote:
 We had tried putting all the libraries directly in HDFS with a pointer in 
 mapred-site.xml:
 <property><name>mapred.child.env</name><value>LD_LIBRARY_PATH=/user/ngc/lib</value></property>
 as described in https://issues.apache.org/jira/browse/HADOOP-2838 but this 
 did not work for us.

Correct.  This isn't expected to work.

HDFS files are not directly accessible from the shell without some sort 
of action having taken place.   In order for the above to work, anything 
reading the LD_LIBRARY_PATH environment variable would have to a) know that 
'/user/...' is inside HDFS and b) know how to access it.   The reason why 
the distributed cache method works is because it pulls files from HDFS and 
places them in the local UNIX file system.  From there, UNIX processes can now 
access them.
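
For what it's worth, one way this is commonly wired up is to ship the .so with 
the job and point LD_LIBRARY_PATH at the task's local working directory instead 
of at HDFS.  A sketch only (the jar, class, and library names are made up, and 
it assumes the driver goes through ToolRunner/GenericOptionsParser so the 
generic options are honored): -files pulls the library out of HDFS and symlinks 
it into each task's working directory, which is what LD_LIBRARY_PATH=. then 
points at.

hadoop jar my-app.jar com.example.MyJob \
  -files hdfs:///user/ngc/lib/libexample.so \
  -Dmapred.child.env=LD_LIBRARY_PATH=. \
  input output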

HADOOP-2838 is really about providing a way for applications to get to 
libraries that are already installed at the UNIX level.  (Although, in reality, 
it would likely be better if applications were linked with a better value 
provided for the runtime library search path -R/-rpath/ld.so.conf/crle/etc 
rather than using LD_LIBRARY_PATH.)

Re: LZO Packager on Centos 5.5 fails

2011-03-08 Thread Allen Wittenauer

On Mar 6, 2011, at 9:37 PM, phil young wrote:
 
 I do appreciate the body of work that this is all built on and your
 responses, but I am surprised that there isn't a more comprehensive set of
 How-Tos. I'll assist in writing up something like that when I get a chance.
 Are other people using CDH3B3 with 64-bit CentOS/RedHat, or is there
 something I'm missing?

This is the Apache list, not the Cloudera list, so the question would 
probably better asked there.



Re: Is this a fair summary of HDFS failover?

2011-02-16 Thread Allen Wittenauer

I'm more than a little concerned that you missed the whole multiple 
directories--including a remote one--for the fsimage thing.  That's probably 
the #1 thing that most of the big grids do to maintain the NN data.  I can only 
remember one failure where the NFS copy wasn't used to recover a namenode in 
all the failures I've personally been involved (and that was an especially odd 
bug, not a NN failure, per se).  The only reason to fall back to the 2ndary NN 
in 0.20 should be is if you've hit a similarly spectacular bug.  Point blank: 
anyone who runs the NN without it writing to a remote copy doesn't know what 
they are doing.

Also, until AvatarNode comes of age (which, from what I understand, FB 
has not been doing for very long themselves), there is no such thing as HA NN. 
 We all have high hopes that it works out, but it likely isn't anywhere near 
ready for primetime yet.

On Feb 14, 2011, at 2:52 PM, Mark Kerzner wrote:

 I completely agree, and I am using yours and the group's posting to define
 the direction and approaches, but I am also trying every solution - and I am
 beginning to do just that, the AvatarNode now.
 
 Thank you,
 Mark
 
 On Mon, Feb 14, 2011 at 4:43 PM, M. C. Srivas mcsri...@gmail.com wrote:
 
 I understand you are writing a book Hadoop in Practice.  If so, its
 important that what's recommended in the book should be verified in
 practice. (I mean, beyond simply posting in this newsgroup - for instance,
 the recommendations on NN fail-over should be tried out first before
 writing
 about how to do it). Otherwise you won't know your recommendations really
 work or not.
 
 
 
 On Mon, Feb 14, 2011 at 12:31 PM, Mark Kerzner markkerz...@gmail.com
 wrote:
 
 Thank you, M. C. Srivas, that was enormously useful. I understand it now,
 but just to be complete, I have re-formulated my points according to your
 comments:
 
  - In 0.20 the Secondary NameNode performs snapshotting. Its data can be
  used to recreate the HDFS if the Primary NameNode fails. The procedure
 is
  manual and may take hours, and there is also data loss since the last
  snapshot;
  - In 0.21 there is a Backup Node (HADOOP-4539), which aims to help with
  HA and act as a cold spare. The data loss is less than with Secondary
 NN,
  but it is still manual and potentially error-prone, and it takes hours;
  - There is an AvatarNode patch available for 0.20, and Facebook runs
 its
  cluster that way, but the patch submitted to Apache requires testing
 and
 the
  developers adopting it must do some custom configurations and also
 exercise
  care in their work.
 
 As a conclusion, when building an HA HDFS cluster, one needs to follow
 the
 best
 practices outlined by Tom
 White
 http://www.cloudera.com/wp-content/uploads/2010/03/HDFS_Reliability.pdf
 ,
 and may still need to resort to specialized NSF filers for running the
 NameNode.
 
 Sincerely,
 Mark
 
 
 
 On Mon, Feb 14, 2011 at 11:50 AM, M. C. Srivas mcsri...@gmail.com
 wrote:
 
 The summary is quite inaccurate.
 
 On Mon, Feb 14, 2011 at 8:48 AM, Mark Kerzner markkerz...@gmail.com
 wrote:
 
 Hi,
 
 is it accurate to say that
 
  - In 0.20 the Secondary NameNode acts as a cold spare; it can be
 used
 to
  recreate the HDFS if the Primary NameNode fails, but with the delay
 of
  minutes if not hours, and there is also some data loss;
 
 
 
 The Secondary NN is not a spare. It is used to augment the work of the
 Primary, by offloading some of its work to another machine. The work
 offloaded is log rollup or checkpointing. This has been a source of
 constant confusion (some named it incorrectly as a secondary and now
 we
 are stuck with it).
 
 The Secondary NN certainly cannot take over for the Primary. It is not
 its
 purpose.
 
 Yes, there is data loss.
 
 
 
 
  - in 0.21 there are streaming edits to a Backup Node (HADOOP-4539),
 which
  replaces the Secondary NameNode. The Backup Node can be used as a
 warm
  spare, with the failover being a matter of seconds. There can be
 multiple
  Backup Nodes, for additional insurance against failure, and
 previous
 best
  common practices apply to it;
 
 
 
 There is no Backup NN in the manner you are thinking of. It is
 completely
 manual, and requires restart of the whole world, and takes about 2-3
 hours
 to happen. If you are lucky, you may have only a little data loss
 (people
 have lost entire clusters due to this -- from what I understand, you
 are
 far
 better off resurrecting the Primary instead of trying to bring up a
 Backup
 NN).
 
 In any case, when you run it like you mention above, you will have to
 (a) make sure that the primary is dead
 (b) edit hdfs-site.xml on *every* datanode to point to the new IP
 address
 of
 the backup, and restart each datanode.
 (c) wait for 2-3 hours for all the block-reports from every restarted
 DN
 to
 finish
 
 2-3 hrs afterwards:
 (d) after that, restart all TT and the JT to connect to the new NN
 (e) finally, restart all the clients (eg, HBase, Oozie, 

Re: Externalizing Hadoop configuration files

2011-02-16 Thread Allen Wittenauer

See also https://issues.apache.org/jira/browse/HADOOP-5670 .


On Feb 16, 2011, at 3:32 PM, Ted Yu wrote:

 You need to develop externalization yourself.
 
 Our installer uses place holders such as:
 <property>
  <name>fs.checkpoint.dir</name>
  <value>dataFolderPlaceHolder/dfs/namesecondary,backupFolderPlaceHolder/namesecondary</value>
 </property>
 
 They would be resolved at time of deployment.
 
 On Wed, Feb 16, 2011 at 3:06 PM, Jing Zang purplehear...@hotmail.comwrote:
 
 
 How do I externalize the parameters in hadoop configuration files to make
 it easier for the operations team? Ideally something like:
 File: mapred-site.xml
 <property><name>mapred.job.tracker</name><value>${hadoop.mapred}</value></property>
 File: hadoop.conf (External to the app): hadoop.mapred=hadoop:9001
 Anything as easy as Spring's PropertyPlaceholderConfigurer or similar? How
 do you guys externalize the parameters to a configuration file outside the
 app?
 Thanks.
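
For what it's worth, the "resolved at time of deployment" step above can be as 
simple as a sed pass over a template; a sketch, reusing the placeholder names 
from the snippet above with made-up target paths:

sed -e 's|dataFolderPlaceHolder|/data/1|g' \
    -e 's|backupFolderPlaceHolder|/backup/hdfs|g' \
    templates/hdfs-site.xml > conf/hdfs-site.xml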



Re: User History Location

2011-02-11 Thread Allen Wittenauer

On Feb 11, 2011, at 6:12 AM, Alexander Schätzle wrote:

 Hello,
 
 I'm a little bit confused about the right key for specifying the User History 
 Location in CDH3B3 (which is Hadoop 0.20.2+737). Could anybody please give me 
 a short answer which key is the right one and which configuration file is the 
 right one to place the key?
 
 1) mapreduce.job.userhistorylocation ?
 2) hadoop.job.history.user.location ?
 
 Is the mapred-site.xml the right config-file for this key?

I don't know about Cloudera, but on the real Hadoop, you don't want to 
write these to HDFS.  If the write fails (which can happen under a variety of 
circumstances), all job logging gets turned off.

Re: hadoop infrastructure questions (production environment)

2011-02-09 Thread Allen Wittenauer

On Feb 8, 2011, at 7:45 AM, Oleg Ruchovets wrote:

 Hi , we are going to production and have some questions to ask:
 
   We are using 0.20_append  version (as I understand it is  hbase 0.90
 requirement).
 
 
   1) Currently we have to process 50GB text files per day , it can grow to
 150GB
  -- what is the best hadoop file size for our load and are there
 suggested disk block size for that size?

What is the retention policy?  You'll likely want something bigger than 
the default 64MB, though, if the plan is to keep the data indefinitely and you 
process these files at every job run.

  -- We worked using gz and I saw that for every files 1 map task
 was assigned.

Correct.  gzip files are not splittable.

  What is the best practice:  to work with gz files and save
 disc space or work without archiving ?

I'd recommend converting them to something else (such as a 
SequenceFile) that can be block compressed.
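
For example, a map-only streaming pass can do the conversion.  This is only a 
sketch (the paths are illustrative), and note that with /bin/cat as the mapper 
each input line ends up as the SequenceFile key with an empty value:

hadoop jar `ls $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar` \
-Dmapred.reduce.tasks=0 \
-Dmapred.output.compress=true \
-Dmapred.output.compression.type=BLOCK \
-input /incoming/2011-02-08 \
-output /incoming/2011-02-08-seq \
-mapper /bin/cat \
-outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat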

  Lets say we want to get performance benefits and disk
 space is less critical.

Then uncompress them. :D


   2)  Currently adding additional machine to the greed we need manually
 maintain all files and configurations.
 Is it possible to auto-deploy hadoop servers without the need to
 manually define each one on all nodes?

We basically use bcfg2 to manage different sets of configuration files 
(NN, JT, DN/TT, and gateway).  When a node is brought up, bcfg2 detects based 
upon the disk configuration what kind of machine it is and applies the 
appropriate configurations.  

The only files that really need to get changed when you add/subtract 
nodes are the ones on the NN and JT.  The rest of the nodes are oblivious to 
the rest.


   3) Can we change masters without reinstalling the entire grid

Yes.  Depending upon how you manage the service movement (see a 
previous discussion on this last week), at most you'll need to bounce the DN 
and TT processes.



Re: HDFS drive, partition best practice

2011-02-08 Thread Allen Wittenauer

On Feb 8, 2011, at 11:33 AM, Adam Phelps wrote:

 On 2/7/11 2:06 PM, Jonathan Disher wrote:
 Currently I have a 48 node cluster using Dell R710's with 12 disks - two
 250GB SATA drives in RAID1 for OS, and ten 1TB SATA disks as a JBOD
 (mounted on /data/0 through /data/9) and listed separately in
 hdfs-site.xml. It works... mostly. The big issues you will encounter is
 losing a disk - the DataNode process will crash, and if you comment out
 the affected drive, when you replace it you will have 9 disks full to N%
 and one empty disk.
 
 If DataNode is going down after a single disk failure then you probably 
 haven't set dfs.datanode.failed.volumes.tolerated in hdfs-site.xml.  You can 
 up that number to allow DataNode to tolerate dead drives.

a) only if you have a version that supports it

b) that only protects you on the DN side.  The TT is, AFAIK, still susceptible 
to drive failures.
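
For reference, on releases that do have it, the knob from the quoted reply goes 
in hdfs-site.xml; something along these lines (the value is illustrative):

<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>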

Re: Multiple various streaming questions

2011-02-07 Thread Allen Wittenauer

On Feb 4, 2011, at 7:46 AM, Keith Wiley wrote:
 I have since discovered that in the case of streaming, mapred.map.tasks is a 
 good way to achieve this goal.  Ironically, if I recall correctly, this 
 seemingly obvious method for setting the number mappers did not work so well 
 in my original nonstreaming case, which is why I resorted to the rather 
 contrived method of calculating and setting mapred.max.split.size instead.

mapred.map.tasks basically kicks in if the input size is less than a 
block.  (OK, it is technically more complex than that, but ... whatever).  
Given what you said in the other thread, this makes a lot more sense now as to 
what is going on.

 Because all slots are not in use.  It's a very larger cluster and it's 
 excruciating that Hadoop partially serializes a job by piling multiple map 
 tasks onto a single map in a queue even when the cluster is massively 
 underutilized.

Well, sort of.

The only input hadoop has to go on is your filename input which is 
relatively tiny.  So of course it is going to underutilize.  This makes sense 
now. :)





Re: changing the block size

2011-02-07 Thread Allen Wittenauer

On Feb 6, 2011, at 2:24 PM, Rita wrote:
 So, what I did was decommission a node, remove all of its data (rm -rf
 data.dir) and stopped the hdfs process on it. Then I made the change to
 conf/hdfs-site.xml on the data node and then I restarted the datanode. I
 then ran a balancer to take effect and I am still getting 64MB files instead
 of 128MB. :-/


Right.

As previously mentioned, changing the block size does not change the 
blocks of the previously written files.  In other words, changing the block 
size does not act as a merging function at the datanode level.  In order to 
change pre-existing files, you'll need to copy the files to a new location, 
delete the old ones, then mv the new versions back.
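
A rough sketch of that dance with the fs shell (the paths are made up; 
134217728 is 128MB, and the -D only affects files written by this client, i.e. 
the new copies):

hadoop fs -Ddfs.block.size=134217728 -cp /data/old /data/old.rewrite
hadoop fs -rmr /data/old
hadoop fs -mv /data/old.rewrite /data/old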

Re: Streaming data locality

2011-02-03 Thread Allen Wittenauer

On Feb 3, 2011, at 9:16 AM, Keith Wiley wrote:

 I've seen this asked before, but haven't seen a response yet.
 
 If the input to a streaming job is not actual data splits but simple HDFS 
 file names which are then read by the mappers, then how can data locality be 
 achieved.

If I understand your question, the method of processing doesn't matter. 
 The JobTracker places tasks based on input locality.  So if you are providing 
the names of the files you want processed as -input, then the JT will use the 
locations of those blocks.  (Remember: streaming.jar is basically a big wrapper 
around the Java methods and the parameters you pass to it are essentially the 
same as you'd provide to a real Java app.)

Or are you saying your -input is a list of other files to read?  In that 
case, there is no locality.  But again, streaming or otherwise makes no real 
difference.

 Likewise, is there any easier way to make those files accessible other than 
 using the -cacheFile flag?  
 That requires building a very very long hadoop command (100s of files 
 potentially).  I'm worried about overstepping some command-line length 
 limit...plus it would be nice to do this programatically, say with the 
 DistributedCache.addCacheFile() command, but that requires writing your own 
 driver, which I don't see how to do with streaming.
 
 Thoughts?

I think you need to give a more concrete example of what you are doing. 
 -cache is used for sending files with your job and should have no bearing on 
what your input is to your job.  Something tells me that you've cooked 
something up that is overly complex. :D




Re: Multiple various streaming questions

2011-02-03 Thread Allen Wittenauer

On Feb 1, 2011, at 11:40 PM, Keith Wiley wrote:

 I would really appreciate any help people can offer on the following matters.
 
 When running a streaming job, -D, -files, -libjars, and -archives don't seem 
 work, but -jobconf, -file, -cacheFile, and -cacheArchive do.  With the first 
 four parameters anywhere in command I always get a Streaming Command 
 Failed! error.  The last four work though.  Note that some of those 
 parameters (-files) do work when I a run a Hadoop job in the normal 
 framework, just not when I specify the streaming jar.

There are some issues with how the streaming jar processes the command 
line, especially in 0.20, in that they need to be in the correct order.  In 
general, the -D's need to be *before* the rest of the streaming params.  This 
is what works for me:

hadoop  \
jar \
 `ls $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar` \
-Dmapred.reduce.tasks.speculative.execution=false \
-Dmapred.map.tasks.speculative.execution=false \
-Dmapred.job.name="oh noes aw is doing perl again" \
-input ${ATTEMPTIN} \
-output ${ATTEMPTOUT} \
-mapper map.pl \
-reducer reduce.pl  \
-file jobsvs-map1.pl \
-file jobsvs-reduce1.pl 

 I have found examples online, but they always reference built-in classes.  
 If I try to use my own class, the job tracker produces a Cannot run program 
 org.uw.astro.coadd.Reducer2: java.io.IOException: error=2, No such file or 
 directory error.  

I wouldn't be surprised if it is a bug.  It might be worthwhile to dig 
into the streaming jar to figure out how it determines whether something is a 
class or not.  [It might even do something dumb like is it org.apache.blah?]

 How do I force a single record (input file) to be processed by a single 
 mapper to get maximum parallelism?


  All I found online was this terse description (of an example that gzips 
 files, not my application):
   • Generate a file containing the full HDFS path of the input files. 
 Each map task would get one file name as input.
   • Create a mapper script which, given a filename, will get the file to 
 local disk, gzip the file and put it back in the desired output directory

These work, but are less than ideal.

 I don't understand exactly what that means and how to go about doing it.  In 
 the normal Hadoop framework I have achieved this goal by setting 
 mapred.max.split.size small enough that only one input record fits (about 
 6MBs), but I tried that with my streaming job ala -jobconf 
 mapred.max.split.size=X where X is a very low number, about as many as a 
 single streaming input record (which in the streaming case is not 6MB, but 
 merely ~100 bytes, just a filename referenced ala -cacheFile), but it didn't 
 work, it sent multiple records to each mapper anyway.

What you actually want to do is set mapred.min.split.size to an 
extremely high value.  Setting max.split.size only works on Combined-  and 
MultiFile- InputFormat for some reason.

Also, you might be able to change the inputformat.  My experiences with 
doing this are Not Good(tm).


  Achieving 1-to-1 parallelism between map tasks, nodes, and input records is 
 very import because my map tasks take a very long time to run, upwards of an 
 hour.  I cannot have them queueing up on a small number of nodes while there 
 are numerous unused nodes (task slots) available to be doing work.

If all the task slots are in use, why would you care if they are 
queueing up?  Also keep in mind that if a node fails, that work will need to 
get re-done anyway.




Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-28 Thread Allen Wittenauer

On Jan 25, 2011, at 12:48 PM, Renaud Delbru wrote:

 As it seems that the capacity and fair schedulers in hadoop 0.20.2 do not 
 allow a hard upper limit in number of concurrent tasks, do anybody know any 
 other solutions to achieve this ?

The specific change for capacity scheduler has been backported to 0.20.2 as 
part of https://issues.apache.org/jira/browse/MAPREDUCE-1105 .  Note that 
you'll also need https://issues.apache.org/jira/browse/MAPREDUCE-1160 which 
fixes a logging bug in the JobTracker.  Otherwise your logs will fill up.



Re: Draining/Decommisioning a tasktracker

2011-01-28 Thread Allen Wittenauer

On Jan 28, 2011, at 1:09 AM, rishi pathak wrote:

 Hi,
Is there a way to drain a tasktracker. What we require is not to
 schedule any more map/red tasks onto a tasktracker(mark it offline) but
 still the running tasks should not be affected.
 

Decommissioning task trackers was added in 0.21.
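
The mechanics there are the usual exclude-file dance; roughly (the path and 
hostname below are illustrative):

<property>
  <name>mapred.hosts.exclude</name>
  <value>/hadoop/conf/mapred.exclude</value>
</property>

echo tt-host.example.com >> /hadoop/conf/mapred.exclude
hadoop mradmin -refreshNodes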



Re: Thread safety issues with JNI/native code from map tasks?

2011-01-28 Thread Allen Wittenauer

On Jan 28, 2011, at 5:47 PM, Keith Wiley wrote:

 On Jan 28, 2011, at 15:50 , Greg Roelofs wrote:
 
 Does your .so depend on any other potentially thread-unsafe .so that other
 (non-Hadoop) processes might be using?  System libraries like zlib are safe
 (else they wouldn't make very good system libraries), but maybe some other
 research library or something?  (That's a long shot, but I'm pretty much
 grasping at straws here.)
 
 Yeah, I dunno.  It's a very complicated system that hits all kinds of popular 
 conventional libraries: boost, eigen, countless other things.  I doubt of it 
 is being access however.  This is a dedicated cluster so if my task is the 
 only one running, then it's only concurrent with the OS itself (and the JVM 
 and Hadoop).

By chance, do you have jvm reuse turned on?




Re: rsync + hdfs

2011-01-12 Thread Allen Wittenauer

On Jan 11, 2011, at 6:32 PM, Mag Gam wrote:

 Is there a tool similar to rsync for hdfs? I would like to check for:
 ctime and size (the default behavior of rsync) and then sync if
 necessary. This would be a nice feature to add in the dfs utilities.


distcp is the closest.


Re: No locks available

2011-01-11 Thread Allen Wittenauer

On Jan 11, 2011, at 2:39 AM, Adarsh Sharma wrote:

 Dear all,
 
 Yesterday I was working on a cluster of 6 Hadoop nodes ( Load data, perform 
 some jobs ). But today when I start my cluster I came across a problem on one 
 of my datanodes.

Are you running this on NFS?

 
 2011-01-11 12:55:57,031 INFO org.apache.hadoop.hdfs.server.common.Storage: 
 java.io.IOException: No locks available


Re: monit? daemontools? jsvc? something else?

2011-01-06 Thread Allen Wittenauer

On Jan 6, 2011, at 12:39 AM, Otis Gospodnetic wrote:
In the case of Hadoop,  no.  There has usually been at least a core dump, 
 message in syslog,  message in datanode log, etc, etc.   [You *do* have 
 cores 
 enabled,  right?]
 
 Hm, cores enabled what do you mean by that?  Are you referring to JVM 
 heap 
 dump -XX JVM argument (-XX:+HeapDumpOnOutOfMemoryError)?  If not, I'm all 
 eyes/ears!

I'm talking about system level core dumps. i.e., ulimit -c and friends. 
 [I'm much more of a systems programmer than a java guy, so ... ] You can 
definitely write Java code that will make the JVM crash due to misuse of 
threading libraries.  There are also CPU, kernel, and BIOS bugs that I've seen 
that cause the JVM to crash. Usually jstack or a core will lead the way to 
patching the system to work around these issues. 

 
We also have in place a monitor that checks  the # of active nodes.  If 
 it 
 falls below a certain percentage, then we get  alerted and check on them en 
 masse.   Worrying about one or two nodes going  down probably means you need 
 more nodes. :D
 
 
 That's probably right. :)
 So what do you use for monitoring the # of active nodes?

We currently have a custom plugin for Nagios that screen scrapes the NN 
and JT web UI.  When a certain percentage of nodes dies, we get alerted that we 
need to take a look and start bringing stuff back up.  [We used the same 
approach at Y!, so it does work at scale.]

I'm hoping to replace this (and Ganglia) with something better over the 
next year ;)

Re: monit? daemontools? jsvc? something else?

2011-01-05 Thread Allen Wittenauer

On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:

 Ah, more manual work! :(
 
 You guys never have JVM die just because? I just had a DN's JVM die the 
 other day just because and with no obvious cause.  Restarting it brought it 
 back to life, everything recovered smoothly.  Had some automated tool done 
 the 
 restart for me, I'd be even happier.

In the case of Hadoop, no.  There has usually been at least a core 
dump, message in syslog, message in datanode log, etc, etc.   [You *do* have 
cores enabled, right?]

We also have in place a monitor that checks the # of active nodes.  If 
it falls below a certain percentage, then we get alerted and check on them en 
masse.   Worrying about one or two nodes going down probably means you need 
more nodes. :D



Re: monit? daemontools? jsvc? something else?

2011-01-05 Thread Allen Wittenauer

On Jan 5, 2011, at 7:57 PM, Lance Norskog wrote:

 Isn't this what Ganglia is for?
 

No.

Ganglia does metrics, not monitoring.


 On 1/5/11, Allen Wittenauer awittena...@linkedin.com wrote:
 
 On Jan 4, 2011, at 10:29 PM, Otis Gospodnetic wrote:
 
 Ah, more manual work! :(
 
 You guys never have JVM die just because? I just had a DN's JVM die the
 other day just because and with no obvious cause.  Restarting it brought
 it
 back to life, everything recovered smoothly.  Had some automated tool done
 the
 restart for me, I'd be even happier.
 
  In the case of Hadoop, no.  There has usually been at least a core dump,
 message in syslog, message in datanode log, etc, etc.   [You *do* have cores
 enabled, right?]
 
  We also have in place a monitor that checks the # of active nodes.  If 
 it
 falls below a certain percentage, then we get alerted and check on them en
 masse.   Worrying about one or two nodes going down probably means you need
 more nodes. :D
 
 
 
 
 -- 
 Lance Norskog
 goks...@gmail.com



Re: any plans to deploy OSGi bundles on cluster?

2011-01-04 Thread Allen Wittenauer

On Jan 4, 2011, at 10:30 AM, Hiller, Dean (Contractor) wrote:

 I guess I meant in the setting for number of tasks in child JVM before
 teardown.  In that case, it is nice to separate/unload my previous
 classes from the child JVM which OSGi does.  I was thinking we may do 10
 tasks / JVM setting which I thought meant have a Child process run 10
 tasks before shutting down4 may be from one job and 4 from a new job
 with conflicting classes maybe.  Will that work or is it not advised?

I don't use the JVM re-use options, but I'm 99% certain that the task JVM's are 
not shared between jobs. 

Re: When does Reduce job start

2011-01-04 Thread Allen Wittenauer

On Jan 4, 2011, at 10:53 AM, sagar naik wrote:
 
 The only reason, I can think of not starting  a reduce task is to
 avoid the un-necessary transfer of map output data in case of
 failures.

Reduce tasks also eat slots while they wait for and copy map output.  On shared 
grids, this can be extremely bad behavior.

 Is there a way to quickly start the reduce task in such case ?
 Wht is the configuration param to change this behavior

mapred.reduce.slowstart.completed.maps

See http://wiki.apache.org/hadoop/LimitingTaskSlotUsage (from the FAQ 2.12/2.13 
questions).
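
For reference, the value is the fraction of maps that have to finish before 
reducers get scheduled, so a lower number launches them earlier and a higher 
number holds them back.  It goes in mapred-site.xml or can be set per-job; the 
value shown is just the usual default:

<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>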



Re: monit? daemontools? jsvc? something else?

2011-01-03 Thread Allen Wittenauer

On Jan 3, 2011, at 2:22 AM, Otis Gospodnetic wrote:
 I see over on http://search-hadoop.com/?q=monit+daemontools that people *do* 
 use 
tools like monit and daemontools (and a few other ones) to revive their 
 Hadoop processes when they die.
 

I'm not a fan of doing this for Hadoop processes, even TaskTrackers and 
DataNodes.  The processes generally die for a reason, usually indicating that 
something is wrong with the box.  Restarting those processes may potentially 
hide issues.



Re: Reduce Task Priority / Scheduler

2010-12-19 Thread Allen Wittenauer

On Dec 19, 2010, at 7:39 AM, Martin Becker wrote:

 Hello everybody,
 
 is there a possibility to make sure that certain/all reduce tasks,
 i.e. the reducers to certain keys, are executed in a specified order?
 This is Job internal, so the Job Scheduler is probably the wrong place to 
 start?
 Does the order induced by the Comparable interface influence the
 execution order at all?

It sounds like you should be using a different framework than 
map/reduce if you care about the execution order.

Re: Unbalanced disks - need to take down whole HDFS?

2010-12-16 Thread Allen Wittenauer

On Dec 16, 2010, at 6:32 AM, Erik Forsberg wrote:
 http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
 has a solution, but it starts with Take down HDFS.
 
 Is that really necessary - shouldn't taking down just that datanode,
 moving around the blocks, then start the datanode be good enough, or
 will that mess up some datastructure in the namenode? 
 

It won't mess up the namenode, but it is going to be busy replicating 
those blocks on that datanode while you move stuff around.  Depending upon how 
much data is involved, you might find that after you bring the datanode back up 
the namenode will put you back in a state of major unbalance when it sends a 
wave of deletions for the extra replicas.
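
For reference, the per-node shuffle being discussed looks roughly like this 
(the mount points are made up, and the current/subdir layout has to be 
preserved when moving things):

bin/hadoop-daemon.sh stop datanode
# move some block subdirectories from a full disk to an emptier one
mv /data/3/dfs/data/current/subdir12 /data/9/dfs/data/current/
bin/hadoop-daemon.sh start datanode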




Re: Deprecated ... damaged?

2010-12-15 Thread Allen Wittenauer

On Dec 15, 2010, at 2:13 AM, maha wrote:

 Hi everyone,
 
  Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat which is 
 supposed to put each file from the input directory in a SEPARATE split.


Is there some reason you don't just use normal InputFormat with an 
extremely high min.split.size?



Re: Hadoop Certification Progamme

2010-12-15 Thread Allen Wittenauer

On Dec 15, 2010, at 9:26 AM, Konstantin Boudnik wrote:

 Hey, commit rights won't give you a nice looking certificate, would it? ;)


Isn't that what Photoshop is for?



Re: files that don't really exist?

2010-12-14 Thread Allen Wittenauer

On Dec 13, 2010, at 3:14 PM, Seth Lepzelter wrote:

 Alright, a little further investigation along that line (thanks for the 
 hint, can't believe I didn't think of that), shows that there's actually a 
 carriage return character (%0D, aka \r) at the end of the filename.

This falls into that you never forget your first time area. ;)

 I guess *, in hadoop's parlance, doesn't include a \r.

That's a bug in HDFS, IMO.  Please file a JIRA.

 got a \r into the command line, -rmr'ed that, it's now fixed.

Awesome. :)
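
For the archives: from a bash shell, ANSI-C quoting is one way to get that 
stray carriage return onto the command line (path illustrative):

hadoop fs -rmr $'/user/ken/testoutput5\r'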




Re: files that don't really exist?

2010-12-13 Thread Allen Wittenauer

On Dec 13, 2010, at 8:51 AM, Seth Lepzelter wrote:

 I've got a smallish cluster of 12 nodes up from 6, that we're using to dip 
 our feet into hadoop.  One of my users has a few directories in his HDFS 
 home which he was using to test, and which exist, according to 
 
 hadoop fs -ls <home directory>
 
 ie:
 
 ...
 /user/ken/testoutput4
 /user/ken/testoutput5
 ...
 
 but if you do:
 
 hadoop fs -ls /user/ken/testoutput5
 
 you get:
 
 ls: Cannot access /user/ken/testoutput5: No such file or directory.


There is likely one or more spaces after the testoutput5 .  Try using hadoop fs 
-ls /user/ken/*/* .



Re: Multicore Nodes

2010-12-13 Thread Allen Wittenauer

On Dec 11, 2010, at 3:09 AM, Rob Stewart wrote:
 Or - is there support in Hadoop for multi-core nodes? 

Be aware that writing a job that specifically uses multi-threaded 
tasks usually means that a) you probably aren't really doing map/reduce anymore 
and b) the job will likely tickle bugs in the system, as some APIs are more 
multi-thread safe than others.  (See HDFS-1526, for example). 

Re: Scheduler in Hadoop MR

2010-12-08 Thread Allen Wittenauer

On Dec 7, 2010, at 6:47 PM, Harsh J wrote:
 1 - When we've two JobTrackers running simultaneously, each JobTracker is
 running in a separate process?
 
 You can't run simultaneous JobTrackers for the same data-cluster
 AFAIK; only one JT process can exist. Did you mean jobs?

Sure you can.

Configure the task trackers to talk to specific job trackers.   This 
does, however, have a very negative impact on data locality.



Re: Topology script - does it take DNS names, or IPs, or both?

2010-12-03 Thread Allen Wittenauer

On Dec 2, 2010, at 11:41 PM, Erik Forsberg wrote:

 Hi!
 
 I'm a bit confused about the arguments for the network topology script
 you can specify with topology.script.file.name - are they IPs, or dns
 names, or does it vary depending on if the IP could be resolved into a
 name?

You should make it work with both.



Re: Not a host:port pair: local

2010-11-21 Thread Allen Wittenauer

On Nov 19, 2010, at 5:56 PM, Skye Berghel wrote:
 
 
 All of the information I've seen online suggests that this is because 
 mapreduce.jobtracker.address is set to local. However, in 
 conf/mapred-site.xml I have
<property>
<name>mapreduce.jobtracker.address</name>
<value>myserver:</value>
<description>the jobtracker server</description>
</property>
 which means that the jobtracker shouldn't be set to local in the first place.
 
 Does anyone have any pointers?

Check to make sure your XML is completely valid.  You might have a missing / 
somewhere.
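
A quick way to check, assuming xmllint (from libxml2) is installed:

xmllint --noout conf/mapred-site.xml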

Re: 0.21 found interface but class was expected

2010-11-14 Thread Allen Wittenauer
[Yes, gmail people, this likely went to your junk folder. ]

On Nov 13, 2010, at 5:28 PM, Lance Norskog wrote:

 It is considered good manners :)
 
 Seriously, if you want to attract a community you have an obligation
 to tell them when you're going to jerk the rug out from under their
 feet.

The rug has been jerked in various ways for every micro version since 
as long as I've been with Hadoop.  Such jerkings have always (eventually) been 
for the positive with a happy ending almost every time.  No pain, no gain.

Oh, one other thing.

Here we are, several fairly significant conferences later (both as the 
main focus and as one of the leading topics) and I still don't  understand why 
people have concerns about attracting a community.  When you have what seems 
like 100s of companies creating products either built around or integrating 
Hadoop (running the full gamut from stealth startups to Major Players like IBM), 
it doesn't really seem like that is much of an issue anymore.  

At this point, I'm actually in the opposite camp:  the community has 
grown TOO fast to the point that major problems in the source won't be able to 
be fixed because folks will expect less breakage.  This is especially easy for 
sites with a few hundred nodes (or with enough frosting on top) because 
everything seems to be working for them.  Many of them will  not really 
understand that at super large scales, some things just don't work.  In order 
to fix some of the issues, breakage will occur.  

The end result can either be a community divided into multiple camps 
due to forking or a community that has learned to tolerate these minor 
inconveniences when they pop up.  I for one would rather be in the latter, but 
it sure seems like some parts of the community (and in many ways, the ASF 
itself) would rather it be the former.

Time will tell.




Re: 0.21 found interface but class was expected

2010-11-14 Thread Allen Wittenauer

On Nov 14, 2010, at 12:05 AM, Allen Wittenauer wrote:
   The rug has been jerked in various ways for every micro version since 
 as long as I've been with Hadoop.  

s,micro,minor,

But i'm sure you knew that.



Re: Support for different users and different installation directories on different machines?

2010-11-07 Thread Allen Wittenauer

On Nov 7, 2010, at 9:44 AM, Zhenhua Guo wrote:

 It seems that currently Hadoop assumes that it is installed under the
 same directory using the same userid on different machines.
 Currently, following two bullets cannot be done without hacks (correct
 me if I am wrong):
 1) install Hadoop as different users on different machines
 2) Hadoop is installed under different directories.
 
 Anybody had the same problem? How did you solve it?

The start and stop scripts have directory layout assumptions.  The 
alternative is to use hadoop-daemon.sh to start/stop manually each service on 
each node.  To work around different users, you might need to modify the 
members list of the supergroup.  But I don't remember if that is necessary or 
not.
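
For example, started by hand on each node with that node's own layout (a sketch):

$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_HOME/conf start tasktracker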

It should be noted that using different users and different directory 
layouts is not recommended and you will likely end up being very very 
frustrated.

Re: Problem running the jar files from web page

2010-11-07 Thread Allen Wittenauer

On Nov 7, 2010, at 10:48 AM, sudharshan wrote:

 
 Hello I am a newbie to hadoop environment. I got stuck with a problem on
 running jar file in hadoop
 I have a mapreduce application for which I have created a web interface to
 run. The application runs perfectly in command line terminal. But when same
 command is issued from web interface through perl script it doesnt work and
 exits with 65280 error message.

Is this the return code from system() in perl?  Are you shifting the bits 
appropriately?  system() returns the raw wait status, so 65280 >> 8 means the 
command itself exited with status 255.

[See http://perldoc.perl.org/functions/system.html for more details.]


