Re: Newbie quick questions :-)
vikas wrote:
> Hi, I'm new to Hadoop, and I'm aiming to develop a good amount of code with it. I have some quick questions; it would be much appreciated if someone could answer them. I was able to run Hadoop in a Cygwin environment and to run the examples both in standalone mode and on a 2-node cluster.

> 1) How can I overcome the difficulty of giving a password for SSH logins whenever the DataNodes are started?

Create SSH keys (either user- or host-based) and pair the hosts with these keys. This is the first URL I found for SSH public key authentication: http://sial.org/howto/openssh/publickey-auth/

> 2) I've put a 1.5 GB file on my master node, where a DataNode is also running. I want to see how load balancing can be done so that disk space is used on the other DataNodes as well.

I'm not sure how to answer this one. HDFS has knowledge of three entities: the node, the rack, and the rest. In the default configuration, each block is replicated 3 times, once per entity. If you don't have racks, you might want to fine-tune the replication of files through the HDFS shell.

> 3) How can I add a new DataNode without stopping Hadoop?

Just add it to the slaves file and run start-dfs.sh. Already running nodes won't be touched.

> 4) Suppose I want to shut down one DataNode for maintenance. Is there any way to inform Hadoop that this particular DataNode is going down, so that the data on it gets replicated elsewhere?

A replication factor of at least 2 should do the job; in the general case the default replication is 3, so a single node going down still leaves live replicas. You can check and change the replication factor of a file through the HDFS shell (a small sketch follows at the end of this message).

> 5) I was going through some videos on MapReduce and a few Yahoo tech talks, and in them they were saying that a Hadoop cluster has multiple cores -- what does this mean?

Are you talking about multi-core processors?

> 5.1) Can I have multiple instances of the NameNode running in a cluster, apart from the secondary node?

Not sure about this, but as far as I know there should be only one NameNode running.

> 6) If I keep creating huge files, will they be balanced among all the DataNodes, or do I need to change the file creation location in the application?

Files are divided into blocks, and the blocks are replicated. Huge files are simply composed of a larger set of blocks. In principle, you don't know where your blocks will end up, apart from the entities I mentioned before. And in principle, you shouldn't care where they end up, because Hadoop applications take care of sending tasks to where the data resides.

Ciao, Luca
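Regarding the HDFS shell remark in answers 2 and 4 above: here is a minimal sketch of adjusting a file's replication factor through the Java API instead. It assumes a 0.16-era classpath; the path and target factor are made up for illustration. The shell equivalent should be something like: bin/hadoop dfs -setrep 2 /user/vikas/bigfile

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // reads hadoop-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);     // the configured default filesystem (HDFS)
            // Ask the namenode to keep 2 copies of each block of this file;
            // re-replication happens asynchronously in the background.
            boolean scheduled = fs.setReplication(new Path("/user/vikas/bigfile"), (short) 2);
            System.out.println("replication change scheduled: " + scheduled);
        }
    }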
Re: Lease expired on open file
dhruba Borthakur wrote:
> The DFSClient has a thread that renews leases periodically for all files that are being written to. I suspect that this thread is not getting a chance to run because the gunzip program is eating all the CPU. You might want to put in a sleep() every few seconds while unzipping. Thanks, dhruba

Thanks Dhruba, with your suggestion and a small sleep() every block (more or less), it worked perfectly. Good hint!

Ciao, Luca

-----Original Message-----
From: Luca Telloli [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 16, 2008 9:43 AM
To: core-user@hadoop.apache.org
Subject: Lease expired on open file

Hello everyone, I wrote a small application that gunzips files directly from a local filesystem to an HDFS installation, writing to an FSDataOutputStream. Nevertheless, while expanding a very big file, I got this exception:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.LeaseExpiredException: No lease on /user/luca/testfile File is not open for writing. [Lease. Holder: 44 46 53 43 6c 69 65 6e 74 5f 2d 31 39 31 34 34 39 36 31 34 30, heldlocks: 0, pendingcreates: 1]

I wonder what the cause of this exception might be, whether there's a way to know the default lease duration for a file, and whether it can be prolonged.

Ciao, Luca
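For anyone hitting the same exception: a minimal sketch of the kind of copy loop Dhruba's suggestion leads to, pausing roughly once per HDFS block so the DFSClient's lease-renewal thread gets CPU time. The buffer size, block size, and sleep interval are arbitrary choices, not the values from Luca's actual program.

    import java.io.FileInputStream;
    import java.util.zip.GZIPInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GunzipToHdfs {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            GZIPInputStream in = new GZIPInputStream(new FileInputStream(args[0]));
            FSDataOutputStream out = fs.create(new Path(args[1]));
            byte[] buf = new byte[64 * 1024];
            long sinceSleep = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                sinceSleep += n;
                if (sinceSleep >= 64L * 1024 * 1024) { // roughly one default-sized block
                    Thread.sleep(100); // yield so the lease-renewal thread can run
                    sinceSleep = 0;
                }
            }
            out.close();
            in.close();
        }
    }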
Lease expired on open file
Hello everyone, I wrote a small application that gunzips files directly from a local filesystem to an HDFS installation, writing to an FSDataOutputStream. Nevertheless, while expanding a very big file, I got this exception:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.LeaseExpiredException: No lease on /user/luca/testfile File is not open for writing. [Lease. Holder: 44 46 53 43 6c 69 65 6e 74 5f 2d 31 39 31 34 34 39 36 31 34 30, heldlocks: 0, pendingcreates: 1]

I wonder what the cause of this exception might be, whether there's a way to know the default lease duration for a file, and whether it can be prolonged.

Ciao, Luca
Re: Using NFS without HDFS
slitz wrote:
> I've read in the archive that it should be possible to use any distributed filesystem since the data is available to all nodes, so it should be possible to use NFS, right? I've also read somewhere in the archive that this should be possible...

As far as I know, you can refer to any file on a mounted filesystem (visible from all compute nodes) using the prefix file:// before the full path, unless another prefix has been specified (a small sketch follows at the end of this message).

Cheers, Luca

On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishnoi [EMAIL PROTECTED] wrote:
> Hello, to execute a Hadoop Map-Reduce job, the input data should be on HDFS, not on NFS.
> Thanks
> ---
> Peeyush

On Fri, 2008-04-11 at 12:40 +0100, slitz wrote:
> Hello, I'm trying to assemble a simple setup of 3 nodes using NFS as the distributed filesystem.
>
> Box A: 192.168.2.3, this box is both the NFS server and a slave node.
> Box B: 192.168.2.30, this box is only the JobTracker.
> Box C: 192.168.2.31, this box is only a slave.
>
> Obviously all three nodes can access the NFS share, and the path to the share is /home/slitz/warehouse on all three. My hadoop-site.xml file was copied over to all nodes and looks like this:
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>local</value>
>     <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>192.168.2.30:9001</value>
>     <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
>   </property>
>   <property>
>     <name>mapred.system.dir</name>
>     <value>/home/slitz/warehouse/hadoop_service/system</value>
>     <description>omgrotfcopterlol.</description>
>   </property>
> </configuration>
>
> As one can see, I'm not using HDFS at all. (Because all the free space I have is located on only one node, using HDFS would be unnecessary overhead.) I've copied the input folder from hadoop to /home/slitz/warehouse/input. When I try to run the example line
>
> bin/hadoop jar hadoop-*-examples.jar grep /home/slitz/warehouse/input/ /home/slitz/warehouse/output 'dfs[a-z.]+'
>
> the job starts and finishes okay, but at the end I get this error:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist : /home/slitz/hadoop-0.15.3/grep-temp-141595661
>   at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
> (...the error stack continues...)
>
> I don't know why the input path being looked up is under the local path /home/slitz/hadoop(...) instead of /home/slitz/warehouse/(...). Maybe something is missing in my hadoop-site.xml?
>
> slitz
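To check how a file:// path resolves before submitting a job, here is a tiny sketch using the Java API. The path is slitz's example one, and the class name is made up; Path.getFileSystem picks the filesystem implementation from the URI scheme.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckLocalPath {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // file:// bypasses HDFS and resolves on the local (here, NFS-mounted)
            // filesystem of whichever node executes this code.
            Path input = new Path("file:///home/slitz/warehouse/input");
            FileSystem fs = input.getFileSystem(conf);
            System.out.println(input + " exists: " + fs.exists(input));
        }
    }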
[HOD] Collecting MapReduce logs
Hello everyone, I wonder what the meaning of hodring.log-destination-uri is versus hodring.log-dir. I'd like to collect the MapReduce UI logs after a job has run, and the only relevant attribute seems to be hod.hadoop-ui-log-dir, in the hod section. With that attribute specified, the logs are all grabbed into that directory, producing a large number of HTML files. Is there a way to collect them, maybe as a .tar.gz, in a place somehow related to the user?

Additionally, how do administrators specify variables in these values, and which interpreter interprets them? For instance, variables specified in a bash fashion, like $USER, work well in the hodring or ringmaster sections (I guess they are interpreted by bash itself), but not in the hod section: I tried with

[hod]
hadoop-ui-log-dir = /somedir/$USER

but any hod command then fails, displaying an error on that line.

Cheers, Luca
[HOD] Example script
Hello everyone, in http://hadoop.apache.org/core/docs/r0.16.0/hod.html there is no mention of the scripting abilities of HOD, but with version 0.3 I used to do something like:

hod -m N -a run jar foo.jar jar-parameters

Now I see that in 0.4.0 the -a option refers to log collection and the -z option refers to a script. I wrote a two-line bash script with this code in it:

#!/bin/bash
hadoop --config /home/luca/hod-test jar /mnt/scratch/hadoop/hadoop-0.16.0-examples.jar wordcount file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out

which is launched right after the allocation, but it gets stuck retrying:

08/03/05 12:43:14 INFO ipc.Client: Retrying connect to server: svr3.com/10.78.36.204:56895. Already tried 10 time(s).
java.net.ConnectException: Connection refused

The script was launched with the command:

hod -m 3 -z test-script.sh

Notably, if I run manually from the shell:

$ hod -o allocate . 4
$ hadoop --config . jar /mnt/scratch/hadoop/hadoop-0.16.0-examples.jar wordcount file:///mnt/scratch/hadoop/test/part-0 test.hodscript.out
$ hod -o deallocate .

everything works well, so I'm pretty sure I'm making a mistake with the script, but I couldn't find any example to work from. I'm also not sure what changes with regard to ports between the two invocations.

Cheers, Luca
[HOD] Port for MapReduce UI interface
Hello everyone, although in my configuration I specified:

[hod]
xrs-port-range = 1-11000
http-port-range = 1-11000

the MapReduce UI port is chosen outside this range. This will be a problem with my current firewall settings. Any hint on how to solve this? Did I forget some additional parameter?

Cheers, Luca
[HOD] hdfs:///mapredsystem directory
Hello everyone, I have a problem with the directory hdfs:///mapredsystem, and I'm not sure if it's a bug or my fault. Not sure if this influences what follows, but I have two users: hadoop, who has sudo privileges on all the nodes, and luca, who has normal privileges.

I see that each submitted job creates an hdfs:///mapredsystem directory, created (I guess) by the hodring process. The problem is that it's not cleaned up at the end of the process; for instance, a use case would be:

- user hadoop allocates a cluster; the ringmaster is svr3, so a /mapredsystem/svr3 directory is created
- user hadoop deallocates the cluster, but that directory is not cleaned up
- user luca allocates a cluster, and the first node chosen as ringmaster is svr3, so hodring tries to write hdfs:///mapredsystem but fails
- the allocation succeeds, but there's no hodring running; looking at 0-jobtracker/logdir/hadoop.log under the temporary directory, I can read:

2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=luca, access=WRITE, inode=mapredsystem:hadoop:supergroup:rwxr-xr-x

Am I doing anything wrong?

Cheers, Luca
Re: [HOD] hdfs:///mapredsystem directory
Hi Mahadev, I'm not sure the workaround solves the whole problem, because it appears that a subdirectory is created under that directory with the name of the hodring host. So if the next allocation made by a different user chooses the same host, the permission problem might show up again -- unless, obviously, each user who allocated resources deletes that subdirectory during deallocation, which would be a solution for this problem. (A sketch of Mahadev's manual workaround follows at the end of this message.) I filed a bug: HADOOP-2899.

Cheers, Luca

Mahadev Konar wrote:
> Hi Luca, this seems like a bug. The JobTracker process tries to create this directory if it does not exist. If you have two different users running hod clusters, they will both try to create this directory, and since only one succeeds, with permissions enabled the directory ends up owned by the user who created the cluster first. A workaround is to create hdfs:///mapredsystem manually and make it world-writable. Please open a bug regarding this issue.
> Regards, Mahadev

-----Original Message-----
From: Luca Telloli [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 26, 2008 9:06 AM
To: core-user@hadoop.apache.org
Subject: [HOD] hdfs:///mapredsystem directory

Hello everyone, I have a problem with the directory hdfs:///mapredsystem, and I'm not sure if it's a bug or my fault. Not sure if this influences what follows, but I have two users: hadoop, who has sudo privileges on all the nodes, and luca, who has normal privileges.

I see that each submitted job creates an hdfs:///mapredsystem directory, created (I guess) by the hodring process. The problem is that it's not cleaned up at the end of the process; for instance, a use case would be:

- user hadoop allocates a cluster; the ringmaster is svr3, so a /mapredsystem/svr3 directory is created
- user hadoop deallocates the cluster, but that directory is not cleaned up
- user luca allocates a cluster, and the first node chosen as ringmaster is svr3, so hodring tries to write hdfs:///mapredsystem but fails
- the allocation succeeds, but there's no hodring running; looking at 0-jobtracker/logdir/hadoop.log under the temporary directory, I can read:

2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=luca, access=WRITE, inode=mapredsystem:hadoop:supergroup:rwxr-xr-x

Am I doing anything wrong?

Cheers, Luca
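For reference, a minimal sketch of Mahadev's manual workaround through the Java API, to be run once as the HDFS superuser. The class name is made up; the shell equivalent should be bin/hadoop dfs -mkdir /mapredsystem followed by bin/hadoop dfs -chmod 777 /mapredsystem.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class PrepareMapredSystem {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path dir = new Path("/mapredsystem");
            fs.mkdirs(dir);
            // World-writable, so any user's JobTracker can create its
            // per-host subdirectory under it.
            fs.setPermission(dir, new FsPermission((short) 0777));
        }
    }

Note that, per Luca's own caveat above, this helps with the top-level directory but not with a stale per-host subdirectory left behind by another user.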
Re: Problems running a HOD test cluster
Allen Wittenauer wrote:
>> [2008-02-21 19:46:11,014] ERROR/40 torque:96 - qstat error: exit code: 153 | signal: False | core False
>> [2008-02-21 19:46:11,017] INFO/20 hadoop:451 - Ringmaster at : None.
>
> I bet your ringmaster didn't come up. Check which nodes were allocated to your job via qstat -f. Chances are good the first one is the ringmaster node. Check the torque logs, syslogs, and the hod log dir for hints as to what happened.

So it was a ringmaster-related problem, and it's now solved. Now another problem: when I try to run a Hadoop job, I get a timeout error from Hadoop while it tries to reach the input path. I guess this might be related to the fact that I'm using an external HDFS that is already running, and I'm not sure how to hook HOD up with it. I configured

[gridservice-hdfs]
external = True
host = name-of-my-server.com
pkgs = /mnt/scratch/grid/hadoop/current
fs_port = 10010
info_port = 10007

but I'm not sure whether the actual fs_port is 10010. Does anybody know which attribute in the Hadoop configuration specifies this value? If it's not specified, is there a default/random value? Finally, is this a global attribute or a per-node one (like node A having some fs_port and node B having a different one)?

Thanks in advance, Luca
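If it helps: as far as I understand, the port fs_port must match is the port component of fs.default.name in the external cluster's hadoop-site.xml, and it is a single cluster-wide value (the namenode address), not a per-node one. A tiny sketch to print what a given configuration directory resolves to (class name made up; run it with that conf directory on the classpath):

    import org.apache.hadoop.conf.Configuration;

    public class ShowDefaultFs {
        public static void main(String[] args) {
            // Loads hadoop-default.xml and hadoop-site.xml from the classpath.
            Configuration conf = new Configuration();
            // Prints something like "name-of-my-server.com:10010"; the port
            // here is what [gridservice-hdfs] fs_port should be set to.
            System.out.println(conf.get("fs.default.name"));
        }
    }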
Problems running a HOD test cluster
.server.com
[2008-02-21 19:45:58,856] INFO/20 hadoop:447 - Hod Job successfully submitted. JobId : 207.server.com.
[2008-02-21 19:46:08,967] DEBUG/10 torque:87 - /usr/bin/qstat -f -1 207.server.com
[2008-02-21 19:46:11,014] ERROR/40 torque:96 - qstat error: exit code: 153 | signal: False | core False
[2008-02-21 19:46:11,017] INFO/20 hadoop:451 - Ringmaster at : None.
[2008-02-21 19:46:11,021] INFO/20 hadoop:530 - Cleaning up job id 207.server.com, as cluster could not be allocated.
[2008-02-21 19:46:11,025] DEBUG/10 torque:131 - /usr/bin/qdel 207.server.com
[2008-02-21 19:46:13,079] CRITICAL/50 hod:253 - Cannot allocate cluster /mnt/scratch/grid/test
[2008-02-21 19:46:13,940] DEBUG/10 hod:391 - return code: 6

$ cat hod/conf/hodrc
[hod]
stream = True
java-home = /usr/java/jdk1.6.0_04
cluster = HOD
cluster-factor = 1.8
xrs-port-range = 1-11000
debug = 4
allocate-wait-time = 3600
temp-dir = /tmp/hod
log-dir = /mnt/scratch/grid/hod/logs

[ringmaster]
register = True
stream = False
temp-dir = /tmp/hod
http-port-range = 1-11000
work-dirs = /tmp/hod/1,/tmp/hod/2
xrs-port-range = 1-11000
debug = 4

[hodring]
stream = False
temp-dir = /tmp/hod
register = True
java-home = /usr/java/jdk1.6.0_04
http-port-range = 1-11000
xrs-port-range = 1-11000
debug = 4

[resource_manager]
queue = hadoop
batch-home = /usr
id = torque
env-vars = HOD_PYTHON_HOME=/usr/bin/python2.5

[gridservice-mapred]
external = False
pkgs = /mnt/scratch/grid/hadoop/current
tracker_port = 10003
info_port = 10008

[gridservice-hdfs]
external = True
pkgs = /mnt/scratch/grid/hadoop/current
fs_port = 10007
info_port = 10009

Cheers, Luca