Re: how to run jobs every 30 minutes?
Or, if you want to do it in a reliable way, you could use an Oozie coordinator job.

On Wed, Dec 8, 2010 at 1:53 PM, edward choi mp2...@gmail.com wrote:
My mistake. Come to think of it, you are right, I can just make an infinite loop inside the Hadoop application. Thanks for the reply.

2010/12/7 Harsh J qwertyman...@gmail.com:
Hi,
On Tue, Dec 7, 2010 at 2:25 PM, edward choi mp2...@gmail.com wrote:
> Hi, I'm planning to crawl a certain web site every 30 minutes. How would I get it done in Hadoop? In pure Java, I used the Thread.sleep() method, but I guess this won't work in Hadoop.
Why wouldn't it? You need to manage your post-job logic mostly, but sleep and resubmission should work just fine.
> Or if it could work, could anyone show me an example? Ed.
--
Harsh J
www.harshj.com
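For the sleep-and-resubmit approach Harsh describes, a driver might look roughly like the sketch below. This is only an illustration: the identity mapper stands in for whatever crawl logic the real job runs, and the class name and input/output arguments are made up for the example, not taken from the thread.

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PeriodicCrawlDriver {
  public static void main(String[] args) throws Exception {
    while (true) {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "crawl-" + System.currentTimeMillis());
      job.setJarByClass(PeriodicCrawlDriver.class);
      job.setMapperClass(Mapper.class);          // identity mapper; the real crawl mapper goes here
      job.setNumReduceTasks(0);                  // map-only job
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      // write every run into its own output directory so reruns do not collide
      FileOutputFormat.setOutputPath(job, new Path(args[1], Long.toString(System.currentTimeMillis())));
      if (!job.waitForCompletion(true)) {
        System.err.println("Crawl job failed; will try again on the next cycle");
      }
      TimeUnit.MINUTES.sleep(30);                // the plain sleep-and-resubmit loop from the thread
    }
  }
}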
Re: Reduce Error
Any chance mapred.local.dir is under /tmp and part of it got cleaned up?

On Wed, Dec 8, 2010 at 4:17 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:
Dear all, Did anyone encounter the error below while running a job in Hadoop? It occurs in the reduce phase of the job.

attempt_201012061426_0001_m_000292_0: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000292_0/output/file.out

It states that it is not able to locate a file that is created under mapred.local.dir. Thanks in advance for any sort of information regarding this. Best Regards, Adarsh Sharma
Re: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?
Yes. Reference: I couldn't find an Apache Hadoop page describing this, but see the link below: http://serverfault.com/questions/115148/hadoop-slaves-file-necessary

On 12/7/10 11:59 PM, common-user-digest-h...@hadoop.apache.org wrote:
From: li ping li.j...@gmail.com
Date: Wed, 8 Dec 2010 14:17:40 +0800
To: common-user@hadoop.apache.org
Subject: Re: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?
I am not sure I have fully understood your post. You mean conf/slaves is only used by the stop/start scripts to start or stop the datanode/tasktracker? And conf/masters only contains information about the secondary namenode? Thanks

On Wed, Dec 8, 2010 at 1:44 PM, Sudhir Vallamkondu sudhir.vallamko...@icrossing.com wrote:
There is a proper decommissioning process to remove dead nodes. See the FAQ link here: http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
For a fact, $HADOOP_HOME/conf/slaves is not used by the namenode to keep track of datanodes/tasktrackers. It is merely used by the stop/start hadoop scripts to know on which nodes to start the datanode/tasktracker services. Similarly, there is confusion around the $HADOOP_HOME/conf/masters file. That file contains the details of the machine where the secondary namenode is running, not the namenode/jobtracker.
With regards to not all java/hadoop processes getting killed, this may be happening due to hadoop losing track of pid files. By default the pid files are configured to be created in the /tmp directory. If these pid files get deleted, then the stop/start scripts cannot detect running hadoop processes. I suggest changing the location of the pid files to a persistent location like /var/hadoop/. The $HADOOP_HOME/conf/hadoop-env.sh file has details on configuring the PID location.
- Sudhir

On 12/7/10 5:07 PM, common-user-digest-h...@hadoop.apache.org wrote:
From: Tali K ncherr...@hotmail.com
Date: Tue, 7 Dec 2010 10:40:16 -0800
To: core-u...@hadoop.apache.org
Subject: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?
1) When I stopped hadoop, we checked all the nodes and found that 2 or 3 java/hadoop processes were still running on each node. So we went to each node and did a 'killall java' - in some cases I had to do 'killall -9 java'. My question: why is this happening, and what are the recommendations for making sure that no hadoop processes are left running after I stop hadoop with stop-all.sh?
2) Also, we have a dead node. We removed this node from $HADOOP_HOME/conf/slaves. This file is supposed to tell the namenode which machines are supposed to be datanodes/tasktrackers. We started hadoop again, and were surprised to see the dead node in the hadoop report ($HADOOP_HOME/bin/hadoop dfsadmin -report | less). Only after blocking the dead node and restarting hadoop did it no longer show up in the report. Any recommendations on how to deal with dead nodes?
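As a concrete illustration of the decommissioning flow and pid-file change Sudhir describes, the steps might look roughly like the sketch below (the hostname, exclude-file path, and pid directory are placeholders, not values from the thread):

# 1. Point the namenode at an exclude file, e.g. in conf/hdfs-site.xml:
#      <property>
#        <name>dfs.hosts.exclude</name>
#        <value>/etc/hadoop/conf/excludes</value>
#      </property>
# 2. Add the dead node's hostname to that file and tell the namenode to re-read it:
echo "dead-node.example.com" >> /etc/hadoop/conf/excludes
$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
# 3. Keep pid files out of /tmp so stop-all.sh can still find the daemons
#    (set in $HADOOP_HOME/conf/hadoop-env.sh):
export HADOOP_PID_DIR=/var/hadoop/pids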
Re: Configure Secondary Namenode
Date: Wed, 18 Aug 2010 13:08:03 +0530
From: adarsh.sha...@orkash.com
To: core-u...@hadoop.apache.org
Subject: Configure Secondary Namenode
I am not able to find any command or parameter in core-default.xml to configure a secondary namenode on a separate machine. I have a 4-node cluster with the jobtracker, master, and secondary namenode on one machine, and the remaining 3 are slaves. Can anyone please tell me? Thanks in Advance

I ran into a somewhat similar problem and found that there is no simple way to do this. You will have to modify the bin/start-dfs.sh and stop-dfs.sh scripts; the following blog post has all the details: http://hadoop-blog.blogspot.com/2010/12/secondarynamenode-process-is-starting.html Please note the blog assumes that you are working with CDH2 (Cloudera's distribution).
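On stock Apache Hadoop, a rough sketch of the usual approach is below; the hostnames are placeholders, and the blog post above covers the CDH-specific script changes:

# On the machine where you run start-dfs.sh, list the secondary namenode host
# in $HADOOP_HOME/conf/masters:
snn-host.example.com

# On snn-host, make sure the checkpoint process can reach the namenode:
#   core-site.xml:
#     <property>
#       <name>fs.default.name</name>
#       <value>hdfs://nn-host.example.com:54310</value>
#     </property>
#   hdfs-site.xml:
#     <property>
#       <name>dfs.http.address</name>
#       <value>nn-host.example.com:50070</value>
#     </property>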
Making input in Map iterable
Hello, I have data processing logic implemented so that on input it receives an Iterable<Some>, i.e. pretty much the same as the reducer's API. But I need to use this code in Map, where each element arrives as a map() method invocation. To solve the problem (at least for now), I'm doing the following:
* run the processing code in a thread which I start in setup() and wait for completion in cleanup()
* keep a buffer which I fill with map input items (and feed the Iterable object from this buffer until it has something)
* write to the buffer until it is full and only then switch to the thread which does the processing. (Assumption: the processing logic always reads data from the buffer till the end; if processing fails, then the whole job is marked as failed.)
I don't see that it should cause any noticeable performance degradation: switches between threads are quite rare. Also it looks like the approach is safe. Could anyone please confirm that? Or in case there's a better solution, please let me know. Btw, a rough cut of the implementation can be found here (small class): https://github.com/sematext/HBaseHUT/blob/master/src/main/java/com/sematext/hbase/hut/UpdatesProcessingMrJob.java. It is in a working state (unit tests pass, at least). Thank you in advance! Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
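A minimal sketch of that setup()/cleanup() plus bounded-buffer arrangement is below. It is illustrative only: the class name, queue capacity, and poison-pill end-of-input marker are made up for the example and are not taken from the linked HBaseHUT class.

import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IterableInputMapper extends Mapper<LongWritable, Text, Text, Text> {
  private static final Text EOF = new Text();           // poison pill marking end of input
  private BlockingQueue<Text> buffer;
  private Thread worker;

  @Override
  protected void setup(Context context) {
    buffer = new ArrayBlockingQueue<Text>(1024);        // bounded: map() blocks when it is full
    worker = new Thread(new Runnable() {
      public void run() {
        processAll(new QueueIterator());                // Iterable/Iterator-style consumer
      }
    });
    worker.start();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context) throws InterruptedException {
    buffer.put(new Text(value));                        // hand a copy of the record to the worker
  }

  @Override
  protected void cleanup(Context context) throws InterruptedException {
    buffer.put(EOF);                                    // signal end of input
    worker.join();                                      // wait for the processing thread to drain the buffer
  }

  private void processAll(Iterator<Text> records) {
    while (records.hasNext()) {
      records.next();
      // ... processing logic that expects an iterator/iterable goes here ...
    }
  }

  // Iterator view over the buffer, consumed only by the worker thread.
  private class QueueIterator implements Iterator<Text> {
    private Text next;
    public boolean hasNext() {
      if (next == null) {
        try {
          next = buffer.take();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return next != EOF;
    }
    public Text next() { Text current = next; next = null; return current; }
    public void remove() { throw new UnsupportedOperationException(); }
  }
}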
urgent, error: java.io.IOException: Cannot create directory
Hi Guys: I am just installing hadoop 0.21.0 on a single-node cluster. I encounter the following error when I run bin/hadoop namenode -format:

10/12/08 16:27:22 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:312)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1425)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1444)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1242)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)

Below is my core-site.xml configuration:

<configuration>
<!-- In: conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

Below is my hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- In: conf/hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
</configuration>

Below is my mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- In: conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
</configuration>

Thanks. Richard
Re: urgent, error: java.io.IOException: Cannot create directory
Hi Richard - First thing that comes to mind is a permissions issue. Can you verify that your directories along the desired namenode path are writable by the appropriate user(s)? HTH, -James

On Wed, Dec 8, 2010 at 1:37 PM, Richard Zhang richardtec...@gmail.com wrote: [...]
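One way to run that check (a hypothetical sketch; the path is the placeholder from the original post, and "hadoop" is the dedicated user mentioned later in the thread):

# Every component of the namenode directory must be writable by the user running the format:
ls -ld /your/path/to/hadoop/tmp/dir /your/path/to/hadoop/tmp/dir/hadoop-hadoop
# If the ownership is wrong, hand the tree to the hadoop user and retry the format:
sudo chown -R hadoop:hadoop /your/path/to/hadoop/tmp/dir
sudo -u hadoop bin/hadoop namenode -format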
Re: urgent, error: java.io.IOException: Cannot create directory
Hi James: I verified that I have the following permissions set for the path:
ls -l tmp/dir/hadoop-hadoop/dfs/hadoop
total 4
drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 15:56 current
Thanks. Richard

On Wed, Dec 8, 2010 at 4:50 PM, james warren ja...@rockyou.com wrote: [...]
Re: urgent, error: java.io.IOException: Cannot create directory
Would that be because port 54310 is not open? I just used iptables -A INPUT -p tcp --dport 54310 -j ACCEPT to open the port, but the same error persists. Richard

On Wed, Dec 8, 2010 at 4:56 PM, Richard Zhang richardtec...@gmail.com wrote: [...]
Re: urgent, error: java.io.IOException: Cannot create directory
It seems that you are looking at 2 different directories:
first post: /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
second: ls -l tmp/dir/hadoop-hadoop/dfs/hadoop
--
Take care, Konstantin (Cos) Boudnik

On Wed, Dec 8, 2010 at 14:19, Richard Zhang richardtec...@gmail.com wrote: [...]
Re: urgent, error: java.io.IOException: Cannot create directory
Hi: /your/path/to/hadoop represents the location where hadoop is installed. BTW, I believe this is a file-writing permission problem: with the same *-site.xml settings the install works as root, but when I use the dedicated user hadoop it always hits this problem, even though I created the directory path manually and granted it 755. Weird. Richard.

On Wed, Dec 8, 2010 at 6:51 PM, Konstantin Boudnik c...@apache.org wrote: [...]
Re: urgent, error: java.io.IOException: Cannot create directory
Yeah, I figured that much. What I was referring to is the ending of the paths:
.../hadoop-hadoop/dfs/name/current
.../hadoop-hadoop/dfs/hadoop
They are different.
--
Take care, Konstantin (Cos) Boudnik

On Wed, Dec 8, 2010 at 15:55, Richard Zhang richardtec...@gmail.com wrote: [...]
Re: urgent, error: java.io.IOException: Cannot create directory
Oh, sorry. I corrected that typo:
hadoop$ ls tmp/dir/hadoop-hadoop/dfs/name/current -l
total 0
hadoop$ ls tmp/dir/hadoop-hadoop/dfs/name -l
total 4
drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 22:17 current
Even after I remove the tmp directory I created manually and set the whole Hadoop package to 777, running hadoop again gives the same result. Richard.

On Wed, Dec 8, 2010 at 7:55 PM, Konstantin Boudnik c...@apache.org wrote: [...]
Hadoop Certification Progamme
Hi all, Is there any valid Hadoop certification available? Something which adds credibility to your Hadoop expertise. Matthew
Re: Hadoop Certification Progamme
Matthew, Cloudera has rolled out a certification program for developers and admins. Take a look at their website. Cheers, Esteban. On Dec 8, 2010 9:41 PM, Matthew John tmatthewjohn1...@gmail.com wrote: [...]
Re: Reduce Error
Ted Yu wrote: Any chance mapred.local.dir is under /tmp and part of it got cleaned up? [...]

Hi Ted, My mapred.local.dir is in the /home/hadoop directory. I also tried it in the /hdd2-2 directory, where we have lots of space. Would mapred.map.tasks affect this? I checked with the default and also with 80 maps and 16 reduces, as I have 8 slaves.

<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/mapred/local</value>
  <description>The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.</description>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/mapred/system</value>
  <description>The shared directory where MapReduce stores control files.</description>
</property>

Any further information you want? Thanks & Regards, Adarsh Sharma
Re: Reduce Error
Go through the jobtracker, find the relevant node that handled attempt_201012061426_0001_m_000292_0, and figure out whether there are FS or permission problems. Raj

From: Adarsh Sharma adarsh.sha...@orkash.com
To: common-user@hadoop.apache.org
Sent: Wed, December 8, 2010 7:48:47 PM
Subject: Re: Reduce Error
[...]
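A rough sketch of those checks, run on the tasktracker node that owned the failing attempt (the paths follow the mapred.local.dir value Adarsh posted; adjust for your layout):

# Is there enough space on the device holding mapred.local.dir for the spill files?
df -h /home/hadoop
# Does the directory exist, and is it writable by the user running the tasktracker?
ls -ld /home/hadoop/mapred/local
# Is the same value configured on every node?
grep -A 1 mapred.local.dir $HADOOP_HOME/conf/mapred-site.xml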
Re: Reduce Error
From Raj earlier: I have seen this error from time to time, and it has been due either to space, missing directories, or disk errors. The space issue was caused by the fact that I had mounted /de/sdc on /hadoop-dsk and the mount had failed. And in another case I had accidentally deleted hadoop.tmp.dir on a node, and whenever the reduce job was scheduled on that node the attempt would fail.

On Wed, Dec 8, 2010 at 8:21 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote: [...]
Sir, I read the tasktracker logs several times but am not able to find any reason, as they are not very useful. I attached the tasktracker log with the mail; the main portion is listed below.

2010-12-06 15:27:04,228 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201012061426_0001_m_00_1' to tip task_201012061426_0001_m_00, for tracker 'tracker_ws37-user-lin: 127.0.0.1/127.0.0.1:60583'
2010-12-06 15:27:04,228 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_201012061426_0001_m_00
2010-12-06 15:27:04,229 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201012061426_0001_m_00_0' from 'tracker_ws37-user-lin:127.0.0.1/127.0.0.1:60583'
2010-12-06 15:27:07,235 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201012061426_0001_m_000328_0: java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:860)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:30)
at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:19)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000328_0/output/spill16.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
2010-12-06 15:27:07,236 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201012061426_0001_m_00_1: Error initializing attempt_201012061426_0001_m_00_1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid
Running not as hadoop user
Hi, hadoop user has some advantages for running Hadoop. For example, if HDFS is mounted as a local file system, then only user hadoop has write/delete permissions. Can this privilege be given to another user? In other words, is this hadoop user hard-coded, or can another be used in its stead? Thank you, Mark
Re: Running not as hadoop user
The user who started the NN has superuser privileges on HDFS. You can also configure a supergroup by setting dfs.permissions.supergroup (default supergroup) -Todd On Wed, Dec 8, 2010 at 9:34 PM, Mark Kerzner markkerz...@gmail.com wrote: Hi, hadoop user has some advantages for running Hadoop. For example, if HDFS is mounted as a local file system, then only user hadoop has write/delete permissions. Can this privilege be given to another user? In other words, is this hadoop user hard-coded, or can another be used in its stead? Thank you, Mark -- Todd Lipcon Software Engineer, Cloudera
Re: Running not as hadoop user
Todd Lipcon wrote: The user who started the NN has superuser privileges on HDFS. You can also configure a supergroup by setting dfs.permissions.supergroup (default supergroup) -Todd [...]

You may also set dfs.permissions = false, and grant separate groups access to HDFS through properties in hdfs-site.xml. --Adarsh
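Putting the two options together, a hypothetical hdfs-site.xml fragment might look like this (the group name is a placeholder, not something from the thread):

<property>
  <name>dfs.permissions.supergroup</name>
  <value>hdfsadmins</value>  <!-- members of this OS group get HDFS superuser rights -->
</property>
<!-- or, to turn off HDFS permission checking entirely: -->
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>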
Re: Hadoop Certification Progamme
Hey Matthew, In particular, see http://www.cloudera.com/hadoop-training/ for details on Cloudera's training and certifications. Regards, Jeff On Wed, Dec 8, 2010 at 7:44 PM, Esteban Gutierrez Moguel esteban...@gmail.com wrote: Matthew, Cloudera has rolled a certification program for developers and admins. Take a look into their website. Cheers, Esteban. On Dec 8, 2010 9:41 PM, Matthew John tmatthewjohn1...@gmail.com wrote: Hi all,. Is there any valid Hadoop Certification available ? Something which adds credibility to your Hadoop expertise. Matthew