Re: how to run jobs every 30 minutes?

2010-12-08 Thread Alejandro Abdelnur
Or, if you want to do it in a reliable way you could use an Oozie
coordinator job.
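
For reference, a coordinator definition for a 30-minute frequency looks roughly
like the sketch below; the workflow path, dates, and schema version are
illustrative rather than taken from the original poster's setup.

<coordinator-app name="crawl-every-30-min" frequency="${coord:minutes(30)}"
                 start="2010-12-09T00:00Z" end="2011-12-09T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <!-- HDFS path of the workflow that wraps the crawl job -->
      <app-path>hdfs://localhost:54310/apps/crawl-wf</app-path>
    </workflow>
  </action>
</coordinator-app>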

On Wed, Dec 8, 2010 at 1:53 PM, edward choi mp2...@gmail.com wrote:
 My mistake. Come to think of it, you are right, I can just make an
 infinite loop inside the Hadoop application.
 Thanks for the reply.

 2010/12/7 Harsh J qwertyman...@gmail.com

 Hi,

 On Tue, Dec 7, 2010 at 2:25 PM, edward choi mp2...@gmail.com wrote:
  Hi,
 
  I'm planning to crawl a certain web site every 30 minutes.
  How would I get it done in Hadoop?
 
  In pure Java, I used the Thread.sleep() method, but I guess this won't work
  in Hadoop.

 Why wouldn't it? You need to manage your post-job logic mostly, but
 sleep and resubmission should work just fine.

  Or if it could work, could anyone show me an example?
 
  Ed.
 



 --
 Harsh J
 www.harshj.com
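
A minimal sketch of the sleep-and-resubmit approach discussed above: a plain
Java driver that submits the crawl job, waits for it, then sleeps 30 minutes
before the next run. The job setup is omitted and the class name is
illustrative; it assumes the 0.20/0.21-era org.apache.hadoop.mapreduce API.

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PeriodicCrawlDriver {
  public static void main(String[] args) throws Exception {
    while (true) {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "crawl");
      // configure mapper, input/output formats and paths for the crawl here
      job.waitForCompletion(true);   // block until this run finishes
      TimeUnit.MINUTES.sleep(30);    // then wait 30 minutes and resubmit
    }
  }
}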




Re: Reduce Error

2010-12-08 Thread Ted Yu
Any chance mapred.local.dir is under /tmp and part of it got cleaned up?

On Wed, Dec 8, 2010 at 4:17 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Dear all,

 Did anyone encounter the below error while running a job in Hadoop? It occurs
 in the reduce phase of the job.

 attempt_201012061426_0001_m_000292_0:
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
 valid local directory for
 taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000292_0/output/file.out

 It states that it is not able to locate a file that is created in
  mapred.local.dir of Hadoop.

 Thanks in Advance for any sort of information regarding this.

 Best Regards

 Adarsh Sharma



Re: Help: 1) Hadoop processes still are running after we stopped hadoop.2) How to exclude a dead node?

2010-12-08 Thread Sudhir Vallamkondu
Yes.

Reference: I couldn't find an Apache Hadoop page describing this, but see the
link below:
http://serverfault.com/questions/115148/hadoop-slaves-file-necessary


On 12/7/10 11:59 PM, common-user-digest-h...@hadoop.apache.org
common-user-digest-h...@hadoop.apache.org wrote:

 From: li ping li.j...@gmail.com
 Date: Wed, 8 Dec 2010 14:17:40 +0800
 To: common-user@hadoop.apache.org
 Subject: Re: Help: 1) Hadoop processes still are running after we stopped 
 hadoop.2) How to exclude a dead node?
 
 I am not sure I have fully understood your post.
 You mean conf/slaves is only used by the stop/start scripts to start or stop
 the datanode/tasktracker?
 And conf/masters only contains the information about the secondary
 namenode?
 
 Thanks
 
 On Wed, Dec 8, 2010 at 1:44 PM, Sudhir Vallamkondu 
 sudhir.vallamko...@icrossing.com wrote:
 
 There is a proper decommissioning process to remove dead nodes. See the FAQ
 link here:
 
 http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_
 taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
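
The decommissioning flow in that FAQ entry boils down to roughly the following;
the excludes-file path here is illustrative.

On the namenode, in hdfs-site.xml:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/your/path/to/hadoop/conf/excludes</value>
</property>

Add the hostname of each node to be removed to that excludes file and run:

bin/hadoop dfsadmin -refreshNodes

Once the node reports as decommissioned it can be shut down and dropped from
conf/slaves.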
 
  In fact, $HADOOP_HOME/conf/slaves is not used by the name node to keep
  track of datanodes/tasktrackers. It is merely used by the stop/start hadoop
  scripts to know on which nodes to start the datanode / tasktracker services.
  Similarly, there is confusion regarding the $HADOOP_HOME/conf/masters file.
  That file contains the details of the machine where the secondary name node
  is running, not the name node/job tracker.
 
  With regards to not all java/hadoop processes getting killed, this may be
  happening due to hadoop losing track of pid files. By default the pid files
  are configured to be created in the /tmp directory. If these pid files get
  deleted, the stop/start scripts cannot detect running hadoop processes. I
  suggest changing the location of the pid files to a persistent location like
  /var/hadoop/. The $HADOOP_HOME/conf/hadoop-env.sh file has details on
  configuring the PID location.
 
 - Sudhir
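
The hadoop-env.sh knob referred to above is HADOOP_PID_DIR; a minimal sketch,
with an illustrative directory that must exist and be writable by the user
running the daemons:

# conf/hadoop-env.sh
export HADOOP_PID_DIR=/var/hadoop/pids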
 
 
 On 12/7/10 5:07 PM, common-user-digest-h...@hadoop.apache.org
 common-user-digest-h...@hadoop.apache.org wrote:
 
 From: Tali K ncherr...@hotmail.com
 Date: Tue, 7 Dec 2010 10:40:16 -0800
 To: core-u...@hadoop.apache.org
 Subject: Help: 1) Hadoop processes still are running after we stopped
 hadoop.2)  How to exclude a dead node?
 
 
  1) When I stopped hadoop, we checked all the nodes and found that 2 or 3
  java/hadoop processes were still running on each node.  So we went to each
  node and did a 'killall java' - in some cases I had to do 'killall -9 java'.
  My question: why is this happening, and what would be the recommendations to
  make sure that there are no hadoop processes running after I stop hadoop
  with stop-all.sh?
  
  2) Also, we have a dead node. We removed this node from
  $HADOOP_HOME/conf/slaves. This file is supposed to tell the namenode
  which machines are supposed to be datanodes/tasktrackers.
  We started hadoop again, and were surprised to see the dead node in the
  hadoop 'report' ($HADOOP_HOME/bin/hadoop dfsadmin -report|less).
  Only after blocking the dead node and restarting hadoop did it no longer
  show up in the report.
  Any recommendations on how to deal with dead nodes?
 






Re: Configure Secondary Namenode

2010-12-08 Thread Aman

 Date: Wed, 18 Aug 2010 13:08:03 +0530
 From: adarsh.sha...@orkash.com
 To: core-u...@hadoop.apache.org
 Subject: Configure Secondary Namenode

 I am not able to find any command or parameter in core-default.xml to
 configure secondary namenode on separate machine.
 I have a 4-node cluster with jobtracker,master,secondary namenode on one
 machine
 and remaining 3 are slaves.
 Can anyone please tell me.

 Thanks in Advance
I ran into a kinda similar problem and found that there is no simple way to do
this. You will have to modify the bin/start-dfs.sh and stop-dfs.sh scripts.
The following blog post has all the details:
http://hadoop-blog.blogspot.com/2010/12/secondarynamenode-process-is-starting.html

Please note the blog assumes that you are working with CDH2 (Cloudera's
distribution).
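
For what it's worth, on a stock Apache 0.20-style setup the usual pieces are:
put the secondary machine's hostname in conf/masters on the node you run
start-dfs.sh from, and on the secondary machine tell it where to pull the image
from. A rough sketch, with an illustrative hostname:

hdfs-site.xml on the secondary namenode machine:

<property>
  <name>dfs.http.address</name>
  <value>namenode.example.com:50070</value>
  <description>Namenode HTTP address the secondary pulls fsimage/edits from.</description>
</property>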





Making input in Map iterable

2010-12-08 Thread Alex Baranau
Hello,

I have data processing logic implemented so that on input it receives an
Iterable<Some>, i.e. pretty much the same as the reducer's API. But I need to
use this code in Map, where each element arrives as a map() method
invocation.
To solve the problem (at least for now), I'm doing the following:
* run processing code in a thread which I start in setup() and wait for
completion for it in cleanup()
* keep a buffer which I fill with map input items (and feed Iterable object
from this buffer until it has something)
* write to buffer until it is full and only then switch to a thread which
does processing.
(assumption: the processing logic always reads data from the buffer till the
end; if processing fails, then the whole job is marked as failed).

I don't see that it should cause any noticeable performance degradation:
switches between threads are quite rare. Also it looks like the approach is
safe. Could anyone please confirm that? Or in case there's a better
solution, please, let me know.
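
In case it helps, a minimal sketch of the pattern described above: a bounded
blocking queue plays the role of the buffer, a sentinel value marks end of
input, and all names are illustrative rather than taken from the HBaseHUT class
linked below.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IterableInputMapper extends Mapper<Object, Text, Text, Text> {
  private static final Text EOF = new Text();  // sentinel marking end of map input
  private final BlockingQueue<Text> buffer = new ArrayBlockingQueue<Text>(1024);
  private Thread worker;

  @Override
  protected void setup(Context context) {
    worker = new Thread(new Runnable() {
      public void run() {
        try {
          Text item;
          while ((item = buffer.take()) != EOF) {
            // hand 'item' to the Iterable-consuming processing logic here
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    worker.start();
  }

  @Override
  protected void map(Object key, Text value, Context context)
      throws InterruptedException {
    buffer.put(new Text(value));  // blocks when the buffer is full
  }

  @Override
  protected void cleanup(Context context) throws InterruptedException {
    buffer.put(EOF);  // signal that no more input will arrive
    worker.join();    // wait for the processing thread to drain the buffer
  }
}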

Btw, you can find the rough cut of the implementation here (small class):
https://github.com/sematext/HBaseHUT/blob/master/src/main/java/com/sematext/hbase/hut/UpdatesProcessingMrJob.java.
It is in working state (at least the unit tests pass).

Thank you in advance!

Alex Baranau

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase


urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Richard Zhang
Hi Guys:
I am installing Hadoop 0.21.0 on a single-node cluster.
I encounter the following error when I run bin/hadoop namenode -format:

10/12/08 16:27:22 ERROR namenode.NameNode:
java.io.IOException: Cannot create directory
/your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
at
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:312)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1425)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1444)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1242)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)


Below is my core-site.xml

<configuration>
<!-- In: conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>


Below is my hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- In: conf/hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

</configuration>


below is my mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<!-- In: conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If local, then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

</configuration>


Thanks.
Richard


Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread james warren
Hi Richard -

First thing that comes to mind is a permissions issue.  Can you verify that
your directories along the desired namenode path are writable by the
appropriate user(s)?

HTH,
-James
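
A quick way to check, assuming the daemons run as the dedicated hadoop user and
using the hadoop.tmp.dir placeholder path from the original message (adjust to
the real install path):

# are the parent directories there and writable by the hadoop user?
ls -ld /your/path/to/hadoop/tmp/dir /your/path/to/hadoop/tmp/dir/hadoop-hadoop

# if not, hand them over to the hadoop user (run as root)
chown -R hadoop:hadoop /your/path/to/hadoop/tmp/dir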

On Wed, Dec 8, 2010 at 1:37 PM, Richard Zhang richardtec...@gmail.com wrote:

 Hi Guys:
 I am installing Hadoop 0.21.0 on a single-node cluster.
 I encounter the following error when I run bin/hadoop namenode -format:

 10/12/08 16:27:22 ERROR namenode.NameNode:
 java.io.IOException: Cannot create directory
 /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
 [...]

 Thanks.
 Richard



Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Richard Zhang
Hi James:
I verified that I have the following permission set for the path:

ls -l tmp/dir/hadoop-hadoop/dfs/hadoop
total 4
drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 15:56 current
Thanks.
Richard


On Wed, Dec 8, 2010 at 4:50 PM, james warren ja...@rockyou.com wrote:

 Hi Richard -

 First thing that comes to mind is a permissions issue.  Can you verify that
 your directories along the desired namenode path are writable by the
 appropriate user(s)?

 HTH,
 -James

 On Wed, Dec 8, 2010 at 1:37 PM, Richard Zhang richardtec...@gmail.com
 wrote:

  [...]



Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Richard Zhang
Could it be that port 54310 is not open?
I just used
iptables -A INPUT -p tcp --dport 54310 -j ACCEPT
to open the port, but the same error persists.
Richard
On Wed, Dec 8, 2010 at 4:56 PM, Richard Zhang richardtec...@gmail.com wrote:

 Hi James:
 I verified that I have the following permission set for the path:

 ls -l tmp/dir/hadoop-hadoop/dfs/hadoop
 total 4
 drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 15:56 current
 Thanks.
 Richard



 On Wed, Dec 8, 2010 at 4:50 PM, james warren ja...@rockyou.com wrote:

  [...]



Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Konstantin Boudnik
it seems that you are looking at 2 different directories:

first post: /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
second: ls -l  tmp/dir/hadoop-hadoop/dfs/hadoop
--
  Take care,
Konstantin (Cos) Boudnik



On Wed, Dec 8, 2010 at 14:19, Richard Zhang richardtec...@gmail.com wrote:
 Could it be that port 54310 is not open?
 I just used
 iptables -A INPUT -p tcp --dport 54310 -j ACCEPT
 to open the port, but the same error persists.
 Richard
 On Wed, Dec 8, 2010 at 4:56 PM, Richard Zhang richardtec...@gmail.com wrote:

  [...]




Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Richard Zhang
Hi:
/your/path/to/hadoop represents the location where Hadoop is installed.
BTW, I believe this is a file-writing permission problem: when I install with
the same *-site.xml settings as root it works, but when I use the dedicated
user hadoop it always runs into this problem.
I did create the directory path manually and set it to 755.
Weird.
Richard.

On Wed, Dec 8, 2010 at 6:51 PM, Konstantin Boudnik c...@apache.org wrote:

 it seems that you are looking at 2 different directories:

 first post: /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
 second: ls -l  tmp/dir/hadoop-hadoop/dfs/hadoop
 --
   Take care,
 Konstantin (Cos) Boudnik



 On Wed, Dec 8, 2010 at 14:19, Richard Zhang richardtec...@gmail.com
 wrote:
   [...]



Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Konstantin Boudnik
Yeah, I figured that much. What I was referring to is the ending of the paths:
.../hadoop-hadoop/dfs/name/current
.../hadoop-hadoop/dfs/hadoop
They are different
--
  Take care,
Konstantin (Cos) Boudnik



On Wed, Dec 8, 2010 at 15:55, Richard Zhang richardtec...@gmail.com wrote:
 Hi:
 /your/path/to/hadoop represents the location where Hadoop is installed.
 BTW, I believe this is a file-writing permission problem: when I install with
 the same *-site.xml settings as root it works, but when I use the dedicated
 user hadoop it always runs into this problem.
 I did create the directory path manually and set it to 755.
 Weird.
 Richard.

 On Wed, Dec 8, 2010 at 6:51 PM, Konstantin Boudnik c...@apache.org wrote:

  [...]




Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Richard Zhang
Oh, sorry, I corrected that typo:
hadoop$ ls tmp/dir/hadoop-hadoop/dfs/name/current -l
total 0
hadoop$ ls tmp/dir/hadoop-hadoop/dfs/name -l
total 4
drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 22:17 current

Even if I remove the tmp directory I created manually and set the whole Hadoop
package to 777, then run Hadoop again, it is still the same.

Richard.

On Wed, Dec 8, 2010 at 7:55 PM, Konstantin Boudnik c...@apache.org wrote:

 Yeah, I figured that much. What I was referring to is the ending of the
 paths:
 .../hadoop-hadoop/dfs/name/current
 .../hadoop-hadoop/dfs/hadoop
 They are different
 --
   Take care,
 Konstantin (Cos) Boudnik



  On Wed, Dec 8, 2010 at 15:55, Richard Zhang richardtec...@gmail.com wrote:

   [...]

Hadoop Certification Programme

2010-12-08 Thread Matthew John
Hi all,

Is there any valid Hadoop certification available? Something which adds
credibility to your Hadoop expertise.

Matthew


Re: Hadoop Certification Programme

2010-12-08 Thread Esteban Gutierrez Moguel
Matthew,

Cloudera has rolled out a certification program for developers and admins. Take
a look at their website.

Cheers,
Esteban.
On Dec 8, 2010 9:41 PM, Matthew John tmatthewjohn1...@gmail.com wrote:
 Hi all,.

 Is there any valid Hadoop Certification available ? Something which adds
 credibility to your Hadoop expertise.

 Matthew


Re: Reduce Error

2010-12-08 Thread Adarsh Sharma

Ted Yu wrote:

Any chance mapred.local.dir is under /tmp and part of it got cleaned up ?

On Wed, Dec 8, 2010 at 4:17 AM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 [...]

Hi Ted,

My mapred.local.dir is in the /home/hadoop directory. I also checked it
in the /hdd2-2 directory where we have lots of space.

Would mapred.map.tasks affect this?

I checked with the default and also with 80 maps and 16 reduces, as I have 8
slaves.

<property>
 <name>mapred.local.dir</name>
 <value>/home/hadoop/mapred/local</value>
 <description>The local directory where MapReduce stores intermediate
 data files.  May be a comma-separated list of directories on different
 devices in order to spread disk i/o.
 Directories that do not exist are ignored.
 </description>
</property>

<property>
 <name>mapred.system.dir</name>
 <value>/home/hadoop/mapred/system</value>
 <description>The shared directory where MapReduce stores control files.
 </description>
</property>

Let me know if you need any further information.

Thanks & Regards

Adarsh Sharma




Re: Reduce Error

2010-12-08 Thread Raj V


Go through the jobtracker, find the relevant node that handled
attempt_201012061426_0001_m_000292_0 and figure out
if there are FS or permission problems.

Raj



From: Adarsh Sharma adarsh.sha...@orkash.com
To: common-user@hadoop.apache.org
Sent: Wed, December 8, 2010 7:48:47 PM
Subject: Re: Reduce Error

[...]

Re: Reduce Error

2010-12-08 Thread Ted Yu
From Raj earlier:

I have seen this error from time to time and it has been either due to space,
missing directories, or disk errors.

The space issue was caused by the fact that I had mounted /de/sdc on
/hadoop-dsk and the mount had failed. And in another case I had accidentally
deleted hadoop.tmp.dir on a node, and whenever the reduce job was scheduled on
that node that attempt would fail.
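
Since the error comes from the local-directory allocator running out of usable
local directories, one common mitigation is to spread mapred.local.dir over the
disks that actually have space; the second path below is illustrative:

<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/mapred/local,/hdd2-2/mapred/local</value>
</property>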

On Wed, Dec 8, 2010 at 8:21 PM, Adarsh Sharma adarsh.sha...@orkash.com wrote:

 Raj V wrote:

 Go through the jobtracker, find the relevant node that handled
 attempt_201012061426_0001_m_000292_0 and figure out
 if there are FS or permission problems.

 Raj

 [...]

 Sir, I read the tasktracker logs several times but was not able to find any
 reason, as they are not very useful. I attached the tasktracker log with the
 mail; however, I listed the main portion below.
 2010-12-06 15:27:04,228 INFO org.apache.hadoop.mapred.JobTracker: Adding
 task 'attempt_201012061426_0001_m_00_1' to tip
 task_201012061426_0001_m_00, for tracker 'tracker_ws37-user-lin:
 127.0.0.1/127.0.0.1:60583'
 2010-12-06 15:27:04,228 INFO org.apache.hadoop.mapred.JobInProgress:
 Choosing rack-local task task_201012061426_0001_m_00
 2010-12-06 15:27:04,229 INFO org.apache.hadoop.mapred.JobTracker: Removed
 completed task 'attempt_201012061426_0001_m_00_0' from
 'tracker_ws37-user-lin:127.0.0.1/127.0.0.1:60583'
 2010-12-06 15:27:07,235 INFO org.apache.hadoop.mapred.TaskInProgress: Error
 from attempt_201012061426_0001_m_000328_0: java.io.IOException: Spill failed
   at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:860)
   at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
   at
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:30)
   at
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:19)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
 find any valid local directory for
 taskTracker/jobcache/job_201012061426_0001/attempt_201012061426_0001_m_000328_0/output/spill16.out
   at
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
   at
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
   at
 org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
   at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
   at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
   at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

 2010-12-06 15:27:07,236 INFO org.apache.hadoop.mapred.TaskInProgress: Error
 from attempt_201012061426_0001_m_00_1: Error initializing
 attempt_201012061426_0001_m_00_1:
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
 valid 

Running not as hadoop user

2010-12-08 Thread Mark Kerzner
Hi,

The "hadoop" user has some advantages for running Hadoop. For example, if HDFS
is mounted as a local file system, then only the user "hadoop" has write/delete
permissions.

Can this privilege be given to another user? In other words, is this
"hadoop" user hard-coded, or can another be used in its stead?

Thank you,
Mark


Re: Running not as hadoop user

2010-12-08 Thread Todd Lipcon
The user who started the NN has superuser privileges on HDFS.

You can also configure a supergroup by setting
dfs.permissions.supergroup (default supergroup)

-Todd
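
A sketch of the hdfs-site.xml setting Todd mentions; the group name here is
illustrative:

<property>
  <name>dfs.permissions.supergroup</name>
  <value>hadoopadmins</value>
  <description>Members of this group are HDFS superusers (default: supergroup).</description>
</property>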

On Wed, Dec 8, 2010 at 9:34 PM, Mark Kerzner markkerz...@gmail.com wrote:
 Hi,

 hadoop user has some advantages for running Hadoop. For example, if HDFS
 is mounted as a local file system, then only user hadoop has write/delete
 permissions.

 Can this privilege be given to another user? In other words, is this
 hadoop user hard-coded, or can another be used in its stead?

 Thank you,
 Mark




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Running not as hadoop user

2010-12-08 Thread Adarsh Sharma

Todd Lipcon wrote:

The user who started the NN has superuser privileges on HDFS.

You can also configure a supergroup by setting
dfs.permissions.supergroup (default supergroup)

-Todd

On Wed, Dec 8, 2010 at 9:34 PM, Mark Kerzner markkerz...@gmail.com wrote:
  

Hi,

hadoop user has some advantages for running Hadoop. For example, if HDFS
is mounted as a local file system, then only user hadoop has write/delete
permissions.

Can this privilege be given to another user? In other words, is this
hadoop user hard-coded, or can another be used in its stead?

Thank you,
Mark






  
You may also set dfs.permissions = false and grant separate groups access to
HDFS through properties in hdfs-site.xml.


--Adarsh


Re: Hadoop Certification Programme

2010-12-08 Thread Jeff Hammerbacher
Hey Matthew,

In particular, see http://www.cloudera.com/hadoop-training/ for details on
Cloudera's training and certifications.

Regards,
Jeff

On Wed, Dec 8, 2010 at 7:44 PM, Esteban Gutierrez Moguel 
esteban...@gmail.com wrote:

 Matthew,

 Cloudera has rolled a certification program for developers and admins. Take
 a look into their website.

 Cheers,
 Esteban.
 On Dec 8, 2010 9:41 PM, Matthew John tmatthewjohn1...@gmail.com wrote:
  Hi all,.
 
  Is there any valid Hadoop Certification available ? Something which adds
  credibility to your Hadoop expertise.
 
  Matthew