Re: Seeking Someone to Review Hadoop Article

2008-10-23 Thread Mafish Liu
I'm interested in it.

On Fri, Oct 24, 2008 at 6:31 AM, Tom Wheeler <[EMAIL PROTECTED]> wrote:

> Each month the developers at my company write a short article about a
> Java technology we find exciting. I've just finished one about Hadoop
> for November and am seeking a volunteer knowledgeable about Hadoop to
> look it over to help ensure it's both clear and technically accurate.
>
> If you're interested in helping me, please contact me offlist and I
> will send you the draft.  Meanwhile, you can get a feel for the length
> and general style of the articles from our archives:
>
>   http://www.ociweb.com/articles/publications/jnb.html
>
> Thanks in advance,
>
> Tom Wheeler
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Why did I only get 2 live datanodes?

2008-10-16 Thread Mafish Liu
Hi, wei:
Do all of your nodes have the same directory structure, the same
configuration files, and the same hadoop data directory, and do you have
access to those data directories on every node?

   You may also have connection problems: MAKE SURE your slave nodes can ssh to
the master without a password, and TURN OFF the FIREWALL on your master node.
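
In case it is useful, the usual way to set up passwordless ssh from one node to
another looks roughly like this (OpenSSH assumed; "nodeB" is a placeholder for
the target host):

# on the node that initiates the connection
ssh-keygen -t rsa -P ""
scp ~/.ssh/id_rsa.pub nodeB:/tmp/id_rsa.pub
# then, on nodeB
cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys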

-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-16 Thread Mafish Liu
Hi, souravm:
  I can't tell exactly what's wrong with your configuration from your post,
but I guess the possible causes are:

  1. Firewall. Make sure the firewall on the namenode is off, or that port 9000
is open for connections in your firewall configuration.

  2. Namenode. Check the namenode startup log to see if the namenode started up
correctly, or try running 'jps' on your namenode to see if there is a process
called "NameNode".
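
     For example, on a healthy namenode machine the check might look like this
(the PIDs are just illustrative):

$ jps
12021 NameNode
12188 SecondaryNameNode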

Hope this helps.


On Tue, Sep 16, 2008 at 10:41 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi,
>
> The namenode in machine 1 has started. I can see the following log. Is
> there a specific way to provide the master name in the masters file (in
> hadoop/conf) on the datanode? I've currently specified <user>@<master server ip>.
>
> 2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up
> at: localhost/127.0.0.1:9000
> 2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem:
> fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
> 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem:
> supergroup=supergroup
> 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem:
> isPermissionEnabled=true
> 2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished
> loading FSImage in 112 msecs
> 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Leaving safe mode after 0 secs.
> 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Network topology has 0 racks and 0 datanodes
> 2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE*
> UnderReplicatedBlocks has 0 blocks
> 2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource
> aliases
> 2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version
> Jetty/5.1.4
> 2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started
> [EMAIL PROTECTED]
> 2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started
> WebApplicationContext[/,/]
> 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started
> HttpContext[/logs,/logs]
> 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started
> HttpContext[/static,/static]
> 2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started
> SocketListener on 0.0.0.0:50070
> 2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started
> [EMAIL PROTECTED]
> 2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server
> up at: 0.0.0.0:50070
> 2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 4 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 5 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 6 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 9000: starting
> 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 9000: starting
> 2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 9000: starting
>
> Is there a specific way to provide the master name in the masters file (in
> hadoop/conf) on the datanode? I've currently specified <user>@<master server
> ip>. I'm thinking there might be a problem, as in the log file of the data
> node I can see the message '2008-09-16 14:38:51,501 INFO
> org.apache.hadoop.ipc.RPC: Server at /192.168.1.102:9000 not available
> yet, Z...'
>
> Any help ?
>
> Regards,
> Sourav
>
>
> 
> From: Samuel Guo [EMAIL PROTECTED]
> Sent: Tuesday, September 16, 2008 5:49 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Need help in hdfs configuration fully distributed way in Mac
> OSX...
>
> check the namenode's log in machine1 to see if your namenode started
> successfully :)
>
> On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > I'm faci

Re: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-15 Thread Mafish Liu
Hi:
  You need to configure your nodes so that node 1 can connect to node
2 without a password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I'm facing a problem in configuring hdfs in a fully distributed way in Mac
> OSX.
>
> Here is the topology -
>
> 1. The namenode is in machine 1
> 2. There is 1 datanode in machine 2
>
> Now when I execute start-dfs.sh from machine 1, it connects to machine 2
> (after it asks for password for connecting to machine 2) and starts datanode
> in machine 2 (as the console message says).
>
> However -
> 1. When I go to http://machine1:50070 - it does not show the data node at
> all. It says 0 data node configured
> 2. In the log file in machine 2 what I see is -
> /
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.17.2.1
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r
> 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
> /
> 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 1 time(s).
> 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 2 time(s).
> 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 3 time(s).
> 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 4 time(s).
> 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 5 time(s).
> 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 6 time(s).
> 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 7 time(s).
> 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 8 time(s).
> 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 9 time(s).
> 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /17.229.23.77:9000. Already tried 10 time(s).
> 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /
> 17.229.23.77:9000 not available yet, Z...
>
> ... and this retyring gets on repeating
>
>
> The  hadoop-site.xmls are like this -
>
> 1. In machine 1
> -
> <configuration>
>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://localhost:9000</value>
>   </property>
>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/Users/souravm/hdpn</value>
>   </property>
>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>localhost:9001</value>
>   </property>
>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
>
> 2. In machine 2
>
> <configuration>
>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://<machine 1 ip>:9000</value>
>   </property>
>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/Users/nirdosh/hdfsd1</value>
>   </property>
>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
> </configuration>
>
> The slaves file in machine 1 has a single entry - <user>@<ip of machine2>
>
> The exact steps I did -
>
> 1. Reformat the namenode in machine 1
> 2. execute start-dfs.sh in machine 1
> 3. Then I try to see whether the datanode is created through
> http://<machine 1 ip>:50070
>
> Any pointer to resolve this issue would be appreciated.
>
> Regards,
> Sourav
>
>
>
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Small Filesizes

2008-09-15 Thread Mafish Liu
Hi,
  I'm working on exactly the situation you describe, with millions of small
files of around 10KB each.
  My idea is to compact these files into big ones and create indexes for
them. It is essentially a file system on top of a file system, and it supports
append updates and lazy deletion.
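
  One concrete way to do the packing on top of stock Hadoop is a SequenceFile
keyed by file name; a rough sketch (the output path is only an example, and
each small file is assumed to fit in memory and be read in one call):

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // One big SequenceFile on HDFS that holds many small local files.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/user/mafish/packed.seq"),
        Text.class, BytesWritable.class);
    try {
      for (String name : args) {             // small files given on the command line
        File f = new File(name);
        byte[] buf = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f);
        in.read(buf);                        // one read is enough for ~10KB files
        in.close();
        writer.append(new Text(name), new BytesWritable(buf));  // key = file name
      }
    } finally {
      writer.close();
    }
  }
}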
  Hope this helps.

-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: specifying number of nodes for job

2008-09-07 Thread Mafish Liu
On Mon, Sep 8, 2008 at 2:25 AM, Sandy <[EMAIL PROTECTED]> wrote:

> Hi,
>
> This may be a silly question, but I'm strangely having trouble finding an
> answer for it (perhaps I'm looking in the wrong places?).
>
> Suppose I have a cluster with n nodes each with m processors.
>
> I wish to test the performance of, say,  the wordcount program on k
> processors, where k is varied from k = 1 ... nm.


You can specify the number of tasks for each node in your hadoop-site.xml
file, so you can vary k over k = n, 2*n, ..., m*n instead of k = 1 ... n*m.
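
The properties I mean are the per-tasktracker slot limits; in hadoop-site.xml
they look something like this (the defaults are typically 2, check
hadoop-default.xml for your version):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>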


> How would I do this? I'm having trouble finding the proper command line
> option in the commands manual (
> http://hadoop.apache.org/core/docs/current/commands_manual.html)
>
>
>
> Thank you very much for you time.
>
> -SM
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Output directory already exists

2008-09-02 Thread Mafish Liu
On Wed, Sep 3, 2008 at 1:24 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm trying to write the output of two different map-reduce jobs into the
> same output directory. I'm using MultipleOutputFormats to set the filename
> dynamically, so there is no filename collision between the two jobs.
> However, I'm getting the error "output directory already exists".
>
> Does the framework support this functionality? It seems silly to have to
> create a temp directory to store the output files from the second job and
> then have to copy them to the first job's output directory after the second
> job completes.


Map/Reduce creates the output directory every time a job runs and fails if
the directory already exists. It seems there is no way to do what you describe
other than modifying the source code.
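
If patching Hadoop is not an option, the temp-directory workaround you describe
can at least be scripted with the FileSystem API after the second job finishes;
a rough sketch (the directory names are made up, and listStatus assumes a
0.18-era client):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MergeOutputs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path tmpOut = new Path("/user/shirley/job2-tmp");    // second job's output
    Path finalOut = new Path("/user/shirley/job1-out");  // first job's output
    // Move every file produced by the second job into the first job's directory.
    for (FileStatus status : fs.listStatus(tmpOut)) {
      Path src = status.getPath();
      fs.rename(src, new Path(finalOut, src.getName()));
    }
    fs.delete(tmpOut, true);   // remove the now-empty temp directory
  }
}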

>
>
> Thanks,
>
> Shirley
>
>


-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: basic questions about Hadoop!

2008-09-01 Thread Mafish Liu
On Sat, Aug 30, 2008 at 10:12 AM, Gerardo Velez <[EMAIL PROTECTED]>wrote:

> Hi Victor!
>
> I got problem with remote writing as well, so I tried to go further on this
> and I would like to share what I did, maybe you have more luck than me
>
> 1) as I'm working with user gvelez in remote host I had to give write
> access
> to all, like this:
>
>bin/hadoop dfs -chmod -R a+w input
>
> 2) After that, there is no more connection refused error, but instead I got
> following exception
>
>
>
> $ bin/hadoop dfs -copyFromLocal README.txt /user/hadoop/input/README.txt
> cygpath: cannot create short name of d:\hadoop\hadoop-0.17.2\logs
> 08/08/29 19:06:51 INFO dfs.DFSClient:
> org.apache.hadoop.ipc.RemoteException:
> jav
> a.io.IOException: File /user/hadoop/input/README.txt could only be
> replicated to
>  0 nodes, instead of 1
>at
> org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.ja
> va:1145)
>at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
>at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
> sorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:585)
>at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
>
How many datanodes do you have? Only one, I guess.
Modify your $HADOOP_HOME/conf/hadoop-site.xml and look up

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

and set the value to 0.


>
> On Fri, Aug 29, 2008 at 9:53 AM, Victor Samoylov <
> [EMAIL PROTECTED]
> > wrote:
>
> > Jeff,
> >
> > Thanks for detailed instructions, but on machine that is not hadoop
> server
> > I
> > got error:
> > ~/hadoop-0.17.2$ ./bin/hadoop dfs -copyFromLocal NOTICE.txt test
> > 08/08/29 19:33:07 INFO dfs.DFSClient: Exception in
> createBlockOutputStream
> > java.net.ConnectException: Connection refused
> > 08/08/29 19:33:07 INFO dfs.DFSClient: Abandoning block
> > blk_-7622891475776838399
> > The thing is that file was created, but with zero size.
> >
> > Do you have ideas why this happened?
> >
> > Thanks,
> > Victor
> >
> > On Fri, Aug 29, 2008 at 4:10 AM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> >
> > > You can use the hadoop command line on machines that aren't hadoop
> > servers.
> > > If you copy the hadoop configuration from one of your master servers or
> > > data
> > > node to the client machine and run the command line dfs tools, it will
> > copy
> > > the files directly to the data node.
> > >
> > > Or, you could use one of the client libraries.  The java client, for
> > > example, allows you to open up an output stream and start dumping bytes
> > on
> > > it.
> > >
> > > On Thu, Aug 28, 2008 at 5:05 PM, Gerardo Velez <
> [EMAIL PROTECTED]
> > > >wrote:
> > >
> > > > Hi Jeff, thank you for answering!
> > > >
> > > > What about remote writing on HDFS, lets suppose I got an application
> > > server
> > > > on a
> > > > linux server A and I got a Hadoop cluster on servers B (master), C
> > > (slave),
> > > > D (slave)
> > > >
> > > > What I would like is sent some files from Server A to be processed by
> > > > hadoop. So in order to do so, what do I need to do? Do I need to send
> > > > those files to the master server first and then copy those to HDFS?
> > > >
> > > > or can I pass those files to any slave server?
> > > >
> > > > basically I'm looking for remote writing due to files to be process
> are
> > > not
> > > > being generated on any haddop server.
> > > >
> > > > Thanks again!
> > > >
> > > > -- Gerardo
> > > >
> > > >
> > > >
> > > > Regarding
> > > >
> > > > On Thu, Aug 28, 2008 at 4:04 PM, Jeff Payne <[EMAIL PROTECTED]>
> > wrote:
> > > >
> > > > > Gerardo:
> > > > >
> > > > > I can't really speak to all of your questions, but the master/slave
> > > issue
> > > > > is
> > > > > a common concern with hadoop.  A cluster has a single namenode and
> > > > > therefore
> > > > > a single point of failure.  There is also a secondary name node
> > process
> > > > > which runs on the same machine as the name node in most default
> > > > > configurations.  You can make it a different machine by adjusting
> the
> > > > > master
> > > > > file.  One of the more experienced lurkers should feel free to
> > correct
> > > > me,
> > > > > but my understanding is that the secondary name node keeps track of
> > all
> > > > the
> > > > > same index information used by the primary name node.  So, if the
> > > > namenode
> > > > > fails, there is no automatic recovery, but you can always tweak
> your
> > > > > cluster
> > > > > configuration to make the secondary namenode the primary and safely
> > > > restart
> > > > > the cluster.
> > > > >
> > > > > As for the storage of files, the name node is really just the
> traffic
> > > cop
> > > > > for HDFS.  No HDFS files are actually stored on that machine.  It's
> > > > > basically used as a directory and lock manager, etc.  The files are
> > > > stored
> > > > > 

Re: Help: how to check the active datanodes?

2008-07-03 Thread Mafish Liu
Hi, zhang:
   Once you start hadoop with the start-all.sh script, a hadoop status page can
be accessed at http://namenode-ip:port/dfshealth.jsp. The port is specified by
the dfs.http.address property in your hadoop-default.xml.
   If the datanode status is not what you expect, you need to check the log
files; they show the details of the failure.
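
For reference, the entry usually looks like the one below, with 0.0.0.0:50070
as the stock default (check your own hadoop-default.xml to be sure):

<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>
</property>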

On Fri, Jul 4, 2008 at 4:17 AM, Richard Zhang <[EMAIL PROTECTED]>
wrote:

> Hi guys:
> I am running hadoop on a 8 nodes cluster.  I uses start-all.sh to boot
> hadoop and it shows that all 8 data nodes are started. However, when I use
> bin/hadoop dfsadmin -report to check the status of the data nodes and it
> shows only one data node (the one with the same host as name node) is
> active. How could we know if all the data nodes are active precisely? Does
> anyone has deal with this before?
> Thanks.
> Richard
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: how to setup hadoop client?

2008-07-01 Thread Mafish Liu
Hi, yun:
In shell, you can use bin/hadoop dfs -ls to list a directory.
If you want to access DFS from your code, you need to write something like
the following:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

You can then access the file system through fs. Refer to the javadoc for more
details.
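
For the create/read/write benchmark you mention, a self-contained sketch could
look like the one below (the path is only an example; the client machine just
needs the cluster's hadoop-site.xml on its classpath so that fs.default.name
points at your namenode):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/user/yun/hello.txt");   // example path only

    FSDataOutputStream out = fs.create(p);      // create + write
    out.writeUTF("hello hdfs");
    out.close();

    FSDataInputStream in = fs.open(p);          // read back
    System.out.println(in.readUTF());
    in.close();
  }
}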

Regards.
Mafish

On Wed, Jul 2, 2008 at 6:20 AM, <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am a newbie to hadoop, and am setting up a multi-node cluster.  I have a
> trivial question on the HDFS. In the hadoop doc it has a diagram showing
> the client performs a read/write to the dfs, but it isn't clear to me on how
> the client will be setup.  I assume you will have to install the hadoop
> binaries, and connect to the namenode (master) to access the dfs, like this:
>
> #  ./hadoop dfs -fs hdfs://:9000 -ls /
>
> I need to do benchmark on create/read/write operations on hadoop dfs.  Any
> pointers appreciated.  Thanks in advance!
>
> -Yun
>
>
>


-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: reduce task hanging or just slow?

2008-03-31 Thread Mafish Liu
Hi:
    I have run into a similar problem. In the end I found that it was caused by
hostname resolution, because hadoop uses hostnames to reach the other nodes.
    To check this, open your jobtracker log file (it usually resides in
$HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log) and look for the
error:
"FATAL org.apache.hadoop.mapred.JobTracker: java.net.UnknownHostException:
Invalid hostname for server: local"
    If it is there, adding IP-to-hostname pairs to the /etc/hosts files on all
of your nodes may fix the problem.
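
    For example, every node's /etc/hosts could carry entries like the following
(the addresses and names here are only placeholders):

192.168.1.101   master
192.168.1.102   slave1
192.168.1.103   slave2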

Good luck and best regards.

Mafish

-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Reduce Hangs

2008-03-30 Thread Mafish Liu
All ports are listed in conf/hadoop-default.xml and conf/hadoop-site.xml.
Also, if you are using HBase, you need to look at hbase-default.xml and
hbase-site.xml, located in the hbase directory.
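
As a rough illustration, the master-side ports can be pinned in hadoop-site.xml
like this ("master" is a placeholder host name; the datanode and tasktracker
have analogous *.address properties listed in hadoop-default.xml):

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>          <!-- namenode IPC -->
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>                 <!-- jobtracker IPC -->
</property>
<property>
  <name>dfs.http.address</name>
  <value>0.0.0.0:50070</value>               <!-- namenode web UI -->
</property>
<property>
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50030</value>               <!-- jobtracker web UI -->
</property>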

2008/3/29 Natarajan, Senthil <[EMAIL PROTECTED]>:

> Hi,
> Thanks for your suggestions.
>
> It looks like the problem is with firewall, I created the firewall rule to
> allow these ports 5 to 50100 (I found in these port range hadoop was
> listening)
>
> Looks like I am missing some ports and that gets blocked in the firewall.
>
> Could anyone please let me know, how to configure hadoop to use only
> certain specified ports, so that those ports can be allowed in the firewall.
>
> Thanks,
> Senthil
>
>


-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Reduce Hangs

2008-03-27 Thread Mafish Liu
On Fri, Mar 28, 2008 at 12:31 AM, 朱盛凯 <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I met this problem in my cluster before, I think I can share with you some
> of my experience.
> But it may not work in you case.
>
> The job in my cluster always hung at 16% of reduce. It occured because the
> reduce task could not fetch the
> map output from other nodes.
>
> In my case, two factors may result in this faliure of communication
> between
> two task trackers.
>
> One is the firewall block the trackers from communications. I solved this
> by
> disabling the firewall.
> The other factor is that trackers refer to other nodes by host name only,
> but not ip address. I solved this by editing the file /etc/hosts
> with mapping from hostname to ip address of all nodes in cluster.


I ran into this problem for the same reason too.
Try adding the host names of all nodes to all of your /etc/hosts files.

>
>
> I hope my experience will be helpful for you.
>
> On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > I have small Hadoop cluster, one master and three slaves.
> > When I try the example wordcount on one of our log file (size ~350 MB)
> >
> > Map runs fine but reduce always hangs (sometime around 19%,60% ...)
> after
> > very long time it finishes.
> > I am seeing this error
> > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> > In the log I am seeing this
> > INFO org.apache.hadoop.mapred.TaskTracker:
> > task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at
> > 0.02 MB/s) >
> >
> > Do you know what might be the problem.
> > Thanks,
> > Senthil
> >
> >
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Mafish Liu
On Mon, Mar 10, 2008 at 9:47 AM, Mafish Liu <[EMAIL PROTECTED]> wrote:

> You should do the following steps:
> 1. Deploy hadoop on the new node with the same directory structure
> and configuration as the existing nodes.
> 2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.

Addition: do not run "bin/hadoop namenode -format" before you run the datanode,
or you will get an error like "Incompatible namespaceIDs ..."

>
>
> The datanode and tasktracker will automatically contact the namenode and
> jobtracker specified in the hadoop configuration files and finish adding the
> new node to the hadoop cluster.
>
>
> On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <[EMAIL PROTECTED]>
> wrote:
>
> > Yes. You should have the same hadoop-site across all your slaves. They
> > will need to know the DNS name for the namenode and jobtracker.
> >
> > - Aaron
> >
> > tjohn wrote:
> > >
> > > Mahadev Konar wrote:
> > >
> > >> I believe (as far as I remember) you should be able to add the node
> > by
> > >> bringing up the datanode or tasktracker on the remote machine. The
> > >> Namenode or the jobtracker (I think) does not check for the nodes in
> > the
> > >> slaves file. The slaves file is just to start up all the daemon's by
> > >> ssshing to all the nodes in the slaves file during startup. So you
> > >> should just be able to startup the datanode pointing to correct
> > namenode
> > >> and it should work.
> > >>
> > >> Regards
> > >> Mahadev
> > >>
> > >>
> > >>
> > >
> > > Sorry for my ignorance... To make a datanode/tasktraker point to the
> > > namenode what should i do? Have i to edit the hadoop-site.xml? Thanks
> > >
> > > John
> > >
> > >
> >
>
>
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: dynamically adding slaves to hadoop cluster

2008-03-09 Thread Mafish Liu
You should do the following steps:
1. Deploy hadoop on the new node with the same directory structure and
configuration as the existing nodes.
2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker, as shown below.

The datanode and tasktracker will automatically contact the namenode and
jobtracker specified in the hadoop configuration files and finish adding the
new node to the hadoop cluster.
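
Concretely, on the new node this can be done with the hadoop-daemon.sh helper,
which runs the daemons in the background and writes their logs under
$HADOOP_HOME/logs:

$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker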

On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <[EMAIL PROTECTED]> wrote:

> Yes. You should have the same hadoop-site across all your slaves. They
> will need to know the DNS name for the namenode and jobtracker.
>
> - Aaron
>
> tjohn wrote:
> >
> > Mahadev Konar wrote:
> >
> >> I believe (as far as I remember) you should be able to add the node by
> >> bringing up the datanode or tasktracker on the remote machine. The
> >> Namenode or the jobtracker (I think) does not check for the nodes in
> the
> >> slaves file. The slaves file is just to start up all the daemon's by
> >> ssshing to all the nodes in the slaves file during startup. So you
> >> should just be able to startup the datanode pointing to correct
> namenode
> >> and it should work.
> >>
> >> Regards
> >> Mahadev
> >>
> >>
> >>
> >
> > Sorry for my ignorance... To make a datanode/tasktraker point to the
> > namenode what should i do? Have i to edit the hadoop-site.xml? Thanks
> >
> > John
> >
> >
>



-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.