On Sat, Aug 30, 2008 at 10:12 AM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> Hi Victor!
>
> I got problems with remote writing as well, so I tried to go further on this
> and I would like to share what I did; maybe you have more luck than me.
>
> 1) As I'm working with user gvelez on the remote host, I had to give write
> access to all, like this:
>
>     bin/hadoop dfs -chmod -R a+w input
>
> 2) After that there is no more "connection refused" error, but instead I got
> the following exception:
>
> $ bin/hadoop dfs -copyFromLocal README.txt /user/hadoop/input/README.txt
> cygpath: cannot create short name of d:\hadoop\hadoop-0.17.2\logs
> 08/08/29 19:06:51 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: File /user/hadoop/input/README.txt could only be
> replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
>         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

How many datanodes do you have? Only one, I guess. Modify your
$HADOOP_HOME/conf/hadoop-site.xml and look up

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

Set the value to 0.

> On Fri, Aug 29, 2008 at 9:53 AM, Victor Samoylov <[EMAIL PROTECTED]> wrote:
>
> > Jeff,
> >
> > Thanks for the detailed instructions, but on a machine that is not a hadoop
> > server I got this error:
> >
> > ~/hadoop-0.17.2$ ./bin/hadoop dfs -copyFromLocal NOTICE.txt test
> > 08/08/29 19:33:07 INFO dfs.DFSClient: Exception in createBlockOutputStream
> > java.net.ConnectException: Connection refused
> > 08/08/29 19:33:07 INFO dfs.DFSClient: Abandoning block blk_-7622891475776838399
> >
> > The thing is that the file was created, but with zero size.
> >
> > Do you have any ideas why this happened?
> >
> > Thanks,
> > Victor
> >
> > On Fri, Aug 29, 2008 at 4:10 AM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> >
> > > You can use the hadoop command line on machines that aren't hadoop servers.
> > > If you copy the hadoop configuration from one of your master servers or
> > > data nodes to the client machine and run the command line dfs tools, it
> > > will copy the files directly to the data nodes.
> > >
> > > Or, you could use one of the client libraries. The java client, for
> > > example, allows you to open up an output stream and start dumping bytes
> > > on it.
> > >
> > > On Thu, Aug 28, 2008 at 5:05 PM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Jeff, thank you for answering!
> > > >
> > > > What about remote writing on HDFS? Let's suppose I have an application
> > > > server on a Linux server A, and a Hadoop cluster on servers B (master),
> > > > C (slave) and D (slave).
> > > >
> > > > What I would like is to send some files from server A to be processed
> > > > by hadoop. So in order to do so, what do I need to do? Do I need to
> > > > send those files to the master server first and then copy them to HDFS?
> > > >
> > > > Or can I pass those files to any slave server?
> > > >
> > > > Basically I'm looking for remote writing, because the files to be
> > > > processed are not being generated on any hadoop server.
> > > >
> > > > Thanks again!
> > > >
> > > > -- Gerardo
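A minimal sketch of the java-client route Jeff describes above (open an output
stream against HDFS and write bytes into it) from a machine that is not part of
the cluster. This is only an illustration: the namenode address
(hdfs://master:9000) and the paths are made up, and it assumes the client can
reach both the namenode and the datanodes, with dfs.replication on the cluster
no higher than the number of live datanodes.

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RemoteHdfsWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the namenode. Normally this value comes from
            // the hadoop-site.xml copied from the cluster, as Jeff suggests;
            // the host and port here are hypothetical.
            conf.set("fs.default.name", "hdfs://master:9000");

            FileSystem fs = FileSystem.get(conf);

            // Open an output stream in HDFS and stream a local file into it.
            // The bytes go directly to the datanodes, not through the namenode.
            InputStream in = new FileInputStream("README.txt");
            FSDataOutputStream out = fs.create(new Path("/user/hadoop/input/README.txt"));
            IOUtils.copyBytes(in, out, 4096, true); // true = close both streams
        }
    }

If the cluster's conf directory is on the classpath, the conf.set(...) line is
not needed. Errors like "could only be replicated to 0 nodes" or "Connection
refused" in createBlockOutputStream, as quoted above, usually point at
datanodes that are down or unreachable from the client rather than at the
write itself.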
> > > >
> > > > On Thu, Aug 28, 2008 at 4:04 PM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Gerardo:
> > > > >
> > > > > I can't really speak to all of your questions, but the master/slave
> > > > > issue is a common concern with hadoop. A cluster has a single namenode
> > > > > and therefore a single point of failure. There is also a secondary
> > > > > namenode process, which runs on the same machine as the namenode in
> > > > > most default configurations. You can make it a different machine by
> > > > > adjusting the masters file. One of the more experienced lurkers should
> > > > > feel free to correct me, but my understanding is that the secondary
> > > > > namenode keeps track of all the same index information used by the
> > > > > primary namenode. So, if the namenode fails, there is no automatic
> > > > > recovery, but you can always tweak your cluster configuration to make
> > > > > the secondary namenode the primary and safely restart the cluster.
> > > > >
> > > > > As for the storage of files, the namenode is really just the traffic
> > > > > cop for HDFS. No HDFS files are actually stored on that machine. It's
> > > > > basically used as a directory and lock manager, etc. The files are
> > > > > stored on multiple datanodes, and I'm pretty sure all the actual file
> > > > > I/O happens directly between the client and the respective datanodes.
> > > > >
> > > > > Perhaps one of the more hardcore hadoop people on here will point it
> > > > > out if I'm giving bad advice.
> > > > >
> > > > > On Thu, Aug 28, 2008 at 2:28 PM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi Everybody!
> > > > > >
> > > > > > I'm a newbie with Hadoop. I've installed it as a single node in a
> > > > > > pseudo-distributed environment, but I would like to go further and
> > > > > > configure a complete hadoop cluster. I have the following questions.
> > > > > >
> > > > > > 1.- I understand that HDFS has a master/slave architecture, and that
> > > > > > the master server manages the file system namespace and regulates
> > > > > > access to files by clients. So, what happens in a cluster environment
> > > > > > if the master server fails or is down due to network issues? Does a
> > > > > > slave become the master server, or something like that?
> > > > > >
> > > > > > 2.- What about the Hadoop filesystem from the client's point of view?
> > > > > > Should the client only store files in HDFS through the master server,
> > > > > > or can clients store the files to be processed in HDFS through a
> > > > > > slave server as well?
> > > > > >
> > > > > > 3.- Until now, what I'm doing to run hadoop is:
> > > > > >
> > > > > > 1.- Copy the file to be processed from the Linux file system to HDFS
> > > > > > 2.- Run the hadoop shell: hadoop -jarfile input output
> > > > > > 3.- The results are stored in the output directory
> > > > > >
> > > > > > Is there any way to have hadoop as a daemon, so that when a file is
> > > > > > stored in HDFS it is processed automatically with hadoop (without
> > > > > > having to run the hadoop shell every time)?
> > > > > >
> > > > > > 4.- What happens with the processed files? Are they deleted from HDFS
> > > > > > automatically?
> > > > > >
> > > > > > Thanks in advance!
> > > > > >
> > > > > > -- Gerardo Velez
> > > > > >
> > > > >
> > > > > --
> > > > > Jeffrey Payne
> > > > > Lead Software Engineer
> > > > > Eyealike, Inc.
> > > > > [EMAIL PROTECTED]
> > > > > www.eyealike.com
> > > > > (206) 257-8708
> > > > >
> > > > > "Anything worth doing is worth overdoing."
> > > > > -H. Lifter
> > > >
> > >
> > > --
> > > Jeffrey Payne
> > > Lead Software Engineer
> > > Eyealike, Inc.
> > > [EMAIL PROTECTED]
> > > www.eyealike.com
> > > (206) 257-8708
> > >
> > > "Anything worth doing is worth overdoing."
> > > -H. Lifter
> >
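On question 3 in the original mail: there is no built-in watch-and-run daemon
in Hadoop of this vintage, but a job can be submitted from Java instead of from
the shell, so a small long-running process can copy a file into HDFS and kick
off the job itself. Below is only a sketch against the classic
org.apache.hadoop.mapred API: the paths are invented, no mapper/reducer is set
(so the identity defaults run), and the setInputPaths/setOutputPath helpers
moved between releases (in 0.17 they are methods on JobConf itself).

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HdfsJobRunner {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HdfsJobRunner.class);
            conf.setJobName("process-input");

            FileSystem fs = FileSystem.get(conf);

            // Copy a freshly arrived local file into the HDFS input directory.
            // A real daemon would do this in a loop whenever new files show up.
            fs.copyFromLocalFile(new Path("/tmp/incoming/data.txt"),
                                 new Path("/user/hadoop/input/data.txt"));

            // Each run needs a fresh output directory: Hadoop refuses to
            // write into one that already exists.
            Path out = new Path("/user/hadoop/output-" + System.currentTimeMillis());

            FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
            FileOutputFormat.setOutputPath(conf, out);
            // No mapper/reducer classes set: the identity defaults just pass
            // records through. A real job would set its own classes here.

            JobClient.runJob(conf);   // blocks until the job finishes
        }
    }

On question 4: nothing is deleted automatically. Input files stay in HDFS until
they are removed explicitly (for example with fs.delete(...) from the API or
bin/hadoop dfs -rm from the shell), and the job output likewise stays in its
output directory.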
--
[EMAIL PROTECTED]
Institute of Computing Technology,
Chinese Academy of Sciences, Beijing.