Jeff,

Thanks for the detailed instructions, but on a machine that is not a hadoop server I got this error:

~/hadoop-0.17.2$ ./bin/hadoop dfs -copyFromLocal NOTICE.txt test
08/08/29 19:33:07 INFO dfs.DFSClient: Exception in createBlockOutputStream java.net.ConnectException: Connection refused
08/08/29 19:33:07 INFO dfs.DFSClient: Abandoning block blk_-7622891475776838399

The strange thing is that the file was created, but with zero size.
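In case it is relevant: by "copy the hadoop configuration" I assumed it was enough to point the client at the namenode in hadoop-site.xml, roughly like this (the hostname and port below are placeholders, not my actual values):

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:9000</value>
    </property>
  </configuration>

If the client also needs to be able to reach every datanode directly, maybe that is where things are going wrong.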
Do you have any ideas why this happened?

Thanks,
Victor

On Fri, Aug 29, 2008 at 4:10 AM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> You can use the hadoop command line on machines that aren't hadoop servers.
> If you copy the hadoop configuration from one of your master servers or
> data nodes to the client machine and run the command-line dfs tools, it
> will copy the files directly to the data nodes.
>
> Or, you could use one of the client libraries. The java client, for
> example, allows you to open up an output stream and start dumping bytes
> on it.
>
> On Thu, Aug 28, 2008 at 5:05 PM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
>
> > Hi Jeff, thank you for answering!
> >
> > What about remote writing on HDFS? Let's suppose I have an application
> > server on a Linux server A, and a Hadoop cluster on servers B (master),
> > C (slave) and D (slave).
> >
> > What I would like is to send some files from server A to be processed
> > by hadoop. In order to do so, what do I need to do? Do I need to send
> > those files to the master server first and then copy them to HDFS?
> >
> > Or can I pass those files to any slave server?
> >
> > Basically I'm looking for remote writing, because the files to be
> > processed are not generated on any hadoop server.
> >
> > Thanks again!
> >
> > -- Gerardo
> >
> > On Thu, Aug 28, 2008 at 4:04 PM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> >
> > > Gerardo:
> > >
> > > I can't really speak to all of your questions, but the master/slave
> > > issue is a common concern with hadoop. A cluster has a single namenode
> > > and therefore a single point of failure. There is also a secondary
> > > namenode process, which runs on the same machine as the namenode in
> > > most default configurations. You can make it a different machine by
> > > adjusting the masters file. One of the more experienced lurkers should
> > > feel free to correct me, but my understanding is that the secondary
> > > namenode keeps track of all the same index information used by the
> > > primary namenode. So, if the namenode fails, there is no automatic
> > > recovery, but you can always tweak your cluster configuration to make
> > > the secondary namenode the primary and safely restart the cluster.
> > >
> > > As for the storage of files, the namenode is really just the traffic
> > > cop for HDFS. No HDFS files are actually stored on that machine. It's
> > > basically used as a directory and lock manager, etc. The files are
> > > stored on multiple datanodes, and I'm pretty sure all the actual file
> > > I/O happens directly between the client and the respective datanodes.
> > >
> > > Perhaps one of the more hardcore hadoop people on here will point it
> > > out if I'm giving bad advice.
> > >
> > > On Thu, Aug 28, 2008 at 2:28 PM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi everybody!
> > > >
> > > > I'm a newbie with Hadoop. I've installed it on a single node as a
> > > > pseudo-distributed environment, but I would like to go further and
> > > > configure a complete hadoop cluster. I have the following questions.
> > > >
> > > > 1.- I understand that HDFS has a master/slave architecture, and that
> > > > the master server manages the file system namespace and regulates
> > > > access to files by clients. So, what happens in a cluster
> > > > environment if the master server fails or is down due to network
> > > > issues?
> > > > Does a slave become the master server, or something like that?
> > > >
> > > > 2.- What about the Hadoop filesystem from the client's point of
> > > > view? Should the client only store files in HDFS through the master
> > > > server, or can clients store the files to be processed in HDFS
> > > > through a slave server as well?
> > > >
> > > > 3.- Until now, what I'm doing to run hadoop is:
> > > >
> > > > 1.- Copy the file to be processed from the Linux file system to HDFS
> > > > 2.- Run the hadoop shell: hadoop jar <jarfile> input output
> > > > 3.- The results are stored in the output directory
> > > >
> > > > Is there any way to run hadoop as a daemon, so that when a file is
> > > > stored in HDFS it is processed automatically by hadoop? (Without
> > > > running the hadoop shell every time.)
> > > >
> > > > 4.- What happens to processed files? Are they deleted from HDFS
> > > > automatically?
> > > >
> > > > Thanks in advance!
> > > >
> > > > -- Gerardo Velez
> > >
> > > --
> > > Jeffrey Payne
> > > Lead Software Engineer
> > > Eyealike, Inc.
> > > [EMAIL PROTECTED]
> > > www.eyealike.com
> > > (206) 257-8708
> > >
> > > "Anything worth doing is worth overdoing."
> > > -H. Lifter
>
> --
> Jeffrey Payne
> Lead Software Engineer
> Eyealike, Inc.
> [EMAIL PROTECTED]
> www.eyealike.com
> (206) 257-8708
>
> "Anything worth doing is worth overdoing."
> -H. Lifter
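P.S. Regarding the java client route you mentioned (opening an output stream and dumping bytes on it): is something along these lines what you have in mind? This is only a sketch; the class name, hostname, port and path are placeholders, not values from my setup.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class RemotePut {
      public static void main(String[] args) throws Exception {
          // Point the client at the namenode (placeholder host/port).
          Configuration conf = new Configuration();
          conf.set("fs.default.name", "hdfs://namenode.example.com:9000");

          // Open an output stream on HDFS and write some bytes.
          FileSystem fs = FileSystem.get(conf);
          FSDataOutputStream out = fs.create(new Path("/user/victor/test.txt"));
          out.write("written from a non-cluster machine\n".getBytes());
          out.close();
          fs.close();
      }
  }

If that is the idea, I would expect it to run into the same ConnectException as the command-line copy, since, as you said, the actual file I/O goes directly to the datanodes. Or am I misunderstanding something?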