On Sat, Aug 30, 2008 at 10:12 AM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> Hi Victor!
>
> I got problems with remote writing as well, so I tried to go further on this
> and I would like to share what I did; maybe you have more luck than me.
>
> 1) As I'm working with user gvelez on the remote host, I had to give write
> access to all, like this:
>
>     bin/hadoop dfs -chmod -R a+w input
>
> 2) After that there is no more "connection refused" error, but instead I got
> the following exception:
>
> $ bin/hadoop dfs -copyFromLocal README.txt /user/hadoop/input/README.txt
> cygpath: cannot create short name of d:\hadoop\hadoop-0.17.2\logs
> 08/08/29 19:06:51 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: File /user/hadoop/input/README.txt could only be
> replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
>         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

How many datanodes do you have? Only one, I guess. Modify your
$HADOOP_HOME/conf/hadoop-site.xml and look up

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

Set the value to 0.

> On Fri, Aug 29, 2008 at 9:53 AM, Victor Samoylov <[EMAIL PROTECTED]> wrote:
>
> > Jeff,
> >
> > Thanks for the detailed instructions, but on a machine that is not a hadoop
> > server I got this error:
> >
> > ~/hadoop-0.17.2$ ./bin/hadoop dfs -copyFromLocal NOTICE.txt test
> > 08/08/29 19:33:07 INFO dfs.DFSClient: Exception in createBlockOutputStream
> > java.net.ConnectException: Connection refused
> > 08/08/29 19:33:07 INFO dfs.DFSClient: Abandoning block blk_-7622891475776838399
> >
> > The thing is that the file was created, but with zero size.
> >
> > Do you have any ideas why this happened?
> >
> > Thanks,
> > Victor
> >
> > On Fri, Aug 29, 2008 at 4:10 AM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> >
> > > You can use the hadoop command line on machines that aren't hadoop servers.
> > > If you copy the hadoop configuration from one of your master servers or
> > > data nodes to the client machine and run the command line dfs tools, it
> > > will copy the files directly to the data nodes.
> > >
> > > Or, you could use one of the client libraries. The java client, for
> > > example, allows you to open up an output stream and start dumping bytes
> > > on it.
> > >
> > > On Thu, Aug 28, 2008 at 5:05 PM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Jeff, thank you for answering!
> > > >
> > > > What about remote writing on HDFS? Let's suppose I have an application
> > > > server on a Linux server A, and a Hadoop cluster on servers B (master),
> > > > C (slave) and D (slave).
> > > >
> > > > What I would like is to send some files from server A to be processed
> > > > by hadoop. So in order to do so, what do I need to do? Do I need to
> > > > send those files to the master server first and then copy them to HDFS?
> > > >
> > > > Or can I pass those files to any slave server?
> > > >
> > > > Basically I'm looking for remote writing, because the files to be
> > > > processed are not being generated on any hadoop server.
> > > >
> > > > Thanks again!
> > > >
> > > > -- Gerardo
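A minimal sketch of the java-client route Jeff describes above (open an output
stream against HDFS and write bytes into it) from a machine that is not part of
the cluster. This is only an illustration: the namenode address
(hdfs://master:9000) and the paths are made up, and it assumes the client can
reach both the namenode and the datanodes, with dfs.replication on the cluster
no higher than the number of live datanodes.

    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class RemoteHdfsWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the namenode. Normally this value comes from
            // the hadoop-site.xml copied from the cluster, as Jeff suggests;
            // the host and port here are hypothetical.
            conf.set("fs.default.name", "hdfs://master:9000");

            FileSystem fs = FileSystem.get(conf);

            // Open an output stream in HDFS and stream a local file into it.
            // The bytes go directly to the datanodes, not through the namenode.
            InputStream in = new FileInputStream("README.txt");
            FSDataOutputStream out = fs.create(new Path("/user/hadoop/input/README.txt"));
            IOUtils.copyBytes(in, out, 4096, true); // true = close both streams
        }
    }

If the cluster's conf directory is on the classpath, the conf.set(...) line is
not needed. Errors like "could only be replicated to 0 nodes" or "Connection
refused" in createBlockOutputStream, as quoted above, usually point at
datanodes that are down or unreachable from the client rather than at the
write itself.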
> > > >
> > > > On Thu, Aug 28, 2008 at 4:04 PM, Jeff Payne <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Gerardo:
> > > > >
> > > > > I can't really speak to all of your questions, but the master/slave
> > > > > issue is a common concern with hadoop. A cluster has a single namenode
> > > > > and therefore a single point of failure. There is also a secondary
> > > > > namenode process, which runs on the same machine as the namenode in
> > > > > most default configurations. You can make it a different machine by
> > > > > adjusting the masters file. One of the more experienced lurkers should
> > > > > feel free to correct me, but my understanding is that the secondary
> > > > > namenode keeps track of all the same index information used by the
> > > > > primary namenode. So, if the namenode fails, there is no automatic
> > > > > recovery, but you can always tweak your cluster configuration to make
> > > > > the secondary namenode the primary and safely restart the cluster.
> > > > >
> > > > > As for the storage of files, the namenode is really just the traffic
> > > > > cop for HDFS. No HDFS files are actually stored on that machine. It's
> > > > > basically used as a directory and lock manager, etc. The files are
> > > > > stored on multiple datanodes, and I'm pretty sure all the actual file
> > > > > I/O happens directly between the client and the respective datanodes.
> > > > >
> > > > > Perhaps one of the more hardcore hadoop people on here will point it
> > > > > out if I'm giving bad advice.
> > > > >
> > > > > On Thu, Aug 28, 2008 at 2:28 PM, Gerardo Velez <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi Everybody!
> > > > > >
> > > > > > I'm a newbie with Hadoop. I've installed it as a single node in a
> > > > > > pseudo-distributed environment, but I would like to go further and
> > > > > > configure a complete hadoop cluster. I have the following questions.
> > > > > >
> > > > > > 1.- I understand that HDFS has a master/slave architecture, and that
> > > > > > the master server manages the file system namespace and regulates
> > > > > > access to files by clients. So, what happens in a cluster environment
> > > > > > if the master server fails or is down due to network issues? Does a
> > > > > > slave become the master server, or something like that?
> > > > > >
> > > > > > 2.- What about the Hadoop filesystem from the client's point of view?
> > > > > > Should the client only store files in HDFS through the master server,
> > > > > > or can clients store the files to be processed in HDFS through a
> > > > > > slave server as well?
> > > > > >
> > > > > > 3.- Until now, what I'm doing to run hadoop is:
> > > > > >
> > > > > > 1.- Copy the file to be processed from the Linux file system to HDFS
> > > > > > 2.- Run the hadoop shell: hadoop -jarfile input output
> > > > > > 3.- The results are stored in the output directory
> > > > > >
> > > > > > Is there any way to have hadoop as a daemon, so that when a file is
> > > > > > stored in HDFS it is processed automatically with hadoop (without
> > > > > > having to run the hadoop shell every time)?
> > > > > >
> > > > > > 4.- What happens with the processed files? Are they deleted from HDFS
> > > > > > automatically?
> > > > > >
> > > > > > Thanks in advance!
> > > > > >
> > > > > > -- Gerardo Velez
> > > > > >
> > > > >
> > > > > --
> > > > > Jeffrey Payne
> > > > > Lead Software Engineer
> > > > > Eyealike, Inc.
> > > > > [EMAIL PROTECTED]
> > > > > www.eyealike.com
> > > > > (206) 257-8708
> > > > >
> > > > > "Anything worth doing is worth overdoing."
> > > > > -H. Lifter
> > > >
> > >
> > > --
> > > Jeffrey Payne
> > > Lead Software Engineer
> > > Eyealike, Inc.
> > > [EMAIL PROTECTED]
> > > www.eyealike.com
> > > (206) 257-8708
> > >
> > > "Anything worth doing is worth overdoing."
> > > -H. Lifter
> >
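On question 3 in the original mail: there is no built-in watch-and-run daemon
in Hadoop of this vintage, but a job can be submitted from Java instead of from
the shell, so a small long-running process can copy a file into HDFS and kick
off the job itself. Below is only a sketch against the classic
org.apache.hadoop.mapred API: the paths are invented, no mapper/reducer is set
(so the identity defaults run), and the setInputPaths/setOutputPath helpers
moved between releases (in 0.17 they are methods on JobConf itself).

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HdfsJobRunner {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HdfsJobRunner.class);
            conf.setJobName("process-input");

            FileSystem fs = FileSystem.get(conf);

            // Copy a freshly arrived local file into the HDFS input directory.
            // A real daemon would do this in a loop whenever new files show up.
            fs.copyFromLocalFile(new Path("/tmp/incoming/data.txt"),
                                 new Path("/user/hadoop/input/data.txt"));

            // Each run needs a fresh output directory: Hadoop refuses to
            // write into one that already exists.
            Path out = new Path("/user/hadoop/output-" + System.currentTimeMillis());

            FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
            FileOutputFormat.setOutputPath(conf, out);
            // No mapper/reducer classes set: the identity defaults just pass
            // records through. A real job would set its own classes here.

            JobClient.runJob(conf);   // blocks until the job finishes
        }
    }

On question 4: nothing is deleted automatically. Input files stay in HDFS until
they are removed explicitly (for example with fs.delete(...) from the API or
bin/hadoop dfs -rm from the shell), and the job output likewise stays in its
output directory.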
--
[EMAIL PROTECTED]
Institute of Computing Technology,
Chinese Academy of Sciences, Beijing.