Rack Configuration!
Hello! How do I configure the machines in different racks? I have 10 machines in all. I want the hierarchy to be as follows:
-> machine1, machine2, machine3: these are all DN and TT
-> machine4, machine5: JT1
-> machine7, machine8: JT2
-> machine10: NN and Secondary NN
As of now I have 7 machines running a Hadoop cluster, which follows the hierarchy below:
-> machine1, machine2: these are all DN and TT
-> machine3, machine4, machine5: JT
-> machine6: Secondary NN
-> machine7: NN
Also, if the machines are configured in different racks, what advantage do we have? Also, please give me a few problem statements that involve handling and processing large amounts of data. What have the Yahoo and Amazon guys done? What kind of processing of huge data have they handled? -- Regards! Sugandha
Re: org.apache.hadoop.ipc.Client: trying to connect to server failed
Hi Ashish! Try the following things:
-> Check the namenode's config file (hadoop-site.xml).
-> Make sure the value you have given for dfs.datanode.address is correct -- the IP and the hostname of that machine.
-> Also check the names added in the /etc/hosts file.
-> Check that the datanodes' ssh keys are present in the namenode's known_hosts file.
-> Check the value of dfs.datanode.address in the datanode's config file.

On Tue, Jun 16, 2009 at 10:58 AM, ashish pareek wrote:
> Hi,
> I am trying to set up a Hadoop cluster on a 3GB machine, using Hadoop 0.18.3, and have followed the procedure given on the Apache Hadoop site for a Hadoop cluster.
> In conf/slaves I have added two datanodes, i.e. including the namenode virtual machine and the other virtual machine (datanode), and have set up passwordless ssh between both virtual machines. But now the problem is that when I run the command:
>
> bin/hadoop start-all.sh
>
> it starts only one datanode, on the same namenode virtual machine, but it doesn't start the datanode on the other machine.
>
> In logs/hadoop-datanode.log I get the message:
>
> INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 1 time(s).
> 2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 2 time(s).
> 2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 3 time(s).
> ...
>
> I have tried formatting and starting the cluster again, but I still get the same error.
>
> Can anyone help in solving this problem? :)
>
> Thanks,
> Regards,
> Ashish Pareek

-- Regards! Sugandha
Small Issues..!
Hello! I have a 4 node Hadoop cluster running. Now there is a 5th machine which is acting as a client of Hadoop; it is not a part of the Hadoop cluster (master/slave config files). Now I have to write Java code that gets executed on this client, which will simply put the client system's data into HDFS (and get it replicated over the 2 datanodes), and, as per my requirement, I can simply fetch it back onto the client machine itself. For this, I have done the following things as of now:
-> Among the 4 nodes, 2 are datanodes and the other 2 are the namenode and jobtracker respectively.
-> Now, to make that code work on the client machine, I have designed a UI. Here, on the client machine, do I need to install Hadoop?
-> I have installed Hadoop on it, and in its config file I have specified only 2 tags: 1) fs.default.name -> value = namenode's address; 2) dfs.http.address -> namenode's address.
-> Thus, if there is a file /home/hadoop/test.java on the client machine, I will have to get an instance of the HDFS filesystem via FileSystem.get, right?
-> Then, using FileUtil.copy, I will have to simply specify both filesystems (local as source, HDFS as destination), the source path /home/hadoop/test.java and the destination /user/hadoop, right? So it should work...!
-> But it gives me an error: "not able to find src path /home/hadoop/test.java". Will I have to use the RPC classes and methods in the Hadoop API to do this?
Things don't seem to be working in any of these ways. A rough sketch of what I am attempting follows below. Please help me out. Thanks!
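Roughly, this is the shape of what I have in mind (a minimal sketch only; "namenode-host:9000" is a placeholder for whatever fs.default.name points to on the cluster, and the paths are just examples):

// Minimal sketch: copy one local file from the client machine into HDFS.
// The namenode URI and both paths below are placeholders, not real values.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutIntoHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up hadoop-site.xml on the classpath
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);
        Path src = new Path("/home/hadoop/test.java");   // local file on the client
        Path dst = new Path("/user/hadoop/test.java");   // destination in HDFS
        hdfs.copyFromLocalFile(src, dst);                 // replication is handled by HDFS itself
        System.out.println("Copied " + src + " to " + dst);
    }
}

If the same copy works from the command line on that machine, the Java version with the same config should behave the same way, since both go through the same client library.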
Re: :!!
Hello! I want to execute all my code on a machine that is remote (not a part of the Hadoop cluster). This code includes file transfers between any nodes -- remote, within the Hadoop cluster, or within the same LAN -- and HDFS. I will simply have to write code for this. Is it possible? Thanks! -- Regards! Sugandha
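Roughly, the kind of code I mean is sketched below (a rough sketch only; the namenode host/port and the file paths are placeholders):

// Rough sketch: stream a local file into HDFS and back out again from a
// machine that is not part of the cluster. Host, port and paths are placeholders.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StreamInAndOut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);

        // local filesystem -> HDFS
        FileInputStream local = new FileInputStream("/tmp/input.dat");
        FSDataOutputStream out = hdfs.create(new Path("/user/hadoop/input.dat"));
        IOUtils.copyBytes(local, out, 4096, true);   // closes both streams when done

        // HDFS -> local filesystem
        FSDataInputStream in = hdfs.open(new Path("/user/hadoop/input.dat"));
        FileOutputStream back = new FileOutputStream("/tmp/copy-of-input.dat");
        IOUtils.copyBytes(in, back, 4096, true);
    }
}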
Few Issues!!!
I have a 7 node cluster. Now if I ssh to the NN and type in: hadoop -put /home/hadoop/Desktop/test.java /user/hadoop, the file gets placed in HDFS and gets replicated automatically. Now suppose the same file is on one of the datanodes, in the same location, and I want to place it in HDFS through the NN without ssh'ing to that datanode -- then what should the format of the command be? I tried hadoop -put 10.20.220.30:50133/home/hadoop/Desktop/test.java /user/hadoop (here the .30 IP is the datanode), but it didn't work. Also, I want to make this work through Java code using the APIs. So will I have to invoke RPC client and server methods to resolve this? Also, if this complete setup is executed on a remote node that has no connection with Hadoop, what kind of scenarios will I have to face? Thanks! -- Regards! Sugandha
RPC issues..!!
Hello!! I have a 7 node cluster, in which 3 machines are individually dedicated to the namenode, secondary namenode and jobtracker, and the other 4 are datanodes. Now, I want to transfer files and dump them into HDFS at the click of a button. For that purpose, I will have to write some code (preferably Java code).
-> First case: there are a few files located on one of the datanodes, or somewhere else in the cluster (it doesn't matter where). Will I have to handle RPC issues for this? Will I have to get a proxy and do all that stuff?
-> Second case: the file transfer should be between HDFS and a node that is remote (not a part of the Hadoop cluster, i.e. not in the master/slave config files). In this case, will I have to install Hadoop on that node?
My question seems to be the same every time, but it is different. I didn't get any appropriate replies as of now. I hope somebody can help me out this time! -- Regards! Sugandha
Files not getting transferred!
Hello! My files are getting transferred within the cluster of 7 nodes. But if I try to do the same thing from a host that is remote but within the same LAN, I am not able to do it. Basically, how do I specify the path of the other node in the config file, so that Hadoop knows that this particular file on this particular machine is to be transferred and dumped into HDFS? Also, this local file transfer works through the command line, but if I try to do it through Java code it doesn't work. Every time, I manually ssh to that machine and transfer the data... A sketch of what I am considering follows below. -- Regards! Sugandha
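One pattern I am considering (a rough sketch only; it assumes the code runs on the machine that actually holds the file, and "namenode-host:9000" is a placeholder for the cluster's fs.default.name):

// Rough sketch: run this on the machine that holds the file, and point it at
// the cluster's namenode in code instead of relying on a local hadoop-site.xml.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PushToCluster {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:9000");  // overrides any local config
        FileSystem hdfs = FileSystem.get(conf);
        hdfs.copyFromLocalFile(new Path("/home/hadoop/Desktop/test.java"),
                               new Path("/user/hadoop/test.java"));
    }
}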
Code not working..!
Hello! Following is the code that is not working:

package data.pkg;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class Try {
    public static void main(String[] args) {
        Configuration conf_hdfs = new Configuration();
        try {
            FileSystem hdfs_filesystem = FileSystem.get(conf_hdfs);
            FileSystem remote_filesystem = FileSystem.getLocal(conf_hdfs);
            Path in_path = new Path("/home/hadoop/Desktop/test.java");
            Path out_path = new Path("/user/hadoop");
            FileUtil.copy(remote_filesystem, in_path, hdfs_filesystem, out_path,
                          false, false, conf_hdfs);
            System.out.println("Done...!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

What I am trying to do is simply copy a file from a remote node (not a part of the master/slave config files) to HDFS (a cluster of 7 nodes). But it throws the following error:

File /home/hadoop/Desktop/test.java does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
        at data.pkg.Try.main(Try.java:24)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

-- Regards! Sugandha
Code not working..!
Hello! I am trying to transfer data from a remote node's filesystem to HDFS, but somehow it is not working.

I have a 7 node cluster. Its config file (hadoop-site.xml) contains, among other things (to keep this from getting too lengthy, I am only sending the important tags):

    fs.default.name  = hdfs://nikhilname:50130
    dfs.http.address = nikhilname:50070

Here, nikhilname is the namenode; I have specified its IP in /etc/hosts.

Then there is the 8th machine (the client, or remote node), which has this config file:

    fs.default.name  = hdfs://nikhilname:50130
    dfs.http.address = nikhilname:50070

So on the client I have pointed fs.default.name to the namenode.

Here is the code that simply tries to copy a file from the local filesystem of the remote node into HDFS, thereby leading to replication. The source path is /home/hadoop/Desktop/test.java (on the remote node) and I want to place it in HDFS under /user/hadoop.

package data.pkg;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class Try {
    public static void main(String[] args) {
        Configuration conf_hdfs = new Configuration();
        Configuration conf_remote = new Configuration();
        try {
            FileSystem hdfs_filesystem = FileSystem.get(conf_hdfs);
            FileSystem remote_filesystem = FileSystem.getLocal(conf_remote);
            // note: this concatenation prepends the FileSystem object's toString() to the path
            String in_path_name = remote_filesystem + "/home/hadoop/Desktop/test.java";
            Path in_path = new Path(in_path_name);
            String out_path_name = hdfs_filesystem + "";
            Path out_path = new Path("/user/hadoop");
            FileUtil.copy(remote_filesystem, in_path, hdfs_filesystem, out_path,
                          false, false, conf_hdfs);
            System.out.println("Done...!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

But these are the errors I get when it runs:

java.io.FileNotFoundException: File org.apache.hadoop.fs.localfilesys...@15a8767/home/hadoop/Desktop/test.java does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
        at data.pkg.Try.main(Try.java:103)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Briefly, what I have done so far:
-> Got instances of both filesystems.
-> Passed the paths appropriately.
-> Taken care of the proxy issues.
-> The file really is at /home/hadoop/Desktop/test.java on the remote node.

Also, can you tell me the difference between LocalFileSystem and RawLocalFileSystem?

Thanking You, -- Regards! Sugandha
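While re-reading the exception, I notice that the text before /home/hadoop/Desktop/test.java is the FileSystem object's toString(), so the string concatenation seems to be what mangles the source path. This is the variant I plan to try next (a rough sketch only, not a verified fix), passing the plain path directly:

// Rough sketch of the same copy with the path passed directly, instead of
// concatenating the FileSystem object into the path string.
package data.pkg;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class TryFixed {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up fs.default.name -> hdfs://nikhilname:50130
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        Path src = new Path("/home/hadoop/Desktop/test.java");   // plain local path
        Path dst = new Path("/user/hadoop");                      // existing HDFS directory
        FileUtil.copy(local, src, hdfs, dst, false, false, conf);
    }
}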
HDFS issues..!
If I want to make the data transfer fast, what am I supposed to do? I want to place the data in HDFS and have it replicated in a fraction of a second. Can that be possible, and how? Placing a 5GB file takes at least half an hour or so; and on a larger cluster, say of 7 nodes, placing it in HDFS takes around 2-3 hours. How can that delay be avoided? Also, my only aim is to transfer the data, i.e. dumping it into HDFS and getting it back whenever needed. For that kind of transfer, how can speed be achieved? -- Regards! Sugandha
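One thing I am considering (a rough, untested sketch only; all values are just illustrative) is lowering the replication factor at write time and raising it afterwards, since each extra replica adds another hop to the write pipeline:

// Rough sketch: write with replication 1, then ask for more replicas later.
// Buffer size, block size and replication values are illustrative only.
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FastPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        Path dst = new Path("/user/hadoop/big.vdi");

        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out =
            hdfs.create(dst, true, 64 * 1024, (short) 1, 64 * 1024 * 1024);
        IOUtils.copyBytes(new FileInputStream("/home/hadoop/big.vdi"), out, 64 * 1024, true);

        // add the extra replicas afterwards, off the critical path of the upload
        hdfs.setReplication(dst, (short) 2);
    }
}

Whether this actually helps will depend on the network and disks; the write speed is ultimately bounded by them.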
Re: HDFS data transfer!
But if I want to make it fast, then what? I want to place the data in HDFS and replicate it in a fraction of a second. Can that be possible, and how?

On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena wrote:
> I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB file.
> Secura
>
> On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar wrote:
> >
> > Hello!
> >
> > If I try to transfer a 5GB VDI file from a remote host (not a part of the Hadoop cluster) into HDFS, and get it back, how much time is it supposed to take?
> >
> > No map-reduce involved. Simply writing files in and out of HDFS through simple Java code (using the APIs).
> >
> > --
> > Regards!
> > Sugandha

-- Regards! Sugandha
HDFS data transfer!
Hello! If I try to transfer a 5GB VDI file from a remote host (not a part of the Hadoop cluster) into HDFS, and get it back, how much time is it supposed to take? No map-reduce involved. Simply writing files in and out of HDFS through simple Java code (using the APIs). -- Regards! Sugandha
Re: Remote connection to HDFS!
Hi Todd! I am facing many issues in transferring the data and making it work. That's why I reposted the question. My intention is not to trouble you guys! Sorry for the inconvenience.

On Mon, Jun 8, 2009 at 7:40 PM, Todd Lipcon wrote:
> Hi Sugandha,
>
> Usman has already answered your question. Please stop reposting the same question over and over.
>
> Thanks
> -Todd
>
> On Mon, Jun 8, 2009 at 7:05 AM, Sugandha Naolekar wrote:
> >
> > Hello!
> >
> > I have a 7 node cluster. Now there is an 8th machine (called the remote machine) which will be acting just as a client and not as a part of the Hadoop cluster (master/slave config files).
> >
> > Now, will I have to install Hadoop on that client machine to transfer data from the remote machine to the HDFS (namenode) machine?
> > Thus, in the remote machine's config file, I will have to point fs.default.name to the namenode, right?
> >
> > --
> > Regards!
> > Sugandha

-- Regards! Sugandha
Map-Reduce!
Hello! As far as I have read in the MapReduce forums, it is basically used to process large amounts of data quickly, right? But can you please give me some instances or examples where I can use MapReduce? -- Regards! Sugandha
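The example I keep seeing referenced is word count: counting how often each word occurs across a large set of input files, which splits naturally into a map step (emit (word, 1) per token) and a reduce step (sum the 1s per word). A rough sketch with the 0.18-era org.apache.hadoop.mapred API; the input and output paths are placeholders:

// Rough word-count sketch using the old (org.apache.hadoop.mapred) API.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tok = new StringTokenizer(value.toString());
            while (tok.hasMoreTokens()) {
                word.set(tok.nextToken());
                output.collect(word, one);   // emit (word, 1) for every token
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();  // add up the counts for this word
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));
        JobClient.runJob(conf);
    }
}

Other instances of the same pattern are log analysis, building search indexes, and aggregating clickstream data -- anything where the per-record work is independent and the results are combined per key.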
Re: Few Queries..!!!
Hello! I have a 7 node cluster. But there is one remote node (an 8th machine) within the same LAN which holds some data. Now I need to place this data into HDFS. This 8th machine is not a part of the Hadoop cluster (master/slave config files). So this is what I have thought of:
-> Get the HDFS FileSystem instance using the FileSystem API.
-> Get the local (remote machine's) filesystem instance using the same API, by passing a different config file which simply contains the fs.default.name tag.
-> Then simply use the API methods to copy the data into HDFS and get it back.
-> During the complete episode, I will have to take care of the proxy issues for the remote node to get connected to the namenode.
Is this procedure correct? Also, I am an undergraduate as of now. I want to be a part of this Hadoop project and get into the development of its various sub-projects. Would that be feasible?

Thanking You,

On Fri, Jun 5, 2009 at 11:19 PM, Alex Loddengaard wrote:
> Hi,
>
> The throughput of HDFS is good, because each read is basically a stream from several hard drives (each hard drive holds a different block of the file, and these blocks are distributed across many machines). That said, HDFS does not have very good latency, at least compared to local file systems.
>
> When you write a file using the HDFS client (whether it be Java or bin/hadoop fs), the client and the name node coordinate to put your file on various nodes in the cluster. When you use that same client to read data, your client coordinates with the name node to get block locations for a given file and does an HTTP GET request to fetch those blocks from the nodes which store them.
>
> You could in theory get data off of the local file system on your data nodes, but this wouldn't make any sense, because the client does everything for you already.
>
> Hope this clears things up.
>
> Alex
>
> On Fri, Jun 5, 2009 at 12:53 AM, Sugandha Naolekar wrote:
> >
> > Hello!
> >
> > Placing any kind of data into HDFS and then getting it back -- can this activity be fast? Also, the node from which I have to place the data into HDFS is a remote node. So, will I have to use an RPC mechanism, or can I simply get the local filesystem of that node and do things that way?
> >
> > --
> > Regards!
> > Sugandha

-- Regards! Sugandha
Placing data into HDFS..!
Hello! I have a 7 node cluster. But there is one remote node (an 8th machine) within the same LAN which holds some data. Now I need to place this data into HDFS. This 8th machine is not a part of the Hadoop cluster (master/slave config files). So this is what I have thought of:
-> Get the HDFS FileSystem instance using the FileSystem API.
-> Get the local (remote machine's) filesystem instance using the same API, by passing a different config file which simply contains the fs.default.name tag.
-> Then simply use the API methods to copy the data into HDFS and get it back.
-> During the complete episode, I will have to take care of the proxy issues for the remote node to get connected to the namenode.
Is this procedure correct? Thanking You, -- Regards! Sugandha
Re: How to place data into HDFS!
Thanks a lot! I will try it out initially with machines within the LAN and later on with remote machines. I will let you know if something gets in my way!

On Fri, Jun 5, 2009 at 3:07 PM, Usman Waheed wrote:
> I have set up machines just to act as Hadoop clients which are not part of the actual cluster (master/slave config). The only thing is that these machines acting as Hadoop clients were all internal to our network, and I have not tested with remote machines outside our internal LAN. My assumption is that if the access privileges are set correctly on the remote machines (as clients) pointing to the namenode in the cluster, you could probably place data into HDFS without issues.
>
> Thanks,
> Usman
>
>> I have a 7 node cluster working as of now. I want to place data into HDFS from a machine which is not part of the Hadoop cluster. How can I do that? It is, in a way, a remote machine.
>>
>> Will I have to use an RPC mechanism, or can I simply use the FileSystem API and do some coding to make it work?

-- Regards! Sugandha
How to place data into HDFS!
I have a 7 node cluster working as of now. I want to place data into HDFS from a machine which is not part of the Hadoop cluster. How can I do that? It is, in a way, a remote machine. Will I have to use an RPC mechanism, or can I simply use the FileSystem API and do some coding to make it work?
Few Queries..!!!
Hello! Placing any kind of data into HDFS and then getting it back -- can this activity be fast? Also, the node from which I have to place the data into HDFS is a remote node. So, will I have to use an RPC mechanism, or can I simply get the local filesystem of that node and do things that way? -- Regards! Sugandha
Few Queries..!!!
Hello! I have the following queries related to Hadoop:
-> Once I place my data in HDFS, it gets replicated and chunked automatically over the datanodes, right? Hadoop takes care of all those things.
-> Now suppose there is some third party who is not participating in the Hadoop setup, i.e. he is not one of the nodes of the Hadoop cluster, and he has some data on his local filesystem. Can I place this data into HDFS? How?
-> Then, when that third party asks for a file or a directory or any data that was previously dumped into HDFS without his knowledge -- he wants to retrieve it -- the data should get placed back on his local filesystem, in some specific directory. How can I do this? (A rough retrieval sketch follows after this list.)
-> Will I have to use MapReduce or something else to make this work?
-> Also, if I write MapReduce code for the complete activity, how will I fetch the files that are chunked into blocks in HDFS, combine (reassemble) them into a complete file, and place that file on the local filesystem of a node that is not part of the Hadoop cluster setup?
Eagerly waiting for a reply! Thanking You, Sugandha! -- Regards! Sugandha
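For the retrieval part, my understanding is that the HDFS client reassembles the blocks transparently, so something like the rough sketch below might be enough (the namenode host and both paths are placeholders):

// Rough sketch: copy a file out of HDFS onto the local filesystem of the
// machine this runs on. Host name and paths are placeholders.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);
        Path src = new Path("/user/hadoop/test.java");           // file in HDFS
        Path dst = new Path("/home/hadoop/restored/test.java");  // local destination
        hdfs.copyToLocalFile(src, dst);   // blocks are fetched and reassembled by the client
    }
}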