Rack Configuration::!

2009-06-16 Thread Sugandha Naolekar
Hello!

How to configure the machines in different racks?

I have in all 10 machines.

Now I want the hierarchy as follows:

machine1
machine2
machine3   --> these are all DN and TT
machine4

machine5   --> JT1

machine7
machine8   --> JT2

machine10  --> NN and Sec. NN


As of now, I have 7 machines running a Hadoop cluster, which follows the
hierarchy below:

machine1
machine2   --> these are all DN and TT
machine3
machine4

machine5   --> JT

machine6   --> Sec. NN

machine7   --> NN

Also, if the machines are configured in different racks, what advantage do
we gain? Also, please give me a few problem statements that involve
processing large amounts of data. What have the Yahoo and Amazon folks done?
What kind of processing of huge data have they handled?
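
For reference, rack placement in this Hadoop line is driven by a pluggable
host-to-rack mapping: either a script named by topology.script.file.name, or
a Java class set via topology.node.switch.mapping.impl (both assumed to be
available in 0.18-era releases). A minimal sketch of such a mapping class,
with placeholder host and rack names:

package rack.example;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;

// Sketch of a static hostname-to-rack mapping. Configured (assumption:
// via topology.node.switch.mapping.impl in hadoop-site.xml) it lets the
// NameNode and JobTracker place replicas and tasks rack-aware.
public class StaticRackMapping implements DNSToSwitchMapping {
    public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<String>();
        for (String host : names) {
            // Placeholder host names; replace with the real DN/TT hosts.
            if (host.startsWith("machine1") || host.startsWith("machine2")
                    || host.startsWith("machine3") || host.startsWith("machine4")) {
                racks.add("/rack1");
            } else {
                racks.add("/rack2");
            }
        }
        return racks;
    }
}

With the machines mapped to more than one rack, HDFS places at least one
replica on a different rack, so losing a whole rack does not lose the data,
and MapReduce can prefer rack-local reads.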


-- 
Regards!
Sugandha


Re: org.apache.hadoop.ipc.client : trying connect to server failed

2009-06-15 Thread Sugandha Naolekar
Hi Ashish!

Try the following things:

-> Check the config file (hadoop-site.xml) of the namenode.
-> Make sure the value you have given for the dfs.datanode.address tag is
correct: its IP and the name of that machine.
-> Also, check the name added to the /etc/hosts file.
-> Check that the ssh keys of the datanodes are present in the namenode's
known_hosts file.
-> Check the value of dfs.datanode.address in the datanode's config file.
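
A quick way to rule out basic connectivity problems is to check that the
datanode host can actually reach the namenode RPC address shown in the retry
messages quoted below; a minimal sketch in plain Java (the host and port are
placeholders, taken from fs.default.name):

import java.net.InetSocketAddress;
import java.net.Socket;

public class CheckNameNodeReachable {
    public static void main(String[] args) throws Exception {
        // Placeholder values; use the host/port from fs.default.name.
        String host = (args.length > 0) ? args[0] : "hadoop1";
        int port = (args.length > 1) ? Integer.parseInt(args[1]) : 9000;

        Socket socket = new Socket();
        try {
            // Fails fast if the hosts entries or firewalling are wrong.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Reached " + host + ":" + port);
        } finally {
            socket.close();
        }
    }
}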



On Tue, Jun 16, 2009 at 10:58 AM, ashish pareek  wrote:

> Hi,
> I am trying to set up a Hadoop cluster on a 3 GB machine, using Hadoop
> 0.18.3, and have followed the procedure given on the Apache Hadoop site for
> a Hadoop cluster.
> In conf/slaves I have added two datanodes, i.e. including the namenode
> virtual machine and the other virtual machine (datanode), and have
> set up passwordless ssh between both virtual machines. But now the problem
> is that when I run the command:
>
> bin/hadoop start-all.sh
>
> It starts only one datanode, on the same namenode virtual machine, but it
> doesn't start the datanode on the other machine.
>
> In logs/hadoop-datanode.log I get the message:
>
>
>  INFO org.apache.hadoop.ipc.Client: Retrying
>  connect to server: hadoop1/192.168.1.28:9000. Already tried 1 time(s).
>
>  2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying
>  connect to server: hadoop1/192.168.1.28:9000. Already tried 2 time(s).
>
>  2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying
>  connect to server: hadoop1/192.168.1.28:9000. Already tried 3 time(s).
>
>
> ...
> I have tried formatting and starting the cluster again, but I still
> get the same error.
>
> So can anyone help in solving this problem? :)
>
> Thanks
>
> Regards
>
> Ashish Pareek
>



-- 
Regards!
Sugandha


Small Issues..!

2009-06-14 Thread Sugandha Naolekar
Hello!

I have a 4 node Hadoop cluster running. Now, there is a 5th machine which
is acting as a client of Hadoop. It's not a part of the Hadoop
cluster (master/slave config files). Now I have to write Java code that
gets executed on this client, which will simply put the client system's data
into HDFS (and get it replicated over 2 datanodes), and, as per my requirement,
I can simply fetch it back on the client machine itself.

For this, I have done the following things as of now:

***
-> Among the 4 nodes, 2 are datanodes and the other 2 are the namenode and
jobtracker respectively.
***

***
-> Now, to make that code work on the client machine, I have designed a UI.
Here on the client machine, do I need to install Hadoop?
***

***
-> I have installed Hadoop on it, and in its config file, I have specified
only 2 tags:
   1) fs.default.name -> value = the namenode's address.
   2) dfs.http.address (the namenode's address)
***

***
Thus, if there is a file at /home/hadoop/test.java on the client machine, I
will have to get an instance of the HDFS filesystem via FileSystem.get, right?
***

***
Then, using FileUtil, I will have to simply specify both filesystems (local
as source, HDFS as destination), with the source path as
/home/hadoop/test.java and the destination as /user/hadoop/, right?
So it should work...!
***

***
-> But it gives me an error: "not able to find src path
/home/hadoop/test.java".

-> Will I have to use RPC classes and methods from the Hadoop API to do this?
***

***
 Things don't seem to be working in any of these ways. Please help me out.
***
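
For what it's worth, a minimal sketch of the approach described above, run
on the client machine (assuming the 0.18-era FileSystem API; the namenode
address and the paths are placeholders, and no RPC classes are used directly):

package client.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutFromClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same effect as the two tags in the client's hadoop-site.xml;
        // "namenodehost:50130" is a placeholder for the real address.
        conf.set("fs.default.name", "hdfs://namenodehost:50130");

        // FileSystem.get() returns the HDFS client; it talks to the
        // namenode on your behalf.
        FileSystem hdfs = FileSystem.get(conf);

        Path src = new Path("/home/hadoop/test.java");  // local file on the client
        Path dst = new Path("/user/hadoop/test.java");  // destination in HDFS

        // Reads the local file and writes it into HDFS, where it is
        // replicated according to dfs.replication.
        hdfs.copyFromLocalFile(src, dst);

        System.out.println("Copied " + src + " to " + dst);
    }
}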

Thanks!


Re: :!!

2009-06-14 Thread Sugandha Naolekar
Hello!

I want to execute all my code on a machine that's remote (not a part of the
Hadoop cluster). This code includes file transfers between any nodes
(remote, within the Hadoop cluster, or within the same LAN) and HDFS. I will
simply have to write code for this.

Is it possible?

Thanks,
Regards-

-- 
Regards!
Sugandha


Few Issues!!!

2009-06-12 Thread Sugandha Naolekar
I have a 7 node cluster.

Now, if I ssh to the NN and type in hadoop -put /home/hadoop/Desktop/test.java
/user/hadoop, the file gets placed in HDFS and gets replicated
automatically.

Now, suppose the same file is on one of the datanodes, at the same location,
and I want to place it in HDFS through the NN without ssh'ing to that
datanode: what should the format of the command be?

I tried: hadoop -put 10.20.220.30:50133/home/hadoop/Desktop/test.java
/user/hadoop (here, the .30 IP is the datanode).


But it didn't work. Also, I want to make it work through Java code by using
the APIs. So will I have to invoke RPC client and server methods to
resolve this?
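
One thing to keep in mind is that the FileSystem client copies from the
local filesystem of the machine it runs on, so a sketch like the following
would itself be run on the datanode that holds the file, pointing at the
namenode (hypothetical address; 0.18-era API assumed):

package client.example;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutFromDatanode {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Placeholder namenode URI; the local path below must exist on
        // the machine where this program runs.
        FileSystem hdfs = FileSystem.get(
                URI.create("hdfs://namenodehost:50130"), conf);

        hdfs.copyFromLocalFile(new Path("/home/hadoop/Desktop/test.java"),
                new Path("/user/hadoop/test.java"));
    }
}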


Also, if this complete setup is executed from a remote node that has no
connection with Hadoop, what kind of scenarios will I have to face?

Thanks!

-- 
Regards!
Sugandha


Re::!

2009-06-12 Thread Sugandha Naolekar
Hello!

I want to execute all my code on a machine that's remote (not a part of the
Hadoop cluster). This code includes file transfers between any nodes
(remote, within the Hadoop cluster, or within the same LAN) and HDFS. I will
simply have to write code for this.

Is it possible?

Thanks,
Regards-


RPC issues..!!

2009-06-12 Thread Sugandha Naolekar
Hello!!


I have a 7 node cluster, in which 3 machines are individually dedicated to
the namenode, secondary NN and jobtracker, and the other 4 are datanodes.
Now, I want to transfer files and dump them into HDFS at the click of a
button. For that purpose, I will have to write some code (preferably Java
code).

-> Now, there are a few files that are located on one of the datanodes, or
somewhere else in the cluster (it doesn't matter where). So, will I have to
handle RPC issues for this? Will I have to get a proxy for it and do all
that stuff?

This was the first case.

-> The second case is this: the file transfer should be between a node
that's remote (not a part of the Hadoop cluster, i.e. not in the
master/slave config files) and HDFS. In this case, will I have to install
Hadoop on that node?

My question seems the same every time, but it is different.

I haven't received any appropriate replies so far. I hope somebody can help
me out this time!

-- 
Regards!
Sugandha


Files not getting transferred!

2009-06-12 Thread Sugandha Naolekar
Hello!


My files are getting transferred within the cluster of 7 nodes. But if I
try to do the same thing with a host that's remote but within the same
LAN, I am not able to do it. Basically, how do I specify the path on the
other node in the config file, so that it knows that this particular file
on this particular machine is to be transferred and dumped into HDFS?

Also, this local file transfer works through the command line; but if I try
to do it through Java code, it doesn't work. Every time, I manually ssh to
that machine and transfer the data...

-- 
Regards!
Sugandha


Code not working..!

2009-06-11 Thread Sugandha Naolekar
Hello!

Following is the code that's not working:

package data.pkg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class Try
{
    public static void main(String[] args)
    {
        // Configuration is loaded from hadoop-site.xml on the classpath,
        // so fs.default.name decides which HDFS this client talks to.
        Configuration conf_hdfs = new Configuration();

        try
        {
            FileSystem hdfs_filesystem = FileSystem.get(conf_hdfs);        // HDFS
            FileSystem remote_filesystem = FileSystem.getLocal(conf_hdfs); // local FS of this machine

            // The source must exist on the machine where this code runs.
            Path in_path = new Path("/home/hadoop/Desktop/test.java");
            Path out_path = new Path("/user/hadoop");

            // copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf)
            FileUtil.copy(remote_filesystem, in_path, hdfs_filesystem,
                    out_path, false, false, conf_hdfs);

            System.out.println("Done...!");
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}




What I am trying to do is simply copy a file from a remote node (not a part
of the master/slave config files) to HDFS (a cluster of 7 nodes).

But it's throwing the errors as follows:

*

 File /home/hadoop/Desktop/test.java does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
at data.pkg.Try.main(Try.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
**

-- 
Regards!
Sugandha


Code not working..!

2009-06-11 Thread Sugandha Naolekar
Hello!

I am trying to transfer data from a remote node's filesystem to HDFS. But
somehow it's not working!

***
I have a 7 node cluster. Its config file (hadoop-site.xml) is as follows:


<property>
  <name>fs.default.name</name>
  <value>hdfs://nikhilname:50130</value>
</property>

<property>
  <name>dfs.http.address</name>
  <value>nikhilname:50070</value>
</property>


To keep this from getting too lengthy, I am sending you just the important
tags. Here, nikhilname is the namenode; I have specified its IP in /etc/hosts.





**
Then, here is the 8th machine (client or remote), which has this config file:









<property>
  <name>fs.default.name</name>
  <value>hdfs://nikhilname:50130</value>
</property>

<property>
  <name>dfs.http.address</name>
  <value>nikhilname:50070</value>
</property>




Here, I have pointed fs.default.name to the namenode

**


**
Then, here is the code that simply tries to copy a file from the local
filesystem (of the remote node) and place it into HDFS, thereby leading to
replication.

The path is /home/hadoop/Desktop/test.java (on the remote node).
I want to place it in HDFS (/user/hadoop).

package data.pkg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class Try
{
    public static void main(String[] args)
    {
        Configuration conf_hdfs = new Configuration();
        Configuration conf_remote = new Configuration();

        try
        {
            FileSystem hdfs_filesystem = FileSystem.get(conf_hdfs);          // HDFS
            FileSystem remote_filesystem = FileSystem.getLocal(conf_remote); // local FS of this machine

            // Note: this concatenation prepends the FileSystem object's
            // toString() to the path, which is the strange
            // "...localfilesystem@..." prefix seen in the exception below.
            String in_path_name = remote_filesystem + "/home/hadoop/Desktop/test.java";
            Path in_path = new Path(in_path_name);

            String out_path_name = hdfs_filesystem + "";   // computed but never used
            Path out_path = new Path("/user/hadoop");

            FileUtil.copy(remote_filesystem, in_path, hdfs_filesystem,
                    out_path, false, false, conf_hdfs);

            System.out.println("Done...!");
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}





But the following are the errors I am getting after its execution:

java.io.FileNotFoundException: File
org.apache.hadoop.fs.localfilesys...@15a8767/home/hadoop/Desktop/test.java
does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:192)
at data.pkg.Try.main(Try.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
**


Briefly, what I have done as of now:


-> Got instances of both filesystems.
-> Passed the paths appropriately.
-> I have also taken care of proxy issues.
-> The file is also present at /home/hadoop/Desktop/test.java on the remote node.

***Also, can you tell me the difference between LocalFileSystem and
RawLocalFileSystem?

Thanking You,


-- 
Regards!
Sugandha


HDFS issues..!

2009-06-10 Thread Sugandha Naolekar
If I want to make the data transfer fast, then what am I supposed to do? I
want to place the data in HDFS and have it replicated in a fraction of a
second. Is that possible, and how? Placing a 5 GB file will take at least
half an hour or so; and on a larger cluster, let's say of 7 nodes, placing
it in HDFS could take around 2-3 hours. So how can that time delay be
avoided?

Also, my aim is simply to transfer the data, i.e. dumping the data into
HDFS and getting it back whenever needed. So, for this transfer, what speed
can be achieved?
-- 
Regards!
Sugandha


Re: HDFS data transfer!

2009-06-10 Thread Sugandha Naolekar
But if I want to make it fast, then what? I want to place the data in HDFS
and replicate it in a fraction of a second. Is that possible, and how?

On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena  wrote:

> I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB
> file.
> Secura
>
> On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
> wrote:
>
> > Hello!
> >
> > If I try to transfer a 5GB VDI file from a remote host(not a part of
> hadoop
> > cluster) into HDFS, and get it back, how much time is it supposed to
> take?
> >
> > No map-reduce involved. Simply Writing files in and out from HDFS through
> a
> > simple code of java (usage of API's).
> >
> > --
> > Regards!
> > Sugandha
> >
>



-- 
Regards!
Sugandha


HDFS data transfer!

2009-06-09 Thread Sugandha Naolekar
Hello!

If I try to transfer a 5 GB VDI file from a remote host (not a part of the
Hadoop cluster) into HDFS, and get it back, how much time is it supposed to
take?

No MapReduce involved; simply writing files in and out of HDFS through a
simple piece of Java code (usage of the APIs).
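
Actual numbers depend on the network and disks, so a rough way to measure
it on a given cluster is to time the round trip through the FileSystem API;
a minimal sketch (0.18-era API assumed; the namenode address and paths are
placeholders):

package client.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RoundTripTiming {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenodehost:50130"); // placeholder

        FileSystem hdfs = FileSystem.get(conf);

        Path local  = new Path("/home/hadoop/big.vdi");       // placeholder local file
        Path inHdfs = new Path("/user/hadoop/big.vdi");       // destination in HDFS
        Path back   = new Path("/home/hadoop/big.copy.vdi");  // copy fetched back

        long t0 = System.currentTimeMillis();
        hdfs.copyFromLocalFile(local, inHdfs);   // write into HDFS
        long t1 = System.currentTimeMillis();
        hdfs.copyToLocalFile(inHdfs, back);      // read back out
        long t2 = System.currentTimeMillis();

        System.out.println("put: " + (t1 - t0) / 1000.0 + " s");
        System.out.println("get: " + (t2 - t1) / 1000.0 + " s");
    }
}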

-- 
Regards!
Sugandha


Re: Remote connection to HDFS!

2009-06-08 Thread Sugandha Naolekar
Hi Todd!

I am facing many issues in transferring the data and making it work. That's
why I reposted the question. My intention is not to trouble you guys!

Sorry for the inconveniences.

On Mon, Jun 8, 2009 at 7:40 PM, Todd Lipcon  wrote:

> Hi Sugandha,
>
> Usman has already answered your question. Please stop reposting the same
> question over and over.
>
> Thanks
> -Todd
>
> On Mon, Jun 8, 2009 at 7:05 AM, Sugandha Naolekar wrote:
>
> > Hello!
> >
> > I have a 7 node cluster. Now there is an 8th machine (called the remote)
> which
> > will be acting just as a client and not as a part of the hadoop
> > cluster (master-slave config files).
> >
> >
> > Now, will I have to install hadoop on that client machine to transfer the
> > data from that remote machine to the hdfs (namenode) machine?
> > Thus, in the remote's config file, I will have to point fs.default.name to
> > the namenode, right?
> >
> > --
> > Regards!
> > Sugandha
> >
>



-- 
Regards!
Sugandha


Map-Reduce!

2009-06-08 Thread Sugandha Naolekar
Hello!

As far as I have read on the MapReduce forums, it is basically used to
process large amounts of data quickly, right?

But can you please give me some instances or examples where I can use
MapReduce?
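
The canonical example is word count: counting how often each word occurs
across a large set of text files, with the map phase emitting (word, 1)
pairs and the reduce phase summing them. A minimal sketch against the old
(0.18-era) mapred API, with placeholder input and output paths:

package mr.example;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

    // Emits (word, 1) for every word in every input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Sums the counts for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // Placeholder HDFS paths for the input text and the job output.
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));

        JobClient.runJob(conf);
    }
}

Typical real uses along the same lines are log analysis, building inverted
indexes for search, and aggregating clickstream or sales data.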



-- 
Regards!
Sugandha


Re: Few Queries..!!!

2009-06-08 Thread Sugandha Naolekar
Hello!

I have a 7 node cluster. But there is one remote node (an 8th machine) within
the same LAN which holds some data. Now, I need to place this data
into HDFS. This 8th machine is not a part of the Hadoop
cluster (master/slave config files).

So, what I have thought is:
-> I will get the HDFS filesystem instance by using the FileSystem API.
-> I will get the local (remote machine's) filesystem instance by using the
same API, passing a different config file which simply states the
fs.default.name tag.

-> And then I will simply use the available methods to copy the data and
get it back from HDFS...
-> During the complete episode, I will have to take care of the proxy issues
for the remote node to get connected to the namenode.

Is this procedure correct?

Also, I am an undergraduate as of now. I want to be a part of this Hadoop
project and get involved in the development of its various sub-projects.
Would that be feasible?

Thanking You,


On Fri, Jun 5, 2009 at 11:19 PM, Alex Loddengaard  wrote:

> Hi,
>
> The throughput of HDFS is good, because each read is basically a stream
> from
> several hard drives (each hard drive holds a different block of the file,
> and these blocks are distributed across many machines).  That said, HDFS
> does not have very good latency, at least compared to local file systems.
>
> When you write a file using the HDFS client (whether it be Java or
> bin/hadoop fs), the client and the name node coordinate to put your file on
> various nodes in the cluster.  When you use that same client to read data,
> your client coordinates with the name node to get block locations for a
> given file and does a HTTP GET request to fetch those blocks from the nodes
> which store them.
>
> You could in theory get data off of the local file system on your data
> nodes, but this wouldn't make any sense, because the client does everything
> for you already.
>
> Hope this clears things up.
>
> Alex
>
> On Fri, Jun 5, 2009 at 12:53 AM, Sugandha Naolekar
> wrote:
>
> > Hello!
> >
> > Placing any kind of data into HDFS and then getting it back, can this
> > activity be fast? Also, the node of which I have to place the data in
> HDFS,
> > is a remote node. So then, will I have to use RPC mechnaism or simply cna
> > get the locla filesystem of that node and do the things?
> >
> > --
> > Regards!
> > Sugandha
> >
>
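
To make that read path concrete, a minimal sketch of streaming a file back
out of HDFS with the client API (0.18-era API assumed; the namenode address
and file path are placeholders):

package client.example;

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenodehost:50130"); // placeholder

        FileSystem hdfs = FileSystem.get(conf);

        InputStream in = null;
        try {
            // The client asks the namenode for block locations and then
            // streams the blocks from the datanodes that hold them.
            in = hdfs.open(new Path("/user/hadoop/test.java")); // placeholder path
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}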



-- 
Regards!
Sugandha


Placing data into HDFS..!

2009-06-07 Thread Sugandha Naolekar
Hello!

I have a 7 node cluster. But there is one remote node (an 8th machine) within
the same LAN which holds some data. Now, I need to place this data
into HDFS. This 8th machine is not a part of the Hadoop
cluster (master/slave config files).

So, what I have thought is:
-> I will get the HDFS filesystem instance by using the FileSystem API.
-> I will get the local (remote machine's) filesystem instance by using the
same API, passing a different config file which simply states the
fs.default.name tag.

-> And then I will simply use the available methods to copy the data and
get it back from HDFS...
-> During the complete episode, I will have to take care of the proxy issues
for the remote node to get connected to the namenode.

Is this procedure correct?

Thanking You,

-- 
Regards!
Sugandha


Re: How to place a data ito HDFS::!

2009-06-05 Thread Sugandha Naolekar
Thanks a lot! I will try it out initially with the machines within the LAN
and then later on with the remote machines.

Will let you know if something gets in my way!


On Fri, Jun 5, 2009 at 3:07 PM, Usman Waheed  wrote:

> I have set up machines just to act as HADOOP clients which are not part of
> the actual cluster (master/slave config). The only thing is that these
> machines acting as hadoop clients were all internal to our network and I
> have not tested with remote machines outside our internal LAN. My assumption
> is that if the access privileges are set right from the remote machines (as
> clients) pointing to the namenode in the cluster, you could probably place
> data into HDFS without issues.
>
> Thanks,
> Usman
>
>  I have a 7 node cluster working as of now. I want to place the data into
>> HDFS, from a machine which is not a part of the hadoop cluster. How can I
>> do that? It's, in a way, a remote machine.
>>
>> Will I have to use the RPC mechanism, or can I simply use the FileSystem
>> API and do some kind of coding to make it work?
>>
>>
>>
>
>


-- 
Regards!
Sugandha


How to place a data ito HDFS::!

2009-06-05 Thread Sugandha Naolekar
I have a 7 node cluster working as of now. I want to place data into HDFS
from a machine which is not a part of the Hadoop cluster. How can I do that?
It's, in a way, a remote machine.

Will I have to use the RPC mechanism, or can I simply use the FileSystem API,
do some coding, and make it work?


Few Queries..!!!

2009-06-05 Thread Sugandha Naolekar
Hello!

Placing any kind of data into HDFS and then getting it back: can this
activity be fast? Also, the node whose data I have to place into HDFS is a
remote node. So, will I have to use the RPC mechanism, or can I simply get
the local filesystem of that node and do things that way?

-- 
Regards!
Sugandha


Few Queries..!!!

2009-06-05 Thread Sugandha Naolekar
Hello!

I have the following queries related to Hadoop:

-> Once I place my data in HDFS, it gets replicated and chunked
automatically over the datanodes, right? Hadoop takes care of all those
things.

-> Now, suppose there is some third party who is not participating in the
Hadoop setup; that is, their machine is not one of the nodes of the Hadoop
cluster. They have some data on their local filesystem. Can I place this
data into HDFS? How?

-> Then, when that third party asks for a file, a directory, or any kind of
data that was previously dumped into HDFS without their knowledge, they want
it back (want to retrieve it). Thus, the data should be placed on their
local filesystem again, in some specific directory. How can I do this?

-> Will I have to use MapReduce or something else to make it work?

-> Also, if I write MapReduce code for the complete activity, how will I
fetch the data or the files that are chunked in HDFS in the form of blocks,
combine (reassemble) them into a complete file, and place it on the local
filesystem of a node that is not a part of the Hadoop cluster setup?

Eagerly waiting for a reply!

Thanking You,
Sugandha!



-- 
Regards!
Sugandha

