Re: Newbie question on block size calculation

2012-02-22 Thread Praveen Sripati
Seek time is ~10 ms. If the seek time has to be 1% of the transfer time, then
the transfer time has to be ~1000 ms (1 s).

In ~1000 ms (1 s), with a transfer rate of 100 MB/s, a block of about 100 MB
can be read.
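
As a rough sketch of the same arithmetic (illustrative only, using the numbers
quoted above):

public class BlockSizeEstimate {
    public static void main(String[] args) {
        double seekTimeSec = 0.010;       // ~10 ms average seek time
        double transferRateMBps = 100.0;  // ~100 MB/s disk transfer rate
        double seekFraction = 0.01;       // keep seek time at 1% of transfer time

        double transferTimeSec = seekTimeSec / seekFraction;      // = 1 s
        double blockSizeMB = transferRateMBps * transferTimeSec;  // = 100 MB
        System.out.println("Suggested block size ~ " + blockSizeMB + " MB");
    }
}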

Praveen

On Wed, Feb 22, 2012 at 11:22 AM, viva v  wrote:

> Have just started getting familiar with Hadoop & HDFS. Reading Tom White's
> book.
>
> The book describes an example related to HDFS block size. Here's a
> verbatim excerpt from the book
>
> "If the seek time is around 10 ms, and the transfer rate is 100 MB/s, then
> to make the seek time 1% of the transfer time, we need to make the block
> size around 100 MB."
>
> I can't seem to understand how we arrived at the conclusion that the block
> size should be around 100 MB.
>
> Could someone please help me understand?
>
> Thanks
> Viva
>


Re: Rack Awareness behaviour - Loss of rack

2012-02-10 Thread Praveen Sripati
> I have  rack awareness configured and seems to work fine.  My default rep
> count is 2.  Now I lost one rack  due to switch failure. Here is what I
> observe
>
> HDFS  continues to write in the existing available rack. It still keeps
two
> copies of each block, but now these blocks are being stored in the same
> rack.
>
> My questions:
>
> Is this the default HDFS behavior ?

Below is from 'Hadoop: The Definitive Guide'. So, with a replication factor of
2 or 3, the first and second replicas should be placed on different racks. Not
sure why they are ending up on the same rack. For availability, it makes sense
to place the two replicas of a block (replication factor 2) on two different
racks, if a second rack is available.

 Hadoop’s default strategy is to place the first replica on the same
node as the client (for clients running outside the cluster, a node is
chosen at random, although the system tries not to pick nodes that are too
full or too busy). The second replica is placed on a different rack from
the first (off-rack), chosen at random. The third replica is placed on the
same rack as the second, but on a different node chosen at random.
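
If it helps, you can check where the replicas of a file actually landed,
including the rack of each replica (the path below is just an example):

hadoop fsck /path/to/file -files -blocks -locations -racks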

Praveen

On Wed, Feb 8, 2012 at 2:47 PM, Mohamed Elsayed <
mohammed.elsay...@bibalex.org> wrote:

> On 02/07/2012 09:45 PM, Harsh J wrote:
>
>> Yes balancer may help. You'll also sometimes have to manually
>> re-enforce the block placement policy in the stable releases
>> presently, the policy violation recovery is not automatic:
>>
>> hadoop fs -setrep -R 3 /
>> hadoop fs -setrep -R 2 /
>>
> When I execute the first command it goes well, but it halts while executing
> the second one. I don't know the reason, but the replication factor does end
> up as 2 on all datanodes. Is that normal?
>
> --
> Mohamed Elsayed
>
>


Re: Setting up Federated HDFS

2012-02-09 Thread Praveen Sripati
Chandra,

In the NameNode's hdfs-site.xml, dfs.federation.nameservice.id is set to ns1,
but ns1 is not being used in the xml when defining the NameNode properties.

Here are the instructions for getting started with HDFS federation and mount
tables.

http://www.thecloudavenue.com/2012/01/getting-started-with-hdfs-federation.html

http://www.thecloudavenue.com/2012/01/hdfs-client-side-mount-table.html
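
For reference, a client-side mount table in core-site.xml looks roughly like
this (a sketch based on the 0.23 viewfs support; the hostnames, ports and
mount points are placeholders):

<property>
  <name>fs.defaultFS</name>
  <value>viewfs:///</value>
</property>
<property>
  <name>fs.viewfs.mounttable.default.link./user</name>
  <value>hdfs://nn1-host:9001/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.default.link./data</name>
  <value>hdfs://nn2-host:9001/data</value>
</property>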

Praveen

On Thu, Feb 9, 2012 at 2:12 PM, Chandrasekar wrote:

> Hello,
>   I'm still having problems configuring a cluster running
> federated HDFS, in a single node. Here's what i've done so far:
>
> 1. Extracted hdfs 0.23 tarball at 4 different locations, one for
> each of the 3 NameNode daemons and 1 for DataNode and client
>
> 2. I've added hdfs-site.xml and core-site.xml having the following
> information to each of the NameNode:
>
> core-site.xml 
> http://pastebin.com/nJaTq00s
> hdfs-site.xml 
> http://pastebin.com/DkueDHDf
>
> I've specified different port numbers, http port numbers and
> nameservice ids for each NameNode
>
> 3. Finally, to the fourth copy (DataNode) i've added the following
> configuration information:
>
> core-site.xml -
> http://pastebin.com/DYfwfi44 
> hdfs-site.xml -
> http://pastebin.com/xnmNnCKq
>
> I've specified fs.defaultFs here as I'll be using this as my
> client too.
>
>
> 4. I then go to each bin folder and run the "hadoop namenode
> -format" and "hadoop namenode"
> I haven't set the HADOOP_HOME env variable yet. (all namenodes
> run without any exception)
>
> 5. I then go to the DataNode bin folder and run "hadoop datanode".
> (no exception here)
>
> 6. Then I set the HADOOP_HOME env variable to the hadoop
> distribution from which I ran the DataNode.
>
> 7. Finally, I try to list out the contents of the viewfs file
> system by running "*hadoop fs -ls /*"
>
> The output I'm getting is only this: "*ls:
> viewfs://localhost:2/*"
>
>   Is this the right way to configure federated HDFS?
>
>
>
> On Wed, Feb 8, 2012 at 12:28, Suresh Srinivas wrote:
>
>>
>>
>> On Tue, Feb 7, 2012 at 4:51 PM, Chandrasekar wrote:
>>
>>>   In which file should i specify all this information about
>>> nameservices and the list of namenodes?
>>>
>>
>> hdfs-site.xml is the appropriate place, since it is hdfs-specific
>> configuration.
>>
>>If there are multiple namenodes, then which one should i specify
>>> in core-site.xml as fs.defaultFS?
>>>
>>
>> core-site.xml is the right place for fs.defaultFS.
>>
>> Given you have multiple namespaces in a federation setup, fs.defaultFS
>> should point to ViewFileSystem for a unified view of the namespaces to the
>> clients. There is an open bug HDFS-2558 to track this. I will get to this
>> as soon as I can.
>>
>> Regards,
>> Suresh
>>
>
>


Re: HDFS Federation Exception

2012-01-11 Thread Praveen Sripati
Suresh,

Here is the JIRA - https://issues.apache.org/jira/browse/HDFS-2778

Regards,
Praveen

On Wed, Jan 11, 2012 at 9:28 PM, Suresh Srinivas wrote:

> Thanks for figuring that out. Could you create an HDFS JIRA for this issue?
>
>
> On Wednesday, January 11, 2012, Praveen Sripati 
> wrote:
> > Hi,
> >
> > The documentation (1) suggested to set the
> `dfs.namenode.rpc-address.ns1` property to `hdfs://nn-host1:rpc-port` in
> the example. Changing the value to `nn-host1:rpc-port` (removing hdfs://)
> solved the problem. The document needs to be updated.
> >
> > (1) -
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html
> >
> > Praveen
> >
> > On Wed, Jan 11, 2012 at 3:40 PM, Praveen Sripati <
> praveensrip...@gmail.com> wrote:
> >
> > Hi,
> >
> > Got the latest code to see if any bugs were fixed and did try federation
> with the same configuration, but was getting similar exception.
> >
> > 2012-01-11 15:25:35,321 ERROR namenode.NameNode
> (NameNode.java:main(803)) - Exception in namenode join
> > java.io.IOException: Failed on local exception:
> java.net.SocketException: Unresolved address; Host Details : local host is:
> "hdfs"; destination host is: "(unknown):0;
> > at
> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:895)
> > at org.apache.hadoop.ipc.Server.bind(Server.java:231)
> > at org.apache.hadoop.ipc.Server$Listener.(Server.java:313)
> > at org.apache.hadoop.ipc.Server.(Server.java:1600)
> > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:576)
> > at
> org.apache.hadoop.ipc.WritableRpcEngine$Server.(WritableRpcEngine.java:322)
> > at
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:282)
> > at
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46)
> > at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:145)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:356)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:334)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:458)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:450)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)
> > Caused by: java.net.SocketException: Unresolved address
> > at sun.nio.ch.Net.translateToSocketException(Net.java:58)
> > at sun.nio.ch.Net.translateException(Net.java:84)
> > at sun.nio.ch.Net.translateException(Net.java:90)
> > at
> sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:61)
> > at org.apache.hadoop.ipc.Server.bind(Server.java:229)
> > ... 14 more
> > Caused by: java.nio.channels.UnresolvedAddressException
> > at sun.nio.ch.Net.checkAddress(Net.java:30)
> > at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:122)
> > at
> sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
> > ... 15 more
> >
> > Regards,
> > Praveen
> >
> > On Wed, Jan 11, 2012 at 12:24 PM, Praveen Sripati <
> praveensrip...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am trying to setup a HDFS federation and getting the below error.
> Also, pasted the core-site.xml and hdfs-site.xml at the bottom of the mail.
> Did I miss something in the configuration files?
> >
> > 2012-01-11 12:12:15,759 ERROR namenode.NameNode
> (NameNode.java:main(803)) - Exception in namenode join
> > java.lang.IllegalArgumentException: Can't parse port ''
> > at
> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198)
> > at
> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:174)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:205)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:266)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:317)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:329)
> > at org.apache.hadoop.hdfs.server.namenode.N
>


Re: HDFS Federation Exception

2012-01-11 Thread Praveen Sripati
Hi,

The documentation (1) suggests setting the `dfs.namenode.rpc-address.ns1`
property to `hdfs://nn-host1:rpc-port` in the example. Changing the value to
`nn-host1:rpc-port` (removing the hdfs:// prefix) solved the problem. The
document needs to be updated.

(1) -
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html
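
For reference, the working form of the property (scheme removed) is:

<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <!-- host:port only, without the hdfs:// prefix -->
  <value>nn-host1:rpc-port</value>
</property>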

Praveen

On Wed, Jan 11, 2012 at 3:40 PM, Praveen Sripati
wrote:

> Hi,
>
> Got the latest code to see if any bugs were fixed and did try federation
> with the same configuration, but was getting similar exception.
>
> 2012-01-11 15:25:35,321 ERROR namenode.NameNode (NameNode.java:main(803))
> - Exception in namenode join
> java.io.IOException: Failed on local exception: java.net.SocketException:
> Unresolved address; Host Details : local host is: "hdfs"; destination host
> is: "(unknown):0;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:895)
> at org.apache.hadoop.ipc.Server.bind(Server.java:231)
> at org.apache.hadoop.ipc.Server$Listener.(Server.java:313)
> at org.apache.hadoop.ipc.Server.(Server.java:1600)
> at org.apache.hadoop.ipc.RPC$Server.(RPC.java:576)
> at
> org.apache.hadoop.ipc.WritableRpcEngine$Server.(WritableRpcEngine.java:322)
> at
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:282)
> at
> org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46)
> at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:145)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:356)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:334)
>
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:458)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:450)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)
> Caused by: java.net.SocketException: Unresolved address
> at sun.nio.ch.Net.translateToSocketException(Net.java:58)
> at sun.nio.ch.Net.translateException(Net.java:84)
> at sun.nio.ch.Net.translateException(Net.java:90)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:61)
> at org.apache.hadoop.ipc.Server.bind(Server.java:229)
> ... 14 more
> Caused by: java.nio.channels.UnresolvedAddressException
> at sun.nio.ch.Net.checkAddress(Net.java:30)
> at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:122)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
> ... 15 more
>
> Regards,
> Praveen
>
> On Wed, Jan 11, 2012 at 12:24 PM, Praveen Sripati <
> praveensrip...@gmail.com> wrote:
>
>>
>> Hi,
>>
>> I am trying to setup a HDFS federation and getting the below error. Also,
>> pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I
>> miss something in the configuration files?
>>
>> 2012-01-11 12:12:15,759 ERROR namenode.NameNode
>> (NameNode.java:main(803)) - Exception in namenode join
>> java.lang.IllegalArgumentException: Can't parse port ''
>> at
>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198)
>> at
>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:174)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:205)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:266)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:317)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:329)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:458)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:450)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)
>>
>> *core-site.xml*
>>
>> <?xml version="1.0"?>
>> <configuration>
>> <property>
>>

Re: HDFS Federation Exception

2012-01-11 Thread Praveen Sripati
Hi,

Got the latest code to see if any bugs were fixed and tried federation again
with the same configuration, but got a similar exception.

2012-01-11 15:25:35,321 ERROR namenode.NameNode (NameNode.java:main(803)) -
Exception in namenode join
java.io.IOException: Failed on local exception: java.net.SocketException:
Unresolved address; Host Details : local host is: "hdfs"; destination host
is: "(unknown):0;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:895)
at org.apache.hadoop.ipc.Server.bind(Server.java:231)
at org.apache.hadoop.ipc.Server$Listener.(Server.java:313)
at org.apache.hadoop.ipc.Server.(Server.java:1600)
at org.apache.hadoop.ipc.RPC$Server.(RPC.java:576)
at
org.apache.hadoop.ipc.WritableRpcEngine$Server.(WritableRpcEngine.java:322)
at
org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:282)
at
org.apache.hadoop.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46)
at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:145)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:356)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:334)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:458)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:450)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)
Caused by: java.net.SocketException: Unresolved address
at sun.nio.ch.Net.translateToSocketException(Net.java:58)
at sun.nio.ch.Net.translateException(Net.java:84)
at sun.nio.ch.Net.translateException(Net.java:90)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:61)
at org.apache.hadoop.ipc.Server.bind(Server.java:229)
... 14 more
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:122)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
... 15 more

Regards,
Praveen

On Wed, Jan 11, 2012 at 12:24 PM, Praveen Sripati
wrote:

>
> Hi,
>
> I am trying to setup a HDFS federation and getting the below error. Also,
> pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I
> miss something in the configuration files?
>
> 2012-01-11 12:12:15,759 ERROR namenode.NameNode (NameNode.java:main(803))
> - Exception in namenode join
> java.lang.IllegalArgumentException: Can't parse port ''
> at
> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198)
> at
> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:174)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:205)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:266)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:317)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:329)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:458)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:450)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)
>
> *core-site.xml*
>
> <?xml version="1.0"?>
> <configuration>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/home/praveensripati/tmp/hadoop-0.23.0/tmp</value>
> </property>
> </configuration>
>
> *hdfs-site.xml*
>
> <?xml version="1.0"?>
> <configuration>
> <property>
> <name>dfs.replication</name>
> <value>1</value>
> </property>
> <property>
> <name>dfs.permissions</name>
> <value>false</value>
> </property>
> <property>
> <name>dfs.federation.nameservices</name>
> <value>ns1</value>
> </property>
> <property>
> <name>dfs.namenode.rpc-address.ns1</name>
> <value>hdfs://praveen-laptop:9001</value>
> </property>
> <property>
> <name>dfs.namenode.http-address.ns1</name>
> <value>praveen-laptop:50071</value>
> </property>
> <property>
> <name>dfs.namenode.secondaryhttp-address.ns1</name>
> <value>praveen-laptop:50091</value>
> </property>
> </configuration>
>
> Regards,
> Praveen
>
>


Fwd: HDFS Federation Exception

2012-01-10 Thread Praveen Sripati
Hi,

I am trying to setup a HDFS federation and getting the below error. Also,
pasted the core-site.xml and hdfs-site.xml at the bottom of the mail. Did I
miss something in the configuration files?

2012-01-11 12:12:15,759 ERROR namenode.NameNode (NameNode.java:main(803)) -
Exception in namenode join
java.lang.IllegalArgumentException: Can't parse port ''
at
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:198)
at
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:174)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:205)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:266)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:317)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:329)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:458)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:450)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:751)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:799)

*core-site.xml*

<?xml version="1.0"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/praveensripati/tmp/hadoop-0.23.0/tmp</value>
</property>
</configuration>

*hdfs-site.xml*

<?xml version="1.0"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>hdfs://praveen-laptop:9001</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>praveen-laptop:50071</value>
</property>
<property>
<name>dfs.namenode.secondaryhttp-address.ns1</name>
<value>praveen-laptop:50091</value>
</property>
</configuration>

Regards,
Praveen


Re: NameNode Safe Mode and CheckPointing

2012-01-07 Thread Praveen Sripati
During the window when the NN stops writing to the old edits file and creates
a new edits file, will file modifications still work? Curious how this is
handled in the code.

Praveen

On Sun, Jan 8, 2012 at 9:34 AM, Harsh J  wrote:

> Praveen,
>
> On 08-Jan-2012, at 9:13 AM, Praveen Sripati wrote:
>
> When the checkpointing starts, the primary namenode starts a new edits
> file. During the checkpointing process will the namenode go into safe
> mode?
>
>
> No.
>


NameNode Safe Mode and CheckPointing

2012-01-07 Thread Praveen Sripati
When the checkpointing starts, the primary namenode starts a new edits file.
During the checkpointing process, will the namenode go into safe mode?
According to Hadoop: The Definitive Guide:

> The schedule for checkpointing is controlled by two configuration
parameters. The secondary namenode checkpoints every hour
(fs.checkpoint.period in seconds) or sooner if the edit log has reached 64
MB (fs.checkpoint.size in bytes), which it checks every five minutes.
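
For reference, those two knobs look like this in the configuration (a sketch
showing the default values mentioned above):

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>      <!-- seconds between checkpoints -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>  <!-- 64 MB of edit log triggers an early checkpoint -->
</property>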

Regards,
Praveen


Re: Configuring Fully-Distributed Operation

2011-12-26 Thread Praveen Sripati
At a minimum, you need to specify the location of the namenode and the
jobtracker in the configuration files for all the nodes and the client; the
rest of the properties have working defaults. Also, based on the number of
data nodes, you may want to set the HDFS replication factor.
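
Something like the following (0.20-style property names; hostnames and ports
are placeholders):

<!-- core-site.xml on all nodes and the client -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>

<!-- mapred-site.xml on all nodes and the client -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:9001</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>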

Praveen

On Sun, Dec 25, 2011 at 7:35 PM, Mohamed Elsayed <
mohammed.elsay...@bibalex.org> wrote:

> After reading the documentation at
> http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html,
> I need an example value for each parameter in hdfs-site.xml and
> mapred-site.xml to make things clearer. What are the mandatory parameters?
> Thanks in advance.
>
> --
> Mohamed Elsayed
> Bibliotheca Alexandrina
>
>


Re: HDFS HTTP client

2011-12-06 Thread Praveen Sripati
Also check WebHDFS (1). I think neither Hoop nor WebHDFS has been merged into
Hadoop yet. Check the HDFS-2178 and HDFS-2316 JIRAs for the status.

(1) - http://hortonworks.com/webhdfs-%E2%80%93-http-rest-access-to-hdfs/
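
Once WebHDFS is available in a release, the REST calls look roughly like this
(a sketch; the hostname and paths are placeholders, and 50070 is the default
NameNode HTTP port):

# list a directory
curl -i "http://namenode-host:50070/webhdfs/v1/user/someuser?op=LISTSTATUS"

# read a file (the NameNode redirects the client to a DataNode)
curl -i -L "http://namenode-host:50070/webhdfs/v1/user/someuser/file.txt?op=OPEN"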

Regards,
Praveen

On Tue, Dec 6, 2011 at 4:39 PM, alo alt  wrote:

> Hi Simo,
>
> did you look for a tool like
> http://cloudera.github.com/hoop/docs/latest/index.html ?
>
> best,
>  Alex
>
>
> On Tue, Dec 6, 2011 at 11:55 AM, Simone Tripodi 
> wrote:
>
>> Hi all guys,
>> in the company we are looking for a solution to browse HDFS via
>> http... is there any existing technology that would allow us to do that?
>> We are also open to exploring 3rd parties, better if open source :)
>> Many thanks in advance, all the best,
>> -Simo
>>
>> http://people.apache.org/~simonetripodi/
>> http://simonetripodi.livejournal.com/
>> http://twitter.com/simonetripodi
>> http://www.99soft.org/
>>
>
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> *P **Think of the environment: please don't print this email unless you
> really need to.*
>
>
>


Re: SequenceFile with one very large value

2011-12-05 Thread Praveen Sripati
>> SequenceFiles place sync markers (similar to what 'newlines' mean in
text files) after  a bunch of records, and that is the reason why your
record does not split when read.

Sync markers are placed periodically between records and are used for moving
from an arbitrary location in the file to the start of the next record. A
mapper processes a sequence-file split from the first sync in its block to the
first sync in the next block of the file, so there might be some data transfer
from another node to the node where the task is running.
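
A rough sketch of how a reader jumps to the next sync point from an arbitrary
offset (old-style SequenceFile.Reader API; the path, split offset and the
Text/BytesWritable key/value classes are just assumptions for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SyncSkipSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path("/data/sample.seq"), conf);

        long splitStart = 64L * 1024 * 1024; // pretend the split starts at 64 MB
        reader.sync(splitStart);             // seek to the first sync after that offset

        Text key = new Text();
        BytesWritable value = new BytesWritable();
        // A real RecordReader keeps reading past the split end up to the next
        // sync; here we just show the first record after the sync point.
        if (reader.next(key, value)) {
            System.out.println(key + " -> " + value.getLength() + " bytes");
        }
        reader.close();
    }
}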

What happens with a block size of 128 MB and a record (key/value) larger than
128 MB, so that a particular block doesn't contain a sync marker? Will the
mapper for that block see that there is no sync marker and do nothing, or will
the block not be assigned to a mapper at all?

Regards,
Praveen

On Mon, Dec 5, 2011 at 10:47 AM, Harsh J  wrote:

> Florin,
>
> Based on the SequenceFileInputFormat's splitting, you should see just
> one task reading the record. SequenceFiles place sync markers (similar
> to what 'newlines' mean in text files) after  a bunch of records, and
> that is the reason why your record does not split when read.
>
> Also worth thinking about increasing block size for these files to fit
> their contents.
>
> On Thu, Oct 27, 2011 at 9:31 PM, Florin P  wrote:
> > Hello!
> >  Suppose this scenario:
> > 1. The DFS block 64MB
> > 2. We populate a SequenceFile with a binary value that has 200MB (that
> represents a PDF file)
> > In the circumstances of above scenario:
> > 1. How many blocks will be created on HDFS?
> > 2. Will the number of blocks be 200MB/64MB, approximately 4 blocks?
> > 3. How many map tasks will be created? Is it the same as the number of
> > blocks?
> > 4. If 4 mappers are created, will one mapper process the single value of
> > the file while the other three are just created and then stopped?
> >
> > I look forward for your answers.
> > Thank you.
> > Regards,
> >  Florin
> >
> >
>
>
>
> --
> Harsh J
>


Re: how to find data nodes on which a file is distributed to?

2011-11-28 Thread Praveen Sripati
Go to the NameNode web UI (default port is 50070) and select 'Browse the
filesystem' and drill down to the file. At the bottom of the page the block
report is shown. Or else 'hadoop fsck / -files -blocks -locations' from the
CLI will also give the block report for all the files in HDFS.
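
For example, for the path in your command:

hadoop fsck /user/chansup/test -files -blocks -locations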

Thanks,
Praveen

On Mon, Nov 28, 2011 at 9:29 PM, CB  wrote:

> Hi,
>
> I am new to HDFS.  I read HDFS documents on the internet but I couldn't
> figure out the following.
> Is there a way to find a list of data nodes where a file is distributed to
>  when I executed a command such as
>
> hadoop dfs -copyFromLocal /tmp/testdata /user/chansup/test
>
>
>
> Thanks,
> - Chansup
>
>


Re: Hadoop Security

2011-11-28 Thread Praveen Sripati
Hi,

> 3. Is any kind of encryption handled in Hadoop at the time of storing
> files in HDFS?

You could define a compression codec that does the encryption. Check the
below thread for more details.

http://www.mail-archive.com/common-user@hadoop.apache.org/msg06229.html
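
Registering such a codec is just configuration; a sketch (the CryptoCodec
class name is hypothetical, only DefaultCodec is a real Hadoop class):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.example.CryptoCodec</value>
</property>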

Thanks,
Praveen

On Mon, Nov 28, 2011 at 4:09 PM, Harsh J  wrote:

> Apache Hadoop 0.20.2 did not have security features in it. You'd need
> 0.20.203 at least, if not one of the current CDH3/0.20.205 (Both of
> which also carry 0.20-append along with 0.20-security).
>
> On Mon, Nov 28, 2011 at 3:56 PM, Stuti Awasthi 
> wrote:
> > Thanks Alexander for this info.
> > Currently I am using Apache Hadoop version 0.20.2 and not cloudera’s
> Hadoop version. I read that Apache Hadoop 0.20.205 supports Security. Any
> thoughts on that.
> > Since currently I am using Apache Hadoop and quite familiar with it so I
> would like to use it some more before using CDH.
> >
> >
> >
> > From: Alexander C.H. Lorenz [mailto:wget.n...@googlemail.com]
> > Sent: Monday, November 28, 2011 3:25 PM
> > To: hdfs-user@hadoop.apache.org
> > Subject: Re: Hadoop Security
> >
> > HI,
> >
> > 1. yes:
> >
> https://ccp.cloudera.com/display/CDHDOC/Configuring+Hadoop+Security+in+CDH3
> >
> http://hortonworks.com/the-role-of-delegation-tokens-in-apache-hadoop-security/
> >
> > 2. yes
> >
> http://mapredit.blogspot.com/2011/10/secure-your-hadoop-cluster-part-i.html
> >
> > 3. at the moment hdfs has no encryption engine, so far I know.
> >
> > best,
> >  Alex
> >
> > On Mon, Nov 28, 2011 at 10:44 AM, Stuti Awasthi 
> wrote:
> > Hi ,
> >
> > I wanted to know about the security in Hadoop. I have read few articles
> but not very sure about this so I wanted to discuss this topic in forum.
> > As we know that Hadoop provide its security using Filesystem permissions
> like chown, chmod etc.
> >
> > 1. Is Kerberos or any other security mechanism implemented in the code so
> > that we can authenticate and authorize users in Hadoop?
> > 2. Can we use LDAP for authentication and authorization in Hadoop?
> > 3. Is any kind of encryption handled in Hadoop at the time of storing
> > files in HDFS?
> >
> > Can anyone please provide me some good links to read on Hadoop Security
> >
> > Regards,
> > Stuti Awasthi
> >
> >
> >
> >
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > Think of the environment: please don't print this email unless you
> really need to.
> >
> >
> >
>
>
>
> --
> Harsh J
>


Re: set reduced block size for a specific file

2011-08-27 Thread Praveen Sripati
Hi,

There are tons of parameters for MapReduce. How does one know whether a
property is a client-side or a server-side property?

Thanks,
Praveen

On Sun, Aug 28, 2011 at 4:53 AM, Aaron T. Myers  wrote:

> Hey Ben,
>
> I just filed this JIRA to add this feature:
> https://issues.apache.org/jira/browse/HDFS-2293
>
> If anyone would like to implement this, I would be happy to review it.
>
> Thanks a lot,
> Aaron
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
>
>
> On Sat, Aug 27, 2011 at 4:08 PM, Ben Clay  wrote:
>
>> I didn't even think of overriding the config dir.  Thanks for the tip!
>>
>> -Ben
>>
>>
>> -Original Message-
>> From: Allen Wittenauer [mailto:a...@apache.org]
>> Sent: Saturday, August 27, 2011 6:42 PM
>> To: hdfs-user@hadoop.apache.org
>> Cc: rbc...@ncsu.edu
>> Subject: Re: set reduced block size for a specific file
>>
>>
>> On Aug 27, 2011, at 12:42 PM, Ted Dunning wrote:
>>
>> > There is no way to do this for standard Apache Hadoop.
>>
>>Sure there is.
>>
>>You can build a custom conf dir and point it to that. You *always*
>> have that option for client-settable options as a workaround for missing
>> features/bugs.
>>
>>1. Copy $HADOOP_CONF_DIR or $HADOOP_HOME/conf to a dir
>>2. modify the hdfs-site.xml to have your new block size
>>3. Run the following:
>>
>> HADOOP_CONF_DIR=mycustomconf hadoop dfs  -put file dir
>>
>>Convenient?  No.  Doable? Definitely.
>>
>>
>>
>>
>


Hadoop Jar Files

2011-05-30 Thread Praveen Sripati
Hi,

I have extracted the hadoop-0.20.2, hadoop-0.20.203.0 and hadoop-0.21.0
files.

In the hadoop-0.21.0 folder, the hadoop-hdfs-0.21.0.jar,
hadoop-mapred-0.21.0.jar and hadoop-common-0.21.0.jar files are present. But
in the hadoop-0.20.2 and hadoop-0.20.203.0 releases these files are missing.

Have the jar files been packaged differently in the 0.20.2 and 0.20.203.0
releases or should I get these jars from some other projects?

Thanks,
Praveen


Re: DataNode not able to talk to NameNode

2010-04-01 Thread Praveen Sripati
Hi,

Thanks for the quick response. I wish the documentation were a bit clearer on
this.

Now it works. I get "Live Datanodes : 1" in the NameNode console.

>> modify fs.default.name from hdfs://localhost:9050 to hdfs://master:9050

Just curious, how does it impact the Socket on the NameNode?

Praveen

On Fri, Apr 2, 2010 at 7:49 AM, zhu weimin  wrote:

>  Hi
>
>
>
> >Why is that the DataNode not able to Connect at port 9050 on the NameNode,
> while the SocketClient.java connects to ?>SocketServer.java on port 9050? Is
> there anything different that the NameNode creates a socket?
>
>
>
> modify fs.default.name from hdfs://localhost:9050 to hdfs://master:9050
>
> and modify mapred.job.tracker from localhost:9001 to master:9001
>
>
>
>
>
> zhuweimin
>
>
>
>
>
>  *From:* Praveen Sripati [mailto:praveensrip...@gmail.com]
> *Sent:* Friday, April 02, 2010 11:00 AM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* DataNode not able to talk to NameNode
>
>
>
> Hi,
>
> I am trying to setup Hadoop on a two node cluster, both using Ubuntu 9.10.
> I have configured one node as NameNode/JobTracker and the other as
> DataNode/TaskTracker.
>
> I have the following in the hosts file for the master and the slave
>
> master> cat /etc/hosts
> 192.168.0.100 master
> 192.168.0.102 slave
> 127.0.0.1 localhost
>
> slave> cat /etc/hosts
> 192.168.0.100 master
> 192.168.0.102 slave
> 127.0.0.1 localhosts
>
> and the configuration file on the master and the slave has
>
> master -> core-site.xml -> fs.default.name->hdfs://localhost:9050
> -> hdfs-site.xml -> dfs.replication->1
> -> mapred-site.xml -> mapred.job.tracker->localhost:9001
>
> slave -> core-site.xml -> fs.default.name->hdfs://master:9050
> -> hdfs-site.xml -> dfs.replication->1
> -> mapred-site.xml -> mapred.job.tracker->master:9001
>
>
> When I run the command start-dfs.sh, the NameNode starts without any errors
> and the script tries to start the DataNode. But, the DataNode is not able to
> connect to the MasterNode. The following is in the
> hadoop-praveensripati-datanode-slave.log file
>
> 2010-04-02 06:54:35,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: master/192.168.0.100:9050. Already tried 9 time(s).
> 2010-04-02 06:54:35,645 INFO org.apache.hadoop.ipc.RPC: Server at master/
> 192.168.0.100:9050 not available yet, Z...
>
> 1. Able to ping the master from the slave and the other way.
> 2. Able to ssh into slave from master and other way.
> 3. Disabled ipv6 on master and slave. /etc/sysctl.conf has
> net.ipv6.conf.all.disable_ipv6 = 1.
>
> I wrote a Java SocketClient Program to connect from the DataNode to the
> NameNode at port 9050 and I get the following exception
>
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:525)
> at SocketClient.main(SocketClient.java:23)
>
> Then, I stop the NameNode and DataNode and then by using Java Programs I
> create a socket (at 9050) on the NameNode and am able to connect from the
> DataNode using Java Program.
>
> ServerSocket.java has
>
> int port = Integer.parseInt(args[0]);
> ServerSocket srv = new ServerSocket(port);
> Socket socket = srv.accept();
>
> SocketClient.java has
>
> InetAddress addr = InetAddress.getByName(args[0]);
> int port = Integer.parseInt(args[1]);
> SocketAddress sockaddr = new InetSocketAddress(addr, port);
> Socket sock = new Socket();
> int timeoutMs = 2000;
> sock.connect(sockaddr, timeoutMs);
>
> When I do 'netstat -a | grep 9050' I get
>
> When NameNode creates the Socket -> tcp0  0
> localhost:9050  *:* LISTEN
> When Java Program creates a Socket -> tcp0  0
> *:9050  *:* LISTEN
>
> Why is that the DataNode not able to Connect at port 9050 on the NameNode,
> while the SocketClient.java connects to SocketServer.java on port 9050? Is
> there anything different that the NameNode creates a socket?
> --
> Praveen
>



-- 
Praveen


DataNode not able to talk to NameNode

2010-04-01 Thread Praveen Sripati
Hi,

I am trying to setup Hadoop on a two node cluster, both using Ubuntu 9.10. I
have configured one node as NameNode/JobTracker and the other as
DataNode/TaskTracker.

I have the following in the hosts file for the master and the slave

master> cat /etc/hosts
192.168.0.100 master
192.168.0.102 slave
127.0.0.1 localhost

slave> cat /etc/hosts
192.168.0.100 master
192.168.0.102 slave
127.0.0.1 localhosts

and the configuration file on the master and the slave has

master -> core-site.xml -> fs.default.name->hdfs://localhost:9050
-> hdfs-site.xml -> dfs.replication->1
-> mapred-site.xml -> mapred.job.tracker->localhost:9001

slave -> core-site.xml -> fs.default.name->hdfs://master:9050
-> hdfs-site.xml -> dfs.replication->1
-> mapred-site.xml -> mapred.job.tracker->master:9001


When I run the command start-dfs.sh, the NameNode starts without any errors
and the script tries to start the DataNode. But, the DataNode is not able to
connect to the MasterNode. The following is in the
hadoop-praveensripati-datanode-slave.log file

2010-04-02 06:54:35,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: master/192.168.0.100:9050. Already tried 9 time(s).
2010-04-02 06:54:35,645 INFO org.apache.hadoop.ipc.RPC: Server at master/
192.168.0.100:9050 not available yet, Z...

1. Able to ping the master from the slave and the other way.
2. Able to ssh into slave from master and other way.
3. Disabled ipv6 on master and slave. /etc/sysctl.conf has
net.ipv6.conf.all.disable_ipv6 = 1.

I wrote a Java SocketClient Program to connect from the DataNode to the
NameNode at port 9050 and I get the following exception

java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at SocketClient.main(SocketClient.java:23)

Then, I stop the NameNode and DataNode and then by using Java Programs I
create a socket (at 9050) on the NameNode and am able to connect from the
DataNode using Java Program.

ServerSocket.java has

int port = Integer.parseInt(args[0]);
ServerSocket srv = new ServerSocket(port);
Socket socket = srv.accept();

SocketClient.java has

InetAddress addr = InetAddress.getByName(args[0]);
int port = Integer.parseInt(args[1]);
SocketAddress sockaddr = new InetSocketAddress(addr, port);
Socket sock = new Socket();
int timeoutMs = 2000;
sock.connect(sockaddr, timeoutMs);

When I do 'netstat -a | grep 9050' I get

When the NameNode creates the socket  -> tcp  0  0  localhost:9050  *:*  LISTEN
When the Java program creates a socket -> tcp  0  0  *:9050  *:*  LISTEN

Why is the DataNode not able to connect to port 9050 on the NameNode, while
SocketClient.java can connect to SocketServer.java on port 9050? Is there
anything different about the way the NameNode creates its socket?
-- 
Praveen