Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread பாலாஜி நாராயணன்
On 11 July 2013 06:27, Hao Ren  wrote:

> Hi,
>
> I am running a hdfs on Amazon EC2
>
> Say, I have an FTP server that stores some data.
>

> I just want to copy this data directly to HDFS in a parallel way (which
> may be more efficient).
>
> I think hadoop distcp is what I need.
>

http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.


I doubt this is going to help. Are there a lot of files? If yes, how about
multiple copy jobs to HDFS?
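
For example, something like this might work (a sketch only: the host,
credentials and paths below are placeholders, and it assumes the FTP server
is reachable from every worker node through Hadoop's built-in ftp://
filesystem):

  hadoop distcp ftp://user:password@ftp.example.com/data hdfs://namenode:8020/data

If distcp keeps failing on the ftp:// source, several concurrent copy jobs
(e.g. one "hadoop fs -put" per subdirectory, run from a machine that can see
the FTP data) achieve much the same parallelism.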
-balaji


Re: yarn Failed to bind to: 0.0.0.0/0.0.0.0:8080

2013-07-10 Thread பாலாஜி நாராயணன்
On Wednesday, 10 July 2013, ch huang wrote:

> I have 3 NMs. On one NM's box, port 8080 is already occupied by Tomcat,
> so I want to change the 8080 port to 8090 on all NMs. But the problem is
> I do not know which option in YARN controls the 8080 port. Can anyone
> help?
>

Why would you want to do that? If you want to test out multi-node features,
you are better off running them in VMs.
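
If you do need to move it: on some 0.23/2.x-era releases the NodeManager
listener on port 8080 is the MapReduce shuffle handler, so a snippet like
the following in yarn-site.xml on each NM may be what you are after (an
assumption; verify the property name and default against your release):

  <property>
    <name>mapreduce.shuffle.port</name>
    <value>8090</value>
  </property>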

-balaji


-- 
http://balajin.net/blog
http://flic.kr/balajijegan


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்
dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>

So the value is bytes per second. If it is running and exiting, it means it
has completed the balancing.
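
For example, to allow each datanode to use up to 100 MB/s for balancing
traffic (a sketch; on older releases the command is "hadoop dfsadmin"
rather than "hdfs dfsadmin"):

  hdfs dfsadmin -setBalancerBandwidth 104857600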


On 24 March 2013 11:32, Tapas Sarangi  wrote:

> Yes, we are running balancer, though a balancer process runs for almost a
> day or more before exiting and starting over.
> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
> that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If
> it is in bits then we have a problem.
> What's the unit for "dfs.balance.bandwidthPerSec"?
>
> -
>
> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> li...@balajin.net> wrote:
>
> Are you running balancer? If balancer is running and if it is slow, try
> increasing the balancer bandwidth
>
>
> On 24 March 2013 09:21, Tapas Sarangi  wrote:
>
>> Thanks for the follow-up. I don't know whether an attachment will pass
>> through this mailing list, but I am attaching a PDF that contains the
>> usage of all live nodes.
>>
>> All nodes starting with the letter "g" are the ones with smaller storage
>> space, whereas nodes starting with the letter "s" have larger storage
>> space. As you will see, most of the "gXX" nodes are completely full
>> whereas "sXX" nodes have a lot of unused space.
>>
>> Recently, we are frequently facing a crisis where HDFS goes into a mode
>> in which it is not able to write any further, even though the total space
>> available in the cluster is about 500 TB. We believe this has something
>> to do with the way it is balancing the nodes, but we don't understand the
>> problem yet. Maybe the attached PDF will help some of you (experts) see
>> what is going wrong here...
>>
>> Thanks
>> --
>>
>>
>>
>>
>>
>>
>>
>> The balancer knows about topology, but when calculating balancing it
>> operates only on nodes, not on racks.
>> You can see how it works in Balancer.java, in BalancerDatanode around
>> line 509.
>>
>> I was wrong about 350 TB; it is 35 TB, calculated in the following way:
>>
>> For example:
>> cluster_capacity=3.5Pb
>> cluster_dfsused=2Pb
>>
>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>> capacity used.
>> Then we know each node's utilization (node_dfsused / node_capacity * 100).
>> The balancer thinks all is good if
>> avgutil + 10 > node_utilization >= avgutil - 10.
>>
>> The ideal case is that every node uses avgutil of its capacity, but for a
>> 12 TB node that is only about 6.5 TB, and for a 72 TB node about 40 TB.
>>
>> The balancer can't help you.
>>
>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE
>> if you can.
>>
>>
>>
>>>
>>>
>>> In the ideal case with replication factor 2, with two nodes of 12 TB and
>>> 72 TB, you will be able to have only 12 TB of replicated data.
>>>
>>>
>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>>> TB, but not true for more than two nodes in the cluster.
>>>
>>>
>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>> must have identical capacity, and racks must have identical capacity.
>>> For example:
>>>
>>> rack1: 1 node with 72Tb
>>> rack2: 6 nodes with 12Tb
>>> rack3: 3 nodes with 24Tb
>>>
>>> It helps with balancing, because a duplicated block must be on another rack.
>>>
>>>
>>> The same question I asked earlier in this message: do multiple racks
>>> with the default balancer threshold minimize the difference between
>>> racks?
>>>
>>> Why did you select HDFS? Maybe Lustre, CephFS, or something else is a
>>> better choice.
>>>
>>>
>>> It wasn't my decision, and I probably can't change it now. I am new to
>>> this cluster and trying to understand a few issues. I will explore the
>>> other options you mentioned.
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>
>


-- 
http://balajin.net/blog
http://flic.kr/balajijegan


Re: question for committer

2013-03-24 Thread பாலாஜி நாராயணன்
Is there a reason why you don't want to run MRv2 under YARN?


On 22 March 2013 22:49, Azuryy Yu  wrote:

> Is there a way to separate HDFS 2 from Hadoop 2? I want to use HDFS 2 with
> MapReduce 1.0.4, excluding YARN, because I need HDFS HA.
>
> --
> http://balajin.net/blog
> http://flic.kr/balajijegan
>


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்
Are you running balancer? If balancer is running and if it is slow, try
increasing the balancer bandwidth
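
For example (a sketch; the threshold is the allowed percentage deviation of
each node's utilization from the cluster average, default 10):

  hadoop balancer -threshold 5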


On 24 March 2013 09:21, Tapas Sarangi  wrote:

> Thanks for the follow-up. I don't know whether an attachment will pass
> through this mailing list, but I am attaching a PDF that contains the
> usage of all live nodes.
>
> All nodes starting with the letter "g" are the ones with smaller storage
> space, whereas nodes starting with the letter "s" have larger storage
> space. As you will see, most of the "gXX" nodes are completely full
> whereas "sXX" nodes have a lot of unused space.
>
> Recently, we are frequently facing a crisis where HDFS goes into a mode
> in which it is not able to write any further, even though the total space
> available in the cluster is about 500 TB. We believe this has something
> to do with the way it is balancing the nodes, but we don't understand the
> problem yet. Maybe the attached PDF will help some of you (experts) see
> what is going wrong here...
>
> Thanks
> --
>
>
>
>
>
>
>
> The balancer knows about topology, but when calculating balancing it
> operates only on nodes, not on racks.
> You can see how it works in Balancer.java, in BalancerDatanode around
> line 509.
>
> I was wrong about 350 TB; it is 35 TB, calculated in the following way:
>
> For example:
> cluster_capacity=3.5Pb
> cluster_dfsused=2Pb
>
> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
> capacity used.
> Then we know each node's utilization (node_dfsused / node_capacity * 100).
> The balancer thinks all is good if
> avgutil + 10 > node_utilization >= avgutil - 10.
>
> The ideal case is that every node uses avgutil of its capacity, but for a
> 12 TB node that is only about 6.5 TB, and for a 72 TB node about 40 TB.
>
> The balancer can't help you.
>
> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE
> if you can.
>
>
>
>>
>>
>> In the ideal case with replication factor 2, with two nodes of 12 TB and
>> 72 TB, you will be able to have only 12 TB of replicated data.
>>
>>
>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>> TB, but not true for more than two nodes in the cluster.
>>
>>
>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>> must have identical capacity, and racks must have identical capacity.
>> For example:
>>
>> rack1: 1 node with 72Tb
>> rack2: 6 nodes with 12Tb
>> rack3: 3 nodes with 24Tb
>>
>> It helps with balancing, because a duplicated block must be on another rack.
>>
>>
>> The same question I asked earlier in this message: do multiple racks
>> with the default balancer threshold minimize the difference between
>> racks?
>>
>> Why did you select HDFS? Maybe Lustre, CephFS, or something else is a
>> better choice.
>>
>>
>> It wasn't my decision, and I probably can't change it now. I am new to
>> this cluster and trying to understand a few issues. I will explore the
>> other options you mentioned.
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>


Re: Cluster lost IP addresses

2013-03-22 Thread பாலாஜி நாராயணன்
Assuming you are using hostnames and not IP addresses in your config files,
what happens when you start the cluster? If you are using IP addresses in
your configs, just update them and start. It should work with no issues.
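
For example, on a 0.20/1.x-era cluster like the ones in this digest, the
entry to check in core-site.xml looks something like this (hostname and
port are placeholders):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:54310</value>
  </property>

mapred.job.tracker in mapred-site.xml and the masters/slaves files deserve
the same check.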

On Friday, March 22, 2013, John Meza wrote:

> I have an 18-node cluster that had to be physically moved.
> Unfortunately, all the IP addresses were lost (recreated).
>
> This must have happened to someone before.
> Nothing else on the machines has been changed. Most importantly the data
> in HDFS is still sitting there.
>
> Is there a way to recover this cluster to a usable state?
> thanks
> John
>


-- 
http://balajin.net/blog
http://flic.kr/balajijegan


Re: HDFS disk space requirement

2013-01-10 Thread பாலாஜி நாராயணன்
If the replication factor is 5, you will need at least 5x the space of the
file. So this is not going to be enough.
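
To make the arithmetic concrete: 115 GB x 5 replicas = 575 GB of raw HDFS
space, against 130 GB available, before counting any intermediate or output
data from processing.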

On Thursday, January 10, 2013, Panshul Whisper wrote:

> Hello,
>
> I have a Hadoop cluster of 5 nodes with a total of 130 GB of available
> HDFS space, with replication set to 5.
> I have a file of 115 GB, which needs to be copied to HDFS and processed.
> Do I need any more HDFS space for performing all the processing without
> running into any problems? Or is this space sufficient?
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>


-- 
http://balajin.net/blog
http://flic.kr/balajijegan


Re: HDFS HA IO Fencing

2012-10-27 Thread பாலாஜி நாராயணன்
If you use NFSv4 you should be able to use locks, and when a machine dies or
fails to renew the lease, the other machine can take over.
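
For reference, HDFS HA fencing methods are listed in hdfs-site.xml via
dfs.ha.fencing.methods, e.g. (a sketch; Todd's reply below explains why a
bare shell(/bin/true) fallback is risky):

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>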

On Friday, October 26, 2012, Todd Lipcon wrote:

> NFS Locks typically last forever if you disconnect abruptly. So they are
> not sufficient -- your standby wouldn't be able to take over without manual
> intervention to remove the lock.
>
> If you want to build an unreliable system that might corrupt your data,
> you could set up 'shell(/bin/true)' as a second fencing method. But, it's
> really a bad idea. There are failure scenarios which could cause split
> brain if you do this, and you'd very likely lose data.
>
> -Todd
>
> On Fri, Oct 26, 2012 at 1:59 AM, lei liu wrote:
>
>> We are using NFS for shared storage. Can we use the Linux nfslock service
>> to implement IO fencing?
>>
>>
>> 2012/10/26 Steve Loughran <ste...@hortonworks.com> wrote:
>>
>>>
>>>
>>> On 25 October 2012 14:08, Todd Lipcon <t...@cloudera.com> wrote:
>>>
 Hi Liu,

 Locks are not sufficient, because there is no way to enforce a lock in
 a distributed system without unbounded blocking. What you might be
 referring to is a lease, but leases are still problematic unless you can
 put bounds on the speed with which clocks progress on different machines,
 _and_ have strict guarantees on the way each node's scheduler works. With
 Linux and Java, the latter is tough.


>>> on any OS running in any virtual environment, including EC2, time is
>>> entirely unpredictable, just to make things worse.
>>>
>>>
>>> On a single machine you can use file locking, as the OS will know that
>>> the process is dead and closes the file; other programs can attempt to
>>> open the same file with exclusive locking, and, by getting the right
>>> failures, know that something else has the file, hence the other process
>>> is live. Shared NFS storage needs to be mounted with softlock set,
>>> precisely to stop file locks lasting until some lease has expired,
>>> because the on-host liveness probes detect failure faster and want to
>>> react to it.
>>>
>>>
>>> -Steve
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
Thanks
-balaji

--
http://balajin.net/blog/
http://flic.kr/balajijegan


Re: Namenode shutting down while creating cluster

2012-10-20 Thread பாலாஜி நாராயணன்
Sundeep, what happens when you use the IP instead of the name in the config?

On Saturday, October 20, 2012, Sundeep Kambhmapati wrote:

> Thank you, Balaji.
> I checked gethostbyname(sk.r252.0); it gives 10.0.2.15. This is the IP
> address I am getting in ifconfig also.
> ssh sk.r252.0 connects to 10.0.2.15.
> ping sk.r252.0 pings 10.0.2.15.
>
> Can you please help me with the issue?
>
> Regards
> Sundeep
>
>
> Seems like an issue with resolution of sk.r252.0. Can you ensure that it
> resolves?
>
> On Friday, October 19, 2012, Sundeep Kambhmapati wrote:
>
> Hi Users,
> My name node is shutting down soon after it is started.
> Here is the log. Can someone please help me?
>
> 2012-10-19 23:20:42,143 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = sk.r252.0/10.0.2.15
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> /
> 2012-10-19 23:20:42,732 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=54310
> 2012-10-19 23:20:42,741 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: sk.r252.0/
> 10.0.2.15:54310
> 2012-10-19 23:20:42,745 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2012-10-19 23:20:42,747 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2012-10-19 23:20:43,074 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
> 2012-10-19 23:20:43,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2012-10-19 23:20:43,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2012-10-19 23:20:43,231 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2012-10-19 23:20:43,239 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2012-10-19 23:20:43,359 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 1
> 2012-10-19 23:20:43,379 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 0
> 2012-10-19 23:20:43,379 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 94 loaded in 0 seconds.
> 2012-10-19 23:20:43,380 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /app/hadoop/tmp/dfs/
>
>

-- 
Thanks
-balaji

--
http://balajin.net/blog/
http://flic.kr/balajijegan


Re: Namenode shutting down while creating cluster

2012-10-19 Thread பாலாஜி நாராயணன்
Seems like an issue with resolution of sk.r252.0. Can you ensure that it
resolves?
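
A quick check from each node (a sketch; substitute your own hostname):

  getent hosts sk.r252.0
  ping -c 1 sk.r252.0

Both should agree on the address the namenode is expected to bind to.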

On Friday, October 19, 2012, Sundeep Kambhmapati wrote:

> Hi Users,
> My name node is shutting down soon after it is started.
> Here is the log. Can someone please help me?
>
> 2012-10-19 23:20:42,143 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> /
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = sk.r252.0/10.0.2.15
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> /
> 2012-10-19 23:20:42,732 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=54310
> 2012-10-19 23:20:42,741 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: sk.r252.0/
> 10.0.2.15:54310
> 2012-10-19 23:20:42,745 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2012-10-19 23:20:42,747 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2012-10-19 23:20:43,074 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
> 2012-10-19 23:20:43,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2012-10-19 23:20:43,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2012-10-19 23:20:43,231 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2012-10-19 23:20:43,239 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2012-10-19 23:20:43,359 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 1
> 2012-10-19 23:20:43,379 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 0
> 2012-10-19 23:20:43,379 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 94 loaded in 0 seconds.
> 2012-10-19 23:20:43,380 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /app/hadoop/tmp/dfs/name/current/edits of size 4 edits # 0
> loaded in 0 seconds.
> 2012-10-19 23:20:43,415 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 94 saved in 0 seconds.
> 2012-10-19 23:20:43,612 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
> FSImage in 758 msecs
> 2012-10-19 23:20:43,615 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
> = 0
> 2012-10-19 23:20:43,615 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
> blocks = 0
> 2012-10-19 23:20:43,615 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> under-replicated blocks = 0
> 2012-10-19 23:20:43,615 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>  over-replicated blocks = 0
> 2012-10-19 23:20:43,615 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> Leaving safe mode after 0 secs.
> 2012-10-19 23:20:43,616 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> Network topology has 0 racks and 0 datanodes
> 2012-10-19 23:20:43,616 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> UnderReplicatedBlocks has 0 blocks
> 2012-10-19 23:20:44,450 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2012-10-19 23:20:44,711 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open() is
> -1. Opening the listener on 50070
> 2012-10-19 23:20:44,715 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50070
> webServer.getConnectors()[0].getLocalPort() returned 50070
> 2012-10-19 23:20:44,715 INFO org.apache.hadoop.http.HttpServer: Jetty
> bound to port 50070
> 2012-10-19 23:20:44,715 INFO org.mortbay.log: jetty-6.1.14
> 2012-10-19 23:20:47,021 INFO org.mortbay.log: Started
> SelectChannelConnector@0.0.0.0:50070
> 2012-10-19 23:20:47,022 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
> 0.0.0.0:50070
> 2012-10-19 23:20:47,022 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
> 0.0.0.0:50070
> 2012-10-19 23:20:47,067 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 54310: starting
> 2012-10-19 23:20:47,086 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 54310: starting
> 2012-10-19 23:20:47,089 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2012-10-19 23:20:47,106 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1