Re: how to control (or understand) the memory usage in hdfs

2013-03-24 Thread Ted
oh, really?

ulimit -n is 2048; I'd assumed that would be sufficient for just
testing on my machine. I was going to use 4096 in production.
My hdfs-site.xml has dfs.datanode.max.xcievers set to 4096.
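
For reference, the relevant knobs look roughly like this (the values are just
the ones mentioned above; ulimit -u is my guess at the limit usually behind
"unable to create new native thread"):

  <!-- hdfs-site.xml: cap on concurrent DataXceiver threads per datanode -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

  # limits for the user account that runs the datanode
  ulimit -n   # max open file descriptors
  ulimit -u   # max user processes/threads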

As for my logs... there are a lot of INFO entries; I haven't gotten
around to configuring that down yet - I'm not quite sure why it's so
extensive at INFO level. My log file is 4.4 GB (is this a sign I've
configured or done something wrong?)

I grep -v INFO in the log to get the actual error entries (assuming
the stack trace is actually on the same line, or else those stack
lines may be misleading)

2013-03-23 15:11:43,653 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(127.0.0.1:50010,
storageID=DS-1419421989-192.168.1.5-50010-1363780956652,
infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
to:java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:133)
at java.lang.Thread.run(Thread.java:722)

2013-03-23 15:11:44,177 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(127.0.0.1:50010,
storageID=DS-1419421989-192.168.1.5-50010-1363780956652,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.InterruptedIOException: Interruped while waiting for IO on
channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
left.
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:292)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:339)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:403)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:581)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
at java.lang.Thread.run(Thread.java:722)

On 3/23/13, Harsh J ha...@cloudera.com wrote:
 I'm guessing your OutOfMemory then is due to the "Unable to create native
 thread" message? Do you mind sharing your error logs with us? Because if
 it's that, then it's a ulimit/system limits issue and not a real memory
 issue.

 On Sat, Mar 23, 2013 at 2:30 PM, Ted r6squee...@gmail.com wrote:
 I just checked and after running my tests, I generate only 670mb of
 data, on 89 blocks.

 What's more, when I ran the test this time, I had increased my memory
 to 2048mb so it completed fine - but I decided to run jconsole through
 the test so I could see what's happening. The data node never
 exceeded 200mb of memory usage. It mostly stayed under 100mb.

 I'm not sure why it would complain about out of memory and shut itself
 down when it was only 1024. It was fairly consistently doing that the
 last few days including this morning right before I switched it to
 2048.

 I'm going to run the test again with 1024mb and jconsole running, none
 of this makes any sense to me.

 On 3/23/13, Harsh J ha...@cloudera.com wrote:
 I run a 128 MB heap size DN for my simple purposes on my Mac and it
 runs well for what load I apply on it.

 A DN's primary, growing memory consumption comes from the # of blocks
 it carries. All of these blocks' file paths are mapped and kept in the
 RAM during its lifetime. If your DN has acquired a lot of blocks by
 now, like say close to a million or more, then 1 GB may not suffice
 anymore to hold them in and you'd need to scale up (add more RAM or
 increase heap size if you have more RAM)/scale out (add another node
 and run the balancer).

 On Sat, Mar 23, 2013 at 10:03 AM, Ted r6squee...@gmail.com wrote:
 Hi I'm new to hadoop/hdfs and I'm just running some tests on my local
 machines in a single node setup. I'm encountering out of memory errors
 on the jvm running my data node.

 I'm pretty sure I can just increase the heap size to fix the errors,
 but my question is about how memory is actually used.

 As an example, with other things like an OS's disk cache or, say, a
 database, if you let it use 1gb of ram, it will work with what it has
 available; if the data is more than 1gb of ram it just means it'll
 swap in and out of memory/disk more 

Best practises to learn hadoop for new users

2013-03-24 Thread suraj nayak



Re: DistributedCache - why not read directly from HDFS?

2013-03-24 Thread Alberto Cordioli
Thanks for your reply Harsh.
So if I want to read a simple text file, choosing whether to use
DistributedCache or HDFS becomes just a matter of performance.


Alberto

On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote:
 A DistributedCache is not used just to distribute simple files but
 also native libraries and such, which cannot be loaded directly if
 they sit on HDFS.

 Also, keeping it on HDFS could prove less performant, as non-local
 reads could happen (depending on the files' replication factor).
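
 To make the comparison concrete, a rough sketch (the class name and path
 below are made-up placeholders) of a plain HDFS read, with the
 DistributedCache alternative described in the comments:

   import java.io.BufferedReader;
   import java.io.InputStreamReader;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataInputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class HdfsReadVsCache {
     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();

       // Plain HDFS read: each task doing this opens the file over the
       // network unless a replica happens to sit on the same node.
       FileSystem fs = FileSystem.get(conf);
       FSDataInputStream in = fs.open(new Path("/user/alberto/lookup.txt"));
       BufferedReader r = new BufferedReader(new InputStreamReader(in));
       System.out.println(r.readLine());
       r.close();

       // DistributedCache, by contrast, is registered in the job driver:
       //   DistributedCache.addCacheFile(new URI("/user/alberto/lookup.txt"), conf);
       // and each task reads the *local* copy returned by
       //   DistributedCache.getLocalCacheFiles(conf)
       // with ordinary java.io, so the file is shipped once per node rather
       // than read from HDFS once per task.
     }
   }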

 On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli
 cordioli.albe...@gmail.com wrote:
 Hi all,

 I was not able to find an answer to the following question. If the
 question has already been answered please give me the pointer to the
 right thread.

 What are the actual differences between reading a file from HDFS in a
 mapper and using the DistributedCache?

 I saw that with DistributedCache you can give an HDFS path and the
 task nodes will get the data on the local file system. But what
 advantages does that have compared with a simple HDFS read via the
 FileSystem.open() method (which returns an FSDataInputStream)?

 Thank you very much,
 Alberto


 --
 Alberto Cordioli



 --
 Harsh J



-- 
Alberto Cordioli


Child JVM memory allocation / Usage

2013-03-24 Thread nagarjuna kanamarlapudi
Hi,

I configured my child JVM heap to 2 GB. So, I thought I could really read
1.5 GB of data and store it in memory (mapper/reducer).

I wanted to confirm this and wrote the following piece of code in the
configure() method of my mapper:

@Override
public void configure(JobConf job) {
    System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
    System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
}


Surprisingly the output was


FREE MEMORY -- 341854864  = 320 MB
MAX MEMORY ---1908932608  = 1.9 GB


I am just wondering what processes are taking up that extra 1.6GB of
heap which I configured for the child jvm heap.
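
For what it's worth, Runtime.freeMemory() only reports free space within the
heap the JVM has committed so far (totalMemory()), not against the -Xmx
ceiling, so a low number at startup doesn't mean the rest of the 2 GB is
already used. A sketch of a configure() that prints the rough headroom
instead (just an illustration, not the code above):

  @Override
  public void configure(JobConf job) {
      Runtime rt = Runtime.getRuntime();
      long committed = rt.totalMemory();  // heap the JVM has claimed so far
      long free      = rt.freeMemory();   // free space inside that committed heap
      long max       = rt.maxMemory();    // the -Xmx ceiling (~1.9 GB here)
      // headroom = not-yet-committed heap + free space in the committed part
      System.out.println("APPROX AVAILABLE -- " + (max - committed + free));
  }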


I'd appreciate help in understanding this scenario.



Regards

Nagarjuna K


2 Reduce method in one Job

2013-03-24 Thread Fatih Haltas
I want to get reduce output as key and value then I want to pass them to a
new reduce as input key and input value.

So is there any Map-Reduce-Reduce kind of method?

Thanks to all.


Re: 2 Reduce method in one Job

2013-03-24 Thread Azuryy Yu
There isn't such a method; you have to submit another MR job.
On Mar 24, 2013 9:03 PM, Fatih Haltas fatih.hal...@nyu.edu wrote:

 I want to get reduce output as key and value then I want to pass them to a
 new reduce as input key and input value.

 So is there any Map-Reduce-Reduce kind of method?

 Thanks to all.



Re: 2 Reduce method in one Job

2013-03-24 Thread Harsh J
You seem to want to re-sort/partition your data without materializing
it onto HDFS.

Azuryy is right: There isn't a way right now and a second job (with an
identity mapper) is necessary. With YARN it becomes more feasible to
implement this in the project, though.

The newly inducted incubator project Tez sorta targets this. It's in
its nascent stages though (for general user use), and the website
should hopefully appear at
http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can
read the proposal behind this project at
http://wiki.apache.org/incubator/TezProposal. Initial sources are at
https://svn.apache.org/repos/asf/incubator/tez/trunk/.

On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas fatih.hal...@nyu.edu wrote:
 I want to get reduce output as key and value then I want to pass them to a
 new reduce as input key and input value.

 So is there any Map-Reduce-Reduce kind of method?

 Thanks to all.



-- 
Harsh J


Re: 2 Reduce method in one Job

2013-03-24 Thread Fatih Haltas
Thank you very much.

You are right Harsh, it is exactly what I am trying to do.

I want to process my result according to the keys and not spend time
writing this data to HDFS; I want to pass the data as input to another reduce.

One more question then:
creating 2 different jobs, where the second one has only a reduce for example,
is it possible to pass the first job's output as an argument to the second job?


On Sun, Mar 24, 2013 at 5:44 PM, Harsh J ha...@cloudera.com wrote:

 You seem to want to re-sort/partition your data without materializing
 it onto HDFS.

 Azuryy is right: There isn't a way right now and a second job (with an
 identity mapper) is necessary. With YARN this is more possible to
 implement into the project, though.

 The newly inducted incubator project Tez sorta targets this. Its in
 its nascent stages though (for general user use), and the website
 should hopefully appear at
 http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can
 read the proposal behind this project at
 http://wiki.apache.org/incubator/TezProposal. Initial sources are at
 https://svn.apache.org/repos/asf/incubator/tez/trunk/.

 On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas fatih.hal...@nyu.edu
 wrote:
  I want to get reduce output as key and value then I want to pass them to
 a
  new reduce as input key and input value.
 
  So is there any Map-Reduce-Reduce kind of method?
 
  Thanks to all.



 --
 Harsh J



Re: 2 Reduce method in one Job

2013-03-24 Thread Harsh J
Yes, just use an identity mapper (in new API, the base Mapper class
itself identity-maps, in the old API use IdentityMapper class) and set
the input path as the output path of the first job.
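
As a rough sketch of that second job in the old (mapred) API — SecondReducer
and the paths here are made-up placeholders:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityMapper;

  public class SecondPass {
    public static void main(String[] args) throws Exception {
      JobConf job2 = new JobConf(SecondPass.class);
      job2.setJobName("second-reduce");
      job2.setMapperClass(IdentityMapper.class);   // pass records through unchanged
      job2.setReducerClass(SecondReducer.class);   // your second reduce (placeholder)
      job2.setOutputKeyClass(Text.class);
      job2.setOutputValueClass(Text.class);
      // Input is simply the first job's output directory on HDFS.
      FileInputFormat.setInputPaths(job2, new Path("/user/fatih/job1-out"));
      FileOutputFormat.setOutputPath(job2, new Path("/user/fatih/job2-out"));
      JobClient.runJob(job2);
    }
  }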

If you'll be ending up doing more such step-wise job chaining,
consider using Apache Oozie's workflow system.

On Sun, Mar 24, 2013 at 7:23 PM, Fatih Haltas fatih.hal...@nyu.edu wrote:
 Thank you very much.

 You are right Harsh, it is exactly what i am trying to do.

 I want to process my result, according to the keys and i donot spend time
 writing this data to hdfs, I want to pass data as input to another reduce.

 One more question then,
 Creating 2 diffirent job, secondone has only reduce for example, is it
 possible to pass first jobs output as argument to second job?


 On Sun, Mar 24, 2013 at 5:44 PM, Harsh J ha...@cloudera.com wrote:

 You seem to want to re-sort/partition your data without materializing
 it onto HDFS.

 Azuryy is right: There isn't a way right now and a second job (with an
 identity mapper) is necessary. With YARN this is more possible to
 implement into the project, though.

 The newly inducted incubator project Tez sorta targets this. Its in
 its nascent stages though (for general user use), and the website
 should hopefully appear at
 http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can
 read the proposal behind this project at
 http://wiki.apache.org/incubator/TezProposal. Initial sources are at
 https://svn.apache.org/repos/asf/incubator/tez/trunk/.

 On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas fatih.hal...@nyu.edu
 wrote:
  I want to get reduce output as key and value then I want to pass them to
  a
  new reduce as input key and input value.
 
  So is there any Map-Reduce-Reduce kind of method?
 
  Thanks to all.



 --
 Harsh J





--
Harsh J


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்
Are you running balancer? If balancer is running and if it is slow, try
increasing the balancer bandwidth


On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage space
 where as nodes starting with letter s have larger storage space. As you
 will see, most of the gXX nodes are completely full whereas sXX nodes
 have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode where
 it is not able to write any further even though the total space available
 in the cluster is about 500 TB. We believe this has something to do with
 the way it is balancing the nodes, but don't understand the problem yet.
 May be the attached PDF will help some of you (experts) to see what is
 going wrong here...

 Thanks
 --







 Balancer know about topology,but when calculate balancing it operates only
 with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about string
 509.

 I was wrong about 350Tb,35Tb it calculates in such way :

 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb

 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.

 Ideal case that all node used avgutl of capacity.but for 12TB node its
 only 6.5Tb and for 72Tb its about 40Tb.

 Balancer cant help you.

 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.





  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
 you will be able to have only 12Tb replication data.


 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
 TB, but not true for more than two nodes in the cluster.


 Best way,on my opinion,it is using multiple racks.Nodes in rack must be
 with identical capacity.Racks must be identical capacity.
 For example:

 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb

 It helps with balancing,because dublicated  block must be another rack.


 The same question I asked earlier in this message, does multiple racks
 with default threshold for the balancer minimizes the difference between
 racks ?

 Why did you select hdfs?May be lustre,cephfs and other is better choise.


 It wasn't my decision, and I probably can't change it now. I am new to
 this cluster and trying to understand few issues. I will explore other
 options as you mentioned.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan



Re: question for commetter

2013-03-24 Thread பாலாஜி நாராயணன்
Is there a reason why you don't want to run MRv2 under YARN?


On 22 March 2013 22:49, Azuryy Yu azury...@gmail.com wrote:

 Is there a way to separate hdfs2 from hadoop2? I want to use hdfs2 and
 mapreduce 1.0.4 and exclude yarn, because I need HDFS-HA.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan



Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Yes, we are running the balancer, though a balancer process runs for almost a day 
or more before exiting and starting over.
The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's 
bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it is in bits 
then we have a problem.
What's the unit for dfs.balance.bandwidthPerSec?

-

On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try 
 increasing the balancer bandwidth
 
 
 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Thanks for the follow up. I don't know whether attachment will pass through 
 this mailing list, but I am attaching a pdf that contains the usage of all 
 live nodes.
 
 All nodes starting with letter g are the ones with smaller storage space 
 where as nodes starting with letter s have larger storage space. As you 
 will see, most of the gXX nodes are completely full whereas sXX nodes 
 have a lot of unused space. 
 
 Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it 
 is not able to write any further even though the total space available in the 
 cluster is about 500 TB. We believe this has something to do with the way it 
 is balancing the nodes, but don't understand the problem yet. May be the 
 attached PDF will help some of you (experts) to see what is going wrong 
 here...
 
 Thanks
 --
 
 
 
 
 
 
 
 Balancer know about topology,but when calculate balancing it operates only 
 with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about string 
 509.
 
 I was wrong about 350Tb,35Tb it calculates in such way :
 
 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb
 
 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.
 
 Ideal case that all node used avgutl of capacity.but for 12TB node its only 
 6.5Tb and for 72Tb its about 40Tb.
 
 Balancer cant help you.
 
 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if 
 you can.
 
  
 
 
 In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you 
 will be able to have only 12Tb replication data.
 
 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, 
 but not true for more than two nodes in the cluster.
 
 
 Best way,on my opinion,it is using multiple racks.Nodes in rack must be 
 with identical capacity.Racks must be identical capacity.
 For example:
 
 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb
 
 It helps with balancing,because dublicated  block must be another rack.
 
 
 The same question I asked earlier in this message, does multiple racks with 
 default threshold for the balancer minimizes the difference between racks ?
 
 Why did you select hdfs?May be lustre,cephfs and other is better choise.  
 
 It wasn't my decision, and I probably can't change it now. I am new to this 
 cluster and trying to understand few issues. I will explore other options as 
 you mentioned.
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan



Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்
 -setBalancerBandwidth bandwidth in bytes per second

So the value is in bytes per second. If it is running and exiting, it means it
has completed the balancing.
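
For reference, the two knobs can be poked from the command line (names as in
Hadoop 1.x; the bandwidth value below is just an example, ~100 MB/s):

  # set the balancer bandwidth cluster-wide, in bytes per second
  hadoop dfsadmin -setBalancerBandwidth 104857600

  # run the balancer with a tighter threshold (percentage points of deviation)
  hadoop balancer -threshold 5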


On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for almost a
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try
 increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage space. As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something to
 do with the way it is balancing the nodes, but don't understand the problem
 yet. May be the attached PDF will help some of you (experts) to see what is
 going wrong here...

 Thanks
 --







 Balancer know about topology,but when calculate balancing it operates
 only with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about
 string 509.

 I was wrong about 350Tb,35Tb it calculates in such way :

 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb

 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.

 Ideal case that all node used avgutl of capacity.but for 12TB node its
 only 6.5Tb and for 72Tb its about 40Tb.

 Balancer cant help you.

 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.





  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
 you will be able to have only 12Tb replication data.


 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
 TB, but not true for more than two nodes in the cluster.


 Best way,on my opinion,it is using multiple racks.Nodes in rack must be
 with identical capacity.Racks must be identical capacity.
 For example:

 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb

 It helps with balancing,because dublicated  block must be another rack.


 The same question I asked earlier in this message, does multiple racks
 with default threshold for the balancer minimizes the difference between
 racks ?

 Why did you select hdfs?May be lustre,cephfs and other is better
 choise.


 It wasn't my decision, and I probably can't change it now. I am new to
 this cluster and trying to understand few issues. I will explore other
 options as you mentioned.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan





-- 
http://balajin.net/blog
http://flic.kr/balajijegan


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Yes, thanks for pointing that out, but I already know that it is completing the 
balancing when it exits; otherwise it shouldn't exit.
Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' 
is stalling and hadoop is not writing unless space is cleared up from the 
cluster even though df shows the cluster has about 500 TB of free space.

---
 

On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second
 
 So the value is bytes per second. If it is running and exiting,it means it 
 has completed the balancing. 
 
 
 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Yes, we are running balancer, though a balancer process runs for almost a day 
 or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's 
 bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in 
 Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?
 
 -
 
 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:
 
 Are you running balancer? If balancer is running and if it is slow, try 
 increasing the balancer bandwidth
 
 
 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Thanks for the follow up. I don't know whether attachment will pass through 
 this mailing list, but I am attaching a pdf that contains the usage of all 
 live nodes.
 
 All nodes starting with letter g are the ones with smaller storage space 
 where as nodes starting with letter s have larger storage space. As you 
 will see, most of the gXX nodes are completely full whereas sXX nodes 
 have a lot of unused space. 
 
 Recently, we are facing crisis frequently as 'hdfs' goes into a mode where 
 it is not able to write any further even though the total space available in 
 the cluster is about 500 TB. We believe this has something to do with the 
 way it is balancing the nodes, but don't understand the problem yet. May be 
 the attached PDF will help some of you (experts) to see what is going wrong 
 here...
 
 Thanks
 --
 
 
 
 
 
 
 
 Balancer know about topology,but when calculate balancing it operates only 
 with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about string 
 509.
 
 I was wrong about 350Tb,35Tb it calculates in such way :
 
 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb
 
 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.
 
 Ideal case that all node used avgutl of capacity.but for 12TB node its only 
 6.5Tb and for 72Tb its about 40Tb.
 
 Balancer cant help you.
 
 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if 
 you can.
 
  
 
 
 In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you 
 will be able to have only 12Tb replication data.
 
 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 
 TB, but not true for more than two nodes in the cluster.
 
 
 Best way,on my opinion,it is using multiple racks.Nodes in rack must be 
 with identical capacity.Racks must be identical capacity.
 For example:
 
 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb
 
 It helps with balancing,because dublicated  block must be another rack.
 
 
 The same question I asked earlier in this message, does multiple racks with 
 default threshold for the balancer minimizes the difference between racks ?
 
 Why did you select hdfs?May be lustre,cephfs and other is better choise.  
 
 It wasn't my decision, and I probably can't change it now. I am new to this 
 cluster and trying to understand few issues. I will explore other options 
 as you mentioned.
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan
 
 
 
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan



Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
On both types of nodes, what is your dfs.data.dir set to? Does it specify
multiple folders on the same set of drives, or is it 1-1 between folder
and drive?  If it's set to multiple folders on the same drives, it
is probably multiplying the amount of available capacity incorrectly, in
that it assumes a 1-1 relationship between folder and total capacity of the
drive.
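
For illustration, a 1-1 layout would look something like this in hdfs-site.xml
(the mount points here are made up):

  <property>
    <name>dfs.data.dir</name>
    <!-- one directory per physical drive -->
    <value>/data1/dfs/dn,/data2/dfs/dn,/data3/dfs/dn</value>
  </property>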


On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Yes, thanks for pointing, but I already know that it is completing the
 balancing when exiting otherwise it shouldn't exit.
 Your answer doesn't solve the problem I mentioned earlier in my message.
 'hdfs' is stalling and hadoop is not writing unless space is cleared up
 from the cluster even though df shows the cluster has about 500 TB of
 free space.

 ---


 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second

 So the value is bytes per second. If it is running and exiting,it means it
 has completed the balancing.


 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for almost a
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try
 increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage space. As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something to
 do with the way it is balancing the nodes, but don't understand the problem
 yet. May be the attached PDF will help some of you (experts) to see what is
 going wrong here...

 Thanks
 --







 Balancer know about topology,but when calculate balancing it operates
 only with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about
 string 509.

 I was wrong about 350Tb,35Tb it calculates in such way :

 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb

 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.

 Ideal case that all node used avgutl of capacity.but for 12TB node its
 only 6.5Tb and for 72Tb its about 40Tb.

 Balancer cant help you.

 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.





  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
 you will be able to have only 12Tb replication data.


 Yes, this is true for exactly two nodes in the cluster with 12 TB and
 72 TB, but not true for more than two nodes in the cluster.


 Best way,on my opinion,it is using multiple racks.Nodes in rack must be
 with identical capacity.Racks must be identical capacity.
 For example:

 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb

 It helps with balancing,because dublicated  block must be another rack.


 The same question I asked earlier in this message, does multiple racks
 with default threshold for the balancer minimizes the difference between
 racks ?

 Why did you select hdfs?May be lustre,cephfs and other is better
 choise.


 It wasn't my decision, and I probably can't change it now. I am new to
 this cluster and trying to understand few issues. I will explore other
 options as you mentioned.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan





 --
 http://balajin.net/blog
 http://flic.kr/balajijegan





Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Thanks. We have a 1-1 configuration of drives and folders in all the datanodes.

-Tapas

On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:

 On both types of nodes, what is your dfs.data.dir set to? Does it specify 
 multiple folders on the same set's of drives or is it 1-1 between folder and 
 drive?  If it's set to multiple folders on the same drives, it is probably 
 multiplying the amount of available capacity incorrectly in that it assumes 
 a 1-1 relationship between folder and total capacity of the drive.
 
 
 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com 
 wrote:
 Yes, thanks for pointing, but I already know that it is completing the 
 balancing when exiting otherwise it shouldn't exit. 
 Your answer doesn't solve the problem I mentioned earlier in my message. 
 'hdfs' is stalling and hadoop is not writing unless space is cleared up from 
 the cluster even though df shows the cluster has about 500 TB of free 
 space. 
 
 ---
  
 
 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:
 
  -setBalancerBandwidth bandwidth in bytes per second
 
 So the value is bytes per second. If it is running and exiting,it means it 
 has completed the balancing. 
 
 
 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Yes, we are running balancer, though a balancer process runs for almost a 
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's 
 bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in 
 Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?
 
 -
 
 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:
 
 Are you running balancer? If balancer is running and if it is slow, try 
 increasing the balancer bandwidth
 
 
 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Thanks for the follow up. I don't know whether attachment will pass through 
 this mailing list, but I am attaching a pdf that contains the usage of all 
 live nodes.
 
 All nodes starting with letter g are the ones with smaller storage space 
 where as nodes starting with letter s have larger storage space. As you 
 will see, most of the gXX nodes are completely full whereas sXX nodes 
 have a lot of unused space. 
 
 Recently, we are facing crisis frequently as 'hdfs' goes into a mode where 
 it is not able to write any further even though the total space available 
 in the cluster is about 500 TB. We believe this has something to do with 
 the way it is balancing the nodes, but don't understand the problem yet. 
 May be the attached PDF will help some of you (experts) to see what is 
 going wrong here...
 
 Thanks
 --
 
 
 
 
 
 
 
 Balancer know about topology,but when calculate balancing it operates only 
 with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about string 
 509.
 
 I was wrong about 350Tb,35Tb it calculates in such way :
 
 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb
 
 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.
 
 Ideal case that all node used avgutl of capacity.but for 12TB node its 
 only 6.5Tb and for 72Tb its about 40Tb.
 
 Balancer cant help you.
 
 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if 
 you can.
 
  
 
 
 In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you 
 will be able to have only 12Tb replication data.
 
 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 
 TB, but not true for more than two nodes in the cluster.
 
 
 Best way,on my opinion,it is using multiple racks.Nodes in rack must be 
 with identical capacity.Racks must be identical capacity.
 For example:
 
 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb
 
 It helps with balancing,because dublicated  block must be another rack.
 
 
 The same question I asked earlier in this message, does multiple racks 
 with default threshold for the balancer minimizes the difference between 
 racks ?
 
 Why did you select hdfs?May be lustre,cephfs and other is better choise.  
 
 It wasn't my decision, and I probably can't change it now. I am new to 
 this cluster and trying to understand few issues. I will explore other 
 options as you mentioned.
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan
 
 
 
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan
 
 



Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
Then I think the only way around this would be to decommission the smaller
nodes, one at a time, and ensure that the blocks are moved to the larger
nodes.  And once complete, bring back in the smaller nodes, but maybe only
after you tweak the rack topology to match your disk layout more than your
network layout, to compensate for the unbalanced nodes.

Just my 2 cents


On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Thanks. We have a 1-1 configuration of drives and folder in all the
 datanodes.

 -Tapas

 On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:

 On both types of nodes, what is your dfs.data.dir set to? Does it specify
 multiple folders on the same set's of drives or is it 1-1 between folder
 and drive?  If it's set to multiple folders on the same drives, it
 is probably multiplying the amount of available capacity incorrectly in
 that it assumes a 1-1 relationship between folder and total capacity of the
 drive.


 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Yes, thanks for pointing, but I already know that it is completing the
 balancing when exiting otherwise it shouldn't exit.
 Your answer doesn't solve the problem I mentioned earlier in my message.
 'hdfs' is stalling and hadoop is not writing unless space is cleared up
 from the cluster even though df shows the cluster has about 500 TB of
 free space.

 ---


 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second

 So the value is bytes per second. If it is running and exiting,it means
 it has completed the balancing.


 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for almost
 a day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try
 increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage space. As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something to
 do with the way it is balancing the nodes, but don't understand the problem
 yet. May be the attached PDF will help some of you (experts) to see what is
 going wrong here...

 Thanks
 --







 Balancer know about topology,but when calculate balancing it operates
 only with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about
 string 509.

 I was wrong about 350Tb,35Tb it calculates in such way :

 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb

 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster
 capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.

 Ideal case that all node used avgutl of capacity.but for 12TB node its
 only 6.5Tb and for 72Tb its about 40Tb.

 Balancer cant help you.

 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.





  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
 you will be able to have only 12Tb replication data.


 Yes, this is true for exactly two nodes in the cluster with 12 TB and
 72 TB, but not true for more than two nodes in the cluster.


 Best way,on my opinion,it is using multiple racks.Nodes in rack must
 be with identical capacity.Racks must be identical capacity.
 For example:

 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb

 It helps with balancing,because dublicated  block must be another rack.


 The same question I asked earlier in this message, does multiple racks
 with default threshold for the balancer minimizes the difference between
 racks ?

 Why did you select hdfs?May be lustre,cephfs and other is better
 choise.


 It wasn't my decision, and I probably can't change it now. I am new to
 this cluster and trying to understand few issues. I will explore other
 options as you mentioned.

 --
 http://balajin.net/blog
 

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Alexey Babutin
You said that threshold=10. Run the command manually: hadoop balancer
-threshold 9.5, then 9, and so on in 0.5 steps.
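
In other words, something along these lines (a sketch; command name as in
Hadoop 1.x):

  for t in 9.5 9 8.5 8 7.5; do
    hadoop balancer -threshold $t
  done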

On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Yes, thanks for pointing, but I already know that it is completing the
 balancing when exiting otherwise it shouldn't exit.
 Your answer doesn't solve the problem I mentioned earlier in my message.
 'hdfs' is stalling and hadoop is not writing unless space is cleared up
 from the cluster even though df shows the cluster has about 500 TB of
 free space.

 ---


 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second

 So the value is bytes per second. If it is running and exiting,it means it
 has completed the balancing.


 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for almost a
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try
 increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage space. As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something to
 do with the way it is balancing the nodes, but don't understand the problem
 yet. May be the attached PDF will help some of you (experts) to see what is
 going wrong here...

 Thanks
 --







 Balancer know about topology,but when calculate balancing it operates
 only with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about
 string 509.

 I was wrong about 350Tb,35Tb it calculates in such way :

 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb

 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.

 Ideal case that all node used avgutl of capacity.but for 12TB node its
 only 6.5Tb and for 72Tb its about 40Tb.

 Balancer cant help you.

 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.





  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
 you will be able to have only 12Tb replication data.


 Yes, this is true for exactly two nodes in the cluster with 12 TB and
 72 TB, but not true for more than two nodes in the cluster.


 Best way,on my opinion,it is using multiple racks.Nodes in rack must be
 with identical capacity.Racks must be identical capacity.
 For example:

 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb

 It helps with balancing,because dublicated  block must be another rack.


 The same question I asked earlier in this message, does multiple racks
 with default threshold for the balancer minimizes the difference between
 racks ?

 Why did you select hdfs?May be lustre,cephfs and other is better
 choise.


 It wasn't my decision, and I probably can't change it now. I am new to
 this cluster and trying to understand few issues. I will explore other
 options as you mentioned.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan





 --
 http://balajin.net/blog
 http://flic.kr/balajijegan





Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Hi,

Thanks for the idea, I will give this a try and report back. 

My worry is: if we decommission a small node (one at a time), will it move the 
data to larger nodes or choke other smaller nodes? In principle it should 
distribute the blocks; the point is it is not distributing the way we expect it 
to, so do you think this may cause further problems?

-

On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote:

 Then I think the only way around this would be to decommission  1 at a time, 
 the smaller nodes, and ensure that the blocks are moved to the larger nodes.  
 And once complete, bring back in the smaller nodes, but maybe only after you 
 tweak the rack topology to match your disk layout more than network layout to 
 compensate for the unbalanced nodes.  
 
 Just my 2 cents
 
 
 On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.com 
 wrote:
 Thanks. We have a 1-1 configuration of drives and folder in all the datanodes.
 
 -Tapas
 
 On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:
 
 On both types of nodes, what is your dfs.data.dir set to? Does it specify 
 multiple folders on the same set's of drives or is it 1-1 between folder and 
 drive?  If it's set to multiple folders on the same drives, it is probably 
 multiplying the amount of available capacity incorrectly in that it 
 assumes a 1-1 relationship between folder and total capacity of the drive.
 
 
 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com 
 wrote:
 Yes, thanks for pointing, but I already know that it is completing the 
 balancing when exiting otherwise it shouldn't exit. 
 Your answer doesn't solve the problem I mentioned earlier in my message. 
 'hdfs' is stalling and hadoop is not writing unless space is cleared up from 
 the cluster even though df shows the cluster has about 500 TB of free 
 space. 
 
 ---
  
 
 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:
 
  -setBalancerBandwidth bandwidth in bytes per second
 
 So the value is bytes per second. If it is running and exiting,it means it 
 has completed the balancing. 
 
 
 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Yes, we are running balancer, though a balancer process runs for almost a 
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's 
 bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in 
 Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?
 
 -
 
 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:
 
 Are you running balancer? If balancer is running and if it is slow, try 
 increasing the balancer bandwidth
 
 
 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Thanks for the follow up. I don't know whether attachment will pass 
 through this mailing list, but I am attaching a pdf that contains the 
 usage of all live nodes.
 
 All nodes starting with letter g are the ones with smaller storage space 
 where as nodes starting with letter s have larger storage space. As you 
 will see, most of the gXX nodes are completely full whereas sXX nodes 
 have a lot of unused space. 
 
 Recently, we are facing crisis frequently as 'hdfs' goes into a mode where 
 it is not able to write any further even though the total space available 
 in the cluster is about 500 TB. We believe this has something to do with 
 the way it is balancing the nodes, but don't understand the problem yet. 
 May be the attached PDF will help some of you (experts) to see what is 
 going wrong here...
 
 Thanks
 --
 
 
 
 
 
 
 
 Balancer know about topology,but when calculate balancing it operates 
 only with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about 
 string 509.
 
 I was wrong about 350Tb,35Tb it calculates in such way :
 
 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb
 
 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.
 
 Ideal case that all node used avgutl of capacity.but for 12TB node its 
 only 6.5Tb and for 72Tb its about 40Tb.
 
 Balancer cant help you.
 
 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE 
 if you can.
 
  
 
 
 In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb 
 you will be able to have only 12Tb replication data.
 
 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 
 TB, but not true for more than two nodes in the cluster.
 
 
 Best way,on my opinion,it is using multiple racks.Nodes in rack must be 
 with identical capacity.Racks must be identical capacity.
 For example:
 
 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb
 
 

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi

On Mar 24, 2013, at 3:40 PM, Alexey Babutin zorlaxpokemon...@gmail.com wrote:

 you said that threshold=10.Run mannualy command : hadoop balancer threshold 
 9.5 ,then 9 and so with 0.5 step.
 

We are not setting the threshold anywhere in our configuration and are thus using 
the default, which I believe is 10.
Why do you suggest such steps need to be tested for the balancer? Please explain.
I guess we had a discussion earlier on this thread and came to the conclusion 
that the threshold will not help in this situation.


-




 On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi tapas.sara...@gmail.com 
 wrote:
 Yes, thanks for pointing, but I already know that it is completing the 
 balancing when exiting otherwise it shouldn't exit. 
 Your answer doesn't solve the problem I mentioned earlier in my message. 
 'hdfs' is stalling and hadoop is not writing unless space is cleared up from 
 the cluster even though df shows the cluster has about 500 TB of free 
 space. 
 
 ---
  
 
 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:
 
  -setBalancerBandwidth bandwidth in bytes per second
 
 So the value is bytes per second. If it is running and exiting,it means it 
 has completed the balancing. 
 
 
 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Yes, we are running balancer, though a balancer process runs for almost a 
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's 
 bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in 
 Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?
 
 -
 
 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:
 
 Are you running balancer? If balancer is running and if it is slow, try 
 increasing the balancer bandwidth
 
 
 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Thanks for the follow up. I don't know whether attachment will pass through 
 this mailing list, but I am attaching a pdf that contains the usage of all 
 live nodes.
 
 All nodes starting with letter g are the ones with smaller storage space 
 where as nodes starting with letter s have larger storage space. As you 
 will see, most of the gXX nodes are completely full whereas sXX nodes 
 have a lot of unused space. 
 
 Recently, we are facing crisis frequently as 'hdfs' goes into a mode where 
 it is not able to write any further even though the total space available 
 in the cluster is about 500 TB. We believe this has something to do with 
 the way it is balancing the nodes, but don't understand the problem yet. 
 May be the attached PDF will help some of you (experts) to see what is 
 going wrong here...
 
 Thanks
 --
 
 
 
 
 
 
 
 Balancer know about topology,but when calculate balancing it operates only 
 with nodes not with racks.
 You can see how it work in Balancer.java in  BalancerDatanode about string 
 509.
 
 I was wrong about 350Tb,35Tb it calculates in such way :
 
 For example:
 cluster_capacity=3.5Pb
 cluster_dfsused=2Pb
 
 avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
 Then we know avg node utilization (node_dfsused/node_capacity*100).
 Balancer thinks that all is good if avgutil+10 >= node_utilization >= avgutil-10.
 
 Ideal case that all node used avgutl of capacity.but for 12TB node its 
 only 6.5Tb and for 72Tb its about 40Tb.
 
 Balancer cant help you.
 
 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if 
 you can.
 
  
 
 
 In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you 
 will be able to have only 12Tb replication data.
 
 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 
 TB, but not true for more than two nodes in the cluster.
 
 
 Best way,on my opinion,it is using multiple racks.Nodes in rack must be 
 with identical capacity.Racks must be identical capacity.
 For example:
 
 rack1: 1 node with 72Tb
 rack2: 6 nodes with 12Tb
 rack3: 3 nodes with 24Tb
 
 It helps with balancing,because dublicated  block must be another rack.
 
 
 The same question I asked earlier in this message, does multiple racks 
 with default threshold for the balancer minimizes the difference between 
 racks ?
 
 Why did you select hdfs?May be lustre,cephfs and other is better choise.  
 
 It wasn't my decision, and I probably can't change it now. I am new to 
 this cluster and trying to understand few issues. I will explore other 
 options as you mentioned.
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan
 
 
 
 
 -- 
 http://balajin.net/blog
 http://flic.kr/balajijegan
 
 



Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Alexey Babutin
I think that it may help, but start with 1 node and watch where the data has moved.

On Mon, Mar 25, 2013 at 12:44 AM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Hi,

 Thanks for the idea, I will give this a try and report back.

 My worry is if we decommission a small node (one at a time), will it move
 the data to larger nodes or choke another smaller nodes ? In principle it
 should distribute the blocks, the point is it is not distributing the way
 we expect it to, so do you think this may cause further problems ?

 -

 On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote:

 Then I think the only way around this would be to decommission  1 at a
 time, the smaller nodes, and ensure that the blocks are moved to the larger
 nodes.

 And once complete, bring back in the smaller nodes, but maybe only after
 you tweak the rack topology to match your disk layout more than network
 layout to compensate for the unbalanced nodes.


 Just my 2 cents


 On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Thanks. We have a 1-1 configuration of drives and folder in all the
 datanodes.

 -Tapas

 On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:

 On both types of nodes, what is your dfs.data.dir set to? Does it specify
 multiple folders on the same set's of drives or is it 1-1 between folder
 and drive?  If it's set to multiple folders on the same drives, it
 is probably multiplying the amount of available capacity incorrectly in
 that it assumes a 1-1 relationship between folder and total capacity of the
 drive.


 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi 
 tapas.sara...@gmail.comwrote:

 Yes, thanks for pointing, but I already know that it is completing the
 balancing when exiting otherwise it shouldn't exit.
 Your answer doesn't solve the problem I mentioned earlier in my message.
 'hdfs' is stalling and hadoop is not writing unless space is cleared up
 from the cluster even though df shows the cluster has about 500 TB of
 free space.

 ---


 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second

 So the value is bytes per second. If it is running and exiting,it means
 it has completed the balancing.


 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for almost
 a day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try
 increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the 
 usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage space. 
 As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something to
 do with the way it is balancing the nodes, but don't understand the 
 problem
 yet. May be the attached PDF will help some of you (experts) to see what 
 is
 going wrong here...

 Thanks
 --








Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
It shouldn't cause further problems since most of your small nodes are
already at their capacity.  You could set or increase the dfs reserved
property on your smaller nodes to force the flow of blocks onto the larger
nodes.
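
A sketch of what that could look like in hdfs-site.xml on the small datanodes 
(the property is dfs.datanode.du.reserved, mentioned later in this thread; the 
500 GB value is purely illustrative):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- reserved non-DFS space in bytes per volume; 536870912000 is ~500 GB -->
    <value>536870912000</value>
  </property>

The datanode then stops counting that much space as usable for blocks, which 
pushes new writes toward the larger nodes.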
On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Hi,

 Thanks for the idea, I will give this a try and report back.

 My worry is if we decommission a small node (one at a time), will it move
 the data to larger nodes or choke another smaller nodes ? In principle it
 should distribute the blocks, the point is it is not distributing the way
 we expect it to, so do you think this may cause further problems ?

 -

 On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote:

 Then I think the only way around this would be to decommission  1 at a
 time, the smaller nodes, and ensure that the blocks are moved to the larger
 nodes.

 And once complete, bring back in the smaller nodes, but maybe only after
 you tweak the rack topology to match your disk layout more than network
 layout to compensate for the unbalanced nodes.


 Just my 2 cents


 On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Thanks. We have a 1-1 configuration of drives and folder in all the
 datanodes.

 -Tapas

 On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:

 On both types of nodes, what is your dfs.data.dir set to? Does it specify
 multiple folders on the same set's of drives or is it 1-1 between folder
 and drive?  If it's set to multiple folders on the same drives, it
 is probably multiplying the amount of available capacity incorrectly in
 that it assumes a 1-1 relationship between folder and total capacity of the
 drive.


 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi 
 tapas.sara...@gmail.comwrote:

 Yes, thanks for pointing, but I already know that it is completing the
 balancing when exiting otherwise it shouldn't exit.
 Your answer doesn't solve the problem I mentioned earlier in my message.
 'hdfs' is stalling and hadoop is not writing unless space is cleared up
 from the cluster even though df shows the cluster has about 500 TB of
 free space.

 ---


 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second

 So the value is bytes per second. If it is running and exiting,it means
 it has completed the balancing.


 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for almost
 a day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow, try
 increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the 
 usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage space. 
 As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something to
 do with the way it is balancing the nodes, but don't understand the 
 problem
 yet. May be the attached PDF will help some of you (experts) to see what 
 is
 going wrong here...

 Thanks
 --








Re: question for committer

2013-03-24 Thread Azuryy Yu
Good question. I just want HA; I don't want to change more configuration.
On Mar 25, 2013 2:32 AM, Balaji Narayanan (பாலாஜி நாராயணன்) 
li...@balajin.net wrote:

 Is there a reason why you don't want to run MRv2 under YARN?


 On 22 March 2013 22:49, Azuryy Yu azury...@gmail.com wrote:

 Is there a way to separate HDFS 2 from Hadoop 2? I want to use HDFS 2 with
 MapReduce 1.0.4 and exclude YARN, because I need HDFS HA.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan




Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi
Thanks. Does this need a restart of hadoop in the nodes where this modification 
is made ?

-

On Mar 24, 2013, at 8:06 PM, Jamal B jm151...@gmail.com wrote:

 dfs.datanode.du.reserved
 
 You could tweak that param on the smaller nodes to force the flow of blocks 
 to other nodes.   A short term hack at best, but should help the situation a 
 bit.
 
 On Mar 24, 2013 7:09 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:
 
 On Mar 24, 2013, at 4:34 PM, Jamal B jm151...@gmail.com wrote:
 
 It shouldn't cause further problems since most of your small nodes are 
 already their capacity.  You could set or increase the dfs reserved property 
 on your smaller nodes to force the flow of blocks onto the larger nodes.
 
 
 
 Thanks.  Can you please specify which dfs properties we can set 
 or modify to force the flow of blocks towards the larger nodes rather than 
 the smaller nodes?
 
 -
 
 
 
 
 
 
 On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Hi,
 
 Thanks for the idea, I will give this a try and report back. 
 
 My worry is if we decommission a small node (one at a time), will it move 
 the data to larger nodes or choke another smaller nodes ? In principle it 
 should distribute the blocks, the point is it is not distributing the way we 
 expect it to, so do you think this may cause further problems ?
 
 -
 
 On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote:
 
 Then I think the only way around this would be to decommission  1 at a 
 time, the smaller nodes, and ensure that the blocks are moved to the larger 
 nodes.  
 And once complete, bring back in the smaller nodes, but maybe only after 
 you tweak the rack topology to match your disk layout more than network 
 layout to compensate for the unbalanced nodes.  
 
 Just my 2 cents
 
 
 On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.com 
 wrote:
 Thanks. We have a 1-1 configuration of drives and folder in all the 
 datanodes.
 
 -Tapas
 
 On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:
 
 On both types of nodes, what is your dfs.data.dir set to? Does it specify 
 multiple folders on the same set's of drives or is it 1-1 between folder 
 and drive?  If it's set to multiple folders on the same drives, it is 
 probably multiplying the amount of available capacity incorrectly in 
 that it assumes a 1-1 relationship between folder and total capacity of 
 the drive.
 
 
 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com 
 wrote:
 Yes, thanks for pointing, but I already know that it is completing the 
 balancing when exiting otherwise it shouldn't exit. 
 Your answer doesn't solve the problem I mentioned earlier in my message. 
 'hdfs' is stalling and hadoop is not writing unless space is cleared up 
 from the cluster even though df shows the cluster has about 500 TB of 
 free space. 
 
 ---
  
 
 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:
 
  -setBalancerBandwidth bandwidth in bytes per second
 
 So the value is bytes per second. If it is running and exiting,it means 
 it has completed the balancing. 
 
 
 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Yes, we are running balancer, though a balancer process runs for almost a 
 day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume 
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If 
 it is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?
 
 -
 
 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:
 
 Are you running balancer? If balancer is running and if it is slow, try 
 increasing the balancer bandwidth
 
 
 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:
 Thanks for the follow up. I don't know whether attachment will pass 
 through this mailing list, but I am attaching a pdf that contains the 
 usage of all live nodes.
 
 All nodes starting with letter g are the ones with smaller storage 
 space where as nodes starting with letter s have larger storage space. 
 As you will see, most of the gXX nodes are completely full whereas 
 sXX nodes have a lot of unused space. 
 
 Recently, we are facing crisis frequently as 'hdfs' goes into a mode 
 where it is not able to write any further even though the total space 
 available in the cluster is about 500 TB. We believe this has something 
 to do with the way it is balancing the nodes, but don't understand the 
 problem yet. May be the attached PDF will help some of you (experts) to 
 see what is going wrong here...
 
 Thanks
 --
 
 
 
 
 
 
 
 

Re: Child JVM memory allocation / Usage

2013-03-24 Thread Ted
Did you set the min heap size == your max heap size? If you didn't,
free memory only shows you the difference between used and committed, not
used and max.
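
For example, with MR1-style configuration as used elsewhere in this thread, that 
would mean something like the following (the 2048m matches the heap already being 
discussed; only a sketch):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xms2048m -Xmx2048m</value>
  </property>

With -Xms pinned to -Xmx, freeMemory() is measured against the full heap rather 
than whatever has been committed so far.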

On 3/24/13, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote:
 Hi,

 I configured  my child jvm heap to 2 GB. So, I thought I could really read
 1.5GB of data and store it in memory (mapper/reducer).

 I wanted to confirm the same and wrote the following piece of code in the
 configure method of mapper.

 @Override
 public void configure(JobConf job) {
   System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
   System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
 }


 Surprisingly the output was


 FREE MEMORY -- 341854864  = 320 MB
 MAX MEMORY ---1908932608  = 1.9 GB


 I am just wondering what processes are taking up that extra 1.6GB of
 heap which I configured for the child jvm heap.


 Appreciate in helping me understand the scenario.



 Regards

 Nagarjuna K



-- 
Ted.


Re: Child JVM memory allocation / Usage

2013-03-24 Thread nagarjuna kanamarlapudi
Hi Ted,

As far as I can recollect, I only configured these parameters:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
  <description>this number is the number of megabytes of memory that
  each mapper and each reducer will have available to use. If jobs start
  running out of heap space, this may need to be increased.</description>
</property>

<property>
  <name>mapred.child.ulimit</name>
  <value>3145728</value>
  <description>this number is the number of kilobytes of memory that
  each mapper and each reducer will have available to use. If jobs start
  running out of heap space, this may need to be increased.</description>
</property>
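
A small illustration of the committed-vs-max distinction Ted mentions; this is 
just a sketch that could be dropped into the same configure() method:

  long max       = Runtime.getRuntime().maxMemory();    // the -Xmx ceiling (~1.9 GB here)
  long committed = Runtime.getRuntime().totalMemory();  // heap actually reserved so far
  long free      = Runtime.getRuntime().freeMemory();   // free space inside the committed heap
  System.out.println("max=" + max + " committed=" + committed + " free=" + free);

Unless -Xms equals -Xmx, the committed value starts small and grows on demand, 
which is why freeMemory() can look tiny even though maxMemory() reports the 
full heap.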



On Mon, Mar 25, 2013 at 6:57 AM, Ted r6squee...@gmail.com wrote:

 did you set the min heap size == your max head size? if you didn't,
 free memory only shows you the difference between used and commit, not
 used and max.

 On 3/24/13, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
 wrote:
  Hi,
 
  I configured  my child jvm heap to 2 GB. So, I thought I could really
 read
  1.5GB of data and store it in memory (mapper/reducer).
 
  I wanted to confirm the same and wrote the following piece of code in the
  configure method of mapper.
 
  @Override
  public void configure(JobConf job) {
    System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
    System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
  }
 
 
  Surprisingly the output was
 
 
  FREE MEMORY -- 341854864  = 320 MB
  MAX MEMORY ---1908932608  = 1.9 GB
 
 
  I am just wondering what processes are taking up that extra 1.6GB of
  heap which I configured for the child jvm heap.
 
 
  Appreciate in helping me understand the scenario.
 
 
 
  Regards
 
  Nagarjuna K
 


 --
 Ted.



[no subject]

2013-03-24 Thread Fan Bai

Dear Sir,

I have a question about Hadoop: when I use Hadoop and MapReduce to run a job 
(only one job here), can I control which node each file is processed on?

For example, I have only one job and this job has 10 files (10 mappers need to 
run). Also, among my servers I have one head node and four worker nodes. My 
question is: can I control which node each of those 10 files is processed on? For 
example: file No.1 on node1, file No.3 on node2, file No.5 on node3 and 
file No.8 on node4.

If I can do this, that means I can control the tasks. Does that also mean I can 
control where a file goes in the next round (I have a loop on the head node, so I 
can run another MapReduce job)? For example, I could have file No.5 processed on 
node3 in the 1st round and on node2 in the 2nd round.

If I cannot, does that mean that, for Hadoop, which node a file is processed on is 
a “black box”: the user cannot control it, because the user is not expected to need 
that control and should simply let HDFS take care of the parallel work? 
In that case, Hadoop cannot control the tasks within one job, but can control 
multiple jobs.

Thank you so much!



Fan Bai
PhD Candidate
Computer Science Department
Georgia State University
Atlanta, GA 30303


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Jamal B
Yes
On Mar 24, 2013 9:25 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks. Does this need a restart of hadoop in the nodes where this
 modification is made ?

 -

 On Mar 24, 2013, at 8:06 PM, Jamal B jm151...@gmail.com wrote:

 dfs.datanode.du.reserved

 You could tweak that param on the smaller nodes to force the flow of
 blocks to other nodes.   A short term hack at best, but should help the
 situation a bit.
 On Mar 24, 2013 7:09 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:


 On Mar 24, 2013, at 4:34 PM, Jamal B jm151...@gmail.com wrote:

 It shouldn't cause further problems since most of your small nodes are
 already their capacity.  You could set or increase the dfs reserved
 property on your smaller nodes to force the flow of blocks onto the larger
 nodes.


 Thanks.  Can you please specify which are the dfs properties that we can
 set or modify to force the flow of blocks directed towards the larger nodes
 than the smaller nodes ?

 -






 On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Hi,

 Thanks for the idea, I will give this a try and report back.

 My worry is if we decommission a small node (one at a time), will it
 move the data to larger nodes or choke another smaller nodes ? In principle
 it should distribute the blocks, the point is it is not distributing the
 way we expect it to, so do you think this may cause further problems ?

 -

 On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote:

 Then I think the only way around this would be to decommission  1 at a
 time, the smaller nodes, and ensure that the blocks are moved to the larger
 nodes.

 And once complete, bring back in the smaller nodes, but maybe only after
 you tweak the rack topology to match your disk layout more than network
 layout to compensate for the unbalanced nodes.


 Just my 2 cents


 On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi 
 tapas.sara...@gmail.comwrote:

 Thanks. We have a 1-1 configuration of drives and folder in all the
 datanodes.

 -Tapas

 On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:

 On both types of nodes, what is your dfs.data.dir set to? Does it
 specify multiple folders on the same set's of drives or is it 1-1 between
 folder and drive?  If it's set to multiple folders on the same drives, it
 is probably multiplying the amount of available capacity incorrectly in
 that it assumes a 1-1 relationship between folder and total capacity of the
 drive.


 On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com
  wrote:

 Yes, thanks for pointing, but I already know that it is completing the
 balancing when exiting otherwise it shouldn't exit.
 Your answer doesn't solve the problem I mentioned earlier in my
 message. 'hdfs' is stalling and hadoop is not writing unless space is
 cleared up from the cluster even though df shows the cluster has about
 500 TB of free space.

 ---


 On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 bal...@balajin.net wrote:

  -setBalancerBandwidth bandwidth in bytes per second

 So the value is bytes per second. If it is running and exiting,it
 means it has completed the balancing.


 On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Yes, we are running balancer, though a balancer process runs for
 almost a day or more before exiting and starting over.
 Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
 that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If 
 it
 is in Bits then we have a problem.
 What's the unit for dfs.balance.bandwidthPerSec ?

 -

 On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
 li...@balajin.net wrote:

 Are you running balancer? If balancer is running and if it is slow,
 try increasing the balancer bandwidth


 On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.comwrote:

 Thanks for the follow up. I don't know whether attachment will pass
 through this mailing list, but I am attaching a pdf that contains the 
 usage
 of all live nodes.

 All nodes starting with letter g are the ones with smaller storage
 space where as nodes starting with letter s have larger storage 
 space. As
 you will see, most of the gXX nodes are completely full whereas sXX
 nodes have a lot of unused space.

 Recently, we are facing crisis frequently as 'hdfs' goes into a mode
 where it is not able to write any further even though the total space
 available in the cluster is about 500 TB. We believe this has something 
 to
 do with the way it is balancing the nodes, but don't understand the 
 problem
 yet. May be the attached PDF will help some of you (experts) to see 
 what is
 going wrong here...

 Thanks
 --








Re: is it possible to disable security in MapReduce to avoid having PriviledgedActionException?

2013-03-24 Thread Harsh J
What is the exact error you're getting? Can you please paste the
full stack trace and tell us the version you're using?

Many times the PriviledgedActionException is just a wrapper around the
real cause and gets overlooked. It does not necessarily appear due to
security code (whether security is enabled or disabled).

In any case, if you meant to run MR with zero UGI.doAs (which will
wrap with that exception) then no, that's not possible to do.

On Mon, Mar 25, 2013 at 12:57 AM, Pedro Sá da Costa psdc1...@gmail.com wrote:
 Hi,

 is it possible to disable security in MapReduce to avoid having
 PriviledgedActionException?

 Thanks,




-- 
Harsh J


Re:Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread see1230
If the balancer is not running, or is running with low bandwidth and reacting 
slowly, I think there may be a significant asymmetry between the datanodes.






At 2013-03-25 04:37:05,Jamal B jm151...@gmail.com wrote:

Then I think the only way around this would be to decommission  1 at a time, 
the smaller nodes, and ensure that the blocks are moved to the larger nodes.  
And once complete, bring back in the smaller nodes, but maybe only after you 
tweak the rack topology to match your disk layout more than network layout to 
compensate for the unbalanced nodes.  


Just my 2 cents



On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:

Thanks. We have a 1-1 configuration of drives and folder in all the datanodes.


-Tapas


On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote:


On both types of nodes, what is your dfs.data.dir set to? Does it specify 
multiple folders on the same set's of drives or is it 1-1 between folder and 
drive?  If it's set to multiple folders on the same drives, it is probably 
multiplying the amount of available capacity incorrectly in that it assumes a 
1-1 relationship between folder and total capacity of the drive.



On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote:

Yes, thanks for pointing, but I already know that it is completing the 
balancing when exiting otherwise it shouldn't exit. 
Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' 
is stalling and hadoop is not writing unless space is cleared up from the 
cluster even though df shows the cluster has about 500 TB of free space. 


---
 


On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
bal...@balajin.net wrote:


 -setBalancerBandwidth bandwidth in bytes per second

So the value is bytes per second. If it is running and exiting,it means it has 
completed the balancing.




On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote:

Yes, we are running balancer, though a balancer process runs for almost a day 
or more before exiting and starting over.
Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's 
bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits 
then we have a problem.
What's the unit for dfs.balance.bandwidthPerSec ?


-


On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) 
li...@balajin.net wrote:


Are you running balancer? If balancer is running and if it is slow, try 
increasing the balancer bandwidth




On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

Thanks for the follow up. I don't know whether attachment will pass through 
this mailing list, but I am attaching a pdf that contains the usage of all live 
nodes.


All nodes starting with letter g are the ones with smaller storage space 
where as nodes starting with letter s have larger storage space. As you will 
see, most of the gXX nodes are completely full whereas sXX nodes have a lot 
of unused space. 


Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it 
is not able to write any further even though the total space available in the 
cluster is about 500 TB. We believe this has something to do with the way it is 
balancing the nodes, but don't understand the problem yet. May be the attached 
PDF will help some of you (experts) to see what is going wrong here...


Thanks
--














Hadoop-2.x native libraries

2013-03-24 Thread Azuryy Yu
Hi,
How do I get the hadoop-2.0.3-alpha native libraries? In the currently released 
package they were compiled under a 32-bit OS.


Re: Hadoop-2.x native libraries

2013-03-24 Thread Harsh J
If you're using a tarball, you'll need to build a native-added tarball
yourself with mvn package -Pdist,native,docs -DskipTests -Dtar and
then use that.

Alternatively, if you're interested in packages, use the Apache
Bigtop's scripts from http://bigtop.apache.org/ project's repository
and generate the packages with native libs as well.
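
If the build succeeds, the native bits normally end up under the dist output, 
for example somewhere like:

  hadoop-dist/target/hadoop-2.0.3-alpha/lib/native/libhadoop.so
  hadoop-dist/target/hadoop-2.0.3-alpha/lib/native/libhdfs.so

(paths quoted from memory, so treat them as approximate).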

On Mon, Mar 25, 2013 at 9:27 AM, Azuryy Yu azury...@gmail.com wrote:
 Hi,
 How to get hadoop-2.0.3-alpha native libraries, it was compiled under 32bits
 OS in the released package currently.





-- 
Harsh J


Re: Child JVM memory allocation / Usage

2013-03-24 Thread Harsh J
The MapTask may consume some memory of its own as well. What is your
io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?

On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
nagarjuna.kanamarlap...@gmail.com wrote:
 Hi,

 I configured  my child jvm heap to 2 GB. So, I thought I could really read
 1.5GB of data and store it in memory (mapper/reducer).

 I wanted to confirm the same and wrote the following piece of code in the
 configure method of mapper.

 @Override
 public void configure(JobConf job) {
   System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
   System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
 }


 Surprisingly the output was


 FREE MEMORY -- 341854864  = 320 MB
 MAX MEMORY ---1908932608  = 1.9 GB


 I am just wondering what processes are taking up that extra 1.6GB of heap
 which I configured for the child jvm heap.


 Appreciate in helping me understand the scenario.



 Regards

 Nagarjuna K






-- 
Harsh J


Re: Child JVM memory allocation / Usage

2013-03-24 Thread nagarjuna kanamarlapudi
io.sort.mb = 256 MB
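
Rough back-of-the-envelope, assuming (as in MR1) that the sort buffer is 
allocated inside the same child heap:

  -Xmx child heap            ~ 2048 MB
  io.sort.mb sort buffer     ~  256 MB   (allocated by the MapTask inside that heap)
  framework / task objects   ~  tens of MB
  left over for map() data   =  whatever remains of the 2048 MB

So maxMemory() (~1.9 GB) is the real ceiling; the small freeMemory() reading is 
just the free space within the heap committed at that moment, as noted earlier 
in the thread.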

On Monday, March 25, 2013, Harsh J wrote:

 The MapTask may consume some memory of its own as well. What is your
 io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?

 On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
 nagarjuna.kanamarlap...@gmail.com javascript:; wrote:
  Hi,
 
  I configured  my child jvm heap to 2 GB. So, I thought I could really
 read
  1.5GB of data and store it in memory (mapper/reducer).
 
  I wanted to confirm the same and wrote the following piece of code in the
  configure method of mapper.
 
  @Override
  public void configure(JobConf job) {
    System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
    System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
  }
 
 
  Surprisingly the output was
 
 
  FREE MEMORY -- 341854864  = 320 MB
  MAX MEMORY ---1908932608  = 1.9 GB
 
 
  I am just wondering what processes are taking up that extra 1.6GB of heap
  which I configured for the child jvm heap.
 
 
  Appreciate in helping me understand the scenario.
 
 
 
  Regards
 
  Nagarjuna K
 
 
 



 --
 Harsh J



-- 
Sent from iPhone


shuffling one intermediate pair to more than one reducer

2013-03-24 Thread Vikas Jadhav
Hello

I have a use case where I want to shuffle the same pair to more than one reducer.
Has anyone tried this, or can anyone give a suggestion on how to implement it?


I have created a JIRA for the same:
https://issues.apache.org/jira/browse/MAPREDUCE-5063

Thank you.
-- 
Thanx and Regards,
Vikas Jadhav
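
One way this is sometimes approximated on the map side, sketched with 
hypothetical names (the shuffle itself still routes each key to exactly one 
reducer, so the mapper emits the pair once per target partition):

  // inside map(); numTargetReducers and the "#r" suffix convention are made up
  // for illustration; a custom Partitioner would route on the suffix
  for (int r = 0; r < numTargetReducers; r++) {
      output.collect(new Text(key.toString() + "#" + r), value);
  }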


Re: MapReduce Failed and Killed

2013-03-24 Thread Hemanth Yamijala
Any MapReduce task needs to communicate with the tasktracker that launched
it periodically in order to let the tasktracker know it is still alive and
active. The time for which silence is tolerated is controlled by a
configuration property mapred.task.timeout.
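
For reference, that property lives in mapred-site.xml and is expressed in 
milliseconds, so the 20 minutes mentioned in the next paragraph corresponds to:

  <property>
    <name>mapred.task.timeout</name>
    <value>1200000</value>
  </property>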

It looks like in your case, this has already been bumped up to 20 minutes
(from the default 10 minutes). It also looks like this is not sufficient.
You could bump this value even further up. However, the correct approach
could be to see what the reducer is actually doing to become inactive
during this time. Can you look at the reducer attempt's logs (which you can
access from the web UI of the Jobtracker) and post them here ?

Thanks
hemanth
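
If the reducer really is doing long stretches of work between outputs, one 
common pattern with the old mapred API (the Text types below are placeholders) 
is to ping the framework explicitly so the timeout never trips:

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Text v = values.next();
      // ... expensive per-value work ...
      reporter.progress();   // tells the tasktracker this attempt is still alive
    }
  }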


On Fri, Mar 22, 2013 at 5:32 PM, Jinchun Kim cien...@gmail.com wrote:

 Hi, All.

 I'm trying to create category-based splits of Wikipedia dataset(41GB) and
 the training data set(5GB) using Mahout.
 I'm using following command.

 $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o
 wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt

 I had no problem with the training data set, but Hadoop showed following
 messages
 when I tried to do a same job with Wikipedia dataset,

 .
 13/03/21 22:31:00 INFO mapred.JobClient:  map 27% reduce 1%
 13/03/21 22:40:31 INFO mapred.JobClient:  map 27% reduce 2%
 13/03/21 22:58:49 INFO mapred.JobClient:  map 27% reduce 3%
 13/03/21 23:22:57 INFO mapred.JobClient:  map 27% reduce 4%
 13/03/21 23:46:32 INFO mapred.JobClient:  map 27% reduce 5%
 13/03/22 00:27:14 INFO mapred.JobClient:  map 27% reduce 6%
 13/03/22 01:06:55 INFO mapred.JobClient:  map 27% reduce 7%
 13/03/22 01:14:06 INFO mapred.JobClient:  map 27% reduce 3%
 13/03/22 01:15:35 INFO mapred.JobClient: Task Id :
 attempt_201303211339_0002_r_00_1, Status : FAILED
 Task attempt_201303211339_0002_r_00_1 failed to report status for 1200
 seconds. Killing!
 13/03/22 01:20:09 INFO mapred.JobClient:  map 27% reduce 4%
 13/03/22 01:33:35 INFO mapred.JobClient: Task Id :
 attempt_201303211339_0002_m_37_1, Status : FAILED
 Task attempt_201303211339_0002_m_37_1 failed to report status for 1228
 seconds. Killing!
 13/03/22 01:35:12 INFO mapred.JobClient:  map 27% reduce 5%
 13/03/22 01:40:38 INFO mapred.JobClient:  map 27% reduce 6%
 13/03/22 01:52:28 INFO mapred.JobClient:  map 27% reduce 7%
 13/03/22 02:16:27 INFO mapred.JobClient:  map 27% reduce 8%
 13/03/22 02:19:02 INFO mapred.JobClient: Task Id :
 attempt_201303211339_0002_m_18_1, Status : FAILED
 Task attempt_201303211339_0002_m_18_1 failed to report status for 1204
 seconds. Killing!
 13/03/22 02:49:03 INFO mapred.JobClient:  map 27% reduce 9%
 13/03/22 02:52:04 INFO mapred.JobClient:  map 28% reduce 9%
 

 Because I just started to learn how to run Hadoop, I have no idea how to
 solve
 this problem...
 Does anyone have an idea how to handle this weird thing?

 --
 *Jinchun Kim*



Re: Hadoop-2.x native libraries

2013-03-24 Thread Azuryy Yu
Thanks Harsh!
I used -Pnative and got it.
I am compiling the source code. I made MRv1 work with HDFSv2 successfully.
 On Mar 25, 2013 12:56 PM, Harsh J ha...@cloudera.com wrote:

 If you're using a tarball, you'll need to build a native-added tarball
 yourself with mvn package -Pdist,native,docs -DskipTests -Dtar and
 then use that.

 Alternatively, if you're interested in packages, use the Apache
 Bigtop's scripts from http://bigtop.apache.org/ project's repository
 and generate the packages with native libs as well.

 On Mon, Mar 25, 2013 at 9:27 AM, Azuryy Yu azury...@gmail.com wrote:
  Hi,
  How to get hadoop-2.0.3-alpha native libraries, it was compiled under
 32bits
  OS in the released package currently.
 
 



 --
 Harsh J



Any answer ? Candidate application for map reduce

2013-03-24 Thread AMARNATH, Balachandar
Any answers from any of you? :)


Regards
Bala

From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com]
Sent: 22 March 2013 10:25
To: user@hadoop.apache.org
Subject: Candidate application for map reduce

Hello,

I am looking for a sample application (preferably image processing, feature 
detection, etc.) that would be a good candidate for the map-reduce paradigm. To be 
very specific, I am looking for a simple open-source application that processes 
data and produces some result (A), such that when you split the data into chunks, 
feed the chunks to the application, and merge the processed chunks, you get A 
back. Is there any website where I can look for such benchmark applications? 
Any pointers and thoughts will be helpful here.

With thanks and regards
Balachandar




The information in this e-mail is confidential. The contents may not be 
disclosed or used by anyone other than the addressee. Access to this e-mail by 
anyone else is unauthorised.

If you are not the intended recipient, please notify Airbus immediately and 
delete this e-mail.

Airbus cannot accept any responsibility for the accuracy or completeness of 
this e-mail as it has been sent over public networks. If you have any concerns 
over the content of this message or its Accuracy or Integrity, please contact 
Airbus immediately.

All outgoing e-mails from Airbus are checked using regularly updated virus 
scanning software but you should take whatever measures you deem to be 
appropriate to ensure that this message and any attachments are virus free.
