
Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-25 Thread Alexey Babutin

On Mon, Mar 25, 2013 at 4:29 AM, Tapas Sarangi tapas.sara...@gmail.com wrote:

Hi,

Thanks for the explanation. Where can I find the Java code for the balancer
that uses the threshold value, so that I can calculate it myself as you
mentioned? I think I understand your calculation, but I would like to see the code.

src/hdfs/org/apache/hadoop/hdfs/server/balancer/Balancer.java

see BalancerDatanode

If I set the threshold to 5 instead of 10, then the smaller nodes will be
at most 95% full, while the larger nodes' disk usage will increase
from 80% to 85%.

Now my question to you and the experts is when I run the balancer, is the
following command enough to set the threshold to a different value :

hadoop balancer -threshold 5

yes

Thanks to all for the suggestions...
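
The effect of the threshold asked about above can be sketched in a few lines. This is only an illustration, not the balancer's code: the function name is made up, and the cluster-wide utilization of 90% is an assumption inferred from the 95%/85% figures in the question.

```python
def balanced_band(avg_util, threshold):
    """Return the (low, high) utilization bounds, in percent, inside which
    a node counts as balanced. Bounds are clamped to the range 0..100."""
    low = max(0.0, avg_util - threshold)
    high = min(100.0, avg_util + threshold)
    return low, high


avg_util = 90.0  # assumed cluster-wide DFS used, in percent

for threshold in (10.0, 5.0):
    low, high = balanced_band(avg_util, threshold)
    print(f"threshold {threshold:g}: balanced between {low:g}% and {high:g}% used")
```

Under that assumption a threshold of 10 lets the small nodes fill completely, while a threshold of 5 caps them at 95%.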

---

Today I thought about my advice to you and realized that I was wrong.

For example, suppose we have 100 nodes: 80 with 12 TB and 20 with 72 TB. Every
node holds 10 TB of data.
Average cluster DFS used: 1000/2400*100 = 41.7%

For a 12 TB node, DFS used is 83.3% of capacity;
for a 72 TB node it is 13.9%.

A node is balanced if
average cluster DFS used + threshold >= node DFS used >= average cluster DFS used - threshold.
With a threshold of 10, data will move from the 12 TB nodes to the 72 TB nodes,
and the balancer will stop when the 12 TB nodes reach 51.7% of capacity.
At that point the 72 TB nodes are at about 35.0% of capacity.

As the cluster fills up, in the ideal case when cluster DFS used is about 90%
of capacity, the 72 TB nodes will be at about 80% of capacity and the 12 TB
nodes at about 100%. After that you have about 288 TB of free space left.
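
The arithmetic of this example can be checked with a short sketch. It is not taken from Balancer.java; the helper names are hypothetical, and it assumes the balancer simply drains the 12 TB nodes until they reach the upper bound (average + threshold). Computed directly from the node counts, the total capacity is 80 x 12 TB + 20 x 72 TB = 2400 TB, so the exact values may differ slightly from hand-rounded ones.

```python
THRESHOLD = 10.0  # percent; the balancer's default threshold

# (node_count, capacity_tb, used_tb_per_node), as in the example above
small = (80, 12.0, 10.0)
large = (20, 72.0, 10.0)


def utilization(used_tb, capacity_tb):
    """DFS used as a percentage of capacity."""
    return 100.0 * used_tb / capacity_tb


def is_balanced(node_util, avg_util, threshold):
    """True if the node lies within +/- threshold of the cluster average."""
    return avg_util - threshold <= node_util <= avg_util + threshold


total_capacity = small[0] * small[1] + large[0] * large[1]
total_used = small[0] * small[2] + large[0] * large[2]
avg_util = utilization(total_used, total_capacity)

# Assumption: small nodes are drained only down to avg_util + THRESHOLD.
small_target_tb = small[1] * (avg_util + THRESHOLD) / 100.0
moved_tb = small[0] * (small[2] - small_target_tb)

# The moved data is spread evenly over the large nodes.
large_after_tb = large[2] + moved_tb / large[0]
large_after_util = utilization(large_after_tb, large[1])

print(f"average cluster utilization: {avg_util:.1f}%")
print(f"12 TB nodes stop at:         {avg_util + THRESHOLD:.1f}%")
print(f"72 TB nodes end up at:       {large_after_util:.1f}%")
```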


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்

Are you running the balancer? If the balancer is running and it is slow, try
increasing the balancer bandwidth.


On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote:

 Thanks for the follow up. I don't know whether the attachment will pass
 through this mailing list, but I am attaching a PDF that contains the usage
 of all live nodes.

 All nodes starting with the letter g are the ones with smaller storage space,
 whereas nodes starting with the letter s have larger storage space. As you
 will see, most of the gXX nodes are completely full whereas the sXX nodes
 have a lot of unused space.

 Recently we have frequently been facing a crisis where 'hdfs' goes into a mode
 in which it is not able to write any further, even though the total space
 available in the cluster is about 500 TB. We believe this has something to do
 with the way it is balancing the nodes, but we don't understand the problem
 yet. Maybe the attached PDF will help some of you (experts) to see what is
 going wrong here...

 Thanks
 --

 The balancer knows about the topology, but when it calculates balancing it
 operates only on nodes, not on racks.
 You can see how it works in Balancer.java, in BalancerDatanode, around line
 509.

 I was wrong about 350 TB and 35 TB; it is calculated this way:

 For example:
 cluster_capacity = 3.5 PB
 cluster_dfsused = 2 PB

 avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
 Then we know each node's utilization (node_dfsused / node_capacity * 100).
 The balancer considers everything fine if
 avgutil + 10 >= node_utilization >= avgutil - 10.

 In the ideal case every node uses avgutil of its capacity, but for a 12 TB node
 that is only about 6.9 TB and for a 72 TB node it is about 41 TB.
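
The calculation described above can be written out as a small sketch. It is an illustration only, not the code from Balancer.java: the function names and the classification labels are hypothetical, and capacities are expressed in TB (3.5 PB = 3500 TB). The exact values it prints may differ slightly from hand-rounded figures.

```python
cluster_capacity_tb = 3500.0  # 3.5 PB
cluster_dfsused_tb = 2000.0   # 2 PB
threshold = 10.0              # percent; the balancer's default

avgutil = cluster_dfsused_tb / cluster_capacity_tb * 100.0


def classify(node_dfsused_tb, node_capacity_tb):
    """Place a node relative to the band avgutil +/- threshold."""
    node_util = node_dfsused_tb / node_capacity_tb * 100.0
    if node_util > avgutil + threshold:
        return "over-utilized"
    if node_util < avgutil - threshold:
        return "under-utilized"
    return "balanced"


def ideal_used_tb(node_capacity_tb):
    """Data a node would hold if it sat exactly at the cluster average."""
    return node_capacity_tb * avgutil / 100.0


print(f"avgutil = {avgutil:.2f}%")
for capacity in (12.0, 72.0):
    print(f"{capacity:g} TB node at avgutil holds {ideal_used_tb(capacity):.1f} TB")
print(classify(12.0, 12.0))  # a completely full 12 TB node
print(classify(20.0, 72.0))  # a 72 TB node holding 20 TB
```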

 The balancer can't help you.

 Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you
 can.

 In the ideal case with replication factor 2 and two nodes of 12 TB and 72 TB,
 you will be able to store only 12 TB of replicated data.


 Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
 TB, but not true for more than two nodes in the cluster.
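
Both statements can be checked with an idealized model. The sketch below is an assumption-laden illustration, not HDFS's placement logic. It treats data as infinitely divisible, ignores racks and block sizes, and only enforces that a node holds at most one replica of any block. Under that model an amount D of unique data fits when sum(min(capacity, D)) >= replication * D. The function name is hypothetical.

```python
def max_unique_data(capacities_tb, replication):
    """Largest amount of unique data (TB) that fits with `replication`
    copies of everything and at most one copy per node (fluid model)."""

    def feasible(d):
        # A node can hold at most one replica of each block, so it can
        # contribute at most min(capacity, d) towards replication * d.
        return sum(min(c, d) for c in capacities_tb) >= replication * d

    lo, hi = 0.0, sum(capacities_tb) / replication
    if feasible(hi):
        return hi
    for _ in range(60):  # bisection on the feasibility boundary
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Two nodes: the small node limits everything.
print(round(max_unique_data([12.0, 72.0], 2), 3))

# Six small nodes plus one large node: the large node can be fully used.
print(round(max_unique_data([12.0] * 6 + [72.0], 2), 3))
```

With exactly two nodes the result is about 12 TB, while with six 12 TB nodes next to the 72 TB node the model allows about 72 TB of unique data.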


 The best way, in my opinion, is to use multiple racks. Nodes within a rack
 must have identical capacity, and racks must have identical total capacity.
 For example:

 rack1: 1 node with 72 TB
 rack2: 6 nodes with 12 TB
 rack3: 3 nodes with 24 TB

 This helps with balancing, because a duplicated block must go to another rack.
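
A quick check of the rack layout above, as a minimal sketch: the dictionary layout and variable names are made up for illustration, and the only thing it verifies is that each rack adds up to the same total capacity.

```python
# Node capacities in TB, grouped by rack, as in the example above.
racks = {
    "rack1": [72],
    "rack2": [12] * 6,
    "rack3": [24] * 3,
}

rack_capacity = {name: sum(nodes) for name, nodes in racks.items()}

for name, capacity in rack_capacity.items():
    print(f"{name}: {len(racks[name])} node(s), {capacity} TB total")

# All racks have equal total capacity when this set has exactly one element.
print("equal rack capacity:", len(set(rack_capacity.values())) == 1)
```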


 The same question I asked earlier in this message: do multiple racks, with
 the default balancer threshold, minimize the difference between racks?

 Why did you select HDFS? Maybe Lustre, CephFS or something else would be a
 better choice.


 It wasn't my decision, and I probably can't change it now. I am new to
 this cluster and trying to understand a few issues. I will explore other
 options as you mentioned.

 --
 http://balajin.net/blog
 http://flic.kr/balajijegan

Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi

Yes, we are running the balancer, though a balancer process runs for almost a
day or more before exiting and starting over.
The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's
bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it is in bits
then we have a problem.
What's the unit for dfs.balance.bandwidthPerSec?


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread பாலாஜி நாராயணன்

-setBalancerBandwidth <bandwidth in bytes per second>

So the value is in bytes per second. If it is running and exiting, it means it
has completed the balancing.
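
To put the configured value into perspective, here is a small unit-conversion sketch. The function names are hypothetical. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and that the limit applies per datanode. The transfer time is a lower bound that ignores disk and network limits.

```python
def describe_bandwidth(bytes_per_sec):
    """Return the bandwidth as (gigabytes per second, gigabits per second)."""
    gbytes = bytes_per_sec / 1e9
    gbits = bytes_per_sec * 8 / 1e9
    return gbytes, gbits


def hours_to_move(terabytes, bytes_per_sec):
    """Lower bound on the hours one datanode needs to move `terabytes`."""
    return terabytes * 1e12 / bytes_per_sec / 3600.0


configured = 2e9  # the 2x10^9 value mentioned in this thread

gbytes, gbits = describe_bandwidth(configured)
print(f"{configured:.0e} bytes/s = {gbytes:g} GB/s = {gbits:g} Gbit/s")
print(f"moving 1 TB takes at least {hours_to_move(1.0, configured):.2f} h")
```

Read as bytes per second, 2x10^9 corresponds to 16 Gbit/s, which may be more than a datanode's network link can carry, so at that setting the configured limit itself may not be the bottleneck.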


Re: disk used percentage is not symmetric on datanodes (balancer)

2013-03-24 Thread Tapas Sarangi

Yes, thanks for pointing that out, but I already know that it completes the
balancing when it exits; otherwise it shouldn't exit.
Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs'
is stalling and hadoop is not writing unless space is cleared up from the
cluster, even though df shows the cluster has about 500 TB of free space.
