[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-03-18 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368166#comment-14368166
 ] 

Xiaoyu Yao commented on HDFS-6133:
--

Hi [~zhaoyunjiong], I hit failures consistently with the new Balancer test 
added for this feature and opened HDFS-7954. 
Can you take a look and confirm if the balancer output is expected or not?  

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-27 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339994#comment-14339994
 ] 

zhaoyunjiong commented on HDFS-6133:


At the first, my idea is very similar with HDFS-4420. The main difference is I 
change getBlocks to exclude blocks belongs to some path.
https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=13943050page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13943050

After Daryn Sharp's comments, it changed to pin:   
https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=13980504page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980504


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-25 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337787#comment-14337787
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

Sure, just have filed HDFS-7849 for updating the documentation.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-25 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337881#comment-14337881
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Thanks a lot Nicholas!


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335287#comment-14335287
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi [~szetszwo],

Thanks for your explanation, and sorry for late reply. 

I agree with your assessment.  I wonder if we can update the config property 
description to say that enabling is not recommended before rolling upgrade is 
finished?

Thanks.

 

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330264#comment-14330264
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

By incompatible, we mean that the old software does work at all or the old 
features do not work correctly with the new software.  For the new features in 
the new software, it is impossible to work correctly with the old software.

In our case, if block pinning is enabled during rolling upgrade, everything 
works fine except the block pinning features.  So I think it is not an 
incompatible change.  What do you think?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328914#comment-14328914
 ] 

Yongjun Zhang commented on HDFS-6133:
-

HI [~cmccabe],

My understanding is that HDFS-4420's approach is a passive pinning: it tells 
balancer not to move the blocks belonging to a specified file path; on the 
other hand, HDFS-6133 proactively pin the blocks at write time by specifying 
favored nodes. I think if we do passive pinning after the the blocks of a 
file have been moved by balancer, then it may not be preferred behavior. Hi 
[~zhaoyunjiong], would you please share your thoughts when you started with 
this approach when HDFS-4420's patch was already there?

Hi [~szetszwo] and [~zhaoyunjiong], about HDFS-6133, I was wondering whether 
there is any compatibility issue when we do rolling upgrade? There are API 
changes at {{DataTansferProtocol#writeBlock}}, what would happen if DNs with 
HDFS-6133 (trying to pin) talks to DNs without the fix? Would you please 
comment? 

Thanks.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328931#comment-14328931
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

 ... I was wondering whether there is any compatibility issue when we do 
 rolling upgrade? ...

For rolling upgrade, the block pinning feature should be first disabled, then 
upgrade the cluster and then enable the feature.  What do you think?


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328935#comment-14328935
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi Nicholas,

Thanks a lot for your quick reply! 

I think that means we need to change the rolling upgrade steps, which still 
means a soft incompatible. Should we mark this jira as incompatible?


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14330028#comment-14330028
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi Nicholas,

Assuming that user doesn't change config during the upgrade process, since by 
default pinning is disabled, I guess your suggestion may be fine.  Would you 
please comment on whether the assumption that user doesn't change config is a 
valid assumption? It seems we don't have a hard control on that, which is what 
I was a little worried. Maybe we can change the config description to emphasize 
this rule?
 
Thanks.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-18 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326120#comment-14326120
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi [~szetszwo],

Thanks for your comment about the possible performance optimization, it's 
amazing that you even tried a test program, thanks for doing it! Yes, your 
result is pretty good. I agree that we don't have to do the optimization for 
now. I was just thinking about avoiding unnecessary computation if possible, 
especially for busy node. 

BTW, 
{quote}
In many clusters, I heard that on average a file only has 1.x blocks.
{quote}
I'm surprised to hear this statistics but good to know. I always had the 
impression that we deal with relatively small number of big files, rather than 
many small files.

Thanks.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326929#comment-14326929
 ] 

Colin Patrick McCabe commented on HDFS-6133:


I apologize for coming into this discussion really late.  I am curious, though, 
why did we decide to go with pinning rather than with the solution on 
HDFS-4420?  That solution seems simpler because it doesn't require modifying 
the datanode.  Instead, we could just tell the balancer to exclude the HBase 
paths.  Do you guys have some insight into this?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-17 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325393#comment-14325393
 ] 

Jingcheng Du commented on HDFS-6133:


Thanks, it was a wrong patch.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-17 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325391#comment-14325391
 ] 

Jingcheng Du commented on HDFS-6133:


Thanks, it was a wrong patch.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323907#comment-14323907
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

 ... Assume on average a file has 1k blocks, ...

In many clusters, I heard that on average a file only has 1.x blocks.

 ... the diff is 40k vs 10M. ...

This is on top of the 1000 favoredNodes and 10 replicas assumption.  So, the 
case is quite extreme.

Suppose we hit this extreme case, the io time used in writing the 1000 blocks 
file is significantly larger than the time used for 10M string comparisons.  
Just have tested it in my laptop.  It takes less than 100 ms for 10M string 
comparisons.  See my test program below.

{code}
  public static void main(String[] args) {
final Random ran = new Random();
final String prefix = ran.nextInt(256) + . + ran.nextInt(256)
+ . + ran.nextInt(256) + . + ran.nextInt(256) + :;

String[] a = new String[1000];
String[] b = new String[1000];
for(int i = 0; i  a.length; i++) {
  a[i] = prefix + ran.nextInt(1000);
  b[i] = prefix + ran.nextInt(1000);
}
int same = 0;
int different = 0;
final long starttime = System.currentTimeMillis();
for(int k = 0; k  10; k++) {
  for(int i = 0; i  a.length; i++) {
for(int j = 0; j  b.length; j++) {
  if (a[i].equals(b[j])) {
same++;
  } else {
different++;
  }
}
  }
}
final long duration = System.currentTimeMillis() - starttime;
System.out.println(duration= + duration +  ms, same= + same + , 
different= + different);
//Sample output:
//duration=73 ms, same=9830, different=9990170
  }
{code}

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323835#comment-14323835
 ] 

Hudson commented on HDFS-6133:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #97 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/97/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-17 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323911#comment-14323911
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

 ... the dfs.datanode.block-pinning.enabled is not referenced by any code, 
 ...

Looked at the wrong patch?  DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED is 
used in FsDatasetImpl.   When blockPinningEnabled is false, 
FsDatasetImpl.getPinning always return false and setPinning is an no-op.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-13 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319737#comment-14319737
 ] 

Jingcheng Du commented on HDFS-6133:


Hi [~zhaoyunjiong].
The release note mentions that dfs.datanode.block-pinning.enabled is the key 
to enable/disable the pinning. As reading the patch, I find that it only uses 
the favoredNodes (the first element) to decide if the pinning is enabled, and 
the dfs.datanode.block-pinning.enabled is not referenced by any code, i.e. 
it's not useful in this feature. Am I wrong? Please advise. Thanks a lot.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318057#comment-14318057
 ] 

Hudson commented on HDFS-6133:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #102 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/102/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319477#comment-14319477
 ] 

zhaoyunjiong commented on HDFS-6133:


{quote}
Is there use scenario that one want to specify larger number of favoredNodes?
{quote}
It might have.
Let's say if user set 1000 favoredNodes, and 10 replicates, 
DFSOutputStream.getPinnings will do 10,000 compare at worst case.
Seems not so bad.
Do you think we need optimize the code?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319503#comment-14319503
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Another factor is how many blocks a file have, since the computation happens 
once for each block of the file. Assume on average a file has 1k blocks, then 
the computation is 10M to write one file. If we construct a hash set, the 
computation complexiity is 40k. So the diff is 40k vs 10M.

I think we should consider optimizing. Hi [~szetszwo], what do you think?

Thanks.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318237#comment-14318237
 ] 

Hudson commented on HDFS-6133:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2034 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2034/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318104#comment-14318104
 ] 

Hudson commented on HDFS-6133:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #836 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/836/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318319#comment-14318319
 ] 

Hudson commented on HDFS-6133:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #103 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/103/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318369#comment-14318369
 ] 

Hudson commented on HDFS-6133:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2053 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2053/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-12 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318451#comment-14318451
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi [~zhaoyunjiong],

{quote}
Sorry, which question? I clicked the URL, but don't show the question clearly.
{quote}
A couple of days ago, I replied your comment as below (if you roll up to find 
my comment you will see):
{quote}
Are you saying NumFavouredNodes is equivalent to replication factor? I thought 
they are different: the favoredNodes is specified when constructing 
DFSOutputStream, as a collection of DNs. The write pipeline of a data block 
usually has three replicas, if any of the replicas happens to be in the 
collection, and the data is pinned there.
I guess what you meant is that the normal use cases are, when we construct 
DFSOutputStream for a file, we usually only pass replication number of DNs as 
favored DNs? Is there use scenario that one want to specify larger number of 
favoredNodes?
{quote}

Thanks.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-11 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317184#comment-14317184
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

I have committed this.  Thanks, zhaoyunjiong!

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-11 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317200#comment-14317200
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Thanks Nicholas for committing this.

Hi [~zhaoyunjiong],

Would you please answer the questions in 
https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=14314368page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14314368
?

Thanks.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317234#comment-14317234
 ] 

Hudson commented on HDFS-6133:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7080 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7080/])
HDFS-6133. Add a feature for replica pinning so that a pinned replica will not 
be moved by Balancer/Mover.  Contributed by zhaoyunjiong (szetszwo: rev 
085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317251#comment-14317251
 ] 

Ted Yu commented on HDFS-6133:
--

In DFSOutputStream.java:
{code}
1452(targetPinnings == null ? false : targetPinnings[0]), 
targetPinnings);
{code}
Looks like the boolean parameter, pinning, is unnecessary: when targetPinnings 
is not null, pinning would be true. Otherwise pinning is false.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-11 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317265#comment-14317265
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi  [~tedyu],

When targetPinnings is not null, whether to pin is determined by 
targetPinnings[0] which is a boolean.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-11 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317700#comment-14317700
 ] 

zhaoyunjiong commented on HDFS-6133:


Sorry, which question? I clicked the URL, but don't show the question clearly.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-10 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314368#comment-14314368
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Thanks [~zhaoyunjiong].

Are you saying {{NumFavouredNodes}} is equivalent to replication factor? I 
thought they are different: the favoredNodes is specified when constructing 
DFSOutputStream, as a collection of DNs. The write pipeline of a data block 
usually has three replicas, if any of the replicas happens to be in the 
collection, and the data is pinned there.

I guess what you meant is that the normal use cases are, when we construct 
DFSOutputStream for a file, we usually only pass replication number of DNs as 
favored DNs? Is there use scenario that one want to specify larger number of 
favoredNodes?

Thanks.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-10 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313832#comment-14313832
 ] 

zhaoyunjiong commented on HDFS-6133:


Most of user will use 3 replicates, so NumFavoredNodes should be 3 for most 
case. I don't think this will cause big problem.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-10 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314705#comment-14314705
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

Jenkins somehow has not picked up this.  Just have manually started a build.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314737#comment-14314737
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697628/HDFS-6133-10.patch
  against trunk revision 3f5431a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9518//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315729#comment-14315729
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697963/HDFS-6133-11.patch
  against trunk revision 7c6b654.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9530//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9530//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9530//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-09 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313405#comment-14313405
 ] 

zhaoyunjiong commented on HDFS-6133:


You mean choosing the mechanism don't rely on the os but rely on the mechanism 
only?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-09 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313387#comment-14313387
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

 If you agree with mixed mechanism, ...

Let's have another conf for choosing the mechanism.  The feature is disabled by 
default.  When admin enables it, one should also choose the mechanism.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-09 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313513#comment-14313513
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi [~zhaoyunjiong],

Nice work!  Thanks Nicholas for the review, the patch looks good to me too. One 
nit:

The {{DFSOutputStream#getPinnings}} has nested loop. The search cost is 
NumNodesInPipeline * NumFavoredNodes. From application point of view, I wonder  
how many favoredNodes could be for a big cluster (?). If needed, we could 
create a hash set when doing {{setFavoredNodes}} for quicker search. This may 
not be needed if the number is small. Anyways, I don't think we need to make 
this change in this jira, just something to watch out.

Thanks.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-09 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313432#comment-14313432
 ] 

zhaoyunjiong commented on HDFS-6133:


Sure.
Create HDFS-7759 to track.
Thanks for your time to review the patch.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-09 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313417#comment-14313417
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

 You mean choosing the mechanism don't rely on the os but rely on the 
 mechanism only?

Admins can choose which the mechanism they want to use.  Of course, we should 
throw an error if they choose sticky bit in Windows.

Sticky bit is more preferable but the support of sticky bit is not standard.  
Some systems does not support it.  Even for the systems supporting it, they may 
have different meaning.  So I think we need to allow admin to choose the 
mechanism.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, 
 HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313272#comment-14313272
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697420/HDFS-6133-9.patch
  against trunk revision 7d73202.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9501//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9501//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9501//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133-9.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309919#comment-14309919
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

- PBHelper.convert(..) only adds one FALSE when targetPinnings == null.  Should 
we add n FALSEs, where n = targetPinnings.length?
- Remove use sticky bit from the javadoc below since an implementation may 
not use sticky bit.
{code}
//FsDatasetSpi
+  /**
+   * Use sticky bit to pin the block
+   */
+  public void setPinning(ExtendedBlock block) throws IOException;
{code}
- FileUtil currently does not support sticky bit so that using LocalFileSystem 
is correct.  Is there a way to add support to stick bit in FileUtil?
- There are some tab characters.  Please replace them by spaces.

Thanks for working hard on this!


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309942#comment-14309942
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

 ... where n = targetPinnings.length ...

It should be n = nodes.length.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309965#comment-14309965
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

We should also add a configuration to enable/disable this feature since block 
pinning may not be preferable for some deployment.

BTW, just heard from [~cnauroth] that Windows does not support sticky bit.  Is 
there some other ways to implement block pinning?  Use file name?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310533#comment-14310533
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Thanks Yunjiong for the continuous effort on this work, and Nicholas for 
further reviewing!

{quote}
BTW, just heard from Chris Nauroth that Windows does not support sticky bit. Is 
there some other ways to implement block pinning? Use file name?
{quote}

[~daryn] pointed out some options earlier:

{quote}
Other approaches might be using the sticky bit or the existence of a second 
file to denote a pinned block.
{quote}
in 
https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=13987736page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13987736

I guess using existence of a second file is easier to be compatible across 
different platforms (though sticky bit has the advantage of storing the info at 
the same file). 

Another option is to use a mixed mechanism, say, if it's on linux, use sticky 
bit, otherwise, use existence of a second file. This method means lack of 
consistency between different platforms. Just wanted to throw a thought here.

Thanks.








 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310564#comment-14310564
 ] 

Benoy Antony commented on HDFS-6133:


I know that there are different ways to accomplish the desired functionality of 
pinning the blocks under a directory to a set of nodes.
I want to suggest yet another approach using storage policies.

This is based on the fact that the set of favored nodes can be considered as a 
tier. 

Let’s say blocks under /hbase needs to be pinned to a set of nodes.

1. Mark the data.dirs on those set of datanodes as a special storage type , say 
HBASE_TIER. [HBASE_TIER]/hadoop01/data
2. Define a storage policy - HBASE_POLICY which insists that all blocks 
following the policy, reside in HBASE_TIER
3. Set storage policy of /hbase to HBASE_POLICY 

Since the balancer(and mover) considers the storage policy while 
moving/balancing, balancer will not move these blocks out of the favored 
nodes(HBASE_TIER)

Now if we want to keep other blocks ( non hbase blocks) on these datanodes as 
well(may not be a great idea), then we can model it by having multiple storage 
types on a single directory like [HBASE_TIER,DISK]/hadoop01/data. This make 
sense since one want to have blocks having multiple storage characteristics on 
the same node.
This will need a (minor) code modification.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308970#comment-14308970
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12696984/HDFS-6133-7.patch
  against trunk revision 7b10ef0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.balancer.TestBalancer

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9455//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9455//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309594#comment-14309594
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12697036/HDFS-6133-8.patch
  against trunk revision 1425e3d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9457//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9457//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133-8.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307428#comment-14307428
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12696748/HDFS-6133-7.patch
  against trunk revision 20660b7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
  org.apache.hadoop.hdfs.TestReadWhileWriting
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.server.mover.TestStorageMover
  org.apache.hadoop.hdfs.TestGetFileChecksum
  org.apache.hadoop.hdfs.server.namenode.ha.TestHAAppend
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
  org.apache.hadoop.hdfs.server.namenode.TestAddBlock
  org.apache.hadoop.hdfs.TestFileAppendRestart
  org.apache.hadoop.hdfs.TestReplication
  org.apache.hadoop.hdfs.TestFileCreation
  
org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
  org.apache.hadoop.hdfs.TestModTime
  org.apache.hadoop.hdfs.server.balancer.TestBalancer
  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
  
org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.fs.permission.TestStickyBit
  
org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
  
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.TestGetBlocks
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
  
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
  org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes
  org.apache.hadoop.hdfs.TestSetrepIncreasing
  org.apache.hadoop.hdfs.TestFileAppend3
  
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
  org.apache.hadoop.hdfs.TestPipelines
  org.apache.hadoop.hdfs.TestSetrepDecreasing
  org.apache.hadoop.hdfs.TestFileAppend4
  org.apache.hadoop.hdfs.TestBlockStoragePolicy
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
  org.apache.hadoop.hdfs.server.datanode.TestHSync
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  
org.apache.hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9439//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9439//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9439//console

This message is automatically generated.

 Make Balancer support exclude specified 

[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-05 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308504#comment-14308504
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

[~zhaoyunjiong], thanks for the update.  There are quite a few test failures.  
Please fix them.  I will review the patch after that.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-01-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290892#comment-14290892
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
---

Thanks for working on this.  The pinning idea is very interesting!

A replica is pinned only if it is stored in a favored node.  In a write 
pipeline, it could possibly be some datanodes are favored nodes but some are 
not.  So the new parameters in DataTransferProtocol.writeBlock should be 
similar to storageType and targetStorageTypes, i.e. a new pinning parameter to 
the next datanodes and another new parameter targetPinnings for the downstream 
datanodes.
{code}
   public void writeBlock(final ExtendedBlock blk,
   final StorageType storageType, 
+  final boolean pinning,
   final TokenBlockTokenIdentifier blockToken,
   final String clientName,
   final DatanodeInfo[] targets,
   final StorageType[] targetStorageTypes, 
+  final boolean[] targetPinnings,
   final DatanodeInfo source,
   final BlockConstructionStage stage,
   final int pipelineSize,
{code}


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255641#comment-14255641
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688622/HDFS-6133-6.patch
  against trunk revision ecf1469.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestSymlinkHdfsFileContext

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9108//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9108//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9108//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-12-20 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255032#comment-14255032
 ] 

Kai Zheng commented on HDFS-6133:
-

Hi Yunjiong,

The patch looks great. Minor comments:
1. Regarding the following codes, are you only putting the first DN as favored 
node or all the DNs.
{code}
+  InetSocketAddress[] favoredNodes = new InetSocketAddress[numOfDatanodes];
+  for (int i = 0; i  favoredNodes.length; i++) {
+favoredNodes[i] = cluster.getDataNodes().get(0).getXferAddress();
+  }
{code}

2. In DistributedFileSystem.java, there is a comment as below. It would be 
better to be updated as well.
{code}
  /**
   * Same as  
   * {@link #create(Path, FsPermission, boolean, int, short, long, 
   * Progressable)} with the addition of favoredNodes that is a hint to 
   * where the namenode should place the file blocks.
   * The favored nodes hint is not persisted in HDFS. Hence it may be honored
   * at the creation time only. HDFS could move the blocks during balancing or
   * replication, to move the blocks from favored nodes. A value of null means
   * no favored nodes for this create
   */
{code}

3. By the way, a question: do we need any handling for favored nodes and the 
pin way considering storage type/policy/Mover?

Thanks.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225982#comment-14225982
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683784/HDFS-6133-5.patch
  against trunk revision 4a31611.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8845//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8845//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-24 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223695#comment-14223695
 ] 

Kihwal Lee commented on HDFS-6133:
--

We will make additional changes in separate jiras to make NN aware of favored 
nodes. I think the patch in this jira is fine as a stepping stone for the 
further work.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223828#comment-14223828
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi Kihwal, thanks for the clarification.

Hi [~zhaoyunjiong], thanks for the new rev to address my earlier comments. 
There is one nit with the latest rev (4), there alignment at line 316 area in 
DFSTestUtil.java need to be fixed.

Would any committer please pick up from here to push through? 

Thanks.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217607#comment-14217607
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12682349/HDFS-6133-4.patch
  against trunk revision 79301e8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8778//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8778//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-19 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218335#comment-14218335
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Thanks [~kihwal] for the comments and [~zhaoyunjiong] for the new rev.

HI Kihwal, 

Per you last comment, 
{quote}
We are in the process of optimizing BlockManager and block report processing. 
It will be better to add the rest of the pieces involving reporting to NN and 
management on NN after this change. One thing that we need to make sure is that 
this jira is compatible with the changes to come. Since DN has to include the 
pinning info in full and incremental block reports, it needs to be recorded in 
memory, not queries against file system on demand. The in-memory state needs to 
be reconstructed on startup and things like DirectoryScanner might need to be 
made aware of block pinning too. These don't need to be done in this jira, but 
the design should not preclude such future changes.
{quote}

Would you please clarify what you mean by the rest of the pieces? Do you mean 
the current patch work for this jira that doesn't handle 
throwing-away-over-replicated and corrupted-pinned-block yet? , or you mean 
just the additional work needed on top of the current patch to take care of  
throwing-away-over-replicated and corrupted-pinned-block? 

Since you said These don't need to be done in this jira, does These mean 
the same as the rest of the pieces above?

Essentially I'm asking for a clarification of the scope of the fix you are 
expecting to address this jira here. 

Thanks.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133-4.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216577#comment-14216577
 ] 

Kihwal Lee commented on HDFS-6133:
--

We are in the process of optimizing {{BlockManager}} and block report 
processing. It will be better to add the rest of the pieces involving reporting 
to NN and management on NN after this change.  One thing that we need to make 
sure is that this jira is compatible with the changes to come. Since DN has to 
include the pinning info in full and incremental block reports, it needs to be 
recorded in memory, not queries against file system on demand.  The in-memory 
state needs to be reconstructed on startup and things like {{DirectoryScanner}} 
might need to be made aware of block pinning too. These don't need to be done 
in this jira, but the design should not preclude such future changes.

For storing and reporting pinning information, we are planning to borrow some 
bits from the block size variable. On the namenode side, the optimization will 
give us a few bits for storing this information. I will file a separate jira 
when we have more details. This jira will be linked as a dependency.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212670#comment-14212670
 ] 

Kihwal Lee commented on HDFS-6133:
--

This might be outside the scope of this jira, but I think we need to think 
about this before going further.  If a node with a pinned block is temporarily 
unavailable, the namenode will try to replicate the block as it is 
under-replicated.  When the node recovers, the block is over-replicated and a 
replica will be invalidated.  How do we make sure it is not removed from the 
favored node?  I think this scenario can happen during start-up or transient 
infra/network issues.  

Daryn and I had a brief discussion about this. It might be possible to include 
pinning info in block reports and remember it in block manager. This will 
enable NN to make the right decision on over-replicated cases. A bit more 
complicated logic will be needed when a pinned block gets corrupted on a 
favored node. The usual replicate + invalidate strategy won't be ideal here. 

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-14 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212745#comment-14212745
 ] 

Chris Trezzo commented on HDFS-6133:


I am slightly late to the party on this one, but at Twitter we have a similar 
need for a slightly different use case. We use federation and each block pool 
has a different workload. For some of these workloads it does not make sense to 
run the balancer. For example, we have a block pool associated with a tmp 
namespace. Ideally, we would never want to balance blocks in that block pool 
because they will be deleted shortly anyways. One design approach we were 
contemplating is to make the balancer block pool-aware. You could then run the 
balancer on a per-block pool basis and have pluggable balancing strategies for 
each pool (i.e. the balancer policy in the block pool associated with the tmp 
namespace is a no-op). This allows the balancer to continue to be decoupled 
with the namespace and only needs to know about the block pool (we can still 
separate the BM at a later point).

The above might work for this use case as well. The balancer policy for the 
block pool containing blocks in hbase would be a no-op. Let me know what you 
guys think. I can see the block pool design being orthogonal to this JIRA, so 
let me know if I should open up a separate JIRA for this effort. We could 
potentially use the pinning strategy for our use case as well, but I hesitate 
for the same reasons that [~kihwal] mentioned above with respect to 
corrupt/unavailable blocks.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-14 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213449#comment-14213449
 ] 

Yongjun Zhang commented on HDFS-6133:
-

HI [~kihwal] and [~ctrezzo],

Thanks a lot for sharing the info and great insights here.

About the throwing-away-over-replicated case, I think the  include pinning 
info in block reports and remember it in block manager approach Kihwal and 
Daryn suggested seems reasonable.  And I agree corrupted-pinned-block case need 
some more thoughts and careful logic. 

About blockpool-aware balancing policy, since balancer works on each blockpool 
independently, it seems an natural approach. However, I think this would be 
complementary solution to the approach here.  I think it deserves its own jira 
and it can be worked on in parallel.

Currently one NN is associated with only one blockpool. And federated cluster 
is not yet widely used as far as I know. Implementation-wise, there are two 
options I can think of if we want o use blockpool-aware balancing policy to 
solve the problem here:
# User need to choose to use federated cluster, and put all files that need to 
be pinned into dedicated blockpool.
# Make a single NN to handle multipe block-pools.
The solution would work nicely for already federated cluster. For others, it 
won't be easy.

Right now we don't have the capability to handle pinning at block/file 
granularity (balancer does have the option to exclude nodes from being 
touched). It seems even providing a solution without handling the 
throwing-away-over-replicated case would help alleviating the pain.

Let's see if we can include a mechanism in the patch of this jira, or at least 
think through how to handle the two cases (throwing-away-over-replicated, and 
corrupted-pinned-block).

Thanks.

 

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-11-13 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211841#comment-14211841
 ] 

Yongjun Zhang commented on HDFS-6133:
-

HI [~zhaoyunjiong],

Thanks for the good work here, and sorry for getting back late.

I reviewed the latest patch, and I think it indeed looks very close. I also 
think your answers to Stack's question make sense to me. Couple of questions:

# The concept of favoredNodes pre-existed before your patch, now your patch 
defines that as long as favoredNodes is passed, then pinning will be done. So 
we are changing the prior definition of how favoredNodes are used. Why not add 
some additional interface to tell that pinning will happen  so we have the 
option not to pin even if favoredNodes is passed? Not necessarily you need to 
do what I suggested here, but I'd like to understand your thoughts here.
# Do we ever need interface to do unpinning?

And I have some comments, mostly nits.

* DFSOutputStream.java
Line 1413, change {{favoredNodes != null ? true: false}} to {{favoredNodes != 
null}}
* DataTransferProtocol.java, 
Line 95: whether pinning the block to whether to pin the block
* BlockReceiver.java
Line 178. Move {{this.pinning = pinning; to before the debug msg printing 
block, and include the pinning in th edebug message.
* FSDatasetSpi.java
Line 479: javadoc the setPinning /getPinning api
* FsDatasetImpl.java, Line 2618, javadoc that sticky bit is used for pinning 
purpose
* DataXceiver.java
Line 562, remove the newly added extra empty line
* datatransfer.proto
Line 125, need javadoc
* TestBalancer.java
Can you add a description what testBalancerWithPinnedBlocks does and what the 
expected result is?

Thanks.




 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-10-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166618#comment-14166618
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12674125/HDFS-6133-3.patch
  against trunk revision cb81bac.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
  
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8388//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8388//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8388//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-10-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164904#comment-14164904
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673853/HDFS-6133-3.patch
  against trunk revision 2a51494.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8372//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer  mover, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, 
 HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-09-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151965#comment-14151965
 ] 

stack commented on HDFS-6133:
-

bq. I like pinning during creation. What do others think?

Until there is something better, seems fine (just one question below).

bq. We can always improve later.

Agree though would be coolio if the API added here could prevail in a new 
implementation.

A few comments on the patch:

Why we always pass false in below?

1653new Sender(out).writeBlock(b, accessToken, clientname, targets,
1654srcNode, stage, 0, 0, 0, 0, blockSender.getChecksum(),
1655cachingStrategy, false);

We will never copy a block?

925 if (datanode.data.getPinning(block)) {
926   String msg = Not able to copy block  + block.getBlockId() +   
+
927   to  + peer.getRemoteAddressString() +  because it's pinned 
;
928   LOG.info(msg);
929   sendResponse(ERROR, msg);
930 }

Any thing to help ensure replica count does not rot when this pinning is 
enabled?

Nice test.

Thanks.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151979#comment-14151979
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653743/HDFS-6133-2.patch
  against trunk revision 371ef4c.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8252//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051079#comment-14051079
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653743/HDFS-6133-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
  org.apache.hadoop.hdfs.TestCrcCorruption

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7277//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7277//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133-2.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-06-19 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037378#comment-14037378
 ] 

Daryn Sharp commented on HDFS-6133:
---

Agreed on pinning even if not all favored nodes granted to keep the impl 
simpler.  I think there's a lot of value in a change like this since there are 
multiple jiras attempting different ways to stop the balancer from moving hbase 
files.  We can always improve later.

Allowing the user to pin/unpin would be useful, but would be much more 
complicated so I'd suggest deferring to another jira.  Complicated in the sense 
that the NN has to be pinning aware so it can issue commands to DNs to pin, 
compatibility concerns during rolling upgrades, etc.  Only allowing pinning 
after the fact instead of during creation does create races where the balancer 
got to the block before the pin request; either via tight race, or rpc 
connection dropped, failover, etc.

I like pinning during creation.  What do others think?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-05-07 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991492#comment-13991492
 ] 

zhaoyunjiong commented on HDFS-6133:


I'll use boolean instead of Boolean.

Yes, the NN  may not grant all the requested/favored nodes. The best way is to 
only pin the blocks on the favored nodes, but considered the probability that 
NN didn't grant all the favored nodes is small, so I just pinned them all.

Also I was wondering whether I should provide a API that let user 
pinning/un-pinning blocks after file created. That might be more useful than 
combine pinning with favored nodes.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-05-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990874#comment-13990874
 ] 

Daryn Sharp commented on HDFS-6133:
---

Upon a cursory review, I think the patch is promising.  I wonder if favored 
nodes and pinning should be combined, but it stands to reason that if you want 
specific nodes that you probably don't want the blocks to move.  Plus it's 
least disruptive to apis.  I haven't crawled through the favored node code, but 
I'm assuming the NN may not grant all the requested/favored nodes?  If yes, 
does it make sense to only pin the blocks on the favored nodes that were 
granted?

A small quick question: Do you need to use a Boolean instead a primitive 
boolean?  

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-05-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990894#comment-13990894
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643535/HDFS-6133-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeConfig

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6832//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6832//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6832//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133-1.patch, HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-05-02 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13987736#comment-13987736
 ] 

Daryn Sharp commented on HDFS-6133:
---

I'm not sure the NN needs to know about the pinning.  It may only need to be 
something the client tells the DN when establishing the pipeline.  Perhaps it 
could be stored in the existing block metadata file is, but I don't know how 
extensible the file is, whether it can be changed with backwards compatibility, 
etc.   Other approaches might be using the sticky bit or the existence of a 
second file to denote a pinned block.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-29 Thread zhaoyunjiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984113#comment-13984113
 ] 

zhaoyunjiong commented on HDFS-6133:


Yes, block pinning works.
By the way, where do you think is the best place to store pinning information?
If save in Block, seems it will cost a lot memory.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980981#comment-13980981
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12641891/HDFS-6133.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDistributedFileSystem

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6733//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6733//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-25 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981055#comment-13981055
 ] 

Daryn Sharp commented on HDFS-6133:
---

[~stack], I quickly skimmed HDFS-2576.  Favored placement appears to dovetail 
very nicely with replica pinning if the goal is to place a 2nd replica on a 
specific node in the event the primary region server fails.  Whether one or 
more replicas is pinned to preserve favored placement, it would appear to meet 
the requirement of this jira, correct?

[~zhaoyunjiong], ignoring the mid-long term goal of separating the BM service, 
the preconditions are simple and a valuable improvement to the NN:  untangle 
the namesystem / directory / BM architecture.  Each layer should have a 
specific role with no back references to other layers.  Namesystem is path 
based, directory is inode based, BM is just a BM.

By allowing no back references, it also opens the possibility of breaking up 
the global locking by  locking subsystems w/o fear of deadlocks.  In and of 
itself, the NN will benefit from a cleaner design and more targeted locking, 
and if done correctly, it's a small leap to run the BM as a separate service 
which in turn opens the door to allowing other services than the NN to utilize 
the block manager.  This is why I'm opposed (-1) to making the BM path aware.  
It ruins the ability to do any of the above.

Do you believe block pinning will address your issue?

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981279#comment-13981279
 ] 

stack commented on HDFS-6133:
-

[~daryn] Pinning would work as long as we are still able to dictate where to 
place the replicas.  If pinning a block makes it immune to balance, that should 
work.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979488#comment-13979488
 ] 

Vinayakumar B commented on HDFS-6133:
-

Thanks [~zhaoyunjiong] for the patch.
Approach looks good,

I have following comments.
1. {code}+.setDatanode(PBHelper.convert((DatanodeID) 
datanode)).setSize(size)
+.setPathPrefix(pathPrefix).build();{code}
Since {{pathPrefix}} is optional, you need not set an empty string when its 
null. You need not set at all.

Similarly, you need to check for {{request.hasPathPrefix}} before using 
{{request.getPathPrefix()}}. if its not set then you can pass null from 
NamenodeProtocolServerSideTranslatorPB.java itself. This is the usual 
convention followed for optional parameters.

2. {{pathPrefix}} can be renamed to appropriate one, such as 
excludePathPrefix in all places.
3. in Test, why need to extract to one more method 
testBalancerWithExcludeDirectory ?. i think its not required unless its re-used.


{quote}NamenodeProtocol.proto
My understanding is that the newly added optional parameter should have no 
impact in compatibility. But other more experienced reviewers please 
confirm.{quote}
Yes you are correct, it provides compatibility. But since NameNodeProtocol is 
only a server protocol, not used by any client application, I feel no need to 
worry about it.





 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979785#comment-13979785
 ] 

Hadoop QA commented on HDFS-6133:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12641679/HDFS-6133.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6714//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6714//console

This message is automatically generated.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980251#comment-13980251
 ] 

Daryn Sharp commented on HDFS-6133:
---

Please give me a little time to think about this patch.  I think it will be 
detrimental to decoupling the BM from the FSN.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980282#comment-13980282
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Thanks [~zhaoyunjiong] for addressing my earlier comments.  [~vinayrpet], 
thanks for your additional comments, they very good ones to me.

Yunjiong, when you address Vinayakumar's comments, would you please consider 
the following:

- Balancer.java, the variable excludeDir really meant excludePaths, which 
contains a list of paths separated by comma, each of the paths could be either 
a dir and a file. So suggest to replace excludeDir with  excludePaths.

- Line 422 of NamenodeRpcServer.java exceeded 80 chars.

- NamenodeProtocal.java, I'm suggesting to improve the following comments a bit 
more 
{code}
  *  exclude blocks belongs to files with specified path prefix, use
  *  comma to separate if have multiple paths
{code}
TO:
{code}
   *  exclude blocks belonging to files with path prefix in 
excludePaths, which
   *  is a single path or a list of paths separated by comma. 
{code}

- I happen to see a typo in test code:
 {code}
  r = Balancer.run(namenodes, Balancer.Parameters.DEFALUT, conf);
{code}
The DEFALUT meant DEFAULT, would you please correct it as well?

I'm giving my +1 once you address Vinayakumar's comments and mine above. 

Thanks again for the good work.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980406#comment-13980406
 ] 

Daryn Sharp commented on HDFS-6133:
---

I understand the value of this feature, but I'd like to place a polite -1 
pending further discussion.  I'm not opposed to the goal of the feature but I'd 
like to make sure it's not committed before more evaluation.  I'll be posting 
joint concerns from an internal brainstorming after this comment.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980464#comment-13980464
 ] 

Yongjun Zhang commented on HDFS-6133:
-

I didn't see [~daryn]'s comments when I posted my last one, I certainly would 
like to understand the concerns. Thanks Daryn.


 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980504#comment-13980504
 ] 

Daryn Sharp commented on HDFS-6133:
---

A main concern we have is the performance/scalability of the proposed patch. 
Looking up paths for every block is not trivial.  The namespace lock is going 
to be held for a much longer duration leading to performance degradation.  GC 
pressure will also increase from all the temporary arrays, byte to string 
conversions, and stringbuilder instances need to resolve the path.

Another concern is this design approach.  The BM shouldn't be path aware.  We 
are trying to scale the performance and size of the namespace with a long term 
goal of the BM being an independent service.  As such, the BM cannot have a 
direct reference to the namesystem, which means it cannot be path aware.  Not 
to mention it violates the path - inode - block layer abstractions if a block 
is path aware.

--

I understand and share your desire is to prevent balancing from destroying 
hbase performance.  I believe the goal is to keep the blocks of an hbase column 
on the region server?  Pinning based on path pins all replicas, so when you 
increase your cluster capacity, it's effectively up to hbase to re-balance the 
cluster by recreating files.  If your cluster is hbase-centric, then making the 
balancer exclude all hbase files has marginal value.

I think what we really need is to support pinning the local replica but allow 
the other remote replicas to float.  
Perhaps a new flag to create/append could be used to instruct the data streamer 
to pin the local replica.  The flag is part of the block's metadata (whether in 
the file or elsewhere).  The DN may then refuse balancer requests to replace 
the block.  If done correctly, it's completely backwards compatible and 
requires no NN changes.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980543#comment-13980543
 ] 

Jean-Marc Spaggiari commented on HDFS-6133:
---

I don't know if any other application outside of HBase need to avoid block 
movement, but the pinning option you proposed might be fine for HBase. I will 
just recommend to have the option to pin all the blocks and not just the first 
one. The thing is, HBase might a some point in the future define ghosts 
regions on the replicates to improve performances or faster MTTR. If we move 
those blocks that will again create same kind of concern.

So adding the ability to pin all the blocks or just one or none when creating 
them might be a good option too for HBase.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980558#comment-13980558
 ] 

stack commented on HDFS-6133:
-

bq. I believe the goal is to keep the blocks of an hbase column on the region 
server?

Yes. Locality is important: e.g. being able to SCR makes a big difference in 
our serving latencies (and how much CPU we end up using servicing requests).

bq. ...it's effectively up to hbase to re-balance the cluster

Facebook describe their HBase doing this.  Add a rack and slow fill by placing 
the 3rd replica over on the new rack.  Only after the new rack is full enough, 
switch on the regionservers in the new rack (The alternative is switch on 
regionservers and cripple the top-of-the-rack switch with traffic as 
regionservers read remotely).  HDFS-2576 is an attempt at adding some of this 
'Favored Node' facility back into apache hdfs (though we've not made use of it 
over in apache hbaselandia up to this -- we've still to 'finish' up our end).

bq. If your cluster is hbase-centric, then making the balancer exclude all 
hbase files has marginal value.

Well, currently the balancer may wreak havoc if it runs across the hbase 
root.dir.

bq. I think what we really need is to support pinning the local replica but 
allow the other remote replicas to float. 

Another aspect of 'Favored Nodes' (implemented by FB, lagging in apache hbase) 
was that on node crash, since we'd been 'placing' the blocks, we'd know which 
machine(s) could serve the data that was on the dead machine while maintainng a 
better than random locality quotient.  We'd not be able to do this if the 
non-locals could float.

Not suggesting that the above is the best way to go; just listing out a couple 
of hbase concerns.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-23 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978286#comment-13978286
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi [~zhaoyunjiong], thanks for letting me know about HDFS-6133 at HDFS-4420, 
and sorry for getting back late.

I looked at the patch you did, it looks pretty good to me. Thanks for the good 
work.

I have some comments on small things:

- Some lines exceed 80 chars. E.g. NamenodeProtocolServerSideTranslatorPB.java, 
NameNodeRpcServer.java etc.
- In BlockManager.getBlocksWithLocations
You might consider stripping out empty prefixes in the pathPrefixes array as a 
preprocessing step, and then in the loop you don't need to check 
prefix.isEmpty(). When you have a lot of blocks to iterate in the loop, it 
would save some run time.

Another corner case is, when there are multiple prexies, one prefix might  
cover another (e.g., /a/b covers /a/b/c). Preprocessing the pathPrefixes to 
remove the prefixes covered by other can be done, but user should be aware of 
that /a/b and /a/b/c are duplicates. So this preprocessing is optional, 
depending on what the use cases look like. 

- In NamenodeProtocol.java, description of newly added parameter pathPrefix is 
missing. 

- NamenodeProtocol.proto, Nit thing, you might consider modify the comments
  // Exclude blocks belongs to files which their path have prefix: pathPrefix
To something like
  // Exclude blocks belonging to files with specified path prefix

- NamenodeProtocol.proto
My understanding is that the newly added optional parameter should have no 
impact in compatibility. But other more experienced reviewers please confirm.

- testBalancerWithExcludeDirectory 
It might help to add another call at the end of the test to do one more round 
of balancing without the excludeDir parameter, and assert that blocks are moved.

Thanks.



 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2014-04-23 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13978402#comment-13978402
 ] 

Benoy Antony commented on HDFS-6133:


[~zhaoyunjiong] 

You can use _StringUtils.getTrimmedStringCollection()_  to remove  empty, 
duplicate elements as the preprocessing step.

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer, namenode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Attachments: HDFS-6133.patch


 Currently, run Balancer will destroying Regionserver's data locality.
 If getBlocks could exclude blocks belongs to files which have specific path 
 prefix, like /hbase, then we can run Balancer without destroying 
 Regionserver's data locality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)