[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-14 Thread Yu Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934814#comment-13934814 ]

Yu Li commented on HDFS-6009:
-

{quote}
In particular, what caused the failure in your case? Is it a disk error, 
network failure, or an application is buggy?
{quote}
In our production environment we have encountered nearly all of the cases 
listed above, and had a hard time calming angry users. The buggy-application 
case is the worst: the other affected users get very upset at being penalized 
for someone else's fault. So in our case isolation is necessary. 

To be more specific, our service is based on HBase, so the tools supplied here 
are used together with the HBase regionserver group feature (HBASE-6721). If 
you're interested in our use case, I've given a more detailed introduction 
[here|https://issues.apache.org/jira/browse/HDFS-6010?focusedCommentId=13932891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932891]
 in HDFS-6010 (allow me to save some copy-paste effort :-))

Another thing to clarify here is that this suite of tools won't persist any 
datanode group information into HDFS. All three tools accept a -servers 
option, so the admin needs to keep track of the group information and pass it 
to the tools or, as in our use case, persist the group information in an 
upper-level component like HBase.
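
As a rough illustration of keeping the group information outside HDFS, the sketch below stores a hypothetical group registry in a plain dictionary and turns one group into the argument for a -servers option. The host names, the `GROUPS` mapping, the `servers_argument` helper, and the comma-separated value format are all assumptions made for this example; the thread does not specify the option's syntax.

```python
# Hypothetical group registry kept outside HDFS, for example by the admin
# or by an upper-level component such as HBase. Host names are made up.
GROUPS = {
    "group_a": ["dn2.example.com", "dn1.example.com"],
    "group_b": ["dn3.example.com"],
}


def servers_argument(groups, name):
    """Build the argument list that passes one group to a tool.

    Assumption: the -servers option takes a single comma-separated
    list of hosts. The real syntax may differ.
    """
    hosts = groups[name]
    if not hosts:
        raise ValueError("group %r has no datanodes" % name)
    # Sort so the generated command line is stable across runs.
    return ["-servers", ",".join(sorted(hosts))]


print(servers_argument(GROUPS, "group_a"))
```

Because the tools themselves store nothing, whoever owns the registry is responsible for passing the same group definition to every tool invocation.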

[~thanhdo], I hope this answers your question; just let me know if you have 
any further comments.

 Tools based on favored node feature for isolation
 -

                 Key: HDFS-6009
                 URL: https://issues.apache.org/jira/browse/HDFS-6009
             Project: Hadoop HDFS
          Issue Type: Task
    Affects Versions: 2.3.0
            Reporter: Yu Li
            Assignee: Yu Li
            Priority: Minor

 There are scenarios, such as those mentioned in HBASE-6721 and HBASE-4210, 
 where in multi-tenant deployments of HBase we prefer to assign several groups 
 of regionservers to serve different applications, to achieve some kind of 
 isolation or resource allocation. However, although the regionservers are 
 grouped, the datanodes which store the data are not, so one datanode failure 
 can affect multiple applications, as we have already observed in our 
 production environment.
 To relieve this issue, we could make use of the favored node feature 
 (HDFS-2576) to let a regionserver keep its data within its group, in other 
 words make the datanodes grouped as well (passively), to form some level of 
 isolation.
 In this case, or any other case that needs datanodes to be grouped, we would 
 need a set of tools to maintain the groups, including:
 1. Making the balancer able to balance data among specified servers, rather 
 than the whole set
 2. Setting the balance bandwidth for specified servers, rather than the whole 
 set
 3. A tool to check whether a block is placed across groups, and move it back 
 if so
 This JIRA is an umbrella for the above tools.
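
 The check described in item 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool proposed in this JIRA: block locations are given as a plain mapping from block ID to replica hosts, and `find_cross_group_replicas`, the block IDs, and the host names are hypothetical.

```python
def find_cross_group_replicas(block_locations, group_hosts):
    """Return {block_id: [hosts outside the group]} for misplaced blocks.

    block_locations: mapping of block ID -> list of hosts holding a replica.
    group_hosts: hosts that belong to the datanode group.
    Both inputs are plain Python data; nothing here talks to HDFS.
    """
    allowed = set(group_hosts)
    misplaced = {}
    for block_id, hosts in block_locations.items():
        outside = sorted(h for h in hosts if h not in allowed)
        if outside:
            misplaced[block_id] = outside
    return misplaced


# Hypothetical data: blk_2 has one replica on dn5, outside the group.
example_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn1", "dn2", "dn5"],
}
example_group = ["dn1", "dn2", "dn3"]

print(find_cross_group_replicas(example_locations, example_group))
```

 A real tool would additionally have to move each reported replica back onto a datanode inside the group, which this sketch leaves out.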



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-14 Thread Thanh Do (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935194#comment-13935194 ]

Thanh Do commented on HDFS-6009:


Yu Li, thanks for your detailed comment! Your use case is a great example of 
isolation. We are currently working on similar problems, but at a lower level 
of the software stack, so your use case is great motivation.



[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-13 Thread Thanh Do (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933475#comment-13933475 ]

Thanh Do commented on HDFS-6009:


Hi Yu Li,

I want to follow up on this issue. Could you please elaborate on the datanode 
failure? In particular, what caused the failure in your case? Was it a disk 
error, a network failure, or a buggy application?

If it is a disk error or a network failure, I think isolation using datanode 
groups is reasonable.



[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-12 Thread Yu Li (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931955#comment-13931955 ]

Yu Li commented on HDFS-6009:
-

Hi [~thanhdo],

Yes, the data are replicated, so there won't be data loss. However, since one 
datanode might carry data of multiple applications, a datanode failure will 
cause read requests from *several* applications to retry until they time out 
and switch to another datanode, and we'd like to reduce the range of that 
impact.
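
To make the impact range concrete, here is a small sketch that counts which applications are touched by a single datanode failure, with and without grouping. The application names, datanode names, and the `affected_apps` helper are hypothetical and only illustrate the reasoning above.

```python
def affected_apps(app_datanodes, failed_datanode):
    """Return the applications that store data on the failed datanode."""
    return sorted(
        app for app, datanodes in app_datanodes.items()
        if failed_datanode in datanodes
    )


# Without grouping, every application may have blocks on every datanode.
ungrouped = {
    "app_a": {"dn1", "dn2", "dn3", "dn4"},
    "app_b": {"dn1", "dn2", "dn3", "dn4"},
}

# With datanode groups, each application stays on its own datanodes.
grouped = {
    "app_a": {"dn1", "dn2"},
    "app_b": {"dn3", "dn4"},
}

print(affected_apps(ungrouped, "dn1"))  # both applications are affected
print(affected_apps(grouped, "dn1"))    # only app_a is affected
```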

Another scenario we experienced is application A reading data heavily from one 
DN and occupying almost all of its network bandwidth, while application B 
tried to write data to the same DN and was blocked for a long time.

As I mentioned in HDFS-6010, people might ask why we don't use physically 
separated clusters in this case. The answer is that it's more convenient, and 
takes less manpower, to manage one big cluster than several small ones.

There are also other solutions like HDFS-5776 to reduce the impact of a bad 
datanode, but I believe there are still scenarios that need stricter IO 
isolation, so I think it's still valuable to contribute our tools.

Hope this answers your question. :-)




[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-12 Thread Thanh Do (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932037#comment-13932037 ]

Thanh Do commented on HDFS-6009:


Thank you!



[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation

2014-03-11 Thread Thanh Do (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931338#comment-13931338 ]

Thanh Do commented on HDFS-6009:


Hi Yu, 

You mentioned that although the regionservers are grouped, the datanodes which 
store the data are not, so one datanode failure affects multiple applications, 
as you have already observed in your production environment.

Can you elaborate on that scenario? I thought a datanode failure would be OK, 
as the data are replicated. 

Best,
