[ 
https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934814#comment-13934814
 ] 

Yu Li commented on HDFS-6009:
-----------------------------

{quote}
In particular, what caused the failure in your case? Is it a disk error, 
network failure, or an application is buggy?
{quote}
In our production environment we have encountered almost all of the cases 
listed above, and we had a hard time calming angry users. The buggy 
application case is the worst: the other affected users get very upset at 
being punished for someone else's faults. So in our case isolation is necessary. 

To be more specific, our service is based on HBase, so the tools supplied here 
are used together with the HBase regionserver group feature (HBASE-6721). If 
you're interested in our use case, I've given a more detailed introduction 
[here|https://issues.apache.org/jira/browse/HDFS-6010?focusedCommentId=13932891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932891]
 in HDFS-6010 (just allow me to save some copy-paste effort :-)).

Another thing to clarify here is that this suite of tools won't persist any 
"datanode group" information into HDFS. All three tools accept a "-servers" 
option, so the admin needs to "keep in mind" the group information and pass it 
to the tools, or, as in our use case, persist the group information in an 
upper-level component such as HBase.

[~thanhdo], I hope this answers your question. Just let me know if you have 
any further comments.

> Tools based on favored node feature for isolation
> -------------------------------------------------
>
>                 Key: HDFS-6009
>                 URL: https://issues.apache.org/jira/browse/HDFS-6009
>             Project: Hadoop HDFS
>          Issue Type: Task
>    Affects Versions: 2.3.0
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>
> There are scenarios, such as those mentioned in HBASE-6721 and HBASE-4210, 
> where in multi-tenant deployments of HBase we prefer to specify several 
> groups of regionservers to serve different applications, to achieve some 
> kind of isolation or resource allocation. However, although the 
> regionservers are grouped, the datanodes which store the data are not, so 
> one datanode failure can affect multiple applications, as we have already 
> observed in our production environment.
> To relieve the above issue, we could make use of the favored node feature 
> (HDFS-2576) so that a regionserver can keep its data within its group, or in 
> other words make datanodes also grouped (passively), to form some level of 
> isolation.
> In this case, or any other case that needs datanodes to be grouped, we would 
> need a set of tools to maintain the "group", including:
> 1. Making the balancer able to balance data among specified servers, rather 
> than the whole set
> 2. Setting the balance bandwidth for specified servers, rather than the whole 
> set
> 3. A tool to check whether a block is placed "cross-group", and move it back 
> if so
> This JIRA is an umbrella for the above tools.
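
The "cross-group" check in item 3 above comes down to a set-membership test, which can be sketched roughly as follows. This is only an illustration and not the code attached to this JIRA: the class name CrossGroupCheck, the method findCrossGroupReplicas, and the block and host names are all hypothetical, and the block locations are passed in as a plain map rather than fetched from the NameNode. Moving the misplaced replicas back is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of HDFS: flags replicas stored outside a datanode group. */
class CrossGroupCheck {

    /**
     * Returns, for each block that has at least one replica outside the group,
     * the list of hosts holding those out-of-group replicas.
     *
     * groupServers   hosts the admin would pass in (for example via "-servers")
     * blockLocations block id mapped to the hosts holding its replicas (assumed input)
     */
    public static Map<String, List<String>> findCrossGroupReplicas(
            Set<String> groupServers, Map<String, List<String>> blockLocations) {
        Map<String, List<String>> misplaced = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : blockLocations.entrySet()) {
            List<String> outside = new ArrayList<>();
            for (String host : entry.getValue()) {
                if (!groupServers.contains(host)) {
                    outside.add(host); // replica lives on a datanode outside the group
                }
            }
            if (!outside.isEmpty()) {
                misplaced.put(entry.getKey(), outside);
            }
        }
        return misplaced;
    }

    public static void main(String[] args) {
        // Made-up group and block placement, purely for illustration.
        Set<String> group = new HashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
        Map<String, List<String>> locations = new LinkedHashMap<>();
        locations.put("blk_1", Arrays.asList("dn1", "dn2", "dn3"));
        locations.put("blk_2", Arrays.asList("dn1", "dn4", "dn5"));

        // prints {blk_2=[dn4, dn5]}
        System.out.println(findCrossGroupReplicas(group, locations));
    }
}
```

Because the group is supplied by the caller on every run, the sketch keeps no group state of its own, which matches the point in the comment above that nothing is persisted into HDFS.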



--
This message was sent by Atlassian JIRA
(v6.2#6252)
