[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934814#comment-13934814 ] Yu Li commented on HDFS-6009:
{quote} In particular, what caused the failure in your case? Is it a disk error, network failure, or an application is buggy? {quote}
In our production environment we have encountered almost all of the cases listed above, and had a hard time calming angry users. The buggy-application case was especially painful: the other affected users were furious at being penalized for someone else's fault. So in our case isolation is necessary. To be more specific, our service is based on HBase, so the tools supplied here are used together with the HBase regionserver group feature (HBASE-6721). If you're interested in our use case, I've given a more detailed introduction [here|https://issues.apache.org/jira/browse/HDFS-6010?focusedCommentId=13932891page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932891] in HDFS-6010 (just allow me to save some copy-paste effort :-))
Another thing to clarify is that this suite of tools won't persist any datanode group information into HDFS. All three tools accept a -servers option, so the admin needs to keep track of the group information and pass it to the tools, or, as in our use case, persist the group information in an upper-level component such as HBase.
[~thanhdo], I hope this answers your question; just let me know if you have any further comments.
Tools based on favored node feature for isolation
Key: HDFS-6009
URL: https://issues.apache.org/jira/browse/HDFS-6009
Project: Hadoop HDFS
Issue Type: Task
Affects Versions: 2.3.0
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor
There are scenarios, like those mentioned in HBASE-6721 and HBASE-4210, where in multi-tenant deployments of HBase we prefer to specify several groups of regionservers to serve different applications, to achieve some kind of isolation or resource allocation.
However, although the regionservers are grouped, the datanodes that store their data are not, so a single datanode failure can affect multiple applications, as we have already observed in our production environment. To relieve this issue, we could make use of the favored node feature (HDFS-2576) so that each regionserver places its data within its own group, in effect grouping the datanodes as well (passively) to form some level of isolation. In this case, or any other case that needs datanodes to be grouped, we would need a set of tools to maintain the groups, including:
1. Make the balancer able to balance data among specified servers, rather than across all datanodes
2. Set the balancer bandwidth for specified servers, rather than for all datanodes
3. A tool to check whether a block's replicas are placed across groups, and move them back if so
This JIRA is an umbrella for the above tools.
-- This message was sent by Atlassian JIRA (v6.2#6252)
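The maintenance tools listed in the description can be illustrated with a small, self-contained sketch. It is not the code attached to this JIRA: `NodeUsage`, `parse_servers`, `classify_group`, `find_cross_group_replicas` and `plan_moves` are hypothetical names, the comma-separated `-servers` format is an assumption, and block locations are passed in as a plain dict instead of being fetched from the NameNode. `classify_group` borrows the idea of the stock HDFS Balancer's threshold check (a node counts as over- or under-utilized when it deviates from the average utilization by more than the threshold, in percent), but computes the average over the listed servers only, which is the point of tool 1. The last two functions cover tool 3: finding replicas stored outside the group and proposing an in-group target for each.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeUsage:
    """Hypothetical snapshot of one datanode: its address and byte counts."""
    addr: str
    capacity: int
    used: int

    @property
    def utilization(self) -> float:
        return 100.0 * self.used / self.capacity


def parse_servers(value: str) -> set[str]:
    """Hypothetical -servers format: a comma-separated list of datanode addresses."""
    return {s.strip() for s in value.split(",") if s.strip()}


def classify_group(nodes, servers, threshold=10.0):
    """Tool 1 sketch: compare each node with the average of the listed servers only.

    Returns (average utilization %, over-utilized addrs, under-utilized addrs).
    """
    group = [n for n in nodes if n.addr in servers]
    avg = 100.0 * sum(n.used for n in group) / sum(n.capacity for n in group)
    over = sorted(n.addr for n in group if n.utilization > avg + threshold)
    under = sorted(n.addr for n in group if n.utilization < avg - threshold)
    return avg, over, under


def find_cross_group_replicas(block_locations, servers):
    """Tool 3, check step: map each block to its replicas stored outside the group."""
    misplaced = {}
    for block_id, locations in block_locations.items():
        outside = [addr for addr in locations if addr not in servers]
        if outside:
            misplaced[block_id] = outside
    return misplaced


def plan_moves(block_locations, servers):
    """Tool 3, repair step: pair each outside replica with an in-group node that
    does not hold the block yet. Replicas with no free target are left unplanned."""
    moves = []
    misplaced = find_cross_group_replicas(block_locations, servers)
    for block_id, outside in sorted(misplaced.items()):
        holders = set(block_locations[block_id])
        targets = sorted(s for s in servers if s not in holders)
        moves.extend((block_id, src, dst) for src, dst in zip(outside, targets))
    return moves


if __name__ == "__main__":
    servers = parse_servers("dn1,dn2,dn3")
    nodes = [NodeUsage("dn1", 100, 90), NodeUsage("dn2", 100, 30),
             NodeUsage("dn3", 100, 60), NodeUsage("dn4", 100, 5)]
    print(classify_group(nodes, servers))  # (60.0, ['dn1'], ['dn2'])
    locations = {"blk_1": ["dn1", "dn2", "dn4"], "blk_2": ["dn1", "dn2", "dn3"]}
    print(plan_moves(locations, servers))  # [('blk_1', 'dn4', 'dn3')]
```

Note that `dn4` does not pull the group average down even though it is nearly empty, because it is not one of the listed servers. Actually moving a replica would have to go through HDFS's own block-moving machinery, which this sketch does not attempt.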
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935194#comment-13935194 ] Thanh Do commented on HDFS-6009:
Yu Li, thanks for your detailed comment! Your use case is a great example of isolation. We are currently working on some similar problems, but at a lower level of the software stack, so your use case is great motivation for us.
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933475#comment-13933475 ] Thanh Do commented on HDFS-6009:
Hi Yu Li, I want to follow up on this issue. Could you please elaborate on the datanode failure? In particular, what caused the failure in your case? Is it a disk error, network failure, or an application is buggy? If it is a disk error or a network failure, I think isolation using datanode groups is reasonable.
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931955#comment-13931955 ] Yu Li commented on HDFS-6009:
Hi [~thanhdo], Yes, the data are replicated, so there won't be any data loss. However, since one datanode might carry data of multiple applications, a datanode failure will cause read requests from *several* applications to wait until they time out and then switch to another datanode, and we'd like to reduce the range of that impact.
Another scenario we experienced is that application A was aggressively reading data from one DN, occupying almost all of its network bandwidth, while application B, which tried to write data to the same DN, was blocked for a long time.
As I mentioned in HDFS-6010, people might ask why we don't use physically separated clusters in this case; the answer is that managing one big cluster is more convenient and takes less manpower than managing several small ones. There are also other solutions, like HDFS-5776, to reduce the impact of a bad datanode, but I believe there are still scenarios that need stricter I/O isolation, so I think it's still valuable to contribute our tools. Hope this answers your question. :-)
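The "impact range" argument in the comment above can be made concrete with a toy model. The function name and both block layouts below are hypothetical; every block keeps three replicas, so losing `dn1` loses no data in either layout. With mixed placement, however, the failed node holds replicas of both applications, so both see reads that must time out and retry elsewhere, while with group-confined placement only one application is touched.

```python
def affected_apps(placement, failed_node):
    """Applications with at least one replica on `failed_node`.

    `placement` maps a block ID to (application, datanodes holding its replicas).
    In the scenario described above, these are the applications whose reads
    may hit the failed node and have to retry on another replica.
    """
    return sorted({app for app, nodes in placement.values() if failed_node in nodes})


# Hypothetical layouts: three replicas per block, two applications.
mixed = {
    "blk_a": ("appA", ["dn1", "dn2", "dn3"]),
    "blk_b": ("appB", ["dn1", "dn4", "dn5"]),
}
grouped = {
    "blk_a": ("appA", ["dn1", "dn2", "dn3"]),
    "blk_b": ("appB", ["dn4", "dn5", "dn6"]),
}

print(affected_apps(mixed, "dn1"))    # ['appA', 'appB']
print(affected_apps(grouped, "dn1"))  # ['appA']
```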
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932037#comment-13932037 ] Thanh Do commented on HDFS-6009:
Thank you!
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931338#comment-13931338 ] Thanh Do commented on HDFS-6009:
Hi Yu, You mentioned "although the regionservers are grouped, the datanodes which store the data are not, which leads to the case that one datanode failure affects multiple applications, as we already observed in our product environment." Can you elaborate on that scenario? I thought a datanode failure would be OK, since the data are replicated. Best,