[jira] [Commented] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457656#comment-13457656 ] Sumadhur Reddy Bolli commented on HDFS-3566:

Thanks for the comments, Eli. I tried running it and it was hanging. The traces indicated the following exception:

    [exec] * [13/23] [0/0] 0.111s 0b hdfs_user_guide.pdf
    [exec] Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/fop/messaging/MessageHandler
    [exec]     at org.apache.cocoon.serialization.FOPSerializer.configure(FOPSerializer.java:122)
    [exec]     at org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.java:201)
    [exec]     at org.apache.avalon.excalibur.component.DefaultComponentFactory.newInstance(DefaultComponentFactory.java:289)
    [exec]     at org.apache.avalon.excalibur.pool.InstrumentedResourceLimitingPool.newPoolable(InstrumentedResourceLimitingPool.java:655)
    [exec]     at org.apache.avalon.excalibur.pool.InstrumentedResourceLimitingPool.get(InstrumentedResourceLimitingPool.java:371)
    [exec]     at org.apache.avalon.excalibur.component.PoolableComponentHandler.doGet(PoolableComponentHandler.java:198)
    [exec]     at org.apache.avalon.excalibur.component.ComponentHandler.get(ComponentHandler.java:381)
    [exec]     at org.apache.

The command I used was:

    ant test-patch -Dpatch.file=/home/sumab/src/branch-1-win/azurepolicy-branch-1-win.patch -Dfindbugs.home=/home/sumab/src/tools/findbugs-2.0.1 -Djava5.home=/home/sumab/src/tools/java/jdk1.6.0_32 -Dforrest.home=/home/sumab/src/tools/apache-forrest-0.9

Could you please suggest how to run it on branch-1-win, or what could be wrong with what I did?
> Custom Replication Policy for Azure
> -----------------------------------
>
> Key: HDFS-3566
> URL: https://issues.apache.org/jira/browse/HDFS-3566
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: Sumadhur Reddy Bolli
> Assignee: Sumadhur Reddy Bolli
> Fix For: 1-win
>
> Attachments: AzureBlockPlacementPolicy.pdf, azurepolicy-branch-1-win.patch
>
> Azure has logical concepts like fault and upgrade domains. Each fault domain spans multiple upgrade domains, and each upgrade domain spans multiple fault domains. Machines are typically spread evenly across both fault and upgrade domains. Fault domain failures are typically catastrophic, unplanned failures, and the possibility of data loss is high. An upgrade domain can be taken down by Azure periodically for maintenance. Each time an upgrade domain is taken down, a small percentage of machines in that upgrade domain (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default replication factor of 3, any 3 datanodes going down at the same time would mean potential data loss. So it is important to have a policy that spreads replicas across both fault and upgrade domains to ensure practically no data loss. The problem here is two-dimensional, and the default policy in Hadoop is one-dimensional. This policy would spread the replicas across at least 2 fault domains and 3 upgrade domains to prevent data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
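The invariant described above can be sketched as a small standalone check: with the default replication factor of 3, a block's replica set should span at least 2 distinct fault domains and 3 distinct upgrade domains. This is a simplified model for illustration, not the actual AzureBlockPlacementPolicy from the patch; the names `Placement` and `isSafelySpread` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AzureSpreadCheck {

    // Hypothetical stand-in for a datanode's (fault domain, upgrade domain) position.
    static final class Placement {
        final int faultDomain;
        final int upgradeDomain;
        Placement(int fd, int ud) { faultDomain = fd; upgradeDomain = ud; }
    }

    /** True if the replicas span >= 2 fault domains and >= 3 upgrade domains. */
    static boolean isSafelySpread(List<Placement> replicas) {
        Set<Integer> faultDomains = new HashSet<>();
        Set<Integer> upgradeDomains = new HashSet<>();
        for (Placement p : replicas) {
            faultDomains.add(p.faultDomain);
            upgradeDomains.add(p.upgradeDomain);
        }
        return faultDomains.size() >= 2 && upgradeDomains.size() >= 3;
    }

    public static void main(String[] args) {
        // Three replicas on 2 fault domains and 3 upgrade domains: safe.
        List<Placement> good = Arrays.asList(
            new Placement(0, 0), new Placement(0, 1), new Placement(1, 2));
        // Three replicas sharing one upgrade domain: a single maintenance
        // window could take all of them down at once.
        List<Placement> bad = Arrays.asList(
            new Placement(0, 0), new Placement(1, 0), new Placement(2, 0));
        System.out.println(isSafelySpread(good)); // true
        System.out.println(isSafelySpread(bad));  // false
    }
}
```

The second case shows why the default one-dimensional (rack-only) policy is not enough: the replicas sit on three different fault domains yet are still lost together when their shared upgrade domain goes down.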
[jira] [Commented] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457211#comment-13457211 ] Sumadhur Reddy Bolli commented on HDFS-3564:

Thanks Nicholas! I updated the description as per your suggestion.

> Design enhancements to the pluggable blockplacementpolicy
> ---------------------------------------------------------
>
> Key: HDFS-3564
> URL: https://issues.apache.org/jira/browse/HDFS-3564
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: Sumadhur Reddy Bolli
> Assignee: Sumadhur Reddy Bolli
> Fix For: 1-win
>
> Attachments: blockPlacementPolicy enhancements.pdf, policyenhancements-branch-1-win.patch
>
> Currently, it is assumed that no placement policy other than the default can work with the balancer, and that the minimum number of racks required for any placement policy is always 2. The attached pdf suggests enhancements to the existing block placement policy abstraction to enable the following:
> a) Let placement policies decide if they are compatible with the balancer
> b) Provide an API to move blocks for balancing
> c) Let the placement policy decide the minimum number of racks given the replication factor
> d) Also, make the private methods in the default policy protected, similar to the way it's done in trunk, to enable easy derivation of custom policies.
> Please refer to the pdf and the patch for details.
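The four enhancements (a)-(d) can be sketched as an abstract policy class. This is an illustrative outline of the proposed abstraction, not the actual signatures from the patch; the method and class names below are hypothetical.

```java
public abstract class PluggablePlacementPolicy {

    // (a) A policy declares whether the balancer may rebalance its blocks.
    public boolean isBalancerCompatible() {
        return false; // conservative default: the balancer leaves this policy's blocks alone
    }

    // (c) Minimum racks becomes a function of the replication factor,
    // instead of the hard-coded constant 2.
    public int getMinRacksForReplication(int replication) {
        return Math.min(replication, 2); // mirrors the default policy's historical behavior
    }

    // (b) Hook the balancer can call to move a replica between nodes.
    public abstract boolean moveReplica(String blockId, String source, String target);

    // (d) Helpers are protected rather than private, so custom policies
    // can override or reuse them.
    protected String chooseLocalNode(String writer) {
        return writer;
    }

    /** Toy subclass showing how a custom policy opts in to balancer moves. */
    public static class DemoPolicy extends PluggablePlacementPolicy {
        @Override public boolean isBalancerCompatible() { return true; }
        @Override public boolean moveReplica(String blockId, String source, String target) {
            System.out.println("moving " + blockId + " from " + source + " to " + target);
            return true;
        }
    }

    public static void main(String[] args) {
        PluggablePlacementPolicy p = new DemoPolicy();
        System.out.println(p.isBalancerCompatible());       // true
        System.out.println(p.getMinRacksForReplication(3)); // 2
        p.moveReplica("blk_1", "dn-a", "dn-b");
    }
}
```

The point of (a) and (b) together is that the balancer no longer has to special-case the default policy: it asks any policy whether moves are allowed and, if so, delegates the move itself.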
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Description:
Currently, it is assumed that no placement policy other than the default can work with the balancer and that the minimum racks required for any placement policy is always 2. The attached pdf suggests enhancements to the existing block placement policy abstraction to enable the following:
a) Let placement policies decide if they are compatible with the balancer
b) Provide an API to move blocks for balancing
c) Let the placement policy decide the minimum number of racks given the replication factor
d) Also, the private methods in the default policy are made protected, similar to the way it's done in trunk, to enable easy derivation of custom policies.
Please refer to the pdf and the patch for details.

was:
Currently, it is assumed that no placement policy other than the default can work with the balancer and that the minimum racks required for any placement policy is always 2. The attached pdf suggests enhancements to the existing block placement policy abstraction to enable the following:
a) Let placement policies decide if they are compatible with balancer
b) Provide an api to move blocks for balancing
c) Let the placement policy decide the minimum number of racks given the replication factor
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Description:
Currently, it is assumed that no placement policy other than the default can work with the balancer and that the minimum racks required for any placement policy is always 2. The attached pdf suggests enhancements to the existing block placement policy abstraction to enable the following:
a) Let placement policies decide if they are compatible with balancer
b) Provide an api to move blocks for balancing
c) Let the placement policy decide the minimum number of racks given the replication factor

was:
Currently, it is assumed that no placement policy other than the default can work with the balancer and that the minimum racks required for any placement policy is always 2. The attached pdf suggests enhancements to the existing block placement policy abstraction to enable the following:
a) Let placement policies decide if they are compatible with balancer
b) Provide an api to move blocks that can be used by the balancer for balancing
c) Let the placment policy decide the minimum no of racks give
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Description:
Currently, it is assumed that no placement policy other than the default can work with the balancer and that the minimum racks required for any placement policy is always 2. The attached pdf suggests enhancements to the existing block placement policy abstraction to enable the following:
a) Let placement policies decide if they are compatible with balancer
b) Provide an api to move blocks that can be used by the balancer for balancing
c) Let the placment policy decide the minimum no of racks give
[jira] [Commented] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456192#comment-13456192 ] Sumadhur Reddy Bolli commented on HDFS-3566:

Thanks, Aaron, for letting me know. Then I guess I should not worry about pre-commit builds failing for non-trunk patches.
[jira] [Commented] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455950#comment-13455950 ] Sumadhur Reddy Bolli commented on HDFS-3564:

This patch is meant for branch-1-win and not for trunk. Hadoop QA seems to be running the patch against the trunk version, hence the failures.
[jira] [Commented] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455943#comment-13455943 ] Sumadhur Reddy Bolli commented on HDFS-3566:

This patch depends on HDFS-3564. I will resubmit the patch once HDFS-3564 is committed so that the tests can run again.
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Attachment: policyenhancements-branch-1-win.patch
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Attachment: (was: policyenhancements-branch-1-win.patch)
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Fix Version/s: 1-win
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3566:
---------------------------------------

Attachment: AzureBlockPlacementPolicy.pdf
[jira] [Commented] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455626#comment-13455626 ] Sumadhur Reddy Bolli commented on HDFS-3566:

AzureBlockPlacementPolicy.pdf explains at a high level why we need a separate policy in Azure.
[jira] [Commented] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455625#comment-13455625 ] Sumadhur Reddy Bolli commented on HDFS-3564:

blockPlacementPolicy enhancements.pdf explains the changes to the existing abstraction.
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Attachment: blockPlacementPolicy enhancements.pdf
[jira] [Updated] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3566:
---------------------------------------

Fix Version/s: 1-win
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3566:
---------------------------------------

Attachment: azurepolicy-branch-1-win.patch

Submitted a patch for the Azure policy in branch-1-win.
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

Attachment: policyenhancements-branch-1-win.patch

Submitting a patch for the placement policy enhancements in branch-1-win.
[jira] [Updated] (HDFS-3649) Port HDFS-385 to branch-1-win
[ https://issues.apache.org/jira/browse/HDFS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli updated HDFS-3649:
---------------------------------------

Release Note: blockplacement policy is now ported to branch-1 and branch-1-win (was: Nicholas submitted the patches posted on HDFS-385 to branch-1 and branch-1-win)

Nicholas committed the patches posted on HDFS-385 to branch-1 and branch-1-win.

> Port HDFS-385 to branch-1-win
> -----------------------------
>
> Key: HDFS-3649
> URL: https://issues.apache.org/jira/browse/HDFS-3649
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 1-win
> Reporter: Sumadhur Reddy Bolli
> Assignee: Sumadhur Reddy Bolli
>
> Added a patch to HDFS-385 to port the existing pluggable placement policy to branch-1-win
[jira] [Resolved] (HDFS-3649) Port HDFS-385 to branch-1-win
[ https://issues.apache.org/jira/browse/HDFS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumadhur Reddy Bolli resolved HDFS-3649.
----------------------------------------

Resolution: Fixed
Release Note: Nicholas submitted the patches posted on HDFS-385 to branch-1 and branch-1-win
[jira] [Commented] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427704#comment-13427704 ]

Sumadhur Reddy Bolli commented on HDFS-385:
-------------------------------------------

I also ran all the unit tests and ensured that they pass.

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
>             Key: HDFS-385
>             URL: https://issues.apache.org/jira/browse/HDFS-385
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>        Reporter: dhruba borthakur
>        Assignee: dhruba borthakur
>         Fix For: 0.21.0
>
>     Attachments: BlockPlacementPluggable.txt, BlockPlacementPluggable2.txt,
> BlockPlacementPluggable3.txt, BlockPlacementPluggable4.txt,
> BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt,
> BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch,
> blockplacementpolicy-branch-1.patch, blockplacementpolicy2-branch-1-win.patch,
> blockplacementpolicy2-branch-1.patch, blockplacementpolicy3-branch-1-win.patch,
> blockplacementpolicy3-branch-1.patch, rat094.txt
>
> The current HDFS code typically places one replica on the local rack, the
> second replica on a random remote rack, and the third replica on a random
> node of that remote rack. This algorithm is baked into the NameNode's code.
> It would be nice to make the block placement algorithm a pluggable
> interface. This will allow experimentation with different placement
> algorithms based on workloads, availability guarantees and failure models.
[jira] [Updated] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-385:
--------------------------------------

    Attachment: blockplacementpolicy3-branch-1-win.patch
                blockplacementpolicy3-branch-1.patch

Thanks for the feedback, Nicholas. Addressed all the comments in the new patches. Please review.

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
>             Key: HDFS-385
>             URL: https://issues.apache.org/jira/browse/HDFS-385
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>        Reporter: dhruba borthakur
>        Assignee: dhruba borthakur
>         Fix For: 0.21.0
>
>     Attachments: BlockPlacementPluggable.txt, BlockPlacementPluggable2.txt,
> BlockPlacementPluggable3.txt, BlockPlacementPluggable4.txt,
> BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt,
> BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch,
> blockplacementpolicy-branch-1.patch, blockplacementpolicy2-branch-1-win.patch,
> blockplacementpolicy2-branch-1.patch, blockplacementpolicy3-branch-1-win.patch,
> blockplacementpolicy3-branch-1.patch, rat094.txt
>
> The current HDFS code typically places one replica on the local rack, the
> second replica on a random remote rack, and the third replica on a random
> node of that remote rack. This algorithm is baked into the NameNode's code.
> It would be nice to make the block placement algorithm a pluggable
> interface. This will allow experimentation with different placement
> algorithms based on workloads, availability guarantees and failure models.
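To illustrate the idea behind HDFS-385, here is a minimal, self-contained sketch of what a pluggable placement policy looks like. The interface, class, and method names below are hypothetical simplifications for illustration only; they do not match the actual Hadoop `BlockPlacementPolicy` API or the attached patches.

```java
// Simplified model of a pluggable block placement policy: the NameNode-side
// code asks the policy for replica targets instead of hard-coding the choice.
import java.util.ArrayList;
import java.util.List;

interface PlacementPolicy {
    // Choose up to `replicas` target nodes for a new block.
    List<String> chooseTargets(List<String> liveNodes, int replicas);
}

// A trivial default policy: pick the first N live nodes. A custom policy
// (rack-aware, datacenter-aware, etc.) would implement the same interface.
class DefaultPolicy implements PlacementPolicy {
    public List<String> chooseTargets(List<String> liveNodes, int replicas) {
        int n = Math.min(replicas, liveNodes.size());
        return new ArrayList<>(liveNodes.subList(0, n));
    }
}

public class PlacementDemo {
    public static void main(String[] args) {
        PlacementPolicy policy = new DefaultPolicy(); // swap in a custom policy here
        List<String> targets =
            policy.chooseTargets(List.of("dn1", "dn2", "dn3", "dn4"), 3);
        System.out.println(targets);
    }
}
```

The point of the abstraction is that only the policy object changes: the caller never needs to know whether placement is rack-based, datacenter-based, or domain-based.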
[jira] [Updated] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-385:
--------------------------------------

    Attachment: blockplacementpolicy2-branch-1-win.patch
                blockplacementpolicy2-branch-1.patch

Sorry, I missed the newly added files in the previous patches. Submitted new patches with all the files. Please review.

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
>             Key: HDFS-385
>             URL: https://issues.apache.org/jira/browse/HDFS-385
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>        Reporter: dhruba borthakur
>        Assignee: dhruba borthakur
>         Fix For: 0.21.0
>
>     Attachments: BlockPlacementPluggable.txt, BlockPlacementPluggable2.txt,
> BlockPlacementPluggable3.txt, BlockPlacementPluggable4.txt,
> BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt,
> BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch,
> blockplacementpolicy-branch-1.patch, blockplacementpolicy2-branch-1-win.patch,
> blockplacementpolicy2-branch-1.patch, rat094.txt
>
> The current HDFS code typically places one replica on the local rack, the
> second replica on a random remote rack, and the third replica on a random
> node of that remote rack. This algorithm is baked into the NameNode's code.
> It would be nice to make the block placement algorithm a pluggable
> interface. This will allow experimentation with different placement
> algorithms based on workloads, availability guarantees and failure models.
[jira] [Assigned] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli reassigned HDFS-3566:
------------------------------------------

    Assignee: Sumadhur Reddy Bolli

> Custom Replication Policy for Azure
> -----------------------------------
>
>             Key: HDFS-3566
>             URL: https://issues.apache.org/jira/browse/HDFS-3566
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: Sumadhur Reddy Bolli
>        Assignee: Sumadhur Reddy Bolli
>
> Azure has logical concepts like fault and upgrade domains. Each fault
> domain spans multiple upgrade domains and each upgrade domain spans
> multiple fault domains. Machines are typically spread evenly across both
> fault and upgrade domains. Fault domain failures are typically
> catastrophic/unplanned failures, and the possibility of data loss is high.
> An upgrade domain can be taken down by Azure periodically for maintenance.
> Each time an upgrade domain is taken down, a small percentage of machines
> in the upgrade domain (typically 1-2%) are replaced due to disk failures,
> thus losing data. Assuming the default replication factor of 3, any 3 data
> nodes going down at the same time would mean potential data loss. So, it is
> important to have a policy that spreads replicas across both fault and
> upgrade domains to ensure practically no data loss. The problem here is
> two-dimensional and the default policy in Hadoop is one-dimensional. This
> policy would spread the datanodes across at least 2 fault domains and 3
> upgrade domains to prevent data loss.
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

    Assignee: Sumadhur Reddy Bolli

> Design enhancements to the pluggable blockplacementpolicy
> ---------------------------------------------------------
>
>             Key: HDFS-3564
>             URL: https://issues.apache.org/jira/browse/HDFS-3564
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: Sumadhur Reddy Bolli
>        Assignee: Sumadhur Reddy Bolli
[jira] [Updated] (HDFS-3649) Port HDFS-385 to branch-1-win
[ https://issues.apache.org/jira/browse/HDFS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-3649:
---------------------------------------

    Assignee: Sumadhur Reddy Bolli

> Port HDFS-385 to branch-1-win
> -----------------------------
>
>                  Key: HDFS-3649
>                  URL: https://issues.apache.org/jira/browse/HDFS-3649
>              Project: Hadoop HDFS
>           Issue Type: Improvement
>     Affects Versions: 1-win
>             Reporter: Sumadhur Reddy Bolli
>             Assignee: Sumadhur Reddy Bolli
>
> Added a patch to HDFS-385 to port the existing pluggable placement policy to branch-1-win.
[jira] [Commented] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413183#comment-13413183 ]

Sumadhur Reddy Bolli commented on HDFS-3564:
--------------------------------------------

I apologize for the inconvenience. Changed the title. I will update the description or attach a doc with the proposed changes once the HDFS-3649 port is complete. Thanks!

> Design enhancements to the pluggable blockplacementpolicy
> ---------------------------------------------------------
>
>             Key: HDFS-3564
>             URL: https://issues.apache.org/jira/browse/HDFS-3564
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: Sumadhur Reddy Bolli
[jira] [Updated] (HDFS-3564) Design enhancements to the pluggable blockplacementpolicy
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-3564:
---------------------------------------

           Description: (was: ReplicationTargetChooser currently determines the placement of replicas in Hadoop. Making the replication policy pluggable would help in having custom replication policies that suit the environment. Eg1: enabling placing replicas across different datacenters (not just racks). Eg2: enabling placing replicas across multiple (more than 2) racks. Eg3: cloud environments like Azure have logical concepts like fault and upgrade domains. Each fault domain spans multiple upgrade domains and each upgrade domain spans multiple fault domains. Machines are typically spread evenly across both fault and upgrade domains. Fault domain failures are typically catastrophic/unplanned failures, and the possibility of data loss is high. An upgrade domain can be taken down by Azure periodically for maintenance. Each time an upgrade domain is taken down, a small percentage of machines in the upgrade domain (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default replication factor of 3, any 3 data nodes going down at the same time would mean potential data loss. So, it is important to have a policy that spreads replicas across both fault and upgrade domains to ensure practically no data loss. The problem here is two-dimensional and the default policy in Hadoop is one-dimensional. Custom policies to address issues like these can be written if we make the policy pluggable.)
    Remaining Estimate: (was: 24h)
     Original Estimate: (was: 24h)
               Summary: Design enhancements to the pluggable blockplacementpolicy (was: Make the replication policy pluggable to allow custom replication policies)

> Design enhancements to the pluggable blockplacementpolicy
> ---------------------------------------------------------
>
>             Key: HDFS-3564
>             URL: https://issues.apache.org/jira/browse/HDFS-3564
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: Sumadhur Reddy Bolli
[jira] [Updated] (HDFS-3566) Custom Replication Policy for Azure
[ https://issues.apache.org/jira/browse/HDFS-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-3566:
---------------------------------------

    Target Version/s: 1-win (was: 1.1.0)

> Custom Replication Policy for Azure
> -----------------------------------
>
>             Key: HDFS-3566
>             URL: https://issues.apache.org/jira/browse/HDFS-3566
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: Sumadhur Reddy Bolli
>
> Azure has logical concepts like fault and upgrade domains. Each fault
> domain spans multiple upgrade domains and each upgrade domain spans
> multiple fault domains. Machines are typically spread evenly across both
> fault and upgrade domains. Fault domain failures are typically
> catastrophic/unplanned failures, and the possibility of data loss is high.
> An upgrade domain can be taken down by Azure periodically for maintenance.
> Each time an upgrade domain is taken down, a small percentage of machines
> in the upgrade domain (typically 1-2%) are replaced due to disk failures,
> thus losing data. Assuming the default replication factor of 3, any 3 data
> nodes going down at the same time would mean potential data loss. So, it is
> important to have a policy that spreads replicas across both fault and
> upgrade domains to ensure practically no data loss. The problem here is
> two-dimensional and the default policy in Hadoop is one-dimensional. This
> policy would spread the datanodes across at least 2 fault domains and 3
> upgrade domains to prevent data loss.
[jira] [Created] (HDFS-3649) Port HDFS-385 to branch-1-win
Sumadhur Reddy Bolli created HDFS-3649:
---------------------------------------

             Summary: Port HDFS-385 to branch-1-win
                 Key: HDFS-3649
                 URL: https://issues.apache.org/jira/browse/HDFS-3649
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 1-win
            Reporter: Sumadhur Reddy Bolli

Added a patch to HDFS-385 to port the existing pluggable placement policy to branch-1-win.
[jira] [Updated] (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumadhur Reddy Bolli updated HDFS-385:
--------------------------------------

    Attachment: blockplacementpolicy-branch-1-win.patch
                blockplacementpolicy-branch-1.patch

Patches to port the pluggable interface to branch-1 and branch-1-win.

> Design a pluggable interface to place replicas of blocks in HDFS
> ----------------------------------------------------------------
>
>             Key: HDFS-385
>             URL: https://issues.apache.org/jira/browse/HDFS-385
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>        Reporter: dhruba borthakur
>        Assignee: dhruba borthakur
>         Fix For: 0.21.0
>
>     Attachments: BlockPlacementPluggable.txt, BlockPlacementPluggable2.txt,
> BlockPlacementPluggable3.txt, BlockPlacementPluggable4.txt,
> BlockPlacementPluggable5.txt, BlockPlacementPluggable6.txt,
> BlockPlacementPluggable7.txt, blockplacementpolicy-branch-1-win.patch,
> blockplacementpolicy-branch-1.patch
>
> The current HDFS code typically places one replica on the local rack, the
> second replica on a random remote rack, and the third replica on a random
> node of that remote rack. This algorithm is baked into the NameNode's code.
> It would be nice to make the block placement algorithm a pluggable
> interface. This will allow experimentation with different placement
> algorithms based on workloads, availability guarantees and failure models.
[jira] [Commented] (HDFS-3564) Make the replication policy pluggable to allow custom replication policies
[ https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412463#comment-13412463 ]

Sumadhur Reddy Bolli commented on HDFS-3564:
--------------------------------------------

Making the policy pluggable should be sufficient. I will re-purpose this JIRA to suggest enhancements to the existing abstraction. Network topology is not known to the users in Azure, and it is not strictly hierarchical in nature, as fault domains span upgrade domains and upgrade domains can span fault domains. However, I do not see much value in changing the internal abstractions for topology, as we do not know the underlying physical topology in Azure. I will post a document with the details on HDFS-3566 to explain this better.

> Make the replication policy pluggable to allow custom replication policies
> --------------------------------------------------------------------------
>
>             Key: HDFS-3564
>             URL: https://issues.apache.org/jira/browse/HDFS-3564
>         Project: Hadoop HDFS
>      Issue Type: Improvement
>      Components: name-node
>        Reporter: Sumadhur Reddy Bolli
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ReplicationTargetChooser currently determines the placement of replicas in
> Hadoop. Making the replication policy pluggable would help in having custom
> replication policies that suit the environment.
> Eg1: enabling placing replicas across different datacenters (not just racks)
> Eg2: enabling placing replicas across multiple (more than 2) racks
> Eg3: cloud environments like Azure have logical concepts like fault and
> upgrade domains. Each fault domain spans multiple upgrade domains and each
> upgrade domain spans multiple fault domains. Machines are typically spread
> evenly across both fault and upgrade domains. Fault domain failures are
> typically catastrophic/unplanned failures, and the possibility of data loss
> is high. An upgrade domain can be taken down by Azure periodically for
> maintenance. Each time an upgrade domain is taken down, a small percentage
> of machines in the upgrade domain (typically 1-2%) are replaced due to disk
> failures, thus losing data. Assuming the default replication factor of 3,
> any 3 data nodes going down at the same time would mean potential data
> loss. So, it is important to have a policy that spreads replicas across
> both fault and upgrade domains to ensure practically no data loss. The
> problem here is two-dimensional and the default policy in Hadoop is
> one-dimensional. Custom policies to address issues like these can be
> written if we make the policy pluggable.
[jira] [Created] (HDFS-3566) Custom Replication Policy for Azure
Sumadhur Reddy Bolli created HDFS-3566:
---------------------------------------

             Summary: Custom Replication Policy for Azure
                 Key: HDFS-3566
                 URL: https://issues.apache.org/jira/browse/HDFS-3566
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
            Reporter: Sumadhur Reddy Bolli

Azure has logical concepts like fault and upgrade domains. Each fault domain spans multiple upgrade domains and each upgrade domain spans multiple fault domains. Machines are typically spread evenly across both fault and upgrade domains. Fault domain failures are typically catastrophic/unplanned failures, and the possibility of data loss is high. An upgrade domain can be taken down by Azure periodically for maintenance. Each time an upgrade domain is taken down, a small percentage of machines in the upgrade domain (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default replication factor of 3, any 3 data nodes going down at the same time would mean potential data loss. So, it is important to have a policy that spreads replicas across both fault and upgrade domains to ensure practically no data loss. The problem here is two-dimensional and the default policy in Hadoop is one-dimensional. This policy would spread the datanodes across at least 2 fault domains and 3 upgrade domains to prevent data loss.
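The two-dimensional constraint described above can be sketched as a simple check: for replication factor 3, a replica set is acceptable only if its nodes span at least 2 fault domains and 3 upgrade domains. The `Node` type and method names below are hypothetical illustrations, not code from the attached patch.

```java
// Self-contained sketch of the Azure spread requirement: replicas must
// cover at least 2 fault domains and (up to the replica count) 3 upgrade
// domains, so no single fault-domain or upgrade-domain outage can take
// all copies of a block at once.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AzurePlacementCheck {
    record Node(String name, int faultDomain, int upgradeDomain) {}

    // Returns true if the chosen replicas satisfy the spread requirement.
    static boolean satisfiesSpread(List<Node> replicas) {
        Set<Integer> faultDomains = new HashSet<>();
        Set<Integer> upgradeDomains = new HashSet<>();
        for (Node n : replicas) {
            faultDomains.add(n.faultDomain());
            upgradeDomains.add(n.upgradeDomain());
        }
        return faultDomains.size() >= 2
            && upgradeDomains.size() >= Math.min(3, replicas.size());
    }

    public static void main(String[] args) {
        // Spans 2 fault domains and 3 upgrade domains: acceptable.
        List<Node> good = List.of(new Node("dn1", 0, 0),
                                  new Node("dn2", 1, 1),
                                  new Node("dn3", 0, 2));
        // All in one fault domain and only 2 upgrade domains: rejected.
        List<Node> bad = List.of(new Node("dn1", 0, 0),
                                 new Node("dn2", 0, 0),
                                 new Node("dn3", 0, 1));
        System.out.println(satisfiesSpread(good)); // true
        System.out.println(satisfiesSpread(bad));  // false
    }
}
```

A real policy would also have to choose nodes satisfying this predicate while balancing load, which is where the placement becomes a two-dimensional problem rather than the rack-only (one-dimensional) decision the default policy makes.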
[jira] [Created] (HDFS-3564) Make the replication policy pluggable to allow custom replication policies
Sumadhur Reddy Bolli created HDFS-3564:
---------------------------------------

             Summary: Make the replication policy pluggable to allow custom replication policies
                 Key: HDFS-3564
                 URL: https://issues.apache.org/jira/browse/HDFS-3564
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
            Reporter: Sumadhur Reddy Bolli

ReplicationTargetChooser currently determines the placement of replicas in Hadoop. Making the replication policy pluggable would help in having custom replication policies that suit the environment.
Eg1: enabling placing replicas across different datacenters (not just racks)
Eg2: enabling placing replicas across multiple (more than 2) racks
Eg3: cloud environments like Azure have logical concepts like fault and upgrade domains. Each fault domain spans multiple upgrade domains and each upgrade domain spans multiple fault domains. Machines are typically spread evenly across both fault and upgrade domains. Fault domain failures are typically catastrophic/unplanned failures, and the possibility of data loss is high. An upgrade domain can be taken down by Azure periodically for maintenance. Each time an upgrade domain is taken down, a small percentage of machines in the upgrade domain (typically 1-2%) are replaced due to disk failures, thus losing data. Assuming the default replication factor of 3, any 3 data nodes going down at the same time would mean potential data loss. So, it is important to have a policy that spreads replicas across both fault and upgrade domains to ensure practically no data loss. The problem here is two-dimensional and the default policy in Hadoop is one-dimensional. Custom policies to address issues like these can be written if we make the policy pluggable.