[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HDFS-8204:
-----------------------------
    Fix Version/s: 2.8.0

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha1
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch
>
> In versions <2.6, the Balancer moves blocks between Datanodes.
> In the new version (>=2.6), the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
> The function
> {code}
> class DBlock extends Locations<StorageGroup>
> DBlock.isLocatedOn(StorageGroup loc)
> {code}
> -is flawed and may cause 2 replicas to end up on the same node after running the Balancer.-
> For example:
> We have 2 nodes. Each node has two storages.
> We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
> We have a block with the ONE_SSD storage policy.
> The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
> The replica in (DN0,SSD) should not be moved to (DN1,SSD) when running the Balancer. Otherwise DN1 has 2 replicas.
> ----
> UPDATE (thanks [~szetszwo] for pointing it out):
> {color:red}
> This bug will *NOT* cause 2 replicas to end up on the same node after running the Balancer, thanks to the Datanode rejecting the move.
> {color}
> We see a lot of ERROR messages when running the test.
> {code}
> 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
>     at java.lang.Thread.run(Thread.java:722)
> {code}
> The Balancer runs 5 to 20 iterations in the test before it exits. This is inefficient.
> The Balancer should not *schedule* the move in the first place, even though it would fail anyway. In the test, it should exit after 5 iterations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
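
The (DN0,SSD) to (DN1,SSD) example in the issue description can be sketched in a few lines of Java. This is only a simplified illustration under stated assumptions, not the Hadoop Dispatcher code or the committed patch: the classes StorageGroup and Block below are stand-ins written for this example, and hasReplicaOnDatanodeOf is a hypothetical helper name. It shows why a check that compares whole storage groups misses a replica that sits on a different storage of the same datanode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical, simplified model of the scenario; not actual Hadoop code. */
public class SameNodeCheckSketch {

    /** Stand-in for a storage group: one (datanode, storage type) pair. */
    static final class StorageGroup {
        final String datanode;
        final String storageType;

        StorageGroup(String datanode, String storageType) {
            this.datanode = datanode;
            this.storageType = storageType;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof StorageGroup)) {
                return false;
            }
            StorageGroup other = (StorageGroup) o;
            return datanode.equals(other.datanode)
                && storageType.equals(other.storageType);
        }

        @Override
        public int hashCode() {
            return Objects.hash(datanode, storageType);
        }
    }

    /** Stand-in for a block together with its replica locations. */
    static final class Block {
        private final List<StorageGroup> locations = new ArrayList<>();

        void addLocation(StorageGroup group) {
            locations.add(group);
        }

        /** Storage-level check: true only if a replica is on exactly this storage group. */
        boolean isLocatedOn(StorageGroup target) {
            return locations.contains(target);
        }

        /** Node-level check (hypothetical): true if any replica is on the target's datanode. */
        boolean hasReplicaOnDatanodeOf(StorageGroup target) {
            for (StorageGroup location : locations) {
                if (location.datanode.equals(target.datanode)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Replicas on (DN0,SSD) and (DN1,DISK), as in the issue description.
        Block block = new Block();
        block.addLocation(new StorageGroup("DN0", "SSD"));
        block.addLocation(new StorageGroup("DN1", "DISK"));

        // Candidate target for the replica currently on (DN0,SSD).
        StorageGroup target = new StorageGroup("DN1", "SSD");

        // The storage-level check does not see the replica on (DN1,DISK).
        System.out.println("isLocatedOn: " + block.isLocatedOn(target));
        // The node-level check does, so the move would not be scheduled.
        System.out.println("hasReplicaOnDatanodeOf: " + block.hasReplicaOnDatanodeOf(target));
    }
}
```

With the replicas from the example, the storage-level check returns false for (DN1,SSD) while the node-level check returns true. A real scheduler would presumably still need to permit moves between two storages of the same datanode, which this sketch does not model.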
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-8204:
--------------------------------------
    Resolution: Fixed
    Fix Version/s: 2.7.1
    Status: Resolved (was: Patch Available)

I have committed this. Thanks, Walter!

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Fix For: 2.7.1
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Attachment: HDFS-8204.003.patch

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Attachment: (was: HDFS-8204.003.patch)

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Description:
In versions <2.6, the Balancer moves blocks between Datanodes.
In the new version (>=2.6), the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed and may cause 2 replicas to end up on the same node after running the Balancer.-
For example:
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with the ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
The replica in (DN0,SSD) should not be moved to (DN1,SSD) when running the Balancer. Otherwise DN1 has 2 replicas.
----
UPDATE (thanks [~szetszwo] for pointing it out):
{color:red}
This bug will *NOT* cause 2 replicas to end up on the same node after running the Balancer, thanks to the Datanode rejecting the move.
{color}
We see a lot of ERROR messages when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:722)
{code}
The Balancer runs 5 to 20 iterations in the test before it exits. This is inefficient.
The Balancer should not *schedule* the move in the first place, even though it would fail anyway. In the test, it should exit after 5 iterations.

  was:
In versions <2.6, the Balancer moves blocks between Datanodes.
In the new version (>=2.6), the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the Balancer.
For example:
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with the ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
The replica in (DN0,SSD) should not be moved to (DN1,SSD) when running the Balancer. Otherwise DN1 has 2 replicas.

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Attachment: HDFS-8204.003.patch

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Priority: Minor (was: Major)

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Issue Type: Improvement (was: Bug)

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-8204:
--------------------------------------
    Hadoop Flags: Reviewed
    Status: Patch Available (was: Reopened)

+1 patch looks good. Pending Jenkins.

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Description:
In versions <2.6, the Balancer moves blocks between Datanodes.
In the new version (>=2.6), the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the Balancer.
For example:
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with the ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
The replica in (DN0,SSD) should not be moved to (DN1,SSD) when running the Balancer. Otherwise DN1 has 2 replicas.

  was:
In versions <2.6, the Balancer moves blocks between Datanodes.
In the new version (>=2.6), the Balancer moves blocks between StorageGroups (introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the Balancer.

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Attachment: HDFS-8204.002.patch

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-8204:
----------------------------
    Summary: Mover/Balancer should not schedule two replicas to the same DN (was: Mover should not schedule two replicas to the same DN)

> Mover/Balancer should not schedule two replicas to the same DN
> --------------------------------------------------------------
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Attachments: HDFS-8204.001.patch