[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-8204:
-
Fix Version/s: 2.8.0

> Mover/Balancer should not schedule two replicas to the same DN
> --
>
> Key: HDFS-8204
> URL: https://issues.apache.org/jira/browse/HDFS-8204
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: balancer & mover
> Reporter: Walter Su
> Assignee: Walter Su
> Priority: Minor
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha1
>
> Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, 
> HDFS-8204.003.patch
>
>
> Before version 2.6, the Balancer moved blocks between Datanodes.
> Since version 2.6, the Balancer moves blocks between StorageGroups 
> (introduced by HDFS-6584).
> The function
> {code}
> class DBlock extends Locations<StorageGroup>
> DBlock.isLocatedOn(StorageGroup loc)
> {code}
> -is flawed and may cause 2 replicas to end up on the same node after running the Balancer.-
> For example:
> We have 2 nodes. Each node has two storages.
> We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
> We have a block with ONE_SSD storage policy.
> The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
> The replica in (DN0,SSD) should not be moved to (DN1,SSD) when the Balancer 
> runs; otherwise DN1 would hold 2 replicas.
> --
> UPDATE (thanks [~szetszwo] for pointing it out):
> {color:red}
> This bug will *NOT* cause 2 replicas to end up on the same node after running 
> the Balancer, because the Datanode rejects the move. 
> {color}
> We see a lot of ERROR messages when running the test.
> {code}
> 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  src: /127.0.0.1:52532 dst: /127.0.0.1:59537
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
>     at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
>     at java.lang.Thread.run(Thread.java:722)
> {code}
> In the test, the Balancer runs 5 to 20 iterations before it exits, which is 
> inefficient.
> The Balancer should not *schedule* such a move in the first place, even though 
> it will fail anyway. In the test, it should exit after 5 iterations.
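
The difference between a storage-level and a node-level location check in the example above can be sketched as follows. This is a minimal illustration, not the committed patch: `ReplicaPlacementSketch`, `SketchStorageGroup`, `SketchBlock` and `isGoodTarget` are hypothetical names invented here, and datanodes and storage types are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types are real HDFS classes. */
final class ReplicaPlacementSketch {

    /** Hypothetical stand-in for a (datanode, storage type) pair. */
    static final class SketchStorageGroup {
        final String datanode;     // e.g. "DN0"
        final String storageType;  // e.g. "SSD" or "DISK"

        SketchStorageGroup(String datanode, String storageType) {
            this.datanode = datanode;
            this.storageType = storageType;
        }

        boolean sameStorage(SketchStorageGroup other) {
            return datanode.equals(other.datanode)
                && storageType.equals(other.storageType);
        }
    }

    /** Hypothetical stand-in for a block and the storage groups holding its replicas. */
    static final class SketchBlock {
        private final List<SketchStorageGroup> locations = new ArrayList<>();

        void addLocation(SketchStorageGroup location) {
            locations.add(location);
        }

        /** Storage-level check: true only if a replica sits on exactly this storage group. */
        boolean isLocatedOn(SketchStorageGroup target) {
            for (SketchStorageGroup location : locations) {
                if (location.sameStorage(target)) {
                    return true;
                }
            }
            return false;
        }

        /** Node-level check: should a move from source to target be scheduled at all? */
        boolean isGoodTarget(SketchStorageGroup source, SketchStorageGroup target) {
            if (source.sameStorage(target)) {
                return false; // nothing to move
            }
            if (source.datanode.equals(target.datanode)) {
                return true; // move inside one node; its replica count does not change
            }
            for (SketchStorageGroup location : locations) {
                if (location.datanode.equals(target.datanode)) {
                    return false; // target node already holds a replica on some storage
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        SketchStorageGroup dn0Ssd = new SketchStorageGroup("DN0", "SSD");
        SketchStorageGroup dn1Ssd = new SketchStorageGroup("DN1", "SSD");
        SketchStorageGroup dn1Disk = new SketchStorageGroup("DN1", "DISK");

        // The block from the description: replicas on (DN0,SSD) and (DN1,DISK).
        SketchBlock block = new SketchBlock();
        block.addLocation(dn0Ssd);
        block.addLocation(dn1Disk);

        System.out.println("isLocatedOn(DN1,SSD): " + block.isLocatedOn(dn1Ssd));
        System.out.println("isGoodTarget((DN0,SSD) -> (DN1,SSD)): "
            + block.isGoodTarget(dn0Ssd, dn1Ssd));
    }
}
```

In this sketch the storage-level check does not see the replica on (DN1,DISK) when asked about (DN1,SSD), so a scheduler relying on it alone would propose the move that the Datanode later rejects. The node-level check filters the move out before it is scheduled.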



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-28 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8204:
--
   Resolution: Fixed
Fix Version/s: 2.7.1
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Walter!



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-27 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Attachment: HDFS-8204.003.patch



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-27 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Attachment: (was: HDFS-8204.003.patch)



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-26 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Description: 
Before version 2.6, the Balancer moved blocks between Datanodes.
Since version 2.6, the Balancer moves blocks between StorageGroups 
(introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed and may cause 2 replicas to end up on the same node after running the Balancer.-

For example:
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
The replica in (DN0,SSD) should not be moved to (DN1,SSD) when the Balancer 
runs; otherwise DN1 would hold 2 replicas.
--
UPDATE (thanks [~szetszwo] for pointing it out):
{color:red}
This bug will *NOT* cause 2 replicas to end up on the same node after running 
the Balancer, because the Datanode rejects the move. 
{color}
We see a lot of ERROR messages when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation  src: /127.0.0.1:52532 dst: /127.0.0.1:59537
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:722)
{code}
In the test, the Balancer runs 5 to 20 iterations before it exits, which is 
inefficient.
The Balancer should not *schedule* such a move in the first place, even though 
it will fail anyway. In the test, it should exit after 5 iterations.

  was:
Before version 2.6, the Balancer moved blocks between Datanodes.
Since version 2.6, the Balancer moves blocks between StorageGroups 
(introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the Balancer.

For example:
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
The replica in (DN0,SSD) should not be moved to (DN1,SSD) when the Balancer 
runs; otherwise DN1 would hold 2 replicas.



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-26 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Attachment: HDFS-8204.003.patch



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-26 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Priority: Minor  (was: Major)



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-26 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Issue Type: Improvement  (was: Bug)



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-25 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-8204:
--
Hadoop Flags: Reviewed
  Status: Patch Available  (was: Reopened)

+1 patch looks good.  Pending Jenkins.



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-25 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Description: 
Before version 2.6, the Balancer moved blocks between Datanodes.
Since version 2.6, the Balancer moves blocks between StorageGroups 
(introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the Balancer.

For example:
We have 2 nodes. Each node has two storages.
We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK).
We have a block with ONE_SSD storage policy.
The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK).
The replica in (DN0,SSD) should not be moved to (DN1,SSD) when the Balancer 
runs; otherwise DN1 would hold 2 replicas.

  was:
Before version 2.6, the Balancer moved blocks between Datanodes.
Since version 2.6, the Balancer moves blocks between StorageGroups 
(introduced by HDFS-6584).
The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
is flawed and may cause 2 replicas to end up on the same node after running the Balancer.




[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-24 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Attachment: HDFS-8204.002.patch



[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN

2015-04-22 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-8204:

Summary: Mover/Balancer should not schedule two replicas to the same DN  
(was: Mover should not schedule two replicas to the same DN)
