[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-22 Thread Tsz Wo Nicholas Sze (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288285#comment-14288285 ]

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

Yes.  It is great that we already have the log.

+1 on HDFS-7575.05.patch

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
 Issue Type: Bug
 Affects Versions: 2.4.0, 2.5.0, 2.6.0
 Reporter: Lars Francke
 Assignee: Arpit Agarwal
 Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch,
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch,
 HDFS-7575.04.patch, HDFS-7575.05.binary.patch, HDFS-7575.05.patch,
 testUpgrade22via24GeneratesStorageIDs.tgz,
 testUpgradeFrom22GeneratesStorageIDs.tgz,
 testUpgradeFrom24PreservesStorageId.tgz


 Before HDFS-2832, each DataNode had a unique storageId that included its IP
 address. Since HDFS-2832, DataNodes have a unique storageId per storage
 directory, which is just a random UUID.
 They send one report per storage directory in their heartbeats. The heartbeat
 is processed on the NameNode in the
 {{DatanodeDescriptor#updateHeartbeatState}} method. Before HDFS-2832 this just
 stored the information per DataNode. After the patch, though, each DataNode
 can have multiple storages, so the information is stored in a map keyed by
 storage ID.
 This works fine for all clusters installed after HDFS-2832, as they get a
 UUID for their storage ID. So a DN with 8 drives has a map with 8 different
 keys. On each heartbeat the map is searched and updated
 ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
 {code:title=DatanodeStorageInfo}
   void updateState(StorageReport r) {
     // Plain assignment: each report overwrites the previous values.
     capacity = r.getCapacity();
     dfsUsed = r.getDfsUsed();
     remaining = r.getRemaining();
     blockPoolUsed = r.getBlockPoolUsed();
   }
 {code}
 On clusters that were upgraded from a pre-HDFS-2832 version, though, the
 storage ID has not been rewritten (at least not on the four clusters I
 checked), so every directory has the exact same storageId. That means there
 is only a single entry in the {{storageMap}}, and it is overwritten by an
 arbitrary {{StorageReport}} from the DataNode. This can be seen in the
 {{updateState}} method above: it just assigns the capacity from the received
 report, whereas it should probably sum the values per received heartbeat.
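A minimal Java sketch of the overwrite described above, assuming simplified hypothetical stand-ins ({{Report}}, {{StorageInfo}}, {{processHeartbeat}}) rather than the real HDFS classes: three directories that share one legacy storage ID collapse into a single map entry, and in this sketch only the last report processed survives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for StorageReport and DatanodeStorageInfo;
// the real HDFS classes have more fields and different constructors.
class StorageCollapseDemo {
    static final class Report {
        final String storageId;
        final long capacity;
        final long dfsUsed;

        Report(String storageId, long capacity, long dfsUsed) {
            this.storageId = storageId;
            this.capacity = capacity;
            this.dfsUsed = dfsUsed;
        }
    }

    static final class StorageInfo {
        long capacity;
        long dfsUsed;

        // Same shape as the quoted updateState(): plain assignment, no summing.
        void updateState(Report r) {
            capacity = r.capacity;
            dfsUsed = r.dfsUsed;
        }
    }

    // Applies one heartbeat's reports to a map keyed by storage ID.
    static Map<String, StorageInfo> processHeartbeat(
            Map<String, StorageInfo> storageMap, Report[] reports) {
        for (Report r : reports) {
            storageMap.computeIfAbsent(r.storageId, id -> new StorageInfo())
                      .updateState(r);
        }
        return storageMap;
    }

    public static void main(String[] args) {
        // An upgraded DataNode whose three directories share one legacy ID.
        Report[] reports = {
            new Report("DS-legacy", 1000, 900),
            new Report("DS-legacy", 1000, 500),
            new Report("DS-legacy", 1000, 100),
        };
        Map<String, StorageInfo> map = processHeartbeat(new HashMap<>(), reports);
        StorageInfo only = map.get("DS-legacy");
        // Prints "1 1000 100": one entry, holding only the last report's values.
        System.out.println(map.size() + " " + only.capacity + " " + only.dfsUsed);
    }
}
```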
 The Balancer seems to be one of the few components that actually use this
 information, so it now considers the utilization of an arbitrary drive per
 DataNode for balancing purposes.
 Things get even worse when a drive has been added or replaced, as the new
 drive gets a new storage ID, so there will be two entries in the storageMap.
 As new drives are usually empty, this skews the Balancer's decision in such a
 way that the node will never be considered over-utilized.
 Another problem is that old StorageReports are never removed from the
 storageMap. So if I replace a drive and it gets a new storage ID, the old
 entry stays in place and is used in all of the Balancer's calculations until
 the NameNode is restarted.
 I can try providing a patch that does the following:
 * Instead of using a map, just store the array we receive, or, instead of
 storing an array, sum up the values of reports with the same ID.
 * On each heartbeat, clear the map (so we know we have up-to-date information).
 Does that sound sensible?
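The two proposals in the list above could look roughly like this. It is a hypothetical sketch with simplified types, not the NameNode's actual code: reports sharing a storage ID are summed rather than overwritten, and the map is rebuilt on every heartbeat so that an entry for a replaced drive disappears instead of lingering until the NameNode restarts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two proposals, with simplified types
// (not the real HDFS StorageReport / DatanodeStorageInfo classes).
class HeartbeatAggregator {
    static final class Report {
        final String storageId;
        final long capacity;
        final long dfsUsed;

        Report(String storageId, long capacity, long dfsUsed) {
            this.storageId = storageId;
            this.capacity = capacity;
            this.dfsUsed = dfsUsed;
        }
    }

    static final class Totals {
        long capacity;
        long dfsUsed;
    }

    private Map<String, Totals> storageMap = new HashMap<>();

    // Proposal 1: sum reports that share a storage ID instead of overwriting.
    static Map<String, Totals> aggregate(Report[] reports) {
        Map<String, Totals> fresh = new HashMap<>();
        for (Report r : reports) {
            Totals t = fresh.computeIfAbsent(r.storageId, id -> new Totals());
            t.capacity += r.capacity;
            t.dfsUsed += r.dfsUsed;
        }
        return fresh;
    }

    // Proposal 2: replace the map on every heartbeat, so IDs of removed
    // drives disappear instead of lingering until a NameNode restart.
    void onHeartbeat(Report[] reports) {
        storageMap = aggregate(reports);
    }

    Map<String, Totals> currentStorageMap() {
        return storageMap;
    }

    public static void main(String[] args) {
        HeartbeatAggregator dn = new HeartbeatAggregator();
        dn.onHeartbeat(new Report[] {
            new Report("DS-legacy", 1000, 900),
            new Report("DS-legacy", 1000, 500),
            new Report("DS-old", 1000, 800),
        });
        // "DS-old" was replaced by "DS-new" before the next heartbeat.
        dn.onHeartbeat(new Report[] {
            new Report("DS-legacy", 1000, 900),
            new Report("DS-legacy", 1000, 500),
            new Report("DS-new", 1000, 0),
        });
        Totals legacy = dn.currentStorageMap().get("DS-legacy");
        // Prints "false 2000 1400".
        System.out.println(dn.currentStorageMap().containsKey("DS-old") + " "
            + legacy.capacity + " " + legacy.dfsUsed);
    }
}
```

One design consequence worth noting: summing keeps the per-DataNode totals right, but a shared ID still cannot tell anything which physical drive is full. Patch 005, as Colin describes it in this thread, takes a different route and regenerates storage IDs that can't be parsed as UUIDs.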



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-22 Thread Arpit Agarwal (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288260#comment-14288260 ]

Arpit Agarwal commented on HDFS-7575:
-

Thanks Colin.

bq. We should log the old (invalid) storage id.
Hi Nicholas, we are already doing so in the v05 patch.

In {{createStorageID}}:
{code}
LOG.info("Generated new storageID " + sd.getStorageUuid() +
    " for directory " + sd.getRoot() +
    (oldStorageID == null ? "" : (" to replace " + oldStorageID)));
{code}

Is this what you were looking for?


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-22 Thread Tsz Wo Nicholas Sze (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287999#comment-14287999 ]

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

bq. I don't think it's productive to argue about whether this represents a true
layout version change or whether it is layout version changey enough.
Clearly we both agree that doing an LV change here would work and solve the
problem. At the end of the day, we have to make the decision based on which
way is more maintainable.

You seem to be suggesting that even if there is no layout format change, it is
good to update the layout version because of the bug. Is that correct?

bq. ... It helps by not harming ...

I guess you mean it actually does not help at all. It only shows that the
cause of the duplication problem is not this bug.


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-22 Thread Colin Patrick McCabe (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288071#comment-14288071 ]

Colin Patrick McCabe commented on HDFS-7575:


I looked at patch 005 more carefully, and now I can see that it only ever
modifies storage IDs when the ID can't be parsed as a UUID. So this should
really only have an effect on storage IDs generated before the cluster was upgraded.
The other nice thing about patch 005 is that it can easily be backported to
2.6.1, and upgrades will be quicker because it doesn't involve an LV change.
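A sketch of the "only touch IDs that don't parse as a UUID" behaviour described above. It assumes new-style storage IDs are the prefix {{DS-}} followed by a UUID, and the class, methods and sample legacy ID are hypothetical; it is not the code in HDFS-7575.05.patch. The printed message mirrors the {{createStorageID}} log line quoted earlier in the thread. The round-trip comparison is an extra strictness choice, so that non-canonical strings {{UUID.fromString}} might tolerate are also treated as invalid.

```java
import java.util.UUID;

// Hypothetical sketch; not the code in HDFS-7575.05.patch.
class StorageIdValidator {
    // Assumption: new-style storage IDs are "DS-" followed by a UUID.
    static final String PREFIX = "DS-";

    static boolean isValidStorageId(String id) {
        if (id == null || !id.startsWith(PREFIX)) {
            return false;
        }
        String rest = id.substring(PREFIX.length());
        try {
            // Round-trip check: reject strings UUID.fromString parses but
            // that are not in canonical 8-4-4-4-12 hex form.
            return UUID.fromString(rest).toString().equalsIgnoreCase(rest);
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a UUID at all
        }
    }

    // Keeps a valid ID as-is; replaces a missing or invalid one and logs the old value.
    static String ensureValid(String oldId) {
        if (isValidStorageId(oldId)) {
            return oldId;
        }
        String newId = PREFIX + UUID.randomUUID();
        System.out.println("Generated new storageID " + newId
            + (oldId == null ? "" : " to replace " + oldId));
        return newId;
    }

    public static void main(String[] args) {
        String current = PREFIX + UUID.randomUUID();
        // Hypothetical legacy-style ID embedding an IP address and port.
        String legacy = "DS-1234-10.0.0.1-50010-1400000000000";

        System.out.println(ensureValid(current).equals(current)); // true
        String replaced = ensureValid(legacy); // logs the replacement
        System.out.println(isValidStorageId(replaced)); // true
    }
}
```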

So on reconsideration, I am +1 for patch 005 (the latest version).


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Colin Patrick McCabe (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286169#comment-14286169 ]

Colin Patrick McCabe commented on HDFS-7575:

This patch does change the layout format. It changes it from one where storage
ID may or may not be unique to one where it definitely is.

Can you respond to the practical points I made above? I made a few points
that nobody has responded to yet.
* Changing the storage ID during startup basically changes the storage ID from
being a permanent identifier to a temporary one, which makes persisting it later
impossible. It commits us to an architecture where block locations can't be
persisted.
* With approach #1, we have to carry the burden of the dedupe code forever.
* Approach #1 degrades error handling. If you somehow end up with two volumes
that map to the same directory, the code silently does the wrong thing.

I would appreciate a response to these. Thanks.


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286051#comment-14286051 ]

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

bq. ... we have bumped the layout version in the past even when the old software
could handle the new layout. ...

HDFS-6482 does change the layout format, so bumping the layout version made
sense there. However, the patch here does not change the layout format. Do you disagree?


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286230#comment-14286230 ]

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

bq. Can you respond to the practical points I made above?

If there is no layout format change, the practical points seem irrelevant.
Anyway, let me comment on them.

bq. Changing the storage ID during startup basically changes storage ID from
being a permanent identifier to a temporary one...

We only change a storage ID when it is invalid; we do not change storage IDs
arbitrarily. Valid storage IDs are permanent.

bq. With approach #1, we have to carry the burden of the dedupe code forever.

The code is for validating storage IDs (not for de-duplication) and is very
simple. It is good to keep.

bq. ... If you somehow end up with two volumes that map to the same directory,
the code silently does the wrong thing.

Is this a practical error?  Have you seen it in practice?


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286217#comment-14286217 ]

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

bq. This patch does change the layout format. It changes it from one where
storage ID may or may not be unique to one where it definitely is.

So you claim that the current format is a layout in which some storage IDs
could be the same?
{code}
ADD_DATANODE_AND_STORAGE_UUIDS(-49, "Replace StorageID with DatanodeUuid."
    + " Use distinct StorageUuid per storage directory."),
{code}
LV -49 clearly specifies that the IDs must be distinct.
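The LV -49 requirement can be stated as a simple invariant over a DataNode's storage directories. The hypothetical sketch below (made-up paths and IDs, not an HDFS API) rejects duplicates with an exception instead of merging them, which is one way to surface the duplicated-directory case discussed in this thread rather than handling it silently.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical check of the LV -49 invariant ("distinct StorageUuid per
// storage directory"); not an HDFS API.
class DistinctStorageIdCheck {
    // Throws if two directories report the same storage ID; otherwise
    // returns a map from storage ID to the directory that owns it.
    static Map<String, String> checkDistinct(Map<String, String> idByDirectory) {
        Map<String, String> dirById = new HashMap<>();
        for (Map.Entry<String, String> e : idByDirectory.entrySet()) {
            String previous = dirById.putIfAbsent(e.getValue(), e.getKey());
            if (previous != null) {
                throw new IllegalStateException("Storage ID " + e.getValue()
                    + " is used by both " + previous + " and " + e.getKey());
            }
        }
        return dirById;
    }

    public static void main(String[] args) {
        Map<String, String> healthy = new LinkedHashMap<>();
        healthy.put("/data/1", "DS-a");
        healthy.put("/data/2", "DS-b");
        System.out.println(checkDistinct(healthy).size()); // 2

        // Both directories still carry the same pre-upgrade ID.
        Map<String, String> upgraded = new LinkedHashMap<>();
        upgraded.put("/data/1", "DS-legacy");
        upgraded.put("/data/2", "DS-legacy");
        try {
            checkDistinct(upgraded);
        } catch (IllegalStateException e) {
            // Storage ID DS-legacy is used by both /data/1 and /data/2
            System.out.println(e.getMessage());
        }
    }
}
```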

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, 
 HDFS-7575.04.patch, HDFS-7575.05.binary.patch, HDFS-7575.05.patch, 
 testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz


 Before HDFS-2832 each DataNode would have a unique storageId which included 
 its IP address. Since HDFS-2832 the DataNodes have a unique storageId per 
 storage directory which is just a random UUID.
 They send reports per storage directory in their heartbeats. This heartbeat 
 is processed on the NameNode in the 
 {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would 
 just store the information per Datanode. After the patch though each DataNode 
 can have multiple different storages so it's stored in a map keyed by the 
 storage Id.
 This works fine for all clusters that have been installed post HDFS-2832 as 
 they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 
 different keys. On each Heartbeat the Map is searched and updated 
 ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
 {code:title=DatanodeStorageInfo}
   void updateState(StorageReport r) {
 capacity = r.getCapacity();
 dfsUsed = r.getDfsUsed();
 remaining = r.getRemaining();
 blockPoolUsed = r.getBlockPoolUsed();
   }
 {code}
 On clusters that were upgraded from a pre HDFS-2832 version though the 
 storage Id has not been rewritten (at least not on the four clusters I 
 checked) so each directory will have the exact same storageId. That means 
 there'll be only a single entry in the {{storageMap}} and it'll be 
 overwritten by a random {{StorageReport}} from the DataNode. This can be seen 
 in the {{updateState}} method above. This just assigns the capacity from the 
 received report, instead it should probably sum it up per received heartbeat.
 The Balancer seems to be one of the few components that actually uses this 
 information, so it now bases its decisions on the utilization of a single 
 arbitrary drive per DataNode.
 Things get even worse when a drive has been added or replaced, as the new 
 drive gets a new storage ID, so there will be two entries in the storageMap. 
 As new drives are usually empty, this skews the Balancer's decision so that 
 the node will never be considered over-utilized.
 Another problem is that old StorageReports are never removed from the 
 storageMap. So if I replace a drive and it gets a new storage ID, the old 
 entry stays in place and is used for all calculations by the Balancer until 
 the NameNode is restarted.
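A similar toy model illustrates the stale-entry problem. As described above, it assumes that entries are only ever added or updated and never removed. {{StaleEntryDemo}} and its simplified utilization formula are hypothetical and are not the Balancer's actual code:

```java
import java.util.HashMap;
import java.util.Map;

final class StaleEntryDemo {
  /** Hypothetical per-storage usage numbers (not an HDFS class). */
  static final class Usage {
    final long capacity;
    final long dfsUsed;

    Usage(long capacity, long dfsUsed) {
      this.capacity = capacity;
      this.dfsUsed = dfsUsed;
    }
  }

  /** Merges a heartbeat into the map; keys missing from the heartbeat are never removed. */
  static void applyHeartbeat(Map<String, Usage> storageMap, Map<String, Usage> heartbeat) {
    storageMap.putAll(heartbeat);
  }

  /** Simplified node utilization: 100 * sum(dfsUsed) / sum(capacity). */
  static double utilizationPercent(Map<String, Usage> storageMap) {
    long used = 0;
    long capacity = 0;
    for (Usage u : storageMap.values()) {
      used += u.dfsUsed;
      capacity += u.capacity;
    }
    return capacity == 0 ? 0.0 : 100.0 * used / capacity;
  }

  public static void main(String[] args) {
    Map<String, Usage> storageMap = new HashMap<>();

    Map<String, Usage> before = new HashMap<>();
    before.put("DS-a", new Usage(100, 80));
    before.put("DS-b", new Usage(100, 80));
    applyHeartbeat(storageMap, before);

    // Drive b is replaced by an empty drive that reports under a new ID, DS-c.
    Map<String, Usage> after = new HashMap<>();
    after.put("DS-a", new Usage(100, 80));
    after.put("DS-c", new Usage(100, 0));
    applyHeartbeat(storageMap, after);

    // The stale DS-b entry is still counted: 160 / 300 instead of 80 / 200.
    System.out.println(storageMap.size() + " entries, map says "
        + utilizationPercent(storageMap) + "%, latest heartbeat says "
        + utilizationPercent(after) + "%");
  }
}
```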
 I can try providing a patch that does the following:
 * Instead of using a map, store the array we receive, or instead of storing 
 an array, sum up the values for reports with the same ID
 * On each heartbeat, clear the map (so we know we have up-to-date information)
 Does that sound sensible?
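Combined, the two proposed changes could look roughly like this: rebuild the per-DataNode view from each heartbeat and sum the reports that share an ID. This is only a sketch of the proposal under those assumptions. The classes are hypothetical, not HDFS APIs, and the later discussion moved toward fixing the duplicate storage IDs on the DataNode instead:

```java
import java.util.HashMap;
import java.util.Map;

final class RebuildOnHeartbeat {
  /** Hypothetical per-volume report (not the HDFS StorageReport class). */
  static final class Report {
    final String storageId;
    final long capacity;
    final long dfsUsed;
    final long remaining;

    Report(String storageId, long capacity, long dfsUsed, long remaining) {
      this.storageId = storageId;
      this.capacity = capacity;
      this.dfsUsed = dfsUsed;
      this.remaining = remaining;
    }
  }

  /** Aggregated numbers for one storage ID within a single heartbeat. */
  static final class Totals {
    long capacity;
    long dfsUsed;
    long remaining;
  }

  /**
   * Builds a fresh view from one heartbeat: reports sharing a storage ID are summed,
   * and IDs that were not reported in this heartbeat simply do not appear.
   */
  static Map<String, Totals> rebuild(Report[] reports) {
    Map<String, Totals> fresh = new HashMap<>();
    for (Report r : reports) {
      Totals t = fresh.computeIfAbsent(r.storageId, id -> new Totals());
      t.capacity += r.capacity;
      t.dfsUsed += r.dfsUsed;
      t.remaining += r.remaining;
    }
    return fresh;
  }

  public static void main(String[] args) {
    Report[] heartbeat = {
      new Report("legacy-id", 100, 90, 10),
      new Report("legacy-id", 100, 10, 90),
      new Report("DS-new", 100, 0, 100),
    };
    // The caller would replace its previous map with this result on every heartbeat.
    Map<String, Totals> view = rebuild(heartbeat);
    Totals legacy = view.get("legacy-id");
    System.out.println("legacy-id: capacity=" + legacy.capacity + " dfsUsed=" + legacy.dfsUsed);
  }
}
```

The caller replaces its map with the result of {{rebuild}} on every heartbeat, so the IDs of removed drives disappear without a NameNode restart.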



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286583#comment-14286583
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

For the so-called practical points you made (say, "Again, if I accidentally 
duplicate a directory on a datanode, ..."), how could updating the layout 
version help?

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286527#comment-14286527
]

Colin Patrick McCabe commented on HDFS-7575:

bq. So, you claim that the current format is a layout, where some storage IDs
could be the same?... It is clearly specified in the LV -49 that the IDs must
be distinct.

What's important is what was implemented, not what was written in the comment
about the layout version. And what was implemented does allow duplicate
storage IDs.

bq. Is \[two volumes that map to the same directory\] a practical error? Have
you seen it in practice?

Yes. Recently we had a cluster where two datanodes were accidentally connected
to the same shared storage. I guess you could argue that lock files should
prevent problems here. However, I do not like the idea of datanodes modifying
VERSION on startup at all. If one of the DNs had terminated before the other
one tried to lock the directory, the lock would have succeeded. And with the
"retry failed volume" stuff, we probably have a wide window for this to happen.

bq. We only change a storage ID when it is invalid but not changing the storage
ID arbitrarily. Valid storage IDs are permanent.

Again, if I accidentally duplicate a directory on a datanode, then the storage
ID morphs for one of the directories. That doesn't sound permanent to me.

bq. The code is for validating storage IDs (but not for de-duplication) and is
very simple. It is good to keep.

I agree that it is good to validate that the storage IDs are unique. But this
is the same as validating that the cluster ID or the layout version is
correct. We don't change incorrect values to fix them. If they're wrong, then
we need to find out why, not sweep the problem under the rug.

Are there any practical arguments in favor of not doing a layout version
change? The main argument I see in favor of not changing the layout here is
basically that this isn't a big enough change to merit a new LV. But that
seems irrelevant to me; the question is which approach is better for error
handling and more maintainable.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286574#comment-14286574
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

 What's important is what was implemented, not what was written in the comment 
 about the layout version. And what was implemented does allow duplicate 
 storage IDs.

I disagree.  The implementation is a bug: it is supposed to change the old IDs 
(in the old ID format) to the new UUID format.  The entire heterogeneous 
storage design requires storage IDs to be unique.  Which implementation works 
correctly with duplicate storage IDs?  


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-21 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286637#comment-14286637
]

Colin Patrick McCabe commented on HDFS-7575:

bq. I disagree. The implementation is a bug – it supposes to change the old ids
(in old id format) to use the new uuid format. The entire heterogeneous storage
design requires storage ID to be unique. Which implementation works correctly
with the duplicate storage IDs?

I don't think it's productive to argue about whether this represents a true
layout version change, i.e. whether it is "layout version changey" enough.
Clearly we both agree that doing an LV change here would work and solve the
problem. At the end of the day, we have to make the decision based on which
way is more maintainable.

This does bring up a practical point, though. It will be easier to backport
the "silently modify the VERSION file" patch to 2.6.1 than the LV change. In
view of this, I think it's fine to backport the "silently change VERSION" fix
to 2.6.1. I just don't want to have to support it forever in 3.0 and onward.

bq. For the so called practical points you made (say, Again, if I accidentally
duplicate a directory on a datanode, ...) , how could updating layout version
help?

If we check for directories with duplicate storage IDs and exclude them, then
the system administrator becomes aware that there is a problem. It helps by
not harming: by not changing the VERSION file when we don't know for sure
why the VERSION file is wrong.
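The check-and-exclude behavior described here can be sketched as grouping volume directories by the storage ID read from their VERSION files and refusing to use any ID that more than one directory claims. The method names and the plain directory-to-ID map are assumptions for illustration, and reading the VERSION files is left out:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DuplicateStorageIdCheck {
  /** Maps each storage ID claimed by more than one directory to those directories. */
  static Map<String, List<String>> findDuplicates(Map<String, String> storageIdByDir) {
    Map<String, List<String>> dirsById = new TreeMap<>();
    for (Map.Entry<String, String> e : storageIdByDir.entrySet()) {
      dirsById.computeIfAbsent(e.getValue(), id -> new ArrayList<>()).add(e.getKey());
    }
    // Keep only IDs that appear in two or more directories.
    dirsById.values().removeIf(dirs -> dirs.size() < 2);
    return dirsById;
  }

  /** Returns the directories to keep; duplicates are reported and excluded, never rewritten. */
  static List<String> usableDirs(Map<String, String> storageIdByDir) {
    Map<String, List<String>> duplicates = findDuplicates(storageIdByDir);
    List<String> usable = new ArrayList<>();
    for (Map.Entry<String, String> e : storageIdByDir.entrySet()) {
      List<String> sharing = duplicates.get(e.getValue());
      if (sharing == null) {
        usable.add(e.getKey());
      } else {
        System.err.println("Excluding " + e.getKey() + ": storage ID " + e.getValue()
            + " is shared by " + sharing);
      }
    }
    return usable;
  }

  public static void main(String[] args) {
    // Hypothetical volumes; /data/2 is an accidental copy of /data/1.
    Map<String, String> storageIdByDir = new LinkedHashMap<>();
    storageIdByDir.put("/data/1", "DS-x");
    storageIdByDir.put("/data/2", "DS-x");
    storageIdByDir.put("/data/3", "DS-y");
    System.out.println("Usable volumes: " + usableDirs(storageIdByDir));
  }
}
```

Unlike silently regenerating an ID, this leaves both VERSION files untouched and puts the decision with the administrator.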

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-20 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284681#comment-14284681
]

Colin Patrick McCabe commented on HDFS-7575:

So there are two approaches here:
1. silently (i.e., without user intervention), dedupe duplicate storage IDs
when starting up the DataNode
2. create a new DataNode layout version and dedupe duplicate storage IDs during
the upgrade.

Arguments in favor of approach #1:
* Collisions might happen that we would need to dedupe repeatedly. This
argument seems specious, since the probability is effectively less than the
chance of cosmic rays causing errors (as Nicholas pointed out). I think the
probabilities outlined here make this argument a non-starter:
https://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates.
Also, approach #1 only dedupes on a single datanode, but there can be many
datanodes in the cluster.

* As Suresh pointed out, the old software can easily handle cases where the
Storage IDs are unique. So using a new layout version is not required to flip
back and forth between old and new software. While this is true, we have
bumped the layout version in the past even when the old software could handle
the new layout. For example, HDFS-6482 added a new DN layout version even
though the old software could use the new blockid-based layout. So this
argument is basically just saying approach #1 is viable. But it doesn't tell
us whether approach #1 is a good idea.

* Nobody has made this argument yet, but you could argue that the upgrade
process will be faster with approach #1 than approach #2. However, we've done
datanode layout version upgrades on production clusters in the past and time
hasn't been an issue. The JNI hardlink code (and soon, the Java7 hardlink
code) eliminated the long delays that resulted from spawning shell commands.
So I don't think this argument is persuasive.
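For the collision argument in the first point, the usual birthday approximation gives the odds directly. This is a sketch that assumes storage IDs are random (version 4) UUIDs with 122 random bits. The ID counts in {{main}} are illustrative, not figures from this thread:

```java
final class UuidCollision {
  // A random (version 4) UUID has 122 random bits; the other 6 are fixed version/variant bits.
  static final double SPACE = Math.pow(2, 122);

  /**
   * Birthday-bound approximation of the probability that at least two of n random UUIDs
   * collide: p ~= 1 - exp(-n(n-1) / (2 * 2^122)). expm1 keeps precision for tiny values.
   */
  static double collisionProbability(long n) {
    double pairs = (double) n * (n - 1) / 2.0;
    return -Math.expm1(-pairs / SPACE);
  }

  public static void main(String[] args) {
    for (long n : new long[] {1_000L, 1_000_000L, 1_000_000_000L}) {
      System.out.printf("n = %,d storage IDs -> p ~= %.3e%n", n, collisionProbability(n));
    }
  }
}
```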

Arguments in favor of approach #2:
* Changing the storage ID during startup basically changes storage ID from
being a permanent identifier to a temporary one. This seems like a small
change, but I would argue that it's really a big one, architecturally. For
example, suppose we wanted to persist this information at some point. We
couldn't really do that if it's changing all the time.

* With approach #1, we have to carry the burden of the dedupe code forever. We
can't ever stop deduping, even in Hadoop 3.0, because for all we know, the user
has just upgraded, and was previously running 2.6 (a version with the bug) that
we will have to correct. The extra run time isn't an issue, but the complexity
is. What if our write to VERSION fails on one of the volume directories? What
do we do then? And then if volume failures are tolerated, this directory could
later come back and be an issue. The purpose of layout versions is so that we
don't have to think about these kind of mix and match issues.

* Approach #1 leaves us open to some weird scenarios. For example, what if I
have /storage1 -> /foo and /storage2 -> /foo? In other words, you have what
appears to be two volume root directories, but it's really the same directory.
Approach #2 will complain, but approach #1 will happily rename the storageID of
the /foo directory and continue with the corrupt configuration. This is what
happens when you fudge error checking.

So in conclusion I would argue for approach #2. Thoughts?
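The /storage1 -> /foo scenario can be detected before any storage ID is examined, by resolving each configured volume path to its real path. This is a hypothetical helper, not existing DataNode code. It assumes that the volumes exist and that the platform supports symbolic links:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AliasedVolumeCheck {
  /**
   * Returns configured volume paths that resolve (after following symlinks) to a real
   * directory already claimed by an earlier entry in the list.
   */
  static List<Path> findAliases(List<Path> configured) throws IOException {
    Map<Path, Path> firstByRealPath = new HashMap<>();
    List<Path> aliases = new ArrayList<>();
    for (Path p : configured) {
      Path real = p.toRealPath(); // follows symlinks; throws if the path does not exist
      if (firstByRealPath.putIfAbsent(real, p) != null) {
        aliases.add(p);
      }
    }
    return aliases;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("volumes");
    Path foo = Files.createDirectory(base.resolve("foo"));
    // storage1 -> foo and storage2 -> foo, as in the scenario above.
    Path storage1 = Files.createSymbolicLink(base.resolve("storage1"), foo);
    Path storage2 = Files.createSymbolicLink(base.resolve("storage2"), foo);
    System.out.println("Aliased volumes: " + findAliases(Arrays.asList(storage1, storage2)));
  }
}
```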

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-20 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285048#comment-14285048
 ] 

Colin Patrick McCabe commented on HDFS-7575:


bq. Layout version defines layout format but not the software (don't confuse it 
with the software version). The question here is whether there is a layout 
format change here. Are we changing from a layout, where some storage IDs could 
be the same, to a new layout, where all storage IDs have to be distinct? I 
think the answer is no since the same storage ID does not work even using the 
old software.

Nicholas, I already addressed that in my comment.  I wrote "using a new layout 
version is not required to flip back and forth between old and new software. 
While this is true, we have bumped the layout version in the past even when the 
old software could handle the new layout."  Do you have any thoughts about the 
other points I mentioned?

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-20 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284937#comment-14284937
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

The layout version defines the layout format, not the software (don't confuse 
it with the software version).  The question here is whether there is a layout 
format change.  Are we changing from a layout where some storage IDs could be 
the same to a new layout where all storage IDs have to be distinct?  
I think the answer is no, since duplicate storage IDs do not work even with the 
old software.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-20 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285135#comment-14285135
 ] 

Colin Patrick McCabe commented on HDFS-7575:


Just to be clear, I'd like to see some discussion of the points above before we 
commit this.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-20 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14284973#comment-14284973
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

We should log the old (invalid) storage id.

+1 on HDFS-7575.05.patch other than that.
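For reference, the validate-and-regenerate behavior under review, with the logging requested here, might look like the following. This is a sketch under stated assumptions, not the patch itself. It assumes that a valid new-style ID is {{DS-}} followed by a UUID, and the class name, method names and sample legacy ID are hypothetical:

```java
import java.util.UUID;
import java.util.logging.Logger;

final class StorageIdValidation {
  private static final Logger LOG = Logger.getLogger(StorageIdValidation.class.getName());

  // Assumed prefix for new-style storage IDs ("DS-" followed by a random UUID).
  static final String PREFIX = "DS-";

  /** True if the ID is the assumed prefix followed by a parseable UUID. */
  static boolean isValidStorageId(String id) {
    if (id == null || !id.startsWith(PREFIX)) {
      return false;
    }
    try {
      UUID.fromString(id.substring(PREFIX.length()));
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  /** Keeps a valid ID; otherwise generates a new one and logs the old (invalid) value. */
  static String validateOrRegenerate(String volume, String id) {
    if (isValidStorageId(id)) {
      return id;
    }
    String fresh = PREFIX + UUID.randomUUID();
    LOG.info("Replacing invalid storage ID " + id + " with " + fresh + " for volume " + volume);
    return fresh;
  }

  public static void main(String[] args) {
    // Illustrative legacy-style ID containing a random number, IP, port and timestamp.
    String legacy = "DS-1234567890-10.0.0.1-50010-1400000000000";
    System.out.println(validateOrRegenerate("/data/1", legacy));
  }
}
```

Logging the old value preserves a record of what was on disk, which addresses the concern that silently rewriting VERSION hides the cause of a bad ID.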

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, 
 HDFS-7575.04.patch, HDFS-7575.05.binary.patch, HDFS-7575.05.patch, 
 testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz


 Before HDFS-2832 each DataNode would have a unique storageId which included 
 its IP address. Since HDFS-2832 the DataNodes have a unique storageId per 
 storage directory which is just a random UUID.
 They send reports per storage directory in their heartbeats. This heartbeat 
 is processed on the NameNode in the 
 {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would 
 just store the information per Datanode. After the patch though each DataNode 
 can have multiple different storages so it's stored in a map keyed by the 
 storage Id.
 This works fine for all clusters that have been installed post HDFS-2832 as 
 they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 
 different keys. On each Heartbeat the Map is searched and updated 
 ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
 {code:title=DatanodeStorageInfo}
   void updateState(StorageReport r) {
 capacity = r.getCapacity();
 dfsUsed = r.getDfsUsed();
 remaining = r.getRemaining();
 blockPoolUsed = r.getBlockPoolUsed();
   }
 {code}
 On clusters that were upgraded from a pre HDFS-2832 version though the 
 storage Id has not been rewritten (at least not on the four clusters I 
 checked) so each directory will have the exact same storageId. That means 
 there'll be only a single entry in the {{storageMap}} and it'll be 
 overwritten by a random {{StorageReport}} from the DataNode. This can be seen 
 in the {{updateState}} method above. This just assigns the capacity from the 
 received report, instead it should probably sum it up per received heartbeat.
 The Balancer seems to be one of the only things that actually uses this 
 information so it now considers the utilization of a random drive per 
 DataNode for balancing purposes.
 Things get even worse when a drive has been added or replaced as this will 
 now get a new storage Id so there'll be two entries in the storageMap. As new 
 drives are usually empty it skewes the balancers decision in a way that this 
 node will never be considered over-utilized.
 Another problem is that old StorageReports are never removed from the 
 storageMap. So if I replace a drive and it gets a new storage ID, the old 
 entry will still be in place and used for all calculations by the Balancer 
 until the NameNode is restarted.
 I can try providing a patch that does the following:
 * Instead of using a Map, just store the array we receive, or, instead of 
 storing an array, sum up the values for reports with the same ID
 * On each heartbeat, clear the map (so we know we have up-to-date information)
 Does that sound sensible?
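One possible reading of the proposal above is sketched below with hypothetical names. It is not necessarily what the attached patches do. The idea is to rebuild the per-storage map from each heartbeat and sum the reports that share a storage ID. Because the map is rebuilt every time, IDs that no longer appear in a heartbeat (such as a replaced drive) drop out automatically.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatAggregationSketch {
    // Hypothetical stand-in for a StorageReport.
    record Report(String storageId, long capacity, long dfsUsed, long remaining) {}

    // Totals for one storage ID within a single heartbeat.
    static final class Totals {
        long capacity;
        long dfsUsed;
        long remaining;

        void add(Report r) {
            capacity += r.capacity();
            dfsUsed += r.dfsUsed();
            remaining += r.remaining();
        }
    }

    // Builds a fresh map from one heartbeat: reports sharing an ID are summed,
    // and IDs missing from this heartbeat (e.g. a replaced drive) are not carried over.
    static Map<String, Totals> rebuild(List<Report> reports) {
        Map<String, Totals> fresh = new HashMap<>();
        for (Report r : reports) {
            fresh.computeIfAbsent(r.storageId(), id -> new Totals()).add(r);
        }
        return fresh;
    }

    public static void main(String[] args) {
        // Two upgraded directories share a legacy ID; one new drive has its own ID.
        List<Report> heartbeat = List.of(
                new Report("legacy-id", 4000, 3600, 400),
                new Report("legacy-id", 4000, 3600, 400),
                new Report("new-id", 4000, 0, 4000));
        Map<String, Totals> storageMap = rebuild(heartbeat);
        Totals legacy = storageMap.get("legacy-id");
        System.out.println(storageMap.size() + " IDs; legacy-id capacity=" + legacy.capacity
                + ", dfsUsed=" + legacy.dfsUsed);
    }
}
```

This fixes the per-node totals, but upgraded directories that share an ID still cannot be told apart individually.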



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-16 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280522#comment-14280522
 ] 

Arpit Agarwal commented on HDFS-7575:
-

It would be good to get this bug fixed instead of letting it fall off the radar 
on the layout change technicality. IMO either approach is better than leaving 
the bug unfixed. Please vote +1 or -1 on each approach so we have more 
clarity.

Thanks, Arpit.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279309#comment-14279309
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692337/HDFS-7575.05.patch
  against trunk revision ce29074.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
  org.apache.hadoop.hdfs.TestDatanodeStartupFixesLegacyStorageIDs

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9223//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9223//console

This message is automatically generated.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Daryn Sharp (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277895#comment-14277895
]

Daryn Sharp commented on HDFS-7575:
---

bq. I think it's frustrating for storage IDs to change without warning just
because HDFS was restarted. It will make diagnosing problems by reading log
files harder because storageIDs might morph at any time. It also sets a bad
precedent of not allowing downgrade and modifying VERSION files on the fly
during startup.

I'm confused. StorageIDs aren't going to repeatedly morph - unless there's a
UUID collision that you argue can't happen. The important part is you always
want unique storage ids. It's an internal default of hdfs that is not up to
the user to assign. Succinctly stated, what I'd like is for storage ids to be
generated if missing, re-generated if incorrectly formatted, or if there are
dups. I think the latest patch actually does the first two, just not the dup
check.
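The startup check described here can be sketched as follows, using hypothetical names. The sketch assigns one ID per storage directory: it generates one if missing, regenerates it if malformed, and regenerates it if an earlier directory already claimed it. It assumes new-style IDs are {{DS-}} followed by a UUID. The pre-HDFS-2832 style ID in {{main}} is made up for illustration, and the exact legacy format is an assumption. Note that {{UUID.fromString}} is fairly lenient, so this is only a loose format check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class StorageIdSanityCheck {
    // Assumed new-style format: "DS-" followed by a random UUID.
    static final String PREFIX = "DS-";

    // True if the ID is present and parses as PREFIX + UUID (a loose check).
    static boolean isWellFormed(String id) {
        if (id == null || !id.startsWith(PREFIX)) {
            return false;
        }
        try {
            UUID.fromString(id.substring(PREFIX.length()));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static String newId() {
        return PREFIX + UUID.randomUUID();
    }

    // Returns one ID per storage directory, in order. IDs that are missing,
    // malformed, or already used by an earlier directory are replaced.
    static List<String> sanitize(List<String> ids) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>(ids.size());
        for (String id : ids) {
            String fixed = id;
            while (!isWellFormed(fixed) || seen.contains(fixed)) {
                fixed = newId();
            }
            seen.add(fixed);
            result.add(fixed);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative legacy-style ID shared by two directories, plus a missing ID.
        String legacy = "DS-1234567-10.0.0.1-50010-1390000000000";
        List<String> before = Arrays.asList(legacy, legacy, null);
        System.out.println(sanitize(before));
    }
}
```

When a well-formed ID appears twice, the first directory keeps it and later ones get fresh IDs. This only detects duplicates within a single DataNode. Catching a disk moved in from another node would need a check with a wider view, such as on the NameNode.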

bq. I'm surprised to hear you say that rollback should not be an option. It
seems like the conservative thing to do here is to allow the user to restore to
the VERSION file. Obviously we believe there will be no problems. But we always
believe that, or else we wouldn't have made the change. Sometimes there are
problems.

I didn't say that. Rollback is for reverting an incompatible change. Changing
the storage id is not incompatible. Unique ids are the default for newly
formatted nodes. If you think unique storage ids may have subtle bugs
(different than shared storage ids), then new clusters or newly formatted nodes
are buggy.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277794#comment-14277794
 ] 

Arpit Agarwal commented on HDFS-7575:
-

I prefer a layout version bump per my original patch, if for no other reason 
than that the DataNode upgrade path is complicated enough without having to 
think about out-of-band (OOB) metadata changes. In this case, though, the 
metadata change is limited, so I'd be okay with making an exception.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277795#comment-14277795
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7575:
---

{quote}
BTW, UUID.randomUUID isn't guaranteed to return a unique id. It's highly 
 improbable, but possible, although more likely due to older storages, user 
 copying a storage, etc. Although the storage ids are unique after the 
 upgrade, if a disk is moved from one node to another, then a collision is 
 possible. Hence another reason why I feel explicitly checking for collisions 
 at startup should always be done.

UUIDs are designed to be globally unique with a high probability when generated 
by trusted processes. Even when the volume of generated UUIDs is very high, 
which is certainly not the case for storage IDs. The probability of a storageID 
collision in normal operation is vanishingly small.
https://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
{quote}
We usually compare the probability of a collision with the probability of 
hardware failure, or use the famous cosmic ray argument 
(http://stackoverflow.com/questions/2580933/cosmic-rays-what-is-the-probability-they-will-affect-a-program),
since we can never do better than that.

{quote}
... Up until HDFS-4645, HDFS used randomly generated block IDs drawn from a far 
smaller space-- 2^64 – and we never had a problem. ...
{quote}
We did have collision check for random block IDs.
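Below is a generic generate-and-check loop of the kind referred to here. It uses hypothetical names and is not HDFS's actual block ID allocation code. It draws a random ID and retries if that ID is already in use. The random source is injected so the retry path can be exercised deterministically.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.LongSupplier;

public class CheckedRandomIdGenerator {
    private final Set<Long> inUse = new HashSet<>();
    private final LongSupplier source;

    // The random source is injected so tests can force collisions.
    CheckedRandomIdGenerator(LongSupplier source) {
        this.source = source;
    }

    // Draws candidates until one is not already in use, then reserves it.
    // Set.add returns false when the value is already present, which triggers a retry.
    long next() {
        long candidate;
        do {
            candidate = source.getAsLong();
        } while (!inUse.add(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        CheckedRandomIdGenerator gen = new CheckedRandomIdGenerator(new Random()::nextLong);
        System.out.println(gen.next() + " " + gen.next());
    }
}
```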

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277749#comment-14277749
]

Colin Patrick McCabe commented on HDFS-7575:

bq. Suresh wrote: I agree with Daryn Sharp that there is no need to change the
layout here. Layout change is only necessary if the two layouts are not
compatible and the downgrade does not work from newer release to older. Is that
the case here?

The new layout used in HDFS-6482 is backwards compatible, in the sense that
older versions of hadoop can run with it. HDFS-6482 just added the invariant
that block ID uniquely determines which subdir a block is in, but subdirs
already existed. Does that mean we shouldn't have changed the layout version
for HDFS-6482? I think the answer is clear.
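The invariant can be illustrated with a sketch that maps a block ID to a subdirectory. It assumes the HDFS-6482 scheme uses bits 16-23 and bits 8-15 of the block ID to pick two levels of {{subdir}} directories. Treat the bit masks, the directory naming, and the helper name {{blockDirFor}} as assumptions, not the exact HDFS code.

```java
public class BlockDirSketch {
    // Hypothetical helper: derives a two-level subdirectory from the block ID alone.
    // Assumes bits 16-23 pick the first level and bits 8-15 the second (256 x 256).
    static String blockDirFor(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return "subdir" + d1 + "/subdir" + d2;
    }

    public static void main(String[] args) {
        // The same ID always maps to the same directory, so no lookup table is needed.
        long blockId = 0x123456L;
        System.out.println(blockId + " -> " + blockDirFor(blockId));
    }
}
```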

bq. Daryn wrote: Since we know duplicate storage ids are bad, I think the
correct logic is to always sanity check the storage ids at startup. If there
are collisions, then the storage should be updated. Rollback should not restore
a bug by reverting the storage id to a dup.

I'm surprised to hear you say that rollback should not be an option. It seems
like the conservative thing to do here is to allow the user to restore to the
VERSION file. Obviously we believe there will be no problems. But we always
believe that, or else we wouldn't have made the change. Sometimes there are
problems.

bq. BTW, UUID.randomUUID isn't guaranteed to return a unique id. It's highly
improbable, but possible, although more likely due to older storages, user
copying a storage, etc.

This is really not a good argument. Collisions in 128-bit space are extremely
unlikely. You will never see one in your lifetime. Up until HDFS-4645, HDFS
used randomly generated block IDs drawn from a far smaller space -- 2^64 -- and
we never had a problem. Phrases like "billions and billions" and "the total number
of grains of sand in the world" don't begin to approach the size of 2^128.
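For a rough sense of scale, the standard birthday approximation estimates the probability of at least one collision among n IDs drawn independently from a space of N values as p ≈ 1 - exp(-n(n-1)/(2N)). {{UUID.randomUUID}} produces version 4 UUIDs, which carry 122 random bits (6 of the 128 are fixed), so the sketch uses N = 2^122. The values of n are arbitrary, and the class name is hypothetical.

```java
public class BirthdayBound {
    // Approximate probability of at least one collision among n IDs drawn uniformly
    // at random from 2^bits values: p ~ 1 - exp(-n(n-1) / 2^(bits+1)).
    static double collisionProbability(double n, int bits) {
        double exponent = n * (n - 1) / Math.pow(2, bits + 1);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        int uuidRandomBits = 122; // version 4 UUIDs fix 6 of their 128 bits
        for (double n : new double[] {1e6, 1e9, 1e12}) {
            System.out.printf("122 bits, n=%.0e: p ~ %.3e%n", n, collisionProbability(n, uuidRandomBits));
        }
        // For comparison, a 64-bit space like the old random block IDs.
        System.out.printf("64 bits, n=1e9: p ~ %.3e%n", collisionProbability(1e9, 64));
    }
}
```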

I think it's frustrating for storage IDs to change without warning just because
HDFS was restarted. It will make diagnosing problems by reading log files
harder because storageIDs might morph at any time. It also sets a bad
precedent of not allowing downgrade and modifying VERSION files on the fly
during startup.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278041#comment-14278041
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12692337/HDFS-7575.05.patch
  against trunk revision 7fe0f25.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-nfs:

  org.apache.hadoop.ha.TestZKFailoverControllerStress
  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
  org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-nfs:

org.apache.hadoop.ha.TestZKFailoverControllerStress
org.apache.hadoop.hdfs.server.mover.TestStorageMover

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9213//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9213//console

This message is automatically generated.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277222#comment-14277222
 ] 

Daryn Sharp commented on HDFS-7575:
---

I'm not an expert in this area, but I still question bumping the layout 
version.  The layout isn't changing, just an existing value in the VERSION file.

Since we know duplicate storage ids are bad, I think the correct logic is to 
always sanity check the storage ids at startup.  If there are collisions, then 
the storage should be updated.  Rollback should not restore a bug by reverting 
the storage id to a dup.

BTW, {{UUID.randomUUID}} isn't guaranteed to return a unique id.  It's _highly_ 
improbable, but possible, although more likely due to older storages, user 
copying a storage, etc.  Although the storage ids are unique after the 
upgrade, if a disk is moved from one node to another, then a collision is 
possible.  Hence another reason why I feel explicitly checking for collisions 
at startup should always be done.

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-14 Thread Suresh Srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277294#comment-14277294
 ] 

Suresh Srinivas commented on HDFS-7575:
---

I agree with [~daryn] that there is no need to change the layout here. Layout 
change is only necessary if the two layouts are not compatible and the 
downgrade does not work from newer release to older. Is that the case here?

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, 
 HDFS-7575.04.patch, testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-13 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276114#comment-14276114
 ] 

Arpit Agarwal commented on HDFS-7575:
-

Any comments on the v04 patch? Be good to get this change in. Thanks.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, 
 HDFS-7575.04.patch, testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-12 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274124#comment-14274124
 ] 

Colin Patrick McCabe commented on HDFS-7575:


[~daryn]: I think a new layout version makes sense here.  Basically we are 
going from a layout where the storageID might not have been unique to one 
where it is.  This is a change in the VERSION file.  It's nice to have the same 
guarantees that we usually do (that if the upgrade fails, you can roll back via 
the {{previous}} directory, and so forth).  We could probably be more clever 
here and optimize this so we didn't have to hardlink the block files, but the 
upgrade path is already a little too clever and I think this is fine.

Rather than calling the new layout version UPGRADE_GENERATES_STORAGE_IDS, how 
about calling it something like UNIQUE_STORAGE_IDS or 
GUARANTEED_UNIQUE_STORAGE_IDS?  That describes what the new layout is, rather 
than what the process of upgrading is, consistent with our other layout version 
descriptions.

{code}
... = new ClusterVerifier() {
  @Override
  public void verifyClusterPostUpgrade(MiniDFSCluster cluster)
      throws IOException {
    // Verify that a GUID-based storage ID was generated.
    final String bpid = cluster.getNamesystem().getBlockPoolId();
    StorageReport[] reports =
        cluster.getDataNodes().get(0).getFSDataset().getStorageReports(bpid);
    assertThat(reports.length, is(1));
    final String storageID = reports[0].getStorage().getStorageID();
    assertTrue(DatanodeStorage.isValidStorageId(storageID));
  }
{code}
It seems like this exact code appears in 3 different tests.  Should we just 
make this verifier a static object that's created once in the test, or something similar?

+1 once these are addressed.  [~daryn], please take a look if you can... we'd 
really like to fix this one.  Thanks, guys
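The verifier above relies on `DatanodeStorage.isValidStorageId` to tell a GUID-based storage ID from a legacy one. Below is a minimal stand-alone approximation. It assumes that new-style IDs are the prefix `DS-` followed by a random UUID. The prefix, the method names, and the sample legacy ID are assumptions for illustration, not taken from the patch.

```java
import java.util.UUID;

/** Hypothetical approximation of a "GUID-based storage ID" check. */
public class StorageIdCheckSketch {
  // Assumption: new-style storage IDs are "DS-" followed by a random UUID.
  static final String STORAGE_ID_PREFIX = "DS-";

  static String generateStorageId() {
    return STORAGE_ID_PREFIX + UUID.randomUUID();
  }

  static boolean isGuidBasedStorageId(String storageId) {
    if (storageId == null || !storageId.startsWith(STORAGE_ID_PREFIX)) {
      return false;
    }
    try {
      // Valid only if the remainder parses as a UUID.
      UUID.fromString(storageId.substring(STORAGE_ID_PREFIX.length()));
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String fresh = generateStorageId();
    // Made-up ID in the older style that embeds an IP address and port.
    String legacy = "DS-1234567890-10.0.0.1-50010-1400000000000";
    System.out.println(isGuidBasedStorageId(fresh));   // true
    System.out.println(isGuidBasedStorageId(legacy));  // false
  }
}
```

A check like this is what lets an upgrade test confirm that a legacy, shared ID was replaced by a freshly generated one.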

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, 
 testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-12 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274166#comment-14274166
 ] 

Arpit Agarwal commented on HDFS-7575:
-

Thanks for reviewing. The v04 patch addresses the latest feedback from Colin. 

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, 
 HDFS-7575.04.patch, testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274518#comment-14274518
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691741/HDFS-7575.04.patch
  against trunk revision b78b4a1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgradeGeneratesStorageID
  org.apache.hadoop.hdfs.server.balancer.TestBalancer

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9189//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9189//console

This message is automatically generated.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, 
 HDFS-7575.04.patch, testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-12 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273948#comment-14273948
 ] 

Arpit Agarwal commented on HDFS-7575:
-

[~cmccabe], [~daryn],

Are you okay with proceeding with the patch or are there any open questions 
you'd like to see addressed?

Thanks.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, 
 testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-09 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271723#comment-14271723
 ] 

Arpit Agarwal commented on HDFS-7575:
-

Patch that leaves the existing test case unmodified, to reduce the binary diff.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-09 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271746#comment-14271746
 ] 

Daryn Sharp commented on HDFS-7575:
---

This is a general question, I don't have specific instances: Is there any 
lingering data that might also need to be cleaned up or removed after the 
upgrade to storage ids?

Also, will the NN correctly adapt to the new storage ids?  I think it will when 
the DN reregisters and sends full block reports.  Need to be certain this is 
rolling upgrade safe.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-09 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271756#comment-14271756
 ] 

Daryn Sharp commented on HDFS-7575:
---

Also, is the layout version for UPGRADE_GENERATES_STORAGE_IDS necessary?  The 
prior layout versions already work for single/multi storage ids and the new 
layout id doesn't seem to be referenced anywhere.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-09 Thread Arpit Agarwal (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271782#comment-14271782
]

Arpit Agarwal commented on HDFS-7575:
-

bq. This is a general question, I don't have specific instances: Is there any
lingering data that might also need to be cleaned up or removed after the
upgrade to storage ids?
The stale storages need to be cleaned up on the NN. This will be fixed by
HDFS-7596.

bq. Also, will the NN correctly adapt to the new storage ids? I think it will
when the DN reregisters and sends full block reports. Need to be certain this
is rolling upgrade safe.
From my unit testing, the NN does handle the new storage ids and migrates
blocks from the old storage to the new storage id as the block reports come
in.

bq. Also, is the layout version for UPGRADE_GENERATES_STORAGE_IDS necessary?
The prior layout versions already work for single/multi storage ids and the
new layout id doesn't seem to be referenced anywhere.
Good question. For clusters previously upgraded from 2.2, we are technically
changing the content of the VERSION files so a layout version change seemed
warranted. Do you see any downside to doing so?

Thanks for taking a look at the patch.

NameNode not handling heartbeats properly after HDFS-2832
-


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271997#comment-14271997
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12691413/testUpgradeFrom24PreservesStorageId.tgz
  against trunk revision ae91b13.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9172//console

This message is automatically generated.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch, 
 testUpgrade22via24GeneratesStorageIDs.tgz, 
 testUpgradeFrom22GeneratesStorageIDs.tgz, 
 testUpgradeFrom24PreservesStorageId.tgz



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271963#comment-14271963
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12691363/HDFS-7575.03.patch
  against trunk revision ae91b13.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.TestDatanodeLayoutUpgradeGeneratesStorageID

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9166//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9166//console

This message is automatically generated.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, 
 HDFS-7575.03.binary.patch, HDFS-7575.03.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270221#comment-14270221
 ] 

Arpit Agarwal commented on HDFS-7575:
-

The patch size looks large due to a directory structure change to a binary 
image for an existing test case.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-08 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270423#comment-14270423
 ] 

Colin Patrick McCabe commented on HDFS-7575:


Hi Arpit,

Thanks for taking this one on.  Is there any chance that you could do the patch 
without adding or moving binary files?  It seems like the main thing we're 
testing here is just that when we start up, we're going to modify the VERSION 
files as expected.  We shouldn't even need any block files to test that, right? 
 Just a few mkdirs in a unit test.

If we check in the existing code, this 1.7 MB commit becomes part of the repo's 
history forever, which slows down downloads and git pulls.  I also think that 
untarring things during a test is sluggish.

It seems like we never needed these tar files to begin with.  We could just 
have the test open up the txt files and generate a temporary directory based on 
them.  If you want the blocks to have contents, we could just generate them 
with a fixed random seed using java.util.Random, and always get the same 
contents.
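The approach suggested above could look roughly like the following sketch. It builds storage directories with plain {{mkdirs}}-style calls ({{Files.createDirectories}}) and generates block contents from a fixed seed, so every run produces identical bytes. The {{data<N>/current/VERSION}} layout and the single {{storageID}} key are simplifying assumptions; a real DataNode VERSION file carries more fields.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;

public class DeterministicStorageDirs {

  /** Returns block contents that depend only on the seed, so tests are repeatable. */
  static byte[] blockContents(long seed, int size) {
    byte[] data = new byte[size];
    new Random(seed).nextBytes(data);
    return data;
  }

  /**
   * Creates numDirs storage directories under root whose VERSION files all carry
   * the same storage ID, mimicking a cluster upgraded from before HDFS-2832.
   * The layout and file contents are simplified assumptions.
   */
  static void createDirs(Path root, int numDirs, String storageId) throws IOException {
    for (int i = 0; i < numDirs; i++) {
      Path current = root.resolve("data" + i).resolve("current");
      Files.createDirectories(current);
      String version = "storageID=" + storageId + "\n";
      Files.write(current.resolve("VERSION"), version.getBytes(StandardCharsets.UTF_8));
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("dn-upgrade-test");
    createDirs(root, 3, "DS-legacy-id");
    boolean repeatable = Arrays.equals(blockContents(42L, 1024), blockContents(42L, 1024));
    System.out.println("created dirs under " + root + "; repeatable block data: " + repeatable);
  }
}
```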

{code}
     if (this.layoutVersion > HdfsConstants.DATANODE_LAYOUT_VERSION) {
+
+      // Clusters previously upgraded from layout versions earlier than
+      // ADD_DATANODE_AND_STORAGE_UUIDS failed to correctly generate a
+      // new storage ID. We fix that now.
+
+      boolean haveValidStorageId =
+          DataNodeLayoutVersion.supports(
+              LayoutVersion.Feature.ADD_DATANODE_AND_STORAGE_UUIDS, layoutVersion) &&
+          DatanodeStorage.isValidStorageId(sd.getStorageUuid());
+
       doUpgrade(datanode, sd, nsInfo);  // upgrade
-      createStorageID(sd);
+      if (createStorageID(sd, !haveValidStorageId)) {
+        LOG.info("Generated new storageID " + sd.getStorageUuid() +
+            " for directory " + sd.getRoot());
+      }
{code}

It would be good to add some logging for the various cases here.  If we are 
generating a new storage ID because the previous one was invalid, we should log 
that the previous one was invalid somewhere.
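One way to cover the logging cases asked for above is sketched below. It assumes, rather than quotes from the patch, that a valid post-HDFS-2832 storage ID is the prefix {{DS-}} followed by a parseable UUID. {{chooseStorageId}} is a hypothetical helper, {{java.util.logging}} stands in for the DataNode's logger, and the legacy-style ID in {{main}} is illustrative only.

```java
import java.util.UUID;
import java.util.logging.Logger;

public class StorageIdCheck {
  private static final Logger LOG = Logger.getLogger(StorageIdCheck.class.getName());

  /** Assumed prefix of a post-HDFS-2832 storage ID. */
  static final String STORAGE_ID_PREFIX = "DS-";

  /** True if the ID is "DS-" followed by something UUID.fromString accepts. */
  static boolean isValidStorageId(String storageId) {
    if (storageId == null || !storageId.startsWith(STORAGE_ID_PREFIX)) {
      return false;
    }
    try {
      UUID.fromString(storageId.substring(STORAGE_ID_PREFIX.length()));
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  /** Hypothetical helper: keeps a valid ID, otherwise generates one and logs why. */
  static String chooseStorageId(String dir, String existing) {
    if (isValidStorageId(existing)) {
      return existing;
    }
    String generated = STORAGE_ID_PREFIX + UUID.randomUUID();
    if (existing == null || existing.isEmpty()) {
      LOG.info("No storage ID found for " + dir + "; generated " + generated);
    } else {
      LOG.info("Storage ID " + existing + " for " + dir
          + " is not a valid UUID-based ID; generated " + generated);
    }
    return generated;
  }

  public static void main(String[] args) {
    // Illustrative legacy-style ID (random number, IP, port, timestamp) vs. a current one.
    System.out.println(chooseStorageId("/data/1", "DS-1234567-10.0.0.1-50010-1390000000000"));
    System.out.println(chooseStorageId("/data/2", STORAGE_ID_PREFIX + UUID.randomUUID()));
  }
}
```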

{code}
 if (this.layoutVersion > HdfsConstants.DATANODE_LAYOUT_VERSION) {
{code}

Is this if statement really valid?  It seems like right now, there are 
clusters out there that are on the latest layout version, but which don't have 
valid storage IDs.  We should either bump the NN layout version, or 
unconditionally check that the storage ID is valid, right?

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-08 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270527#comment-14270527
 ] 

Arpit Agarwal commented on HDFS-7575:
-

Thanks for the review Colin.

The three tar files for the newly added tests are each less than 15KB, and one is 
less than 10KB. The diff is large because an existing tar file for the HDFS-6482 
unit test is being modified. That tar file was about 600KB. I think I can rewrite 
the test case to not require the modification, so the diff size would be reasonable.

{code}
 if (this.layoutVersion > HdfsConstants.DATANODE_LAYOUT_VERSION) {
{code}
During upgrade {{this.layoutVersion}} is the pre-upgrade LV and 
{{HdfsConstants.DATANODE_LAYOUT_VERSION}} is the post-upgrade LV. Hence this 
check will always trigger when upgrading from 2.6 or earlier to 2.7+, which 
is what we want.
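To make the direction of that comparison concrete: HDFS layout versions are negative integers that become more negative as features are added. The pre-upgrade (on-disk) LV is therefore numerically greater than the post-upgrade (software) LV. The sketch below uses hypothetical LV values, and {{needsUpgrade}} is an illustrative name, not a Hadoop method.

```java
public class LayoutVersionCheck {

  /**
   * Layout versions decrease (grow more negative) with each new feature, so an
   * on-disk LV greater than the software's LV means the storage is older.
   */
  static boolean needsUpgrade(int onDiskLayoutVersion, int softwareLayoutVersion) {
    return onDiskLayoutVersion > softwareLayoutVersion;
  }

  public static void main(String[] args) {
    int software = -57; // hypothetical post-upgrade DataNode LV
    System.out.println(needsUpgrade(-56, software)); // true: on-disk storage is older
    System.out.println(needsUpgrade(-57, software)); // false: already current
  }
}
```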

I'll add the logging in the next patch revision.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270229#comment-14270229
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690971/HDFS-7575.01.patch
  against trunk revision 7e2d9a3.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9157//console

This message is automatically generated.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch



[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270532#comment-14270532
 ] 

Hadoop QA commented on HDFS-7575:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12690991/HDFS-7575.02.patch
  against trunk revision ae91b13.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9158//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9158//console

This message is automatically generated.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical
 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch


 Before HDFS-2832 each DataNode would have a unique storageId which included 
 its IP address. Since HDFS-2832 the DataNodes have a unique storageId per 
 storage directory which is just a random UUID.
 They send reports per storage directory in their heartbeats. This heartbeat 
 is processed on the NameNode in the 
 {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would 
 just store the information per Datanode. After the patch though each DataNode 
 can have multiple different storages so it's stored in a map keyed by the 
 storage Id.
 This works fine for all clusters that have been installed post HDFS-2832 as 
 they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 
 different keys. On each Heartbeat the Map is searched and updated 
 ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}):
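{code:title=DatanodeStorageInfo}
  void updateState(StorageReport r) {
    // Every field is overwritten, never accumulated: if several storage
    // directories share one storage ID, only the last report processed survives.
    capacity = r.getCapacity();
    dfsUsed = r.getDfsUsed();
    remaining = r.getRemaining();
    blockPoolUsed = r.getBlockPoolUsed();
  }
{code}
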
On clusters that were upgraded from a pre-HDFS-2832 version, however, the 
storage ID has not been rewritten (at least not on the four clusters I 
checked), so every directory has exactly the same storageId. That means there 
is only a single entry in the {{storageMap}}, and it is overwritten by 
whichever {{StorageReport}} from the DataNode happens to be processed last. 
This can be seen in the {{updateState}} method above: it just assigns the 
capacity from the received report, when it should probably sum the values up 
per received heartbeat.

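The following minimal, self-contained Java sketch reproduces the overwrite behavior described above. {{Report}}, {{StorageState}} and {{HeartbeatSketch}} are hypothetical, heavily simplified stand-ins for {{StorageReport}}, {{DatanodeStorageInfo}} and the map handling in {{DatanodeDescriptor}}; they are not the HDFS classes. It shows that eight reports sharing one storage ID collapse into a single entry that holds only the last report's values, while eight unique IDs yield eight entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for StorageReport.
class Report {
  final String storageId;
  final long capacity;
  final long dfsUsed;

  Report(String storageId, long capacity, long dfsUsed) {
    this.storageId = storageId;
    this.capacity = capacity;
    this.dfsUsed = dfsUsed;
  }
}

// Hypothetical stand-in for DatanodeStorageInfo.
class StorageState {
  long capacity;
  long dfsUsed;

  // Like updateState(): plain assignment, so the last report processed wins.
  void updateState(Report r) {
    capacity = r.capacity;
    dfsUsed = r.dfsUsed;
  }
}

// Hypothetical stand-in for the per-DataNode storageMap handling.
class HeartbeatSketch {
  final Map<String, StorageState> storageMap = new HashMap<>();

  void processHeartbeat(Report... reports) {
    for (Report r : reports) {
      // Look up the entry for this storage ID, creating it the first time it is seen.
      storageMap.computeIfAbsent(r.storageId, id -> new StorageState()).updateState(r);
    }
  }

  // Capacity as the NameNode sees it: the sum over the map entries.
  long totalCapacity() {
    long sum = 0;
    for (StorageState s : storageMap.values()) {
      sum += s.capacity;
    }
    return sum;
  }
}
```

With eight reports that all carry the ID {{DS-shared}}, {{storageMap.size()}} is 1 and {{totalCapacity()}} equals a single drive's capacity; with eight distinct IDs the map has eight entries and the capacities add up as expected.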
The Balancer seems to be one of the few components that actually use this 
information, so it now considers the utilization of a single, arbitrary drive 
per DataNode for balancing purposes.

Things get even worse when a drive has been added or replaced, because the new 
drive gets a new storage ID, so there are now two entries in the storageMap. 
As new drives are usually empty, this skews the Balancer's decision so that 
the node will never be considered over-utilized.

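To make the skew concrete, here is a small worked example with made-up numbers. It assumes, for illustration only, that the Balancer's view of a node reduces to total dfsUsed over total capacity across the node's storage entries. Suppose eight 4000 GB drives are each 90% full and share one legacy storage ID, and a ninth, empty drive is added with its own ID. The NameNode then holds two entries (one drive's report for the shared ID, plus the empty drive) and computes 45% utilization, while the node is really 80% full. {{UtilizationSketch}} is a hypothetical helper.

```java
class UtilizationSketch {
  // Utilization in percent: total dfsUsed over total capacity of the given
  // entries. Each row is {capacity, dfsUsed} in the same unit (e.g. GB).
  static double utilizationPercent(long[][] entries) {
    long capacity = 0;
    long used = 0;
    for (long[] e : entries) {
      capacity += e[0];
      used += e[1];
    }
    return capacity == 0 ? 0.0 : 100.0 * used / capacity;
  }
}
```

Feeding it {{ {4000, 3600}, {4000, 0} }} (what the NameNode sees) gives 45.0, while the nine real drives (eight at 3600 GB used, one at 0) give 80.0, so a node that is well above average can look below average.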
Another problem is that old StorageReports are never removed from the 
storageMap. So if I replace a drive and it gets a new storage ID, the old 
entry stays in place and is used for all of the Balancer's calculations until 
the NameNode is restarted.

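A hypothetical sketch of pruning: after applying a heartbeat, drop every map entry whose storage ID the heartbeat did not contain, while updating surviving entries in place so any other per-storage state is kept. {{PruningSketch}} and its {{Entry}} class are illustrative stand-ins, not HDFS code. This version removes a storage as soon as a single heartbeat omits it, which may be too aggressive for a real cluster; elsewhere in this thread the stale-entry problem was deferred to a separate JIRA, with the caution that simply overwriting {{storageMap}} is not correct.

```java
import java.util.HashMap;
import java.util.Map;

class PruningSketch {
  // Hypothetical stand-in for DatanodeStorageInfo; real entries also track blocks.
  static class Entry {
    long capacity;
  }

  final Map<String, Entry> storageMap = new HashMap<>();

  // reported: storage ID -> capacity, taken from the latest heartbeat.
  void processHeartbeat(Map<String, Long> reported) {
    for (Map.Entry<String, Long> r : reported.entrySet()) {
      // Update existing entries in place so other per-storage state survives.
      storageMap.computeIfAbsent(r.getKey(), id -> new Entry()).capacity = r.getValue();
    }
    // Remove entries for storages that were not part of this heartbeat.
    storageMap.keySet().retainAll(reported.keySet());
  }
}
```

After a drive is replaced, the next heartbeat reports the new ID instead of the old one, so the old entry disappears instead of lingering until a NameNode restart.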
I can try providing a patch that does the following:
* Instead of using a map, store the array we receive or, instead of storing 
an array, sum up the values of reports with the same ID
* On each heartbeat, clear the map (so we know we have up-to-date information)

Does that sound sensible?
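The first option above (summing the values of reports with the same ID, starting from an empty map on every heartbeat) could look roughly like the sketch below. It illustrates the proposal only and uses a hypothetical {{Report}} type. As the rest of this thread shows, the preferred fix was instead to give every storage directory a unique ID during upgrade, because summing hides the duplicate IDs rather than removing them.

```java
import java.util.HashMap;
import java.util.Map;

class AggregatingSketch {
  // Hypothetical, simplified stand-in for StorageReport.
  static final class Report {
    final String storageId;
    final long capacity;
    final long dfsUsed;

    Report(String storageId, long capacity, long dfsUsed) {
      this.storageId = storageId;
      this.capacity = capacity;
      this.dfsUsed = dfsUsed;
    }
  }

  // Running totals for one storage ID within a single heartbeat.
  static final class Totals {
    long capacity;
    long dfsUsed;
  }

  // Builds a fresh map for every heartbeat, so IDs that are no longer reported
  // disappear, and sums the values of reports that share a storage ID.
  static Map<String, Totals> aggregate(Report... heartbeat) {
    Map<String, Totals> result = new HashMap<>();
    for (Report r : heartbeat) {
      Totals t = result.computeIfAbsent(r.storageId, id -> new Totals());
      t.capacity += r.capacity;
      t.dfsUsed += r.dfsUsed;
    }
    return result;
  }
}
```

For eight drives sharing one legacy ID plus one new drive, the result has two entries, but the shared entry now carries the capacity and usage of all eight drives instead of just one.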



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-06 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265903#comment-14265903
 ] 

Lars Francke commented on HDFS-7575:


I don't object at all, quite the opposite. Thanks for taking care of this.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-05 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265558#comment-14265558
 ] 

Arpit Agarwal commented on HDFS-7575:
-

I'm testing a fix and expect to post a patch by tomorrow. It will also fix 
the storageMap issue.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Lars Francke
Assignee: Arpit Agarwal
Priority: Critical


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-05 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264840#comment-14264840
 ] 

Colin Patrick McCabe commented on HDFS-7575:


I'm concerned that if storage ids are not unique, a lot of other bad things 
could happen.  I don't think we should hack around this.

I know the upgrade code path isn't fun but the alternatives are worse.  
Anywhere where someone is using a storage id, it could fail in mysterious ways 
on those older, improperly upgraded clusters.  Our unit tests would not catch 
this since for newly installed clusters, the problem does not occur.  And 
people are going to keep assuming that storage IDs are unique, because they're 
supposed to be.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-05 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264861#comment-14264861
 ] 

Daryn Sharp commented on HDFS-7575:
---

I completely agree with Colin regarding an upgrade path.  Kihwal and I have had 
concerns about the shared storage id for quite a while now, have discussed how 
to auto-upgrade old storage dirs, but have not had the cycles to do it.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-05 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264896#comment-14264896
 ] 

Lars Francke commented on HDFS-7575:


Okay, agreed. Thanks for the input. I'm afraid I won't have time to learn 
this code and provide a fix. If anyone else could step up, that'd be much 
appreciated.

The second part of the described problem will still happen, though, and needs 
to be fixed in the NN heartbeat code: at the moment, old storage IDs are never 
pruned. I suggest not updating the {{storageMap}} in {{DatanodeDescriptor}} 
but overwriting it with whatever the latest heartbeat gave us. Does that sound 
sensible or am I missing something?

I'll open a separate issue for the upgrade changes.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-05 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14264982#comment-14264982
 ] 

Arpit Agarwal commented on HDFS-7575:
-

[~lars_francke], thanks for reporting this bug and the thorough investigation. 
The correct fix is to generate storage IDs as part of the upgrade as Colin 
said. I thought I had handled this case in HDFS-2832. Assigned it to myself 
since I broke it. Let me know if you object.

bq. Another problem is that old StorageReports are never removed from the 
storageMap. So if I replace a drive and it gets a new storage Id the old one 
will still be in place and used for all calculations by the Balancer until a 
restart of the NameNode.
We can fix it in a separate Jira. I don't think just overwriting storageMap is 
correct though.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke
Assignee: Arpit Agarwal


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2015-01-03 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263528#comment-14263528
 ] 

Lars Francke commented on HDFS-7575:


Agreed, could also be seen as an upgrade problem.

I could probably prepare a patch that fixes the NameNode handling in the way I 
described. It would make the Balancer work again. I don't feel comfortable 
enough with the upgrade code, though.

What do you think?

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2014-12-30 Thread Lars Francke (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261055#comment-14261055
 ] 

Lars Francke commented on HDFS-7575:


I worked around this by doing the following for each DataNode:
* Stop the DataNode
* Change the storageId in each storage directory (it's in the VERSION file, 
e.g. {{/mnt/disk1/dfs/dn/current/VERSION}}) to a unique value
* Start the DataNode
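The VERSION edit in the second step can be scripted. Below is a minimal Java sketch that replaces the {{storageID}} property of one VERSION file with a fresh value. It assumes the file is a standard Java properties file and that a new-style storage ID is {{DS-}} followed by a random UUID; {{VersionFileSketch}} and its methods are hypothetical. {{Properties#store}} writes a date comment and does not preserve key order, so try it on a copy first, keep a backup, and only run it while the DataNode is stopped.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

class VersionFileSketch {
  // Assumption: new-style storage IDs are "DS-" followed by a random UUID.
  static String newStorageId() {
    return "DS-" + UUID.randomUUID();
  }

  // Rewrites the storageID property of a single VERSION file and returns the
  // new ID. All other properties are loaded and written back unchanged.
  static String regenerateStorageId(Path versionFile) throws IOException {
    Properties props = new Properties();
    try (Reader in = Files.newBufferedReader(versionFile, StandardCharsets.UTF_8)) {
      props.load(in);
    }
    String id = newStorageId();
    props.setProperty("storageID", id);
    try (Writer out = Files.newBufferedWriter(versionFile, StandardCharsets.UTF_8)) {
      props.store(out, null);
    }
    return id;
  }
}
```

Call {{regenerateStorageId}} once per storage directory (for example on {{/mnt/disk1/dfs/dn/current/VERSION}}) so that every directory ends up with a different ID.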

Afterwards I restarted the standby NN (NN2), failed over manually, and 
restarted the new standby NN (NN1).

The Balancer seems to have run fine since then.

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke


[jira] [Commented] (HDFS-7575) NameNode not handling heartbeats properly after HDFS-2832

2014-12-30 Thread Colin Patrick McCabe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261504#comment-14261504
 ] 

Colin Patrick McCabe commented on HDFS-7575:


This seems like an upgrade problem.  Each directory should have its own storage 
id.  It seems like we should fix the upgrade code to make sure that this is the 
case.  If necessary, that means we should generate new IDs for some 
directories.
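A minimal sketch of the check described above: walk a DataNode's storage directories and replace every storage ID that is missing, still in the old pre-HDFS-2832 format, or already used by another directory. The validity rule ({{DS-}} followed by a parseable UUID) and all names here are assumptions for illustration, not the code from the eventual patch. A real upgrade would also have to persist the new IDs to each directory's VERSION file.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

class StorageIdUpgradeSketch {
  // Assumption: a valid new-style ID is "DS-" followed by a parseable UUID.
  static boolean isValidStorageId(String id) {
    if (id == null || !id.startsWith("DS-")) {
      return false;
    }
    try {
      UUID.fromString(id.substring(3));
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  // Maps storage directory -> current ID to storage directory -> ID in which every
  // directory has a valid ID that no other directory shares. The first directory
  // holding a given valid ID keeps it; invalid IDs and later duplicates are replaced.
  static Map<String, String> ensureUniqueIds(Map<String, String> dirToId) {
    Map<String, String> result = new LinkedHashMap<>();
    Set<String> seen = new HashSet<>();
    for (Map.Entry<String, String> e : dirToId.entrySet()) {
      String id = e.getValue();
      if (!isValidStorageId(id) || !seen.add(id)) {
        id = "DS-" + UUID.randomUUID();
        seen.add(id);
      }
      result.put(e.getKey(), id);
    }
    return result;
  }
}
```

On an upgraded node where all directories still carry the same old-style ID, every directory receives a fresh ID, while directories that already have distinct new-style IDs are left untouched.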

 NameNode not handling heartbeats properly after HDFS-2832
 -

 Key: HDFS-7575
 URL: https://issues.apache.org/jira/browse/HDFS-7575
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Francke
