[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ravi Prakash updated HDFS-4832:
---

Status: Open (was: Patch Available)

Namenode doesn't change the number of missing blocks in safemode when DNs
rejoin or leave
-

Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch,
HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode, and suddenly there were no more missing blocks.
{quote}
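
The five steps can be replayed against a toy model to show where the count goes stale. This is a sketch under stated assumptions, not NameNode code: `MissingBlocksModel`, its methods, and the block and node names are hypothetical; a block counts as missing when no live node holds a replica; and the `updateInSafeMode` flag stands in for whether node joins and losses are processed while in safe mode.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the reported scenario; hypothetical classes, not NameNode code.
public class MissingBlocksModel {
    private final Map<String, Set<String>> replicas = new HashMap<>(); // block -> nodes holding it
    private final Set<String> liveNodes = new HashSet<>();
    private final boolean updateInSafeMode; // false mimics the report, true the desired behavior
    private boolean safeMode = false;
    private int missingBlocks = 0;

    MissingBlocksModel(boolean updateInSafeMode) {
        this.updateInSafeMode = updateInSafeMode;
    }

    void addReplica(String block, String node) {
        replicas.computeIfAbsent(block, b -> new HashSet<>()).add(node);
    }

    void nodeJoined(String node) { liveNodes.add(node); onNodeChange(); }

    void nodeLost(String node) { liveNodes.remove(node); onNodeChange(); }

    void setSafeMode(boolean on) {
        safeMode = on;
        if (!on) {
            recount(); // leaving safe mode refreshes the count (step 5)
        }
    }

    // Node changes are only reflected in the count if this gate lets them through.
    private void onNodeChange() {
        if (!safeMode || updateInSafeMode) {
            recount();
        }
    }

    // A block is missing when none of its replicas sits on a live node.
    private void recount() {
        int missing = 0;
        for (Set<String> holders : replicas.values()) {
            boolean anyLive = false;
            for (String node : holders) {
                if (liveNodes.contains(node)) {
                    anyLive = true;
                    break;
                }
            }
            if (!anyLive) {
                missing++;
            }
        }
        missingBlocks = missing;
    }

    int getMissingBlocks() { return missingBlocks; }

    // Replays steps 1-4, optionally step 5, and returns the missing-block count shown.
    static int replay(boolean updateInSafeMode, boolean leaveSafeMode) {
        MissingBlocksModel nn = new MissingBlocksModel(updateInSafeMode);
        nn.addReplica("blk_1", "dn1");
        nn.nodeJoined("dn1");
        nn.nodeLost("dn1");        // 1. datanode lost, blk_1 now missing
        nn.setSafeMode(true);      // 2. NN put in safe mode
        nn.nodeJoined("dn1");      // 3. datanode restarted
        if (leaveSafeMode) {
            nn.setSafeMode(false); // 5. NN forced out of safe mode
        }
        return nn.getMissingBlocks(); // 4. what the UI would report
    }

    public static void main(String[] args) {
        System.out.println("in safe mode, reported behavior: " + replay(false, false)); // 1
        System.out.println("in safe mode, desired behavior:  " + replay(true, false));  // 0
        System.out.println("after leaving safe mode:         " + replay(false, true));  // 0
    }
}
```

With `updateInSafeMode` set to false, the model keeps reporting one missing block after the datanode rejoins (step 4) and drops to zero only once safe mode is left (step 5), which matches the report.
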
I was able to replicate this on 0.23 and trunk. I set
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a
lost datanode. The opposite case also has problems (i.e., a Datanode failing
while the NN is in safemode doesn't lead to a missing blocks message).
Without the NN updating this list of missing blocks, the grid admins will not
know when to take the cluster out of safemode.
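
Setting the recheck interval that low shortens the time before a killed DN is declared dead. The sketch below assumes the expiry formula used by `DatanodeManager` in Hadoop 2.x, `2 * dfs.namenode.heartbeat.recheck-interval + 10 * 1000 * dfs.heartbeat.interval` milliseconds, with stock defaults of 300000 ms and 3 s; `HeartbeatExpiry` is an illustrative helper, not Hadoop API.

```java
// Illustrative helper, not Hadoop API. Assumes the Hadoop 2.x DatanodeManager formula:
//   expiry (ms) = 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec
public class HeartbeatExpiry {
    static long expiryMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        // Defaults: recheck-interval 300000 ms, dfs.heartbeat.interval 3 s
        System.out.println("default expiry: " + expiryMs(300_000, 3) + " ms"); // 630000 ms, 10.5 min
        // Reproduction setting: recheck-interval set to 1 ms
        System.out.println("repro expiry:   " + expiryMs(1, 3) + " ms");       // 30002 ms, about 30 s
    }
}
```

With the defaults, a dead DN is noticed after about 10.5 minutes; with the reproduction setting, after about 30 seconds.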

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Attachment: HDFS-4832.branch-0.23.patch

The patch ported to branch-0.23

Namenode doesn't change the number of missing blocks in safemode when DNs
rejoin or leave
-

Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch,
HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Status: Patch Available (was: Open)

Namenode doesn't change the number of missing blocks in safemode when DNs
rejoin or leave
-

Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch,
HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Attachment: HDFS-4832.patch

The patch for trunk and branch-2

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Status: Open (was: Patch Available)

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Status: Patch Available (was: Open)

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Attachment: HDFS-4832.patch

Y u no test my patch Hadoop QA?

Uploading the same patch. Maybe this time it will get picked up.

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-07 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-4832:
-

Resolution: Fixed
Fix Version/s: 0.23.9
2.1.0-beta
3.0.0
Release Note: This change makes the name node keep its internal replication
queues and data node state updated in manual safe mode. This allows metrics and
UI to present up-to-date information while in safe mode. The behavior during
start-up safe mode is unchanged.
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I've committed this to trunk, branch-2, branch-2.1.0-beta, and branch-0.23.
Thanks for working on this patch, Ravi.

Namenode doesn't change the number of missing blocks in safemode when DNs
rejoin or leave
-

Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch,
HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch,
HDFS-4832.patch

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-03 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Status: Open (was: Patch Available)

The patch passed test-patch.sh on my machine several times. Rolling the dice
again.

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-06-03 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Status: Patch Available (was: Open)

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-29 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Attachment: HDFS-4832.patch

Hmm, funny. The test passed when run from Eclipse, but the same test failed
when run from the command line. :(

Anyway, I've fixed the test so it passes both in Eclipse and on the
command line.

Namenode doesn't change the number of missing blocks in safemode when DNs
rejoin or leave
-

Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch,
HDFS-4832.patch

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-28 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Attachment: HDFS-4832.patch

Thanks for your review Kihwal. I've updated the patch.
bq. isInStartupSafeMode() returns true for any auto safe mode. E.g. if the
resource checker puts NN in safe mode, it will return true.
I have filed HDFS-4862 to fix this. The method name is unfortunately contrary
to its behavior.
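
The naming problem can be illustrated with a hypothetical enum of safe-mode reasons (not the `FSNamesystem` implementation). The first predicate mirrors the behavior described in the review, treating every automatic safe mode as start-up safe mode; the second does what the name says. If queue updates are gated on the predicate, the two disagree only for resource-checker safe mode. Which behavior is right there is left to HDFS-4862; the sketch only shows that they differ.

```java
// Hypothetical model of safe-mode reasons; not the FSNamesystem implementation.
public class SafeModeReasonModel {
    enum Reason { OFF, STARTUP, RESOURCES_LOW, MANUAL }

    // Mirrors the behavior described in the review: any automatic safe mode counts as "startup".
    static boolean isInStartupSafeModeAsReviewed(Reason r) {
        return r != Reason.OFF && r != Reason.MANUAL;
    }

    // What the method name suggests.
    static boolean isInStartupSafeModeByName(Reason r) {
        return r == Reason.STARTUP;
    }

    // Queue updates skipped whenever the chosen predicate returns true.
    static boolean updatesQueues(Reason r, boolean useNamedCheck) {
        return useNamedCheck ? !isInStartupSafeModeByName(r) : !isInStartupSafeModeAsReviewed(r);
    }

    public static void main(String[] args) {
        for (Reason r : Reason.values()) {
            System.out.println(r + ": as reviewed=" + isInStartupSafeModeAsReviewed(r)
                + ", by name=" + isInStartupSafeModeByName(r));
        }
    }
}
```
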
{quote}
The existing code drained scheduled work in safe mode, but the patch makes it
immediately stop sending scheduled work to DNs. This seems like correct behavior
for safe mode, but that work can be sent out after leaving safe mode, which may
not be ideal. E.g., if the NN is suffering from a flaky DNS, DNs will appear dead,
come back, and appear dead again, generating a lot of invalidation and replication
work. Admins may put the NN in safe mode to safely pass the storm. When they do,
the unnecessary work needs to stop rather than being delayed. Please make sure
unintended damage does not occur after leaving safe mode.
{quote}
UnderReplicatedBlocks is the priority queue maintained for neededReplications,
and it is updated when nodes join or are marked dead. However, once
BlockManager.computeReplicationWorkForBlocks() is called, the ReplicationWork is
transferred to the DatanodeDescriptor's replicateBlocks queue, from which it
will not be rescinded. computeReplicationWorkForBlocks() is called every
replicationRecheckInterval, which defaults to 3 seconds. Can we please handle
this in a separate JIRA?
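
A toy model of the two queues described in this reply shows why work cannot be withdrawn once a pass has run. Assumptions: priorities and target selection are omitted, `blockUnderReplicated`, `replicaRestored`, `computeReplicationWork`, and the node name `dn9` are hypothetical, and this is not how `BlockManager` is written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not BlockManager code: a shared "needed replications" set that
// tracks node state, and per-datanode queues that work moves into and never leaves.
public class ReplicationWorkModel {
    // Stand-in for UnderReplicatedBlocks (priorities omitted for brevity).
    final Set<String> neededReplications = new LinkedHashSet<>();
    // Stand-in for each DatanodeDescriptor's replicateBlocks queue.
    final Map<String, List<String>> replicateBlocks = new HashMap<>();

    // Node deaths and rejoins keep the shared set current...
    void blockUnderReplicated(String block) { neededReplications.add(block); }

    void replicaRestored(String block) { neededReplications.remove(block); }

    // ...but each periodic pass (every replicationRecheckInterval, 3 s by default)
    // hands pending work to a datanode queue, and nothing here removes it again.
    void computeReplicationWork(String datanode) {
        replicateBlocks.computeIfAbsent(datanode, d -> new ArrayList<>()).addAll(neededReplications);
        neededReplications.clear();
    }

    int scheduledOn(String datanode) {
        List<String> queue = replicateBlocks.get(datanode);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        ReplicationWorkModel bm = new ReplicationWorkModel();
        bm.blockUnderReplicated("blk_1"); // a flapping DN looks dead
        bm.blockUnderReplicated("blk_2");
        bm.replicaRestored("blk_2");      // it comes back before the next pass: work dropped
        bm.computeReplicationWork("dn9"); // periodic pass
        bm.replicaRestored("blk_1");      // it comes back after the pass: too late
        System.out.println("still needed: " + bm.neededReplications.size()
            + ", already scheduled on dn9: " + bm.scheduledOn("dn9"));
    }
}
```

In the replay, `blk_2` is dropped because its node came back before the pass, while `blk_1` stays scheduled on `dn9` even though its node came back afterwards, which is the flaky-DNS concern raised in the review.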

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-23 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Attachment: HDFS-4832.patch

Namenode doesn't change the number of missing blocks in safemode when DNs
rejoin or leave
-

Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-21 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Summary: Namenode doesn't change the number of missing blocks in safemode
when DNs rejoin or leave (was: Namenode doesn't change the number of missing
blocks in safemode when DNs rejoin)

[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave

2013-05-21 Thread Ravi Prakash (JIRA)

Ravi Prakash updated HDFS-4832:
---

Description:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode, and suddenly there were no more missing blocks.
{quote}

I was able to replicate this on 0.23 and trunk. I set
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a
lost datanode. The opposite case also has problems (i.e., a Datanode failing
while the NN is in safemode doesn't lead to a missing blocks message).

Without the NN updating this list of missing blocks, the grid admins will not
know when to take the cluster out of safemode.

was:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited a long time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode, and suddenly there were no more missing blocks.
{quote}

I was able to replicate this on 0.23 and trunk. I set
dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate a
lost datanode.

Without the NN updating this list of missing blocks, the grid admins will not
know when to take the cluster out of safemode.
