[jira] [Commented] (HDFS-2339) BackUpNode is not getting shutdown/recover when all volumes failed

2011-10-24 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134017#comment-13134017 ]

Uma Maheswara Rao G commented on HDFS-2339:
---

Here waitUntilNamespaceFrozen will wait until it gets a notification from the
NameNode. Ideally the namenodeStartedLogSegment call should notify it, so that
checkpointing can proceed. But the problem here is that the NN has already
removed the BackupNode streams, so it is not sending any requests to the BN,
and the BN will never come out of this wait.
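For illustration, one defensive direction would be to bound this wait rather than block indefinitely. A minimal sketch, assuming a hypothetical FREEZE_TIMEOUT_MS constant (not in the current code) and the same synchronized context as the existing loop:

{code}
// Sketch only: bound the wait so a missed notification cannot hang the
// BackupNode forever. FREEZE_TIMEOUT_MS is a hypothetical constant; this
// fragment assumes it runs inside the same synchronized method as today.
long deadline = System.currentTimeMillis() + FREEZE_TIMEOUT_MS;
while (bnState == BNState.IN_SYNC) {
  long remainingMs = deadline - System.currentTimeMillis();
  if (remainingMs <= 0) {
    // The NN never called namenodeStartedLogSegment(), e.g. because it has
    // already dropped the BackupNode streams; fail instead of blocking.
    throw new IOException("Timed out waiting for namespace to freeze");
  }
  try {
    wait(remainingMs);
  } catch (InterruptedException ie) {
    LOG.warn("Interrupted waiting for namespace to freeze", ie);
    throw new IOException(ie);
  }
}
{code}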

> BackUpNode is not getting shutdown/recover when all volumes failed
> --
>
> Key: HDFS-2339
> URL: https://issues.apache.org/jira/browse/HDFS-2339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.24.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> When all volumes fail at the backup node side, it waits forever.





[jira] [Commented] (HDFS-2339) BackUpNode is not getting shutdown/recover when all volumes failed

2011-09-17 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107258#comment-13107258 ]

Uma Maheswara Rao G commented on HDFS-2339:
---

Hi Todd,
What do you suggest for this issue? I would like your opinion on it.

Thanks,
Uma






[jira] [Commented] (HDFS-2339) BackUpNode is not getting shutdown/recover when all volumes failed

2011-09-16 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106098#comment-13106098 ]

Uma Maheswara Rao G commented on HDFS-2339:
---

Some more info:

11/09/16 19:31:24 INFO namenode.FSEditLog: Number of transactions: 6 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 19 Number of syncs: 6 SyncTimes(ms): 77
11/09/16 19:32:45 INFO namenode.FSEditLog: Number of transactions: 8 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 19 Number of syncs: 8 SyncTimes(ms): 77
11/09/16 19:34:42 INFO namenode.FSEditLog: Number of transactions: 10 Total time for transactions(ms): 60401 Number of transactions batched in Syncs: 19 Number of syncs: 10 SyncTimes(ms): 77
11/09/16 19:35:40 INFO namenode.FSImage: NameNode started a new log segment at txid 137
11/09/16 19:35:40 INFO namenode.FSEditLog: Ending log segment 121
11/09/16 19:35:40 INFO namenode.FSEditLog: Number of transactions: 11 Total time for transactions(ms): 61080 Number of transactions batched in Syncs: 19 Number of syncs: 12 SyncTimes(ms): 77
11/09/16 19:35:40 ERROR namenode.FSEditLog: Error ending log segment (journal JournalAndStream(mgr=FileJournalManager(root=/home/Uma/Hadoop-0.24-09162011/hadoop-hdfs-0.24.0-SNAPSHOT/hadoop-root/dfs/name08), stream=/home/Uma/Hadoop-0.24-09162011/hadoop-hdfs-0.24.0-SNAPSHOT/hadoop-root/dfs/name08/current/edits_inprogress_121))
java.io.IOException: Unable to finalize edits file /home/Uma/Hadoop-0.24-09162011/hadoop-hdfs-0.24.0-SNAPSHOT/hadoop-root/dfs/name08/current/edits_inprogress_121
    at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.finalizeLogSegment(FileJournalManager.java:97)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog$JournalAndStream.close(FSEditLog.java:1209)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog$JournalAndStream.access$4(FSEditLog.java:1202)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog$4.apply(FSEditLog.java:880)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.mapJournalsAndReportErrors(FSEditLog.java:1049)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:876)
    at org.apache.hadoop.hdfs.server.namenode.BackupImage.namenodeStartedLogSegment(BackupImage.java:355)
    at org.apache.hadoop.hdfs.server.namenode.BackupNode$BackupNodeRpcServer.startLogSegment(BackupNode.java:257)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:632)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1484)
11/09/16 19:35:40 ERROR namenode.FSEditLog: Disabling journal
...
...
JournalAndStream(mgr=FileJournalManager(root=/home/Uma/Hadoop-0.24-09162011/hadoop-hdfs-0.24.0-SNAPSHOT/hadoop-root/dfs/name08), stream=null)
11/09/16 19:35:41 INFO ipc.Server: IPC Server handler 0 on 50100, call: startLogSegment(NamenodeRegistration(HOST-10-18-52-222:9000, role=NameNode), 137), rpc version=2, client version=1, methodsFingerPrint=-852377201 from 10.18.52.222:43158, error:
java.io.IOException: Unable to start log segment 137: no journals successfully started.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:843)
    at org.apache.hadoop.hdfs.server.namenode.BackupImage.namenodeStartedLogSegment(BackupImage.java:370)
    at org.apache.hadoop.hdfs.server.namenode.BackupNode$BackupNodeRpcServer.startLogSegment(BackupNode.java:257)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:632)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs
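
The "no journals successfully started" error is essentially an "every journal is dead" check: the edit log applies the start-segment operation to each journal, disables each one that throws, and gives up once none remain. A rough illustrative paraphrase (not the actual FSEditLog source; the Journal type and method names here are hypothetical stand-ins):

{code}
// Illustrative paraphrase only, not the actual Hadoop source.
// 'Journal' stands in for JournalAndStream; assumes java.io.IOException
// and java.util.List are imported, and LOG is the usual commons logger.
interface Journal {
  void startLogSegment(long txid) throws IOException;
  void abort(); // drop the journal from further use
}

void startLogSegment(List<Journal> journals, long txid) throws IOException {
  int started = 0;
  for (Journal j : journals) {
    try {
      j.startLogSegment(txid); // throws if the underlying volume has failed
      started++;
    } catch (IOException ioe) {
      LOG.error("Disabling journal " + j, ioe); // cf. "Disabling journal" above
      j.abort();
    }
  }
  if (started == 0) {
    // The exception the BackupNode RPC handler surfaced in the log above.
    throw new IOException("Unable to start log segment " + txid
        + ": no journals successfully started.");
  }
}
{code}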

[jira] [Commented] (HDFS-2339) BackUpNode is not getting shutdown/recover when all volumes failed

2011-09-16 Thread Uma Maheswara Rao G (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106096#comment-13106096 ]

Uma Maheswara Rao G commented on HDFS-2339:
---

It looks to me that the backup node is stuck waiting here when all volumes have failed:

{code}
LOG.info("Waiting until the NameNode rolls its edit logs in order " +
    "to freeze the BackupNode namespace.");
while (bnState == BNState.IN_SYNC) {
  Preconditions.checkState(stopApplyingEditsOnNextRoll,
      "If still in sync, we should still have the flag set to " +
      "freeze at next roll");
  try {
    wait();
  } catch (InterruptedException ie) {
    LOG.warn("Interrupted waiting for namespace to freeze", ie);
    throw new IOException(ie);
  }
}
{code}
   
The NameNode has already removed the backup journals on failure, so it makes
no further calls to the BackupNode.

Since the BackupNode streams were in sync before the failure, it simply keeps
waiting. Why do we need to keep the process running at this stage?
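
One possible direction on the BackupNode side (a sketch only; reportAllJournalsFailed and the allJournalsFailed flag are hypothetical, not existing code) would be for the journal-failure path to wake the frozen waiter so it can fail fast instead of blocking forever:

{code}
// Sketch only: hypothetical hook on BackupImage, invoked from the edit-log
// error path once the last journal has been disabled.
synchronized void reportAllJournalsFailed() {
  allJournalsFailed = true; // hypothetical flag, checked by the wait loop
  notifyAll();              // wake waitUntilNamespaceFrozen()
}

// ...and the wait loop above would recheck that flag on each iteration:
while (bnState == BNState.IN_SYNC) {
  if (allJournalsFailed) {
    throw new IOException(
        "All edit log volumes failed; aborting BackupNode checkpoint");
  }
  try {
    wait();
  } catch (InterruptedException ie) {
    LOG.warn("Interrupted waiting for namespace to freeze", ie);
    throw new IOException(ie);
  }
}
{code}

Alternatively, the BackupNode could shut itself down at that point, which is the behavior the issue title asks for.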


