[jira] [Updated] (HDFS-9068) SBN checkpoint could not work after the only name directory recovery from failure
[ https://issues.apache.org/jira/browse/HDFS-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-9068:
------------------------------
    Attachment: HDFS-9068.patch

Attached a patch: check whether a failed directory is usable again before saving the fsimage.

> SBN checkpoint could not work after the only name directory recovery from failure
> ----------------------------------------------------------------------------------
>
>     Key: HDFS-9068
>     URL: https://issues.apache.org/jira/browse/HDFS-9068
>     Project: Hadoop HDFS
>     Issue Type: Bug
>     Components: namenode
>     Affects Versions: 2.4.1
>     Reporter: He Xiaoqiao
>     Attachments: HDFS-9068.patch
>
> The SBN periodically checkpoints to {{dfs.namenode.name.dir}}, but the checkpointer stops working when {{dfs.namenode.name.dir}} is configured with only one directory and the disk holding that directory recovers from a failure.
> The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage:
> {code:title=org.apache.hadoop.hdfs.server.namenode.FSImage.java|borderStyle=solid}
> @Override
> public void run() {
>   try {
>     saveFSImage(context, sd, nnf);
>   } catch (SaveNamespaceCancelledException snce) {
>     LOG.info("Cancelled image saving for " + sd.getRoot() +
>         ": " + snce.getMessage());
>     // don't report an error on the storage dir!
>   } catch (Throwable t) {
>     LOG.error("Unable to save image for " + sd.getRoot(), t);
>     context.reportErrorOnStorageDirectory(sd);
>   }
> }
> {code}
> When {{saveFSImage(context, sd, nnf)}} fails because the storage is unavailable or has failed, {{context.reportErrorOnStorageDirectory(sd)}} adds sd to errorSDs, and sd is never used again, even after it recovers from the failure. The JournalNodes then accumulate a large number of editlog files because checkpointing keeps failing, and the NameNode takes a long time to restart.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
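The message describes the patch only as checking a failed directory before saving the fsimage. The following is a minimal sketch of that idea under stated assumptions, not the contents of HDFS-9068.patch: directories that were reported as failed are probed again before each save and returned to the active set if they are usable. The names `StorageDirRecoverySketch`, `reportError`, `isUsable` and `restoreRecovered` are hypothetical, and the sketch uses plain `java.io.File` rather than Hadoop's storage classes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; this is not Hadoop's FSImage or storage code.
class StorageDirRecoverySketch {
    private final List<File> activeDirs = new ArrayList<>();
    private final List<File> failedDirs = new ArrayList<>();

    void addActive(File dir) {
        activeDirs.add(dir);
    }

    // Stands in for the effect of reporting an error on a storage directory:
    // the directory is moved out of the active set.
    void reportError(File dir) {
        if (activeDirs.remove(dir)) {
            failedDirs.add(dir);
        }
    }

    // Assumed health probe: the path is a readable, writable directory.
    static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite();
    }

    // Intended to run before saving an image: re-admit recovered directories.
    int restoreRecovered() {
        int restored = 0;
        for (Iterator<File> it = failedDirs.iterator(); it.hasNext();) {
            File dir = it.next();
            if (isUsable(dir)) {
                it.remove();
                activeDirs.add(dir);
                restored++;
            }
        }
        return restored;
    }

    List<File> getActiveDirs() {
        return activeDirs;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("name-dir").toFile();
        dir.deleteOnExit();
        StorageDirRecoverySketch storage = new StorageDirRecoverySketch();
        storage.addActive(dir);
        storage.reportError(dir); // simulate a failed save
        System.out.println("active after failure: " + storage.getActiveDirs().size());
        storage.restoreRecovered(); // the directory is healthy, so it is re-admitted
        System.out.println("active after restore: " + storage.getActiveDirs().size());
    }
}
```

With a single configured directory, the re-probe step is what keeps the active set from staying empty after a transient disk failure.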
[jira] [Updated] (HDFS-8993) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-8993:
------------------------------
    Description:
Balancer may throw an NPE because of the following {{if}} statement:
{code:title=Balancer.java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.

  was:
Balancer may throw an NPE because of the following {{if}} statement:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.

Balancer throws NPE
-------------------

    Key: HDFS-8993
    URL: https://issues.apache.org/jira/browse/HDFS-8993
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement:
{code:title=Balancer.java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Commented] (HDFS-8992) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721016#comment-14721016 ]

He Xiaoqiao commented on HDFS-8992:
-----------------------------------

Thanks, Tsz Wo Nicholas Sze, for your comments. The log contains only one line of stack trace:
{code}
2015-06-29 14:06:35,280 WARN org.apache.hadoop.hdfs.server.balancer.Balancer: Dispatcher thread failed
java.lang.NullPointerException
{code}

Balancer throws NPE
-------------------

    Key: HDFS-8992
    URL: https://issues.apache.org/jira/browse/HDFS-8992
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Updated] (HDFS-8993) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-8993:
------------------------------
    Description:
Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:title=Balancer.java}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.

  was:
Balancer may throw an NPE because of the following {{if}} statement:
{code:title=Balancer.java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.

Balancer throws NPE
-------------------

    Key: HDFS-8993
    URL: https://issues.apache.org/jira/browse/HDFS-8993
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:title=Balancer.java}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Resolved] (HDFS-8992) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao resolved HDFS-8992.
-------------------------------
    Resolution: Duplicate

Balancer throws NPE
-------------------

    Key: HDFS-8992
    URL: https://issues.apache.org/jira/browse/HDFS-8992
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Commented] (HDFS-8992) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721030#comment-14721030 ]

He Xiaoqiao commented on HDFS-8992:
-----------------------------------

Thanks [~brahmareddy] for your comment. I will close this ticket.

Balancer throws NPE
-------------------

    Key: HDFS-8992
    URL: https://issues.apache.org/jira/browse/HDFS-8992
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Created] (HDFS-8992) Balancer throws NPE
He Xiaoqiao created HDFS-8992:
------------------------------

    Summary: Balancer throws NPE
    Key: HDFS-8992
    URL: https://issues.apache.org/jira/browse/HDFS-8992
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Created] (HDFS-8993) Balancer throws NPE
He Xiaoqiao created HDFS-8993:
------------------------------

    Summary: Balancer throws NPE
    Key: HDFS-8993
    URL: https://issues.apache.org/jira/browse/HDFS-8993
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Updated] (HDFS-8992) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-8992:
------------------------------
    Description:
Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.

  was:
Balancer may throw an NPE because of the following {{if}} statement:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.

Balancer throws NPE
-------------------

    Key: HDFS-8992
    URL: https://issues.apache.org/jira/browse/HDFS-8992
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Commented] (HDFS-8993) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721021#comment-14721021 ]

He Xiaoqiao commented on HDFS-8993:
-----------------------------------

Sorry for creating a duplicate ticket; the first creation did not appear to complete. BTW, how can I delete this ticket?

Balancer throws NPE
-------------------

    Key: HDFS-8993
    URL: https://issues.apache.org/jira/browse/HDFS-8993
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement at line 709 of Balancer.java:
{code:title=Balancer.java}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
[jira] [Commented] (HDFS-8992) Balancer throws NPE
[ https://issues.apache.org/jira/browse/HDFS-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721018#comment-14721018 ]

He Xiaoqiao commented on HDFS-8992:
-----------------------------------

Actually, it can be fixed simply by adding a condition to the {{if}} statement:
{code}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null && null != d) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}

Balancer throws NPE
-------------------

    Key: HDFS-8992
    URL: https://issues.apache.org/jira/browse/HDFS-8992
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: balancer & mover
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao

Balancer may throw an NPE because of the following {{if}} statement:
{code:java|firstline=705}
synchronized (block) {
  // update locations
  for (String datanodeUuid : blk.getDatanodeUuids()) {
    final BalancerDatanode d = datanodeMap.get(datanodeUuid);
    if (datanode != null) { // not an unknown datanode
      block.addLocation(d);
    }
  }
}
{code}
Before moving blocks, the Balancer fetches information from the NN in two steps: first the report of all DNs ({{getDatanodeReport}}), then some of the blocks on each DN ({{getBlockList}}). If a DN is commissioned after {{getDatanodeReport}} but before {{getBlockList}}, and a block's location happens to be that DN, the dispatcher throws an NPE.
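The reason the lookup result itself has to be tested can be shown outside Hadoop. This is a minimal sketch, assuming a plain `HashMap` stands in for `datanodeMap` and strings stand in for `BalancerDatanode`; `LocationUpdateSketch` and `knownLocations` are hypothetical names. `Map.get` returns `null` for a UUID that was not in the earlier datanode report, so only non-null results are added as locations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the real code lives in Balancer.java.
class LocationUpdateSketch {
    // Returns the known datanodes for the given UUIDs, skipping unknown ones.
    static List<String> knownLocations(Map<String, String> datanodeMap,
                                       List<String> datanodeUuids) {
        List<String> locations = new ArrayList<>();
        for (String uuid : datanodeUuids) {
            final String d = datanodeMap.get(uuid); // null for an unknown datanode
            if (d != null) {
                locations.add(d);
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, String> datanodeMap = new HashMap<>();
        datanodeMap.put("uuid-1", "dn1"); // present in the datanode report
        // "uuid-2" models a DN commissioned after the report was fetched.
        List<String> uuids = Arrays.asList("uuid-1", "uuid-2");
        System.out.println(knownLocations(datanodeMap, uuids)); // prints [dn1]
    }
}
```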
[jira] [Commented] (HDFS-8973) NameNode exit without any exception log
[ https://issues.apache.org/jira/browse/HDFS-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716600#comment-14716600 ]

He Xiaoqiao commented on HDFS-8973:
-----------------------------------

Thanks, Kanaka, for your comments. The following are the main GC messages in the .out file before the process exited. According to this output and our monitoring system, GC was working normally. At 19:35:02.508, when 'log4j:ERROR Failed to flush writer' appeared, output to the .log file stopped, but GC info continued to be printed for about 5 minutes, until 19:40:10, when the namenode process exited. At that time the JVM had enough free memory.
{code:borderStyle=solid}
2015-08-26T19:34:30.537+0800: [GC [ParNew: 8315771K->63022K(9292032K), 0.1909130 secs] 96423904K->88172502K(133185344K), 0.1910150 secs] [Times: user=3.37 sys=0.01, real=0.19 secs]
2015-08-26T19:34:42.296+0800: [GC [ParNew: 8322670K->71664K(9292032K), 0.2214550 secs] 96432150K->88183374K(133185344K), 0.2215720 secs] [Times: user=3.92 sys=0.01, real=0.22 secs]
2015-08-26T19:34:52.412+0800: [GC [ParNew: 8331312K->82431K(9292032K), 0.2173850 secs] 96443022K->88195492K(133185344K), 0.2174950 secs] [Times: user=3.86 sys=0.00, real=0.22 secs]
2015-08-26T19:35:02.508+0800: [GC [ParNew: 8342079K->101837K(9292032K), 0.1873830 secs] 96455140K->88216487K(133185344K), 0.1874800 secs] [Times: user=3.26 sys=0.02, real=0.18 secs]
log4j:ERROR Failed to flush writer,
java.io.IOException: Bad file descriptor
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:318)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.logAddStoredBlock(BlockManager.java:2391)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2312)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2919)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2894)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2976)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5432)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1061)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28065)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-08-26T19:35:14.959+0800: [GC [ParNew: 8361485K->93419K(9292032K), 0.1904630 secs] 96476135K->88211796K(133185344K), 0.1905540 secs] [Times: user=3.38 sys=0.00, real=0.19 secs]
2015-08-26T19:35:25.424+0800: [GC [ParNew: 8353067K->54117K(9292032K), 0.1892230 secs] 96471444K->88174133K(133185344K), 0.1893260 secs] [Times: user=3.31 sys=0.01, real=0.19 secs]
2015-08-26T19:35:36.512+0800: [GC [ParNew: 8313765K->55946K(9292032K), 0.1901160 secs] 96433781K->88177578K(133185344K), 0.1902050 secs]
[jira] [Commented] (HDFS-8973) NameNode exit without any exception log
[ https://issues.apache.org/jira/browse/HDFS-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716414#comment-14716414 ]

He Xiaoqiao commented on HDFS-8973:
-----------------------------------

Thanks, Kanaka. Before logging stopped, everything actually looked fine: neither WARN nor ERROR occurred. After that, GC info continued to be printed to the .out file for about 5 minutes, which also looked fine; apart from 'log4j:ERROR Failed to flush writer' there is no other useful information. I suspect that multiple threads share the same log4j handler: when the log file is rolled, one thread closes the stream while another thread keeps writing to it, so an exception is thrown and interrupts that thread. Once all NameNode threads have been interrupted, the main thread exits.

NameNode exit without any exception log
---------------------------------------

    Key: HDFS-8973
    URL: https://issues.apache.org/jira/browse/HDFS-8973
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: namenode
    Affects Versions: 2.4.1
    Reporter: He Xiaoqiao
    Priority: Critical

The namenode process exited without any useful WARN/ERROR log. After output to the .log file stopped, the .out file continued to show GC logs for about 5 minutes. When the .log file output stopped, the .out file printed the following ERROR, which may be a hint; the exit seems to be caused by the log4j ERROR.
{code:title=namenode.out|borderStyle=solid}
log4j:ERROR Failed to flush writer,
java.io.IOException: Bad file descriptor
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:318)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.logAddStoredBlock(BlockManager.java:2391)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2312)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2919)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2894)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2976)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5432)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1061)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28065)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{code}
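One part of the hypothesis in this message, a thread writing to a stream that has already been closed, can be probed in isolation. This is a minimal sketch that assumes only the JDK and does not reproduce log4j's rollover: writing to a `FileOutputStream` after `close()` raises an `IOException`, which the caller can catch, so such a failure alone need not end the writing thread. `ClosedStreamSketch` and `writeAfterCloseFails` are hypothetical names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical illustration only; log4j is not involved here.
class ClosedStreamSketch {
    // Returns true if a write issued after close() fails with an IOException.
    static boolean writeAfterCloseFails(File file) throws IOException {
        FileOutputStream out = new FileOutputStream(file);
        out.write('a');
        out.close(); // models another thread closing the stream
        try {
            out.write('b'); // models a writer that still holds the old stream
            return false;
        } catch (IOException expected) {
            // The exception is catchable; the calling thread keeps running.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("closed-stream", ".log");
        file.deleteOnExit();
        System.out.println("write after close failed: " + writeAfterCloseFails(file));
    }
}
```

Whether this is what happened in the NameNode cannot be concluded from the sketch; it only isolates the stream behaviour the comment relies on.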
[jira] [Updated] (HDFS-8973) NameNode exit without any exception log
[ https://issues.apache.org/jira/browse/HDFS-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-8973: -- Description: namenode process exit without any useful WARN/ERROR log, and after .log file output interrupt .out file continue show about 5 min GC log. when .log file intertupt .out file print the follow ERROR, it may hint some info. it seems cause by log4j ERROR. {code:title=namenode.out|borderStyle=solid} log4j:ERROR Failed to flush writer, java.io.IOException: 错误的文件描述符 at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:318) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59) at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324) at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276) at org.apache.log4j.WriterAppender.append(WriterAppender.java:162) at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66) at org.apache.log4j.Category.callAppenders(Category.java:206) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.logAddStoredBlock(BlockManager.java:2391) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2312) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2919) at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2976) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5432) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1061) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28065) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) {code} was: namenode process exit without any useful WARN/ERROR log, and after .log file output interrupt .out file continue show about 5 min GC log. when .log file intertupt .out file print the follow ERROR, it may hint some info. it seems cause by log4j ERROR. 
{code:title=namenode.out|borderStyle=solid}
log4j:ERROR Failed to flush writer,
Total time for which application threads were stopped: 0.0047800 seconds
java.io.IOException: Bad file descriptor
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:318)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at
[jira] [Created] (HDFS-8973) NameNode exit without any exception log
He Xiaoqiao created HDFS-8973:
-
Summary: NameNode exit without any exception log
Key: HDFS-8973
URL: https://issues.apache.org/jira/browse/HDFS-8973
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.4.1
Reporter: He Xiaoqiao
Priority: Critical

The NameNode process exits without any useful WARN/ERROR log. After output to the .log file is interrupted, the .out file continues to show GC log entries for about 5 minutes. At the moment the .log file is interrupted, the .out file prints the following ERROR, which may offer a hint. The exit seems to be caused by a log4j ERROR.
{code:title=namenode.out|borderStyle=solid}
log4j:ERROR Failed to flush writer,
Total time for which application threads were stopped: 0.0047800 seconds
java.io.IOException: Bad file descriptor
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:318)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.logAddStoredBlock(BlockManager.java:2391)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2312)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2919)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2894)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2976)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5432)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1061)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28065)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)