[ https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618549#comment-13618549 ]

Jieshan Bean commented on HBASE-8230:
-------------------------------------

bq. Did the failure happen when the region server restarted?
Yes.

bq. If this was repeatable, I would suggest finding the root cause.
The root cause in our environment was that the NameNode was in safe mode:
{noformat}
2013-03-29 10:32:42,260 FATAL [regionserver26003] ABORTING region server om-host2,26003,1364524173470: Unhandled exception: cannot get log writer org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1737)
java.io.IOException: cannot get log writer
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:757)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:701)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:637)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:582)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:436)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:362)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1327)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1316)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1030)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:706)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/hbase/.logs/om-host2,26003,1364524173470/om-host2%2C26003%2C1364524173470.1364524361366. Name node is in safe mode.
The reported blocks 14 has reached the threshold 0.9990 of total blocks 14. Safe mode will be turned off automatically in 21 seconds.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1601)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1547)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:412)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:204)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43664)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1704)

        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:209)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:754)
        ... 10 more
{noformat}
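The NameNode message above says safe mode would have been turned off automatically about 21 seconds later, so the abort came from trying to create the first WAL file too early. The following is a minimal sketch of a bounded "wait until safe mode is off" loop that a startup path could run before creating the writer. It is not HBase or Hadoop code: `WaitForSafeModeExit`, `SafeModeProbe`, and `waitForSafeModeExit` are hypothetical names, and a real probe would need to ask the NameNode for its safe-mode state through the HDFS client.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: poll a safe-mode check before creating the first WAL
 * writer, instead of aborting on the first SafeModeException.
 */
public class WaitForSafeModeExit {

    /** Hypothetical abstraction over "is the NameNode in safe mode?". */
    interface SafeModeProbe {
        boolean inSafeMode() throws IOException;
    }

    /**
     * Polls the probe every pollMillis until it reports that safe mode is off,
     * or throws IOException once timeoutMillis has elapsed.
     */
    static void waitForSafeModeExit(SafeModeProbe probe, long pollMillis, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (probe.inSafeMode()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IOException("NameNode still in safe mode after " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated NameNode that reports "out of safe mode" on the third check.
        int[] calls = {0};
        SafeModeProbe probe = () -> ++calls[0] < 3;
        waitForSafeModeExit(probe, 10, 1_000);
        System.out.println("safe mode off after " + calls[0] + " checks");
    }
}
```

Bounding the wait with a timeout keeps a NameNode that never leaves safe mode from hanging region server startup forever; the server would still fail, just with a clearer message.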

                
> Possible NPE on regionserver abort if replication service has not been started
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-8230
>                 URL: https://issues.apache.org/jira/browse/HBASE-8230
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, Replication
>    Affects Versions: 0.94.6
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>         Attachments: HBASE-8230-94.patch
>
>
> The RegionServer got an exception while calling setupWALAndReplication, so it
> entered the abort flow. Since replicationSink had not been initialized yet, we
> got the exception below:
> {noformat}
> Exception in thread "regionserver26003" java.lang.NullPointerException
>  at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
>  at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
>  at org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
>  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
>  at java.lang.Thread.run(Thread.java:662)
> {noformat}
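The NPE arises because the abort path calls join on the replication service even though setupWALAndReplication failed before replicationSink was assigned. A minimal sketch of the usual remedy, a null guard in the stop/join path, is shown below. It is not the attached HBASE-8230-94.patch or the real `Replication` class: `NullSafeJoinSketch`, `Sink`, `start`, and `join` are hypothetical stand-ins that only mirror the shape suggested by the stack trace.

```java
/**
 * Hypothetical sketch (not the HBASE-8230 patch): a service whose sink is only
 * created in start(), so join() must tolerate the sink still being null.
 */
public class NullSafeJoinSketch {

    /** Stand-in for a replication sink; only its stop method matters here. */
    static final class Sink {
        boolean stopped = false;

        void stop() {
            stopped = true;
        }
    }

    private final boolean replicationEnabled;
    private Sink sink; // stays null until start() runs

    NullSafeJoinSketch(boolean replicationEnabled) {
        this.replicationEnabled = replicationEnabled;
    }

    /** Creates the sink; in the failing scenario this is never reached. */
    void start() {
        if (replicationEnabled) {
            sink = new Sink();
        }
    }

    /** Safe to call from an abort path even if start() never ran. */
    void join() {
        if (replicationEnabled && sink != null) {
            sink.stop();
        }
    }

    Sink sink() {
        return sink;
    }

    public static void main(String[] args) {
        // Abort before start(): without the null check this would throw an NPE.
        NullSafeJoinSketch aborted = new NullSafeJoinSketch(true);
        aborted.join();
        System.out.println("join before start: no NPE");

        // Normal lifecycle: start, then join stops the sink.
        NullSafeJoinSketch normal = new NullSafeJoinSketch(true);
        normal.start();
        normal.join();
        System.out.println("sink stopped: " + normal.sink().stopped);
    }
}
```

Guarding in join() rather than in the caller keeps every shutdown path safe, including ones added later, at the cost of silently skipping cleanup for a component that was never started.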

