[ 
https://issues.apache.org/jira/browse/HDDS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642083#comment-16642083
 ] 

Mukul Kumar Singh commented on HDDS-543:
----------------------------------------

This is happening because on Datanode restart, the Ratis tries to applies logs 
to the state machine from the start.
Re-applying these log entries will result in the above exception.

> datanode service start throwing NullPointerException
> ----------------------------------------------------
>
>                 Key: HDDS-543
>                 URL: https://issues.apache.org/jira/browse/HDDS-543
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Nilotpal Nandi
>            Assignee: Mukul Kumar Singh
>            Priority: Critical
>         Attachments: all-node-ozone-logs-1537784799.tar.gz
>
>
> steps taken :
> -------------------
>  # Stopped one of the nodes in 6 node cluster  while write operation was 
> running.
>  # Tried to start datanode after sometime. Datanode failed to start
>  
> ozone version :
> ---------------------
>  
> {noformat}
> Source code repository g...@github.com:apache/hadoop.git -r 
> 968082ffa5d9e50ed8538f653c375edd1b8feea5
> Compiled by elek on 2018-09-19T20:57Z
> Compiled with protoc 2.5.0
> From source with checksum efbdeabb5670d69d9efde85846e4ee98
> Using HDDS 0.2.1-alpha
> Source code repository g...@github.com:apache/hadoop.git -r 
> 968082ffa5d9e50ed8538f653c375edd1b8feea5
> Compiled by elek on 2018-09-19T20:56Z
> Compiled with protoc 2.5.0
> From source with checksum 8bf78cff4b73c95d486da5b21053ef
>  
> {noformat}
>  
> console log :
> --------------------
>  
> {noformat}
> 2018-09-24 09:53:54,913 INFO impl.RaftServerImpl: 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: 
> [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, 
> 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0
> Sep 24, 2018 9:53:55 AM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 
> AM'
> 2018-09-24 09:53:55,968 [main] INFO - Listening HDDS REST traffic on 
> /0.0.0.0:9880
> 2018-09-24 09:53:55,969 [main] INFO - Started plug-in 
> org.apache.hadoop.ozone.web.OzoneHddsDatanodeService@2fa3be26
> 2018-09-24 09:53:55,975 [Datanode State Machine Thread - 0] INFO - Attempting 
> to start container services.
> 2018-09-24 09:53:56,171 [Datanode State Machine Thread - 0] INFO - Starting 
> XceiverServerRatis 9348f4cb-0bd3-41e7-b1c6-c22463a20c09 at port 9858
> 2018-09-24 09:53:56,193 [Datanode State Machine Thread - 0] ERROR - Unable to 
> start the DatanodeState Machine
> java.io.IOException: Premature EOF from inputStream
>  at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
>  at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
>  at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
>  at 
> org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
>  at 
> org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
>  at 
> org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:132)
>  at 
> org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:110)
>  at 
> org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:155)
>  at 
> org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLog.java:123)
>  at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162)
>  at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:110)
>  at 
> org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:106)
>  at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$0(RaftServerProxy.java:191)
>  at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>  at 
> java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
>  at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
>  at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
>  at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
>  at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> 2018-09-24 09:53:56,195 [main] ERROR - Exception in HddsDatanodeService.
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:332)
>  at 
> org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:191)
>  at 
> org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:250)
> 2018-09-24 09:53:56,198 INFO util.ExitUtil: Exiting with status 1: 
> java.lang.NullPointerException
> 2018-09-24 09:53:56,201 WARN fs.CachingGetSpaceUsed: Thread Interrupted 
> waiting to refresh disk information: sleep interrupted
> 2018-09-24 09:53:56,201 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down HddsDatanodeService at 
> ctr-e138-1518143905142-468367-01-000003.hwx.site/172.27.23.161
>  
> {noformat}
>  
> datanode.log
> ----------------------
> {noformat}
> 2018-09-24 09:42:41,305 ERROR org.apache.ratis.server.impl.RaftServerImpl: 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: applyTransaction failed for index:2 
> proto:(t:2, i:2)SMLOGENTRY, client-DBCE827E62E6, cid=1
> 2018-09-24 09:42:41,307 ERROR 
> org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit 
> status 2: StateMachineUpdater-9348f4cb-0bd3-41e7-b1c6-c22463a20c09: the 
> StateMachineUpdater hits Throwable
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.handleChunkCommit(ContainerStateMachine.java:561)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.executeContainerCommand(ContainerStateMachine.java:615)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:394)
>  at 
> org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1150)
>  at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:148)
>  at java.lang.Thread.run(Thread.java:745)
> 2018-09-24 09:42:41,312 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread 
> Interrupted waiting to refresh disk information: sleep interrupted
> 2018-09-24 09:53:38,901 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> Loaded properties from hadoop-metrics2.properties
> 2018-09-24 09:53:39,013 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 2018-09-24 09:53:39,013 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HddsDatanode metrics 
> system started
> 2018-09-24 09:53:39,056 INFO 
> org.apache.hadoop.conf.Configuration.deprecation: No unit for 
> ozone.scm.heartbeat.rpc-timeout(1000) assuming MILLISECONDS
> 2018-09-24 09:53:39,069 INFO 
> org.apache.hadoop.conf.Configuration.deprecation: No unit for 
> hdds.write.lock.reporting.threshold.ms(5000) assuming MILLISECONDS
> 2018-09-24 09:53:39,069 INFO 
> org.apache.hadoop.conf.Configuration.deprecation: No unit for 
> hdds.lock.suppress.warning.interval.ms(10000) assuming MILLISECONDS
> 2018-09-24 09:53:39,751 WARN org.apache.hadoop.hdds.scm.HddsServerUtil: 
> Storage directory for Ratis is not configured.Mapping Ratis storage under 
> /data/disk1/ozone/meta/ratis. It is a good idea to map this to an SSD disk.
> 2018-09-24 09:53:39,783 INFO org.apache.ratis.server.impl.RaftServerProxy: 
> raft.rpc.type = GRPC (default)
> 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.GrpcConfigKeys$Server: 
> raft.grpc.server.port = 9858 (custom)
> 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.RaftGRpcService: 
> raft.grpc.message.size.max = 167772160 (custom)
> 2018-09-24 09:53:39,795 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.appender.buffer.capacity = 134217728 (custom)
> 2018-09-24 09:53:39,795 INFO org.apache.ratis.grpc.RaftGRpcService: 
> raft.grpc.flow.control.window = 1MB (=1048576) (default)
> 2018-09-24 09:53:39,796 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.request.timeout = 3000 ms (custom)
> 2018-09-24 09:53:39,835 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom)
> 2018-09-24 09:53:39,836 INFO org.apache.ratis.server.impl.RaftServerProxy: 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: found a subdirectory 
> /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d
> 2018-09-24 09:53:39,838 INFO org.apache.ratis.server.impl.RaftServerProxy: 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: addNew group-85804B25606D:[] returns 
> group-85804B25606D:java.util.concurrent.CompletableFuture@74e28667[Not 
> completed]
> 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.timeout.min = 1 s (custom)
> 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpc.timeout.max = 1200 ms (custom)
> 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.rpcslowness.timeout = 120 s (custom)
> 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: ConfigurationManager, init=-1: [], 
> old=null, confs=<EMPTY_MAP>
> 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom)
> 2018-09-24 09:53:39,873 INFO 
> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on 
> /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d/in_use.lock 
> acquired by nodename 3792...@ctr-e138-1518143905142-468367-01-000003.hwx.site
> 2018-09-24 09:53:39,878 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.leader.election.timeout = 120 s (custom)
> 2018-09-24 09:53:39,881 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.use.memory = false (default)
> 2018-09-24 09:53:39,882 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.appender.buffer.capacity = 134217728 (custom)
> 2018-09-24 09:53:39,883 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.segment.size.max = 1073741824 (custom)
> 2018-09-24 09:53:39,886 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.segment.cache.num.max = 2 (custom)
> 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.storage.RaftLogWorker: 
> new 9348f4cb-0bd3-41e7-b1c6-c22463a20c09-RaftLogWorker for Storage Directory 
> /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d
> 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.segment.size.max = 1073741824 (custom)
> 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.preallocated.size = 134217728 (custom)
> 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.write.buffer.size = 33554432 (custom)
> 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys: 
> raft.server.log.force.sync.num = 128 (default)
> 2018-09-24 09:53:39,925 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: 
> [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, 
> 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, 
> 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0
> 2018-09-24 09:53:41,058 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: java.lang.NullPointerException
> 2018-09-24 09:53:41,065 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread 
> Interrupted waiting to refresh disk information: sleep interrupted{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to