[
https://issues.apache.org/jira/browse/HDDS-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247505#comment-17247505
]
Bharat Viswanadham edited comment on HDDS-3378 at 12/10/20, 9:00 PM:
---------------------------------------------------------------------
Hi [~cxorm]
I hope you have not started working on this yet.
I am interested in taking this up, since this error is confusing both our
users and us internally.
The proposal is:
Make ratis.snapshot.dir independent of ozone.om.ratis.storage.dir, so that
ratis.storage.dir does not contain any other directories.
Also, if ratis.storage.dir is not defined, we fall back to ozone.metadata.dirs
by default, so ratis.storage.dir becomes ozone.metadata.dirs + "/ratis".
For older clusters, the existing snapshot directory will be moved to the new
ratis.snapshot.dir.
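
The fallback rules in the proposal could be sketched roughly as below. This is
only an illustration, not the actual Ozone code: the class RatisDirResolver,
the use of a plain Map in place of the Ozone configuration object, the full key
name "ozone.om.ratis.snapshot.dir", and the default snapshot location
(ozone.metadata.dirs + "/snapshot") are all assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the proposal; not part of Ozone.
final class RatisDirResolver {
  static final String METADATA_DIRS_KEY = "ozone.metadata.dirs";
  static final String STORAGE_DIR_KEY = "ozone.om.ratis.storage.dir";
  // Assumed full key name for the "ratis.snapshot.dir" setting in the proposal.
  static final String SNAPSHOT_DIR_KEY = "ozone.om.ratis.snapshot.dir";

  private RatisDirResolver() { }

  // Ratis storage dir: explicit value, else ozone.metadata.dirs + "/ratis".
  static Path storageDir(Map<String, String> conf) {
    String explicit = conf.get(STORAGE_DIR_KEY);
    if (explicit != null && !explicit.isEmpty()) {
      return Paths.get(explicit);
    }
    return Paths.get(requireMetadataDir(conf), "ratis");
  }

  // Snapshot dir: explicit value, else a directory next to (not inside)
  // the Ratis storage dir. The "/snapshot" default is an assumption.
  static Path snapshotDir(Map<String, String> conf) {
    String explicit = conf.get(SNAPSHOT_DIR_KEY);
    if (explicit != null && !explicit.isEmpty()) {
      return Paths.get(explicit);
    }
    return Paths.get(requireMetadataDir(conf), "snapshot");
  }

  private static String requireMetadataDir(Map<String, String> conf) {
    String dir = conf.get(METADATA_DIRS_KEY);
    if (dir == null || dir.isEmpty()) {
      throw new IllegalStateException(METADATA_DIRS_KEY + " is not set");
    }
    return dir;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(METADATA_DIRS_KEY, "/data/om");
    System.out.println("storage dir:  " + storageDir(conf));
    System.out.println("snapshot dir: " + snapshotDir(conf));
  }
}
```

With only ozone.metadata.dirs set, the two directories resolve to siblings, so
the Ratis storage directory holds nothing except group directories.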
was (Author: bharatviswa):
Hi [~cxorm]
I will take this up, since this error is confusing both our users and us
internally.
The proposal is:
Make ratis.snapshot.dir independent of ozone.om.ratis.storage.dir, so that
ratis.storage.dir does not contain any other directories.
Also, if ratis.storage.dir is not defined, we fall back to ozone.metadata.dirs
by default, so ratis.storage.dir becomes ozone.metadata.dirs + "/ratis".
For older clusters, the existing snapshot directory will be moved to the new
ratis.snapshot.dir.
> OzoneManager group init failed because of incorrect snapshot directory location
> -------------------------------------------------------------------------------
>
> Key: HDDS-3378
> URL: https://issues.apache.org/jira/browse/HDDS-3378
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager, test
> Reporter: Mukul Kumar Singh
> Assignee: YiSheng Lien
> Priority: Major
> Labels: MiniOzoneChaosCluster
>
> OzoneManager group init failed because of incorrect snapshot directory location
> {code}
> 2020-04-11 20:07:57,180 [pool-59-thread-1] INFO server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.storage.dir = [/tmp/chaos-2020-04-11-20-05-25-IST/MiniOzoneClusterImpl-80aafc97-1b12-4bc0-9baf-7f42185b0995/omNode-3/ratis] (custom)
> 2020-04-11 20:07:57,180 [pool-59-thread-1] INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$null$0(191)) - omNode-3: found a subdirectory /tmp/chaos-2020-04-11-20-05-25-IST/MiniOzoneClusterImpl-80aafc97-1b12-4bc0-9baf-7f42185b0995/omNode-3/ratis/snapshot
> 2020-04-11 20:07:57,181 [pool-59-thread-1] WARN impl.RaftServerProxy (RaftServerProxy.java:lambda$null$0(197)) - omNode-3: Failed to initialize the group directory /tmp/chaos-2020-04-11-20-05-25-IST/MiniOzoneClusterImpl-80aafc97-1b12-4bc0-9baf-7f42185b0995/omNode-3/ratis/snapshot. Ignoring it
> java.lang.IllegalArgumentException: Invalid UUID string: snapshot
> at java.util.UUID.fromString(UUID.java:194)
> at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$0(RaftServerProxy.java:192)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.ratis.server.impl.RaftServerProxy.lambda$initGroups$1(RaftServerProxy.java:189)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
> at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
> at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
> at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
> at org.apache.ratis.server.impl.RaftServerProxy.initGroups(RaftServerProxy.java:186)
> at org.apache.ratis.server.impl.ServerImplUtils.newRaftServer(ServerImplUtils.java:41)
> at org.apache.ratis.server.RaftServer$Builder.build(RaftServer.java:76)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.<init>(OzoneManagerRatisServer.java:277)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.newOMRatisServer(OzoneManagerRatisServer.java:328)
> at org.apache.hadoop.ozone.om.OzoneManager.initializeRatisServer(OzoneManager.java:1249)
> at org.apache.hadoop.ozone.om.OzoneManager.restart(OzoneManager.java:1190)
> at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
> at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:112)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
> at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
> at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
> at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
> at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
> at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:109)
> at org.apache.hadoop.ozone.failure.FailureManager.fail(FailureManager.java:58)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-04-11 20:07:57,182 [pool-59-thread-1] INFO impl.RaftServerProxy (RaftServerProxy.java:lambda$null$0(191)) - omNode-3: found a subdirectory /tmp/chaos-2020-04-11-20-05-25-IST/MiniOzoneClusterImpl-80aafc97-1b12-4bc0-9baf-7f42185b0995/omNode-3/ratis/b870c9eb-edfb-36b5-b758-d62218d261de
> 2020-04-11 20:07:57,183 [pool-59-thread-1] INFO impl.RaftServerProxy (RaftServerProxy.java:addNew(89)) - omNode-3: addNew group-D62218D261DE:[omNode-3:localhost:12408, omNode-1:localhost:12396, omNode-2:localhost:12402] returns group-D62218D261DE:java.util.concurrent.CompletableFuture@2fc3d657[Not completed]
> 2020-04-11 20:07:57,183 [pool-1382-thread-1] INFO impl.RaftServerImpl (RaftServerImpl.java:<init>(97)) - omNode-3: new RaftServerImpl for group-D62218D261DE:[omNode-3:localhost:12408, omNode-1:localhost:12396, omNode-2:localhost:12402] with OzoneManagerStateMachine:uninitialized
> {code}
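
The trace above suggests that each subdirectory name under the Ratis storage
directory is parsed as a group UUID, which is why a directory named "snapshot"
triggers the warning. A minimal sketch of that check follows; the class
GroupDirNames and its method are hypothetical and only mimic the behaviour
seen in the log, they are not the Ratis implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration of the failure in the log; not Ratis code.
final class GroupDirNames {
  private GroupDirNames() { }

  // Returns true if the directory name can be parsed as a group UUID.
  static boolean isGroupDirName(String name) {
    try {
      UUID.fromString(name);
      return true;
    } catch (IllegalArgumentException e) {
      // Same exception type as in the trace: "Invalid UUID string: snapshot".
      return false;
    }
  }

  public static void main(String[] args) {
    // The two subdirectory names that appear in the log above.
    List<String> names =
        Arrays.asList("snapshot", "b870c9eb-edfb-36b5-b758-d62218d261de");
    for (String name : names) {
      System.out.println(name + " -> group dir? " + isGroupDirName(name));
    }
  }
}
```

Keeping non-group directories such as the snapshot directory out of the storage
directory avoids this parse failure altogether.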