[ 
https://issues.apache.org/jira/browse/HDDS-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842090#comment-17842090
 ] 

Hemant Kumar commented on HDDS-10739:
-------------------------------------

As noted in the previous comments, the OM crashed because the Ratis 
StateMachineUpdater could not apply a log entry. The failure comes from this 
[precondition check|https://github.com/apache/ratis/blob/release-2.5.1/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1727] 
in RaftServerImpl.replyPendingRequest:
{code:java}
...
2024-04-22 13:16:20,573 WARN [IPC Server handler 48 on 9862]-org.apache.hadoop.ipc.Server: IPC Server handler 48 on 9862, call Call#4 Retry#3 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.17.207.21:56466: output error
2024-04-22 13:16:20,573 ERROR [om131@group-2BC026ED99AC-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater: om131@group-2BC026ED99AC-StateMachineUpdater caught a Throwable.
org.apache.ratis.server.raftlog.RaftLogIOException: java.lang.IllegalStateException: retry cache entry should be pending: 4@client-5DDA567B3E84:done
        at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1780)
        at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
        at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: retry cache entry should be pending: 4@client-5DDA567B3E84:done
        at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
        at org.apache.ratis.server.impl.RaftServerImpl.replyPendingRequest(RaftServerImpl.java:1727)
        at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1778)
        ... 3 more
... {code}
This precondition has been changed and fixed in RATIS-1873 
([PR#904|https://github.com/apache/ratis/pull/904]).
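
For readers unfamiliar with this failure mode, here is a minimal, self-contained sketch of how a strict "entry must still be pending" check can fail when the same client call is applied after its retry cache entry has already completed. This is not the Ratis implementation: the class RetryCacheSketch, its Entry type, and the applyStrict method are hypothetical names invented for illustration, and only the exception message and the key format are modeled on the log above.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a retry cache; NOT the Ratis code.
public class RetryCacheSketch {

  // One entry per client call, keyed as "<callId>@<clientId>".
  static final class Entry {
    private final String key;
    private final CompletableFuture<String> reply = new CompletableFuture<>();

    Entry(String key) {
      this.key = key;
    }

    boolean isDone() {
      return reply.isDone();
    }

    void complete(String value) {
      reply.complete(value);
    }

    @Override
    public String toString() {
      return key + ":" + (isDone() ? "done" : "pending");
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();

  Entry getOrCreate(String key) {
    return cache.computeIfAbsent(key, Entry::new);
  }

  // Strict apply: assumes the entry can never be completed before this point.
  // If it already is, the check throws and the caller cannot apply the log entry.
  void applyStrict(String key, String value) {
    Entry entry = getOrCreate(key);
    if (entry.isDone()) {
      throw new IllegalStateException(
          "retry cache entry should be pending: " + entry);
    }
    entry.complete(value);
  }

  public static void main(String[] args) {
    RetryCacheSketch cache = new RetryCacheSketch();
    String key = "4@client-5DDA567B3E84";

    cache.applyStrict(key, "reply");   // first apply: entry is pending, succeeds
    try {
      cache.applyStrict(key, "reply"); // second apply: entry is already done
    } catch (IllegalStateException e) {
      // prints: retry cache entry should be pending: 4@client-5DDA567B3E84:done
      System.out.println(e.getMessage());
    }
  }
}
```

In this sketch the exception is caught by the caller. In the OM log above, the equivalent exception surfaced in the StateMachineUpdater thread as a Throwable, which is why the OM went down.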

> OM down to InterruptedException 'Unable to process metadata snapshot request'
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-10739
>                 URL: https://issues.apache.org/jira/browse/HDDS-10739
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM, Snapshot
>            Reporter: Jyotirmoy Sinha
>            Priority: Major
>              Labels: ozone-snapshot
>
> Scenario :
>  * Generate data over parallel threads over various volume/buckets
>  * Perform parallel snapshot create/delete/list operations over above buckets
>  * Perform parallel snapdiff operations over each bucket
>  * Perform parallel read operations of snapshot contents
>  * Introduce OM and cluster restarts in between along with DN decommissioning 
> and balancer restarts.
> OM Error Stacktrace -
> {code:java}
> 2024-04-22 13:18:25,071 INFO 
> [om131@group-2BC026ED99AC-StateMachineUpdater]-org.eclipse.jetty.server.handler.ContextHandler:
>  Stopped 
> o.e.j.s.ServletContextHandler@74e87f03{logs,/logs,file:///var/log/hadoop-ozone/,STOPPED}
> 2024-04-22 13:18:40,074 ERROR 
> [qtp560715723-676269]-org.apache.hadoop.hdds.utils.DBCheckpointServlet: 
> Unable to process metadata snapshot request.
> java.lang.InterruptedException
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>         at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
>         at 
> org.apache.hadoop.ozone.lock.BootstrapStateHandler$Lock.lock(BootstrapStateHandler.java:31)
>         at 
> org.apache.hadoop.ozone.om.OMDBCheckpointServlet$Lock.lock(OMDBCheckpointServlet.java:654)
>         at 
> org.apache.hadoop.hdds.utils.DBCheckpointServlet.generateSnapshotCheckpoint(DBCheckpointServlet.java:197)
>         at 
> org.apache.hadoop.hdds.utils.DBCheckpointServlet.doGet(DBCheckpointServlet.java:303)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>         at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>         at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>         at 
> org.apache.hadoop.hdds.server.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1681)
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>         at 
> org.apache.hadoop.hdds.server.http.NoCacheFilter.doFilter(NoCacheFilter.java:48)
>         at 
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>         at 
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>         at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
>         at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>         at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
>         at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
>         at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>         at org.eclipse.jetty.server.Server.handle(Server.java:516)
>         at 
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
>         at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
>         at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
>         at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
>         at 
> org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
>         at 
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
>         at 
> org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
>         at 
> org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
>         at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
>         at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
>         at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
>         at 
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
>         at 
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
>         at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-04-22 13:18:40,074 ERROR 
> [qtp560715723-676994]-org.apache.hadoop.hdds.utils.DBCheckpointServlet: 
> Unable to process metadata snapshot request.
> java.lang.InterruptedException {code}


