panhongan opened a new issue, #15468:
URL: https://github.com/apache/druid/issues/15468

   ### Affected Version
   0.19.0 -> 0.23.0
   
   ### Description
   When a Kafka ingestion task hits the `maxRowsPerSegment` limit, it sends a `CheckpointAction` to the supervisor.
   The supervisor then performs the checkpoint by calling `taskClient.pauseAsync()` and `taskClient.setEndOffsetsAsync()`.
   During the `pause` stage, however, the supervisor receives this exception:
   `
   2023-11-30T03:40:40,645 WARN [IndexTaskClient-datasource1-1] org.apache.druid.indexing.common.IndexTaskClient - Exception while sending request
   org.apache.druid.java.util.common.IAE: Received 400 Bad Request with body: Can't pause, task is not in a pausable state (state: [PAUSED])
   `
   The supervisor then kills the ingestion task.
   
   We have seen this issue in production on every version from 0.19.0 through 0.23.0, and it happens every day.
   
   Could you help look into why this happens?
   
   The exception comes from this code in `Seekable`:
   `  @VisibleForTesting
     public Response pause() throws InterruptedException
     {
       if (!(status == Status.PAUSED || status == Status.READING)) {
         return Response.status(Response.Status.BAD_REQUEST)
                        .entity(StringUtils.format("Can't pause, task is not in a pausable state (state: [%s])", status))
                        .build();
       }
   
       // ... remaining code omitted
     }
   `
   
   According to the log message, the `status` value was `PAUSED`, yet the `if` branch was still taken, which looks very odd.
   
   The `status` variable is declared `volatile`. I have not been able to figure out the root cause.
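
One possible reading, offered as an unconfirmed sketch rather than a diagnosis: `volatile` guarantees visibility of each individual read, but the quoted method reads `status` more than once (in the `if` condition and again when formatting the message). If another thread changes `status` between those reads, the check can fail on one value while the message prints a different one, such as `PAUSED`. The class below is hypothetical and is not Druid code: `PauseRaceSketch`, the `STARTING` and `PUBLISHING` values, and the `interleaving` hook (which stands in for a concurrent write) are assumptions made for illustration.

```java
public class PauseRaceSketch {
    // READING and PAUSED appear in the quoted code; the other values are placeholders.
    public enum Status { STARTING, READING, PAUSED, PUBLISHING }

    private volatile Status status = Status.STARTING;

    public void setStatus(Status newStatus) {
        status = newStatus;
    }

    // Same shape as the quoted check: the field is read in the condition
    // and read again when the message is built.
    public String pauseWithRepeatedReads(Runnable interleaving) {
        if (!(status == Status.PAUSED || status == Status.READING)) {
            interleaving.run(); // simulates another thread writing status here
            return String.format(
                "Can't pause, task is not in a pausable state (state: [%s])", status);
        }
        return "OK";
    }

    // Variant that reads the field once, so the check and the message agree.
    public String pauseWithSnapshot(Runnable interleaving) {
        final Status current = status;
        if (!(current == Status.PAUSED || current == Status.READING)) {
            interleaving.run(); // the concurrent write no longer affects the message
            return String.format(
                "Can't pause, task is not in a pausable state (state: [%s])", current);
        }
        return "OK";
    }

    public static void main(String[] args) {
        PauseRaceSketch repeated = new PauseRaceSketch();
        System.out.println(
            repeated.pauseWithRepeatedReads(() -> repeated.setStatus(Status.PAUSED)));

        PauseRaceSketch snapshot = new PauseRaceSketch();
        System.out.println(
            snapshot.pauseWithSnapshot(() -> snapshot.setStatus(Status.PAUSED)));
    }
}
```

In this sketch the first call reports `PAUSED` even though the check failed on a different value, while the snapshot variant reports the value that actually failed the check. Whether this interleaving is what happens in the real task would need to be confirmed against the task logs.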
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

