jon-wei commented on a change in pull request #7428: Add errors and state to stream supervisor status API endpoint
URL: https://github.com/apache/incubator-druid/pull/7428#discussion_r287177797
##########
File path: docs/content/development/extensions-core/kafka-ingestion.md
##########
@@ -214,12 +214,38 @@ offsets as reported by Kafka, the consumer lag per partition, as well as the agg
 consumer lag per partition may be reported as negative values if the supervisor has not received a recent latest offset response from Kafka. The aggregate lag value will always be >= 0.
 
+The status report also contains the supervisor's state and a list of recently thrown exceptions (whose max size can be
+controlled using the `druid.supervisor.maxStoredExceptionEvents` config parameter). The list of states is as
+follows:
+
+|State|Description|
+|-----|-----------|
+|UNHEALTHY_SUPERVISOR|The supervisor has encountered errors on the past `druid.supervisor.unhealthinessThreshold` iterations|
+|UNHEALTHY_TASKS|The last `druid.supervisor.taskUnhealthinessThreshold` tasks have all failed|
+|UNABLE_TO_CONNECT_TO_STREAM|The supervisor is encountering connectivity issues with Kafka and has not successfully connected in the past|
+|LOST_CONTACT_WITH_STREAM|The supervisor is encountering connectivity issues with Kafka but has successfully connected in the past|
+|PENDING (first iteration only)|The supervisor has been initialized and hasn't started connecting to the stream|
+|CONNECTING_TO_STREAM (first iteration only)|The supervisor is trying to connect to the stream and update partition data|
+|DISCOVERING_INITIAL_TASKS (first iteration only)|The supervisor is discovering already-running tasks|
+|CREATING_TASKS (first iteration only)|The supervisor is creating tasks and discovering state|
+|RUNNING|The supervisor has started tasks and is waiting for taskDuration to elapse|
+|SUSPENDED|The supervisor has been suspended|
+|STOPPING|The supervisor is stopping|
+
+States marked with "first iteration only" only occur on the supervisor's first iteration at startup or after suspension.
Review comment:
Suggest adding a short high-level summary of the Kafka/Kinesis supervisor's `runInternal()` loop. The information is implicit in the ordering of the states above, but I think a more explicit description of the per-iteration sequence would be useful.
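
As a rough illustration of the kind of explicit sequence the review comment asks for, the states in the table could be modelled as below. This is only a sketch: `SupervisorState`, `FIRST_ITERATION_ONLY`, `PROBLEM_STATES`, and the helper functions are hypothetical names, not Druid classes, and the startup ordering assumes that the table order of the "first iteration only" states mirrors the order in which `runInternal()` passes through them, which the diff does not state outright.

```python
from enum import Enum


class SupervisorState(Enum):
    """Hypothetical model of the states listed in the documentation table."""

    UNHEALTHY_SUPERVISOR = "UNHEALTHY_SUPERVISOR"
    UNHEALTHY_TASKS = "UNHEALTHY_TASKS"
    UNABLE_TO_CONNECT_TO_STREAM = "UNABLE_TO_CONNECT_TO_STREAM"
    LOST_CONTACT_WITH_STREAM = "LOST_CONTACT_WITH_STREAM"
    PENDING = "PENDING"
    CONNECTING_TO_STREAM = "CONNECTING_TO_STREAM"
    DISCOVERING_INITIAL_TASKS = "DISCOVERING_INITIAL_TASKS"
    CREATING_TASKS = "CREATING_TASKS"
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    STOPPING = "STOPPING"


# States the table marks "first iteration only", kept in table order.
# Assumption: this order matches the sequence within the first iteration.
FIRST_ITERATION_ONLY = (
    SupervisorState.PENDING,
    SupervisorState.CONNECTING_TO_STREAM,
    SupervisorState.DISCOVERING_INITIAL_TASKS,
    SupervisorState.CREATING_TASKS,
)

# Assumption of this sketch: the four error/connectivity states are grouped
# together as "problem" states; the diff itself defines no such grouping.
PROBLEM_STATES = frozenset(
    {
        SupervisorState.UNHEALTHY_SUPERVISOR,
        SupervisorState.UNHEALTHY_TASKS,
        SupervisorState.UNABLE_TO_CONNECT_TO_STREAM,
        SupervisorState.LOST_CONTACT_WITH_STREAM,
    }
)


def is_first_iteration_only(state: SupervisorState) -> bool:
    """True for states seen only at startup or after suspension."""
    return state in FIRST_ITERATION_ONLY


def needs_attention(state: SupervisorState) -> bool:
    """True for the states this sketch treats as problems."""
    return state in PROBLEM_STATES


def expected_startup_sequence() -> list:
    """Assumed error-free path through the first iteration, ending in RUNNING."""
    return list(FIRST_ITERATION_ONLY) + [SupervisorState.RUNNING]


if __name__ == "__main__":
    print(" -> ".join(s.name for s in expected_startup_sequence()))
```

A per-iteration summary in the docs could follow the same shape: list the first-iteration states in order, then note that later iterations report `RUNNING` unless one of the error or connectivity conditions applies.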