jadami10 commented on issue #7864: URL: https://github.com/apache/pinot/issues/7864#issuecomment-987564287
I've doubly confirmed this. You get:

- 200s back while the broker is draining
- connection refused while it's down
- connection refused while the service is starting
- 200s after the `PinotServiceManager` starts
- 503s once the broker starts building the routing table
- 200s once the broker is done building the routing table

I think the problem is that the `PinotServiceManager` registers its own `PinotServiceManagerStatusCallback`, which returns `true`/`200` as soon as it's done starting. Then, once the broker thread starts, the status goes back to `503`. The `/health` endpoint just blindly looks at every health callback that is registered, so there's a window where the only one registered is the `PinotServiceManager`'s.

It looks like this behavior was added in #5266, so it's been around a while, but I'm not sure it's correct. Either the `PinotServiceManager` needs to not register a health check at all, or the broker health check needs to make sure the health check for its own instanceId is passing.
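To make the window concrete, here is a minimal sketch of the sequence described above. It is a toy Python model, not Pinot's actual code: `HealthRegistry`, `status_all`, `status_requiring`, and the instance id `Broker_example` are hypothetical names chosen for illustration. `status_all` mimics an endpoint that only checks whatever callbacks happen to be registered, while `status_requiring` shows the second proposed fix, where the broker's own instance id must be registered and passing.

```python
from typing import Callable, Dict, List, Tuple


class HealthRegistry:
    """Toy stand-in for a registry of per-component health callbacks."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[[], bool]] = {}

    def register(self, name: str, callback: Callable[[], bool]) -> None:
        self._callbacks[name] = callback

    def status_all(self) -> int:
        # Looks only at the callbacks registered so far.
        # A component that has not registered yet cannot fail the check.
        return 200 if all(cb() for cb in self._callbacks.values()) else 503

    def status_requiring(self, instance_id: str) -> int:
        # Hypothetical fix: the named instance must be registered and healthy.
        callback = self._callbacks.get(instance_id)
        if callback is None or not callback():
            return 503
        return self.status_all()


def simulate() -> List[Tuple[str, int, int]]:
    registry = HealthRegistry()
    broker_id = "Broker_example"  # assumed instance id, for illustration only
    broker_ready = {"value": False}
    timeline = []

    def record(phase: str) -> None:
        timeline.append(
            (phase, registry.status_all(), registry.status_requiring(broker_id))
        )

    # 1. The service manager starts and registers a callback that passes at once.
    registry.register("PinotServiceManager", lambda: True)
    record("service manager started")

    # 2. The broker thread starts and registers its own, not yet passing, callback.
    registry.register(broker_id, lambda: broker_ready["value"])
    record("broker building routing table")

    # 3. The broker finishes building the routing table.
    broker_ready["value"] = True
    record("routing table built")
    return timeline


if __name__ == "__main__":
    for phase, current, stricter in simulate():
        print(f"{phase}: all-registered={current}, require-instance={stricter}")
```

In this model the all-registered check goes 200, 503, 200 across the three phases, which matches the flapping observed above, while the check that requires the broker's instance id stays at 503 until the routing table is built.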
