jadami10 commented on issue #7864:
URL: https://github.com/apache/pinot/issues/7864#issuecomment-987564287


   I've doubly confirmed this. You get:
   - 200s back while the broker is draining. 
   - connection refused while it's down
   - connection refused while the service is starting
   - 200s after the `PinotServiceManager` starts
   - 503s once the broker starts building the routing table
   - 200s once the broker is done building the routing table
   
   I think the problem is that the `PinotServiceManager` registers its own 
`PinotServiceManagerStatusCallback`, which returns `true`/`200` as soon as the 
service manager has started. Then, once the broker thread starts, `/health` 
goes back to `503`. The `/health` endpoint just blindly checks every health 
callback currently registered, so there's a window where the only one is the 
`PinotServiceManager`'s. 
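
   To make the window concrete, here's a rough sketch of an "all registered callbacks must pass" check. This is a simplified model, not Pinot's actual code: `HealthRegistry`, `register`, `healthStatusCode`, and the instance ids are all names I made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of a health endpoint that aggregates callbacks.
// None of these names come from the Pinot codebase.
class HealthRegistry {
    private final Map<String, BooleanSupplier> callbacks = new LinkedHashMap<>();

    void register(String instanceId, BooleanSupplier callback) {
        callbacks.put(instanceId, callback);
    }

    // Healthy only if every callback registered *so far* passes.
    // A component that has not registered yet is simply invisible here.
    int healthStatusCode() {
        for (BooleanSupplier callback : callbacks.values()) {
            if (!callback.getAsBoolean()) {
                return 503;
            }
        }
        return 200;
    }
}

class HealthWindowDemo {
    public static void main(String[] args) {
        HealthRegistry registry = new HealthRegistry();
        boolean[] routingTableBuilt = {false};

        // 1. The service manager registers first and reports healthy right away.
        registry.register("serviceManager", () -> true);
        System.out.println(registry.healthStatusCode()); // 200 (the misleading window)

        // 2. The broker registers later, while its routing table is still building.
        registry.register("broker", () -> routingTableBuilt[0]);
        System.out.println(registry.healthStatusCode()); // 503

        // 3. The routing table is done.
        routingTableBuilt[0] = true;
        System.out.println(registry.healthStatusCode()); // 200
    }
}
```

   The first `200` is the problem: nothing about the broker has been checked yet, but a load balancer polling `/health` can't tell the difference.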
   
   It looks like this behavior was added in #5266, so it's been around a while, 
but I'm not sure it's correct. Either the `PinotServiceManager` shouldn't 
register a health check at all, or the broker health check needs to make sure 
the health check for its own instanceId is passing.
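
   A minimal sketch of the second option, assuming the broker's health endpoint knows its own instanceId. Again, this is a hypothetical model rather than Pinot's real API: `InstanceAwareHealthCheck`, its methods, and the example instance ids are invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

// Hypothetical: a health check that refuses to report healthy until the
// callback for a specific instance id is both registered and passing.
class InstanceAwareHealthCheck {
    private final Map<String, BooleanSupplier> callbacks = new ConcurrentHashMap<>();

    void register(String instanceId, BooleanSupplier callback) {
        callbacks.put(instanceId, callback);
    }

    int healthStatusCode(String requiredInstanceId) {
        BooleanSupplier own = callbacks.get(requiredInstanceId);
        // A missing callback counts as unhealthy, which closes the startup window.
        if (own == null || !own.getAsBoolean()) {
            return 503;
        }
        // Still require every other registered callback to pass.
        for (BooleanSupplier callback : callbacks.values()) {
            if (!callback.getAsBoolean()) {
                return 503;
            }
        }
        return 200;
    }
}
```

   With this shape, registering only a passing service-manager callback and then asking for `healthStatusCode("Broker_example")` gives `503` until a callback under that id exists and returns `true`.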


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


