Shashankft9 commented on issue #650:
URL: https://github.com/apache/solr-operator/issues/650#issuecomment-2532505127

   I hear your concern. Maybe I can give some examples of status in other CRs 
that help explain where I am coming from.
   
   Example 1: 
   The status of a Strimzi Kafka CR looks something like this:
   ```
   status:
     clusterId: ETqrvQ33S9uL5gshnirFPQ
     conditions:
     - lastTransitionTime: "2024-09-27T07:23:13.389Z"
       message: Exceeded timeout of 300000ms while waiting for Deployment resource dev-hsp-01-entity-operator
         in namespace hsp-01 to be ready
       reason: TimeoutException
       status: "True"
       type: NotReady
     listeners:
     - addresses:
       - host: hsp-01.sample.com
         port: 30442
       bootstrapServers: hsp-01.sample.com:30442
       name: external
       type: external
     observedGeneration: 1
   ```
   The idea is that `observedGeneration` should always end up matching 
`metadata.generation`. It does not give me granular insight, but it gives an 
overall view: when I change the spec (which increments `metadata.generation`), 
I can see that the operator worked on that change and, in this case, reported 
an error. At any point in time `T`, I can verify that the operator actually 
acted on the last CR change by comparing the two values. Of course, it does not 
give me all the details of what changed in between, but if I changed values x, 
y and z in one edit, it shows, together with the conditions, whether the 
operator successfully processed all of them.
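
   To make the comparison concrete, here is a rough Python sketch. It is not taken from any operator's code; the function name `spec_observed` is made up for illustration, and the resource is assumed to be already loaded as a plain dict (for example from `kubectl get ... -o json`).

```python
def spec_observed(resource: dict) -> bool:
    """Return True when the controller has acted on the latest spec change."""
    metadata = resource.get("metadata") or {}
    status = resource.get("status") or {}
    generation = metadata.get("generation")
    observed = status.get("observedGeneration")
    # A missing status (nothing reconciled yet) counts as "not observed".
    return generation is not None and observed == generation


# Hypothetical resource: the spec was edited (generation 2), but the
# controller has only reported on generation 1 so far.
kafka = {
    "metadata": {"generation": 2},
    "status": {"observedGeneration": 1},
}
print(spec_observed(kafka))  # False
```

   Note that a match only says the controller looked at the latest spec; whether it succeeded still has to be read from the conditions.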
   
   Example 2: 
   We can take a look at something more complex, like a Knative Service CR 
status, where a change in spec actually causes the Knative controllers to 
update 4-5 other CRs in the cluster, all of which are captured in a status like 
this:
   
   ```
   status:
     address:
       url: http://healthchecks.metacontroller.svc.cluster.local
     conditions:
     - lastTransitionTime: "2024-03-14T09:31:36Z"
       status: "True"
       type: ConfigurationsReady
     - lastTransitionTime: "2024-03-14T09:31:37Z"
       status: "True"
       type: Ready
     - lastTransitionTime: "2024-03-14T09:31:37Z"
       status: "True"
       type: RoutesReady
     latestCreatedRevisionName: healthchecks-00006
     latestReadyRevisionName: healthchecks-00006
     observedGeneration: 6
     traffic:
     - latestRevision: true
       percent: 100
       revisionName: healthchecks-00006
     url: http://healthchecks.metacontroller.svc.cluster.local
   ```
   Looking at this status, I know the Knative controllers worked on the 
last CR spec change, because `observedGeneration` is equal to what's there 
in `metadata.generation`. 
   
   I may not be able to articulate it well, but I recently saw a good talk on 
Kubernetes CRD status design. Just the first few minutes, where Scott covers 
the TL;DR, might help: https://youtu.be/iNp-fsffIgQ?t=70
   
   >is that representative of the sort of things you don't have a good way to 
poll for?
   
   We wanted to make the "ack-to-provisioner" process event-driven instead of 
polling. If we go the polling route, we will have to write custom logic for 
each operator-based service, whereas implementing something at the CR status 
level will save us a lot of code maintenance, since I think most of the other 
operators already follow this pattern.
   
   >It feels like observedGeneration might send a wrong (or at least 
incomplete?) signal to your provisioner/UI much of the time. Have you thought 
much about those scenarios? Is there maybe something I'm missing?
   
   Yes, we have thought about this. To handle it, we currently send the ack 
only for success scenarios, specifically when a condition like this becomes 
true: 
   `are all the condition statuses true && is the observedGeneration equal to 
metadata.generation && is the observedGeneration value greater than the last 
observedGeneration for which we sent an ack` 
   This takes care of scenarios where the instance is created for the first 
time and/or resized.
   We don't really use `lastTransitionTime` in the logic above, but I added it 
to the issue because, generally speaking, the timestamp helps with other 
day-to-day operational activities. 
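
   As a rough illustration of that check (a sketch, not our actual provisioner code), something like the following Python would do. The name `should_send_ack` and the `last_acked_generation` parameter are hypothetical. Treating an empty conditions list as "not ready" is an assumption on my side, and so is the idea that every condition type has positive polarity.

```python
def should_send_ack(resource: dict, last_acked_generation: int) -> bool:
    """Decide whether a success ack should be sent for this resource."""
    metadata = resource.get("metadata") or {}
    status = resource.get("status") or {}
    generation = metadata.get("generation")
    observed = status.get("observedGeneration")
    conditions = status.get("conditions") or []

    if generation is None or observed is None:
        return False

    # Assumption: no conditions at all means the controller has not reported yet.
    # Assumption: all condition types are positive (e.g. Ready); a type such as
    # NotReady would have to be inverted before this check.
    all_true = bool(conditions) and all(
        c.get("status") == "True" for c in conditions
    )
    return (
        all_true
        and observed == generation
        and observed > last_acked_generation
    )


# Hypothetical resource shaped like the Knative status above.
service = {
    "metadata": {"generation": 6},
    "status": {
        "observedGeneration": 6,
        "conditions": [
            {"type": "ConfigurationsReady", "status": "True"},
            {"type": "Ready", "status": "True"},
            {"type": "RoutesReady", "status": "True"},
        ],
    },
}
print(should_send_ack(service, last_acked_generation=5))  # True
print(should_send_ack(service, last_acked_generation=6))  # False
```

   The last comparison is what keeps us from acking the same generation twice when the status is rewritten without a spec change.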
   
   Apologies if I have strayed from what you actually asked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

