vsantwana opened a new pull request, #1146:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1146
## What is the purpose of the change
`status.taskManager.replicas` previously held a value computed from the spec at reconcile time (`ceil(parallelism / slots)`). This is inaccurate for deployments where the number of TaskManagers fluctuates dynamically rather than being fixed by configuration, most notably standalone clusters running in reactive mode.
This change makes the field report the **actual** number of TaskManagers registered with the running Flink cluster, sourced from the Flink REST API (`/taskmanagers`), for both application and session deployments.
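The sketch below contrasts the two definitions. The class and method names are hypothetical and are not operator code. The observed count is modelled as a plain list of TaskManager IDs, standing in for the entries that `/taskmanagers` returns.

```java
import java.util.List;

/** Hypothetical helper contrasting the old and new meaning of status.taskManager.replicas. */
final class TaskManagerCount {

    private TaskManagerCount() {}

    /** Old behaviour: ceil(parallelism / slots), derived from the spec at reconcile time. */
    static int specDerivedReplicas(int parallelism, int slotsPerTaskManager) {
        if (parallelism < 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("parallelism must be >= 0 and slots must be > 0");
        }
        // Integer ceiling division without the overflow risk of (p + s - 1) / s.
        return parallelism / slotsPerTaskManager
                + (parallelism % slotsPerTaskManager == 0 ? 0 : 1);
    }

    /** New behaviour: count the TaskManagers the running cluster actually reports. */
    static int observedReplicas(List<String> registeredTaskManagerIds) {
        // Stand-in for the entries returned by the /taskmanagers REST endpoint.
        return registeredTaskManagerIds.size();
    }
}
```

Consider `parallelism = 10` with 4 slots per TaskManager. The spec-derived value is 3. If a reactive-mode cluster has scaled to 5 registered TaskManagers, only the observed count reports 5.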
## Brief change log
- Added `FlinkService#getTaskManagerReplicas(Configuration)`, implemented in `AbstractFlinkService` via the existing `/taskmanagers` REST call; `getClusterInfo` now routes through it (single source of truth).
- `AbstractFlinkDeploymentObserver#observeClusterInfo` now sets `status.taskManager` (label selector + actual replica count) whenever the JobManager is ready, covering both application and session clusters.
- `ReconciliationUtils#getTaskManagerInfo` no longer derives the count from the spec. At reconcile time it only sets the label selector (replicas stay `0` until the cluster is observed), and it still clears the info when the cluster is not running.
- Extracted the duplicated TaskManager label selector into `FlinkUtils#getTaskManagerLabelSelector`.
- Updated the `TaskManagerInfo.replicas` JavaDoc and the generated CRD reference docs to reflect the new semantics.
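Taken together, these changes define a small state flow for `status.taskManager`. The sketch below models that flow with simplified, hypothetical stand-ins:

- `TaskManagerInfo` is a two-field record rather than the real CRD class.
- `ReplicaSource` takes the place of `FlinkService#getTaskManagerReplicas`.
- `null` models the cleared info.
- The label keys and values are assumptions modelled on Flink's native Kubernetes pod labels. They are not necessarily what `FlinkUtils#getTaskManagerLabelSelector` produces.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical, simplified model of how status.taskManager is populated (not operator code). */
final class TaskManagerStatusSketch {

    private TaskManagerStatusSketch() {}

    /** Stand-in for the status.taskManager block: label selector plus replica count. */
    record TaskManagerInfo(String labelSelector, int replicas) {}

    /** Stand-in for FlinkService#getTaskManagerReplicas (backed by GET /taskmanagers in the PR). */
    @FunctionalInterface
    interface ReplicaSource {
        int getTaskManagerReplicas();
    }

    /** Single shared selector, in the spirit of FlinkUtils#getTaskManagerLabelSelector. */
    static String taskManagerLabelSelector(String clusterId) {
        // Assumed labels; a TreeMap keeps the output order deterministic.
        Map<String, String> labels = new TreeMap<>();
        labels.put("app", clusterId);
        labels.put("component", "taskmanager");
        labels.put("type", "flink-native-kubernetes");
        return labels.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    /** Reconcile time: only the selector is set and replicas stay 0; null (cleared) when not running. */
    static TaskManagerInfo atReconcile(String clusterId, boolean running) {
        return running ? new TaskManagerInfo(taskManagerLabelSelector(clusterId), 0) : null;
    }

    /** Observe time: once the JobManager is ready, report the count the cluster actually has. */
    static TaskManagerInfo atObserve(
            String clusterId, boolean jobManagerReady, ReplicaSource source, TaskManagerInfo current) {
        if (!jobManagerReady) {
            // In this sketch, nothing is observed until the JobManager is ready.
            return current;
        }
        return new TaskManagerInfo(
                taskManagerLabelSelector(clusterId), source.getTaskManagerReplicas());
    }
}
```

Because the reconciler and the observer share one selector helper, the two code paths cannot build different selectors. That is presumably the reason the duplicated selector was extracted.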
Note: this changes the semantics of `status.taskManager.replicas` from a spec-derived value to the observed cluster state.
## Verifying this change
This change added tests and can be verified as follows:
- `ApplicationObserverTest#observeReportsActualTaskManagerReplicas` verifies that the status reflects the cluster's actual TaskManager count (independent of the spec-derived value) and tracks changes across observations.
- `TestingFlinkService` gained an overridable `taskManagerReplicas` stub (defaulting to a converged cluster, i.e. the spec-derived count).
- Existing controller, observer, service, and metrics tests were updated or confirmed to pass.
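The sketch below shows the overridable-stub pattern. `FakeFlinkService` and `FakeObserver` are hypothetical stand-ins for `TestingFlinkService` and the observer, reduced to the replica count. The nullable `Integer` override is an assumption about how such a stub might be shaped, not the actual test code.

```java
/** Hypothetical stand-in for TestingFlinkService, reduced to the replica count. */
final class FakeFlinkService {
    private final int parallelism;
    private final int slotsPerTaskManager;
    // null means "not overridden": behave like a converged cluster.
    private Integer taskManagerReplicas;

    FakeFlinkService(int parallelism, int slotsPerTaskManager) {
        this.parallelism = parallelism;
        this.slotsPerTaskManager = slotsPerTaskManager;
    }

    /** Tests call this to simulate TaskManagers joining or leaving; pass null to reset. */
    void setTaskManagerReplicas(Integer replicas) {
        this.taskManagerReplicas = replicas;
    }

    int getTaskManagerReplicas() {
        if (taskManagerReplicas != null) {
            return taskManagerReplicas;
        }
        // Default: the spec-derived count, i.e. ceil(parallelism / slots).
        return parallelism / slotsPerTaskManager
                + (parallelism % slotsPerTaskManager == 0 ? 0 : 1);
    }
}

/** Hypothetical observer that copies the observed count into a status field. */
final class FakeObserver {
    int statusReplicas; // stand-in for status.taskManager.replicas

    void observe(FakeFlinkService service) {
        statusReplicas = service.getTaskManagerReplicas();
    }
}
```

With `parallelism = 10` and 4 slots, a first observation using the default yields 3. If the stub is then overridden to 5, the next observation yields 5. This is the kind of change tracking the new observer test is described as checking.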
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no (the field already exists; only its reported value changes)
- Core observer or reconciler logic that is regularly executed: yes
## Documentation
- Does this pull request introduce a new feature? no (behavior change to an existing status field)
- If yes, how is the feature documented? JavaDocs / docs (CRD reference updated)