pchoudhury22 opened a new pull request, #966: URL: https://github.com/apache/flink-kubernetes-operator/pull/966
### What is the purpose of the change Currently, target parallelism computation assumes perfect linear scaling. However, real-time workloads often exhibit nonlinear scalability due to factors like network overhead and coordination costs. This change introduces an observed scalability coefficient, estimated using weighted linear regression on past (parallelism, processing rate) data, to improve the accuracy of scaling decisions. ### Brief change log Implemented a dynamic scaling coefficient to compute target parallelism based on observed scalability. The system estimates the scalability coefficient using a weighted least squares linear regression approach, leveraging historical (parallelism, processing rate) data. The regression model minimizes the weighted sum of squared errors, where weights are assigned based on both parallelism and recency to prioritize recent observations. The baseline processing rate is computed using the smallest observed parallelism in the history. Model details: #### The Linear Model We define a linear relationship between **parallelism (P)** and **processing rate (R)**: ```math R_i = β * P_i * α ``` where: - **R_i** = actual processing rate for the *i-th* data point - **P_i** = parallelism for the *i-th* data point - **β** = base factor (constant scale factor) - **α** = scaling coefficient to optimize #### Weighted Squared Error The loss function to minimize is the **weighted sum of squared errors (SSE)**: ```math Loss = Σ w_i * (R_i - R̂_i)^2 ``` Substituting \( R̂_i = (β α) P_i \): ```math Loss = Σ w_i * (R_i - β α P_i)^2 ``` where **w_i** is the weight for each data point. #### Minimizing the Error Expanding \( (R_i - β α P_i)^2 \): ```math (R_i - β α P_i)^2 = R_i^2 - 2β α P_i R_i + (β α P_i)^2 ``` Multiplying by **w_i** and summing over all data points: ```math Loss = Σ w_i * (R_i^2 - 2β α P_i R_i + β^2 α^2 P_i^2) ``` #### Solving for α To minimize for **α**, taking the derivative and solving we get: ```math α = (Σ w_i P_i R_i) / (Σ w_i P_i^2 * β) ``` ### Verifying this change New unit tests added to cover this ### Does this pull request potentially affect one of the following parts: Dependencies (does it add or upgrade a dependency): no The public API, i.e., is any changes to the CustomResourceDescriptors: no Core observer or reconciler logic that is regularly executed: no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
