dongjoon-hyun opened a new pull request, #644:
URL: https://github.com/apache/spark-kubernetes-operator/pull/644

   ### What changes were proposed in this pull request?
   
   This PR adds a periodic `System.gc()` invocation in the Spark Operator JVM, 
controlled by a new configuration:
   
   - `spark.kubernetes.operator.periodicGC.intervalSeconds` (default: `120`)
     - Set to `0` or a negative value to disable.
     - Note that `System.gc()` is a no-op when the JVM is started with 
`-XX:+DisableExplicitGC`.
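
As a rough illustration of the behavior described above, the following is a minimal sketch and not the code from this PR. The class name `PeriodicGcSketch`, the method `start`, and the use of a `ScheduledExecutorService` are assumptions made for the example. Only the "0 or negative disables" rule and the `System.gc()` call come from the description.

```java
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names and structure are illustrative only. */
public final class PeriodicGcSketch {
  private PeriodicGcSketch() {}

  /**
   * Schedules System.gc() every intervalSeconds on a daemon thread.
   * Returns an empty Optional when the interval is 0 or negative,
   * mirroring the "disabled" setting described for the new config.
   */
  public static Optional<ScheduledExecutorService> start(long intervalSeconds) {
    if (intervalSeconds <= 0) {
      return Optional.empty();
    }
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
          Thread thread = new Thread(runnable, "periodic-gc");
          // Daemon thread so the scheduler never blocks JVM shutdown.
          thread.setDaemon(true);
          return thread;
        });
    // System.gc() is only a request; with -XX:+DisableExplicitGC it does nothing.
    scheduler.scheduleWithFixedDelay(
        System::gc, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    return Optional.of(scheduler);
  }
}
```

A caller would pass the configured value, for example `PeriodicGcSketch.start(120)`, and shut the returned scheduler down on operator exit.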
   
   ### Why are the changes needed?
   
   The Spark Operator JVM is a long-running process that continuously allocates 
objects through JOSDK reconcilers and `SentinelManager` health checks. Without 
an explicit Full GC, fragmentation and old-generation garbage can accumulate 
over time, leading to unpredictable Full GC pauses. Triggering `System.gc()` 
periodically gives operators a deterministic, observable point at which the old 
generation is reclaimed, which is helpful for steady-state memory behavior in 
long-running deployments.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. A new operator configuration 
`spark.kubernetes.operator.periodicGC.intervalSeconds` is introduced and is 
enabled by default (`120` seconds). Operators that wish to opt out can set the 
value to `0` or a negative number. New `INFO`-level log lines are emitted at 
startup and on every GC cycle.
   
   ### How was this patch tested?
   
   Passed the CIs. Also manually installed the operator and checked the log.
   
   ```
   26/04/27 21:51:30 INFO   o.a.s.k.o.SparkOperator Version: 0.9.0-SNAPSHOT
   26/04/27 21:51:30 INFO   o.a.s.k.o.SparkOperator Java Version: 26.0.1+8
   26/04/27 21:51:30 INFO   o.a.s.k.o.SparkOperator Built-in Spark Version: 4.2.0-preview4
   ...
   26/04/27 21:51:30 INFO   o.a.s.k.o.SparkOperator Periodic System.gc() enabled with interval 120s
   ...
   26/04/27 21:53:30 INFO   o.a.s.k.o.SparkOperator System.gc() finished in 41 ms. used: 31 MB -> 23 MB, total: 31 MB -> 206 MB
   26/04/27 21:55:30 INFO   o.a.s.k.o.SparkOperator System.gc() finished in 48 ms. used: 31 MB -> 22 MB, total: 206 MB -> 53 MB
   26/04/27 21:57:30 INFO   o.a.s.k.o.SparkOperator System.gc() finished in 43 ms. used: 25 MB -> 22 MB, total: 53 MB -> 53 MB
   ```
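
For readers wondering how figures like those in the `System.gc() finished` lines above could be obtained, here is a minimal sketch under stated assumptions. It assumes that "used" means `totalMemory() - freeMemory()` and "total" means `totalMemory()` from `java.lang.Runtime`, sampled before and after the call. The PR description does not confirm this, and the class `GcLogSketch` and its methods are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper; the measurement approach is an assumption. */
public final class GcLogSketch {
  private static final long MB = 1024L * 1024L;

  private GcLogSketch() {}

  /** Formats byte counts into a message shaped like the log lines above. */
  public static String format(
      long elapsedMs, long usedBefore, long usedAfter, long totalBefore, long totalAfter) {
    return String.format(
        Locale.ROOT,
        "System.gc() finished in %d ms. used: %d MB -> %d MB, total: %d MB -> %d MB",
        elapsedMs, usedBefore / MB, usedAfter / MB, totalBefore / MB, totalAfter / MB);
  }

  /** Runs one GC request and reports heap usage before and after it. */
  public static String runOnce() {
    Runtime runtime = Runtime.getRuntime();
    long totalBefore = runtime.totalMemory();
    long usedBefore = totalBefore - runtime.freeMemory();
    long startNanos = System.nanoTime();
    System.gc(); // a request only; may be a no-op with -XX:+DisableExplicitGC
    long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
    long totalAfter = runtime.totalMemory();
    long usedAfter = totalAfter - runtime.freeMemory();
    return format(elapsedMs, usedBefore, usedAfter, totalBefore, totalAfter);
  }
}
```

Calling `GcLogSketch.runOnce()` from the scheduled task and passing the result to the logger would produce one such line per cycle.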
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

