leekeiabstraction opened a new pull request, #28489:
URL: https://github.com/apache/flink/pull/28489

   ## What changes were proposed in this pull request?
   
   Fix `RocksDBNativeMetricMonitor.close()` to invoke `Statistics.close()` 
instead of only nulling the reference. Also hoist `dbOptions.statistics()` into 
a local in `RocksDBHandle.loadDb()` so that a monitor-constructor failure 
doesn't orphan the wrapper.
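
   The two changes above can be sketched roughly as follows. This is a minimal, 
dependency-free illustration, not the actual patch: `FakeNativeStatistics`, 
`MonitorSketch` and `LoadDbSketch` are hypothetical stand-ins for 
`org.rocksdb.Statistics`, `RocksDBNativeMetricMonitor` and 
`RocksDBHandle.loadDb()`, and the try/catch cleanup in `loadDb` is an assumption 
about how a hoisted local could be used.

```java
// Hypothetical stand-in for org.rocksdb.Statistics: it "owns" native memory
// that is released only by an explicit close(). Not the RocksDB API.
final class FakeNativeStatistics implements AutoCloseable {
    static int openHandles = 0; // simulated native allocations still alive
    private boolean closed = false;

    FakeNativeStatistics() {
        openHandles++;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            openHandles--;
        }
    }
}

// Hypothetical, simplified monitor that holds the statistics wrapper.
final class MonitorSketch implements AutoCloseable {
    private FakeNativeStatistics statistics;

    MonitorSketch(FakeNativeStatistics statistics, boolean failInConstructor) {
        if (failInConstructor) {
            throw new IllegalStateException("simulated constructor failure");
        }
        this.statistics = statistics;
    }

    @Override
    public void close() {
        if (statistics != null) {
            statistics.close(); // release explicitly instead of relying on finalize()
            statistics = null;  // nulling alone would leave the handle open
        }
    }
}

// Hypothetical, simplified version of the load path.
final class LoadDbSketch {
    static MonitorSketch loadDb(boolean failInConstructor) {
        // Hoisted into a local so the wrapper stays reachable if construction fails.
        FakeNativeStatistics statistics = new FakeNativeStatistics();
        try {
            return new MonitorSketch(statistics, failInConstructor);
        } catch (RuntimeException e) {
            statistics.close(); // assumed cleanup; the wrapper is not orphaned
            throw e;
        }
    }
}
```

   In this sketch, `FakeNativeStatistics.openHandles` returns to zero both after 
`MonitorSketch.close()` and after a failed construction.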
   
   ## Why are the changes needed?
   
   Fix a native memory leak. When any of the 11 RocksDB ticker metrics is 
enabled, the TaskManager leaks native memory in proportion to the number of 
keyed state backend rebuilds, e.g. when a job continuously fails and restarts.
   
   1. A latent Flink-side bug was introduced in 
[FLINK-24786](https://issues.apache.org/jira/browse/FLINK-24786) when the 
`Statistics` object was added without an explicit `close()` call. The bug stayed 
latent because the code relied on `finalize()` running to call `dispose()` and 
`close()`.
   2. The RocksDB-side finalizer was removed in 
https://github.com/facebook/rocksdb/commit/99d86252b6514d0fe3b848bd39bda94642c14faf
   3. Flink 2.0+ uses frocksdb 8.10.0. The leak started occurring there because 
`close()` is no longer called.
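
   To illustrate why the leak grows with the number of rebuilds once no 
finalizer runs, here is a small simulation. It is a sketch under stated 
assumptions: `NativeHandleSketch` and `RebuildLeakSketch` are hypothetical 
names, and a static counter stands in for native memory, so it models the 
bookkeeping only and not real RocksDB allocations.

```java
// Hypothetical stand-in for a native-backed object; not the RocksDB API.
final class NativeHandleSketch implements AutoCloseable {
    static long liveHandles = 0; // simulated native allocations still alive
    private boolean closed = false;

    NativeHandleSketch() {
        liveHandles++;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            liveHandles--;
        }
    }
}

final class RebuildLeakSketch {
    // Simulates `rebuilds` backend rebuilds and returns the handles left open.
    static long simulate(int rebuilds, boolean closeExplicitly) {
        NativeHandleSketch.liveHandles = 0;
        for (int i = 0; i < rebuilds; i++) {
            NativeHandleSketch stats = new NativeHandleSketch();
            if (closeExplicitly) {
                stats.close(); // patched behaviour: release on close
            }
            stats = null; // without a finalizer, dropping the reference frees nothing
        }
        return NativeHandleSketch.liveHandles;
    }
}
```

   Here `RebuildLeakSketch.simulate(n, false)` leaves `n` handles open, while 
`RebuildLeakSketch.simulate(n, true)` leaves none.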
   
   ## Verifying this change
   
   Reproduced on `flink:2.2.2` with the TaskManager container limited to a 4 GB 
cgroup, matching `taskmanager.memory.process.size`. Jobs that throw 
`NumberFormatException` on every row are submitted with 
`restart-strategy.fixed-delay.delay=100ms` and all 11 ticker metrics enabled.
   
   | Run | Stats | Outcome |
   |---|---|---|
   | Unpatched | ON | TM OOMKilled at 4 GB in ~80 s |
   | Patched | ON | Ran for 4 minutes, no OOM, anon RSS stayed at around 2 GB |
   
   See here for reproduction steps: 
https://github.com/leekeiabstraction/flink/tree/reproduce-rocksdb-statistics-leak/reproduce-rocksdb-statistics-leak
   
   ## Does this PR introduce any user-facing change?
   
   None.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
