GitHub user NicoK opened a pull request: https://github.com/apache/flink/pull/4238
[FLINK-7057][blob] move BLOB ref-counting from LibraryCacheManager to BlobCache

Currently, the `LibraryCacheManager` is doing some ref-counting for JAR files managed by it. Instead, we want the `BlobCache` to do that itself for **all** job-related BLOBs. Also, we do not want to operate on a per-BlobKey level but rather per job. Job-unrelated BLOBs should be cleaned manually as done for the Web-UI logs. A future API change will reflect the different use cases in a better way. For now, we need to also adapt the cleanup appropriately.

On the `BlobServer`, the JAR files should remain locally as well as in the HA store until the job enters a final state. Then they can be deleted.

With this intermediate state, job-unrelated BLOBs will remain in the file system until deleted manually. This is the same as the previous API use when working with a `BlobService` directly instead of going through the `LibraryCacheManager`. The aforementioned API extension will include TTL fields for those BLOBs in order to have a proper cleanup, too.

This PR is based upon #4237 in a series to implement FLINK-6916.
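To illustrate the difference between per-BlobKey and per-job ref-counting, here is a minimal sketch of a job-level counter. It is not the code in this PR: the class `JobRefCounter`, the use of plain `String` job IDs instead of Flink's `JobID`, and the method names `releaseJob` and `references` are assumptions made for illustration. Only the name `registerJob` is taken from the commit list.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for job-based reference counting.
 * This is NOT Flink's BlobCache; it only shows the bookkeeping idea.
 */
class JobRefCounter {
    // job ID -> number of active references (e.g. tasks of that job on this node)
    private final Map<String, Integer> refs = new HashMap<>();

    /** Adds one reference for the given job. */
    synchronized void registerJob(String jobId) {
        refs.merge(jobId, 1, Integer::sum);
    }

    /**
     * Drops one reference for the given job.
     *
     * @return true if the job is no longer referenced after this call,
     *         i.e. all of its BLOBs become candidates for cleanup
     */
    synchronized boolean releaseJob(String jobId) {
        Integer count = refs.get(jobId);
        if (count == null) {
            return false; // unknown job: nothing to release
        }
        if (count == 1) {
            refs.remove(jobId);
            return true;
        }
        refs.put(jobId, count - 1);
        return false;
    }

    /** Current number of references for the given job (0 if unknown). */
    synchronized int references(String jobId) {
        return refs.getOrDefault(jobId, 0);
    }
}
```

With a single counter per job, every BLOB that belongs to the job shares one lifetime, so cleanup can be decided once per job rather than once per key.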
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-6916-7057

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4238

----
commit d54a316cfffd8243980df561fd4fcbd99934a40b
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-12-20T15:49:57Z

    [FLINK-6008][docs] minor improvements in the BlobService docs

commit b215515fa14d3f6af218e86b67bc2c27ae9d4f4f
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-12-20T17:27:13Z

    [FLINK-6008] refactor BlobCache#getURL() for cleaner code

commit bbcde52b3105fcf379c852b568f3893cc6052ce6
Author: Nico Kruber <n...@data-artisans.com>
Date:   2016-12-21T15:23:29Z

    [FLINK-6008] do not fail the BlobServer if delete fails

    also extend the delete tests and remove one code duplication

commit dda1a12e40027724efb0e50005e5b57058a220f0
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-01-06T17:42:58Z

    [FLINK-6008][docs] update some config options to the new, non-deprecated ones

commit e12c2348b237207a50649d515a0fbbd19f92e6a0
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-03-09T17:14:02Z

    [FLINK-6008] use Preconditions.checkArgument in BlobClient

commit 24060e01332c6df9fd01f1dc5f321c3fda9301c1
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-03-17T15:21:40Z

    [FLINK-6008] fix concurrent job directory creation

    also add according unit tests

commit 2e0d16ab8bf8a48a2d028602a3a7693fc4b76039
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-14T16:01:47Z

    [FLINK-6008] do not guard a delete() call with a check for existence

commit 7ba911d7ecb4861261dff8509996be0bd64d6d27
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-04-18T14:37:37Z

    [FLINK-6008] some comments about BlobLibraryCacheManager cleanup

commit d3f50d595f85356ae6ed0a85e1f8b8e8ac630bde
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-04-19T13:39:03Z

    [hotfix] minor typos

commit 79b6ce35a9e246b35415a388295f9ee2fc19a82e
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-04-19T14:10:16Z

    [FLINK-6008] further cleanup tests for BlobLibraryCacheManager

commit 23fb6ecd6c43c86d762503339c67953290236dca
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-30T14:03:16Z

    [FLINK-6008] address PR comments

commit 794764ceeed6b9bbbac08662f5754b218ff86c9c
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-16T08:51:04Z

    [FLINK-7052][blob] remove (unused) NAME_ADDRESSABLE mode

commit 774bafa85f242110a2ce7907c1150f8c62d73b3f
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-21T15:05:57Z

    [FLINK-7052][blob] remove further unused code due to the NAME_ADDRESSABLE removal

commit 4da3b3f6269e43bf1c66621099528824cad9373f
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-22T15:31:17Z

    [FLINK-7053][blob] remove code duplication in BlobClientSslTest

    This lets BlobClientSslTest extend BlobClientTest as most of its
    implementation came from there and was simply copied.

commit aa9cdc820f9ca1a38a19708bf45a2099e42eaf48
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-23T09:40:34Z

    [FLINK-7053][blob] verify some of the buffers returned by GET

commit c9b693a46053b55b3939ff471184796f12d36a72
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-23T10:04:10Z

    [FLINK-7053][blob] use TemporaryFolder for local BLOB dir in unit tests

    This replaces the use of some temporary directory where it is not
    guaranteed that it will be deleted after the test.

commit 11db399d5103d9ffe9083c9b6029a7e81afa9abe
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-21T12:45:31Z

    [FLINK-7054][blob] remove LibraryCacheManager#getFile()

    This was only used in tests where it is avoidable but if used anywhere
    else, it may have caused cleanup issues.

commit 4ae04b68453d4b099f752d6c6fd3c09335ede33a
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-21T14:14:15Z

    [FLINK-7055][blob] refactor getURL() to the more generic getFile()

    The fact that we always returned URL objects is a relic of the
    BlobServer's only use for URLClassLoader. Since we'd like to extend its
    use, returning File objects instead is more generic.

commit 8397d6aa5dc0aac07626d0af9ee3d8623dd7b60c
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-21T16:04:43Z

    [FLINK-7056][blob] add API to allow job-related BLOBs to be stored

commit 0a4c4e9bc483e4f1f885ef1e3b8feba40c057204
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-23T17:17:07Z

    [FLINK-7056][blob] refactor the new API for job-related BLOBs

    For a cleaner API, instead of having a nullable jobId parameter, use two
    methods: one for job-related BLOBs, another for job-unrelated ones.

commit 13fd7623d1aafd3e853e39071c650cbfda865649
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-27T10:14:08Z

    [FLINK-7012] remove user-JAR upload when disposing a savepoint the old way

commit 8331fbb208d975e0c1ec990344c14315ea08dd4a
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-27T16:29:44Z

    [FLINK-7057][blob] move ref-counting from the LibraryCacheManager to the BlobCache

    Also change from BlobKey-based ref-counting to job-based ref-counting
    which is simpler and the mode we want to use from now on. Deferred
    cleanup (as before) is currently not implemented yet (TODO).
    At the BlobServer, no ref-counting will be used but the cleanup will
    happen when the job enters a final state (TODO).

commit 0bc11d590c493ff2cdb8de63960c17f49ba5efb5
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-28T09:31:39Z

    [FLINK-7057][blob] change to a cleaner API for BlobService#registerJob()

commit e9a0d6893156ca818847d1b04519472111c3047d
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-28T12:09:11Z

    [FLINK-7057][blob] implement deferred cleanup at the BlobCache

    Whenever a job is not referenced at the BlobCache anymore, we set a TTL
    and let the cleanup task remove it when this is hit and the task is run.
    For now, this means that a BLOB will be retained at most
    (2 * ConfigConstants.LIBRARY_CACHE_MANAGER_CLEANUP_INTERVAL) seconds
    after not being referenced anymore. We do this so that a recovery still
    has the chance to use existing files rather than to download them again.

commit 0c3e8032634e722b432c484bdbf789d0244397b3
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-28T15:17:06Z

    [FLINK-7057][blob] integrate cleanup of job-related JARs from the BlobServer

    TODO: an integration test that verifies that this is actually done when
    desired and not performed when not, e.g. if the job did not reach a
    final execution state

commit 2d9f4cb5740f48edfaa95f94de93d0334e8c279d
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-30T12:52:19Z

    [FLINK-7057][tests] extract FailingBlockingInvokable from CoordinatorShutdownTest

commit b0cc398d40299acb1a3cddb81e64719fdb450459
Author: Nico Kruber <n...@data-artisans.com>
Date:   2017-06-30T12:56:14Z

    [FLINK-7057][blob] add an integration test for the BlobServer cleanup

    This ensures that BLOB files are actually deleted when a job enters a
    final state.

----
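The deferred cleanup described in commit e9a0d68 (set a TTL once a job is unreferenced, let a periodic task delete it later) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the class `DeferredJobCleanup`, its method names other than `registerJob`, the `String` job IDs, and the explicit `nowMillis` clock parameter are all hypothetical, and the sketch only tracks job IDs instead of deleting files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of TTL-based deferred cleanup; not Flink's BlobCache. */
class DeferredJobCleanup {
    private static final class JobState {
        int references;
        long expiresAtMillis = -1; // -1 means: still referenced, no TTL set
    }

    private final long cleanupIntervalMillis;
    private final Map<String, JobState> jobs = new HashMap<>();

    DeferredJobCleanup(long cleanupIntervalMillis) {
        this.cleanupIntervalMillis = cleanupIntervalMillis;
    }

    /** Adds a reference; a pending TTL is cancelled so a recovery can reuse the files. */
    synchronized void registerJob(String jobId) {
        JobState state = jobs.computeIfAbsent(jobId, k -> new JobState());
        state.references++;
        state.expiresAtMillis = -1;
    }

    /** Drops a reference; when the last one is gone, start the TTL instead of deleting. */
    synchronized void releaseJob(String jobId, long nowMillis) {
        JobState state = jobs.get(jobId);
        if (state == null || state.references == 0) {
            return; // unknown or already unreferenced
        }
        if (--state.references == 0) {
            state.expiresAtMillis = nowMillis + cleanupIntervalMillis;
        }
    }

    /** One pass of the periodic cleanup task; returns the jobs whose BLOBs would be deleted. */
    synchronized List<String> runCleanup(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, JobState>> it = jobs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, JobState> entry = it.next();
            JobState state = entry.getValue();
            if (state.references == 0 && state.expiresAtMillis <= nowMillis) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }
}
```

If the cleanup task runs once per interval and the TTL equals that interval, a job released just after one run misses the next run and is removed by the one after it, which is where the bound of roughly two intervals comes from.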