IMO it's the runner's responsibility to do container garbage collection and disk space management. This flag seems like a implementation-specific option that would not only to some runner/deployment combinations, so it doesn't seem to belong in that proto. Dataflow would not be able to honor such a flag, for example. Perhaps Flink could start all containers with --rm and have a Flink-specific debug option to not do that?
Henning On Wed, Oct 3, 2018 at 2:19 PM Ankur Goenka <[email protected]> wrote: > Hi, > > In portable flink runner, SDK Harness docker containers are created > dynamically and are not garbage collected. SDK Harness container pull the > staging artifact, generate logs and tmp files which is stored as an > additional layer on top of image. > These dead container layers accumulates over time and make the machine go > OOM. To avoid this situation and keep the flexibility of letting containers > exist for debugging, I am planning to add cleanup flag to DockerPayload. > When set, this flag will tell runner to remove the container after killing. > Current DockerPayload: > > // The payload of a Docker image > message DockerPayload { > string container_image = 1; // implicitly linux_amd64. > } > > > Proposed DockerPayload: > > // The payload of a Docker image > message DockerPayload { > string container_image = 1; // implicitly linux_amd64. > bool cleanup = 2; // Flag to signal container deletion after killing. > } > > > Let me know your thoughts and if there is a better way to do this. > > Thanks, > Ankur >
