Re: How to clean up RocksDB local directory in K8s statefulset

2022-06-27 Thread yanfei lei
Hi Allen, what volumes do you use for your TM pod? If you want your data to
be deleted when the pod restarts, you can use an ephemeral volume such as
emptyDir.
Flink should also remove temporary files automatically once they are no
longer needed (see this discussion).

The working directory only takes effect starting with Flink 1.15. In Flink
1.14, the local RocksDB directory is usually located under /tmp unless you
explicitly configure state.backend.rocksdb.localdir.
So the working directory can't help here.
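
For illustration, the two suggestions above (an ephemeral emptyDir volume plus an explicit state.backend.rocksdb.localdir) could be combined roughly as follows. This is only a sketch: the mount path /flink-rocksdb, the volume name rocksdb-local, the container name and the image tag are assumptions made for the example and do not come from this thread.

```yaml
# flink-conf.yaml (fragment)
# state.backend.rocksdb.localdir is the option mentioned above;
# the path is an assumed example and must match the mountPath below.
state.backend.rocksdb.localdir: /flink-rocksdb
---
# Pod template fragment of the TaskManager StatefulSet.
# All names and the image tag are illustrative assumptions.
spec:
  template:
    spec:
      containers:
        - name: taskmanager
          image: flink:1.14
          volumeMounts:
            - name: rocksdb-local      # ephemeral scratch space for RocksDB
              mountPath: /flink-rocksdb
      volumes:
        - name: rocksdb-local
          emptyDir: {}                 # removed together with the pod
```

Note that an emptyDir volume is deleted when the pod itself is removed, but it survives a restart of a container inside the same pod, so directories left behind by a container crash would still need to be cleaned up by Flink.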

Allen Wang wrote on Tue, Jun 28, 2022 at 04:39:

> Hi Folks,
>
> We created a stateful job using SessionWindow and the RocksDB state backend
> and deployed it on a Kubernetes StatefulSet with persistent volumes. The
> Flink version we used is 1.14.
>
> After the job had run for some time, we observed that the local RocksDB
> directory kept growing and that more and more directories were being
> created inside it. It seems that when the job is restarted or
> the task manager K8s pod is restarted, the previous RocksDB directory
> corresponding to the assigned operator is not cleaned up. Here is an
> example:
>
> drwxr-xr-x 3 root root 4096 Jun 27 18:23 job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__1_4__uuid_c97f3f3f-649a-467d-82af-2bc250ec6e22
> drwxr-xr-x 3 root root 4096 Jun 27 18:45 job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__1_4__uuid_e4fca2c3-74c7-4aa2-9ca1-dda866b8de11
> drwxr-xr-x 3 root root 4096 Jun 27 18:56 job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__2_4__uuid_f1fa-7402-494d-80d7-65861394710c
> drwxr-xr-x 3 root root 4096 Jun 27 17:34 job__op_WindowOperator_f6dc7f4d2283f4605b127b9364e21148__3_4__uuid_08a14423-bea1-44ce-96ee-360a516d72a6
>
> Although only
> job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__2_4__uuid_f1fa-7402-494d-80d7-65861394710c
> is the active running operator, the other directories for the past
> operators still exist.
>
> We set the task manager property taskmanager.resource-id to the task
> manager pod name under the StatefulSet, but it did not seem to help clean
> up previous directories.
>
> Any pointers to solve this issue?
>
> We checked the latest documentation and it seems that Flink 1.15 introduced
> the concept of a local working directory:
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/working_directory/
> Does that help clean up the RocksDB directory?
>
> Thanks,
> Allen


How to clean up RocksDB local directory in K8s statefulset

2022-06-27 Thread Allen Wang
Hi Folks,

We created a stateful job using SessionWindow and the RocksDB state backend
and deployed it on a Kubernetes StatefulSet with persistent volumes. The
Flink version we used is 1.14.

After the job had run for some time, we observed that the local RocksDB
directory kept growing and that more and more directories were being
created inside it. It seems that when the job is restarted or
the task manager K8s pod is restarted, the previous RocksDB directory
corresponding to the assigned operator is not cleaned up. Here is an
example:

drwxr-xr-x 3 root root 4096 Jun 27 18:23 job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__1_4__uuid_c97f3f3f-649a-467d-82af-2bc250ec6e22
drwxr-xr-x 3 root root 4096 Jun 27 18:45 job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__1_4__uuid_e4fca2c3-74c7-4aa2-9ca1-dda866b8de11
drwxr-xr-x 3 root root 4096 Jun 27 18:56 job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__2_4__uuid_f1fa-7402-494d-80d7-65861394710c
drwxr-xr-x 3 root root 4096 Jun 27 17:34 job__op_WindowOperator_f6dc7f4d2283f4605b127b9364e21148__3_4__uuid_08a14423-bea1-44ce-96ee-360a516d72a6

Although only
job__op_WindowOperator_2b0a50a068bb7f1c8a470e4f763cbf26__2_4__uuid_f1fa-7402-494d-80d7-65861394710c
is the active running operator, the other directories for the past
operators still exist.

We set the task manager property taskmanager.resource-id to the task
manager pod name under the StatefulSet, but it did not seem to help clean
up previous directories.

Any pointers to solve this issue?

We checked the latest documentation and it seems that Flink 1.15 introduced
the concept of a local working directory:
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/standalone/working_directory/
Does that help clean up the RocksDB directory?

Thanks,
Allen