Re: Safe way to clear old checkpoint data

2022-11-27 Hangxiang Yu
Hi,
As Martijn mentioned, the snapshot ownership model introduced in Flink 1.15 is
the best way to handle this.
You say there are just 24000/10 references in the shared directory of a
job. Does your case fall within the scope of [1]?
If so, I think it works to check the _metadata file and identify the files
that are not referenced.
I also suggest checking the creation timestamps of those files to make sure
the deletion is safe.
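The two checks above (not referenced in _metadata, and old enough) can be combined roughly as follows. This is a minimal sketch under stated assumptions: the shared-directory listing and the reference sets are assumed to be collected already (for example, however you already parse _metadata), with paths normalized the same way in both. References are unioned across all retained checkpoints, because with incremental checkpoints an older retained checkpoint may still use files that the newest _metadata no longer lists. The one-day `min_age` is an arbitrary placeholder that should exceed the time a checkpoint can stay in progress, since files uploaded by an unfinished checkpoint are not yet in any _metadata. `deletion_candidates` is a hypothetical helper, not a Flink API.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Set


def deletion_candidates(
    shared_objects: Mapping[str, datetime],
    referenced_per_checkpoint: Iterable[Set[str]],
    now: datetime,
    min_age: timedelta = timedelta(days=1),  # assumed placeholder safety margin
) -> Set[str]:
    """Hypothetical helper (not part of Flink).

    shared_objects: path -> creation (upload) time of every object under shared/.
    referenced_per_checkpoint: one set of referenced paths per retained
        checkpoint, assumed to be extracted from each checkpoint's _metadata.
    """
    # A file is still needed if ANY retained checkpoint references it,
    # so take the union instead of looking only at the latest _metadata.
    referenced: Set[str] = set()
    for refs in referenced_per_checkpoint:
        referenced |= refs

    cutoff = now - min_age
    return {
        path
        for path, created in shared_objects.items()
        # Keep recent files: they may belong to a checkpoint that is still
        # in progress and has no _metadata yet.
        if path not in referenced and created < cutoff
    }


if __name__ == "__main__":
    now = datetime(2022, 11, 27, tzinfo=timezone.utc)
    shared = {
        "shared/a.sst": now - timedelta(days=3),   # referenced
        "shared/b.sst": now - timedelta(days=3),   # unreferenced and old
        "shared/c.sst": now - timedelta(hours=1),  # unreferenced but too recent
    }
    print(sorted(deletion_candidates(shared, [{"shared/a.sst"}], now)))
    # prints ['shared/b.sst']
```

A file that fails either check is kept, so the sketch errs on the side of not deleting.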

[1] https://issues.apache.org/jira/browse/FLINK-24852

-- 
Best,
Hangxiang.


Re: Safe way to clear old checkpoint data

2022-11-25 Evgeniy Lyutikov
Thanks for the answer.
We can't upgrade Flink to version 1.15 yet.
I'm interested in restoring from a checkpoint: in theory, are the SST files
referenced in _metadata sufficient, or is something else needed as well?
Can I simply delete the files that are not referenced in _metadata?



Re: Safe way to clear old checkpoint data

2022-11-25 Martijn Visser
Hi,

I would recommend upgrading to Flink 1.15, since the changes made in 1.15
make snapshot ownership easier to understand. See
https://flink.apache.org/2022/05/06/restore-modes.html

Best regards,

Martijn


Safe way to clear old checkpoint data

2022-11-25 Evgeniy Lyutikov
Hello,
We use Flink 1.14.4 with the Kubernetes operator (version 1.2.0), and all
checkpoint data is stored in an S3 bucket.

When we parse the _metadata file of a checkpoint, it contains references to
objects in the shared directory, but their number is much smaller than the
total number of objects in that directory.

For example, the _metadata file contains 24000 references, while the total
number of objects in the shared directory is about 10. What is the safest
way to delete the unused files and free up space?
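One way to get the full object list for the shared directory, together with upload times, is a paginated S3 listing. This is a minimal sketch assuming boto3, or any client exposing the same `get_paginator("list_objects_v2")` interface; `list_shared_objects` is a hypothetical helper, and the bucket name and prefix in the usage comment are placeholders. Flink's file-based checkpoint storage places shared incremental files under `<checkpoint-dir>/<job-id>/shared/`.

```python
from datetime import datetime
from typing import Any, Dict


def list_shared_objects(s3_client: Any, bucket: str, prefix: str) -> Dict[str, datetime]:
    """Hypothetical helper: map each object key under `prefix` to LastModified.

    `s3_client` is assumed to be a boto3 S3 client (boto3.client("s3")) or
    anything with the same get_paginator("list_objects_v2") interface.
    """
    objects: Dict[str, datetime] = {}
    paginator = s3_client.get_paginator("list_objects_v2")
    # A single list_objects_v2 call returns at most 1000 keys, so page through all.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix have no "Contents" key.
        for obj in page.get("Contents", []):
            # S3 has no separate creation time; for write-once objects such as
            # checkpoint files, LastModified is effectively the upload time.
            objects[obj["Key"]] = obj["LastModified"]
    return objects


# Usage (placeholder bucket and prefix):
#   import boto3
#   objs = list_shared_objects(boto3.client("s3"), "my-bucket", "checkpoints/<job-id>/shared/")
#   print(len(objs))
```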


"This message contains confidential information/commercial secret. If you are 
not the intended addressee of this message you may not copy, save, print or 
forward it to any third party and you are kindly requested to destroy this 
message and notify the sender thereof by email."