Re: Safe way to clear old checkpoint data
Hi,

As Martijn mentioned, snapshot ownership in 1.15 is the best way.

You say there are just 24000/10 references in a shared directory in a job. Is your case in the scope of [1]?

If so, I think it works: check the _metadata file and find the files that are not referenced. I suggest you also check the creation timestamp of those files to make sure they are safe to delete.

[1] https://issues.apache.org/jira/browse/FLINK-24852

On Fri, Nov 25, 2022 at 6:02 PM Evgeniy Lyutikov wrote:
> Thanks for the answer.
> We can't update Flink to version 1.15 yet.
> I'm interested in restoring from a checkpoint. In theory, are only the
> sst files mentioned in _metadata needed, or is something else required
> as well?
> Can I just delete files that are not referenced in _metadata?

--
Best,
Hangxiang.
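The check suggested above (keep everything referenced in _metadata, and among the rest only touch files that are old enough) can be sketched as a small filter. This is a minimal sketch under stated assumptions, not a Flink tool: the function name `find_deletion_candidates`, the sample keys, and the 7-day minimum age are all hypothetical, and the set of referenced keys is assumed to have been extracted already from the _metadata files of every checkpoint you still want to restore from. The age threshold is there because a recently written file may belong to a checkpoint that is still in progress and has no _metadata yet.

```python
from datetime import datetime, timedelta, timezone


def find_deletion_candidates(referenced, objects, now, min_age):
    """Return shared-directory keys that are unreferenced and older than min_age.

    referenced: set of keys named by the _metadata of all retained checkpoints
                (assumed to be extracted beforehand)
    objects:    dict mapping each key in the shared directory to its
                last-modified time (timezone-aware datetime)
    """
    cutoff = now - min_age
    return sorted(
        key
        for key, modified in objects.items()
        if key not in referenced and modified < cutoff
    )


# Hypothetical listing of a shared directory.
now = datetime(2022, 11, 25, 18, 0, tzinfo=timezone.utc)
objects = {
    "shared/a.sst": now - timedelta(days=30),    # referenced, must stay
    "shared/b.sst": now - timedelta(days=30),    # unreferenced and old
    "shared/c.sst": now - timedelta(minutes=5),  # unreferenced but recent
}
referenced = {"shared/a.sst"}

candidates = find_deletion_candidates(referenced, objects, now, timedelta(days=7))
print(candidates)  # ['shared/b.sst']
```

The function only reports candidates and deletes nothing, so the list can be reviewed (or the files moved aside first) before anything is removed from the bucket.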
Re: Safe way to clear old checkpoint data
Thanks for the answer.

We can't update Flink to version 1.15 yet.

I'm interested in restoring from a checkpoint. In theory, are only the sst files mentioned in _metadata needed, or is something else required as well? Can I just delete files that are not referenced in _metadata?

From: Martijn Visser
Sent: November 25, 2022, 16:15:45
To: Evgeniy Lyutikov
Cc: user
Subject: Re: Safe way to clear old checkpoint data

Hi,

I would recommend upgrading to Flink 1.15, given that the changes made in 1.15 make ownership more understandable. See https://flink.apache.org/2022/05/06/restore-modes.html

Best regards,

Martijn
Re: Safe way to clear old checkpoint data
Hi,

I would recommend upgrading to Flink 1.15, given that the changes made in 1.15 make ownership more understandable. See https://flink.apache.org/2022/05/06/restore-modes.html

Best regards,

Martijn

On Fri, Nov 25, 2022 at 9:33 AM Evgeniy Lyutikov wrote:
> Hello
> We use Flink 1.14.4 with the Kubernetes operator (version 1.2.0), and all
> checkpoint data is stored in an S3 bucket.
>
> If we parse the _metadata file of a checkpoint, it contains links to
> objects in the shared directory, and their number is much less than the
> total number of objects in the directory.
>
> For example, the number of links in _metadata file is 24000, and the
> total number of objects in shared directory is about 10. What is the
> safest way to delete unused files and free up space?
Safe way to clear old checkpoint data
Hello

We use Flink 1.14.4 with the Kubernetes operator (version 1.2.0), and all checkpoint data is stored in an S3 bucket.

If we parse the _metadata file of a checkpoint, it contains links to objects in the shared directory, and their number is much less than the total number of objects in the directory.

For example, the number of links in _metadata file is 24000, and the total number of objects in shared directory is about 10. What is the safest way to delete unused files and free up space?

"This message contains confidential information/commercial secret. If you are not the intended addressee of this message you may not copy, save, print or forward it to any third party and you are kindly requested to destroy this message and notify the sender thereof by email."