Hi Ryan,

We’re using Iceberg, but not with Spark. Can you provide any specifics on why you say the Java call made outside of Spark “isn’t as good as the action-based one”?

Thanks,
Casey
Casey Lucas
Director, Engineering
dynata.com

From: Ryan Blue <[email protected]>
Date: Wednesday, June 23, 2021 at 11:48 AM
To: [email protected] <[email protected]>
Subject: [EXT] Re: question about the gc in iceberg

There is also a way to expire snapshots without using Spark, through the ExpireSnapshots API:

table.expireSnapshots().expireOlderThan(timestampInMs).commit();

That is what we used in production for a long time, but it isn’t as good as the action-based one that compares file trees.
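For readers calling the API outside Spark, the `timestampInMs` argument above is an epoch-millisecond cutoff. The following is a minimal sketch of one way to compute it, using only the Java standard library. The class name `SnapshotCutoff`, the helper `olderThanMillis`, and the 7-day retention window are hypothetical choices for illustration and do not come from the thread; loading the Iceberg `Table` handle is not shown.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Iceberg: computes the cutoff passed to expireOlderThan.
class SnapshotCutoff {

    // Returns the epoch-millisecond timestamp that lies `retention` before `now`.
    static long olderThanMillis(Instant now, Duration retention) {
        if (retention.isNegative()) {
            throw new IllegalArgumentException("retention must not be negative");
        }
        return now.minus(retention).toEpochMilli();
    }

    public static void main(String[] args) {
        // Assumed retention window of 7 days; choose a value that suits your tables.
        long timestampInMs = olderThanMillis(Instant.now(), Duration.ofDays(7));
        System.out.println("Cutoff: " + Instant.ofEpochMilli(timestampInMs));

        // With an Iceberg Table handle loaded from your catalog (not shown here),
        // the call quoted in the thread would then be:
        //   table.expireSnapshots().expireOlderThan(timestampInMs).commit();
    }
}
```

Passing `now` in as a parameter, rather than reading the clock inside the helper, keeps the cutoff calculation easy to check in isolation.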
I’d recommend using the expire_snapshots procedure that Russell pointed to: https://iceberg.apache.org/spark-procedures/#expire_snapshots

On Wed, Jun 23, 2021 at 7:49 AM Russell Spitzer <[email protected]> wrote:

There are "actions" that cover common table maintenance tasks. You are most likely interested in ExpireSnapshots, RewriteDataFiles, and RemoveOrphanFiles; see https://iceberg.apache.org/spark-procedures/

On Tue, Jun 22, 2021 at 7:19 PM yong.sunny <[email protected]> wrote:

Hi Iceberg Dev,

Is there any existing mechanism to do GC in Iceberg? Or is there an implementation based on Spark?

Thanks and best regards,
Yong

--
Ryan Blue
Tabular
