Hi Ryan,
We’re using Iceberg, but not with Spark. Can you provide any specifics on why 
you say the Java call made outside of Spark “isn’t as good as the action-based 
one”?
Thanks,
Casey





Casey Lucas
Director, Engineering




From: Ryan Blue <[email protected]>
Date: Wednesday, June 23, 2021 at 11:48 AM
To: [email protected] <[email protected]>
Subject: [EXT] Re: question about the gc in iceberg

There is also a way to expire snapshots without using Spark, through the 
ExpireSnapshots API:

table.expireSnapshots().expireOlderThan(timestampInMs).commit();

That is what we used in production for a long time, but it isn’t as good as the 
action-based one that compares file trees. I’d recommend using the 
expire_snapshots procedure that Russell pointed to: 
https://iceberg.apache.org/spark-procedures/#expire_snapshots
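
For anyone calling the ExpireSnapshots API outside Spark, the timestampInMs argument above is an epoch-millisecond cutoff. The following is only a minimal sketch of computing that cutoff with java.time; the class name ExpireCutoff, the helper cutoffMillis, and the 7-day retention window are hypothetical choices for illustration, and the Iceberg call itself is left as a comment because it needs an already-loaded Table.

```java
import java.time.Duration;
import java.time.Instant;

public class ExpireCutoff {
    // Hypothetical helper: returns the epoch-millisecond cutoff that lies
    // `retention` before `now`. Snapshots older than this would be expired.
    static long cutoffMillis(Instant now, Duration retention) {
        return now.minus(retention).toEpochMilli();
    }

    public static void main(String[] args) {
        // Assumed retention window of 7 days; pick a value that suits the table.
        long timestampInMs = cutoffMillis(Instant.now(), Duration.ofDays(7));
        System.out.println("Expire snapshots older than: " + timestampInMs);

        // With a loaded org.apache.iceberg.Table named `table` (not built here),
        // the call quoted in this thread would then be:
        // table.expireSnapshots().expireOlderThan(timestampInMs).commit();
    }
}
```

Passing the clock in as a parameter keeps the cutoff calculation easy to check in isolation before it is wired to a real table.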

On Wed, Jun 23, 2021 at 7:49 AM Russell Spitzer 
<[email protected]<mailto:[email protected]>> wrote:
There are "actions" that cover common table maintenance tasks.

You are most likely interested in ExpireSnapshots, RewriteDataFiles, and 
RemoveOrphanFiles; see:

https://iceberg.apache.org/spark-procedures/

On Tue, Jun 22, 2021 at 7:19 PM yong.sunny 
<[email protected]<mailto:[email protected]>> wrote:
Hi Iceberg Dev,

Is there any existing mechanism to do GC in Iceberg? Or is there an 
implementation based on Spark?

Thanks and Best regards,
Yong







--
Ryan Blue
Tabular
