Hi Jungtaek,

That setting controls whether Iceberg cleans up old copies of the table
metadata file. The metadata file holds references to all of the table's
snapshots (those that have not expired) and is self-contained, so no
operation needs to access previous metadata files.

Those files aren't typically that large, but they can be when streaming
data because you create a lot of table versions. For streaming, I'd
recommend turning the option on and making sure you're running
`expireSnapshots()` regularly to prune old table versions -- although
expiring snapshots removes them from table metadata and so limits how far
back you can time travel.
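
To make the effect on the metadata directory concrete, here is a toy model
in plain Java. It is only a sketch and not Iceberg's implementation: the
class name, the `vN.metadata.json` file names, and the retention cap
parameter are illustrative assumptions (as far as I know the cap corresponds
to the `write.metadata.previous-versions-max` table property, but check the
configuration docs).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a table's metadata directory; NOT Iceberg's actual code. */
class MetadataDirModel {
    // Models write.metadata.delete-after-commit.enabled.
    private final boolean deleteAfterCommit;
    // Assumed cap on how many previous metadata files are kept.
    private final int previousVersionsMax;
    // Older metadata files still on disk, oldest first.
    private final Deque<String> previous = new ArrayDeque<>();
    // The current metadata file; it is self-contained.
    private String current;
    private int version = 0;

    MetadataDirModel(boolean deleteAfterCommit, int previousVersionsMax) {
        this.deleteAfterCommit = deleteAfterCommit;
        this.previousVersionsMax = previousVersionsMax;
    }

    /** Every commit writes a new metadata file and demotes the old one. */
    void commit() {
        if (current != null) {
            previous.addLast(current);
        }
        version++;
        current = "v" + version + ".metadata.json"; // illustrative name only
        if (deleteAfterCommit) {
            // Drop the oldest previous files once the cap is exceeded.
            while (previous.size() > previousVersionsMax) {
                previous.removeFirst();
            }
        }
    }

    /** Number of metadata files left in the directory. */
    int fileCount() {
        return previous.size() + (current == null ? 0 : 1);
    }

    String currentFile() {
        return current;
    }
}
```

In this model, 1000 commits with cleanup off leave 1000 metadata files,
while cleanup on with a cap of 5 leaves 6 (the current file plus 5 previous
ones). Snapshots are untouched either way; only `expireSnapshots()` removes
those.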

On Mon, Jul 27, 2020 at 4:33 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> Hi devs,
>
> I'm experimenting with Apache Iceberg for Structured Streaming sink - plan
> to experiment with source as well, but I see PR still in review.
>
> It seems that "fast append" pretty much helps to retain reasonable latency
> for committing, though the metadata directory grows too fast. I found the
> option 'write.metadata.delete-after-commit.enabled' (false by default), and
> disabled it, and the overall size looks fine afterwards.
>
> That said, given the option is false by default, I'm wondering which would
> be impacted when turning off this option. My understanding is that it
> doesn't affect time-travel (as it refers to a snapshot), and restoring is
> also from snapshot, so not sure which point to consider when turning on the
> option.
>
> Thanks,
> Jungtaek Lim
>


-- 
Ryan Blue
Software Engineer
Netflix
