Hi Jungtaek,

That setting controls whether Iceberg cleans up old copies of the table metadata file. The metadata file holds references to all of the table's snapshots (that have not expired) and is self-contained, so no operation needs to access previous metadata files.
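
In case a concrete sketch helps, here is roughly what turning the cleanup on might look like from Java. The class name `StreamingTableMaintenance`, the helper names, the value 20, and the 3-day retention are illustrative assumptions only. `write.metadata.previous-versions-max` is the companion table property that caps how many previous metadata files are kept once cleanup is enabled; check that your Iceberg version supports it. The Iceberg calls themselves appear only in comments so the snippet runs without Iceberg on the classpath.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of Iceberg.
class StreamingTableMaintenance {
    // Table property discussed in this thread (false by default).
    static final String DELETE_AFTER_COMMIT = "write.metadata.delete-after-commit.enabled";
    // Companion property: how many previous metadata files to keep.
    static final String PREVIOUS_VERSIONS_MAX = "write.metadata.previous-versions-max";

    // Builds the table properties that enable cleanup of old metadata files.
    static Map<String, String> metadataCleanupProperties(int previousVersionsToKeep) {
        if (previousVersionsToKeep < 1) {
            throw new IllegalArgumentException("must keep at least one previous version");
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put(DELETE_AFTER_COMMIT, "true");
        props.put(PREVIOUS_VERSIONS_MAX, Integer.toString(previousVersionsToKeep));
        return props;
    }

    // Computes the cutoff timestamp (epoch millis) to pass to snapshot expiration.
    // Snapshots older than this can no longer be used for time travel once expired.
    static long expireOlderThanMillis(Instant now, Duration retention) {
        return now.minus(retention).toEpochMilli();
    }

    public static void main(String[] args) {
        Map<String, String> props = metadataCleanupProperties(20);
        props.forEach((k, v) -> System.out.println(k + "=" + v));

        long cutoff = expireOlderThanMillis(Instant.now(), Duration.ofDays(3));
        System.out.println("expire snapshots older than (epoch ms): " + cutoff);

        // With an org.apache.iceberg.Table handle named `table` (assumed to be
        // loaded elsewhere), the values above would be applied along these lines:
        //
        //   UpdateProperties update = table.updateProperties();
        //   props.forEach(update::set);
        //   update.commit();
        //
        //   table.expireSnapshots().expireOlderThan(cutoff).commit();
    }
}
```

The retention window is the trade-off to think about: a shorter window keeps table metadata small, but it also shortens how far back time travel can reach.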

Metadata files aren't typically that large, but they can add up when streaming data because you create a lot of table versions. For streaming, I'd recommend turning the setting on and making sure you're running `expireSnapshots()` regularly to prune old table versions -- although expiring snapshots will remove them from table metadata and limit how far back you can time travel.

On Mon, Jul 27, 2020 at 4:33 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Hi devs,
>
> I'm experimenting with Apache Iceberg for Structured Streaming sink - plan
> to experiment with source as well, but I see PR still in review.
>
> It seems that "fast append" pretty much helps to retain reasonable latency
> for committing, though the metadata directory grows too fast. I found the
> option 'write.metadata.delete-after-commit.enabled' (false by default), and
> disabled it, and the overall size looks fine afterwards.
>
> That said, given the option is false by default, I'm wondering which would
> be impacted when turning off this option. My understanding is that it
> doesn't affect time-travel (as it refers to a snapshot), and restoring is
> also from snapshot, so not sure which point to consider when turning on the
> option.
>
> Thanks,
> Jungtaek Lim

--
Ryan Blue
Software Engineer
Netflix