Thanks for the PR! We will review it soon.

The RewriteManifestsAction is a parallel way to rewrite manifests and then
commit them using the RewriteManifests table operation. The idea here is
that the metadata tree (manifest list and manifest files) functions as an
index over the data in the table. Sometimes, you want to rewrite that index
to make finding data easier.

When data is appended, the new data files are placed into a new manifest
that is added to the manifest list file. Manifests are also compacted when
there are enough of them or a group reaches a certain size. Both of these
operations tend to cluster data in manifests by the time that it was added
to the table. This is great for time-series data because you get manifests
that each have a few days or a week of data, and a time filter can find the
right manifests and data files quickly. But if you want to look up data by
a dimension other than time -- for example, using the bucket of an ID --
then the natural clustering doesn't work well. In that case, you can use
RewriteManifests or the RewriteManifestsAction to cluster data files by
some key. That really helps speed up queries for specific rows.
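As a rough sketch, clustering data files by an ID bucket with the
table-level RewriteManifests operation might look like the following. This
assumes a table whose partition spec has an integer bucket field at
position 0; `loadTable()` is a hypothetical stand-in for loading the table
from your catalog:

```java
import org.apache.iceberg.Table;

// Sketch only: `table` must be an already-loaded Iceberg Table whose
// partition spec has an integer bucket field at position 0.
Table table = loadTable();  // hypothetical helper; load via your catalog

table.rewriteManifests()
    // Regroup data files so that each manifest covers a narrow range of
    // ID buckets instead of a range of commit times.
    .clusterBy(file -> file.partition().get(0, Integer.class))
    .commit();
```

After this commit, a query that filters on the bucketed ID only needs to
open the manifests whose bucket ranges match, instead of scanning every
manifest written over time.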

On Mon, Jul 27, 2020 at 8:41 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> I'd love to contribute documentation about the actions - just need some
> time to understand the needs for some actions (like RewriteManifestsAction).
>
> I just submitted a PR for the structured streaming sink [1]. I mentioned
> expireSnapshots() there with a link to the javadoc page, but it'd be nice
> if there were also a code example on the API page, as it wouldn't be bound
> to Spark's case (any fast-changing case, including Flink streaming, would
> need this as well).
>
> 1. https://github.com/apache/iceberg/pull/1261
>
> On Tue, Jul 28, 2020 at 10:39 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> > seems to fail on high rate writing streaming query being run on the
>> other side
>>
>> This kind of situation is where you'd want to tune the number of retries
>> for a table. That's a likely source of the problem. We can also check to
>> make sure we're being smart about conflict detection. A rewrite needs to
>> scan any manifests that might have data files that conflict, which is why
>> retries can take a little while. The farther the rewrite is from active
>> data partitions, the better. And we can double-check to make sure we're
>> using the manifest file partition ranges to avoid doing unnecessary work.
>>
>> > from the end user's point of view, all actions are not documented
>>
>> Yes, we need to add documentation for the actions. If you're interested,
>> feel free to open PRs! The actions are fairly new, so we don't yet have
>> docs for them.
>>
>> The same goes for the streaming sink: we just need someone to write up
>> docs and contribute them. We don't use the streaming sink ourselves, so
>> I've unfortunately overlooked it.
>>
>> On Mon, Jul 27, 2020 at 3:25 PM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> Thanks for the quick response!
>>>
>>> And yes, I also experimented with expireSnapshots() and it looked good.
>>> I can imagine some alternative conditions for expiring snapshots (like
>>> adjusting the "granularity" between snapshots instead of removing all
>>> snapshots before a specific timestamp), but for now it's just an idea
>>> and I don't have real-world needs to back it up.
>>>
>>> I also went through RewriteDataFilesAction and it looked good as well.
>>> There's an existing GitHub issue to make the action more intelligent,
>>> which is valid and good to add. One thing I noticed is that it's a bit
>>> of a time-consuming task (expected, for sure, not a problem) and it
>>> seems to fail when a high-rate streaming write query is running on the
>>> other side (this is a concern). I guess the action only relates to old
>>> snapshots, hence no conflict is expected against fast appends. It would
>>> be nice to know whether this is expected behavior and it's recommended
>>> to stop all writes before running the action, or whether it sounds like
>>> a bug.
>>>
>>> I haven't gone through RewriteManifestsAction, though for now I am only
>>> curious about the need for it. I'm eager to experiment with the
>>> streaming source which is in review - I don't know the details of
>>> Iceberg, so I'm not qualified to participate in reviewing. I'd rather
>>> play with it when it's available and use it as a chance to learn about
>>> Iceberg itself.
>>>
>>> Btw, from the end user's point of view, none of the actions are
>>> documented - even the structured streaming sink is not documented, and I
>>> had to go over the code. While I think it's obvious that a streaming
>>> sink should be documented in the Spark docs (I wonder why it missed
>>> documentation), would we want to document the actions as well? It looks
>>> like these actions are still evolving, so I wonder whether we are
>>> waiting for them to stabilize, or just missed the documentation.
>>>
>>> Thanks,
>>> Jungtaek Lim
>>>
>>> On Tue, Jul 28, 2020 at 2:45 AM Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Hi Jungtaek,
>>>>
>>>> That setting controls whether Iceberg cleans up old copies of the table
>>>> metadata file. The metadata file holds references to all of the table's
>>>> snapshots (that have not expired) and is self-contained. No operations
>>>> need to access previous metadata files.
>>>>
>>>> Those files aren't typically that large, but they can be when streaming
>>>> because you create a lot of versions. For streaming, I'd recommend
>>>> turning the setting on and making sure you're running
>>>> `expireSnapshots()` regularly to prune old table versions -- although
>>>> expiring snapshots will remove them from table metadata and limit how
>>>> far back you can time travel.
>>>>
>>>> On Mon, Jul 27, 2020 at 4:33 AM Jungtaek Lim <
>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hi devs,
>>>>>
>>>>> I'm experimenting with Apache Iceberg for Structured Streaming sink -
>>>>> plan to experiment with source as well, but I see PR still in review.
>>>>>
>>>>> It seems that "fast append" pretty much helps to retain reasonable
>>>>> latency for committing, though the metadata directory grows too fast.
>>>>> I found the option 'write.metadata.delete-after-commit.enabled' (false
>>>>> by default), enabled it, and the overall size looks fine afterwards.
>>>>>
>>>>> That said, given the option is false by default, I'm wondering what
>>>>> would be impacted by turning on this option. My understanding is that
>>>>> it doesn't affect time travel (as it refers to a snapshot), and
>>>>> restoring is also from a snapshot, so I'm not sure which point to
>>>>> consider when turning on the option.
>>>>>
>>>>> Thanks,
>>>>> Jungtaek Lim
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix