Re: [DISCUSS] Changes for row-level deletes

Gautam Wed, 06 May 2020 17:52:21 -0700

My 2 cents:


>  * Merge manifest_entry and data_file?

-1. My preference is to keep the difference between v1 and v2 metadata to a
minimum by keeping manifest_entry the same in both. People using either flow
will want to modify and contribute, and shouldn't have to worry about porting
things over every time.

>  * How should planning with delete files work?

+1 on keeping these independent and in two phases, as you mentioned. That
allows processing in parallel. Could we make this a SparkAction too at some
point?
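
As a rough illustration of the two-phase planning discussed in this thread
(not Iceberg's actual planner), here is a minimal Python sketch. All names
(FileEntry, Manifest, plan_scan, the "data"/"deletes" strings) are
hypothetical, and the rule that a delete file applies to data files in the
same partition with an equal or lower sequence number is an assumption made
only for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class FileEntry:
    path: str
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class Manifest:
    content: str  # "data" or "deletes"; a manifest never mixes the two
    entries: Tuple[FileEntry, ...]


def plan_scan(
    manifests: Iterable[Manifest],
) -> Iterator[Tuple[FileEntry, List[FileEntry]]]:
    manifests = list(manifests)

    # Phase 1: read only the delete manifests and hold the matches in memory.
    deletes_by_partition: Dict[str, List[FileEntry]] = {}
    for manifest in manifests:
        if manifest.content == "deletes":
            for delete in manifest.entries:
                deletes_by_partition.setdefault(delete.partition, []).append(delete)

    # Phase 2: stream the data manifests. Every delete is already known, so
    # each (data file, deletes) pair can be emitted immediately.
    for manifest in manifests:
        if manifest.content == "data":
            for data_file in manifest.entries:
                applicable = [
                    d
                    for d in deletes_by_partition.get(data_file.partition, [])
                    # Assumed rule: a delete affects data written at or before it.
                    if d.sequence_number >= data_file.sequence_number
                ]
                yield data_file, applicable
```

Because each manifest holds only one kind of file, each phase reads every
manifest it needs exactly once, and the manifests within a phase could be
read in parallel.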


>  * Mix delete files and data files in manifests? I think we should not,
to support the two-phase planning approach.

-1. We should not, for the reason you mention.


>  * If delete files and data files are separate, should manifests use the
same schema?

+1.
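
For illustration only, a single entry schema shared by data and delete
manifests could look like the Python sketch below. This is not the Iceberg
schema: the class and field names, the numeric enum values, and the choice to
treat an entry without a content field as a data file are all assumptions.
The only behavior taken from the thread is that metadata written without a
sequence number is read as sequence number 0.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class FileContent(Enum):
    # Numeric values are assumed for this sketch.
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass(frozen=True)
class ManifestEntry:
    status: int
    snapshot_id: int
    sequence_number: int
    content: FileContent
    file_path: str


def read_entry(raw: Dict[str, Any]) -> ManifestEntry:
    """Build an entry from a decoded record (hypothetical dict layout)."""
    return ManifestEntry(
        status=raw["status"],
        snapshot_id=raw["snapshot_id"],
        # v1 metadata has no sequence number; read it as 0.
        sequence_number=raw.get("sequence_number", 0),
        # Assumption: an entry with no content field describes a data file.
        content=FileContent(raw.get("content", FileContent.DATA.value)),
        file_path=raw["file_path"],
    )
```

With one schema, the same reader handles both kinds of manifest and only the
content field distinguishes data files from delete files.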

On Wed, May 6, 2020 at 10:39 AM Anton Okolnychyi wrote:

> We won’t have to rewrite V1 metadata when migrating to V2. The format is
> backward compatible and we can read V1 manifests just fine in V2. For
> example, V1 metadata will not have sequence numbers and V2 would
> interpret that as sequence number = 0. The only thing we need to prohibit
> is V1 writers writing to V2 tables. That check is already in place and such
> attempts will fail. Recent changes that went in ensure that V1 and V2
> co-exist in the same codebase. As of now, we have a format version in
> TableMetadata. I think the manual change Ryan was referring to would
> simply mean updating that version flag, not rewriting the metadata. That
> change can be done via TableOperations.
>
>> One change that I've been considering is getting rid of manifest_entry. In
>> v1, a manifest stored a manifest_entry that wrapped a data_file. The intent
>> was to separate data that API users needed to supply -- fields in data_file
>> -- from data that was tracked internally by Iceberg -- the snapshot_id and
>> status fields of manifest_entry. If we want to combine these so that a
>> manifest stores one top-level data_file struct, then now is the time to
>> make that change. I've prototyped this in #963
>> <https://github.com/apache/incubator-iceberg/pull/963>. The benefit is
>> that the schema is flatter so we wouldn't need two metadata tables (entries
>> and files). The main drawback is that we aren't going to stop using v1
>> tables, so we would effectively have two different manifest schemas instead
>> of v2 as an evolution of v1. I'd love to hear more opinions on whether to
>> do this. I'm leaning toward not merging the two.
>
>
> As mentioned earlier, I’d rather keep ManifestEntry to reduce the number
> of changes we have in V1 and V2. I feel it will be easier for other people
> who want to contribute to the core metadata management to follow it. That
> being said, I do get the intention of merging the two.
>
>> Another change is to start adding tracking fields for delete files and
>> updating the APIs. The metadata for this is fairly simple: an enum that
>> stores whether the file is data, position deletes, or equality deletes. The
>> main decision point is whether to allow mixing data files and delete files
>> together in manifests. I don't think that we should allow manifests with
>> both delete files and data files. The reason is job planning: we want to
>> start emitting splits immediately so that we can stream them, instead of
>> holding them all in memory. That means we need some way to guarantee that
>> we know all of the delete files to apply to a data file before we encounter
>> the data file.
>
>
> I don’t see a good reason to mix delete and data files in a single
> manifest now. In our original idea, we wanted to keep deletes separate
> because it seemed that would make it easier to come up with an efficient
> job planning approach later on. Once we know the approach we want to take
> for planning input splits and doing compaction, we can revisit this point.
>
> - Anton
>
> On 6 May 2020, at 09:04, Junjie Chen wrote:
>
> Hi Ryan
>
> Besides the reading and merging of delete files, can we talk a bit about
> the write side of delete files? For example, generating delete files in a
> Spark action, metadata column support, a service to convert equality
> delete files to position delete files, etc.
>
> On Wed, May 6, 2020 at 1:34 PM Miao Wang wrote:
>
>> Hi Ryan,
>>
>>
>>
>> “Tables must be manually upgraded to version 2 in order to use any of the
>> metadata changes we are making.” If I understand correctly, for an existing
>> Iceberg v1 table, we have to run some CLI/script to rewrite the metadata.
>>
>> “Next, we've added sequence numbers and the proposed inheritance scheme
>> to v2, along with tests to ensure that v1 is written without sequence
>> numbers and that when reading v1 metadata, the sequence numbers are all 0.”
>> To me, this means a V2 reader should be able to read V1 table metadata.
>> Therefore, the step above should not be required; we would only need to use
>> a V2 reader on a V1 table.
>>
>> However, if a table has been written in V1, we want to save it as V2. I
>> expect that only the metadata will be rewritten into V2, and that the V1
>> metadata will be vacuumed once the V2 write succeeds.
>>
>> Is my understanding correct?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> Miao
>>
>> From: Ryan Blue
>> Date: Tuesday, May 5, 2020 at 5:03 PM
>> To: Iceberg Dev List
>> Subject: [DISCUSS] Changes for row-level deletes
>>
>>
>>
>> Hi, everyone,
>>
>>
>>
>> I know several people that are planning to attend the sync tomorrow are
>> interested in the row-level delete work, so I wanted to share some of the
>> progress and my current thinking ahead of time.
>>
>>
>>
>> The codebase now supports a new version number, 2. Tables must be
>> manually upgraded to version 2 in order to use any of the metadata changes
>> we are making; v1 readers cannot read v2 tables. When a write takes place,
>> the version number is now passed to the manifest writer, manifest list
>> writer, etc. and the right schema for the table's current version is used.
>> We've also frozen the v1 schemas and added wrappers to ensure that even as
>> the internal classes, like DataFile, evolve, the exact same data is written
>> to v1.
>>
>>
>>
>> Next, we've added sequence numbers and the proposed inheritance scheme to
>> v2, along with tests to ensure that v1 is written without sequence numbers
>> and that when reading v1 metadata, the sequence numbers are all 0. This
>> gives us the ability to track "when" a row-level delete occurred in a v2
>> table.
>>
>>
>>
>> The next steps are to start making larger changes to metadata files.
>>
>>
>>
>> One change that I've been considering is getting rid of manifest_entry.
>> In v1, a manifest stored a manifest_entry that wrapped a data_file. The
>> intent was to separate data that API users needed to supply -- fields in
>> data_file -- from data that was tracked internally by Iceberg -- the
>> snapshot_id and status fields of manifest_entry. If we want to combine
>> these so that a manifest stores one top-level data_file struct, then now is
>> the time to make that change. I've prototyped this in #963
>> <https://github.com/apache/incubator-iceberg/pull/963>.
>> The benefit is that the schema is flatter so we wouldn't need two metadata
>> tables (entries and files). The main drawback is that we aren't going to
>> stop using v1 tables, so we would effectively have two different manifest
>> schemas instead of v2 as an evolution of v1. I'd love to hear more opinions
>> on whether to do this. I'm leaning toward not merging the two.
>>
>>
>>
>> Another change is to start adding tracking fields for delete files and
>> updating the APIs. The metadata for this is fairly simple: an enum that
>> stores whether the file is data, position deletes, or equality deletes. The
>> main decision point is whether to allow mixing data files and delete files
>> together in manifests. I don't think that we should allow manifests with
>> both delete files and data files. The reason is job planning: we want to
>> start emitting splits immediately so that we can stream them, instead of
>> holding them all in memory. That means we need some way to guarantee that
>> we know all of the delete files to apply to a data file before we encounter
>> the data file.
>>
>>
>>
>> OpenInx suggested sorting by sequence number to see delete files before
>> data files, but it still requires holding all splits in memory in the worst
>> case due to overlapping sequence number ranges. I think Iceberg should plan
>> a scan in two phases: one to find matching delete files (held in memory)
>> and one to find matching data files. That solves the problem of having all
>> deletes available so a split can be immediately emitted, and also allows
>> parallelizing both phases without coordination across threads.
>>
>>
>>
>> For the two-phase approach, mixing delete files and data files in a
>> manifest would require reading that manifest twice, once in each phase. I
>> think it makes the most sense to keep delete files and data files in
>> separate manifests. But the trade-off is that Iceberg will need to track
>> the content of a manifest (deletes or data) and perform actions on separate
>> manifest groups.
>>
>>
>>
>> Also, because with separate delete and data manifests we _could_ use
>> separate manifest schemas, I went through and wrote out a schema for a
>> delete file manifest. That schema was so similar to the current data file
>> schema that I think it's simpler to use the same one for both.
>>
>>
>>
>> In summary, here are the things that we need to decide and what I think
>> we should do:
>>
>>
>>
>> * Merge manifest_entry and data_file? I think we should not, to
>> avoid additional complexity.
>>
>> * How should planning with delete files work? The two-phase approach is
>> the only one I think is viable.
>>
>> * Mix delete files and data files in manifests? I think we should not, to
>> support the two-phase planning approach.
>>
>> * If delete files and data files are separate, should manifests use the
>> same schema? Yes, because it is simpler.
>>
>>
>>
>> Let's plan on talking about these questions in tomorrow's sync. And if
>> you have other topics, please send them to me!
>>
>>
>>
>> rb
>>
>>
>>
>> --
>>
>> Ryan Blue
>>
>> Software Engineer
>>
>> Netflix
>>
>
>
> --
> Best Regards
>
>
>
