Thanks for sharing the PoC work from your team, Junjie.

I read your PoC PRs and issues. You considered the whole path, including the
Spark write behaviors (while I only considered the Iceberg write), which
helps us understand all of the update/delete work.

There are some points we might need to discuss :-)

1. Spark would delete rows through the following API:

IcebergSource icebergTable = new IcebergSource();
// delete every row whose "data" column equals "a1"
icebergTable.deleteWhere(dbTable, new Filter[]{new EqualTo("data", "a1")});

Would the filter condition be limited to only a few simple built-in filters,
such as =, !=, >, <, <=, >=, IN, NOT, etc.? I saw Delta also defined the same
behavior [1][2]. In my mind, our users would like to run an unrestricted
update/delete, which means they can run the following SQL:

update employee set work_year = work_year + 1 where company_id in (select
id from company) and birthday >= '1990-01-01';

I've thought about the implementation: we may need to translate the UPDATE
plan into a SELECT plan, so that we can read all the <file_id, offset> pairs
and finally dump them into delete differential files (a rough sketch follows
below). We've discussed this problem before and said that Spark doesn't
provide a physical UPDATE plan, so there may be some obstacles. On the Flink
side we may try to accomplish the full UPDATE WHERE. I don't know how people
from the community think about this design difference, so I raise the issue
here.
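
To make this concrete, here is a minimal sketch of that translation (these
are my assumptions, not working PoC code: a SparkSession named spark,
hypothetical metadata columns _file and _pos similar to the meta columns in
the issue you mentioned, and a throw-away output path):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("update-to-select").getOrCreate();
// Rewrite the UPDATE's WHERE clause into a SELECT over the (assumed) metadata
// columns, so we collect every matched <file, offset> pair.
Dataset<Row> positions = spark.sql(
    "SELECT _file, _pos FROM employee "
        + "WHERE company_id IN (SELECT id FROM company) AND birthday >= '1990-01-01'");
// Dump the pairs into a delete differential file, sorted so the read path
// can merge-join them against the data files (see point 2 below).
positions.orderBy("_file", "_pos")
    .write()
    .format("parquet")
    .save("/tmp/employee-delete-diff");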

2. The Spark writer would write an InternalRowWithMeta into data files, where
each row is a tuple <realRow, file_path, row_offset> [3]. We might not need
to write the file_path, because all rows in a data file share the same
file_path; it is always a long string and would cost a lot of resources to
compare and sort (I chose the 1-1 solution). The row_offset could also be
designed implicitly, as I said in the document, though we may need a PoC
demo to prove this (I also defined an explicit row_offset in the table in my
PoC, :-) ).
Btw, we could also sort the delete differential files by <file_id,
row_offset> so that we can do a merge sort for a faster JOIN [4]; see the
sketch below.
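
To show what I mean by the merge sort, here is a minimal sketch (a
hypothetical helper on my side, not the PoC code): since the data file rows
arrive in file order (so a row's offset equals its position) and the delete
offsets for one file_id are sorted ascending, a single forward pass can drop
the deleted rows without building a hash set:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class MergeSortDeletes {
  // dataRows come in file order (a row's offset equals its position) and
  // sortedDeletes holds the ascending delete offsets for a single file_id.
  static <T> List<T> applyDeletes(Iterator<T> dataRows, Iterator<Long> sortedDeletes) {
    List<T> live = new ArrayList<>();
    long nextDelete = sortedDeletes.hasNext() ? sortedDeletes.next() : -1L;
    long offset = 0;
    while (dataRows.hasNext()) {
      T row = dataRows.next();
      if (offset == nextDelete) {
        // this position was deleted; advance to the next delete offset
        nextDelete = sortedDeletes.hasNext() ? sortedDeletes.next() : -1L;
      } else {
        live.add(row);
      }
      offset++;
    }
    return live;
  }
}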

3. Yeah, the sequence number should be added to the data file, manifest, and
snapshot. As the PR discussed, compatibility is an issue to consider.
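
Just to write down my current understanding of how the read path would use
it (a sketch of the rule only; the exact comparison is part of what the
compatibility discussion needs to settle): a delete file should apply to a
data file only when the data file is not newer than the delete file, e.g.

// Hypothetical rule: a delete file written at deleteFileSeq must not affect
// data files committed in later snapshots (larger sequence numbers).
static boolean deleteApplies(long dataFileSeq, long deleteFileSeq) {
  return dataFileSeq <= deleteFileSeq;
}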

[1]. https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala#L265
[2]. https://docs.databricks.com/delta/delta-update.html
[3]. https://github.com/apache/incubator-iceberg/compare/master...chenjunjiedada:row-level-delete#diff-c168df8c9739650eab655b22b0b549acR407
[4]. https://github.com/apache/incubator-iceberg/compare/master...chenjunjiedada:row-level-delete#diff-fffa37e29d3736de086cbd23094865b7R63



On Sun, Mar 22, 2020 at 8:49 PM Junjie Chen <chenjunjied...@gmail.com>
wrote:

> Great job and nice document @OpenInx! Thanks for sharing the progress!
>
> I also did the PoC a couple of weeks ago; you can take a look at the code
> here
> <https://github.com/chenjunjiedada/incubator-iceberg/tree/row-level-delete>.
> My approach is to use the additional meta columns (SRI)  and it is based on
> the sequence number pull request #588
> <https://github.com/apache/incubator-iceberg/pull/588>.  The main
> differences from yours include:
>
>    - base file write path: It hooks the internal row to add metadata for
>    file name and row id.
>    - delete file write path: It uses Spark to generate the deletion
>    files via a staging table, and also sorts the deletion files by file name.
>    - read path: Besides the sequence number, it uses the lower bound and
>    upper bound to narrow down the deletion files.
>    - base file + deletion file merge: It uses the filter API and also needs
>    the merge sort optimization.
>
> FYI, there is also an issue
> <https://github.com/apache/incubator-iceberg/issues/825> about the
> additional meta columns; it seems like Spark will handle the additional
> columns for Iceberg, so I didn't go further on that.
>
> Besides the design doc, we still need to finalize more details for
> merge-on-read, and I think that would be a good topic for the next sync-up
> meeting.
>
>
> On Sat, Mar 21, 2020 at 9:01 PM OpenInx <open...@gmail.com> wrote:
>
>> Dear Iceberg Dev:
>>
>> As I said in the document [1] before, we think the Iceberg update/delete
>> feature (mainly merge-on-read) is a high-priority feature (we've also
>> discussed some Flink + Iceberg scenarios, and anybody who is interested
>> in that part can read the document).
>>
>> Recently, I wrote a demo to implement the merge-on-read feature (PoC).
>> The pull request is here [2], and I also provided a document to show the
>> work [3].
>>
>> Any suggestions or feedback would be appreciated. Thanks.
>>
>> [1]. https://docs.google.com/document/d/1I7FUPHyyvtZZ7zaTT1Lq14rNIEZFhzD41-fazVHEoIA/edit?usp=sharing
>> [2]. https://github.com/openinx/incubator-iceberg/pull/5/files
>> [3]. https://docs.google.com/document/d/1CPFun2uG-eXdJggqKcPsTdNa2wPMpAdw8loeP-0fm_M/edit?usp=sharing
>>
>>
>
> --
> Best Regards
>
