Re: What have I learned from doing Merge-On-Read PoC

2020-03-22 Thread Junjie Chen
Great job and nice document @OpenInx! Thanks for sharing the progress!

I also did a PoC a couple of weeks ago; you can take a look at the code here.
My approach uses additional meta columns (SRI) and is based on the sequence
number pull request #588. The main differences from yours include:

   - base file write path: It hooks the internal row to add metadata columns
   for the file name and row id.
   - delete file write path: It uses Spark to generate the deletion files via
   a staging table, and also sorts each deletion file by file name.
   - read path: Besides the sequence number, it uses the lower and upper
   bounds to narrow down the deletion files.
   - base file + deletion file merge: It uses the filter API and still needs
   a merge sort optimization.
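
To make the read path and the merge step above concrete, here is a minimal
Python sketch. It is not the PoC code and not Iceberg API: DeleteFile,
candidate_delete_files, merge_live_rows and read_with_deletes are hypothetical
names, and it assumes that a deletion file applies to a base file only when
its sequence number is at least the base file's, and that deletion entries
are (file_name, row_id) pairs sorted in that order.

```python
import heapq
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List, Tuple

Position = Tuple[str, int]  # (file_name, row_id)


@dataclass(frozen=True)
class DeleteFile:
    """Hypothetical descriptor of one deletion file (not an Iceberg class)."""
    sequence_number: int
    lower_bound: Position      # smallest (file_name, row_id) in the file
    upper_bound: Position      # largest (file_name, row_id) in the file
    entries: List[Position]    # deleted positions, sorted ascending


def candidate_delete_files(file_name: str, sequence_number: int,
                           row_count: int,
                           delete_files: Iterable[DeleteFile]) -> List[DeleteFile]:
    """Narrow down deletion files by sequence number and by bounds."""
    first, last = (file_name, 0), (file_name, row_count - 1)
    return [
        d for d in delete_files
        # Assumption: only deletes at or after the base file's sequence apply.
        if d.sequence_number >= sequence_number
        # Range overlap test on the lower/upper bounds.
        and d.lower_bound <= last and d.upper_bound >= first
    ]


def merge_live_rows(file_name: str,
                    rows: Iterable[Tuple[int, Any]],
                    deleted: Iterable[Position]) -> Iterator[Tuple[int, Any]]:
    """Merge-style anti-join: both inputs are sorted, so one pass suffices."""
    deletes = iter(deleted)
    current = next(deletes, None)
    for row_id, record in rows:
        key = (file_name, row_id)
        # Skip deleted positions that sort before the current row.
        while current is not None and current < key:
            current = next(deletes, None)
        if current == key:
            continue  # row is deleted, drop it
        yield row_id, record


def read_with_deletes(file_name: str, sequence_number: int,
                      rows: List[Tuple[int, Any]],
                      delete_files: Iterable[DeleteFile]) -> List[Tuple[int, Any]]:
    candidates = candidate_delete_files(file_name, sequence_number,
                                        len(rows), delete_files)
    # heapq.merge keeps the combined deletion stream sorted without loading
    # every entry into one big list.
    deleted = heapq.merge(*(d.entries for d in candidates))
    return list(merge_live_rows(file_name, rows, deleted))
```

Because both inputs are already sorted, the merge touches each base row and
each deletion entry once, which is the point of sorting the deletion files on
the write path.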

FYI, there is also an issue about the additional meta column. It seems like
Spark will handle the additional columns for Iceberg, so I didn't go further
on that.

Besides the design doc, we still need to finalize more details for
merge-on-read, and I think that would be a good topic for the next sync-up
meeting.





On Sat, Mar 21, 2020 at 9:01 PM OpenInx  wrote:

> Dear Iceberg Dev:
>
> As I said in the document [1] before, we think the Iceberg update/delete
> features (mainly merge-on-read) are high-priority features (we've also
> discussed some Flink + Iceberg scenarios, and anybody who is interested in
> that part can read the document).
>
> Recently, I wrote a demo to implement merge-on-read (a PoC).
> The pull request is here [2], and I also provided
> a document to show the work [3].
>
> Any suggestion or feedback would be appreciated, Thanks.
>
> [1].
> https://docs.google.com/document/d/1I7FUPHyyvtZZ7zaTT1Lq14rNIEZFhzD41-fazVHEoIA/edit?usp=sharing
> [2]. https://github.com/openinx/incubator-iceberg/pull/5/files
> [3].
> https://docs.google.com/document/d/1CPFun2uG-eXdJggqKcPsTdNa2wPMpAdw8loeP-0fm_M/edit?usp=sharing
>
>

-- 
Best Regards


Re: Shall we start a regular community sync up?

2020-03-22 Thread Romin Parekh
Hi folks, 

Both time slots work for me next week. Can we confirm a day?

Thanks,
Romin

Sent from my iPhone

> On Mar 20, 2020, at 11:38 PM, Jun H.  wrote:
> 
> The schedule works for me.
> 
>> On Thu, Mar 19, 2020 at 6:55 PM Junjie Chen  wrote:
>> 
>> The same time works for me as well.
>> 
>>> On Fri, Mar 20, 2020 at 9:43 AM Gautam  wrote:
>>> 
>>> 5 / 5:30pm any day of next week works for me.
>>> 
>>> On Thu, Mar 19, 2020 at 6:07 PM 李响 wrote:
>>>>
>>>> 5 or 5:30 PM (UTC-7, is it PDT now) in any day works for me. Looking
>>>> forward to it 8-)
>>>>
>>>> On Fri, Mar 20, 2020 at 8:17 AM RD wrote:
>>>>>
>>>>> Same time works for me too!
>>>>>
>>>>> On Thu, Mar 19, 2020 at 4:45 PM Xabriel Collazo Mojica wrote:
>>>>>>
>>>>>> 5pm or 5:30pm PT  any day next week would work for me.
>>>>>>
>>>>>> Thanks for restoring the community sync up!
>>>>>>
>>>>>> Xabriel J Collazo Mojica  |  Sr Computer Scientist II  |  Adobe
>>>>>>
>>>>>> On 3/18/20, 6:45 PM, "justin_cof...@apple.com on behalf of Justin Q
>>>>>> Coffey" wrote:
>>>>>>
>>>>>> Any chance we could actually do 5:30pm PST?  I'm a bit of a lurker,
>>>>>> but this roadmap is important to mine and I have a daily at 5pm :(.
>>>>>>
>>>>>> -Justin
>>>>>>
>>>>>>> On Mar 18, 2020, at 6:43 PM, Saisai Shao wrote:
>>>>>>>
>>>>>>> 5pm PST in any day works for me.
>>>>>>>
>>>>>>> Looking forward to it.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Saisai
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>   李响 Xiang Li
>>>>
>>>> 手机 cellphone :+86-136-8113-8972
>>>> 邮件 e-mail  :wate...@gmail.com
>> 
>> 
>> 
>> --
>> Best Regards


Re: Shall we start a regular community sync up?

2020-03-22 Thread John Zhuge
5-5:30 pm works for me. I prefer Wednesdays.


-- 
John Zhuge


Re: Shall we start a regular community sync up?

2020-03-22 Thread Ryan Blue
Let's go with Wednesday. I'll send out an invite.


-- 
Ryan Blue
Software Engineer
Netflix


Re: Shall we start a regular community sync up?

2020-03-22 Thread Ryan Blue
I invited everyone that replied to this thread and the people that were on
the last invite.

If you have specific topics you'd like to put on the agenda, please send
them to me!


-- 
Ryan Blue
Software Engineer
Netflix