Re: What's the time to expose iceberg format v2 to end users ?

Jun H. Wed, 20 Jan 2021 17:38:55 -0800

I updated the doc with a few changes related to partition evolution.

Thanks.



Jun

On Tue, Dec 22, 2020 at 5:06 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> Thanks, Yan!
>
> To summarize that doc a bit, the main blockers are:
> * Finish updating the spec for NaN counters and behavior
> * Fix the issue with partition transforms and values before 1970 (#1680)
> * Partition evolution: Add lastPartitionFieldId to table metadata and update 
> docs
> * Add order id column to manifest files
> * Track the schema of each snapshot
>
> Only the last one is a somewhat large task, but even that should be fairly 
> quick. I think we can take care of those in the first couple months of 2021 
> after the 0.11.0 release is out.
>
> On Fri, Dec 18, 2020 at 12:59 AM OpenInx <open...@gmail.com> wrote:
>>
>> Thanks for the document, Yan. I will take a look at it and see what I can 
>> do.
>>
>> On Fri, Dec 18, 2020 at 3:38 AM Yan Yan <yyany...@gmail.com> wrote:
>>>
>>> Hi OpenInx,
>>>
>>> Thanks for bringing this up. I am currently working on the Format v2 
>>> blocking tasks. After speaking with Ryan a while ago, I am maintaining a 
>>> full list of the blocking tasks, with their descriptions and current 
>>> status, here. It covers all open issues listed in the GitHub milestone plus 
>>> some others that people brought up during the community sync. It would be 
>>> great if you are interested in collaborating or code reviewing!
>>>
>>> Everyone, please feel free to let me know or update the doc if you see any 
>>> item that is missing or described inaccurately.
>>>
>>> Thanks,
>>> Yan
>>>
>>> On Wed, Dec 16, 2020 at 11:03 PM OpenInx <open...@gmail.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> I am writing this email to align with the community on when to expose 
>>>> format v2 to end users.
>>>>
>>>> In Iceberg format v2, we've implemented row-level deletes. They are 
>>>> designed for two use cases:
>>>>
>>>> 1.  Execute a single query to update or delete lots of rows. This is a 
>>>> typical batch update/delete job, which is suitable for GDPR or for cases 
>>>> where we want to correct wrong data.
>>>> 2.  Write a real-time CDC/UPSERT stream to an Iceberg table, so that 
>>>> upper-layer compute engines can analyze the change log within minutes. 
>>>> This is almost ready in the current master branch for the Flink integration.
>>>>
>>>>
>>>> I'm not quite sure what the blockers for Iceberg format v2 are now. 
>>>> I'd love to help resolve those blockers if there are any.
>>>>
>>>> Thanks.
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
