Re: Iceberg community sync notes for 1 September 2021

OpenInx Wed, 08 Sep 2021 18:36:30 -0700

Thanks for the summary, Ryan!

I would like to add the following items to the roadmap for 0.13.0:


*Flink Integration*

1.  Upgrade the Flink version from 1.12.1 to 1.13.2 (
https://github.com/apache/iceberg/pull/2629).

Flink 1.12.1 has a bug when reading nested data types (Map/List) in
Flink SQL (see:
https://github.com/apache/iceberg/pull/3081#pullrequestreview-747934199);
the newly released 1.13.2 resolves it.

2.  Support for creating an Iceberg table with 'connector'='type' in Flink
SQL (https://github.com/apache/iceberg/pull/2666).

The PR has been merged, but the accompanying Flink connector document is
still open for review (https://github.com/apache/iceberg/pull/3085).

3.  Add a streaming upsert option for the Flink write sink (
https://github.com/apache/iceberg/pull/2863).

This PR is essential for Flink upsert streams that write to an Iceberg
sink table. For more background, please see
https://github.com/apache/iceberg/pull/1996#issue-546072705.

*Ecosystem/Vendor Integration*

1.  Aliyun OSS/DLF integration (
https://github.com/apache/iceberg/pull/2230).

This is very important work that has been stalled for a long time.  The
good news is that Xingbo Wu <https://github.com/xingbowu> now has enough
bandwidth to move it forward.  I think we can finish this work
successfully if we have enough reviewing bandwidth.

2.  Dell ECS integration.

We had a great discussion (https://github.com/apache/iceberg/pull/2807)
about integrating private vendor storage/catalogs into the Apache Iceberg
repo, but I'm not sure it's suitable to add this to the 0.13.0 roadmap
before we reach agreement on the unit/integration/release tests for private
vendor integrations.


> Dan also suggested using github projects to track the progress of each
> feature.

+1! We should make better use of GitHub issues to manage the progress and
blockers of our roadmap, so that everyone can stay in sync with the latest
status and help move the roadmap forward.


On Thu, Sep 9, 2021 at 7:58 AM Ryan Blue <b...@tabular.io> wrote:

> Hi everyone,
>
> The notes for the Iceberg community sync last week are now updated in the
> agenda/notes doc
> <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.2umwfxbo0iwo>.
> If you have anything to add, feel free to let me know or add comments to
> the doc.
>
> We mainly discussed what projects we want to add to a roadmap and how to
> track them. I'll be sending out a discussion thread with the roadmap
> projects that we came up with so we can finalize it and add to it. Dan also
> suggested using github projects to track the progress of each feature.
>
> If you'd like to attend the syncs, you can add yourself to the iceberg-sync
> google group <https://groups.google.com/g/iceberg-sync> to receive the
> invites. Everyone is welcome to attend!
>
> Here are the notes if you prefer this over going to the doc:
>
> 1 September 2021
>
>    - Highlights
>       - 0.12.0 release is out (Thanks, Carl!)
>       - Metadata tables are updated for v2 (Thanks, Anton!)
>       - Stored procedure to add and dedup files (Thanks, Szehon!)
>    - Releases
>       - 0.13.0 release timeline
>          - Jack will be RM
>          - Targeting late Oct or early Nov
>       - 0.12.1
>          - Reads hanging <https://github.com/apache/iceberg/issues/3055> -
>            need to find someone. Maybe Russell?
>          - Parquet 1.12.0 bug
>            <https://github.com/apache/iceberg/issues/2962> - Thanks, Kyle!
>    - Roadmap discussion
>       - Tracking
>          - Dan: Github projects?
>          - Ryan: Markdown file on the site?
>       - Roadmap scope, items
>          - Snapshot tagging and branching - Jack, Ryan (reviews)
>          - Encryption - Gidon, Jack, Yufei
>          - Merge-on-read plans in Spark - Anton, Ryan (reviews)
>             - New writers
>          - Delete compaction - Junjie, Puneet
>          - Python - probably publish a separate roadmap
>             - Separate google group
>               <https://groups.google.com/g/iceberg-python-sync?hl=en>
>          - Views - Anjali, John
>          - Secondary indexes - Miao, Guy, Jack (some reviews)
>             - File-level
>             - Rollup
>          - Spark streaming - Sreeram, Kyle, Anton (reviews)
>             - CDC use case
>             - Limit support to process large snapshots
>             - CDC with Iceberg source
>          - [v3] Relative paths - Anurag, Yufei
>          - [v3] Z-ordering - Russell
>          - [v3] Default values in schemas - Owen
>          - Format v2 support in Trino - Jack
>          - Multi-catalog support for Trino, ongoing for PrestoDB - Jack
>          - Bucketed joins in Trino - Samarth has a working prototype
>          - Versioned docs
>    - Encryption PR / Design Doc - Gidon Gershinsky
>       - Quick update
>       - PRs with elements of the design
>       - Sent a minimal google doc focused on MVP
>       - Gidon to propose a time for encryption sync
>    - View spec
>       - First rev of the spec has feedback
>       - Major question: SQL dialect
>          - Do we have agreement to go ahead with the spec?
>          - Do we need more time?
>       - Carl: Spark would require dialect, version, and some config
>         properties, so the spec is not sufficient
>       - Ryan: The proposal includes places to store all of those
>       - Carl: Views form a graph so is Iceberg an appropriate storage?
>       - Anjali: Views across engine are not supported and metastores are
>         not working, adding this to Iceberg at least makes it possible to
>         share SQL, if not more in the future
>       - Dan: Views are stored in different ways, which made it impossible
>         to implement -- we tried before building the common view library
>         at Netflix
>       - Carl: Isn’t the representation just SQL? The spec punts on how to
>         store the representation. No specifics
>       - Carl: What has this enabled at Netflix?
>       - Anjali: Simple common SQL works across engines
>       - Ryan: And there is enough information to do view translation later
>         in either Iceberg or in engines
>    - Ran out of time
>
>
> --
> Ryan Blue
> Tabular
>
