Hi everyone,

The notes for the Iceberg community sync last week are now updated in the agenda/notes doc <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.2umwfxbo0iwo>. If you have anything to add, feel free to let me know or add comments to the doc.
We mainly discussed what projects we want to add to a roadmap and how to track them. I'll be sending out a discussion thread with the roadmap projects we came up with so we can finalize it and add to it. Dan also suggested using GitHub projects to track the progress of each feature.

If you'd like to attend the syncs, you can add yourself to the iceberg-sync Google group <https://groups.google.com/g/iceberg-sync> to receive the invites. Everyone is welcome to attend!

Here are the notes if you prefer this over going to the doc:

1 September 2021

- Highlights
  - 0.12.0 release is out (Thanks, Carl!)
  - Metadata tables are updated for v2 (Thanks, Anton!)
  - Stored procedure to add and dedup files (Thanks, Szehon!)
- Releases
  - 0.13.0 release timeline
    - Jack will be RM
    - Targeting late Oct or early Nov
  - 0.12.1
    - Reads hanging <https://github.com/apache/iceberg/issues/3055> - need to find someone. Maybe Russell?
    - Parquet 1.12.0 bug <https://github.com/apache/iceberg/issues/2962> - Thanks, Kyle!
- Roadmap discussion
  - Tracking
    - Dan: GitHub projects?
    - Ryan: Markdown file on the site?
  - Roadmap scope, items
    - Snapshot tagging and branching - Jack, Ryan (reviews)
    - Encryption - Gidon, Jack, Yufei
    - Merge-on-read plans in Spark - Anton, Ryan (reviews)
      - New writers
      - Delete compaction - Junjie, Puneet
    - Python - probably publish a separate roadmap
      - Separate Google group <https://groups.google.com/g/iceberg-python-sync?hl=en>
    - Views - Anjali, John
    - Secondary indexes - Miao, Guy, Jack (some reviews)
      - File-level
      - Rollup
    - Spark streaming - Sreeram, Kyle, Anton (reviews)
      - CDC use case
      - Limit support to process large snapshots
    - CDC with Iceberg source
    - [v3] Relative paths - Anurag, Yufei
    - [v3] Z-ordering - Russell
    - [v3] Default values in schemas - Owen
    - Format v2 support in Trino - Jack
    - Multi-catalog support for Trino, ongoing for PrestoDB - Jack
    - Bucketed joins in Trino - Samarth has a working prototype
    - Versioned docs
- Encryption PR / Design Doc - Gidon Gershinsky
  - Quick update
  - PRs with elements of the design
  - Sent a minimal Google doc focused on the MVP
  - Gidon to propose a time for an encryption sync
- View spec
  - First rev of the spec has received feedback
  - Major question: SQL dialect
  - Do we have agreement to go ahead with the spec?
  - Do we need more time?
  - Carl: Spark would require the dialect, version, and some config properties, so the spec is not sufficient
  - Ryan: The proposal includes places to store all of those
  - Carl: Views form a graph, so is Iceberg an appropriate place to store them?
  - Anjali: Views across engines are not supported and metastores are not working; adding this to Iceberg at least makes it possible to share SQL, if not more in the future
  - Dan: Views are stored in different ways, which made it impossible to implement; we tried that before building the common view library at Netflix
  - Carl: Isn't the representation just SQL? The spec punts on how to store the representation; there are no specifics
  - Carl: What has this enabled at Netflix?
  - Anjali: Simple common SQL works across engines
  - Ryan: And there is enough information to do view translation later, either in Iceberg or in engines
  - Ran out of time

-- 
Ryan Blue
Tabular