Iceberg community sync notes for 1 September 2021

Ryan Blue Wed, 08 Sep 2021 16:58:23 -0700

Hi everyone,

The notes for last week's Iceberg community sync are now updated in the
agenda/notes doc
<https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.2umwfxbo0iwo>.
If you have anything to add, feel free to let me know or add comments to
the doc.


We mainly discussed what projects we want to add to a roadmap and how to
track them. I'll be sending out a discussion thread with the roadmap
projects that we came up with so we can finalize it and add to it. Dan also
suggested using GitHub projects to track the progress of each feature.

If you'd like to attend the syncs, you can add yourself to the iceberg-sync
Google group <https://groups.google.com/g/iceberg-sync> to receive the
invites. Everyone is welcome to attend!

Here are the notes, if you prefer reading them here over going to the doc:

1 September 2021

   - Highlights
      - 0.12.0 release is out (Thanks, Carl!)
      - Metadata tables are updated for v2 (Thanks, Anton!)
      - Stored procedure to add and dedup files (Thanks, Szehon!)
   - Releases
      - 0.13.0 release timeline
         - Jack will be RM
         - Targeting late Oct or early Nov
      - 0.12.1
         - Reads hanging <https://github.com/apache/iceberg/issues/3055> -
           need to find someone. Maybe Russell?
         - Parquet 1.12.0 bug <https://github.com/apache/iceberg/issues/2962> -
           Thanks, Kyle!
   - Roadmap discussion
      - Tracking
         - Dan: GitHub projects?
         - Ryan: Markdown file on the site?
      - Roadmap scope, items
         - Snapshot tagging and branching - Jack, Ryan (reviews)
         - Encryption - Gidon, Jack, Yufei
         - Merge-on-read plans in Spark - Anton, Ryan (reviews)
            - New writers
         - Delete compaction - Junjie, Puneet
         - Python - probably publish a separate roadmap
            - Separate Google group
              <https://groups.google.com/g/iceberg-python-sync?hl=en>
         - Views - Anjali, John
         - Secondary indexes - Miao, Guy, Jack (some reviews)
            - File-level
            - Rollup
         - Spark streaming - Sreeram, Kyle, Anton (reviews)
            - CDC use case
            - Limit support to process large snapshots
            - CDC with Iceberg source
         - [v3] Relative paths - Anurag, Yufei
         - [v3] Z-ordering - Russell
         - [v3] Default values in schemas - Owen
         - Format v2 support in Trino - Jack
         - Multi-catalog support for Trino, ongoing for PrestoDB - Jack
         - Bucketed joins in Trino - Samarth has a working prototype
         - Versioned docs
   - Encryption PR / Design Doc - Gidon Gershinsky
      - Quick update
      - PRs with elements of the design
      - Sent a minimal Google doc focused on MVP
      - Gidon to propose a time for encryption sync
   - View spec
      - First rev of the spec has feedback
      - Major question: SQL dialect
         - Do we have agreement to go ahead with the spec?
         - Do we need more time?
      - Carl: Spark would require dialect, version, and some config
        properties, so the spec is not sufficient
      - Ryan: The proposal includes places to store all of those
      - Carl: Views form a graph, so is Iceberg an appropriate storage?
      - Anjali: Views across engines are not supported and metastores are not
        working. Adding this to Iceberg at least makes it possible to share
        SQL, if not more in the future
      - Dan: Views are stored in different ways, which made it impossible to
        implement -- we tried before building the common view library at Netflix
      - Carl: Isn’t the representation just SQL? The spec punts on how to
        store the representation. No specifics
      - Carl: What has this enabled at Netflix?
      - Anjali: Simple common SQL works across engines
      - Ryan: And there is enough information to do view translation later in
        either Iceberg or in engines
   - Ran out of time


-- 
Ryan Blue
Tabular
