Thanks, Jack! Eduard, I think that's a good idea. We should have a roadmap page as well that links to the projects that Jack just created.
On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com> wrote:

> It seems like we have reached some consensus around the projects listed
> here. I have created corresponding Github projects for each:
> https://github.com/apache/iceberg/projects
>
> Related design docs are also linked there.
>
> Best,
> Jack Ye
>
> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <edu...@dremio.com>
> wrote:
>
>> Would it make sense to have a section on the website where we collect all
>> the links to the design docs/specs as that would be easier to find than
>> searching for things on the ML?
>>
>> I was thinking about something like for each component:
>> * link to the ML discussion
>> * link to the actual Spec/Design Doc
>>
>> Thoughts?
>>
>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Hi everyone,
>>>
>>> At the last sync meeting, we brought up publishing a community roadmap
>>> and brainstormed the many features and initiatives that the community is
>>> working on. In this thread, I want to make sure that we have a good list of
>>> what people are thinking about and I think we should try to categorize the
>>> projects by size and general priority. When we reach a rough agreement,
>>> I’ll write this up and post it on the ASF site along with links to some
>>> projects in Github.
>>>
>>> My rationale for attempting to prioritize projects is that if we try to
>>> do too many things, it will be slower progress across everything rather
>>> than getting a few important items done. I know that priorities don’t align
>>> very cleanly in practice, but it is hopefully worth trying. To come up with
>>> a priority, I’m trying to keep top priority items to a minimum by including
>>> only one from each group (Spark, Flink, Python, etc.). The remaining items
>>> are split between priority 2 and 3. Priority 3 is not urgent, including
>>> things that can be plugged in (like other IO libraries), docs, etc.
>>> Everything else is priority 2.
>>>
>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>> progressing, just that it isn’t the current focus. I think of it this way:
>>> if someone has extra time to review something, what should be next? That’s
>>> top priority.
>>>
>>> Here’s my rough categorization. If you disagree, please speak up:
>>>
>>> - If you think that something should be top priority, what gets
>>>   moved to priority 2?
>>> - Should the priority for a project in 2 or 3 change?
>>> - Is the S/M/L size of a project wrong?
>>>
>>> Top priority, 1:
>>>
>>> - API: Iceberg 1.0 [medium]
>>> - Spark: Merge-on-read plans [large]
>>> - Maintenance: Delete file compaction [medium]
>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>> - Python: Pythonic refactor [medium]
>>>
>>> Priority 2:
>>>
>>> - ORC: Support delete files stored as ORC [small]
>>> - Spark: DSv2 streaming improvements [small]
>>> - Flink: Inline file compaction [small]
>>> - Flink: Support UPSERT [small]
>>> - Views: Spec [medium]
>>> - Spec: Z-ordering / Space-filling curves [medium]
>>> - Spec: Snapshot tagging and branching [small]
>>> - Spec: Secondary indexes [large]
>>> - Spec v3: Encryption [large]
>>> - Spec v3: Relative paths [large]
>>> - Spec v3: Default field values [medium]
>>>
>>> Priority 3:
>>>
>>> - Docs: versioned docs [medium]
>>> - IO: Support Aliyun OSS/DLF [medium]
>>> - IO: Support Dell ECS [medium]
>>>
>>> External:
>>>
>>> - Trino: Bucketed joins [small]
>>> - Trino: Row-level delete support [medium]
>>> - Trino: Merge-on-read plans [medium]
>>> - Trino: Multi-catalog support [small]
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>

--
Ryan Blue
Tabular