This list looks pretty good to me overall, +1. For the Flink 1.13 upgrade, I suggest we consider starting another thread. There are some open PRs, but they have outstanding questions, specifically whether or not to drop support for Flink 1.12. I think we can upgrade without dropping support for Flink 1.12, but we wouldn’t get some of the proposed benefits of 1.13 (though that can be a follow-up task).
I’m not presently involved in the Flink Community enough to say with certainty, but I believe the FLIP-27 (Using the new source interface) and the Flink 1.13.2 upgrade are orthogonal to each other and can both progress independently. But I would defer to Steven or anybody else who works with Flink much more often than I do currently.

- Kyle Bendickson

> On Sep 15, 2021, at 4:06 PM, Ryan Blue <b...@tabular.io> wrote:
>
> That sounds great, thanks for taking that on Jack!
>
> On Wed, Sep 15, 2021 at 3:51 PM Jack Ye <yezhao...@gmail.com> wrote:
> For external Trino and PrestoDB tasks, I am thinking about creating one
> Github project for Trino and another one for PrestoDB to manage all tasks
> under them, adding links of issues and PRs in the other communities to track
> progress. This is mostly to improve visibility so that people who are
> interested can see what is going on in those 2 places.
>
> -Jack Ye
>
> On Wed, Sep 15, 2021 at 2:14 PM Ryan Blue <b...@tabular.io> wrote:
> Gidon, I think that the v3 part of encryption is actually documenting how it
> works and adding it to the spec. Right now we have hooks for building some
> encryption around it, but almost no requirements in the spec for how to use
> it across implementations. This is fine while we're working on defining
> encryption, but we eventually want to update the spec.
>
> Jack, I'm happy to add the external PrestoDB items to the roadmap. I'm just
> not quite sure what to do here since we aren't tracking them in the Iceberg
> community ourselves. I listed those as external so that we can publish links
> to where those are tracked in other communities. We can add as many of these
> as we want.
>
> Anton, I agree. The goal here is to identify the top priority items to help
> direct review effort. We want everything to continue progressing, but I think
> it's good to identify where we as a community want to focus review time.
>
> Sounds like one area of uncertainty is FLIP-27 vs Flink 1.13.2. Can someone
> summarize the status of Flink and what we need? I don't think I understand it
> well enough to suggest which one takes priority.
>
> Ryan
>
> On Mon, Sep 13, 2021 at 7:54 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
> The discussed roadmap makes sense to me. I think it is important to agree on
> what we should do first as the review pool is limited. There are more and
> more large items that are half done or half discussed. I think we better
> focus on finishing them quickly and then move to something else as opposed to
> making very minor progress on a number of issues.
>
> To be clear, it is not like other things are not important or we should stop
> their development. It is more about making sure certain high-priority
> features for most folks in the community get enough attention.
>
> - Anton
>
>> On 13 Sep 2021, at 12:19, Jack Ye <yezhao...@gmail.com> wrote:
>>
>> I'd like to also propose adding the following in the external section:
>> 1. the PrestoDB equivalent for each item listed for Trino. I am not sure
>> what's the best way to track them, but I feel it's better to list and track
>> them separately. I have talked with related people currently maintaining the
>> PrestoDB Iceberg connector (mostly in Twitter), and they would like to take
>> a different route from Trino to fully remove Hive dependencies in the
>> connector. This means the 2 connectors will likely diverge in implementation
>> in the near future.
>> 2. adding a medium item for Trino and PrestoDB Avro support
>> 3. adding a small item for Trino and PrestoDB full system table support (the
>> system table schema in them are diverging from core, and missing a few
>> latest system tables)
>>
>> For the items listed with "Spec" and "Spec v3", what are the key
>> differences? I thought we are treating any new spec changes after the format
>> v2 vote as v3.
>>
>> Best,
>> Jack Ye
>>
>> On Mon, Sep 13, 2021 at 7:13 AM Gidon Gershinsky <gg5...@gmail.com> wrote:
>> Hi Ryan,
>>
>> I just wonder if the encryption should be a Spec v3 category. We have the
>> key_metadata fields in both data_file and manifest_file structs, which might
>> be sufficient for a reasonable basic encryption support.
>> But I certainly agree this is an L-sized project.
>>
>> Cheers, Gidon
>>
>>
>> On Sat, Sep 11, 2021 at 12:38 AM Ryan Blue <b...@tabular.io> wrote:
>> Hi everyone,
>>
>> At the last sync meeting, we brought up publishing a community roadmap and
>> brainstormed the many features and initiatives that the community is working
>> on. In this thread, I want to make sure that we have a good list of what
>> people are thinking about and I think we should try to categorize the
>> projects by size and general priority. When we reach a rough agreement, I’ll
>> write this up and post it on the ASF site along with links to some projects
>> in Github.
>>
>> My rationale for attempting to prioritize projects is that if we try to do
>> too many things, it will be slower progress across everything rather than
>> getting a few important items done. I know that priorities don’t align very
>> cleanly in practice, but it is hopefully worth trying. To come up with a
>> priority, I’m trying to keep top priority items to a minimum by including
>> only one from each group (Spark, Flink, Python, etc.). The remaining items
>> are split between priority 2 and 3. Priority 3 is not urgent, including
>> things that can be plugged in (like other IO libraries), docs, etc.
>> Everything else is priority 2.
>>
>> That something isn’t priority 1 doesn’t mean it isn’t important or
>> progressing, just that it isn’t the current focus. I think of it this way:
>> if someone has extra time to review something, what should be next? That’s
>> top priority.
>>
>> Here’s my rough categorization.
>> If you disagree, please speak up:
>>
>> If you think that something should be top priority, what gets moved to
>> priority 2?
>> Should the priority for a project in 2 or 3 change?
>> Is the S/M/L size of a project wrong?
>>
>> Top priority, 1:
>>
>> API: Iceberg 1.0 [medium]
>> Spark: Merge-on-read plans [large]
>> Maintenance: Delete file compaction [medium]
>> Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>> Python: Pythonic refactor [medium]
>>
>> Priority 2:
>>
>> ORC: Support delete files stored as ORC [small]
>> Spark: DSv2 streaming improvements [small]
>> Flink: Inline file compaction [small]
>> Flink: Support UPSERT [small]
>> Views: Spec [medium]
>> Spec: Z-ordering / Space-filling curves [medium]
>> Spec: Snapshot tagging and branching [small]
>> Spec: Secondary indexes [large]
>> Spec v3: Encryption [large]
>> Spec v3: Relative paths [large]
>> Spec v3: Default field values [medium]
>>
>> Priority 3:
>>
>> Docs: versioned docs [medium]
>> IO: Support Aliyun OSS/DLF [medium]
>> IO: Support Dell ECS [medium]
>>
>> External:
>>
>> Trino: Bucketed joins [small]
>> Trino: Row-level delete support [medium]
>> Trino: Merge-on-read plans [medium]
>> Trino: Multi-catalog support [small]
>> --
>> Ryan Blue
>> Tabular
>
>
>
> --
> Ryan Blue
> Tabular
>
>
> --
> Ryan Blue
> Tabular