That sounds great, thanks for taking that on, Jack!

On Wed, Sep 15, 2021 at 3:51 PM Jack Ye <yezhao...@gmail.com> wrote:

> For external Trino and PrestoDB tasks, I am thinking about creating one
> GitHub project for Trino and another one for PrestoDB to manage all tasks
> under them, adding links of issues and PRs in the other communities to
> track progress. This is mostly to improve visibility so that people who are
> interested can see what is going on in those 2 places.
>
> -Jack Ye
>
> On Wed, Sep 15, 2021 at 2:14 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Gidon, I think that the v3 part of encryption is actually documenting how
>> it works and adding it to the spec. Right now we have hooks for building
>> some encryption around it, but almost no requirements in the spec for how
>> to use it across implementations. This is fine while we're working on
>> defining encryption, but we eventually want to update the spec.
>>
>> Jack, I'm happy to add the external PrestoDB items to the roadmap. I'm
>> just not quite sure what to do here since we aren't tracking them in the
>> Iceberg community ourselves. I listed those as external so that we can
>> publish links to where those are tracked in other communities. We can add
>> as many of these as we want.
>>
>> Anton, I agree. The goal here is to identify the top priority items to
>> help direct review effort. We want everything to continue progressing, but
>> I think it's good to identify where we as a community want to focus review
>> time.
>>
>> Sounds like one area of uncertainty is FLIP-27 vs Flink 1.13.2. Can
>> someone summarize the status of Flink and what we need? I don't think I
>> understand it well enough to suggest which one takes priority.
>>
>> Ryan
>>
>> On Mon, Sep 13, 2021 at 7:54 PM Anton Okolnychyi
>> <aokolnyc...@apple.com.invalid> wrote:
>>
>>> The discussed roadmap makes sense to me. I think it is important to
>>> agree on what we should do first as the review pool is limited. There are
>>> more and more large items that are half done or half discussed. I think we
>>> had better focus on finishing them quickly and then move on to something else,
>>> as opposed to making very minor progress on a number of issues.
>>>
>>> To be clear, it is not that other things are unimportant or that we should
>>> stop developing them. It is more about making sure that the features most
>>> folks in the community consider high priority get enough attention.
>>>
>>> - Anton
>>>
>>> On 13 Sep 2021, at 12:19, Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>> I'd like to also propose adding the following in the external section:
>>> 1. the PrestoDB equivalent for each item listed for Trino. I am not sure
>>> what the best way to track them is, but I feel it's better to list and track
>>> them separately. I have talked with the people currently maintaining
>>> the PrestoDB Iceberg connector (mostly at Twitter), and they would like to
>>> take a different route from Trino to fully remove Hive dependencies in the
>>> connector. This means the 2 connectors will likely diverge in
>>> implementation in the near future.
>>> 2. adding a medium item for Trino and PrestoDB Avro support
>>> 3. adding a small item for Trino and PrestoDB full system table support
>>> (the system table schemas in them are diverging from core, and they are
>>> missing a few of the latest system tables)
>>>
>>> For the items listed under "Spec" and "Spec v3", what are the key
>>> differences? I thought we were treating any new spec changes after the
>>> format v2 vote as v3.
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Mon, Sep 13, 2021 at 7:13 AM Gidon Gershinsky <gg5...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> I just wonder if the encryption should be a Spec v3 category. We have
>>>> the key_metadata fields in both data_file and manifest_file structs, which
>>>> might be sufficient for reasonable basic encryption support.
>>>> But I certainly agree this is an L-sized project.
>>>>
>>>> Cheers, Gidon
>>>>
>>>>
>>>> On Sat, Sep 11, 2021 at 12:38 AM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> At the last sync meeting, we brought up publishing a community roadmap
>>>>> and brainstormed the many features and initiatives that the community is
>>>>> working on. In this thread, I want to make sure that we have a good list of
>>>>> what people are thinking about and I think we should try to categorize the
>>>>> projects by size and general priority. When we reach a rough agreement,
>>>>> I’ll write this up and post it on the ASF site along with links to some
>>>>> projects in GitHub.
>>>>>
>>>>> My rationale for attempting to prioritize projects is that if we try
>>>>> to do too many things, we will make slower progress across everything rather
>>>>> than getting a few important items done. I know that priorities don’t align
>>>>> very cleanly in practice, but it is hopefully worth trying. To come up with
>>>>> a priority, I’m trying to keep top priority items to a minimum by including
>>>>> only one from each group (Spark, Flink, Python, etc.). The remaining items
>>>>> are split between priority 2 and 3. Priority 3 is not urgent, including
>>>>> things that can be plugged in (like other IO libraries), docs, etc.
>>>>> Everything else is priority 2.
>>>>>
>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or
>>>>> progressing, just that it isn’t the current focus. I think of it this way:
>>>>> if someone has extra time to review something, what should be next? That’s
>>>>> top priority.
>>>>>
>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>
>>>>>    - If you think that something should be top priority, what gets
>>>>>    moved to priority 2?
>>>>>    - Should the priority for a project in 2 or 3 change?
>>>>>    - Is the S/M/L size of a project wrong?
>>>>>
>>>>> Top priority, 1:
>>>>>
>>>>>    - API: Iceberg 1.0 [medium]
>>>>>    - Spark: Merge-on-read plans [large]
>>>>>    - Maintenance: Delete file compaction [medium]
>>>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>    - Python: Pythonic refactor [medium]
>>>>>
>>>>> Priority 2:
>>>>>
>>>>>    - ORC: Support delete files stored as ORC [small]
>>>>>    - Spark: DSv2 streaming improvements [small]
>>>>>    - Flink: Inline file compaction [small]
>>>>>    - Flink: Support UPSERT [small]
>>>>>    - Views: Spec [medium]
>>>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>>>    - Spec: Snapshot tagging and branching [small]
>>>>>    - Spec: Secondary indexes [large]
>>>>>    - Spec v3: Encryption [large]
>>>>>    - Spec v3: Relative paths [large]
>>>>>    - Spec v3: Default field values [medium]
>>>>>
>>>>> Priority 3:
>>>>>
>>>>>    - Docs: versioned docs [medium]
>>>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>>>    - IO: Support Dell ECS [medium]
>>>>>
>>>>> External:
>>>>>
>>>>>    - Trino: Bucketed joins [small]
>>>>>    - Trino: Row-level delete support [medium]
>>>>>    - Trino: Merge-on-read plans [medium]
>>>>>    - Trino: Multi-catalog support [small]
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
