+1 around concerns with the Elastic license.

Also, more importantly, how important is integration with either of these
tools to the Iceberg community and contributors?

The Elastic license makes a bit more sense for elasticsearch, as it was an
existing project for quite some time. I won’t reiterate the details of that
situation, but it’s odd to see a fork of a new, active project using the
Elastic license in my opinion.

StarRocks admits that at least 40% of their code comes from the Apache
Doris project.

That said, StarRocks claims to not require other dependencies. It seems
StarRocks supports query federation with a few tools, so that it can query
those systems directly without having to import the data. So I’m not sure
what Iceberg support would look like beyond additional query federation.
What benefit does this provide?

If we determined that integration with one of these tools was something the
community valued, could a connector be built to target the Apache Doris
project and then StarRocks could fork that code if they liked?

- Kyle Bendickson
GitHub @kbendick



On Sun, Nov 7, 2021 at 9:24 PM Reo Lei <leinuo...@gmail.com> wrote:

> +1, I have the same concern for the incompatible license.
>
> On Mon, Nov 8, 2021 at 11:48 AM Jacques Nadeau <jacquesnad...@gmail.com> wrote:
>
>> A few additional observations about StarRocks...
>>
>> - As far as I can tell, StarRocks has an ASF incompatible license
>> (Elastic License 2.0).
>> - It appears to be a hard fork of Apache Doris, a project still in the
>> incubator (and looks like it probably is destructive to the Doris project)
>> - The project has only existed for ~2 months.
>>
>>
>>
>>
>>
>> On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:
>>
>>> Any thoughts on adding StarRocks integration to the roadmap?
>>>
>>> I think the guys from StarRocks community can provide more background
>>> and inputs.
>>>
>>> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>>>
>>>> Update:
>>>>
>>>> StarRocks[1] is a next-gen sub-second MPP database for full analysis
>>>> scenarios, including multi-dimensional analytics, real-time analytics and
>>>> ad-hoc query.  Their team is planning to integrate iceberg tables as
>>>> StarRocks external tables in the next month [2], so that people could
>>>> connect the data lake and StarRocks warehouse in the same engine.
>>>> The excellent performance of StarRocks will also help accelerate the
>>>> analysis and access of Iceberg tables. I think this is a great thing for
>>>> both the Iceberg community and the StarRocks community. Could we
>>>> add an extra project about StarRocks integration work to the Apache Iceberg
>>>> roadmap [3]?
>>>>
>>>> [1].  https://github.com/StarRocks/starrocks
>>>> [2].  https://github.com/StarRocks/starrocks/issues/1030
>>>> [3].  https://github.com/apache/iceberg/projects
>>>>
>>>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> I closed the upgrade project and marked the FLIP-27 project priority
>>>>> 1. Thanks for all the work to get this done!
>>>>>
>>>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>>>>
>>>>>> Update:
>>>>>>
>>>>>> I think the project  [Flink: Upgrade to 1.13.2][1] in RoadMap can be
>>>>>> closed now, because all of the issues have been addressed.
>>>>>>
>>>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>>>
>>>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <
>>>>>> edu...@dremio.com> wrote:
>>>>>>
>>>>>>> I created a Roadmap section in
>>>>>>> https://github.com/apache/iceberg/pull/3163 that links to the
>>>>>>> planning boards that Jack created. I figured it makes sense if we link
>>>>>>> available Design Docs directly on those Boards (as was already done),
>>>>>>> because then the Design docs are closer to the set of related issues.
>>>>>>>
>>>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>
>>>>>>>> Thanks, Jack!
>>>>>>>>
>>>>>>>> Eduard, I think that's a good idea. We should have a roadmap page
>>>>>>>> as well that links to the projects that Jack just created.
>>>>>>>>
>>>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> It seems like we have reached some consensus around the projects
>>>>>>>>> listed here. I have created corresponding Github projects for each:
>>>>>>>>> https://github.com/apache/iceberg/projects
>>>>>>>>>
>>>>>>>>> Related design docs are also linked there.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jack Ye
>>>>>>>>>
>>>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <
>>>>>>>>> edu...@dremio.com> wrote:
>>>>>>>>>
>>>>>>>>>> Would it make sense to have a section on the website where we
>>>>>>>>>> collect all the links to the design docs/specs as that would be
>>>>>>>>>> easier to find than searching for things on the ML?
>>>>>>>>>>
>>>>>>>>>> I was thinking about something like for each component:
>>>>>>>>>> * link to the ML discussion
>>>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> At the last sync meeting, we brought up publishing a community
>>>>>>>>>>> roadmap and brainstormed the many features and initiatives that the
>>>>>>>>>>> community is working on. In this thread, I want to make sure that
>>>>>>>>>>> we have a good list of what people are thinking about and I think
>>>>>>>>>>> we should try to categorize the projects by size and general
>>>>>>>>>>> priority. When we reach a rough agreement, I’ll write this up and
>>>>>>>>>>> post it on the ASF site along with links to some projects in Github.
>>>>>>>>>>>
>>>>>>>>>>> My rationale for attempting to prioritize projects is that if we
>>>>>>>>>>> try to do too many things, it will be slower progress across
>>>>>>>>>>> everything rather than getting a few important items done. I know
>>>>>>>>>>> that priorities don’t align very cleanly in practice, but it is
>>>>>>>>>>> hopefully worth trying. To come up with a priority, I’m trying to
>>>>>>>>>>> keep top priority items to a minimum by including only one from
>>>>>>>>>>> each group (Spark, Flink, Python, etc.). The remaining items are
>>>>>>>>>>> split between priority 2 and 3. Priority 3 is not urgent, including
>>>>>>>>>>> things that can be plugged in (like other IO libraries), docs, etc.
>>>>>>>>>>> Everything else is priority 2.
>>>>>>>>>>>
>>>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important
>>>>>>>>>>> or progressing, just that it isn’t the current focus. I think of it
>>>>>>>>>>> this way: if someone has extra time to review something, what
>>>>>>>>>>> should be next? That’s top priority.
>>>>>>>>>>>
>>>>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>>>>
>>>>>>>>>>>    - If you think that something should be top priority, what
>>>>>>>>>>>    gets moved to priority 2?
>>>>>>>>>>>    - Should the priority for a project in 2 or 3 change?
>>>>>>>>>>>    - Is the S/M/L size of a project wrong?
>>>>>>>>>>>
>>>>>>>>>>> Top priority, 1:
>>>>>>>>>>>
>>>>>>>>>>>    - API: Iceberg 1.0 [medium]
>>>>>>>>>>>    - Spark: Merge-on-read plans [large]
>>>>>>>>>>>    - Maintenance: Delete file compaction [medium]
>>>>>>>>>>>    - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>>>    - Python: Pythonic refactor [medium]
>>>>>>>>>>>
>>>>>>>>>>> Priority 2:
>>>>>>>>>>>
>>>>>>>>>>>    - ORC: Support delete files stored as ORC [small]
>>>>>>>>>>>    - Spark: DSv2 streaming improvements [small]
>>>>>>>>>>>    - Flink: Inline file compaction [small]
>>>>>>>>>>>    - Flink: Support UPSERT [small]
>>>>>>>>>>>    - Views: Spec [medium]
>>>>>>>>>>>    - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>>>    - Spec: Snapshot tagging and branching [small]
>>>>>>>>>>>    - Spec: Secondary indexes [large]
>>>>>>>>>>>    - Spec v3: Encryption [large]
>>>>>>>>>>>    - Spec v3: Relative paths [large]
>>>>>>>>>>>    - Spec v3: Default field values [medium]
>>>>>>>>>>>
>>>>>>>>>>> Priority 3:
>>>>>>>>>>>
>>>>>>>>>>>    - Docs: versioned docs [medium]
>>>>>>>>>>>    - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>>>    - IO: Support Dell ECS [medium]
>>>>>>>>>>>
>>>>>>>>>>> External:
>>>>>>>>>>>
>>>>>>>>>>>    - Trino: Bucketed joins [small]
>>>>>>>>>>>    - Trino: Row-level delete support [medium]
>>>>>>>>>>>    - Trino: Merge-on-read plans [medium]
>>>>>>>>>>>    - Trino: Multi-catalog support [small]
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Tabular
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ryan Blue
>>>>>>>> Tabular
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
