Thanks, Zhao. I think those are great ways to work together. Let us know how we can help you make StarRocks successful with Iceberg as its data format. We're always happy to help people understand how Iceberg works and improve our docs on how to use it.
Ryan

On Mon, Nov 8, 2021 at 8:17 PM Zhao Chun <zh...@apache.org> wrote:

> I feel that Ryan's response exemplifies the generosity of an Apache project creator, a quality that has touched and benefited us. We look forward to contributing further to the Apache project in the future.
> As for the need for an issue to track progress, I don't think so for now. At the moment the main development work is done in the StarRocks repository.
> As for further cooperation in the future, I think there are several aspects.
> 1. StarRocks will be trying to support Iceberg. I think this will help StarRocks to re-examine how it integrates with the lakehouse system and we will be happy to feed back to the Apache Iceberg community the issues and benefits we encounter during the integration process. This will also validate the versatility of the iceberg project to support more query engines. I think this project will benefit both projects.
> 2. In the future, we will share some of our best practices for iceberg and StarRocks integration in a blog or talk. If the Apache Iceberg project feels that these blogs or talks would be beneficial to the Apache iceberg community, please consider linking our subsequent blogs or talks to the apache iceberg website blog. The Iceberg community can, of course, not link if they feel it is inappropriate.
> 3. We expect to contribute to the Apache Iceberg community under the Apache License V2.
>
> Thanks,
> Zhao Chun
>
> Ryan Blue <b...@tabular.io> wrote on Tue, Nov 9, 2021 at 3:05 AM:
>
>> I think it is great to see another processing engine adding support for Apache Iceberg, and I do look forward to collaborating with the StarRocks community in the future.
>>
>> I'm not entirely sure what that collaboration would look like just yet though. For most processing engines, it is people joining the Apache Iceberg community.
>> No matter what the license of the downstream project, we always welcome more people contributing here!
>>
>> As for opening a project in our tracker, I'm not sure it makes sense to do that just yet. As far as I know there aren't any issues to track there. And would the StarRocks community find it helpful?
>>
>> On Mon, Nov 8, 2021 at 12:14 AM Zhao Chun <buaa.zh...@gmail.com> wrote:
>>
>>> Thanks to @OpenInx for mentioning StarRocks in the iceberg community.
>>>
>>> I'm from the StarRocks community.
>>>
>>> StarRocks is based on the Apache Doris project. It has been in development internally for almost two years and is currently used by hundreds of companies. It was just opened 2 months ago.
>>>
>>> Iceberg is a great project that makes analysis of huge datasets more convenient. The StarRocks community is planning to support the iceberg engine. This will provide StarRocks users with the ability to analyze data in iceberg.
>>>
>>> Regarding the license, StarRocks' ELv2 will not affect our contribution to the iceberg community under the Apache License V2.
>>>
>>> We are also looking forward to receiving help from the iceberg community and will be contributing back to the iceberg community.
>>>
>>> Thanks,
>>> Zhao Chun
>>>
>>> Kyle Bendickson <k...@tabular.io> wrote on Mon, Nov 8, 2021 at 2:53 PM:
>>>
>>>> +1 around concerns with the Elastic license.
>>>>
>>>> Also, more importantly, how important is integration with either of these tools to the Iceberg community and contributors?
>>>>
>>>> The Elastic license makes a bit more sense for elasticsearch, as it was an existing project for quite some time. I won’t reiterate the details of that situation, but it’s odd to see a fork of a new, active project using the Elastic license in my opinion.
>>>>
>>>> StarRocks admits that they’re at least 40% of code from the Apache Doris project.
>>>>
>>>> That said, StarRocks claims to not require other dependencies.
>>>> It seems StarRocks supports query federation with a few tools so as not to have to import the data and query those systems directly. So I’m not sure what Iceberg support would look like beyond additional query federation. What benefit does this provide?
>>>>
>>>> If we determined that integration with one of these tools was something the community valued, could a connector be built to target the Apache Doris project and then StarRocks could fork that code if they liked?
>>>>
>>>> - Kyle Bendickson
>>>> GitHub @kbendick
>>>>
>>>> On Sun, Nov 7, 2021 at 9:24 PM Reo Lei <leinuo...@gmail.com> wrote:
>>>>
>>>>> +1, I have the same concern for the incompatible license.
>>>>>
>>>>> Jacques Nadeau <jacquesnad...@gmail.com> wrote on Mon, Nov 8, 2021 at 11:48 AM:
>>>>>
>>>>>> A few additional observations about StarRocks...
>>>>>>
>>>>>> - As far as I can tell, StarRocks has an ASF incompatible license (Elastic License 2.0).
>>>>>> - It appears to be a hard fork of Apache Doris, a project still in the incubator (and looks like it probably is destructive to the Doris project)
>>>>>> - The project has only existed for ~2 months.
>>>>>>
>>>>>> On Sun, Nov 7, 2021 at 7:34 PM OpenInx <open...@gmail.com> wrote:
>>>>>>
>>>>>>> Any thoughts for adding StarRocks integration to the roadmap?
>>>>>>>
>>>>>>> I think the guys from StarRocks community can provide more background and inputs.
>>>>>>>
>>>>>>> On Thu, Nov 4, 2021 at 5:59 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Update:
>>>>>>>>
>>>>>>>> StarRocks[1] is a next-gen sub-second MPP database for full analysis scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query. Their team is planning to integrate iceberg tables as StarRocks external tables in the next month [2], so that people could connect the data lake and StarRocks warehouse in the same engine.
>>>>>>>> The excellent performance of StarRocks will also help accelerate the analysis and access of the iceberg table, I think this is a great thing for both the iceberg community and the StarRocks community. I think we can add an extra project about StarRocks integration work in the apache iceberg roadmap [3]?
>>>>>>>>
>>>>>>>> [1]. https://github.com/StarRocks/starrocks
>>>>>>>> [2]. https://github.com/StarRocks/starrocks/issues/1030
>>>>>>>> [3]. https://github.com/apache/iceberg/projects
>>>>>>>>
>>>>>>>> On Mon, Nov 1, 2021 at 11:52 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>>> I closed the upgrade project and marked the FLIP-27 project priority 1. Thanks for all the work to get this done!
>>>>>>>>>
>>>>>>>>> On Sun, Oct 31, 2021 at 8:10 PM OpenInx <open...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Update:
>>>>>>>>>>
>>>>>>>>>> I think the project [Flink: Upgrade to 1.13.2][1] in RoadMap can be closed now, because all of the issues have been addressed.
>>>>>>>>>>
>>>>>>>>>> [1]. https://github.com/apache/iceberg/projects/12
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 21, 2021 at 6:17 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I created a Roadmap section in https://github.com/apache/iceberg/pull/3163 that links to the planning boards that Jack created. I figured it makes sense if we link available Design Docs directly on those Boards (as was already done), because then the Design docs are closer to the set of related issues.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 20, 2021 at 10:02 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Jack!
>>>>>>>>>>>>
>>>>>>>>>>>> Eduard, I think that's a good idea.
>>>>>>>>>>>> We should have a roadmap page as well that links to the projects that Jack just created.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 20, 2021 at 12:57 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It seems like we have reached some consensus around the projects listed here. I have created corresponding Github projects for each: https://github.com/apache/iceberg/projects
>>>>>>>>>>>>>
>>>>>>>>>>>>> Related design docs are also linked there.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Sep 19, 2021 at 11:18 PM Eduard Tudenhoefner <edu...@dremio.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would it make sense to have a section on the website where we collect all the links to the design docs/specs as that would be easier to find than searching for things on the ML?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was thinking about something like for each component:
>>>>>>>>>>>>>> * link to the ML discussion
>>>>>>>>>>>>>> * link to the actual Spec/Design Doc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 10, 2021 at 11:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At the last sync meeting, we brought up publishing a community roadmap and brainstormed the many features and initiatives that the community is working on. In this thread, I want to make sure that we have a good list of what people are thinking about and I think we should try to categorize the projects by size and general priority.
>>>>>>>>>>>>>>> When we reach a rough agreement, I’ll write this up and post it on the ASF site along with links to some projects in Github.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My rationale for attempting to prioritize projects is that if we try to do too many things, it will be slower progress across everything rather than getting a few important items done. I know that priorities don’t align very cleanly in practice, but it is hopefully worth trying. To come up with a priority, I’m trying to keep top priority items to a minimum by including only one from each group (Spark, Flink, Python, etc.). The remaining items are split between priority 2 and 3. Priority 3 is not urgent, including things that can be plugged in (like other IO libraries), docs, etc. Everything else is priority 2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That something isn’t priority 1 doesn’t mean it isn’t important or progressing, just that it isn’t the current focus. I think of it this way: if someone has extra time to review something, what should be next? That’s top priority.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here’s my rough categorization. If you disagree, please speak up:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - If you think that something should be top priority, what gets moved to priority 2?
>>>>>>>>>>>>>>> - Should the priority for a project in 2 or 3 change?
>>>>>>>>>>>>>>> - Is the S/M/L size of a project wrong?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Top priority, 1:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - API: Iceberg 1.0 [medium]
>>>>>>>>>>>>>>> - Spark: Merge-on-read plans [large]
>>>>>>>>>>>>>>> - Maintenance: Delete file compaction [medium]
>>>>>>>>>>>>>>> - Flink: Upgrade to 1.13.2 (document compatibility) [medium]
>>>>>>>>>>>>>>> - Python: Pythonic refactor [medium]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Priority 2:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - ORC: Support delete files stored as ORC [small]
>>>>>>>>>>>>>>> - Spark: DSv2 streaming improvements [small]
>>>>>>>>>>>>>>> - Flink: Inline file compaction [small]
>>>>>>>>>>>>>>> - Flink: Support UPSERT [small]
>>>>>>>>>>>>>>> - Views: Spec [medium]
>>>>>>>>>>>>>>> - Spec: Z-ordering / Space-filling curves [medium]
>>>>>>>>>>>>>>> - Spec: Snapshot tagging and branching [small]
>>>>>>>>>>>>>>> - Spec: Secondary indexes [large]
>>>>>>>>>>>>>>> - Spec v3: Encryption [large]
>>>>>>>>>>>>>>> - Spec v3: Relative paths [large]
>>>>>>>>>>>>>>> - Spec v3: Default field values [medium]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Priority 3:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Docs: versioned docs [medium]
>>>>>>>>>>>>>>> - IO: Support Aliyun OSS/DLF [medium]
>>>>>>>>>>>>>>> - IO: Support Dell ECS [medium]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> External:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Trino: Bucketed joins [small]
>>>>>>>>>>>>>>> - Trino: Row-level delete support [medium]
>>>>>>>>>>>>>>> - Trino: Merge-on-read plans [medium]
>>>>>>>>>>>>>>> - Trino: Multi-catalog support [small]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>> Tabular
>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ryan Blue
>>>>>>>>> Tabular
>>>>>>>>>
>> --
>> Ryan Blue
>> Tabular
>>

--
Ryan Blue
Tabular