kinghao007 opened a new issue, #5325: URL: https://github.com/apache/linkis/issues/5325
### Search before asking

- [x] I searched the [issues](https://github.com/apache/linkis/issues) and found no similar issues.

### Linkis Component

linkis-engineconn-plugins

### What happened

Linkis currently has no native support for StarRocks, a high-performance MPP OLAP engine. StarRocks has surpassed Doris and ClickHouse in SSB and TPC-H benchmarks and is rapidly gaining adoption at large internet companies (Tencent, ByteDance, Ctrip, Meituan, Xiaomi) and financial institutions (China UnionPay, CMB, Taikang Insurance), thanks to its query performance, high concurrency, and native lakehouse capabilities.

**Market Demand:**

- **Extreme Query Performance**: 10-30% faster than Doris in SSB benchmarks
- **Real-time UPSERT**: Native high-performance Primary Key model with upsert semantics, covering the update gap in ClickHouse
- **MPP Architecture**: Hundreds of concurrent queries per cluster
- **Smart Materialized Views**: Automatic query rewriting that selects the optimal materialized view
- **Native Lakehouse**: Deep integration with Hive, Iceberg, Hudi, and Delta Lake for unified data access
- **Rapid Growth**: 8k+ GitHub stars, with many enterprises migrating from Doris to StarRocks

**Strategic Value:**

Linkis already supports Doris, but StarRocks has evolved significantly and offers complementary value:

- **StarRocks**: Next-generation real-time data warehouse for users who need maximum performance
- **Doris**: Established real-time OLAP engine for users who prioritize mature stability
- **Strategy**: Support both engines and let users choose per scenario

### What you expected to happen

Linkis should provide a StarRocks engine plugin with the following capabilities:

1. **SQL Query Support:**
   - MySQL protocol compatibility (StarRocks is MySQL-compatible)
   - Standard SQL syntax support
   - Support for all table models (Duplicate, Aggregate, Unique, Primary Key)
   - Materialized view queries with automatic rewriting
2. **Data Operations:**
   - INSERT for batch data loading
   - UPSERT for real-time updates (Primary Key model)
   - DELETE for data deletion
   - Stream Load and Broker Load support
   - Complex JOIN and aggregation queries
3. **Lakehouse Integration:**
   - Query external tables (Hive, Iceberg, Hudi, Delta Lake)
   - External catalog support
   - Unified SQL interface for data lake and warehouse
   - Federated queries across multiple data sources
4. **Performance Optimization:**
   - Connection pooling and reuse
   - Query result streaming to avoid OOM
   - Tablet-level parallel execution
   - Automatic query optimization
   - Resource usage monitoring
5. **Integration with Linkis:**
   - Unified task submission interface
   - Resource management integration
   - Permission control integration
   - Metadata catalog integration

### How to reproduce

Current situation:

1. Users have to set up StarRocks MySQL connections manually
2. There is no dedicated engine plugin for StarRocks operations
3. StarRocks workloads cannot use Linkis's unified task submission and resource management
4. Support for StarRocks-specific features (lakehouse, materialized views) is limited

Use case example:

```sql
-- Real-time analytics with upserts (Primary Key model).
-- StarRocks handles real-time updates well, unlike ClickHouse.
CREATE TABLE user_profiles (
    user_id         BIGINT,
    user_name       STRING,
    total_orders    INT,
    last_order_time DATETIME
)
PRIMARY KEY (user_id)
DISTRIBUTED BY HASH(user_id);

-- Upsert (updates existing rows, inserts new ones).
-- On a Primary Key table a plain INSERT already replaces rows whose key exists;
-- MySQL's ON DUPLICATE KEY UPDATE clause is not part of StarRocks' INSERT syntax.
INSERT INTO user_profiles VALUES
    (1001, 'Alice', 150, '2024-12-20 10:30:00'),
    (1002, 'Bob', 200, '2024-12-20 11:00:00');

-- Query lakehouse data (Iceberg table) through an external catalog.
SELECT
    date_trunc('day', event_time) AS day,
    event_type,
    COUNT(*) AS event_count
FROM iceberg_catalog.events_db.user_events
WHERE event_time >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY day, event_type
ORDER BY day DESC, event_count DESC;

-- Federated query across a StarRocks table and the data lake.
SELECT
    s.user_id,
    s.user_name,
    COUNT(e.event_id) AS event_count
FROM user_profiles s
JOIN iceberg_catalog.events_db.user_events e
    ON s.user_id = e.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL 1 DAY
GROUP BY s.user_id, s.user_name;

-- Materialized view automatic rewriting:
-- StarRocks automatically selects the best MV to answer this query.
SELECT region, SUM(sales)
FROM sales_table
GROUP BY region;
```
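To make the "query result streaming to avoid OOM" requirement concrete, here is a minimal Python sketch of batch-wise result fetching over a DB-API cursor. It is an illustration only, not Linkis or StarRocks code: `stream_rows` and `demo` are hypothetical names, and an in-memory SQLite database stands in for a StarRocks connection so the example runs without a cluster.

```python
import sqlite3
from typing import Iterator, List, Tuple


def stream_rows(cursor, sql: str, batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield result rows in fixed-size batches instead of one large list.

    Hypothetical helper: `cursor` is any DB-API 2.0 cursor.
    """
    cursor.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list means the result set is exhausted
            break
        yield batch


def demo() -> List[int]:
    """Run the helper against SQLite (a stand-in for StarRocks) and
    return the size of each batch that was streamed back."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE user_profiles (user_id INTEGER PRIMARY KEY, total_orders INTEGER)"
    )
    cur.executemany(
        "INSERT INTO user_profiles VALUES (?, ?)",
        [(i, i * 10) for i in range(1, 8)],  # 7 sample rows
    )
    query = "SELECT user_id, total_orders FROM user_profiles ORDER BY user_id"
    sizes = [len(batch) for batch in stream_rows(cur, query, batch_size=3)]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo())  # 7 rows in batches of 3 -> [3, 3, 1]
```

Against a real StarRocks cluster the same loop would run over a MySQL-protocol driver. Many such drivers buffer the full result set on the client by default, so an unbuffered (server-side) cursor would probably also be needed for the batching to actually bound memory.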
