suxiaogang223 opened a new issue, #65086: URL: https://github.com/apache/doris/issues/65086

## Background

Apache Doris already supports reading Apache Paimon tables through External Catalog. To complete the read/write experience for lakehouse workloads, we plan to support writing data from Doris into Paimon tables, so users can write query results, load results, and computed data into Paimon tables using Doris SQL.

This issue tracks the overall plan and follow-up PRs for Paimon write support in Doris.

## Goals

- Support `INSERT INTO` for Paimon tables.
- Support `INSERT OVERWRITE` for Paimon tables.
- Support writing append-only tables and primary-key tables.
- Support writing partitioned tables, fixed bucket tables, and dynamic bucket tables.
- Support writing both primitive types and complex types.
- Support concurrent writes from multiple Doris backends and fragments without degrading the write path into a single-writer bottleneck.
- Align Doris transaction semantics with Paimon commit and abort semantics.
- Preserve correctness for failure, retry, and rollback scenarios.
- Continuously improve write throughput, file size control, and conflict reduction.

## Scope

### Basic Write Path

- Support Paimon tables as Doris write targets.
- Support `INSERT INTO Paimon table SELECT ...`.
- Commit Paimon write results only after the Doris transaction succeeds.
- Clean up uncommitted write results when the Doris transaction fails.
- Preserve consistency for commit retry and fragment retry scenarios.
- Support primitive data type writes.

### Concurrent Writes and Table Layout

- Support concurrent writes to non-partitioned tables.
- Support writes to partitioned tables.
- Support writes to fixed bucket tables.
- Support writes to dynamic bucket tables.
- Organize Doris write distribution according to Paimon table layout where possible, reducing small files and write conflicts.
- Continue improving writer control for the same partition and bucket.
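
The layout-aware distribution described in the list above can be illustrated with a small sketch. It only shows the routing idea: every row is mapped to a `(partition, bucket)` pair, and each pair is assigned to exactly one writer. The function names (`bucket_of`, `writer_for`, `route`) are hypothetical, and the CRC32 hash is a stand-in chosen for determinism. It is not Paimon's actual bucket function, which a real implementation would have to reproduce exactly.

```python
import zlib
from collections import defaultdict


def bucket_of(bucket_key: str, num_buckets: int) -> int:
    # Stand-in hash for illustration only; NOT Paimon's real bucket function.
    return zlib.crc32(bucket_key.encode("utf-8")) % num_buckets


def writer_for(partition: str, bucket: int, num_writers: int) -> int:
    # Deterministic mapping: the same (partition, bucket) always yields
    # the same writer index, so no two writers share a pair.
    return zlib.crc32(f"{partition}/{bucket}".encode("utf-8")) % num_writers


def route(rows, num_buckets: int, num_writers: int):
    """Group rows by writer. `rows` yields (partition, bucket_key, payload)."""
    assignment = defaultdict(list)
    for partition, bucket_key, payload in rows:
        bucket = bucket_of(bucket_key, num_buckets)
        writer = writer_for(partition, bucket, num_writers)
        assignment[writer].append((partition, bucket, payload))
    return dict(assignment)
```

Under this scheme a writer may own several `(partition, bucket)` pairs, but a pair never spans writers, which is the property that would reduce small files and commit conflicts.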

Note: strictly guaranteeing that the same `(partition, bucket)` is written by only one Doris writer requires Doris distribution to be aware of Paimon bucket semantics. This can be tracked as a dedicated enhancement.

### Table Types and Write Semantics

- Support append-only table writes.
- Support primary-key table writes.
- Support full-row writes for primary-key tables.
- Plan follow-up support for partial update, delete, update, merge, and Paimon merge-engine related semantics.

### INSERT OVERWRITE

- Support `INSERT OVERWRITE` for Paimon tables.
- Support overwrite for non-partitioned tables.
- Support static partition overwrite.
- Support dynamic partition overwrite.
- Preserve overwrite correctness for failure, retry, rollback, and empty-input scenarios.

### Types and Storage Environments

- Support primitive Doris types mapped to Paimon types.
- Support complex types such as array, map, and struct.
- Validate decimal, timestamp, timezone, and binary semantics.
- Support writes on HDFS and object storage environments such as S3 and OSS.
- Support write compatibility with schema evolution scenarios.
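
For the `INSERT OVERWRITE` scope above, the three overwrite modes differ mainly in which existing partitions get replaced, and the empty-input case is where they diverge most. The sketch below models one plausible reading of those semantics. This issue does not pin them down, so the behavior encoded here is an assumption to be confirmed in the follow-up PRs, and `replaced_partitions` is a hypothetical helper, not a Doris or Paimon API.

```python
from typing import Dict, List, Optional

Partition = Dict[str, str]


def replaced_partitions(
    mode: str,
    existing: List[Partition],
    written: List[Partition],
    static_spec: Optional[Partition] = None,
) -> List[Partition]:
    """Return the existing partitions that an overwrite would replace."""
    if mode == "table":
        # Assumed: a whole-table overwrite replaces everything,
        # even when the input produces no rows.
        return list(existing)
    if mode == "static":
        # Assumed: every existing partition matching the static spec
        # is replaced, even when the input produces no rows.
        spec = static_spec or {}
        return [p for p in existing
                if all(p.get(k) == v for k, v in spec.items())]
    if mode == "dynamic":
        # Assumed: only partitions that actually receive rows are replaced,
        # so an empty input leaves the table unchanged.
        return [p for p in existing if p in written]
    raise ValueError(f"unknown overwrite mode: {mode}")
```

Making the replaced set an explicit function of the mode and the written partitions would give the failure, retry, and empty-input tests a single place to assert against.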

## PR Tracking

- [ ] Support the basic Paimon write path
- [ ] Align Doris transactions with Paimon commit and abort
- [ ] Support append-only table writes
- [ ] Support primary-key table writes
- [ ] Support partitioned table writes
- [ ] Support fixed bucket table writes
- [ ] Support dynamic bucket table writes
- [ ] Support `INSERT OVERWRITE`
- [ ] Support primitive type writes
- [ ] Support complex type writes
- [ ] Support object storage write scenarios
- [ ] Improve partition and bucket write distribution
- [ ] Improve small file control and write throughput
- [ ] Add tests for append-only table writes
- [ ] Add tests for primary-key table writes
- [ ] Add tests for partitioned and bucketed table writes
- [ ] Add tests for dynamic bucket table writes
- [ ] Add tests for `INSERT OVERWRITE`
- [ ] Add tests for complex type writes
- [ ] Add tests for transaction commit, retry, and rollback
- [ ] Add tests for object storage write scenarios

## Risks and Notes

- Bucketed table writes need careful handling of concurrent writers, small files, and commit conflicts.
- Dynamic bucket support requires additional planning around data distribution and bucket management.
- Primary-key table writes need clear semantic boundaries for full-row writes, partial updates, delete, and update.
- `INSERT OVERWRITE` needs strict correctness for failure, retry, rollback, and empty-input cases.
- Doris transaction semantics and Paimon commit/abort semantics must remain consistent under failures and retries.
- Complex types, decimal, timestamp, timezone, and binary values need dedicated validation.
- Writes, commits, and cleanup on object storage need dedicated validation.

## Expected Benefits

With Paimon write support, Doris users will be able to write query results, load results, and computed data directly into Paimon tables using Doris SQL.
This completes the read/write loop for lakehouse workloads and lays the foundation for more complete Paimon data write, update, and table management capabilities in Doris.
