Hi Apache Arrow Community,

I'm Ziang Zhou from CNIC, Chinese Academy of Sciences. On behalf of my team, I'd like to share a proposal about DACP (Data Access and Collaboration Protocol), a protocol built on Apache Arrow Flight, and discuss potential integration with the Arrow ecosystem.

### 1. Background of DACP

DACP is designed for cross-node, cross-process data access in scientific and distributed computing environments. It addresses pain points in existing solutions, such as fragmented data sharing, lack of collaboration support, and inefficient streaming.

### 2. Relationship with Apache Arrow

DACP is tightly integrated with Apache Arrow Flight:

- Uses Arrow Flight as the underlying RPC layer for zero-copy, columnar data transfer;
- Reuses Arrow's in-memory format for SDF (Streaming DataFrame), ensuring interoperability with other Arrow-enabled systems;
- Extends Flight with high-level features like dataset catalog management, end-to-end provenance tracking, and secure collaboration.

### 3. Current Status

- Project repo: https://github.com/rdcn-link/dftp-dacp
- IETF draft: https://datatracker.ietf.org/doc/draft-shenzhihong-dacp/
- Tested for multi-node data sharing on scientific computing clusters of the Institute of Atmospheric Physics, CAS

### 4. Collaboration Request

We hope to:

1. Get technical feedback from the Arrow community on DACP's design (especially its compatibility with Arrow Flight);
2. Discuss the possibility of listing DACP as an official Arrow ecosystem extension;
3. Explore potential collaboration on protocol optimization (e.g., aligning SDF with Arrow's data model).

We've already submitted a PR to add DACP to the "Powered By Apache Arrow" list (PR link: https://github.com/apache/arrow-site/pull/728), and we look forward to your comments.

Thank you for your time!

Best regards,
Ziang Zhou
CNIC, Chinese Academy of Sciences
Email: [email protected]
Project Repo: https://github.com/rdcn-link/dftp-dacp
