Hi,

Some questions that come to mind:
1. If we add vendor X to DataFusion, will we be open to some other vendor Y? How do we compare vendors? Where do we draw the line of "not sufficiently relevant"?

2. How do we ensure that we do not distort the level playing field that some people expect from DataFusion?

3. How hard is it to create a binary that uses DataFusion + a Delta Lake custom table provider outside of DataFusion?

I see DataFusion's plugin system,

* custom nodes
* custom table providers
* custom physical optimizers
* custom logical optimizers
* UDFs
* UDAFs

as our answer to not bundling vendor-specific implementations (e.g. S3, Azure, Oracle, IOx, IBM, Google, Delta Lake), and instead allowing users to build applications on top of it, with whatever vendor-specific requirements they have. Rust lends itself really well to this: dependencies are declared in Cargo.toml, and an application built from DataFusion plus plugins compiles to a single binary that can be deployed to prod environments.

AFAIK Delta Lake itself is not bundled with Spark, and is instead installed separately (e.g. via the POM for Java, pip install delta-spark for Python) [1]. I think that this is a sustainable model whereby we do not have to know Delta Lake specifics to maintain the code; instead we declare contracts for extensions, which others maintain for their specific formats/systems.

Best,
Jorge

[1] https://docs.delta.io/1.0.0/quick-start.html
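P.S. To make the "contracts for extensions" point concrete, here is a minimal sketch of the pattern in plain Rust. All the types here are simplified stand-ins I made up for illustration: the real TableProvider trait in the datafusion crate is async, returns Arrow record batches, and has a much richer signature.

```rust
use std::collections::HashMap;

// Contract declared by the query-engine ("core") crate. The core never
// needs to know any vendor's specifics -- only this trait.
trait TableProvider {
    fn table_type(&self) -> &str;
    // Stand-in for returning record batches from a scan.
    fn scan(&self) -> Vec<String>;
}

// Vendor-specific implementation, maintained in a separate crate
// (hypothetical -- what a Delta Lake crate would provide).
struct DeltaTableProvider {
    path: String,
}

impl TableProvider for DeltaTableProvider {
    fn table_type(&self) -> &str {
        "delta"
    }
    fn scan(&self) -> Vec<String> {
        vec![format!("rows from delta table at {}", self.path)]
    }
}

// The application binary wires core + plugin together at build time,
// via a registration API like DataFusion's register_table.
struct Context {
    tables: HashMap<String, Box<dyn TableProvider>>,
}

impl Context {
    fn new() -> Self {
        Context { tables: HashMap::new() }
    }
    fn register_table(&mut self, name: &str, provider: Box<dyn TableProvider>) {
        self.tables.insert(name.to_string(), provider);
    }
    fn scan(&self, name: &str) -> Option<Vec<String>> {
        self.tables.get(name).map(|t| t.scan())
    }
}

fn main() {
    let mut ctx = Context::new();
    ctx.register_table(
        "events",
        Box::new(DeltaTableProvider { path: "/data/events".to_string() }),
    );
    println!("{:?}", ctx.scan("events"));
}
```

The core crate only ever compiles against the trait, so the Delta Lake dependency lives solely in the application's Cargo.toml -- which is exactly the single-binary deployment story above.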