Hi all,

I would like to get feedback on adding Delta Lake support to
DataFusion (https://github.com/apache/arrow-datafusion/issues/525).
As you might know, Delta Lake <https://delta.io/> is a format that adds
features like ACID transactions, statistics, and storage optimization to
Parquet, and it is gaining considerable traction for managing data lakes.
It seems like a great feature to have in DataFusion as well.

The delta-rs <https://github.com/delta-io/delta-rs> project provides a
native, Apache-licensed Rust implementation of Delta Lake that already
supports a large part of the format and its operations.

The first integration I would like to propose is adding read support via a
new TableProvider. There might be some work to do around dependencies, as
both DataFusion and delta-rs depend on specific versions of Arrow and
Parquet, and those versions would need to be kept compatible.
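
To make the read path a bit more concrete: a Delta table is a set of
Parquet files plus a transaction log, so a provider would first have to
work out which files belong to the current snapshot before handing them to
the Parquet reader. Below is a minimal, dependency-free sketch of that
step. `Action` and `live_files` are hypothetical names used only for
illustration; they are not the delta-rs or DataFusion API, and real log
entries carry more than a path (partition values, statistics, and so on).

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for the file-level actions in a Delta transaction log.
/// (Hypothetical type for illustration; not the delta-rs representation.)
#[derive(Debug, Clone, PartialEq)]
enum Action {
    /// A Parquet file was added to the table.
    Add(String),
    /// A previously added Parquet file was logically removed.
    Remove(String),
}

/// Replay the actions in commit order and return the set of Parquet files
/// that make up the current snapshot of the table.
fn live_files(log: &[Action]) -> BTreeSet<String> {
    let mut files = BTreeSet::new();
    for action in log {
        match action {
            Action::Add(path) => {
                files.insert(path.clone());
            }
            Action::Remove(path) => {
                files.remove(path);
            }
        }
    }
    files
}

fn main() {
    // Assumed example log: two files written, one later replaced.
    let log = vec![
        Action::Add("part-0000.parquet".to_string()),
        Action::Add("part-0001.parquet".to_string()),
        Action::Remove("part-0000.parquet".to_string()),
        Action::Add("part-0002.parquet".to_string()),
    ];

    // These are the files a scan of the current snapshot would have to read.
    for path in live_files(&log) {
        println!("{}", path);
    }
}
```

In an actual integration, delta-rs would do this log handling, and the
TableProvider would mainly expose the resulting schema and file list to
DataFusion.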

Let me know if you have any further ideas or concerns.

Best regards,

Daniël Heres
