Thanks Marko for the detailed answer and the references. Cheers!

On Mon, Aug 21, 2023 at 1:57 PM Marko Grujic <mark...@gmail.com> wrote:

> Hi Akshara,
>
> > Just for my understanding - the proposal assumes that writes will result in
> > a new table version, correct?
>
> Actually, the implementation I had in mind does not make any assumptions
> about the behaviour of writes; it only accounts for the fact that there may
> be different versions of the same table available, and so we need to be able
> to discern them. But in general yeah, each write usually results in a new
> table version.
>
> > schema version itself does not change, rather the records are append-only
> > and have a timestamp associated with them (typically in an 'internal'
> > column).
>
> Correct, though schemas can sometimes also evolve (e.g. to facilitate
> ALTER TABLE statements).
> I’m particularly interested in Delta Lake (delta-rs), and in that case the
> protocol tracks the timestamp via the logs[1].
>
> > Something like an "AS OF TIMESTAMP" support, basically.
>
> Indeed, this is the most common approach, i.e. having a timestamp to
> delineate different table versions. delta-rs also supports explicit version
> referencing[2], while other systems (BigQuery, Snowflake) also support
> offset intervals (e.g. '1 day ago').
>
> I’ve opened a PR to sqlparser that introduces basic timestamp-based table
> version referencing support for now[3]. This is a prerequisite for
> DataFusion table time travel, though even if people don’t agree to pursue
> that goal it can be useful on its own.
>
> Thanks,
> Marko
>
> [1] https://www.splitgraph.com/blog/seafowl-delta-storage-layer
> [2] https://docs.rs/deltalake/latest/deltalake/builder/enum.DeltaVersion.html
> [3] https://github.com/sqlparser-rs/sqlparser-rs/pull/951
>
> On 2023/08/19 14:48:10 Akshara Uke wrote:
> > Hi Marko,
> >
> > Indeed most databases do support time travel/stale reads (especially
> > distributed databases), hence an important feature, IMHO.
> >
> > Just for my understanding - the proposal assumes that writes will result in
> > a new table version, correct?
> > Asking since some databases provide stale read support, but the table
> > schema version itself does not change, rather the records are append-only
> > and have a timestamp associated with them (typically in an 'internal'
> > column).
> > Perhaps the solution can be extended to have the facility to specify/tag,
> > in the table structure, a column as a commit timestamp tracker; then it
> > can be used to provide stale reads based on a timestamp as well.
> >
> > Something like an "AS OF TIMESTAMP" support, basically.
> >
> > Hope it makes sense.
> >
> > Thanks,
> > akshara
> >
> > On Fri, Aug 18, 2023 at 8:35 AM Marko Grujic <ma...@gmail.com> wrote:
> >
> > > Hi all!
> > >
> > > I'm wondering what people think of a possibility to extend DataFusion so as
> > > to accommodate time-travel querying? This would work well with the new
> > > table formats, particularly Iceberg and Delta Lake, where table versioning
> > > is at the core of the protocol.
> > >
> > > You can see some details in the issue I raised below[1], but the TLDR of
> > > the work I see is:
> > > 1. extend sqlparser-rs to be aware of the `AS OF` clause (or something else
> > > people prefer)
> > > 2. capture that information inside the `TableFactor::Table
> > > <https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/ast/query.rs#L650-L664>`
> > > expression
> > > 3. then in DataFusion itself, while building `SessionContextProvider` and
> > > pre-populating the tables for a given query, keep track of both the table
> > > version and table name specified
> > > 4. this would also mean a breaking change in `SchemaProvider::table`
> > > along the lines of
> > > ```rust
> > > async fn table(&self, name: &str, version: Option<TableVersion>) ->
> > > Option<Arc<dyn TableProvider>>
> > > ```
> > > which would allow the provider implementation to be version-aware
> > >
> > > I'd be glad to commence work on this if there's consensus on the addition
> > > of such a feature to DataFusion.
> > >
> > > Cheers,
> > > Marko
> > >
> > > [1] https://github.com/apache/arrow-datafusion/issues/7292
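
For reference, the version-aware lookup from step 4 of the original proposal could be sketched roughly as below. This is only an illustration under stated assumptions, not DataFusion's API: `TableVersion`, `Snapshot`, `VersionedSchemaProvider` and `InMemorySchema` are hypothetical names, the lookup is synchronous rather than `async`, and `Snapshot` stands in for `Arc<dyn TableProvider>`. It also assumes commit timestamps never decrease as versions increase.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

/// Hypothetical table version reference (not an existing DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableVersion {
    /// Explicit version number, similar in spirit to what delta-rs allows.
    Number(u64),
    /// "AS OF TIMESTAMP" style reference, in milliseconds since the Unix epoch.
    AsOfTimestampMillis(i64),
}

/// Stand-in for a table provider: one committed snapshot of a table.
#[derive(Debug, PartialEq, Eq)]
pub struct Snapshot {
    pub version: u64,
    pub committed_at_millis: i64,
}

/// Simplified, synchronous analogue of the proposed `SchemaProvider::table`.
pub trait VersionedSchemaProvider {
    fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<Snapshot>>;
}

/// Toy provider: table name -> snapshots ordered by version number.
#[derive(Default)]
pub struct InMemorySchema {
    tables: HashMap<String, BTreeMap<u64, Arc<Snapshot>>>,
}

impl InMemorySchema {
    /// Records a write as a new snapshot and returns its version number.
    pub fn commit(&mut self, name: &str, committed_at_millis: i64) -> u64 {
        let snapshots = self.tables.entry(name.to_string()).or_default();
        // First version is 0; each later write gets the previous version + 1.
        let version = snapshots.keys().next_back().map_or(0, |v| v + 1);
        snapshots.insert(version, Arc::new(Snapshot { version, committed_at_millis }));
        version
    }
}

impl VersionedSchemaProvider for InMemorySchema {
    fn table(&self, name: &str, version: Option<TableVersion>) -> Option<Arc<Snapshot>> {
        let snapshots = self.tables.get(name)?;
        match version {
            // No version given: behave as today and return the latest snapshot.
            None => snapshots.values().next_back().cloned(),
            // Explicit version: exact match or nothing.
            Some(TableVersion::Number(v)) => snapshots.get(&v).cloned(),
            // Timestamp: newest snapshot committed at or before `ts`.
            Some(TableVersion::AsOfTimestampMillis(ts)) => snapshots
                .values()
                .rev()
                .find(|s| s.committed_at_millis <= ts)
                .cloned(),
        }
    }
}
```

With two commits to a table at timestamps 100 and 200, passing `None` would resolve to the second snapshot, while `Some(TableVersion::AsOfTimestampMillis(150))` would resolve to the first. Keeping `version` as an `Option` means callers that never use time travel simply pass `None`.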