adsharma commented on issue #533:
URL: 
https://github.com/apache/arrow-datafusion/issues/533#issuecomment-865192183


   Are the following good references for what delta lake is proposing?
   
   https://docs.databricks.com/delta/concurrency-control.html
   https://docs.databricks.com/delta/optimizations/isolation-level.html
   
   I get that delta lake uses transactions on the *metadata* to ensure that a
   consistent view of the table is presented to batch jobs that may be reading
   it.
   
   The reasoning I was using is that SQL is currently a unified query language
   that works for both OLTP and OLAP. In an ideal world, one set of
   extensions would address both use cases.
   
   So what I understood from the discussion is: if there is an "events" table
   that has a "timestamp" column and is partitioned by the hour, it's perfectly
   legitimate for a query to aggregate over hourly partitions to compute some
   sort of view. No transaction isolation guarantee is violated.
   
   However, if a table is being updated with new data and a query is able to
   see both the old version and the new version and compute stats using some
   mix of the two, are delta lake's isolation guarantees violated (assuming
   the tables were set up with the WriteSerializable isolation level)?
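
To make the second scenario concrete, here is a toy Python sketch. It is not the Delta Lake or DataFusion API: `VersionedTable`, `commit`, and `snapshot` are hypothetical names invented for illustration, and the rows are made-up numbers. It only shows why an aggregate computed over a mix of two versions matches neither version, while a reader pinned to one version stays consistent.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy model (hypothetical): each commit publishes a new immutable version."""
    versions: list = field(default_factory=list)

    def commit(self, rows):
        """Publish a new version of the rows and return its version number."""
        self.versions.append(tuple(rows))
        return len(self.versions) - 1

    def snapshot(self, version):
        """Return the rows of one committed version."""
        return self.versions[version]


table = VersionedTable()
v0 = table.commit([10, 20, 30])
pinned = table.snapshot(v0)        # a reader pins version 0
v1 = table.commit([11, 21, 31])    # a writer publishes version 1 meanwhile

# Reads against a single version are internally consistent.
consistent_old = sum(pinned)
consistent_new = sum(table.snapshot(v1))

# A read that takes the first row from v0 and the rest from v1
# produces a total that corresponds to no committed version.
mixed = pinned[0] + sum(table.snapshot(v1)[1:])

print(consistent_old, consistent_new, mixed)
```

The hourly-partition aggregate from the first scenario corresponds to the single-version reads here; only the mixed read is the case I am asking about.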
   
   On Sun, Jun 20, 2021 at 7:34 PM QP Hou ***@***.***> wrote:
   
   > `as of n` is a deltalake specific SQL extension. It's better to think of
   > `t version as of n` as a different table. Datafusion is not a transactional
   > query engine, so querying the same table should always return the same
   > result.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org