alamb commented on issue #3249:
URL: 
https://github.com/apache/arrow-datafusion/issues/3249#issuecomment-1236347394

   > Hi @alamb, I am a little confused. Do you mean the behaviour of create 
view should be equivalent of subquery?
   
   I think I would say it like "create view" makes a entry in the catalog and 
then when *querying* if there is a view reference for that query the view 
should be be treated like a subquery 
   
   
   > And create view is just to give a name binding to a query expression. And 
we always unfold (or inline) the view in the logical plan as a subquery, so 
that we could do different optimizations in different contexts. Is my 
understanding correct?
   
   Yes, I think that is the ideal behavior for views
   
   > And this should be an optimization of logical plan, right?
   
   I think that would be a good implementation, as it would work for both SQL 
and DataFrame interfaces. 
   
   > Another question is that should we treat treat view as a lazy value or an 
expression?
   
   I think the answer is expression (even though that would mean `v` is 
recomputed each query in your example). This is the core difference between 
views and tables. If a user wants to calculate the result once, they can use a 
table (or `CREATE TABLE AS SELECT...`. If they want a convenient alias that is 
expanded at query time, they use a view.
   
   Trying to save the output of a view has at least two potential challenges:
   1. When  the the data in `t` changes (a new row got added, say) then the 
output needs to be updated (like some sort of cache invalidation)
   2. The size of the view `v` may be very large and so caching it might be 
infeasible. 
   
   As an example, in TPCH schema maybe you don't want to have to explicitly 
write the joins from `lineitem` to order each time so you could define a view 
like
   
   ```sql
   create view lineitems_detailed as 
   select * from lineitems l JOIN orders o on l.o_key = o.o_key;
   ```
   
   And then let's say you write  queries with predicates like
   
   ```sql
   select count(*) from lineitems_detailed where order_type = 'foo' AND o_date 
between '1/1/2022' AND '1/1/2023';
   ```
   
   So in this case the actual values of the view are never "materialized"
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to