alamb commented on code in PR #9346:
URL: https://github.com/apache/arrow-datafusion/pull/9346#discussion_r1505211396


##########
datafusion/physical-plan/src/lib.rs:
##########
@@ -121,21 +121,23 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
     fn as_any(&self) -> &dyn Any;
 
     /// Get the schema for this execution plan
-    fn schema(&self) -> SchemaRef;
+    fn schema(&self) -> SchemaRef {
+        self.cache().schema().clone()
+    }
+
+    fn cache(&self) -> &PlanPropertiesCache;
 
     /// Specifies how the output of this `ExecutionPlan` is split into
     /// partitions.
-    fn output_partitioning(&self) -> Partitioning;
+    fn output_partitioning(&self) -> &Partitioning {

Review Comment:
   > Specifically, the only difference I see with @mustafasrepo's design (i.e. 
default implementations) and using a new extension trait is that the former 
leaves the door open for a `ExecutionPlan` implementation to override the 
simple "get the property from cache" behavior for any of the functions like 
`output_partitioning`. I think this was by design, in case a downstream 
requirement forces implementors to add some custom behavior there.
   > 
   > Do you think that may be undesirable (or have some possibly harmful 
results)?
   
   The problem I see is that it would be possible for 
`plan.properties().execution_mode` to be inconsistent with
    `plan.execution_mode()`. 
   
   What I was imagining was the upgrade scenario of existing implementation of 
`ExecutionPlan` such as we have in `InfluxDB` 
   
   1. The person doing the upgrade sees "I have to implement a new function 
named `properties()`"
   2. Simply implements it by returning a default `PlanProperties()` function 
but leaves the `execution_mode` impl unchanged
   
   Now I can imagine some code in the future assumes that since all 
`ExecutionPlan` impls in DataFusion correctly implement `properties()` that it 
is safe to assume that `plan.properties().execution_mode` matches 
`plan.execution_mode()`.
   
   Maybe this would have been less likely when the name was cache() 
   
   I don't feel super strongly about this issue but I do think it is a "foot 
gun" in the API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to