jayshrivastava commented on code in PR #20416:
URL: https://github.com/apache/datafusion/pull/20416#discussion_r2956637740


##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -327,6 +468,14 @@ impl DynamicFilterPhysicalExpr {
         Arc::strong_count(self) > 1 || Arc::strong_count(&self.inner) > 1
     }
 
+    /// Returns a unique identifier for the inner shared state.
+    ///
+    /// Useful for checking if two [`Arc<PhysicalExpr>`] with the same
+    /// underlying [`DynamicFilterPhysicalExpr`] are the same.
+    pub fn inner_id(&self) -> u64 {
+        Arc::as_ptr(&self.inner) as *const () as u64

Review Comment:
   👋🏽 @adriangb @gabotechs @LiaCastaneda this PR is ready for another review. 
   
   I've attempted to create a better abstraction. I've put the details in the 
PR desc but will include them here too. Let me know what you think!
   
   > This PR aims to fix those problems by making a few changes 
   > 1. `PhysicalExpr::expr_id` is added. This is basically the arc pointer 
which we use today. Any expression can be deduped trivially using this ID
   > 2. For dynamic filter expressions, we now have a new trait 
`DedupablePhysicalExpr`. 
   >   - Its purpose is to allow `PhysicalExpr` to internally define what 
`dedupe` means. 
   >   - It adds a new concept, an `internal_expr_id` which is different than 
the one provided by `PhysicalExpr::expr_id`. It indicates that the expression 
cannot be deduped trivially
   >   - There's a `dedupe(self, other)` method which allows the implementor to 
define how to dedupe two expressions
   > 
   > Then we update the `DedupingSerializer` and `DedupingDeserializer` to 
dedupe on both levels, using the `PhysicalExpr::expr_id` and using the 
`internal_expr_id`.
   >  
   > One suspicious thing is that, since dynamic filters change with time (ie. 
have internal mutability unlike other `PhysicalExpr`), I thought it would be 
good to snapshot the generation as well. For this reason, 
`DedupablePhysicalExpr` has a concept of a `DedupeSnapshot` which provides the 
`internal_expr_id`. Both must be captured **atomically** to serialize the 
dynamic filter expr. 
   > 
   > This PR does not 
   > - add a test for the `HashJoinExec` and `DataSourceExec` filter pushdown 
case, but this is relevant follow up work.
   > - PhysicalExtensionCodec methods should take a converter as arguments. 
Distributed datafusion needs to use converters, specifically the deduping one
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to