alamb commented on code in PR #9295:
URL: https://github.com/apache/arrow-rs/pull/9295#discussion_r2746909110


##########
arrow-array/src/array/mod.rs:
##########
@@ -350,12 +413,19 @@ pub unsafe trait Array: std::fmt::Debug + Send + Sync {
 /// A reference-counted reference to a generic `Array`
 pub type ArrayRef = Arc<dyn Array>;
 
-/// Ergonomics: Allow use of an ArrayRef as an `&dyn Array`
+/// Ergonomics: Allow use of an ArrayRef as an `&dyn Array`.
+/// Use [`Array::as_array_ref_opt()`] to recover the original [`ArrayRef`] and
+/// [`downcast_array_ref()`] to downcast to an Arc of some concrete array type.
 unsafe impl Array for ArrayRef {
     fn as_any(&self) -> &dyn Any {
         self.as_ref().as_any()
     }
 
+    fn as_array_ref_opt(&self) -> Option<ArrayRef> {
+        // Recursively unwrap nested Arcs to find the deepest one
+        Some(innermost_array_ref(Cow::Borrowed(self)))

Review Comment:
   I am probably confused, but why not just `Arc::clone(self)`? This is `impl 
... for ArrayRef` which is `Arc<dyn Array>`
   
   So technically speaking as_array_ref_opt would be recovering the ArrayRef
   
   Are there usecases when we have wrapped multiple levels of Arrays? I think 
this (code tries to handle the case of `Arc<Arc<PrimitiveArary>>` or something, 
which I don't think is common 🤔 



##########
arrow-array/src/array/mod.rs:
##########
@@ -113,6 +118,64 @@ pub unsafe trait Array: std::fmt::Debug + Send + Sync {
     /// ```
     fn as_any(&self) -> &dyn Any;
 
+    /// Attempts to recover an [`ArrayRef`] from `&dyn Array`, returning 
`None` if not Arc-backed.

Review Comment:
   this API makes sense to me as a user of the library, for what it is worth, 
and would save some allocations rather than going to/from ArrayData



##########
arrow-array/src/array/mod.rs:
##########
@@ -500,6 +574,76 @@ unsafe impl<T: Array> Array for &T {
     }
 }
 
+/// Returns the innermost `ArrayRef` from potentially nested `Arc<dyn Array>` 
wrappers.
+fn innermost_array_ref(array: Cow<'_, ArrayRef>) -> ArrayRef {
+    // Peel away the Cow and the Arc, then recurse on the referent.
+    (**array)
+        .as_array_ref_opt()
+        .unwrap_or_else(|| array.into_owned())
+}
+
+/// Downcasts an [`ArrayRef`] to `Arc<T>`, or returns the original on type 
mismatch.
+///
+/// This provides zero-cost downcasting when the dynamic type matches, 
avoiding cloning
+/// through [`ArrayData`]. Automatically handles nested `Arc<dyn Array>` 
wrappers.
+///
+/// If the Arc contains nested `Arc<dyn Array>`, this method unwraps the 
innermost Arc.
+///
+/// ```
+/// # use std::sync::Arc;
+/// # use arrow_array::{Array, ArrayRef, Int32Array, StringArray};
+/// # use arrow_array::downcast_array_ref;
+/// let array: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3]));
+///
+/// let typed: Arc<Int32Array> = downcast_array_ref(array.clone()).unwrap();
+/// assert_eq!(typed.len(), 3);
+///
+/// assert!(downcast_array_ref::<StringArray>(array).is_err());
+/// ```
+pub fn downcast_array_ref<T: Array + 'static>(array: ArrayRef) -> 
Result<Arc<T>, ArrayRef> {
+    // SAFETY STRATEGY:
+    //
+    // This function performs two checks to ensure it's safe to reinterpret
+    // Arc<dyn Array> as Arc<T>:
+    //
+    // 1. Type verification via Any::downcast_ref ensures the dynamic type is T
+    // 2. Pointer provenance check ensures the Arc was formed by unsized 
coercion
+    //    from Arc<T>, not through a wrapper like Arc<&dyn Array> or 
Arc<Box<dyn Array>>
+    //
+    // NOTE: The second check is unsound if `Array::as_any()` returns a 
reference to any field of
+    // (any sub-object of) `self`. All canonical array types either return 
&self or dereference an
+    // internal ArrayRef, both of which satisfy the safety requirements of 
`Arc::from_raw`.
+    //
+    // Only if both checks pass do we reconstruct Arc<T> from the same pointer
+    // that Arc::into_raw gave us, which is safe because:
+    // - Type correctness is guaranteed by check #1
+    // - Same allocation/pointer is guaranteed by check #2
+    // - Arc::from_raw is only called once on this pointer
+
+    // Unwrap nested Arc<dyn Array> if present (zero clones if not nested)

Review Comment:
   I feel like this is code I am not qualified to review.
   
   I did some more googling, and it seems like we should be able to do 
soemthing similar using a trick like this:
   
   
https://stackoverflow.com/questions/76222743/upcasting-an-arcdyn-trait-to-an-arcdyn-any
   
   However, I couldn't make the trait impls all line up 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to