kou commented on code in PR #42133:
URL: https://github.com/apache/arrow/pull/42133#discussion_r1641570658


##########
cpp/src/arrow/array/array_base.h:
##########
@@ -232,6 +232,14 @@ class ARROW_EXPORT Array {
   /// \return DeviceAllocationType
   DeviceAllocationType device_type() const { return data_->device_type(); }
 
+  /// \brief Return the statistics of this Array
+  ///
+  /// This just delegates to calling statistics on the underlying ArrayData
+  /// object which backs this Array.
+  ///
+  /// \return const ArrayStatistics&
+  const ArrayStatistics& statistics() const { return data_->statistics; }

Review Comment:
   > Should statistics be stored in memory together with every `ArrayData` 
instance?
   
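   If that is not desired, we can avoid it by storing a 
`std::shared_ptr<ArrayStatistics>` (or a similar indirection) instead, so that an `ArrayData` without statistics only pays for an empty pointer.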
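A minimal sketch of that alternative, using hypothetical simplified `ArrayStatistics` and `ArrayData` structs (not Arrow's real definitions). The statistics live behind a `std::shared_ptr<const ArrayStatistics>`, which is empty when nothing is known, and one statistics object can be shared by several `ArrayData` instances without copying. `ShareStatistics` is a made-up helper for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical, simplified stand-ins for illustration only; these are not
// Arrow's real ArrayStatistics/ArrayData definitions.
struct ArrayStatistics {
  std::optional<int64_t> null_count;
  std::optional<int64_t> distinct_count;
};

struct ArrayData {
  int64_t length = 0;
  // Empty (nullptr) when no statistics are known, so an ArrayData without
  // statistics only pays for one shared_ptr, not a full ArrayStatistics.
  std::shared_ptr<const ArrayStatistics> statistics;
};

// Hypothetical helper: attaches one immutable statistics object to two
// ArrayData instances without copying the statistics themselves.
inline void ShareStatistics(ArrayData& a, ArrayData& b,
                            std::shared_ptr<const ArrayStatistics> stats) {
  a.statistics = stats;
  b.statistics = std::move(stats);
}
```

Pointing to `const` statistics is a choice made in this sketch: shared statistics then cannot be modified through one `ArrayData` behind the back of another.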
   
   > Another problem with this is that statistics are derived data and 
`ArrayData` is mutable when manipulated directly, so any mutation of `ArrayData` 
will have to consider the consequences to the derived statistics.
   >
   > Lazily-computed `null_count_` is a source of bugs and complexity for this 
reason. IMO statistics should be (1) computed or (2) carried over from file 
readers (like Parquet's) as something on the side.
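To make the quoted concern concrete, here is a minimal sketch with a hypothetical `MutableColumn` type (not an Arrow class) whose validity bits can be mutated directly. A cached null count silently goes stale after such a mutation, whereas option (1), recomputing the value from the data with the made-up `ComputeNullCount` helper, always agrees with the current contents.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified column: validity[i] == false means slot i is null.
// This is not Arrow's ArrayData; it only models "mutable data + cached stat".
struct MutableColumn {
  std::vector<bool> validity;
  // Cached derived value, analogous to a statistic stored next to the data.
  // Nothing updates it when validity is changed directly.
  std::optional<int64_t> cached_null_count;
};

// Option (1): compute the statistic from the data whenever it is needed,
// so it can never disagree with the current contents.
inline int64_t ComputeNullCount(const MutableColumn& col) {
  int64_t nulls = 0;
  for (bool valid : col.validity) {
    if (!valid) ++nulls;
  }
  return nulls;
}
```

For example, with validity `{true, false, true}` the cached count is 1; after setting `validity[0] = false` directly, the cache still says 1 while `ComputeNullCount` returns 2.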
   
   How about attaching the statistics read by a file reader to `Array` (not 
`ArrayData`) directly?
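A rough sketch of that proposal, assuming hypothetical simplified `Array`, `ArrayData` and `ArrayStatistics` types (not Arrow's actual classes or API). As in Arrow, the `Array` wraps a `std::shared_ptr<ArrayData>`; the statistics a file reader provides sit next to it in the `Array` rather than inside `ArrayData`, so `ArrayData` itself carries no derived data.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical, simplified stand-ins; not Arrow's real classes.
struct ArrayStatistics {
  std::optional<int64_t> null_count;
};

struct ArrayData {
  int64_t length = 0;
  int64_t null_count = 0;
};

// The Array wrapper carries optional statistics next to (not inside) its
// ArrayData, e.g. as filled in by a file reader.
class Array {
 public:
  explicit Array(std::shared_ptr<ArrayData> data,
                 std::shared_ptr<const ArrayStatistics> statistics = nullptr)
      : data_(std::move(data)), statistics_(std::move(statistics)) {}

  const std::shared_ptr<ArrayData>& data() const { return data_; }

  // nullptr when no producer (such as a file reader) attached statistics.
  const std::shared_ptr<const ArrayStatistics>& statistics() const {
    return statistics_;
  }

 private:
  std::shared_ptr<ArrayData> data_;
  std::shared_ptr<const ArrayStatistics> statistics_;
};
```

In this sketch, constructing another `Array` from the same `ArrayData` without passing statistics yields `statistics() == nullptr`, so statistics are only present where a producer attached them explicitly.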
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
