xudong963 commented on code in PR #15539:
URL: https://github.com/apache/datafusion/pull/15539#discussion_r2036278558
##########
datafusion/datasource/src/statistics.rs:
##########
@@ -410,23 +410,24 @@ pub async fn get_statistics_with_limit(
}
/// Generic function to compute statistics across multiple items that have statistics
-fn compute_summary_statistics<T, I>(
+/// If `items` is empty or all items don't have statistics, it returns `None`.
Review Comment:
> If you don't have a schema, how can you even try to compute a statistics,
I couldn't imagine also that
`compute_summary_statistics` only summarizes the statistics of the items it is
given, so it doesn't need a schema.
For a caller such as `compute_file_group_statistics`, if
`compute_summary_statistics` returns `None`, the caller doesn't need to do
anything, because the `statistics` field of `FileGroup` already defaults to
`None`:
```rust
pub struct FileGroup {
    /// The files in this group
    files: Vec<PartitionedFile>,
    /// Optional statistics for the data across all files in the group
    statistics: Option<Arc<Statistics>>,
}
```
It is meant as a base method: its callers are free to handle the return value
however they need, and the method doesn't impose extra requirements on them
(e.g., requiring a schema).
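
To make the caller side concrete, here is a rough, self-contained sketch of the
pattern. It is not the code in this PR: `Statistics` and `FileGroup` below are
simplified stand-ins (row count only, and a hypothetical `file_statistics`
field in place of `Vec<PartitionedFile>`), and `summarize` and
`compute_group_statistics` are hypothetical names used only for illustration.

```rust
use std::sync::Arc;

/// Simplified stand-in for DataFusion's `Statistics` (assumption: row count only).
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: usize,
}

/// Simplified stand-in for `FileGroup`; `statistics` is `None` by default.
#[derive(Debug, Default)]
struct FileGroup {
    /// Hypothetical field: the optional statistics of each file in the group.
    file_statistics: Vec<Option<Arc<Statistics>>>,
    /// Optional statistics for the data across all files in the group.
    statistics: Option<Arc<Statistics>>,
}

/// Summarizes whatever statistics are present; no schema is involved.
/// Returns `None` if `items` is empty or no item carries statistics.
fn summarize<'a, I>(items: I) -> Option<Statistics>
where
    I: IntoIterator<Item = &'a Option<Arc<Statistics>>>,
{
    items
        .into_iter()
        .flatten() // skip items without statistics
        .map(|s| s.num_rows)
        .reduce(|acc, n| acc + n) // `None` when nothing was left to sum
        .map(|num_rows| Statistics { num_rows })
}

/// Caller: only touches `statistics` when a summary exists.
/// On `None` it does nothing, so the field keeps its default of `None`.
fn compute_group_statistics(mut group: FileGroup) -> FileGroup {
    if let Some(stats) = summarize(&group.file_statistics) {
        group.statistics = Some(Arc::new(stats));
    }
    group
}
```

With this shape, an empty group and a group whose files all lack statistics
both come back with `statistics` still `None`, and the caller needs no special
handling for either case.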
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]