berkaysynnada commented on PR #15503:
URL: https://github.com/apache/datafusion/pull/15503#issuecomment-2785682160
> What do you think?
I still think we should unify the APIs. We've discussed the issue within our
team and reached a shared opinion:
To give an example of my concern, there will be many duplications like
this:
```rust
fn statistics(&self) -> Result<Statistics> {
    let stats = Self::statistics_helper(
        self.schema(),
        self.input().statistics()?,
        self.predicate(),
        self.default_selectivity,
    )?;
    Ok(stats.project(self.projection.as_ref()))
}

fn statistics_by_partition(&self) -> Result<PartitionedStatistics> {
    let input_stats = self.input.statistics_by_partition()?;
    let stats: Result<Vec<Arc<Statistics>>> = input_stats
        .iter()
        .map(|stat| {
            let stat = Self::statistics_helper(
                self.schema(),
                stat.clone(),
                self.predicate(),
                self.default_selectivity,
            )
            .map(|stat| stat.project(self.projection.as_ref()))?;
            Ok(Arc::new(stat))
        })
        .collect();
    Ok(PartitionedStatistics::new(stats?))
}
```
The statistics() logic is, or will be, duplicated like this in each operator,
because the calculation is the same whether the stats come from the whole
table or from a single partition. We can avoid the duplication and write
efficient, functional statistics() implementations if we adopt
```rust
fn statistics(&self, partition: Option<usize>) -> Result<Statistics>
```
style. So, for me, that alternative clearly wins over the others.
It also does not modify the other existing structs/APIs, and it offers an
extensible approach while enabling statistics access for any partition.
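To make the shape concrete, here is a minimal sketch of how a single code path
could serve both cases. It uses simplified stand-in types, not DataFusion's
actual `Statistics` or `ExecutionPlan`; `Source`, `Filter`, and
`selectivity_percent` are hypothetical names used only for illustration, and
the error type is a plain `String` to keep the example self-contained:

```rust
// Simplified stand-in for illustration; NOT DataFusion's real `Statistics`.
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

// Stand-in trait carrying only the proposed method shape.
trait ExecutionPlan {
    /// `None` asks for whole-plan statistics; `Some(i)` asks for partition `i`.
    fn statistics(&self, partition: Option<usize>) -> Result<Statistics, String>;
}

/// Hypothetical leaf operator with a known row count per partition.
struct Source {
    rows_per_partition: Vec<usize>,
}

impl ExecutionPlan for Source {
    fn statistics(&self, partition: Option<usize>) -> Result<Statistics, String> {
        let num_rows = match partition {
            // Whole plan: sum over all partitions.
            None => self.rows_per_partition.iter().sum::<usize>(),
            // Single partition: look it up, failing if the index is out of range.
            Some(i) => *self
                .rows_per_partition
                .get(i)
                .ok_or_else(|| format!("partition {} out of range", i))?,
        };
        Ok(Statistics { num_rows: Some(num_rows) })
    }
}

/// Hypothetical filter that keeps a fixed percentage of its input rows.
struct Filter<P: ExecutionPlan> {
    input: P,
    selectivity_percent: usize,
}

impl<P: ExecutionPlan> ExecutionPlan for Filter<P> {
    fn statistics(&self, partition: Option<usize>) -> Result<Statistics, String> {
        // One implementation: forward `partition` to the input, then apply
        // the same estimate regardless of whole-plan vs. per-partition.
        let input = self.input.statistics(partition)?;
        Ok(Statistics {
            num_rows: input.num_rows.map(|n| n * self.selectivity_percent / 100),
        })
    }
}
```

In this sketch the operator never branches on `partition`; only the leaf
decides what the argument means, which is where the duplication goes away.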
TL;DR: updating the API to `fn statistics(&self, partition: Option<usize>) ->
Result<Statistics>` is a minimal change, doesn't force us down an
immature design path, reduces duplication, and enables partition-based stats
access, which is the main goal.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]