jvanstraten commented on PR #13509:
URL: https://github.com/apache/arrow/pull/13509#issuecomment-1175169931

   Chunked arrays are tested 
[here](https://github.com/apache/arrow/blob/3d6240c1ee7802829d2ed209f4135906e9413915/cpp/src/arrow/compute/kernels/aggregate_test.cc#L1566-L1568),
 but, judging from my (cleaned-up) debug prints for `chunked_input1`, the call 
pattern there is
   
   ```
   auto min_max_impl_1 = MinMaxImpl(...)
   auto min_max_impl_2 = MinMaxImpl(...)
   min_max_impl_2.Consume([5,1,2,3,4]) -> (1, 5)
   min_max_impl_1.MergeFrom(min_max_impl_2) -> (1, 5)
   auto min_max_impl_3 = MinMaxImpl(...)
   min_max_impl_3.Consume([9,1,null,3,4]) -> (1, 9)
   min_max_impl_1.MergeFrom(min_max_impl_3) -> (1, 9)
   min_max_impl_1.Finalize() -> (1, 9)
   ```
   
   for which it does not matter whether `Consume()` overrides the previous 
state. The test cases aren't great anyway, since the last chunk contains the 
minimum and maximum values for each of them, but even if I swapped the chunks 
around, you'd still get
   
   ```
   auto min_max_impl_1 = MinMaxImpl(...)
   auto min_max_impl_2 = MinMaxImpl(...)
   min_max_impl_2.Consume([9,1,null,3,4]) -> (1, 9)
   min_max_impl_1.MergeFrom(min_max_impl_2) -> (1, 9)
   auto min_max_impl_3 = MinMaxImpl(...)
   min_max_impl_3.Consume([5,1,2,3,4]) -> (1, 5)
   min_max_impl_1.MergeFrom(min_max_impl_3) -> (1, 9)
   min_max_impl_1.Finalize() -> (1, 9)
   ```
   
   I don't know why it's done this way; it seems rather inefficient to me. 
Evidently, though, for larger-scale workloads it *does* call `Consume()` more 
than once before merging and discarding each `MinMaxImpl` instance.
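   To make the concern concrete: if `Consume()` can be called more than once on the same instance before `MergeFrom()`/`Finalize()`, it has to *accumulate* into the running state rather than overwrite it. Here's a minimal sketch of that contract with a hypothetical `MinMaxSketch` class (not the real Arrow `MinMaxImpl`; nulls and types are omitted for brevity):

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <limits>
   #include <vector>

   // Hypothetical stand-in for the aggregator. Consume() folds each chunk
   // into the running (min, max) instead of resetting it, so repeated
   // Consume() calls on one instance behave the same as Consume()+MergeFrom()
   // across several instances.
   struct MinMaxSketch {
     int min = std::numeric_limits<int>::max();
     int max = std::numeric_limits<int>::min();

     void Consume(const std::vector<int>& chunk) {
       for (int v : chunk) {
         min = std::min(min, v);
         max = std::max(max, v);
       }
     }

     void MergeFrom(const MinMaxSketch& other) {
       min = std::min(min, other.min);
       max = std::max(max, other.max);
     }
   };

   int main() {
     // Two Consume() calls on the *same* instance: an overriding Consume()
     // would report (1, 5) here instead of the correct (1, 9).
     MinMaxSketch impl;
     impl.Consume({9, 1, 3, 4});
     impl.Consume({5, 1, 2, 3, 4});
     assert(impl.min == 1 && impl.max == 9);
     return 0;
   }
   ```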

