westonpace commented on code in PR #14352:
URL: https://github.com/apache/arrow/pull/14352#discussion_r1019662482
##########
cpp/src/arrow/compute/exec/options.h:
##########
@@ -106,21 +106,32 @@ class ARROW_EXPORT ProjectNodeOptions : public ExecNodeOptions {
std::vector<std::string> names;
};
-/// \brief Make a node which aggregates input batches, optionally grouped by keys.
+/// \brief Make a node which aggregates input batches, optionally grouped by keys and
+/// optionally segmented by segment-keys. Both keys and segment-keys determine the group.
+/// However segment-keys are also used for determining grouping segments, which should be
+/// large, and allow streaming a partial aggregation result after processing each segment.
Review Comment:
> The definition is: when moving from one row to the next, if the tuple of
> segment keys is constant then both rows are in the same segment, whereas if the
> tuple changes in any way (not necessarily in lexicographic order) then they are
> in different segments - see also
> https://github.com/apache/arrow/pull/14352#issuecomment-1272400060.

Also, thanks for this explanation. I agree it sounds like a useful
generalization. Thanks!
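
For readers following along, here is a minimal sketch of the quoted definition, written as standalone C++ rather than against the PR's code. `SplitIntoSegments` is a hypothetical helper name, not an Arrow API, and representing each row's segment keys as a `std::tuple` is an assumption made only for illustration. It compares each row's tuple with the previous row's for equality and starts a new segment whenever they differ; no ordering of the keys is assumed.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): returns half-open [begin, end) row
// ranges such that the segment-key tuple is constant within each range.
template <typename KeyTuple>
std::vector<std::pair<std::size_t, std::size_t>> SplitIntoSegments(
    const std::vector<KeyTuple>& segment_keys) {
  std::vector<std::pair<std::size_t, std::size_t>> segments;
  std::size_t begin = 0;
  for (std::size_t row = 1; row <= segment_keys.size(); ++row) {
    // A segment closes at the end of the input, or where the tuple differs
    // from the previous row's tuple. Only equality is checked, so the keys
    // do not need to be sorted.
    if (row == segment_keys.size() ||
        segment_keys[row] != segment_keys[row - 1]) {
      segments.emplace_back(begin, row);
      begin = row;
    }
  }
  return segments;
}
```

With segment-key tuples (1,1), (1,1), (2,0), (1,1), this yields three segments: rows [0,2), [2,3) and [3,4). The last row starts a new segment even though its tuple appeared earlier, because segments are defined by changes between adjacent rows, not by distinct key values.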