gongxun0928 commented on PR #1178:
URL: https://github.com/apache/cloudberry/pull/1178#issuecomment-2991897607
> Actually, I was thinking, in CBDB's Streaming Partial HashAggregate, is it
really necessary for the following two fields to be int64? Maybe int32 is
enough.
>
> ```
> int64 pInfcount; /* count of +Inf values */
> int64 nInfcount; /* count of -Inf values */
> ```
Except for some special cases, it should be very rare for the sum of
+Inf/-Inf values in an aggregate to exceed 2^32-1. If we do not consider
compatibility with such special case, we could change the type from int64 to
int32. Combined with a field reordering like the example below, the struct size
can be reduced from 144 bytes to 128 bytes. This would keep the memory usage
for numeric aggregation in line with Greenplum, while also improving hash
aggregation efficiency.
origin: 144bytes
```
typedef struct NumericAggState
{
bool calcSumX2; /* if true, calculate sumX2 */
MemoryContext agg_context; /* context we're calculating in */
int64 N; /* count of processed
numbers */
NumericSumAccum sumX; /* sum of processed numbers */
NumericSumAccum sumX2; /* sum of squares of processed numbers
*/
int maxScale; /* maximum scale seen
so far */
int64 maxScaleCount; /* number of values seen with maximum
scale */
/* These counts are *not* included in N! Use NA_TOTAL_COUNT() as
needed */
int64 NaNcount; /* count of NaN values */
int64 pInfcount; /* count of +Inf values */
int64 nInfcount; /* count of -Inf values */
} NumericAggState;
```
reordering: 128bytes
```
typedef struct NumericAggState
{
bool calcSumX2; /* if true, calculate sumX2 */
int32 maxScale; /* maximum scale seen so far */
int64 maxScaleCount; /* number of values seen with maximum
scale */
MemoryContext agg_context; /* context we're calculating in */
int64 N; /* count of processed
numbers */
NumericSumAccum sumX; /* sum of processed numbers */
NumericSumAccum sumX2; /* sum of squares of processed numbers
*/
int64 NaNcount; /* count of NaN values */
int32 pInfcount; /* count of +Inf values */
int32 nInfcount; /* count of -Inf values */
} NumericAggState;
```
I'll try to give a performance comparison of this change by next Monday.
Then we'll decide how to proceed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]