andygrove opened a new issue, #2646:
URL: https://github.com/apache/datafusion-comet/issues/2646
### Describe the bug
### SQL
```sql
SELECT c3, c42, corr(c20, c6) FROM test0 GROUP BY c3,c42 ORDER BY c3, c42;
```
### Spark Plan
```
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91129]
            +- *(2) HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
               +- AQEShuffleRead coalesced
                  +- ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91101]
                        +- *(1) HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
                           +- *(1) ColumnarToRow
                              +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
+- == Initial Plan ==
   Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91083]
      +- HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
         +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91080]
            +- HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
               +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
```
### Comet Plan
```
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) CometColumnarToRow
   +- CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91263]
               +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91217]
                           +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
                              +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
+- == Initial Plan ==
   CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91198]
      +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
         +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91196]
            +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
               +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
```
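Both plans run `corr` as a two-phase aggregate: the `Partial` stage emits the buffer columns `n`, `xAvg`, `yAvg`, `ck`, `xMk` and `yMk` for each `(c3, c42)` group, and the `Final` stage merges those buffers after the hash shuffle. The Python sketch below models that buffer, assuming it holds a row count, running means, and centered co-moments maintained with a streaming (Welford-style) update and a pairwise merge. `CorrBuffer`, `update`, `merge` and `partial` are hypothetical names; this is an illustration, not Spark's or Comet's actual code.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class CorrBuffer:
    """Hypothetical model of the partial_corr buffer columns."""
    n: float = 0.0      # rows where both inputs are non-null
    x_avg: float = 0.0  # running mean of x
    y_avg: float = 0.0  # running mean of y
    ck: float = 0.0     # co-moment: sum((x - x_avg) * (y - y_avg))
    x_mk: float = 0.0   # sum of squared deviations of x
    y_mk: float = 0.0   # sum of squared deviations of y


def update(buf: CorrBuffer, x: Optional[float], y: Optional[float]) -> CorrBuffer:
    """Fold one row into the buffer (Partial stage); rows with a NULL input are skipped."""
    if x is None or y is None:
        return buf
    n = buf.n + 1.0
    dx = x - buf.x_avg  # deviation from the previous mean
    dy = y - buf.y_avg
    x_avg = buf.x_avg + dx / n
    y_avg = buf.y_avg + dy / n
    return CorrBuffer(
        n=n,
        x_avg=x_avg,
        y_avg=y_avg,
        ck=buf.ck + dx * (y - y_avg),
        x_mk=buf.x_mk + dx * (x - x_avg),
        y_mk=buf.y_mk + dy * (y - y_avg),
    )


def merge(a: CorrBuffer, b: CorrBuffer) -> CorrBuffer:
    """Combine two partial buffers (Final stage, after the shuffle)."""
    n = a.n + b.n
    if n == 0.0:
        return CorrBuffer()
    dx = b.x_avg - a.x_avg
    dy = b.y_avg - a.y_avg
    w = a.n * b.n / n  # cross-term weight for the pairwise combination
    return CorrBuffer(
        n=n,
        x_avg=a.x_avg + dx * b.n / n,
        y_avg=a.y_avg + dy * b.n / n,
        ck=a.ck + b.ck + dx * dy * w,
        x_mk=a.x_mk + b.x_mk + dx * dx * w,
        y_mk=a.y_mk + b.y_mk + dy * dy * w,
    )


def partial(rows: Iterable[Tuple[Optional[float], Optional[float]]]) -> CorrBuffer:
    """Run the Partial stage over one slice of a group's rows."""
    buf = CorrBuffer()
    for x, y in rows:
        buf = update(buf, x, y)
    return buf


# Splitting a group's rows across two partial buffers and merging them
# should reproduce the single-pass co-moments.
rows = [(1.0, 2.0), (2.0, 4.1), (None, 5.0), (3.0, 5.9), (4.0, 8.2)]
merged = merge(partial(rows[:3]), partial(rows[3:]))
single = partial(rows)
print(merged.n, math.isclose(merged.ck, single.ck), math.isclose(merged.x_mk, single.x_mk))  # 4.0 True True
```

The merge is what lets the `Final` stage combine buffers produced on different executors; under this model, both engines should agree on `n`, `ck`, `xMk` and `yMk` for a group, leaving the final division as the step to compare.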
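First difference at row 150 (columns `c3`, `c42`, `corr(c20, c6)`): Spark returns `NULL` for the correlation, while Comet returns `NaN`.

- Spark: `1190973260,[3333-01-21T01:11:48.781],NULL`
- Comet: `1190973260,[3333-01-21T01:11:48.781],NaN`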
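The report does not say which input values trigger the mismatch. For reference, the Spark 3.1 migration guide states that `corr` (like `stddev`, `variance` and other statistical aggregates) returns `NULL` instead of `NaN` when a divide-by-zero occurs, for example on a single-row group, unless `spark.sql.legacy.statisticalAggregate` is set to `true`. The Python sketch below contrasts that rule with a finalization that simply performs IEEE-754 division of `ck` by `sqrt(xMk * yMk)`. It assumes, without confirmation from the report, that the group at row 150 has fewer than two non-null `(c20, c6)` pairs or zero variance in one column; `co_moments`, `corr_null_on_zero` and `corr_ieee` are hypothetical names.

```python
import math
from typing import Optional, Sequence, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def co_moments(rows: Sequence[Pair]) -> Tuple[float, float, float, float]:
    """Return (n, ck, xMk, yMk) over rows where both inputs are non-null (two-pass)."""
    pairs = [(x, y) for x, y in rows if x is not None and y is not None]
    n = float(len(pairs))
    if n == 0.0:
        return 0.0, 0.0, 0.0, 0.0
    x_avg = sum(x for x, _ in pairs) / n
    y_avg = sum(y for _, y in pairs) / n
    ck = sum((x - x_avg) * (y - y_avg) for x, y in pairs)
    x_mk = sum((x - x_avg) ** 2 for x, _ in pairs)
    y_mk = sum((y - y_avg) ** 2 for _, y in pairs)
    return n, ck, x_mk, y_mk


def ieee_div(a: float, b: float) -> float:
    """IEEE-754 division; Python raises on / 0.0, so emulate NaN and Inf explicitly."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a)
    return a / b


def corr_null_on_zero(rows: Sequence[Pair]) -> Optional[float]:
    """Assumed model of Spark 3.1+ default semantics: None (NULL) instead of NaN."""
    n, ck, x_mk, y_mk = co_moments(rows)
    if n < 2.0:
        return None  # empty group or a single non-null pair
    denom = math.sqrt(x_mk * y_mk)
    if denom == 0.0:
        return None  # zero variance in x or y: divide-by-zero also yields NULL
    return ck / denom


def corr_ieee(rows: Sequence[Pair]) -> Optional[float]:
    """Hypothetical finalization that only special-cases the empty group."""
    n, ck, x_mk, y_mk = co_moments(rows)
    if n == 0.0:
        return None
    return ieee_div(ck, math.sqrt(x_mk * y_mk))


single_row = [(1.5, 2.5)]
print(corr_null_on_zero(single_row))  # None, i.e. NULL
print(corr_ieee(single_row))          # nan
```

With one pair, `ck`, `xMk` and `yMk` are all `0.0`, so plain floating-point division produces `0.0 / 0.0 = NaN`; a `NULL` result only appears if the finalization checks for that case explicitly.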
### Steps to reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_