[I] fuzz test failure: `corr` NULL vs NaN [datafusion-comet]

via GitHub Fri, 24 Oct 2025 09:07:40 -0700


andygrove opened a new issue, #2646:
URL: https://github.com/apache/datafusion-comet/issues/2646


   ### Describe the bug
   
   ## SQL
   ```
   SELECT c3, c42, corr(c20, c6) FROM test0 GROUP BY c3,c42 ORDER BY c3, c42;
   ```
   ### Spark Plan
   ```
   AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
      *(3) Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91129]
               +- *(2) HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91101]
                           +- *(1) HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
                              +- *(1) ColumnarToRow
                                 +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
   +- == Initial Plan ==
      Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
      +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91083]
         +- HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
            +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91080]
               +- HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
                  +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
   
   ```
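
Both plans split `corr` into two phases: the Partial stage emits a per-group buffer of `n, xAvg, yAvg, ck, xMk, yMk`, the buffers are shuffled by `(c3, c42)`, and the Final stage merges them and evaluates the correlation. The Python sketch below is illustrative only. The field names are copied from the plan, but the update and merge formulas are the standard Welford / Chan et al. streaming-covariance recurrences, assumed (not verified against Spark's or Comet's source) to match what these buffers hold. `CorrState`, `update`, and `merge` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class CorrState:
    """Hypothetical per-group buffer; field names mirror partial_corr's output."""
    n: float = 0.0     # count of rows where both inputs are non-null
    xAvg: float = 0.0  # running mean of x
    yAvg: float = 0.0  # running mean of y
    ck: float = 0.0    # co-moment: sum of (x - mean_x) * (y - mean_y)
    xMk: float = 0.0   # sum of squared deviations of x from its mean
    yMk: float = 0.0   # sum of squared deviations of y from its mean

    def update(self, x, y):
        """Partial stage: fold in one row (Welford update), skipping NULL inputs."""
        if x is None or y is None:
            return
        self.n += 1
        dx = x - self.xAvg  # deviation from the *old* means
        dy = y - self.yAvg
        self.xAvg += dx / self.n
        self.yAvg += dy / self.n
        self.ck += dx * (y - self.yAvg)  # old x deviation times new y deviation
        self.xMk += dx * (x - self.xAvg)
        self.yMk += dy * (y - self.yAvg)

    def merge(self, other: "CorrState"):
        """Final stage: combine another partial buffer into this one (Chan et al.)."""
        n = self.n + other.n
        if n == 0:
            return
        dx = other.xAvg - self.xAvg
        dy = other.yAvg - self.yAvg
        w = self.n * other.n / n
        self.ck += other.ck + dx * dy * w
        self.xMk += other.xMk + dx * dx * w
        self.yMk += other.yMk + dy * dy * w
        self.xAvg += dx * other.n / n
        self.yAvg += dy * other.n / n
        self.n = n


# Usage: aggregate two "partitions" separately, then merge as after the shuffle.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (None, 1.0), (4.0, 8.2)]
left, right = CorrState(), CorrState()
for x, y in rows[:2]:
    left.update(x, y)
for x, y in rows[2:]:
    right.update(x, y)
left.merge(right)
print(left.n, left.ck / math.sqrt(left.xMk * left.yMk))
```

Merging per-partition buffers this way yields the same count, means, and moments as a single pass over all rows (up to floating-point rounding). If the engines use an equivalent scheme, a NULL vs NaN mismatch is more likely to come from the final evaluation step than from how rows were shuffled.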
   ### Comet Plan
   ```
   AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
      *(1) CometColumnarToRow
      +- CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
         +- AQEShuffleRead coalesced
            +- ShuffleQueryStage 1
               +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91263]
                  +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
                     +- AQEShuffleRead coalesced
                        +- ShuffleQueryStage 0
                           +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91217]
                              +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
                                 +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
   +- == Initial Plan ==
      CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
      +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91198]
         +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
            +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91196]
               +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
                  +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
   
   ```
   First difference at row 150 (columns `c3, c42, corr(c20, c6)`):
   - Spark: `1190973260,[3333-01-21T01:11:48.781],NULL`
   - Comet: `1190973260,[3333-01-21T01:11:48.781],NaN`
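
Spark returns NULL for this group while Comet returns NaN. One plausible explanation, not confirmed in this issue, is that the group has fewer than two non-null `(c20, c6)` pairs or a zero-variance column, so the final step `ck / sqrt(xMk * yMk)` becomes 0 / 0. IEEE-754 division yields NaN there, whereas Spark in its default configuration (`spark.sql.legacy.statisticalAggregate=false`) is assumed here to return NULL for these divide-by-zero cases. The hypothetical functions below (`ieee_div`, `corr_ieee`, `corr_spark_like`) contrast the two evaluation rules on the same buffer values. Because Python's `/` raises `ZeroDivisionError` for a zero divisor, `ieee_div` emulates IEEE-754 behavior explicitly.

```python
import math
from typing import Optional


def ieee_div(a: float, b: float) -> float:
    """Emulate IEEE-754 float division; Python's `/` raises on a zero divisor."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan  # 0/0 (or NaN/0) is NaN
        return math.copysign(math.inf, a) * math.copysign(1.0, b)  # x/0 is +/-inf
    return a / b


def corr_ieee(n: float, ck: float, xMk: float, yMk: float) -> Optional[float]:
    """Hypothetical naive rule: NULL only for an empty group, else raw IEEE division."""
    if n == 0:
        return None
    return ieee_div(ck, math.sqrt(xMk * yMk))


def corr_spark_like(n: float, ck: float, xMk: float, yMk: float) -> Optional[float]:
    """Hypothetical rule modeled on Spark's assumed default: NULL for empty and
    single-row groups and for a zero denominator, never NaN from 0/0."""
    if n <= 1:
        return None
    denom = math.sqrt(xMk * yMk)
    if denom == 0.0:
        return None
    return ck / denom


# Buffers (n, ck, xMk, yMk) for a group with one non-null pair,
# and for a group whose x column is constant:
for label, buf in [("single row", (1.0, 0.0, 0.0, 0.0)),
                   ("constant x", (3.0, 0.0, 0.0, 2.0))]:
    print(label, corr_ieee(*buf), corr_spark_like(*buf))
```

For groups with at least two rows that vary in both columns the two rules agree, so under this assumption only degenerate groups would surface as NULL vs NaN in the fuzz comparison.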
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_

