[I] Bug triage results: 2026-05-11 [datafusion-comet]

via GitHub Mon, 11 May 2026 09:56:40 -0700


andygrove opened a new issue, #4287:
URL: https://github.com/apache/datafusion-comet/issues/4287


   # Bug triage results: 2026-05-11
   
   Triage pass over the open `requires-triage` queue, per the project [Bug Triage Guide](https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/bug_triage.md).
   
   - Total issues processed: 16
   - `priority:high`: 1
   - `priority:medium`: 10
   - `priority:low`: 5
   
   Labels have already been applied and `requires-triage` removed from each 
issue below. A reviewer should spot-check the calls and close this issue when 
satisfied. To correct a label, edit the affected issue directly.
   
   ## Triaged
   
   ### priority:high
   
   - Number of rows in each column should be the same, but got 
[ArrayBuffer(8192, 0)] 
([#4211](https://github.com/apache/datafusion-comet/issues/4211))
     - Area labels: `area:scan`, `area:ffi`
     - Rationale: Native engine throws an unhandled exception from 
`NativeUtil.exportBatch` during `native_iceberg_compat` execution on a real 
TPC-H workload; per the guide, an exception thrown on a supported code path is 
`priority:high`.
   
   ### priority:medium
   
   - CI is broken 
([#4281](https://github.com/apache/datafusion-comet/issues/4281))
     - Area labels: `area:ci`
     - Rationale: CI tooling would normally be `priority:low`, but the ASF 
runner outage is blocking all PR merges; the guide's escalation rule for CI 
consistently blocking merges promotes it to `priority:medium`.
   - Add support for `posexplode` and `posexplode_outer` 
([#4269](https://github.com/apache/datafusion-comet/issues/4269))
     - Area labels: `area:expressions`
     - Rationale: Missing expression support with a clean Spark fallback 
workaround; matches the guide's `priority:medium` example for "missing 
expression support".
   - Support non-Literal default value for `LAG/LEAD` 
([#4268](https://github.com/apache/datafusion-comet/issues/4268))
     - Area labels: `area:expressions`
     - Rationale: Functional gap in the window-expression path, with a Spark 
fallback; matches `priority:medium` "missing expression support".
   - Support Iceberg "Rewrite Data Files Procedure" 
([#4250](https://github.com/apache/datafusion-comet/issues/4250))
     - Area labels: `area:scan`
     - Rationale: Feature gap in the Iceberg read path with a working Spark 
path today; functional gap with workaround is `priority:medium`.
   - Improve performance of `corr` and `covar` 
([#4249](https://github.com/apache/datafusion-comet/issues/4249))
     - Area labels: `area:aggregation`
     - Rationale: Comet (Scan + Exec) is ~0.7-0.8x Spark on these aggregates 
per the attached benchmark; "performance regression with workaround" is 
`priority:medium`.
   - Support fs.s3a.auth.profile.name and fs.s3a.auth.profile.file for 
ProfileCredentialsProvider 
([#4245](https://github.com/apache/datafusion-comet/issues/4245))
     - Area labels: `area:scan`
     - Rationale: Configurability gap in the S3A auth code path; functional gap 
with workaround is `priority:medium`.
   - Re-enable COUNT for mixed Spark partial / Comet final aggregate execution 
([#4242](https://github.com/apache/datafusion-comet/issues/4242))
     - Area labels: `area:aggregation`
     - Rationale: Functional gap (lost TPC-DS coverage) with a Spark fallback; 
matches `priority:medium` "broken features with workarounds".
   - Support higher-order array functions via JVM UDF bridge 
([#4224](https://github.com/apache/datafusion-comet/issues/4224))
     - Area labels: `area:expressions`, `area:udf`
     - Rationale: Missing expression support causing C2R/R2C transitions; 
`priority:medium` per the guide's expression-support example.
   - native_datafusion more permissive than Spark 3.x when reading Parquet 
TimestampNTZ columns 
([#4219](https://github.com/apache/datafusion-comet/issues/4219))
     - Area labels: `area:scan`
     - Rationale: Behavior mismatch with Spark 3.x: Comet returns a value where 
Spark would error. The reporter notes the value is correct under Spark 4 
semantics, so this is a functional/spec mismatch rather than a correctness or 
crash bug; `priority:medium`. Existing `spark 3.x` label retained.
   - Iceberg reflection failure 
([#4125](https://github.com/apache/datafusion-comet/issues/4125))
     - Area labels: `area:scan`
     - Rationale: Reflection lookup for `GlueTableOperations.current()` fails 
when running Comet with Iceberg on Glue; Iceberg integration is unusable for 
affected users but a Spark fallback exists; `priority:medium`.
   
   ### priority:low
   
   - CometPlainVector: cache offsetBufferAddress for variable-width vectors 
([#4280](https://github.com/apache/datafusion-comet/issues/4280))
     - Area labels: `area:scan`
     - Rationale: Pure micro-optimization in the JVM-side Arrow vector reader; 
no correctness or crash impact, so `priority:low`.
   - CometPlainVector: validity-bitmap byte cache for sequential reads 
([#4279](https://github.com/apache/datafusion-comet/issues/4279))
     - Area labels: `area:scan`
     - Rationale: Pure micro-optimization mirroring an existing pattern; no 
correctness or crash impact, `priority:low`.
   - Epic: Test coverage gaps for complex-type casts 
([#4248](https://github.com/apache/datafusion-comet/issues/4248))
     - Area labels: `area:expressions`
     - Rationale: Test-coverage epic; "test-only" maps directly to 
`priority:low` in the guide.
   - Eliminate remaining row<->Arrow round-trip in CometPythonMapInArrowExec 
([#4240](https://github.com/apache/datafusion-comet/issues/4240))
     - Area labels: `area:udf`
     - Rationale: Follow-up perf optimization to an already-shipped feature; no 
correctness or crash impact, `priority:low`.
   - Preserve dictionary encoding through native expressions where possible 
([#4228](https://github.com/apache/datafusion-comet/issues/4228))
     - Area labels: `area:expressions`, `area:scan`
     - Rationale: Performance enhancement; reporter pre-applied `priority:low`, 
retained. Area labels added.
   
   ## Escalations to consider
   
   - CI is broken 
([#4281](https://github.com/apache/datafusion-comet/issues/4281))
     - Started at `priority:low` for "CI tooling"; escalated to 
`priority:medium` per the guide's "CI flake blocking PR merges consistently" 
rule. A reviewer may want to bump to `priority:high`, since *all* PR merges 
are currently blocked outright rather than disrupted by intermittent flakes.
   - Number of rows in each column should be the same, but got 
[ArrayBuffer(8192, 0)] 
([#4211](https://github.com/apache/datafusion-comet/issues/4211))
     - Labeled `priority:high` (thrown exception). The guide flags that FFI 
boundary issues can silently corrupt data in some cases; if any variant of this 
bug is observed to silently miscount rows rather than throw, escalate to 
`priority:critical`.
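
For reviewers spot-checking the calls, the rules quoted in the rationales above can be sketched as a small lookup plus one escalation step. This is a minimal illustration only: the category keys, the `triage` and `label` helpers, and the `blocks_merges_consistently` flag are hypothetical names invented here, and the mapping covers only the rules cited in this report, not the full Bug Triage Guide.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Baseline priority per category, as cited in the rationales of this report.
# The category keys are hypothetical names chosen for this sketch.
BASELINE = {
    "silent_data_corruption": Priority.CRITICAL,
    "exception_on_supported_path": Priority.HIGH,
    "missing_expression_support": Priority.MEDIUM,
    "functional_gap_with_workaround": Priority.MEDIUM,
    "perf_regression_with_workaround": Priority.MEDIUM,
    "ci_tooling": Priority.LOW,
    "test_only": Priority.LOW,
    "micro_optimization": Priority.LOW,
}


def triage(category: str, blocks_merges_consistently: bool = False) -> Priority:
    """Return the priority for a category, applying the CI escalation rule."""
    priority = BASELINE[category]
    # Escalation cited for #4281: CI tooling starts at low, but is promoted
    # to medium when it consistently blocks PR merges.
    if category == "ci_tooling" and blocks_merges_consistently:
        priority = max(priority, Priority.MEDIUM)
    return priority


def label(priority: Priority) -> str:
    """Format a priority as the label string used in this report."""
    return f"priority:{priority.name.lower()}"
```

Under these assumptions, `label(triage("ci_tooling", blocks_merges_consistently=True))` gives `priority:medium`, matching the call made for #4281, while any further bump to `priority:high` stays a reviewer judgment rather than a rule.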
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


