Re: [PR] fix: materialize ConstantColumnVector on Comet's serialize/export paths [datafusion-comet]

via GitHub Thu, 18 Jun 2026 06:33:44 -0700


andygrove commented on PR #4532:
URL: 
https://github.com/apache/datafusion-comet/pull/4532#issuecomment-4742461444


   A few observations from a review pass. Acknowledging up front that this 
comment was drafted with help from AI (Claude Code).
   
   The reuse of the existing per-type `ArrowFieldWriter`s is exactly the right 
design choice — every Spark type comes along for free and the logic stays in 
sync with non-constant batches.
   
   A few notes:
   
   1. **Hardcoded `"UTC"` timezone in 
`Utils.materializeConstantColumnVector`.** The choice is defensible since Spark 
stores `TimestampType` as micros in UTC, but elsewhere in this file 
`toArrowSchema` requires callers to pass the session timezone explicitly. If a 
batch flows through `getBatchFieldVectors` containing both a 
`ConstantColumnVector` of `TimestampType` and a non-constant timestamp column, 
would they end up with mismatched Arrow timezone metadata? Worth either 
threading a `timeZoneId` parameter through `getBatchFieldVectors` (sourced from 
`SQLConf.sessionLocalTimeZone`), or adding a comment explaining why `"UTC"` is 
safe in this serialization context.
   
   2. **Test coverage in `UtilsSuite`.** Nice round-trip test. Would you mind 
adding a case or two beyond `IntegerType`? A nullable `StructType` or 
`ArrayType` constant exercises a different `ArrowFieldWriter` code path (via 
`getStruct(rowId)` / `getArray(rowId)` on `ConstantColumnVector`), and a 
`TimestampType` constant would pin down the timezone choice in case anyone 
later changes it. The materializer claims to support every type and it'd be 
useful to prove it for at least the complex case.
   
   3. **FFI `exportBatch` arm.** It shares `materializeConstantColumnVector` 
with `getBatchFieldVectors`, so the materializer itself is covered. The 
surrounding code (`Data.exportVector` + the allocator handoff) isn't exercised 
by the new test. If easy, a small smoke test that runs a 
`ConstantColumnVector`-containing batch through an actual Comet operator path 
would catch a regression in the FFI wiring. Optional.
   
   4. **CI flake.** The one failing check (`Spark 4.1 sql_core-3`) annotates as 
`The hosted runner lost communication with the server` — infrastructure, not 
the PR. The matching `sql_core-3` job on Spark 3.5 passes. A re-run should 
clear it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: materialize ConstantColumnVector on Comet's serialize/export paths [datafusion-comet]

Reply via email to