neilconway opened a new pull request, #22089: URL: https://github.com/apache/datafusion/pull/22089
## Which issue does this PR close?

- Closes #22088.

## Rationale for this change

`LogicalPlanBuilder::infer_data` (the inference path for `VALUES` clauses without a target schema) hard-coded every inferred column's nullability to `true`, regardless of whether any row actually contained a NULL. This is inconsistent with how nullability is computed for other, similar situations (e.g., `SELECT 1`).

In addition to improving internal consistency (and theoretically allowing better query optimization), this also makes it easier to write tests for nullability-related behavior without using a scratch table.

## What changes are included in this PR?

* `LogicalPlanBuilder::infer_data` now tracks per-column nullability while iterating values, marking the column nullable iff any row's value expression returns `nullable() == true`. Note that this only changes behavior for `VALUES` without a schema; `INSERT INTO VALUES`, for example, already computed nullability.
* Add SLT tests for this behavior
* Update expected SLT tests where this change results in updating a schema. Note that some Parquet files are slightly smaller now, which caused byte-count metrics in a few places to change.

## Are these changes tested?

Yes, with new tests added.

## Are there any user-facing changes?

The inferred schema for `CREATE TABLE AS VALUES (...)` will now change, although if we also fix #22087 then the original behavior will be preserved.
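
For readers who want the per-column rule spelled out, here is a minimal, self-contained sketch of the idea: a column is nullable iff at least one row's expression in that position reports itself as nullable. This is not the DataFusion implementation. `ValueExpr`, its variants, and `infer_column_nullability` are hypothetical names invented for illustration, and the sketch assumes every row has the same number of columns.

```rust
/// Hypothetical stand-in for a value expression in a `VALUES` row.
/// (Not a DataFusion type; invented for this sketch.)
#[derive(Debug, Clone)]
enum ValueExpr {
    /// An integer literal; `None` models SQL NULL.
    Literal(Option<i64>),
    /// Any other expression whose nullability is already known.
    Opaque { nullable: bool },
}

impl ValueExpr {
    /// Whether evaluating this expression may produce NULL.
    fn nullable(&self) -> bool {
        match self {
            ValueExpr::Literal(v) => v.is_none(),
            ValueExpr::Opaque { nullable } => *nullable,
        }
    }
}

/// Returns one flag per column: `true` iff any row's expression in that
/// column is nullable. Assumes all rows have the same length; an empty
/// input yields an empty result.
fn infer_column_nullability(rows: &[Vec<ValueExpr>]) -> Vec<bool> {
    let n_cols = rows.first().map_or(0, |row| row.len());
    // Start from "not nullable" and flip a column's flag once a nullable
    // expression is seen in that position.
    let mut nullable = vec![false; n_cols];
    for row in rows {
        for (flag, expr) in nullable.iter_mut().zip(row) {
            *flag |= expr.nullable();
        }
    }
    nullable
}
```

Under this rule, the rows `(1, 2), (3, NULL)` would give a non-nullable first column and a nullable second column, whereas the previous behavior described above corresponds to marking both columns nullable unconditionally.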
