[PR] Implemented write-default v3 [iceberg-go]

via GitHub Thu, 12 Mar 2026 13:59:39 -0700


PranjalChaitanya opened a new pull request, #779:
URL: https://github.com/apache/iceberg-go/pull/779

Part of #589

This PR implements support for write-default values when projecting Arrow
batches in `ToRequestedSchema`. If a column is missing from the batch but the
schema specifies a `WriteDefault` for it, the projection creates a column
populated with the default value.
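
As a rough illustration of that behavior (not the code in this PR), here is a minimal stdlib-only sketch. The `field` struct and `project` function are hypothetical names, plain `[]any` slices stand in for Arrow arrays, and filling with nil when no default exists is a simplification for the sketch.

```go
package main

import "fmt"

// field is a hypothetical, simplified stand-in for an Iceberg schema field.
type field struct {
	Name         string
	WriteDefault any // nil means no write-default is set
}

// project returns columns in schema order. A column missing from the batch is
// filled with the field's write-default repeated numRows times, or left as
// nils when the field has no default (a simplification for this sketch).
func project(schema []field, batch map[string][]any, numRows int) [][]any {
	out := make([][]any, 0, len(schema))
	for _, f := range schema {
		if col, ok := batch[f.Name]; ok {
			out = append(out, col)
			continue
		}
		col := make([]any, numRows) // every element starts as nil
		if f.WriteDefault != nil {
			for i := range col {
				col[i] = f.WriteDefault
			}
		}
		out = append(out, col)
	}
	return out
}

func main() {
	schema := []field{{Name: "id"}, {Name: "region", WriteDefault: "unknown"}}
	batch := map[string][]any{"id": {1, 2, 3}}
	fmt.Println(project(schema, batch, 3))
}
```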

To generate the column, this PR uses the method `MakeArrayFromScalar()`.
This is similar to how
[PyIceberg](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1998-L2009)
handles write-defaults.

One complication I ran into was converting Iceberg default values into Arrow
scalars. In the schema, `WriteDefault` is stored as `any`, so the concrete Go
type is not known at compile time. Iceberg default values are often stored
using named Go types such as `Date`, `Time`, or `Timestamp`, which wrap
primitive values like `int32` or `int64`. Arrow’s scalar helpers (`MakeScalar`,
`MakeScalarParam`) infer the scalar type from the Go value, and since these
Iceberg types are not the same as Arrow’s own types (such as `arrow.Date32`),
Arrow may interpret them as generic numeric scalars instead of their intended
logical types.

For example, an `iceberg.Date` may be interpreted as a generic integer rather
than a `date32` scalar. While the numeric value would still be correct, the
resulting Arrow array would have the wrong logical type.
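
The underlying Go behavior can be shown without Arrow at all. In the sketch below, `Date` is a hypothetical stand-in for a named Iceberg type, and `inferred` is an illustrative helper that looks only at the underlying kind; it is not Arrow's actual inference code.

```go
package main

import (
	"fmt"
	"reflect"
)

// Date is a hypothetical stand-in for a named Iceberg type (days since epoch).
type Date int32

// inferred mimics value-based inference that only sees the underlying kind.
func inferred(v any) string {
	return reflect.ValueOf(v).Kind().String()
}

// isPlainInt32 reports whether v's dynamic type is exactly int32.
func isPlainInt32(v any) bool {
	_, ok := v.(int32)
	return ok
}

func main() {
	var def any = Date(19000)
	fmt.Println(inferred(def))     // kind is int32, so the "date" meaning is gone
	fmt.Println(isPlainInt32(def)) // false, Date is a distinct type from int32
	fmt.Printf("%T\n", def)        // the dynamic type is still main.Date
}
```

Kind-based inference loses the logical type, while a type switch on the named type can still recover it at runtime.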

The Java and Python implementations don’t run into this issue in the same way.
From what I can tell, Java's core writers do not appear to use Arrow. In
Python, `pa.scalar(value, type=...)` explicitly specifies the Arrow type during
scalar construction, so PyArrow does not need to infer the type from the Python
value.

Because Go stores the default as `any` in the schema, some runtime dispatch
is required to normalize the value before constructing the Arrow scalar. The
implementation in this PR handles those cases to ensure the resulting Arrow
array matches the schema’s logical type.
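
A minimal sketch of what such runtime dispatch can look like, using only the standard library. The `Date` and `Timestamp` types, the `typedValue` struct, and the logical type names are all hypothetical stand-ins for illustration; the PR's actual conversion targets Arrow scalars and may be structured differently.

```go
package main

import "fmt"

// Hypothetical stand-ins for named Iceberg value types.
type (
	Date      int32 // days since epoch
	Timestamp int64 // microseconds since epoch
)

// typedValue pairs a raw primitive with an explicit logical type name, which
// is the information a typed scalar constructor would need.
type typedValue struct {
	Logical string
	Raw     int64
}

// normalize dispatches on the dynamic type of a default stored as any.
// Named types are matched before plain integers would ever be considered,
// because a type switch compares exact dynamic types.
func normalize(v any) (typedValue, error) {
	switch x := v.(type) {
	case Date:
		return typedValue{"date32", int64(x)}, nil
	case Timestamp:
		return typedValue{"timestamp[us]", int64(x)}, nil
	case int32:
		return typedValue{"int32", int64(x)}, nil
	case int64:
		return typedValue{"int64", x}, nil
	default:
		return typedValue{}, fmt.Errorf("unsupported default type %T", v)
	}
}

func main() {
	defaults := []any{Date(19000), Timestamp(1_700_000_000_000_000), int32(7), "x"}
	for _, v := range defaults {
		tv, err := normalize(v)
		fmt.Println(tv, err)
	}
}
```

Returning an error for unknown types, rather than guessing, keeps a mismatched default from silently producing an array with the wrong logical type.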

I added tests covering write-defaults across different Iceberg types.

If there is a simpler or more idiomatic way to perform this conversion
within the Arrow or Iceberg-Go codebase, I would be very open to changing the
implementation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
