PranjalChaitanya opened a new pull request, #779:
URL: https://github.com/apache/iceberg-go/pull/779

   Part of #589
   
   This PR implements support for write-default values when projecting Arrow 
batches in `ToRequestedSchema`. If a column is missing from the batch but its 
field has a `WriteDefault` specified in the schema, the projection creates a 
column populated with the default value.
   
   To generate the column, this PR uses the method `MakeArrayFromScalar()`. 
This is similar to how 
[PyIceberg](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L1998-L2009)
 handles write-defaults.
   
   One complication I ran into was converting Iceberg default values into Arrow 
scalars. In the schema, `WriteDefault` is stored as `any`, so the concrete Go 
type is only known at run time. Iceberg default values are often stored using 
named Go types such as `Date`, `Time`, or `Timestamp`, which wrap primitive 
values like `int32` or `int64`. Arrow’s scalar helpers (`MakeScalar`, 
`MakeScalarParam`) infer the scalar type from the Go value, and since these 
Iceberg types are not the same as Arrow’s own types (such as `arrow.Date32`), 
Arrow may interpret them as generic numeric scalars instead of their intended 
logical types. 
   
   For example, an `iceberg.Date` may be interpreted as a generic integer 
rather than a `date32` scalar. While the numeric value would still be correct, 
the resulting Arrow array would have the wrong logical type.
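   
   To illustrate the underlying Go behavior, here is a minimal sketch that uses only the standard library. `Date` is a stand-in for a named Iceberg type and `inferKind` is a hypothetical helper, not Arrow's actual inference code. It only shows that a named type is distinct from its underlying type in a type assertion, while reflection on the kind sees just the underlying primitive.

```go
package main

import (
	"fmt"
	"reflect"
)

// Date is a stand-in for a named Iceberg type (days since the Unix epoch).
// It is an assumption for illustration, not the real iceberg.Date.
type Date int32

// inferKind is a hypothetical helper that reports only the underlying kind
// of a value, discarding the type name and with it the logical meaning.
func inferKind(v any) string {
	return reflect.ValueOf(v).Kind().String()
}

// isPlainInt32 reports whether v holds exactly the built-in int32 type.
func isPlainInt32(v any) bool {
	_, ok := v.(int32)
	return ok
}

func main() {
	var def any = Date(19000)

	// The named type does not match a type assertion for int32 ...
	fmt.Println(isPlainInt32(def))
	// ... but kind-based inspection sees nothing except int32.
	fmt.Println(inferKind(def))
}
```

   Any inference that works from the kind alone would therefore have no way to tell a date from an ordinary 32-bit integer.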
   
   The Java and Python implementations don’t run into this issue in the same 
way. From what I can tell, Java's core writers do not seem to be using Arrow. 
In Python, `pa.scalar(value, type=...)` explicitly specifies the Arrow type 
during scalar construction, so PyArrow does not need to infer the type from the 
Python value.
   
   Because Go stores the default as `any` in the schema, some runtime dispatch 
is required to normalize the value before constructing the Arrow scalar. The 
implementation in this PR handles those cases to ensure the resulting Arrow 
array matches the schema’s logical type.
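   
   The general shape of that dispatch can be sketched as follows. This is a simplified, standard-library-only illustration, not the code in this PR: `Date`, `Timestamp`, `arrowDate32`, `arrowTimestamp`, `normalizeDefault`, and `fillColumn` are all hypothetical stand-ins, with `fillColumn` playing the role that `MakeArrayFromScalar()` plays in the real implementation.

```go
package main

import "fmt"

// Stand-ins for Iceberg's named default types (assumptions for illustration).
type Date int32
type Timestamp int64

// Stand-ins for Arrow's typed values (assumptions for illustration).
type arrowDate32 int32
type arrowTimestamp int64

// normalizeDefault converts a default stored as `any` into a value whose
// Go type carries the intended logical type. Unknown types are rejected
// rather than silently treated as generic numbers.
func normalizeDefault(v any) (any, error) {
	switch d := v.(type) {
	case Date:
		return arrowDate32(d), nil
	case Timestamp:
		return arrowTimestamp(d), nil
	case bool, int32, int64, float32, float64, string:
		// Primitive defaults already have an unambiguous type.
		return d, nil
	default:
		return nil, fmt.Errorf("unsupported write-default type %T", v)
	}
}

// fillColumn repeats a normalized default n times, standing in for
// building a constant array from a scalar.
func fillColumn(v any, n int) []any {
	col := make([]any, n)
	for i := range col {
		col[i] = v
	}
	return col
}

func main() {
	norm, err := normalizeDefault(Date(19000))
	if err != nil {
		panic(err)
	}
	col := fillColumn(norm, 3)
	fmt.Printf("%T x %d\n", col[0], len(col))
}
```

   Returning an error for unrecognized types keeps a new Iceberg type from being written with the wrong logical type unnoticed.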
   
   I added tests covering write-defaults across different Iceberg types.
   
   If there is a simpler or more idiomatic way to perform this conversion 
within the Arrow or Iceberg-Go codebase, I would be very open to changing the 
implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

