thanatham-google opened a new issue, #37823:
URL: https://github.com/apache/beam/issues/37823

   ### What would you like to happen?
   
   Issue: The Apache Beam IcebergIO connector currently does not support 
writing the native Date data type to Iceberg tables. This seems to affect users 
of the Python SDK in particular, due to cross-language transform limitations. 
The issue appears linked to older Joda-time library dependencies in the 
underlying Java IO implementation.
   
   Client Impact: A key client using Dataflow with the Apache Beam Python SDK 
and Iceberg tables on GCS/BigQuery is significantly impacted. They cannot 
natively write Python datetime.date objects and are forced to use workarounds 
such as storing dates as integers or strings, which they find suboptimal for 
their data representation and query needs.
   
   Root Cause & External Trackers: This is a known issue in the Apache Beam 
community, related to the need for a portable Date type and the migration from 
Joda-time to Java.time.
   
   Main issue: https://github.com/apache/beam/issues/25946
   
   Dependencies:
   - https://github.com/apache/beam/issues/28359
   - https://github.com/apache/beam/issues/19215
   
   Current Workarounds Considered: The client has considered treating dates as 
Strings (e.g., 'YYYY-MM-DD'), Integers (e.g., YYYYMMDD or epoch days), 
Timestamps, or using a custom cross-language transform wrapper like the one 
found at https://github.com/johanesalxd/beam-iceberg-date.
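   
   For illustration only, the string and integer encodings mentioned above 
could be sketched as below. This is not the client's code; the helper names 
are hypothetical and use only the Python standard library. Epoch days (days 
since 1970-01-01) is, as far as I understand, how Iceberg represents a date 
value internally, but that is stated here as an assumption.

```python
import datetime

# Reference point for the "epoch days" encoding.
EPOCH = datetime.date(1970, 1, 1)


def date_to_iso_string(d: datetime.date) -> str:
    """Encode a date as a 'YYYY-MM-DD' string."""
    return d.isoformat()


def date_to_yyyymmdd(d: datetime.date) -> int:
    """Encode a date as an integer of the form YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day


def date_to_epoch_days(d: datetime.date) -> int:
    """Encode a date as the number of days since 1970-01-01."""
    return (d - EPOCH).days


def epoch_days_to_date(days: int) -> datetime.date:
    """Decode an epoch-days integer back into a date."""
    return EPOCH + datetime.timedelta(days=days)


sample = datetime.date(2024, 3, 15)
encoded = {
    "iso": date_to_iso_string(sample),
    "yyyymmdd": date_to_yyyymmdd(sample),
    "epoch_days": date_to_epoch_days(sample),
}
```

   Each of these loses the DATE type in the table schema, so readers of the 
table have to know the convention and convert on the query side, which is the 
drawback described under Client Impact.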
   
   Suggested Resolution: The request is to add native, portable support for the 
Date data type within the Beam IcebergIO, ensuring it works smoothly from the 
Python SDK. 
   
   Investigation Details:
   
   What IO is having the issues: Apache Beam IcebergIO sink (the write 
transform).
   
   What are the configurations used for this IO? The customer's specific code 
snippet for configuring the IcebergIO.write() transform is not available. The 
configuration would typically involve specifying catalog details and the target 
table name. The issue occurs when the pipeline data contains standard Python 
datetime.date objects.
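   
   Since the customer's snippet is not available, the following is only a 
rough sketch of what such a configuration might look like from the Python SDK. 
It assumes the managed Iceberg transform and its `table`, `catalog_name`, and 
`catalog_properties` config keys; the table name, catalog name, and warehouse 
path are hypothetical placeholders, and the keys should be checked against the 
Beam version in use.

```python
from typing import Any, Dict


def build_iceberg_write_config(
    table: str, catalog_name: str, warehouse: str
) -> Dict[str, Any]:
    """Build a config dict for an Iceberg write (hypothetical helper).

    Assumes a Hadoop-type catalog rooted at `warehouse`; other catalog
    types would need different catalog properties.
    """
    return {
        "table": table,
        "catalog_name": catalog_name,
        "catalog_properties": {
            "type": "hadoop",
            "warehouse": warehouse,
        },
    }


# Placeholder values, not taken from the customer's pipeline.
config = build_iceberg_write_config(
    table="db.events",
    catalog_name="demo_catalog",
    warehouse="gs://example-bucket/warehouse",
)

# In a pipeline, this dict would typically be passed along the lines of:
#   rows | beam.managed.Write(beam.managed.ICEBERG, config=config)
# The reported problem arises when `rows` carries datetime.date fields.
```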
   
   The shape of the user data:
   - Schema: The exact schema is not provided, but the data includes fields 
intended to be DATE type, represented as Python datetime.date objects.
   - Volume: Specific details on data volume (records or bytes) are not 
available. The issue is type-related, so it likely affects any volume.
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
