soumilshah1995 commented on issue #362: URL: https://github.com/apache/incubator-xtable/issues/362#issuecomment-1976599431
Hello, I hope this message finds you well. I wanted to share the progress I've made writing data into Hudi using PySpark. You can find the code in this [GitHub repository](https://github.com/soumilshah1995/aws-hudi-delta-iceberg-interoperability/blob/main/glue_jupyter_workspace/Untitled2.ipynb).

Here's a snippet of the sample data I've been working with:

```
+--------------------+--------------+--------+----------+------------------+--------------------+--------------------+
|         customer_id|          name|   state|      city|             email|          created_at|             address|
+--------------------+--------------+--------+----------+------------------+--------------------+--------------------+
|7dd63c8b-d588-4f3...|Shannon Fields|New York|Millerport|[email protected]|2024-03-02T12:45:...|344 Bates Flats S...|
+--------------------+--------------+--------+----------+------------------+--------------------+--------------------+
```

Regarding the improvements needed in the notebook (Untitled2.ipynb): the schema may contain nullable fields, which could pose an issue. To address this in PySpark, we need to make sure every nullable field is handled explicitly. Here's a suggestion on how to handle nullable fields in PySpark code:
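One possible way to handle the nullable fields mentioned above is to replace null values with placeholders before writing to Hudi. The following is a minimal sketch under that assumption, not the code from the notebook. The helper names `null_defaults` and `fill_nullable_columns` and the placeholder values in `DEFAULTS_BY_TYPE` are hypothetical. The only Spark APIs it relies on are `df.schema.fields` (each field exposing `name`, `dataType.typeName()` and `nullable`) and `df.na.fill` called with a dict of column names to values.

```python
from typing import Any, Dict

# Hypothetical placeholder per Spark type name; adjust these to your data.
DEFAULTS_BY_TYPE: Dict[str, Any] = {
    "string": "",
    "integer": 0,
    "long": 0,
    "float": 0.0,
    "double": 0.0,
    "boolean": False,
}


def null_defaults(schema) -> Dict[str, Any]:
    """Map each nullable column of a supported type to its placeholder value.

    `schema` is expected to look like a PySpark StructType: an object whose
    `fields` each have `name`, `nullable` and `dataType.typeName()`.
    Columns of other types (for example timestamps) are left out.
    """
    defaults: Dict[str, Any] = {}
    for field in schema.fields:
        type_name = field.dataType.typeName()
        if field.nullable and type_name in DEFAULTS_BY_TYPE:
            defaults[field.name] = DEFAULTS_BY_TYPE[type_name]
    return defaults


def fill_nullable_columns(df):
    """Return a DataFrame with nulls in nullable columns replaced by placeholders."""
    defaults = null_defaults(df.schema)
    # Nothing to fill: hand back the DataFrame unchanged.
    if not defaults:
        return df
    return df.na.fill(defaults)
```

With a DataFrame `df` shaped like the sample above, `fill_nullable_columns(df)` could be called just before the Hudi write. This only replaces null values. It is worth checking with `df.printSchema()` whether the resulting schema still marks the columns as nullable, since that flag may be what matters downstream.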
