[GitHub] [iceberg] RLashofRegas opened a new issue, #5977: How to write to a bucket-partitioned table using PySpark?

GitBox Thu, 13 Oct 2022 10:59:13 -0700


RLashofRegas opened a new issue, #5977:
URL: https://github.com/apache/iceberg/issues/5977


   ### Query engine
   
   Spark
   
   ### Question
   
   In the documentation for Spark writes, under the section [Writing to partitioned tables](https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-partitioned-tables), two Spark Jira issues are linked to explain why (1) a pre-write sort is required ([SPARK-23889](https://issues.apache.org/jira/browse/SPARK-23889)) and (2) the Iceberg bucket function must be manually registered as a UDF for bucket-partitioned tables ([SPARK-27658](https://issues.apache.org/jira/browse/SPARK-27658)). Both of these issues are now in the resolved state.
   
   1. Is the Iceberg documentation still valid?
   2. If not, what release versions of Spark/Iceberg support writing to bucket-partitioned tables without the steps above?
   3. If so, how can I write to a bucket-partitioned table with PySpark, given that (as far as I can tell) the bucket function is not available as a Python API, so I cannot register it as a UDF?
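
For context on question 3, one possible workaround is to call the JVM-side registration helper through PySpark's py4j gateway instead of looking for a Python API. The sketch below is not a confirmed or recommended method. It assumes the Iceberg Spark runtime jar is on the driver classpath, that `IcebergSpark.registerBucketUDF(session, name, sourceType, numBuckets)` is available in the Iceberg version in use, and that the bucketed column is a `long`. It also relies on the private PySpark attributes `_jvm` and `_jsparkSession`. The helper name `register_iceberg_bucket_udf` is hypothetical.

```python
def register_iceberg_bucket_udf(spark, udf_name, num_buckets):
    """Register Iceberg's JVM bucket transform as a Spark SQL function.

    `spark` is expected to be an active pyspark.sql.SparkSession whose JVM
    has the Iceberg Spark runtime on its classpath (assumption of this sketch).
    """
    jvm = spark._jvm  # py4j view of the driver JVM (private PySpark attribute)

    # Static Java field DataTypes.LongType: the assumed type of the source column.
    long_type = jvm.org.apache.spark.sql.types.DataTypes.LongType

    # Call the Java helper directly; it registers the function on the JVM
    # session, so it is usable from SQL expressions issued through PySpark.
    jvm.org.apache.iceberg.spark.IcebergSpark.registerBucketUDF(
        spark._jsparkSession, udf_name, long_type, num_buckets
    )
    return udf_name
```

After registration, the function name could be used in a SQL expression for the pre-write sort, for example `df.sortWithinPartitions(expr("iceberg_bucket16(id)"))` with `expr` imported from `pyspark.sql.functions`. Here `iceberg_bucket16` and `id` are placeholder names, and the bucket count passed to the helper should match the table's partition spec.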


