dotjdk opened a new issue, #5625:
URL: https://github.com/apache/iceberg/issues/5625

   ### Apache Iceberg version
   
   0.14.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I am running a Spark Structured Streaming job that reads data from Kafka and writes to an Iceberg table partitioned by `days(timestamp)`.
   
   When `IcebergSparkSessionExtensions` is enabled, my job fails with `org.apache.spark.sql.AnalysisException: days(timestamp) ASC NULLS FIRST is not currently supported`.
   
   The only way I can get it to work is by not registering 
`IcebergSparkSessionExtensions` and enabling `fanout-writer`. When I do that, 
the data is written to the table, but I get the following entry in the log:
   
   ```
   2022-08-19 06:02:55 WARN  [stream execution thread for Streaming Query [id = 9996dced-e80f-43b6-b241-0533f4df934c, runId = 6b4caf31-db34-4cf1-b88e-8794b49c3a6a]]  o.a.i.spark.source.SparkWriteBuilder - Skipping distribution/ordering: extensions are disabled and spec contains unsupported transforms
   ```
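
To make the two toggles discussed here concrete, the following is a minimal sketch of the settings involved, not the reporter's actual job. It assumes that "fanout-writer" refers to Iceberg's `fanout-enabled` write option and that the extensions are registered through `spark.sql.extensions`; the helper name `build_settings` is hypothetical.

```python
# Fully qualified class name of the Iceberg Spark SQL extensions.
ICEBERG_EXTENSIONS = (
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
)


def build_settings(extensions: bool, fanout: bool):
    """Return (session_conf, write_options) for one toggle combination.

    Assumption: `fanout-enabled` is the write option this report calls
    "fanout-writer". `build_settings` is an illustrative helper only.
    """
    session_conf = {}
    if extensions:
        # Registering the extensions is a session-level setting.
        session_conf["spark.sql.extensions"] = ICEBERG_EXTENSIONS
    # The fanout toggle is a per-write option on the streaming sink.
    write_options = {"fanout-enabled": "true" if fanout else "false"}
    return session_conf, write_options


# The combination reported as working: extensions off, fanout on.
session_conf, write_options = build_settings(extensions=False, fanout=True)
print(session_conf, write_options)
```

In a real job the first dictionary would presumably be applied when building the `SparkSession` and the second through `.option(...)` calls on the streaming writer.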
   
   When I enable `IcebergSparkSessionExtensions`, I get the `AnalysisException` quoted above, whether `fanout-writer` is enabled or not.
   
   I couldn’t find a test case that triggers this with non-identity partitioning, so I have attached a patch file with a modified version of the `TestStructuredStreaming` test case. It runs parameterized variations with fanout enabled or disabled and with the extensions registered or not:
   
   | **Extensions** | **fanout-writer** | **Result** |
   |----------------|-------------------|------------|
   | disabled       | enabled           | Pass |
   | disabled       | disabled          | Fail: Encountered records that belong to already closed files |
   | enabled        | enabled           | Fail: AnalysisException: days(timestamp) ASC NULLS FIRST is not currently supported |
   | enabled        | disabled          | Fail: AnalysisException: days(timestamp) ASC NULLS FIRST is not currently supported |
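
The "already closed files" failure in the second row is consistent with a writer that expects its input to be clustered by partition. The sketch below is a toy model of that idea, not Iceberg's implementation: `ClusteredWriter`, `FanoutWriter`, and `days` are hypothetical names, and the model assumes that a non-fanout writer closes a partition's file as soon as a record for a different partition arrives, while a fanout writer keeps one file open per partition.

```python
from datetime import datetime, timezone


def days(ts: datetime) -> int:
    """Whole days since 1970-01-01 UTC, in the spirit of days(timestamp)."""
    return int(ts.timestamp() // 86400)


class ClusteredWriter:
    """Toy non-fanout writer: only one partition may be open at a time."""

    def __init__(self):
        self.current = None   # partition currently being written
        self.closed = set()   # partitions whose file was already closed
        self.files = {}       # partition -> records written

    def write(self, ts: datetime) -> None:
        part = days(ts)
        if part != self.current:
            if part in self.closed:
                # Input was not clustered by partition.
                raise ValueError(
                    "Encountered records that belong to already closed files"
                )
            if self.current is not None:
                self.closed.add(self.current)
            self.current = part
        self.files.setdefault(part, []).append(ts)


class FanoutWriter:
    """Toy fanout writer: keeps a file open for every partition seen."""

    def __init__(self):
        self.files = {}

    def write(self, ts: datetime) -> None:
        self.files.setdefault(days(ts), []).append(ts)


# Unsorted input: day A, day B, then day A again.
records = [
    datetime(2022, 8, 18, 6, 0, tzinfo=timezone.utc),
    datetime(2022, 8, 19, 6, 0, tzinfo=timezone.utc),
    datetime(2022, 8, 18, 12, 0, tzinfo=timezone.utc),
]

fanout = FanoutWriter()
for record in records:
    fanout.write(record)

clustered = ClusteredWriter()
try:
    for record in records:
        clustered.write(record)
except ValueError as err:
    print("clustered writer failed:", err)
```

Under these assumptions, the unsorted input succeeds with the fanout model and fails with the clustered one, which mirrors the first two rows of the table when distribution and ordering are skipped.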
   
   Patch file with testcase:
   
[Non-identity_partitioning_broken_without_fanout_writer.patch.zip](https://github.com/apache/iceberg/files/9414772/Non-identity_partitioning_broken_without_fanout_writer.patch.zip)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

