[PR] fix: Iceberg warehouse path mismatch between Python and Java/Scala catalogs [texera]

via GitHub Fri, 17 Apr 2026 21:48:07 -0700


aglinxinyuan opened a new pull request, #4409:
URL: https://github.com/apache/texera/pull/4409


   ### What changes were proposed in this PR?
   Iceberg tables created via the Python API could not be read back on the Java/Scala side because the two runtimes registered the Postgres JDBC catalog with different `warehouse` values, and PyIceberg persists that value into the table metadata.
   
   The Scala side passes the warehouse path to its JdbcCatalog unchanged, but the Python side (`create_postgres_catalog` in `amber/src/main/python/core/storage/iceberg/iceberg_utils.py`) was prefixing that same path with `file://`. Tables created by Python UDFs were therefore registered under `file:///...`, while Scala-side lookups expected the un-prefixed path.
   
   This caused subsequent reads of Python-written Iceberg tables to fail, because the metadata pointer carried a wrong or unresolvable warehouse path.
   
   Drop the file:// prefix in create_postgres_catalog so Python matches the 
Scala catalog's warehouse value exactly. PyIceberg accepts a plain local path 
here and will treat it as a local filesystem warehouse, consistent with the 
Scala JdbcCatalog configuration.
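A minimal sketch of the corrected property construction, assuming the Python catalog is built from a properties dict with `uri` and `warehouse` keys, as PyIceberg's `SqlCatalog` accepts. The function name `build_catalog_properties` and the connection URI are hypothetical; the actual change lives in `create_postgres_catalog`. The `effective_scheme` check only approximates "a location without a scheme is a local path" and is not PyIceberg's implementation.

```python
from urllib.parse import urlparse


def build_catalog_properties(jdbc_uri: str, warehouse_path: str) -> dict[str, str]:
    """Hypothetical stand-in for the properties create_postgres_catalog passes on."""
    return {
        "uri": jdbc_uri,
        # After the fix: pass the warehouse path through unchanged (no "file://"),
        # so it is identical to the value the Scala JdbcCatalog is configured with.
        "warehouse": warehouse_path,
    }


def effective_scheme(location: str) -> str:
    """Approximation: a location with no URI scheme is treated as a local file path."""
    return urlparse(location).scheme or "file"


scala_warehouse = "/opt/texera/warehouse"
props = build_catalog_properties(
    "postgresql+psycopg2://user:pass@localhost:5432/texera_iceberg",  # hypothetical URI
    scala_warehouse,
)

print(props["warehouse"] == scala_warehouse)  # True
print(effective_scheme(props["warehouse"]))   # file
```

This simplified check does not handle Windows drive letters: `urlparse("C:/warehouse").scheme` is `"c"`, so a path that keeps its drive colon would not be classified as local here.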
   
   ### Any related issues, documentation, discussions?
   Closes #4408
   
   ### How was this PR tested?
   1. Create an Iceberg table from a Python UDF operator and confirm it can be 
read back from the Scala/Java engine in the same workflow.
   2. Re-run existing Iceberg-backed workflows (Python-write → Python-read and 
Python-write → Scala-read) and confirm no regressions.
   3. Verify on Windows that the warehouse path passed in (with colon stripped) 
still resolves correctly from Python.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   No.


