[PR] Manual deduction of partitions [iceberg-python]

via GitHub Fri, 28 Feb 2025 04:57:52 -0800


afiodorov opened a new pull request, #1743:
URL: https://github.com/apache/iceberg-python/pull/1743


   I want to be able to a) add files that are partitioned by a Hive-style path 
convention, e.g. s3://bucket/table/year=2025/month=12, and
   b) add files even if they have extra columns, without having to migrate the 
table.
   
   This comes from a common pattern: having existing Hive tables and needing 
to migrate them to Iceberg.
   
   I propose we achieve this as follows:
   
   `
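   import re
   from pyiceberg.typedef import Record
   
   # Matches Hive-style key=value path segments, e.g. year=2025
   pattern = re.compile(r"([^/]+)=([^/]+)")
   
   def deduct_partition(path: str) -> Record:
       # Build a partition Record from the key=value segments in the path
       return Record(**dict(pattern.findall(path)))
   
   # partition_deductor is the parameter proposed in this PR
   table.add_files(['s3://bucket/table/year=2025/month=12/file.parquet'],
                   check_schema=False, partition_deductor=deduct_partition)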
   `
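
   The regex above returns partition values as strings, and it would also 
match a key=value pair inside the file name itself. A minimal standalone 
sketch of a stricter parser is shown below. The helper name 
`parse_hive_partitions` and its `types` argument are hypothetical and are 
not part of pyiceberg or of this PR; the sketch returns a plain dict and 
leaves building the partition Record to the caller.

```python
import re
from urllib.parse import unquote

# Same key=value pattern as in the proposal above.
PARTITION_SEGMENT = re.compile(r"([^/]+)=([^/]+)")


def parse_hive_partitions(path: str, types: dict) -> dict:
    """Parse Hive-style key=value directory segments into typed values.

    `types` maps each expected partition name to a callable (e.g. int)
    that converts the raw string. Hypothetical helper, for illustration.
    """
    # Drop the file name so an '=' inside it is not read as a partition.
    directory = path.rsplit("/", 1)[0]

    values = {}
    for key, raw in PARTITION_SEGMENT.findall(directory):
        if key not in types:
            raise ValueError(f"unexpected partition field {key!r} in {path!r}")
        # Hive percent-encodes special characters in partition values.
        values[key] = types[key](unquote(raw))

    missing = set(types) - set(values)
    if missing:
        raise ValueError(f"missing partition fields {sorted(missing)} in {path!r}")
    return values


example = parse_hive_partitions(
    "s3://bucket/table/year=2025/month=12/file.parquet",
    {"year": int, "month": int},
)
```

   Failing loudly on missing or unexpected fields is a design choice in this 
sketch: a silently wrong partition value would be hard to detect after the 
files have been added.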


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

