[PR] patch: Parquet Column Names with "Special Characters" fix [iceberg-python]

via GitHub Fri, 27 Oct 2023 08:14:42 -0700


MarquisC opened a new pull request, #109:
URL: https://github.com/apache/iceberg-python/pull/109


   We're using PyIceberg to read Iceberg tables stored in S3 as Parquet. Our 
column names take the form `id:foo` and `diagnostic:bar`, using `:` as a 
delimiter that helps us do some programmatic maintenance on our side.
   
   In the Parquet files the column names are rewritten, in this case `:` -> 
`_x3A` (so `id:foo` is stored as `id_x3Afoo`), and on any attempt to scan or 
read the data the table schema doesn't match the physical column names that 
PyArrow sees.
   
   This first pass is a naive fix that I have tested and that works, but I'm 
looking for guidance on where you all want this logic to live, and I'm happy 
to move it there instead.
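
   To make the mismatch concrete, here is a minimal sketch of the name mapping described above. It is not the code in this PR and not PyIceberg API: `sanitize_column_name` and `physical_to_logical` are hypothetical helpers, and the rule that every character outside `[A-Za-z0-9_]` becomes `_x` plus its uppercase hex code point is an assumption generalized from the single `:` -> `_x3A` case observed here. Edge cases such as a leading digit are not handled.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Hypothetical helper: escape characters the way the PR observed.

    Assumption: each character outside [A-Za-z0-9_] is replaced by
    `_x` followed by its uppercase hex code point (':' -> '_x3A').
    """
    return re.sub(
        r"[^A-Za-z0-9_]",
        lambda match: f"_x{ord(match.group(0)):X}",
        name,
    )


def physical_to_logical(logical_names):
    """Map the escaped (physical) column names back to the schema names."""
    return {sanitize_column_name(name): name for name in logical_names}


# Schema names from the PR description, plus one that needs no escaping.
mapping = physical_to_logical(["id:foo", "diagnostic:bar", "plain"])
```

   With a mapping like this, the reader could look up each schema field by its escaped name in the file, or rename the physical columns back to the schema names after reading. Where that lookup belongs in PyIceberg is the open question of this PR.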


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

