I am working on a Spark project and came across an interesting convention that Spark uses (it was already known from Hive):
https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#partition-discovery

In Spark, if I partition a dataset, the partition columns do not exist in the Parquet schema, and hence not in the final data files; the partition information has to be extracted from the directory path.
This does not work well when I pass a list of files to Spark instead of a directory path.
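For illustration, the path-based convention means partition values can only be recovered by parsing `key=value` directory segments. A minimal sketch of that extraction (not Spark's actual implementation, and the path layout below is hypothetical):

```python
def partition_values(path: str) -> dict:
    """Extract Hive-style key=value partition columns from a file path.

    Sketch only: assumes a layout like
    /data/events/year=2024/month=01/part-0000.parquet
    """
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

print(partition_values("/data/events/year=2024/month=01/part-0000.parquet"))
# {'year': '2024', 'month': '01'}
```

This also shows why passing bare file names breaks the scheme: if the paths handed to the reader lack the `key=value` segments, the partition values are simply unrecoverable.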

What is the behaviour in Iceberg? Does it store partition columns in the final Parquet file, or is the behaviour the same as in Spark, where partition columns are only part of the metadata and not the actual file?

(P.S. I am aware of Iceberg's metadata implementation, but I need some pointers to find out whether partition columns are stored in the file vs. the metadata.)

--
Thanks
