Iceberg stores all table columns in the underlying data files. It does not store derived partition values in the data files. If you're partitioning by date(ts), it won't store that date ordinal. If you're partitioning by identity(date_col), it will store date_col.
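To make the distinction concrete, here is a toy Python model of a single write. It is only a sketch and not Iceberg's API: `write_file`, `day_transform`, the `spec` dict, and the field names `ts_day`, `id`, `ts`, and `date_col` are all made up for illustration, and it assumes every row passed in belongs to the same partition.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01: the ordinal a day-granularity transform derives."""
    return (ts.date() - EPOCH).days


def write_file(rows, partition_fields):
    """Toy model of one data file plus the partition tuple kept in metadata.

    rows: list of dicts, each holding every table column.
    partition_fields: {partition_name: (source_column, transform_fn)}.
    Assumes all rows fall into the same partition.
    """
    # The data file keeps every table column and nothing else.
    data_file_rows = [dict(row) for row in rows]
    # Partition values are computed once per file and kept in metadata only.
    first = rows[0]
    partition = {
        name: transform(first[source])
        for name, (source, transform) in partition_fields.items()
    }
    return data_file_rows, partition


rows = [
    {"id": 1, "ts": datetime(2019, 4, 18, 8, 47), "date_col": date(2019, 4, 18)},
    {"id": 2, "ts": datetime(2019, 4, 18, 9, 5), "date_col": date(2019, 4, 18)},
]

# Hypothetical spec: one derived field and one identity field.
spec = {
    "ts_day": ("ts", day_transform),
    "date_col": ("date_col", lambda value: value),
}

data_file_rows, partition = write_file(rows, spec)
print(sorted(data_file_rows[0]))  # columns present in the data file
print(partition)                  # values present only in metadata
```

In this model the derived `ts_day` value appears only in the partition tuple, while `date_col` appears in both places because it is an ordinary table column.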
When reading data, Iceberg uses the values from the manifest for identity partition columns, which avoids the extra work of materializing the same value for every row.

On Thu, Apr 18, 2019 at 8:47 AM suds <sudssf2...@gmail.com> wrote:

> I am working on a Spark project and came across an interesting convention
> that Spark uses (it was already known in Hive).
>
> https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#partition-discovery
>
> In Spark, if I partition a dataset, the partition columns do not exist in
> the Parquet schema and hence not in the final data file. Partition
> information has to be extracted from the path.
> This does not work well when I pass a list of files to Spark instead of a
> path.
>
> What is the behaviour in Iceberg? Does it store partition columns in the
> final Parquet file, or is the behaviour the same as Spark, where partition
> columns are only part of the metadata and not of the actual file?
>
> (P.S. I am aware of the Iceberg metadata implementation, but I need some
> pointers to find out whether partition columns are stored in the file or
> in the metadata.)
>
> --
> Thanks

--
Ryan Blue
Software Engineer
Netflix