Hi
I have data in HDFS partitioned by a logical key, and I would like to preserve
that partitioning when creating a DataFrame from it. Is it possible to create
a DataFrame that preserves the partitioning from HDFS or from the underlying
RDD?
Regards
Deenar
If you load data using ORC or Parquet, the RDD will have one partition per
file, so your DataFrame will not directly match the partitioning of the
table.
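As a rough illustration of that point, here is a toy model in plain Python (not Spark code) of a reader that creates one partition per file, as described above. The directory layout, file names and rows are hypothetical. Real Spark readers may also split large files or pack small ones together, depending on version and configuration.

```python
from collections import defaultdict

# Hypothetical layout of a table partitioned by a logical key: one directory
# per key value, possibly several files per directory. Illustrative data only.
TABLE = {
    "/data/events/key=a/part-0000.parquet": [("a", 1), ("a", 2)],
    "/data/events/key=a/part-0001.parquet": [("a", 3)],
    "/data/events/key=b/part-0000.parquet": [("b", 4), ("b", 5)],
}

def load_one_partition_per_file(table):
    """Toy reader: each file becomes exactly one partition (ordered by path)."""
    return [rows for _, rows in sorted(table.items())]

def keys_per_partition(partitions):
    """List the distinct logical keys found in each partition."""
    return [sorted({k for k, _ in part}) for part in partitions]

def partitions_per_key(partitions):
    """Map each logical key to the indices of the partitions holding it."""
    spread = defaultdict(list)
    for idx, part in enumerate(partitions):
        for k in {k for k, _ in part}:
            spread[k].append(idx)
    return dict(spread)

partitions = load_one_partition_per_file(TABLE)
print(len(partitions))                 # 3 partitions for 2 logical keys
print(partitions_per_key(partitions))  # {'a': [0, 1], 'b': [2]}
```

The partition count follows the number of files, not the number of keys, so key `a` ends up spread over two partitions.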
If you want to process the data partition by partition and guarantee that the
partitioning is preserved, then mapPartitions and similar operations will be
useful.
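A minimal sketch of what per-partition processing means, again as a plain-Python toy model rather than Spark code. `map_partitions` below is a hypothetical stand-in for Spark's RDD `mapPartitions`: it applies a function to each partition's iterator, so output partition i is built only from input partition i and nothing moves between partitions. (On a real RDD, `mapPartitions` also accepts a `preservesPartitioning` flag, which declares whether the function leaves keys unchanged so the existing partitioner can be kept.)

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]

def map_partitions(partitions: List[List[Row]],
                   f: Callable[[Iterator[Row]], Iterable]) -> List[list]:
    """Hypothetical stand-in for RDD.mapPartitions: f sees one partition's
    iterator, and its output becomes the partition at the same index."""
    return [list(f(iter(part))) for part in partitions]

def sum_by_key(rows: Iterator[Row]) -> Iterator[Row]:
    """Aggregate values per key within a single partition only."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return iter(sorted(totals.items()))

# Partitions as a one-partition-per-file load might produce them (illustrative).
partitions = [[("a", 1), ("a", 2)], [("a", 3)], [("b", 4), ("b", 5)]]
result = map_partitions(partitions, sum_by_key)
print(result)  # [[('a', 3)], [('a', 3)], [('b', 9)]]
```

Key `a` still appears in two output partitions: per-partition processing keeps the existing boundaries but does not combine rows for a key that was split across files.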
Note that if you perform any