You can specify the first folder directly and read it.

On Fri, 27 Apr 2018 at 9:42 pm, Walid LEZZAR <walez...@gmail.com> wrote:
> Hi,
>
> I have a parquet on S3 partitioned by day. I have 2 years of data
> (about 1000 partitions). With Spark, when I just want to know the
> schema of this parquet, without even asking for a single row of data,
> Spark tries to list all the partitions and nested partitions of the
> parquet, which makes it very slow just to build the DataFrame object
> in Zeppelin.
>
> Is there a way to avoid that? Is there a way to tell Spark: "hey, just
> read a single partition, give me the schema of that partition, and
> consider it as the schema of the whole dataframe"? (I don't care about
> schema merge; it's off, by the way.)
>
> Thanks.
> Walid.

--
Best Regards,
Ayan Guha