Excellent! Where can I find the code, pull request, and Spark ticket where this was introduced?
Thanks,

-Matt Cheah

From: Reynold Xin <r...@databricks.com>
Date: Monday, June 1, 2015 at 10:25 PM
To: Matt Cheah <mch...@palantir.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Mingyu Kim <m...@palantir.com>, Andrew Ash <a...@palantir.com>
Subject: Re: [SQL] Write parquet files under partition directories?

There will be in 1.4.

df.write.partitionBy("year", "month", "day").parquet("/path/to/output")

On Mon, Jun 1, 2015 at 10:21 PM, Matt Cheah <mch...@palantir.com> wrote:
> Hi there,
>
> I noticed in the latest Spark SQL programming guide
> <https://spark.apache.org/docs/latest/sql-programming-guide.html>, there is
> support for optimized reading of partitioned Parquet files that have a
> particular directory structure (year=1/month=10/day=3, for example).
> However, I see no analogous way to write DataFrames as Parquet files with
> similar directory structures based on user-provided partitioning.
>
> Generally, is it possible to write DataFrames as partitioned Parquet files
> that downstream partition discovery can take advantage of later? I
> considered extending the Parquet output format, but it looks like
> ParquetTableOperations.scala has fixed the output format to
> AppendingParquetOutputFormat.
>
> Also, I was wondering if it would be valuable to contribute writing Parquet
> in partition directories as a PR.
>
> Thanks,
>
> -Matt Cheah
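For anyone following along: the directory layout that partition discovery reads, and that the new `partitionBy` writer produces, is the Hive-style one mentioned above (year=1/month=10/day=3). Here is a minimal pure-Python sketch of how such paths are formed, one directory level per partition column; it does not use Spark, and the names (`partition_path`, `rows`) are purely illustrative:

```python
from pathlib import PurePosixPath

def partition_path(base, partitions, row):
    """Return the directory a row lands in: one key=value level per partition column."""
    parts = [f"{col}={row[col]}" for col in partitions]
    return PurePosixPath(base, *parts)

# Hypothetical rows; partitionBy("year", "month", "day") groups rows by these columns.
rows = [
    {"year": 2015, "month": 6, "day": 1, "value": 42},
    {"year": 2015, "month": 6, "day": 2, "value": 7},
]
for row in rows:
    print(partition_path("/path/to/output", ["year", "month", "day"], row))
# → /path/to/output/year=2015/month=6/day=1
#   /path/to/output/year=2015/month=6/day=2
```

Rows sharing the same partition-column values land in the same directory, which is what lets a downstream reader prune whole directories when filtering on those columns.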