Excellent! Where can I find the code, pull request, and Spark ticket where this was introduced?
Thanks,

-Matt Cheah

From: Reynold Xin <r...@databricks.com>
Date: Monday, June 1, 2015 at 10:25 PM
To: Matt Cheah <mch...@palantir.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Mingyu Kim <m...@palantir.com>, Andrew Ash <a...@palantir.com>
Subject: Re: [SQL] Write parquet files under partition directories?

There will be in 1.4.

df.write.partitionBy("year", "month", "day").parquet("/path/to/output")

On Mon, Jun 1, 2015 at 10:21 PM, Matt Cheah <mch...@palantir.com> wrote:
> Hi there,
>
> I noticed in the latest Spark SQL programming guide
> <https://spark.apache.org/docs/latest/sql-programming-guide.html>, there is
> support for optimized reading of partitioned Parquet files that have a
> particular directory structure (year=1/month=10/day=3, for example).
> However, I see no analogous way to write DataFrames as Parquet files with
> similar directory structures based on user-provided partitioning.
>
> Generally, is it possible to write DataFrames as partitioned Parquet files
> that downstream partition discovery can take advantage of later? I
> considered extending the Parquet output format, but it looks like
> ParquetTableOperations.scala has fixed the output format to
> AppendingParquetOutputFormat.
>
> Also, I was wondering if it would be valuable to contribute writing Parquet
> in partition directories as a PR.
>
> Thanks,
>
> -Matt Cheah
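For anyone following along: the directory layout that partition discovery reads, and that the new `partitionBy` writer produces, is the Hive-style one mentioned above (year=1/month=10/day=3). Here is a minimal pure-Python sketch of how such paths are formed, one directory level per partition column; it does not use Spark, and the names (`partition_path`, `rows`) are purely illustrative:

```python
from pathlib import PurePosixPath

def partition_path(base, partitions, row):
    """Return the directory a row lands in: one key=value level per partition column."""
    parts = [f"{col}={row[col]}" for col in partitions]
    return PurePosixPath(base, *parts)

# Hypothetical rows; partitionBy("year", "month", "day") groups rows by these columns.
rows = [
    {"year": 2015, "month": 6, "day": 1, "value": 42},
    {"year": 2015, "month": 6, "day": 2, "value": 7},
]
for row in rows:
    print(partition_path("/path/to/output", ["year", "month", "day"], row))
# → /path/to/output/year=2015/month=6/day=1
#   /path/to/output/year=2015/month=6/day=2
```

Rows sharing the same partition-column values land in the same directory, which is what lets a downstream reader prune whole directories when filtering on those columns.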