[ 
https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345397#comment-16345397
 ] 

Fernando Pereira edited comment on SPARK-19256 at 1/30/18 5:16 PM:
-------------------------------------------------------------------

Thanks a lot for this great contribution to Spark.

I was just wondering: would it make sense to apply this to direct outputs (e.g. 
write.parquet()), so that we could keep the partitioning information and again 
avoid reshuffling data before a merge? I believe this is mostly what 
saveAsTable() does by default in Spark, but to my mind supporting it for direct 
outputs would improve the DataFrame write API and make these performance 
benefits more accessible.


> Hive bucketing support
> ----------------------
>
>                 Key: SPARK-19256
>                 URL: https://issues.apache.org/jira/browse/SPARK-19256
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Tejas Patil
>            Priority: Minor
>
> JIRA to track design discussions and tasks related to Hive bucketing support 
> in Spark.
> Proposal : 
> https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing


