[ 
https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989560#comment-15989560
 ] 

Tejas Patil commented on SPARK-19256:
-------------------------------------

[~cloud_fan], [~sameerag]: I was looking at trunk and noticed changes that 
would affect the plan (more from an implementation perspective than the 
high-level design). 

`InsertIntoHiveTable` is now a `RunnableCommand` (earlier it was a 
`UnaryExecNode`). With an exec node, it was possible to set the 
requiredDistribution and requiredOrdering and let the planner 
(`EnsureRequirements`) take care of managing things. With `RunnableCommand`, 
the model seems to be that these requirements have to be handled separately 
(so far there is only one place which does that: [0]). Two comments:
- I feel that this is somewhat ugly, as one would expect `EnsureRequirements` 
to be the single place for handling this.
- We might miss out on optimizations. E.g. if I add an extra shuffle in 
`InsertIntoHiveTable` and the previous node is a shuffle as well, the code 
for merging these two shuffle nodes into a single shuffle would have to be 
duplicated from `EnsureRequirements`.
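
To make the two points above concrete, here is a minimal, self-contained 
sketch. It is a toy model, not Spark's actual classes: `Plan`, `Unary`, 
`Scan`, `Shuffle`, `BucketedInsert`, `addShuffle` and `ensureRequirements` 
are all hypothetical names chosen for illustration, and clustering is 
simplified to a list of column names. It assumes the write is modelled as a 
node that declares what it needs from its child, so that one rule both 
inserts the shuffle and collapses it with a shuffle directly below.

```scala
// Toy model only: none of these types are Spark's real classes.
object EnsureRequirementsSketch {
  sealed trait Plan {
    // Columns the node's output is hash-partitioned on, if known.
    def outputClustering: Option[Seq[String]]
  }

  sealed trait Unary extends Plan {
    def child: Plan
    // Clustering the node needs from its child; None means no requirement.
    def requiredChildClustering: Option[Seq[String]] = None
    def withChild(newChild: Plan): Plan
  }

  final case class Scan(table: String) extends Plan {
    def outputClustering: Option[Seq[String]] = None
  }

  final case class Shuffle(keys: Seq[String], child: Plan) extends Unary {
    def outputClustering: Option[Seq[String]] = Some(keys)
    def withChild(newChild: Plan): Plan = copy(child = newChild)
  }

  // The write declares its requirement instead of shuffling by itself.
  final case class BucketedInsert(bucketCols: Seq[String], child: Plan)
      extends Unary {
    def outputClustering: Option[Seq[String]] = None
    override def requiredChildClustering: Option[Seq[String]] =
      Some(bucketCols)
    def withChild(newChild: Plan): Plan = copy(child = newChild)
  }

  // Adding a shuffle on top of a shuffle keeps only the outer one,
  // because the inner repartitioning would be overwritten anyway.
  private def addShuffle(keys: Seq[String], child: Plan): Plan =
    child match {
      case Shuffle(_, grandChild) => Shuffle(keys, grandChild)
      case other                  => Shuffle(keys, other)
    }

  // Single place that satisfies every node's declared requirement.
  def ensureRequirements(plan: Plan): Plan = plan match {
    case leaf: Scan => leaf
    case node: Unary =>
      val planned = ensureRequirements(node.child)
      val fixed = node.requiredChildClustering match {
        case Some(keys) if !planned.outputClustering.contains(keys) =>
          addShuffle(keys, planned)
        case _ => planned
      }
      node.withChild(fixed)
  }
}
```

In this sketch, planning `BucketedInsert(Seq("id"), Shuffle(Seq("name"), 
Scan("t")))` leaves a single shuffle on `id` under the insert. If the insert 
added its own shuffle outside such a rule, the collapsing logic in 
`addShuffle` would have to be repeated there.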

Would it be OK to make `InsertIntoHiveTable` a `UnaryExecNode`? 
Why was it made a `RunnableCommand` recently 
(https://github.com/apache/spark/pull/16517)? cc [~smilegator]

[0] : 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L173

> Hive bucketing support
> ----------------------
>
>                 Key: SPARK-19256
>                 URL: https://issues.apache.org/jira/browse/SPARK-19256
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Tejas Patil
>            Priority: Minor
>
> JIRA to track design discussions and tasks related to Hive bucketing support 
> in Spark.
> Proposal : 
> https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

