[ 
https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009977#comment-16009977
 ] 

Tejas Patil commented on SPARK-20703:
-------------------------------------

[~viirya] : 
- Would this new operator be a physical plan node, i.e. a `SparkPlan`? One 
limitation of the current approach of using `RunnableCommand` is that it does 
not allow defining partitioning and sorting requirements for the child nodes. I 
have a local WIP patch that changes this for Hive insertions (as per [0], I 
needed it for Hive bucketing support), but it seems like your work will be a 
superset of that.
- For metrics: the size of the data written out (compressed and uncompressed) 
and the number of files written out would be valuable. I agree that not all 
implementations would provide this data (however, the number of files seems 
like low-hanging fruit).
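
To make the two points above concrete, here is a minimal, self-contained sketch in plain Python. It is not Spark code and not the proposed design: `WriteNode`, `required_child_ordering`, and the metric names are hypothetical, chosen only for illustration. In Spark itself, `SparkPlan` exposes `requiredChildDistribution` and `requiredChildOrdering` for this purpose. The sketch shows a write node that declares the ordering it needs from its child and records the number of files, the number of rows, and an approximate byte count.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def _empty_metrics() -> Dict[str, int]:
    # Hypothetical metric names, assumed for this sketch.
    return {"num_files": 0, "num_rows": 0, "bytes_written": 0}


@dataclass
class WriteNode:
    """Toy write operator (hypothetical; not a Spark class)."""
    partition_cols: Sequence[str]  # rows with equal values land in the same "file"
    sort_cols: Sequence[str]       # ordering required within each "file"
    metrics: Dict[str, int] = field(default_factory=_empty_metrics)

    def required_child_ordering(self) -> List[str]:
        # The requirement the node declares about its input:
        # partition columns first, then the in-file sort columns.
        return list(self.partition_cols) + list(self.sort_cols)

    def execute(self, child_rows: Sequence[Row]) -> Dict[Tuple, List[Row]]:
        # A real planner would insert a sort below the node to satisfy the
        # declared requirement; this sketch sorts inline as a stand-in.
        order = self.required_child_ordering()
        ordered = sorted(child_rows, key=lambda r: tuple(r[c] for c in order))

        # Group rows into one in-memory "file" per partition value.
        files: Dict[Tuple, List[Row]] = {}
        for row in ordered:
            part = tuple(row[c] for c in self.partition_cols)
            files.setdefault(part, []).append(row)

        # Record write metrics; bytes are approximated by the repr length.
        for rows in files.values():
            self.metrics["num_files"] += 1
            self.metrics["num_rows"] += len(rows)
            self.metrics["bytes_written"] += sum(len(repr(r)) for r in rows)
        return files
```

Because the requirement is declared on the node rather than hidden inside a command, a planner could inspect it and add the sort only when the child does not already provide that ordering.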

[0]: 
https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=15990618&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15990618

> Add an operator for writing data out
> ------------------------------------
>
>                 Key: SPARK-20703
>                 URL: https://issues.apache.org/jira/browse/SPARK-20703
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>
> We should add an operator for writing data out. Right now in the explain plan 
> / UI there is no way to tell whether a query is writing data out, and also 
> there is no way to associate metrics with data writes. It'd be tremendously 
> valuable to do this for adding metrics and for visibility.



