[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009977#comment-16009977 ]
Tejas Patil commented on SPARK-20703:
-------------------------------------

[~viirya] :
- Would this new operator be a physical plan node, i.e. a `SparkPlan`? One limitation of the current approach of using `RunnableCommand` is that it does not allow defining partitioning and sorting requirements for the child nodes. I have a local WIP patch that changes this for Hive insertions (as per [0], I needed it for Hive bucketing support), but it seems like your work will be a superset of that.
- For metrics: the size of the data written out (compressed and uncompressed) and the number of files written out could be of good value. I agree that not all implementations would provide this data (however, the number of files seems like low-hanging fruit).

[0]: https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=15990618&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15990618

> Add an operator for writing data out
> ------------------------------------
>
>                 Key: SPARK-20703
>                 URL: https://issues.apache.org/jira/browse/SPARK-20703
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>
> We should add an operator for writing data out. Right now in the explain plan
> / UI there is no way to tell whether a query is writing data out, and also
> there is no way to associate metrics with data writes. It'd be tremendously
> valuable to do this for adding metrics and for visibility.