[ 
https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009966#comment-16009966
 ] 

Liang-Chi Hsieh commented on SPARK-20703:
-----------------------------------------

I've prototyped this locally.

Currently I wrap the query being written out in a new operator and set only 
the outputNumRows metric as the data is pulled out to be written. I am 
wondering what other metrics this new operator should have.
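
To make the wrapping idea concrete, here is a rough sketch as a toy model in
plain Python. It is not Spark code: WriteOperator, Metric and write_out are
hypothetical names used only for illustration, and the only metric modelled
is outputNumRows, incremented as the writer pulls each row.

```python
from typing import Iterable, Iterator, List


class Metric:
    """A trivial counter standing in for a SQL metric (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0

    def add(self, n: int = 1) -> None:
        self.value += n


class WriteOperator:
    """Hypothetical pass-through operator wrapping the query being written."""

    def __init__(self, child: Iterable[tuple]) -> None:
        self.child = child
        self.metrics = {"outputNumRows": Metric("outputNumRows")}

    def execute(self) -> Iterator[tuple]:
        counter = self.metrics["outputNumRows"]
        for row in self.child:
            # Counted only when the writer actually pulls the row.
            counter.add(1)
            yield row


def write_out(plan: WriteOperator, sink: List[tuple]) -> None:
    """Stand-in for a writer that consumes the rows of the wrapped plan."""
    for row in plan.execute():
        sink.append(row)


rows = [(1, "a"), (2, "b"), (3, "c")]
op = WriteOperator(rows)
sink: List[tuple] = []
write_out(op, sink)
```

Because execute() is lazy, the metric stays at zero until the writer starts
consuming rows, which matches setting it "when the data is pulled out".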

We have several RunnableCommand classes for writing data out.

For file-based relations, FileFormatWriter is used to write data out. We pass 
in a QueryExecution for the query being written out. We can track the write 
action on the executed plan of this QueryExecution. 

For datasource relations, the logic of writing data out is delegated to the 
datasource implementations. We just pass a DataFrame to the write API. The 
above approach to tracking the write action will fail for datasource 
implementations that create a new DataFrame. JdbcRelationProvider is one 
example: it may create a new DataFrame by repartitioning the original 
DataFrame. In this case, we can't track the write action because the executed 
plan is now a different one.
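
The failure mode can be sketched with another toy model in plain Python.
Again this is not Spark code: Plan, DataFrame, WriteTracker and the two
write functions are hypothetical stand-ins, and the sketch assumes the
tracker recognizes a write only by the identity of the plan it was given
up front.

```python
class Plan:
    """Stand-in for an executed plan; identity, not content, is what matters."""

    def __init__(self, description: str) -> None:
        self.description = description


class DataFrame:
    """Toy DataFrame that owns exactly one plan object."""

    def __init__(self, plan: Plan) -> None:
        self.plan = plan

    def repartition(self, n: int) -> "DataFrame":
        # A transformation yields a new DataFrame with a new plan object.
        return DataFrame(Plan(f"Repartition({n}, {self.plan.description})"))


class WriteTracker:
    """Hypothetical tracker that only recognizes plans registered up front."""

    def __init__(self) -> None:
        self._tracked = []

    def track(self, plan: Plan) -> None:
        self._tracked.append(plan)

    def is_tracked(self, plan: Plan) -> bool:
        return any(p is plan for p in self._tracked)


def file_style_write(df: DataFrame, tracker: WriteTracker) -> bool:
    # Executes the plan it was handed, so the tracker sees the write.
    return tracker.is_tracked(df.plan)


def jdbc_style_write(df: DataFrame, tracker: WriteTracker, n: int) -> bool:
    # Repartitions first, so a different plan object is the one executed.
    repartitioned = df.repartition(n)
    return tracker.is_tracked(repartitioned.plan)


df = DataFrame(Plan("Scan(t)"))
tracker = WriteTracker()
tracker.track(df.plan)

file_write_seen = file_style_write(df, tracker)
jdbc_write_seen = jdbc_style_write(df, tracker, 4)
```

Here file_write_seen is True and jdbc_write_seen is False: the second writer
executes a plan the tracker never registered, so the write goes unobserved.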

[~rxin] Do you have any suggestions?




> Add an operator for writing data out
> ------------------------------------
>
>                 Key: SPARK-20703
>                 URL: https://issues.apache.org/jira/browse/SPARK-20703
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>
> We should add an operator for writing data out. Right now in the explain plan 
> / UI there is no way to tell whether a query is writing data out, and also 
> there is no way to associate metrics with data writes. It'd be tremendously 
> valuable to do this for adding metrics and for visibility.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

