[ https://issues.apache.org/jira/browse/SPARK-26225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735954#comment-16735954 ]
Wenchen Fan commented on SPARK-26225:
-------------------------------------

I think it's hard to define the decoding time, as every data source may have its own definition. For data source v1, I think we just need to update `RowDataSourceScanExec` and track the time of the unsafe projection that turns a Row into an InternalRow. For data source v2, it's totally different: Spark needs to ask the data source to report the decoding time (or any other metrics). I'd like to defer it until after the data source v2 metrics API is introduced.

> Scan: track decoding time for row-based data sources
> ----------------------------------------------------
>
>                 Key: SPARK-26225
>                 URL: https://issues.apache.org/jira/browse/SPARK-26225
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Reynold Xin
>            Priority: Major
>
> The Scan node should report decoding time for each record, if it does not add too much overhead.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
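The v1 approach described above (timing the projection that converts each source Row into an InternalRow) can be sketched as a wrapper iterator that accumulates the per-record conversion cost. This is a minimal illustration, not Spark's actual implementation: the class name `TimedDecodeIterator` and the plain `Function` in place of Spark's `UnsafeProjection` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: wraps a row iterator so that only the decode step
// (Row -> InternalRow in Spark terms) is timed, accumulating nanoseconds.
public class TimedDecodeIterator<A, B> implements Iterator<B> {
    private final Iterator<A> underlying;
    private final Function<A, B> decode; // stand-in for an UnsafeProjection
    private long totalNanos = 0L;        // accumulated decoding time

    public TimedDecodeIterator(Iterator<A> underlying, Function<A, B> decode) {
        this.underlying = underlying;
        this.decode = decode;
    }

    @Override public boolean hasNext() { return underlying.hasNext(); }

    @Override public B next() {
        A row = underlying.next();       // fetch time is NOT counted
        long start = System.nanoTime();
        B out = decode.apply(row);       // only the conversion is timed
        totalNanos += System.nanoTime() - start;
        return out;
    }

    public long totalNanos() { return totalNanos; }

    public static void main(String[] args) {
        Iterator<String> rows = List.of("1", "2", "3").iterator();
        TimedDecodeIterator<String, Integer> it =
            new TimedDecodeIterator<>(rows, Integer::parseInt);
        List<Integer> decoded = new ArrayList<>();
        it.forEachRemaining(decoded::add);
        System.out.println(decoded);
    }
}
```

In a real scan node the accumulated total would be flushed into a SQL metric so it shows up in the UI, rather than read directly; the point of the wrapper is that fetching the next row from the source is excluded from the measurement.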