[ 
https://issues.apache.org/jira/browse/SPARK-23247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-23247:
-----------------------------------

    Assignee: caoxuewen

> combines Unsafe operations and statistics operations in Scan Data Source
> ------------------------------------------------------------------------
>
>                 Key: SPARK-23247
>                 URL: https://issues.apache.org/jira/browse/SPARK-23247
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: caoxuewen
>            Assignee: caoxuewen
>            Priority: Major
>
> Currently, when scanning a data source, the execution plan first applies the
> unsafe projection to each row and then applies a second map over the
> resulting rows just to count them. This second step is unnecessary overhead.
> This PR combines the two operations, updating the row count while performing
> the unsafe projection.
> *Before the change:*
>     val unsafeRow = rdd.mapPartitionsWithIndexInternal { (index, iter) =>
>       val proj = UnsafeProjection.create(schema)
>       proj.initialize(index)
>       iter.map(proj)
>     }
>     val numOutputRows = longMetric("numOutputRows")
>     unsafeRow.map { r =>
>       numOutputRows += 1
>       r
>     }
> *After the change:*
>     val numOutputRows = longMetric("numOutputRows")
>     rdd.mapPartitionsWithIndexInternal { (index, iter) =>
>       val proj = UnsafeProjection.create(schema)
>       proj.initialize(index)
>       iter.map( r => {
>         numOutputRows += 1
>         proj(r)
>       })
>     }
>  
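
The pattern described above can be tried without a Spark dependency. The following is a minimal plain-Scala sketch, not the code from the PR: `Counter` and `project` are hypothetical stand-ins for the `numOutputRows` metric and the `UnsafeProjection`, and the object and method names are made up for illustration. It only shows that counting inside the projecting map gives the same rows and the same count as a separate counting map.

```scala
// Plain-Scala sketch of the before/after pattern; no Spark involved.
// Counter and project are hypothetical stand-ins, not Spark APIs.
object CombinedScanSketch {

  // Stand-in for the numOutputRows metric: a mutable long counter.
  final class Counter {
    var value: Long = 0L
    def +=(n: Long): Unit = value = value + n
  }

  // Stand-in for the unsafe projection: converts a row to another form.
  def project(row: Int): String = s"row-$row"

  // Before: one map to project, then a second map only to count.
  def twoStage(rows: Iterator[Int], numOutputRows: Counter): Iterator[String] = {
    val projected = rows.map(project)
    projected.map { r =>
      numOutputRows += 1
      r
    }
  }

  // After: a single map that counts and projects each row.
  def combined(rows: Iterator[Int], numOutputRows: Counter): Iterator[String] =
    rows.map { r =>
      numOutputRows += 1
      project(r)
    }

  def main(args: Array[String]): Unit = {
    val before = new Counter
    val after = new Counter
    val a = twoStage(Iterator(1, 2, 3), before).toList
    val b = combined(Iterator(1, 2, 3), after).toList
    // Both variants yield the same rows and the same count.
    assert(a == b)
    assert(before.value == after.value)
    println(s"rows=${b.size}, counted=${after.value}")
  }
}
```

Because Scala iterators are lazy, the counter in either variant advances only as rows are consumed; in this sketch `toList` forces the traversal.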



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
