[ https://issues.apache.org/jira/browse/SPARK-34015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-34015:
---------------------------------
    Fix Version/s: 3.1.1
                   3.2.0

> SparkR partition timing summary reports input time correctly
> ------------------------------------------------------------
>
>                 Key: SPARK-34015
>                 URL: https://issues.apache.org/jira/browse/SPARK-34015
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.3.2, 3.0.1
>         Environment: Observed on CentOS-7 running Spark 2.3.1 and on my Mac
> running master
>            Reporter: Tom Howland
>            Priority: Major
>             Fix For: 3.2.0, 3.1.1
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> When SparkR is run at log level INFO, a summary of how the worker spent its
> time processing the partition is printed. A logic error causes the summary to
> over-report the time spent reading input rows.
> In detail: the variable {{inputElap}} is used in the enclosing scope to mark
> the beginning of reading rows, but the code changed here reused it as a local
> variable marking the start of compute time, so time actually spent computing
> earlier groups can be misattributed to reading input. The error is therefore
> not observable when there is only one group per partition, which is what unit
> tests produce.
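> A minimal sketch of the buggy pattern ({{inputElap}}, {{computeElap}}, and
> {{computeInputElapsDiff}} follow the names above; the loop body and data are
> illustrative stand-ins, not the actual worker.R source):
> {code:r}
> elapsedSecs <- function() proc.time()[["elapsed"]]
>
> inputElap <- elapsedSecs()   # wider context: marks the start of reading rows
> data <- replicate(3, rnorm(1e6), simplify = FALSE)   # stand-in: 3 groups
>
> computeInputElapsDiff <- 0   # accumulated compute time
> for (d in data) {
>   inputElap <- elapsedSecs() # BUG: clobbers the read-input marker set above
>   output <- sum(d)           # stand-in for the per-group compute step
>   computeElap <- elapsedSecs()
>   computeInputElapsDiff <- computeInputElapsDiff + (computeElap - inputElap)
> }
> # Any later use of inputElap to derive read-input time now sees the start of
> # the last group's compute, not the start of reading, so earlier groups'
> # compute time is counted as read-input.
> {code}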
> For our application, here's what a log entry looks like before these changes 
> were applied:
> {{20/10/09 04:08:58 WARN RRunner: Times: boot = 0.013 s, init = 0.005 s, 
> broadcast = 0.000 s, read-input = 529.471 s, compute = 492.037 s, 
> write-output = 0.020 s, total = 1021.546 s}}
> This indicates that we were spending more time reading rows than operating
> on them.
> After these changes, it looks like this:
> {{20/12/15 06:43:29 WARN RRunner: Times: boot = 0.013 s, init = 0.010 s, 
> broadcast = 0.000 s, read-input = 120.275 s, compute = 1680.161 s, 
> write-output = 0.045 s, total = 1812.553 s}}
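> The fix, sketched under the same assumptions as above, simply gives the
> compute step its own start marker instead of reusing {{inputElap}} (the name
> {{computeStart}} is illustrative):
> {code:r}
> # Continues the sketch above (elapsedSecs, data as defined there).
> computeInputElapsDiff <- 0
> for (d in data) {
>   computeStart <- elapsedSecs()   # dedicated marker; inputElap stays intact
>   output <- sum(d)
>   computeElap <- elapsedSecs()
>   computeInputElapsDiff <- computeInputElapsDiff + (computeElap - computeStart)
> }
> {code}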



