Could you share your code? Are you sure your Spark 2.4 cluster had
indeed read anything? It looks like the Input Size field is empty under 2.4.
-- ND
On 6/27/20 7:58 PM, Sanjeev Mishra wrote:
I have a large number of JSON files that Spark 2.4 can read in 36 seconds,
but Spark 3.0 takes almost 33 minutes to read the same data. On closer
analysis, it looks like Spark 3.0 is choosing a different DAG than Spark
2.4. Does anyone have any idea what is going on? Is there any
configuration problem with Spark 3.0?
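For reference, a minimal sketch of this kind of read (the actual code is not
shown here; the path, app name, and default options below are assumptions,
with schema inference left to Spark on both clusters):

import org.apache.spark.sql.SparkSession

// Read a directory of JSON files with default options and count the records.
// "/path/to/json/files" is a placeholder; no explicit schema is supplied.
val spark = SparkSession.builder().appName("json-read-sketch").getOrCreate()
val df = spark.read.json("/path/to/json/files")
println(s"records: ${df.count()}")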
Here are the details:
*Spark 2.4*
Summary Metrics for 2203 Completed Tasks
Metric     Min      25th percentile   Median   75th percentile   Max
Duration   0.0 ms   0.0 ms            0.0 ms   1.0 ms            62.0 ms
GC Time    0.0 ms   0.0 ms            0.0 ms   0.0 ms            11.0 ms
Aggregated Metrics by Executor
Executor ID   Address          Task Time   Total Tasks   Failed Tasks   Killed Tasks   Succeeded Tasks   Blacklisted
driver        10.0.0.8:49159   36 s        2203          0              0              2203              false
*Spark 3.0*
Summary Metrics for 8 Completed Tasks
Metric                 Min                25th percentile    Median             75th percentile    Max
Duration               3.8 min            4.0 min            4.1 min            4.4 min            5.0 min
GC Time                3 s                3 s                3 s                4 s                4 s
Input Size / Records   15.6 MiB / 51028   16.2 MiB / 53303   16.8 MiB / 55259   17.8 MiB / 58148   20.2 MiB / 71624
Aggregated Metrics by Executor
Executor ID   Address          Task Time   Total Tasks   Failed Tasks   Killed Tasks   Succeeded Tasks   Blacklisted   Input Size / Records
driver        10.0.0.8:50224   33 min      8             0              0              8                 false         136.1 MiB / 451999
The DAG is also different (a sketch for dumping the plans as text follows
below the screenshots).
Spark 2.4 DAG: Screenshot 2020-06-27 16.30.26.png (attached)
Spark 3.0 DAG: Screenshot 2020-06-27 16.32.32.png (attached)
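To compare the two runs as text rather than screenshots, one option is to dump
the plan and the input partition count on each cluster (a sketch under the same
assumptions as above; the path is again a placeholder):

import org.apache.spark.sql.SparkSession

// Print the plan each Spark version picks for the same read, plus the number
// of input partitions, so the 2.4 and 3.0 DAGs can be diffed directly.
val spark = SparkSession.builder().appName("plan-compare-sketch").getOrCreate()
val df = spark.read.json("/path/to/json/files")
df.explain(true)   // parsed, analyzed, optimized and physical plans
println(s"input partitions: ${df.rdd.getNumPartitions}")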