[ 
https://issues.apache.org/jira/browse/SPARK-25409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618213#comment-16618213
 ] 

Rong Tang commented on SPARK-25409:
-----------------------------------

Created a pull request for it: [https://github.com/apache/spark/pull/22444]

 

> Speed up Spark History at start if there are tens of thousands of 
> applications.
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-25409
>                 URL: https://issues.apache.org/jira/browse/SPARK-25409
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Rong Tang
>            Priority: Major
>         Attachments: SPARK-25409.0001.patch
>
>
> We have a Spark history server that stores 7 days' worth of applications; it 
> usually holds 10K to 20K attempts.
> We found that it can take hours at startup to load and replay the logs in 
> the event-logs folder, so newly finished applications have to wait several 
> hours to be seen. I made 2 improvements to address this.
>  # We run Spark on YARN, so information about ongoing applications can also 
> be seen via the resource manager. I introduced a flag, 
> spark.history.fs.load.incomplete, that controls whether logs for incomplete 
> attempts are loaded.
>  # Incremental loading of applications. As noted above, we have more than 10K 
> applications stored, and loading all of them the first time can take hours. 
> I introduced a config, spark.history.fs.appRefreshNum, that sets how many 
> applications to load in each pass, so that between passes the server gets a 
> chance to check for the latest updates.
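
The two changes above can be pictured with a minimal Python sketch. This is not the code from the patch or from Spark's history provider: `EventLog`, `select_logs`, and `batches` are hypothetical names, and the parameters `load_incomplete` and `app_refresh_num` only mirror the intent of the proposed spark.history.fs.load.incomplete and spark.history.fs.appRefreshNum settings.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class EventLog:
    """Hypothetical stand-in for one application attempt's event log."""
    app_id: str
    completed: bool


def select_logs(logs: Iterable[EventLog], load_incomplete: bool) -> List[EventLog]:
    """Keep every log, or only completed attempts when load_incomplete is False."""
    return [log for log in logs if load_incomplete or log.completed]


def batches(logs: List[EventLog], app_refresh_num: int) -> Iterator[List[EventLog]]:
    """Yield at most app_refresh_num logs per pass.

    The caller can rescan the log directory between passes (an assumption
    about how the server would use the gap), so new logs are not stuck
    behind the full backlog.
    """
    if app_refresh_num <= 0:
        raise ValueError("app_refresh_num must be positive")
    for start in range(0, len(logs), app_refresh_num):
        yield logs[start:start + app_refresh_num]


if __name__ == "__main__":
    # Ten fake attempts; every fourth one is still in progress.
    logs = [EventLog(f"app-{i}", completed=(i % 4 != 0)) for i in range(10)]
    todo = select_logs(logs, load_incomplete=False)
    for batch in batches(todo, app_refresh_num=3):
        print([log.app_id for log in batch])
```

The point of the second function is that the cost of one pass is bounded by the batch size rather than by the size of the whole backlog.
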
> Here are the benchmarks I ran. Our system has 1K incomplete applications 
> (they were not cleaned up for some reason; that is a separate issue I need to 
> investigate), and an application's log can be gigabytes in size.
>  
> Not loading incomplete attempts:
> | |Load Count|Load incomplete apps|Total attempts|Time cost|Time increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|All|No|13K|31 minutes|Yes|
>  
>  
> Limiting how many to load each time:
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst cost|Time increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
> |2|3000|Yes|13K|42 minutes, except for the last 1.6K attempts, which took an extremely long 2.5 hours|No|
>  
>  
> Limiting how many to load each time, and not loading incomplete applications:
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst cost|Avg|Time increases with more attempts|
> |1 (current implementation)|All|Yes|13K|2 hours 14 minutes| |Yes|
> |2|3000|No|12K|17 minutes|10 minutes (41 minutes in total)|No|
>  
>  
> | |Load Count|Load incomplete apps|Total attempts|Worst cost|Avg|Time increases with more attempts|
> |1 (current implementation)|All|Yes|20K|1 hour 52 minutes| |Yes|
> |2|3000|No|18.5K|20 minutes|18 minutes (2 hours 18 minutes in total)|No|
>  
>  
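
To make the benchmark figures in the description easier to compare, here is a small Python sketch that converts the reported times to minutes and computes ratios against the current implementation. The times are copied from the tables; the names and the reading of "Worst cost" as the longest single pass are assumptions, and the attempt counts differ slightly between rows (for example 13K vs 12K), so the ratios are indicative only.

```python
def to_minutes(hours: int = 0, minutes: int = 0) -> int:
    """Convert an 'H hours M minutes' figure from the tables to minutes."""
    return hours * 60 + minutes


def ratio(baseline: int, candidate: int) -> float:
    """How many times longer the baseline is than the candidate."""
    return round(baseline / candidate, 2)


# Roughly 13K attempts.
BASELINE_13K = to_minutes(2, 14)               # current implementation
SKIP_INCOMPLETE_13K = to_minutes(minutes=31)   # all at once, no incomplete
BATCHED_13K_WORST = to_minutes(minutes=17)     # 3000 per pass, no incomplete
BATCHED_13K_TOTAL = to_minutes(minutes=41)

# Roughly 20K attempts.
BASELINE_20K = to_minutes(1, 52)
BATCHED_20K_WORST = to_minutes(minutes=20)
BATCHED_20K_TOTAL = to_minutes(2, 18)

if __name__ == "__main__":
    print("13K, skip incomplete:", ratio(BASELINE_13K, SKIP_INCOMPLETE_13K))
    print("13K, batched, worst pass:", ratio(BASELINE_13K, BATCHED_13K_WORST))
    print("13K, batched, total:", ratio(BASELINE_13K, BATCHED_13K_TOTAL))
    print("20K, batched, worst pass:", ratio(BASELINE_20K, BATCHED_20K_WORST))
    print("20K, batched, total:", ratio(BASELINE_20K, BATCHED_20K_TOTAL))
```

Read this way, the 20K case shows the trade-off: the batched run takes longer in total than the current implementation (138 vs 112 minutes), but the longest single pass is 20 minutes, which is what bounds how long a newly finished application waits to be picked up.
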



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
