[GitHub] spark issue #22444: [SPARK-25409][Core]Speed up Spark History loading via in...

2018-09-21 Thread jianjianjiao
Github user jianjianjiao commented on the issue: https://github.com/apache/spark/pull/22444 @squito Yes, you are correct. I was trying to make the applications running during the scan get picked up more quickly. It turns out that SPARK-6951 has done a great job of achieving this.

2018-09-21 Thread jianjianjiao
Github user jianjianjiao commented on the issue: https://github.com/apache/spark/pull/22444 @vanzin Thank you very much for your suggestions. Loading event logs is now much faster: loading 17K event logs, some of them larger than 10G, went from more than 2.5 hours to 19 minutes.

2018-09-18 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22444 > so any server restart results in hours of downtime, just from scanning. Well, that's why 2.3 supports caching things on disk. Also, 2.4 has SPARK-6951, which should make this a lot faster

2018-09-18 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/22444 > history server startup needs to go through all these logs before being usable, so any server restart results in hours of downtime, just from scanning. I don't think this is true. The first

2018-09-18 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22444 I see the reasoning here:
* @jianjianjiao has a very large cluster with many thousands of history files of past (successful) jobs.
* history server startup needs to go through all

2018-09-17 Thread jianjianjiao
Github user jianjianjiao commented on the issue: https://github.com/apache/spark/pull/22444 Adding @vanzin, @steveloughran, and @squito, who made changes to related code.