[ 
https://issues.apache.org/jira/browse/SPARK-48575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Nauwynck updated SPARK-48575:
------------------------------------
    Description: 
When the Spark event log dir contains many per-application event log sub-dirs 
(for example ~1000),
a supposedly "idle" Spark History Server ends up issuing hundreds of thousands of 
directory listing calls per hour (over 500 million per month).

See the code: 
[FsHistoryProvider.scala#L283|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L283]

Example: with ~1000 apps, every 10 seconds (the default of 
"spark.history.fs.update.interval") the Spark History Server performs
- 1x  FileSystem.listStatus(path)   with path = the Spark event log dir
- then, for each appSubDirPath (one per Spark application's event logs), 2x 
FileSystem.listStatus(appSubDirPath)
   => 2 x 1000 = 2000 listStatus calls on app sub-dirs per cycle (see the sketch below)
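
Roughly, one scan cycle looks like this (a simplified, illustrative sketch of the 
current behaviour, not the exact FsHistoryProvider implementation):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Simplified view of one update cycle of the history server (illustrative only):
val logDirPath = new Path("hdfs:///spark-logs")            // hypothetical log dir
val fs = logDirPath.getFileSystem(new Configuration())

val appDirs = fs.listStatus(logDirPath)                    // 1 call on the root log dir
appDirs.foreach { appDir =>
  val children = fs.listStatus(appDir.getPath)             // call #1: detect a rolling event log dir
  // a RollingEventLogFilesFileReader is then created; its lazy "files" field
  val filesAgain = fs.listStatus(appDir.getPath)           // call #2: lists the same dir again
}
{code}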

On a cloud provider (for example Azure), this costs a lot per month: 
list operations are billed at roughly $0.065 per 10,000 ops for the "Hot" tier, or 
$0.0228 per 10,000 ops for "Premium" (cf 
https://azure.microsoft.com/en-us/pricing/details/storage/data-lake/ )

Let's do the math:
30 (days per month) * 86400 (seconds per day) / 10 (interval in seconds) = 259,200 
scan cycles per month
... * 2001 (listing ops per cycle) ≈ 518 million listing calls per month
... * $0.0228 / 10,000 ≈ 1,182 USD per month!
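
For reference, the same back-of-the-envelope computation as a small Scala snippet 
(listing counts and the "Premium" rate taken from the text above):

{code:scala}
// Monthly cost estimate of the listing calls, using the numbers above.
object ListingCostEstimate extends App {
  val updatesPerMonth   = 30L * 86400 / 10      // 259,200 scan cycles per month
  val listingsPerUpdate = 1 + 2 * 1000          // 1 root listing + 2 listings per app sub-dir
  val listingsPerMonth  = updatesPerMonth * listingsPerUpdate   // ~518.7 million
  val usdPerMonth       = listingsPerMonth * 0.0228 / 10000     // "Premium" rate per 10,000 ops
  println(f"$listingsPerMonth listings/month, ~ $usdPerMonth%.0f USD/month")
}
{code}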

Admittedly, the retention conf "spark.history.fs.cleaner.maxAge" (default = 7d) 
for Spark event logs is too long for workflows that run many short Spark apps, 
and it could be reduced.

It is important to reduce these recurring costs.
Here are several wishes:

1/ Fix the "bug" in the Spark History Server that calls 
FileSystem.listStatus(appSubDirPath) twice. 
cf source code: [first call 
EventLogFileReaders.scala#L123|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileReaders.scala#L123]
  : it only tests that the dir contains a child file with the event log name prefix 
and an appStatus file, but the listing result is then discarded.
It creates an instance of RollingEventLogFilesFileReader, and shortly after, 
the listing is performed again:
cf [second call (lazy field) 
EventLogFileReaders.scala#L224|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileReaders.scala#L224]
The lazy field "files" is evaluated immediately after object creation, from here:
[second call triggered from 
EventLogFileReaders.scala#L252|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileReaders.scala#L252]
... [called from FsHistoryProvider.scala#L506 
|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L506]

Indeed, it is easy to perform only 1 listing per sub-dir 
(cf the attached patch, changing ~5 lines of code); a rough sketch is given below.
This would divide the cost by 2.
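
For illustration only, the shape of such a fix might look roughly like this (a sketch 
of the idea, not the attached patch; the extra constructor parameter on 
RollingEventLogFilesFileReader is a hypothetical overload, and the file-name prefixes 
are simplified assumptions):

{code:scala}
import org.apache.hadoop.fs.{FileStatus, FileSystem}

// Sketch: reuse the listing obtained while detecting a rolling event log dir,
// instead of letting the reader's lazy "files" field list the same dir again.
def createRollingReader(fs: FileSystem, dirStatus: FileStatus): Option[EventLogFileReader] = {
  val children = fs.listStatus(dirStatus.getPath)                   // single listStatus call
  val looksLikeRollingLogs =
    children.exists(_.getPath.getName.startsWith("events_")) &&     // assumed event log prefix
    children.exists(_.getPath.getName.startsWith("appstatus_"))     // assumed app status prefix
  if (looksLikeRollingLogs) {
    // hypothetical overload: pass the already-fetched listing down to the reader
    Some(new RollingEventLogFilesFileReader(fs, dirStatus.getPath, children))
  } else {
    None
  }
}
{code}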

2/ In addition to the conf "spark.history.fs.cleaner.maxAge", there is another 
available conf param "spark.history.fs.cleaner.maxNum" to limit the number of 
retained Spark apps, but its default value is Integer.MAX_VALUE. It could default 
to ~100 instead. 
This would additionally divide the cost by ~10 (in the case of 1000 apps).
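
In the meantime, these settings can already be lowered explicitly in the history 
server's spark-defaults.conf (example values only; note the cleaner only runs when 
spark.history.fs.cleaner.enabled is set to true):

{code}
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.maxAge    1d
spark.history.fs.cleaner.maxNum    100
{code}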

3/ Change the code in the Spark History Server to check for updates lazily, only on 
demand when someone accesses the Spark History web UI. For example, if the time 
elapsed since the last cached update is less than 
"spark.history.fs.update.interval", then no update is needed; otherwise the update 
is performed immediately and cached before returning the response (sketch below).
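
A minimal sketch of that idea (illustrative only; doScan() stands for the existing 
scan in FsHistoryProvider, the wrapper class itself is hypothetical):

{code:scala}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical on-demand refresh guard: rescan the log dir only when the UI is hit
// and the cached listing is older than spark.history.fs.update.interval.
class OnDemandRefresher(updateIntervalMs: Long, doScan: () => Unit) {
  private val lastScanTime = new AtomicLong(0L)

  def maybeRefresh(): Unit = {
    val now  = System.currentTimeMillis()
    val last = lastScanTime.get()
    if (now - last >= updateIntervalMs && lastScanTime.compareAndSet(last, now)) {
      doScan()   // e.g. the existing FsHistoryProvider.checkForLogs()
    }
  }
}

// Usage from the UI request path (instead of a fixed-rate background thread):
// refresher.maybeRefresh(); then serve the (possibly cached) application list.
{code}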

4/ Change the code in the Spark History Server to avoid listing each app sub-dir.
 It is possible to perform a single listing on the top-level "sparkLog" dir to 
discover new apps.
 Then, for each app sub-dir: most apps are already finished, and already 
recompacted by the Spark History Server itself. This information is already stored 
in the Spark History KVStore db. 
  Almost all of the sub-dir listings can therefore be completely avoided 
(a sketch is given after the links below).

  see [KVStore declaration 
|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L141],
 [KVStore 
ApplicationInfo|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L1194]
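
A hedged sketch of that idea (the listing KVStore and LogInfo record do exist in 
FsHistoryProvider, but the simplified LogInfo fields and the lookup key used below 
are assumptions, not the real API):

{code:scala}
import scala.util.Try
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
import org.apache.spark.util.kvstore.KVStore

// Simplified stand-in for the LogInfo record kept in the history server's KVStore
// (the real private class in FsHistoryProvider has more fields).
case class LogInfo(logPath: String, isComplete: Boolean)

// Sketch: list only the top-level log dir, then skip sub-dirs whose application is
// already recorded as complete, so listStatus is only called on new or running apps.
def dirsNeedingScan(fs: FileSystem, logDir: Path, listing: KVStore): Seq[FileStatus] = {
  fs.listStatus(logDir).toSeq.filter { status =>
    // assumed lookup: LogInfo keyed by its log path; read() throws if the key is unknown
    val stored = Try(listing.read(classOf[LogInfo], status.getPath.toString)).toOption
    stored.forall(info => !info.isComplete)   // keep unknown or still-running apps only
  }
}
{code}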

> spark.history.fs.update.interval triggers too many directory listings when 
> the Spark log dir contains many application event log dirs 
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-48575
>                 URL: https://issues.apache.org/jira/browse/SPARK-48575
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.3, 4.0.0, 3.5.1, 3.4.3
>            Reporter: Arnaud Nauwynck
>            Priority: Critical
>         Attachments: EventLogFileReaders.patch
>


