[ 
https://issues.apache.org/jira/browse/SPARK-26082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Loncaric updated SPARK-26082:
------------------------------------
    Description: 
Currently in [docs|https://spark.apache.org/docs/latest/running-on-mesos.html]:
{quote}spark.mesos.fetcherCache.enable / false / If set to `true`, all URIs 
(example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the Mesos 
Fetcher Cache
{quote}

Currently in {{MesosClusterScheduler.scala}} (which passes the caching
parameter to the driver):
{{private val useFetchCache = conf.getBoolean("spark.mesos.fetchCache.enable", 
false)}}

Currently in {{MesosCoarseGrainedSchedulerBackend.scala}} (which passes the
Mesos caching parameter to executors):
{{private val useFetcherCache = 
conf.getBoolean("spark.mesos.fetcherCache.enable", false)}}

This naming discrepancy dates back to version 2.0.0 
([jira|http://mail-archives.apache.org/mod_mbox/spark-issues/201606.mbox/%3cjira.12979909.1466099309000.9921.1466101026...@atlassian.jira%3E]).

This means that when {{spark.mesos.fetcherCache.enable=true}} is specified, the 
Mesos cache will be used only for executors, and not for drivers.

IMPACT:
Not caching these driver files (typically at least the Spark binaries, a 
custom jar, and additional dependencies) adds considerable network traffic and 
startup time when Spark applications are run frequently on a Mesos cluster. In 
addition, with the cache off, archives like {{spark-x.x.x-bin-*.tgz}} are 
copied into the sandbox and left there (rather than extracted directly without 
an extra copy), which can considerably increase disk usage. Users can currently 
work around this by specifying the {{spark.mesos.fetchCache.enable}} option, 
but this option should at least be mentioned in the documentation.
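
The mismatch and the workaround described above can be sketched roughly as 
follows. This is an illustrative model only, not Spark code: a plain {{Map}} 
stands in for {{SparkConf}}, and the object, method, and value names 
({{FetcherCacheKeys}}, {{driverUsesCache}}, {{executorUsesCache}}, 
{{documentedOnly}}, {{bothKeys}}) are hypothetical. Only the two key strings 
are taken from the snippets quoted above.

```scala
// Illustrative model only: a plain Map stands in for SparkConf, and the
// object, method, and value names here are hypothetical, not Spark APIs.
object FetcherCacheKeys {
  // Key read on the driver side (MesosClusterScheduler.scala).
  val DriverKey = "spark.mesos.fetchCache.enable"
  // Documented key, read on the executor side.
  val ExecutorKey = "spark.mesos.fetcherCache.enable"

  // Mirrors conf.getBoolean(key, false): a missing key is treated as false.
  def getBoolean(conf: Map[String, String], key: String): Boolean =
    conf.get(key).exists(_.toBoolean)

  def driverUsesCache(conf: Map[String, String]): Boolean =
    getBoolean(conf, DriverKey)

  def executorUsesCache(conf: Map[String, String]): Boolean =
    getBoolean(conf, ExecutorKey)

  // Only the documented key is set: executors cache, the driver does not.
  val documentedOnly: Map[String, String] = Map(ExecutorKey -> "true")

  // Workaround: set both spellings so both lookups see "true".
  val bothKeys: Map[String, String] = documentedOnly + (DriverKey -> "true")
}
```

With {{documentedOnly}}, only the executor-side lookup returns true; with 
{{bothKeys}}, both lookups return true. Where the driver-side key has to be 
supplied in a real deployment is not covered by this sketch.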

SUGGESTED FIX:
Add {{spark.mesos.fetchCache.enable}} to the documentation for versions 2.0 
through 2.4, and update {{MesosClusterScheduler.scala}} to use 
{{spark.mesos.fetcherCache.enable}} going forward.

  was:
Currently in [docs|https://spark.apache.org/docs/latest/running-on-mesos.html]:
{quote}spark.mesos.fetcherCache.enable / false / If set to `true`, all URIs 
(example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the Mesos 
Fetcher Cache
{quote}

Currently in {{MesosClusterScheduler.scala}} (which passes parameter to driver):
{{private val useFetchCache = conf.getBoolean("spark.mesos.fetchCache.enable", 
false)}}

Currently in {{MesosCourseGrainedSchedulerBackend.scala}} (which passes mesos 
caching parameter to executors):
{{private val useFetcherCache = 
conf.getBoolean("spark.mesos.fetcherCache.enable", false)}}

This naming discrepancy dates back to version 2.0.0 
([jira|http://mail-archives.apache.org/mod_mbox/spark-issues/201606.mbox/%3cjira.12979909.1466099309000.9921.1466101026...@atlassian.jira%3E]).

This means that when {{spark.mesos.fetcherCache.enable=true}} is specified, the 
Mesos cache will be used only for executors, and not for drivers.

IMPACT:
Not caching these driver files (typically including at least spark binaries, 
custom jar, and additional dependencies) adds considerable network traffic when 
frequently running spark Applications on a Mesos cluster. Additionally, since 
extracted files like {{spark-x.x.x-bin-*.tgz}} are additionally copied and left 
in the sandbox with the cache off (rather than extracted directly without an 
extra copy), this can considerably increase disk usage. Users CAN currently 
workaround by specifying the {{spark.mesos.fetchCache.enable}} option, but this 
should at least be specified in the documentation.

SUGGESTED FIX:
Add {{spark.mesos.fetchCache.enable}} to the documentation for versions 2 - 
2.4, and update to {{spark.mesos.fetcherCache.enable}} going forward.


> Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
> -----------------------------------------------------------------------
>
>                 Key: SPARK-26082
>                 URL: https://issues.apache.org/jira/browse/SPARK-26082
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
>            Reporter: Martin Loncaric
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
