[ https://issues.apache.org/jira/browse/MESOS-700 ]
Sam Taha updated MESOS-700:
---------------------------
    Comment: was deleted

(was: I am hoping this feature can be a general implementation of how executors efficiently download and manage cached code/resources over their lifecycle, keyed by executor "type" (an executor "type" being unique within a framework).

I need this kind of functionality for my framework (JobServer), which runs time-based scheduled jobs/workflows. A job/workflow in my case is a collection of JARs and, on Mesos, runs in its own executor each time it is run: the executor is started, the job is wrapped as a Mesos launchTask and sent to Mesos, and the executor lives only for the duration of that run. Each run starts a new executor on the same or a different slave. A given job can run (i.e. run in its executor) hundreds of times in an hour, so efficient local caching of resources is critical for my framework.

A job's dependencies (JARs) can also be changed dynamically by users via the workflow GUI, so multiple instances of the same job may be running concurrently with slightly different dependencies, especially if the job is long running. In most cases a job's dependencies will not change often, but this scenario has to be handled for my framework to cover the edge cases.

So if a job/executor has already run on a slave and its code/resources are unchanged since the last run there, I want to reuse the cached dependencies. But as noted, multiple jobs/executors of the same "type" can run on the same slave at the same time, so any sharing of JARs must handle a running job that is still using the cached dependencies while the same executor is launched on the same slave with slightly updated JAR dependencies (because a user edited the job's workflow rules). The main point: if a job/executor's dependencies are unchanged, reuse the cache; if they have changed, the cache must deal with dependency version changes, because another job of the same executor type may still be actively running against the older dependencies.

Having said all this, here is what I propose:

1) Extend ExecutorInfo.CommandInfo with a new Dependency field, a richer form of URI that carries a simple caching attribute. CommandInfo would look like this:

    /**
     * Looks like the URI type but has a "cached" attribute.
     */
    message Dependency {
      required string value = 1;
      optional bool executable = 2;
      optional bool cached = 3; // true if cached across executor runs
    }

    /**
     * Change CommandInfo to:
     */
    message CommandInfo {
      repeated URI uris = 1; // keep URI for non-cached resources and for backwards compatibility
      optional Environment environment = 2;
      required string value = 3;
      repeated Dependency dependencies = 4;
    }

2) In ExecutorInfo, either use the existing "name" attribute or add a new "type" attribute that uniquely identifies a specific executor definition within a framework, and use this name/type as the key for caching executors. We should probably define a best practice for naming executor types, but it remains up to the framework to set this via the "name" attribute. (A sketch of how a framework might populate these proposed fields appears at the end of this message.)
3) Whenever the slave launch code detects that a particular executor's dependencies have changed (by checking for new or modified dependencies), it creates a new sandbox for that executor type and caches the new dependencies; otherwise, if an executor with the same name/type and the same dependencies has already run on that slave, the existing cache is reused. The slave launch code therefore needs to be intelligent enough to version the cache when a new or modified dependency set appears for an executor type, and to garbage-collect previous versions of that executor's cache once any running executors using the old code have finished. (Sketches of the checksum-keyed cache check and the garbage collection also appear at the end of this message.)

Thanks, Sam )

> more efficient distribution of frameworks via HDFS
> --------------------------------------------------
>
>                 Key: MESOS-700
>                 URL: https://issues.apache.org/jira/browse/MESOS-700
>             Project: Mesos
>          Issue Type: Improvement
>          Components: framework
>    Affects Versions: 0.13.0, 0.14.0, 0.15.0
>         Environment: general
>            Reporter: Du Li
>             Fix For: 0.15.0
>
>
> I was exploring the latest code (0.15.0) at https://github.com/apache/mesos to test the tgz distribution of frameworks. Take Spark for example: I created a tgz of the Spark binary and put it on HDFS. After a job is submitted, it is decomposed into many tasks. For each task, the assigned Mesos slave downloads the tgz from HDFS, unzips it, and executes a script to launch the task. This seems very wasteful and unnecessary.
> Does the following suggestion make sense? When a Spark job is submitted, the Spark/Mesos master calculates a checksum (or something like it) for the tgz distribution. The checksum is then sent to the slaves when tasks are assigned. If the same file has already been downloaded and unzipped, the slave launches the task directly. This way the tgz is processed at most once per slave for each job (which may have thousands of tasks). The aggregate saving would be tremendous.
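If the proposal in (1) and (2) were adopted, a framework scheduler might populate the new fields through the C++ bindings protoc would generate from the messages above. This is a hypothetical sketch: the Dependency message, the dependencies accessor, and the naming convention are all part of the proposal, not the current mesos.proto.

    // Hypothetical use of the proposed CommandInfo/Dependency messages via
    // the C++ bindings protoc would generate from the sketch above.
    #include <mesos/mesos.pb.h>

    mesos::ExecutorInfo makeExecutor()
    {
      mesos::ExecutorInfo executor;
      executor.mutable_executor_id()->set_value("jobserver-run-12345");

      // Proposal (2): a stable per-framework "type" used as the cache key.
      // One possible convention: <framework>/<logical executor name>.
      executor.set_name("jobserver/etl-workflow");

      mesos::CommandInfo* command = executor.mutable_command();
      command->set_value("java -cp 'libs/*' com.example.JobRunner");

      // Proposal (1): a dependency the slave may cache across executor runs.
      mesos::Dependency* dep = command->add_dependencies();
      dep->set_value("hdfs://namenode/jobserver/libs/workflow-1.3.7.jar");
      dep->set_executable(false);
      dep->set_cached(true); // reuse the local copy while it is unchanged

      return executor;
    }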
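The change-detection half of point (3) and the suggestion in the quoted report reduce to the same mechanism: key the on-disk cache by a checksum of the artifact(s) and reuse the already-extracted directory on a hit. Below is a minimal slave-side sketch; fetch() and extract() are assumed helpers rather than Mesos APIs, and the checksum is taken to be computed upstream (e.g. by the master), as the report suggests.

    // Checksum-keyed localization: download and unpack a tgz at most once
    // per slave, then reuse the extracted directory for every later task.
    #include <cstdlib>
    #include <filesystem>
    #include <string>

    namespace fs = std::filesystem;

    // Assumed helpers (not Mesos code): shell out to the Hadoop client and tar.
    static void fetch(const std::string& uri, const fs::path& dest)
    {
      std::system(("hadoop fs -copyToLocal " + uri + " " + dest.string()).c_str());
    }

    static void extract(const fs::path& archive, const fs::path& dir)
    {
      fs::create_directories(dir);
      std::system(("tar xzf " + archive.string() + " -C " + dir.string()).c_str());
    }

    // Returns the directory holding the extracted archive, fetching and
    // unpacking only when this checksum has not been seen on this slave.
    fs::path localize(const std::string& uri,
                      const std::string& checksum, // computed by the master
                      const fs::path& cacheRoot)
    {
      const fs::path cached = cacheRoot / checksum;

      if (fs::exists(cached)) {
        return cached; // cache hit: the task launches without any download
      }

      const fs::path staging = cacheRoot / (checksum + ".tgz");
      fetch(uri, staging);
      extract(staging, cached);
      fs::remove(staging);
      return cached;
    }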
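Point (3) also calls for garbage-collecting a superseded cache version only after the last running executor that uses it has finished. Here is a minimal refcounting sketch of that bookkeeping; every type and name in it is hypothetical illustration, not existing slave code.

    // Versioned dependency caches per executor type, garbage-collected once
    // no running executor references an out-of-date version.
    #include <filesystem>
    #include <map>
    #include <string>

    struct CacheVersion
    {
      std::filesystem::path dir; // e.g. <root>/<executor name>/<dependency hash>
      int activeExecutors = 0;   // executors currently running from this dir
    };

    class ExecutorCache
    {
    public:
      // Record that an executor of `type` launched from the cache version
      // identified by `hash` (a digest of its dependency set).
      void onLaunch(const std::string& type,
                    const std::string& hash,
                    const std::filesystem::path& dir)
      {
        CacheVersion& version = versions[type][hash];
        version.dir = dir;
        ++version.activeExecutors;
      }

      // Record that an executor exited, then delete every version of this
      // type that is no longer current and no longer in use.
      void onExit(const std::string& type,
                  const std::string& hash,
                  const std::string& currentHash)
      {
        std::map<std::string, CacheVersion>& byHash = versions[type];
        --byHash[hash].activeExecutors;

        for (auto it = byHash.begin(); it != byHash.end();) {
          if (it->first != currentHash && it->second.activeExecutors == 0) {
            std::filesystem::remove_all(it->second.dir);
            it = byHash.erase(it);
          } else {
            ++it;
          }
        }
      }

    private:
      // executor type -> (dependency hash -> cache version)
      std::map<std::string, std::map<std::string, CacheVersion>> versions;
    };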