-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------
(Updated April 13, 2015, 5:45 a.m.)
Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy
Chen.
Changes
-------
Addressed the latest 2 issues.
Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
https://issues.apache.org/jira/browse/MESOS-2057
https://issues.apache.org/jira/browse/MESOS-2069
https://issues.apache.org/jira/browse/MESOS-2070
https://issues.apache.org/jira/browse/MESOS-2072
https://issues.apache.org/jira/browse/MESOS-2073
https://issues.apache.org/jira/browse/MESOS-2074
Repository: mesos
Description
-------
Almost all of the functionality in epic MESOS-336. Downloaded files from
CommandInfo::URIs can now be cached in a cache directory designated by a slave
flag. This only happens when asked for by an extra flag in the URI and is thus
backwards-compatible. The cache has a size limit also given by a new slave
flag. Cache-resident files are evicted as necessary to make space for newly
fetched ones. Concurrent attempts to cache the same URI leads to only one
download. The fetcher program remains external for safety reasons, but is now
augmented with more elaborate parameters packed into a JSON object to implement
specific fetch actions for all of the above. Additional testing includes
fetching from (mock) HDFS and coverage of the new features.
Diffs (updated)
-----
docs/configuration.md 54c4e31ed6dfed3c23d492c19a301ce119a0519b
docs/fetcher-cache-internals.md PRE-CREATION
docs/fetcher.md PRE-CREATION
include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc
include/mesos/mesos.proto 3a8e8bf303e0576c212951f6028af77e54d93537
include/mesos/type_utils.hpp cdf5864389a72002b538c263d70bcade2bdffa45
src/Makefile.am d15a37365bcdd5c3906160b46b389635b38b1673
src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b
src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25
src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec
src/slave/containerizer/docker.hpp 6893684e6d199a5d69fc8bba8e60c4acaae9c3c9
src/slave/containerizer/docker.cpp f9fb07806e3b7d7d2afc1be3b8756eac23b32dcd
src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829
src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2
src/slave/containerizer/mesos/containerizer.hpp
ae61a0fcd19f2ba808624312401f020121baf5d4
src/slave/containerizer/mesos/containerizer.cpp
e4136095fca55637864f495098189ab3ad8d8fe7
src/slave/flags.hpp d3b1ce117fbb4e0b97852ef150b63f35cc991032
src/slave/flags.cpp 35f56252cfda5011d21aa188f33cc3e68a694968
src/slave/slave.cpp a0595f93ce4720f5b9926326d01210460ccb0667
src/tests/docker_containerizer_tests.cpp
c772d4c836de18b0e87636cb42200356d24ec73d
src/tests/fetcher_cache_tests.cpp PRE-CREATION
src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e
src/tests/mesos.hpp 42e42ac425a448fcc5e93db1cef1112cbf5e67c4
src/tests/mesos.cpp fc534e9febed1e293076e00e0f5c3879a78df90f
Diff: https://reviews.apache.org/r/30774/diff/
Testing
-------
make check
--- longer Description: ---
-Replaces all other reviews for the fetcher cache except those related to
stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616,
30618, 30621, 30626. See descriptions of those. In dependency order:
30033: Removes the fetcher env tests since these won't be needed any more when
the fetcher uses JSON in a single env var as a parameter. They never tested
anything that won't be covered by other tests anyway.
30034: Makes the code structure of all fetcher tests the same. Instead of
calling the run method of the fetcher directly, calling through fetch(). Also
removes all uses of I/O redirection, which is not really needed for debugging,
and thus the next patch can refactor fetch() and run(). (The latter comes in
two varieties, which complicates matters without much benefit.)
30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field
that will later cause fetcher cache actions. Also introduces the notion of a
cache directory to the fetcher info protobuf. And then propagates these
additions throughout the rest of the code base where applicable. This includes
passing the slave ID all the way down to the place where the cache dir name is
constructed.
30037: Extends the fetcher info protobuf with "actions" (fetch directly
bypassing the cache, fetch through the cache, retrieve from the cache).
Switches the basis for dealing with uris to "items", which contain the uri, the
action, and potentially a cache file name. Refactors fetch() and run(), so
there is only one of each. Introduces about half of the actual cache logic,
including a hashmap of cache file objects for bookkeeping and basic operations
on it.
30039: Enables fetcher cache actions in the mesos fetcher program.
30006: Enables concurrent downloading into the fetcher cache. Reuse of download
results in the cache when multiple fetcher runs occur concurrently.
30614: This is to ensure that all this refactoring of fetcher code has not
broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos
and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client
written in bash that acts just like a real "hadoop" command if used in the
right limited way.
30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown.
This implements recovery in an acceptable, yet most simple way.
30173: Created fetcher cache tests. Adds a new test source file containing a
test fixture and tests to find out if the fetcher cache works with a variety of
settings.
30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that
contains the file size for the URI passed as argument. This is needed to
determine the size of a file on HDFS before downloading it to the fetcher cache
(to ensure there is enough space).
30621: Refactored URI type separation in mesos-fetcher. Moved the URI type
separation code (distinguishes http, hdfs, local copying, etc.) from
mesos-fetcher to the fetcher process/actor, since it is going to be reused by
download size queries when we introduce fetcher cache management. Also factored
out URI validation, which will be used the same way by mesos-fetcher and the
fetcher process/actor.
30626: Fetcher cache eviction. This happens when the cache does not have enough
space to accomodate upcoming downloads to the cache. Necessary provisions
included here:
- mesos-fetcher does not run until evictions have been successful
- Cache space is reserved while (async) waiting for eviction to succeed. If it
fails, the reservation gets undone.
- Reservations can be partly from available space, partly from evictions. All
math included :-)
- To find out how much space is needed, downloading has a prelude in which we
query the download size from the URI. This works for all URI types that
mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too. Only one per URI in
play happens.
- There is cleanup code for all kinds of error situations. At the very end of
the fetch attempt, each list is processed for undoing things like space
reservations and eviction disabling.
- Eviction gets disabled for URIs that are currently in use, i.e. the related
cache files are. We use reference counting for this, since there may be
concurrent fetch attempts using the same cache files.
Thanks,
Bernd Mathiske