-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/
-----------------------------------------------------------

(Updated March 11, 2015, 10:50 a.m.)


Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and Timothy 
Chen.


Changes
-------

Rebased.


Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and MESOS-2074
    https://issues.apache.org/jira/browse/MESOS-2057
    https://issues.apache.org/jira/browse/MESOS-2069
    https://issues.apache.org/jira/browse/MESOS-2070
    https://issues.apache.org/jira/browse/MESOS-2072
    https://issues.apache.org/jira/browse/MESOS-2073
    https://issues.apache.org/jira/browse/MESOS-2074


Repository: mesos


Description
-------

Almost all of the functionality in epic MESOS-336. Downloaded files from 
CommandInfo::URIs can now be cached in a cache directory designated by a slave 
flag. This only happens when asked for by an extra flag in the URI and is thus 
backwards-compatible. The cache has a size limit also given by a new slave 
flag. Cache-resident files are evicted as necessary to make space for newly 
fetched ones. Concurrent attempts to cache the same URI leads to only one 
download. The fetcher program remains external for safety reasons, but is now 
augmented with more elaborate parameters packed into a JSON object to implement 
specific fetch actions for all of the above. Additional testing includes 
fetching from (mock) HDFS and coverage of the new features.


Diffs (updated)
-----

  docs/configuration.md fc3afec248b534b1d5eb625eb66de5f90cd8cd33 
  docs/fetcher-cache-internals.md PRE-CREATION 
  docs/fetcher.md PRE-CREATION 
  include/mesos/fetcher/fetcher.proto 311af9aebc6a85dadba9dbeffcf7036b70896bcc 
  include/mesos/mesos.proto 9df972d750ce1e4a81d2e96cc508d6f83cad2fc8 
  src/Makefile.am 3059818231c46484039d179cd6916932eff6cd68 
  src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
  src/launcher/fetcher.cpp 796526f59c25898ef6db2b828b0e2bb7b172ba25 
  src/slave/constants.hpp fd1c1aba0aa62372ab399bee5709ce81b8e92cec 
  src/slave/containerizer/docker.hpp b7bf54ac65d6c61622e485ac253513eaac2e4f88 
  src/slave/containerizer/docker.cpp 5f4b4ce49a9523e4743e5c79da4050e6f9e29ed7 
  src/slave/containerizer/fetcher.hpp 1db0eaf002c8d0eaf4e0391858e61e0912b35829 
  src/slave/containerizer/fetcher.cpp 9e9e9d0eb6b0801d53dec3baea32a4cd4acdd5e2 
  src/slave/containerizer/mesos/containerizer.hpp 
ae61a0fcd19f2ba808624312401f020121baf5d4 
  src/slave/containerizer/mesos/containerizer.cpp 
fbd1c0a0e5f4f227adb022f0baaa6d2c7e3ad748 
  src/slave/flags.hpp 56b25caf3901b38bdecb50310e8bcae0b114efa8 
  src/slave/slave.cpp 71ae84bbfcef208cc2ee603f3c8a79225e48a7d5 
  src/tests/docker_containerizer_tests.cpp 
06cd3d89ecbaaac17ae6970604b21fbe29f6e887 
  src/tests/fetcher_cache_tests.cpp PRE-CREATION 
  src/tests/fetcher_tests.cpp 4549e6a631e2c17cec3766efaa556593eeac9a1e 
  src/tests/mesos.hpp 45e35204d1aa876fa0c871acf0f21afcd5ababe8 
  src/tests/mesos.cpp c8f43d21b214e75eaac2870cbdf4f03fd18707d1 

Diff: https://reviews.apache.org/r/30774/diff/


Testing
-------

make check

--- longer Description: ---

-Replaces all other reviews for the fetcher cache except those related to 
stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 
30618, 30621, 30626. See descriptions of those. In dependency order:

30033: Removes the fetcher env tests since these won't be needed any more when 
the fetcher uses JSON in a single env var as a parameter. They never tested 
anything that won't be covered by other tests anyway.

30034: Makes the code structure of all fetcher tests the same. Instead of 
calling the run method of the fetcher directly, calling through fetch(). Also 
removes all uses of I/O redirection, which is not really needed for debugging, 
and thus the next patch can refactor fetch() and run(). (The latter comes in 
two varieties, which complicates matters without much benefit.)

30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field 
that will later cause fetcher cache actions. Also introduces the notion of a 
cache directory to the fetcher info protobuf. And then propagates these 
additions throughout the rest of the code base where applicable. This includes 
passing the slave ID all the way down to the place where the cache dir name is 
constructed.

30037: Extends the fetcher info protobuf with "actions" (fetch directly 
bypassing the cache, fetch through the cache, retrieve from the cache). 
Switches the basis for dealing with uris to "items", which contain the uri, the 
action, and potentially a cache file name. Refactors fetch() and run(), so 
there is only one of each. Introduces about half of the actual cache logic, 
including a hashmap of cache file objects for bookkeeping and basic operations 
on it. 

30039: Enables fetcher cache actions in the mesos fetcher program.

30006: Enables concurrent downloading into the fetcher cache. Reuse of download 
results in the cache when multiple fetcher runs occur concurrently. 

30614: This is to ensure that all this refactoring of fetcher code has not 
broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos 
and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client 
written in bash that acts just like a real "hadoop" command if used in the 
right limited way.

30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. 
This implements recovery in an acceptable, yet most simple way.

30173: Created fetcher cache tests. Adds a new test source file containing a 
test fixture and tests to find out if the fetcher cache works with a variety of 
settings.

30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string that 
contains the file size for the URI passed as argument. This is needed to 
determine the size of a file on HDFS before downloading it to the fetcher cache 
(to ensure there is enough space).

30621: Refactored URI type separation in mesos-fetcher. Moved the URI type 
separation code (distinguishes http, hdfs, local copying, etc.) from 
mesos-fetcher to the fetcher process/actor, since it is going to be reused by 
download size queries when we introduce fetcher cache management. Also factored 
out URI validation, which will be used the same way by mesos-fetcher and the 
fetcher process/actor.

30626: Fetcher cache eviction. This happens when the cache does not have enough 
space to accomodate upcoming downloads to the cache. Necessary provisions 
included here:
- mesos-fetcher does not run until evictions have been successful
- Cache space is reserved while (async) waiting for eviction to succeed. If it 
fails, the reservation gets undone.
- Reservations can be partly from available space, partly from evictions. All 
math included :-)
- To find out how much space is needed, downloading has a prelude in which we 
query the download size from the URI. This works for all URI types that 
mesos-fetcher currently supports, including http and hdfs.
- Size-determination requests are now synchronized, too. Only one per URI in 
play happens.
- There is cleanup code for all kinds of error situations. At the very end of 
the fetch attempt, each list is processed for undoing things like space 
reservations and eviction disabling.
- Eviction gets disabled for URIs that are currently in use, i.e. the related 
cache files are. We use reference counting for this, since there may be 
concurrent fetch attempts using the same cache files.


Thanks,

Bernd Mathiske

Reply via email to