-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30774/#review73462
-----------------------------------------------------------


This really is an impressive project Bernd, tipping my hat here!

I did not get all the way through, had to ignore the tests in this review (to 
be continued!).

For playing a bit with this and for testing the functionality, it would be 
great if this was rebased to the current master.

Some high level comments; 
I kind of like the idea of wrapping synchronous calls into async's, thus 
allowing error cases to fall through then-cases and hence enabling an error 
handling that isnt scattered all over. While you are now able to concentrate 
the failure handling, there is a cost involved; 1. it becomes harder to read, 
2. it possibly becomes more prone to unforseen concurrency issues, 3. this 
launches one-off processes a lot lot lot :).


docs/fetcher.md
<https://reviews.apache.org/r/30774/#comment120473>

    s/reponsible/responsible/



docs/fetcher.md
<https://reviews.apache.org/r/30774/#comment120474>

    s/specifiy/specify/



docs/fetcher.md
<https://reviews.apache.org/r/30774/#comment120475>

    s/somehwere/somewhere/



include/mesos/mesos.proto
<https://reviews.apache.org/r/30774/#comment120476>

    This should be part of the URI message definition as it is not used by the 
CommandInfo message directly.



src/Makefile.am
<https://reviews.apache.org/r/30774/#comment120462>

    Alphabetize please.



src/hdfs/hdfs.hpp
<https://reviews.apache.org/r/30774/#comment120477>

    For debugging purposes, it might be nice to see the actual result from this 
HDFS callup. Maybe something like:
    ```
    return Error("HDFS du returned unexpected number of results: HDFS returned 
'" << output.str() << "'");
    ```



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120478>

    Why are you mentioning `os::net` to the user?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120468>

    Not yours but this sounds weird, what do you think?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120469>

    It should be enough to return the error and let the upper layers decide on 
the logging, no?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120470>

    The extract method itself does not explicitely return the source path in 
the error message. It will however contain that information by the shell call 
of `extract` (lines 73 + 79):
    ```
    command += " '" + sourcePath + "'";
    ```
    
    ```
    return Error("Failed to extract: command " + command +
                 " exited with status: " + stringify(status));
    ```
    
    For consistency, I would suggest to either getting rid of his very 
extension of the error message or to adding the same in line 237.



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120471>

    How about moving this up and thereby getting rid of the brackets around the 
"expected" case?



src/launcher/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120472>

    Is this part of the comment an artifact of older revisions? The exit code 
does not seem to correlate with the number of fetched items.



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment120490>

    I know this particular issue has not yet gotten a final vote, but I would 
suggest using 
    `__SLAVE_CONTAINERIZER_FETCHER_HPP__` instead.



src/slave/containerizer/fetcher.hpp
<https://reviews.apache.org/r/30774/#comment120491>

    Why are we exposing the FetchProcess definition within this header? In 
other words, does anyone outside the Fetcher implementation need to know about 
this process' symbols?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120493>

    Why not making this a `std::string` as you never use it in its raw, C-style 
format anyway?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120494>

    See above.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120495>

    Schemes should not be limited to a length (<= 5), the standard does not 
mention such restriction (see http://tools.ietf.org/html/std66#section-3.1).



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120497>

    Can we get rid of this validation here or within `validateUri`?
    
    In any case, please make sure the resulting Error messages are consistent.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120621>

    Are these URIs canonicalized towards lower case already at some point?
    
    According to the standard, a mixed case URI / scheme is perfectly valid: 
`Http://example.org/example.tar.gz`.
    
    See http://tools.ietf.org/html/std66#section-3.1 again :).



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120487>

    Hah, this is neat :).



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120500>

    Couldnt T get deduced from the async returned container (Try<T>) -- in 
other words, do you __need__ to call `tryToFuture<Nothing>(..)` instead of 
letting the compiler select the right one for you (`tryToFuture(..)`)?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120488>

    So a force removes from cache -- can you explain why this is needed?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120548>

    How about:
    
    ```
    .then(defer(
        self(),
        &FetcherProcess::__fetch,
        containerId,
        lambda::_1,
        ...
    ```
    
    or 
    
    ```
    .then(defer(self(),
                &FetcherProcess::__fetch,
                containerId,
                lambda::_1,
                ...
    ```
    ?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120549>

    This continuation should IMHO be indended the same way the above `then` is. 
It otherwise becomes a bit messy, I feel.



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120596>

    Would you mind adding a comment on how could we possibly end up here with 
non existing paths?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120553>

    `fetcherInfo.add_items()->CopyFrom(item)` ?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120501>

    This appears to be a use case for `discard` as you are no longer interested 
in the results and, if possible, wish to abort any actions, right? 
    
    Such discard however certainly needs to be taken care of within your entire 
chain (which currently only addresses `onFailure` but not `onDiscard`).



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120569>

    Why using `auto` here - is this sanctioned by our styleguide?



src/slave/containerizer/fetcher.cpp
<https://reviews.apache.org/r/30774/#comment120570>

    Clang-format seems to prefer not to indend a function name after its return 
value at all. I have never encountered such case before, hence I am entirely 
unsure about the mesos (aka __right__) way.
    
    This appears to be triggered mostly by the fact that you are using nested 
class definitions - I would try to avoid that unless your implementation and 
the definitions are in one place as the resulting symbols are really hard to 
identify on a single glance.



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment120482>

    "TODO(bernd-mesos): Add a ..." ?



src/slave/slave.cpp
<https://reviews.apache.org/r/30774/#comment120483>

    "// Make sure we start ..." ?


- Till Toenshoff


On Feb. 25, 2015, 5:54 p.m., Bernd Mathiske wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30774/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2015, 5:54 p.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Till Toenshoff, and 
> Timothy Chen.
> 
> 
> Bugs: MESOS-2057, MESOS-2069, MESOS-2070, MESOS-2072, MESOS-2073, and 
> MESOS-2074
>     https://issues.apache.org/jira/browse/MESOS-2057
>     https://issues.apache.org/jira/browse/MESOS-2069
>     https://issues.apache.org/jira/browse/MESOS-2070
>     https://issues.apache.org/jira/browse/MESOS-2072
>     https://issues.apache.org/jira/browse/MESOS-2073
>     https://issues.apache.org/jira/browse/MESOS-2074
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Almost all of the functionality in epic MESOS-336. Downloaded files from 
> CommandInfo::URIs can now be cached in a cache directory designated by a 
> slave flag. This only happens when asked for by an extra flag in the URI and 
> is thus backwards-compatible. The cache has a size limit also given by a new 
> slave flag. Cache-resident files are evicted as necessary to make space for 
> newly fetched ones. Concurrent attempts to cache the same URI leads to only 
> one download. The fetcher program remains external for safety reasons, but is 
> now augmented with more elaborate parameters packed into a JSON object to 
> implement specific fetch actions for all of the above. Additional testing 
> includes fetching from (mock) HDFS and coverage of the new features.
> 
> 
> Diffs
> -----
> 
>   docs/fetcher.md PRE-CREATION 
>   include/mesos/fetcher/fetcher.proto 
> facb87b92bf3194516f636dcc348e136af537721 
>   include/mesos/mesos.proto 507845c493f65e154214fc7e562206e452990469 
>   src/Makefile.am d372404d84c9ac0bc49da7407ad366701b9586a6 
>   src/hdfs/hdfs.hpp 968545d9af896f3e72e156484cc58135405cef6b 
>   src/launcher/fetcher.cpp 5b2d86d1867b25bf71461fd73df91b089002325b 
>   src/slave/constants.hpp 12d6e92b814076fd423a7738e030dc6ce6954761 
>   src/slave/containerizer/docker.hpp 70114dcef7a416ae904e95eb9ce365bcd866aebc 
>   src/slave/containerizer/docker.cpp 813ddf8c98622748b2da1b739bf122387abc4c79 
>   src/slave/containerizer/fetcher.hpp 
> bfd98dbe16e2bd5df3e2c8e9b10e303654f33446 
>   src/slave/containerizer/fetcher.cpp 
> 6e6bce08d76bb8a5813c905e3ffeff9b2411fd6d 
>   src/slave/containerizer/mesos/containerizer.hpp 
> 074a2d82dcd882e52f8cd62ed7493295596acb26 
>   src/slave/containerizer/mesos/containerizer.cpp 
> d5b90d12d63becfeb4c3efa9c6f5d826417f582c 
>   src/slave/flags.hpp ddb32593566b71bcf0b66adeeb889e43ef911084 
>   src/slave/slave.cpp ec7ec1356e745bb07484ae1755c9183b038043b3 
>   src/tests/docker_containerizer_tests.cpp 
> 8b212d4e6ed5a179ebadce1bdbbf3edde87d706c 
>   src/tests/fetcher_cache_tests.cpp PRE-CREATION 
>   src/tests/fetcher_tests.cpp fcbf7ad912f0b1b3d106a75a9dcb8213a0c69c07 
>   src/tests/mesos.hpp 60c70043c8266a422ffa03ab5a949da0bc822124 
>   src/tests/mesos.cpp d76ed8c837da162d782de84e935fe8e11e01d6b0 
>   src/tests/mock_hadoop.sh PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30774/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> --- longer Description: ---
> 
> -Replaces all other reviews for the fetcher cache except those related to 
> stout: 30006, 30033, 30034, 30036, 30037, 30039, 30124, 30173, 30614, 30616, 
> 30618, 30621, 30626. See descriptions of those. In dependency order:
> 
> 30033: Removes the fetcher env tests since these won't be needed any more 
> when the fetcher uses JSON in a single env var as a parameter. They never 
> tested anything that won't be covered by other tests anyway.
> 
> 30034: Makes the code structure of all fetcher tests the same. Instead of 
> calling the run method of the fetcher directly, calling through fetch(). Also 
> removes all uses of I/O redirection, which is not really needed for 
> debugging, and thus the next patch can refactor fetch() and run(). (The 
> latter comes in two varieties, which complicates matters without much 
> benefit.)
> 
> 30036: Extends the CommandInfo::URI protobuf with a boolean "caching" field 
> that will later cause fetcher cache actions. Also introduces the notion of a 
> cache directory to the fetcher info protobuf. And then propagates these 
> additions throughout the rest of the code base where applicable. This 
> includes passing the slave ID all the way down to the place where the cache 
> dir name is constructed.
> 
> 30037: Extends the fetcher info protobuf with "actions" (fetch directly 
> bypassing the cache, fetch through the cache, retrieve from the cache). 
> Switches the basis for dealing with uris to "items", which contain the uri, 
> the action, and potentially a cache file name. Refactors fetch() and run(), 
> so there is only one of each. Introduces about half of the actual cache 
> logic, including a hashmap of cache file objects for bookkeeping and basic 
> operations on it. 
> 
> 30039: Enables fetcher cache actions in the mesos fetcher program.
> 
> 30006: Enables concurrent downloading into the fetcher cache. Reuse of 
> download results in the cache when multiple fetcher runs occur concurrently. 
> 
> 30614: This is to ensure that all this refactoring of fetcher code has not 
> broken HDFS fetching. Adds a test that exercises the C++ code paths in Mesos 
> and mesos-fetcher related to fetching from HDFS. Uses a mock HDFS client 
> written in bash that acts just like a real "hadoop" command if used in the 
> right limited way.
> 
> 30124: Inserted fetcher cache zap upon slave startup, recovery and shutdown. 
> This implements recovery in an acceptable, yet most simple way.
> 
> 30173: Created fetcher cache tests. Adds a new test source file containing a 
> test fixture and tests to find out if the fetcher cache works with a variety 
> of settings.
> 
> 30616: Adds hdfs::du() which calls "hadoop fs -du -h" and returns a string 
> that contains the file size for the URI passed as argument. This is needed to 
> determine the size of a file on HDFS before downloading it to the fetcher 
> cache (to ensure there is enough space).
> 
> 30621: Refactored URI type separation in mesos-fetcher. Moved the URI type 
> separation code (distinguishes http, hdfs, local copying, etc.) from 
> mesos-fetcher to the fetcher process/actor, since it is going to be reused by 
> download size queries when we introduce fetcher cache management. Also 
> factored out URI validation, which will be used the same way by mesos-fetcher 
> and the fetcher process/actor.
> 
> 30626: Fetcher cache eviction. This happens when the cache does not have 
> enough space to accomodate upcoming downloads to the cache. Necessary 
> provisions included here:
> - mesos-fetcher does not run until evictions have been successful
> - Cache space is reserved while (async) waiting for eviction to succeed. If 
> it fails, the reservation gets undone.
> - Reservations can be partly from available space, partly from evictions. All 
> math included :-)
> - To find out how much space is needed, downloading has a prelude in which we 
> query the download size from the URI. This works for all URI types that 
> mesos-fetcher currently supports, including http and hdfs.
> - Size-determination requests are now synchronized, too. Only one per URI in 
> play happens.
> - There is cleanup code for all kinds of error situations. At the very end of 
> the fetch attempt, each list is processed for undoing things like space 
> reservations and eviction disabling.
> - Eviction gets disabled for URIs that are currently in use, i.e. the related 
> cache files are. We use reference counting for this, since there may be 
> concurrent fetch attempts using the same cache files.
> 
> 
> Thanks,
> 
> Bernd Mathiske
> 
>

Reply via email to