> On Nov. 4, 2014, 9:46 p.m., Vinod Kone wrote:
> > src/launcher/fetcher.cpp, line 223
> > <https://reviews.apache.org/r/27483/diff/7/?file=748389#file748389line223>
> >
> > Why fall through if result.isError()?
>
> Ankur Chauhan wrote:
> The way I am thinking of the logic of fetchers is on the lines of "we try
> all fetchers one by one till we succeed *even* if an unsuccessful fetcher can
> handle it". This means that if for any reason (maybe mesos-fetcher is running
> doesn't have permissions or whatever be the case) a fetcher fails we always
> go to the next one. I understand that a case can be made to do something on
> the lines of
>
> ```
> if(local.canHandle(uri)) {
> return local.fetch(uri);
> } else if (os_net.canHandle(uri)) {
> return os_net.fetch(uri);
> } else if ( hadoop.canHandle(uri)) {
> return hadoop.fetch(uri);
> } else {
> return Error("Can't download uri:" + uri);
> }
> ```
>
> But this way we are assuming that local fetcher has all possible
> permissions to handle local/file uris *and* os_net has all the configuration
> to handle all http(s) | ftp/ftp(s) uris and takes away the fallback. It may
> be the case that the hadoop fetcher (and hence hadoop client) has some
> credentials in its config files that aren't present in the urls themselves.
> An example of this is if ftp uris are provided without user or password but
> core-site.xml has the required user/password in it.
>
> Does this answer your concern?
I'm concerned about how the logs will look for an user when fetching URIs. If
it's a non-local URI they would always see a message and failure with local
copy before using HDFS or Net, which is bad.
I would much rather make it explicit what you are doing, with comments.
```
if (local.canHandle(uri) {
result = local.fetch(uri);
if (result.isSome()) {
return result;
} else {
LOG(WARNING) << "Failed to copy local file: " << result.Error() << ";
}
} else if (os_net.canHandle(uri)) {
result = os_net.fetch(uri);
if (result.isSome()) {
return result;
} else {
LOG(WARNING) << "Failed to download URI: " << result.Error() << ";
}
}
// If we are here, one of the following is possible.
// 1. Failed to copy a local URI.
// 2. Failed to download a remote URI using Os.net.
// 3. HDFS compatible URI.
// 4. Unexpected URI.
// For all these cases we try to fetch the URI using Hadoop client as a
fallback.
return hadoop.fetch(uri);
```
More importantly, are there really file permission issues where "cp" doesn't
work but "hadoop fs copyToLocal" works? Can you give me a concrete example? I
ask because if the hadoop client hangs for whatever reason (we have seen this
in production at Twitter), it will take upto the executor registration timeout
for the slave to kill the executor and transition the task to LOST. Getting rid
of hadoop dependency was one of the reasons we install our executors locally.
This change adds the dependency back on hadoop, even for local files, which is
unfortunate.
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27483/#review59842
-----------------------------------------------------------
On Nov. 4, 2014, 10:20 p.m., Ankur Chauhan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27483/
> -----------------------------------------------------------
>
> (Updated Nov. 4, 2014, 10:20 p.m.)
>
>
> Review request for mesos, Timothy Chen, Timothy St. Clair, and Vinod Kone.
>
>
> Bugs: MESOS-1711
> https://issues.apache.org/jira/browse/MESOS-1711
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Previously, the fetcher used a hardcoded list of schemes to determine what
> URIs could be fetched by hadoop (if available). This is now changed such that
> we first check if hadoop can fetch them for us and then we fallback to the
> os::net and then a local copy method (same as it used to be). This allows
> users to fetch artifacts from arbitrary filesystems as long as hadoop is
> correctly configured (in core-site.xml).
>
>
> Diffs
> -----
>
> src/hdfs/hdfs.hpp bbfeddef106c598d8379ced085ef0605c4b2f380
> src/launcher/fetcher.cpp 9323c28237010fa065ef34d74435c151ded530a8
>
> Diff: https://reviews.apache.org/r/27483/diff/
>
>
> Testing
> -------
>
> make check
> sudo bin/mesos-tests.sh --verbose
> support/mesos-style.py
>
>
> Thanks,
>
> Ankur Chauhan
>
>