[ https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010 ]
Tom Arnfeld edited comment on MESOS-1405 at 5/23/14 9:55 AM:
-------------------------------------------------------------
Review request: https://reviews.apache.org/r/21852/
*Before*
{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}
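For anyone unfamiliar with the +0N suffix on MESOS_EXECUTOR_URIS: my understanding (not checked against every release) is that each entry is the URI followed by '+', a 0/1 flag saying whether to make the fetched file executable, and an X/N flag saying whether to extract it. So +0N means "not executable, don't extract", which matches the "Skipped extracting path" line. A minimal sketch of parsing that format; FetchSpec and parseToken are hypothetical names, not the fetcher's own code.

```cpp
#include <string>

// Hypothetical view of one MESOS_EXECUTOR_URIS entry, assuming the
// "URI+<0|1><X|N>" format described above.
struct FetchSpec {
  std::string uri;
  bool executable = false;  // '1' => make the fetched file executable
  bool extract = false;     // 'X' => extract archives, 'N' => leave as-is
};

// Parses a token such as "s3n://bucket-test/tom/test-fetch+0N".
// Returns false if the two-character flag suffix is missing or malformed.
bool parseToken(const std::string& token, FetchSpec* out) {
  // rfind, so a '+' inside the URI itself stays part of the URI.
  const std::string::size_type pos = token.rfind('+');
  if (pos == std::string::npos || pos + 3 != token.size()) {
    return false;
  }
  const char exec = token[pos + 1];
  const char extract = token[pos + 2];
  if ((exec != '0' && exec != '1') || (extract != 'X' && extract != 'N')) {
    return false;
  }
  out->uri = token.substr(0, pos);
  out->executable = (exec == '1');
  out->extract = (extract == 'X');
  return true;
}
```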
{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://bucket-test/tom/test-fetch
{code}
Here the fetcher classifies the s3n:// URI as a relative path and tries to
resolve it on the local filesystem. Because I haven't set MESOS_FRAMEWORKS_HOME,
it fails with the error above.
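To make the failure mode concrete, here is a rough model of the routing, assuming the fetcher checks the URI against fixed scheme prefixes and otherwise falls back to the local filesystem, where anything not starting with '/' counts as relative and has to be resolved against MESOS_FRAMEWORKS_HOME. The scheme lists, classify and FetchKind are my assumptions for illustration, not copied from fetcher.cpp.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical routing categories for a fetch URI.
enum class FetchKind { Hadoop, Net, LocalAbsolute, LocalRelative };

bool hasPrefix(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Checks Hadoop schemes first, then the schemes downloaded directly,
// and treats everything else as a path on the local filesystem.
FetchKind classify(const std::string& uri,
                   const std::vector<std::string>& hadoopSchemes) {
  for (const std::string& scheme : hadoopSchemes) {
    if (hasPrefix(uri, scheme + "://")) return FetchKind::Hadoop;
  }
  for (const char* scheme : {"http", "https", "ftp", "ftps"}) {
    if (hasPrefix(uri, std::string(scheme) + "://")) return FetchKind::Net;
  }
  // A local path without a leading '/' is relative and would need
  // MESOS_FRAMEWORKS_HOME to resolve, which is the error in the log above.
  return hasPrefix(uri, "/") ? FetchKind::LocalAbsolute
                             : FetchKind::LocalRelative;
}
```

With a Hadoop scheme list of just hdfs, an s3n:// URI lands in LocalRelative, like the *Before* run; adding s3 and s3n to the list approximates what the review request changes.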
*After*
{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}
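The destination in these logs ('/tmp/test-fetch') looks like MESOS_WORK_DIRECTORY joined with the last path component of the URI. A minimal sketch of that rule, assuming the +0N flag suffix has already been stripped; destinationPath is a hypothetical helper, not the fetcher's actual function.

```cpp
#include <string>

// Hypothetical helper: the local file name is the last path component of
// the URI, placed inside the work directory.
std::string destinationPath(const std::string& workDirectory,
                            const std::string& uri) {
  const std::string::size_type slash = uri.find_last_of('/');
  const std::string base =
      (slash == std::string::npos) ? uri : uri.substr(slash + 1);
  // Avoid a double slash when the directory already ends in '/'.
  if (!workDirectory.empty() && workDirectory.back() == '/') {
    return workDirectory + base;
  }
  return workDirectory + "/" + base;
}
```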
[~bernd-mesos], should we fold this change into your work, or is it something
you've already done? This implementation also doesn't scale well, since the list
of schemes passed to Hadoop is fixed at compile time. If we want good
compatibility with the Hadoop FileSystem implementations, users shouldn't have to
recompile Mesos just to pass their custom URIs through to Hadoop, for example a
user running GlusterFS instead of HDFS.
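One way to avoid the recompile, sketched under my own assumptions rather than as a settled design: special-case only the schemes the fetcher downloads itself (http, https, ftp, ftps) and hand every other scheme:// URI to the Hadoop client, so a hypothetical glusterfs:// URI would work as long as Hadoop is configured for it. route and Route are hypothetical names.

```cpp
#include <set>
#include <string>

// Hypothetical routing policy: only the schemes the fetcher downloads
// itself are special-cased; any other "scheme://" goes to Hadoop.
enum class Route { Hadoop, Net, Local };

Route route(const std::string& uri) {
  static const std::set<std::string> netSchemes = {"http", "https", "ftp",
                                                   "ftps"};
  const std::string::size_type sep = uri.find("://");
  if (sep == std::string::npos || sep == 0) {
    return Route::Local;  // No scheme: treat it as a filesystem path.
  }
  const std::string scheme = uri.substr(0, sep);
  return netSchemes.count(scheme) > 0 ? Route::Net : Route::Hadoop;
}
```

The trade-off is that a mistyped scheme surfaces as a Hadoop error instead of a local-path error, and the fetcher can no longer reject unsupported schemes up front.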
> Mesos fetcher does not support S3(n)
> ------------------------------------
>
> Key: MESOS-1405
> URL: https://issues.apache.org/jira/browse/MESOS-1405
> Project: Mesos
> Issue Type: Improvement
> Affects Versions: 0.18.2
> Reporter: Tom Arnfeld
> Assignee: Tom Arnfeld
> Priority: Minor
>
> The HDFS client supports both S3 and S3N. The differences between the two are
> described at:
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> We can either pass these URIs straight through to the HDFS client (hdfs.cpp)
> and let Hadoop do the work, or integrate with S3 directly. The latter
> requires a way of managing S3 credentials in Mesos, whereas the HDFS client
> simply picks up credentials from the Hadoop configuration under HADOOP_HOME.
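For the pass-through option, the fetcher would essentially shell out to the standard hadoop fs -copyToLocal command, so any scheme Hadoop understands (including s3:// and s3n://) is fetched with Hadoop's own credentials. A minimal sketch of building that command line with single-quote escaping; hadoopCopyCommand and shellQuote are hypothetical helpers, and I'm assuming the binary lives at HADOOP_HOME/bin/hadoop when HADOOP_HOME is set, falling back to hadoop on the PATH otherwise.

```cpp
#include <string>

// Quote an argument for /bin/sh: wrap it in single quotes and turn each
// embedded single quote into '\'' (close, escaped quote, reopen).
std::string shellQuote(const std::string& arg) {
  std::string out = "'";
  for (char c : arg) {
    if (c == '\'') {
      out += "'\\''";
    } else {
      out += c;
    }
  }
  out += "'";
  return out;
}

// Hypothetical pass-through: let the Hadoop CLI copy any URI it understands
// to a local path. S3 credentials come from Hadoop's configuration.
std::string hadoopCopyCommand(const std::string& hadoopHome,
                              const std::string& uri,
                              const std::string& localPath) {
  const std::string hadoop =
      hadoopHome.empty() ? "hadoop" : hadoopHome + "/bin/hadoop";
  return shellQuote(hadoop) + " fs -copyToLocal " + shellQuote(uri) + " " +
         shellQuote(localPath);
}
```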
--
This message was sent by Atlassian JIRA
(v6.2#6252)