[ https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010 ]
Tom Arnfeld edited comment on MESOS-1405 at 5/23/14 9:53 AM:
-------------------------------------------------------------

Review request: https://reviews.apache.org/r/21852/

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://home.duedil.com/tom/test-fetch
{code}

Here we can see that the fetcher classifies the s3n URI as a relative path. It then tries to resolve that path on the local filesystem and, since I've not set all the environment variables, throws an error.
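The error above is what you would expect from a fetcher that dispatches on the URI scheme and treats anything it does not recognise as a local path. Below is a minimal sketch of that idea, written in Python rather than Mesos's C++. It is not the actual fetcher.cpp logic: the names `classify`, `is_relative_local_path`, `HADOOP_SCHEMES` and `NET_SCHEMES`, and the exact scheme lists, are assumptions made up for illustration.

```python
# Hypothetical sketch of scheme-based dispatch. This is NOT the real
# fetcher.cpp code; every name and scheme list here is an assumption.

HADOOP_SCHEMES = ("hdfs://",)  # assumed: only hdfs:// goes to the hadoop client
NET_SCHEMES = ("http://", "https://", "ftp://", "ftps://")  # assumed list


def classify(uri, hadoop_schemes=HADOOP_SCHEMES):
    """Return the code path a URI would take: 'hadoop', 'net' or 'local'."""
    if uri.startswith(hadoop_schemes):
        return "hadoop"  # handed to the hadoop client
    if uri.startswith(NET_SCHEMES):
        return "net"  # downloaded directly
    # Anything else falls through and is treated as a local filesystem path.
    return "local"


def is_relative_local_path(uri, hadoop_schemes=HADOOP_SCHEMES):
    """True when the URI falls through to 'local' and has no leading slash."""
    return classify(uri, hadoop_schemes) == "local" and not uri.startswith("/")
```

With the default scheme set, `classify("s3n://bucket-test/tom/test-fetch")` falls through to `"local"`, and because the string does not start with `/` it looks like a relative path. Passing a wider set such as `("hdfs://", "s3://", "s3n://")` routes the same URI to the hadoop branch instead.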
*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}

I'm not sure if we should just incorporate this change into your work, [~bernd-mesos], or if it's something you've already done?

This implementation also isn't very scalable. If we want to maintain good compatibility with the Hadoop Filesystem implementations, users shouldn't have to recompile Mesos to pass their custom URIs through to Hadoop. An example is a user running GlusterFS instead of HDFS.

> Mesos fetcher does not support S3(n)
> ------------------------------------
>
>                 Key: MESOS-1405
>                 URL: https://issues.apache.org/jira/browse/MESOS-1405
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.2
>            Reporter: Tom Arnfeld
>            Assignee: Tom Arnfeld
>            Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details for the
> difference between the two can be found here:
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> Either we can simply pass these URIs through to the HDFS client (hdfs.cpp)
> and let hadoop do the work, or we can integrate with S3 directly.
> The latter then requires we have a way of managing S3 credentials, whereas using the
> HDFS client will just pull credentials from HADOOP_HOME.

--
This message was sent by Atlassian JIRA
(v6.2#6252)