[jira] [Comment Edited] (MESOS-1405) Mesos fetcher does not support S3(n)

Tom Arnfeld (JIRA) Fri, 23 May 2014 02:57:28 -0700

    [ https://issues.apache.org/jira/browse/MESOS-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007010#comment-14007010 ]


Tom Arnfeld edited comment on MESOS-1405 at 5/23/14 9:55 AM:
-------------------------------------------------------------

Review request: https://reviews.apache.org/r/21852/

*Before*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="hdfs:///user/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:38:44.003656 1933525776 fetcher.cpp:73] Fetching URI 'hdfs:///user/tom/test-fetch'
I0523 10:38:44.004147 1933525776 fetcher.cpp:99] Downloading resource from 'hdfs:///user/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:38:47.983763 1933525776 fetcher.cpp:236] Skipped extracting path '/tmp/test-fetch'
{code}

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:39:52.034631 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
E0523 10:39:52.035181 1933525776 fetcher.cpp:142] A relative path was passed for the resource but the environment variable MESOS_FRAMEWORKS_HOME is not set. Please either specify this config option or avoid using a relative path
Failed to fetch: s3n://bucket-test/tom/test-fetch
{code}

Here we can see that the fetcher classifies the URI as a relative path. Since I 
haven't set all the environment variables, it fails while trying to resolve the 
path on the local filesystem.

*After*

{code}
$ MESOS_WORK_DIRECTORY="/tmp/" MESOS_EXECUTOR_URIS="s3n://bucket-test/tom/test-fetch+0N" src/mesos-fetcher
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0523 10:52:28.486734 1933525776 fetcher.cpp:73] Fetching URI 's3n://bucket-test/tom/test-fetch'
I0523 10:52:28.487210 1933525776 fetcher.cpp:102] Downloading resource from 's3n://bucket-test/tom/test-fetch' to '/tmp/test-fetch'
I0523 10:52:33.173795 1933525776 fetcher.cpp:239] Skipped extracting path '/tmp/test-fetch'
{code}

I'm not sure whether we should just incorporate this change into your work, 
[~bernd-mesos], or whether it's something you've already done. This implementation 
also isn't very scalable: if we want to maintain good compatibility with the 
Hadoop FileSystem implementations, users shouldn't have to recompile Mesos just 
to pass their custom URIs through to Hadoop. One example is a user running 
GlusterFS instead of HDFS.



> Mesos fetcher does not support S3(n)
> ------------------------------------
>
>                 Key: MESOS-1405
>                 URL: https://issues.apache.org/jira/browse/MESOS-1405
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.2
>            Reporter: Tom Arnfeld
>            Assignee: Tom Arnfeld
>            Priority: Minor
>
> The HDFS client is able to support both S3 and S3N. Details on the 
> difference between the two can be found here: 
> http://wiki.apache.org/hadoop/AmazonS3.
> Examples:
> s3://bucket/path.tar.gz <- S3 Block Store
> s3n://bucket/path.tar.gz <- S3 K/V Store
> We can either simply pass these URIs through to the HDFS client (hdfs.cpp) 
> and let Hadoop do the work, or integrate with S3 directly. The latter 
> requires a way of managing S3 credentials, whereas the HDFS client will 
> simply pull credentials from the Hadoop configuration in HADOOP_HOME.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
