I've had a lot of difficulty using the s3:// prefix; s3n:// seems to work much better. I can't find the link ATM, but I seem to recall that s3:// (Hadoop's original block format for S3) is no longer recommended for use. Amazon's EMR goes so far as to remap s3:// to s3n:// behind the scenes.
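If it helps, here's a quick way to sanity-check the switch outside of Spark. This is only a minimal sketch: the class name, credentials, and key are placeholders (the bucket name is borrowed from Enno's example below), and the fs.s3n.* settings assume the pre-s3a Hadoop property names.

======
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3nSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials; fs.s3n.* are the property names the
    // native S3 filesystem reads.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    // s3n:// resolves to NativeS3FileSystem, whose pathToKey strips the
    // leading slash, so this write should produce the key "1234/test"
    // with no double slash.
    FileSystem fs = FileSystem.get(URI.create("s3n://fake-test/"), conf);
    fs.create(new Path("s3n://fake-test/1234/test")).close();
  }
}
======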
On Tue, Dec 23, 2014 at 9:29 AM, Enno Shioji <eshi...@gmail.com> wrote:

> I filed a new issue, HADOOP-11444. According to HADOOP-10372, s3 is likely
> to be deprecated anyway in favor of s3n.
> Also, the comment section notes that Amazon has implemented an
> EmrFileSystem for S3, which is built on the AWS SDK rather than JetS3t.
>
>
> On Tue, Dec 23, 2014 at 2:06 PM, Enno Shioji <eshi...@gmail.com> wrote:
>
>> Hey Jay :)
>>
>> I tried "s3n", which uses the Jets3tNativeFileSystemStore, and the double
>> slash went away.
>> As far as I can see, it does look like a bug in hadoop-common; I'll file
>> a ticket for it.
>>
>> Hope you are doing well, by the way!
>>
>> PS:
>> Jets3tNativeFileSystemStore's implementation of pathToKey is:
>> ======
>> private static String pathToKey(Path path) {
>>   if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
>>     // allow uris without trailing slash after bucket to refer to root,
>>     // like s3n://mybucket
>>     return "";
>>   }
>>   if (!path.isAbsolute()) {
>>     throw new IllegalArgumentException("Path must be absolute: " + path);
>>   }
>>   String ret = path.toUri().getPath().substring(1); // remove initial slash
>>   if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
>>     ret = ret.substring(0, ret.length() - 1);
>>   }
>>   return ret;
>> }
>> ======
>>
>> whereas Jets3tFileSystemStore uses:
>> ======
>> private String pathToKey(Path path) {
>>   if (!path.isAbsolute()) {
>>     throw new IllegalArgumentException("Path must be absolute: " + path);
>>   }
>>   return path.toUri().getPath();
>> }
>> ======
>>
>>
>> On Tue, Dec 23, 2014 at 1:07 PM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:
>>
>>> Hi Enno. It might be worthwhile to cross-post this on dev@hadoop...
>>> Obviously, a simple Spark way to test this would be to change the URI to
>>> write to hdfs:// (or maybe file://) and confirm that the extra slash
>>> goes away.
>>>
>>> - If it's indeed a JetS3t issue, we should add a new unit test for it,
>>> given that the HCFS tests pass for Jets3tFileSystem yet this error
>>> still exists.
>>>
>>> - To learn how to run the HCFS tests against any FileSystem, see the
>>> wiki page https://wiki.apache.org/hadoop/HCFS/Progress (see the July
>>> 14th entry on that page).
>>>
>>> - Is there another S3 FileSystem implementation for AbstractFileSystem,
>>> or is JetS3t the only one? That would be an easy way to test this, and
>>> also a good workaround.
>>>
>>> I'm also wondering why Jets3tFileSystem is the AbstractFileSystem
>>> implementation so many people use - is it the standard impl for the
>>> AbstractFileSystem interface?
>>>
>>> On Dec 23, 2014, at 6:06 AM, Enno Shioji <eshi...@gmail.com> wrote:
>>>
>>> Is anybody else experiencing this? It looks like a bug in JetS3t to me,
>>> but I thought I'd sanity-check before filing an issue.
>>>
>>> ================
>>> I'm writing to S3 using ReceiverInputDStream#saveAsTextFiles with an S3
>>> URL ("s3://fake-test/1234").
>>>
>>> The code does write to S3, but with double forward slashes (e.g.
>>> "s3://fake-test//1234/-1419334280000/").
>>>
>>> I debugged it, and the culprit seems to be
>>> Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..."
>>> for the input "s3://fake-test/1234/...", when it should hack off the
>>> first forward slash. However, I couldn't find any bug report against
>>> JetS3t for this.
>>>
>>> Am I missing something, or is this likely a JetS3t bug?
>>> ================
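PS: In case it's useful, below is a sketch of the kind of one-line change that would bring Jets3tFileSystemStore in line with the native store. This is my own guess based on Enno's diagnosis above, not the actual HADOOP-11444 patch:

======
// Sketch only: strip the initial slash the way Jets3tNativeFileSystemStore
// does, so the generated key no longer starts with "/" (which is what
// produces the double slash Enno observed).
private String pathToKey(Path path) {
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath().substring(1); // remove initial slash
}
======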