I've had a lot of difficulty using the s3:// prefix; s3n:// seems to work much better. I can't find the link ATM, but I seem to recall that s3:// (Hadoop's original block format for S3) is no longer recommended for use. Amazon's EMR goes so far as to remap s3:// to s3n:// behind the scenes.
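If it helps, here's a quick way to sanity-check the switch outside of Spark. This is only a minimal sketch: the class name, credentials, and key are placeholders (the bucket name is borrowed from Enno's example below), and the fs.s3n.* settings assume the pre-s3a Hadoop property names.

======
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3nSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials; fs.s3n.* are the property names the
    // native S3 filesystem reads.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

    // s3n:// resolves to NativeS3FileSystem, whose pathToKey strips the
    // leading slash, so this write should produce the key "1234/test"
    // with no double slash.
    FileSystem fs = FileSystem.get(URI.create("s3n://fake-test/"), conf);
    fs.create(new Path("s3n://fake-test/1234/test")).close();
  }
}
======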
On Tue, Dec 23, 2014 at 9:29 AM, Enno Shioji <eshi...@gmail.com> wrote:

> I filed a new issue, HADOOP-11444. According to HADOOP-10372, s3 is likely
> to be deprecated anyway in favor of s3n.
> Also, the comment section notes that Amazon has implemented an
> EmrFileSystem for S3, which is built on the AWS SDK rather than JetS3t.
>
>
> On Tue, Dec 23, 2014 at 2:06 PM, Enno Shioji <eshi...@gmail.com> wrote:
>
>> Hey Jay :)
>>
>> I tried "s3n", which uses the Jets3tNativeFileSystemStore, and the double
>> slash went away.
>> As far as I can see, it does look like a bug in hadoop-common; I'll file
>> a ticket for it.
>>
>> Hope you are doing well, by the way!
>>
>> PS:
>> Jets3tNativeFileSystemStore's implementation of pathToKey is:
>> ======
>> private static String pathToKey(Path path) {
>>   if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
>>     // allow uris without trailing slash after bucket to refer to root,
>>     // like s3n://mybucket
>>     return "";
>>   }
>>   if (!path.isAbsolute()) {
>>     throw new IllegalArgumentException("Path must be absolute: " + path);
>>   }
>>   String ret = path.toUri().getPath().substring(1); // remove initial slash
>>   if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
>>     ret = ret.substring(0, ret.length() - 1);
>>   }
>>   return ret;
>> }
>> ======
>>
>> whereas Jets3tFileSystemStore uses:
>> ======
>> private String pathToKey(Path path) {
>>   if (!path.isAbsolute()) {
>>     throw new IllegalArgumentException("Path must be absolute: " + path);
>>   }
>>   return path.toUri().getPath();
>> }
>> ======
>>
>>
>> On Tue, Dec 23, 2014 at 1:07 PM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:
>>
>>> Hi Enno. It might be worthwhile to cross-post this on dev@hadoop...
>>> Obviously, a simple Spark way to test this would be to change the URI to
>>> write to hdfs:// (or maybe file://) and confirm that the extra slash
>>> goes away.
>>>
>>> - If it's indeed a JetS3t issue, we should add a new unit test for it,
>>> given that the HCFS tests pass for Jets3tFileSystem yet this error
>>> still exists.
>>>
>>> - To learn how to run the HCFS tests against any FileSystem, see the
>>> wiki page https://wiki.apache.org/hadoop/HCFS/Progress (see the July
>>> 14th entry on that page).
>>>
>>> - Is there another S3 FileSystem implementation for AbstractFileSystem,
>>> or is JetS3t the only one? That would be an easy way to test this, and
>>> also a good workaround.
>>>
>>> I'm also wondering why Jets3tFileSystem is the AbstractFileSystem
>>> implementation so many people use - is it the standard impl for the
>>> AbstractFileSystem interface?
>>>
>>> On Dec 23, 2014, at 6:06 AM, Enno Shioji <eshi...@gmail.com> wrote:
>>>
>>> Is anybody else experiencing this? It looks like a bug in JetS3t to me,
>>> but I thought I'd sanity-check before filing an issue.
>>>
>>> ================
>>> I'm writing to S3 using ReceiverInputDStream#saveAsTextFiles with an S3
>>> URL ("s3://fake-test/1234").
>>>
>>> The code does write to S3, but with double forward slashes (e.g.
>>> "s3://fake-test//1234/-1419334280000/").
>>>
>>> I debugged it, and the culprit seems to be
>>> Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..."
>>> for the input "s3://fake-test/1234/...", when it should hack off the
>>> first forward slash. However, I couldn't find any bug report against
>>> JetS3t for this.
>>>
>>> Am I missing something, or is this likely a JetS3t bug?
>>> ================
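PS: In case it's useful, below is a sketch of the kind of one-line change that would bring Jets3tFileSystemStore in line with the native store. This is my own guess based on Enno's diagnosis above, not the actual HADOOP-11444 patch:

======
// Sketch only: strip the initial slash the way Jets3tNativeFileSystemStore
// does, so the generated key no longer starts with "/" (which is what
// produces the double slash Enno observed).
private String pathToKey(Path path) {
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath().substring(1); // remove initial slash
}
======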