[ https://issues.apache.org/jira/browse/HADOOP-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsuyoshi OZAWA updated HADOOP-11444:
------------------------------------
    Description:
While writing to S3 using Spark 1.2.0's ReceiverInputDStream#saveAsTextFiles with an S3 URL ("s3://fake-test/1234"), I noticed that files are written with double forward slashes (e.g. "s3://fake-test//1234/-1419334280000/").

After debugging, it seems this is caused by Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..." for the input "s3://fake-test/1234/...", when it should strip off the first forward slash.

When I used an s3n URL, and hence Jets3tNativeFileSystemStore, the double slashes went away. Here is a comparison of the two pathToKey implementations.

Jets3tNativeFileSystemStore's implementation of pathToKey is:

{code}
private static String pathToKey(Path path) {
  if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
    // allow uris without trailing slash after bucket to refer to root,
    // like s3n://mybucket
    return "";
  }
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  String ret = path.toUri().getPath().substring(1); // remove initial slash
  if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
    ret = ret.substring(0, ret.length() - 1);
  }
  return ret;
}
{code}

whereas Jets3tFileSystemStore uses:

{code}
private String pathToKey(Path path) {
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath();
}
{code}

  was:
While writing to S3 using Spark 1.2.0's ReceiverInputDStream#saveAsTextFiles with an S3 URL ("s3://fake-test/1234"), I noticed that files are written with double forward slashes (e.g. "s3://fake-test//1234/-1419334280000/").

After debugging, it seems this is caused by Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..." for the input "s3://fake-test/1234/...", when it should strip off the first forward slash.

When I used an s3n URL, and hence Jets3tNativeFileSystemStore, the double slashes went away. Here is a comparison of the two pathToKey implementations.

Jets3tNativeFileSystemStore's implementation of pathToKey is:

======
private static String pathToKey(Path path) {
  if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
    // allow uris without trailing slash after bucket to refer to root,
    // like s3n://mybucket
    return "";
  }
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  String ret = path.toUri().getPath().substring(1); // remove initial slash
  if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
    ret = ret.substring(0, ret.length() - 1);
  }
  return ret;
}
======

whereas Jets3tFileSystemStore uses:

======
private String pathToKey(Path path) {
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath();
}
======

> Jets3tFileSystemStore fails to remove initial slash from object keys, resulting in objects with double forward slashes being stored
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11444
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.2.0
>         Environment: java version "1.7.0_71"
>                      Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
>                      Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
>            Reporter: Enno Shioji
>            Priority: Minor
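
The difference between the two pathToKey variants can be reproduced outside Hadoop. The following is only a minimal sketch, not the actual Hadoop code or a proposed patch: it uses java.net.URI in place of org.apache.hadoop.fs.Path, and the class and method names (PathToKeySketch, keyWithLeadingSlash, keyWithoutLeadingSlash) are hypothetical. It assumes the object key is derived from the path component of the URI, and it only handles the leading slash, not the trailing-slash trimming that the s3n store also performs.

```java
import java.net.URI;

class PathToKeySketch {

    // Mirrors the behaviour reported for Jets3tFileSystemStore: the URI path
    // is returned unchanged, so the resulting key keeps its leading slash.
    static String keyWithLeadingSlash(URI uri) {
        return uri.getPath();
    }

    // Hypothetical fix: drop the leading slash, as the s3n store does.
    static String keyWithoutLeadingSlash(URI uri) {
        String path = uri.getPath();
        if (path.isEmpty()) {
            // A bucket-only URI such as s3://mybucket refers to the root.
            return "";
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        return path.substring(1); // remove initial slash
    }

    public static void main(String[] args) {
        URI uri = URI.create("s3://fake-test/1234/part-00000");
        System.out.println("unchanged: " + keyWithLeadingSlash(uri));
        System.out.println("stripped:  " + keyWithoutLeadingSlash(uri));
    }
}
```

A key that starts with "/" would presumably be stored under an empty first path segment in the bucket, which matches the double slash seen in the written object names.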