[ https://issues.apache.org/jira/browse/HADOOP-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287565#comment-13287565 ]
Ivan Mitic commented on HADOOP-8409:
------------------------------------

bq. In the meantime, would you please elaborate on how uri fragments are related to supporting symlinks?

Sure. Symlinks/fragments are used for [Distributed Cache|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html]. For example, to add files to Distributed Cache, you would do something like this in code:
{code}
DistributedCache.addCacheArchive(new URI("s3://bucket/path/to/archive.zip#directory"), job);
{code}
and this works fine. However, if you try to do something like this:
{code}
DistributedCache.addCacheArchive(new Path("s3://bucket/path/to/archive.zip#directory").toUri(), job);
{code}
this will fail because the Path object does not support fragments. In this case, the '#' sign will be encoded into the URI path:
{{s3://bucket/path/to/archive.zip%23directory}}

I ran into this while fixing TestGenericOptionsParser, which was failing on Windows. However, I found some [forum posts|https://forums.aws.amazon.com/message.jspa?messageID=152538] where people actually used the incorrect pattern. Supporting fragments in Path might be a good change, orthogonal to what we do for paths. If you agree, maybe split fragment support into a new Jira?

bq. Please also look at the issues raised in HADOOP-8139 and the reasons why we did not support windows paths on HDFS.

Thanks Suresh, I am aware of the issues raised here.

> Address Hadoop path related issues on Windows
> ---------------------------------------------
>
>                 Key: HADOOP-8409
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8409
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, test, util
>    Affects Versions: 1.0.0
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-8409-branch-1-win.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There are multiple places in prod and test code where Windows paths are not
> handled properly. From a high level this could be summarized with:
> 1. Windows paths are not necessarily valid DFS paths (while Unix paths are)
> 2. Windows paths are not necessarily valid URIs (while Unix paths are)
> #1 causes a number of tests to fail because they implicitly assume that local
> paths are valid DFS paths (by extracting the DFS test path from for example
> "test.build.data" property)
> #2 causes issues when URIs are directly created on path strings passed in by
> the user
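
For reference, the fragment behaviour described in the comment can be reproduced with plain {{java.net.URI}}, without Hadoop on the classpath. This is only a sketch. It assumes that a path-only wrapper rebuilds its URI from scheme, authority and path with no fragment, which is what the encoded result in the comment suggests. The class and method names ({{FragmentDemo}}, {{fromString}}, {{fromParts}}) are made up for illustration and are not Hadoop APIs.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FragmentDemo {

    // Parses the whole string as a URI, so '#' starts the fragment component.
    static URI fromString(String s) {
        return URI.create(s);
    }

    // Assumption: stands in for a wrapper that treats everything after the
    // authority as a path. The multi-argument URI constructor percent-encodes
    // characters that are not legal in a path, and '#' is one of them.
    static URI fromParts(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI parsed = fromString("s3://bucket/path/to/archive.zip#directory");
        // path is /path/to/archive.zip, fragment is "directory"
        System.out.println(parsed.getPath() + " | " + parsed.getFragment());

        URI rebuilt = fromParts("s3", "bucket", "/path/to/archive.zip#directory");
        // '#' ends up inside the path as %23, and there is no fragment
        System.out.println(rebuilt + " | " + rebuilt.getFragment());
    }
}
```

With the second form the symlink name after '#' is no longer available as a fragment, which matches the failure described for the {{new Path(...).toUri()}} pattern.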