[ 
https://issues.apache.org/jira/browse/HADOOP-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065002#comment-13065002
 ] 

Aaron T. Myers commented on HADOOP-7418:
----------------------------------------

Hey Andrew, I think the regex needs to be changed. In particular, I don't think it 
will actually cover the multiple back slash case since the double back slash in 
your regex actually is just string-escaping one back slash, which is then 
regex-escaping the "+" character. If you want to include a literal back slash 
in the regex, you need to use 4 back slashes. (Silly, I know.)
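To illustrate the two escaping levels, here is a minimal sketch. It assumes a pattern of the form "\\+" purely for illustration (the pattern in the patch itself is not reproduced here), and the class and method names are hypothetical.

```java
class BackslashEscapeDemo {
    // Java source "\\+" is the two-character regex \+ : a literal plus sign.
    static final String LITERAL_PLUS = "\\+";

    // Java source "\\\\+" is the three-character regex \\+ :
    // one or more literal backslashes.
    static final String BACKSLASH_RUN = "\\\\+";

    static String replaceLiteralPlus(String s) {
        return s.replaceAll(LITERAL_PLUS, "/");
    }

    static String collapseBackslashes(String s) {
        return s.replaceAll(BACKSLASH_RUN, "/");
    }

    public static void main(String[] args) {
        String input = "a\\\\b+c"; // the six characters: a \ \ b + c
        // Only the plus sign is replaced; the backslashes are untouched.
        System.out.println(replaceLiteralPlus(input));
        // The run of two backslashes is replaced by a single forward slash.
        System.out.println(collapseBackslashes(input));
    }
}
```

With two back slashes in the source only the "+" character is matched, while four back slashes in the source match the run of back slashes itself.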

Furthermore, I think that doing the replace in two stages (first forward 
slashes, then back slashes) won't cover the case when forward slashes are 
separated by back slashes (e.g. "/foo/\/bar".) To cover that case, you have two 
options:

# Replace back slashes first, before forward slashes. The back slash 
replacement could even be a 1-for-1 replacement, leaving you with a bunch of 
consecutive forward slashes, which then get replaced by a single forward slash 
in the next regex.
# Use something like this regex: "{{.replaceAll("(/|\\\\)+", "/")}}", which 
replaces multiple consecutive "/" or "\" with a single "/".
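The difference between the orderings can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, and the two-stage version assumes each run of back slashes is replaced by one forward slash, which may not match the patch exactly.

```java
class SlashNormalizeDemo {
    // Two stages in the problematic order: forward slashes first,
    // then back slashes (assumed here to become a single "/").
    static String forwardThenBack(String path) {
        return path.replaceAll("/+", "/").replaceAll("\\\\+", "/");
    }

    // Option 1: turn every back slash into a forward slash 1-for-1,
    // then collapse runs of forward slashes.
    static String backThenForward(String path) {
        return path.replace('\\', '/').replaceAll("/+", "/");
    }

    // Option 2: collapse any mixed run of "/" and "\" in one pass.
    static String singlePass(String path) {
        return path.replaceAll("(/|\\\\)+", "/");
    }

    public static void main(String[] args) {
        String path = "/foo/\\/bar"; // the characters: /foo/\/bar
        System.out.println(forwardThenBack(path)); // still has consecutive slashes
        System.out.println(backThenForward(path));
        System.out.println(singlePass(path));
    }
}
```

For the input above, the first method leaves "/foo///bar" because the back slash only becomes a forward slash after the forward-slash pass has already run, while both options give "/foo/bar".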

It would also be worthwhile to add test cases to cover these cases.

> support for multiple slashes in the path separator
> --------------------------------------------------
>
>                 Key: HADOOP-7418
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7418
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>         Environment: Linux running JDK 1.6
>            Reporter: Sudharsan Sampath
>            Assignee: Andrew Look
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.23.0
>
>         Attachments: HADOOP-7418.txt, HADOOP-7418.txt, HDFS-1460.txt, 
> HDFS-1460.txt
>
>
> The parsing of the input path string to identify the URI authority conflicts 
> with file system paths. For instance, the following is a valid path in 
> both the Linux file system and HDFS:
> //user/directory1//directory2
> While this works fine on the command line for manipulating HDFS, 
> the same path fails when specified as the input path for a mapper class, with the 
> following exception:
> Exception in thread "main" java.net.UnknownHostException: unknown host: user
>         at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
> This is because the org.apache.hadoop.fs.Path class assumes that the string following the 
> '//' is a URI authority.
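
The authority interpretation described in the issue can be reproduced with the JDK's own java.net.URI, which follows the generic URI syntax. This is only an illustration of that syntax rule, not the code in org.apache.hadoop.fs.Path; the class and method names below are hypothetical.

```java
import java.net.URI;

class AuthorityDemo {
    // Returns the authority component, or null when the string has none.
    static String authorityOf(String s) {
        return URI.create(s).getAuthority();
    }

    // Returns the path component of the parsed string.
    static String pathOf(String s) {
        return URI.create(s).getPath();
    }

    public static void main(String[] args) {
        // A leading "//" makes the next segment the authority.
        System.out.println(authorityOf("//user/directory1//directory2"));
        System.out.println(pathOf("//user/directory1//directory2"));
        // With a single leading slash the whole string is a path.
        System.out.println(authorityOf("/user/directory1//directory2"));
    }
}
```

Here "user" is parsed as the authority and only "/directory1//directory2" remains as the path, which is consistent with the "unknown host: user" error above.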

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
