umehrot2 commented on code in PR #6237: URL: https://github.com/apache/hudi/pull/6237#discussion_r932745731
########## hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java:
##########
@@ -145,8 +145,11 @@ public static Path convertPathWithScheme(Path oldPath, String newScheme) {
     URI oldURI = oldPath.toUri();
    URI newURI;
    try {
-      newURI = new URI(newScheme, oldURI.getUserInfo(), oldURI.getHost(), oldURI.getPort(), oldURI.getPath(),
-          oldURI.getQuery(), oldURI.getFragment());
+      newURI = new URI(newScheme,
+          oldURI.getAuthority(),
+          oldURI.getPath(),
+          oldURI.getQuery(),
+          oldURI.getFragment());

Review Comment:
   Yes, it stays fully compatible. The issue was introduced by the use of the **getHost()** API. That works fine for HDFS-like URIs, which have a host and a port, and for most bucket naming conventions that conform to the Java URI standard. But for certain patterns, as explained in the Jira, **getHost()** breaks because of the way it tries to parse the hostname. This constructor works better because it uses **getAuthority()**, which correctly extracts the bucket name in the case of S3. I do not see a need to specifically use the **getHost()**/**getPort()** APIs when **getAuthority()** can essentially do the same.
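To illustrate the behavior the comment describes: a standalone sketch (not Hudi code; the S3 path and bucket name are made up for demonstration) showing that `java.net.URI.getHost()` returns `null` for an authority that is not a syntactically valid hostname, such as an S3 bucket name containing an underscore, while `getAuthority()` still returns the raw bucket name. Rebuilding the URI through the host/port constructor then silently drops the bucket, whereas the authority-based constructor preserves it.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriAuthorityDemo {
    public static void main(String[] args) throws URISyntaxException {
        // "my_bucket" is a legal (legacy) S3 bucket name, but "_" is not
        // allowed in a URI hostname, so getHost() parses it as null.
        URI old = new URI("s3://my_bucket/table/part/file.parquet");
        System.out.println(old.getHost());      // null
        System.out.println(old.getAuthority()); // my_bucket

        // Old approach: host/port constructor loses the bucket entirely.
        URI viaHost = new URI("s3a", old.getUserInfo(), old.getHost(),
            old.getPort(), old.getPath(), old.getQuery(), old.getFragment());
        System.out.println(viaHost); // s3a:/table/part/file.parquet

        // New approach: authority-based constructor keeps the bucket.
        URI viaAuthority = new URI("s3a", old.getAuthority(),
            old.getPath(), old.getQuery(), old.getFragment());
        System.out.println(viaAuthority); // s3a://my_bucket/table/part/file.parquet
    }
}
```

For HDFS-style URIs with a real host and port, `getAuthority()` returns `host:port`, so the authority-based constructor produces the same result as before, which is why the change remains compatible.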