umehrot2 commented on code in PR #6237:
URL: https://github.com/apache/hudi/pull/6237#discussion_r932745731


##########
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java:
##########
@@ -145,8 +145,11 @@ public static Path convertPathWithScheme(Path oldPath, String newScheme) {
     URI oldURI = oldPath.toUri();
     URI newURI;
     try {
-      newURI = new URI(newScheme, oldURI.getUserInfo(), oldURI.getHost(), oldURI.getPort(), oldURI.getPath(),
-          oldURI.getQuery(), oldURI.getFragment());
+      newURI = new URI(newScheme,
+                oldURI.getAuthority(),
+                oldURI.getPath(),
+                oldURI.getQuery(),
+                oldURI.getFragment());

Review Comment:
   Yes, it will stay fully compatible. The issue is introduced by the use of the **getHost()** API. It works fine for HDFS-like URIs that have a host and a port, and it also works for most bucket naming conventions that conform to the Java URI standard. But for certain patterns, as explained in the Jira, **getHost()** breaks because of the way it tries to parse the hostname. This constructor works better because it uses **getAuthority()**, which correctly extracts the bucket name in the case of S3. I do not see a need to specifically use the **getHost()/getPort()** APIs when **getAuthority()** can essentially do the same.
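
   For illustration, a minimal sketch of the difference (the bucket name below is hypothetical, just a pattern that is not a valid RFC hostname):

   ```java
   import java.net.URI;
   import java.net.URISyntaxException;

   public class AuthorityVsHost {
     public static void main(String[] args) throws URISyntaxException {
       // Bucket name with underscores: not a valid hostname for java.net.URI's
       // server-based authority parsing, so getHost() returns null, while
       // getAuthority() still returns the raw authority string (the bucket name).
       URI s3Uri = new URI("s3://my_bucket_with_underscores/table/base/path");
       System.out.println(s3Uri.getHost());      // null
       System.out.println(s3Uri.getAuthority()); // my_bucket_with_underscores

       // HDFS-style URI with host and port: both accessors behave as expected.
       URI hdfsUri = new URI("hdfs://namenode:8020/table/base/path");
       System.out.println(hdfsUri.getHost());      // namenode
       System.out.println(hdfsUri.getAuthority()); // namenode:8020
     }
   }
   ```

   Because the authority-based `URI` constructor passes that string through unchanged, the rebuilt URI keeps the bucket name intact even when host/port parsing would fail.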


