[ https://issues.apache.org/jira/browse/SPARK-25738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650796#comment-16650796 ]
Shixiong Zhu commented on SPARK-25738:
--------------------------------------

Marked as a blocker since this is a regression.

> LOAD DATA INPATH doesn't work if hdfs conf includes port
> --------------------------------------------------------
>
>                 Key: SPARK-25738
>                 URL: https://issues.apache.org/jira/browse/SPARK-25738
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Blocker
>
> LOAD DATA INPATH throws {{java.net.URISyntaxException: Malformed IPv6 address
> at index 8}} if your hdfs conf includes a port for the namenode.
> This is because the value taken from the hdfs conf "fs.defaultFS" is passed
> to the URI constructor as the host. Note that the variable is called {{authority}},
> but the 4-arg URI constructor actually expects a host:
> https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
> {code}
> val defaultFSConf =
>   sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
> ...
> val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
> {code}
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386
> This was introduced by SPARK-23425.
> *Workaround*: specify a fully qualified path, e.g. instead of
> {noformat}
> LOAD DATA INPATH '/some/path/on/hdfs'
> {noformat}
> use
> {noformat}
> LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
> {noformat}
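The failure can be reproduced outside Spark with plain `java.net.URI`. The Scala sketch below is a minimal reproduction, not the code in `tables.scala`: it assumes a defaultFS of `hdfs://fizz.buzz.com:8020` (the host from the workaround), simplifies the scheme/authority fallback, and uses hypothetical names (`LoadDataUriDemo`, `buggy`, `fixed`). Because the 4-arg constructor treats its second argument as a host, a value containing `:` is bracketed as if it were an IPv6 literal, and re-parsing `hdfs://[fizz.buzz.com:8020]/...` fails. The `fixed` variant switches to the 5-arg `URI(scheme, authority, path, query, fragment)` constructor; that is one possible fix, not necessarily the patch that resolved this issue.

```scala
import java.net.{URI, URISyntaxException}

object LoadDataUriDemo {
  // Assumed example inputs: a defaultFS with an explicit namenode port,
  // and a LOAD DATA path given without scheme or authority.
  val defaultFS: URI = new URI("hdfs://fizz.buzz.com:8020")
  val pathUri: URI = new URI("/some/path/on/hdfs")

  // Simplified fallback: take scheme/authority from defaultFS when the path lacks them.
  val scheme: String = Option(pathUri.getScheme).getOrElse(defaultFS.getScheme)
  val authority: String = Option(pathUri.getAuthority).getOrElse(defaultFS.getAuthority)

  // Buggy: the 4-arg constructor is URI(scheme, host, path, fragment), so the
  // authority "fizz.buzz.com:8020" is treated as a host and wrapped in [ ].
  def buggy(): URI = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)

  // Possible fix: the 5-arg constructor takes an authority (host:port) directly.
  def fixed(): URI = new URI(scheme, authority, pathUri.getPath, null, pathUri.getFragment)

  def main(args: Array[String]): Unit = {
    try {
      buggy()
      println("buggy: no exception")
    } catch {
      case e: URISyntaxException => println(s"buggy: ${e.getMessage}")
    }
    println(s"fixed: ${fixed()}")
  }
}
```

With the 5-arg constructor the resulting URI keeps the host and port separate (`getHost` is `fizz.buzz.com`, `getPort` is `8020`), which is the same fully qualified form the workaround asks users to type by hand.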