Imran Rashid created SPARK-25738:
------------------------------------

Summary: LOAD DATA INPATH doesn't work if hdfs conf includes port
Key: SPARK-25738
URL: https://issues.apache.org/jira/browse/SPARK-25738
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Imran Rashid
LOAD DATA INPATH throws {{java.net.URISyntaxException: Malformed IPv6 address at index 8}} if your hdfs conf includes a port for the namenode (e.g. {{fs.defaultFS}} set to {{hdfs://fizz.buzz.com:8020}}).

This is because the authority (host *and* port) taken from the hdfs conf "fs.defaultFS" is passed to the {{URI}} constructor as the host. Note that the variable is called {{authority}}, but the 4-arg URI constructor actually expects a host: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

Because that "host" contains a colon, {{URI}} treats it as an IPv6 literal and wraps it in square brackets, and parsing the resulting string then fails.

{code}
val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
...
val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
{code}

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386

This was introduced by SPARK-23425.

*Workaround*: specify a fully qualified path. For example, instead of
{noformat}
LOAD DATA INPATH '/some/path/on/hdfs'
{noformat}
use
{noformat}
LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
{noformat}
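To illustrate the failure mode outside Spark, here is a minimal Scala sketch that uses only {{java.net.URI}}. The names {{QualifyPathSketch}}, {{buggy}} and {{fixed}} are hypothetical, and the host/port reuse the {{fizz.buzz.com:8020}} example from the workaround. {{buggy}} mirrors the 4-arg call in tables.scala; {{fixed}} shows one possible correction using the 5-arg constructor {{URI(scheme, authority, path, query, fragment)}}, which accepts host:port as an authority. This is a sketch under those assumptions, not necessarily the patch that gets merged.

```scala
import java.net.{URI, URISyntaxException}

object QualifyPathSketch {
  // Hypothetical fs.defaultFS value with an explicit namenode port.
  val defaultFS = new URI("hdfs://fizz.buzz.com:8020")

  // An unqualified LOAD DATA path, as a user would write it.
  val pathUri = new URI("/some/path/on/hdfs")

  // Buggy pattern: the 4-arg constructor is URI(scheme, host, path, fragment),
  // so "fizz.buzz.com:8020" is treated as a host. The colon makes URI wrap it
  // in brackets as if it were an IPv6 literal, and re-parsing then fails.
  def buggy(): URI =
    new URI(defaultFS.getScheme, defaultFS.getAuthority, pathUri.getPath, pathUri.getFragment)

  // Possible correction: the 5-arg constructor is
  // URI(scheme, authority, path, query, fragment), so host:port is accepted.
  def fixed(): URI =
    new URI(defaultFS.getScheme, defaultFS.getAuthority, pathUri.getPath, null, pathUri.getFragment)

  def main(args: Array[String]): Unit = {
    try {
      buggy()
      println("unexpected: no exception")
    } catch {
      case e: URISyntaxException => println(s"buggy: ${e.getMessage}")
    }
    println(s"fixed: ${fixed()}")
  }
}
```

Running {{main}} should print the {{URISyntaxException}} message for the 4-arg call and {{hdfs://fizz.buzz.com:8020/some/path/on/hdfs}} for the 5-arg call, which is the same fully qualified path the workaround passes by hand.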