[ https://issues.apache.org/jira/browse/SPARK-34883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373552#comment-17373552 ]
Mike Pieters commented on SPARK-34883:
--------------------------------------

I've got the same error here when I try to run:
{code:java}
spark.read.csv(URL_ABFS_RAW + "/salesforce/Case/timestamp=2021-07-02 00:14:15.129481", header=True, multiLine=True)
{code}
I'm running Spark 3.0.1

> Setting CSV reader option "multiLine" to "true" causes URISyntaxException
> when colon is in file path
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34883
>                 URL: https://issues.apache.org/jira/browse/SPARK-34883
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0, 3.1.1
>            Reporter: Brady Tello
>            Priority: Major
>
> Setting the CSV reader's "multiLine" option to "True" throws the following
> exception when a ':' character is in the file path.
>
> {code:java}
> java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
> I've tested this in both Spark 3.0.0 and Spark 3.1.1 and I get the same error
> whether I use Scala, Python, or SQL.
> The following code works fine:
>
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = (spark.read.option("sep", "\t").csv(csvFile))
> {code}
> While the following code fails:
>
> {code:java}
> csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
> tempDF = (spark.read.option("sep", "\t").option("multiLine", "True").csv(csvFile))
> {code}
> Full Stack Trace from Python:
>
> {code:java}
> ---------------------------------------------------------------------------
> IllegalArgumentException                  Traceback (most recent call last)
> <command-8965899> in <module>
>       3 csvFile = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"
>       4
> ----> 5 tempDF = (spark.read.option("sep", "\t").option("multiLine", "True")
>
> /databricks/spark/python/pyspark/sql/readwriter.py in csv(self, path, schema,
> sep, encoding, quote, escape, comment, header, inferSchema,
> ignoreLeadingWhiteSpace, ignoreTrailingWhiteSpace, nullValue, nanValue,
> positiveInf, negativeInf, dateFormat, timestampFormat, maxColumns,
> maxCharsPerColumn, maxMalformedLogPerPartition, mode,
> columnNameOfCorruptRecord, multiLine, charToEscapeQuoteEscaping,
> samplingRatio, enforceSchema, emptyValue, locale, lineSep, pathGlobFilter,
> recursiveFileLookup, modifiedBefore, modifiedAfter, unescapedQuoteHandling)
>     735             path = [path]
>     736         if type(path) == list:
> --> 737             return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>     738         elif isinstance(path, RDD):
>     739             def func(iterator):
>
> /databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
>    1302
>    1303         answer = self.gateway_client.send_command(command)
> -> 1304         return_value = get_return_value(
>    1305             answer, self.gateway_client, self.target_id, self.name)
>    1306
>
> /databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
>     114                 # Hide where the exception came from that shows a non-Pythonic
>     115                 # JVM exception message.
> --> 116                 raise converted from None
>     117             else:
>     118                 raise
>
> IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: test:dir
> {code}
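
The exception message names only the segment `test:dir`, not the full file path. One plausible reading (an inference from the message, not something confirmed in this report) is that the segment is parsed as a URI on its own, so the text before the colon is taken as a scheme and the remainder as a relative path. The sketch below uses Python's standard `urllib.parse` purely as a stand-in for whatever JVM-side parsing Spark performs; it does not call Spark. `colon_segments` is a hypothetical helper introduced here for illustration.

```python
from urllib.parse import urlsplit


def colon_segments(path):
    """Return the '/'-separated segments of `path` that contain a ':'."""
    return [segment for segment in path.split("/") if ":" in segment]


# Path taken from the original report.
full_path = "/FileStore/myDir/test:dir/pageviews_by_second.tsv"

# The full path starts with '/', so a generic URI parser finds no scheme.
print(repr(urlsplit(full_path).scheme))

# The bare segment from the exception message is split at the colon:
# 'test' is read as the scheme and 'dir' is left as a relative path.
parts = urlsplit("test:dir")
print(parts.scheme, parts.path)

# Segments worth checking before enabling multiLine.
print(colon_segments(full_path))
print(colon_segments("/salesforce/Case/timestamp=2021-07-02 00:14:15.129481"))
```

A check like `colon_segments` can be run over input paths before a read with `multiLine` enabled, to flag directories or files whose names are likely to hit this error.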