[ https://issues.apache.org/jira/browse/SPARK-32206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152567#comment-17152567 ]
JinxinTang commented on SPARK-32206:
------------------------------------

Please use `spark.read.format("xxx").load("file:/tmp/tt3","file:/tmp/tt1").show` instead of `spark.read.format("xxx").load("file:/tmp/{tt3,tt1}").show`. The former works correctly; the latter throws `java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute UR`

> Enable multi-line true could break the read csv in Azure Data Lake Storage gen2
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-32206
>                 URL: https://issues.apache.org/jira/browse/SPARK-32206
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2, 2.4.5
>            Reporter: Qionghui Zhang
>            Priority: Major
>
> I'm using Azure Data Lake gen2. When I load a data frame with the following
> options:
> var df = spark.read.format("csv")
>   .option("ignoreLeadingWhiteSpace", "true")
>   .option("ignoreTrailingWhiteSpace", "true")
>   .option("parserLib", "UNIVOCITY")
>   .option("multiline", "true")
>   .option("inferSchema", "true")
>   .option("mode", "PERMISSIVE")
>   .option("quote", "\"")
>   .option("escape", "\"")
>   .option("timeStampFormat", "M/d/yyyy H:m:s a")
>   .load("abfss://{containername}@{storage}.dfs.core.windows.net/{DirectoryWithoutColon}")
>   .limit(1)
> the data loads correctly.
>
> But if I use {DirectoryWithColon}, it throws an error:
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative
> path in absolute URI: {somefilesnapshotname}yyyy-MM-dd'T'hh:mm:ss
>
> If I then remove .option("multiline", "true"), the data can be loaded, but
> the data frame is certainly not parsed correctly, because the data contains
> newline characters.
>
> So I believe it is a bug.
>
> Our production job runs correctly when we use
> spark.read.schema({SomeSchemaList}).format("csv"), but we want to use the
> inferSchema feature on file paths that contain a colon or other special
> characters. Could you help fix this issue?
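
For readers trying to make sense of the error text in the report, here is a minimal sketch of one plausible way a colon in a file name can produce "Relative path in absolute URI". It uses only `java.net.URI` from the JDK. The object `ColonPathSketch` and the helpers `splitScheme` and `failureReason` are hypothetical names invented for this illustration, and the splitting rule (treat everything before the first colon as a scheme unless a slash comes first) is an assumption about how a Hadoop-style path string is parsed. Nothing in this thread confirms that this is the code path Spark takes when `multiline` is enabled.

```scala
import java.net.{URI, URISyntaxException}

object ColonPathSketch {
  // Hypothetical, simplified splitting rule (an assumption, not Spark's code):
  // text before the first ':' is treated as a URI scheme, unless a '/'
  // appears earlier in the string.
  def splitScheme(pathString: String): (Option[String], String) = {
    val colon = pathString.indexOf(':')
    val slash = pathString.indexOf('/')
    if (colon != -1 && (slash == -1 || colon < slash))
      (Some(pathString.substring(0, colon)), pathString.substring(colon + 1))
    else
      (None, pathString)
  }

  // Returns the URISyntaxException reason if java.net.URI rejects the
  // (scheme, path) pair, or None if the URI can be built.
  def failureReason(pathString: String): Option[String] = {
    val (scheme, path) = splitScheme(pathString)
    try {
      new URI(scheme.orNull, null, path, null, null)
      None
    } catch {
      case e: URISyntaxException => Some(e.getReason)
    }
  }

  def main(args: Array[String]): Unit = {
    // A timestamp-like name: the part before the first ':' becomes the scheme,
    // and the remainder "20:30" is a path that does not start with '/'.
    println(failureReason("snapshot2020-07-01T10:20:30"))
    // The same kind of name without a colon has no scheme and is accepted.
    println(failureReason("snapshot2020-07-01"))
  }
}
```

Under these assumptions, the first name is rejected because a URI with a scheme must have a path that is empty or starts with a slash, while the second name is accepted as a plain relative path.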