Terry Kim created SPARK-32621:
---------------------------------

             Summary: "path" option is added again to input paths during infer()
                 Key: SPARK-32621
                 URL: https://issues.apache.org/jira/browse/SPARK-32621
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0, 2.4.6, 3.0.1, 3.1.0
            Reporter: Terry Kim

When the "path" option is used to create a DataFrame, the path is added to the input paths again during infer(), which can cause failures such as the one reproduced here:

{code:java}
import org.apache.hadoop.fs.{Path, PathFilter}

// Rejects files whose parent directory is named "p=2".
class TestFileFilter extends PathFilter {
  override def accept(path: Path): Boolean = path.getParent.getName != "p=2"
}

val path = "/tmp"
val df = spark.range(2)
df.write.json(path + "/p=1")
df.write.json(path + "/p=2")

// Register the filter under both the old and the new Hadoop property names.
val extraOptions = Map(
  "mapred.input.pathFilter.class" -> classOf[TestFileFilter].getName,
  "mapreduce.input.pathFilter.class" -> classOf[TestFileFilter].getName
)

// This works fine.
assert(spark.read.options(extraOptions).json(path).count == 2)

// The same read with the "path" option fails with:
// assertion failed: Conflicting directory structures detected. Suspicious paths
//   file:/tmp
//   file:/tmp/p=1
assert(spark.read.options(extraOptions).format("json").option("path", path).load.count() == 2)
{code}
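
The summary says the "path" option is added to the input paths a second time. A rough way to picture that, as a plain-Scala sketch that does not use or reproduce Spark's actual code: `inputPaths` is a hypothetical helper that appends the "path" option to whatever explicit paths it is given. The `/tmp` and `/tmp/p=1` values mirror the paths named in the error message; everything else is an assumption made for illustration.

```scala
object PathOptionSketch {
  // Hypothetical helper, not a Spark API: combine explicitly passed paths
  // with the "path" option, if one is present in the options map.
  def inputPaths(explicit: Seq[String], options: Map[String, String]): Seq[String] =
    (explicit ++ options.get("path")).distinct

  // The user supplied the root directory through the "path" option.
  val options: Map[String, String] = Map("path" -> "/tmp")

  // Assumed for illustration: the directory that survives the path filter.
  val leafDirs: Seq[String] = Seq("/tmp/p=1")

  // If the options still carry "path", the root directory is appended again
  // and ends up next to the leaf directory.
  val duringInfer: Seq[String] = inputPaths(leafDirs, options)

  // Dropping "path" from the options leaves only the leaf directory.
  val withoutPathOption: Seq[String] = inputPaths(leafDirs, options - "path")

  def main(args: Array[String]): Unit = {
    println(duringInfer)
    println(withoutPathOption)
  }
}
```

In this model the first list contains both `/tmp/p=1` and `/tmp`, the same pair reported as suspicious paths, while the second contains only `/tmp/p=1`. Whether the actual fix removes the option or deduplicates elsewhere is not stated in this report.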