[ https://issues.apache.org/jira/browse/SPARK-48649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-48649.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 47006
[https://github.com/apache/spark/pull/47006]

> Add "ignoreInvalidPartitionPaths" and
> "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring
> invalid partition paths
> -----------------------------------------------------------------------
>
>                 Key: SPARK-48649
>                 URL: https://issues.apache.org/jira/browse/SPARK-48649
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Ivan Sadikov
>            Assignee: Ivan Sadikov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> When having a table directory with invalid partitions such as:
> {code:java}
> table/
>   invalid/...
>   part=1/...
>   part=2/...
>   part=3/...{code}
> a SQL query reading all of the partitions would fail with
> {code:java}
> java.lang.AssertionError: assertion failed: Conflicting directory structures
> detected. Suspicious paths:
>     table
>     table/invalid {code}
>
> I propose to add a data source option and Spark SQL config to ignore invalid
> partition paths. The config will be disabled by default to retain the current
> behaviour.
> {code:java}
> spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true"){code}
> {code:java}
> spark.read.format("parquet").option("ignoreInvalidPartitionPaths",
> "true").load(...) {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
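
For readers who want a feel for what "ignoring invalid partition paths" means on the directory layout quoted in the issue, here is a minimal, Spark-free sketch in plain Python. It is only an illustration under stated assumptions, not Spark's partition discovery code: the `column=value` pattern and the helper names `split_partition_dirs` and `build_example_table` are hypothetical and were chosen for this example. It recreates the `table/` layout in a temporary directory and separates the children that look like partition directories from those that do not.

```python
import re
import tempfile
from pathlib import Path

# Assumption for illustration: a partition directory is named "column=value".
# Spark's real discovery rules are more involved than this pattern.
PARTITION_DIR = re.compile(r"^[^=]+=.+$")


def split_partition_dirs(table_root):
    """Return (valid, invalid) lists of immediate child directory names."""
    valid, invalid = [], []
    for child in sorted(Path(table_root).iterdir()):
        if not child.is_dir():
            continue  # only directories are considered in this sketch
        if PARTITION_DIR.match(child.name):
            valid.append(child.name)
        else:
            invalid.append(child.name)
    return valid, invalid


def build_example_table(root):
    """Create the layout from the issue: table/{invalid,part=1,part=2,part=3}."""
    table = Path(root) / "table"
    for name in ("invalid", "part=1", "part=2", "part=3"):
        (table / name).mkdir(parents=True)
    return table


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        valid, invalid = split_partition_dirs(build_example_table(tmp))
        print("kept:", valid)
        print("ignored:", invalid)
```

With the option disabled (the default described in the issue), a layout like this one makes the read fail with the "Conflicting directory structures" assertion; with it enabled, the intent described above is that paths such as `table/invalid` are skipped rather than causing the failure.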