Omer Ozarslan created SPARK-34423:
-------------------------------------

Summary: Allow FileTable.fileIndex to be reused for custom partition schema in DataSourceV2 read path
Key: SPARK-34423
URL: https://issues.apache.org/jira/browse/SPARK-34423
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.1
Reporter: Omer Ozarslan
It is currently possible to provide a custom partition schema in the DataSourceV2 read path by writing custom implementations of PartitionAwareFileIndex/PartitionSpec and overriding fileIndex in a subclass of FileTable. However, because fileIndex is a lazy val, the subclass cannot reuse the parent's implementation (i.e. super.fileIndex), since Scala does not allow super to reference a val.

[https://github.com/apache/spark/blob/e0053853c90d39ef6de9d59fb933525e20bae1fa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala#L44-L61]

Duplicating this code in the subclass is possible but somewhat hacky; for example, the DataSource globbing function is a private API.

I was wondering whether this logic could be refactored into something like this:

{code:java}
def createFileIndex(): PartitionAwareFileIndex = {
  ...[current fileIndex logic]...
}

lazy val fileIndex: PartitionAwareFileIndex = createFileIndex()
{code}

This would let downstream code reuse the fileIndex logic by wrapping its result in custom implementations.

(Note that this proposed change covers a custom partition schema in the read path only. The write path is out of scope.)
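To illustrate the proposal, here is a minimal, Spark-free sketch of the same pattern. All names in it (FileIndexLike, SimpleFileIndex, FileTableLike, CustomPartitionIndex, CustomTable) are hypothetical stand-ins for PartitionAwareFileIndex and FileTable, not Spark APIs, and the "index" only carries root paths and partition column names. The point is that once construction lives in an overridable method, a subclass can call super.createFileIndex() and wrap the result, while fileIndex stays a lazy val that is computed once.

```scala
// Hypothetical, simplified stand-in for PartitionAwareFileIndex (not a Spark API).
trait FileIndexLike {
  def rootPaths: Seq[String]
  def partitionSchema: Seq[String]
}

// Hypothetical default index, standing in for the existing fileIndex logic.
final case class SimpleFileIndex(rootPaths: Seq[String], partitionSchema: Seq[String])
  extends FileIndexLike

// Hypothetical stand-in for FileTable with the proposed refactoring applied.
abstract class FileTableLike(paths: Seq[String]) {
  // Construction logic factored out so subclasses can reuse it via super.
  protected def createFileIndex(): FileIndexLike =
    SimpleFileIndex(paths, partitionSchema = Seq.empty)

  // Still a lazy val, so the index is built once, on first access.
  lazy val fileIndex: FileIndexLike = createFileIndex()
}

// Wraps the base index and substitutes a custom partition schema.
final class CustomPartitionIndex(base: FileIndexLike, schema: Seq[String])
  extends FileIndexLike {
  def rootPaths: Seq[String] = base.rootPaths
  def partitionSchema: Seq[String] = schema
}

// Downstream table that reuses the parent logic instead of duplicating it.
class CustomTable(paths: Seq[String]) extends FileTableLike(paths) {
  override protected def createFileIndex(): FileIndexLike =
    new CustomPartitionIndex(super.createFileIndex(), Seq("year", "month"))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val table = new CustomTable(Seq("/data/events"))
    println(table.fileIndex.rootPaths)       // List(/data/events)
    println(table.fileIndex.partitionSchema) // List(year, month)
  }
}
```

A plain override of a lazy val cannot call super on it, but an overridden def can. This is why moving the body into a method is enough to enable reuse without changing how fileIndex is cached.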