[ https://issues.apache.org/jira/browse/SPARK-34423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283107#comment-17283107 ]
Omer Ozarslan commented on SPARK-34423:
---------------------------------------

If this sounds good, I can happily submit a PR. Thanks.

> Allow FileTable.fileIndex to be reused for custom partition schema in
> DataSourceV2 read path
> ---------------------------------------------------------------------
>
>                 Key: SPARK-34423
>                 URL: https://issues.apache.org/jira/browse/SPARK-34423
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Omer Ozarslan
>            Priority: Minor
>
> It is currently possible to provide a custom partition schema in the
> DataSourceV2 read path with custom implementations of
> PartitioningAwareFileIndex/PartitionSpec and by overriding fileIndex in
> a subclass of FileTable. However, since fileIndex is a lazy val, it is
> not possible to reuse it from the subclass (i.e., via super.fileIndex).
> [https://github.com/apache/spark/blob/e0053853c90d39ef6de9d59fb933525e20bae1fa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala#L44-L61]
> Duplicating this code in the subclass is possible but somewhat hacky;
> for example, the globbing function of DataSource is private API. I was
> wondering if this logic could be refactored into something like this:
> {code:java}
> def createFileIndex(): PartitioningAwareFileIndex = {
>   ...[current fileIndex logic]...
> }
>
> lazy val fileIndex: PartitioningAwareFileIndex = createFileIndex()
> {code}
> This would allow downstream code to reuse the fileIndex logic by
> wrapping it with custom implementations, as in the sketch below.
> (Note that this proposed change considers custom partition schemas in
> the read path only; the write path is out of the scope of this change.)
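>
> As a rough illustration (a sketch, not a concrete implementation), a
> downstream table could then wrap the refactored method as follows.
> CustomPartitionIndex is a hypothetical PartitioningAwareFileIndex that
> delegates file listing to the wrapped index but reports a partition
> schema supplied by the caller; everything apart from Spark's own
> classes is illustrative.
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex
> import org.apache.spark.sql.execution.datasources.v2.FileTable
> import org.apache.spark.sql.types.StructType
> import org.apache.spark.sql.util.CaseInsensitiveStringMap
>
> // Hypothetical table that reuses FileTable's globbing/listing logic and
> // overrides only the partition schema. Declared abstract because the
> // FileTable members unrelated to this proposal (formatName, inferSchema,
> // newScanBuilder, ...) are omitted from this sketch.
> abstract class CustomPartitionedTable(
>     sparkSession: SparkSession,
>     options: CaseInsensitiveStringMap,
>     paths: Seq[String],
>     userSpecifiedSchema: Option[StructType],
>     customPartitionSchema: StructType)
>   extends FileTable(sparkSession, options, paths, userSpecifiedSchema) {
>
>   override def createFileIndex(): PartitioningAwareFileIndex = {
>     // Reuse the built-in index (super.createFileIndex() is callable here
>     // because the proposal makes it a def rather than a lazy val) instead
>     // of duplicating its private globbing logic.
>     new CustomPartitionIndex(super.createFileIndex(), customPartitionSchema)
>   }
> }
> {code}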