andygrove commented on code in PR #1930:
URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167730097
##########
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -258,11 +258,15 @@ case class CometScanRule(session: SparkSession) extends Rule[SparkPlan] {
}
   private def selectScan(scanExec: FileSourceScanExec, partitionSchema: StructType): String = {
-    // TODO these checks are not yet exhaustive. For example, native_iceberg_compat does
-    // not support reading from S3
     val fallbackReasons = new ListBuffer[String]()
+    // native_iceberg_compat only supports local filesystem and S3
+    if (!scanExec.relation.inputFiles
+        .forall(path => path.startsWith("file://") || path.startsWith("s3a://"))) {
Review Comment:
@parthchandra I may need some guidance here. Simply accessing the input files causes some stats to be updated (such as `HiveCatalogMetrics.METRIC_FILES_DISCOVERED` and `HiveCatalogMetrics.METRIC_FILE_CACHE_HITS`), which leads to some test failures. I also wonder whether enumerating every input file adds overhead.
Do you know of a more efficient way for us to check for supported file locations?
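For illustration only, the scheme check itself could be factored out so it works on any collection of paths (whether that comes from `inputFiles` or from a cheaper source such as the relation's root paths). The object and method names below are hypothetical, not Comet APIs; this is a minimal sketch assuming paths are well-formed URIs:

```scala
// Hypothetical helper: checks whether every path uses a supported
// filesystem scheme. Names here are illustrative, not part of Comet.
object ScanLocationCheck {
  // Schemes that native_iceberg_compat is assumed to support,
  // matching the prefixes checked in the diff above.
  private val supportedSchemes = Set("file", "s3a")

  // Returns true only when every path has a recognized scheme.
  // Paths without a scheme (e.g. relative paths) are treated as unsupported.
  def allSupported(paths: Seq[String]): Boolean =
    paths.forall { p =>
      val scheme = java.net.URI.create(p).getScheme
      scheme != null && supportedSchemes.contains(scheme)
    }
}
```

Parsing the scheme via `java.net.URI` rather than `startsWith` also accepts `file:/` single-slash forms that Hadoop sometimes produces, though whether that matters here depends on how Spark normalizes the paths.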
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]