andygrove commented on code in PR #1930:
URL: https://github.com/apache/datafusion-comet/pull/1930#discussion_r2167730097
##########
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -258,11 +258,15 @@ case class CometScanRule(session: SparkSession) extends Rule[SparkPlan] {
}
   private def selectScan(scanExec: FileSourceScanExec, partitionSchema: StructType): String = {
-    // TODO these checks are not yet exhaustive. For example, native_iceberg_compat does
-    // not support reading from S3
     val fallbackReasons = new ListBuffer[String]()
+    // native_iceberg_compat only supports local filesystem and S3
+    if (!scanExec.relation.inputFiles
+        .forall(path => path.startsWith("file://") || path.startsWith("s3a://"))) {
Review Comment:
@parthchandra I may need some guidance here. Simply accessing the input files causes some stats to be updated (such as `HiveCatalogMetrics.METRIC_FILES_DISCOVERED` and `HiveCatalogMetrics.METRIC_FILE_CACHE_HITS`), which leads to some test failures. I also wonder whether enumerating every input file adds overhead.
Do you know of a more efficient way for us to check for supported file locations?
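For illustration only, the scheme check itself could be factored out so it works on any collection of paths (whether that comes from `inputFiles` or from a cheaper source such as the relation's root paths). The object and method names below are hypothetical, not Comet APIs; this is a minimal sketch assuming paths are well-formed URIs:

```scala
// Hypothetical helper: checks whether every path uses a supported
// filesystem scheme. Names here are illustrative, not part of Comet.
object ScanLocationCheck {
  // Schemes that native_iceberg_compat is assumed to support,
  // matching the prefixes checked in the diff above.
  private val supportedSchemes = Set("file", "s3a")

  // Returns true only when every path has a recognized scheme.
  // Paths without a scheme (e.g. relative paths) are treated as unsupported.
  def allSupported(paths: Seq[String]): Boolean =
    paths.forall { p =>
      val scheme = java.net.URI.create(p).getScheme
      scheme != null && supportedSchemes.contains(scheme)
    }
}
```

Parsing the scheme via `java.net.URI` rather than `startsWith` also accepts `file:/` single-slash forms that Hadoop sometimes produces, though whether that matters here depends on how Spark normalizes the paths.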
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]