[I] Spark multiple sources found for "iceberg" [iceberg]

via GitHub Wed, 09 Apr 2025 00:39:52 -0700


lkindere opened a new issue, #12750:
URL: https://github.com/apache/iceberg/issues/12750


   When executing Iceberg maintenance Spark actions on Databricks with DBR > 14.2, 
   the following exception is thrown:
   
   
   `RuntimeException: org.apache.spark.sql.AnalysisException: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.IcebergSource), please specify the fully qualified class name.
   Caused by: AnalysisException: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.IcebergSource), please specify the fully qualified class name.
        at org.apache.iceberg.util.ExceptionUtil.castAndThrow(ExceptionUtil.java:39)
        at org.apache.iceberg.util.Tasks.throwOne(Tasks.java:595)
        at org.apache.iceberg.util.Tasks.access$100(Tasks.java:42)
        at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:394)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:201)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
        at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:284)
        at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.execute(RewriteDataFilesSparkAction.java:178)
        at wcdp2.compaction.core.DataFileCompactor$.compact(DataFileCompactor.scala:31)
        at wcdp2.compaction.core.Compactor$.$anonfun$runCompaction$1(Compactor.scala:13)
        at wcdp2.compaction.core.Compactor$.$anonfun$runCompaction$1$adapted(Compactor.scala:8)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at wcdp2.compaction.core.Compactor$.runCompaction(Compactor.scala:8)
        at wcdp2.compaction.CompactorMain$.main(CompactorMain.scala:16)
   [...]
   Caused by: org.apache.spark.sql.AnalysisException: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.IcebergSource), please specify the fully qualified class name.
        at org.apache.spark.sql.errors.QueryCompilationErrors$.findMultipleDataSourceError(QueryCompilationErrors.scala:1935)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:830)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:853)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:342)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
        at org.apache.iceberg.spark.actions.SparkBinPackDataRewriter.doRewrite(SparkBinPackDataRewriter.java:52)
        at org.apache.iceberg.spark.actions.SparkSizeBasedDataRewriter.rewrite(SparkSizeBasedDataRewriter.java:58)
        at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.lambda$rewriteFiles$0(RewriteDataFilesSparkAction.java:243)
        at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:59)
        at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:51)
        at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:130)
        at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.rewriteFiles(RewriteDataFilesSparkAction.java:241)
        at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.lambda$doExecute$2(RewriteDataFilesSparkAction.java:286)
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
        at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750) `
   
   It seems to be caused by this line:
   https://github.com/apache/iceberg/blob/9ce0fcf0af7becf25ad9fc996c3bad2afdcfd33d/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackDataRewriter.java#L48
   
   The format "iceberg" is ambiguous there because Databricks also registers an 
   "iceberg" data source. Would it be possible to use the fully qualified class 
   name in this case, e.g. "org.apache.iceberg.spark.source.IcebergSource"?
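
To illustrate why the fully qualified name would sidestep the ambiguity, here is a minimal sketch of short-name resolution. It is a simplified, hypothetical model and not Spark's actual `DataSource.lookupDataSource` code: the class `DataSourceLookupSketch`, its `REGISTERED` map, and the `lookup` method are made up for illustration. Only the two provider class names and the wording of the error are taken from the exception above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified, hypothetical model of resolving a data source format string.
// This is NOT Spark's implementation; it only mirrors the behavior seen in the error above.
final class DataSourceLookupSketch {

    // Provider class name -> short name it registers under.
    // Both class names come from the exception message; the map itself is an assumption.
    private static final Map<String, String> REGISTERED = Map.of(
        "com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource", "iceberg",
        "org.apache.iceberg.spark.source.IcebergSource", "iceberg");

    // Returns the provider class name that the given format string resolves to.
    static String lookup(String format) {
        // A fully qualified class name identifies exactly one provider.
        if (REGISTERED.containsKey(format)) {
            return format;
        }
        // Otherwise treat the argument as a short name and collect every provider that claims it.
        List<String> matches = REGISTERED.entrySet().stream()
            .filter(e -> e.getValue().equalsIgnoreCase(format))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("Failed to find data source: " + format);
        }
        if (matches.size() > 1) {
            // Two providers share the short name, so the short name alone cannot pick one.
            throw new IllegalStateException("Multiple sources found for " + format
                + " (" + String.join(", ", matches)
                + "), please specify the fully qualified class name.");
        }
        return matches.get(0);
    }
}
```

In this model `DataSourceLookupSketch.lookup("iceberg")` throws because both providers claim the short name, while `DataSourceLookupSketch.lookup("org.apache.iceberg.spark.source.IcebergSource")` resolves to the Iceberg source directly.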


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

