lkindere opened a new issue, #12750:
URL: https://github.com/apache/iceberg/issues/12750
When executing Iceberg maintenance Spark actions on Databricks with DBR > 14.2,
the following exception is thrown:
`RuntimeException: org.apache.spark.sql.AnalysisException: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.IcebergSource), please specify the fully qualified class name.
Caused by: AnalysisException: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.IcebergSource), please specify the fully qualified class name.
at org.apache.iceberg.util.ExceptionUtil.castAndThrow(ExceptionUtil.java:39)
at org.apache.iceberg.util.Tasks.throwOne(Tasks.java:595)
at org.apache.iceberg.util.Tasks.access$100(Tasks.java:42)
at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:394)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:201)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:284)
at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.execute(RewriteDataFilesSparkAction.java:178)
at wcdp2.compaction.core.DataFileCompactor$.compact(DataFileCompactor.scala:31)
at wcdp2.compaction.core.Compactor$.$anonfun$runCompaction$1(Compactor.scala:13)
at wcdp2.compaction.core.Compactor$.$anonfun$runCompaction$1$adapted(Compactor.scala:8)
at scala.collection.immutable.List.foreach(List.scala:431)
at wcdp2.compaction.core.Compactor$.runCompaction(Compactor.scala:8)
at wcdp2.compaction.CompactorMain$.main(CompactorMain.scala:16)
[...]
Caused by: org.apache.spark.sql.AnalysisException: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.IcebergSource), please specify the fully qualified class name.
at org.apache.spark.sql.errors.QueryCompilationErrors$.findMultipleDataSourceError(QueryCompilationErrors.scala:1935)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:830)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:853)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:342)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
at org.apache.iceberg.spark.actions.SparkBinPackDataRewriter.doRewrite(SparkBinPackDataRewriter.java:52)
at org.apache.iceberg.spark.actions.SparkSizeBasedDataRewriter.rewrite(SparkSizeBasedDataRewriter.java:58)
at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.lambda$rewriteFiles$0(RewriteDataFilesSparkAction.java:243)
at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:59)
at org.apache.iceberg.spark.JobGroupUtils.withJobGroupInfo(JobGroupUtils.java:51)
at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:130)
at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.rewriteFiles(RewriteDataFilesSparkAction.java:241)
at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.lambda$doExecute$2(RewriteDataFilesSparkAction.java:286)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)`
This seems to be caused by
https://github.com/apache/iceberg/blob/9ce0fcf0af7becf25ad9fc996c3bad2afdcfd33d/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackDataRewriter.java#L48
where the format "iceberg" is ambiguous because Databricks also has an "iceberg"
data source. Would it be possible to use the fully qualified class name in this
case, e.g. "org.apache.iceberg.spark.source.IcebergSource"?
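
To illustrate why the fully qualified name should avoid the clash, here is a minimal Python sketch of short-name resolution. It is a simplified model and not Spark's actual lookup code: the registry dict, `lookup_data_source`, and `AmbiguousSourceError` are hypothetical names made up for this example. Only the two class names and the error wording come from the stack trace above.

```python
# Simplified model of data source name resolution. This is NOT Spark's code;
# the registry and function names below are hypothetical, for illustration only.

ICEBERG_FQCN = "org.apache.iceberg.spark.source.IcebergSource"
DATABRICKS_FQCN = (
    "com.databricks.sql.transaction.tahoe.uniform.sources."
    "IcebergBrowseOnlyDataSource"
)

# Hypothetical registry: fully qualified class name -> registered short name.
# Both classes claim the short name "iceberg", as in the reported error.
REGISTERED_SOURCES = {
    DATABRICKS_FQCN: "iceberg",
    ICEBERG_FQCN: "iceberg",
}


class AmbiguousSourceError(Exception):
    """Raised when a short name matches more than one registered source."""


def lookup_data_source(provider, registry=REGISTERED_SOURCES):
    """Resolve a provider string to exactly one fully qualified class name."""
    # A fully qualified class name identifies a single class, so there is
    # nothing to disambiguate.
    if provider in registry:
        return provider

    # Otherwise treat the string as a short name and collect every match.
    matches = sorted(
        fqcn for fqcn, alias in registry.items()
        if alias.lower() == provider.lower()
    )
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"Failed to find data source: {provider}")
    raise AmbiguousSourceError(
        f"Multiple sources found for {provider} ({', '.join(matches)}), "
        "please specify the fully qualified class name."
    )
```

In this model, `lookup_data_source("iceberg")` raises `AmbiguousSourceError`, while `lookup_data_source(ICEBERG_FQCN)` returns the Iceberg class directly. The proposed change to `SparkBinPackDataRewriter` would rely on the same idea: passing the class name instead of the short name to `format(...)`.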