vitaliikalmykov opened a new issue, #15602:
URL: https://github.com/apache/iceberg/issues/15602
### Apache Iceberg version
1.10.1 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Hello.
The Spark job fails during a MERGE query when SPJ (storage-partitioned join) is enabled.
Merge query:

```python
spark.sql("""
    MERGE INTO lines_actions t
    USING staging.lines_actions s
    ON t.created >= '2026-03-10'
       AND t.filid = s.filid
       AND t.actiontype = s.actiontype
       AND t.actionid = s.actionid
    WHEN MATCHED AND t.__source_ts_ms < s.__source_ts_ms THEN UPDATE SET
        created = s.created, discpercent = s.discpercent, discount = s.discount,
        varchardata = s.varchardata, __deleted = s.__deleted, __op = s.__op,
        __table = s.__table, __source_ts_ms = s.__source_ts_ms,
        __topic = s.__topic, __schema_id = s.__schema_id
    WHEN NOT MATCHED THEN INSERT
        (filid, created, actiontype, actionid, discpercent, discount,
         varchardata, __deleted, __op, __table, __source_ts_ms, __topic,
         __schema_id)
    VALUES
        (s.filid, s.created, s.actiontype, s.actionid, s.discpercent,
         s.discount, s.varchardata, s.__deleted, s.__op, s.__table,
         s.__source_ts_ms, s.__topic, s.__schema_id)
""")
```
Table DDL:

```sql
CREATE TABLE lines_actions (
    filid          integer,
    created        timestamp,
    actiontype     integer,
    actionid       integer,
    discpercent    decimal(9, 3),
    discount       decimal(9, 2),
    varchardata    varchar,
    __deleted      varchar,
    __op           varchar,
    __table        varchar,
    __source_ts_ms bigint,
    __topic        varchar,
    __schema_id    integer
)
WITH (
    partitioning = ARRAY['month(created)'],
    sorted_by    = ARRAY['created ASC NULLS FIRST']
);
```
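The `WITH (partitioning = ARRAY[...])` clause above is Trino-style DDL. For reproducing from Spark itself, a roughly equivalent Spark SQL definition might look like the following sketch (the `USING iceberg` form and the `WRITE ORDERED BY` extension are standard Iceberg Spark DDL; any catalog/namespace prefix is omitted and would be environment-specific):

```sql
-- Sketch of an equivalent definition in Spark SQL with the Iceberg extensions;
-- not taken from the report, provided only as a reproduction aid.
CREATE TABLE lines_actions (
    filid          INT,
    created        TIMESTAMP,
    actiontype     INT,
    actionid       INT,
    discpercent    DECIMAL(9, 3),
    discount       DECIMAL(9, 2),
    varchardata    STRING,
    __deleted      STRING,
    __op           STRING,
    __table        STRING,
    __source_ts_ms BIGINT,
    __topic        STRING,
    __schema_id    INT
)
USING iceberg
PARTITIONED BY (month(created));

-- Iceberg's Spark SQL extensions set the sort order separately:
ALTER TABLE lines_actions WRITE ORDERED BY created ASC NULLS FIRST;
```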
SPJ config (commas added; two entries were missing them in the original paste):

```
"spark.sql.sources.v2.bucketing.enabled": "true",
"spark.sql.sources.v2.bucketing.pushPartValues.enabled": "true",
"spark.sql.iceberg.planning.preserve-data-grouping": "true",
"spark.sql.requireAllClusterKeysForCoPartition": "false",
"spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled": "true",
"spark.sql.join.preferSortMergeJoin": "false",
"spark.sql.autoBroadcastJoinThreshold": "-1",
"spark.sql.adaptive.enabled": "false"
```
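For reference, the same settings can be collected in a plain Python dict and applied when building a SparkSession. The conf keys and values below are exactly those from this report; the `apply_confs` helper and the session-bootstrap usage are illustrative, not part of the report:

```python
# The SPJ-related conf keys/values from this report, gathered in one dict.
spj_confs = {
    "spark.sql.sources.v2.bucketing.enabled": "true",
    "spark.sql.sources.v2.bucketing.pushPartValues.enabled": "true",
    "spark.sql.iceberg.planning.preserve-data-grouping": "true",
    "spark.sql.requireAllClusterKeysForCoPartition": "false",
    "spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled": "true",
    "spark.sql.join.preferSortMergeJoin": "false",
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    "spark.sql.adaptive.enabled": "false",
}

def apply_confs(builder, confs):
    """Apply each key/value to a SparkSession.Builder-like object via .config()."""
    for key, value in confs.items():
        builder = builder.config(key, value)
    return builder

# Usage (requires pyspark; hypothetical app name):
#   from pyspark.sql import SparkSession
#   spark = apply_confs(SparkSession.builder.appName("spj-repro"),
#                       spj_confs).getOrCreate()
```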
Error message:

```
java.lang.UnsupportedOperationException: Can't retrieve values from an empty struct
    at org.apache.iceberg.EmptyStructLike.get(EmptyStructLike.java:40)
    at org.apache.iceberg.spark.source.StructInternalRow.isNullAt(StructInternalRow.java:104)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare(Unknown Source)
    at org.apache.spark.sql.catalyst.util.InternalRowComparableWrapper.equals(InternalRowComparableWrapper.scala:51)
    at scala.runtime.BoxesRunTime.equals2(BoxesRunTime.java:137)
    at scala.runtime.BoxesRunTime.equals(BoxesRunTime.java:123)
    at scala.collection.immutable.Map$Map1.getOrElse(Map.scala:168)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.$anonfun$inputRDD$14(BatchScanExec.scala:218)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:215)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:143)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.doExecuteColumnar(DataSourceV2ScanExecBase.scala:214)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.doExecuteColumnar$(DataSourceV2ScanExecBase.scala:212)
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.doExecuteColumnar(BatchScanExec.scala:41)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:305)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:280)
    at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:251)
    at org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:678)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:305)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:280)
    at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:251)
    at org.apache.spark.sql.execution.ColumnarToRowExec.inputRDDs(Columnar.scala:432)
    at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:313)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:992)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:228)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:305)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:280)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:224)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:434)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:534)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.org$apache$spark$sql$execution$exchange$BroadcastExchangeExec$$doComputeRelation(BroadcastExchangeExec.scala:198)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anon$1.doCompute(BroadcastExchangeExec.scala:191)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anon$1.doCompute(BroadcastExchangeExec.scala:184)
    at org.apache.spark.sql.execution.AsyncDriverOperation.$anonfun$compute$1(AsyncDriverOperation.scala:75)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:391)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:383)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withExecutionId$1(SQLExecution.scala:366)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:412)
    at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:363)
    at org.apache.spark.sql.execution.AsyncDriverOperation.compute(AsyncDriverOperation.scala:69)
    at org.apache.spark.sql.execution.AsyncDriverOperation.$anonfun$computeFuture$1(AsyncDriverOperation.scala:55)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:435)
    at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:430)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.checkNoFailures(AdaptiveExecutor.scala:175)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.doRun(AdaptiveExecutor.scala:97)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.tryRunningAndGetFuture(AdaptiveExecutor.scala:75)
    at org.apache.spark.sql.execution.adaptive.AdaptiveExecutor.execute(AdaptiveExecutor.scala:59)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:304)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:996)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:303)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:610)
    at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.doExecute(AdaptiveSparkPlanExec.scala:596)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:228)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:305)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:280)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:224)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:1326)
    at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:1324)
    at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:1236)
    at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:1302)
    at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:1301)
    at org.apache.spark.sql.execution.datasources.v2.ReplaceDataExec.run(WriteToDataSourceV2Exec.scala:1236)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    ... 46 elided
```
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [x] I cannot contribute a fix for this bug at this time