sristiraj opened a new issue, #6463:
URL: https://github.com/apache/iceberg/issues/6463
### Apache Iceberg version
1.1.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I am using Iceberg as the table format for a Spark ingestion / data lake setup. For now the data is test data with two columns, ["id", "name"], and the table is partitioned on the "id" column. When I run a DELETE against the Iceberg table through a Spark SQL statement, it fails with the error shown below, while SELECT and other write operations complete fine.
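For reference, the table was created along these lines (a sketch only; the exact DDL is assumed rather than copied from the project):
```
// Hypothetical DDL for the test table: two columns, partitioned on "id".
spark.sql(
  """CREATE TABLE demo.iceberg.test1 (id INT, name STRING)
    |USING iceberg
    |PARTITIONED BY (id)""".stripMargin)
```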
Here is the code I am trying to run:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object NessieTest {
  def main(args: Array[String]): Unit = {
    // Local session with the Iceberg SQL extensions, a Hive-backed session catalog,
    // and a Hadoop catalog named "demo" used as the default catalog.
    val spark = SparkSession.builder
      .appName("nessie")
      .master("local[*]")
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
      .config("spark.sql.catalog.spark_catalog.type", "hive")
      .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.demo.type", "hadoop")
      .config("spark.sql.catalog.demo.warehouse", "/home/wicked/Downloads/nessie/warehouse")
      .config("spark.sql.defaultCatalog", "demo")
      .getOrCreate()

    // Switch the table to Iceberg format version 2.
    spark.sql("alter table demo.iceberg.test1 SET TBLPROPERTIES('format-version'='2')")
    // spark.sql("alter table demo.iceberg.test2 DROP partition field id")
    // spark.sql("alter table demo.iceberg.test3 ADD partition field name")

    val df = spark.createDataFrame(Seq((1, "hello1"), (3, "hey"))).toDF("id", "name")
    // df.write.format("iceberg").mode("overwrite").save("iceberg.test")
    // df.writeTo("iceberg.test3").tableProperty("write.format.default", "parquet").partitionedBy(col("name")).createOrReplace()
    // val df1 = spark.sql("select * from demo.iceberg.test1")

    // The DELETE statement below is where the failure occurs; the SELECT runs fine on its own.
    spark.sql("delete from demo.iceberg.test where name='hello'")
    val df1 = spark.sql("select * from demo.iceberg.test1 where name='hello'")
    df1.show()
  }
}
```
The error I receive is below:
```
sbt:nessie> run
[info] compiling 1 Scala source to /home/wicked/Downloads/nessie/target/scala-2.12/classes ...
[info] running NessieTest
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
22/12/20 20:45:17 WARN Utils: Your hostname, wicked resolves to a loopback address: 127.0.1.1; using 192.168.0.107 instead (on interface wlp58s0)
22/12/20 20:45:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/12/20 20:45:18 INFO SparkContext: Running Spark version 3.3.0
22/12/20 20:45:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/20 20:45:18 INFO ResourceUtils: ==============================================================
22/12/20 20:45:18 INFO ResourceUtils: No custom resources configured for spark.driver.
22/12/20 20:45:18 INFO ResourceUtils: ==============================================================
22/12/20 20:45:18 INFO SparkContext: Submitted application: nessie
22/12/20 20:45:18 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/12/20 20:45:18 INFO ResourceProfile: Limiting resource is cpu
22/12/20 20:45:18 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/12/20 20:45:18 INFO SecurityManager: Changing view acls to: wicked
22/12/20 20:45:18 INFO SecurityManager: Changing modify acls to: wicked
22/12/20 20:45:18 INFO SecurityManager: Changing view acls groups to:
22/12/20 20:45:18 INFO SecurityManager: Changing modify acls groups to:
22/12/20 20:45:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(wicked); groups with view permissions: Set(); users with modify permissions: Set(wicked); groups with modify permissions: Set()
22/12/20 20:45:18 INFO Utils: Successfully started service 'sparkDriver' on port 38697.
22/12/20 20:45:18 INFO SparkEnv: Registering MapOutputTracker
22/12/20 20:45:18 INFO SparkEnv: Registering BlockManagerMaster
22/12/20 20:45:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/12/20 20:45:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/12/20 20:45:18 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/12/20 20:45:18 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-fed3fe61-786b-4244-a04d-a64de4ea8108
22/12/20 20:45:18 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
22/12/20 20:45:18 INFO SparkEnv: Registering OutputCommitCoordinator
22/12/20 20:45:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
22/12/20 20:45:19 INFO Utils: Successfully started service 'SparkUI' on port 4048.
22/12/20 20:45:19 INFO Executor: Starting executor ID driver on host 192.168.0.107
22/12/20 20:45:19 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
22/12/20 20:45:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44117.
22/12/20 20:45:19 INFO NettyBlockTransferService: Server created on 192.168.0.107:44117
22/12/20 20:45:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/12/20 20:45:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.107:44117 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 WARN SharedState: URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory
22/12/20 20:45:19 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/12/20 20:45:19 INFO SharedState: Warehouse path is 'file:/home/wicked/Downloads/nessie/spark-warehouse'.
22/12/20 20:45:20 INFO BaseMetastoreCatalog: Table loaded by catalog: demo.iceberg.test1
Exception in thread "sbt-bg-threads-22" java.lang.NoClassDefFoundError: scala/jdk/CollectionConverters$
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$UnresolvedIcebergTable$.isIcebergTable(IcebergSparkSqlExtensionsParser.scala:170)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$UnresolvedIcebergTable$.unapply(IcebergSparkSqlExtensionsParser.scala:162)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$$anonfun$replaceRowLevelCommands$1.applyOrElse(IcebergSparkSqlExtensionsParser.scala:144)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$$anonfun$replaceRowLevelCommands$1.applyOrElse(IcebergSparkSqlExtensionsParser.scala:143)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:160)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:159)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.replaceRowLevelCommands(IcebergSparkSqlExtensionsParser.scala:143)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parsePlan(IcebergSparkSqlExtensionsParser.scala:138)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
    at NessieTest$.main(app.scala:21)
    at NessieTest.main(app.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at sbt.Run.invokeMain(Run.scala:143)
    at sbt.Run.execute$1(Run.scala:93)
    at sbt.Run.$anonfun$runWithLoader$5(Run.scala:120)
    at sbt.Run$.executeSuccess(Run.scala:186)
    at sbt.Run.runWithLoader(Run.scala:120)
    at sbt.Defaults$.$anonfun$bgRunTask$6(Defaults.scala:1981)
    at sbt.Defaults$.$anonfun$termWrapper$2(Defaults.scala:1920)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:369)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: scala.jdk.CollectionConverters$
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:102)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 40 more
```
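For context, `scala.jdk.CollectionConverters` is part of the Scala 2.13 standard library (on 2.12 it comes from `scala-collection-compat`), so the `NoClassDefFoundError` above usually indicates a mismatch between the project's Scala 2.12 build (visible in the sbt output) and the Iceberg/Spark artifacts on the classpath. A minimal `build.sbt` sketch, with assumed versions, that keeps every artifact on Scala 2.12:
```
// build.sbt — a sketch with assumed coordinates, not the project's actual build file.
scalaVersion := "2.12.17"

libraryDependencies ++= Seq(
  // Spark compiled for Scala 2.12 (resolved via %% from scalaVersion).
  "org.apache.spark" %% "spark-sql" % "3.3.0",
  // Iceberg runtime bundle matching Spark 3.3 and Scala 2.12; a _2.13 bundle on a
  // Scala 2.12 classpath cannot find scala.jdk.CollectionConverters at runtime.
  "org.apache.iceberg" % "iceberg-spark-runtime-3.3_2.12" % "1.1.0"
)
```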
Please advise on how to overcome this. Thanks in advance!