sristiraj opened a new issue, #6463:
URL: https://github.com/apache/iceberg/issues/6463
### Apache Iceberg version
1.1.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I am using Iceberg as the table format for a Spark ingestion / data lake setup. For now the data is test data with two columns, ["id", "name"], and the table is partitioned on the "id" column. When I run a DELETE against the Iceberg table through a Spark SQL statement, it fails with the error shown below, while SELECT and other write operations complete fine.
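For reference, the table was created along these lines (a sketch only; the exact DDL is assumed rather than copied from the project):
```
// Hypothetical DDL for the test table: two columns, partitioned on "id".
spark.sql(
  """CREATE TABLE demo.iceberg.test1 (id INT, name STRING)
    |USING iceberg
    |PARTITIONED BY (id)""".stripMargin)
```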
Here is the code I am trying to run:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object NessieTest {
  def main(args: Array[String]): Unit = {
    // Local session with the Iceberg SQL extensions, a Hive-backed session catalog,
    // and a Hadoop catalog named "demo" used as the default catalog.
    val spark = SparkSession.builder
      .appName("nessie")
      .master("local[*]")
      .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
      .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
      .config("spark.sql.catalog.spark_catalog.type", "hive")
      .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.demo.type", "hadoop")
      .config("spark.sql.catalog.demo.warehouse", "/home/wicked/Downloads/nessie/warehouse")
      .config("spark.sql.defaultCatalog", "demo")
      .getOrCreate()

    // Switch the table to Iceberg format version 2.
    spark.sql("alter table demo.iceberg.test1 SET TBLPROPERTIES('format-version'='2')")
    // spark.sql("alter table demo.iceberg.test2 DROP partition field id")
    // spark.sql("alter table demo.iceberg.test3 ADD partition field name")

    val df = spark.createDataFrame(Seq((1, "hello1"), (3, "hey"))).toDF("id", "name")
    // df.write.format("iceberg").mode("overwrite").save("iceberg.test")
    // df.writeTo("iceberg.test3").tableProperty("write.format.default", "parquet").partitionedBy(col("name")).createOrReplace()
    // val df1 = spark.sql("select * from demo.iceberg.test1")

    // The DELETE statement below is where the failure occurs; the SELECT runs fine on its own.
    spark.sql("delete from demo.iceberg.test where name='hello'")
    val df1 = spark.sql("select * from demo.iceberg.test1 where name='hello'")
    df1.show()
  }
}
```
The error I receive is below:
```
sbt:nessie> run
[info] compiling 1 Scala source to /home/wicked/Downloads/nessie/target/scala-2.12/classes ...
[info] running NessieTest
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
22/12/20 20:45:17 WARN Utils: Your hostname, wicked resolves to a loopback address: 127.0.1.1; using 192.168.0.107 instead (on interface wlp58s0)
22/12/20 20:45:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/12/20 20:45:18 INFO SparkContext: Running Spark version 3.3.0
22/12/20 20:45:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/20 20:45:18 INFO ResourceUtils: ==============================================================
22/12/20 20:45:18 INFO ResourceUtils: No custom resources configured for spark.driver.
22/12/20 20:45:18 INFO ResourceUtils: ==============================================================
22/12/20 20:45:18 INFO SparkContext: Submitted application: nessie
22/12/20 20:45:18 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
22/12/20 20:45:18 INFO ResourceProfile: Limiting resource is cpu
22/12/20 20:45:18 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/12/20 20:45:18 INFO SecurityManager: Changing view acls to: wicked
22/12/20 20:45:18 INFO SecurityManager: Changing modify acls to: wicked
22/12/20 20:45:18 INFO SecurityManager: Changing view acls groups to:
22/12/20 20:45:18 INFO SecurityManager: Changing modify acls groups to:
22/12/20 20:45:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(wicked); groups with view permissions: Set(); users with modify permissions: Set(wicked); groups with modify permissions: Set()
22/12/20 20:45:18 INFO Utils: Successfully started service 'sparkDriver' on port 38697.
22/12/20 20:45:18 INFO SparkEnv: Registering MapOutputTracker
22/12/20 20:45:18 INFO SparkEnv: Registering BlockManagerMaster
22/12/20 20:45:18 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/12/20 20:45:18 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/12/20 20:45:18 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/12/20 20:45:18 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-fed3fe61-786b-4244-a04d-a64de4ea8108
22/12/20 20:45:18 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
22/12/20 20:45:18 INFO SparkEnv: Registering OutputCommitCoordinator
22/12/20 20:45:18 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
22/12/20 20:45:19 WARN Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
22/12/20 20:45:19 INFO Utils: Successfully started service 'SparkUI' on port 4048.
22/12/20 20:45:19 INFO Executor: Starting executor ID driver on host 192.168.0.107
22/12/20 20:45:19 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
22/12/20 20:45:19 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44117.
22/12/20 20:45:19 INFO NettyBlockTransferService: Server created on 192.168.0.107:44117
22/12/20 20:45:19 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/12/20 20:45:19 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.107:44117 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.107, 44117, None)
22/12/20 20:45:19 WARN SharedState: URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory
22/12/20 20:45:19 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/12/20 20:45:19 INFO SharedState: Warehouse path is 'file:/home/wicked/Downloads/nessie/spark-warehouse'.
22/12/20 20:45:20 INFO BaseMetastoreCatalog: Table loaded by catalog: demo.iceberg.test1
Exception in thread "sbt-bg-threads-22" java.lang.NoClassDefFoundError: scala/jdk/CollectionConverters$
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$UnresolvedIcebergTable$.isIcebergTable(IcebergSparkSqlExtensionsParser.scala:170)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$UnresolvedIcebergTable$.unapply(IcebergSparkSqlExtensionsParser.scala:162)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$$anonfun$replaceRowLevelCommands$1.applyOrElse(IcebergSparkSqlExtensionsParser.scala:144)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser$$anonfun$replaceRowLevelCommands$1.applyOrElse(IcebergSparkSqlExtensionsParser.scala:143)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:160)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:159)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.replaceRowLevelCommands(IcebergSparkSqlExtensionsParser.scala:143)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parsePlan(IcebergSparkSqlExtensionsParser.scala:138)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
    at NessieTest$.main(app.scala:21)
    at NessieTest.main(app.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at sbt.Run.invokeMain(Run.scala:143)
    at sbt.Run.execute$1(Run.scala:93)
    at sbt.Run.$anonfun$runWithLoader$5(Run.scala:120)
    at sbt.Run$.executeSuccess(Run.scala:186)
    at sbt.Run.runWithLoader(Run.scala:120)
    at sbt.Defaults$.$anonfun$bgRunTask$6(Defaults.scala:1981)
    at sbt.Defaults$.$anonfun$termWrapper$2(Defaults.scala:1920)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.util.Try$.apply(Try.scala:213)
    at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:369)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: scala.jdk.CollectionConverters$
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
    at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:102)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 40 more
```
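For context, `scala.jdk.CollectionConverters` is part of the Scala 2.13 standard library (on 2.12 it comes from `scala-collection-compat`), so the `NoClassDefFoundError` above usually indicates a mismatch between the project's Scala 2.12 build (visible in the sbt output) and the Iceberg/Spark artifacts on the classpath. A minimal `build.sbt` sketch, with assumed versions, that keeps every artifact on Scala 2.12:
```
// build.sbt — a sketch with assumed coordinates, not the project's actual build file.
scalaVersion := "2.12.17"

libraryDependencies ++= Seq(
  // Spark compiled for Scala 2.12 (resolved via %% from scalaVersion).
  "org.apache.spark" %% "spark-sql" % "3.3.0",
  // Iceberg runtime bundle matching Spark 3.3 and Scala 2.12; a _2.13 bundle on a
  // Scala 2.12 classpath cannot find scala.jdk.CollectionConverters at runtime.
  "org.apache.iceberg" % "iceberg-spark-runtime-3.3_2.12" % "1.1.0"
)
```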
Please advise on how to overcome this. Thanks in advance!