Re: hive: spark as execution engine. class not found problem

2023-10-17 Thread Vijay Shankar
UNSUBSCRIBE

hive: spark as execution engine. class not found problem

2023-10-17 Thread Amirhossein Kabiri
I used Ambari to configure and install Hive and Spark. I want to insert into a
Hive table using Spark as the execution engine, but I run into this weird
error:

Job failed with java.lang.ClassNotFoundException: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
2023-10-17 10:07:42,972 ERROR [c4aeb932-743e-4736-b00f-6b905381fa03 main] status.SparkJobMonitor: Job failed with java.lang.ClassNotFoundException: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
com.esotericsoftware.kryo.KryoException: Unable to find class: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
Serialization trace:
invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:181)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:709)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:206)
at org.apache.hadoop.hive.ql.exec.spark.KryoSerializer.deserialize(KryoSerializer.java:60)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:329)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
... 15 more

2023-10-17 10:07:43,067 INFO  [c4aeb932-743e-4736-b00f-6b905381fa03 main] reexec.ReOptimizePlugin: ReOptimization: retryPossible: false
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.

The weird part is that Hive generates this class name itself and then asks me
where to find it! I would appreciate any help locating and solving the problem.
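
If I read the trace right, Kryo embeds the fully qualified class name as a
string in the serialized plan and resolves it with Class.forName() when
deserializing SparkWork, so a writer and reader that disagree about the stream
layout can end up interpreting arbitrary payload bytes (here, what looks like a
Hive query id missing its first character) as a class name. Below is a minimal
standalone sketch of that mechanism, outside Hive; the class name
KryoNameResolutionDemo and the version-mismatch explanation in the comments are
my own illustration, not something from the Hive docs:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

public class KryoNameResolutionDemo {
    public static void main(String[] args) {
        Kryo kryo = new Kryo();
        // Without up-front registration, Kryo writes the class name into the
        // stream as a string and calls Class.forName() on it when reading
        // back (DefaultClassResolver.readName, as in the trace above).
        kryo.setRegistrationRequired(false);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Output out = new Output(bytes)) {
            kryo.writeClassAndObject(out, new ArrayList<String>());
        }

        // Reading the same bytes back resolves java.util.ArrayList fine.
        Input in = new Input(new ByteArrayInputStream(bytes.toByteArray()));
        Object roundTripped = kryo.readClassAndObject(in);
        System.out.println("resolved: " + roundTripped.getClass().getName());

        // If the reader's Kryo/Hive/Spark versions do not match the writer's,
        // it can land on payload bytes instead of the name field and fail
        // with "Unable to find class: <garbage>" -- my guess at why a
        // query-id-like string shows up here as a class name.
    }
}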

Note: Ambari, Hadoop, Hive, ZooKeeper and Spark all work well according to the
Ambari service health check.
Note: Since I didn't find any Spark-specific hive-site.xml, I added the
following configs to the hive-site.xml file:

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>

<property>
  <name>hive.spark.warehouse.location</name>
  <value>/tmp/spark/warehouse</value>
</property>

<property>
  <name>hive.spark.sql.execution.mode</name>
  <value>adaptive</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions</name>
  <value>200</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions.pernode</name>
  <value>2</value>
</property>

<property>
  <name>hive.spark.sql.memory.fraction</name>
  <value>0.6</value>
</property>

<property>
  <name>hive.spark.sql.codegen.enabled</name>
  <value>true</value>
</property>

<property>
  <name>spark.sql.hive.hiveserver2.jdbc.url</name>
  <value>jdbc:hive2://my.ambari.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</value>
</property>

<property>
  <name>spark.datasource.hive.warehouse.load.staging.dir</name>
  <value>/tmp</value>
</property>

<property>
  <name>spark.hadoop.hive.zookeeper.quorum</name>
  <value>my.ambari.com:2181</value>
</property>

<property>
  <name>spark.datasource.hive.warehouse.write.path.strictColumnNamesMapping</name>
  <value>true</value>
</property>

<property>
  <name>spark.sql.hive.conf.list</name>
  <value>hive.vectorized.execution.filesink.arrow.native.enabled=true;hive.vectorized.execution.enabled=true</value>
</property>

<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>3ms</value>
</property>

<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>30ms</value>
</property>

<property>
  <name>hive.hook.proto.base-directory</name>
  <value>/tmp/hive/hooks</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions</name>
  <value>200</value>
</property>

<property>
  <name>hive.strict.managed.tables</name>
  <value>true</value>
</property>

<property>
  <name>hive.stats.fetch.partition.stats</name>
  <value>true</value>
</property>

<property>
  <name>hive.spark.sql.memory.fraction</name>
  <value>0.6</value>
</property>

<property>
  <name>hive.spark.sql.execution.mode</name>
  <value>spark</value>
</property>

<property>
  <name>hive.spark.sql.codegen.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.heapsize</name>
  <value>2g</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions.pernode</name>
  <value>100</value>
</property>

<property>
  <name>hive.spark.warehouse.location</name>
  <value>/user/hive/warehouse</value>
</property>
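
For what it's worth, here is the small sanity check I plan to run on the
HiveServer2 host to confirm these values are actually being picked up; the
class name PrintEffectiveHiveConf is mine, and it assumes hive-site.xml plus
the Hive and Hadoop jars are on the classpath:

import org.apache.hadoop.hive.conf.HiveConf;

// Debugging sketch: new HiveConf() loads hive-site.xml from the classpath,
// so this prints the settings a Hive session would actually resolve.
public class PrintEffectiveHiveConf {
    public static void main(String[] args) {
        HiveConf conf = new HiveConf();
        String[] keys = {
            "hive.execution.engine",
            "hive.spark.client.connect.timeout",
            "hive.spark.client.server.connect.timeout",
            "hive.spark.sql.execution.mode",
        };
        for (String key : keys) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}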