I used Ambari to configure and install Hive and Spark. I want to insert into a
Hive table using the Spark execution engine, but I run into this weird error
(a minimal example of the failing statement is shown after the stack trace).
The error is:

Job failed with java.lang.ClassNotFoundException: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
2023-10-17 10:07:42,972 ERROR [c4aeb932-743e-4736-b00f-6b905381fa03 main] status.SparkJobMonitor: Job failed with java.lang.ClassNotFoundException: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
com.esotericsoftware.kryo.KryoException: Unable to find class: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
Serialization trace:
invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:160)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
        at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:693)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:181)
        at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:709)
        at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:206)
        at org.apache.hadoop.hive.ql.exec.spark.KryoSerializer.deserialize(KryoSerializer.java:60)
        at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:329)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:378)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
        ... 15 more

2023-10-17 10:07:43,067 INFO  [c4aeb932-743e-4736-b00f-6b905381fa03 main] reexec.ReOptimizePlugin: ReOptimization: retryPossible: false
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.

The weird part is that Hive generates this name itself: it is the query ID of
my own statement (hive_20231017100559_..., with the leading 'h' stripped), and
Hive then tries to load it as a class and asks me where to find it! I would
appreciate any help locating and solving the problem.
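
For reference, this is the kind of statement that triggers the failure. The
table name below is just a placeholder; any insert that launches a Spark job
fails the same way:

SET hive.execution.engine=spark;
-- placeholder table; the error appears as soon as the Spark job is submitted
INSERT INTO TABLE test_table VALUES (1, 'a');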

Note: Ambari, Hadoop, Hive, ZooKeeper, and Spark all report healthy according
to the Ambari service health check.
Note: Since I didn't find any Spark-specific hive-site.xml, I added the
following configuration to the hive-site.xml file:
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>

<property>
  <name>hive.spark.warehouse.location</name>
  <value>/tmp/spark/warehouse</value>
</property>

<property>
  <name>hive.spark.sql.execution.mode</name>
  <value>adaptive</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions</name>
  <value>200</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions.pernode</name>
  <value>2</value>
</property>

<property>
  <name>hive.spark.sql.memory.fraction</name>
  <value>0.6</value>
</property>

<property>
  <name>hive.spark.sql.codegen.enabled</name>
  <value>true</value>
</property>

<property>
  <name>spark.sql.hive.hiveserver2.jdbc.url</name>
  <value>jdbc:hive2://my.ambari.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2</value>
</property>

<property>
  <name>spark.datasource.hive.warehouse.load.staging.dir</name>
  <value>/tmp</value>
</property>


<property>
  <name>spark.hadoop.hive.zookeeper.quorum</name>
  <value>my.ambari.com:2181</value>
</property>

<property>
  <name>spark.datasource.hive.warehouse.write.path.strictColumnNamesMapping</name>
  <value>true</value>
</property>

<property>
  <name>spark.sql.hive.conf.list</name>
  <value>hive.vectorized.execution.filesink.arrow.native.enabled=true;hive.vectorized.execution.enabled=true</value>
</property>

<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>

<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000ms</value>
</property>

<property>
  <name>hive.hook.proto.base-directory</name>
  <value>/tmp/hive/hooks</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions</name>
  <value>200</value>
</property>

<property>
  <name>hive.strict.managed.tables</name>
  <value>true</value>
</property>

<property>
  <name>hive.stats.fetch.partition.stats</name>
  <value>true</value>
</property>

<property>
  <name>hive.spark.sql.memory.fraction</name>
  <value>0.6</value>
</property>

<property>
  <name>hive.spark.sql.execution.mode</name>
  <value>spark</value>
</property>

<property>
  <name>hive.spark.sql.codegen.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.heapsize</name>
  <value>2g</value>
</property>

<property>
  <name>hive.spark.sql.shuffle.partitions.pernode</name>
  <value>100</value>
</property>

<property>
  <name>hive.spark.warehouse.location</name>
  <value>/user/hive/warehouse</value>
</property>
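
To double-check that a session actually picks these values up before the
insert runs, the current value of any property can be printed from Beeline
with a bare SET, for example:

-- each line prints property=value for the current session
SET hive.execution.engine;
SET hive.spark.client.connect.timeout;
SET hive.spark.client.server.connect.timeout;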
