This is the code: created a java class by extending org.apache.hadoop.hive.ql.udf.generic.GenericUDTF; ,and creates a sparkSession as- SparkSession spark = SparkSession.builder().enableHiveSupport().master("yarn-client").appName("SampleSparkUDTF_yarnV1").getOrCreate(); ,and tries to read a table in hive DB: Dataset inputData = spark.read().table(tableName); Long countRows = inputData.count();
Environment: HDP-2.5.3.0, spark 2.0.0 Steps: 1. Copied this custom UDTF jar into hdfs & also into auxlib 2. Copied /usr/hdp/<2.5.x>/spark2/jars/*.jar into /usr/hdp/<2.5.x>/hive/auxlib/ 3. Connecting to HS2 using beeline to run this Spark UDT: beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver CREATE TABLE TestTable (i int); INSERT INTO TestTable VALUES (1); 0: jdbc:hive2://localhost:10000/> CREATE FUNCTION SparkUDT AS 'SparkHiveUDTF' using jar 'hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar' ; INFO : converting to local hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar INFO : Added [/tmp/69366d0d-6777-4860-82c0-c61482ccce87_resources/sparkHiveGenericUDTF-1.0.jar] to class path INFO : Added resources: [hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar] No rows affected (0.125 seconds) 0: jdbc:hive2://localhost:10000/> SELECT SparkUDT('tbl','TestTable'); failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149) at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275) at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831) at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823) at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97) at SparkHiveUDTF.process(SparkHiveUDTF.java:78) at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555) Is it supported to launch sparkSession via HS2?