Hi Team,

We are trying Hive on Spark in our cluster, and we hit the exception below whenever a Hive query involves a reducer phase in its execution (for example, GROUP BY or a UDAF). Could you please help us understand the compatibility of Hive on Spark with UDAF execution and the root cause of this exception?
We are using Spark 1.1.0 and built Hive from the code downloaded from https://github.com/apache/hive/tree/spark, using the hadoop-2 profile (mvn clean install -DskipTests -Phadoop-2).

hive (default)> select count(*) from employee;
Query ID = phodisvc_20141103032121_978e1f48-6290-4e5d-8a57-955edc98b7cd
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.lang.NoSuchMethodError: org.apache.spark.api.java.JavaPairRDD.foreachAsync(Lorg/apache/spark/api/java/function/VoidFunction;)Lorg/apache/spark/api/java/JavaFutureAction;
        at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:189)
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52)
        at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:76)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1606)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1366)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1178)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1005)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:995)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. org.apache.spark.api.java.JavaPairRDD.foreachAsync(Lorg/apache/spark/api/java/function/VoidFunction;)Lorg/apache/spark/api/java/JavaFutureAction;

Thanks & Regards,
Prabu
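P.S. Since NoSuchMethodError usually means the class that is actually on the classpath lacks the method the caller was compiled against, one way to check is to pull the JavaPairRDD class out of the deployed Spark assembly and disassemble it with javap. This is only a sketch: the assembly jar name below is an assumption and depends on how Spark was built, so adjust the path to your deployment.

```shell
# Extract the JavaPairRDD class file from the Spark assembly jar
# (jar name is illustrative; use the assembly jar your cluster runs).
unzip -p spark-assembly-1.1.0-hadoop2.jar \
    org/apache/spark/api/java/JavaPairRDD.class > JavaPairRDD.class

# List the class's methods and look for foreachAsync; if grep finds
# nothing, this Spark build does not expose the method Hive is calling.
javap JavaPairRDD.class | grep foreachAsync \
    || echo "foreachAsync not found in this Spark build"
```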