[jira] [Commented] (HIVE-13314) Hive on spark mapjoin errors if spark.master is not set

2016-03-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205852#comment-15205852
 ] 

Szehon Ho commented on HIVE-13314:
--

I think you are right, should have searched before I wasted few hours debugging 
this :)

> Hive on spark mapjoin errors if spark.master is not set
> ---
>
> Key: HIVE-13314
> URL: https://issues.apache.org/jira/browse/HIVE-13314
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Minor
>
> There are some errors that happen if spark.master is not set.
> This is despite the code defaulting to yarn-cluster if spark.master is not 
> set by user or on the config files: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51]
> The funny thing is that while it works the first time due to this default, 
> subsequent tries will fail as the hiveConf is refreshed without that default 
> being set.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L180]
> Exception is follows:
> {noformat}
> Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most 
> recent failure: Lost task 40.3 in stage 1.0 (TID 22, 
> d2409.halxg.cloudera.com): java.lang.RuntimeException: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at 
> org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at 
> org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   ... 16 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
>   ... 24 more
> Driver stacktrace:
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13314) Hive on spark mapjoin errors if spark.master is not set

2016-03-20 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203645#comment-15203645
 ] 

Nemon Lou commented on HIVE-13314:
--

Similar to HIVE-12616 ?

> Hive on spark mapjoin errors if spark.master is not set
> ---
>
> Key: HIVE-13314
> URL: https://issues.apache.org/jira/browse/HIVE-13314
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Minor
>
> There are some errors that happen if spark.master is not set.
> This is despite the code defaulting to yarn-cluster if spark.master is not 
> set by user or on the config files: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51]
> The funny thing is that while it works the first time due to this default, 
> subsequent tries will fail as the hiveConf is refreshed without that default 
> being set.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L180]
> Exception is follows:
> {noformat}
> Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most 
> recent failure: Lost task 40.3 in stage 1.0 (TID 22, 
> d2409.halxg.cloudera.com): java.lang.RuntimeException: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>   at 
> org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at 
> org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   ... 16 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
>   ... 24 more
> Driver stacktrace:
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)