[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210116#comment-15210116 ] Hive QA commented on HIVE-12616: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12794821/HIVE-12616.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9856 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_udf org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLExclusive org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDelete org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testRollback org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadPartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWriteTable org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testWriteDynamicPartition {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7354/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7354/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7354/ 
Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12794821 - PreCommit-HIVE-TRUNK-Build > NullPointerException when spark session is reused to run a mapjoin > -- > > Key: HIVE-12616 > URL: https://issues.apache.org/jira/browse/HIVE-12616 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.3.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-12616.1.patch, HIVE-12616.2.patch, > HIVE-12616.3.patch, HIVE-12616.patch > > > The way to reproduce: > {noformat} > set hive.execution.engine=spark; > create table if not exists test(id int); > create table if not exists test1(id int); > insert into test values(1); > insert into test1 values(1); > select max(a.id) from test a ,test1 b > where a.id = b.id; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057239#comment-15057239 ]

Xuefu Zhang commented on HIVE-12616:

+1. [~nemon], could you create a follow-up JIRA covering the problems raised in this discussion that the patch here does not address? Thanks.
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055934#comment-15055934 ] Hive QA commented on HIVE-12616: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12777428/HIVE-12616.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 9881 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_basic org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_6 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_1 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_ints_casts org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6346/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6346/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6346/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 33 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12777428 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055514#comment-15055514 ] Hive QA commented on HIVE-12616: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12777386/HIVE-12616.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 9895 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6344/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6344/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6344/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 19 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12777386 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050888#comment-15050888 ]

Nemon Lou commented on HIVE-12616:

[~xuefuz], thanks for the review. It is not surprising that you have doubts about the "spark.master" setting in HiveConf; I owe an explanation of the issue described here.

In short, "spark.master" is set on the HiveConf during the creation of HiveSparkClient. Snippet of HiveSparkClientFactory#initiateSparkConf:
{code}
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}
The creation of HiveSparkClient happens only once because the client is reused (this reuse is known as SparkSession). However, this HiveConf is operation-level rather than session-level (because of asynchronous queries). So only the first operation's JobConf has "spark.master" set.

Now I have two choices:
1. Set "spark.master" at session level during HiveSparkClient creation.
2. Set "spark.master" for each operation when it is not already set, but read it from the sparkConf in RemoteHiveSparkClient instead of from hiveConf. (The SparkConf in RemoteHiveSparkClient already sets "spark.master" explicitly.)

Which one do you prefer? Adding a test case for this issue seems difficult (yarn-cluster mode, multiple operations in one session); could you provide some guidance? Thanks.
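The mechanism described in the comment above can be illustrated with a small stand-alone sketch. It is not Hive code: the class name, the `ensureClient` method, and the use of `Map<String, String>` in place of HiveConf are all assumptions made for illustration. It only shows why a default that is copied during one-time client creation never reaches the configuration of later operations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the flow described above.
// Map<String, String> plays the role of HiveConf; none of these names are Hive classes.
class SessionReuseSketch {
    // Assumed default coming from the Spark-side configuration.
    static final String SPARK_CONF_MASTER = "yarn-cluster";

    private boolean clientCreated = false;

    // Mirrors the quoted snippet: the default is copied only into the
    // operation-level conf that happens to trigger client creation.
    void ensureClient(Map<String, String> operationConf) {
        if (clientCreated) {
            return; // session reuse: the initialization below is skipped
        }
        if (operationConf.get("spark.master") == null) {
            operationConf.put("spark.master", SPARK_CONF_MASTER);
        }
        clientCreated = true;
    }

    public static void main(String[] args) {
        SessionReuseSketch session = new SessionReuseSketch();
        Map<String, String> firstOperation = new HashMap<>();
        Map<String, String> secondOperation = new HashMap<>();

        session.ensureClient(firstOperation);
        session.ensureClient(secondOperation);

        System.out.println("first operation:  " + firstOperation.get("spark.master"));
        System.out.println("second operation: " + secondOperation.get("spark.master"));
    }
}
```

In this sketch the first operation ends up with "yarn-cluster" while the second one still has no value, which matches the symptom reported in this issue.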
> NullPointerException when spark session is reused to run a mapjoin > -- > > Key: HIVE-12616 > URL: https://issues.apache.org/jira/browse/HIVE-12616 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.3.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-12616.patch > > > The way to reproduce: > {noformat} > set hive.execution.engine=spark; > create table if not exists test(id int); > create table if not exists test1(id int); > insert into test values(1); > insert into test1 values(1); > select max(a.id) from test a ,test1 b > where a.id = b.id; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052100#comment-15052100 ]

Xuefu Zhang commented on HIVE-12616:

Thanks for the explanation. I guess the problem is that when the user doesn't set spark.master explicitly, Hive's default, yarn-cluster, is set only on the HiveConf of the first operation. I think we should set "spark.master" in the session-level HiveConf. It seems we just need to add one line doing that in the if block below:
{code}
// load properties from hive configurations, including both spark.* properties,
// properties for remote driver RPC, and yarn properties for Spark on YARN mode.
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}
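A rough sketch of the session-level idea suggested above, under stated assumptions: `Map<String, String>` stands in for HiveConf, the class and method names are hypothetical, and the sketch assumes each operation's configuration starts as a copy of the session-level one. It is not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: the default master is recorded on a
// session-level conf, and every operation conf is derived from it.
class SessionLevelMasterSketch {
    private final Map<String, String> sessionConf = new HashMap<>();

    // Called once, when the (hypothetical) session creates its Spark client.
    void createClient(Map<String, String> sparkConf) {
        if (sessionConf.get("spark.master") == null) {
            sessionConf.put("spark.master", sparkConf.get("spark.master"));
        }
    }

    // Assumption: each operation starts from a copy of the session settings.
    Map<String, String> newOperationConf() {
        return new HashMap<>(sessionConf);
    }

    public static void main(String[] args) {
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.master", "yarn-cluster");

        SessionLevelMasterSketch session = new SessionLevelMasterSketch();
        session.createClient(sparkConf);

        // Both operations see the value, not only the first one.
        System.out.println(session.newOperationConf().get("spark.master"));
        System.out.println(session.newOperationConf().get("spark.master"));
    }
}
```

Because the value lives on the session-level map, the order in which operations run no longer matters in this sketch.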
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050087#comment-15050087 ]

Xuefu Zhang commented on HIVE-12616:

[~nemon], thanks for the patch. Looking at it, I'm not sure it fixes the problem. You build a JobConf from the HiveConf and then check whether spark.master is null in the JobConf. If so, you copy the value from the HiveConf. I'm afraid the value you get from the HiveConf will also be null. It would be great if you could add your test case as part of the patch.
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049898#comment-15049898 ] Hive QA commented on HIVE-12616: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776505/HIVE-12616.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9887 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6298/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6298/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6298/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776505 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046645#comment-15046645 ]

Nemon Lou commented on HIVE-12616:

The following exception was found in the executors' log:
{noformat}
2015-12-08 10:55:04,165 | ERROR | [Executor task launch worker-0] | Exception in task 1.0 in stage 4.0 (TID 5) | org.apache.spark.Logging$class.logError(Logging.scala:96)
java.lang.RuntimeException: Map operator initialization failed: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:120)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:189)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:189)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57)
	at org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieveAsync(ObjectCache.java:63)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:171)
	at org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:552)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:363)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:111)
	... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:150)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:289)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:177)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:173)
	at org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55)
	... 26 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:121)
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:157)
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:147)
	... 30 more
{noformat}
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046799#comment-15046799 ]

Nemon Lou commented on HIVE-12616:

Another finding: setting spark.master explicitly can also fix this, for example "set spark.master=yarn-cluster;". I had been using a spark-defaults.conf on the HiveServer2 side, which contains "spark.master = yarn-cluster".
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046791#comment-15046791 ]

Nemon Lou commented on HIVE-12616:

Method SparkUtilities#isDedicatedCluster:
{code}
public static boolean isDedicatedCluster(Configuration conf) {
  String master = conf.get("spark.master");
  return master.startsWith("yarn-") || master.startsWith("local");
}
{code}
Changing it to
{code}
public static boolean isDedicatedCluster(Configuration conf) {
  String master = conf.get("spark.master", "");
  return master.startsWith("yarn-") || master.startsWith("local");
}
{code}
can fix this bug. But why is "spark.master" null after session reuse? I haven't found out yet.
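The difference between the two variants quoted above can be reproduced without Hadoop on the classpath. The sketch below is an approximation only: it substitutes `Map<String, String>` for Hadoop's `Configuration` and `Map.getOrDefault` for the two-argument `Configuration.get`, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SparkUtilities#isDedicatedCluster, using a Map
// instead of Hadoop's Configuration so the example is self-contained.
class DedicatedClusterSketch {
    // Original shape: throws NullPointerException when the key is missing.
    static boolean isDedicatedClusterUnsafe(Map<String, String> conf) {
        String master = conf.get("spark.master");
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    // Proposed shape: a missing key falls back to the empty string.
    static boolean isDedicatedClusterSafe(Map<String, String> conf) {
        String master = conf.getOrDefault("spark.master", "");
        return master.startsWith("yarn-") || master.startsWith("local");
    }

    public static void main(String[] args) {
        Map<String, String> withoutMaster = new HashMap<>();
        Map<String, String> withMaster = new HashMap<>();
        withMaster.put("spark.master", "yarn-cluster");

        System.out.println(isDedicatedClusterSafe(withoutMaster));
        System.out.println(isDedicatedClusterSafe(withMaster));
        try {
            isDedicatedClusterUnsafe(withoutMaster);
        } catch (NullPointerException e) {
            System.out.println("unsafe variant threw NullPointerException");
        }
    }
}
```

Note that the safe variant silently reports "not a dedicated cluster" when the key is absent, so it hides the missing setting rather than explaining it.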
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046982#comment-15046982 ]

Xuefu Zhang commented on HIVE-12616:

[~nemon], thanks for reporting and investigating the problem. I believe the value of spark.master is null if it's not set explicitly, hence the error. The proposed fix seems fine, but I'm afraid other problems will arise once this NPE is fixed. Thus, I think it would be better to add some validation that makes sure spark.master is specified. Would you be interested in providing a fix? Thanks.
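One possible shape for the validation suggested above is a fail-fast check that reports the missing setting by name instead of surfacing later as a NullPointerException. This is a sketch only: `requireSparkMaster` is a hypothetical helper, `Map<String, String>` stands in for the Hive configuration object, and the exception type and message are arbitrary choices, not taken from any Hive patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fail-fast validation; not part of Hive.
class SparkMasterValidationSketch {
    // Returns the configured master, or fails with a descriptive message.
    static String requireSparkMaster(Map<String, String> conf) {
        String master = conf.get("spark.master");
        if (master == null || master.trim().isEmpty()) {
            throw new IllegalStateException(
                "spark.master is not set; set it explicitly, "
                + "for example: set spark.master=yarn-cluster;");
        }
        return master;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            requireSparkMaster(conf);
        } catch (IllegalStateException e) {
            System.out.println("validation failed: " + e.getMessage());
        }
        conf.put("spark.master", "yarn-cluster");
        System.out.println("validated master: " + requireSparkMaster(conf));
    }
}
```

Compared with defaulting to an empty string, a check like this makes the misconfiguration visible at the point where the configuration is read.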