[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2016-03-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210116#comment-15210116
 ] 

Hive QA commented on HIVE-12616:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12794821/HIVE-12616.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9856 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_llap_udf
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.concurrencyFalse
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLExclusive
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDelete
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testLockTimeout
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testRollback
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadPartition
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWriteTable
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testWriteDynamicPartition
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7354/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7354/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7354/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12794821 - PreCommit-HIVE-TRUNK-Build

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.1.patch, HIVE-12616.2.patch, 
> HIVE-12616.3.patch, HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057239#comment-15057239
 ] 

Xuefu Zhang commented on HIVE-12616:


+1

[~nemon], could you create a followup JIRA that covers the problems that didn't 
get addressed by the patch here based on the discussion here? Thanks.

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.1.patch, HIVE-12616.2.patch, HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055934#comment-15055934
 ] 

Hive QA commented on HIVE-12616:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777428/HIVE-12616.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 9881 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_stats
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_bmj_schema_evolution
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_ints_casts
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6346/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6346/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6346/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777428 - PreCommit-HIVE-TRUNK-Build

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.1.patch, HIVE-12616.2.patch, HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055514#comment-15055514
 ] 

Hive QA commented on HIVE-12616:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12777386/HIVE-12616.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 9895 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6344/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6344/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6344/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12777386 - PreCommit-HIVE-TRUNK-Build

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.1.patch, HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-10 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050888#comment-15050888
 ] 

Nemon Lou commented on HIVE-12616:
--

[~xuefuz], thanks for review .
It is not surprising that you doubt about the "spark.master" setting in 
HiveConf .
I owe one explanation for the issue described here .

For short , "spark.master" is set for HiveConf during the creation of 
HiveSparkClient.
Snippet of HiveSparkClientFactory#initiateSparkConf :
{code}
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}
The creation of HiveSparkClient only happens once due to reuse (known as 
SparkSession).
However ,this HiveConf is operation level instead of session level (due to 
asynchronous query).
So ,only the first operation's JobConf has "spark.master" with it .


Now I have two choices :
1, Setting "spark.master" at session level during HiveSparkClient creation .
2, Setting "spark.master" for each operation when not set before ,but using 
sparkConf instead of hiveConf from RemoteHiveSparkClient.(SparkConf in 
RemoteHiveSparkClient already set "spark.master" in an explicit way .) 
Which one do you prefer ?

Adding a test case for this issue seems difficult (yarn-cluster mode,multiple 
operation in one session ),would you provide some guidance ? 
Thanks.


> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052100#comment-15052100
 ] 

Xuefu Zhang commented on HIVE-12616:


Thanks for the explanation. I guess the problem is that user didn't set 
spark.master explicitly, Hive's default, yarn-cluster, is set only for the 
HiveConf of the first operation. 

I think we should set "spark.master" in session level HiveConf. It seems we 
just need to add one line doing that in the if block below:
{code}
// load properties from hive configurations, including both spark.* 
properties,
// properties for remote driver RPC, and yarn properties for Spark on YARN 
mode.
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050087#comment-15050087
 ] 

Xuefu Zhang commented on HIVE-12616:


[~nemon], thanks for the patch. Looking at the patch, I'm not sure if it fixes 
the problem. You're building a JobConf from hiveconf and then check if 
spark.master is null in the JobConf. If so, you copied the value from HiveConf. 
I'm afraid that the value you get from HiveConf will be also null.

It would be great if you can add your test case as part of the patch.

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049898#comment-15049898
 ] 

Hive QA commented on HIVE-12616:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776505/HIVE-12616.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9887 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6298/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6298/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6298/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776505 - PreCommit-HIVE-TRUNK-Build

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-08 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046645#comment-15046645
 ] 

Nemon Lou commented on HIVE-12616:
--

The following exception is found from excutors' log
{noformat}
2015-12-08 10:55:04,165 | ERROR | [Executor task launch worker-0] | Exception 
in task 1.0 in stage 4.0 (TID 5) | 
org.apache.spark.Logging$class.logError(Logging.scala:96)
java.lang.RuntimeException: Map operator initialization failed: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:120)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:189)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:189)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:57)
at 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieveAsync(ObjectCache.java:63)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:171)
at 
org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.initializeOp(VectorMapJoinCommonOperator.java:552)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:363)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:111)
... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:150)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:289)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:177)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator$1.call(MapJoinOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache.retrieve(ObjectCache.java:55)
... 26 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:121)
at 
org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:157)
at 
org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:147)
... 30 more
{noformat}

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Xuefu Zhang
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-08 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046799#comment-15046799
 ] 

Nemon Lou commented on HIVE-12616:
--

Another finding.Setting spark.master explicitly can also fix this.
"set spark.master=yarn-cluster;"
I used to use a spark-defaults.conf on HiveServer2 side ,which contains 
"spark.master = yarn-cluster".

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Xuefu Zhang
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-08 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046791#comment-15046791
 ] 

Nemon Lou commented on HIVE-12616:
--

Method SparkUtilities#isDedicatedCluster
{code}
  public static boolean isDedicatedCluster(Configuration conf) {
String master = conf.get("spark.master");
return master.startsWith("yarn-") || master.startsWith("local");
  }
{code}
Changing to
{code}
  public static boolean isDedicatedCluster(Configuration conf) {
String master = conf.get("spark.master","");
return master.startsWith("yarn-") || master.startsWith("local");
  }
{code}
can fix this bug.But why "spark.master" is null after session reuse ? I haven't 
find out.


> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Xuefu Zhang
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046982#comment-15046982
 ] 

Xuefu Zhang commented on HIVE-12616:


[~nemon], thanks for reporting and investigating the problem. I believe that 
the value for spark.master is null if it's not set explicitly, so comes the 
error. The proposed fix seems fine, but I'm afraid other problems will arise if 
this NPE is fixed. Thus, I think it would be better if we can add some 
validation, making sure spark.master is specified. Would you be interested in 
providing a fix? Thanks.

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Xuefu Zhang
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)