Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-04-13 Thread sachin Singh
Hi Linlin,
have you found a solution for this issue? If yes, what needed to be
corrected? I am also getting the same error when submitting a Spark job in
cluster mode:
2015-04-14 18:16:43 DEBUG Transaction - Transaction rolled back in 0 ms
2015-04-14 18:16:43 ERROR DDLTask -
org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist:
my_database
at 
org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:4054)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:269)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java
...


Please suggest. I have copied hive-site.xml to spark/conf; in standalone mode
it works fine.






RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-17 Thread Andrew Lee
Hi All,
Just want to give everyone an update on what worked for me. Thanks for Cheng's
comment and other people's help.
What I had misunderstood was --driver-class-path and how it relates to
--files. I had put /etc/hive/hive-site.xml in both --files and
--driver-class-path when I started in yarn-cluster mode:

./bin/spark-submit --verbose --queue research --driver-java-options
-XX:MaxPermSize=8192M --files /etc/hive/hive-site.xml --driver-class-path
/etc/hive/hive-site.xml --master yarn --deploy-mode cluster

The problem here is that --files only looks for local files to distribute
onto HDFS, while --driver-class-path is what goes onto the CLASSPATH at
runtime. As you can see above, it was trying to find /etc/hive/hive-site.xml
on the container on the remote node, which apparently doesn't exist. For some
people it may work fine because they deploy the Hive configuration and JARs
across the entire cluster, so every node looks the same. But this wasn't my
case in a multi-tenant environment or a restricted, secured cluster. So my
parameters look like this when I launch it:

./bin/spark-submit --verbose --queue research --driver-java-options
-XX:MaxPermSize=8192M --files /etc/hive/hive-site.xml --driver-class-path
hive-site.xml --master yarn --deploy-mode cluster

Here --driver-class-path will only look for ./hive-site.xml on the remote
container, where it was already pre-deployed by --files.
This worked for me, and I can now use the HiveContext API to talk to the Hive
metastore. Thanks.
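
For reference, a rough sketch of a driver program to verify the metastore
connection after submitting this way (this is not the exact program from this
thread, and my_database is a placeholder name):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MetastoreCheck {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("MetastoreCheck"))
    val hiveContext = new HiveContext(sc)
    // If hive-site.xml was picked up, this lists the databases from the
    // real metastore rather than just "default" from a local Derby DB.
    hiveContext.hql("SHOW DATABASES").collect().foreach(println)
    hiveContext.hql("use my_database")  // placeholder database name
    sc.stop()
  }
}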


Date: Thu, 5 Feb 2015 16:59:12 -0800
From: lian.cs@gmail.com
To: linlin200...@gmail.com; huaiyin@gmail.com
CC: user@spark.apache.org
Subject: Re: Spark sql failed in yarn-cluster mode when connecting to
non-default hive database

Hi Jenny,

You may try to use --files $SPARK_HOME/conf/hive-site.xml
--driver-class-path hive-site.xml when submitting your application. The
problem is that when running in cluster mode, the driver is actually
running in a random container directory on a random executor node. By
using --files, you upload hive-site.xml to the container directory; by
using --driver-class-path hive-site.xml, you add the file to the classpath
(the path is relative to the container directory).

When running in cluster mode, have you tried to check the tables inside
the default database? If my guess is right, this should be an empty default
database inside the default Derby metastore created by HiveContext when
hive-site.xml is missing.

Best,
Cheng

On 8/12/14 5:38 PM, Jenny Zhao wrote:

Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under
$HIVE_HOME/conf.

Through the Hive CLI, I don't see any problem, but for Spark in
yarn-cluster mode I am not able to switch to a database other than the
default one; in yarn-client mode, it works fine.

Thanks!

Jenny

On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:

Hi Jenny,

Have you copied hive-site.xml to the spark/conf directory? If not, can
you put it in conf/ and try again?

Thanks,

Yin

On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com wrote:

Thanks Yin!

here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
experience any problem connecting to the metastore through Hive, which
uses DB2 as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-05 Thread Cheng Lian

Hi Jenny,

You may try to use |--files $SPARK_HOME/conf/hive-site.xml 
--driver-class-path hive-site.xml| when submitting your application. The 
problem is that when running in cluster mode, the driver is actually 
running in a random container directory on a random executor node. By 
using |--files|, you upload hive-site.xml to the container directory; by 
using |--driver-class-path hive-site.xml|, you add the file to the classpath 
(the path is relative to the container directory).
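
Put together, a full submit command would look something like this (the
application class and JAR names are placeholders, not from this thread):

./bin/spark-submit --master yarn --deploy-mode cluster \
  --files $SPARK_HOME/conf/hive-site.xml \
  --driver-class-path hive-site.xml \
  --class com.example.MyApp myapp.jar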


When running in cluster mode, have you tried to check the tables inside 
the default database? If my guess is right, this should be an empty 
default database inside the default Derby metastore created by 
HiveContext when the hive-site.xml is missing.
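
A quick way to test this guess from the driver is a sketch like the
following (assuming a HiveContext named hiveContext, as elsewhere in this
thread):

// A freshly created Derby metastore has an empty "default" database, so
// this prints no tables if the real hive-site.xml was not picked up.
hiveContext.hql("SHOW TABLES").collect().foreach(println)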


Best,
Cheng

On 8/12/14 5:38 PM, Jenny Zhao wrote:



Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under 
$HIVE_HOME/conf.

Through the Hive CLI, I don't see any problem, but for Spark in 
yarn-cluster mode I am not able to switch to a database other than 
the default one; in yarn-client mode, it works fine.


Thanks!

Jenny


On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:


Hi Jenny,

Have you copied hive-site.xml to spark/conf directory? If not, can
you put it in conf/ and try again?

Thanks,

Yin


On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com wrote:


Thanks Yin!

here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
experience any problem connecting to the metastore through Hive, which uses
DB2 as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
 <property>
  <name>hive.hwi.listen.port</name>
  <value></value>
 </property>
 <property>
  <name>hive.querylog.location</name>
  <value>/var/ibm/biginsights/hive/query/${user.name}</value>
 </property>
 <property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/biginsights/hive/warehouse</value>
 </property>
 <property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
 </property>
 <property>
  <name>hive.metastore.metrics.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.ibm.db2.jcc.DB2Driver</value>
 </property>
 <property>
  <name>hive.stats.autogather</name>
  <value>false</value>
 </property>
 <property>
  <name>javax.jdo.mapping.Schema</name>
  <value>HIVE</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>catalog</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
 </property>
 <property>
  <name>hive.metastore.password.encrypt</name>
  <value>true</value>
 </property>
 <property>
  <name>org.jpox.autoCreateSchema</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
 </property>
 <property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
 </property>
 <property>
  <name>hive.server2.thrift.port</name>
  <value>1</value>
 </property>
 <property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hdtest022.svl.ibm.com</value>
 </property>
 <property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
 </property>

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
Hi All,
I have tried to pass the properties via SparkContext.setLocalProperty and 
HiveContext.setConf; both failed. Based on the results (I haven't had a chance 
to look into the code yet), HiveContext tries to initiate the JDBC connection 
right away, so I couldn't set other properties dynamically prior to any SQL 
statement.
The only way to get it to work is to put these properties in hive-site.xml, 
which did work for me. I'm wondering if there's a better way to dynamically 
specify these Hive configurations, like --hiveconf or a user-supplied 
hive-site.xml.
On a shared cluster, hive-site.xml is shared and cannot be managed per user 
on the same edge server, especially when it contains a personal password for 
metastore access. What would be the best way to pass these 3 properties to 
spark-shell?

javax.jdo.option.ConnectionUserName
javax.jdo.option.ConnectionPassword
javax.jdo.option.ConnectionURL

According to the HiveContext documentation, hive-site.xml is picked up from 
the classpath. Any way to specify this dynamically for each spark-shell 
session?
An instance of the Spark SQL execution engine that integrates with data stored 
in Hive. Configuration for Hive is read from hive-site.xml on the classpath.
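
One workaround along the lines of Cheng's suggestion above would be to keep a
private copy of hive-site.xml per user and distribute it at submit time
instead of reading the shared conf directory. A sketch (the paths and class
names are placeholders; the copy would carry your own ConnectionUserName,
ConnectionPassword, and ConnectionURL):

cp /etc/hive/hive-site.xml ~/myconf/hive-site.xml
# edit the three javax.jdo.option.* properties in the private copy
./bin/spark-submit --master yarn --deploy-mode cluster \
  --files ~/myconf/hive-site.xml \
  --driver-class-path hive-site.xml \
  --class com.example.MyApp myapp.jar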

Here are the test cases I ran (Spark 1.2.0).

Test Case 1:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

sc.setLocalProperty("javax.jdo.option.ConnectionUserName", "foo")
sc.setLocalProperty("javax.jdo.option.ConnectionPassword", "xx")
sc.setLocalProperty("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

import hiveContext._

// Create table and clean up data
hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")
// Encounter error: it picks up the default user 'APP'@'localhost' and creates
// metastore_db in the current local directory, not honoring the JDBC settings
// for the metastore on MySQL.

Test Case 2:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.setConf("javax.jdo.option.ConnectionUserName", "foo")
// Encounter error right here; it looks like HiveContext tries to initiate the
// JDBC connection prior to any settings from setConf.
hiveContext.setConf("javax.jdo.option.ConnectionPassword", "xxx")
hiveContext.setConf("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")




From: huaiyin@gmail.com
Date: Wed, 13 Aug 2014 16:56:13 -0400
Subject: Re: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database
To: linlin200...@gmail.com
CC: lian.cs@gmail.com; user@spark.apache.org

I think the problem is that when you are using yarn-cluster mode, the Spark 
driver runs inside the application master, so the Hive conf is not 
accessible to the driver. Can you try to set those confs by using 
hiveContext.set(...)? Or maybe you can copy hive-site.xml to spark/conf on 
the node running the application master.



On Tue, Aug 12, 2014 at 8:38 PM, Jenny Zhao linlin200...@gmail.com wrote:



Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under 
$HIVE_HOME/conf.

Through the Hive CLI, I don't see any problem, but for Spark in yarn-cluster 
mode I am not able to switch to a database other than the default one; in 
yarn-client mode, it works fine.


Thanks!

Jenny




On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:

Hi Jenny,
Have you copied hive-site.xml to spark/conf directory? If not, can you put it 
in conf/ and try again?





Thanks,
Yin






On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com wrote:






Thanks Yin! 

here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't 
experience any problem connecting to the metastore through Hive, which uses 
DB2 as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
 <property>
  <name>hive.hwi.listen.port</name>
  <value></value>
 </property>

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
A follow-up on hive-site.xml:
1. If you specify it in spark/conf, then you can NOT also apply it via the 
--driver-class-path option; otherwise, you will get the following exception 
when initializing the SparkContext:

org.apache.spark.SparkException: Found both spark.driver.extraClassPath and 
SPARK_CLASSPATH. Use only the former.

2. If you use --driver-class-path, then you need to unset SPARK_CLASSPATH. 
The flip side is that you will then need to provide all the related JARs 
(hadoop-yarn, hadoop-common, hdfs, etc.) that are part of "hadoop-provided" 
if you built your JARs with -Phadoop-provided, plus any other common 
libraries that are required.
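
In other words, something like this (a sketch; the class and JAR names are
placeholders):

unset SPARK_CLASSPATH
./bin/spark-submit --master yarn --deploy-mode cluster \
  --files /etc/hive/hive-site.xml \
  --driver-class-path hive-site.xml \
  --class com.example.MyApp myapp.jar
# --driver-class-path maps to the conf named in the exception, so
# --conf spark.driver.extraClassPath=hive-site.xml should be equivalent.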

From: alee...@hotmail.com
To: user@spark.apache.org
CC: lian.cs@gmail.com; linlin200...@gmail.com; huaiyin@gmail.com
Subject: RE: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database
Date: Mon, 29 Dec 2014 16:01:26 -0800




Hi All,
I have tried to pass the properties via SparkContext.setLocalProperty and 
HiveContext.setConf; both failed. Based on the results (I haven't had a chance 
to look into the code yet), HiveContext tries to initiate the JDBC connection 
right away, so I couldn't set other properties dynamically prior to any SQL 
statement.
The only way to get it to work is to put these properties in hive-site.xml, 
which did work for me. I'm wondering if there's a better way to dynamically 
specify these Hive configurations, like --hiveconf or a user-supplied 
hive-site.xml.
On a shared cluster, hive-site.xml is shared and cannot be managed per user 
on the same edge server, especially when it contains a personal password for 
metastore access. What would be the best way to pass these 3 properties to 
spark-shell?

javax.jdo.option.ConnectionUserName
javax.jdo.option.ConnectionPassword
javax.jdo.option.ConnectionURL

According to the HiveContext documentation, hive-site.xml is picked up from 
the classpath. Any way to specify this dynamically for each spark-shell 
session?
An instance of the Spark SQL execution engine that integrates with data stored 
in Hive. Configuration for Hive is read from hive-site.xml on the classpath.

Here are the test cases I ran (Spark 1.2.0).

Test Case 1:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

sc.setLocalProperty("javax.jdo.option.ConnectionUserName", "foo")
sc.setLocalProperty("javax.jdo.option.ConnectionPassword", "xx")
sc.setLocalProperty("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

import hiveContext._

// Create table and clean up data
hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")
// Encounter error: it picks up the default user 'APP'@'localhost' and creates
// metastore_db in the current local directory, not honoring the JDBC settings
// for the metastore on MySQL.

Test Case 2:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.setConf("javax.jdo.option.ConnectionUserName", "foo")
// Encounter error right here; it looks like HiveContext tries to initiate the
// JDBC connection prior to any settings from setConf.
hiveContext.setConf("javax.jdo.option.ConnectionPassword", "xxx")
hiveContext.setConf("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")




From: huaiyin@gmail.com
Date: Wed, 13 Aug 2014 16:56:13 -0400
Subject: Re: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database
To: linlin200...@gmail.com
CC: lian.cs@gmail.com; user@spark.apache.org

I think the problem is that when you are using yarn-cluster mode, the Spark 
driver runs inside the application master, so the Hive conf is not 
accessible to the driver. Can you try to set those confs by using 
hiveContext.set(...)? Or maybe you can copy hive-site.xml to spark/conf on 
the node running the application master.



On Tue, Aug 12, 2014 at 8:38 PM, Jenny Zhao linlin200...@gmail.com wrote:



Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under 
$HIVE_HOME/conf.

Through the Hive CLI, I don't see any problem, but for Spark in yarn-cluster 
mode I am not able to switch to a database other than the default one; in 
yarn-client mode, it works fine.


Thanks!

Jenny




On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:

Hi Jenny,
Have you copied hive-site.xml to spark/conf directory? If not, can you put it 
in conf/ and try again?





Thanks,
Yin






On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com wrote:






Thanks Yin! 

here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't 
experience any problem connecting to the metastore through Hive, which uses 
DB2 as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-13 Thread Yin Huai
I think the problem is that when you are using yarn-cluster mode, the Spark
driver runs inside the application master, so the Hive conf is not
accessible to the driver. Can you try to set those confs by using
hiveContext.set(...)? Or maybe you can copy hive-site.xml to spark/conf on
the node running the application master.


On Tue, Aug 12, 2014 at 8:38 PM, Jenny Zhao linlin200...@gmail.com wrote:


 Hi Yin,

 hive-site.xml was copied to spark/conf and is the same as the one under
 $HIVE_HOME/conf.

 Through the Hive CLI, I don't see any problem, but for Spark in yarn-cluster
 mode I am not able to switch to a database other than the default one; in
 yarn-client mode, it works fine.

 Thanks!

 Jenny


 On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Jenny,

 Have you copied hive-site.xml to spark/conf directory? If not, can you
 put it in conf/ and try again?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com
 wrote:


 Thanks Yin!

 here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
 experience any problem connecting to the metastore through Hive, which uses
 DB2 as the metastore database.

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
 <configuration>
  <property>
   <name>hive.hwi.listen.port</name>
   <value></value>
  </property>
  <property>
   <name>hive.querylog.location</name>
   <value>/var/ibm/biginsights/hive/query/${user.name}</value>
  </property>
  <property>
   <name>hive.metastore.warehouse.dir</name>
   <value>/biginsights/hive/warehouse</value>
  </property>
  <property>
   <name>hive.hwi.war.file</name>
   <value>lib/hive-hwi-0.12.0.war</value>
  </property>
  <property>
   <name>hive.metastore.metrics.enabled</name>
   <value>true</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.ibm.db2.jcc.DB2Driver</value>
  </property>
  <property>
   <name>hive.stats.autogather</name>
   <value>false</value>
  </property>
  <property>
   <name>javax.jdo.mapping.Schema</name>
   <value>HIVE</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>catalog</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
  </property>
  <property>
   <name>hive.metastore.password.encrypt</name>
   <value>true</value>
  </property>
  <property>
   <name>org.jpox.autoCreateSchema</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.server2.thrift.min.worker.threads</name>
   <value>5</value>
  </property>
  <property>
   <name>hive.server2.thrift.max.worker.threads</name>
   <value>100</value>
  </property>
  <property>
   <name>hive.server2.thrift.port</name>
   <value>1</value>
  </property>
  <property>
   <name>hive.server2.thrift.bind.host</name>
   <value>hdtest022.svl.ibm.com</value>
  </property>
  <property>
   <name>hive.server2.authentication</name>
   <value>CUSTOM</value>
  </property>
  <property>
   <name>hive.server2.custom.authentication.class</name>
   <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
  </property>
  <property>
   <name>hive.server2.enable.impersonation</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.security.webconsole.url</name>
   <value>http://hdtest022.svl.ibm.com:8080</value>
  </property>
  <property>
   <name>hive.security.authorization.enabled</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.security.authorization.createtable.owner.grants</name>
   <value>ALL</value>
  </property>
 </configuration>



 On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Jenny,

 How's your metastore configured for both Hive and Spark SQL? Which
 metastore mode are you using (based on
 https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
 )?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao linlin200...@gmail.com
 wrote:



 you can reproduce this issue with the following steps (assuming you
 have Yarn cluster + Hive 12):

 1) using hive shell, create a database, e.g: create database ttt

 2) write a simple spark sql program

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql._
 

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-12 Thread Yin Huai
Hi Jenny,

Have you copied hive-site.xml to spark/conf directory? If not, can you put
it in conf/ and try again?

Thanks,

Yin


On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com wrote:


 Thanks Yin!

 here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
 experience any problem connecting to the metastore through Hive, which uses
 DB2 as the metastore database.

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
 <configuration>
  <property>
   <name>hive.hwi.listen.port</name>
   <value></value>
  </property>
  <property>
   <name>hive.querylog.location</name>
   <value>/var/ibm/biginsights/hive/query/${user.name}</value>
  </property>
  <property>
   <name>hive.metastore.warehouse.dir</name>
   <value>/biginsights/hive/warehouse</value>
  </property>
  <property>
   <name>hive.hwi.war.file</name>
   <value>lib/hive-hwi-0.12.0.war</value>
  </property>
  <property>
   <name>hive.metastore.metrics.enabled</name>
   <value>true</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.ibm.db2.jcc.DB2Driver</value>
  </property>
  <property>
   <name>hive.stats.autogather</name>
   <value>false</value>
  </property>
  <property>
   <name>javax.jdo.mapping.Schema</name>
   <value>HIVE</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>catalog</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
  </property>
  <property>
   <name>hive.metastore.password.encrypt</name>
   <value>true</value>
  </property>
  <property>
   <name>org.jpox.autoCreateSchema</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.server2.thrift.min.worker.threads</name>
   <value>5</value>
  </property>
  <property>
   <name>hive.server2.thrift.max.worker.threads</name>
   <value>100</value>
  </property>
  <property>
   <name>hive.server2.thrift.port</name>
   <value>1</value>
  </property>
  <property>
   <name>hive.server2.thrift.bind.host</name>
   <value>hdtest022.svl.ibm.com</value>
  </property>
  <property>
   <name>hive.server2.authentication</name>
   <value>CUSTOM</value>
  </property>
  <property>
   <name>hive.server2.custom.authentication.class</name>
   <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
  </property>
  <property>
   <name>hive.server2.enable.impersonation</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.security.webconsole.url</name>
   <value>http://hdtest022.svl.ibm.com:8080</value>
  </property>
  <property>
   <name>hive.security.authorization.enabled</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.security.authorization.createtable.owner.grants</name>
   <value>ALL</value>
  </property>
 </configuration>



 On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Jenny,

 How's your metastore configured for both Hive and Spark SQL? Which
 metastore mode are you using (based on
 https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
 )?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao linlin200...@gmail.com
 wrote:



 you can reproduce this issue with the following steps (assuming you have
 Yarn cluster + Hive 12):

 1) using hive shell, create a database, e.g: create database ttt

 2) write a simple spark sql program

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql._
 import org.apache.spark.sql.hive.HiveContext

 object HiveSpark {
   case class Record(key: Int, value: String)

   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("HiveSpark")
     val sc = new SparkContext(sparkConf)

     // A hive context creates an instance of the Hive Metastore in process
     val hiveContext = new HiveContext(sc)
     import hiveContext._

     hql("use ttt")
     hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
     hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

     // Queries are expressed in HiveQL
     println("Result of 'SELECT *': ")
     hql("SELECT * FROM src").collect.foreach(println)
     sc.stop()
   }
 }
 3) run it in yarn-cluster mode.
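
 For example, submitting the program above might look like this (the JAR name
 is a placeholder):

 ./bin/spark-submit --class HiveSpark --master yarn-cluster hive-spark.jar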


 On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian 

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-12 Thread Jenny Zhao
Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under
$HIVE_HOME/conf.

Through the Hive CLI, I don't see any problem, but for Spark in yarn-cluster
mode I am not able to switch to a database other than the default one; in
yarn-client mode, it works fine.

Thanks!

Jenny


On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Jenny,

 Have you copied hive-site.xml to spark/conf directory? If not, can you
 put it in conf/ and try again?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao linlin200...@gmail.com
 wrote:


 Thanks Yin!

 here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
 experience any problem connecting to the metastore through Hive, which uses
 DB2 as the metastore database.

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
 <configuration>
  <property>
   <name>hive.hwi.listen.port</name>
   <value></value>
  </property>
  <property>
   <name>hive.querylog.location</name>
   <value>/var/ibm/biginsights/hive/query/${user.name}</value>
  </property>
  <property>
   <name>hive.metastore.warehouse.dir</name>
   <value>/biginsights/hive/warehouse</value>
  </property>
  <property>
   <name>hive.hwi.war.file</name>
   <value>lib/hive-hwi-0.12.0.war</value>
  </property>
  <property>
   <name>hive.metastore.metrics.enabled</name>
   <value>true</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.ibm.db2.jcc.DB2Driver</value>
  </property>
  <property>
   <name>hive.stats.autogather</name>
   <value>false</value>
  </property>
  <property>
   <name>javax.jdo.mapping.Schema</name>
   <value>HIVE</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>catalog</value>
  </property>
  <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
  </property>
  <property>
   <name>hive.metastore.password.encrypt</name>
   <value>true</value>
  </property>
  <property>
   <name>org.jpox.autoCreateSchema</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.server2.thrift.min.worker.threads</name>
   <value>5</value>
  </property>
  <property>
   <name>hive.server2.thrift.max.worker.threads</name>
   <value>100</value>
  </property>
  <property>
   <name>hive.server2.thrift.port</name>
   <value>1</value>
  </property>
  <property>
   <name>hive.server2.thrift.bind.host</name>
   <value>hdtest022.svl.ibm.com</value>
  </property>
  <property>
   <name>hive.server2.authentication</name>
   <value>CUSTOM</value>
  </property>
  <property>
   <name>hive.server2.custom.authentication.class</name>
   <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
  </property>
  <property>
   <name>hive.server2.enable.impersonation</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.security.webconsole.url</name>
   <value>http://hdtest022.svl.ibm.com:8080</value>
  </property>
  <property>
   <name>hive.security.authorization.enabled</name>
   <value>true</value>
  </property>
  <property>
   <name>hive.security.authorization.createtable.owner.grants</name>
   <value>ALL</value>
  </property>
 </configuration>



 On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Jenny,

 How's your metastore configured for both Hive and Spark SQL? Which
 metastore mode are you using (based on
 https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
 )?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao linlin200...@gmail.com
 wrote:



 you can reproduce this issue with the following steps (assuming you
 have Yarn cluster + Hive 12):

 1) using hive shell, create a database, e.g: create database ttt

 2) write a simple spark sql program

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql._
 import org.apache.spark.sql.hive.HiveContext

 object HiveSpark {
   case class Record(key: Int, value: String)

   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("HiveSpark")
     val sc = new SparkContext(sparkConf)

     // A hive context creates an instance of the Hive Metastore in process
     val hiveContext = new HiveContext(sc)
     import hiveContext._

     hql("use ttt")
   

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Cheng Lian
Since you were using hql(...), it's probably not related to the JDBC driver.
But I failed to reproduce this issue locally with a single-node
pseudo-distributed YARN cluster. Would you mind elaborating on the steps to
reproduce this bug? Thanks


On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian lian.cs@gmail.com wrote:

 Hi Jenny, does this issue only happen when running Spark SQL with YARN in
 your environment?


 On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao linlin200...@gmail.com wrote:


 Hi,

 I am able to run my hql query on yarn cluster mode when connecting to the
 default hive metastore defined in hive-site.xml.

 however, if I want to switch to a different database, like:

  hql("use other-database")


 it only works in yarn client mode, but failed on yarn-cluster mode with
 the following stack:

 14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
 14/08/08 12:09:11 INFO audit: ugi=biadmin ip=unknown-ip-addr cmd=get_database: tt
 14/08/08 12:09:11 ERROR RetryingHMSHandler: 
 NoSuchObjectException(message:There is no database named tt)
  at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
  at 
 org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
  at java.lang.reflect.Method.invoke(Method.java:611)
  at 
 org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
  at $Proxy15.getDatabase(Unknown Source)
  at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
  at java.lang.reflect.Method.invoke(Method.java:611)
  at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
  at $Proxy17.get_database(Unknown Source)
  at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
  at java.lang.reflect.Method.invoke(Method.java:611)
  at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
  at $Proxy18.getDatabase(Unknown Source)
  at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
  at 
 org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
  at 
 org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
  at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
  at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
  at 
 org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
  at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
  at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
  at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
  at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
  at 
 org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
  at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
  at java.lang.reflect.Method.invoke(Method.java:611)
  at 
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)

 14/08/08 12:09:11 ERROR DDLTask: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
  at 
 org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
  at 

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Yin Huai
Hi Jenny,

How's your metastore configured for both Hive and Spark SQL? Which
metastore mode are you using (based on
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
)?

Thanks,

Yin


On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao linlin200...@gmail.com wrote:



 you can reproduce this issue with the following steps (assuming you have
 Yarn cluster + Hive 12):

 1) using hive shell, create a database, e.g: create database ttt

 2) write a simple spark sql program

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql._
 import org.apache.spark.sql.hive.HiveContext

 object HiveSpark {
   case class Record(key: Int, value: String)

   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("HiveSpark")
     val sc = new SparkContext(sparkConf)

     // A hive context creates an instance of the Hive Metastore in process
     val hiveContext = new HiveContext(sc)
     import hiveContext._

     hql("use ttt")
     hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
     hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

     // Queries are expressed in HiveQL
     println("Result of 'SELECT *': ")
     hql("SELECT * FROM src").collect.foreach(println)
     sc.stop()
   }
 }
 3) run it in yarn-cluster mode.


 On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian lian.cs@gmail.com wrote:

 Since you were using hql(...), it's probably not related to the JDBC driver.
 But I failed to reproduce this issue locally with a single-node
 pseudo-distributed YARN cluster. Would you mind elaborating on the steps to
 reproduce this bug? Thanks


 On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian lian.cs@gmail.com
 wrote:

 Hi Jenny, does this issue only happen when running Spark SQL with YARN
 in your environment?


 On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao linlin200...@gmail.com
 wrote:


 Hi,

 I am able to run my hql query on yarn cluster mode when connecting to
 the default hive metastore defined in hive-site.xml.

 however, if I want to switch to a different database, like:

   hql("use other-database")


 it only works in yarn client mode, but failed on yarn-cluster mode with
 the following stack:

 14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
 14/08/08 12:09:11 INFO audit: ugi=biadmin  ip=unknown-ip-addr  
 cmd=get_database: tt
 14/08/08 12:09:11 ERROR RetryingHMSHandler: 
 NoSuchObjectException(message:There is no database named tt)
at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
at 
 org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at 
 org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
at $Proxy15.getDatabase(Unknown Source)
at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
at $Proxy17.get_database(Unknown Source)
at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at $Proxy18.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
at 
 org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
at 
 org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at 

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Jenny Zhao
Thanks Yin!

here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
experience any problem connecting to the metastore through Hive, which uses
DB2 as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
 <property>
  <name>hive.hwi.listen.port</name>
  <value></value>
 </property>
 <property>
  <name>hive.querylog.location</name>
  <value>/var/ibm/biginsights/hive/query/${user.name}</value>
 </property>
 <property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/biginsights/hive/warehouse</value>
 </property>
 <property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
 </property>
 <property>
  <name>hive.metastore.metrics.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.ibm.db2.jcc.DB2Driver</value>
 </property>
 <property>
  <name>hive.stats.autogather</name>
  <value>false</value>
 </property>
 <property>
  <name>javax.jdo.mapping.Schema</name>
  <value>HIVE</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>catalog</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
 </property>
 <property>
  <name>hive.metastore.password.encrypt</name>
  <value>true</value>
 </property>
 <property>
  <name>org.jpox.autoCreateSchema</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
 </property>
 <property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
 </property>
 <property>
  <name>hive.server2.thrift.port</name>
  <value>1</value>
 </property>
 <property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hdtest022.svl.ibm.com</value>
 </property>
 <property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
 </property>
 <property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
 </property>
 <property>
  <name>hive.server2.enable.impersonation</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.security.webconsole.url</name>
  <value>http://hdtest022.svl.ibm.com:8080</value>
 </property>
 <property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
 </property>
</configuration>



On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai huaiyin@gmail.com wrote:

 Hi Jenny,

 How's your metastore configured for both Hive and Spark SQL? Which
 metastore mode are you using (based on
 https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
 )?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao linlin200...@gmail.com
 wrote:



 you can reproduce this issue with the following steps (assuming you have
 Yarn cluster + Hive 12):

 1) using hive shell, create a database, e.g: create database ttt

 2) write a simple spark sql program

 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.sql._
 import org.apache.spark.sql.hive.HiveContext

 object HiveSpark {
   case class Record(key: Int, value: String)

   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("HiveSpark")
     val sc = new SparkContext(sparkConf)

     // A hive context creates an instance of the Hive Metastore in process
     val hiveContext = new HiveContext(sc)
     import hiveContext._

     hql("use ttt")
     hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
     hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

     // Queries are expressed in HiveQL
     println("Result of 'SELECT *': ")
     hql("SELECT * FROM src").collect.foreach(println)
     sc.stop()
   }
 }
 3) run it in yarn-cluster mode.


 On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian lian.cs@gmail.com
 wrote:

 Since you were using hql(...), it's probably not related to the JDBC driver.
 But I failed to reproduce this issue locally with a single-node
 pseudo-distributed YARN cluster. Would you mind elaborating on the steps to
 reproduce this bug? Thanks


 On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian 

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-10 Thread Cheng Lian
Hi Jenny, does this issue only happen when running Spark SQL with YARN in
your environment?


On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao linlin200...@gmail.com wrote:


 Hi,

 I am able to run my hql query on yarn cluster mode when connecting to the
 default hive metastore defined in hive-site.xml.

 however, if I want to switch to a different database, like:

   hql("use other-database")


 it only works in yarn client mode, but failed on yarn-cluster mode with
 the following stack:

 14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
 14/08/08 12:09:11 INFO audit: ugi=biadmin ip=unknown-ip-addr  
 cmd=get_database: tt
 14/08/08 12:09:11 ERROR RetryingHMSHandler: 
 NoSuchObjectException(message:There is no database named tt)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
   at java.lang.reflect.Method.invoke(Method.java:611)
   at 
 org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
   at $Proxy15.getDatabase(Unknown Source)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
   at java.lang.reflect.Method.invoke(Method.java:611)
   at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
   at $Proxy17.get_database(Unknown Source)
   at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
   at java.lang.reflect.Method.invoke(Method.java:611)
   at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
   at $Proxy18.getDatabase(Unknown Source)
   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
   at 
 org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
   at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
   at 
 org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
   at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
   at 
 org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
   at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
   at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
   at 
 org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
   at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
   at java.lang.reflect.Method.invoke(Method.java:611)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)

 14/08/08 12:09:11 ERROR DDLTask: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
   at 
 org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
   at