Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-08 Thread Jenny Zhao
Hi,

I am able to run my hql query in yarn-cluster mode when connecting to the
default hive metastore defined in hive-site.xml.

However, if I want to switch to a different database, like:

  hql("use other-database")

it only works in yarn-client mode, but fails in yarn-cluster mode with the
following stack trace:

14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
14/08/08 12:09:11 INFO audit: ugi=biadmin ip=unknown-ip-addr cmd=get_database: tt
14/08/08 12:09:11 ERROR RetryingHMSHandler:
NoSuchObjectException(message:There is no database named tt)
	at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
	at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	at java.lang.reflect.Method.invoke(Method.java:611)
	at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
	at $Proxy15.getDatabase(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	at java.lang.reflect.Method.invoke(Method.java:611)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
	at $Proxy17.get_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	at java.lang.reflect.Method.invoke(Method.java:611)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at $Proxy18.getDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
	at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
	at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
	at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
	at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
	at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
	at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
	at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
	at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
	at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
	at java.lang.reflect.Method.invoke(Method.java:611)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)

14/08/08 12:09:11 ERROR DDLTask:
org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
	at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
	at org.apache.spark.sql.hive.HiveContex

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
Hi All,
I have tried to pass the properties via SparkContext.setLocalProperty and
HiveContext.setConf; both failed. Based on the results (I haven't had a chance
to look into the code yet), HiveContext tries to initiate the JDBC connection
right away, so I couldn't set other properties dynamically prior to any SQL
statement.
The only way to get it to work is to put these properties in hive-site.xml,
which did work for me. I'm wondering if there's a better way to dynamically
specify these Hive configurations, like --hiveconf, or some other way such as a
per-user hive-site.xml.
On a shared cluster, hive-site.xml is shared and cannot be managed per user on
the same edge server, especially when it contains a personal password for
metastore access. What would be the best way to pass these 3 properties to
spark-shell?

javax.jdo.option.ConnectionUserName
javax.jdo.option.ConnectionPassword
javax.jdo.option.ConnectionURL

According to the HiveContext documentation, hive-site.xml is picked up from the
classpath. Is there any way to specify this dynamically for each spark-shell
session?
"An instance of the Spark SQL execution engine that integrates with data stored
in Hive. Configuration for Hive is read from hive-site.xml on the classpath."

Here are the test cases I ran.
Spark 1.2.0

Test Case 1

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

sc.setLocalProperty("javax.jdo.option.ConnectionUserName","foo")
sc.setLocalProperty("javax.jdo.option.ConnectionPassword","xx")
sc.setLocalProperty("javax.jdo.option.ConnectionURL","jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

import hiveContext._

// Create table and clean up data
hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_table (key INT, value STRING)")
// Encounter error: it picks up the default user 'APP'@'localhost' and creates
// metastore_db in the current local directory, not honoring the JDBC settings
// for the metastore on mysql.

Test Case 2

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive._

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.setConf("javax.jdo.option.ConnectionUserName","foo")
// Encounter error right here; it looks like HiveContext tries to initiate the
// JDBC connection prior to any settings from setConf.
hiveContext.setConf("javax.jdo.option.ConnectionPassword","xxx")
hiveContext.setConf("javax.jdo.option.ConnectionURL","jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true")


From: huaiyin@gmail.com
Date: Wed, 13 Aug 2014 16:56:13 -0400
Subject: Re: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database
To: linlin200...@gmail.com
CC: lian.cs@gmail.com; user@spark.apache.org

I think the problem is that when you are using yarn-cluster mode, because the 
Spark driver runs inside the application master, the hive-conf is not 
accessible by the driver. Can you try to set those confs by using 
hiveContext.set(...)? Or, maybe you can copy hive-site.xml to spark/conf in the 
node running the application master.



On Tue, Aug 12, 2014 at 8:38 PM, Jenny Zhao  wrote:

Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under
$HIVE_HOME/conf.

Through the hive cli, I don't see any problem, but for Spark in yarn-cluster
mode I am not able to switch to a database other than the default one; in
yarn-client mode, it works fine.

Thanks!
Jenny

On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai  wrote:

Hi Jenny,
Have you copied hive-site.xml to the spark/conf directory? If not, can you put
it in conf/ and try again?

Thanks,
Yin

On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao  wrote:

Thanks Yin!

Here is my hive-site.xml, which I copied from $HIVE_HOME/conf; I didn't
experience any problem connecting to the metastore through hive, which uses DB2
as the metastore database.

 
  hive.hwi.listen.port

  hive.querylog.location
  /var/ibm/biginsights/hive/query/${user.name}

  hive.metastore.warehouse.dir
  /biginsights/hive/warehouse

  hive.hwi.war.file
  lib/hive-hwi-0.12.0.war

  hive.metastore.metrics.enabled
  true

  javax.jdo.option.ConnectionURL
  jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB

  javax.jdo.option.ConnectionDriverName
  com.ibm.db2.jcc.DB2Driver

  hive.stats.autogather
  false

  javax.jdo.mapping.Schema
  HIVE

  javax.jdo.option.ConnectionUserName
  catalog

  javax.jdo.option.ConnectionPassword
  V2pJNWMxbFlVbWhaZHowOQ==

  hive.metastore.password.encrypt
  true

  org.jpox.autoCreateSchema
  true

  hive.server2.thrift.min.worker.threads
  5

  hive.ser

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
A follow-up on the hive-site.xml. If you:
1. Specify it in spark/conf, then you can NOT apply it via the
--driver-class-path option; otherwise, you will get the following exception
when initializing SparkContext:

org.apache.spark.SparkException: Found both spark.driver.extraClassPath and
SPARK_CLASSPATH. Use only the former.

2. If you use --driver-class-path, then you need to unset SPARK_CLASSPATH.
However, the flip side is that you will need to provide all the related JARs
(hadoop-yarn, hadoop-common, hdfs, etc.) that are part of "hadoop-provided" if
you built your JARs with -Phadoop-provided, plus any other common libraries
that are required.
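Point 2 above can be sketched as a concrete launch command. This is a
hypothetical sketch only: the classpath entries, application class name
(com.example.HiveSpark), and jar name (hivespark.jar) are placeholders not
taken from the thread, which never shows the full command.

```shell
# Sketch of point 2: SPARK_CLASSPATH is unset, and hive-site.xml plus the
# "hadoop-provided" jars are supplied explicitly via --driver-class-path.
# All paths, the class name, and the jar name are placeholders.
unset SPARK_CLASSPATH
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-class-path "/etc/hive/conf/hive-site.xml:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/hdfs/*" \
  --class com.example.HiveSpark \
  hivespark.jar
```

The exact jar list depends on which profiles the application was built with.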

From: alee...@hotmail.com
To: user@spark.apache.org
CC: lian.cs@gmail.com; linlin200...@gmail.com; huaiyin@gmail.com
Subject: RE: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database
Date: Mon, 29 Dec 2014 16:01:26 -0800





RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-03 Thread Andrew Lee
Hi All,
In Spark 1.2.0-rc1, I have tried to set hive.metastore.warehouse.dir to share
the Hive warehouse location on HDFS; however, it does NOT work in yarn-cluster
mode. In the Namenode audit log, I see that Spark is trying to access the
default hive warehouse location,
/user/hive/warehouse/spark_hive_test_yarn_cluster_table, as opposed to
/hive/spark_hive_test_yarn_cluster_table.
A tweaked code snippet from the example looks like this. It was compiled,
built, and submitted in yarn-cluster mode. (It works in yarn-client mode, since
the driver can find hive-site.xml on the driver machine. But we don't deploy
hive-site.xml to all data nodes; that is not a standard deployment. Instead, it
should be part of --jars or --files, but it still fails when I do so.)

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive._

object SparkSQLTestCase2HiveContextYarnClusterApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark SQL Hive Context TestCase Application")
    val sc = new SparkContext(conf)
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

    import hiveContext._

    // Set default hive warehouse that aligns with /etc/hive/conf/hive-site.xml
    hiveContext.hql("SET hive.metastore.warehouse.dir=hdfs://hive")

    // Create table and clean up data
    hiveContext.hql("CREATE TABLE IF NOT EXISTS spark_hive_test_yarn_cluster_table (key INT, value STRING)")

    // Load sample data from HDFS; it needs to be uploaded first
    hiveContext.hql("LOAD DATA INPATH 'spark/test/resources/kv1.txt' INTO TABLE spark_hive_test_yarn_cluster_table")

    // Queries are expressed in HiveQL. collect() pulls results into memory, be careful.
    // This is just a test case; do NOT use the following line in production, store results to HDFS instead.
    hiveContext.hql("FROM spark_hive_test_yarn_cluster_table SELECT key, value").collect().foreach(println)
  }
}


Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-05 Thread Cheng Lian

Hi Jenny,

You may try to use --files $SPARK_HOME/conf/hive-site.xml
--driver-class-path hive-site.xml when submitting your application. The
problem is that when running in cluster mode, the driver is actually
running in a random container directory on a random executor node. By
using --files, you upload hive-site.xml to the container directory; by
using --driver-class-path hive-site.xml, you add the file to the classpath
(the path is relative to the container directory).

When running in cluster mode, have you tried to check the tables inside
the default database? If my guess is right, this should be an empty
default database inside the default Derby metastore created by
HiveContext when the hive-site.xml is missing.

Best,
Cheng

On 8/12/14 5:38 PM, Jenny Zhao wrote:



Hi Yin,

hive-site.xml was copied to spark/conf and the same as the one under 
$HIVE_HOME/conf.


through hive cli, I don't see any problem. but for spark on 
yarn-cluster mode, I am not able to switch to a database other than 
the default one, for Yarn-client mode, it works fine.


Thanks!

Jenny


On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai wrote:


Hi Jenny,

Have you copied hive-site.xml to spark/conf directory? If not, can
you put it in conf/ and try again?

Thanks,

Yin


On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao <linlin200...@gmail.com> wrote:


Thanks Yin!

here is my hive-site.xml,  which I copied from
$HIVE_HOME/conf, didn't experience problem connecting to the
metastore through hive. which uses DB2 as metastore database.





 
<configuration>
  <property>
    <name>hive.hwi.listen.port</name>
    <value></value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/var/ibm/biginsights/hive/query/${user.name}</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/biginsights/hive/warehouse</value>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>lib/hive-hwi-0.12.0.war</value>
  </property>
  <property>
    <name>hive.metastore.metrics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.ibm.db2.jcc.DB2Driver</value>
  </property>
  <property>
    <name>hive.stats.autogather</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.mapping.Schema</name>
    <value>HIVE</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>catalog</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
  </property>
  <property>
    <name>hive.metastore.password.encrypt</name>
    <value>true</value>
  </property>
  <property>
    <name>org.jpox.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>100</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>1</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hdtest022.svl.ibm.com</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
  </property>
  <property>
    <name>hive.server2.custom.authentication.class</name>
    <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.webconsole.url</name>
    <value>http://hdtest022.svl.ibm.com:8080</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
</configuration>




On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai <huaiyin@gmail.com> wrote:

Hi Jenny,

How's your metastore configured for both Hive and Spark SQL? Which
metastore mode are you using (based on
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin)?

Thanks,
Yin


On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao <linlin200...@gmail.com> wrote:



You can reproduce this issue with the following steps (assuming you have a
Yarn cluster + Hive 12):

1) using the hive shell, create a database, e.g.: create database ttt

2) write a simple spark sql program

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveSpark")
    val sc

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-17 Thread Andrew Lee
Hi All,
Just want to give everyone an update on what worked for me. Thanks for Cheng's
comment and other people's help.
What I misunderstood was --driver-class-path and how it relates to --files. I
put /etc/hive/hive-site.xml in both --files and --driver-class-path when I
started in yarn-cluster mode:

./bin/spark-submit --verbose --queue research --driver-java-options
"-XX:MaxPermSize=8192M" --files /etc/hive/hive-site.xml --driver-class-path
/etc/hive/hive-site.xml --master yarn --deploy-mode cluster 

The problem here is that --files only looks for local files to distribute onto
HDFS. The --driver-class-path is what goes onto the CLASSPATH at runtime, and
as you can see, it tries to look for /etc/hive/hive-site.xml in the container
on the remote nodes, where it apparently doesn't exist. For some people it may
work fine because they deploy the Hive configuration and JARs across their
entire cluster, so every node looks the same. But this wasn't my case in a
multi-tenant environment or a restricted secured cluster. So my parameters look
like this when I launch it:

./bin/spark-submit --verbose --queue research --driver-java-options
"-XX:MaxPermSize=8192M" --files /etc/hive/hive-site.xml --driver-class-path
hive-site.xml --master yarn --deploy-mode cluster 

So --driver-class-path here will only look at ./hive-site.xml in the remote
container, which was pre-deployed already by --files.
This worked for me, and I can have the HiveContext API talk to the Hive
metastore, and vice versa. Thanks.
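The working invocation described above can be sketched as one complete
command. This is a hedged sketch, not the exact command from the thread (which
is truncated): the application class (com.example.HiveSpark) and jar
(hivespark.jar) are placeholders. The key point is that --files uploads the
local /etc/hive/hive-site.xml into the container working directory, while
--driver-class-path references it by its container-relative name.

```shell
# Hypothetical complete spark-submit command; class and jar names are
# placeholders. --files distributes hive-site.xml to the containers;
# --driver-class-path adds the container-local copy to the driver classpath.
./bin/spark-submit --verbose \
  --queue research \
  --driver-java-options "-XX:MaxPermSize=8192M" \
  --files /etc/hive/hive-site.xml \
  --driver-class-path hive-site.xml \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.HiveSpark \
  hivespark.jar
```

Note that the two paths deliberately differ: the --files argument is an
absolute path on the submitting machine, while the --driver-class-path
argument is relative to the YARN container directory.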


Date: Thu, 5 Feb 2015 16:59:12 -0800
From: lian.cs@gmail.com
To: linlin200...@gmail.com; huaiyin@gmail.com
CC: user@spark.apache.org
Subject: Re: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database


  

  
  


Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-04-13 Thread sachin Singh
Hi Linlin,
Have you found a solution for this issue? If yes, what needs to be corrected?
I am also getting the same error when submitting a Spark job in cluster mode:

2015-04-14 18:16:43 DEBUG Transaction - Transaction rolled back in 0 ms
2015-04-14 18:16:43 ERROR DDLTask -
org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: my_database
	at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:4054)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:269)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java
...

Please suggest. I have copied hive-site.xml into spark/conf; in standalone mode
it works fine.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-sql-failed-in-yarn-cluster-mode-when-connecting-to-non-default-hive-database-tp11811p22486.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-10 Thread Cheng Lian
Hi Jenny, does this issue only happen when running Spark SQL with YARN in
your environment?


On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao  wrote:


Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Cheng Lian
Since you were using hql(...), it’s probably not related to the JDBC driver.
But I failed to reproduce this issue locally with a single-node
pseudo-distributed YARN cluster. Would you mind elaborating on the steps to
reproduce this bug? Thanks


On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian  wrote:

> Hi Jenny, does this issue only happen when running Spark SQL with YARN in
> your environment?
>
>
> On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao  wrote:
>
>>
>> Hi,
>>
>> I am able to run my hql query on yarn cluster mode when connecting to the
>> default hive metastore defined in hive-site.xml.
>>
>> however, if I want to switch to a different database, like:
>>
>>   hql("use other-database")
>>
>>
>> it only works in yarn client mode, but failed on yarn-cluster mode with
>> the following stack:
>>
>> [stack trace identical to the one in the original message; snipped]

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Jenny Zhao
You can reproduce this issue with the following steps (assuming you have a
YARN cluster + Hive 0.12):

1) using the hive shell, create a database, e.g.: create database ttt

2) write a simple Spark SQL program

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveSpark")
    val sc = new SparkContext(sparkConf)

    // A HiveContext creates an in-process instance of the Hive metastore client
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    hql("use ttt")
    hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

    // Queries are expressed in HiveQL
    println("Result of 'SELECT *': ")
    hql("SELECT * FROM src").collect.foreach(println)
    sc.stop()
  }
}
3) run it in yarn-cluster mode.
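For reference, the submission command for step 3 looks roughly like the
following sketch; the jar name and resource settings are placeholders for
your own build, not the exact command I used:

```
# Sketch of step 3: submit the program above in yarn-cluster mode.
# Jar path and resource settings below are illustrative placeholders.
spark-submit \
  --class org.apache.spark.examples.sql.hive.HiveSpark \
  --master yarn-cluster \
  --num-executors 2 \
  --executor-memory 1g \
  hive-spark-example.jar
```

Running the same command with --master yarn-client is what succeeds in my
environment.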


On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian  wrote:

> [earlier messages quoted in full; snipped]

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Yin Huai
Hi Jenny,

How's your metastore configured for both Hive and Spark SQL? Which
metastore mode are you using (based on
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
)?

Thanks,

Yin


On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao  wrote:

> [earlier messages quoted in full; snipped]

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-11 Thread Jenny Zhao
Thanks Yin!

Here is my hive-site.xml, which I copied from $HIVE_HOME/conf; I haven't
had any problem connecting to the metastore through Hive. It uses DB2 as
the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
  <name>hive.hwi.listen.port</name>
  <value></value>
 </property>
 <property>
  <name>hive.querylog.location</name>
  <value>/var/ibm/biginsights/hive/query/${user.name}</value>
 </property>
 <property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/biginsights/hive/warehouse</value>
 </property>
 <property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
 </property>
 <property>
  <name>hive.metastore.metrics.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.ibm.db2.jcc.DB2Driver</value>
 </property>
 <property>
  <name>hive.stats.autogather</name>
  <value>false</value>
 </property>
 <property>
  <name>javax.jdo.mapping.Schema</name>
  <value>HIVE</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>catalog</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
 </property>
 <property>
  <name>hive.metastore.password.encrypt</name>
  <value>true</value>
 </property>
 <property>
  <name>org.jpox.autoCreateSchema</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
 </property>
 <property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
 </property>
 <property>
  <name>hive.server2.thrift.port</name>
  <value>1</value>
 </property>
 <property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hdtest022.svl.ibm.com</value>
 </property>
 <property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
 </property>
 <property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
 </property>
 <property>
  <name>hive.server2.enable.impersonation</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.security.webconsole.url</name>
  <value>http://hdtest022.svl.ibm.com:8080</value>
 </property>
 <property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
 </property>
</configuration>

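To answer the metastore-mode question: this hive-site.xml points the Hive
client directly at a DB2 backing database via JDO (a local metastore), rather
than at a standalone metastore service. For comparison, a remote-metastore
setup would instead carry something like the following fragment; the host and
port here are illustrative placeholders, not values from my cluster:

```xml
<!-- Illustrative remote-metastore configuration; host and port are placeholders. -->
<property>
 <name>hive.metastore.uris</name>
 <value>thrift://metastore-host.example.com:9083</value>
</property>
```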

On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai  wrote:

> [earlier messages quoted in full; snipped]

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-12 Thread Yin Huai
Hi Jenny,

Have you copied hive-site.xml to spark/conf directory? If not, can you put
it in conf/ and try again?

Thanks,

Yin


On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao  wrote:

> [earlier messages quoted in full; snipped]

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-12 Thread Jenny Zhao
Hi Yin,

hive-site.xml was copied to spark/conf and is identical to the one under
$HIVE_HOME/conf.

Through the Hive CLI, I don't see any problem; but for Spark in
yarn-cluster mode I am not able to switch to a database other than the
default one, while in yarn-client mode it works fine.

Thanks!

Jenny


On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai  wrote:

> [earlier messages quoted in full; snipped]

Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-08-13 Thread Yin Huai
I think the problem is that in yarn-cluster mode, because the Spark driver
runs inside the application master, the Hive configuration is not accessible
to the driver. Can you try to set those confs via hiveContext.set(...)? Or,
maybe you can copy hive-site.xml to spark/conf on the node running the
application master.
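A sketch of the first suggestion, in case it helps: push the metastore
connection settings into the driver-side Hive session before the first
metastore call. Whether these JDO properties actually take effect once the
metastore client has been initialized is exactly what would need testing;
the object name is made up, and the connection values are copied from
Jenny's posted hive-site.xml:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveSparkWithConf {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("HiveSparkWithConf"))
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    // Set the metastore connection properties from hive-site.xml on the
    // driver before any metastore access (untested whether this overrides
    // an already-initialized local metastore).
    hql("SET javax.jdo.option.ConnectionURL=jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB")
    hql("SET javax.jdo.option.ConnectionDriverName=com.ibm.db2.jcc.DB2Driver")

    hql("use ttt") // should now resolve against the real metastore
    sc.stop()
  }
}
```

Alternatively, passing the file with spark-submit --files
$HIVE_HOME/conf/hive-site.xml should ship it into the application master's
working directory, which may be another way to make it visible to the
driver in yarn-cluster mode.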


On Tue, Aug 12, 2014 at 8:38 PM, Jenny Zhao  wrote:

>
> Hi Yin,
>
> hive-site.xml was copied to spark/conf and the same as the one under
> $HIVE_HOME/conf.
>
> through hive cli, I don't see any problem. but for spark on yarn-cluster
> mode, I am not able to switch to a database other than the default one, for
> Yarn-client mode, it works fine.
>
> Thanks!
>
> Jenny
>
>
> On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai  wrote:
>
>> Hi Jenny,
>>
>> Have you copied hive-site.xml to spark/conf directory? If not, can you
>> put it in conf/ and try again?
>>
>> Thanks,
>>
>> Yin
>>
>>
>> On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao 
>> wrote:
>>
>>>
>>> Thanks Yin!
>>>
>>> here is my hive-site.xml,  which I copied from $HIVE_HOME/conf, didn't
>>> experience problem connecting to the metastore through hive. which uses DB2
>>> as metastore database.
>>>
>>> <?xml version="1.0"?>
>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>
>>> <configuration>
>>>  <property>
>>>   <name>hive.hwi.listen.port</name>
>>>   <value></value>
>>>  </property>
>>>  <property>
>>>   <name>hive.querylog.location</name>
>>>   <value>/var/ibm/biginsights/hive/query/${user.name}</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.metastore.warehouse.dir</name>
>>>   <value>/biginsights/hive/warehouse</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.hwi.war.file</name>
>>>   <value>lib/hive-hwi-0.12.0.war</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.metastore.metrics.enabled</name>
>>>   <value>true</value>
>>>  </property>
>>>  <property>
>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>   <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
>>>  </property>
>>>  <property>
>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>   <value>com.ibm.db2.jcc.DB2Driver</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.stats.autogather</name>
>>>   <value>false</value>
>>>  </property>
>>>  <property>
>>>   <name>javax.jdo.mapping.Schema</name>
>>>   <value>HIVE</value>
>>>  </property>
>>>  <property>
>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>   <value>catalog</value>
>>>  </property>
>>>  <property>
>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>   <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.metastore.password.encrypt</name>
>>>   <value>true</value>
>>>  </property>
>>>  <property>
>>>   <name>org.jpox.autoCreateSchema</name>
>>>   <value>true</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.thrift.min.worker.threads</name>
>>>   <value>5</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.thrift.max.worker.threads</name>
>>>   <value>100</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.thrift.port</name>
>>>   <value>1</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.thrift.bind.host</name>
>>>   <value>hdtest022.svl.ibm.com</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.authentication</name>
>>>   <value>CUSTOM</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.custom.authentication.class</name>
>>>   <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.server2.enable.impersonation</name>
>>>   <value>true</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.security.webconsole.url</name>
>>>   <value>http://hdtest022.svl.ibm.com:8080</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.security.authorization.enabled</name>
>>>   <value>true</value>
>>>  </property>
>>>  <property>
>>>   <name>hive.security.authorization.createtable.owner.grants</name>
>>>   <value>ALL</value>
>>>  </property>
>>> </configuration>
>>>
>>>
>>>
>>> On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai  wrote:
>>>
 Hi Jenny,

 How's your metastore configured for both Hive and Spark SQL? Which
 metastore mode are you using (based on
 https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
 )?

 Thanks,

 Yin


 On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao 
 wrote:

>
>
> You can reproduce this issue with the following steps (assuming you
> have a YARN cluster and Hive 0.12):
>
> 1) using hive shell, create a database, e.g: create database ttt
>
> 2) write a simple spark sql program
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql._
> import org.apache.spark.sql.hive.HiveContext
>
> object HiveSpark {
>   case class Record(key: Int, value: String)
>
>   def main(args: Array[String]) {
> val sparkConf = new SparkConf().setAppName("HiveSpark")
> val sc = new SparkContext(sparkConf)
>
> // A HiveContext creates an in-process instance of the Hive metastore
> val hiveContext = new HiveContext(sc)
> import hiveContext._
>
> hql("use ttt")
> hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")
>
> // Queries are expressed in HiveQL
> println("Result of 'SELECT *': ")
> hql("SELECT * FROM src").collect.foreach(println)
> sc.stop()
>   }
> }
> 3) run it in yarn-cluster mode.
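> For step 3, the submission looks roughly like this (the jar path, Scala
> version, and class name are assumptions matching the program above, not
> taken from the original message):

```shell
# Sketch: package the HiveSpark program above and submit it in
# yarn-cluster mode. Jar path and Scala version are hypothetical.
sbt package
spark-submit --master yarn-cluster \
  --class HiveSpark \
  target/scala-2.10/hivespark_2.10-1.0.jar
```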
>
>
> On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian 
> wrote:
>
>> Since you were using hql(...), it's probably not related to the JDBC
>> driver. But I failed to reproduce this issue locally with a single-node
>> pseudo-distributed YARN cluster. Would you mind elaborating on the steps
>> to reproduce this bug? Thanks
>>
>>
>> On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian 
>> wrote:
>>
>>> Hi Jenny, does this issue only happen when run