Hi Jenny,
You may try adding |--files $SPARK_HOME/conf/hive-site.xml
--driver-class-path hive-site.xml| when submitting your application. The
problem is that in cluster mode, the driver runs inside a container on an
arbitrary node of the YARN cluster. |--files| uploads hive-site.xml into
that container's working directory, and |--driver-class-path hive-site.xml|
adds the file to the driver classpath (the path is relative to the
container directory).
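Put together, the submission would look roughly like this (a sketch only:
|hive-spark.jar| is a placeholder jar name, and |HiveSpark| is the example
class posted in this thread; adjust both to your build):

```shell
# Sketch of the suggested submission. hive-spark.jar is a placeholder
# jar name; HiveSpark is the example class from this thread.
spark-submit \
  --master yarn-cluster \
  --class HiveSpark \
  --files $SPARK_HOME/conf/hive-site.xml \
  --driver-class-path hive-site.xml \
  hive-spark.jar
```

Since the classpath entry is resolved relative to the container's working
directory, the bare file name |hive-site.xml| is what you want after
|--driver-class-path|, not an absolute path on the submitting machine.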
When running in cluster mode, have you tried checking the tables inside
the default database? If my guess is right, you should see an empty
default database inside the local Derby metastore that HiveContext
creates when hive-site.xml is missing.
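One indirect way to check for the Derby fallback without changing the
application (a sketch, assuming YARN log aggregation is enabled;
|<application-id>| is a placeholder for your application's YARN id):

```shell
# If the driver fell back to a local Derby metastore, the driver
# container's logs typically mention Derby/DataNucleus initialization.
# <application-id> is a placeholder for the YARN application id.
yarn logs -applicationId <application-id> | grep -i -E "derby|metastore"
```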
Best,
Cheng
On 8/12/14 5:38 PM, Jenny Zhao wrote:
Hi Yin,
hive-site.xml was copied to spark/conf and is identical to the one under
$HIVE_HOME/conf.
Through the Hive CLI, I don't see any problem. But for Spark in
yarn-cluster mode, I am not able to switch to a database other than
the default one; yarn-client mode works fine.
Thanks!
Jenny
On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai <huaiyin....@gmail.com> wrote:
Hi Jenny,
Have you copied hive-site.xml to the spark/conf directory? If not, can
you put it in conf/ and try again?
Thanks,
Yin
On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao <linlin200...@gmail.com> wrote:
Thanks Yin!
Here is my hive-site.xml, which I copied from $HIVE_HOME/conf; I didn't
experience any problem connecting to the metastore through Hive. It uses
DB2 as the metastore database.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/var/ibm/biginsights/hive/query/${user.name}</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/biginsights/hive/warehouse</value>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>lib/hive-hwi-0.12.0.war</value>
  </property>
  <property>
    <name>hive.metastore.metrics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.ibm.db2.jcc.DB2Driver</value>
  </property>
  <property>
    <name>hive.stats.autogather</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.mapping.Schema</name>
    <value>HIVE</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>catalog</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
  </property>
  <property>
    <name>hive.metastore.password.encrypt</name>
    <value>true</value>
  </property>
  <property>
    <name>org.jpox.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>100</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hdtest022.svl.ibm.com</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
  </property>
  <property>
    <name>hive.server2.custom.authentication.class</name>
    <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.webconsole.url</name>
    <value>http://hdtest022.svl.ibm.com:8080</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
</configuration>
On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai <huaiyin....@gmail.com> wrote:
Hi Jenny,
How's your metastore configured for both Hive and Spark
SQL? Which metastore mode are you using (based on
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin)?
Thanks,
Yin
On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao <linlin200...@gmail.com> wrote:
You can reproduce this issue with the following steps (assuming you have
a YARN cluster + Hive 0.12):
1) Using the Hive shell, create a database, e.g. |create database ttt|
2) Write a simple Spark SQL program:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveSpark")
    val sc = new SparkContext(sparkConf)

    // A HiveContext creates an in-process instance of the Hive metastore
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    hql("use ttt")
    hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

    // Queries are expressed in HiveQL
    println("Result of 'SELECT *': ")
    hql("SELECT * FROM src").collect.foreach(println)

    sc.stop()
  }
}
3) Run it in yarn-cluster mode.
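Steps 1 and 3 can be run roughly as follows (a sketch; |HiveSpark.jar| is
a placeholder name for the jar built from the program in step 2):

```shell
# Step 1: create the test database through the Hive shell
hive -e "CREATE DATABASE ttt"

# Step 3: submit the program from step 2 in yarn-cluster mode
# (HiveSpark.jar is a placeholder jar name)
spark-submit \
  --master yarn-cluster \
  --class HiveSpark \
  HiveSpark.jar
```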
On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
Since you were using |hql(...)|, it's probably not related to the JDBC
driver. But I failed to reproduce this issue locally with a single-node
pseudo-distributed YARN cluster. Would you mind elaborating on the steps
to reproduce this bug?
Thanks
On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
Hi Jenny, does this issue only happen when
running Spark SQL with YARN in your environment?
On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao <linlin200...@gmail.com> wrote:
Hi,
I am able to run my HQL query in yarn-cluster mode when connecting to
the default Hive metastore defined in hive-site.xml.
However, if I want to switch to a different database, like:
|hql("use other-database")|
it only works in yarn-client mode, and fails in yarn-cluster mode with
the following stack:
14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
14/08/08 12:09:11 INFO audit: ugi=biadmin ip=unknown-ip-addr cmd=get_database: tt
14/08/08 12:09:11 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named tt)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
    at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
    at $Proxy15.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
    at $Proxy17.get_database(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
    at $Proxy18.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
    at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
    at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)
14/08/08 12:09:11 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
    at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
    at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)
Why is that? I am not sure if this has something to do with the Hive
JDBC driver.
Thank you!
Jenny