RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

Andrew Lee Tue, 17 Feb 2015 09:13:51 -0800

Hi All,
Just want to give everyone an update on what worked for me. Thanks to Cheng for
his comment and to everyone else for their help.
What I had misunderstood was --driver-class-path and how it relates to
--files. I put /etc/hive/hive-site.xml in both --files and
--driver-class-path when I started in yarn-cluster mode.
./bin/spark-submit --verbose --queue research --driver-java-options 
"-XX:MaxPermSize=8192M" --files /etc/hive/hive-site.xml --driver-class-path 
/etc/hive/hive-site.xml --master yarn --deploy-mode cluster ............
The problem here is that --files only looks at local files and distributes them
via HDFS, while --driver-class-path is what gets added to the CLASSPATH at
runtime. As you can see, it was trying to find /etc/hive/hive-site.xml on the
container on the remote nodes, where it doesn't exist. For some people it
may work fine because they deploy the Hive configuration and JARs across their
entire cluster, so every node looks the same. But that wasn't my case in a
multi-tenant environment or a restricted, secured cluster. So my parameters look
like this when I launch it:

./bin/spark-submit --verbose --queue research --driver-java-options 
"-XX:MaxPermSize=8192M" --files /etc/hive/hive-site.xml --driver-class-path 
hive-site.xml --master yarn --deploy-mode cluster ............
So --driver-class-path here only looks at ./hive-site.xml on the remote
container, which was already deployed there by --files.
This worked for me, and I can now use the HiveContext API to talk to the Hive
metastore, and vice versa. Thanks.
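
For anyone who wants to script this, here is a rough sketch of the idea described above: give --files the full local path, and give --driver-class-path only the bare file name, since that is where the file ends up relative to the container directory. The names HIVE_SITE_LOCAL and build_submit_args are made up for illustration, the queue and JVM options are left out, and the script only prints the arguments (a dry run) instead of launching anything.

```shell
#!/bin/sh
# Sketch only: HIVE_SITE_LOCAL and build_submit_args are hypothetical names,
# not part of Spark. Nothing is submitted; the arguments are only printed.

# Path to hive-site.xml on the machine where spark-submit is run.
HIVE_SITE_LOCAL=/etc/hive/hive-site.xml

# Print the relevant spark-submit arguments, one per line.
build_submit_args() {
  local_path=$1
  # --files ships the local file into the container's working directory,
  # so the classpath entry is the bare file name, relative to that directory.
  container_name=$(basename "$local_path")
  printf '%s\n' \
    --master yarn \
    --deploy-mode cluster \
    --files "$local_path" \
    --driver-class-path "$container_name"
}

build_submit_args "$HIVE_SITE_LOCAL"
```

The printed list can be checked by eye before it is appended to ./bin/spark-submit along with the application JAR and any other options your cluster needs.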


Date: Thu, 5 Feb 2015 16:59:12 -0800
From: lian.cs....@gmail.com
To: linlin200...@gmail.com; huaiyin....@gmail.com
CC: user@spark.apache.org
Subject: Re: Spark sql failed in yarn-cluster mode when connecting to 
non-default hive database

Hi Jenny,

You may try to use --files $SPARK_HOME/conf/hive-site.xml --driver-class-path
hive-site.xml when submitting your application. The problem is that when
running in cluster mode, the driver is actually running in a random container
directory on a random executor node. By using --files, you upload
hive-site.xml to the container directory; by using --driver-class-path
hive-site.xml, you add the file to the classpath (the path is relative to the
container directory).

When running in cluster mode, have you tried to check the tables inside the
default database? If my guess is right, this should be an empty default
database inside the default Derby metastore created by HiveContext when the
hive-site.xml is missing.

Best,
Cheng

On 8/12/14 5:38 PM, Jenny Zhao wrote:

Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under
$HIVE_HOME/conf.

Through hive cli, I don't see any problem. But for spark on yarn-cluster
mode, I am not able to switch to a database other than the default one; for
yarn-client mode, it works fine.

Thanks!

Jenny

On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai <huaiyin....@gmail.com> wrote:

Hi Jenny,

Have you copied hive-site.xml to spark/conf directory? If not, can you put it
in conf/ and try again?

Thanks,

Yin

On Mon, Aug 11, 2014 at 8:57 PM, Jenny Zhao <linlin200...@gmail.com> wrote:

Thanks Yin!

Here is my hive-site.xml, which I copied from $HIVE_HOME/conf. I didn't
experience any problem connecting to the metastore through hive, which uses DB2
as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
 <property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
 </property>
 <property>
  <name>hive.querylog.location</name>
  <value>/var/ibm/biginsights/hive/query/${user.name}</value>
 </property>
 <property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/biginsights/hive/warehouse</value>
 </property>
 <property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.12.0.war</value>
 </property>
 <property>
  <name>hive.metastore.metrics.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.ibm.db2.jcc.DB2Driver</value>
 </property>
 <property>
  <name>hive.stats.autogather</name>
  <value>false</value>
 </property>
 <property>
  <name>javax.jdo.mapping.Schema</name>
  <value>HIVE</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>catalog</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
 </property>
 <property>
  <name>hive.metastore.password.encrypt</name>
  <value>true</value>
 </property>
 <property>
  <name>org.jpox.autoCreateSchema</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
 </property>
 <property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
 </property>
 <property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
 </property>
 <property>
  <name>hive.server2.thrift.bind.host</name>
  <value>hdtest022.svl.ibm.com</value>
 </property>
 <property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
 </property>
 <property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
 </property>
 <property>
  <name>hive.server2.enable.impersonation</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.security.webconsole.url</name>
  <value>http://hdtest022.svl.ibm.com:8080</value>
 </property>
 <property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
 </property>
 <property>
  <name>hive.security.authorization.createtable.owner.grants</name>
  <value>ALL</value>
 </property>
</configuration>

On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai <huaiyin....@gmail.com> wrote:

Hi Jenny,

How's your metastore configured for both Hive and Spark SQL? Which metastore
mode are you using (based on
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin)?

Thanks,

Yin

On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao <linlin200...@gmail.com> wrote:

You can reproduce this issue with the following steps (assuming you have Yarn
cluster + Hive 12):

1) using hive shell, create a database, e.g: create database ttt

2) write a simple spark sql program
                                                    
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveSpark")
    val sc = new SparkContext(sparkConf)

    // A hive context creates an instance of the Hive Metastore in process.
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    hql("use ttt")
    hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

    // Queries are expressed in HiveQL
    println("Result of 'SELECT *': ")
    hql("SELECT * FROM src").collect.foreach(println)

    sc.stop()
  }
}
                                                
3) run it in yarn-cluster mode.

On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

Since you were using hql(...), it’s probably not related to the JDBC driver.
But I failed to reproduce this issue locally with a single-node
pseudo-distributed YARN cluster. Would you mind elaborating on the steps to
reproduce this bug? Thanks

On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

Hi Jenny, does this issue only happen when running Spark SQL with YARN in your
environment?

On Sat, Aug 9, 2014 at 3:56 AM, Jenny Zhao <linlin200...@gmail.com> wrote:

Hi,

I am able to run my hql query on yarn cluster mode when connecting to the
default hive metastore defined in hive-site.xml.

However, if I want to switch to a different database, like:

  hql("use other-database")

it only works in yarn client mode, but failed on yarn-cluster mode with the
following stack:

14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
14/08/08 12:09:11 INFO audit: ugi=biadmin       ip=unknown-ip-addr      cmd=get_database: tt
14/08/08 12:09:11 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named tt)
        at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
        at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
        at $Proxy15.getDatabase(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
        at $Proxy17.get_database(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
        at $Proxy18.getDatabase(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
        at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
        at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
        at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
        at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
        at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
        at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.spark.deploy.yarn.ApplicationMaster$anon$2.run(ApplicationMaster.scala:186)

14/08/08 12:09:11 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
        at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
        at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
        at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
        at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
        at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
        at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
        at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.spark.deploy.yarn.ApplicationMaster$anon$2.run(ApplicationMaster.scala:186)

Why is that? Not sure if this is something to do with hive jdbc driver?

Thank you!

Jenny
