All,

    just to see if this happens to others as well.

  This is tested against Spark 1.5.1 (branch 1.5, labeled 1.5.2-SNAPSHOT,
at commit 84f510c4fa06e43bd35e2dc8e1008d0590cbe266 from Tue Oct 6).

   Spark deployment mode : Spark-Cluster

   Notice that if Kerberos is enabled, the Spark YARN client fails with
the following:

Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.ql.metadata.Hive
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.Client$.org$apache$spark$deploy$yarn$Client$$obtainTokenForHiveMetastore(Client.scala:1252)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:271)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)


Diving into the YARN Client.scala code and testing against different
dependencies, I noticed the following: if Kerberos is enabled,
Client.obtainTokenForHiveMetastore() uses Scala reflection to load Hive
and HiveConf and to invoke methods on them:


      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
      val hive = hiveClass.getMethod("get").invoke(null)

      val hiveConf = hiveClass.getMethod("getConf").invoke(hive)
      val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")

      val hiveConfGet = (param: String) => Option(hiveConfClass
        .getMethod("get", classOf[java.lang.String])
        .invoke(hiveConf, param))


   If "org.spark-project.hive" % "hive-exec" % "1.2.1.spark" is used,
then you get the above exception. But if we use

       "org.apache.hive" % "hive-exec" % "0.13.1-cdh5.2.0"

the same code does not throw an exception.
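For anyone trying to reproduce this, the dependency swap in sbt form (a hypothetical build.sbt fragment; the resolver line is an assumption, since the cdh artifact is not on Maven Central):

```scala
// build.sbt fragment (assumption: the cdh artifact needs the Cloudera repository)
resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

// fails in obtainTokenForHiveMetastore with the NoClassDefFoundError above:
libraryDependencies += "org.spark-project.hive" % "hive-exec" % "1.2.1.spark"

// works:
// libraryDependencies += "org.apache.hive" % "hive-exec" % "0.13.1-cdh5.2.0"
```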


  Here are some questions and comments:

0) Is this a bug?

1) Why does the spark-project hive-exec behave differently? I understand
that the spark-project hive-exec has fewer dependencies,

   but I would expect it to be functionally the same.

2) Where can I find the source code for the spark-project hive-exec?

3) Regarding the method obtainTokenForHiveMetastore():

   I would assume that the method should first check whether the
hive metastore URI is present before

   trying to get the hive metastore tokens, but it seems to invoke the
reflection regardless of whether the Hive service is enabled in the
cluster.
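A minimal sketch of the check I have in mind, assuming the metastore URI lives under the standard "hive.metastore.uris" key; a plain Map stands in for the Hadoop Configuration so the snippet is self-contained:

```scala
object MetastoreUriCheck {
  def main(args: Array[String]): Unit = {
    // stand-in for the Hadoop Configuration passed to obtainTokenForHiveMetastore
    val conf = Map.empty[String, String]

    // only attempt the reflective token fetch when a metastore URI is configured
    val metastoreUri = conf.getOrElse("hive.metastore.uris", "")
    if (metastoreUri.isEmpty) {
      println("No hive.metastore.uris set; skipping metastore token fetch")
    } else {
      // ... proceed with the reflective Hive.get / delegation token calls
    }
  }
}
```

With no URI configured (as in a cluster without the Hive service), the method would return early and the reflection path is never entered.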

4) Noticed that obtainTokenForHBase() in the same class (Client.scala) catches

   case e: java.lang.NoClassDefFoundError => logDebug("HBase Class not found: " + e)

   and just ignores the exception (logs at debug level),

   but obtainTokenForHiveMetastore() does not catch
NoClassDefFoundError; I guess this is the problem.

private def obtainTokenForHiveMetastore(conf: Configuration, credentials: Credentials) {
  try {
    // rest of code
  } catch {
    case e: java.lang.NoSuchMethodException => { logInfo("Hive Method not found " + e); return }
    case e: java.lang.ClassNotFoundException => { logInfo("Hive Class not found " + e); return }
    case e: Exception => { logError("Unexpected Exception " + e)
      throw new RuntimeException("Unexpected exception", e)
    }
  }
}
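A self-contained sketch of the fix I am suggesting, mirroring the HBase handler: treat NoClassDefFoundError the same way as ClassNotFoundException. The class name below is the real one from the stack trace; println stands in for Spark's logInfo so this compiles on its own:

```scala
object HiveTokenDemo {
  def obtainTokenSafely(): Unit = {
    try {
      // same reflective load that Client.scala performs; without Hive on the
      // classpath this throws ClassNotFoundException, and a Hive class whose
      // static initializer failed surfaces as NoClassDefFoundError instead
      val hiveClass = getClass.getClassLoader
        .loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
      hiveClass.getMethod("get").invoke(null)
    } catch {
      case e: ClassNotFoundException => println("Hive Class not found " + e)
      case e: NoClassDefFoundError   => println("Hive Class not found " + e)
      case e: NoSuchMethodException  => println("Hive Method not found " + e)
    }
  }

  def main(args: Array[String]): Unit = obtainTokenSafely()
}
```

Run without hive-exec on the classpath, the reflective load fails but the client would continue instead of aborting the application submission.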


thanks


Chester
