[jira] [Updated] (SPARK-21819) UserGroupInformation initialization in SparkHadoopUtilwill overwrite user config

Keith Sun (JIRA) Wed, 23 Aug 2017 04:12:20 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Keith Sun updated SPARK-21819:
------------------------------
    Description: 
When  submit job in Java or Scala code to ,the initialization of 
SparkHadoopUtil will trigger the configuration overwritten in UGI which may not 
be expected if the UGI has already been initialized by customized xmls which 
are not on the classpath (like the cfg4j , which could set conf from github 
code, a database etc). 

{code:java}
//it will overwrite the UGI conf which has already been initialized
class SparkHadoopUtil extends Logging {
  private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
  val conf: Configuration = newConfiguration(sparkConf)
  UserGroupInformation.setConfiguration(conf)
{code}

My scenario : My yarn cluster is kerberized, my configuration is set to use 
kerberos for hadoop security. While, after the initialzation of SparkHadoopUtil 
, the authentiationMethod in UGI is updated to "simple"(my xmls not on the 
classpath), which lead to the failure like below :

{code:java}
933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
SparkContext
Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
        at 
org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
        at 
org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
        at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
        at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
        at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
        at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
        at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at SparkTest.SparkEAZDebug.main(SparkEAZDebug.java:84)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client.call(Client.java:1426)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
        ... 22 more

{code}

Sample code :

{code:java}
                Configuration hc = new  Configuration(false);
                String yarnxml=String.format("%s/%s", 
ConfigLocation,"yarn-site.xml");
                String corexml=String.format("%s/%s", 
ConfigLocation,"core-site.xml");
                String hdfsxml=String.format("%s/%s", 
ConfigLocation,"hdfs-site.xml");
                String hivexml=String.format("%s/%s", 
ConfigLocation,"hive-site.xml");

                hc.addResource(yarnxml);
                hc.addResource(corexml);
                hc.addResource(hdfsxml);
                hc.addResource(hivexml);

                SparkConf sc = new SparkConf(true);
                hc.forEach(entry-> {                                    
                                if(entry.getKey().startsWith("hive")) {
                                        sc.set(entry.getKey(), 
entry.getValue());
                                }else {
                                        sc.set("spark.hadoop."+entry.getKey(), 
entry.getValue());
                                }
                        });

            UserGroupInformation.setConfiguration(hc);
            UserGroupInformation.loginUserFromKeytab(Principal, Keytab);        
        

                SparkSession sparkSessesion= SparkSession
                                .builder()
                                .master("yarn-client") //"yarn-client", "local"
                                .config(sc)
                                .appName(SparkEAZDebug.class.getName())
                                .enableHiveSupport() 
                                .getOrCreate();

{code}


  was:
When  submit job in Java or Scala code to ,the initialization of 
SparkHadoopUtil will trigger the configuration overwritten in UGI which may not 
be expected if the UGI has already been initialized by customized xmls which 
are not on the classpath (like the cfg4j , which could set conf from github 
code, a database etc). 

{code:java}
//it will overwrite the UGI conf which has already been initialized
class SparkHadoopUtil extends Logging {
  private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
  val conf: Configuration = newConfiguration(sparkConf)
  UserGroupInformation.setConfiguration(conf)
{code}

My scenario : My yarn cluster is kerberized, my configuration is set to use 
kerberos for hadoop security. While, after the initialzation of SparkHadoopUtil 
, the authentiationMethod in UGI is updated to "simple"(my xmls not on the 
classpath), which lead to the failure like below :

{code:java}
933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
SparkContext
Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
        at 
org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
        at 
org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
        at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
        at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
        at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
        at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
        at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at SparkTest.SparkEAZDebug.main(SparkEAZDebug.java:84)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client.call(Client.java:1426)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
        ... 22 more

{code}

Sample code :
                Configuration hc = new  Configuration(false);
                String yarnxml=String.format("%s/%s", 
ConfigLocation,"yarn-site.xml");
                String corexml=String.format("%s/%s", 
ConfigLocation,"core-site.xml");
                String hdfsxml=String.format("%s/%s", 
ConfigLocation,"hdfs-site.xml");
                String hivexml=String.format("%s/%s", 
ConfigLocation,"hive-site.xml");

                hc.addResource(yarnxml);
                hc.addResource(corexml);
                hc.addResource(hdfsxml);
                hc.addResource(hivexml);

                SparkConf sc = new SparkConf(true);
                hc.forEach(entry-> {                                    
                                if(entry.getKey().startsWith("hive")) {
                                        sc.set(entry.getKey(), 
entry.getValue());
                                }else {
                                        sc.set("spark.hadoop."+entry.getKey(), 
entry.getValue());
                                }
                        });

            UserGroupInformation.setConfiguration(hc);
            UserGroupInformation.loginUserFromKeytab(Principal, Keytab);        
        

                SparkSession sparkSessesion= SparkSession
                                .builder()
                                .master("yarn-client") //"yarn-client", "local"
                                .config(sc)
                                .appName(SparkEAZDebug.class.getName())
                                .enableHiveSupport() 
                                .getOrCreate();



>  UserGroupInformation initialization in SparkHadoopUtilwill overwrite user 
> config
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-21819
>                 URL: https://issues.apache.org/jira/browse/SPARK-21819
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, YARN
>    Affects Versions: 2.1.0, 2.1.1
>         Environment: Ubuntu14.04
> Spark2.10/2.11 (I checked the github of 2.20 , it exist there as well)
> Cluster mode: Yarn client 
>            Reporter: Keith Sun
>
> When  submit job in Java or Scala code to ,the initialization of 
> SparkHadoopUtil will trigger the configuration overwritten in UGI which may 
> not be expected if the UGI has already been initialized by customized xmls 
> which are not on the classpath (like the cfg4j , which could set conf from 
> github code, a database etc). 
> {code:java}
> //it will overwrite the UGI conf which has already been initialized
> class SparkHadoopUtil extends Logging {
>   private val sparkConf = new SparkConf(false).loadFromSystemProperties(true)
>   val conf: Configuration = newConfiguration(sparkConf)
>   UserGroupInformation.setConfiguration(conf)
> {code}
> My scenario : My yarn cluster is kerberized, my configuration is set to use 
> kerberos for hadoop security. While, after the initialzation of 
> SparkHadoopUtil , the authentiationMethod in UGI is updated to "simple"(my 
> xmls not on the classpath), which lead to the failure like below :
> {code:java}
> 933453 [main] INFO  org.apache.spark.SparkContext  - Successfully stopped 
> SparkContext
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
> SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>       at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:501)
>       at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>       at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:154)
>       at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>       at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:60)
>       at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:153)
>       at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
>       at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
>       at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
>       at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
>       at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
>       at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
>       at scala.Option.getOrElse(Option.scala:121)
>       at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
>       at SparkTest.SparkEAZDebug.main(SparkEAZDebug.java:84)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>  SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
>       at org.apache.hadoop.ipc.Client.call(Client.java:1426)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
>       ... 22 more
> {code}
> Sample code :
> {code:java}
>               Configuration hc = new  Configuration(false);
>               String yarnxml=String.format("%s/%s", 
> ConfigLocation,"yarn-site.xml");
>               String corexml=String.format("%s/%s", 
> ConfigLocation,"core-site.xml");
>               String hdfsxml=String.format("%s/%s", 
> ConfigLocation,"hdfs-site.xml");
>               String hivexml=String.format("%s/%s", 
> ConfigLocation,"hive-site.xml");
>               hc.addResource(yarnxml);
>               hc.addResource(corexml);
>               hc.addResource(hdfsxml);
>               hc.addResource(hivexml);
>               SparkConf sc = new SparkConf(true);
>               hc.forEach(entry-> {                                    
>                               if(entry.getKey().startsWith("hive")) {
>                                       sc.set(entry.getKey(), 
> entry.getValue());
>                               }else {
>                                       sc.set("spark.hadoop."+entry.getKey(), 
> entry.getValue());
>                               }
>                       });
>           UserGroupInformation.setConfiguration(hc);
>           UserGroupInformation.loginUserFromKeytab(Principal, Keytab);        
>         
>               SparkSession sparkSessesion= SparkSession
>                               .builder()
>                               .master("yarn-client") //"yarn-client", "local"
>                               .config(sc)
>                               .appName(SparkEAZDebug.class.getName())
>                               .enableHiveSupport() 
>                               .getOrCreate();
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-21819) UserGroupInformation initialization in SparkHadoopUtilwill overwrite user config

Reply via email to