Re: A bug in spark or hadoop RPC with kerberos authentication?

2017-08-22 Thread 周康
You can check out the Hadoop credential classes in Spark's YARN module. During spark-submit, Spark uses the configuration found on the classpath.
I wonder how you reference your own config?


RE: A bug in spark or hadoop RPC with kerberos authentication?

2017-08-23 Thread Sun, Keith
Finally found the root cause and raised a bug issue:
https://issues.apache.org/jira/browse/SPARK-21819



Thanks very much.
Keith

From: Sun, Keith
Sent: August 22, 2017 8:48
To: user@spark.apache.org
Subject: A bug in spark or hadoop RPC with kerberos authentication?

Hello ,

I ran into a very weird issue that, while easy to reproduce, kept me stuck for more
than a day. I suspect it may be a bug related to the class loader.
Can you help confirm the root cause?

I want to specify a customized set of Hadoop configuration files instead of the
ones on the classpath (we have several Hadoop clusters, all secured with Kerberos,
and I want to support different configurations per cluster).
Code and error are below.


The workaround I found is to place a core-site.xml with the two properties below
on the classpath.
From reading the RPC code under org.apache.hadoop.ipc.RPC, I suspect the RPC code
does not see the UGI class in the same classloader.
So UGI is initialized with the default values from the classpath, which means
simple authentication.
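The suspicion above (two classloaders each holding their own copy of a class's static state) can be illustrated without Hadoop at all. In this self-contained sketch, the `Holder` class is a hypothetical stand-in for UGI's cached configuration: loading it through a second `URLClassLoader` yields a distinct `Class` object whose static field is still at its default, just as a UGI defined by another classloader would still default to simple authentication.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a class with static security state, like UGI's cached conf. */
class Holder {
    static String auth = "simple"; // default, like Hadoop's simple auth
}

public class ClassLoaderSketch {
    public static void main(String[] args) throws Exception {
        // "Our" copy of the class is configured for Kerberos...
        Holder.auth = "kerberos";

        // ...but a second classloader over the same classpath, with no
        // delegation to the application loader (parent = null), defines
        // its own copy of Holder with its own independent static state.
        List<URL> urls = new ArrayList<>();
        for (String p : System.getProperty("java.class.path").split(File.pathSeparator)) {
            urls.add(new File(p).toURI().toURL());
        }
        ClassLoader other = new URLClassLoader(urls.toArray(new URL[0]), null);
        Class<?> otherHolder = other.loadClass("Holder");

        var field = otherHolder.getDeclaredField("auth");
        field.setAccessible(true);

        // Same class name, different Class object, untouched default state:
        System.out.println(otherHolder == Holder.class);
        System.out.println(field.get(null));
    }
}
```

If the RPC layer resolved UGI through a loader like `other`, it would see the default ("simple") no matter what the application configured on its own copy.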

core-site.xml with the security setup on the classpath:

<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>


error--
2673 [main] DEBUG 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil  - 
DataTransferProtocol using SaslPropertiesResolver, configured QOP 
dfs.data.transfer.protection = privacy, configured class 
dfs.data.transfer.saslproperties.resolver.class = class 
org.apache.hadoop.security.WhitelistBasedResolver
2696 [main] DEBUG org.apache.hadoop.service.AbstractService  - Service: 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
2744 [main] DEBUG org.apache.hadoop.security.UserGroupInformation  - 
PrivilegedAction as:x@xxxCOM (auth:KERBEROS) 
from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136) //
2746 [main] DEBUG org.apache.hadoop.yarn.ipc.YarnRPC  - Creating YarnRPC for 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2746 [main] DEBUG org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC  - Creating a 
HadoopYarnProtoRpc proxy for protocol interface 
org.apache.hadoop.yarn.api.ApplicationClientProtocol
2801 [main] DEBUG org.apache.hadoop.ipc.Client  - getting client out of cache: 
org.apache.hadoop.ipc.Client@748fe51d
2981 [main] DEBUG org.apache.hadoop.service.AbstractService  - Service 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
3004 [main] DEBUG org.apache.hadoop.ipc.Client  - The ping interval is 6 ms.
3005 [main] DEBUG org.apache.hadoop.ipc.Client  - Connecting to 
yarn-rm-1/x:8032
3019 [IPC Client (2012095985) connection to yarn-rm-1/x:8032 from 
x@xx] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (2012095985) 
connection to yarn-rm-1/x:8032 from x@xx: starting, having 
connections 1
3020 [IPC Parameter Sending Thread #0] DEBUG org.apache.hadoop.ipc.Client  - 
IPC Client (2012095985) connection to yarn-rm-1/x:8032 from x@xx 
sending #0
3025 [IPC Client (2012095985) connection to yarn-rm-1/x:8032 from 
x@xx] DEBUG org.apache.hadoop.ipc.Client  - IPC Client (2012095985) 
connection to yarn-rm-1/x:8032 from x@xx got value #-1
3026 [IPC Client (2012095985) connection to yarn-rm-1/x:8032 from 
x@xx] DEBUG org.apache.hadoop.ipc.Client  - closing ipc connection to 
yarn-rm-1/x:8032: SIMPLE authentication is not enabled.  Available:[TOKEN, 
KERBEROS]
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1131)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)
---code---

Configuration hc = new Configuration(false);

hc.addResource("myconf/yarn-site.xml");
hc.addResource("myconf/core-site.xml");
hc.addResource("myconf/hdfs-site.xml");
hc.addResource("myconf/hive-site.xml");

SparkConf sc = new SparkConf(true);
// Add the Hadoop config to the Spark conf, as there is no xml on the
// classpath except the "default.xml" files from the Hadoop jars.
hc.forEach(entry -> {
    if (entry.getKey().startsWith("hive")) {
        sc.set(entry.getKey(), entry.getValue());
    } else {
        sc.set("spark.hadoop." + entry.getKey(), entry.getValue());
    }
});

UserGroupInformation.setConfiguration(hc);
UserGroupInformation.loginUserFromKeytab(Principal, Keytab);

System.out.println("spark-conf##");
System.out.println(sc.toDebugString());

SparkSession sparkSession = SparkSession
        .builder()
        .master("yarn-client") // "yarn-client", "local"
        .config(sc)
        .appName(SparkEAZDebug.class.getName())
        .enableHiveSupport()
        .getOrCreate();
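The copy loop above relies on Spark's convention that configuration keys prefixed with `spark.hadoop.` are forwarded, prefix stripped, into the Hadoop Configuration that Spark builds. A minimal self-contained sketch of just that prefixing rule, with plain Maps standing in for `Configuration` and `SparkConf` (class and variable names here are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixSketch {
    /**
     * Copy Hadoop properties into Spark-style keys: hive.* keys pass through
     * unchanged, every other key gains the spark.hadoop. prefix so that
     * Spark hands it back to the Hadoop Configuration it constructs.
     */
    static Map<String, String> toSparkConf(Map<String, String> hadoopConf) {
        Map<String, String> sparkConf = new LinkedHashMap<>();
        hadoopConf.forEach((k, v) -> {
            if (k.startsWith("hive")) {
                sparkConf.put(k, v);
            } else {
                sparkConf.put("spark.hadoop." + k, v);
            }
        });
        return sparkConf;
    }

    public static void main(String[] args) {
        Map<String, String> hc = new LinkedHashMap<>();
        hc.put("hadoop.security.authentication", "kerberos");
        hc.put("hive.metastore.uris", "thrift://example:9083");

        Map<String, String> sc = toSparkConf(hc);
        System.out.println(sc.get("spark.hadoop.hadoop.security.authentication")); // kerberos
        System.out.println(sc.get("hive.metastore.uris")); // thrift://example:9083
    }
}
```

Note that this only populates the SparkConf; as the code above shows, UserGroupInformation.setConfiguration(hc) still has to be called separately, because UGI does not read the SparkConf.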

RE: A bug in spark or hadoop RPC with kerberos authentication?

2017-08-23 Thread Sun, Keith
Thanks for the reply. I filed an issue in JIRA:
https://issues.apache.org/jira/browse/SPARK-21819

I submitted the job from the Java API, not via the spark-submit command line, as
we want to offer Spark processing as a service.

Configuration hc = new Configuration(false);
String yarnxml = String.format("%s/%s", ConfigLocation, "yarn-site.xml");
String corexml = String.format("%s/%s", ConfigLocation, "core-site.xml");
String hdfsxml = String.format("%s/%s", ConfigLocation, "hdfs-site.xml");
String hivexml = String.format("%s/%s", ConfigLocation, "hive-site.xml");

hc.addResource(yarnxml);
hc.addResource(corexml);
hc.addResource(hdfsxml);
hc.addResource(hivexml);

// Manually set all the Hadoop config in the SparkConf.
SparkConf sc = new SparkConf(true);
hc.forEach(entry -> {
    if (entry.getKey().startsWith("hive")) {
        sc.set(entry.getKey(), entry.getValue());
    } else {
        sc.set("spark.hadoop." + entry.getKey(), entry.getValue());
    }
});

UserGroupInformation.setConfiguration(hc);
UserGroupInformation.loginUserFromKeytab(Principal, Keytab);

SparkSession sparkSession = SparkSession
        .builder()
        .master("yarn-client") // "yarn-client", "local"
        .config(sc)
        .appName(SparkEAZDebug.class.getName())
        .enableHiveSupport()
        .getOrCreate();


Thanks very much.
Keith

From: 周康 [mailto:zhoukang199...@gmail.com]
Sent: August 22, 2017 20:22
To: Sun, Keith 
Cc: user@spark.apache.org
Subject: Re: A bug in spark or hadoop RPC with kerberos authentication?

You can check out the Hadoop credential classes in Spark's YARN module. During spark-submit, Spark uses the configuration found on the classpath.
I wonder how you reference your own config?