[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

2018-10-18 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22704#discussion_r226391491
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
@@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
     val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
 
     // Get the token renewal interval if it is not set. It will only be called once.
-    if (tokenRenewalInterval == null) {
-      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
+    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
+    // obtain the tokenRenewal interval, as the renewal service may be external.
--- End diff --

This is going to be kind of a long reply. I've explained most of this in a 
long comment I left in your k8s-krb google doc. I should probably put that in a 
readme here since it's really easy to get confused.

> an external token service that places tokens into Secrets may exist 
within a company organization

Lots of things here.

1. You're talking about a service that creates tokens and gives them to 
Spark. That is unrelated to the code you're touching; this code is Spark's code 
for creating its own tokens. In the service scenario you're talking about, this 
code either wouldn't run at all, or would be a no-op. (And if that doesn't 
happen it's a bug somewhere in Spark. That's also, e.g., how Oozie gives Spark 
delegation tokens for other services.)

2. As I mention above, this code is about Spark creating its own tokens. If 
you're talking about a "renewal service" in this context, you need to define 
the interface between Spark and the service, so Spark can authenticate and 
provide the tokens it created to that service.

YARN has that functionality, but it's all very specific to YARN, and the 
code lives in the YARN backend in Spark, not here. If k8s had that 
functionality natively, you could do that in the k8s backend. But if you talk 
about a generic external service without any code in Spark to interact with 
that service, that's what I keep saying doesn't exist.

3. The renewal you talk about and the renewal this code talks about are not 
the same thing.

Your renewal is the standard delegation token renewal: you create a token and renew it periodically until it reaches its maximum lifetime. In the YARN case, the YARN RM does that renewal for you, so your tokens remain valid until the application finishes or the token expires, whichever happens first.

The renewal this code talks about is not that. It's meant to work around 
the upper bound described above. Without it, your app has a maximum lifetime 
which is the lifetime of the token. This code gives Spark the ability to create 
new tokens and distribute them to new and already running executors, so the app 
keeps working indefinitely.

In this mode, Spark doesn't "renew" tokens in the sense you talk about; it just creates new ones once it's getting close to the renewal time.

But for that to work you need a Kerberos login. Thus, you need a principal and a keytab, unless you want to log in to the driver's host and run kinit every time the TGT needs renewal.
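
Roughly, the loop amounts to the sketch below. This is illustrative only: `createTokens` is a hypothetical stand-in for "run the token providers, ship the fresh tokens to executors, and return the delay until the next round"; only the UGI calls are real Hadoop APIs.

```scala
import java.security.PrivilegedExceptionAction
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

import org.apache.hadoop.security.UserGroupInformation

object TokenCreationLoop {
  private val pool: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  // principal/keytab give us a repeatable Kerberos login; createTokens is a
  // hypothetical callback that creates and distributes fresh tokens and
  // returns the delay (ms) until the next round.
  def start(principal: String, keytab: String, createTokens: () => Long): Unit = {
    // Fresh login from the keytab, so nobody has to kinit on the driver host.
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
    val delayMs = ugi.doAs(new PrivilegedExceptionAction[Long] {
      override def run(): Long = createTokens()
    })
    // Note this is not renew() on the old tokens: we simply create new ones
    // shortly before the current ones would stop being usable.
    pool.schedule(new Runnable {
      override def run(): Unit = start(principal, keytab, createTokens)
    }, delayMs, TimeUnit.MILLISECONDS)
  }
}
```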

On the other hand, there is no reason why this code couldn't try to do what you mean by renewal. But I'm pretty sure that renewal still requires a valid Kerberos login, and just like delegation tokens, TGTs expire. The service talked about in (1) above could help, but this is the only current solution (bar YARN) where Spark creates its own tokens and also allows the application to run past the token's expiration.

There are also other caveats, such as certain DTs (e.g. Hive's) not being renewable.

> The design was specifically for the popular use case in which, in cluster mode, we would not send keytabs around and would instead read the DT from a secret.

See above for what the scenario that requires keytabs is about. It's not great to distribute keytabs, but that's the only current solution to that problem.

> cases where the Driver is overworked in running this renewal service

This code runs once a day (with typical service configs) and just performs a few RPCs to get new tokens and send them to executors. It's hardly a heavyweight thing.


---




[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

2018-10-17 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22704#discussion_r226127311
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
@@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
     val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
 
     // Get the token renewal interval if it is not set. It will only be called once.
-    if (tokenRenewalInterval == null) {
-      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
+    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
+    // obtain the tokenRenewal interval, as the renewal service may be external.
--- End diff --

> And BTW, I know what you mean when you mention an external renewal 
service. But again, that does not exist, and until it does, you should not do 
things that assume its existence.

Right, so in Spark there is no token renewal service ATM, but an external token service that places tokens into Secrets may well exist within a company, no? Whether or not it's one provided by Spark. So I personally thought such a comment was sensible, but I'll remove it based on your reasoning.

> If you're in this code, there are two options:

Ah, very good point: checking for the presence of a keytab / principal as the flag, given that

> It has always been, and always will be, the way to tell Spark that you want Spark to renew tokens itself

makes sense.

> The current k8s backend is broken in that regard.

The design was specifically for the popular use case in which, in cluster mode, we would not send keytabs around and would instead read the DT from a secret. So true, it does break the traditional design, because we are using a keytab but not enabling renewal. In parallel with your work, the next step would be to mirror the other resource managers in allowing the option to use the renewal code when the keytab and principal are already in the Driver. Just out of interest, have there been any cases where the Driver is overworked in running this renewal service and managing the DTs for many executors? In essence, could there be a convincing use case for the Driver to use the keytab for login but not do its own renewal, because it can't handle the load?


---




[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

2018-10-17 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22704#discussion_r226069637
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
@@ -49,8 +49,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Configuration
     val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
 
     // Get the token renewal interval if it is not set. It will only be called once.
-    if (tokenRenewalInterval == null) {
-      tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
+    // If running a Kerberos job on Kubernetes, you may specify that you wish to not
+    // obtain the tokenRenewal interval, as the renewal service may be external.
--- End diff --

What does "as the renewal service may be external" mean?

If you're in this code, there are two options:
- you want Spark to renew tokens, in which case you need the interval.
- you do not want Spark to renew tokens, in which case you should not give Spark a principal and a keytab.

The principal/keytab combo is NOT a replacement for kinit. It has always 
been, and always will be, the way to tell Spark that you want Spark to renew 
tokens itself. The current k8s backend is broken in that regard.

And BTW, I know what you mean when you mention an external renewal service. 
But again, that does not exist, and until it does, you should not do things 
that assume its existence.

Now as for how to avoid the extra token, that does not need a configuration at all. The extra token is needed on YARN because, to learn the renewal interval, you have to call `renew()` on the token, and that fails for a token created with YARN as the renewer.

So to fix this (a rough sketch follows below):
- if this is running on YARN, create the extra token, as today
- if it's not running on YARN, just call `renew()` on the existing token
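
Something like the following, where `isRunningOnYarn` and `fetchTokensWithSelfAsRenewer` are hypothetical placeholders for the surrounding plumbing; the renew-and-subtract arithmetic mirrors what the YARN path already does:

```scala
import scala.util.Try

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.token.{Token, TokenIdentifier}
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier

// Sketch only; isRunningOnYarn and fetchTokensWithSelfAsRenewer are
// illustrative stand-ins, not existing Spark APIs.
def tokenRenewalInterval(
    hadoopConf: Configuration,
    isRunningOnYarn: Boolean,
    existingTokens: Seq[Token[_ <: TokenIdentifier]],
    fetchTokensWithSelfAsRenewer: () => Seq[Token[_ <: TokenIdentifier]]): Option[Long] = {
  val probeTokens =
    if (isRunningOnYarn) {
      // The real tokens name YARN as the renewer, so renew() on them would
      // fail; fetch an extra set with ourselves as the renewer.
      fetchTokensWithSelfAsRenewer()
    } else {
      // Elsewhere the current user is already the renewer, so the existing
      // tokens can be probed directly and no extra token is needed.
      existingTokens
    }
  probeTokens.flatMap { token =>
    Try {
      val newExpiration = token.renew(hadoopConf) // one RPC per token
      val id = token.decodeIdentifier().asInstanceOf[AbstractDelegationTokenIdentifier]
      newExpiration - id.getIssueDate // interval in milliseconds
    }.toOption
  }.reduceOption((a, b) => math.min(a, b))
}
```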

And if and when there is an external renewal service, a lot of this will have to change anyway.


---




[GitHub] spark pull request #22704: [SPARK-25681][K8S][WIP] Leverage a config to tune...

2018-10-11 Thread ifilonenko
GitHub user ifilonenko opened a pull request:

https://github.com/apache/spark/pull/22704

[SPARK-25681][K8S][WIP] Leverage a config to tune renewal time retrieval 

## What changes were proposed in this pull request?

Changes to core allow the K8s backend to pass a SparkConf entry specifying whether the `obtainDelegationTokens()` logic fetches the `renewalInterval` (in some use cases the DT renewal may be external to Spark, e.g. K8s cluster mode) and whether this renewal-interval calculation is done by retrieving a second DT, as YARN does.
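
For illustration, such a config entry might take the shape sketched below; the key name, default, and enclosing object are assumptions made for the sketch, not the actual entry in this patch:

```scala
package org.apache.spark.internal.config

// Illustrative only: the key and default are assumptions, not the actual
// entry added by this patch. The provider would consult this flag before
// computing the renewal interval.
private[spark] object RenewalIntervalConfig {
  val FETCH_RENEWAL_INTERVAL =
    ConfigBuilder("spark.security.credentials.renewalInterval.enabled")
      .doc("Whether obtainDelegationTokens() should also compute the token " +
        "renewal interval. Could be disabled when renewal is handled " +
        "outside of Spark.")
      .booleanConf
      .createWithDefault(true)
}
```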

## How was this patch tested?

- [ ] Unit Tests
- [ ] Integration Tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ifilonenko/spark SPARK-25681

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22704


commit cba8abbaadb44ff064d348324da7922e514b9387
Author: Ilan Filonenko 
Date:   2018-10-11T22:26:39Z

first WIP commit

commit 6e807e169cc9113c5fcd1653e610ec473c1ff8e8
Author: Ilan Filonenko 
Date:   2018-10-11T22:30:06Z

modified sentence




---
