[ 
https://issues.apache.org/jira/browse/SPARK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001727#comment-15001727
 ] 

Jacek Lewandowski commented on SPARK-11326:
-------------------------------------------

[~pwendell] this leverages the current Spark infrastructure. I've just split the 
big and messy pull request into separate commits, each of which describes a 
particular change. Summing up, there are a few things:
* there is a separate endpoint used to connect the driver to the standalone 
scheduler instead of using the same endpoint for everything - which allows 
providing different configurations for each of them (different secrets)
* {{SaslClientBootstrap}} doesn't get the app id on initialisation but rather in 
{{doBootstrap}} - this change allows us to reuse the same endpoint for 
connections to clients which use different secrets; obviously, this change 
entails some other changes - for example, outboxes are identified by socket 
address + app id rather than by socket address alone
* the app id is tied to the endpoint ref, so the user may define an app id 
whenever a connection to some other endpoint is made. The app id was never used 
in the context of the standalone scheduler and was always filled with an empty 
string. Now we can specify it explicitly.
* when {{SecurityManager}} is used as a secret key holder (for example, to 
contact the standalone scheduler), it returns the default Spark user and a 
secret specified in the configuration when {{getSaslUser(appId)}} and 
{{getSecretKey(appId)}} are invoked respectively, ignoring the provided app id. 
The changes which are implemented here include: 
** {{getSaslUser(appId)}} returns app id if app id is neither empty nor null, 
otherwise it returns default Spark user unless a username is specified in the 
configuration at {{spark.authenticate.user}}
** {{getSecretKey(appId)}} delegates to password authenticator entity if app id 
is neither empty nor null, otherwise it returns the default secret which is the 
secret obtained in the old way (from Spark configuration)
** the password authenticator is a trait with one method which returns a secret 
for the provided identifier; it is configurable, but the default authenticator 
just implements the legacy behaviour
** so effectively, when {{SecurityManager}} is used as a secret key holder, the 
app id represents the SASL user, who is the application submitter and owner. 
Perhaps a different secret key holder implementation should be used for the 
standalone scheduler instead of trying to make {{SecurityManager}} compatible 
with everything.
* the app id is exposed in the RPC context, so that both receive methods 
({{receive}}, {{receiveAndReply}}) can read the app id used by the client who 
sent the message; this was needed to add authorisation to the master - when the 
application is registered, the application owner is stored along with 
{{DriverInfo}} and {{ApplicationInfo}}; then, any time someone wants to manage 
the application (kill, request executors, etc.), they need to authenticate as 
the same user
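The master-side authorisation rule above can be sketched roughly as follows. 
This is an illustrative sketch, not the patch's actual code: the names 
{{AuthSketch}}, {{appOwners}} and {{mayManage}} are made up; only the 
"owner-or-default-user" rule itself comes from the description above.

```scala
// Hypothetical sketch of the authorisation rule: a registered app remembers
// the SASL user who submitted it, and only that user (or the default Spark
// user) may later manage it. All identifiers here are invented.
object AuthSketch {
  val DefaultUser = "sparkSaslUser"
  private var appOwners = Map.empty[String, String]

  // Called on application registration: remember the submitter as the owner.
  def registerApp(appId: String, owner: String): Unit =
    appOwners += appId -> owner

  // Returns true if `requester` (the SASL user read from the RPC context)
  // may manage (kill, request executors for, ...) application `appId`.
  def mayManage(requester: String, appId: String): Boolean =
    requester == DefaultUser || appOwners.get(appId).contains(requester)
}
```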

Everything is covered by dozens of integration test cases, which start a 
master and try to connect to it with various configuration combinations. 
Master high availability, which involves reconnecting the Master back to the 
driver, is also covered. 
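To make the {{SecurityManager}} behaviour described in the bullets above 
concrete, here is a minimal sketch. Only the {{getSaslUser}}/{{getSecretKey}} 
names mirror the real API; the class shape, the {{PasswordAuthenticator}} trait 
name and the constructor parameters are simplified stand-ins, not the patch's 
actual code.

```scala
// Illustrative sketch of the secret-resolution rules: an empty/null app id
// falls back to the default user and the legacy configured secret; a real
// app id is treated as the SASL user and its secret is looked up via the
// pluggable authenticator trait.
trait PasswordAuthenticator {
  def secretFor(id: String): Option[String]
}

class SecurityManagerSketch(
    defaultUser: String,           // spark.authenticate.user, sparkSaslUser by default
    defaultSecret: String,         // legacy secret from the Spark configuration
    authenticator: PasswordAuthenticator) {

  def getSaslUser(appId: String): String =
    if (appId == null || appId.isEmpty) defaultUser else appId

  def getSecretKey(appId: String): String =
    if (appId == null || appId.isEmpty) defaultSecret
    else authenticator.secretFor(appId).orNull
}
```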


> Support for authentication and encryption in standalone mode
> ------------------------------------------------------------
>
>                 Key: SPARK-11326
>                 URL: https://issues.apache.org/jira/browse/SPARK-11326
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Jacek Lewandowski
>
> h3.The idea
> Currently, in standalone mode, all components, for all network connections 
> need to use the same secure token if they want to have any security ensured. 
> This ticket is intended to split the communication in standalone mode to make 
> it more like in Yarn mode - application internal communication and scheduler 
> communication.
> Such refactoring will allow for the scheduler (master, workers) to use a 
> distinct secret, which will remain unknown for the users. Similarly, it will 
> allow for better security in applications, because each application will be 
> able to use a distinct secret as well. 
> By providing SASL authentication/encryption for connections between a client 
> (Client or AppClient) and the Spark Master, it becomes possible to introduce 
> pluggable authentication for the standalone deployment mode.
> h3.Improvements introduced by this patch
> This patch introduces the following changes:
> * Spark driver or submission client do not have to use the same secret as 
> workers use to communicate with Master
> * Master is able to authenticate individual clients with the following rules:
> ** When connecting to the master, the client needs to specify 
> {{spark.authenticate.secret}} which is an authentication token for the user 
> specified by {{spark.authenticate.user}} ({{sparkSaslUser}} by default)
> ** Master configuration may include additional 
> {{spark.authenticate.secrets.<username>}} entries for specifying the 
> authentication token for particular users, or 
> {{spark.authenticate.authenticatorClass}} which specifies an implementation 
> of an external credentials provider (able to retrieve the authentication 
> token for a given user).
> ** Workers authenticate with Master as default user {{sparkSaslUser}}. 
> * The authorization rules are as follows:
> ** A regular user is able to manage only his own application (the 
> application which he submitted)
> ** A regular user is not able to register or manage workers
> ** Spark default user {{sparkSaslUser}} can manage all the applications
> h3.User facing changes when running application
> h4.General principles:
> - conf: {{spark.authenticate.secret}} is *never sent* over the wire
> - env: {{SPARK_AUTH_SECRET}} is *never sent* over the wire
> - In all situations, the env variable overrides the conf variable if present. 
> - In all situations when a user has to pass a secret, it is better (safer) to 
> do so through the env variable
> - In work modes with multiple secrets we assume encrypted communication 
> between client and master, between driver and master, between master and 
> workers
> ----
> h4.Work modes and descriptions
> h5.Client mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf: 
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is running locally
> - The driver will neither send env: {{SPARK_AUTH_SECRET}} nor conf: 
> {{spark.authenticate.secret}}
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf: 
> {{spark.authenticate.secret}} for connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it 
> will look for it in the worker configuration and it will find it there (its 
> presence is implied). 
> ----
> h5.Client mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf: 
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf: 
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is running locally
> - The driver will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf: 
> {{spark.submission.authenticate.secret}} to connect to the master
> - The driver will neither send env: {{SPARK_SUBMISSION_AUTH_SECRET}} nor 
> conf: {{spark.submission.authenticate.secret}}
> - The driver will use either env: {{SPARK_APP_AUTH_SECRET}} or conf: 
> {{spark.app.authenticate.secret}} for communication with the executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so 
> that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it 
> will set it in env: {{SPARK_AUTH_SECRET}} which will be read by 
> _ExecutorBackend_ afterwards and used for all the connections (with driver, 
> other executors and external shuffle service).
> ----
> h5.Cluster mode, single secret
> h6.Configuration
> - env: {{SPARK_AUTH_SECRET=secret}} or conf: 
> {{spark.authenticate.secret=secret}}
> h6.Description
> - The driver is run by _DriverRunner_ which is a part of the worker
> - The client will neither send env: {{SPARK_AUTH_SECRET}} nor conf: 
> {{spark.authenticate.secret}}
> - The client will use either env: {{SPARK_AUTH_SECRET}} or conf: 
> {{spark.authenticate.secret}} for connection to the master and submit the 
> driver
> - _DriverRunner_ will not find any secret in _DriverDescription_ so it will 
> look for it in the worker configuration and it will find it there (its 
> presence is implied)
> - _DriverRunner_ will set the secret it found in env: {{SPARK_AUTH_SECRET}} 
> so that the driver will find it and use it for all the connections
> - The driver will use either env: {{SPARK_AUTH_SECRET}} or conf: 
> {{spark.authenticate.secret}} for connection to the master
> - _ExecutorRunner_ will not find any secret in _ApplicationDescription_ so it 
> will look for it in the worker configuration and it will find it there (its 
> presence is implied). 
> ----
> h5.Cluster mode, multiple secrets
> h6.Configuration
> - env: {{SPARK_APP_AUTH_SECRET=app_secret}} or conf: 
> {{spark.app.authenticate.secret=app_secret}}
> - env: {{SPARK_SUBMISSION_AUTH_SECRET=scheduler_secret}} or conf: 
> {{spark.submission.authenticate.secret=scheduler_secret}}
> h6.Description
> - The driver is run by _DriverRunner_ which is a part of the worker
> - The client will use either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf: 
> {{spark.submission.authenticate.secret}} to connect to the master
> - The client will send either env: {{SPARK_SUBMISSION_AUTH_SECRET}} or conf: 
> {{spark.submission.authenticate.secret}} as env: 
> {{SPARK_SUBMISSION_AUTH_SECRET}} (to avoid passing secret as Java command 
> line option)
> - The client will send either env: {{SPARK_APP_AUTH_SECRET}} or conf: 
> {{spark.app.authenticate.secret}} as env: {{SPARK_APP_AUTH_SECRET}} (to avoid 
> passing secret as Java command line option)
> - _DriverRunner_ will find env: {{SPARK_SUBMISSION_AUTH_SECRET}} and env: 
> {{SPARK_APP_AUTH_SECRET}} and will pass them both to the driver
> - The driver will use env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will not send env: {{SPARK_SUBMISSION_AUTH_SECRET}}
> - The driver will use {{SPARK_APP_AUTH_SECRET}} for communication with the 
> executors
> - The driver will send {{spark.executorEnv.SPARK_AUTH_SECRET=app_secret}} so 
> that the executors can use it to communicate with the driver
> - _ExecutorRunner_ will find that secret in _ApplicationDescription_ and it 
> will set it in env: {{SPARK_AUTH_SECRET}} which will be read by 
> _ExecutorBackend_ afterwards and used for all the connections (with driver, 
> other executors and external shuffle service).
> ----
> h4.Lifecycles
> - env: {{SPARK_AUTH_SECRET}} and conf: {{spark.authenticate.secret}} are 
> always lost; they are never transferred to other entities. They are used 
> only in the entity which has them defined, and then die.
> - env: {{SPARK_SUBMISSION_AUTH_SECRET}} is used by _Client_ to connect to the 
> master. It is sent as env variable of the same name with _DriverDescription_ 
> so that it is also present in the environment of the driver. Driver uses it 
> to connect to the master and it will not send it to any other entity.
> - conf: {{spark.submission.authenticate.secret}} is used by _Client_ to 
> connect to the master unless env: {{SPARK_SUBMISSION_AUTH_SECRET}} is 
> defined. If env: {{SPARK_SUBMISSION_AUTH_SECRET}} is not defined, conf: 
> {{spark.submission.authenticate.secret}} is copied to env in 
> _DriverDescription_ as {{SPARK_SUBMISSION_AUTH_SECRET}} and removed from conf 
> to avoid passing it as Java command line argument when running the driver.
> - env: {{SPARK_APP_AUTH_SECRET}} is sent as env variable of the same name 
> with _DriverDescription_ so that it is also present in the environment of the 
> driver. Driver uses it to connect to the executors and it will send it with 
> _ApplicationDescription_ as env: {{SPARK_AUTH_SECRET}} so that 
> _ExecutorRunner_ can put it into the executor environment. Then 
> _ExecutorBackend_ can use it to communicate with the driver, other executors 
> and external shuffle service.
> - conf: {{spark.app.authenticate.secret}} - if env: {{SPARK_APP_AUTH_SECRET}} 
> is not defined, conf: {{spark.app.authenticate.secret}} is copied to env in 
> _DriverDescription_ as {{SPARK_APP_AUTH_SECRET}} and removed from conf to 
> avoid passing it as Java command line argument when running the driver.
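The precedence and hand-off rules in the quoted lifecycle description can be 
sketched as below. The object and function names ({{SecretResolution}}, 
{{resolve}}, {{promote}}) are invented for illustration; only the two rules 
themselves - env overrides conf, and a conf-only secret is promoted to env and 
dropped from conf so it is never passed as a Java command line option - come 
from the description.

```scala
// Hypothetical sketch of the quoted lifecycle rules, using plain maps to
// stand in for the process environment and the Spark configuration.
object SecretResolution {
  // Picks the effective secret: the env variable wins over the conf entry.
  def resolve(env: Map[String, String], conf: Map[String, String],
              envKey: String, confKey: String): Option[String] =
    env.get(envKey).orElse(conf.get(confKey))

  // Promotes a conf-only secret into the env map and removes it from conf,
  // mirroring what happens before the driver is launched so the secret is
  // not exposed as a command line argument.
  def promote(env: Map[String, String], conf: Map[String, String],
              envKey: String, confKey: String)
      : (Map[String, String], Map[String, String]) =
    if (env.contains(envKey)) (env, conf)
    else conf.get(confKey) match {
      case Some(secret) => (env + (envKey -> secret), conf - confKey)
      case None         => (env, conf)
    }
}
```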



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
