Adam Budde updated an issue
Spark / SPARK-19405
Add support to KinesisUtils for cross-account Kinesis reads via STS
Change By: Adam Budde
h1. Summary

Enable KinesisReceiver to use STSAssumeRoleSessionCredentialsProvider when setting up the Kinesis Client Library (KCL), enabling secure cross-account Kinesis stream reads managed by AWS Security Token Service (STS).

h1. Details

Spark's KinesisReceiver implementation uses the Kinesis Client Library to let users write Spark Streaming jobs that operate on Kinesis data. The KCL uses a few AWS services under the hood to provide checkpointed, load-balanced processing of the underlying data in a Kinesis stream. Running the KCL requires permissions for the following AWS resources:

* AWS Kinesis for reading stream data
* AWS DynamoDB for storing KCL shared state in tables
* AWS CloudWatch for logging KCL metrics

The KinesisUtils.createStream() API allows users to authenticate to these services either by specifying an explicit AWS access key/secret key credential pair or by using the default credential provider chain. This supports authorizing to the three AWS services using either an AWS keypair (provided explicitly or parsed from environment variables, etc.):

!https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/KeypairOnly.png!

Or the IAM instance profile (when running on EC2):

!https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/InstanceProfileOnly.png!

AWS users often need to access resources across separate accounts, for example to consume data produced by another organization, or to read from a service that runs in a separate account for resource isolation.
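
For reference, the two existing authentication modes of KinesisUtils.createStream() described above might be exercised roughly as follows. This is a minimal sketch rather than code from the proposed change: it assumes the Spark 2.x createStream overloads from the spark-streaming-kinesis-asl module, and the application name, stream name, region, endpoint, and the EXAMPLE_* environment variable names are hypothetical placeholders.

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisAuthSketch {
  def main(args: Array[String]): Unit = {
    // Master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("kinesis-auth-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical placeholder values; substitute your own.
    val kclAppName = "example-kcl-app" // the KCL also uses this as its DynamoDB table name
    val streamName = "example-stream"
    val regionName = "us-east-1"
    val endpointUrl = "https://kinesis.us-east-1.amazonaws.com"

    // Hypothetical variable names, chosen so they do not collide with the
    // variables the default provider chain reads on its own.
    val explicitKeys =
      (sys.env.get("EXAMPLE_ACCESS_KEY_ID"), sys.env.get("EXAMPLE_SECRET_KEY"))

    val stream = explicitKeys match {
      case (Some(accessKeyId), Some(secretKey)) =>
        // Mode 1: explicit access key / secret key pair.
        KinesisUtils.createStream(
          ssc, kclAppName, streamName, endpointUrl, regionName,
          InitialPositionInStream.LATEST, Seconds(10),
          StorageLevel.MEMORY_AND_DISK_2, accessKeyId, secretKey)
      case _ =>
        // Mode 2: default credential provider chain (environment variables,
        // instance profile on EC2, and so on).
        KinesisUtils.createStream(
          ssc, kclAppName, streamName, endpointUrl, regionName,
          InitialPositionInStream.LATEST, Seconds(10),
          StorageLevel.MEMORY_AND_DISK_2)
    }

    // Each record arrives as Array[Byte]; print the record sizes per batch.
    stream.map(_.length).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In both modes the same credentials are used for Kinesis, DynamoDB, and CloudWatch, which is why neither mode covers reading a stream owned by a different account.
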
AWS Security Token Service (STS) provides a secure way to authorize cross-account resource access by using temporary sessions to assume an IAM role in the AWS account that owns the resources being accessed.

The [IAM documentation|http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html] covers the specifics of how cross-account IAM role assumption works in much greater detail, but if an actor in account A wanted to read from a Kinesis stream in account B, the general steps would look something like this:

* An IAM role is added to account B with read permissions for the Kinesis stream
** Trust policy is configured to allow account A to assume the role
* Actor in account A uses its own long-lived credentials to tell STS to assume the role in account B
* STS returns temporary credentials with permission to read from the stream in account B

Applied to KinesisReceiver and the KCL, we could use a keypair as our long-lived credentials to authenticate to STS and assume an external role with the necessary KCL permissions:

!https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSKeypair.png!

Or the instance profile as long-lived credentials:

!https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSInstanceProfile.png!

The STSAssumeRoleSessionCredentialsProvider implementation of the AWSCredentialsProvider interface from the AWS SDK abstracts all of the management of the temporary session credentia