Github user budde commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16744#discussion_r99906733
  
    --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala ---
    @@ -123,9 +123,143 @@ object KinesisUtils {
         // scalastyle:on
         val cleanedHandler = ssc.sc.clean(messageHandler)
         ssc.withNamedScope("kinesis stream") {
    +      val kinesisCredsProvider = BasicCredentialsProvider(
    +        awsAccessKeyId = awsAccessKeyId,
    +        awsSecretKey = awsSecretKey)
           new KinesisInputDStream[T](ssc, streamName, endpointUrl, validateRegion(regionName),
             initialPositionInStream, kinesisAppName, checkpointInterval, storageLevel,
    -        cleanedHandler, Some(SerializableAWSCredentials(awsAccessKeyId, awsSecretKey)))
    +        cleanedHandler, kinesisCredsProvider)
    +    }
    +  }
    +
    +  /**
    +   * Create an input stream that pulls messages from a Kinesis stream.
    +   * This uses the Kinesis Client Library (KCL) to pull messages from Kinesis.
    +   *
    +   * @param ssc StreamingContext object
    +   * @param kinesisAppName  Kinesis application name used by the Kinesis Client Library
    +   *                        (KCL) to update DynamoDB
    +   * @param streamName   Kinesis stream name
    +   * @param endpointUrl  Url of Kinesis service (e.g., https://kinesis.us-east-1.amazonaws.com)
    +   * @param regionName   Name of region used by the Kinesis Client Library (KCL) to update
    +   *                     DynamoDB (lease coordination and checkpointing) and CloudWatch (metrics)
    +   * @param initialPositionInStream  In the absence of Kinesis checkpoint info, this is the
    +   *                                 worker's initial starting position in the stream.
    +   *                                 The values are either the beginning of the stream
    +   *                                 per Kinesis' limit of 24 hours
    +   *                                 (InitialPositionInStream.TRIM_HORIZON) or
    +   *                                 the tip of the stream (InitialPositionInStream.LATEST).
    +   * @param checkpointInterval  Checkpoint interval for Kinesis checkpointing.
    +   *                            See the Kinesis Spark Streaming documentation for more
    +   *                            details on the different types of checkpoints.
    +   * @param storageLevel Storage level to use for storing the received objects.
    +   *                     StorageLevel.MEMORY_AND_DISK_2 is recommended.
    +   * @param messageHandler A custom message handler that can generate a generic output from a
    +   *                       Kinesis `Record`, which contains both message data, and metadata.
    +   * @param stsAssumeRoleArn ARN of IAM role to assume when using STS sessions to read from
    +   *                         Kinesis stream.
    +   * @param stsSessionName Name to uniquely identify STS sessions if multiple principals assume
    +   *                       the same role.
    +   * @param stsExternalId External ID that can be used to validate against the assumed IAM role's
    +   *                      trust policy.
    +   *
    +   * @note The AWS credentials will be discovered using the DefaultAWSCredentialsProviderChain
    +   * on the workers. See AWS documentation to understand how DefaultAWSCredentialsProviderChain
    +   * gets the AWS credentials.
    +   */
    +  // scalastyle:off
    +  def createStream[T: ClassTag](
    +      ssc: StreamingContext,
    +      kinesisAppName: String,
    +      streamName: String,
    +      endpointUrl: String,
    +      regionName: String,
    +      initialPositionInStream: InitialPositionInStream,
    +      checkpointInterval: Duration,
    +      storageLevel: StorageLevel,
    +      messageHandler: Record => T,
    +      stsAssumeRoleArn: String,
    +      stsSessionName: String,
    +      stsExternalId: String): ReceiverInputDStream[T] = {
    --- End diff --
    
    The external ID is optional, but I'm making it required in ```KinesisUtils``` because otherwise we'd need to double the number of ```createStream()``` overloads (i.e. STS variants both with and without ```stsExternalId```, rather than just STS variants with ```stsExternalId```). I think the API was constructed this way, with a separate overload for each parameter combination, to keep the method declarations consistent between Scala and Java. I think the better long-term solution is to deprecate ```createStream()``` in favor of a builder class for constructing Kinesis DStreams.
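    To illustrate why a builder sidesteps the overload explosion, here is a minimal sketch. Every name in it (`KinesisBuilderSketch`, `StsParams`, `KinesisStreamConfig`, `stsCredentials`, and so on) is hypothetical and is not an existing Spark API; it only assembles a configuration object rather than creating a DStream. Each optional setting becomes one chained method, so making the external ID optional costs a single extra `stsCredentials` overload instead of doubling every `createStream()` signature.

```scala
// Hypothetical sketch only: these names are illustrative and are not Spark APIs.
object KinesisBuilderSketch {

  // STS settings for the role to assume; the external ID stays optional.
  final case class StsParams(roleArn: String, sessionName: String, externalId: Option[String])

  // Immutable configuration produced by the builder.
  final case class KinesisStreamConfig(
      streamName: String,
      appName: String,
      regionName: String,
      sts: Option[StsParams])

  final class Builder {
    private var streamNameOpt: Option[String] = None
    private var appNameOpt: Option[String] = None
    private var regionNameOpt: Option[String] = None
    private var stsOpt: Option[StsParams] = None

    def streamName(name: String): Builder = { streamNameOpt = Some(name); this }
    def appName(name: String): Builder = { appNameOpt = Some(name); this }
    def regionName(name: String): Builder = { regionNameOpt = Some(name); this }

    // Only this one method needs a with/without-external-ID pair of overloads;
    // the rest of the configuration surface is unaffected.
    def stsCredentials(roleArn: String, sessionName: String): Builder = {
      stsOpt = Some(StsParams(roleArn, sessionName, None)); this
    }
    def stsCredentials(roleArn: String, sessionName: String, externalId: String): Builder = {
      stsOpt = Some(StsParams(roleArn, sessionName, Some(externalId))); this
    }

    // Fails fast if a required setting was never supplied.
    def build(): KinesisStreamConfig = {
      def required(value: Option[String], name: String): String =
        value.getOrElse(throw new IllegalArgumentException(s"$name must be set"))
      KinesisStreamConfig(
        required(streamNameOpt, "streamName"),
        required(appNameOpt, "appName"),
        required(regionNameOpt, "regionName"),
        stsOpt)
    }
  }

  def builder(): Builder = new Builder
}
```

    A Java caller would chain the same methods (`builder().streamName(...).appName(...).regionName(...).stsCredentials(...).build()`), so the Scala/Java consistency the current overloads aim for is preserved without Scala default arguments.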
    
    If the trust policy of the IAM role being assumed doesn't specify an external ID, any external ID that is supplied will simply be ignored.
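    The rule above can be captured in a small model. This is a simplified, hypothetical sketch (`ExternalIdCheckSketch`, `TrustPolicy`, and `externalIdAccepted` are illustrative names, not AWS or Spark APIs) that looks only at the `sts:ExternalId` condition and ignores every other part of IAM trust-policy evaluation, such as which principals are allowed to assume the role.

```scala
// Simplified, hypothetical model: it only captures the sts:ExternalId condition
// and ignores every other part of IAM trust-policy evaluation.
object ExternalIdCheckSketch {

  // None means the trust policy has no sts:ExternalId condition at all.
  final case class TrustPolicy(requiredExternalId: Option[String])

  // Returns true if the supplied external ID satisfies this part of the policy.
  def externalIdAccepted(policy: TrustPolicy, supplied: Option[String]): Boolean =
    policy.requiredExternalId match {
      case None           => true                        // no condition: any supplied ID is ignored
      case Some(expected) => supplied.contains(expected) // condition present: the IDs must match
    }
}
```

    Under this model, always passing ```stsExternalId``` from ```KinesisUtils``` is harmless for roles whose trust policy has no external-ID condition, which is why making the parameter required costs callers little.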

