Hi everyone, I started looking at the Kinesis integration and it looks promising. However, I feel it can be improved. Here are my thoughts:
1. It assumes that AWS credentials are provided by DefaultAWSCredentialsProviderChain and there is no way to change this behavior. I would like the ability to supply a different AWSCredentialsProvider.

2. I feel the modules in extras should be independent of the Spark build, and should perhaps live in a separate repository (or repositories). I had to download the most recent checkout of Spark and slap kinesis-asl into Spark 1.0.2 to create a custom spark-streaming-kinesis-asl_2.10-1.0.2.jar that I can use in my Spark jobs. Ideally, people would want the extra modules to be cross-built against different versions of Spark. Independent repositories would let us ship builds of the extras packages faster than Spark releases, and they would be readily available to earlier versions of Spark. This would also free Spark developers to focus on enhancements in the core framework instead of managing spark-* integration pull requests.

3. Maybe it's just me, but I would have liked a Context-like API for creating Kinesis streams instead of using KinesisUtils. It would be a little more consistent with the rest of the Spark API. We could have a KinesisStreamingContext along these lines:

    class KinesisStreamingContext(
        @transient ssc: StreamingContext,
        endpointUrl: String,
        defaultCredentialsProvider: AWSCredentialsProvider) {

      def createStream(
          streamName: String,
          checkpointInterval: Duration,
          initialPositionInStream: InitialPositionInStream,
          storageLevel: StorageLevel,
          credentialsProvider: AWSCredentialsProvider = defaultCredentialsProvider) = {...}
    }

4. The example KinesisWordCountASL creates numShards receiver instances, which makes sense. Maybe the API should provide the ability to specify the parallelism, defaulting to numShards?

I can submit pull requests for some of the above items, provided the community agrees and nobody else is already working on them.

Thanks,
Aniket
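
P.S. To make points 1 and 3 concrete, here is a rough, self-contained sketch of how the proposed context-style API could behave. The type names below (AWSCredentialsProvider, StreamingContext, DStream) are stubs standing in for the real Spark/AWS classes so the snippet compiles on its own; KinesisStreamingContext and its createStream signature are my proposal, not the existing KinesisUtils API:

```scala
// Stubs standing in for org.apache.spark.* and com.amazonaws.* types.
trait AWSCredentialsProvider { def name: String }
case class StreamingContext(appName: String)
case class DStream(streamName: String, provider: String)

// Proposed context-style wrapper: holds the endpoint and a default
// credentials provider, so per-stream boilerplate goes away.
class KinesisStreamingContext(
    ssc: StreamingContext,
    endpointUrl: String,
    defaultCredentialsProvider: AWSCredentialsProvider) {

  // Callers may omit credentialsProvider to fall back to the context-wide
  // default, or pass an explicit provider per stream (point 1).
  def createStream(
      streamName: String,
      credentialsProvider: AWSCredentialsProvider = defaultCredentialsProvider): DStream =
    DStream(streamName, credentialsProvider.name)
}

object KinesisContextDemo extends App {
  val defaultChain = new AWSCredentialsProvider { val name = "default-chain" }
  val staticKeys   = new AWSCredentialsProvider { val name = "static-keys" }

  val ksc = new KinesisStreamingContext(
    StreamingContext("demo"),
    "https://kinesis.us-east-1.amazonaws.com",
    defaultChain)

  // Falls back to the context default:
  println(ksc.createStream("events").provider)              // prints "default-chain"
  // Explicit per-stream override:
  println(ksc.createStream("events", staticKeys).provider)  // prints "static-keys"
}
```

Using a Scala default parameter for the credentials provider keeps the common case terse while still allowing a per-stream override, which is the same pattern the sketch in point 3 relies on.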