Hi,

We're running a Beam job in Dataflow that reads inputs from KinesisIO.
Right now it works if we pass the AWS keys to the job like so:

p.apply(KinesisIO.read()
    // ...
    .withAWSClientsProvider("AWS_KEY", "AWS_SECRET", STREAM_REGION))
 .apply( ... );

Instead of passing the AWS keys upfront, we would like to let the workers
fetch the keys directly at run time from an external secret management
service (e.g. Vault). That way, we could grant Vault access permissions to
the Dataflow cluster's service account, and avoid having to pass the keys
upfront in the clear.

We've tried implementing a custom AWSClientsProvider
<https://github.com/apache/beam/blob/master/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/AWSClientsProvider.java>
and then passing it like so:

p.apply(KinesisIO.read()
   //...
   .withAWSClientsProvider(new CustomAWSClientsProvider()));
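For reference, the core idea of our provider is lazy, worker-side initialization: keep the client field transient and build it on first use rather than at construction time. Here's a minimal, dependency-free sketch of that pattern (the Vault lookup and the client type are stand-ins, not our real code):

```java
import java.io.Serializable;

/**
 * Sketch of a provider that defers client creation until first use,
 * so the secret lookup happens on the worker rather than at job
 * submission time. "String" stands in for the real Kinesis client.
 */
public class LazyClientProvider implements Serializable {
    // Transient: the client (and the secret inside it) is never
    // serialized into the pipeline graph.
    private transient String client;

    // Stand-in for a Vault call; the real version would authenticate
    // with the worker's service-account credentials.
    private static String fetchSecretFromVault() {
        return "secret-from-vault";
    }

    public synchronized String getClient() {
        if (client == null) {
            // Runs lazily, wherever getClient() is first invoked.
            client = "kinesis-client(" + fetchSecretFromVault() + ")";
        }
        return client;
    }
}
```

The open question is whether KinesisIO actually defers that first `getKinesisClient()` call to the workers, which from our testing it does not seem to do.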

However, from our testing it looks like the
CustomAWSClientsProvider.getKinesisClient() method still gets called at
job submission time, before the job actually starts running.

Is there a way to let worker nodes retrieve those keys themselves at run
time to configure Kinesis access?

Thanks,

Julien
