This is a bit of a guess (I haven't dug into the code to confirm it) – what
happens if you remove the role from the SD config and *only* pass it
through the environment? I can imagine that the explicit configuration
causes us to not look at the environment in the same way. My hope is that
by not passing *any* authentication information in the Prometheus config we
fall back to the default SDK behaviour.
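
Concretely, I mean something along these lines. This is only a sketch based
on your config quoted below, with the role_arn line dropped and nothing else
changed; I haven't tested it:

```yaml
scrape_configs:
- ec2_sd_configs:
  # No role_arn, access_key, secret_key or profile here, on the assumption
  # that the AWS SDK then falls back to its default credential chain.
  - filters:
    - name: tag-key
      values:
      - prometheus.io/discover
  job_name: service-ec2
  relabel_configs:
  - action: keep
    regex: ^mycoolnameprefix-.*
    source_labels:
    - __meta_ec2_tag_Name
  - replacement: $1:9100
    source_labels:
    - __meta_ec2_private_ip
    target_label: __address__
```

The role would then have to come from AWS_ROLE_ARN and
AWS_WEB_IDENTITY_TOKEN_FILE in the pod environment, assuming the SDK's
default credential chain picks them up.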

/MR

On Mon, Apr 13, 2020 at 10:53 PM William Findley <[email protected]> wrote:

> On the off chance, I fired up a pod with a container that has the AWS CLI on
> it, under the service account I'm using, and it was able to make the
> ec2:DescribeInstances API call just fine. I'm not sure how to track down
> what's happening here. Maybe I've run into a bug?
>
> On Friday, April 10, 2020 at 8:21:19 PM UTC-4, William Findley wrote:
>>
>>
>> I'm having trouble getting ec2 service discovery to work using an IAM
>> role bound to an EKS service account.  Here's what I have.
>>
>> I have a pod that has successfully had a web identity token projected
>> into it. I'm fairly confident that there's no problem with this: I have
>> customers on this EKS cluster that I've set up with IAM roles and
>> Kubernetes service accounts, and they're happily using services.
>>
>> /prometheus $ ls -la /var/run/secrets/eks.amazonaws.com/serviceaccount
>> total 0
>> drwxrwsrwt    3 root     2000           100 Apr 10 17:23 .
>> drwxr-xr-x    3 root     root            28 Apr 10 17:49 ..
>> drwxr-sr-x    2 root     2000            60 Apr 10 17:23 ..2020_04_10_17_23_59.145300320
>> lrwxrwxrwx    1 root     root            31 Apr 10 17:23 ..data -> ..2020_04_10_17_23_59.145300320
>> lrwxrwxrwx    1 root     root            12 Apr 10 17:23 token -> ..data/token
>>
>>
>> The information about which role/token to use is exposed in the
>> following env vars:
>>
>>       AWS_ROLE_ARN: arn:aws:iam::2XXXXXXXXXX0:role/prometheus-service-discovery-eks
>>       AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
>>
>> Here's my scrape config. I'm trying to discover and scrape node exporter
>> on a box that I've tagged with prometheus.io/discover and whose name
>> begins with the prefix I expect.
>> scrape_configs:
>> - ec2_sd_configs:
>>   - filters:
>>     - name: tag-key
>>       values:
>>       - prometheus.io/discover
>>     role_arn: arn:aws:iam::2XXXXXXXXXX0:role/prometheus-service-discovery-eks
>>   job_name: service-ec2
>>   relabel_configs:
>>   - action: keep
>>     regex: ^mycoolnameprefix-.*
>>     source_labels:
>>     - __meta_ec2_tag_Name
>>   - replacement: $1:9100
>>     source_labels:
>>     - __meta_ec2_private_ip
>>     target_label: __address__
>>
>> My assumption from the docs, and from using the latest version of
>> Prometheus and the AWS SDK it depends on, was that it would use these env
>> vars to discover the role and assume it. However, these logs indicate
>> otherwise:
>>
>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:224 component="discovery manager scrape" msg="Starting provider" provider=*ec2.SDConfig/0 subs=[service-ec2]
>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:224 component="discovery manager notify" msg="Starting provider" provider=string/0 subs=[config-0]
>> level=info ts=2020-04-10T21:08:03.271Z caller=main.go:816 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:242 component="discovery manager notify" msg="discoverer channel closed" provider=string/0
>> level=error ts=2020-04-10T21:08:03.493Z caller=refresh.go:79 component="discovery manager scrape" discovery=ec2 msg="Unable to refresh target groups" err="could not describe instances: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 3317a2e2-5357-4535-9b53-085209fdfb5c"
>> level=error ts=2020-04-10T21:09:03.502Z caller=refresh.go:98 component="discovery manager scrape" discovery=ec2 msg="Unable to refresh target groups" err="could not describe instances: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 455fddb6-9b42-449b-b603-d7f453923a7b"
>>
>> Any tips on where I might have gone wrong?  I made the best effort I
>> could to follow the existing documentation, but I don't feel like it's
>> telling me everything I need to know.
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/67f41258-7d11-44aa-92b2-43e60b58a616%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/67f41258-7d11-44aa-92b2-43e60b58a616%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

