I actually ran an AWS CLI container under the service account again and managed to replicate the error independently of Prometheus. Yes, indications are that I can just leave out the role and it'll get set by the environment.
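
A sketch of the change being discussed, assuming the scrape config quoted further down stays otherwise unchanged: `role_arn` is dropped from `ec2_sd_configs`, so the role would come only from the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables. Whether Prometheus then falls back to the default SDK credential behaviour is the guess under test in this thread, not something confirmed here.

```yaml
scrape_configs:
- ec2_sd_configs:
  - filters:
    - name: tag-key
      values:
      - prometheus.io/discover
    # role_arn intentionally omitted (assumption under test): the role is
    # expected to be picked up from AWS_ROLE_ARN and
    # AWS_WEB_IDENTITY_TOKEN_FILE in the pod environment instead.
  job_name: service-ec2
  relabel_configs:
  - action: keep
    regex: ^mycoolnameprefix-.*
    source_labels:
    - __meta_ec2_tag_Name
  - replacement: $1:9100
    source_labels:
    - __meta_ec2_private_ip
    target_label: __address__
```

Since the AWS CLI reproduces the same error outside Prometheus, this change on its own may not clear the 403; it only removes the explicit role configuration as a variable.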
On Tue, Apr 14, 2020, 7:54 AM Matthias Rampke <[email protected]> wrote:

> This is a bit of a guess (I haven't dug into the code to confirm it) –
> what happens if you remove the role from the SD config and *only* pass it
> through the environment? I can imagine that the explicit configuration
> causes us to not look at the environment in the same way. My hope is that
> by not passing *any* authentication information in the Prometheus config
> we fall back to the default SDK behaviour.
>
> /MR
>
> On Mon, Apr 13, 2020 at 10:53 PM William Findley <[email protected]>
> wrote:
>
>> On the off chance, I fired up a pod with a container with the AWS CLI on
>> it under the service account I'm using, and it was able to do the
>> ec2:DescribeInstances API call just fine. I'm not sure how to track down
>> what's happening here. Maybe I've run into a bug?
>>
>> On Friday, April 10, 2020 at 8:21:19 PM UTC-4, William Findley wrote:
>>>
>>> I'm having trouble getting EC2 service discovery to work using an IAM
>>> role bound to an EKS service account. Here's what I have.
>>>
>>> I have a pod that has successfully had a web identity token projected
>>> into it. I'm fairly confident that there's no problem with this. I have
>>> customers on this EKS cluster that I've rigged up with IAM roles and
>>> Kubernetes service accounts, and they're happily using services.
>>>
>>> /prometheus $ ls -la /var/run/secrets/eks.amazonaws.com/serviceaccount
>>> total 0
>>> drwxrwsrwt 3 root 2000 100 Apr 10 17:23 .
>>> drwxr-xr-x 3 root root  28 Apr 10 17:49 ..
>>> drwxr-sr-x 2 root 2000  60 Apr 10 17:23 ..2020_04_10_17_23_59.145300320
>>> lrwxrwxrwx 1 root root  31 Apr 10 17:23 ..data -> ..2020_04_10_17_23_59.145300320
>>> lrwxrwxrwx 1 root root  12 Apr 10 17:23 token -> ..data/token
>>>
>>> The information about which role/token to use is exposed in the
>>> following env vars:
>>>
>>> AWS_ROLE_ARN: arn:aws:iam::2XXXXXXXXXX0:role/prometheus-service-discovery-eks
>>> AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
>>>
>>> Here's my scrape config. I'm trying to discover and scrape node
>>> exporter on a box that I've tagged with prometheus.io/discover and that
>>> has a name beginning with the prefix I expect.
>>>
>>> scrape_configs:
>>> - ec2_sd_configs:
>>>   - filters:
>>>     - name: tag-key
>>>       values:
>>>       - prometheus.io/discover
>>>     role_arn: arn:aws:iam::2XXXXXXXXXX0:role/prometheus-service-discovery-eks
>>>   job_name: service-ec2
>>>   relabel_configs:
>>>   - action: keep
>>>     regex: ^mycoolnameprefix-.*
>>>     source_labels:
>>>     - __meta_ec2_tag_Name
>>>   - replacement: $1:9100
>>>     source_labels:
>>>     - __meta_ec2_private_ip
>>>     target_label: __address__
>>>
>>> My assumption, from the docs and from using the latest version of
>>> Prometheus and the AWS SDK it depends on, was that it would use these
>>> env variables in the way it needed to discover the role and go out and
>>> bind it. However, these logs indicate otherwise:
>>>
>>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:224 component="discovery manager scrape" msg="Starting provider" provider=*ec2.SDConfig/0 subs=[service-ec2]
>>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:224 component="discovery manager notify" msg="Starting provider" provider=string/0 subs=[config-0]
>>> level=info ts=2020-04-10T21:08:03.271Z caller=main.go:816 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
>>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:242 component="discovery manager notify" msg="discoverer channel closed" provider=string/0
>>> level=error ts=2020-04-10T21:08:03.493Z caller=refresh.go:79 component="discovery manager scrape" discovery=ec2 msg="Unable to refresh target groups" err="could not describe instances: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 3317a2e2-5357-4535-9b53-085209fdfb5c"
>>> level=error ts=2020-04-10T21:09:03.502Z caller=refresh.go:98 component="discovery manager scrape" discovery=ec2 msg="Unable to refresh target groups" err="could not describe instances: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 455fddb6-9b42-449b-b603-d7f453923a7b"
>>>
>>> Any tips on where I might have gone wrong? I made the best effort I
>>> could to follow the existing documentation, but I don't feel like it's
>>> telling me everything I need to know.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/67f41258-7d11-44aa-92b2-43e60b58a616%40googlegroups.com

