I actually ran an AWS CLI container under the service account again and
managed to replicate the error independently of Prometheus.  Yes,
indications are that I can just leave out the role and it'll get set by the
environment.
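
For reference, here is a rough sketch of the scrape config with the role
removed. It assumes everything else stays as in my original message below,
and that the SDK picks up AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE from
the pod environment; I haven't confirmed that in the code.

```yaml
scrape_configs:
- ec2_sd_configs:
  - filters:
    - name: tag-key
      values:
      - prometheus.io/discover
    # No role_arn here (assumption): credentials should come from the
    # AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE env vars on the pod.
  job_name: service-ec2
  relabel_configs:
  - action: keep
    regex: ^mycoolnameprefix-.*
    source_labels:
    - __meta_ec2_tag_Name
  - replacement: $1:9100
    source_labels:
    - __meta_ec2_private_ip
    target_label: __address__
```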

On Tue, Apr 14, 2020, 7:54 AM Matthias Rampke <[email protected]>
wrote:

> This is a bit of a guess (I haven't dug into the code to confirm it) –
> what happens if you remove the role from the SD config and *only* pass it
> through the environment? I can imagine that the explicit configuration
> causes us to not look at the environment in the same way. My hope is that
> by not passing *any* authentication information in the Prometheus config
> we fall back to the default SDK behaviour.
>
> /MR
>
> On Mon, Apr 13, 2020 at 10:53 PM William Findley <[email protected]>
> wrote:
>
>> On the off chance, I fired up a pod with a container with the AWS CLI on
>> it under the service account I'm using, and it was able to do the
>> ec2:DescribeInstances API call just fine.  I'm not sure how to track down
>> what's happening here.  Maybe I've run into a bug?
>>
>> On Friday, April 10, 2020 at 8:21:19 PM UTC-4, William Findley wrote:
>>>
>>>
>>> I'm having trouble getting ec2 service discovery to work using an IAM
>>> role bound to an EKS service account.  Here's what I have.
>>>
>>> I have a pod that has successfully had a web identity token projected
>>> into it.  I'm fairly confident that there's no problem with this.  I have
>>> customers on this EKS cluster that I've set up with IAM roles and Kubernetes service
>>> accounts, and they're happily using services.
>>>
>>> /prometheus $ ls -la /var/run/secrets/eks.amazonaws.com/serviceaccount
>>> total 0
>>> drwxrwsrwt    3 root     2000           100 Apr 10 17:23 .
>>> drwxr-xr-x    3 root     root            28 Apr 10 17:49 ..
>>> drwxr-sr-x    2 root     2000            60 Apr 10 17:23 ..2020_04_10_17_23_59.145300320
>>> lrwxrwxrwx    1 root     root            31 Apr 10 17:23 ..data -> ..2020_04_10_17_23_59.145300320
>>> lrwxrwxrwx    1 root     root            12 Apr 10 17:23 token -> ..data/token
>>>
>>>
>>> The information about which role and token to use is exposed in the
>>> following env vars:
>>>
>>>       AWS_ROLE_ARN: arn:aws:iam::2XXXXXXXXXX0:role/prometheus-service-discovery-eks
>>>       AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
>>>
>>> Here's my scrape config.  I'm trying to discover and scrape node
>>> exporter on a box that I've tagged with prometheus.io/discover and that
>>> has a name beginning with the prefix I expect.
>>>
>>> scrape_configs:
>>> - ec2_sd_configs:
>>>   - filters:
>>>     - name: tag-key
>>>       values:
>>>       - prometheus.io/discover
>>>     role_arn: arn:aws:iam::2XXXXXXXXXX0:role/prometheus-service-discovery-eks
>>>   job_name: service-ec2
>>>   relabel_configs:
>>>   - action: keep
>>>     regex: ^mycoolnameprefix-.*
>>>     source_labels:
>>>     - __meta_ec2_tag_Name
>>>   - replacement: $1:9100
>>>     source_labels:
>>>     - __meta_ec2_private_ip
>>>     target_label: __address__
>>>
>>> My assumption, from the docs and from using the latest version of
>>> Prometheus and the AWS SDK it depends on, was that it would use these env
>>> variables to discover the role and assume it.  However, these logs
>>> indicate otherwise:
>>>
>>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:224
>>> component="discovery manager scrape" msg="Starting provider"
>>> provider=*ec2.SDConfig/0 subs=[service-ec2]
>>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:224
>>> component="discovery manager notify" msg="Starting provider"
>>> provider=string/0 subs=[config-0]
>>> level=info ts=2020-04-10T21:08:03.271Z caller=main.go:816 msg="Completed
>>> loading of configuration file"
>>> filename=/etc/prometheus/config_out/prometheus.env.yaml
>>> level=debug ts=2020-04-10T21:08:03.271Z caller=manager.go:242
>>> component="discovery manager notify" msg="discoverer channel closed"
>>> provider=string/0
>>> level=error ts=2020-04-10T21:08:03.493Z caller=refresh.go:79
>>> component="discovery manager scrape" discovery=ec2 msg="Unable to refresh
>>> target groups" err="could not describe instances: WebIdentityErr: failed to
>>> retrieve credentials\ncaused by: AccessDenied: Not authorized to perform
>>> sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id:
>>> 3317a2e2-5357-4535-9b53-085209fdfb5c"
>>> level=error ts=2020-04-10T21:09:03.502Z caller=refresh.go:98
>>> component="discovery manager scrape" discovery=ec2 msg="Unable to refresh
>>> target groups" err="could not describe instances: WebIdentityErr: failed to
>>> retrieve credentials\ncaused by: AccessDenied: Not authorized to perform
>>> sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id:
>>> 455fddb6-9b42-449b-b603-d7f453923a7b"
>>>
>>> Any tips on where I might have gone wrong?  I made the best effort I
>>> could to follow the existing documentation, but I don't feel like it's
>>> telling me everything I need to know.
>>>
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/67f41258-7d11-44aa-92b2-43e60b58a616%40googlegroups.com
>> <https://groups.google.com/d/msgid/prometheus-users/67f41258-7d11-44aa-92b2-43e60b58a616%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
