endzyme opened a new issue, #475:
URL: https://github.com/apache/solr-operator/issues/475

   ### Summary
   I tried backing up to S3 using IAM role assumption via web identity tokens 
on EKS, and the backup fails with a 403 from S3. The same backup works with 
static AWS IAM credentials that carry the same policy. A `WARN` logged before 
each attempt may indicate why web identity token role assumption is not 
functioning.
   
   * * *
   ### Details
   
   I am having trouble with the S3 backup and restore configuration in the 
Solr Operator. I am running the Solr 8.11 image, and the pods on our EKS 
cluster use a Kubernetes Service Account that is annotated with the IAM Role 
ARN.
   
   There is an interesting warning message when attempting to perform a backup. 
The shard leader will emit the message below before every attempt:
   
   ```
   WARN  (OverseerThreadFactory-34-thread-2-processing-n:dev-8-blue-solrcloud-2.solr:80_solr) [c:test   ] s.a.a.a.c.i.WebIdentityCredentialsUtils To use web identity tokens, the 'sts' service module must be on the class path.
   ```
   
   When I configure the SolrCloud resource with static IAM credentials I can 
perform the backup, but with the assumed role via web identity token I am 
receiving a 403 from S3 (see error message below).
   
   ```
   2022-09-15 15:17:21.941 WARN  (OverseerThreadFactory-34-thread-1-processing-n:dev-8-blue-solrcloud-1.solr:80_solr) [c:test   ] s.a.a.a.c.i.WebIdentityCredentialsUtils To use web identity tokens, the 'sts' service module must be on the class path.
   2022-09-15 15:17:24.447 ERROR (OverseerThreadFactory-34-thread-1-processing-n:dev-8-blue-solrcloud-1.solr:80_solr) [c:test   ] o.a.s.s.S3StorageClient An AmazonServiceException was thrown! [serviceName=S3] [awsRequestId=SNIP] [httpStatus=403] [s3ErrorCode=null] [message=null]
   2022-09-15 15:17:24.449 ERROR (OverseerThreadFactory-34-thread-1-processing-n:dev-8-blue-solrcloud-1.solr:80_solr) [c:test   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: test operation: backup failed => org.apache.solr.s3.S3Exception: An AmazonServiceException was thrown! [serviceName=S3] [awsRequestId=SNIP] [httpStatus=403] [s3ErrorCode=null] [message=null]
           at org.apache.solr.s3.S3StorageClient.handleAmazonException(S3StorageClient.java:598)
   org.apache.solr.s3.S3Exception: An AmazonServiceException was thrown! [serviceName=S3] [awsRequestId=SNIP] [httpStatus=403] [s3ErrorCode=null] [message=null]
           at org.apache.solr.s3.S3StorageClient.handleAmazonException(S3StorageClient.java:598) ~[?:?]
           at org.apache.solr.s3.S3StorageClient.pathExists(S3StorageClient.java:314) ~[?:?]
           at org.apache.solr.s3.S3BackupRepository.exists(S3BackupRepository.java:200) ~[?:?]
           at org.apache.solr.cloud.api.collections.BackupCmd.createAndValidateBackupPath(BackupCmd.java:154) ~[?:?]
           at org.apache.solr.cloud.api.collections.BackupCmd.call(BackupCmd.java:94) ~[?:?]
           at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:271) ~[?:?]
           at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:524) ~[?:?]
           at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
           at java.lang.Thread.run(Unknown Source) [?:?]
   Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 403, Request ID: SNIP, Extended Request ID: SNIP)
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[?:?]
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:106) ~[?:?]
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:84) ~[?:?]
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:42) ~[?:?]
           at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$6(BaseClientHandler.java:232) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:80) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[?:?]
           at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[?:?]
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[?:?]
           at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:167) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:175) ~[?:?]
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[?:?]
           at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[?:?]
           at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[?:?]
           at software.amazon.awssdk.services.s3.DefaultS3Client.headObject(DefaultS3Client.java:5080) ~[?:?]
           at software.amazon.awssdk.services.s3.S3Client.headObject(S3Client.java:9886) ~[?:?]
           at org.apache.solr.s3.S3StorageClient.pathExists(S3StorageClient.java:309) ~[?:?]
           ... 9 more
   ```
   
   Below are the things I've tried and observed:
   * Tested the IAM Role itself for access to the target bucket
   * Confirmed that the mutating webhook is in fact modifying the SolrCloud 
pods with the appropriate env vars and projected service account token volume 
mounts
   * Confirmed that I can use those tokens to assume the role and get to the S3 
bucket
   * Tested with "static" IAM credentials configured through 
`solrcloud.spec.backupRepositories.s3.credentials.credentialsFileSecret` (the 
field path as shown by `kubectl explain`). The IAM user has the same policy as 
the IAM role, and backups work with this setup
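
The second check in the list above (that the webhook injected the env vars and the projected token) can be scripted. Below is a minimal Python sketch, assuming the standard variable names used for IAM Roles for Service Accounts on EKS, `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. The function name `check_web_identity_env` is hypothetical, and the script only confirms that the inputs exist; it does not call STS or prove that role assumption succeeds.

```python
import os
from pathlib import Path

# Assumption: these are the variables the EKS webhook injects for
# IAM Roles for Service Accounts.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def check_web_identity_env(env=None):
    """Return a list of problems found; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = []

    # Both variables must be set and non-empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The token path must point at a readable, non-empty file.
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file:
        path = Path(token_file)
        if not path.is_file():
            problems.append(f"token file {token_file} does not exist")
        elif not path.read_text().strip():
            problems.append(f"token file {token_file} is empty")

    return problems
```

Calling `check_web_identity_env()` with no arguments inspects the environment of the current process, so it would need to run inside the Solr pod to be meaningful.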
   
   The warning that `the 'sts' service module must be on the class path` makes 
me think that something else needs to be loaded in the Solr modules before 
this will work. I have looked through the documentation, and everything seems 
to indicate that using IAM Roles through Service Accounts is a supported use 
case on EKS. The documentation also appears to indicate that I do not need to 
specify anything extra in modules or plugins for the SolrCloud K8s resource, 
because they are autoloaded when an S3 backup configuration is provided.
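
One way to test the class path suspicion is to look for the STS jar on disk. The following is a rough Python sketch, assuming the AWS SDK for Java v2 STS module ships as a jar named `sts-<version>.jar` (its Maven artifact id is `sts`). The helper name `find_sts_jars` is hypothetical, and the directory to scan is left as a parameter because the exact library location in the Solr image is not confirmed here.

```python
from pathlib import Path


def find_sts_jars(lib_dir):
    """Return sorted paths of jars under lib_dir whose name looks like the STS module."""
    root = Path(lib_dir)
    if not root.is_dir():
        return []
    # rglob matches on the file name, so only names starting with "sts-" qualify.
    return sorted(str(p) for p in root.rglob("sts-*.jar"))
```

For example, `find_sts_jars("/opt/solr")` would scan the whole install directory, assuming that is where Solr lives in the image. An empty result would be consistent with the warning. A non-empty result only shows the jar exists on disk, not that Solr loads it.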
   
   Any help would be appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

