[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-04-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840748#comment-17840748
 ] 

ASF GitHub Bot commented on SPARK-38958:


hadoop-yetus commented on PR #6550:
URL: https://github.com/apache/hadoop/pull/6550#issuecomment-2076861454

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | | | | | _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 01s | | No case conflicting files found. |
   | +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. |
   | +0 :ok: | codespell | 0m 01s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m 00s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   | | | | | _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 92m 11s | | trunk passed |
   | +1 :green_heart: | compile | 5m 02s | | trunk passed |
   | +1 :green_heart: | checkstyle | 4m 36s | | trunk passed |
   | +1 :green_heart: | mvnsite | 5m 03s | | trunk passed |
   | +1 :green_heart: | javadoc | 4m 45s | | trunk passed |
   | +1 :green_heart: | shadedclient | 146m 50s | | branch has no errors when building and testing our client artifacts. |
   | | | | | _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 2m 55s | | the patch passed |
   | +1 :green_heart: | compile | 2m 16s | | the patch passed |
   | +1 :green_heart: | javac | 2m 16s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 00s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 2m 02s | | the patch passed |
   | +1 :green_heart: | mvnsite | 2m 28s | | the patch passed |
   | +1 :green_heart: | javadoc | 2m 14s | | the patch passed |
   | +1 :green_heart: | shadedclient | 159m 37s | | patch has no errors when building and testing our client artifacts. |
   | | | | | _ Other Tests _ |
   | +1 :green_heart: | asflicense | 5m 25s | | The patch does not generate ASF License warnings. |
   | | | 421m 41s | | |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6550 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 691d1e3161c7 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / c8168fd0bc45331bd8b55dd53b537bec4b05fba5 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6550/1/testReport/ |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6550/1/console |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Override S3 Client in Spark Write/Read calls
> 
>
> Key: SPARK-38958
> URL: https://issues.apache.org/jira/browse/SPARK-38958
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Hershal
>Priority: Major
>  Labels: pull-request-available
>
> Hello,
> I have been working to use spark to read and write data to S3. Unfortunately, 
> there are a few S3 headers that I need to add to my spark read/write calls. 
> After much looking, I have not found a way to replace the S3 client that 
> spark uses to make the read/write calls. I also have not found a 
> configuration that allows me to pass in S3 headers. Here is an example of 
> some common S3 request headers 
> ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html).]
>  Does there already exist functionality to add S3 headers to spark read/write 
> calls or pass in a custom client that would pass these headers on every 
> read/write request? Appreciate the help and feedback
>  
> Thanks,



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For 

[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828458#comment-17828458
 ] 

ASF GitHub Bot commented on SPARK-38958:


steveloughran commented on PR #6550:
URL: https://github.com/apache/hadoop/pull/6550#issuecomment-2007773085

   Please use HADOOP-18562 as the JIRA ID here, since this is a change to the hadoop codebase. Thanks.







[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817558#comment-17817558
 ] 

ASF GitHub Bot commented on SPARK-38958:


AbhiAMZ commented on code in PR #6550:
URL: https://github.com/apache/hadoop/pull/6550#discussion_r1490209084


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##
@@ -407,6 +413,48 @@ private static void initSigner(Configuration conf,
     }
   }
 
+  /**
+   *
+   * @param conf hadoop configuration
+   * @param clientConfig client configuration to update
+   * @param awsServiceIdentifier service name
+   */
+  private static void initRequestHeaders(Configuration conf,
+      ClientOverrideConfiguration.Builder clientConfig, String awsServiceIdentifier) {
+    String configKey = null;
+    switch (awsServiceIdentifier) {
+      case AWS_SERVICE_IDENTIFIER_S3:
+        configKey = CUSTOM_HEADERS_S3;
+        break;
+      case AWS_SERVICE_IDENTIFIER_STS:
+        configKey = CUSTOM_HEADERS_STS;
+        break;
+      default:
+        // Nothing to do. The original signer override is already setup
+    }
+    if (configKey != null) {
+      String[] customHeaders = conf.getTrimmedStrings(configKey);
+      if (customHeaders == null || customHeaders.length == 0) {
+        LOG.debug("No custom headers specified");
+        return;
+      }
+
+      for (String customHeader : customHeaders) {

Review Comment:
   Unknown headers are ignored.








[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817460#comment-17817460
 ] 

ASF GitHub Bot commented on SPARK-38958:


steveloughran commented on code in PR #6550:
URL: https://github.com/apache/hadoop/pull/6550#discussion_r1489843730


##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java:
##
@@ -407,6 +413,48 @@ private static void initSigner(Configuration conf,
     }
   }
 
+  /**
+   *
+   * @param conf hadoop configuration
+   * @param clientConfig client configuration to update
+   * @param awsServiceIdentifier service name
+   */
+  private static void initRequestHeaders(Configuration conf,
+      ClientOverrideConfiguration.Builder clientConfig, String awsServiceIdentifier) {
+    String configKey = null;
+    switch (awsServiceIdentifier) {
+      case AWS_SERVICE_IDENTIFIER_S3:
+        configKey = CUSTOM_HEADERS_S3;
+        break;
+      case AWS_SERVICE_IDENTIFIER_STS:
+        configKey = CUSTOM_HEADERS_STS;
+        break;
+      default:
+        // Nothing to do. The original signer override is already setup
+    }
+    if (configKey != null) {
+      String[] customHeaders = conf.getTrimmedStrings(configKey);
+      if (customHeaders == null || customHeaders.length == 0) {
+        LOG.debug("No custom headers specified");
+        return;
+      }
+
+      for (String customHeader : customHeaders) {

Review Comment:
   HADOOP-18980/#6406 adds splitting and parsing of headers



##
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##
@@ -775,6 +775,23 @@ private Constants() {
   "fs.s3a." + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase()
   + ".signing-algorithm";
 
+  /**
+   * List of custom headers to be set on the service client.
+   * Multiple parameters can be used to specify custom headers.
+   * fs.s3a.s3.custom.headers - headers to add on all the s3 requests.
+   * fs.s3a.sts.custom.headers - headers to add on all the sts requests.
+   * Examples
+   * CustomHeader {@literal ->} 'Header1:Value1'
+   * CustomHeaders {@literal ->} 'Header1=Value1:Value2,Header2=Value1'
+   */
+  public static final String CUSTOM_HEADERS_STS =
+  "fs.s3a." + Constants.AWS_SERVICE_IDENTIFIER_STS.toLowerCase()

Review Comment:
   I'd prefer fs.s3a.client.s3 and fs.s3a.client.sts








[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-02-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817222#comment-17817222
 ] 

ASF GitHub Bot commented on SPARK-38958:


hadoop-yetus commented on PR #6550:
URL: https://github.com/apache/hadoop/pull/6550#issuecomment-1943173216

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 47s | | Docker mode activated. |
   | | | | | _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m  0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m  0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m  0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m  0s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m  0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   | | | | | _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 49m 20s | | trunk passed |
   | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 39s | | trunk passed |
   | +1 :green_heart: | javadoc | 0m 26s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | spotbugs | 1m  5s | | trunk passed |
   | +1 :green_heart: | shadedclient | 37m 41s | | branch has no errors when building and testing our client artifacts. |
   | | | | | _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 28s | | the patch passed |
   | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | javac | 0m 36s | | the patch passed |
   | +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | javac | 0m 25s | | the patch passed |
   | +1 :green_heart: | blanks | 0m  0s | | The patch has no blanks issues. |
   | -0 :warning: | checkstyle | 0m 19s | [/results-checkstyle-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6550/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-aws.txt) | hadoop-tools/hadoop-aws: The patch generated 8 new + 2 unchanged - 0 fixed = 10 total (was 2) |
   | +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 15s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | spotbugs | 1m  5s | | the patch passed |
   | +1 :green_heart: | shadedclient | 37m 34s | | patch has no errors when building and testing our client artifacts. |
   | | | | | _ Other Tests _ |
   | +1 :green_heart: | unit | 2m 39s | | hadoop-aws in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
   | | | 141m 23s | | |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6550/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6550 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 3d2a3fbef6e5 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c8168fd0bc45331bd8b55dd53b537bec4b05fba5 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6550/1/testReport/ |
   | Max. process+thread count | 569 (vs. ulimit of 5500) |
   | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
   | Console output 

[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-02-13 Thread Prerak Pradhan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817208#comment-17817208
 ] 

Prerak Pradhan commented on SPARK-38958:


Hey [~ste...@apache.org] , I created a draft 
[PR|https://github.com/apache/hadoop/pull/6550/commits] for this. May I have 
your thoughts on the approach?

It would be awesome if I can get thoughts on

1/ header config name - custom.headers

2/ header pattern (especially multiple value headers)

 

Thank you,

Prerak Pradhan




[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2024-02-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817203#comment-17817203
 ] 

ASF GitHub Bot commented on SPARK-38958:


PradhanPrerak opened a new pull request, #6550:
URL: https://github.com/apache/hadoop/pull/6550

   ### Description of PR
   - Support to add custom headers to s3a s3 and sts client
   
   This introduces a new custom.headers config for each client type. When it is 
set, every s3/sts request carries the configured headers, which are applied by 
setting overrideClientConfig on the clients.
   ```
   "fs.s3a.sts.custom.headers" = "Header1:Value1:Value2,Header2:Value1"
   "fs.s3a.s3.custom.headers" = "Header1:Value1:Value2,Header2:Value1"
   ```
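
To make the proposed value format concrete, here is a minimal sketch of how a string such as `Header1:Value1:Value2,Header2:Value1` could be split into header names and values. It is not taken from the PR: the class name `CustomHeaderSpec` and the method `parse` are hypothetical, and the rule "first token is the name, remaining tokens are values" is an assumption read off the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the PR. */
public final class CustomHeaderSpec {

  private CustomHeaderSpec() {
  }

  /**
   * Parse "Name:Value1:Value2,Other:Value1" into name -> list of values.
   * Assumption: entries are comma separated and, within an entry, the
   * first colon-separated token is the header name.
   */
  public static Map<String, List<String>> parse(String spec) {
    Map<String, List<String>> headers = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return headers;
    }
    for (String entry : spec.split(",")) {
      String[] parts = entry.trim().split(":");
      // An entry needs a non-empty name and at least one value.
      if (parts.length < 2 || parts[0].trim().isEmpty()) {
        throw new IllegalArgumentException(
            "Malformed header entry: '" + entry + "'");
      }
      List<String> values =
          headers.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>());
      for (int i = 1; i < parts.length; i++) {
        values.add(parts[i].trim());
      }
    }
    return headers;
  }

  public static void main(String[] args) {
    // prints {Header1=[Value1, Value2], Header2=[Value1]}
    System.out.println(parse("Header1:Value1:Value2,Header2:Value1"));
  }
}
```

One consequence of this format is that a header value containing `:` or `,` cannot be expressed, which may matter for the multiple-value question raised in the review.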
   
   
   
   ### How was this patch tested?
   TBD
   
   ### For code changes:
   TBD
   
   - [Y] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2023-09-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763725#comment-17763725
 ] 

Steve Loughran commented on SPARK-38958:


[~hershalb] hadoop trunk is now on v2 sdk, but we are still stabilising client 
binding. 






[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2023-08-22 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757364#comment-17757364
 ] 

Steve Loughran commented on SPARK-38958:


[~hershalb] we are about to merge the v2 sdk feature set; it'd be good for you 
to see if your changes work there.

as for static headers, I could imagine something like what we added in 
HADOOP-17833 for adding headers to created files.

# Define a well-known prefix, e.g. {{fs.s3a.request.headers.}}
# every key which matches fs.s3a.request.headers.* becomes a header; the value 
is the header value.

the alternative is as done for custom signers, a list of key=value separated by 
commas.
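
The prefix-based option can be sketched as follows. This is an illustration only, under stated assumptions: a plain `Map<String, String>` stands in for the Hadoop configuration so the snippet runs on its own, and the class name `PrefixedHeaders` and method `fromProperties` are hypothetical. Only the prefix `fs.s3a.request.headers.` comes from the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the prefix-based option; not Hadoop code. */
public final class PrefixedHeaders {

  /** Prefix suggested in the comment above. */
  static final String HEADER_PREFIX = "fs.s3a.request.headers.";

  private PrefixedHeaders() {
  }

  /**
   * Every property whose key starts with the prefix becomes a header:
   * the remainder of the key is the header name, the property value is
   * the header value. Other properties are skipped.
   */
  public static Map<String, String> fromProperties(Map<String, String> props) {
    Map<String, String> headers = new TreeMap<>();
    for (Map.Entry<String, String> e : props.entrySet()) {
      String key = e.getKey();
      // Skip unrelated keys and the bare prefix with no header name.
      if (key.startsWith(HEADER_PREFIX)
          && key.length() > HEADER_PREFIX.length()) {
        headers.put(key.substring(HEADER_PREFIX.length()),
            e.getValue().trim());
      }
    }
    return headers;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("fs.s3a.request.headers.x-amz-request-payer", "requester");
    props.put("fs.s3a.endpoint", "s3.example.invalid");
    // prints {x-amz-request-payer=requester}
    System.out.println(fromProperties(props));
  }
}
```

With one property per header there is no separator to escape, so values may contain commas or colons; the cost is that the set of headers is spread across several configuration keys rather than one.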




[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-11-18 Thread Daniel Carl Jones (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635756#comment-17635756
 ] 

Daniel Carl Jones commented on SPARK-38958:
---

Upgrade from V1 to V2 AWS SDK is likely to introduce a breaking change to the 
interface of the client factory, since we will be changing the Java interface 
being returned by the factory for starters.

What it will mean is the factory function signatures will need to be updated, 
and given the V2 SDK has a sync and an async client, may need a second method 
(with the same headers attached again).




[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-11-18 Thread Daniel Carl Jones (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635755#comment-17635755
 ] 

Daniel Carl Jones commented on SPARK-38958:
---

I had someone reach out to me with a similar request - static headers on all S3 
requests for a given S3A file system.

If static headers per fs by config were to be added as a feature, do we have 
any idea what the configuration might look like? i.e. how do we model a list 
of key-value pairs in the Hadoop configuration? The best I see is "getStrings", 
where we would either need to check that the list length is even (the right 
number of k,v entries) or have each k,v pair be one string joined by an equals 
symbol.

Also, any reasons not to have such a configuration or any better way to design 
it?
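
As a rough sketch of the second variant, each list element could be one `key=value` string. The input below is a plain `String[]`, standing in for what a "getStrings"-style lookup would return; the class name `KeyValueHeaders` and the choice to split on the first equals sign are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; the input array stands in for a getStrings result. */
public final class KeyValueHeaders {

  private KeyValueHeaders() {
  }

  /**
   * Turn ["k1=v1", "k2=v2"] into an ordered map. Splitting on the first
   * '=' only means a value may itself contain '='.
   */
  public static Map<String, String> parse(String[] pairs) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (pairs == null) {
      return headers;
    }
    for (String pair : pairs) {
      int eq = pair.indexOf('=');
      String name = eq > 0 ? pair.substring(0, eq).trim() : "";
      if (name.isEmpty()) {
        throw new IllegalArgumentException(
            "Expected key=value but got '" + pair + "'");
      }
      headers.put(name, pair.substring(eq + 1).trim());
    }
    return headers;
  }

  public static void main(String[] args) {
    // prints {Header1=Value1, Header2=a=b}
    System.out.println(parse(new String[] {"Header1=Value1", "Header2=a=b"}));
  }
}
```

Compared with a flat list that must have an even number of entries, a malformed element here is detected on its own, and the error can name the offending pair.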

> Override S3 Client in Spark Write/Read calls
> 
>
> Key: SPARK-38958
> URL: https://issues.apache.org/jira/browse/SPARK-38958
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.1
>Reporter: Hershal
>Priority: Major
>
> Hello,
> I have been working to use spark to read and write data to S3. Unfortunately, 
> there are a few S3 headers that I need to add to my spark read/write calls. 
> After much looking, I have not found a way to replace the S3 client that 
> spark uses to make the read/write calls. I also have not found a 
> configuration that allows me to pass in S3 headers. Here is an example of 
> some common S3 request headers 
> (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html).
>  Does there already exist functionality to add S3 headers to spark read/write 
> calls or pass in a custom client that would pass these headers on every 
> read/write request? Appreciate the help and feedback
>  
> Thanks,






[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-07-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573137#comment-17573137
 ] 

Steve Loughran commented on SPARK-38958:


1. The API is public, but we have changed it incompatibly twice. It uses a 
builder/parameter object pattern to try and reduce this, but you will probably 
still need to keep your code in sync with the build.
2. branch-3.3 (you get to make the release yourself) lets you add whatever 
headers you want when creating a file:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdataoutputstreambuilder.md#-s3a-specific-options





[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-07-04 Thread Daniel Carl Jones (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562117#comment-17562117
 ] 

Daniel Carl Jones commented on SPARK-38958:
---

There is a workaround which may help in the short term, originally shared by 
Asier in HADOOP-14661. S3A, the connector the Hadoop project provides for using 
S3 as a filesystem, creates its S3 client through a factory class. You could 
extend this factory today and set a static header by adding a class like the 
one below. It needs to be compiled and added to your classpath when using 
Spark/Hadoop.
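{code:java}
import java.io.IOException;
import java.net.URI;

import com.amazonaws.services.s3.AmazonS3;
import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;

public class CustomS3ClientFactory extends DefaultS3ClientFactory {
  @Override
  public AmazonS3 createS3Client(final URI uri,
      final S3ClientCreationParameters parameters) throws IOException {
    // Register a static header before delegating to the default factory.
    parameters.withHeader("my-header-key", "my-header-value");
    return super.createS3Client(uri, parameters);
  }
}
{code}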
In your Spark application, you should then be able to update your configuration 
to point to this new factory:
{code:java}
spark.sparkContext.hadoopConfiguration.set("fs.s3a.s3.client.factory.impl", 
"your.package.CustomS3ClientFactory") 
{code}
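
Why the class has to be on the classpath can be seen from a simplified model 
of this kind of lookup: the option value is a class name that gets loaded and 
instantiated reflectively. The sketch below is not Hadoop's code. The 
ClientFactory interface, both factory classes and the use of 
java.util.Properties in place of a Hadoop Configuration are stand-ins invented 
for illustration; only the option name comes from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Simplified model of a config-driven factory lookup; not Hadoop code. */
final class FactoryLookupSketch {

  static final String FACTORY_KEY = "fs.s3a.s3.client.factory.impl";

  /** Stand-in for a client factory; exposes only the headers it would set. */
  interface ClientFactory {
    Map<String, String> defaultHeaders();
  }

  static class DefaultFactory implements ClientFactory {
    @Override
    public Map<String, String> defaultHeaders() {
      return new LinkedHashMap<>();
    }
  }

  /** Mirrors the workaround: extend the default and add one static header. */
  static class CustomFactory extends DefaultFactory {
    @Override
    public Map<String, String> defaultHeaders() {
      Map<String, String> headers = super.defaultHeaders();
      headers.put("my-header-key", "my-header-value");
      return headers;
    }
  }

  private FactoryLookupSketch() {
  }

  /** Loads the configured factory by class name, or the default if unset. */
  static ClientFactory load(Properties conf) {
    String name = conf.getProperty(FACTORY_KEY, DefaultFactory.class.getName());
    try {
      return Class.forName(name)
          .asSubclass(ClientFactory.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      // This is what a missing jar or a mistyped class name would surface as.
      throw new IllegalStateException("Cannot instantiate " + name, e);
    }
  }
}
```

In this model a mistyped class name or a jar missing from the classpath fails 
at load time rather than silently falling back to the default factory.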




[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls

2022-07-04 Thread Daniel Carl Jones (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562115#comment-17562115
 ] 

Daniel Carl Jones commented on SPARK-38958:
---

I'm not aware of documented support for this right now.

Is your use case limited to setting static headers for all S3 requests made by 
your Spark application? That is, do you expect the same header keys and values 
every time, or would they need to vary per file or request?
