Re: Replacing S3 Client in Hadoop plugin

2021-11-23 Thread Martijn Visser
Hi Tamir,

Thanks for providing the information. I don't know of a solution right now;
perhaps another user has an idea. In any case, I find your input valuable for
future improvements regarding the S3 client in Hadoop.

Best regards,

Martijn


Replacing S3 Client in Hadoop plugin

2021-11-19 Thread Tamir Sagi
Hey Martijn,

Sorry for the late response.

We wanted to replace the default client with our custom S3 client rather than
use the AmazonS3Client provided by the plugin.

We were using flink-s3-fs-hadoop v1.12.2, and for our needs we had to upgrade
to v1.14.0 [1].

The AmazonS3 client factory is initialized here [2]: if the property
"fs.s3a.s3.client.factory.impl" [3] is not provided, the default factory is
created [4], which produces an AmazonS3Client, and that client does not support
what we need. I know that both the property and the factory interface have been
annotated with

@InterfaceAudience.Private
@InterfaceStability.Unstable

since very early versions,

but we found this solution cleaner than extending the whole class and
overriding the #setAmazonS3Client method.

Bottom line: all we had to do was create our own implementation of the
S3ClientFactory interface [5], add the following to flink-conf.yaml:

s3.s3.client.factory.impl: <factory class canonical name>

and place both the plugin and our artifact (with the factory and client
implementations) under ${FLINK_HOME}/plugins/s3.
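For illustration, here is a minimal, self-contained sketch of the reflective factory-lookup pattern described above. The interface and class names below are stand-ins, not the real Hadoop API (Hadoop's actual S3ClientFactory lives in org.apache.hadoop.fs.s3a and is instantiated via ReflectionUtils); only the property key is taken from this thread.

```java
// Illustrative sketch of the lookup pattern: the filesystem reads a factory
// class name from configuration and instantiates it reflectively, falling
// back to a default when no override is configured. All names here are
// hypothetical stand-ins for the Hadoop types.
import java.util.Map;

public class FactoryLookupDemo {

    // Stand-in for Hadoop's S3ClientFactory interface.
    public interface ClientFactory {
        String createClient(String bucket);
    }

    // Stand-in for DefaultS3ClientFactory: used when no override is set.
    public static class DefaultFactory implements ClientFactory {
        public String createClient(String bucket) {
            return "default-client:" + bucket;
        }
    }

    // Stand-in for a custom factory, e.g. one producing an encryption client.
    public static class EncryptionFactory implements ClientFactory {
        public String createClient(String bucket) {
            return "encryption-client:" + bucket;
        }
    }

    // Mirrors the lookup: read the configured class name (the property key is
    // the one from the thread), fall back to the default, then instantiate
    // the class reflectively.
    public static ClientFactory loadFactory(Map<String, String> conf)
            throws Exception {
        String className = conf.getOrDefault(
                "fs.s3a.s3.client.factory.impl",
                DefaultFactory.class.getName());
        return (ClientFactory) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // No override configured: the default factory is used.
        System.out.println(loadFactory(Map.of()).createClient("my-bucket"));

        // Override configured: the custom factory is instantiated instead.
        System.out.println(loadFactory(
                Map.of("fs.s3a.s3.client.factory.impl",
                        EncryptionFactory.class.getName()))
                .createClient("my-bucket"));
    }
}
```

The point of the sketch is that whatever class name the configuration carries wins, which is why dropping a custom factory implementation onto the plugin classpath is enough.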

One important note: the flink-s3-fs-hadoop plugin bundles the whole
com.amazonaws.s3 source code. To avoid plugin class loader issues, we needed to
remove the aws-java-sdk-s3 dependency and declare the plugin dependency with
scope "provided". If the job needs to do some work with S3, shading
com.amazonaws was also necessary.
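The shading step can be sketched with the maven-shade-plugin; the fragment below is an assumption-laden example (the relocation prefix is an arbitrary choice, and your build will need the plugin's usual shade-goal execution wiring), not the exact configuration used here.

```xml
<!-- pom.xml fragment: relocate the AWS SDK classes used by the job so they
     cannot clash with the SDK copies bundled inside the flink-s3-fs-hadoop
     plugin. The "shaded." prefix is an arbitrary, illustrative choice. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.amazonaws</pattern>
        <shadedPattern>shaded.com.amazonaws</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```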

[1] 
https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop/1.14.0

[2] 
https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L264-L266

[3] 
https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L366-L369

[4] 
https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java#L66

[5] 
https://github.com/apache/hadoop/blob/branch-3.2.2/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ClientFactory.java

Best,
Tamir.


Re: Replacing S3 Client in Hadoop plugin

2021-10-13 Thread Martijn Visser
Hi,

Could you elaborate on why you would like to replace the S3 client?

Best regards,

Martijn


Re: Replacing S3 Client in Hadoop plugin

2021-10-13 Thread Tamir Sagi
I found the dependency:


<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>3.3.1</version>
</dependency>


Apparently it's possible; there is a setAmazonS3Client method.

I think I found the solution.

Thanks.

Tamir.




Replacing S3 Client in Hadoop plugin

2021-10-13 Thread Tamir Sagi
Hey community.

I would like to know if there is any way to replace the S3 client in the Hadoop
plugin [1] with a custom client (AmazonS3).

I noticed that the Hadoop plugin supports replacing the implementation of
S3AFileSystem via "fs.s3a.impl" (in flink-conf.yaml it becomes "s3.impl"), but
not the client itself [2]:

<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The implementation class of the S3A Filesystem</description>
</property>
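The flink-conf.yaml form of that same setting, using the "fs.s3a." to "s3." key mapping mentioned above, would look like this (a sketch of the mapping, shown here with the stock implementation class):

```yaml
# flink-conf.yaml: Flink forwards "s3.*" options to the Hadoop S3A
# filesystem as "fs.s3a.*"; this sets the default filesystem implementation.
s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
```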

I delved into the Hadoop plugin source code [3]; the client itself is of type
AmazonS3Client and cannot be replaced (for example) with a client of type
AmazonS3EncryptionV2.


[1] 
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/filesystems/s3/#hadooppresto-s3-file-systems-plugins

[2] 
https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html
[3] 
https://github.com/apache/hadoop/blob/master/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java

Thank you,

Best,
Tamir.



Confidentiality: This communication and any attachments are intended for the 
above-named persons only and may be confidential and/or legally privileged. Any 
opinions expressed in this communication are not necessarily those of NICE 
Actimize. If this communication has come to you in error you must take no 
action based on it, nor must you copy or show it to anyone; please 
delete/destroy and inform the sender by e-mail immediately.
Monitoring: NICE Actimize may monitor incoming and outgoing e-mails.
Viruses: Although we have taken steps toward ensuring that this e-mail and 
attachments are free from any virus, we advise that in keeping with good 
computing practice the recipient should ensure they are actually virus free.