[jira] [Commented] (FLINK-10114) Support Orc for StreamingFileSink

2019-12-18 Thread Manish Bellani (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999359#comment-16999359
 ] 

Manish Bellani commented on FLINK-10114:


Hi,

Is there still interest in this work? I have an implementation of an S3/ORC 
sink that handles these requirements. We're currently using this at GitHub 
internally to write several billions of events/terabytes of data per day to 
S3 (exactly once). If there's interest, I can kick off a conversation at 
GitHub; in the meantime, do you mind sharing how I can contribute to Flink? I 
found this: [https://flink.apache.org/contributing/contribute-code.html] but is 
there anything else that I need to do?

 

Thanks

Manish
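
For reference, here is a minimal sketch of how an ORC bulk writer plugs into 
StreamingFileSink. It assumes the Vectorizer-based OrcBulkWriterFactory that 
later shipped in the flink-orc module (Flink 1.10); that API did not exist at 
the time of this comment, and the schema, event type, and sink path below are 
illustrative:

{code:java}
import org.apache.flink.core.fs.Path;
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.flink.orc.writer.OrcBulkWriterFactory;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class OrcSinkSketch {

    // Maps one event (here just a String) into the current ORC row batch.
    static class EventVectorizer extends Vectorizer<String> {
        EventVectorizer() {
            super("struct<line:string>"); // ORC schema, illustrative
        }

        @Override
        public void vectorize(String element, VectorizedRowBatch batch) throws IOException {
            BytesColumnVector col = (BytesColumnVector) batch.cols[0];
            int row = batch.size++;
            col.setVal(row, element.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Bulk formats roll part files on checkpoint, which is what gives
    // exactly-once files on S3.
    public static StreamingFileSink<String> orcSink() {
        return StreamingFileSink
                .forBulkFormat(new Path("s3://my-bucket/orc-events"),
                        new OrcBulkWriterFactory<>(new EventVectorizer()))
                .build();
    }
}
{code}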

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Reporter: zhangminglei
>Assignee: vinoyang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (FLINK-9749) Rework Bucketing Sink

2019-12-18 Thread Manish Bellani (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Bellani updated FLINK-9749:
----------------------------------
Comment: was deleted

(was: Hi,

Is there still interest in this work? I have an implementation of an S3/ORC 
sink that handles these requirements. We're currently using this at GitHub 
internally to write several billions of events/terabytes of data per day to 
S3 (exactly once). If there's interest, I can kick off a conversation at 
GitHub; in the meantime, do you mind sharing how I can contribute to Flink? I 
found this: [https://flink.apache.org/contributing/contribute-code.html] but is 
there anything else that I need to do?

 

Thanks

Manish)

> Rework Bucketing Sink
> ---------------------
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
>
> The BucketingSink has a series of deficits at the moment.
> Due to the long list of issues, I would suggest adding a new 
> StreamingFileSink with a new and cleaner design.
> h3. Encoders, Parquet, ORC
>  - It only efficiently supports row-wise data formats (Avro, JSON, sequence 
> files).
>  - Efforts to add (columnar) compression for blocks of data are inefficient, 
> because blocks cannot span checkpoints due to persistence-on-checkpoint.
>  - The encoders are part of the {{flink-connector-filesystem}} project, 
> rather than in orthogonal formats projects. This blows up the dependencies of 
> the {{flink-connector-filesystem}} project. As an example, the 
> rolling file sink has dependencies on Hadoop and Avro, which messes up 
> dependency management.
> h3. Use of FileSystems
>  - The BucketingSink works only on Hadoop's FileSystem abstraction, does not 
> support Flink's own FileSystem abstraction, and cannot work with the packaged 
> S3, MapR FS, and Swift file systems.
>  - The sink hence needs Hadoop as a dependency.
>  - The sink relies on "trying out" whether truncation works, which requires 
> write access to the user's working directory.
>  - The sink relies on enumerating and counting files, rather than maintaining 
> its own state, making it less efficient.
> h3. Correctness and Efficiency on S3
>  - The BucketingSink relies on strong consistency in the file enumeration, 
> hence may work incorrectly on S3.
>  - The BucketingSink relies on persisting streams at intermediate points. 
> This does not work properly on S3, hence there may be data loss on S3.
> h3. .valid-length companion file
>  - The .valid-length companion file complicates reading for consumers of the 
> data and should be dropped.
> We track this design in a series of sub-issues.
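
For illustration, a minimal sketch of the proposed sink's row-format API as it 
later landed ({{StreamingFileSink}}, Flink 1.6+), writing through Flink's own 
FileSystem abstraction rather than Hadoop's; the path is illustrative:

{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class RowFormatSinkSketch {
    // Row-wise encoder; part files are finalized through the checkpoint
    // lifecycle, so no .valid-length companion files are needed.
    public static StreamingFileSink<String> rowSink() {
        return StreamingFileSink
                .forRowFormat(new Path("s3://my-bucket/events"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();
    }
}
{code}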



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-9749) Rework Bucketing Sink

2019-12-17 Thread Manish Bellani (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998659#comment-16998659
 ] 

Manish Bellani edited comment on FLINK-9749 at 12/17/19 11:11 PM:
------------------------------------------------------------------

Hi,

Is there still interest in this work? I have an implementation of an S3/ORC 
sink that handles these requirements. We're currently using this at GitHub 
internally to write several billions of events/terabytes of data per day to 
S3 (exactly once). If there's interest, I can kick off a conversation at 
GitHub; in the meantime, do you mind sharing how I can contribute to Flink? I 
found this: [https://flink.apache.org/contributing/contribute-code.html] but is 
there anything else that I need to do?

 

Thanks

Manish


was (Author: mbellani):
Hi,

Is there still interest in this work? I have an implementation of an S3/ORC 
sink that handles these requirements. We're currently using this at GitHub 
internally to write several billions/terabytes of data to S3 (exactly once). If 
there's interest, I can kick off a conversation at GitHub; in the meantime, do 
you mind sharing how I can contribute to Flink? I found this: 
[https://flink.apache.org/contributing/contribute-code.html] but is there 
anything else that I need to do?

 

Thanks

Manish

> Rework Bucketing Sink
> ---------------------
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
>
> The BucketingSink has a series of deficits at the moment.
> Due to the long list of issues, I would suggest adding a new 
> StreamingFileSink with a new and cleaner design.
> h3. Encoders, Parquet, ORC
>  - It only efficiently supports row-wise data formats (Avro, JSON, sequence 
> files).
>  - Efforts to add (columnar) compression for blocks of data are inefficient, 
> because blocks cannot span checkpoints due to persistence-on-checkpoint.
>  - The encoders are part of the {{flink-connector-filesystem}} project, 
> rather than in orthogonal formats projects. This blows up the dependencies of 
> the {{flink-connector-filesystem}} project. As an example, the 
> rolling file sink has dependencies on Hadoop and Avro, which messes up 
> dependency management.
> h3. Use of FileSystems
>  - The BucketingSink works only on Hadoop's FileSystem abstraction, does not 
> support Flink's own FileSystem abstraction, and cannot work with the packaged 
> S3, MapR FS, and Swift file systems.
>  - The sink hence needs Hadoop as a dependency.
>  - The sink relies on "trying out" whether truncation works, which requires 
> write access to the user's working directory.
>  - The sink relies on enumerating and counting files, rather than maintaining 
> its own state, making it less efficient.
> h3. Correctness and Efficiency on S3
>  - The BucketingSink relies on strong consistency in the file enumeration, 
> hence may work incorrectly on S3.
>  - The BucketingSink relies on persisting streams at intermediate points. 
> This does not work properly on S3, hence there may be data loss on S3.
> h3. .valid-length companion file
>  - The .valid-length companion file complicates reading for consumers of the 
> data and should be dropped.
> We track this design in a series of sub-issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-9749) Rework Bucketing Sink

2019-12-17 Thread Manish Bellani (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998659#comment-16998659
 ] 

Manish Bellani commented on FLINK-9749:
---------------------------------------

Hi,

Is there still interest in this work? I have an implementation of an S3/ORC 
sink that handles these requirements. We're currently using this at GitHub 
internally to write several billions/terabytes of data to S3 (exactly once). If 
there's interest, I can kick off a conversation at GitHub; in the meantime, do 
you mind sharing how I can contribute to Flink? I found this: 
[https://flink.apache.org/contributing/contribute-code.html] but is there 
anything else that I need to do?

 

Thanks

Manish

> Rework Bucketing Sink
> ---------------------
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
>
> The BucketingSink has a series of deficits at the moment.
> Due to the long list of issues, I would suggest adding a new 
> StreamingFileSink with a new and cleaner design.
> h3. Encoders, Parquet, ORC
>  - It only efficiently supports row-wise data formats (Avro, JSON, sequence 
> files).
>  - Efforts to add (columnar) compression for blocks of data are inefficient, 
> because blocks cannot span checkpoints due to persistence-on-checkpoint.
>  - The encoders are part of the {{flink-connector-filesystem}} project, 
> rather than in orthogonal formats projects. This blows up the dependencies of 
> the {{flink-connector-filesystem}} project. As an example, the 
> rolling file sink has dependencies on Hadoop and Avro, which messes up 
> dependency management.
> h3. Use of FileSystems
>  - The BucketingSink works only on Hadoop's FileSystem abstraction, does not 
> support Flink's own FileSystem abstraction, and cannot work with the packaged 
> S3, MapR FS, and Swift file systems.
>  - The sink hence needs Hadoop as a dependency.
>  - The sink relies on "trying out" whether truncation works, which requires 
> write access to the user's working directory.
>  - The sink relies on enumerating and counting files, rather than maintaining 
> its own state, making it less efficient.
> h3. Correctness and Efficiency on S3
>  - The BucketingSink relies on strong consistency in the file enumeration, 
> hence may work incorrectly on S3.
>  - The BucketingSink relies on persisting streams at intermediate points. 
> This does not work properly on S3, hence there may be data loss on S3.
> h3. .valid-length companion file
>  - The .valid-length companion file complicates reading for consumers of the 
> data and should be dropped.
> We track this design in a series of sub-issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink

2019-05-17 Thread Manish Bellani (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842336#comment-16842336
 ] 

Manish Bellani edited comment on FLINK-10114 at 5/17/19 5:33 PM:
-----------------------------------------------------------------

Hello, is this still planned to be supported for StreamingFileSink? It appears 
that changes would be needed in the orc-writer API in order to support this. 
I'd love to hear if there's work going on in this area, since we write data in 
ORC format and are looking to stream ORC data to S3.


was (Author: mbellani):
Hello, is this still planned to be supported for StreamingFileSink? It appears 
that changes would be needed in the orc-writer API in order to support this. 
I'd love to hear if there's work going on in this area, since we write data in 
ORC format and are looking to stream ORC data in S3.

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Reporter: zhangminglei
>Assignee: vinoyang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10114) Support Orc for StreamingFileSink

2019-05-17 Thread Manish Bellani (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842336#comment-16842336
 ] 

Manish Bellani commented on FLINK-10114:


Hello, is this still planned to be supported for StreamingFileSink? It appears 
that changes would be needed in the orc-writer API in order to support this. 
I'd love to hear if there's work going on in this area, since we write data in 
ORC format and are looking to stream ORC data in S3.

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / FileSystem
>Reporter: zhangminglei
>Assignee: vinoyang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-16 Thread Manish Bellani (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Bellani closed FLINK-12522.
----------------------------------
Resolution: Not A Problem

> Error While Initializing S3A FileSystem
> ---------------------------------------
>
> Key: FLINK-12522
> URL: https://issues.apache.org/jira/browse/FLINK-12522
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.7.2
> Environment: Kubernetes
> Docker
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5
>Reporter: Manish Bellani
>Priority: Major
>
> Hey friends,
> Thanks for all the work you have been doing on Flink. I have been trying to 
> use BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm 
> running into some issues (which I suspect could be dependency/packaging 
> related) that I'll try to describe here.
> The data pipeline is quite simple:
> {noformat}
> Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> 
> S3{noformat}
> *Environment:* 
> {noformat}
> Kubernetes
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5{noformat}
> I followed this dependency section: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
>  to place the dependencies under /opt/flink/lib (with the exception that my 
> Hadoop version and the dependencies I pull in are different).
> Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
> {noformat}
> RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
> /opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
> RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
> #Transitive Dependency of aws-java-sdk-s3
> RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
> RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
> RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
> http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
> RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
> RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}
>  
> But when I submit the job, it throws this error during initialization of 
> BucketingSink/S3AFileSystem: 
> {noformat}
> java.beans.IntrospectionException: bad write method arg count: public final 
> void 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
>     at 
> java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
>     at 
> java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
>     at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getPropertyDescriptor(PropertyUtilsBean.java:954)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.isWriteable(PropertyUtilsBean.java:1478)
>     at 
> 

[jira] [Comment Edited] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-16 Thread Manish Bellani (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841321#comment-16841321
 ] 

Manish Bellani edited comment on FLINK-12522 at 5/16/19 1:16 PM:
-----------------------------------------------------------------

This turned out to be a non-issue but could definitely be misleading; more 
context here: 
[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-While-Initializing-S3A-FileSystem-td27878.html]. 
Going to close this issue; also thanks [~kkrugler] for help with this.


was (Author: mbellani):
This turned out to be a non-issue but could definitely be misleading; more 
context here: 
[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-While-Initializing-S3A-FileSystem-td27878.html]. 
Going to close this issue.

> Error While Initializing S3A FileSystem
> ---------------------------------------
>
> Key: FLINK-12522
> URL: https://issues.apache.org/jira/browse/FLINK-12522
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.7.2
> Environment: Kubernetes
> Docker
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5
>Reporter: Manish Bellani
>Priority: Major
>
> Hey friends,
> Thanks for all the work you have been doing on Flink. I have been trying to 
> use BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm 
> running into some issues (which I suspect could be dependency/packaging 
> related) that I'll try to describe here.
> The data pipeline is quite simple:
> {noformat}
> Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> 
> S3{noformat}
> *Environment:* 
> {noformat}
> Kubernetes
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5{noformat}
> I followed this dependency section: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
>  to place the dependencies under /opt/flink/lib (with the exception that my 
> Hadoop version and the dependencies I pull in are different).
> Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
> {noformat}
> RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
> /opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
> RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
> #Transitive Dependency of aws-java-sdk-s3
> RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
> RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
> RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
> http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
> RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
> RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}
>  
> But when I submit the job, it throws this error during initialization of 
> BucketingSink/S3AFileSystem: 
> {noformat}
> java.beans.IntrospectionException: bad write method arg count: public final 
> void 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
>     at 
> java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
>     at 
> java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
>     at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
>     at 
> 

[jira] [Commented] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-16 Thread Manish Bellani (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841321#comment-16841321
 ] 

Manish Bellani commented on FLINK-12522:


This turned out to be a non-issue but could definitely be misleading; more 
context here: 
[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-While-Initializing-S3A-FileSystem-td27878.html]. 
Going to close this issue.
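
For context on why this is benign: the "bad write method arg count" message 
comes from JavaBeans introspection rejecting a two-argument setter such as 
commons-configuration2's {{AbstractConfiguration.setProperty(String, Object)}}; 
commons-beanutils probes such fluent setters during introspection, and the 
resulting exception typically just surfaces in the logs. A self-contained 
sketch reproducing the message (the {{Config}} class is illustrative, not from 
Flink):

{code:java}
import java.beans.IntrospectionException;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

public class BadWriteMethodDemo {
    // Mimics commons-configuration2's two-argument setProperty.
    public static class Config {
        public final void setProperty(String key, Object value) { }
    }

    public static void main(String[] args) throws Exception {
        Method fluentSetter = Config.class.getMethod("setProperty", String.class, Object.class);
        try {
            // A JavaBeans write method must take exactly one parameter, so
            // building a PropertyDescriptor around a two-argument setter throws.
            new PropertyDescriptor("property", null, fluentSetter);
        } catch (IntrospectionException e) {
            System.out.println(e.getMessage()); // "bad write method arg count: ..."
        }
    }
}
{code}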

> Error While Initializing S3A FileSystem
> ---------------------------------------
>
> Key: FLINK-12522
> URL: https://issues.apache.org/jira/browse/FLINK-12522
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.7.2
> Environment: Kubernetes
> Docker
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5
>Reporter: Manish Bellani
>Priority: Major
>
> Hey friends,
> Thanks for all the work you have been doing on Flink. I have been trying to 
> use BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm 
> running into some issues (which I suspect could be dependency/packaging 
> related) that I'll try to describe here.
> The data pipeline is quite simple:
> {noformat}
> Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> 
> S3{noformat}
> *Environment:* 
> {noformat}
> Kubernetes
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5{noformat}
> I followed this dependency section: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
>  to place the dependencies under /opt/flink/lib (with the exception that my 
> Hadoop version and the dependencies I pull in are different).
> Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
> {noformat}
> RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
> /opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
> RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
> #Transitive Dependency of aws-java-sdk-s3
> RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
> RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
> RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
> http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
> RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
> RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}
>  
> But when I submit the job, it throws this error during initialization of 
> BucketingSink/S3AFileSystem: 
> {noformat}
> java.beans.IntrospectionException: bad write method arg count: public final 
> void 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
>     at 
> java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
>     at 
> java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
>     at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
>     at 
> 

[jira] [Commented] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-15 Thread Manish Bellani (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840816#comment-16840816
 ] 

Manish Bellani commented on FLINK-12522:


Thanks Ken, will do.

> Error While Initializing S3A FileSystem
> ---------------------------------------
>
> Key: FLINK-12522
> URL: https://issues.apache.org/jira/browse/FLINK-12522
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.7.2
> Environment: Kubernetes
> Docker
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5
>Reporter: Manish Bellani
>Priority: Major
>
> Hey friends,
> Thanks for all the work you have been doing on Flink. I have been trying to 
> use BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm 
> running into some issues (which I suspect could be dependency/packaging 
> related) that I'll try to describe here.
> The data pipeline is quite simple:
> {noformat}
> Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> 
> S3{noformat}
> *Environment:* 
> {noformat}
> Kubernetes
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5{noformat}
> I followed this dependency section: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
>  to place the dependencies under /opt/flink/lib (with the exception that my 
> Hadoop version and the dependencies I pull in are different).
> Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
> {noformat}
> RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
> /opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
> RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
> #Transitive Dependency of aws-java-sdk-s3
> RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
> RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
> RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
> http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
> RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
> RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}
>  
> But when I submit the job, it throws this error during initialization of 
> BucketingSink/S3AFileSystem: 
> {noformat}
> java.beans.IntrospectionException: bad write method arg count: public final 
> void 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
>     at 
> java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
>     at 
> java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
>     at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getPropertyDescriptor(PropertyUtilsBean.java:954)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.isWriteable(PropertyUtilsBean.java:1478)
>     

[jira] [Updated] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-15 Thread Manish Bellani (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Bellani updated FLINK-12522:
-----------------------------------
Description: 
Hey friends,

Thanks for all the work you have been doing on Flink. I have been trying to use 
BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm running 
into some issues (which I suspect could be dependency/packaging related) that 
I'll try to describe here.

The data pipeline is quite simple:
{noformat}
Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> S3{noformat}
*Environment:* 
{noformat}
Kubernetes
Debian
DockerImage: flink:1.7.2-hadoop28-scala_2.11
Java 1.8
Hadoop Version: 2.8.5{noformat}
I followed this dependency section: 
[https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
 to place the dependencies under /opt/flink/lib (with the exception that my 
Hadoop version and the dependencies I pull in are different).

Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
{noformat}
RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
/opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
#Transitive Dependency of aws-java-sdk-s3
RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}
 

But when I submit the job, it throws this error during initialization of 
BucketingSink/S3AFileSystem: 
{noformat}
java.beans.IntrospectionException: bad write method arg count: public final 
void 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
    at 
java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
    at java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
    at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getPropertyDescriptor(PropertyUtilsBean.java:954)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.isWriteable(PropertyUtilsBean.java:1478)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.isPropertyWriteable(BeanHelper.java:521)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initProperty(BeanHelper.java:357)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initBeanProperties(BeanHelper.java:273)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initBean(BeanHelper.java:192)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper$BeanCreationContextImpl.initBean(BeanHelper.java:669)
    at 

[jira] [Updated] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-15 Thread Manish Bellani (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manish Bellani updated FLINK-12522:
-----------------------------------
Description: 
Hey friends,

Thanks for all the work you have been doing on Flink. I have been trying to use 
BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm running 
into some issues (which I suspect could be dependency/packaging related) that 
I'll try to describe here.

The data pipeline is quite simple:
{noformat}
Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> S3{noformat}
*Environment:*
{noformat}
Kubernetes
Debian
DockerImage: flink:1.7.2-hadoop28-scala_2.11
Java 1.8
Hadoop Version: 2.8.5{noformat}
I followed this dependency section: 
[https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
 to place the dependencies under /opt/flink/lib (with the exception that my 
Hadoop version and the dependencies I pull in are different).

Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
{noformat}
RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
/opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
#Transitive Dependency of aws-java-sdk-s3
RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}

But when I submit the job, it throws this error during initialization of 
BucketingSink/S3AFileSystem:

{noformat}
java.beans.IntrospectionException: bad write method arg count: public final 
void 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
    at 
java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
    at java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
    at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getPropertyDescriptor(PropertyUtilsBean.java:954)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.isWriteable(PropertyUtilsBean.java:1478)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.isPropertyWriteable(BeanHelper.java:521)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initProperty(BeanHelper.java:357)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initBeanProperties(BeanHelper.java:273)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initBean(BeanHelper.java:192)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper$BeanCreationContextImpl.initBean(BeanHelper.java:669)
    at 

[jira] [Created] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-15 Thread Manish Bellani (JIRA)
Manish Bellani created FLINK-12522:
-----------------------------------

 Summary: Error While Initializing S3A FileSystem
 Key: FLINK-12522
 URL: https://issues.apache.org/jira/browse/FLINK-12522
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.7.2
 Environment: Kubernetes

Docker

Debian

DockerImage: flink:1.7.2-hadoop28-scala_2.11

Java 1.8

Hadoop Version: 2.8.5
Reporter: Manish Bellani


Hey friends,
Thanks for all the work you have been doing on Flink. I have been trying to use 
BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm running 
into some issues (which I suspect could be dependency/packaging related) that 
I'll try to describe here.

The data pipeline is quite simple:
{noformat}
Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> S3{noformat}
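
For context, a minimal sketch of the BucketingSink stage of such a pipeline 
(Flink 1.7-era API); the bucket path, bucketer format, and roll size below are 
illustrative:

{code:java}
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class BucketingSinkSketch {
    public static BucketingSink<String> s3Sink() {
        BucketingSink<String> sink = new BucketingSink<>("s3a://my-bucket/data");
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH")); // one bucket per hour
        sink.setWriter(new StringWriter<>());                       // row-wise string writer
        sink.setBatchSize(1024 * 1024 * 128L);                      // roll part files at ~128 MB
        return sink;
    }
}
{code}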

*Environment:*
{noformat}
Kubernetes
Debian
DockerImage: flink:1.7.2-hadoop28-scala_2.11
Java 1.8
Hadoop Version: 2.8.5{noformat}


I followed this dependency section: 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27
 to place the dependencies under /opt/flink/lib (with the exception that my 
Hadoop version and the dependencies I pull in are different).

Here are the dependencies I'm pulling in (excerpt from my Dockerfile):
{noformat}
RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
/opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
#Transitive Dependency of aws-java-sdk-s3
RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}

But when I submit the job, it throws this error during initialization of 
BucketingSink/S3AFileSystem:

{noformat}
java.beans.IntrospectionException: bad write method arg count: public final 
void 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
    at 
java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
    at java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
    at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getPropertyDescriptor(PropertyUtilsBean.java:954)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.isWriteable(PropertyUtilsBean.java:1478)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.isPropertyWriteable(BeanHelper.java:521)
    at 
org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.beanutils.BeanHelper.initProperty(BeanHelper.java:357)
    at