[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2020-02-19 Thread Fokko Driesprong (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated FLINK-9749:

Description: 
The BucketingSink has a series of deficits at the moment.

Due to the long list of issues, I suggest adding a new StreamingFileSink 
with a new and cleaner design.
h3. Encoders, Parquet, ORC
 - It only efficiently supports row-wise data formats (Avro, JSON, sequence 
files).
 - Efforts to add (columnar) compression for blocks of data are inefficient, 
because blocks cannot span checkpoints due to persistence-on-checkpoint.
 - The encoders are part of the {{flink-connector-filesystem}} project, rather 
than in orthogonal formats projects. This blows up the dependencies of the 
{{flink-connector-filesystem}} project. As an example, the rolling file 
sink has dependencies on Hadoop and Avro, which complicates dependency 
management.

h3. Use of FileSystems
 - The BucketingSink works only on Hadoop's FileSystem abstraction. It does not 
support Flink's own FileSystem abstraction and cannot work with the packaged 
S3, maprfs, and swift file systems.
 - The sink hence needs Hadoop as a dependency.
 - The sink relies on "trying out" whether truncation works, which requires 
write access to the user's working directory.
 - The sink relies on enumerating and counting files, rather than maintaining 
its own state, making it less efficient.

h3. Correctness and Efficiency on S3
 - The BucketingSink relies on strong consistency in the file enumeration, 
hence may work incorrectly on S3.
 - The BucketingSink relies on persisting streams at intermediate points. This 
does not work properly on S3, hence there may be data loss on S3.

h3. .valid-length companion file
 - The .valid-length file makes the data harder to consume and should be 
dropped.

We track this design in a series of sub issues.

  was:
The BucketingSink has a series of deficits at the moment.

Due to the long list of issues, I would suggest to add a new StreamingFileSink 
with a new and cleaner design

h3. Encoders, Parquet, ORC

 - It only efficiently supports row-wise data formats (avro, jso, sequence 
files.
 - Efforts to add (columnar) compression for blocks of data is inefficient, 
because blocks cannot span checkpoints due to persistence-on-checkpoint.
 - The encoders are part of the \{{flink-connector-filesystem project}}, rather 
than in orthogonal formats projects. This blows up the dependencies of the 
\{{flink-connector-filesystem project}} project. As an example, the rolling 
file sink has dependencies on Hadoop and Avro, which messes up dependency 
management.

h3. Use of FileSystems

 - The BucketingSink works only on Hadoop's FileSystem abstraction not support 
Flink's own FileSystem abstraction and cannot work with the packaged S3, 
maprfs, and swift file systems
 - The sink hence needs Hadoop as a dependency
 - The sink relies on "trying out" whether truncation works, which requires 
write access to the users working directory
 - The sink relies on enumerating and counting files, rather than maintaining 
its own state, making less efficient

h3. Correctness and Efficiency on S3
 - The BucketingSink relies on strong consistency in the file enumeration, 
hence may work incorrectly on S3.
 - The BucketingSink relies on persisting streams at intermediate points. This 
is not working properly on S3, hence there may be data loss on S3.

h3. .valid-length companion file
 - The valid length file makes it hard for consumers of the data and should be 
dropped


We track this design in a series of sub issues.


> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
>

[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2019-02-27 Thread Robert Metzger (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-9749:
--
Component/s: (was: Connectors / Common)
 Connectors / FileSystem

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / FileSystem
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-12-17 Thread Tzu-Li (Gordon) Tai (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-9749:
---
Fix Version/s: (was: 1.6.3)

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
>




[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-09-15 Thread Till Rohrmann (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9749:
-
Fix Version/s: (was: 1.6.1)

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.6.2
>
>




[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-09-15 Thread Till Rohrmann (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9749:
-
Fix Version/s: 1.6.2

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.6.2
>
>




[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-08-13 Thread Chesnay Schepler (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-9749:

Fix Version/s: (was: 1.6.0)
   1.6.1

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.6.1
>
>




[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-08-09 Thread Till Rohrmann (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9749:
-
Fix Version/s: (was: 1.7.0)
   1.6.0

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.6.0
>
>




[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-08-09 Thread Till Rohrmann (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-9749:
-
Fix Version/s: (was: 1.6.0)
   1.7.0

> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.7.0
>
>




[jira] [Updated] (FLINK-9749) Rework Bucketing Sink

2018-07-04 Thread Stephan Ewen (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-9749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-9749:

Description: 
The BucketingSink has a series of deficits at the moment.

Due to the long list of issues, I would suggest to add a new StreamingFileSink 
with a new and cleaner design

h3. Encoders, Parquet, ORC

 - It only efficiently supports row-wise data formats (avro, jso, sequence 
files.
 - Efforts to add (columnar) compression for blocks of data is inefficient, 
because blocks cannot span checkpoints due to persistence-on-checkpoint.
 - The encoders are part of the \{{flink-connector-filesystem project}}, rather 
than in orthogonal formats projects. This blows up the dependencies of the 
\{{flink-connector-filesystem project}} project. As an example, the rolling 
file sink has dependencies on Hadoop and Avro, which messes up dependency 
management.

h3. Use of FileSystems

 - The BucketingSink works only on Hadoop's FileSystem abstraction not support 
Flink's own FileSystem abstraction and cannot work with the packaged S3, 
maprfs, and swift file systems
 - The sink hence needs Hadoop as a dependency
 - The sink relies on "trying out" whether truncation works, which requires 
write access to the users working directory
 - The sink relies on enumerating and counting files, rather than maintaining 
its own state, making less efficient

h3. Correctness and Efficiency on S3
 - The BucketingSink relies on strong consistency in the file enumeration, 
hence may work incorrectly on S3.
 - The BucketingSink relies on persisting streams at intermediate points. This 
is not working properly on S3, hence there may be data loss on S3.

h3. .valid-length companion file
 - The valid length file makes it hard for consumers of the data and should be 
dropped


We track this design in a series of sub issues.

  was:
The BucketingSink has a series of deficits at the moment.

Due to the long list of issues, I would suggest to add a new StreamingFileSink 
with a new and cleaner design

h3. Encoders, Parquet, ORC

 - It only efficiently supports row-wise data formats (avro, jso, sequence 
files.
 - Efforts to add (columnar) compression for blocks of data is inefficient, 
because blocks cannot span checkpoints due to persistence-on-checkpoint.
 - The encoders are part of the \{{flink-connector-filesystem project}}, rather 
than in orthogonal formats projects. This blows up the dependencies of the 
\{{flink-connector-filesystem project}} project. As an example, the rolling 
file sink has dependencies on Hadoop and Avro, which messes up dependency 
management.

h3. Use of FileSystems

 - The BucketingSink works only on Hadoop's FileSystem abstraction not support 
Flink's own FileSystem abstraction and cannot work with the packaged S3, 
maprfs, and swift file systems
 - The sink hence needs Hadoop as a dependency
 - The sink relies on "trying out" whether truncation works, which requires 
write access to the users working directory
 - The sink relies on enumerating and counting files, rather than maintaining 
its own state, making less efficient

h3. Correctness and Efficiency on S3
 - The BucketingSink relies on strong consistency in the file enumeration, 
hence may work incorrectly on S3.
 - The BucketingSink relies on persisting streams at intermediate points. This 
is not working properly on S3, hence there may be data loss on S3.

h3. .valid-length companion file
 - The valid length file makes it hard for consumers of the data and should be 
dropped


> Rework Bucketing Sink
> -
>
> Key: FLINK-9749
> URL: https://issues.apache.org/jira/browse/FLINK-9749
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Stephan Ewen
>Assignee: Kostas Kloudas
>Priority: Major
> Fix For: 1.6.0
>
>
