[jira] [Commented] (FLINK-11401) Allow compression on ParquetBulkWriter

2021-04-27 Thread Flink Jira Bot (Jira)


[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334031#comment-17334031 ]

Flink Jira Bot commented on FLINK-11401:


This issue was marked "stale-assigned" and has not received an update in 7 
days. It is now automatically unassigned. If you are still working on it, you 
can assign it to yourself again. Please also give an update about the status of 
the work.

> Allow compression on ParquetBulkWriter
> --
>
> Key: FLINK-11401
> URL: https://issues.apache.org/jira/browse/FLINK-11401
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.7.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11401) Allow compression on ParquetBulkWriter

2021-04-16 Thread Flink Jira Bot (Jira)


[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323388#comment-17323388 ]

Flink Jira Bot commented on FLINK-11401:


This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Allow compression on ParquetBulkWriter
> --
>
> Key: FLINK-11401
> URL: https://issues.apache.org/jira/browse/FLINK-11401
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.7.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11401) Allow compression on ParquetBulkWriter

2020-06-23 Thread Ryo Okubo (Jira)


[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142678#comment-17142678 ]

Ryo Okubo commented on FLINK-11401:
---

Any progress on this issue? The pull request seems to be mostly done. Do you 
still have any concerns about it?

I found a duplicate issue, https://issues.apache.org/jira/browse/FLINK-16491, 
which looks like it takes the same approach.
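
For anyone picking this up: the change both tickets aim at essentially boils 
down to exposing parquet-mr's compression setting through Flink's 
ParquetWriterFactory. A minimal sketch, assuming an Avro GenericRecord payload 
(the factoryFor helper and the schema-string parameter are illustrative, not 
taken from either PR):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.parquet.ParquetBuilder;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class CompressedParquetWriters {

    // Builds a ParquetWriterFactory whose underlying parquet-mr writer
    // compresses pages with the given codec. The schema is passed as a
    // JSON string and parsed inside the lambda so the lambda stays
    // serializable (Avro's Schema itself is not Serializable).
    public static ParquetWriterFactory<GenericRecord> factoryFor(
            String schemaJson, CompressionCodecName codec) {
        ParquetBuilder<GenericRecord> builder = out ->
            AvroParquetWriter.<GenericRecord>builder(out)
                .withSchema(new Schema.Parser().parse(schemaJson))
                .withCompressionCodec(codec)   // e.g. CompressionCodecName.SNAPPY
                .build();
        return new ParquetWriterFactory<>(builder);
    }
}
{code}

A factory created this way can be handed to StreamingFileSink.forBulkFormat 
just like the existing uncompressed writers.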

> Allow compression on ParquetBulkWriter
> --
>
> Key: FLINK-11401
> URL: https://issues.apache.org/jira/browse/FLINK-11401
> Project: Flink
>  Issue Type: Improvement
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.7.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-11401) Allow compression on ParquetBulkWriter

2019-01-23 Thread Fokko Driesprong (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749744#comment-16749744 ]

Fokko Driesprong commented on FLINK-11401:
--

Thanks for the comment, [~StephanEwen].

The RollOnCheckpoint behavior works very well for our use case, which is just 
ETL'ing data from Kafka to a bucket. Since we're using an object store (GCS) 
as the filesystem backend, the constant renaming of files from `.in-progress` 
to `.pending` to `.avro` is far from optimal, because renaming is very 
expensive there. On HDFS a rename is a constant-time, atomic operation, 
whereas on an object store it implies copying the whole file.

In the near future, we'll open a PR for the Avro writer, implementing the 
BulkWriter interface. Since Avro data still lives in a container format (we 
want to include the schema in the header of the file), we need to write a 
header before writing the actual rows. Writing this header first would 
require changing some interfaces.
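
To make the header point concrete, here is a rough sketch of what such an 
Avro BulkWriter could look like, assuming Avro's DataFileWriter handles the 
container format (the class name is illustrative, not the actual PR; per the 
BulkWriter contract, finish() must not close the underlying stream, which is 
why it only flushes here):

{code:java}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.core.fs.FSDataOutputStream;

public class AvroContainerBulkWriter implements BulkWriter<GenericRecord> {

    private final DataFileWriter<GenericRecord> dataFileWriter;

    public AvroContainerBulkWriter(Schema schema, FSDataOutputStream out)
            throws IOException {
        this.dataFileWriter = new DataFileWriter<>(new GenericDatumWriter<>(schema));
        // create(...) writes the container header, including the schema,
        // before any rows are appended (the step discussed above).
        this.dataFileWriter.create(schema, out);
    }

    @Override
    public void addElement(GenericRecord record) throws IOException {
        dataFileWriter.append(record);
    }

    @Override
    public void flush() throws IOException {
        dataFileWriter.flush();
    }

    @Override
    public void finish() throws IOException {
        // Flush the last block; the sink owns and closes the stream itself.
        dataFileWriter.flush();
    }
}
{code}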


> Allow compression on ParquetBulkWriter
> --
>
> Key: FLINK-11401
> URL: https://issues.apache.org/jira/browse/FLINK-11401
> Project: Flink
>  Issue Type: Improvement
>  Components: Batch Connectors and Input/Output Formats
>Affects Versions: 1.7.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11401) Allow compression on ParquetBulkWriter

2019-01-22 Thread Stephan Ewen (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748795#comment-16748795 ]

Stephan Ewen commented on FLINK-11401:
--

I can see that being useful.

Please bear in mind that bulk writers currently imply that part files must 
roll on every checkpoint, because many formats (like Parquet) don't make it 
easy to persist intermediate state and resume writing.
Avro's row-by-row append nature makes it possible to keep writing a part file 
across checkpoints.
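
For reference, this is the distinction StreamingFileSink draws between 
forRowFormat and forBulkFormat: bulk formats always roll on checkpoint. A 
minimal sketch of wiring a Parquet bulk writer into the sink (the output path 
is a placeholder; note that ParquetAvroWriters.forGenericRecord offers no 
compression knob today, which is what this ticket is about):

{code:java}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ParquetSinkExample {

    // Bulk formats go through forBulkFormat(...), which rolls a new part
    // file on every checkpoint; row formats (forRowFormat) can instead
    // roll on size or time and keep a part file open across checkpoints.
    public static void attachParquetSink(DataStream<GenericRecord> stream,
                                         Schema schema) {
        StreamingFileSink<GenericRecord> sink = StreamingFileSink
            .forBulkFormat(
                new Path("gs://my-bucket/output"),   // placeholder path
                ParquetAvroWriters.forGenericRecord(schema))
            .build();
        stream.addSink(sink);
    }
}
{code}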

One could think of letting the row formats add a header when opening a part 
file. That would allow the Avro writer to keep the property of writing part 
files across checkpoints.

> Allow compression on ParquetBulkWriter
> --
>
> Key: FLINK-11401
> URL: https://issues.apache.org/jira/browse/FLINK-11401
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.7.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)