[ 
https://issues.apache.org/jira/browse/FLINK-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497019#comment-17497019
 ] 

Jing Ge edited comment on FLINK-26301 at 2/24/22, 9:16 AM:
-----------------------------------------------------------

1. You are absolutely right, ParquetColumnarRowInputFormat has been released since 
1.14. We will redesign the Parquet format to support both the DataStream and 
Table APIs. This task is only for the newly developed AvroParquet format.
2. The doc follows the style used in 1.14. I guess it might be good to make the 
user aware of the bounded/unbounded support of the format. But your 
point is also fair.
3. AvroParquetRecordFormat implements StreamFormat, which means it can only be 
used via FileSource.forRecordStreamFormat.
4. Good point! Thanks.
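
As a minimal sketch of point 3: the snippet below wires AvroParquetReaders (which produces a StreamFormat) into a FileSource via forRecordStreamFormat and reads GenericRecords into a DataStream, per the Parquet docs linked in the issue. The schema and input path are placeholders, not part of the issue; assumes the flink-parquet dependency from Flink 1.15.

```java
// Minimal sketch: read Avro GenericRecords from Parquet files via
// FileSource.forRecordStreamFormat. Schema and path are placeholders.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroParquetReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder schema for illustration; a real test would use the
        // schema introduced in the Parquet format docs.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Datum\",\"fields\":"
                + "[{\"name\":\"id\",\"type\":\"long\"}]}");

        // AvroParquetReaders.forGenericRecord returns a StreamFormat, so it
        // plugs into FileSource.forRecordStreamFormat (DataStream API only).
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(schema),
                        new Path("/path/to/parquet-input"))  // placeholder path
                .build();

        DataStream<GenericRecord> records = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");

        records.print();  // stands in for "an arbitrary sink" from the test plan
        env.execute("AvroParquet read test");
    }
}
```

The same pipeline can be run against bounded and unbounded input and in streaming or batch execution mode, covering the scenarios listed in the issue description below.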


was (Author: jingge):
1. you are absolutely right, ParquetColumnarRowInputFormat has released since 
1.14. We will redesign the parquet format to let it support both DataStream and 
Table API. This task is only for the new developed AvroParquet format.
2. the doc follows the style written in 1.14. I guess it might be good to let 
the user be aware of the bounded/unbounded support of the format. But your 
point is also fair.
3. AvroParquetRecordFormat implements StreamFormat which mean it can only be 
used via forStreamRecordFormat.
4. good point! Thanks.

> Test AvroParquet format
> -----------------------
>
>                 Key: FLINK-26301
>                 URL: https://issues.apache.org/jira/browse/FLINK-26301
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Jing Ge
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
>
> The following scenarios are worthwhile to test
>  * Start a simple job with a None/At-least-once/Exactly-once delivery guarantee 
> that reads Avro Generic/Specific/Reflect records and writes them to an 
> arbitrary sink.
>  * Start the above job with bounded/unbounded data.
>  * Start the above job with streaming/batch execution mode.
>  
> This format works with FileSource[2] and can only be used with DataStream. 
> Normal parquet files can be used as test files. Schema introduced at [1] 
> could be used.
>  
> References:
> [1] [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/]
> [2] [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
