[ https://issues.apache.org/jira/browse/FLINK-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497540#comment-17497540 ]
Dawid Wysakowicz edited comment on FLINK-26301 at 2/24/22, 4:53 PM:
--------------------------------------------------------------------

> The standard parquet lib will take the responsibility for these questions.
> This solution just builds the bridge between Flink and AvroParquet. Speaking
> of AvroReadSupport, even its javadoc recommends using AvroParquetReader
> instead of AvroReadSupport directly.

I am not saying you should use AvroReadSupport instead of AvroParquetReader. I am suggesting using a static method to properly set up the configuration passed to AvroParquetReader: https://stackoverflow.com/a/36871563/4250114

In the end, what I advocate is to explicitly specify what this format is good for, what its limitations are, and how to use it (where to get the schema from).

was (Author: dawidwys):
> The standard parquet lib will take the responsibility for these questions.
> This solution just builds the bridge between Flink and AvroParquet. Speaking
> of AvroReadSupport, even its javadoc recommends using AvroParquetReader
> instead of AvroReadSupport directly.

I am not saying you should use AvroReadSupport instead of AvroParquetReader. I am suggesting using a static method to properly set up the configuration passed to AvroParquetReader: https://stackoverflow.com/a/36871563/4250114

> Test AvroParquet format
> -----------------------
>
>                 Key: FLINK-26301
>                 URL: https://issues.apache.org/jira/browse/FLINK-26301
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Jing Ge
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
>
> The following scenarios are worthwhile to test:
> * Start a simple job with a none/at-least-once/exactly-once delivery guarantee that reads Avro Generic/Specific/Reflect records and writes them to an arbitrary sink.
> * Start the above job with bounded/unbounded data.
> * Start the above job with streaming/batch execution mode.
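A minimal sketch of such a test job with the DataStream API, assuming the example schema from the documentation referenced in [1] and a placeholder input path (the class names follow the flink-parquet module; this is a wiring sketch, not a complete test):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroParquetReadJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // In a real test the schema must match the one the Parquet test
        // files were written with; this record layout is a placeholder.
        Schema schema = new Schema.Parser().parse(
                "{\"type\": \"record\", \"name\": \"User\", \"fields\": ["
                + "{\"name\": \"name\", \"type\": \"string\"},"
                + "{\"name\": \"favoriteNumber\", \"type\": [\"int\", \"null\"]}]}");

        // AvroParquetReaders also offers forSpecificRecord(...) and
        // forReflectRecord(...) for the other two record kinds to test.
        // For the unbounded variant, add .monitorContinuously(...) to the builder.
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(schema),
                        new Path("/tmp/parquet-input"))   // placeholder path
                .build();

        DataStream<GenericRecord> stream = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");

        // Any sink works for the scenario; printing stands in for it here.
        stream.print();

        env.execute("AvroParquet read test");
    }
}
```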
>
> This format works with FileSource [2] and can only be used with DataStream. Normal Parquet files can be used as test files. The schema introduced in [1] could be used.
>
> References:
> [1] [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/]
> [2] [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)