[ https://issues.apache.org/jira/browse/FLINK-26301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496715#comment-17496715 ]
Dawid Wysakowicz edited comment on FLINK-26301 at 2/23/22, 1:41 PM:
--------------------------------------------------------------------
# Personally I find it strange that the Parquet documentation uses {{RowData}} and {{LogicalType}}, which are table-specific classes. {{LogicalType}} in particular comes from the {{flink-table-common}} package and thus requires an additional dependency. I am not sure this is a good usage example, at least not as the main one. I would rather see it further down, along with some description of its relation to the Table API.
# It might be just personal taste, but I found the format documentation a bit cluttered by having examples for both the bounded and unbounded case. As far as I can tell, boundedness is irrelevant from the point of view of the format, and the FileSource documentation already covers those two modes. I can be convinced otherwise, though.
# I'd expect some explanation of the differences between forBulkFormat/forStreamRecordFormat in the docs, preferably with a compatibility matrix. E.g. can I use AvroParquet with a bulk format? Can I somehow read into POJOs using a bulk format? (Even a prominent cross-link to some common place would be good.)
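For context, a DataStream-oriented usage example that avoids the table-specific classes might look roughly like the following sketch. The schema and file path are illustrative, and it assumes the flink-parquet and flink-avro dependencies are on the classpath; it is not taken from the documentation under discussion.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroParquetReadSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Illustrative Avro schema; any GenericRecord-compatible schema works.
        Schema schema = SchemaBuilder.record("User").fields()
                .requiredString("name")
                .requiredInt("age")
                .endRecord();

        // AvroParquetReaders produces a StreamFormat, so the source is built
        // via the record-stream-format entry point, not the bulk-format one.
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(schema),
                        new Path("/tmp/parquet-input"))   // hypothetical path
                .build();

        DataStream<GenericRecord> stream = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");
        stream.print();
        env.execute("AvroParquet read sketch");
    }
}
```

Analogous {{AvroParquetReaders.forSpecificRecord}} and {{forReflectRecord}} variants exist for the specific/reflect record cases mentioned in the test scenarios.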
> Test AvroParquet format
> -----------------------
>
>                 Key: FLINK-26301
>                 URL: https://issues.apache.org/jira/browse/FLINK-26301
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Jing Ge
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
> The following scenarios are worth testing:
> * Start a simple job with a none/at-least-once/exactly-once delivery guarantee, read Avro Generic/Specific/Reflect records, and write them to an arbitrary sink.
> * Start the above job with bounded/unbounded data.
> * Start the above job with streaming/batch execution mode.
>
> This format works with FileSource [2] and can only be used with DataStream. Normal Parquet files can be used as test files. The schema introduced at [1] could be used.
>
> References:
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/parquet/
> [2] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
>
-- 
This message was sent by Atlassian Jira
(v8.20.1#820001)
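The bounded/unbounded scenario in the description is toggled on the FileSource builder rather than on the format itself. A minimal sketch, assuming a `schema` built as in the example from the Parquet docs [1] and a hypothetical input path:

```java
// Bounded by default: the source reads the files present at start and finishes.
// monitorContinuously(...) switches it to an unbounded, continuously monitoring
// source, which is what the streaming/unbounded test scenario exercises.
FileSource<GenericRecord> unbounded = FileSource
        .forRecordStreamFormat(
                AvroParquetReaders.forGenericRecord(schema),   // schema as in [1]
                new Path("/tmp/parquet-input"))                // hypothetical path
        .monitorContinuously(Duration.ofSeconds(10))
        .build();
```

Omitting the `monitorContinuously` call yields the bounded variant, which pairs naturally with batch execution mode.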