[ https://issues.apache.org/jira/browse/FLINK-24921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444592#comment-17444592 ]
Etienne Chauchot commented on FLINK-24921:
------------------------------------------

I took a deeper look at the code. [~arvid], you're right (thanks): the Parquet format is used to support the Hive format, hence the parametrized type, and indeed it cannot easily be removed. But the factory I see in the parquet format package (ParquetFileFormatFactory) is for the Table API.

While documenting the DataStream connectors from the user's point of view, I looked for how to use _ParquetColumnarRowInputFormat_ by reading the tests. They either refer explicitly to _FileSourceSplit_ or make raw use of the parametrized _ParquetColumnarRowInputFormat_ class, so I thought we could improve the API. For now, I'll document it as is.

> FileSourceSplit should not be visible in the user API in
> ParquetColumnarRowInputFormat
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-24921
>                 URL: https://issues.apache.org/jira/browse/FLINK-24921
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>
> _FileSourceSplit_ is an internal class that should not be visible in the user
> API like
> [here|https://github.com/apache/flink/blob/6f2d8fe3007464343c5312e27612be448b415148/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/ParquetColumnarRowInputFormatTest.java#L235].
> The fact that _FileSourceSplit_ surfaces in the API also pushes users
> toward raw use of the parametrized class, like
> [here|https://github.com/apache/flink/blob/6f2d8fe3007464343c5312e27612be448b415148/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/ParquetColumnarRowInputFormatTest.java#L407]
> It could be better to make the parquet format a non-parametrized class, as is
> done for the hive connector
> _class HiveBulkFormatAdapter_
> _implements BulkFormat<RowData, HiveSourceSplit>_
> rather than
> _class ParquetColumnarRowInputFormat<SplitT extends FileSourceSplit>_
> _extends ParquetVectorizedInputFormat<RowData, SplitT>_
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
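
To make the two class shapes compared in the issue description concrete, here is a minimal, self-contained sketch. It does not use Flink: _FileSourceSplit_ and _BulkFormat_ below are simplified stand-ins invented for illustration, _ParametrizedFormat_ and _FixedSplitFormat_ are hypothetical names, and rows are plain strings instead of _RowData_. It only shows how fixing the split type in the implements clause (the _HiveBulkFormatAdapter_ shape) keeps the type argument out of user code.

```java
// Sketch only: these are simplified stand-ins, NOT the real Flink classes.
final class SplitTypeSketch {

    // Stand-in for the internal split class named in the issue.
    static class FileSourceSplit {
        final String path;

        FileSourceSplit(String path) {
            this.path = path;
        }
    }

    // Stand-in for a bulk format parametrized on record type and split type.
    interface BulkFormat<T, SplitT extends FileSourceSplit> {
        T readFirst(SplitT split);
    }

    // Current shape: the split type is a type parameter of the format,
    // so callers must either name FileSourceSplit or fall back to a raw type.
    static class ParametrizedFormat<SplitT extends FileSourceSplit>
            implements BulkFormat<String, SplitT> {
        @Override
        public String readFirst(SplitT split) {
            return "row from " + split.path;
        }
    }

    // Proposed shape: the split type is fixed in the implements clause,
    // so the format class itself takes no type argument.
    static class FixedSplitFormat implements BulkFormat<String, FileSourceSplit> {
        @Override
        public String readFirst(FileSourceSplit split) {
            return "row from " + split.path;
        }
    }

    public static void main(String[] args) {
        // Built by hand for the demo; in a real source the framework creates splits.
        FileSourceSplit split = new FileSourceSplit("/tmp/data.parquet");

        // The user has to spell out the internal split type here.
        ParametrizedFormat<FileSourceSplit> parametrized = new ParametrizedFormat<>();

        // No type argument appears at the declaration site.
        FixedSplitFormat fixed = new FixedSplitFormat();

        System.out.println(parametrized.readFirst(split));
        System.out.println(fixed.readFirst(split));
    }
}
```

As noted in the comment above, the real class is parametrized to support the Hive format, so this sketch illustrates the user-facing difference only and is not a proposed patch.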