[ 
https://issues.apache.org/jira/browse/FLINK-24921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444472#comment-17444472
 ] 

Arvid Heise commented on FLINK-24921:
-------------------------------------

I think they added the type parameter to support Hive and non-Hive use cases 
with the same class (see ColumnBatchFactory), so I'm a bit skeptical that you 
can get rid of it. I'm also a bit worried that we would break compatibility on 
this non-annotated class. 
So the big question is: is this user-facing or not? I suspect that you 
only access it indirectly through the factories, and thus it's effectively 
{{@Internal}}. Then we wouldn't need to adjust the signature. If it's indeed 
Public, then your fix will break things :/

> FileSourceSplit should not be visible in the user API in 
> ParquetColumnarRowInputFormat
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-24921
>                 URL: https://issues.apache.org/jira/browse/FLINK-24921
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Etienne Chauchot
>            Priority: Major
>
> _FileSourceSplit_ is an internal class that should not be visible in the user 
> API like 
> [here|https://github.com/apache/flink/blob/6f2d8fe3007464343c5312e27612be448b415148/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/ParquetColumnarRowInputFormatTest.java#L235].
>  The fact that _FileSourceSplit_ surfaces in the API also pushes users toward 
> raw use of the parameterized class, like 
> [here|https://github.com/apache/flink/blob/6f2d8fe3007464343c5312e27612be448b415148/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/ParquetColumnarRowInputFormatTest.java#L407]
> It would be better to make the Parquet format a non-parameterized class, as 
> is done for the Hive connector:
> _class_  HiveBulkFormatAdapter
> _implements BulkFormat<RowData, HiveSourceSplit>_
> rather than
> _class ParquetColumnarRowInputFormat<SplitT extends FileSourceSplit>_
> _extends ParquetVectorizedInputFormat<RowData, SplitT>_
>  
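
To make the two class shapes under discussion concrete, here is a minimal sketch. It does not use Flink: every type below is a simplified, hypothetical stand-in that only borrows the names from the issue (FileSourceSplit, HiveSourceSplit, BulkFormat), and ParameterizedFormat and FixedSplitFormat are invented for illustration. It assumes, as a modelling choice, that the Hive split is a subclass of the file split.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every type here is a simplified stand-in, not the real Flink class. */
public class SplitTypeSketch {

    /** Stand-in for FileSourceSplit; carries only a path. */
    static class FileSourceSplit {
        final String path;

        FileSourceSplit(String path) {
            this.path = path;
        }
    }

    /** Stand-in for HiveSourceSplit, modelled here as a FileSourceSplit subclass. */
    static class HiveSourceSplit extends FileSourceSplit {
        HiveSourceSplit(String path) {
            super(path);
        }
    }

    /** Stand-in for BulkFormat<T, SplitT>, reduced to a single method. */
    interface BulkFormat<T, SplitT extends FileSourceSplit> {
        List<T> read(SplitT split);
    }

    /** Shape A (hypothetical): the split type is a type parameter of the format. */
    static class ParameterizedFormat<SplitT extends FileSourceSplit>
            implements BulkFormat<String, SplitT> {
        @Override
        public List<String> read(SplitT split) {
            List<String> rows = new ArrayList<>();
            rows.add("row from " + split.path);
            return rows;
        }
    }

    /** Shape B (hypothetical): the split type is fixed in the class declaration. */
    static class FixedSplitFormat implements BulkFormat<String, FileSourceSplit> {
        @Override
        public List<String> read(FileSourceSplit split) {
            List<String> rows = new ArrayList<>();
            rows.add("row from " + split.path);
            return rows;
        }
    }

    public static String readParameterized(String path) {
        // The caller has to name the split type here, or fall back to a raw type.
        ParameterizedFormat<HiveSourceSplit> format = new ParameterizedFormat<>();
        return format.read(new HiveSourceSplit(path)).get(0);
    }

    public static String readFixed(String path) {
        // No type argument at the call site; a subclass split is still accepted.
        FixedSplitFormat format = new FixedSplitFormat();
        return format.read(new HiveSourceSplit(path)).get(0);
    }

    public static void main(String[] args) {
        System.out.println(readParameterized("a.parquet"));
        System.out.println(readFixed("b.parquet"));
    }
}
```

In shape A the split type appears wherever the format is instantiated, which is where raw-type usage can creep in. Shape B hides it, at the cost of changing the class signature, which is the compatibility concern raised in the comment.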



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
