[ 
https://issues.apache.org/jira/browse/FLINK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304901#comment-17304901
 ] 

Etienne Chauchot edited comment on FLINK-21389 at 3/19/21, 3:18 PM:
--------------------------------------------------------------------

[~ZhenqiuHuang] to avoid confusing users, I'll deprecate the 
ParquetInputFormat constructor that takes a MessageType as a parameter. 
Otherwise they will not know which constructor to use, and those who use the 
one with MessageType will be surprised to see that the schema is replaced 
later on in the pipeline (as described in the ticket description).
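
To make the deprecation concrete, here is a minimal sketch of what it could look like. This is not the actual Flink code: SchemaDeprecationSketch, Schema, SchemaReader and InputFormatSketch are hypothetical stand-ins (Schema plays the role of MessageType), and the sketch assumes the schema is re-read from the file when the format is opened, which is how the ticket description reads.

```java
import java.util.Objects;

/**
 * Hypothetical sketch, NOT the Flink API: every class and method name here
 * is invented for illustration only.
 */
public class SchemaDeprecationSketch {

    /** Stand-in for Parquet's MessageType; just wraps a description string. */
    static final class Schema {
        final String definition;

        Schema(String definition) {
            this.definition = Objects.requireNonNull(definition);
        }
    }

    /** Stand-in for whatever reads the schema stored in the file itself. */
    interface SchemaReader {
        Schema readFrom(String path);
    }

    static class InputFormatSketch {
        private final String path;
        private final SchemaReader reader;
        private Schema schema; // stays null until open() resolves it

        /** Preferred constructor: no schema argument. */
        InputFormatSketch(String path, SchemaReader reader) {
            this.path = path;
            this.reader = reader;
        }

        /**
         * @deprecated the supplied schema is not used for reading; open()
         *     replaces it with the schema found in the file. Use the
         *     constructor without a schema instead.
         */
        @Deprecated
        InputFormatSketch(String path, Schema ignoredUserSchema, SchemaReader reader) {
            this(path, reader);
        }

        /** Resolves the schema from the file, whichever constructor was used. */
        void open() {
            schema = reader.readFrom(path);
        }

        Schema getSchema() {
            return schema;
        }
    }

    public static void main(String[] args) {
        // Fake reader so the sketch runs without any Parquet dependency.
        SchemaReader fileReader = p -> new Schema("schema stored in " + p);
        InputFormatSketch format = new InputFormatSketch("data.parquet", fileReader);
        format.open();
        System.out.println(format.getSchema().definition);
    }
}
```

With this shape, both constructors end up with the same schema after open(), which is the surprise the deprecation notice is meant to warn about.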


> ParquetInputFormat should not need parquet schema as user input
> ---------------------------------------------------------------
>
>                 Key: FLINK-21389
>                 URL: https://issues.apache.org/jira/browse/FLINK-21389
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>
> _ParquetInputFormat_ takes the Parquet schema as user input, but after the 
> split it reads the Parquet schema again here: 
> [https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170]
>  It should read the provided user schema instead. 
>  Better still would be to read the schema automatically and not require the 
> user to provide one, as Spark does 
> ([https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]). 
>  Thus we could add a _ParquetInputFormat_ constructor with no schema 
> parameter and allow _ParquetTableSource_ to be built without one too.



