[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112925#comment-17112925 ]

Jingsong Lee commented on FLINK-17086:
--------------------------------------

Hi [~leiwangouc], the related issues have been fixed; you can retry with Flink 1.11. Closing this.

> Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-17086
>                 URL: https://issues.apache.org/jira/browse/FLINK-17086
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.10.0
>            Reporter: Lei Wang
>            Priority: Major
>
> When writing hive table with parquet format, flink sql client not able to
> read it correctly because HiveMapredSplitReader not supports name mapping
> reading for parquet format.
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/fink-sql-client-not-able-to-read-parquet-format-table-td34119.html]

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110967#comment-17110967 ]

Jingsong Lee commented on FLINK-17086:
--------------------------------------

FLINK-17474 will be fixed in 1.11.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096325#comment-17096325 ]

Jingsong Lee commented on FLINK-17086:
--------------------------------------

FYI: I created FLINK-17474 to track this case-insensitivity issue.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096318#comment-17096318 ]

Rui Li commented on FLINK-17086:
--------------------------------

[~leiwangouc] Glad to know it worked. For ORC and Parquet tables, we have vectorized and non-vectorized readers. Setting "table.exec.hive.fallback-mapred-reader: true" forces the use of the non-vectorized reader. In general, the non-vectorized reader provides better compatibility with Hive but is less performant than the vectorized one, so I suggest using it only as a workaround when the vectorized reader doesn't meet your needs. We'll make the vectorized reader case-insensitive too in the future.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096290#comment-17096290 ]

Lei Wang commented on FLINK-17086:
----------------------------------

[~lirui] I added table.exec.hive.fallback-mapred-reader: true to conf/flink-conf.yaml and tested it again. It is correct now: the Flink SQL client works under both DDL statements, although I don't know how "table.exec.hive.fallback-mapred-reader: true" affects it.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096235#comment-17096235 ]

Jingsong Lee commented on FLINK-17086:
--------------------------------------

The vectorized reader is also case-sensitive. [~leiwangouc] It would be a good topic to support case-insensitive matching, and to make it the default, in the Hive integration too.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096214#comment-17096214 ]

Rui Li commented on FLINK-17086:
--------------------------------

[~leiwangouc], the latest code uses the vectorized reader for Parquet tables by default, and I think the vectorized reader is case-sensitive at the moment. You can set {{table.exec.hive.fallback-mapred-reader=true}} to fall back to the MR reader and give it a try.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096121#comment-17096121 ]

Lei Wang commented on FLINK-17086:
----------------------------------

Hi [~lirui], I built a package from the latest code on the Flink GitHub repository and tested it. There is a new error:

select * from robotparquet
e SQL statement. Reason: org.apache.flink.shaded.org.apache.parquet.io.InvalidRecordException: robottime not found in message com.geekplus.robotdata.parser.RobotUploadDataTest { required int32 robotId; required int64 robotTime; }

It seems to be a case-sensitivity issue. The Parquet data is written from Java, where the field names are case-sensitive, but Hive is case-insensitive.
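The mismatch behind this error can be illustrated with a minimal, self-contained sketch. It is not Flink's or Parquet's actual reader code: the class name `CaseInsensitiveFieldLookup` and its methods are hypothetical, and only the field names `robotId` and `robotTime` and the column name `robottime` come from the error message above. The sketch shows why an exact-match lookup for `robottime` fails and how an index keyed by lower-cased names could resolve it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, for illustration only (not part of Flink or Parquet).
public class CaseInsensitiveFieldLookup {

    // Index the file's field names by their lower-cased form.
    // Two names that differ only by case would be ambiguous, so reject them.
    static Map<String, String> indexByLowerCase(List<String> fileFieldNames) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String name : fileFieldNames) {
            String previous = index.putIfAbsent(name.toLowerCase(Locale.ROOT), name);
            if (previous != null) {
                throw new IllegalArgumentException(
                        "Ambiguous field names: " + previous + " and " + name);
            }
        }
        return index;
    }

    // Resolve a (lower-cased) Hive column name to the field name stored in the file.
    static Optional<String> resolve(Map<String, String> index, String hiveColumn) {
        return Optional.ofNullable(index.get(hiveColumn.toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Field names as they appear in the Parquet message in the error above.
        List<String> fileFields = Arrays.asList("robotId", "robotTime");
        Map<String, String> index = indexByLowerCase(fileFields);

        // Exact, case-sensitive match: the Hive column name is not found.
        System.out.println(fileFields.contains("robottime")); // false

        // Case-insensitive match: the column resolves to the file's field.
        System.out.println(resolve(index, "robottime").orElse("<missing>")); // robotTime
    }
}
```

Rejecting names that collide after lower-casing is a design choice of this sketch; a real reader might instead prefer an exact match when one exists.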
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091140#comment-17091140 ]

Rui Li commented on FLINK-17086:
--------------------------------

Hi [~leiwangouc], FLINK-16802 has been fixed, so you can check whether that fixes the issue.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082937#comment-17082937 ]

Rui Li commented on FLINK-17086:
--------------------------------

Hey [~leiwangouc], thanks for the clarifications. I think FLINK-16802 will help fix the problem. I'll submit a PR for that ticket shortly.
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082261#comment-17082261 ]

Lei Wang commented on FLINK-17086:
----------------------------------

Hi [~lirui], your understanding is right. The Hive client works well under both DDL statements. The Flink SQL client only works under one of them; under the other there is an error:

SQL statement. Reason: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable

Also pay attention to the way the Parquet file is written. I wrote a class called RobotData with only two fields, robotId and robotTime, and used StreamingFileSink to write it to HDFS:

StreamingFileSink
    .forBulkFormat(
        new Path("hdfs://namenode:8020/user/abc/parquet"),
        ParquetAvroWriters.forReflectRecord(RobotData.class))
    .build();
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082088#comment-17082088 ]

Rui Li commented on FLINK-17086:
--------------------------------

Hi [~leiwangouc], thanks for reporting the issue. Let me try to understand it. Given the same underlying Parquet file, the column order defined in the DDL doesn't matter in Hive but does matter in Flink. For example, you can either {{CREATE TABLE `robotparquet`( `robotid` int, `robottime` bigint )}}, or {{CREATE TABLE `robotparquet`( `robottime` bigint, `robotid` int)}} in Hive, and both tables will return the correct data for columns {{robottime}} and {{robotid}}. But you cannot do the same in Flink. Is that right?
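The difference between the two behaviours described above can be sketched as position-based versus name-based column mapping. This is an illustrative sketch only, not the code of `HiveMapredSplitReader` or of Hive: the class `ColumnMappingSketch` and its methods are hypothetical, the file field order (`robotId`, then `robotTime`) is an assumption, and names are compared ignoring case because Hive lower-cases column names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Flink or Hive).
public class ColumnMappingSketch {

    // Position-based mapping: table column i reads file field i,
    // so the result depends on the column order in the DDL.
    static int[] mapByPosition(List<String> tableColumns) {
        int[] mapping = new int[tableColumns.size()];
        for (int i = 0; i < mapping.length; i++) {
            mapping[i] = i;
        }
        return mapping;
    }

    // Name-based mapping: table column i reads the file field with the same
    // name (ignoring case), or -1 if the file has no such field.
    static int[] mapByName(List<String> tableColumns, List<String> fileFields) {
        int[] mapping = new int[tableColumns.size()];
        for (int i = 0; i < mapping.length; i++) {
            mapping[i] = -1;
            for (int j = 0; j < fileFields.size(); j++) {
                if (fileFields.get(j).equalsIgnoreCase(tableColumns.get(i))) {
                    mapping[i] = j;
                    break;
                }
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Assumed field order in the Parquet file: int field first, long field second.
        List<String> fileFields = Arrays.asList("robotId", "robotTime");
        // Second DDL variant from the comment above: bigint column declared first.
        List<String> ddlColumns = Arrays.asList("robottime", "robotid");

        // By position, robottime (bigint) would read file field 0 (the int field).
        System.out.println(Arrays.toString(mapByPosition(ddlColumns))); // [0, 1]

        // By name, robottime reads file field 1 regardless of DDL order.
        System.out.println(Arrays.toString(mapByName(ddlColumns, fileFields))); // [1, 0]
    }
}
```

With position-based mapping, the second DDL pairs a bigint column with an int field, which is the kind of type mismatch a reader would surface as a cast error; name-based mapping avoids it.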
[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082075#comment-17082075 ]

Jingsong Lee commented on FLINK-17086:
--------------------------------------

CC: [~lirui]