[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-05-21 Thread Jingsong Lee (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112925#comment-17112925 ]

Jingsong Lee commented on FLINK-17086:
--

Hi [~leiwangouc], the related issues have been fixed; you can retry with Flink 1.11.
Closing this issue.

> Flink sql client not able to read parquet hive table because  
> `HiveMapredSplitReader` not supports name mapping reading for parquet format.
> ---
>
> Key: FLINK-17086
> URL: https://issues.apache.org/jira/browse/FLINK-17086
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.10.0
>Reporter: Lei Wang
>Priority: Major
>
> When writing a Hive table in Parquet format, the Flink SQL client is not able to 
> read it correctly because HiveMapredSplitReader does not support name mapping 
> when reading the Parquet format.
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/fink-sql-client-not-able-to-read-parquet-format-table-td34119.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-05-19 Thread Jingsong Lee (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110967#comment-17110967 ]

Jingsong Lee commented on FLINK-17086:
--

FLINK-17474 will be fixed in 1.11.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-30 Thread Jingsong Lee (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096325#comment-17096325 ]

Jingsong Lee commented on FLINK-17086:
--

Created FLINK-17474 to track case-insensitive reading. FYI.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-30 Thread Rui Li (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096318#comment-17096318 ]

Rui Li commented on FLINK-17086:


[~leiwangouc] Glad to know it worked.

For ORC and Parquet tables, we have both vectorized and non-vectorized readers. 
Setting "table.exec.hive.fallback-mapred-reader: true" forces the use of the 
non-vectorized reader.
In general, the non-vectorized reader provides better compatibility with Hive but 
is less performant than the vectorized one, so I suggest using it only as a 
workaround when the vectorized reader doesn't meet your needs. We'll make the 
vectorized reader case-insensitive too in the future.
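To make the effect of the flag concrete, here is a minimal, hypothetical sketch of the reader choice described above. The class, enum and method names are invented for illustration and are not Flink's actual code. It assumes the flag defaults to false and that only ORC and Parquet tables have a vectorized path; the real decision in Flink may depend on further conditions, such as the column types.

```java
import java.util.Map;

/**
 * Hypothetical sketch of picking a reader for a Hive table.
 * Names are illustrative only; this is not Flink's implementation.
 */
public class ReaderChoiceSketch {

    public enum Reader { VECTORIZED, MAPRED }

    // Option key from the thread; assumed here to default to "false".
    public static final String FALLBACK_KEY = "table.exec.hive.fallback-mapred-reader";

    public static Reader chooseReader(Map<String, String> conf, String storageFormat) {
        boolean fallback = Boolean.parseBoolean(conf.getOrDefault(FALLBACK_KEY, "false"));
        // Assumption: only ORC and Parquet have a vectorized reader.
        boolean vectorizable = storageFormat.equalsIgnoreCase("parquet")
                || storageFormat.equalsIgnoreCase("orc");
        // The flag forces the non-vectorized (mapred) reader even when a vectorized one exists.
        return (vectorizable && !fallback) ? Reader.VECTORIZED : Reader.MAPRED;
    }

    public static void main(String[] args) {
        System.out.println(chooseReader(Map.of(), "parquet"));                     // VECTORIZED
        System.out.println(chooseReader(Map.of(FALLBACK_KEY, "true"), "parquet")); // MAPRED
        System.out.println(chooseReader(Map.of(), "textfile"));                    // MAPRED
    }
}
```

The design point is simply that the fallback is an explicit opt-out: without the flag, the faster path wins whenever it is available.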



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-30 Thread Lei Wang (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096290#comment-17096290 ]

Lei Wang commented on FLINK-17086:
--

[~lirui] I added table.exec.hive.fallback-mapred-reader: true to 
conf/flink-conf.yaml and tested it again.

It is correct now. The Flink SQL client works with both DDL statements.

However, I don't know how "table.exec.hive.fallback-mapred-reader: true" 
affects it.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-30 Thread Jingsong Lee (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096235#comment-17096235 ]

Jingsong Lee commented on FLINK-17086:
--

The vectorized reader is also case-sensitive.

[~leiwangouc] Supporting case-insensitive reading, and making it the default, 
is a good topic for the Hive integration too.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-30 Thread Rui Li (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096214#comment-17096214 ]

Rui Li commented on FLINK-17086:


[~leiwangouc], the latest code uses the vectorized reader for Parquet tables by 
default, and I think the vectorized reader is case-sensitive at the moment. You 
can set {{table.exec.hive.fallback-mapred-reader=true}} to fall back to the MR 
reader and give it a try.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-29 Thread Lei Wang (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096121#comment-17096121 ]

Lei Wang commented on FLINK-17086:
--

Hi [~lirui], I built a package from the latest Flink code on GitHub and tested 
it. There's a new error:

select * from robotparquet

e SQL statement. Reason:
org.apache.flink.shaded.org.apache.parquet.io.InvalidRecordException: robottime 
not found in message com.geekplus.robotdata.parser.RobotUploadDataTest {
 required int32 robotId;
 required int64 robotTime;
}

It seems to be a case-sensitivity issue.

The Parquet data is written by Java, and its field names are case-sensitive.

But Hive is case-insensitive.
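A minimal sketch of case-insensitive name resolution, assuming the Hive column names arrive in lower case (as the Hive metastore stores them) while the Parquet file keeps the case of the Java field names. `CaseInsensitiveLookup` and `resolve` are hypothetical names, not part of Flink or Parquet; the sketch only illustrates why an exact lookup of `robottime` misses `robotTime` and how lower-casing both sides avoids it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: resolve Hive column names (lower case) against Parquet
 * file field names (case preserved, e.g. robotId / robotTime from a Java class).
 */
public class CaseInsensitiveLookup {

    /** For each Hive column, returns the index of the matching file field, or -1 if absent. */
    public static int[] resolve(List<String> hiveColumns, List<String> fileFields) {
        Map<String, Integer> byLowerName = new HashMap<>();
        for (int i = 0; i < fileFields.size(); i++) {
            String key = fileFields.get(i).toLowerCase(Locale.ROOT);
            // Two file fields that differ only in case would be ambiguous, so reject them.
            if (byLowerName.putIfAbsent(key, i) != null) {
                throw new IllegalArgumentException("Ambiguous field name: " + fileFields.get(i));
            }
        }
        int[] indices = new int[hiveColumns.size()];
        for (int i = 0; i < hiveColumns.size(); i++) {
            String key = hiveColumns.get(i).toLowerCase(Locale.ROOT);
            indices[i] = byLowerName.getOrDefault(key, -1);
        }
        return indices;
    }

    public static void main(String[] args) {
        List<String> fileFields = List.of("robotId", "robotTime");
        // An exact, case-sensitive lookup misses "robottime", as in the InvalidRecordException.
        System.out.println(fileFields.indexOf("robottime")); // -1
        // Case-insensitive resolution finds both columns, in DDL order.
        System.out.println(Arrays.toString(resolve(List.of("robottime", "robotid"), fileFields))); // [1, 0]
    }
}
```

Rejecting names that collide after lower-casing is one possible policy; a real reader might instead pick the first match or prefer an exact-case match.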

 



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-23 Thread Rui Li (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091140#comment-17091140 ]

Rui Li commented on FLINK-17086:


Hi [~leiwangouc], FLINK-16802 has been fixed; you can check whether it fixes 
the issue.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-14 Thread Rui Li (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082937#comment-17082937 ]

Rui Li commented on FLINK-17086:


Hey [~leiwangouc], thanks for the clarifications. I think FLINK-16802 will help 
fix the problem. I'll submit a PR for that ticket shortly.



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-13 Thread Lei Wang (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082261#comment-17082261 ]

Lei Wang commented on FLINK-17086:
--

Hi [~lirui], your understanding is right.

The Hive client works well with both DDL statements.

The Flink SQL client only works with one of the DDL statements. With the other, 
there's an error:

SQL statement. Reason:
java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast 
to org.apache.hadoop.io.LongWritable

Also note the way the Parquet file is written.

I wrote a class called RobotData with only two fields, robotId and robotTime, 
and used StreamingFileSink to write to HDFS:

 // Bulk-encoded Parquet sink; the Avro schema is reflected from RobotData's fields
 StreamingFileSink<RobotData> sink = StreamingFileSink
  .forBulkFormat(new Path("hdfs://namenode:8020/user/abc/parquet"),
   ParquetAvroWriters.forReflectRecord(RobotData.class))
  .build();



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-13 Thread Rui Li (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082088#comment-17082088 ]

Rui Li commented on FLINK-17086:


Hi [~leiwangouc], thanks for reporting the issue. Let me try to understand it. 
Given the same underlying Parquet file, the column order defined in the DDL 
doesn't matter in Hive but does matter in Flink. For example, in Hive you can use 
either {{CREATE TABLE `robotparquet`(  `robotid` int,  `robottime` bigint )}} or 
{{CREATE TABLE `robotparquet`(  `robottime` bigint,   `robotid` int)}}, 
and both tables will return the correct data for the columns {{robottime}} and 
{{robotid}}. But you cannot do the same in Flink. Is that right?
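One plausible reading of the ClassCastException reported for one of the DDLs is that the reader maps columns by position rather than by name. The hypothetical sketch below (class and method names invented for illustration, types simplified to Hive type names, "int" for int32 and "bigint" for int64) contrasts the two strategies for the reversed DDL. Positional mapping hands the `bigint` column `robottime` the file's int32 `robotId` field, which would be consistent with an IntWritable being cast to LongWritable, while name mapping picks the right field.

```java
import java.util.List;

/**
 * Hypothetical sketch: positional vs. by-name column mapping for the robotparquet
 * example. Types are simplified to Hive type names ("int" for int32, "bigint" for int64).
 */
public class ColumnMappingSketch {

    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Field order as written to the Parquet file.
    static final List<Column> FILE =
            List.of(new Column("robotId", "int"), new Column("robotTime", "bigint"));

    // The DDL with the columns declared in the opposite order.
    static final List<Column> DDL =
            List.of(new Column("robottime", "bigint"), new Column("robotid", "int"));

    /** Positional mapping: DDL column i reads file field i, whatever its name. */
    static Column positional(List<Column> file, int i) {
        return file.get(i);
    }

    /** By-name mapping: DDL column i reads the file field whose name matches, ignoring case. */
    static Column byName(List<Column> ddl, List<Column> file, int i) {
        String wanted = ddl.get(i).name;
        for (Column c : file) {
            if (c.name.equalsIgnoreCase(wanted)) {
                return c;
            }
        }
        throw new IllegalArgumentException(wanted + " not found in file schema");
    }

    public static void main(String[] args) {
        for (int i = 0; i < DDL.size(); i++) {
            Column declared = DDL.get(i);
            Column p = positional(FILE, i);
            Column n = byName(DDL, FILE, i);
            // Flag the case where the declared column type differs from the file field's type.
            System.out.printf("%s %s: positional -> %s %s%s, by name -> %s %s%n",
                    declared.name, declared.type, p.name, p.type,
                    p.type.equals(declared.type) ? "" : " (type mismatch)",
                    n.name, n.type);
        }
    }
}
```

With the DDL in file order, both strategies agree, which fits the observation that Flink works with one of the two DDL statements.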



[jira] [Commented] (FLINK-17086) Flink sql client not able to read parquet hive table because `HiveMapredSplitReader` not supports name mapping reading for parquet format.

2020-04-12 Thread Jingsong Lee (Jira)


[ https://issues.apache.org/jira/browse/FLINK-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082075#comment-17082075 ]

Jingsong Lee commented on FLINK-17086:
--

CC: [~lirui]
